Achieving AI Scale in FSI
Despite the meteoric rise of AI, and GenAI in particular, the technology has yet to hit the mainstream. One significant barrier to widespread adoption is AI's high run-time cost. When implementing AI solutions, therefore, selecting a platform that maximizes efficiency, scalability, and performance is critical. This post explores that challenge and highlights how the Vespa architecture supports large-scale AI deployments.
AI in FSI Is an Experiment
In their recent Financial Services AI Dossier, Deloitte observed that while AI holds immense potential, the financial burden of developing and running AI systems is a major hurdle. Due to these high costs, many financial services institutions (FSIs) remain in the early stages of AI adoption, focusing on small-scale pilots rather than large-scale deployments.
While these experiments are demonstrating value, the high cost of running GenAI applications remains a significant barrier to widespread adoption, primarily because of the extensive computational power and specialized infrastructure required. Training large-scale AI models, such as those used in natural language processing, demands vast amounts of data and high-performance hardware, including GPUs and TPUs, which are expensive to acquire and operate. According to Gartner, the financial burden of running these systems is so substantial that by 2025, growth in enterprise AI deployments is expected to slow as costs begin to exceed perceived value. By 2028, more than half of enterprises may abandon large AI model projects due to escalating expenses and complexity.
FSI Scale and AI Resource Consumption: Not a Marriage Made in Heaven
Modern FSI technology infrastructures are complex and operate on a massive scale, driven by vast data volumes, high transaction rates, diverse technology stacks, and stringent regulatory requirements. FSIs generate and process enormous amounts of data daily, with entities like the New York Stock Exchange (NYSE) producing about one terabyte of data per day. Transaction volumes are also immense; Visa, for instance, handles over 150 million transactions daily, and its network is built to process more than 65,000 transaction messages per second at peak.
If AI can work at this scale without breaking the bank, killer use cases abound: enhancing customer service through AI-powered chatbots and virtual assistants, automating fraud detection and prevention with advanced pattern recognition, and optimizing trading strategies with AI-generated insights. AI can also streamline compliance and regulatory reporting by automating the generation of required documents and ensuring their accuracy, and it can support personalized financial planning and advisory services by analyzing customer data and market conditions to produce tailored recommendations. These applications improve efficiency and accuracy while enhancing customer satisfaction and trust in financial institutions.
Vespa: Achieving Run-Time Efficiency to Keep Costs in Check
Vespa is a platform for building real-time, AI-driven applications for search, recommendation, personalization, and retrieval-augmented generation (RAG). It efficiently manages data, inference, and logic, supporting applications with large data volumes and high concurrent query rates.
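To make this concrete, here is a minimal sketch using the pyvespa client, in which data fields and ranking logic are defined together in a single application package. The schema name and fields are hypothetical, not taken from any production setup:

```python
from vespa.package import ApplicationPackage, Field, FieldSet, RankProfile

# A minimal application package with a hypothetical "fsidemo" schema.
# Data (fields) and ranking logic live in the same deployable unit.
app_package = ApplicationPackage(name="fsidemo")

app_package.schema.add_fields(
    Field(name="title", type="string",
          indexing=["index", "summary"], index="enable-bm25"),
    Field(name="body", type="string",
          indexing=["index", "summary"], index="enable-bm25"),
)

# Search both text fields through one default field set.
app_package.schema.add_field_set(
    FieldSet(name="default", fields=["title", "body"])
)

# A simple text-ranking profile; schemas can also rank with
# machine-learned models such as ONNX exports.
app_package.schema.add_rank_profile(
    RankProfile(name="simple_text", first_phase="bm25(title) + bm25(body)")
)
```

Deploying a package like this to a local Docker container or to Vespa Cloud yields a live endpoint ready to feed and query.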
Initially developed at Yahoo and released as an open-source project in 2017, Vespa.ai was spun out as an independent company in 2023. Vespa was designed for scale, concurrency, performance, and run-time efficiency: table stakes for Yahoo. Today, Vespa underpins a diverse portfolio of 150 applications integral to Yahoo's operations, from delivering personalized content across Yahoo's pages in real time to serving targeted advertisements in one of the world's largest ad exchanges. Collectively, these applications serve nearly one billion users and process roughly 800,000 queries per second.
Vespa achieves high performance and scalability through its distributed architecture, efficient query processing, and advanced data management. By distributing data and queries across the nodes of a cluster, Vespa provides load balancing and fault tolerance, both crucial for handling large data volumes and high query throughput, and it does so without specialized processors such as GPUs. The system scales horizontally: nodes can be added as needed to increase capacity and performance, allowing Vespa to absorb growing datasets and rising query rates.
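Cluster topology is declared in the application's services.xml. The sketch below suggests what a small, horizontally scaled deployment might look like; the host aliases and the "transaction" document type are placeholders, and adding capacity is a matter of listing more nodes:

```xml
<!-- Sketch of a services.xml; hosts and the "transaction"
     document type are placeholders, not a real deployment. -->
<services version="1.0">
  <container id="query" version="1.0">
    <search/>
    <document-api/>
    <nodes>
      <node hostalias="container0"/>
      <node hostalias="container1"/>
    </nodes>
  </container>
  <content id="transactions" version="1.0">
    <!-- Keep two copies of every document for fault tolerance. -->
    <redundancy>2</redundancy>
    <documents>
      <document type="transaction" mode="index"/>
    </documents>
    <!-- Scale horizontally by adding nodes to this list. -->
    <nodes>
      <node hostalias="content0" distribution-key="0"/>
      <node hostalias="content1" distribution-key="1"/>
      <node hostalias="content2" distribution-key="2"/>
    </nodes>
  </content>
</services>
```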
Efficient query processing is another key factor in Vespa's performance. The platform delivers low-latency query execution through query optimization and in-memory data structures that reduce access times. Real-time data ingestion and updates make new or modified data immediately searchable, which is essential for applications requiring up-to-the-minute information. Vespa's query language supports complex queries, filtering, and ranking, and its ranking framework can evaluate machine-learned models to deliver relevant, high-quality results.
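Continuing the hypothetical schema sketched above, a pyvespa round-trip illustrates real-time ingestion followed by a query that combines free-text matching, a structured filter, and ranking; the endpoint URL and document contents are placeholders:

```python
from vespa.application import Vespa

# Connect to a running Vespa endpoint; URL and port are placeholders.
app = Vespa(url="http://localhost", port=8080)

# Real-time ingestion: a fed document is searchable as soon as the
# write is acknowledged. The document contents are invented.
app.feed_data_point(
    schema="fsidemo",
    data_id="doc-0001",
    fields={"title": "Q3 earnings summary", "body": "Revenue grew 4 percent..."},
)

# One query combines a free-text match, a structured filter, and ranking.
response = app.query(body={
    "yql": 'select * from fsidemo where userQuery() and title contains "q3"',
    "query": "earnings growth",
    "ranking": "simple_text",
    "hits": 10,
})

for hit in response.hits:
    print(hit["relevance"], hit["fields"]["title"])
```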
Vespa also ensures fault tolerance and high availability, both critical to FSIs, through data replication across multiple nodes and automatic failover. This setup maintains continuous operation and performance even when individual nodes fail. The platform provides robust APIs and SDKs for integration with applications and data sources, and it allows custom components for optimizations and enhancements specific to user needs. Comprehensive metrics and logging enable detailed monitoring and tuning, helping to identify bottlenecks and optimize configurations for peak performance.
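On the monitoring point, each Vespa node exposes its metrics over plain HTTP, so they can be pulled into standard observability stacks. A minimal sketch, assuming Vespa's default metrics-proxy port and a placeholder host:

```python
import requests

# Each Vespa node runs a metrics proxy (default port 19092) that
# serves service metrics as JSON; a Prometheus-format endpoint is
# also available at /prometheus/v1/values. Host is a placeholder.
resp = requests.get("http://localhost:19092/metrics/v2/values", timeout=5)
resp.raise_for_status()

# Inspect the returned service metrics to spot latency or feed
# bottlenecks before they become capacity problems.
print(resp.json())
```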
Summary: Not an Afterthought
Numerous conversations with prospects reveal a consistent pattern: scale and efficiency are treated as afterthoughts. FSI experiments prove AI's potential, but large-scale rollouts hit massive cost barriers. At Vespa, run-time efficiency is designed in by our engineers from day one. It's not an afterthought or something we will get back to later! It's in our DNA.