Scaling a Vespa Application: Feeding Fast and Furiously
A tutorial on scaling the resources of a Vespa application to increase feed throughput, using the metrics dashboard to make informed, optimised scaling decisions.
April 28, 2026