Vespa.ai Launches The RAG Blueprint to Accelerate Scalable, Accurate Generative AI

New fast-track template delivers Perplexity-level RAG quality and 10x productivity gains through simplified deployment.

July 9, 2025

TRONDHEIM, Norway – 9th July 2025 – Vespa.ai, the AI Search Platform for real-time AI applications at scale, today announced the launch of The RAG Blueprint, a template for building production-ready retrieval-augmented generation (RAG) applications with high accuracy, low latency, and streamlined deployment.

Designed to simplify enterprise RAG adoption and ensure accurate results are delivered to a Large Language Model (LLM), The RAG Blueprint packages proven architectural patterns into a modular, easy-to-deploy application template. It enables developers to build production systems with cutting-edge accuracy and unlimited scalability, drawing on the same approaches and core technology that power advanced platforms such as Perplexity.

The RAG Blueprint features modular retrieval, ranking pipelines, and native support for hybrid search—combining lexical, semantic and metadata signals—as well as seamless integration with leading embedding models. To support real-time performance at scale, The RAG Blueprint leverages Vespa’s advanced optimization capabilities, including phased retrieval and workload-aware query execution, which enable consistent, low-latency responses across billions of documents.
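As a rough illustration of the hybrid, phased approach described above, the following Python sketch filters on a metadata field, ranks candidates with a cheap lexical signal, and then re-ranks only the top few with a combined lexical and semantic score. It is a toy under stated assumptions, not Vespa's API or The RAG Blueprint's actual configuration: the function names, the `lang` metadata field, the `alpha` weight, and the two-dimensional vectors are all hypothetical.

```python
import math

# Toy corpus. The "vec" embeddings and the "lang" metadata field are
# made-up values for illustration only.
DOCS = [
    {"id": "a", "text": "vespa hybrid search ranking", "vec": [1.0, 0.0], "lang": "en"},
    {"id": "b", "text": "hybrid search with embeddings", "vec": [0.0, 1.0], "lang": "en"},
    {"id": "c", "text": "cooking pasta at home", "vec": [0.0, 1.0], "lang": "en"},
    {"id": "d", "text": "vespa hybrid search ranking", "vec": [1.0, 0.0], "lang": "no"},
]


def lexical_score(query, text):
    """Cheap lexical signal: fraction of query terms that occur in the text."""
    q_terms = query.lower().split()
    doc_terms = set(text.lower().split())
    if not q_terms:
        return 0.0
    return sum(1 for t in q_terms if t in doc_terms) / len(q_terms)


def cosine(a, b):
    """Semantic signal: cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(query, query_vec, docs, lang, rerank_count=2, alpha=0.5):
    """Two-phase hybrid ranking (hypothetical helper, not a Vespa call)."""
    # Metadata signal: keep only documents in the requested language.
    candidates = [d for d in docs if d["lang"] == lang]
    # Phase 1: rank every candidate by the cheap lexical score, keep the top few.
    first_phase = sorted(
        candidates, key=lambda d: lexical_score(query, d["text"]), reverse=True
    )[:rerank_count]
    # Phase 2: re-rank the survivors with a weighted lexical + semantic score.
    scored = [
        (
            alpha * lexical_score(query, d["text"])
            + (1 - alpha) * cosine(query_vec, d["vec"]),
            d["id"],
        )
        for d in first_phase
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored]


if __name__ == "__main__":
    print(hybrid_search("hybrid search", [0.0, 1.0], DOCS, lang="en"))
```

The point of the split is cost: the more expensive semantic comparison runs only on the `rerank_count` documents that survive the first phase, rather than on every candidate.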

“Generative AI is a massive opportunity, but getting it right at scale—without sacrificing accuracy or driving up costs—is a real challenge,” said Jon Bratseth, CEO and Founder of Vespa.ai. “Vespa delivers the most scalable and performant RAG infrastructure available. With the RAG Blueprint, we’re making it dramatically easier for teams to deploy production-grade systems with the power of Perplexity, but in a far more accessible and manageable way.”

The RAG Blueprint is available now as part of Vespa Cloud.

About Vespa

Vespa is an AI Search Platform for applications that leverage AI and data to deliver online experiences to end users in real time, such as search, recommendation, personalization, and retrieval-augmented generation (RAG). Vespa automatically organizes data, inference, and logic so that applications can work with any amount of data and any volume of queries. It is available both as a managed service and as open source.