The RAG Blueprint

Build Reliable, Accurate RAG. Fast-Tracked for Production

Accelerate your path to production with a best-practice template that prioritizes retrieval quality, inference speed, and operational scale.

A Smarter Starting Point for RAG

The RAG Blueprint is a modular application template for designing, deploying, and testing production-grade RAG systems. Built on the same core architecture that powers Perplexity, it codifies best practices for building accurate and scalable retrieval pipelines using Vespa’s native support for hybrid search, phased ranking, and real-time inference. Designed for developers and architects, the Blueprint serves as a hands-on guide for production-ready implementations, helping teams move faster without compromising on quality or control.

Benefits of The RAG Blueprint

Accuracy You Can Trust

Built on proven architectural patterns, The RAG Blueprint prioritizes precision in retrieval and ranking—helping you deliver more relevant, reliable answers from day one.

Proven Practices

Develop confidently with a proven, production-grade reference application that guides implementation while preserving full control over retrieval quality and system design.

Scalable Performance, Built In

Designed for large-scale applications, the Blueprint leverages Vespa’s native support for low-latency, hybrid retrieval over billions of documents.

Perplexity uses Vespa.ai to power fast, accurate, and trusted answers for millions of users.

With Vespa RAG, Perplexity delivers accurate, near-real-time responses to more than 15 million monthly users and handles more than 100 million queries each week.

  • What is The RAG Blueprint?

    The RAG Blueprint is a best-practice template based on real-world Vespa deployments, designed to help teams implement RAG systems efficiently and reliably. It provides practical guidance on retrieval and ranking, including support for hybrid search that combines keyword, vector, and structured signals. The Blueprint integrates with common embedding models and leverages Vespa’s built-in features—such as phased retrieval and query execution optimizations—to ensure low-latency performance at scale, even across billion-document workloads.

  • What problem does it solve?

    Deploying large-scale RAG systems in production presents several challenges that go beyond proof-of-concept implementations. At scale, maintaining accurate retrieval becomes more difficult as the volume and variety of data increase. Systems must combine keyword, semantic, and metadata-based signals to ensure relevant results, especially when content is noisy or domain-specific. Latency and throughput are also critical, as RAG pipelines must handle complex query chains and model inference in real time, often across billions of documents.

    These challenges become even more pronounced with the shift toward deep research, where LLMs must issue multiple queries, evaluate intermediate results, and reason across sources to produce trustworthy answers. This increases the demand on retrieval infrastructure, which must support high query rates, tight latency budgets, and rapid updates. Enterprises also face operational hurdles, such as enforcing access controls, keeping indexes up to date, and managing the costs of running embedding models and LLMs at scale. Without the right architecture, deep research use cases can expose the limits of traditional vector databases, delaying deployment and reducing the effectiveness of GenAI initiatives.

  • Who is The RAG Blueprint for?

    The RAG Blueprint is intended for engineers who are proficient in Vespa and are developing production-ready RAG applications. It walks through a predefined sample application step by step, showing how to validate your system, implement machine-learned document ranking, and configure Vespa for optimal performance.

  • Can I use The RAG Blueprint for my use case?

    The methodology presented in the Blueprint can be used for any RAG application. The current version, however, is built around a predefined sample application intended for educational and evaluation purposes. The code is designed for that sample application and is not ready to run for customer-specific scenarios, but it provides a good starting point for building your own implementation.

  • Is The RAG Blueprint for Vespa Cloud only?

    No. The RAG Blueprint can be used with both Vespa Cloud and self-managed (open source) Vespa deployments. However, certain features—such as advanced chunking—require Vespa version 8.543.14 or later. On Vespa Cloud, upgrades are handled automatically, while self-hosted users must manage this manually. Vespa Cloud also simplifies secure integration with off-the-shelf LLMs by allowing API keys to be stored in the built-in secret store, avoiding the need to pass keys in request headers.

  • What is included in The RAG Blueprint Package?

     

Summary

Production-Grade RAG, From Day One

The RAG Blueprint makes it easier for enterprises to operationalize high-quality RAG systems. It distills proven architectural patterns into a reusable design that emphasizes accuracy, consistency, and scalability—so you can focus on innovation, not infrastructure.

Built on Vespa. Battle-Tested at Scale.

Powered by the same core technology that supports advanced systems like Perplexity and bigdata.com, The RAG Blueprint enables developers to build production-grade pipelines from day one. It includes modular components for retrieval and ranking, combining lexical, semantic, and metadata signals in one query path for advanced hybrid search.
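
To make the idea of combining lexical, semantic, and metadata signals concrete, here is a minimal Python sketch. It is not the Blueprint's code and does not call Vespa: it assumes two already-ranked lists of document ids (one from keyword matching, one from vector similarity), applies a hypothetical metadata filter on a "source" field, and merges the lists with reciprocal rank fusion, a generic fusion technique chosen here only for illustration. All function and field names are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    k=60 is a commonly used default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def hybrid_search(lexical_hits, vector_hits, metadata, required_source=None):
    """Filter both result lists on a metadata field, then fuse them.

    `metadata` maps doc id -> dict of attributes; `required_source` is a
    hypothetical structured filter (None disables filtering).
    """
    def allowed(doc_id):
        if required_source is None:
            return True
        return metadata.get(doc_id, {}).get("source") == required_source

    lexical = [d for d in lexical_hits if allowed(d)]
    vector = [d for d in vector_hits if allowed(d)]
    return reciprocal_rank_fusion([lexical, vector])


# Hypothetical example data.
lexical_hits = ["a", "b", "c"]   # ranked by keyword relevance
vector_hits = ["b", "d", "a"]    # ranked by embedding similarity
metadata = {
    "a": {"source": "wiki"},
    "b": {"source": "blog"},
    "c": {"source": "wiki"},
    "d": {"source": "wiki"},
}

fused_all = hybrid_search(lexical_hits, vector_hits, metadata)
fused_wiki = hybrid_search(lexical_hits, vector_hits, metadata, required_source="wiki")
print(fused_all)
print(fused_wiki)
```

Documents that rank well in both lists rise to the top, while the metadata filter removes candidates before fusion. In a real deployment the retrieval, filtering, and fusion would run inside the serving engine rather than in client code.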

Optimized for Performance and Precision

To deliver consistent low-latency performance over billions of documents, The RAG Blueprint leverages Vespa’s unique capabilities: phased retrieval, vector and lexical fusion, and optimized query execution—all running natively on Vespa Cloud.
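
As a rough illustration of why phased retrieval keeps latency low, the Python sketch below scores every candidate with a cheap first-phase function and applies a more expensive second-phase function only to the top few. It is a simplified model, not Vespa's implementation; the function names, the `bm25` and `model_score` fields, and the `rerank_count` parameter are assumptions made for this example.

```python
def phased_rank(docs, first_phase, second_phase, rerank_count, hits):
    """Two-phase ranking over a list of candidate documents.

    Phase 1: score all candidates with the cheap `first_phase` function.
    Phase 2: re-score only the best `rerank_count` candidates with the
             costlier `second_phase` function and return the top `hits`.
    """
    by_first_phase = sorted(docs, key=first_phase, reverse=True)
    shortlist = by_first_phase[:rerank_count]
    reranked = sorted(shortlist, key=second_phase, reverse=True)
    return reranked[:hits]


# Hypothetical candidates: a cheap lexical score and a stand-in for an
# expensive model score that would normally be computed on demand.
docs = [
    {"id": "a", "bm25": 3.0, "model_score": 0.20},
    {"id": "b", "bm25": 2.5, "model_score": 0.90},
    {"id": "c", "bm25": 1.0, "model_score": 0.99},
    {"id": "d", "bm25": 2.0, "model_score": 0.50},
]

top = phased_rank(
    docs,
    first_phase=lambda d: d["bm25"],
    second_phase=lambda d: d["model_score"],
    rerank_count=3,
    hits=2,
)
top_ids = [d["id"] for d in top]
print(top_ids)
```

Note the trade-off this exposes: document "c" has the highest second-phase score but never reaches that phase because its first-phase score is too low. Choosing the first-phase function and the rerank count is therefore a balance between cost and recall.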

Accelerate Deployment Without Compromise

Generative AI offers transformative potential—but achieving accuracy, speed, and scalability without ballooning costs remains a challenge. The RAG Blueprint gives AI teams a proven, production-ready foundation to build reliable and accurate systems faster and with greater confidence.

The RAG Blueprint is now available on Vespa Cloud, ready to support your next-generation AI applications.

Explore More

Retrieval Augmented Generation

Discover Vespa’s RAG features for hybrid search, combining text-vector, token-vector, and machine-learned ranking, all designed to scale effortlessly and handle any query volume or data size without compromising on quality.

The RAG Blueprint Blog

Read more about The RAG Blueprint from the Vespa engineering blog.

Vespa RAG Manager’s Guide

This management guide outlines how businesses can deploy generative AI effectively, focusing on Retrieval-Augmented Generation (RAG) to integrate private data for tailored, context-rich responses.

RAG Technical Guide

Learn how Vespa RAG allows language models to access up-to-date or specific domain knowledge beyond their training, improving performance in tasks such as question answering and dynamic content creation.