Performance Benchmark for Large-Scale AI Retrieval

Every Vespa release is validated using comprehensive performance benchmarks that cover retrieval, indexing, ranking, machine-learning inference, and distributed serving. Explore the engineering results behind Vespa's architecture and see how it performs on representative production AI retrieval workloads.

How We Measure Production AI Retrieval

Every Vespa release is validated using an extensive suite of automated performance tests that cover retrieval, indexing, ranking, machine-learning inference, tensor operations, feeding, and distributed serving. These continuous benchmarks help ensure that new features ship without introducing performance regressions.
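As an illustration of what a regression check in a continuous benchmark can look like, the sketch below compares a candidate build's 99th-percentile latency against a baseline and fails if it exceeds a tolerance. It is a generic example, not Vespa's actual test harness: the function names `percentile` and `passes_latency_gate`, the 5% tolerance, and the sample data are all assumptions made for illustration.

```python
def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding issues.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def passes_latency_gate(baseline_ms, candidate_ms, pct=99, tolerance=0.05):
    """Return True if the candidate's latency percentile stays within
    (1 + tolerance) of the baseline's. The 5% default is an assumed value."""
    limit = percentile(baseline_ms, pct) * (1 + tolerance)
    return percentile(candidate_ms, pct) <= limit


# Hypothetical latency samples in milliseconds.
baseline = list(range(1, 101))
slightly_slower = [x * 1.02 for x in baseline]  # 2% slower: within tolerance
much_slower = [x * 1.20 for x in baseline]      # 20% slower: flagged

print(passes_latency_gate(baseline, slightly_slower))
print(passes_latency_gate(baseline, much_slower))
```

Gating on a tail percentile rather than the mean is a common choice for serving workloads, since regressions often show up in the slowest requests first.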

We also publish comparative benchmarks using representative production workloads to demonstrate how Vespa performs against alternative retrieval architectures.

Continuous Testing

  • Vector Retrieval

    • nearest neighbour
    • HNSW
    • multi-vector
    • distance metrics
  • Hybrid Retrieval

    • vector + BM25
    • filtering
    • WAND
    • ranking
  • Real-time Updates

    • feeding
    • partial updates
    • indexing
    • reindexing
  • Machine Learning

    • ONNX
    • BERT
    • Llama
    • embeddings
  • Distributed Scale

    • throughput
    • dispatch
    • clustering

Get the Benchmark

The benchmark results are a direct consequence of Vespa's architecture. By combining retrieval, ranking, and machine-learning inference in a single distributed serving engine, Vespa minimizes unnecessary data movement, delivering higher throughput, lower latency, and predictable performance at production scale.