Performance Benchmark for Large-Scale AI Retrieval

Every Vespa release is validated using comprehensive performance benchmarks that cover retrieval, indexing, ranking, machine-learning inference, and distributed serving. Explore the engineering results behind Vespa's architecture and see how it performs on representative production AI retrieval workloads.

↗ Elastic Benchmark

How We Measure Production AI Retrieval

Every Vespa release is validated using an extensive suite of automated performance tests that cover retrieval, indexing, ranking, machine-learning inference, tensor operations, feeding, and distributed serving. These continuous benchmarks help ensure new features improve performance without introducing regressions.
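As an illustration of what a continuous regression check can look like, the sketch below compares the 99th-percentile latency of a candidate build against a baseline. It is not Vespa's actual test harness: the function names, the choice of p99 as the metric, and the 5% tolerance are all assumptions made for this example.

```python
from statistics import quantiles


def p99(samples_ms):
    """Return the 99th-percentile latency of the samples (milliseconds)."""
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    return quantiles(samples_ms, n=100, method="inclusive")[98]


def is_regression(baseline_ms, candidate_ms, tolerance=0.05):
    """Flag a regression when the candidate p99 exceeds the baseline p99
    by more than the given relative tolerance (hypothetical default: 5%)."""
    return p99(candidate_ms) > p99(baseline_ms) * (1.0 + tolerance)
```

A release gate built on a check like this would run the same workload against both builds and fail when `is_regression` returns `True`. In practice the tolerance would need tuning to the noise level of the test environment.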

We also publish comparative benchmarks using representative production workloads to demonstrate how Vespa performs against alternative retrieval architectures.

Vector Retrieval
- nearest neighbour
- HNSW
- multi-vector
- distance metrics

Hybrid Retrieval
- vector + BM25
- filtering
- WAND
- ranking

Real-time Updates
- feeding
- partial updates
- indexing
- reindexing

Machine Learning
- ONNX
- BERT
- Llama
- embeddings

Distributed Scale
- throughput
- dispatch
- clustering

Get the Benchmark

The benchmark results follow directly from Vespa's architecture. By combining retrieval, ranking, and machine-learning inference in a single distributed serving engine, Vespa minimizes data movement between systems, which yields higher throughput, lower latency, and predictable performance at production scale.

