Use cases:

What can you do when you can compute over your data with millisecond latency?

Some of the most common use cases people solve with Vespa.

Vespa Architecture in 3 Minutes

Use cases

Search

Vespa is the world’s leading open text search engine with support for linguistics processing, snippet generation, advanced ML ranking using features from bm25 to positional text relevance features that enable state-of-the-art text relevance.

In addition, it is the worlds most capable vector database, with support for any number of vectors and tensors, with any value type down to binary, for both indexing and ranking.

By combining these features you can create hybrid search applications with a quality that cannot be achieved with any other technology.
- Search features
  
  Lexical search without limitations: Any amount of fields, text, tokens etc.
  
  Vector search without limitations: Multiple fields, collections of vectors per field, any size, any value type from 64 to 1 bits.
  
  Hybrid search in any number of text, vector and metadata fields, combined with AND, OR and so on.
  
  Text embedding inside your Vespa application, on Gpu or Cpu, or submit vectors in your documents and requests.
  
  Any ranking function/ML model evaluated on all matches to a query, locally on each content partition.
  
  Second-phase reranking with any ranking function/ML model, evaluated on the local top hits on each content partition.
  
  Global-phase reranking with any ranking function/ML model, evaluated on the global top hits.
  
  Any number of ranking functions selectable at query time.
  
  A large number of rank features to be used as input to ranking functions, including high level text match features such as bm25 and detailed features using the position of each matching word in the document, geo, and time features, features over arrays of text, and any in-memory document field, any value sent with the query.
  
  Store vectors used for ranking on disk to achieve state-of-the-art quality at lower cost.
  
  Dynamic snipped generation and word match highlighting.
  
  Linguistics process of documents and queries supporting a large number of languages, including language detection, stemming and CJK segmentation.
  
  Flexible matching modes – lexical, exact, regex, n-gram and fuzzy.
  
  WAND based lexical retrieval to get the semantics of OR with the performance of AND.
  
  End user-level query language, embeddable in structured queries.
  
  Support for application components intercepting queries, results and writes to process, orchestrate, amend etc.
  
  Federation over multiple internal document types and sources, as well as external sources of results.
  
  Rich grouping (faceting) and aggregation over all matches to a query.
  
  Scale dynamically to any data or traffic volume while online simply by changing allocated node resources.
- Search sample applications
  
  Colbert long
  
  Colbert
  
  Commerce product ranking
  
  Incremental (typeahead) search
  
  MS-Marco ranking
  
  Multilingual search
  
  Simple semantic search
  
  Splade
Generative AI (RAG)

GenAI applications usually need access to proprietary, specific and recent information, and will only be as good as the relevance of the data we surface for them.

Achieving great relevance requires progressing from plain vector similarity or bm25 to hybrid text-vector search, token-vector approaches, or machine learned ranking using positional text ranking features.

Vespa is the only platform that lets you leverage all such approaches, and it does so while letting you scale to any query volume or data size without compromising quality.

Learn more about Vespa for RAG.
- RAG features
  
  LLM integration: Invoke LLMs as part of processing requests. Respond with a mixture of results (immediately) and generated text (incrementally). LLMs can be supplied by the application, by Vespa Cloud, or remote invoked over standard industry APIs.
  
  Lexical search without limitations: Any amount of fields, text, tokens etc.
  
  Vector search without limitations: Multiple fields, collections of vectors per field, any size, any value type from 64 to 1 bits.
  
  Hybrid search in any number of text, vector and metadata fields, combined with AND, OR and so on.
  
  Text embedding inside your Vespa application, on Gpu or Cpu, or submit vectors in your documents and requests.
  
  Any ranking function/ML model evaluated on all matches to a query, locally on each content partition.
  
  Second-phase reranking with any ranking function/ML model, evaluated on the local top hits on each content partition.
  
  Global-phase reranking with any ranking function/ML model, evaluated on the global top hits.
  
  Any number of ranking functions selectable at query time.
  
  A large number of rank features to be used as input to ranking functions, including high level text match features such as bm25 and detailed features using the position of each matching word in the document, geo, and time features, features over arrays of text, and any in-memory document field, any value sent with the query.
  
  Store vectors used for ranking on disk to achieve state-of-the-art quality at lower cost.
  
  Dynamic snipped generation and word match highlighting.
  
  Linguistics process of documents and queries supporting a large number of languages, including language detection, stemming and CJK segmentation.
  
  Flexible matching modes – lexical, exact, regex, n-gram and fuzzy.
  
  WAND based lexical retrieval to get the semantics of OR with the performance of AND.
  
  End user-level query language, embeddable in structured queries.
  
  Support for application components intercepting queries, results and writes to process, orchestrate, amend etc.
  
  Federation over multiple internal document types and sources, as well as external sources of results.
  
  Rich grouping (faceting) and aggregation over all matches to a query.
  
  Scale dynamically to any data or traffic volume while online simply by changing allocated node resources.
  
  Supported by The RAG Blueprint, a modular application template for designing, deploying, and testing production-grade RAG systems.
- RAG sample applications
  
  Retrieval-augmented Generation
  
  Simple semantic search
Recommendation and personalization

Recommendation, personalization and ad targeting systems combine retrieval with machine-learned model evaluation to select recommended data.

Vespa’s fast execution of complex filters combined with machine-learned model evaluation distributed on the nodes storing the content, enables blazing fast recommendation applications at any scale.

And field updates at at rate of up to 100k writes per second per node makes it possible to let behavior information and other signals instantly influence the results.
- Recommendation features
  
  Any complex set of filters defining the eligible content can be expressed in a query.
  
  Any in-memory document field can be used as a scoring signal, as well as geo and time features, and any value sent with the query.
  
  Document field signals can be updated in real time at a rate of about 100k writes per second per node.
  
  Parent-child relationships can be used to join in stored signals which are not per document without impacting latency (such as ads belonging to campaigns).
  
  Vector similarity can be used for retrieval and/or ranking.
  
  Any scoring function/ML model over tensors can be evaluated on all matches to a query, locally on each content partition.
  
  Second-phase reranking with any ranking function/ML model, evaluated on the local top candidates on each content partition.
  
  Global-phase reranking with any ranking function/ML model, evaluated on the global top candidates.
  
  Any number of ranking functions selectable at query time.
  
  Support for application components intercepting queries, results and writes to process, orchestrate, amend etc.
  
  Federation over multiple internal document types and sources, as well as external sources of results.
  
  Predicate fields to specify conditions for what user properties should match can be expressed on documents for detailed targeting.
- Recommendation sample applications
  
  Album recommendation
  
  Album recommendation, with Java
Semi-structured navigation

Applications like e-commerce use a combination of structured data and text+images, and need to combine search and recommendation with structured navigation where users select the subset of data they are interested in from properties of the data, such as sizes, colors, merchants etc.

Combining these interaction modes seamlessly require them to be served from a single system, and Vespa provides all the features needed to do this: Text and vector search with state-of-the-art relevance, recommendation at blazing speed, and distributed grouping and aggregation for structured navigation at any scale.
- Semi-structured navigation features
  
  Any combination of state-of-the-art text and vector search.
  
  Relevance and scoring inference distributed to where the content is stored.
  
  Rich grouping (faceting) and aggregation over all matches to a query.
  
  Support for incorporating both text, structured data, and embeddings of text and images.
- Semi-structured navigation sample apps
  
  Commerce product ranking
Personal search

In applications working with personal data, any query will only access a small fraction of the total data, and building indexes would be wasteful – especially with vectors.

Vespa’s streaming mode enables all the features of Vespa directly from the compressed store of raw data, which is dramatically cheaper than using indexing, while also delivering perfectly accurate results.

And since Vespa distributes the data of a each user over multiple nodes as needed, it can deliver low latency also for the occasional very large user.
- Personal search features
  
  Apply all of Vespa’s features directly on stored and compressed data to get text, vector, and metadata search and ranking/inference with unlimited flexibility at a fraction of the cost.
  
  Efficient non-approximate vector search to avoid missing critical personal data.
  
  Vespa automatically distributes users with large amounts of data over multiple nodes to get low latency.
  
  Use two clusters with the same schema to migrate large users to an indexed backend while providing the same features with both implementations.
- Personal search sample apps
  
  Vector streaming search
Your novel use case

What can you do when you’re able to make any inference over any set of data with milliseconds latency?

Our customers are using Vespa to do things nobody before have imagined, and some of them will upend industries. Maybe you will be one of them?
- Features for novel use cases
  
  Some of the features of Vespa that helps imagining new uses:
  
  The content returned from Vespa instance do not need to be sorted by the score you are inferring. It could for example be the difference between what you are inferring and a document value.
  
  And any type of inferences over data items, and any number of them, can be made as part of query execution.
  
  Tensors with sparse dimensions make it possible to store, pass and compute over arbitrary structured data.
  
  All writes to Vespa are fully real-time such that the next query will observe any data changes made prior to it.
  
  Any kind of data can be stored and selectively surfaced with documents.

What can you do when you can compute over your data with millisecond latency?

Search

Search features

Search sample applications

Generative AI (RAG)

RAG features

RAG sample applications

Recommendation and personalization

Recommendation features

Recommendation sample applications

Semi-structured navigation

Semi-structured navigation features

Semi-structured navigation sample apps

Personal search

Personal search features

Personal search sample apps

Your novel use case

Features for novel use cases