Vespa use cases

When you can compute over large data sets online, a new world of possibilities for new applications and features opens up. This page describes some of the best-known problems people use Vespa to solve.

Vespa is a full-featured search engine with full support for traditional information retrieval as well as modern vector embedding based techniques. And since Vespa allows these approaches to be combined efficiently in the same query and ranking model, you can create hybrid solutions that combine the best of both. Search applications usually make use of these features of Vespa:

No matter which features you combine, you'll benefit from Vespa's linear scalability, automatic data management and online elasticity, and support for sustained, high-volume, fully realtime writes, which lets you both add new documents and cheaply update fields of existing documents while serving.
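The hybrid combination described above can be sketched as a single Vespa query that ORs lexical and vector retrieval. This is a minimal illustration only; the field name `embedding`, the query tensor name `q`, and the `hybrid` rank profile are assumptions, not names from this page.

```python
# Build a Vespa query request body combining full-text and vector retrieval.
# Field and profile names here are assumed for illustration.

def hybrid_query(user_text, query_vector, hits=10):
    """userQuery() matches the text in the "query" parameter against indexed
    fields; nearestNeighbor() does (approximate) nearest-neighbor search over
    the assumed "embedding" field. The two are OR-ed in one query."""
    return {
        "yql": (
            "select * from sources * where userQuery() or "
            "({targetHits:100}nearestNeighbor(embedding, q))"
        ),
        "query": user_text,
        "input.query(q)": query_vector,  # query tensor consumed by nearestNeighbor
        "ranking": "hybrid",             # assumed rank profile combining both signals
        "hits": hits,
    }

body = hybrid_query("how to brew coffee", [0.1, 0.2, 0.3])
```

Both retrieval strategies then feed the same ranking model, which is what makes the hybrid approach efficient in Vespa.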

Search sample applications

Grab one of these open source sample applications to create your own Vespa application:

Recommendation and personalization

Recommendation, content personalization, and ad targeting converge in implementation: For a given user or context, evaluate machine-learned content recommender models to find the best items and show them to the user. Usually it is also necessary to filter out unwanted items based on metadata, such as language or remaining ad budget. In addition, it is often necessary to group the recommended items to make browsing easier, or to filter out those that are too similar.

Vespa makes it possible to do the whole process online, at the moment when the recommendation is needed, which ensures recommendations are up-to-date and makes it affordable to make them specifically for each user or situation. These features of Vespa are usually leveraged:
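Evaluating a recommender model online per user typically happens in a rank profile in the document schema. The fragment below is a hedged sketch; the field name `item_embedding`, the input name `user_embedding`, and the dimension are assumptions. A second ranking phase could re-rank the top candidates with an ONNX or GBDT model evaluated on the content nodes.

```
# Sketch of a rank profile for recommendation (names are assumed).
rank-profile recommendation inherits default {
    inputs {
        query(user_embedding) tensor<float>(x[128])
    }
    first-phase {
        # Score items by similarity between the user and item embeddings
        expression: sum(query(user_embedding) * attribute(item_embedding))
    }
}
```

The query then passes the user's embedding as `input.query(user_embedding)`, so scoring happens where the data lives rather than shipping candidates to a separate model server.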

Recommendation sample applications

These example open source Vespa recommendation applications can be used as a starting point:

Generative AI (RAG)

Large language models lack information that is recent, detailed, or private to a user or organization. That's why most generative AI systems combine the LLM with a component that surfaces the most useful information for the task at hand - colloquially called RAG. The relevance, precision, and extent of data you can surface determines the quality you can get from your overall solution.

For real-world use cases, it is rarely sufficient to just look up text snippets by proximity of a vector - you also need full-text search with rich text features, multiple vectors per data item, metadata, and on-node machine-learned model evaluation to combine all this information into a final selection. Furthermore, you need flexibility to trade off quality and cost in a way that suits your use case.

Vespa is the only platform that lets you easily implement any state-of-the-art method for surfacing the best information from data sets of any size, using any combination of text, tensor and structured data, at latencies below 100 ms, for any amount of traffic. In addition, Vespa uniquely lets you reduce cost in a myriad of practical ways, such as using binary and shortened vectors for retrieval, or extremely cost-effective streaming search for personal data.

By building on Vespa, you can be confident that you will always be able to create an LLM RAG system with the best possible quality for the resources you spend, and that you will be able to scale it to any amount of data and traffic.

Generative AI (RAG) sample applications

These example open source Vespa applications can be used as a starting point for RAG:

  • A retrieval-augmented-generation application with complete application source. This application showcases using external LLM API providers as well as local LLM inference within Vespa.
  • The MS Marco ranking app is a great starting point for using the best possible techniques for improving retrieval. It demonstrates multiple neural search techniques, much beyond simple cosine similarity. Complete application source.

Semi-structured navigation

Applications that use semi-structured data - that is, a combination of database-like data and plain text - usually benefit from letting users navigate the data using both structured navigation and text search. The most common example of this is e-commerce (shopping) sites.

This makes use of traditional text search in conjunction with sorting, grouping and filtering by metadata. As any query can be grouped and filtered, this allows users to switch between drilling down by metadata and searching by text seamlessly without losing context. Commonly some of the metadata is supplied by parent documents (such as the merchant of a product). Some e-commerce applications also make use of embeddings to provide search, navigation or recommendation in an embedding space.
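The combination of text search, filtering, and grouping described above can be expressed in a single query. The sketch below is illustrative only; the field names `price` and `brand` are assumptions.

```python
# Build an e-commerce style Vespa query: text match, a metadata filter,
# and grouping to produce facet counts per brand. Field names are assumed.

def shopping_query(text, max_price, hits=20):
    yql = (
        "select * from sources * where userQuery() and price < %d "
        "| all(group(brand) each(output(count())))" % max_price
    )
    return {
        "yql": yql,       # the grouping clause computes per-brand counts
        "query": text,
        "hits": hits,
    }

request = shopping_query("running shoes", 100)
```

Because grouping applies to whatever the query matches, the same request serves both "search by text" and "drill down by metadata" without losing context.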

For more details, see the shopping use case in the Vespa documentation and the accompanying application source with frontend.

Personal search

Personal search (not to be confused with personalization) provides search in personal collections of data, where there is never a need to search across many collections in a single query. In such applications it is not cost-effective to do the work of maintaining global reverse indexes, and the best solution is to search by streaming through the raw data at query time. Latency can still be bounded for arbitrarily sized collections, as each collection is distributed over a number of nodes, which bounds the size of a given user's collection on any single node.

Vespa provides a streaming mode where the usual functionality of the engine is backed by streaming through the raw data stored in Vespa - no indexes necessary. This allows powerful personal search applications to be implemented easily and cheaply at any scale. Read more in our blog post on personal search. Vespa streaming mode also fully supports vector search, offering an extremely cost-effective vector search implementation for personal data.
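A streaming-mode query looks like any other Vespa query, but restricts the scan to one user's group of documents via the `streaming.groupname` request parameter. The sketch below assumes documents were fed with the user id as the group; the names are illustrative.

```python
# Build a personal-search request for Vespa streaming mode.
# In streaming mode the engine scans the raw documents of the selected
# group instead of using indexes, so the query itself needs no changes.

def personal_search(user_id, text, hits=10):
    return {
        "yql": "select * from sources * where userQuery()",
        "query": text,
        "streaming.groupname": user_id,  # restrict the scan to this user's documents
        "hits": hits,
    }

req = personal_search("user-123", "trip to rome")
```

Because each user's collection is small and stored together, scanning it at query time is cheap while keeping full query and ranking functionality.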

Typeahead suggestions

Many applications that take textual input provide typeahead suggestions, where a number of suggested completions are presented while the user is typing. This usually involves searching and ranking matching candidate completions within low single-digit millisecond latency - a suitable job for Vespa. The Vespa features usually involved are:

  • Text search with prefix or gram matching, or prefix, substring or regexp matching in a (structured) attribute containing an array of strings.
  • All of the above, combined with semantic search using Vespa's embedding inference for accelerated low-latency encoding of text, avoiding sending large vector payloads over remote networks.
  • Realtime updates of features signalling how often a suggestion is selected.
  • A ranking expression to rank the candidate completions using match and metadata features.
  • A personal search cluster if some suggestions depend on personal data, and if so federated with a cluster of shared suggestions.
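The first bullet above can be sketched as a query using YQL's prefix-match annotation. The document type `suggestions` and field `suggestion` are assumptions for illustration.

```python
# Build a typeahead query using prefix matching on an assumed "suggestion"
# field. The {prefix: true} annotation makes the term match all values
# starting with the typed text.

def suggest(typed_text, hits=5):
    yql = (
        'select * from suggestions where suggestion contains '
        '({prefix: true}"%s")' % typed_text
    )
    return {"yql": yql, "hits": hits}

suggestion_request = suggest("tra")
```

A rank profile would then order the candidates using match features and realtime-updated popularity signals, as described in the bullets above.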