Vespa use cases

When you can compute over large data sets online, a new world of possibilities for new applications and features opens up. This page describes some of the best-known problems people use Vespa to solve.

Vespa is a full-featured search engine with full support for traditional information retrieval as well as modern vector embedding-based techniques. Since Vespa allows these approaches to be combined efficiently in the same query and ranking model, you can create hybrid solutions that combine the best of both. Search applications usually make use of a combination of these Vespa features.
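
To make this concrete, here is a minimal sketch of a hybrid query using the pyvespa client, combining text matching with approximate nearest neighbor search over an embedding field in a single request. The endpoint, schema, field names ("title", "embedding"), rank profile name and tensor size are all assumptions for illustration, not part of any particular application.

    # Sketch of a hybrid text + vector query with pyvespa.
    # Assumes an indexed text field "title", a dense tensor field
    # "embedding", and a rank profile "hybrid" combining both signals.
    from vespa.application import Vespa

    app = Vespa(url="http://localhost", port=8080)  # assumed endpoint

    query_embedding = [0.1] * 384  # stand-in for a real query embedding

    response = app.query(body={
        # Match by text terms or by nearest neighbors in embedding space
        "yql": "select * from sources * where userQuery() or "
               "({targetHits: 100}nearestNeighbor(embedding, q))",
        "query": "running shoes",
        "input.query(q)": query_embedding,
        "ranking": "hybrid",
        "hits": 10,
    })
    for hit in response.hits:
        print(hit["relevance"], hit["fields"].get("title"))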

No matter which features you combine, you'll benefit from Vespa's linear scalability, automatic data management and online elasticity, and support for sustained high-volume, fully realtime writes, which lets you both add new documents and cheaply update fields of existing documents while serving.
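
For example, a partial update can change a single field, such as an inventory count or a popularity signal, without re-feeding the document. A minimal pyvespa sketch, where the schema and field names are made up for illustration:

    # Sketch: feed a document, then cheaply update one field while serving.
    from vespa.application import Vespa

    app = Vespa(url="http://localhost", port=8080)  # assumed endpoint

    # Feed a new document into an assumed "product" schema
    app.feed_data_point(
        schema="product",
        data_id="sku-123",
        fields={"title": "Trail running shoe", "in_stock": 14},
    )

    # Later: update a single field in place, no re-feed of the document
    app.update_data(
        schema="product",
        data_id="sku-123",
        fields={"in_stock": 13},
    )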

Search sample applications

Grab one of the open source sample applications to create your own Vespa application.

Recommendation and personalization

Recommendation, content personalization and ad targeting are all the same thing when it comes to implementation: for a given user or context, evaluate machine-learned content recommender models to find the best items and show them to the user. Usually it is also necessary to filter out unwanted items based on metadata, such as the language used or the remaining ad budget. In addition, it is often necessary to group the recommended items to make browsing easier, or to filter out those that are too similar.
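
As a sketch of the filtering and grouping parts, both can be expressed directly in the query; the schema, field names and rank profile here are hypothetical:

    # Sketch: retrieve recommendation candidates, filtering on metadata
    # and grouping the results for easier browsing.
    from vespa.application import Vespa

    app = Vespa(url="http://localhost", port=8080)  # assumed endpoint

    response = app.query(body={
        # Keep only items in the user's language with remaining ad budget,
        # and return per-category groups alongside the ranked hits.
        "yql": 'select * from sources * where language contains "en" '
               "and budget_remaining > 0 "
               "| all(group(category) max(5) each(output(count())))",
        "ranking": "recommendation",  # assumed machine-learned rank profile
        "hits": 20,
    })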

Vespa makes it possible to do the whole process online, at the moment the recommendation is needed, which ensures recommendations are up to date and makes it affordable to compute them specifically for each user or situation. Several of Vespa's features are usually leveraged to achieve this.
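
A common pattern for the online part is to send the user's embedding (or other context features) with the query, so the model is evaluated against each candidate at request time. A sketch under the assumption of a rank profile named "recommendation" that declares a query tensor input:

    # Sketch: online, per-user evaluation of a recommender model.
    # Assumes a rank profile "recommendation" with an input tensor
    # query(user) that it scores items against.
    from vespa.application import Vespa

    app = Vespa(url="http://localhost", port=8080)  # assumed endpoint

    user_embedding = [0.05] * 128  # looked up or computed per request

    response = app.query(body={
        "yql": "select * from sources * where true",  # consider all items
        "input.query(user)": user_embedding,
        "ranking": "recommendation",
        "hits": 10,
    })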

Recommendation sample applications

The example open source Vespa recommendation applications can be used as a starting point.

Conversational AI

Large language models compress the information in large amounts of text into a handful of billions of parameters. When done well, this gives them a certain amount of intelligence and true understanding, but they still lack crucial ingredients to be truly useful in many settings: they need to be able to access specific information about the topic at hand, form memories and temporarily store information as they work, carry out long reasoning chains using this information, and verify their results.

We at Vespa are not going to create these more complete and capable agents, but we know that they will be built, and that they will need the capabilities Vespa provides as an integrated package: machine-learned model inference using ONNX, storing vectors and text and immediately retrieving that data in queries, easily writing components that carry out chains of inference, store and search steps, and the ability to run all of this at scale with high availability.
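
As a small illustration of the store-and-retrieve part, here is a sketch in which a piece of agent "memory" is written together with its embedding and is immediately searchable; the schema "memory", its fields and the rank profile are all hypothetical:

    # Sketch: write a text + vector memory and retrieve it right away.
    from vespa.application import Vespa

    app = Vespa(url="http://localhost", port=8080)  # assumed endpoint

    # Store a memory with its embedding (assumed "memory" schema)
    app.feed_data_point(
        schema="memory",
        data_id="conv-42-step-7",
        fields={
            "text": "The user prefers vegetarian recipes.",
            "embedding": [0.02] * 384,
        },
    )

    # Immediately retrieve the most relevant memories for the next step
    response = app.query(body={
        "yql": "select text from memory where "
               "({targetHits: 5}nearestNeighbor(embedding, q))",
        "input.query(q)": [0.02] * 384,  # embedding of the current question
        "ranking": "semantic",  # assumed rank profile using closeness
        "hits": 5,
    })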

Building such solutions on Vespa means you can focus on the behavior of the system rather than infrastructure and integration, and ensures it will be suitable for running in production, solving real problems.

Semi-structured navigation

Applications that use semi-structured data - that is, a combination of database-like data and plain text - usually benefit from letting users navigate the data using both structured navigation and text search. The most common example of this is e-commerce, or shopping sites.

This makes use of traditional text search in conjunction with sorting, grouping and filtering by metadata. As any query can be grouped and filtered, users can switch seamlessly between drilling down by metadata and searching by text, without losing context. Commonly, some of the metadata is supplied by parent documents (such as the merchant of a product). Some e-commerce applications also make use of embeddings to provide search, navigation or recommendation in an embedding space.
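
A sketch of such a query: a single request searches by text, applies the filters the user has selected so far, and returns grouped counts to drive the next level of drill-down. Schema and field names are illustrative only.

    # Sketch: combined text search, metadata filter and faceted grouping.
    from vespa.application import Vespa

    app = Vespa(url="http://localhost", port=8080)  # assumed endpoint

    response = app.query(body={
        "yql": "select * from sources * where userQuery() "
               'and brand contains "acme" '
               "| all(group(category) each(output(count())))",
        "query": "wireless headphones",
        "hits": 10,
    })
    # The group counts can be rendered as facets next to the text results,
    # so the user keeps their context on every drill-down click.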

For more details, see the shopping use case in the Vespa documentation and the accompanying application source with frontend.

Personal search

Personal search (not to be confused with personalization) means providing search in personal collections of data, where there is never a need to search across many collections in a single query. In such applications it is not cost-effective to do the work of maintaining global reverse indexes, and the best solution is to search by streaming through the raw data at query time. Latency can still be bounded for arbitrarily sized collections, as each collection is distributed over a number of nodes, which bounds the size of a given user's collection on any one node.

Vespa provides a streaming mode, where the usual functionality of the engine is backed by streaming through the raw data stored in Vespa, with no indexes necessary. This allows powerful personal search applications to be implemented easily and cheaply at any scale. Read more in our blog post on personal search.
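
Since streaming search only visits one user's data, queries select the collection with the streaming.groupname parameter. A pyvespa sketch, assuming documents were fed to a streaming-mode schema with the user's id as the group name:

    # Sketch: querying a single user's collection in streaming mode.
    # Assumes a schema with "mode: streaming" and documents fed with group
    # ids, e.g. document ids like id:mail:mail:g=user-123:doc-1.
    from vespa.application import Vespa

    app = Vespa(url="http://localhost", port=8080)  # assumed endpoint

    response = app.query(body={
        "yql": 'select * from sources * where subject contains "invoice"',
        "streaming.groupname": "user-123",  # only this user's data is searched
        "hits": 10,
    })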

Typeahead suggestions

Many applications that take textual input offer typeahead suggestions, where a number of suggested completions are presented while the user is typing. This usually involves searching and ranking matching candidate completions at very low latency - a suitable job for Vespa. The Vespa features usually involved in this are (a query sketch follows the list):

  • Text search with prefix match or gram matching, or
  • prefix, substring or regexp search in a (structured) attribute containing an array of strings.
  • Realtime updates of features signalling how often a suggestion is selected.
  • A ranking expression to rank the candidate completions using match and metadata features.
  • A personal search cluster if some suggestions depend on personal data, and if so federated with a cluster of shared suggestions.
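
As referenced above, here is a sketch of the query side: a prefix-matched lookup in an assumed "suggestion" schema, ranked by an assumed "popularity" rank profile that could combine match features with selection counts kept fresh through realtime updates.

    # Sketch: prefix-matched typeahead lookup with popularity ranking.
    from vespa.application import Vespa

    app = Vespa(url="http://localhost", port=8080)  # assumed endpoint

    def suggest(prefix: str, n: int = 8) -> list[str]:
        # {prefix: true} makes the term match as a prefix against the
        # assumed "text" attribute field.
        response = app.query(body={
            "yql": "select text from suggestion where "
                   f'text contains ({{prefix: true}}"{prefix}")',
            "ranking": "popularity",
            "hits": n,
        })
        return [hit["fields"]["text"] for hit in response.hits]

    print(suggest("ber"))  # e.g. ["berlin", "bergen", ...]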

Question answering

Question answering provides direct answers to users' questions. This is needed in chatbots, virtual assistants and the like, and is also becoming an expected feature of high-end search solutions, where a direct answer is provided for queries that appear to be questions.

A high-quality question answerer works as follows: text snippets are represented by vector embeddings, which are indexed for fast matching with approximate nearest neighbor (ANN) search. The best candidates found by ANN matching are evaluated in a transformer-based language model, which outputs the score of the snippet as well as the beginning and end of the text answer within it.
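
A sketch of the retrieval step of such a pipeline: the question is embedded and candidate passages are found with ANN, after which an assumed rank profile (here called "answer") would evaluate the transformer model over the candidates inside Vespa. The "passage" schema, field names and tensor size are assumptions.

    # Sketch: ANN candidate retrieval for question answering.
    # Assumes a "passage" schema with "text" and "embedding" fields, and a
    # rank profile "answer" that runs a transformer (ONNX) model over the
    # top candidates to produce snippet and answer-span scores.
    from vespa.application import Vespa

    app = Vespa(url="http://localhost", port=8080)  # assumed endpoint

    question_embedding = [0.01] * 384  # stand-in for the embedded question

    response = app.query(body={
        "yql": "select text from passage where "
               "({targetHits: 10}nearestNeighbor(embedding, q))",
        "input.query(q)": question_embedding,
        "ranking": "answer",
        "hits": 3,
    })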

By using Vespa, the entire process can be implemented as an application on a single platform and made to execute with a latency of a few tens of milliseconds, scaling to any volume while delivering quality on par with the research state of the art.

See our blog post showing how to replicate the best question-answering performance from the research community as a production-ready Vespa application, and the follow-up post on how we brought the response time down to tens of milliseconds. The complete source for this application is also available.