Use cases:

What can you do when you can compute over your data with millisecond latency?

These are some of the most common use cases people solve with Vespa.

  • Generative AI (RAG)

    GenAI applications usually need access to proprietary, specific and recent information, and will only be as good as the relevance of the data we surface for them.

    Achieving great relevance requires progressing from plain vector similarity or bm25 to hybrid text-vector search, token-vector approaches, or machine learned ranking using positional text ranking features.

    Vespa is the only platform that lets you leverage all such approaches, and it does so while letting you scale to any query volume or data size without compromising quality.
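
As a rough illustration of what hybrid text-vector search means in practice, the sketch below merges a lexical ranking and a vector ranking with reciprocal rank fusion. This is one common fusion technique chosen for illustration, not a description of how any particular Vespa application combines signals. The function name, the document ids and the constant `k=60` are all assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents that rank well in both lists rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists: one from lexical (bm25-style) retrieval,
# one from nearest-neighbor vector retrieval.
lexical_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([lexical_hits, vector_hits])
print(fused)
```

Documents found by both retrievers (`doc_a`, `doc_c`) end up ahead of documents found by only one, which is the basic benefit hybrid retrieval is after.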

    • RAG features

      • LLM integration: Invoke LLMs as part of processing requests. Respond with a mixture of results (immediately) and generated text (incrementally). LLMs can be supplied by the application, by Vespa Cloud, or invoked remotely over industry-standard APIs.
      • Lexical search without limitations: Any number of fields, any amount of text, tokens etc.
      • Vector search without limitations: Multiple fields, collections of vectors per field, any size, any value type from 64 bits down to 1 bit.
      • Hybrid search in any number of text, vector and metadata fields, combined with AND, OR and so on.
      • Text embedding inside your Vespa application, on GPU or CPU, or submit vectors in your documents and requests.
      • Any ranking function/ML model evaluated on all matches to a query, locally on each content partition.
      • Second-phase reranking with any ranking function/ML model, evaluated on the local top hits on each content partition.
      • Global-phase reranking with any ranking function/ML model, evaluated on the global top hits.
      • Any number of ranking functions selectable at query time.
      • A large number of rank features to be used as input to ranking functions, including high-level text match features such as bm25, detailed features using the position of each matching word in the document, geo and time features, features over arrays of text, any in-memory document field, and any value sent with the query.
      • Store vectors used for ranking on disk to achieve state-of-the-art quality at lower cost.
      • Dynamic snippet generation and word match highlighting.
      • Linguistic processing of documents and queries supporting a large number of languages, including language detection, stemming and CJK segmentation.
      • Flexible matching modes – lexical, exact, regex, n-gram and fuzzy.
      • WAND-based lexical retrieval to get the semantics of OR with the performance of AND.
      • End user-level query language, embeddable in structured queries.
      • Support for application components intercepting queries, results and writes to process, orchestrate, amend etc.
      • Federation over multiple internal document types and sources, as well as external sources of results.
      • Rich grouping (faceting) and aggregation over all matches to a query.
      • Scale dynamically to any data or traffic volume while online simply by changing allocated node resources.
    • RAG sample applications
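
The phased ranking described in the feature list (a function evaluated on all matches per content partition, a second phase on the local top hits, and a global phase on the merged top hits) can be pictured with the following plain-Python sketch. It is a simplified model of the idea only: the function names, the `rerank_count` parameters and the toy documents are assumptions, and hits that fall outside a rerank window are simply dropped here.

```python
import heapq


def rank_partition(docs, first_phase, second_phase, rerank_count):
    """Score every match on one partition, then rerank the local top hits."""
    scored = [(first_phase(doc), doc) for doc in docs]
    top = heapq.nlargest(rerank_count, scored, key=lambda pair: pair[0])
    return [(second_phase(doc), doc) for _, doc in top]


def rank_query(partitions, first_phase, second_phase, global_phase,
               rerank_count, global_rerank_count):
    """Run local phases per partition, merge, then rerank the global top hits."""
    local_hits = []
    for docs in partitions:
        local_hits.extend(
            rank_partition(docs, first_phase, second_phase, rerank_count))
    merged = heapq.nlargest(global_rerank_count, local_hits,
                            key=lambda pair: pair[0])
    final = [(global_phase(doc), doc) for _, doc in merged]
    return sorted(final, key=lambda pair: pair[0], reverse=True)


# Toy data: each document carries precomputed scores standing in for
# a cheap function, a costlier model, and the most expensive model.
partitions = [
    [{"id": "a", "cheap": 0.9, "costly": 0.2, "final": 0.5},
     {"id": "b", "cheap": 0.8, "costly": 0.7, "final": 0.9},
     {"id": "c", "cheap": 0.1, "costly": 0.9, "final": 0.9}],
    [{"id": "d", "cheap": 0.7, "costly": 0.6, "final": 0.1},
     {"id": "e", "cheap": 0.2, "costly": 0.95, "final": 0.3}],
]

ranked = rank_query(
    partitions,
    first_phase=lambda doc: doc["cheap"],
    second_phase=lambda doc: doc["costly"],
    global_phase=lambda doc: doc["final"],
    rerank_count=2,
    global_rerank_count=2,
)
ranked_ids = [doc["id"] for _, doc in ranked]
print(ranked_ids)
```

The point of the design is cost control: the cheapest function sees every match, while each later phase sees a progressively smaller set of candidates.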

  • Recommendation and personalization

    Recommendation, personalization and ad targeting systems combine retrieval with machine-learned model evaluation to select recommended data.

    Vespa’s fast execution of complex filters, combined with machine-learned model evaluation distributed on the nodes storing the content, enables blazing fast recommendation applications at any scale.

    Field updates at a rate of up to 100k writes per second per node make it possible to let behavior information and other signals instantly influence the results.

    • Recommendation features

      • Any complex set of filters defining the eligible content can be expressed in a query.
      • Any in-memory document field can be used as a scoring signal, as well as geo and time features, and any value sent with the query.
      • Document field signals can be updated in real time at a rate of about 100k writes per second per node.
      • Parent-child relationships can be used to join in stored signals which are not per document without impacting latency (such as ads belonging to campaigns).
      • Vector similarity can be used for retrieval and/or ranking.
      • Any scoring function/ML model over tensors can be evaluated on all matches to a query, locally on each content partition.
      • Second-phase reranking with any ranking function/ML model, evaluated on the local top candidates on each content partition.
      • Global-phase reranking with any ranking function/ML model, evaluated on the global top candidates.
      • Any number of ranking functions selectable at query time.
      • Support for application components intercepting queries, results and writes to process, orchestrate, amend etc.
      • Federation over multiple internal document types and sources, as well as external sources of results.
      • Predicate fields let documents specify conditions on which user properties should match, for detailed targeting.
    • Recommendation sample applications
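
To make the combination of filtering, parent-child signals and real-time updates more concrete, here is a small in-memory sketch. It only mimics the behavior conceptually: the `campaigns` and `ads` structures, the `budget_left` signal and the scoring formula are hypothetical and not taken from any Vespa API.

```python
def dot(a, b):
    """Dot product of two equal-length dense vectors."""
    return sum(x * y for x, y in zip(a, b))


def recommend(user_vector, allowed_categories, ads, campaigns, limit):
    """Filter eligible ads, join in the parent campaign signal, and score."""
    candidates = []
    for ad in ads:
        campaign = campaigns[ad["campaign"]]  # parent lookup (the "join")
        if ad["category"] not in allowed_categories:
            continue
        if campaign["budget_left"] <= 0:
            continue
        score = dot(user_vector, ad["embedding"]) * campaign["budget_left"]
        candidates.append((score, ad["id"]))
    candidates.sort(reverse=True)
    return [ad_id for _, ad_id in candidates[:limit]]


# Hypothetical parent documents (campaigns) and child documents (ads).
campaigns = {"c1": {"budget_left": 1.0}, "c2": {"budget_left": 0.5}}
ads = [
    {"id": "ad1", "campaign": "c1", "category": "shoes", "embedding": [1.0, 0.0]},
    {"id": "ad2", "campaign": "c2", "category": "shoes", "embedding": [0.9, 0.1]},
    {"id": "ad3", "campaign": "c1", "category": "hats", "embedding": [1.0, 0.0]},
]

before = recommend([1.0, 0.0], {"shoes"}, ads, campaigns, limit=10)

# A single update to the parent signal affects every child on the next query.
campaigns["c1"]["budget_left"] = 0.0
after = recommend([1.0, 0.0], {"shoes"}, ads, campaigns, limit=10)

print(before, after)
```

Because the signal lives on the campaign rather than on each ad, one write is enough to change the eligibility of all ads in that campaign.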

  • Semi-structured navigation

    Applications like e-commerce use a combination of structured data and text+images, and need to combine search and recommendation with structured navigation where users select the subset of data they are interested in from properties of the data, such as sizes, colors, merchants etc.

    Combining these interaction modes seamlessly requires them to be served from a single system, and Vespa provides all the features needed to do this: Text and vector search with state-of-the-art relevance, recommendation at blazing speed, and distributed grouping and aggregation for structured navigation at any scale.

    • Semi-structured navigation features

      • Any combination of state-of-the-art text and vector search.
      • Relevance and scoring inference distributed to where the content is stored.
      • Rich grouping (faceting) and aggregation over all matches to a query.
      • Support for incorporating text, structured data, and embeddings of text and images.
    • Semi-structured navigation sample apps
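
Grouping (faceting) over all matches to a query can be pictured as counting the values of each navigable property among the matching items. The sketch below does this in memory for a handful of made-up products. The field names and the `facet_counts` helper are assumptions for illustration only.

```python
from collections import Counter


def facet_counts(matches, fields):
    """Count how many matching documents carry each value of each field."""
    return {
        field: Counter(doc[field] for doc in matches if field in doc)
        for field in fields
    }


# Hypothetical product catalog.
products = [
    {"id": 1, "color": "red", "size": "M", "merchant": "acme"},
    {"id": 2, "color": "red", "size": "L", "merchant": "acme"},
    {"id": 3, "color": "blue", "size": "M", "merchant": "zenith"},
]

# The user has narrowed the selection to size M; facets are computed
# over every match, not just the hits shown on the current page.
matches = [p for p in products if p["size"] == "M"]
facets = facet_counts(matches, ["color", "merchant"])
print(facets)
```

The resulting counts are what a navigation UI would display next to each color or merchant to show how many items remain if that value is selected.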

  • Your novel use case

    What can you do when you’re able to make any inference over any set of data with millisecond latency?

    Our customers are using Vespa to do things nobody has imagined before, and some of them will upend industries. Maybe you will be one of them?

    • Features for novel use cases

      Some of the features of Vespa that help you imagine new uses:

      • The content returned from a Vespa instance does not need to be sorted by the score you are inferring. The sort key could, for example, be the difference between what you are inferring and a document value.
      • Any type of inference over data items, and any number of them, can be made as part of query execution.
      • Tensors with sparse dimensions make it possible to store, pass and compute over arbitrary structured data.
      • All writes to Vespa are fully real-time such that the next query will observe any data changes made prior to it.
      • Any kind of data can be stored and selectively surfaced with documents.
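
As one way to picture ordering results by the difference between an inferred value and a stored document value, the sketch below infers a price from sparse features and surfaces the documents whose listed price deviates most from it. Sparse tensors are modeled as plain dictionaries. The feature names, weights and the `listed_price` field are all hypothetical.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {label: value} mappings."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller mapping
    return sum(value * b[label] for label, value in a.items() if label in b)


def rank_by_gap(docs, query_weights):
    """Order documents by how far the stored value is from the inferred one."""
    def gap(doc):
        inferred = sparse_dot(doc["features"], query_weights)
        return abs(inferred - doc["listed_price"])
    return sorted(docs, key=gap, reverse=True)


# Hypothetical listings with sparse feature tensors.
docs = [
    {"id": "doc1", "features": {"sqm": 50, "rooms": 2}, "listed_price": 100},
    {"id": "doc2", "features": {"sqm": 80}, "listed_price": 155},
    {"id": "doc3", "features": {"rooms": 3, "garden": 1}, "listed_price": 90},
]
query_weights = {"sqm": 2.0, "rooms": 10.0}  # weights sent with the query

outliers = [doc["id"] for doc in rank_by_gap(docs, query_weights)]
print(outliers)
```

Here the inferred value is only an intermediate result. What gets returned is ordered by the gap, so the top results are the listings that look most mispriced under these assumed weights.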