Use cases:
What can you do when you can compute over your data with millisecond latency?
Some of the most common use cases people solve with Vespa.
-
Search
Vespa is the world’s leading open-source text search engine, with support for linguistic processing, snippet generation, and advanced ML ranking using everything from bm25 to positional text match features, enabling state-of-the-art text relevance.
In addition, it is the world’s most capable vector database, supporting any number of vectors and tensors with any value type down to binary, for both indexing and ranking.
By combining these features you can create hybrid search applications with a quality that cannot be achieved with any other technology.
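To make this concrete, here is a minimal sketch of a hybrid query against Vespa’s HTTP query API, using Python and the requests library. The endpoint, the document fields (title, embedding), the "hybrid" rank profile and the placeholder query vector are assumptions for illustration, not part of any particular application.
```python
import requests

query_text = "how to brew coffee"
# Placeholder 384-dimensional query embedding; in a real application this would
# come from an embedding model, or from an embedder running inside Vespa.
query_embedding = [0.0] * 384

# Hybrid retrieval: lexical matching (userQuery) OR approximate nearest-neighbor
# search over an assumed 'embedding' field, ranked by an assumed 'hybrid' profile.
response = requests.post(
    "http://localhost:8080/search/",
    json={
        "yql": "select * from sources * where userQuery() or "
               "({targetHits: 100}nearestNeighbor(embedding, q))",
        "query": query_text,
        "ranking": "hybrid",
        "input.query(q)": query_embedding,
        "hits": 10,
    },
)
for hit in response.json()["root"].get("children", []):
    print(hit["relevance"], hit["fields"].get("title"))
```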
-
Search features
- Lexical search without limitations: Any number of fields, any amount of text, any number of tokens, and so on.
- Vector search without limitations: Multiple fields, collections of vectors per field, any vector size, and any value type from 64 bits down to 1 bit.
- Hybrid search in any number of text, vector and metadata fields, combined with AND, OR and so on.
- Text embedding inside your Vespa application, on GPU or CPU, or submit vectors in your documents and requests.
- Any ranking function/ML model evaluated on all matches to a query, locally on each content partition.
- Second-phase reranking with any ranking function/ML model, evaluated on the local top hits on each content partition.
- Global-phase reranking with any ranking function/ML model, evaluated on the global top hits.
- Any number of ranking functions selectable at query time.
- A large number of rank features available as inputs to ranking functions: high-level text match features such as bm25, detailed features using the position of each matching word in the document, geo and time features, features over arrays of text, any in-memory document field, and any value sent with the query (see the sketch after this list).
- Store vectors used for ranking on disk to achieve state-of-the-art quality at lower cost.
- Dynamic snippet generation and word match highlighting.
- Linguistic processing of documents and queries supporting a large number of languages, including language detection, stemming and CJK segmentation.
- Flexible matching modes – lexical, exact, regex, n-gram and fuzzy.
- WAND-based lexical retrieval to get the semantics of OR with the performance of AND.
- End-user query language, embeddable in structured queries.
- Support for application components that intercept queries, results and writes to process, orchestrate, amend them, etc.
- Federation over multiple internal document types and sources, as well as external sources of results.
- Rich grouping (faceting) and aggregation over all matches to a query.
- Scale dynamically to any data or traffic volume while online simply by changing allocated node resources.
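The following sketch shows how the ranking features above surface at query time: different rank profiles can be selected per query, and values sent with the query can feed the ranking expressions. The profile names ("bm25_only", "ml_reranked") and the query(freshness_weight) input are assumptions; the actual first-phase, second-phase and global-phase expressions would live in the application’s schema, which is not shown here.
```python
import requests

def search(profile: str, freshness_weight: float) -> dict:
    """Run the same lexical query ranked by a given (assumed) rank profile."""
    return requests.post(
        "http://localhost:8080/search/",
        json={
            "yql": "select * from sources * where userQuery()",
            "query": "open source search engines",
            # Select one of any number of rank profiles at query time.
            "ranking": profile,
            # A value sent with the query, usable as the rank feature
            # query(freshness_weight) in the profile's ranking expressions.
            "input.query(freshness_weight)": freshness_weight,
            "hits": 10,
        },
    ).json()

baseline = search("bm25_only", 0.0)
reranked = search("ml_reranked", 0.3)
```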
-
Search sample applications
-
-
Generative AI (RAG)
GenAI applications usually need access to proprietary, specific and recent information, and will only be as good as the relevance of the data we surface for them.
Achieving great relevance requires progressing from plain vector similarity or bm25 to hybrid text-vector search, token-vector approaches, or machine-learned ranking using positional text ranking features.
Vespa is the only platform that lets you leverage all such approaches, and it does so while letting you scale to any query volume or data size without compromising quality.
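For illustration, a minimal sketch of the retrieval side of a RAG flow: run a hybrid query, collect the top text chunks, and assemble a prompt. The "chunk" field, the "hybrid" rank profile and the placeholder embedding are assumptions, and the LLM call itself is left out; Vespa can also invoke the LLM server-side as part of the query, as noted in the feature list below.
```python
import requests

def retrieve_context(question: str, query_embedding: list[float]) -> str:
    """Hybrid retrieval of the top chunks; 'chunk' is an assumed text field."""
    result = requests.post(
        "http://localhost:8080/search/",
        json={
            "yql": "select * from sources * where userQuery() or "
                   "({targetHits: 50}nearestNeighbor(embedding, q))",
            "query": question,
            "ranking": "hybrid",
            "input.query(q)": query_embedding,
            "hits": 5,
        },
    ).json()
    chunks = [hit["fields"]["chunk"] for hit in result["root"].get("children", [])]
    return "\n\n".join(chunks)

question = "What is Vespa's streaming mode?"
context = retrieve_context(question, [0.0] * 384)  # placeholder embedding
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# Send 'prompt' to the LLM of your choice, or configure Vespa to invoke the
# LLM as part of query execution and stream the generated text back.
```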
-
RAG features
- LLM integration: Invoke LLMs as part of processing requests. Respond with a mixture of results (immediately) and generated text (incrementally). LLMs can be supplied by the application, by Vespa Cloud, or invoked remotely over standard industry APIs.
- Lexical search without limitations: Any number of fields, any amount of text, any number of tokens, and so on.
- Vector search without limitations: Multiple fields, collections of vectors per field, any vector size, and any value type from 64 bits down to 1 bit.
- Hybrid search in any number of text, vector and metadata fields, combined with AND, OR and so on.
- Text embedding inside your Vespa application, on GPU or CPU, or submit vectors in your documents and requests.
- Any ranking function/ML model evaluated on all matches to a query, locally on each content partition.
- Second-phase reranking with any ranking function/ML model, evaluated on the local top hits on each content partition.
- Global-phase reranking with any ranking function/ML model, evaluated on the global top hits.
- Any number of ranking functions selectable at query time.
- A large number of rank features available as inputs to ranking functions: high-level text match features such as bm25, detailed features using the position of each matching word in the document, geo and time features, features over arrays of text, any in-memory document field, and any value sent with the query.
- Store vectors used for ranking on disk to achieve state-of-the-art quality at lower cost.
- Dynamic snippet generation and word match highlighting.
- Linguistic processing of documents and queries supporting a large number of languages, including language detection, stemming and CJK segmentation.
- Flexible matching modes – lexical, exact, regex, n-gram and fuzzy.
- WAND-based lexical retrieval to get the semantics of OR with the performance of AND.
- End-user query language, embeddable in structured queries.
- Support for application components that intercept queries, results and writes to process, orchestrate, amend them, etc.
- Federation over multiple internal document types and sources, as well as external sources of results.
- Rich grouping (faceting) and aggregation over all matches to a query.
- Scale dynamically to any data or traffic volume while online simply by changing allocated node resources.
-
RAG sample applications
-
-
Recommendation and personalization
Recommendation, personalization and ad targeting systems combine retrieval with machine-learned model evaluation to select recommended data.
Vespa’s fast execution of complex filters, combined with machine-learned model evaluation distributed across the nodes storing the content, enables blazing-fast recommendation applications at any scale.
And field updates at a rate of up to 100k writes per second per node make it possible to let behavior information and other signals instantly influence the results.
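As an example of such real-time signal updates, here is a hedged sketch of a partial update through Vespa’s /document/v1 API. The namespace ("shopping"), document type ("item") and field ("click_count") are made up for illustration.
```python
import requests

# Increment a behavior counter on an item document; the update is visible to
# the next query that uses this field as a scoring signal.
requests.put(
    "http://localhost:8080/document/v1/shopping/item/docid/item-123",
    json={"fields": {"click_count": {"increment": 1}}},
)
```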
-
Recommendation features
- Any complex set of filters defining the eligible content can be expressed in a query (see the sketch after this list).
- Any in-memory document field can be used as a scoring signal, as well as geo and time features, and any value sent with the query.
- Document field signals can be updated in real time at a rate of about 100k writes per second per node.
- Parent-child relationships can be used to join in stored signals that are not per document (such as ads belonging to campaigns) without impacting latency.
- Vector similarity can be used for retrieval and/or ranking.
- Any scoring function/ML model over tensors can be evaluated on all matches to a query, locally on each content partition.
- Second-phase reranking with any ranking function/ML model, evaluated on the local top candidates on each content partition.
- Global-phase reranking with any ranking function/ML model, evaluated on the global top candidates.
- Any number of ranking functions selectable at query time.
- Support for application components that intercept queries, results and writes to process, orchestrate, amend them, etc.
- Federation over multiple internal document types and sources, as well as external sources of results.
- Predicate fields on documents can express conditions on which user properties should match, enabling detailed targeting.
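The sketch below shows the query side of such a setup: filters expressed in YQL select the eligible items, while a user-profile tensor sent with the query is scored against item data by an assumed "recommendation" rank profile (for example a dot product or an ML model). All field, profile and dimension names are illustrative.
```python
import requests

response = requests.post(
    "http://localhost:8080/search/",
    json={
        # Arbitrarily complex filters defining the eligible content.
        "yql": "select * from sources * where in_stock = true and "
               "price < 100 and !(brand contains 'excluded_brand')",
        "ranking": "recommendation",
        # A sparse user-profile tensor sent with the query, combined with
        # per-item tensors by the rank profile (not shown).
        "input.query(user_profile)":
            "{{category:sports}:0.8,{category:outdoor}:0.6,{category:electronics}:0.1}",
        "hits": 20,
    },
)
```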
-
Recommendation sample applications
-
-
Personal search
In applications working with personal data, any query will only access a small fraction of the total data, and building indexes would be wasteful – especially with vectors.
Vespa’s streaming mode enables all the features of Vespa directly from the compressed store of raw data, which is dramatically cheaper than using indexing, while also delivering perfectly accurate results.
And since Vespa distributes each user’s data over multiple nodes as needed, it delivers low latency even for the occasional very large user.
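To make this concrete, here is a hedged sketch of writing to and querying a streaming-mode document type: each document id carries a group (here the user id), and each query is restricted to that group. The namespace ("mail"), document type ("message") and fields are made up for illustration.
```python
import requests

user = "user-4711"

# Write: the group segment of the document id places the document with this user.
requests.post(
    f"http://localhost:8080/document/v1/mail/message/group/{user}/msg-1",
    json={"fields": {"subject": "Receipt", "body": "Thanks for your order ..."}},
)

# Query: streaming search scans only this user's stored, compressed data.
response = requests.post(
    "http://localhost:8080/search/",
    json={
        "yql": "select * from sources * where userQuery()",
        "query": "receipt order",
        "streaming.groupname": user,
        "hits": 10,
    },
)
```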
-
Personal search features
- Apply all of Vespa’s features directly on stored and compressed data to get text, vector, and metadata search and ranking/inference with unlimited flexibility at a fraction of the cost.
- Efficient non-approximate vector search to avoid missing critical personal data (see the sketch after this list).
- Vespa automatically distributes users with large amounts of data over multiple nodes to get low latency.
- Use two clusters with the same schema to migrate large users to an indexed backend while providing the same features with both implementations.
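For the non-approximate vector search mentioned above, a minimal sketch: the nearestNeighbor operator can be annotated with approximate: false to request exact distance computation, which matches streaming mode’s index-free operation. The embedding field, "semantic" rank profile and placeholder vector are assumptions.
```python
import requests

response = requests.post(
    "http://localhost:8080/search/",
    json={
        # Exact (brute-force) nearest-neighbor search over one user's documents.
        "yql": "select * from sources * where "
               "{targetHits: 100, approximate: false}nearestNeighbor(embedding, q)",
        "ranking": "semantic",
        "input.query(q)": [0.0] * 384,  # placeholder query vector
        "streaming.groupname": "user-4711",
        "hits": 10,
    },
)
```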
-
Personal search sample applications
-
-
Your novel use case
What can you do when you’re able to make any inference over any set of data with millisecond latency?
Our customers are using Vespa to do things nobody has imagined before, and some of them will upend industries. Maybe you will be one of them?
-
Features for novel use cases
Some of the Vespa features that help when imagining new uses:
- The content returned from a Vespa instance does not need to be sorted by the score you are inferring. It could, for example, be sorted by the difference between what you are inferring and a document value.
- Any type of inference over data items, and any number of them, can be made as part of query execution.
- Tensors with sparse dimensions make it possible to store, pass and compute over arbitrary structured data (see the sketch after this list).
- All writes to Vespa are fully real-time such that the next query will observe any data changes made prior to it.
- Any kind of data can be stored and selectively surfaced with documents.
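For example, here is a hedged sketch of passing arbitrary structured data as a mapped (sparse) tensor with a query. The "novel" rank profile and the query(context) input are assumptions; the profile (not shown) could combine this tensor with document data in any expression, such as ordering results by the difference between a predicted value and a stored field.
```python
import requests

response = requests.post(
    "http://localhost:8080/search/",
    json={
        "yql": "select * from sources * where true",
        "ranking": "novel",
        # A mapped tensor keyed by arbitrary string labels - structured data
        # rather than a fixed-size vector.
        "input.query(context)":
            "{{feature:weekday}:1.0,{feature:rainy}:1.0,{feature:commuting}:0.5}",
        "hits": 10,
    },
)
```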
-