Webinar Transcript
Unlock the Future of eCommerce
One Platform, Unlimited Possibilities
Session 1: Introduction to Vespa
Jürgen Obermann, Senior Account Executive, EMEA
Session Summary
Vespa began as an internal Yahoo project in 2011 and was early to adopt vector and tensor operations — nearly a decade before they became mainstream. Open-sourced in 2017 and launched as a cloud service in 2021, Vespa became an independent company in 2023. Today, it’s used by 10% of the world’s top 50 companies and leading retailers.
Built for real-time, large-scale vector and hybrid search, Vespa powers use cases ranging from hyper-personalization at Spotify to internet-scale RAG at Perplexity, which indexes billions of documents. It also underpins fast, low-latency recommendations at Farfetch and ad delivery at Taboola. Flexible deployments and strong LLM integration make Vespa ideal for production-grade AI applications.
Gartner Peer Insights users gave Vespa a rare 5-star rating across all surveyed customers — a testament to its enterprise-grade reliability and performance.
Transcript
Let me start by giving you a bit of background on Vespa — how analysts view us and some customer examples, including a real-life one you’ll see later with Anessa.
Although Vespa, as a company, is only about two years old, the technology itself is much more mature. We began deploying search solutions back in 2011 within Yahoo. What I find particularly interesting is that in 2014, Vespa began adopting vector and tensor operations, nearly a decade before these approaches became mainstream. Most of the industry only started talking about them around 2022 or 2023, yet Vespa had already been building production solutions on these capabilities for years.
In 2017, we open-sourced the Vespa codebase. From what we can tell, it’s been quite popular, with lots of downloads for the open-source version. In 2021, we launched the Vespa Cloud service and added general GenAI support in 2022.
Then, in 2023, Vespa officially separated from Yahoo and became its own entity. That was a strong starting point. We’ve already seen significant adoption in the enterprise space, including 10% of the world’s top 50 companies and three to five of the largest global retailers. It’s been a great beginning, particularly in the e-commerce space.
Now, you might ask where Vespa fits best. We’re built for speed and efficiency — real-time querying at very large scale. But we’re equally well-suited for smaller companies that rely heavily on vector or tensor search. Vespa scales from very small use cases to internet-scale workloads, all in real time. Relevance and personalization are core strengths, as you’ll see later in the customer use cases, especially in the Vinted presentation. Vespa’s efficiency and speed allow businesses to reduce costs and improve total cost of ownership (TCO).
The platform is very flexible — it supports deployments in public clouds, private clouds, and on-premises. It also includes a wide range of APIs, including those needed to integrate with LLMs and other advanced tools. That’s essentially where we fit.
Looking at what analysts have said — this is quite astonishing, actually — Gartner conducted a Peer Insights survey of our customers. Every customer they interviewed gave Vespa a five-star rating. That’s very rare, and something we’re extremely proud of.
In terms of well-known customers, the most famous is probably Perplexity. This is likely one of the largest RAG (Retrieval-Augmented Generation) implementations in the world, indexing essentially the entire internet for search. I use it, my family uses it, friends use it. It's become a go-to for questions, and the precise answers it returns are powered by Vespa under the hood.
They currently have about 1.5 billion documents indexed, with a goal of reaching 10 billion by the end of the year. The setup is a hybrid, multi-vector text search with multi-phase ranking, which cheaply narrows the candidate set before the more expensive stages and thereby optimizes LLM usage.
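To make the multi-phase idea concrete, here is a minimal sketch of how such ranking can be expressed in a Vespa schema. This is not Perplexity's actual configuration; the fields (`content`, `embedding`), the 384-dimensional query tensor, and the `reranker.json` model file are all illustrative:

```
rank-profile hybrid-multiphase inherits default {
    inputs {
        # Query-time embedding supplied by the caller
        query(q_embedding) tensor<float>(x[384])
    }
    # Phase 1: a cheap score over every matched document,
    # blending lexical (BM25) and vector-similarity signals
    first-phase {
        expression: bm25(content) + closeness(field, embedding)
    }
    # Phase 2: re-rank only the best candidates with a heavier
    # machine-learned model before results reach the LLM
    second-phase {
        rerank-count: 100
        expression: xgboost("reranker.json")
    }
}
```

Because the expensive model runs only on the top candidates per node, the bulk of the corpus is filtered by the cheap first phase, which is what keeps latency manageable at billions of documents.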
Another customer is Spotify, which uses Vespa for several use cases, the most prominent being hyper-personalization. They leverage Vespa to search within their podcast library, supporting over 600 million monthly active users and more than 5 million podcasts. You can imagine the scale involved. The key feature here is vector search over embedded documents, with queries based on vector similarity. Piotr will go into more detail on this later. Ranking is important here too, especially where neural networks are involved.
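As a hedged illustration of what such a similarity query looks like on the wire, here is a request body for Vespa's query API. The `podcast` document type, `embedding` field, `semantic` rank profile, and the tiny three-dimensional vector are all hypothetical, chosen only to keep the sketch short:

```
{
  "yql": "select * from podcast where {targetHits: 50}nearestNeighbor(embedding, q_embedding)",
  "input.query(q_embedding)": [0.12, -0.03, 0.55],
  "ranking": "semantic"
}
```

The `nearestNeighbor` operator retrieves roughly the 50 documents closest to the query embedding, and the named rank profile then orders them.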
As for Yahoo, Vespa was originally built there and continues to be used extensively. What’s striking is that it handles more than 800,000 queries per second across about 50 applications. That shows the scalability Vespa supports — though we also serve much smaller environments just as effectively.
Another example is Taboola, an advertising platform where serving the right ad at the right moment is essential, before the user leaves the website. Vespa enables scalable vector-based search with real-time updates and deep integration, which is critical in that context. Hybrid querying and the use of deep machine-learned models to narrow results are also key parts of the solution there.
Farfetch is another customer using Vespa to power their recommendation engine with low latency and high throughput. The system ensures that recommended items are actually in stock, so users aren’t shown unavailable products due to outdated inventory data. This is the kind of application Vespa is ideal for.
I’ll leave the details on Vinted to Ernest, who’s better positioned to speak on that.
With that brief introduction to who we are, what Vespa does, and where we excel, I’ll now hand over to Piotr.
Piotr Kobziakowski, Senior Principal Solutions Architect
Session Summary
This session explored how Vespa uniquely combines the capabilities of text search engines and vector databases into a single, scalable AI-native platform. It highlighted Vespa’s advantages in real-time ranking, tensor operations, and integrated LLM workflows — all essential for powering modern e-commerce, recommendation, and personalization systems. The speaker also explained common limitations in legacy architectures and detailed how Vespa helps eliminate data silos, reduce latency, and simplify complex search infrastructure, making it ideal for production-grade Retrieval-Augmented Generation and AI applications.
Transcript
Let me start by discussing Vespa’s positioning in the market and how it fits within the broader landscape of search and vector databases.
Today, we see two dominant categories of database engines used in e-commerce platforms:
- Text-based engines like Elasticsearch, OpenSearch, and Solr.
- Vector-native databases such as Pinecone, Chroma, and Qdrant.
Vespa uniquely bridges these two worlds. Over ten years ago, we introduced not just vector support but full tensor support — allowing us to combine structured, semantic, and machine-learned signals in one unified platform. While others are just starting to add vector features, Vespa has long focused on advanced ranking and vector operations.
What sets Vespa apart:
- Real-time updates: Crucial for applications like live product inventory or shopping cart changes. Vespa supports partial document updates, improving latency and performance (see the partial-update sketch after this list).
- Integrated ranking and search: True relevance comes from tightly coupling search with ranking. Vespa does this natively, including support for multistage ranking and complex custom expressions.
- Tensor operations: Vespa enables in-place computation of similarity, model inference, and custom scoring using tensors.
- Model integration: Vespa runs ONNX models, including transformer models from platforms like Hugging Face, directly inside ranking, and also supports gradient-boosted models such as XGBoost (see the ONNX sketch after this list).
- Cloud auto-scaling: Thanks to its separation of stateless and stateful services, Vespa scales up and down elastically.
- LLM integration: Through built-in workflows, LLMs can be used directly in Vespa’s RAG pipelines — no extra orchestration required.
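To make the real-time-updates point above concrete, here is a minimal sketch of a partial update through Vespa's /document/v1 HTTP API. The `shop` namespace, `product` document type, and field names are hypothetical; only the listed fields are changed, so price and inventory can stay fresh without re-feeding whole documents:

```
PUT /document/v1/shop/product/docid/sku-12345
{
    "fields": {
        "price":    { "assign": 129.0 },
        "in_stock": { "assign": true }
    }
}
```

The `assign` operation overwrites a single field in place; Vespa also offers arithmetic operations such as `increment` for numeric fields, useful for counters like stock levels.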
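And for the tensor and model-integration points, a hedged sketch of how an ONNX model can be declared and invoked inside a schema. The model file, its input and output names, and the field names are all illustrative rather than a specific supported model:

```
# Declare the model and bind its inputs to Vespa features
onnx-model reranker {
    file: models/reranker.onnx
    input  "doc_embedding":   attribute(embedding)
    input  "query_embedding": query(q_embedding)
    output "relevance":       score
}

rank-profile with-model inherits default {
    inputs {
        query(q_embedding) tensor<float>(x[384])
    }
    # Cheap vector-similarity first phase
    first-phase {
        expression: closeness(field, embedding)
    }
    # Run the ONNX model only on the top 50 candidates
    second-phase {
        rerank-count: 50
        expression: sum(onnx(reranker).score)
    }
}
```

The model is evaluated on the content nodes, next to the data, which is what removes the extra network hop a separate inference service would add.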
We also support deployment on-premises, in private clouds, or in Vespa Cloud.
On Market Comparisons:
Many legacy systems fall short in modern search use cases:
- Text-based engines struggle with real-time personalization and don’t offer robust ranking or tensor support.
- Vector databases often lack Boolean logic, multistage ranking, and deep integration of AI models. Most don't support mixing lexical and semantic queries seamlessly (a hybrid-query sketch follows this list).
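To show what that seamless mixing looks like in Vespa, here is a hedged YQL sketch that combines a lexical match with an approximate nearest-neighbor match in a single query; the field and type names are the same hypothetical ones used earlier:

```
select * from product
where userQuery() or ({targetHits: 100}nearestNeighbor(embedding, q_embedding))
```

`userQuery()` matches the user's free-text terms lexically, the `nearestNeighbor` branch retrieves semantically similar documents, and both feed one unified ranking phase, so there is no merge step between two separate systems.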
Vespa avoids the pitfalls of manual sharding through its bucket-based architecture, enabling massive scale without added complexity. It also reduces vector storage costs through techniques like heavy quantization and approximate search with learned indexing.
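As one hedged illustration of the storage-cost point: Vespa can store embedding tensors with reduced-precision cell types and index them for approximate nearest-neighbor search, for example with HNSW. The field name and dimensions below are illustrative:

```
field embedding type tensor<bfloat16>(x[384]) {
    indexing: attribute | index
    attribute {
        # Distance metric used for nearest-neighbor matching
        distance-metric: angular
    }
    index {
        # Approximate nearest-neighbor index parameters
        hnsw {
            max-links-per-node: 16
            neighbors-to-explore-at-insert: 200
        }
    }
}
```

Switching the cell type from `float` to `bfloat16` (or even `int8`) halves or quarters the memory footprint of the vectors, at some cost in precision.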
E-commerce Challenges Addressed by Vespa:
- Fragmented customer experiences: Many retailers can’t deliver consistent, personalized journeys across channels like web, SMS, email, and social due to data silos.
- Real-time personalization: Requires low-latency access to up-to-date behavioral and catalog data — a strength of Vespa.
- Ad fatigue and upsell limitations: Vespa helps serve smarter recommendations based on recent purchases or behavior, reducing irrelevant retargeting.
- Personalized pricing: Real-time inference capabilities allow Vespa to support dynamic pricing models.
- Architectural inefficiencies: Microservice sprawl often separates search and ranking into different systems, introducing latency and operational complexity. Vespa unifies these.
Finally, the session showed how Vespa outperforms Elasticsearch in benchmarks on QPS, latency, and CPU efficiency, and how teams can reproduce those tests themselves.