Unlock the Future of eCommerce Webinar

In this webinar, you’ll learn how Vespa.ai is reshaping online retail by integrating search, ranking, and recommendations into a single, scalable platform. The sessions explore Vespa’s advanced AI capabilities, from agentic commerce to Retrieval-Augmented Generation (RAG), and how they power state-of-the-art document, video, and image search. You’ll also hear how this unified approach enabled Vinted to scale seamlessly to 1 billion listings, boost query speed, and cut operational costs.

You can view the session recordings and transcripts below.

Session 1: Introduction to Vespa

  • Jürgen Obermann, Senior Account Executive, EMEA
  • Piotr Kobziakowski, Senior Principal Solutions Architect

This session explores how Vespa uniquely combines the capabilities of text search engines and vector databases in a single, scalable, AI-native platform. It highlights Vespa’s advantages in real-time ranking, tensor operations, and integrated LLM workflows, all essential for powering modern e-commerce, recommendation, and personalization systems. It also describes common limitations of legacy architectures and how Vespa eliminates data silos, reduces latency, and simplifies complex search infrastructure, making it ideal for production-grade Retrieval-Augmented Generation (RAG) and AI applications.


Session 2: Vinted Case Study

Ernestas Poskus, Search Engineering Manager

Ernestas Poskus of Vinted shares how the company transitioned from an overloaded, fragmented Elasticsearch setup to a unified Vespa-based search architecture. The move delivered dramatic gains in latency, scalability, cost efficiency, and developer velocity. With Vespa now powering real-time indexing, AI-driven personalization, and multilayered ranking, Vinted has future-proofed its platform and enabled fast innovation in search, recommendations, and experimentation.


Session 3: Vespa Technical Deep Dive and Demo

Piotr Kobziakowski, Senior Principal Solutions Architect

This session explores the advanced personalization and ranking capabilities of Vespa.ai, particularly in the context of e-commerce applications. It details how Vespa enables low-latency, scalable operations by breaking data and system silos, supporting personalized search, product recommendations, and real-time model-driven decisions. The presentation highlights Vespa’s architecture, including multi-phase ranking, tensor operations, vector-based search, and integration with multimodal models. A live demo illustrates Vespa’s application in personalized car search and visual matching using PDF images. The session concludes with pointers to sample applications and resources for getting started with Vespa.


Transcripts

Jürgen Obermann, Senior Account Executive, EMEA:
Let me start by giving you a bit of background on Vespa — how analysts view us and some customer examples.
Although Vespa, as a company, is only about two years old, the technology itself is much more mature. We began deploying search solutions back in 2011 within Yahoo. What I find particularly interesting is that in 2014, Vespa began adopting vector and tensor operations — nearly a decade before these approaches became mainstream. Most of the industry only started talking about them around 2022 or 2023, yet Vespa was already implementing solutions around these capabilities years earlier.
In 2017, we open-sourced the Vespa codebase. From what we can tell, it’s been quite popular, with lots of downloads for the open-source version. In 2021, we launched the Vespa Cloud service and added general GenAI support in 2022.
Then, in 2023, Vespa officially separated from Yahoo and became its own entity. That was a strong starting point. We’ve already seen significant adoption in the enterprise space, including 10% of the world’s top 50 companies and three to five of the largest global retailers. It’s been a great beginning, particularly in the e-commerce space.
Now, you might ask where Vespa fits best. We’re built for speed and efficiency — real-time querying at very large scale. But we’re equally well-suited for smaller companies that rely heavily on vector or tensor search. Vespa scales from very small use cases to internet-scale workloads, all in real time. Relevance and personalization are core strengths, as you’ll see later in the customer use cases, especially in the Vinted presentation. Vespa’s efficiency and speed allow businesses to reduce costs and improve total cost of ownership (TCO).
The platform is very flexible — it supports deployments in public clouds, private clouds, and on-premises. It also includes a wide range of APIs, including those needed to integrate with LLMs and other advanced tools. That’s essentially where we fit.
Looking at what analysts have said — this is quite astonishing, actually — Gartner conducted a Peer Insights survey of our customers. Every customer they interviewed gave Vespa a five-star rating. That’s very rare, and something we’re extremely proud of.
In terms of well-known customers, the most famous is probably Perplexity. This is likely one of the largest RAG (Retrieval-Augmented Generation) implementations in the world, indexing essentially the entire internet for search. I use it, my family uses it, friends use it — it’s become a go-to for questions, and it relies on precise answers powered by Vespa under the hood.
They currently have about 1.5 billion documents indexed, with a goal of reaching 10 billion by the end of the year. It’s a hybrid, multi-vector text search with multi-phase ranking to optimize LLM usage.
Another customer is Spotify, which uses Vespa for several use cases, the most prominent being hyper-personalization. They leverage Vespa to search within their podcast library, supporting over 600 million monthly active users and more than 5 million podcasts. You can imagine the scale involved. The key feature here is vector search over embedded documents, with queries based on vector similarity. Piotr will go into more detail on this later. Ranking is important here too, especially in relation to neural networks.
As for Yahoo, Vespa was originally built there and continues to be used extensively. What’s striking is that it handles more than 800,000 queries per second across about 50 applications. That shows the scalability Vespa supports — though we also serve much smaller environments just as effectively.
Another example is Taboola, an advertising platform where serving the right ad at the right moment is essential — before the user leaves the website. Vespa enables scalable vector-based search with real-time updates and deep integration, which is critical in that context. Hybrid querying and narrowing of results using deep machine learning are also key parts of the solution there.
I’ll leave the details on Vinted to Ernest, who’s better positioned to speak on that.
With that brief introduction to who we are, what Vespa does, and where we excel, I’ll now hand over to Piotr.
Piotr Kobziakowski, Senior Principal Solutions Architect:
Let me start by discussing Vespa’s positioning in the market and how it fits within the broader landscape of search and vector databases.
Today, we see two dominant categories of database engines used in e-commerce platforms:
  • Text-based engines like Elasticsearch, OpenSearch, and Solr.
  • Vector-native databases such as Pinecone, Chroma, and Qdrant.
Vespa uniquely bridges these two worlds. Over ten years ago, we introduced not just vector support but full tensor support — allowing us to combine structured, semantic, and machine-learned signals in one unified platform. While others are just starting to add vector features, Vespa has long focused on advanced ranking and vector operations.
What sets Vespa apart:
  • Real-time updates: Crucial for applications like live product inventory or shopping cart changes. Vespa supports partial document updates, improving latency and performance (see the sketch after this list).
  • Integrated ranking and search: True relevance comes from tightly coupling search with ranking. Vespa does this natively, including support for multistage ranking and complex custom expressions.
  • Tensor operations: Vespa enables in-place computation of similarity, model inference, and custom scoring using tensors.
  • Model integration: Vespa evaluates ONNX models (including transformer models from platforms like Hugging Face) as well as XGBoost and LightGBM, letting you use trained models directly in ranking.
  • Cloud auto-scaling: Thanks to its separation of stateless and stateful services, Vespa scales up and down elastically.
  • LLM integration: Through built-in workflows, LLMs can be used directly in Vespa’s RAG pipelines — no extra orchestration required.
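To make the real-time update point concrete, here is a minimal sketch of a partial update against Vespa’s Document v1 HTTP API, which rewrites only the listed fields instead of re-feeding the whole document. The endpoint, namespace, document type, and field names are illustrative assumptions, not part of the webinar.

```python
import requests

# Partial update via the Document v1 API: only the listed fields change,
# so a price or stock update never re-feeds the full product document.
# Endpoint, namespace, doc type, and field names are illustrative.
VESPA_ENDPOINT = "http://localhost:8080"
doc_url = f"{VESPA_ENDPOINT}/document/v1/shop/product/docid/sku-12345"

update = {
    "fields": {
        "price": {"assign": 129.0},   # overwrite a single attribute
        "in_stock": {"assign": True},
        "views": {"increment": 1},    # arithmetic update, no read-modify-write
    }
}

response = requests.put(doc_url, json=update, timeout=5)
response.raise_for_status()
print(response.json())
```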
We also support deployment on-premises, in private clouds, or in Vespa Cloud.
On Market Comparisons:
Many legacy systems fall short in modern search use cases:
  • Text-based engines struggle with real-time personalization and don’t offer robust ranking or tensor support.
  • Vector databases often lack Boolean logic, multistage ranking, and deep integration of AI models. Most don’t support mixing lexical and semantic queries seamlessly.
Vespa avoids the pitfalls of manual sharding through its bucket-based architecture, enabling massive scale without added complexity. It also reduces vector storage costs through techniques like heavy quantization and approximate search with learned indexing.
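As a rough illustration of the storage savings quantization can give, here is a small numpy sketch that binarizes a float embedding into packed bits, the kind of representation Vespa can store as an int8 tensor and compare with hamming distance. The dimension and threshold are illustrative assumptions.

```python
import numpy as np

# Binarize a 768-dim float32 embedding: one bit per dimension, packed
# 8 per byte, shrinking 3072 bytes to 96 (a 32x reduction). Vespa can
# hold the result as tensor<int8>(x[96]) and rank with hamming distance.
def binarize(embedding: np.ndarray) -> np.ndarray:
    bits = (embedding > 0).astype(np.uint8)    # sign-based thresholding
    return np.packbits(bits).astype(np.int8)   # same bit pattern, int8 cells

rng = np.random.default_rng(42)
float_vec = rng.standard_normal(768).astype(np.float32)
packed = binarize(float_vec)

print(f"{float_vec.nbytes} bytes -> {packed.nbytes} bytes")  # 3072 -> 96
```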
E-commerce Challenges Addressed by Vespa:
  • Fragmented customer experiences: Many retailers can’t deliver consistent, personalized journeys across channels like web, SMS, email, and social due to data silos.
  • Real-time personalization: Requires low-latency access to up-to-date behavioral and catalog data — a strength of Vespa.
  • Ad fatigue and upsell limitations: Vespa helps serve smarter recommendations based on recent purchases or behavior, reducing irrelevant retargeting.
  • Personalized pricing: Real-time inference capabilities allow Vespa to support dynamic pricing models.
  • Architectural inefficiencies: Microservice sprawl often separates search and ranking into different systems, introducing latency and operational complexity. Vespa unifies these.
Finally, we showed how Vespa outperforms Elasticsearch in benchmarks on QPS, latency, and CPU efficiency, and how teams can reproduce those tests themselves.
Ernestas Poskus, Search Engineering Manager:
Hello and thank you for having me. I’m pleased to share the story of how we unified our search infrastructure at Vinted and unlocked new capabilities by adopting the Vespa search engine.
This is a story of moving beyond limitations—from a fragmented, operationally heavy setup to one with clarity, speed, and flexibility. I’ll walk you through our architectural evolution and how Vespa became the backbone of our search, powering fast results, personalized suggestions, and recommendations.
I’ve been at Vinted for over a decade, starting as a full-stack engineer, then moving through backend development, site reliability engineering, and eventually into product leadership. That journey taught me the value of scalability and operational resilience—experience that now shapes our search platform team.
Our team was created to migrate away from Elasticsearch and build our new foundation on Vespa. In some ways, it felt like working for seven different companies under one Slack domain.
About Vinted
Vinted is one of the most popular online marketplaces for second-hand items, operating in 20+ countries. We support 15 spoken languages, making search a linguistic and technical challenge. Our infrastructure handles over 75 billion active items in real time and serves around 25,000 search queries per second, each returning up to 1,000 results.
Why We Moved Away from Elasticsearch
At one point, we were managing six large Elasticsearch clusters. These created significant operational overhead: coordinating updates, alias switches, reindexing, and shard management, all while performance degraded. Feature development slowed because the platform couldn’t support new capabilities without risk.
Eventually, this setup became unsustainable—costly in time, compute, and team productivity.
Enter Vespa
We discovered Vespa thanks to a persistent data scientist who encouraged us to use it for homepage recommendations. Reluctantly, we tried it—and were immediately impressed. It was faster, easier to manage, and impactful on both performance and business outcomes.
Vespa wasn’t a popular or obvious choice back then, but its proven scalability (originating at Yahoo) and support for real-time indexing, lexical and vector search, partial document updates, and built-in ML features made it the right one.
Our Migration Outcomes
  • Migrated from 6 Elasticsearch clusters to 1 Vespa deployment
  • Cut server costs by 50%
  • Improved query latency by over 2x
  • Increased indexing speed by over 3x
  • Achieved sub-second data visibility for real-time updates (vs. 6 minutes previously)
Architectural Simplicity
Previously, our architecture was a complex chain of loosely connected services across multiple teams. Now, we have a unified platform where all phases of search—including ML-based ranking and retrieval—happen in one place. This reduced system complexity, improved collaboration, and eliminated silos.
Advanced Ranking and ML Integration
Vespa allows us to:
  • Run multiple ML models natively
  • Deploy real-time recommendation engines using two-tower neural networks and ANN search (see the sketch below)
  • Tune search with fine-grained control over multi-phase ranking logic
  • Co-locate data, compute, and ranking for deterministic performance
We also run six ML models in production, with plans to expand, including GPU-ready inference and LLM-powered RAG across the org.
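As a sketch of the two-tower pattern mentioned in the list above: the user tower produces a query-time embedding, and Vespa retrieves the closest item embeddings with approximate nearest neighbor search. The schema, field, and rank profile names here are illustrative assumptions, not Vinted’s actual setup.

```python
from vespa.application import Vespa

# Hypothetical two-tower retrieval: user_vector comes from the user tower
# (computed offline or in a service); item embeddings from the item tower
# are stored in Vespa and matched with ANN. All names are illustrative.
app = Vespa(url="http://localhost", port=8080)

user_vector = [0.12, -0.03, 0.88]  # stand-in for a real user embedding

response = app.query(body={
    "yql": (
        "select * from items where "
        "{targetHits: 100}nearestNeighbor(item_embedding, user_embedding)"
    ),
    "input.query(user_embedding)": user_vector,
    "ranking": "two_tower",  # e.g. scores closeness(field, item_embedding)
    "hits": 24,
})

for hit in response.hits:
    print(hit["relevance"], hit["fields"].get("title"))
```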
Real-Time Personalization at Scale
Vespa powers:
  • Instant homepage personalization (e.g., adapting to PlayStation game searches in seconds)
  • Counterfeit detection via image search
  • Stream processing with Apache Flink integrated into Vespa as a persistent, searchable index
What’s Next
We’re expanding use cases each quarter and integrating Vespa more deeply into our experimentation platforms and personalization engines. It’s a strategic enabler, letting us experiment faster and deliver real-time, AI-powered features across the business.

Closing Thoughts

Migrating to Vespa wasn’t just an engine swap—it was a transformation. From silos to synergy. From delays to real time. Vespa is now central to our infrastructure, enabling smarter, faster, and more unified search.
Thank you to the Vespa team and our engineering crew. It’s been an exciting journey—and we’re just getting started.
Piotr Kobziakowski, Senior Principal Solutions Architect:

Let’s think back to the topics we’ve discussed. Some of these elements are already present in Vinted’s implementation. Breaking down silos is extremely important. Vespa excels at this, not just in ranking, as you’ve seen, but in personalized search, query suggestions, and customized category pages like the homepage or any specific section.
Vespa makes these capabilities easy to implement and offers excellent performance in terms of low latency. Beyond this, it handles personalized data sharing efficiently. For example, if you have models with partial updates for user behavioral representation, you can share that outside your core systems—with advertising platforms, marketing platforms, or even inventory tracking.
This works well in Vespa due to its extremely fast updates. You can track changes easily. And when you colocate components, you gain further improvements in latency and speed across personalization engines, clickstreams, and model training pipelines. These can all run together, including things like inventory and cart management data, which are also well supported by Vespa.
The benefits of this approach are clear: consistent customer profiles and touchpoints, a single source of behavioral data, and improved accessibility across systems. If your data and applications run within Vespa, there are no network delays. Operations run in memory, not over the network—reducing bandwidth costs and improving operational efficiency.
Many teams report major gains in operational efficiency when they have one central location for application configurations, data management, and issue investigation. These tasks become significantly easier than in distributed systems. By breaking down data and system silos, you’re not only reducing latency in search performance, but also improving application access times and overall responsiveness.
E-commerce Use Cases and Personalization
Let’s now look at what Vespa addresses in e-commerce platforms. I’ll also take this opportunity to announce that we’ll be running another webinar focused on building specific e-commerce components using Vespa. We’ll post the date on our website soon.
In that session, we’ll dive into query personalization and suggestions—something that’s trending right now. These aren’t just based on behavioral popularity or common queries, but truly personalized suggestions. You’ll feel like the system understands you. You start typing, and the system predicts your intent with a high degree of relevance.
We also support personalized product search using lexical, semantic, and visual understanding—combining image embeddings with semantic meaning and text descriptions like titles or product details. This combination improves recall and powers a multi-phase ranking pipeline. Lighter models are applied early, while heavier, more resource-intensive models like cross-encoders are reserved for later ranking stages—achieving both performance and accuracy.
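A minimal sketch of such a hybrid query, assuming a pyvespa client and illustrative schema, field, and rank profile names: lexical matching and semantic ANN retrieval are combined in one YQL expression so a single multi-phase ranking pipeline sees both candidate sets.

```python
from vespa.application import Vespa

# Hybrid retrieval: lexical userQuery() OR semantic nearestNeighbor, fused
# in one request. Schema, field, and profile names are illustrative.
app = Vespa(url="http://localhost", port=8080)

query_embedding = [0.1] * 384  # stand-in for the embedded query text

response = app.query(body={
    "yql": (
        "select * from products where userQuery() "
        "or ({targetHits: 200}nearestNeighbor(embedding, q))"
    ),
    "query": "red leather jacket",
    "input.query(q)": query_embedding,
    "ranking": "hybrid",  # light first phase; heavier models re-rank later
    "hits": 10,
})
```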
By incorporating personalized vector embeddings based on A/B testing or behavioral models, you can deliver powerful search results. This also applies to personalized category pages. Like in TikTok or other social apps, the more you engage, the more tailored the homepage becomes. It’s not just cool—it directly boosts upsell potential and business outcomes.
“You might also like” recommendations are another layer. These can be based on items viewed or purchased, including cross-category discovery. Relationships between products can be easily modeled and extended to advertising—reusing the same signals and datasets.
Trending product identification is also a strength. You can monitor orders, clicks, and ratings. Reviews and sentiment scores can even be incorporated into Vespa’s ranking profiles. Other use cases include customer engagement, fraud detection, and counterfeit detection. Vespa supports all of this natively.
You don’t need to start with everything at once. Start with search and ranking, and expand from there as you grow more familiar with Vespa.
Personalization at Scale and Vespa Platform Capabilities
When it comes to personalization at scale, today’s systems rely heavily on neural methods. This means tensor operations are fundamental. Vespa supports this out of the box. You can store individual or cohort-level embeddings (as vectors or tensors) and use them directly in ranking. Inference can be executed on GPU or CPU, depending on model complexity. Vespa supports both seamlessly.
This gives you consistent performance regardless of catalog size, user volume, or QPS. Vespa is extremely scalable.
Now, about the platform: there’s often a misconception about what Vespa is. Some think it’s a vector database, others a search engine. But it’s actually a platform. It includes hybrid search (vector + lexical), multi-phase ranking, model inference, data transformation pipelines, and distributed storage.
You can define how each field is stored: on disk or in memory, with an inverted index or an HNSW graph. Vespa allows you to model datasets across multi-cloud or multi-cluster configurations. One content cluster might serve fast filtering; another might serve complex semantic retrieval.
We wrote about this in our blog, explaining how we built separate content clusters for different workloads. This flexibility supports all the services we’ve discussed. It’s especially relevant for Retrieval-Augmented Generation (RAG) applications.
Take Perplexity, for example. They are scaling their Vespa index toward 10 billion pages and use it to power real-time search for chat interfaces. Their platform showcases what’s possible.
Whether it’s chatbots, virtual assistants, or agentic applications, Vespa’s multi-stage ranking helps extract accuracy from massive knowledge bases. It also supports recommendation systems and advanced search use cases across e-commerce and beyond.
Many platforms are already experimenting with RAG and agent-style architectures. We expect this will become the standard for e-commerce soon.
Vespa Architecture and Operational Principles
Now let’s talk about Vespa’s architecture and core principles. Vespa prioritizes local execution. Model inference and operations like MaxSim run where the data lives—on the same node—so data doesn’t have to move across the network. That’s highly efficient.
Ranking expressions are also executed locally, and computation is distributed across clusters. This ensures stable and predictable performance.
Everything in Vespa is packaged via the application package. It defines configurations, schemas, document types, ranking logic, semantic/lexical handling, field inference (like binarization), and embedded ML models.
Models are bundled and loaded into the Vespa cluster for local inference. The system separates stateless Java containers (handling business logic) from stateful C++ nodes (handling storage and compute). This dual-layer design ensures performance and scalability.
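Here is a minimal sketch of such an application package, defined with pyvespa rather than the raw configuration files; the application name, fields, embedding dimension, and rank profile are illustrative assumptions.

```python
from vespa.package import ApplicationPackage, Field, HNSW, RankProfile

# One deployable unit: schema, fields, an HNSW-indexed embedding, and a
# hybrid rank profile. All names and dimensions are illustrative.
app_package = ApplicationPackage(name="ecommerce")

app_package.schema.add_fields(
    Field(name="title", type="string", indexing=["index", "summary"]),
    Field(name="price", type="float", indexing=["attribute", "summary"]),
    Field(
        name="embedding",
        type="tensor<float>(x[384])",
        indexing=["attribute", "index"],
        ann=HNSW(distance_metric="angular"),
    ),
)

app_package.schema.add_rank_profile(
    RankProfile(
        name="hybrid",
        inputs=[("query(q)", "tensor<float>(x[384])")],
        first_phase="bm25(title) + closeness(field, embedding)",
    )
)
```

Deploying this package (for local experiments, pyvespa’s VespaDocker can do it in one call) ships the configuration, schema, and any bundled models to the cluster together.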
Vespa is a distributed system. You don’t have to worry about manual sharding—it handles that. Computation happens where the data resides, which lowers total cost of ownership and improves performance.
Low latency is essential for advanced use cases like fraud detection, analytics, and payment monitoring. Real-time updates mean you can prevent issues as they happen.
Vespa handles real-time ingestion. Documents are searchable as soon as they’re written—no refreshes or forced merges. It supports partial updates of any field. This is particularly valuable for inventory, carts, or payment flows, where fast updates are critical. Vespa can handle over 100,000 updates per second per node.
Retrieval, Ranking, and Tensor Capabilities
In terms of retrieval and ranking, Vespa supports distributed multi-stage pipelines. Early stages filter with lightweight models; later stages use heavier models. You can choose what model or ranking expression runs at each stage.
This approach allows you to scale accurately and efficiently. For example, in the first stage you might filter 1 million documents down to 1,000 using fast filters. Then apply heavier models only on those 1,000.
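A sketch of that funnel as a pyvespa rank profile, with illustrative expressions: a cheap lexical first phase scores all matches, and only the top candidates per node are re-scored with a heavier, embedding-based second phase.

```python
from vespa.package import RankProfile, SecondPhaseRanking

# Multi-phase ranking: first_phase runs on every matched document; the
# second phase re-ranks only the best rerank_count candidates per node.
# Profile name and expressions are illustrative assumptions.
profile = RankProfile(
    name="multiphase",
    inputs=[("query(q)", "tensor<float>(x[384])")],
    first_phase="bm25(title)",  # light, lexical-only scoring
    second_phase=SecondPhaseRanking(
        expression="closeness(field, embedding) + 0.3 * bm25(title)",
        rerank_count=1000,      # heavier scoring on the top 1,000 only
    ),
)
```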
You can also control memory usage by paging vectors to disk and loading them only when needed for final ranking.
Vespa supports tensors in many forms: scalars, vectors, matrices, maps, and maps of vectors. You can mix operators and build expressive ranking functions with dot products, normalization, and more. All of this is easy to demo.
Live Demo: Personalized Search and Visual Understanding
That brings us to the demo: a personalized car search.
Imagine you want to buy a car. You input preferences like “I like Audi and Mercedes,” “I dislike diesel and Skoda,” and “I can’t drive a manual.” Vespa converts this into a preference vector, then uses it in a ranking profile. Results are personalized instantly.
You can also layer in filters and use additional models to represent emotional preference, taste, or intent. These models work together to deliver hyper-personalized results.
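A sketch of how such preferences could reach Vespa, with all schema, field, and profile names as illustrative assumptions: likes and dislikes become signed weights in a sparse (mapped) query tensor that the rank profile multiplies against each car’s attribute tensor, while hard constraints stay as filters.

```python
from vespa.application import Vespa

# Preference vector for the car demo: +1 for liked attributes, -1 for
# disliked ones; the "no manual" constraint is a filter, not a score.
# All names are illustrative.
app = Vespa(url="http://localhost", port=8080)

preferences = {"audi": 1.0, "mercedes": 1.0, "skoda": -1.0, "diesel": -1.0}

response = app.query(body={
    "yql": "select * from cars where transmission contains 'automatic'",
    "input.query(prefs)": preferences,
    "ranking": "personalized",  # e.g. sum(query(prefs) * attribute(car_tags))
    "hits": 10,
})
```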
In the next webinar, we’ll show how to extend this to a full e-commerce platform.
We also demoed visual understanding using screenshots from PDFs. Our system breaks images into 32×32 blocks, embeds them, and uses late interaction to match vectors against queries like “child jumping.”
This enables retrieval not only by text or layout but by the actual content of an image. It’s incredibly powerful for visually rich documents.
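To show the idea behind late interaction, here is a small numpy sketch of MaxSim scoring under illustrative shapes: each query-token embedding is compared against every image-patch embedding, and each token keeps only its best match.

```python
import numpy as np

# MaxSim late interaction: score = sum over query tokens of the best
# dot product against any image patch. Shapes are illustrative; a 32x32
# patch grid would give 1024 patch embeddings per page image.
def maxsim(query_tokens: np.ndarray, patches: np.ndarray) -> float:
    sims = query_tokens @ patches.T           # (n_tokens, n_patches)
    return float(sims.max(axis=1).sum())      # best patch per token, summed

rng = np.random.default_rng(0)
query = rng.standard_normal((4, 128))        # e.g. tokens of "child jumping"
patches = rng.standard_normal((1024, 128))   # 32x32 grid of patch embeddings

print(maxsim(query, patches))
```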
Resources and Final Thoughts
To wrap up, we encourage everyone to explore the Vespa Python API (pyvespa) and our sample apps. These include album recommendation engines, billion-scale vector search, and ColBERT-style late-interaction re-rankers.
These are available as open resources to help you build your own production-grade systems. If you’d like to go deeper, please get in touch—we’d be happy to assist.