Unlock the Future of eCommerce Webinar
In this webinar, you’ll learn how Vespa.ai is reshaping online retail by integrating search, ranking, and recommendations into a single, scalable platform. The session explores Vespa’s advanced AI capabilities, from agentic commerce to Retrieval-Augmented Generation (RAG), and how they power state-of-the-art document, video, and image search. You’ll also hear how this unified approach enabled Vinted to scale seamlessly to 1 billion listings, boost query speed, and cut operational costs.
You can view the session recordings and transcripts below.
Session 1: Introduction to Vespa
- Jürgen Obermann, Senior Account Executive, EMEA
- Piotr Kobziakowski, Senior Principal Solutions Architect
This session explores how Vespa uniquely combines the capabilities of text search engines and vector databases into a single, scalable AI-native platform. It highlights Vespa’s advantages in real-time ranking, tensor operations, and integrated LLM workflows, all essential for powering modern e-commerce, recommendation, and personalization systems. It also describes common limitations of legacy architectures and shows how Vespa helps eliminate data silos, reduce latency, and simplify complex search infrastructure, making it well suited to production-grade Retrieval-Augmented Generation and AI applications.
Session 2: Vinted Case Study
Ernestas Poskus, Search Engineering Manager
Ernestas Poskus of Vinted shares how the company transitioned from an overloaded, fragmented Elasticsearch setup to a unified Vespa-based search architecture. The move delivered dramatic gains in latency, scalability, cost efficiency, and developer velocity. With Vespa now powering real-time indexing, AI-driven personalization, and multilayered ranking, Vinted has future-proofed its platform and enabled fast innovation in search, recommendations, and experimentation.
Session 3: Vespa Technical Deep Dive and Demo
Piotr Kobziakowski, Senior Principal Solutions Architect
This session explores the advanced personalization and ranking capabilities of Vespa.ai, particularly in the context of e-commerce applications. It details how Vespa enables low-latency, scalable operations by breaking data and system silos, supporting personalized search, product recommendations, and real-time model-driven decisions. The presentation highlights Vespa’s architecture, including multi-phase ranking, tensor operations, vector-based search, and integration with multimodal models. A live demo illustrates Vespa’s application in personalized car search and visual matching using PDF images. The session concludes with pointers to sample applications and resources for getting started with Vespa.
Transcripts
Let me start by discussing Vespa’s positioning in the market and how it fits within the broader landscape of search and vector databases.
- Text-based engines like Elasticsearch, OpenSearch, and Solr.
- Vector-native databases such as Pinecone, Chroma, and Qdrant.
- Real-time updates: Crucial for applications like live product inventory or shopping cart changes. Vespa supports partial document updates, improving latency and performance.
- Integrated ranking and search: True relevance comes from tightly coupling search with ranking. Vespa does this natively, including support for multistage ranking and complex custom expressions.
- Tensor operations: Vespa enables in-place computation of similarity, model inference, and custom scoring using tensors.
- Model integration: Vespa supports ONNX models alongside gradient-boosted trees such as XGBoost and LightGBM, so you can bring in models from platforms like Hugging Face and run inference directly where the data lives.
- Cloud auto-scaling: Thanks to its separation of stateless and stateful services, Vespa scales up and down elastically.
- LLM integration: Through built-in workflows, LLMs can be used directly in Vespa’s RAG pipelines — no extra orchestration required.
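The partial document updates mentioned above are expressed in Vespa’s Document JSON update format: you send only the changed fields, not the whole document. A minimal sketch of building such a payload, assuming a hypothetical `product` document type and field names:

```python
import json

def price_update(namespace, doctype, doc_id, new_price, stock_delta):
    """Build a Vespa partial-update request: assign a new price and
    adjust stock in place, without resending the full document.
    (Namespace, document type, and field names are illustrative.)"""
    path = f"/document/v1/{namespace}/{doctype}/docid/{doc_id}"
    body = {
        "fields": {
            "price": {"assign": new_price},          # overwrite one field
            "in_stock": {"increment": stock_delta},  # numeric in-place update
        }
    }
    return path, json.dumps(body)

path, body = price_update("shop", "product", "sku-123", 9.99, -1)
print(path)
print(body)
```

The payload would be sent as an HTTP PUT to the path shown; see the Vespa /document/v1 API reference for the exact set of update operations and their semantics.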
- Text-based engines struggle with real-time personalization and don’t offer robust ranking or tensor support.
- Vector databases often lack Boolean logic, multistage ranking, and deep integration of AI models. Most don’t support mixing lexical and semantic queries seamlessly.
- Fragmented customer experiences: Many retailers can’t deliver consistent, personalized journeys across channels like web, SMS, email, and social due to data silos.
- Real-time personalization: Requires low-latency access to up-to-date behavioral and catalog data — a strength of Vespa.
- Ad fatigue and upsell limitations: Vespa helps serve smarter recommendations based on recent purchases or behavior, reducing irrelevant retargeting.
- Personalized pricing: Real-time inference capabilities allow Vespa to support dynamic pricing models.
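The multistage ranking idea described above can be illustrated outside Vespa with a tiny two-phase sketch: a cheap first-phase score over every candidate, then an expensive second-phase rerank of only the top survivors. The scoring functions below are stand-ins for illustration, not Vespa rank expressions:

```python
import heapq

def cheap_score(doc, query_terms):
    # First phase: a fast lexical proxy (term-overlap count).
    return sum(doc["text"].count(t) for t in query_terms)

def expensive_score(doc, query_vec):
    # Second phase: a costlier model, stood in for here by a dot
    # product against a per-document embedding.
    return sum(q * d for q, d in zip(query_vec, doc["embedding"]))

def two_phase_rank(docs, query_terms, query_vec, rerank_count=2):
    # Phase 1: score everything cheaply, keep only the best rerank_count.
    survivors = heapq.nlargest(rerank_count, docs,
                               key=lambda d: cheap_score(d, query_terms))
    # Phase 2: rerank just the survivors with the expensive model.
    return sorted(survivors,
                  key=lambda d: expensive_score(d, query_vec),
                  reverse=True)

docs = [
    {"id": 1, "text": "red running shoes", "embedding": [0.9, 0.1]},
    {"id": 2, "text": "blue shoes shoes", "embedding": [0.2, 0.8]},
    {"id": 3, "text": "garden hose", "embedding": [0.1, 0.1]},
]
ranked = two_phase_rank(docs, ["shoes"], [1.0, 0.0])
print([d["id"] for d in ranked])  # → [1, 2]
```

In Vespa proper, the two phases are declared in a rank profile and the expensive phase runs only on each node’s top hits, which is what keeps latency low at scale.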
Hello and thank you for having me. I’m pleased to share the story of how we unified our search infrastructure at Vinted and unlocked new capabilities by adopting the Vespa search engine.
Vinted is one of the most popular online marketplaces for second-hand items, operating in 20+ countries. We support 15 spoken languages, making search a linguistic and technical challenge. Our infrastructure handles over 75 billion active items in real time and serves around 25,000 search queries per second, each returning up to 1,000 results.
At one point, we were managing six large Elasticsearch clusters. These created significant operational overhead: coordinating updates, alias switches, and reindexing; managing shards; and coping with performance degradation. Feature development slowed because the platform couldn’t support new capabilities without risk.
We discovered Vespa thanks to a persistent data scientist who encouraged us to use it for homepage recommendations. Reluctantly, we tried it, and we were immediately impressed: it was faster, easier to manage, and it delivered measurable gains in both performance and business outcomes.
- Migrated from 6 Elasticsearch clusters to 1 Vespa deployment
- Cut server costs by 50%
- Reduced query latency by more than 2x
- Increased indexing speed by over 3x
- Achieved sub-second data visibility for real-time updates (vs. 6 minutes previously)
Previously, our architecture was a complex chain of loosely connected services across multiple teams. Now, we have a unified platform where all phases of search—including ML-based ranking and retrieval—happen in one place. This reduced system complexity, improved collaboration, and eliminated silos.
- Run multiple ML models natively
- Deploy real-time recommendation engines using two-tower neural networks and ANN search
- Tune search with fine-grained control over multi-phase ranking logic
- Co-locate data, compute, and ranking for deterministic performance
- Instant homepage personalization (e.g., adapting to PlayStation game searches in seconds)
- Counterfeit detection via image search
- Stream processing with Apache Flink integrated into Vespa as a persistent, searchable index
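The two-tower recommendation setup mentioned above can be sketched conceptually: one tower embeds the user, the other embeds items, and retrieval is a nearest-neighbor search over item embeddings by dot product. The vectors below are toy values; in production the towers are trained neural networks and the search over millions of items is approximate (ANN):

```python
import heapq

# Precomputed item-tower embeddings (toy values standing in for a
# trained item tower's output).
item_embeddings = {
    "wool_coat":      [0.9, 0.1, 0.0],
    "ps5_controller": [0.0, 0.8, 0.6],
    "leather_boots":  [0.8, 0.2, 0.1],
    "racing_game":    [0.1, 0.9, 0.5],
}

def recommend(user_vec, k=2):
    """Retrieve the k items whose embeddings have the highest dot
    product with the user-tower vector (exact search here; a real
    deployment would use ANN over a much larger catalog)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return heapq.nlargest(k, item_embeddings,
                          key=lambda item: dot(user_vec, item_embeddings[item]))

# A user who just searched for PlayStation games gets a gaming-leaning vector.
print(recommend([0.05, 0.9, 0.5]))  # → ['racing_game', 'ps5_controller']
```

This is the same pattern that makes the “homepage adapts within seconds” behavior possible: only the user vector changes at query time, while item embeddings stay indexed and searchable.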
We’re expanding use cases each quarter and integrating Vespa more deeply into our experimentation platforms and personalization engines. It’s a strategic enabler, letting us experiment faster and deliver real-time, AI-powered features across the business.
Closing Thoughts
Migrating to Vespa wasn’t just an engine swap—it was a transformation. From silos to synergy. From delays to real time. Vespa is now central to our infrastructure, enabling smarter, faster, and more unified search.