Solving the Personalization Problem in eCommerce (AMER Edition)

Part 3 of the AI in eCommerce Series

This webinar explores how to build real-time e-commerce personalization using Vespa’s search and ranking engine. Learn how to combine shopper behavior signals, product attributes, dense and sparse tensors, and multi-phase ranking to deliver personalized search results and recommendations in milliseconds. The session includes a live demo of semantic search, image search, and real-time product ranking at scale.

Note: This transcript has been cleaned up and formatted for readability using AI.

Bonnie Chase:
Welcome, everyone, and thanks for joining us today. My name is Bonnie, and I lead product marketing at Vespa. Today I’m joined by Zohar, our Lead Solution Architect.

This session is being recorded, and you will receive a copy of the recording within 48 hours after the session ends.

Today, we’ll be digging into how to solve one of the biggest challenges in e-commerce systems: delivering genuinely individualized experiences at scale without sacrificing latency.

For the structure of today’s session, we’ll start with a quick framing of the problem, then walk through the three things you need to get right: your data model, your ranking logic, and your ability to act on signals in real time.

After that, we’ll show it all in action in a live demo, and then close with key takeaways before Q&A.

Most personalization systems fail shoppers in one of three ways:

  • Stale data – shopper profiles updated on batch cycles instead of in real time
  • One-size-fits-all recommendations – demographic approximation instead of true personalization
  • Fragmented systems – signals spread across platforms that do not communicate

The result is personalization that often feels inaccurate and disconnected from shopper intent.

So what does good look like?

We think about it in three principles:

  • Model all your data
  • Rank, don’t just retrieve
  • Act on signals immediately

These three principles are what Vespa is built around.

Let me give you a concrete picture of what this pipeline looks like end-to-end, from a raw product image to a personalized result.

First, it’s about capturing product information.

We use an AI vision model to read each product image and automatically generate a natural language description, so you have rich, consistent product content at scale without manual copywriting for every SKU.

Then we extract product attributes such as color, material, style, and fit, each scored by confidence.

These attribute tags are what power personalization downstream.

Once we have the product data, it’s about building the shopper profile based on interactions.

Every interaction a shopper has — whether it’s a click, add to cart, or removal — updates their taste profile in real time.

Recent behavior is weighted more heavily than older behavior, so what somebody did 10 minutes ago matters more than what they did last week.
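To make the recency weighting concrete, here is a minimal Python sketch of one way to build such a profile, using an exponential half-life decay. The half-life, action weights, and event format are illustrative assumptions for this example, not Vespa's actual update mechanics.

```python
import time

# Illustrative half-life: an interaction from one hour ago counts
# half as much as one from right now.
HALF_LIFE_SECONDS = 3600.0

def decay_weight(event_time: float, now: float) -> float:
    """Exponential recency weight: 1.0 for 'now', halving every half-life."""
    age = now - event_time
    return 0.5 ** (age / HALF_LIFE_SECONDS)

def build_profile(events: list[tuple[float, dict[str, float], float]],
                  now: float) -> dict[str, float]:
    """Aggregate interaction events into a sparse taste profile.

    Each event is (timestamp, attribute_tags, action_weight), where
    action_weight might be +1.0 for a click, +3.0 for add-to-cart, and
    -2.0 for a removal (illustrative values).
    """
    profile: dict[str, float] = {}
    for ts, tags, action_weight in events:
        w = action_weight * decay_weight(ts, now)
        for tag, confidence in tags.items():
            profile[tag] = profile.get(tag, 0.0) + w * confidence
    return profile

now = time.time()
events = [
    (now - 600, {"green": 0.9, "sleeveless": 0.7}, 1.0),   # click, 10 min ago
    (now - 7 * 86400, {"red": 0.8}, 1.0),                  # click, last week
]
profile = build_profile(events, now)
# The 10-minute-old "green" signal dominates the week-old "red" one.
```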

Finally, Vespa scores every product in the catalog against that shopper’s live profile and returns the most relevant results in milliseconds.

The key thing is that all of these steps run inside a single platform.

Now I’ll pass it over to Zohar to go deeper on the building blocks, starting with the data model.

Zohar Nissare-Houssen:
Thank you, Bonnie.

Let’s look a little bit under the hood at how you would implement this in Vespa.

With Vespa, a single document captures everything a shopper cares about and everything the retrieval and ranking engine needs in one place.

This includes structured attributes like category, brand, title, price, stock, rating, and reviews.

You also have text fields like title and description, used for lexical retrieval and matching such as BM25, as well as inputs for embeddings.

One important point to note is the tensors.

In this example, you have dense tensors for text embeddings to enable semantic search.

You also have dense tensors representing images, enabling image-to-image and text-to-image search.

And finally, you have sparse tensors representing product attributes such as black, sleeveless, and silver.

These provide a compact, weighted representation of visual style attributes and allow lightweight personalization by comparing those attributes against the user profile.

This enables highly accurate retrieval across multiple search modalities: lexical, semantic, visual, structured, and behavioral.

All of this lives on the same document, is served from the same node, and is available in real time in a single query.

Most vector databases stop at a single dense vector per document.

Vespa’s tensor framework goes much further.

A tensor can be scalar, dense, sparse, or multi-vector.

This allows you to represent complex data modalities that flat vectors simply cannot capture.

It also enables tensor math at query time inside the engine where the data lives.

You can combine multiple signals into a single relevance score using dot products, sums, joins, and reductions.

This means you can blend lexical, semantic, visual, and business signals into your own custom ranking expression.
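As a rough illustration of what such a blend looks like, here is a Python sketch that combines a lexical score, a semantic similarity, a sparse-tensor personalization dot product, and a stock-based business rule into one relevance number. The weights, signal names, and out-of-stock penalty are invented for the example; in Vespa this logic would live in a ranking expression evaluated next to the data, not in application code.

```python
def sparse_dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product over the shared named dimensions of two sparse tensors."""
    return sum(v * b[k] for k, v in a.items() if k in b)

def blended_score(bm25: float,
                  semantic_sim: float,
                  user_profile: dict[str, float],
                  product_attrs: dict[str, float],
                  in_stock: bool) -> float:
    """Blend lexical, semantic, behavioral, and business signals.

    The 0.3 / 0.4 / 0.3 weights and the 0.5 stock penalty are
    illustrative placeholders, not recommended values.
    """
    personalization = sparse_dot(user_profile, product_attrs)
    score = 0.3 * bm25 + 0.4 * semantic_sim + 0.3 * personalization
    if not in_stock:
        score *= 0.5  # business rule: deprioritize low-inventory items
    return score

score = blended_score(
    bm25=2.1,
    semantic_sim=0.82,
    user_profile={"green": 0.8, "red": -0.5},
    product_attrs={"green": 0.9, "sleeveless": 0.6},
    in_stock=True,
)
```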

In addition, Vespa supports real-time ML inference directly as part of ranking.

You can execute ONNX models, learning-to-rank models, and neural re-rankers at query time, co-located with the data.

The result is advanced ranking based on shopper purchase propensity as well as business rules such as deprioritizing low-inventory items.


Multi-Phase Ranking

Now let’s look at the ranking engine.

One of the most common questions we get is: how do you stay fast at e-commerce scale without sacrificing accuracy?

The answer is multi-phase ranking.

Instead of running the most expensive scoring models on every candidate, Vespa filters and ranks progressively, spending compute only where it pays off.

Think of it as a funnel.

You may start with millions or billions of products.

  • Phase 1: lightweight retrieval and ranking
  • Phase 2: refined re-ranking on a smaller candidate set
  • Global phase: final most expensive model on the top candidates

This allows you to apply expensive ML models only to the most relevant items.
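The funnel above can be sketched in a few lines of Python. The per-phase scoring functions, document fields, and cutoffs (1000 → 100 → 10) are all made up for illustration; in Vespa the phases are declared in a rank profile and run inside the engine.

```python
def first_phase(doc: dict) -> float:
    """Cheap score over many candidates, e.g. a lexical-match proxy."""
    return doc["bm25"]

def second_phase(doc: dict) -> float:
    """Refined score on fewer candidates, mixing in a semantic signal."""
    return 0.5 * doc["bm25"] + 0.5 * doc["semantic"]

def global_phase(doc: dict) -> float:
    """Most expensive model, run only on the final few candidates."""
    return second_phase(doc) + doc["neural_rerank"]

def rank_funnel(docs: list[dict], k1: int = 1000, k2: int = 100,
                k3: int = 10) -> list[dict]:
    """Progressively narrow candidates, spending compute where it pays off."""
    phase1 = sorted(docs, key=first_phase, reverse=True)[:k1]
    phase2 = sorted(phase1, key=second_phase, reverse=True)[:k2]
    return sorted(phase2, key=global_phase, reverse=True)[:k3]

# Deterministic toy catalog of 10,000 "products".
catalog = [{"id": i,
            "bm25": (i * 37) % 100 / 100,
            "semantic": (i * 53) % 100 / 100,
            "neural_rerank": (i * 71) % 100 / 100}
           for i in range(10_000)]
top = rank_funnel(catalog)
```

Only the ten survivors of the first two phases ever see the expensive global-phase model, which is the whole point of the funnel.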

That’s how Vespa delivers machine-learning-powered accuracy without giving up latency.


Live Demo

Now let’s see how this works in real time.

For those who have been following our webinar series, this may look familiar — it’s the Vespa demo shop.

This demonstrates an e-commerce use case with fashion accessories and apparel.

We chose this example because it offers a lot of opportunity to demonstrate search accuracy and granular personalization in real time.

As we discussed, the Vespa data model is highly flexible.

Within the same document, you can represent lexical fields, structured attributes, and sparse tensors.

For example, we can model user preferences as a sparse tensor.

Let’s say a shopper has a strong preference for green and yellow items, and a negative preference for red items.

Once those preferences are saved, the featured products immediately update to reflect that personalization.

Now the shopper sees green and yellow items surfaced more prominently.

Under the hood, this happens through a dot product between two sparse tensor representations:

  • the user preference tensor
  • the product attribute tensor

Products with stronger matches rank higher.
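Here is a small Python sketch of that dot product in action, using the demo's preferences. The attribute weights on the products are hypothetical confidence scores, and the item names are invented for illustration.

```python
def sparse_dot(user: dict[str, float], product: dict[str, float]) -> float:
    """Dot product over shared named dimensions (sparse tensors as dicts)."""
    return sum(w * product[attr] for attr, w in user.items() if attr in product)

# Demo scenario: strong preference for green and yellow, negative for red.
user_prefs = {"green": 1.0, "yellow": 0.8, "red": -1.0}

catalog = {
    "green dress":  {"green": 0.9, "sleeveless": 0.5},
    "yellow scarf": {"yellow": 0.95},
    "red handbag":  {"red": 0.9, "leather": 0.7},
}

ranked = sorted(catalog,
                key=lambda name: sparse_dot(user_prefs, catalog[name]),
                reverse=True)
# Green and yellow items surface first; the red item drops to the bottom.
```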

We can also demonstrate additional search capabilities.

For example, let’s say I’m looking for gold shoes.

With text-to-image search, the results become significantly more accurate.

These are all items visually aligned with the search description.

This is done by embedding the search query using the same embedding model used for the product images.

We can also demonstrate image-to-image search.

If I upload an image of a Donald Duck T-shirt, Vespa returns the exact item first, followed by visually similar items.

Again, this works by embedding the uploaded image and performing similarity search across product image embeddings.
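Conceptually, that similarity search can be sketched with cosine similarity over toy four-dimensional embeddings; real image models produce vectors with hundreds of dimensions, and the item names and values here are invented for illustration.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 4-dim "image embeddings".
query_embedding = [0.9, 0.1, 0.0, 0.4]            # the uploaded image
catalog = {
    "exact item":     [0.9, 0.1, 0.0, 0.4],       # identical embedding
    "similar item":   [0.8, 0.2, 0.1, 0.3],       # visually close
    "unrelated item": [0.0, 0.9, 0.8, 0.1],       # dissimilar
}
results = sorted(catalog,
                 key=lambda k: cosine(query_embedding, catalog[k]),
                 reverse=True)
# The exact item ranks first, followed by visually similar items.
```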

Now let’s look at dense personalization.

Suppose I search for emerald earrings.

As I click on specific styles, the “you may also like” section updates in real time to show visually similar products.

Each click refines the user embedding profile.
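One simple way to sketch that refinement is an exponential moving average that nudges a dense user profile toward each clicked item's embedding. The three-dimensional vectors and the alpha value are illustrative assumptions, not the model actually used in the demo.

```python
def refine_profile(profile: list[float],
                   clicked_embedding: list[float],
                   alpha: float = 0.3) -> list[float]:
    """Move the dense profile a fraction alpha toward the clicked item."""
    return [(1 - alpha) * p + alpha * c
            for p, c in zip(profile, clicked_embedding)]

profile = [0.0, 0.0, 0.0]
for clicked in ([1.0, 0.0, 0.5], [0.8, 0.1, 0.6]):  # two clicks, similar styles
    profile = refine_profile(profile, clicked)
# The profile drifts toward the shared style of the clicked items.
```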

Dense personalization allows the system to capture highly nuanced style preferences such as:

  • shape
  • color tone
  • size
  • design style

This is especially powerful for categories like jewelry, where shopper taste is highly specific.

Both sparse and dense personalization are valid approaches.

Sparse embeddings offer readability and direct control.

Dense embeddings offer stronger performance and accuracy at scale.

This demonstrates the flexibility Vespa provides in tensor modeling and real-time personalization.

I’ll now hand it back to Bonnie.

Bonnie Chase:
I hope you enjoyed that demo.

Now let’s get into some key takeaways.

The first key takeaway is that generic personalization can be costly.

Stale batch signals and segment-based recommendations can quickly erode relevance within a session.

This impacts click-through rates, average order value, and shopper confidence.

The second key takeaway is around tensors.

Sparse tensors make shopper intent readable.

Unlike dense embeddings, sparse tensors have named dimensions like color and style that are human-readable, debuggable, and directly updatable.

You don’t need to retrain a model to update a preference — you simply update the value.

The third takeaway is that real-time is an architectural choice.

When your system supports real-time updates and ranking, session signals such as clicks, favorites, filters, and removals reach the ranking layer in milliseconds.

This is not a bolt-on feature.

It’s a foundational architectural decision.

And finally: one platform, not five services.

Retrieval, ML inference, personalization, and merchandising controls all run inside one platform.

No ETL pipelines.

No separate scoring service.

No stitching systems together.

That simplicity is what makes real-time possible.

Now we’ll open it up for questions.

FAQ: Common Questions About Real-Time Personalization

How do you solve the cold start problem for new users?

New users can begin with default preferences or popularity-based ranking.

As soon as they begin interacting, the user profile updates in real time.

How is Vespa different from external re-rankers?

Vespa performs ranking where the data lives.

This removes network hops and enables lower latency.

How do you model user preferences?

User profiles are stored as tensors and updated continuously based on session behavior.