Real-Time Indexing at Any Scale

Real-time indexing keeps the searchable corpus current by making new and updated documents available for retrieval immediately after they are written.

Role in the AI Retrieval Workflow

Real-time indexing is the stage of the AI Retrieval Workflow that continuously incorporates new and updated documents into the searchable corpus.

Why Real-Time Indexing Matters

Real-time indexing is what separates search that reflects the world right now from search that's always a step behind. Most search and recommendation systems treat freshness as an afterthought: data is batch-indexed on a schedule, and "real-time" means a lag of a few minutes to a few hours. That lag has a real cost: stale inventory gets sold, stale prices get shown, and stale signals make recommendations worse with every passing minute.
Vespa was built the opposite way. Real-time indexing isn't a mode you turn on; it's the default. Every write, whether a full document or a partial update, follows the same low-latency path: it is validated, indexed, and made visible to retrieval, ranking, grouping, and sorting, typically in under a second.
No batch windows. No reindex jobs. No stale results.
That architecture is what makes it possible to:

  • Reflect change as it happens. Price drops, inventory counts, click signals, and fraud scores are queryable the moment they're written, not after the next nightly job.
  • Update without rewriting the document. Partial updates change a single field, such as a price or a popularity counter, without resending the full document, supporting tens of thousands of updates per second per node.
  • Scale writes and reads independently. Nodes can be added or removed without taking the index offline or rebuilding it from scratch. Elasticity is built into the architecture, not bolted on.
  • Use freshness as a ranking signal, not just a display value. Because updates are visible immediately, ranking expressions can use live signals like recency, popularity, and inventory in the same query that serves the result.
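
To make the last point concrete, here is a minimal Python sketch of a score that blends text relevance with live signals. It is an illustration only, not a Vespa rank profile: the function name, the formula, the one-hour half-life, and the popularity weight are all assumptions chosen for the example.

```python
import math

def freshness_score(relevance, age_seconds, popularity, in_stock,
                    half_life_seconds=3600.0, popularity_weight=0.1):
    """Blend a relevance score with live signals (hypothetical formula)."""
    if not in_stock:
        return 0.0  # out-of-stock items drop to the bottom of the results
    # Exponential decay: the recency factor halves every half_life_seconds.
    recency = 0.5 ** (age_seconds / half_life_seconds)
    # log1p dampens popularity so large counters do not dominate the score.
    return relevance * recency + popularity_weight * math.log1p(popularity)

# Two otherwise identical documents, one updated a minute ago, one six hours ago.
fresh = freshness_score(1.0, 60, 50, True)
stale = freshness_score(1.0, 6 * 3600, 50, True)
```

A score like this is only as good as its inputs: if the age, popularity, or stock values lag behind reality, the ranking lags with them.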

How Real-Time Indexing Works

Attribute fields are what enable sub-second indexing and updates. They live in memory and update in place, so changing a price or a counter never means reading the full document from disk, modifying it, and writing it back. The value just changes where it sits.
Index fields, used for full-text matching, follow a similar in-memory-first path: writes land fast, and the heavier work happens in the background, without ever blocking reads or writes.
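
The in-place idea can be sketched with a toy Python model. This is a conceptual illustration under simplifying assumptions, not Vespa's actual data structure: the class name `AttributeColumn` and its methods are hypothetical.

```python
class AttributeColumn:
    """Toy model of an in-memory attribute: one value per document slot."""

    def __init__(self):
        self._values = []    # dense in-memory array, indexed by slot
        self._slot_of = {}   # document ID -> slot in the array

    def put(self, doc_id, value):
        slot = self._slot_of.get(doc_id)
        if slot is None:
            # First write for this document: allocate a new slot.
            self._slot_of[doc_id] = len(self._values)
            self._values.append(value)
        else:
            # Later writes overwrite the value where it sits; no document
            # is read from disk, modified, and written back.
            self._values[slot] = value

    def get(self, doc_id):
        return self._values[self._slot_of[doc_id]]

price = AttributeColumn()
price.put("sku-1", 10.0)
price.put("sku-2", 20.0)
price.put("sku-1", 8.5)   # price drop: a single in-place overwrite
```

After the last call, `price.get("sku-1")` returns the new value and the column still holds exactly two slots.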

Without real-time indexing, teams either accept staleness (batch windows, scheduled reindexing, eventual-but-slow consistency) or bolt together a separate hot-path system for "live" data and a separate system for everything else, doubling the infrastructure and the ways the two can disagree.
Vespa removes that split: one system, one write path, and the data is live the moment it lands.
Beyond real-time indexing, the write path also provides:

  • No read-modify-write tax: Attribute updates change values in place in memory, instead of reading the full document, applying the change, and writing it back to disk.
  • High-throughput partial updates: Built for tens of thousands of updates per second per node, not occasional corrections.
  • Background-only maintenance: Index merges and attribute flushes happen without pausing reads or writes, so freshness never costs you availability.
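
As a sketch of what a partial update looks like on the wire, the Python below builds a request for Vespa's /document/v1 HTTP API, where a PUT carries a "fields" object mapping each field to an update operation such as "assign" or "increment". The endpoint, namespace, document type, document ID, and field names are hypothetical, and the helper `build_partial_update` is an assumption for illustration. The code only constructs the request; it does not send it.

```python
import json
from urllib.parse import quote

def build_partial_update(endpoint, namespace, doc_type, doc_id, operations):
    """Return (method, url, body) for a /document/v1 partial update.

    operations maps a field name to an update operation,
    for example {"price": {"assign": 9.99}}.
    """
    url = "{}/document/v1/{}/{}/docid/{}".format(
        endpoint.rstrip("/"),
        quote(namespace, safe=""),
        quote(doc_type, safe=""),
        quote(doc_id, safe=""),
    )
    # Only the changed fields are sent; the rest of the document is untouched.
    body = json.dumps({"fields": operations}, sort_keys=True)
    return "PUT", url, body

# Hypothetical application: namespace "shop", document type "product".
method, url, body = build_partial_update(
    "http://localhost:8080", "shop", "product", "sku-123",
    {"price": {"assign": 9.99}, "popularity": {"increment": 1}},
)
```

The request body stays a few dozen bytes regardless of how large the stored document is, which is part of why high-frequency updates are cheap to send.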

Configuration

Real-time performance depends on a combination of schema and cluster configuration. These are the settings that determine how quickly writes become searchable and how efficiently they are processed:

  • Attribute fields: In-memory, in-place updates that are the fastest path for high-frequency changes.
  • Partial updates: Update one or more fields by document ID, without resending the whole document.
  • Elastic content nodes: Add or remove capacity without taking the index offline.
  • Transaction log: A write-ahead log that durably records each write before it is acknowledged.
  • Consistency model: Tunable eventual consistency, trading strict ACID guarantees for speed and availability.
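
The write-ahead pattern behind the transaction log can be sketched in a few lines of Python. This is a generic toy model of the technique, not Vespa's implementation: the class `LoggedStore`, the JSON-lines log format, and the file name are assumptions, and a real log would also handle a torn final record and log pruning.

```python
import json
import os
import tempfile

class LoggedStore:
    """Toy key-value store that logs every write before acknowledging it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # On startup, rebuild in-memory state from the log, oldest entry first.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                self.data[entry["id"]] = entry["value"]

    def write(self, doc_id, value):
        # 1. Append to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"id": doc_id, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Apply the change in memory.
        self.data[doc_id] = value
        # 3. Only now acknowledge the write to the caller.
        return True

path = os.path.join(tempfile.mkdtemp(), "tx.log")
store = LoggedStore(path)
acknowledged = store.write("sku-123", 9.99)

# Simulate a restart: a new instance recovers state from the log alone.
recovered = LoggedStore(path)
```

Because the acknowledgement comes after the log append, any write the client saw confirmed is present in the log and reappears in `recovered.data` after the simulated restart.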

Learn with Vespa

Learn how to build search, recommendation, and RAG applications with Vespa through a free, self-paced course that combines hands-on exercises with links to the documentation.

  • Partial Updates: How to update documents by ID without resending full content.
  • Attributes: How in-memory attribute fields enable fast, in-place updates.
  • Indexing: How Vespa processes and routes fields during indexing.
  • Feed Sizing Guide: Tuning for high-throughput feeding and updates.
  • Schemas: How fields are defined as attribute, index, or both.
  • Reindexing: How Vespa handles schema and indexing pipeline changes without downtime.

Frequently Asked Questions

Need more than a quick answer?

If these FAQs don't answer your question, there are several ways to continue:

Learn the fundamentals with our free online training at learn.vespa.ai.

Experience Vespa yourself with a free Vespa Cloud trial.

Watch the Getting Started with Vespa AI Search YouTube video.

Contact our team to discuss your application or migration project.

What is indexing, and why isn't it just called updating?
Indexing is the process of making data searchable. When a document changes, Vespa updates not only the stored document but also the internal indexes that enable fast retrieval, filtering, ranking, and sorting. That's why the process is called indexing rather than simply updating: the goal is to make changes immediately available to the AI Retrieval Workflow, not just to store new data.
How fast is "real time" in Vespa?
Indexing latency is typically sub-second from the moment a document is fed until it's searchable, rankable, and groupable. Exact latency depends on the use case, available resources, and system tuning.
Does updating a field require resending the whole document?
No. Partial updates let you change one or more fields by addressing the document by its ID and sending only the new values, without needing to read or resend the full document. That is what makes high-frequency updates affordable at scale.
Does Vespa support batch ingestion for large-scale loads?
Not as a separate concept. A dedicated batch-ingestion path would conflict with the elasticity and sub-second indexing latency that are core to Vespa's architecture. High-throughput feeding, including updates to large or entire document sets, is supported and tunable, but it goes through the same real-time write path as everything else.
What makes updates fast in Vespa?
Mostly architecture. Fields marked as attributes are stored in memory and updated in place, avoiding the read-modify-write pattern that slows down disk-backed updates. This is what allows tens of thousands of updates per second per node.
What happens if a node fails right after a write?
Writes are logged to a transaction log before being acknowledged, so they are only confirmed once they're durably recorded and can be replayed when the node restarts. Vespa uses a tunable eventual consistency model rather than strict ACID transactions, trading some traditional guarantees for the performance and availability real-time applications need.