Vespa and Elasticsearch / Solr (Lucene)

Venn diagram

With its focus on big data serving, Vespa is optimized for:

  • Low millisecond response
  • High write and query load
  • Machine Learning integration
  • Automated high availability operations

Vespa supports true realtime writes, true partial updates, and is also easy to operate at large scale. Vespa is the only open source platform optimized for such big data serving.

For Solr users, see "How I learned Vespa by thinking in Solr".

See also the Q&A and recording of the meetup "The Great Search Engine Debate - Elasticsearch, Solr or Vespa?".

Analytics vs. Big Data Serving

To decide whether Elasticsearch or Vespa is the right choice for a use case, consider whether that use case needs to be optimized for analytics or for serving.

Analytics                                        | Big data serving
Response time in low seconds                     | Response time in low milliseconds
Low query rate                                   | High query rate
Time series, append only                         | Random writes
Downtime, data loss acceptable                   | High availability, no data loss, online redistribution
Massive data sets (trillions of docs) are cheap  | Massive data sets are more expensive
Analytics GUI integration                        | Machine learning integration

Scaling

The fundamental unit of scale in Elasticsearch is the shard. Sharding allows scale-out by partitioning the data into smaller chunks that can be distributed across a cluster of nodes. The challenge is to figure out the right number of shards, because you only get to make the decision once per index. The choice affects performance, storage, and scale, since queries are sent to all shards. So how many shards is the right number?
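To illustrate why the shard count is hard to change after the fact, here is a minimal sketch that assumes documents are routed to shards by a hash of their ID modulo the shard count. The hash function, the `shard_for` and `fraction_moved` helpers, and the document IDs are illustrative assumptions, not Elasticsearch's actual implementation.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard using a stable hash modulo the shard count."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    moved_count = sum(
        1 for d in doc_ids if shard_for(d, old_shards) != shard_for(d, new_shards)
    )
    return moved_count / len(doc_ids)


# Hypothetical document IDs, used only for this illustration.
doc_ids = [f"doc-{i}" for i in range(10_000)]
moved = fraction_moved(doc_ids, old_shards=5, new_shards=6)
print(f"{moved:.0%} of documents change shard when going from 5 to 6 shards")
```

Under this model, going from 5 to 6 shards leaves a document in place only when its hash gives the same remainder for both counts, so roughly five in six documents would have to move. That is why a change in shard count amounts to rebuilding the index rather than adjusting it.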

In Vespa, you do not have to worry about the number of shards or about re-sharding, because Vespa manages data distribution itself. You have a cluster of nodes, and you can add or remove nodes without re-sharding, so there is no re-sharding downtime.

Vespa allows applications to grow (and shrink) their hardware while serving queries and accepting writes as normal. Data is automatically redistributed in the background with a minimal amount of data movement. No restarts or other operations are needed: just change the hardware listed in the configuration and redeploy the application.
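The idea of redistributing with minimal data movement can be sketched with rendezvous (highest-random-weight) hashing. This is a stand-in chosen for illustration, not Vespa's actual distribution algorithm; the node names, document IDs, and the `score` and `owner` helpers are hypothetical.

```python
import hashlib


def score(doc_id: str, node: str) -> int:
    """Stable pseudo-random weight for a (node, document) pair."""
    digest = hashlib.md5(f"{node}:{doc_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(doc_id: str, nodes) -> str:
    """Assign the document to the node with the highest weight for it."""
    return max(nodes, key=lambda n: score(doc_id, n))


# Hypothetical cluster and documents, used only for this illustration.
doc_ids = [f"doc-{i}" for i in range(10_000)]
before = ["node0", "node1", "node2", "node3"]
after = before + ["node4"]

# Compare ownership before and after the fifth node joins.
moves = [(owner(d, before), owner(d, after)) for d in doc_ids]
changed = [(src, dst) for src, dst in moves if src != dst]
fraction = len(changed) / len(doc_ids)
print(f"{fraction:.0%} of documents move after adding a fifth node")
```

In this sketch, a document changes owner only if the new node outscores its current owner, so every move goes to the new node and about one fifth of the data is transferred. The existing nodes never exchange data with each other, which is the property that lets a cluster grow while it keeps serving.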

For a detailed guide on how to set up a multi-node Vespa system, see Multi-Node Quick Start.

Further reading: