Vespa at Work:

Onyx: 25% More Cost Efficient with Vespa Cloud

Onyx’s journey shows how even a simple, automated configuration change, powered by real-time insights, can yield big results.

At a Glance

  • Challenge: Rapid growth led to overprovisioned infrastructure and increasing costs.
  • Solution: Vespa Cloud’s Resource Suggestions and automated scaling guided cost-efficient configuration changes.
  • Outcome: ~24.5% reduction in infrastructure cost with zero downtime and minimal engineering effort. Onyx can focus engineering resources on product innovation while maintaining performance.

Introduction

Onyx is an open-source AI platform that helps organizations unlock and leverage institutional knowledge scattered across documents, tools, and conversations. Built for both fast-moving teams and secure enterprises, Onyx delivers AI-powered workplace search, custom AI assistants, and developer APIs that integrate knowledge into daily workflows.

Growing usage and expanding data volumes meant Onyx needed infrastructure that could scale efficiently without ballooning costs, which made optimizing their cloud deployment a priority.

The Challenge: Scaling without Slowing Down

As Onyx’s user base grew, so did the complexity of managing their self-hosted Vespa deployment. Vespa’s architecture offers powerful flexibility: teams can scale horizontally, resize clusters, and tune performance parameters. But identifying the ideal balance between cost and performance takes time, experimentation, and monitoring. For a lean, fast-moving team like Onyx, the priority was clear: focus on building, not infrastructure management.

In the early stages, the infrastructure strategy was straightforward: avoid running out of disk or memory, and provision enough CPU to keep query performance strong. But as Onyx onboarded more customers, ingested more data, and took on heavier workloads, system demands grew significantly.

By this point, schemas had stabilized, and usage patterns were clearer. There was finally enough operational data and traffic to assess optimization opportunities and identify the right balance of performance and cost: what is the sweet spot, given the current load? In theory, most platforms allow teams to detect overprovisioned resources and right-size clusters.

The real challenge, however, wasn’t identifying what to change; it was finding the time to make the changes. Reconfiguring infrastructure requires engineering hours and operational effort, both of which were in short supply for a team focused on delivering new features and meeting customer demand.

Their early move to Vespa Cloud proved valuable. It allowed them to offload operational complexity while preserving the scalability and responsiveness of their system.

The Solution: Vespa Cloud Resource Suggestions

Because Onyx operated on Vespa Cloud, the team had access to built-in monitoring, insights, and resource recommendations that dramatically simplified cost optimization without sacrificing performance or stability.

Key Capabilities

Performance Monitoring & Insights

Continuously tracks usage patterns and highlights optimization opportunities.

Resource Suggestions

Analyzes historical workload data and recommends optimal configurations tailored to actual usage.

Automated Instance Migrations

Executes upgrades or instance-type changes seamlessly via Vespa Cloud’s automated deployment pipeline.
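Vespa Cloud's actual recommendation logic is not described in this case study. As a rough illustration of the right-sizing idea behind Resource Suggestions, the Python sketch below picks the cheapest node type and node count that covers observed peak CPU and memory demand at a target utilization. The node types, prices, the 0.7 target utilization, the two-node minimum, and the names `NodeType`, `Suggestion`, and `suggest` are all hypothetical; this is not Vespa Cloud's API or algorithm.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    # Hypothetical node type: per-node capacity and hourly price.
    name: str
    vcpu: float
    memory_gb: float
    price_per_hour: float


@dataclass(frozen=True)
class Suggestion:
    node: NodeType
    count: int
    hourly_cost: float


def suggest(peak_vcpu: float, peak_memory_gb: float, catalog: list[NodeType],
            target_util: float = 0.7, min_nodes: int = 2) -> Suggestion:
    """Return the cheapest (node type, count) whose total capacity covers
    peak demand while keeping utilization at or below target_util.

    Illustrative heuristic only, not Vespa Cloud's algorithm.
    """
    # Inflate observed peaks so the cluster runs at target_util, not 100%.
    need_cpu = peak_vcpu / target_util
    need_mem = peak_memory_gb / target_util
    best = None
    for node in catalog:
        # Whichever dimension (CPU or memory) binds decides the node count.
        count = max(
            min_nodes,
            math.ceil(need_cpu / node.vcpu),
            math.ceil(need_mem / node.memory_gb),
        )
        cost = count * node.price_per_hour
        if best is None or cost < best.hourly_cost:
            best = Suggestion(node, count, cost)
    if best is None:
        raise ValueError("catalog is empty")
    return best


catalog = [
    NodeType("cpu-heavy", vcpu=16, memory_gb=32, price_per_hour=1.0),
    NodeType("mem-optimized", vcpu=8, memory_gb=64, price_per_hour=0.9),
]
s = suggest(peak_vcpu=20, peak_memory_gb=300, catalog=catalog)
print(s.node.name, s.count)  # → mem-optimized 7
```

For this memory-bound example workload, the CPU-heavy type needs 14 nodes (224 vCPUs) just to hold the data, while the memory-optimized type covers the same demand with 7 nodes (56 vCPUs) at a lower hourly cost. That is the same shape of change described in this case study: more memory per node and fewer total CPUs.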

Results: Simple Changes, Significant Savings

Vespa Cloud’s Resource Suggestions surfaced a clear imbalance in Onyx’s infrastructure: memory utilization was healthy, but CPU capacity was far higher than required.

Based on usage patterns, Vespa Cloud recommended a reconfigured cluster with larger memory-optimized nodes and fewer total CPUs. This adjustment:

  • Reduced hourly infrastructure cost from ~$114/hr to ~$84/hr.
  • Delivered ~24.5% overall cost reduction on cluster spend.
  • Required minimal manual effort: a three-line configuration update followed by automated deployment.
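To put the quoted hourly rates in perspective, the short Python calculation below derives the hourly savings, the percentage reduction, and a projected annual figure. It assumes the cluster runs continuously at the two rounded rates quoted in the results (~$114/hr and ~$84/hr); `savings` and `HOURS_PER_YEAR` are illustrative names. These rounded rates imply a reduction of about 26%, slightly higher than the ~24.5% reported on overall cluster spend, so the annual projection is an approximation rather than a reported result.

```python
# Assumes the cluster runs 24/7 all year (leap years ignored).
HOURS_PER_YEAR = 24 * 365


def savings(old_rate: float, new_rate: float) -> tuple[float, float, float]:
    """Return (hourly savings, percent reduction, projected annual savings)."""
    hourly = old_rate - new_rate
    percent = hourly / old_rate * 100
    return hourly, percent, hourly * HOURS_PER_YEAR


# Rounded hourly rates from the results above.
hourly, percent, annual = savings(114.0, 84.0)
print(f"${hourly:.0f}/hr saved, {percent:.1f}% lower, ~${annual:,.0f}/yr")
# → $30/hr saved, 26.3% lower, ~$262,800/yr
```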

Best of all, the process completed with zero downtime or performance impact, thanks to Vespa Cloud’s automated migration and prioritization of user traffic.

What’s Next for Onyx

With fundamental improvements in place, Onyx can continue to experiment with advanced optimization strategies like topology tuning, streaming search configurations, and embedding strategies, all within Vespa Cloud’s safe experimentation environment.

The team is now poised to scale confidently, knowing that resource efficiency and performance are manageable without diverting focus from core product development.

Conclusion

Onyx’s experience highlights a broader insight: Infrastructure optimization doesn’t have to be complex or risky.

With Vespa Cloud’s resource insights and automated tooling, teams can continuously tune deployments, reducing waste, controlling costs, and staying focused on innovation.

Continuous optimization becomes a low-effort, high-impact part of the lifecycle, not a periodic burden.
