PR Newswire
10 Jun 2026, 03:30 GMT+10
Joint benchmarks on OCI H100 infrastructure showed 10x more concurrent users, 10x higher token throughput, and 7x more tokens served without adding GPUs
CAMPBELL, Calif., June 10, 2026 /PRNewswire/ -- WEKA, the AI data and memory infrastructure company, today announced production-scale benchmarks that show how organizations can improve the economics of long-context AI inference by serving more users and tokens on the same GPU footprint. The benchmarks show that WEKA's NeuralMesh platform with Augmented Memory Grid on Oracle Cloud Infrastructure (OCI) serves 10x more concurrent users, delivers 10x higher token throughput, and produces 7x more tokens per GPU than DRAM-only configurations without adding infrastructure. The results were validated on a nine-node OCI bare-metal H100 cluster with 100,000-token context windows.
"Enterprise AI workloads are pushing context windows and GPU utilization to new limits," said Pablo Selem, senior director, software development, Oracle Cloud Infrastructure. "These benchmarks show how WEKA's NeuralMesh platform with Augmented Memory Grid on OCI helps remove memory bottlenecks so customers can support larger, more demanding inference workloads without simply adding more GPUs."
Three Outcomes That Change the Math on Inference
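Validated at production scale on a bare-metal H100 cluster (nine nodes, 72 GPUs, 100,000-token context windows, thousands of concurrent users), NeuralMesh with Augmented Memory Grid on OCI delivered three results relative to DRAM-only configurations: 10x more concurrent users, 10x higher token throughput, and 7x more tokens per GPU.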
"Inference is bottlenecked by how much effective memory is available to GPUs," said Liran Zvibel, CEO of WEKA. "These results prove that AI token economics aren't solved by hardware alone; they're solved by eliminating the memory wall that has been the real ceiling on what existing hardware can do. NeuralMesh with Augmented Memory Grid running on OCI brings orders of magnitude more tokens to customers in an extremely cost-efficient way."
Transforming AI Economics with Context Memory Infrastructure
As inference demand grows, AI infrastructure inefficiencies compound. Every key-value (KV) cache eviction is a tax: on GPU cycles, latency, user experience, and the cost of every token served. For long-context and agentic workloads, where inputs routinely run to 100,000 tokens or more, that tax is not a rounding error. It is a direct hit on the unit economics of every organization running production AI.
Augmented Memory Grid, a capability of NeuralMesh, solves the problem at the architectural level by decoupling KV cache from local GPU memory and storing it in a high-performance token warehouse accessible across the cluster. Any host can serve any session with cache hits intact, eliminating rigid session stickiness while delivering superior performance to DRAM, improving load balancing, and enabling clean horizontal scaling as concurrency grows. The result is persistent context memory for AI agents and the cost lever that makes long-context inference economical to run at scale.
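To illustrate why decoupling the KV cache from individual hosts removes the need for session stickiness, the following is a minimal toy sketch, not WEKA's implementation. It assumes a simple round-robin router and models a cache as a set of session IDs; the function name `simulate` and all of its parameters are hypothetical and exist only for this illustration.

```python
import itertools


def simulate(num_hosts, sessions, turns, shared):
    """Count cache hits and misses under round-robin routing (toy model).

    shared=False: each host keeps its own cache, so a session only hits
    when it happens to land on a host that has served it before.
    shared=True: all hosts read and write one cluster-wide cache.
    """
    shared_cache = set()                              # hypothetical cluster-wide pool
    local_caches = [set() for _ in range(num_hosts)]  # one cache per host
    router = itertools.cycle(range(num_hosts))        # no session stickiness
    hits = misses = 0
    for _ in range(turns):
        for session in sessions:
            host = next(router)
            cache = shared_cache if shared else local_caches[host]
            if session in cache:
                hits += 1
            else:
                misses += 1          # context would have to be recomputed
                cache.add(session)
    return hits, misses


if __name__ == "__main__":
    print("host-local caches:", simulate(3, ["a", "b"], 3, shared=False))
    print("shared cache:     ", simulate(3, ["a", "b"], 3, shared=True))
```

In this toy setup, with three hosts and two sessions over three turns, the host-local configuration misses on every request because each session keeps landing on a host that has not seen it, while the shared configuration misses only on the first request of each session. Real systems differ in many ways (cache capacity, eviction, transfer latency), none of which are modeled here.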
Production-Grade Proof
OCI published the full benchmark methodology, system configuration, and results on its AI & Data Science blog on May 13, 2026. The benchmarks, executed on a nine-node OCI bare-metal H100 cluster, move beyond the prior phase of validation, which demonstrated 1000x more KV cache capacity and up to 20x faster time to first token at 128,000 tokens. This latest phase tests the full economics of inference in production: concurrency density, sustained throughput, cache persistence, and service level objective (SLO) stability when demand spikes under high load.
Available on Oracle Marketplace
NeuralMesh with Augmented Memory Grid is generally available to WEKA customers and on the Oracle Marketplace, with OCI as WEKA's exclusive cloud launch partner. Organizations running long-context inference on OCI can deploy a validated, production-ready architecture today. For more on the OCI and WEKA Augmented Memory Grid benchmark, read the OCI blog: https://blogs.oracle.com/ai-and-datascience/scaling-long-context-inference-on-oci-with-wekas-augmented-memory-grid.
About WEKA
WEKA is the AI data and memory infrastructure company transforming the economics of agentic AI. Its NeuralMesh platform unifies high-performance data storage with extended GPU memory, giving enterprises, AI cloud providers, and AI builders a single foundation for training, inference, and agentic workloads. With Augmented Memory Grid, NeuralMesh extends GPU memory capacity by 1000x, accelerates time to first token by up to 20x, and delivers 10x more concurrent users from the same GPU footprint, proven in production benchmarks. Trusted by 30% of the Fortune 50, WEKA enables organizations to scale AI faster, optimize GPU utilization, and reduce the cost of every token served. Learn more at www.weka.io or connect with us on LinkedIn and X.
WEKA and the W logo are registered trademarks of WekaIO, Inc. Other trade names herein may be trademarks of their respective owners.
