Zayn Asghar, an adjunct professor at Stanford University and successful founder, has raised $80 million in Series A for a startup that cleverly solves the bottleneck problem of AI inference. The round was led by Menlo Ventures.
His startup, Gimlet Labs, has built what it describes as the first "multi-silicon inference cloud": software that lets AI workloads run simultaneously across different types of hardware, splitting the work of AI apps across traditional CPUs, AI-tuned GPUs, and high-memory systems.
"We basically run on whatever hardware is available," Asghar told TechCrunch.
A single agent may chain multiple steps, each of which “requires different hardware: inference is compute-dependent, decoding is memory-dependent, and tool invocation is network-dependent,” Menlo lead investor Tim Tully wrote in a blog post about the funding.
There isn't a single chip that can do it all yet, but as new hardware rolls out and aging GPUs are redeployed, "a multi-silicon fleet is ready. We're just missing the software layer to make it work." That software layer is what Tully believes Gimlet Labs will deliver.
If current trends in compute build-out continue, McKinsey estimates that data center spending will reach nearly $7 trillion by 2030. Yet Asghar said hardware that is already deployed is actually in use only "between 15 and 30 percent" of the time.
"There's another way to think about this: You're wasting hundreds of billions of dollars because you're just leaving resources idle," he said. "Our goal was essentially to figure out how to make AI workloads 10x more efficient than they have ever been before."
So he and co-founders Michelle Nguyen, Omid Azizi, and Natalie Serrino set out to build orchestration software that could split agent workloads and distribute them across all types of hardware simultaneously.
Gimlet Labs claims that it can reliably speed up AI inference by 3x to 10x for the same cost and power. Gimlet says the underlying model can also be sliced to run across different architectures, using the best chip for each part of the model.
The company already has partnerships with chipmakers including Nvidia, AMD, Intel, Arm, Cerebras, and d-Matrix.
Gimlet's products, delivered as software or through APIs to its proprietary Gimlet Cloud, are not aimed at general AI app developers but at the largest AI model labs and data centers.
The company emerged from stealth in October and says it has generated eight-figure revenue (at least $10 million) since its inception. Asghar said its customer base has more than doubled in the past four months and now includes major model makers and very large cloud computing companies, though he declined to name them.
The co-founders previously worked together at Pixie, a startup that developed open-source observability tools for Kubernetes. Pixie was acquired by New Relic in 2020, just two months after launching with a $9 million Series A led by Benchmark. (Pixie's technology is now part of the open-source organization that oversees Kubernetes.)
After Asghar met Tully by chance about a year ago, and after angel investments from Stanford professors, venture capitalists started calling. Soon after, a term sheet landed on Asghar's desk. When other VCs heard that Asghar was considering an offer, the round quickly maxed out because "we had quite a lot of money," he said.
Including its earlier seed round, the startup has now raised a total of $92 million, with backing from a number of angels, including Sequoia's Bill Coughran, Stanford professor Nick McKeown, former VMware CEO Raghu Raghuram, and Intel CEO Lip-Bu Tan. The company currently employs 30 people.
Other investors include Factory, which led the seed, Eclipse Ventures, Prosperity7, and Triatomic.
