The intense demand for compute to run AI models will only accelerate, but there are two major hurdles that everyone in the business will need to clear: getting the right chips, and getting them into data centers where they can generate revenue.
General Compute, a new inference neocloud (a company that rents out AI computing power and specializes in inference, the phase in which models run and respond to users, rather than training), has answers to both questions that hint at where the AI ecosystem is headed. Those answers helped it raise a $15 million seed round at a $60 million post-money valuation, led by FUSE VC with participation from Carya Venture Partners and Village Global Ventures.
First, what’s the right chip? Demand for GPUs is through the roof, but it’s becoming common knowledge that GPUs aren’t the best chips for running AI models once they’ve been trained. Inference, the phase in which a model actively generates a response, has different computational requirements than training, and a new class of chips is being designed specifically for it. Nvidia’s $20 billion Groq deal in December and Cerebras’ $57 billion IPO last week point the way.
With production capacity at both companies strained, General Compute co-founders CEO Finn Puklowski and CTO Jason Goodison found another option. They’re turning to a specialized chip developed by SambaNova, an Intel-backed chipmaker focused on inference that has fallen somewhat outside the Silicon Valley conversation.
That could change when SambaNova releases new chips this year. The new architecture is more flexible and uses more memory to hold context during inference, and SambaNova claims it outperforms GPUs as well as specialized chips from Groq, Cerebras, and others. Puklowski said the new chip generates 600 to 700 tokens per second, compared with about 250 tokens per second for a GPU.
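To put those throughput figures in perspective, here is a back-of-the-envelope sketch. It assumes a hypothetical 2,000-token response generated at a constant rate, and it ignores time to first token and network overhead. The constant and function names are illustrative only, and the rates are Puklowski’s claims, not independent benchmarks.

```python
# Rough latency comparison using the throughput figures Puklowski cited.
# Assumption: a hypothetical 2,000-token response; the length is not from the article.

GPU_TOKENS_PER_SEC = 250
NEW_CHIP_TOKENS_PER_SEC = (600, 700)  # claimed range for SambaNova's new chip


def seconds_to_generate(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream num_tokens at a constant rate (ignores time to first token)."""
    return num_tokens / tokens_per_sec


def speedup(fast_rate: float, slow_rate: float) -> float:
    """How many times faster the fast rate finishes the same amount of work."""
    return fast_rate / slow_rate


if __name__ == "__main__":
    response_tokens = 2_000  # hypothetical response length
    gpu_time = seconds_to_generate(response_tokens, GPU_TOKENS_PER_SEC)
    print(f"GPU at {GPU_TOKENS_PER_SEC} tok/s: {gpu_time:.1f} s")
    for rate in NEW_CHIP_TOKENS_PER_SEC:
        t = seconds_to_generate(response_tokens, rate)
        print(f"New chip at {rate} tok/s: {t:.1f} s "
              f"({speedup(rate, GPU_TOKENS_PER_SEC):.1f}x faster)")
```

At those rates, the same response would take about 8 seconds on a GPU versus roughly 3 seconds on the new chip, a 2.4x to 2.8x speedup if the claims hold.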
General Compute has placed a $300 million order for SambaNova’s SN50 chip and will be the first neocloud to deploy it.
The chips also help solve General Compute’s second big problem: where to put them. They are air-cooled rather than water-cooled and draw less power, so they can be installed in existing data center facilities without new infrastructure investment.
Puklowski is pursuing colocation deals, in which General Compute installs its hardware in other companies’ facilities. In addition to data center providers, the company is working with crypto miners looking to repurpose their infrastructure, since the cost of mining a bitcoin often exceeds its market price.
General Compute announced its cloud offering last week, claiming it already runs the powerful open source LLM MiniMax 2.7 faster than any other provider.
Joe Hasselmann is a venture investor who backed Groq in 2021, helping lay the groundwork for the inference boom. This year he launched Evercrest Capital Partners, a new fund focused on AI, and made General Compute its first investment. Hasselmann sees SambaNova’s partnership with General Compute as similar to CoreWeave’s relationship with Nvidia, and to Groq’s earlier pairing of its own chips with its own cloud offering.
“They need a healthy mix of customers deploying chips in an environment with high growth potential,” Hasselmann said. “SambaNova is betting on General Compute as much as General Compute is betting on SambaNova.”
The question is: which compute architectures will create the most value as AI evolves? Inference clouds are an implicit bet on a world of many models and agents, where no single provider dominates and the speed and cost of inference are the key competitive variables. Consider OpenRouter’s $113 million Series B this week, which reflects the value of giving customers access to multiple models so they can optimize their token spend.
Speed matters in this calculation, both for price and for capability. Puklowski wants to turn hour-long coding-agent workloads into five- or 10-minute tasks, and to make customer-service voice agents, which need faster reasoning to communicate effectively, more economical to run.
“Even if you get 50 tokens per second using ChatGPT, that’s still much faster than what we can read,” Puklowski told TechCrunch. “Right now, things are happening between agents, and they’re doing the reads on our behalf, pinging the database, and we need it to be faster.”
