Can tech companies love cheap AI models?

The AI boom is built on the basic premise that bigger models are more powerful, and that the most powerful models will win. Now, the industry is learning what happens when that assumption begins to crumble.

Rising costs are already pushing users to take a second look at smaller, cheaper models. This kind of cost-conscious model shopping is new, and while its exact impact on the industry is unclear, it is likely to be significant.

One prediction, best articulated by Coinbase co-founder Brian Armstrong, is that the majority of tasks will move to cheaper models.

“The demand for intelligence is nearly limitless, but 80% of workloads will be running on models that are 99% cheaper within 12 to 18 months,” Armstrong wrote on X. “20% of the workload continues to run on the latest generation models where maximizing IQ is key.”

It’s hard to overstate how big a change it would be for the AI industry if Armstrong’s prediction comes true.
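
To get a rough sense of the scale, the sketch below works through the arithmetic of Armstrong's figures. It rests on a simplifying assumption that is not in his post: that every workload costs the same on a frontier model today, so "80% of workloads" can be read as 80% of current spend. The function name `blended_cost` is made up for illustration.

```python
def blended_cost(cheap_share: float, cheap_discount: float) -> float:
    """Return total spend as a fraction of an all-frontier baseline.

    cheap_share:    fraction of workloads moved to cheaper models (0..1)
    cheap_discount: how much cheaper those models are (0.99 means 99% cheaper)

    Assumption: all workloads cost the same on the frontier model today.
    """
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    if not 0.0 <= cheap_discount <= 1.0:
        raise ValueError("cheap_discount must be between 0 and 1")
    frontier_share = 1.0 - cheap_share
    # Frontier workloads keep their full cost; the rest pay the discounted rate.
    return frontier_share + cheap_share * (1.0 - cheap_discount)


# Armstrong's figures: 80% of workloads on models that are 99% cheaper.
remaining = blended_cost(cheap_share=0.80, cheap_discount=0.99)
print(f"{remaining:.1%} of baseline spend remains")
```

Under that assumption, total spend falls to roughly a fifth of the baseline, and nearly all of what remains goes to the 20% of workloads still running on frontier models.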

Until now, most AI companies have competed on quality, which has meant defaulting to the most advanced models available. Doing the same jobs with cheaper models, without hurting quality, would represent a major shift in the economics of AI. Importantly, much of those savings would come out of the pockets of the big labs, dealing a financial blow to OpenAI and Anthropic as they prepare for their IPOs.

This could lead to significant changes in the industry, and at the heart of it all is one fundamental question: “Are companies ready to switch to smaller models?”

Initial testing suggests that if the system is set up correctly, cheaper models can be used without sacrificing quality. In a recent test, the legal AI company Harvey was able to cut its inference costs by a factor of three without reducing quality. The test was conducted in partnership with the inference platform Fireworks AI and combined Claude Opus with Fireworks’ GLM 5.1, switching to Opus only for the most intensive tasks. The result was a significant reduction in both server time and overall cost.
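
The general pattern described here, sending most tasks to a cheaper model and escalating only the hardest ones, can be sketched in a few lines. This is an illustration only: the article does not describe how Harvey decides which tasks are intensive, so the model labels, the relative costs, and the word-count heuristic in `is_intensive` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Model:
    name: str                  # illustrative label, not a real API identifier
    relative_cost: float       # made-up cost units per call
    run: Callable[[str], str]  # stand-in for a real inference call


def route(task: str, small: Model, frontier: Model,
          is_intensive: Callable[[str], bool]) -> Model:
    """Pick the frontier model only when the task is judged intensive."""
    return frontier if is_intensive(task) else small


# Hypothetical stand-ins; a real system would call an inference API here.
small = Model("small-model", 1.0, lambda t: f"[small] {t}")
frontier = Model("frontier-model", 10.0, lambda t: f"[frontier] {t}")


def is_intensive(task: str) -> bool:
    # Toy heuristic (an assumption): treat long requests as intensive.
    return len(task.split()) > 8


tasks = [
    "Summarize this clause",
    "Compare the indemnification terms across all twelve agreements and flag conflicts",
]
choices = [route(t, small, frontier, is_intensive) for t in tasks]
routed_cost = sum(m.relative_cost for m in choices)
all_frontier_cost = frontier.relative_cost * len(tasks)
print([m.name for m in choices], routed_cost, all_frontier_cost)
```

In practice the hard part is the classifier, not the router: the savings only hold if tasks that need the frontier model are reliably detected, which is presumably what quality testing of this kind is meant to confirm.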

“Quality is paramount and always has been in legal affairs,” Harvey co-founder Gabe Pereyra told TechCrunch, referring to his startup’s AI legal services offering. “But the definition of quality has evolved from simply using the most powerful model for everything to using the best model that gets the right answer most efficiently.”

This trend is often framed as the big labs versus Chinese models, or proprietary versus open models, but that misses the bigger point. The real divide is not between proprietary and open models. It is between large models and small ones. You can save money by switching from GPT-5.5 to DeepSeek’s V4 Flash, but switching to GPT-5.4-mini works just as well.

There is active price competition between in-house inference from the major labs and independently hosted open models. But when it comes to the larger question of small versus large, it doesn’t really matter which kind of small model wins.

All of this may seem obvious (of course you shouldn’t use more compute than necessary), but it goes against the scaling-first approach that has dominated the industry to date. Inspired by the “bitter lesson,” labs have worked hard to train the most computationally intensive models possible, pushing the frontier of what AI models can do. With prices heavily subsidized by investors, customers had no reason to choose anything but the most advanced options.

Now, with token prices rising and subsidies shrinking, users are facing cost pressure for the first time. It remains to be seen whether that pressure will actually drive enterprise users to smaller models. Companies can just as easily save money by making fewer calls, using less context, or simply giving up on the least promising deployments.

However, if it turns out that most deployments can be handled just as well by smaller models, it could have serious implications for the growth of inference demand and raise new questions about how to justify the cost of training frontier models.

