The AI boom is built on the basic premise that bigger models are more powerful, and that the most powerful models will win. Now, the industry is learning what happens when that assumption begins to crumble.
Rising costs are already prompting users to take another look at smaller, cheaper models. This kind of cost-conscious model shopping is new, and its impact on the industry is still unclear, though it is likely to be significant.
Coinbase co-founder Brian Armstrong has offered one of the clearest versions of this prediction: the majority of tasks will move to cheaper models.
“The demand for intelligence is nearly limitless, but 80% of workloads will be running on models that are 99% cheaper within 12 to 18 months,” Armstrong wrote on X. “20% of the workload continues to run on the latest generation models where maximizing IQ is key.”
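A quick back-of-the-envelope calculation shows what that split would mean for total spend. It rests on two assumptions that are not in Armstrong's post: that the 80/20 split is measured in what the work would cost today on frontier models, and that the remaining 20% stays at current prices. With $C_0$ as today's all-frontier spend:

```latex
C_{\text{new}} = 0.8 \times (1 - 0.99)\,C_0 + 0.2 \times C_0 = (0.008 + 0.2)\,C_0 = 0.208\,C_0
```

Under those assumptions, total inference spend would fall by roughly 79%, or about a factor of 4.8, and the frontier slice would account for about 96% of what is left of the bill.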
It’s hard to overstate how big a change it would be for the AI industry if Armstrong’s prediction comes true.
Until now, most AI companies have competed on quality, which has meant defaulting to the most advanced models available. Being able to do the same jobs with cheaper models without sacrificing quality would represent a major shift in the economics of AI. Importantly, much of those savings would come out of the big labs' pockets, dealing a financial blow to OpenAI and Anthropic as they prepare for their IPOs.
At the heart of that potential shift is one fundamental question: are companies ready to switch to smaller models?
Early testing suggests that, with the right setup, cheaper models can be used without sacrificing quality. In recent tests, legal AI company Harvey cut its inference costs threefold without any loss in quality. The tests were run in partnership with inference platform Fireworks AI and paired Claude Opus with GLM 5.1 running on Fireworks, handing off to Opus only for the most demanding tasks. The result was a significant reduction in both server time and overall cost.
“Quality is paramount and always has been in legal affairs,” Harvey co-founder Gabe Pereyra told TechCrunch, referring to his startup’s AI legal services offering. “But the definition of quality has evolved from simply using the most powerful model for everything to using the best model that gets the right answer most efficiently.”
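The routing pattern described above (a cheaper model for most of the work, escalating to a frontier model for the hardest tasks) can be sketched in a few lines of Python. This is a hypothetical illustration, not Harvey's or Fireworks' implementation: the `difficulty` score, the 0.8 threshold and the relative prices (1.0 per task for the frontier model, 0.1 for the cheaper one) are all made-up assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical relative prices per task (assumption, not real pricing):
# the frontier model costs 1.0, the cheaper model a tenth of that.
FRONTIER_COST = 1.0
CHEAP_COST = 0.1


@dataclass
class Task:
    name: str
    difficulty: float  # hypothetical score in [0, 1], e.g. from a classifier


def route(task: Task, threshold: float = 0.8) -> str:
    """Send only tasks at or above the difficulty threshold to the frontier model."""
    return "frontier" if task.difficulty >= threshold else "cheap"


def blended_cost(tasks: list[Task], threshold: float = 0.8) -> float:
    """Cost of a batch relative to running every task on the frontier model."""
    if not tasks:
        raise ValueError("need at least one task")
    total = sum(
        FRONTIER_COST if route(t, threshold) == "frontier" else CHEAP_COST
        for t in tasks
    )
    return total / (FRONTIER_COST * len(tasks))


if __name__ == "__main__":
    batch = [
        Task("summarize filing", 0.2),
        Task("extract dates", 0.1),
        Task("draft clause", 0.5),
        Task("cross-jurisdiction analysis", 0.9),
    ]
    ratio = blended_cost(batch)
    print(f"cost ratio: {ratio:.3f} ({1 / ratio:.1f}x cheaper)")
```

With these toy numbers, three of the four tasks stay on the cheaper model and the batch prints `cost ratio: 0.325 (3.1x cheaper)`. In practice, the hard part is the difficulty estimate itself: route too little to the frontier model and quality drops, route too much and the savings disappear.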
This trend is often framed as the big labs versus Chinese models, or versus open-weight models, but that framing misses the bigger point. The real divide is not between proprietary and open models. It is between large and small models. You can save money by switching from GPT-5.5 to DeepSeek’s V4 flash, but switching to GPT-5.4-mini works just as well.
There is active price competition between the major labs' first-party inference and independently hosted open-weight models. But when it comes to the larger question of small vs. large, it doesn’t really matter which kind of small model wins.
All of this may seem obvious (of course you shouldn’t use more compute than necessary), but it runs against the scaling-first approach that has dominated the industry to date. Inspired by Rich Sutton’s “bitter lesson,” the big labs have worked hard to train the most compute-intensive models possible, pushing the frontier of what AI models can do. With prices heavily subsidized by investors, customers had no reason to choose anything but the most advanced options.
Now, with token prices rising and subsidies slowing, users are facing real cost pressure for the first time. It remains to be seen whether that pressure will actually push enterprise users toward smaller models. There are other easy ways to save money: making fewer calls, using less context, or simply abandoning the least promising deployments.
But if most deployments turn out to run just as well on smaller models, that could have serious implications for the growth of inference demand and raise new questions about how to justify the cost of training frontier models.
