A new spending discipline is taking hold within corporate America, as chief financial officers and boards of directors begin to crack down on inefficient spending on artificial intelligence. This change has the potential to reshape the AI trade.
For the past two years, the default playbook has been to pick the most powerful AI model and send every query through it, regardless of complexity. Now that AI bills are running far over budget, companies are starting to question whether they actually need a top-of-the-line, or frontier, model for every task. Two leaders at the center of building AI told CNBC this week that a solution is emerging: model routing.
What is model routing?
Routing is a technique that matches jobs to models, sending difficult problems to expensive frontier models and easy problems to cheaper, faster alternatives.
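As a rough illustration of the idea, the sketch below routes a prompt to one of two models based on a crude difficulty score. Everything in it is an assumption made for illustration: the model names, the relative prices, the keyword list, and the threshold are hypothetical, and real routers typically rely on far more sophisticated signals than word counts and keywords.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # hypothetical price, in dollars


# Hypothetical models; the 10x price gap is illustrative only.
CHEAP = Model("small-fast-model", 0.10)
FRONTIER = Model("frontier-model", 1.00)

# Hypothetical keywords that hint at a harder task.
HARD_HINTS = ("prove", "refactor", "design", "debug", "architecture")


def estimate_difficulty(prompt: str) -> float:
    """Return a crude difficulty score between 0.0 and 1.0."""
    text = prompt.lower()
    score = 0.0
    if len(text.split()) > 50:  # long prompts are treated as harder
        score += 0.5
    if any(hint in text for hint in HARD_HINTS):
        score += 0.5
    return score


def route(prompt: str, threshold: float = 0.5) -> Model:
    """Send hard prompts to the frontier model, everything else to the cheap one."""
    if estimate_difficulty(prompt) >= threshold:
        return FRONTIER
    return CHEAP


if __name__ == "__main__":
    for prompt in (
        "Who was the third US president?",
        "Refactor this payment service to remove the race condition.",
    ):
        print(f"{route(prompt).name}: {prompt}")
```

In this sketch a simple factual question falls below the threshold and goes to the cheaper model, while a prompt containing a hint such as "refactor" is sent to the frontier model.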
Scott Wu, CEO of Cognition, which develops the coding agent Devin, said the benefits of routing are huge. For many routine tasks, he says, companies can be five to 10 times more cost-effective by using models that are good enough for the job.
Today, most companies don’t do any routing at all. Glean CEO Arvind Jain estimates that approximately 95% of enterprise AI usage still runs on the most expensive frontier models, even for tasks that cheaper alternatives could easily handle. Wu gave the example of asking a model to name the third US president. No matter how expensive the model is, every one of them will answer Thomas Jefferson.
The pressure behind this change is a cost curve that has taken even the biggest technology companies by surprise. Jeetu Patel, chief product officer at Cisco, laid out the calculations. At approximately $200 in token usage per employee per week, the cost comes to approximately $10,000 per employee per year. A company with 90,000 employees would be looking at a bill of about $900 million a year. A token is a unit of data that a model processes to generate output, and usage is charged according to the number of tokens processed.
Patel said the adjustment was necessary because Cisco is far over its budget and currently has 30,000 engineers developing products primarily written with AI. Cisco reallocated resources and prioritized tokens over other spending.
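The arithmetic behind those figures can be checked in a few lines. This is only a sketch: the 52-week year is an assumption (the article gives approximate figures), so the exact result lands slightly above the rounded $10,000 and $900 million.

```python
# Figures quoted in the article.
WEEKLY_TOKEN_SPEND_PER_EMPLOYEE = 200  # dollars per employee per week
EMPLOYEES = 90_000

# Assumption: a full 52-week year of usage.
WEEKS_PER_YEAR = 52

annual_per_employee = WEEKLY_TOKEN_SPEND_PER_EMPLOYEE * WEEKS_PER_YEAR
annual_company_exact = annual_per_employee * EMPLOYEES

# The article rounds the per-employee figure down to $10,000.
ROUNDED_PER_EMPLOYEE = 10_000
annual_company_rounded = ROUNDED_PER_EMPLOYEE * EMPLOYEES

print(f"Per employee per year: ${annual_per_employee:,}")        # $10,400
print(f"Company total (exact): ${annual_company_exact:,}")       # $936,000,000
print(f"Company total (rounded): ${annual_company_rounded:,}")   # $900,000,000
```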
Vendors under pressure
AI companies are aware of the concerns.
Cognition has announced what it calls the AI Productivity Guarantee. If the engineering value provided by Devin is less than the price paid by the customer, Cognition will fund up to $10 million in usage until the value reaches parity with the price. Wu framed this as a way to cut through the noisy metrics that plague the industry and focus on one: return on investment.
Wu said that rather than measuring activity, such as tokens consumed or lines of code written, Cognition estimates the number of human engineering hours its agents actually save and backs up that estimate with a refund. He said a company can spend billions of tokens and accomplish nothing. Companies should strive to produce results, not activity.
If companies start steering easy, high-volume work to cheaper open-source models, including those from places like China, OpenAI and Anthropic won’t be able to get paid for every task. They would receive only the more complex ones. Both companies have built their businesses, and the IPO expectations around them, on the premise of huge demand at premium prices.
Patel doesn’t think that will sink the frontier labs, and says their cutting-edge technology will continue to be valuable. But he sees the pricing model changing. Rather than simply charging more, labs will need to be more efficient in how their models are used, which Patel predicts will lead to a concerted industry effort.
The question had been whether companies would continue to spend as AI-related costs skyrocket. Now, many appear to be finding ways to spend more wisely. Pricing power is shifting from the companies selling premium AI to those buying it.
Frontier labs will still command a premium for the most difficult work. But how much of the market does the rest account for? The answer could go a long way in determining the valuation of the leading AI companies.
