BWE News – USA, World, Tech, AI, Finance, Sports & Entertainment Updates
Executing AI models is turning into a memory game

By admin, February 17, 2026


When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs, but memory is becoming an increasingly important part of the picture. DRAM chip prices have jumped roughly sevenfold in the last year as hyperscalers prepare to build billions of dollars’ worth of new data centers.

At the same time, there is a growing discipline in orchestrating all that memory so that the right data reaches the right agent at the right time. Companies that master it will be able to make the same queries with fewer tokens, which could be the difference between folding and staying in business.

Semiconductor analyst Doug O’Laughlin takes an interesting look at the importance of memory chips on his Substack, where he talks with Val Bercovici, chief AI officer at Weka. They are both semiconductor specialists, so their focus is on the chips rather than the broader architecture, but the implications for AI software are significant too.

I was especially struck by Bercovici’s discussion of the growing complexity of Anthropic’s prompt-caching documentation:

You can see it on Anthropic’s prompt caching pricing page. It started out as a very simple page six or seven months ago, especially around the time Claude Code was launching. It just said, “Use caching, it’s cheaper.” Now it’s an encyclopedia of advice on exactly how many cache writes to buy in advance. There are 5-minute tiers, which are very common across the industry, or 1-hour tiers, and nothing above that. That’s a really important signal. And then, of course, you have all sorts of arbitrage opportunities around the pricing of cache reads based on how many cache writes you have purchased up front.

The question here is how long Claude keeps your prompt in cached memory. You can pay for a 5-minute window, or pay more for a 1-hour window. Drawing on data that is still in the cache is much cheaper, so if you manage it properly, you can save a lot of money. There is a catch, though: every time you add new data to your query, something else may be pushed out of the cache window.
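To make the trade-off concrete, here is a minimal sketch of the arithmetic in Python. Every number in it is an assumption for illustration: the base price, the write multipliers for the two windows, and the read multiplier are placeholders, not figures quoted from Anthropic’s pricing page, and the names `CachePricing`, `cost_without_cache`, and `cost_with_cache` are hypothetical. It also assumes every call after the first lands inside the cache window, which is exactly the part that is hard to guarantee in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePricing:
    """Illustrative prices only; all values are assumptions, not quoted rates."""
    base_per_mtok: float = 3.00  # assumed base input price, USD per million tokens
    write_5m: float = 1.25       # assumed multiplier for a 5-minute cache write
    write_1h: float = 2.00       # assumed multiplier for a 1-hour cache write
    read: float = 0.10           # assumed multiplier for a cache read


def cost_without_cache(prefix_tokens: int, calls: int, p: CachePricing) -> float:
    """Every call pays the full input price for the shared prompt prefix."""
    return prefix_tokens / 1_000_000 * p.base_per_mtok * calls


def cost_with_cache(prefix_tokens: int, calls: int,
                    write_multiplier: float, p: CachePricing) -> float:
    """The first call writes the prefix to the cache; later calls read it.

    Assumes all later calls arrive while the prefix is still cached.
    """
    if calls <= 0:
        return 0.0
    base = prefix_tokens / 1_000_000 * p.base_per_mtok
    return base * (write_multiplier + p.read * (calls - 1))


if __name__ == "__main__":
    pricing = CachePricing()
    prefix, calls = 100_000, 10  # a 100k-token shared prefix reused across 10 calls
    print("no cache:", cost_without_cache(prefix, calls, pricing))
    print("5-minute:", cost_with_cache(prefix, calls, pricing.write_5m, pricing))
    print("1-hour:  ", cost_with_cache(prefix, calls, pricing.write_1h, pricing))
```

Under these assumed numbers, ten calls over a 100,000-token prefix cost about $3.00 with no caching, versus roughly $0.65 with the shorter window and $0.87 with the longer one. The longer window only pays off if the calls are spread out enough that the shorter one would have expired and forced a second write.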

This is complex, but the upshot is simple: managing memory for AI models is going to be a big part of where the field goes next. Companies that do it well will rise to the top.

And a lot of progress is being made in this new field. Back in October, I covered a startup called TensorMesh that was working on one layer of the stack, known as cache optimization.

Opportunities exist in other parts of the stack, too. Lower down, there is the question of how data centers use the different types of memory they have. (The interview includes a nice discussion of when DRAM chips are used instead of HBM, though it is pretty deep in the hardware weeds.) Higher up the stack, end users are figuring out how to configure their model suites to take advantage of the shared cache.

As companies get better at memory orchestration, they will use fewer tokens and inference will get cheaper. Meanwhile, models are becoming more efficient at processing each token, pushing the cost down further. As server costs fall, many applications that do not seem viable today will gradually edge into profitability.


