DeepSeek releases a "sparse attention" model that cuts API costs by half - BWE News – USA, World, Tech, AI, Finance, Sports & Entertainment Updates

On Monday, researchers at DeepSeek released a new experimental model called V3.2-Exp, designed to dramatically lower inference costs in long-context operations. DeepSeek announced the model in a post on Hugging Face, along with a linked academic paper on GitHub.

The most important feature of the new model is called DeepSeek Sparse Attention, an intricate system explained in detail in the diagram below. Essentially, the system uses a module called a “lightning indexer” to prioritize specific excerpts from the context window. A separate system, called a “fine-grained token selection system,” then chooses specific tokens from within those excerpts to load into the module’s limited attention window. Taken together, these components let sparse attention models operate over long portions of context with comparatively small server loads.
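To make the two-stage idea concrete, here is a simplified Python sketch of generic top-k sparse attention: a cheap scoring pass ranks every token in the context, and full attention is then computed over only the k best-ranked tokens. This is not DeepSeek's implementation. The function names, the single dot-product "indexer," and the parameter `k` are hypothetical choices for illustration; the actual lightning indexer and token-selection details are described in DeepSeek's paper.

```python
import math


def _dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def _softmax(xs):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_attention(query, keys, values, index_query, index_keys, k):
    """Attend from one query to only the k context tokens the indexer ranks highest.

    Hypothetical simplification: the "indexer" is a single dot product between
    small index vectors, standing in for DeepSeek's lightning indexer.
    Returns (output_vector, selected_positions).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Step 1: cheap relevance score for every token in the context.
    index_scores = [_dot(index_query, ik) for ik in index_keys]
    # Step 2: keep only the k highest-scoring token positions.
    ranked = sorted(range(len(index_scores)), key=lambda i: index_scores[i], reverse=True)
    selected = ranked[:k]
    # Step 3: ordinary scaled dot-product attention, restricted to selected tokens.
    scale = math.sqrt(len(query))
    logits = [_dot(query, keys[i]) / scale for i in selected]
    weights = _softmax(logits)
    output = [0.0] * len(values[0])
    for w, i in zip(weights, selected):
        for j, v in enumerate(values[i]):
            output[j] += w * v
    return output, selected


# Toy context of three tokens (2-d keys, 1-d values, 1-d index keys).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [3.0]]
index_keys = [[0.1], [0.9], [0.5]]
out, chosen = sparse_attention([1.0, 0.0], keys, values, [1.0], index_keys, k=2)
print(chosen)  # [1, 2]
```

Setting `k` to the full context length recovers ordinary dense attention; the potential saving comes from `k` being much smaller than the context, so the expensive attention step touches only a fixed number of tokens however long the context grows.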

For long-context operations, the benefits of the system are significant. Preliminary testing by DeepSeek found that the price of a simple API call can be reduced by half in long-context situations. Further testing will be needed to build a more robust assessment, but because the model is open-weight and freely available, it won’t be long before third-party tests can evaluate the claims made in the paper.

DeepSeek’s new model is one of a string of recent breakthroughs tackling the problem of inference costs: essentially, the server costs of operating a pre-trained AI model, as distinct from the cost of training it. In DeepSeek’s case, the researchers were looking for ways to make the fundamental transformer architecture operate more efficiently.

China-based DeepSeek has been an unusual figure in the AI boom, particularly for those who view AI research as a nationalist struggle between the US and China. The company made waves at the beginning of the year with its R1 model, which was trained primarily using reinforcement learning at a far lower cost than its American competitors’ models. However, R1 has not sparked the wholesale revolution in AI training that some predicted, and the company has receded from the spotlight in the months since.

The new “sparse attention” approach is unlikely to produce the same uproar as R1, but it could still teach providers some much-needed tricks for keeping inference costs low.
