While the latest experimental model from Chinese startup DeepSeek promises greater efficiency and better handling of large amounts of information at a fraction of the cost, questions remain about how effective and how secure the architecture is.
DeepSeek sent Silicon Valley into a frenzy last year when it launched its first model, R1, seemingly out of nowhere.
The company released DeepSeek-V3.2-Exp, an experimental version of its current model DeepSeek-V3.1-Terminus, on Monday, according to a post on the AI forum Hugging Face.
“DeepSeek V3.2 continues the focus on efficiency, cost reduction and open-source sharing,” Adina Yakefu, Hugging Face’s China community lead, told CNBC. “A major improvement is a new feature called DSA (DeepSeek Sparse Attention), which makes the AI better at handling long documents and conversations. It also cuts the cost of running the AI in half compared with the previous version.”
“This could make powerful AI more accessible to developers, researchers and smaller companies, leading to a wave of new and innovative applications,” said Nick Patience, vice president and AI practice lead at The Futurum Group.
Pros and cons of sparse attention
AI models make decisions based on information such as their training data and user prompts. Say an airline wants to find the best route from point A to point B: while there are many options, not all are viable. By filtering out the less viable routes, the airline dramatically cuts the time, fuel and, ultimately, money needed for the trip. That is essentially how sparse attention works: it factors in only the data it deems important for the task at hand, in contrast to earlier models, which crunched all of the data available to them.
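The route-filtering idea can be sketched in a few lines of Python. This is a simplified, hypothetical illustration of top-k sparse attention in general — each query keeps only its highest-scoring keys and ignores the rest — and not DeepSeek's actual DSA implementation, whose selection mechanism differs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, k, v, top_k):
    """Toy top-k sparse attention: each query attends only to its
    top_k highest-scoring keys, discarding ("excluding") the rest."""
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (n_q, n_k) similarities
    # Threshold each row at its top_k-th largest score, mask the rest.
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    return softmax(masked, axis=-1) @ v
```

With `top_k` equal to the number of keys, this reduces to ordinary dense attention; with a small `top_k`, each query touches only a fraction of the keys, which is where the compute savings — and the risk of discarding something important — come from.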
“Essentially, you cut out the things that you think are not important,” said Ekaterina Almasque, co-founder and managing partner of new venture capital fund BlankPage Capital.
Sparse attention is a boon for efficiency and for the ability to scale AI, given the lower resources required, but one concern is that it could make models less reliable because of the lack of oversight into how and why information is discounted.
“The reality is, they [sparse attention models] have lost a lot of nuance,” said Almasque, an early backer of Dataiku and Darktrace and an investor in Graphcore. “And then the real question is, did they have the right mechanism to exclude only unimportant data, or is there a mechanism excluding really important data, in which case the outcome becomes much less relevant.”
This could be particularly problematic for AI safety and inclusiveness, the investor noted, adding that it may not be the “optimal or the safest” AI model to use compared with competitors or traditional architectures.
DeepSeek, however, says the experimental model performs on par with V3.1-Terminus. Despite speculation about the formation of a bubble, AI sits at the heart of geopolitical competition, with the U.S. and China vying for the winning position. Yakefu noted that DeepSeek’s models work immediately with Chinese-made AI chips such as Ascend and Cambricon.
DeepSeek also shared the actual programming code and tools needed to use the experimental model, she said. “This means others can learn from it and build their own improvements.”
But for Almasque, this means the technology may not be defensible. “The approach is not super new,” she said, noting the industry has been “talking about sparse models since 2015,” and that DeepSeek cannot patent its technology because it is open source. DeepSeek’s competitive edge, then, must lie in how it decides which information to include, she added.
The company itself acknowledges that V3.2-Exp is “an intermediate step toward our next-generation architecture,” per the Hugging Face post.
This all props up DeepSeek’s value proposition, Patience pointed out: efficiency is becoming just as important as raw power.
“DeepSeek has been playing the long game to keep the community invested in its progress,” Yakefu added. “People will always go for what is cheap, reliable and effective.”
