AI models are starting to decipher high-level math problems

Neel Somani, a software engineer, former quantitative researcher, and startup founder, was testing the math skills of OpenAI’s new models last weekend when he made an unexpected discovery. After pasting the problem into ChatGPT and letting it think for 15 minutes, he came back to a complete solution. He evaluated the proof and formalized it using a tool from Harmonic, and it all checked out.

“I was interested in establishing a baseline for when LLMs can effectively solve open math problems compared to when they struggle,” Somani said. The surprise was that, with the latest model, the frontier had started to move forward little by little.

ChatGPT’s chain of thought is even more impressive, rattling off mathematical results such as Legendre’s formula, Bertrand’s postulate, and the Star of David theorem. Eventually, the model found a 2013 MathOverflow post in which Harvard mathematician Noam Elkies had given an elegant solution to a similar problem. However, ChatGPT’s final proof differed from Elkies’ work in important ways and provided a more complete solution to the version of the problem posed by legendary mathematician Paul Erdős, whose vast collection of unsolved problems has become a testing ground for AI.

For machine intelligence skeptics, this is a surprising result, but it’s not the only one. From formalization-oriented LLMs like Harmonic’s Aristotle to literature review tools like OpenAI’s Deep Research, AI tools have become widespread in mathematics. But since the release of GPT 5.2, which Somani says is “anecdotally more proficient at mathematical reasoning than previous versions,” the sheer volume of solved problems has become difficult to ignore, raising new questions about the ability of large language models to push the frontiers of human knowledge.

Somani was looking at the Erdős problems, a set of over 1,000 conjectures by the Hungarian mathematician that is maintained online. These problems vary widely in both subject matter and difficulty, making them attractive targets for AI-driven mathematics. The first batch of autonomous solutions came in November from a Gemini-powered model called AlphaEvolve. More recently, Somani and others have found GPT 5.2 to be very good at high-level mathematics.

Since Christmas, 15 problems have been moved from “open” to “resolved” on the Erdős problems website, with 11 of the solutions specifically acknowledging that an AI model was involved in the process.

Respected mathematician Terence Tao offers a more nuanced analysis of the progress on his GitHub page, counting eight different cases where AI models have made meaningful autonomous progress on an Erdős problem, and six other cases where they made progress by discovering and building on prior research. Although there is a long way to go before AI systems can perform mathematics without human intervention, it is clear that large models have an important role to play.

Writing on Mastodon, Tao speculates that the scalable nature of AI systems makes them well suited to being systematically applied to “the ‘long tail’ of Erdős problems, many of which actually have simple solutions.”

“Many of these simple Erdős problems are therefore more likely to be solved by purely AI-based methods than by human or hybrid means,” Tao continued.

Another driver is the recent move toward formalization, a labor-intensive task that makes mathematical reasoning easier to verify and extend. Formalization does not require the use of AI or computers, but the advent of new automated tools has made the process much easier. Lean, an open-source “proof assistant” developed at Microsoft Research in 2013, has become widely used in the field as a way to formalize proofs, and AI tools like Harmonic’s Aristotle promise to automate much of the formalization work.
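
For readers who have never seen a formalized proof, a toy example in Lean 4 may help show what "formalizing" means in practice. This is only an illustrative sketch: it is not drawn from Somani's proof or from anything Aristotle produced, and the theorem name is made up for this example. It states that addition of natural numbers is commutative and proves it by citing a lemma from Lean's core library, which the proof assistant then checks mechanically.

```lean
-- Illustrative only: the name `add_comm_example` is hypothetical.
-- The statement says that for any natural numbers a and b, a + b = b + a.
-- The proof cites `Nat.add_comm`, a lemma in Lean 4's core library.
-- Lean accepts the file only if the proof really establishes the statement.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A formalized research-level proof works the same way, only at a much larger scale: every step must be spelled out in a form the checker can verify, which is why the task is so labor-intensive.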

For Harmonic founder Tudor Achim, the sudden run of solved Erdős problems is less important than the fact that the world’s greatest mathematicians are starting to take these tools seriously. “I care more about the fact that math and computer science professors are using [AI tools],” Achim said. “These people have reputations to protect, so when they say they’re using Aristotle or they’re using ChatGPT, that’s real evidence.”
