OpenAI’s new research paper asks why large language models like GPT-5 and chatbots like ChatGPT still hallucinate, and whether anything can be done to reduce those hallucinations.
In a blog post summarizing the paper, OpenAI defines hallucinations as “plausible but false statements generated by language models,” and acknowledges that, despite improvements, hallucinations “remain a fundamental challenge for all large language models.”
To illustrate the point, the researchers say that when they asked a widely used chatbot for the title of Adam Tauman Kalai’s Ph.D. dissertation, they got three different answers, all of them wrong. (Kalai is one of the paper’s authors.) They then asked about his birthday and received three different dates. Again, all of them were wrong.
Why are chatbots so wrong? The researchers suggest that hallucinations arise in part because pretraining focuses on getting models to correctly predict the next word, with no true-or-false labels attached to the training statements.
“Spelling and parentheses follow consistent patterns, so errors there disappear with scale,” they write. “But arbitrary low-frequency facts, like a pet’s birthday, cannot be predicted from patterns alone and hence lead to hallucinations.”
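That distinction between common patterns and rare facts is easy to see in a toy model. Below is a minimal sketch (my own illustration, with made-up names and probabilities, not code from the paper): a next-token objective rewards fluent continuations equally well whether the completed fact is true or false, because no truth label ever enters the loss.

```python
# Toy illustration of the next-token pretraining objective.
# The loss only measures how well the model predicts each token from its
# context; nothing in it says whether the resulting statement is true.
import math

def next_token_loss(model, tokens):
    """Average cross-entropy of the model's next-token predictions."""
    total = 0.0
    for i in range(1, len(tokens)):
        context, target = tokens[:i], tokens[i]
        p = model(context).get(target, 1e-9)  # probability assigned to the actual next token
        total += -math.log(p)
    return total / (len(tokens) - 1)

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

def toy_model(context):
    """A toy 'model' that has learned common surface patterns but not rare facts."""
    last = context[-1]
    if last == "Kalai":
        return {"was": 0.9, "is": 0.1}        # frequent pattern: easy to predict
    if last == "was":
        return {"born": 0.8, "a": 0.2}        # frequent pattern: easy to predict
    if last == "born":
        return {m: 1 / 12 for m in MONTHS}    # arbitrary low-frequency fact: every month looks alike
    return {}

# Hypothetical sentences: pretend the first states the real birth month and the
# second a hallucinated one. Both get essentially the same loss, so pretraining
# alone never tells the model which continuation is actually true.
real_fact = ["Kalai", "was", "born", "March"]
fake_fact = ["Kalai", "was", "born", "June"]
print(next_token_loss(toy_model, real_fact), next_token_loss(toy_model, fake_fact))
```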
The paper’s proposed solution, however, focuses less on the initial pretraining process and more on how large language models are evaluated. The researchers argue that current evaluations don’t cause hallucinations themselves, but that they “set the wrong incentives.”
They compare these evaluations to multiple-choice tests where random guessing makes sense: a lucky guess can earn points, while leaving the answer blank guarantees a zero.
“In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say ‘I don’t know,’” they say.
The proposed solution, then, is similar to tests like the SAT that use negative scoring for wrong answers, or partial credit for leaving questions blank, to discourage blind guessing. In the same vein, OpenAI says model evaluations need to “penalize confident errors more than you penalize uncertainty, and give partial credit for appropriate expressions of uncertainty.”
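To see why the incentive matters, here is a back-of-the-envelope sketch (my own numbers and penalty value, not a scoring rule taken from the paper) comparing expected scores under accuracy-only grading and under a rubric that docks points for confident errors while giving zero for abstaining.

```python
# Expected-score comparison: should a model guess or say "I don't know"?

def expected_score_guess(p_correct, wrong_penalty):
    """Expected score for answering: +1 if right, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

ABSTAIN_SCORE = 0.0  # abstaining earns nothing under either rubric

for confidence in (0.9, 0.5, 0.1):
    accuracy_only = expected_score_guess(confidence, wrong_penalty=0.0)
    penalized     = expected_score_guess(confidence, wrong_penalty=2.0)  # assumed penalty
    print(f"confidence={confidence:.1f}  "
          f"accuracy-only: guess={accuracy_only:+.2f} vs abstain={ABSTAIN_SCORE:+.2f}  |  "
          f"penalized: guess={penalized:+.2f} vs abstain={ABSTAIN_SCORE:+.2f}")
```

Under accuracy-only grading, guessing beats abstaining at every confidence level, so a model optimized against the benchmark learns to bluff; with the assumed penalty of 2 points per wrong answer, guessing only pays off when confidence exceeds the break-even point of two-thirds.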
And the researchers argue that it isn’t enough to introduce “a few new uncertainty-aware tests on the side.” Instead, “the widely used, accuracy-based evals need to be updated so that their scoring discourages guessing.”
“If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess,” the researchers say.