OpenAI says GPT-5 stacks up to humans in a wide range of jobs

OpenAI released a new benchmark on Thursday that tests how its AI models perform compared with human professionals across a wide range of industries and occupations. The test, GDPval, is an early attempt to understand how close OpenAI’s systems are to outperforming humans at economically valuable work.

OpenAI says it found that its GPT-5 model and Anthropic’s Claude Opus 4.1 are “already approaching the quality of work produced by industry experts.”

That doesn’t mean OpenAI’s models will start replacing humans at work anytime soon. Despite predictions by some CEOs that AI will take over human jobs in just a few years, OpenAI acknowledges that GDPval currently covers only a very limited number of the tasks people do in their real jobs. However, it is one of the latest ways in which the company is measuring AI progress toward this milestone.

GDPval is based on the nine industries that contribute the most to US gross domestic product, including domains such as healthcare, finance, manufacturing and government. The benchmark tests an AI model’s performance in 44 occupations within those industries, ranging from software engineers to nurses to journalists.

For the first version of the test, GDPval-v0, OpenAI asked experienced professionals to compare AI-generated reports with reports produced by other professionals and select the best one. For example, one task asked investment bankers to create a competitor landscape for the last-mile delivery industry, and their work was then compared with AI-generated reports. OpenAI then averages the AI model’s “win rate” against the human reports across all 44 occupations.
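To make the averaging step concrete, here is a minimal sketch of how a win rate of this kind could be computed. It rests on assumptions the article does not confirm: that a tie counts in the model’s favor (as the “wins and ties” figure quoted later suggests), and that each occupation is weighted equally. The function names and the sample judgments are hypothetical and are not GDPval data.

```python
from collections import defaultdict
from statistics import mean


def win_rate_by_occupation(judgments):
    """Return, per occupation, the share of comparisons the model won or tied.

    `judgments` is an iterable of (occupation, outcome) pairs, where outcome
    is "win", "tie" or "loss" from the model's point of view.
    """
    tallies = defaultdict(lambda: [0, 0])  # occupation -> [wins_or_ties, total]
    for occupation, outcome in judgments:
        tallies[occupation][1] += 1
        if outcome in ("win", "tie"):  # assumption: ties count for the model
            tallies[occupation][0] += 1
    return {occ: good / total for occ, (good, total) in tallies.items()}


def average_win_rate(judgments):
    """Unweighted mean of the per-occupation rates (assumed equal weighting)."""
    return mean(win_rate_by_occupation(judgments).values())


# Made-up judgments for illustration only.
sample = [
    ("nurse", "win"), ("nurse", "loss"),
    ("journalist", "tie"), ("journalist", "loss"),
    ("journalist", "loss"), ("journalist", "loss"),
]

print(average_win_rate(sample))
```

Averaging per occupation first keeps an occupation with many graded tasks from dominating the headline number. Whether OpenAI weights occupations this way is not stated in the article.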

For GPT-5-high, a souped-up version of GPT-5 with additional computing power, the company says the AI model was ranked as better than or on par with industry experts 40.6% of the time.

OpenAI also tested Anthropic’s Claude Opus 4.1 model, which was ranked as better than or on par with industry experts in 49% of tasks. OpenAI says it believes Claude scored so high because of its tendency to produce pleasing graphics, rather than because of sheer performance.

It is worth noting that most working professionals do much more than submit research reports to their boss, which is all that GDPval-v0 tests for. OpenAI acknowledges this and says it plans to create more robust tests in the future that can account for more industries and interactive workflows.

Nevertheless, the company considers the progress on GDPval notable.

In an interview with TechCrunch, OpenAI’s chief economist, Dr. Aaron Chatterji, said GDPval’s results suggest that people in these jobs can now use AI models and spend their own time on more meaningful tasks.

“(Because) the model is getting better with some of these things,” says Chatterji.

Tejal Patwardhan, who leads evaluations at OpenAI, told TechCrunch that the rate of progress on GDPval is encouraging. OpenAI’s GPT-4o model, released about 15 months ago, scored 13.7% (wins and ties versus humans). GPT-5 now scores almost three times that.

Silicon Valley has a wide range of benchmarks it uses to measure the progress of AI models and to assess whether a particular model is state of the art. Among the most popular are AIME 2025 (a test of competitive math problems) and GPQA Diamond (a test of PhD-level science questions). However, several AI models are approaching saturation on some of these benchmarks, and many AI researchers have cited the need for better tests that can measure AI proficiency on real-world tasks.

Benchmarks like GDPval could become increasingly important in that conversation, as OpenAI makes the case that its AI models are valuable across a wide range of industries. However, OpenAI may need a more comprehensive version of the test before it can definitively say that its AI models outperform humans.
