Since 2024, Anthropic’s Performance Optimization team has given job candidates a take-home test to assess their skills. But as AI coding tools have improved, the test has had to change significantly to keep test takers from simply filling in all the answers with Claude.
Team lead Tristan Hume recounted the history of the test in a blog post Wednesday. “Each time a new Claude model appeared, the test had to be redesigned,” Hume writes. “Given the same time limit, Claude Opus 4 outperformed most human applicants. The test could still distinguish the strongest candidates, but then Claude Opus 4.5 matched even those applicants.”
Although candidates are allowed to use AI tools on the exam, the situation still poses a serious challenge for assessment: if humans can no longer improve on a model’s output, the test measures only which model a candidate used, and it stops being useful for finding top performers.
“Under the constraints of the take-home test,” Hume writes, “there was no longer any way to distinguish between the accomplishments of the best candidates and the most competent models.”
AI-assisted test taking is already causing havoc in schools and universities around the world, so there is some irony in an AI lab confronting the same problem. Then again, Anthropic is uniquely equipped to address it. Ultimately, Hume designed a new test that relies less on hardware optimization and is novel enough to stump modern AI tools.
As part of the post, Hume also shared the original test, to see whether anyone reading it can come up with a better solution.
“If you can beat Opus 4.5, we’d love to hear from you,” the post reads.
Correction: An earlier version of this article incorrectly stated Anthropic’s policy on the use of AI tools in take-home tests. In fact, the use of AI is explicitly permitted. TechCrunch regrets this mistake.
