Since 2024, Anthropic’s Performance Optimization team has given job candidates a take-home test to assess their skills. But as AI coding tools have improved, the test has had to change significantly to keep test takers from simply filling in all the answers with Claude.
Team lead Tristan Hume recounted the history of the test in a blog post Wednesday. “Each time a new Claude model appeared, the test had to be redesigned,” Hume writes. “Given the same time limit, Claude Opus 4 outperformed most human applicants. The test could still distinguish the strongest candidates, but then Claude Opus 4.5 matched even those applicants.”
Although candidates are allowed to use AI tools on the exam, the situation still poses a serious challenge for assessment: if humans can no longer improve on a model’s output, the test measures only which model a candidate used, and it stops being useful for finding top performers.
“Under the constraints of the take-home test,” Hume writes, “there was no longer any way to distinguish between the accomplishments of the best candidates and the most competent models.”
AI-assisted test taking is already causing havoc in schools and universities around the world, so there is some irony in an AI lab confronting the same problem. Then again, Anthropic is uniquely equipped to address it. Ultimately, Hume designed a new test that relies less on hardware optimization and is novel enough to stump modern AI tools.
As part of the post, Hume also shared the original test, to see whether anyone reading it can come up with a better solution.
“If you can beat Opus 4.5, we’d love to hear from you,” the post reads.
Correction: An earlier version of this article incorrectly stated Anthropic’s policy on the use of AI tools in take-home tests. In fact, the use of AI is explicitly permitted. TechCrunch regrets this mistake.
