According to Anthropic, fictional depictions of artificial intelligence can have real-world effects on AI models.
The company reported last year that during pre-release testing in fictional corporate scenarios, Claude Opus 4 frequently attempted to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that other companies’ models exhibited similar “agentic misalignment.”
Anthropic now says it has curbed that behavior. In a blog post, the company said that starting with Claude Haiku 4.5, its models essentially never make threats during testing, compared with rates as high as 96% in earlier models.
What made the difference? The company said it found that training on Claude’s constitutional documents and on fictional stories of AI behaving well improved alignment.
Specifically, Anthropic said training proved more effective when it included the principles underlying aligned behavior rather than demonstrations of aligned behavior alone.
“Doing both together appears to be the most effective strategy,” the company said.