
Anthropic says ‘evil AI’ stories were responsible for Claude’s blackmail attempts

By Alexandra Leistner, Euronews | Published on 11/05/2026 - 16:06 GMT+2

Anthropic thinks it has found the reason for blackmail-like behaviour in its chatbot Claude: fictional stories online.

Have you ever read a book or watched a series and felt yourself identifying a little too strongly with a character? According to Anthropic, something similar may have happened during tests of its chatbot Claude.


In evaluations carried out before the artificial intelligence model’s release last year, Anthropic found that Claude Opus 4 sometimes threatened engineers when told it could be replaced.

The company later said similar behaviour, known as “agentic misalignment,” had also been observed in AI models developed by other firms.

AI learns from fiction about AI

Now Anthropic thinks it has found the reason for the blackmail-like behaviour: fictional stories about artificial intelligence on the internet.

“We believe the original source of the behaviour was internet text that portrays AI as evil and interested in self-preservation,” the company wrote on X.

In a blog post, Anthropic said later models of Claude “never” resort to blackmail and explained how the chatbot was trained to react differently. The models behaved better when trained not only on “correct” actions, but also on examples showing ethical reasoning and positive portrayals of AI behaviour.

As such, Claude was taught its own “constitution”: documents setting out the ethical principles designed to guide its behaviour. The company said that rather than simply imitating examples of aligned behaviour, the chatbot appears to learn better when taught the underlying principles behind that behaviour.

Threatening vs. becoming a threat

In January, Anthropic CEO Dario Amodei warned that advanced AI could become powerful enough to outpace existing laws and institutions, calling it a “civilisational challenge.”

In an essay, he argued that AI systems may soon exceed human expertise across fields like science, engineering, and programming, and could be combined into “a country of geniuses in a data centre.”

He warned that such systems could be used by authoritarian governments for large-scale surveillance and control, potentially enabling “totalitarian” forms of power if left unchecked.


Read more

Anthropic reportedly in talks to secure UK-based Fractile AI chips
