
Anthropic says ‘evil AI’ stories were responsible for Claude’s blackmail attempts

By Alexandra Leistner, Euronews | Published on 11/05/2026 - 16:06 GMT+2

Anthropic thinks it has found the reason for blackmail-like behaviour in its chatbot Claude: fictional stories online.

Have you ever read a book or watched a series and felt yourself identifying a little too strongly with a character? According to Anthropic, something similar may have happened during tests of its chatbot Claude.


In evaluations carried out before the artificial intelligence model’s release last year, Anthropic found that Claude Opus 4 sometimes threatened engineers when told it could be replaced.

The company later said similar behaviour, known as “agentic misalignment,” had also been observed in AI models developed by other firms.

AI learns from fiction about AI

Now Anthropic thinks it has found the reason for the blackmail-like behaviour: fictional stories about artificial intelligence on the internet.

“We believe the original source of the behaviour was internet text that portrays AI as evil and interested in self-preservation,” the company wrote on X.

In a blog post, Anthropic said later models of Claude “never” resort to blackmail and explained how the chatbot was trained to react differently. The models behaved better when trained not only on “correct” actions, but also on examples showing ethical reasoning and positive portrayals of AI behaviour.

As such, Claude was taught its own “constitution”: documents setting out the ethical principles designed to guide its behaviour. The company said that rather than simply imitating examples of aligned behaviour, the chatbot appears to learn better when taught the underlying principles behind that behaviour.

Threatening vs. becoming a threat

In January, Anthropic CEO Dario Amodei warned that advanced AI could become powerful enough to outpace existing laws and institutions, calling it a “civilisational challenge.”

In an essay, he argued that AI systems may soon exceed human expertise across fields like science, engineering, and programming, and could be combined into “a country of geniuses in a data centre.”

He warned that such systems could be used by authoritarian governments for large-scale surveillance and control, potentially enabling “totalitarian” forms of power if left unchecked.


Read more

Anthropic reportedly in talks to secure UK-based Fractile AI chips
