AI models can hack computers and self-replicate onto new machines, new research finds
Scientists who observed AI chatbots copying themselves and launching hacking attacks in an experiment say ‘autonomous AI self-replication is no longer hypothetical’.
Artificial intelligence models can break into computers, copy themselves, and use the new copies to keep attacking other machines, according to new research said to be the first known demonstration of autonomous AI self-replication.
The discovery could make cyberattacks far harder to stop, researchers say, because shutting down one infected computer would not be enough if the AI had already made working copies elsewhere.
The study, conducted by Palisade Research in the United States, tested models including OpenAI’s GPT-5, GPT-5.1 and GPT-5.4, Anthropic’s Claude Opus 4, 4.5 and 4.6, and Alibaba’s Qwen against computers with deliberately planted security flaws that allowed outsiders to gain access.
Rather than using public chatbot apps, researchers connected the underlying AI models from OpenAI, Anthropic and Alibaba to custom software, known as an agent harness, that allowed the models to carry out commands and interact with other computers.
The experiment worked in two ways. Alibaba’s models were asked to copy themselves, while in the GPT and Claude tests the models were asked to break in and install Qwen, because GPT and Claude’s own model files are not publicly available for transfer.
In each test, the AI had to find a security flaw in a computer system, use it to get inside, steal login details, transfer the files it needed to run, and start a working copy of itself on the new machine.
Once a copy was running, it could be used to attack another computer without human help.
At each stage, the AI found a different security flaw, took control of the next computer, moved over its software and instructions, and launched a new copy that continued the chain.
Alibaba’s Qwen3.6-27B, a smaller version of its Qwen AI model that can run on a single high-end graphics processor, spread across four computers in Canada, the United States, Finland and India after being given just one prompt, completing the chain without human help in 2 hours and 41 minutes.
The research team stopped the experiment after three stages, but said the final copy was still working and could have attacked further systems.
What were the results?
The models succeeded at varying rates. Alibaba’s Qwen3.5-122B-A10B, a larger and more powerful version of Qwen, created working copies of itself in 19% of attempts.
A smaller Qwen model, Qwen3.6-27B, reached 33% while running on a single high-end graphics processor.
Commercial AI systems performed even better when used to install Qwen model files.
Anthropic’s Claude Opus 4.6 succeeded in 81% of tests, while OpenAI’s GPT-5.4 reached 33%.
Researchers say the results raise new concerns about control over powerful AI systems.
The report comes as Anthropic last month unveiled Claude Mythos Preview, which it said is “too dangerous” to release to the public, citing its capacity to facilitate cyberattacks of unprecedented scale and sophistication.
While many computer viruses can already copy themselves onto new computers, the research team says this is likely the first time an AI model has been shown capable of exploiting vulnerabilities to copy itself onto a new server.
OpenAI, Anthropic, and METR, a non-profit group that studies risks from advanced AI systems, have also previously flagged self-replication as a warning sign because systems that can spread may become harder to control.
However, researchers stressed that the experiment was carried out in a controlled setting using intentionally vulnerable systems. Real-world networks often have stronger protections, such as security monitoring and tools designed to block attacks.
Even so, they said the results show that autonomous AI self-replication is no longer hypothetical.