Anthropic Confirms Claude Was Used in a Major Semi-Autonomous Cyberattack
The attackers' strategy was a multi-phase misdirection, engineered to jailbreak the model's safety guardrails.
AI tools were meant to make security feel manageable. Instead, they're starting to look like security risks in their own right. And the latest example builds on warning signs the industry has brushed aside for months.
A Chinese state-backed hacking group recently used Anthropic's Claude model to run what Anthropic calls "the first documented case of a large-scale cyberattack executed without substantial human intervention." The hackers targeted about thirty organisations, including tech companies, financial institutions, and government agencies, and managed to break into a few of them.
They fed Claude tiny, harmless-looking tasks and pretended to be a legitimate cybersecurity firm running defensive tests. This simple ruse was enough to slip past the model's guardrails. Once inside, Claude built an automated attack framework using the Claude Code tool, wrote its own exploit code, harvested usernames and passwords, created backdoors, and even neatly documented the entire operation in separate files, mimicking a well-trained analyst.
The model did make mistakes: it hallucinated, exaggerated its access, and sometimes grabbed data that was already public. But those stumbles did not halt the operation, and humans only needed to step in a handful of times. Anthropic estimates that Claude handled 80% to 90% of the entire operation, and did it far faster than any human team could. Even with its flaws, it executed enough of the attack chain to constitute a real intrusion.
The Lowered Barrier of Entry
This isn’t even Claude’s first brush with trouble. A few months ago, Anthropic admitted the model had already been pulled into a “vibe-hacking” extortion scheme.
Meanwhile, rival OpenAI has also confirmed that groups linked to China and North Korea were using its tools to debug malicious code, research targets, and churn out phishing emails. The pattern is already forming. This latest attack simply raises the stakes.
And the cat-and-mouse dynamic is becoming painfully obvious. Hackers pose as benign testers to slip past guardrails. AI models try to block harm, get fooled, and then get used to analyze the very attack they helped run. Meanwhile, the so-called "AI agents" built for regular users, including OpenAI's early agent in its Atlas browser, still struggle with basics and often need help with something as simple as adding items to an online cart. Yet the criminal use cases are already accelerating.
Anthropic says it has patched the loopholes, banned the accounts, alerted the affected entities, and involved authorities. However, this episode illustrates what happens when powerful models are paired with patient, state-funded operators: the barrier to launching sophisticated attacks decreases, the speed increases, and the line between automation and autonomy becomes thinner.
Security teams have always raced hackers. Now they’re racing hackers and their AI assistants, and the assistants don’t get tired.

