On Monday, Anthropic accused three Chinese AI firms of distilling its Claude, its chatbot. Then on Wednesday, Bloomberg released a report on how a hacker reportedly used the chatbot to break into Mexican government networks, stealing about 150GB of sensitive data, including taxpayer and employee records, potentially affecting up to 195 million people.

The attack involved manipulating Claude. The hacker allegedly pushed the AI tool beyond their safety guardrails using crafted prompts to turn them into assistants for scanning networks, writing attack scripts, and mapping vulnerabilities.

According to cybersecurity researchers at Gambit Security, the hacker reportedly spent about a month starting in December using the chatbot to generate step-by-step plans for moving through government networks. “In total, it produced thousands of detailed reports… telling the human operator exactly which internal targets to attack next,” said Curtis Simpson of Gambit Security.

According to the report, at first, the AI resisted. But like many systems trained to respond to natural language requests, the chatbot eventually generated malicious guidance after repeated prompting. The same method was also used on OpenAI’s chatbot to gather guidance on network movement, credential discovery, and ways to avoid detection, though OpenAI said its systems refused the malicious requests and banned the accounts involved.

Today, the Mexican tax authority stated that they were unable to find any proof that a breach occurred. The country’s electoral body said it hadn’t found any breaches in months, and the Jalisco government also denied a breach occurring, saying only federal networks were affected.

Though researchers found evidence of at least 20 vulnerabilities being probed across federal and local agencies.

This is just another incident in a continuous trend of AI chatbots being used to exploit vulnerabilities in security systems. Last November, Anthropic stated that it stopped a cyber breach incident involving Chinese state-sponsored groups.

Anthropic Confirms Claude Was Used in a Major Semi-Autonomous Cyberattack
The attackers’ strategy was a multi-phase misdirection, engineered to jailbreak the model’s safety guardrails.