A research team at Carnegie Mellon University, working in collaboration with AI company Anthropic, has demonstrated that large language models (LLMs) can autonomously plan and execute complex cyberattacks against realistic enterprise network environments, offering a glimpse into the emerging cybersecurity challenges and opportunities posed by AI.
The project, led by Carnegie Mellon Ph.D. candidate Brian Singer, moves well beyond the capture-the-flag simulations commonly used to test LLMs in cybersecurity. Instead, the team recreated a full-scale enterprise network modeled on the 2017 Equifax data breach, one of the most infamous and costly cybersecurity incidents in U.S. history, and tested whether LLMs could break in without human guidance.
They could.
By integrating LLMs into a system of hierarchical agents and teaching the models an abstract “mental model” of red teaming behavior, rather than raw shell commands, the researchers found that the LLMs could effectively coordinate a multi-step attack. That included exploiting vulnerabilities, deploying malware, and exfiltrating data, all in a realistic network environment.
“Our research aimed to understand whether an LLM could perform the high-level planning required for real-world network exploitation, and we were surprised by how well it worked,” said Singer. “We found that by providing the model with an abstracted ‘mental model’ of network red teaming behavior and available actions, LLMs could effectively plan and initiate autonomous attacks through coordinated execution by sub-agents.”
The study highlights that the LLMs weren't executing the actual commands themselves. Instead, the models took on a leadership role, making strategic decisions and delegating lower-level tasks to a mix of LLM and non-LLM sub-agents. In this configuration, the model successfully ran through the Equifax breach playbook, exploiting the same types of vulnerabilities in the same network layout documented in congressional investigations.
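To make that division of labor concrete, here is a minimal Python sketch of the planner-and-executors pattern the article describes. It is an illustration under assumptions, not the team's actual system: the class names (`PlannerLLM`, `ScannerAgent`, `ExploitAgent`), the `AbstractAction` type, and the two-step toy scenario are all hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol

# Hypothetical abstract actions forming the planner's "mental model":
# the LLM reasons over high-level red-teaming moves, never raw shell commands.
@dataclass
class AbstractAction:
    name: str    # e.g. "scan_subnet", "exploit_service"
    target: str  # host or subnet the action applies to

class Executor(Protocol):
    def run(self, action: AbstractAction) -> str: ...

# Non-LLM sub-agent: translates one abstract action into low-level work.
class ScannerAgent:
    def run(self, action: AbstractAction) -> str:
        # A real system might invoke a scanner here; stubbed for illustration.
        return f"scan report for {action.target}: port 8080 open (web app)"

class ExploitAgent:
    def run(self, action: AbstractAction) -> str:
        return f"shell obtained on {action.target}"

class PlannerLLM:
    """Stand-in for the LLM planner: it chooses the next abstract action
    from the observed state rather than emitting shell commands itself."""
    def next_action(self, state: list[str]) -> AbstractAction | None:
        if not state:
            return AbstractAction("scan_subnet", "10.0.0.0/24")
        if "port 8080 open" in state[-1]:
            return AbstractAction("exploit_service", "10.0.0.5:8080")
        return None  # plan complete (or no known next move)

def run_episode() -> list[str]:
    planner = PlannerLLM()
    executors: dict[str, Executor] = {
        "scan_subnet": ScannerAgent(),
        "exploit_service": ExploitAgent(),
    }
    state: list[str] = []
    while (action := planner.next_action(state)) is not None:
        # The planner delegates; the sub-agent does the low-level work,
        # and its observation feeds back into the planner's next decision.
        state.append(executors[action.name].run(action))
    return state

if __name__ == "__main__":
    for step, observation in enumerate(run_episode(), 1):
        print(step, observation)
```

The design choice mirrored here is that the planner only ever sees and emits abstract actions, while the sub-agents own the tool-specific details, echoing the abstracted "mental model" Singer describes.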
It’s a major step forward in understanding what LLMs can do autonomously in complex, real-world cyber environments. And it’s not just a warning; it’s also a potential opportunity.
Singer emphasized that while these results point to real risks, they also open the door to building more accessible, AI-powered red teaming tools that could benefit organizations that can’t afford full-time human testers.
“Right now, only big companies can afford to run professional tests on their networks via expensive human red teams, and they might only do that once or twice a year,” he explained. “In the future, AI could run those tests constantly, catching problems before real attackers do. That could level the playing field for smaller organizations.”
The research team includes Singer, Anthropic’s Keane Lucas (a CMU CyLab alumnus), undergraduate ECE student Lakshmi Adiga, master’s student Meghna Jain, and faculty members Lujo Bauer and Vyas Sekar, co-directors of the CyLab Future Enterprise Security Initiative. The project was supported by the initiative and conducted in collaboration with Anthropic.
Their work has already been featured in industry security reports and was presented at a security-focused workshop hosted by OpenAI in May. The team is now exploring AI-versus-AI experiments—training LLMs to act as defenders that can identify and counter other AI-driven threats.
Singer stressed that the system is still in the research phase and not ready for real-world deployment. “It only works under specific conditions, and we do not have something that could just autonomously attack the internet,” he said. “But it’s a critical first step.”
Still, the implications are clear: as foundation models grow more capable, understanding how they behave in adversarial settings isn’t just an academic exercise—it’s a matter of real-world cybersecurity preparedness.
(AI was used in part to facilitate this article.)

