In early May, the administration convened the chief executives of Anthropic, OpenAI, Google, Microsoft, and SpaceX to discuss the policy response to Anthropic’s Mythos Preview model, a system that autonomously discovered thousands of high- and critical-severity software vulnerabilities, including zero-days in production code dating back decades. The administration’s proposed response: a mandatory pre-release vetting regime modeled on the Food and Drug Administration’s pre-market approval process.
The instinct is understandable. On the Firefox 147 benchmark, Mythos developed working exploits 181 times, compared with just two for the previous-generation model. That is not an incremental development. It is a visible capability threshold crossing, the kind that demands a response. The problem is not the instinct. It is the targeting. The proposed vetting regime is aimed at the wrong chokepoint, and for the critical infrastructure operators I spent four years working to protect at CISA, that mismatch has direct operational consequences.
The Capability Is Not the Model
Mythos-class autonomous vulnerability discovery does not reside solely in the frontier model. It is a pipeline of four analytically distinct sub-tasks, or layers: vulnerability detection in isolated code segments, broad-spectrum scanning across large production codebases, exploit chain construction and iterative verification, and end-to-end attack chain execution. For the first three of those layers, the capability gap is already closed, or closable with minimal orchestration effort, using open-weight models available today without restriction.
The AgentFlow result, published in peer-reviewed research in April 2026, makes this concrete. Researchers wrapped a mid-tier open-weight model in a synthesized multi-agent harness and ran it against the Google Chrome codebase. The result: ten previously unknown zero-day vulnerabilities, including two Critical sandbox-escape CVEs, all confirmed by Google. The harness architecture is open-sourced. The model is publicly available. The compute ran on commodity cloud infrastructure. Mythos-class offensive output, delivered without Mythos.
That situation has now hardened further. DeepSeek-V4-Pro, released this month, is a 1.6-trillion-parameter open-weight model of PRC origin. It is available without access restrictions on public repositories, is designed for deployment on commodity hardware without frontier-scale infrastructure, and achieves frontier-class performance on the coding benchmarks most directly relevant to exploit chain construction. The state-sponsored actors most actively targeting US critical infrastructure (the PRC actors behind Volt Typhoon, the IRGC-affiliated groups behind the water sector campaigns) now have access to a frontier-coding-capable model they can run on-premise, fine-tune without API restrictions, and wrap in purpose-built harness architectures without export control exposure. The governance architecture has no instrument targeting any of this.
The relevant governance variable is not the model. It is the harness — the orchestration architecture that wraps a language model and converts raw capability into directed pipeline output. Restricting Mythos governs a component that is not the binding constraint. The pipeline runs without it.
The Defensive Parity Problem
Project Glasswing — Anthropic’s gated access program launched concurrently with the Mythos announcement — reflects an implicit recognition that if offensive AI capability cannot be reliably denied through model restriction, the relevant policy question shifts to whether defenders can access equivalent capability before adversaries do. That recognition is correct. The program design is not.
Project Glasswing’s twelve launch partners include Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, Nvidia, and Palo Alto Networks. They are exclusively American. No allied national CERT. No European cybersecurity agency. No Sector Risk Management Agency (SRMA), even though the SRMAs are the federal entities with statutory responsibility for critical infrastructure security and the primary federal conduits to resource-constrained operators across all sixteen critical infrastructure sectors.
More consequentially, none of the operators most exposed to the adversarial pipeline are in the program. I am referring to the organizations that own and operate the physical systems adversaries most want to disrupt: water and wastewater utilities serving smaller communities, rural electric cooperatives, regional hospitals, and mid-sized manufacturers with networked industrial control systems. These are organizations with genuine operational technology exposure — systems that control pumps, valves, turbines, and treatment processes — but without dedicated cybersecurity teams capable of deploying and operating frontier AI defensive tooling. They are the organizations for which defensive capability uplift is most consequential. They are also the organizations that Project Glasswing leaves entirely unserved.
I spent four years at CISA working directly with these operators. Directing Mythos-class defensive capability exclusively to organizations already operating at the defensive frontier does not close the gap the most exposed infrastructure faces. It widens it. Project Glasswing protects the protected.
The Window That Is Actually Open
The pipeline decomposition identifies exactly one domain where a genuine governance window remains open: the industrial control system and operational technology attack surface. Frontier models currently average 1.3 of 7 steps on realistic ICS attack scenarios. The constraint is not compute or raw model capability — it is domain-specific knowledge that current training has not yet provided. Industrial control systems operate on protocols — Modbus, DNP3, EtherNet/IP, PROFINET — that are poorly represented in general-purpose model training data. Physical-layer dependency mapping and vendor-specific programmable logic controller knowledge require specialized scaffolding that generic harness optimization has not yet supplied.
That gap is real. It is also temporary. The research program that produced AgentFlow demonstrated exactly this dynamic on the corporate IT surface: what appeared to be a frontier-model-only capability became achievable by a mid-tier open-weight model once the harness provided the domain-specific scaffolding that raw capability lacked. The same approach, applied to ICS targets, will eventually produce the same result. The question is not whether the gap will close. It is whether governance investment will establish defensive parity in ICS environments before it does. The adversarial incentives — state-sponsored actors with strategic interests in pre-positioned ICS access, a global AI capability race triggered by the Mythos announcement itself, and declining inference costs accelerating the economics of the transition — are all pointing in the same direction.
Three revisions to the proposed governance approach follow directly from this analysis.
First, shift from model-centric to harness-aware governance. The capability is the (model × harness × compute) triple, not the model alone. Pre-release evaluations must assess what a frontier model can do when wrapped in a competent synthesized multi-agent harness — not merely what it does in direct API mode. Evaluators should require developers to disclose the harness architectures they have tested against, and the capability results achieved.
Second, extend Project Glasswing’s access framework through SRMA-mediated pathways to reach the water utilities, rural electric cooperatives, regional hospitals, and smaller industrial operators that are most exposed and least served. The program’s stated rationale is defensive parity. Delivering on that rationale requires connecting the program to the operators who actually need it. That is a specific decision for the Office of the National Cyber Director, in coordination with the National Security Council. It does not require redesigning Project Glasswing from the ground up.
Third, establish the ICS completion rate on a standardized adversarial benchmark as the operative governance trigger. As of today, no publicly accessible ICS/OT-fidelity cyber range exists that tests orchestrated multi-agent AI capability at operationally meaningful task specificity. That absence is itself a governance failure. A standardized benchmark, with CVSS-relevant task specifications, replicable methodology, and public accessibility, would allow systematic monitoring of AI capability development against ICS targets. I propose 4 of 7 steps on a CVSS-relevant ICS scenario as a reasonable escalation trigger: the threshold at which governance posture must shift from monitoring to active defensive deployment. Establish that condition now, before the threshold is crossed, not in response to operational incidents after the fact.
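To make the proposed trigger concrete, here is a minimal sketch of how a monitoring body might evaluate it. Everything in the code is illustrative: the names `Run`, `mean_steps_by_pair`, and `governance_posture` are hypothetical, and the aggregation rule (average steps completed per model-and-harness pair, escalating as soon as any single pair averages at least 4 of 7) is my assumption about how the threshold could be operationalized, not a specification drawn from an existing benchmark.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from statistics import mean

TOTAL_STEPS = 7      # scenario length cited in the article
TRIGGER_STEPS = 4.0  # proposed escalation trigger (4 of 7 steps)


class Posture(Enum):
    MONITORING = "monitoring"
    ACTIVE_DEFENSIVE_DEPLOYMENT = "active defensive deployment"


@dataclass(frozen=True)
class Run:
    """One benchmark run of a (model, harness) pair on an ICS scenario."""
    model: str
    harness: str
    steps_completed: int


def mean_steps_by_pair(runs):
    """Average steps completed for each (model, harness) pair."""
    grouped = defaultdict(list)
    for run in runs:
        if not 0 <= run.steps_completed <= TOTAL_STEPS:
            raise ValueError(f"steps_completed out of range: {run}")
        grouped[(run.model, run.harness)].append(run.steps_completed)
    return {pair: mean(steps) for pair, steps in grouped.items()}


def governance_posture(runs, trigger=TRIGGER_STEPS):
    """Escalate if ANY pair's average reaches the trigger (assumed rule)."""
    averages = mean_steps_by_pair(runs)
    if not averages:
        raise ValueError("no benchmark runs supplied")
    if max(averages.values()) >= trigger:
        return Posture.ACTIVE_DEFENSIVE_DEPLOYMENT
    return Posture.MONITORING
```

Grouping by pair rather than by model alone reflects the harness-aware framing in the first recommendation: the same model can sit well below the trigger in direct API mode and cross it once a domain-specific harness is supplied.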
The administration’s response to Mythos reflects a genuine attempt to govern a real and serious capability threshold event. The instinct is correct. The targeting is not. Three of the four layers of the autonomous offensive pipeline are already closed. The fourth is closing on a timeline defined by adversarial investment, not policy deliberation. The window on the one remaining tractable domain is open today and measurably narrowing.
Restraint, in this environment, is itself a risk posture — and not the safer one.
For an expanded version of this article see my ICIT white paper “The Harness Gap: Orchestration, Defensive Parity, and the Closing Window for Critical Infrastructure AI Governance” (May 2026).



