Sunday, December 3, 2023

Artificial Intelligence, Critical Systems, and the Control Problem

Artificial Intelligence (AI) is transforming our way of life, from new forms of social organization and scientific discovery to defense and intelligence. This explosive progress is especially apparent in the subfield of machine learning (ML), where AI systems learn autonomously by identifying patterns in large volumes of data.[1] Indeed, over the last five years, the fields of AI and ML have witnessed stunning advancements in computer vision (e.g., object recognition), speech recognition, and scientific discovery.[2], [3], [4], [5] However, these advances are not without risk: transformative technologies, such as nuclear energy, the Internet, and synthetic biology, are generally accompanied by a significant risk profile. Experts are increasingly voicing concerns over AI risk from misuse by state and non-state actors, principally in the areas of cybersecurity and disinformation propagation. Yet issues of control (for example, how well advanced AI decision-making aligns with human goals) are less prominent in the discussion of risk and could ultimately be equally or more dangerous than threats from nefarious actors. Modern ML systems are not programmed (as programming is typically understood) but rather independently develop strategies to complete objectives, and those objectives can be mis-specified, learned incorrectly, or executed in unexpected ways. This issue becomes more pronounced as AI becomes more ubiquitous and we become more reliant on AI decision-making. Thus, as AI becomes increasingly entwined with tightly coupled critical systems, the focus must expand beyond accidents and misuse to the autonomous decision processes themselves.

The principal mid- to long-term risks from AI systems fall into three broad categories: misuse or accidents, structural risks, and misaligned objectives. The misuse or accident category includes AI-enabled cyber-attacks with increased speed and effectiveness and the generation and distribution of disinformation at scale.[6] In critical infrastructure, AI accidents could manifest as system failures with secondary and tertiary effects across connected networks. A well-known example of an algorithmic accident is the 2010 “Flash Crash,” during which the Dow Jones Industrial Average plunged roughly 600 points in five minutes.[7] Such rapid and unexpected behavior from algorithmic trading platforms will only grow in destructive potential as systems increase in complexity, interconnectedness, and autonomy.
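How a single failure produces “secondary and tertiary effects across connected networks” can be illustrated with a toy load-redistribution model, a common way to study cascading failures. The sketch below is hypothetical and does not model any real grid, exchange, or trading system: each node carries a load and has a fixed capacity, and when a node fails its load is split evenly among its surviving neighbors, which may push them past capacity in turn. The function name `cascade`, the five-node topology, and the load and capacity values are all invented for illustration.

```python
from collections import deque


def cascade(neighbors, load, capacity, initial_failure):
    """Toy load-redistribution cascade (hypothetical model).

    neighbors: dict mapping each node to its adjacent nodes
    load, capacity: dicts mapping each node to a number
    Returns the nodes in the order they failed.
    """
    load = dict(load)  # work on a copy so the caller's loads are unchanged
    failed, order = set(), []
    queue = deque([initial_failure])
    while queue:
        node = queue.popleft()
        if node in failed:
            continue  # already failed via another path
        failed.add(node)
        order.append(node)
        survivors = [n for n in neighbors[node] if n not in failed]
        if not survivors:
            continue  # nowhere to shed load
        share = load[node] / len(survivors)
        for n in survivors:
            load[n] += share  # the failed node's load shifts to its neighbors
            if load[n] > capacity[n]:
                queue.append(n)  # overloaded: this neighbor fails next


    return order


# Hypothetical network: five nodes in a line, each carrying 0.6 units of load.
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
loads = {n: 0.6 for n in line}

print(cascade(line, loads, {n: 1.0 for n in line}, "A"))  # ['A', 'B', 'C', 'D', 'E']
print(cascade(line, loads, {n: 2.0 for n in line}, "A"))  # ['A']
```

With little headroom, one initial failure propagates down the whole line; doubling each node's capacity contains it. Real market or grid cascades are far more complex, but the sketch shows why thin margins in tightly coupled systems let a local fault spread.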

The structural risks category is concerned with how AI technologies shape the social and geopolitical environment in which they are deployed. Important contemporary examples include the impact of social media content-selection algorithms on political polarization, and the uncertainty AI introduces into nuclear deterrence and the offense-defense balance.[8],[9] For example, integrating AI into critical systems, including peripheral processes (e.g., command and control, targeting, supply chain, and logistics), can degrade multilateral trust in deterrence.[10] Indeed, increasing autonomy in every link of the national defense chain, from decision support to offensive weapons deployment, compounds the uncertainty already raised in debates over autonomous weapons.[11]

Misaligned objectives constitute another important failure mode. Because ML systems develop independent strategies, a concern is that AI systems will misinterpret the intended objectives, develop destructive subgoals, or pursue their objectives in unpredictable ways. Although the two are typically grouped together, a system crash differs from actions executed by a misaligned AI system, and the distinction matters for evaluating appropriate risk mitigation measures. Understanding the range of potential failures may help in allocating resources among research on system robustness, interpretability, and AI alignment.

At its most basic level, AI alignment involves teaching AI systems to accurately capture what we want and to pursue it in a safe and ethical manner. Misalignment poses the greatest downside risk of catastrophic failure. While system failures by themselves could be immensely damaging, alignment failures could include unexpected and surprising actions outside the system’s intended purpose or the range of behaviors its designers anticipated. Ensuring that AI systems interpret human objectives safely and accurately is deceptively complex: the problem seems straightforward on the surface, but it hides subtleties that could lead to dangerous consequences.

In contrast with nuclear weapons or cyber threats, where the risks are more obvious, risks from AI misalignment can be less clear. These complexities have led to misinterpretation and confusion, with some attributing the concerns to disobedient or malicious AI systems.[12] However, the concern is not that AI will defy its programming but rather that it will follow its programming exactly and develop novel, unanticipated solutions. In effect, the AI may pursue the specified objective faithfully and still yield an unintended, even harmful, consequence. Google’s AlphaGo program, which defeated Go[13] champion Lee Se-dol in 2016, provides an illustrative example of the potential for unexpected solutions. Trained on millions of positions from human games, AlphaGo’s neural networks learned moves that lay completely outside the human frame of reference.[14] As Chris Anderson explains, what took humans thousands of years to optimize, AlphaGo completed in three years, executing “better, almost alien solutions that we hadn’t even considered.”[15] This novelty illustrates how unpredictable AI systems can be when permitted to develop their own strategies to accomplish a defined objective.
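The point that an optimizer can follow its objective exactly and still produce unintended behavior (often called specification gaming) can be shown with a deliberately tiny, hypothetical example; it is not a reproduction of AlphaGo or of any cited system. A designer wants an agent to walk down a five-cell track to a goal and, hoping to encourage progress, adds a small bonus for entering a “checkpoint” cell. Value iteration then computes the optimal policy for that reward. The track, the bonus of 1, the goal reward of 5, and the discount factor of 0.95 are all invented assumptions.

```python
# Hypothetical 1-D track: positions 0..4, agent starts at 0, goal at 4.
N, GOAL, CHECKPOINT = 5, 4, 2
ACTIONS = (-1, +1)  # move left or right


def step(s, a):
    """Deterministic move, clamped to the ends of the track."""
    return min(max(s + a, 0), N - 1)


def reward(s_next, checkpoint_bonus, goal_bonus):
    """Designer's (hypothetical) reward for entering the goal or the checkpoint."""
    if s_next == GOAL:
        return goal_bonus
    if s_next == CHECKPOINT:
        return checkpoint_bonus
    return 0.0


def q_value(s, a, V, checkpoint_bonus, goal_bonus, gamma):
    s_next = step(s, a)
    return reward(s_next, checkpoint_bonus, goal_bonus) + gamma * V[s_next]


def value_iteration(checkpoint_bonus, goal_bonus, gamma=0.95, iters=500):
    V = [0.0] * N  # the goal is terminal, so its value stays 0
    for _ in range(iters):
        V = [0.0 if s == GOAL else
             max(q_value(s, a, V, checkpoint_bonus, goal_bonus, gamma) for a in ACTIONS)
             for s in range(N)]
    return V


def rollout(checkpoint_bonus, goal_bonus, gamma=0.95, max_steps=20):
    """Follow the optimal (greedy with respect to V) policy from position 0."""
    V = value_iteration(checkpoint_bonus, goal_bonus, gamma)
    s, path = 0, [0]
    for _ in range(max_steps):
        if s == GOAL:
            break
        best = max(ACTIONS, key=lambda a: q_value(s, a, V, checkpoint_bonus, goal_bonus, gamma))
        s = step(s, best)
        path.append(s)
    return path


print(GOAL in rollout(checkpoint_bonus=1.0, goal_bonus=5.0))  # False: it circles the checkpoint
print(rollout(checkpoint_bonus=0.0, goal_bonus=5.0))          # [0, 1, 2, 3, 4]
```

Nothing in the code tells the agent to “cheat”: circling the checkpoint simply earns more discounted reward (about 10.3 from the cell next to the goal) than finishing (5). The numbers are arbitrary; the point is that a faithful optimizer exploits whatever the specified reward actually pays for.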

To appreciate why AI systems pose these risks by default, it is important to understand how and why they pursue objectives. As described, ML systems are not given explicit step-by-step instructions; instead, they are left to determine the most efficient means of reaching an objective. During training, a model’s parameters are adjusted to minimize a loss, the difference between the model’s output and the target value, commonly by gradient descent.[16] In reinforcement learning (RL), an agent instead learns by trial and error, adjusting its behavior to maximize a reward signal that incentivizes desired outcomes.[17] Much as humans respond to positive reinforcement, RL agents are goal-directed: they pursue the objective they are given, whether or not it aligns with the designers’ original intent.
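The idea of adjusting parameters to shrink the gap between a system’s output and its target can be sketched in a few lines. This is a hypothetical one-parameter example, not any production training method: it fits the model y = w·x to three invented data points by gradient descent on mean squared error, with an assumed learning rate of 0.05. Real systems adjust millions or billions of parameters in the same spirit.

```python
def loss(w, xs, ys):
    """Mean squared error of the one-parameter model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w, xs, ys):
    """Derivative of the loss with respect to w: mean of 2 * (w*x - y) * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, lr=0.05, steps=200):
    w = 0.0  # arbitrary starting parameter
    for _ in range(steps):
        w -= lr * gradient(w, xs, ys)  # step downhill to reduce the loss
    return w


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # invented data following y = 2x
w = train(xs, ys)
print(round(w, 4), loss(w, xs, ys) < 1e-8)  # 2.0 True
```

The training loop never “knows” that the answer is 2; it only follows the loss downhill. Whatever the loss (or, in RL, the reward) measures is what the system ends up pursuing.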

Computer scientist Stephen Omohundro describes a set of basic “AI drives” that systems will pursue “unless explicitly counteracted.”[18] According to Omohundro, even without being programmed to do so, AI agents will strive to self-improve, acquire resources, and protect themselves.[19] Related tendencies have since been formalized mathematically: Turner et al. show that, in a broad class of environments, optimal policies “tend to seek power over the environment” in order to achieve their objectives.[20] Thus, AI agents are naturally incentivized to seek out resources useful for accomplishing an objective. OpenAI reported this kind of “power-seeking” behavior in an experiment where two teams of agents, trained to play hide-and-seek in a simulated environment, learned to hoard objects to keep them from the opposing team, in what OpenAI described as emergent “tool use” distinct from the actual objective.[21] The agents learned that the objects were “instrumental” in completing the objective.[22] A significant concern for AI researchers, therefore, is the unspecified instrumental sub-goals that agents pursue on the way to their final objective. Oxford philosopher Nick Bostrom calls the tendency of agents to converge on such sub-goals the “instrumental convergence thesis,” postulating that certain intermediate sub-goals are “likely to be pursued by an intelligent agent” because they help it complete its final objective more efficiently.[23] Consider an advanced AI system optimized to ensure adequate power across several cities: the agent could develop a sub-goal of capturing and redirecting bulk power from other locations to ensure grid stability. Similarly, an autonomous weapons system designed to identify targets could develop its own set of intermediate indicators to determine the identity and location of the enemy. Instrumental sub-goals could be as simple as locking a computer-controlled access door or breaking traffic laws in an autonomous car, or as severe as destabilizing a regional power grid or nuclear power control system.

These hypothetical and novel AI decision processes raise troubling questions about conflict and the safety of critical systems. The range of possible AI solutions is too large to enumerate, and it will only grow more consequential as systems become more capable and complex. The effect of AI misalignment could be disastrous if an AI discovers an unanticipated “optimal” solution to a problem that renders a critical system inoperable or yields a catastrophic result.
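The claim that optimal agents “tend to seek power” is a statement about what optimization does across many possible goals, not about any particular goal. A loose, hypothetical illustration of that intuition (not Turner et al.’s formal construction): from a start state, an agent can move either to a dead end with one reachable outcome or to a “hub” from which several outcomes remain reachable. If each outcome’s reward is drawn at random, the optimal choice is the hub whenever the best of its outcomes beats the single alternative, which for k hub outcomes happens with probability k/(k+1). The sketch estimates that fraction by sampling; the environment layout, the Uniform(0, 1) rewards, the function name `fraction_choosing_hub`, and the seed are all assumptions.

```python
import random


def fraction_choosing_hub(hub_options=3, trials=50_000, seed=0):
    """Estimate how often an optimal agent heads for the option-rich hub.

    Hypothetical environment: from the start state the agent can move to a
    dead end (one terminal outcome) or to a hub (hub_options terminal
    outcomes). Every outcome's reward is drawn i.i.d. from Uniform(0, 1),
    giving a fresh random reward function in each trial.
    """
    rng = random.Random(seed)
    hub_wins = 0
    for _ in range(trials):
        dead_end = rng.random()
        best_in_hub = max(rng.random() for _ in range(hub_options))
        if best_in_hub > dead_end:  # optimal play takes the larger reward
            hub_wins += 1
    return hub_wins / trials


for k in (1, 3, 9):
    # Analytically the fraction is k / (k + 1): the chance that the best of
    # k uniform draws beats one more independent draw.
    print(k, "hub outcomes:", round(fraction_choosing_hub(k), 3))
```

No reward in the sketch mentions options or control; the preference for the state that keeps more outcomes reachable falls out of optimizing over uncertain goals, which is the intuition behind “power-seeking” in this literature.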

While the control problem is troubling by itself, the integration of multiagent systems could be far more dangerous and could lead to other (as yet unanticipated) failure modes between systems. Just like complex societies, complex agent communities could manifest new capabilities and emergent failure modes unique to the complex system. Indeed, AI failures are unlikely to happen in isolation, and work toward multiagent AI environments is already underway in both the public and private sectors.
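An emergent failure can arise even when every agent behaves reasonably in isolation. The sketch below is a hypothetical common-pool model, not drawn from the cited sources: several identical agents draw from a shared, regenerating resource (think of a power or bandwidth budget), and after each round the remainder regrows logistically toward a fixed capacity. The function name `simulate_commons`, the per-agent demand of 4 units, the capacity of 100, and the growth rate of 0.5 are all invented for illustration.

```python
def simulate_commons(n_agents, demand=4.0, capacity=100.0, growth=0.5, rounds=50):
    """Toy common-pool model (hypothetical parameters).

    Each round the agents together take n_agents * demand units (or whatever
    is left), then the remaining pool regrows logistically toward capacity.
    Returns the pool level after every round.
    """
    pool, history = capacity, []
    for _ in range(rounds):
        harvest = min(pool, n_agents * demand)
        remaining = pool - harvest
        pool = remaining + growth * remaining * (1 - remaining / capacity)
        history.append(pool)
    return history


for n in (3, 4):
    print(n, "agents, pool after 50 rounds:", round(simulate_commons(n)[-1], 1))
# Three agents settle near a pool of 72; a fourth identical agent drives it to 0.
```

Each agent’s demand is modest and unchanged between the two runs; the collapse is a property of the collective, because total demand (16) exceeds the most the pool can regrow in one round (12.5 with these parameters).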

Several U.S. government initiatives for next-generation intelligent networks include adaptive learning agents for autonomous processes. Two notable ongoing initiatives are the Department of Defense’s Joint All-Domain Command and Control (JADC2) concept for networked operations, which the Army’s network modernization is designed to support, and the Resilient and Intelligent Next-Generation Systems (RINGS) program, a partnership of the National Science Foundation (NSF), the National Institute of Standards and Technology (NIST), and the Department of Defense.[24], [25] Literature on the cognitive Internet of Things (IoT) points to the degree of autonomy planned for self-configuring, adaptive AI “communities” and “societies” that steer networks by managing user intent, supervising autonomy, and exercising control.[26] A recent article published by IEEE, the world’s largest technical professional organization, outlines the benefits of deep RL agents for cyber security, arguing that because RL agents are highly “capable of solving complex, dynamic, and especially high-dimensional” problems, they are well suited to cyber defense.[27] The authors propose RL agents that are released to autonomously configure the network, prevent cyber exploits, detect and counter jamming attacks, and offensively target distributed denial-of-service attacks.[28] Other researchers have proposed automated penetration testing and self-replicating RL agents, while still others propose autonomous cyber red-teaming agents for cyber defense.[29], [30], [31]

Given the problems of alignment, unexpected side effects, and control discussed so far, rushing headfirst into efforts that give AI meaningful control over critical systems (such as the examples described above) without carefully considering potentially catastrophic outcomes does not appear to be the appropriate course of action. Proposing the use of one autonomous system in warfare is concerning, but releasing millions into critical networks is another matter entirely. Researcher David Manheim explains that multiagent systems are vulnerable to entirely novel risks, such as “over-optimization failures,” where optimization pressure allows individual agents to circumvent designed limits.[32] As Manheim describes, “In many-agent systems, even relatively simple systems can become complex adaptive systems due to agent behavior.”[33] At the same time, research suggests that multiagent environments lead to greater agent generalization, thus narrowing the capability gap between human and machine intelligence.[34] In contrast, some authors, notably Eric Drexler, present systems of many AI services as a viable solution to the control problem, with “stable, bounded capabilities,” while others note the broad uncertainty and potential for self-adaptation and mutation.[35] Yet Drexler himself acknowledges risks: the multiplicative growth of RL agents could lead to unexpected failures, including the manifestation of malignant agential behaviors.[36],[37] AI researcher Trent McConaghy highlights the risk from adaptive AI systems, specifically “decentralized autonomous organizations” (DAOs) in blockchain networks. McConaghy suggests that rather than a powerful AI system seizing control of resources, as is typically discussed, the danger may be far more subtle: we could simply hand over global resources to self-replicating communities of adaptive AI systems (e.g., Bitcoin, whose energy expenditures continue to grow with no sign of slowing).[38]
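Manheim’s concern about over-optimization is closely related to Goodhart’s law: a measure that is optimized hard stops tracking what it was meant to measure. A minimal single-agent sketch of one variant, selection on a noisy proxy, shows why more optimization pressure widens the gap between what is measured and what is wanted. This is a swapped-in illustration, not Manheim’s multi-agent model; the Gaussian true values and noise, the function name `selection_gap`, the trial count, and the seed are all assumptions.

```python
import random
import statistics


def selection_gap(n_candidates, trials=500, seed=1):
    """Average amount by which the proxy overstates the true value of the
    candidate chosen for having the highest proxy score.

    Hypothetical model: true value ~ N(0, 1); proxy = true value + N(0, 1) noise.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        candidates = []
        for _ in range(n_candidates):
            true_value = rng.gauss(0, 1)
            candidates.append((true_value + rng.gauss(0, 1), true_value))
        proxy_value, true_value = max(candidates)  # select on the proxy alone
        gaps.append(proxy_value - true_value)
    return statistics.mean(gaps)


for n in (1, 10, 1000):
    print(n, "candidates searched, average overstatement:", round(selection_gap(n), 2))
```

The proxy here is honest and unbiased, yet the winner’s proxy score overstates its true value, and the overstatement grows as the search widens (in this model, to roughly half of the winning proxy score). In a many-agent setting, each agent’s optimization adds pressure of this kind.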

Advanced AI capabilities in next-generation networks that “dynamically reconfigure and reorganize” network operations pose undeniable risks to security and stability.[39],[40] A complex landscape of AI agents, designed to autonomously protect critical networks or conduct offensive operations, would invariably need to develop subgoals to manage its diverse objectives. Thus, whether in individual systems or autonomous collectives, the web of potential failures and subtle side effects could unleash unpredictable dangers, leading to catastrophic second- and third-order effects. As AI systems are currently designed, detecting these subgoals, let alone understanding their impact, could be extremely difficult or impossible. The examples above illustrate critical infrastructure and national security cases that are currently under discussion, but the reality could be far more complex, unexpected, and dangerous. While most AI researchers expect safety to develop alongside system autonomy and complexity, there is no certainty in this proposition. Indeed, if there is even a minute chance of misalignment in AI systems deployed in critical infrastructure or national defense, researchers should dedicate a portion of their resources to evaluating the risks. Decision makers in government and industry must consider these risks, and potential means to mitigate them, before general-purpose AI systems are integrated into critical and national security infrastructure; to do otherwise could lead to catastrophic failure modes that we may not be able to fully anticipate, endure, or overcome.


Disclaimer: The authors are responsible for the content of this article. The views expressed do not reflect the official policy or position of the National Intelligence University, the National Geospatial-Intelligence Agency, the Department of Defense, the Office of the Director of National Intelligence, the U.S. Intelligence Community, or the U.S. Government.


Anderson, Chris. “Life.” In Possible Minds: Twenty-Five Ways of Looking at AI, edited by John Brockman, 150. New York: Penguin Books, 2019.

Avatrade Staff. “The Flash Crash of 2010.” Avatrade. August 26, 2021. https://www.avatrade.com/blog/trading-history/the-flash-crash-of-2010 (accessed August 24, 2022).

Baker, Bowen, et al. “Emergent Tool Use From Multi-Agent Autocurricula.” arXiv:1909.07528v2, 2020.

Berggren, Viktor, et al. “Artificial intelligence in next-generation connected systems.” Ericsson. September 2021. https://www.ericsson.com/en/reports-and-papers/white-papers/artificial-intelligence-in-next-generation-connected-systems (accessed May 3, 2022).

Bostrom, Nick. “The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents.” Minds and Machines 22, no. 2 (2012): 71-85.

Brown, Tom B., et al. “Language Models are Few-Shot Learners.” arXiv:2005.14165, 2020.

Buchanan, Ben, John Bansemer, Dakota Cary, Jack Lucas, and Micah Musser. “Automating Cyber Attacks: Hype and Reality.” Georgetown University Center for Security and Emerging Technology. November 2020. https://cset.georgetown.edu/publication/automating-cyber-attacks/.

Byford, Sam. “AlphaGo’s battle with Lee Se-dol is something I’ll never forget.” The Verge. March 15, 2016. https://www.theverge.com/2016/3/15/11234816/alphago-vs-lee-sedol-go-game-recap (accessed August 19, 2022).

Drexler, K. Eric. “Reframing Superintelligence: Comprehensive AI Services as General Intelligence.” Future of Humanity Institute. 2019. https://www.fhi.ox.ac.uk/wp-content/uploads/Reframing_Superintelligence_FHI-TR-2019-1.1-1.pdf (accessed August 19, 2022).

Duettmann, Allison. “WELCOME NEW PLAYERS | Gaming the Future.” Foresight Institute. February 14, 2022. https://foresightinstitute.substack.com/p/new-players?s=r (accessed August 19, 2022).

Edison, Bill. “Creating an AI red team to protect critical infrastructure.” MITRE Corporation. September 2019. https://www.mitre.org/publications/project-stories/creating-an-ai-red-team-to-protect-critical-infrastructure (accessed August 19, 2022).

Etzioni, Oren. “No, the Experts Don’t Think Superintelligent AI is a Threat to Humanity.” MIT Technology Review. September 20, 2016. https://www.technologyreview.com/2016/09/20/70131/no-the-experts-dont-think-superintelligent-ai-is-a-threat-to-humanity/ (accessed August 19, 2022).

GCN Staff. “NSF, NIST, DOD team up on resilient next-gen networking.” GCN. April 30, 2021. https://gcn.com/cybersecurity/2021/04/nsf-nist-dod-team-up-on-resilient-next-gen-networking/315337/ (accessed May 1, 2022).

Jumper, John, et al. “Highly accurate protein structure prediction with AlphaFold.” Nature 596 (August 2021): 583–589.

Kallenborn, Zachary. “Swords and Shields: Autonomy, AI, and the Offense-Defense Balance.” Georgetown Journal of International Affairs. November 22, 2021. https://gjia.georgetown.edu/2021/11/22/swords-and-shields-autonomy-ai-and-the-offense-defense-balance/ (accessed August 19, 2022).

Kegel, Helene. “Understanding Gradient Descent in Machine Learning.” Medium. November 17, 2021. https://medium.com/mlearning-ai/understanding-gradient-descent-in-machine-learning-f48c211c391a (accessed August 19, 2022).

Krakovna, Victoria. “Specification gaming: the flip side of AI ingenuity.” Medium. April 11, 2020. https://deepmindsafetyresearch.medium.com/specification-gaming-the-flip-side-of-ai-ingenuity-c85bdb0deeb4 (accessed August 19, 2022).

Littman, Michael L., et al. “Gathering Strength, Gathering Storms: The One Hundred Year Study on Artificial Intelligence (AI100) Study Panel Report.” Stanford University. September 2021. http://ai100.stanford.edu/2021-report (accessed August 19, 2022).

Manheim, David. “Overoptimization Failures and Specification Gaming in Multi-agent Systems.” Deep AI. October 16, 2018. https://deepai.org/publication/overoptimization-failures-and-specification-gaming-in-multi-agent-systems (accessed August 19, 2022).

Marcus, Gary, Ernest Davis, and Scott Aaronson. “A very preliminary analysis of DALL-E 2.” arXiv:2204.13807, 2022.

Nguyen, Thanh Thi, and Vijay Janapa Reddi. “Deep Reinforcement Learning for Cyber Security.” IEEE Transactions on Neural Networks and Learning Systems. IEEE, 2021. 1-17.

Omohundro, Stephen M. “The Basic AI Drives.” Proceedings of the 2008 conference on Artificial General Intelligence 2008: Proceedings of the First AGI Conference. Amsterdam: IOS Press, 2008. 483–492.

Panfili, Martina, Alessandro Giuseppi, Andrea Fiaschetti, Homoud B. Al-Jibreen, Antonio Pietrabissa, and Francesco Delli Priscoli. “A Game-Theoretical Approach to Cyber-Security of Critical Infrastructures Based on Multi-Agent Reinforcement Learning.” 2018 26th Mediterranean Conference on Control and Automation (MED). IEEE, 2018. 460-465.

Pico-Valencia, Pablo, and Juan A Holgado-Terriza. “Agentification of the Internet of Things: A Systematic Literature Review.” International Journal of Distributed Sensor Networks 14, no. 10 (2018).

Pomerleau, Mark. “US Army network modernization sets the stage for JADC2.” C4ISRNet. February 9, 2022. https://www.c4isrnet.com/it-networks/2022/02/09/us-army-network-modernization-sets-the-stage-for-jadc2/ (accessed August 19, 2022).

Russell, Stuart. Human Compatible: Artificial Intelligence and the Problem of Control. New York: Viking, 2019.

Shah, Rohin. “Reframing Superintelligence: Comprehensive AI Services as General Intelligence.” AI Alignment Forum. January 8, 2019. https://www.alignmentforum.org/posts/x3fNwSe5aWZb5yXEG/reframing-superintelligence-comprehensive-ai-services-as (accessed August 19, 2022).

Avin, Shahar, and S. M. Amadae. “Autonomy and machine learning at the interface of nuclear weapons, computers and people.” In The Impact of Artificial Intelligence on Strategic Stability and Nuclear Risk, edited by Vincent Boulanin, 105-118. Stockholm: Stockholm International Peace Research Institute, 2019.

Trevino, Marty. “Cyber Physical Systems: The Coming Singularity.” Prism 8, no. 3 (2019): 4.

Turner, Alexander Matt, Logan Smith, Rohin Shah, Andrew Critch, and Prasad Tadepalli. “Optimal Policies Tend to Seek Power.” arXiv:1912.01683, 2021: 8-9.

Winder, Phil. “Automating Cyber-Security With Reinforcement Learning.” Winder.AI. n.d. https://winder.ai/automating-cyber-security-with-reinforcement-learning/ (accessed August 19, 2022).

Zeng, Andy, et al. “Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language.” arXiv:2204.00598, 2022.

Zewe, Adam. “Does this artificial intelligence think like a human?” MIT News. April 6, 2022. https://news.mit.edu/2022/does-this-artificial-intelligence-think-human-0406 (accessed August 19, 2022).

Zwetsloot, Remco, and Allan Dafoe. “Thinking About Risks From AI: Accidents, Misuse and Structure.” Lawfare. February 11, 2019. https://www.lawfareblog.com/thinking-about-risks-ai-accidents-misuse-and-structure (accessed August 19, 2022).


[1] (Zewe 2022)

[2] (Littman, et al. 2021)

[3] (Jumper, et al. 2021)

[4] (Brown, et al. 2020)

[5] (Marcus, Davis, and Aaronson 2022)

[6] (Buchanan, et al. 2020)

[7] (Avatrade Staff 2021)

[8] (Russell 2019, 9-10)

[9] (Zwetsloot and Dafoe 2019)

[12] (Etzioni 2016)

[13] Go is an ancient Chinese strategy board game.

[14] (Byford 2016)

[15] (Anderson 2019, 150)

[16] (Kegel 2021)

[17] (Krakovna 2020)

[18] (Omohundro 2008, 483-492)

[19] Ibid., 484.

[20] (Turner, et al. 2021, 8-9)

[21] (Baker, et al. 2020)

[22] Ibid.

[23] (Bostrom 2012, 71-85)

[24] (GCN Staff 2021)

[25] (Pomerleau 2022)

[26] (Berggren, et al. 2021)

[27] (Nguyen and Reddi 2021)

[28] Ibid.

[29] (Edison 2019)

[30] (Panfili, et al. 2018)

[31] (Winder n.d.)

[32] (Manheim 2018)

[33] Ibid.

[34] (Zeng, et al. 2022)

[35] (Drexler 2019, 18)

[36] Ibid.

[37] (Shah 2019)

[38] (Duettmann 2022)

[39] (Trevino 2019)

[40] (Pico-Valencia and Holgado-Terriza 2018)

Dr. Mark Bailey and Kyle Kilian
Dr. Mark Bailey is the Chair of the Cyber Intelligence and Data Science Department, as well as the Co-Director of the Data Science Intelligence Center, at the National Intelligence University. Prior to that, he worked as a data scientist on several AI programs within the U.S. Department of Defense and the Intelligence Community. Dr. Bailey is also a Major in the U.S. Army Reserve. He can be contacted at [email protected]. Kyle Kilian is an intelligence officer at the National Geospatial-Intelligence Agency (NGA) where he leads an analytical modernization team focused on integrating artificial intelligence (AI) into agency tradecraft. Mr. Kilian is a recent graduate of the National Intelligence University, where he studied the control problem and AI futures. Prior to his time at NIU, Kyle served as a senior analyst at NGA and an adjunct professor in activity-based intelligence at the NGA College. He can be contacted at [email protected].
