Cooperative AI

The Science of Agent Networks

What It Is and Why It Matters

Understanding the system-level properties and dynamics of populations of advanced AI agents is a foundational scientific challenge. It requires characterising how the properties of individual agents – their capabilities, objectives, and behavioural dispositions – contribute to population-level outcomes, as well as how the structure and dynamics of agent networks give rise to emergent vulnerabilities, failures, and collective behaviours. Of particular importance are cases where groups of agents form a ‘collective agent’, exhibiting coherent collective ‘goals’, strategies, or capabilities that are not predictable from individual systems in isolation. Without this scientific foundation, individual-level safeguards may fail to anticipate system-level risks, reducing our ability to pre-empt, forecast, or diagnose the impacts of larger-scale deployments. We are especially interested in work that combines theoretical insight with empirical evaluation in realistic multi-agent settings.

Specific Work We Would Like to Fund

From individual properties to system-level safety. Determining how the cooperation-relevant properties of individual agents – their strategic capabilities, propensity to cooperate or defect, and susceptibility to manipulation – shape system-level outcomes (Tilli, 2026). Establishing how agents’ training data, objectives, and model specifications relate to their cooperation-relevant properties.

Evaluating vulnerabilities in networks of AI agents. New evaluation frameworks for risks specific to multi-agent deployments, such as resilience to adversarial sub-populations, propagation of attacks between agents (e.g., Lee & Tiwari, 2024), and susceptibility to cascading failures. Red-teaming frameworks that can surface new collective failure modes at scale.

Modelling emergent capabilities and communication. Models and metrics for predicting how collective capabilities, volatilities, and other safety-relevant properties vary with population size, heterogeneity, interaction topology, individual agent capabilities, and the availability of tools and resources. Areas of particular interest are the emergence and transferability of new forms of communication and the possibility of ‘phase transitions’ in agent populations.

Theoretical foundations of collective agency. Formal definitions of collective agency and emergent ‘goals’ or capabilities, with tractable operationalisations applicable to realistic settings. Several existing proposals (e.g., Szabo & Teo, 2015; Jørgensen et al., 2025) either require infeasibly many observations and interventions or rely on micro- or macro-level abstractions that are hard to instantiate in practice.

Evaluating dangerous emergent capabilities and goals. Evaluations that target whether combinations of agents exhibit specific dangerous capabilities or ‘goals’ absent in individuals. Examples include coordinating to resist modification or shutdown (Agrawal et al., 2026), decomposing tasks to evade per-agent safety filters (Jones et al., 2025), developing covert communication channels (Motwani et al., 2024), and accumulating resources and influence at the collective level.

Key Considerations

Please see the guidelines on research areas and out-of-scope topics.

References

Agrawal, Akash, Soroush Ebadian, and Lewis Hammond (2026). "The Multi-Agent Off-Switch Game". In Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026), Paphos, Cyprus, pp. 4–12.

Jones, Erik, Anca Dragan, and Jacob Steinhardt (2025). "Adversaries Can Misuse Combinations of Safe Models". In Proceedings of the 42nd International Conference on Machine Learning, pp. 28327–28349.

Jørgensen, Frederik Hytting, Sebastian Weichwald, and Lewis Hammond (2025). "Causal Foundations of Collective Agency". arXiv:2605.00248 (CLeaR 2026).

Lee, Donghyun, and Mo Tiwari (2024). "Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems". arXiv:2410.07283.

Motwani, Sumeet Ramesh, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip H. S. Torr, Lewis Hammond, and Christian Schroeder de Witt (2024). "Secret Collusion among AI Agents: Multi-Agent Deception via Steganography". In Advances in Neural Information Processing Systems 37 (NeurIPS 2024).

Szabo, Claudia, and Yong Meng Teo (2015). "Formalization of Weak Emergence in Multiagent Systems". ACM Transactions on Modeling and Computer Simulation (26:1), Article 6, pp. 1–25.

Tilli, Cecilia Elena (2026). "Agent Properties for Multi-Agent Safety". ICLR 2026 Workshop on AI Agents in the Wild (AIWILD).
