Cooperative AI

Multi-Agent Oversight and Control

What It Is and Why It Matters

Many important risks from collective AI systems – including collusion, cascading failures, and emergent collective capabilities – are fundamentally population-level phenomena. Oversight mechanisms designed for individual agents may not straightforwardly scale to large populations of interacting systems. We therefore seek technical methods to detect, attribute, discourage, and control unsafe behaviour in deployed multi-agent systems. Importantly, these methods must remain robust under realistic, multi-principal deployment constraints and partial observability. We expect that they will build on the foundational science of agent networks (Section 2) and may also leverage emerging agent infrastructure (Section 3).

Specific Work We Would Like to Fund

Detection of collusion and the evolution of inter-agent communication. Algorithms and tools to detect undesirable or unanticipated coordination between agents, including via emergent forms of communication or steganography (e.g., Bonjour et al., 2022; Riedl, 2026; Rose et al., 2026). Methods that can identify such signals from (partial observations of) both interaction and communication traces, ideally using privacy-preserving tools.
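As one illustration of what a trace-based detection signal might look like, the sketch below estimates the empirical mutual information between pairs of agents' discrete action traces and flags pairs above a threshold. This is a minimal sketch under stated assumptions and not the method of any of the cited works: the function names, the trace format (one equal-length list of discrete actions per agent), and the threshold are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two equal-length discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of co-occurring (x, y) pairs
    px = Counter(xs)              # marginal counts for the first sequence
    py = Counter(ys)              # marginal counts for the second sequence
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def flag_pairs(traces, threshold_bits):
    """Return (agent_a, agent_b, mi) for agent pairs whose traces exceed threshold_bits.

    `traces` is a hypothetical mapping from agent id to a list of discrete actions.
    """
    agents = sorted(traces)
    flagged = []
    for i, a in enumerate(agents):
        for b in agents[i + 1:]:
            mi = mutual_information(traces[a], traces[b])
            if mi > threshold_bits:
                flagged.append((a, b, mi))
    return flagged
```

High mutual information is only a prompt for further investigation: agents that observe the same environment signals will also appear correlated, and the plug-in estimator is biased upwards on short traces, so a real detector would need to condition on shared observations and calibrate against a non-colluding baseline.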
Attribution and oversight interfaces. Tools to help trace emergent failures back to specific agents, interactions, or delegation chains (Zhang et al., 2025). Interfaces and visualisations that make agent populations – their structure, relationships, and decision processes – legible to human overseers, including interactive tools for exploring and querying populations at runtime.
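The simplest form of tracing a failure back through a delegation chain can be sketched as a walk over delegation records. The sketch below assumes a hypothetical record format in which each agent has at most one delegating parent; real deployments may have richer, many-to-many delegation structures, and deciding which agent in the chain actually caused the failure is the harder open problem.

```python
def delegation_chain(delegated_by, agent):
    """Return the chain of agents from the root delegator down to `agent`.

    `delegated_by` is a hypothetical mapping from an agent id to the id of
    the agent that delegated work to it; root agents do not appear as keys.
    """
    chain = [agent]
    seen = {agent}
    while agent in delegated_by:
        agent = delegated_by[agent]
        if agent in seen:
            # Malformed records: refuse to loop forever.
            raise ValueError("cycle in delegation records")
        seen.add(agent)
        chain.append(agent)
    return chain[::-1]  # root first, failing agent last
```

For example, with records `{"retriever": "planner", "planner": "orchestrator"}`, calling `delegation_chain(records, "retriever")` yields the path from `orchestrator` through `planner` to `retriever`, which an oversight interface could then display alongside each agent's logged inputs and outputs.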
Multi-agent control and scalable oversight. Extensions of AI control (Greenblatt et al., 2023) and scalable oversight methodologies (see Shah et al., 2025, Section 6.1) to multi-agent settings. This includes designing secure harnesses and task-allocation architectures that respect cross-principal trust boundaries (Foerster et al., 2026), as well as red/blue-team evaluations of control protocols for robustness to subversion by groups of agents.
Mechanism and information design. Adaptive mechanism design tools and algorithms for promoting cooperation or preventing collusion among frontier-model agents in complex domains. Information design tools – what to reveal to which agents – to promote cooperation or reduce miscoordination. Circuit breakers, (de)synchronisation, and limits on agent action rates for stabilising volatile networks. Agents designed to foster population-level cooperation and stability when reliance on centralised mechanisms is undesirable or infeasible.
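To make the idea of action-rate limits and circuit breakers concrete, the sketch below combines a per-agent sliding-window rate limit with a population-level breaker that halts all actions once aggregate activity in the window reaches a cap. It is a minimal illustration under assumed parameters: the class name `ActionGate`, its arguments, and the policy of tripping on aggregate count are hypothetical choices, not a recommended design.

```python
from collections import deque


class ActionGate:
    """Per-agent rate limit plus a population-level circuit breaker (illustrative)."""

    def __init__(self, per_agent_limit, global_limit, window):
        self.per_agent_limit = per_agent_limit  # max actions per agent per window
        self.global_limit = global_limit        # max actions across all agents per window
        self.window = window                    # window length, same units as `now`
        self.agent_times = {}                   # agent id -> deque of recent action times
        self.global_times = deque()             # recent action times across all agents
        self.tripped = False

    def _evict(self, times, now):
        # Drop timestamps that have fallen out of the sliding window.
        while times and now - times[0] >= self.window:
            times.popleft()

    def allow(self, agent_id, now):
        """Return True if `agent_id` may act at time `now`, recording the action if so."""
        if self.tripped:
            return False
        times = self.agent_times.setdefault(agent_id, deque())
        self._evict(times, now)
        self._evict(self.global_times, now)
        if len(times) >= self.per_agent_limit:
            return False  # throttle this agent only
        if len(self.global_times) >= self.global_limit:
            self.tripped = True  # halt the whole population until reset
            return False
        times.append(now)
        self.global_times.append(now)
        return True

    def reset(self):
        """Re-open the breaker, e.g. after human review."""
        self.tripped = False
        self.global_times.clear()
```

Timestamps are passed in explicitly so that the behaviour is deterministic and easy to test. A centralised gate of this kind presupposes a single point through which all actions pass, which may not hold in multi-principal deployments.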

Key Considerations

Please see the guidelines on research areas and out-of-scope topics.

References

Bonjour, Trevor, Vaneet Aggarwal, and Bharat Bhargava (2022). "Information Theoretic Approach to Detect Collusion in Multi-Agent Games". In Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence (UAI 2022, PMLR 180), pp. 223–232.

Foerster, Hanna, Tom Blanchard, Kristina Nikolić, Ilia Shumailov, Cheng Zhang, Robert Mullins, Nicolas Papernot, Florian Tramèr, and Yiren Zhao (2026). "CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents". arXiv:2601.09923.

Greenblatt, Ryan, Buck Shlegeris, Kshitij Sachan, and Fabien Roger (2023). "AI Control: Improving Safety Despite Intentional Subversion". arXiv:2312.06942.

Riedl, Christoph (2026). "Emergent Coordination in Multi-Agent Language Models". arXiv:2510.05174 (ICLR 2026).

Rose, Aaron, Carissa Cullen, Brandon Gary Kaplowitz, and Christian Schroeder de Witt (2026). "Detecting Multi-Agent Collusion Through Multi-Agent Interpretability". arXiv:2604.01151.

Shah, Rohin, Alex Irpan, Alexander Matt Turner, Anna Wang, Arthur Conmy, et al. (2025). "An Approach to Technical AGI Safety and Security". arXiv:2504.01849.

Zhang, Shaokun, Ming Yin, Jieyu Zhang, Jiale Liu, Zhiguang Han, Jingyang Zhang, Beibin Li, Chi Wang, Huazheng Wang, Yiran Chen, and Qingyun Wu (2025). "Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems". In Proceedings of the 42nd International Conference on Machine Learning.
