Many important risks from collective AI systems – including collusion, cascading failures, and emergent collective capabilities – are fundamentally population-level phenomena, and oversight mechanisms designed for individual agents may not scale straightforwardly to large populations of interacting systems. We therefore seek technical methods to detect, attribute, discourage, and control unsafe behaviour in deployed multi-agent systems. These methods must remain robust under realistic deployment conditions, in particular multiple principals and partial observability. We expect that they will build on the foundational science of agent networks (Section 2) and may also leverage emerging agent infrastructure (Section 3).
Please see the guidelines on research areas and out-of-scope topics.
Bonjour, Trevor, Vaneet Aggarwal, and Bharat Bhargava (2022). "Information Theoretic Approach to Detect Collusion in Multi-Agent Games". In Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence (UAI 2022, PMLR 180), pp. 223–232.
Foerster, Hanna, Tom Blanchard, Kristina Nikolić, Ilia Shumailov, Cheng Zhang, Robert Mullins, Nicolas Papernot, Florian Tramèr, and Yiren Zhao (2026). "CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents". arXiv:2601.09923.
Greenblatt, Ryan, Buck Shlegeris, Kshitij Sachan, and Fabien Roger (2023). "AI Control: Improving Safety Despite Intentional Subversion". arXiv:2312.06942.
Riedl, Christoph (2026). "Emergent Coordination in Multi-Agent Language Models". arXiv:2510.05174 (ICLR 2026).
Rose, Aaron, Carissa Cullen, Brandon Gary Kaplowitz, and Christian Schroeder de Witt (2026). "Detecting Multi-Agent Collusion Through Multi-Agent Interpretability". arXiv:2604.01151.
Shah, Rohin, Alex Irpan, Alexander Matt Turner, Anna Wang, Arthur Conmy, et al. (2025). "An Approach to Technical AGI Safety and Security". arXiv:2504.01849.
Zhang, Shaokun, Ming Yin, Jieyu Zhang, Jiale Liu, Zhiguang Han, Jingyang Zhang, Beibin Li, Chi Wang, Huazheng Wang, Yiran Chen, and Qingyun Wu (2025). "Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems". In Proceedings of the 42nd International Conference on Machine Learning.