Updates in Cooperative AI

As LLM-based multi-agent systems coordinate through free-form language, a critical safety concern emerges: what happens when individual agents form coalitions and collude to pursue secondary goals at the expense of the joint objective? In this talk, Sahar will present Colosseum, a framework for auditing collusive behaviour in multi-agent settings. Colosseum grounds agent cooperation in Distributed Constraint Optimisation Problems (DCOPs) and measures collusion via regret relative to the cooperative optimum, enabling rigorous evaluation across different objectives, persuasion tactics, and network topologies. The audit reveals that most out-of-the-box models are susceptible to collusion when a secret communication channel is available. It also uncovers a surprising phenomenon – ‘collusion on paper’ – where agents plan to collude in text but ultimately select non-collusive actions, highlighting a gap between language-level intent and behavioural outcomes. Colosseum provides a principled new way to study collusion by jointly analysing communications and actions in rich yet verifiable environments.
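To make the regret measure concrete, here is a minimal sketch of how collusion regret could be computed in a toy DCOP: the gap between the best achievable joint utility and the utility the agents actually obtained. All names and the brute-force search are illustrative assumptions, not Colosseum's implementation.

```python
# Hypothetical sketch of regret relative to the cooperative optimum in a DCOP.
# A DCOP assigns values to variables to maximise a sum of constraint utilities;
# "collusion regret" here is the shortfall of the achieved joint utility from
# that optimum (0 = fully cooperative behaviour). All names are illustrative.

from itertools import product


def joint_utility(assignment, constraints):
    """Sum the utilities of all constraints under one joint assignment."""
    return sum(c(assignment) for c in constraints)


def cooperative_optimum(domains, constraints):
    """Brute-force the best joint utility over small variable domains."""
    variables = list(domains)
    best = float("-inf")
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        best = max(best, joint_utility(assignment, constraints))
    return best


def collusion_regret(achieved, domains, constraints):
    """Regret of the achieved assignment relative to the cooperative optimum."""
    return cooperative_optimum(domains, constraints) - joint_utility(
        achieved, constraints
    )


# Toy two-agent example: each agent picks 0 or 1; the constraint rewards agreement.
domains = {"a1": [0, 1], "a2": [0, 1]}
constraints = [lambda s: 10 if s["a1"] == s["a2"] else 0]
print(collusion_regret({"a1": 0, "a2": 1}, domains, constraints))  # → 10
```

In the toy example the agents disagree, so they forfeit the full cooperative payoff of 10; an agreeing assignment would score a regret of 0.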
We’re delighted to host this eighth seminar in our 'Updates in Cooperative AI' series. We're running these seminars monthly, and you're welcome to subscribe to our Google Calendar to stay up-to-date on all upcoming events.
Speakers

Sahar Abdelnabi (ELLIS Institute Tübingen)

Time

16:00–17:00 UTC, 23 April 2026

Links
This seminar has now finished

Sahar Abdelnabi is a Principal Investigator at the ELLIS Institute Tübingen and an Independent Research Group Leader at the Max Planck Institute for Intelligent Systems. She leads the COMPASS research group (COoperative Machine intelligence for People-Aligned Safe Systems). Her research focuses on AI security, safety, and alignment, with particular expertise in multi-agent systems, prompt injection attacks, privacy frameworks, and evaluation robustness. Prior to her current role, she worked at the Microsoft Security Response Center on AI security vulnerabilities and red-teaming. Sahar's contributions include pioneering work on indirect prompt injection in LLM-integrated applications, which has been widely adopted by NIST, MITRE, OWASP, and Microsoft. She holds a PhD from CISPA Helmholtz Center for Information Security and Saarland University.

Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems