Fellows' Spotlight

Adversarial learning algorithms that explicitly search for flaws in agents' policies have been successfully applied to finding robust and diverse policies in multi-agent settings. However, the success of adversarial learning has been largely limited to zero-sum settings because its naive application in cooperative settings leads to a critical failure mode: agents are irrationally incentivised to self-sabotage, blocking the completion of tasks and halting further learning. To address this, we introduce Rationality-preserving Policy Optimisation (RPO), a formalism for adversarial optimisation that avoids self-sabotage by ensuring agents remain rational – that is, their policies are optimal with respect to some possible partner policy. To solve RPO, we develop Rational Policy Gradient (RPG), which trains agents to maximise their own reward in a modified version of the original game in which we use opponent shaping techniques to optimise the adversarial objective. RPG enables us to extend a variety of existing adversarial learning algorithms that, no longer subject to the limitations of self-sabotage, can find adversarial examples, improve robustness and adaptability, and learn diverse policies. We empirically validate that our approach achieves strong performance in several popular cooperative and general-sum environments.
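To make the rationality condition concrete, here is a minimal sketch on a toy one-shot cooperative game. It is not the RPG algorithm from the talk. The reward table, the function names, and the restriction to deterministic partner policies are all assumptions made for illustration; checking best responses only against deterministic partners is a simplification that happens to be sufficient for this particular table. The sketch contrasts a naive adversary, which picks whatever action minimises the shared reward, with an adversary restricted to actions that are a best response to some partner action.

```python
# Hypothetical shared-reward table (an assumption, not from the talk):
# REWARD[a][b] is the team reward when the agent plays action a and its
# partner plays action b.
REWARD = [
    [1.0, 0.0],  # action 0: coordinates with partner action 0
    [0.2, 1.0],  # action 1: coordinates with partner action 1
    [0.0, 0.0],  # action 2: "sabotage", never earns any reward
]


def rational_actions(reward):
    """Return actions that are a best response to at least one
    deterministic partner action (a simplified rationality check)."""
    n_partner = len(reward[0])
    # Best achievable reward against each partner action.
    best = [max(row[b] for row in reward) for b in range(n_partner)]
    return [
        a
        for a, row in enumerate(reward)
        if any(row[b] == best[b] for b in range(n_partner))
    ]


def naive_adversary(reward, partner_action):
    """Pick the action that minimises reward, with no rationality constraint."""
    return min(range(len(reward)), key=lambda a: reward[a][partner_action])


def rational_adversary(reward, partner_action):
    """Pick the reward-minimising action among rational actions only."""
    return min(rational_actions(reward), key=lambda a: reward[a][partner_action])


if __name__ == "__main__":
    print(rational_actions(REWARD))       # [0, 1]
    print(naive_adversary(REWARD, 0))     # 2
    print(rational_adversary(REWARD, 0))  # 1
```

Against a partner who plays action 0, the naive adversary chooses the sabotage action, which reveals nothing about the partner. The constrained adversary chooses action 1, which is a sensible convention in its own right and so exposes a genuine coordination failure.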
Speakers

Niklas Lauffer (UC Berkeley)

Discussants
Time

16:00-17:00 UTC 25 June 2026

Links
This seminar has now finished

Niklas Lauffer recently finished his doctorate in computer science at UC Berkeley, where he was advised by Stuart Russell and Sanjit Seshia, and will be joining Google DeepMind as a research scientist. His research centers on AI safety and reinforcement learning, with a focus on training safe, robust, and collaborative agents in open-ended multi-agent interactions. He is an NSF Graduate Research Fellow and a Cooperative AI Fellow. Previously, he spent time on the Reasoning and Agents team at Scale AI as well as the Planning and Scheduling group at NASA Ames Research Center. He received his BS in computer science and math from UT Austin, where he worked with Ufuk Topcu. Visit his website to learn more about this work.

Robust and Diverse Multi-Agent Learning via Rational Policy Gradient