1 What is Cooperative AI?
Cooperative AI is an emerging research field that promotes safe and beneficial interactions between advanced AI agents. This includes both addressing risks such as conflict and collusion, and realising the potential of AI to enhance human cooperation. This curriculum is meant to provide an introduction to the field.
This introductory section provides a broad overview of the field of cooperative AI. The resources present risks of advanced AI related to multi-agent interactions, but also touch on how advanced AI could present new opportunities for enhanced human cooperation, a topic that we will return to later in the curriculum.
We start off with a short video that introduces some basics of the field: the relationship between AI alignment and cooperative AI, the difference between cooperative capabilities and cooperative dispositions, and terms such as collective alignment:
Cooperative AI is still a young research field, even compared to other areas of ML and AI safety. There is therefore often a lack of consensus around terminology and definitions, and it can be difficult to get a sense of what the important research questions or relevant methods are. As you learn more about work in this area, it is good to be prepared for some confusion and contradictions. The following blogpost aims to disentangle some key concepts and the relationships between cooperative AI and neighbouring areas:
All parts
A recurring core concept in cooperative AI is cooperation problems: situations where individuals or groups would benefit from working together, yet have incentives that make cooperation difficult or unstable. Humans are often applauded for their remarkable ability to cooperate, but there are plenty of examples of human cooperation failing catastrophically. It seems likely that advanced AI systems and autonomous AI agents will be integrated into our society (e.g. AI assistants for warfare or AI delegates for a nation’s government), and the cooperation problems that will arise among them, as well as their impact on existing cooperation problems, are concerning, or at the very least unclear.
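To make the structure of a cooperation problem concrete, here is a minimal sketch of the prisoner’s dilemma, the canonical example. The payoff values are illustrative and not taken from any of the linked resources; any payoffs ordered temptation > reward > punishment > sucker produce the same dilemma.

```python
# A minimal prisoner's dilemma: PAYOFFS[(my_move, their_move)] gives my payoff.
# The numbers are illustrative; any payoffs with
# temptation > reward > punishment > sucker create the same dilemma.
PAYOFFS = {
    ("C", "C"): 3,  # reward for mutual cooperation
    ("C", "D"): 0,  # sucker's payoff
    ("D", "C"): 5,  # temptation to defect
    ("D", "D"): 1,  # punishment for mutual defection
}

def best_response(their_move: str) -> str:
    """The move that maximises my payoff against a fixed opponent move."""
    return max("CD", key=lambda my_move: PAYOFFS[(my_move, their_move)])

# Defecting is the best response no matter what the other agent does...
assert best_response("C") == "D" and best_response("D") == "D"
# ...yet mutual defection (1 each) is worse for both than mutual cooperation (3 each).
```

The point generalises: whenever individually rational responses lead to a collectively worse outcome, we are looking at a cooperation problem.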
List two examples of persistent cooperation problems, along with examples of tools or infrastructure that humans have created to solve them. (Solve here means: enable humans to work together in a way that benefits the group and helps them avoid conflict, races to the bottom, exploitation, or other multi-agent failures.)
The next resource goes into more depth on the differences between humans and AI systems as they relate to cooperation problems. It also provides a game-theoretic argument for the claim that AI alignment might not be enough to ensure the safety of AI systems in multi-agent settings. This is an important foundational piece for the field of cooperative AI, but it relies heavily on a good understanding of game theory, so it is optional at this point in the curriculum. The next section will more thoroughly introduce the concepts from game theory that are relevant to cooperative AI, so this paper is worth returning to if you find the time.
Abstract, Introduction, Cooperation between Copies, Cooperation by Reading Each Other’s Code, Self-Locating Beliefs
In the paper ‘Foundations of Cooperative AI’ linked above, the authors state that the traveller’s dilemma presented in the introduction can be “thought of as between two agents, each of which is perfectly aligned with a distinct subset of humanity, but the two subsets of humanity have slightly conflicting objectives”. Try to draw an explicit link between the traveller’s dilemma as presented and this scenario.
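If it helps to experiment with the dilemma’s mechanics, here is a small sketch of the standard traveller’s dilemma. The parameters below (claims between 2 and 100, with a bonus/penalty of 2) are the usual textbook values and may differ from the paper’s presentation; iterating best responses shows how mutually beneficial high claims unravel all the way to the minimum.

```python
# Traveller's dilemma sketch. Both agents claim an amount between LOW and HIGH;
# the lower claim is paid to both, with a bonus R to the lower claimant and a
# penalty R to the higher one. LOW, HIGH and R are the textbook parameters.
LOW, HIGH, R = 2, 100, 2

def payoff(mine: int, theirs: int) -> int:
    if mine < theirs:
        return mine + R    # undercutting earns the bonus
    if mine > theirs:
        return theirs - R  # overclaiming pays the penalty
    return mine            # equal claims are paid in full

def best_response(theirs: int) -> int:
    return max(range(LOW, HIGH + 1), key=lambda mine: payoff(mine, theirs))

# Starting from the mutually beneficial claim of 100, best responses unravel
# one step at a time (99, 98, 97, ...) down to the minimum.
claim = HIGH
while best_response(claim) != claim:
    claim = best_response(claim)
print(claim)  # 2
```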
The other sections of the above paper provide more background on the problem of cooperation among agents generally, and are optional content. Read those sections if you’re interested in learning how repeated interactions in games can be a powerful, yet fragile and often unrealistic, method for fostering cooperation, and how equilibrium selection can remain a problem even after an intervention aimed at improving cooperation.
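As a rough illustration of that repeated-interaction idea (the strategies and the ten-round horizon below are standard textbook choices, not taken from the paper): in an iterated prisoner’s dilemma, a reciprocal strategy such as tit-for-tat can sustain cooperation against itself, while only being exploited once by an unconditional defector.

```python
# Iterated prisoner's dilemma sketch, reusing the one-shot payoffs from above.
PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(opp_history: list) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opp_history else opp_history[-1]

def always_defect(opp_history: list) -> str:
    return "D"

def play(strat_a, strat_b, rounds: int = 10) -> tuple:
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(hist_b), strat_b(hist_a)
        score_a += PAYOFFS[(a, b)]
        score_b += PAYOFFS[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (30, 30): cooperation is sustained
print(play(tit_for_tat, always_defect))  # (9, 14): exploited once, then mutual defection
```

Note the fragility: cooperation here rests on the prospect of future rounds. If the horizon is known and finite, backward induction predicts defection from the first round, and even when cooperative equilibria exist, which equilibrium players end up in remains a selection problem.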
A lot of prior work on interactions between AI agents focused on multi-agent reinforcement learning (MARL), and such methods still have an important place in the field. (It is not necessary to understand MARL in detail to continue with this section.) In recent years, however, the focus has shifted towards large language models (LLMs), as these have become increasingly useful, and it seems likely that the most advanced agents deployed in the coming years will be built using LLMs.
The next resource covers multi-agent safety with a specific focus on LLM-based agents and argues why these require special consideration. It also introduces concepts such as emergent multi-agent behaviours and collusion, and discusses to what extent work from MARL carries over to interactions between LLM agents. If you are very new to the concept of LLM-based agents (or “agentic LLMs”), we recommend that you also read section 2.5 (at least up to 2.5.4).
Cooperation problems clearly predate the development of AI; human cooperation at different scales is often about resolving social dilemmas for mutual benefit. While advanced AI introduces new risks of cooperation failure, there is also the potential for AI tools to solve high-stakes cooperation problems better than we have managed so far. This is the subfield of cooperative AI called AI for human cooperation, and the next resource gives a taste of such work. The final resource of this section is a blogpost based on a research workshop held in July 2025, where experts worked on formulating visions of success for AI-facilitated cooperation and mapping out promising directions to work on.
All parts
The ideas of cooperative intelligence and cooperative capabilities are introduced in Cecilia's blogpost 'Cooperative AI: Three things that confused me as a beginner (and my current understanding)'. Do you think high general intelligence and capabilities in AI agents (the pursuit of which some would argue is dangerous) are necessary for high cooperative intelligence? To what extent do you think these concepts are separable or reliant on each other? In what way do they scale with each other? Is the relationship always positive? These questions are part of ongoing research and are purposefully vague and open. Have a think about them, but don’t spend longer than 30 minutes on this exercise.
Spend under 30 minutes trying to answer all of the following prompts about the field of cooperative AI. You are not meant to have detailed or confident answers to any of these questions yet; you will return to them later in the curriculum.
- How would you define the field of cooperative AI in your own words?
- What concepts that you’ve heard about so far confuse you?
- What problems do you think the cooperative AI field focuses on?
- What real-world, present-day scenarios would the field of cooperative AI be concerned about or focused on? What future scenarios might it be concerned about?
- Why does the field of cooperative AI matter?
