1 What is Cooperative AI?
Required Content: 2 hrs 45 mins
Cooperative AI is an emerging research field focused on improving the cooperative intelligence of advanced AI for the benefit of all. This includes both addressing risks of advanced AI related to multi-agent interactions and realizing the potential of advanced AI to enhance human cooperation. This curriculum is meant to provide an introduction to cooperative AI.
This introductory section aims to give a broad overview of the field of cooperative AI. The resources present risks of advanced AI related to multi-agent interactions, but also touch on how advanced AI could open up new opportunities for enhanced human cooperation – a topic we will return to later in the curriculum.
By the end of the section, you should be able to:
- Explain why alignment may not be enough to ensure the safety of AI systems in a multi-agent setting.
- Explain how classical game-theoretic problems could apply to AI systems.
- Give examples of research questions that target multi-agent safety for large language models (LLMs).
- Explain, with an example, how AI could enable better cooperation among humans.
We start off with a short video that introduces some basics about the field: the relationship between AI alignment and cooperative AI, the difference between cooperative capabilities and cooperative dispositions, and terms such as collective alignment:
A recurring core concept in cooperative AI is the cooperation problem: a situation where individuals or groups would benefit from working together, yet have incentives that make cooperation difficult or unstable. The next resource goes into more depth on the differences between humans and AI systems as they relate to cooperation problems. If you are not familiar with game theory, it might be helpful to review the linked background material as well, but the main things to take away from this paper are the conceptual ideas of how AI cooperation might differ from human cooperation.
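Before turning to that resource, here is a minimal concrete example of a cooperation problem: the textbook prisoner's dilemma. The payoffs below are standard illustrative values, not taken from the linked paper; the point is that each player's individually rational choice leads to an outcome both would prefer to avoid.

```python
# Illustrative sketch of a cooperation problem using textbook prisoner's dilemma
# payoffs (these numbers are illustrative, not taken from the linked paper).
# Each entry is (row player's payoff, column player's payoff).
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def best_response(other_action):
    """Row player's payoff-maximising action against a fixed opponent action."""
    return max(["cooperate", "defect"], key=lambda a: PAYOFFS[(a, other_action)][0])

# Defecting is the best response whatever the other player does...
assert best_response("cooperate") == "defect"
assert best_response("defect") == "defect"
# ...yet mutual defection (1, 1) leaves both worse off than mutual cooperation (3, 3):
# the players' incentives pull them away from the outcome they would jointly prefer.
```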
Read the following sections of ‘Foundations of Cooperative AI’: Abstract, Introduction, Cooperation between Copies, Cooperation by Reading Each Other’s Code, Self-Locating Beliefs.
In the paper ‘Foundations of Cooperative AI’ linked above, the authors state that the traveller’s dilemma presented in the introduction can be “thought of as between two agents, each of which is perfectly aligned with a distinct subset of humanity, but the two subsets of humanity have slightly conflicting objectives”. Try to draw the link explicitly between the traveller’s dilemma as presented and this scenario.
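If you want to experiment with the dilemma yourself, the following sketch uses the standard formulation of the traveller's dilemma (claims between 2 and 100, bonus and penalty of 2; the exact numbers in the paper may differ) and checks by brute force that mutual low claims form the only pure-strategy Nash equilibrium:

```python
# Standard traveller's dilemma: both players claim an amount between 2 and 100.
# Equal claims are paid out as-is; otherwise both receive the lower claim, with a
# bonus of 2 for the lower claimant and a penalty of 2 for the higher one.
# (Parameters are the standard ones; the paper's exact numbers may differ.)

CLAIMS = range(2, 101)

def payoff(my_claim, other_claim, bonus=2):
    if my_claim == other_claim:
        return my_claim
    low = min(my_claim, other_claim)
    return low + bonus if my_claim < other_claim else low - bonus

def best_responses(other_claim):
    scores = {c: payoff(c, other_claim) for c in CLAIMS}
    top = max(scores.values())
    return [c for c, s in scores.items() if s == top]

# A pair (a, b) is a pure-strategy Nash equilibrium if each claim is a best
# response to the other. The only such pair is (2, 2), even though (100, 100)
# would pay both players far more.
br = {o: best_responses(o) for o in CLAIMS}
equilibria = [(a, b) for a in CLAIMS for b in CLAIMS if a in br[b] and b in br[a]]
print(equilibria)  # [(2, 2)]
```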
The other sections of the above paper provide more background on the problem of cooperation among agents generally, and are optional content. Read them if you’re interested in learning how repeated interactions in games can be a powerful, yet fragile and unrealistic, method for fostering cooperation, and how equilibrium selection can persist as a problem even after an intervention aimed at improving cooperation.
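As a small illustration of the “repeated interactions” idea from those optional sections, the sketch below (illustrative only, not drawn from the paper) shows tit-for-tat sustaining cooperation against itself, and how a single accidental defection can unravel it:

```python
# Tit-for-tat cooperates first and then copies the opponent's last move.
# Against itself it cooperates forever, but one accidental defection sends the
# pair into a cycle of retaliation, illustrating the fragility of this mechanism.

def tit_for_tat(opponent_moves):
    return opponent_moves[-1] if opponent_moves else "cooperate"

def play(strategy_a, strategy_b, rounds=6, noisy_round=None):
    seen_by_a, seen_by_b = [], []          # each player's record of the other's moves
    for t in range(rounds):
        move_a = strategy_a(seen_by_a)
        move_b = strategy_b(seen_by_b)
        if t == noisy_round:               # inject one accidental defection by player a
            move_a = "defect"
        seen_by_a.append(move_b)
        seen_by_b.append(move_a)
    return seen_by_b, seen_by_a            # moves actually played by a and by b

print(play(tit_for_tat, tit_for_tat))                   # mutual cooperation every round
print(play(tit_for_tat, tit_for_tat, noisy_round=1))    # one slip triggers alternating defection
```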
Early work on interactions between AI agents focused on Multi-Agent Reinforcement Learning (MARL), and such methods still have an important place in the field. (It is not necessary to understand MARL in detail to continue with this section.) In recent years, however, the focus has shifted more towards large language models (LLMs), as these have become increasingly useful, and it seems likely that the most advanced agents deployed in the coming years will be built using LLMs.
The next resource (which also cites the previous paper) covers multi-agent safety with a specific focus on LLM-based agents and argues why these agents require special consideration. It also introduces concepts such as emergent multi-agent behaviours and collusion, and discusses to what extent work from MARL carries over to interactions between LLM agents.
If you are very new to the concept of LLM-based agents (or “agentic LLMs”) we recommend that you also read section 2.5 (at least up to 2.5.4).
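For readers who have not seen LLM-based agents interact, here is a deliberately bare-bones, hypothetical sketch of the basic structure; the `llm_respond` function is a placeholder for whichever chat-completion API you use, not a real library call.

```python
# Two LLM agents, each with its own system prompt, take turns adding messages
# to a shared transcript. Multi-agent safety questions arise from loops like
# this one: do the agents reach mutually beneficial agreements, escalate
# conflicts, or collude?

def llm_respond(system_prompt: str, transcript: list[str]) -> str:
    """Placeholder: query an LLM with its system prompt plus the shared transcript."""
    raise NotImplementedError

def run_interaction(prompt_a: str, prompt_b: str, turns: int = 4) -> list[str]:
    transcript: list[str] = []
    for turn in range(turns):
        system_prompt = prompt_a if turn % 2 == 0 else prompt_b
        transcript.append(llm_respond(system_prompt, transcript))
    return transcript
```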
Cooperation problems clearly predate the development of artificial intelligence; human cooperation at different scales is all about resolving social dilemmas for mutual benefit. While advanced AI introduces new risks of cooperation failure, there is also the potential to use AI tools to solve high-stakes cooperation problems better than we have managed so far. This is the subfield of cooperative AI called AI for human cooperation, and the next resource gives a taste of such work. Several of the concepts in this talk (e.g. mechanism design, norms and institutions) are central to work in cooperative AI both when the focus is on human cooperation and when it is on AI agent interactions.
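As a flavour of what mechanism design means in practice, here is a small, standard example (illustrative only, not taken from the linked talk): the second-price auction, whose rules are designed so that bidding one's true value is a dominant strategy.

```python
# Second-price (Vickrey) auction: the highest bidder wins but pays the
# second-highest bid, so overbidding or underbidding relative to one's true
# value cannot improve the outcome. The rules of the interaction are designed
# so that individual incentives line up with an efficient overall result.

def second_price_auction(bids: dict[str, float]) -> tuple[str, float]:
    """Return (winner, price): highest bidder wins, pays the second-highest bid."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner = ranked[0]
    price = bids[ranked[1]] if len(ranked) > 1 else 0.0
    return winner, price

print(second_price_auction({"alice": 120.0, "bob": 90.0, "carol": 105.0}))
# ('alice', 105.0) -- alice wins and pays carol's bid
```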
Cooperative AI is still a young research field, even compared to other areas of ML and AI Safety. There is therefore often a lack of consensus around terminology and definitions, and it can be difficult to get a sense of what the important research questions or relevant methods are. As you learn more about work in this area, it is good to be prepared for some confusion and contradictions. The following blog post aims to disentangle some key concepts and the relationships between cooperative AI and neighbouring areas:
Read all parts.
The ideas of cooperative intelligence and cooperative capabilities are introduced in Cecilia's blog post 'Cooperative AI: Three things that confused me as a beginner (and my current understanding)'. Do you think high general intelligence and capabilities among AI agents (the pursuit of which some would argue is dangerous) are necessary for high cooperative intelligence? To what extent do you think these concepts are separable or reliant on each other? In what way do they scale with each other? Is the relationship always positive? These questions are part of ongoing research, and are purposefully vague and open.
