1 What is Cooperative AI?
Cooperative AI is an emerging research field that promotes safe and beneficial interactions between advanced AI agents. This includes both addressing risks such as conflict and collusion, and realising the potential of AI to enhance human cooperation. This curriculum is meant to provide an introduction to the field.
This introductory section provides a broad overview of the field of cooperative AI. The resources present risks of advanced AI related to multi-agent interactions, but also touch on how advanced AI could present new opportunities for enhanced human cooperation, a topic that we will return to later in the curriculum.
We start off with a short video that introduces some basics of the field: the relationship between AI alignment and cooperative AI, the difference between cooperative capabilities and cooperative dispositions, and terms such as collective alignment:
Cooperative AI is still a young research field, even compared to other areas of ML and AI safety. There is therefore often a lack of consensus around terminology and definitions, and it can be difficult to get a sense of what the important research questions or relevant methods are. As you learn more about work in this area, it is good to be prepared for some confusion and contradictions. The following blogpost aims to disentangle some key concepts and the relationships between cooperative AI and neighbouring areas:
All parts
A recurring core concept in cooperative AI is cooperation problems: situations where individuals or groups would benefit from working together, yet have incentives that make cooperation difficult or unstable. Humans are often applauded for their remarkable ability to cooperate, but there are plenty of examples of human cooperation failing catastrophically. It seems likely that advanced AI systems and autonomous AI agents will be integrated into our society (e.g. AI assistants for warfare or AI delegates for a nation’s government), and the cooperation problems that will arise among them, as well as their impact on existing cooperation problems, are concerning, or at the very least unclear.
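To make the structure of a cooperation problem concrete, here is a minimal sketch of the prisoner’s dilemma, the canonical example. The payoff values are illustrative and not taken from any of the linked resources; any payoffs ordered temptation > reward > punishment > sucker produce the same dilemma.

```python
# A minimal prisoner's dilemma: PAYOFFS[(my_move, their_move)] gives my payoff.
# The numbers are illustrative; any payoffs with
# temptation > reward > punishment > sucker create the same dilemma.
PAYOFFS = {
    ("C", "C"): 3,  # reward for mutual cooperation
    ("C", "D"): 0,  # sucker's payoff
    ("D", "C"): 5,  # temptation to defect
    ("D", "D"): 1,  # punishment for mutual defection
}

def best_response(their_move: str) -> str:
    """The move that maximises my payoff against a fixed opponent move."""
    return max("CD", key=lambda my_move: PAYOFFS[(my_move, their_move)])

# Defecting is the best response no matter what the other agent does...
assert best_response("C") == "D" and best_response("D") == "D"
# ...yet mutual defection (1 each) is worse for both than mutual cooperation (3 each).
```

The point generalises: whenever individually rational responses lead to a collectively worse outcome, we are looking at a cooperation problem.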
List two examples of persistent cooperation problems, along with examples of tools or infrastructure that humans have created to solve them. (Solve here means: enable humans to work together in a way that benefits the group and helps them avoid conflict, races to the bottom, exploitation, or other multi-agent failures.)
The next resource goes into more depth on the differences between humans and AI systems as they relate to cooperation problems. It also provides a game-theoretic argument for the claim that AI alignment might not be enough to ensure the safety of AI systems in multi-agent settings. This is an important foundational piece for the field of cooperative AI, but it relies heavily on a good understanding of game theory, so it is optional at this point in the curriculum. The next section will more thoroughly introduce the concepts from game theory that are relevant to cooperative AI, so this paper is worth returning to if you find the time.
Abstract, Introduction, Cooperation between Copies, Cooperation by Reading Each Other’s Code, Self-Locating Beliefs
In the paper ‘Foundations of Cooperative AI’ linked above, the authors state that the traveller’s dilemma presented in the introduction can be “thought of as between two agents, each of which is perfectly aligned with a distinct subset of humanity, but the two subsets of humanity have slightly conflicting objectives”. Try to draw an explicit link between the traveller’s dilemma as presented and this scenario.
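If it helps to experiment with the dilemma’s mechanics, here is a small sketch of the standard traveller’s dilemma. The parameters below (claims between 2 and 100, with a bonus/penalty of 2) are the usual textbook values and may differ from the paper’s presentation; iterating best responses shows how mutually beneficial high claims unravel all the way to the minimum.

```python
# Traveller's dilemma sketch. Both agents claim an amount between LOW and HIGH;
# the lower claim is paid to both, with a bonus R to the lower claimant and a
# penalty R to the higher one. LOW, HIGH and R are the textbook parameters.
LOW, HIGH, R = 2, 100, 2

def payoff(mine: int, theirs: int) -> int:
    if mine < theirs:
        return mine + R    # undercutting earns the bonus
    if mine > theirs:
        return theirs - R  # overclaiming pays the penalty
    return mine            # equal claims are paid in full

def best_response(theirs: int) -> int:
    return max(range(LOW, HIGH + 1), key=lambda mine: payoff(mine, theirs))

# Starting from the mutually beneficial claim of 100, best responses unravel
# one step at a time (99, 98, 97, ...) down to the minimum.
claim = HIGH
while best_response(claim) != claim:
    claim = best_response(claim)
print(claim)  # 2
```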
The other sections of the above paper provide more background on the problem of cooperation among agents generally, and are optional content. Read those sections if you’re interested in learning how repeated interactions in games can be a powerful, yet fragile and often unrealistic, method for fostering cooperation, and how equilibrium selection can remain a problem even after an intervention aimed at improving cooperation.
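As a rough illustration of that repeated-interaction idea (the strategies and the ten-round horizon below are standard textbook choices, not taken from the paper): in an iterated prisoner’s dilemma, a reciprocal strategy such as tit-for-tat can sustain cooperation against itself, while only being exploited once by an unconditional defector.

```python
# Iterated prisoner's dilemma sketch, reusing the one-shot payoffs from above.
PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(opp_history: list) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opp_history else opp_history[-1]

def always_defect(opp_history: list) -> str:
    return "D"

def play(strat_a, strat_b, rounds: int = 10) -> tuple:
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(hist_b), strat_b(hist_a)
        score_a += PAYOFFS[(a, b)]
        score_b += PAYOFFS[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (30, 30): cooperation is sustained
print(play(tit_for_tat, always_defect))  # (9, 14): exploited once, then mutual defection
```

Note the fragility: cooperation here rests on the prospect of future rounds. If the horizon is known and finite, backward induction predicts defection from the first round, and even when cooperative equilibria exist, which equilibrium players end up in remains a selection problem.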
A lot of prior work on interactions between AI agents focused on multi-agent reinforcement learning (MARL), and such methods still have an important place in the field. (It is not necessary to understand MARL in detail to continue with this section.) In recent years, however, the focus has shifted towards large language models (LLMs), as these have become increasingly useful, and it seems likely that the most advanced agents deployed in the coming years will be built using LLMs.
The next resource covers multi-agent safety with a specific focus on LLM-based agents and argues why these require special consideration. It also introduces concepts such as emergent multi-agent behaviours and collusion, and discusses to what extent work from MARL carries over to interactions between LLM agents. If you are very new to the concept of LLM-based agents (or “agentic LLMs”), we recommend that you also read section 2.5 (at least up to 2.5.4).
Cooperation problems clearly predate the development of AI; human cooperation at different scales is often about resolving social dilemmas for mutual benefit. While advanced AI introduces new risks of cooperation failure, there is also the potential for AI tools to solve high-stakes cooperation problems better than we have managed so far. This is the subfield of cooperative AI called AI for human cooperation, and the next resource gives a taste of such work. The final resource of this section is a blogpost based on a research workshop held in July 2025, where experts worked on formulating visions of success for AI-facilitated cooperation and mapping out promising directions to work on.
All parts
The ideas of cooperative intelligence and cooperative capabilities are introduced in Cecilia's blogpost 'Cooperative AI: Three things that confused me as a beginner (and my current understanding)'. Do you think high general intelligence and capabilities in AI agents (the pursuit of which some would argue is dangerous) are necessary for high cooperative intelligence? To what extent do you think these concepts are separable or reliant on each other? In what way do they scale with each other? Is the relationship always positive? These questions are part of ongoing research and are purposefully vague and open. Have a think about them, but don’t spend longer than 30 minutes on this exercise.
Spend under 30 minutes trying to answer all of the following prompts about the field of cooperative AI. You are not meant to have detailed or confident answers to any of these questions yet; you will return to them later in the curriculum.
- How would you define the field of cooperative AI in your own words?
- What concepts that you’ve heard about so far confuse you?
- What problems do you think the cooperative AI field focuses on?
- What real-world, present-day scenarios would the field of cooperative AI be concerned about or focused on? What future scenarios might it be concerned about?
- Why does the field of cooperative AI matter?
