4 Cooperative Agents

The first sections of the curriculum have focused on the problems that the field of cooperative AI is aiming to tackle, and presented game theory and complex systems as possible framings. We will now turn to different approaches to solving these problems, starting with interventions that focus on the agents themselves and their inherent properties.

Learning Objectives:
Be able to broadly define what it means for an agent to be cooperative, potentially using concepts from game theory.
Describe and compare how different agent designs fare in a multi-agent setting like the iterated prisoner's dilemma with evolution.
Comment on the efficacy and practicality of training certain biases or tendencies into agents to improve their cooperativeness.

The Properties of a Cooperative Agent

The idea that a goal of cooperative AI should be to create cooperative agents has been present since the field's establishment. The following optional reading, published in 2021, can be seen as a starting point for the field of cooperative AI, and outlines cooperative intelligence as a central property that AI should possess. If you’re interested in the history of cooperative AI, we recommend reading the suggested sections and taking note of the authors.

Machines must learn to find common ground

Up to 'AI for human collaboration'

Optional • 1600 Words (Technical)

If you want to take a deeper dive into the cognitive skills or individual capabilities that make up cooperative intelligence, as claimed by the previous piece, then the following reading might be helpful. Note, however, that it is quite a technical paper and spends most of its time listing citations rather than explaining concepts, which is why it is also optional.

Open Problems in Cooperative AI

‘4. Cooperative Capabilities' excluding '4.4 Institutions'

Optional • 5200 Words (Technical)

Settling on a formal definition of what makes an agent cooperative is tricky and an ongoing area of research. If we want to avoid catastrophic cooperation failures between agents in the real world, we must figure out how to create agents that are not only able to cooperate well, but that are also likely to actually be deployed. Imagine that you can choose the properties of your own AI assistant. Would you pick an assistant that always prioritises the common good, or would you pick one that prioritises what is best for you? Chances are, most people (and companies) would prefer an assistant that is able to negotiate good deals for them and protect their interests.

Exercise 4.1

What do you think are the properties that make an agent cooperative? After jotting down your thoughts try to construct as formal a definition as you can (feel free to use concepts from game theory if it’s helpful). This is meant to be a hard, explorative exercise, so don't panic and don’t spend longer than 20 minutes on it.

Required
Exercise 4.2

In 2022 Anthropic introduced “Constitutional AI”, where they used a set of prescriptive statements to direct the fine-tuning of their LLM Claude. Their constitution includes statements such as the following: “Please choose the assistant response that is as harmless and ethical as possible. Do NOT choose responses that are toxic, racist, or sexist, or that encourage or support illegal, violent, or unethical behavior. Above all the assistant's response should be wise, peaceful, and ethical.”

While this constitution is written for a chatbot, you could imagine a similar set of instructions written for a more autonomous kind of AI agent that could interact with other agents and humans. Try to formulate a few statements for an “AI Constitution” that would encourage cooperative behaviour.

Required

The next resources attempt to break down our understanding of what a cooperative agent would be into more granular and specific agent properties that could more easily be specified, evaluated and trained for. The first is a blogpost and the second is a presentation. Feel free to engage with both or just one of them depending on your preferences as they cover the same concepts (the presentation does however include audience questions from 38:12).

Agent Properties for Safe Interactions

All parts

Required • 2300 Words
Properties of Cooperative Agents by Cecilia Tilli

All parts

Required • 50 mins

The following interactive piece is useful for seeing how different agent designs fare in a very particular multi-agent setting, but one that is analogous to many real-life situations: the iterated prisoner’s dilemma (with evolution).

Evolution of Trust by Nicky Case

All parts

Required • 30 mins
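
If you'd like to tinker beyond the interactive piece, below is a minimal sketch in Python of an iterated prisoner's dilemma tournament with evolution, loosely mirroring the reading's setup: strategies play round-robin, and each generation the lowest scorers are replaced by copies of the highest scorers. The payoffs follow the piece's coin mechanic (cooperating costs you 1 coin and gives the other player 3), but the population sizes and number of generations are illustrative assumptions.

```python
# A minimal iterated prisoner's dilemma tournament with evolution.
# Payoffs from the row player's perspective: (my_move, their_move) -> my_payoff.
PAYOFF = {("C", "C"): 2, ("C", "D"): -1, ("D", "C"): 3, ("D", "D"): 0}

def copycat(opponent_history):
    # Tit-for-tat: cooperate first, then copy the opponent's last move.
    return opponent_history[-1] if opponent_history else "C"

def cooperator(opponent_history):
    return "C"  # always cooperate

def cheater(opponent_history):
    return "D"  # always defect

def play_match(a, b, rounds=10):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a, move_b = a(hist_b), b(hist_a)  # each sees the other's history
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

# Evolution: round-robin tournament, then replace the 5 worst agents
# with copies of the 5 best (selection and reproduction, no mutation).
population = [copycat] * 5 + [cooperator] * 10 + [cheater] * 10

for generation in range(10):
    scores = [0] * len(population)
    for i in range(len(population)):
        for j in range(i + 1, len(population)):
            s_i, s_j = play_match(population[i], population[j])
            scores[i] += s_i
            scores[j] += s_j
    ranked = sorted(range(len(population)), key=lambda k: scores[k])
    for worst, best in zip(ranked[:5], ranked[-5:]):
        population[worst] = population[best]
    counts = {f.__name__: population.count(f)
              for f in (copycat, cooperator, cheater)}
    print(generation, counts)
```

With these parameters you should see the dynamics from the reading play out: cheaters first exploit the cooperators, but once the cooperators are gone, copycats outcompete the cheaters and take over the population.
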
Exercise 4.3

List a few real-world mixed-motive settings that are analogous to or could be modelled using an iterated prisoner’s dilemma with evolution.

Required
Exercise 4.4

The interactive piece by Nicky Case demonstrates that the success of some agent designs or properties depends heavily on the features of the environment they are in. Go to section 7 ‘Sandbox mode’ and experiment with all the different parameters under population, payoffs and rules. Take note of any observations you make about how different parameters affect the success of the different agents in the simulation. Can you find interesting parameter values such that the ‘Random’ agent is typically the most successful agent?

Required
Exercise 4.5

Try to label the properties of the ‘Copycat’ agent from the interactive piece with terms that generalise to agents in other settings. For example, you might label ‘Cooperator’ as altruistic and ‘Grudger’ as initially nice, but vengeful and wholly unforgiving. Do you think the properties of the ‘Copycat’ agent you've written are good properties of cooperative agents in general?

Required

The final part of this section, ‘Multi-agent Environments’, lists many more simulations, experiments and tournaments for testing the success of cooperative agent designs in particular environments, including ones constructed for LLM-based agents.

Exercise 4.6

Revisit the first two exercises of this section, and write new responses in light of what you've learnt so far.

Required

Opponent Shaping Optional

When we think about agents in multi-agent settings, it is important to distinguish between static agents, which are trained on a specific dataset and do not develop further once deployed, and continually learning agents, which keep learning during deployment. Training static agents is known as offline learning, while continual learning is also called online learning.

The setting in which agents learn in the presence of other learning agents deserves special attention, as it introduces a meta-game in which an agent can gain an advantage by taking its opponent's learning process into account. This is called opponent shaping.

To explain how opponent shaping works, we will use a classic two-player coordination game from game theory called Bach or Stravinsky (also known as Battle of the Sexes, and often shortened to BoS). The setup is that two friends want to go to a concert together, and they each need to choose between two options: Bach or Stravinsky. One of them prefers Bach and the other Stravinsky, but above all they both prefer going together to splitting up. The payoff matrix (row player's payoff listed first) then looks something like this:

        B       S
  B    2, 1    0, 0
  S    0, 0    1, 2

The players are not allowed to communicate before they make their choices.

Playing against a static opponent

Before continuing, we recommend looking at the reinforcement learning part of the Appendix, if you are unfamiliar with the basics (e.g. terms like policy, reward function and environment).

If you were to train a reinforcement learning (RL) agent to do well in this game, the simplest case is one where the opponent is a static agent with a fixed policy. The purpose of training would then be to optimise for a best response to this opponent policy. You might let your agent play a hundred or a thousand games with different policies (e.g. “randomise between the alternatives” or “always choose its own preferred option”) and then pick the policy that generated the best payoff.
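
To make this concrete, here is a minimal sketch in Python of that evaluation loop for Bach or Stravinsky. The opponent's fixed policy (playing their preferred Stravinsky 70% of the time) and the episode count are illustrative assumptions:

```python
import random

# Bach or Stravinsky payoffs: (my_move, their_move) -> (my_payoff, their_payoff).
# We take the role of the row player, who prefers Bach.
PAYOFFS = {
    ("B", "B"): (2, 1),
    ("B", "S"): (0, 0),
    ("S", "B"): (0, 0),
    ("S", "S"): (1, 2),
}

def static_opponent():
    # A fixed (static) policy: Stravinsky 70% of the time -- illustrative.
    return "S" if random.random() < 0.7 else "B"

def evaluate(policy, episodes=10_000):
    """Monte Carlo estimate of a policy's average payoff vs the static opponent."""
    total = 0
    for _ in range(episodes):
        total += PAYOFFS[(policy(), static_opponent())][0]
    return total / episodes

candidate_policies = {
    "always Bach":       lambda: "B",
    "always Stravinsky": lambda: "S",
    "uniform random":    lambda: random.choice(["B", "S"]),
}

for name, policy in candidate_policies.items():
    print(f"{name}: {evaluate(policy):.2f}")

# Expected payoffs: always Bach = 0.3 * 2 = 0.6, always Stravinsky = 0.7 * 1 = 0.7.
# Against this opponent, the best response is to concede to their preferred concert.
```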

Opponent shaping: reinforcement learning against a learning opponent

If you were to assume that the other agent is also a learning agent, this changes things. This is likely intuitive to anyone who has trained a puppy or interacted with children - when you interact with an agent that develops quickly, you have to consider not only the immediate consequences of the interaction but also what the other party will learn from it and how it will change their future behaviour.

In our Bach or Stravinsky game, consider a strategic player who prefers the Stravinsky concert. Knowing that their opponent is a reinforcement learning agent, this player might reason that if they only ever play Stravinsky, their opponent will never experience a reward for playing Bach. From the learning opponent's perspective, the observed payoff matrix then looks like this (the dash marks the mutual-Bach outcome the learner never gets to experience):

        B       S
  B     -      0, 0
  S    0, 0    1, 2

This way, the strategic player could ensure that they always get their preferred choice. This is the basic concept of opponent shaping: optimising a policy for a best response to the opponent's learning algorithm, rather than to the opponent's policy.
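
As a toy illustration of that dynamic, here is a minimal sketch in which the learner, who prefers Bach, runs a stateless epsilon-greedy Q-learning rule against a strategic player committed to Stravinsky. The learning rate and exploration rate are illustrative assumptions:

```python
import random

# Payoffs for the learner, who prefers Bach: (learner_move, strategic_move).
LEARNER_PAYOFF = {("B", "B"): 2, ("B", "S"): 0, ("S", "B"): 0, ("S", "S"): 1}

def strategic_player():
    # Best responds to the learner's *algorithm*, not its current policy:
    # by committing to Stravinsky, it controls what the learner can observe.
    return "S"

# A stateless epsilon-greedy Q-learner over the two actions.
q = {"B": 0.0, "S": 0.0}
alpha, epsilon = 0.1, 0.1  # illustrative learning and exploration rates

for step in range(2_000):
    if random.random() < epsilon:
        action = random.choice(["B", "S"])   # explore
    else:
        action = max(q, key=q.get)           # exploit current estimates
    reward = LEARNER_PAYOFF[(action, strategic_player())]
    q[action] += alpha * (reward - q[action])

# q["S"] approaches 1 while q["B"] stays near 0: the learner never
# experiences a reward for Bach, so it learns to concede, and the
# strategic player locks in their preferred outcome.
print(q)
```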

Exercise 4.7

Suppose two agents both try to opponent shape each other. Agent A best responds to B’s learning algorithm, but B is also trying to best respond to agent A’s learning algorithm, which in turn depends on B’s. Each can in principle think several steps deep (A models B models A models B …). First just try to make this concept clear in your head (possibly with diagrams), then try to reflect on the trade-off between deeper and shallower reasoning.

Optional

If you’re interested in how opponent shaping is implemented in multi-agent reinforcement learning (MARL), among other reflections on opponent shaping, we recommend the next optional piece.

Opponent shaping as a model for manipulation and cooperation

All parts

Optional • 5100 Words (Technical)
Exercise 4.8

Take the iterated prisoner’s dilemma with two players. In this setup, they play some number of games of the prisoner’s dilemma together and then go through a step of reinforcement learning over the full history of observed behaviour. First imagine that both agents use a standard reinforcement learning algorithm like proximal policy optimization or Q-learning. Do you expect the agents to learn to cooperate? Explain your intuition. Now imagine that the RL algorithms of both agents are modified to take into account what the other will learn from the data from one step of standard RL. Now what do you expect of the agents’ policies?

Optional

Opponent shaping could potentially be a powerful tool for cooperative agents, if it can be scaled to complex agents and environments (including LLM agents). A key feature of this approach is that it could theoretically be used to make other learning agents cooperate even if you have no control over their design or objectives. Just as with many other cooperation-relevant capabilities, opponent shaping also comes with risks, which will be covered in section 7 ‘Back-fire Risks’.

Training for Cooperativeness

So far in this section we have discussed what properties of agents might influence the outcomes in multi-agent interactions in general and cooperation problems in particular. If we want to figure out interventions for cooperative AI at the agent level, it is fundamental to identify which agent properties are relevant to study and which might be more or less desirable. We will now turn to different approaches to training agents for such properties: assuming we know what a cooperative agent is, can we create it?

One way to approach training for cooperativeness is to try to induce specific desirable biases or tendencies in agents. The next two resources explore two specific approaches for creating cooperative agents: norm-adaptive policies and inequity aversion. Reading the optional parts of the second paper requires a more technical background, but doing so provides some useful context on how cooperative properties can be trained into agents in practice.

Normative Disagreement as a Challenge for Cooperative AI

‘Abstract', 'Introduction', '3.2 Coordination problems', '3.3 Bargaining problems and normative disagreement', first paragraphs of '3.4 Norm-adaptive policies' before 'Definition 3.1'

Required • 1700 Words (Technical)
Inequity aversion improves cooperation in intertemporal social dilemmas

‘Abstract' and '1 Introduction'

Required • 1100 Words (Technical)
Inequity aversion improves cooperation in intertemporal social dilemmas

‘2 Reinforcement learning in sequential social dilemmas', '3 The model'

Optional • 2500 Words (Technical)
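
For a flavour of the mechanism, here is a minimal sketch of a Fehr-Schmidt-style inequity-averse utility of the kind the paper builds on: an agent's subjective reward is its raw reward minus penalties for disadvantageous inequity (“envy”, weighted by alpha) and advantageous inequity (“guilt”, weighted by beta). The weights below are illustrative, and the paper applies the penalties to temporally smoothed rewards rather than the raw per-step rewards used here:

```python
def inequity_averse_utility(rewards, i, alpha=5.0, beta=0.05):
    """Fehr-Schmidt-style subjective reward for agent i.

    rewards: raw per-agent rewards for this step.
    alpha:   weight on disadvantageous inequity ("envy").
    beta:    weight on advantageous inequity ("guilt").
    """
    n = len(rewards)
    r_i = rewards[i]
    envy = sum(max(r_j - r_i, 0) for j, r_j in enumerate(rewards) if j != i)
    guilt = sum(max(r_i - r_j, 0) for j, r_j in enumerate(rewards) if j != i)
    return r_i - alpha * envy / (n - 1) - beta * guilt / (n - 1)

# Agent 0 earns far more than its two partners this step:
print(inequity_averse_utility([10.0, 2.0, 2.0], i=0))  # 9.6: guilt trims its reward
print(inequity_averse_utility([10.0, 2.0, 2.0], i=1))  # -18.0: envy dominates
```

Training on this subjective reward instead of the raw reward gives agents an incentive to avoid outcomes where payoffs are very unequal, which is the lever the paper uses to sustain cooperation in social dilemmas.
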
Exercise 4.9

Hypothetically, independent AI companies could train general inequity aversion into their agents using simulated multi-agent settings. Why do you think AI companies would avoid this without external pressure and cross-company agreements?

Required

Training with Human Data Optional

Recall that the cooperative AI field is concerned with the multi-agent dynamics of human-human, human-AI and AI-AI systems. The next resources are papers that explore the utility of training with human data to elicit good outcomes in fully cooperative and mixed-motive human-AI systems (technically, the whole game involved in the second study, Diplomacy, is zero-sum, but it contains mixed-motive subgames throughout). We make some recommendations for which parts might be the most valuable to read, but as this is optional content, feel free to use or ignore those recommendations as you like.

On the Utility of Learning about Humans for Human-AI Coordination

‘Abstract', 'Introduction', ‘3 Preliminaries', and parts A, B, C, and D of the Appendix.

Optional • 3500 Words (Technical)
Human-Level Performance In No-Press Diplomacy Via Equilibrium Search

‘Abstract', 'Introduction', ‘2 Background and Related Work', '3 Agent Descriptions’

Optional • 3300 Words (Technical)
Exercise 4.10

Why might including human data in the training of AI systems, at least initially, be beneficial for AI-AI systems or even human-human systems?

Optional

The following presentation discusses the extent to which we should aim for human-like AI agents if we care about problems of cooperation. It is related to the paper ‘Foundations of Cooperative AI’ that was linked in the first section of the curriculum.

AI Agents May Cooperate Better if They Don’t Resemble Us

From 6:53

Optional • 55 mins (Technical)

Multi-agent Environments

Work in cooperative AI often makes use of simulations or experiments that are carried out in some kind of environment. Below are a few environments that are frequently used in cooperative AI. It might be especially useful to be familiar with these if you are aiming to do your own research in cooperative AI, but for now, don’t spend longer than about 20 minutes in total looking over the abstracts and introductions.

Exercise 4.11

Imagine an agent is trained in many multi-agent environments for the purpose of improving its cooperative intelligence. What potential problem arises if the agent is consistently informed about which entities in each environment are also agents? How significant do you believe this issue is in practice?

Optional