4 Cooperative Agents

Required Content 4hrs 30mins • All Content 9hrs 15mins

The first sections of the curriculum have focused on the problems that the field of cooperative AI is aiming to tackle, and presented game theory and complex systems as possible framings.

We will now turn to different approaches to solving these problems, starting with interventions that focus on the agents themselves and their inherent properties.

By the end of the section, you should be able to:

  • Define what makes an agent cooperative, potentially using notation from game theory.
  • Describe and compare how different agent designs fare in a multi-agent setting like the iterated prisoner's dilemma with evolution.
  • Explain the concept of opponent shaping and its implications for learning agents and designing cooperative agents.
  • Comment on the efficacy and practicality of training certain biases or tendencies into agents to improve their cooperativeness.
  • Explain the benefits of including human data in the training of AI systems for various multi-agent systems.

The idea that creating cooperative agents should be a goal of cooperative AI has been present since the field was established. The following optional reading, published in 2021, can be seen as a starting point for the field of cooperative AI, and outlines cooperative intelligence as a central property that AI should possess. If you’re interested in the history of cooperative AI, we recommend reading the suggested sections and taking note of the authors.

Machines must learn to find common ground

Up to 'AI for human collaboration'

Optional • 10 mins

If you want to take a deeper dive into the cognitive skills or individual capabilities that make up cooperative intelligence, as described in the previous piece, then the following reading might be helpful. However, note that it is quite a technical paper and spends more time listing citations than explaining concepts, which is why it is also optional.

Open Problems in Cooperative AI

‘4. Cooperative Capabilities' excluding '4.4 Institutions'

Prerequisites
Optional • 1 hr

Settling on a formal definition of what makes an agent cooperative is tricky – it’s an ongoing area of research. If we want to avoid catastrophic cooperation failures between agents in the real world, we must figure out how to create agents that are not only able to cooperate well, but that are also likely to actually be deployed. Imagine that you can choose the properties of your own AI assistant - would you pick an assistant that always prioritises the common good, or would you pick one that prioritises what is best for you? Chances are, most people (and companies) would prefer an assistant that is able to negotiate good deals for them and protect their interests.

Exercise

What do you think makes an agent cooperative? After jotting down your thoughts try to construct as formal a definition as you can (feel free to use notation from game theory if it’s helpful). This is meant to be a hard, explorative exercise, so don't panic!

Required • 20 mins
Exercise

In 2022 Anthropic introduced “Constitutional AI”, where they used a set of prescriptive statements to direct the fine-tuning of their LLM Claude. Their constitution includes statements such as the following: “Please choose the assistant response that is as harmless and ethical as possible. Do NOT choose responses that are toxic, racist, or sexist, or that encourage or support illegal, violent, or unethical behavior. Above all the assistant's response should be wise, peaceful, and ethical.”

While this constitution is written for a chatbot, you could imagine a similar set of instructions written for a more autonomous kind of AI agent that could interact with other agents and humans. Try to formulate a couple of statements for an “AI Constitution” that would encourage cooperative behaviour.

Required • 20 mins

The next resource attempts to break down our understanding of what a cooperative agent would be into more granular and specific agent properties that could more easily be specified, evaluated and trained for.

Properties of Cooperative Agents by Cecilia Tilli

All parts

Required • 1 hr

The following interactive piece is useful for seeing how different agent designs fare in a very particular multi-agent setting, but one that is analogous to many real-life situations: the iterated prisoner’s dilemma (with evolution).
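
If you want a concrete picture of the kind of round-robin tournament the piece simulates, here is a minimal sketch in Python. The payoff values and strategy implementations are illustrative assumptions (they roughly follow the piece's cast of characters such as 'Copycat' and 'Grudger', but are not its actual code), and the evolutionary step in which low scorers are replaced by copies of high scorers is omitted for brevity.

```python
# Minimal sketch of a round-robin iterated prisoner's dilemma tournament.
# Payoffs and strategies are illustrative assumptions, not the values used
# in "Evolution of Trust".

# (my_move, their_move) -> my payoff; C = cooperate, D = defect.
PAYOFF = {("C", "C"): 2, ("C", "D"): -1, ("D", "C"): 3, ("D", "D"): 0}

def always_cooperate(my_history, their_history):
    return "C"

def always_defect(my_history, their_history):
    return "D"

def copycat(my_history, their_history):  # tit-for-tat: repeat their last move
    return their_history[-1] if their_history else "C"

def grudger(my_history, their_history):  # cooperates until defected against, then never forgives
    return "D" if "D" in their_history else "C"

def play_match(strategy_a, strategy_b, rounds=10):
    """Play one iterated game and return the total payoffs (score_a, score_b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

strategies = {
    "Cooperator": always_cooperate,
    "Cheater": always_defect,
    "Copycat": copycat,
    "Grudger": grudger,
}

# Round-robin: every strategy plays every other strategy once; scores are totalled.
totals = {name: 0 for name in strategies}
for name_a, strat_a in strategies.items():
    for name_b, strat_b in strategies.items():
        if name_a < name_b:
            score_a, score_b = play_match(strat_a, strat_b)
            totals[name_a] += score_a
            totals[name_b] += score_b

print(totals)
```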

Evolution of Trust by Nicky Case

All parts

Required • 30 mins
Exercise

The interactive piece by Nicky Case demonstrates that the success of some agent designs or properties depends heavily on the features of the environment they are in. Which features of the environment significantly influenced the success of a particular agent design in the iterated prisoner’s dilemma with evolution? Can you think of other features that might have affected things? Now think about real-world mixed-motive settings that are analogous to the iterated prisoner’s dilemma with evolution, and link the features you've picked out in the formal setting to features of those real-world settings.

Prerequisites
Required • 25 mins
Exercise

Try to label the properties of the ‘Copycat’ agent from the interactive piece with terms that generalise to agents in other settings. For example, you might label ‘Cooperator’ as altruistic and ‘Grudger’ as initially nice, but vengeful and wholly unforgiving. Do you think the properties you've attributed to the ‘Copycat’ agent are good properties for cooperative agents in general?

Required • 15 mins

The final part of this section, ‘Multi-agent environments’, lists many more simulations, experiments and tournaments for testing the success of cooperative agent designs in particular environments, including some constructed for LLM-based agents.

Exercise

Revisit the first two exercises of this section, and write new responses in light of what you've learnt so far.

Required • 15 mins

Opponent shaping

When we think about agents in multi-agent settings, it is important to distinguish between static agents, which are trained on a specific dataset and do not develop further after deployment, and continually learning agents, which keep learning during deployment. Training static agents on a fixed dataset is known as offline learning, while continual learning during deployment is known as online learning.

The situation where agents learn in the presence of other learning agents deserves special attention, as it introduces a meta-game in which an agent can gain an advantage by taking its opponent's learning process into account. This is called opponent shaping.

As an example of how opponent shaping works, we will use a classic two-player coordination game from game theory called Bach or Stravinsky (also known as Battle of the Sexes and often shortened to BoS). The setup is that two friends want to go to a concert together, and each must choose between two options: Bach or Stravinsky. One of them prefers Bach and the other Stravinsky, but above all they both prefer going together over splitting up. With the Bach-lover as the row player and the Stravinsky-lover as the column player (payoffs listed as row, column), the payoff matrix looks something like this:

       B       S
B    2, 1    0, 0
S    0, 0    1, 2

The players are not allowed to communicate before they make their choices.

Playing against a static opponent

Before continuing, we recommend looking at the reinforcement learning part of the Appendix, if you are unfamiliar with the basics (e.g. terms like policy, reward function and environment).

If you were to train a reinforcement learning (RL) agent to do well in this game, the simplest case is one where the opponent is a static agent with a fixed policy. The purpose of the training would be to optimize for a best response to this opponent's policy. You might let your agent play a hundred or a thousand games with different policies (e.g. “randomize between the alternatives” or “always pick your own preferred option”) and then keep the policy that generated the best payoff.
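
As a concrete illustration of this simplest case, the sketch below evaluates a few hand-picked candidate policies against a fixed opponent by simulation and keeps the one with the highest average payoff. The candidate policies, the opponent's fixed choice and the number of games are all assumptions for illustration rather than a prescribed training procedure.

```python
import random

# Row player's payoffs in Bach or Stravinsky (we are the row player, who prefers Bach).
# Keys are (our_choice, opponent_choice).
ROW_PAYOFF = {("B", "B"): 2, ("B", "S"): 0, ("S", "B"): 0, ("S", "S"): 1}

def fixed_opponent():
    """A static opponent policy: always plays its own preferred option, Stravinsky."""
    return "S"

# Candidate policies for our agent (illustrative, not an exhaustive search).
candidate_policies = {
    "always Bach": lambda: "B",
    "always Stravinsky": lambda: "S",
    "randomize 50/50": lambda: random.choice(["B", "S"]),
}

def average_payoff(policy, n_games=1000):
    """Estimate the expected payoff of a policy against the fixed opponent."""
    total = sum(ROW_PAYOFF[(policy(), fixed_opponent())] for _ in range(n_games))
    return total / n_games

# Pick the empirical best response to the static opponent.
scores = {name: average_payoff(policy) for name, policy in candidate_policies.items()}
best = max(scores, key=scores.get)
print(scores, "-> best response:", best)
```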

Opponent shaping: reinforcement learning against a learning opponent

If you were to assume that the other agent is also a learning agent, this changes things. This is likely intuitive to anyone who has trained a puppy or interacted with children - when you interact with an agent that develops quickly, you have to consider not only the immediate consequences of the interaction but also what the other party will learn from it and how it will change their future behavior.

In our Bach or Stravinsky game, consider a strategic player who prefers the Stravinsky concert (the column player above). Knowing that their opponent is a reinforcement learning agent, this player might reason that if they only ever play Stravinsky, their opponent will never experience a reward for playing Bach. From the learning agent's perspective, the observed payoff matrix then looks like this (the Bach column never occurs, because the strategic player never plays it):

       B       S
B     -,-    0, 0
S     -,-    1, 2

This way, the strategic player can ensure that they always get their preferred choice. This is the basic concept of opponent shaping: optimizing a policy for a best response to the opponent's learning algorithm, rather than to the opponent's current policy.
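
To make the shaping dynamic concrete, here is a minimal, hypothetical sketch: a simple epsilon-greedy learner that tracks the average reward of each option plays against a strategic player who always picks Stravinsky. The learner never sees a positive payoff for Bach and so settles on Stravinsky, which is exactly the outcome the strategic player was steering it towards. The learning rule and parameters are illustrative assumptions, not a particular algorithm from the literature.

```python
import random

# Payoffs for the learning agent, who prefers Bach:
# (learner_choice, strategic_player_choice) -> learner's payoff.
LEARNER_PAYOFF = {("B", "B"): 2, ("B", "S"): 0, ("S", "B"): 0, ("S", "S"): 1}

def strategic_player():
    """The shaping player: always plays their preferred option, Stravinsky."""
    return "S"

# Simple epsilon-greedy learner that tracks the average reward of each action.
value = {"B": 0.0, "S": 0.0}
count = {"B": 0, "S": 0}
epsilon = 0.1

for step in range(2000):
    if random.random() < epsilon:
        action = random.choice(["B", "S"])        # explore
    else:
        action = max(value, key=value.get)        # exploit the current estimate
    reward = LEARNER_PAYOFF[(action, strategic_player())]
    count[action] += 1
    value[action] += (reward - value[action]) / count[action]  # running average

print(value)  # Bach's estimated value stays at 0, so the learner settles on Stravinsky
```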

Exercise

Suppose two agents both try to opponent shape each other. Agent A best responds to B’s learning algorithm, but B is also trying to best respond to agent A’s learning algorithm, which in turn depends on B’s. Each can in principle reason several levels deep (A models B modelling A modelling B …). First try to make this concept clear in your head (possibly with diagrams), then reflect on the trade-off between deeper and shallower reasoning.

Required • 30 mins

If you’re interested in how opponent shaping is implemented in MARL, among other reflections on opponent shaping, we recommend the next optional piece.

Opponent shaping as a model for manipulation and cooperation

All parts

Optional • 45 mins
Exercise

Take the iterated prisoner’s dilemma with two players. In this setup, they play some number of games of the prisoner’s dilemma together and then go through a step of reinforcement learning over the full history of observed behaviour. First, imagine that both agents use a standard reinforcement learning algorithm like proximal policy optimization or Q-learning. Do you expect the agents to learn to cooperate? Explain your intuition. Now imagine that the RL algorithms of both agents are modified to take into account what the other will learn from the data generated by one step of standard RL. What do you now expect of the agents’ policies?

Optional • 20 mins

Opponent shaping could potentially be a powerful tool for cooperative agents, if it can be scaled to complex agents and environments (including LLM agents). A key feature of this approach is that it could, in theory, be used to make another learning agent cooperate even if you have no control over that agent's design or objectives. As with many other cooperation-relevant capabilities, opponent shaping also comes with risks, which will be covered in section 7, ‘Back-fire Risks’.

Training for cooperativeness

So far in this section we have discussed what properties of agents might influence the outcomes of multi-agent interactions in general and of cooperation problems in particular. If we want to design interventions for cooperative AI at the agent level, it is fundamental to identify which agent properties are relevant to study and which are more or less desirable. We will now turn to different approaches to training agents for such properties: assuming we know what a cooperative agent is, can we create one?

One way to approach training for cooperativeness is to try to induce specific desirable biases or tendencies in agents. The next two resources explore two such approaches for creating cooperative agents: norm-adaptive policies and inequity aversion. Reading the optional parts of the papers requires a more technical background, but provides useful context on how cooperative properties can be trained into agents in practice.
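
For intuition on what training in a bias like inequity aversion can mean mechanically, the sketch below applies a Fehr-Schmidt-style penalty to each agent's reward: an agent is penalised when it earns less than others (weighted by alpha) and, more mildly, when it earns more (weighted by beta). This is a simplified, hypothetical version of the idea; the inequity aversion paper below applies a similar transformation to temporally smoothed rewards inside a reinforcement learning loop, and the coefficient values here are arbitrary placeholders.

```python
def inequity_averse_rewards(rewards, alpha=5.0, beta=0.05):
    """Transform a list of per-agent rewards with a Fehr-Schmidt-style penalty.

    alpha penalises disadvantageous inequity (others earn more than me);
    beta penalises advantageous inequity (I earn more than others).
    The coefficient values here are arbitrary placeholders.
    """
    n = len(rewards)
    shaped = []
    for i, r_i in enumerate(rewards):
        envy = sum(max(r_j - r_i, 0) for j, r_j in enumerate(rewards) if j != i)
        guilt = sum(max(r_i - r_j, 0) for j, r_j in enumerate(rewards) if j != i)
        shaped.append(r_i - (alpha / (n - 1)) * envy - (beta / (n - 1)) * guilt)
    return shaped

# Example: agent 0 grabbed most of the reward this step, so it is mildly
# penalised for advantageous inequity while the others are penalised for envy.
print(inequity_averse_rewards([10.0, 1.0, 1.0]))
```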

Normative Disagreement as a Challenge for Cooperative AI

‘Abstract', 'Introduction', '3.2 Coordination problems', '3.3 Bargaining problems and normative disagreement', first paragraphs of '3.4 Norm-adaptive policies' before 'Definition 3.1'

Prerequisites
Required • 15 mins
Normative Disagreement as a Challenge for Cooperative AI

All parts

Optional • 1 hr
Inequity aversion improves cooperation in intertemporal social dilemmas

‘Abstract' and '1 Introduction'

Required • 8 mins
Inequity aversion improves cooperation in intertemporal social dilemmas

‘2 Reinforcement learning in sequential social dilemmas', '3 The model'

Optional • 25 mins
Exercise

Hypothetically, independent AI companies could train general inequity aversion into their agents using simulated multi-agent settings. Why do you think AI companies would avoid doing this without external pressure and cross-company agreements?

Required • 10 mins

Recall that the cooperative AI field is concerned with the multi-agent dynamics of human-human, human-AI and AI-AI systems. The next resources are papers that explore the utility of training with human data to elicit good outcomes in fully cooperative and mixed-motive human-AI systems (technically, the whole game in the second study, Diplomacy, is zero-sum, but it contains mixed-motive subgames throughout). Again, reading the optional parts of the papers requires a more technical background, but is valuable for those interested in the details of training agents in practice.
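
As a toy illustration of the simplest way human data can enter training, the sketch below fits a tabular behaviour-cloning policy by counting how often hypothetical human players took each action in each state and then sampling from those empirical frequencies. The states, actions and data here are made up, and systems like those in the papers below train neural network policies on far richer data, so treat this only as the bare-bones version of the idea.

```python
from collections import Counter, defaultdict
import random

# Hypothetical logged human (state, action) pairs; in practice these would come
# from real gameplay data, e.g. humans playing a coordination game.
human_data = [
    ("pot_empty", "fetch_onion"),
    ("pot_empty", "fetch_onion"),
    ("pot_empty", "wait"),
    ("pot_full", "fetch_dish"),
    ("pot_full", "fetch_dish"),
]

# Behaviour cloning in its simplest tabular form: the policy is just the
# empirical action distribution observed in each state.
counts = defaultdict(Counter)
for state, action in human_data:
    counts[state][action] += 1

def human_like_policy(state):
    """Sample an action with the same frequencies a human showed in this state."""
    actions, freqs = zip(*counts[state].items())
    return random.choices(actions, weights=freqs, k=1)[0]

print(human_like_policy("pot_empty"))
```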

On the Utility of Learning about Humans for Human-AI Coordination

‘Abstract' and 'Introduction'

Prerequisites
Required • 12 mins
On the Utility of Learning about Humans for Human-AI Coordination

‘3 Preliminaries', 'A Behavior cloning', 'B Self-play PPO', 'C PPO with embedded-agent environment', 'D Population Based Training'

Optional • 20 mins
Human-Level Performance In No-Press Diplomacy Via Equilibrium Search

‘Abstract' and 'Introduction'

Prerequisites
Required • 5 mins
Human-Level Performance In No-Press Diplomacy Via Equilibrium Search

‘2 Background and Related Work', '3 Agent Descriptions’

Optional • 30 mins
Exercise

Why might including human data in the training of AI systems, at least initially, be beneficial for AI-AI systems or even human-human systems?

Required • 10 mins

Multi-agent environments

Work in cooperative AI often makes use of simulations or experiments carried out in some kind of environment. Below are a few environments that are frequently used in cooperative AI. Being familiar with these is especially useful if you are aiming to do your own research in cooperative AI, but for now, don’t spend more than about 20 minutes in total looking over their abstracts and introductions.

Exercise

Imagine an agent is trained in many multi-agent environments for the purpose of improving its cooperative intelligence. What potential problem arises if the agent is consistently informed about which entities in each environment are also agents? How significant do you believe this issue is in practice?

Optional • 15 mins