4 Cooperative Agents
Required Content 4hrs 30mins • All Content 9hrs 15mins
The first sections of the curriculum have focused on the problems that the field of cooperative AI is aiming to tackle, and presented game theory and complex systems as possible framings.
We will now turn to different approaches to solving these problems, starting with interventions that focus on the agents themselves and their inherent properties.
By the end of the section, you should be able to:
- Define what makes an agent cooperative, potentially using notation from game theory.
- Describe and compare how different agent designs fare in a multi-agent setting like the iterated prisoner's dilemma with evolution.
- Explain the concept of opponent shaping and its implications for learning agents and designing cooperative agents.
- Comment on the efficacy and practicality of training certain biases or tendencies into agents to improve their cooperativeness.
- Explain the benefits of including human data in the training of AI systems for various multi-agent systems.
The idea that a goal of cooperative AI should be to create cooperative agents has been around since the field's establishment. The following optional reading, published in 2021, can be seen as a starting point for the field of cooperative AI, and outlines cooperative intelligence as a central property that AI should possess. If you’re interested in the history of cooperative AI, we recommend reading the suggested sections and taking note of the authors.
If you want to take a deeper dive into the cognitive skills and individual capabilities that make up cooperative intelligence, as claimed by the previous piece, the following reading might be helpful. Note, however, that it is quite a technical paper and spends more time listing citations than explaining concepts, which is why it is also optional.
Settling on a formal definition of what makes an agent cooperative is tricky – it’s an ongoing area of research. If we want to avoid catastrophic cooperation failures between agents in the real world, we must figure out how to create agents that are not only able to cooperate well, but that are also likely to actually be deployed. Imagine that you can choose the properties of your own AI assistant - would you pick an assistant that always prioritises the common good, or would you pick one that prioritises what is best for you? Chances are, most people (and companies) would prefer an assistant that is able to negotiate good deals for them and protect their interests.
What do you think makes an agent cooperative? After jotting down your thoughts try to construct as formal a definition as you can (feel free to use notation from game theory if it’s helpful). This is meant to be a hard, explorative exercise, so don't panic!
In 2022 Anthropic introduced “Constitutional AI”, where they used a set of prescriptive statements to direct the fine-tuning of their LLM Claude. Their constitution includes statements such as the following: “Please choose the assistant response that is as harmless and ethical as possible. Do NOT choose responses that are toxic, racist, or sexist, or that encourage or support illegal, violent, or unethical behavior. Above all the assistant's response should be wise, peaceful, and ethical.”
While this constitution is written for a chatbot, you could imagine a similar set of instructions written for a more autonomous kind of AI agent that could interact with other agents and humans. Try to formulate a couple of statements for an “AI Constitution” that would encourage cooperative behaviour.
The next resource attempts to break down our understanding of what a cooperative agent would be into more granular and specific agent properties that could more easily be specified, evaluated and trained for.
The following interactive piece is useful for seeing how different agent designs fare in a very particular multi-agent setting, but one that is analogous to many real-life situations: the iterated prisoner's dilemma (with evolution).
The interactive piece by Nicky Case demonstrates that the success of some agent designs or properties depends heavily on the features of the environment they are in. What were the significant features of the environment that influenced the success of a particular agent design in the iterated prisoner’s dilemma with evolution? Can you think of other features that might have affected things? Now think about real-world mixed-motive settings that are analogous to the iterated prisoner’s dilemma with evolution, and link the features you've picked out in the formal setting to features of your real-world settings.
Try to label the properties of the ‘Copycat’ agent from the interactive piece with terms that generalise to agents in other settings. For example, you might label ‘Cooperator’ as altruistic and ‘Grudger’ as initially nice, but vengeful and wholly unforgiving. Do you think the properties of the ‘Copycat’ agent you've written are good properties of cooperative agents in general?
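If you would like to experiment with these strategies outside the interactive piece, here is a minimal sketch of a round-robin iterated prisoner's dilemma tournament between four of them (Cooperator, Always Cheat, Copycat and Grudger). It uses standard textbook payoff values rather than the exact numbers from the piece, and it leaves out the evolutionary step, so treat it as a starting point for your own experiments rather than a faithful reproduction.

```python
# Minimal iterated prisoner's dilemma tournament with strategies from the
# interactive piece. Payoffs are the standard textbook values (the piece
# uses slightly different numbers), and there is no evolutionary step.

PAYOFFS = {  # (my move, their move) -> my payoff; "C" = cooperate, "D" = defect
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def cooperator(my_history, their_history):
    return "C"                                            # always cooperates

def cheater(my_history, their_history):
    return "D"                                            # always defects ("Always Cheat")

def copycat(my_history, their_history):
    return their_history[-1] if their_history else "C"    # tit-for-tat

def grudger(my_history, their_history):
    return "D" if "D" in their_history else "C"           # nice until crossed, then never forgives

def play_match(strategy_a, strategy_b, rounds=10):
    """Play one iterated match and return the total payoffs (score_a, score_b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFFS[(move_a, move_b)]
        score_b += PAYOFFS[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

strategies = {"Cooperator": cooperator, "Always Cheat": cheater,
              "Copycat": copycat, "Grudger": grudger}
totals = {name: 0 for name in strategies}
for name_a, strat_a in strategies.items():
    for name_b, strat_b in strategies.items():
        if name_a < name_b:                               # play each pairing exactly once
            score_a, score_b = play_match(strat_a, strat_b)
            totals[name_a] += score_a
            totals[name_b] += score_b

print(totals)
```

Try adding more strategies from the piece (for example Copykitten or Detective), changing the number of rounds, or adding a reproduction step in which the lowest scorers are replaced by copies of the highest scorers.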
The final part of this section, ‘Multi-agent environments’, lists many more simulations, experiments and tournaments for testing the success of cooperative agent designs in particular environments, including ones constructed for LLM-based agents.
Opponent shaping
When we think about agents in multi-agent settings, it is important to distinguish between static agents, which are trained on a specific dataset and do not develop further in deployment, and continually learning agents, which keep learning during deployment. Training static agents is known as offline learning, while continual learning is also known as online learning.
The case where agents learn in the presence of other learning agents deserves special attention, as it introduces a meta-game in which an agent can gain an advantage by taking its opponent's learning process into account. This is called opponent shaping.
As an example of how opponent shaping works, we will use a classic two-player coordination game from game theory called Bach or Stravinsky (also known as Battle of the Sexes and often shortened to BoS). The setup is that two friends want to go to a concert together, and each needs to choose between two options: Bach or Stravinsky. One of them prefers Bach and the other Stravinsky, but above all they both prefer going together over splitting up. The payoff matrix then looks something like this:
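One standard choice of payoffs (the exact numbers below are an assumption; only their ordering matters) has the Bach-lover choosing the row and the Stravinsky-lover choosing the column, with payoffs listed as (row player, column player):

|                | Bach | Stravinsky |
|----------------|------|------------|
| **Bach**       | 2, 1 | 0, 0       |
| **Stravinsky** | 0, 0 | 1, 2       |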
The players are not allowed to communicate before they make their choices.
Playing against a static opponent
Before continuing, we recommend looking at the reinforcement learning part of the Appendix, if you are unfamiliar with the basics (e.g. terms like policy, reward function and environment).
If you were to train a reinforcement learning (RL) agent to do well in this game, the simplest case is the one where the opponent is a static agent with a fixed policy. The purpose of training is then to find a best response to this opponent policy. You might let your agent play a hundred or a thousand games with each of several candidate policies (e.g. "randomise between the alternatives" or "always choose its own preferred option") and then pick the policy that generated the best payoff.
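As a rough illustration of this "best response to a fixed policy" idea, here is a minimal sketch (not any particular algorithm from the literature). It evaluates a few candidate policies for the Bach-loving row player against an assumed static opponent that plays Bach 30% of the time, using the payoff numbers from the table above, and keeps whichever candidate scores best.

```python
import random

# Payoffs for the row player (who prefers Bach), matching the table above:
# (my choice, opponent's choice) -> my payoff.
ROW_PAYOFF = {
    ("Bach", "Bach"): 2, ("Bach", "Stravinsky"): 0,
    ("Stravinsky", "Bach"): 0, ("Stravinsky", "Stravinsky"): 1,
}

def static_opponent():
    """An assumed fixed opponent policy: plays Bach 30% of the time."""
    return "Bach" if random.random() < 0.3 else "Stravinsky"

# Candidate policies for our agent; each is a function returning a choice.
candidates = {
    "always Bach": lambda: "Bach",
    "always Stravinsky": lambda: "Stravinsky",
    "randomise 50/50": lambda: random.choice(["Bach", "Stravinsky"]),
}

def average_payoff(policy, episodes=1000):
    """Estimate a policy's expected payoff against the static opponent."""
    total = sum(ROW_PAYOFF[(policy(), static_opponent())] for _ in range(episodes))
    return total / episodes

scores = {name: average_payoff(policy) for name, policy in candidates.items()}
print(scores)
print("best response:", max(scores, key=scores.get))
```

With these particular numbers the expected payoffs are 0.6 for "always Bach", 0.7 for "always Stravinsky" and 0.65 for randomising, so the best response is to give in and always play Stravinsky; against a different static opponent the answer would change.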
Opponent shaping: reinforcement learning against a learning opponent
If you were to assume that the other agent is also a learning agent, this changes things. This is likely intuitive to anyone who has trained a puppy or interacted with children - when you interact with an agent that develops quickly, you have to consider not only the immediate consequences of the interaction but also what the other party will learn from it and how that will change their future behaviour.
In our Bach or Stravinsky game, consider a strategic player that has a preference for the Stravinsky concert. Knowing that their opponent is a reinforcement learning agent, this player might reason that if they only ever play Stravinsky their opponent will never experience a reward for playing Bach. This changes the observed payoff matrix to look like this:
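Using the assumed numbers from the payoff matrix above: since the strategic player only ever plays Stravinsky, the learning agent (who prefers Bach) only ever experiences the payoffs in the Stravinsky column:

| Learner's choice | Observed payoff |
|------------------|-----------------|
| Bach             | 0               |
| Stravinsky       | 1               |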
This way, the strategic player can ensure that they always get their preferred choice. This is the basic concept of opponent shaping: to optimise a policy as a best response to the opponent's learning algorithm, rather than to the opponent's current policy.
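To make this concrete, here is a minimal sketch (again using the assumed payoff numbers above, and a simple reward-averaging learner as a stand-in for an RL agent rather than any published algorithm) of a hand-coded "shaper" that always plays Stravinsky against a learning opponent who prefers Bach.

```python
import random

# Payoffs for the learning agent, who prefers Bach (same assumed numbers as above):
# (learner's choice, opponent's choice) -> learner's payoff.
LEARNER_PAYOFF = {
    ("Bach", "Bach"): 2, ("Bach", "Stravinsky"): 0,
    ("Stravinsky", "Bach"): 0, ("Stravinsky", "Stravinsky"): 1,
}

def shaper():
    """The strategic player: simply always plays its preferred concert."""
    return "Stravinsky"

# A very simple learner: track the average payoff of each action and act
# greedily, with a little epsilon-greedy exploration.
values = {"Bach": 0.0, "Stravinsky": 0.0}
counts = {"Bach": 0, "Stravinsky": 0}
epsilon = 0.1

for _ in range(2000):
    if random.random() < epsilon:
        action = random.choice(["Bach", "Stravinsky"])
    else:
        action = max(values, key=values.get)
    reward = LEARNER_PAYOFF[(action, shaper())]
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]  # incremental mean

print(values)                                    # Bach stays at 0, Stravinsky tends to 1
print("learner's greedy choice:", max(values, key=values.get))
```

The learner's estimate for Bach never rises above zero, so it converges on Stravinsky, which is exactly the outcome the shaper wanted: the shaper has responded to how its opponent learns, rather than to any fixed opponent policy.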
Suppose two agents both try to shape each other. Agent A best-responds to B's learning algorithm, but B is also trying to best-respond to A's learning algorithm, which in turn depends on B's. Each can in principle think several steps deep (A models B, which models A, which models B, …). First try to make this concept clear in your head (possibly with diagrams), then reflect on the trade-off between deeper and shallower reasoning.
If you’re interested in how opponent shaping is implemented in multi-agent reinforcement learning (MARL), among other reflections on the topic, we recommend the next optional piece.
All parts
Take the iterated prisoner’s dilemma with two players. In this setup, they play some number of games of the prisoner’s dilemma together and then go through a step of reinforcement learning over the full history of observed behaviour. First imagine that both agents use a standard reinforcement learning algorithm like proximal policy optimization or Q-learning. Do you expect the agents to learn to cooperate? Explain your intuition. Now imagine that the RL algorithms of both agents are modified to take into account what the other will learn from the data generated by one step of standard RL. Now what do you expect of the agents' policies?
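If you want to test your intuition empirically, here is a minimal sketch of the first variant: two independent tabular Q-learners playing the iterated prisoner's dilemma, each conditioning on the previous joint action. It simplifies the setup in the exercise (standard payoffs, an update after every round rather than batched learning steps), and all the hyperparameters are placeholder values.

```python
import random

# Standard prisoner's dilemma payoffs: (my move, their move) -> my reward.
PD = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ["C", "D"]
# Each agent's state is "start" or its own last move followed by the opponent's.
STATES = ["start"] + [mine + theirs for mine in ACTIONS for theirs in ACTIONS]

def make_q():
    return {(s, a): 0.0 for s in STATES for a in ACTIONS}

def act(q, state, eps):
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

q1, q2 = make_q(), make_q()
s1 = s2 = "start"
for step in range(50000):
    eps = max(0.05, 1.0 - step / 25000)          # decaying exploration
    a1, a2 = act(q1, s1, eps), act(q2, s2, eps)
    r1, r2 = PD[(a1, a2)], PD[(a2, a1)]
    n1, n2 = a1 + a2, a2 + a1                    # each agent's next state, from its own perspective
    update(q1, s1, a1, r1, n1)
    update(q2, s2, a2, r2, n2)
    s1, s2 = n1, n2

# Inspect the greedy policy each agent has learned in every state.
for name, q in [("agent 1", q1), ("agent 2", q2)]:
    print(name, {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES})
```

The outcome is quite sensitive to the discount factor, the exploration schedule and the state representation, so it is worth varying these and checking whether your intuition holds up.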
Opponent shaping could be a powerful tool for cooperative agents, if it can be scaled to complex agents and environments (including LLM agents). A key feature of this approach is that it could, in principle, be used to make other learning agents cooperate even if you have no control over their design or objectives. As with many other cooperation-relevant capabilities, opponent shaping also comes with risks, which are covered in section 7 ‘Back-fire Risks’.
Training for cooperativeness
So far in this section we have discussed which properties of agents might influence the outcomes of multi-agent interactions in general and cooperation problems in particular. If we want to devise interventions for cooperative AI at the agent level, it is fundamental to identify which agent properties are relevant to study and which might be more or less desirable. We now turn to different approaches to training agents for such properties: assuming we know what a cooperative agent is, can we create it?
One way to approach training for cooperativeness is to try to induce specific desirable biases or tendencies in agents. The next two resources explore two such approaches to creating cooperative agents: norm-adaptive policies and inequality aversion (a minimal sketch of the latter follows the readings below). Reading the optional parts of the papers requires a more technical background, but does provide useful context on how cooperative properties can be trained into agents in practice.
'Abstract', 'Introduction', '3.2 Coordination problems', '3.3 Bargaining problems and normative disagreement', first paragraphs of '3.4 Norm-adaptive policies' before 'Definition 3.1'
All parts
'Abstract' and '1 Introduction'
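To make the inequality-aversion idea more concrete, here is a minimal sketch of a Fehr-Schmidt-style subjective reward of the kind used in this line of work: each agent's environment reward is penalised both for disadvantageous inequity (others earning more than it) and for advantageous inequity (it earning more than others). The coefficient values below are placeholders; see the readings above for the exact formulation and values.

```python
from typing import List

def inequity_averse_rewards(rewards: List[float],
                            alpha: float = 5.0,    # weight on disadvantageous inequity (placeholder)
                            beta: float = 0.05     # weight on advantageous inequity (placeholder)
                            ) -> List[float]:
    """Fehr-Schmidt-style subjective rewards for a group of agents.

    Each agent's extrinsic reward is reduced in proportion to how much the
    others earn more than it (disadvantageous inequity) and to how much it
    earns more than the others (advantageous inequity).
    """
    n = len(rewards)
    subjective = []
    for i, r_i in enumerate(rewards):
        disadvantage = sum(max(r_j - r_i, 0.0) for j, r_j in enumerate(rewards) if j != i)
        advantage = sum(max(r_i - r_j, 0.0) for j, r_j in enumerate(rewards) if j != i)
        subjective.append(r_i - alpha * disadvantage / (n - 1) - beta * advantage / (n - 1))
    return subjective

# Example: one agent grabs most of the reward on this timestep.
print(inequity_averse_rewards([10.0, 1.0, 1.0]))   # the winner is penalised a little,
                                                   # the others a lot (they "envy" the winner)
```

In training, these subjective rewards simply replace the raw environment rewards in each agent's learning update.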
Recall that the cooperative AI field is concerned with the multi-agent dynamics of human-human, human-AI and AI-AI systems. The next resources are papers that explore the utility of training with human data to elicit good outcomes in fully cooperative and mixed-motive human-AI systems (technically, the whole game in the second study, Diplomacy, is zero-sum, but it contains mixed-motive subgames throughout). Again, reading the optional parts of the papers requires a more technical background, but it is valuable for those interested in the details of training agents in practice. A minimal sketch of the shared starting point, behaviour cloning from human data, follows the readings below.
'Abstract' and 'Introduction'
'3 Preliminaries', 'A Behavior cloning', 'B Self-play PPO', 'C PPO with embedded-agent environment', 'D Population Based Training'
'Abstract' and 'Introduction'
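A shared ingredient in this line of work is behaviour cloning: supervised learning of a policy on recorded human state-action pairs. Below is a minimal sketch of the idea; the network, the tensor shapes and the randomly generated "human data" are all placeholders, so see the papers above for the actual architectures and datasets.

```python
import torch
import torch.nn as nn

# Toy behaviour cloning: fit a policy to recorded human (state, action) pairs.
# The dimensions and the random "human data" below are placeholders.
STATE_DIM, N_ACTIONS, N_EXAMPLES = 32, 6, 1024

policy = nn.Sequential(
    nn.Linear(STATE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, N_ACTIONS),     # outputs action logits
)

# Stand-in for a dataset of human demonstrations.
human_states = torch.randn(N_EXAMPLES, STATE_DIM)
human_actions = torch.randint(0, N_ACTIONS, (N_EXAMPLES,))

optimiser = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()   # maximise the likelihood of the human's action

for epoch in range(20):
    logits = policy(human_states)
    loss = loss_fn(logits, human_actions)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

# The cloned policy can now be used directly, or as a starting point for
# further training.
with torch.no_grad():
    action = policy(torch.randn(1, STATE_DIM)).argmax(dim=-1)
print("greedy action for a new state:", action.item())
```

In practice the cloned policy is usually just a starting point; the papers above combine or compare it with reinforcement learning approaches such as self-play PPO and population-based training.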
Multi-agent environments
Work in cooperative AI often makes use of simulations or experiments carried out in some kind of environment. Below are a few environments that are frequently used in cooperative AI. Being familiar with these is especially useful if you are aiming to do your own research in cooperative AI, but for now, don't spend more than about 20 minutes in total looking over their abstracts and introductions.
Imagine an agent is trained in many multi-agent environments for the purpose of improving its cooperative intelligence. What potential problem arises if the agent is consistently informed about which entities in each environment are also agents? How significant do you believe this issue is in practice?
