Imagine two teams facing off on a soccer field. Players can cooperate to achieve an objective and compete against other players with conflicting interests. That is how the game works.
Creating AI agents that can learn to compete and collaborate as effectively as humans remains a thorny problem. The main challenge is enabling AI agents to predict the future behavior of other agents when they are all learning simultaneously.
Because of the complexity of this problem, current approaches tend to be short-sighted. Agents can only guess the next few moves of their teammates or opponents, which leads to poor performance in the long run.
Researchers from MIT, the MIT-IBM Watson AI Lab, and elsewhere have developed a new approach that gives AI agents a far-sighted perspective. Their machine-learning framework enables cooperative or competitive AI agents to consider what other agents will do as time approaches infinity, not just over the next few steps. The agents then adapt their behaviors accordingly to influence other agents’ future behaviors and arrive at an optimal, long-term solution.
This framework could be used by a group of autonomous drones working together to find a lost hiker in a thick forest, or by self-driving cars that strive to keep passengers safe by anticipating the future moves of other vehicles on a busy highway.
“When AI agents cooperate or compete, what matters most is when their behaviors converge at some point in the future. There are a lot of fleeting behaviors along the way that don’t matter much in the long run. Reaching this converged behavior is what we really care about, and we now have a mathematical way to enable that,” says Dong-Ki Kim, a graduate student in MIT’s Laboratory for Information and Decision Systems (LIDS) and lead author of a paper describing the framework.
The senior author is Jonathan P. How, the Richard C. Maclaurin Professor of Aeronautics and Astronautics and a member of the MIT-IBM Watson AI Lab. Co-authors include others at the MIT-IBM Watson AI Lab, IBM Research, the Mila-Quebec AI Institute, and the University of Oxford. The research will be presented at the Conference on Neural Information Processing Systems.
In this demonstration video, a red robot, which has been trained using the researchers’ machine learning system, is able to defeat a green robot by learning more effective behaviors that take advantage of its opponent’s ever-changing strategy.
More agents, more problems
The researchers focused on a problem known as multiagent reinforcement learning. Reinforcement learning is a form of machine learning in which an AI agent learns by trial and error. The researchers give the agent a reward for “good” behaviors that help it achieve a goal. The agent adapts its behavior to maximize that reward until it eventually becomes an expert at a task.
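To make the trial-and-error loop concrete, here is a minimal sketch of single-agent reinforcement learning on an assumed toy task (a three-armed bandit); the action values, learning rate, and exploration rate are all illustrative choices, not details from the paper.

```python
import random

# Hypothetical toy task: the agent picks one of three actions each step;
# action 2 yields the highest average reward, but the agent doesn't know that.
true_means = {0: 0.1, 1: 0.4, 2: 0.9}   # expected reward per action (hidden)
q = {a: 0.0 for a in true_means}        # the agent's learned value estimates
alpha, epsilon = 0.1, 0.2               # learning rate, exploration rate

random.seed(0)
for step in range(2000):
    # Explore occasionally; otherwise exploit the best-known action.
    if random.random() < epsilon:
        action = random.choice(list(q))
    else:
        action = max(q, key=q.get)
    # Observe a noisy reward for the chosen action.
    r = true_means[action] + random.gauss(0, 0.05)
    # Trial-and-error update: nudge the estimate toward the observed reward.
    q[action] += alpha * (r - q[action])

best = max(q, key=q.get)
print(best)  # the agent settles on the highest-reward action
```

With enough interactions the estimates separate cleanly and the agent reliably exploits the best action, which is the "eventually becomes an expert" dynamic described above.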
But when many agents learn to cooperate or compete simultaneously, things get more complicated. As agents consider more of their fellow agents’ future steps, and how their behavior affects others, the problem soon requires a great deal of computational power to solve efficiently. This is why other approaches focus only on the short term.
“The AI agents really want to think about the end of the game, but they don’t know when the game will end. They need to think about how to keep adapting their behavior into infinity, so they can win at some far-off point in the future. Our paper proposes a new objective that enables an AI to think about infinity,” says Kim.
But because it is impossible to feed infinity into an algorithm, the researchers designed their system so that agents focus on a future point where their behavior converges with that of other agents, known as an equilibrium. An equilibrium point determines the long-run performance of the agents, and multiple equilibria can exist in a multiagent scenario. An effective agent therefore influences the future behaviors of other agents in a way that steers them toward an equilibrium that is desirable from the agent’s own perspective. If all agents influence one another, they converge to a general concept the researchers call an “active equilibrium.”
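The idea of agents' behaviors converging to a point where no one wants to change can be illustrated with a much simpler stand-in: iterated best responses in a small coordination game, which settle into a fixed point (a classic equilibrium). This toy is not the paper's method; the payoff table and the alternating-update scheme are illustrative assumptions.

```python
# payoff[(a, b)] = (reward to agent 1, reward to agent 2)
# A 2x2 coordination game: agents do best when they choose the same option.
payoff = {
    (0, 0): (2, 2),  # both choose option 0: strong coordination
    (0, 1): (0, 0),
    (1, 0): (0, 0),
    (1, 1): (1, 1),  # both choose option 1: weaker coordination
}

def best_response(opponent_action, player):
    # Pick the action that maximizes this player's payoff
    # against the opponent's current action.
    if player == 1:
        return max((0, 1), key=lambda a: payoff[(a, opponent_action)][0])
    return max((0, 1), key=lambda b: payoff[(opponent_action, b)][1])

a, b = 1, 0  # arbitrary, uncoordinated starting behaviors
for _ in range(10):
    new_a = best_response(b, player=1)
    new_b = best_response(new_a, player=2)  # responds to the updated agent 1
    if (new_a, new_b) == (a, b):
        break  # neither agent changes its behavior: an equilibrium
    a, b = new_a, new_b

print((a, b))  # a joint behavior no agent wants to deviate from
```

The fleeting early moves don't matter; what matters is the joint behavior the dynamics converge to, which here is the mutually best coordinated outcome.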
The machine-learning framework they developed, called FURTHER (which stands for FUlly Reinforcing acTive influence witH averagE Reward), enables agents to learn how to adapt their behaviors as they interact with other agents in order to achieve this active equilibrium.
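The "average reward" in the name points at why this objective is far-sighted: a discounted objective effectively stops caring after a short horizon, while an average-reward objective weighs the infinite long run. The reward streams below are invented purely to show the contrast, not taken from the paper.

```python
# Assumed toy comparison: one policy earns a lot early and nothing later;
# another earns a modest reward forever.
gamma = 0.9  # discount factor for the short-sighted objective

def discounted_return(stream, steps=2000):
    # Classic discounted sum: future rewards shrink geometrically.
    return sum((gamma ** t) * stream(t) for t in range(steps))

def average_reward(stream, steps=2000):
    # Long-run average: every time step counts equally.
    return sum(stream(t) for t in range(steps)) / steps

early = lambda t: 1.0 if t < 50 else 0.0   # great now, terrible forever after
steady = lambda t: 0.5                      # mediocre, but forever

# The discounted view prefers the short-term policy...
print(discounted_return(early) > discounted_return(steady))   # True
# ...while the average-reward view prefers long-run performance.
print(average_reward(steady) > average_reward(early))         # True
```

This is the sense in which an average-reward objective lets an agent "think about infinity" rather than just the next few steps.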
It does this using two machine-learning modules. The first, an inference module, enables an agent to guess the future behaviors of other agents, and the learning algorithms they use, based solely on their past actions.
This information is fed into the reinforcement learning module, which the agent uses to adapt its behavior and influence other agents in a way that maximizes its reward.
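The two-module structure can be sketched in miniature: infer the other agent's strategy from its observed actions alone, then adapt our own action to do well against that estimate. Everything here, including the matching-reward game and the fixed opponent strategy, is a hypothetical illustration of the structure, not the FURTHER implementation.

```python
import random
from collections import Counter

random.seed(1)

def opponent_policy():
    # Hidden from our agent: the other agent favors action 1 (70% of the time).
    return 1 if random.random() < 0.7 else 0

history = []       # observed opponent actions, our only information
total_reward = 0
steps = 500

for step in range(steps):
    # Inference step: estimate the opponent's action probabilities
    # from its past actions only.
    counts = Counter(history)
    p1 = counts[1] / len(history) if history else 0.5

    # Adaptation step (greedy stand-in for the RL module): we are rewarded
    # for matching the opponent, so play the action we predict is likelier.
    our_action = 1 if p1 >= 0.5 else 0

    their_action = opponent_policy()
    total_reward += 1 if our_action == their_action else 0
    history.append(their_action)

match_rate = total_reward / steps
print(match_rate)  # approaches 0.7, the best achievable against this opponent
```

Even this greedy version shows the payoff of the inference module: the agent's reward rate climbs to the best achievable against the estimated strategy. The real framework goes further by also accounting for how its own actions change what the other agents will learn to do.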
“The challenge was thinking about infinity. We had to use a lot of different mathematical tools to enable that, and make some assumptions to get it to work in practice,” says Kim.
Winning the long game
They tested their approach against other multiagent reinforcement learning frameworks in several different scenarios, including a pair of robots fighting sumo-style and a battle pitting two teams of 25 agents against each other. In both instances, the AI agents using FURTHER won the games more often.
Because their approach is decentralized, which means that agents learn to win games independently, it’s also more scalable than other methods that require a central computer to control the agents, Kim explains.
The researchers used games to test their approach, but FURTHER could be used to tackle any kind of multiagent problem. For instance, it could be applied by economists seeking to develop sound policy in situations where many interacting entities have behaviors and interests that change over time.
Economics is one application Kim is particularly excited to study. He also wants to dig deeper into the concept of an active equilibrium and continue enhancing the FURTHER framework.
This research is funded in part by the MIT-IBM Watson AI Lab.