Programmers have developed a computerized poker champion that has defeated Darren Elias (who holds the most World Poker Tour titles), Chris "Jesus" Ferguson (winner of six World Series of Poker events), and 13 pros who have each won more than $1 million playing the game.
In one experiment, Elias and Ferguson each played 5,000 hands of poker against five copies of the machine, named Pluribus. In another, Pluribus played five pros at a time for a total of 10,000 hands. The AI came out on top both times. The results are published in Science.
"Pluribus achieved superhuman performance at multi-player poker, which is a recognized milestone in artificial intelligence and in game theory that has been open for decades," Tuomas Sandholm, Professor of Computer Science at Carnegie Mellon University and co-developer of Pluribus, said in a statement.
One thing that makes this triumph so special is the secretive nature of poker. In chess and Go, both players can see everything that happens on the board. In poker, they can't: cards in play aren't always visible and players can bluff. This, the researchers say, makes it a trickier game for a machine built on logic and probabilities.
Poker also involves more players, which calls for a different strategy from the one used in earlier two-player games.
"Playing a six-player game rather than head-to-head requires fundamental changes in how the AI develops its playing strategy," Noam Brown, a research scientist at Facebook AI and co-developer, added.
Take, for example, the Nash equilibrium: a set of strategies in which no player benefits from changing their own, so long as the opponents' strategies remain the same. In a two-player game, playing it can be an effective approach for an AI. Ideally, the human opponent will slip up or stray from the equilibrium, resulting in a win for the machine, and at worst the game will end in a tie.
This doesn't work when you add in more players.
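The two-player guarantee described above can be illustrated with a game far simpler than poker. The sketch below uses rock-paper-scissors, where playing each move a third of the time is the Nash equilibrium. The payoff table and the helper names (worst_case, best_response_value) are chosen here for illustration only and have nothing to do with Pluribus's code.

```python
# Payoff to the player choosing the row; the opponent receives the negative.
# Order of moves: rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
MOVES = range(3)


def worst_case(strategy):
    """Lowest expected payoff of `strategy` against any single opposing move."""
    return min(
        sum(strategy[i] * PAYOFF[i][j] for i in MOVES)
        for j in MOVES
    )


def best_response_value(opponent_strategy):
    """Highest expected payoff any single move earns against `opponent_strategy`."""
    return max(
        sum(opponent_strategy[j] * PAYOFF[i][j] for j in MOVES)
        for i in MOVES
    )


uniform = [1 / 3, 1 / 3, 1 / 3]     # the equilibrium strategy
rock_heavy = [0.5, 0.25, 0.25]      # a predictable, exploitable strategy

print("uniform, worst case:", worst_case(uniform))
print("rock-heavy, worst case:", worst_case(rock_heavy))
print("best payoff available against uniform:", best_response_value(uniform))
```

The uniform strategy never does worse than a tie on average, and nothing the opponent tries earns more than a tie against it. The rock-heavy strategy, by contrast, loses a quarter of a point per round on average to an opponent who always plays paper.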
"[Pluribus'] major strength is its ability to use mixed strategies," said Elias. "That's the same thing that humans try to do. It's a matter of execution for humans – to do this in a perfectly random way and to do so consistently. Most people just can't."
One unusual strategy adopted by the machine was "donk betting", which involves ending one round with a call and starting the next with a bet. Traditionally, it's seen as a weak, nonsensical move that human players tend to avoid. Yet Pluribus used donk bets much more frequently than its (defeated) human opponents.
So, how did programmers build Pluribus? First, they had it play against five copies of itself, so it could learn the game through trial and error and create a "blueprint" for use in future matches.
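Learning by self-play can be sketched on a toy game. The example below is not Pluribus's training code: it swaps in regret matching, a simple regret-minimization technique, and uses rock-paper-scissors instead of poker. Two copies of the same learner play each other, and each shifts toward the moves it regrets not having played. The function names, the iteration count and the seed are all assumptions made for this illustration.

```python
import random

ACTIONS = 3  # rock, paper, scissors
# Payoff to the player choosing the row against the move in the column.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]


def strategy_from_regrets(regrets):
    """Play each move in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS  # no regrets yet: play uniformly


def self_play(iterations, seed=0):
    """Return each player's average strategy after playing against a copy."""
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        moves = [rng.choices(range(ACTIONS), weights=s)[0] for s in strategies]
        for player in range(2):
            mine, theirs = moves[player], moves[1 - player]
            for a in range(ACTIONS):
                # Regret: how much better move `a` would have done than the move played.
                regrets[player][a] += PAYOFF[a][theirs] - PAYOFF[mine][theirs]
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


blueprint = self_play(100_000)
print(blueprint[0])
```

Averaged over many rounds, each learner's strategy drifts toward playing every move about a third of the time, which is the equilibrium for this game. The averaged strategy plays the role of the "blueprint" here, in a vastly simplified form.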
Pluribus uses a limited-lookahead search algorithm, enabling it to predict the strategy its opponents will use in the next two to three plays (rather than the entire game). The technique isn't perfect: it considers just five possible continuation strategies for each player, when the true number is much higher. Still, it's sufficient to let the machine play a strong strategy.
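The idea of looking only a few moves ahead, then judging each end point against a handful of possible continuations, can be sketched as follows. This is a heavily simplified toy and not Pluribus's algorithm: it has two players, no hidden cards, and three made-up continuation strategies ("blueprint", "fold_heavy", "raise_heavy") whose values are invented numbers. The Node class and function names are likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    player: int = 0  # 0 = the searcher, 1 = the opponent
    children: dict = field(default_factory=dict)  # action name -> Node
    # Estimated payoff to the searcher from this point on, for each
    # continuation strategy the opponent might switch to (invented numbers).
    continuation_values: dict = field(default_factory=dict)


def leaf_value(node, strategies):
    """Assume the opponent picks the continuation that is worst for the searcher."""
    return min(node.continuation_values[s] for s in strategies)


def search(node, depth, strategies):
    """Look `depth` moves ahead, then fall back on the leaf estimates."""
    if depth == 0 or not node.children:
        return leaf_value(node, strategies)
    values = [search(child, depth - 1, strategies) for child in node.children.values()]
    return max(values) if node.player == 0 else min(values)


def best_action(root, depth, strategies):
    return max(
        root.children,
        key=lambda a: search(root.children[a], depth - 1, strategies),
    )


def leaf(blueprint, fold_heavy, raise_heavy):
    return Node(continuation_values={
        "blueprint": blueprint, "fold_heavy": fold_heavy, "raise_heavy": raise_heavy,
    })


# Toy tree: the searcher bets or checks, then the opponent responds.
root = Node(player=0, children={
    "bet": Node(player=1, children={
        "call": leaf(2.0, 3.0, -1.0),
        "fold": leaf(1.0, 1.0, 1.0),
    }),
    "check": Node(player=1, children={
        "bet": leaf(0.5, 1.0, 0.0),
        "check": leaf(0.5, 0.5, 0.5),
    }),
})

all_strategies = ["blueprint", "fold_heavy", "raise_heavy"]
print(best_action(root, 2, ["blueprint"]))
print(best_action(root, 2, all_strategies))
```

In this toy, assuming the opponent will stick to a single continuation makes betting look best. Allowing for the opponent switching to a raise-heavy continuation makes checking the safer choice. The search depth must reach nodes that carry estimates, as it does here with a depth of two.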
Pluribus also thrives on unpredictability. After all, it wouldn't get very far if it saved its bets for excellent hands only.
The implications of this achievement could extend well beyond poker. The novel strategy makes AI more relevant to "real-world" problems, which often involve missing information and multiple players.
"The ability to beat five other players in such a complicated game opens up new opportunities to use AI to solve a wide variety of real-world problems," said Brown.