The world’s number one poker player has just been relegated. Not by a person, but by a computer, of course. A team of scientists is reporting in Science that they have essentially solved a type of Texas Hold ‘em poker by developing a computer program that cannot be beaten, at least in a human lifetime.
While this has been achieved previously for simpler games, such as Connect Four, this is the first time that scientists have solved a game in which some information is concealed from the players. Not only could this technology help poker players step up their game, but it could also have a diverse range of applications in situations requiring complex decision-making, such as security and medicine.
The algorithm developed is specific to one variety of poker, and it can still lose hands if dealt bad cards just like the opponent. In computer science lingo, it’s what is referred to as a “weak” solution. However, as Motherboard explains, it will minimize its losses as best is mathematically possible and will gradually wipe you clean of your chips by making the “perfect” decision in any given scenario. So, even if you fancy enduring 60 million hands with the program, it will still beat you.
There are many different types of poker, but all of them exhibit something called imperfect information, where something about the game is concealed from the players, such as an opponent’s cards. Scientists have previously solved perfect-information games, such as checkers, where all the information is there in front of you, but imperfect-information games are considerably more challenging.
Poker is a complex game, involving uncertainty, randomness, luck and bluffing, and Texas Hold ‘em—the most popular variety—is no exception. However, a simpler version of it exists, called heads-up limit, where there are only two players, fixed bet sizes and a fixed number of raises. That’s why scientists from the University of Alberta decided to select this game for their algorithm.
For their study, the researchers improved upon a previously developed algorithm called counterfactual regret minimization (CFR). Regret minimization basically involves reviewing past moves and examining whether making a different decision, such as raising, folding or calling, could have resulted in a better outcome. The computer then calculates how much it lost because of a particular move, and stores that as a regret value. This is then applied to each opportunity that the computer has to make that same decision, so that losses can be avoided.
After using 4,000 central processing units for two months, which is the equivalent of around 1,000 years of computing time, to practice against itself in hundreds of thousands of rounds, the computer gradually improved and developed better solutions. The regrets then became so small that it couldn’t be beaten in a human lifetime.
While this may all seem like fun and games, the algorithms could actually have a wide range of important applications. For example, it could be developed to assist airport checkpoints, or to aid doctors in decision-making where different possible outcomes from treatment need to be assessed.