Following its massive success in predicting protein folding in 2020, Google’s DeepMind has now released another AI that is less about solving complex biological problems and more about dominating its opponents in strategy games – and it doesn’t even bother to read the rules.
In a blog post describing its latest innovation, DeepMind shows off MuZero, a machine-learning AI that can play multiple different games and set record-breaking scores without being told the rules. By combining previous iterations of game-playing AI that plan ahead with the ability to learn from its previous moves, MuZero can create strategies as it plays, even in a completely unknown environment.
The findings were published in Nature.
“Systems that use lookahead search, such as AlphaZero, have achieved remarkable success in classic games such as checkers, chess and poker, but rely on being given knowledge of their environment’s dynamics, such as the rules of the game or an accurate simulator,” the authors state in the blog post.
“This makes it difficult to apply them to messy real world problems, which are typically complex and hard to distill into simple rules.”
MuZero currently plays Go, chess, shogi, and Atari benchmarks such as Ms Pac-Man, but such advances in AI could have resounding implications for algorithms that must adapt without a fixed ruleset – a challenge humans face daily.
The AI works by using three different parameters to create a game strategy:
How good is the current position?
What is the best action to take next?
How successful was the last action?
Essentially, the AI boils the entire game down to this distinct set of questions, which then dictate how it proceeds. It continuously learns throughout the game to make these decisions, and the results are extremely impressive.
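To make those three questions concrete, here is a minimal, illustrative sketch of how an agent in the MuZero mould might combine a learned value estimate, a learned policy, and a learned reward model inside a simple lookahead search. This is not DeepMind’s implementation: the toy functions, the plain rollout search (MuZero uses Monte Carlo tree search), and all names are assumptions made purely for illustration.

```python
import random

# Toy stand-ins for MuZero's three learned estimates. In the real system
# these are neural networks trained from experience; here they are
# hand-written placeholders that only illustrate the structure.

def value(state):
    # "How good is the current position?" -- a learned score for the state.
    return sum(state) / (len(state) or 1)

def policy(state, actions):
    # "What is the best action to take next?" -- a learned preference
    # over actions, expressed as probabilities.
    weights = [1.0 + (a % 3) for a in actions]
    total = sum(weights)
    return [w / total for w in weights]

def reward(state, action):
    # "How successful was the last action?" -- a learned estimate of the
    # immediate payoff of taking `action` in `state`.
    return 1.0 if action == max(state, default=0) % 4 else 0.0

def simulate(state, action):
    # Learned dynamics: predict the next *internal* state without ever
    # consulting the game's real rules.
    return state + [action]

def plan(state, actions, depth=3, rollouts=50):
    """Choose an action by running short imagined rollouts with the
    learned models, scoring each action by reward plus future value."""
    scores = {a: 0.0 for a in actions}
    for _ in range(rollouts):
        for first_action in actions:
            s = simulate(state, first_action)
            total = reward(state, first_action)
            for _ in range(depth - 1):
                probs = policy(s, actions)
                a = random.choices(actions, weights=probs)[0]
                total += reward(s, a)
                s = simulate(s, a)
            scores[first_action] += total + value(s)
    return max(scores, key=scores.get)

if __name__ == "__main__":
    # Pick the best-looking move from a toy starting position.
    print(plan(state=[0], actions=[0, 1, 2, 3]))
```

The point of the sketch is the division of labour: the agent never queries the real rules, only its own learned models of value, policy, and reward, and plans entirely inside that learned model of the game.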
In the Atari suite of benchmarks, MuZero set a new record for performance, outclassing all AI competitors. In chess, shogi, and Go, it matched the leading performance set by its predecessor, AlphaZero. It also showed interesting results when the number of simulations it was allowed to perform was increased: as the number of planned simulations per move grew, MuZero performed better, demonstrating that additional planning allowed it to perform and learn more effectively.
MuZero will now continue its quest for total gaming dominance, but it will likely see many other uses across various scientific fields. AlphaZero is already employed in many complex applications, including optimizing quantum dynamics far more rapidly than humans can.
Such algorithms will be integral to creating robots that can tackle the real world, rather than being confined to predefined roles with limited flexibility.