In 1997, IBM’s chess-playing computer, Deep Blue, shocked the world by defeating the reigning human world champion, Garry Kasparov, in a six-game match. It was a defining moment for the field of artificial intelligence, and even if it took the computer six games to prove it, the proof was there: humanity had been beaten at its own game.
Over the last twenty years, chess computers have only become more capable, leaving humanity far behind. But what all chess computers have had in common is that, while they have far surpassed humans in chess skill, they are programmed at a basic level by chess-playing humans. The reigning computer chess champion, Stockfish 8, would easily dismantle Deep Blue, but it did not teach itself chess strategy. It was taught: its judgment of a position rests on rules hand-crafted and tuned by people, and on the accumulated body of human chess knowledge.
Well, things just changed. AlphaZero, an AI developed by DeepMind, a sister company of Google under Alphabet, just crushed Stockfish 8. In a 100-game match, AlphaZero won 28 games and drew 72, never losing once. It did so after being given only the basic rules of chess by its developers. After a mere four hours of playing against itself, with no other input, it had surpassed the most successful purpose-built chess program in the world.
Chess.com’s Mike Klein compared this feat to “a robot being given access to thousands of metal bits and parts, but no knowledge of a combustion engine, then [experimenting] numerous times with every combination possible until it builds a Ferrari.” It bears repeating that AlphaZero became the world’s strongest chess player after playing only itself for half a workday, with no outside help, human game data, or chess game archives.
The paper published by the developers noted with interest that, in those four hours, AlphaZero independently discovered the twelve most common human chess openings and played them frequently. In four hours, AlphaZero essentially taught itself more about chess than humanity has accrued over roughly 1,500 years.
Interestingly, while choosing its moves, AlphaZero searches far fewer positions per second than Stockfish 8 does: about 80,000 (nothing to sneeze at) against Stockfish’s 70 million, or roughly one position for every 875 that Stockfish examines. The authors of the paper write that AlphaZero compensates for this by “using its deep neural network to focus much more selectively on the most promising variations,” which they describe as “arguably a more ‘human-like’ approach to search.” The computers, it seems, are learning.
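To make "focusing selectively on the most promising variations" a little more concrete, here is a toy sketch in Python. It is not DeepMind's code. It assumes a selection rule in the spirit of the PUCT formula used in neural-network-guided Monte Carlo tree search, applied to a single position with three candidate moves. The function names, the prior probabilities, the move values, and the exploration constant are all invented for illustration, and the sketch adds 1 under the square root so that the very first choice already follows the priors.

```python
import math

def puct_select(priors, visits, value_sums, c_puct=1.5):
    """Return the index of the move with the highest PUCT-style score."""
    total_visits = sum(visits)
    best_move, best_score = 0, -math.inf
    for move, prior in enumerate(priors):
        # Average value seen so far for this move (0 if never tried).
        q = value_sums[move] / visits[move] if visits[move] else 0.0
        # Exploration bonus: large for high-prior, rarely tried moves.
        # The +1 under the root is this sketch's own tweak (an assumption).
        u = c_puct * prior * math.sqrt(total_visits + 1) / (1 + visits[move])
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move

def run_search(priors, true_values, simulations):
    """Spend a fixed budget of simulations and return visit counts per move."""
    visits = [0] * len(priors)
    value_sums = [0.0] * len(priors)
    for _ in range(simulations):
        move = puct_select(priors, visits, value_sums)
        visits[move] += 1
        # Stand-in for a network evaluation: a fixed, made-up value per move.
        value_sums[move] += true_values[move]
    return visits

if __name__ == "__main__":
    # Hypothetical network output: move 0 looks best, move 2 looks worst.
    priors = [0.7, 0.2, 0.1]
    true_values = [0.6, 0.1, -0.3]
    print(run_search(priors, true_values, simulations=200))
```

With these made-up numbers, the great majority of the 200 simulations go to the first move, while the weaker candidates are tried only a handful of times. That is the trade the paper describes: far fewer positions examined, but the budget is concentrated where the network expects the best play.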
To reinforce the point that this AI is a general-purpose reinforcement learning algorithm that can, from scratch, “achieve superhuman performances across many challenging domains,” the developers also provided it with the rules of shogi, a Japanese relative of chess that is even more complex and in which the top computer programs have only recently bested human players. According to the paper, after less than two hours of playing shogi against itself, AlphaZero had surpassed Elmo, the reigning computer champion of that game, too.