AlphaZero (Computer) vs Stockfish (Computer)
AlphaZero - Stockfish (2017), London ENG, Dec-04
Queen's Indian Defense: Classical Variation. Polugayevsky Gambit (E17)  ·  1-0


Kibitzer's Corner
Jan-14-18  Eduardo Bermudez B.: There are too many counterintuitive moves for most humans
Feb-03-18
Premium Chessgames Member
  keypusher: <weisyschwarz: <Eatman>, what about <Whiteshark's> line, 34...bxc4, where 35. Rd3 is not possible?>

Try 35.g4, clearing the queen's path to c3 (e.g. 35...Rd8 36.Rxd8 Qxd8 37.Qc3+ +-). Seems completely winning.

Feb-03-18
Premium Chessgames Member
  Ron: After Black's 19th move, White is down a knight and a pawn; on the other hand, the Black king has been driven to its third rank.

This online article about King Safety in chess programming discusses Stockfish toward the end. https://chessprogramming.wikispaces...

Perhaps Stockfish programmers--and chess players--need to rethink King Safety. Perhaps King Safety is under-evaluated.

Apr-29-18  kungfufighter888: omg 26 Qh1.
May-17-18  princecharming: I think 26.Qh1 is my favourite move of any game ever.
Jun-11-18  ThirdPawn: AlphaZero throws everything known about chess out the window. Theory, computation, tactics, etc. We can consider chess solved with AlphaZero, because chess really only has a finite number of moves.

And it really doesn't matter how much processing power was given to Stockfish because all AlphaZero is doing is searching through its database that it collected from past games involving the same position, games which AlphaZero compiled by playing against itself, and then choosing the best moves based on its win ratio. So, AlphaZero is 100s of moves ahead because with every move AlphaZero makes, it already sees checkmate in X number of moves, or a draw from current position. Stockfish just cannot even compare.

Perhaps Google will one day allow us to search positions through AlphaZero's database. To be honest, AlphaZero reminded me of Paul Morphy's reckless game play. But at least Morphy's games were analyzable!

Jul-01-18  PJs Studio: What's creepy is that AlphaZero is kicking the crap out of what is essentially a 3200-Elo engine. Chess is NOT a drawn game. For humans, maybe the higher percentage of draws is due to exhaustion, lack of skill (by comparison), and a healthy fear of losing (plus our inability to take advantage of every slight inaccuracy in our own games...)

What's AlphaZero's Elo? 3600??

Jul-14-18
Premium Chessgames Member
  Ron: This is from the Wikipedia article on Stockfish, about its results against AlphaZero:

"In 100 games from the normal starting position AlphaZero won 25 games as White, won 3 as Black, and drew the remaining 72, with 0 losses.[32] AlphaZero also played twelve 100-game matches against Stockfish starting from twelve popular openings for a final score of 290 wins, 886 draws and 24 losses, for a point score of 733:467.[33][note 1] The research has not been peer reviewed and Google declined to comment until it is published.[32]"

Comment: So in some games, where the opening was already set up, AlphaZero lost to Stockfish. I encourage all the AlphaZero games to be made public. That would advance chess knowledge. For example, if AlphaZero lost a few games where it was given a certain opening, that would be a telling argument against that opening.

Jul-14-18  zanzibar: <<Ron> Comment: So in some games, where the opening was already set up, AlphaZero lost to Stockfish. I encourage all the AlphaZero games to be made public.>

Yes, we all agree on this. I've tried writing to a member of the AlphaZero team to get the additional games, but he begged off, citing the pending publication of a submitted article.

That article should have been published by now, so perhaps I should try contacting him again. Of course, the attempt is unlikely to succeed; one might assume the games would all have been published by now if there weren't some internal reluctance to do so.

Jul-15-18  tonsillolith: Let's also not forget that the time controls Stockfish was made to use were not the ones it was optimized for. If we saw an AlphaZero vs Stockfish match organized by Stockfish's creators instead of by AlphaZero's, the outcome could be very different.

It's like looking at "scientific studies" funded for and published by a company about how good their products are.

Jul-15-18
Premium Chessgames Member
  Ron: <zanzibar: Yes, we all agree on this. I've tried writing to a member of the AlphaZero team to get the additional games, but he begged off, citing the pending publication of a submitted article.>

If I worked for Google, I would consider copying the games and giving them to Wikileaks.

<tonsillolith: Let's also not forget that the time controls Stockfish was made to use were not the ones it was optimized for. If we saw an AlphaZero vs Stockfish match organized by Stockfish's creators instead of by AlphaZero's, the outcome could be very different>

I'm all for more experiments. I just recently downloaded Stockfish 9, which is said to be stronger than its predecessors. After using Stockfish 9, I believe it.

Jul-16-18  zanzibar: At the rate we're going (or not going, as the case may be), Wikileaks might be our only hope!


Jul-17-18  djvanscoy: It seems that the losing move was 27...Bg6. Two questions: First, would 27...Bxe4 have drawn? (A possible variation is 27...Bxe4 28.Qxe4 Kg8 29.Bd4 Rf7, planning finally to develop the knight and release the other rook. To give credit where credit is due, Erik Kislik pointed this out.) Second, if indeed 27...Bxe4 draws, at what depth does Stockfish find it?

(In the preprint reporting on these games, it was stated that Stockfish had one minute per move, computing with 64 threads and 1 GB of hash. I'll speculate that this means Stockfish was running on a machine with a pair of Intel Xeon E5-2697A v4 chips; the preprint didn't specify. The hash size strikes me as quite small given their hardware; I don't know if that matters. Based on the information in the preprint, Stockfish would have been able to search approximately 4.2 billion nodes per move in this match.)
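One way to probe the second question empirically is sketched below. It assumes the python-chess package and a local Stockfish binary on the PATH; the helper function is purely illustrative, and the FEN is deliberately left as a placeholder to be copied from the game score rather than invented here.

```python
import chess
import chess.engine

def depth_of_first_preference(fen, target_san, max_depth=40):
    """Lowest search depth at which Stockfish's best move equals target_san."""
    board = chess.Board(fen)
    with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
        for depth in range(8, max_depth + 1, 2):
            info = engine.analyse(board, chess.engine.Limit(depth=depth))
            if board.san(info["pv"][0]) == target_san:
                return depth
    return None

# Usage sketch -- paste the FEN of the position before Black's 27th move:
# depth_of_first_preference("<FEN before Black's 27th move>", "Bxe4")
```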

Jul-17-18
Premium Chessgames Member
  AylerKupp: <<ThirdPawn> And it really doesn't matter how much processing power was given to Stockfish because all AlphaZero is doing is searching through its database that it collected from past games involving the same position, games which AlphaZero compiled by playing against itself, and then choosing the best moves based on its win ratio. >

I don't think that you understand how AlphaZero in particular or neural networks in general work. AlphaZero does not have an explicit database of all the games it has played or all the games it was provided to analyze. Instead, AlphaZero, like all neural network-based chess engines, uses that information to calculate the proper weights for its nodes to use when calculating the best move to make at any one time. So the information derived from its neural network training is stored <implicitly> in AlphaZero's memory, not <explicitly>.
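To illustrate that distinction with a deliberately toy sketch (nothing like AlphaZero's real architecture, which is a deep network trained by self-play; the names and weights below are made up): an explicit database can only answer for positions it has stored, while a trained network evaluates any position by passing its features through learned weights.

```python
import numpy as np

# "Explicit" knowledge: a lookup keyed by position -- silent on anything unseen.
stored_evals = {"some-position-key": 0.25}
def database_eval(key):
    return stored_evals.get(key)            # None for positions never stored

# "Implicit" knowledge: toy weights standing in for a trained network.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 16))               # hidden-layer weights (toy values)
w2 = rng.normal(size=8)                     # output weights (toy values)
def network_eval(features):
    hidden = np.tanh(W1 @ features)         # features: length-16 vector describing a position
    return float(w2 @ hidden)               # produces an evaluation even for novel positions

print(database_eval("never-seen-position")) # None
print(network_eval(rng.normal(size=16)))    # a number, even for a position never trained on
```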

<We can consider chess solved with AlphaZero, because chess really only has a finite number of moves.>

Chess does have a finite number of possible moves or, more appropriately, possible positions. But that number is so high that it might as well be infinite; it has been estimated at anywhere from 10^43 (Shannon's estimate) to 10^52 positions. This measure of complexity is referred to as state space complexity, since it counts every possible position once regardless of how it was reached, i.e. ignoring move transpositions. This is equivalent to the information contained in a chess engine's hash table.

Using the lower number of 10^43 and assuming that the chess computer can evaluate 1 position per attosecond (10^-18 seconds, with about 12 attoseconds currently being the best timing control of laser pulses), that computer would need about 3.17*10^17 years to evaluate all possible positions. Given that the age of the universe is estimated at about 1.38*10^10 years, it's not practical in the foreseeable future to completely solve the game of chess.
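Making the arithmetic explicit, under the same assumptions (10^43 positions, one position evaluated per attosecond):

```python
positions = 1e43
seconds_per_position = 1e-18              # 1 attosecond
seconds_per_year = 365.25 * 24 * 3600     # ~3.156e7 s

years_needed = positions * seconds_per_position / seconds_per_year
age_of_universe = 1.38e10                 # years
print(f"{years_needed:.2e} years, about {years_needed / age_of_universe:.1e} "
      f"times the age of the universe")   # ~3.17e17 years, ~2.3e7 universe ages
```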

<Perhaps Google will one day allow us to search positions through AlphaZero's database. To be honest, AlphaZero reminded me of Paul Morphy's reckless game play. But at least Morphy's games were analyzable!>

As I said above, AlphaZero does not use an explicit database during play. I suppose that it's possible for Google/DeepMind to have created a database of all the games that AlphaZero played against itself, but I doubt that they did that. And all chess games are analyzable; the ability to do so has everything to do with the skill of the analyst, not the complexity of the games themselves.

Jul-17-18
Premium Chessgames Member
  AylerKupp: <<zanzibar> Yes, we all agree on this. I've tried writing to a member of the AlphaZero team to get the additional games, but he begged off, citing the pending publication of a submitted article.>

This reminds me of one of my favorite jokes:

A man is driving along a road in his car when suddenly an animal jumps across his path. The man swerves to avoid it but as a result he drives his car into a ditch. As the man is considering what to do he spots a farmer in the field next to the road and goes to meet him.

Man: "Excuse me, do you by chance have a rope?"

Farmer: "Yes I do."

Man: "Great. My car went into a ditch as I swerved to avoid hitting an animal that suddenly jumped the road. Can I borrow your rope to pull my car out of the ditch?"

Farmer: "I'm sorry, no. I need the rope to tie up my milk."

Man (puzzled): "But you can't tie up milk with a rope!"

Farmer: "When you don't want to do something, one answer is as good as another."

Since Google/DeepMind has already achieved all the publicity it wanted from the AlphaZero / Stockfish match, I doubt that we will ever see a collection of all of AlphaZero's games. Like IBM with Deep Blue after the 1997 rematch with Kasparov, what would Google/DeepMind gain by publishing them?

Jul-17-18
Premium Chessgames Member
  AylerKupp: <<tonsillolith> Let's also not forget that the time controls Stockfish was made to use were not the ones it was optimized for. If we saw an AlphaZero vs Stockfish match organized by Stockfish's creators instead of by AlphaZero's, the outcome could be very different.>

I think that it's even more pertinent to compare the relative power of the hardware used by AlphaZero with the hardware used by Stockfish in their match. After all, if you want to compare the relative performance of two <software> chess engines, you want to have them run either on similar <hardware> or <hardware> with equivalent performance. That's why the TCEC championships and the various chess engine tournaments like CCRL and CEGT are all run on the same hardware.

The hardware used by AlphaZero during its match with Stockfish consisted of 4 second-generation, proprietary Tensor Processing Units (TPUs) rated at 45 TFlops each, for a total performance capability of about 180 TFlops. Stockfish ran on a Linux system with, supposedly, 64 cores, since the DeepMind AlphaZero paper (https://arxiv.org/pdf/1712.01815.pdf) indicated that it used 64 threads. Assuming a system of something like 16 Intel i5s, each with 4 cores running at 3.4 GHz, the total performance capability of the hardware running Stockfish was about 1.68 TFlops. So the hardware running AlphaZero during the match was more than 100 times more powerful than the hardware running Stockfish.
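Restating that comparison as a two-line calculation (the 45 TFlops-per-TPU and 1.68 TFlops figures are the estimates from the post, and raw FLOPS is admittedly a crude yardstick across such different hardware):

```python
alphazero_tflops = 4 * 45    # 4 second-generation TPUs, ~45 TFlops each
stockfish_tflops = 1.68      # rough estimate for the 64-thread CPU system
print(f"~{alphazero_tflops / stockfish_tflops:.0f}x raw throughput advantage")  # ~107x
```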

Stockfish's opening book and tablebase support were disabled during the match. This might seem fair since AlphaZero has neither, but they are part of the strongest Stockfish configuration, even though the default Stockfish installation, at least in Arena, does not enable them. They also represent some of what Stockfish has "learned" about the opening and the endgame, and having that knowledge available during game play doesn't seem fundamentally different from what AlphaZero learned during its training. But that's clearly a matter of opinion.

As you mentioned, Tord Romstad, one of Stockfish's original authors, also talked about the effective disabling of Stockfish's time management algorithm by the use of a fixed time per move. He said that a lot of effort was spent incorporating "smarts" into Stockfish to determine which positions are not critical and therefore do not need much analysis time, thus making more analysis time available for the critical positions that require it. AlphaZero did not have a time management function because it was, after all, a proof of concept and not a production chess engine, so that again may or may not be a fair comparison of the two engines' playing strength.

Another source of discussion was the 1 GB size of Stockfish's hash table. To me that seems low; I use a 1024 MByte hash table on my 4-core machine, which is 256 MByte per core. Yet the default Stockfish 9 installation, at least in Arena, sets up only a 16 MByte hash table with 1 thread (core), which on a per-thread basis is the same as a 1 GB hash table for a 64-thread system. So the match settings are consistent with Google/DeepMind using default Stockfish installation parameters for the opening book, tablebase support, and hash table size, although Stockfish's playing strength would have been greater if the first two had been enabled and a larger hash table used.
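The per-thread hash arithmetic from the paragraph above, spelled out:

```python
match_hash_mb, match_threads = 1024, 64            # 1 GB shared by 64 threads
arena_default_mb, arena_default_threads = 16, 1    # Arena's default Stockfish setup
print(match_hash_mb / match_threads)               # 16.0 MB per thread
print(arena_default_mb / arena_default_threads)    # 16.0 MB per thread -- identical
print(1024 / 4)                                    # 256.0 MB per core on a 4-core machine
```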

Jul-18-18  djvanscoy: <AylerKupp> Actually, the AlphaZero team's preprint said they were running Stockfish on 64 threads, not 64 cores. My guess is that that means 32 physical cores with hyperthreading (I speculated on the specific CPUs in my previous post). There is some debate over how alpha-beta pruning interacts with hyperthreading; I am not sure you would get better performance from Stockfish with 32 threads on the 32 cores than with 64 threads on them. It would be interesting to know if anyone has tested this.

Disabling tablebases in Stockfish, to be fair, probably had little to no effect on the outcome; the slowdown in accessing SSD data probably eliminates almost all advantage accruing to perfect information on 6-piece endgames, especially at 60-second-per-move time controls.

It wasn't clear to me that they disabled opening books per se. The AlphaZero team's preprint says: "Table 2 analyses the most common human openings (those played more than 100,000 times in an online database of human chess games (1)). Each of these openings is independently discovered and played frequently by AlphaZero during self-play training. When starting from each human opening, AlphaZero convincingly defeated Stockfish..." The last sentence evidently means that they started the AlphaZero-Stockfish games from some set of common openings. Indeed, on page 6 of the preprint, they indicate that AlphaZero and Stockfish played (at least) 1200 games, consisting of a 100-game match in each of 12 openings. For example, they report that in E61, the King's Indian Defense, AlphaZero beat Stockfish +16 -2 =82. In C00, the French Defense, AlphaZero beat Stockfish +43 -0 =57. I guess "the" 100-game match that AlphaZero won +28 -0 =72 was a separate set of games. If so, they never explicitly say whether or how openings were chosen. In this game at least, AlphaZero and Stockfish do stay in book for quite a while.

The hardware disparity might be unavoidable. I'm no expert on tensor processing units, but they appear to be special-purpose hardware for machine learning, which suggests that you have to run AlphaZero on TPUs and you can't run Stockfish on TPUs.

Jul-18-18
Premium Chessgames Member
  AylerKupp: <djvanscoy> The consensus, I think, is that whatever benefit you get from hyperthreading depends on what you're doing, and I think that most chess engines suggest that you disable hyperthreading, although this <might> be more applicable to Windows and not Linux. I have a Windows system with Intel chips that don't have hyperthreading so there's no way I can verify that one way or the other.

At any rate, I'm pretty sure that you would get better performance from 64 physical cores without hyperthreading rather than from 32 physical cores with hyperthreading, so I gave the presumable Stockfish configuration the benefit of the doubt and I assumed 64 physical cores without hyperthreading. If the Stockfish configuration used was indeed 32 physical cores with hyperthreading then the performance advantage for AlphaZero would be greater than 100X, possibly in the 130X – 150X range. And I also have no idea how alpha-beta pruning would be impacted by hyperthreading, although I'm reasonably sure that it would be slower with 32 hyperthreaded physical cores than with 64 physical cores without hyperthreading. Which is why I always wondered why Google/Deep Mind indicated that the Stockfish configuration used "64 threads". If indeed it was 32 hyperthreaded cores, then that would be somewhat misleading with regards to the performance capability of the system used to run Stockfish.

For a similar perspective you can look here: https://chess.stackexchange.com/que.... But I don't know how old the system that Stockfish ran on was; I doubt that it consisted of Intel i9 Extreme processors, so I suspect that the system used for Stockfish was substantially slower. For more information about TPUs, see https://en.wikipedia.org/wiki/Tenso.... But basically a TPU is designed for a high volume of low-precision computations of the type common to neural network-based calculations. You could presumably run AlphaZero on an Intel processor by implementing the TPU calculations in software, but at great cost in performance, and I have no idea of the magnitude of the effort required. Some enterprising, dedicated, and independently wealthy person(s) could probably do that to satisfy their curiosity, but I'm 100% sure that Google/DeepMind will not spend the time and money to do this and make AlphaZero perform less capably in another match with Stockfish.

With respect to the openings, I read the AlphaZero paper again, and every time I read it I get more confused. I also apparently didn't pay much attention to pages 5 and 6. It now seems clear, at least to me, that separately (a) AlphaZero and Stockfish played "the" 100-game match starting from the initial position, with AlphaZero performing considerably better than Stockfish, and (b) they presumably played 12 100-game matches (1200 games in total), one from each of the positions shown in Table 2. So I don't think that Stockfish necessarily had its opening book enabled for part (b); the board was set up according to the final position of each opening and play resumed from there, with AlphaZero playing White in 50 games of each opening and Stockfish playing White in the other 50. I don't know how either AlphaZero or Stockfish selected its openings in the "Initial Position" match.

If that's the case, then there certainly hasn't been much publicity regarding these results. After all, with 1200 games rather than 100, they should be a more accurate reflection of the relative strengths of AlphaZero and Stockfish. In the Openings scenario Stockfish did slightly better: from the initial position (100 games) AlphaZero's Scoring Percentage [ (Wins + Draws/2)/No. of games ] was 64.0%, while in the Openings scenario (1200 games) AlphaZero's Scoring Percentage was "only" 61.1%. And for those who trumpet AlphaZero's infallibility, the Openings scenario provides a rude awakening: AlphaZero <lost> 24 games (2.0%), 5 with White and 19 with Black. That doesn't change the overall picture; in this system configuration (hardware + software) AlphaZero was by far the better performer. But infallible? No, of course not.
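Those scoring percentages follow directly from the bracketed formula; a minimal check using the win/draw/loss totals quoted earlier on this page:

```python
def scoring_pct(wins, draws, losses):
    games = wins + draws + losses
    return 100 * (wins + draws / 2) / games

print(round(scoring_pct(28, 72, 0), 1))      # initial-position match: 64.0
print(round(scoring_pct(290, 886, 24), 1))   # twelve-opening matches: 61.1
```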

Oct-01-18  Atking: I do not understand why 30...Rh8 wasn't played. It seems better even to human eyes. After 32.c4 White's upper hand is quite clear, as Black's queenside pieces can't get out without a sacrifice.
Oct-02-18  Atking: <talwnbe4: Stockfish 6 on a single core processor considers 34.. Rd8 ?? (+-1.8) as a bad mistake in less than a minute.. 34..Re5 as =, (33..Re6! another alternative =) Don't know what was running here in this match. > Are you sure those moves draw?...
Oct-17-18  CharlesSullivan: < Stockfish's losing move was 34...Rd8 >

[ The following analysis was posted on the Rybka Forum on Dec. 24, 2017 ]:

Please note that Black has a draw as late as move 34. After an overnight search of 65 plies, Stockfish8 (with endgame tables) rates 34...a6 as <+2.65> and best play is 35.Rd3 Ra7 36.Rf3 Qh8 37.Qg4 Qxh1+ 38.Kxh1 Rd7 39.Qxg5 Rd1+ 40.Kg2 Nd7 41.Ra3 bxc4 42.Rxa6 Ne5 43.Qe3 Kf7 44.Ra7+ Rd7 45.Qf4+ Kg7 46.Rxd7+ Nxd7 47.Qxd4 c5 48.Qd5 Re7 49.f4 c4 50.Qd4+ Kh6 51.Qxc4


[diagram]

This is almost certainly a draw. Consider the following position that might eventually be reached:


[diagram]

This 6-man endgame position is a draw, although Stockfish8 evaluates this position at greater than <+2.00>! It seems that Stockfish could improve its evaluation of unbalanced endgame positions.

[ Since the above was posted last year, I have used the latest version of Stockfish 9 on a 16-core Threadripper 1950X to re-analyze 34...a6. Even using endgame tablebases, Stockfish still rates this as an overwhelming <+3.30> variation for White. But I have found no improvements for White, and the verdict is still that 34...a6 leads to a draw. ]

Dec-09-18  dannygjk: In the 2017 AZ vs SF 8 match SF 8 was doing 70,000 Knps, AZ was doing 80 Knps.
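At the match's fixed 60 seconds per move, those speeds translate into roughly the following node counts per move (consistent with the ~4.2 billion figure mentioned earlier in this thread):

```python
stockfish_nps = 70_000 * 1_000   # 70,000 kN/s
alphazero_nps = 80 * 1_000       # 80 kN/s
seconds_per_move = 60
print(f"Stockfish: ~{stockfish_nps * seconds_per_move:.1e} nodes/move")  # ~4.2e9
print(f"AlphaZero: ~{alphazero_nps * seconds_per_move:.1e} nodes/move")  # ~4.8e6
```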
Sep-06-19  Tiggler: In case there are some who have not yet noticed, be advised: <CharlesSullivan> does not post an opinion on a position without being sure that it is the last word. So, say no more.
Mar-08-20  drdos7: The losing move in this game is 27...Bg6?; the correct move in this position is 27...Bxe4


[diagram]

Oct-05-22  CharlesSullivan: As usual, faster machines present new insights... My 32-core Threadripper 3970X says: (A) Black does NOT draw at move 34 or move 32, but (B) Black does draw with 30...Rh8 31.Qg4 Rf8 32.Bd4 Qd6 33.Be5 Qxe5 34.Rxe5 Bxe5 35.Rh1 Rf5 36.Qh3 Kf8 37.Rd1 Na6 38.Qh6+ Bg7 39.Qxg6 Rf6 40.Qxg5 Nc7 41.Qc5+ Kg8 42.Rd6 Raf8 43.f4 Rxd6 44.Qxd6 Nd5 45.Qxc6 Rd8 46.Qxb5 Bxc3 47.Qa6 Bd4, and Black holds.


[diagram]

When will this be proven wrong??
