Jun-26-20 | | Helios727: The link for Stockfish analysis online is at:
https://www.365chess.com/analysis_b... |
|
Jun-28-20
 | | AylerKupp: <Helios727> Thanks for the link. I used <365chess.com>'s Stockfish analysis app to analyze the position after 1.e4 e6 2.d4 d5 3.Nc3 Bb4. I'm a French Defense player so it's the first position that came to mind for it to analyze:
[diagram: position after 3...Bb4]

This Stockfish analysis app is incredibly slow. I let Stockfish analyze for more than 24 hours and it got to a search ply depth of 38, with an evaluation of [+0.17] for 4.e5 as its best move. I don't know if it will stop at d=42 since I'm stopping it due to lack of interest; I think it's far too slow to be of much practical use. Did you let it continue running and it just stopped, or seemed to stop, at d=42? Or did you get a message indicating that was its limit? Maybe, since the time to analyze positions increases exponentially as the search depth increases, you just thought it had stopped.

I don't know what version of Stockfish they are using, but when I Googled "365chess stockfish analysis" the summary of the first hit said "You can analyze your positions and games online with a powerful chess engine - Stockfish. 8." But there is no mention of the Stockfish version used on the https://www.365chess.com/analysis_b... page. The latest officially released version of Stockfish is 11, released in mid-Jan 2020, although there are later development versions that can be downloaded. All are free.

I had my oldish 32-bit, 4-core computer analyze the same position using Stockfish 11 overnight. It reached d=38 in less than 33 minutes, and after about 8½ hours it had completed its analysis to d=48. At d=38 it also considered 4.e5 to be White's best move, with an evaluation of [+0.18], and likewise at d=48 with an evaluation of [+0.38]. Faster computer systems will of course take less time; user <RandomVisitor>, with his much more powerful system, routinely posts analyses where his Stockfish reaches search depths in the low 50s.

I don't know if you do much computer chess analysis but, if you are interested in doing so, I suggest that you download and install a chess GUI and Stockfish on your computer and run your analyses there rather than rely on https://www.365chess.com/analysis_b.... I have found that with Stockfish you need to let it reach deeper plies, in the mid-30s and preferably in the low 40s, compared to other engines in order to have similar confidence in the accuracy of its results.

There are many chess GUIs that are free to download and use as, of course, is Stockfish. I personally use Arena because it is capable of doing many other things besides analysis, such as running engine vs. engine matches or multi-engine tournaments, as well as letting you play against the chess engine. The reason why I chose Arena when I was just getting started running computer chess analyses, besides its capabilities, is that it was free. However, Arena has a somewhat steep learning curve because of its many features but, like anything else, once you learn how to use it it's not difficult. There are many other free chess GUIs that you can download that might be more to your liking. If you are interested in getting started with doing chess analysis on your own computer and have any questions, feel free to drop me a note on my forum. |
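A minimal sketch of this local-analysis workflow in code, using the python-chess library (an assumption; any UCI-capable GUI or script would do) and assuming a Stockfish binary named "stockfish" on the PATH:

    # pip install chess -- analyze the French position locally over UCI.
    import chess
    import chess.engine

    board = chess.Board()
    for san in ["e4", "e6", "d4", "d5", "Nc3", "Bb4"]:
        board.push_san(san)

    # "stockfish" is a placeholder path to whatever executable you installed.
    with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
        # A fixed-depth search; each extra ply costs exponentially more time.
        info = engine.analyse(board, chess.engine.Limit(depth=38))
        print(info["score"], info["pv"][0])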
|
Jun-28-20
 | | AylerKupp: <Helios727> Here's another reason why running chess engines on your own computer is important. I sometimes record information about some of the key parameters of an analysis so that I can compare the results from different engines at each ply: search depth, time required to reach that depth, total KNodes evaluated, KNodes/sec, evaluations (of course!), and the top move. I should do it more often.

The last one indicates how stable the analysis of the position is. Sometimes the same move is consistently listed as the top move ply after ply, particularly at the deeper plies, so I then consider the analysis "stable". But sometimes the top move changes a lot between plies. I consider that analysis "unstable", and what the engine considers to be the top move in that position often depends mostly on when you stop the analysis due to time or patience limitations, or both; it's pretty much arbitrary.

The evaluation of the top move also changes, sometimes significantly, on a ply-by-ply basis. At d=48 it evaluated 4.e5 at [+0.38], effectively equal. But it was as low as [0.00] at d=39 and d=40 for 4.exd5. And it evaluated 4.exd5 at d=46 at [+0.35].

In this analysis the GUI listed Stockfish's top 3 moves at each ply from d=8 to d=48, 41 plies in all. And there were 7 different moves identified as the "top move" across those 41 plies, with 4.exd5 considered the best move more often than all the others. In descending order of occurrence (2nd column), with White's Scoring % listed in the 3rd column, these were:

4.exd5 <22> 54.7%
4.e5 <7> 57.5%
4.a3 <6> 57.3%
4.Bd3 <2> 50.0%
4.Nge2 <2> 54.0%
4.Qg4 <1> 53.6%
4.Qd3 <1> 54.3%

Fortunately, Opening Explorer indicates that 4.e5 is by far the most popular move, being played 7,781 times out of 10,308 games (75.5%). And 4.e5 is also, but just barely, the move that has the highest Scoring % for White. 4.e5 was also listed as Stockfish's top move in 5 of the last 6 plies analyzed. But if I had stopped my analysis at d=46, Stockfish would have indicated that 4.exd5 was the top move, as it did at d=38.

So analysis of the ply-by-ply information presented can be essential in determining what the best move in a given position might be. Sometimes we just have to accept that a position is not stable analysis-wise, and that the chess engine is just not capable of determining what the best move really is in a reasonable amount of time. And that's something you can't do with https://www.365chess.com/analysis_b... or other apps like it that do not provide easily recordable ply-by-ply information. |
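The ply-by-ply record described above can be captured automatically. A sketch under the same assumptions as before (python-chess, a "stockfish" binary on the PATH), printing depth, elapsed time, nodes, nodes/sec, evaluation, and current top move at each completed depth:

    import chess
    import chess.engine

    board = chess.Board()  # or chess.Board(fen) for the position under study
    with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
        with engine.analysis(board, chess.engine.Limit(depth=48)) as analysis:
            last_depth = 0
            for info in analysis:
                depth = info.get("depth", 0)
                if depth > last_depth and "pv" in info:
                    last_depth = depth
                    # One row per ply: the data needed to judge "stability".
                    print(depth, info.get("time"), info.get("nodes"),
                          info.get("nps"), info.get("score"), info["pv"][0])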
|
Jun-30-20 | | Helios727: What happens is that when I analyze a game, I sometimes have to leave the computer for long periods of time and when I come back the search depth is never above 42. I will make a point of getting a GUI. As for unstable analysis, is it possible that this simply results from situations where more than one move is equal to another and it comes down to personal style? |
|
Jun-30-20
 | | AylerKupp: <Helios727> Same here. I do some short interactive analysis for blunder-checking purposes but I typically run my "serious" analyses overnight with 3 engines (typically Houdini, Komodo, and Stockfish; I average their evaluations in order to rank the moves, since multiple engines sometimes differ as to what their top move is, and their evaluations differ) and with only one core used per engine to practically eliminate multi-core engine non-determinism. So I can run all 3 engine analyses concurrently. I start my analysis, I go to bed, and in the morning check the results.

As for "unstable analysis", I'll clarify that I refer to those situations where the engines' top move changes, in the extreme case, every search ply. After all, we want to know and play the best move. This situation typically happens when different moves have similar evaluations and after each search ply one of the moves' evaluation inches above the others by a small amount. Or, if the evaluations are truly equal, then the "top move" is the one with that evaluation that the engine finds first. In that case, any move with the same evaluation is really the "top move".

In fact, I consider several moves to be effectively equal if the difference in their evaluations is [0.50] or less, since I think that ½ pawn is probably the resolution of a human player, assuming that we think in terms of equivalent pawns, which of course we don't, although perhaps subconsciously we do. I typically map engine to human evaluations roughly as follows:

[0.00] to [0.49] = Position with equal chances for both sides (Note: Not necessarily a draw, although that's probably the most likely outcome, particularly between strong players)
[0.50] to [0.99] = White (Black) has a slight advantage.
[1.00] to [1.99] = White (Black) has a definite advantage.
> [2.00] = White (Black) likely has a winning advantage.

Then, if more than one move has an effectively equivalent evaluation, the move that we choose to play is probably pretty much based on personal style.

Good luck in your search for a suitable GUI to meet your needs. And, as I said, let me know if I can help you get started. |
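That rough mapping is easy to express directly in code; a sketch (the thresholds are the guideline above, not any engine standard):

    def describe_eval(pawns: float) -> str:
        """Map a pawn-unit engine evaluation to a human assessment."""
        side = "White" if pawns >= 0 else "Black"
        mag = abs(pawns)
        if mag < 0.50:
            return "equal chances for both sides"
        if mag < 1.00:
            return side + " has a slight advantage"
        if mag < 2.00:
            return side + " has a definite advantage"
        return side + " likely has a winning advantage"

    print(describe_eval(0.38))   # equal chances for both sides
    print(describe_eval(-1.25))  # Black has a definite advantage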
|
Jun-30-20 | | Helios727: I installed the Arena GUI and the Stockfish 11 engine. It appears there are three 64-bit engines in it: stockfish_20011801_64, stockfish_20011801_64_bmi2, and stockfish_20011801_64_modern. Which one do I use? How do I cut and paste a set of PGN text for game analysis? Also, how do I set it up to mimic the online version of Stockfish by having it show the top three moves to choose from? |
|
Jul-01-20 | | Helios727: In the meantime I installed the Tarrasch GUI, which is much simpler than Arena, so I will use it until I figure out Arena better. I see that the Stockfish engine is much faster on a PC than the online web version. This is cool. |
|
Jul-01-20
 | | AylerKupp: <Helios727> Which of the 64-bit engine versions to install depends on how recent a computer you have. The 20011801 refers to the build number, and the _bmi2 and _modern suffixes refer to the instruction sets available on your computer. The _64 version is the most basic and will work with any 64-bit computer regardless of age, the _modern version is suitable for 64-bit computers released since 2013 or so, and the _bmi2 version is suitable for computers from mid-2015 and beyond. Each successive version is slightly faster, provided your CPU supports its instruction set, so _bmi2 will likely give the best results, but not spectacularly so. The only foolproof way to find out which one works best is to try them.

Install all 3 versions (Engines > Install Engine from the Main Menu), load the _bmi2 version (Engines > Load Engine from the Main Menu), and analyze (select Analyze from the main window) a position, any position. If that engine version works, let it run for a little while, say 30 secs, to let the kN/sec value stabilize somewhat, and record the kN/sec number. Then repeat with the _modern and _64 versions. Choose the version with the highest kN/sec value. Of course, if only the _64 version works, you have your answer.

I've never pasted a PGN game text into Arena for game analysis. I instead paste a FEN string describing a particular position and start from there. You can set it up in Arena by selecting Position > Set-up a Position from the Main Menu, or press Ctrl+B, or select it from the icon that shows a chessboard and an arrow in the bottom right-hand corner (Arena unfortunately does not provide balloon help). The Arena position editor is a little clunky for my tastes, so I find the easiest way is to cut and paste the FEN string defining the position into Arena. You can create the FEN string using any one of a number of apps. I use http://www.chess-poster.com/english... to set up the position that I want to analyze, select it and copy it to the Clipboard, and in Arena from the Main Menu select Position > Get FEN from Clipboard or press <F6> to activate that position for analysis.

From the Set-up a Position pop-up menu you also specify the initial move number, the player to move, and – important – whether either or both players still have castling rights or an en passant capture is possible (since Arena does not have a history of the moves, it doesn't know whether a previous move allowed an en passant capture). Fortunately the default is no castling for either side, and this is the most common situation, so I don't need to specify it explicitly. But sometimes when analyzing an opening-type position castling is still possible and I forget to specify it, since I'm not used to doing so. |
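For readers who prefer to check a FEN string programmatically rather than in a position editor, a sketch with the python-chess library (an assumption, as before) that inspects exactly the fields the Arena set-up dialog asks about:

    import chess

    # Position after 1.e4 e6 2.d4 d5 3.Nc3 Bb4.
    fen = "rnbqk1nr/ppp2ppp/4p3/3p4/1b1PP3/2N5/PPP2PPP/R1BQKBNR w KQkq - 2 4"
    board = chess.Board(fen)
    print(board.turn == chess.WHITE)               # True: White to move
    print(board.has_castling_rights(chess.WHITE))  # True: castling rights kept
    print(board.ep_square)                         # None: no en passant square
    print(board.fullmove_number)                   # 4: the initial move number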
|
Jul-01-20
 | | AylerKupp: <Stockfish, Arena> Configuration (part 2 of 2)

But you made me curious, so I looked up pasting a PGN into Arena. Arena has a very good Help system and I found how to do it easily. It's almost trivial: save the PGN game into a *.pgn text file and drag the *.pgn file into the Arena main screen. All this appears to do is display the PGN header information in a spreadsheet-type window. But just click on any of the cells and it will load the game into the Arena main window and display the first move. You can then click on any move and it will show the position after that move, or advance/go back moves using the arrow buttons. As a bonus, since Arena now knows the game's move history, it automatically checks the proper settings for White and Black castling. However, you still have to explicitly specify the first move number and whether White or Black is to move. I didn't have a game handy that featured a possible en passant capture, so I don't know if it will correctly indicate the en passant square. Try it and let me know.

To specify that the top 3 (or any number of) moves be displayed, press Ctrl+1 to display a pop-up of the UCI parameters that the engine allows to be changed (if you have 2 engines loaded, press Ctrl+2 to modify the UCI parameters for the 2nd engine, etc.). Change MultiPV to whatever number of values you want displayed, close the pop-up, and select Analyze. Each additional value you specify will make Stockfish run a little bit slower, since it has more work to do, but not noticeably so. I like the display for MultiPV > 1 better, so if I'm interested in pure speed I specify MPV=2, otherwise I typically specify MPV=3 or MPV=5. And you can save most of the information displayed by copying it to the clipboard by specifying Position > Copy Analysis to Clipboard or Shift+Ctrl+F6 (who remembers these keyboard shortcuts anyway?) and then pasting it into a text editor or word processor. But there are some "features" of this operation that I will discuss if you're interested.

These engine UCI parameters are stored in a text file in the Arena folder called ArenaENG.cfg. I find it easier to just edit this file if I'm changing multiple parameters or parameters for different engines. Make sure that Arena is not loaded, load the ArenaENG.cfg file, make the changes (in the Stockfish portion of the ArenaENG.cfg file MultiPV is indicated as MPV), and save it back. Just make sure that you use a plain text editor and not a word processor to do this.

There is a lot of documentation on Arena that you can find on the web. And, like I said, the Arena Help system is quite good. Just be patient; Arena is a full-featured GUI that takes time to learn how to use. And it's best that you configure it with icons for the features that you use the most to make Arena easier to use. But I suspect that most of your future questions will be about Arena and not Stockfish, and this is, after all, the Stockfish page. So I welcome you to post any further Arena-related questions on my forum so as to not clutter this Stockfish page with off-topic information. Good luck! |
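Under the hood, the Ctrl+1 dialog is just setting the engine's MultiPV option over UCI. A sketch of the same request made from a script (python-chess assumed, "stockfish" as a placeholder path):

    import chess
    import chess.engine

    board = chess.Board()
    with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
        # multipv=3 is the scripted equivalent of "setoption name MultiPV value 3".
        infos = engine.analyse(board, chess.engine.Limit(depth=25), multipv=3)
        for info in infos:
            # One ranked line per MultiPV slot, like the GUI's analysis pane.
            print(info["multipv"], info["score"], info["pv"][0])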
|
Jul-02-20 | | scholes: Stockfish wins TCEC Season 18 Superfinal. Current score 52-44. Four games to go. |
|
Jul-10-20 | | Helios727: My experience has been that when Stockfish shows an advantage of 2.5 or higher, the result is almost always a win for that side if Stockfish plays it out. AylerKupp seems to get that result at the 2.0 level. What sort of threshold result do other people here get? |
|
Jul-11-20
 | | AylerKupp: <Helios727> Well, I didn't "get" anything; I didn't run any tests. My winning-level threshold of > [2.00] was just an opinion. Various GUIs have a default winning level of > [1.50] and I felt that this was too optimistic, so I arbitrarily used [2.00] as a more conservative threshold. If you want to be even more conservative and want to wait until the evaluation is > [2.50] to consider the position winning for White, that would be fine by me. So it's just a guideline, nothing more.

If you wanted to be more objective you could run various engine vs. engine tournaments with Stockfish playing itself. You could then calculate the probability of a win resulting after achieving an evaluation threshold <x.xx> for, say, 3 consecutive moves. But you need to remember that these probabilities would be engine-dependent, since different engines have different evaluation functions and will evaluate positions differently. Stockfish's evaluations tend to run higher than other engines' evaluations, but not always. So a win probability of 0.90 might occur when Stockfish repeatedly reaches an evaluation > <threshold 1>, while for another engine this probability of 0.90 might occur when that engine reaches an evaluation > <threshold 2>.

Yet another problem that current classic engines seem to have is dealing with fortress positions. For example, see my post in Spassky vs Fischer, 1972 (kibitz #1147). Even though Stockfish's evaluations were > [+4.00], which would usually be considered winning for White, it did not recognize that it was allowing Black to achieve a fortress. Which is a pity, since it's easily recognizable by an engine. In that analysis Stockfish achieved positions which it evaluated at [+4.98] and [+4.09], usually considered winning for White, at d=36, but it was unable to improve on those evaluations as far as d=52, when I just stopped the analysis.

An engine should be able to recognize that if its evaluation does not change for, say, 10 consecutive moves, then it is heading into a draw if it continues to pursue that line. Particularly if up to that point its evaluation of the positions resulting from a given move had been increasing. The question is then, what can the engine do about it? Maybe by the time it discovers this situation it's too late to do anything about it. Or maybe it could permanently prune the branches of its search tree that satisfy that condition, regardless of how high (within reason) the evaluation is, so that it can concentrate on branches that, while they may have lower evaluations, provide more realistic opportunities for a win. |
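The flat-evaluation test suggested above is simple to state as code; a toy sketch (the window and tolerance values are arbitrary choices, not anything an engine actually uses):

    def looks_like_fortress(evals, window=10, tolerance=0.05):
        """Flag a line whose evaluation (in pawns) has gone flat."""
        if len(evals) < window:
            return False
        recent = evals[-window:]
        return max(recent) - min(recent) <= tolerance

    # Evaluations stuck near +4, as in the Spassky-Fischer example above,
    # are exactly the pattern this check would flag:
    print(looks_like_fortress([4.98, 4.95, 4.97] + [4.96] * 8))  # True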
|
Jul-29-20 | | scholes: A revolution in sf is coming. Search for stockfish nnue. They have combined a small neural network with stockfish search. It has become much stronger than stockfish. It is becoming stronger every day. http://www.fastgm.de/h2h10.html |
|
Jul-29-20 | | Helios727: To my chagrin, Stockfish showed me that I had +2.74 (a won game) in the very position where I had agreed to a draw in an old tournament game. I had SF play it thru and my side won. If only I had the technical vision and skill to play it out to a win myself. |
|
Jul-30-20
 | | AylerKupp: <<scholes> It has become much stronger than stockfish. It is becoming stronger every day.> Thanks for the update. I had heard of Stockfish NNUE but I had not looked at it in detail. It seems that they have replaced regular Stockfish's hand-crafted evaluation function with the information provided by a neural network imported from a shogi-playing engine and a suitable training set. It's a relatively simple NN, only 2 hidden layers (so I've read, but unfortunately I can no longer find the reference). The advantage of such a relatively simple NN is that its calculations can be performed in a reasonable amount of time by a CPU-based engine, without having to depend on auxiliary processors like the GPUs or TPUs used by LeelaC0 and AlphaZero to implement their NN calculations.

It is currently stronger than regular 4-core Stockfish 11, as evidenced by the latest CCRL 40/02 engine vs. engine tournament (http://ccrl.chessdom.com/ccrl/404/), being rated at 3659 and ranked #1, compared with the latest release of Stockfish 11, which is rated at 3598 and ranked #4. That 61-point Elo rating advantage corresponds to a projected scoring % advantage of 58.5% vs. 41.5% for Stockfish NNUE 4 CPU vs. Stockfish 11 4 CPU.

What I think is really impressive is that the Stockfish NNUE 64-bit 4 CPU engine is rated and ranked higher in this tournament than LeelaC0 0.25.1 t40-154, Fat Fritz w471, and Leelenstein 11.1, all of which also use an RTX 2080 GPU. CCRL engine vs. engine tournament ratings are based on the CPU-based engines running on an "oldish" Intel i7-4770K 4-core CPU rated at 0.177 TFlops, compared with an RTX 2080 GPU rated at 14.2 TFlops, about 80X faster. So this version of Stockfish NNUE is rated and ranked higher than NN-based engines with more than 80X its computational capability.

But I'm not sure how it can become significantly stronger without resorting to GPU support. Per Chess.com (not, IMO, known for their accuracy) in https://www.chess.com/news/view/sto..., regular Stockfish can evaluate 100M nodes (positions)/sec while Stockfish NNUE can evaluate 60M nodes/sec and LeelaC0 can evaluate 40K nodes/sec. But I haven't been able to find any information about the hardware used in the CCCC. The point I'm trying to make is that even restricting its NN to a relatively simple 2-hidden-layer NN reduces the number of nodes/sec it can evaluate to 60%. So making Stockfish NNUE stronger by increasing the number of hidden layers of its NN will lead to a significant decrease in the number of nodes/sec it can evaluate and, unless that's accompanied by significant improvements in its search tree pruning heuristics, will result in a dramatic decrease in the search depth that it can achieve in a given amount of time. So there is probably a "sweet spot" where the increased complexity of Stockfish NNUE's NN can be balanced against the corresponding decrease in its search depth capability in order to achieve optimum results. Assuming that "sweet spot" hasn't already been reached with a relatively simple 2-hidden-layer NN.

Of course, Stockfish NNUE's performance could be improved by off-loading its NN calculations to a GPU, but what would that prove? That an NN system with an 80X+ performance advantage will likely defeat an otherwise equivalent system without an NN? We already knew that, as shown by DeepMind's AlphaZero paper, which indicated that if AlphaZero's time control in its 2nd match with Stockfish 8 had been reduced by a factor of 80, equalizing the computational capability available to both engines, Stockfish 8 would have defeated AlphaZero as convincingly as AlphaZero defeated Stockfish 8 when both engines used the same time control.

Still, as you say, a revolution in Stockfish is likely coming, if it's not already here. |
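The 61-point-to-58.5% conversion comes from the standard logistic Elo expectation; a one-line check (the small difference from the quoted 58.5% is just down to which rating table is used):

    def expected_score(elo_diff):
        """Expected scoring fraction for the higher-rated side."""
        return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

    print(round(expected_score(61) * 100, 1))  # ~58.7%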
|
Aug-08-20 | | scholes: https://blog.stockfishchess.org/pos... |
|
Sep-05-20
 | | AylerKupp: Stockfish 12 was released on Sep 2, 2020. Download it from https://stockfishchess.org/download/. It has versions for Windows, MacOS, Linux, iOS, and Android. And it provides guidance for which version to use depending on how recent a computer you have; you must download each version explicitly.

It (optionally) incorporates NNUE as its evaluation function. If you want to use NNUE (it's supposedly 150 Elo rating points stronger than the non-NNUE version and it does not need a GPU), you will also need to download the NNUE file from https://tests.stockfishchess.org/nns. Download it into the same folder/directory where you downloaded the Stockfish executables.

If you are interested in the source code you need to download that separately; it's not downloaded automatically as in previous versions. You can download it from a separate button on https://stockfishchess.org/download/. And make sure that you read the Readme file. It's downloaded when you download the source, or you can read the Stockfish 12 blog (as well as earlier versions' blogs if you are interested) at https://blog.stockfishchess.org/pos....

If you are interested in older Stockfish versions, all the way back to Stockfish 1, you can download them from the Stockfish archives at https://www.dropbox.com/sh/75gzfgu7... or from an option on https://stockfishchess.org/download/. A Stockfish geek's dream. If you want to see the release dates for all Stockfish versions go to https://www.chessprogramming.org/St....

Now we can wait to see if the 150 Elo rating point advantage for Stockfish 12 NNUE holds up after unbiased engine vs. engine tournaments like CCRL (http://computerchess.org.uk/ccrl/) and CEGT (http://www.cegt.net/) incorporate it into their various time control tournaments. |
|
Sep-16-20
 | | AylerKupp: <CCRL and CEGT engine vs. engine tournament results> As I indicated above, I'm somewhat skeptical about Elo rating gains provided by engine developers, preferring unbiased ratings calculated by 3rd-party organizations such as CCRL and CEGT. So here are some ratings calculated by these 2 organizations for Stockfish 12 (presumably with NNUE; it's an option, although it's not always specified) and Stockfish 11, both using only 1 CPU. For some reason no games were played with Stockfish 12 using 4 CPUs; that might make the rating differential greater. Or not.

CCRL 40/15 (Rapid) Rating Difference = 57
Stockfish 12 64-bit (1 CPU) 3489 76 games
Stockfish 11 64-bit (1 CPU) 3432 1537 games
CCRL 40/2 (Blitz) Rating Difference = 154
Stockfish 12 64-bit (1 CPU) 3688 695 games
Stockfish 11 64-bit (1 CPU) 3534 1709 games
CEGT 40/20 (Rapid) Rating Difference = 98
Stockfish 12 64-bit (1 CPU) 3546 501 games
Stockfish 11 64-bit (1 CPU) 3448 2730 games
CEGT 40/4 (Blitz) Rating Difference = 109
Stockfish 12 64-bit (1 CPU) 3596 700 games
Stockfish 11 64-bit (1 CPU) 3487 4900 games
Average Rating Difference:
Rapid Only = Average(57, 98) = <78>
Blitz Only = Average(154, 109) = <132>
Rapid + Blitz = Average(57, 154, 98, 109) = <105>

So the 150 Elo rating point gain for Stockfish 12 seems to apply only to CCRL's 40/2 (Blitz) playing conditions; the rating differences for other time controls are not nearly as large. Still, an average rating differential of 105 across all Rapid and Blitz tournaments is impressive. Stockfish typically does not release an official new version (developmental versions with greater or lower rating gains are made available for download) until its rating gain is > 50 points, so Stockfish 12 certainly qualifies as an official new version. |
|
Sep-16-20
 | | AylerKupp: <My engine vs. engine tournament results> Prior to these CCRL and CEGT numbers being available, I ran a Stockfish 12 vs. Stockfish 11 100-game tournament at Blitz time controls (5 mins/game + 6 secs/move increment starting at move 1) over several nights. These were the results:

Stockfish 12: +28, =61, -10; 0.590 fractional score, 59.0% scoring %
Stockfish 11: +10, =61, -28; 0.410 fractional score, 41.0% scoring %

So Stockfish 12's 0.590 fractional score translates into a +65 Elo rating point gain.

However, I did notice a seemingly large number of games lost on time by both engines: 16, i.e. 16% of the total number of games played. Most of these, 14 of them or 87.5%, were won by Stockfish 12. While some losses on time can be expected, the disproportionate number of games lost on time by Stockfish 11 might indicate that there is an inherent defect in Stockfish 11's time management function which was corrected in Stockfish 12. So the tournament results might not be a true indication of using the NNUE-based evaluation function in Stockfish 12 compared to using the classic hand-crafted evaluation function in Stockfish 11. So, excluding the losses on time by both engines, the time-control-loss-adjusted results were:

Stockfish 12: +14, =61, -7; 0.542 fractional score, 54.2% scoring %
Stockfish 11: +7, =61, -14; 0.458 fractional score, 45.8% scoring %

So Stockfish 12's time-control-loss-adjusted 0.542 fractional score translates into "only" a +30 Elo rating point gain.

I don't know if CCRL and/or CEGT count time forfeits against the engine that lost on time (I suspect they do) and, after all, a faulty time management function is a problem for the engine; if it results in a time forfeit, so be it. But in this instance, if we are interested in comparing the results of having an NNUE-based evaluation function with having a classic, hand-crafted evaluation function, then whether one of the engines has a faulty time-management function needs to be taken into account. But remember that these rating differentials are for Blitz time control games only; Rapid and Classic time control games may give different results. |
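The fractional-score-to-Elo conversions above follow from inverting the same logistic model; a sketch (minor differences from the quoted +65 and +30 come from rounding or rating tables):

    import math

    def elo_gain(score):
        """Rating difference implied by a fractional score, 0 < score < 1."""
        return -400.0 * math.log10(1.0 / score - 1.0)

    print(round(elo_gain(0.590)))  # ~63, close to the +65 above
    print(round(elo_gain(0.542)))  # ~29, close to the +30 above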
|
Oct-02-20
 | | Ron: In the position below, White is up a pawn, but because of opposite-colored bishops the position is a draw:

[diagram]

However, Stockfish 11 evaluates: +(0.60) Depth=85/89 0:09:04 1151 MN |
|
Oct-03-20
 | | AylerKupp: <Ron> Try thinking of it this way:

Evaluation between [0.00] and [+0.49]: Even chances for both sides.
Evaluation between [+0.50] and [+0.99]: White has a slight advantage.
Evaluation between [+1.00] and [+1.99]: White has a significant advantage.
Evaluation [+2.00] & up: White has a winning advantage.

The same but with negative evaluations means that the evaluations refer to Black.

So in your position it's not unreasonable for White to have a slight advantage given that he's a pawn up which, all other factors being equal, would likely earn him a [+1.00] evaluation. But all other factors are not equal; there are BOCs on the board and Black's king can get to c7 and prevent White's king from supporting his Pa6. And, of course, White's LSB cannot threaten Black's pawns on dark squares. So Stockfish 11's evaluation of [+0.60] is not unreasonable. Also consider that Stockfish's evaluations are typically slightly higher than other engines'.

For example, I had Houdini 6 and Komodo 12.3 analyze the same position with 5-piece Syzygy tablebase support. At d=36 and about 2.5 hours of analysis on my ancient 32-bit computer, Houdini 6 evaluated the position at [+0.19] for its 3 "top" moves, 1.Bd5, 1.Kd3, and 1.Be4, indicating a position with equal chances for both sides, but not necessarily a draw (although in this position that's the most likely outcome). Given the moves' identical evaluations, the order in which they are listed is just the order in which it found them in its search tree traversal.

But Komodo 12.3 evaluated the position at d=83 (!) and slightly less than 3 hours of analysis at [+0.71] for its 3 "top" moves, 1.Ba8, 1.Kd3, and 1.Kb5, again listing them in the order that it found them in its search tree. And it had evaluated those same 3 moves at [+0.92] since d=55, and [+0.96] from d=11 to d=54. Still, all those evaluations indicate that Komodo 12.3 considered that in this position White has a slight advantage, consistent with Stockfish's evaluation. And this is the fastest by far I've ever seen Komodo reach such search depths. It reached d=54 in only 29 secs, maybe even faster than Stockfish could have. And it reached d=60 in only 3 mins. I have no idea why.

Tonight I'll have Stockfish 11 and Stockfish 12 analyze the position under the same conditions and I'll post the results tomorrow. In situations like these, when I run analyses using multiple engines, I typically calculate a ratings-weighted average of the 3 engines' evaluations to try to remove some of the engines' evaluation biases. The ratings are based on the latest CCRL and CEGT engine vs. engine tournament results and are in turn the engines' average ratings at the different time controls used. So in this case Stockfish 11's evaluation would be given a greater weight than either Houdini 6's or Komodo 12.3's, and a likely more accurate absolute evaluation would be [+0.48], indicating even chances for both sides.

But the best evaluation of the position is provided by the FinalGen tablebase evaluator, which looks at <every> possible move from a position that satisfies its constraints. And FinalGen indicates that the position is a draw after every reasonable White move, only indicating a Black win after the nonsensical 1.Bc6, which after 1...Kxc6 reaches a winning position for Black (likely after a pawn's promotion to a queen) in 19 moves. |
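The ratings-weighted averaging described above amounts to a few lines of code; a sketch with illustrative ratings (the real weights would be each engine's current CCRL/CEGT average rating, which these numbers only approximate):

    def weighted_eval(evals_and_ratings):
        """Average evaluations, weighting each engine by its rating."""
        total = sum(rating for _, rating in evals_and_ratings)
        return sum(ev * rating for ev, rating in evals_and_ratings) / total

    engines = [(0.60, 3550),   # Stockfish 11, hypothetical rating
               (0.19, 3450),   # Houdini 6, hypothetical rating
               (0.71, 3480)]   # Komodo 12.3, hypothetical rating
    print(round(weighted_eval(engines), 2))  # ~0.50, near the [+0.48] above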
|
Oct-03-20
 | | AylerKupp: <Ron> You should consider upgrading to Stockfish 12; it seems to be far superior to Stockfish 11, although maybe not as superior as its developers indicate. You can see the results of my Stockfish 11 vs. Stockfish 12 100-game tournament above. And in the latest CCRL and CEGT tournaments the following are the two engines' ratings at the various time controls they use; the 2nd column is Stockfish 11's rating, the 3rd column is Stockfish 12's rating, and the 4th column is Stockfish 12's rating advantage over Stockfish 11. All ratings are for the 4-CPU version of the engines unless otherwise noted.

40/120 (CEGT) 3481 N/A(1) N/A(1)
40/20 (CEGT) 3507 3545(2) +38(3)
40/15 (CCRL) 3481 3516 +35
40/04 (CEGT) 3587 N/A(1) N/A(1)
40/02 (CCRL) 3599 3680(2) +81(3)
Note 1: Not Available. The CEGT 40/120 and 40/04 engine tournaments were completed before Stockfish 12 was released.
Note 2: For the 1-CPU version of the engine. I have no idea why a 4-CPU version of the engine was not used.
Note 3: The 1-CPU version of Stockfish 12 performed better than the 4-CPU version of Stockfish 11! So I'm sure that Stockfish 12's 4-CPU version rating advantage over Stockfish 11's 4-CPU version would be even greater. |
|
Oct-03-20
 | | Ron: <AylerKupp>
Thank you for the informative posts.
I'm wondering if there are any programs that use Monte Carlo Methods. I hypothesize that a program using Monte Carlo methods on the position would give a zero or near zero evaluation. I heard that there's "Rybka Randomizer". |
|
Oct-03-20 | | Big Pawn: <AylerKupp: Stockfish 12 was released on Sep 2, 2020. Download it from https://stockfishchess.org/download/. ...> Thank you for this informative post. |
|
Oct-04-20
 | | AylerKupp: <<Ron> I'm wondering if there are any programs that use Monte Carlo Methods.> Yes. LeelaC0 and (I believe) all the neural-network-based chess engines such as Fat Fritz, Leelenstein, Allie, Stoofvlees, Scorpio, etc. use Monte Carlo Tree Search (MCTS) instead of Minimax (MMax) + Alpha/Beta pruning + search tree pruning heuristics to select the best move from a given position. Unfortunately they all require 64-bit computers and I only have (but hopefully not for long) a 32-bit computer, so I can't run any of them.

But Komodo provides an option to use MCTS instead of MMax to select the best move to play and come up with an evaluation of its top move. Currently Komodo 14 with MMax is somewhat stronger than Komodo 14 with MCTS (43 Elo rating points), the first being rated at 3419 and the second at 3376 in the latest CCRL 40/15 engine vs. engine tournament, both ratings being for the 4-CPU versions. But the rating difference between the two for the same Komodo version is getting smaller, although not monotonically. I unfortunately can't run any Komodo version higher than 13 on my computer because of my 32-bit limitation, but tonight I will run a test case with Komodo 12.3 MMax (which is what I used in my analysis of your initial position) and Komodo 12.3 MCTS and see what results I get.

Some caveats. Chess engines using minimax have evaluation functions expressed in centipawns in the range [-128, +128] (or so; the [±128] evaluations are an artifact of the information provided by the Syzygy evaluations and depend on how the specific chess engine interprets them). Chess engines using MCTS instead evaluate each candidate move by calculating the scoring probability of each move (their documentation usually calls it the winning probability, but I don't believe that's right; it also includes the probability of drawing) in the range [0, 1]. Komodo MCTS then "estimates" what an equivalent "centipawn" evaluation would be. And the mapping between scoring probability and centipawn evaluation, at least for LeelaC0, is not a simple one (see http://chessforallages.blogspot.com... if you are curious and/or masochistic). It looks like a brace ("}") lying on its side, and it's not monotonic. So you might need to take a leap of faith when comparing evaluations from engines that use MCTS with those from engines that use minimax.

And the concept of search depth is not the same for Komodo MMax and Komodo MCTS, since the latter does not use iterative deepening, so Komodo MCTS also "estimates" what an equivalent "search depth" would be if it were using MMax.

Well, enough extraneous nonsense. I'll run the comparison tonight and report tomorrow. |
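One commonly used logistic conversion between a scoring probability and a centipawn-like number is sketched below; this is an illustration only, not the actual Komodo or LeelaC0 mapping (which, as noted above, is more complicated and not even monotonic):

    import math

    def prob_to_cp(p):
        """Convert a scoring probability in (0, 1) to pseudo-centipawns."""
        return 400.0 * math.log10(p / (1.0 - p))

    print(round(prob_to_cp(0.50)))  # 0: dead equal
    print(round(prob_to_cp(0.59)))  # ~63: a modest edge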
|