Dec-24-09
 | | alexmagnus: Some questions for the chessmetrics fanatics here: how does Sonas get from performances to ratings? Don't link me to his formulae site, I've read it. But if I understood everything correctly, a single played tournament would imply the first rating being the same as the padded performance, no? If yes, then how do you explain f.x. this?: http://db.chessmetrics.com/CM2/Play... First and only padded performance 2500, first rating 2470, and then it changes chaotically. |
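A rough sketch of what a "padded performance" might look like, in Python. The linear part (performance = average opponents' rating + (score% - 0.50) * 850) follows the linear model Sonas has described; the padding constants below are invented placeholders for illustration, not his actual numbers.

```python
# Hypothetical sketch of a "padded performance" - NOT Sonas's actual implementation.
# The linear slope of 850 points per 100% score follows his published linear model;
# N_PAD and PAD_RATING are invented placeholders.

N_PAD = 4          # hypothetical number of padding draws (placeholder)
PAD_RATING = 2300  # hypothetical rating of the padding opponent (placeholder)

def padded_performance(opponent_ratings, score):
    """Linear performance over the real games plus N_PAD fictitious draws."""
    n_games = len(opponent_ratings) + N_PAD
    total_score = score + 0.5 * N_PAD
    avg_opp = (sum(opponent_ratings) + N_PAD * PAD_RATING) / n_games
    return avg_opp + (total_score / n_games - 0.5) * 850

# Example: 9 real games, 6.5 points, against opponents averaging 2400
print(round(padded_performance([2400] * 9, 6.5)))  # -> 2500 with these invented numbers
```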
|
Dec-24-09
 | | alexmagnus: <Whitehat> The situation with IQ is similar to that with longevity - the average (if one didn't change the tests) rises, while the maximum remains constant. And, as with longevity, the average rises because there are fewer people with low IQ. Note that the initial aim of IQ was to detect mentally disabled people. What happened is that there were fewer and fewer people who would qualify as mentally disabled on the early tests, so the tests were changed on multiple occasions to fit the new distribution of intelligence. Today's IQ-130 people would do equally well on the old tests as those of years ago, i.e. they'd score 130 on an old test too. Today's IQ-100 people would score around 107 on a 50-year-old IQ test. Today's IQ-70 people would score 90-95 on an old test. |
|
Dec-26-09 | | notafrog: <alexmagnus> I don't think I qualify, but I'll still try to answer. If I understood correctly, the performance rating is for October, while the first rating is for November. By November, there are new results and the player was inactive for a month. I suspect the chaotic variation is due to additional inactivity and variation in the ratings of the player's opponents. Had this tournament been rated for this new player for October, which, using the chessmetrics methodology, it should have been, the player probably would not have had a 2500 rating, since additional games would lower the weight of the padding of the opponents' ratings. It is too late at night for me to check whether that actually is the case, or how it would affect the rating. Additionally, chessmetrics ratings are geared toward ranking the higher-rated players. Using the formula as is to rank middle-rated players is questionable at best. |
|
Dec-26-09
 | | alexmagnus: <If I understood correctly, the performance rating is for October, while the first rating is for November. By November, there are new results and the player was inactive for a month.> According to Sonas' page, the earlier games get weighted - linearly, over 4 years. That means a game played a month ago should have a weight of 47/48. Not enough of a difference from 1 to produce 30 points less. Also note, the fluctuations in the following months are nowhere near 30 points - despite the fact that all formulae applied by Sonas are linear. <Had this tournament been rated for this new player for October, which using the chessmetrics methodology it should, the player probably would not have had a 2500 rating since additional games would lower weight of the padding of the opponents' ratings.> The 2500 number <is> already a padded performance. So he only needs to weight it. For October, the weight would be 1.
<Additionally, chessmetrics ratings are geared toward ranking the higher rated players> That still doesn't answer how he comes to the ratings. I had the same lack of success trying to reconstruct the ratings of the world champions. <I suspect, the chaotic variation is due to additional inactivity and variation in the rating of the player's opponents.> Inactivity is the argument which gives room for weighting - but then the fluctuation would not be chaotic. It should be either a monotonous fall or a rise followed by a fall. Opponents' later ratings don't matter - only their October ratings matter... So the mystery remains... |
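A toy sketch of the linear age-weighting being discussed (only the 48-month window and the 47/48 figure come from the post; everything else is illustrative). It also shows why the weight alone can't explain the drop: for a single rated tournament the factor cancels out.

```python
# Illustrative only - a linear age weight falling from 1 (this month) to 0 (48 months old).

def age_weight(months_old):
    """Linear decay over a 4-year (48-month) window."""
    return max(0.0, (48 - months_old) / 48)

def weighted_average_performance(tournaments):
    """Weighted average of monthly padded performances.

    `tournaments` is a list of (months_old, padded_performance, num_games) tuples.
    """
    num = sum(age_weight(m) * g * p for m, p, g in tournaments)
    den = sum(age_weight(m) * g for m, p, g in tournaments)
    return num / den if den else None

# With a single rated tournament, the 47/48 weight cancels out entirely,
# so age-weighting alone cannot turn a 2500 performance into a 2470 rating:
print(round(weighted_average_performance([(1, 2500, 9)])))  # -> 2500
```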
|
Dec-27-09 | | notafrog: I explained what I seem to remember from those same pages. I may be wrong. I have played around in the past with the same methodology, though I never actually verified Sonas's ratings - the problem is the lack of input data. Sonas may follow a different calculation method, may have made a mistake, or the 30-point variation is correct. Did you check whether such 30-point variations are common? One point on which I am almost certain you are wrong is: <only their October ratings matter>. I think the games are weighted and rated using the ratings of the current list, not those of the list at the time of the game. In that sense, each list is calculated independently of all other lists. I have no idea if that can explain the data. |
|
Dec-27-09
 | | alexmagnus: <I think the games are weighted and rated using the rating of the list, and not the list of the time of the game.> That would make no sense, as the games have (even though diminishing) influence <four years ahead>. After 4 years the ratings are totally different, and it would be somewhat senseless to rate a tournament played 4 years ago with the ratings of today... Especially with the tournament being a youth championship... <Did you check if such 30 point variations are common?> Some players get their first rating only some 5-6 months after their first tournament (with no tournaments in between!), which is also unexplained. This player plays three tournaments in January and gets a rating in May: http://db.chessmetrics.com/CM2/Play.... For this one the ratings are not shown at all, but we see that his highest rating is <86> points below his initial performance: http://db.chessmetrics.com/CM2/Play... And for this player the ratings are not shown either, but his highest rating is 30 points <higher> than his initial performance: http://db.chessmetrics.com/CM2/Play... |
|
Dec-27-09
 | | alexmagnus: Sorry, the link for the second player should be of course http://db.chessmetrics.com/CM2/Play... |
|
Dec-28-09 | | notafrog: Since it is difficult to collect the source data, I obviously cannot check the examples you mentioned. It is possible that the calculations are wrong. Another possible reason for a delay in ratings entering the system is the criterion of how "connected" a player is to the top players. There is obviously no way to check these hypotheses without reconstructing the input data. I once started collecting data from the site, though I don't remember where the data are, or in what state. |
|
Dec-28-09
 | | alexmagnus: I've got an answer from Sonas today. He explains the 30-point change with the effects of simultaneous calculation. But that doesn't explain why there are no large changes in the following months. |
|
Dec-29-09 | | notafrog: You can try asking Sonas for the game result database. Then we can check if the 30 point difference is correct, and why it appears. |
|
Feb-08-10
 | | alexmagnus: Those who explain the rising Elo numbers with inflation - how do you explain the rising results in memory sports? Those rise even more rapidly and are <absolute>. Look at these records; all of them are quite new: http://web.aanet.com.au/memorysport...
The memory championships have existed since the early 90s. |
|
Feb-08-10 | | Tomlinsky: I started to explain... but can't be arsed. Sorry. :) |
|
Apr-13-10 | | RainPiper: How large is the white-against-black advantage expressed in Elo points? Put otherwise: if I play White against a higher-rated player, what is the rating difference that gives me a winning probability of precisely 50%? This has certainly been calculated somewhere, but I haven't been able to find it yet. |
|
Apr-13-10 | | whatthefat: <RainPiper>
In my experience looking at the data for super-GMs, it's circa 50 points, although with big differences between individual players. I would also expect it to depend strongly on playing strength. It's also been shown to depend on the time control. Jeff Sonas seems to imply that it's about 35 points here http://www.chessbase.com/newsdetail... Meanwhile, NIC and cg.com both have White scoring about 55% on average - see http://www.newinchess.com/Yearbook/... and ChessGames.com Statistics Page - which also works out to a 35 point rating difference by the Elo formula. |
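For reference, both figures follow from the standard Elo expected-score formula; a quick sketch (the 55% and 50-point inputs are the ones quoted above):

```python
import math

# Standard Elo logistic conversions:
# expected score E = 1 / (1 + 10**(-D/400)), inverted as D = 400 * log10(E / (1 - E)).

def expected_score(rating_diff):
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))

def rating_diff_for_score(score):
    return 400.0 * math.log10(score / (1.0 - score))

print(round(rating_diff_for_score(0.55)))  # 55% for White ~ 35 Elo points
print(round(100 * expected_score(50)))     # a 50-point edge ~ 57% expected score
```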
|
Apr-14-10 | | RainPiper: Thanks for the links, <whatthefat>, this was exactly the sort of information I was looking for. It also answers a follow-up question that I had in the back of my mind. Is there discussion about correcting performance ratings for the white/black bias? (Jeff Sonas actually advocates this in the text you linked.) In tournaments for individuals, this is not much of an issue (the number of games with White and with Black will hardly ever differ by more than one). However, in team events, the white/black ratio can be strongly biased. E.g. at the 2008 Olympiad, Grischuk had White six times and Black only twice: http://chess-results.com/tnr16314.a... |
|
Nov-07-10 | | whiteshark: Quote of the Day
" The process of rating players can be compared to the measurement of the position of a cork bobbing up and down on the surface of agitated water with a yard stick tied to a rope and which is swaying in the wind. " -- Arpad Elo |
|
Nov-07-10 | | prensdepens: What is this? No picture for the gentleman who came up with the system by which the strength of chess players could be calculated, and which is now the norm? Can we have his picture please on his page, to honor The Man for his significant contribution to chess? |
|
Feb-22-11
 | | cu8sfan: Elo is being challenged by data miners: http://www.kaggle.com/ChessRatings2.... |
|
Feb-22-11
 | | alexmagnus: Well, as already said multiple times, Elo isn't designed to predict results; it is designed to describe results. Unlike f.x. Chessmetrics (which is a better predictor, with some funny consequences for description). |
|
Feb-22-11 | | Akavall: <Well, as already said multiple times, Elo isn't designed to predict results, it is designed to describe results. > Yes. And the two seem to be quite different. For example, I think a good prediction method should be very sensitive to a player's current form, which is generally pretty volatile. For example, player A is rated 2650; he starts a tournament poorly, so the prediction algorithm should predict his results as if "A" were a considerably weaker player (let's say 2500). Therefore, the algorithm would have that player's rating at 2500 for that tournament (unless the player turns his performance around). However, for the next tournament the algorithm should treat the player as somewhere closer to 2600, and then adjust to his performance during the tournament. The algorithm would probably have done pretty well if it had treated Grischuk as a 2550-2600 player during Tata Steel, but it should weight him higher than that for the next event. This would lead to wild fluctuations in the rating that the algorithm assigns to the player. |
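A toy illustration (not any published system) of the kind of form-sensitive predictor described here: blend a slow baseline rating with the most recent tournament performances, the newest counting most. All constants are invented.

```python
# Invented example of a form-sensitive estimate - not an actual rating system.

def form_adjusted_rating(baseline, recent_tprs, alpha=0.5):
    """Exponentially weighted blend of a baseline rating and recent TPRs."""
    estimate = baseline
    for tpr in recent_tprs:          # oldest to newest
        estimate = (1 - alpha) * estimate + alpha * tpr
    return estimate

# A 2650-rated player performing around 2500 so far in the event would be
# predicted considerably weaker (about 2525 here) for the remaining rounds:
print(round(form_adjusted_rating(2650, [2500, 2480, 2520])))
# Such an estimate fluctuates far more than an Elo rating would.
```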
|
Feb-22-11 | | mojonera: the Harkness system was better. |
|
Feb-22-11 | | TheFocus: <mojonera> <harkness system was better.> And wasn't it around before Elo's system? |
|
Feb-23-11
 | | alexmagnus: <Akavall> Yes, a perfect predictor would be about as volatile as the TPRs are, while a perfect descriptor would remain about constant most of the time and change heavily only if the improvement/decline is clear. I don't know if Elo is perfect in these terms (how is it even possible to test the "descriptive power"?), but it clearly does a better job than any system built on the basis of "best predictive power". |
|
Apr-04-11
 | | alexmagnus: In the chessmetrics system it's a known phenomenon that a player's highest rating is sometimes higher than his highest performance - which is "philosophically" a result of the predictive (and not descriptive) nature of chessmetrics, and mathematically a consequence of weighting+padding. The differences are usually small in such cases, but what if we take only performances <prior to the achievement of the rating peak>? The record among the 3-year-average top 100 is held by Neumann, whose highest CM rating is 149 (!) points higher than his highest CM performance prior to reaching that rating (and 53 points higher than his highest-ever CM performance). Second is Lasker, with a 76-point difference. Talking about the sense and nonsense of "padding" and "predictive power"... |
|
Apr-08-11 | | drik: <metatron2: also each trial in the binomial dist has only two possible outcomes, while each chess game has 3 possible outcomes, meaning that I also ignored all the draws>
I was looking at it as p=win & q=not win.
<and that Fide's <Elo> is also based on the normal distribution (they don't use logistic curves)> True, but I think that USCF uses logistic distributions because Gaussians underestimate the rate of upsets at large rating differences. FIDE ratings attempt to make the Gaussians heavy-tailed by having a cutoff threshold, so that the tail never asymptotically approaches zero. But this causes distortions that are worse than the supposed problem. http://www.chessbase.com/newsdetail... |
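A quick numerical comparison of the two models being contrasted here. The 2000/7 sigma is the value commonly quoted for the Elo/FIDE normal-distribution table; treat it as an assumption rather than the official constant.

```python
from math import erf, sqrt

# At large rating differences the normal-curve expected score drops off much
# faster than the logistic one, i.e. the Gaussian model predicts fewer upsets.

def logistic_expected(diff):
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))

def normal_expected(diff, sigma=2000.0 / 7.0):
    # Standard normal CDF evaluated at diff / sigma
    return 0.5 * (1.0 + erf(diff / (sigma * sqrt(2.0))))

for d in (200, 400, 600, 800):
    upset_logistic = 1.0 - logistic_expected(d)   # lower-rated player's expected score
    upset_normal = 1.0 - normal_expected(d)
    print(d, round(upset_logistic, 4), round(upset_normal, 4))
```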
|