*In this post I summarize some correlations between ratings, complexity and components of board games. These were extracted by analyzing data from 350000 games in BoardGameGeek’s database. If anyone is interested in seeing more details (e.g. standard deviations, p-values, etc…), feel free to leave a comment.*

## Do heavier games get higher ratings?

The short answer is yes. I’ve always suspected that BoardGameGeek’s rating system is slightly biased towards heavier games, meaning that users tend to give higher ratings to the heavier, “gamer’s games”, and consequently lower ratings to lighter family and party games. I looked at data from 350 thousand games in BGG’s database, then filtered out games with fewer than 100 rating votes and games without a weight rating. Below is a two-dimensional histogram, or heatmap, of the weights and the ratings.

We can see an upward trend quite clearly. Furthermore, if we perform a linear regression on this data, we get a coefficient of determination (R²) of 0.36, meaning that 36% of the variance in ratings is explained by weight. It seems reasonable to expect other factors at play, which would explain the remaining 64%. The slope of the regression line is 0.66, which is how much the rating goes up, on average, for every unit of weight.
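For readers who want to reproduce this kind of fit, here is a minimal sketch of an ordinary least squares regression and its R², written in plain Python. The function name `fit_line` and the four data points are made up for illustration; they are not the BGG data, and this is not necessarily how the numbers in this post were computed.

```python
def fit_line(xs, ys):
    """Least squares fit of ys = slope * xs + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x, and the cross term between x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up (weight, rating) pairs, NOT real BGG data.
weights = [1.0, 2.0, 3.0, 4.0]
ratings = [6.0, 7.0, 7.0, 8.0]
slope, intercept, r_squared = fit_line(weights, ratings)
print(slope, intercept, r_squared)
```

On the real data set, `slope` would play the role of the 0.66 quoted above and `r_squared` the role of the 0.36.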

We can still see this fairly linear relation even if we restrict ourselves to specific designers. I looked at the main games (as listed by BGG) of four prominent designers: Uwe Rosenberg, Stefan Feld, Jamey Stegmaier, and Vital Lacerda:

In these four cases something very interesting happened: the coefficient of determination increased significantly, almost doubling in most cases, meaning that, once we restrict to a specific designer, the weight plays a much greater part in a game’s rating. This makes sense, since we’re limiting some other factors that might influence the quality of the games.

But are users giving higher ratings to heavier games just because they’re heavier, or does the weight actually make the game better? It’s impossible to say for sure just by looking at this data. I’m of the opinion that complexity for complexity’s sake does not add quality to a game, yet it’s not unthinkable that heavier games might yield a more involved gaming experience, which can be seen as enhancing their overall enjoyment. But then I look at fairly light yet deep games such as The Resistance: Avalon. There are many such games which, as far as rules go, are incredibly light but offer highly complex game states for the players to untangle and usually involve a non-trivial amount of emotional investment – and yet these tend to underperform in terms of user ratings.

Now here’s a thought experiment: we could try to see what would happen to a game’s rating if we were to subtract the influence of its weight. By doing this we could see which is the best “pound for pound” game, much like the Wilks coefficient in powerlifting, which factors out the influence of body mass from each athlete’s performance. If we did this using our linear regression, given a game with rating *r* and weight *w*, its weightless rating *r’* is given by the formula

*r’ = r – 0.66w*
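The adjustment above amounts to a one-line function. The sketch below applies it to two invented games to show how the ranking can flip; the game names, ratings, and weights are hypothetical placeholders, not values taken from BGG.

```python
SLOPE = 0.66  # slope of the regression line reported in this post


def weightless_rating(rating, weight, slope=SLOPE):
    """Subtract the part of the rating that the linear fit attributes to weight."""
    return rating - slope * weight


# Hypothetical (name, rating, weight) entries, for illustration only.
games = [
    ("Light party game", 7.2, 1.4),
    ("Heavy campaign game", 8.6, 3.8),
]

# Rank by weightless rating, best first.
ranked = sorted(games, key=lambda g: weightless_rating(g[1], g[2]), reverse=True)
for name, rating, weight in ranked:
    print(name, round(weightless_rating(rating, weight), 3))
```

With these made-up numbers the lighter game comes out on top even though its raw rating is lower, which is the same kind of reversal as in the result below.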

Using the above equation we arrive at the unsettling result of One Night Ultimate Werewolf having a higher “weightless” rating than Gloomhaven…

## Do games with miniatures get higher ratings?

Again, the short answer is yes, but there’s nuance.

Analyzing the impact of miniatures on ratings is fairly straightforward since this variable is binary, i.e. we have two disjoint classes: games with miniatures and games without miniatures. Looking at the same 350 thousand games and, again, filtering out those with fewer than 100 ratings, we see that the average (mean) rating of games with miniatures is 7.5 (std 0.7), which is significantly higher than that of games without miniatures, which is 6.8 (std 0.9).

Here’s a density histogram for the ratings of each class.

Now, it could be the case that this has nothing to do with plastic components, and what’s actually happening is that games with miniatures tend to be heavier, and as we have seen before, heavier games get higher ratings. So let’s control for weight and look only at games with weight between 2 and 3. There, the average (mean and median) weight for games with miniatures and for games without miniatures is 2.6 and 2.4, respectively, but the mean ratings are 7.4 and 7.0. Interestingly, in other weight brackets the difference is smaller, but still statistically significant (the worst case had p < 0.01 on a Kolmogorov-Smirnov test).
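Controlling for weight in this way boils down to grouping games by weight bracket and miniatures flag before averaging. Here is a minimal sketch of that grouping; the function name, the tuple layout, and the sample games are assumptions made for illustration, and how the original analysis treated games sitting exactly on a bracket boundary is not stated, so the rule used here (whole-number weights go into the bracket above, 5.0 into the top one) is a guess.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_by_bracket(games):
    """Group (weight, rating, has_miniatures) tuples and average the ratings.

    Returns a dict keyed by (bracket_low, has_miniatures), where
    bracket_low = 2 stands for the "2 to 3" weight bracket.
    """
    groups = defaultdict(list)
    for weight, rating, has_minis in games:
        # Assumed boundary rule: 2.0 falls in "2 to 3"; 5.0 falls in "4 to 5".
        bracket_low = min(int(weight), 4)
        groups[(bracket_low, has_minis)].append(rating)
    return {key: mean(values) for key, values in groups.items()}


# Made-up sample, NOT real BGG data.
sample = [
    (2.6, 7.5, True),
    (2.4, 7.3, True),
    (2.2, 7.0, False),
    (2.8, 7.2, False),
    (3.5, 7.8, True),
]
averages = mean_rating_by_bracket(sample)
print(averages)
```

Comparing `averages[(2, True)]` with `averages[(2, False)]` gives the within-bracket gap, which is the quantity discussed above.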

Here’s a table that gives us the average ratings for each weight bracket.

| weight class | miniatures | rating mean | rating median | rating stdev | votes |
|---|---|---|---|---|---|
| 1 to 2 | yes | 6.6 | 6.5 | 0.9 | 100 |
| 1 to 2 | no | 6.3 | 6.3 | 0.8 | 6814 |
| 2 to 3 | yes | 7.4 | 7.5 | 0.6 | 667 |
| 2 to 3 | no | 7.0 | 7.1 | 0.7 | 6422 |
| 3 to 4 | yes | 7.7 | 7.8 | 0.7 | 410 |
| 3 to 4 | no | 7.5 | 7.6 | 0.7 | 2515 |
| 4 to 5 | yes | 8.1 | 8.1 | 0.8 | 65 |
| 4 to 5 | no | 7.9 | 8.0 | 0.6 | 353 |

If you’d rather look at an image that pretty much tells the same story, here are side-by-side heatmaps so we can compare the relationship of weight and rating in games with miniatures and in games without them.

Now, I’m not too bothered by this result, since, if I were given the option of playing the same game either with miniatures or without miniatures, I’d choose the first option, as I believe would most people. But I feel it’s useful to know that, comparing two different medium-light weight games with the same rating, if one of them has miniatures, then you know where roughly half a rating point might be coming from.

## Conclusion

It does seem like heavier games get higher ratings, as do games with miniatures, yet what is actually *causing* the rating increase is still up to speculation, since what we’ve witnessed are just correlations.

I feel I should end this post asking the reader to think about what BGG ratings even mean. Are people voting based on enjoyment of the play session, or are they evaluating the game design and the *ideas* behind the game? Or maybe some are just quantifying how good the contents of the box look before even playing it. And in any case, maybe the true meaning of the ratings isn’t *how* but *why* they are created in the first place, and the way we as a community use them to make decisions. But that’s a topic for another post.