How to Choose a Board Game Like a Data Analyst
Ever wondered about the ideal number of board games to own? It often seems like the answer is always one more than what you have. But which game should be your next choice?
One common method is to dive into YouTube rabbit holes, watching videos like "Best Games for 6 Players." However, these sources can sometimes offer subjective opinions that may not align with everyone's taste.
Another approach is to explore the vast ocean of boardgamegeek.com, sifting through their ranked lists. This option has the advantage of being based on a wider pool of opinions, but it has its limitations. The filtering options are somewhat restricted; you can sort games by type (like Strategy or Family) or by category (such as Horror, Farming, Adventure), but it's challenging to filter by the number of players or game complexity simultaneously.
For a Data Analyst, this is not just a problem - it's an opportunity.
Imagine a dashboard that allows you to select your ideal board game based on all your specific criteria. You want a game for 6 players, with medium complexity? Just input these requirements, and voilà! A list of games tailored just for you, combining the wisdom of the crowd with the precision of data analytics.
Step 1: Gathering Data
Our quest to find the perfect board game begins with gathering data from Board Game Geek. Although the site has an API, I chose to scrape its web version for a more comprehensive dataset.
We start with BGG's massive list of about 150,000 games. To ensure we're working with quality data, I focused on the 25,000 games that have ratings. To further refine the data for accuracy, I selected around 6,000 games that meet specific criteria: each game has at least 50 reviews, was released after 1934 (Hello, Monopoly!), and has complete information on important aspects like complexity and the number of players.
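The actual scraping and cleaning code is on GitHub and isn't reproduced here. As a rough illustration only, the selection rules above could be expressed as a filter like the sketch below. The `Game` record, its field names, and treating "reviews" as the number of user ratings are assumptions, and the sample records are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Game:
    # Hypothetical record layout; not the author's actual schema.
    name: str
    year: Optional[int]
    users_rated: int  # assumed to stand in for "reviews"
    complexity: Optional[float]
    min_players: Optional[int]
    max_players: Optional[int]


def passes_filters(game: Game, min_ratings: int = 50, after_year: int = 1934) -> bool:
    """Apply the selection rules: enough ratings, released after 1934, key fields present."""
    if game.users_rated < min_ratings:
        return False
    if game.year is None or game.year <= after_year:
        return False
    # Require complete information on complexity and player counts.
    return all(v is not None for v in (game.complexity, game.min_players, game.max_players))


def clean(games: list[Game]) -> list[Game]:
    return [g for g in games if passes_filters(g)]


# Made-up records for illustration, not real BGG data.
sample = [
    Game("Game A", 1935, 120, 1.7, 2, 6),   # kept: 1935 is after 1934
    Game("Game B", 2021, 12, 2.0, 1, 4),    # dropped: too few ratings
    Game("Game C", 1900, 500, 1.2, 2, 2),   # dropped: released too early
    Game("Game D", 2015, 300, None, 2, 5),  # dropped: complexity missing
]
print([g.name for g in clean(sample)])  # ['Game A']
```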
For those who are curious about the technical aspects, you can find the code for this data scraping, the initial processing and cleaning of the data, as well as the final dataset on my GitHub page.
Step 2: Data Exploration
In our dataset, we have several key metrics:
- Users Rated: Number of users who have rated the game.
- Average Rating: The sum of all user ratings divided by the number of users who rated the game.
- Geek Rating: The BoardGameGeek Rating adjusts the Average Rating by adding artificial "dummy" votes to prevent games with few votes from ranking too high. These dummy votes are believed to number around 100, each valued at 5.5. This method, known as "Bayesian averaging," pulls ratings toward the middle of the scale and affects games with fewer votes more strongly. The exact algorithm is kept secret to prevent manipulation.
- Owned: Number of users who own the game.
- Plays: Total recorded plays of the game.
- Plays/Month: Number of plays in the last month.
- Complexity: A community-assessed rating of how complex or challenging a game is to understand.
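To make the Average Rating and Geek Rating definitions concrete, here is a minimal sketch of a plain mean next to a Bayesian average. The dummy-vote count (100) and value (5.5) are the commonly cited guesses mentioned above, not BGG's confirmed parameters, and since the real Geek Rating algorithm is secret, this only approximates it.

```python
def average_rating(ratings: list[float]) -> float:
    """Average Rating: sum of all user ratings divided by the number of raters."""
    return sum(ratings) / len(ratings)


def bayesian_average(avg: float, n_votes: int,
                     dummy_votes: int = 100, dummy_value: float = 5.5) -> float:
    """Approximate Geek Rating: mix the real votes with artificial dummy votes.

    The defaults (100 votes at 5.5) are assumed, not BGG's published values.
    """
    return (avg * n_votes + dummy_value * dummy_votes) / (n_votes + dummy_votes)


print(round(average_rating([7, 8, 9]), 2))      # 8.0
print(round(bayesian_average(9.0, 20), 2))      # 6.08: few votes, pulled hard toward 5.5
print(round(bayesian_average(8.0, 10_000), 2))  # 7.98: many votes, barely moved
```

With these assumed parameters, a game averaging 9.0 from only 20 ratings drops to about 6.08, while one averaging 8.0 from 10,000 ratings stays near 7.98, so the well-rated popular game ranks higher.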
Our task is to determine the most useful metric for finding the best board game. We'll explore the distributions of these metrics and look for insights.
Now, the critical question: Which metric best evaluates a game's quality? Should we lean on the site's Geek Rating, or is the Average Rating more reliable? To unravel this, we'll compare how these ratings relate to each other.
To earn a high Geek Rating, popularity is key. However, many excellent games may not gain popularity due to limited marketing or high costs, leading to fewer owners and ratings. Identifying such hidden gems could be intriguing. But for most players, a game's popularity matters for two reasons: availability, preferably in local shops, and the likelihood of finding others to play with who already know the rules. Therefore, while Geek Rating will be our primary metric, we'll also consider the unadjusted Average Rating for a comprehensive view.
Now, let's dive into which metrics best indicate a game's popularity and explore how these metrics are interconnected.
We observe that the number of ratings (Users Rated) and the number of owners (Owned) follow very similar patterns, though ownership figures are higher. Owning a game appears to be a stronger indicator of its popularity than merely rating it.
The Total Plays metric puts card games (Dominion, Race for the Galaxy, Magic: The Gathering) at the forefront, which points to a genre-specific trend rather than overall popularity. This makes it a poor fit for broad comparisons. Plays per Month, however, provides a fascinating snapshot of what is popular right now, making it a valuable metric for understanding current trends.
Based on this analysis, Owned will be our primary metric for assessing popularity, with the other metrics providing additional context.
So, what about Complexity?
From here, I narrowed the dataset to games with more than 300 reviews (4,520 games). I chose this threshold because these games tend to have more reliable Complexity data; Complexity is generally filled in by fewer users than the game rating.
We observe a trend where more complex games often receive higher ratings. However, this data comes from a site whose name speaks for itself, so its users are not a random, independent sample of players. We therefore can't say for sure that this trend holds for the entire population.
Despite this, complexity remains a crucial factor in choosing a game, and it's the only metric we have for gauging this aspect.
Finally, let's explore the average Complexity and Geek Rating across different game Types to gain further insights.
It's worth noting that a game can fall into multiple types, meaning it can be counted more than once in our analysis. This overlap reflects the multifaceted nature of board games, where a single game might blend elements from various genres or categories.
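Because one game can belong to several types, a per-type average has to count that game once in each of its types. Here is a minimal sketch of that counting rule; the record layout, field names, and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: each game may belong to several types.
games = [
    {"name": "Game A", "types": ["Strategy"], "complexity": 3.8, "geek_rating": 7.9},
    {"name": "Game B", "types": ["Family", "Party"], "complexity": 1.3, "geek_rating": 6.6},
    {"name": "Game C", "types": ["Strategy", "Thematic"], "complexity": 3.2, "geek_rating": 7.5},
    {"name": "Game D", "types": ["Party"], "complexity": 1.1, "geek_rating": 6.2},
]


def averages_by_type(games):
    """Average Complexity and Geek Rating per type.

    A multi-type game is counted once in each of its types, so the per-type
    counts can add up to more than the number of games.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # type -> [sum complexity, sum rating, count]
    for game in games:
        for game_type in game["types"]:
            bucket = totals[game_type]
            bucket[0] += game["complexity"]
            bucket[1] += game["geek_rating"]
            bucket[2] += 1
    return {
        t: {"complexity": c / n, "geek_rating": g / n, "games": n}
        for t, (c, g, n) in totals.items()
    }


for game_type, stats in sorted(averages_by_type(games).items()):
    print(game_type, round(stats["complexity"], 2), round(stats["geek_rating"], 2), stats["games"])
```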
It's clear that children's, family, and party games are generally designed to be straightforward and easy to understand. Considering our earlier observation that more complex games tend to receive higher ratings, it's not surprising to see that these types of games usually have lower ratings.
This underscores the importance of selecting a game type that aligns with our preferences. For instance, a great party game might not rank as high as a strategy game in terms of Geek Rating, largely due to the specific preferences of the BoardGameGeek community.
Step 3: Dashboard
Now, it's time to assemble our dashboard. It will feature a graph and a table showcasing the top 10 board games.
Main Features:
- Primary Metric: The Geek Rating is our main criterion for game selection. Games will be ranked by this rating by default, but users can also sort them by Average Rating. The X-axis of the graph will represent these ratings.
- Popularity Indicators: The Y-axis will reflect the game's popularity, indicated by the number of users who own the game (Owned). A secondary measure of popularity is the number of users who rated the game (Users Rated).
- Complexity Visualization: The size of each game's figure on the graph will show its Complexity. This can be switched to represent Plays or Plays/Month.
- Visual Differentiation: Game Types will be distinguished by varying shapes and colors of the figures.
Filters:
- Game Type and Complexity: As discussed, these are essential filters.
- Number of Players: Options include the publisher's stated player count, plus the recommended and best player counts based on BGG community feedback.
- Categories and Mechanics: Though not analyzed in-depth, these will be included for user convenience.
- Users Rated: Particularly useful when sorting by Average Rating to discover less-known gems.
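The dashboard's own filtering and ranking logic isn't published here, and the tool used to build it isn't named, so the following is only a plain-Python sketch of how the features and filters above could work together. The `Game` fields, the `player_mode` values ("provided", "recommended", "best"), and the sample games are all hypothetical.

```python
from dataclasses import dataclass

VALID_SORTS = {"geek_rating", "average_rating"}


@dataclass
class Game:
    # Hypothetical fields mirroring the dashboard's metrics.
    name: str
    types: list[str]
    complexity: float
    geek_rating: float
    average_rating: float
    owned: int  # Y-axis popularity; not used for filtering here
    users_rated: int
    # Player counts keyed by mode: "provided", "recommended", "best".
    player_counts: dict[str, set[int]]


def top_games(games, players=None, player_mode="best", max_complexity=None,
              game_type=None, min_users_rated=0, sort_by="geek_rating", limit=10):
    """Filter games, then rank by Geek Rating (default) or Average Rating."""
    if sort_by not in VALID_SORTS:
        raise ValueError(f"sort_by must be one of {sorted(VALID_SORTS)}")
    selected = []
    for g in games:
        if players is not None and players not in g.player_counts.get(player_mode, set()):
            continue
        if max_complexity is not None and g.complexity >= max_complexity:
            continue  # "below" the threshold, so equal values are excluded
        if game_type is not None and game_type not in g.types:
            continue
        if g.users_rated < min_users_rated:
            continue
        selected.append(g)
    selected.sort(key=lambda g: getattr(g, sort_by), reverse=True)
    return selected[:limit]


# Made-up games for illustration only.
games = [
    Game("Game A", ["Family"], 2.2, 7.6, 7.9, 40000, 25000,
         {"provided": {1, 2, 3, 4, 5, 6}, "recommended": {2, 3, 4, 5, 6}, "best": {5, 6}}),
    Game("Game B", ["Strategy"], 3.9, 7.9, 8.3, 60000, 35000,
         {"provided": {2, 3, 4, 5, 6}, "recommended": {3, 4, 5, 6}, "best": {4, 5, 6}}),
    Game("Game C", ["Party"], 1.3, 7.2, 7.4, 90000, 50000,
         {"provided": set(range(2, 9)), "recommended": set(range(4, 9)), "best": {6, 7, 8}}),
    Game("Game D", ["Family"], 1.9, 7.0, 8.5, 800, 400,
         {"provided": {2, 3, 4}, "recommended": {3, 4}, "best": {4}}),
]

picks = top_games(games, players=6, player_mode="best", max_complexity=3.5)
print([g.name for g in picks])  # ['Game A', 'Game C']
```

Here a search for games best at 6 players with Complexity below 3.5 drops Game B (too complex) and Game D (not suited to 6), then ranks the rest by Geek Rating. Passing `sort_by="average_rating"` together with a `min_users_rated` floor is one way to surface lesser-known games without letting a handful of ratings dominate.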
It's time to test the dashboard!
My criteria were: best for 6 players and not too difficult (Complexity below 3.5). The Type of game didn't matter. After setting the filters, I explored the top ten games on the list, skipping Codenames, which I already own. After much consideration, I chose Heat: Pedal to the Metal, the highest-ranked game that met my criteria.
To my delight, and much to my wife's satisfaction, the game was in my hands by the evening.