The Olympic Medal Table Visualized Gapminder Style
Abstract
Following Hans Rosling’s Gapminder animation style, we visualize the total number of medals a country wins at each Olympic Summer Games in relation to the country’s gross domestic product (GDP) per capita. We illustrate how R’s data wrangling capabilities provide a useful toolbox to make such an analysis happen.
This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License. The
markdown+Rknitr source code of this blog is available under a GNU General Public
License (GPL v3) from GitHub.
Introduction
Long Swedish winter nights are best spent watching Hans Rosling’s
inspiring TED
talks. Such visualizations help the statistician make points about
temporal trends in an x-axis to y-axis relationship, which otherwise
might drown in modelling details. Recently, I stumbled over a blog post on
how to use the gganimate
R
package to animate the Gapminder data available from the
gapminder
package. To perform a similar
Rosling-style animation, consider the following: Today, the
Olympic Summer Games in Rio de Janeiro end. As usual, this spawns a
debate about whether the nation’s participation has been successful. For this
purpose the olympic medal
table is often taken as a basis for comparisons, e.g., to mock your neighbouring
countries. Recent analyses and visualizations have looked at
how to correct these tables for, e.g., population size or, more
interestingly, how to analyse the influence of GDP. For example:
- Google provides alternative Olympics medal tables
- Time Magazine discusses whether it is fair to rank countries by medals achieved alone
The aim of the present blog note is to visualize how countries perform in the medal table in relation to their GDP per capita. From a technical viewpoint we experiment with using R to scrape the olympic medal tables from Wikipedia and animate the results Gapminder style. Disclaimer: We only show the potential of such an analysis and, hence, worry less about the scientific validity of the analysis.
Data
We use the data of Gapminder in order to obtain country specific population and GDP per capita data for each of the years in the period of 1960-2016. The olympic medal tables are ‘harvested’ from Wikipedia.
Olympic medal tables
Olympic medal tables were extracted using the rvest
package from the corresponding Wikipedia pages, using the
table-extraction code described in the post by Cory
Nissen. The Wikipedia tables contain the current state of the medal
table and hence take changes in the medal distribution, e.g. medals stripped
due to doping, into account. For details on such a table, see for
example the medal
table of the 2012 summer games in London. In order to stay focused,
we hide the scraping functionality in the function
scrape_medaltab;
see the code on GitHub for more
details.
# Years which had olympic games
olympic_years <- seq(1960, 2016, by=4)
# Extract olympic medal table from all olympic years since 1960
medals <- bind_rows(lapply(olympic_years, scrape_medaltab))
# Show result
DT::datatable(medals)
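Since the actual scraper is hidden in scrape_medaltab, the following is only a minimal sketch of how a medal table can be pulled out of HTML with rvest. It is not the author's implementation: the function name parse_medal_table, the toy HTML snippet, the nation names, the medal counts and the column name NOC are all assumptions made for illustration, and no Wikipedia page is contacted.

```r
library(rvest)
library(dplyr)

# Toy HTML standing in for a Wikipedia medal table (made-up nations and counts)
toy_html <- '<table class="wikitable">
<tr><th>Rank</th><th>NOC</th><th>Gold</th><th>Silver</th><th>Bronze</th><th>Total</th></tr>
<tr><td>1</td><td>Nation A</td><td>3</td><td>2</td><td>1</td><td>6</td></tr>
<tr><td>2</td><td>Nation B</td><td>1</td><td>0</td><td>2</td><td>3</td></tr>
</table>'

# Hypothetical helper: parse the first wikitable and tag it with the year
parse_medal_table <- function(html, year) {
  read_html(html) %>%
    html_element("table.wikitable") %>%   # pick the first matching table
    html_table() %>%                      # convert to a tibble, header row from <th>
    transmute(Year = year, Nation = NOC, Gold, Silver, Bronze, Total)
}

toy_medals <- parse_medal_table(toy_html, 2016)
print(toy_medals)
```

A real scraper additionally has to cope with footnote markers, totals rows and column names that differ between the individual Wikipedia pages.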
Gapminder data
We obtain GDP per capita and population data from Gapminder. Unfortunately,
these need to be fetched and merged manually. A more convenient way
would have been to take these directly from the package gapminder,
but newer GDP data
are now available. Again, we hide the details of the data wrangling
activities and refer to the GitHub code.
For convenience, we also extract the corresponding continent each
country belongs to. This can be done conveniently by comparing with the
gapminder
dataset (see code for details).
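As a rough sketch of the continent lookup (not the code from the GitHub repository), one can build a country-to-continent table from the gapminder dataset and join it onto the nation names. The object names continent_lookup and toy_nations are assumptions for illustration.

```r
library(dplyr)
library(gapminder)

# One row per country with its continent, taken from the gapminder dataset
continent_lookup <- gapminder %>%
  distinct(country, continent) %>%
  mutate(country = as.character(country),
         continent = as.character(continent))

# Toy vector of nation names as they might appear in a medal table
toy_nations <- tibble(Nation = c("Sweden", "Denmark", "Jamaica"))

# Nations whose spelling differs from gapminder would get NA here
toy_continents <- toy_nations %>%
  left_join(continent_lookup, by = c("Nation" = "country"))
print(toy_continents)
```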
Joining the two data sources
In principle, all that is left to do is to join the two data sources using the country name of the gapminder dataset and the nation names of the olympic medal tables. However, a challenge of the present country-based analysis is how to incorporate the many political changes which happened during the analysis period. As an example, East Germany participated as an independent National Olympic Committee during 1968-1988, but the gapminder data only contain GDP data for Germany as a whole. We therefore aggregate the results of the two countries for the analysis. A further important change is the split of the former Soviet Union into several independent states. As a consequence, in 1992 a subset of the former Soviet republics participated as the Unified Team. The GDP values for the Soviet Union thus have to be computed from the Gapminder data by manually summing the individual Soviet republic GDP values. Again we skip further data munging details and simply refer to the GitHub code for a transparent & reproducible account. Warning: Only a few of the entries in the list of obsolete nations & name changes are taken into account.
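To make the aggregation idea concrete, here is a small sketch of how the medal counts of East and West Germany could be pooled under the name used in the GDP data. The medal counts are made up, and name_map and toy_medals_mod are hypothetical names; the actual recoding lives in the GitHub code.

```r
library(dplyr)

# Toy medal counts (made-up numbers, for illustration only)
toy_medals <- tibble(
  Year   = c(1976, 1976, 1976),
  Nation = c("East Germany", "West Germany", "Sweden"),
  Gold   = c(4, 1, 2),
  Silver = c(2, 1, 1),
  Bronze = c(2, 2, 0),
  Total  = c(8, 4, 3)
)

# Hypothetical mapping from obsolete nation names to the name in the GDP data
name_map <- c("East Germany" = "Germany", "West Germany" = "Germany")

toy_medals_mod <- toy_medals %>%
  # Use the mapped name where one exists, otherwise keep the original name
  mutate(Nation = coalesce(unname(name_map[Nation]), Nation)) %>%
  group_by(Year, Nation) %>%
  summarise(across(c(Gold, Silver, Bronze, Total), sum), .groups = "drop")
print(toy_medals_mod)
```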
Conditioned on the success of the previous wrangling step, we can now join the two data sources:
medals_gm <- left_join(medals_mod, gapminder_manual, by=c("Nation","Year"))
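One way to check whether the wrangling step succeeded is to list the medal table rows that find no partner in the GDP data. The sketch below uses dplyr's anti_join on toy data; the data frames, the GDP column and all values are made up for illustration.

```r
library(dplyr)

# Toy stand-ins for medals_mod and gapminder_manual (made-up values)
toy_medals <- tibble(
  Nation = c("Germany", "Sweden", "Unified Team"),
  Year   = 1992,
  Total  = c(5, 4, 6)
)
toy_gdp <- tibble(
  Nation = c("Germany", "Sweden"),
  Year   = 1992,
  GDP    = c(100, 120)
)

# Rows of the medal table without a matching Nation/Year in the GDP data
unmatched <- anti_join(toy_medals, toy_gdp, by = c("Nation", "Year"))
print(unmatched)
```

An empty result would indicate that every nation-year combination in the medal table received GDP values in the left join.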
Results
First we analyse the all-time summer olympic medal table for the period 1960-2016.
medals_alltime <- medals_gm %>%
  group_by(Nation) %>%
  summarise(Total = sum(Total)) %>%
  arrange(desc(Total))
DT::datatable(medals_alltime)
We now plot the total number of medals awarded for each summer games in the period of 1960-2016.
nTotal <- medals_gm %>%
  group_by(Year) %>%
  summarise(TotalOfGames = sum(Total))
ggplot(nTotal, aes(x = Year, y = TotalOfGames)) + geom_line() + ylab("Total number of medals per Summer Games")
A distinct increasing trend is observed in the above figure. Hence,
in order to make between-country comparisons over time based on the
number of medals won, we normalize the medals by the total number of
medals awarded during the corresponding games. The result is stored in
the column Frac.
medals_gm <- medals_gm %>%
  left_join(nTotal, by = "Year") %>%
  mutate(Frac = Total / TotalOfGames)
After all these pre-processing steps, we can now compare country results for all summer games in the period 2000-2016.
Note that for better visualization of the many countries with a small number of medals, a \(\sqrt{}\)-transform of the y-axis is used.
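The plotting code is not shown in the post, so the following is only a sketch of what such a comparison could look like with ggplot2: one panel per games, nation names as labels, and a square-root scaled y-axis. The toy data, the column name GDP and the axis labels are assumptions.

```r
library(ggplot2)

# Toy data (made-up values): GDP per capita and fraction of medals per games
toy <- data.frame(
  Year   = c(2012, 2012, 2016, 2016),
  Nation = c("Nation A", "Nation B", "Nation A", "Nation B"),
  GDP    = c(30000, 8000, 32000, 9000),
  Frac   = c(0.10, 0.01, 0.09, 0.02)
)

p <- ggplot(toy, aes(x = GDP, y = Frac, label = Nation)) +
  geom_text() +
  scale_y_sqrt() +            # square-root transform spreads out the small fractions
  facet_wrap(~ Year) +        # one panel per Summer Games
  xlab("GDP per capita") +
  ylab("Fraction of all medals")
print(p)
```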
Finally, we can use the gganimate
package to visualize
the dependence of the total number of medals won in the summer games
1960-2016 as a function of GDP per capita.
As before, a \(\sqrt{}\)-transform of the y-axis is used for better visualization. One interesting observation from the animation is that the host country of an Olympics always appears to do well in the following Olympics. Also note that the 1980 and 1984 games were special due to boycotts. With respect to the top-5 nations, it is also worth noting that China, due to protests against the participation of Taiwan, did not participate in the Olympics 1956-1980. Furthermore, up to 1988 the team denoted “Germany” in the animation consists of the combined number of medals of “East Germany” and “West Germany”.
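For readers who want to try such an animation themselves, here is a minimal sketch using the current gganimate interface (transition_time), which may differ from the interface that was available when this post was written. The toy data and column names are assumptions, and the rendering call is left commented out because it needs a renderer such as the gifski package.

```r
library(ggplot2)
library(gganimate)

# Toy data (made-up values) for two nations at two games
toy <- data.frame(
  Year   = c(2012, 2012, 2016, 2016),
  Nation = c("Nation A", "Nation B", "Nation A", "Nation B"),
  GDP    = c(30000, 8000, 32000, 9000),
  Frac   = c(0.10, 0.01, 0.09, 0.02)
)

anim <- ggplot(toy, aes(x = GDP, y = Frac, label = Nation, group = Nation)) +
  geom_text() +
  scale_y_sqrt() +
  labs(title = "Year: {frame_time}",
       x = "GDP per capita", y = "Fraction of all medals") +
  transition_time(Year)       # interpolate positions between the games

# Render to an animation (uncomment if a renderer is installed):
# animate(anim, nframes = 50)
```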
Fun with Flags
Update: After being made aware of the concurrent blog
entry by Philippe
Massicotte on how to visualize the Rio medal table using the
ggflags
package, I realized that the above gapminder visualization can
easily be extended to use flags instead of nation names. As the
ggflags
package only contains the flags of currently
existing countries, we start the visualization in 1990. For better
visibility we also add the trajectory of each nation.
Number of Medals per Population
To see the medal tables in a different light, we instead visualize the number of medals relative to population size. To enable cross-year comparisons, we compute the following index for each country and olympic summer games: \[ \frac{\text{Fraction of All Medals the Country got in that Year}}{\text{Population in the Country that Year}} \times 10^6. \] We shall call this index a country’s fraction of all medals per million population. A similar animation as above, now with a logarithmic y-axis, illustrates the dynamics. To provide evidence-supported neighbour mocking, we highlight the position of the three Nordic countries (Denmark, Sweden and Norway).
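As a small worked example of the index defined above, the following sketch computes it with dplyr for two toy nations. The column names Population and FracPerMillion and all numbers are assumptions for illustration.

```r
library(dplyr)

# Toy data (made-up values): fraction of all medals and population size
toy <- tibble(
  Nation     = c("Nation A", "Nation B"),
  Frac       = c(0.10, 0.01),
  Population = c(50e6, 1e6)
)

# Fraction of all medals per million population
toy_idx <- toy %>%
  mutate(FracPerMillion = Frac / Population * 1e6)
print(toy_idx)
```

In this toy example the small nation wins only a tenth of the medals of the large one, yet its index is five times higher (0.01 versus 0.002).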
Jamaica, Bahamas and Grenada appear to do reasonably well lately compared to their population size. However, more importantly: did you notice the position of Denmark at the 2016 games in Rio?