
Olympics Data Analysis Using Python
The contemporary Olympic Games, sometimes known as the Olympics, are major international sporting events that feature summer and winter sports contests in which thousands of participants from all over the world compete in a range of disciplines. With over 200 nations competing, the Olympic Games are regarded as the world's premier sporting event. In this article, we will examine the Olympics using Python. Let's begin.
Importing necessary libraries
!pip install pandas
!pip install numpy
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
Importing and understanding the dataset
We work with two CSV files. The first, athlete_events.csv, has one row per athlete per event across all editions of the Games. The second is a lookup file that maps each National Olympic Committee (NOC) code to a region.
data = pd.read_csv('/content/sample_data/athlete_events.csv')
# data.head() displays the first 5 entries
print(data.head())
print(data.describe())
data.info()  # info() prints its report itself and returns None
Merging both datasets
# regions and country noc data CSV file
regions = pd.read_csv('/content/sample_data/datasets_31029_40943_noc_regions.csv')
print(regions.head())
# merging the data and regions frames
merged = pd.merge(data, regions, on='NOC', how='left')
print(merged.head())
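To see what the left merge on 'NOC' does with codes that are missing from the lookup file, here is a minimal sketch on made-up data. The frames `athletes` and `noc_lookup` and the code "XXX" are hypothetical stand-ins, not part of the real dataset.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the athlete and region tables
athletes = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "NOC": ["FIN", "NOR", "XXX"],  # "XXX" has no match in the lookup table
})
noc_lookup = pd.DataFrame({
    "NOC": ["FIN", "NOR", "DEN"],
    "region": ["Finland", "Norway", "Denmark"],
})

# how='left' keeps every athlete row; indicator=True adds a '_merge' column
joined = pd.merge(athletes, noc_lookup, on="NOC", how="left", indicator=True)

# Rows flagged 'left_only' found no region and carry NaN in 'region'
unmatched = joined[joined["_merge"] == "left_only"]

print(joined)
print("Rows without a region:", len(unmatched))
```

A check like this on the merged frame can reveal NOC codes that need a manual region assignment before counting medals per country.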
From here the data analysis starts.
Analysis of gold medals
Example
# creating the gold medal dataframe
goldMedals = merged[(merged.Medal == 'Gold')]
print(goldMedals.head())
Output
    ID                     Name  Sex   Age  Height  Weight            Team  \
3    4     Edgar Lindenau Aabye    M  34.0     NaN     NaN  Denmark/Sweden
42  17  Paavo Johannes Aaltonen    M  28.0   175.0    64.0         Finland
44  17  Paavo Johannes Aaltonen    M  28.0   175.0    64.0         Finland
48  17  Paavo Johannes Aaltonen    M  28.0   175.0    64.0         Finland
60  20       Kjetil Andr Aamodt    M  20.0   176.0    85.0          Norway

    NOC        Games  Year  Season         City          Sport  \
3   DEN  1900 Summer  1900  Summer        Paris     Tug-Of-War
42  FIN  1948 Summer  1948  Summer       London     Gymnastics
44  FIN  1948 Summer  1948  Summer       London     Gymnastics
48  FIN  1948 Summer  1948  Summer       London     Gymnastics
60  NOR  1992 Winter  1992  Winter  Albertville  Alpine Skiing

                               Event  Medal   region  notes
3        Tug-Of-War Men's Tug-Of-War   Gold  Denmark    NaN
42  Gymnastics Men's Team All-Around   Gold  Finland    NaN
44      Gymnastics Men's Horse Vault   Gold  Finland    NaN
48  Gymnastics Men's Pommelled Horse   Gold  Finland    NaN
60       Alpine Skiing Men's Super G   Gold   Norway    NaN
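The gold medal filter relies on a boolean mask. The sketch below shows the same idea on a hypothetical four-row frame, assuming (as in many result tables) that a missing 'Medal' value means no medal was won.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; NaN in 'Medal' stands for "no medal won"
results = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Medal": ["Gold", np.nan, "Silver", "Gold"],
})

# The comparison yields one boolean per row; a missing value compares as False
is_gold = results["Medal"] == "Gold"

# Indexing with the mask keeps only the rows where it is True
gold_only = results[is_gold]

print(is_gold.tolist())
print(gold_only)
```

Because missing values compare as False, rows without a medal drop out without any extra handling.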
Analysis of gold medalists according to age
Here, we'll make a graph showing the number of gold medals in relation to age. For this, we will use a count plot, with the participants' ages on the X-axis and the number of medals on the Y-axis.
Example
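plt.figure(figsize=(20, 10))
plt.title('Distribution of Gold Medals')
# pass the column as x explicitly; recent seaborn versions treat the first positional argument as data
sns.countplot(x=goldMedals['Age'])
plt.show()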
Output

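The bar heights of a count plot are per-value frequencies, so the same numbers can be inspected as a table. A minimal sketch on hypothetical ages (the `ages` series is made up for illustration):

```python
import pandas as pd

# Hypothetical ages of a handful of gold medalists
ages = pd.Series([23.0, 25.0, 23.0, 30.0, 25.0, 23.0], name="Age")

# One count per distinct age, ordered by age like the X-axis of the plot
age_counts = ages.value_counts().sort_index()

print(age_counts)
```

Note that value_counts skips missing ages by default, so athletes without a recorded age do not appear in the table.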
Next, look at the gold medalists who were older than 50. Create a new object named 'masterDisciplines' that holds the sports of this group of athletes. Then, use it to make a visualization.
Example
# sports of the gold medalists older than 50
masterDisciplines = goldMedals['Sport'][goldMedals['Age'] > 50]
plt.figure(figsize=(20, 10))
sns.countplot(x=masterDisciplines)
plt.title('Gold Medals for Athletes Over 50')
# adjust the layout after the plot has been drawn
plt.tight_layout()
plt.show()
Output

Analysis of women's participation in the Summer Games
Example
# each row is one athlete-event entry for a woman at a Summer Games (not filtered by Medal)
womenInOlympics = merged[(merged.Sex == 'F') & (merged.Season == 'Summer')]
print(womenInOlympics.head(10))
sns.set(style="darkgrid")
plt.figure(figsize=(20, 10))
sns.countplot(x='Year', data=womenInOlympics)
plt.title('Women entries per edition of the Games')
plt.show()
Output

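A count of rows per year counts athlete-event entries, so an athlete who competes in several events is counted several times. The sketch below contrasts row counts with distinct athletes on hypothetical data, assuming an 'ID' column that identifies each athlete.

```python
import pandas as pd

# Hypothetical entries: athlete 1 competes in two events in 2000,
# athlete 3 competes in two events in 2004
entries = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3],
    "Year": [2000, 2000, 2000, 2004, 2004],
})

# 'rows' counts entries per year; 'athletes' counts distinct IDs per year
per_year = entries.groupby("Year")["ID"].agg(rows="count", athletes="nunique")

print(per_year)
```

Which of the two numbers is the right one depends on the question: entries measure how many starts there were, distinct IDs measure how many people took part.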
Analyzing the top 5 countries by gold medals
Example
# name the index explicitly so the column is called 'region' on every pandas version
totalGoldMedals = goldMedals.region.value_counts().rename_axis('region').reset_index(name='Medal').head(5)
print(totalGoldMedals)
g = sns.catplot(x="region", y="Medal", data=totalGoldMedals, height=6, kind="bar", palette="muted")
g.despine(left=True)
g.set_xlabels("Top 5 countries")
g.set_ylabels("Number of Medals")
plt.title('Medals per Country')
plt.show()
Output

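The ranking above boils down to counting values and keeping the first few. Here is a minimal sketch on hypothetical data; the `rename_axis` call is an addition of this sketch, used so that the resulting column names do not depend on the pandas version.

```python
import pandas as pd

# Hypothetical gold medal rows, one region per medal
golds = pd.Series(
    ["USA", "Finland", "USA", "Norway", "USA", "Finland"], name="region"
)

top2 = (
    golds.value_counts()            # counts, sorted in descending order
         .rename_axis("region")     # name the index explicitly
         .reset_index(name="Medal") # turn the counts into a two-column frame
         .head(2)                   # keep the two largest
)

print(top2)
```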
Evolution of athletes over time
Example
MenOverTime = merged[(merged.Sex == 'M') & (merged.Season == 'Summer')]
WomenOverTime = merged[(merged.Sex == 'F') & (merged.Season == 'Summer')]
# count rows per (Year, Sex), then select the 'M' level to get one value per year
part = MenOverTime.groupby('Year')['Sex'].value_counts()
plt.figure(figsize=(20, 10))
part.loc[:, 'M'].plot()
plt.title('Variation of Male Athletes over time')
plt.show()
Output

Example
# same steps for the women's entries
part = WomenOverTime.groupby('Year')['Sex'].value_counts()
plt.figure(figsize=(20, 10))
part.loc[:, 'F'].plot()
plt.title('Variation of Female Athletes over time')
plt.show()
Output

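Grouping by 'Year' and then calling value_counts on 'Sex' produces a Series indexed by (Year, Sex) pairs, which is why the plots select one level with `.loc[:, 'M']` or `.loc[:, 'F']`. A minimal sketch on hypothetical rows; the `unstack` variant is shown as one possible alternative, not as the approach used in this article.

```python
import pandas as pd

# Hypothetical athlete rows
rows = pd.DataFrame({
    "Year": [1996, 1996, 1996, 2000, 2000],
    "Sex": ["M", "M", "F", "M", "F"],
})

# The result is indexed by (Year, Sex)
counts = rows.groupby("Year")["Sex"].value_counts()

# Select the 'M' level to get one value per year
men = counts.loc[:, "M"]

# Alternative: move the Sex level into columns, one column per sex
wide = counts.unstack(fill_value=0)

print(men)
print(wide)
```

With the wide form, both series could be drawn on one set of axes, which makes the two trends easier to compare.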
Conclusion
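We have gone through some basic analysis of the Olympic data: gold medals by age, women's participation, the top countries, and the number of athletes over time. You can go further and look for more insights in the same dataset.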