
CSD 3159 – INTERNSHIP REPORT

J.Mohideen Sathak Uzham


220171601074

BONAFIDE CERTIFICATE

Certified that this is the bonafide record of the work done by

J. Mohideen Sathak Uzham, Register No. 220171601074, of V

semester B.Tech – ARTIFICIAL INTELLIGENCE AND DATA
SCIENCE in CSD 3159 INTERNSHIP during the year 2024-25.

Course Faculty Head of the Department

Date:

Submitted for the Practical Examination held on

Internal Examiner
Name of the Student : J. Mohideen Sathak Uzham
RRN : 220171601074
Department of the Student : B. Tech Artificial Intelligence & Data Science
Semester from : July 2024 – December 2024
Year & Section : IIIrd Year - V Semester- B

TITLE PAGE NO.


CHAPTER
1. ACKNOWLEDGEMENT SECTION/CERTIFICATE 4
2. ABSTRACT 5
3. INTRODUCTION 6-7
4. ROLES & RESPONSIBILITIES 8
5. OBJECTIVES 9
6. PROJECT OVERVIEW 10-24
7. APPENDIX 25
8. SKILLS ACQUIRED 26
9. SUMMARY & REFERENCES 27
10. RESULT 30

Abstract

This report presents a comprehensive overview of my internship experience as a Data Science


intern at TechQuant (techquant.in), highlighting the significant role of Python programming in
data analysis and machine learning projects. The internship provided me with an invaluable
opportunity to engage with real-world datasets, develop data-driven insights, and implement
predictive models aimed at addressing pressing business and social issues. A pivotal project
during my internship involved a detailed analysis of the impact of COVID-19 on various
socioeconomic factors. This project entailed utilizing extensive datasets from reputable public
health sources, governmental reports, and economic indicators to uncover critical trends and
correlations that could inform public policy and business strategies.
The primary objectives of the COVID-19 analysis project included examining the effects of the
pandemic on employment rates across different sectors, assessing changes in healthcare access
and utilization, and exploring the relationship between government interventions, such as
lockdown measures and financial aid, and public health outcomes. To achieve these objectives, I
employed a systematic approach that involved several key methodologies: data cleaning and
preprocessing to ensure dataset accuracy, exploratory data analysis (EDA) to identify patterns
and anomalies, and the application of machine learning algorithms to build predictive models.
Throughout the internship at TechQuant, I utilized a range of Python libraries, including Pandas
for data manipulation, NumPy for numerical analysis, Matplotlib and Seaborn for data
visualization, and Scikit-learn for implementing machine learning algorithms. These tools
allowed me to extract meaningful insights from the data, leading to a better understanding of the
complex interplay between COVID-19 and socioeconomic dynamics.
This report also details the challenges encountered during the internship, such as managing
missing data, navigating the complexities of correlational analyses, and effectively
communicating findings to stakeholders. The skills and knowledge gained throughout this
experience significantly enhanced my technical proficiency and deepened my understanding of
data science principles and practices. Ultimately, the insights derived from the COVID-19
analysis not only contributed to TechQuant’s strategic objectives but also reinforced the critical
importance of data science in addressing contemporary global challenges. This internship
experience solidified my passion for the field of data science and underscored its potential to drive
meaningful change in society.

Introduction

In the contemporary digital age, Data Science has emerged as a crucial discipline, enabling organizations
to leverage vast amounts of data for informed decision-making and strategic planning. My internship at
TechQuant (techquant.in) as a Data Science intern provided me with a unique opportunity to delve into
this dynamic field, particularly through the lens of real-world applications. The experience allowed me to
engage in diverse projects, but one that stood out was a comprehensive analysis of the impact of COVID-
19 on various socioeconomic factors. This project was not only timely and relevant but also critical in
understanding the far-reaching effects of the pandemic.

During this internship, I worked with extensive datasets from various sources, including public health
records and economic indicators. My primary focus was on employing Python, a versatile programming
language widely regarded for its effectiveness in data manipulation, analysis, and visualization. Utilizing
libraries such as Pandas for data cleaning, NumPy for numerical analysis, and Matplotlib and Seaborn for
data visualization, I was able to extract meaningful insights and identify significant trends resulting from
the pandemic.

The project involved several key steps: first, gathering and cleaning datasets to ensure accuracy and
completeness; next, conducting exploratory data analysis (EDA) to identify patterns and relationships
within the data; and finally, applying machine learning algorithms to predict outcomes and draw
conclusions. Throughout the process, I encountered various challenges, such as dealing with missing data
and understanding complex correlations, but these experiences proved instrumental in honing my problem-
solving skills.

One of the significant challenges I encountered was dealing with missing data and ensuring that the
analyses remained robust despite these gaps. To address this, I employed various imputation techniques
and sensitivity analyses to assess the impact of missing values on my findings. Additionally, understanding
complex correlations and potential confounding variables required a thorough exploration of the data,
prompting me to use advanced visualization techniques with Matplotlib and Seaborn to communicate my
findings effectively.
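The imputation approach described above can be sketched as follows. This is a minimal illustration on a hypothetical frame; the column names and values are assumptions for the sketch, not the project's actual data:

```python
import numpy as np
import pandas as pd

# Hypothetical daily-cases frame with gaps, standing in for the real dataset
df = pd.DataFrame({
    "date": pd.date_range("2020-03-01", periods=6, freq="D"),
    "daily_cases": [100.0, np.nan, 140.0, np.nan, 200.0, 230.0],
    "hdi": [0.64, 0.64, np.nan, 0.64, 0.64, 0.64],
})

# Time-ordered counts: linear interpolation preserves the trend between known points
df["daily_cases"] = df["daily_cases"].interpolate(method="linear")

# Near-constant indicators: median imputation is a robust simple default
df["hdi"] = df["hdi"].fillna(df["hdi"].median())

print(df["daily_cases"].tolist())  # [100.0, 120.0, 140.0, 170.0, 200.0, 230.0]
```

A sensitivity check, such as re-running an analysis with and without the imputed rows, then helps confirm that conclusions are robust to the gaps.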

The culmination of this project was not merely to generate insights for TechQuant but also to contribute
to the broader understanding of how COVID-19 has reshaped our world. The findings indicated significant
disparities in the socioeconomic impact of the pandemic across different regions and demographics,
highlighting areas where targeted interventions could be most beneficial. This project not only enhanced
my technical skills but also underscored the importance of data-driven decision-making in addressing
complex global challenges.

In conclusion, this report aims to provide a comprehensive overview of my internship journey, detailing
the projects undertaken, methodologies employed, and the key learnings acquired throughout this
enriching experience. By examining the impact of COVID-19 through data science, I gained not only
technical expertise but also a deeper appreciation for the role of data science in informing public policy
and improving community well-being.

Throughout the internship, I also had the opportunity to collaborate with a multidisciplinary team, which
was an invaluable aspect of the experience. Working closely with experts from various domains such as
public health, economics, and policy analysis helped me understand the broader context in which data
science operates. The team dynamic fostered a collaborative environment, allowing me to approach
problems from different angles and gain diverse perspectives on complex issues. This collaboration also
helped me improve my communication skills, particularly when presenting data-driven insights to
non-technical stakeholders, ensuring that the findings were both understandable and actionable.

Moreover, this internship experience helped me appreciate the importance of ethics in data science.
Working with sensitive public health data highlighted the need for responsible data handling and
transparency. Ensuring data privacy and mitigating bias were critical considerations throughout the
project. I learned to approach data analysis with a mindset that prioritizes ethical considerations, ensuring
that the conclusions drawn were not only scientifically sound but also socially responsible. This aspect of
the internship reinforced my understanding of the broader implications of data science in shaping public
perceptions and decisions.

Looking ahead, the skills and knowledge I acquired during this internship have equipped me with a solid
foundation to continue pursuing a career in data science. The experience of working on a real-world
problem like the COVID-19 pandemic, combined with the technical tools I learned to use, has deepened
my interest in applying data science to solve pressing global challenges. I am now more confident in my
ability to analyze complex datasets, interpret findings, and contribute meaningfully to the growing field of
data science. As I move forward in my career, I aim to continue leveraging data science to drive positive
change and contribute to meaningful solutions for society's most pressing issues.

ROLES AND RESPONSIBILITIES

Role:
I was selected as a Data Science intern at TechQuant, where I was trained and then worked on a project.

Responsibility:

During my data science internship at TechQuant (techquant.in), my role involved working on real-
world projects where I applied my skills in data analysis to tackle complex problems. I was responsible
for tasks such as cleaning and preparing data, building predictive models, and visualizing results to
extract meaningful insights. Leveraging Python and libraries like NumPy and Pandas for data
handling, along with Matplotlib and Seaborn for data visualization, I also explored techniques to
optimize machine learning models. Throughout the program, I demonstrated dedication, creativity, and
a proactive approach to learning, contributing significantly to the team's projects while enhancing my
expertise in data science.

Following the completion of my training period, I was assigned a major project: "COVID-19 Impact
Analysis Using Python." This project required the integration of both technical and soft skills.
Effective communication was vital, as I needed to clearly explain my analysis and ensure the data
visualizations were accessible and easy to interpret. My focus was on creating clear, insightful
representations of data that allowed stakeholders to understand the trends and conclusions drawn from
my work.

In addition to technical work, I created comprehensive data visualizations and detailed reports to
effectively communicate findings to stakeholders, presenting insights in a clear and actionable manner.
Collaboration with team members and domain experts played a significant role in this project, allowing
me to contribute to discussions on data-driven strategies and solutions. This experience not only
sharpened my technical abilities but also improved my communication and teamwork skills.

These responsibilities highlight the technical expertise and collaborative efforts I brought to the
project, reflecting the value I added to the organization. A detailed overview of the project I completed
is presented in the following pages.

OBJECTIVES

1. Enhance Technical Proficiency in Data Science


During my internship at TechQuant (techquant.in), I focused on developing and refining
my programming skills in Python, leveraging libraries like Pandas, NumPy, Matplotlib,
Seaborn, and Scikit-learn. These tools allowed me to efficiently handle data
manipulation, analysis, and visualization, enhancing my understanding of the data
science workflow and strengthening my technical foundation.
2. Apply Data Science Methodologies
I gained extensive hands-on experience with key data science methodologies, including
data cleaning, exploratory data analysis (EDA), feature engineering, and predictive
modeling. Through the COVID-19 Impact Analysis Using Python project, I applied
statistical techniques and machine learning algorithms to real-world datasets, solving
practical problems and deriving actionable insights.
3. Understand Real-World Applications of Data Science
My work exposed me to the application of data science across domains such as public
health, economic analysis, and policy-making. By analyzing the socioeconomic impacts
of COVID-19, I witnessed firsthand how data-driven insights can inform decisions and
strategies in diverse sectors. Additionally, I explored case studies and trends in data
science, deepening my understanding of its transformative role in contemporary
challenges.
4. Develop Problem-Solving Skills
Tackling complex datasets during the internship helped me cultivate strong problem-
solving and critical-thinking skills. I systematically approached challenges, such as
managing missing data and identifying meaningful insights, by employing analytical
methods and iterative experimentation to ensure robust and accurate solutions.
5. Foster Effective Communication Skills
Communicating technical findings effectively was a key aspect of my role. I created
comprehensive visualizations and detailed reports to present insights to non-technical
stakeholders, ensuring clarity and impact. By tailoring my communication style to
different audiences, I honed my ability to articulate complex concepts in a simple and
actionable manner.

Summary
Overall, the objectives of my internship at TechQuant were met through a combination of
technical skill development, practical applications, and personal growth. This experience
solidified my passion for data science and provided me with the tools and insights necessary to
excel in this dynamic field. Moving forward, I am motivated to continue exploring the vast
potential of data science, applying my expertise to address critical challenges and contribute
positively to society.

Project overview:

The project I undertook during my internship was titled “COVID-19 Impact Analysis Using
Python.” In this project, I leveraged Python and its versatile libraries to analyze and visualize
the economic impact of the COVID-19 pandemic. Specifically, I utilized libraries like Pandas
and NumPy for data cleaning and manipulation, ensuring the dataset was accurate and ready
for analysis. To make the insights accessible and understandable, I employed Matplotlib and
Seaborn to create clear and informative visualizations, transforming raw data into meaningful
graphs and charts.

The outbreak of COVID-19 brought unprecedented challenges and restrictions, resulting in


significant disruptions to the global economy. Virtually every country experienced adverse
effects as the number of cases surged, leading to reduced economic activity, shifts in
employment, and changes in consumer behavior. Through this project, I focused on examining
these impacts, analyzing trends, and drawing insights that could provide a better understanding
of the socioeconomic consequences of the pandemic.

This analysis not only enhanced my technical proficiency in Python but also deepened my
awareness of how data science can be used to study and address real-world challenges.

COVID-19 Impact Analysis


The first wave of COVID-19 had a profound impact on the global economy, catching the world
unprepared for such an unprecedented disaster. This pandemic led to a sharp rise in cases,
deaths, unemployment, and poverty, causing a significant economic slowdown. The primary
goal of my project, “COVID-19 Impact Analysis Using Python,” was to analyze the spread of
COVID-19 cases and evaluate its various economic impacts.
Dataset Description
To conduct this analysis, I used a dataset sourced from Kaggle, which provided comprehensive
information about COVID-19's socioeconomic impact. The dataset included:
1. Country Code: Unique identifier for each country.
2. Name of Countries: Names of all countries in the dataset.
3. Date of Record: The specific dates for which data was recorded.
4. Human Development Index (HDI): A measure of each country's development.
5. Daily COVID-19 Cases: Number of confirmed cases recorded daily.
6. Daily Deaths Due to COVID-19: Number of deaths reported daily due to the virus.
7. Stringency Index: A measure of the strictness of government policies (e.g., lockdowns).
8. Population of the Countries: Total population figures for each country.
9. GDP per Capita: Economic output per person, indicating the financial health of each
country.

COVID-19 impact analysis using Python:
The project was carried out in Google Colab.
Libraries used: pandas, numpy, matplotlib, seaborn.
To proceed with the analysis, we first import the necessary libraries and then load the dataset.

Fig 1

In Figure 1, the libraries and the dataset are imported, and the dataset is then checked to confirm it was imported correctly.
The dataset I used contains COVID-19 case data from December 31, 2019 to October 10, 2020.
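Since the screenshot in Figure 1 is not reproduced here, the import-and-check step can be sketched as below. The inline CSV text is a stand-in so the snippet is self-contained; in the actual notebook the same call reads transformed_data.csv from disk, and the DATE values here are illustrative:

```python
import io

import pandas as pd

# Stand-in for pd.read_csv("transformed_data.csv") in the notebook;
# column names follow the appendix (CODE, COUNTRY, HDI, STI, POP)
csv_text = (
    "CODE,COUNTRY,DATE,HDI,STI,POP\n"
    "IND,India,2019-12-31,0.645,0.0,21.0\n"
    "IND,India,2020-01-01,0.645,0.0,21.0\n"
)
transformed_data = pd.read_csv(io.StringIO(csv_text))

# Quick check that the dataset was imported correctly
print(transformed_data.head())
print(transformed_data.shape)  # (2, 6)
```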

Data preparation:

Before analysing the data we should check whether the data is clean, consistent and ready for analysis,
that is it should not contain missing values, duplicates, date type conversion (Certain columns, like the
date column, need to be converted to appropriate data types (e.g., from string to datetime) to allow for
easy manipulation), feature engineering, data manipulation, data filtering and aggregating the data.
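A minimal sketch of these checks, using a toy frame in place of the real files (the column names follow the raw file used in the appendix):

```python
import pandas as pd

# Toy frame mimicking the raw file: a date stored as text and one duplicated row
raw = pd.DataFrame({
    "location": ["India", "India", "Brazil", "Brazil"],
    "date": ["2020-01-05", "2020-01-05", "2020-01-05", "2020-01-06"],
    "total_cases": [10, 10, 3, 5],
})

print(raw.isnull().sum())                  # missing values per column
raw = raw.drop_duplicates()                # remove exact duplicate rows
raw["date"] = pd.to_datetime(raw["date"])  # string -> datetime for easy manipulation

print(len(raw))                            # 3 rows remain
print(raw["date"].dtype)                   # datetime64[ns]
```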
The dataset used here consists of two files: one contains the raw data, and the other a transformed version. Both are needed for this task, as each contains equally important information in different columns.
Checking both datasets one by one:

Fig 3

Fig 2
From Figure 2, after forming initial impressions of both datasets, I found that we have to combine them by creating a new dataset. But before creating the new dataset, we have to check how many samples are present for each country in the dataset.

Fig 3

Fig 4
From Figure 3, it is clear that the dataset contains an unequal number of samples for each
country, so it is important to calculate the mode value.
The mode value is 294, and we will use it to divide the sums of the samples related to the
Human Development Index (HDI), stringency index, and population, giving per-country averages.
After performing these calculations, we will create a new dataset by combining the relevant
columns from both datasets, ensuring the necessary data is correctly aggregated for further analysis.
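The sample-count check and the mode calculation can be sketched like this; a toy series stands in for the real COUNTRY column, where the mode is 294:

```python
import pandas as pd

# Unequal numbers of records per country, as in the transformed file
country = pd.Series(["India"] * 4 + ["Brazil"] * 4 + ["USA"] * 3, name="COUNTRY")

counts = country.value_counts()  # records per country
mode_count = counts.mode()[0]    # most common record count (294 in the real data)
print(mode_count)                # 4

# Dividing a per-country sum by the mode gives an average over the common sample size
hdi_sum = 2.56                   # hypothetical sum of HDI samples for one country
print(hdi_sum / mode_count)      # 0.64
```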
Aggregating the data:

Fig 5

The GDP per capita column has not yet been included due to the absence of accurate GDP per
capita data in the dataset. Since manually collecting this data for all countries would be both
time-consuming and challenging, I decided to focus on the top 10 countries with the highest
number of COVID-19 cases and retrieve their GDP per capita values. To implement this, the
dataset needs to be sorted first, after which the GDP per capita values can be added for these
selected countries.
For that we have to sort the dataset and then add the GDP per capita:

Fig 6
From Figure 6, it is evident that the addition of the GDP per capita column was successful, and
the data appears to be accurate without any issues.
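The sorting step can be sketched as follows, with toy country names and counts (the project keeps the top 10 countries by total cases; the GDP figures are taken from the appendix):

```python
import pandas as pd

# Toy aggregated frame with one row per country
agg = pd.DataFrame({
    "Country": ["Alpha", "Beta", "Gamma"],
    "Total Cases": [50, 200, 120],
})

# Sort by total cases, descending, and keep the hardest-hit countries
top = agg.sort_values(by=["Total Cases"], ascending=False).head(2)
print(top["Country"].tolist())  # ['Beta', 'Gamma']

# GDP per capita values can then be attached in the same (sorted) order
top = top.copy()
top["GDP Before Covid"] = [65279.53, 8897.49]
```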

Analysing the spread of Covid-19
To begin the analysis, we will first focus on examining the spread of COVID-19 in countries
that have reported the highest number of cases.

Fig 7

Figure 7 shows the countries with the highest total number of COVID-19 cases, with the counts
represented in millions; the USA reported the highest total.
Next, we check the total number of deaths among the countries with the highest number of
COVID-19 cases.

Fig 8

In Figure 8, the numbers for total deaths are represented in millions. The graph shows that the
USA also leads in the total number of deaths caused by COVID-19, followed by Brazil in second
place and India in third.
Beyond the graph, the dataset shows that the death rate in India, Russia, and South Africa is
comparatively low relative to their total number of COVID-19 cases.
We therefore compare the total number of cases and the total number of deaths across all these
countries:

Fig 9
In Figure 9, the numbers on the y-axis are represented in millions.

Fig 10

In Figure 10, total COVID-19 cases account for 96.5% of the combined total, while total deaths
account for the remaining 3.5%.
The death rate is calculated as follows: Death rate = (Sum of total deaths / Sum of total
cases) * 100,
which gives a death rate of about 3.61%.
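The death-rate formula above is a one-liner in code. The sketch below uses the pie-chart shares as illustrative inputs rather than the exact dataset sums, so it lands near, not exactly on, the reported figure:

```python
def death_rate(total_deaths: float, total_cases: float) -> float:
    """Deaths as a percentage of confirmed cases."""
    return total_deaths / total_cases * 100

# Pie-chart shares used as stand-ins for the actual sums
print(round(death_rate(3.49, 96.5), 2))  # 3.62
```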
Another important variable in this dataset is the stringency index, which is a composite
measure that evaluates a country's response to the pandemic. It includes factors such as school
closures, workplace closures, and travel bans. The stringency index helps to indicate how
strictly a country is implementing measures to control the spread of COVID-19.

Fig 11

In Figure 11, the numbers for total cases are represented in millions. From the figure, it is
evident that India has implemented a relatively strict stringency policy, while the USA has a
moderate stringency level, and Brazil has adopted a more lenient approach compared to India.
These countries are compared because they ranked among the top three in terms of total cases
and deaths. Despite India's strong performance in terms of stringency, the country still
experienced a high number of cases and deaths, largely due to its large population.

Now we can start analysing the impact of COVID-19 on the economy. To measure the impact, we
compare GDP per capita before COVID-19 with GDP per capita during COVID-19.
First, GDP per capita before COVID-19:

Fig 12

GDP per capita during covid-19:

Fig 13

Comparing GDP per capita before and during Covid-19:

Fig 14
This task focuses on analyzing the global spread of COVID-19 and its impact on the economy.
The United States experienced the highest number of COVID-19 cases and deaths, which can
be attributed in part to the country's relatively low stringency index in comparison to its
population. Additionally, the analysis explored how the GDP per capita of various countries
was affected by the pandemic, highlighting the economic repercussions of COVID-19 across
the globe.
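As a concrete example of the before/during comparison, the relative change in GDP per capita can be computed directly. The two figures below are the USA values listed in the appendix:

```python
# USA GDP per capita, taken from the appendix data
gdp_before = 65279.53  # before COVID-19
gdp_during = 63543.58  # during COVID-19

# Relative change as a percentage (negative = contraction)
pct_change = (gdp_during - gdp_before) / gdp_before * 100
print(round(pct_change, 2))  # -2.66
```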

Conclusion
The examination of COVID-19's impact through data science methods provided a
comprehensive approach to understanding the pandemic's multifaceted effects on both public
health and the economy. Through meticulous data collection, preprocessing, and advanced
analysis, the project yielded valuable insights that can guide future public health strategies and
economic policies. This experience underscored the significant role of data science in tackling
pressing societal issues and highlighted the power of data-driven solutions in supporting
informed decision-making during times of crisis.

Appendix:

Code Samples:
This appendix provides more detailed code samples from the projects completed during the
internship. These samples demonstrate the practical application of data science techniques and
programming skills developed throughout the internship.
• Data preprocessing:

# Importing necessary libraries and dataset


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load datasets
transformed_data = pd.read_csv("transformed_data.csv")
raw_data = pd.read_csv("raw_data.csv")

# View the data

print(transformed_data.head())
print(raw_data.head())

# Count the occurrences of each country

print(transformed_data["COUNTRY"].value_counts())
print(transformed_data["COUNTRY"].value_counts().mode())

# Aggregating the data


country_codes = transformed_data["CODE"].unique().tolist()
country_names = transformed_data["COUNTRY"].unique().tolist()
human_development_index = []
total_cases = []
total_deaths = []
stringency_index = []
populations = []  # filled in the loop below, one averaged value per country
gdp_per_capita = []

# 294 is the mode number of samples per country, so dividing a sum by it
# gives a per-country average
for country in country_names:
    hdi_value = (transformed_data.loc[transformed_data["COUNTRY"] == country, "HDI"]).sum() / 294
    human_development_index.append(hdi_value)
    total_cases.append((raw_data.loc[raw_data["location"] == country, "total_cases"]).sum())
    total_deaths.append((raw_data.loc[raw_data["location"] == country, "total_deaths"]).sum())
    stringency_value = (transformed_data.loc[transformed_data["COUNTRY"] == country, "STI"]).sum() / 294
    stringency_index.append(stringency_value)
    populations.append((raw_data.loc[raw_data["location"] == country, "population"]).sum() / 294)

# Creating aggregated data

aggregated_data = pd.DataFrame(
    list(zip(country_codes, country_names, human_development_index,
             total_cases, total_deaths, stringency_index, populations)),
    columns=["Country Code", "Country", "HDI", "Total Cases",
             "Total Deaths", "Stringency Index", "Population"])
print(aggregated_data.head())

# Sorting Data Based on Total Cases


aggregated_data = aggregated_data.sort_values(by=["Total Cases"], ascending=False)
print(aggregated_data.head())
aggregated_data = aggregated_data.head(10)
print(aggregated_data)

# Adding GDP per capita columns (values for the top 10 countries, in sorted order)

aggregated_data["GDP Before Covid"] = [65279.53, 8897.49, 2100.75, 11497.65, 7027.61,
                                       9946.03, 29564.74, 6001.40, 6424.98, 42354.41]
aggregated_data["GDP During Covid"] = [63543.58, 6796.84, 1900.71, 10126.72, 6126.87,
                                       8346.70, 27057.16, 5090.72, 5332.77, 40284.64]
print(aggregated_data)

# Data Visualization

# Total Cases Bar Chart


plt.figure(figsize=(10, 6))
plt.bar(aggregated_data['Country'], aggregated_data['Total Cases'], color='skyblue')
plt.title("Countries with Highest Covid Cases", fontsize=16)
plt.xlabel("Country", fontsize=12)
plt.ylabel("Total Cases", fontsize=12)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

# Total Deaths Bar Chart


plt.figure(figsize=(10, 6))
plt.bar(aggregated_data['Country'], aggregated_data['Total Deaths'], color='salmon')
plt.title("Countries with Highest Covid Deaths", fontsize=16)
plt.xlabel("Country", fontsize=12)
plt.ylabel("Total Deaths", fontsize=12)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
# Combined Total Cases and Total Deaths Bar Chart
countries = aggregated_data["Country"]
total_cases = aggregated_data["Total Cases"]
total_deaths = aggregated_data["Total Deaths"]
bar_width = 0.35
index = np.arange(len(countries))

plt.figure(figsize=(12, 6))
plt.bar(index, total_cases, bar_width, label='Total Cases', color='indianred')
plt.bar(index + bar_width, total_deaths, bar_width, label='Total Deaths', color='lightsalmon')
plt.title('Countries with Total Cases and Deaths', fontsize=16)
plt.xlabel('Country', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.xticks(index + bar_width / 2, countries, rotation=-45, ha='right')
plt.legend()
plt.tight_layout()
plt.show()

# Pie Chart for Total Cases and Deaths Distribution


cases = aggregated_data["Total Cases"].sum()
deceased = aggregated_data["Total Deaths"].sum()
labels = ["Total Cases", "Total Deaths"]
values = [cases, deceased]
plt.pie(values, labels=labels, autopct='%1.1f%%')
plt.title('Percentage of Total Cases and Deaths')
plt.legend()
plt.axis('equal')
plt.show()

# Calculating Death Rate


death_rate = (aggregated_data["Total Deaths"].sum() /
              aggregated_data["Total Cases"].sum()) * 100
print("Death Rate = ", death_rate)

# GDP Visualization before and during Covid

# GDP Before Covid-19 Color Mapping


norm = plt.Normalize(aggregated_data['GDP Before Covid'].min(),
                     aggregated_data['GDP Before Covid'].max())
sm = plt.cm.ScalarMappable(cmap="coolwarm", norm=norm)
sm.set_array([])
plt.figure(figsize=(10, 6))
bars = plt.bar(aggregated_data['Country'], aggregated_data['Total Cases'],
               color=plt.cm.coolwarm(norm(aggregated_data['GDP Before Covid'])))
plt.title("GDP Per Capita Before Covid-19", fontsize=16)
plt.xlabel('Country', fontsize=12)
plt.ylabel('Total Cases', fontsize=12)
plt.xticks(rotation=90)
cbar = plt.colorbar(sm)
cbar.set_label('GDP Before Covid')
plt.tight_layout()
plt.show()

# GDP During Covid-19 Color Mapping


norm = plt.Normalize(aggregated_data['GDP During Covid'].min(),
                     aggregated_data['GDP During Covid'].max())
sm = plt.cm.ScalarMappable(cmap="viridis", norm=norm)
sm.set_array([])
plt.figure(figsize=(10, 6))
bars = plt.bar(aggregated_data['Country'], aggregated_data['Total Cases'],
               color=plt.cm.viridis(norm(aggregated_data['GDP During Covid'])))
plt.title("GDP Per Capita During Covid-19", fontsize=16)
plt.xlabel('Country', fontsize=12)
plt.ylabel('Total Cases', fontsize=12)
plt.xticks(rotation=90)
cbar = plt.colorbar(sm)
cbar.set_label('GDP During Covid')
plt.tight_layout()
plt.show()

# Comparing GDP Before and During Covid

gdp_before = aggregated_data["GDP Before Covid"]
gdp_during = aggregated_data["GDP During Covid"]

plt.figure(figsize=(10, 5))
plt.bar(index, gdp_before, bar_width, label='GDP Before Covid-19', color='indianred')
plt.bar(index + bar_width, gdp_during, bar_width, label='GDP During Covid-19',
        color='lightsalmon')
plt.title('GDP Per Capita Before and During Covid-19', fontsize=16)
plt.xlabel('Country', fontsize=12)
plt.ylabel('GDP Per Capita', fontsize=12)
plt.xticks(index + bar_width / 2, countries, rotation=-45, ha='right')
plt.legend()
plt.tight_layout()
plt.show()

# Stringency Index Visualization


norm = plt.Normalize(aggregated_data['Stringency Index'].min(),
                     aggregated_data['Stringency Index'].max())
sm = plt.cm.ScalarMappable(cmap="coolwarm", norm=norm)
sm.set_array([])

plt.figure(figsize=(10, 6))
bars = plt.bar(aggregated_data['Country'], aggregated_data['Total Cases'],
               color=plt.cm.coolwarm(norm(aggregated_data['Stringency Index'])))
plt.title("Stringency Index during Covid-19", fontsize=16)
plt.xlabel('Country', fontsize=12)
plt.ylabel('Total Cases', fontsize=12)
plt.xticks(rotation=90)
cbar = plt.colorbar(sm)
cbar.set_label('Stringency Index')
plt.tight_layout()
plt.show()

Skills acquired

1. Python Programming
Improved proficiency in Python, with a focus on key libraries for data science, including
Pandas for data manipulation, NumPy for numerical analysis, Matplotlib and Seaborn
for data visualization, and Scikit-learn for machine learning and predictive modeling.

2. Data Manipulation and Cleaning


Gained expertise in cleaning and preprocessing large datasets, including handling
missing data, identifying outliers, normalizing data, and performing feature engineering
to generate meaningful variables.

3. Data Visualization
Developed strong skills in data visualization, utilizing Matplotlib and Seaborn to create
clear and effective charts, graphs, and heatmaps that communicated trends and insights
to stakeholders.

4. Machine Learning
Acquired hands-on experience with machine learning techniques such as regression
analysis, decision trees, and time-series forecasting. Learned to build, train, and evaluate
models using relevant performance metrics.

5. Statistical Analysis
Enhanced understanding of statistical methods, including hypothesis testing, correlation
analysis, and significance testing, applied to explore relationships between variables in
the COVID-19 dataset.

6. Problem-Solving and Analytical Thinking
Strengthened problem-solving abilities by systematically addressing challenges related
to data quality, model accuracy, and the interpretation of complex data.

7. Effective Communication
Improved the ability to clearly present technical findings to non-technical stakeholders,
delivering insights through reports, presentations, and visualizations that facilitated
data-driven decision-making.

SUMMARY

My internship experience at TechQuant has been an invaluable learning journey, providing both
foundational knowledge and practical, hands-on experience in the field of data science. The initial
training program laid a solid foundation by introducing me to essential tools such as Python and
key libraries like NumPy, pandas, Matplotlib, and Seaborn. These tools proved to be
indispensable throughout the internship as I applied them in real-world scenarios, particularly
when analyzing the vast datasets related to the impact of COVID-19 on public health and the
global economy.

One of the key takeaways from the internship was the importance of data cleaning and
preparation. As I worked with large and complex datasets, I gained significant exposure to the
challenges associated with data cleaning, such as handling missing values, removing duplicates,
and ensuring the accuracy of data. This hands-on experience with data manipulation using pandas
and NumPy was invaluable, as it not only enhanced my technical skills but also highlighted the
critical role that data preparation plays in ensuring the reliability of analyses and the robustness
of conclusions.

Additionally, the application of exploratory data analysis (EDA) techniques and machine learning
algorithms was a major component of my internship. Using Python libraries like Matplotlib and
Seaborn for data visualization, I was able to uncover trends and patterns within the data, which
facilitated meaningful insights into the effects of COVID-19. For example, I was able to identify
correlations between socioeconomic factors and public health outcomes, and utilize machine
learning techniques for predictive modeling. These experiences deepened my understanding of
how to leverage data science to inform decision-making and solve complex real-world problems.
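A typical EDA step of this kind is a correlation heatmap; the sketch below uses synthetic, hypothetical indicator values (the column names and ranges are assumptions, not the internship data):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical country-level indicators for an EDA correlation heatmap
rng = np.random.default_rng(1)
data = pd.DataFrame({
    "Total Cases": rng.integers(100_000, 50_000_000, 30),
    "Total Deaths": rng.integers(1_000, 700_000, 30),
    "GDP per Capita": rng.uniform(1_000, 60_000, 30),
    "Stringency Index": rng.uniform(20, 95, 30),
})

corr = data.corr()  # pairwise Pearson correlations between indicators
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation between indicators")
plt.tight_layout()
plt.savefig("eda_heatmap.png")
```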

A significant aspect of this internship was the opportunity to collaborate with a multidisciplinary
team. Working alongside experts from various fields, such as economics, public health, and policy
analysis, enriched my understanding of how data science can be applied in different contexts. The
cross-disciplinary collaboration also helped me improve my communication skills, as I had to
present technical findings in a way that was accessible and actionable to non-technical
stakeholders. This experience highlighted the importance of effective communication in data
science, as it is essential for translating complex data insights into clear recommendations that
drive decision-making.

Moreover, the internship provided me with insights into the ethical dimensions of data science.
Handling sensitive data, particularly related to public health, underscored the importance of
privacy, transparency, and ethical responsibility in the analysis and interpretation of data. I
learned to approach data analysis with a critical eye, ensuring that my findings were both
scientifically rigorous and socially responsible. This ethical perspective is crucial as data science
continues to play a pivotal role in shaping policy, public opinion, and business strategies.

As I reflect on this internship, I realize how much I have grown as a data scientist. The
combination of technical skills, practical experience, and exposure to interdisciplinary
collaboration has prepared me to take on future challenges in the field of data science. I have
gained confidence in my ability to handle complex datasets, draw meaningful insights, and
communicate findings effectively. The experience has reinforced my passion for using data
science to address global challenges, such as public health crises, and has inspired me to continue
exploring ways in which data can be used to improve the world.

Looking ahead, I am eager to build on the knowledge and skills I acquired during my internship.
The technical expertise I gained with tools like Python, Pandas, and machine learning algorithms
will serve as a strong foundation as I continue to pursue my career in data science. Furthermore,
the insights I gained from working on real-world projects, such as analyzing the impact of
COVID-19, have sparked a desire to explore more complex and impactful problems that can
benefit from data-driven solutions.

In conclusion, my internship at TechQuant was a transformative experience that not only
enhanced my technical skills but also broadened my understanding of how data science can be
applied to address real-world problems. The exposure to diverse methodologies, collaboration
with multidisciplinary teams, and the ethical considerations of data science have all contributed
to shaping me into a well-rounded data scientist. I am excited to apply what I have learned to
future projects and continue contributing to the ever-evolving field of data science.

REFERENCES

• https://pandas.pydata.org/pandas-docs/stable/
• https://numpy.org/doc/
• https://matplotlib.org/stable/contents.html
• https://seaborn.pydata.org/
• https://scikit-learn.org/stable/
• https://www.kaggle.com/datasets
• https://realpython.com/pandas-python-explore-dataset/
• https://jakevdp.github.io/PythonDataScienceHandbook/
• https://stackoverflow.com/questions/tagged/python
• https://www.oreilly.com/library/view/data-science-handbook/9781492041137/
• https://www.datacamp.com/community/tutorials/tutorial-machine-learning-python
• https://machinelearningmastery.com/machine-learning-in-python-step-by-step/
• https://www.analyticsvidhya.com/
• https://towardsdatascience.com/
• https://www.pythonforbeginners.com/
• https://www.learnpython.org/
• https://docs.python.org/3/tutorial/
• https://www.codecademy.com/learn/learn-python-3

Conclusion

My internship as a Data Science intern provided invaluable hands-on experience, allowing me
to apply data science techniques in a real-world setting through the analysis of COVID-19's
impact. Throughout the project, I honed my skills in Python programming, data manipulation,
visualization, and machine learning, while also gaining a deeper understanding of how to
handle and analyze complex datasets. By leveraging these skills, I was able to generate
meaningful insights into the socioeconomic effects of the pandemic and offer data-driven
recommendations to stakeholders.

In addition to the technical expertise I developed, this internship sharpened my ability to think
critically and approach problems in a structured manner. Working on a large-scale project from
start to finish, and collaborating with a diverse team of professionals, has provided me with a
solid foundation for my future career in data science.

The comprehensive nature of this project—from data collection and cleaning to predictive
modeling—highlighted the importance of data in guiding decision-making. This experience
has inspired me to seek out more opportunities where I can apply data science to tackle pressing
global challenges. Overall, the internship not only expanded my technical skills but also
enhanced my ability to communicate complex findings clearly and create impactful,
data-driven solutions.

