Internship 74
BONAFIDE CERTIFICATE
Date:
Internal Examiner
Name of the Student : J. Mohideen Sathak Uzham
RRN : 220171601074
Department of the Student : B. Tech Artificial Intelligence & Data Science
Semester from : July 2024 – December 2024
Year & Section : IIIrd Year - V Semester- B
Abstract
Introduction
In the contemporary digital age, Data Science has emerged as a crucial discipline, enabling organizations
to leverage vast amounts of data for informed decision-making and strategic planning. My internship at
TechQuant (techquant.in) as a Data Science intern provided me with a unique opportunity to delve into
this dynamic field, particularly through the lens of real-world applications. The experience allowed me to
engage in diverse projects, but one that stood out was a comprehensive analysis of the impact of COVID-
19 on various socioeconomic factors. This project was not only timely and relevant but also critical in
understanding the far-reaching effects of the pandemic.
During this internship, I worked with extensive datasets from various sources, including public health
records and economic indicators. My primary focus was on employing Python, a versatile programming
language widely regarded for its effectiveness in data manipulation, analysis, and visualization. Utilizing
libraries such as Pandas for data cleaning, NumPy for numerical analysis, and Matplotlib and Seaborn for
data visualization, I was able to extract meaningful insights and identify significant trends resulting from
the pandemic.
The project involved several key steps: first, gathering and cleaning datasets to ensure accuracy and
completeness; next, conducting exploratory data analysis (EDA) to identify patterns and relationships
within the data; and finally, applying machine learning algorithms to predict outcomes and draw
conclusions. Each of these steps presented challenges that proved instrumental in honing my
problem-solving skills.
One of the significant challenges I encountered was dealing with missing data and ensuring that the
analyses remained robust despite these gaps. To address this, I employed various imputation techniques
and sensitivity analyses to assess the impact of missing values on my findings. Additionally, understanding
complex correlations and potential confounding variables required a thorough exploration of the data,
prompting me to use advanced visualization techniques with Matplotlib and Seaborn to communicate my
findings effectively.
The culmination of this project was not merely to generate insights for TechQuant but also to contribute
to the broader understanding of how COVID-19 has reshaped our world. The findings indicated significant
disparities in the socioeconomic impact of the pandemic across different regions and demographics,
highlighting areas where targeted interventions could be most beneficial. This project not only enhanced
my technical skills but also underscored the importance of data-driven decision-making in addressing
complex global challenges.
This report aims to provide a comprehensive overview of my internship journey, detailing
the projects undertaken, methodologies employed, and the key learnings acquired throughout this
enriching experience. By examining the impact of COVID-19 through data science, I gained not only
technical expertise but also a deeper appreciation for the role of data science in informing public policy
and improving community well-being.
Throughout the internship, I also had the opportunity to collaborate with a multidisciplinary team, which
was an invaluable aspect of the experience. Working closely with experts from various domains such as
public health, economics, and policy analysis helped me understand the broader context in which data
science operates. The team dynamic fostered a collaborative environment, allowing me to approach
problems from different angles and gain diverse perspectives on complex issues. This collaboration also
helped me improve my communication skills, particularly when presenting data-driven insights to
non-technical stakeholders, ensuring that the findings were both understandable and actionable.
Moreover, this internship experience helped me appreciate the importance of ethics in data science.
Working with sensitive public health data highlighted the need for responsible data handling and
transparency. Ensuring data privacy and mitigating bias were critical considerations throughout the
project. I learned to approach data analysis with a mindset that prioritizes ethical considerations, ensuring
that the conclusions drawn were not only scientifically sound but also socially responsible. This aspect of
the internship reinforced my understanding of the broader implications of data science in shaping public
perceptions and decisions.
Looking ahead, the skills and knowledge I acquired during this internship have equipped me with a solid
foundation to continue pursuing a career in data science. The experience of working on a real-world
problem like the COVID-19 pandemic, combined with the technical tools I learned to use, has deepened
my interest in applying data science to solve pressing global challenges. I am now more confident in my
ability to analyze complex datasets, interpret findings, and contribute meaningfully to the growing field of
data science. As I move forward in my career, I aim to continue leveraging data science to drive positive
change and contribute to meaningful solutions for society's most pressing issues.
ROLES AND RESPONSIBILITIES
Role:
I was selected as a Data Science intern at TechQuant, where I was trained and then worked on a project.
Responsibility:
During my data science internship at TechQuant (techquant.in), my role involved working on real-
world projects where I applied my skills in data analysis to tackle complex problems. I was responsible
for tasks such as cleaning and preparing data, building predictive models, and visualizing results to
extract meaningful insights. Leveraging Python and libraries like NumPy and Pandas for data
handling, along with Matplotlib and Seaborn for data visualization, I also explored techniques to
optimize machine learning models. Throughout the program, I demonstrated dedication, creativity, and
a proactive approach to learning, contributing significantly to the team's projects while enhancing my
expertise in data science.
Following the completion of my training period, I was assigned a major project: "COVID-19 Impact
Analysis Using Python." This project required the integration of both technical and soft skills.
Effective communication was vital, as I needed to clearly explain my analysis and ensure the data
visualizations were accessible and easy to interpret. My focus was on creating clear, insightful
representations of data that allowed stakeholders to understand the trends and conclusions drawn from
my work.
In addition to technical work, I created comprehensive data visualizations and detailed reports to
effectively communicate findings to stakeholders, presenting insights in a clear and actionable manner.
Collaboration with team members and domain experts played a significant role in this project, allowing
me to contribute to discussions on data-driven strategies and solutions. This experience not only
sharpened my technical abilities but also improved my communication and teamwork skills.
These responsibilities highlight the technical expertise and collaborative efforts I brought to the
project, reflecting the value I added to the organization. A detailed overview of the project I completed
is presented in the following pages.
OBJECTIVES
Summary
Overall, the objectives of my internship at TechQuant were met through a combination of
technical skill development, practical applications, and personal growth. This experience
solidified my passion for data science and provided me with the tools and insights necessary to
excel in this dynamic field. Moving forward, I am motivated to continue exploring the vast
potential of data science, applying my expertise to address critical challenges and contribute
positively to society.
Project overview:
The project I undertook during my internship was titled “COVID-19 Impact Analysis Using
Python.” In this project, I leveraged Python and its versatile libraries to analyze and visualize
the economic impact of the COVID-19 pandemic. Specifically, I utilized libraries like Pandas
and NumPy for data cleaning and manipulation, ensuring the dataset was accurate and ready
for analysis. To make the insights accessible and understandable, I employed Matplotlib and
Seaborn to create clear and informative visualizations, transforming raw data into meaningful
graphs and charts.
This analysis not only enhanced my technical proficiency in Python but also deepened my
awareness of how data science can be used to study and address real-world challenges.
COVID-19 impact analysis using Python:
The project was carried out in Google Colab.
Libraries used: pandas, numpy, matplotlib, seaborn.
To begin the analysis, we first import the necessary libraries and then load
the dataset.
Fig: 1
In Fig 1, the libraries and the dataset are imported, and the dataset is then checked to confirm
that it loaded correctly.
The dataset that I used contains data on COVID-19 cases from December 31, 2019 to October 10,
2020.
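As a rough illustration of the step shown in Fig 1, the sketch below loads a CSV file with Pandas and
previews it. The inline sample and its column names (country, date, total_cases) are hypothetical
stand-ins so that the snippet runs without the real files; with the actual data, a file name such as
"transformed_data.csv" would be passed instead.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real CSV file.
sample_csv = io.StringIO(
    "country,date,total_cases\n"
    "A,2020-01-01,100\n"
    "A,2020-01-02,150\n"
    "B,2020-01-01,20\n"
)


def load_dataset(source):
    """Read a CSV file (path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


data = load_dataset(sample_csv)  # real run: load_dataset("transformed_data.csv")

# Quick checks that the import worked as expected.
print(data.head())   # first few rows
print(data.shape)    # (rows, columns)
print(data.dtypes)   # column types; dates are still plain strings here
```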
Data preparation:
Before analysing the data, we should check that it is clean, consistent and ready for analysis.
This involves handling missing values, removing duplicates, converting data types (certain columns,
such as the date column, need to be converted to appropriate types, e.g. from string to datetime, to
allow easy manipulation), feature engineering, data manipulation, data filtering and aggregating the data.
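The preparation steps listed above can be sketched as follows. This is a minimal example on toy data,
not the exact procedure used in the project: the column names are hypothetical, and forward-filling
missing counts within each country is only one possible way to handle gaps.

```python
import numpy as np
import pandas as pd

# Toy data with one duplicated row and one missing value (hypothetical columns).
raw = pd.DataFrame({
    "country": ["A", "A", "A", "B"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-01"],
    "total_cases": [1.0, 1.0, np.nan, 5.0],
})


def prepare(df):
    """Remove duplicates, convert dates, and fill gaps in the case counts."""
    out = df.drop_duplicates().copy()
    out["date"] = pd.to_datetime(out["date"])            # string -> datetime
    out = out.sort_values(["country", "date"]).reset_index(drop=True)
    # Assumption: a missing cumulative count is carried forward from the
    # previous date of the same country.
    out["total_cases"] = out.groupby("country")["total_cases"].ffill()
    return out


print(raw.isna().sum())  # missing values per column before cleaning
clean = prepare(raw)
print(clean)
```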
The dataset used here consists of two data files. One file contains the raw data, and the other
contains the transformed data. We have to use both files for this task, as they contain equally
important information in different columns.
Checking both datasets one by one:
Fig 3
Fig 2
From Fig 2, after forming an initial impression of both datasets, I found that they have to be
combined into a new dataset. Before creating the new dataset, however, we have to check
how many samples are present for each country.
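Counting the samples per country is typically a one-line operation in Pandas. The sketch below uses
a toy table with a hypothetical country column; the real column name may differ.

```python
import pandas as pd

# Toy table: country "B" has fewer rows than the others.
data = pd.DataFrame({"country": ["A", "A", "A", "B", "B", "C", "C", "C"]})

# Number of rows (samples) recorded for each country.
samples_per_country = data["country"].value_counts()
print(samples_per_country)
```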
Fig 3
Fig 4
From Figure 3, it is clear that the dataset contains an unequal number of samples for each
country, so we calculate the mode of the per-country sample counts.
The mode value is 294, and we will use this value to divide the sum of all the samples related
to the Human Development Index (HDI), GDP per capita, and population. After performing
these calculations, we will create a new dataset by combining the relevant columns from both
datasets, ensuring the necessary data is correctly aggregated for further analysis.
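A minimal sketch of this mode-based aggregation is given below. The toy data and the column names
(country, hdi, population) are hypothetical; only the idea of dividing per-country sums by the mode of
the sample counts is taken from the description above.

```python
import pandas as pd

# Toy data: "A" and "C" have 3 rows each, "B" has only 2.
data = pd.DataFrame({
    "country": ["A"] * 3 + ["B"] * 2 + ["C"] * 3,
    "hdi": [0.5] * 3 + [0.8] * 2 + [0.6] * 3,
    "population": [100] * 3 + [200] * 2 + [300] * 3,
})

# Most common number of samples per country (294 in the report, 3 here).
counts = data["country"].value_counts()
mode_count = counts.mode()[0]

# Sum the repeated values per country, then divide by the mode.
sums = data.groupby("country")[["hdi", "population"]].sum()
aggregated = (sums / mode_count).reset_index()
print(aggregated)
```

Dividing by the mode recovers the original value only for countries that have exactly that many rows
(country "B" above is understated); `data.groupby("country").mean()` would divide by each country's
own row count instead.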
Aggregating the data:
Fig 5
The GDP per capita column has not yet been included due to the absence of accurate GDP per
capita data in the dataset. Since manually collecting this data for all countries would be both
time-consuming and challenging, I decided to focus on the top 10 countries with the highest
number of COVID-19 cases and retrieve their GDP per capita values. To implement this, the
dataset needs to be sorted first, after which the GDP per capita values can be added for these
selected countries.
The dataset is sorted and the GDP per capita values are then added:
Fig 6
From Figure 6, it is evident that the addition of the GDP per capita column was successful, and
the data appears to be accurate without any issues.
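The sort-and-annotate step can be sketched as below. The table, the column names and the GDP numbers
are placeholders for illustration only (the report uses the top 10 countries and values looked up
from an external source).

```python
import pandas as pd

# Toy aggregated table (hypothetical values).
aggregated = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Total Cases": [50, 400, 120, 10],
})

top_n = 3  # the report keeps the top 10 countries
top = (
    aggregated.sort_values("Total Cases", ascending=False)
    .head(top_n)
    .reset_index(drop=True)
)

# Placeholder GDP per capita values keyed by country name.
gdp_lookup = {"A": 3000.0, "B": 1000.0, "C": 2000.0}
top["GDP Per Capita"] = top["Country"].map(gdp_lookup)
print(top)
```

Mapping by country name rather than assigning a plain list keeps each value attached to the right
country even if the sort order changes.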
Analysing the spread of Covid-19
To begin the analysis, we will first focus on examining the spread of COVID-19 in countries
that have reported the highest number of cases.
Fig 7
In Figure 7, the numbers for total cases are represented in millions. The graph shows that the
USA leads with the highest total number of COVID-19 cases, with Brazil and India also among
the top three.
From the dataset, we observe that countries like India, Russia, and South Africa have a
relatively lower death rate compared to the total number of COVID-19 cases. This indicates
that, despite having high numbers of cases, the proportion of deaths to cases is comparatively
low in these countries.
Next, we check the total number of deaths among the countries with the highest number of
COVID-19 cases.
Fig 8
In Fig 8, the numbers for total deaths are represented in millions. From Fig 8, we can see that
the USA is also leading in the total number of deaths caused by COVID-19, followed by Brazil
and India in the second and third positions.
Apart from the graph, the data show that the death rate in India, Russia and South
Africa is comparatively low relative to the total number of COVID-19 cases.
We therefore compare the total number of cases and the total number of deaths in all these
countries.
Fig 9
In Figure 9, the numbers on the y-axis are represented in millions.
Fig 10
In Figure 10, the share of total COVID-19 cases is 96.5%, while the share of total
deaths is 3.49%. These shares are taken relative to the combined total of cases and deaths.
The death rate is calculated as follows:
Death rate = (Sum of total deaths / Sum of total cases) * 100,
which gives a death rate of 3.614421%.
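The formula above, and the reason it differs slightly from the pie-chart share, can be checked with a
short calculation. The numbers below are toy values chosen for easy arithmetic, not the project's
data, and the column names are assumed.

```python
import pandas as pd

# Toy totals per country (hypothetical values).
df = pd.DataFrame({
    "Total Cases": [1000, 500, 500],
    "Total Deaths": [40, 10, 22],
})

total_cases = df["Total Cases"].sum()
total_deaths = df["Total Deaths"].sum()

# Death rate: deaths as a percentage of cases.
death_rate = total_deaths / total_cases * 100

# Pie-chart share: deaths as a percentage of (cases + deaths).
death_share = total_deaths / (total_cases + total_deaths) * 100

print(f"Death rate:  {death_rate:.2f}%")
print(f"Death share: {death_share:.2f}%")
```

Because the pie chart divides by the combined total, its deaths slice is always slightly smaller than
the death rate computed against cases alone.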
Another important variable in this dataset is the stringency index, which is a composite
measure that evaluates a country's response to the pandemic. It includes factors such as school
closures, workplace closures, and travel bans. The stringency index helps to indicate how
strictly a country is implementing measures to control the spread of COVID-19.
Fig 11
In Figure 11, the numbers for total cases are represented in millions. From the figure, it is
evident that India has implemented a relatively strict stringency policy, while the USA has a
moderate stringency level, and Brazil has adopted a more lenient approach compared to India.
These countries are compared because they ranked among the top three in terms of total cases
and deaths. Despite India's strong performance in terms of stringency, the country still
experienced a high number of cases and deaths, largely due to its large population.
Now, we can analyse the impact of COVID-19 on the economy. To measure the impact, we
compare the GDP per capita before COVID-19 with the GDP per capita during COVID-19.
First, the GDP per capita before COVID-19:
Fig 12
Fig 13
Comparing GDP per capita before and during Covid-19:
Fig 14
This task focuses on analyzing the global spread of COVID-19 and its impact on the economy.
The United States experienced the highest number of COVID-19 cases and deaths, which can
be attributed in part to the country's relatively low stringency index in comparison to its
population. Additionally, the analysis explored how the GDP per capita of various countries
was affected by the pandemic, highlighting the economic repercussions of COVID-19 across
the globe.
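One way to quantify the before/during comparison shown in Fig 14 is a percentage change per country.
The sketch below is an illustration under assumed column names and placeholder values; it is not taken
from the project code.

```python
import pandas as pd

# Placeholder GDP per capita values (hypothetical).
gdp = pd.DataFrame({
    "Country": ["A", "B"],
    "GDP Before Covid": [1000.0, 2000.0],
    "GDP During Covid": [900.0, 2100.0],
})

# Relative change from the pre-pandemic value, in percent.
gdp["Change (%)"] = (
    (gdp["GDP During Covid"] - gdp["GDP Before Covid"])
    / gdp["GDP Before Covid"]
    * 100
)
print(gdp)
```

A negative value indicates a drop in GDP per capita during the pandemic, which makes countries of
very different income levels directly comparable.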
Conclusion
The examination of COVID-19's impact through data science methods provided a
comprehensive approach to understanding the pandemic's multifaceted effects on both public
health and the economy. Through meticulous data collection, preprocessing, and advanced
analysis, the project yielded valuable insights that can guide future public health strategies and
economic policies. This experience underscored the significant role of data science in tackling
pressing societal issues and highlighted the power of data-driven solutions in supporting
informed decision-making during times of crisis.
Appendix:
Code Samples:
This appendix provides more detailed code samples from the projects completed during the
internship. These samples demonstrate the practical application of data science techniques and
programming skills developed throughout the internship.
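• Data preprocessing:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

# Load datasets
transformed_data = pd.read_csv("transformed_data.csv")
raw_data = pd.read_csv("raw_data.csv")

# Data Visualization
# aggregated_data is the combined DataFrame built in the "Aggregating the data" step.
# The 'Total Deaths' and GDP column names below are assumed; adjust them to match it.
countries = aggregated_data['Country']
total_cases = aggregated_data['Total Cases']
total_deaths = aggregated_data['Total Deaths']
gdp_before = aggregated_data['GDP Before Covid']
gdp_during = aggregated_data['GDP During Covid']
index = np.arange(len(countries))  # x position of each country's bar group
bar_width = 0.4                    # width of a single bar

# Grouped bar chart: total cases and total deaths per country
plt.figure(figsize=(12, 6))
plt.bar(index, total_cases, bar_width, label='Total Cases', color='indianred')
plt.bar(index + bar_width, total_deaths, bar_width, label='Total Deaths', color='lightsalmon')
plt.title('Countries with Total Cases and Deaths', fontsize=16)
plt.xlabel('Country', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.xticks(index + bar_width / 2, countries, rotation=-45, ha='right')
plt.legend()
plt.tight_layout()
plt.show()

# Grouped bar chart: GDP per capita before and during COVID-19
plt.figure(figsize=(10, 5))
plt.bar(index, gdp_before, bar_width, label='GDP Before Covid-19', color='indianred')
plt.bar(index + bar_width, gdp_during, bar_width, label='GDP During Covid-19',
        color='lightsalmon')
plt.title('GDP Per Capita Before and During Covid-19', fontsize=16)
plt.xlabel('Country', fontsize=12)
plt.ylabel('GDP Per Capita', fontsize=12)
plt.xticks(index + bar_width / 2, countries, rotation=-45, ha='right')
plt.legend()
plt.tight_layout()
plt.show()

# Bar chart of total cases, with each bar coloured by the stringency index
stringency = aggregated_data['Stringency Index'].to_numpy()
norm = Normalize(vmin=stringency.min(), vmax=stringency.max())
sm = ScalarMappable(cmap='coolwarm', norm=norm)  # drives the colour bar
sm.set_array([])
plt.figure(figsize=(10, 6))
bars = plt.bar(aggregated_data['Country'], aggregated_data['Total Cases'],
               color=plt.cm.coolwarm(norm(stringency)))
plt.title("Stringency Index during Covid-19", fontsize=16)
plt.xlabel('Country', fontsize=12)
plt.ylabel('Total Cases', fontsize=12)
plt.xticks(rotation=90)
cbar = plt.colorbar(sm, ax=plt.gca())  # attach the colour bar to the current axes
cbar.set_label('Stringency Index')
plt.tight_layout()
plt.show()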
Skills acquired
1. Python Programming
Improved proficiency in Python, with a focus on key libraries for data science, including
Pandas for data manipulation, NumPy for numerical analysis, Matplotlib and Seaborn
for data visualization, and Scikit-learn for machine learning and predictive modeling.
3. Data Visualization
Developed strong skills in data visualization, utilizing Matplotlib and Seaborn to create
clear and effective charts, graphs, and heatmaps that communicated trends and insights
to stakeholders.
4. Machine Learning
Acquired hands-on experience with machine learning techniques such as regression
analysis, decision trees, and time-series forecasting. Learned to build, train, and evaluate
models using relevant performance metrics.
5. Statistical Analysis
Enhanced understanding of statistical methods, including hypothesis testing, correlation
analysis, and significance testing, applied to explore relationships between variables in
the COVID-19 dataset.
7. Effective Communication
Improved the ability to clearly present technical findings to non-technical stakeholders,
delivering insights through reports, presentations, and visualizations that facilitated data-
driven decision-making.
SUMMARY
My internship experience at TechQuant has been an invaluable learning journey, providing both
foundational knowledge and practical, hands-on experience in the field of data science. The initial
training program laid a solid foundation by introducing me to essential tools such as Python and
key libraries like NumPy, pandas, Matplotlib, and Seaborn. These tools proved to be
indispensable throughout the internship as I applied them in real-world scenarios, particularly
when analyzing the vast datasets related to the impact of COVID-19 on public health and the
global economy.
One of the key takeaways from the internship was the importance of data cleaning and
preparation. As I worked with large and complex datasets, I gained significant exposure to the
challenges associated with data cleaning, such as handling missing values, removing duplicates,
and ensuring the accuracy of data. This hands-on experience with data manipulation using pandas
and NumPy was invaluable, as it not only enhanced my technical skills but also highlighted the
critical role that data preparation plays in ensuring the reliability of analyses and the robustness
of conclusions.
Additionally, the application of exploratory data analysis (EDA) techniques and machine learning
algorithms was a major component of my internship. Using Python libraries like Matplotlib and
Seaborn for data visualization, I was able to uncover trends and patterns within the data, which
facilitated meaningful insights into the effects of COVID-19. For example, I was able to identify
correlations between socioeconomic factors and public health outcomes, and utilize machine
learning techniques for predictive modeling. These experiences deepened my understanding of
how to leverage data science to inform decision-making and solve complex real-world problems.
A significant aspect of this internship was the opportunity to collaborate with a multidisciplinary
team. Working alongside experts from various fields, such as economics, public health, and policy
analysis, enriched my understanding of how data science can be applied in different contexts. The
cross-disciplinary collaboration also helped me improve my communication skills, as I had to
present technical findings in a way that was accessible and actionable to non-technical
stakeholders. This experience highlighted the importance of effective communication in data
science, as it is essential for translating complex data insights into clear recommendations that
drive decision-making.
Moreover, the internship provided me with insights into the ethical dimensions of data science.
Handling sensitive data, particularly related to public health, underscored the importance of
privacy, transparency, and ethical responsibility in the analysis and interpretation of data. I
learned to approach data analysis with a critical eye, ensuring that my findings were both
scientifically rigorous and socially responsible. This ethical perspective is crucial as data science
continues to play a pivotal role in shaping policy, public opinion, and business strategies.
As I reflect on this internship, I realize how much I have grown as a data scientist. The
combination of technical skills, practical experience, and exposure to interdisciplinary
collaboration has prepared me to take on future challenges in the field of data science. I have
gained confidence in my ability to handle complex datasets, draw meaningful insights, and
communicate findings effectively. The experience has reinforced my passion for using data
science to address global challenges, such as public health crises, and has inspired me to continue
exploring ways in which data can be used to improve the world.
Looking ahead, I am eager to build on the knowledge and skills I acquired during my internship.
The technical expertise I gained with tools like Python, Pandas, and machine learning algorithms
will serve as a strong foundation as I continue to pursue my career in data science. Furthermore,
the insights I gained from working on real-world projects, such as analyzing the impact of
COVID-19, have sparked a desire to explore more complex and impactful problems that can
benefit from data-driven solutions.
Conclusion
In addition to the technical expertise I developed, this internship sharpened my ability to think
critically and approach problems in a structured manner. Working on a large-scale project from
start to finish, and collaborating with a diverse team of professionals, has provided me with a
solid foundation for my future career in data science.
The comprehensive nature of this project—from data collection and cleaning to predictive
modeling—highlighted the importance of data in guiding decision-making. This experience
has inspired me to seek out more opportunities where I can apply data science to tackle pressing
global challenges. Overall, the internship not only expanded my technical skills but also
enhanced my ability to communicate complex findings clearly and create impactful, data-
driven solutions.