Internship 74

Uploaded by

infinitescientist7


CSD 3159 – INTERNSHIP REPORT

J.Mohideen Sathak Uzham

220171601074

BONAFIDE CERTIFICATE

Certified that this is the Bonafide record of the work done by

J. Mohideen Sathak Uzham of Register No : 220171601074 of V

semester B.Tech – ARTIFICIAL INTELLIGENCE AND DATA
SCIENCE in the CSD 3159 INTERNSHIP during the year 2024-25.

Course Faculty Head of the Department

Date:

Submitted for the Practical Examination held on

Internal Examiner
Name of the Student : J. Mohideen Sathak Uzham
RRN : 220171601074
Department of the Student : B. Tech Artificial Intelligence & Data Science
Semester from : July 2024 – December 2024
Year & Section : IIIrd Year - V Semester- B

CHAPTER    TITLE

1. ACKNOWLEDGEMENT SECTION/CERTIFICATE
2. ABSTRACT
3. INTRODUCTION
4. ROLES & RESPONSIBILITIES
5. OBJECTIVES
6. PROJECT OVERVIEW
7. APPENDIX
8. SKILLS ACQUIRED
9. SUMMARY & REFERENCES
10. RESULT

Abstract

This report presents a comprehensive overview of my internship experience as a Data Science
intern at TechQuant (techquant.in), highlighting the significant role of Python programming in
data analysis and machine learning projects. The internship provided me with an invaluable
opportunity to engage with real-world datasets, develop data-driven insights, and implement
predictive models aimed at addressing pressing business and social issues. A pivotal project
during my internship involved a detailed analysis of the impact of COVID-19 on various
socioeconomic factors. This project entailed utilizing extensive datasets from reputable public
health sources, governmental reports, and economic indicators to uncover critical trends and
correlations that could inform public policy and business strategies.
The primary objectives of the COVID-19 analysis project included examining the effects of the
pandemic on employment rates across different sectors, assessing changes in healthcare access
and utilization, and exploring the relationship between government interventions, such as
lockdown measures and financial aid, and public health outcomes. To achieve these objectives, I
employed a systematic approach that involved several key methodologies: data cleaning and
preprocessing to ensure dataset accuracy, exploratory data analysis (EDA) to identify patterns
and anomalies, and the application of machine learning algorithms to build predictive models.
Throughout the internship at TechQuant, I utilized a range of Python libraries, including Pandas
for data manipulation, NumPy for numerical analysis, Matplotlib and Seaborn for data
visualization, and Scikit-learn for implementing machine learning algorithms. These tools
allowed me to extract meaningful insights from the data, leading to a better understanding of the
complex interplay between COVID-19 and socioeconomic dynamics.
This report also details the challenges encountered during the internship, such as managing
missing data, navigating the complexities of correlational analyses, and effectively
communicating findings to stakeholders. The skills and knowledge gained throughout this
experience significantly enhanced my technical proficiency and deepened my understanding of
data science principles and practices. Ultimately, the insights derived from the COVID-19
analysis not only contributed to TechQuant’s strategic objectives but also reinforced the critical
importance of data science in addressing contemporary global challenges. This internship
experience solidified my passion for the field of data science and underscored its potential to drive
meaningful change in society.

Introduction

In the contemporary digital age, Data Science has emerged as a crucial discipline, enabling organizations
to leverage vast amounts of data for informed decision-making and strategic planning. My internship at
TechQuant (techquant.in) as a Data Science intern provided me with a unique opportunity to delve into
this dynamic field, particularly through the lens of real-world applications. The experience allowed me to
engage in diverse projects, but one that stood out was a comprehensive analysis of the impact of COVID-
19 on various socioeconomic factors. This project was not only timely and relevant but also critical in
understanding the far-reaching effects of the pandemic.

During this internship, I worked with extensive datasets from various sources, including public health
records and economic indicators. My primary focus was on employing Python, a versatile programming
language widely regarded for its effectiveness in data manipulation, analysis, and visualization. Utilizing
libraries such as Pandas for data cleaning, NumPy for numerical analysis, and Matplotlib and Seaborn for
data visualization, I was able to extract meaningful insights and identify significant trends resulting from
the pandemic.

The project involved several key steps: first, gathering and cleaning datasets to ensure accuracy and
completeness; next, conducting exploratory data analysis (EDA) to identify patterns and relationships
within the data; and finally, applying machine learning algorithms to predict outcomes and draw
conclusions. Throughout the process, I encountered various challenges, such as dealing with missing data
and understanding complex correlations, but these experiences proved instrumental in honing my problem-
solving skills.

The most significant of these challenges was handling missing data while keeping the analyses robust
despite the gaps. To address this, I employed imputation techniques and sensitivity analyses to assess
the impact of missing values on my findings. Understanding complex correlations and potential
confounding variables also required a thorough exploration of the data, for which I used visualizations
built with Matplotlib and Seaborn to examine the relationships and communicate my findings effectively.

The culmination of this project was not merely to generate insights for TechQuant but also to contribute
to the broader understanding of how COVID-19 has reshaped our world. The findings indicated significant
disparities in the socioeconomic impact of the pandemic across different regions and demographics,
highlighting areas where targeted interventions could be most beneficial. This project not only enhanced
my technical skills but also underscored the importance of data-driven decision-making in addressing
complex global challenges.

This report aims to provide a comprehensive overview of my internship journey, detailing
the projects undertaken, methodologies employed, and the key learnings acquired throughout this
enriching experience. By examining the impact of COVID-19 through data science, I gained not only
technical expertise but also a deeper appreciation for the role of data science in informing public policy
and improving community well-being.

Throughout the internship, I also had the opportunity to collaborate with a multidisciplinary team, which
was an invaluable aspect of the experience. Working closely with experts from various domains such as
public health, economics, and policy analysis helped me understand the broader context in which data
science operates. The team dynamic fostered a collaborative environment, allowing me to approach
problems from different angles and gain diverse perspectives on complex issues. This collaboration also
helped me improve my communication skills, particularly when presenting data-driven insights to
non-technical stakeholders, ensuring that the findings were both understandable and actionable.

Moreover, this internship experience helped me appreciate the importance of ethics in data science.
Working with sensitive public health data highlighted the need for responsible data handling and
transparency. Ensuring data privacy and mitigating bias were critical considerations throughout the
project. I learned to approach data analysis with a mindset that prioritizes ethical considerations, ensuring
that the conclusions drawn were not only scientifically sound but also socially responsible. This aspect of
the internship reinforced my understanding of the broader implications of data science in shaping public
perceptions and decisions.

Looking ahead, the skills and knowledge I acquired during this internship have equipped me with a solid
foundation to continue pursuing a career in data science. The experience of working on a real-world
problem like the COVID-19 pandemic, combined with the technical tools I learned to use, has deepened
my interest in applying data science to solve pressing global challenges. I am now more confident in my
ability to analyze complex datasets, interpret findings, and contribute meaningfully to the growing field of
data science. As I move forward in my career, I aim to continue leveraging data science to drive positive
change and contribute to meaningful solutions for society's most pressing issues.

ROLES AND RESPONSIBILITIES

Role:
I was selected as a Data Science intern at TechQuant, where I was trained and then worked on a project.

Responsibility:

During my data science internship at TechQuant (techquant.in), my role involved working on real-
world projects where I applied my skills in data analysis to tackle complex problems. I was responsible
for tasks such as cleaning and preparing data, building predictive models, and visualizing results to
extract meaningful insights. Leveraging Python and libraries like NumPy and Pandas for data
handling, along with Matplotlib and Seaborn for data visualization, I also explored techniques to
optimize machine learning models. Throughout the program, I demonstrated dedication, creativity, and
a proactive approach to learning, contributing significantly to the team's projects while enhancing my
expertise in data science.

Following the completion of my training period, I was assigned a major project: "COVID-19 Impact
Analysis Using Python." This project required the integration of both technical and soft skills.
Effective communication was vital, as I needed to clearly explain my analysis and ensure the data
visualizations were accessible and easy to interpret. My focus was on creating clear, insightful
representations of data that allowed stakeholders to understand the trends and conclusions drawn from
my work.

In addition to technical work, I created comprehensive data visualizations and detailed reports to
effectively communicate findings to stakeholders, presenting insights in a clear and actionable manner.
Collaboration with team members and domain experts played a significant role in this project, allowing
me to contribute to discussions on data-driven strategies and solutions. This experience not only
sharpened my technical abilities but also improved my communication and teamwork skills.

These responsibilities highlight the technical expertise and collaborative efforts I brought to the
project, reflecting the value I added to the organization. A detailed overview of the project I completed
is presented in the following pages.

OBJECTIVES

1. Enhance Technical Proficiency in Data Science

During my internship at TechQuant (techquant.in), I focused on developing and refining
my programming skills in Python, leveraging libraries like Pandas, NumPy, Matplotlib,
Seaborn, and Scikit-learn. These tools allowed me to efficiently handle data
manipulation, analysis, and visualization, enhancing my understanding of the data
science workflow and strengthening my technical foundation.
2. Apply Data Science Methodologies
I gained extensive hands-on experience with key data science methodologies, including
data cleaning, exploratory data analysis (EDA), feature engineering, and predictive
modeling. Through the COVID-19 Impact Analysis Using Python project, I applied
statistical techniques and machine learning algorithms to real-world datasets, solving
practical problems and deriving actionable insights.
3. Understand Real-World Applications of Data Science
My work exposed me to the application of data science across domains such as public
health, economic analysis, and policy-making. By analyzing the socioeconomic impacts
of COVID-19, I witnessed firsthand how data-driven insights can inform decisions and
strategies in diverse sectors. Additionally, I explored case studies and trends in data
science, deepening my understanding of its transformative role in contemporary
challenges.
4. Develop Problem-Solving Skills
Tackling complex datasets during the internship helped me cultivate strong problem-
solving and critical-thinking skills. I systematically approached challenges, such as
managing missing data and identifying meaningful insights, by employing analytical
methods and iterative experimentation to ensure robust and accurate solutions.
5. Foster Effective Communication Skills
Communicating technical findings effectively was a key aspect of my role. I created
comprehensive visualizations and detailed reports to present insights to non-technical
stakeholders, ensuring clarity and impact. By tailoring my communication style to
different audiences, I honed my ability to articulate complex concepts in a simple and
actionable manner.

Summary
Overall, the objectives of my internship at TechQuant were met through a combination of
technical skill development, practical applications, and personal growth. This experience
solidified my passion for data science and provided me with the tools and insights necessary to
excel in this dynamic field. Moving forward, I am motivated to continue exploring the vast
potential of data science, applying my expertise to address critical challenges and contribute
positively to society.

Project overview:

The project I undertook during my internship was titled “COVID-19 Impact Analysis Using
Python.” In this project, I leveraged Python and its versatile libraries to analyze and visualize
the economic impact of the COVID-19 pandemic. Specifically, I utilized libraries like Pandas
and NumPy for data cleaning and manipulation, ensuring the dataset was accurate and ready
for analysis. To make the insights accessible and understandable, I employed Matplotlib and
Seaborn to create clear and informative visualizations, transforming raw data into meaningful
graphs and charts.

The outbreak of COVID-19 brought unprecedented challenges and restrictions, resulting in
significant disruptions to the global economy. Virtually every country experienced adverse
effects as the number of cases surged, leading to reduced economic activity, shifts in
employment, and changes in consumer behavior. Through this project, I focused on examining
these impacts, analyzing trends, and drawing insights that could provide a better understanding
of the socioeconomic consequences of the pandemic.

This analysis not only enhanced my technical proficiency in Python but also deepened my
awareness of how data science can be used to study and address real-world challenges.

Covid-19 Impact analysis

The first wave of COVID-19 had a profound impact on the global economy, catching the world
unprepared for such an unprecedented disaster. This pandemic led to a sharp rise in cases,
deaths, unemployment, and poverty, causing a significant economic slowdown. The primary
goal of my project, “COVID-19 Impact Analysis Using Python,” was to analyze the spread of
COVID-19 cases and evaluate its various economic impacts.
Dataset Description
To conduct this analysis, I used a dataset sourced from Kaggle, which provided comprehensive
information about COVID-19's socioeconomic impact. The dataset included:
1. Country Code: Unique identifier for each country.
2. Name of Countries: Names of all countries in the dataset.
3. Date of Record: The specific dates for which data was recorded.
4. Human Development Index (HDI): A measure of each country's development.
5. Daily COVID-19 Cases: Number of confirmed cases recorded daily.
6. Daily Deaths Due to COVID-19: Number of deaths reported daily due to the virus.
7. Stringency Index: A measure of the strictness of government policies (e.g., lockdowns).
8. Population of the Countries: Total population figures for each country.
9. GDP per Capita: Economic output per person, indicating the financial health of each
country.

Covid-19 impact analysis using python:
The project was carried out in Google Colab.
Libraries used: pandas, numpy, matplotlib, seaborn.
To begin the analysis, we first import the necessary libraries and then load the dataset.

Fig: 1

In Fig 1, the libraries and the dataset are imported, and the dataset is then checked to confirm that it
was loaded correctly.
The dataset contains COVID-19 case data from December 31, 2019 to October 10, 2020.

Data preparation:

Before analysing the data, we should check that it is clean, consistent and ready for analysis. This
involves handling missing values and duplicates, converting data types (certain columns, like the
date column, need to be converted to appropriate data types, e.g. from string to datetime, to allow for
easy manipulation), feature engineering, and filtering and aggregating the data.
The dataset used here consists of two data files: one contains the raw data and the other contains a
transformed version. Both files are needed for this task, as they contain equally important
information in different columns.
Checking both datasets one by one:
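
The preparation checks described above (missing values, duplicates, date conversion) can be sketched roughly as follows. This is a minimal illustration on a toy table, not the notebook shown in the figures; the column names "COUNTRY", "DATE" and "HDI" and the helper `prepare` are assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for one of the two CSV files; the column names and values
# are assumptions made for this illustration only.
df = pd.DataFrame({
    "COUNTRY": ["A", "A", "A", "B", "B"],
    "DATE": ["2020-01-01", "2020-01-02", "2020-01-02", "2020-01-01", "2020-01-02"],
    "HDI": [0.5, 0.5, 0.5, None, 0.8],
})


def prepare(frame: pd.DataFrame) -> pd.DataFrame:
    """Report basic quality issues, then return a cleaned copy."""
    print("Missing values per column:")
    print(frame.isna().sum())
    print("Duplicate rows:", frame.duplicated().sum())

    # Drop exact duplicate rows and work on a copy of the result
    cleaned = frame.drop_duplicates().copy()
    # Convert date strings to datetime so they can be sorted and filtered
    cleaned["DATE"] = pd.to_datetime(cleaned["DATE"])
    return cleaned


clean_df = prepare(df)
print(clean_df.dtypes)
```

Missing values are only reported here, not filled; how to treat them depends on the column and is left open in this sketch.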

Fig 3

Fig 2
From Fig 2, after forming an initial impression of both datasets, I found that they need to be
combined into a new dataset. Before creating it, we have to check how many samples each
country has in the dataset.

Fig 3

Fig 4
From Figure 3, it is clear that the dataset contains an unequal number of samples for each
country, so a single common sample count is needed to average the per-country values. For this
I use the mode of the sample counts, that is, the number of samples that occurs most often.
The mode value is 294, and we use it to divide the per-country sums of the Human
Development Index (HDI), stringency index, and population. After performing
these calculations, we create a new dataset by combining the relevant columns from both
datasets, ensuring the necessary data is correctly aggregated for further analysis.
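
The mode-based averaging described above can be sketched on a toy table as follows. The data are illustrative placeholders, and the comparison with a per-country mean is an addition for context rather than part of the original analysis.

```python
import pandas as pd

# Toy data: country "C" has fewer rows than the others (values are illustrative)
df = pd.DataFrame({
    "COUNTRY": ["A"] * 3 + ["B"] * 3 + ["C"] * 2,
    "HDI": [0.9, 0.9, 0.9, 0.6, 0.6, 0.6, 0.7, 0.7],
})

samples_per_country = df["COUNTRY"].value_counts()
# mode() returns a Series because ties are possible; take the first value
mode_count = int(samples_per_country.mode().iloc[0])

hdi_sum = df.groupby("COUNTRY")["HDI"].sum()
hdi_by_mode = hdi_sum / mode_count                 # approach described in the report
hdi_by_mean = df.groupby("COUNTRY")["HDI"].mean()  # divides by each country's own count

comparison = pd.DataFrame({"sum / mode": hdi_by_mode, "mean": hdi_by_mean})
print(comparison)
```

For countries whose sample count equals the mode, the two columns agree. For a country with fewer samples, dividing by the mode understates the average, so a per-country mean may be worth considering when the counts differ a lot.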
Aggregating the data:

Fig 5

The GDP per capita column has not yet been included because the dataset lacks accurate GDP per
capita data. Since manually collecting this data for all countries would be both
time-consuming and challenging, I decided to focus on the 10 countries with the highest
number of COVID-19 cases and retrieve their GDP per capita values. To do this, the
dataset is first sorted by total cases, after which the GDP per capita values are added for the
selected countries:

Fig 6
From Figure 6, it is evident that the addition of the GDP per capita column was successful, and
the data appears to be accurate without any issues.
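
As a small sketch of this sort-then-annotate step, the example below keeps the rows with the most cases and attaches GDP values through a lookup keyed by country name. The table, the country labels and the GDP numbers are hypothetical placeholders, and the name-keyed lookup is a variation on the list assignment used in the appendix.

```python
import pandas as pd

# Toy aggregated table; names and numbers are placeholders, not report data
aggregated = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Total Cases": [500, 1200, 300, 900],
})

# Keep the countries with the most cases (the report keeps 10; 3 here)
top = aggregated.sort_values(by="Total Cases", ascending=False).head(3).copy()

# Hypothetical GDP per capita lookup, keyed by country name
gdp_lookup = {"A": 10000.0, "B": 42000.0, "D": 7500.0}
top["GDP Per Capita"] = top["Country"].map(gdp_lookup)
print(top)
```

Assigning a plain list depends on the rows being in exactly the expected order, whereas a lookup by name keeps each value attached to its country even if the sort order changes.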

Analysing the spread of Covid-19
To begin the analysis, we will first focus on examining the spread of COVID-19 in countries
that have reported the highest number of cases.

Fig 7

Figure 7 shows the total number of cases for the ten countries with the highest number of
reported COVID-19 cases, with the USA reporting the most.
Next, we check the total number of deaths among the countries with the highest number of
COVID-19 cases.

Fig 8

In Fig 8, the total deaths are shown in millions. The USA also leads in the total number of
deaths caused by COVID-19, followed by Brazil and India in second and third place.
Beyond the graph, the data show that the death rate in India, Russia and South
Africa is comparatively low relative to their total number of COVID-19 cases.
We therefore compare the total number of cases and the total number of deaths in all these
countries.
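
One way to make the comparison of deaths against cases concrete is to compute a per-country death rate. The sketch below uses placeholder countries and totals, not values from the dataset, and is only meant to show the calculation.

```python
import pandas as pd

# Placeholder totals for illustration only (not taken from the dataset)
totals = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Total Cases": [1_000_000, 400_000, 250_000],
    "Total Deaths": [20_000, 16_000, 2_500],
})

# Deaths as a percentage of cases, per country
totals["Death Rate (%)"] = totals["Total Deaths"] / totals["Total Cases"] * 100

# Sort so that the countries with the lowest death rate come first
ranked = totals.sort_values("Death Rate (%)")
print(ranked[["Country", "Death Rate (%)"]])
```

A country can have the most cases and still rank low on this measure, which is the pattern the text describes for India, Russia and South Africa.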

Fig 9
In Figure 9, the numbers on the y-axis are represented in millions.

Fig 10

In Figure 10, total COVID-19 cases account for 96.5% of the pie and total deaths for 3.49%.
These shares are computed over the combined total of cases and deaths.
The death rate is calculated against cases only: Death rate = (Sum of total deaths / Sum of total
cases) * 100,
which gives a death rate of 3.614421%.
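
The pie-chart share of deaths and the death rate differ slightly because they use different denominators. The short check below starts from the reported death rate and derives the two pie slices; the variable names are illustrative, and the only input taken from the report is the death rate itself.

```python
# Death rate reported in the text (deaths as a percentage of cases)
death_rate_pct = 3.614421

# Express deaths per 100 cases, then compute each slice of a two-slice pie
cases = 100.0
deaths = death_rate_pct  # deaths per 100 cases

# A pie chart of [cases, deaths] divides each value by their combined total
deaths_share = deaths / (cases + deaths) * 100
cases_share = cases / (cases + deaths) * 100

print(f"Deaths share of pie: {deaths_share:.2f}%")
print(f"Cases share of pie:  {cases_share:.2f}%")
```

The death rate divides deaths by cases, while the pie chart divides deaths by cases plus deaths, so the pie slice is always a little smaller than the death rate.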
Another important variable in this dataset is the stringency index, which is a composite
measure that evaluates a country's response to the pandemic. It includes factors such as school
closures, workplace closures, and travel bans. The stringency index helps to indicate how
strictly a country is implementing measures to control the spread of COVID-19.

Fig 11

In Figure 11, the numbers for total cases are represented in millions. From the figure, it is
evident that India has implemented a relatively strict stringency policy, while the USA has a
moderate stringency level, and Brazil has adopted a more lenient approach compared to India.
These countries are compared because they ranked among the top three in terms of total cases
and deaths. Despite India's strong performance in terms of stringency, the country still
experienced a high number of cases and deaths, largely due to its large population.

Now we can analyse the impact of COVID-19 on the economy by comparing the GDP per capita
before COVID-19 with the GDP per capita during COVID-19.
First, GDP per capita before COVID-19:

Fig 12

GDP per capita during covid-19:

Fig 13

Comparing GDP per capita before and during Covid-19:

Fig 14
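
The before/during comparison can also be expressed as a percentage change per country. The sketch below uses the GDP per capita values listed in the appendix, in the same order (the ten countries sorted by total cases); the column names "before", "during" and "change_pct" are chosen for this illustration.

```python
import pandas as pd

# GDP per capita values as listed in the appendix, in top-10 order by total cases
gdp_before = [65279.53, 8897.49, 2100.75, 11497.65, 7027.61,
              9946.03, 29564.74, 6001.40, 6424.98, 42354.41]
gdp_during = [63543.58, 6796.84, 1900.71, 10126.72, 6126.87,
              8346.70, 27057.16, 5090.72, 5332.77, 40284.64]

gdp = pd.DataFrame({"before": gdp_before, "during": gdp_during})

# Relative change in percent; negative values mean GDP per capita fell
gdp["change_pct"] = (gdp["during"] - gdp["before"]) / gdp["before"] * 100
print(gdp.round(2))
```

A relative change makes countries with very different GDP levels comparable, which the absolute bars in a grouped chart do not show directly.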
This task focuses on analyzing the global spread of COVID-19 and its impact on the economy.
The United States experienced the highest number of COVID-19 cases and deaths, which can
be attributed in part to the country's relatively low stringency index in comparison to its
population. Additionally, the analysis explored how the GDP per capita of various countries
was affected by the pandemic, highlighting the economic repercussions of COVID-19 across
the globe.

Conclusion
The examination of COVID-19's impact through data science methods provided a
comprehensive approach to understanding the pandemic's multifaceted effects on both public
health and the economy. Through meticulous data collection, preprocessing, and advanced
analysis, the project yielded valuable insights that can guide future public health strategies and
economic policies. This experience underscored the significant role of data science in tackling
pressing societal issues and highlighted the power of data-driven solutions in supporting
informed decision-making during times of crisis.

Appendix:

Code Samples:
This appendix provides more detailed code samples from the projects completed during the
internship. These samples demonstrate the practical application of data science techniques and
programming skills developed throughout the internship.
• Data preprocessing:

# Importing necessary libraries and dataset

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load datasets
transformed_data = pd.read_csv("transformed_data.csv")
raw_data = pd.read_csv("raw_data.csv")

# View the data

print(transformed_data)
print(transformed_data.head())
print(raw_data.head())

# Count the occurrences of each country

# Number of samples per country, then the most common sample count (the mode)
print(transformed_data["COUNTRY"].value_counts())
print(transformed_data["COUNTRY"].value_counts().mode())

# Aggregating the data

country_codes = transformed_data["CODE"].unique().tolist()
country_names = transformed_data["COUNTRY"].unique().tolist()
human_development_index = []
total_cases = []
total_deaths = []
stringency_index = []
populations = []  # filled per country in the loop below
gdp_per_capita = []

# 294 is the mode of the per-country sample counts computed above
for country in country_names:
    hdi_value = (transformed_data.loc[transformed_data["COUNTRY"] == country, "HDI"]).sum() / 294
    human_development_index.append(hdi_value)
    total_cases.append((raw_data.loc[raw_data["location"] == country, "total_cases"]).sum())
    total_deaths.append((raw_data.loc[raw_data["location"] == country, "total_deaths"]).sum())
    stringency_value = (transformed_data.loc[transformed_data["COUNTRY"] == country, "STI"]).sum() / 294
    stringency_index.append(stringency_value)
    populations.append((raw_data.loc[raw_data["location"] == country, "population"]).sum() / 294)

# Creating aggregated data

aggregated_data = pd.DataFrame(
    list(zip(country_codes, country_names, human_development_index,
             total_cases, total_deaths, stringency_index, populations)),
    columns=["Country Code", "Country", "HDI", "Total Cases",
             "Total Deaths", "Stringency Index", "Population"])
print(aggregated_data.head())

# Sorting Data Based on Total Cases

aggregated_data = aggregated_data.sort_values(by=["Total Cases"], ascending=False)
print(aggregated_data.head())
aggregated_data = aggregated_data.head(10)
print(aggregated_data)

# Adding GDP columns

aggregated_data["GDP Before Covid"] = [65279.53, 8897.49, 2100.75, 11497.65, 7027.61,
9946.03, 29564.74, 6001.40, 6424.98, 42354.41]
aggregated_data["GDP During Covid"] = [63543.58, 6796.84, 1900.71, 10126.72, 6126.87,
8346.70, 27057.16, 5090.72, 5332.77, 40284.64]
print(aggregated_data)

# Data Visualization

# Total Cases Bar Chart

plt.figure(figsize=(10, 6))
plt.bar(aggregated_data['Country'], aggregated_data['Total Cases'], color='skyblue')
plt.title("Countries with Highest Covid Cases", fontsize=16)
plt.xlabel("Country", fontsize=12)
plt.ylabel("Total Cases", fontsize=12)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

# Total Deaths Bar Chart

plt.figure(figsize=(10, 6))
plt.bar(aggregated_data['Country'], aggregated_data['Total Deaths'], color='salmon')
plt.title("Countries with Highest Covid Deaths", fontsize=16)
plt.xlabel("Country", fontsize=12)
plt.ylabel("Total Deaths", fontsize=12)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
# Combined Total Cases and Total Deaths Bar Chart
countries = aggregated_data["Country"]
total_cases = aggregated_data["Total Cases"]
total_deaths = aggregated_data["Total Deaths"]
bar_width = 0.35
index = np.arange(len(countries))

plt.figure(figsize=(12, 6))
plt.bar(index, total_cases, bar_width, label='Total Cases', color='indianred')
plt.bar(index + bar_width, total_deaths, bar_width, label='Total Deaths', color='lightsalmon')
plt.title('Countries with Total Cases and Deaths', fontsize=16)
plt.xlabel('Country', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.xticks(index + bar_width / 2, countries, rotation=-45, ha='right')
plt.legend()
plt.tight_layout()
plt.show()

# Pie Chart for Total Cases and Deaths Distribution

cases = aggregated_data["Total Cases"].sum()
deceased = aggregated_data["Total Deaths"].sum()
labels = ["Total Cases", "Total Deaths"]
values = [cases, deceased]
plt.pie(values, labels=labels, autopct='%1.1f%%')
plt.title('Percentage of Total Cases and Deaths')
plt.legend()
plt.axis('equal')
plt.show()

# Calculating Death Rate

death_rate = (aggregated_data["Total Deaths"].sum() / aggregated_data["Total Cases"].sum()) * 100
print("Death Rate = ", death_rate)

# GDP Visualization before and during Covid

# GDP Before Covid-19 Color Mapping

norm = plt.Normalize(aggregated_data['GDP Before Covid'].min(),
                     aggregated_data['GDP Before Covid'].max())
sm = plt.cm.ScalarMappable(cmap="coolwarm", norm=norm)
sm.set_array([])
plt.figure(figsize=(10, 6))
# Bar height is total cases; bar colour encodes GDP per capita before Covid-19
bars = plt.bar(aggregated_data['Country'], aggregated_data['Total Cases'],
               color=plt.cm.coolwarm(norm(aggregated_data['GDP Before Covid'])))
plt.title("GDP Per Capita Before Covid-19", fontsize=16)
plt.xlabel('Country', fontsize=12)
plt.ylabel('Total Cases', fontsize=12)
plt.xticks(rotation=90)
# Pass the axes explicitly: the ScalarMappable is not attached to any axes
cbar = plt.colorbar(sm, ax=plt.gca())
cbar.set_label('GDP Before Covid')
plt.tight_layout()
plt.show()

# GDP During Covid-19 Color Mapping

norm = plt.Normalize(aggregated_data['GDP During Covid'].min(),
                     aggregated_data['GDP During Covid'].max())
sm = plt.cm.ScalarMappable(cmap="viridis", norm=norm)
sm.set_array([])
plt.figure(figsize=(10, 6))
# Bar height is total cases; bar colour encodes GDP per capita during Covid-19
bars = plt.bar(aggregated_data['Country'], aggregated_data['Total Cases'],
               color=plt.cm.viridis(norm(aggregated_data['GDP During Covid'])))
plt.title("GDP Per Capita During Covid-19", fontsize=16)
plt.xlabel('Country', fontsize=12)
plt.ylabel('Total Cases', fontsize=12)
plt.xticks(rotation=90)
# Pass the axes explicitly: the ScalarMappable is not attached to any axes
cbar = plt.colorbar(sm, ax=plt.gca())
cbar.set_label('GDP During Covid')
plt.tight_layout()
plt.show()

# Comparing GDP Before and During Covid

gdp_before = aggregated_data["GDP Before Covid"]

gdp_during = aggregated_data["GDP During Covid"]

plt.figure(figsize=(10, 5))
plt.bar(index, gdp_before, bar_width, label='GDP Before Covid-19', color='indianred')
plt.bar(index + bar_width, gdp_during, bar_width, label='GDP During Covid-19',
color='lightsalmon')
plt.title('GDP Per Capita Before and During Covid-19', fontsize=16)
plt.xlabel('Country', fontsize=12)
plt.ylabel('GDP Per Capita', fontsize=12)
plt.xticks(index + bar_width / 2, countries, rotation=-45, ha='right')
plt.legend()
plt.tight_layout()
plt.show()

# Stringency Index Visualization

norm = plt.Normalize(aggregated_data['Stringency Index'].min(),
                     aggregated_data['Stringency Index'].max())
sm = plt.cm.ScalarMappable(cmap="coolwarm", norm=norm)
sm.set_array([])

plt.figure(figsize=(10, 6))
# Bar height is total cases; bar colour encodes the stringency index
bars = plt.bar(aggregated_data['Country'], aggregated_data['Total Cases'],
               color=plt.cm.coolwarm(norm(aggregated_data['Stringency Index'])))
plt.title("Stringency Index during Covid-19", fontsize=16)
plt.xlabel('Country', fontsize=12)
plt.ylabel('Total Cases', fontsize=12)
plt.xticks(rotation=90)
# Pass the axes explicitly: the ScalarMappable is not attached to any axes
cbar = plt.colorbar(sm, ax=plt.gca())
cbar.set_label('Stringency Index')
plt.tight_layout()
plt.show()

Skills acquired

1. Python Programming
Improved proficiency in Python, with a focus on key libraries for data science, including
Pandas for data manipulation, NumPy for numerical analysis, Matplotlib and Seaborn
for data visualization, and Scikit-learn for machine learning and predictive modeling.

2. Data Manipulation and Cleaning

Gained expertise in cleaning and preprocessing large datasets, including handling
missing data, identifying outliers, normalizing data, and performing feature engineering
to generate meaningful variables.

3. Data Visualization
Developed strong skills in data visualization, utilizing Matplotlib and Seaborn to create
clear and effective charts, graphs, and heatmaps that communicated trends and insights
to stakeholders.

4. Machine Learning
Acquired hands-on experience with machine learning techniques such as regression
analysis, decision trees, and time-series forecasting. Learned to build, train, and evaluate
models using relevant performance metrics.

5. Statistical Analysis
Enhanced understanding of statistical methods, including hypothesis testing, correlation
analysis, and significance testing, applied to explore relationships between variables in
the COVID-19 dataset.

6. Problem-Solving and Analytical Thinking

Strengthened problem-solving abilities by systematically addressing challenges related
to data quality, model accuracy, and the interpretation of complex data.

7. Effective Communication
Improved the ability to clearly present technical findings to non-technical stakeholders,
delivering insights through reports, presentations, and visualizations that facilitated data-
driven decision-making.
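
Two of the cleaning steps named in item 2, identifying outliers and normalizing data, can be illustrated with a minimal sketch. It is not the code used during the internship: the sample values, the helper names flag_outliers_iqr and min_max_normalize, and the choice of the interquartile-range rule with a factor of 1.5 are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample; the values are illustrative, not taken from the internship dataset.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Total Cases": [100.0, 120.0, 110.0, 130.0, 5000.0],
})


def flag_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def min_max_normalize(series: pd.Series) -> pd.Series:
    """Rescale values linearly to the range [0, 1]."""
    span = series.max() - series.min()
    if span == 0:
        # A constant column carries no spread; map every value to 0.
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / span


df["Is Outlier"] = flag_outliers_iqr(df["Total Cases"])
df["Cases (scaled)"] = min_max_normalize(df["Total Cases"])
print(df)
```

Flagging outliers rather than dropping them keeps the decision visible, which matters for pandemic data where an extreme value may be a genuine observation and not an error.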

SUMMARY

My internship experience at TechQuant has been an invaluable learning journey, providing both
foundational knowledge and practical, hands-on experience in the field of data science. The initial
training program laid a solid foundation by introducing me to essential tools such as Python and
key libraries like NumPy, pandas, Matplotlib, and Seaborn. These tools proved to be
indispensable throughout the internship as I applied them in real-world scenarios, particularly
when analyzing the vast datasets related to the impact of COVID-19 on public health and the
global economy.

One of the key takeaways from the internship was the importance of data cleaning and
preparation. As I worked with large and complex datasets, I gained significant exposure to the
challenges associated with data cleaning, such as handling missing values, removing duplicates,
and ensuring the accuracy of data. This hands-on experience with data manipulation using pandas
and NumPy was invaluable, as it not only enhanced my technical skills but also highlighted the
critical role that data preparation plays in ensuring the reliability of analyses and the robustness
of conclusions.
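
The cleaning steps mentioned above, removing duplicates and handling missing values, might look roughly like the following pandas sketch. The records are made up, the function name clean is hypothetical, and forward-filling within each country is only one possible policy for cumulative counts; the report does not state which policy was actually used.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records; column names follow the report's code, values are made up.
raw = pd.DataFrame({
    "Country": ["India", "India", "Brazil", "Brazil", "Brazil"],
    "Date": ["2020-03-01", "2020-03-01", "2020-03-01", "2020-03-02", "2020-03-03"],
    "Total Cases": [3.0, 3.0, 2.0, np.nan, 8.0],
})


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then forward-fill missing case counts per country."""
    out = frame.drop_duplicates().copy()
    out["Date"] = pd.to_datetime(out["Date"])
    out = out.sort_values(["Country", "Date"])
    # Assumption: a missing cumulative count is replaced by the previous day's value
    # for the same country, so values never leak across countries.
    out["Total Cases"] = out.groupby("Country")["Total Cases"].ffill()
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Sorting by country and date before filling is what makes the forward-fill meaningful; on unsorted rows the "previous" value could come from a later date.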

Additionally, the application of exploratory data analysis (EDA) techniques and machine learning
algorithms was a major component of my internship. Using Python libraries like Matplotlib and
Seaborn for data visualization, I was able to uncover trends and patterns within the data, which
facilitated meaningful insights into the effects of COVID-19. For example, I was able to identify
correlations between socioeconomic factors and public health outcomes, and utilize machine
learning techniques for predictive modeling. These experiences deepened my understanding of
how to leverage data science to inform decision-making and solve complex real-world problems.
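
As a hedged illustration of the correlation analysis described above, the sketch below computes a Pearson correlation matrix with pandas. The figures are invented for the example and do not reproduce any result from the internship; only the column names are borrowed from the report's code.

```python
import pandas as pd

# Hypothetical country-level aggregates; the numbers are illustrative only.
data = pd.DataFrame({
    "Stringency Index": [40.0, 55.0, 60.0, 75.0, 80.0],
    "Total Cases": [900.0, 700.0, 650.0, 400.0, 300.0],
    "Total Deaths": [45.0, 30.0, 33.0, 18.0, 12.0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = data.corr(method="pearson")
print(corr.round(3))

# Pull out a single pair for reporting.
r_cases = corr.loc["Stringency Index", "Total Cases"]
print(f"Stringency Index vs Total Cases: r = {r_cases:.3f}")
```

A coefficient close to -1 or 1 indicates a strong linear association in the sample, but it does not by itself show that one variable causes the other.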

A significant aspect of this internship was the opportunity to collaborate with a multidisciplinary
team. Working alongside experts from various fields, such as economics, public health, and policy
analysis, enriched my understanding of how data science can be applied in different contexts. The
cross-disciplinary collaboration also helped me improve my communication skills, as I had to
present technical findings in a way that was accessible and actionable to non-technical
stakeholders. This experience highlighted the importance of effective communication in data
science, as it is essential for translating complex data insights into clear recommendations that
drive decision-making.

Moreover, the internship provided me with insights into the ethical dimensions of data science.
Handling sensitive data, particularly related to public health, underscored the importance of
privacy, transparency, and ethical responsibility in the analysis and interpretation of data. I
learned to approach data analysis with a critical eye, ensuring that my findings were both
scientifically rigorous and socially responsible. This ethical perspective is crucial as data science
continues to play a pivotal role in shaping policy, public opinion, and business strategies.

As I reflect on this internship, I realize how much I have grown as a data scientist. The
combination of technical skills, practical experience, and exposure to interdisciplinary
collaboration has prepared me to take on future challenges in the field of data science. I have
gained confidence in my ability to handle complex datasets, draw meaningful insights, and
communicate findings effectively. The experience has reinforced my passion for using data
science to address global challenges, such as public health crises, and has inspired me to continue
exploring ways in which data can be used to improve the world.

Looking ahead, I am eager to build on the knowledge and skills I acquired during my internship.
The technical expertise I gained with tools like Python, Pandas, and machine learning algorithms
will serve as a strong foundation as I continue to pursue my career in data science. Furthermore,
the insights I gained from working on real-world projects, such as analyzing the impact of
COVID-19, have sparked a desire to explore more complex and impactful problems that can
benefit from data-driven solutions.

In conclusion, my internship at TechQuant was a transformative experience that not only
enhanced my technical skills but also broadened my understanding of how data science can be
applied to address real-world problems. The exposure to diverse methodologies, collaboration
with multidisciplinary teams, and the ethical considerations of data science have all contributed
to shaping me into a well-rounded data scientist. I am excited to apply what I have learned to
future projects and continue contributing to the ever-evolving field of data science.

REFERENCES

• https://pandas.pydata.org/pandas-docs/stable/
• https://numpy.org/doc/
• https://matplotlib.org/stable/contents.html
• https://seaborn.pydata.org/
• https://scikit-learn.org/stable/
• https://www.kaggle.com/datasets
• https://realpython.com/pandas-python-explore-dataset/
• https://jakevdp.github.io/PythonDataScienceHandbook/
• https://stackoverflow.com/questions/tagged/python
• https://www.oreilly.com/library/view/data-science-handbook/9781492041137/
• https://www.datacamp.com/community/tutorials/tutorial-machine-learning-python
• https://machinelearningmastery.com/machine-learning-in-python-step-by-step/
• https://www.analyticsvidhya.com/
• https://towardsdatascience.com/
• https://www.pythonforbeginners.com/
• https://www.learnpython.org/
• https://docs.python.org/3/tutorial/
• https://www.codecademy.com/learn/learn-python-3

Conclusion

My internship as a Data Science intern provided invaluable hands-on experience, allowing me
to apply data science techniques in a real-world setting through the analysis of COVID-19's
impact. Throughout the project, I honed my skills in Python programming, data manipulation,
visualization, and machine learning, while also gaining a deeper understanding of how to
handle and analyze complex datasets. By leveraging these skills, I was able to generate
meaningful insights into the socioeconomic effects of the pandemic and offer data-driven
recommendations to stakeholders.

In addition to the technical expertise I developed, this internship sharpened my ability to think
critically and approach problems in a structured manner. Working on a large-scale project from
start to finish, and collaborating with a diverse team of professionals, has provided me with a
solid foundation for my future career in data science.

The comprehensive nature of this project—from data collection and cleaning to predictive
modeling—highlighted the importance of data in guiding decision-making. This experience
has inspired me to seek out more opportunities where I can apply data science to tackle pressing
global challenges. Overall, the internship not only expanded my technical skills but also
enhanced my ability to communicate complex findings clearly and create impactful, data-
driven solutions.
