Aparna INTERN REPORT 12
Under Supervision of
Prof. R. A. Ghadage
SHRI CHHATRAPATI
SHIVAJI MAHARAJ
COLLEGE OF ENGINEERING
(Duration: 1st Jan 2024 to 15th Feb 2024)
DEPARTMENT OF COMPUTER ENGINEERING
SHRI CHHATRAPATI SHIVAJI MAHARAJ COLLEGE
OF ENGINEERING, NEPTI
CERTIFICATE
This is to certify that the “Internship Report” submitted by Aparna Gangishetty is work
done by her and submitted during 2023 – 2024 academic year (Duration: 1st Jan, 2024 to
15th Feb 2024), in partial fulfillment of the requirements for the award of the degree of
BACHELOR OF ENGINEERING in COMPUTER ENGINEERING, at SHRI
CHHATRAPATI SHIVAJI MAHARAJ COLLEGE OF ENGINEERING, NEPTI,
AHMEDNAGAR.
INTERNSHIP CERTIFICATE
ACKNOWLEDGEMENT
First, I would like to thank Innover Infotech for giving me the opportunity
to do an internship within the organization.
I would also like to thank all the people who worked alongside me at Innover
Infotech; with their patience and openness they created an enjoyable working
environment.
I would like to thank my Head of the Department, Prof. V. V. Jagtap, for her
constructive criticism throughout my internship.
Aparna Gangishetty
ABSTRACT
This internship in data analytics with Python provides hands-on experience in exploring and
interpreting data using Python programming language. Participants will delve into the world of
data to extract valuable insights and make informed decisions. Throughout the internship, you
will learn to manipulate and analyze datasets, uncover patterns, and create visualizations to
communicate findings effectively.
The program begins with a solid foundation in Python, ensuring participants are comfortable
with programming basics. As the internship progresses, emphasis is placed on applying these
skills to real-world data scenarios. You will gain proficiency in popular data analytics libraries
such as Pandas and NumPy, enabling you to clean, transform, and manipulate data efficiently.
By the end of the internship, participants will have developed a portfolio showcasing their ability
to analyze data and derive actionable insights. This hands-on experience with Python in the
context of data analytics equips interns with valuable skills sought after in today's data-driven
industries. Whether you are a beginner or have some experience in programming, this internship
provides a supportive environment to enhance your data analytics capabilities using Python. Join
us to unlock the power of data and become proficient in leveraging Python for effective data
analysis.
Methodologies:
Data Collection:- Gather relevant data from various sources, ensuring data
quality and integrity.
Data Cleaning:- Identify and rectify errors, missing values, and inconsistencies in
the dataset. Clean data is crucial for accurate analysis.
Learn Relevant Tools:- Master data analytics tools like Python (pandas, numpy), R,
or SQL. Familiarize yourself with data visualization tools such as Tableau or Power
BI.
Machine Learning:- Depending on the role, learn the basics of machine learning.
Scikit-learn for Python is a good starting point. Understand algorithms like
regression, clustering, and classification.
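The regression basics mentioned above can be sketched in a few lines with scikit-learn; the toy dataset and model choice here are my own illustration, not part of the internship material:

```python
# Minimal scikit-learn regression sketch on illustrative data.
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy dataset following y = 3x + 2, with a little noise added.
rng = np.random.default_rng(0)
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + 2 + rng.normal(0, 0.1, size=10)

# Fit a linear model and inspect the learned slope and intercept.
model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)
```

The fitted coefficients should come out close to the true slope (3) and intercept (2), which is a simple way to check that the workflow is wired up correctly.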
Benefits of Internship
INDEX
1. Introduction..........................................................................................10
2. Analysis...................................................................................................11
3. Software Requirements Specifications......................................................12
4. Technology.............................................................................................13
SQL.....................................................................................................13
PYTHON...............................................................................................14
PANDAS LIBRARY........................................................................15
SEABORN LIBRARY......................................................................15
MATPLOTLIB LIBRARY................................................................16
TABLEAU.........................................................................................16
5. Project Description....................................................................................17
6. Screenshots................................................................................................29
7. Conclusion..................................................................................................36
Learning Objectives/Internship Objectives
Internships are generally thought to be reserved for college students looking to gain
experience in a particular field. However, a wide array of people can benefit from training
internships in order to receive real-world experience and develop their skills.
An objective for this position should emphasize the skills you already possess in the area and
your interest in learning more.
Internships are utilized in a number of different career fields, including architecture,
engineering, healthcare, economics, advertising and many more.
Some internships are used to allow individuals to perform scientific research, while others are
specifically designed to give people first-hand work experience.
Utilizing internships is a great way to build your resume and develop skills that can be
emphasized in your resume for future jobs. When you are applying for a Training Internship,
make sure to highlight any special skills or talents that can make you stand apart from the
rest of the applicants so that you have an improved chance of landing the position.
1. INTRODUCTION
In the dynamic landscape of today's data-driven world, the ability to harness the power of
information is a key driver of success. This report encapsulates my enriching experience during
the Data Analytics with Python internship, where I embarked on a journey to explore, analyze,
and extract meaningful insights from diverse datasets.
Over the course of the internship, I delved into the realm of Python programming,
leveraging powerful libraries such as Pandas, NumPy, and Matplotlib to manipulate, visualize,
and analyze data. From the intricacies of data cleaning to the art of exploratory data analysis
(EDA), each phase of the internship contributed to my growth as a data analyst.
This report aims to provide a comprehensive overview of the learning objectives achieved,
the skills acquired, and the practical applications encountered during the internship. It showcases
the journey from foundational concepts to advanced analytics techniques, illustrating how
Python became the language of choice for unraveling patterns, making predictions, and
communicating findings.
Join me as I navigate through the intricacies of statistical analysis, delve into the world of
machine learning applications, and reflect on the ethical considerations inherent in the field of
data analytics. Through this report, I aim to convey not just the technical aspects of the internship
but also the holistic development of problem-solving skills, teamwork, and effective
communication.
May this documentation serve as a testament to the transformative power of Python in the
realm of data analytics and inspire others to embark on their own data-driven explorations.
2. ANALYSIS
Existing System:
A data science project may involve automating data collection and integrating data from
different sources. Advanced machine learning algorithms can be used to analyze the data and
generate insights that are more accurate and actionable. Interactive dashboards or visualizations
can be developed to present the results in a more user-friendly format, allowing stakeholders to
explore the data and gain deeper insights.
Overall, the goal of a data science project is to improve the existing system by leveraging the
latest technology and techniques in data science to generate more accurate insights, streamline
processes, and ultimately drive better decision-making.
3. SOFTWARE REQUIREMENTS SPECIFICATIONS
• System : Laptop
• RAM : 8 GB
4. TECHNOLOGY
4.1 SQL:
Structured Query Language (SQL) is a powerful domain-specific language used for managing
and manipulating relational databases. It serves as the standard language for interacting with
relational database management systems (RDBMS) and is essential for tasks related to data
definition, data manipulation, and data control.
Key Concepts:
CREATE: Used to create database objects like tables, indexes, and views.
Query Language: SQL is particularly known for its query capabilities, allowing users to retrieve
and filter data based on specified conditions.
Joins:
SQL supports various types of joins (e.g., INNER JOIN, LEFT JOIN, RIGHT JOIN) to
combine rows from different tables based on related columns.
Constraints:
SQL enables the definition of constraints such as PRIMARY KEY, FOREIGN KEY, UNIQUE,
and CHECK to maintain data integrity.
Functions:
SQL provides a rich set of functions for manipulating data during queries, including
mathematical functions, string functions, and date functions.
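The concepts above can be tried out directly from Python using the built-in sqlite3 module; the tables and rows below are illustrative, not from the internship project:

```python
# Demonstrating CREATE, constraints, a filtered query, and an INNER JOIN
# with Python's built-in sqlite3 module (illustrative tables and data).
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# CREATE with PRIMARY KEY and FOREIGN KEY-style constraints.
cur.execute("CREATE TABLE dept(deptid INTEGER PRIMARY KEY, dname TEXT)")
cur.execute("""CREATE TABLE emp(
    empid INTEGER PRIMARY KEY,
    ename TEXT,
    deptid INTEGER REFERENCES dept(deptid))""")

cur.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "HR"), (2, "IT")])
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(10, "asha", 1), (11, "ravi", 2), (12, "meena", 2)])

# Query with a filter condition (WHERE).
cur.execute("SELECT ename FROM emp WHERE deptid = 2")
print(cur.fetchall())

# INNER JOIN combining rows from both tables on the related column.
cur.execute("""SELECT e.ename, d.dname
               FROM emp e INNER JOIN dept d ON e.deptid = d.deptid""")
print(cur.fetchall())
```

Running the script prints the IT-department employees and then each employee paired with their department name.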
4.2 PYTHON:
Python is a general-purpose language that is designed to be easy to read and write, which
makes it a popular choice for beginners as well as experienced programmers. It has a simple
syntax and is easy to learn, which allows developers to quickly prototype and test ideas. Python
also has a vast library of pre-built modules and packages, which makes it easy to implement
complex algorithms and data structures.
One of the key strengths of Python is its flexibility and versatility. It can be used for a wide
range of applications, from simple scripts to complex applications. It is also
platform-independent, which means that Python code can be run on different operating systems,
such as Windows, Linux, and macOS.
4.3 PANDAS LIBRARY:
Pandas is a powerful and popular open-source library for data manipulation and analysis in
Python. It provides easy-to-use data structures, tools for data analysis, and data cleaning
functions. The library is built on top of the NumPy library and provides more high-level data
manipulation functionalities that allow for more efficient and fast analysis.
Pandas provides two main data structures, the Series and DataFrame objects. A Series is a
one-dimensional array-like object that can hold any data type, while a DataFrame is a
two-dimensional table-like data structure that can store data of different types. These two data
structures are the building blocks of pandas and are widely used in data manipulation and
analysis.
Pandas also provides many powerful functions and methods for data manipulation, such as
filtering, merging, grouping, and pivoting data. It also offers tools for data cleaning, including
handling missing data, data imputation, and data normalization.
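A minimal sketch of these operations, using toy data of my own rather than the internship datasets:

```python
# Small pandas sketch: missing-data handling, filtering, and grouping.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["pune", "pune", "nanded", "sangli"],
    "price": [400, 799, np.nan, 589],
})

# Handling missing data: fill the missing price with the column mean (596.0).
df["price"] = df["price"].fillna(df["price"].mean())

# Filtering rows by a condition.
expensive = df[df["price"] > 500]

# Grouping and aggregating: total price per city.
by_city = df.groupby("city")["price"].sum()
print(by_city)
```

The same fillna/filter/groupby pattern scales from this four-row frame to the full datasets used during the internship.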
4.4 SEABORN LIBRARY:
Seaborn is a Python data visualization library built on top of Matplotlib, focused on
statistical graphics.
One of the key strengths of Seaborn is its ability to create complex visualizations with ease.
It provides a range of built-in functions for creating statistical graphics, such as bar plots, scatter
plots, heatmaps, and more. These functions allow for the creation of informative and
aesthetically pleasing visualizations that can help to uncover patterns and relationships in data.
Another strength of Seaborn is its integration with Pandas data structures. Seaborn can
directly accept Pandas data frames as input, making it easy to use with data science workflows.
It also provides many options for customization and formatting of visualizations.
4.5 MATPLOTLIB LIBRARY:
Matplotlib is a Python library for creating visualizations and plots. It is one of the most
widely used visualization libraries in the Python ecosystem and provides a comprehensive set
of tools for creating static, animated, and interactive visualizations.
Matplotlib provides a wide range of plot types, including line plots, scatter plots, bar plots,
histograms, and more. It also provides many customization options, allowing users to adjust plot
properties such as color, font size, axis limits, and labels. This makes it easy to create
high-quality, publication-ready visualizations.
One of the strengths of Matplotlib is its integration with NumPy, which allows for efficient
handling and plotting of large datasets. Matplotlib can also be used in conjunction with other
libraries such as Pandas and Seaborn to create even more sophisticated visualizations.
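A small illustrative example (toy data of my own; the "Agg" backend just renders off-screen so no window is needed):

```python
# Minimal Matplotlib sketch: a labelled line plot saved to a file.
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; no display required
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")   # a simple line plot
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("A simple line plot")
ax.legend()
fig.savefig("sine.png")                  # write the figure to disk
```

The same Axes methods (set_xlabel, set_title, legend, and so on) carry over to bar plots, scatter plots, and histograms.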
4.6 TABLEAU:
Tableau is a powerful data visualization and business intelligence tool that allows users to
connect, visualize, and share data in a compelling and interactive way. It supports various data
sources, offers a drag-and-drop interface for creating dashboards and reports, and enables users
to explore insights from their data. Tableau is widely used for its user-friendly design, flexibility,
and ability to handle large datasets, making it a popular choice for data analysis and decision
making.
5. Project Description:
Myntra
Myntra is a one-stop shop for all your fashion and lifestyle needs. As India's largest
e-commerce store for fashion and lifestyle products, Myntra aims to provide a hassle-free and
enjoyable shopping experience to shoppers across the country, with the widest range of brands
and products on its portal. The brand is making a conscious effort to bring the power of fashion
to shoppers with an array of the latest and trendiest products available in the country.
Key Fields:
Project Goals:
• Efficient Order Management: Develop a database schema that allows for the efficient
recording and retrieval of customer orders, facilitating quick and accurate order
processing.
• Price and Inventory Tracking: Enable real-time tracking of prices for different
products and maintain inventory levels to avoid discrepancies.
• Analytics and Reporting: Implement SQL queries to derive insights such as popular
products, customer preferences, and order trends, supporting informed
decision-making for business growth.
• Data Security and Integrity: Implement robust data security measures to protect
customer information and ensure the integrity of the database.
This project aims to demonstrate the power of SQL in creating a comprehensive and
well-organized product delivery system, fostering efficiency and enhancing the overall
experience for both customers and the business.
Project 1 by SQL :-
create table cust(Custid int primary key, Custname varchar(50), Custcity varchar(50),
Phoneno int, Gender text, Pincode int, Custreview text);
insert into cust values
(140,'hema','nanded',897765,'F',431601,'good'),
(141,'riya','pune',997834,'F',443201,'satisfactory'),
(142,'aliya','sangli',998534,'F',415301,'best'),
(143,'akshay','solapur',986643,'M',413001,'poor'),
(144,'riya','kolapur',887654,'F',415101,'worst'),
(145,'shahid','pune',976653,'M',413502,'satisfactory');
Questions :-
select count(Custid)
from cust
where Gender='F';

select Custreview, count(Custreview)
from cust
group by Custreview;

select Gender, count(*)
from cust
group by Gender;
select *,
CASE
    WHEN Custcity = 'mumbai' THEN 'Tier 1'
    WHEN Custcity = 'pune' THEN 'Tier 1'
    WHEN Custcity = 'nagpur' THEN 'Tier 2'
    WHEN Custcity = 'nashik' THEN 'Tier 2'
    WHEN Custcity = 'ahmednagar' THEN 'Tier 3'
    ELSE 'Tier 4'
END as city_Rank
from cust;
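As a quick sanity check, the same CASE logic can be run through Python's built-in sqlite3 module on a few illustrative rows (written with IN here to group the Tier 1 and Tier 2 cities):

```python
# Verifying the city-tier CASE expression with sqlite3 (illustrative rows).
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE cust(Custid INTEGER PRIMARY KEY, Custcity TEXT)")
cur.executemany("INSERT INTO cust VALUES (?, ?)",
                [(141, "pune"), (143, "solapur"), (146, "nagpur")])

cur.execute("""
    SELECT Custid,
           CASE WHEN Custcity IN ('mumbai', 'pune')   THEN 'Tier 1'
                WHEN Custcity IN ('nagpur', 'nashik') THEN 'Tier 2'
                WHEN Custcity = 'ahmednagar'          THEN 'Tier 3'
                ELSE 'Tier 4'
           END AS city_Rank
    FROM cust""")
print(cur.fetchall())
```

Each customer is bucketed by city: pune falls in Tier 1, nagpur in Tier 2, and solapur lands in the ELSE branch as Tier 4.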
product table
insert into product(prid, Custid, ordered, prname, prdetails, orderdate, size, quantity,
price, totamt, status, prreturn, payment, ratings)
values
(1001,102,21103,'kurti','printed three quarter sleeves','23-02-23','XL',2,200,400,'Delivered','no','online',4.3),
(1004,105,21233,'facewash','vitamin c face serum','21-06-21','300ml',1,400,400,'Delivered','no','cash',3.3),
(1011,103,21234,'curtains','polyester single door curtain','11-01-22','2.15m*1.3m',1,799,799,'Delivered','no','cash',4.1),
(1111,103,21024,'saree','saree with zari border silk','01-11-21','onesize',2,527,1054,'Pending','no','online',4.1),
(1213,111,20124,'T-shirt','cotton printed','11-10-22','XXL',3,319,957,'Delivered','no','cash',4.5),
(1141,109,21145,'Airpods','white bluetooth headset','09-03-22','onesize',1,899,899,'Delivered','returned','cash',2.1),
(1434,117,20435,'hair serum','anti frizz hair serum','04-05-23','150ml',2,300,600,'Pending','no','cash',4.8),
(1124,140,23414,'flats','open toe women flats','11-08-23','38',1,589,589,'Delivered','no','online',3.5),
(1432,122,21432,'jeans','men slim fit stretchable jeans','05-03-23','36',1,799,799,'Delivered','returned','cash',2.9),
(1552,114,21984,'t-shirt','women printed casual tshirt','21-07-23','3XL',1,550,550,'Delivered','returned','online',2.2),
(1110,100,10424,'bangles','oxidised beaded bangles','17-06-23','2.8',2,200,400,'Delivered','no','online',4.5),
(1324,118,19654,'trousers','women parallel trouser','13-01-24','32',1,720,720,'Pending','no','online',3.8),
(1230,137,11756,'cushion cover','cotton square cushion cover','18-06-22','X16',2,340,680,'Delivered','no','online',5.5),
(1432,130,23412,'dinner set','27pcs printed dinnerset','20-08-22','onesize',1,1704,1704,'Delivered','returned','online',2.5),
(1332,123,39452,'idol set','silver brass radhakrishna idol','25-10-23','onesize',1,850,850,'Delivered','no','cash',4.9),
(1240,142,18764,'necklace','gold plated layered','17-09-23','onesize',2,350,700,'Delivered','no','cash',4.3),
(1111,149,39874,'saree','zari border silk saree','27-12-23','onesize',1,550,550,'Delivered','no','online',5.5),
(1563,122,15432,'ethnic wear','printed kurti with dupatta','24-05-23','M',1,1759,1759,'Delivered','no','online',2.5),
(1650,116,14374,'men kurta','mirror work cotton kurta','08-11-23','XL',2,1200,2400,'Pending','no','online',4.9),
(1124,131,22134,'flats','open toe women flats','27-11-23','36',1,589,589,'Delivered','returned','cash',2.5),
(1010,129,21765,'bedsheet','floral printed','02-03-23','double XL',3,350,1050,'Delivered','no','cash',4.9);
select min(price)
from product;

select max(price)
from product;
select sum(price)
from product;

select avg(price)
from product;

select prname, quantity
from product
where quantity > 2;
select c.*,p.prname,ratings
from cust c
join product p
on c.Custid=p.Custid;
select c.*,p.quantity,status
from cust c
left join product p
on c.Custid=p.Custid;
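The same joins can be expressed in pandas with merge; the toy frames below stand in for the cust and product tables:

```python
# Inner and left joins in pandas, mirroring the SQL queries above.
import pandas as pd

cust = pd.DataFrame({"Custid": [140, 141, 142],
                     "Custname": ["hema", "riya", "aliya"]})
product = pd.DataFrame({"Custid": [140, 142],
                        "prname": ["flats", "necklace"]})

# INNER JOIN: only customers with a matching product row.
inner = cust.merge(product, on="Custid", how="inner")

# LEFT JOIN: all customers, with NaN where no product matches.
left = cust.merge(product, on="Custid", how="left")
print(len(inner), len(left))
```

The inner join keeps two rows (Custid 140 and 142), while the left join keeps all three customers, leaving prname empty for Custid 141.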
PYTHON PROJECT
In this Python project, we delve into the fascinating world of exploratory data analysis (EDA)
using the Anaconda distribution.
This dataset unveils the statistics of the most-subscribed YouTube channels. A collection of
YouTube giants, it offers a perfect avenue to analyze and gain valuable insights from the
luminaries of the platform, with comprehensive details on top creators, subscriber counts,
video views, upload frequency, country of origin, earnings, and more.
Dataset Overview:
The dataset consists of information on YouTube channels and YouTubers. It was first explored
with Pandas operations such as head(), tail(), describe(), and the index, which are crucial for
understanding the dataset's structure.
Operations Conducted:
Unique operation:
Generate an array that includes only the unique elements of a Series.
Index Operation:
Explore the dataset's index to understand its organization and uniqueness.
Boxplot Operation:
Implement boxplot visualizations to identify the spread, central tendency, and outliers in
numerical variables, providing a robust understanding of the dataset's statistical characteristics.
plt.show()
Display the plot of a specific column.
pivot_table()
A pivot table is used to reshape data, turning column values into rows and summarizing them.
Heatmap
A heatmap is used to display relationships between variables in a tabular dataset.
sns.jointplot
Displays the relationship between two variables along with the distribution of each individual
variable.
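A short pandas sketch of the unique and pivot table operations described above, on toy data standing in for the YouTube dataset:

```python
# unique() and pivot_table() on a small illustrative frame.
import pandas as pd

df = pd.DataFrame({
    "country": ["US", "IN", "US", "IN"],
    "category": ["Music", "Music", "Gaming", "Music"],
    "subscribers": [100, 80, 60, 40],
})

# unique(): array of the distinct values in a column.
print(df["country"].unique())

# pivot_table(): reshape rows into a country-by-category summary.
pivot = df.pivot_table(index="country", columns="category",
                       values="subscribers", aggfunc="sum")
print(pivot)
```

The pivot table sums subscribers per country and category, so for example the two IN/Music rows (80 and 40) collapse into a single cell of 120.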
6. Screenshots
Tableau Project
The "IPL Dashboard" is an immersive data visualization project crafted using Tableau, aimed
at unlocking valuable insights within the realm of IPL team data. This project focuses on
harnessing the power of visual analytics to provide a comprehensive overview of various facets
of team statistics and operations.
Objective:
The primary goal of the project is to transform raw IPL data into an interactive and visually
appealing dashboard that allows stakeholders to gain meaningful insights. From match trends
and team performance to player demographics, the dashboard provides a holistic view of the
teams' landscape.
Introduction:
The Indian Premier League (IPL) is a Twenty20 cricket league founded in 2008 and held
annually. The league features participation from national and international players, with eight
teams representing eight Indian cities that compete in a double round-robin format during the
league stages, followed by playoffs. Over the years, the IPL has emerged as one of the most-
watched and most-attended live sporting events globally.
Business Objective:
As a data analyst at IPL, I create Tableau dashboards for news reports and feeds. Recently,
the Sports Editor asked me to build an interactive dashboard featuring IPL statistics for their
upcoming newsletter. The dashboard will provide customizable filters for interactivity and
display visual representations created in Tableau.
matches.csv: contains match-level information for every IPL match held from 2008 to 2017.
SCREENSHOT OF PROJECT
IPL DATASET
7. CONCLUSION
My data analytics internship equipped me with valuable skills in SQL, Python, and
Tableau. I gained hands-on experience in querying databases, manipulating data using Python,
and creating insightful visualizations in Tableau. This internship not only enhanced my technical
proficiency but also provided me with a practical understanding of how these tools synergize to
derive meaningful insights from data. I gained a great deal of knowledge from this internship
and am now well-prepared to apply these skills in real-world scenarios, making a meaningful
contribution to data-driven decision-making processes.
WEEKLY DIARY
FOR
INDUSTRIAL TRAINING
Week 1 :- From 01 Jan To 06 Jan 2024
Monday, 01 Jan 2024: On this day, we were introduced to the company and all the
activities to be carried out over the entire internship.
Tuesday, 02 Jan 2024: On this day, we received an explanation of the company's projects and
the overall internship work.
Wednesday, 03 Jan 2024: On this day, we learned the concepts of Excel relevant to
data analytics.
Thursday, 04 Jan 2024: On this day, we learned the aggregation functions (Min, Max, Sum,
Count, Average) and advanced formulas in Excel.
Friday, 05 Jan 2024: On this day, we learned the statistics required for data analytics.
Saturday, 06 Jan 2024: On this day, we learned about the concept of outliers and skewness
(left skewness and right skewness).
Summarize At The Week End :- Introduction to the company and its work; gained knowledge
of Excel and data analytics.
Week 2 :- From 08 Jan To 13 Jan 2024
Monday, 08 Jan 2024: On this day, we got a brief introduction to SQL (Structured
Query Language) and how it is helpful in data analytics.
Wednesday, 10 Jan 2024: On this day, we worked on the select statement, the Where clause,
and the Having clause, and learned the difference between the Where and Having clauses.
Thursday, 11 Jan 2024: On this day, we worked on the group by and order by clauses
in SQL.
Friday, 12 Jan 2024: On this day, we got an overview of LIMIT and the aggregation functions
(Min, Max, Sum, Count, Average).
Summarize At The Week End :- We implemented all the SQL concepts (aggregation functions,
group by, limit) used for data analytics.
Week 3 :- From 15 Jan To 20 Jan 2024
Monday, 15 Jan 2024: On this day, we gained knowledge about subqueries in SQL for data
analytics.
Tuesday, 16 Jan 2024: On this day, we worked practically on subqueries (a query inside
another query).
Wednesday, 17 Jan 2024: On this day, we learned about joins and the types of joins in SQL.
Thursday, 18 Jan 2024: On this day, we worked on the concept of joins (inner join, outer join,
left join, right join).
Friday, 19 Jan 2024: On this day, we worked on operators (arithmetic, bitwise, logical,
increment, and decrement operators).
Saturday, 20 Jan 2024: On this day, we worked on case statements (If, If-Else, Else-If).
Summarize At The Week End :- We successfully implemented subqueries, joins (inner, outer,
left, and right), operators, and case statements.
Week 4 :- From 22 Jan To 28 Jan 2024
Monday, 22 Jan 2024: On this day, we got an introduction to Python (variables, classes,
functions, loops, OOP).
Tuesday, 23 Jan 2024: On this day, we got an overview of Anaconda Navigator and Jupyter
Notebook and completed the installation of Anaconda Navigator.
Wednesday, 24 Jan 2024: On this day, we learned about Python collection objects
(strings, lists, tuples, and dictionaries).
Thursday, 25 Jan 2024: On this day, we downloaded the dataset required for data analytics
in Python.
Saturday, 27 Jan 2024: On this day, we worked on datasets using the Pandas and NumPy
libraries.
Summarize At The Week End :- We successfully gained knowledge of Python and its
libraries (NumPy, Pandas, and Matplotlib).
Week 5 :- From 30 Jan To 04 Feb 2024
Monday, 30 Jan 2024: On this day, we got an overall introduction to Tableau and
downloaded Tableau on our systems.
Tuesday, 31 Jan 2024: On this day, we imported different datasets used for data analytics
into Tableau.
Wednesday, 01 Feb 2024: On this day, we got to know the Tableau interface and different
chart types (pie chart, histogram, boxplot).
Thursday, 02 Feb 2024: On this day, we implemented mapping of datasets and performed
visual analytics on them.
Friday, 03 Feb 2024: On this day, we worked on different questions and calculations on the
dataset.
Summarize At The Week End :- We successfully gained knowledge of Tableau and created
our own dashboard to publish our work.
Week 6 :- From 06 Feb To 10 Feb 2024
Monday, 06 Feb 2024: On this day, we started our SQL project and performed all the research
for it.
Tuesday, 07 Feb 2024: On this day, we implemented that project (Food Delivery System) and
completed it successfully.
Wednesday, 08 Feb 2024: On this day, we started our Python project and performed all the
research for it.
Thursday, 09 Feb 2024: On this day, we implemented that project (Vehicle System) and
completed it successfully.
Friday, 10 Feb 2024: On this day, we started our Tableau project, performed all the research
for it, and published the project dashboard to our Tableau account.
https://public.tableau.com/views/IPLDashboard_17093605462120/Dashboard1?:language=en-US&:sid=&:display_count=n&:origin=viz_share_link
Summarize At The Week End :- We successfully completed our Python, Tableau, and SQL
projects.