Report of the Summer Internship Project
On
DATA ANALYTICS CONSULTING VIRTUAL INTERNSHIP
At
Company Name: Forage
Location: Hyderabad
Duration:
BY
Mr. K BASU NAYAK (2451-20-733-082)
Department of Computer Science and Engineering
Maturi Venkata Subba Rao (MVSR) Engineering College
(An Autonomous Institution)
(Affiliated to Osmania University & Recognized by AICTE)
Nadergul, Saroor Nagar Mandal, Hyderabad – 501 510
2023-24.
Certificate
This is to certify that the Summer Internship work entitled “DATA ANALYTICS
CONSULTING VIRTUAL INTERNSHIP” is a bonafide work carried out by Mr. K BASU
NAYAK (2451-20-733-082) in partial fulfillment of the requirements for the award of degree
of Bachelor of Engineering in Computer Science and Engineering from Maturi Venkata
Subba Rao (MVSR) Engineering College, affiliated to OSMANIA UNIVERSITY, Hyderabad
during the Academic Year 2022-23 under our guidance and supervision.
INTERNAL EXAMINER EXTERNAL EXAMINER
DECLARATION
This is to certify that the work reported in the present summer internship entitled “DATA
ANALYTICS CONSULTING VIRTUAL INTERNSHIP” is a record of bonafide work done
by me as part of the KPMG virtual internship on Forage. The report is based on the project
work done entirely by me and not copied from any other source.
K Basu Nayak
(2451-20-733-082)
ACKNOWLEDGEMENTS
I would like to express my sincere gratitude and indebtedness to my summer internship
guide, Mr. V Sathish, Asst. Professor, for his valuable suggestions and interest throughout
the course of this summer internship.
I am also thankful to our principal, Dr. G Kanaka Durga, and to Mr. J Prasanna Kumar,
Professor and Head, Department of Computer Science and Engineering, Maturi Venkata
Subba Rao (MVSR) Engineering College, Hyderabad, for providing excellent infrastructure
for completing this summer internship successfully as a part of my B.E. Degree (CSE). I
would like to thank our summer internship coordinator, Ms. Sirisha Daggubati, Asst.
Professor, for her constant monitoring, guidance, and support.
I convey my heartfelt thanks to the lab staff for allowing me to use the required
equipment whenever needed.
Finally, I would like to take this opportunity to thank my family for their support
throughout the work. I sincerely acknowledge and thank all those who gave, directly or
indirectly, their support in the completion of this work.
K Basu Nayak
(2451-20-733-082)
MISSION
VISION
The vision is to empower students with visual insights into the journey of their data, fostering
a sense of digital literacy and security consciousness. It envisions a visually engaging
representation that not only educates but also sparks interest in networking concepts.
COURSE OBJECTIVES
To prepare the students:
1. To give the students experience in solving real-life practical problems with all their
constraints.
2. To give an opportunity to integrate different aspects of learning with reference to
real-life problems.
3. To enhance the confidence of the students while communicating with industry
engineers, give an opportunity for useful interaction with them, and familiarize them
with the work culture and ethics of the industry.
COURSE OUTCOMES
On successful completion of this course, the student will be able to:
1. Formulate a problem to map the requirements of a real-world scenario.
2. Design and develop a small and suitable product in hardware or software.
3. Exhibit the skills to use contemporary technologies used by the industry.
4. Evaluate the solution against pre-existing alternatives with reference to pre-specified
criteria.
5. Demonstrate an understanding of the work culture and ethics of the industry.
6. Display effective technical communication skills, both orally and in writing, in the form
of a report.
ABSTRACT
The objective of the internship is to facilitate reflection on experiences obtained in the
internship and to enhance understanding of academic material by application in the internship
setting. Internships will provide students the opportunity to test their interest in a particular
career before permanent commitments are made.
Internship students will develop skills and techniques directly applicable to their careers.
Internship programs will enhance the advancement possibilities of graduates.
This internship develops skills in analysing datasets, applying traditional techniques and
processing methods, and using various algorithms to process data quickly and efficiently.
TABLE OF CONTENTS
Content Page No.
Chapter 1: Introduction 1
1.1: Big Data 1
1.2: Data Analytics 2
1.3: Data Science 2
Chapter 2: Problem Statement 3
Chapter 3: Motivation 5
Chapter 4: Methodological Details 6
4.1: Task 1 5
4.2: Task 2 7
4.3: Task 3 7
Chapter 5: Result 11
5.1: Result and analysis 11
5.2: Output 12
Chapter 6: Conclusion 13
6.1: Conclusion 13
Acknowledgement 15
Worksheet 16
References
CHAPTER 1: INTRODUCTION 1
1.1: BIG DATA 1
1.2: DATA ANALYTICS 2
CHAPTER 2: PROBLEM STATEMENT 3
CHAPTER 3: METHODOLOGICAL DETAILS 5
3.1: TASK 1 6
3.2: TASK 2 7
3.3: TASK 3 7
CHAPTER 4: RESULT 11
4.1: RESULT AND ANALYSIS 11
4.2: OUTPUT 12
CHAPTER 5: CONCLUSION 13
REFERENCES 16
LIST OF FIGURES
Fig.No Figure Name Page No
Fig. 4.2 Before Analysis 10
Fig. 4.2 After analysis 10
CHAPTER I
1. INTRODUCTION
1.1 Big Data
What is Data?
Data are the quantities, characters, or symbols on which operations are performed by a
computer, which may be stored and transmitted in the form of electrical signals and recorded
on magnetic, optical, or mechanical recording media.
What is Big Data?
Big Data is a collection of data that is huge in volume and growing exponentially with time.
Its size and complexity are so large that no traditional data management tool can store or
process it efficiently. In short, Big Data is still data, but at a huge scale.
What is an Example of Big Data?
The New York Stock Exchange is an example of Big Data: it generates about one terabyte
of new trade data per day.
Types of Big Data
The types of Big Data are:
1. Structured
2. Unstructured
3. Semi-structured
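To make the three types above concrete, the following is a minimal Python sketch. The sample values and field names (`customer_id`, `state`, `postcode`, `tags`) are assumptions made up for illustration and are not taken from the internship datasets.

```python
import csv
import io
import json

# Structured: fixed columns, as in a relational table or a CSV file.
structured = io.StringIO("customer_id,state,postcode\n1,NSW,2016\n2,VIC,3004\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing keys, but fields may vary per record.
semi_structured = json.loads('{"customer_id": 1, "tags": ["bike", "helmet"]}')

# Unstructured: free text with no predefined schema.
unstructured = "Customer called to ask about delivery of a mountain bike."

print(rows[0]["state"])           # field access by column name
print(semi_structured["tags"])    # nested list inside the record
print(len(unstructured.split()))  # only generic operations, such as a word count
```

Structured data can be queried by column directly, semi-structured data needs its keys inspected per record, and unstructured data usually needs further processing before analysis.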
1.2 Data Analytics
Data analytics is the process of analysing raw data to find trends and answer questions.
This definition captures the broad scope of the field, which includes many techniques
with many different goals. The data analytics process has some components that can support
a variety of initiatives. By combining these components, a successful data analytics
initiative provides a clear picture of where you are, where you have been, and where
you should go.
Types of Data Analytics
Descriptive analytics helps answer questions about what happened. These techniques
summarize large datasets to describe outcomes to stakeholders. By developing key
performance indicators (KPIs), these strategies can help track successes or failures.
Metrics such as return on investment (ROI) are used in many industries. Specialized
metrics are developed to track performance in specific industries. This process requires
the collection of relevant data, processing of the data, data analysis and data visualization.
This process provides essential insight into past performance.
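As an illustration of a descriptive metric, the sketch below computes return on investment (ROI) for a few campaigns and summarizes it. The campaign names and amounts are hypothetical, and ROI is taken here as (revenue - cost) / cost, which is one common definition.

```python
from statistics import mean

# Hypothetical campaign records: amount invested and revenue returned.
campaigns = [
    {"name": "email", "cost": 1000.0, "revenue": 1500.0},
    {"name": "social", "cost": 2000.0, "revenue": 2400.0},
    {"name": "print", "cost": 500.0, "revenue": 450.0},
]

def roi(cost, revenue):
    """Return on investment as a fraction: (revenue - cost) / cost."""
    return (revenue - cost) / cost

# KPI per campaign, then a single summary figure for stakeholders.
roi_by_campaign = {c["name"]: roi(c["cost"], c["revenue"]) for c in campaigns}
average_roi = mean(roi_by_campaign.values())

for name, value in roi_by_campaign.items():
    print(f"{name}: ROI = {value:.0%}")
print(f"average ROI = {average_roi:.1%}")
```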
Diagnostic analytics helps answer questions about why things happened. These
techniques supplement more basic descriptive analytics. They take the findings from
descriptive analytics and dig deeper to find the cause. The performance indicators are
further investigated to discover why they got better or worse. This generally occurs in
three steps:
1. Identify anomalies in the data. These may be unexpected changes in a metric or a
particular market.
2. Data that is related to these anomalies is collected.
3. Statistical techniques are used to find relationships and trends that explain these
anomalies.
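The first of these steps can be sketched in Python as follows. The weekly sales figures are hypothetical, and the rule of flagging values more than two standard deviations from the mean is only one possible choice of threshold, assumed here for illustration.

```python
from statistics import mean, stdev

# Hypothetical weekly sales figures; week 6 contains an unexpected drop.
weekly_sales = [120, 125, 118, 122, 121, 60, 119, 123]

avg = mean(weekly_sales)
spread = stdev(weekly_sales)

# Flag values that lie more than two standard deviations from the mean.
anomalies = [
    (week, value)
    for week, value in enumerate(weekly_sales, start=1)
    if abs(value - avg) > 2 * spread
]
print(anomalies)
```

The flagged weeks are then the starting point for collecting related data and looking for an explanation.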
Predictive analytics helps answer questions about what will happen in the future.
These techniques use historical data to identify trends and determine if they are likely
to recur. Predictive analytical tools provide valuable insight into what may happen in the
future, and their techniques include a variety of statistical and machine learning methods,
such as neural networks, decision trees, and regression.
Prescriptive analytics helps answer questions about what should be done. By using insights
from predictive analytics, data-driven decisions can be made. This allows businesses to
make informed decisions in the face of uncertainty. Prescriptive analytics techniques rely
on machine learning strategies that can find patterns in large datasets. By analysing past
decisions and events, the likelihood of different outcomes can be estimated.
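Regression, mentioned above as a predictive technique, can be illustrated with a small least-squares trend fit. This is a minimal sketch on hypothetical monthly sales; the helper `fit_line` is written for this example and is not part of any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data: sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [100.0, 110.0, 120.0, 130.0, 140.0]

slope, intercept = fit_line(months, sales)

# Use the fitted trend to predict month 6.
forecast = slope * 6 + intercept
print(forecast)
```

Real data is rarely this regular, so a forecast of this kind should be read together with a measure of how well the line fits the history.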
CHAPTER II
Problem Statement:
The client provided KPMG with 3 datasets:
Customer Demographic
Customer Addresses
Transaction data for the past 3 months
The goal is to identify and correct data quality issues in these datasets, such as
inaccurate, incomplete, duplicate, or null values.
CHAPTER III
Task 1: Data Quality Assessment
As per voicemail, please find the 3 datasets attached from Sprocket Central Pty Ltd:
Customer Demographic
Customer Addresses
Transaction data in the past three months
Can you please review the data quality to ensure that it is ready for our analysis in phase two?
Remember to take note of any assumptions or issues we need to go back to the client on, as
well as recommendations going forward to mitigate current data quality concerns.
I’ve also attached a data quality framework as a guideline. Let me know if you have any
questions.
Draft an email to the client identifying the data quality issues and strategies to mitigate
these issues. Refer to ‘Data Quality Framework Table’ and resources below for
criteria and dimensions which have been considered.
Programs like Excel, Google Sheets, Tableau, or Power BI are a good place to start. Feel
free to use Python, R, MATLAB, and other data analytics tools that you know of.
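For readers who take the Python route, a data quality assessment of this kind might start with checks like the ones below. This is a minimal sketch: the records, the field names (`customer_id`, `gender`, `dob`), and the allowed gender codes are assumptions for illustration, not the contents of the client's datasets.

```python
from collections import Counter

# Hypothetical records; the field names are illustrative only.
customers = [
    {"customer_id": 1, "gender": "F", "dob": "1985-04-12"},
    {"customer_id": 2, "gender": "Female", "dob": ""},
    {"customer_id": 2, "gender": "Female", "dob": ""},
    {"customer_id": 3, "gender": "M", "dob": "1843-12-21"},
]

def missing_counts(records):
    """Completeness: count empty or None values per field."""
    counts = Counter()
    for record in records:
        for field, value in record.items():
            if value in ("", None):
                counts[field] += 1
    return dict(counts)

def duplicate_ids(records, key):
    """Uniqueness: return key values that occur more than once."""
    seen = Counter(record[key] for record in records)
    return sorted(k for k, n in seen.items() if n > 1)

def inconsistent_values(records, field, allowed):
    """Consistency: return values outside the agreed set of codes."""
    return sorted({r[field] for r in records if r[field] not in allowed})

print(missing_counts(customers))
print(duplicate_ids(customers, "customer_id"))
print(inconsistent_values(customers, "gender", {"F", "M", "U"}))
```

Each function maps to one dimension of a data quality framework, so the findings can be reported to the client dimension by dimension. Accuracy checks, such as the implausible birth year in the last record, would need additional rules.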
Task 2:
Sprocket Central Pty Ltd has given us a new list of 1000 potential customers with
their demographics and attributes. However, these customers do not have prior
transaction history with the organization.
The marketing team at Sprocket Central Pty Ltd is sure that, if correctly analysed, the
data would reveal useful customer insights which could help optimize resource
allocation for targeted marketing and hence improve performance by focusing on
high-value customers.
For context, Sprocket Central Pty Ltd is a long-standing KPMG client that specializes
in high-quality bikes and accessible cycling accessories for riders.
Their marketing team is looking to boost business by analysing their existing
customer dataset to determine customer trends and behaviour.
Using the existing 3 datasets (Customer demographic, customer address and
transactions) as a labelled dataset, please recommend which of these 1000 new
customers should be targeted to drive the most value for the organization.
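One simple way to use the existing customers as labelled data is to estimate the average value of each customer segment and rank the new customers by the value of the segment they belong to. The sketch below assumes a single hypothetical attribute (`wealth_segment`) and made-up profit figures; it is an illustration of the idea, not the model used in this internship.

```python
from collections import defaultdict

# Hypothetical existing customers: one segment attribute plus past profit.
existing = [
    {"wealth_segment": "Mass Customer", "profit": 300.0},
    {"wealth_segment": "Mass Customer", "profit": 500.0},
    {"wealth_segment": "High Net Worth", "profit": 900.0},
    {"wealth_segment": "High Net Worth", "profit": 700.0},
    {"wealth_segment": "Affluent Customer", "profit": 600.0},
]

# Hypothetical new customers with no transaction history.
new_customers = [
    {"name": "A", "wealth_segment": "Mass Customer"},
    {"name": "B", "wealth_segment": "High Net Worth"},
    {"name": "C", "wealth_segment": "Affluent Customer"},
]

def average_profit_by_segment(records):
    """Average historical profit for each segment value."""
    profits = defaultdict(list)
    for record in records:
        profits[record["wealth_segment"]].append(record["profit"])
    return {segment: sum(vals) / len(vals) for segment, vals in profits.items()}

def rank_new_customers(candidates, segment_value):
    """Order candidates by the expected value of their segment, highest first."""
    return sorted(
        candidates,
        key=lambda c: segment_value.get(c["wealth_segment"], 0.0),
        reverse=True,
    )

segment_value = average_profit_by_segment(existing)
ranked = rank_new_customers(new_customers, segment_value)
print([c["name"] for c in ranked])
```

In practice several attributes (for example age group, industry, and location) would be combined, but the principle of scoring new customers from patterns in existing ones stays the same.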
Task 3:
The client is happy with the analysis plan and would like us to proceed. After
building the model we need to present our results back to the client.
Visualizations such as interactive dashboards often help us highlight key findings and
convey our ideas in a more succinct manner. A list of customers or an algorithm won’t cut
it with the client; we need to support our results with the use of visualizations.
Please develop a dashboard that we can present to the client at our next meeting.
Display your data summary and results of the analysis in a dashboard.
It is important to keep in mind the business context when presenting your findings:
What are the trends in the underlying data?
Which customer segment has the highest customer value?
What do you propose should be Sprocket Central Pty Ltd’s marketing and growth strategy?
What additional external datasets may be useful to obtain greater insights into customer
preferences and propensity to purchase the products?
Specifically, your presentation should specify who Sprocket Central Pty Ltd’s marketing team
should be targeting out of the new 1000 customer list as well as the broader market segment
to reach out to.
CHAPTER IV
Result and analysis:
We have successfully analysed the dataset given by Sprocket Central Pty
Ltd. The final step was interpreting the results from the data analysis. This part is essential
because it's how a business will gain actual value from the previous four steps. Interpreting
data analysis results should validate why you conducted it, even if it's not 100 percent
conclusive.
OUTPUT:
Before Analysis
After Analysis
CONCLUSION:
Hence, we have successfully completed the assessment of data quality and completeness
in preparation for analysis, then targeted high-value customers based on customer
demographics and attributes, and used visualizations to present insights.