
DEPARTMENT OF COMPUTER SCIENCE AND

BUSINESS SYSTEMS

LABORATORY MANUAL

21CB5611 – COMPUTATIONAL STATISTICS LABORATORY

ACADEMIC YEAR : 2023 – 2024 / ODD SEMESTER


CLASS/SEM : III YEAR, B.TECH CS & BS / V SEM

PREFACE
Learning is a process that requires class instruction and practice in the laboratory. If either is omitted, the learning process is clearly flawed. This manual is an attempt to strengthen the lab instruction through the development of a lab curriculum that is based on the class curriculum. The manual is intended to be used by lab instructors, course instructors and students.
The intent of this curriculum is to define a clear lab structure that can be followed by the lab instructor and the students. One of the greatest problems faced by lab instructors is keeping the students occupied for the entire duration of the lab; when this fails, the learning process is greatly hampered.
The labs have been developed in such a way that there is synchronization between the class and the lab. The manual has been divided into 15 labs of 3 hours each. Students of the course are expected to read the concept map carefully before coming to the lab. Students come to the lab with a design/program that is handed over to the lab instructor for grading. The code/design is based on previous learning and experiments. Each lab has a detailed walk-through task which provides a problem statement and its programmed solution. The students can raise queries about the code provided, and the lab instructor will explain how the solution has been designed.
Thereafter, predefined practice questions are presented, each with a fixed duration and grade. Students are graded on their accomplishments in these practice tasks. At the end of the lab, the lab instructor assigns an unseen task to the students. This unseen task covers all the concepts taught in the lab. These unseen tasks have a higher level of complexity and generally carry more marks.
What sets these labs apart is that a clear grading criterion has been defined for each lab. Students are aware of the grading criteria and are expected to meet the requirements for successful completion of each lab.

COLLEGE VISION AND MISSION

College Vision

Our Vision is "To create innovative and vibrant young leaders and entrepreneurs in
Engineering and Technology for building India as a super knowledge power and blossom into a
University of excellence recognized globally".

College Mission

To provide education in Engineering with excellence and ethics and to reach the unreached.

COLLEGE QUALITY POLICY

We aim at continuous pursuit for excellence through

 Quality education tapped from National and International Resources


 Modular approach to channelize knowledge and programmed evaluation
of knowledge accumulated.
 Continuous review and renewal of quality systems leading to quality output.
 Producing Engineers with strong ethical and moral background.

DEPARTMENT OF COMPUTER SCIENCE AND BUSINESS SYSTEMS
Computer Science and Business Systems (CSBS) is one of the popular courses among engineering aspirants. It mainly focuses on computation, analysis of algorithms, programming languages, program design, software engineering, computer hardware, computer networks and problem-solving skills. The course was established in the year 2021 at Francis Xavier Engineering College. To address the growing need for engineering talent with skills in digital technology, TCS, in partnership with leading academicians across India, has designed a four-year undergraduate programme in Computer Science titled "Computer Science and Business Systems".

This curriculum aims to ensure that the students graduating from the program not only know the
core topics of Computer Science but also develop an equal appreciation of humanities, human
values, Financial Management, Services Science & Service Operational Management, Marketing
Research, and Marketing Management.

DEPARTMENT VISION AND MISSION

Department Vision

To become a center of excellence in Information Technology and to generate young


Engineers with enriched knowledge to serve industries with high values and social
responsibilities.

Department Mission

 To provide world class teaching learning environment and to offer computing

education programs.
 To inculcate varied skill sets that meet global industry standards and to practice

moral values.
 To enrich moral and ethical values to lead and serve the society.

DEPARTMENT QUALITY POLICY

We aim at continuous pursuit for excellence through

 Identify and prioritize quality education in Computer Science and Business Systems tapped from national and international resources.

 Promote a modular approach to channelize knowledge and sharing of best practices relative to quality management in Computer Science and Engineering.

 Provide a focal point for an extended IT quality network comprised of end users and

providers leading to quality output.

 To develop professional and ethical attitude, effective communication skills, moral

values and an ability to relate engineering issues to social welfare

PROGRAM EDUCATIONAL OBJECTIVES

S. No.  PEOs   Definition of PEOs
I    PEO 1  To apply problem-solving skills in Computer Science and Business Management by applying Engineering fundamentals.
II   PEO 2  To improve communication skills, business management skills, professional ethics, team work and to innovate technologies for the betterment of society.
III  PEO 3  To exhibit leadership qualities, interpersonal skills and to progress through life-long learning.
IV   PEO 4  To develop professional and ethical attitude, effective communication skills, moral values and an ability to relate engineering issues to social welfare.

PROGRAM SPECIFIC OUTCOMES

S. No.  Program Specific Outcomes
PSO1  Enriched knowledge in Computer Science, Business Management and human ethics.
PSO2  The students will have effective knowledge in software engineering principles and in solving scientific and business problems.
PSO3  The students will explore emerging technologies in Information and Communication Technologies (ICT), Business Analytics and Machine Learning to innovate ideas and solutions to existing/novel Business applications.

PROGRAM OUTCOMES
S. No.  Programme Outcomes
PO1  Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO2  Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
PO3  Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.
PO4  Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
PO5  Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
PO6  The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
PO7  Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development.
PO8  Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
PO9  Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
PO10  Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
PO11  Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
PO12  Life-long learning: Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.

LABORATORY

INTRODUCTION

OBJECTIVE
The graduates from the Data and Computational Science course are called computational and data scientists. These scientists have the responsibility to work on mathematical models, develop quantitative analysis techniques and learn the use of computers to analyse and solve real-life scientific problems. Knowledge of innovative tools, and of how to collaborate with clients and fulfil their demands, is one of the most important aspects of the Data and Computational Science course. Techniques such as modelling, simulation and data mining are also studied in the Data and Computational Science course.

SCOPE OF DATA AND COMPUTATIONAL SCIENCE
Since more technologies are being used in today's world, the need for data scientists is increasing tremendously. Therefore, the scope of Data and Computational Science is huge. The Data and Computational Science course is very attractive for the young generation as well as professionals in this field. Demand exists in the sectors of information technology, telecom, manufacturing, finance and insurance, retail and much more. Data Science is used in the fields of e-commerce, manufacturing, banking and finance, healthcare, and transport. Data and Computational Science professionals can find employment in top companies such as Amazon, Walmart, and Mate Labs, with a variety of job roles such as software engineer, data scientist, business analyst, and many more.

Do’s and Don’ts

Do’s
 Know the location of the fire extinguisher and the first aid box and how to use

them in case of emergency.

 Read and understand how to carry out an activity thoroughly before coming to the

laboratory.

 Report fires or accidents immediately.

 Report any broken plugs or exposed electrical wires immediately.

 Turn off the machine when it is not in use.

 Always maintain an extra copy of all your important data.

Don'ts
 Do not eat or drink in the laboratory.

 Avoid stepping on electrical wires or any other computer cables.

 Do not insert metal objects such as clips, pins and needles into the computer casings.

They may cause fire.

 Do not remove anything from the computer laboratory without permission.

 Do not touch, connect or disconnect any plug or cable without permission.

 Do not run inside the Lab.

 Do not personalize the computers, for example, installing screen savers,

changing the desktop background or changing the video and audio settings.

Safety Measures and Guidelines
 Take a note of all the exits in the room and also take note of the location of

the fire extinguishers in the room for the sake of fire safety.

 Try not to type continuously for extremely long periods.

 Look away from the screen once in a while to give your eyes a rest.

 Do not attempt to open any machines and do not touch the backs of machines when

they are switched on.

 Do not spill water or any other liquid on the machines in order to maintain electrical
safety.

 Do not personalize the computers , for example, installing screen savers,

changing the desktop background or changing the video and audio settings.

 All connections and disconnections must be performed by the technical staff.

 Report fires or accidents immediately.

 Report any broken plugs or exposed electrical wires immediately.

 Turn off the machine when it is not in use.

Instructions to Teachers
 Teachers should review the experiment's instructions prior to class for proper conduct of the experiments.
 Teachers must instruct students in Internet safety.
 Teachers must remain in the lab at all times and are responsible for discipline.
 Teachers must report any computer with missing or damaged hardware or peripherals.
 Teachers are expected to closely monitor student activity by frequent screen checks.
 Teachers should report any non-functioning technology equipment to their Department Head.
 When using computer labs, teachers should turn off the digital projector and return the room key after the doors have been locked. Doors to computer labs must be locked when not in use.
 Everyone will adhere to federal copyright laws.

Instructions to Students
 Students should follow the lab dress code whenever they avail the laboratory facilities and make sure their ID cards are worn visibly.
 Whenever students enter the lab, they should make an entry in the log register kept for that purpose.
 Only observation note books / record note books are allowed inside the lab; other belongings are not allowed.
 Maintain silence in the lab.
 Only one user is allowed to work on a system at a time.
 If any problem occurs in the software or hardware, it should be brought to the notice of the staff in-charge, and an entry should be made in the log register kept for that purpose.
 The laboratory must be kept clean and neat.
 Arrange the chairs before leaving the lab.
 Shut down the systems properly before leaving the laboratory.

Lab Code of Conduct

 You must wear your ID and Lab Coat each time you enter a computer lab. If you do
not have your ID, or lab coat when entering the computer lab, you may be asked to
leave the computer lab.
 No drinking or eating is allowed in any computer lab. All open and unopened food,
and beverages are prohibited from entering the computer lab.
 You must be considerate of other users. Privacy and concentration are important in
computer labs. If you need to talk to somebody, please do so in a way that does not
disturb other users.
 Lab assistants are there to assist in using the technology so that you may complete
your work.
 The computer labs are an academic resource. As such, please respect the needs of
others by not monopolizing the computers for non-academic use.
 Lab staff is not responsible for any belongings left in the computer labs. Please make
sure you take your belongings with you when you leave.
 The computers in the labs have been set up in such a way as to be used by multiple
people having differing needs. Do not change or interfere with the configuration of
the computers.
 Software downloaded from the Internet is not to be installed on any lab computer for
any purpose.
 Documents should be saved to the D drive.
 Users are not allowed to print large quantities of flyers, banners or other distribution
materials. If print jobs of this nature are required, one copy may be printed in the
computer lab and copies will need to be processed through the alternative printing
facility.
 Attempting to damage or destroy information on the computers will not be tolerated.
 You are expected to leave your computer in the same condition as you found it. This
includes putting chairs back in place and logging out when you leave.
 You are responsible for reading and abiding by all signs posted in the computer labs.

Major Lab Equipment with Specifications

LAB EQUIPMENT FOR A BATCH OF 30 STUDENTS:


 Standalone desktops with Anaconda/ R studio
 Operating system: Windows

OBJECTIVES: Course Objectives and Course Outcomes
The student should be made to:

1. To expose students to the variables, expressions and control statements of R.
2. To use R programming for analysis of data and to visualize outcomes in the form of graphs and charts.
3. To develop and understand the modern computational statistical approaches and their applications to different data sets.
4. To apply principles of data science to analyse various business problems.
5. To use R software to carry out statistical computations and to analyse data using R.

OUTCOMES:

At the end of the course, the student should be able to

CO1 Apply the basic concepts of Computational Statistics using python & R
CO2 Apply the Graph techniques
CO3 Apply the multivariate graphing techniques
CO4 Apply the concept of regression and clustering
CO5 Implement a project based on the Data Analytics

Mapping Course Outcomes with Program Outcomes

Course Outcomes (table columns): learn and apply different mining techniques for large volumes of data; understand knowledge discovery process and methodologies; build a predictive analytic solution; analyze data by using clustering, association and classification algorithms and recommendation systems; work with big data tools and analysis.

Program Outcomes (table rows), with the mapping strengths (H = High, M = Medium) as given in the original matrix:
Pa. Engineering Knowledge: Apply knowledge of mathematics, science, engineering fundamentals and an engineering specialization for building engineering models. (H H H H M)
Pb. Problem Analysis: Identify and solve engineering problems reaching conclusions using mathematics and engineering sciences. (H H H H M)
Pc. Design/Development of Solutions: Design and develop solutions for engineering problems that meet specified needs. (H H H H M)
Pd. Conduct Investigations of Complex Problems: Conduct investigations of complex problems including design of experiments and analysis to provide valid solutions. (H H H)
Pe. Modern Tool Usage: Create and apply appropriate techniques, resources, and modern engineering tools for executing engineering activities. (H M)
Pf. The Engineer and Society: Apply reasoning of the societal, safety issues and the consequent responsibilities relevant to engineering practice.
Pg. Environment and Sustainability: Understand the impact of engineering solutions in the environment and exhibit the knowledge for sustainable development. (M)
Ph. Ethics: Apply ethical principles and commit to professional ethics, responsibilities and norms of engineering practice. (M)
Pi. Individual and Team Work: Function effectively as an individual, and as a member or leader in diverse teams in multi-disciplinary settings. (M)
Pj. Communication: Communicate effectively to the engineering community and the outside world and also to write effective reports. (M)

Pk. Project Management and Finance: Understand engineering and management principles and apply them to handle projects in multi-disciplinary environments. (M M M M H)
Pl. Life-Long Learning: Recognize the need for life-long learning and apply it in the context of technological change.

LIST OF EXPERIMENTS

21CB5611 COMPUTATIONAL STATISTICS LABORATORY    L T P C: 0 0 4 2
Preamble
The goal of the course is to present essential statistical concepts. Simulation is used to illustrate the concepts, to build understanding and to develop facility with the underlying mathematical operations.
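For instance, a short simulation in Python (an illustrative sketch only; the distribution and parameters are assumptions, not one of the prescribed exercises) shows how repeated sampling can be used to study the behaviour of a sample mean:

import numpy as np

# simulate 10,000 replications of the mean of 30 exponential observations
rng = np.random.default_rng(0)
sample_means = rng.exponential(scale=2.0, size=(10_000, 30)).mean(axis=1)

# the simulated means concentrate near the true mean (2.0), with spread close
# to 2.0 / sqrt(30), illustrating the Central Limit Theorem by simulation
print(sample_means.mean(), sample_means.std())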
Prerequisites for the course
 21MA3205- Probability and Statistics
 21IT4601-Introduction to algorithms
Objectives
o To expose students to the variables, expressions and control statements of R.
o To use R programming for analysis of data and to visualize outcomes in the form of graphs and charts.
o To develop and understand the modern computational statistical approaches and their applications to different data sets.
o To apply principles of data science to analyse various business problems.
o To use R software to carry out statistical computations and to analyse data using R.

S.No  List of Experiments  CO
1  Python Concepts, Data Structures  CO1
2  Classes: Interpreter, Program Execution, Statements, Expressions, Flow Controls, Functions, Numeric Types  CO1
3  Sequences and Class Definition, Constructors, Text & Binary Files - Reading and Writing  CO1
4  Visualization in Python: Matplotlib package  CO2
5  Plotting Graphs, Controlling Graph, Adding Text  CO2
6  More Graph Types, Getting and setting values, Patches  CO2
7  Multivariate data analysis: Multiple regression  CO3
8  Multivariate regression, cluster analysis with various algorithms  CO4
9  Factor analysis  CO4
10  PCA and linear discriminant analysis  CO5

Total Periods: 60


S. No List of Test Projects CO
1 Market Basket Analysis CO5
2 Reducing Manufacturing Failures CO5
3 Insurance Pricing Forecast CO5
4 City Employee Salary Data Analysis CO5
5 Churn Prediction in Telecom CO5
6 Predicting Wine Preferences of Customers using Wine Dataset CO5
7 Identifying Product Bundles from Sales Data CO5
8 Movie Review Sentiment Analysis CO5
9 Store Sales Forecasting CO5
10 Building a Music Recommendation Engine CO5
11 Airline Dataset Analysis CO5
12 Predicting Flight Delays CO5
13 Event Data Analysis CO5
14 Building a Job Portal using Twitter Data CO5
15 Implementing Slowly Changing Dimensions in a Data CO5

INDEX

S.No  List of Experiments  Page No
1  Python Concepts, Data Structures  25
2  Classes: Interpreter, Program Execution, Statements, Expressions, Flow Controls, Functions, Numeric Types  29
3  Sequences and Class Definition, Constructors, Text & Binary Files - Reading and Writing  31
4  Visualization in Python: Matplotlib package  35
5  Plotting Graphs, Controlling Graph, Adding Text  37
6  More Graph Types, Getting and setting values, Patches  40
7  Multivariate data analysis: Multiple regression  43
8  Multivariate regression, cluster analysis with various algorithms  45
9  Factor analysis  49
10  PCA and linear discriminant analysis  50
11  Test Projects  55
12  YouTube links  24

INDEX
S.No.  List of Projects  Related Experiment  CO
1.  Market Basket Analysis  Exp. 1,2,3,4  CO1-CO5
2.  Reducing Manufacturing Failures  Exp. 5,6,7,8  CO1-CO5
3.  Insurance Pricing Forecast  Exp. 1-11  CO1-CO5
4.  City Employee Salary Data Analysis  Exp. 1,3,4,5,9  CO1-CO5
5.  Churn Prediction in Telecom  Exp. 1-11  CO1-CO5
6.  Predicting Wine Preferences of Customers using Wine Dataset  Exp. 2,3,4,10  CO1-CO5
7.  Identifying Product Bundles from Sales Data  Exp. 1,2,3,7,8  CO1-CO5
8.  Movie Review Sentiment Analysis  Exp. 3,5,6,7  CO1-CO5
9.  Store Sales Forecasting  Exp. 1-11  CO1-CO5
10.  Building a Music Recommendation Engine  Exp. 1-11  CO1-CO5
11.  Airline Dataset Analysis  Exp. 1,2,3,10,11  CO1-CO5
12.  Predicting Flight Delays  Exp. 1,2,4,6  CO1-CO5
13.  Event Data Analysis  Exp. 4,6,7,10  CO1-CO5
14.  Building a Job Portal using Twitter Data  Exp. 1-11  CO1-CO5
15.  Implementing Slowly Changing Dimensions in a Data  Exp. 1-11  CO1-CO5

YOU TUBE LINKS FOR THE LAB SESSION


S.No  EXP  Topics to be covered  Course Outcome  Video Link
1  1  Python Concepts, Data Structures  CO1  https://drive.google.com/file/d/1gNdHoysEX99wOg8WSHOtKHOojJtlYiuS/view?usp=drive_link
2  2  Classes: Interpreter, Program Execution, Statements, Expressions, Flow Controls, Functions, Numeric Types  CO1  https://drive.google.com/file/d/1wGWKGltG9lXc8J9fmF_e6e0BYisMjNdP/view?usp=drive_link
3  3  Sequences and Class Definition, Constructors, Text & Binary Files - Reading and Writing  CO1  https://drive.google.com/file/d/158IIflaoGOtIruzO7nK_87uYWmq6Xj-A/view?usp=drive_link
4  4  Visualization in Python: Matplotlib package  CO2  https://drive.google.com/file/d/1FcP0wtOL5-iHWCbMg5s884U2HQnolrzs/view?usp=drive_link
5  5  Plotting Graphs, Controlling Graph, Adding Text  CO2  https://drive.google.com/file/d/1WAhspVw24Ru4htF_czxJzObvzGAUmXdY/view?usp=drive_link
6  6  More Graph Types, Getting and setting values, Patches  CO2  https://drive.google.com/file/d/1LRbfm9yhLn7zQPLW8qu258Z3E3ph0Bck/view?usp=drive_link
7  7  Multivariate data analysis: Multiple regression  CO3  https://drive.google.com/file/d/1j54s6ugN93hPZ5ChAU3NK65Rpy0wygd9/view?usp=drive_link
8  8  Multivariate regression, cluster analysis with various algorithms  CO4  https://drive.google.com/file/d/1pw3wlo3MphKzMUKef9oPkDoFbDYL7vYK/view?usp=drive_link
9  9  Factor analysis  CO4  https://drive.google.com/file/d/1RAcOgQa6eynk-iZq48P_kvRbhc_V3did/view?usp=drive_link
10  10  PCA and linear discriminant analysis  CO5  https://drive.google.com/file/d/1NRNMPBHqznbig0fzDz2jSPEasQpz19aC/view?usp=drive_link

EXPERIMENT 1: Python Concepts, Data Structures


PROGRAM:

Exercise 1.1
# Creating a List with
# the use of multiple values
List = ["Geeks", "For", "Geeks"]
print("\nList containing multiple values: ")
print(List)

# Creating a Multi-Dimensional List


# (By Nesting a list inside a List)
List2 = [['Geeks', 'For'], ['Geeks']]
print("\nMulti-Dimensional List: ")
print(List2)

# accessing a element from the


# list using index number
print("Accessing element from the list")
print(List[0])
print(List[2])

# accessing a element using


# negative indexing
print("Accessing element using negative indexing")

# print the last element of list


print(List[-1])

# print the third last element of list


print(List[-3])

Output
List containing multiple values:
['Geeks', 'For', 'Geeks']
Multi-Dimensional List:
[['Geeks', 'For'], ['Geeks']]
Accessing element from the list
Geeks
Geeks
Accessing element using negative indexing
Geeks
Geeks
Exercise 1.2
# Creating a Dictionary
Dict = {'Name': 'Geeks', 1: [1, 2, 3, 4]}
print("Creating Dictionary: ")
print(Dict)

# accessing a element using key


print("Accessing a element using key:")
print(Dict['Name'])

# accessing a element using get()


# method
print("Accessing a element using get:")
print(Dict.get(1))

# creation using Dictionary comprehension


myDict = {x: x**2 for x in [1,2,3,4,5]}
print(myDict)

Output
Creating Dictionary:
{'Name': 'Geeks', 1: [1, 2, 3, 4]}
Accessing a element using key:
Geeks
Accessing a element using get:
[1, 2, 3, 4]
{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

Exercise 1.3
# Creating a Tuple with
# the use of Strings
Tuple = ('Geeks', 'For')
print("\nTuple with the use of String: ")
print(Tuple)

# Creating a Tuple with


# the use of list
list1 = [1, 2, 4, 5, 6]
print("\nTuple using List: ")
Tuple = tuple(list1)
# Accessing element using indexing
print("First element of tuple")
print(Tuple[0])

# Accessing element from last


# negative indexing
print("\nLast element of tuple")
print(Tuple[-1])

print("\nThird last element of tuple")


print(Tuple[-3])

Output
Tuple with the use of String:
('Geeks', 'For')
Tuple using List:
First element of tuple
1
Last element of tuple
6
Third last element of tuple
4

Exercise 1.4
# Creating a Set with
# a mixed type of values
# (Having numbers and strings)
Set = set([1, 2, 'Geeks', 4, 'For', 6, 'Geeks'])
print("\nSet with the use of Mixed Values")
print(Set)

# Accessing element using


# for loop
print("\nElements of set: ")
for i in Set:
    print(i, end=" ")
print()

# Checking the element


# using in keyword
print("Geeks" in Set)
Output
Set with the use of Mixed Values
{1, 2, 'Geeks', 4, 6, 'For'}
Elements of set:
1 2 Geeks 4 6 For
True

Exercise 1.5
# importing "collections" for deque operations
import collections

# initializing deque
de = collections.deque([1,2,3])

# using append() to insert element at right end


# inserts 4 at the end of deque
de.append(4)

# printing modified deque


print("The deque after appending at right is : ")
print(de)

# using appendleft() to insert element at left end


# inserts 6 at the beginning of deque
de.appendleft(6)

# printing modified deque


print("The deque after appending at left is : ")
print(de)

# using pop() to delete element from right end


# deletes 4 from the right end of deque
de.pop()

# printing modified deque


print("The deque after deleting from right is : ")
print(de)

# using popleft() to delete element from left end


# deletes 6 from the left end of deque
de.popleft()

# printing modified deque


print("The deque after deleting from left is : ")
print(de)

Output
The deque after appending at right is :
deque([1, 2, 3, 4])
The deque after appending at left is :
deque([6, 1, 2, 3, 4])
The deque after deleting from right is :
deque([6, 1, 2, 3])
The deque after deleting from left is :
deque([1, 2, 3])

RESULT

Thus, the python programs using data structure concepts was executed successfully.

EXPERIMENT 2: Classes: Interpreter, Program Execution, Statements,


Expressions, Flow Controls, Functions, Numeric Types

Exercise 2.1
# Python program to show how a simple if keyword works
# Initializing some variables
v=5
t=4
print("The initial value of v is", v, "and that of t is ",t)

# Creating a selection control structure


if v > t:
    print(v, "is bigger than ", t)
    v -= 2

print("The new value of v is", v, "and the t is ", t)

# Creating the second control structure
if v < t:
    print(v, "is smaller than ", t)
    v += 1

print("the new value of v is ", v)

# Creating the third control structure
if v == t:
    print("The value of v, ", v, " and t,", t, ", are equal")

Output:
The initial value of v is 5 and that of t is 4
5 is bigger than 4
The new value of v is 3 and the t is 4
3 is smaller than 4
the new value of v is 4
The value of v, 4 and t, 4, are equal

Exercise 2.2
# Python program to show how to execute a for loop
# Creating a sequence. In this case, a list

l = [2, 4, 7, 1, 6, 4]

# Executing the for loops


for i in range(len(l)):
    print(l[i], end=", ")
print("\n")
for j in range(0, 10):
    print(j, end=", ")

Output:
2, 4, 7, 1, 6, 4,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
Exercise 2.3
# Python program to show how to execute a while loop
b=9
a=2
# Starting the while loop
# The condition a < b will be checked before each iteration
while a < b:
    print(a, end=" ")
    a = a + 1
print("While loop is completed")

Output:
2 3 4 5 6 7 8 While loop is completed

Exercise 2.4
# defining a function to calculate LCM
def calculate_lcm(x, y):
    # selecting the greater number
    if x > y:
        greater = x
    else:
        greater = y
    while True:
        if (greater % x == 0) and (greater % y == 0):
            lcm = greater
            break
        greater += 1
    return lcm

# taking input from users
num1 = int(input("Enter first number: "))
num2 = int(input("Enter second number: "))

# printing the result for the users
print("The L.C.M. of", num1, "and", num2, "is", calculate_lcm(num1, num2))

Output:
Enter first number: 3
Enter second number: 4
The L.C.M. of 3 and 4 is 12

RESULT

Thus, the class based python programs were executed successfully.

EXPERIMENT 3: Sequences and Class Definition, Constructors, Text &


Binary Files - Reading and Writing

Exercise 3.1
class Addition:
    first = 0
    second = 0
    answer = 0

    # parameterized constructor
    def __init__(self, f, s):
        self.first = f
        self.second = s

    def display(self):
        print("First number = " + str(self.first))
        print("Second number = " + str(self.second))
        print("Addition of two numbers = " + str(self.answer))

    def calculate(self):
        self.answer = self.first + self.second

# creating object of the class
# this will invoke parameterized constructor
obj1 = Addition(1000, 2000)

# creating second object of same class


obj2 = Addition(10, 20)

# perform Addition on obj1


obj1.calculate()

# perform Addition on obj2


obj2.calculate()

# display result of obj1


obj1.display()

# display result of obj2


obj2.display()

Output
First number = 1000
Second number = 2000
Addition of two numbers = 3000
First number = 10
Second number = 20
Addition of two numbers = 30
Exercise 3.2
import os

def create_file(filename):
    try:
        with open(filename, 'w') as f:
            f.write('Hello, world!\n')
        print("File " + filename + " created successfully.")
    except IOError:
        print("Error: could not create file " + filename)

def read_file(filename):
    try:
        with open(filename, 'r') as f:
            contents = f.read()
        print(contents)
    except IOError:
        print("Error: could not read file " + filename)

def append_file(filename, text):
    try:
        with open(filename, 'a') as f:
            f.write(text)
        print("Text appended to file " + filename + " successfully.")
    except IOError:
        print("Error: could not append to file " + filename)

def rename_file(filename, new_filename):
    try:
        os.rename(filename, new_filename)
        print("File " + filename + " renamed to " + new_filename + " successfully.")
    except IOError:
        print("Error: could not rename file " + filename)

def delete_file(filename):
    try:
        os.remove(filename)
        print("File " + filename + " deleted successfully.")
    except IOError:
        print("Error: could not delete file " + filename)

if __name__ == '__main__':
    filename = "example.txt"
    new_filename = "new_example.txt"

    create_file(filename)
    read_file(filename)
    append_file(filename, "This is some additional text.\n")
    read_file(filename)
    rename_file(filename, new_filename)
    read_file(new_filename)
    delete_file(new_filename)

Output:
File example.txt created successfully.
Hello, world!

Text appended to file example.txt successfully.


Hello, world!
This is some additional text.

File example.txt renamed to new_example.txt successfully.


Hello, world!
This is some additional text.

File new_example.txt deleted successfully.

RESULT

Thus, the python program using constructors and file handling methods was executed
successfully.
EXPERIMENT 4: Visualization in Python: Matplotlib package
# Python program to show pyplot module
import matplotlib.pyplot as plt
from matplotlib.figure import Figure
# initializing the data
x = [10, 20, 30, 40]
y = [20, 25, 35, 55]

# Creating a new figure with width = 7 inches


# and height = 5 inches with facecolor as
# green, edgecolor as blue and the line width
# of the edge as 7
fig = plt.figure(figsize=(7, 5), facecolor='g',
                 edgecolor='b', linewidth=7)

# Creating a new axes for the figure


ax = fig.add_axes([1, 1, 1, 1])

# Adding the data to be plotted


ax.plot(x, y)

# Adding title to the plot


plt.title("Linear graph", fontsize=25, color="yellow")

# Adding label on the y-axis


plt.ylabel('Y-Axis')

# Adding label on the x-axis


plt.xlabel('X-Axis')

# Setting the limit of y-axis


plt.ylim(0, 80)

# setting the labels of x-axis


plt.xticks(x, labels=["one", "two", "three", "four"])

# Adding legends
plt.legend(["GFG"])

plt.show()

Output:

RESULT

Thus, the Visualization in Python: Matplotlib package was implemented successfully.

EXPERIMENT 5: Plotting Graphs, Controlling Graph, Adding Text


Exercise 5.1
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-10, 10, 0.01)


y = x**2

plt.xlabel("X-axis", fontsize = 15)


plt.ylabel("Y-axis",fontsize = 15)

#Adding text inside a rectangular box by using the keyword 'bbox'


plt.text(-5, 60, 'Parabola $Y = x^2$', fontsize = 22,
bbox = dict(facecolor = 'red', alpha = 0.5))

plt.plot(x, y, c = 'g')

plt.show()

Output:

Exercise 5.2
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.1)


y = np.sin(x)

plt.plot(x,y)

plt.text(3.5, 0.9, 'Sine wave', fontsize = 23)

plt.xlabel('X-axis', fontsize = 15)


plt.ylabel('Y-axis', fontsize = 15)

#plt.grid(True, which='both')
plt.show()

Output:

Exercise 5.3
import matplotlib.pyplot as plt
import numpy as np

x = ['Rani', 'Meena', 'Raju', 'Jhansi', 'Ram']


y = [5, 7, 9, 2, 6]

plt.bar(x,y)

plt.text(3, 7, 'Student Marks',


fontsize = 18, color = 'g')

plt.xlabel('Students', fontsize = 15)


plt.ylabel('Marks', fontsize = 15)

plt.annotate('Highest scored', xy = (2.4, 8),


fontsize = 16, xytext = (3, 9),
arrowprops = dict(facecolor = 'red'),
color = 'g')

plt.show()

Output:

RESULT

Thus, the Plotting Graphs, Controlling Graph, Adding Text in python were executed
successfully.
EXPERIMENT 6: More Graph Types, Getting and setting values, Patches.
Exercise 6.1
import matplotlib.pyplot as plt

# Data labels, sizes, and colors are defined:


labels = 'Broccoli', 'Chocolate Cake', 'Blueberries', 'Raspberries'
sizes = [30, 330, 245, 210]
colors = ['green', 'brown', 'blue', 'red']

# Data is plotted:
plt.pie(sizes, labels=labels, colors=colors)

plt.axis('equal')
plt.title("Pie Plot")
plt.show()

Output:

Exercise 6.2
import numpy as np
import matplotlib.pyplot as plt

discount = np.array([10, 20, 30, 40, 50])
saleInRs = np.array([40000, 45000, 48000, 50000, 100000])
size = discount * 10
plt.scatter(x=discount, y=saleInRs, s=size, color='red', linewidth=3,
            marker='*', edgecolor='blue')
plt.title('Sales Vs Discount')
plt.xlabel('Discount offered')
plt.ylabel('Sales in Rs')
plt.show()
Output:

Exercise 6.3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("Min_Max_Seasonal_IMD_2017.csv",
                   usecols=['ANNUAL - MIN'])
df = pd.DataFrame(data)
# convert the 'ANNUAL - MIN' column into a numpy 1D array
minarray = np.array([df['ANNUAL - MIN']])
# Extract y (frequency) and edges (bins)
y, edges = np.histogram(minarray)
# calculate the midpoint for each bar on the histogram
mid = 0.5 * (edges[1:] + edges[:-1])
df.plot(kind='hist', y='ANNUAL - MIN')
plt.plot(mid, y, '-^')
plt.title('Annual Min Temperature plot(1901 - 2017)')
plt.xlabel('Temperature')
plt.show()

Output:

RESULT

Thus, the Graph Types, Getting and setting values, Patches in python were executed
successfully.
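The experiment title also lists patches, which the exercises above do not demonstrate. The following is a minimal, assumed sketch (not one of the original exercises) of how matplotlib.patches objects can be added to an axes:

import matplotlib.pyplot as plt
import matplotlib.patches as patches

fig, ax = plt.subplots()

# add a rectangle and a circle as patch objects
rect = patches.Rectangle((0.1, 0.1), 0.4, 0.3, facecolor='skyblue', edgecolor='black')
circ = patches.Circle((0.7, 0.6), 0.2, facecolor='salmon', alpha=0.6)
ax.add_patch(rect)
ax.add_patch(circ)

ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
plt.title('Patches demo')
plt.show()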

EXPERIMENT 7: Multivariate data analysis: Multiple regression
import numpy as np
import matplotlib.pyplot as plt

# NOTE: x (a design matrix whose first column is all ones) and y (the targets)
# are assumed to be prepared earlier; a small synthetic data set is generated
# here only so that the listing runs on its own.
rng = np.random.default_rng(0)
x = np.c_[np.ones(200), rng.random(200), rng.random(200)]
y = 2 + 3 * x[:, 1] + 5 * x[:, 2] + rng.normal(0, 0.1, 200)

def mse(coef, x, y):
    return np.mean((np.dot(x, coef) - y)**2) / 2

def gradients(coef, x, y):
    return np.mean(x.transpose() * (np.dot(x, coef) - y), axis=1)

def multilinear_regression(coef, x, y, lr, b1=0.9, b2=0.999, epsilon=1e-8):
    prev_error = 0
    m_coef = np.zeros(coef.shape)
    v_coef = np.zeros(coef.shape)
    moment_m_coef = np.zeros(coef.shape)
    moment_v_coef = np.zeros(coef.shape)
    t = 0

    while True:
        error = mse(coef, x, y)
        if abs(error - prev_error) <= epsilon:
            break
        prev_error = error
        grad = gradients(coef, x, y)
        t += 1
        m_coef = b1 * m_coef + (1 - b1) * grad
        v_coef = b2 * v_coef + (1 - b2) * grad**2
        moment_m_coef = m_coef / (1 - b1**t)
        moment_v_coef = v_coef / (1 - b2**t)

        delta = ((lr / moment_v_coef**0.5 + 1e-8) *
                 (b1 * moment_m_coef + (1 - b1) * grad / (1 - b1**t)))

        coef = np.subtract(coef, delta)
    return coef

coef = np.array([0, 0, 0])
c = multilinear_regression(coef, x, y, 1e-1)
fig = plt.figure()
ax = fig.add_subplot(projection='3d')

ax.scatter(x[:, 1], x[:, 2], y, label='y',
           s=5, color="dodgerblue")

ax.scatter(x[:, 1], x[:, 2], c[0] + c[1] * x[:, 1] + c[2] * x[:, 2],
           label='regression', s=5, color="orange")

ax.view_init(45, 0)
ax.legend()
plt.show()
Output:

RESULT

Thus, the Multivariate data analysis: Multiple regression in python was executed
successfully

EXPERIMENT 8: Multivariate regression, cluster analysis with various
algorithms

Exercise 8.1
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import pandas as pd

# zoo_data is assumed to be loaded earlier; the file name used here is an assumption
zoo_data = pd.read_csv("zoo_data.csv")

# generating correlation data
df = zoo_data.corr()
df.index = range(0, len(df))
df.rename(columns=dict(zip(df.columns, df.index)), inplace=True)
df = df.astype(object)

''' Generating coordinates with
corresponding correlation values '''
for i in range(0, len(df)):
    for j in range(0, len(df)):
        if i != j:
            df.iloc[i, j] = (i, j, df.iloc[i, j])
        else:
            df.iloc[i, j] = (i, j, 0)

df_list = []

# flattening dataframe values
for sub_list in df.values:
    df_list.extend(sub_list)

# converting list of tuples into trivariate dataframe
plot_df = pd.DataFrame(df_list)

fig = plt.figure()
ax = Axes3D(fig)

# plotting 3D trisurface plot
ax.plot_trisurf(plot_df[0], plot_df[1], plot_df[2],
                cmap=cm.jet, linewidth=0.2)

plt.show()

Output

Exercise 8.2
from copy import deepcopy
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
df = pd.read_csv("/content/Iris.csv")
df.drop('Id',axis=1,inplace=True)
df.head()
df["Species"] = pd.Categorical(df["Species"])
df["Species"] = df["Species"].cat.codes

# Changing dataframe to numpy matrix


data = df.values[:, 0:4]
category = df.values[:, 4]

k=3
# Training data
n = data.shape[0]

# Number of features in the data


c = data.shape[1]

# Generating random centers


mean = np.mean(data, axis = 0)
std = np.std(data, axis = 0)
centers = np.random.randn(k,c)*std + mean

# Plotting data
colors=['blue', 'yellow', 'green']
for i in range(n):
    plt.scatter(data[i, 0], data[i, 1], s=7, color=colors[int(category[i])])
plt.scatter(centers[:, 0], centers[:, 1], marker='.', c='r', s=150)

Output:

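The listing above only initializes random centers and plots them. A possible continuation (a sketch of the standard k-means update, not taken from the original listing) would iterate the assignment and center-update steps until the centers stop moving:

# cast the feature matrix to float so distance arithmetic works
data = data.astype(float)

centers_old = np.zeros(centers.shape)
clusters = np.zeros(n)

while np.linalg.norm(centers - centers_old) > 1e-6:
    # assign each point to its nearest center
    distances = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    clusters = np.argmin(distances, axis=1)
    centers_old = centers.copy()
    # move each center to the mean of its assigned points
    for idx in range(k):
        if np.any(clusters == idx):
            centers[idx] = data[clusters == idx].mean(axis=0)

plt.scatter(data[:, 0], data[:, 1], c=clusters, s=7)
plt.scatter(centers[:, 0], centers[:, 1], marker='*', c='r', s=150)
plt.show()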
Exercise 8.3

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=750, centers=centers, cluster_std=0.4,
random_state=0)

X = StandardScaler().fit_transform(X)
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)

print('Estimated number of clusters: %d' % n_clusters_)


print('Estimated number of noise points: %d' % n_noise_)
print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
print("Completeness: %0.3f" % metrics.completeness_score(labels_true, labels))
print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
print("Adjusted Rand Index: %0.3f"
% metrics.adjusted_rand_score(labels_true, labels))
print("Adjusted Mutual Information: %0.3f"
% metrics.adjusted_mutual_info_score(labels_true, labels))
print("Silhouette Coefficient: %0.3f"
% metrics.silhouette_score(X, labels))

Now, let’s plot the results that we saw in our output above.
import matplotlib.pyplot as plt
%matplotlib inline
unique_labels = set(labels)
colors = [plt.cm.Spectral(each)
for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = [0, 0, 0, 1]

    class_member_mask = (labels == k)

    xy = X[class_member_mask & core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=14)

    xy = X[class_member_mask & ~core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=6)

plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()

Output:

RESULT

Thus, the multivariate regression, cluster analysis with various algorithms were
executed successfully.

EXPERIMENT 9: Factor analysis using Python
# EXPLORATORY FACTOR ANALYSIS
# (assumptions: the factor_analyzer package is installed, and scaled_baseball
# is a standardized pandas DataFrame prepared earlier in the session)
from factor_analyzer import FactorAnalyzer
import matplotlib.pyplot as plt

fa = FactorAnalyzer(10, rotation=None)
fa.fit(scaled_baseball)
# fit() returns the fitted estimator, e.g.
# FactorAnalyzer(bounds=(0.005, 1), impute='median', is_corr_matrix=False,
#                method='minres', n_factors=3, rotation=None, rotation_kwargs={},
#                use_smc=True)

# GET EIGENVALUES (original eigenvalues and common-factor eigenvalues)
ev, v = fa.get_eigenvalues()

Output:
[4.82803775 2.16198932 1.74301502 1.45229663 0.95769957 0.82668734
 0.60604294 0.50273078 0.33357804 0.19899495 0.16015927 0.12622135
 0.087932   0.01461505]
[ 4.56895784e+00  1.96965781e+00  1.46222314e+00  1.18651683e+00
  4.28384036e-01  2.49077679e-01  3.39698376e-02 -3.85219812e-03
 -4.32093191e-02 -5.84983971e-02 -7.14656872e-02 -1.06756688e-01
 -1.84699846e-01 -2.57518572e-01]

# SCREE PLOT (plots the eigenvalues ev against the factor number)
plt.scatter(range(1, scaled_baseball.shape[1] + 1), ev)
plt.plot(range(1, scaled_baseball.shape[1] + 1), ev)
plt.title('Scree Plot')
plt.xlabel('Factors')
plt.ylabel('Eigenvalue')
plt.grid()
plt.show()

Output:

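Beyond the scree plot, a fitted factor model is usually interpreted through its loadings and explained variance. A short follow-up sketch (an assumption for illustration: pandas is imported and scaled_baseball is a DataFrame, as above):

import pandas as pd

# loadings: how strongly each original variable loads on the retained factors
loadings = pd.DataFrame(fa.loadings_, index=scaled_baseball.columns)
print(loadings.round(2))

# variance explained per factor: SS loadings, proportion, cumulative proportion
print(fa.get_factor_variance())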
RESULT

Thus, the factor analysis using python was executed successfully.


EXPERIMENT 10: PCA and linear discriminant analysis
Exercise 10.1
# compare lda number of components with naive bayes algorithm for classification
from numpy import mean
from numpy import std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from matplotlib import pyplot

# get the dataset
def get_dataset():
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                               n_redundant=5, random_state=7, n_classes=10)
    return X, y

# get a list of models to evaluate
def get_models():
    models = dict()
    for i in range(1, 10):
        steps = [('lda', LinearDiscriminantAnalysis(n_components=i)),
                 ('m', GaussianNB())]
        models[str(i)] = Pipeline(steps=steps)
    return models

# evaluate a given model using cross-validation
def evaluate_model(model, X, y):
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1,
                             error_score='raise')
    return scores

# define dataset
X, y = get_dataset()
# get the models to evaluate
models = get_models()
# evaluate the models and store results
results, names = list(), list()
for name, model in models.items():
    scores = evaluate_model(model, X, y)
    results.append(scores)
    names.append(name)
    print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))
# plot model performance for comparison
pyplot.boxplot(results, labels=names, showmeans=True)
pyplot.show()
Output

>1 0.182 (0.032)


>2 0.235 (0.036)
>3 0.267 (0.038)
>4 0.303 (0.037)
>5 0.314 (0.049)
>6 0.314 (0.040)
>7 0.329 (0.042)
>8 0.343 (0.045)
>9 0.358 (0.056)

Exercise 10.2
# importing required libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# importing or loading the dataset
dataset = pd.read_csv('wine.csv')

# distributing the dataset into two components X and Y


X = dataset.iloc[:, 0:13].values
y = dataset.iloc[:, 13].values
# Splitting the X and Y into the
# Training set and Testing set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# performing preprocessing part
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()

X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Applying PCA function on training
# and testing set of X component
from sklearn.decomposition import PCA

pca = PCA(n_components = 2)

X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)

explained_variance = pca.explained_variance_ratio_
# Fitting Logistic Regression To the training set
from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)

Output

# Predicting the test set result using


# predict function under LogisticRegression
y_pred = classifier.predict(X_test)
# making confusion matrix between
# test set of Y and predicted value.
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)

# Predicting the training set


# result through scatter plot
from matplotlib.colors import ListedColormap

X_set, y_set = X_train, y_train


X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1,
stop = X_set[:, 0].max() + 1, step = 0.01),
np.arange(start = X_set[:, 1].min() - 1,
stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(),
X2.ravel()]).T).reshape(X1.shape), alpha = 0.75,
cmap = ListedColormap(('yellow', 'white', 'aquamarine')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green', 'blue'))(i), label=j)

plt.title('Logistic Regression (Training set)')


plt.xlabel('PC1') # for Xlabel
plt.ylabel('PC2') # for Ylabel
plt.legend() # to show legend

# show scatter plot


plt.show()

Output:

# Visualising the Test set results through scatter plot


from matplotlib.colors import ListedColormap

X_set, y_set = X_test, y_test

X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1,


stop = X_set[:, 0].max() + 1, step = 0.01),
np.arange(start = X_set[:, 1].min() - 1,
stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(),


X2.ravel()]).T).reshape(X1.shape), alpha = 0.75,
cmap = ListedColormap(('yellow', 'white', 'aquamarine')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green', 'blue'))(i), label=j)

# title for scatter plot


plt.title('Logistic Regression (Test set)')
plt.xlabel('PC1') # for Xlabel
plt.ylabel('PC2') # for Ylabel
plt.legend()

# show scatter plot


plt.show()

Output:

RESULT

Thus, the PCA and linear discriminant analysis were executed successfully.

TEST PROJECT DESCRIPTION
S.No. List of Projects Project Description
1. Market Basket Analysis: Market basket analysis is a data mining technique used by retailers to increase sales by better understanding customer purchasing patterns. It involves analyzing large data sets, such as purchase history, to reveal product groupings, as well as products that are likely to be purchased together.
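A minimal association-rule sketch for this project (illustrative only; it assumes the third-party mlxtend package and a hypothetical transactions.csv file with one boolean column per product):

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# basket: one row per transaction, one boolean column per product (assumed layout)
basket = pd.read_csv('transactions.csv').astype(bool)

frequent_itemsets = apriori(basket, min_support=0.02, use_colnames=True)
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1.2)
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head())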
2. Reducing Manufacturing Failures: In this project we use the measurements of the parts of the appliances as they progress through an assembly line to predict whether there would be a defect in the part. This will help companies to produce high-quality, low-cost products at the user end.
3. Insurance Pricing Forecast: The purpose of this project is to look into different features to observe their relationships, and to fit a multiple linear regression based on several features of an individual, such as age, physical/family condition and location, against their existing medical expenses, to be used for predicting the future medical expenses of individuals and so help medical insurers decide on the premium to charge.
4. City Employee Salary Data Analysis: Data has changed the face of our world over the last ten years. The numerous emails and text messages we share and the YouTube videos we watch are part of the nearly 2.5 quintillion bytes of data generated daily across the world. Businesses, both large and small, deal with massive data volumes, and a lot depends on their ability to glean meaningful insights from them. A data analyst does precisely that: they interpret statistical data and turn it into useful information that businesses and organizations can use for critical decision-making.
5. Churn Prediction in Telecom: Predicting customer churn is critical for telecommunication companies to be able to effectively retain customers. It is more costly to acquire new customers than to retain existing ones. For this reason, large telecommunications corporations are seeking to develop models to predict which customers are more likely to change and take actions accordingly.
6. Predicting Wine Preferences of Customers using Wine Dataset: Predicting wine quality using machine learning techniques is becoming increasingly popular today. Basically, it is a computer algorithm that can tell if there is a difference between a $5 bottle of wine and a $100 one. There are many educational step-by-step guides by professional programmers using open-source wine quality prediction datasets that teach how to use ML for wine quality prediction; here we break this down into a more detailed and technical overview.
7. Identifying Product Bundles from Sales Data: The objective of this data science project in R is to find product bundles that can be put together on sale. Typically, Market Basket Analysis is used to identify such bundles; here we compare the relative importance of time series clustering in identifying product bundles.
8. Movie Review Sentiment Analysis: Sentiment relates to the meaning of a word or sequence of words and is usually associated with an opinion or emotion. Analysis is the process of looking at data and making inferences; in this case, using machine learning to learn and predict whether a movie review is positive or negative.
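A minimal sentiment-classification sketch for this project (illustrative only; the file movie_reviews.csv and its text/label columns are assumptions) using scikit-learn:

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# assumed dataset: one review per row, with 'text' and a positive/negative 'label'
reviews = pd.read_csv('movie_reviews.csv')
X_train, X_test, y_train, y_test = train_test_split(
    reviews['text'], reviews['label'], test_size=0.2, random_state=0)

# TF-IDF features feeding a logistic-regression classifier
model = make_pipeline(TfidfVectorizer(stop_words='english'),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print('Test accuracy:', model.score(X_test, y_test))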
9. Store Sales Forecasting: Sales forecasting is the process of estimating future sales. Accurate sales forecasts enable companies to make informed business decisions and predict short-term and long-term performance. Companies can base their forecasts on past sales data, industry-wide comparisons, and economic trends.

10. Building a Music Recommendation Engine: Similar genres will sound similar and will come from similar time periods, and the same can be said for songs within those genres. We can use this idea to build a recommendation system by taking the data points of the songs a user has listened to and recommending songs corresponding to nearby data points.
11. Airline Dataset Analysis: It describes financial metrics for individual airlines, airline sectors and the industry as a whole for the American commercial airline industry. The original data from the source is collected in the zip file "Original MIT data", and the data relating to airline finances and the main industry metrics has been cleaned and written into CSV files for ease of use.
12. Predicting Flight Delays: Our project focuses on predicting flight delays using machine learning techniques. We employ feature engineering and advanced regression algorithms to enhance accuracy. The dataset includes flight information, weather conditions, and other relevant factors. Our model achieves 94% accuracy.
13. Event Data Analysis: In an Event Data Analysis project, the goal is to analyze and gain insights from data generated during events or conferences. The project typically involves collecting, cleaning, and processing the data to extract meaningful information.
14. Building a Job Portal using Twitter Data: The portal will stream data from the Twitter API to find recently published jobs. Classification of relevant and irrelevant tweets is accomplished using a machine-learning algorithm, i.e., Logistic Regression. Using the algorithm, we have measured 97% accuracy.
15. Implementing Slowly Changing Dimensions in a Data: Slowly Changing Dimensions (SCDs) are techniques used in data projects to manage the updates, changes, and historical data in a dimensional model. It is important to implement SCDs to ensure the accuracy and consistency of the data over time.
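A minimal Type 2 SCD sketch in pandas (the table layout and column names are assumptions for illustration): when an attribute changes, the current row is expired and a new current row is appended, preserving history.

import pandas as pd

# a tiny dimension table with validity columns (assumed layout)
dim = pd.DataFrame({'customer_id': [1], 'city': ['Chennai'],
                    'valid_from': ['2023-01-01'], 'valid_to': [None],
                    'is_current': [True]})

def apply_scd2(dim, customer_id, new_city, change_date):
    # expire the currently active row for this customer
    mask = (dim['customer_id'] == customer_id) & (dim['is_current'])
    dim.loc[mask, ['valid_to', 'is_current']] = [change_date, False]
    # append a new current row carrying the changed attribute
    new_row = {'customer_id': customer_id, 'city': new_city,
               'valid_from': change_date, 'valid_to': None, 'is_current': True}
    return pd.concat([dim, pd.DataFrame([new_row])], ignore_index=True)

dim = apply_scd2(dim, 1, 'Madurai', '2023-06-01')
print(dim)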
