CS Lab Manual New Format
BUSINESS SYSTEMS
LABORATORY MANUAL
PREFACE
Learning requires both classroom instruction and laboratory practice; if either is
omitted, the learning process is incomplete. This manual attempts to make the lab
instructions self-contained by developing a lab curriculum that is based on the class
curriculum. It is intended for use by lab instructors, course instructors and
students.
The intent of this curriculum is to define a clear lab structure that can be followed by
the lab instructor and the students. Perhaps one of the greatest problems faced by lab
instructors is that they are unable to keep the students occupied for the entire duration of the
lab due to which the learning process is greatly hampered.
The labs have been developed so that the class and the lab are synchronized.
The manual is divided into 15 labs of 3 hours each.
Students of the course are expected to carefully read the concept map before coming to the lab.
Students come to the lab with a design/program that will be handed over to the lab instructor
for further grading. The code/design is based on previous learning and experiments. Each lab
has a detailed walk-through task which provides a problem statement and its programmable
solution to the students. The students can raise queries about the code provided and the lab
instructor will guide the students on how the solution has been designed.
Thereafter, predefined practice questions are presented, each with a
fixed duration and grade. Students are graded on their accomplishments in these
practice tasks. At the end of the lab, the lab instructor will assign an unseen task to the
students. This unseen task contains all the concepts taught in the lab. These unseen tasks have
a higher level of complexity and generally have a greater gain in terms of marks.
What sets these labs apart is the fact that a clear grading criterion has been defined for
each lab. Students are aware of the grading criteria and are expected to meet the requirements
for successful completion of each lab.
COLLEGE VISION AND MISSION
College Vision
Our Vision is "To create innovative and vibrant young leaders and entrepreneurs in
Engineering and Technology for building India as a super knowledge power and blossom into a
University of excellence recognized globally".
College Mission
To provide education in Engineering with excellence and ethics and to reach the unreached.
COLLEGE QUALITY POLICY
DEPARTMENT OF COMPUTER SCIENCE AND BUSINESS SYSTEMS
Computer Science and Business Systems (CSBS) is one of the popular courses among engineering
aspirants which mainly focuses on computation, analysis of algorithms, programming languages,
program design, software engineering, computer hardware, computer networks and
problem-solving skills. This course was established in 2021 at Francis Xavier Engineering College. To
address the growing need for engineering talent with skills in digital technology, TCS, in partnership
with leading academicians across India, has designed a four-year undergraduate programme in
Computer Science titled “Computer Science and Business Systems”.
This curriculum aims to ensure that the students graduating from the program not only know the
core topics of Computer Science but also develop an equal appreciation of humanities, human
values, Financial Management, Services Science & Service Operational Management, Marketing
Research, and Marketing Management.
DEPARTMENT VISION AND MISSION
Department Vision
Department Mission
education programs.
To inculcate varied skill sets that meets global industry standards and to practice
moral values.
To enrich moral and ethical values to lead and serve the society.
DEPARTMENT QUALITY POLICY
Identify and prioritize quality education in Computer Science and Business Systems
Provide a focal point for an extended IT quality network comprised of end users and
PROGRAM EDUCATIONAL OBJECTIVES
S. No.  PEOs   Definition of PEOs
I       PEO 1  To apply problem-solving skills in Computer Science and Business Management by applying Engineering fundamentals.
II      PEO 2  To improve communication skills, business management skills, professional ethics and teamwork, and to innovate technologies for the betterment of society.
PROGRAM OUTCOMES
S. No.  Programme Outcomes
PO1   Engineering knowledge: Apply the knowledge of mathematics, science,
      engineering fundamentals, and an engineering specialization to the solution of
      complex engineering problems.
PO2   Problem analysis: Identify, formulate, review research literature, and
      analyze complex engineering problems reaching substantiated conclusions
      using first principles of mathematics, natural sciences, and engineering
      sciences.
PO3   Design/development of solutions: Design solutions for complex engineering
      problems and design system components or processes that meet the specified
      needs with appropriate consideration for the public health and safety, and the
      cultural, societal, and environmental considerations.
PO4   Conduct investigations of complex problems: Use research-based
      knowledge and research methods including design of experiments, analysis
      and interpretation of data, and synthesis of the information to provide valid
      conclusions.
PO5   Modern tool usage: Create, select, and apply appropriate techniques,
      resources, and modern engineering and IT tools including prediction and
      modeling to complex engineering activities with an understanding of the
      limitations.
PO6   The engineer and society: Apply reasoning informed by the contextual
      knowledge to assess societal, health, safety, legal and cultural issues and the
      consequent responsibilities relevant to the professional engineering practice.
PO7   Environment and sustainability: Understand the impact of the professional
      engineering solutions in societal and environmental contexts, and
      demonstrate the knowledge of, and need for sustainable development.
PO8   Ethics: Apply ethical principles and commit to professional ethics
      and responsibilities and norms of the engineering practice.
PO9   Individual and team work: Function effectively as an individual, and as a
      member or leader in diverse teams, and in multidisciplinary settings.
PO10  Communication: Communicate effectively on complex engineering activities
      with the engineering community and with society at large, such as, being able
      to comprehend and write effective reports and design documentation, make
      effective presentations, and give and receive clear instructions.
PO11  Project management and finance: Demonstrate knowledge and
      understanding of the engineering and management principles and apply these
      to one’s own work, as a member and leader in a team, to manage projects and
      in multidisciplinary environments.
PO12  Life-long learning: Recognize the need for, and have the preparation and
      ability to engage in independent and life-long learning in the broadest context
      of technological change.
LABORATORY
INTRODUCTION
OBJECTIVE
Graduates of a Data and Computational Science course are called computational and data
scientists. They work on mathematical models, develop quantitative analysis techniques and
use computers to analyse and solve real-life scientific problems. Knowing which innovative
tools are available, and how to collaborate with clients to meet their requirements, is one of
the most important aspects of the course. Techniques such as modelling, simulation and data
mining are also studied.
SCOPE OF DBMS
As more technologies come into use, the need for data scientists is increasing rapidly, so
the scope of Data and Computational Science is wide. The course attracts both the young
generation and working professionals in the field. Demand comes from sectors such as
information technology, telecom, manufacturing, finance and insurance, and retail. Data
science is applied in e-commerce, manufacturing, banking and finance, healthcare, and
transport. Data and Computational Science professionals can find employment in companies
such as Amazon, Walmart, and Mate Labs, in roles such as software engineer, data scientist
and business analyst.
Do’s and Don’ts
Do’s
Know the location of the fire extinguisher and the first aid box and how to use them.
Read and understand how to carry out an activity thoroughly before coming to the
laboratory.
Don’ts
Do not eat or drink in the laboratory.
Do not insert metal objects such as clips, pins and needles into the computer casings.
changing the desktop background or changing the video and audio settings.
Safety Measures and Guidelines
Take a note of all the exits in the room and also take note of the location of
the fire extinguishers in the room for the sake of fire safety.
Look away from the screen once in a while to give your eyes a rest.
Do not attempt to open any machines and do not touch the backs of machines when
Do not spill water or any other liquid on the machines in order to maintain electrical
safety.
changing the desktop background or changing the video and audio settings.
Instructions to Teachers
Teachers should review the experiment’s instructions prior to class so that the
experiments are conducted properly.
Teachers must instruct students in Internet safety.
Teachers must remain in the lab at all times and are responsible for discipline.
Teachers must report any computer with missing or damaged hardware or peripherals.
Teachers are expected to closely monitor student activity through frequent screen checks.
Teachers should report any non-functioning technology equipment to their
Department Head.
When using computer labs, teachers should turn off the digital projector and return
the room key after the doors have been locked. Doors to computer labs must be locked
when not in use.
Everyone will adhere to federal copyright laws.
Instructions to Students
Students should follow the lab dress code whenever they use the laboratory facilities
and make sure their ID cards are visible.
Whenever students enter the lab, they should make an entry in the log register kept
for that purpose.
Only observation notebooks and record notebooks are allowed inside the lab; other
belongings are not allowed.
Maintain silence in the lab.
Only one user is allowed to work on one system at a time.
If any problem occurs in the software or hardware, it should be brought to the notice
of the staff in-charge, and an entry should be made in the log register kept for that
purpose.
The laboratory must be kept clean and neat.
Arrange the chairs before leaving the lab.
Shut down the systems properly before leaving the laboratory.
Lab Code of Conduct
You must wear your ID and Lab Coat each time you enter a computer lab. If you do
not have your ID, or lab coat when entering the computer lab, you may be asked to
leave the computer lab.
No drinking or eating is allowed in any computer lab. All open and unopened food,
and beverages are prohibited from entering the computer lab.
You must be considerate of other users. Privacy and concentration are important in
computer labs. If you need to talk to somebody, please do so in a way that does not
disturb other users.
Lab assistants are there to assist in using the technology so that you may complete
your work.
The computer labs are an academic resource. As such, please respect the needs of
others by not monopolizing the computers for non-academic use.
Lab staff is not responsible for any belongings left in the computer labs. Please make
sure you take your belongings with you when you leave.
The computers in the labs have been set up in such a way as to be used by multiple
people having differing needs. Do not change or interfere with the configuration of
the computers.
Software downloaded from the Internet is not to be installed on any lab computer for
any purpose.
Documents should be saved to the D drive.
Users are not allowed to print large quantities of flyers, banners or other distribution
materials. If print jobs of this nature are required, one copy may be printed in the
computer lab and copies will need to be processed through the alternative printing
facility.
Attempting to damage or destroy information on the computers will not be tolerated.
You are expected to leave your computer in the same condition as you found it. This
includes putting chairs back in place and logging out when you leave.
You are responsible for reading and abiding by all signs posted in the computer labs.
Major Lab Equipment with Specifications
OBJECTIVES: Course Objectives and Course Outcomes
The student should be made to:
OUTCOMES:
CO1 Apply the basic concepts of Computational Statistics using Python and R
CO2 Apply graphing techniques
CO3 Apply multivariate graphing techniques
CO4 Apply the concepts of regression and clustering
CO5 Implement a project based on data analytics
Mapping Course Outcome with Program Outcome
Course
techniques
for large volumes of data.
Pa  Engineering Knowledge: Apply knowledge of mathematics, science, engineering fundamentals and an engineering specialization for building engineering models.  H H H H M
Pb  Problem Analysis: Identify and solve engineering problems reaching conclusions using mathematics and engineering sciences.  H H H H M
Pc  Design/Development of Solutions: Design and develop solutions for engineering problems that meet specified needs.  H H H H M
Pd  Conduct Investigations of Complex Problems: Conduct investigations of complex problems including design of experiments and analysis to provide valid solutions.  H H H
Pe  Modern Tool Usage: Create and apply appropriate techniques, resources, and modern engineering tools for executing engineering activities.  H M
Pf  The Engineer and Society: Apply reasoning of the societal, safety issues and the consequent responsibilities relevant to engineering practice.
Pg  Environment and Sustainability: Understand the impact of engineering solutions in the environment and exhibit the knowledge for sustainable development.  M
Ph  Ethics: Apply ethical principles and commit to professional ethics, responsibilities and norms of engineering practice.  M
Pi  Individual and Team Work: Function effectively as an individual, and as a member or leader in diverse teams in multi-disciplinary settings.  M
Pj  Communication: Communicate effectively to the engineering community and the outside world and also to write effective reports.  M
LIST OF EXPERIMENTS
21CB5611 COMPUTATIONAL STATISTICS LABORATORY
L T P C: 0 0 4 2
Preamble
The goal of the course is to present essential statistical concepts. Simulation is used to illustrate the
concepts and to provide understanding and develop the mathematical operations.
Prerequisites for the course
21MA3205- Probability and Statistics
21IT4601-Introduction to algorithms
Objectives
o To introduce the variables, expressions and control statements of R.
o To use R programming to analyse data and visualize outcomes in the form of
graphs and charts.
o To develop and understand modern computational statistical approaches
and their applications to different data sets.
o To apply principles of data science to analyse various business problems.
o To use R software to carry out statistical computations and to analyse data.
INDEX
11 Test Projects 55
S.No. List of Projects Related Experiment CO
1. Market Basket Analysis Exp. 1,2,3,4 CO1- CO5
14. Building a Job Portal using Twitter Data Exp. 1 – 11 CO1- CO5
1 1 Python Concepts, Data Structures CO1
https://drive.google.com/file/d/1gNdHoysEX99wOg8WSHOtKHOojJtlYiuS/view?usp=drive_link
2 2 Classes: Interpreter, Program Execution, Statements, Expressions, Flow Controls, Functions, Numeric Types, CO1
https://drive.google.com/file/d/1wGWKGltG9lXc8J9fmF_e6e0BYisMjNdP/view?usp=drive_link
3 3 Sequences and Class Definition, Constructors, Text & Binary Files - Reading and Writing CO1
https://drive.google.com/file/d/158IIflaoGOtIruzO7nK_87uYWmq6Xj-A/view?usp=drive_link
4 4 Visualization in Python: Matplotlib package CO2
https://drive.google.com/file/d/1FcP0wtOL5-iHWCbMg5s884U2HQnolrzs/view?usp=drive_link
5 5 Plotting Graphs, Controlling Graph, Adding Text, CO2
https://drive.google.com/file/d/1WAhspVw24Ru4htF_czxJzObvzGAUmXdY/view?usp=drive_link
6 6 More Graph Types, Getting and setting values, Patches. CO2
https://drive.google.com/file/d/1LRbfm9yhLn7zQPLW8qu258Z3E3ph0Bck/view?usp=drive_link
7 7 Multivariate data analysis: Multiple regression, CO3
https://drive.google.com/file/d/1j54s6ugN93hPZ5ChAU3NK65Rpy0wygd9/view?usp=drive_link
8 8 multivariate regression, cluster analysis with various algorithms, CO4
https://drive.google.com/file/d/1pw3wlo3MphKzMUKef9oPkDoFbDYL7vYK/view?usp=drive_link
9 9 factor analysis, CO4
https://drive.google.com/file/d/1RAcOgQa6eynk-iZq48P_kvRbhc_V3did/view?usp=drive_link
10 10 PCA and linear discriminant analysis. CO5
https://drive.google.com/file/d/1NRNMPBHqznbig0fzDz2jSPEasQpz19aC/view?usp=drive_link
Exercise 1.1
# Creating a List with
# the use of multiple values
List = ["Geeks", "For", "Geeks"]
print("\nList containing multiple values: ")
print(List)
Output
List containing multiple values:
['Geeks', 'For', 'Geeks']
Multi-Dimensional List:
[['Geeks', 'For'], ['Geeks']]
Accessing element from the list
Geeks
Geeks
Accessing element using negative indexing
Geeks
Geeks
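The output above also shows a nested list and element access by index, which the listing does not include. A minimal sketch of those steps follows; the variable names `words` and `nested` are illustrative assumptions, not taken from the original program.

```python
# Illustrative names; the original listing only creates the first list.
words = ["Geeks", "For", "Geeks"]

# A multi-dimensional (nested) list is a list whose items are lists.
nested = [["Geeks", "For"], ["Geeks"]]
print("Multi-Dimensional List:")
print(nested)

# Positive indexes count from the front, starting at 0.
print("Accessing element from the list")
print(words[0])
print(words[2])

# Negative indexes count from the end: -1 is the last item.
print("Accessing element using negative indexing")
print(words[-1])
print(words[-3])
```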
Exercise 1.2
# Creating a Dictionary
Dict = {'Name': 'Geeks', 1: [1, 2, 3, 4]}
print("Creating Dictionary: ")
print(Dict)
Output
Creating Dictionary:
{'Name': 'Geeks', 1: [1, 2, 3, 4]}
Accessing a element using key:
Geeks
Accessing a element using get:
[1, 2, 3, 4]
{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
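The output above includes key lookup, `get`, and a dictionary of squares that the listing does not show. One way these steps could be written is sketched below; the names `record`, `squares` and `missing`, and the `'Age'` key, are assumptions made for illustration.

```python
# Same dictionary as in the exercise, under an illustrative name.
record = {'Name': 'Geeks', 1: [1, 2, 3, 4]}

# Square brackets raise KeyError if the key is absent.
print("Accessing a element using key:")
print(record['Name'])

# get() returns None (or a supplied default) instead of raising.
print("Accessing a element using get:")
print(record.get(1))

# A dictionary comprehension builds the key-to-square mapping.
squares = {x: x ** 2 for x in [1, 2, 3, 4, 5]}
print(squares)

# Hypothetical key 'Age' is not present, so the default is returned.
missing = record.get('Age', 'not set')
print(missing)
```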
Exercise 1.3
# Creating a Tuple with
# the use of Strings
Tuple = ('Geeks', 'For')
print("\nTuple with the use of String: ")
print(Tuple)
Output
Tuple with the use of String:
('Geeks', 'For')
Tuple using List:
First element of tuple
1
Last element of tuple
6
Third last element of tuple
4
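The second half of the output (a tuple built from a list, then indexed) is not covered by the listing. A minimal sketch, assuming the source list was `[1, 2, 3, 4, 5, 6]` (this is inferred from the printed first, last and third-last elements, not stated in the manual):

```python
# Assumed input list, chosen to be consistent with the printed elements.
values = [1, 2, 3, 4, 5, 6]

# tuple() copies the items of any iterable into an immutable tuple.
numbers = tuple(values)
print("Tuple using List:")
print(numbers)

print("First element of tuple")
print(numbers[0])
print("Last element of tuple")
print(numbers[-1])
print("Third last element of tuple")
print(numbers[-3])

# Tuples cannot be modified in place; assignment raises TypeError.
try:
    numbers[0] = 10
except TypeError as error:
    print("Tuples are immutable:", error)
```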
Exercise 1.4
# Creating a Set with
# a mixed type of values
# (Having numbers and strings)
Set = set([1, 2, 'Geeks', 4, 'For', 6, 'Geeks'])
print("\nSet with the use of Mixed Values")
print(Set)
Exercise 1.5
# importing "collections" for deque operations
import collections
# initializing deque
de = collections.deque([1,2,3])
Output
The deque after appending at right is :
deque([1, 2, 3, 4])
The deque after appending at left is :
deque([6, 1, 2, 3, 4])
The deque after deleting from right is :
deque([6, 1, 2, 3])
The deque after deleting from left is :
deque([1, 2, 3])
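The listing stops after the deque is created, while the output shows four further operations. The sketch below is one sequence of calls consistent with that output; the appended values 4 and 6 are read off the printed deques.

```python
import collections

# Start from the same deque as the exercise.
de = collections.deque([1, 2, 3])

# append() adds at the right end.
de.append(4)
print("The deque after appending at right is :")
print(de)

# appendleft() adds at the left end.
de.appendleft(6)
print("The deque after appending at left is :")
print(de)

# pop() removes from the right end.
de.pop()
print("The deque after deleting from right is :")
print(de)

# popleft() removes from the left end.
de.popleft()
print("The deque after deleting from left is :")
print(de)
```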
RESULT
Thus, the Python programs using data structure concepts were executed successfully.
Exercise 2.1
# Python program to show how a simple if keyword works
# Initializing some variables
v=5
t=4
print("The initial value of v is", v, "and that of t is ",t)
Output:
The initial value of v is 5 and that of t is 4
5 is bigger than 4
The new value of v is 3 and the t is 4
3 is smaller than 4
the new value of v is
4
The value of v,
4 and t, 4, are equal
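The comparisons that produce the output above are missing from the listing. A hedged sketch of the same idea using `if`/`elif`/`else` is given below; the helper `compare` is hypothetical, and the way `v` is changed between comparisons is an assumption chosen to reproduce the values 5, 3 and 4.

```python
def compare(v, t):
    """Return a sentence describing how v relates to t (hypothetical helper)."""
    if v > t:
        return f"{v} is bigger than {t}"
    elif v < t:
        return f"{v} is smaller than {t}"
    else:
        return f"The value of v, {v} and t, {t}, are equal"

v = 5
t = 4
print("The initial value of v is", v, "and that of t is", t)
print(compare(v, t))

# Assumed update: decrease v so that the "smaller" branch runs.
v = v - 2
print("The new value of v is", v, "and the t is", t)
print(compare(v, t))

# Assumed update: increase v so that the "equal" branch runs.
v = v + 1
print("The new value of v is", v)
print(compare(v, t))
```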
Exercise 2.2
# Python program to show how to execute a for loop
# Creating a sequence. In this case, a list
l = [2, 4, 7, 1, 6, 4]
Output:
2, 4, 7, 1, 6, 4,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
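The loops themselves are not shown in the listing. A minimal sketch that prints both lines of the output, one loop over the list and one over `range(10)`, could look like this; the running `total` is an addition for illustration only.

```python
# Same list as in the exercise.
l = [2, 4, 7, 1, 6, 4]

# Iterate directly over the items of a sequence.
total = 0
for value in l:
    print(value, end=", ")
    total += value  # illustrative extra: accumulate a running sum
print()

# range(10) yields the integers 0 through 9.
for index in range(10):
    print(index, end=", ")
print()

print("Sum of the list:", total)
```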
Exercise 2.3
# Python program to show how to execute a while loop
b = 9
a = 2
# Starting the while loop
# The condition a < b will be checked before each iteration
while a < b:
    print(a, end = " ")
    a = a + 1
print("While loop is completed")
Output:
2 3 4 5 6 7 8 While loop is completed
Exercise 2.4
# defining a function to calculate LCM
def calculate_lcm(x, y):
    # selecting the greater number
    if x > y:
        greater = x
    else:
        greater = y
    # try successive candidates until one is divisible by both x and y
    while True:
        if (greater % x == 0) and (greater % y == 0):
            lcm = greater
            break
        greater += 1
    return lcm

# taking input from users
num1 = int(input("Enter first number: "))
num2 = int(input("Enter second number: "))
# printing the result for the users
print("The L.C.M. of", num1, "and", num2, "is", calculate_lcm(num1, num2))
Output:
Enter first number: 3
Enter second number: 4
The L.C.M. of 3 and 4 is 12
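The function above searches upward one candidate at a time. As an alternative sketch (not the method used in the exercise), the L.C.M. can be computed from the greatest common divisor using the identity lcm(x, y) = |x * y| / gcd(x, y); the function name `lcm_via_gcd` is hypothetical.

```python
import math

def lcm_via_gcd(x, y):
    """L.C.M. from the G.C.D.; returns 0 if either argument is 0."""
    if x == 0 or y == 0:
        return 0
    return abs(x * y) // math.gcd(x, y)

print("The L.C.M. of 3 and 4 is", lcm_via_gcd(3, 4))
print("The L.C.M. of 4 and 6 is", lcm_via_gcd(4, 6))
```

This avoids the loop entirely, which matters when the inputs are large and share few factors.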
RESULT
Exercise 3.1
class Addition:
    first = 0
    second = 0
    answer = 0

    # parameterized constructor
    def __init__(self, f, s):
        self.first = f
        self.second = s

    def display(self):
        print("First number = " + str(self.first))
        print("Second number = " + str(self.second))
        print("Addition of two numbers = " + str(self.answer))

    def calculate(self):
        self.answer = self.first + self.second

# creating object of the class
# this will invoke parameterized constructor
obj1 = Addition(1000, 2000)
# perform the addition, then display the result
obj1.calculate()
obj1.display()
# second object, matching the second block of the output below
obj2 = Addition(10, 20)
obj2.calculate()
obj2.display()
Output
First number = 1000
Second number = 2000
Addition of two numbers = 3000
First number = 10
Second number = 20
Addition of two numbers = 30
Exercise 3.2
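import os

def create_file(filename):
    try:
        with open(filename, 'w') as f:
            f.write('Hello, world!\n')
        print("File " + filename + " created successfully.")
    except IOError:
        print("Error: could not create file " + filename)

def read_file(filename):
    try:
        with open(filename, 'r') as f:
            contents = f.read()
        print(contents)
    except IOError:
        print("Error: could not read file " + filename)

# append_file and rename_file are called in the main block but are not
# defined in the listing; these versions are assumed, written in the same style
def append_file(filename, text):
    try:
        with open(filename, 'a') as f:
            f.write(text)
        print("Text appended to file " + filename + " successfully.")
    except IOError:
        print("Error: could not append to file " + filename)

def rename_file(filename, new_filename):
    try:
        os.rename(filename, new_filename)
        print("File " + filename + " renamed to " + new_filename + " successfully.")
    except IOError:
        print("Error: could not rename file " + filename)

def delete_file(filename):
    try:
        os.remove(filename)
        print("File " + filename + " deleted successfully.")
    except IOError:
        print("Error: could not delete file " + filename)

if __name__ == '__main__':
    filename = "example.txt"
    new_filename = "new_example.txt"
    create_file(filename)
    read_file(filename)
    append_file(filename, "This is some additional text.\n")
    read_file(filename)
    rename_file(filename, new_filename)
    read_file(new_filename)
    delete_file(new_filename)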
Output:
File example.txt created successfully.
Hello, world!
RESULT
Thus, the python program using constructors and file handling methods was executed
successfully.
EXPERIMENT 4: Visualization in Python: Matplotlib package
# Python program to show pyplot module
import matplotlib.pyplot as plt
from matplotlib.figure import Figure
# initializing the data
x = [10, 20, 30, 40]
y = [20, 25, 35, 55]
# Plotting the data so that the legend has a line to label
plt.plot(x, y)
# Adding legends
plt.legend(["GFG"])
plt.show()
Output:
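The figure for this experiment did not survive extraction. As a minimal sketch of the same plot using Matplotlib's object-oriented interface, the block below draws the data and attaches the legend; the `Agg` backend and the in-memory PNG buffer are assumptions made so the sketch runs without a display, and in the lab `plt.show()` would be used instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt

# Same data as in the experiment.
x = [10, 20, 30, 40]
y = [20, 25, 35, 55]

# One figure with one set of axes.
fig, ax = plt.subplots()
line, = ax.plot(x, y)

# The legend labels the plotted line.
ax.legend(["GFG"])

# Render to an in-memory PNG instead of showing a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
print("PNG size in bytes:", buffer.getbuffer().nbytes)
```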
RESULT
plt.plot(x, y, c = 'g')
plt.show()
Output:
Exercise 5.2
import matplotlib.pyplot as plt
import numpy as np
plt.plot(x,y)
#plt.grid(True, which='both')
plt.show()
Output:
Exercise 5.3
import matplotlib.pyplot as plt
import numpy as np
plt.bar(x,y)
plt.show()
Output:
RESULT
Thus, the Plotting Graphs, Controlling Graph, Adding Text in python were executed
successfully.
EXPERIMENT 6: More Graph Types, Getting and setting values, Patches.
Exercise 6.1
import matplotlib.pyplot as plt
# Data is plotted:
plt.pie(sizes, labels=labels, colors=colors)
plt.axis('equal')
plt.title("Pie Plot")
plt.show()
Output:
Exercise 6.2
import numpy as np
import matplotlib.pyplot as plt
discount= np.array([10,20,30,40,50])
saleInRs=np.array([40000,45000,48000,50000,100000])
size=discount*10
plt.scatter(x=discount,y=saleInRs,s=size,color='red',linewidth=3,
marker='*',edgecolor='blue')
plt.title('Sales Vs Discount')
plt.xlabel('Discount offered')
plt.ylabel('Sales in Rs')
plt.show()
Output:
Exercise 6.3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data=pd.read_csv("Min_Max_Seasonal_IMD_2017.csv",
usecols=['ANNUAL - MIN'])
df=pd.DataFrame(data)
#convert the 'ANNUAL - MIN' column into a numpy 1D array
minarray=np.array([df['ANNUAL - MIN']])
# Extract y (frequency) and edges (bins)
y,edges = np.histogram(minarray)
#calculate the midpoint for each bar on the histogram
mid = 0.5*(edges[1:]+ edges[:-1])
df.plot(kind='hist',y='ANNUAL - MIN')
plt.plot(mid,y,'-^')
plt.title('Annual Min Temperature plot(1901 - 2017)')
plt.xlabel('Temperature')
plt.show()
Output:
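The midpoint calculation in this exercise depends on how `np.histogram` returns its results: one more bin edge than there are counts. The sketch below shows that relationship on a small made-up array; the `temps` values are invented for illustration and are not taken from the IMD data file.

```python
import numpy as np

# Made-up temperatures, standing in for the 'ANNUAL - MIN' column.
temps = np.array([18.2, 18.9, 19.1, 19.4, 19.4, 19.8, 20.1, 20.3, 20.9, 21.5])

# counts has one entry per bin; edges has one more entry than counts.
counts, edges = np.histogram(temps, bins=5)

# Midpoint of each bin: average of its left edge and right edge.
mid = 0.5 * (edges[1:] + edges[:-1])

print("counts:", counts)
print("edges: ", edges)
print("mid:   ", mid)
```

Plotting `mid` against `counts` then places one marker at the centre of each histogram bar.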
RESULT
Thus, the Graph Types, Getting and setting values, Patches in python were executed
successfully.
EXPERIMENT 7: Multivariate data analysis: Multiple regression
def mse(coef, x, y):
return np.mean((np.dot(x, coef) - y)**2)/2
while True:
error = mse(coef, x, y)
if abs(error - prev_error) <= epsilon:
break
prev_error = error
grad = gradients(coef, x, y)
t += 1
m_coef = b1 * m_coef + (1-b1)*grad
v_coef = b2 * v_coef + (1-b2)*grad**2
moment_m_coef = m_coef / (1-b1**t)
moment_v_coef = v_coef / (1-b2**t)
ax.view_init(45, 0)
ax.legend()
plt.show()
Output:
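The fragment above keeps the loss function and the moment updates but omits the gradient function, the initialisation and the coefficient update. The following is a minimal sketch of how these pieces could fit together as Adam-style gradient descent for multiple regression. The `gradients` body, the function `fit_adam`, the hyperparameter values and the synthetic data are all assumptions, not the manual's own code.

```python
import numpy as np

def mse(coef, x, y):
    # Half mean squared error, as in the fragment above.
    return np.mean((np.dot(x, coef) - y) ** 2) / 2

def gradients(coef, x, y):
    # Assumed gradient of mse with respect to coef: X^T (X coef - y) / n.
    return np.dot(x.T, np.dot(x, coef) - y) / len(y)

def fit_adam(x, y, lr=0.05, b1=0.9, b2=0.999, eps=1e-8,
             epsilon=1e-9, max_iter=5000):
    coef = np.zeros(x.shape[1])
    m_coef = np.zeros_like(coef)   # first-moment estimate
    v_coef = np.zeros_like(coef)   # second-moment estimate
    prev_error = np.inf
    t = 0
    while t < max_iter:
        error = mse(coef, x, y)
        if abs(error - prev_error) <= epsilon:
            break
        prev_error = error
        grad = gradients(coef, x, y)
        t += 1
        m_coef = b1 * m_coef + (1 - b1) * grad
        v_coef = b2 * v_coef + (1 - b2) * grad ** 2
        # Bias-corrected moments.
        moment_m_coef = m_coef / (1 - b1 ** t)
        moment_v_coef = v_coef / (1 - b2 ** t)
        # Assumed update step (standard Adam form).
        coef = coef - lr * moment_m_coef / (np.sqrt(moment_v_coef) + eps)
    return coef

# Synthetic, noise-free data: intercept column plus two features.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
x = np.column_stack([np.ones(200), features])
true_coef = np.array([1.0, 2.0, -3.0])
y = x @ true_coef

coef = fit_adam(x, y)
print("estimated coefficients:", coef)
print("final error:", mse(coef, x, y))
```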
RESULT
Thus, the Multivariate data analysis: Multiple regression in python was executed
successfully
EXPERIMENT 8: Multivariate regression, cluster analysis with various
algorithms
Exercise 8.1
from matplotlib import cm
df_list = []
fig = plt.figure()
ax = Axes3D(fig)
plt.show()
Output
Exercise 8.2
from copy import deepcopy
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
df = pd.read_csv("/content/Iris.csv")
df.drop('Id',axis=1,inplace=True)
df.head()
df["Species"] = pd.Categorical(df["Species"])
df["Species"] = df["Species"].cat.codes
k=3
# Training data
n = data.shape[0]
# Plotting data
colors=['blue', 'yellow', 'green']
for i in range(n):
plt.scatter(data[i, 0], data[i,1], s=7, color = colors[int(category[i])])
plt.scatter(centers[:,0], centers[:,1], marker='.', c='r', s=150)
Output:
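The listing above omits the clustering loop that produces `centers` and `category`. A hedged sketch of a plain k-means loop in NumPy is given below, using those two names; the helper functions `assign` and `kmeans`, the random initialisation and the synthetic two-dimensional data are assumptions, and the Iris file is not read here.

```python
import numpy as np

def assign(data, centers):
    # Index of the nearest center for every point (Euclidean distance).
    distances = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(distances, axis=1)

def kmeans(data, k, seed=0, max_iter=100):
    rng = np.random.default_rng(seed)
    # Assumed initialisation: k distinct data points chosen at random.
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        category = assign(data, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = data[category == j]
            if len(members) > 0:      # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break                     # centers stopped moving
        centers = new_centers
    return centers, assign(data, centers)

# Synthetic stand-in for two feature columns: three groups of 30 points.
rng = np.random.default_rng(42)
group_means = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 4.0]])
data = np.vstack([rng.normal(loc=m, scale=0.3, size=(30, 2)) for m in group_means])

k = 3
centers, category = kmeans(data, k)
print("centers:\n", centers)
print("points per cluster:", np.bincount(category, minlength=k))
```

The returned `centers` and `category` arrays have the shapes the plotting code above expects.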
Exercise 8.3
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=750, centers=centers, cluster_std=0.4,
random_state=0)
X = StandardScaler().fit_transform(X)
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)
Now, let’s plot the results that we saw in our output above.
import matplotlib.pyplot as plt
%matplotlib inline
unique_labels = set(labels)
colors = [plt.cm.Spectral(each)
for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
if k == -1:
# Black used for noise.
col = [0, 0, 0, 1]
class_member_mask = (labels == k)
Output:
RESULT
Thus, the multivariate regression, cluster analysis with various algorithms were
executed successfully.
EXPERIMENT 9: Factor analysis using Python
#EXPLORATORY FACTOR ANALYSIS
fa = FactorAnalyzer(10, rotation=None)
fa.fit(scaled_baseball)
#GET EIGENVALUES
fa.get_eigenvalues()
Output:
RESULT
# define dataset
X, y = get_dataset()
# get the models to evaluate
models = get_models()
# evaluate the models and store results
results, names = list(), list()
for name, model in models.items():
scores = evaluate_model(model, X, y)
results.append(scores)
names.append(name)
print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))
# plot model performance for comparison
pyplot.boxplot(results, labels=names, showmeans=True)
pyplot.show()
Output
Exercise 10.2
# importing required libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# importing or loading the dataset
dataset = pd.read_csv('wine.csv')
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Applying PCA function on training
# and testing set of X component
from sklearn.decomposition import PCA
pca = PCA(n_components = 2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
explained_variance = pca.explained_variance_ratio_
# Fitting Logistic Regression To the training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
Output
cm = confusion_matrix(y_test, y_pred)
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(),
X2.ravel()]).T).reshape(X1.shape), alpha = 0.75,
cmap = ListedColormap(('yellow', 'white', 'aquamarine')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
c = ListedColormap(('red', 'green', 'blue'))(i), label = j)
Output:
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
c = ListedColormap(('red', 'green', 'blue'))(i), label = j)
Output:
RESULT
Thus, the PCA and linear discriminant analysis were executed successfully.
TEST PROJECT DESCRIPTION
S.No. List of Projects: Project Description
1. Market Basket Analysis: Market basket analysis is a data mining technique used by
retailers to increase sales by better understanding customer purchasing patterns. It
involves analyzing large data sets, such as purchase history, to reveal product
groupings, as well as products that are likely to be purchased together.
2. Reducing Manufacturing Failures: In this project we use the measurements of the
parts of the appliances as they progress through an assembly line to predict whether
there would be a defect in the part. This will help companies to produce high-quality,
low-cost products at the user end.
3. Insurance Pricing Forecast: The purpose of this project is to examine different
features and observe their relationships, and to fit a multiple linear regression on
several features of an individual, such as age, physical/family condition and location,
against their existing medical expenses. The model is used to predict individuals'
future medical expenses, which helps medical insurers decide what premium to charge.
4. City Employee Salary Data Analysis: Data has changed the face of our world over the
last ten years. The numerous emails and text messages we share and the YouTube videos
we watch are part of the nearly 2.5 quintillion bytes of data generated daily across
the world. Businesses, both large and small, deal with massive data volumes, and a lot
depends on their ability to glean meaningful insights from them. A data analyst does
precisely that: they interpret statistical data and turn it into useful information
that businesses and organizations can use for critical decision-making.
5. Churn Prediction in Telecom: Predicting customer churn is critical for
telecommunication companies to be able to effectively retain customers. It is more
costly to acquire new customers than to retain existing ones. For this reason, large
telecommunications corporations are seeking to develop models that predict which
customers are more likely to leave, and to act accordingly.
6. Predicting Wine Preferences of Customers using Wine Dataset: Predicting wine
quality using machine learning techniques is becoming increasingly popular today.
Basically, it’s the computer algorithm that can tell if there’s a difference between
a $5 bottle of wine or a $100 one. There are many educational step-by-step guides by
professional programmers using open-source wine quality prediction datasets and
teaching how to use ML for wine quality prediction. But we decided to break this down
into a more detailed and technical overview.
7. Identifying Product Bundles from Sales Data: The objective of this data science
project in R is to find product bundles that can be put together on sale. Typically
Market Basket Analysis was used to identify such bundles; here we compare the relative
importance of time series clustering in identifying product bundles.
8. Movie Review Sentiment Analysis: Sentiment relates to the meaning of a word or
sequence of words and is usually associated with an opinion or emotion. And analysis?
Well, this is the process of looking at data and making inferences; in this case,
using machine learning to learn and predict whether a movie review is positive or
negative.
9. Store Sales Forecasting: Sales forecasting is the process of estimating future
sales. Accurate sales forecasts enable companies to make informed business decisions
and predict short-term and long-term performance. Companies can base their forecasts
on past sales data, industry-wide comparisons, and economic trends.
10. Building a Music Recommendation Engine: Similar genres will sound similar and will
come from similar time periods, while the same can be said for songs within those
genres. We can use this idea to build a recommendation system by taking the data
points of the songs a user has listened to and recommending songs corresponding to
nearby data points.
11. Airline Dataset Analysis: It describes financial metrics for individual airlines,
airline sectors and the industry as a whole for the American commercial airline
industry. The original data from the source is collected in the zip file "Original
MIT data", and the data relating to airline finances and the main industry metrics has
been cleaned and written into csv files for ease of use.
12. Predicting Flight Delays: Our project focuses on predicting flight delays using
machine learning techniques. We employ feature engineering and advanced regression
algorithms to enhance accuracy. The dataset includes flight info, weather conditions,
and other relevant factors. Our model achieves 94% accuracy.
13. Event Data Analysis: In an Event Data Analysis project, the goal is to analyze and
gain insights from data generated during events or conferences. The project typically
involves collecting, cleaning, and processing the data to extract meaningful
information.
14. Building a Job Portal using Twitter Data: The portal will stream data from the
Twitter API to find recently published jobs. Classification of relevant and irrelevant
tweets is accomplished using a machine-learning algorithm, Logistic Regression. With
this algorithm, we measured 97% accuracy.
15. Implementing Slowly Changing Dimensions in a Data: Slowly Changing Dimensions
(SCDs) are techniques used in data projects to manage the updates, changes, and
historical data in a dimensional model. It is important to implement SCDs to ensure
the accuracy and consistency of the data over time.