Assignment II Machine Learning
MEMBERS.
Ibrahim Hussein 19/05592 BISF
Moses Kipngeno 19/05914 BISF
Everlyne Nelius Irungu 19/05463 BISF
Alice Njeri Kuria 19/05790 BISF
Collins Njoroge 19/02573 BISF
ACTIVITY
1. Describe the Support Vector Machine algorithm.
A Support Vector Machine (SVM) is a supervised machine learning algorithm used for
classification and regression tasks.
It works by finding the hyperplane that best separates the data points into different
classes, possibly after mapping them into a higher-dimensional space.
The SVM algorithm works through the following steps:
i. Data preprocessing: the input data is first preprocessed so that it is in a suitable
format for the Support Vector Machine. This may include scaling, normalization and other
transformations so that the data is centered and the features are on similar
scales.
ii. Feature mapping: SVM implicitly maps the input data into a higher-dimensional space
using a kernel function. This makes it possible to find a hyperplane that separates data
points that are not linearly separable in the original space.
iii. Hyperplane selection: SVM then searches for the optimal hyperplane, the one that
separates the data points with the maximum margin. The margin is the distance between the
hyperplane and the closest data points from each class. The larger the margin, the more
confident the algorithm is about its classification.
iv. Support vector identification: the data points closest to the hyperplane on each side
are known as support vectors. These support vectors determine the position of the
hyperplane and are used to calculate the margin.
v. Classification: once the optimal hyperplane is found, SVM uses it to classify new
data points based on which side of the hyperplane they fall on. If a data point falls
on the positive side of the hyperplane, it is assigned to one class, and if it falls on
the negative side, it is assigned to the other class.
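The five steps above can be sketched with scikit-learn. This is a minimal illustration, not the assignment's own implementation: the synthetic data from make_blobs, the value C=1.0 and the two new points are all assumptions. A linear kernel is used so that the hyperplane weights, and therefore the margin width 2 / ||w||, can be read off directly.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data (an assumption for illustration, not the assignment's dataset)
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)

# i. Preprocessing: centre each feature and scale it to unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# ii-iii. Fit a linear-kernel SVM (no extra feature mapping); C=1.0 is an assumed setting
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_scaled, y)

# iv. Support vectors are the training points that determine the hyperplane
support_vectors = clf.support_vectors_

# For a linear kernel the hyperplane is w.x + b = 0 and the margin width is 2 / ||w||
w = clf.coef_[0]
b = clf.intercept_[0]
margin_width = 2.0 / np.linalg.norm(w)

# v. Classification: the sign of the decision function tells which side a point is on
new_points = scaler.transform(np.array([[0.0, 4.0], [2.0, 1.0]]))
scores = clf.decision_function(new_points)
predictions = clf.predict(new_points)

print("number of support vectors:", len(support_vectors))
print("margin width:", round(margin_width, 3))
print("scores:", scores, "predictions:", predictions)
```

Only the support vectors influence the fitted hyperplane, so removing any other training point and refitting would leave w and b unchanged.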
SVM can therefore handle both linearly and non-linearly separable data by using different
kernel functions. Kernel functions commonly used in SVM include the linear, polynomial,
radial basis function (RBF), and sigmoid kernels.
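To illustrate how the choice of kernel matters, the following sketch fits the four kernels named above to data that is not linearly separable. The concentric-ring data from make_circles and the 25% test split are assumptions chosen for illustration; all other SVC parameters are left at the scikit-learn defaults.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: no straight line separates them (synthetic, assumed data)
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit one SVM per kernel and record its accuracy on the held-out test set
accuracy = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel)
    clf.fit(X_train, y_train)
    accuracy[kernel] = clf.score(X_test, y_test)

for kernel, acc in accuracy.items():
    print(f"{kernel:8s} test accuracy = {acc:.2f}")
```

On data like this the RBF kernel should separate the rings far better than the linear kernel, because a circular boundary in the original space corresponds to a flat one in the implicit feature space.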
SVM is effective for classification tasks and can handle high-dimensional
datasets with complex decision boundaries.
A disadvantage of SVM is that it is not well suited to large datasets because of its long
training time.
2. Preprocess a selected dataset
Data preprocessing is the process of preparing raw data and making it suitable for machine
learning models. It includes data cleaning, which makes the data ready to be given
to a machine learning model.
Below is a dataset containing student performances. We apply various data preprocessing
commands to the dataset as shown below.
import pandas as pd
import numpy as np
#read csv
df_excel = pd.read_csv('StudentsPerformance.csv')
df_excel
#first look
df_excel.describe()
df_excel['math score'].sum()
df_excel['math score'].mean()
df_excel['math score'].max()
df_excel['math score'].min()
df_excel['math score'].count()
# count
df_excel['gender'].value_counts()
# average of the three score columns
# (the 'reading score' and 'writing score' column names are assumed)
df_excel['average'] = df_excel[['math score', 'reading score', 'writing score']].mean(axis=1)
# if condition
df_excel['pass/fail'] = np.where(df_excel['average'] > 70, 'Pass', 'Fail')
df_excel.head()
# multiple conditions
conditions = [
    (df_excel['average']>=90),
    (df_excel['average']>=80) & (df_excel['average']<90),
    (df_excel['average']>=70) & (df_excel['average']<80),
    (df_excel['average']>=60) & (df_excel['average']<70),
    (df_excel['average']>=50) & (df_excel['average']<60),
    (df_excel['average']<50),
]
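A list of conditions like the one above is typically paired with a list of labels and passed to np.select. The sketch below shows one way this could look. The small DataFrame stands in for the student data, and the letter grades are hypothetical labels that do not come from the assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical averages standing in for df_excel['average']
df = pd.DataFrame({"average": [95.0, 84.5, 72.0, 65.0, 50.0, 31.0]})

conditions = [
    df["average"] >= 90,
    (df["average"] >= 80) & (df["average"] < 90),
    (df["average"] >= 70) & (df["average"] < 80),
    (df["average"] >= 60) & (df["average"] < 70),
    (df["average"] >= 50) & (df["average"] < 60),
    df["average"] < 50,
]
# Assumed labels, one per condition, in the same order
grades = ["A", "B", "C", "D", "E", "F"]

# np.select assigns each row the label of the first condition that is True;
# a string default keeps the result dtype consistent with the labels
df["grade"] = np.select(conditions, grades, default="")
print(df)
```

Because the conditions are checked in order, the upper bounds such as `< 90` are not strictly needed, but keeping them makes each range explicit.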
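import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
datasets = pd.read_csv('Social_Network_Ads.csv')
X = datasets.iloc[:, [2,3]].values
Y = datasets.iloc[:, 4].values
# Splitting the dataset into the Training set and Test set (the 25% test size is an assumption)
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.25, random_state=0)
# Feature Scaling (fit the scaler on the training set only, then reuse it on the test set)
sc_X = StandardScaler()
X_Train = sc_X.fit_transform(X_Train)
X_Test = sc_X.transform(X_Test)
# Fitting the SVM classifier (the linear kernel is an assumption; this step was missing)
classifier = SVC(kernel='linear', random_state=0).fit(X_Train, Y_Train)
Y_Pred = classifier.predict(X_Test)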
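Once predictions are available, they can be compared with the true test labels. The sketch below shows one possible check using a confusion matrix and the accuracy score. It uses synthetic data from make_classification in place of Social_Network_Ads.csv, and the linear kernel and 25% test size are assumptions, so the numbers it prints say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for two feature columns and a binary label (assumed data)
X, Y = make_classification(
    n_samples=400, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# Scale using statistics from the training set only
scaler = StandardScaler()
X_Train = scaler.fit_transform(X_Train)
X_Test = scaler.transform(X_Test)

classifier = SVC(kernel="linear", random_state=0)
classifier.fit(X_Train, Y_Train)
Y_Pred = classifier.predict(X_Test)

# Rows of the confusion matrix are true classes, columns are predicted classes
cm = confusion_matrix(Y_Test, Y_Pred)
acc = accuracy_score(Y_Test, Y_Pred)
print(cm)
print("accuracy:", acc)
```

The diagonal of the confusion matrix counts correct predictions, so the accuracy equals the sum of the diagonal divided by the number of test samples.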