Assignment II Machine Learning

The document describes an assignment for a machine learning course involving support vector machines (SVM). It includes: 1) A description of the SVM algorithm and how it works. 2) Examples of preprocessing a student performance dataset in Python, including cleaning, aggregating and transforming the data. 3) An example Python code to build an SVM classification model on a social network advertising dataset, including data preprocessing, training and evaluating the model.

Uploaded by

Hussein Ibrahim

SCHOOL OF TECHNOLOGY

BACHELOR OF INFORMATION SECURITY AND FORENSICS &


BACHELOR OF SOFTWARE DEVELOPMENT & BACHELOR IN INFORMATION
FORENSICS AND SECURITY
MACHINE LEARNING
JANUARY-APRIL 2023
ASSIGNMENT II

MEMBERS.
Ibrahim Hussein 19/05592 BISF
Moses Kipngeno 19/05914 BISF
Everlyne Nelius Irungu 19/05463 BISF
Alice Njeri Kuria 19/05790 BISF
Collins Njoroge 19/02573 BISF
ACTIVITY
1. Describe the Support Vector Machine algorithm.

Support Vector Machine (SVM) is a supervised machine learning algorithm used for
classification and regression tasks.
It works by finding the hyperplane that best separates the data points of different
classes, possibly after mapping them into a high-dimensional space.
The SVM algorithm works through the following steps:
i. Data preprocessing: The input data is first preprocessed so that it is in a suitable
format for SVM. This may include scaling, normalization and other transformations
so that the data is centered and the features are on similar scales.
ii. Feature mapping: SVM implicitly maps the input data into a higher-dimensional space
using a kernel function. This makes it easier to find a hyperplane that separates
the data points.
iii. Hyperplane selection: SVM then searches for the optimal hyperplane, the one that
separates the data points with the maximum margin. The margin is the distance between
the hyperplane and the closest data points from each class. The larger the margin,
the more confidently the classifier separates the classes.
iv. Support vector identification: The data points closest to the hyperplane on each side
are known as support vectors. These support vectors determine the position of the
hyperplane and are used to calculate the margin.
v. Classification: Once the optimal hyperplane is found, SVM uses it to classify new
data points based on which side of the hyperplane they fall on. A data point on the
positive side of the hyperplane is assigned to one class, and a data point on the
negative side is assigned to the other class.
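Steps iii to v can be illustrated with a minimal sketch in plain Python. The weight vector w, the offset b and the two support vectors below are hypothetical values chosen only for illustration; they are not learned from any dataset, and the margin formula assumes the usual convention that support vectors satisfy |w.x + b| = 1.

```python
import math

# Hypothetical hyperplane parameters, chosen only for illustration.
w = [1.0, -1.0]   # normal vector of the hyperplane w.x + b = 0
b = 0.0           # offset (bias) term


def decision_value(x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x):
    """Return +1 for the positive side of the hyperplane and -1 for the negative side."""
    return 1 if decision_value(x) >= 0 else -1


def margin_width():
    """Margin width 2 / ||w||, assuming support vectors satisfy |w.x + b| = 1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical support vectors: one per class, each with |w.x + b| = 1.
support_vectors = [[1.0, 0.0], [0.0, 1.0]]

for sv in support_vectors:
    print(sv, decision_value(sv), classify(sv))
print("margin width:", margin_width())
```

In this sketch the two support vectors lie on opposite sides of the hyperplane, and the distance between the two margin boundaries is 2 / ||w||, which is why maximizing the margin amounts to keeping ||w|| small.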

SVM can therefore handle both linearly and non-linearly separable data by using different
kernel functions. Commonly used kernels include linear, polynomial, radial basis
function (RBF), and sigmoid.
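The four kernels named above can be written out as a minimal sketch in plain Python. The parameter names gamma, degree and coef0 and their default values are assumptions made for this illustration; they follow a common parametrization and are not taken from the assignment.

```python
import math


def dot(x, y):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """K(x, y) = x.y"""
    return dot(x, y)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """K(x, y) = (gamma * x.y + coef0) ** degree"""
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """K(x, y) = tanh(gamma * x.y + coef0)"""
    return math.tanh(gamma * dot(x, y) + coef0)


x, y = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, y))
print(polynomial_kernel(x, y, degree=2))
print(rbf_kernel(x, y))
print(sigmoid_kernel(x, y))
```

Each function returns a similarity score between two points. The linear kernel corresponds to a straight-line boundary in the original feature space, while the other three allow curved decision boundaries.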
SVM is a powerful algorithm for classification tasks and can handle high-dimensional
datasets with complex decision boundaries.
Its main disadvantage is that it is not well suited to large datasets, because training
time grows quickly as the number of samples increases.
2. Preprocess a selected dataset
Data preprocessing is the process of preparing raw data and making it suitable for machine
learning models. It includes data cleaning, aggregation and transformation, so that the data
is ready to be given to a machine learning model.
The dataset used here contains student performance scores. We apply various data preprocessing
commands to the dataset as shown below.
import pandas as pd
import numpy as np

#read csv
df_excel = pd.read_csv('StudentsPerformance.csv')
df_excel

#first look
df_excel.describe()

#calculate specific columns

df_excel['math score'].sum()
df_excel['math score'].mean()
df_excel['math score'].max()
df_excel['math score'].min()
df_excel['math score'].count()

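#calculate across columns for each row

df_excel['average'] = (df_excel['math score'] + df_excel['reading score']
                       + df_excel['writing score'])/3
# row-wise mean of the numeric columns only; text columns cannot be averaged
df_excel.mean(axis=1, numeric_only=True)
df_excel.head()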

# count
df_excel['gender'].value_counts()

# if condition
df_excel['pass/fail'] = np.where(df_excel['average'] > 70, 'Pass', 'Fail')
df_excel.head()

# multiple conditions
conditions = [
(df_excel['average']>=90),
(df_excel['average']>=80) & (df_excel['average']<90),
(df_excel['average']>=70) & (df_excel['average']<80),
(df_excel['average']>=60) & (df_excel['average']<70),
(df_excel['average']>=50) & (df_excel['average']<60),
(df_excel['average']<50),
]

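values = ['A', 'B', 'C', 'D', 'E', 'F']

# default must be a string like the choices; it is used only when no condition
# matches (for example, a missing average)
df_excel['grades'] = np.select(conditions, values, default='')
df_excel.head()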

# show first 5 rows of the new columns
df_excel[['average', 'pass/fail', 'grades']].head()
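The grading rule encoded by the conditions list above can be checked on single values with a small plain-Python sketch. The names THRESHOLDS, LETTERS and grade_for are hypothetical helpers introduced here for illustration; only the cut-off points and letters come from the script above.

```python
from bisect import bisect_right

# Lower bounds of grades E, D, C, B and A; anything below 50 is an F.
THRESHOLDS = [50, 60, 70, 80, 90]
LETTERS = ['F', 'E', 'D', 'C', 'B', 'A']


def grade_for(average):
    """Map an average score to a letter using the same cut-offs as the conditions list."""
    # bisect_right counts how many thresholds are <= average,
    # which is exactly the index of the matching letter.
    return LETTERS[bisect_right(THRESHOLDS, average)]


for score in (49.9, 50, 69.9, 70, 89.9, 90):
    print(score, grade_for(score))
```

Checking the boundaries this way also highlights a detail of the script: an average of exactly 70 receives grade C, but it is labelled 'Fail' because the pass/fail rule uses a strict `> 70` comparison.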
3. Using an example in Python and a sample dataset, build an SVM model.

# Support Vector Machine


# Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the datasets

datasets = pd.read_csv('Social_Network_Ads.csv')
X = datasets.iloc[:, [2,3]].values
Y = datasets.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split

X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size = 0.25,
                                                    random_state = 0)

# Feature Scaling

from sklearn.preprocessing import StandardScaler


sc_X = StandardScaler()
X_Train = sc_X.fit_transform(X_Train)
X_Test = sc_X.transform(X_Test)

# Fitting the classifier into the Training set

from sklearn.svm import SVC


classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(X_Train, Y_Train)

# Predicting the test set results

Y_Pred = classifier.predict(X_Test)

# Making the Confusion Matrix

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(Y_Test, Y_Pred)
print(cm)

# Visualising the Training set results

from matplotlib.colors import ListedColormap

X_Set, Y_Set = X_Train, Y_Train
# grid of points covering the (scaled) feature range
X1, X2 = np.meshgrid(
    np.arange(start = X_Set[:, 0].min() - 1, stop = X_Set[:, 0].max() + 1, step = 0.01),
    np.arange(start = X_Set[:, 1].min() - 1, stop = X_Set[:, 1].max() + 1, step = 0.01))
# colour each grid point by the class the classifier predicts for it
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# overlay the actual data points, one colour per class
for i, j in enumerate(np.unique(Y_Set)):
    plt.scatter(X_Set[Y_Set == j, 0], X_Set[Y_Set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Support Vector Machine (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

# Visualising the Test set results

from matplotlib.colors import ListedColormap

X_Set, Y_Set = X_Test, Y_Test
# grid of points covering the (scaled) feature range
X1, X2 = np.meshgrid(
    np.arange(start = X_Set[:, 0].min() - 1, stop = X_Set[:, 0].max() + 1, step = 0.01),
    np.arange(start = X_Set[:, 1].min() - 1, stop = X_Set[:, 1].max() + 1, step = 0.01))
# colour each grid point by the class the classifier predicts for it
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# overlay the actual data points, one colour per class
for i, j in enumerate(np.unique(Y_Set)):
    plt.scatter(X_Set[Y_Set == j, 0], X_Set[Y_Set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Support Vector Machine (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
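
The confusion matrix computed in the script can be turned into summary metrics. The sketch below is a plain-Python illustration on made-up toy labels, not the results of the model above. It assumes binary labels 0 and 1 and follows the layout that `confusion_matrix` uses for such labels, [[TN, FP], [FN, TP]]. The helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true negatives, false positives, false negatives and true positives."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    """Share of all predictions that were correct."""
    return (tn + tp) / (tn + fp + fn + tp)


def precision(tn, fp, fn, tp):
    """Share of predicted positives that were actually positive."""
    return tp / (tp + fp)


def recall(tn, fp, fn, tp):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn)


# Toy labels invented for this example only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
print("TN, FP, FN, TP:", counts)
print("accuracy:", accuracy(*counts))
print("precision:", precision(*counts))
print("recall:", recall(*counts))
```

With the real model, the same figures could be read from `cm` as `tn, fp, fn, tp = cm.ravel()`, assuming the target column contains only the labels 0 and 1.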
