NAME: Michael Olufemi Magreola T00686670
AI EXAM I   Date: 03/14/2022   Deadline: 03/16/2022 11:59 am
(For Tasks 1, 4, and 5, show every step and do not use libraries.)
Task 1: (20 points) For the training set given below, predict the classification of the
following sample X = {2, 1, 1, Class = ?} using the simple (naïve) Bayesian classifier.
Sample   Attribute1 (A1)   Attribute2 (A2)   Attribute3 (A3)   Class (C)
1        1                 2                 1                 1
2        0                 0                 1                 1
3        2                 1                 2                 2
4        1                 2                 1                 2
5        0                 1                 2                 1
6        2                 2                 2                 2
7        1                 0                 1                 1
Solution
We predict the class of the sample X = {A1=2, A2=1, A3=1} using Bayesian classification.
The Bayesian classifier assigns to X the class Ci with the maximum posterior probability P(Ci/X).
By Bayes' theorem, this is equivalent to maximizing the product P(X/Ci) · P(Ci) over the classes i = 1, 2.
P(C=1) = 4/7 = 0.5714
P(C=2) = 3/7 = 0.4286
X = {A1=2, A2=1, A3 =1}; C =?
P(A1=2/C=1) = 0/4 = 0
P(A1=2/C=2) = 2/3 = 0.66
P(A2=1/C=1) = 1/4 = 0.25
P(A2=1/C=2) = 1/3= 0.33
P(A3=1/C=1) = 3/4 = 0.75
P(A3=1/C=2) = 1/3 = 0.33
Using the assumption of conditional independence of attributes, the conditional
probabilities will be:
P(X/C=1) = P(A1=2/C=1) · P(A2=1/C=1) · P(A3=1/C=1) = 0 · 0.25 · 0.75 = 0
P(X/C=2) = P(A1=2/C=2) · P(A2=1/C=2) · P(A3=1/C=2) = 0.66 · 0.33 · 0.33 = 0.071874
Finally, multiplying these conditional probabilities by the corresponding prior
probabilities, we obtain values proportional to P(Ci/X) and find their maximum:
P(X/C=1) · P(C=1) = 0 · 0.5714 = 0
P(X/C=2) · P(C=2) = 0.071874 · 0.4286 = 0.030805
max{ P(C=1/X), P(C=2/X) } ∝ max{ 0, 0.030805 } = 0.030805, attained for C = 2.
Based on these two values, the final results of the naïve Bayesian classifier, we can
predict that the new sample X belongs to the class C = 2.
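For completeness, the hand calculation above can be reproduced with a short step-by-step Python script (no libraries, matching the exam requirement). This is only an illustrative sketch, and the variable names are our own:

# Training set: rows are (A1, A2, A3, Class)
data = [(1, 2, 1, 1), (0, 0, 1, 1), (2, 1, 2, 2), (1, 2, 1, 2),
        (0, 1, 2, 1), (2, 2, 2, 2), (1, 0, 1, 1)]
x = (2, 1, 1)  # new sample: A1=2, A2=1, A3=1

scores = {}
for c in (1, 2):
    rows = [r for r in data if r[3] == c]
    prior = len(rows) / len(data)        # P(C=c): 4/7 and 3/7
    likelihood = 1.0
    for j in range(3):                   # naive-Bayes factor P(Aj=x[j] / C=c)
        count = sum(1 for r in rows if r[j] == x[j])
        likelihood *= count / len(rows)
    scores[c] = prior * likelihood       # proportional to P(c / x)

# Exact fractions give scores[2] = (3/7)*(2/3)*(1/3)*(1/3) ~ 0.0317; the hand
# calculation above gets 0.0308 because 2/3 and 1/3 were truncated to 0.66 and 0.33.
print(scores)
print("predicted class:", max(scores, key=scores.get))  # -> 2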
Task 2: (20 points) In which situations would you recommend the Leave-One-Out method for
validation of data mining results?
Solution
The Leave-One-Out Cross-Validation method is used to estimate the performance of
machine learning algorithms when they are used to make predictions on data not used
to train the model. It is a special case of cross-validation where the number of folds
equals the number of instances in the data set.
The Leave-One-Out Cross-Validation method is recommended when you have a small
dataset, or when an accurate estimate of model performance is more important than the
computational cost of the method. The procedure is computationally expensive to
perform, since the model must be retrained once per instance, but it yields a reliable,
nearly unbiased estimate of model performance.
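As an illustration, here is a minimal Python sketch of the leave-one-out loop. The 1-nearest-neighbour classifier and the toy data are assumptions made only to keep the example self-contained; any learner can be plugged in:

def one_nn(train, query):
    # 1-nearest-neighbour: return the label of the closest training point
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda row: dist(row[0], query))[1]

def loocv_accuracy(dataset):
    # dataset: list of (features, label); number of folds == number of instances
    correct = 0
    for i in range(len(dataset)):
        features, label = dataset[i]            # the single held-out instance
        train = dataset[:i] + dataset[i + 1:]   # train on all the others
        if one_nn(train, features) == label:
            correct += 1
    return correct / len(dataset)

samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1), ((2, 2), 1)]
print(loocv_accuracy(samples))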
Task 3: (20 points) What is meant by the term overfitting in the context of inductive
inference? Give example(s) and solution(s)
Solution
Inductive inference is the process of reaching a general conclusion from specific
examples; it makes broad generalizations from specific observations.
The goal of inductive learning is to generalize well from the training data to any data
from the problem domain, which allows the model to make predictions on data it has
never seen.
Overfitting means that the model has more parameters than the underlying data can
justify, producing an overly complex model. In the context of inductive inference,
overfitting happens when a model learns the detail and noise in the training data to the
extent that it negatively impacts the performance of the model on new data; a
generalization is induced that does not hold on the test data.
As an example, suppose you visit a city "X" and take a taxi ride. On speaking to
friends, you later realize that the taxi driver charged you two or three times the
standard fare; as a newcomer to the city, you were quite literally taken for a ride.
You also buy some items from a street vendor and again end up paying more than they
were worth. You finally conclude that the people in city "X" are dishonest, an
overgeneralization of the kind humans often make. Machine learning models have the
same weakness if we are not careful to avoid bias during the development stages:
modeling, selecting algorithms, features, the training dataset, etc.
Now suppose that in the same city "X" another taxi driver charges you reasonably, per
the meter, but based on your earlier experience you assume that this driver has also
overcharged you. This is what overfitting looks like: the learned rule fits the earlier
observations but fails on new cases.
Example
Identified relevant attributes: x, y, z
X Y Z
1 2 4
3 5 9
4 2 6
Model 1:
Prediction: if x > 0 and z > 0, then y > 0.
Model 2:
if x = 1 and z = 4, then y = 2.
if x = 3 and z = 9, then y = 5.
if x = 4 and z = 6, then y = 2.
otherwise y = 1.
Model 2 merely memorizes the three training rows and falls back to an arbitrary default
for everything else, so it is likely overfitting; Model 1, though crude, generalizes to
unseen cases, as the short sketch below illustrates.
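A few lines of Python make the contrast concrete; the unseen test point below is hypothetical, chosen only to show the two behaviours:

def model1(x, z):
    # Model 1, the simple generalization: positive inputs -> "y > 0"
    return x > 0 and z > 0

def model2(x, z):
    # Model 2, a memorized lookup table over the three training rows
    table = {(1, 4): 2, (3, 9): 5, (4, 6): 2}
    return table.get((x, z), 1)   # otherwise y = 1

# Hypothetical unseen sample: x=2, z=7 (suppose the true y is positive)
print(model1(2, 7))   # True -> Model 1 still predicts y > 0 correctly
print(model2(2, 7))   # 1    -> Model 2 falls back to its arbitrary default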
Solution to Overfitting in Inductive Inference
Cross-validation: in order to find the optimal complexity, we carefully train the model
and then validate it against data that was unseen during training. The performance of
the model on the validation set will initially improve, but it will eventually degrade
once the model starts to overfit.
Train with more data: feeding more data into the hypothesis makes the model less able
to memorize individual examples and forces it toward a generalized result.
Data augmentation: make the data look slightly different every time it is processed, so
the model cannot latch onto exact inputs.
Regularization: add penalty terms on the model parameters that discourage overly
complex models (see the sketch after this list).
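To illustrate the regularization point, here is a minimal sketch for one-dimensional least squares through the origin: adding an L2 penalty lam*beta^2 changes the closed-form slope from sum(x*y)/sum(x*x) to sum(x*y)/(sum(x*x) + lam), shrinking the coefficient toward zero as lam grows. The data is made up for illustration:

def ridge_slope(xs, ys, lam):
    # beta minimizing sum((y - beta*x)^2) + lam * beta^2
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs, ys = [1, 2, 3, 4], [1.2, 1.9, 3.2, 3.8]
for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_slope(xs, ys, lam))  # slope shrinks as lam grows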
Task 4: (20 points) Given the data set with two dimensions X and Y:
X Y
1 4
4 2
3 3
5 2
Use a linear regression method to calculate the parameters α and β, where y = α + β·x.
(Show every step; do not use libraries.)
Solution
a. Linear regression method
The least-squares estimates, with all sums taken over i = 1, ..., n, are:
β = Σ (xi - meanx)·(yi - meany) / Σ (xi - meanx)²
α = meany - β·meanx
where:
meanx = (1/n) Σ xi
meany = (1/n) Σ yi
meanx = (1 + 4 + 3 + 5)/4 = 3.25
meany = (4 + 2 + 3 + 2)/4 = 2.75
Now we can calculate:
i   xi   yi   xi-meanx   yi-meany   (xi-meanx)²   (yi-meany)²   (xi-meanx)·(yi-meany)
1   1    4    -2.2500     1.2500     5.0625        1.5625        -2.8125
2   4    2     0.7500    -0.7500     0.5625        0.5625        -0.5625
3   3    3    -0.2500     0.2500     0.0625        0.0625        -0.0625
4   5    2     1.7500    -0.7500     3.0625        0.5625        -1.3125
Sxx = 8.7500   Syy = 2.7500   Sxy = -4.7500
β = Sxy / Sxx = -4.7500 / 8.7500 = -0.5429
α = meany - β·meanx = 2.75 - (-0.5429)·3.25 = 4.5143
We have the linear equation: y = 4.5143 - 0.5429·x
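The same hand calculation, written as a short Python script with no libraries, reproduces the means, Sxy, Sxx, and the fitted line:

xs = [1, 4, 3, 5]
ys = [4, 2, 3, 2]
n = len(xs)

mean_x = sum(xs) / n                                            # 3.25
mean_y = sum(ys) / n                                            # 2.75
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # -4.75
sxx = sum((x - mean_x) ** 2 for x in xs)                        # 8.75
beta = sxy / sxx                                                # -0.5429
alpha = mean_y - beta * mean_x                                  # 4.5143
print(f"y = {alpha:.4f} + ({beta:.4f}) * x")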
Task 5: (20 points) Support Vector Machines (SVM). The Mercer kernel used to solve the
XOR problem is given by k(xi, xj) = (1 + xi^T xj)^p. What is the smallest positive
integer p for which the XOR problem is solved? Show the kernel and the XOR problem
solution using SVM.
Solution:
Let the two-dimensional input vectors be x = (x1, x2) and xi = (xi1, xi2), and take the
four XOR training points (-1,-1) and (+1,+1) with class -1, and (-1,+1) and (+1,-1)
with class +1.
For p = 1 the kernel k(x, xi) = 1 + x^T xi is linear, and no linear decision boundary
in the input plane separates the XOR classes, so p = 1 fails.
Let p = 2. Expanding the kernel gives monomials of degree up to two:
k(x, xi) = (1 + x^T xi)^2
         = (1 + x1·xi1 + x2·xi2)^2
         = 1 + x1²·xi1² + x2²·xi2² + 2·x1·xi1 + 2·x2·xi2 + 2·x1·x2·xi1·xi2
So k(x, xi) = φ(x)^T φ(xi) with the monomial feature map
φ(x) = [1, x1², x2², √2·x1, √2·x2, √2·x1·x2]^T.
In this six-dimensional feature space the coordinate √2·x1·x2 alone separates the two
classes: it equals +√2 for (-1,-1) and (+1,+1) and -√2 for (-1,+1) and (+1,-1), so the
SVM decision function f(x) = -x1·x2 classifies all four points correctly.
Findings:
Therefore, the smallest positive integer p for which the kernel solves the XOR problem
is p = 2.
Fig. 1. View of the kernel machine (figure not reproduced here).
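A short Python check of the p = 2 argument: it verifies that the monomial feature map phi reproduces the kernel value (1 + x^T z)^2 for all pairs of XOR points, and prints the sqrt(2)*x1*x2 coordinate that separates the two classes. This is a verification sketch, not part of the original exam answer:

from math import sqrt

def phi(x1, x2):
    # monomial feature map induced by the kernel (1 + x^T z)^2
    return (1.0, x1 * x1, x2 * x2,
            sqrt(2) * x1, sqrt(2) * x2, sqrt(2) * x1 * x2)

def kernel(a, b):
    return (1 + a[0] * b[0] + a[1] * b[1]) ** 2

# XOR points as (x1, x2, class)
points = [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]

# phi(a) . phi(b) must equal (1 + a^T b)^2 for every pair
for (a1, a2, _) in points:
    for (b1, b2, _) in points:
        dot = sum(u * v for u, v in zip(phi(a1, a2), phi(b1, b2)))
        assert abs(dot - kernel((a1, a2), (b1, b2))) < 1e-9

# The sqrt(2)*x1*x2 coordinate alone separates the classes:
# +sqrt(2) for class -1 and -sqrt(2) for class +1.
for (x1, x2, label) in points:
    print((x1, x2), "class", label, "feature", phi(x1, x2)[-1])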