Q No. 1
1.1 Machine Learning:
Machine learning is a concept that allows a machine to learn from examples and
experience, and that too without being explicitly programmed. Instead of writing code,
we simply feed data to a generic algorithm, which builds its own logic from that data.
Machine learning is the study of computer algorithms that improve automatically
through experience. It is seen as a subset of artificial intelligence. Machine learning
algorithms build a model based on sample data, known as training data, in order to make
predictions or decisions without being explicitly programmed to do so. Machine learning
algorithms are used in a wide variety of applications, e.g., email spam filtering and
computer vision, where it is difficult or infeasible to develop conventional algorithms to
perform the needed tasks.
Machine learning is an application of artificial intelligence that provides systems the
ability to automatically learn and improve from experience without being explicitly
programmed. Machine learning focuses on the development of computer programs that
can access data and use it to learn for themselves.
Since our data set has two features, height and weight, the logistic regression hypothesis is the
following:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-(\theta_0 + \theta_1\,\text{height} + \theta_2\,\text{weight})}}$$

With the coefficients at hand, a manual prediction (that is, without using the fitted model
object) amounts to computing the scalar $\theta^T x$ and checking whether it is greater than or
equal to zero (to predict Male), or negative (to predict Female). For a person with Height = 70
inches and Weight = 180 pounds, like at line 14 of the script LogisticRegression.py above, one
can carry out this check directly.
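A minimal sketch of that manual prediction follows; note that the coefficient values below are hypothetical placeholders, not the actual output of LogisticRegression.py:

```python
import numpy as np

# Hypothetical coefficients -- placeholders, not the values learned by
# LogisticRegression.py.
theta = np.array([-0.69, -0.49, 0.20])  # [theta_0, theta_1 (height), theta_2 (weight)]

x = np.array([1.0, 70.0, 180.0])  # [1, Height = 70 in, Weight = 180 lb]

score = theta @ x  # the scalar theta^T x
print("Male" if score >= 0 else "Female")
```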
A visualization of the decision boundary and the complete data set can be seen in the
accompanying figure.
As you can see, above the decision boundary lie most of the blue points that correspond to the
Male class, and below it all the pink points that correspond to the Female class.
Also, from just looking at the data you can tell that the predictions won’t be perfect. This can be
improved by including more features (beyond weight and height), and by potentially using a
more complex (e.g., non-linear) decision boundary.
The scikit-learn library does a great job of abstracting away the computation of the logistic
regression coefficients; a sketch of that workflow follows.
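In this sketch the tiny inline data set is only an illustrative stand-in for the real height/weight data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative stand-in for the height/weight data set.
X = np.array([[70, 180], [64, 120], [72, 200], [62, 110], [69, 170], [60, 105]])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = Male, 0 = Female

model = LogisticRegression()
model.fit(X, y)

print(model.intercept_, model.coef_)  # theta_0 and [theta_1, theta_2]
print(model.predict([[70, 180]]))     # predicted class for 70 in / 180 lb
```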
Let’s start by defining the logistic regression cost function for the two points of interest: y = 1 and
y = 0, that is, when the hypothesis function predicts Male or Female:

$$\mathrm{Cost}(h_\theta(x), 1) = -\log(h_\theta(x)), \qquad \mathrm{Cost}(h_\theta(x), 0) = -\log(1 - h_\theta(x))$$
Then, we take a convex combination in y of these two terms to come up with the logistic
regression cost function:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\bigl(1-h_\theta(x^{(i)})\bigr)\right]$$
The logistic regression cost function is convex. Thus, in order to compute θ, one needs to solve
the unconstrained optimization problem $\min_\theta J(\theta)$.
There is a variety of methods that can be used to solve this unconstrained optimization problem,
such as the 1st-order method gradient descent, which requires the gradient of the logistic
regression cost function, or a 2nd-order method such as Newton’s method, which requires both
the gradient and the Hessian. In its most basic form, gradient descent iterates θ along the
negative gradient direction of the cost function until convergence:

$$\theta := \theta - \alpha\,\nabla J(\theta)$$
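A sketch of this update rule in code, under the convention that each row of X starts with the constant 1 for the intercept; the step size, iteration count, and tiny data set are arbitrary illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Minimize J(theta) by repeatedly stepping along the negative gradient."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m  # gradient of J at theta
        theta -= alpha * grad                      # theta := theta - alpha * grad(J)
    return theta

# Tiny illustrative data set: first column is the constant 1 for theta_0.
X = np.array([[1.0, 0.5], [1.0, -0.5], [1.0, 1.5], [1.0, -1.5]])
y = np.array([1.0, 0.0, 1.0, 0.0])
print(gradient_descent(X, y))
```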
Working of a Neural Network:
1. Information is fed into the input layer which transfers it to the hidden layer
2. The interconnections between the two layers assign weights to each input randomly
3. A bias is added to every input after the weights have been multiplied with them individually
4. The weighted sum is transferred to the activation function
5. The activation function determines which nodes it should fire for feature extraction
6. The model applies an activation function to the output layer to deliver the output
7. Weights are adjusted, and the output is back-propagated to minimize error
The model uses a cost function to reduce the error rate, and the weights have to be adjusted
over successive training passes:
1. The model compares the output with the expected result
2. It repeats the process to improve accuracy
The model adjusts the weights in every iteration to enhance the accuracy of the output, as
sketched below.
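A minimal numeric sketch of steps 1–7 and this adjustment loop, for a single hidden layer; the layer sizes, the sigmoid activation, the squared-error cost, and the learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Steps 1-3: inputs enter, interconnections get random weights, biases are added.
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)  # input layer (2) -> hidden layer (4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # hidden layer (4) -> output layer (1)

X = np.array([[0.5, -1.2]])  # a single illustrative training example
y = np.array([[1.0]])        # its expected output

for _ in range(1000):
    # Steps 4-6: weighted sums go through the activation function to the output.
    h = sigmoid(X @ W1.T + b1)
    out = sigmoid(h @ W2.T + b2)

    # Step 7: back-propagate the squared error and adjust the weights.
    d_out = (out - y) * out * (1 - out)  # gradient at the output layer
    d_h = (d_out @ W2) * h * (1 - h)     # gradient at the hidden layer
    W2 -= 0.5 * d_out.T @ h              # 0.5 is an arbitrary learning rate
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * d_h.T @ X
    b1 -= 0.5 * d_h.sum(axis=0)

print(out)  # approaches the expected output 1.0
```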
2.3 SVM: A support vector machine (SVM) is a computer algorithm that learns by example to
assign labels to objects. For instance, an SVM can learn to recognize fraudulent credit card
activity by examining hundreds or thousands of fraudulent and non-fraudulent credit card
activity reports. Alternatively, an SVM can learn to recognize handwritten digits by examining a
large collection of scanned images of handwritten zeroes, ones, and so forth. SVMs have also
been successfully applied to an increasingly wide variety of biological applications.
Step-by-step explanation:
Import the dataset.
Explore the data to figure out what they look like.
Pre-process the data.
Split the data into attributes and labels.
Divide the data into training and testing sets.
Train the SVM algorithm.
Make some predictions.
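A hedged sketch of these steps using scikit-learn; the Iris data set here is only a stand-in for whatever data set is actually used:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, classification_report

# Import / explore the data set (Iris is a placeholder).
X, y = datasets.load_iris(return_X_y=True)

# Pre-process: scale the attributes.
X = StandardScaler().fit_transform(X)

# Split the attributes and labels into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train the SVM algorithm, then make some predictions.
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```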
Example
We can see that the decision boundary line is the function $x_2 = x_1 - 3$. Using the
formula $w^T x + b = 0$ we can obtain a first guess of the parameters as

$$w = [1, -1], \qquad b = -3$$

Recall that scaling the boundary by a factor of $c$ does not change the boundary line, hence we can
generalize the equation as

$$c\,x_1 - c\,x_2 - 3c = 0, \qquad w = [c, -c], \qquad b = -3c$$
Plugging back into the equation for the width we get

$$\frac{2}{\|w\|} = \frac{2}{\sqrt{2}\,c} = 4\sqrt{2} \quad\Rightarrow\quad c = \frac{1}{4}$$
Hence the parameters are in fact

$$w = \left[\tfrac{1}{4},\, -\tfrac{1}{4}\right], \qquad b = -\tfrac{3}{4}$$
To find the values of $\alpha_i$ we can use the following two constraints, which come from the dual
problem:

$$w = \sum_{i}^{m} \alpha_i\, y^{(i)} x^{(i)}, \qquad \sum_{i}^{m} \alpha_i\, y^{(i)} = 0$$
And using the fact that $\alpha_i \ge 0$ for support vectors only (i.e., 3 vectors in this case), we obtain the
system of simultaneous linear equations:

$$\begin{bmatrix} 6\alpha_1 - 2\alpha_2 - 3\alpha_3 \\ -\alpha_1 - 3\alpha_2 - 4\alpha_3 \\ \alpha_1 - \alpha_2 - \alpha_3 \end{bmatrix} = \begin{bmatrix} 1/4 \\ -1/4 \\ 0 \end{bmatrix} \quad\Rightarrow\quad \alpha = \begin{bmatrix} 1/16 \\ 1/16 \\ 0 \end{bmatrix}$$
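This solution can be sanity-checked numerically; note that the support-vector coordinates below are inferred from the row coefficients above, not given in the original text:

```python
import numpy as np

# Inferred support vectors and labels (an assumption reconstructed from the
# coefficients of the linear system above).
X = np.array([[6, -1], [2, 3], [3, 4]])
y = np.array([1, -1, -1])
alpha = np.array([1/16, 1/16, 0])

w = (alpha * y) @ X       # w = sum_i alpha_i y_i x_i
b = y[0] - w @ X[0]       # from y_1 (w^T x_1 + b) = 1 on the margin
print(w, b)               # expected: [0.25 -0.25] -0.75
print((alpha * y).sum())  # constraint sum_i alpha_i y_i = 0 holds
```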
3. Comparison of Performance
3.1 Different metrics are used for the evaluation of the performance of these algorithms.
We used the confusion matrix in class.
3.2 Conditions in which one algorithm is preferred over the other
In fact, no single classifier is the best in all cases; which classifier is suitable depends
upon the problem.
SVM is preferred:
1) When the number of features (variables) and the number of training instances are very
large (say, millions of features and millions of instances).
2) When sparsity in the problem is very high, i.e., most of the features have zero
values.
3) It is the best choice for document classification problems, where sparsity is high and
the numbers of features and instances are also very large.
4) It also performs very well for problems like image classification, gene
classification, drug disambiguation, etc., where the number of features is high.
Logistic regression:
Neural Networks: