Machine Learning using Matlab
Lecture 2: Linear regression
Concerned questions
● Collaboration
○ The maximum group size is 3.
○ Submit a technical report individually; it should describe the overall framework of your project and
your own contribution to it.
○ Your score will combine the quality of the whole project and your individual work.
● Project
○ A good technical report is composed of three parts: a novel idea, good writing, and code (good
code ≠ high score)
○ Project flowchart (next slide)
● Submit your group list and project proposal (up to one page) by the end of this
week.
Flowchart of an ML project
(Diagram: data collection and annotation → split into training data (60%), validation data (20%), and test data (20%) → machine learning model)
Linear regression with one variable
Hypothesis: h_θ(x) = θ_0 + θ_1 x
Parameters: θ_0, θ_1
Cost function: J(θ_0, θ_1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)})²
Goal: minimize J(θ_0, θ_1) over θ_0, θ_1
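A minimal MATLAB sketch of the hypothesis and cost function above; the toy data x, y and the candidate theta are illustrative assumptions, not values from the lecture:

```matlab
% One-variable linear regression: h_theta(x) = theta0 + theta1*x
x = [1; 2; 3; 4];            % m x 1 input feature (toy data, assumed)
y = [2.1; 3.9; 6.2; 8.1];    % m x 1 targets (toy data, assumed)
m = length(y);

theta = [0.5; 1.8];          % candidate parameters [theta0; theta1]

h = theta(1) + theta(2) * x;             % hypothesis on all m examples
J = (1 / (2 * m)) * sum((h - y) .^ 2);   % squared-error cost J(theta0, theta1)
```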
Gradient descent
● Given a function f(θ), our objective is to find the θ that minimizes f(θ)
● Repeat until convergence {
      θ_j := θ_j − α ∂/∂θ_j f(θ)
  }
Concern
● If α is too small, gradient descent can be slow; more iterations are needed
● If α is too large, gradient descent can overshoot the minimum; it may fail to
converge, or even diverge.
● May converge to a local minimum if the cost function is non-convex
● When we approach a local minimum, gradient descent will automatically take
smaller steps. So, no need to decrease α over time.
● “Batch”: all the training examples are used in each step of gradient descent
Gradient descent for linear regression
(one variable)
Repeat until convergence {
    θ_j := θ_j − α ∂/∂θ_j J(θ_0, θ_1)   for j = 0 and j = 1
}
Question: which algorithm is correct?
Repeat until convergence{
}
Repeat until convergence{
}
Correct
Gradient descent for linear regression
(one variable)
Repeat until convergence {
    θ_0 := θ_0 − α (1/m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)})
    θ_1 := θ_1 − α (1/m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)}) x^{(i)}
}   (update θ_0 and θ_1 simultaneously)
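A sketch of this update loop in MATLAB, reusing the same kind of toy data as above; the learning rate and iteration count are arbitrary illustrative choices:

```matlab
% Batch gradient descent for one-variable linear regression (toy data assumed)
x = [1; 2; 3; 4];  y = [2.1; 3.9; 6.2; 8.1];  m = length(y);
theta = zeros(2, 1);   % [theta0; theta1], initialized to zero
alpha = 0.01;          % learning rate (assumed value)

for iter = 1:1500
    h = theta(1) + theta(2) * x;          % predictions with the current theta
    grad0 = (1 / m) * sum(h - y);         % dJ/dtheta0
    grad1 = (1 / m) * sum((h - y) .* x);  % dJ/dtheta1
    theta(1) = theta(1) - alpha * grad0;  % simultaneous update: both gradients
    theta(2) = theta(2) - alpha * grad1;  % were computed with the old theta
end
```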
Linear regression with multiple variables
● n : number of features
● x^{(i)} : input features of the i-th training example
● x_j^{(i)} : value of feature j in the i-th training example
Area of site (1000 square feet) | Size of living place (1000 square feet) | Number of rooms | Age in years | Selling price
3.472 | 0.998 | 7 | 42 | 25.9
3.531 | 1.500 | 7 | 62 | 29.5
2.275 | 1.175 | 6 | 40 | 27.9
4.050 | 1.232 | 6 | 54 | 25.9
... | ... | ... | ... | ...
Hypothesis
● Hypothesis for multiple features: h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n
● If we define x_0 = 1, then we have h_θ(x) = θ^T x
Linear regression with multiple variables
Hypothesis: h_θ(x) = θ^T x = θ_0 x_0 + θ_1 x_1 + … + θ_n x_n, with x_0 = 1
Parameters: θ = [θ_0; θ_1; …; θ_n], an (n+1)-dimensional vector
Cost function: J(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)})²
Gradient descent:
● Repeat until convergence {
      θ_j := θ_j − α ∂/∂θ_j J(θ)   (simultaneously for every j = 0, …, n)
  }
Gradient descent for linear regression
(multiple variables)
Repeat {
    θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)}) x_j^{(i)}   (simultaneously for j = 0, …, n)
}
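A vectorized MATLAB sketch of this update, using the rows of the housing table above as a tiny example; the learning rate and iteration count are assumptions (and in practice the features should first be normalized, as discussed below):

```matlab
% Vectorized batch gradient descent with multiple features
% X has a leading column of ones (x0 = 1); rows come from the housing table
X = [1 3.472 0.998 7 42;
     1 3.531 1.500 7 62;
     1 2.275 1.175 6 40;
     1 4.050 1.232 6 54];
y = [25.9; 29.5; 27.9; 25.9];
m = length(y);
theta = zeros(size(X, 2), 1);
alpha = 0.01;                                % learning rate (assumed)

for iter = 1:400
    grad  = (1 / m) * X' * (X * theta - y);  % all partial derivatives at once
    theta = theta - alpha * grad;            % simultaneous update of every theta_j
end
```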
Suggestions on gradient descent
● How to make sure gradient descent is working correctly?
● How to choose learning rate?
Make sure gradient descent is working correctly
● Ideal cost curve: the cost decreases sharply at first, then levels off as the iterations continue
● Declare convergence when the decrease in cost between two iterations is smaller than a
threshold (see the sketch below)
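One possible way to implement this check in MATLAB, reusing X, y, m, and alpha from the sketch above; the threshold value is an assumption:

```matlab
tol = 1e-6;                       % convergence threshold (assumed)
J_history = zeros(1000, 1);       % cost recorded at every iteration
theta = zeros(size(X, 2), 1);
for iter = 1:1000
    h = X * theta;
    J_history(iter) = (1 / (2 * m)) * sum((h - y) .^ 2);
    theta = theta - alpha * (1 / m) * X' * (h - y);
    if iter > 1 && (J_history(iter - 1) - J_history(iter)) < tol
        break;                    % cost barely decreased: declare convergence
    end
end
plot(1:iter, J_history(1:iter));  % should drop sharply, then flatten
xlabel('iteration'); ylabel('J(\theta)');
```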
Choosing the learning rate
● If α is too small, many iterations are needed; if α is too large, gradient descent may not converge
● To choose α, try several values spaced roughly by factors of 3–10 (e.g., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1) and compare the cost curves
(Plot: cost vs. number of iterations for an ideal α, an α that is too small, and an α that is too large)
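A short sketch that compares several candidate learning rates on the same X, y data from above; the grid of values is an assumed example:

```matlab
alphas = [0.001 0.003 0.01 0.03 0.1 0.3 1];   % candidate learning rates (assumed grid)
for a = alphas
    theta = zeros(size(X, 2), 1);
    J = zeros(100, 1);
    for iter = 1:100
        h = X * theta;
        J(iter) = (1 / (2 * m)) * sum((h - y) .^ 2);
        theta = theta - a * (1 / m) * X' * (h - y);
    end
    % A cost that blows up (Inf or NaN) signals that this alpha is too large
    fprintf('alpha = %.3f, final cost = %.4f\n', a, J(end));
end
```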
Feature normalization - intuition
● Each feature has a different scale, which can produce elongated, oval-shaped cost contours
● Result: more iterations to converge
● Solution: feature normalization
Feature normalization
● Feature scaling: x_j := (x_j − min_j) / (max_j − min_j), so every feature lies in a similar range
● Standard score: x_j := (x_j − μ_j) / σ_j, where μ_j and σ_j are the mean and standard deviation of feature j
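A MATLAB sketch of both normalization schemes applied to the feature columns of the housing table; the variable names are illustrative:

```matlab
% Normalize each feature column (without the x0 = 1 column)
X_raw = [3.472 0.998 7 42;
         3.531 1.500 7 62;
         2.275 1.175 6 40;
         4.050 1.232 6 54];

% Standard score (z-score): mean 0, standard deviation 1 per feature
mu    = mean(X_raw);
sigma = std(X_raw);
X_norm = (X_raw - mu) ./ sigma;        % implicit expansion (R2016b+); use bsxfun on older Matlab

% Feature scaling to the range [0, 1]
X_scaled = (X_raw - min(X_raw)) ./ (max(X_raw) - min(X_raw));
```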
Normal equation for linear regression
Before adding x_0:

Area of site (1000 square feet) | Size of living place (1000 square feet) | Number of rooms | Age in years | Selling price
3.472 | 0.998 | 7 | 42 | 25.9
3.531 | 1.500 | 7 | 62 | 29.5
2.275 | 1.175 | 6 | 40 | 27.9
4.050 | 1.232 | 6 | 54 | 25.9

After adding the x_0 = 1 column:

x_0 | Area of site (1000 square feet) | Size of living place (1000 square feet) | Number of rooms | Age in years | Selling price
1 | 3.472 | 0.998 | 7 | 42 | 25.9
1 | 3.531 | 1.500 | 7 | 62 | 29.5
1 | 2.275 | 1.175 | 6 | 40 | 27.9
1 | 4.050 | 1.232 | 6 | 54 | 25.9
Normal equation for linear regression
● Instead of gradient descent, we can obtain the optimal parameters in closed form:
  θ = (X^T X)^{−1} X^T y
● Matlab one-line code: pinv(X'*X)*X'*y
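Expanding that one-liner into a minimal, self-contained MATLAB sketch using the table above (the design matrix X includes the leading column of ones):

```matlab
% Closed-form solution via the normal equation (data from the table above)
X = [1 3.472 0.998 7 42;
     1 3.531 1.500 7 62;
     1 2.275 1.175 6 40;
     1 4.050 1.232 6 54];
y = [25.9; 29.5; 27.9; 25.9];

theta = pinv(X' * X) * X' * y;   % no learning rate, no iterations
% pinv returns a minimum-norm solution even when X'*X is non-invertible
```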
Gradient descent vs normal equation
Gradient descent
● Need to choose the learning rate α
● Needs many iterations
● Works well even when the number of features
n is very large
Normal equation
● No learning rate to choose
● No iterations
● Needs to compute (X^T X)^{−1}, which is slow when
n is very large, and X^T X may be non-invertible
Congratulations!
You have learnt your first ML model!
Questions?
Classification: logistic regression
Binary classification
● Examples:
○ Email: spam/not spam?
○ Tumor: malignant/benign?
○ Object: car/not car?
● y ∈ {0, 1}, where 1 is the positive class and 0 is the negative class, e.g., spam (1) vs.
not spam (0).
● Intuitively, the negative class conveys the absence of something
Hypothesis
● If h_θ(x) > 0.5, predict "y = 1"
● If h_θ(x) < 0.5, predict "y = 0"
● For classification we want the hypothesis output to satisfy 0 ≤ h_θ(x) ≤ 1
Logistic regression model
● h_θ(x) = g(θ^T x)
● Sigmoid/logistic function: g(z) = 1 / (1 + e^{−z})
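A minimal MATLAB sketch of the sigmoid and the resulting hypothesis; the parameter and feature values are illustrative assumptions:

```matlab
% Sigmoid / logistic function, applied element-wise
g = @(z) 1 ./ (1 + exp(-z));

% Logistic regression hypothesis h_theta(x) = g(theta' * x)
theta = [-3; 1];       % illustrative parameters (assumed)
x = [1; 4.2];          % x0 = 1 plus one feature value (assumed)
h = g(theta' * x);     % a number in (0, 1), read as P(y = 1 | x; theta)
```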
Interpretation of hypothesis output
● h_θ(x) is the estimated probability that y = 1, given x, "parameterized" by θ: h_θ(x) = P(y = 1 | x; θ)
● Example: given the tumor size, if we have h_θ(x) = 0.7, tell the patient
that there is a 70% chance of the tumor being malignant.
● As we only have two classes, we have: P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ)
Logistic regression
● When h_θ(x) > 0.5, predict "y = 1"
● When h_θ(x) < 0.5, predict "y = 0"
Decision boundary
● h_θ(x) = g(θ^T x) ≥ 0.5 exactly when θ^T x ≥ 0
● Decision boundary: the set of points where θ^T x = 0, separating the region
predicted as y = 1 from the region predicted as y = 0
● Note: different ML models generate different decision boundaries
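For a linear boundary with two features, θ^T x = 0 can be solved for x_2 and plotted; the parameter values below are illustrative assumptions:

```matlab
% Linear decision boundary with two features: theta0 + theta1*x1 + theta2*x2 = 0
theta = [-6; 1; 1];                           % illustrative parameters (assumed)
x1 = linspace(0, 10, 100);
x2 = -(theta(1) + theta(2) * x1) / theta(3);  % solve theta'*x = 0 for x2
plot(x1, x2);                                 % points on one side are predicted y = 1
xlabel('x_1'); ylabel('x_2');
```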
Cost function
● A new representation of the cost function: J(θ) = (1/m) Σ_{i=1}^{m} Cost(h_θ(x^{(i)}), y^{(i)})
● In linear regression, it is given by: Cost(h_θ(x), y) = (1/2)(h_θ(x) − y)²
Cost function
● Logistic regression cost function:
  Cost(h_θ(x), y) = −log(h_θ(x))        if y = 1
  Cost(h_θ(x), y) = −log(1 − h_θ(x))    if y = 0
● Intuition: if h_θ(x) = 0 but y = 1, the learning algorithm is penalized by a
very large cost.
● We can compact the two cases into one equation:
  Cost(h_θ(x), y) = −y log(h_θ(x)) − (1 − y) log(1 − h_θ(x))
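A vectorized MATLAB sketch of this compact cost on a toy data set; the data and parameters are illustrative assumptions:

```matlab
% Logistic regression cost over the whole training set (toy data assumed)
g = @(z) 1 ./ (1 + exp(-z));
X = [1 0.5; 1 2.3; 1 2.9; 1 0.1];   % m x 2 design matrix with x0 = 1
y = [0; 1; 1; 0];                    % labels in {0, 1}
theta = [-2; 1.5];                   % illustrative parameters (assumed)
m = length(y);

h = g(X * theta);
J = (1 / m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));   % compact two-case cost
```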