AI System Semiconductor Design
Lecture 2: Introduction to Machine Learning
Lecturer: Taewook Kang
Acknowledgments
Lecture material adapted from
PyTorchZeroToAll, Prof. Sung Kim, HKUST
Prof. Woowhan Jung, DSLAB, Hanyang Univ.
CM20315 Prof. Simon Prince, Dr. Georgios Exarchakis, and Dr. Andrew Barnes, University of Bath
Dr. Hyungjoo Seo, Apple, CA, USA
Class Schedule - Python Part
▪ What is ML?
▪ Classification vs. Regression
▪ Linear regression
▪ Gradient descent for linear regression
▪ Logistic regression
▪ Shallow Neural Networks
▪ Deep Neural Networks
▪ Stochastic gradient descent (SGD)
▪ MNIST Python implementation
What is ML?
https://docs.google.com/presentation/d/1xC-lg8RnaO4wTwwQWYEJBUtjj2Caxn5QsHM0LEc9YVc/edit#slide=id.g27be7003ef_0_9
What is Human Intelligence?
What is Human Intelligence?
What to eat for lunch?
What is Human Intelligence?
What to eat for lunch?
Information → Infer / Guess
What is Human Intelligence?
What to wear?
Information → Infer
What is Human Intelligence?
What is this picture?
Image information → Prediction: CAT
What is Human Intelligence?
What is this number?
Image information → Prediction: 2
What is Human Intelligence?
What would be the grade if I study 4 hours?
4 ?
hours points
Prediction
information
SKKU Kang Research Group / SEE3007 Spring 2025 10
Machine Learning
What to wear?
Information → Infer
https://styledna.ai/
Machine Learning
What is this picture?
Image information → Prediction: CAT
Machine Learning
Machine needs lots of training
Image information → Prediction: 2
Machine Learning
Machine needs lots of training
Labeled dataset → training → Model
Machine Learning
Predict (test) with trained model
Test dataset (image information) → Trained model → Prediction: 2
Machine Learning
What would be the grade if I study 4 hours?
Information: 4 hours → Prediction: ? points

Hours (x)  Points (y)
1          2    ┐
2          4    ├ Training dataset
3          6    ┘
4          ?    ← Test dataset
Deep Learning?
Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
Supervised Learning
(Source: https://udlbook.github.io/udlbook/)
SUPERVISED LEARNING
Supervised learning
Goal: generalize the input–output relationship

Input $\mathbf{x}$ (features) → Model → Output $\hat{y}$ (prediction) ≈ $y$ (actual label)

Training?
Dataset: $D = \{(\mathbf{x}^{(1)}, y^{(1)}), (\mathbf{x}^{(2)}, y^{(2)}), \ldots, (\mathbf{x}^{(m)}, y^{(m)})\}$
Building a model so that it can predict the labels from the training data.
Each row of data is called an observation or a tuple.
(Source: Prof. Woowhan Jung, DSLAB, Hanyang Univ.)
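A minimal sketch (not from the lecture code) of this dataset structure in Python, with each observation stored as a (features, label) tuple:

# Labeled dataset D: one (features, label) tuple per observation
D = [((1.0,), 2.0),   # (x(1), y(1))
     ((2.0,), 4.0),   # (x(2), y(2))
     ((3.0,), 6.0)]   # (x(3), y(3))

for x, y in D:
    print("features:", x, "label:", y)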
Classification vs Regression
[Plot: predicting life span (years) — a numeric output, y-axis from 60 to 100]

             Classification               Regression
Output type  Categorical value (class)    Numeric value
Classification vs Regression
Q1. Classification? Regression?
[Plot: predicted rating of life span (years), 60–100]

Q2. Classification? Regression?
[Images: Cat, Dog]

             Classification               Regression
Output type  Categorical value (class)    Numeric value
Classification or Regression?
• Univariate regression problem (one output, real value)
• Fully connected network
Classification or Regression?
• Multivariate regression problem (>1 output, real value)
• Graph neural network
Text Classification
• Binary classification problem (two discrete classes)
• Transformer network
Image Classification
• Multiclass classification problem (discrete classes, >2 possible classes)
• Convolutional neural network (CNN)
LINEAR REGRESSION
Linear Regression – Problem Statement
What would be the grade if I study 4 hours?
Information: 4 hours → Prediction: ? points

Hours (x)  Points (y)
1          2    ┐
2          4    ├ Training dataset → supervised learning
3          6    ┘
4          ?    ← Test dataset
Linear Regression – Model Design
What would be the best model for the data? Linear?
Let’s make it simple!
Hours (x)  Points (y)
1          2
2          4
3          6
4          ?

General form: $\hat{y} = x \cdot w + b$
Linear (simplified): $\hat{y} = x \cdot w$
($\hat{y}$: the hat means a predicted value)
Linear Regression – Model Design
* The machine starts with a random guess: w = a random value

[Plot: training data points (hours vs. points) with candidate lines for weights w1, w2, w3]

Goal: find the line that fits y best!
Training Loss (Error)
Hours, x  Points, y  Prediction ŷ (w=3)  Loss (w=3)
1         2          3                   1
2         4          6                   4
3         6          9                   9
                                         mean = 14/3

MSE: Mean Squared Error
Training Loss (Error)
Hours, x  Points, y  Prediction ŷ (w=4)  Loss (w=4)
1         2          4                   4
2         4          8                   16
3         6          12                  36
                                         mean = 56/3

MSE: Mean Squared Error
Training Loss (Error)
Hours, x  Points, y  Prediction ŷ (w=2)  Loss (w=2)
1         2          2                   0
2         4          4                   0
3         6          6                   0
                                         mean = 0/3 = 0

MSE: Mean Squared Error
Training Loss (Error)
MSE: Mean Squared Error

Hours, x  Points, y  Loss (w=0)  Loss (w=1)  Loss (w=2)  Loss (w=3)  Loss (w=4)
1         2          4           1           0           1           4
2         4          16          4           0           4           16
3         6          36          9           0           9           36
MSE                  56/3=18.7   14/3=4.7    0           14/3=4.7    56/3=18.7
Loss Graph
      Loss (w=0)  Loss (w=1)  Loss (w=2)  Loss (w=3)  Loss (w=4)
MSE   56/3=18.7   14/3=4.7    0           14/3=4.7    56/3=18.7

[Plot: MSE vs. w — a parabola with its minimum at w = 2]
Coding Practice: Model & Loss
w = 1.0  # a random guess: a random value

# model for the forward pass
def forward(x):
    return x * w

# Loss function
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)
Compute Loss for w
for w in np.arange(0.0, 4.1, 0.1):
    print("w=", w)
    l_sum = 0
    for x_val, y_val in zip(x_data, y_data):
        y_pred_val = forward(x_val)
        l = loss(x_val, y_val)
        l_sum += l
        print("\t", x_val, y_val, y_pred_val, l)
    print("MSE=", l_sum / 3)
Plot Loss for w
w_list = []
mse_list = []

for w in np.arange(0.0, 4.1, 0.1):
    print("w=", w)
    l_sum = 0
    for x_val, y_val in zip(x_data, y_data):
        y_pred_val = forward(x_val)
        l = loss(x_val, y_val)
        l_sum += l
        print("\t", x_val, y_val, y_pred_val, l)
    print("MSE=", l_sum / 3)
    w_list.append(w)
    mse_list.append(l_sum / 3)

plt.plot(w_list, mse_list)
plt.ylabel('Loss')
plt.xlabel('w')
plt.show()
Practice
▪ Stop the lecture here
▪ Complete the loss plot code
▪ Run the code in your own Python environment
▪ The answer code follows
▪ If you are already comfortable with Python, this should be a very easy task
Complete Code Answer
import numpy as np
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0  # a random guess: random value, 1.0

# our model for the forward pass
def forward(x):
    return x * w

# Loss function
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)

w_list = np.arange(0.0, 4.1, 0.1)
mse_list = []

for w in w_list:
    print("w=", w)
    l_sum = 0
    for x_val, y_val in zip(x_data, y_data):
        y_pred_val = forward(x_val)
        l = loss(x_val, y_val)
        l_sum += l
        print("\t", x_val, y_val, y_pred_val, l)
    print("MSE=", l_sum / len(x_data))
    mse_list.append(l_sum / len(x_data))

plt.plot(w_list, mse_list)
plt.ylabel('Loss')
plt.xlabel('w')
plt.show()
Linear Regression
▪ Modelling the linear relationship between a scalar response (label) and one or more explanatory variables (features)

Examples:
Area of a house → house price
Number of iPhones sold → Apple's sales
Linear Regression
▪ Data: $D = \{(\mathbf{x}^{(1)}, y^{(1)}), (\mathbf{x}^{(2)}, y^{(2)}), \ldots, (\mathbf{x}^{(m)}, y^{(m)})\}$

Input $\mathbf{x}$ → Model → Output $\hat{y} \approx y$ (label)

▪ Model
▪ Input: $\mathbf{x}^{(i)} \in \mathbb{R}^d$
▪ Output: $\hat{y}^{(i)} = \mathbf{w}^\top \mathbf{x}^{(i)} + b$
▪ Parameters: $\mathbf{w} \in \mathbb{R}^d$, $b \in \mathbb{R}$

Training a linear regression model?
Finding the model parameters $\mathbf{w}$ and $b$ which make $\hat{y} \approx y$

Measuring the distance between $\hat{y}$ and $y$:
Squared error: $(\hat{y} - y)^2$
Loss function: $L(\hat{y}^{(i)}, y^{(i)}) = (y^{(i)} - \hat{y}^{(i)})^2$
Cost function: $J(\mathbf{w}, b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) = \frac{1}{m} \sum_{i=1}^{m} (y^{(i)} - \hat{y}^{(i)})^2$
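A quick NumPy sketch of this cost function (illustrative only; the names X, y, w, and b are placeholders, not part of the lecture code):

import numpy as np

def cost(X, y, w, b):
    # J(w, b) = (1/m) * sum_i (y(i) - y_hat(i))^2
    y_hat = X @ w + b                 # y_hat(i) = w^T x(i) + b
    return np.mean((y - y_hat) ** 2)  # mean squared error over m observations

# Toy check with the lecture's data (d = 1):
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
print(cost(X, y, np.array([3.0]), 0.0))  # 14/3 ≈ 4.67, matching the w=3 table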
Training a linear regression model
▪ Given
▪ Training data $D = \{(\mathbf{x}^{(1)}, y^{(1)}), (\mathbf{x}^{(2)}, y^{(2)}), \ldots, (\mathbf{x}^{(m)}, y^{(m)})\}$
▪ Our goal
▪ Find $\mathbf{w}, b$ that minimize $J(\mathbf{w}, b) = \frac{1}{m} \sum_{i=1}^{m} (y^{(i)} - \hat{y}^{(i)})^2$

Q. How?
Applicable methods: gradient descent, linear least squares, … (a least-squares sketch follows below)
We are going to use gradient descent!
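For reference, the linear least-squares solution to this toy problem has a closed form; a minimal sketch using NumPy's polyfit (an illustration, not the lecture's method):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

w, b = np.polyfit(x, y, 1)  # degree-1 fit returns [slope, intercept]
print(w, b)                 # ≈ 2.0 and ≈ 0.0 for this dataset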
GRADIENT DESCENT ALGORITHM
Learning (training)?: Find w that minimizes the loss
      Loss (w=0)  Loss (w=1)  Loss (w=2)  Loss (w=3)  Loss (w=4)
MSE   56/3=18.7   14/3=4.7    0           14/3=4.7    56/3=18.7

$\mathrm{loss} = \mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} (\hat{y}_n - y_n)^2$

$\arg\min_{w} \mathrm{loss}(w)$ (a brute-force sketch follows below)

[Plot: MSE vs. w]
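One way to read this objective: sweep w over a grid, compute the MSE at each value, and pick the minimizer. A minimal sketch reusing w_list and mse_list from the loss-plot code above (assuming that code has already run):

import numpy as np

# Brute-force argmin over the grid of candidate weights
best = int(np.argmin(mse_list))
print("best w =", w_list[best])  # ≈ 2.0, where the MSE curve bottoms out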
Gradient Descent Algorithm
[Plot: loss vs. w — a random initial weight marks the starting point; the bottom of the curve is the global loss minimum]

Gradient (slope) = $\frac{\partial \mathrm{loss}}{\partial w}$
Gradient Descent Algorithm
[Plot: loss vs. w — from the random initial weight (starting point), each update moves $w_{prev} \to w_{new}$, stepping toward the global loss minimum]

Gradient (slope) = $\frac{\partial \mathrm{loss}}{\partial w}$

$w_{new} = w_{prev} - \alpha \frac{\partial \mathrm{loss}}{\partial w}$

$\alpha$ = learning rate (a small value)
Gradient Descent Algorithm
[Plot: loss vs. w — compared to the previous jump, w moves more slowly as the gradient shrinks near the minimum]

Gradient (slope) = $\frac{\partial \mathrm{loss}}{\partial w}$

$w_{new} = w_{prev} - \alpha \frac{\partial \mathrm{loss}}{\partial w}$

$\alpha$ = learning rate (a small value)
Calculate Derivative
$\mathrm{loss} = (\hat{y} - y)^2 = (x \cdot w - y)^2$

$w_{new} = w_{prev} - \alpha \frac{\partial \mathrm{loss}}{\partial w}$

$\frac{\partial \mathrm{loss}}{\partial w} = 2x(x \cdot w - y)$

https://www.derivative-calculator.net/
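Putting the update rule into code: a minimal sketch of the gradient-descent loop for this one-parameter model. The gradient() helper and the learning-rate value 0.01 are illustrative choices, not fixed by the slides:

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0       # random initial weight (starting point)
alpha = 0.01  # learning rate (a small value); 0.01 is an illustrative choice

def gradient(x, y):
    return 2 * x * (x * w - y)  # d(loss)/dw = 2x(x*w - y), as derived above

for epoch in range(100):
    for x_val, y_val in zip(x_data, y_data):
        w = w - alpha * gradient(x_val, y_val)  # w_new = w_prev - alpha * d(loss)/dw

print("w =", w)             # converges toward 2.0
print("4 hours ->", 4 * w)  # prediction for the lecture's question: about 8 points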
Gradient Descent Algorithm
[Plot: loss vs. w — from the random initial weight (starting point), repeated updates move $w_{prev}$ toward the global loss minimum]

Gradient (slope) = $\frac{\partial \mathrm{loss}}{\partial w}$

$w_{new} = w_{prev} - \alpha \frac{\partial \mathrm{loss}}{\partial w} = w_{prev} - \alpha \cdot 2x(x \cdot w - y)$

$\alpha$ = learning rate (a small value)