CSE465: Pattern Recognition and Neural Network
Lecture 2: Perceptron
Sec: 3
Faculty: Silvia Ahmed (SvA)
Spring 2025
Today’s Topic
1. Perceptron:
• What is a Perceptron?
• Perceptron vs Neuron
• Geometric Intuition
• How to train a Perceptron?
What is a Perceptron?
• Fundamental building block of an ANN.
• It is an algorithm used for supervised ML.
• A Perceptron is a simple type of artificial neural network algorithm developed by Frank Rosenblatt in 1957.
• It is the basic unit of a neural network, taking multiple binary inputs and producing a single binary output.
• It computes a weighted sum of its inputs, applies an activation function, and produces an output.
Figure: Perceptron diagram (inputs x1, x2; weights w1, w2; bias b via a constant input of 1; summation Σ; activation A)
Different parts of Perceptron
• Input features: x1, x2
• Weights: w1, w2
• Bias: b (fed by a constant input of 1)
• Summation function that works as a dot product:
z = w1∙x1 + w2∙x2 + b
• Activation function A:
  • Signum function
  • Sigmoid
  • ReLU
  • tanh
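A minimal sketch of this forward pass in Python, assuming a step (signum-style) activation; the input and weight values are illustrative, not from the slide:

import numpy as np

def step(z):
    # Step activation: 1 if z >= 0, else 0
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    z = np.dot(w, x) + b   # weighted sum as a dot product
    return step(z)

x = np.array([78.0, 7.8])   # features x1, x2 (illustrative)
w = np.array([1.0, 2.0])    # weights w1, w2 (illustrative)
b = 3.0                     # bias
print(perceptron(x, w, b))  # 78 + 15.6 + 3 = 96.6 >= 0 -> 1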
Example use of a Perceptron
IQ, x1 | CGPA, x2 | Job Placement
78 | 7.8 | 1
69 | 5.1 | 0
… | … | …

1) Training: the main job is to learn the values of the weights and the bias from the training samples.
E.g., w1 = 1, w2 = 2, b = 3
2) Prediction: for a new sample where IQ = 100 and CGPA = 5.1:
z = 100 × 1 + 5.1 × 2 + 3 = 113.2 ≥ 0
So Job Placement = 1
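The same prediction as a quick runnable check (threshold at 0, as in the slide):

w1, w2, b = 1, 2, 3           # learned parameters from the example
iq, cgpa = 100, 5.1           # new sample
z = w1 * iq + w2 * cgpa + b   # weighted sum
print(z)                      # 113.2
print(1 if z >= 0 else 0)     # Job Placement = 1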
• Question: what if there are more than 2 features?

IQ, x1 | CGPA, x2 | State, x3 | Job Placement
78 | 7.8 | Dhaka | 1
69 | 5.1 | Khulna | 0
… | … | … | …

z = w1∙x1 + w2∙x2 + w3∙x3 + b
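The weighted sum stays a dot product no matter how many features there are. A small sketch (weights are illustrative; a categorical feature such as State would first need a numeric encoding, which is an assumption made here, not something the slide specifies):

import numpy as np

x = np.array([78.0, 7.8, 1.0])   # IQ, CGPA, State (assumed numeric encoding)
w = np.array([1.0, 2.0, 0.5])    # illustrative weights w1, w2, w3
b = 3.0

z = np.dot(w, x) + b             # w1*x1 + w2*x2 + w3*x3 + b
print(1 if z >= 0 else 0)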
Perceptron vs Neuron
• Deep learning is inspired by the nervous system.
Figure: Perceptron vs Neuron [2]
Interpretation
IQ, x1 | CGPA, x2 | Job Placement
78 | 7.8 | 1
69 | 5.1 | 0
… | … | …

Example: w1 = 2, w2 = 4, b = 1
• Weights depict the strength of each input connection.
• Weights mostly reflect feature importance (here w2 > w1, so CGPA influences the output more than IQ).
Geometric Intuition
z = w1∙x1 + w2∙x2 + b
y = f(z) = 1 if z ≥ 0, 0 if z < 0

Substituting w1 => A, w2 => B, b => C and x1 => x, x2 => y turns the decision boundary into the equation of a line:
Ax + By + C = 0
The perceptron outputs 1 in the region where Ax + By + C ≥ 0 and 0 in the region where Ax + By + C < 0 (e.g., a line in the IQ-CGPA plane separating the two Job Placement classes). A code sketch follows the bullets.
• A perceptron is a “line” and its main functionality is to create “regions”: 2D -> line, 3D -> plane, ≥4D -> hyperplane.
• A perceptron is a binary classifier.
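A small sketch of the geometric view: classification is just checking which side of the line a point falls on (the coefficients A = 2, B = 3, C = 5 are the ones the later slides use):

def region(x, y, A=2, B=3, C=5):
    # The sign of Ax + By + C tells which side of the line (x, y) lies on,
    # which is exactly the perceptron's prediction
    return 1 if A * x + B * y + C >= 0 else 0

print(region(4, 5))    # 2*4 + 3*5 + 5 = 28 >= 0 -> region 1
print(region(-2, -2))  # 2*(-2) + 3*(-2) + 5 = -5 < 0 -> region 0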
Logic AND
input 1 | input 2 | output
1 | 1 | 1
1 | 0 | 0
0 | 1 | 0
0 | 0 | 0
Logic OR
input 1 | input 2 | output
1 | 1 | 1
1 | 0 | 1
0 | 1 | 1
0 | 0 | 0
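Both AND and OR are linearly separable, so a single perceptron realizes each of them. A minimal sketch; these particular weights and biases are hand-picked assumptions, not values from the slides:

def step(z):
    return 1 if z >= 0 else 0

def gate(x1, x2, w1, w2, b):
    return step(w1 * x1 + w2 * x2 + b)

for x1, x2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    and_out = gate(x1, x2, 1, 1, -1.5)   # fires only when both inputs are 1
    or_out = gate(x1, x2, 1, 1, -0.5)    # fires when at least one input is 1
    print(x1, x2, and_out, or_out)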
Logic XOR
input 1 | input 2 | output
1 | 1 | 0
1 | 0 | 1
0 | 1 | 1
0 | 0 | 0
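XOR is not linearly separable, so no single perceptron can compute it. A brute-force sketch (the search grid is an arbitrary choice) confirms that no (w1, w2, b) in the grid reproduces the XOR truth table:

import numpy as np

xor_table = [((1, 1), 0), ((1, 0), 1), ((0, 1), 1), ((0, 0), 0)]

found = False
for w1 in np.linspace(-2, 2, 21):
    for w2 in np.linspace(-2, 2, 21):
        for b in np.linspace(-2, 2, 21):
            if all((1 if w1 * x1 + w2 * x2 + b >= 0 else 0) == y
                   for (x1, x2), y in xor_table):
                found = True
print(found)  # False: no line separates XOR's classes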
Limitation
• Works only with linearly separable or “sort-of” linearly separable data
• TensorFlow Playground: [Link]

Dataset type | Noise | Learning rate | Activation
Gaussian | 15-20 | 0.01 | Sigmoid
Exclusive OR | 15-20 | 0.01 | Sigmoid
Perceptron Trick
• Main target is to get the decision boundary in the form:
∑_{i=0}^{n} wi∙xi = 0
(with x0 = 1, so that w0 plays the role of the bias b)
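A small sketch of this augmented form, folding the bias into the weight vector by prepending a constant feature x0 = 1 (reading the i = 0 term as the bias is the usual convention; the numbers are the earlier example's):

import numpy as np

w = np.array([3.0, 1.0, 2.0])    # [w0, w1, w2] with w0 = b = 3
x = np.array([1.0, 100.0, 5.1])  # [x0, x1, x2] with x0 = 1

z = np.dot(w, x)                 # sum_{i=0}^{2} wi*xi = b + w1*x1 + w2*x2
print(z)                         # 113.2, same as the earlier prediction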
Steps - 1
• Initialize: A = 1, B = 1, C = 0
• Randomly select one sample
Steps - 2
• Updated line: A = 2, B = 1.5, C = 0.4
• Randomly select one sample
Steps - 3
• Updated line: A = 4, B = 1.5, C = 0.4
• Randomly select one sample
Line Transformation
• Shown in [Link]/calculator
• Ax + By + C = 0

Main equation: 2x + 3y + 5 = 0
Change | Examples | Effect
Change in C | 2x+3y+10=0, 2x+3y+0=0 | Line shifts parallel to itself
Change in A | 4x+3y+5=0, x+3y+5=0 | Slope changes (line rotates about its y-intercept)
Change in B | 2x+6y+5=0, 2x+y+5=0 | Slope changes (line rotates about its x-intercept)
How much to transform?
The line is 2x + 3y + 5 = 0, i.e., coefficients (2, 3, 5).
• Wrongly “positive” point (4, 5): append a 1 to get (4, 5, 1) and subtract it from the coefficients:
(2, 3, 5) − (4, 5, 1) = (−2, −2, 4)
The minus operation brings the wrongly “positive” point to the correct “negative” zone.
• Wrongly “negative” point (1, 3): append a 1 to get (1, 3, 1) and add it to the coefficients:
(2, 3, 5) + (1, 3, 1) = (3, 6, 6)
The plus operation brings the wrongly “negative” point to the correct “positive” zone.
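The same two updates as a quick runnable check (values straight from the slide):

import numpy as np

coef = np.array([2, 3, 5])       # (A, B, C) of 2x + 3y + 5 = 0

pos_pt = np.array([4, 5, 1])     # wrongly "positive" point with 1 appended
print(coef - pos_pt)             # [-2 -2  4]

neg_pt = np.array([1, 3, 1])     # wrongly "negative" point with 1 appended
print(coef + neg_pt)             # [3 6 6]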
Live Desmos demonstration
Line: 2x + 3y + 5 = 0; points: (5, 2) and (−3, −2)
Learning rate
• The learning rate is a small number that controls how fast or slow a machine learning or deep learning model updates its internal parameters (like weights) during training.
• “It’s like the step size your model takes while learning. Too big, and it may trip and fall. Too small, and it may take forever to learn.”
• New coef = old coef ± learning rate × (point coordinate), sketched in code below.
• Why it's important:
  • If the learning rate is too high → the model may skip over the best solution and never settle.
  • If the learning rate is too low → the model will learn very slowly, taking a long time to improve (or getting stuck).
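A quick sketch of how η scales the perceptron-trick step from the earlier slide (the η values are chosen for illustration):

import numpy as np

coef = np.array([2.0, 3.0, 5.0])    # (A, B, C) of the line
point = np.array([4.0, 5.0, 1.0])   # misclassified "positive" point, 1 appended

for lr in (1.0, 0.1, 0.01):
    # Smaller learning rate -> smaller nudge to the line
    print(lr, coef - lr * point)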
Algorithm
• epoch = 1000, η = 0.01

Perceptron trick (two explicit cases):
for i in range(epoch):
    randomly select a point xi
    if xi ∈ N and ∑_{i=0}^{2} wi∙xi ≥ 0:
        w_new = w_old − η∙xi
    if xi ∈ P and ∑_{i=0}^{2} wi∙xi < 0:
        w_new = w_old + η∙xi

Compact form, covering both cases (runnable sketch below):
for i in range(epoch):
    randomly select a point xi
    w_new = w_old + η(yi − ŷi)∙xi

yi | ŷi | yi − ŷi
1 | 1 | 0
0 | 0 | 0
1 | 0 | 1
0 | 1 | −1
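A runnable version of the compact rule, assuming the augmented representation (x0 = 1 folded into the weight vector) and a step activation; the OR-gate data and the zero initialization are illustrative choices, not from the slides:

import numpy as np

def train_perceptron(X, y, epochs=1000, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    Xa = np.c_[np.ones(len(X)), X]           # prepend x0 = 1 for the bias
    w = np.zeros(Xa.shape[1])
    for _ in range(epochs):
        i = rng.integers(len(Xa))            # randomly select a point
        y_hat = 1 if np.dot(w, Xa[i]) >= 0 else 0
        w = w + lr * (y[i] - y_hat) * Xa[i]  # w_new = w_old + η(y − ŷ)x
    return w

# Illustrative, linearly separable data: the OR gate
X = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
y = np.array([1, 1, 1, 0])
print(train_perceptron(X, y))   # weights of a separating line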
Problem with Perceptron Trick
• Which decision boundary is better? The trick gives no way to quantify the result.
• Convergence is not guaranteed.
Loss Function
• An error function (also called a loss function) measures how far off a
machine learning or deep learning model's predictions are from the actual
target values.
• It gives the model a numeric value that reflects its performance—lower
values mean better predictions.
• The error function guides the learning process by telling the optimizer
how to adjust the model’s parameters (like weights in a neural network)
during training.
• The loss is a function of the model's parameters: f(w1, w2, b)
Perceptron Loss Function
• Number of misclassified points
• (Perpendicular) distance of the misclassified points
• In practice:
  • Take the point and plug it into the line equation.
  • The magnitude |Ax + By + C| is proportional to the perpendicular distance, but the mathematics is much simpler than calculating the actual distance.

Example with the line 2x + 3y + 5 = 0:
Point (4, 5): 2(4) + 3(5) + 5 = 28
Point (−2, −2): |2(−2) + 3(−2) + 5| = |−5| = 5
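A quick sketch of this practical loss: sum |Ax + By + C| over the misclassified points. Treating both example points as misclassified is an assumption made here purely for illustration:

def signed_value(point, A=2, B=3, C=5):
    # Plug the point into the line equation Ax + By + C
    x, y = point
    return A * x + B * y + C

misclassified = [(4, 5), (-2, -2)]   # slide's example points
loss = sum(abs(signed_value(p)) for p in misclassified)
print(loss)  # 28 + 5 = 33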
More Loss Functions
• If the activation function is sigmoid:
  • Loss is binary cross-entropy (used in logistic regression)
  • So when the activation function is sigmoid, the perceptron is basically logistic regression
• Multi-class classification:
  • Activation: softmax
  • Loss: categorical cross-entropy
• Regression:
  • Activation: linear (no activation)
  • Loss: MSE
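A minimal sketch of the three activation/loss pairings; the formulas are the standard ones, and all numeric values here are made-up examples:

import numpy as np

# Sigmoid activation + binary cross-entropy (binary classification)
z = 0.8
p = 1 / (1 + np.exp(-z))                        # sigmoid output in (0, 1)
y = 1
bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Softmax activation + categorical cross-entropy (multi-class)
logits = np.array([2.0, 1.0, 0.1])
probs = np.exp(logits) / np.exp(logits).sum()   # softmax over classes
cce = -np.log(probs[0])                         # true class is class 0

# Linear (no) activation + mean squared error (regression)
y_true, y_pred = 3.0, 2.5
mse = (y_true - y_pred) ** 2

print(bce, cce, mse)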
Reference and further reading
1. Goodfellow, Ian, et al. “Deep Learning”.
2. Pramoditha, Rukshan. “The Concept of Artificial Neurons (Perceptrons) in Neural Networks.” Medium, Towards Data Science, 29 Dec. 2021, [Link]/the-concept-of-artificial-neurons-perceptrons-in-neural-networks-fab22249cbfc. Accessed 21 Jan. 2025.