Chapter 4
Neural Network
By: Gizachew M.
Neural Network
Input Layer
• Purpose: This is the entry point for data into the network. Each node (or "neuron") in this layer represents one feature of the input data.
• Example: If you are building a model to identify handwritten digits (as in the MNIST dataset), the input layer might have 784 nodes, one for each pixel in a 28x28 image.
• The value passed to each node would be the grayscale intensity of that specific pixel.
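As a small illustration (a sketch using NumPy, with a random array standing in for a real MNIST image), flattening a 28x28 image yields the 784 input values:

```python
import numpy as np

# Stand-in for a 28x28 grayscale image (real MNIST pixels range 0-255).
image = np.random.rand(28, 28)

# Flatten into a 784-element vector: one value per input-layer node.
x = image.reshape(-1)
print(x.shape)  # (784,)
```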
Output Layer
• Purpose: This layer produces the final result or prediction of the network.
Hidden Layer
• Purpose: This is the computational core of the network.
• The hidden layer is where the "magic" happens: it automatically discovers and learns the relevant patterns and features from the input data.
The Power of Brain vs. Machine
• The brain is not a pre-programmed computer.
• It is a dynamic, self-organizing network.
• The Brain
– Pattern Recognition
– Association
– Complexity
– Noise Tolerance
• Machine: Refers to systems that operate based on explicit, pre-defined instructions (algorithms and programs).
• Its intelligence is not emergent or self-taught in the way a brain's is; it is bestowed by a programmer.
• The Machine
• Calculation, Precision, Logic
Features of the Brain
• Ten billion (10^10) neurons
• Face recognition in ~0.1 seconds
• On average, each neuron has several thousand connections
• Hundreds of operations per second
• High degree of parallel computation
• Distributed representations
• Neurons die off frequently (and are never replaced)
Neural Network classifier
⚫ A neural network is represented as a layered set of interconnected processors.
⚫ These processor nodes and their connections resemble the neurons of the brain.
⚫ Each node has weighted connections to several other nodes in adjacent layers.
⚫ Individual nodes take the inputs received from connected nodes and use the weights to compute output values.
⚫ The inputs are fed simultaneously into the input layer.
⚫ The weighted outputs of these units are fed into the hidden layer.
⚫ The weighted outputs of the last hidden layer are the inputs to the units making up the output layer.
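The flow described above can be sketched in NumPy (an illustrative toy network with made-up sizes and random weights; the sigmoid is one common activation choice, not the only one):

```python
import numpy as np

def forward(x, weights):
    """Propagate input x through a list of weight matrices.

    Each layer computes a weighted sum of its inputs followed by a
    sigmoid activation.
    """
    a = x
    for W in weights:
        z = W @ a                      # weighted sum of connected nodes
        a = 1.0 / (1.0 + np.exp(-z))   # sigmoid activation
    return a

# Toy network: 3 inputs -> 4 hidden units -> 2 outputs, random weights.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
y = forward(np.array([0.5, -1.0, 2.0]), weights)
print(y.shape)  # (2,)
```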
Neural Networks Applications
There are two basic goals for neural network research:
Brain modelling
• Aid our understanding of how the brain works.
• This helps us understand the nature of perception, action, learning, memory, thought and intelligence, and/or formulate medical solutions for brain-damaged patients.
Artificial system construction / real-world applications
• Financial modelling – predicting the stock market
• Time series prediction – climate, weather, seizures
• Computer games – intelligent agents, chess, backgammon
• Robotics – autonomous adaptable robots
• Pattern recognition – speech recognition, seismic activity, sonar signals
• Data analysis – data compression, data mining
• Bioinformatics – DNA sequencing, alignment
Architecture of Neural network
⚫ Neural networks are used to look for patterns in data, learn these patterns, &
then classify new patterns & make forecasts
⚫ A network with only an input and an output layer is called a single-layer neural network, whereas a multilayer neural network additionally has one or more hidden layers.
⚫ A network containing two hidden layers is called a three-layer neural network, and so on.
A Multilayer Neural Network
⚫ Input layer: corresponds to the input attributes, with normalized attribute values.
– There are as many nodes as attributes, X = {x1, x2, …, xm}, where m is the number of attributes.
• Hidden Layer
– Neither its input nor its output can be observed from outside.
– The number of nodes in the hidden layer & the number of hidden layers depends on
implementation.
– Different numbers of hidden layers and nodes usually produce different results.
• Output Layer – corresponds to the class attribute.
There are as many nodes as classes (values of the class attribute).
Multi-layer Perceptron (MLP)
• One of the most popular neural network models is the multi-layer perceptron (MLP).
• In an MLP, neurons are arranged in layers. There is one input layer, one output layer,
and several (or many) hidden layers.
Hidden layer: Neuron with Activation
⚫ The neuron is the basic information processing unit of a NN.
⚫ It consists of:
1. A set of links, describing the neuron inputs, with weights W1,W2, …,Wm
2. An adder function (linear combiner) for computing the weighted sum of the inputs (real numbers). This function collects all the weighted inputs and sums them into a single value:
y = ∑ wj xj   (sum over j = 1 … m)
3. Activation function (also called squashing function): for limiting the output behavior of the neuron.
This is the final, crucial step that determines the neuron's final output.
• Input Value (x): The data coming from the previous neuron or input feature.
• Weight (w): A number that represents the strength or importance of that specific connection. A high weight means the input is very influential; a low weight means it is less important.
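A minimal sketch of a single neuron in NumPy, combining the adder function with a step activation (the input and weight values here are made up for illustration):

```python
import numpy as np

def neuron(x, w, b=0.0):
    """One neuron: weighted sum (adder) followed by a step activation."""
    y = np.dot(w, x) + b          # adder: sum of w_j * x_j
    return 1 if y > 0 else 0      # step (squashing) function

x = np.array([1.0, 0.5])   # input values
w = np.array([0.8, -0.2])  # connection weights
print(neuron(x, w))  # 1, since 0.8*1.0 + (-0.2)*0.5 = 0.7 > 0
```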
Two Topologies of neural network
⚫ NN can be designed in a feed forward or recurrent manner
⚫ In a feed forward neural network connections between the units do not form a directed
cycle.
⚫ In this network, the information moves in only one direction, forward, from the input nodes, through
the hidden nodes (if any) & to the output nodes.
⚫ There are no cycles, loops, or feedback connections in the network; that is, no connections extend from the outputs of units to the inputs of units in the same layer or in previous layers.
⚫ In recurrent networks data circulates back & forth until the activation of the units is
stabilized
⚫ Recurrent networks have a feedback loop where data can be fed back into the input at some point
before it is fed forward again for further processing and final output.
Training the neural network
⚫ The purpose is to learn to generalize using a set of sample patterns for which the desired output is known.
⚫ Back Propagation is the most commonly used method for training multilayer feed
forward NN.
⚫ Back propagation learns by iteratively processing a set of training data (samples).
⚫ For each sample, weights are modified to minimize the error between the desired
output and the actual output.
⚫ After propagating an input through the network, the error is calculated and the error is
propagated back through the network while the weights are adjusted in order to make
the error smaller.
Training Algorithm
⚫ The learning algorithm is as follows:
⚫ Initialize the weights and threshold to small random numbers.
⚫ Present a vector x to the neuron inputs and calculate the output using the adder function:
y = ∑ wj xj   (sum over j = 1 … m)
⚫ Apply the activation function (in this case a step function) such that
y = 0 if y ≤ 0
y = 1 if y > 0
⚫ Update the weights according to the error:
Wj = Wj + η (yT − y) xj
where yT is the target output and η is the learning rate.
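A sketch of this learning rule in NumPy (learning the logical AND function, with the threshold folded in as a constant bias input; the learning rate and epoch count are illustrative):

```python
import numpy as np

def train_perceptron(X, T, eta=0.1, epochs=100, seed=0):
    """Perceptron learning rule: adder, step activation, error-driven update.

    X: input vectors (n_samples, n_features); T: target outputs (0 or 1).
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])   # small random weights
    for _ in range(epochs):
        for x, t in zip(X, T):
            y = 1 if np.dot(w, x) > 0 else 0      # adder + step activation
            w = w + eta * (t - y) * x             # update on the error
    return w

# Learn logical AND; the first column is a constant 1 acting as the bias.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
T = np.array([0, 0, 0, 1])
w = train_perceptron(X, T)
preds = [1 if np.dot(w, x) > 0 else 0 for x in X]
print(preds)  # [0, 0, 0, 1]
```

AND is linearly separable, so the perceptron convergence theorem guarantees this training loop finds a separating weight vector.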
Training Multi-layer NN
(Figure: the network's layers are trained in sequence, one layer at a time — train this layer first, then the next, and so on until the final layer.)
Calculating the Error
⚫ Evaluate the predicted output: calculate the error as the difference between the predicted output and the target output of sample n, and pass it to a loss function.
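One common loss function is the mean squared error; a minimal sketch in plain Python (the predicted and target values are made up):

```python
def mse(predicted, target):
    """Mean squared error between predicted and target outputs."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

# Example: predictions [0.8, 0.2] against targets [1.0, 0.0].
print(mse([0.8, 0.2], [1.0, 0.0]))  # ≈ 0.04
```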
Calculating the Error: Example
Reducing Error
• The main goal of the training is to reduce the error or the difference between
prediction and actual output.
• By decomposing the prediction into its basic elements, we can see that the weights are the variable elements affecting the prediction value. In other words, in order to change the prediction value, we need to change the weight values.
How to change\update the weights value so that the error is reduced?
The answer is Backpropagation!
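The idea can be sketched for a single linear neuron with a squared-error loss: each weight is nudged against the gradient of the error (all values here are illustrative):

```python
import numpy as np

x = np.array([0.5, 1.0])      # inputs
w = np.array([0.4, -0.2])     # current weights
t = 1.0                       # target output
eta = 0.2                     # learning rate

for _ in range(30):
    y = np.dot(w, x)          # prediction (linear neuron for simplicity)
    grad = 2 * (y - t) * x    # d(error)/d(w) for error = (y - t)^2
    w = w - eta * grad        # move the weights to reduce the error

print(round(float(np.dot(w, x)), 3))  # 1.0 (converged to the target)
```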
Pros and Cons of Neural Network
• Useful for learning complex data like handwriting, speech and image
recognition
Pros
• Can learn more complicated class boundaries
• Fast application
• Can handle a large number of features
Cons
• Slow training time
• Hard to interpret and understand the learned function (weights)
• Hard to implement: trial and error for choosing the number of nodes
o Neural networks need a long time for training.
o Neural networks have a high tolerance to noisy and incomplete data.
o Conclusion: Use neural nets only if decision trees fail.
Deep Learning…
What exactly is deep learning ?
1. ‘Deep Learning’ means using a neural network with several layers of nodes
between input and output
2. The series of layers between input & output do feature identification and
processing in a series of stages, just as our brains seem to.
3. Deep learning is about building a multi-layered, learning machine that mimics the
brain's staged approach to understanding the world, one layer of abstraction at a
time.
Convolutional Neural Networks (CNNs)
• CNNs are a special kind of multi-layer neural network, designed for processing data with a 2D, grid-like input shape, such as images.
• CNNs are typically used for image detection and classification.
• Images are 2D matrices of pixels, on which we run a CNN to either recognize or classify the image.
• Example: identify whether an image is of a human being, a car, or just digits in an address.
Convolutional Neural Network Architecture
• A CNN typically has three layers:
• Convolutional layer,
• Pooling layer, and
• Fully connected layer.
• Convolutional layer: is the core building block of a CNN, and it is where the majority of computation
occurs.
• Purpose: To detect local features (like edges, corners, textures) from the input.
• How it works: It uses small filters (or kernels), which are matrices of weights, that slide (convolve) across
the input image.
• The term convolution refers to the mathematical combination of two functions to produce a third function; it merges two sets of information.
• In the case of a CNN, the convolution is performed on the input data with a filter or kernel to produce a feature map.
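A minimal NumPy sketch of the convolution operation (a hand-written loop for clarity; the image and kernel values are made up, and as in most CNN libraries this is technically cross-correlation):

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide a kernel over the image and sum the element-wise products."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1   # output height
    ow = (image.shape[1] - kw) // stride + 1   # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)  # one feature-map value
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, -1.0], [1.0, -1.0]])   # simple vertical-edge filter
fmap = convolve2d(image, kernel)
print(fmap.shape)  # (3, 3): a 2x2 kernel over a 4x4 image, stride 1
```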
Convolution Operation
The convolutional layer (CONV) uses filters that perform convolution operations as they scan the input with respect to its dimensions. Its hyperparameters include the filter size and the stride. The resulting output is called a feature map (or activation map).
Pooling Layer
The pooling layer is a downsampling mechanism. It is usually appended after convolutional layers to progressively decrease the spatial size of the feature maps.
• Max pooling takes the largest value from the window of the image currently covered by the kernel.
• Average pooling takes the average of all values in the window.
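A minimal NumPy sketch of max pooling with a 2x2 window (the feature-map values are made up):

```python
import numpy as np

def max_pool(fmap, size=2):
    """Max pooling with a size x size window and stride equal to size."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 1.],
                 [0., 1., 5., 6.],
                 [2., 2., 7., 8.]])
pooled = max_pool(fmap)
print(pooled)  # each 2x2 window reduced to its maximum: [[4, 2], [2, 8]]
```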
The whole CNN
(Figure: input image → Convolution → Max Pooling, with this pair repeated many times → Flatten → Fully Connected feedforward network → output classes, e.g. cat, dog, …)
Recurrent Neural Networks (RNN)
• A recurrent neural network (RNN) is an extension of a regular feedforward neural network that can handle variable-length sequential data, such as time-series prediction.
• Example: If you want to predict the next word in a sentence you need to
know which words came before it.
• In sequence problems, the output depends on:
• Current Input
• Previous Output
• Example: Sequence is important for part of speech (POS) tagging
• Traditional neural network cannot capture such relationship.
Typical RNN Architecture
RNN can be seen as an MLP network with the addition of loops to the architecture.
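One recurrent step can be sketched in NumPy (the sizes and random weights are illustrative; tanh is a common choice of activation). The hidden state carries information from previous inputs forward through the loop:

```python
import numpy as np

def rnn_step(x, h_prev, W_xh, W_hh):
    """One recurrent step: new state mixes current input with previous state."""
    return np.tanh(W_xh @ x + W_hh @ h_prev)

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.5, size=(3, 2))  # input -> hidden weights
W_hh = rng.normal(scale=0.5, size=(3, 3))  # hidden -> hidden (the loop)
h = np.zeros(3)                            # initial hidden state

# Process a sequence of three 2-dimensional inputs, one step at a time.
for x in [np.array([1., 0.]), np.array([0., 1.]), np.array([1., 1.])]:
    h = rnn_step(x, h, W_xh, W_hh)
print(h.shape)  # (3,): the final hidden state summarizes the sequence
```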
RNN Example: Guess part of speech (POS)
RNN Example: Sentiment Analysis
Recurrent Neural Networks: Process Sequences
e.g. Image Captioning
image -> sequence of words
Recurrent Neural Networks: Process Sequences
e.g. Sentiment Classification
sequence of words -> sentiment
Recurrent Neural Networks: Process Sequences
e.g. Machine Translation
seq of words -> seq of words
Recurrent Neural Networks: Process Sequences
e.g. Video classification on frame level
RNN Applications
• Natural language processing
• E.g. given a sequence of words, an RNN predicts the probability of the next word given the previous ones.
• Machine translation: Similar to language modeling
• E.g. Google Translate (English to Amharic)
• Speech recognition:
• given input: sequence of acoustic signals, produce output phonetic segments
• Image tagging : RNN + CNN jointly trained.
• CNN generates features (hidden state representation).
• RNN reads CNN features and produces output (end-to-end training).
• Time series prediction: forecasting future values in a time series from past observed values.
• E.g. weather forecasting, financial time series
THANK YOU