6.1 DeepFFNets M2

The document provides an overview of Deep Feedforward Networks (DFF), discussing their architecture, training processes, and importance in machine learning. It highlights the distinction between feedforward and recurrent networks, the role of hidden layers, and the concept of function approximation. Additionally, it addresses the challenges of learning non-linear functions and the significance of learning feature representations in deep learning models.

Uploaded by

yashbnv


Deep Learning

Deep Feedforward Networks: Overview

Topics in DFF Networks
1. Overview
2. Example: Learning XOR
3. Hidden Units
4. Architecture Design
5. Backpropagation and Other Differentiation Algorithms
6. Historical Notes

Sub-topics in Overview of DFF
1. Goal of a Feedforward Network
2. Feedforward vs Recurrent Networks
3. Function Approximation as Goal
4. Extending Linear Models (SVM)
5. Example of XOR

Goal of a feedforward network
• Feedforward nets are the quintessential deep learning models
• Deep feedforward networks are also called
  – Feedforward neural networks, or
  – Multilayer perceptrons (MLPs)
• Their goal is to approximate some function $f^*$
  – E.g., a classifier $y = f^*(x)$ maps an input $x$ to a category $y$
  – A feedforward network defines a mapping $y = f(x; \theta)$ and learns the parameter values $\theta$ that give the best function approximation

Feedforward network for MNIST
[Figure: feedforward network applied to MNIST 28x28 images]
Source: https://towardsdatascience.com/probability-and-statistics-explained-in-the-context-of-deep-learning-ed1509b2eb3f

Flow of Information
• Models $y = f(x)$ are called feedforward because:
  – To evaluate $f(x)$, information flows one way from $x$, through the computations defining $f$, to the outputs $y$
• There are no feedback connections
  – No outputs of the model are fed back into itself

Feedforward Net: US Presidential Election
[Figure: map of the states alongside the network $y = f(x)$]
• Output: $y = \{y_1, y_2\}$
  – Electoral college votes for each candidate
• Input: $X = \{x_1, \ldots, x_{50}\}$
  – Vote vectors cast for the 2 candidates
• $W$ converts votes to electoral college votes
• $h$ is the electoral college votes, defined for each state as shown in the map
  – E.g., winner takes all or proportionate
  – Each state has a fixed number of electors
• $w$ maps the 50 states to 2 outputs
  – Simple addition

Importance of Feedforward Networks
• They are extremely important to ML practice
• They form the basis for many commercial applications
  1. CNNs are a special kind of feedforward network
     • They are used for recognizing objects in photos
  2. They are a conceptual stepping stone to RNNs
     • RNNs power many NLP applications

Feedforward vs. Recurrent
• When feedforward neural networks are extended to include feedback connections, they are called Recurrent Neural Networks (RNNs)
[Figure: three panels showing an RNN, an unrolled RNN, and an RNN with learning component]

Feedforward Neural Network Structures
• They are called networks because they are composed of many different functions
• The model is associated with a directed acyclic graph describing how the functions are composed
  – E.g., functions $f^{(1)}, f^{(2)}, f^{(3)}$ connected in a chain to form $f(x) = f^{(3)}(f^{(2)}(f^{(1)}(x)))$
    • $f^{(1)}$ is called the first layer of the network (its output is a vector)
    • $f^{(2)}$ is called the second layer, etc.
• These chain structures are the most commonly used structures of neural networks

Definition of Depth
• The overall length of the chain is the depth of the model
  – Ex: the composite function $f(x) = f^{(3)}(f^{(2)}(f^{(1)}(x)))$ has a depth of 3
• The name deep learning arises from this terminology
• The final layer of a feedforward network, e.g. $f^{(3)}$, is called the output layer

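The chain view of depth can be sketched in a few lines of Python. This is only an illustration: the three layer functions below are hypothetical stand-ins chosen to make the arithmetic easy, not layers from the slides.

```python
from functools import reduce

def compose(*layers):
    """Return f(x) = layers[-1](... layers[1](layers[0](x)) ...)."""
    def f(x):
        # Feed the value through each layer in order, first layer first.
        return reduce(lambda value, layer: layer(value), layers, x)
    return f

# Three hypothetical layers, chosen only for illustration.
f1 = lambda x: x + 1
f2 = lambda x: 2 * x
f3 = lambda x: x - 3

layers = [f1, f2, f3]
f = compose(*layers)
depth = len(layers)  # the length of the chain is the depth of the model

result = f(5)  # f3(f2(f1(5)))
print(result, depth)
```

The last function in the list plays the role of the output layer.
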
Training the Network
• In network training we drive $f(x)$ to match $f^*(x)$
• Training data provides us with noisy, approximate examples of $f^*(x)$ evaluated at different training points
• Each example $x$ is accompanied by a label $y \approx f^*(x)$
• Training examples specify directly what the output layer must do at each point $x$
  – It must produce a value that is close to $y$

Definition of Hidden Layer
• The behavior of the other layers is not directly specified by the data
• The learning algorithm must decide how to use those layers to produce a value that is close to $y$
• Training data does not say what the individual layers should do
• Since the desired output for these layers is not shown, they are called hidden layers

Deep Learning
Srihari

A net with depth 2: one hidden layer
• $K$ outputs $y_1, \ldots, y_K$ for a given input $x$
• Hidden layer consists of $M$ units

$$y_k(x, w) = \sigma\left(\sum_{j=1}^{M} w_{kj}^{(2)}\, h\left(\sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}\right) + w_{k0}^{(2)}\right)$$

$f(x) = f^{(2)}(f^{(1)}(x))$

$f^{(1)}$ is a vector of $M$ dimensions and $f^{(2)}$ is a vector of $K$ dimensions:
$f_m^{(1)} = z_m = h(x^T w_m^{(1)})$, $m = 1, \ldots, M$
$f_k^{(2)} = \sigma(z^T w_k^{(2)})$, $k = 1, \ldots, K$

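A minimal NumPy sketch of this depth-2 forward pass follows. Several things here are assumptions rather than part of the slide: the hidden activation $h$ is taken to be tanh, the weights are random and only serve to check shapes, and the sizes D, M, K are arbitrary.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, b1, W2, b2, h=np.tanh):
    """Depth-2 network: z = h(W1 x + b1), y = sigmoid(W2 z + b2)."""
    z = h(W1 @ x + b1)        # hidden layer f^(1): M values
    y = sigmoid(W2 @ z + b2)  # output layer f^(2): K values
    return z, y

D, M, K = 4, 3, 2               # input, hidden, output sizes (arbitrary)
rng = np.random.default_rng(0)  # random weights, for shape checking only
W1, b1 = rng.normal(size=(M, D)), np.zeros(M)
W2, b2 = rng.normal(size=(K, M)), np.zeros(K)

z, y = forward(rng.normal(size=D), W1, b1, W2, b2)
print(z.shape, y.shape)
```

Row $j$ of `W1` holds the weights $w_{ji}^{(1)}$ and `b1[j]` plays the role of $w_{j0}^{(1)}$; `W2` and `b2` do the same for the output layer.
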
Feedforward net with depth 2
• Recognition of printed characters (OCR)
  $f(x) = f^{(2)}(f^{(1)}(x))$
  – The hidden layer $f^{(1)}$ compares raw pixel inputs to component patterns

Width of Model
• Each hidden layer is typically vector-valued
• The dimensionality of the hidden layer vector is the width of the model

Units of a model
• Each element of the vector is viewed as a neuron
  – Instead of thinking of the layer as a vector-to-vector function, we regard it as many units acting in parallel
• Each unit receives inputs from many other units and computes its own activation value

Depth versus Width
• Going deeper makes the network more expressive
  – It can capture variations of the data better
  – It yields expressiveness more efficiently than width
• The tradeoff for more expressiveness is an increased tendency to overfit
  – You will need more data or additional regularization
• The network should be as deep as the training data allows
  – But you can only determine a suitable depth by experiment

Why are they neural networks?
• These networks are loosely inspired by neuroscience
• Each unit resembles a neuron
  – It receives input from many other units
  – It computes its own activation value
• Choice of functions $f^{(i)}(x)$:
  – Loosely guided by neuroscientific observations about biological neurons
• Modern neural networks are guided by many mathematical and engineering disciplines
• The goal is not to perfectly model the brain

Function Approximation is goal
• Think of feedforward networks as function approximation machines
  – Designed to achieve statistical generalization
  – Rather than as models of brain function
• They occasionally draw insights from what we know about the brain

Understanding Feedforward Nets
• Begin with linear models and understand their limitations
• Linear models such as logistic regression and linear regression can be fit reliably and efficiently using either
  – A closed-form solution
  – Convex optimization
• Limitation: model capacity is restricted to linear functions

Extending Linear Models
• To represent non-linear functions of $x$
  – Apply the linear model to a transformed input $\phi(x)$
    • where $\phi$ is non-linear
  – Equivalently, the kernel trick of the SVM obtains the nonlinearity

SVM Kernel trick
• Many ML algorithms can be rewritten as dot products between examples:
  $f(x) = w^T x + b$ can be written as $b + \sum_i \alpha_i x^T x^{(i)}$,
  where $x^{(i)}$ is a training example and $\alpha$ is a vector of coefficients
  – This allows us to replace $x$ with a feature function $\phi(x)$ and the dot product with a function $k(x, x^{(i)}) = \phi(x) \cdot \phi(x^{(i)})$ called a kernel
    • The $\cdot$ operator represents an inner product analogous to $\phi(x)^T \phi(x^{(i)})$
    • For some feature spaces we may not literally use an inner product
      – In continuous spaces, an inner product based on integration
  – Gaussian kernel
    • Consider $k(u, v) = \exp(-\|u - v\|^2 / 2\sigma^2)$

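As a small illustration, the Gaussian kernel above can be written directly in NumPy. The function name `gaussian_kernel` and the default `sigma=1.0` are choices made for this sketch only.

```python
import numpy as np

def gaussian_kernel(u, v, sigma=1.0):
    """k(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma**2))

k_same = gaussian_kernel([1.0, 2.0], [1.0, 2.0])  # identical points
k_far = gaussian_kernel([0.0, 0.0], [3.0, 4.0])   # points at distance 5
print(k_same, k_far)
```

The kernel is largest when the two points coincide and decays toward zero as they move apart, with `sigma` controlling how fast.
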
SVM Prediction
• Use linear regression on the Lagrangian for determining the weights $\alpha_i$
• We can make predictions using
  – $f(x) = b + \sum_i \alpha_i k(x, x^{(i)})$
  – The function is nonlinear with respect to $x$, but the relationship between $\phi(x)$ and $f(x)$ is linear
  – The relationship between $\alpha$ and $f(x)$ is also linear
  – We can think of $\phi$ as providing a set of features
    • describing $x$, or providing a new representation for $x$

Disadvantages of Kernel Methods
• Cost of decision function evaluation: linear in $m$, the number of training examples
  – Because the $i$th example contributes the term $\alpha_i k(x, x^{(i)})$ to the decision function
  – Can mitigate this by learning an $\alpha$ with mostly zeros
    • Classification then requires evaluating the kernel function only for training examples that have a nonzero $\alpha_i$
    • These are known as support vectors
• Cost of training: high with large data sets

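The prediction rule $f(x) = b + \sum_i \alpha_i k(x, x^{(i)})$ and the saving from zero coefficients can be sketched as below. The training points, the coefficients `alpha`, and the bias `b` are made-up numbers for illustration; they are not the result of fitting an SVM.

```python
import numpy as np

def gaussian_kernel(u, v, sigma=1.0):
    diff = u - v
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma**2))

def predict(x, X_train, alpha, b, kernel=gaussian_kernel):
    """f(x) = b + sum_i alpha_i * k(x, x_i), skipping zero coefficients."""
    support = np.flatnonzero(alpha)  # indices with nonzero alpha_i
    return b + sum(alpha[i] * kernel(x, X_train[i]) for i in support)

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
alpha = np.array([1.0, 0.0, -1.0])  # made-up coefficients, not fitted
b = 0.5

value = predict(np.array([0.0, 0.0]), X_train, alpha, b)
n_kernel_evals = len(np.flatnonzero(alpha))
print(value, n_kernel_evals)
```

Only two of the three training points enter the sum here, which is the sense in which a sparse $\alpha$ reduces evaluation cost.
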
Options for choosing mapping ϕ
1. Generic feature function $\phi(x)$
   – Radial basis function
2. Manually engineer $\phi$
   – Feature engineering
3. Principle of Deep Learning: Learn $\phi$

Option 1 to choose the mapping ϕ
• Generic feature function $\phi(x)$
  – Infinite-dimensional $\phi$ that is implicitly used by kernel machines based on RBF
    • RBF: $N(x; x^{(i)}, \sigma^2 I)$ centered at $x^{(i)}$
    • $x^{(i)}$: from k-means clustering
    • $\sigma$ = mean distance between each unit $j$ and its closest neighbor
  – If $\phi(x)$ is of high enough dimension we can have enough capacity to fit the training set
    • Generalization to the test set remains poor
    • Generic feature mappings are based on smoothness
      – They do not include prior information to solve advanced problems

Option 2 to choose the mapping ϕ
• Manually engineer $\phi$
• This was the dominant approach until the arrival of deep learning
• Requires decades of effort
  – e.g., speech recognition, computer vision
• Little transfer between domains

Option 3 to choose the mapping ϕ
• Strategy of deep learning: learn $\phi$
• The model is $y = f(x; \theta, w) = \phi(x; \theta)^T w$
  – $\theta$ is used to learn $\phi$ from a broad class of functions
  – Parameters $w$ map from $\phi(x)$ to the output
  – This defines a feedforward network where $\phi$ defines a hidden layer
• Unlike the other two (basis functions, manual engineering), this approach gives up on the convexity of training
  – But its benefits outweigh the harms

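Extend Linear Methods to Learn ϕ
• $K$ outputs $y_1, \ldots, y_K$ for a given input $x$
• Hidden layer consists of $M$ units

$$y_k(x; \theta, w) = \sum_{j=1}^{M} w_{kj}\, \phi_j\left(\sum_{i=1}^{D} \theta_{ji} x_i + \theta_{j0}\right) + w_{k0}$$

$y_k = f_k(x; \theta, w) = \phi(x; \theta)^T w_k$

[Figure: network diagram with basis functions $\phi_0, \phi_1, \ldots, \phi_M$, first-layer weights up to $\theta_{MD}$, and second-layer weights from $w_{10}$ to $w_{KM}$]

Can be viewed as a generalization of linear models
• Nonlinear function $f_k$ with $M+1$ parameters $w_k = (w_{k0}, \ldots, w_{kM})$, with
• $M$ basis functions $\phi_j$, $j = 1, \ldots, M$, each with $D$ parameters $\theta_j = (\theta_{j1}, \ldots, \theta_{jD})$
• Both $w_k$ and $\theta_j$ are learnt from data
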
Approaches to Learning ϕ
• Parameterize the basis functions as $\phi(x; \theta)$
  – Use optimization to find the $\theta$ that corresponds to a good representation
• The approach can capture the benefit of the first approach (fixed basis functions) by being highly generic
  – By using a broad family for $\phi(x; \theta)$
• It can also capture the benefits of the second approach
  – Human practitioners design families of $\phi(x; \theta)$ that will perform well

Importance of Learning ϕ
• Learning $\phi$ is discussed beyond this first introduction to feedforward networks
  – It is a recurring theme throughout deep learning, applicable to all kinds of models
• Feedforward networks are the application of this principle to learning deterministic mappings from $x$ to $y$ without feedback
• The principle is also applicable to
  – learning stochastic mappings

Plan of Discussion: Feedforward Networks
1. A simple example: learning XOR
2. Design decisions for a feedforward network
   – Many are the same as for designing a linear model
     • Basics of gradient descent
       – Choosing the optimizer, cost function, form of output units
   – Some are unique
     • Concept of hidden layer
       – Makes it necessary to have activation functions
     • Architecture of network
       – How many layers, how they are connected to each other, how many units in each layer
   – Backpropagation and modern generalizations

1. Ex: XOR problem
• XOR: an operation on binary variables $x_1$ and $x_2$
  – When exactly one value equals 1 it returns 1, otherwise it returns 0
  – The target function is $y = f^*(x)$ that we want to learn
    • Our model is $y = f([x_1, x_2]; \theta)$, which we learn, i.e., adapt parameters $\theta$ to make it similar to $f^*$
• Not concerned with statistical generalization
  – Perform correctly on the four training points:
    • $X = \{[0,0]^T, [0,1]^T, [1,0]^T, [1,1]^T\}$
    • $f([0,1]^T; \theta) = f([1,0]^T; \theta) = 1$
  – The challenge is to fit the training set

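For concreteness, the four training points and their XOR targets can be written out as arrays. The variable names `X` and `y` are choices for this sketch.

```python
import numpy as np

# The four training points, one per row: [x1, x2].
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# XOR returns 1 when exactly one of x1, x2 equals 1.
y = np.logical_xor(X[:, 0], X[:, 1]).astype(int)

for point, target in zip(X, y):
    print(point, target)
```
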
ML for XOR: linear model doesn't fit
• Treat it as regression with an MSE loss function

$$J(\theta) = \frac{1}{4}\sum_{x \in X}\left(f^*(x) - f(x;\theta)\right)^2 = \frac{1}{4}\sum_{n=1}^{4}\left(f^*(x_n) - f(x_n;\theta)\right)^2$$

  – MSE is usually not used for binary data
  – But the math is simple
  – The alternative is the cross-entropy $J(\theta)$:

$$J(\theta) = -\ln p(t \mid \theta) = -\sum_{n=1}^{N}\left\{t_n \ln y_n + (1 - t_n)\ln(1 - y_n)\right\}, \qquad y_n = \sigma(\theta^T x_n)$$

• We must choose the form of the model
• Consider a linear model with $\theta = \{w, b\}$, where $f(x; w, b) = x^T w + b$
  – Minimize $J(\theta) = \frac{1}{4}\sum_{n=1}^{4}\left(t_n - x_n^T w - b\right)^2$ to get a closed-form solution
• Differentiate with respect to $w$ and $b$ to obtain $w = 0$ and $b = \frac{1}{2}$
  – Then the linear model $f(x; w, b) = \frac{1}{2}$ simply outputs 0.5 everywhere
  – Why does this happen?

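The closed-form result can be checked numerically. The sketch below fits the linear model to the four XOR points by least squares; appending a column of ones to carry the bias $b$ is a standard device chosen for this sketch, not something stated on the slide.

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([0.0, 1.0, 1.0, 0.0])  # XOR targets

# Append a column of ones so the last coefficient acts as the bias b.
A = np.hstack([X, np.ones((4, 1))])

# Least-squares solution of A @ [w1, w2, b] ~= t, i.e. the MSE minimizer.
params, *_ = np.linalg.lstsq(A, t, rcond=None)
w, b = params[:2], params[2]
predictions = A @ params

print(w, b, predictions)
```

The fitted weights vanish and the bias absorbs the mean of the targets, so every point receives the same prediction.
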
Linear model cannot solve XOR
[Figure: the four XOR points in the $(x_1, x_2)$ plane; bold numbers are the values the system must output]
• When $x_1 = 0$, the output has to increase with $x_2$
• When $x_1 = 1$, the output has to decrease with $x_2$
• The linear model $f(x; w, b) = x_1 w_1 + x_2 w_2 + b$ has to assign a single weight to $x_2$, so it cannot solve this problem
• A better solution:
  – Use a model to learn a different representation
    • in which a linear model is able to represent the solution
  – We use a simple feedforward network

  – Intercept parameters $b$ are omitted from the network diagram

Functions computed by Network
• Layer 1 (hidden layer): vector of hidden units $h$ computed by the function $f^{(1)}(x; W, c)$
  – $c$ are bias variables
• Layer 2 (output layer) computes $f^{(2)}(h; w, b)$
  – $w$ are linear regression weights
  – The output is linear regression applied to $h$ rather than to $x$
• The complete model is $f(x; W, c, w, b) = f^{(2)}(f^{(1)}(x))$

Linear vs Nonlinear functions
• If we choose both $f^{(1)}$ and $f^{(2)}$ to be linear, the total function will still be linear: $f(x) = x^T w'$
  – Suppose $f^{(1)}(x) = W^T x$ and $f^{(2)}(h) = h^T w$
  – Then we could represent this function as $f(x) = x^T w'$ where $w' = Ww$
• Since a linear function is insufficient, we must use a nonlinear function to describe the features
  – We use the strategy of neural networks
  – by using a nonlinear activation function $g$: $h = g(W^T x + c)$

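The collapse of two linear layers into one can be verified numerically. This is a sketch with randomly drawn weights; the shapes (2 inputs, 3 hidden values) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))  # first layer: maps 2 inputs to 3 hidden values
w = rng.normal(size=3)       # second layer: maps 3 hidden values to a scalar
x = rng.normal(size=2)

# Two linear layers applied in sequence: f2(f1(x)) with
# f1(x) = W^T x and f2(h) = h^T w.
two_layer = (W.T @ x) @ w

# The same function as a single linear model with w' = W w.
w_prime = W @ w
one_layer = x @ w_prime

print(two_layer, one_layer)
```

Both expressions give the same number for any `x`, which is why stacking linear layers adds no expressive power.
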
Activation Function
• In linear regression we used a vector of weights $w$ and a scalar bias $b$, $f(x; w, b) = x^T w + b$,
  – to describe an affine transformation from an input vector to an output scalar
• Now we describe an affine transformation from a vector $x$ to a vector $h$, so an entire vector of bias parameters is needed
• The activation function $g$ is typically chosen to be applied element-wise: $h_i = g(x^T W_{:,i} + c_i)$

Default Activation Function
• Activation: $g(z) = \max\{0, z\}$
  – Applying this to the output of a linear transformation yields a nonlinear transformation
  – However, the function remains close to linear
    • Piecewise linear with two pieces
    • It therefore preserves properties that make linear models easy to optimize with gradient-based methods
    • It preserves many properties that make linear models generalize well

A principle of CS: build complicated systems from minimal components.
• A Turing machine's memory needs only 0 and 1 states
• We can build a universal function approximator from ReLUs

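A short sketch of the rectified linear activation and of a hidden layer $h = g(W^T x + c)$ built from it. The helper names `relu` and `hidden`, and the small example values for `W`, `c`, and `x`, are assumptions made for illustration.

```python
import numpy as np

def relu(z):
    """Element-wise g(z) = max{0, z}."""
    return np.maximum(0.0, z)

def hidden(x, W, c):
    """Hidden layer h = g(W^T x + c), with g applied element-wise."""
    return relu(W.T @ x + c)

# The two linear pieces: zero for negative inputs, identity otherwise.
z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
out = relu(z)

# Example values chosen only to exercise both pieces.
W = np.eye(2)
c = np.array([0.0, -1.0])
h_example = hidden(np.array([1.0, 0.5]), W, c)

print(out, h_example)
```
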
Specifying the Network using ReLU
• Activation: $g(z) = \max\{0, z\}$
• We can now specify the complete network as
  $f(x; W, c, w, b) = f^{(2)}(f^{(1)}(x)) = w^T \max\{0, W^T x + c\} + b$

We can now specify XOR Solution
• Let

$$W = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad c = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \quad w = \begin{bmatrix} 1 \\ -2 \end{bmatrix}, \quad b = 0$$

  in $f(x; W, c, w, b) = w^T \max\{0, W^T x + c\} + b$
• Now walk through how the model processes a batch of inputs
• Design matrix $X$ of all four points:

$$X = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}$$

• First step is $XW$:

$$XW = \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 1 & 1 \\ 2 & 2 \end{bmatrix}$$

• Adding $c$:

$$XW + c = \begin{bmatrix} 0 & -1 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix}$$

  – In this space all points lie along a line with slope 1
  – The solution cannot be implemented by a linear model
• Compute $h$ using ReLU:

$$\max\{0, XW + c\} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix}$$

  – This has changed the relationship among the examples; they no longer lie on a single line
  – A linear model suffices
• Finish by multiplying by $w$:

$$f(x) = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$

• The network has obtained the correct answer for all 4 examples

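The walk-through above can be reproduced step by step in NumPy, using the values of $W$, $c$, $w$, and $b$ from the specified solution. The intermediate variable names are choices for this sketch.

```python
import numpy as np

# Parameters of the hand-specified XOR solution.
W = np.array([[1, 1], [1, 1]])
c = np.array([0, -1])
w = np.array([1, -2])
b = 0

# Design matrix: one training point per row.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

XW = X @ W                # first step: linear map into hidden space
pre = XW + c              # add the bias vector c to every row
H = np.maximum(0, pre)    # apply ReLU element-wise
out = H @ w + b           # output layer: linear regression on h

print(XW, pre, H, out, sep="\n")
```

Working on the whole design matrix at once is why the row form $XW$ appears here instead of the per-example form $W^T x$.
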
Learned representation for XOR
• The two points that must have output 1 have been collapsed into one
  – Points $x = [0, 1]^T$ and $x = [1, 0]^T$ have both been mapped into $h = [1, 0]^T$
• The solution can now be described by a linear model
  – For fixed $h_2$, the output increases in $h_1$
[Figure: two panels. Original $x$ space: when $x_1 = 0$ the output has to increase with $x_2$; when $x_1 = 1$ the output has to decrease with $x_2$. Learned $h$ space: when $h_1 = 0$ the output is constant 0 with $h_2$; when $h_1 = 1$ the output is constant 1 with $h_2$; when $h_1 = 2$ the output is constant 0 with $h_2$]

About the XOR example
• We simply specified the solution
  – Then showed that it achieves zero error
• In real situations there might be billions of parameters and billions of training examples
  – So one cannot simply guess the solution
• Instead, gradient descent optimization can find parameters that produce very little error
  – The solution described is at the global minimum
    • Gradient descent could converge to this solution

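As a rough sketch of how gradient descent can refine the parameters, the code below runs plain batch gradient descent on the MSE of the two-unit ReLU network. Everything beyond the network form is an assumption made for illustration: the starting point is the specified solution shifted by 0.1, and the learning rate and step count are arbitrary. A random start may behave differently, for example when ReLU units stop responding.

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([0.0, 1.0, 1.0, 0.0])

def loss_and_grads(W, c, w, b):
    A = X @ W + c                       # pre-activations, shape (4, 2)
    H = np.maximum(0.0, A)              # ReLU hidden units
    err = H @ w + b - t                 # residuals, shape (4,)
    loss = np.mean(err ** 2)
    d_out = 2.0 * err / len(t)          # dJ/d(output)
    dA = np.outer(d_out, w) * (A > 0)   # backpropagate through ReLU
    return loss, (X.T @ dA, dA.sum(axis=0), H.T @ d_out, d_out.sum())

# Assumed starting point: the specified solution shifted by 0.1.
W = np.array([[1.0, 1.0], [1.0, 1.0]]) + 0.1
c = np.array([0.0, -1.0]) + 0.1
w = np.array([1.0, -2.0]) + 0.1
b = 0.1

lr = 0.02  # learning rate (arbitrary choice)
initial_loss, _ = loss_and_grads(W, c, w, b)
for _ in range(5000):
    _, (dW, dc, dw, db) = loss_and_grads(W, c, w, b)
    W, c, w, b = W - lr * dW, c - lr * dc, w - lr * dw, b - lr * db
final_loss, _ = loss_and_grads(W, c, w, b)

print(initial_loss, final_loss)
```
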
Learning XOR
• XOR cannot be computed by a single perceptron
