01 Intro
Deep Learning
Course Instructor Information
Required:
Dive Into Deep Learning
By Aston Zhang, Zachary C. Lipton, Mu Li,
Alexander J. Smola · 2023
Link to PDF: [https://d2l.ai/d2l-en.pdf]
• We will cover many topics in this textbook
• We will also include special topics on recent
progress in image processing
• There will also be other reference books.
Requirements for the Final Project
Written report
• Report format: the same as an IEEE conference paper
• Executable code must be submitted with clear comments
except for a survey study
Academic integrity (avoiding plagiarism)
• don’t copy another person’s work
• describe using your own words
• complete citation and acknowledgement whenever you use
any other work (either published or online)
Requirements for the Final Project
Evaluation
• written report (be clear, complete, correct, etc.)
• code (be clear, complete, correct, well documented, etc.)
• oral presentation
• discussion with the instructor
• quality: publication-level project – extra credits
Paper Reading and Presentation
The McCulloch-Pitts Binary Neuron
y = sign(∑_{i=1}^{N} w_i x_i + b)
Perceptron: weights are motorized potentiometers
https://youtu.be/X1G2g3SiCwU
More History
1970s: statistical pattern recognition (Duda & Hart 1973)
1979: Kunihiko Fukushima, Neocognitron
1982: Hopfield Networks
1983: Hinton & Sejnowski, Boltzmann Machines
1985/1986: Practical Backpropagation for neural net training
1989: Convolutional Networks
1991: Bottou & Gallinari, module-based automatic differentiation
1995: Hochreiter & Schmidhuber, LSTM recurrent net.
1996: structured prediction with neural nets, graph transformer nets
…..
2003: Yoshua Bengio, neural language model
2006: Layer-wise unsupervised pre-training of deep networks
2010: Collobert & Weston, self-supervised neural nets in NLP
More History
2012: AlexNet / convnet on GPU / object classification
2015: I. Sutskever, neural machine translation with multilayer LSTM
2015: Weston, Chopra, Bordes: Memory Networks
2016: Bahdanau, Cho, Bengio: GRU, attention mechanism
2016: Kaiming He, ResNet
The Standard Paradigm of Pattern Recognition
[Diagram: traditional pattern recognition: Feature Extractor → Trainable Classifier; Deep Learning: both the feature extractor and the classifier are trainable]
Parameterized deterministic function G(x,w)
• x: input, y: desired output, w: implicit parameter
• Example: nearest neighbor
• Computing the function G may involve complicated algorithms
Block diagram notations for computation graphs
Deterministic function
• block: x → G(x,w) → y
• multiple inputs and outputs (tensors, scalars, …)
• implicit parameter variable (here: w)
Average
• computes the average loss over the set: G(x,w) is applied to each sample x[0], x[1], x[2], x[3] with desired outputs y[0], y[1], y[2], y[3], and the resulting losses are averaged
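As a concrete counterpart to this notation, here is a minimal PyTorch sketch (the linear G, the toy data, and the squared-error loss are illustrative assumptions, not from the slides):

import torch

# G(x, w): a deterministic function with an implicit parameter w
# (a simple linear map, chosen only for illustration)
w = torch.tensor([2.0, -1.0], requires_grad=True)

def G(x, w):
    return x @ w   # more inputs/outputs would just be more tensors

# a small set of samples x[0]..x[3] with desired outputs y[0]..y[3]
x = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = torch.tensor([1.0, -1.0, 1.0, 3.0])

# average loss over the set (squared error per sample)
L = ((G(x, w) - y) ** 2).mean()
L.backward()        # backpropagates through the computation graph
print(L.item(), w.grad)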
Supervised Machine Learning = Function Optimization
[Diagram: a function with adjustable parameters feeds an objective function that measures the error between its output and the desired output (e.g. "traffic light: -1")]
It's like walking in the mountains in a fog and following the direction of steepest
descent to reach the village in the valley.
But each sample gives us a noisy estimate of the direction, so our path is a bit random:
W_i ← W_i − η ∂L(W, X) / ∂W_i
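A minimal sketch of this noisy, one-sample-at-a-time update W_i ← W_i − η ∂L(W,X)/∂W_i (the linear model, the toy data, and the learning rate are illustrative assumptions):

import torch

eta = 0.1                                  # learning rate (assumed)
W = torch.zeros(2, requires_grad=True)     # adjustable parameters

X = torch.tensor([[1.0, 2.0], [2.0, 1.0], [1.0, -1.0]])
Y = torch.tensor([1.0, -1.0, 1.0])         # desired outputs

for x, y in zip(X, Y):                     # one (noisy) sample at a time
    loss = (x @ W - y) ** 2                # L(W, X) for this sample
    loss.backward()                        # gradient dL/dW
    with torch.no_grad():
        W -= eta * W.grad                  # W_i <- W_i - eta * dL/dW_i
    W.grad.zero_()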
Gradient Descent
Full (batch) gradient
[Diagram: gradient descent trajectories in weight space w; each step moves along -g, the negative of the gradient g]
Traditional Neural Net
[Diagram: layers of units connected by weights w; each unit computes a weighted sum s[j] of the previous layer's outputs, followed by a non-linearity producing z[j]]
Backprop through a non-linear function
Chain rule: (g(h(s)))' = g'(h(s)) · h'(s)
dc/ds = dc/dz · dz/ds = dc/dz · h'(s)
Perturbations:
• Perturbing s by ds perturbs z by dz = ds · h'(s)
• This perturbs c by dc = dz · dc/dz = ds · h'(s) · dc/dz
• Hence: dc/ds = dc/dz · h'(s)
[Diagram: the forward network computes z = h(s) and feeds it into the cost c; the derivative network runs alongside it, multiplying dc/dz by h'(s) to obtain dc/ds, and passing dc/dx and dc/dy further down]
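A small sketch checking this rule against autograd (the tanh non-linearity and the squared cost are assumptions chosen for illustration):

import torch

s = torch.tensor(0.7, requires_grad=True)
z = torch.tanh(s)          # z = h(s), a non-linear function
c = (z - 1.0) ** 2         # c = cost(z)

c.backward()               # autograd's dc/ds
dc_dz = 2 * (torch.tanh(s.detach()) - 1.0)    # hand-computed dc/dz
h_prime = 1 - torch.tanh(s.detach()) ** 2     # h'(s) for tanh
print(s.grad, dc_dz * h_prime)                # both equal dc/ds = dc/dz * h'(s)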
Backprop through a weighted sum
Perturbations:
• Perturbing z by dz perturbs s[0], s[1], s[2] by ds[0] = w[0]·dz, ds[1] = w[1]·dz, ds[2] = w[2]·dz
• This perturbs c by dc = ds[0]·dc/ds[0] + ds[1]·dc/ds[1] + ds[2]·dc/ds[2]
• Hence: dc/dz = dc/ds[0]·w[0] + dc/ds[1]·w[1] + dc/ds[2]·w[2]
[Diagram: the forward network fans z out through the weights w[0], w[1], w[2] into s[0], s[1], s[2] and on to the cost c; the derivative network combines dc/ds[0], dc/ds[1], dc/ds[2] with the same weights to obtain dc/dz, and passes dc/dx and dc/dy further down]
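A matching sketch for the weighted-sum rule (three branches and a squared cost, chosen for illustration):

import torch

w = torch.tensor([0.5, -1.0, 2.0])
z = torch.tensor(0.3, requires_grad=True)
target = torch.tensor([1.0, 0.0, -1.0])

s = w * z                          # z fans out: s[i] = w[i] * z
c = ((s - target) ** 2).sum()      # c = cost(s)

c.backward()                       # autograd's dc/dz
dc_ds = 2 * (w * z.detach() - target)   # hand-computed dc/ds[i]
print(z.grad, (dc_ds * w).sum())   # both equal dc/dz = sum_i dc/ds[i] * w[i]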
(Deep) Multi-Layer Neural Nets
[Diagram: an input image flows through a stack of weight matrices and hidden layers to the output "This is a car"]
Block Diagram of a Traditional Neural Net
linear blocks
Non-linear blocks
PyTorch definition
• Object-oriented version
• Uses the predefined nn.Linear class (which includes a bias vector)
• Uses the torch.relu function
• State variables are temporary
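The code itself is not reproduced here; a minimal sketch of the kind of object-oriented definition described above (the layer sizes are assumptions) could look like:

import torch
from torch import nn

class TwoLayerNet(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=100, out_dim=10):
        super().__init__()
        # predefined nn.Linear blocks (each includes a bias vector)
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        # state variables s, z are temporary: they exist only during forward
        s = self.fc1(x)            # linear block
        z = torch.relu(s)          # non-linear block
        return self.fc2(z)         # linear block

model = TwoLayerNet()
y = model(torch.randn(32, 784))    # batch of 32 inputs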
Linear Classifiers and their limitations
Partitions the space into two half-spaces separated by the hyperplane:
∑_{i=1}^{N} w_i x_i + b = 0
[Diagram: the hyperplane in the (x1, x2) plane with normal vector W, crossing the x1 axis at -b/w1; a second panel shows a dataset that is not linearly separable]
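A small sketch of the decision rule, using the classic XOR points as a not-linearly-separable example (the points and weights are illustrative, not from the slide):

import torch

def linear_classifier(x, w, b):
    # sign of the signed distance to the hyperplane sum_i w_i x_i + b = 0
    return torch.sign(x @ w + b)

w, b = torch.tensor([1.0, 1.0]), torch.tensor(-0.5)
x = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
print(linear_classifier(x, w, b))   # XOR labels (-1, 1, 1, -1) cannot be produced
                                    # by any choice of w and b: not linearly separable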
Number of linearly separable dichotomies
[Diagram: Feature Extractor → Trainable Classifier; the output of the feature extractor is the representation / features]
Ideas for “generic” feature extraction
Basic principle:
expanding the dimension of the representation so that things are more
likely to become linearly separable.
- space tiling
- random projections
- polynomial classifier (feature cross-products)
- radial basis functions
- kernel machines
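As one concrete instance of this principle, a minimal sketch of a random-projection expansion (the output dimension and the ReLU non-linearity are assumptions):

import torch

def random_expansion(x, out_dim=256, seed=0):
    # expand an N-dim input into a higher-dimensional representation
    # via a fixed random projection followed by a point-wise non-linearity
    g = torch.Generator().manual_seed(seed)
    R = torch.randn(x.shape[-1], out_dim, generator=g)
    return torch.relu(x @ R)

x = torch.randn(10, 16)       # 10 samples, 16 features
phi = random_expansion(x)     # 10 samples, 256 features
print(phi.shape)              # data is more likely to be linearly separable here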
Example: monomial features
• generalizable to degree d
• Unfortunately impractical for large d: the number of degree-d monomial features is about (N choose d), which grows like N^d
• But d=2 is used a lot in “attention” circuits.
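A small sketch of degree-2 monomial features (feature cross-products); the N^d growth with d=2 is visible directly in the output size (the toy input is an assumption):

import torch

def degree2_features(x):
    # all pairwise products x_i * x_j: N inputs -> N*N features (grows like N^d with d=2)
    return torch.outer(x, x).reshape(-1)

x = torch.tensor([1.0, 2.0, 3.0])
phi = degree2_features(x)
print(phi.shape)   # torch.Size([9]) for N = 3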
Shallow networks are universal approximators!
Non-Linear Expansion → Pooling
[Diagram: input → non-linear expansion → high-dimensional features (unstable/non-smooth) → pooling / non-linear aggregation / projection / dimension reduction → stable/invariant features]
The perfect representation of a face image:
• Face / not face
• Its coordinates on the face manifold (pose, lighting, expression, …)
• Its coordinates away from the manifold
e.g. a vector such as [1.2, −3, 0.2, −2, …]
We do not have good and general methods to learn functions that turn an image into this
kind of representation.
Disentangling factors of variation
[Diagram: an ideal feature extractor maps the data manifold in pixel space (Pixel 1, Pixel 2, …, Pixel n) to disentangled factors of variation such as view and expression]
[Hadsell et al. CVPR 2006]
Deep Learning = Learning Hierarchical Representations
[Diagram: traditional approach: Feature Extractor → Trainable Classifier; Deep Learning: a stack of trainable feature transforms feeding a trainable classifier, trained end to end]
Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]
Multilayer Architecture == Hierarchical representation
Hierarchy of representations with increasing level of abstraction
Each stage is a kind of trainable feature transform
Image recognition
Pixel → edge → texton → motif → part → object
Text
Character → word → word group → clause → sentence → story
Speech
Sample → spectral band → sound → … → phone → phoneme → word
Why would deep architectures be more efficient?
[Bengio & LeCun 2007, “Scaling Learning Algorithms Towards AI”]
A deep architecture trades space for time (or breadth for depth)
more layers (more sequential computation),
but less hardware (less parallel computation).
Example 1: N-bit parity
requires N−1 XOR gates in a tree of depth log(N).
Even easier if we use threshold gates.
requires an exponential number of gates if we restrict ourselves to 2 layers (DNF
formula with an exponential number of minterms).
Example 2: circuit for addition of 2 N-bit binary numbers
Requires O(N) gates, and O(N) layers using N one-bit adders with ripple carry
propagation.
Requires lots of gates (some polynomial in N) if we restrict ourselves to two layers (e.g.
Disjunctive Normal Form).
Bad news: almost all boolean functions have a DNF formula with an exponential
number of minterms O(2^N).....
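A small sketch (not from the slides) of Example 1: computing N-bit parity with a balanced tree of XORs uses N−1 gates and depth about log2(N):

def parity(bits):
    # reduce pairwise with XOR: N-1 gates arranged in a tree of depth ~log2(N)
    depth = 0
    while len(bits) > 1:
        bits = [bits[i] ^ bits[i + 1] for i in range(0, len(bits) - 1, 2)] + \
               (bits[-1:] if len(bits) % 2 else [])
        depth += 1
    return bits[0], depth

print(parity([1, 0, 1, 1, 0, 1, 0, 1]))   # (1, 3): parity 1, depth log2(8) = 3
# a 2-layer (DNF) circuit for the same function needs exponentially many minterms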