CSC321: Neural Networks

Lecture 13: Learning without a teacher:
Autoencoders and Principal Components Analysis

Geoffrey Hinton

Three problems with backpropagation

• Where does the supervision come from?
  – Most data is unlabelled.
    • The vestibulo-ocular reflex is an exception.
• How well does the learning time scale?
  – It is impossible to learn features for different parts of an image independently if they all use the same error signal.
    [Figure: a single unit y with incoming weights w1 and w2, all trained with the same error signal.]
• Can neurons implement backpropagation?
  – Not in the obvious way.
    • But getting derivatives from later layers is so important that evolution may have found a way.

Three kinds of learning

• Supervised learning: this models p(y|x)
  – Learn to predict a real-valued output or a class label from an input.
• Reinforcement learning: this just tries to have a good time
  – Choose actions that maximize payoff.
• Unsupervised learning: this models p(x)
  – Build a causal generative model that explains why some data vectors occur and not others, or
  – Learn an energy function that gives low energy to data and high energy to non-data, or
  – Discover interesting features; separate sources that have been mixed together; find temporal invariants, etc.

The Goals of Unsupervised Learning

• The general goal of unsupervised learning is to discover useful structure in large data sets without requiring a desired output or a reinforcement signal.
  – It is not obvious how to turn this general goal into a specific objective function that can be used to drive the learning.
• A more specific goal:
  – Create representations that are better for subsequent supervised or reinforcement learning.

Why unsupervised pre-training makes sense

[Figure: two diagrams of how an image-label pair might be generated from underlying "stuff": on the left, the label is obtained directly from the image; on the right, the stuff generates the image through a high-bandwidth pathway and the label through a low-bandwidth pathway.]

If image-label pairs were generated the first way, it would make sense to try to go straight from images to labels. For example, do the pixels have even parity?

If image-label pairs are generated the second way, it makes sense to first learn to recover the stuff that caused the image by inverting the high-bandwidth pathway.

Another Goal of Unsupervised Learning

• Improve learning speed for high-dimensional inputs:
  – Allow features within a layer to learn independently.
  – Allow multiple layers to be learned greedily.
  – Improve the speed of supervised learning by removing directions with high curvature of the error surface.
• This can be done by using Principal Components Analysis to pre-process the input vectors.

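As a concrete illustration of the last bullet, here is a minimal NumPy sketch (my own, not the lecture's code) of PCA-based pre-processing: the inputs are projected onto their leading principal directions and each direction is rescaled to unit variance, so that no single input direction dominates the curvature of the error surface seen by a later supervised learner. The function name and toy data are assumptions.

```python
# A minimal sketch (assumed details, not the lecture's code) of PCA pre-processing:
# project the inputs onto their top-M principal directions and rescale each
# direction to unit variance.
import numpy as np

def pca_whiten(X, M, eps=1e-8):
    """X: (n_cases, n_dims) data matrix; M: number of principal directions to keep."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # center the data
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1][:M]          # indices of the top-M directions
    W = eigvecs[:, order]                          # (n_dims, M) principal directions
    Z = Xc @ W                                     # projections onto the subspace
    return Z / np.sqrt(eigvals[order] + eps), W, mean   # equalize the variances

# Example: reduce 50-D inputs to 10 decorrelated, unit-variance features.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50)) @ rng.standard_normal((50, 50))
Z, W, mean = pca_whiten(X, M=10)
```
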
Another Goal of Unsupervised Learning

• Build a density model of the data vectors.
  – This assigns a “score” or probability to each possible data vector.
• There are many ways to use a good density model:
  – Classify by seeing which model likes the test data most.
  – Monitor a complex system by noticing improbable states.
  – Extract interpretable factors (causes or constraints).

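To make the first two uses concrete, here is a hedged sketch (my own illustration, not from the lecture): fit a simple Gaussian density model to the data vectors of each class with SciPy, classify a test vector by which class's model likes it most, and flag improbable states when monitoring a system. The helper names and the jitter added to the covariance are assumptions.

```python
# A hedged sketch of two uses of a density model: classification and monitoring.
import numpy as np
from scipy.stats import multivariate_normal

def fit_density(X):
    """Fit a simple Gaussian density model to the rows of X."""
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])   # jitter keeps cov positive definite
    return multivariate_normal(mean=X.mean(axis=0), cov=cov)

def classify(x, models):
    """models: dict mapping class label -> fitted density model.
    Pick the class whose model 'likes' x most (highest log-density)."""
    return max(models, key=lambda label: models[label].logpdf(x))

def is_improbable(x, model, log_threshold):
    """Monitoring: flag states that the density model scores as very unlikely."""
    return model.logpdf(x) < log_threshold
```
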
Using backprop for unsupervised learning

[Figure: a network that maps an input vector through a small central “code” layer of hidden units back to an output vector of the same size.]

• Try to make the output be the same as the input in a network with a central bottleneck.
  – The activities of the hidden units in the bottleneck form an efficient code.
• The bottleneck does not have room for redundant features.
  – Good for extracting independent features (as in the family trees).

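A minimal NumPy sketch of this idea (my own, not the lecture's code): a network with a small tanh bottleneck is trained by backpropagation to reproduce its input, minimizing the squared reconstruction error; the bottleneck activities are the learned code. The layer sizes, learning rate, and toy data are arbitrary assumptions.

```python
# Self-supervised backprop: train a network with a central bottleneck to
# reproduce its own input by minimizing the squared reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
# 500 input vectors in 20 dimensions, with only 3 underlying degrees of freedom.
X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 20))
n_in, n_code = X.shape[1], 3                     # central bottleneck of 3 units

W1 = 0.1 * rng.standard_normal((n_in, n_code))   # input -> code weights
W2 = 0.1 * rng.standard_normal((n_code, n_in))   # code -> output weights
lr = 0.01

for epoch in range(500):
    H = np.tanh(X @ W1)                          # activities of the bottleneck units
    Y = H @ W2                                   # the network's reconstruction
    err = Y - X                                  # derivative of 0.5 * squared error
    dW2 = H.T @ err / len(X)
    dH = (err @ W2.T) * (1.0 - H ** 2)           # backpropagate through tanh
    dW1 = X.T @ dH / len(X)
    W1 -= lr * dW1
    W2 -= lr * dW2

code = np.tanh(X @ W1)                           # the efficient code for each input
```
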
Self-supervised backprop in a linear network

• If the hidden and output layers are linear, the network will learn hidden units that are a linear function of the data and minimize the squared reconstruction error.
  – This is exactly what Principal Components Analysis does.
• The M hidden units will span the same space as the first M principal components found by PCA.
  – Their weight vectors may not be orthogonal.
  – They will tend to have equal variances.

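A hedged way to check the "same span" claim empirically (my own sketch, not part of the lecture): train a purely linear autoencoder by gradient descent on the squared reconstruction error, then compare the subspace spanned by its M hidden units with the span of the first M principal components via the cosines of the principal angles. The hyperparameters are arbitrary and convergence is assumed.

```python
# Empirical check: a trained linear autoencoder's hidden units should span
# (approximately) the same subspace as the top-M principal components.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 10)) @ rng.standard_normal((10, 10))
X -= X.mean(axis=0)
M = 3

W1 = 0.1 * rng.standard_normal((10, M))          # input -> hidden (linear)
W2 = 0.1 * rng.standard_normal((M, 10))          # hidden -> output (linear)
for step in range(5000):
    H = X @ W1
    err = H @ W2 - X                             # reconstruction error
    dW2 = H.T @ err / len(X)
    dW1 = X.T @ (err @ W2.T) / len(X)
    W1 -= 0.001 * dW1
    W2 -= 0.001 * dW2

Q_auto = np.linalg.qr(W1)[0]                     # orthonormal basis for the hidden span
Q_pca = np.linalg.eigh(np.cov(X, rowvar=False))[1][:, -M:]   # top-M principal directions
cosines = np.linalg.svd(Q_auto.T @ Q_pca, compute_uv=False)
print(cosines)   # cosines of the principal angles; near 1 once training has converged
```
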
Principal Components Analysis

• This takes N-dimensional data and finds the M orthogonal directions in which the data have the most variance.
  – These M principal directions form a subspace.
  – We can represent an N-dimensional datapoint by its projections onto the M principal directions.
    • This loses all information about where the datapoint is located in the remaining orthogonal directions.
  – We reconstruct by using the mean value (over all the data) on the N-M directions that are not represented.
    • The reconstruction error is the sum, over all these unrepresented directions, of the squared differences from the mean.

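A small NumPy sketch (mine, not from the lecture) of PCA exactly as described above: represent a datapoint by its projections onto the M principal directions, reconstruct using the data mean along the N-M unrepresented directions, and measure the squared reconstruction error.

```python
# PCA as dimensionality reduction: M projections represent the datapoint,
# the data mean fills in the N-M directions that are not represented.
import numpy as np

def pca_fit(X, M):
    mean = X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
    U = eigvecs[:, np.argsort(eigvals)[::-1][:M]]   # N x M principal directions
    return mean, U

def pca_code(x, mean, U):
    return U.T @ (x - mean)            # the M projections that represent the datapoint

def pca_reconstruct(code, mean, U):
    return mean + U @ code             # mean value is used on the unrepresented directions

# Toy example: 5-D data whose variance is concentrated in two directions.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
mean, U = pca_fit(X, M=2)
x = X[0]
x_hat = pca_reconstruct(pca_code(x, mean, U), mean, U)
error = np.sum((x - x_hat) ** 2)       # squared differences in the unrepresented directions
```
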
A picture of PCA with N=2 and M=1

[Figure: a 2-D cloud of datapoints with its first principal component, the direction of greatest variance, drawn through it.]

The red point is represented by the green point. Our “reconstruction” of the red point has an error equal to the squared distance between the red and green points.

Self-supervised backprop and clustering

[Figure: a 2-D data point data=(x,y) and its reconstruction as the nearest cluster center.]

• If we force the hidden unit whose weight vector is closest to the input vector to have an activity of 1 and the rest to have activities of 0, we get clustering.
  – The weight vector of each hidden unit represents the center of a cluster.
  – Input vectors are reconstructed as the nearest cluster center.

Clustering and backpropagation

• We need to tie the input->hidden weights to be the same as the hidden->output weights.
  – Usually, we cannot backpropagate through binary hidden units, but in this case the derivatives for the input->hidden weights all become zero!
    • If the winner doesn’t change – no derivative.
    • The winner only changes when two hidden units give exactly the same error – no derivative.
• So the only error derivative is for the output weights. This derivative pulls the weight vector of the winning cluster towards the data point. When the weight vector is at the center of gravity of a cluster, the derivatives all balance out, because the center of gravity minimizes the squared error.

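The two clustering slides above amount to an online k-means-style procedure. Here is a hedged NumPy sketch (my own, not the lecture's code) written as per-example updates: each hidden unit's tied weight vector is a cluster center, the input is reconstructed as the nearest center, and the squared-error derivative pulls only the winning center towards the data point. The learning rate and toy data are assumptions.

```python
# Winner-take-all self-supervised learning as online cluster-center updates.
import numpy as np

def winner_take_all_step(x, centers, lr=0.05):
    """One online step: find the winning center and pull it towards x."""
    winner = np.argmin(np.sum((centers - x) ** 2, axis=1))   # closest weight vector
    error = np.sum((x - centers[winner]) ** 2)               # squared reconstruction error
    centers[winner] += lr * (x - centers[winner])            # the only nonzero derivative
    return winner, error

# Toy data drawn from three well-separated clusters.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(c, 0.3, size=(200, 2)) for c in ([0, 0], [3, 3], [0, 4])])
centers = rng.standard_normal((3, 2))

for _ in range(5):                         # a few passes over the data
    for x in rng.permutation(data):
        winner_take_all_step(x, centers)
# Each center should end up near the center of gravity of one cluster.
```
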
A spectrum of representations

                Local         Distributed
  Linear                      PCA
  Non-linear    clustering    what we need

• PCA is powerful because it uses distributed representations, but limited because its representations are linearly related to the data.
  – Autoencoders with more hidden layers are not limited this way.
• Clustering is powerful because it uses very non-linear representations, but limited because its representations are local (not componential).
• We need representations that are both distributed and non-linear.
  – Unfortunately, these are typically very hard to learn.
