Challenges Motivating Deep Learning
Sargur N. Srihari
[email protected]
Topics in Machine Learning Basics

1. Learning Algorithms
2. Capacity, Overfitting and Underfitting
3. Hyperparameters and Validation Sets
4. Estimators, Bias and Variance
5. Maximum Likelihood Estimation
6. Bayesian Statistics
7. Supervised Learning Algorithms
8. Unsupervised Learning Algorithms
9. Stochastic Gradient Descent
10. Building a Machine Learning Algorithm
11. Challenges Motivating Deep Learning

Topics in “Motivations”
• Shortcomings of conventional ML
1. The curse of dimensionality
2. Local constancy and smoothness regularization
3. Manifold learning


Challenges Motivating DL
• Simple ML algorithms work very well on a wide variety of important problems
• However, they have not succeeded in solving the central problems of AI, such as recognizing speech or recognizing objects
• Deep learning was motivated by the failure of traditional algorithms to generalize well on such tasks

Curse of dimensionality

• The number of possible distinct configurations of a set of variables increases exponentially with the number of variables
– Poses a statistical challenge
• Ex: 10 regions of interest with one variable
– We need to track 100 regions with two variables
– And 1,000 regions with three variables (see the sketch below)
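A minimal sketch of this exponential growth, assuming each variable is simply discretized into 10 bins (the bin count and loop range are illustrative, not from the slides):

# Minimal sketch: with each variable discretized into 10 bins,
# the number of distinct configurations to track grows as 10**d.
for d in range(1, 6):
    print(f"{d} variable(s): {10 ** d} regions")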
Local Constancy & Smoothness Regularization
• Prior beliefs
– To generalize well, ML algorithms need prior beliefs
• Can take the form of probability distributions over parameters
• Or can influence the function itself, while parameters are influenced only indirectly
• Or can bias the algorithm towards preferring a certain class of functions
– These biases may not be expressed in terms of a probability distribution

• Most widely used prior is smoothness


– Also called local constancy prior
– States that the function we learn should not change very much within a small region

Local Constancy Prior


• Function should not change very much within a
small region
• Many simpler algorithms rely exclusively on this
prior to generalize well
– Thus they fail to scale to the statistical challenges of AI tasks
• Deep learning introduces additional (explicit
and implicit) priors in order to reduce
generalization error on sophisticated tasks
• We now explain why smoothness alone is insufficient

Specifying smoothness
• Several methods encourage learning a function f* that satisfies the condition f*(x) ≈ f*(x + ε)
– For most configurations x and small change ε
• If we know a good answer for input x then that
answer is good in the neighborhood of x
• An extreme example is k-nearest neighbor
– Points having the same set of nearest neighbors all
have the same prediction
– For k = 1, the number of regions cannot exceed the number of training examples (see the sketch below)
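A minimal 1-nearest-neighbor sketch of this behavior, using illustrative toy data (not from the slides): the prediction is piecewise constant, so every query closest to the same training point gets that point's label, giving at most one region per training example.

import numpy as np

X_train = np.array([[0.0], [1.0], [2.0]])   # three training inputs (toy data)
y_train = np.array([0, 1, 1])               # their labels

def predict_1nn(x):
    # index of the closest training example decides the prediction,
    # so with k = 1 there are at most len(X_train) distinct regions
    i = np.argmin(np.linalg.norm(X_train - x, axis=1))
    return y_train[i]

for q in [0.1, 0.4, 0.9, 1.6, 2.3]:
    print(q, "->", predict_1nn(np.array([q])))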

Kernel machines and smoothness


• Kernel machines interpolate between training
set outputs associated with nearby training
examples
• With local kernels: k(u,v) is large when u=v
and decreases as u and v grow further apart
• Can be thought of as a similarity function that
performs template matching
– By measuring how closely test example x
resembles training example x(i)
• Much of deep learning is motivated by the limitations of template matching (see the kernel sketch below)
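A sketch of one common local kernel, the Gaussian (RBF) kernel, chosen here as an assumed example (the slides do not name a specific kernel): k(u, v) is largest when u = v and decays as u and v move apart, so the prediction amounts to template matching against training examples.

import numpy as np

def rbf_kernel(u, v, sigma=1.0):
    # local similarity: equals 1.0 when u == v, decays towards 0 as u, v move apart
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

x_test = np.array([0.2, 0.1])                             # toy test example
templates = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]  # toy training examples
print([round(rbf_kernel(x_test, t), 4) for t in templates])
# the nearby template gets a much larger similarity than the distant one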

Decision Trees and Smoothness


• Decision trees also suffer from the limitations of exclusively smoothness-based learning
– They break input space into as many regions as
there are leaves and use a separate parameter in
each region
– For n leaves, at least n training samples are required
– Many more are needed for statistical confidence (see the sketch below)

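An illustrative sketch of the leaves-versus-examples relationship, assuming scikit-learn is available (the library and toy data are assumptions, not from the slides): an unpruned tree that fits the training data exactly splits the input space into one region per leaf, and the number of leaves cannot exceed the number of training examples.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.arange(8, dtype=float).reshape(-1, 1)  # 8 toy training inputs
y = np.array([0, 1, 0, 1, 1, 0, 1, 0])        # toy labels

tree = DecisionTreeClassifier().fit(X, y)     # unpruned by default
print("training examples:", len(X))
print("leaf regions:     ", tree.get_n_leaves())  # never exceeds the number of examples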

No. of examples and no. of regions


• All of the above methods require:
– O(k) examples to distinguish O(k) regions
– O(k) parameters, with O(1) parameters associated with each of the O(k) regions
• Nearest-neighbor: each training sample (circle) defines at most one region
– The y value associated with each example defines the output for all points within its region


More regions than examples


• Suppose we need more regions than examples
• Two questions of interest
1. Is it possible to represent a complicated function efficiently?
2. Is it possible for the estimated function to
generalize well for new inputs?
• Answer to both is yes
– O(2^k) regions can be defined with O(k) examples
• By introducing dependencies between regions through assumptions about the data-generating distribution
Core idea of deep learning

• Assume data was generated by a composition of factors, at multiple levels in a hierarchy
– Many other similarly generic assumptions can further improve deep learning algorithms
• These mild assumptions allow an exponential gain in the relationship between the number of examples and the number of regions that can be distinguished
– An example of a distributed representation is a
vector of n binary features
• It can take 2^n configurations (see the sketch below)
– Whereas in a symbolic
representation, each input
is associated with a single
symbol (or category)
– Here h1, h2, and h3 are three binary features
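A minimal sketch of the counting argument, with n = 3 chosen as an illustrative value (not from the slides): three binary features jointly distinguish 2^3 = 8 configurations, whereas a purely symbolic representation would need 8 separate symbols to cover the same inputs.

from itertools import product

n = 3  # illustrative number of binary features (h1, h2, h3)
distributed = list(product([0, 1], repeat=n))
print(len(distributed), "configurations from", n, "binary features")

# a symbolic (one-symbol-per-input) representation needs one entry per configuration
symbolic = [f"category_{i}" for i in range(2 ** n)]
print(len(symbolic), "separate symbols for the same coverage")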

Manifold Learning
• An important concept underlying many ideas in machine learning
• A manifold is a connected region
– Mathematically, it is a set of points associated with a neighborhood around each point
– Locally, it appears to be a Euclidean space
• E.g., we experience the world as a 2-D plane, while it is actually a spherical manifold in 3-D space


Manifold in Machine Learning


• Although a manifold is precisely defined in mathematics, in machine learning the term is used more loosely:
– A connected set of points that can be approximated well by considering only a small number of degrees of freedom, embedded in a higher-dimensional space (see the sketch below)
Figure: training data lying near a 1-D manifold in a 2-D space; the solid line indicates the underlying manifold that the learner should infer

In machine learning we allow the dimensionality of the manifold to vary from one point to another. This often happens when a manifold intersects itself, as in a figure-eight.
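A small sketch of such data, with an illustrative curve and noise level chosen here (not from the slides): a single degree of freedom t traces out a 1-D manifold embedded in 2-D space, and the samples lie near it.

import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, size=200)        # one degree of freedom
curve = np.stack([np.cos(t), np.sin(2.0 * t)], 1)  # 1-D manifold in 2-D (a figure-eight)
data = curve + 0.02 * rng.normal(size=curve.shape) # samples lying near the manifold
print(data.shape)  # (200, 2): 2-D coordinates with ~1-D underlying structure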
Manifold learning surmounts R^n


• It is sometimes hopeless to learn functions with variations across all of R^n
• Manifold learning algorithms surmount this obstacle by assuming most of R^n consists of invalid inputs
– And that interesting inputs occur only along a collection of manifolds
• Introduced for continuous data and unsupervised learning, the probability concentration idea can be generalized to discrete data and supervised settings

Manifold hypothesis for Images


• The manifold assumption is justified because:
– Probability distributions over images are highly concentrated
– Uniformly sampled images look like static noise, never like structured images (see the sketch below)
– Although there is a non-zero probability of generating a face this way, it is essentially never observed in practice
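A small sketch of the uniform-sampling thought experiment, with an illustrative image size and count (not from the slides): images drawn uniformly over pixel values are pure static, because structured images occupy a vanishingly small region of pixel space.

import numpy as np

rng = np.random.default_rng(0)
# 5 "images" of 64x64 pixels with values drawn uniformly from 0..255
random_images = rng.integers(0, 256, size=(5, 64, 64), dtype=np.uint8)
print(random_images.shape)  # every one of these looks like static noise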

Manifold hypothesis justified in the Text domain


• If you generate a document by choosing characters at random, there is a near-zero probability of producing meaningful text
• Natural language sequences occupy a small
volume of total space of sequences of letters


Manifolds traced by transformations


• Manifolds can be traced by making small
transformations
• Example: the manifold structure of a dataset of human faces


Manifolds discovered for Human Faces


• A variational autoencoder discovers an underlying two-dimensional coordinate system:
1. Rotation
2. Emotion

