Lecture 12 Learning in Vision 2022

This document discusses learning in computer vision and introduces several machine learning concepts and approaches. It explains that learning is needed in vision due to the difficulty of explicitly modeling higher-level reasoning tasks. It then covers supervised learning methods like logistic regression and support vector machines. Unsupervised learning approaches like clustering, principal component analysis are also introduced. Deep learning is discussed as the dominant learning paradigm today for computer vision tasks.

Uploaded by

ashok kumar

E1 216 COMPUTER VISION

LECTURE 12: LEARNING IN VISION

Venu Madhav Govindu


Department of Electrical Engineering
Indian Institute of Science, Bengaluru

2022
• Why do we need learning in vision?
• Should every solution be learnt?
Tasks in Computer Vision
• Segmentation, Recognition, Detection, Localisation
• Tasks on the image plane $\mathbb{R}^2$
• Deep Learning breakthrough, with problems

Adapted from Fig. 6.1 in Szeliski Computer Vision: Algorithms and Applications, draft 2nd edition
Geometric Problems in Computer Vision
• 3D Reconstruction from multiple images
• Geometry induced by pinhole camera
• Reasoning about 3D world from 2D images
• Explicit reasoning and engineering used
Geometric Models are Explicit


• Geometric relations governed by pinhole model
• Explicit models for observations
• Epipolar Geometry: $\hat{x}^T F x = 0$
• Reprojection Error can be written in explicit form
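
To make the "explicit model" point concrete, here is a minimal sketch that checks the epipolar constraint numerically. The camera motion, the identity intrinsics (so that F coincides with the essential matrix) and the random 3D points are all assumptions made for illustration.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Assumed relative motion of camera 2 with respect to camera 1.
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.0, 0.2])
F = skew(t) @ R                      # with identity intrinsics, F equals the essential matrix

rng = np.random.default_rng(0)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))   # 3D points in front of both cameras

x = X / X[:, 2:3]                    # pinhole projection in camera 1, homogeneous (u, v, 1)
X2 = X @ R.T + t                     # the same points in the frame of camera 2
x_hat = X2 / X2[:, 2:3]              # pinhole projection in camera 2

residuals = np.einsum("ni,ij,nj->n", x_hat, F, x)        # x_hat^T F x per correspondence
max_residual = np.abs(residuals).max()
print(max_residual)
```

For noise-free correspondences the residual is zero up to floating-point rounding, which is what "explicit" means here: the observation model is known in closed form.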
Learning in Vision

Role of Learning
• Different types of tasks
• Motion Estimation
• Shape Analysis
• Segmentation
• Theory, Model, Algorithms
• Understanding of physics (geometry) and statistics
• Higher-level Reasoning?
Sources: brittanica.com scmp.com makemytrip.com curlytales.com taj-mahal.net rfi.fr viator.com taj-mahal.net metmuseum.org easydrawing.net

Why Learning?
• Higher-level reasoning difficult to model
• Process of reasoning not fully described
• Interested in functional replication
• Flexibility of model
• Biological organisms learn
• Nature vs. Nurture debate
I shall reconsider human knowledge by starting from the fact that we can know more than we can tell. This fact seems obvious enough; but it is not easy to say exactly what it means. Take an example. We know a person’s face, and can recognize it among a thousand, indeed among a million. Yet we usually cannot tell how we recognize a face we know. So most of this knowledge cannot be put into words.

Michael Polanyi
The Tacit Dimension, 1966

Learning in Vision
• Tacit vs Explicit Forms of Knowledge
• Perceptual vs Engineering Solutions
• “All models are wrong, some are useful” to “What models?”
• Polanyi’s Revenge
h/t Subbarao Kambhampati’s talk Polanyi vs Planning
Why Learning Now?


• Low-level vision well developed
• Difficult to formulate general models for reasoning
• Bypass through learning
• Explosion of image data, internet
• Growth of computational power
• Deep Learning
• Vision ≠ Machine Learning ≠ Deep Learning ≠ AI!
Szeliski 2nd edition
• Consult slides of Andreas Geiger, Computer Vision (2021), Lecture 10: Recognition
Link provided on lecture page
Slide numbers: 3, 14-20, 60, 75, 77-82, 136, 139-140
• Consult slides of Noah Snavely, Introduction to Computer Vision (2021), Lecture 19: Introduction to Recognition
Link provided on lecture page
Slide numbers: 12-16, 24-29
Topics
• Machine Learning Methods
• Deep Learning and Datasets
• Later: Fairness and Ethics
Problems in Learning
• Classification
• Regression
• Clustering
javatpoint.com
Approaches to Learning
• Supervised
• Unsupervised (self-learning)
• Semi-supervised

Szeliski 2nd Edition


Supervised Learning
• Use input-output pairs
• How do we get labels?
• How do we score for tasks?
• Classification
• Detection
• Segmentation
Empirical Risk Minimisation


• $y_i = f(x_i; w)$
• Empirical risk: $\sum_i L(y_i, f(x_i; w))$
• True Risk: $E[L(y, f(x; w))]$
• Classification (possibly asymmetric)
• Regression (think line fitting)
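
As a rough illustration of the line-fitting case, the sketch below evaluates the empirical risk of a line under a squared loss and compares the least-squares minimiser with a perturbed parameter vector. The synthetic data, the squared loss and the use of a mean instead of a sum (they differ only by the constant factor 1/N) are assumptions.

```python
import numpy as np

def empirical_risk(w, x, y):
    """Mean squared loss of the line f(x; w) = w[0] * x + w[1] over the samples."""
    predictions = w[0] * x + w[1]
    return np.mean((y - predictions) ** 2)

rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=50)
y_train = 2.0 * x_train + 0.5 + 0.1 * rng.standard_normal(50)   # assumed noisy line

# For the squared loss the minimiser has a closed form (ordinary least squares).
A = np.column_stack([x_train, np.ones_like(x_train)])
w_star, *_ = np.linalg.lstsq(A, y_train, rcond=None)

risk_at_minimum = empirical_risk(w_star, x_train, y_train)
risk_elsewhere = empirical_risk(w_star + np.array([0.3, -0.2]), x_train, y_train)
print(w_star, risk_at_minimum, risk_elsewhere)
```

The true risk would replace the average over training samples by an expectation over the unknown distribution, which is why it cannot be computed directly.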

Statistical Learning Theory


• This is just a caricature
• Vast body of theoretical work
• Assumption: unknown underlying probability
• Training samples drawn from pdf
• Test from same pdf (Generalisation?)
Fitting
• Learning model?
• Expressiveness
• Complexity
• Over vs. underfit
• Deep learning
• Too many parameters
• Generalisation?
• When?
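
A small sketch of under- versus overfitting, assuming a noisy sine curve as data and polynomials of increasing degree as the model family (both are illustrative choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(-1.0, 1.0, 12)
y_train = np.sin(np.pi * x_train) + 0.2 * rng.standard_normal(x_train.size)
x_test = np.linspace(-1.0, 1.0, 200)
y_test = np.sin(np.pi * x_test)          # noise-free curve used as held-out reference

def fit_errors(degree):
    """Least-squares polynomial fit; returns (training error, held-out error)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

errors = {degree: fit_errors(degree) for degree in (1, 3, 9)}
for degree, (train, test) in errors.items():
    print(degree, train, test)
```

The training error can only go down as the degree grows, because each model contains the previous one. The held-out error is the quantity that reveals whether the extra expressiveness helped or merely fitted the noise.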

https://www.kaggle.com/getting-started/166897
https://blog.ml.cmu.edu/2020/08/31/4-overfitting/

Bayes Classifier

$$p(C_k \mid x) = \frac{p(x \mid C_k)\,p(C_k)}{\sum_j p(x \mid C_j)\,p(C_j)} = \frac{\exp l_k}{\sum_j \exp l_j}$$
where $l_k = \log p(x \mid C_k) + \log p(C_k)$

• Logistic function: $\sigma(l) = \frac{1}{1 + e^{-l}}$ for $l = l_0 - l_1$

$$p(x \mid C_k) = \frac{1}{V_k} \exp\left\{-\frac{1}{2}(x - \mu_k)^T \Sigma^{-1} (x - \mu_k)\right\}$$
$$\Rightarrow p(C_0 \mid x) = \sigma(w^T x + b)$$

Discriminant Analysis
• Binary Classification
• Assume Gaussian distributions (further, for 2-class, assume same covariance Σ)
• Result is logistic regression
• Linear Discriminant Function: compare $w_k^T x + b_k$
• For non-equal Σ, quadratic discriminant function
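
The claim that two Gaussians with a shared covariance give a logistic posterior can be checked numerically. In this sketch the means, covariance, priors and query point are made-up values; the expressions for w and b follow from expanding $l_0 - l_1$ and cancelling the quadratic term in x.

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate normal density evaluated at x."""
    d = x - mu
    norm = np.sqrt((2.0 * np.pi) ** len(mu) * np.linalg.det(cov))
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm

# Assumed toy parameters for two classes with a shared covariance.
mu0, mu1 = np.array([1.0, 0.0]), np.array([-1.0, 0.5])
cov = np.array([[1.0, 0.3], [0.3, 0.5]])
prior0, prior1 = 0.4, 0.6

cov_inv = np.linalg.inv(cov)
w = cov_inv @ (mu0 - mu1)
b = (-0.5 * mu0 @ cov_inv @ mu0 + 0.5 * mu1 @ cov_inv @ mu1
     + np.log(prior0 / prior1))

x = np.array([0.2, -0.7])
joint0 = gaussian_pdf(x, mu0, cov) * prior0
joint1 = gaussian_pdf(x, mu1, cov) * prior1
posterior_bayes = joint0 / (joint0 + joint1)              # Bayes rule
posterior_sigmoid = 1.0 / (1.0 + np.exp(-(w @ x + b)))    # sigma(w^T x + b)
print(posterior_bayes, posterior_sigmoid)
```

With unequal covariances the quadratic term no longer cancels, which is where the quadratic discriminant comes from.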

$$p_i = p(C_0 \mid x_i) = \sigma(w^T x_i + b)$$
$$\Rightarrow E_{CE}(w, b) = -\sum_i \left[ t_i \log p_i + (1 - t_i) \log(1 - p_i) \right]$$

Logistic Regression
• Gaussian assumption too strong
• Work with posterior
• Cross-entropy Loss
• One-hot encoding
• Limitations: when not linearly separable
• Limitations: infinite solutions when separable
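
A minimal sketch of fitting w and b by gradient descent on the cross-entropy loss. The two overlapping Gaussian blobs, the step size and the iteration count are assumptions; targets are taken as $t_i = 1$ for class $C_0$.

```python
import numpy as np

def sigmoid(l):
    return 1.0 / (1.0 + np.exp(-l))

def cross_entropy(w, b, X, t):
    """E_CE = -sum_i [t_i log p_i + (1 - t_i) log(1 - p_i)]."""
    p = np.clip(sigmoid(X @ w + b), 1e-12, 1.0 - 1e-12)
    return -np.sum(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1.0, 1.0, size=(100, 2)),    # class C0, target 1
               rng.normal(-1.0, 1.0, size=(100, 2))])   # class C1, target 0
t = np.concatenate([np.ones(100), np.zeros(100)])

w, b = np.zeros(2), 0.0
initial_loss = cross_entropy(w, b, X, t)
learning_rate = 0.5
for _ in range(500):
    p = sigmoid(X @ w + b)
    # Gradient of the cross-entropy, averaged over the samples.
    w = w - learning_rate * (X.T @ (p - t)) / len(t)
    b = b - learning_rate * np.sum(p - t) / len(t)

final_loss = cross_entropy(w, b, X, t)
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == (t == 1))
print(initial_loss, final_loss, accuracy)
```

If the two classes were perfectly separable, the loss could be driven lower indefinitely by scaling w up, which is the "infinite solutions" limitation listed above.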

Multi-class case:
$$p_{ik} = p(C_k \mid x_i) = \frac{\exp l_{ik}}{\sum_j \exp l_{ij}} = \frac{1}{Z_i} \exp l_{ik}$$
with $l_{ik} = w_k^T x_i + b_k$
$$\Rightarrow E_{MCCE}(w_k, b_k) = -\sum_i \sum_k \tilde{t}_{ik} \log p_{ik}$$

Support Vector Machines


• Multiple solutions when separable
• Recognise that data is only partial
• Maximise margin of classifier
• For data that is not linearly separable: kernel methods (the kernel trick)
Approaches to Learning
• Clustering using k-means
Szeliski 2nd Edition
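
A bare-bones k-means sketch (alternating assignment and mean update). The initialisation from random data points, the fixed iteration count and the two synthetic blobs are assumptions for illustration.

```python
import numpy as np

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centre, then recompute the means."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iterations):
        distances = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:          # keep the old centre if a cluster becomes empty
                centres[j] = members.mean(axis=0)
    return centres, labels

rng = np.random.default_rng(1)
blob_a = rng.normal(0.0, 0.5, size=(50, 2))
blob_b = rng.normal(10.0, 0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

centres, labels = kmeans(points, k=2)
centres = centres[np.argsort(centres[:, 0])]   # sort for a stable printout
print(centres)
```

No labels are used anywhere, which is what makes this an unsupervised method.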
Approaches to Learning
• Principal Component Analysis
• $C = \sum_j (x_j - \mu)(x_j - \mu)^T$
• $C = U \Lambda U^T = \sum_i \lambda_i u_i u_i^T$
• $C \approx \sum_k \lambda_k u_k u_k^T$
• Low dimensional representation
• Project observation onto subspace
Szeliski 2nd Edition
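
The PCA steps above can be sketched in a few lines. The toy data (3D samples that vary mostly along one assumed direction) and the choice k = 1 are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
direction = np.array([1.0, 2.0, 2.0]) / 3.0                    # assumed dominant direction
X = (5.0 * rng.standard_normal((200, 1)) * direction
     + 0.1 * rng.standard_normal((200, 3)))

mu = X.mean(axis=0)
C = (X - mu).T @ (X - mu)                  # C = sum_j (x_j - mu)(x_j - mu)^T
eigenvalues, U = np.linalg.eigh(C)         # eigh returns eigenvalues in ascending order
order = np.argsort(eigenvalues)[::-1]
eigenvalues, U = eigenvalues[order], U[:, order]

k = 1
U_k = U[:, :k]
C_approx = (U_k * eigenvalues[:k]) @ U_k.T   # sum over the k largest lambda_k u_k u_k^T
coords = (X - mu) @ U_k                      # low-dimensional representation
X_projected = mu + coords @ U_k.T            # projection of each observation onto the subspace
print(eigenvalues)
```

The leading eigenvector recovers the assumed direction up to sign, and the remaining eigenvalues are small because they only capture the added noise.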
Deep Learning
• Simple nonlinear model of single neuron
• Old idea of connectionism
• Rosenblatt 1958; Rumelhart et al. 1986, Fukushima 1980
• Cycles of interest
• Significant breakthroughs with deep layers
• Dominant paradigm today
Simon Haykin’s textbook; Szeliski 2nd Edition
Perceptron Model
• Feedforward networks
• Simple “neurons”, rich connections
• $y = h(s) = h(w^T x + b)$
• $h(l) = \frac{1}{1 + e^{-l}}$
• Key: Non-linearity of neuron
Simon Haykin’s textbook; Szeliski 2nd Edition
Multilayer Neural Networks


• Regular structure with layers
• Each layer outputs: $s_l = W_l x_l$
• Next layer: $x_{l+1} = y_l = h(s_l)$
• Output: $y = h_{W_N}(h_{W_{N-1}}(\cdots(x)))$
• Non-linear function mapping: $y = H(x, W)$
• W: All weights in all layers!
• Expressive power
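
A minimal sketch of the forward pass through such a layered network, using the sigmoid as h. The layer sizes and random weights are assumptions, and bias terms are omitted to match $s_l = W_l x_l$.

```python
import numpy as np

def h(s):
    """Sigmoid non-linearity applied elementwise."""
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, weights):
    """Return the activations of every layer, starting with the input itself."""
    activations = [x]
    for W_l in weights:
        s_l = W_l @ activations[-1]      # s_l = W_l x_l
        activations.append(h(s_l))       # x_{l+1} = y_l = h(s_l)
    return activations

rng = np.random.default_rng(0)
layer_sizes = [4, 5, 3, 2]               # assumed: 4 inputs, two hidden layers, 2 outputs
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

x = rng.standard_normal(4)
activations = forward(x, weights)
y = activations[-1]                      # y = H(x, W)
print(y)
```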
Deep Neural Networks


• What is deep here?
• Non-linear with many many weights!
• Breakthrough in 2012
• Tsunami of DL approaches
• Completely taken over vision and ML (almost)
Types of Neural Networks


• Layers with vector inputs
• Convolutional Networks (Receptive Fields)
• Temporal Networks (LSTM, Transformer)
• Many more models
Noah Snavely’s slides; Kevin Murphy’s book
Key Ingredients
• Non-linear activation functions
• Gradient descent for fitting
• Learning over masses of data
• Nested functions $h(h(h(\cdots)))$
• Derivatives using chain rule of calculus
• Learning through Backpropagation
• Stochastic Gradient Descent (Graduate Student Descent)
https://medium.com/@shrutijadon10104776/survey-on-activation-functions-for-deep-learning-9689331ba092

Activation Functions
• Many functions
• Sigmoid is smooth
• ReLU is simple and popular
• ReLU has issues (e.g. zero gradient for negative inputs)
Softmax Layer
• $p_i = \frac{\exp x_i}{\sum_k \exp x_k}$
• Soft version of max
• Often as last layer
• Converts outputs to class likelihoods
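
A short softmax sketch. Subtracting the maximum before exponentiating is a standard numerical-stability step that leaves the result unchanged; the example scores are made up.

```python
import numpy as np

def softmax(x):
    """p_i = exp(x_i) / sum_k exp(x_k), computed in a numerically stable way."""
    shifted = x - np.max(x)          # does not change the result, avoids overflow
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])
p = softmax(scores)
p_sharp = softmax(10.0 * scores)     # scaling the scores pushes the output towards a hard max
p_large = softmax(np.array([1000.0, 1001.0]))
print(p, p_sharp, p_large)
```

The scaled version shows the sense in which this is a "soft" max: as the gaps between scores grow, the output approaches a one-hot vector at the largest score.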
Data Augmentation
• Use training samples
• Reduce over-fitting
• Augment training data with distortions
• Variety of augmentations in range and domain
• Very hacky
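
A rough sketch of what such augmentations can look like: a random horizontal flip and shifted crop (changes in the domain) plus a random brightness gain (a change in the range). The specific distortions, padding size and gain limits are arbitrary assumptions.

```python
import numpy as np

def augment(image, rng, pad=4):
    """Return a randomly distorted copy of an H x W x C image with values in [0, 1]."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                                   # horizontal flip
    padded = np.pad(out, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    dy, dx = rng.integers(0, 2 * pad + 1, size=2)               # random shift
    out = padded[dy:dy + image.shape[0], dx:dx + image.shape[1], :]
    gain = rng.uniform(0.8, 1.2)                                # brightness change
    return np.clip(out * gain, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))          # stand-in for a real training image
augmented = [augment(image, rng) for _ in range(8)]
print(augmented[0].shape)
```

Each call yields a different training sample with the same label, so the classifier is pushed to ignore these nuisance changes.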
https://medium.com/@sauravkumarsct

Invariances and Equivariances


• Invariance: output does not change with a nuisance variable
• Equivariance: output changes in a corresponding, predictable way when the input is transformed
• $l^T p = (Rl)^T (Rp) = 0$
• Line fitting using different co-ordinate systems
• Recall OLS vs. TLS solutions
• Deep Learning can fail catastrophically
• Recent approaches more principled
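
The line-point relation above can be verified directly. In this sketch the points and the rotation angle are arbitrary; points and lines are in homogeneous coordinates and R is a rotation about the origin of the image plane.

```python
import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])

p = np.array([2.0, 3.0, 1.0])        # homogeneous 2D points
q = np.array([-1.0, 0.5, 1.0])
l = np.cross(p, q)                   # line through p and q

incidence_before = l @ p             # l^T p
incidence_after = (R @ l) @ (R @ p)  # (Rl)^T (Rp)

# Equivariance: the line fitted to the rotated points is the rotated line.
line_from_rotated_points = np.cross(R @ p, R @ q)
print(incidence_before, incidence_after)
```

The incidence value is invariant to the choice of coordinate system, while the line itself is equivariant: it rotates along with the points.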
Learning can be Brittle


• Catastrophic failures
• Why does this happen?
• Explainable approaches
• GANs
Szeliski 2nd Edition
Dropout
• Method for regularization
• Reduces overfitting, improves generalization
• Applies to each mini-batch
Szeliski 2nd Edition
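
A minimal sketch of dropout on a batch of activations. It uses the common "inverted" variant, which rescales the kept units during training so that nothing changes at test time; the slides do not say which variant is meant, so treat that as an assumption.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Zero each unit with probability drop_prob and rescale the survivors."""
    if not training or drop_prob == 0.0:
        return activations
    keep = rng.random(activations.shape) >= drop_prob     # fresh mask for every mini-batch
    return activations * keep / (1.0 - drop_prob)

rng = np.random.default_rng(0)
batch = np.ones((1000, 100))
dropped = dropout(batch, drop_prob=0.5, rng=rng)
at_test_time = dropout(batch, drop_prob=0.5, rng=rng, training=False)
print(dropped.mean())
```

The rescaling keeps the expected activation unchanged, so the mean of the dropped batch stays close to 1.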
Batch Normalization
• Optimization is tricky, needs good conditions
• Recall condition number, scaling
• Varying scales of weights, outputs
• Components of gradient scaled differently
• Simple scaling+recentering along layers etc.
Szeliski 2nd Edition
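
The scaling and recentering step can be sketched as below for the training-time computation only. The learnable scale `gamma`, shift `beta` and the small `eps` follow the usual formulation and are assumptions here; the running statistics used at inference are omitted.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalise each feature over the mini-batch, then apply a learnable scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Assumed mini-batch whose three features live on very different scales.
x = rng.standard_normal((256, 3)) * np.array([0.1, 1.0, 100.0]) + np.array([5.0, 0.0, -50.0])
normalised = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
print(normalised.mean(axis=0), normalised.std(axis=0))
```

After normalisation every feature has roughly zero mean and unit spread, which evens out how the components of the gradient are scaled.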
Loss Functions
• Define optimization cost or loss
• Classification vs. Regression
• Classification: Cross-entropy loss
• Contrastive learning, metric embedding
• Regression: typically least squares
• Issue: our confidence in output
Szeliski 2nd Edition

Supervised: given “ground truth” F for training data
$$\min_W \sum_k \| H(X_k, W) - F_k \|^2$$
Unsupervised:
$$\min_W \sum_k \sum_i \rho\big( x'^{T}_{ki} \, H(X_k, W) \, x_{ki} \big)^2$$

Learning Epipolar Geometry


• Toy example illustration
• Supervised vs. Unsupervised Learning
• Correspondences $X = \{(x_i, x'_i) \mid i = 1, \cdots, N\}$
• Can contain outliers
• Learnt model $F = H(X, W)$
• Recall IRLS weights $W = \{w_i, i = 1, \cdots, N\}$
• Learn to estimate weights directly: $W_{\text{correspondences}} = H(X, W)$
• $W_{\text{correspondences}}$: Not to be confused with network weights W!
• “Learning to Find Good Correspondences”
$$\min_W \sum_k \| y_k - H(x_k, W) \|^2$$

Solving for Weights


• Learning is an optimization problem
• Optimize what? Weights W
• How? Gradient Descent
• Too much data for higher-order methods
• Key observation: two passes
Szeliski 2nd Edition
Backpropagation
• Backpropagation: Rumelhart, Hinton, Williams (1986)
• Compute output in forward pass
• Want to change weights W in descent direction
• Derivative of output wrt input $x_k$?
• Summation of individual contributions
• Derivative of output wrt weights?
Szeliski 2nd Edition
Backpropagation
• Recall $y = H(x, W) = h_{W_N}(h_{W_{N-1}}(\cdots(x)))$
• Loss: $E = (y - H(x, W))^2$
• Denote $y_i = h(s_i) = h(w_i^T x)$
• $\frac{\partial E}{\partial s_i} = h'(s_i) \frac{\partial E}{\partial y_i}$
• What does $y_i$ depend on?
• $y = h(h(h(\cdots)))$
Szeliski 2nd Edition
Backpropagation
• Recall $y_i$ depends on outputs of previous layer
• Recall $y_i$ affects subsequent layers
• Define ‘error’ $e_i = \frac{\partial E}{\partial s_i}$
• $\frac{\partial E}{\partial y_i} = \sum_{k>i} \frac{\partial E}{\partial x_{ki}} = \sum_{k>i} w_{ki} e_k$
• $e_i = h'(s_i) \frac{\partial E}{\partial y_i} = h'(s_i) \sum_{k>i} w_{ki} e_k$
• Chain rule: Derivative of loss (error) wrt unit
• Depends on weighted sum of errors of the units it feeds into
• Store activations in forward pass
• Estimate in backward sweep (bfs)
Szeliski 2nd Edition
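
The two passes can be sketched for a tiny two-layer sigmoid network and checked against a finite-difference derivative. The network size, weights, input and target are made up; `e1` and `e2` play the role of the ‘errors’ $e_i$ above.

```python
import numpy as np

def h(s):
    return 1.0 / (1.0 + np.exp(-s))

def h_prime(s):
    return h(s) * (1.0 - h(s))

def loss_and_gradients(W1, W2, x, target):
    # Forward pass: compute and store the pre-activations and activations.
    s1 = W1 @ x
    y1 = h(s1)
    s2 = W2 @ y1
    y2 = h(s2)
    E = np.sum((target - y2) ** 2)
    # Backward pass: propagate the errors e = dE/ds from the output towards the input.
    e2 = h_prime(s2) * 2.0 * (y2 - target)
    e1 = h_prime(s1) * (W2.T @ e2)        # weighted sum of the errors of the units fed into
    return E, np.outer(e1, x), np.outer(e2, y1)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 2))
W2 = rng.standard_normal((1, 3))
x = np.array([0.5, -1.0])
target = np.array([1.0])

E, grad_W1, grad_W2 = loss_and_gradients(W1, W2, x, target)

# Central finite difference on a single weight as an independent check.
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 1] += eps
W1_minus[0, 1] -= eps
numeric = (loss_and_gradients(W1_plus, W2, x, target)[0]
           - loss_and_gradients(W1_minus, W2, x, target)[0]) / (2.0 * eps)
print(numeric, grad_W1[0, 1])
```

One backward sweep yields the derivative with respect to every weight, whereas the finite-difference check needs two extra forward passes per weight.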

$w_{t+1} = w_t - \alpha g_t$
Define $v_{t+1} = \rho v_t + g_t$
$w_{t+1} = w_t - \alpha v_t$ with momentum

Training Issues
• Data too big for higher-order methods
• Just use gradient descent
• Gradient: sum of gradient terms of each x
• Stochastic Gradient Descent
• Minibatches: · · · [· · · ][· · · ][· · · ] · · ·
• Epoch: One cycle through batches
• α: learning rate to be annealed (why?)
• ρ is relatively large
• Hyper-parameters
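
A sketch of minibatch stochastic gradient descent with momentum on a least-squares problem. The data, batch size, learning rate, momentum ρ and annealing factor are all assumed values. The update applies the freshly updated velocity, which is one common convention and differs from the slide's $v_t$ indexing by a single step.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.standard_normal(1000)

w = np.zeros(3)
v = np.zeros(3)
alpha, rho, batch_size = 0.01, 0.9, 50          # hyper-parameters (assumed values)

for epoch in range(20):                         # one epoch = one cycle through all batches
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        # Gradient of the mean squared error on this minibatch only.
        g = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        v = rho * v + g                         # velocity accumulates past gradients
        w = w - alpha * v
    alpha *= 0.9                                # anneal the learning rate

print(w)
```

Each step only touches one minibatch, so the cost per update does not grow with the size of the dataset.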
Key Ingredients
• Large datasets are important
• Deep Networks
• Massive Compute Power
• AlexNet: 8 Layers; ResNet: 152 layers
• ImageNet Dataset: 1000 classes, > 1 million images
Szeliski 2nd Edition
Ethics of datasets
• Transparency of acquisition process, privacy
• Ethics problems should not be ignored
• Large % of images removed from ImageNet
• Ethics of labour (Amazon Mechanical Turk)
• Obsession with test error
• “Datasheets for Datasets”
Szeliski 2nd Edition
Deep Learning for Images


• Convolutional Neural Networks
• Locality of pixels propagated
• End-to-end learning
• Unified approaches for multiple tasks
• Segmentation, Localization, Recognition
Kevin Murphy’s book
• Consult slides of Noah Snavely, Introduction to Computer Vision (2021), Lecture 21: Convolutional Neural Networks
Link provided on lecture page
Slide numbers: 57-100
Object Recognition
• Major breakthroughs in recognition tasks
• Efficient computation of repeated convolutions
• Older approaches: Instance Recognition
• re-recognise specific objects
• Current approaches: Class or Category Recognition
• Variable classes: dogs, cats, chairs
• Fine-grained categories
Kevin Murphy’s book; Szeliski 2nd edition
ImageNet examples; Szeliski 2nd edition

Object Detection
• Early work in detecting faces, people (pedestrians)
• Early neural networks
• Some used bag of words
• Deformable parts model
• Boosting: Combine many simple features
• Cascade of classifiers
Szeliski 2nd edition
Face Recognition
• High interest: Access, surveillance
• Seen PCA version earlier (EigenFaces)
• DL version: Frontalization + Recognition
• Works well in many contexts
• Accuracy “in the wild” is questionable
• Extraordinary crises around face recognition technology (FRT)
• Discuss in Ethics lecture
Szeliski 2nd Edition
Generic Object Detection


• Major breakthroughs with DL
• Rectangular regions
• Based on sliding window tests
How to Score Performance?


• Two types of errors
• Receiver Operating Characteristic (ROC)
• True Positive vs. False Positive
• Precision-Recall (PR)
• True Positives, False Positives, Number of Positives (TP, FP, NP)
• Precision = $\frac{TP}{TP + FP}$
• Recall = $\frac{TP}{NP}$
• Average Precision (AP);meanAP (mAP) over all categories
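
A small sketch of how these scores can be computed from ranked detections. The detection scores and labels are invented, and the average precision here is the simple non-interpolated area under the precision-recall curve; benchmark protocols often use interpolated variants.

```python
import numpy as np

def precision_recall(scores, is_positive, num_positives):
    """Precision and recall after each detection, taken in order of decreasing score."""
    order = np.argsort(-np.asarray(scores))
    hits = np.asarray(is_positive, dtype=bool)[order]
    tp = np.cumsum(hits)                 # true positives so far
    fp = np.cumsum(~hits)                # false positives so far
    precision = tp / (tp + fp)
    recall = tp / num_positives
    return precision, recall

def average_precision(precision, recall):
    """Sum of precision values weighted by the increase in recall at each step."""
    previous_recall = np.concatenate([[0.0], recall[:-1]])
    return float(np.sum((recall - previous_recall) * precision))

# Assumed toy detector output: 4 detections, 4 ground-truth positives (one is never found).
scores = [0.9, 0.8, 0.7, 0.6]
is_positive = [True, False, True, True]
precision, recall = precision_recall(scores, is_positive, num_positives=4)
ap = average_precision(precision, recall)
print(precision, recall, ap)
```

mAP would simply average this AP value over all object categories.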

https://www.r-bloggers.com/2020/01/area-under-the-precision-recall-curve/

Modern Object Detectors


• Rectangular Region Proposals + Classifier
• R-CNN: Region-based CNN
• ≈ 2000 region proposals
• Each warped to fixed 224 × 224 region
• Classify using SVM
Szeliski 2nd edition
Modern Object Detectors


• Rectangular Region Proposals + Classifier
• Fast R-CNN
• End-to-end
• Resamples convolution features for proposals
• Classify using fully connected network
Szeliski 2nd edition
Modern Object Detectors


• Rectangular Region Proposals + Classifier
• Also Faster R-CNN
• Single network for detection+classification
• Single Shot Multibox Detector (SSD)
• You Only Look Once (YOLO)
Szeliski 2nd edition
RCNN and Faster RCNN papers


Some results from RCNN paper


You Only Look Once


• Single shot instead of two-stages
• Directly predicts 2D bounding box
• Faster, lower performance
• Redmon et al., ‘You Only Look Once: Unified, Real-Time Object Detection’, CVPR 2016
• Many improvements
• Ethics dimensions in next lecture
Results from YOLO paper


Semantic Segmentation
• Standard segmentation: distinction between classes
• Pairwise potentials: similarity + proximity
• No classification
• Semantic segmentation: per-pixel classification
• Networks “percolate” semantic information to pixels
Instance Segmentation
• Find all objects, give per-pixel masks
• Mask R-CNN
• Region proposal as Faster R-CNN
• Additional branch for mask prediction
• Training loss carefully combines all parts
Szeliski 2nd edition
• Consult slides of Andreas Geiger, Computer Vision (2021), Lecture 9: Co-ordinate Based Networks
Link provided on lecture page
Slide numbers: 54-66
Learning in 3D Geometry Estimation
Progression from Tacit to Explicit Problems
Ranftl et al., ‘Towards Monocular Depth Estimation’; http://3dstereophoto.blogspot.com; https://www.cs.cornell.edu/projects/bigsfm
3D Geometry Problems
• Correspondence is ambiguous for low texture
• ∴ dense depth estimation has tacit parts
• Geometric problems with explicit forms
• camera motion estimation
• sparse triangulation for corners
• Recognise distinction between tacit and explicit aspects
• Implications for accuracy and reliability

Ranftl et al., ‘Towards Monocular Depth Estimation’; http://3dstereophoto.blogspot.com; https://www.cs.cornell.edu/projects/bigsfm


Monocular Depth
• Very impressive, but what kind of depth is it?
• Notions of depth: Euclidean, quasi-Euclidean, ordinal, bounding box
• Semantic segmentation of depth is useful for tasks
Miangoleh et al., ‘Boosting Monocular Depth ...’, CVPR 2021
Monocular Depth
• Learnt models for specific narrow contexts
• Lessons
• networks ignore apparent size
• use vertical position of objects
• dark region used to detect obstacles
• brittle and unreliable
van Dijk et al., ‘How Do Neural Networks See Depth in Single Images?’, ICCV 2019
Two-View Stereo
• Recover dense depth with known geometry
• Stereo is a correspondence problem
• Many ambiguities and issues
• Search constraint + ambiguous correspondence
• ⇒ mixture of explicit and tacit problems

http://3dstereophoto.blogspot.com
3D Reconstruction from Many Images
• Geometry induced by pinhole camera
• SLAM vs SfM
• Significantly different motion and noise distributions
• Implications for use of brightness constraint

Global Approaches to SfM


• Jointly solve geometry over all cameras
• Many two-view relative motions available
• Averaging: Solve global rotations and translations
• Solve for 3D structure and refine

Rotation Averaging
• Viewgraph of camera-camera relations
• Given $R_{ij}$ on each edge
• Solve for individual cameras $R_i$
• Use relationship: $R_{ij} = R_j R_i^{-1}$
• Optimisation of robust geometric cost
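
To make the relation concrete, the sketch below builds noise-free relative rotations on a small viewgraph and recovers the cameras by chaining along a path. This is not a robust averaging method, only a consistency check; the random rotations, the edge list and the choice of fixing the first camera to the identity are assumptions.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation from the QR factorisation of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]           # flip one axis so that det(Q) = +1
    return Q

rng = np.random.default_rng(0)
R = [random_rotation(rng) for _ in range(5)]                 # ground-truth cameras
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (1, 3)]     # assumed viewgraph
R_rel = {(i, j): R[j] @ R[i].T for i, j in edges}            # R_ij = R_j R_i^{-1}

# Chain along the path 0-1-2-3-4, fixing the first camera to the identity.
estimate = [np.eye(3)]
for i in range(4):
    estimate.append(R_rel[(i, i + 1)] @ estimate[i])         # R_j = R_ij R_i

# Every edge, including those not used for chaining, is satisfied.
max_edge_error = max(np.abs(estimate[j] @ estimate[i].T - R_rel[(i, j)]).max()
                     for i, j in edges)
print(max_edge_error)
```

The estimate differs from the ground truth only by one global rotation, since all $R_{ij}$ are unchanged when every $R_i$ is multiplied on the right by the same rotation. With noisy or outlier-ridden edges the chained solution would no longer agree with the remaining edges, which is why a robust cost over the whole graph is optimised instead.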
Rotation Averaging
• Deep learning does well compared to geometric methods
• Key factors
• Distribution of rotations
• Distribution of noise+outliers
• Distribution of viewgraph edges
• Combinatorial explosion
• Is accuracy on datasets enough?
• What is learnt?
• How reliable are learnt models?
Förstner, Photogrammetric Computer Vision

Gauge Freedom
• Arbitrary choice of basis
• Rotations should be equivariant
• Natural for geometric methods
• Not for learnt models
Robustness
• Good performance on noisy real-world SfM datasets
• Consider perfect data: $R_{ij} = R_j R_i^{-1}$ exactly
• Exact solution exists
• DL method has non-zero error
• What has it learnt?

SLAM sequences
• Smooth sequences
• dense connectivity
• small rotations
• Loop closures are very useful
• DL method trained on SfM data fails here
Property          Geometric Method   Deep Learning
Equivariance      ✓                  ✗
Robustness        ✓                  ?
Graph Agnostic    ✓                  ✗
Loop Closure      ✓                  ✗
Some Observations
• Geometry is fundamental in vision
• Desired accuracy: qualitative vs. metric
• Limitations are understood: ambiguous configurations, high noise, outliers
• Deep Learning for geometry
• works well in narrow contexts
• combinatorial explosion difficult to tame
• lacks desirable properties
• can be unreliable
Some Observations
• DL to mitigate geometric ambiguities + limitations
• Useful for
• tacit parts of 3D reconstruction pipeline
• weights for robust least squares
• initialisation of geometry
• principled fusion with geometric estimates
Summary
• Almost all problems now have DL version
• Datasets play key role in developments
• More (layers) the merrier?
• Massive computational power involved
• Vision tools with high accuracies (deployable)
• What does such “learning” mean?
• Debates on AGI
• Pitfalls: Safety, Privacy, Accuracy, Ethics
• Data+Computational Divide between haves and have-nots
• Handful of corporations driving agenda
• Environmental impact of deep learning
• Deep Learning will continue to dominate
• Consequences?
