Computer Vision
Lecture 2
Saining Xie
Courant Institute, NYU
[email protected]
Recap
Goals of Computer Vision
Goal: Naming
[Figure: street scene labeled Building / Person / Road, and more specifically Silver Center / Saining Xie / Asphalt concrete]
"What can I do here?"
Goals of Computer Vision
Goal: Matching
Phylogeny of intelligence
538.8 million years ago: the Cambrian era, the "biological explosion"
- Language
- Abstract thinking
- Symbolic behavior
“Who won the game?”
Language vs Visual Intelligence
Self-Supervised Learning vs. Supervised Learning
GPT unifies the tasks in NLP, but in vision, synthesis and analysis are still very much independent.
Tentative Syllabus (check website for updated schedule)
Week 1: Why Computer Vision Matters (Sept 5)
Week 2: Filtering, Detectors, Descriptors + Why Representation Learning Matters (Sept 12)
Assignment 0 due
Week 3: Deep Learning Basics, Backpropagation, AutoDiff (Sept 19)
Week 4: Training Deep Neural Networks: optimization, initialization, regularization, normalization (Sept 26)
Week 5: ECCV Conference field report; Remote Lecture: Detection and Segmentation (Oct 3)
Assignment 1 due, Proposal due
Week 6: Attention and Transformer Deep Dive (Oct 10)
Week 7: Self-supervised Learning and Multi-modal Learning (Oct 17)
Week 8: Generative Models 1 – GANs (Oct 24)
Assignment 2 due
Week 9: Generative Models 2 – VAEs, Diffusion Models, Flow-based Models (Oct 31)
Week 10: Visualizing and Understanding Neural Networks (Nov 7)
Week 11: Motions + Deep Learning on Spatiotemporal Data (Nov 14)
Week 12: 3D Vision: Cameras + Meshes, Point Cloud, NeRF, Gaussian Splatting (Nov 21)
Assignment 3 due
Week 13: Thanksgiving Recess, no class (Nov 28)
Week 14: Guest Lecture: Topics TBD (Dec 5)
Week 15: Project Final Presentation (Dec 12)
Final Project report / code due (Dec 20)
“Do you realize how lucky you are working in AI / CV in 2024?”
Design vs. Learning
Hand-designed features
Typical CV pipeline back then (e.g., 2011)
And still doesn’t really work…
1D Case
Signal: 10 12 9 11 10 11 12
Output: 10.33
Done! Next? (a) 10.66 (b) 9.33 (c) 14.2 (d) 11.33
Signal ⁎ [1/3 1/3 1/3] = 10.33 10.66 10 10.66 11
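As a quick sanity check of the 1D case, a minimal sketch (mine, not from the handout), assuming NumPy is available:

import numpy as np

signal = np.array([10, 12, 9, 11, 10, 11, 12], dtype=float)
box = np.ones(3) / 3.0                     # the [1/3 1/3 1/3] filter

# For a symmetric filter, correlation and convolution coincide;
# 'valid' keeps only positions where the filter fully overlaps the signal.
print(np.convolve(signal, box, mode='valid'))
# [10.333 10.667 10.    10.667 11.   ]

The second output is (12 + 9 + 11) / 3 ≈ 10.67, i.e., option (a) above.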
[Figure: 2D cross-correlation as a sliding window. A 3x3 filter F11..F33 slides across the image I11..I46; each output value O11..O34 is the sum of the filter weights times the pixels it covers.]
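The sliding-window picture maps directly onto two loops over output positions plus a sum over the filter window. A minimal sketch (mine, assuming NumPy; 'valid' output size only):

import numpy as np

def cross_correlate_2d(image, filt):
    # Naive sliding-window cross-correlation, 'valid' region only.
    M = filt.shape[0]                      # assume a square MxM filter
    H, W = image.shape
    out = np.zeros((H - M + 1, W - M + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # weighted sum of the MxM window under the filter
            out[y, x] = np.sum(image[y:y + M, x:x + M] * filt)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # stand-in for I11..I66
box3 = np.full((3, 3), 1 / 9)
print(cross_correlate_2d(image, box3).shape)       # (4, 4)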
[Diagram: signals f and g and their correlation. Diagram Credit: D. Lowe]
Painful Details – Edge Cases
What to do about the “?” region?
Filter (identity):
0 0 0
0 1 0
0 0 0
→ Original (unchanged)

Filter:
0 0 0
0 0 1
0 0 0
→ Original shifted LEFT by 1 pixel

Filter:
0 1 0
0 0 0
0 0 0
→ Original shifted DOWN by 1 pixel
[Figure: Original vs. Blur (Box Filter)]
Filter:
0 0 0
0 2 0
0 0 0
minus the box filter (every entry 1/9), applied to the Original → ?

Properties – Linear
A(f + g) = A(f) + A(g)
Note: I am showing the filters un-normalized and blown up; the actual filter is a smaller box filter (i.e., each entry is 1/size²).
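A small numerical check of the construction above (twice an impulse minus a box filter): because filtering is linear, one pass with the combined kernel equals 2·image minus the box blur. A minimal sketch (mine, not course code), assuming SciPy's ndimage:

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
image = rng.random((32, 32))               # stand-in grayscale image

impulse = np.zeros((3, 3))
impulse[1, 1] = 1.0
box = np.full((3, 3), 1 / 9)
sharpen = 2 * impulse - box                # "2 in the middle" minus the box filter

# mode='constant' zero-pads the border (one answer to the "?" region question).
combined = ndimage.correlate(image, sharpen, mode='constant')
two_step = 2 * image - ndimage.correlate(image, box, mode='constant')
print(np.allclose(combined, two_step))     # True, by linearity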
Properties – Shift-Invariant
A(shift(f)) = shift(A(f)): filtering a shifted image gives the same result as shifting the filtered image.
Painful Details – Signal Processing
Cross-Correlation (original orientation) vs. Convolution (filter flipped in x and y)
Properties of Convolution
• Any shift-invariant, linear operation is a convolution (⁎)
• Commutative: f ⁎ g = g ⁎ f
• Associative: (f ⁎ g) ⁎ h = f ⁎ (g ⁎ h)
• Distributes over +: f ⁎ (g + h) = f ⁎ g + f ⁎ h
• Scalars factor out: kf ⁎ g = f ⁎ kg = k (f ⁎ g)
• Identity: convolving with the unit impulse (a single one surrounded by zeros) returns the original: f ⁎ δ = f
Property List: K. Grauman
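These properties are easy to verify numerically on 1D signals. A minimal sketch (mine), assuming NumPy; the signal lengths are arbitrary:

import numpy as np

rng = np.random.default_rng(1)
f, g, h = rng.random(8), rng.random(5), rng.random(5)
delta = np.array([1.0])                    # identity filter: a single one

conv = np.convolve                         # full convolution
print(np.allclose(conv(f, g), conv(g, f)))                       # commutative
print(np.allclose(conv(conv(f, g), h), conv(f, conv(g, h))))     # associative
print(np.allclose(conv(f, g + h), conv(f, g) + conv(f, h)))      # distributes over +
print(np.allclose(conv(3 * f, g), 3 * conv(f, g)))               # scalars factor out
print(np.allclose(conv(f, delta), f))                            # identity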
Questions?
What’s this?
Filter_ij ∝ exp( −(x² + y²) / (2σ²) )
Recognize the Filter? It’s a Gaussian!
Filter_ij ∝ (1 / (2πσ²)) · exp( −(x² + y²) / (2σ²) )
→ it factors into a product of 1D Gaussians:
Filter_ij ∝ [ (1 / (√(2π)·σ)) · exp( −x² / (2σ²) ) ] · [ (1 / (√(2π)·σ)) · exp( −y² / (2σ²) ) ]
Separability
1D Gaussian ⁎ 1D Gaussian = 2D Gaussian
Image ⁎ 2D Gauss = Image ⁎ (1D Gauss ⁎ 1D Gauss )
= (Image ⁎ 1D Gauss) ⁎ 1D Gauss
[Figure: 1D Gaussian ⁎ 1D Gaussian = 2D Gaussian]
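A minimal sketch of the separability claim (mine, assuming NumPy): a sampled 2D Gaussian equals the outer product of two sampled 1D Gaussians.

import numpy as np

def gaussian_1d(sigma, radius):
    # Samples of a 1D Gaussian, normalized to sum to 1.
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

g1 = gaussian_1d(sigma=1.0, radius=3)      # a 7-tap 1D Gaussian
g2 = np.outer(g1, g1)                      # 2D Gaussian as an outer product

# Direct 2D construction: exp(-(x^2 + y^2) / (2 sigma^2)), then normalize.
x = np.arange(-3, 4)
xx, yy = np.meshgrid(x, x)
direct = np.exp(-(xx**2 + yy**2) / 2.0)
direct /= direct.sum()
print(np.allclose(g2, direct))             # True: the 2D Gaussian separates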
Runtime Complexity
Image size = NxN = 6x6
Filter size = Mx1 = 3x1
[Figure: a 3x1 filter F1 F2 F3 sliding over the 6x6 image I11..I66]

for ImageY in range(N):
    for ImageX in range(N):
        for FilterY in range(M):
            …

for ImageY in range(N):
    for ImageX in range(N):
        for FilterX in range(M):
            …
What are my compute savings for a 13x13 filter?
Two 1D passes cost 2·13 = 26 multiply-adds per pixel, versus 13·13 = 169 for the full 2D filter: roughly 6.5× fewer.
Time: O(N²M)
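To see the savings in code: filtering with the full M×M Gaussian and with two 1D passes gives the same image (up to floating-point error), at M² vs. 2M multiply-adds per pixel. A sketch (mine), assuming SciPy's ndimage:

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
image = rng.random((256, 256))

M, sigma = 13, 2.0
x = np.arange(M) - M // 2
g1 = np.exp(-x**2 / (2 * sigma**2))
g1 /= g1.sum()                                           # 13-tap 1D Gaussian
g2 = np.outer(g1, g1)                                    # full 13x13 kernel

full = ndimage.correlate(image, g2, mode='nearest')
sep = ndimage.correlate1d(image, g1, axis=0, mode='nearest')
sep = ndimage.correlate1d(sep, g1, axis=1, mode='nearest')

print(np.allclose(full, sep))                            # True
print(M * M, 'vs', 2 * M, 'multiply-adds per pixel')     # 169 vs 26, ~6.5x fewer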
Why Gaussian?
Signal 10 12 9 8 1000 11 10 12
Sort the 3x3 neighborhood: [076, 080, 087, 092, 095, 102, 106, 108, 830] → median = 095
Applying Median Filter (size = 3)
Applying Median Filter (size = 7)
Is Median Filtering Linear?
1 1 1     0 0 0     1 1 1
1 1 2  +  0 1 0  =  1 2 2
2 2 2     0 0 0     2 2 2
Median of each patch: 1 + 0 ≠ 2, so median filtering is not linear.
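A minimal sketch of both points (mine, assuming SciPy's ndimage): the median removes the outlier that a box filter would only smear out, and the 3x3 counterexample above confirms it is not linear.

import numpy as np
from scipy import ndimage

signal = np.array([10, 12, 9, 8, 1000, 11, 10, 12], dtype=float)
print(ndimage.median_filter(signal, size=3))           # the 1000 spike is gone
print(np.convolve(signal, np.ones(3) / 3, mode='same'))  # a box filter just smears it

A = np.array([[1, 1, 1], [1, 1, 2], [2, 2, 2]])
B = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
print(np.median(A), np.median(B), np.median(A + B))    # 1.0 0.0 2.0, and 1 + 0 != 2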
Image − Blur = Details
Filtering – Sharpening
Image + α · Details = "Sharpened"
[Figures: the result for α = 0 (unchanged), α = 1, α = 2, and α = 10 (extreme sharpening)]
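A minimal sketch of the whole sharpening recipe (mine, assuming SciPy's ndimage; the image is a random stand-in): blur, subtract to get the details, then add α times the details back.

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
image = rng.random((64, 64))                       # stand-in grayscale image

blur = ndimage.uniform_filter(image, size=9)       # box-filter blur
details = image - blur                             # what the blur removed

for alpha in (0, 1, 2, 10):
    sharpened = image + alpha * details            # alpha = 0 returns the original
    print(alpha, float(sharpened.min()), float(sharpened.max()))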
Filtering
Derivative Dx Derivative Dy
Images as Functions or Points
Key idea: we can treat an image as a point in R^(H×W) or as a function of x, y.
Approximate: ∂f(x, y)/∂x ≈ ( f(x+1, y) − f(x, y) ) / 1   → filter [-1 1]
Another one: ∂f(x, y)/∂x ≈ ( f(x+1, y) − f(x−1, y) ) / 2   → filter [-1 0 1]
Other Differentiation Operators
Prewitt
  Horizontal:        Vertical:
  −1 0 1              1  1  1
  −1 0 1              0  0  0
  −1 0 1             −1 −1 −1
Sobel
  Horizontal:        Vertical:
  −1 0 1              1  2  1
  −2 0 2              0  0  0
  −1 0 1             −1 −2 −1
Why not just use [-1, 1] or [-1, 0, 1]?
- A single row of pixels is sensitive to noise, so we do not apply the derivative to one row alone but across 3 rows: this gives an average gradient over those rows, which softens possible noise.
- But averaging can go too far: when we care about one specific row, we lose much of what makes the detail of that specific row.
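A minimal sketch (mine, assuming SciPy's ndimage) of applying the Sobel kernels from the table above; the horizontal kernel is the central difference [-1 0 1] combined with [1 2 1] smoothing across rows, which is what makes it less noise-sensitive than a bare [-1 0 1].

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
image = rng.random((64, 64))                       # stand-in grayscale image

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)      # horizontal derivative
sobel_y = np.array([[ 1,  2,  1],
                    [ 0,  0,  0],
                    [-1, -2, -1]], dtype=float)    # vertical derivative

# Sobel separates: [1 2 1]^T (smoothing) times [-1 0 1] (derivative).
print(np.allclose(np.outer([1, 2, 1], [-1, 0, 1]), sobel_x))   # True

Ix = ndimage.correlate(image, sobel_x, mode='nearest')
Iy = ndimage.correlate(image, sobel_y, mode='nearest')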
Image Gradient
Compute derivatives Ix and Iy with filters
Ix Iy
Image Gradient Magnitude
Gradient Magnitude: (Ix² + Iy²)^(1/2)
Gives rate of change at each pixel
Image Gradient Direction
Gradient Direction: atan2(Iy, Ix)
Gives direction of change at each pixel
∇f = (∂f/∂x, 0)    ∇f = (0, ∂f/∂y)    ∇f = (∂f/∂x, ∂f/∂y)
Figure Credit: S. Seitz
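From the two derivative images, magnitude and direction are per-pixel operations. A minimal sketch (mine, assuming SciPy's ndimage, whose sobel takes axis=0 for y and axis=1 for x):

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
image = rng.random((64, 64))

Ix = ndimage.sobel(image, axis=1, mode='nearest')  # derivative along x
Iy = ndimage.sobel(image, axis=0, mode='nearest')  # derivative along y

magnitude = np.sqrt(Ix**2 + Iy**2)                 # rate of change at each pixel
direction = np.arctan2(Iy, Ix)                     # direction of change, in radians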
Filtering – Missing Data
(Image ⁎ filter), divided per-element by (Binary Mask ⁎ filter)
Filtering – Missing Data
[Figures: before and after]
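One way to read the per-element divide above: blur the masked image and blur the mask with the same filter, then divide, so each output pixel is a weighted average over only the valid pixels. A minimal sketch under that reading (mine, assuming SciPy's ndimage):

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
image = rng.random((64, 64))
mask = (rng.random((64, 64)) > 0.3).astype(float)  # 1 = valid pixel, 0 = missing

x = np.arange(-3, 4)
g1 = np.exp(-x**2 / 4.0)
g1 /= g1.sum()
g = np.outer(g1, g1)                               # small 2D Gaussian

num = ndimage.correlate(image * mask, g, mode='nearest')   # Image * Mask, filtered
den = ndimage.correlate(mask, g, mode='nearest')           # Binary Mask, filtered
filled = np.where(den > 1e-6, num / np.maximum(den, 1e-6), 0.0)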
Why? Where do Edges Come From?
- Surface Normal / Orientation Discontinuity
- Surface Color / Reflectance Discontinuity
- Illumination Discontinuity
Recap: Image Gradient
Compute derivatives Ix and Iy with filters
Gradient Magnitude: (Ix² + Iy²)^(1/2) gives the rate of change at each pixel
Gradient Direction: atan2(Iy, Ix) gives the direction of change at each pixel
Gaussian Derivative Filter (1 pixel, 3 pixels, 7 pixels)
Figure: detectron2
• In some cases, we can directly “use” the useful representations.
Inductive biases
Objective: how to train?
Data: what to train on? A huge, diverse, general-domain corpus, loosely organized and not labeled, vs. task-specific annotations.
Large pre-trained language models
Zhao, Wayne Xin, et al. "A Survey of Large Language Models." arXiv preprint arXiv:2303.18223 (2023).
Visual Pre-training has a longer history…
Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
Architecture / Objective / Data
1. How to design neural network architectures
Convolutional Neural Networks (1986)
[Learning Internal Representations by Error Propagation. Rumelhart et al., 1986]
ConvNet using BP:
- Receptive field
- Translation equivariance
- Trained by error propagation

LeNet (1989)
[Backpropagation Applied to Handwritten Zip Code Recognition, LeCun et al., 1989]

AlexNet (2012)
[Krizhevsky, Sutskever and Hinton, 2012]