Unit 3: Making Machines See
1. The main goal of computer vision is to:
A) Recognize and interpret visual data
B) Convert text to speech
C) Process audio signals
D) Perform numerical computations
2. Convolutional Neural Networks (CNNs) are primarily used for:
A) Text analysis
B) Image and video processing
C) Audio recognition
D) Time-series forecasting
3. The convolution layer in a CNN is used to:
A) Extract features from images
B) Normalize data
C) Reduce dataset size
D) Fully connect neurons
4. Pooling layers in a CNN are used to:
A) Reduce spatial dimensions of feature maps
B) Increase dataset size
C) Normalize images
D) Detect edges
5. Which of the following is NOT a computer vision application?
A) Facial recognition
B) Self-driving cars
C) Speech-to-text conversion
D) Medical imaging
6. Edge detection is used to:
A) Blur images
B) Identify object boundaries
C) Increase color intensity
D) Compress images
7. Image classification refers to:
A) Assigning a label to an entire image
B) Detecting the locations of objects
C) Segmenting the image
D) Augmenting the dataset
8. Object detection differs from image classification because it:
A) Only classifies one object
B) Identifies both class and location of multiple objects
C) Converts images into audio
D) Generates synthetic images
9. Image segmentation refers to:
A) Dividing an image into regions for analysis
B) Reducing dataset size
C) Normalizing images
D) Labeling data points
10. Data augmentation is used to:
A) Reduce dataset size
B) Increase the size and diversity of the dataset
C) Remove missing data
D) Evaluate models
11. Which of the following is a commonly used dataset for handwritten digit recognition?
A) CIFAR-10
B) MNIST
C) ImageNet
D) COCO
12. Which dataset is used for large-scale object recognition?
A) MNIST
B) CIFAR-10
C) ImageNet
D) Fashion-MNIST
13. Image embedding refers to:
A) Converting images into numerical vectors for ML models
B) Compressing images
C) Detecting edges
D) Segmentation
14. Feature maps in a CNN represent:
A) Raw pixel values
B) Extracted features such as edges, textures, and patterns
C) Data labels
D) Model weights
15. The ReLU activation function is used in a CNN to:
A) Introduce non-linearity
B) Normalize images
C) Pool feature maps
D) Reduce overfitting
16. Transfer learning in computer vision allows:
A) Using pre-trained models for new tasks
B) Training a model from scratch
C) Reducing dataset size
D) Normalizing images
17. Object tracking in video is an application of:
A) Natural language processing
B) Computer vision
C) Reinforcement learning
D) Speech recognition
18. Semantic segmentation labels:
A) Each pixel of the image with a class
B) Entire image with one class
C) Bounding boxes of objects
D) Data features
19. Instance segmentation differs from semantic segmentation because it:
A) Labels individual objects separately
B) Labels the entire image
C) Reduces dataset size
D) Performs classification
20. YOLO and SSD are examples of:
A) Image classification algorithms
B) Object detection algorithms
C) Segmentation algorithms
D) Data augmentation techniques
21. GANs (Generative Adversarial Networks) in computer vision are used for:
A) Generating realistic images
B) Image classification
C) Feature extraction
D) Edge detection
22. Image pre-processing includes:
A) Resizing, normalization, and augmentation
B) Model evaluation
C) Labeling objects
D) Generating datasets
23. Which of the following is a challenge in computer vision?
A) Occlusion of objects
B) Variations in lighting
C) Complex backgrounds
D) All of the above
24. Which metric is most commonly used to summarize object detection performance?
A) Precision
B) Recall
C) mAP (mean Average Precision)
D) RMSE
25. Facial recognition systems rely on:
A) Audio processing
B) Image feature extraction
C) Text analytics
D) Statistical regression
26. Optical Character Recognition (OCR) is used to:
A) Convert images of text into machine-readable text
B) Detect objects
C) Segment images
D) Compress datasets
27. Edge detectors include:
A) Sobel
B) Canny
C) Prewitt
D) All of the above
28. The convolution operation in a CNN involves:
A) Sliding a kernel over the image
B) Normalizing images
C) Segmenting objects
D) Reducing dataset size
29. Pooling can be:
A) Max pooling
B) Average pooling
C) Both A and B
D) None of the above
30. Batch normalization in a CNN is used to:
A) Reduce internal covariate shift
B) Detect edges
C) Augment data
D) Segment images
31. Transfer learning is especially useful when:
A) The dataset is small
B) The dataset is large
C) Images are grayscale
D) No labels are available
32. Which of the following is a common deep learning library for computer vision?
A) TensorFlow
B) Keras
C) PyTorch
D) All of the above
33. In computer vision, an RGB image has:
A) 1 channel
B) 2 channels
C) 3 channels
D) 4 channels
34. A grayscale image has:
A) 1 channel
B) 2 channels
C) 3 channels
D) 4 channels
35. Image convolution helps in:
A) Feature extraction
B) Dataset splitting
C) Label encoding
D) Model deployment
36. Residual Networks (ResNet) help in:
A) Mitigating the vanishing gradient problem
B) Normalizing data
C) Augmenting images
D) Segmentation
37. Data labeling in computer vision is required for:
A) Supervised learning
B) Unsupervised learning
C) Data augmentation
D) Pooling
38. The HOG (Histogram of Oriented Gradients) descriptor is used for:
A) Feature extraction
B) Pooling
C) Convolution
D) Classification
39. Feature pyramid networks (FPN) are used to:
A) Detect objects at multiple scales
B) Segment images
C) Normalize features
D) Train models
40. Autoencoders in vision are used for:
A) Image compression and reconstruction
B) Object detection
C) Classification
D) Pooling
41. Image super-resolution improves:
A) Dataset size
B) Image quality and resolution
C) Model speed
D) Label accuracy
42. Image captioning is a combination of:
A) Computer vision and NLP
B) Image processing only
C) Audio processing
D) Regression analysis
43. Image histogram equalization is used to:
A) Improve contrast
B) Detect objects
C) Segment images
D) Pool images
44. Edge detection is used in:
A) Feature extraction
B) Data augmentation
C) Model training
D) Image compression
45. Object recognition is different from object detection because:
A) It identifies objects but does not locate them
B) It locates objects only
C) It segments objects
D) It augments images
46. CNNs trained from scratch typically require:
A) Large amounts of labeled data
B) No data
C) Only small datasets
D) Only grayscale images
47. Data augmentation techniques include:
A) Rotation
B) Flipping
C) Cropping
D) All of the above
48. A major challenge in self-driving car vision systems is:
A) Object occlusion and dynamic environments
B) Image storage
C) Data labeling only
D) Pooling
49. The output of semantic segmentation is:
A) Each pixel labeled with a class
B) Entire image classified
C) Only edges detected
D) Bounding boxes only
50. In computer vision, the term “feature map” refers to:
A) A matrix representing features detected in the input
B) Dataset matrix
C) Model weights
D) Label matrix
Answer Key – Unit 3: Making Machines See
1. A – Recognize and interpret visual data
2. B – Image and video processing
3. A – Extract features from images
4. A – Reduce spatial dimensions of feature maps
5. C – Speech-to-text conversion
6. B – Identify object boundaries
7. A – Assigning a label to an entire image
8. B – Identifies both class and location of multiple objects
9. A – Dividing an image into regions for analysis
10. B – Increase the size and diversity of the dataset
11. B – MNIST
12. C – ImageNet
13. A – Converting images into numerical vectors for ML models
14. B – Extracted features such as edges, textures, and patterns
15. A – Introduce non-linearity
16. A – Using pre-trained models for new tasks
17. B – Computer vision
18. A – Each pixel of the image with a class
19. A – Labels individual objects separately
20. B – Object detection algorithms
21. A – Generating realistic images
22. A – Resizing, normalization, and augmentation
23. D – All of the above
24. C – mAP (mean Average Precision)
25. B – Image feature extraction
26. A – Convert images of text into machine-readable text
27. D – All of the above
28. A – Sliding a kernel over the image
29. C – Both A and B
30. A – Reduce internal covariate shift
31. A – The dataset is small
32. D – All of the above
33. C – 3 channels
34. A – 1 channel
35. A – Feature extraction
36. A – Mitigating the vanishing gradient problem
37. A – Supervised learning
38. A – Feature extraction
39. A – Detect objects at multiple scales
40. A – Image compression and reconstruction
41. B – Image quality and resolution
42. A – Computer vision and NLP
43. A – Improve contrast
44. A – Feature extraction
45. A – It identifies objects but does not locate them
46. A – Large amounts of labeled data
47. D – All of the above
48. A – Object occlusion and dynamic environments
49. A – Each pixel labeled with a class
50. A – A matrix representing features detected in the input
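
Worked example (supplementary to Questions 3, 4, 15, 28 and 29)
The following is a minimal sketch, not a reference implementation, of how a kernel slides over an image to produce a feature map, how ReLU then zeroes out negative responses, and how 2x2 max pooling shrinks the result. It uses plain Python lists so that it runs without any library. The 6x6 image, the function names (conv2d, relu, max_pool2x2) and the choice of a Sobel kernel are assumptions made for illustration only; deep learning libraries implement the same ideas far more efficiently and learn their kernels from data.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    As in most CNN libraries, the kernel is not flipped, so strictly speaking
    this is cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Replace every negative value with 0 (the non-linearity)."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, len(fmap[0]) - 1, 2)
        ]
        for r in range(0, len(fmap) - 1, 2)
    ]


# Hypothetical 6x6 grayscale image (1 channel): a bright vertical band
# in the middle two columns on a dark background.
image = [[0, 0, 1, 1, 0, 0] for _ in range(6)]

# Sobel kernel that responds to horizontal changes in intensity.
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

feature_map = conv2d(image, sobel_x)   # 4x4, every row is [4, 4, -4, -4]
activated = relu(feature_map)          # 4x4, every row is [4, 4, 0, 0]
pooled = max_pool2x2(activated)        # 2x2: [[4, 0], [4, 0]]

print(feature_map[0])
print(activated[0])
print(pooled)
```

The positive responses mark the dark-to-bright edge of the band and the negative ones mark the bright-to-dark edge. After ReLU only the first edge remains, and pooling halves each spatial dimension while keeping the strongest response in each window.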