GPU_Assignment-3_Solution

NPTEL Online Certification Courses

Indian Institute of Technology Kharagpur

GPU Architectures
and Programming
Assignment- Week 3

TYPE OF QUESTION: Objective

Number of questions: 10
Total marks: 10 × 1 = 10

QUESTION 1:
How are CUDA threads invoked to execute a kernel from the host?
Options:
A) Using a loop structure
B) With the <<<...>>> execution configuration syntax
C) By specifying thread IDs in the main function
D) Automatically by the GPU scheduler
Answer:
B) With the <<<...>>> execution configuration syntax
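A minimal sketch of such a launch (the kernel name and launch parameters below are illustrative, not taken from the assignment):

```cuda
// __global__ marks the kernel as launchable from the host.
__global__ void myKernel(int *data) {
    data[threadIdx.x] = threadIdx.x;
}

int main() {
    int *d_data;
    cudaMalloc((void **)&d_data, 64 * sizeof(int));
    // The host invokes the kernel with the <<<blocks, threadsPerBlock>>> syntax:
    myKernel<<<1, 64>>>(d_data);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```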
QUESTION 2:
What is the purpose of the threadIdx built-in variable in a CUDA kernel?
Options:
A) Provides a random number
B) Identifies the current CUDA block
C) Gives the total number of threads
D) Provides a unique identifier for each thread
Answer:
D) Provides a unique identifier for each thread
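In practice, threadIdx is combined with blockIdx and blockDim to derive a globally unique index per thread, as in this sketch (the kernel and array names are hypothetical):

```cuda
__global__ void scaleArray(float *a, int n) {
    // threadIdx.x is unique only within a block; adding the block
    // offset yields a unique index across the whole grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] *= 2.0f;
}
```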
QUESTION 3:
Any function that is launched from the host and executed on the GPU as a kernel should be qualified with which
keyword?
Options:
A) __device__
B) __host__
C) __kernel__
D) __global__
Answer:
D) __global__
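The qualifiers differ in where the function runs and from where it may be called; a sketch contrasting __global__ with __device__ (names are illustrative):

```cuda
// GPU-only helper: callable from device code, not launchable from the host.
__device__ float square(float x) { return x * x; }

// Kernel: launched from the host, executed on the GPU.
__global__ void squareAll(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] = square(a[i]);
}
```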
QUESTION 4:
What does the <<<1, N>>> syntax signify in the kernel invocation VecAdd<<<1, N>>>(A, B, C)?
Options:
A) 1 block of threads, N threads per block
B) N blocks of threads, 1 thread per block
C) N blocks with variable thread count
D) 1 thread per block, 1 block in total
Answer:
A) 1 block of threads, N threads per block
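A sketch of such a kernel: with a single block, threadIdx.x alone identifies each of the N threads, so no block offset is needed.

```cuda
__global__ void VecAdd(float *A, float *B, float *C) {
    int i = threadIdx.x;  // one block, so threadIdx.x is already the global index
    C[i] = A[i] + B[i];
}

// Launch: 1 block of N threads, each adding one pair of elements.
// VecAdd<<<1, N>>>(A, B, C);
```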

QUESTION 5:
Given a GPU with 10 streaming multiprocessors, each supporting a maximum of 1024 threads per SM, and a
CUDA kernel is launched with a block size of 128 threads, calculate the maximum number of active blocks
on the GPU.
Options:
A. 80
B. 100
C. 200
D. 1280
Answer:
A. 80
Detailed Solution:
Maximum active blocks per SM = Threads per SM / Threads per block = 1024 / 128 = 8
Maximum active blocks on the GPU = Blocks per SM × Number of SMs = 8 × 10 = 80

QUESTION 6:
Calculate the execution time (in seconds) for a CUDA kernel that processes 8192 elements with a block size
of 128 threads and an average execution time of 2 milliseconds per block, considering that only one SM is
available on the target GPU for executing the blocks.
Options:
A. 0.512 seconds
B. 0.256 seconds
C. 1.024 seconds
D. 0.128 seconds
Answer:
D. 0.128 seconds
Detailed Solution:
Number of blocks = 8192 / 128 = 64. With a single SM executing the blocks one after another,
execution time = 64 × 2 ms = 128 ms = 0.128 seconds.
QUESTION 7:
Given a CUDA kernel with a grid size of 2 blocks and 256 threads per block, calculate the total number of
threads launched by the kernel.
Options:
A. 256
B. 512
C. 1024
D. 4096
Answer:
B. 512
Detailed Solution:
Total threads launched = Number of blocks × Threads per block = 2 × 256 = 512
QUESTION 8:
What is the CUDA function call required to copy an array h_A from the CPU memory to the GPU
memory, where it is known as d_A?
Options:
A. cudaMemcpy(h_A, d_A, size, cudaMemcpyHostToDevice);
B. cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
C. cudaMemcpy(h_A, d_A, size, cudaMemcpyDeviceToHost);
D. cudaMemcpy(d_A, h_A, size, cudaMemcpyDeviceToHost);
Answer:
B. cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
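In context: the destination pointer comes first, the source second, followed by the byte count and the direction flag. A sketch (N and the surrounding setup are illustrative):

```cuda
int N = 1024;
size_t size = N * sizeof(float);
float *h_A = (float *)malloc(size);  // host array
float *d_A;
cudaMalloc((void **)&d_A, size);     // device array
// Destination first, source second, then size in bytes and direction:
cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
```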

QUESTION 9:
Which of the following options is true regarding the matrix multiplication kernel in the code shown
below:

// d_M and d_N are the input matrices, N is the number of rows and
// columns, and d_P is the product matrix
__global__ void MatrixMulKernel(float *d_M, float *d_N, float *d_P, int N) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if ((i < N) && (j < N)) {
        float Pvalue = 0.0f;
        for (int k = 0; k < N; ++k) {
            Pvalue += d_M[i * N + k] * d_N[k * N + j];
        }
        d_P[i * N + j] = Pvalue;
    }
}
Options:
A. The kernel iterates over each element of the output matrix (d_P) parallelly and calculates its
value using a nested loop that iterates over the corresponding row of the first matrix (d_M)
and the corresponding column of the second matrix (d_N) sequentially.
B. The kernel iterates over each element of the output matrix (d_P) sequentially and calculates
its value using a nested loop that iterates over the corresponding row of the first matrix
(d_M) and the corresponding column of the second matrix (d_N) parallelly.
C. The computation of individual elements in the product matrix d_P can be carried out
parallelly using threads along a different dimension than the ones used for the parallel
computation of the entire product matrix.
D. The computation of individual elements in the product matrix d_P can be carried out
parallelly using threads along one of the same dimensions as the ones used for the parallel
computation of the entire product matrix.
Answer:
A. The kernel iterates over each element of the output matrix (d_P) parallelly and calculates its
value using a nested loop that iterates over the corresponding row of the first matrix (d_M) and the
corresponding column of the second matrix (d_N) sequentially.
Detailed Solution: Each thread computes one element d_P[i][j], so the output elements are produced
in parallel; within each thread, the loop over k walks row i of d_M and column j of d_N sequentially.
QUESTION 10:
Which of the following statements regarding CUDA memory allocation is false?
Options:
A. It is possible to allocate memory in a CUDA device kernel for an integer array.
B. It is possible to allocate memory in a CUDA device kernel by passing the pointer to an
integer array.
C. An array created inside a CUDA device kernel cannot be directly dereferenced in the host
side.
D. An array created inside a CUDA device kernel can be copied to another CUDA device
kernel by calling the function cudaMemcpy using the flag cudaMemcpyDeviceToDevice.
Answer: B. It is possible to allocate memory in a CUDA device kernel by passing the pointer to an
integer array.
Detailed Solution: Memory is not allocated by passing a typed pointer to an integer array; the
pointer has to be cast to a void pointer before passing (for example, cudaMalloc expects a void **
argument, and in-kernel malloc returns a void *).
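For reference, device-side allocation (as in option A) uses the in-kernel malloc/free pair, which works with untyped pointers; a sketch (the kernel name is illustrative):

```cuda
__global__ void deviceAllocKernel(int n) {
    // In-kernel malloc returns a void *, which is cast to the element
    // type; the resulting array lives in device global memory and
    // cannot be dereferenced directly on the host.
    int *arr = (int *)malloc(n * sizeof(int));
    if (arr != NULL) {
        for (int k = 0; k < n; ++k)
            arr[k] = k;
        free(arr);
    }
}
```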
