GPU Computing MCQ FillBlanks QA

The document contains multiple-choice questions and fill-in-the-blank questions related to GPU computing and CUDA programming. Key topics include GPU architecture, CUDA functions, memory types, and parallel processing concepts. It also covers OpenCL and performance considerations in GPU applications.


GPU Computing – MCQs & Fill in the Blanks

(Questions with Answers)

Q1. The primary goal of evolving GPU architectures is to improve ____________.
Answer: Parallel processing capability

Q2. In CUDA, a group of threads that execute the same instruction simultaneously is called a ____________.
Answer: Warp

Q3. __________ memory is read-only and cached, making it suitable for frequently accessed
constants.
Answer: Constant memory
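As a sketch of Q3 in practice, a small table of coefficients (hypothetical names) can be placed in constant memory, where broadcast reads by many threads are served from the constant cache:

```cuda
// Hypothetical filter coefficients, stored once in cached, read-only
// constant memory (limited to 64 KB on most devices).
__constant__ float coeffs[16];

__global__ void scaleByCoeff(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * coeffs[i % 16];  // read served by the constant cache
}

// Host side: copy the table into the constant symbol before launching.
// cudaMemcpyToSymbol(coeffs, hostCoeffs, sizeof(float) * 16);
```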

Q4. Global memory in CUDA is accessible by all threads but has higher latency compared to
shared memory. (True/False)
Answer: True

Q5. In CUDA, a function executed on the GPU is called a ____________.
Answer: Kernel
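A minimal example of Q5's terminology: the `__global__` qualifier marks a function as a kernel, which the host launches with the `<<<blocks, threads>>>` syntax (the function and array names here are illustrative):

```cuda
__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        data[i] += 1.0f;
}

// Host code launches the kernel; control returns to the CPU immediately,
// since kernel launches are asynchronous (see Q23).
// addOne<<<(n + 255) / 256, 256>>>(d_data, n);
```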

Q6. The main purpose of using multiple GPUs in a system is to ____________.
Answer: Increase computational throughput

Q7. Using ____________ allows a CUDA application to scale beyond the capacity of a single GPU.
Answer: Multi-GPU programming

Q8. Increasing the number of threads always guarantees better CUDA performance. (True/False)
Answer: False

Q9. The CUDA function used to check the last error generated by a kernel launch is ____________.
Answer: cudaGetLastError()

Q10. The return type cudaError_t represents ____________.
Answer: CUDA runtime error status
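Q9 and Q10 together describe the usual error-checking idiom: runtime API calls return a cudaError_t directly, while kernel launches return nothing, so cudaGetLastError() is checked after the launch. A common sketch (the CHECK macro name is an assumption, not part of the CUDA API):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: checks a cudaError_t and aborts on failure.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err = (call);                                \
        if (err != cudaSuccess) {                                \
            fprintf(stderr, "CUDA error: %s\n",                  \
                    cudaGetErrorString(err));                    \
            exit(1);                                             \
        }                                                        \
    } while (0)

// Usage sketch:
// CHECK(cudaMalloc(&d_ptr, bytes));   // runtime calls return cudaError_t
// myKernel<<<grid, block>>>(d_ptr);
// CHECK(cudaGetLastError());          // catches launch-configuration errors
// CHECK(cudaDeviceSynchronize());     // catches errors during execution
```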

Q11. A race condition occurs when ____________.
Answer: Multiple threads access shared data without synchronization

Q12. Using atomic operations always eliminates all performance issues. (True/False)
Answer: False
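Q11 and Q12 are two sides of the same trade-off: an atomic operation makes a read-modify-write safe, but threads contending for the same address serialize. A histogram sketch (names illustrative):

```cuda
__global__ void histogram(const unsigned char *in, int n, unsigned int *bins) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        // Without atomicAdd, two threads incrementing the same bin would
        // race (Q11); with it, contended bins serialize, which is why
        // atomics fix correctness but not necessarily performance (Q12).
        atomicAdd(&bins[in[i]], 1u);
}
```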

Q13. OpenCL kernels are typically written using a ____________.
Answer: C-based language

Q14. ____________ provides a platform-independent programming framework.
Answer: OpenCL

Q15. The OpenCL host communicates with devices through a ____________.
Answer: Command queue

Q16. OpenCL is primarily designed for ____________.
Answer: Heterogeneous parallel computing

Q17. The prefix sum pattern computes the ____________ sum of elements in an array.
Answer: Cumulative

Q18. Sparse matrices store only ____________ elements.
Answer: Non-zero

Q19. In CUDA, a collection of blocks forms a ____________.
Answer: Grid
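Q19's hierarchy (threads form blocks, blocks form a grid) appears directly in the launch configuration. A 2-D sketch with illustrative sizes:

```cuda
// A 2-D grid of 2-D blocks, e.g. covering a width x height image.
dim3 block(16, 16);                          // 256 threads per block
dim3 grid((width  + block.x - 1) / block.x,  // ceiling division so the
          (height + block.y - 1) / block.y); // grid covers every pixel
// myKernel<<<grid, block>>>(...);

// Inside the kernel, each thread recovers its pixel coordinates:
// int x = blockIdx.x * blockDim.x + threadIdx.x;
// int y = blockIdx.y * blockDim.y + threadIdx.y;
```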

Q20. GPUs achieve high performance mainly through massive ____________ execution.
Answer: Parallel

Q21. Which CUDA memory is shared among all threads within a block?
Answer: Shared memory

Q22. The CUDA hardware component responsible for executing warps is the ____________.
Answer: Streaming Multiprocessor

Q23. CUDA programs execute on the GPU while control remains with the ____________.
Answer: Host CPU

Q24. Excessive host-device data transfer affects performance due to ____________.
Answer: PCIe latency

Q25. Threads within a CUDA block are synchronized using ____________.
Answer: __syncthreads()
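Q21 and Q25 combine in the standard staging pattern: threads load a tile into shared memory, then __syncthreads() guarantees every element is written before any thread reads a neighbour's value. A block-reversal sketch with illustrative names:

```cuda
__global__ void reverseBlock(float *data) {
    __shared__ float tile[256];   // visible to all threads in the block (Q21)
    int t = threadIdx.x;

    tile[t] = data[blockIdx.x * blockDim.x + t];
    __syncthreads();              // barrier: the whole tile is now loaded

    // Reading another thread's element is safe only after the barrier;
    // omitting __syncthreads() here is exactly the race condition of Q27.
    data[blockIdx.x * blockDim.x + t] = tile[blockDim.x - 1 - t];
}
```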

Q26. CUDA errors are returned as values of type ____________.
Answer: cudaError_t

Q27. Failure to synchronize threads correctly may result in ____________ conditions.
Answer: Race

Q28. The function used to convert CUDA error codes into readable strings is ____________.
Answer: cudaGetErrorString()

Q29. In OpenCL, a function executed on a compute device is called a ____________.
Answer: Kernel

Q30. A collection of work-items in OpenCL is called a ____________.
Answer: Work-group

Q31. The prefix sum operation is also known as ____________.
Answer: Scan
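Q17 and Q31 refer to the same pattern. A minimal single-block inclusive scan in the Hillis-Steele style (illustrative, in-place, and not work-efficient; assumes n ≤ 256):

```cuda
__global__ void inclusiveScan(float *data, int n) {
    __shared__ float temp[256];
    int t = threadIdx.x;
    if (t < n) temp[t] = data[t];
    __syncthreads();

    // At each step, element t adds in the value 'offset' positions to its
    // left; after log2(n) steps each slot holds the cumulative sum (Q17).
    for (int offset = 1; offset < n; offset *= 2) {
        float val = (t >= offset) ? temp[t - offset] : 0.0f;
        __syncthreads();          // everyone reads before anyone writes
        if (t < n) temp[t] += val;
        __syncthreads();
    }
    if (t < n) data[t] = temp[t];
}
```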

Q32. Convolution is commonly used in ____________.
Answer: Signal and image processing

Q33. Convolution can be efficiently parallelized by dividing input data. (True/False)
Answer: True

Q34. Matrix multiplication performance depends heavily on efficient ____________ access.
Answer: Memory
