GPU Computing – MCQs & Fill in the Blanks
(Questions with Answers)
Q1. The primary goal of evolving GPU architectures is to improve
Answer: Parallel processing capability
Q2. In CUDA, a group of threads that execute the same instruction simultaneously is called
Answer: Warp
Q3. __________ memory is read-only and cached, making it suitable for frequently accessed
constants.
Answer: Constant memory
Q4. Global memory in CUDA is accessible by all threads but has higher latency compared to
shared memory. (True/False)
Answer: True
Q5. In CUDA, a function executed on the GPU is called
Answer: Kernel
Q6. The main purpose of using multiple GPUs in a system is to
Answer: Increase computational throughput
Q7. Using ____________ allows a CUDA application to scale beyond the capacity of a single GPU.
Answer: Multi-GPU programming
Q8. Increasing the number of threads always guarantees better CUDA performance. (True/False)
Answer: False
Q9. The CUDA function used to check the last error generated by a kernel launch is
Answer: cudaGetLastError()
Q10. The return type cudaError_t represents
Answer: CUDA runtime error status
Q11. A race condition occurs when
Answer: Multiple threads access shared data without synchronization
Q12. Using atomic operations always eliminates all performance issues. (True/False)
Answer: False
Q13. OpenCL kernels are typically written using
Answer: A C-based language (OpenCL C)
Q14. ________ provides a platform-independent programming framework.
Answer: OpenCL
Q15. The OpenCL host communicates with devices through ____________.
Answer: Command queue
Q16. OpenCL is primarily designed for
Answer: Heterogeneous parallel computing
Q17. The prefix sum pattern computes the ____________ sum of elements in an array.
Answer: Cumulative
Q18. Sparse matrices store only ____________ elements.
Answer: Non-zero
Q19. In CUDA, a collection of blocks forms a ____________.
Answer: Grid
Q20. GPUs achieve high performance mainly through massive ____________ execution.
Answer: Parallel
Q21. Which CUDA memory is shared among all threads within a block?
Answer: Shared memory
Q22. The CUDA hardware component responsible for executing warps is the
Answer: Streaming Multiprocessor (SM)
Q23. CUDA programs execute on the GPU while control remains with the
Answer: Host CPU
Q24. Excessive host-device data transfer affects performance due to
Answer: PCIe transfer latency and limited bandwidth
Q25. Threads within a CUDA block are synchronized using
Answer: __syncthreads()
Q26. CUDA errors are returned as values of type
Answer: cudaError_t
Q27. Failure to synchronize threads correctly may result in ____________ conditions.
Answer: Race
Q28. The function used to convert CUDA error codes into readable strings is
Answer: cudaGetErrorString()
Q29. In OpenCL, a function executed on a compute device is called a
Answer: Kernel
Q30. A collection of work-items in OpenCL is called a
Answer: Work-group
Q31. The prefix sum operation is also known as
Answer: Scan
Q32. Convolution is commonly used in
Answer: Signal and image processing
Q33. Convolution can be efficiently parallelized by dividing input data. (True/False)
Answer: True
Q34. Matrix multiplication performance depends heavily on efficient ____________ access.
Answer: Memory