Setting Up Google Colab for CUDA

Google Colab provides a cloud-based notebook environment with free (usage-limited) access to NVIDIA GPUs. This makes it a convenient platform for developing and testing CUDA (Compute Unified Device Architecture) programs without expensive local hardware. On a Colab GPU runtime, you can compile CUDA C++ code with the nvcc compiler and run it directly on the GPU.

Step-by-Step Setup Guide

Step 1: Enable GPU Hardware Acceleration

Before running any CUDA code, attach a GPU to the Colab virtual machine. By default, a new notebook runs on a CPU-only runtime.

1. Navigate to the Edit menu and select Notebook settings (or go to Runtime > Change runtime type).

Screenshot: selecting Notebook settings

2. Under the Hardware accelerator dropdown, select T4 GPU (or any available GPU) and click Save.

Screenshot: selecting the T4 GPU

Step 2: Verify GPU Allocation

Ensure that a GPU is attached to your Colab runtime. The following command confirms the presence and model of the NVIDIA GPU, along with its driver version.

!nvidia-smi

Screenshot: verifying GPU allocation with nvidia-smi

Explanation:

The command above prints information about the NVIDIA GPU allocated to your Colab session, including its name (e.g., Tesla T4), its memory usage, and the highest CUDA version supported by the installed driver.
If the command fails or lists no GPU, enable one under Runtime > Change runtime type > Hardware accelerator > T4 GPU.
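nvidia-smi reports what the driver sees. As a cross-check, a short CUDA program can ask the CUDA runtime the same question through cudaGetDeviceCount and cudaGetDeviceProperties. The sketch below assumes it is run in a %%cuda cell once the nvcc4jupyter extension from Step 5 is loaded, or saved to a .cu file and compiled with nvcc; the helper name visibleDeviceCount is illustrative, not part of any library.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative helper: number of CUDA devices visible to this process (0 on error).
int visibleDeviceCount() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 0;
    }
    return count;
}

int main() {
    int count = visibleDeviceCount();
    std::printf("CUDA devices found: %d\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        // Name, compute capability, and total global memory of each GPU
        std::printf("Device %d: %s, compute capability %d.%d, %.1f GiB\n",
                    dev, prop.name, prop.major, prop.minor,
                    prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return count > 0 ? 0 : 1;  // non-zero exit status when no GPU is attached
}
```

On a T4 runtime the compute capability line should read 7.5, since the T4 is a Turing-generation GPU.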

Step 3: Check NVCC Compiler Version

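Colab GPU runtimes come with the CUDA Toolkit pre-installed. The following command prints the version of nvcc, the NVIDIA CUDA Compiler, which is required to compile CUDA C++ code. This toolkit version need not match the CUDA version shown by nvidia-smi, which is the highest version the driver supports.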

!nvcc --version

Screenshot: checking the nvcc compiler version
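The same version information can also be read from inside a compiled program. This minimal sketch uses cudaRuntimeGetVersion (the runtime the program uses), cudaDriverGetVersion (the latest CUDA version the driver supports), and the CUDART_VERSION macro from the toolkit headers; all three encode a version as 1000 × major + 10 × minor. The helpers versionMajor, versionMinor, and printVersion are illustrative names, not CUDA APIs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CUDA packs a version as 1000 * major + 10 * minor, e.g. 12020 means 12.2.
// versionMajor/versionMinor are illustrative helpers, not CUDA APIs.
int versionMajor(int v) { return v / 1000; }
int versionMinor(int v) { return (v % 1000) / 10; }

void printVersion(const char* label, int v) {
    std::printf("%-22s %d.%d\n", label, versionMajor(v), versionMinor(v));
}

int main() {
    int runtimeVersion = 0;
    int driverVersion = 0;
    // Version of the CUDA runtime library this program is using
    cudaRuntimeGetVersion(&runtimeVersion);
    // Latest CUDA version supported by the installed driver (0 if no driver)
    cudaDriverGetVersion(&driverVersion);

    printVersion("CUDA runtime:", runtimeVersion);
    printVersion("CUDA driver supports:", driverVersion);
    // Compile-time version from the toolkit headers that nvcc used
    printVersion("CUDART_VERSION macro:", CUDART_VERSION);
    return 0;
}
```

If the driver line reports a lower version than the runtime line, programs built with this toolkit may fail to run on the attached driver.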

Step 4: Install the NVCC Jupyter Plugin

To write and run CUDA C++ code directly in a Colab notebook cell with the %%cuda cell magic, install the nvcc4jupyter package.

!pip install nvcc4jupyter

Screenshot: installing the nvcc4jupyter plugin

Explanation: This command installs the nvcc4jupyter Python package from PyPI. The plugin lets Jupyter (and therefore Colab) recognize cells containing CUDA code, compile them with nvcc, and display the program's output.

Step 5: Load the NVCC Jupyter Extension

After installing the plugin, explicitly load the extension into the notebook's kernel. This registers the %%cuda cell magic.

%load_ext nvcc4jupyter

Screenshot: loading the nvcc4jupyter extension

Explanation: This command tells the Jupyter kernel to load the nvcc4jupyter extension. Once it is loaded, starting a cell with %%cuda marks the cell's content as CUDA C++ code to be compiled with nvcc and run.

Step 6: Create and Run CUDA Program

In this step, write a simple CUDA C++ program directly in a cell. The program contains a kernel function that runs on the GPU (the device) and a main function that runs on the CPU (the host).

C++

%%cuda
#include <stdio.h>

// kernel function that runs on the GPU hardware
__global__ void simpleKernel() {
    printf("Hello world\n");
}

int main() {
    // Launch the kernel with 1 block of 1 thread
    simpleKernel<<<1, 1>>>();

    // Wait for the GPU to finish before the program exits; this also
    // flushes the kernel's printf output to the console
    cudaDeviceSynchronize();

    return 0;
}

Output

Hello world
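The hello-world kernel runs on a single thread. A natural next step, sketched below under the assumption of the same %%cuda cell setup, launches several blocks of threads, has each thread compute its global index from blockIdx, blockDim, and threadIdx, and checks for errors after the launch. Without such checks, a kernel that fails to launch (for example, when no GPU is attached) can produce no output and no error message. CUDA_CHECK, squareIndices, and runSquares are illustrative names defined here, not CUDA APIs.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative helper macro (not part of CUDA): print the error and exit
// if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Each thread computes its global index and writes index * index.
__global__ void squareIndices(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may have more threads than elements
        out[i] = i * i;
        printf("block %u, thread %u -> index %d\n", blockIdx.x, threadIdx.x, i);
    }
}

// Runs the kernel over n elements; returns true if every result is correct.
bool runSquares(int n, int threadsPerBlock) {
    int* d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(int)));

    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    squareIndices<<<blocks, threadsPerBlock>>>(d_out, n);
    CUDA_CHECK(cudaGetLastError());       // errors in the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors while the kernel ran

    std::vector<int> h_out(n);
    CUDA_CHECK(cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_out));

    for (int i = 0; i < n; ++i) {
        if (h_out[i] != i * i) return false;
    }
    return true;
}

int main() {
    // 10 elements with 4 threads per block -> 3 blocks (12 threads, 2 idle)
    bool ok = runSquares(10, 4);
    std::puts(ok ? "All results correct" : "Result mismatch");
    return ok ? 0 : 1;
}
```

The per-thread lines may appear in a different order from run to run, because CUDA does not guarantee the order in which blocks execute. Only the final host-side check depends on the results, not on that order.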