I’m trying to understand how CUDA_VISIBLE_DEVICES
behaves in a system with multiple A100 GPUs, some of which have MIG enabled.
Here’s the situation:
Tue Jul 15 16:01:30 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:07:00.0 Off | On |
| N/A 31C P0 51W / 400W | 87MiB / 81920MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:0B:00.0 Off | On |
| N/A 30C P0 47W / 400W | 87MiB / 81920MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM4-80GB On | 00000000:48:00.0 Off | On |
| N/A 43C P0 166W / 400W | 2195MiB / 81920MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM4-80GB On | 00000000:4C:00.0 Off | On |
| N/A 32C P0 47W / 400W | 87MiB / 81920MiB | N/A Default |
| | | Enabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A100-SXM4-80GB On | 00000000:88:00.0 Off | 0 |
| N/A 29C P0 57W / 400W | 4MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA A100-SXM4-80GB On | 00000000:8B:00.0 Off | 0 |
| N/A 32C P0 60W / 400W | 4MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA A100-SXM4-80GB On | 00000000:C8:00.0 Off | 0 |
| N/A 32C P0 63W / 400W | 1517MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA A100-SXM4-80GB On | 00000000:CB:00.0 Off | 0 |
| N/A 32C P0 66W / 400W | 22419MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+--------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+================================+===========+=======================|
| 2 1 0 0 | 2145MiB / 40192MiB | 42 0 | 3 0 2 0 0 |
| | 4MiB / 65535MiB | | |
+------------------+--------------------------------+-----------+-----------------------+
When I set CUDA_VISIBLE_DEVICES to a single index, the process ends up using:
- CUDA_VISIBLE_DEVICES=0 → physical GPU 4
- CUDA_VISIBLE_DEVICES=1 → physical GPU 5
- CUDA_VISIBLE_DEVICES=2 → physical GPU 6
- CUDA_VISIBLE_DEVICES=3 → physical GPU 7
- CUDA_VISIBLE_DEVICES=4 → a MIG instance on physical GPU 2
- CUDA_VISIBLE_DEVICES=5 or higher → an error
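For context on why these ordinals are slippery: CUDA_VISIBLE_DEVICES accepts not only integer ordinals but also GPU-<uuid> and MIG-<uuid> entries (the UUIDs that `nvidia-smi -L` prints), and the UUID forms sidestep enumeration order entirely. A rough sketch of the accepted entry forms, as a toy classifier of my own (the function name and example values are made up, and this is not the driver's actual parser):

```python
def classify_visible_devices(value):
    """Classify each comma-separated CUDA_VISIBLE_DEVICES entry.

    Illustrative only: mirrors the documented entry forms
    (ordinal, GPU-<uuid>, MIG-<uuid>), not NVIDIA's real parsing code.
    """
    kinds = []
    for entry in value.split(","):
        entry = entry.strip()
        if entry.startswith("MIG-"):
            kinds.append("mig-uuid")   # pins a specific MIG instance
        elif entry.startswith("GPU-"):
            kinds.append("gpu-uuid")   # pins a specific physical GPU
        elif entry.isdigit():
            kinds.append("ordinal")    # depends on CUDA's enumeration order
        else:
            kinds.append("invalid")
    return kinds

# Hypothetical UUID suffixes, for illustration only:
print(classify_visible_devices("0,1,MIG-abc,GPU-def"))
# -> ['ordinal', 'ordinal', 'mig-uuid', 'gpu-uuid']
```

So pinning work by MIG UUID (e.g., CUDA_VISIBLE_DEVICES=MIG-<uuid>) avoids depending on whatever ordinal the driver happens to assign.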
What I find particularly confusing is the last working mapping:
CUDA_VISIBLE_DEVICES=4
→ a MIG instance on physical GPU 2
especially since GPUs 0, 1, and 3 also have MIG mode enabled, yet (per the MIG-devices table above) only GPU 2 has an instance created, and only that one shows up. Is there a specific rule or logic that determines this kind of mapping?