This page describes GPU configuration for your Cloud Run worker pools. Google provides NVIDIA L4 GPUs with 24 GB of GPU memory (VRAM) and NVIDIA RTX PRO 6000 Blackwell GPUs with 96 GB of GPU memory (VRAM); in both cases, GPU memory is separate from the instance memory.
GPU on Cloud Run is fully managed, with no extra drivers or libraries needed. The GPU feature offers on-demand availability with no reservations needed, similar to the way on-demand CPU and on-demand memory work in Cloud Run.
Cloud Run instances with an attached L4 or NVIDIA RTX PRO 6000 Blackwell GPU start in approximately 5 seconds with drivers pre-installed, at which point the processes running in your container can start to use the GPU.
You can configure one GPU per Cloud Run instance. If you use sidecar containers, note that the GPU can only be attached to one container.
Supported GPU types
Cloud Run supports two types of GPUs:
- L4 GPU with the current NVIDIA driver version: 535.x.x (CUDA 12.2). For L4 GPUs, you must use a minimum of 4 CPU and 16 GiB of memory (see the example after this list).
- NVIDIA RTX PRO 6000 Blackwell GPU with the current NVIDIA driver version: 580.x.x (CUDA 13.0). For NVIDIA RTX PRO 6000 Blackwell GPUs, you must use a minimum of 20 CPU and 80 GiB of memory.
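For example, you can attach an L4 GPU at its minimum CPU and memory when creating a worker pool with the gcloud CLI. This is a minimal sketch: the worker pool name, image, and region are placeholders, and it assumes the worker pool commands accept the same `--gpu` and `--gpu-type` flags as Cloud Run service deployments.

```
# Sketch: create a worker pool with one L4 GPU at the minimum supported
# CPU and memory. Name, image, and region are placeholders; the GPU flags
# are assumed to mirror those on `gcloud beta run deploy`.
gcloud beta run worker-pools create my-gpu-pool \
  --image=us-docker.pkg.dev/my-project/my-repo/my-worker:latest \
  --region=europe-west4 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --cpu=4 \
  --memory=16Gi
```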
Supported regions
The following regions are supported by the L4 GPU:
- asia-southeast1 (Singapore)
- asia-south1 (Mumbai). This region is available by invitation only. Contact your Google Account team if you are interested in this region.
- europe-west1 (Belgium) Low CO2
- europe-west4 (Netherlands) Low CO2
- us-central1 (Iowa) Low CO2. This region is available by invitation only. Contact your Google Account team if you are interested in this region.
- us-east4 (Northern Virginia)
The following regions are supported by the NVIDIA RTX PRO 6000 Blackwell GPU:
- asia-southeast1 (Singapore)
- asia-south2 (Delhi, India). This region is available by invitation only. Contact your Google Account team if you are interested in this region.
- europe-west4 (Netherlands) Low CO2
- us-central1 (Iowa) Low CO2
Pricing impact
See Cloud Run pricing for GPU pricing details. Note the following requirements and considerations:
- GPU zonal redundancy and non-zonal redundancy are priced differently; see Cloud Run pricing for details.
- GPU worker pools don't autoscale. You are charged for the GPU for as long as the worker pool GPU instance is running, even if the GPU is not running any process (see the scale-down sketch after this list).
- CPU and memory for worker pools are priced differently than for services and jobs. However, the GPU SKU is priced the same as for services and jobs.
- Pricing also depends on the CPU and memory configurations of your resource.
- The GPU is billed for the entire duration of the instance lifecycle.
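Because GPU billing runs for as long as instances exist, one way to stop charges during idle periods is to scale the worker pool down to zero instances and scale it back up when there is work. A minimal sketch, assuming worker pools expose a manual instance count through a `--scaling` flag (an assumption; check your gcloud version for the exact flag):

```
# Sketch: scale a GPU worker pool to zero instances to stop GPU billing
# while idle. The `--scaling` flag is an assumption for manual scaling.
gcloud beta run worker-pools update my-gpu-pool \
  --region=europe-west4 \
  --scaling=0

# Later, bring instances (and GPU billing) back:
gcloud beta run worker-pools update my-gpu-pool \
  --region=europe-west4 \
  --scaling=2
```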
GPU zonal redundancy options
By default, Cloud Run deploys your worker pool across multiple zones within a region. This architecture provides inherent resilience.
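If you prefer the lower-cost non-redundant option noted under pricing, you can opt out of zonal redundancy when you create or update the worker pool. A minimal sketch, assuming worker pools accept the same `--no-gpu-zonal-redundancy` flag used for GPU deployments elsewhere in Cloud Run:

```
# Sketch: opt out of GPU zonal redundancy for the lower-cost option.
# The flag is assumed to match Cloud Run's GPU zonal redundancy flag
# on `gcloud beta run deploy`.
gcloud beta run worker-pools update my-gpu-pool \
  --region=europe-west4 \
  --no-gpu-zonal-redundancy
```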