Cluster metadata

Dataproc sets special metadata values for the instances that run in your cluster:

| Metadata key | Value |
|---|---|
| dataproc-bucket | Name of the cluster's staging bucket |
| dataproc-region | Region of the cluster's endpoint |
| dataproc-worker-count | Number of worker nodes in the cluster. The value is 0 for single node clusters. |
| dataproc-cluster-name | Name of the cluster |
| dataproc-cluster-uuid | UUID of the cluster |
| dataproc-role | Instance's role, either Master or Worker |
| dataproc-master | Hostname of the first master node. The value is either [CLUSTER_NAME]-m in a standard or single node cluster, or [CLUSTER_NAME]-m-0 in a high-availability cluster, where [CLUSTER_NAME] is the name of your cluster. |
| dataproc-master-additional | Comma-separated list of hostnames for the additional master nodes in a high-availability cluster, for example, [CLUSTER_NAME]-m-1,[CLUSTER_NAME]-m-2 in a cluster that has 3 master nodes. |
| SPARK_BQ_CONNECTOR_VERSION or SPARK_BQ_CONNECTOR_URL | The version or URL that points to a Spark BigQuery connector version to use in Spark applications, for example, 0.42.1 or gs://spark-lib/bigquery/spark-3.5-bigquery-0.42.1.jar. A default Spark BigQuery connector version is pre-installed in Dataproc 2.1 and later image version clusters. For more information, see Use the Spark BigQuery connector. |
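
Each of these keys is exposed as a Compute Engine instance attribute, so you can read it from the metadata server on any cluster VM. For example, a minimal sketch that reads the dataproc-cluster-name key from the table above:

curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/attributes/dataproc-cluster-name"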

You can use these values to customize the behavior of initialization actions.
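For example, a minimal initialization-action sketch that branches on the dataproc-role key; the per-role setup steps shown here are placeholders:

#!/bin/bash
# Read this instance's role from the metadata server.
ROLE="$(curl -f -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/attributes/dataproc-role")"

if [[ "${ROLE}" == "Master" ]]; then
  echo "Running master-only setup"   # placeholder for master-only steps
else
  echo "Running worker-only setup"   # placeholder for worker-only steps
fi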

You can use the --metadata flag in the gcloud dataproc clusters create command to provide your own metadata:

gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --metadata=name1=value1,name2=value2... \
    ... other flags ...
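
For example, a hypothetical invocation that sets two custom keys (the cluster name, region, and key names are placeholders):

gcloud dataproc clusters create example-cluster \
    --region=us-central1 \
    --metadata=environment=staging,team=data-eng

Custom keys appear alongside the Dataproc-set keys as instance attributes, so initialization actions can read them from the metadata server in the same way.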