Cache cannot be downloaded when Azure Workload Identity is enabled
Summary
GitLab Runner with the Kubernetes executor was configured to store cache in Azure Blob Storage (Storage Account) using an Account Key, and it worked well.
After Azure Workload Identity support was introduced in GitLab 17.5, I started migrating GitLab Runner to it for storing the pipeline cache in Azure Blob Storage. The cache uploads successfully, but the download step prints the following warning and the cache is not downloaded:
No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted.
WARNING: Cache file does not exist
Steps to reproduce
- Create a Kubernetes cluster using Azure AKS (Azure Kubernetes Service) with a Linux node pool
- Deploy GitLab Runner into the Kubernetes cluster using Helm and the official Helm chart, with the values.yaml file shown below
- Write the .gitlab-ci.yml pipeline file shown below
- Run a pipeline
values.yaml
replicas: 1
gitlabUrl: https://[REDACTED]
runnerToken: [REDACTED]
unregisterRunners: true
rbac:
  create: true
concurrent: 10
checkInterval: 0
logFormat: json
serviceAccount:
  annotations:
    azure.workload.identity/client-id: [REDACTED]
metrics:
  enabled: true
runners:
  config: |
    [[runners]]
      environment = [
        "DOCKER_DRIVER=overlay2",
        "DOCKER_TLS_CERTDIR=/certs",
        "DOCKER_TLS_VERIFY=1",
        "DOCKER_CERT_PATH=$DOCKER_TLS_CERTDIR/client",
      ]
      output_limit = 100000
      [runners.feature_flags]
        FF_SCRIPT_SECTIONS = true
        FF_TIMESTAMPS = true
      [runners.cache]
        Type = "azure"
        [runners.cache.azure]
          AccountName = "[REDACTED]"
          ContainerName = "[REDACTED]"
      [runners.kubernetes]
        namespace = "{{ .Release.Namespace }}"
        image = "ruby:3"
        privileged = true
        service_account = "gitlab-runner"
        [runners.kubernetes.pod_labels]
          "azure.workload.identity/use" = "true"
  executor: kubernetes
  name: k8s
.gitlab-ci.yml
build:
  script:
    - mkdir myfolder/
    - touch myfolder/123.txt
  cache:
    key: mycache
    paths:
      - myfolder/
Actual behavior
When Azure Workload Identity is enabled, the cache is not downloaded, whether it is empty or has already been uploaded, and the job prints the following:
No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted.
WARNING: Cache file does not exist
All four environment variables are passed with correct values into both the "build" and "helper" containers of the Pod that is created for a pipeline job:
AZURE_AUTHORITY_HOST
AZURE_CLIENT_ID
AZURE_FEDERATED_TOKEN_FILE
AZURE_TENANT_ID
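One way to double-check the four variables listed above from inside a container is a small script like the following. This is only a sketch: it assumes a Python interpreter is available in the container (the job image in this report is ruby:3, so that is an assumption), and the function name missing_workload_identity_vars is made up for illustration.

```python
import os
from typing import List, Mapping

# The four variables named in this report, as injected by Azure Workload Identity.
REQUIRED_VARS = (
    "AZURE_AUTHORITY_HOST",
    "AZURE_CLIENT_ID",
    "AZURE_FEDERATED_TOKEN_FILE",
    "AZURE_TENANT_ID",
)


def missing_workload_identity_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are absent or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    # Check the environment of the current process (hypothetical usage).
    missing = missing_workload_identity_vars(os.environ)
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("all four Azure Workload Identity variables are set")
```

An empty result only shows that the variables are present. It does not prove that the federated token file is readable or that the identity has access to the storage account.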
If the AccountKey property is set in the GitLab Runner config.toml configuration, everything works well.
Expected behavior
The cache is downloaded from and uploaded to Azure Blob Storage successfully when Azure Workload Identity is enabled.
Relevant logs and/or screenshots
I've attached a job log with trace enabled.
You can see that the --gocloud-url flag is present only when the cache is uploaded.
When the cache is downloaded, this flag is missing for some reason.
job log
++ echo 'Checking cache for mycache-1-non_protected...'
Checking cache for mycache-1-non_protected...
++ /usr/bin/gitlab-runner-helper cache-extractor --file ../../../../../cache/[REDACTED]/mycache-1-non_protected/cache.zip --timeout 10
No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted.
WARNING: Cache file does not exist
++ echo 'Failed to extract cache'
Failed to extract cache
+ exit 0
...
++ echo 'Creating cache mycache-1-non_protected...'
Creating cache mycache-1-non_protected...
++ export AZURE_STORAGE_ACCOUNT=[REDACTED]
++ AZURE_STORAGE_ACCOUNT=[REDACTED]
++ export AZURE_STORAGE_DOMAIN=
++ AZURE_STORAGE_DOMAIN=
++ /usr/bin/gitlab-runner-helper cache-archiver --file ../../../../../cache/[REDACTED]/mycache-1-non_protected/cache.zip --timeout 10 --path myfolder/ --gocloud-url azblob://[REDACTED]
myfolder/: found 4 matching artifact files and directories
Uploading cache.zip to azblob://[REDACTED]
++ echo 'Created cache'
Created cache
+ exit 0
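To compare the helper invocations in a longer trace, something like the sketch below could be used. It scans trace lines for gitlab-runner-helper cache-extractor and cache-archiver commands and lists any option whose name contains "url". The function name url_flags_by_command and the regular expression are illustrative assumptions based only on the log format shown above, not part of GitLab Runner.

```python
import re
from typing import Dict, List

# Matches helper invocations as they appear in the trace above, e.g.
# "++ /usr/bin/gitlab-runner-helper cache-extractor --file ... --timeout 10"
HELPER_RE = re.compile(
    r"gitlab-runner-helper\s+(cache-extractor|cache-archiver)\b(.*)"
)


def url_flags_by_command(trace_lines: List[str]) -> Dict[str, List[str]]:
    """Map each helper subcommand to the URL-related options it was given."""
    result: Dict[str, List[str]] = {}
    for line in trace_lines:
        match = HELPER_RE.search(line)
        if not match:
            continue
        command = match.group(1)
        args = match.group(2).split()
        # Keep only option names (not values) that mention "url".
        result[command] = [a for a in args if a.startswith("--") and "url" in a]
    return result


if __name__ == "__main__":
    sample = [
        "++ /usr/bin/gitlab-runner-helper cache-extractor --file cache.zip --timeout 10",
        "++ /usr/bin/gitlab-runner-helper cache-archiver --file cache.zip --timeout 10"
        " --path myfolder/ --gocloud-url azblob://[REDACTED]",
    ]
    print(url_flags_by_command(sample))
```

For the trace in this report, the extractor entry comes back empty while the archiver entry contains --gocloud-url, which matches the asymmetry described above.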
Environment description
I'm using a self-hosted GitLab Runner with the Kubernetes executor and a self-hosted GitLab server. The GitLab server is not managed by me, so I don't know how it was deployed, but I can find out if really necessary.
Both are version 17.5.3.
The Kubernetes version on Azure AKS is 1.29.5.
config.toml contents
[[runners]]
environment = [
"DOCKER_DRIVER=overlay2",
"DOCKER_TLS_CERTDIR=/certs",
"DOCKER_TLS_VERIFY=1",
"DOCKER_CERT_PATH=$DOCKER_TLS_CERTDIR/client",
]
output_limit = 100000
[runners.feature_flags]
FF_SCRIPT_SECTIONS = true
FF_TIMESTAMPS = true
[runners.cache]
Type = "azure"
[runners.cache.azure]
AccountName = "[REDACTED]"
ContainerName = "[REDACTED]"
[runners.kubernetes]
namespace = "{{ .Release.Namespace }}"
image = "ruby:3"
privileged = true
service_account = "gitlab-runner"
[runners.kubernetes.pod_labels]
"azure.workload.identity/use" = "true"
Used GitLab Runner version
Version: 17.5.3
Git revision: 12030cf4
Git branch: 17-5-stable
GO version: go1.22.7
Built: 2024-10-31T20:31:08+0000
OS/Arch: linux/amd64
Possible fixes
It seems that GitLab Runner (specifically, the "helper" container) fails to recognize that Azure Workload Identity is enabled only when downloading the cache.
I'm not sure, but the problem may be somewhere here