GitLab CI after_script is not executed

Summary

In GitLab CI, the steps in the after_script section are no longer executed. Observed with gitlab-runner 12.10.0-rc1 (80ffd94f) on docker-auto-scale fa6cab46.

Steps to reproduce

Create a project with the following .gitlab-ci.yml:

echo:
  image: alpine:latest
  script:
    - echo script
  after_script:
    - echo after_script

Example Project

https://gitlab.com/hs-karlsruhe/ci-test/-/jobs/518567755

What is the current bug behavior?

The steps in the after_script section are not executed.

What is the expected correct behavior?

The steps in the after_script section are executed.

Relevant logs and/or screenshots

Running with gitlab-runner 12.10.0-rc1 (80ffd94f)
   on docker-auto-scale fa6cab46
Preparing the "docker+machine" executor
 Using Docker executor with image alpine:latest ...
 Pulling docker image alpine:latest ...
 Using docker image sha256:a187dde48cd289ac374ad8539930628314bc581a481cdb41409c9289419ddb72 for alpine:latest ...
Preparing environment
00:02
 Running on runner-fa6cab46-project-16554705-concurrent-0 via runner-fa6cab46-srm-1587389254-aa44b7cd...
Getting source from Git repository
00:02
 $ eval "$CI_PRE_CLONE_SCRIPT"
 Fetching changes with git depth set to 50...
 Initialized empty Git repository in /builds/hs-karlsruhe/ci-test/.git/
 Created fresh repository.
 From https://gitlab.com/hs-karlsruhe/ci-test
  * [new ref]         refs/pipelines/137885685 -> refs/pipelines/137885685
  * [new branch]      after_script             -> origin/after_script
 Checking out 062ebbae as after_script...
 Skipping Git submodules setup
Restoring cache
00:01
Downloading artifacts
00:01
Running before_script and script
00:01
 $ echo script
 script
Running after_script
00:01
Saving cache
00:01
Uploading artifacts for successful job
00:01
 Job succeeded

Output of checks

This bug happens on GitLab.com

What happened

This bug was introduced in gitlab-runner!1990 (merged), specifically at https://gitlab.com/gitlab-org/gitlab-runner/-/blob/1494bf0071cb93ceb9bd771ea990ef292746f5d7/executors/docker/docker.go#L880.

To understand the problem, we first have to look at how we execute before_script, script, and after_script. We run before_script and script together as one script inside a container that we call the build container. When before_script+script finish executing, we reuse the same container to run after_script. We do this for performance reasons, and also so that after_script has all the state that before_script+script generated.

We check the container state to see whether we only need to get its exit code and finish execution. However, after_script uses the already exited (created, not running) build container, so this condition evaluates to false, which results in us just looking at the exit code again. This means we return the exit code and never actually run the after_script.
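The effect of that container-state shortcut can be illustrated with a small, self-contained Go model. Everything in it is hypothetical. The container type, its status strings (borrowed from Docker's created/running/exited naming), and the runStageBuggy/runStageFixed functions are illustrative stand-ins, not the gitlab-runner code, and the model does not reproduce the exact condition at docker.go#L880. It only shows why treating an already exited, reused build container as "stage finished" means after_script never runs.

```go
package main

import "fmt"

// Hypothetical, simplified container model; NOT the Docker API or the
// gitlab-runner types. Status values follow Docker's naming.
type container struct {
	status   string   // "created", "running" or "exited"
	exitCode int
	executed []string // scripts that actually ran inside this container
}

// run executes a script to completion, leaving the container exited.
func (c *container) run(script string) {
	c.status = "running"
	c.executed = append(c.executed, script)
	c.exitCode = 0
	c.status = "exited"
}

// runStageBuggy models the regression: a container that has already
// exited is treated as "this stage is finished", so its previous exit
// code is returned and the script is never executed.
func runStageBuggy(c *container, script string) int {
	if c.status == "exited" {
		return c.exitCode
	}
	c.run(script)
	return c.exitCode
}

// runStageFixed always runs the stage, reusing the same container (and
// therefore the state left behind by before_script+script).
func runStageFixed(c *container, script string) int {
	c.run(script)
	return c.exitCode
}

// simulate runs both stages of a job in one reused build container and
// returns the scripts that were actually executed.
func simulate(runStage func(*container, string) int) []string {
	c := &container{status: "created"}
	runStage(c, "before_script+script") // first stage in the fresh build container
	runStage(c, "after_script")         // reuses the same, now exited, container
	return c.executed
}

func main() {
	fmt.Println("buggy:", simulate(runStageBuggy))
	fmt.Println("fixed:", simulate(runStageFixed))
}
```

Running this prints `buggy: [before_script+script]` and `fixed: [before_script+script after_script]`. In the buggy model, after_script silently reports the earlier exit code, which matches the job log above: an empty "Running after_script" section followed by "Job succeeded".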

What we are doing

Revert gitlab-runner!1990 (merged), which is being done in gitlab-runner!2026 (merged)
Cherry-pick this commit into the 12.10 stable branch and tag 12.10.0-rc2
Deploy 12.10.0-rc2 to the shared Runner fleet

We are rolling forward rather than rolling back the deployment because, for us, a rollback is just like a normal deploy, so it would take the same amount of time.

Follow up steps

Create an integration test to assert that we run the after_script inside the Docker executor.
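One part of such a test is checking the job trace. The following is a minimal sketch that assumes the test already has, as a string, the trace of a job run through the Docker executor. Obtaining that trace is not shown. The hypothetical helper afterScriptRan looks for the after_script command inside the "Running after_script" section, using the section headers that appear in the job log above. The passing trace in main is an assumed example in the same format, not real runner output.

```go
package main

import (
	"fmt"
	"strings"
)

// afterScriptRan reports whether the given after_script command appears in
// the job trace inside the "Running after_script" section.
// Hypothetical helper; the section header names are taken from the job log
// in this issue and may differ in other runner versions.
func afterScriptRan(trace, command string) bool {
	_, after, found := strings.Cut(trace, "Running after_script")
	if !found {
		return false
	}
	// Only inspect the after_script section, i.e. up to the next
	// section header ("Saving cache" in the log above).
	section, _, _ := strings.Cut(after, "Saving cache")
	return strings.Contains(section, "$ "+command)
}

func main() {
	// Excerpt of the failing job log from this issue.
	buggy := "Running before_script and script\n $ echo script\n script\n" +
		"Running after_script\n00:01\nSaving cache\n"
	// Assumed trace of a passing job, written in the same format.
	fixed := "Running before_script and script\n $ echo script\n script\n" +
		"Running after_script\n $ echo after_script\n after_script\nSaving cache\n"

	fmt.Println(afterScriptRan(buggy, "echo after_script"))
	fmt.Println(afterScriptRan(fixed, "echo after_script"))
}
```

This prints `false` for the trace from this issue and `true` for the assumed passing trace. Matching on the command echo line (`$ echo after_script`) rather than on the bare word `after_script` avoids false positives from the section header itself.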

Edited Apr 21, 2020 by Steve Xuereb