Out-of-sequence job execution using directed acyclic graphs (DAG) MVC

Description

GitLab CI/CD pipelines are pretty powerful. Sequential stages and parallel jobs provide a lot of configurability to handle a wide variety of needs. But sometimes it's not enough. Or at least, sometimes it's not efficient enough. Sometimes you want a job in a future stage to run as soon as the job that it depends on finishes.

For example, when a project generates both Android and iOS apps in a multi-stage pipeline, people want the iOS deployment to start as soon as all the iOS tests pass rather than waiting for all the Android tests to pass too. The total compute time might be the same, but the wall-clock time is different. In more complicated cases, it's possible to significantly reduce the overall wall-clock time of the pipeline by declaring exactly which jobs depend on which other jobs.

A solution like a DAG lets pipelines be described in terms of dependencies, so that cloud compute resources can be applied automatically in the most efficient order. This is very powerful and removes much of the manual optimization that pipelines otherwise require.

Proposal

Implement a simple extension to CI jobs that will allow multiple stages to run at the same time:

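build:
  stage: build
  script: echo Hello World

setupenv:
  stage: build
  script: echo Setting up environment  # placeholder script

rspec:
  stage: test
  script: echo Running rspec  # placeholder script
  needs: [build]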

This causes the rspec job to run as soon as the build job finishes, without waiting for the other jobs in stage: build (here, setupenv).
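Applied to the Android/iOS scenario from the description, the same keyword could be wired up roughly as follows. This is only a sketch: the job names and scripts are hypothetical, and each platform is reduced to a single test job for brevity.

```yaml
# Hypothetical job names and placeholder scripts; only the needs: wiring matters.
stages:
  - test
  - deploy

test_ios:
  stage: test
  script: echo "run iOS tests"

test_android:
  stage: test
  script: echo "run Android tests"

deploy_ios:
  stage: deploy
  script: echo "deploy iOS app"
  needs: [test_ios]       # starts once test_ios finishes, even if test_android is still running

deploy_android:
  stage: deploy
  script: echo "deploy Android app"
  needs: [test_android]
```

With several iOS test jobs, deploy_ios would list each of them in its needs: array.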

More complex example

Here is an in-progress pipeline where you can see that test_quickbuild and deploy_quickbuild were allowed to proceed even though longbuild was still in progress:

[Images: screenshots of the in-progress pipeline]

Here is the complete `.gitlab-ci.yml` for this pipeline:

stages:
  - build
  - test
  - deploy
  
longbuild:
  stage: build
  script:
    - sleep 120

quickbuild:
  stage: build
  script: 
    - sleep 10
    
test_quickbuild:
  stage: test
  script:
    - sleep 10
  needs: [quickbuild]

test_longbuild:
  stage: test
  script: 
    - sleep 10
  needs: [longbuild]

deploy_quickbuild:
  stage: deploy
  script:
    - sleep 10
  needs: [test_quickbuild]

deploy_longbuild:
  stage: deploy
  script:
    - sleep 10
  needs: [test_longbuild]
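As a rough check on what this configuration buys, the finish times can be worked out from the sleep durations alone. This assumes zero scheduling overhead and enough runners to start every job the moment it becomes eligible, neither of which is guaranteed in practice.

```latex
\begin{aligned}
\text{stages only:}\quad t(\text{deploy\_quickbuild}) &= \max(120, 10) + \max(10, 10) + 10 = 140\ \text{s} \\
\text{with needs:}\quad t(\text{deploy\_quickbuild}) &= 10 + 10 + 10 = 30\ \text{s} \\
\text{with needs:}\quad t(\text{deploy\_longbuild}) &= 120 + 10 + 10 = 140\ \text{s}
\end{aligned}
```

Under these assumptions the whole pipeline still takes 140 s, but the quickbuild chain is deployed after 30 s instead of 140 s.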

The logic of needs:

  1. If needs: is set to point to a job that is not instantiated because of only/except rules or otherwise does not exist, the job will fail.
  2. Note that on day one of the launch, we are temporarily limiting the maximum number of jobs that a single job can need in the needs: array. Track our infrastructure issue for details on the current limit.
  3. If you use dependencies: with needs:, do not declare a dependency on a job that may not have run by the time the dependent job starts. In that case it is better to list the job under both keywords so that GitLab handles the ordering appropriately.
  4. For now it is impossible to have needs: [] (empty needs). A job must always depend on something, unless it is a job in the first stage (see gitlab-ce#65504).
  5. If needs: refers to a job that is marked as parallel:, the current job will depend on all parallel jobs created.
  6. needs: is similar to dependencies: in that it can only refer to jobs from prior stages. This means that it is impossible to create circular dependencies or to depend on jobs in the current stage (see gitlab-ce#65505).
  7. Related to the above, stages must be explicitly defined for all jobs that have the keyword needs: or are referred to by one.
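Rules 3, 5 and 7 can be illustrated with a small configuration. This is a sketch under assumptions: the job names, scripts and artifact path are hypothetical, and only the keywords themselves (needs:, dependencies:, parallel:, stage:) come from GitLab CI.

```yaml
# Hypothetical jobs; every job that uses or is named in needs: sets stage: explicitly (rule 7).
stages:
  - build
  - test
  - deploy

compile:
  stage: build
  script: echo "compile"
  artifacts:
    paths:
      - out/

rspec:
  stage: test
  script: echo "run one slice of the suite"
  parallel: 3               # creates three rspec jobs
  needs: [compile]

package:
  stage: deploy
  script: echo "package"
  needs: [compile, rspec]   # rule 5: waits for all three parallel rspec jobs
  dependencies: [compile]   # rule 3: compile is also listed under needs:
```

Listing compile under both keywords means package cannot start before the job whose artifacts it downloads has finished.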

Needing jobs from the same stage

For the MVC, we are limiting needs: to jobs from previous stages only. The reason for this is to avoid the risk of circular dependencies. Support for needing jobs in the same stage may be added in a future release (see https://gitlab.com/gitlab-org/gitlab-ce/issues/65505), but we want to see how the feature is used and how it performs before implementing it.
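To make the restriction concrete, here is a minimal sketch with hypothetical job names. The commented-out line shows the kind of reference that the MVC does not support.

```yaml
# Hypothetical jobs illustrating the previous-stage restriction.
stages:
  - build
  - test

compile:
  stage: build
  script: echo "compile"

lint:
  stage: test
  script: echo "lint"

unit:
  stage: test
  script: echo "unit tests"
  needs: [compile]      # supported: compile is in an earlier stage
  # needs: [lint]       # not supported in the MVC: lint is in the same stage
```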

When to run a job that needs: another job

The keyword needs: implies that the referenced job is needed and should run. Therefore, for the MVC, if a job needs a job that ends up not existing for whatever reason (most likely an only/except rule that did not match), then the job that needs it will also not run.
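A minimal sketch of that situation, with hypothetical job names, might look like this. The comments restate the behavior described above; they are not a separately verified result.

```yaml
# Hypothetical jobs; build_docs is only created for tag pipelines.
stages:
  - build
  - test

build_docs:
  stage: build
  script: echo "build docs"
  only:
    - tags              # on a branch pipeline this job is not created

test_docs:
  stage: test
  script: echo "check docs"
  needs: [build_docs]   # on a branch pipeline build_docs does not exist,
                        # so per the description above test_docs does not run either
```

Giving test_docs the same only: rule as build_docs would keep the two jobs in step.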

Future improvements

Links / references

Edited by Jason Yavorsky