
`parallel` job keyword to speed up pipelines

Problem to Solve

Build speed matters to every team, and running tests tends to take a large share of any build's time. Providing a simple way to parallelize tests would let these teams accelerate their software delivery process.

Description

We have split gitlab-ce's tests into multiple parallel jobs that run substantially the same script and differ only by a loop index. Let's formalize this approach and create a `parallel` keyword that takes a number, N, and duplicates a job N times, setting CI_NODE_INDEX and CI_NODE_TOTAL for each job.
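To illustrate how a duplicated job could use the two variables to pick its share of the work, here is a minimal sketch. It assumes a 1-based CI_NODE_INDEX and a plain round-robin split; the `shard` helper and the file names are hypothetical, and this is not how knapsack balances tests (knapsack uses recorded timings).

```python
import os


def shard(items, node_index, node_total):
    """Return the items assigned to one node.

    Assumes node_index is 1-based (1..node_total). The split is a simple
    round-robin over the sorted items, not a timing-based balance.
    """
    if node_total < 1 or not 1 <= node_index <= node_total:
        raise ValueError("node_index must be within 1..node_total")
    # Sorting makes the assignment deterministic across all nodes.
    return sorted(items)[node_index - 1::node_total]


if __name__ == "__main__":
    # Fall back to a single node when the variables are not set.
    index = int(os.environ.get("CI_NODE_INDEX", "1"))
    total = int(os.environ.get("CI_NODE_TOTAL", "1"))
    # Hypothetical spec files, for illustration only.
    spec_files = ["a_spec.rb", "b_spec.rb", "c_spec.rb", "d_spec.rb"]
    print(shard(spec_files, index, total))
```

Because every node sorts the same list and takes a different offset, each file is assigned to exactly one node without any coordination between jobs.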

Proposal

Given:

rspec:
  stage: test
  parallel: 20
  script:
    - export KNAPSACK_REPORT_PATH=knapsack/rspec_node_${CI_NODE_INDEX}_${CI_NODE_TOTAL}_report.json
    - cp knapsack/rspec_report.json ${KNAPSACK_REPORT_PATH}
    - knapsack rspec

This would generate 20 jobs named rspec 1/20 through rspec 20/20. (I prefer indexing from 1 for human-named items.) Each job would have a unique CI_NODE_INDEX, and CI_NODE_TOTAL would be set to 20 in all of them. This would be handled at the parser level, so GitLab Runner wouldn't require any changes.

Note that .gitlab-ci.yml would support multiple parallel job definitions (e.g. rspec and spinach) in the same file, and CI_NODE_INDEX values would only be unique within each definition. For example, there would be two jobs running with CI_NODE_INDEX=1, one from each definition.
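The expansion described in this proposal could be sketched roughly as follows. This is only an illustration, not GitLab's parser: `expand_parallel` is a hypothetical helper, the config is modelled as a plain dict of already-parsed jobs, and exposing the two values through each job's `variables` map is an assumption.

```python
import copy


def expand_parallel(config):
    """Expand every job that has `parallel: N` into N concrete jobs.

    Each clone is named "<job> <index>/<total>" (1-based) and receives
    CI_NODE_INDEX and CI_NODE_TOTAL. Indexes restart for each definition.
    """
    expanded = {}
    for name, job in config.items():
        total = job.get("parallel")
        if total is None:
            # Jobs without the keyword pass through unchanged.
            expanded[name] = copy.deepcopy(job)
            continue
        if not isinstance(total, int) or total < 1:
            raise ValueError(f"{name}: parallel must be a positive integer")
        for index in range(1, total + 1):
            clone = copy.deepcopy(job)
            del clone["parallel"]
            # Assumption: the values are injected as ordinary job variables.
            variables = clone.setdefault("variables", {})
            variables["CI_NODE_INDEX"] = str(index)
            variables["CI_NODE_TOTAL"] = str(total)
            expanded[f"{name} {index}/{total}"] = clone
    return expanded


if __name__ == "__main__":
    # Two parallel definitions in one config; the spinach script is a placeholder.
    config = {
        "rspec": {"stage": "test", "parallel": 3, "script": ["knapsack rspec"]},
        "spinach": {"stage": "test", "parallel": 2, "script": ["echo spinach shard"]},
    }
    for job_name, job in expand_parallel(config).items():
        print(job_name, job["variables"])
```

With this input, both "rspec 1/3" and "spinach 1/2" end up with CI_NODE_INDEX=1, which matches the note above that indexes are only unique within a single definition.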

Links

/cc @ayufan @grzesiek

Edited by Jason Yavorsky