Cancel the pipeline immediately if any job fails

Release Notes

You can now configure a pipeline to be cancelled immediately when a job fails. Please see the documentation for additional details. Special thanks to @zillemarco for contributing to the feature.

Problem to solve

We should allow users to configure a pipeline so that when one job fails, all other jobs that have already started are cancelled. This reduces the CI/CD minutes consumed by a pipeline that would fail anyway.

Proposal

Acceptance Criteria

When

workflow:
  auto_cancel:
    on_job_failure: all # or: none, default none

is configured then a single job failure will cause all jobs in the pipeline to be cancelled.

When this configuration is not included then the pipeline will behave as it does today (won't be cancelled).
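
To make the default concrete, here is a minimal sketch of how the keyword could be resolved from a config file. The helper name on_job_failure_setting and the validation are my own assumptions for illustration, not the actual GitLab config parser; only the keyword path and the values all and none come from the criteria above.

```ruby
require "yaml"

# Values taken from the acceptance criteria: "all" or "none" (default "none").
ALLOWED_ON_JOB_FAILURE = %w[none all].freeze

# Hypothetical helper: reads workflow:auto_cancel:on_job_failure from a CI
# config string and falls back to "none" when the keyword is absent.
def on_job_failure_setting(ci_yaml)
  config = YAML.safe_load(ci_yaml) || {}
  value = config.dig("workflow", "auto_cancel", "on_job_failure") || "none"
  unless ALLOWED_ON_JOB_FAILURE.include?(value)
    raise ArgumentError, "on_job_failure must be one of: #{ALLOWED_ON_JOB_FAILURE.join(', ')}"
  end
  value
end

configured = <<~CONFIG
  workflow:
    auto_cancel:
      on_job_failure: all
CONFIG

unconfigured = <<~CONFIG
  test:
    script: echo ok
CONFIG

puts on_job_failure_setting(configured)   # keyword present
puts on_job_failure_setting(unconfigured) # keyword absent, default applies
```

A pipeline without the keyword resolves to none, which matches the "behave as it does today" requirement.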

Cascading the cancellation to child pipelines is required, but specifying a cancellation reason is not.
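
As a rough illustration of what cascading means here, the sketch below cancels the cancellable jobs of a pipeline and then recurses into its children. CascadePipeline, CascadeJob, and the list of cancellable statuses are stand-ins I made up for the example; they are not the real Ci::Pipeline model or its status list.

```ruby
# Stand-ins for pipeline and job records; not the real models.
CascadePipeline = Struct.new(:id, :jobs, :children, keyword_init: true)
CascadeJob = Struct.new(:name, :status, keyword_init: true)

# Assumption: only jobs in these states can still be cancelled.
CASCADE_CANCELLABLE = %w[created pending running].freeze

# Cancels every cancellable job in the pipeline, then recurses into its
# child pipelines. Returns the names of the jobs that were cancelled.
def cancel_with_children(pipeline)
  cancelled = pipeline.jobs.select { |job| CASCADE_CANCELLABLE.include?(job.status) }
  cancelled.each { |job| job.status = "canceled" }
  cancelled.map(&:name) + pipeline.children.flat_map { |child| cancel_with_children(child) }
end

child = CascadePipeline.new(
  id: 2,
  jobs: [CascadeJob.new(name: "child-build", status: "running")],
  children: []
)
parent = CascadePipeline.new(
  id: 1,
  jobs: [
    CascadeJob.new(name: "rspec", status: "failed"),
    CascadeJob.new(name: "lint", status: "running")
  ],
  children: [child]
)

p cancel_with_children(parent)
```

The failed job keeps its failed status; only jobs that could still consume minutes are touched.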

Engineering

Looking at it again, I might even make the keyword more specific, to be workflow:cancel_on_job_failure.

My thinking is that workflow:cancel_on_job_failure: all would persist somewhere attached to the Pipeline record, and then we'd check in the BuildFinishedWorker:

# frozen_string_literal: true

module Ci
  class BuildFinishedWorker # rubocop:disable Scalability/IdempotentWorker
    ...

    def process_build(build)
      # We execute these in sync to reduce IO.
      build.update_coverage
      Ci::BuildReportResultService.new.execute(build)

      build.execute_hooks
      ChatNotificationWorker.perform_async(build.id) if build.pipeline.chat?
      build.track_deployment_usage
      build.track_verify_environment_usage
      build.remove_token!

      if build.failed? && !build.auto_retry_expected?
+       if build.pipeline.cancel_on_job_failure == :all # Something like this
+         ::Ci::CancelPipelineWorker.perform_async(build.pipeline_id, build.pipeline_id) # Add param for passing cascade_to_children: true?
+       end
+ 
        ::Ci::MergeRequests::AddTodoWhenBuildFailsWorker.perform_async(build.id)
      end
 module Ci
   class Pipeline < Ci::ApplicationRecord 

+  def cancel_on_job_failure
+    # Return the config value of the workflow:cancel_on_job_failure keyword.
+    # This can be a simple string ("all") for now, or become something more
+    # complicated later, e.g. { states: ['created', 'pending'] }
+  end
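
To show the decision in isolation, here is a small self-contained simulation of the check above. SketchPipeline, SketchBuild, RecordingQueue, and handle_finished_build are hypothetical stand-ins, and the queue only records what would be enqueued rather than calling Sidekiq. The sketch stores the config value as a string, following the comment in cancel_on_job_failure, whereas the diff above compares against a symbol.

```ruby
# Illustrative stand-ins; none of these are the real GitLab classes.
SketchPipeline = Struct.new(:id, :cancel_on_job_failure, keyword_init: true)
SketchBuild = Struct.new(:id, :pipeline, :status, :auto_retry_expected, keyword_init: true) do
  def failed?
    status == "failed"
  end

  def auto_retry_expected?
    auto_retry_expected
  end
end

# Records what would be enqueued instead of talking to a job queue.
class RecordingQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def perform_async(worker_name, *args)
    @jobs << [worker_name, args]
  end
end

# Mirrors the failure branch of process_build: only a final failure
# (no retry expected) can trigger pipeline cancellation.
def handle_finished_build(build, queue)
  return unless build.failed? && !build.auto_retry_expected?

  if build.pipeline.cancel_on_job_failure == "all"
    queue.perform_async("Ci::CancelPipelineWorker", build.pipeline.id, build.pipeline.id)
  end
  queue.perform_async("Ci::MergeRequests::AddTodoWhenBuildFailsWorker", build.id)
end

queue = RecordingQueue.new
pipeline = SketchPipeline.new(id: 7, cancel_on_job_failure: "all")
build = SketchBuild.new(id: 42, pipeline: pipeline, status: "failed", auto_retry_expected: false)
handle_finished_build(build, queue)
p queue.jobs
```

A build that is about to be retried automatically enqueues nothing, so a flaky job with retries configured would not cancel the pipeline on its first failure.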

In the future, to extend this with more granular configuration, we'd replace the simple pipeline.cancel_running call with something more complex:

cancel_on_job_failure:
  state: [created, pending]
      if build.failed? && !build.auto_retry_expected?
-       build.pipeline.cancel_running if build.pipeline.cancel_on_job_failure == :all
+       cancel_all_immediately_cancellable_jobs if build.pipeline.cancel_on_job_failure.present?
        ::Ci::MergeRequests::AddTodoWhenBuildFailsWorker.perform_async(build.id)
      end
    end

+   def cancel_all_immediately_cancellable_jobs
+     # a bunch of query logic that passes configuration options into Ci::BuildCancelService, etc.
+   end

I'm not going to sketch out the whole future implementation here, but I'm just demonstrating that by inserting a check in the BuildFinishedWorker, we're working in a relatively flexible asynchronous place where we can make calls to queue other asynchronous querying and cancellation functionality.
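
For a feel of what the "bunch of query logic" might boil down to, here is a sketch of selecting jobs by the configured states. It is purely illustrative: StateJob, jobs_to_cancel, the hash shape, and the default state list are my assumptions, and the real implementation would run queries and go through Ci::BuildCancelService rather than filter an in-memory array.

```ruby
StateJob = Struct.new(:name, :status, keyword_init: true)

# Assumption: with the simple "all" value, every not-yet-finished job is a candidate.
DEFAULT_CANCEL_STATES = %w[created pending running].freeze

# Returns the jobs the configuration would cancel.
# `config` is nil (feature off), "all", or a hash like { "state" => [...] }.
def jobs_to_cancel(jobs, config)
  return [] if config.nil?

  states = config == "all" ? DEFAULT_CANCEL_STATES : Array(config["state"])
  jobs.select { |job| states.include?(job.status) }
end

jobs = [
  StateJob.new(name: "build", status: "success"),
  StateJob.new(name: "rspec", status: "running"),
  StateJob.new(name: "deploy", status: "created"),
  StateJob.new(name: "docs", status: "pending")
]

p jobs_to_cancel(jobs, "all").map(&:name)
p jobs_to_cancel(jobs, { "state" => %w[created pending] }).map(&:name)
```

With the granular form, the running rspec job would be left to finish while jobs that have not started yet are dropped.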

What is NOT included in the MVC

  • A new failure type will not be reported; pipelines and jobs will both report as canceled.
  • No extra error reporting in the job log will be inserted.
  • Configurable through new Workflow syntax
workflow:
  auto_cancel:
    on_job_failure: all

I scratched that last bullet point about workflow syntax, because a bunch of us got together and agreed that introducing a very specific syntax directly under the workflow keyword is a relatively light-touch way to do exactly what's asked in this issue, while leaving us room to iterate and make improvements, more fine-tuned configuration, etc.

So for this (useful!) MVC, we will only configure this at the workflow level, and the only allowed value, all, applies to every job in the Pipeline.

Considerations

  • It should be optional and off by default.
  • Possibly a GitLab CI yaml level configuration.
  • Deployment jobs should not be canceled for MVC.
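
One way to read the deployment consideration is as a partition of the running jobs into those that may be cancelled and those that must be left alone. The sketch below assumes a job counts as a deployment job when it declares an environment; that rule, ConsideredJob, and split_for_fail_fast are assumptions for illustration, not confirmed behaviour.

```ruby
ConsideredJob = Struct.new(:name, :status, :environment, keyword_init: true) do
  # Assumption: a job is a deployment job when it declares an environment.
  def deployment?
    !environment.nil?
  end
end

# Returns [jobs_to_cancel, jobs_to_keep] among the running jobs.
def split_for_fail_fast(jobs)
  running = jobs.select { |job| job.status == "running" }
  running.partition { |job| !job.deployment? }
end

jobs = [
  ConsideredJob.new(name: "rspec", status: "running", environment: nil),
  ConsideredJob.new(name: "deploy-staging", status: "running", environment: "staging"),
  ConsideredJob.new(name: "build", status: "success", environment: nil)
]

to_cancel, to_keep = split_for_fail_fast(jobs)
p to_cancel.map(&:name)
p to_keep.map(&:name)
```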

How does this work with interruptible?

Importantly, it does not. The interruptible configuration, despite its very generic name, is a very specific piece of functionality: it lets a Pipeline be cancelled by a newer, different pipeline running on the same ref. That is its only application.

This change, specifically, is to enable Pipelines to cancel themselves after a single job failure.

If these two configurations are to intersect in the future, we'll have to decide how to do that. The naming of interruptible makes it somewhat difficult because there are so many different kinds of interruption that customers want. This is a future concern, and will not be addressed here.

What does success look like, and how can we measure that?

  • For GitLab pipelines, we should see the duration of failed pipelines decrease by >= 10%.
  • We'd expect to see XX pipelines on GitLab.com with this configuration added 30 days after GA
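
The 10% target can be checked with simple arithmetic on the mean duration of failed pipelines before and after the feature is enabled. The durations below are made-up placeholder numbers to show the calculation, not measurements, and using the mean (rather than, say, the median) is my assumption.

```ruby
def mean(values)
  values.sum.to_f / values.size
end

# Percentage by which `after` is lower than `before`.
def percent_decrease(before, after)
  (before - after) / before * 100
end

# Placeholder durations of failed pipelines, in seconds (not real data).
before_durations = [600, 900, 1200]
after_durations = [500, 800, 1100]

decrease = percent_decrease(mean(before_durations), mean(after_durations))
puts decrease.round(1)  # 11.1
puts decrease >= 10     # true
```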

For the internal customer

  • Create an ability to define a pipeline as fail-fast
  • When a job fails, it should immediately cancel all running jobs in the pipeline and set the pipeline status to failed

For all users

  • All other jobs of a pipeline are cancelled if one job fails.

Links / references

This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.

Edited by 🤖 GitLab Bot 🤖