# CQ

This document describes how the Chromium Commit Queue (CQ) is structured and
managed. It is specific to the Chromium CQ. Questions about other CQs should
be directed to infra-dev@chromium.org. If you find terms you're not familiar
with in this doc, consult the infra [glossary](glossary.md).

[TOC]

## Purpose

The Chromium CQ exists to test developer changes before they land into
[chromium/src](https://chromium.googlesource.com/chromium/src/). It runs a
curated set of test suites across a curated set of platforms for a given CL,
and ensures the CL doesn't introduce any new regressions.

## Modes

The CQ supports a few different modes of execution. Each mode differs in what
tests it runs and/or what happens when all those tests pass. These modes are
described below.

### Dry-Run

This runs the normal set of tests for a CL. When the dry-run has completed,
it simply reports the results of the CQ attempt as a Gerrit comment on the
CL and takes no further action. This mode is intended to be used frequently as a
CL is developed. It can be triggered via either the `CQ DRY RUN` button in
Gerrit or by applying the `Commit-Queue +1` label vote. See
[below](#what-exactly-does-the-cq-run) for the anatomy of a build on the CQ.

### Full-Run

This runs all the same tests as a dry-run. If no new regressions are introduced,
the CL will be submitted into the repo. This mode should only be used when a CL
is finalized. It can be triggered via either the `SUBMIT TO CQ` button in Gerrit
or by applying the `Commit-Queue +2` label vote.

### Mega-CQ Dry-Run

This runs a much larger set of tests compared to the previous two CQ modes.
Those run a limited & curated set of tests that optimizes for quick turn-around
time while still catching most _but not all_ regressions. The Mega CQ, on the
other hand, aims to catch nearly _all_ regressions regardless of cycle time.
Consequently, the Mega CQ takes much longer, and should only be used for
particularly risky CLs. It is triggered via the `Mega CQ: Dry Run` button under
the three-dot menu in Gerrit.

For a peek under the hood, the Mega CQ determines its test coverage by including
the trybot mirrors of all CI builders gardened by the main Chromium
gardening rotations. It also unconditionally applies
`Include-Ci-Only-Tests: true` to its builds (see [below](#options)).

If you find that the Mega CQ isn't covering a build or test config that it
should, please file a general [trooper bug](https://g.co/bugatrooper) for the
missing coverage.

### Mega-CQ Full-Run

This runs all the same tests as the Mega-CQ dry-run, and submits the CL if
everything passes. It is triggered via the `Mega CQ: Submit` button under the
three-dot menu in Gerrit. The number of builds and tests the Mega CQ runs makes
a passing run much less likely than with a normal CQ run. Consequently, when
running the Mega CQ, you'll likely want to spot-check the failures listed on
Gerrit for anything that looks particularly relevant to your CL. Then you can use
the normal `SUBMIT TO CQ` button to land once all failures look unrelated.

## Options

The Chromium CQ supports a variety of options that can change what it checks.

> These options are supported via git footers. They must appear in the last
> paragraph of your commit message to be used. See `git help footers` or
> [git_footers.py][1] for more information.

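The "last paragraph" rule is easy to trip over, so here is a minimal sketch of
how footers could be picked out of a commit message. It is an illustration
only, not the logic in `git_footers.py`: the function name `parse_footers`, the
regular expression, and the sample messages are all assumptions made for this
example.

```python
import re

# Assumed footer shape for this sketch: "Key-Name: value" on its own line.
FOOTER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.*)$")


def parse_footers(commit_message: str) -> dict[str, list[str]]:
    """Collect "Key: value" lines from the LAST paragraph of a message."""
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", commit_message.strip()) if p]
    footers: dict[str, list[str]] = {}
    if not paragraphs:
        return footers
    for line in paragraphs[-1].splitlines():
        match = FOOTER_RE.match(line.strip())
        if match:
            # A key may repeat, so keep every value in order.
            footers.setdefault(match.group(1), []).append(match.group(2).strip())
    return footers


# Footers in the last paragraph are found.
message = """Fix flaky widget test

The test raced with the animation timer.

Bug: 12345
Cq-Include-Trybots: luci.chromium.try:linux-rel
"""

# A footer followed by another paragraph is not in the last paragraph.
misplaced = """Fix flaky widget test

No-Try: true

This trailing paragraph hides the footer above from the parser.
"""

print(parse_footers(message))
print(parse_footers(misplaced))
```

In this sketch the second message yields no footers at all, which mirrors why a
footer followed by more prose would not take effect.
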
* `Binary-Size: <rationale>`

    This should be used when you are landing a change that will intentionally
    increase the size of the Chrome binaries on Android (since we try not to
    accidentally do so). The rationale should explain why this is okay to do.

* `Commit: false`

    You can mark a CL with this if you are working on experimental code and do
    not want to risk accidentally submitting it via the CQ. The CQ will
    immediately stop processing the change if it contains this option.

* `Cq-Include-Trybots: <trybots>`

    This flag allows you to specify some additional bots to run for this CL, in
    addition to the default bots. The format for the list of trybots is
    "bucket:trybot1,trybot2;bucket2:trybot3".

* `Disable-Retries: true`

    The CQ will normally try to retry failed test shards (up to a point) to work
    around any intermittent infra failures. If this footer is set, it won't try
    to retry failed shards no matter what happens.

* `Ignore-Freeze: true`

    Whenever there is an active prod freeze (usually around Christmas), it can
    be bypassed by setting this footer.

* `Include-Ci-Only-Tests: true` or
  `Include-Ci-Only-Tests: <comma-separated-builder-ids>|<comma-separated-tests>`

    Some builders may be configured to run some tests only after submission (on
    CI) and not before submission (in the CQ) by default. Possible reasons are
    that the tests are too slow or too expensive, or that there is insufficient
    capacity to run the tests for every CL. In order to still be able to
    explicitly reproduce what the CI builder is doing, you can specify this
    footer to run those tests before submission anyway. Specifying `true` will
    run all such tests on any triggered try builders. Specifying builder IDs and
    tests will run only the named tests defined for the identified CI builders
    on the try builders that mirror those CI builders. A `*` can be used in
    place of a builder ID or test name to match any builder/test.

    Constructing a footer value manually should generally be unnecessary: tests
    configured to run only on CI will have the necessary footer included in
    their step text and in the build summary when they fail.

    If it is necessary to construct a footer manually, the builder IDs have the
    format _builder-group_:_builder-name_. _builder-name_ is the name of the CI
    builder that the test is configured for. _builder-group_ is the "builder
    group" of that builder, which is a concept specific to Chromium infra. The
    builder group of a builder can be found by going to a build of the builder
    and finding the `builder_group` property in the `Input Properties` section
    of the `Infra` tab.

* `No-Presubmit: true`

    If you want to skip the presubmit check, you can add this line, and the
    commit queue won't run the presubmit for your change. This should only be
    used when there's a bug in the PRESUBMIT scripts. Please check that there's
    a bug filed against the bad script, and if there isn't,
    [file one](https://crbug.com/new).

* `No-Tree-Checks: true`

    Add this line if you want to skip the tree status checks. This means the CQ
    will commit a CL even if the tree is closed. Obviously this is strongly
    discouraged, since the tree is usually closed for a reason. However, in rare
    cases this is acceptable, primarily to fix build breakages (i.e., your CL
    will help in reopening the tree).

* `No-Try: true`

    This should only be used for reverts to green the tree, since it skips try
    bots and might therefore break the tree. You shouldn't use this otherwise.

* `Validate-Test-Flakiness: skip`

    This will disable the `test new tests for flakiness.*` steps in CQ builds
    that check new tests for flakiness.

* `Skip-Clang-Tidy-Checks: <check_1>,<check_2>,...`

    This will skip the specified clang-tidy checks. Each check can be specified
    as a check name (e.g. `modernize-use-equals-default`) or as a glob that
    skips a set of checks (e.g. `modernize-*` to skip checks that advocate usage
    of modern language constructs). This option can span multiple lines, for
    example:
    ```
    Skip-Clang-Tidy-Checks: google-explicit-constructor
    Skip-Clang-Tidy-Checks: modernize-*,readability-*
    ```

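Two of the footers above carry structured values. The sketch below shows one
way those values could be read, based only on the formats described in this
section. The function names `parse_include_trybots` and `ci_only_test_selected`
are hypothetical, and the CQ's real parsing may differ in details such as
whitespace handling and validation.

```python
def parse_include_trybots(value: str) -> dict[str, list[str]]:
    """Parse "bucket:trybot1,trybot2;bucket2:trybot3" into {bucket: [trybots]}."""
    result: dict[str, list[str]] = {}
    for group in value.split(";"):
        group = group.strip()
        if not group:
            continue
        # The bucket name is everything before the first colon.
        bucket, sep, bots = group.partition(":")
        if not sep or not bucket.strip():
            raise ValueError(f"expected 'bucket:trybots', got {group!r}")
        names = [b.strip() for b in bots.split(",") if b.strip()]
        result.setdefault(bucket.strip(), []).extend(names)
    return result


def ci_only_test_selected(value: str, builder_id: str, test: str) -> bool:
    """Return True if an Include-Ci-Only-Tests value selects `test` on `builder_id`.

    `builder_id` has the form "builder-group:builder-name".
    """
    if value.strip() == "true":
        return True  # "true" selects every CI-only test on every builder.
    builders, sep, tests = value.partition("|")
    if not sep:
        raise ValueError(f"expected '<builder-ids>|<tests>', got {value!r}")

    def matches(patterns: str, name: str) -> bool:
        # A bare "*" stands in for any builder ID or test name.
        return any(p.strip() in ("*", name) for p in patterns.split(","))

    return matches(builders, builder_id) and matches(tests, test)


print(parse_include_trybots("bucket:trybot1,trybot2;bucket2:trybot3"))
print(ci_only_test_selected("group:builder|*", "group:builder", "any_test"))
```

Because builder IDs contain a colon but never a `|`, splitting on the first `|`
is enough to separate the builder list from the test list in this sketch.
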
## FAQ

### What exactly does the CQ run?

CQ runs the jobs specified in [commit-queue.cfg][2]. See
[`cq-builders.md`](../../infra/config/generated/cq-builders.md)
for an auto-generated file with links to information about the builders on the
CQ.

Some of these jobs are experimental. This means they are executed on a
percentage of CQ builds, and the outcome of the build doesn't affect whether the
CL can land. See the schema linked at the top of the file for more
information on what the fields in the config do.

The CQ has the following structure:

* Compile all test suites that might be affected by the CL.
* Run all test suites that might be affected by the CL.
    * Many test suites are divided into shards. Each shard is run as a separate
      swarming task.
    * These steps are labeled '(with patch)'.
* Retry each shard that has a test failure. The retry has the exact same
  configuration as the original run. No recompile is necessary.
    * If the retry succeeds, then the failure is ignored.
    * These steps are labeled '(retry shards with patch)'.
    * It's important to retry with the exact same configuration. Attempting to
      retry the failing test in isolation often produces different behavior.
* Recompile each failing test suite without the CL. Rerun each failing test
  suite in isolation.
    * If the retry fails, then the failure is ignored, as it's assumed that the
      test is broken/flaky on tip of tree.
    * These steps are labeled '(without patch)'.
* Fail the build if there are tests which failed in both '(with patch)' and
  '(retry shards with patch)' but passed in '(without patch)'.

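The final rule in the list above boils down to simple set arithmetic. The
following sketch expresses it that way. The function name and the
set-of-test-names representation are assumptions made for illustration, not
the recipe code that actually runs on the bots.

```python
def failures_blamed_on_cl(
    with_patch: set[str],
    retry_with_patch: set[str],
    without_patch: set[str],
) -> set[str]:
    """Return the tests that should fail the build.

    Each argument is the set of test names that FAILED in that phase.
    """
    # A failure that does not reproduce in '(retry shards with patch)' is
    # treated as a flake and ignored.
    persistent = with_patch & retry_with_patch
    # A failure that also happens '(without patch)' is assumed to be broken or
    # flaky on tip of tree, so it is not blamed on the CL.
    return persistent - without_patch


blamed = failures_blamed_on_cl(
    with_patch={"TestA", "TestB", "TestC"},
    retry_with_patch={"TestB", "TestC"},
    without_patch={"TestC"},
)
print(blamed)  # {'TestB'}
```

Here `TestA` passed on retry, `TestC` also fails without the patch, and only
`TestB` is left as a failure attributable to the CL.
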
### Why did my CL fail the CQ?

Please follow these general guidelines:

1. Check to see if your patch caused the build failures, and fix if possible.
1. If compilation or individual tests are failing on one or more CQ bots and you
   suspect that your CL is not responsible, please contact your friendly
   neighborhood gardener by filing a
   [gardener bug](https://bugs.chromium.org/p/chromium/issues/entry?template=Defect%20report%20from%20developer&labels=Gardener-Chromium&summary=%5BBrief%20description%20of%20problem%5D&comment=What%27s%20wrong?).
   If the code in question has appropriate OWNERS, consider contacting or CCing
   them.
1. If other parts of CQ bot execution (e.g. `bot_update`) are failing, or you
   have reason to believe the CQ itself is broken, or you can't really
   tell what's wrong, please file a [trooper bug](https://g.co/bugatrooper).

In both cases, when filing bugs, please include links to the build and/or CL
(including relevant patchset information) in question.

### How do I stop the CQ?

There are a few ways to do this. Here are three:

1. Change the `Commit-Queue` value from +1 to 0 in the Gerrit UI.
2. Upload a new patchset that triggers a new dry run (e.g. `git cl upload -d`).
3. Vote `Code-Review -1`. This prevents a CL from landing.

### How do I add a new builder to the CQ?

There are several requirements for a builder to be added to the Commit Queue.

* There must be a "mirrored" (aka matching) CI builder that is gardened, to
  ensure that someone is actively keeping the configuration green.
* All the code for this configuration must be in Chromium's public repository or
  brought in through [src/DEPS](../../DEPS).
* Setting up the build should be straightforward for a Chromium developer
  familiar with existing configurations.
* Tests should use existing test harnesses, e.g.
  [gtest](../../third_party/googletest).
* It should be possible for any committer to replicate any testing run, i.e.
  tests and their data must be in the public repository.
* Median cycle time needs to be under 40 minutes for trybots. 90th percentile
  should be around an hour (preferably shorter).
* Configurations need to catch enough failures to be worth adding to the CQ.
  Running builds on every CL requires a significant amount of compute resources.
  If a configuration only fails once every couple of weeks on the waterfalls,
  then it's probably not worth adding it to the commit queue.

Please email [email protected], who will approve new build configurations.

### How do I ensure a trybot runs on all changes to a specific directory?

Several builders are included in the CQ only for changes that affect specific
directories. These used to be configured via `Cq-Include-Trybots` footers
injected at CL upload time. They are now configured via the `location_regexp`
attribute of the `tryjob` parameter to the try builder's definition, e.g.

```
try_.some_builder_function(
    name = "my-specific-try-builder",
    tryjob = try_.job(
        location_regexp = [
            ".+/[+]/path/to/my/specific/directory/.+"
        ],
    ),
)
```

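To get a feel for what such a pattern selects, the sketch below tries a pattern
of the same shape against a few file paths using Python's `re` module. It
assumes that each changed file is matched in full against a string of the form
`<gerrit host>/<project>/+/<file path>`, where `[+]` matches the literal `+`
segment. That string format, the `PREFIX` value, and the helper name
`is_triggered` are assumptions for illustration, so check the config schema for
the authoritative matching rules.

```python
import re

# Same shape as the location_regexp entry in the builder definition.
LOCATION_REGEXP = r".+/[+]/path/to/my/specific/directory/.+"

# Assumed prefix for every changed file: "<gerrit host>/<project>/+/".
PREFIX = "https://chromium-review.googlesource.com/chromium/src/+/"


def is_triggered(changed_files: list[str]) -> bool:
    """Return True if any changed file matches LOCATION_REGEXP in full."""
    return any(
        re.fullmatch(LOCATION_REGEXP, PREFIX + path) is not None
        for path in changed_files
    )


print(is_triggered(["path/to/my/specific/directory/foo.cc"]))
print(is_triggered(["path/to/other/file.cc"]))
```

Under these assumptions a CL touching any file below the directory would
include the builder, while a CL touching only other paths would not.
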
## Flakiness

The CQ can sometimes be flaky. Flakiness is when a test on the CQ fails, but
should have passed (commonly known as a false negative). There are a few common
causes of flaky tests on the CQ:

* Machine issues: weird system processes running, running out of disk space,
  etc.
* Test issues: individual tests not being independent and relying on the order
  of tests being run, not mocking out network traffic or other real world
  interactions.

The CQ mitigates flakiness by retrying failed tests. The core tradeoff in retry
policy is that adding retries increases the probability that a flaky test will
land on tip of tree sublinearly, but mitigates the impact of the flaky test on
unrelated CLs exponentially.

For example, imagine a CL that adds a test that fails with 50% probability. Even
with no retries, the test will land with 50% probability. Subsequently, 50% of
all unrelated CQ attempts would flakily fail. This effect is cumulative across
different flaky tests. Since the CQ has roughly 20,000 unique flaky tests,
without retries, pretty much no CL would ever pass the CQ.

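The tradeoff can be made concrete with a little arithmetic. The sketch below
assumes every run of a flaky test fails independently with the same
probability, which is a simplification, and the 0.1% per-test failure rate used
for the cumulative case is an illustrative figure rather than a measured one.

```python
def pass_probability(fail_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent runs passes."""
    return 1.0 - fail_rate ** attempts


def cq_pass_probability(fail_rate: float, attempts: int, flaky_tests: int) -> float:
    """Probability that all `flaky_tests` independent flaky tests pass."""
    return pass_probability(fail_rate, attempts) ** flaky_tests


# One test that fails 50% of the time, with 1, 2, and 3 attempts.
for attempts in (1, 2, 3):
    lands = pass_probability(0.5, attempts)  # chance the new flaky test lands
    breaks_others = 0.5 ** attempts          # chance it fails an unrelated CL
    print(attempts, lands, breaks_others)

# 20,000 flaky tests, each assumed to fail 0.1% of the time.
print(cq_pass_probability(0.001, 1, 20_000))  # no retries
print(cq_pass_probability(0.001, 2, 20_000))  # one retry
```

With one, two, and three attempts, the chance of the 50% flaky test landing
goes 0.5, 0.75, 0.875, while the chance of it failing an unrelated CL goes
0.5, 0.25, 0.125. In the cumulative case, a single retry takes the chance of a
clean CQ run from practically zero to roughly 98% under these assumptions.
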
## Help!

Have other questions? Run into any issues with the CQ? Email
infra-dev@chromium.org, or file a [trooper bug](https://g.co/bugatrooper).


[1]: https://chromium.googlesource.com/chromium/tools/depot_tools/+/HEAD/git_footers.py
[2]: ../../infra/config/generated/luci/commit-queue.cfg