# CQ

This document describes how the Chromium Commit Queue (CQ) is structured and
managed. It is specific to the Chromium CQ; questions about other CQs should
be directed to infra-dev@chromium.org. If you find terms you're not familiar
with in this doc, consult the infra [glossary](glossary.md).

[TOC]

## Purpose

The Chromium CQ exists to test developer changes before they land in
[chromium/src](https://chromium.googlesource.com/chromium/src/). It runs a
curated set of test suites across a curated set of platforms for a given CL,
and ensures the CL doesn't introduce any new regressions.

## Modes

The CQ supports a few different modes of execution. Each mode differs in what
tests it runs and/or what happens when all those tests pass. These modes are
described below.

### Dry-Run

This runs the normal set of tests for a CL. When the dry run has completed, it
simply reports the results of the CQ attempt as a Gerrit comment on the CL and
takes no further action. This mode is intended to be used frequently as a CL is
developed. It can be triggered either via the `CQ DRY RUN` button in Gerrit or
by applying the `Commit-Queue +1` label vote. See
[below](#what-exactly-does-the-cq-run) for the anatomy of a build on the CQ.

### Full-Run

This runs all the same tests as a dry run. If no new regressions are introduced,
the CL is submitted to the repo. This mode should only be used when a CL is
finalized. It can be triggered either via the `SUBMIT TO CQ` button in Gerrit or
by applying the `Commit-Queue +2` label vote.

### Mega-CQ Dry-Run

This runs a much larger set of tests than the previous two CQ modes. Those modes
run a limited, curated set of tests that optimizes for quick turnaround time
while still catching most _but not all_ regressions. The Mega CQ, on the other
hand, aims to catch nearly _all_ regressions regardless of cycle time.
Consequently, the Mega CQ takes much longer and should only be used for
particularly risky CLs. It is triggered via the `Mega CQ: Dry Run` button under
the three-dot menu in Gerrit.

For a peek under the hood, the Mega CQ determines its test coverage by including
the trybot mirrors of all CI builders gardened by the main Chromium gardening
rotations. It also unconditionally applies `Include-Ci-Only-Tests: true` to its
builds (see [below](#options)).

If you find that the Mega CQ isn't covering a build or test config that it
should, please file a general [trooper bug](https://g.co/bugatrooper) for the
missing coverage.

### Mega-CQ Full-Run

This runs all the same tests as the Mega-CQ dry run, and submits the CL if
everything passes. It is triggered via the `Mega CQ: Submit` button under the
three-dot menu in Gerrit. The number of builds and tests the Mega CQ runs makes
a passing run much less likely than for a normal CQ run. Consequently, when
running the Mega CQ, you'll likely want to spot-check the failures listed on
Gerrit for anything that looks particularly relevant to your CL. You can then
use the normal `SUBMIT TO CQ` button to land once all failures look unrelated.

## Options

The Chromium CQ supports a variety of options that can change what it checks.

> These options are supported via git footers. They must appear in the last
> paragraph of your commit message to be used. See `git help footers` or
> [git_footers.py][1] for more information.

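The "last paragraph" rule can be illustrated with a small Python sketch that collects `Key: value` footers from the final paragraph of a commit message. This is a simplified approximation written for this doc. `parse_footers` is a hypothetical helper, not the git_footers.py API, and the real parser may differ (for example, in how it treats non-footer lines or normalizes keys). The example message and trybot name are illustrative.

```python
import re
from collections import defaultdict

# Simplified "Key-Name: value" pattern (an assumption for illustration only).
FOOTER_RE = re.compile(r"^([A-Za-z0-9-]+)\s*:\s*(.*)$")


def parse_footers(commit_message: str) -> dict[str, list[str]]:
    """Collects footers from the last paragraph of a commit message.

    Hypothetical helper: only lines in the final paragraph are considered,
    a message consisting of a single paragraph has no footers, and repeated
    keys keep every value in order.
    """
    # Paragraphs are separated by one or more blank (or whitespace-only) lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", commit_message.strip()) if p.strip()]
    if len(paragraphs) < 2:
        return {}
    footers: dict[str, list[str]] = defaultdict(list)
    for line in paragraphs[-1].splitlines():
        match = FOOTER_RE.match(line.strip())
        if match:
            footers[match.group(1)].append(match.group(2).strip())
    return dict(footers)


if __name__ == "__main__":
    message = (
        "Add a widget\n"
        "\n"
        "Note: this paragraph is prose, so this line is not a footer.\n"
        "\n"
        "Cq-Include-Trybots: luci.chromium.try:linux-rel\n"
        "No-Presubmit: true\n"
    )
    print(parse_footers(message))
    # {'Cq-Include-Trybots': ['luci.chromium.try:linux-rel'], 'No-Presubmit': ['true']}
```

Collecting values into lists mirrors the fact that some footers, such as `Skip-Clang-Tidy-Checks`, may legitimately be repeated.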
* `Binary-Size: <rationale>`

  This should be used when you are landing a change that will intentionally
  increase the size of the Chrome binaries on Android (since we try not to
  accidentally do so). The rationale should explain why this is okay to do.

* `Commit: false`

  You can mark a CL with this if you are working on experimental code and do not
  want to risk accidentally submitting it via the CQ. The CQ will immediately
  stop processing the change if it contains this option.

* `Cq-Include-Trybots: <trybots>`

  This footer allows you to specify additional bots to run for this CL, in
  addition to the default bots. The format for the list of trybots is
  `bucket:trybot1,trybot2;bucket2:trybot3`.

* `Disable-Retries: true`

  The CQ normally retries failed test shards (up to a point) to work around
  intermittent infra failures. If this footer is set, it won't retry failed
  shards no matter what happens.

* `Ignore-Freeze: true`

  Whenever there is an active prod freeze (usually around Christmas), it can be
  bypassed by setting this footer.

* `Include-Ci-Only-Tests: true` or
  `Include-Ci-Only-Tests: <comma-separated-builder-ids>|<comma-separated-tests>`

  Some builder configurations may be configured to run some tests only after
  submission (on CI) and not before submission (in the CQ) by default. Possible
  reasons for this are that the tests are too slow or too expensive, or that
  there is insufficient capacity to run them for every CL. To still be able to
  explicitly reproduce what the CI builder is doing, you can specify this footer
  to run those tests before submission anyway. Specifying `true` will run all
  such tests on any triggered try builders. Specifying builder IDs and tests
  will run only the named tests defined for the identified CI builders on the
  try builders that mirror those CI builders. A `*` can be used in place of a
  builder ID or test name to match any builder/test.

  Constructing a footer value manually should generally be unnecessary: tests
  configured to run only on CI will have the necessary footer included in their
  step text and in the build summary when they fail.

  If it is necessary to construct a footer manually, the builder IDs have the
  format _builder-group_:_builder-name_. _builder-name_ is the name of the CI
  builder that the test is configured for. _builder-group_ is the "builder
  group" of that builder, which is a concept specific to Chromium infra. The
  builder group of a builder can be found by going to a build of the builder
  and finding the `builder_group` property in the `Input Properties` section of
  the `Infra` tab.

* `No-Presubmit: true`

  If you want to skip the presubmit check, you can add this line, and the commit
  queue won't run the presubmit for your change. This should only be used when
  there's a bug in the PRESUBMIT scripts. Please check that there's a bug filed
  against the bad script, and if there isn't, [file one](https://crbug.com/new).

* `No-Tree-Checks: true`

  Add this line if you want to skip the tree status checks. This means the CQ
  will commit a CL even if the tree is closed. Obviously this is strongly
  discouraged, since the tree is usually closed for a reason. However, in rare
  cases this is acceptable, primarily to fix build breakages (i.e., your CL will
  help in reopening the tree).

* `No-Try: true`

  This should only be used for reverts to green the tree, since it skips
  trybots and might therefore break the tree. You shouldn't use this otherwise.

* `Validate-Test-Flakiness: skip`

  This will disable the `test new tests for flakiness.*` steps in CQ builds that
  check new tests for flakiness.

* `Skip-Clang-Tidy-Checks: <check_1>,<check_2>,...`

  This will skip the specified clang-tidy checks. The checks can be specified
  as a check name (e.g. `modernize-use-equals-default`) or as a glob to skip a
  set of checks (e.g. `modernize-*` to skip checks that advocate usage of modern
  language constructs). The footer can also be repeated across multiple lines,
  for example:
  ```
  Skip-Clang-Tidy-Checks: google-explicit-constructor
  Skip-Clang-Tidy-Checks: modernize-*,readability-*
  ```

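The value formats of `Cq-Include-Trybots`, `Include-Ci-Only-Tests`, and `Skip-Clang-Tidy-Checks` described above can be sketched in Python as follows. All function names are hypothetical. The code is only a reading of the formats documented here, not the CQ's or the recipes' actual implementation.

The sketch makes several assumptions:

* `true` is treated as equivalent to `*|*`.
* `*` is the only wildcard in `Include-Ci-Only-Tests`.
* The `|` separator is always present when builder IDs are given.
* clang-tidy globs behave like shell-style patterns (`fnmatch`).

```python
from fnmatch import fnmatchcase


def _split_csv(text: str) -> list[str]:
    """Splits a comma-separated list, dropping whitespace and empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def parse_include_trybots(value: str) -> dict[str, list[str]]:
    """Parses "bucket:trybot1,trybot2;bucket2:trybot3" into {bucket: [trybots]}."""
    result: dict[str, list[str]] = {}
    for group in value.split(";"):
        if not group.strip():
            continue
        # Everything before the first ":" is the bucket; the rest is trybots.
        bucket, _, trybots = group.strip().partition(":")
        result.setdefault(bucket, []).extend(_split_csv(trybots))
    return result


def parse_include_ci_only_tests(value: str) -> tuple[list[str], list[str]]:
    """Parses an Include-Ci-Only-Tests value into (builder_ids, tests).

    Assumption: "true" means every builder and every test, i.e. "*|*".
    Otherwise the value is "<builder-ids>|<tests>", each comma-separated,
    and builder IDs look like "builder-group:builder-name".
    """
    value = value.strip()
    if value == "true":
        return ["*"], ["*"]
    builder_ids, _, tests = value.partition("|")
    return _split_csv(builder_ids), _split_csv(tests)


def ci_only_test_selected(spec: tuple[list[str], list[str]],
                          builder_id: str, test: str) -> bool:
    """Whether `test`, configured for CI builder `builder_id`, would run."""
    builder_ids, tests = spec
    builder_ok = "*" in builder_ids or builder_id in builder_ids
    test_ok = "*" in tests or test in tests
    return builder_ok and test_ok


def clang_tidy_check_skipped(check: str, footer_values: list[str]) -> bool:
    """Whether `check` matches a name or glob on any Skip-Clang-Tidy-Checks line."""
    patterns = [p for value in footer_values for p in _split_csv(value)]
    return any(fnmatchcase(check, pattern) for pattern in patterns)
```

Here `footer_values` stands for the values of every `Skip-Clang-Tidy-Checks` line in the footer paragraph. For example, with the two lines shown above, `modernize-use-equals-default` is skipped via the `modernize-*` glob.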
## FAQ

### What exactly does the CQ run?

The CQ runs the jobs specified in [commit-queue.cfg][2]. See
[`cq-builders.md`](../../infra/config/generated/cq-builders.md)
for an auto-generated file with links to information about the builders on the
CQ.

Some of these jobs are experimental. This means they are executed on a
percentage of CQ builds, and the outcome of the build doesn't affect whether the
CL can land. See the schema linked at the top of `commit-queue.cfg` for more
information on what the fields in the config do.

The CQ has the following structure:

* Compile all test suites that might be affected by the CL.
* Run all test suites that might be affected by the CL.
  * Many test suites are divided into shards. Each shard is run as a separate
    swarming task.
  * These steps are labeled '(with patch)'.
* Retry each shard that has a test failure. The retry has the exact same
  configuration as the original run. No recompile is necessary.
  * If the retry succeeds, then the failure is ignored.
  * These steps are labeled '(retry shards with patch)'.
  * It's important to retry with the exact same configuration. Attempting to
    retry the failing test in isolation often produces different behavior.
* Recompile each failing test suite without the CL. Rerun each failing test
  suite in isolation.
  * If the test also fails without the CL, then the failure is ignored, as it's
    assumed that the test is broken/flaky on tip of tree.
  * These steps are labeled '(without patch)'.
* Fail the build if there are tests which failed in both '(with patch)' and
  '(retry shards with patch)' but passed in '(without patch)'.

### Why did my CL fail the CQ?

Please follow these general guidelines:

1. Check whether your patch caused the build failures, and fix them if possible.
1. If compilation or individual tests are failing on one or more CQ bots and you
   suspect that your CL is not responsible, please contact your friendly
   neighborhood gardener by filing a
   [gardener bug](https://bugs.chromium.org/p/chromium/issues/entry?template=Defect%20report%20from%20developer&labels=Gardener-Chromium&summary=%5BBrief%20description%20of%20problem%5D&comment=What%27s%20wrong?).
   If the code in question has appropriate OWNERS, consider contacting or CCing
   them.
1. If other parts of CQ bot execution (e.g. `bot_update`) are failing, or you
   have reason to believe the CQ itself is broken, or you can't really
   tell what's wrong, please file a [trooper bug](https://g.co/bugatrooper).

In both cases, when filing bugs, please include links to the build and/or CL
(including relevant patchset information) in question.

### How do I stop the CQ?

There are a few ways to do this, including:

1. Change the `Commit-Queue` vote from +1 (or +2) to 0 in the Gerrit UI.
2. Upload a new patchset, which triggers a new dry run (e.g. `git cl upload -d`).
3. Vote `Code-Review -1`. This prevents the CL from landing.

### How do I add a new builder to the CQ?

There are several requirements for a builder to be added to the Commit Queue.

* There must be a "mirrored" (aka matching) CI builder that is gardened, to
  ensure that someone is actively keeping the configuration green.
* All the code for this configuration must be in Chromium's public repository or
  brought in through [src/DEPS](../../DEPS).
* Setting up the build should be straightforward for a Chromium developer
  familiar with existing configurations.
* Tests should use existing test harnesses, e.g.
  [gtest](../../third_party/googletest).
* It should be possible for any committer to replicate any testing run, i.e.
  tests and their data must be in the public repository.
* Median cycle time needs to be under 40 minutes for trybots. The 90th
  percentile should be around an hour (preferably shorter).
* Configurations need to catch enough failures to be worth adding to the CQ.
  Running builds on every CL requires a significant amount of compute resources.
  If a configuration only fails once every couple of weeks on the waterfalls,
  then it's probably not worth adding it to the commit queue.

Please email [email protected], who will approve new build configurations.

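To sanity-check a candidate builder against the cycle-time requirement above, you can compute the median and 90th percentile of its recent build durations. Below is a minimal sketch, assuming you already have a list of durations in minutes (fetching them from build history is out of scope here).

Several details are illustrative assumptions rather than CQ policy:

* The function name is hypothetical.
* A 60-minute limit stands in for "around an hour".
* The percentile uses the default (exclusive) method of Python's `statistics.quantiles`, which is only one of several reasonable definitions.

```python
import statistics


def meets_cq_cycle_time(durations_min: list[float],
                        median_limit: float = 40.0,
                        p90_limit: float = 60.0) -> bool:
    """Checks median < median_limit and p90 <= p90_limit (all in minutes)."""
    if len(durations_min) < 2:
        raise ValueError("need at least two builds to estimate percentiles")
    median = statistics.median(durations_min)
    # quantiles(n=10) returns the 9 decile cut points; index 8 is the p90.
    p90 = statistics.quantiles(durations_min, n=10)[8]
    return median < median_limit and p90 <= p90_limit
```

A builder with a fast median can still fail this check if its slowest builds form a long tail, which is what the 90th-percentile requirement guards against.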
### How do I ensure a trybot runs on all changes to a specific directory?

Several builders are included in the CQ only for changes that affect specific
directories. These used to be configured via `Cq-Include-Trybots` footers
injected at CL upload time. They are now configured via the `location_regexp`
attribute of the `tryjob` parameter to the try builder's definition, e.g.:

```
  try_.some_builder_function(
      name = "my-specific-try-builder",
      tryjob = try_.job(
          location_regexp = [
              ".+/[+]/path/to/my/specific/directory/.+"
          ],
      ),
  )
```

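To see which CLs a `location_regexp` pattern like the one above selects, you can apply it to location strings with Python's `re.fullmatch`. In the pattern, `[+]` is a regex character class matching a literal `+`.

The sketch rests on a few assumptions; check the LUCI CV documentation for the authoritative format:

* Each changed file is matched as a string of the form `<host>/<project>/+/<file path>`.
* The whole string must match.
* The example host and project are made up.
* `triggers_builder` is a hypothetical helper.
* Python's `re` stands in for whatever regex engine LUCI actually uses; for a simple pattern like this one the syntax agrees across common engines.

```python
import re

# Pattern copied from the example above; "[+]" matches a literal "+".
LOCATION_REGEXP = r".+/[+]/path/to/my/specific/directory/.+"


def triggers_builder(changed_files: list[str], host_and_project: str) -> bool:
    """Hypothetical check: does any changed file match the location regexp?

    Assumes each file is matched as "<host>/<project>/+/<path>" and that the
    pattern must match the whole string.
    """
    pattern = re.compile(LOCATION_REGEXP)
    return any(
        pattern.fullmatch(f"{host_and_project}/+/{path}") for path in changed_files)
```

Note that a sibling directory such as `path/to/my/specific/directory_other/` does not match, because the pattern requires `directory/` followed by at least one more character.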
## Flakiness

The CQ can sometimes be flaky. Flakiness is when a test on the CQ fails but
should have passed (commonly known as a false negative). There are a few common
causes of flaky tests on the CQ:

* Machine issues: unexpected system processes running, running out of disk
  space, etc.
* Test issues: individual tests not being independent and relying on the order
  of tests being run, or not mocking out network traffic or other real-world
  interactions.

The CQ mitigates flakiness by retrying failed tests. The core tradeoff in retry
policy is that adding retries increases the probability that a flaky test will
land on tip of tree sublinearly, but mitigates the impact of the flaky test on
unrelated CLs exponentially.

For example, imagine a CL that adds a test that fails with 50% probability. Even
with no retries, the test will land with 50% probability. Subsequently, 50% of
all unrelated CQ attempts would flakily fail. This effect is cumulative across
different flaky tests. Since the CQ has roughly 20,000 unique flaky tests,
without retries pretty much no CL would ever pass the CQ.

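The tradeoff above can be made concrete with a little probability. Suppose a test fails with probability p on each attempt and the CQ runs it up to k times.

* The CL adding the test lands with probability 1 - p^k, since it only needs one pass. This grows slowly and is capped at 1.
* An unrelated CL is blocked only if all k attempts fail, which happens with probability p^k. This shrinks exponentially in k.

The sketch assumes attempts are independent, which real flakes often violate. The per-test failure rate in the 20,000-test example is a made-up illustrative number, not a measured one.

```python
def land_probability(p: float, attempts: int) -> float:
    """Chance a CL adding a flaky test lands: at least one attempt passes."""
    return 1 - p ** attempts


def unrelated_block_probability(p: float, attempts: int) -> float:
    """Chance the flaky test blocks an unrelated CL: every attempt fails."""
    return p ** attempts


def unrelated_pass_probability(p: float, attempts: int, num_flaky: int) -> float:
    """Chance an unrelated CL survives num_flaky independent flaky tests."""
    return (1 - unrelated_block_probability(p, attempts)) ** num_flaky


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, land_probability(0.5, k), unrelated_block_probability(0.5, k))
    # 1 0.5 0.5
    # 2 0.75 0.25
    # 3 0.875 0.125

    # Hypothetical: 20,000 flaky tests that each fail 1% of the time.
    print(unrelated_pass_probability(0.01, 1, 20_000))  # vanishingly small
    print(unrelated_pass_probability(0.01, 3, 20_000))  # roughly 0.98
```

Going from one attempt to three raises the landing probability of the 50% flake only from 0.5 to 0.875. Meanwhile, the chance that an unrelated CL survives the hypothetical 20,000 flakes goes from essentially zero to about 98%.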
## Help!

Have other questions? Run into any issues with the CQ? Email
infra-dev@chromium.org, or file a [trooper bug](https://g.co/bugatrooper).

[1]: https://chromium.googlesource.com/chromium/tools/depot_tools/+/HEAD/git_footers.py
[2]: ../../infra/config/generated/luci/commit-queue.cfg