Blame - docs/infra/cq.md - chromium/src

# CQ

This document describes how the Chromium Commit Queue (CQ) is structured and
managed. It is specific to the Chromium CQ; questions about other CQs should
be directed to infra-dev@chromium.org. If you find terms you're not familiar
with in this doc, consult the infra [glossary](glossary.md).

[TOC]

## Purpose

The Chromium CQ exists to test developer changes before they land in
[chromium/src](https://chromium.googlesource.com/chromium/src/). It runs a
curated set of test suites across a curated set of platforms for a given CL,
and ensures the CL doesn't introduce any new regressions.

## Modes

The CQ supports a few different modes of execution. Each mode differs in what
tests it runs and/or what happens when all those tests pass. These modes are
described below.

### Dry-Run

This runs the normal set of tests for a CL. When the dry-run has completed,
it simply reports the results of the CQ attempt as a Gerrit comment on the
CL and takes no further action. This mode is intended to be used frequently as a
CL is developed. It can be triggered via either the `CQ DRY RUN` button in
Gerrit or by applying the `Commit-Queue +1` label vote. See
[below](#what-exactly-does-the-cq-run) for the anatomy of a build on the CQ.

### Full-Run

This runs all the same tests as a dry-run. If no new regressions are introduced,
the CL will be submitted into the repo. This mode should only be used when a CL
is finalized. It can be triggered via either the `SUBMIT TO CQ` button in Gerrit
or by applying the `Commit-Queue +2` label vote.

### Mega-CQ Dry-Run

This runs a much larger set of tests compared to the previous two CQ modes.
Those run a limited & curated set of tests that optimizes for quick turn-around
time while still catching most _but not all_ regressions. The Mega CQ, on the
other hand, aims to catch nearly _all_ regressions regardless of cycle time.
Consequently, the Mega CQ takes much longer, and should only be used for
particularly risky CLs. It is triggered via the `Mega CQ: Dry Run` button under
the three-dot menu in Gerrit.

For a peek under the hood, the Mega CQ determines its test coverage by including
the trybot mirrors of all CI builders gardened by the main Chromium
gardening rotations. It also unconditionally applies
`Include-Ci-Only-Tests: true` to its builds (see [below](#options)).

If you find that the Mega CQ isn't covering a build or test config that it
should, please file a general [trooper bug](https://g.co/bugatrooper) for the
missing coverage.

### Mega-CQ Full-Run

This runs all the same tests as the Mega-CQ dry-run and submits the CL if
everything passes. It is triggered via the `Mega CQ: Submit` button under the
three-dot menu in Gerrit. The number of builds and tests the Mega CQ runs makes
a passing run much less likely than a normal CQ run. Consequently, when
running the Mega CQ, you'll likely want to spot-check the failures listed on
Gerrit for anything that looks particularly relevant to your CL. Then you can use
the normal `SUBMIT TO CQ` button to land once all failures look unrelated.

## Options

The Chromium CQ supports a variety of options that can change what it checks.

> These options are supported via git footers. They must appear in the last
> paragraph of your commit message to be used. See `git help footers` or
> [git_footers.py][1] for more information.

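As a rough illustration of the "last paragraph" rule, the sketch below pulls
`Key: value` lines out of the final paragraph of a commit message. It is a
simplified stand-in and not the logic in `git_footers.py`; the function name
`parse_footers` and the regular expression are assumptions made for this
example.

```python
import re

# Simplified footer pattern "Some-Key: value". This is an assumption for
# illustration; the real parser in depot_tools handles more cases.
FOOTER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.*)$")


def parse_footers(message: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs found in the last paragraph of a message."""
    paragraphs = [p for p in re.split(r"\n\s*\n", message.strip()) if p.strip()]
    if not paragraphs:
        return []
    footers = []
    for line in paragraphs[-1].splitlines():
        match = FOOTER_RE.match(line.strip())
        if match:
            footers.append((match.group(1), match.group(2).strip()))
    return footers


message = """Fix crash in widget code

Commit: false

Longer description of the change.

Cq-Include-Trybots: bucket:trybot1
No-Try: true
"""

# "Commit: false" is not in the last paragraph, so it is not picked up.
print(parse_footers(message))
```

In this sketch, only the two lines in the final paragraph are returned, which
is why an option placed earlier in the message would have no effect.
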
* `Binary-Size: <rationale>`

This should be used when you are landing a change that will intentionally
increase the size of the Chrome binaries on Android (since we try not to
accidentally do so). The rationale should explain why this is okay to do.

* `Commit: false`

You can mark a CL with this if you are working on experimental code and do not
want to risk accidentally submitting it via the CQ. The CQ will immediately
stop processing the change if it contains this option.

* `Cq-Include-Trybots: <trybots>`

This flag allows you to specify some additional bots to run for this CL, in
addition to the default bots. The format for the list of trybots is
"bucket:trybot1,trybot2;bucket2:trybot3".

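To make the trybot list format concrete, here is a minimal sketch that splits
such a value into buckets and trybot names. The helper name
`parse_include_trybots` is hypothetical, and this is not the parser the CQ
itself uses.

```python
def parse_include_trybots(value: str) -> dict[str, list[str]]:
    """Parse "bucket:trybot1,trybot2;bucket2:trybot3" into {bucket: [trybots]}."""
    result: dict[str, list[str]] = {}
    for group in value.split(";"):
        group = group.strip()
        if not group:
            continue
        # Everything before the first ":" is the bucket, the rest is the bot list.
        bucket, sep, bots = group.partition(":")
        if not sep or not bucket.strip():
            raise ValueError(f"expected 'bucket:trybot,...', got {group!r}")
        names = [b.strip() for b in bots.split(",") if b.strip()]
        result.setdefault(bucket.strip(), []).extend(names)
    return result


print(parse_include_trybots("bucket:trybot1,trybot2;bucket2:trybot3"))
```

Groups are separated by `;`, the bucket is separated from its bots by `:`, and
bots within a bucket are separated by `,`.
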
* `Disable-Retries: true`

The CQ will normally retry failed test shards (up to a point) to work
around any intermittent infra failures. If this footer is set, it won't
retry failed shards no matter what happens.

* `Ignore-Freeze: true`

Whenever there is an active prod freeze (usually around Christmas), it can be
bypassed by setting this footer.

* `Include-Ci-Only-Tests: true` or
  `Include-Ci-Only-Tests: <comma-separated-builder-ids>\|<comma-separated-tests>`

Some builder configurations may be configured to run some tests only after
submission (on CI) and not before submission (in the CQ) by default. Possible
reasons are that the tests are too slow or too expensive, or that there is
insufficient capacity to run the tests for every CL. In order to still be able
to explicitly reproduce what the CI builder is doing, you can specify this
footer to run those tests before submission anyway. Specifying true will run
all such tests on any triggered try builders. Specifying builder IDs and tests
will run only the named tests defined for the identified CI builders on the
try builders that mirror those CI builders. A * can be used in place of a
builder ID or test name to match any builder/test.

Constructing a footer value manually should generally be unnecessary: tests
configured to run only on CI will have the necessary footer included in their
step text and in the build summary when they fail.

If it is necessary to construct a footer manually, the builder IDs have the
format of _builder-group_:_builder-name_. _builder-name_ is the name of the CI
builder that the test is configured for. _builder-group_ is the "builder
group" of that builder, which is a concept specific to chromium infra. The
builder group of a builder can be found by going to a build of the builder,
and finding the `builder_group` property in the `Input Properties` section of
the `Infra` tab.

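If a footer does have to be assembled by hand, a small helper along these
lines may reduce formatting mistakes. This is only a sketch: the function name
is hypothetical, the builder group, builder name, and test name in the usage
lines are made-up placeholders, and it assumes the two comma-separated lists
are joined by a literal `|` character.

```python
def ci_only_tests_footer(builder_ids: list[str], tests: list[str]) -> str:
    """Build an Include-Ci-Only-Tests footer line.

    Each builder ID is "builder-group:builder-name"; "*" matches any builder.
    ASSUMPTION: the two lists are separated by a literal "|".
    """
    for builder_id in builder_ids:
        if builder_id != "*" and builder_id.count(":") != 1:
            raise ValueError(
                f"expected 'builder-group:builder-name', got {builder_id!r}")
    return f"Include-Ci-Only-Tests: {','.join(builder_ids)}|{','.join(tests)}"


# Hypothetical builder group, builder name, and test suite name.
print(ci_only_tests_footer(["example.group:Example Builder"], ["example_unittests"]))
# "*" in place of a builder ID matches any builder.
print(ci_only_tests_footer(["*"], ["example_unittests"]))
```

When a CI-only test fails, copying the footer from the step text or build
summary remains the more reliable route.
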
* `No-Presubmit: true`

If you want to skip the presubmit check, you can add this line, and the commit
queue won't run the presubmit for your change. This should only be used when
there's a bug in the PRESUBMIT scripts. Please check that there's a bug filed
against the bad script, and if there isn't, [file one](https://crbug.com/new).

* `No-Tree-Checks: true`

Add this line if you want to skip the tree status checks. This means the CQ
will commit a CL even if the tree is closed. Obviously this is strongly
discouraged, since the tree is usually closed for a reason. However, in rare
cases this is acceptable, primarily to fix build breakages (i.e., your CL will
help in reopening the tree).

* `No-Try: true`

This should only be used for reverts to green the tree, since it skips try
bots and might therefore break the tree. You shouldn't use this otherwise.

* `Validate-Test-Flakiness: skip`

This will disable the `test new tests for flakiness.*` steps in CQ builds that
check new tests for flakiness.

* `Skip-Clang-Tidy-Checks: <check_1>,<check_2>,...`

This will skip the specified clang-tidy checks. Each check can be specified
as a check name (e.g. `modernize-use-equals-default`) or as a glob that skips a
set of checks (e.g. `modernize-*` to skip checks that advocate usage of modern
language constructs). This option can span multiple lines, for example:
```
Skip-Clang-Tidy-Checks: google-explicit-constructor
Skip-Clang-Tidy-Checks: modernize-*,readability-*
```

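To show how name and glob entries combine across several footer lines, the
sketch below uses Python's `fnmatch` module as a stand-in for glob matching.
How the CQ actually evaluates these globs is not stated in this document, so
treat the matching semantics and the helper name `skipped_checks` as
assumptions.

```python
from fnmatch import fnmatchcase


def skipped_checks(footer_values: list[str], checks: list[str]) -> list[str]:
    """Return the checks matched by any name or glob in the footer values."""
    patterns = [
        p.strip()
        for value in footer_values
        for p in value.split(",")
        if p.strip()
    ]
    return [c for c in checks if any(fnmatchcase(c, p) for p in patterns)]


# One entry per Skip-Clang-Tidy-Checks line in the commit message.
footer_values = ["google-explicit-constructor", "modernize-*,readability-*"]
checks = [
    "google-explicit-constructor",
    "modernize-use-equals-default",
    "readability-identifier-naming",
    "bugprone-use-after-move",
]
print(skipped_checks(footer_values, checks))
```

Under these assumptions the first three checks are skipped, while
`bugprone-use-after-move` still runs because no entry matches it.
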
## FAQ

### What exactly does the CQ run?

The CQ runs the jobs specified in [commit-queue.cfg][2]. See
[`cq-builders.md`](../../infra/config/generated/cq-builders.md)
for an auto-generated file with links to information about the builders on the
CQ.

Some of these jobs are experimental. This means they are executed on a
percentage of CQ builds, and the outcome of the build doesn't affect whether the
CL can land. See the schema linked at the top of the file for more
information on what the fields in the config do.

The CQ has the following structure:

* Compile all test suites that might be affected by the CL.
* Run all test suites that might be affected by the CL.
    * Many test suites are divided into shards. Each shard is run as a separate
      swarming task.
    * These steps are labeled '(with patch)'
* Retry each shard that has a test failure. The retry has the exact same
  configuration as the original run. No recompile is necessary.
    * If the retry succeeds, then the failure is ignored.
    * These steps are labeled '(retry shards with patch)'
    * It's important to retry with the exact same configuration. Attempting to
      retry the failing test in isolation often produces different behavior.
* Recompile each failing test suite without the CL. Rerun each failing test
  suite in isolation.
    * If the retry fails, then the failure is ignored, as it's assumed that the
      test is broken/flaky on tip of tree.
    * These steps are labeled '(without patch)'
* Fail the build if there are tests which failed in both '(with patch)' and
  '(retry shards with patch)' but passed in '(without patch)'.

### Why did my CL fail the CQ?

Please follow these general guidelines:

1. Check to see if your patch caused the build failures, and fix if possible.
1. If compilation or individual tests are failing on one or more CQ bots and you
   suspect that your CL is not responsible, please contact your friendly
   neighborhood gardener by filing a
   [gardener bug](https://bugs.chromium.org/p/chromium/issues/entry?template=Defect%20report%20from%20developer&labels=Gardener-Chromium&summary=%5BBrief%20description%20of%20problem%5D&comment=What%27s%20wrong?).
   If the code in question has appropriate OWNERS, consider contacting or CCing
   them.
1. If other parts of CQ bot execution (e.g. `bot_update`) are failing, or you
   have reason to believe the CQ itself is broken, or you can't really
   tell what's wrong, please file a [trooper bug](https://g.co/bugatrooper).

In both cases, when filing bugs, please include links to the build and/or CL
(including relevant patchset information) in question.

### How do I stop the CQ?

There are a few ways to do this. Here are three:

1. Change the Commit-Queue value from +1 to 0 in the Gerrit UI.
2. Upload a new patchset that triggers a new dry run (e.g. `git cl upload -d`).
3. Vote Code-Review -1. This prevents a CL from landing.

### How do I add a new builder to the CQ?

There are several requirements for a builder to be added to the Commit Queue.

* There must be a "mirrored" (aka matching) CI builder that is gardened, to
  ensure that someone is actively keeping the configuration green.
* All the code for this configuration must be in Chromium's public repository or
  brought in through [src/DEPS](../../DEPS).
* Setting up the build should be straightforward for a Chromium developer
  familiar with existing configurations.
* Tests should use existing test harnesses, e.g.
  [gtest](../../third_party/googletest).
* It should be possible for any committer to replicate any testing run, i.e.
  tests and their data must be in the public repository.
* Median cycle time needs to be under 40 minutes for trybots. 90th percentile
  should be around an hour (preferably shorter).
* Configurations need to catch enough failures to be worth adding to the CQ.
  Running builds on every CL requires a significant amount of compute resources.
  If a configuration only fails once every couple of weeks on the waterfalls,
  then it's probably not worth adding it to the commit queue.

Please email [email protected], who will approve new build configurations.

### How do I ensure a trybot runs on all changes to a specific directory?

Several builders are included in the CQ only for changes that affect specific
directories. These used to be configured via Cq-Include-Trybots footers
injected at CL upload time. They are now configured via the `location_regexp`
attribute of the tryjob parameter to the try builder's definition, e.g.

```
try_.some_builder_function(
    name = "my-specific-try-builder",
    tryjob = try_.job(
        location_regexp = [
            ".+/[+]/path/to/my/specific/directory/.+"
        ],
    ),
)
```

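To get a feel for what such a pattern selects, the sketch below tests a pattern
of the same shape against two file locations using Python's `re` module. It
assumes that each changed file is presented to the matcher as
`<host>/<project>/+/<path>` and that the whole string must match; both are
assumptions about the matching behavior, and the host and file names are
placeholders.

```python
import re

# Same shape as the location_regexp entry: anything, then "/+/", then the
# directory of interest, then at least one more character.
LOCATION_REGEXP = r".+/[+]/path/to/my/specific/directory/.+"


def matches(location: str) -> bool:
    """Return True if the whole location string matches the pattern."""
    return re.fullmatch(LOCATION_REGEXP, location) is not None


# Placeholder host/project prefix for the example.
prefix = "https://chromium-review.googlesource.com/chromium/src"
print(matches(f"{prefix}/+/path/to/my/specific/directory/foo.cc"))  # True
print(matches(f"{prefix}/+/some/other/directory/foo.cc"))  # False
```

The `[+]` character class matches a literal `+` without needing a backslash
escape inside the config string.
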
## Flakiness

The CQ can sometimes be flaky. Flakiness is when a test on the CQ fails, but
should have passed (commonly known as a false negative). There are a few common
causes of flaky tests on the CQ:

* Machine issues: weird system processes running, running out of disk space,
  etc.
* Test issues: individual tests not being independent and relying on the order
  of tests being run, not mocking out network traffic or other real world
  interactions.

The CQ mitigates flakiness by retrying failed tests. The core tradeoff in retry
policy is that adding retries increases the probability that a flaky test will
land on tip of tree sublinearly, but mitigates the impact of the flaky test on
unrelated CLs exponentially.

For example, imagine a CL that adds a test that fails with 50% probability. Even
with no retries, the test will land with 50% probability. Subsequently, 50% of
all unrelated CQ attempts would flakily fail. This effect is cumulative across
different flaky tests. Since the CQ has roughly 20,000 unique flaky tests,
without retries, pretty much no CL would ever pass the CQ.

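The arithmetic behind this tradeoff can be sketched under a simplified model:
each attempt fails independently with the same probability, and a test counts
as passing if any attempt passes. This model and the function names are
assumptions for illustration; the real retry flow also includes the
'(without patch)' step. The 0.1% per-test failure rate used at the end is a
made-up figure, not a measurement.

```python
def land_probability(fail_rate: float, attempts: int) -> float:
    """Chance that a new flaky test passes at least once and so lands."""
    return 1 - fail_rate ** attempts


def unrelated_failure_probability(fail_rate: float, attempts: int) -> float:
    """Chance that the flaky test fails every attempt on an unrelated CL."""
    return fail_rate ** attempts


def clean_run_probability(fail_rate: float, attempts: int, flaky_tests: int) -> float:
    """Chance that none of `flaky_tests` independent flaky tests fails a run."""
    return (1 - fail_rate ** attempts) ** flaky_tests


# The 50% example from the text, with 1, 2, and 3 attempts per test.
for attempts in (1, 2, 3):
    print(attempts,
          land_probability(0.5, attempts),
          unrelated_failure_probability(0.5, attempts))

# Hypothetical: 20,000 flaky tests that each fail 0.1% of the time.
print(clean_run_probability(0.001, 1, 20_000))
print(clean_run_probability(0.001, 3, 20_000))
```

With a 50% failure rate, going from one attempt to three raises the chance of
landing from 0.5 to 0.875, while the chance of breaking an unrelated CL drops
from 0.5 to 0.125. In the hypothetical 20,000-test case, a single attempt
almost never yields a clean run, whereas three attempts almost always do.
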
## Help!

Have other questions? Run into any issues with the CQ? Email
infra-dev@chromium.org, or file a [trooper bug](https://g.co/bugatrooper).

[1]: https://chromium.googlesource.com/chromium/tools/depot_tools/+/HEAD/git_footers.py
[2]: ../../infra/config/generated/luci/commit-queue.cfg