# The Chromium Test Executable API

[bit.ly/chromium-test-runner-api][1] (*)


[TOC]

## Introduction

This document defines the API that test executables must implement in order to
be run on the Chromium continuous integration infrastructure (the
[LUCI][2]
system using the `chromium` and `chromium_trybot` recipes).

*** note
**NOTE:** This document specifies the existing `isolated_scripts` API in the
Chromium recipe. Currently we also support other APIs (e.g., for
GTests), but we should migrate them to use the `isolated_scripts` API.
That work is not currently scheduled.
***

This spec applies only to functional tests and does not attempt to
specify how performance tests should work, though in principle they
could probably work the same way and possibly just produce different
output.

This document is specifically targeted at Chromium and assumes you are
using GN and Ninja for your build system. It should be possible to adapt
these APIs to other projects and build recipes, but this is not an
immediate goal. Similarly, if a project adopts this API and the related
specifications, it should be able to more easily reuse the functionality
and tooling we've built for Chromium's CI system in other LUCI
deployments.

***
**NOTE:** It bears repeating that this describes the current state of
affairs, and not the desired end state. A companion doc,
[Cleaning up the Chromium Testing Environment][3],
discusses a possible path forward and end state.
***

## Building and Invoking a Test Executable

There are lots of different kinds of tests, but we want to be able to
build and invoke them uniformly, regardless of how they are implemented.

We will call the thing being executed to run the tests a _test
executable_ (or executable for short). This is not an ideal name, as
this doesn't necessarily refer to a GN executable target type; it may be
a wrapper script that invokes other binaries or scripts to run the
tests.

We expect the test executable to run one or more tests. A _test_ must be
an atomically addressable thing with a name that is unique to that
invocation of the executable, i.e., we expect that we can pass a list of
test names to the test executable and have it run only those tests. Test
names must not contain a "::" (which is used as a separator between test
names) and must not contain a "*" (which could be confused with a glob
character) or start with a "-" (which would be confused with an
indicator to skip the test). Test names should generally
only contain ASCII code points, as the infrastructure does not currently
guarantee that non-ASCII code points will work correctly everywhere. We
do not specify test naming conventions beyond these requirements, and it
is fully permissible for a test to contain multiple assertions which may
pass or fail; this design does not specify a way to interpret or handle
those "sub-atomic" assertions; their existence is opaque to this design.
In particular, this spec does not provide a particular way to identify
and handle parameterized tests, or to do anything with test suites
beyond supporting a limited form of globbing for specifying sets of
test names.

To configure a new test, you need to modify one to three files:

* The test must be listed in one or more test suites in
  [//testing/buildbot/test_suites.pyl][4]. Most commonly the test will be
  defined as a single string (e.g., "base_unittests"), which keys into an
  entry in [//testing/buildbot/gn_isolate_map.pyl][5]. In some cases, tests
  will reference a target and add additional command line arguments. These
  entries (along with [//testing/buildbot/test_suite_exceptions.pyl][6] and
  [//testing/buildbot/waterfalls.pyl][7]) determine where the tests will be
  run. For more information on how these files work, see
  [//testing/buildbot/README.md][8].
* Test entries must ultimately reference an entry in
  //testing/buildbot/gn_isolate_map.pyl (illustrative entries for both
  files are sketched below). This file contains the mapping of
  Ninja compile targets to GN targets (specifying the GN label for the
  latter); we need this mapping in order to be able to run `gn analyze`
  against a patch to see which targets are affected by it. This file
  also tells MB what kind of test an entry is (so we can form the correct
  command line) and may specify additional command line flags. If you are
  creating a test that is only a variant of an existing test, this may be
  the only file you need to modify. (Technically, you could define a new
  test solely in test_suites.pyl and reference existing gn_isolate_map.pyl
  entries, but this is considered bad practice.)
* Add the GN target itself to the appropriate build files. Make sure this GN
  target contains all of the data and data_deps entries needed to ensure the
  test isolate has all the files the test needs to run. If your test doesn't
  depend on new build targets or add additional data file dependencies, you
  likely don't need this step, but that is increasingly uncommon.
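
As an illustration, here is roughly what a minimal pair of entries might look
like. These are sketches rather than copies of real entries, and the suite,
target, and flag names are hypothetical; see
[//testing/buildbot/README.md][8] for the authoritative format.

```python
# In //testing/buildbot/test_suites.pyl: a suite containing a test keyed
# by its Ninja target name, plus a variant that remaps to the same target
# and adds an extra command line argument.
'example_gtests': {
  'foo_unittests': {},
  'foo_unittests_with_feature': {
    'test': 'foo_unittests',
    'args': ['--enable-some-feature'],
  },
},

# In //testing/buildbot/gn_isolate_map.pyl: maps the Ninja target name to
# its GN label and tells MB what kind of test it is (and thus what the
# command line should look like).
'foo_unittests': {
  'label': '//foo:foo_unittests',
  'type': 'console_test_launcher',
},
```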

### Command Line Arguments

The executable must support the following command line arguments (aka flags):

```
--isolated-outdir=[PATH]
```

This argument is required, and should be set to the directory created
by the swarming task for the task to write outputs into.

```
--out-dir=[PATH]
```

This argument mirrors `--isolated-outdir`, but may appear in addition to
it depending on the bot configuration (e.g., iOS bots that specify the
`out_dir_arg` mixin in //testing/buildbot/waterfalls.pyl). It only needs
to be handled in these cases.

```
--isolated-script-test-output=[FILENAME]
```

This argument is optional. If this argument is provided, the executable
must write the results of the test run in the [JSON Test
Results Format](json_test_results_format.md) into
that file. If this argument is not given to the executable, the
executable must not write the output anywhere. The executable should
only write a valid version of the file, and generally should only do
this at the end of the test run. This means that if the run is
interrupted, you may not get the results of what did run, but that is
acceptable.
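
For illustration, here is a minimal sketch of such an end-of-run writer. The
helper name is hypothetical, and for brevity it assumes every test was
expected to pass; see the format document for the full schema.

```python
import json
import time


def write_test_results(path, results, interrupted=False):
    """Writes a minimal JSON Test Results Format (version 3) file.

    `results` maps full test names to their actual result strings, e.g.
    {'Foo.Bar.bar1': 'PASS'}.
    """
    tests = {}
    num_failures_by_type = {}
    for name, actual in results.items():
        # The 'tests' field is a trie keyed on path_delimiter components.
        node = tests
        *parents, leaf = name.split('.')
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = {'expected': 'PASS', 'actual': actual}
        num_failures_by_type[actual] = num_failures_by_type.get(actual, 0) + 1

    # Write the file in one shot at the end of the run, so an interrupted
    # run never leaves a partially written (invalid) file behind.
    with open(path, 'w') as f:
        json.dump({
            'version': 3,
            'interrupted': interrupted,
            'path_delimiter': '.',
            'seconds_since_epoch': time.time(),
            'num_failures_by_type': num_failures_by_type,
            'tests': tests,
        }, f)
```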

```
--isolated-script-test-filter=[STRING]
```

This argument is optional. If this argument is provided, it must be a
double-colon-separated list of strings, where each string either
uniquely identifies a full test name or is a prefix plus a "*" on the
end (to form a glob). The executable must run only the tests matching
those names or globs. "*" is _only_ supported at the end, i.e., 'Foo.*'
is legal, but '*.bar' is not. If the string has a "-" at the front, the
test (or glob of tests) must be skipped, not run. This matches how test
names are specified in the simple form of the [Chromium Test List
Format][9]. We use the double
colon as a separator because most other common punctuation characters
can occur in test names (some test suites use URLs as test names, for
example). This argument may be provided multiple times; how to treat
multiple occurrences (and how this arg interacts with
`--isolated-script-test-filter-file`) is described below.

```
--isolated-script-test-filter-file=[FILENAME]
```

If provided, the executable must read the given filename to determine
which tests to run and what to expect their results to be. The file must
be in the [Chromium Test List Format][9] (either the simple or
tagged format is fine). This argument may be provided multiple times;
how to treat multiple occurrences (and how this arg interacts with
`--isolated-script-test-filter`) is described below.

```
--isolated-script-test-launcher-retry-limit=N
```

By default, tests are run only once if they succeed. If a test fails,
the executable will retry it up to N times (so, N+1 total invocations of
the test) looking for a success (and stop retrying once the test has
succeeded). By default, the value of N is 3. To turn off retries, pass
`--isolated-script-test-launcher-retry-limit=0`. If this flag is provided,
it is an error to also pass `--isolated-script-test-repeat` (since -repeat
specifies an explicit number of times to run the test, it makes no sense
to also pass -retry-limit).

```
--isolated-script-test-repeat=N
```

If provided, the executable must run each given test N times (total),
regardless of whether the test passes or fails. By default, tests are
only run once (N=1) if the test matches an expected result or passes;
otherwise a test may be retried until it succeeds, as governed by
`--isolated-script-test-launcher-retry-limit`, above. If this flag is
provided, it is an error to also pass
`--isolated-script-test-launcher-retry-limit` (since -repeat specifies an
explicit number of times to run the test, it makes no sense to also pass
-retry-limit).
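
As a sketch, the interaction between these two flags might be implemented as
follows, where `run_test` is a hypothetical callable that runs one test and
returns True on success:

```python
def run_with_retries(run_test, retry_limit=3, repeat=None):
    # --isolated-script-test-repeat=N: run exactly N times, pass or fail.
    if repeat is not None:
        return [run_test() for _ in range(repeat)]
    # Otherwise, run once and retry a failure up to `retry_limit` extra
    # times (N+1 total invocations), stopping at the first success.
    results = [run_test()]
    while not results[-1] and len(results) <= retry_limit:
        results.append(run_test())
    return results
```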

```
--xcode-build-version [VERSION]
```

This flag is passed to scripts on iOS bots only, due to the `xcode_14_main`
mixin in //testing/buildbot/waterfalls.pyl.

```
--xctest
```

This flag is passed to scripts on iOS bots only, due to the `xctest`
mixin in //testing/buildbot/waterfalls.pyl.

If "`--`" is passed as an argument:

* If the executable is a wrapper that invokes another underlying
  executable, then the wrapper must handle arguments passed before the
  "--" on the command line (and must error out if it doesn't know how
  to do that), and must pass through any arguments following the "--"
  unmodified to the underlying executable (and otherwise ignore them
  rather than erroring out if it doesn't know how to interpret them).
* If the executable is not a wrapper, but rather invokes the tests
  directly, it should handle all of the arguments and otherwise ignore
  the "--". The executable should error out if it gets arguments it
  can't handle, but it is not required to do so.

If "--" is not passed, the executable should error out if it gets
arguments it doesn't know how to handle, but it is not required to do
so.

If the test executable produces artifacts, they should be written to the
directory containing the file specified by the
`--isolated-script-test-output` argument. If `--isolated-script-test-output`
is not specified, the executable should store the artifacts somewhere under
the root_build_dir, but there is no standard for how to do this currently
(most tests do not produce artifacts).

The flag names are purposely chosen to be long in order to not conflict
with other flags the executable might support.
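
Putting the flags together, the argument handling might look roughly like the
following sketch in Python (the function name is hypothetical; only the flag
names and defaults come from this spec). Note that `parser.error()`
conveniently exits with status 2, matching the recommended exit codes below.

```python
import argparse


def parse_args(argv):
    # Arguments after '--' are passed through unmodified to any underlying
    # executable, per the rules above.
    passthrough = []
    if '--' in argv:
        i = argv.index('--')
        argv, passthrough = argv[:i], argv[i + 1:]

    parser = argparse.ArgumentParser()
    parser.add_argument('--isolated-outdir', required=True)
    parser.add_argument('--out-dir')
    parser.add_argument('--isolated-script-test-output')
    parser.add_argument('--isolated-script-test-filter',
                        action='append', default=[])
    parser.add_argument('--isolated-script-test-filter-file',
                        action='append', default=[])
    parser.add_argument('--isolated-script-test-launcher-retry-limit',
                        type=int, default=None)
    parser.add_argument('--isolated-script-test-repeat',
                        type=int, default=None)
    args = parser.parse_args(argv)

    # -repeat and -retry-limit are mutually exclusive.
    if (args.isolated_script_test_repeat is not None and
        args.isolated_script_test_launcher_retry_limit is not None):
        parser.error('cannot pass both --isolated-script-test-repeat and '
                     '--isolated-script-test-launcher-retry-limit')
    if args.isolated_script_test_launcher_retry_limit is None:
        args.isolated_script_test_launcher_retry_limit = 3  # default N
    return args, passthrough
```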

### Environment variables

The executable must check for and honor the following environment variables:

```
GTEST_SHARD_INDEX=[N]
```

This environment variable is optional, but if it is provided, it
partially determines (along with `GTEST_TOTAL_SHARDS`) which fixed
subset of tests (or "shard") to run. `GTEST_TOTAL_SHARDS` must also be
set, and `GTEST_SHARD_INDEX` must be set to an integer between 0 and
`GTEST_TOTAL_SHARDS` - 1. Determining which tests to run is described
below.

```
GTEST_TOTAL_SHARDS=[N]
```

This environment variable is optional, but if it is provided, it
partially determines (along with `GTEST_SHARD_INDEX`) which fixed subset
of tests (or "shard") to run. It must be set to a non-zero integer.
Determining which tests to run is described below.

### Exit codes (aka return codes or return values)

The executable must return 0 for a completely successful run, and a
non-zero result if something failed. The following codes are recommended
(2 and 130 coming from UNIX conventions):

| Value    | Meaning |
|--------- | ------- |
| 0 (zero) | The executable ran to completion and all tests either ran as expected or passed unexpectedly. |
| 1        | The executable ran to completion but some tests produced unexpectedly failing results. |
| 2        | The executable failed to start, most likely due to unrecognized or unsupported command line arguments. |
| 130      | The executable run was aborted by the user (or caller) in a semi-orderly manner (e.g., via SIGINT / Ctrl-C). |

## Filtering which tests to run

By default, the executable must run every test it knows about. However,
as noted above, the `--isolated-script-test-filter` and
`--isolated-script-test-filter-file` flags can be used to customize which
tests to run. Either or both flags may be used, and either may be
specified multiple times.

The interaction is as follows:

* A test should be run only if it would be run when **every** flag is
  evaluated individually.
* A test should be skipped if it would be skipped when **any** flag is
  evaluated individually.

If multiple filters in a flag match a given test name, the longest match
takes priority (longest match wins). E.g., if you had
`--isolated-script-test-filter='a*::-ab*'`, then `ace.html` would run but
`abd.html` would not. The order of the filters should not matter. It is
an error to have multiple expressions of the same length that conflict
(e.g., `a*::-a*`).
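
A sketch of this evaluation logic (the function name is hypothetical; the
"every flag must select it" and longest-match rules are from this spec):

```python
def should_run(test_name, filter_flags):
    """Evaluates --isolated-script-test-filter values against one test.

    Each flag value is a '::'-separated list of exact names or prefix
    globs ('Foo.*'), optionally negated with a leading '-'.
    """
    for flag in filter_flags:
        terms = flag.split('::')
        # A purely negative flag (e.g. '-Foo.Bar.bar3') selects every test
        # it doesn't match; a flag with positive terms selects only the
        # tests that they match.
        selected = all(t.startswith('-') for t in terms)
        best_len = -1
        for term in terms:
            positive = not term.startswith('-')
            pattern = term if positive else term[1:]
            if pattern.endswith('*'):
                matched = test_name.startswith(pattern[:-1])
            else:
                matched = (test_name == pattern)
            if matched and len(pattern) > best_len:
                best_len, selected = len(pattern), positive
        if not selected:
            return False  # skipped if any flag skips it
    return True
```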

Examples are given below.

It may not be obvious why we need to support these flags being used multiple
times, or together. There are two main sets of reasons:

* First, you may want to use multiple -filter-file arguments to specify
  multiple sets of test expectations (e.g., the base test expectations and
  then MSAN-specific expectations), or to specify expectations in one file
  and list which tests to run in a separate file.
* Second, the way the Chromium recipes work, in order to retry a test step
  to confirm test failures, the recipe doesn't want to have to parse the
  existing command line; it just wants to append an
  `--isolated-script-test-filter` argument listing the tests that failed.
  This can cause `--isolated-script-test-filter` to appear multiple times
  (or in conjunction with `--isolated-script-test-filter-file`).

You cannot practically use these mechanisms to run equally sized subsets of
the tests; if you want to do that, use `GTEST_SHARD_INDEX` and
`GTEST_TOTAL_SHARDS` instead, as described in the next section.

## Running equally-sized subsets of tests (shards)

If the `GTEST_SHARD_INDEX` and `GTEST_TOTAL_SHARDS` environment variables are
set, `GTEST_TOTAL_SHARDS` must be set to a non-zero integer N, and
`GTEST_SHARD_INDEX` must be set to an integer M between 0 and N-1. Given those
two values, the executable must run only every N<sup>th</sup> test starting at
test number M (i.e., every i<sup>th</sup> test where (i mod N) == M).
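
For example, a minimal sketch (the helper name is hypothetical):

```python
import os


def tests_for_this_shard(all_tests):
    # `all_tests` must already be filtered (see the previous section) and
    # in a deterministic order that is identical across all shards.
    total = int(os.environ.get('GTEST_TOTAL_SHARDS', 1))
    index = int(os.environ.get('GTEST_SHARD_INDEX', 0))
    return [t for i, t in enumerate(all_tests) if i % total == index]
```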

This mechanism produces roughly equally-sized sets of tests that will
hopefully take roughly equal times to execute, but it cannot guarantee the
latter property to any degree of precision. If you need the shards to be as
close to the same duration as possible, you will need a more complicated
process. For example, you could run all of the tests once to determine their
individual running times, and then build up lists of tests based on that, or
do something even more complicated based on multiple test runs to smooth over
variance in test execution times. Chromium does not currently attempt to do
this for functional tests, but we do something similar for performance tests
in order to better achieve equal running times and device affinity for
consistent results.

You cannot practically use the sharding mechanism to run a stable named set
of tests; if you want to do that, use the `--isolated-script-test-filter`
flags instead, as described in the previous section.

Which tests are in which shard must be determined **after** tests have been
filtered out using the `--isolated-script-test-filter(-file)` flags.

The order that tests are run in is not otherwise specified, but tests are
commonly run either in lexicographic order or in a semi-fixed random order;
the latter is useful to help identify inter-test dependencies, i.e., tests
that rely on the results of previous tests having run in order to pass (such
tests are generally considered to be undesirable).
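
One way an executable might implement such a semi-fixed random order is to
shuffle with a fixed, reportable seed, so that an ordering which exposes an
inter-test dependency can be reproduced. This is only a sketch; nothing in
this spec mandates it:

```python
import random


def order_tests(tests, seed=None):
    # Lexicographic by default; reproducibly shuffled when a seed is given.
    tests = sorted(tests)
    if seed is not None:
        random.Random(seed).shuffle(tests)
    return tests
```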

## Examples

Assume that out/Default is a debug build (i.e., that the "Debug" tag will
apply), and that you have tests named Foo.Bar.bar{1,2,3}, Foo.Baz.baz,
and Foo.Quux.quux1, and the following two filter files:

```sh
$ cat filter1
Foo.Bar.*
-Foo.Bar.bar3
$ cat filter2
# tags: [ Debug Release ]
[ Debug ] Foo.Bar.bar2 [ Skip ]
$
```

#### Filtering tests on the command line

```sh
$ out/Default/bin/run_foo_tests \
    --isolated-script-test-filter='Foo.Bar.*::-Foo.Bar.bar3'
[1/2] Foo.Bar.bar1 passed in 0.1s
[2/2] Foo.Bar.bar2 passed in 0.13s

2 tests passed in 0.23s, 0 skipped, 0 failures.
$
```

#### Using a filter file

```sh
$ out/Default/bin/run_foo_tests --isolated-script-test-filter-file=filter1
[1/2] Foo.Bar.bar1 passed in 0.1s
[2/2] Foo.Bar.bar2 passed in 0.13s

2 tests passed in 0.23s, 0 skipped, 0 failures.
```

#### Combining multiple filters

```sh
$ out/Default/bin/run_foo_tests --isolated-script-test-filter='Foo.Bar.*' \
    --isolated-script-test-filter='Foo.Bar.bar2'
[1/1] Foo.Bar.bar2 passed in 0.13s

1 test passed in 0.13s, 0 skipped, 0 failures.
$ out/Default/bin/run_foo_tests --isolated-script-test-filter='Foo.Bar.*' \
    --isolated-script-test-filter='Foo.Baz.baz'
No tests to run.
$ out/Default/bin/run_foo_tests --isolated-script-test-filter-file=filter2 \
    --isolated-script-test-filter=-Foo.Baz.baz
[1/3] Foo.Bar.bar1 passed in 0.1s
[2/3] Foo.Bar.bar3 passed in 0.13s
[3/3] Foo.Quux.quux1 passed in 0.02s

3 tests passed in 0.25s, 1 skipped, 0 failures.
$
```

#### Running one shard of tests

```sh
$ GTEST_TOTAL_SHARDS=3 GTEST_SHARD_INDEX=1 out/Default/bin/run_foo_tests
[1/2] Foo.Bar.bar2 passed in 0.13s
[2/2] Foo.Quux.quux1 passed in 0.02s

2 tests passed in 0.15s, 0 skipped, 0 failures.
$
```

## Related Work

This document only partially makes sense in isolation.

The [JSON Test Results Format](json_test_results_format.md) document
specifies how the results of the test run should be reported.

The [Chromium Test List Format][14] specifies in more detail how we can
specify which tests to run and which to skip, and whether the tests are
expected to pass or fail.

Implementing everything in this document plus the documents referenced above
should fully specify how tests are run in Chromium. And, if we do this,
implementing tools to manage tests should be significantly easier.

[On Naming Chromium Builders and Build Steps][15] is a related proposal that
has been partially implemented; it is complementary to this work, but not
required.

[Cleaning up the Chromium Testing Environment][3] describes a series of
changes we might want to make to this API and the related infrastructure to
simplify things.

Additional documents that may be of interest:

* [Testing Configuration Files][8]
* [The MB (Meta-Build wrapper) User Guide][10]
* [The MB (Meta-Build wrapper) Design Spec][11]
* [Test Activation / Deactivation (TADA)][12] (internal Google document only,
  sorry)
* [Standardize Artifacts for Chromium Testing][13] is somewhat dated but goes
  into slightly greater detail on how to store artifacts produced by tests
  than the JSON Test Results Format does.

## Document history

\[ Significant changes only. \]

| Date       | Comment |
| ---------- | ------- |
| 2017-12-13 | Initial version. This tried to be a full-featured spec that defined common flags that devs might want with friendly names, as well as the flags needed to run tests on the bots. |
| 2019-05-24 | Second version. The spec was significantly revised to just specify the minimal subset needed to run tests consistently on bots given the current infrastructure. |
| 2019-05-29 | All TODOs and discussion of future work were stripped out; now the spec only specifies how the `isolated_scripts` currently behave. Future work was moved to a new doc, [Cleaning up the Chromium Testing Environment][3]. |
| 2019-09-16 | Add comment about ordering of filters and longest match winning for `--isolated-script-test-filter`. |
| 2020-07-01 | Moved into the src repo and converted to Markdown. No content changes otherwise. |

## Notes

(*) The initial version of this document talked about test runners instead
of test executables, so the bit.ly shortcut URL refers to the
test-runner-api instead of the test-executable-api. The author attempted to
create a test-executable-api link, but pointed it at the wrong document by
accident. bit.ly URLs can't easily be updated :(.

[1]: https://bit.ly/chromium-test-runner-api
[2]: https://chromium.googlesource.com/infra/infra/+/main/doc/users/services/about_luci.md
[3]: https://docs.google.com/document/d/1MwnIx8kavuLSpZo3JmL9T7nkjTz1rpaJA4Vdj_9cRYw/edit?usp=sharing
[4]: ../../testing/buildbot/test_suites.pyl
[5]: ../../testing/buildbot/gn_isolate_map.pyl
[6]: ../../testing/buildbot/test_suite_exceptions.pyl
[7]: ../../testing/buildbot/waterfalls.pyl
[8]: ../../testing/buildbot/README.md
[9]: https://bit.ly/chromium-test-list-format
[10]: ../../tools/mb/docs/user_guide.md
[11]: ../../tools/mb/docs/design_spec.md
[12]: https://goto.google.com/chops-tada
[13]: https://bit.ly/chromium-test-artifacts
[14]: https://bit.ly/chromium-test-list-format
[15]: https://bit.ly/chromium-build-naming