# The Chromium Test Executable API

[bit.ly/chromium-test-runner-api][1] (*)


[TOC]

## Introduction

This document defines the API that test executables must implement in order to
be run on the Chromium continuous integration infrastructure (the
[LUCI][2]
system using the `chromium` and `chromium_trybot` recipes).

*** note
**NOTE:** This document specifies the existing `isolated_scripts` API in the
Chromium recipe. Currently we also support other APIs (e.g., for
GTests), but we should migrate them to use the `isolated_scripts` API.
That work is not currently scheduled.
***

This spec applies only to functional tests and does not attempt to
specify how performance tests should work, though in principle they
could probably work the same way and possibly just produce different
output.

This document is specifically targeted at Chromium and assumes you are
using GN and Ninja for your build system. It should be possible to adapt
these APIs to other projects and build recipes, but this is not an
immediate goal. Similarly, if a project adopts this API and the related
specifications, it should be able to more easily reuse the functionality
and tooling we've built for Chromium's CI system in other LUCI
deployments.

***
**NOTE:** It bears repeating that this describes the current state of
affairs, and not the desired end state. A companion doc,
[Cleaning up the Chromium Testing Environment][3],
discusses a possible path forward and end state.
***

## Building and Invoking a Test Executable

There are lots of different kinds of tests, but we want to be able to
build and invoke them uniformly, regardless of how they are implemented.

We will call the thing being executed to run the tests a _test
executable_ (or executable for short). This is not an ideal name, as
this doesn't necessarily refer to a GN executable target type; it may be
a wrapper script that invokes other binaries or scripts to run the
tests.

We expect the test executable to run one or more tests. A _test_ must be
an atomically addressable thing with a name that is unique to that
invocation of the executable, i.e., we expect that we can pass a list of
test names to the test executable and have it run only those tests. Test
names must not contain a "::" (which is used as a separator between test
names) and must not contain a "*" (which could be confused with a glob
character) or start with a "-" (which would be confused with an
indicator to skip the test). Test names should generally
only contain ASCII code points, as the infrastructure does not currently
guarantee that non-ASCII code points will work correctly everywhere. We
do not specify test naming conventions beyond these requirements, and it
is fully permissible for a test to contain multiple assertions which may
pass or fail; this design does not specify a way to interpret or handle
those "sub-atomic" assertions; their existence is opaque to this design.
In particular, this spec does not provide a particular way to identify
and handle parameterized tests, or to do anything with test suites
beyond supporting a limited form of globbing for specifying sets of
test names.

To configure a new test, you need to modify one to three files:

* The test must be listed in one or more test suites in
  [//testing/buildbot/test_suites.pyl][4]. Most commonly the test will be
  defined as a single string (e.g., "base_unittests"), which keys into an
  entry in [//testing/buildbot/gn_isolate_map.pyl][5]. In some cases, tests
  will reference a target and add additional command line arguments. These
  entries (along with [//testing/buildbot/test_suite_exceptions.pyl][6] and
  [//testing/buildbot/waterfalls.pyl][7]) determine where the tests will be
  run. For more information on how these files work, see
  [//testing/buildbot/README.md][8].
* Test entries must ultimately reference an entry in
  //testing/buildbot/gn_isolate_map.pyl (illustrative entries for both
  files are sketched below). This file contains the mapping of
  Ninja compile targets to GN targets (specifying the GN label for the
  latter); we need this mapping in order to be able to run `gn analyze`
  against a patch to see which targets are affected by it. This file
  also tells MB what kind of test an entry is (so we can form the correct
  command line) and may specify additional command line flags. If you are
  creating a test that is only a variant of an existing test, this may be
  the only file you need to modify. (Technically, you could define a new
  test solely in test_suites.pyl and reference existing gn_isolate_map.pyl
  entries, but this is considered bad practice.)
* Add the GN target itself to the appropriate build files. Make sure this GN
  target contains all of the data and data_deps entries needed to ensure the
  test isolate has all the files the test needs to run. If your test doesn't
  depend on new build targets or add additional data file dependencies, you
  likely don't need this step, but that is increasingly uncommon.
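
As an illustration, here is roughly what a minimal pair of entries might look
like. These are sketches rather than copies of real entries, and the suite,
target, and flag names are hypothetical; see
[//testing/buildbot/README.md][8] for the authoritative format.

```python
# In //testing/buildbot/test_suites.pyl: a suite containing a test keyed
# by its Ninja target name, plus a variant that remaps to the same target
# and adds an extra command line argument.
'example_gtests': {
  'foo_unittests': {},
  'foo_unittests_with_feature': {
    'test': 'foo_unittests',
    'args': ['--enable-some-feature'],
  },
},

# In //testing/buildbot/gn_isolate_map.pyl: maps the Ninja target name to
# its GN label and tells MB what kind of test it is (and thus what the
# command line should look like).
'foo_unittests': {
  'label': '//foo:foo_unittests',
  'type': 'console_test_launcher',
},
```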

### Command Line Arguments

The executable must support the following command line arguments (aka flags):

```
--isolated-outdir=[PATH]
```

This argument is required, and should be set to the directory created
by the swarming task for the task to write outputs into.

```
--out-dir=[PATH]
```

This argument mirrors `--isolated-outdir`, but may appear in addition to
it depending on the bot configuration (e.g., iOS bots that specify the
`out_dir_arg` mixin in //testing/buildbot/waterfalls.pyl). It only needs
to be handled in these cases.

```
--isolated-script-test-output=[FILENAME]
```

This argument is optional. If this argument is provided, the executable
must write the results of the test run in the [JSON Test
Results Format](json_test_results_format.md) into
that file. If this argument is not given to the executable, the
executable must not write the output anywhere. The executable should
only write a valid version of the file, and generally should only do
this at the end of the test run. This means that if the run is
interrupted, you may not get the results of what did run, but that is
acceptable.
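
For illustration, here is a minimal sketch of such an end-of-run writer. The
helper name is hypothetical, and for brevity it assumes every test was
expected to pass; see the format document for the full schema.

```python
import json
import time


def write_test_results(path, results, interrupted=False):
    """Writes a minimal JSON Test Results Format (version 3) file.

    `results` maps full test names to their actual result strings, e.g.
    {'Foo.Bar.bar1': 'PASS'}.
    """
    tests = {}
    num_failures_by_type = {}
    for name, actual in results.items():
        # The 'tests' field is a trie keyed on path_delimiter components.
        node = tests
        *parents, leaf = name.split('.')
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = {'expected': 'PASS', 'actual': actual}
        num_failures_by_type[actual] = num_failures_by_type.get(actual, 0) + 1

    # Write the file in one shot at the end of the run, so an interrupted
    # run never leaves a partially written (invalid) file behind.
    with open(path, 'w') as f:
        json.dump({
            'version': 3,
            'interrupted': interrupted,
            'path_delimiter': '.',
            'seconds_since_epoch': time.time(),
            'num_failures_by_type': num_failures_by_type,
            'tests': tests,
        }, f)
```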

```
--isolated-script-test-filter=[STRING]
```

This argument is optional. If this argument is provided, it must be a
double-colon-separated list of strings, where each string either
uniquely identifies a full test name or is a prefix plus a "*" on the
end (to form a glob). The executable must run only the tests matching
those names or globs. "*" is _only_ supported at the end, i.e., 'Foo.*'
is legal, but '*.bar' is not. If the string has a "-" at the front, the
test (or glob of tests) must be skipped, not run. This matches how test
names are specified in the simple form of the [Chromium Test List
Format][9]. We use the double
colon as a separator because most other common punctuation characters
can occur in test names (some test suites use URLs as test names, for
example). This argument may be provided multiple times; how to treat
multiple occurrences (and how this arg interacts with
`--isolated-script-test-filter-file`) is described below.

```
--isolated-script-test-filter-file=[FILENAME]
```

If provided, the executable must read the given filename to determine
which tests to run and what to expect their results to be. The file must
be in the [Chromium Test List Format][9] (either the simple or
tagged format is fine). This argument may be provided multiple times;
how to treat multiple occurrences (and how this arg interacts with
`--isolated-script-test-filter`) is described below.

```
--isolated-script-test-launcher-retry-limit=N
```

By default, tests are run only once if they succeed. If a test fails,
the executable will retry it up to N times (so, N+1 total invocations of
the test) looking for a success (and stop retrying once the test has
succeeded). By default, the value of N is 3. To turn off retries, pass
`--isolated-script-test-launcher-retry-limit=0`. If this flag is provided,
it is an error to also pass `--isolated-script-test-repeat` (since -repeat
specifies an explicit number of times to run the test, it makes no sense
to also pass -retry-limit).

```
--isolated-script-test-repeat=N
```

If provided, the executable must run each given test N times (total),
regardless of whether the test passes or fails. By default, tests are
only run once (N=1) if the test matches an expected result or passes;
otherwise a test may be retried until it succeeds, as governed by
`--isolated-script-test-launcher-retry-limit`, above. If this flag is
provided, it is an error to also pass
`--isolated-script-test-launcher-retry-limit` (since -repeat specifies an
explicit number of times to run the test, it makes no sense to also pass
-retry-limit).
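
As a sketch, the interaction between these two flags might be implemented as
follows, where `run_test` is a hypothetical callable that runs one test and
returns True on success:

```python
def run_with_retries(run_test, retry_limit=3, repeat=None):
    # --isolated-script-test-repeat=N: run exactly N times, pass or fail.
    if repeat is not None:
        return [run_test() for _ in range(repeat)]
    # Otherwise, run once and retry a failure up to `retry_limit` extra
    # times (N+1 total invocations), stopping at the first success.
    results = [run_test()]
    while not results[-1] and len(results) <= retry_limit:
        results.append(run_test())
    return results
```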

```
--xcode-build-version [VERSION]
```

This flag is passed to scripts on iOS bots only, due to the `xcode_14_main`
mixin in //testing/buildbot/waterfalls.pyl.

```
--xctest
```

This flag is passed to scripts on iOS bots only, due to the `xctest`
mixin in //testing/buildbot/waterfalls.pyl.

If "`--`" is passed as an argument:

* If the executable is a wrapper that invokes another underlying
  executable, then the wrapper must handle arguments passed before the
  "--" on the command line (and must error out if it doesn't know how
  to do that), and must pass through any arguments following the "--"
  unmodified to the underlying executable (and otherwise ignore them
  rather than erroring out if it doesn't know how to interpret them).
* If the executable is not a wrapper, but rather invokes the tests
  directly, it should handle all of the arguments and otherwise ignore
  the "--". The executable should error out if it gets arguments it
  can't handle, but it is not required to do so.

If "--" is not passed, the executable should error out if it gets
arguments it doesn't know how to handle, but it is not required to do
so.

If the test executable produces artifacts, they should be written to the
directory containing the file specified by the
`--isolated-script-test-output` argument. If `--isolated-script-test-output`
is not specified, the executable should store the artifacts somewhere under
the root_build_dir, but there is no standard for how to do this currently
(most tests do not produce artifacts).

The flag names are purposely chosen to be long in order to not conflict
with other flags the executable might support.
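
Putting the flags together, the argument handling might look roughly like the
following sketch in Python (the function name is hypothetical; only the flag
names and defaults come from this spec). Note that `parser.error()`
conveniently exits with status 2, matching the recommended exit codes below.

```python
import argparse


def parse_args(argv):
    # Arguments after '--' are passed through unmodified to any underlying
    # executable, per the rules above.
    passthrough = []
    if '--' in argv:
        i = argv.index('--')
        argv, passthrough = argv[:i], argv[i + 1:]

    parser = argparse.ArgumentParser()
    parser.add_argument('--isolated-outdir', required=True)
    parser.add_argument('--out-dir')
    parser.add_argument('--isolated-script-test-output')
    parser.add_argument('--isolated-script-test-filter',
                        action='append', default=[])
    parser.add_argument('--isolated-script-test-filter-file',
                        action='append', default=[])
    parser.add_argument('--isolated-script-test-launcher-retry-limit',
                        type=int, default=None)
    parser.add_argument('--isolated-script-test-repeat',
                        type=int, default=None)
    args = parser.parse_args(argv)

    # -repeat and -retry-limit are mutually exclusive.
    if (args.isolated_script_test_repeat is not None and
        args.isolated_script_test_launcher_retry_limit is not None):
        parser.error('cannot pass both --isolated-script-test-repeat and '
                     '--isolated-script-test-launcher-retry-limit')
    if args.isolated_script_test_launcher_retry_limit is None:
        args.isolated_script_test_launcher_retry_limit = 3  # default N
    return args, passthrough
```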

### Environment variables

The executable must check for and honor the following environment variables:

```
GTEST_SHARD_INDEX=[N]
```

This environment variable is optional, but if it is provided, it
partially determines (along with `GTEST_TOTAL_SHARDS`) which fixed
subset of tests (or "shard") to run. `GTEST_TOTAL_SHARDS` must also be
set, and `GTEST_SHARD_INDEX` must be set to an integer between 0 and
`GTEST_TOTAL_SHARDS` - 1. Determining which tests to run is described
below.

```
GTEST_TOTAL_SHARDS=[N]
```

This environment variable is optional, but if it is provided, it
partially determines (along with `GTEST_SHARD_INDEX`) which fixed subset
of tests (or "shard") to run. It must be set to a non-zero integer.
Determining which tests to run is described below.

### Exit codes (aka return codes or return values)

The executable must return 0 for a completely successful run, and a
non-zero result if something failed. The following codes are recommended
(2 and 130 coming from UNIX conventions):

| Value    | Meaning |
|--------- | ------- |
| 0 (zero) | The executable ran to completion and all tests either ran as expected or passed unexpectedly. |
| 1        | The executable ran to completion but some tests produced unexpectedly failing results. |
| 2        | The executable failed to start, most likely due to unrecognized or unsupported command line arguments. |
| 130      | The executable run was aborted by the user (or caller) in a semi-orderly manner (e.g., via SIGINT / Ctrl-C). |

## Filtering which tests to run

By default, the executable must run every test it knows about. However,
as noted above, the `--isolated-script-test-filter` and
`--isolated-script-test-filter-file` flags can be used to customize which
tests to run. Either or both flags may be used, and either may be
specified multiple times.

The interaction is as follows:

* A test should be run only if it would be run when **every** flag is
  evaluated individually.
* A test should be skipped if it would be skipped when **any** flag is
  evaluated individually.

If multiple filters in a flag match a given test name, the longest match
takes priority (longest match wins). E.g., if you had
`--isolated-script-test-filter='a*::-ab*'`, then `ace.html` would run but
`abd.html` would not. The order of the filters should not matter. It is
an error to have multiple expressions of the same length that conflict
(e.g., `a*::-a*`).
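
A sketch of this evaluation logic (the function name is hypothetical; the
"every flag must select it" and longest-match rules are from this spec):

```python
def should_run(test_name, filter_flags):
    """Evaluates --isolated-script-test-filter values against one test.

    Each flag value is a '::'-separated list of exact names or prefix
    globs ('Foo.*'), optionally negated with a leading '-'.
    """
    for flag in filter_flags:
        terms = flag.split('::')
        # A purely negative flag (e.g. '-Foo.Bar.bar3') selects every test
        # it doesn't match; a flag with positive terms selects only the
        # tests that they match.
        selected = all(t.startswith('-') for t in terms)
        best_len = -1
        for term in terms:
            positive = not term.startswith('-')
            pattern = term if positive else term[1:]
            if pattern.endswith('*'):
                matched = test_name.startswith(pattern[:-1])
            else:
                matched = (test_name == pattern)
            if matched and len(pattern) > best_len:
                best_len, selected = len(pattern), positive
        if not selected:
            return False  # skipped if any flag skips it
    return True
```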

Examples are given below.

It may not be obvious why we need to support these flags being used multiple
times, or together. There are two main sets of reasons:

* First, you may want to use multiple -filter-file arguments to specify
  multiple sets of test expectations (e.g., the base test expectations and
  then MSAN-specific expectations), or to specify expectations in one file
  and list which tests to run in a separate file.
* Second, the way the Chromium recipes work, in order to retry a test step
  to confirm test failures, the recipe doesn't want to have to parse the
  existing command line; it just wants to append an
  `--isolated-script-test-filter` argument listing the tests that failed.
  This can cause `--isolated-script-test-filter` to appear multiple times
  (or in conjunction with `--isolated-script-test-filter-file`).

You cannot practically use these mechanisms to run equally sized subsets of
the tests; if you want to do that, use `GTEST_SHARD_INDEX` and
`GTEST_TOTAL_SHARDS` instead, as described in the next section.

## Running equally-sized subsets of tests (shards)

If the `GTEST_SHARD_INDEX` and `GTEST_TOTAL_SHARDS` environment variables are
set, `GTEST_TOTAL_SHARDS` must be set to a non-zero integer N, and
`GTEST_SHARD_INDEX` must be set to an integer M between 0 and N-1. Given those
two values, the executable must run only every N<sup>th</sup> test starting at
test number M (i.e., every i<sup>th</sup> test where (i mod N) == M).
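
For example, a minimal sketch (the helper name is hypothetical):

```python
import os


def tests_for_this_shard(all_tests):
    # `all_tests` must already be filtered (see the previous section) and
    # in a deterministic order that is identical across all shards.
    total = int(os.environ.get('GTEST_TOTAL_SHARDS', 1))
    index = int(os.environ.get('GTEST_SHARD_INDEX', 0))
    return [t for i, t in enumerate(all_tests) if i % total == index]
```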

This mechanism produces roughly equally-sized sets of tests that will
hopefully take roughly equal times to execute, but it cannot guarantee the
latter property to any degree of precision. If you need the shards to be as
close to the same duration as possible, you will need a more complicated
process. For example, you could run all of the tests once to determine their
individual running times, and then build up lists of tests based on that, or
do something even more complicated based on multiple test runs to smooth over
variance in test execution times. Chromium does not currently attempt to do
this for functional tests, but we do something similar for performance tests
in order to better achieve equal running times and device affinity for
consistent results.

You cannot practically use the sharding mechanism to run a stable named set
of tests; if you want to do that, use the `--isolated-script-test-filter`
flags instead, as described in the previous section.

Which tests are in which shard must be determined **after** tests have been
filtered out using the `--isolated-script-test-filter(-file)` flags.

The order that tests are run in is not otherwise specified, but tests are
commonly run either in lexicographic order or in a semi-fixed random order;
the latter is useful to help identify inter-test dependencies, i.e., tests
that rely on the results of previous tests having run in order to pass (such
tests are generally considered to be undesirable).
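
One way an executable might implement such a semi-fixed random order is to
shuffle with a fixed, reportable seed, so that an ordering which exposes an
inter-test dependency can be reproduced. This is only a sketch; nothing in
this spec mandates it:

```python
import random


def order_tests(tests, seed=None):
    # Lexicographic by default; reproducibly shuffled when a seed is given.
    tests = sorted(tests)
    if seed is not None:
        random.Random(seed).shuffle(tests)
    return tests
```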

## Examples

Assume that out/Default is a debug build (i.e., that the "Debug" tag will
apply), and that you have tests named Foo.Bar.bar{1,2,3}, Foo.Baz.baz,
and Foo.Quux.quux1, and the following two filter files:

```sh
$ cat filter1
Foo.Bar.*
-Foo.Bar.bar3
$ cat filter2
# tags: [ Debug Release ]
[ Debug ] Foo.Bar.bar2 [ Skip ]
$
```

#### Filtering tests on the command line

```sh
$ out/Default/bin/run_foo_tests \
    --isolated-script-test-filter='Foo.Bar.*::-Foo.Bar.bar3'
[1/2] Foo.Bar.bar1 passed in 0.1s
[2/2] Foo.Bar.bar2 passed in 0.13s

2 tests passed in 0.23s, 0 skipped, 0 failures.
$
```

#### Using a filter file

```sh
$ out/Default/bin/run_foo_tests --isolated-script-test-filter-file=filter1
[1/2] Foo.Bar.bar1 passed in 0.1s
[2/2] Foo.Bar.bar2 passed in 0.13s

2 tests passed in 0.23s, 0 skipped, 0 failures.
```

#### Combining multiple filters

```sh
$ out/Default/bin/run_foo_tests --isolated-script-test-filter='Foo.Bar.*' \
    --isolated-script-test-filter='Foo.Bar.bar2'
[1/1] Foo.Bar.bar2 passed in 0.13s

1 test passed in 0.13s, 0 skipped, 0 failures.
$ out/Default/bin/run_foo_tests --isolated-script-test-filter='Foo.Bar.*' \
    --isolated-script-test-filter='Foo.Baz.baz'
No tests to run.
$ out/Default/bin/run_foo_tests --isolated-script-test-filter-file=filter2 \
    --isolated-script-test-filter=-Foo.Baz.baz
[1/3] Foo.Bar.bar1 passed in 0.1s
[2/3] Foo.Bar.bar3 passed in 0.13s
[3/3] Foo.Quux.quux1 passed in 0.02s

3 tests passed in 0.25s, 1 skipped, 0 failures.
$
```

#### Running one shard of tests

```sh
$ GTEST_TOTAL_SHARDS=3 GTEST_SHARD_INDEX=1 out/Default/bin/run_foo_tests
[1/2] Foo.Bar.bar2 passed in 0.13s
[2/2] Foo.Quux.quux1 passed in 0.02s

2 tests passed in 0.15s, 0 skipped, 0 failures.
$
```

## Related Work

This document only partially makes sense in isolation.

The [JSON Test Results Format](json_test_results_format.md) document
specifies how the results of the test run should be reported.

The [Chromium Test List Format][14] specifies in more detail how we can
specify which tests to run and which to skip, and whether the tests are
expected to pass or fail.

Implementing everything in this document plus the documents referenced above
should fully specify how tests are run in Chromium. And, if we do this,
implementing tools to manage tests should be significantly easier.

[On Naming Chromium Builders and Build Steps][15] is a related proposal that
has been partially implemented; it is complementary to this work, but not
required.

[Cleaning up the Chromium Testing Environment][3] describes a series of
changes we might want to make to this API and the related infrastructure to
simplify things.

Additional documents that may be of interest:

* [Testing Configuration Files][8]
* [The MB (Meta-Build wrapper) User Guide][10]
* [The MB (Meta-Build wrapper) Design Spec][11]
* [Test Activation / Deactivation (TADA)][12] (internal Google document only,
  sorry)
* [Standardize Artifacts for Chromium Testing][13] is somewhat dated but goes
  into slightly greater detail on how to store artifacts produced by tests
  than the JSON Test Results Format does.

## Document history

\[ Significant changes only. \]

| Date       | Comment |
| ---------- | ------- |
| 2017-12-13 | Initial version. This tried to be a full-featured spec that defined common flags that devs might want with friendly names, as well as the flags needed to run tests on the bots. |
| 2019-05-24 | Second version. The spec was significantly revised to just specify the minimal subset needed to run tests consistently on bots given the current infrastructure. |
| 2019-05-29 | All TODOs and discussion of future work were stripped out; now the spec only specifies how the `isolated_scripts` currently behave. Future work was moved to a new doc, [Cleaning up the Chromium Testing Environment][3]. |
| 2019-09-16 | Add comment about ordering of filters and longest match winning for `--isolated-script-test-filter`. |
| 2020-07-01 | Moved into the src repo and converted to Markdown. No content changes otherwise. |

## Notes

(*) The initial version of this document talked about test runners instead
of test executables, so the bit.ly shortcut URL refers to the
test-runner-api instead of the test-executable-api. The author attempted to
create a test-executable-api link, but pointed it at the wrong document by
accident. bit.ly URLs can't easily be updated :(.

[1]: https://bit.ly/chromium-test-runner-api
[2]: https://chromium.googlesource.com/infra/infra/+/main/doc/users/services/about_luci.md
[3]: https://docs.google.com/document/d/1MwnIx8kavuLSpZo3JmL9T7nkjTz1rpaJA4Vdj_9cRYw/edit?usp=sharing
[4]: ../../testing/buildbot/test_suites.pyl
[5]: ../../testing/buildbot/gn_isolate_map.pyl
[6]: ../../testing/buildbot/test_suite_exceptions.pyl
[7]: ../../testing/buildbot/waterfalls.pyl
[8]: ../../testing/buildbot/README.md
[9]: https://bit.ly/chromium-test-list-format
[10]: ../../tools/mb/docs/user_guide.md
[11]: ../../tools/mb/docs/design_spec.md
[12]: https://goto.google.com/chops-tada
[13]: https://bit.ly/chromium-test-artifacts
[14]: https://bit.ly/chromium-test-list-format
[15]: https://bit.ly/chromium-build-naming