# The Chromium Test Executable API

[bit.ly/chromium-test-runner-api][1] (*)

[TOC]

## Introduction

This document defines the API that test executables must implement in order to
be run on the Chromium continuous integration infrastructure (the
[LUCI][2] system using the `chromium` and `chromium_trybot` recipes).

*** note
**NOTE:** This document specifies the existing `isolated_scripts` API in the
Chromium recipe. Currently we also support other APIs (e.g., for
GTests), but we should migrate them to use the `isolated_scripts` API.
That work is not currently scheduled.
***

This spec applies only to functional tests and does not attempt to
specify how performance tests should work, though in principle they
could probably work the same way and possibly just produce different
output.

This document is specifically targeted at Chromium and assumes you are
using GN and Ninja for your build system. It should be possible to adapt
these APIs to other projects and build recipes, but this is not an
immediate goal. Similarly, if a project adopts this API and the related
specifications, it should be able to reuse the functionality and tooling
we've built out for Chromium's CI system more easily in other LUCI
deployments.

***
**NOTE:** It bears repeating that this describes the current state of
affairs, and not the desired end state. A companion doc,
[Cleaning up the Chromium Testing Environment][3],
discusses a possible path forward and end state.
***

## Building and Invoking a Test Executable

There are lots of different kinds of tests, but we want to be able to
build and invoke them uniformly, regardless of how they are implemented.

We will call the thing being executed to run the tests a _test
executable_ (or executable for short). This is not an ideal name, as
this doesn't necessarily refer to a GN executable target type; it may be
a wrapper script that invokes other binaries or scripts to run the
tests.

We expect the test executable to run one or more tests. A _test_ must be
an atomically addressable thing with a name that is unique to that
invocation of the executable, i.e., we expect that we can pass a list of
test names to the test executable and have it run just those tests. Test
names must not contain a "::" (which is used as a separator between test
names), must not contain a "*" (which could be confused with a glob
character), and must not start with a "-" (which would be confused with
an indicator that you should skip the test). Test names should generally
only contain ASCII code points, as the infrastructure does not currently
guarantee that non-ASCII code points will work correctly everywhere. We
do not specify test naming conventions beyond these requirements, and it
is fully permissible for a test to contain multiple assertions which may
pass or fail; this design does not specify a way to interpret or handle
those "sub-atomic" assertions; their existence is opaque to this design.
In particular, this spec does not provide a particular way to identify
and handle parameterized tests, or to do anything with test suites
beyond supporting a limited form of globbing for specifying sets of
test names.
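
For illustration, the naming rules above could be checked with a helper like
the following (a minimal sketch; the function name and the strictness of the
ASCII check are illustrative, not part of the spec):

```python
def is_valid_test_name(name):
    """Checks a test name against the rules above (illustrative only)."""
    if '::' in name:          # "::" is reserved as the filter separator.
        return False
    if '*' in name:           # "*" could be confused with a glob character.
        return False
    if name.startswith('-'):  # A leading "-" marks a test to be skipped.
        return False
    # Non-ASCII names are discouraged because the infrastructure does not
    # guarantee they work everywhere.
    return name.isascii()
```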

To configure a new test, you need to modify one to three files:

* The test must be listed in one or more test suites in
  [//testing/buildbot/test_suites.pyl][4]. Most commonly the test will be
  defined as a single string (e.g., "base_unittests"), which keys into an
  entry in [//testing/buildbot/gn_isolate_map.pyl][5]. In some cases, test
  entries will reference a target and add additional command line arguments.
  These entries (along with [//testing/buildbot/test_suite_exceptions.pyl][6]
  and [//testing/buildbot/waterfalls.pyl][7]) determine where the tests will
  be run. For more information on how these files work, see
  [//testing/buildbot/README.md][8].
* Test entries must ultimately reference an entry in
  //testing/buildbot/gn_isolate_map.pyl (see the example after this list).
  This file contains the mapping of ninja compile targets to GN targets
  (specifying the GN label for the latter); we need this mapping in order to
  be able to run `gn analyze` against a patch to see which targets the patch
  affects. This file also tells MB what kind of test an entry is (so we can
  form the correct command line) and may specify additional command line
  flags. If you are creating a test that is only a variant of an existing
  test, this may be the only file you need to modify. (Technically, you could
  define a new test solely in test_suites.pyl and reference existing
  gn_isolate_map.pyl entries, but this is considered bad practice.)
* Add the GN target itself to the appropriate build files. Make sure this GN
  target contains all of the data and data_deps entries needed to ensure the
  test isolate has all the files the test needs to run. If your test doesn't
  depend on new build targets or add additional data file dependencies, you
  likely don't need to change the build files; however, such tests are
  increasingly uncommon.
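
For illustration only, an entry in gn_isolate_map.pyl looks roughly like the
following (the target name and the extra flag here are hypothetical; see
[//testing/buildbot/README.md][8] for the authoritative format):

```python
# Hypothetical entry mapping a ninja target name to its GN label, and telling
# MB what kind of test it is and which extra flags to pass.
"foo_unittests": {
    "label": "//foo:foo_unittests",    # GN label, needed for `gn analyze`.
    "type": "console_test_launcher",   # Tells MB how to form the command line.
    "args": ["--enable-foo-feature"],  # Optional additional flags.
},
```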

### Command Line Arguments

The executable must support the following command line arguments (aka flags):

```
--isolated-script-test-output=[FILENAME]
```

This argument is optional. If this argument is provided, the executable
must write the results of the test run in the
[JSON Test Results Format](json_test_results_format.md) into that file.
If this argument is not given to the executable, the executable must not
write the output anywhere. The executable should only write a valid
version of the file, and generally should only do this at the end of the
test run. This means that if the run is interrupted, you may not get the
results of what did run, but that is acceptable.
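
A minimal sketch of honoring this flag (the `run_tests` helper and its result
payload are hypothetical stand-ins; see the JSON Test Results Format doc for
the real schema):

```python
import argparse
import json
import sys

def run_tests():
    # Hypothetical stand-in for the real harness; returns (results, exit_code).
    return {'version': 3, 'tests': {}, 'num_failures_by_type': {}}, 0

def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument('--isolated-script-test-output', metavar='FILENAME')
    args, _ = parser.parse_known_args(argv)

    results, exit_code = run_tests()
    if args.isolated_script_test_output:
        # Write the file once, at the end of the run, so that it is either
        # absent or valid, never partially written.
        with open(args.isolated_script_test_output, 'w') as f:
            json.dump(results, f)
    return exit_code

if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))
```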

```
--isolated-script-test-filter=[STRING]
```

This argument is optional. If this argument is provided, it must be a
double-colon-separated list of strings, where each string either
uniquely identifies a full test name or is a prefix plus a "*" on the
end (to form a glob). The executable must run only the tests matching
those names or globs. "*" is _only_ supported at the end, i.e., 'Foo.*'
is legal, but '*.bar' is not. If the string has a "-" at the front, the
test (or glob of tests) must be skipped, not run. This matches how test
names are specified in the simple form of the
[Chromium Test List Format][9]. We use the double colon as a separator
because most other common punctuation characters can occur in test names
(some test suites use URLs as test names, for example). This argument
may be provided multiple times; how to treat multiple occurrences (and
how this arg interacts with `--isolated-script-test-filter-file`) is
described below.

```
--isolated-script-test-filter-file=[FILENAME]
```

If provided, the executable must read the given filename to determine
which tests to run and what to expect their results to be. The file must
be in the [Chromium Test List Format][9] (either the simple or
tagged formats are fine). This argument may be provided multiple times;
how to treat multiple occurrences (and how this arg interacts with
`--isolated-script-test-filter`) is described below.

```
--isolated-script-test-launcher-retry-limit=N
```

By default, tests are run only once if they succeed. If a test fails, the
executable will retry it up to N times (so, N+1 total invocations of the
test) looking for a success, and will stop retrying once the test has
succeeded. By default, the value of N is 3. To turn off retries, pass
`--isolated-script-test-launcher-retry-limit=0`. If this flag is provided,
it is an error to also pass `--isolated-script-test-repeat` (since -repeat
specifies an explicit number of times to run the test, it makes no sense
to also pass -retry-limit).
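
A sketch of the retry semantics (the `run_one_test` callable is a hypothetical
stand-in that returns True on success):

```python
def run_until_success(run_one_test, test_name, retry_limit=3):
    """Runs a test once, retrying failures up to retry_limit more times."""
    for _ in range(retry_limit + 1):   # At most N+1 total invocations.
        if run_one_test(test_name):
            return True                # Stop retrying after the first success.
    return False
```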

```
--isolated-script-test-repeat=N
```

If provided, the executable must run a given test N times (total),
regardless of whether the test passes or fails. By default, tests are
only run once (N=1) if the test matches an expected result or passes;
otherwise, a test may be retried until it succeeds, as governed by
`--isolated-script-test-launcher-retry-limit`, above. If this flag is
provided, it is an error to also pass
`--isolated-script-test-launcher-retry-limit` (since -repeat specifies an
explicit number of times to run the test, it makes no sense to also pass
-retry-limit).

If "`--`" is passed as an argument:

* If the executable is a wrapper that invokes another underlying
  executable, then the wrapper must handle arguments passed before the
  "--" on the command line (and must error out if it doesn't know how
  to do that), and must pass through any arguments following the "--"
  unmodified to the underlying executable (and otherwise ignore them
  rather than erroring out if it doesn't know how to interpret them).
* If the executable is not a wrapper, but rather invokes the tests
  directly, it should handle all of the arguments and otherwise ignore
  the "--". The executable should error out if it gets arguments it
  can't handle, but it is not required to do so.

If "--" is not passed, the executable should error out if it gets
arguments it doesn't know how to handle, but it is not required to do
so.
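
For a wrapper executable, the splitting behavior described above might look
like this sketch (the underlying binary path is hypothetical):

```python
import subprocess
import sys

def main(argv):
    # Everything before the first "--" is for the wrapper; everything after
    # it is passed through unmodified to the underlying executable.
    if '--' in argv:
        i = argv.index('--')
        wrapper_args, passthrough = argv[:i], argv[i + 1:]
    else:
        wrapper_args, passthrough = argv, []
    # ... handle wrapper_args here, erroring out on unknown arguments ...
    return subprocess.call(['./underlying_test_binary'] + passthrough)

if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))
```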

If the test executable produces artifacts, they should be written to the
location specified by the dirname of the `--isolated-script-test-output`
argument. If `--isolated-script-test-output` is not specified, the
executable should store the artifacts somewhere under the
root_build_dir, but there is no standard for how to do this currently
(most tests do not produce artifacts).

The flag names are purposely chosen to be long in order to not conflict
with other flags the executable might support.

### Environment variables

The executable must check for and honor the following environment variables:

```
GTEST_SHARD_INDEX=[N]
```

This environment variable is optional, but if it is provided, it
partially determines (along with `GTEST_TOTAL_SHARDS`) which fixed
subset of tests (or "shard") to run. `GTEST_TOTAL_SHARDS` must also be
set, and `GTEST_SHARD_INDEX` must be set to an integer between 0 and
`GTEST_TOTAL_SHARDS` - 1. Determining which tests to run is described
below.

```
GTEST_TOTAL_SHARDS=[N]
```

This environment variable is optional, but if it is provided, it
partially determines (along with `GTEST_SHARD_INDEX`) which fixed subset
of tests (or "shard") to run. It must be set to a non-zero integer.
Determining which tests to run is described below.

### Exit codes (aka return codes or return values)

The executable must return 0 for a completely successful run, and a
non-zero result if something failed. The following codes are recommended
(2 and 130 coming from UNIX conventions):

| Value | Meaning |
|--------- | ------- |
| 0 (zero) | The executable ran to completion and all tests either ran as expected or passed unexpectedly. |
| 1 | The executable ran to completion but some tests produced unexpectedly failing results. |
| 2 | The executable failed to start, most likely due to unrecognized or unsupported command line arguments. |
| 130 | The executable run was aborted by the user (or caller) in a semi-orderly manner (i.e., SIGINT or Ctrl-C). |
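
A sketch of mapping run outcomes to these recommended codes (the counting of
unexpected failures is left abstract, and the constant names are illustrative):

```python
EXIT_SUCCESS = 0        # All tests ran as expected or passed unexpectedly.
EXIT_TESTS_FAILED = 1   # Some tests produced unexpectedly failing results.
EXIT_BAD_ARGS = 2       # Failed to start, e.g., bad command line arguments.
EXIT_INTERRUPTED = 130  # Aborted by the user or caller (128 + SIGINT).

def exit_code_for(num_unexpected_failures, was_interrupted):
    if was_interrupted:
        return EXIT_INTERRUPTED
    return EXIT_TESTS_FAILED if num_unexpected_failures else EXIT_SUCCESS
```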

## Filtering which tests to run

By default, the executable must run every test it knows about. However,
as noted above, the `--isolated-script-test-filter` and
`--isolated-script-test-filter-file` flags can be used to customize which
tests to run. Either or both flags may be used, and either may be
specified multiple times.

The interaction is as follows:

* A test should be run only if it would be run when **every** flag is
  evaluated individually.
* A test should be skipped if it would be skipped if **any** flag was
  evaluated individually.

If multiple filters in a flag match a given test name, the longest match
takes priority (longest match wins). For example, if you had
`--isolated-script-test-filter='a*::-ab*'`, then `ace.html` would run but
`abd.html` would not. The order of the filters should not matter. It is
an error to have multiple expressions of the same length that conflict
(e.g., `a*::-a*`).
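
A sketch of this evaluation (assuming each flag's value has already been split
on "::" into a list of patterns; the function names are illustrative):

```python
def _matches(test_name, pattern):
    # Globs are only supported as a trailing "*".
    if pattern.endswith('*'):
        return test_name.startswith(pattern[:-1])
    return test_name == pattern

def _flag_allows(test_name, patterns):
    has_positive = any(not p.startswith('-') for p in patterns)
    # If a flag contains only negative patterns, unmatched tests run by default.
    best_len, allowed = -1, not has_positive
    for pattern in patterns:
        negative = pattern.startswith('-')
        body = pattern[1:] if negative else pattern
        if _matches(test_name, body) and len(body) > best_len:
            best_len, allowed = len(body), not negative  # Longest match wins.
    return allowed

def should_run(test_name, filter_flags):
    # A test runs only if *every* flag, evaluated individually, would run it.
    return all(_flag_allows(test_name, f) for f in filter_flags)
```

For example, `should_run('abd.html', [['a*', '-ab*']])` returns False while
`should_run('ace.html', [['a*', '-ab*']])` returns True, matching the
longest-match example above.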

Examples are given below.

It may not be obvious why we need to support these flags being used multiple
times, or together. There are two main sets of reasons:

* First, you may want to use multiple -filter-file arguments to specify
  multiple sets of test expectations (e.g., the base test expectations and
  then MSAN-specific expectations), or to specify expectations in one file
  and list which tests to run in a separate file.
* Second, the way the Chromium recipes work, in order to retry a test step to
  confirm test failures, the recipe doesn't want to have to parse the
  existing command line; it just wants to append
  `--isolated-script-test-filter` and list the tests that fail, and this can
  cause the `--isolated-script-test-filter` argument to be listed multiple
  times (or in conjunction with `--isolated-script-test-filter-file`).

You cannot practically use these mechanisms to run equally sized subsets of the
tests; if that is what you want, use `GTEST_SHARD_INDEX` and
`GTEST_TOTAL_SHARDS` instead, as described in the next section.

## Running equally-sized subsets of tests (shards)

If the `GTEST_SHARD_INDEX` and `GTEST_TOTAL_SHARDS` environment variables are
set, `GTEST_TOTAL_SHARDS` must be set to a non-zero integer N, and
`GTEST_SHARD_INDEX` must be set to an integer M between 0 and N-1. Given those
two values, the executable must run only every N<sup>th</sup> test starting at
test number M (i.e., every i<sup>th</sup> test where (i mod N) == M).
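
A sketch of the shard computation (note that, as described below, this applies
to the already-filtered, ordered test list; the helper name is illustrative):

```python
import os

def tests_in_this_shard(filtered_tests):
    """Selects this shard's slice of the filtered, ordered test list."""
    total = int(os.environ.get('GTEST_TOTAL_SHARDS', '1'))
    index = int(os.environ.get('GTEST_SHARD_INDEX', '0'))
    # Run every N-th test starting at M, i.e., tests where (i mod N) == M.
    return [t for i, t in enumerate(filtered_tests) if i % total == index]
```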

This mechanism produces roughly equally-sized sets of tests that will hopefully
take roughly equal times to execute, but it cannot guarantee the latter
property to any degree of precision. If you need the shards to be as close to
the same duration as possible, you will need a more complicated process. For
example, you could run all of the tests once to determine their individual
running times, and then build up lists of tests based on that, or do something
even more complicated based on multiple test runs to smooth over variance in
test execution times. Chromium does not currently attempt to do this for
functional tests, but we do something similar for performance tests in order to
better achieve equal running times and device affinity for consistent results.

You cannot practically use the sharding mechanism to run a stable named set of
tests; if that is what you want, use the `--isolated-script-test-filter`
flags instead, as described in the previous section.

Which tests are in which shard must be determined **after** tests have been
filtered out using the `--isolated-script-test-filter(-file)` flags.

The order that tests are run in is not otherwise specified, but tests are
commonly run either in lexicographic order or in a semi-fixed random order; the
latter is useful to help identify inter-test dependencies, i.e., tests that
rely on the results of previous tests having run in order to pass (such tests
are generally considered to be undesirable).
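
For instance, a semi-fixed random order can be produced by shuffling with a
fixed (or logged) seed, which keeps runs reproducible (a sketch; the seed
handling is illustrative):

```python
import random

def order_tests(tests, seed=None):
    tests = sorted(tests)                   # Default: lexicographic order.
    if seed is not None:
        random.Random(seed).shuffle(tests)  # Semi-fixed random order.
    return tests
```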

## Examples

Assume that out/Default is a debug build (i.e., that the "Debug" tag will
apply), that you have tests named Foo.Bar.bar{1,2,3}, Foo.Baz.baz,
and Foo.Quux.quux, and that you have the following two filter files:

```sh
$ cat filter1
Foo.Bar.*
-Foo.Bar.bar3
$ cat filter2
# tags: [ Debug Release ]
[ Debug ] Foo.Bar.bar2 [ Skip ]
$
```

#### Filtering tests on the command line

```sh
$ out/Default/bin/run_foo_tests \
    --isolated-script-test-filter='Foo.Bar.*::-Foo.Bar.bar3'
[1/2] Foo.Bar.bar1 passed in 0.1s
[2/2] Foo.Bar.bar2 passed in 0.13s

2 tests passed in 0.23s, 0 skipped, 0 failures.
$
```

#### Using a filter file

```sh
$ out/Default/bin/run_foo_tests --isolated-script-test-filter-file=filter1
[1/2] Foo.Bar.bar1 passed in 0.1s
[2/2] Foo.Bar.bar2 passed in 0.13s

2 tests passed in 0.23s, 0 skipped, 0 failures.
```

#### Combining multiple filters

```sh
$ out/Default/bin/run_foo_tests --isolated-script-test-filter='Foo.Bar.*' \
    --isolated-script-test-filter='Foo.Bar.bar2'
[1/1] Foo.Bar.bar2 passed in 0.13s

1 test passed in 0.13s, 0 skipped, 0 failures.
$ out/Default/bin/run_foo_tests --isolated-script-test-filter='Foo.Bar.*' \
    --isolated-script-test-filter='Foo.Baz.baz'
No tests to run.
$ out/Default/bin/run_foo_tests --isolated-script-test-filter-file=filter2 \
    --isolated-script-test-filter=-Foo.Baz.baz
[1/3] Foo.Bar.bar1 passed in 0.1s
[2/3] Foo.Bar.bar3 passed in 0.13s
[3/3] Foo.Quux.quux passed in 0.05s

3 tests passed in 0.28s, 2 skipped, 0 failures.
$
```

#### Running one shard of tests

```sh
$ GTEST_TOTAL_SHARDS=3 GTEST_SHARD_INDEX=1 out/Default/bin/run_foo_tests
Foo.Bar.bar2 passed in 0.13s
Foo.Quux.quux passed in 0.02s

2 tests passed in 0.15s, 0 skipped, 0 failures.
$
```

## Related Work

This document only partially makes sense in isolation.

The [JSON Test Results Format](json_test_results_format.md) document
specifies how the results of the test run should be reported.

The [Chromium Test List Format][14] specifies in more detail how we can specify
which tests to run and which to skip, and whether the tests are expected to
pass or fail.

Implementing everything in this document plus the preceding three documents
should fully specify how tests are run in Chromium. And, if we do this,
implementing tools to manage tests should be significantly easier.

[On Naming Chromium Builders and Build Steps][15] is a related proposal that
has been partially implemented; it is complementary to this work, but not
required.

[Cleaning up the Chromium Testing Environment][3] describes a series of
changes we might want to make to this API and the related infrastructure to
simplify things.

Additional documents that may be of interest:

* [Testing Configuration Files][8]
* [The MB (Meta-Build wrapper) User Guide][10]
* [The MB (Meta-Build wrapper) Design Spec][11]
* [Test Activation / Deactivation (TADA)][12] (internal Google document only,
  sorry)
* [Standardize Artifacts for Chromium Testing][13] is somewhat dated but goes
  into slightly greater detail on how to store artifacts produced by tests
  than the JSON Test Results Format does.

## Document history

\[ Significant changes only. \]

| Date | Comment |
| ---------- | -------- |
| 2017-12-13 | Initial version. This tried to be a full-featured spec that defined common flags that devs might want with friendly names, as well as the flags needed to run tests on the bots. |
| 2019-05-24 | Second version. The spec was significantly revised to just specify the minimal subset needed to run tests consistently on bots given the current infrastructure. |
| 2019-05-29 | All TODOs and discussion of future work were stripped out; now the spec only specifies how the `isolated_scripts` currently behave. Future work was moved to a new doc, [Cleaning up the Chromium Testing Environment][3]. |
| 2019-09-16 | Added comment about ordering of filters and longest match winning for `--isolated-script-test-filter`. |
| 2020-07-01 | Moved into the src repo and converted to Markdown. No content changes otherwise. |

## Notes

(*) The initial version of this document talked about test runners instead of
test executables, so the bit.ly shortcut URL refers to the test-runner-api
instead of the test-executable-api. The author attempted to create a
test-executable-api link, but pointed it at the wrong document by accident.
bit.ly URLs can't easily be updated :(.

[1]: https://bit.ly/chromium-test-runner-api
[2]: https://chromium.googlesource.com/infra/infra/+/master/doc/users/services/about_luci.md
[3]: https://docs.google.com/document/d/1MwnIx8kavuLSpZo3JmL9T7nkjTz1rpaJA4Vdj_9cRYw/edit?usp=sharing
[4]: ../../testing/buildbot/test_suites.pyl
[5]: ../../testing/buildbot/gn_isolate_map.pyl
[6]: ../../testing/buildbot/test_suite_exceptions.pyl
[7]: ../../testing/buildbot/waterfalls.pyl
[8]: ../../testing/buildbot/README.md
[9]: https://bit.ly/chromium-test-list-format
[10]: ../../tools/mb/docs/user_guide.md
[11]: ../../tools/mb/docs/design_spec.md
[12]: https://goto.google.com/chops-tada
[13]: https://bit.ly/chromium-test-artifacts
[14]: https://bit.ly/chromium-test-list-format
[15]: https://bit.ly/chromium-build-naming