| # The Chromium Test Executable API |
| |
| [bit.ly/chromium-test-runner-api][1] (*) |
| |
| |
| [TOC] |
| |
| ## Introduction |
| |
| This document defines the API that test executables must implement in order to |
| be run on the Chromium continuous integration infrastructure (the |
| [LUCI][2] |
| system using the `chromium` and `chromium_trybot` recipes). |
| |
| *** note |
| **NOTE:** This document specifies the existing `isolated_scripts` API in the |
| Chromium recipe. Currently we also support other APIs (e.g., for |
| GTests), but we should migrate them to use the `isolated_scripts` API. |
| That work is not currently scheduled. |
| *** |
| |
| This spec applies only to functional tests and does not attempt to |
| specify how performance tests should work, though in principle they |
| could probably work the same way and possibly just produce different |
| output. |
| |
| This document is specifically targeted at Chromium and assumes you are |
| using GN and Ninja for your build system. It should be possible to adapt |
| these APIs to other projects and build recipes, but this is not an |
| immediate goal. Similarly, if a project adapts this API and the related |
| specifications it should be able to reuse the functionality and tooling |
| we've built out for Chromium's CI system more easily in other LUCI |
| deployments. |
| |
| *** |
| **NOTE:** It bears repeating that this describes the current state of |
| affairs, and not the desired end state. A companion doc, |
| [Cleaning up the Chromium Testing Environment][3], |
| discusses a possible path forward and end state. |
| *** |
| |
| ## Building and Invoking a Test Executable |
| |
| There are lots of different kinds of tests, but we want to be able to |
| build and invoke them uniformly, regardless of how they are implemented. |
| |
| We will call the thing being executed to run the tests a _test |
| executable_ (or executable for short). This is not an ideal name, as |
| this doesn't necessarily refer to a GN executable target type; it may be |
| a wrapper script that invokes other binaries or scripts to run the |
| tests. |
| |
| We expect the test executable to run one or more tests. A _test_ must be |
| an atomically addressable thing with a name that is unique to that |
| invocation of the executable, i.e., we expect that we can pass a list of |
| test names to the test executable and only run just those tests. Test |
| names must not contain a "::" (which is used as a separator between test |
| names) and must not contain a "*" (which could be confused with a glob |
| character) or start with a "-" (which would be confused with an |
| indicator that you should skip the test). Test names should generally |
| only contain ASCII code points, as the infrastructure does not currently |
| guarantee that non-ASCII code points will work correctly everywhere. We |
| do not specify test naming conventions beyond these requirements, and it |
| is fully permissible for a test to contain multiple assertions which may |
| pass or fail; this design does not specify a way to interpret or handle |
| those "sub-atomic" assertions; their existence is opaque to this design. |
| In particular, this spec does not provide a particular way to identify |
| and handle parameterized tests, or to do anything with test suites |
| beyond a supporting a limited form of globbing for specifying sets of |
| test names. |
| |
| To configure a new test, you need to modify one to three files: |
| |
| * The test must be listed in one or more test suites in |
| [//testing/buildbot/test_suites.pyl][4]. Most commonly the test will be |
| defined as a single string (e.g., "base_unittests"), which keys into an |
| entry in [//testing/buildbot/gn_isolate_map.pyl][5]. In some cases, tests |
| will reference a target and add additional command line arguments. These |
| entries (along with [//testing/buildbot/test_suite_exceptions.pyl][6] and |
| [//testing/buildbot/waterfalls.pyl][7]) determine where the tests will be |
| run. For more information on how these files work, see |
| [//testing/buildbot/README.md][8] |
| * Tests entries must ultimately reference an entry in |
| //testing/buildbot/gn_isolate_map.pyl. This file contains the mapping of |
| ninja compile targets to GN targets (specifying the GN label for the |
| latter); we need this mapping in order to be able to run `gn analyze` |
| against a patch to see which targets are affected by a patch. This file |
| also tells MB what kind of test an entry is (so we can form the correct |
| command line) and may specify additional command line flags. If you are |
| creating a test that is only a variant of an existing test, this may be the |
| only file you need to modify. (Technically, you could define a new test |
| solely in test_suites.pyl and reference existing gn_isolate_map.pyl |
| entries, but this is considered bad practice). |
| * Add the GN target itself to the appropriate build files. Make sure this GN |
| target contains all of the data and data_deps entries needed to ensure the |
| test isolate has all the files the test needs to run. If your test doesn't |
| depend on new build targets or add additional data file dependencies, you |
| likely don't need this. However, this is increasingly uncommon. |
| |
| ### Command Line Arguments |
| |
| The executable must support the following command line arguments (aka flags): |
| |
| ``` |
| --isolated-outdir=[PATH] |
| ``` |
| |
| This argument is required, and should be set to the directory created |
| by the swarming task for the task to write outputs into. |
| |
| ``` |
| --out-dir=[PATH] |
| ``` |
| |
| This argument mirrors `--isolated-outdir`, but may appear in addition to |
| it depending on the bot configuration (e.g. IOS bots that specify the |
| `out_dir_arg` mixin in //testing/buildbot/waterfalls.pyl). It only needs |
| to be handled in these cases. |
| |
| ``` |
| --isolated-script-test-output=[FILENAME] |
| ``` |
| |
| This argument is optional. If this argument is provided, the executable |
| must write the results of the test run in the [JSON Test |
| Results Format](json_test_results_format.md) into |
| that file. If this argument is not given to the executable, the |
| executable must not write the output anywhere. The executable should |
| only write a valid version of the file, and generally should only do |
| this at the end of the test run. This means that if the run is |
| interrupted, you may not get the results of what did run, but that is |
| acceptable. |
| |
| ``` |
| --isolated-script-test-filter=[STRING] |
| ``` |
| |
| This argument is optional. If this argument is provided, it must be a |
| double-colon-separated list of strings, where each string either |
| uniquely identifies a full test name or is a prefix plus a "*" on the |
| end (to form a glob). The executable must run only the test matching |
| those names or globs. "*" is _only_ supported at the end, i.e., 'Foo.*' |
| is legal, but '*.bar' is not. If the string has a "-" at the front, the |
| test (or glob of tests) must be skipped, not run. This matches how test |
| names are specified in the simple form of the [Chromium Test List |
| Format][9]. We use the double |
| colon as a separator because most other common punctuation characters |
| can occur in test names (some test suites use URLs as test names, for |
| example). This argument may be provided multiple times; how to treat |
| multiple occurrences (and how this arg interacts with |
| --isolated-script-test-filter-file) is described below. |
| |
| ``` |
| --isolated-script-test-filter-file=[FILENAME] |
| ``` |
| |
| If provided, the executable must read the given filename to determine |
| which tests to run and what to expect their results to be. The file must |
| be in the [Chromium Test List Format][9] (either the simple or |
| tagged formats are fine). This argument may be provided multiple times; |
| how to treat multiple occurrences (and how this arg interacts with |
| `--isolated-script-test-filter`) is described below. |
| |
| ``` |
| --isolated-script-test-launcher-retry-limit=N |
| ``` |
| |
| By default, tests are run only once if they succeed. If they fail, we |
| will retry the test up to N times (so, for N+1 total invocations of the |
| test) looking for a success (and stop retrying once the test has |
| succeed). By default, the value of N is 3. To turn off retries, pass |
| `--isolated-script-test-launcher-retry-limit=0`. If this flag is provided, |
| it is an error to also pass `--isolated-script-test-repeat` (since -repeat |
| specifies an explicit number of times to run the test, it makes no sense |
| to also pass --retry-limit). |
| |
| ``` |
| --isolated-script-test-repeat=N |
| ``` |
| |
| If provided, the executable must run a given test N times (total), |
| regardless of whether the test passes or fails. By default, tests are |
| only run once (N=1) if the test matches an expected result or passes, |
| otherwise it may be retried until it succeeds, as governed by |
| `--isolated-script-test-launcher-retry-limit`, above. If this flag is |
| provided, it is an error to also pass |
| `--isolated-script-test-launcher-retry-limit` (since -repeat specifies an |
| explicit number of times to run the test, it makes no sense to also pass |
| -retry-limit). |
| |
| ``` |
| --xcode-build-version [VERSION] |
| ``` |
| |
| This flag is passed to scripts on IOS bots only, due to the `xcode_14_main` |
| mixin in //testing/builtbot/waterfalls.pyl. |
| |
| ``` |
| --xctest |
| ``` |
| |
| This flag is passed to scripts on IOS bots only, due to the `xctest` |
| mixin in //testing/builtbot/waterfalls.pyl. |
| |
| If "`--`" is passed as an argument: |
| |
| * If the executable is a wrapper that invokes another underlying |
| executable, then the wrapper must handle arguments passed before the |
| "--" on the command line (and must error out if it doesn't know how |
| to do that), and must pass through any arguments following the "--" |
| unmodified to the underlying executable (and otherwise ignore them |
| rather than erroring out if it doesn't know how to interpret them). |
| * If the executable is not a wrapper, but rather invokes the tests |
| directly, it should handle all of the arguments and otherwise ignore |
| the "--". The executable should error out if it gets arguments it |
| can't handle, but it is not required to do so. |
| |
| If "--" is not passed, the executable should error out if it gets |
| arguments it doesn't know how to handle, but it is not required to do |
| so. |
| |
| If the test executable produces artifacts, they should be written to the |
| location specified by the dirname of the `--isolated-script-test-output` |
| argument). If the `--isolated-script-test-output-argument` is not |
| specified, the executable should store the tests somewhere under the |
| root_build_dir, but there is no standard for how to do this currently |
| (most tests do not produce artifacts). |
| |
| The flag names are purposely chosen to be long in order to not conflict |
| with other flags the executable might support. |
| |
| ### Environment variables |
| |
| The executable must check for and honor the following environment variables: |
| |
| ``` |
| GTEST_SHARD_INDEX=[N] |
| ``` |
| |
| This environment variable is optional, but if it is provided, it |
| partially determines (along with `GTEST_TOTAL_SHARDS`) which fixed |
| subset of tests (or "shard") to run. `GTEST_TOTAL_SHARDS` must also be |
| set, and `GTEST_SHARD_INDEX` must be set to an integer between 0 and |
| `GTEST_TOTAL_SHARDS`. Determining which tests to run is described |
| below. |
| |
| ``` |
| GTEST_TOTAL_SHARDS=[N] |
| ``` |
| |
| This environment variable is optional, but if it is provided, it |
| partially determines (along with `GTEST_TOTAL_SHARDS`) which fixed subset |
| of tests (or "shard") to run. It must be set to a non-zero integer. |
| Determining which tests to run is described below. |
| |
| ### Exit codes (aka return codes or return values) |
| |
| The executable must return 0 for a completely successful run, and a |
| non-zero result if something failed. The following codes are recommended |
| (2 and 130 coming from UNIX conventions): |
| |
| | Value | Meaning | |
| |--------- | ------- | |
| | 0 (zero) | The executable ran to completion and all tests either ran as expected or passed unexpectedly. | |
| | 1 | The executable ran to completion but some tests produced unexpectedly failing results. | |
| | 2 | The executable failed to start, most likely due to unrecognized or unsupported command line arguments. | |
| | 130 | The executable run was aborted the user (or caller) in a semi-orderly manner (aka SIGKILL or Ctrl-C). | |
| |
| ## Filtering which tests to run |
| |
| By default, the executable must run every test it knows about. However, |
| as noted above, the `--isolated-script-test-filter` and |
| `--isolated-script-test-filter-file` flags can be used to customize which |
| tests to run. Either or both flags may be used, and either may be |
| specified multiple times. |
| |
| The interaction is as follows: |
| |
| * A test should be run only if it would be run when **every** flag is |
| evaluated individually. |
| * A test should be skipped if it would be skipped if **any** flag was |
| evaluated individually. |
| |
| If multiple filters in a flag match a given test name, the longest match |
| takes priority (longest match wins). I.e.,. if you had |
| `--isolated-script-test-filter='a*::-ab*'`, then `ace.html` would run but |
| `abd.html` would not. The order of the filters should not matter. It is |
| an error to have multiple expressions of the same length that conflict |
| (e.g., `a*::-a*`). |
| |
| Examples are given below. |
| |
| It may not be obvious why we need to support these flags being used multiple |
| times, or together. There are two main sets of reasons: |
| * First, you may want to use multiple -filter-file arguments to specify |
| multiple sets of test expectations (e.g., the base test expectations and |
| then MSAN-specific expectations), or to specify expectations in one file |
| and list which tests to run in a separate file. |
| * Second, the way the Chromium recipes work, in order to retry a test step to |
| confirm test failures, the recipe doesn't want to have to parse the |
| existing command line, it just wants to append |
| --isolated-script-test-filter and list the |
| tests that fail, and this can cause the --isolated-script-test-filter |
| argument to be listed multiple times (or in conjunction with |
| --isolated-script-test-filter-file). |
| |
| You cannot practically use these mechanisms to run equally sized subsets of the |
| tests, so if you want to do the latter, use `GTEST_SHARD_INDEX` and |
| `GTEST_TOTAL_SHARDS` instead, as described in the next section. |
| |
| ## Running equally-sized subsets of tests (shards) |
| |
| If the `GTEST_SHARD_INDEX` and `GTEST_TOTAL_SHARDS` environment variables are |
| set, `GTEST_TOTAL_SHARDS` must be set to a non-zero integer N, and |
| `GTEST_SHARD_INDEX` must be set to an integer M between 0 and N-1. Given those |
| two values, the executable must run only every N<sup>th</sup> test starting at |
| test number M (i.e., every i<sup>th</sup> test where (i mod N) == M). dd |
| |
| This mechanism produces roughly equally-sized sets of tests that will hopefully |
| take roughly equal times to execute, but cannot guarantee the latter property |
| to any degree of precision. If you need them to be as close to the same |
| duration as possible, you will need a more complicated process. For example, |
| you could run all of the tests once to determine their individual running |
| times, and then build up lists of tests based on that, or do something even |
| more complicated based on multiple test runs to smooth over variance in test |
| execution times. Chromium does not currently attempt to do this for functional |
| tests, but we do something similar for performance tests in order to better |
| achieve equal running times and device affinity for consistent results. |
| |
| You cannot practically use the sharding mechanism to run a stable named set of |
| tests, so if you want to do the latter, use the `--isolated-script-test-filter` |
| flags instead, as described in the previous section. |
| |
| Which tests are in which shard must be determined **after** tests have been |
| filtered out using the `--isolated-script-test-filter(-file)` flags. |
| |
| The order that tests are run in is not otherwise specified, but tests are |
| commonly run either in lexicographic order or in a semi-fixed random order; the |
| latter is useful to help identify inter-test dependencies, i.e., tests that |
| rely on the results of previous tests having run in order to pass (such tests |
| are generally considered to be undesirable). |
| |
| ## Examples |
| |
| Assume that out/Default is a debug build (i.e., that the "Debug" tag will |
| apply), and that you have tests named Foo.Bar.bar{1,2,3}, Foo.Bar.baz, |
| and Foo.Quux.quux, and the following two filter files: |
| |
| ```sh |
| $ cat filter1 |
| Foo.Bar.* |
| -Foo.Bar.bar3 |
| $ cat filter2 |
| # tags: [ Debug Release ] |
| [ Debug ] Foo.Bar.bar2 [ Skip ] |
| $ |
| ``` |
| |
| #### Filtering tests on the command line |
| |
| ```sh |
| $ out/Default/bin/run_foo_tests \ |
| --isolated_script-test-filter='Foo.Bar.*::-Foo.Bar.bar3' |
| [1/2] Foo.Bar.bar1 passed in 0.1s |
| [2/2] Foo.Bar.bar2 passed in 0.13s |
| |
| 2 tests passed in 0.23s, 0 skipped, 0 failures. |
| $ |
| ``` |
| |
| #### Using a filter file |
| |
| ```sh |
| $ out/Default/bin/run_foo_tests --isolated-script-test-filter-file=filter1 |
| [1/2] Foo.Bar.bar1 passed in 0.1s |
| [2/2] Foo.Bar.bar2 passed in 0.13s |
| |
| 2 tests passed in 0.23s, 0 skipped, 0 failures. |
| ``` |
| |
| #### Combining multiple filters |
| |
| ```sh |
| $ out/Default/bin/run_foo_tests --isolated-script-test-filter='Foo.Bar.*' \ |
| --isolated-script-test-filter='Foo.Bar.bar2' |
| [1/1] Foo.Bar.bar2 passed in 0.13s |
| |
| All 2 tests completed successfully in 0.13s |
| $ out/Default/bin/run_foo_tests --isolated-script-test-filter='Foo.Bar.* \ |
| --isolated-script-test-filter='Foo.Baz.baz' |
| No tests to run. |
| $ out/Default/bin/run_foo_tests --isolated-script-test-filter-file=filter2 \ |
| --isolated-script-test-filter=-FooBaz.baz |
| [1/4] Foo.Bar.bar1 passed in 0.1s |
| [2/4] Foo.Bar.bar3 passed in 0.13s |
| [3/4] Foo.Baz.baz passed in 0.05s |
| |
| 3 tests passed in 0.28s, 2 skipped, 0 failures. |
| $ |
| ``` |
| |
| #### Running one shard of tests |
| |
| ```sh |
| $ GTEST_TOTAL_SHARDS=3 GTEST_SHARD_INDEX=1 out/Default/bin/run_foo_tests |
| Foo.Bar.bar2 passed in 0.13s |
| Foo.Quux.quux1 passed in 0.02s |
| |
| 2 tests passed in 0.15s, 0 skipped, 0 failures. |
| $ |
| ``` |
| |
| ## Related Work |
| |
| This document only partially makes sense in isolation. |
| |
| The [JSON Test Results Format](json_test_results_format.md) document |
| specifies how the results of the test run should be reported. |
| |
| The [Chromium Test List Format][14] specifies in more detail how we can specify |
| which tests to run and which to skip, and whether the tests are expected to |
| pass or fail. |
| |
| Implementing everything in this document plus the preceding three documents |
| should fully specify how tests are run in Chromium. And, if we do this, |
| implementing tools to manage tests should be significantly easier. |
| |
| [On Naming Chromium Builders and Build Steps][15] is a related proposal that |
| has been partially implemented; it is complementary to this work, but not |
| required. |
| |
| [Cleaning up the Chromium Testing Conventions][3] describes a series of |
| changes we might want to make to this API and the related infrastructure to |
| simplify things. |
| |
| Additional documents that may be of interest: |
| * [Testing Configuration Files][8] |
| * [The MB (Meta-Build wrapper) User Guide][10] |
| * [The MB (Meta-Build wrapper) Design Spec][11] |
| * [Test Activation / Deactivation (TADA)][12] (internal Google document only, |
| sorry) |
| * [Standardize Artifacts for Chromium Testing][13] is somewhat dated but goes |
| into slightly greater detail on how to store artifacts produced by tests |
| than the JSON Test Results Format does. |
| |
| ## Document history |
| |
| \[ Significant changes only. \] |
| |
| | Date | Comment | |
| | ---------- | -------- | |
| | 2017-12-13 | Initial version. This tried to be a full-featured spec that defined common flags that devs might want with friendly names, as well the flags needed to run tests on the bots. | |
| | 2019-05-24 | Second version. The spec was significantly revised to just specify the minimal subset needed to run tests consistently on bots given the current infrastructure. | |
| | 2019-05-29 | All TODOs and discussion of future work was stripped out; now the spec only specifies how the `isolated_scripts` currently behave. Future work was moved to a new doc, [Cleaning up the Chromium Testing Environment][3]. | |
| | 2019-09-16 | Add comment about ordering of filters and longest match winning for `--isolated-script-test-filter`. | |
| | 2020-07-01 | Moved into the src repo and converted to Markdown. No content changes otherwise. | |
| |
| ## Notes |
| |
| (*) The initial version of this document talked about test runners instead of |
| test executables, so the bit.ly shortcut URL refers to the test-runner-api instead of |
| the test-executable-api. The author attempted to create a test-executable-api link, |
| but pointed it at the wrong document by accident. bit.ly URLs can't easily be |
| updated :(. |
| |
| [1]: https://bit.ly/chromium-test-runner-api |
| [2]: https://chromium.googlesource.com/infra/infra/+/main/doc/users/services/about_luci.md |
| [3]: https://docs.google.com/document/d/1MwnIx8kavuLSpZo3JmL9T7nkjTz1rpaJA4Vdj_9cRYw/edit?usp=sharing |
| [4]: ../../testing/buildbot/test_suites.pyl |
| [5]: ../../testing/buildbot/gn_isolate_map.pyl |
| [6]: ../../testing/buildbot/test_suite_exceptions.pyl |
| [7]: ../../testing/buildbot/waterfalls.pyl |
| [8]: ../../testing/buildbot/README.md |
| [9]: https://bit.ly/chromium-test-list-format |
| [10]: ../../tools/mb/docs/user_guide.md |
| [11]: ../../tools/mb/docs/design_spec.md |
| [12]: https://goto.google.com/chops-tada |
| [13]: https://bit.ly/chromium-test-artifacts |
| [14]: https://bit.ly/chromium-test-list-format |
| [15]: https://bit.ly/chromium-build-naming |