| # GPU Expectation Files |
| |
This page covers the details of the expectation files, which are critical for
ensuring that GPU tests only run where they should and that flakes are
suppressed to avoid red bots.
| |
| [TOC] |
| |
| ## Overview |
| |
| The GPU Telemetry-based integration tests (tests that use the |
| `telemetry_gpu_integration_test` target) |
[utilize expectation files][gpu_expectations] in order to define when certain
tests should not be run or are expected to fail. The core expectation format is
defined by [typ][typ_expectations], although there are some Chromium-specific
| extensions as well. Each expectation consists of the following fields, separated |
| by a space: |
| |
1. An optional bug identifier. While optional, it is strongly encouraged that
   GPU expectations include one.
| 1. A set of tags that the expectation applies to. This is technically optional, |
| as omitting tags will cause the expectation to be applied everywhere, but |
| there are very few, if any, instances where tags will not be specified for |
| GPU expectations. |
| 1. The name of the test that the expectation applies to. A single wildcard (`*`) |
| character is allowed at the end of the string, but use of a wildcard anywhere |
| but the end of the string is an error. |
| 1. A set of expected results for the test. This technically supports multiple |
| values, but for GPU purposes, it will always be a single value. |
| |
Additionally, comments, which begin with `#`, are supported.
| |
| Thus, a sample expectation entry might look like: |
| |
| ``` |
| # Flakes regularly but infrequently. |
| crbug.com/1234 [ win amd ] foo/test [ RetryOnFailure ] |
| ``` |
| |
| [gpu_expectations]: https://chromium.googlesource.com/chromium/src/+/main/content/test/gpu/gpu_tests/test_expectations |
| [typ_expectations]: https://chromium.googlesource.com/catapult.git/+/main/third_party/typ/typ/expectations_parser.py |
| |
| ## Core Format |
| |
The following sections give further details on each part of an expectation that
belongs to the core expectation file format.
| |
| ### Bug Identifier |
| |
One or more optional strings pointing to the bug(s) tracking the reason why the
expectation exists. For GPU purposes, this is usually a single bug, but multiple
space-separated bug identifiers are supported.
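
As an illustration, an expectation tracked by two bugs might look like the
following (the bug numbers, tags, and test name are made up):

```
crbug.com/1234 crbug.com/5678 [ win nvidia ] foo/test [ Failure ]
```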
| |
The format of the string is enforced by [these][bug_regexes] regular
expressions, so CLs that introduce malformed bugs will not be submittable.
| |
| [bug_regexes]: https://chromium.googlesource.com/chromium/src/+/e26d89a52627f8910b79a95668dfa48e5fe8fa06/content/test/gpu/gpu_tests/test_expectations_unittest.py#66 |
| |
| ### Tags |
| |
One or more tags are used to specify which configuration(s) an expectation
applies to. For GPU tests, these typically specify things such as the OS, the
GPU vendor, or the specific GPU model.
| |
| Tag sets are defined at the top of the expectation file using `# tags:` |
| comments. Each comment defines a different set of mutually exclusive tags, e.g. |
| all of the OS tags are in a single set. An expectation is only allowed to use |
| one tag from each set, but can use tags from an arbitrary number of sets. For |
| example, `[ win win10 ]` would be invalid since both are OS tags, but |
| `[ win amd release ]` would be valid since there is one tag each from the OS, |
| GPU, and browser type tag sets. |
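
To illustrate the header format, the tag set declarations at the top of a file
look roughly like the following (heavily trimmed; the real headers contain many
more tags):

```
# tags: [ android linux mac win ]
# tags: [ amd intel nvidia ]
# tags: [ debug release ]
```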
| |
Additionally, tags used by expectations for the same test must be unambiguous
so that the same test cannot have multiple expectations applied to it at once.
Take the following expectations as an example:
| |
| ``` |
| [ mac intel ] foo/test [ Failure ] |
| [ mac debug ] foo/test [ RetryOnFailure ] |
| ``` |
| |
| These expectations would be considered to be conflicting since `[ mac intel ]` |
| does not make any distinctions about the browser type, and `[ mac debug ]` does |
| not make any distinctions about the GPU type. As written, `foo/test` running |
| on a configuration that produced the `mac`, `intel`, and `debug` tags would try |
| to use both expectations. |
| |
| This can be fixed by adding a tag from the same tag set but with a different |
| value so that the configurations are no longer ambiguous. |
| `[ mac intel release ]` would work since a configuration cannot be both |
| `release` and `debug` at the same time. Similarly, `[ mac amd debug ]` would |
| work since a configuration cannot be both `intel` and `amd` at the same time. |
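
Applying either of those fixes to the earlier example yields a conflict-free
pair, for instance:

```
[ mac intel release ] foo/test [ Failure ]
[ mac debug ] foo/test [ RetryOnFailure ]
```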
| |
| Such conflicts will be caught and reported by presubmit tests, so you should not |
| have to worry about accidentally landing bad expectations, but you will need to |
| fix any found conflicts before you can submit your CL. |
| |
| #### Adding/Modifying Tags |
| |
| Actually updating the test harness to generate new tags is out of scope for this |
| documentation. However, if a new tag needs to be added to an expectation file |
| or an existing one modified (e.g. renamed), it is important to note that the |
| tag header should not be manually modified in the expectation file itself. |
| |
| Instead, modify the header in [validate_tag_consistency.py] and run |
| `validate_tag_consistency.py apply` to apply the new header to all expectation |
| files. This ensures that all files remain in sync. |
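
From the root of a Chromium checkout, the invocation looks roughly like the
following (the exact way you run the script may differ in your environment):

```
python3 content/test/gpu/validate_tag_consistency.py apply
```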
| |
| Tag consistency is checked as part of presubmit, so it will be apparent if you |
| accidentally modify the tag header in a file directly. |
| |
| [validate_tag_consistency.py]: https://chromium.googlesource.com/chromium/src/+/main/content/test/gpu/validate_tag_consistency.py |
| |
| ### Test Name |
| |
| A single string with either a test name or part of a test name suffixed with a |
| wildcard character. Note that the test name is just the test case as reported |
| by the test harness, not the fully qualified name that is sometimes reported in |
| places such as the "Test Results" tab on bots. |
| |
| As an example, |
| `gpu_tests.webgl1_conformance_integration_test.WebGL1ConformanceIntegrationTest.WebglExtension_EXT_blend_minmax` |
| is a fully qualified name, while `WebglExtension_EXT_blend_minmax` is what would |
| actually be used in the expectation file for the `webgl1_conformance` suite. |
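
Putting this together, an expectation for that test in the `webgl1_conformance`
expectation file might look like the following (the bug and tags are
illustrative):

```
crbug.com/1234 [ mac intel ] WebglExtension_EXT_blend_minmax [ Failure ]
```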
| |
| ### Expected Results |
| |
Usually one, but potentially multiple, results that are expected on the
configuration that the expectation applies to. Like tags, expected results are
defined at the top of each expectation file, and the same caveat about adding
or modifying them via the helper script applies. Unlike tags, however, there is
only a single set of values, which is not expected to change on any sort of
regular basis. The following expected results are used by GPU tests:
| |
| #### Skip |
| |
Skips the test entirely. The benefit of this is that no time is wasted on a bad
test. However, it also means that it is impossible to check whether the test is
still failing just by looking at historical results. This is problematic for
humans, but even more problematic for the scripts we use to automatically remove
expectations that are no longer needed.
| |
As such, adding new Skip expectations is strongly discouraged except under the
following circumstances:
| |
| 1. The test is invalid on a configuration for some reason, e.g. a feature is not |
| and will not be supported on a certain OS, and so should never be run. These |
| sorts of expectations are expected to be permanent. |
| 1. The act of running the test is significantly detrimental to other tests, e.g. |
| running the test kills the test device. These are expected to be temporary, |
| so the root cause should be fixed relatively quickly. |
| |
| If presubmit thinks you are adding new Skip expectations, it will warn you, but |
| the warning can be ignored if the addition falls into one of the above |
| categories or it is a false positive, such as due to modifying tags on an |
| existing expectation. |
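
As a concrete example, a permanent Skip for a feature that will never be
supported on a particular OS might look like the following (the bug, tag, and
test name are illustrative):

```
# Feature is not supported on Android and never will be.
crbug.com/1234 [ android ] foo/test [ Skip ]
```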
| |
| #### Failure |
| |
Lets the test run normally, but treats any failure as expected during result
reporting. This is the preferred way to suppress frequent failures on bots, as
it keeps the bots green while still reporting results that can be used later.
| |
| #### RetryOnFailure |
| |
Allows the test to be retried up to two additional times before being marked as
failing, since GPU tests do not retry on failure by default. This is preferred
if the test fails occasionally, but not frequently enough to warrant a `Failure`
expectation.
| |
| #### Slow |
| |
| Only has an effect in a subset of test suites. Currently, those are suites that |
| use a heartbeat mechanism instead of a fixed timeout: |
| |
| * `webgpu_cts` |
| * `webgl1_conformance` |
| * `webgl2_conformance` |
| |
| Since these tests use a relatively short timeout that gets refreshed as long as |
| the test does not hang, they are more susceptible to timeouts if the test does a |
| lot of work or other parallel tests are using a large number of resources. In |
| these cases, the `Slow` expectation can be used to increase the heartbeat |
| timeout for a test, reducing the chance that one of these timeouts is hit. |
| |
| If the reported failure for a test is along the lines of "Timed out waiting for |
| websocket message", prefer to use a `Slow` expectation first over a `Failure` or |
| `RetryOnFailure` one. |
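
A `Slow` expectation looks like any other; only the expected result differs
(the bug, tags, and test name are illustrative):

```
crbug.com/1234 [ win intel ] foo/test [ Slow ]
```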
| |
| ## Extensions |
| |
| In addition to the normal expectation functionality, Chromium has several |
| extensions to the expectation file format. |
| |
| ### Unexpected Pass Finder Annotations |
| |
| Chromium has several unexpected pass finder scripts (sometimes called stale |
| expectation removers) to automatically reclaim test coverage by modifying |
| expectation files. These mostly work as intended, but can occasionally make |
| changes that don't align with what we actually want. Thus, there are several |
| annotations that can be inserted into expectation files to adjust the behavior |
| of these scripts. |
| |
| #### Disable |
| |
| There are several annotations that can be used to prevent the scripts from |
| automatically removing expectations. All of these start with `finder:disable` |
| with some suffix. |
| |
| `finder:disable-general` prevents the expectation from being removed under any |
| circumstances. |
| |
| `finder:disable-stale` prevents the expectation from being removed if it is |
| still applicable to at least one bot, but all queried results point to the |
| expectation no longer being needed. This is most likely to be used for |
| expectations for very infrequent flakes, where the flake might not occur within |
| the data range that we query. |
| |
| `finder:disable-unused` prevents the expectation from being removed if it is |
| found to not be used on any bots, i.e. the specified configuration does not |
| appear to actually be tested. This is most likely to be used for expectations |
| for failures reported by third parties with their own testing configurations. |
| |
| `finder:disable-narrowing` prevents the expectation from having its scope |
| automatically narrowed to only apply to configurations that are found to need |
| it. This is most likely to be used for expectations that are intentionally |
| broad to prevent failures that aren't planned on being fixed. |
| |
| All of these annotations can either be used inline for a single expectation: |
| |
| ``` |
| [ mac intel ] foo/test [ Failure ] # finder:disable-general |
| ``` |
| |
| or with their `finder:enable` equivalent for blocks: |
| |
| ``` |
| # finder:disable-general |
| [ mac intel ] foo/test [ Failure ] |
[ mac intel ] bar/test [ Failure ]
| # finder:enable-general |
| ``` |
| |
Nested blocks are not allowed. The `finder:disable` annotations can be followed
by a description of why the disable is necessary, which the script will output
when it encounters a case where one of the disabled expectations would have
been removed had the annotation not been present:
| |
| ``` |
| # finder:disable-stale Very low flake rate |
| [ mac intel ] foo/test [ Failure ] |
| [ mac intel ] bar/test [ Failure ] |
| # finder:enable-stale |
| ``` |
| |
| #### Group Start/End |
| |
There may be cases where groups of expectations should only be removed together,
e.g. if a flake affects a large number of tests but the chance of any individual
test hitting the flake is low. In these cases, the expectations can be grouped
together so that an expectation is only removed if every expectation in the
group would be removed.
| |
| ``` |
| # finder:group-start Some group description or name |
| [ mac intel ] foo/test [ Failure ] |
| [ mac intel ] bar/test [ Failure ] |
| # finder:group-end |
| ``` |
| |
| The group name/description is required and is used to uniquely identify each |
| group. This means that groups with the same name string in different parts of |
| the file will be treated as the same group, as if they were all in a single |
| group block together. |
| |
| ``` |
| # finder:group-start group_name |
| [ mac ] foo/test [ Failure ] |
| [ mac ] bar/test [ Failure ] |
| # finder:group-end |
| |
| ... |
| |
| # finder:group-start group_name |
| [ android ] foo/test [ Failure ] |
| [ android ] bar/test [ Failure ] |
| # finder:group-end |
| ``` |
| |
| is equivalent to |
| |
| ``` |
| # finder:group-start group_name |
| [ mac ] foo/test [ Failure ] |
| [ mac ] bar/test [ Failure ] |
| [ android ] foo/test [ Failure ] |
| [ android ] bar/test [ Failure ] |
| # finder:group-end |
| ``` |