# GPU Testing

This set of pages documents the setup and operation of the GPU bots and try
servers, which verify the correctness of Chrome's graphically accelerated
rendering pipeline.

[TOC]

## Overview

The GPU bots run a different set of tests than the majority of the Chromium
test machines. The GPU bots specifically focus on tests which exercise the
graphics processor, and whose results are likely to vary between graphics card
vendors.

Most of the tests on the GPU bots are run via the [Telemetry framework].
Telemetry was originally conceived as a performance testing framework, but has
proven valuable for correctness testing as well. Telemetry directs the browser
to perform various operations, like page navigation and test execution, from
external scripts written in Python. The GPU bots launch the full Chromium
browser via Telemetry for the majority of the tests. Using the full browser to
execute tests, rather than smaller test harnesses, has yielded several
advantages: testing what is shipped, improved reliability, and improved
performance.

[Telemetry framework]: https://github.com/catapult-project/catapult/tree/master/telemetry

A subset of the tests, called "pixel tests", grab screen snapshots of the web
page in order to validate Chromium's rendering architecture end-to-end. Where
necessary, GPU-specific results are maintained for these tests. Some of these
tests verify just a few pixels, using handwritten code, in order to use the
same validation for all brands of GPUs.

The GPU bots use the Chrome infrastructure team's [recipe framework], and
specifically the [`chromium`][recipes/chromium] and
[`chromium_trybot`][recipes/chromium_trybot] recipes, to describe what tests to
execute. Compared to the legacy master-side buildbot scripts, recipes make it
easy to add new steps to the bots, change the bots' configuration, and run the
tests locally in the same way that they are run on the bots. Additionally, the
`chromium` and `chromium_trybot` recipes make it possible to send try jobs which
add new steps to the bots. This single capability is a huge step forward from
the previous configuration, where new steps were added blindly and could cause
failures on the tryservers. For more details about the configuration of the
bots, see the [GPU bot details].

[recipe framework]: https://chromium.googlesource.com/external/github.com/luci/recipes-py/+/main/doc/user_guide.md
[recipes/chromium]: https://chromium.googlesource.com/chromium/tools/build/+/main/scripts/slave/recipes/chromium.py
[recipes/chromium_trybot]: https://chromium.googlesource.com/chromium/tools/build/+/main/scripts/slave/recipes/chromium_trybot.py
[GPU bot details]: gpu_testing_bot_details.md

The physical hardware for the GPU bots lives in the Swarming pool\*. The
Swarming infrastructure ([new docs][new-testing-infra], [older but currently
more complete docs][isolated-testing-infra]) provides many benefits:

* Increased parallelism for the tests; all steps for a given tryjob or
  waterfall build run in parallel.
* Simpler scaling: just add more hardware in order to get more capacity. No
  manual configuration or distribution of hardware needed.
* Easier to run certain tests only on certain operating systems or types of
  GPUs.
* Easier to add new operating systems or types of GPUs.
* Clearer description of the binary and data dependencies of the tests. If
  they run successfully locally, they'll run successfully on the bots.

(\* All but a few one-off GPU bots are in the Swarming pool. The exceptions to
the rule are described in the [GPU bot details].)

The bots on the [chromium.gpu.fyi] waterfall are configured to always test
top-of-tree ANGLE. This setup is done with a few lines of code in the
[tools/build workspace]; search the code for "angle".

These aspects of the bots are described in more detail below, and in linked
pages. There is a [presentation][bots-presentation] which gives a brief
overview of this documentation and links back to various portions.

<!-- XXX: broken link -->
[new-testing-infra]: https://github.com/luci/luci-py/wiki
[isolated-testing-infra]: https://www.chromium.org/developers/testing/isolated-testing/infrastructure
[chromium.gpu]: https://ci.chromium.org/p/chromium/g/chromium.gpu/console
[chromium.gpu.fyi]: https://ci.chromium.org/p/chromium/g/chromium.gpu.fyi/console
[tools/build workspace]: https://source.chromium.org/chromium/chromium/tools/build/+/HEAD:recipes/recipe_modules/chromium_tests/builders/chromium_gpu_fyi.py
[bots-presentation]: https://docs.google.com/presentation/d/1BC6T7pndSqPFnituR7ceG7fMY7WaGqYHhx5i9ECa8EI/edit?usp=sharing

## Fleet Status

Please see the [GPU Pixel Wrangling instructions] for links to dashboards
showing the status of various bots in the GPU fleet.

[GPU Pixel Wrangling instructions]: http://go/gpu-pixel-wrangler#fleet-status

## Using the GPU Bots

Most Chromium developers interact with the GPU bots in two ways:

1. Observing the bots on the waterfalls.
2. Sending try jobs to them.

The GPU bots are grouped on the [chromium.gpu] and [chromium.gpu.fyi]
waterfalls. Their current status can be easily observed there.

To send try jobs, you must first upload your CL to the codereview server. Then
either click the "CQ dry run" link or run the following from the command line:

```sh
git cl try
```

This sends your job to the default set of try servers.

The GPU tests are part of the default set for Chromium CLs, and are run as part
of the following tryservers' jobs:

* [linux-rel], formerly on the `tryserver.chromium.linux` waterfall
* [mac-rel], formerly on the `tryserver.chromium.mac` waterfall
* [win-rel], formerly on the `tryserver.chromium.win` waterfall

[linux-rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux-rel?limit=100
[mac-rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac-rel?limit=100
[win-rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win-rel?limit=100

Scan down through the steps looking for the text "GPU"; that identifies those
tests run on the GPU bots. For each test the "trigger" step can be ignored; the
step further down for the test of the same name contains the results.

It's usually not necessary to explicitly send try jobs just for verifying GPU
tests. If you want to, you must invoke "git cl try" separately for each
tryserver you want to reference, for example:

```sh
git cl try -b linux-rel
git cl try -b mac-rel
git cl try -b win-rel
```

Alternatively, the Gerrit UI can be used to send a patch set to these try
servers.

Three optional tryservers are also available which run additional tests. As of
this writing, they run longer-running tests that can't run against all Chromium
CLs due to lack of hardware capacity. They are added as part of the included
tryservers for code changes to certain sub-directories.

* [linux_optional_gpu_tests_rel] on the [luci.chromium.try] waterfall
* [mac_optional_gpu_tests_rel] on the [luci.chromium.try] waterfall
* [win_optional_gpu_tests_rel] on the [luci.chromium.try] waterfall

[linux_optional_gpu_tests_rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_optional_gpu_tests_rel
[mac_optional_gpu_tests_rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_optional_gpu_tests_rel
[win_optional_gpu_tests_rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win_optional_gpu_tests_rel
[luci.chromium.try]: https://ci.chromium.org/p/chromium/g/luci.chromium.try/builders

Tryservers for the [ANGLE project] are also present on the
[tryserver.chromium.angle] waterfall. These are invoked from the Gerrit user
interface. They are configured similarly to the tryservers for regular Chromium
patches, and run the same tests that are run on the [chromium.gpu.fyi]
waterfall, in the same way (e.g., against ToT ANGLE).

If you find it necessary to try patches against sub-repositories other than
Chromium (`src/`) and ANGLE (`src/third_party/angle/`), please
[file a bug](http://crbug.com/new) with component Internals\>GPU\>Testing.

[ANGLE project]: https://chromium.googlesource.com/angle/angle/+/main/README.md
[tryserver.chromium.angle]: https://build.chromium.org/p/tryserver.chromium.angle/waterfall
[file a bug]: http://crbug.com/new

## Running the GPU Tests Locally

All of the GPU tests running on the bots can be run locally from a Chromium
build. Many of the tests are simple executables:

* `angle_unittests`
* `gl_tests`
* `gl_unittests`
* `tab_capture_end2end_tests`

Some run only on the chromium.gpu.fyi waterfall, either because there isn't
enough machine capacity at the moment, or because they're closed-source tests
which aren't allowed to run on the regular Chromium waterfalls:

* `angle_deqp_gles2_tests`
* `angle_deqp_gles3_tests`
* `angle_end2end_tests`
* `audio_unittests`

The remaining GPU tests are run via Telemetry. In order to run them, just
build the `telemetry_gpu_integration_test` target (or
`telemetry_gpu_integration_test_android_chrome` for Android) and then
invoke `src/content/test/gpu/run_gpu_integration_test.py` with the appropriate
argument. The tests this script can invoke are
in `src/content/test/gpu/gpu_tests/`. For example:

* `run_gpu_integration_test.py context_lost --browser=release`
* `run_gpu_integration_test.py webgl_conformance --browser=release --webgl-conformance-version=1.0.2`
* `run_gpu_integration_test.py maps --browser=release`
* `run_gpu_integration_test.py screenshot_sync --browser=release`
* `run_gpu_integration_test.py trace_test --browser=release`

The pixel tests are a bit special. See
[the section on running them locally](#Running-the-pixel-tests-locally) for
details.

The `--browser=release` argument can be changed to `--browser=debug` if you
built in a directory such as `out/Debug`. If you built in some non-standard
directory such as `out/my_special_gn_config`, you can instead specify
`--browser=exact --browser-executable=out/my_special_gn_config/chrome`.
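
If it helps to see that mapping spelled out, here is a small shell sketch that
picks the browser flags from the output directory name. The directory names are
just examples, not a list of supported configurations:

```sh
# Sketch: choose --browser flags for run_gpu_integration_test.py based on
# the GN output directory you built into.
outdir="out/my_special_gn_config"  # example non-standard build directory
case "$outdir" in
  out/Release) browser_flags="--browser=release" ;;
  out/Debug)   browser_flags="--browser=debug" ;;
  *)           browser_flags="--browser=exact --browser-executable=$outdir/chrome" ;;
esac
echo "run_gpu_integration_test.py pixel $browser_flags"
```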

If you're testing on Android, use `--browser=android-chromium` instead of
`--browser=release` or `--browser=debug`. Additionally, Telemetry will likely
complain about being unable to find the browser binary on Android if you build
in a non-standard output directory, so `out/Release` or `out/Debug` are
suggested when testing on Android.

**Note:** The tests require some third-party Python packages. Obtaining these
packages is handled automatically by `vpython3`, and the script's shebang uses
vpython, so the packages are picked up if you run the script directly. If
you're used to invoking `python3` to run a script, simply use `vpython3`
instead, e.g. `vpython3 run_gpu_integration_test.py ...`.

You can run a subset of tests with this harness:

* `run_gpu_integration_test.py webgl_conformance --browser=release
  --test-filter=conformance_attribs`

The exact command used to invoke the test on the bots can be found in one of
two ways:

1. Looking at the [json.input][trigger_input] of the trigger step under
   `requests[task_slices][command]`. The arguments after the last `--` are
   used to actually run the test.
1. Looking at the top of a [swarming task][sample_swarming_task].

In both cases, the following can be omitted when running locally since they're
only necessary on swarming:

* `testing/test_env.py`
* `testing/scripts/run_gpu_integration_test_as_googletest.py`
* `--isolated-script-test-output`
* `--isolated-script-test-perf-output`
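
As an illustration, trimming a copied bot command line down to its local form
can be done mechanically. The command line below is made up for the example;
substitute the one from your own trigger step:

```sh
# Strip the swarming-only wrapper entries from a copied bot command line.
# bot_cmd below is a made-up example, not any particular bot's arguments.
bot_cmd="testing/test_env.py testing/scripts/run_gpu_integration_test_as_googletest.py \
content/test/gpu/run_gpu_integration_test.py webgl_conformance --browser=release \
--isolated-script-test-output=output.json"

local_cmd=""
for arg in $bot_cmd; do
  case "$arg" in
    testing/test_env.py) ;;                                          # swarming-only wrapper
    testing/scripts/run_gpu_integration_test_as_googletest.py) ;;    # swarming-only wrapper
    --isolated-script-test-output=*) ;;                              # swarming-only flag
    --isolated-script-test-perf-output=*) ;;                         # swarming-only flag
    *) local_cmd="$local_cmd $arg" ;;
  esac
done
echo "vpython3$local_cmd"
```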
Kai Ninomiyaa6429fb32018-03-30 01:30:56238
Kai Ninomiyaa6429fb32018-03-30 01:30:56239
Brian Sheedy15587f72021-04-16 19:56:06240[trigger_input]: https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8849851608240828544/+/u/test_pre_run__14_/l_trigger__webgl2_conformance_d3d11_passthrough_tests_on_NVIDIA_GPU_on_Windows_on_Windows-10-18363/json.input
241[sample_swarming_task]: https://chromium-swarm.appspot.com/task?id=52f06058bfb31b10
Kai Ninomiyaa6429fb32018-03-30 01:30:56242
243The Maps test requires you to authenticate to cloud storage in order to access
244the Web Page Reply archive containing the test. See [Cloud Storage Credentials]
245for documentation on setting this up.
246
247[Cloud Storage Credentials]: gpu_testing_bot_details.md#Cloud-storage-credentials
248
### Bisecting ChromeOS Failures Locally

Failures that occur on the ChromeOS amd64-generic configuration are easy to
reproduce due to the VM being readily available for use, but doing so requires
some additional steps in the bisect process. The following steps can be
followed using two terminals and the [Simple Chrome SDK] to bisect a ChromeOS
failure.

1. Terminal 1: Start the bisect as normal: `git bisect start`,
   `git bisect good <good_revision>`, `git bisect bad <bad_revision>`
1. Terminal 1: Sync to the revision that git spits out:
   `gclient sync -r src@<revision>`
1. Terminal 2: Enter the Simple Chrome SDK:
   `cros chrome-sdk --board amd64-generic-vm --log-level info --download-vm --clear-sdk-cache`
1. Terminal 2: Compile the relevant target (probably the GPU integration tests):
   `autoninja -C out_amd64-generic-vm/Release/ telemetry_gpu_integration_test`
1. Terminal 2: Start the VM: `cros_vm --start`
1. Terminal 2: Deploy the Chrome binary to the VM:
   `deploy_chrome --build-dir out_amd64-generic-vm/Release/ --device 127.0.0.1:9222`
   This will require you to accept a prompt twice, once because of a board
   mismatch and once because the VM still has rootfs verification enabled.
1. Terminal 1: Run your test on the VM. For GPU integration tests, this involves
   specifying `--browser cros-chrome --remote 127.0.0.1 --remote-ssh-port 9222`
1. Terminal 2: After determining whether the revision is good or bad, shut down
   the VM: `cros_vm --stop`
1. Terminal 2: Exit the SDK: `exit`
1. Terminal 1: Let git know whether the revision was good or bad:
   `git bisect good`/`git bisect bad`
1. Repeat from step 2 with the new revision git spits out.

The repeated entry into and exit from the SDK between revisions ensures that the
VM image is in sync with the Chromium revision, as it is possible for
regressions to be caused by an update to the image itself rather than a Chromium
change.

[Simple Chrome SDK]: https://chromium.googlesource.com/chromiumos/docs/+/HEAD/simple_chrome_workflow.md
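
One iteration of the loop above can be sketched as a shell function. This is
only an outline: in practice the build, VM, and deploy commands run inside the
Simple Chrome SDK shell in the second terminal, and the `run` wrapper here
defaults to printing each command rather than executing it:

```sh
# Sketch of one ChromeOS bisect iteration. DRY_RUN=1 (the default) only
# prints each command, so the control flow can be inspected safely.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

bisect_iteration() {
  rev="$1"
  run gclient sync -r "src@$rev"
  # The following commands must run from inside the Simple Chrome SDK shell.
  run autoninja -C out_amd64-generic-vm/Release/ telemetry_gpu_integration_test
  run cros_vm --start
  run deploy_chrome --build-dir out_amd64-generic-vm/Release/ --device 127.0.0.1:9222
  run vpython3 content/test/gpu/run_gpu_integration_test.py pixel \
    --browser cros-chrome --remote 127.0.0.1 --remote-ssh-port 9222
  run cros_vm --stop
}

bisect_iteration aabbccdd  # then: git bisect good / git bisect bad
```

Only set `DRY_RUN=0` from within the SDK shell with the VM image downloaded.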

### Telemetry Test Suites

The Telemetry-based tests are all technically the same target,
`telemetry_gpu_integration_test`, just run with different runtime arguments. The
first positional argument passed determines which suite will run, and additional
runtime arguments may cause the step name to change on the bots. Here is a list
of all suites and resulting step names as of April 15th, 2021:

* `context_lost`
    * `context_lost_passthrough_tests`
    * `context_lost_tests`
    * `context_lost_validating_tests`
* `hardware_accelerated_feature`
    * `hardware_accelerated_feature_tests`
* `gpu_process`
    * `gpu_process_launch_tests`
* `info_collection`
    * `info_collection_tests`
* `maps`
    * `maps_pixel_passthrough_test`
    * `maps_pixel_test`
    * `maps_pixel_validating_test`
    * `maps_tests`
* `pixel`
    * `android_webview_pixel_skia_gold_test`
    * `egl_pixel_skia_gold_test`
    * `pixel_skia_gold_passthrough_test`
    * `pixel_skia_gold_validating_test`
    * `pixel_tests`
    * `vulkan_pixel_skia_gold_test`
* `power`
    * `power_measurement_test`
* `screenshot_sync`
    * `screenshot_sync_passthrough_tests`
    * `screenshot_sync_tests`
    * `screenshot_sync_validating_tests`
* `trace_test`
    * `trace_test`
* `webgl_conformance`
    * `webgl2_conformance_d3d11_passthrough_tests`
    * `webgl2_conformance_gl_passthrough_tests`
    * `webgl2_conformance_gles_passthrough_tests`
    * `webgl2_conformance_tests`
    * `webgl2_conformance_validating_tests`
    * `webgl_conformance_d3d11_passthrough_tests`
    * `webgl_conformance_d3d9_passthrough_tests`
    * `webgl_conformance_fast_call_tests`
    * `webgl_conformance_gl_passthrough_tests`
    * `webgl_conformance_gles_passthrough_tests`
    * `webgl_conformance_metal_passthrough_tests`
    * `webgl_conformance_swangle_passthrough_tests`
    * `webgl_conformance_tests`
    * `webgl_conformance_validating_tests`
    * `webgl_conformance_vulkan_passthrough_tests`

### Running the pixel tests locally

The pixel tests are a special case because they use an external Skia service
called Gold to handle image approval and storage. See
[GPU Pixel Testing With Gold] for specifics.

[GPU Pixel Testing With Gold]: gpu_pixel_testing_with_gold.md

The TL;DR is that the pixel tests use a binary called `goldctl` to download and
upload data when running pixel tests.

Normally, `goldctl` uploads images and image metadata to the Gold server when
used. This is not desirable when running locally, for a couple of reasons:

1. Uploading requires the user to be whitelisted on the server, and whitelisting
everyone who wants to run the tests locally is not a viable solution.
2. Images produced during local runs are usually slightly different from those
that are produced on the bots due to hardware/software differences. Thus, most
images uploaded to Gold from local runs would likely only ever actually be used
by tests run on the machine that initially generated those images, which just
adds noise to the list of approved images.

Additionally, the tests normally rely on the Gold server for viewing images
produced by a test run. This does not work if the data is not actually uploaded.

The pixel tests contain logic to automatically determine whether they are
running on a workstation or not, as well as to determine what git revision is
being tested. This *should* mean that the pixel tests automatically work
when run locally. However, if the local run detection code fails for some
reason, you can manually pass some flags to force the same behavior.

To get around the local run issues, simply pass the `--local-pixel-tests` flag
to the tests. This will disable uploading, but otherwise go through the same
steps as a test normally would. Each test will also print out `file://` URLs to
the produced image, the closest image for the test known to Gold, and the diff
between the two.

Because the image produced by the test locally is likely slightly different from
any of the approved images in Gold, local test runs are likely to fail during
the comparison step. In order to cut down on the amount of noise, you can also
pass the `--no-skia-gold-failure` flag to not fail the test on a failed image
comparison. When using `--no-skia-gold-failure`, you'll also need to pass the
`--passthrough` flag in order to actually see the link output.

Example usage:
`run_gpu_integration_test.py pixel --no-skia-gold-failure --local-pixel-tests
--passthrough`

If, for some reason, the local run code is unable to determine what the git
revision is, simply pass `--git-revision aabbccdd`. Note that `aabbccdd` must
be replaced with an actual Chromium src revision (typically whatever revision
origin/main is currently synced to) in order for the tests to work. This can
be done automatically using:
``run_gpu_integration_test.py pixel --no-skia-gold-failure --local-pixel-tests
--passthrough --git-revision `git rev-parse origin/main` ``

## Running Binaries from the Bots Locally

Any binary run remotely on a bot can also be run locally, assuming the local
machine loosely matches the architecture and OS of the bot.

The easiest way to do this is to find the ID of the swarming task and use
`swarming reproduce` to re-run it:

* `./src/tools/luci-go/swarming reproduce -S https://chromium-swarm.appspot.com [task ID]`

The task ID can be found in the stdio for the "trigger" step for the test. For
example, look at a recent build from the [Mac Release (Intel)] bot, and
look at the `gl_unittests` step. You will see something like:

[Mac Release (Intel)]: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20Release%20%28Intel%29/

```
Triggered task: gl_unittests on Intel GPU on Mac/Mac-10.12.6/[TRUNCATED_ISOLATE_HASH]/Mac Release (Intel)/83664
To collect results, use:
  swarming.py collect -S https://chromium-swarm.appspot.com --json /var/folders/[PATH_TO_TEMP_FILE].json
Or visit:
  https://chromium-swarm.appspot.com/user/task/[TASK_ID]
```

There is a difference between the isolate's hash and Swarming's task ID. Make
sure you use the task ID and not the isolate's hash.
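
If you are scripting this, the task ID can be pulled out of the trigger step's
stdout with standard tools, since it is the last path component of the
chromium-swarm task URL. A sketch over made-up stdout (the task ID shown is not
real):

```sh
# Hypothetical trigger-step stdout; both the hash and the task ID are made up.
stdout='Triggered task: gl_unittests on Intel GPU on Mac/Mac-10.12.6/aaaa/Mac Release (Intel)/83664
Or visit:
  https://chromium-swarm.appspot.com/user/task/3f899015f1ace410'

# Keep only the line containing the task URL, stripped down to the ID.
task_id=$(printf '%s\n' "$stdout" | sed -n 's#.*/user/task/##p')
echo "$task_id"
```

The extracted ID can then be passed to the `swarming reproduce` command above.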

As of this writing, there seems to be a
[bug](https://github.com/luci/luci-py/issues/250)
when attempting to re-run the Telemetry based GPU tests in this way. For the
time being, this can be worked around by instead downloading the contents of
the isolate. To do so, look into the "Reproducing the task locally" section on
a swarming task, which contains something like:

```
Download inputs files into directory foo:
# (if needed, use "\${platform}" as-is) cipd install "infra/tools/luci/cas/\${platform}" -root bar
# (if needed) ./bar/cas login
./bar/cas download -cas-instance projects/chromium-swarm/instances/default_instance -digest 68ae1d6b22673b0ab7b4427ca1fc2a4761c9ee53474105b9076a23a67e97a18a/647 -dir foo
```

Before attempting to download an isolate, you must ensure you have permission
to access the isolate server. Full instructions can be [found
here][isolate-server-credentials]. For most cases, you can simply run:

* `./src/tools/luci-go/isolate login`

The above link requires that you log in with your @google.com credentials. It's
not known at the present time whether this works with @chromium.org accounts.
Email kbr@ if you try this and find it doesn't work.

[isolate-server-credentials]: gpu_testing_bot_details.md#Isolate-server-credentials

## Debugging a Specific Subset of Tests on a Specific GPU Bot

When a test exhibits flake on the bots, it can be convenient to run it
repeatedly with local code modifications on the bot where it is exhibiting
flake. One way of doing this is via swarming (see the section below). However, a
lower-overhead alternative, which also works in the case where you are looking
to run on a bot for which you cannot build locally, is to locally alter the
configuration of the bot in question to specify that it should run only the
tests desired, repeating as many times as desired. Instructions for doing this
are as follows (see the [example CL] for a concrete instantiation of these
instructions):

1. In testsuite_exceptions.pyl, find the section for the test suite in question
   (creating it if it doesn't exist).
2. Add modifications for the bot in question and specify arguments such that
   your desired tests are run for the desired number of iterations.
3. Run testing/buildbot/generate_buildbot_json.py and verify that the JSON file
   for the bot in question was modified as you would expect.
4. Upload and run tryjobs on that specific bot via "Choose Tryjobs."
5. Examine the test results. (You can verify that the tests run were as you
   expected by examining the test results for individual shards of the run
   of the test suite in question.)
6. Add logging/code modifications/etc. as desired and go back to step 4,
   repeating the process until you've uncovered the underlying issue.
7. Remove the changes to testsuite_exceptions.pyl and the JSON file if
   turning the CL into one intended for submission!

Here is an [example CL] that does this.

[example CL]: https://chromium-review.googlesource.com/c/chromium/src/+/3898592/4
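
For steps 1 and 2, a modification entry might look roughly like the following.
This fragment is hypothetical: the suite name, builder name, flags, and exact
schema should be checked against the real testsuite_exceptions.pyl and the
[example CL] rather than copied from here.

```
'pixel_skia_gold_test': {
  'modifications': {
    'Win10 x64 Release (NVIDIA)': {
      'args': [
        '--test-filter=Pixel_Video*',
        '--repeat=20',
      ],
    },
  },
},
```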

## Running Locally Built Binaries on the GPU Bots

See the [Swarming documentation] for instructions on how to upload your
binaries to the isolate server and trigger execution on Swarming.

Be sure to use the correct swarming dimensions for your desired GPU, e.g.
"1002:6613" instead of "AMD Radeon R7 240 (1002:6613)", which is how it appears
on the swarming task page. You can query bots in the chromium.tests.gpu pool to
find the correct dimensions:

* `./src/tools/luci-go/swarming bots -S chromium-swarm.appspot.com -d pool=chromium.tests.gpu`

[Swarming documentation]: https://www.chromium.org/developers/testing/isolated-testing/for-swes#TOC-Run-a-test-built-locally-on-Swarming

Kenneth Russell42732952018-06-27 02:08:42490## Moving Test Binaries from Machine to Machine
491
492To create a zip archive of your personal Chromium build plus all of
493the Telemetry-based GPU tests' dependencies, which you can then move
494to another machine for testing:
495
4961. Build Chrome (into `out/Release` in this example).
Fabrice de Gans7820a772022-09-16 00:10:304971. `vpython3 tools/mb/mb.py zip out/Release/ telemetry_gpu_integration_test out/telemetry_gpu_integration_test.zip`
Kenneth Russell42732952018-06-27 02:08:42498
499Then copy telemetry_gpu_integration_test.zip to another machine. Unzip
500it, and cd into the resulting directory. Invoke
501`content/test/gpu/run_gpu_integration_test.py` as above.
502
503This workflow has been tested successfully on Windows with a
504statically-linked Release build of Chrome.
505
506Note: on one macOS machine, this command failed because of a broken
507`strip-json-comments` symlink in
508`src/third_party/catapult/common/node_runner/node_runner/node_modules/.bin`. Deleting
509that symlink allowed it to proceed.
510
511Note also: on the same macOS machine, with a component build, this
512command failed to zip up a working Chromium binary. The browser failed
513to start with the following error:
514
515`[0626/180440.571670:FATAL:chrome_main_delegate.cc(1057)] Check failed: service_manifest_data_pack_.`

In a pinch, this command can still be used to bundle up everything; then
delete the `out` directory from the resulting zip archive and copy the
Chromium binaries over to the target machine separately. The command line
arguments `--browser=exact --browser-executable=[path]` can then be used to
launch that specific browser.

See the [user guide for mb](../../tools/mb/docs/user_guide.md#mb-zip), the
meta-build system, for more details.

## Adding New Tests to the GPU Bots

The goal of the GPU bots is to avoid regressions in Chrome's rendering stack.
To that end, let's add as many tests as possible that will help catch
regressions in the product. If you see a crazy bug in Chrome's rendering which
would be easy to catch with a pixel test running in Chrome and hard to catch in
any of the other test harnesses, please, invest the time to add a test!

There are a couple of different ways to add new tests to the bots:

1. Adding a new test to one of the existing harnesses.
2. Adding an entire new test step to the bots.

### Adding a new test to one of the existing test harnesses

Adding new tests to the GTest-based harnesses is straightforward and
essentially requires no explanation.

As of this writing it isn't as easy as desired to add a new test to one of the
Telemetry based harnesses. See [Issue 352807](http://crbug.com/352807). Let's
collectively work to address that issue. It would be great to reduce the number
of steps on the GPU bots, or at least to avoid significantly increasing the
number of steps on the bots. The WebGL conformance tests should probably remain
a separate step, but some of the smaller Telemetry based tests
(`context_lost_tests`, `memory_test`, etc.) should probably be combined into a
single step.

If you are adding a new test to one of the existing tests (e.g., `pixel_test`),
all you need to do is make sure that your new test runs correctly via isolates.
See the documentation from the GPU bot details on [adding new isolated
tests][new-isolates] for the gn args and authentication needed to upload
isolates to the isolate server. Most likely the new test will be Telemetry
based, and included in the `telemetry_gpu_test_run` isolate.

[new-isolates]: gpu_testing_bot_details.md#Adding-a-new-isolated-test-to-the-bots

### Adding new steps to the GPU Bots

The tests that are run by the GPU bots are described by a couple of JSON files
in the Chromium workspace:

* [`chromium.gpu.json`](https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/chromium.gpu.json)
* [`chromium.gpu.fyi.json`](https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/chromium.gpu.fyi.json)

These files are autogenerated by the following script:

* [`generate_buildbot_json.py`](https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/generate_buildbot_json.py)

This script is documented in
[`testing/buildbot/README.md`](https://chromium.googlesource.com/chromium/src/+/main/testing/buildbot/README.md). The
JSON files are parsed by the chromium and chromium_trybot recipes, and describe
two basic types of tests:

* GTests: those which use the Googletest and Chromium's `base/test/launcher/`
  frameworks.
* Isolated scripts: tests whose initial entry point is a Python script which
  follows a simple convention of command line argument parsing.

The majority of the GPU tests, however, are:

* Telemetry based tests: an isolated script test which is built on the
  Telemetry framework and which launches the entire browser.

A prerequisite of adding a new test to the bots is that the test [run via
isolates][new-isolates]. Once that is done, modify `test_suites.pyl` to add the
test to the appropriate set of bots. Be careful when adding large new test steps
to all of the bots, because the GPU bots are a limited resource and do not
currently have the capacity to absorb large new test suites. It is safer to get
new tests running on the chromium.gpu.fyi waterfall first, and expand from there
to the chromium.gpu waterfall (which will also make them run against every
Chromium CL by virtue of the `linux-rel`, `mac-rel`, `win7-rel` and
`android-marshmallow-arm64-rel` tryservers' mirroring of the bots on this
waterfall – so be careful!).
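
As a rough illustration, a new suite entry in `test_suites.pyl` might look
something like the sketch below. The suite name, test name, and arguments here
are all hypothetical; copy the exact fields from a neighboring entry in the
file:

```
'hypothetical_new_gpu_telemetry_tests': {
  'hypothetical_new_gpu_integration_test': {
    # Hypothetical feature flag, for illustration only.
    'args': [
      '--extra-browser-args=--enable-features=HypotheticalGpuFeature',
    ],
  },
},
```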

Tryjobs which add new test steps to the chromium.gpu.json file will run those
new steps during the tryjob, which helps ensure that the new test won't break
once it starts running on the waterfall.

Tryjobs which modify chromium.gpu.fyi.json can be sent to the
`win_optional_gpu_tests_rel`, `mac_optional_gpu_tests_rel` and
`linux_optional_gpu_tests_rel` tryservers to help ensure that they won't
break the FYI bots.

## Debugging Pixel Test Failures on the GPU Bots

If pixel tests fail on the bots, the build step will contain either one or more
links titled `gold_triage_link for <test name>` or a single link titled
`Too many artifacts produced to link individually, click for links`, which
itself will contain links. In either case, these links will direct to Gold
pages showing the image produced by the test and the approved image that most
closely matches it.

Note that for the tests which programmatically check colors in certain regions of
the image (tests with `expected_colors` fields in [pixel_test_pages]), there
likely won't be a closest approved image since those tests only upload data to
Gold in the event of a failure.

[pixel_test_pages]: https://cs.chromium.org/chromium/src/content/test/gpu/gpu_tests/pixel_test_pages.py

## Updating and Adding New Pixel Tests to the GPU Bots

If your CL adds a new pixel test or modifies existing ones, it's likely that
you will have to approve new images. Simply run your CL through the CQ and
follow the steps outlined [here][pixel wrangling triage] under the "Check if any
pixel test failures are actual failures or need to be rebaselined." step.

[pixel wrangling triage]: http://go/gpu-pixel-wrangler-info#how-to-keep-the-bots-green

If you are adding a new pixel test, it is beneficial to set the
`grace_period_end` argument in the test's definition. This will allow the test
to run for a period without actually failing on the waterfall bots, giving you
some time to triage any additional images that show up on them. This helps
prevent new tests from making the bots red because they're producing slightly
different but valid images from the ones triaged while the CL was in review.
Example:

```
from datetime import date

...

PixelTestPage(
    'foo_pixel_test.html',
    ...
    grace_period_end=date(2020, 1, 1)
)
```

You should typically set the grace period to end 1-2 days after the CL
lands.
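
The value passed as `grace_period_end` is an ordinary `datetime.date`, so the
end of the grace period can be computed from the expected land date with
`datetime.timedelta`. A minimal sketch, using a made-up land date:

```python
from datetime import date, timedelta

# Hypothetical date on which the CL is expected to land.
expected_land_date = date(2020, 1, 1)

# End the grace period two days after the expected land date.
grace_period_end = expected_land_date + timedelta(days=2)
print(grace_period_end.isoformat())  # 2020-01-03
```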

Once your CL passes the CQ, you should be mostly good to go, although you should
keep an eye on the waterfall bots for a short period after your CL lands in case
any configurations not covered by the CQ need to have images approved, as well.
All untriaged images for your test can be found by substituting your test name
into:

`https://chrome-gpu-gold.skia.org/search?query=name%3D<test name>`
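
Because the test name is embedded in a URL query parameter, it must be
percent-encoded (`name%3D` is the encoded form of `name=`). A small sketch of
building such a link, using a hypothetical test name:

```python
import urllib.parse

def gold_triage_url(test_name):
    # Percent-encode 'name=<test name>' so it is safe to embed in the
    # ?query= parameter of the Gold search URL.
    query = urllib.parse.quote('name=' + test_name)
    return 'https://chrome-gpu-gold.skia.org/search?query=' + query

print(gold_triage_url('foo_pixel_test'))
# https://chrome-gpu-gold.skia.org/search?query=name%3Dfoo_pixel_test
```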

**NOTE** If you have a grace period active for your test, then Gold will be told
to ignore results for the test. This is so that it does not comment on unrelated
CLs about untriaged images if your test is noisy. Images will still be uploaded
to Gold and can be triaged, but will not show up on the main page's untriaged
image list, and you will need to enable the "Ignored" toggle at the top of the
page when looking at the triage page specific to your test.

## Stamping out Flakiness

It's critically important to aggressively investigate and eliminate the root
cause of any flakiness seen on the GPU bots. The bots have been known to run
reliably for days at a time, and any flaky failures that are tolerated on the
bots translate directly into instability of the browser experienced by
customers. Critical bugs in subsystems like WebGL, affecting high-profile
products like Google Maps, have escaped notice in the past because the bots
were unreliable. After much re-work, the GPU bots are now among the most
reliable automated test machines in the Chromium project. Let's keep them that
way.

Flakiness affecting the GPU tests can come in from highly unexpected sources.
Here are some examples:

* Intermittent pixel_test failures on Linux where the captured pixels were
  black, caused by the Display Power Management System (DPMS) kicking in.
  Disabled the X server's built-in screen saver on the GPU bots in response.
* GNOME dbus-related deadlocks causing intermittent timeouts ([Issue
  309093](http://crbug.com/309093) and related bugs).
* Windows Audio system changes causing intermittent assertion failures in the
  browser ([Issue 310838](http://crbug.com/310838)).
* Enabling assertion failures in the C++ standard library on Linux causing
  random assertion failures ([Issue 328249](http://crbug.com/328249)).
* V8 bugs causing random crashes of the Maps pixel test (V8 issues
  [3022](https://code.google.com/p/v8/issues/detail?id=3022),
  [3174](https://code.google.com/p/v8/issues/detail?id=3174)).
* TLS changes causing random browser process crashes ([Issue
  264406](http://crbug.com/264406)).
* Isolated test execution flakiness caused by failures to reliably clean up
  temporary directories ([Issue 340415](http://crbug.com/340415)).
* The Telemetry-based WebGL conformance suite caught a bug in the memory
  allocator on Android not caught by any other bot ([Issue
  347919](http://crbug.com/347919)).
* context_lost test failures caused by the compositor's retry logic ([Issue
  356453](http://crbug.com/356453)).
* Multiple bugs in Chromium's support for lost contexts causing flakiness of
  the context_lost tests ([Issue 365904](http://crbug.com/365904)).
* Maps test timeouts caused by Content Security Policy changes in Blink
  ([Issue 395914](http://crbug.com/395914)).
* Weak pointer assertion failures in various webgl\_conformance\_tests caused
  by changes to the media pipeline ([Issue 399417](http://crbug.com/399417)).
* A change to a default WebSocket timeout in Telemetry causing intermittent
  failures to run all WebGL conformance tests on the Mac bots ([Issue
  403981](http://crbug.com/403981)).
* Chrome leaking suspended sub-processes on Windows, apparently a preexisting
  race condition that suddenly showed up ([Issue
  424024](http://crbug.com/424024)).
* Changes to Chrome's cross-context synchronization primitives causing the
  wrong tiles to be rendered ([Issue 584381](http://crbug.com/584381)).
* A bug in V8's handling of array literals causing flaky failures of
  texture-related WebGL 2.0 tests ([Issue 606021](http://crbug.com/606021)).
* Assertion failures in sync point management related to lost contexts that
  exposed a real correctness bug ([Issue 606112](http://crbug.com/606112)).
* A bug in glibc's `sem_post`/`sem_wait` primitives breaking V8's parallel
  garbage collection ([Issue 609249](http://crbug.com/609249)).
* A change to Blink's memory purging primitive which caused intermittent
  timeouts of WebGL conformance tests on all platforms ([Issue
  840988](http://crbug.com/840988)).
* Screen DPI being inconsistent across seemingly identical Linux machines,
  causing the Maps pixel test to flakily produce incorrectly sized images
  ([Issue 1091410](https://crbug.com/1091410)).

If you notice flaky test failures either on the GPU waterfalls or try servers,
please file bugs right away with the component Internals>GPU>Testing and
include links to the failing builds and copies of the logs, since the logs
expire after a few days. [GPU pixel wranglers] should give the highest priority
to eliminating flakiness on the tree.

[GPU pixel wranglers]: http://go/gpu-pixel-wrangler