# GPU Testing

This set of pages documents the setup and operation of the GPU bots and try
servers, which verify the correctness of Chrome's graphically accelerated
rendering pipeline.

[TOC]

## Overview

The GPU bots run a different set of tests than the majority of the Chromium
test machines. The GPU bots specifically focus on tests which exercise the
graphics processor, and whose results are likely to vary between graphics card
vendors.

Most of the tests on the GPU bots are run via the [Telemetry framework].
Telemetry was originally conceived as a performance testing framework, but has
proven valuable for correctness testing as well. Telemetry directs the browser
to perform various operations, like page navigation and test execution, from
external scripts written in Python. The GPU bots launch the full Chromium
browser via Telemetry for the majority of the tests. Using the full browser to
execute tests, rather than smaller test harnesses, has yielded several
advantages: testing what is shipped, improved reliability, and improved
performance.

[Telemetry framework]: https://github.com/catapult-project/catapult/tree/master/telemetry
A subset of the tests, called "pixel tests", grab screen snapshots of the web
page in order to validate Chromium's rendering architecture end-to-end. Where
necessary, GPU-specific results are maintained for these tests. Some of these
tests verify just a few pixels, using handwritten code, in order to use the
same validation for all brands of GPUs.

The GPU bots use the Chrome infrastructure team's [recipe framework], and
specifically the [`chromium`][recipes/chromium] and
[`chromium_trybot`][recipes/chromium_trybot] recipes, to describe what tests to
execute. Compared to the legacy master-side buildbot scripts, recipes make it
easy to add new steps to the bots, change the bots' configuration, and run the
tests locally in the same way that they are run on the bots. Additionally, the
`chromium` and `chromium_trybot` recipes make it possible to send try jobs which
add new steps to the bots. This single capability is a huge step forward from
the previous configuration, where new steps were added blindly and could cause
failures on the tryservers. For more details about the configuration of the
bots, see the [GPU bot details].

[recipe framework]: https://chromium.googlesource.com/external/github.com/luci/recipes-py/+/master/doc/user_guide.md
[recipes/chromium]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium.py
[recipes/chromium_trybot]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium_trybot.py
[GPU bot details]: gpu_testing_bot_details.md

The physical hardware for the GPU bots lives in the Swarming pool\*. The
Swarming infrastructure ([new docs][new-testing-infra], [older but currently
more complete docs][isolated-testing-infra]) provides many benefits:

* Increased parallelism for the tests; all steps for a given tryjob or
  waterfall build run in parallel.
* Simpler scaling: just add more hardware in order to get more capacity. No
  manual configuration or distribution of hardware needed.
* Easier to run certain tests only on certain operating systems or types of
  GPUs.
* Easier to add new operating systems or types of GPUs.
* Clearer description of the binary and data dependencies of the tests. If
  they run successfully locally, they'll run successfully on the bots.

(\* All but a few one-off GPU bots are in the Swarming pool. The exceptions to
the rule are described in the [GPU bot details].)

The bots on the [chromium.gpu.fyi] waterfall are configured to always test
top-of-tree ANGLE. This setup is done with a few lines of code in the
[tools/build workspace]; search the code for "angle".

These aspects of the bots are described in more detail below, and in linked
pages. There is a [presentation][bots-presentation] which gives a brief
overview of this documentation and links back to various portions.

<!-- XXX: broken link -->
[new-testing-infra]: https://github.com/luci/luci-py/wiki
[isolated-testing-infra]: https://www.chromium.org/developers/testing/isolated-testing/infrastructure
[chromium.gpu]: https://ci.chromium.org/p/chromium/g/chromium.gpu/console
[chromium.gpu.fyi]: https://ci.chromium.org/p/chromium/g/chromium.gpu.fyi/console
[tools/build workspace]: https://source.chromium.org/chromium/chromium/tools/build/+/HEAD:recipes/recipe_modules/chromium_tests/builders/chromium_gpu_fyi.py
[bots-presentation]: https://docs.google.com/presentation/d/1BC6T7pndSqPFnituR7ceG7fMY7WaGqYHhx5i9ECa8EI/edit?usp=sharing
## Fleet Status

Please see the [GPU Pixel Wrangling instructions] for links to dashboards
showing the status of various bots in the GPU fleet.

[GPU Pixel Wrangling instructions]: pixel_wrangling.md#Fleet-Status

## Using the GPU Bots

Most Chromium developers interact with the GPU bots in two ways:

1. Observing the bots on the waterfalls.
2. Sending try jobs to them.

The GPU bots are grouped on the [chromium.gpu] and [chromium.gpu.fyi]
waterfalls. Their current status can be easily observed there.

To send try jobs, you must first upload your CL to the codereview server. Then
either click the "CQ dry run" link, or run the following from the command line:

```sh
git cl try
```

Either way sends your job to the default set of try servers.

The GPU tests are part of the default set for Chromium CLs, and are run as part
of the following tryservers' jobs:

* [linux-rel], formerly on the `tryserver.chromium.linux` waterfall
* [mac-rel], formerly on the `tryserver.chromium.mac` waterfall
* [win10_chromium_x64_rel_ng], formerly on the `tryserver.chromium.win` waterfall

[linux-rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux-rel?limit=100
[mac-rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac-rel?limit=100
[win10_chromium_x64_rel_ng]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win10_chromium_x64_rel_ng?limit=100

Scan down through the steps looking for the text "GPU"; that identifies the
tests run on the GPU bots. For each test, the "trigger" step can be ignored; the
step further down for the test of the same name contains the results.

It's usually not necessary to explicitly send try jobs just for verifying GPU
tests. If you want to, you must invoke "git cl try" separately for each
tryserver master you want to reference, for example:

```sh
git cl try -b linux-rel
git cl try -b mac-rel
git cl try -b win7-rel
```

Alternatively, the Gerrit UI can be used to send a patch set to these try
servers.
Three optional tryservers are also available which run additional tests. As of
this writing, they run longer-running tests that can't run against all Chromium
CLs due to lack of hardware capacity. They are added as part of the included
tryservers for code changes to certain sub-directories.

* [linux_optional_gpu_tests_rel] on the [luci.chromium.try] waterfall
* [mac_optional_gpu_tests_rel] on the [luci.chromium.try] waterfall
* [win_optional_gpu_tests_rel] on the [luci.chromium.try] waterfall

[linux_optional_gpu_tests_rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux_optional_gpu_tests_rel
[mac_optional_gpu_tests_rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac_optional_gpu_tests_rel
[win_optional_gpu_tests_rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win_optional_gpu_tests_rel
[luci.chromium.try]: https://ci.chromium.org/p/chromium/g/luci.chromium.try/builders
Tryservers for the [ANGLE project] are also present on the
[tryserver.chromium.angle] waterfall. These are invoked from the Gerrit user
interface. They are configured similarly to the tryservers for regular Chromium
patches, and run the same tests that are run on the [chromium.gpu.fyi]
waterfall, in the same way (e.g., against ToT ANGLE).

If you find it necessary to try patches against sub-repositories other than
Chromium (`src/`) and ANGLE (`src/third_party/angle/`), please
[file a bug](http://crbug.com/new) with component Internals\>GPU\>Testing.

[ANGLE project]: https://chromium.googlesource.com/angle/angle/+/master/README.md
[tryserver.chromium.angle]: https://build.chromium.org/p/tryserver.chromium.angle/waterfall
[file a bug]: http://crbug.com/new

## Running the GPU Tests Locally

All of the GPU tests running on the bots can be run locally from a Chromium
build. Many of the tests are simple executables:

* `angle_unittests`
* `gl_tests`
* `gl_unittests`
* `tab_capture_end2end_tests`

Some run only on the chromium.gpu.fyi waterfall, either because there isn't
enough machine capacity at the moment, or because they're closed-source tests
which aren't allowed to run on the regular Chromium waterfalls:

* `angle_deqp_gles2_tests`
* `angle_deqp_gles3_tests`
* `angle_end2end_tests`
* `audio_unittests`

The remaining GPU tests are run via Telemetry. To run them, build the `chrome`
target and then invoke `src/content/test/gpu/run_gpu_integration_test.py` with
the appropriate argument. The tests this script can invoke are in
`src/content/test/gpu/gpu_tests/`. For example:

* `run_gpu_integration_test.py context_lost --browser=release`
* `run_gpu_integration_test.py webgl_conformance --browser=release --webgl-conformance-version=1.0.2`
* `run_gpu_integration_test.py maps --browser=release`
* `run_gpu_integration_test.py screenshot_sync --browser=release`
* `run_gpu_integration_test.py trace_test --browser=release`

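If you run these invocations repeatedly, they are easy to script. A minimal
sketch; the `gpu_test_cmd` helper is purely illustrative (not part of the
harness) and assumes the standard checkout layout:

```python
# Illustrative helper: build the argument list for one Telemetry-based GPU
# test, mirroring the example invocations above.
def gpu_test_cmd(suite, browser="release", extra_args=()):
    cmd = ["src/content/test/gpu/run_gpu_integration_test.py", suite,
           "--browser=" + browser]
    cmd.extend(extra_args)
    return cmd

cmd = gpu_test_cmd("webgl_conformance",
                   extra_args=["--webgl-conformance-version=1.0.2"])
print(" ".join(cmd))
# Actually executing the command requires a Chromium checkout with a built
# `chrome` target, e.g.: subprocess.check_call(cmd)
```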
The pixel tests are a bit special. See
[the section on running them locally](#Running-the-pixel-tests-locally) for
details.

If you're testing on Android and have built and deployed
`ChromePublic.apk` to the device, use `--browser=android-chromium` to
invoke it.

**Note:** If you are on Linux and see this test harness exit immediately with
`**Non zero exit code**`, it's probably because of incompatible Python
packages being installed. Please uninstall the `python-egenix-mxdatetime` and
`python-logilab-common` packages in this case; see [Issue
716241](http://crbug.com/716241). This should no longer happen, since
the GPU tests were switched to use the infra team's `vpython` harness.

You can run a subset of tests with this harness:

* `run_gpu_integration_test.py webgl_conformance --browser=release
  --test-filter=conformance_attribs`

Figuring out the exact command line that was used to invoke the test on the
bots can be a little tricky. The bots all run their tests via Swarming and
isolates, meaning that the invocation of a step like `[trigger]
webgl_conformance_tests on NVIDIA GPU...` will look like:

* `python -u
  'E:\b\build\slave\Win7_Release__NVIDIA_\build\src\tools\swarming_client\swarming.py'
  trigger --swarming https://chromium-swarm.appspot.com
  --isolate-server https://isolateserver.appspot.com
  --priority 25 --shards 1 --task-name 'webgl_conformance_tests on NVIDIA GPU...'`

You can figure out the additional command line arguments that were passed to
each test on the bots by examining the trigger step and searching for the
argument separator (<code> -- </code>). For a recent invocation of
`webgl_conformance_tests`, this looked like:

* `webgl_conformance --show-stdout '--browser=release' -v
  '--extra-browser-args=--enable-logging=stderr --js-flags=--expose-gc'
  '--isolated-script-test-output=${ISOLATED_OUTDIR}/output.json'`

You can leave off the `--isolated-script-test-output` argument, because it's
used only by wrapper scripts, so this leaves a full command line of:

* `run_gpu_integration_test.py
  webgl_conformance --show-stdout '--browser=release' -v
  '--extra-browser-args=--enable-logging=stderr --js-flags=--expose-gc'`

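The two steps above (split at the <code> -- </code> separator, then drop the
wrapper-only flag) can be sketched in a few lines of Python. This is an
illustrative helper, not a tool that exists in the tree:

```python
import shlex

# Given a trigger step's full command line, recover the test's own arguments
# (everything after the " -- " separator), dropping the wrapper-only
# --isolated-script-test-output flag as described above.
def extract_test_args(trigger_cmd):
    _, _, tail = trigger_cmd.partition(" -- ")
    return [arg for arg in shlex.split(tail)
            if not arg.startswith("--isolated-script-test-output")]

trigger = ("swarming.py trigger --priority 25 --shards 1 -- "
           "webgl_conformance --show-stdout '--browser=release' -v "
           "'--isolated-script-test-output=${ISOLATED_OUTDIR}/output.json'")
print(extract_test_args(trigger))
# ['webgl_conformance', '--show-stdout', '--browser=release', '-v']
```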
The Maps test requires you to authenticate to cloud storage in order to access
the Web Page Replay archive containing the test. See [Cloud Storage Credentials]
for documentation on setting this up.

[Cloud Storage Credentials]: gpu_testing_bot_details.md#Cloud-storage-credentials

### Running the pixel tests locally
The pixel tests are a special case because they use an external Skia service
called Gold to handle image approval and storage. See
[GPU Pixel Testing With Gold] for specifics.

[GPU Pixel Testing With Gold]: gpu_pixel_testing_with_gold.md

The TL;DR is that the pixel tests use a binary called `goldctl` to download and
upload data when running pixel tests.

Normally, `goldctl` uploads images and image metadata to the Gold server. This
is not desirable when running locally, for a couple of reasons:

1. Uploading requires the user to be whitelisted on the server, and whitelisting
everyone who wants to run the tests locally is not a viable solution.
2. Images produced during local runs are usually slightly different from those
produced on the bots due to hardware/software differences. Thus, most
images uploaded to Gold from local runs would likely only ever be used
by tests run on the machine that initially generated those images, which just
adds noise to the list of approved images.

Additionally, the tests normally rely on the Gold server for viewing images
produced by a test run. This does not work if the data is not actually uploaded.

The pixel tests contain logic to automatically determine whether they are
running on a workstation, as well as to determine what git revision is
being tested. This *should* mean that the pixel tests automatically work
when run locally. However, if the local run detection code fails for some
reason, you can manually pass some flags to force the same behavior.

To get around the local run issues, simply pass the `--local-pixel-tests`
flag to the tests. This will disable uploading, but otherwise go through the
same steps a test normally would. Each test will also print out `file://` URLs
to the produced image, the closest image for the test known to Gold, and the
diff between the two.

Because the image produced by the test locally is likely slightly different from
any of the approved images in Gold, local test runs are likely to fail during
the comparison step. To cut down on the amount of noise, you can also
pass the `--no-skia-gold-failure` flag to not fail the test on a failed image
comparison. When using `--no-skia-gold-failure`, you'll also need to pass the
`--passthrough` flag in order to actually see the link output.

Example usage:
`run_gpu_integration_test.py pixel --no-skia-gold-failure --local-pixel-tests
--passthrough`

If, for some reason, the local run code is unable to determine what the git
revision is, simply pass `--git-revision aabbccdd`. Note that `aabbccdd` must
be replaced with an actual Chromium src revision (typically whatever revision
origin/master is currently synced to) in order for the tests to work. This can
be done automatically using:
``run_gpu_integration_test.py pixel --no-skia-gold-failure --local-pixel-tests
--passthrough --git-revision `git rev-parse origin/master` ``

## Running Binaries from the Bots Locally

Any binary run remotely on a bot can also be run locally, assuming the local
machine loosely matches the architecture and OS of the bot.

The easiest way to do this is to find the ID of the Swarming task and use
"swarming.py reproduce" to re-run it:

* `./src/tools/swarming_client/swarming.py reproduce -S https://chromium-swarm.appspot.com [task ID]`

The task ID can be found in the stdio for the "trigger" step for the test. For
example, look at a recent build from the [Mac Release (Intel)] bot, and
look at the `gl_unittests` step. You will see something like:

[Mac Release (Intel)]: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac%20Release%20%28Intel%29/

```
Triggered task: gl_unittests on Intel GPU on Mac/Mac-10.12.6/[TRUNCATED_ISOLATE_HASH]/Mac Release (Intel)/83664
To collect results, use:
  swarming.py collect -S https://chromium-swarm.appspot.com --json /var/folders/[PATH_TO_TEMP_FILE].json
Or visit:
  https://chromium-swarm.appspot.com/user/task/[TASK_ID]
```

There is a difference between the isolate's hash and Swarming's task ID. Make
sure you use the task ID and not the isolate's hash.
As of this writing, there seems to be a
[bug](https://github.com/luci/luci-py/issues/250)
when attempting to re-run the Telemetry-based GPU tests in this way. For the
time being, this can be worked around by instead downloading the contents of
the isolate. To do so, look more deeply into the trigger step's log:

* <code>python -u
  /b/build/slave/Mac_10_10_Release__Intel_/build/src/tools/swarming_client/swarming.py
  trigger [...more args...] --tag data:[ISOLATE_HASH] [...more args...]
  [ISOLATE_HASH] -- **[...TEST_ARGS...]**</code>

As of this writing, the isolate hash appears twice in the command line. To
download the isolate's contents into directory `foo` (note, this is in the
"Help" section associated with the page for the isolate's task, but I'm not
sure whether that's accessible only to Google employees or to all members of the
chromium.org organization):

* `python isolateserver.py download -I https://isolateserver.appspot.com
  --namespace default-gzip -s [ISOLATE_HASH] --target foo`

`isolateserver.py` will tell you the approximate command line to use. You
should concatenate the `TEST_ARGS` highlighted in bold above with
`isolateserver.py`'s recommendation. The `ISOLATED_OUTDIR` variable can be
safely replaced with `/tmp`.

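The `${ISOLATED_OUTDIR}` placeholder follows ordinary shell-style variable
syntax, so the substitution can be done with Python's `string.Template` if you
are assembling the command programmatically (an illustrative one-liner, not an
existing tool):

```python
from string import Template

# Replace the ${ISOLATED_OUTDIR} placeholder from the bot's command line with
# a local directory such as /tmp, as suggested above.
arg = "--isolated-script-test-output=${ISOLATED_OUTDIR}/output.json"
local_arg = Template(arg).safe_substitute(ISOLATED_OUTDIR="/tmp")
print(local_arg)  # --isolated-script-test-output=/tmp/output.json
```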
Note that `isolateserver.py` downloads a large number of files (everything
needed to run the test) and may take a while. There is a way to use
`run_isolated.py` to achieve the same result, but as of this writing there
were problems doing so, so this procedure is not documented at this time.

Before attempting to download an isolate, you must ensure you have permission
to access the isolate server. Full instructions can be [found
here][isolate-server-credentials]. For most cases, you can simply run:

* `./src/tools/swarming_client/auth.py login
  --service=https://isolateserver.appspot.com`

The above link requires that you log in with your @google.com credentials. It's
not known at the present time whether this works with @chromium.org accounts.
Email kbr@ if you try this and find it doesn't work.

[isolate-server-credentials]: gpu_testing_bot_details.md#Isolate-server-credentials

376## Running Locally Built Binaries on the GPU Bots
377
378See the [Swarming documentation] for instructions on how to upload your binaries to the isolate server and trigger execution on Swarming.
379
John Budorickb2ff2242019-11-14 17:35:59380Be sure to use the correct swarming dimensions for your desired GPU e.g. "1002:6613" instead of "AMD Radeon R7 240 (1002:6613)" which is how it appears on swarming task page. You can query bots in the chromium.tests.gpu pool to find the correct dimensions:
Sunny Sachanandani8d071572019-06-13 20:17:58381
John Budorickb2ff2242019-11-14 17:35:59382* `python tools\swarming_client\swarming.py bots -S chromium-swarm.appspot.com -d pool chromium.tests.gpu`
Sunny Sachanandani8d071572019-06-13 20:17:58383
Kai Ninomiyaa6429fb32018-03-30 01:30:56384[Swarming documentation]: https://www.chromium.org/developers/testing/isolated-testing/for-swes#TOC-Run-a-test-built-locally-on-Swarming
385
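The bare dimension is the PCI vendor:device pair inside the parentheses of the
display string, so it can be pulled out mechanically. A small illustrative
helper (not part of the Swarming tooling):

```python
import re

# Recover the bare vendor:device dimension (e.g. "1002:6613") from the
# display string shown on a Swarming task page.
def gpu_dimension(display_name):
    match = re.search(r"\(([0-9a-fA-F]{4}:[0-9a-fA-F]{4})\)", display_name)
    return match.group(1) if match else display_name

print(gpu_dimension("AMD Radeon R7 240 (1002:6613)"))  # 1002:6613
```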
## Moving Test Binaries from Machine to Machine

To create a zip archive of your personal Chromium build plus all of
the Telemetry-based GPU tests' dependencies, which you can then move
to another machine for testing:

1. Build Chrome (into `out/Release` in this example).
2. `python tools/mb/mb.py zip out/Release/ telemetry_gpu_integration_test out/telemetry_gpu_integration_test.zip`

Then copy `telemetry_gpu_integration_test.zip` to another machine. Unzip
it, and cd into the resulting directory. Invoke
`content/test/gpu/run_gpu_integration_test.py` as above.

This workflow has been tested successfully on Windows with a
statically-linked Release build of Chrome.

Note: on one macOS machine, this command failed because of a broken
`strip-json-comments` symlink in
`src/third_party/catapult/common/node_runner/node_runner/node_modules/.bin`. Deleting
that symlink allowed it to proceed.

Note also: on the same macOS machine, with a component build, this
command failed to zip up a working Chromium binary. The browser failed
to start with the following error:

`[0626/180440.571670:FATAL:chrome_main_delegate.cc(1057)] Check failed: service_manifest_data_pack_.`

In a pinch, this command could be used to bundle up everything, but
the "out" directory could be deleted from the resulting zip archive,
and the Chromium binaries moved over to the target machine. Then the
command line arguments `--browser=exact --browser-executable=[path]`
can be used to launch that specific browser.

See the [user guide for mb](../../tools/mb/docs/user_guide.md#mb-zip), the
meta-build system, for more details.

## Adding New Tests to the GPU Bots

The goal of the GPU bots is to avoid regressions in Chrome's rendering stack.
To that end, let's add as many tests as possible that will help catch
regressions in the product. If you see a crazy bug in Chrome's rendering which
would be easy to catch with a pixel test running in Chrome and hard to catch in
any of the other test harnesses, please invest the time to add a test!

There are a couple of different ways to add new tests to the bots:

1. Adding a new test to one of the existing harnesses.
2. Adding an entirely new test step to the bots.

### Adding a new test to one of the existing test harnesses

Adding new tests to the GTest-based harnesses is straightforward and
essentially requires no explanation.

As of this writing, it isn't as easy as desired to add a new test to one of the
Telemetry-based harnesses. See [Issue 352807](http://crbug.com/352807). Let's
collectively work to address that issue. It would be great to reduce the number
of steps on the GPU bots, or at least to avoid significantly increasing the
number of steps on the bots. The WebGL conformance tests should probably remain
a separate step, but some of the smaller Telemetry-based tests
(`context_lost_tests`, `memory_test`, etc.) should probably be combined into a
single step.

If you are adding a new test to one of the existing tests (e.g., `pixel_test`),
all you need to do is make sure that your new test runs correctly via isolates.
See the documentation from the GPU bot details on [adding new isolated
tests][new-isolates] for the gn args and authentication needed to upload
isolates to the isolate server. Most likely the new test will be Telemetry
based, and included in the `telemetry_gpu_test_run` isolate. You can then
invoke it via:

* `./src/tools/swarming_client/run_isolated.py -s [HASH]
  -I https://isolateserver.appspot.com -- [TEST_NAME] [TEST_ARGUMENTS]`

[new-isolates]: gpu_testing_bot_details.md#Adding-a-new-isolated-test-to-the-bots

### Adding new steps to the GPU Bots

The tests that are run by the GPU bots are described by a couple of JSON files
in the Chromium workspace:

* [`chromium.gpu.json`](https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.json)
* [`chromium.gpu.fyi.json`](https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.fyi.json)

These files are autogenerated by the following script:

* [`generate_buildbot_json.py`](https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/generate_buildbot_json.py)

This script is documented in
[`testing/buildbot/README.md`](https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/README.md). The
JSON files are parsed by the chromium and chromium_trybot recipes, and describe
two basic types of tests:

* GTests: those which use the Googletest and Chromium's `base/test/launcher/`
  frameworks.
* Isolated scripts: tests whose initial entry point is a Python script which
  follows a simple convention of command line argument parsing.

The majority of the GPU tests, however, are:

* Telemetry based tests: an isolated script test which is built on the
  Telemetry framework and which launches the entire browser.

A prerequisite of adding a new test to the bots is that the test [run via
isolates][new-isolates]. Once that is done, modify `test_suites.pyl` to add the
test to the appropriate set of bots. Be careful when adding large new test steps
to all of the bots, because the GPU bots are a limited resource and do not
currently have the capacity to absorb large new test suites. It is safer to get
new tests running on the chromium.gpu.fyi waterfall first, and expand from there
to the chromium.gpu waterfall (which will also make them run against every
Chromium CL by virtue of the `linux-rel`, `mac-rel`, `win7-rel` and
`android-marshmallow-arm64-rel` tryservers' mirroring of the bots on this
waterfall – so be careful!).

Tryjobs which add new test steps to the chromium.gpu.json file will run those
new steps during the tryjob, which helps ensure that the new test won't break
once it starts running on the waterfall.

Tryjobs which modify chromium.gpu.fyi.json can be sent to the
`win_optional_gpu_tests_rel`, `mac_optional_gpu_tests_rel` and
`linux_optional_gpu_tests_rel` tryservers to help ensure that they won't
break the FYI bots.

## Debugging Pixel Test Failures on the GPU Bots

If pixel tests fail on the bots, the build step will contain either one or more
links titled `gold_triage_link for <test name>` or a single link titled
`Too many artifacts produced to link individually, click for links`, which
itself will contain links. In either case, these links will direct to Gold
pages showing the image produced by the test and the approved image that most
closely matches it.

Note that for the tests which programmatically check colors in certain regions
of the image (tests with `expected_colors` fields in [pixel_test_pages]), there
likely won't be a closest approved image, since those tests only upload data to
Gold in the event of a failure.

[pixel_test_pages]: https://cs.chromium.org/chromium/src/content/test/gpu/gpu_tests/pixel_test_pages.py

## Updating and Adding New Pixel Tests to the GPU Bots

If your CL adds a new pixel test or modifies existing ones, it's likely that
you will have to approve new images. Simply run your CL through the CQ and
follow the steps outlined [here][pixel wrangling triage] under the "Check if
any pixel test failures are actual failures or need to be rebaselined" step.

[pixel wrangling triage]: pixel_wrangling.md#How-to-Keep-the-Bots-Green

If you are adding a new pixel test, it is beneficial to set the
`grace_period_end` argument in the test's definition. This will allow the test
to run for a period without actually failing on the waterfall bots, giving you
some time to triage any additional images that show up on them. This helps
prevent new tests from making the bots red because they're producing slightly
different but valid images from the ones triaged while the CL was in review.
Example:

```
from datetime import date

...

PixelTestPage(
    'foo_pixel_test.html',
    ...
    grace_period_end=date(2020, 1, 1)
)
```

You should typically set the grace period to end 1-2 days after the CL will
land.

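For example, a grace period two days past an expected landing date can be
computed directly with the standard library (the landing date below is a
placeholder):

```python
from datetime import date, timedelta

# Placeholder: the date you expect the CL to land.
expected_land_date = date(2020, 1, 1)

# End the grace period two days later.
grace_period_end = expected_land_date + timedelta(days=2)
print(grace_period_end)  # 2020-01-03
```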
Once your CL passes the CQ, you should be mostly good to go, although you should
keep an eye on the waterfall bots for a short period after your CL lands in case
any configurations not covered by the CQ need to have images approved as well.
All untriaged images for your test can be found by substituting your test name
into:

`https://chrome-gpu-gold.skia.org/search?query=name%3D<test name>`

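Note that the `%3D` in the URL is a percent-encoded `=`, so any special
characters in the test name need the same treatment. A small sketch of building
the URL with the standard library (the helper name and example test name are
made up):

```python
from urllib.parse import quote

def gold_untriaged_url(test_name):
    # quote() percent-encodes '=' and other reserved characters by default.
    return ('https://chrome-gpu-gold.skia.org/search?query='
            + quote('name=' + test_name))

print(gold_untriaged_url('Pixel_Canvas2DRedBox'))
# https://chrome-gpu-gold.skia.org/search?query=name%3DPixel_Canvas2DRedBox
```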
**NOTE** If you have a grace period active for your test, then Gold will be told
to ignore results for the test. This is so that it does not comment on unrelated
CLs about untriaged images if your test is noisy. Images will still be uploaded
to Gold and can be triaged, but will not show up on the main page's untriaged
image list, and you will need to enable the "Ignored" toggle at the top of the
page when looking at the triage page specific to your test.

## Stamping out Flakiness

It's critically important to aggressively investigate and eliminate the root
cause of any flakiness seen on the GPU bots. The bots have been known to run
reliably for days at a time, and any flaky failures that are tolerated on the
bots translate directly into instability of the browser experienced by
customers. Critical bugs in subsystems like WebGL, affecting high-profile
products like Google Maps, have escaped notice in the past because the bots
were unreliable. After much re-work, the GPU bots are now among the most
reliable automated test machines in the Chromium project. Let's keep them that
way.

Flakiness affecting the GPU tests can come in from highly unexpected sources.
Here are some examples:

* Intermittent pixel_test failures on Linux where the captured pixels were
  black, caused by the Display Power Management System (DPMS) kicking in.
  Disabled the X server's built-in screen saver on the GPU bots in response.
* GNOME dbus-related deadlocks causing intermittent timeouts ([Issue
  309093](http://crbug.com/309093) and related bugs).
* Windows Audio system changes causing intermittent assertion failures in the
  browser ([Issue 310838](http://crbug.com/310838)).
* Enabling assertion failures in the C++ standard library on Linux causing
  random assertion failures ([Issue 328249](http://crbug.com/328249)).
* V8 bugs causing random crashes of the Maps pixel test (V8 issues
  [3022](https://code.google.com/p/v8/issues/detail?id=3022),
  [3174](https://code.google.com/p/v8/issues/detail?id=3174)).
* TLS changes causing random browser process crashes ([Issue
  264406](http://crbug.com/264406)).
* Isolated test execution flakiness caused by failures to reliably clean up
  temporary directories ([Issue 340415](http://crbug.com/340415)).
* The Telemetry-based WebGL conformance suite caught a bug in the memory
  allocator on Android not caught by any other bot ([Issue
  347919](http://crbug.com/347919)).
* context_lost test failures caused by the compositor's retry logic ([Issue
  356453](http://crbug.com/356453)).
* Multiple bugs in Chromium's support for lost contexts causing flakiness of
  the context_lost tests ([Issue 365904](http://crbug.com/365904)).
* Maps test timeouts caused by Content Security Policy changes in Blink
  ([Issue 395914](http://crbug.com/395914)).
* Weak pointer assertion failures in various webgl\_conformance\_tests caused
  by changes to the media pipeline ([Issue 399417](http://crbug.com/399417)).
* A change to a default WebSocket timeout in Telemetry causing intermittent
  failures to run all WebGL conformance tests on the Mac bots ([Issue
  403981](http://crbug.com/403981)).
* Chrome leaking suspended sub-processes on Windows, apparently a preexisting
  race condition that suddenly showed up ([Issue
  424024](http://crbug.com/424024)).
* Changes to Chrome's cross-context synchronization primitives causing the
  wrong tiles to be rendered ([Issue 584381](http://crbug.com/584381)).
* A bug in V8's handling of array literals causing flaky failures of
  texture-related WebGL 2.0 tests ([Issue 606021](http://crbug.com/606021)).
* Assertion failures in sync point management related to lost contexts that
  exposed a real correctness bug ([Issue 606112](http://crbug.com/606112)).
* A bug in glibc's `sem_post`/`sem_wait` primitives breaking V8's parallel
  garbage collection ([Issue 609249](http://crbug.com/609249)).
* A change to Blink's memory purging primitive which caused intermittent
  timeouts of WebGL conformance tests on all platforms ([Issue
  840988](http://crbug.com/840988)).
* Screen DPI being inconsistent across seemingly identical Linux machines,
  causing the Maps pixel test to flakily produce incorrectly sized images
  ([Issue 1091410](https://crbug.com/1091410)).

If you notice flaky test failures either on the GPU waterfalls or try servers,
please file bugs right away with the component Internals>GPU>Testing and
include links to the failing builds and copies of the logs, since the logs
expire after a few days. [GPU pixel wranglers] should give the highest priority
to eliminating flakiness on the tree.

[GPU pixel wranglers]: pixel_wrangling.md