# Debugging with Swarming

This document outlines how to debug a test failure on a specific builder
configuration without needing to repeatedly upload new CL revisions or do CQ dry
runs.

[TOC]

## Overview & Terms

Swarming is a system operated by the infra team that schedules and runs tasks
under a specific set of constraints, like "this must run on a macOS 10.13 host"
or "this must run on a host with an Intel GPU". It is somewhat similar to part
of [Borg], or to [Kubernetes].

An isolate is an archive containing all the files needed to do a specific task
on the swarming infrastructure. It contains binaries as well as any libraries
they link against and any support data they need. An isolate is similar to a
tarball, except that it is held by the "isolate server" and identified by a hash
of its contents. The isolate also includes the command(s) to run, which is why
the command is specified when building the isolate, not when executing it.
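
The content-addressing idea can be illustrated with a toy model in Python. This is a simplified, hypothetical sketch and not the real `.isolated` file format. `toy_isolate_digest` is an illustrative name, and SHA-1 is assumed here only because the hashes shown in this doc are 40 hex digits. The sketch shows that identical inputs map to the same hash, and that changing the command changes the isolate's identity.

```python
import hashlib
import json
from typing import Dict, List


def toy_isolate_digest(files: Dict[str, bytes], command: List[str]) -> str:
    """Return a hex digest identifying a toy "isolate".

    Hypothetical model: each file's contents are hashed, then a manifest of
    (relative path -> file hash) plus the command is serialized with a stable
    key order and hashed again. The real .isolated format has more fields.
    """
    manifest = {
        "command": command,
        "files": {
            path: hashlib.sha1(data).hexdigest() for path, data in files.items()
        },
    }
    # sort_keys and fixed separators make the encoding independent of dict order.
    encoded = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest()


files = {"unit_tests": b"<binary>", "test_data/input.txt": b"hello"}
base = toy_isolate_digest(files, ["./unit_tests"])
# Same files and command -> same digest; a different command -> a different isolate.
assert base == toy_isolate_digest(dict(reversed(list(files.items()))), ["./unit_tests"])
assert base != toy_isolate_digest(files, ["./unit_tests", "--gtest_repeat=10"])
print(base)
```

Because the command is part of what gets hashed in this model, running the same binaries with a different command line produces a new isolate. That is consistent with specifying the command at build time.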

Normally, when you do a CQ dry run, something like this happens:

```
for type in builders_to_run:
  targets = compute_targets_for(type)
  isolates = use_swarming_to_build(type, targets) # uploads isolates for targets
  wait_for_swarming_to_be_done()

  for isolate in isolates:
    use_swarming_to_run(type, isolate) # downloads isolates onto the bots used
  wait_for_swarming_to_be_done()
```

When you do a CQ retry on a specific set of bots, that simply constrains
`builders_to_run` in the pseudocode above. However, if you're trying to rerun a
specific target on a specific bot to reproduce or debug a failure, a CQ retry
still wastes a lot of time: it builds and runs all targets, even if it's only
for one bot.
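
To make the cost concrete, here is the dry-run pseudocode rendered as runnable Python. Instead of contacting swarming, it records each step. The helper names and the builder and target names are hypothetical. The point is only to count the work a CQ retry schedules compared with rerunning a single target.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str, str]  # (action, builder, target)


def simulate_cq_retry(builders_to_run: List[str],
                      targets_for: Dict[str, List[str]]) -> List[Step]:
    """Record the steps a CQ retry takes, following the dry-run pseudocode."""
    steps: List[Step] = []
    for builder in builders_to_run:
        targets = targets_for[builder]
        # Every target for the builder is built (and its isolate uploaded)...
        steps.extend(("build", builder, t) for t in targets)
        # ...and then every resulting isolate is run on swarming.
        steps.extend(("run", builder, t) for t in targets)
    return steps


def simulate_targeted_rerun(builder: str, target: str) -> List[Step]:
    """Record the steps for rerunning just one target on one builder."""
    return [("build", builder, target), ("run", builder, target)]


targets_for = {"linux-rel": ["unit_tests", "browser_tests", "content_browsertests"]}
retry = simulate_cq_retry(["linux-rel"], targets_for)
rerun = simulate_targeted_rerun("linux-rel", "browser_tests")
print(len(retry), len(rerun))  # prints: 6 2
```

Even with a single builder and three targets, the retry does three times the work. With real builders, which have many more targets, the gap is much larger.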

Fortunately, you can manually invoke some steps of this process. What you really
want to do is:

```
isolate = use_swarming_to_build(type, target) # can't do this yet, see below
use_swarming_to_run(type, isolate)
```

or perhaps:

```
isolate = upload_to_isolate_server(target_you_built_locally)
use_swarming_to_run(type, isolate)
```

## The easy way

A lot of the steps described in this doc have been bundled up into two
tools. Before using either of these, you will need to
[authenticate](#authenticating).

### run-swarmed.py

A lot of the logic below is wrapped up in `tools/run-swarmed.py`, which you can
run like this:

```
$ tools/run-swarmed.py $outdir $target
```

See the `--help` option of `run-swarmed.py` for more details about that script.

### mb.py run

Similar to `tools/run-swarmed.py`, `mb.py run` bundles much of the logic into a
single command line. Unlike `tools/run-swarmed.py`, `mb.py run` allows you to
specify extra arguments to pass to the test, but its command line is messier.

To use it, run:

```
$ tools/mb/mb.py run \
    -s --no-default-dimensions \
    -d pool $pool \
    $criteria \
    $outdir $target \
    -- $extra_args
```

## A concrete example

Here's how to run `chrome_public_test_apk` on a bot with a Nexus 5 running KitKat.

```sh
$ tools/mb/mb.py run \
    -s --no-default-dimensions \
    -d pool Chrome \
    -d device_os_type userdebug -d device_os KTU84P -d device_type hammerhead \
    out/Android-arm-dbg chrome_public_test_apk
```

This assumes you have an `out/Android-arm-dbg/args.gn` like:

```
ffmpeg_branding = "Chrome"
is_component_build = false
is_debug = true
proprietary_codecs = true
strip_absolute_paths_from_debug_symbols = true
symbol_level = 1
system_webview_package_name = "com.google.android.webview"
target_os = "android"
use_goma = true
```
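
If you rerun the same configuration often, it can help to assemble the `mb.py run` command line from a dictionary of dimensions. The Python sketch below uses only the flags shown in this doc (`-s`, `--no-default-dimensions`, `-d`, and `--`). `mb_run_argv` is a hypothetical helper name, and the sketch assumes you run the resulting command from `src/`.

```python
import shlex
from typing import Dict, List, Sequence


def mb_run_argv(outdir: str, target: str, pool: str, dimensions: Dict[str, str],
                extra_args: Sequence[str] = ()) -> List[str]:
    """Build the argv for `tools/mb/mb.py run`, mirroring the template in this doc."""
    argv = ["tools/mb/mb.py", "run", "-s", "--no-default-dimensions",
            "-d", "pool", pool]
    for name, value in dimensions.items():
        argv += ["-d", name, value]  # one `-d name value` per dimension ($criteria)
    argv += [outdir, target]
    if extra_args:
        argv += ["--", *extra_args]  # everything after `--` is passed to the test
    return argv


# The concrete Nexus 5 / KitKat example from this section.
argv = mb_run_argv(
    "out/Android-arm-dbg", "chrome_public_test_apk", "Chrome",
    {"device_os_type": "userdebug", "device_os": "KTU84P", "device_type": "hammerhead"},
)
print(shlex.join(argv))
```

To actually launch it, you could pass `argv` to `subprocess.run(argv, check=True)` from your `src/` checkout. Passing a list instead of a shell string avoids quoting problems with dimension values that contain spaces.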

## Bot selection criteria

The examples in this doc use `$criteria`. To figure out what values to use, go
to an existing swarming run (see the
[recent tasks page](https://chromium-swarm.appspot.com/tasklist)) and look at
its `Dimensions` section. Each of these becomes a
`-d dimension_name dimension_value` in your `$criteria`. Click on `bots` (or go
to the [bot list](https://chromium-swarm.appspot.com/botlist)) to be taken to a
UI that lets you try out the criteria interactively, so that you can be sure
that there are bots matching your criteria. Sometimes the web page shows a
human-friendly name rather than the name required on the command line;
[this file](https://cs.chromium.org/chromium/infra/luci/appengine/swarming/ui2/modules/alias.js)
contains the mapping to human-friendly names. You can test your command line by
entering `dimension_name:dimension_value` in the interactive UI.
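
Translating a task page's `Dimensions` into `$criteria` is mechanical, as this Python sketch shows. It assumes you have already mapped any human-friendly names back to the command-line names. It also leaves out `pool`, because the commands in this doc pass that separately with `-d pool $pool`. The function names are hypothetical.

```python
from typing import Dict, List


def criteria_args(dimensions: Dict[str, str]) -> List[str]:
    """Turn task dimensions into `-d name value` arguments for $criteria."""
    args: List[str] = []
    for name, value in dimensions.items():
        if name == "pool":
            continue  # passed separately as `-d pool $pool`
        args += ["-d", name, value]
    return args


def botlist_filters(dimensions: Dict[str, str]) -> List[str]:
    """Turn the same dimensions into `name:value` filters for the bot list UI."""
    return [f"{name}:{value}" for name, value in dimensions.items()]


dims = {"pool": "Chrome", "os": "Linux", "gpu": "8086:1912"}
print(criteria_args(dims))    # ['-d', 'os', 'Linux', '-d', 'gpu', '8086:1912']
print(botlist_filters(dims))  # ['pool:Chrome', 'os:Linux', 'gpu:8086:1912']
```

Generating both forms from the same dictionary makes it easy to check in the bot list UI that some bot matches before you trigger anything.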

## Building an isolate

At the moment, you can only build an isolate locally, like so (commands you type
begin with `$`):

```
$ tools/mb/mb.py isolate //$outdir $target
```

This will produce some files in `$outdir`. The two most relevant are
`$outdir/$target.isolate` and `$outdir/$target.isolated`. If you've already
built `$target`, you can save some CPU time by running `tools/mb/mb.py` with
`--no-build`:

```
$ tools/mb/mb.py isolate --no-build //$outdir $target
```
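
The relationship between `$outdir`, `$target`, and the files `mb.py isolate` produces can be captured in a small Python helper. This is a sketch. `mb_isolate` is a hypothetical name, `out/Default` is just an example output directory, and the helper simply encodes the command and file names given in this section, including the `//` prefix on the output directory.

```python
from typing import List, NamedTuple


class IsolateStep(NamedTuple):
    argv: List[str]  # command to run from src/
    isolate: str     # $outdir/$target.isolate
    isolated: str    # $outdir/$target.isolated


def mb_isolate(outdir: str, target: str, no_build: bool = False) -> IsolateStep:
    """Describe an `mb.py isolate` invocation and the two files it produces."""
    argv = ["tools/mb/mb.py", "isolate"]
    if no_build:
        argv.append("--no-build")  # reuse an existing build of $target
    argv += [f"//{outdir}", target]
    return IsolateStep(argv, f"{outdir}/{target}.isolate", f"{outdir}/{target}.isolated")


step = mb_isolate("out/Default", "unit_tests", no_build=True)
print(" ".join(step.argv))  # tools/mb/mb.py isolate --no-build //out/Default unit_tests
print(step.isolated)        # out/Default/unit_tests.isolated
```

The `isolate` and `isolated` fields are the paths you pass to `-i` and `-s` when uploading with `isolate.py archive`.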

Support for building an isolate using swarming, which would allow you to build
for a platform you can't build for locally, does not yet exist.

## Authenticating

You may need to log in to `https://isolateserver.appspot.com` before using the
tools described in this doc:

```
$ python tools/swarming_client/auth.py login \
    --service=https://isolateserver.appspot.com
```

Use your google.com account for this.

## Uploading an isolate

You can then upload the resulting isolate to the isolate server:

```
$ tools/swarming_client/isolate.py archive \
    -I https://isolateserver.appspot.com \
    -i $outdir/$target.isolate \
    -s $outdir/$target.isolated
```

The `isolate.py` tool will emit something like this:

```
e625130b712096e3908266252c8cd779d7f442f1 unit_tests
```

Do not press Ctrl-C after it prints this, even if it seems to hang for a
minute; just let it finish.
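
If you script the upload step, you need to pull `$hash` out of that output. The Python sketch below assumes the output line has the form shown in this section: a 40-hex-digit hash, whitespace, then the target name. `parse_archive_output` is a hypothetical helper. If the real output format differs, it raises an error rather than guessing.

```python
import re
from typing import Tuple

# A 40-hex-digit hash followed by the isolate's name, as in the example output.
_ARCHIVE_LINE = re.compile(r"^([0-9a-f]{40})\s+(\S+)\s*$")


def parse_archive_output(output: str) -> Tuple[str, str]:
    """Return (hash, name) from `isolate.py archive` output."""
    for line in output.splitlines():
        match = _ARCHIVE_LINE.match(line.strip())
        if match:
            return match.group(1), match.group(2)
    raise ValueError("no '<hash> <name>' line found in isolate.py output")


digest, name = parse_archive_output(
    "e625130b712096e3908266252c8cd779d7f442f1 unit_tests\n")
print(digest, name)  # e625130b712096e3908266252c8cd779d7f442f1 unit_tests
```

If you capture the output with `subprocess.run(..., capture_output=True, text=True)`, that call only returns once `isolate.py` exits, which fits the advice not to interrupt it.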

## Running an isolate

Now that the isolate is on the isolate server with hash `$hash` from the
previous step, you can run it on bots of your choice:

```
$ tools/swarming_client/swarming.py trigger \
    -S https://chromium-swarm.appspot.com \
    -I https://isolateserver.appspot.com \
    -d pool $pool \
    $criteria \
    -s $hash
```

There are two more things you need to fill in here. The first is the pool name;
you should pick "Chrome" unless you know otherwise. The pool is the collection
of hosts from which swarming will try to pick bots to run your tasks.

The second is the criteria, which is how you specify which bot(s) you want your
task scheduled on. Criteria are expressed as "dimensions", which you pass with
`-d key val` or `--dimension=key val`. In fact, the `-d pool $pool` in the
command above is selecting based on the "pool" dimension. There are a lot of
possible dimensions; one useful one is "os", like `-d os Linux`. Examples of
other dimensions include:

* `-d os Mac10.13.6` to select a specific OS version
* `-d device_type "Pixel 3"` to select a specific Android device type
* `-d gpu 8086:1912` to select a specific GPU

The [swarming bot list] allows you to see all the dimensions and the values they
can take on.

If you need to pass additional arguments to the test, add `-- $extra_args` to
the end of the `swarming.py trigger` command line. Anything after the `--` is
passed directly to the test.
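
Putting the pieces of this section together, the `swarming.py trigger` command line can be assembled from the hash, pool, dimensions, and extra test arguments. This Python sketch only uses the flags shown in this section (`-S`, `-I`, `-d`, `-s`, and `--`). `trigger_argv` is a hypothetical helper name, and the default pool follows the advice to pick "Chrome" unless you know otherwise.

```python
import shlex
from typing import Dict, List, Sequence

SWARMING_SERVER = "https://chromium-swarm.appspot.com"
ISOLATE_SERVER = "https://isolateserver.appspot.com"


def trigger_argv(isolated_hash: str, dimensions: Dict[str, str], pool: str = "Chrome",
                 extra_args: Sequence[str] = ()) -> List[str]:
    """Build the argv for `swarming.py trigger`, following the template in this doc."""
    argv = ["tools/swarming_client/swarming.py", "trigger",
            "-S", SWARMING_SERVER, "-I", ISOLATE_SERVER, "-d", "pool", pool]
    for key, val in dimensions.items():
        argv += ["-d", key, val]  # $criteria
    argv += ["-s", isolated_hash]
    if extra_args:
        argv += ["--", *extra_args]  # forwarded directly to the test
    return argv


argv = trigger_argv("e625130b712096e3908266252c8cd779d7f442f1",
                    {"device_type": "Pixel 3"}, extra_args=["--gtest_filter=Foo.*"])
print(shlex.join(argv))
```

Keeping `"Pixel 3"` as a single list element means it reaches `swarming.py` as one argument. `shlex.join` only adds quoting when printing the command for a human.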

When you invoke `swarming.py trigger`, it will emit two pieces of information: a
URL for the task it created, and a command you can run to collect the results of
that task. For example:

```
Triggered task: [email protected]/os=Linux_pool=Chrome/e625130b712096e3908266252c8cd779d7f442f1
To collect results, use:
tools/swarming_client/swarming.py collect -S https://chromium-swarm.appspot.com 46fc393777163310
Or visit:
https://chromium-swarm.appspot.com/user/task/46fc393777163310
```

The `collect` command given there will block until the task is complete, then
print the task's results. Alternatively, you can load that URL and watch the
task's progress.
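
When scripting this, you can recover the task ID from the trigger output and build the `collect` command yourself. The Python sketch below assumes the output contains a task URL of the form shown in the example (`.../user/task/<id>`). `task_id_from_trigger_output` and `collect_argv` are hypothetical helper names.

```python
import re
from typing import List

_TASK_URL = re.compile(r"https://chromium-swarm\.appspot\.com/user/task/([0-9a-f]+)")


def task_id_from_trigger_output(output: str) -> str:
    """Extract the swarming task ID from `swarming.py trigger` output."""
    match = _TASK_URL.search(output)
    if match is None:
        raise ValueError("no task URL found in trigger output")
    return match.group(1)


def collect_argv(task_id: str) -> List[str]:
    """Build the blocking `swarming.py collect` command for a task."""
    return ["tools/swarming_client/swarming.py", "collect",
            "-S", "https://chromium-swarm.appspot.com", task_id]


sample = """\
To collect results, use:
tools/swarming_client/swarming.py collect -S https://chromium-swarm.appspot.com 46fc393777163310
Or visit:
https://chromium-swarm.appspot.com/user/task/46fc393777163310
"""
task_id = task_id_from_trigger_output(sample)
print(task_id)  # 46fc393777163310
print(" ".join(collect_argv(task_id)))
```

The rebuilt `collect` command matches the one `swarming.py trigger` prints. Like that command, it blocks until the task finishes.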

## Other notes

If you are looking at a Swarming task page, be sure to check the bottom of the
page, which gives you commands to:

* Download the contents of the isolate the task used
* Reproduce the task's configuration locally
* Download all output results from the task locally

[borg]: https://ai.google/research/pubs/pub43438
[kubernetes]: https://kubernetes.io/
[swarming bot list]: https://chromium-swarm.appspot.com/botlist