# Debugging with Swarming

This document outlines how to debug a test failure on a specific builder
configuration without needing to repeatedly upload new CL revisions or do CQ dry
runs.

[TOC]

## Overview & Terms

Swarming is a system operated by the infra team that schedules and runs tasks
under a specific set of constraints, like "this must run on a macOS 10.13 host"
or "this must run on a host with an Intel GPU". It is somewhat similar to part
of [Borg], or to [Kubernetes].

An isolate is an archive containing all the files needed to do a specific task
on the swarming infrastructure. It contains binaries as well as any libraries
they link against and any support data. An isolate can be thought of as a
tarball, but one held by the "isolate server" and identified by a hash of its
contents. The isolate also includes the command(s) to run, which is why the
command is specified when building the isolate, not when executing it.

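The "identified by a hash of its contents" idea can be sketched in a few lines. This is only an illustration of content addressing, not the real isolate format: the `store` dictionary is a hypothetical stand-in for the isolate server, and SHA-1 is an assumption based on the 40-character hash shown later in this document.

```python
import hashlib

def content_hash(files):
    """Return a hex digest over a mapping of relative path -> file bytes."""
    digest = hashlib.sha1()  # assumption: chosen only because it yields 40 hex chars
    for path in sorted(files):  # sort so the result does not depend on dict order
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(files[path])
        digest.update(b"\0")
    return digest.hexdigest()

# Hypothetical in-memory stand-in for the isolate server.
store = {}

def upload(files):
    """Store the files under their content hash and return that hash."""
    key = content_hash(files)
    store[key] = dict(files)
    return key

# The same contents always map to the same key; changed contents get a new one.
first = upload({"unit_tests": b"binary", "data/a.txt": b"hello"})
second = upload({"data/a.txt": b"hello", "unit_tests": b"binary"})
changed = upload({"unit_tests": b"binary v2", "data/a.txt": b"hello"})
```
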
Normally, when you do a CQ dry run, something like this happens:

```
for type in builders_to_run:
  targets = compute_targets_for(type)
  isolates = use_swarming_to_build(type, targets) # uploads isolates for targets
  wait_for_swarming_to_be_done()

  for isolate in isolates:
    use_swarming_to_run(type, isolate) # downloads isolates onto the bots used
  wait_for_swarming_to_be_done()
```

When you do a CQ retry on a specific set of bots, that simply constrains
`builders_to_run` in the pseudocode above. However, if you're trying to rerun a
specific target on a specific bot, because you're trying to reproduce a failure
or debug, doing a CQ retry will still waste a lot of time: the retry will still
build and run all targets, even if it's only for one bot.

Fortunately, you can manually invoke some steps of this process. What you really
want to do is:

```
isolate = use_swarming_to_build(type, target) # can't do this yet, see below
use_swarming_to_run(type, isolate)
```

or perhaps:

```
isolate = upload_to_isolate_server(target_you_built_locally)
use_swarming_to_run(type, isolate)
```

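To make the cost difference concrete, here is a toy model of the flows described above. The builder names and target lists are hypothetical and exist only for illustration; the point is that a CQ retry narrows the set of builders but not the set of targets, while the manual flow handles exactly one builder/target pair.

```python
# Hypothetical builders and the test targets each one runs.
TARGETS_BY_BUILDER = {
    "linux-rel": ["unit_tests", "browser_tests", "content_unittests"],
    "mac-rel": ["unit_tests", "browser_tests"],
}

def cq_jobs(builders_to_run):
    """List the (builder, target) pairs a CQ run would build and run."""
    return [(b, t) for b in builders_to_run for t in TARGETS_BY_BUILDER[b]]

def manual_jobs(builder, target):
    """The manual flow only handles the single pair you care about."""
    return [(builder, target)]

full = cq_jobs(TARGETS_BY_BUILDER)               # full dry run: every builder
retry = cq_jobs(["mac-rel"])                     # retry on one bot: all its targets
manual = manual_jobs("mac-rel", "unit_tests")    # manual: one target on one bot
```
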
## Building an isolate

At the moment, you can only build an isolate locally, like so (commands you type
begin with `$`):

```
$ tools/mb/mb.py isolate //$outdir $target
```

This will produce some files in `$outdir`. The two most pertinent are
`$outdir/$target.isolate` and `$outdir/$target.isolated`.

Support for building an isolate using swarming, which would allow you to build
for a platform you can't build for locally, does not yet exist.

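If you script this step, the command and the two output paths can be derived from the same inputs. The sketch below only assembles strings; `out/Release` and `unit_tests` are example values, and the helper names are hypothetical rather than part of `mb.py`.

```python
import shlex
from pathlib import PurePosixPath

def mb_isolate_command(outdir, target):
    """Return argv for `tools/mb/mb.py isolate //$outdir $target`."""
    return ["tools/mb/mb.py", "isolate", "//" + outdir, target]

def isolate_outputs(outdir, target):
    """Return the paths of the .isolate and .isolated files described above."""
    base = PurePosixPath(outdir)
    return (str(base / (target + ".isolate")),
            str(base / (target + ".isolated")))

# Example values only; substitute your own output directory and target.
argv = mb_isolate_command("out/Release", "unit_tests")
command_line = shlex.join(argv)
isolate_file, isolated_file = isolate_outputs("out/Release", "unit_tests")
```

Passing `argv` to `subprocess.run(argv, check=True)` from the root of the checkout would run the same command shown above.
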
## Uploading an isolate

You can then upload the resulting isolate to the isolate server:

```
$ tools/swarming_client/isolate.py archive \
    -I https://isolateserver.appspot.com \
    -i $outdir/$target.isolate \
    -s $outdir/$target.isolated
```

You may need to log in to `https://isolateserver.appspot.com` to do this:

```
$ python tools/swarming_client/auth.py login \
    --service=https://isolateserver.appspot.com
```

The `isolate.py` tool will emit something like this:

```
e625130b712096e3908266252c8cd779d7f442f1 unit_tests
```

Do not press Ctrl-C after it prints this, even if it seems to hang for a
minute; just let it finish.

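When scripting the upload, the hash can be pulled out of that output line. This is a minimal sketch that assumes every relevant line has the `<hash> <name>` shape of the example above, with a 40-character lowercase hex hash; the real tool may print other lines, which this parser simply skips.

```python
import re

# Assumption: a 40-character lowercase hex hash, matching the example output.
ARCHIVE_LINE = re.compile(r"([0-9a-f]{40})\s+(\S+)")

def parse_archive_output(output):
    """Map target name -> hash for each '<hash> <name>' line in the output."""
    hashes = {}
    for line in output.splitlines():
        match = ARCHIVE_LINE.fullmatch(line.strip())
        if match:
            hashes[match.group(2)] = match.group(1)
    return hashes

sample = "e625130b712096e3908266252c8cd779d7f442f1 unit_tests\n"
hashes = parse_archive_output(sample)
```
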
## Running an isolate

Now that the isolate is on the isolate server with hash `$hash` from the
previous step, you can run on bots of your choice:

```
$ tools/swarming_client/swarming.py trigger \
    -S https://chromium-swarm.appspot.com \
    -I https://isolateserver.appspot.com \
    -d pool $pool \
    $criteria \
    -s $hash
```

There are two more things you need to fill in here. The first is the pool name;
you should pick "Chrome" unless you know otherwise. The pool is the collection
of hosts from which swarming will try to pick bots to run your tasks.

The second is the criteria, which is how you specify which bot(s) you want your
task scheduled on. Criteria are expressed as "dimensions", each given with
`-d key val` or `--dimension=key val`. In fact, the `-d pool $pool` in the
command above is selecting based on the "pool" dimension. There are a lot of
possible dimensions; one useful one is "os", like `-d os Linux`. Examples of
other dimensions include:

* `-d os Mac10.13.6` to select a specific OS version
* `-d device_type "Pixel 3"` to select a specific Android device type
* `-d gpu 8086:1912` to select a specific GPU

The [swarming bot list] allows you to see all the dimensions and the values they
can take on.

If you need to pass additional arguments to the test, simply add
`-- $extra_args` to the end of the `swarming.py trigger` command line. Anything
after the `--` will be passed directly to the test.

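The pieces described above (server flags, one `-d key val` pair per dimension, the hash, and optional arguments after `--`) can be assembled programmatically. The sketch below uses only the flags shown in this document; the helper name is hypothetical, and the `--gtest_filter` value is just an example of an extra test argument.

```python
def trigger_command(isolated_hash, dimensions, extra_args=()):
    """Build argv for `swarming.py trigger` from a dict of dimensions."""
    argv = [
        "tools/swarming_client/swarming.py", "trigger",
        "-S", "https://chromium-swarm.appspot.com",
        "-I", "https://isolateserver.appspot.com",
    ]
    for key, value in dimensions.items():
        argv += ["-d", key, value]  # one `-d key val` pair per dimension
    argv += ["-s", isolated_hash]
    if extra_args:
        argv += ["--", *extra_args]  # everything after `--` goes to the test
    return argv

argv = trigger_command(
    "e625130b712096e3908266252c8cd779d7f442f1",
    {"pool": "Chrome", "os": "Mac10.13.6"},
    extra_args=["--gtest_filter=Foo.*"],  # example value only
)
```
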
When you invoke `swarming.py trigger`, it will emit two pieces of information: a
URL for the task it created, and a command you can run to collect the results of
that task. For example:

```
Triggered task: [email protected]/os=Linux_pool=Chrome/e625130b712096e3908266252c8cd779d7f442f1
To collect results, use:
  tools/swarming_client/swarming.py collect -S https://chromium-swarm.appspot.com 46fc393777163310
Or visit:
  https://chromium-swarm.appspot.com/user/task/46fc393777163310
```

The `collect` command given there will block until the task is complete, then
produce the task's results. Alternatively, you can load that URL and watch the
task's progress.

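If you capture the output of `swarming.py trigger` in a script, the task ID can be recovered from the task URL and turned back into a `collect` command. This sketch assumes the output contains a URL of the form shown above and that task IDs are lowercase hex; the helper name is hypothetical.

```python
import re

# Assumption: task URLs look like the example above, with a hex task ID.
TASK_URL = re.compile(
    r"https://chromium-swarm\.appspot\.com/user/task/([0-9a-f]+)")

def collect_command(trigger_output):
    """Build argv for `swarming.py collect` from captured trigger output."""
    match = TASK_URL.search(trigger_output)
    if match is None:
        raise ValueError("no task URL found in trigger output")
    return [
        "tools/swarming_client/swarming.py", "collect",
        "-S", "https://chromium-swarm.appspot.com",
        match.group(1),
    ]

sample_output = """\
To collect results, use:
  tools/swarming_client/swarming.py collect -S https://chromium-swarm.appspot.com 46fc393777163310
Or visit:
  https://chromium-swarm.appspot.com/user/task/46fc393777163310
"""
collect_argv = collect_command(sample_output)
```
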
## run-swarmed.py

A lot of this logic is wrapped up in `tools/run-swarmed.py`, which you can run
like this:

```
$ tools/run-swarmed.py $outdir $target
```

See the `--help` option of `run-swarmed.py` for more details about that script.

## mb.py run

Similar to `tools/run-swarmed.py`, `mb.py run` bundles much of the logic into a
single command line. Unlike `tools/run-swarmed.py`, `mb.py run` allows the user
to specify extra arguments to pass to the test, but has a messier command line.

To use it, run:

```
$ tools/mb/mb.py run \
    -s --no-default-dimensions \
    -d pool $pool \
    $criteria \
    $outdir $target \
    -- $extra_args
```

## Other notes

If you are looking at a Swarming task page, be sure to check the bottom of the
page, which gives you commands to:

* Download the contents of the isolate the task used
* Reproduce the task's configuration locally
* Download all output results from the task locally

[borg]: https://ai.google/research/pubs/pub43438
[kubernetes]: https://kubernetes.io/
[swarming bot list]: https://chromium-swarm.appspot.com/botlist