# Debugging with Swarming

This document outlines how to debug a test failure on a specific builder
configuration without needing to repeatedly upload new CL revisions or do CQ dry
runs.

[TOC]

## Overview & Terms

*Swarming* is a system operated by the infra team that schedules and runs tasks
under a specific set of constraints, like "this must run on a macOS 10.13 host"
or "this must run on a host with an Intel GPU". It is somewhat similar to part
of [Borg], or to [Kubernetes].

An *isolate* is an archive containing all the files needed to do a specific task
on the swarming infrastructure. It contains binaries as well as any libraries
they link against or support data. An isolate can be thought of like a tarball,
but held by the "isolate server" and identified by a hash of its contents. The
isolate also includes the command(s) to run, which is why the command is
specified when building the isolate, not when executing it.

Normally, when you do a CQ dry run, something like this happens:

```
  for type in builders_to_run:
    targets = compute_targets_for(type)
    isolates = use_swarming_to_build(type, targets) # uploads isolates for targets
    wait_for_swarming_to_be_done()

    for isolate in isolates:
      use_swarming_to_run(type, isolate) # downloads isolates onto the bots used
      wait_for_swarming_to_be_done()
```

When you do a CQ retry on a specific set of bots, that simply constrains
`builders_to_run` in the pseudocode above. However, if you're trying to rerun a
specific target on a specific bot, because you're trying to reproduce a failure
or debug, doing a CQ retry will still waste a lot of time - the retry will still
build and run *all* targets, even if it's only for one bot.

Fortunately, you can manually invoke some steps of this process. What you really
want to do is:

```
  isolate = use_swarming_to_build(type, target) # can't do this yet, see below
  use_swarming_to_run(type, isolate)
```

or perhaps:

```
  isolate = upload_to_isolate_server(target_you_built_locally)
  use_swarming_to_run(type, isolate)
```

## The easy way

A lot of the steps described in this doc have been bundled up into two
tools. Before using either of these you will need to
[authenticate](#authenticating).

### run-swarmed.py

A lot of the logic below is wrapped up in `tools/run-swarmed.py`, which you can run
like this:

```
$ tools/run-swarmed.py $outdir $target
```

See the `--help` option of `run-swarmed.py` for more details about that script.

### mb.py run

Similar to `tools/run-swarmed.py`, `mb.py run` bundles much of the logic into a
single command line. Unlike `tools/run-swarmed.py`, `mb.py run` allows the user
to specify extra arguments to pass to the test, but has a messier command line.

To use it, run:
```
$ tools/mb/mb.py run \
    -s --no-default-dimensions \
    -d pool $pool \
    $criteria \
    $outdir $target \
    -- $extra_args
```

## A concrete example

Here's how to run `chrome_public_test_apk` on a bot with a Nexus 5 running KitKat.

```sh
$ tools/mb/mb.py run \
    -s --no-default-dimensions \
    -d pool Chrome \
    -d device_os_type userdebug -d device_os KTU84P -d device_type hammerhead \
    out/Android-arm-dbg chrome_public_test_apk
```

This assumes you have an `out/Android-arm-dbg/args.gn` like

```
ffmpeg_branding = "Chrome"
is_component_build = false
is_debug = true
proprietary_codecs = true
strip_absolute_paths_from_debug_symbols = true
symbol_level = 1
system_webview_package_name = "com.google.android.webview"
target_os = "android"
use_goma = true
```

## Bot selection criteria

The examples in this doc use `$criteria`. To figure out what values to use, you
can go to an existing swarming run
([recent tasks page](https://chromium-swarm.appspot.com/tasklist)) and
look at the `Dimensions` section. Each of these becomes a `-d dimension_name
dimension_value` in your `$criteria`. Click on `bots` (or go
[here](https://chromium-swarm.appspot.com/botlist)) to be taken to a UI that
allows you to try out the criteria interactively, so that you can be sure that
there are bots matching your criteria. Sometimes the web page shows a
human-friendly name rather than the name required on the commandline. [This
file](https://cs.chromium.org/chromium/infra/luci/appengine/swarming/ui2/modules/alias.js)
contains the mapping to human-friendly names. You can test your commandline by
entering `dimension_name:dimension_value` in the interactive UI.
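
Since the interactive UI takes `dimension_name:dimension_value` while the
command line takes `-d dimension_name dimension_value`, a small conversion
helper can save some retyping. The following is only a sketch:
`dims_to_criteria` is a hypothetical function, not something shipped in the
Chromium tree, and it assumes that neither names nor values contain spaces.

```sh
# Hypothetical helper (not part of the Chromium tree): turn the
# `dimension_name:dimension_value` strings accepted by the bot list UI into
# the `-d dimension_name dimension_value` form used for $criteria.
dims_to_criteria() {
  out=""
  for dim in "$@"; do
    name=${dim%%:*}   # text before the first colon
    value=${dim#*:}   # text after the first colon (may itself contain colons)
    out="${out:+$out }-d $name $value"
  done
  printf '%s\n' "$out"
}

# Example using dimension values that appear elsewhere in this document.
criteria=$(dims_to_criteria "os:Mac10.13.6" "gpu:8086:1912")
echo "$criteria"
```

Values containing spaces, such as a device type of `Pixel 3`, would need
explicit quoting on the real command line and are not handled by this sketch.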

## Building an isolate

At the moment, you can only build an isolate locally, like so (commands you type
begin with `$`):

```
$ tools/mb/mb.py isolate //$outdir $target
```

This will produce some files in `$outdir`. The most pertinent two are
`$outdir/$target.isolate` and `$outdir/$target.isolated`. If you've already built
`$target`, you can save some CPU time and run `tools/mb/mb.py` with `--no-build`:

```
$ tools/mb/mb.py isolate --no-build //$outdir $target
```

Support for building an isolate using swarming, which would allow you to build
for a platform you can't build for locally, does not yet exist.
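
Before moving on to the upload step, it can be worth confirming that the two
files mentioned above, `$outdir/$target.isolate` and
`$outdir/$target.isolated`, were actually produced. A minimal sketch, where
`check_isolate_outputs` is a hypothetical helper name introduced only for
illustration:

```sh
# Hypothetical helper: report whether both isolate files exist for a target.
# Returns 0 if both are present and 1 if either is missing.
check_isolate_outputs() {
  outdir=$1
  target=$2
  status=0
  for ext in isolate isolated; do
    file="$outdir/$target.$ext"
    if [ -f "$file" ]; then
      echo "found:   $file"
    else
      echo "missing: $file"
      status=1
    fi
  done
  return "$status"
}
```

For the concrete example earlier in this document, that would be called as
`check_isolate_outputs out/Android-arm-dbg chrome_public_test_apk`.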

## Authenticating

You may need to log in to `https://isolateserver.appspot.com` to do this:

```
$ python tools/swarming_client/auth.py login \
    --service=https://isolateserver.appspot.com
```

Use your google.com account for this.

## Uploading an isolate

You can then upload the resulting isolate to the isolate server:

```
$ tools/swarming_client/isolate.py archive \
    -I https://isolateserver.appspot.com \
    -i $outdir/$target.isolate \
    -s $outdir/$target.isolated
```

The `isolate.py` tool will emit something like this:

```
e625130b712096e3908266252c8cd779d7f442f1 unit_tests
```

Do not ctrl-c it after it does this, even if it seems to be hanging for a
minute - just let it finish.
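
The first field of that output line is the hash you will need when triggering
the task. If you saved the tool's output, something like the sketch below can
pull the hash out. `hash_from_archive_output` is a hypothetical helper, and it
assumes the `<hash> <target>` line is the last non-empty line of the text fed
to it.

```sh
# Hypothetical helper: read isolate.py output on stdin and print the first
# whitespace-separated field of the last non-empty line.
hash_from_archive_output() {
  awk 'NF { h = $1 } END { print h }'
}

# Example using the sample output line shown in this document.
hash=$(printf '%s\n' "e625130b712096e3908266252c8cd779d7f442f1 unit_tests" \
  | hash_from_archive_output)
echo "$hash"
```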

## Running an isolate

Now that the isolate is on the isolate server with hash `$hash` from the
previous step, you can run on bots of your choice:

```
$ tools/swarming_client/swarming.py trigger \
    -S https://chromium-swarm.appspot.com \
    -I https://isolateserver.appspot.com \
    -d pool $pool \
    $criteria \
    -s $hash
```

There are two more things you need to fill in here. The first is the pool name;
you should pick "Chrome" unless you know otherwise. The pool is the collection
of hosts from which swarming will try to pick bots to run your tasks.

The second is the criteria, which is how you specify which bot(s) you want your
task scheduled on. These are specified via "dimensions", which are specified
with `-d key val` or `--dimension=key val`. In fact, the `-d pool $pool` in the
command above is selecting based on the "pool" dimension. There are a lot of
possible dimensions; one useful one is "os", like `-d os Linux`. Examples of
other dimensions include:

* `-d os Mac10.13.6` to select a specific OS version
* `-d device_type "Pixel 3"` to select a specific Android device type
* `-d gpu 8086:1912` to select a specific GPU

The [swarming bot list] allows you to see all the dimensions and the values they
can take on.

If you need to pass additional arguments to the test, simply add
`-- $extra_args` to the end of the `swarming.py trigger` command line - anything
after the `--` will be passed directly to the test.
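
To see how the pool, criteria, hash and extra arguments fit together, here is
a dry-run sketch that only prints the resulting command line rather than
running it. `print_trigger_command` is a hypothetical helper, and the
`--gtest_filter` argument in the example is just an illustrative test
argument; substitute whatever your test accepts.

```sh
# Hypothetical dry-run helper: print, but do not run, the trigger command.
# Usage: print_trigger_command POOL HASH [CRITERIA...] [-- EXTRA_ARGS...]
print_trigger_command() {
  pool=$1
  hash=$2
  shift 2
  cmd="tools/swarming_client/swarming.py trigger"
  cmd="$cmd -S https://chromium-swarm.appspot.com"
  cmd="$cmd -I https://isolateserver.appspot.com"
  cmd="$cmd -d pool $pool"
  hash_added=no
  for arg in "$@"; do
    # Place `-s $hash` after the criteria but before the first `--`.
    if [ "$arg" = "--" ] && [ "$hash_added" = no ]; then
      cmd="$cmd -s $hash"
      hash_added=yes
    fi
    cmd="$cmd $arg"
  done
  # No `--` was given, so the hash goes at the very end.
  [ "$hash_added" = yes ] || cmd="$cmd -s $hash"
  printf '%s\n' "$cmd"
}

# Example: one criterion plus one extra argument for the test.
print_trigger_command Chrome "\$hash" -d os Linux -- "--gtest_filter=Foo.*"
```

The printed line is not re-quoted, so arguments containing spaces (for example
`"Pixel 3"`) need quoting added by hand before you paste it into a shell.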

When you invoke `swarming.py trigger`, it will emit two pieces of information: a
URL for the task it created, and a command you can run to collect the results of
that task. For example:

```
Triggered task: [email protected]/os=Linux_pool=Chrome/e625130b712096e3908266252c8cd779d7f442f1
To collect results, use:
  tools/swarming_client/swarming.py collect -S https://chromium-swarm.appspot.com 46fc393777163310
Or visit:
  https://chromium-swarm.appspot.com/user/task/46fc393777163310
```

The `collect` command given there will block until the task is complete, then
produce the task's results, or you can load that URL and watch the task's
progress.
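
If you want to script the collect step, the task ID can be recovered from the
task URL in the trigger output. This is a sketch only:
`task_id_from_trigger_output` is a hypothetical helper, and it assumes the
output contains a URL ending in `/user/task/<id>` as in the sample above.

```sh
# Hypothetical helper: read trigger output on stdin and print the task ID,
# i.e. the hexadecimal component that follows "/user/task/" in the task URL.
task_id_from_trigger_output() {
  sed -n 's|.*/user/task/\([0-9a-f][0-9a-f]*\)[[:space:]]*$|\1|p'
}

# Example using the sample output shown in this document.
task_id=$(printf '%s\n' \
  "To collect results, use:" \
  "Or visit:" \
  "  https://chromium-swarm.appspot.com/user/task/46fc393777163310" \
  | task_id_from_trigger_output)
echo "$task_id"
```

The resulting ID can then be passed to the `swarming.py collect` command that
the trigger output itself suggests.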

## Other notes

If you are looking at a Swarming task page, be sure to check the bottom of the
page, which gives you commands to:

* Download the contents of the isolate the task used
* Reproduce the task's configuration locally
* Download all output results from the task locally

[borg]: https://ai.google/research/pubs/pub43438
[kubernetes]: https://kubernetes.io/
[swarming bot list]: https://chromium-swarm.appspot.com/botlist