# Debugging with Swarming

This document outlines how to debug a test failure on a specific builder
configuration without needing to repeatedly upload new CL revisions or do CQ dry
runs.

[TOC]

## Overview & Terms

*Swarming* is a system operated by the infra team that schedules and runs tasks
under a specific set of constraints, like "this must run on a macOS 10.13 host"
or "this must run on a host with an Intel GPU". It is somewhat similar to part
of [Borg], or to [Kubernetes].

An *isolate* is an archive containing all the files needed to run a specific
task on the Swarming infrastructure: the binaries, plus any libraries they link
against and any data files they need. You can think of an isolate as a tarball
that is held by the "isolate server" and identified by a hash of its contents.
The isolate also includes the command(s) to run, which is why the command is
specified when building the isolate, not when executing it.
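
The toy Python model below makes "identified by a hash of its contents" concrete. It is not the real `.isolated` format. The manifest layout and the name `toy_isolate_digest` are hypothetical, and SHA-1 is assumed only because the example hash later in this document is 40 hex characters long. The model shows two properties described above: the same files and command always produce the same digest, and changing the command produces a different isolate.

```python
import hashlib
import json


def toy_isolate_digest(files, command):
    """Return a hex digest identifying a toy "isolate".

    Hypothetical model, not the real .isolated format: `files` maps relative
    paths to file contents (bytes), `command` is the argv to run, and the
    digest is a SHA-1 over a JSON manifest of per-file SHA-1s plus the command.
    """
    manifest = {
        "command": command,
        "files": {path: hashlib.sha1(data).hexdigest()
                  for path, data in files.items()},
    }
    # sort_keys gives a deterministic encoding, so equal inputs always
    # produce equal digests regardless of dict insertion order.
    encoded = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest()


files = {"unit_tests": b"test binary bytes", "libbase.so": b"library bytes"}
first = toy_isolate_digest(files, ["./unit_tests"])
same = toy_isolate_digest(dict(reversed(list(files.items()))), ["./unit_tests"])
other_cmd = toy_isolate_digest(files, ["./unit_tests", "--gtest_filter=Foo.*"])
print(first == same)       # True: same files and command, same digest
print(first == other_cmd)  # False: a different command is a different isolate
```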

Normally, when you do a CQ dry run, something like this happens:

```
for type in builders_to_run:
  targets = compute_targets_for(type)
  isolates = use_swarming_to_build(type, targets)  # uploads isolates for targets
  wait_for_swarming_to_be_done()

  for isolate in isolates:
    use_swarming_to_run(type, isolate)  # downloads isolates onto the bots used
  wait_for_swarming_to_be_done()
```

When you do a CQ retry on a specific set of bots, that simply constrains
`builders_to_run` in the pseudocode above. However, if you're trying to rerun a
specific target on a specific bot to reproduce or debug a failure, a CQ retry
still wastes a lot of time: it builds and runs *all* targets, even if it's only
for one bot.
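
The cost difference can be modeled with a small runnable version of the pseudocode above. The builder names, target lists, and helper names are all hypothetical. The model only shows that restricting `builders_to_run` still schedules every target for the chosen builder, while a targeted rerun schedules exactly one.

```python
# Hypothetical mapping from builder to the test targets it builds and runs.
TARGETS_BY_BUILDER = {
    "linux-rel": ["unit_tests", "browser_tests", "content_unittests"],
    "mac-rel": ["unit_tests", "browser_tests"],
}


def compute_targets_for(builder):
    # Hypothetical stand-in for the CQ's target computation.
    return TARGETS_BY_BUILDER[builder]


def dry_run(builders_to_run):
    """Return every (builder, target) pair that is built and then run."""
    scheduled = []
    for builder in builders_to_run:
        for target in compute_targets_for(builder):
            scheduled.append((builder, target))
    return scheduled


def targeted_rerun(builder, target):
    """What you usually want while debugging: one target on one builder."""
    return [(builder, target)]


full = dry_run(["linux-rel", "mac-rel"])
retry = dry_run(["linux-rel"])  # a CQ retry constrained to one builder
focused = targeted_rerun("linux-rel", "unit_tests")
print(len(full), len(retry), len(focused))  # prints: 5 3 1
```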

Fortunately, you can invoke some steps of this process manually. What you
really want to do is:

```
isolate = use_swarming_to_build(type, target)  # can't do this yet, see below
use_swarming_to_run(type, isolate)
```

or perhaps:

```
isolate = upload_to_isolate_server(target_you_built_locally)
use_swarming_to_run(type, isolate)
```

## Building an isolate

At the moment, you can only build an isolate locally, like so (commands you type
begin with `$`):

```
$ tools/mb/mb.py isolate //$outdir $target
```

This produces several files in `$outdir`. The two most important are
`$outdir/$target.isolate` and `$outdir/$target.isolated`.
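
If you script this step, a small Python helper can build the same command line and predict the two output paths. This is a sketch. `isolate_command` and `build_isolate` are hypothetical names, `out/Default` and `unit_tests` are example values, and it assumes you run it from the `src/` directory of a Chromium checkout.

```python
import subprocess


def isolate_command(outdir, target):
    """Hypothetical helper: argv and outputs for `mb.py isolate //$outdir $target`.

    Paths are relative to the Chromium src/ directory.
    """
    argv = ["tools/mb/mb.py", "isolate", "//" + outdir, target]
    outputs = [f"{outdir}/{target}.isolate", f"{outdir}/{target}.isolated"]
    return argv, outputs


def build_isolate(outdir, target):
    """Run mb.py (raising CalledProcessError on failure); return the outputs."""
    argv, outputs = isolate_command(outdir, target)
    subprocess.run(argv, check=True)
    return outputs


argv, outputs = isolate_command("out/Default", "unit_tests")
print(" ".join(argv))  # tools/mb/mb.py isolate //out/Default unit_tests
print(outputs)
```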

Support for building an isolate using Swarming, which would let you build for a
platform you can't build for locally, does not exist yet.

## Uploading an isolate

You can then upload the resulting isolate to the isolate server:

```
$ tools/swarming_client/isolate.py archive \
    -I https://isolateserver.appspot.com \
    -i $outdir/$target.isolate \
    -s $outdir/$target.isolated
```
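
The same command can be expressed as a Python argument list, which avoids shell quoting and line continuations. `archive_argv` is a hypothetical helper. The flags are exactly the ones in the command above, and the server URL is the one this document uses.

```python
ISOLATE_SERVER = "https://isolateserver.appspot.com"


def archive_argv(outdir, target, server=ISOLATE_SERVER):
    """Hypothetical helper returning the `isolate.py archive` argv.

    -I names the isolate server, -i the .isolate file, and -s the
    .isolated file, matching the shell command above.
    """
    return [
        "tools/swarming_client/isolate.py", "archive",
        "-I", server,
        "-i", f"{outdir}/{target}.isolate",
        "-s", f"{outdir}/{target}.isolated",
    ]


print(" ".join(archive_argv("out/Default", "unit_tests")))
```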

You may need to log in to `https://isolateserver.appspot.com` first:

```
$ python tools/swarming_client/auth.py login \
    --service=https://isolateserver.appspot.com
```

The `isolate.py` tool will print something like this:

```
e625130b712096e3908266252c8cd779d7f442f1 unit_tests
```

Do not press Ctrl-C after it prints this, even if it seems to hang for a
minute; let it finish.
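
When scripting the upload, `subprocess.run()` waits for the process to exit, so the upload is never interrupted after the hash line appears. The sketch below then extracts the hash from the output. It assumes the `<hash> <target>` line format shown above, with a 40-character lowercase hex hash. It searches both stdout and stderr because this document does not say which stream that line is written to. The function names are hypothetical.

```python
import re
import subprocess

# Assumed format, taken from the example output above:
# "e625130b712096e3908266252c8cd779d7f442f1 unit_tests"
HASH_LINE = re.compile(r"^([0-9a-f]{40})\s+(\S+)$")


def parse_archive_output(text):
    """Hypothetical helper: return {target: hash} for each '<hash> <target>' line."""
    hashes = {}
    for line in text.splitlines():
        match = HASH_LINE.match(line.strip())
        if match:
            digest, target = match.groups()
            hashes[target] = digest
    return hashes


def archive_and_get_hashes(argv):
    """Run an `isolate.py archive` argv to completion and parse its output.

    subprocess.run() only returns once the process exits, so the upload
    finishes before any parsing happens.
    """
    proc = subprocess.run(argv, check=True, capture_output=True, text=True)
    return parse_archive_output(proc.stdout + "\n" + proc.stderr)


sample = "e625130b712096e3908266252c8cd779d7f442f1 unit_tests\n"
print(parse_archive_output(sample))
# {'unit_tests': 'e625130b712096e3908266252c8cd779d7f442f1'}
```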

## Running an isolate

Now that the isolate is on the isolate server with hash `$hash` from the
previous step, you can run it on bots of your choice:

```
$ tools/swarming_client/swarming.py trigger \
    -S https://chromium-swarm.appspot.com \
    -I https://isolateserver.appspot.com \
    -d pool $pool \
    $criteria \
    -s $hash
```

There are two more things you need to fill in here. The first is the pool name;
pick "Chrome" unless you know otherwise. The pool is the collection of hosts
from which Swarming will try to pick bots to run your tasks.

The second is the criteria, which specify the bot(s) your task should be
scheduled on. Criteria are expressed as "dimensions", passed with `-d key val`
or `--dimension=key val`. In fact, the `-d pool $pool` in the command above
selects bots by the "pool" dimension. There are many possible dimensions; a
useful one is "os", as in `-d os Linux`. Other examples include:

* `-d os Mac10.13.6` to select a specific OS version
* `-d device_type "Pixel 3"` to select a specific Android device type
* `-d gpu 8086:1912` to select a specific GPU

The [swarming bot list] shows all the dimensions and the values they can take.
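
Building the trigger command from a dictionary of dimensions makes the `-d key val` pattern explicit, including the fact that `pool` is just another dimension. This is a sketch with hypothetical helper names. The server URLs and flags are the ones from the command above. Because the result is an argument list, a value containing a space, such as `Pixel 3`, stays a single argument. `shlex.join` adds the shell quoting back if you want to paste the command into a terminal.

```python
import shlex

SWARMING_SERVER = "https://chromium-swarm.appspot.com"
ISOLATE_SERVER = "https://isolateserver.appspot.com"


def dimension_flags(dimensions):
    """Hypothetical helper: {"os": "Linux"} -> ["-d", "os", "Linux"]."""
    flags = []
    for key, value in dimensions.items():
        flags += ["-d", key, value]
    return flags


def trigger_argv(isolated_hash, dimensions):
    """Hypothetical helper returning the `swarming.py trigger` argv."""
    return (["tools/swarming_client/swarming.py", "trigger",
             "-S", SWARMING_SERVER, "-I", ISOLATE_SERVER]
            + dimension_flags(dimensions)
            + ["-s", isolated_hash])


cmd = trigger_argv("e625130b712096e3908266252c8cd779d7f442f1",
                   {"pool": "Chrome", "os": "Linux"})
print(shlex.join(cmd))
print(shlex.join(dimension_flags({"device_type": "Pixel 3"})))
# -d device_type 'Pixel 3'
```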

If you need to pass additional arguments to the test, add `-- $extra_args` to
the end of the `swarming.py trigger` command line. Anything after the `--` is
passed directly to the test.
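
A helper for the `--` convention appends test arguments after a separator, so that `swarming.py` itself does not try to parse them. `with_extra_args` is a hypothetical name. The base command is abbreviated (the `-S`, `-I`, and `-d` flags are omitted). The gtest flags are only illustrative arguments for a gtest-based target such as `unit_tests`.

```python
import shlex


def with_extra_args(argv, extra_args):
    """Hypothetical helper: return a copy of argv with extra_args after `--`.

    If there are no extra arguments, the copy is returned without a `--`.
    """
    argv = list(argv)
    if extra_args:
        argv += ["--", *extra_args]
    return argv


# Abbreviated trigger command; the real one also needs -S, -I and -d flags.
base = ["tools/swarming_client/swarming.py", "trigger",
        "-s", "e625130b712096e3908266252c8cd779d7f442f1"]
cmd = with_extra_args(base, ["--gtest_filter=FooTest.*", "--gtest_repeat=10"])
print(shlex.join(cmd))
```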

When you invoke `swarming.py trigger`, it prints two pieces of information: a
URL for the task it created, and a command you can run to collect the results of
that task. For example:

```
Triggered task: [email protected]/os=Linux_pool=Chrome/e625130b712096e3908266252c8cd779d7f442f1
To collect results, use:
  tools/swarming_client/swarming.py collect -S https://chromium-swarm.appspot.com 46fc393777163310
Or visit:
  https://chromium-swarm.appspot.com/user/task/46fc393777163310
```

The `collect` command blocks until the task is complete and then prints the
task's results. Alternatively, load the URL to watch the task's progress.

## run-swarmed.py

A lot of this logic is wrapped up in `tools/run-swarmed.py`, which you can run
like this:

```
$ tools/run-swarmed.py $outdir $target
```

See the `--help` output of `run-swarmed.py` for more details about that script.

## mb.py run

Like `tools/run-swarmed.py`, `mb.py run` bundles much of this logic into a
single command line. Unlike `tools/run-swarmed.py`, `mb.py run` lets you specify
extra arguments to pass to the test, but its command line is messier.

To use it, run:

```
$ tools/mb/mb.py run \
    -s --no-default-dimensions \
    -d pool $pool \
    $criteria \
    $outdir $target \
    -- $extra_args
```
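
The same invocation can be built as an argument list for scripting. This is a sketch with a hypothetical helper name. The flags are copied from the command above, and `$criteria` is modeled as a dictionary of extra dimensions. The `--` separator is added only when there are extra arguments; this is an assumption, since the command above always includes it.

```python
import shlex


def mb_run_argv(outdir, target, pool="Chrome", dimensions=None, extra_args=()):
    """Hypothetical helper returning the `tools/mb/mb.py run` argv shown above.

    `dimensions` holds the $criteria as {"key": "value"}; each entry becomes
    `-d key value` after the pool dimension.
    """
    argv = ["tools/mb/mb.py", "run", "-s", "--no-default-dimensions",
            "-d", "pool", pool]
    for key, value in (dimensions or {}).items():
        argv += ["-d", key, value]
    argv += [outdir, target]
    if extra_args:
        argv += ["--", *extra_args]
    return argv


cmd = mb_run_argv("out/Default", "unit_tests", dimensions={"os": "Linux"},
                  extra_args=["--gtest_filter=FooTest.*"])
print(shlex.join(cmd))
```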

## Other notes

If you are looking at a Swarming task page, be sure to check the bottom of the
page, which gives you commands to:

* Download the contents of the isolate the task used
* Reproduce the task's configuration locally
* Download all output results from the task locally

[borg]: https://ai.google/research/pubs/pub43438
[kubernetes]: https://kubernetes.io/
[swarming bot list]: https://chromium-swarm.appspot.com/botlist