# Debugging with Swarming

This document outlines how to debug a test failure on a specific builder
configuration on Swarming using the [UTR tool](../../tools/utr/README.md),
without needing to repeatedly upload new CL revisions or do CQ dry runs. The
tool automatically handles steps such as replicating the right GN args, building
and uploading the test isolate, and triggering and collecting the Swarming test
tasks.

[TOC]

## Overview & Terms

*Swarming* is a system operated by the infra team that schedules and runs tasks
under a specific set of constraints, like "this must run on a macOS 10.13 host"
or "this must run on a host with an Intel GPU". It is somewhat similar to part
of [Borg], or to [Kubernetes].
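Swarming expresses these constraints as *dimensions*: key/value pairs that a task requests and that each bot advertises. The following is a rough, hypothetical sketch and not Swarming's actual implementation. It models matching a task to bots as a simple subset check. The dimension keys and values (`os`, `gpu`, `Mac-10.13`, `intel`) and the bot IDs are illustrative rather than the exact strings real bots report. The real scheduler also handles priorities, expiration, and much more.

```python
from typing import Dict, List

# Hypothetical model: a bot advertises a list of values per dimension key,
# and a task requires one specific value for each key it cares about.
BotDimensions = Dict[str, List[str]]
TaskDimensions = Dict[str, str]


def bot_can_run(bot: BotDimensions, task: TaskDimensions) -> bool:
    """Return True if the bot advertises every dimension the task requires."""
    return all(value in bot.get(key, []) for key, value in task.items())


def eligible_bots(bots: Dict[str, BotDimensions],
                  task: TaskDimensions) -> List[str]:
    """Return the sorted IDs of all bots that satisfy the task's constraints."""
    return sorted(bot_id for bot_id, dims in bots.items()
                  if bot_can_run(dims, task))


if __name__ == "__main__":
    # Illustrative bots; real bots report many more dimensions.
    bots = {
        "mac-bot-1": {"os": ["Mac", "Mac-10.13"], "gpu": ["intel"]},
        "mac-bot-2": {"os": ["Mac", "Mac-10.14"], "gpu": ["amd"]},
        "linux-bot-1": {"os": ["Linux", "Ubuntu-18.04"], "gpu": ["intel"]},
    }
    # "This must run on a macOS 10.13 host."
    print(eligible_bots(bots, {"os": "Mac-10.13"}))  # ['mac-bot-1']
    # "This must run on a host with an Intel GPU."
    print(eligible_bots(bots, {"gpu": "intel"}))  # ['linux-bot-1', 'mac-bot-1']
```

A bot may advertise several values for one key (for example both a generic and a specific OS version). Because of this, a task can ask for a broad or a narrow match without the bot having to know which one will be requested.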

An *isolate* is an archive containing all the files needed to do a specific task
on the Swarming infrastructure. It contains binaries as well as any libraries
they link against and any supporting data. An isolate can be thought of as a
tarball, except that it is held by the CAS (content-addressed storage) server
and identified by a digest of its contents. The isolate also includes the
command(s) to run, which is why the command is specified when building the
isolate, not when executing it. See the
[infra glossary](../infra/glossary.md) for the definitions of these terms and
more.
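Being "identified by a digest of its contents" means the same inputs always map to the same identifier. Identical content is therefore stored once and can be fetched by its digest alone. The toy sketch below models this with a SHA-256 hash plus a size in bytes, similar in spirit to the digests used by Remote Execution API CAS servers. `ToyCas` and `isolate_digest` are hypothetical names, not a real client API. The real CAS stores directory trees as hashed protobuf messages rather than the JSON manifest used here. Folding the command into the digest shows why the command is fixed when the isolate is built.

```python
import hashlib
import json
from typing import Dict, List, Tuple

# A digest is modeled as (SHA-256 hex string, size in bytes).
Digest = Tuple[str, int]


def digest_of(data: bytes) -> Digest:
    """Identical bytes always produce an identical digest."""
    return hashlib.sha256(data).hexdigest(), len(data)


class ToyCas:
    """Hypothetical in-memory content-addressed store (not a real client)."""

    def __init__(self) -> None:
        self._blobs: Dict[Digest, bytes] = {}

    def upload(self, data: bytes) -> Digest:
        d = digest_of(data)
        self._blobs.setdefault(d, data)  # re-uploading the same bytes is a no-op
        return d

    def download(self, d: Digest) -> bytes:
        return self._blobs[d]

    def __len__(self) -> int:
        return len(self._blobs)


def isolate_digest(files: Dict[str, bytes], command: List[str]) -> Digest:
    """Digest a toy 'isolate': per-file digests plus the command to run."""
    manifest = {
        "files": {path: digest_of(data)[0] for path, data in files.items()},
        "command": command,
    }
    # sort_keys makes the serialization, and hence the digest, deterministic.
    return digest_of(json.dumps(manifest, sort_keys=True).encode())


if __name__ == "__main__":
    cas = ToyCas()
    first = cas.upload(b"abc")
    second = cas.upload(b"abc")
    print(first == second, len(cas))  # True 1
    files = {"out/base_unittests": b"binary contents"}
    # Changing only the command yields a different isolate digest.
    print(isolate_digest(files, ["./base_unittests"]) ==
          isolate_digest(files, ["./base_unittests", "--gtest_filter=Foo.*"]))  # False
```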

Normally, when you do a CQ dry run, something like this happens:

```
  for type in builders_to_run:
    targets = compute_targets_for(type)
    isolates = use_swarming_to_build(type, targets) # uploads isolates for targets
    wait_for_swarming_to_be_done()

    for isolate in isolates:
      use_swarming_to_run(type, isolate) # downloads isolates onto the bots used
    wait_for_swarming_to_be_done()
```

When you do a CQ retry on a specific set of bots, that simply constrains
`builders_to_run` in the pseudocode above. However, if you are trying to rerun a
specific target on a specific bot to reproduce or debug a failure, a CQ retry
still wastes a lot of time: the retry builds and runs *all* targets, even if it
is only for one bot.
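To make the cost difference concrete, here is a toy, runnable version of the dry-run pseudocode that only counts work instead of doing it. `Work` and `cost` are hypothetical names. It assumes one isolate build and one Swarming run per target, and the target names are illustrative. Constraining `builders` models a CQ retry. Constraining the targets as well models the targeted rerun you actually want.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Work:
    """How many isolates get built and how many Swarming tasks get run."""
    builds: int = 0
    runs: int = 0


def cost(builders: List[str],
         targets_for: Callable[[str], List[str]]) -> Work:
    """Count the work done by the dry-run loop, without doing any of it.

    Assumes one isolate build and one Swarming run per target.
    """
    work = Work()
    for builder in builders:
        targets = targets_for(builder)
        work.builds += len(targets)  # use_swarming_to_build(type, targets)
        work.runs += len(targets)    # one use_swarming_to_run per isolate
    return work


if __name__ == "__main__":
    # Illustrative target list; real builders compile many more targets.
    all_targets: Dict[str, List[str]] = {
        "android-arm64-rel": [
            "base_unittests",
            "chrome_public_unit_test_apk",
            "components_unittests",
            "content_unittests",
        ],
    }
    # CQ retry on one builder: constrains builders, but still does every target.
    print(cost(["android-arm64-rel"], lambda b: all_targets[b]))
    # Work(builds=4, runs=4)
    # Targeted rerun: constrains both the builder and the target.
    print(cost(["android-arm64-rel"], lambda b: ["chrome_public_unit_test_apk"]))
    # Work(builds=1, runs=1)
```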

Fortunately, you can manually invoke some steps of this process. What you really
want to do is:

```
  isolate = use_swarming_to_build(type, target) # can't do this yet, see below
  use_swarming_to_run(type, isolate)
```

or perhaps:

```
  isolate = upload_to_cas(target_you_built_locally)
  use_swarming_to_run(type, isolate)
```

## A concrete example

Here's how to run `chrome_public_unit_test_apk` on Android devices. By using the
config of the `android-arm64-rel` trybot, we can run it on Pixel 3 XLs running
Android Pie.

```sh
$ vpython3 tools/utr \
    -p chromium \
    -B try \
    -b android-arm64-rel \
    -t "chrome_public_unit_test_apk on Android device Pixel 3 XL" \
    compile-and-test
```

You can find the UTR invocation for any test in the build UI, under the step's
"reproduction instructions" (displayed by clicking the page icon in the UI).
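If you rerun the same test often, you may want to script the invocation. The sketch below builds the argument list for the UTR command shown above and can optionally run it with `subprocess`. It uses only the `-p`, `-B`, `-b` and `-t` flags and the `compile-and-test` mode from that example. `utr_argv` and `run_utr` are hypothetical helper names. The sketch assumes it runs from the root of a Chromium `src` checkout with `vpython3` on `PATH`. For other options, see the [UTR README](../../tools/utr/README.md).

```python
import shlex
import subprocess
from typing import List


def utr_argv(builder: str, test: str,
             project: str = "chromium", bucket: str = "try") -> List[str]:
    """Build the UTR command line from the example above (same flags only)."""
    return ["vpython3", "tools/utr",
            "-p", project,
            "-B", bucket,
            "-b", builder,
            "-t", test,
            "compile-and-test"]


def run_utr(builder: str, test: str) -> int:
    """Run UTR from the root of a Chromium src checkout; return its exit code."""
    argv = utr_argv(builder, test)
    print("Running:", shlex.join(argv))
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    argv = utr_argv("android-arm64-rel",
                    "chrome_public_unit_test_apk on Android device Pixel 3 XL")
    # Print the shell-quoted command instead of running it.
    print(shlex.join(argv))
```

Passing an argument list rather than a single shell string keeps the test name, which contains spaces, as one argument without any manual quoting.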

## Other notes

If you are looking at a Swarming task page, be sure to check the bottom of the
page, which gives you commands to:

* Download the contents of the isolate the task used
* Reproduce the task's configuration locally
* Download all output results from the task locally

[borg]: https://ai.google/research/pubs/pub43438
[kubernetes]: https://kubernetes.io/
[swarming bot list]: https://chromium-swarm.appspot.com/botlist

To find the repo checkout, GN args, etc. needed for a local compile, see
[How to repro bot failures](../testing/how_to_repro_bot_failures.md).