# Clang Gardening

Chromium bundles its own pre-built version of [Clang](clang.md) and
[Rust](rust.md). This is done so that Chromium developers have access to the
latest and greatest developer tools provided by Clang and LLVM (ASan, CFI,
coverage, etc.). In order to [update the compiler](updating_clang.md)
(roll clang), it has to be tested so that we can be confident that it works
in the configurations that Chromium cares about.

The Clang gardener is responsible for monitoring the health of the latest
versions of Clang + Rust and how they work with the latest version of
Chromium; raising any issues by filing bugs; addressing those issues or finding
someone to do so; and ultimately attempting to update the compiler by
performing [a Clang roll](updating_clang.md).

There are two main sources of information about the state of the build:

1. Buildbots on the [tip-of-tree clang
   waterfall](https://ci.chromium.org/p/chromium/g/chromium.clang/console)
   continuously build the latest version of Clang and use that to build and
   test Chromium in various build configurations. These provide the fastest
   signal about problems such as the compiler crashing, new warnings causing
   build failures, miscompiles causing test failures, etc. Unlike production
   buildbots, these build Clang with assertions enabled to detect as many
   problems as possible. (Clicking 'Log in' in the top right corner with a
   Google account will reveal a few more bots.)

1. Automatically generated [Clang roll
   CLs](https://chromium-review.googlesource.com/q/path:tools/clang/scripts/update.py)
   ("dry run CLs"). These are generated every few hours by a Cron job and
   attempt to package the latest version of Clang and Rust. That process can
   fail for many reasons, especially due to failures in the compilers' test
   suites. If the CLs stop being generated, that also needs to be addressed.

Issues should be filed in the [Chromium > Tools >
LLVM](https://g-issues.chromium.org/issues?q=status:open%20componentid:1457173)
bug tracker component, and marked as blockers of the tracking bug for the next
toolchain update. That bug is typically named "roll clang and rust again"
([example](https://crbug.com/404285928)). The tracking bug should be filed as a
P1 Process bug, and blockers should be filed and treated as P1 bugs.

Here is a suggested set of steps to iterate over while gardening:

* If there is no bug for tracking the next toolchain update, file one.

* Go over the blockers of the toolchain update tracking bug. Close obsolete
  ones, and fix the remaining ones or find someone to fix them.

* Check the [tip-of-tree clang
  waterfall](https://ci.chromium.org/p/chromium/g/chromium.clang/console) and
  file bugs for any issues.

* Check the automatic [Clang roll
  CLs](https://chromium-review.googlesource.com/q/path:tools/clang/scripts/update.py).
  File a bug for any packaging issues. File a bug if the CLs stop being produced.

* When packaging succeeds on a roll CL, follow the instructions in [update the
  compiler](updating_clang.md) to push the packages to production and do a
  commit queue dry run. File a bug for any issues that come up.

* If the commit queue dry run was successful, review and land the CL.

The key to success is to detect as many problems as early as possible. Rather
than stopping to dig deeply into the first problem encountered, it's better to
do a broad sweep to find all the problems. That way they can be shared among
the team. Also, problems are often much easier to address when found early.

The other key to success is communication. The bug tracker is the main tool for
that, and the other is the team chat room. When it's not clear whether
something is an issue or not, how to resolve a problem, how something works,
etc., just ask away.

The gardener is also responsible for taking notes during the weekly Chrome
toolchain sync meeting.

[TOC]

## Clang packaging test failures

When packaging clang/LLVM on our various supported platforms (`upload_*_clang`
tryjobs), we run the entire LLVM test suite and block the build if any
test fails. The most common test failures we see are in Mac- and
Windows-specific tests, since upstream LLVM is mostly Linux-focused. There are
public bots that also run LLVM tests, mostly accessible from
https://lab.llvm.org/buildbot. There are also some Apple bots running at
http://green.lab.llvm.org/job/llvm.org/ which mirror test failures we see on
Mac. Reverting the culprit change upstream with a pointer to a public bot
showing the test failure is encouraged.

See LLVM's [Patch reversion
policy](https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy).

## Disk out of space

If there are any issues with disk running out of space, file a go/bug-a-trooper
bug, for example https://crbug.com/1105134.

## Is it the compiler?

Chromium does not always build and pass tests in all configurations that
everyone cares about. Some configurations simply take too long to build
(ThinLTO) or to test (dbg) on the CQ before committing, and some tests are
flaky. As a result, our console is often filled with red boxes, and the boxes
don't always need to be green to roll clang.

Oftentimes, if a bot is red with a test failure, it's not a bug in the compiler.
To check this, the easiest and best thing to do is to try to find a
corresponding builder that doesn't use ToT clang. For standard configurations,
start on the waterfall that corresponds to the OS of the red bot, and search
from there. If the failing bot is Google Chrome branded, go to the (Google
internal) [official builder
list](https://uberchromegw.corp.google.com/i/official.desktop.continuous/builders/)
and start searching from there.

If you are feeling charitable, you can try to see when the test failure was
introduced by looking at the history in the bot. One way to do this is to add
`?numbuilds=200` to the builder URL to see more history. If that isn't enough
history, you can manually binary search build numbers by editing the URL until
you find where the regression was introduced. If it's immediately clear what CL
introduced the regression (i.e. caused tests to fail reliably in the official
build configuration), you can simply load the change in Gerrit and revert it,
linking to the first failing build that implicates the change being reverted.
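
The bookkeeping for that manual binary search can be sketched as a small shell
function. This is only an illustration: `first_bad_build` and `build_is_good`
are hypothetical names, not existing tools, and you would have to supply
`build_is_good` yourself (for example, by looking at the build page for the
given number and answering good or bad).

```sh
# Sketch: find the first bad build between a known-good and a known-bad build.
# build_is_good is a hypothetical, user-supplied predicate that returns 0 when
# the given build number is green and non-zero otherwise.
first_bad_build() {
  local good="$1" bad="$2" mid
  # Invariant: $good is known good, $bad is known bad.
  while [ $((bad - good)) -gt 1 ]; do
    mid=$(( (good + bad) / 2 ))
    if build_is_good "$mid"; then
      good=$mid
    else
      bad=$mid
    fi
  done
  echo "$bad"
}
```

This assumes the failure is reliable rather than flaky, so that every build
after the regression is bad.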

If the failure looks like a compiler bug, these are the common failures we see
and what to do about them:

1. compiler crash
1. compiler warning change
1. compiler error
1. miscompile
1. linker errors

## Compiler crash

This is probably the most common bug. The standard procedure is to do these
things:

1. Open the `gclient runhooks` stdout log from the first red build. Near the
   top of that log you can find the range of upstream llvm revisions. For
   example:

       From https://github.com/llvm/llvm-project
       f917356f9ce..292e898c16d master -> origin/master

1. File a crbug documenting the crash. Include the range, and any other bots
   displaying the same symptoms.
1. All clang crashes on the Chromium bots are automatically uploaded to
   Cloud Storage. On the failing build, click the "stdout" link of the
   "process clang crashes" step right after the red compile step. It will
   print something like

       processing heap_page-65b34d... compressing... uploading... done
       gs://chrome-clang-crash-reports/v1/2019/08/27/chromium.clang-ToTMac-20955-heap_page-65b34d.tgz
       removing heap_page-65b34d.sh
       removing heap_page-65b34d.cpp

   Use
   `gsutil.py cp gs://chrome-clang-crash-reports/v1/2019/08/27/chromium.clang-ToTMac-20955-heap_page-65b34d.tgz .`
   to copy it to your local machine. Untar with
   `tar xzf chromium.clang-ToTMac-20955-heap_page-65b34d.tgz` and change the
   included shell script to point to a locally-built clang. Remove the
   `-Xclang -plugin` flags. If you re-run the shell script, it should
   reproduce the crash.
1. Identify the revision that introduced the crash. First, look at the commit
   messages in the LLVM revision range to see if one modifies the code near the
   point of the crash. If so, try reverting it locally, rebuild, and run the
   reproducer to see if the crash goes away.

   If that doesn't work, use `git bisect`. Use this as a template for the bisect
   run script:
   ```shell
   #!/bin/bash
   cd "$(dirname "$0")"  # get into llvm build dir
   ninja -j900 clang || exit 125  # skip revisions that don't compile
   # Exit 0 if good, 1 if bad. Mapping every failure to 1 matters because
   # git bisect aborts on exit codes above 127, such as a crash signal.
   ./t-8f292b.sh || exit 1
   ```
1. File an upstream bug like http://llvm.org/PR43016. Usually the unminimized repro
   is too large for LLVM's bugzilla, so attach it to a (public) crbug and link
   to that from the LLVM bug. Then revert with a commit message like
   "Revert r368987, it caused PR43016."
1. If you want, make a reduced repro using CReduce. Clang contains a handy wrapper around
   CReduce that you can invoke like so:

       clang/utils/creduce-clang-crash.py --llvm-bin bin \
           angle_deqp_gtest-d421b0.sh angle_deqp_gtest-d421b0.cpp

   Attach the reproducer to the llvm bug you filed in the previous step. You can
   disable CReduce's renaming passes with the options
   `--remove-pass pass_clang rename-fun --remove-pass pass_clang rename-param
   --remove-pass pass_clang rename-var --remove-pass pass_clang rename-class
   --remove-pass pass_clang rename-cxx-method --remove-pass pass_clex
   rename-toks` which makes it easier for the author to reason about and to
   further reduce it manually.

   If you need to do something the wrapper doesn't support,
   follow the [official CReduce docs](https://embed.cs.utah.edu/creduce/using/)
   for writing an interestingness test and use creduce directly.
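
For the bisection step above, one way to drive `git bisect` with such a run
script is sketched here. `bisect_llvm` is a hypothetical helper name; the
`git bisect start` and `git bisect run` subcommands are standard Git. The
sketch assumes you run it from your llvm-project checkout and that the run
script lives in the LLVM build directory, as in the template.

```sh
# Sketch: bisect an LLVM revision range with a run script.
# Usage: bisect_llvm <good-rev> <bad-rev> <command> [args...]
bisect_llvm() {
  local good="$1" bad="$2"
  shift 2
  # `git bisect start` takes the bad revision first, then the good one.
  git bisect start "$bad" "$good" || return 1
  # The command runs at each step: exit 0 means good, 125 means skip, and any
  # other code from 1 to 127 means bad.
  git bisect run "$@"
}
```

With the revision range from the log excerpt above, this could be invoked as
`bisect_llvm f917356f9ce 292e898c16d /path/to/llvm-build/bisect.sh` (the script
path is a placeholder). Run `git bisect reset` afterwards to return to the
original checkout.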

## Compiler warning change

New Clang versions often find new bad code patterns to warn on. Chromium builds
with `-Werror`, so improvements to warnings often turn into build failures in
Chromium. Once you understand the code pattern Clang is complaining about, file
a bug to either fix or silence the new warning.

If this is a completely new warning, disable it by adding `-Wno-NEW-WARNING` to
[this list of disabled
warnings](https://cs.chromium.org/chromium/src/build/config/compiler/BUILD.gn?l=1479)
if `llvm_force_head_revision` is true. Here is [an
example](https://chromium-review.googlesource.com/1251622). This will keep the
ToT bots green while you decide what to do.

Sometimes, behavior changes and a pre-existing warning changes to warn on new
code. In this case, fixing Chromium may be the easiest and quickest fix. If
there are many sites, you may consider changing clang to put the new diagnostic
into a new warning group so you can handle it as a new warning as described
above.

If the warning is high value, then eventually our team or other contributors
will end up fixing the crbug and there is nothing more to do. If the warning
seems low value, pass that feedback along to the author of the new warning
upstream. It's unlikely that it should be on by default or enabled by `-Wall` if
users don't find it valuable. If the warning is particularly noisy and can't be
easily disabled without disabling other high value warnings, you should consider
reverting the change upstream and asking for more discussion.

## Compiler error

This rarely happens, but sometimes clang becomes more strict and no longer
accepts code that it previously did. The standard procedure for a new warning
may apply, but it's more likely that the upstream Clang change should be
reverted, if the C++ code in question in Chromium looks valid.

## Miscompile

Miscompiles tend to result in crashes, so if you see a test with the CRASHED
status, this is probably what you want to do.

1. Bisect object files to find the object with the code that changed. LLVM
   contains `llvm/utils/rsp_bisect.py` which may be useful for bisecting object
   files using an rsp file.
1. Debug it with a traditional debugger.

## Linker error

`ld.lld`'s `--reproduce` flag makes LLD write a tar archive of all its inputs
and a file `response.txt` that contains the link command. This allows people to
work on linker bugs without having to have a Chromium build environment.

To use `ld.lld`'s `--reproduce` flag, follow these steps:

1. Locally [build Chromium with a locally-built
   clang](https://chromium.googlesource.com/chromium/src.git/+/main/docs/clang.md#Using-a-custom-clang-binary).

1. After reproducing the link error, build just the failing target with
   ninja's `-v -d keeprsp` flags added:
   `ninja -C out/gn base_unittests -v -d keeprsp`.

1. Copy the link command that ninja prints, `cd out/gn`, paste it, and manually
   append `-Wl,--reproduce,repro.tar`. With `lld-link`, instead append
   `/reproduce:repro.tar`. (`ld.lld` is invoked through the `clang` driver, so
   it needs `-Wl` to pass the flag through to the linker. `lld-link` is called
   directly, so the flag needs no prefix.)

1. Compress the tar file: `gzip repro.tar`. This will take a few minutes and
   produce a .tar.gz file that's 0.5-1 GB.

1. Upload the .tar.gz to Google Drive. If you're signed in with your @google
   address, you won't be able to make a world-shareable link to it, so upload
   it in a window where you're signed in with your @chromium account.

1. File an LLVM bug linking to the file. Example: http://llvm.org/PR43241
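
On the receiving end, the archive produced by the steps above can be replayed
roughly as follows. This is a sketch, not an official tool: `replay_repro` is a
hypothetical helper name, and it assumes that the archive contains the
`response.txt` described above and that the input paths in it are relative to
the directory holding that file.

```sh
# Sketch: unpack a --reproduce archive and re-run the link it recorded.
# Usage: replay_repro <repro.tar or repro.tar.gz> [linker]
# The linker defaults to ld.lld; pass lld-link for a /reproduce: archive.
replay_repro() {
  local archive="$1" linker="${2:-ld.lld}"
  local workdir rsp
  workdir=$(mktemp -d) || return 1
  # tar detects gzip compression on its own when reading an archive file.
  tar -xf "$archive" -C "$workdir" || return 1
  rsp=$(find "$workdir" -name response.txt | head -n 1)
  [ -n "$rsp" ] || { echo "no response.txt in $archive" >&2; return 1; }
  # Run the linker from the directory that contains response.txt.
  (cd "$(dirname "$rsp")" && "$linker" @response.txt)
}
```

For example, `replay_repro repro.tar.gz /path/to/llvm-build/bin/ld.lld` re-runs
the link with a locally-built LLD (the path is a placeholder).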

TODO: Describe object file bisection, identify obj with symbol that no longer
has the section.

## ThinLTO Trouble

Sometimes, problems occur in ThinLTO builds that do not occur in non-LTO builds.
These steps can be used to debug such problems.

Notes:

 - All steps assume they are run from the output directory (the same directory args.gn is in).

 - Commands have been shortened for clarity. In particular, Chromium build commands are
   generally long, with many parts that you just copy-paste when debugging. These have
   largely been omitted.

 - The commands below use "clang++", where in practice there would be some path prefix
   in front of this. Make sure you are invoking the right clang++. In particular, there
   may be one in the PATH which behaves very differently.

### Get the full command that is used for linking

To get the command that is used to link base_unittests:

```sh
$ rm base_unittests
$ ninja -n -d keeprsp -v base_unittests
```

This will print a command line. It will also write a file called `base_unittests.rsp`, which
contains additional parameters to be passed.

### Remove ThinLTO cache flags

ThinLTO uses a cache to avoid compilation in some cases. This can be confusing
when debugging, so make sure to remove the various cache flags like
`-Wl,--thinlto-cache-dir`.
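
Removing those flags by hand is error-prone on a long command line, so here is
a minimal sketch of a filter that does it. `strip_thinlto_cache_flags` is a
hypothetical helper name. It assumes that no argument in the link command
contains spaces and that the cache flags all start with `-Wl,--thinlto-cache-`;
linkers on other platforms spell their cache flags differently and are not
handled.

```sh
# Sketch: drop ThinLTO cache flags from a link command read on stdin.
# Assumes space-separated arguments with no embedded spaces.
strip_thinlto_cache_flags() {
  # One argument per line, filter out the cache flags, then re-join.
  tr -s ' ' '\n' | grep -v -e '^-Wl,--thinlto-cache-' | paste -sd ' ' -
}
```

Paste the link command into a file and run, for example,
`strip_thinlto_cache_flags < link_cmd.txt` (the file name is a placeholder).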

### Expand Thin Archives on Command Line

Expand thin archives mentioned in the command line to their individual object files.
The script `tools/clang/scripts/expand_thin_archives.py` can be used for this purpose.
For example:

```sh
$ ../../tools/clang/scripts/expand_thin_archives.py -p=-Wl, -- @base_unittests.rsp > base_unittests.expanded.rsp
```

The `-p` parameter here specifies the prefix for parameters to be passed to the linker.
If you are invoking the linker directly (as opposed to through clang++), the prefix should
be empty.

```sh
$ ../../tools/clang/scripts/expand_thin_archives.py -p='' -- @base_unittests.rsp > base_unittests.expanded.rsp
```

### Remove -Wl,--start-group and -Wl,--end-group

Edit the link command to use the expanded command line, and remove any mention of `-Wl,--start-group`
and `-Wl,--end-group` that surround the expanded command line. For example, if the original command was:

    clang++ -fuse-ld=lld -o ./base_unittests -Wl,--start-group @base_unittests.rsp -Wl,--end-group

the new command should be:

    clang++ -fuse-ld=lld -o ./base_unittests @base_unittests.expanded.rsp

The reason for this is that the `-start-lib` and `-end-lib` flags that expanding the command
line produces cannot be nested inside `--start-group` and `--end-group`.
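
The edit above can also be scripted. The following is a sketch under the
assumption that the group wraps exactly one `@<name>.rsp` argument, as in the
example; `use_expanded_rsp` is a hypothetical helper name.

```sh
# Sketch: replace "-Wl,--start-group @X.rsp -Wl,--end-group" with
# "@X.expanded.rsp" in a link command read on stdin.
use_expanded_rsp() {
  sed -E 's/-Wl,--start-group +@([^ ]+)\.rsp +-Wl,--end-group/@\1.expanded.rsp/'
}
```

Commands that do not match the pattern pass through unchanged, so check the
output before running it.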

### Producing ThinLTO Bitcode Files

In a ThinLTO build, what is normally the compile step that produces native object files
instead produces LLVM bitcode files. A simple example would be:

```sh
$ clang++ -c -flto=thin foo.cpp -o foo.o
```

In a Chromium build, these files reside under `obj/`, and you can generate them using ninja.
For example:

```sh
$ ninja obj/base/base/lock.o
```

These can be fed to `llvm-dis` to produce textual LLVM IR:

```sh
$ llvm-dis -o - obj/base/base/lock.o | less
```

When using split LTO unit (`-fsplit-lto-unit`, which is required for
some features, CFI among them), this may produce a message like:

    llvm-dis: error: Expected a single module

In that case, you can use `llvm-modextract`:

```sh
$ llvm-modextract -n 0 -o - obj/base/base/lock.o | llvm-dis -o - | less
```
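
If you disassemble many files, the two cases above can be folded into one
helper. This is a sketch only: `dis_bitcode` is a hypothetical name, and
`LLVM_DIS` and `LLVM_MODEXTRACT` are hypothetical override variables for
pointing at specific tool binaries. It falls back to `llvm-modextract` on any
`llvm-dis` failure, not just the "Expected a single module" error.

```sh
# Sketch: print textual IR for a bitcode object, falling back to
# llvm-modextract when llvm-dis rejects the file.
dis_bitcode() {
  local obj="$1"
  local dis="${LLVM_DIS:-llvm-dis}"
  local modextract="${LLVM_MODEXTRACT:-llvm-modextract}"
  # The first attempt's error output is hidden because the fallback follows.
  "$dis" -o - "$obj" 2>/dev/null ||
    "$modextract" -n 0 -o - "$obj" | "$dis" -o -
}
```

For example: `dis_bitcode obj/base/base/lock.o | less`.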

### Saving Intermediate Bitcode

The ThinLTO linking process proceeds in a number of stages. The bitcode that is
generated during these stages can be saved by passing `-save-temps` to the linker:

```sh
$ clang++ -fuse-ld=lld -Wl,-save-temps -o ./base_unittests @base_unittests.expanded.rsp
```

This generates files such as:
 - lock.o.0.preopt.bc
 - lock.o.3.import.bc
 - lock.o.5.precodegen.bc

in the directory where lock.o is (obj/base/base).

These can be fed to `llvm-dis` to produce textual LLVM IR. They show
how the code is transformed as it progresses through ThinLTO stages.
Of particular interest are:
 - .3.import.bc, which shows the IR after definitions have been imported from
   other modules, but before optimizations. Running this through LLVM's `opt`
   tool with the right optimization level can often reproduce issues.
 - .5.precodegen.bc, which shows the IR just before it is transformed to native
   code. Running this through LLVM's `llc` tool with the right optimization level
   can often reproduce issues.

The same `-save-temps` command also produces `base_unittests.resolution.txt`, which
shows symbol resolutions. These look like:

    -r=obj/base/test/run_all_base_unittests/run_all_base_unittests.o,main,plx

In this example, run_all_base_unittests.o contains a symbol named
main, with flags plx.

The possible flags are:
 - p: prevailing: of symbols with this name, this one has been chosen.
 - l: final definition in this linkage unit.
 - r: redefined by the linker.
 - x: visible to regular (that is, non-LTO) objects.
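
To make a resolution file easier to skim, the flag letters can be spelled out
with a small filter. This is a sketch: `explain_resolutions` is a hypothetical
helper name, the output format is made up for illustration, and it assumes
that neither object paths nor symbol names contain commas.

```sh
# Sketch: annotate "-r=<object>,<symbol>,<flags>" lines read on stdin with the
# meaning of each flag letter. Lines of any other form are skipped.
explain_resolutions() {
  local obj sym flags desc
  while IFS=, read -r obj sym flags; do
    case $obj in -r=*) ;; *) continue ;; esac
    desc=""
    case $flags in *p*) desc="$desc prevailing" ;; esac
    case $flags in *l*) desc="$desc final-definition" ;; esac
    case $flags in *r*) desc="$desc redefined-by-linker" ;; esac
    case $flags in *x*) desc="$desc visible-to-regular-objects" ;; esac
    echo "${sym} (${obj#-r=}):${desc:- no flags}"
  done
}
```

For example: `explain_resolutions < base_unittests.resolution.txt | less`.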

### Code Generation for a Single Module

To speed up debugging, it may be helpful to limit code generation to a single
module if you know the name of the module (e.g. the module name is in a crash
dump).

`-Wl,--thinlto-single-module=foo` tells ThinLTO to only run
optimizations/codegen on files matching the pattern and skip linking. This is
helpful especially in combination with `-Wl,-save-temps`.

```sh
$ clang++ -fuse-ld=lld -Wl,--thinlto-single-module=obj/base/base/lock.o -o ./base_unittests @base_unittests.expanded.rsp
```

You should see

```sh
[ThinLTO] Selecting obj/base/base/lock.o to compile
```

being printed.

## Tips and tricks

Finding what object files differ between two directories:

```sh
$ diff -u <(cd out.good && find . -name "*.o" -exec sha1sum {} \; | sort -k2) \
          <(cd out.bad && find . -name "*.o" -exec sha1sum {} \; | sort -k2)
```

Or with cmp:

```sh
$ find good -name "*.o" -exec bash -c 'cmp -s "$0" "${0/good/bad}" || echo "$0"' {} \;
```
452```