# Orderfile

An orderfile is a list of symbols that defines an ordering of functions. A
static linker, such as LLD, can be made to respect this ordering when
generating a binary.

Reordering code this way can improve startup performance because machine code
is fetched into memory more efficiently: fewer pages need to be read from disk,
and a large part of the I/O is done sequentially by readahead.

Code reordering can also reduce memory usage by keeping the code that is
actually used in a smaller number of memory pages. It can also reduce TLB and
L1i cache misses by placing functions that are commonly called together close
to each other in memory.
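
The page-count argument above can be illustrated with a toy model. This is only a sketch under stated assumptions: the 4 KiB page size, the function names, and the sizes are all hypothetical, and a real linker layout involves alignment and sections that are ignored here.

```python
PAGE_SIZE = 4096  # bytes; assumed page size, for illustration only


def pages_touched(layout, used):
    """Return the set of page indices occupied by the functions in `used`.

    `layout` is an ordered list of (name, size_in_bytes) pairs describing
    the order in which functions are placed in the binary.
    """
    pages = set()
    offset = 0
    for name, size in layout:
        if name in used and size > 0:
            first = offset // PAGE_SIZE
            last = (offset + size - 1) // PAGE_SIZE
            pages.update(range(first, last + 1))
        offset += size
    return pages


# Hypothetical binary: 8 functions of 2 KiB each; four are needed at startup.
functions = [(f"f{i}", 2048) for i in range(8)]
startup = {"f0", "f2", "f4", "f6"}

# Default layout: startup functions are interleaved with cold ones.
default_pages = pages_touched(functions, startup)

# Ordered layout: startup functions placed first, as an orderfile would request.
ordered = sorted(functions, key=lambda f: f[0] not in startup)
ordered_pages = pages_touched(ordered, startup)

print(len(default_pages), len(ordered_pages))  # prints "4 2"
```

In this model the interleaved layout touches four pages while the reordered one touches two contiguous pages, which is the effect the orderfile aims for.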

## Generating Orderfiles Manually

To generate an orderfile, run the `orderfile_generator_backend.py` script. You
will need an Android device connected with
[adb](https://developer.android.com/tools/adb), because the generation pipeline
runs benchmarks on a device.

Example:
```
tools/cygprofile/orderfile_generator_backend.py --target-arch=arm64 --use-remoteexec
```

You can specify the architecture (arm or arm64) with `--target-arch`. For quick
local testing you can use `--streamline-for-debugging`. To build using Reclient,
use `--use-remoteexec` (Googlers only). Several other options are available for
configuring and debugging the orderfile generation; use the `-h` option to list
them.

NB: If your checkout is non-internal you must use the `--public` option.

To build Chrome with a locally generated orderfile, use the
`chrome_orderfile_path=<path_to_orderfile>` GN arg.

## Orderfile Performance Testing

Orderfiles can be tested using
[Pinpoint](https://chromium.googlesource.com/chromium/src/+/main/docs/speed/perf_trybots.md).
To do this, please create and upload a Gerrit change overriding the value of
[`chrome_orderfile_path`](https://source.chromium.org/chromium/chromium/src/+/main:build/config/compiler/BUILD.gn;l=217-223;drc=3a829695d83990141babd25dee7f2f94c005cae4)
to, for instance, `//path/to/my_orderfile` (relative to `src`), where
`my_orderfile` is the orderfile that needs to be evaluated. The orderfile should
be added to the local branch and uploaded to Gerrit along with
`build/config/compiler/BUILD.gn`. This Gerrit change can then be used as an
"experiment patch" for a Pinpoint try job.

## Triaging Performance Regressions

Occasionally, an orderfile roll will cause performance problems on perfbots.
This typically triggers an alert in the form of a bug report, which contains a
group of related regressions like the one shown
[here](https://crbug.com/344654892).

In such cases it is important to keep in mind that the effectiveness of the
orderfile is coupled with using a recent PGO profile when building the native
code. As a result, some orderfile improvements (or effective no-ops) register
as regressions on perfbots using non-PGO builds, which is the most common
perfbot configuration.

If a new regression does not include alerts from the
[android-pixel6-perf-pgo](https://ci.chromium.org/ui/p/chrome/builders/luci.chrome.ci/android-pixel6-perf-pgo)
bot (the only Android PGO perfbot as of 2024-06), the first thing to do is to
query the same benchmark+metric combinations for the PGO bot. If the graphs
show no regression, feel free to close the issue as WontFix (Intended
Behavior). However, not all benchmarks are exercised on the PGO bot
continuously. If there is no PGO coverage for a particular benchmark+metric
combination, this combination can be checked on Pinpoint with the right perfbot
choice ([example](https://crbug.com/344665295)).
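
The triage steps in the paragraph above can be summarized as a small decision function. This is a sketch, not an existing tool: the function name, its parameters, and the "investigate" outcome for cases the text does not cover are assumptions made for illustration.

```python
def triage(pgo_bot_alerted, pgo_graph_regressed):
    """Suggest a next step for a regression attributed to an orderfile roll.

    pgo_bot_alerted: True if the alert group already includes the PGO bot.
    pgo_graph_regressed: True or False as read from the PGO bot graphs for
        the same benchmark+metric, or None if the PGO bot does not cover it.
    """
    if pgo_bot_alerted:
        # Not covered by the text above; assumed to need a closer look.
        return "investigate"
    if pgo_graph_regressed is None:
        return "check on Pinpoint with a PGO perfbot"
    if pgo_graph_regressed:
        # Also an assumption: a regression on the PGO bot is treated as real.
        return "investigate"
    return "close as WontFix (Intended Behavior)"


print(triage(pgo_bot_alerted=False, pgo_graph_regressed=False))
```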

Finally, the PGO+orderfile coupling exists only on arm64. Most speed
optimization efforts on Android are focused on this configuration. On arm32 the
most important orderfile optimization is reducing the memory used by machine
code. Only one benchmark measures it: `system_health.memory_mobile`.

## Orderfile Pipeline

The `orderfile_generator_backend.py` script runs several key steps:

1. **Build and install Chrome with orderfile instrumentation.** This uses the
[`-finstrument-function-entry-bare`](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-finstrument-function-entry-bare)
Clang command line option to insert instrumentation for function entry. The
build will be generated in `out/arm_instrumented_out/` or
`out/arm64_instrumented_out/`, depending on the CPU architecture (instruction
set).

2. **Run the benchmarks and collect profiles.** These benchmarks can be found
in [orderfile.py](../tools/perf/contrib/orderfile/orderfile.py). Each profile
is a list of offsets into the binary for the functions that were called during
execution of the benchmarks.

3. **Cluster the symbols from the profiles to generate the orderfile.**
The offsets are processed and merged using a
[clustering](../tools/cygprofile/cluster.py) algorithm to produce an orderfile.

4. **Run benchmarks on the final orderfile.** We run some benchmarks to compare
the performance with/without the orderfile. You can supply the `--no-benchmark`
flag to skip this step.
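
To make steps 2 and 3 more concrete, the sketch below turns profiled offsets into symbol names and merges several profiles into one ordered list. It is deliberately simplified and is not the algorithm in `cluster.py`: it uses a plain first-seen merge in place of clustering, and the symbol table layout, helper names, and sample data are all hypothetical.

```python
import bisect


def offsets_to_symbols(offsets, symbol_table):
    """Map each profiled offset to the symbol whose range contains it.

    `symbol_table` is a list of (start_offset, size, name) sorted by start.
    Offsets that fall outside every symbol are skipped.
    """
    starts = [start for start, _, _ in symbol_table]
    symbols = []
    for offset in offsets:
        i = bisect.bisect_right(starts, offset) - 1
        if i < 0:
            continue
        start, size, name = symbol_table[i]
        if offset < start + size:
            symbols.append(name)
    return symbols


def merge_profiles(profiles):
    """Merge symbol lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for profile in profiles:
        for name in profile:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


# Hypothetical symbol table and two hypothetical profiles.
symbol_table = [(0, 16, "main"), (16, 32, "Init"), (48, 8, "Render"),
                (64, 16, "Shutdown")]
profile_a = offsets_to_symbols([0, 48, 16], symbol_table)
profile_b = offsets_to_symbols([16, 64, 60], symbol_table)  # 60 is in a gap

orderfile_lines = merge_profiles([profile_a, profile_b])
print("\n".join(orderfile_lines))
```

With this sample data the merged order is `main`, `Render`, `Init`, `Shutdown`. A clustering approach can make better placement decisions than a first-seen merge, for example by grouping functions that are used together across profiles.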