Validating libc Assembler Routines
==================================
This document describes how to verify incoming assembler libc routines.

## Quick Start
* First, benchmark the previous version of the routine.
* Update the routine, then run the bionic unit tests to verify that the routine
doesn't have any bugs. See the [Testing](#testing) section for details about how
to verify that the routine is being properly tested.
* Rerun the benchmarks using an updated image that contains the code for
the new routine. See the [Performance](#performance) section for details about
benchmarking.
* Verify that the unwind information for the new routine looks correct. See
the [Unwind Info](#unwind-info) section for details about how to verify this.

When benchmarking, it's best to verify on the latest supported Pixel device.
Make sure that you benchmark both the big and little cores to verify that
there is no major difference in performance on each.

Benchmark 64-bit memcmp:

    /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks --bionic_xml=string.xml --benchmark_filter=memcmp

Benchmark 32-bit memcmp:

    /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --bionic_xml=string.xml --benchmark_filter=memcmp

Lock the benchmark to a specific CPU (CPU 2 in this example):

    /data/benchmarktest/bionic-benchmarks/bionic-benchmarks --bionic_cpu=2 --bionic_xml=string.xml --benchmark_filter=memcmp
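
Pinning the process to one core is what makes the big-versus-little comparison above possible. Below is a minimal C sketch of that idea for Linux/Android. It assumes the GNU `sched_setaffinity`, `sched_getaffinity` and `sched_getcpu` interfaces, which glibc and bionic provide. It only illustrates the concept and is not necessarily how **--bionic_cpu** is implemented. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on failure (errno is set by sched_setaffinity). */
static int pin_to_cpu(int cpu) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  /* A pid of 0 means "the calling thread". */
  return sched_setaffinity(0, sizeof(set), &set);
}

/* Hypothetical helper: return the lowest-numbered CPU that this thread is
 * currently allowed to run on, or -1 if the affinity mask can't be read. */
static int first_allowed_cpu(void) {
  cpu_set_t set;
  if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
  for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
    if (CPU_ISSET(cpu, &set)) return cpu;
  }
  return -1;
}
```

The CPU numbers that belong to the big or little cluster differ between SoCs. Check the device's CPU topology before choosing which values to compare.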

## Performance
The bionic benchmarks are used to verify the performance of changes to
routines. For most routines, there should already be benchmarks available.

Building
--------
The bionic benchmarks are not built by default; they must be built separately
and pushed onto the device. The commands below show how to do this.

    mmma -j bionic/benchmarks
    adb sync data

Running
-------
There are two bionic benchmark executables:

    /data/benchmarktest64/bionic-benchmarks/bionic-benchmarks

This one is for 64-bit libc routines.

    /data/benchmarktest/bionic-benchmarks/bionic-benchmarks

This one is for 32-bit libc routines.

Here is an example of how to run the benchmark. For this command to work,
you must first change to one of the directories above.

    ./bionic-benchmarks --bionic_xml=string.xml --benchmark_filter=memcmp

The last argument is a filter that selects which benchmarks to run; here it
selects the memcmp benchmarks.

Almost all routines are already defined in the **string.xml** file in
**bionic/benchmarks/suites**. Look at the examples in that file to see
how to add a benchmark for a function that doesn't have one yet.

These benchmarks can take a long time to run, since they test a large
number of sizes and alignments.

Results
-------
The bionic benchmarks are based on the [Google Benchmark](https://github.com/google/benchmark)
library. Example output looks like this:

    Run on (8 X 1844 MHz CPU s)
    CPU Caches:
      L1 Data 32K (x8)
      L1 Instruction 32K (x8)
      L2 Unified 512K (x2)
    ***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
    -------------------------------------------------------------------------------------------
    Benchmark                                               Time       CPU Iterations
    -------------------------------------------------------------------------------------------
    BM_string_memcmp/1/0/0                                  6 ns      6 ns  120776418 164.641MB/s
    BM_string_memcmp/1/1/1                                  6 ns      6 ns  120856788 164.651MB/s

The smaller the time, the better the performance.
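
When you compare runs of the old and new routine, it can help to pull the numbers out of this console output. Below is a rough C sketch that does this. It assumes the row layout shown above, with both time columns in ns. Google Benchmark may switch to other units, such as us, for slower cases, and this parser then rejects the row. `struct bench_result`, `parse_result_line` and `speedup` are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical record for one result row of the console output. */
struct bench_result {
  char name[128];
  double real_ns;       /* "Time" column */
  double cpu_ns;        /* "CPU" column */
  long long iterations; /* "Iterations" column */
};

/* Parses a row such as
 *   "BM_string_memcmp/1/0/0   6 ns   6 ns  120776418 164.641MB/s"
 * Returns 1 on success and 0 for header, separator or other lines.
 * Assumes both time columns are reported in ns. */
static int parse_result_line(const char *line, struct bench_result *out) {
  int n = sscanf(line, "%127s %lf ns %lf ns %lld",
                 out->name, &out->real_ns, &out->cpu_ns, &out->iterations);
  return n == 4;
}

/* Speedup of the new routine relative to the old one, based on CPU time.
 * Values above 1.0 mean the new routine is faster. */
static double speedup(const struct bench_result *old_r,
                      const struct bench_result *new_r) {
  return old_r->cpu_ns / new_r->cpu_ns;
}
```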

Caveats
-------
When running the benchmarks, CPU frequency scaling is normally enabled. This
means that if the device does not reach the maximum CPU frequency, the results
can vary wildly. It's possible to lock the CPU to the maximum frequency, but
doing so is beyond the scope of this document. However, on Pixel devices most
of the benchmarks drive the CPU to its maximum frequency very quickly, so
scaling doesn't significantly affect the results.

Another potential issue is that the device can overheat when running the
benchmarks. To avoid this, you can run the device in a cool environment,
or choose a device that is less likely to overheat. To detect these kinds
of issues, you can run a subset of the tests again. At the very least, it's
always a good idea to rerun the suite a couple of times to verify that
there isn't a high variation in the numbers.
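
One simple way to quantify "high variation" is the relative spread between the fastest and slowest of several reruns of the same benchmark. Below is a minimal C sketch. The function names and the 5% threshold used in the example are hypothetical choices, not values defined by bionic.

```c
#include <stddef.h>

/* Relative spread of repeated timings of the same benchmark:
 * (slowest - fastest) / fastest. Returns -1.0 for fewer than two
 * samples or a non-positive fastest time. */
static double relative_spread(const double *times_ns, size_t n) {
  if (n < 2) return -1.0;
  double lo = times_ns[0], hi = times_ns[0];
  for (size_t i = 1; i < n; i++) {
    if (times_ns[i] < lo) lo = times_ns[i];
    if (times_ns[i] > hi) hi = times_ns[i];
  }
  if (lo <= 0.0) return -1.0;
  return (hi - lo) / lo;
}

/* Hypothetical acceptance check: treat the runs as stable if the spread
 * stays within max_spread (for example 0.05 for 5%). The threshold is a
 * judgment call. */
static int runs_look_stable(const double *times_ns, size_t n, double max_spread) {
  double s = relative_spread(times_ns, n);
  return s >= 0.0 && s <= max_spread;
}
```

For example, three runs that take 100, 102 and 104 ns have a spread of 4% and pass a 5% threshold. Runs of 100 and 120 ns do not pass.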

If you want to verify a single benchmark result, you can run a single test
using a command like this:

    ./bionic-benchmarks --bionic_xml=string.xml --benchmark_filter=BM_string_memcmp/1/1/0

The argument to **--benchmark_filter** is the name of the benchmark as it
appears in the output. Because the filter is a regular expression that can
match anywhere in a benchmark name, it can still match multiple benchmarks.
To guarantee that only the single benchmark runs, anchor the pattern at the
end with **$**:

    ./bionic-benchmarks --bionic_xml=string.xml --benchmark_filter=BM_string_memcmp/1/1/0$

NOTE: These commands assume that they are executed on the device as the shell
user, for example from an **adb shell** session. If you are trying to run them
using adb directly from a host machine, you might need to escape special shell
characters such as **$**.
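
To see why the trailing **$** helps, here is a small C sketch that uses the POSIX `regcomp`/`regexec` API. It assumes the filter is applied as an unanchored extended regular expression search over each benchmark name. Google Benchmark can be built against several regex back ends, so treat this as an approximation. In the checks, `BM_string_memcmp/1/1/0/real_time` is a hypothetical longer benchmark name used only for illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the extended regular expression `filter` matches anywhere
 * in `name`, 0 if it does not, and -1 if the pattern fails to compile. */
static int filter_matches(const char *filter, const char *name) {
  regex_t re;
  if (regcomp(&re, filter, REG_EXTENDED | REG_NOSUB) != 0) return -1;
  int rc = regexec(&re, name, 0, NULL, 0);
  regfree(&re);
  return rc == 0 ? 1 : 0;
}
```

Without the anchor, `BM_string_memcmp/1/1/0` matches both the exact name and the longer one. With `BM_string_memcmp/1/1/0$`, only the exact name matches.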

## Testing

Run the bionic tests to verify that the new routines are valid. In addition,
verify that the tests actually cover the new routines. This is especially
important if this is the first time a routine has been written in assembler.

Caveats
-------
When verifying an assembler routine that operates on buffer data (such as
memcpy/strcpy), it's important to verify these corner cases:

* Verify the routine does not read past the end of the buffers. Many
assembler routines optimize by reading multiple bytes at a time and can
read past the end. This kind of bug results in an infrequent and
difficult-to-diagnose crash.
* Verify the routine handles unaligned buffers properly. A failure here
usually results in an unaligned access exception.
* Verify the routine handles buffers of different sizes.
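
One common way to catch over-reads is to place the end of a buffer right before an inaccessible page. A read past the end then crashes immediately instead of intermittently. Below is a minimal C sketch that assumes a POSIX system with `mmap` and `mprotect`. The `guarded_buf` type and the helper names are hypothetical and are not the API from bionic's buffer\_tests.h.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical buffer whose last byte sits immediately before an
 * inaccessible (PROT_NONE) page, so any read past the end faults. */
struct guarded_buf {
  void *map;      /* start of the mapping, for munmap */
  size_t map_len; /* total mapping length */
  char *data;     /* usable bytes: data[0] .. data[len - 1] */
};

/* Returns 0 on success, -1 on failure. len must be at most one page. */
static int guarded_alloc(struct guarded_buf *g, size_t len) {
  size_t page = (size_t)sysconf(_SC_PAGESIZE);
  if (len > page) return -1;
  g->map_len = 2 * page;
  g->map = mmap(NULL, g->map_len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (g->map == MAP_FAILED) return -1;
  /* Make the second page inaccessible. */
  if (mprotect((char *)g->map + page, page, PROT_NONE) != 0) {
    munmap(g->map, g->map_len);
    return -1;
  }
  /* Place the data so that it ends exactly at the guard page. */
  g->data = (char *)g->map + page - len;
  return 0;
}

static void guarded_free(struct guarded_buf *g) {
  munmap(g->map, g->map_len);
}
```

Calling the routine under test on `data` for every length from 0 up to a few cache lines exercises the tail handling. Because the data always ends at the page boundary, varying the length also varies the start alignment.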

If there are not sufficient tests for a new routine, there is a set of helper
functions that can be used to verify the above corner cases. See the
header **bionic/tests/buffer\_tests.h** for these routines and look at
**bionic/tests/string\_test.cpp** for examples of how to use them.
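
The unaligned and different-size cases can be covered by sweeping source and destination alignments over a range of lengths. Each result is compared against a simple reference implementation. Below is a C sketch of this approach for memcmp. `ref_memcmp`, `sign` and `sweep_memcmp` are hypothetical names, and the real helpers in buffer\_tests.h may be organized differently. The function pointer lets the same sweep run against the routine under test.

```c
#include <stddef.h>
#include <string.h>

typedef int (*memcmp_fn)(const void *, const void *, size_t);

/* Byte-at-a-time reference implementation used as the oracle. */
static int ref_memcmp(const void *a, const void *b, size_t n) {
  const unsigned char *pa = a, *pb = b;
  for (size_t i = 0; i < n; i++) {
    if (pa[i] != pb[i]) return pa[i] < pb[i] ? -1 : 1;
  }
  return 0;
}

/* Only the sign of a memcmp result is specified. */
static int sign(int v) { return (v > 0) - (v < 0); }

#define MAX_ALIGN 16
#define MAX_LEN 128

/* Compares fn against the reference for every src/dst alignment in
 * [0, MAX_ALIGN) and every length in [0, MAX_LEN], for equal buffers and
 * for buffers that differ in their last byte. Returns the number of
 * mismatches (0 means every case agreed). */
static int sweep_memcmp(memcmp_fn fn) {
  static _Alignas(16) unsigned char a[MAX_LEN + MAX_ALIGN], b[MAX_LEN + MAX_ALIGN];
  int failures = 0;
  for (size_t sa = 0; sa < MAX_ALIGN; sa++) {
    for (size_t da = 0; da < MAX_ALIGN; da++) {
      for (size_t len = 0; len <= MAX_LEN; len++) {
        unsigned char *pa = a + sa, *pb = b + da;
        for (size_t i = 0; i < len; i++) pa[i] = pb[i] = (unsigned char)(i * 7 + 1);
        if (sign(fn(pa, pb, len)) != sign(ref_memcmp(pa, pb, len))) failures++;
        if (len > 0) {
          pb[len - 1] ^= 0x80; /* make the last byte differ */
          if (sign(fn(pa, pb, len)) != sign(ref_memcmp(pa, pb, len))) failures++;
        }
      }
    }
  }
  return failures;
}
```

Running `sweep_memcmp(memcmp)` should report 0 mismatches. A variant that ignores the final byte is caught immediately.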

## Unwind Info
It is also important to verify that the unwind information for these
routines is properly set up. Here is a quick checklist of what to check:

* Verify that all labels are of the format .LXXX, where XXX is any valid string
for a label. If any other label is used, the assembler generates symbol table
entries for those labels, and an unwind through that code will report
incorrect function information.
* Verify that every push/pop, and every other instruction that modifies the
sp in any way, has corresponding cfi information. Along with this item,
verify that whenever registers are pushed on the stack, there is cfi
information indicating how to recover each register.
* Verify that only cfi directives are being used. This only matters for
arm32, where it's possible to use ARM-specific unwind directives.
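
A crude way to apply the first checklist item is to scan the source for label definitions that don't start with .L. The C sketch below is a hypothetical helper, not an existing bionic tool. It only recognizes a label at the start of a line and does not expand macros, so it cannot see labels that macros produce, such as a function entry macro.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical lint helper for the label rule above. Given one line of
 * assembler source, returns 1 if the line starts with a label definition
 * whose name does not begin with ".L", and 0 otherwise. */
static int defines_non_local_label(const char *line) {
  const char *p = line;
  while (*p == ' ' || *p == '\t') p++;
  const char *start = p;
  /* Characters accepted here as part of a label name. */
  while (isalnum((unsigned char)*p) || *p == '_' || *p == '.' || *p == '$') p++;
  if (p == start || *p != ':') return 0;      /* not a label definition */
  if (strncmp(start, ".L", 2) == 0) return 0; /* .L local label: fine */
  return 1;
}
```

Running every line of a new .S file through this helper and reviewing each hit gives a quick first pass before the manual cfi review.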

This list is not meant to be exhaustive; it is a minimal set of items to verify
before submitting a new libc assembler routine. Some unwind cases are
difficult to verify, such as around branches, where the unwind information
can be drastically different for the target of the branch and for the
code after the branch instruction.