Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 1 | # The Unsafe Buffers Clang plugin. |
danakj | 4e625fb | 2024-03-06 20:47:46 | [diff] [blame] | 2 | |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 3 | Our compiler contains a [plugin](../tools/clang/plugins/UnsafeBuffersPlugin.cpp) |
| 4 | which reports places in the code where unsafe buffer operations are present. |
| 5 | |
| 6 | [TOC] |
| 7 | |
| 8 | |
| 9 | ## Preventing OOB by removing Unsafe Buffers operations. |
| 10 | |
| 11 | Out-of-bounds (OOB) security bugs commonly happen through C-style pointers |
| 12 | which have no bounds information associated with them. We can prevent such |
| 13 | bugs by always using C++ containers. Furthermore, the plugin will warn |
Tom Sepez | 9e7f696 | 2024-06-13 00:21:59 | [diff] [blame] | 14 | warn about unsafe pointer usage that should be converted to containers. |
| 15 | When an unsafe usage is detected, Clang prints a warning similar to |
| 16 | ``` |
| 17 | error: unsafe buffer access [-Werror,-Wunsafe-buffer-usage] |
| 18 | ``` |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 19 | and directs developers to this file for more information. Several common |
| 20 | [Techniques](#container-based-ecosystem) for fixing these issues are presented |
| 21 | later in this document. |
danakj | 4e625fb | 2024-03-06 20:47:46 | [diff] [blame] | 22 | |
danakj | 22031fb1 | 2024-11-08 14:52:30 | [diff] [blame] | 23 | Clang documentation includes a guide to working with unsafe-buffer-usage |
| 24 | warnings here: https://clang.llvm.org/docs/SafeBuffers.html |
| 25 | |
Tom Sepez | b142846 | 2025-01-27 21:44:43 | [diff] [blame] | 26 | ## Preventing OOB by removing unsafe libc calls. |
| 27 | |
| 28 | OOB bugs also commonly happen through C-style library calls such as |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 29 | memcpy() and memset() where the programmer is responsible for specifying |
| 30 | an (unchecked) length. In order to encourage safer alternatives, the |
| 31 | plugin can warn about unsafe calls which should be converted to safer |
| 32 | C++ alternatives. When an unsafe libc call is detected, Clang prints |
| 33 | a warning similar to |
| 34 | ``` |
| 35 | error: function 'memcpy' is unsafe [-Werror,-Wunsafe-buffer-usage-in-libc-call] |
| 36 | ``` |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 37 | ## Unsafe buffer warning suppressions |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 38 | |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 39 | Because the Chromium codebase is not yet compliant with these warnings, |
| 40 | there are mechanisms to opt out code on a directory, file, or per-occurence |
| 41 | basis. |
| 42 | |
| 43 | By default, all files are checked for unsafe-buffer-usage. |
| 44 | |
| 45 | ### Opting out entire directories |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 46 | |
Tom Sepez | b142846 | 2025-01-27 21:44:43 | [diff] [blame] | 47 | Entire directories are opted out of unsafe buffer usage warnings through |
Tom Sepez | 9e7f696 | 2024-06-13 00:21:59 | [diff] [blame] | 48 | the [`//build/config/unsafe_buffers_paths.txt`](../build/config/unsafe_buffers_paths.txt) |
| 49 | file. As work progresses, directories will be removed from this list, and |
| 50 | non-compliant files marked on a per-file basis as below. Early results |
| 51 | indicate that often 85%+ of files in a directory already happen to be |
| 52 | compliant, so file-by-file suppression allows this code to be subject |
| 53 | to enforcement. |
| 54 | |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 55 | This mechanism opts directories out of all warning categories (unsafe |
| 56 | buffers and unsafe libc calls). |
| 57 | |
| 58 | #### Syntax of Unsafe Buffer Paths file |
| 59 | |
| 60 | Note: Paths should be written as relative to the root of the source tree with |
| 61 | unix-style path separators. Directory prefixes should end with `/`, such |
| 62 | as `base/`. |
| 63 | |
| 64 | Empty lines are ignored. |
| 65 | |
| 66 | The `#` character introduces a comment until the end of the line. |
| 67 | |
Tom Sepez | b1a8156 | 2025-02-20 17:28:54 | [diff] [blame] | 68 | Lines starting with `.` declare which checks are to be enforced, as |
| 69 | a comma-separated list of values. Currently allowed values are `buffers` |
| 70 | and `libc`. |
| 71 | |
| 72 | All other lines specify which paths are to be included/excluded. |
| 73 | |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 74 | Lines that begin with `-` are immediately followed by path prefixes that |
| 75 | will *not* be checked for unsafe-buffer-usage. They are known to do unsafe |
| 76 | things and should be changed to use constructs like base::span or containers |
| 77 | like base::HeapArray and std::vector instead. See https://crbug.com/40285824 |
| 78 | |
| 79 | Lines that begin with `+` are immediately followed by path prefixes that will |
| 80 | be checked for unsafe-buffer-usage. These have no such usage (or all such |
| 81 | usage is annotated), and are protected against new unsafe pointer behaviour |
| 82 | by the compiler. Generally, `+` lines are used to enable checks for |
| 83 | sub-directories of a path that has previously disabled checks (with a |
| 84 | `-` line). |
| 85 | |
| 86 | If a file matches both a `-` and `+` line, the longest matching prefix takes |
| 87 | precedence. |
| 88 | |
| 89 | #### Removing directories from Unsafe Buffers Paths file |
| 90 | |
| 91 | The recommended process for removing a `-dir/` line from this file is: |
| 92 | |
| 93 | 1. Remove the `-dir/` line from this paths file. Possibly add some subdirectories |
| 94 | now needed to reduce scope, like `-dir/sub_dir/`. |
| 95 | |
| 96 | 2. Add `#pragma allow_unsafe_buffers` to every file in the directory that now |
| 97 | has a compilation error (see the next section). |
| 98 | |
| 99 | ### Opting out individual files |
| 100 | |
Tom Sepez | 9e7f696 | 2024-06-13 00:21:59 | [diff] [blame] | 101 | Individual files are opted out of unsafe pointer usage warnings though |
| 102 | the use of the following snippet, which is to be placed immediately |
| 103 | following the copyright header in a source file. |
| 104 | ``` |
| 105 | #ifdef UNSAFE_BUFFERS_BUILD |
| 106 | // TODO(crbug.com/ABC): Remove this and convert code to safer constructs. |
| 107 | #pragma allow_unsafe_buffers |
| 108 | #endif |
| 109 | ``` |
| 110 | |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 111 | The above mechanism also suppress unsafe libc call warnings in addition |
| 112 | to the unsafe buffer warnings. |
| 113 | |
| 114 | To prevent back-sliding on files which have been made safe with respect |
| 115 | to unsafe buffers, there is now a per-file pragma which suppresses the |
| 116 | libc warnings while still enforcing the unsafe buffer warnings. |
| 117 | |
| 118 | ``` |
| 119 | #ifdef UNSAFE_BUFFERS_BUILD |
| 120 | // TODO(crbug.com/ABC): Remove libc calls to fix these warnings. |
| 121 | #pragma allow_unsafe_libc_calls |
| 122 | #endif |
| 123 | ``` |
| 124 | |
| 125 | An initial set of files containing allow\_unsafe\_libc\_calls has been |
| 126 | uploaded; please keep these in place until the pending libc enforcement |
| 127 | is enabled for Chromium. |
| 128 | |
| 129 | #### Removing pragmas from individual files. |
| 130 | |
| 131 | The recommended process for removing pragmas from individual files is: |
| 132 | |
| 133 | 1. 1. Remove the pragma from the file. |
| 134 | |
| 135 | 2. Use the compiler warnings now generated to identify the individual |
| 136 | expressions to suppress (see the next section). |
| 137 | |
| 138 | ### Opting out individual expressions |
| 139 | |
Tom Sepez | 9e7f696 | 2024-06-13 00:21:59 | [diff] [blame] | 140 | Individual expressions or blocks of code are opted out by using the |
Evan Stade | 8d390c5 | 2025-04-22 20:28:49 | [diff] [blame] | 141 | `UNSAFE_BUFFERS()` macro as defined in [`//base/compiler_specific.h`](../base/compiler_specific.h) |
Tom Sepez | 9e7f696 | 2024-06-13 00:21:59 | [diff] [blame] | 142 | file. These should be rare once a project is fully converted, except |
| 143 | perhaps when working with C-style external APIs. These must |
| 144 | always be accompanied by a `// SAFETY:` comment explaining in detail |
| 145 | how the code has been evaluated to be safe for all possible input. |
| 146 | |
Evan Stade | 8d390c5 | 2025-04-22 20:28:49 | [diff] [blame] | 147 | Code introducing `UNSAFE_BUFFERS()` macro invocations without corresponding |
Tom Sepez | 9e7f696 | 2024-06-13 00:21:59 | [diff] [blame] | 148 | `// SAFETY:` comment should be summarily rejected during code review. |
| 149 | |
Tom Sepez | 3072e38 | 2024-08-14 22:42:44 | [diff] [blame] | 150 | To allow for incremental conversion, code can be temporarily opted out by |
| 151 | using the `UNSAFE_TODO()` macro. This provides the same functionality as |
| 152 | the `UNSAFE_BUFFERS()` macro, but allows easier searching for code in need |
| 153 | of revision. Add TODO() comment, along the lines of |
| 154 | `// TODO(crbug.com/xxxxxx): resolve safety issues`. |
| 155 | |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 156 | This mechanism opts expressions out of all warning categories (unsafe |
| 157 | buffers and unsafe libc calls). |
Tom Sepez | b142846 | 2025-01-27 21:44:43 | [diff] [blame] | 158 | |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 159 | #### Removing UNSAFE_TODO() from individual expressions |
Tom Sepez | b142846 | 2025-01-27 21:44:43 | [diff] [blame] | 160 | |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 161 | We seek to convert code to use owning containers like HeapArray and vector, |
| 162 | as explained in the next section. |
Tom Sepez | b142846 | 2025-01-27 21:44:43 | [diff] [blame] | 163 | |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 164 | ## Container-based ecosystem |
| 165 | |
| 166 | Containers may be owning types or view types. The common owning containers that |
| 167 | us contiguous storage are `std::vector`, `std::string`, `base::HeapArray`, |
| 168 | `std::array`. Their common view types are `base::span`, `std::string_view`, |
| 169 | `base::cstring_view`. |
| 170 | |
| 171 | Other owning containers include maps, sets, deques, etc. These are not |
| 172 | compatible with `base::span` as they are not contiguous and generally do not |
| 173 | have an associated view type at this time. |
| 174 | |
| 175 | We are using `base::span` instead of `std::span` in order to provide a type that |
| 176 | can do more than the standard type. We also have other types and functions to |
| 177 | work with ranges and spans instead of unbounded pointers and iterators. |
| 178 | |
| 179 | The common conversions to spans are: |
| 180 | - `base::span<T>` replaces `T* ptr, size_t size`. |
| 181 | - `base::span<T, N>` replaces `T (&ptr)[N]` (a reference to a compile-time-sized |
| 182 | array). |
| 183 | - `base::raw_span<T>` replaces `base::span<T>` (and `T* ptr, size_t size`) for |
| 184 | class fields. |
| 185 | |
| 186 | ### Span construction |
Peter Kasting | e825c6f | 2024-12-02 17:35:39 | [diff] [blame] | 187 | - `base::span()` makes a span, deducing the type and size, from any contiguous |
| 188 | range. It can also take explicit begin/end or data/size pairs. |
| 189 | - `base::to_fixed_extent<N>()` makes a fixed-size span from a dynamic one. |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 190 | - `base::as_bytes()` and `base::as_chars()` convert a span’s inner type to |
| 191 | `uint8_t` or `char` respectively, making a byte-span or char-span. |
| 192 | - `base::span_from_ref()` and `base::byte_span_from_ref()` make a span, or |
| 193 | byte-span, from a single object. |
Peter Kasting | e825c6f | 2024-12-02 17:35:39 | [diff] [blame] | 194 | - `base::as_byte_span()` and `base::as_writable_byte_span()` make a |
| 195 | byte-span from any contiguous range. |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 196 | |
| 197 | #### Padding bytes |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 198 | Note that if the type contains padding bytes that were not somehow explicitly |
| 199 | initialized, this can create reads of uninitialized memory. Conversion to a |
| 200 | byte-span is most commonly used for spans of primitive types, such as going from |
| 201 | `char` (such as in `std::string`) or `uint32_t` (in a `std::vector`) to |
| 202 | `unit8_t`. |
| 203 | |
| 204 | ### Dynamic read/write of a span |
| 205 | - `base::SpanReader` reads heterogeneous values from a (typically, byte-) span |
| 206 | in a dynamic manner. |
| 207 | - `base::SpanWriter` writes heterogeneous values into a (typically, byte-) span |
| 208 | in a dynamic manner. |
| 209 | |
| 210 | ### Values to/from byte spans |
| 211 | In [`//base/numerics/byte_conversions.h`](../base/numerics/byte_conversions.h) |
| 212 | we have conversions between byte-arrays and big/little endian integers or |
| 213 | floats. For example (and there are many other variations): |
| 214 | - `base::U32FromBigEndian` converts from a big-endian byte-span to an unsigned |
| 215 | 32-bit integer. |
| 216 | - `base::U32FromLittleEndian` converts from a little-endian byte-span to an |
| 217 | unsigned |
| 218 | - `base::U32ToBigEndian` converts from an integer to a big-endian-encoded |
| 219 | 4-byte-array. |
| 220 | - `base::U32ToLittleEndian` converts from an integer to a little-endian-encoded |
| 221 | 4-byte-array. |
| 222 | |
| 223 | ### Heap-allocated arrays |
| 224 | - `base::HeapArray<T>` replaces `std::unique_ptr<T[]>` and places the bounds of |
| 225 | the array inside the `HeapArray` which makes it a bounds-safe range. |
| 226 | |
| 227 | ### Copying and filling arrays |
| 228 | - `base::span::copy_from(span)` replaces `memcpy` and `memmove`, and verifies |
| 229 | that the source and destination spans have the same size instead of writing |
| 230 | out of bounds. It lowers to the same code as `memmove` when possible. |
| 231 | - Note `std::ranges::copy` is not bounds-safe (though its name sounds like |
| 232 | it should be). |
| 233 | - `std::ranges::fill` replaces `memset` and works with a range so you don't |
| 234 | need explicit bounds. |
| 235 | |
| 236 | ### String pointers |
| 237 | |
| 238 | A common form of pointer is `const char*` which is used (sometimes) to represent |
| 239 | a NUL-terminated string. The standard library gives us two types to replace |
| 240 | `char*`, which allow us to know the bounds of the character array and work with |
| 241 | the string as a range: |
| 242 | |
| 243 | - `std::string` owns a NUL-terminated string. |
| 244 | - `std::string_view` is a view of a non-NUL-terminated string. |
| 245 | |
| 246 | What’s missing is a view of a string that is guaranteed to be NUL-terminated so |
| 247 | that you can call `.c_str()` to generate a `const char*` suitable for C APIs. |
| 248 | |
| 249 | - `base::cstring_view` is a view of a NUL-terminated string. This avoids the |
| 250 | need to construct a `std::string` in order to ensure a terminating NUL is |
| 251 | present. Use this as a view type whenever your code bottoms out in a C API |
| 252 | that needs NUL-terminated string pointer. |
| 253 | |
| 254 | ### Use of std::array<T>. |
Tom Sepez | 9e7f696 | 2024-06-13 00:21:59 | [diff] [blame] | 255 | |
| 256 | The clang plugin is very particular about indexing a C-style array (e.g. |
| 257 | `int arr[100]`) with a variable index. Often these issues can be resolved |
| 258 | by replacing this with `std::array<int, 100> arr`, which provides safe |
| 259 | indexed operations. In particular, new code should prefer to use the |
| 260 | `std::array<T, N>` mechanism. |
| 261 | |
| 262 | For arrays where the size is determined by the compiler (e.g. |
| 263 | `int arr[] = { 1, 3, 5 };`), the `std::to_array<T>()` helper function |
| 264 | should be used along with the `auto` keyword: |
| 265 | `auto arr = std::to_array<int>({1, 3, 5});` |
| 266 | |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 267 | ### Avoid reinterpret_cast |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 268 | |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 269 | Casts to bytes are common and can be handled as follows. |
| 270 | |
| 271 | #### Writing to a byte span |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 272 | |
| 273 | A common idiom in older code is to write into a byte array by casting |
| 274 | the array into a pointer to a larger type (such as `uint32_t` or `float`) |
danakj | cdf43e2 | 2024-07-02 16:42:16 | [diff] [blame] | 275 | and then writing through that pointer. This can result in Undefined Behaviour |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 276 | and violates the rules of the C++ abstract machine. |
| 277 | |
| 278 | Instead, keep the byte array as a `base::span<uint8_t>`, and write to it |
| 279 | directly by chunking it up into pieces of the size you want to write. |
| 280 | |
Peter Kasting | 448444f | 2024-12-13 01:25:36 | [diff] [blame] | 281 | Using `take_first()` (good for repeated modifications and loops): |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 282 | ```cc |
| 283 | void write_floats(base::span<uint8_t> out, float f1, float f2) { |
Peter Kasting | 448444f | 2024-12-13 01:25:36 | [diff] [blame] | 284 | // Write `f1` into `out`'s prefix, moving `out` forward. |
| 285 | out.take_first<4>().copy_from(base::byte_span_from_ref(f1)); |
| 286 | // Write `f2` into `out`'s new prefix (after `f1`). |
| 287 | out.copy_prefix_from(base::byte_span_from_ref(f2)); |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 288 | } |
| 289 | ``` |
| 290 | |
Peter Kasting | 448444f | 2024-12-13 01:25:36 | [diff] [blame] | 291 | Using `split_at()` (good when there are exactly two pieces): |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 292 | ```cc |
| 293 | void write_floats(base::span<uint8_t> out, float f1, float f2) { |
Peter Kasting | 448444f | 2024-12-13 01:25:36 | [diff] [blame] | 294 | // Split `out` into a prefix to write `f1` into, and a remainder. |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 295 | auto [write_f1, rem] = out.split_at<4>(); |
Peter Kasting | 448444f | 2024-12-13 01:25:36 | [diff] [blame] | 296 | // Write `f1` into the prefix portion, `write_f1`. |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 297 | write_f1.copy_from(base::byte_span_from_ref(f1)); |
Peter Kasting | 448444f | 2024-12-13 01:25:36 | [diff] [blame] | 298 | // Write `f2` into the beginning of the remainder. |
| 299 | rem.copy_prefix_from(base::byte_span_from_ref(f2)); |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 300 | } |
| 301 | ``` |
| 302 | |
Peter Kasting | 448444f | 2024-12-13 01:25:36 | [diff] [blame] | 303 | Using `SpanWriter` and endian-aware `FloatToLittleEndian()` (good when non-fatal |
| 304 | APIs are desired): |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 305 | ```cc |
| 306 | void write_floats(base::span<uint8_t> out, float f1, float f2) { |
| 307 | auto writer = base::SpanWriter(out); |
| 308 | CHECK(writer.Write(base::FloatToLittleEndian(f1))); |
| 309 | CHECK(writer.Write(base::FloatToLittleEndian(f2))); |
| 310 | } |
| 311 | ``` |
| 312 | |
| 313 | Writing big-endian, with `SpanWriter` and endian-aware `U32ToBigEndian()`: |
| 314 | ```cc |
| 315 | void write_values(base::span<uint8_t> out, uint32_t i1, uint32_t i2) { |
| 316 | auto writer = base::SpanWriter(out); |
| 317 | CHECK(writer.Write(base::U32ToBigEndian(i1))); |
| 318 | // SpanWriter has a built-in shortcut to do the same thing. |
| 319 | CHECK(writer.WriteU32BigEndian(i2)); |
| 320 | // Verify we wrote to the whole output. We can put a size parameter in the |
| 321 | // `out` span to push this check to compile-time when it's a constant. |
| 322 | CHECK_EQ(writer.remaining(), 0u); |
| 323 | } |
| 324 | ``` |
| 325 | |
| 326 | Writing an array to a byte span with `copy_from()`: |
| 327 | ```cc |
| 328 | void write_floats(base::span<uint8_t> out, std::vector<const float> floats) { |
| 329 | base::span<const uint8_t> byte_floats = base::as_byte_span(floats); |
Peter Kasting | 448444f | 2024-12-13 01:25:36 | [diff] [blame] | 330 | // Or use copy_from() if you want to CHECK at runtime that all of `out` has |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 331 | // been written to. |
Peter Kasting | 448444f | 2024-12-13 01:25:36 | [diff] [blame] | 332 | out.copy_prefix_from(byte_floats); |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 333 | } |
| 334 | ``` |
| 335 | |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 336 | #### Reading from a byte span |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 337 | |
| 338 | Instead of turning a `span<const uint8_t>` into a pointer of a larger type, |
| 339 | which can cause Undefined Behaviour, read values out of the byte span and |
| 340 | convert each one as a value (not as a pointer). |
| 341 | |
Peter Kasting | 448444f | 2024-12-13 01:25:36 | [diff] [blame] | 342 | Using `take_first()` and endian-aware conversion `FloatFromLittleEndian`: |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 343 | ```cc |
| 344 | void read_floats(base::span<const uint8_t> in, float& f1, float& f2) { |
Peter Kasting | 448444f | 2024-12-13 01:25:36 | [diff] [blame] | 345 | f1 = base::FloatFromLittleEndian(in.take_first<4>()); |
| 346 | f2 = base::FloatFromLittleEndian(in.take_first<4>()); |
danakj | a4c4238 | 2024-06-18 19:05:44 | [diff] [blame] | 347 | } |
| 348 | ``` |
| 349 | |
| 350 | Using `SpanReader` and endian-aware `U32FromBigEndian()`: |
| 351 | ```cc |
| 352 | void read_values(base::span<const uint8_t> in, int& i1, int& i2, int& i3) { |
| 353 | auto reader = base::SpanReader(in); |
| 354 | i1 = base::U32FromBigEndian(*reader.Read<4>()); |
| 355 | i2 = base::U32FromBigEndian(*reader.Read<4>()); |
| 356 | // SpanReader has a built-in shortcut to do the same thing. |
| 357 | CHECK(reader.ReadU32BigEndian(i3)); |
| 358 | // Verify we read the whole input. We can put a size parameter in the `in` |
| 359 | // span to push this check to compile-time when it's a constant. |
| 360 | CHECK_EQ(reader.remaining(), 0u); |
| 361 | } |
| 362 | ``` |
| 363 | |
| 364 | ## Patterns for spanification |
Tom Sepez | 9e7f696 | 2024-06-13 00:21:59 | [diff] [blame] | 365 | |
| 366 | Most pointer issues ought to be resolved by converting to containers. In |
| 367 | particular, one common conversion is to replace `T*` pointers with |
| 368 | `base::span<T>` in a process known as spanification, since most pointers |
| 369 | are unowned references into an array (or vector). The appropriate |
| 370 | replacement for the pointer is |
danakj | e7db1e3f3 | 2024-04-16 20:43:24 | [diff] [blame] | 371 | [`base::span`](../base/containers/span.h). |
danakj | 4e625fb | 2024-03-06 20:47:46 | [diff] [blame] | 372 | |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 373 | ### Copying arrays (`memcpy`) |
| 374 | |
| 375 | You have: |
| 376 | ```cc |
| 377 | uint8_t array1[12]; |
| 378 | uint8_t array2[16]; |
| 379 | uint64_t array3[2]; |
| 380 | memcpy(array1, array2 + 8, 4); |
| 381 | memcpy(array1 + 4, array3, 8); |
| 382 | ``` |
| 383 | |
| 384 | Spanified: |
| 385 | ```cc |
| 386 | uint8_t array1[12]; |
| 387 | uint8_t array2[16]; |
| 388 | uint64_t array3[2]; |
Peter Kasting | 448444f | 2024-12-13 01:25:36 | [diff] [blame] | 389 | base::span<uint8_t> span1(array1); |
| 390 | span1.take_first<4>().copy_from(base::span(array2).subspan<8, 4>()); |
| 391 | span1.copy_from(base::as_byte_span(array3).first<8>()); |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 392 | |
| 393 | // Use `split_at()` to ensure `array1` is fully written. |
Peter Kasting | 448444f | 2024-12-13 01:25:36 | [diff] [blame] | 394 | auto [from2, from3] = base::span(array1).split_at<4>(); |
| 395 | from2.copy_from(base::span(array2).subspan<8, 4>()); |
| 396 | from3.copy_from(base::as_byte_span(array3).first<8>()); |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 397 | ``` |
| 398 | |
| 399 | ### Zeroing arrays (`memset`) |
| 400 | |
| 401 | `std::ranges::fill` works on any range/container and won't write out of |
| 402 | bounds. Converting arbitrary types into a byte array (through |
| 403 | `base::as_writable_byte_span`) is only valid when the type holds trivial |
| 404 | types such as primitives. Unlike `memset`, a constructed object can be |
| 405 | given as the value to use as in `std::ranges_fill(container, Object())`. |
| 406 | |
| 407 | You have: |
| 408 | ```cc |
| 409 | uint8_t array1[12]; |
| 410 | uint64_t array2[2]; |
| 411 | Object array3[4]; |
| 412 | memset(array1, 0, 12); |
| 413 | memset(array2, 0, 2 * sizeof(uint64_t)); |
| 414 | memset(array3, 0, 4 * sizeof(Object)); |
| 415 | ``` |
| 416 | |
| 417 | Spanified: |
| 418 | ```cc |
| 419 | uint8_t array1[12]; |
| 420 | uint64_t array2[2]; |
| 421 | Object array3[4]; |
Peter Kasting | 448444f | 2024-12-13 01:25:36 | [diff] [blame] | 422 | std::ranges::fill(array1, 0); |
| 423 | std::ranges::fill(array2, 0); |
| 424 | std::ranges::fill(base::as_writable_byte_span(array3), 0); |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 425 | ``` |
| 426 | |
| 427 | ### Comparing arrays (`memcmp`) |
| 428 | |
| 429 | You have: |
| 430 | ```cc |
| 431 | uint8_t array1[12] = {}; |
| 432 | uint8_t array2[12] = {}; |
| 433 | bool eq = memcmp(array1, array2, sizeof(array1)) == 0; |
danakj | 8d29e18 | 2024-06-05 16:50:18 | [diff] [blame] | 434 | bool less = memcmp(array1, array2, sizeof(array1)) < 0; |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 435 | |
| 436 | // In tests. |
| 437 | for (size_t i = 0; i < sizeof(array1); ++i) { |
| 438 | SCOPED_TRACE(i); |
| 439 | EXPECT_EQ(array1[i], array2[i]); |
| 440 | } |
| 441 | ``` |
| 442 | |
| 443 | Spanified: |
| 444 | ```cc |
| 445 | uint8_t array1[12] = {}; |
| 446 | uint8_t array2[12] = {}; |
| 447 | // If one side is a span, the other will convert to span too. |
| 448 | bool eq = base::span(array1) == array2; |
danakj | 8d29e18 | 2024-06-05 16:50:18 | [diff] [blame] | 449 | bool less = base::span(array1) < array2; |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 450 | |
| 451 | // In tests. |
| 452 | EXPECT_EQ(base::span(array1), array2); |
| 453 | ``` |
| 454 | |
| 455 | ### Copying array into an integer |
| 456 | |
| 457 | You have: |
| 458 | ```cc |
| 459 | uint8_t array[44] = {}; |
| 460 | uint32_t v1; |
| 461 | memcpy(&v1, array, sizeof(v1)); // Front. |
| 462 | uint64_t v2; |
| 463 | memcpy(&v2, array + 6, sizeof(v2)); // Middle. |
| 464 | ``` |
| 465 | |
| 466 | Spanified: |
| 467 | ```cc |
| 468 | #include "base/numerics/byte_conversions.h" |
| 469 | ... |
| 470 | uint8_t array[44] = {}; |
| 471 | uint32_t v1 = base::U32FromLittleEndian(base::span(array).first<4u>()); // Front. |
| 472 | uint64_t v2 = base::U64FromLittleEndian(base::span(array).subspan<6u, 8u>()); // Middle. |
| 473 | ``` |
| 474 | |
| 475 | ### Copy an array into an integer via cast |
| 476 | |
David Benjamin | 0ffdea7 | 2024-06-12 19:55:22 | [diff] [blame] | 477 | Note: This pattern is prone to more UB than out-of-bounds access. It is UB to |
| 478 | cast pointers if the result is not aligned, so these cases are often buggy or |
| 479 | were only correct due to subtle assumptions on buffer alignment. The spanified |
| 480 | version avoids this pitfalls. It has no alignment requirement. |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 481 | |
| 482 | You have: |
| 483 | ```cc |
| 484 | uint8_t array[44] = {}; |
| 485 | uint32_t v1 = *reinterpret_cast<const uint32_t*>(array); // Front. |
David Benjamin | 0ffdea7 | 2024-06-12 19:55:22 | [diff] [blame] | 486 | uint64_t v2 = *reinterpret_cast<const uint64_t*>(array + 16); // Middle. |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 487 | ``` |
| 488 | |
| 489 | Spanified: |
| 490 | ```cc |
| 491 | #include "base/numerics/byte_conversions.h" |
| 492 | ... |
| 493 | uint8_t array[44] = {}; |
| 494 | uint32_t v1 = base::U32FromLittleEndian(base::span(array).first<4u>()); // Front. |
David Benjamin | 0ffdea7 | 2024-06-12 19:55:22 | [diff] [blame] | 495 | uint64_t v2 = base::U64FromLittleEndian(base::span(array).subspan<16u, 8u>()); // Middle. |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 496 | ``` |
| 497 | |
| 498 | ### Making a byte array (`span<uint8_t>`) from a string (or any array/range) |
| 499 | |
| 500 | You have: |
| 501 | ```cc |
| 502 | std::string str = "hello world"; |
| 503 | func_with_const_ptr_size(reinterpret_cast<const uint8_t*>(str.data()), str.size()); |
| 504 | func_with_mut_ptr_size(reinterpret_cast<uint8_t*>(str.data()), str.size()); |
| 505 | ``` |
| 506 | |
| 507 | Spanified: |
| 508 | ```cc |
| 509 | std::string str = "hello world"; |
| 510 | base::span<const uint8_t> bytes = base::as_byte_span(str); |
| 511 | func_with_const_ptr_size(bytes.data(), bytes.size()); |
| 512 | base::span<uint8_t> mut_bytes = base::as_writable_byte_span(str); |
| 513 | func_with_mut_ptr_size(mut_bytes.data(), mut_bytes.size()); |
| 514 | |
| 515 | // Replace pointer and size with a span, though. |
| 516 | func_with_const_span(base::as_byte_span(str)); |
| 517 | func_with_mut_span(base::as_writable_byte_span(str)); |
| 518 | ``` |
| 519 | |
| 520 | ### Making a byte array (`span<uint8_t>`) from an object |
| 521 | |
| 522 | You have: |
| 523 | ```cc |
| 524 | uint8_t array[8]; |
| 525 | uint64_t val; |
| 526 | two_byte_arrays(array, reinterpret_cast<const uint8_t*>(&val)); |
| 527 | ``` |
| 528 | |
| 529 | Spanified: |
| 530 | ```cc |
| 531 | uint8_t array[8]; |
| 532 | uint64_t val; |
| 533 | base::span<uint8_t> val_span = base::byte_span_from_ref(val); |
| 534 | two_byte_arrays(array, val_span.data()); |
| 535 | |
| 536 | // Replace an unbounded pointer a span, though. |
| 537 | two_byte_spans(base::span(array), base::byte_span_from_ref(val)); |
| 538 | ``` |
danakj | e7db1e3f3 | 2024-04-16 20:43:24 | [diff] [blame] | 539 | |
danakj | b70695e | 2024-07-16 15:20:28 | [diff] [blame] | 540 | ### Avoid std::next() for silencing warnings, use ranges |
| 541 | |
| 542 | When we convert `pointer + index` to `std::next(pointer, index)` we silence the |
| 543 | `Wunsafe-buffer-usage` warning by pushing the unsafe pointer arithmetic into |
| 544 | the `std::next()` function in a system header, but we have the same unsafety. |
| 545 | `std::next()` does no additional bounds checking. |
| 546 | |
| 547 | Instead of using `std::next()`, rewrite away from using pointers (or iterators) |
| 548 | entirely by using ranges. `span()` allows us to take a subset of a contiguous |
| 549 | range without having to use iterators that we move with arithmetic or |
| 550 | `std::next()`. |
| 551 | |
danakj | e64eb29 | 2024-10-22 19:02:33 | [diff] [blame] | 552 | Likewise, `std::advance()` can silence the warning but does not add any safety |
| 553 | to the pointer arithmetic and should be avoided. |
| 554 | |
danakj | b70695e | 2024-07-16 15:20:28 | [diff] [blame] | 555 | Instead of using pointer/iterator arithmetic: |
| 556 | ```cc |
| 557 | // Unsafe buffers warning on the unchecked arithmetic. |
| 558 | auto it = std::find(vec.begin() + offset, vec.end(), 20); |
| 559 | // No warning... But has the same security risk! |
| 560 | auto it = std::find(std::next(vec.begin(), offset), vec.end(), 20); |
| 561 | ``` |
| 562 | |
| 563 | Use a range, with `span()` providing a view of a subset of the range: |
| 564 | ```cc |
| 565 | auto it = std::ranges::find(base::span(vec).subspan(offset), 20); |
| 566 | ``` |
| 567 | |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 568 | ### Functions with array pointer parameters |
danakj | 4e625fb | 2024-03-06 20:47:46 | [diff] [blame] | 569 | |
Tom Sepez | 9e7f696 | 2024-06-13 00:21:59 | [diff] [blame] | 570 | Functions that receive a pointer argument into an array may read |
| 571 | or write out of bounds of that array if subsequent manual size |
| 572 | calculations are incorrect. Such functions should be avoided if |
| 573 | possible, or marked with the `UNSAFE_BUFFER_USAGE` attribute macro |
| 574 | otherwise. This macro propagates to their callers that they must |
| 575 | be called from inside an `UNSAFE_BUFFERS()` region (along with |
| 576 | a corresponding safety comment explaining how the caller knows |
| 577 | the call will be safe). |
danakj | 4e625fb | 2024-03-06 20:47:46 | [diff] [blame] | 578 | |
| 579 | The same is true for functions that accept an iterator instead |
danakj | e7db1e3f3 | 2024-04-16 20:43:24 | [diff] [blame] | 580 | of a range type. Some examples of each are `memcpy()` and |
| 581 | `std::copy()`. |
danakj | 4e625fb | 2024-03-06 20:47:46 | [diff] [blame] | 582 | |
Tom Sepez | 9e7f696 | 2024-06-13 00:21:59 | [diff] [blame] | 583 | Again, calling such functions is unsafe and should be avoided. |
| 584 | Replace such functions with an API built on base::span |
danakj | 4e625fb | 2024-03-06 20:47:46 | [diff] [blame] | 585 | or other range types which prevents any chance of OOB memory |
| 586 | access. For instance, replace `memcpy()`, `std::copy()` and |
| 587 | `std::ranges::copy()` with `base::span::copy_from()`. And |
| 588 | replace `memset()` with `std::ranges::fill()`. |
danakj | b470507 | 2024-11-06 20:54:09 | [diff] [blame] | 589 | |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 590 | ### Aligned memory |
danakj | b470507 | 2024-11-06 20:54:09 | [diff] [blame] | 591 | |
| 592 | An aligned heap allocation can be constructed into a `base::HeapArray` through |
| 593 | the `base::AlignedUninit<T>(size, alignment)` function in |
| 594 | `//base/memory/aligned_memory.h`. It will allocate space for `size` many `T` |
| 595 | objects aligned to `alignment`, and return a `base::AlignedHeapArray<T>` which |
| 596 | is a `base::HeapArray` with an appropriate deleter. Note that the returned |
| 597 | memory is uninitialized. |
| 598 | ```cc |
| 599 | base::AlignedHeapArray<float> array = base::AlignedUninit<float>(size, alignment); |
| 600 | ``` |
| 601 | |
| 602 | Some containers are built on top of buffers of `char`s that are aligned for |
| 603 | some other `T` in order to manage the lifetimes of objects in the buffer |
| 604 | through in-place construction (`std::construct_at`) and destruction. While the |
| 605 | memory is allocated and destroyed as `char*`, it is accessed as `T*`. The |
| 606 | `base::AlignedUninitCharArray<T>(size, alignment)` function in |
| 607 | `//base/memory/aligned_memory.h` handles this by returning both: |
| 608 | - A `base::AlignedHeapArray<char>` that will not call destructors on anything in its |
| 609 | buffer. |
| 610 | - A `base::span<T>` that points to all of the (not-yet-created) objects in the |
| 611 | `AlignedHeapArray`. This span can be used to construct `T` objects in place in the |
| 612 | buffer, and the caller is responsible for destroying them as well. |
| 613 | ```cc |
| 614 | auto [a, s] = base::AlignedUninitCharArray<float>(size, alignment); |
| 615 | base::AlignedHeapArray<char> array = std::move(a); |
| 616 | base::span<float> span = s; |
| 617 | ``` |
Tom Sepez | d647d84 | 2025-02-11 21:11:51 | [diff] [blame] | 618 | |