danakj | 4e625fb | 2024-03-06 20:47:46 | [diff] [blame] | 1 | # Preventing OOB through Unsafe Buffers errors (aka Spanification) |
| 2 | |
| 3 | Out-of-bounds (OOB) security bugs commonly happen through pointers |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 4 | which have no bounds information associated with them. We can prevent such |
danakj | 4e625fb | 2024-03-06 20:47:46 | [diff] [blame] | 5 | bugs by always using containers. |
| 6 | |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 7 | Entire directories have been opted out of unsafe pointer usage |
| 8 | warnings through the |
| 9 | [`//build/config/unsafe_buffers_paths.txt`](../build/config/unsafe_buffers_paths.txt) |
| 10 | file. Our [compiler](../tools/clang/plugins/UnsafeBuffersPlugin.cpp) enables |
| 11 | the `-Wunsafe-buffer-usage` warning in any file that is not |
| 12 | opted out. As we convert unsafe pointers to safe constructs like |
| 13 | `base::span`, `base::HeapArray` and `std::vector`, we will |
| 14 | remove paths from that file to enable the warnings across the |
| 15 | codebase. To incrementally apply the warnings across a whole directory, |
| 16 | individual files [can be opted-out](#controlling-warnings-for-a-single-file) |
| 17 | of the warning. |
| 18 | |
| 19 | ## Patterns for spanification |
| 20 | |
danakj | 4e625fb | 2024-03-06 20:47:46 | [diff] [blame] | 21 | Most pointers are unowned references into an array (or vector) |
| 22 | and the most appropriate replacement for the pointer is |
danakj | e7db1e3f3 | 2024-04-16 20:43:24 | [diff] [blame] | 23 | [`base::span`](../base/containers/span.h). |
danakj | 4e625fb | 2024-03-06 20:47:46 | [diff] [blame] | 24 | |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 25 | ### Copying arrays (`memcpy`) |
| 26 | |
| 27 | You have: |
| 28 | ```cc |
| 29 | uint8_t array1[12]; |
| 30 | uint8_t array2[16]; |
| 31 | uint64_t array3[2]; |
| 32 | memcpy(array1, array2 + 8, 4); |
| 33 | memcpy(array1 + 4, array3, 8); |
| 34 | ``` |
| 35 | |
| 36 | Spanified: |
| 37 | ```cc |
| 38 | uint8_t array1[12]; |
| 39 | uint8_t array2[16]; |
| 40 | uint64_t array3[2]; |
| 41 | base::span(array1).first(4u).copy_from(base::span(array2).subspan(8u, 4u)); |
| 42 | base::span(array1).subspan(4u).copy_from(base::as_byte_span(array3).first(8u)); |
| 43 | |
| 44 | // Use `split_at()` to ensure `array1` is fully written. |
| 45 | auto [from2, from3] = base::span(array1).split_at(4u); |
| 46 | from2.copy_from(base::span(array2).subspan(8u, 4u)); |
| 47 | from3.copy_from(base::as_byte_span(array3).first(8u)); |
| 48 | |
| 49 | // This can even be ensured at compile time (if sizes and offsets are all |
| 50 | // constants). |
| 51 | auto [from2, from3] = base::span(array1).split_at<4u>(); |
| 52 | from2.copy_from(base::span(array2).subspan<8u, 4u>()); |
| 53 | from3.copy_from(base::as_byte_span(array3).first<8u>()); |
| 54 | ``` |
| 55 | |
| 56 | ### Zeroing arrays (`memset`) |
| 57 | |
| 58 | `std::ranges::fill` works on any range/container and won't write out of |
| 59 | bounds. Converting arbitrary types into a byte array (through |
| 60 | `base::as_writable_byte_span`) is only valid when the type holds trivial |
| 61 | types such as primitives. Unlike `memset`, a constructed object can be |
| 62 | given as the value to use as in `std::ranges_fill(container, Object())`. |
| 63 | |
| 64 | You have: |
| 65 | ```cc |
| 66 | uint8_t array1[12]; |
| 67 | uint64_t array2[2]; |
| 68 | Object array3[4]; |
| 69 | memset(array1, 0, 12); |
| 70 | memset(array2, 0, 2 * sizeof(uint64_t)); |
| 71 | memset(array3, 0, 4 * sizeof(Object)); |
| 72 | ``` |
| 73 | |
| 74 | Spanified: |
| 75 | ```cc |
| 76 | uint8_t array1[12]; |
| 77 | uint64_t array2[2]; |
| 78 | Object array3[4]; |
| 79 | std::ranges::fill(array1, 0u); |
| 80 | std::ranges::fill(array2, 0u); |
| 81 | std::ranges::fill(base::as_writable_byte_span(array3), 0u); |
| 82 | ``` |
| 83 | |
| 84 | ### Comparing arrays (`memcmp`) |
| 85 | |
| 86 | You have: |
| 87 | ```cc |
| 88 | uint8_t array1[12] = {}; |
| 89 | uint8_t array2[12] = {}; |
| 90 | bool eq = memcmp(array1, array2, sizeof(array1)) == 0; |
danakj | 8d29e18 | 2024-06-05 16:50:18 | [diff] [blame^] | 91 | bool less = memcmp(array1, array2, sizeof(array1)) < 0; |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 92 | |
| 93 | // In tests. |
| 94 | for (size_t i = 0; i < sizeof(array1); ++i) { |
| 95 | SCOPED_TRACE(i); |
| 96 | EXPECT_EQ(array1[i], array2[i]); |
| 97 | } |
| 98 | ``` |
| 99 | |
| 100 | Spanified: |
| 101 | ```cc |
| 102 | uint8_t array1[12] = {}; |
| 103 | uint8_t array2[12] = {}; |
| 104 | // If one side is a span, the other will convert to span too. |
| 105 | bool eq = base::span(array1) == array2; |
danakj | 8d29e18 | 2024-06-05 16:50:18 | [diff] [blame^] | 106 | bool less = base::span(array1) < array2; |
danakj | 6a49deaa | 2024-06-04 21:31:55 | [diff] [blame] | 107 | |
| 108 | // In tests. |
| 109 | EXPECT_EQ(base::span(array1), array2); |
| 110 | ``` |
| 111 | |
| 112 | ### Copying array into an integer |
| 113 | |
| 114 | You have: |
| 115 | ```cc |
| 116 | uint8_t array[44] = {}; |
| 117 | uint32_t v1; |
| 118 | memcpy(&v1, array, sizeof(v1)); // Front. |
| 119 | uint64_t v2; |
| 120 | memcpy(&v2, array + 6, sizeof(v2)); // Middle. |
| 121 | ``` |
| 122 | |
| 123 | Spanified: |
| 124 | ```cc |
| 125 | #include "base/numerics/byte_conversions.h" |
| 126 | ... |
| 127 | uint8_t array[44] = {}; |
| 128 | uint32_t v1 = base::U32FromLittleEndian(base::span(array).first<4u>()); // Front. |
| 129 | uint64_t v2 = base::U64FromLittleEndian(base::span(array).subspan<6u, 8u>()); // Middle. |
| 130 | ``` |
| 131 | |
| 132 | ### Copy an array into an integer via cast |
| 133 | |
| 134 | Note: This pattern was UB-prone in addition to bounds-unsafe, as it can produce |
| 135 | a pointer with incorrect alignment. reinterpret_cast is very easy to hold wrong |
| 136 | in C++ and is worth avoiding. |
| 137 | |
| 138 | You have: |
| 139 | ```cc |
| 140 | uint8_t array[44] = {}; |
| 141 | uint32_t v1 = *reinterpret_cast<const uint32_t*>(array); // Front. |
| 142 | uint64_t v2 = *reinterpret_cast<const uint64_t*>(array + 6); // Middle. |
| 143 | ``` |
| 144 | |
| 145 | Spanified: |
| 146 | ```cc |
| 147 | #include "base/numerics/byte_conversions.h" |
| 148 | ... |
| 149 | uint8_t array[44] = {}; |
| 150 | uint32_t v1 = base::U32FromLittleEndian(base::span(array).first<4u>()); // Front. |
| 151 | uint64_t v2 = base::U64FromLittleEndian(base::span(array).subspan<6u, 8u>()); // Middle. |
| 152 | ``` |
| 153 | |
| 154 | ### Making a byte array (`span<uint8_t>`) from a string (or any array/range) |
| 155 | |
| 156 | You have: |
| 157 | ```cc |
| 158 | std::string str = "hello world"; |
| 159 | func_with_const_ptr_size(reinterpret_cast<const uint8_t*>(str.data()), str.size()); |
| 160 | func_with_mut_ptr_size(reinterpret_cast<uint8_t*>(str.data()), str.size()); |
| 161 | ``` |
| 162 | |
| 163 | Spanified: |
| 164 | ```cc |
| 165 | std::string str = "hello world"; |
| 166 | base::span<const uint8_t> bytes = base::as_byte_span(str); |
| 167 | func_with_const_ptr_size(bytes.data(), bytes.size()); |
| 168 | base::span<uint8_t> mut_bytes = base::as_writable_byte_span(str); |
| 169 | func_with_mut_ptr_size(mut_bytes.data(), mut_bytes.size()); |
| 170 | |
| 171 | // Replace pointer and size with a span, though. |
| 172 | func_with_const_span(base::as_byte_span(str)); |
| 173 | func_with_mut_span(base::as_writable_byte_span(str)); |
| 174 | ``` |
| 175 | |
| 176 | ### Making a byte array (`span<uint8_t>`) from an object |
| 177 | |
| 178 | You have: |
| 179 | ```cc |
| 180 | uint8_t array[8]; |
| 181 | uint64_t val; |
| 182 | two_byte_arrays(array, reinterpret_cast<const uint8_t*>(&val)); |
| 183 | ``` |
| 184 | |
| 185 | Spanified: |
| 186 | ```cc |
| 187 | uint8_t array[8]; |
| 188 | uint64_t val; |
| 189 | base::span<uint8_t> val_span = base::byte_span_from_ref(val); |
| 190 | two_byte_arrays(array, val_span.data()); |
| 191 | |
| 192 | // Replace an unbounded pointer a span, though. |
| 193 | two_byte_spans(base::span(array), base::byte_span_from_ref(val)); |
| 194 | ``` |
danakj | e7db1e3f3 | 2024-04-16 20:43:24 | [diff] [blame] | 195 | |
| 196 | ## Controlling warnings for a single file |
| 197 | |
| 198 | Warnings can be disabled for a single (C++ source or header) file by |
| 199 | writing `#pragma allow_unsafe_buffers` anywhere in the file. This can |
danakj | 51d26a4 | 2024-04-25 14:23:56 | [diff] [blame] | 200 | be used to mark future work to drive down over time: |
| 201 | ``` |
| 202 | #ifdef UNSAFE_BUFFERS_BUILD |
| 203 | // TODO(crbug.com/ABC): Remove this and convert code to safer constructs. |
| 204 | #pragma allow_unsafe_buffers |
| 205 | #endif |
| 206 | ``` |
danakj | e7db1e3f3 | 2024-04-16 20:43:24 | [diff] [blame] | 207 | |
danakj | 51d26a4 | 2024-04-25 14:23:56 | [diff] [blame] | 208 | It's recommended to place the pragma directly copyright header. This can be used |
| 209 | to work through files in a directory incrementally. To do so, remove the whole |
| 210 | directory from opt-out in the |
danakj | e7db1e3f3 | 2024-04-16 20:43:24 | [diff] [blame] | 211 | [`//build/config/unsafe_buffers_paths.txt`](../build/config/unsafe_buffers_paths.txt) |
danakj | 51d26a4 | 2024-04-25 14:23:56 | [diff] [blame] | 212 | file, and temporarily mark any files with `#pragma allow_unsafe_buffers` that |
| 213 | need it. |
| 214 | |
| 215 | The warnings can be enabled for a single (C++ source or header) file by writing |
| 216 | `#pragma check_unsafe_buffers` anywhere in the file. These also need to be |
| 217 | guarded by `#ifdef UNSAFE_BUFFERS_BUILD` if being checked in. |
danakj | 4e625fb | 2024-03-06 20:47:46 | [diff] [blame] | 218 | |
| 219 | # Functions with array pointer parameters |
| 220 | |
| 221 | Functions that receive a pointer into an array may read |
| 222 | or write out of bounds of the pointer if given a pointer that |
| 223 | is incorrectly sized. Such functions should be marked with the |
danakj | e7db1e3f3 | 2024-04-16 20:43:24 | [diff] [blame] | 224 | `UNSAFE_BUFFER_USAGE` attribute macro. |
danakj | 4e625fb | 2024-03-06 20:47:46 | [diff] [blame] | 225 | |
| 226 | The same is true for functions that accept an iterator instead |
danakj | e7db1e3f3 | 2024-04-16 20:43:24 | [diff] [blame] | 227 | of a range type. Some examples of each are `memcpy()` and |
| 228 | `std::copy()`. |
danakj | 4e625fb | 2024-03-06 20:47:46 | [diff] [blame] | 229 | |
| 230 | Calling such functions is unsafe and should generally be avoided. |
| 231 | Instead, replace such functions with an API built on base::span |
| 232 | or other range types which prevents any chance of OOB memory |
| 233 | access. For instance, replace `memcpy()`, `std::copy()` and |
| 234 | `std::ranges::copy()` with `base::span::copy_from()`. And |
| 235 | replace `memset()` with `std::ranges::fill()`. |
| 236 | |
| 237 | # Writing unsafe data structures with pointers |
| 238 | |
| 239 | TODO: Write about `UNSAFE_BUFFERS()` for rare exceptions where |
| 240 | the correctness of pointer bounds can be fully explained and |
danakj | e7db1e3f3 | 2024-04-16 20:43:24 | [diff] [blame] | 241 | encapsulated, such as within a data structure or when working |
| 242 | with Operating System and C-like APIs. |