Blame - docs/unsafe_buffers.md - chromium/src.git

blob: 1c4bb511682fb7da6b61cd8cfc371177c1c488b9 [file] [log] [blame] [view]

danakj	4e625fb	2024-03-06 20:47:46	[diff] [blame]	1	# Preventing OOB through Unsafe Buffers errors (aka Spanification)
				2
danakj	a4c4238	2024-06-18 19:05:44	[diff] [blame^]	3	Out-of-bounds (OOB) security bugs commonly happen through C-style pointers which
Tom Sepez	9e7f696	2024-06-13 00:21:59	[diff] [blame]	4	have no bounds information associated with them. We can prevent such
danakj	a4c4238	2024-06-18 19:05:44	[diff] [blame^]	5	bugs by always using C++ containers. Furthermore, the Clang compiler can
Tom Sepez	9e7f696	2024-06-13 00:21:59	[diff] [blame]	6	warn about unsafe pointer usage that should be converted to containers.
				7	When an unsafe usage is detected, Clang prints a warning similar to
				8	```
				9	error: unsafe buffer access [-Werror,-Wunsafe-buffer-usage]
				10	```
				11	and directs developers to this file for more information.
danakj	4e625fb	2024-03-06 20:47:46	[diff] [blame]	12
Tom Sepez	9e7f696	2024-06-13 00:21:59	[diff] [blame]	13	## Suppressions
danakj	6a49deaa	2024-06-04 21:31:55	[diff] [blame]	14
Tom Sepez	9e7f696	2024-06-13 00:21:59	[diff] [blame]	15	Our [compiler](../tools/clang/plugins/UnsafeBuffersPlugin.cpp) enables
				16	the `-Wunsafe-buffer-usage` warning on all files by default. Because the
				17	Chromium codebase is not yet compliant with these warnings, there are
				18	mechanisms to opt out code on a directory, file, or per-occurence basis.
danakj	6a49deaa	2024-06-04 21:31:55	[diff] [blame]	19
Tom Sepez	9e7f696	2024-06-13 00:21:59	[diff] [blame]	20	Entire directories are opted out of unsafe pointer usage warnings through
				21	the [`//build/config/unsafe_buffers_paths.txt`](../build/config/unsafe_buffers_paths.txt)
				22	file. As work progresses, directories will be removed from this list, and
				23	non-compliant files marked on a per-file basis as below. Early results
				24	indicate that often 85%+ of files in a directory already happen to be
				25	compliant, so file-by-file suppression allows this code to be subject
				26	to enforcement.
				27
				28	Individual files are opted out of unsafe pointer usage warnings though
				29	the use of the following snippet, which is to be placed immediately
				30	following the copyright header in a source file.
				31	```
				32	#ifdef UNSAFE_BUFFERS_BUILD
				33	// TODO(crbug.com/ABC): Remove this and convert code to safer constructs.
				34	#pragma allow_unsafe_buffers
				35	#endif
				36	```
				37
				38	Individual expressions or blocks of code are opted out by using the
				39	`UNSAFE_BUFFERS()` macro as defined in [`//base/compiler_specific.h`[(../base/compiler_specific.h)
				40	file. These should be rare once a project is fully converted, except
				41	perhaps when working with C-style external APIs. These must
				42	always be accompanied by a `// SAFETY:` comment explaining in detail
				43	how the code has been evaluated to be safe for all possible input.
				44
				45	To allow for incremental conversion, the use of a safety comment with
				46	a TODO() is permitted, along the lines of
				47	`// SAFETY: TODO(crbug.com/xxxxxx): resolve safety issues`.
				48
				49	Code introducing UNSAFE_BUFFER() macro invocations without corresponding
				50	`// SAFETY:` comment should be summarily rejected during code review.
				51
danakj	a4c4238	2024-06-18 19:05:44	[diff] [blame^]	52	## Container-based ecosystem
				53
				54	Containers may be owning types or view types. The common owning containers that
				55	us contiguous storage are `std::vector`, `std::string`, `base::HeapArray`,
				56	`std::array`. Their common view types are `base::span`, `std::string_view`,
				57	`base::cstring_view`.
				58
				59	Other owning containers include maps, sets, deques, etc. These are not
				60	compatible with `base::span` as they are not contiguous and generally do not
				61	have an associated view type at this time.
				62
				63	We are using `base::span` instead of `std::span` in order to provide a type that
				64	can do more than the standard type. We also have other types and functions to
				65	work with ranges and spans instead of unbounded pointers and iterators.
				66
				67	The common conversions to spans are:
				68	- `base::span<T>` replaces `T* ptr, size_t size`.
				69	- `base::span<T, N>` replaces `T (&ptr)[N]` (a reference to a compile-time-sized
				70	array).
				71	- `base::raw_span<T>` replaces `base::span<T>` (and `T* ptr, size_t size`) for
				72	class fields.
				73
				74	### Span construction
				75	- `base::span()` constructor can make a span, and deduce the type and size,
				76	from:
				77	- a `T[N]` array
				78	- `std::array<T, N>`
				79	- `std::vector`
				80	- `std::string`
				81	- any contiguous range with `begin()` and `end()` methods.
				82	- any type with `T* data()` and `size_t size()` methods.
				83	- `base::make_span<N>()` can make a fixed-size span from any range.
				84	- `base::as_bytes()` and `base::as_chars()` convert a span’s inner type to
				85	`uint8_t` or `char` respectively, making a byte-span or char-span.
				86	- `base::span_from_ref()` and `base::byte_span_from_ref()` make a span, or
				87	byte-span, from a single object.
				88	- `base::as_byte_span()` and `base::as_writable_byte_span()` to make a
				89	byte-span (const or mutable) from any container that can convert to a
				90	`base::span<T>`, such as `std::string` or `std::vector<Stuff>`.
				91
				92	#### Padding bytes
				93
				94	Note that if the type contains padding bytes that were not somehow explicitly
				95	initialized, this can create reads of uninitialized memory. Conversion to a
				96	byte-span is most commonly used for spans of primitive types, such as going from
				97	`char` (such as in `std::string`) or `uint32_t` (in a `std::vector`) to
				98	`unit8_t`.
				99
				100	### Dynamic read/write of a span
				101	- `base::SpanReader` reads heterogeneous values from a (typically, byte-) span
				102	in a dynamic manner.
				103	- `base::SpanWriter` writes heterogeneous values into a (typically, byte-) span
				104	in a dynamic manner.
				105
				106	### Values to/from byte spans
				107	In [`//base/numerics/byte_conversions.h`](../base/numerics/byte_conversions.h)
				108	we have conversions between byte-arrays and big/little endian integers or
				109	floats. For example (and there are many other variations):
				110	- `base::U32FromBigEndian` converts from a big-endian byte-span to an unsigned
				111	32-bit integer.
				112	- `base::U32FromLittleEndian` converts from a little-endian byte-span to an
				113	unsigned
				114	- `base::U32ToBigEndian` converts from an integer to a big-endian-encoded
				115	4-byte-array.
				116	- `base::U32ToLittleEndian` converts from an integer to a little-endian-encoded
				117	4-byte-array.
				118
				119	### Heap-allocated arrays
				120	- `base::HeapArray<T>` replaces `std::unique_ptr<T[]>` and places the bounds of
				121	the array inside the `HeapArray` which makes it a bounds-safe range.
				122
				123	### Copying and filling arrays
				124	- `base::span::copy_from(span)` replaces `memcpy` and `memmove`, and verifies
				125	that the source and destination spans have the same size instead of writing
				126	out of bounds. It lowers to the same code as `memmove` when possible.
				127	- Note `std::ranges::copy` is not bounds-safe (though its name sounds like
				128	it should be).
				129	- `std::ranges::fill` replaces `memset` and works with a range so you don't
				130	need explicit bounds.
				131
				132	### String pointers
				133
				134	A common form of pointer is `const char*` which is used (sometimes) to represent
				135	a NUL-terminated string. The standard library gives us two types to replace
				136	`char*`, which allow us to know the bounds of the character array and work with
				137	the string as a range:
				138
				139	- `std::string` owns a NUL-terminated string.
				140	- `std::string_view` is a view of a non-NUL-terminated string.
				141
				142	What’s missing is a view of a string that is guaranteed to be NUL-terminated so
				143	that you can call `.c_str()` to generate a `const char*` suitable for C APIs.
				144
				145	- `base::cstring_view` is a view of a NUL-terminated string. This avoids the
				146	need to construct a `std::string` in order to ensure a terminating NUL is
				147	present. Use this as a view type whenever your code bottoms out in a C API
				148	that needs NUL-terminated string pointer.
				149
				150	### Use of std::array<T>.
Tom Sepez	9e7f696	2024-06-13 00:21:59	[diff] [blame]	151
				152	The clang plugin is very particular about indexing a C-style array (e.g.
				153	`int arr[100]`) with a variable index. Often these issues can be resolved
				154	by replacing this with `std::array<int, 100> arr`, which provides safe
				155	indexed operations. In particular, new code should prefer to use the
				156	`std::array<T, N>` mechanism.
				157
				158	For arrays where the size is determined by the compiler (e.g.
				159	`int arr[] = { 1, 3, 5 };`), the `std::to_array<T>()` helper function
				160	should be used along with the `auto` keyword:
				161	`auto arr = std::to_array<int>({1, 3, 5});`
				162
danakj	a4c4238	2024-06-18 19:05:44	[diff] [blame^]	163	## Avoid reinterpret_cast
				164
				165	### Writing to a byte span
				166
				167	A common idiom in older code is to write into a byte array by casting
				168	the array into a pointer to a larger type (such as `uint32_t` or `float`)
				169	and then writing through that pointer. This an result in Undefined Behaviour
				170	and violates the rules of the C++ abstract machine.
				171
				172	Instead, keep the byte array as a `base::span<uint8_t>`, and write to it
				173	directly by chunking it up into pieces of the size you want to write.
				174
				175	Using `first()`:
				176	```cc
				177	void write_floats(base::span<uint8_t> out, float f1, float f2) {
				178	out.first<4>().copy_from(base::byte_span_from_ref(f1));
				179	out = out.subspan(4u); // Advance the span past what we wrote.
				180	out.first<4>().copy_from(base::byte_span_from_ref(f2));
				181	}
				182	```
				183
				184	Using `split_at()`:
				185	```cc
				186	void write_floats(base::span<uint8_t> out, float f1, float f2) {
				187	auto [write_f1, rem] = out.split_at<4>();
				188	auto [write_f2, rem2] = rem.split_at<4>();
				189	write_f1.copy_from(base::byte_span_from_ref(f1));
				190	write_f2.copy_from(base::byte_span_from_ref(f2));
				191	}
				192	```
				193
				194	Using `SpanWriter` and endian-aware `FloatToLittleEndian()`:
				195	```cc
				196	void write_floats(base::span<uint8_t> out, float f1, float f2) {
				197	auto writer = base::SpanWriter(out);
				198	CHECK(writer.Write(base::FloatToLittleEndian(f1)));
				199	CHECK(writer.Write(base::FloatToLittleEndian(f2)));
				200	}
				201	```
				202
				203	Writing big-endian, with `SpanWriter` and endian-aware `U32ToBigEndian()`:
				204	```cc
				205	void write_values(base::span<uint8_t> out, uint32_t i1, uint32_t i2) {
				206	auto writer = base::SpanWriter(out);
				207	CHECK(writer.Write(base::U32ToBigEndian(i1)));
				208	// SpanWriter has a built-in shortcut to do the same thing.
				209	CHECK(writer.WriteU32BigEndian(i2));
				210	// Verify we wrote to the whole output. We can put a size parameter in the
				211	// `out` span to push this check to compile-time when it's a constant.
				212	CHECK_EQ(writer.remaining(), 0u);
				213	}
				214	```
				215
				216	Writing an array to a byte span with `copy_from()`:
				217	```cc
				218	void write_floats(base::span<uint8_t> out, std::vector<const float> floats) {
				219	base::span<const uint8_t> byte_floats = base::as_byte_span(floats);
				220	// Or skip the first() if you want to CHECK at runtime that all of `out` has
				221	// been written to.
				222	out.first(byte_floats.size()).copy_from(byte_floats);
				223	}
				224	```
				225
				226	### Reading from a byte span
				227
				228	Instead of turning a `span<const uint8_t>` into a pointer of a larger type,
				229	which can cause Undefined Behaviour, read values out of the byte span and
				230	convert each one as a value (not as a pointer).
				231
				232	Using `subspan()` and endian-aware conversion `FloatFromLittleEndian`:
				233	```cc
				234	void read_floats(base::span<const uint8_t> in, float& f1, float& f2) {
				235	f1 = base::FloatFromLittleEndian(in.subspan<0, 4>());
				236	f2 = base::FloatFromLittleEndian(in.subspan<4, 4>());
				237	}
				238	```
				239
				240	Using `SpanReader` and endian-aware `U32FromBigEndian()`:
				241	```cc
				242	void read_values(base::span<const uint8_t> in, int& i1, int& i2, int& i3) {
				243	auto reader = base::SpanReader(in);
				244	i1 = base::U32FromBigEndian(*reader.Read<4>());
				245	i2 = base::U32FromBigEndian(*reader.Read<4>());
				246	// SpanReader has a built-in shortcut to do the same thing.
				247	CHECK(reader.ReadU32BigEndian(i3));
				248	// Verify we read the whole input. We can put a size parameter in the `in`
				249	// span to push this check to compile-time when it's a constant.
				250	CHECK_EQ(reader.remaining(), 0u);
				251	}
				252	```
				253
				254	## Patterns for spanification
Tom Sepez	9e7f696	2024-06-13 00:21:59	[diff] [blame]	255
				256	Most pointer issues ought to be resolved by converting to containers. In
				257	particular, one common conversion is to replace `T*` pointers with
				258	`base::span<T>` in a process known as spanification, since most pointers
				259	are unowned references into an array (or vector). The appropriate
				260	replacement for the pointer is
danakj	e7db1e3f3	2024-04-16 20:43:24	[diff] [blame]	261	[`base::span`](../base/containers/span.h).
danakj	4e625fb	2024-03-06 20:47:46	[diff] [blame]	262
danakj	6a49deaa	2024-06-04 21:31:55	[diff] [blame]	263	### Copying arrays (`memcpy`)
				264
				265	You have:
				266	```cc
				267	uint8_t array1[12];
				268	uint8_t array2[16];
				269	uint64_t array3[2];
				270	memcpy(array1, array2 + 8, 4);
				271	memcpy(array1 + 4, array3, 8);
				272	```
				273
				274	Spanified:
				275	```cc
				276	uint8_t array1[12];
				277	uint8_t array2[16];
				278	uint64_t array3[2];
				279	base::span(array1).first(4u).copy_from(base::span(array2).subspan(8u, 4u));
				280	base::span(array1).subspan(4u).copy_from(base::as_byte_span(array3).first(8u));
				281
				282	// Use `split_at()` to ensure `array1` is fully written.
				283	auto [from2, from3] = base::span(array1).split_at(4u);
				284	from2.copy_from(base::span(array2).subspan(8u, 4u));
				285	from3.copy_from(base::as_byte_span(array3).first(8u));
				286
				287	// This can even be ensured at compile time (if sizes and offsets are all
				288	// constants).
				289	auto [from2, from3] = base::span(array1).split_at<4u>();
				290	from2.copy_from(base::span(array2).subspan<8u, 4u>());
				291	from3.copy_from(base::as_byte_span(array3).first<8u>());
				292	```
				293
				294	### Zeroing arrays (`memset`)
				295
				296	`std::ranges::fill` works on any range/container and won't write out of
				297	bounds. Converting arbitrary types into a byte array (through
				298	`base::as_writable_byte_span`) is only valid when the type holds trivial
				299	types such as primitives. Unlike `memset`, a constructed object can be
				300	given as the value to use as in `std::ranges_fill(container, Object())`.
				301
				302	You have:
				303	```cc
				304	uint8_t array1[12];
				305	uint64_t array2[2];
				306	Object array3[4];
				307	memset(array1, 0, 12);
				308	memset(array2, 0, 2 * sizeof(uint64_t));
				309	memset(array3, 0, 4 * sizeof(Object));
				310	```
				311
				312	Spanified:
				313	```cc
				314	uint8_t array1[12];
				315	uint64_t array2[2];
				316	Object array3[4];
				317	std::ranges::fill(array1, 0u);
				318	std::ranges::fill(array2, 0u);
				319	std::ranges::fill(base::as_writable_byte_span(array3), 0u);
				320	```
				321
				322	### Comparing arrays (`memcmp`)
				323
				324	You have:
				325	```cc
				326	uint8_t array1[12] = {};
				327	uint8_t array2[12] = {};
				328	bool eq = memcmp(array1, array2, sizeof(array1)) == 0;
danakj	8d29e18	2024-06-05 16:50:18	[diff] [blame]	329	bool less = memcmp(array1, array2, sizeof(array1)) < 0;
danakj	6a49deaa	2024-06-04 21:31:55	[diff] [blame]	330
				331	// In tests.
				332	for (size_t i = 0; i < sizeof(array1); ++i) {
				333	SCOPED_TRACE(i);
				334	EXPECT_EQ(array1[i], array2[i]);
				335	}
				336	```
				337
				338	Spanified:
				339	```cc
				340	uint8_t array1[12] = {};
				341	uint8_t array2[12] = {};
				342	// If one side is a span, the other will convert to span too.
				343	bool eq = base::span(array1) == array2;
danakj	8d29e18	2024-06-05 16:50:18	[diff] [blame]	344	bool less = base::span(array1) < array2;
danakj	6a49deaa	2024-06-04 21:31:55	[diff] [blame]	345
				346	// In tests.
				347	EXPECT_EQ(base::span(array1), array2);
				348	```
				349
				350	### Copying array into an integer
				351
				352	You have:
				353	```cc
				354	uint8_t array[44] = {};
				355	uint32_t v1;
				356	memcpy(&v1, array, sizeof(v1)); // Front.
				357	uint64_t v2;
				358	memcpy(&v2, array + 6, sizeof(v2)); // Middle.
				359	```
				360
				361	Spanified:
				362	```cc
				363	#include "base/numerics/byte_conversions.h"
				364	...
				365	uint8_t array[44] = {};
				366	uint32_t v1 = base::U32FromLittleEndian(base::span(array).first<4u>()); // Front.
				367	uint64_t v2 = base::U64FromLittleEndian(base::span(array).subspan<6u, 8u>()); // Middle.
				368	```
				369
				370	### Copy an array into an integer via cast
				371
David Benjamin	0ffdea7	2024-06-12 19:55:22	[diff] [blame]	372	Note: This pattern is prone to more UB than out-of-bounds access. It is UB to
				373	cast pointers if the result is not aligned, so these cases are often buggy or
				374	were only correct due to subtle assumptions on buffer alignment. The spanified
				375	version avoids this pitfalls. It has no alignment requirement.
danakj	6a49deaa	2024-06-04 21:31:55	[diff] [blame]	376
				377	You have:
				378	```cc
				379	uint8_t array[44] = {};
				380	uint32_t v1 = reinterpret_cast<const uint32_t>(array); // Front.
David Benjamin	0ffdea7	2024-06-12 19:55:22	[diff] [blame]	381	uint64_t v2 = reinterpret_cast<const uint64_t>(array + 16); // Middle.
danakj	6a49deaa	2024-06-04 21:31:55	[diff] [blame]	382	```
				383
				384	Spanified:
				385	```cc
				386	#include "base/numerics/byte_conversions.h"
				387	...
				388	uint8_t array[44] = {};
				389	uint32_t v1 = base::U32FromLittleEndian(base::span(array).first<4u>()); // Front.
David Benjamin	0ffdea7	2024-06-12 19:55:22	[diff] [blame]	390	uint64_t v2 = base::U64FromLittleEndian(base::span(array).subspan<16u, 8u>()); // Middle.
danakj	6a49deaa	2024-06-04 21:31:55	[diff] [blame]	391	```
				392
				393	### Making a byte array (`span<uint8_t>`) from a string (or any array/range)
				394
				395	You have:
				396	```cc
				397	std::string str = "hello world";
				398	func_with_const_ptr_size(reinterpret_cast<const uint8_t*>(str.data()), str.size());
				399	func_with_mut_ptr_size(reinterpret_cast<uint8_t*>(str.data()), str.size());
				400	```
				401
				402	Spanified:
				403	```cc
				404	std::string str = "hello world";
				405	base::span<const uint8_t> bytes = base::as_byte_span(str);
				406	func_with_const_ptr_size(bytes.data(), bytes.size());
				407	base::span<uint8_t> mut_bytes = base::as_writable_byte_span(str);
				408	func_with_mut_ptr_size(mut_bytes.data(), mut_bytes.size());
				409
				410	// Replace pointer and size with a span, though.
				411	func_with_const_span(base::as_byte_span(str));
				412	func_with_mut_span(base::as_writable_byte_span(str));
				413	```
				414
				415	### Making a byte array (`span<uint8_t>`) from an object
				416
				417	You have:
				418	```cc
				419	uint8_t array[8];
				420	uint64_t val;
				421	two_byte_arrays(array, reinterpret_cast<const uint8_t*>(&val));
				422	```
				423
				424	Spanified:
				425	```cc
				426	uint8_t array[8];
				427	uint64_t val;
				428	base::span<uint8_t> val_span = base::byte_span_from_ref(val);
				429	two_byte_arrays(array, val_span.data());
				430
				431	// Replace an unbounded pointer a span, though.
				432	two_byte_spans(base::span(array), base::byte_span_from_ref(val));
				433	```
danakj	e7db1e3f3	2024-04-16 20:43:24	[diff] [blame]	434
danakj	4e625fb	2024-03-06 20:47:46	[diff] [blame]	435	# Functions with array pointer parameters
				436
Tom Sepez	9e7f696	2024-06-13 00:21:59	[diff] [blame]	437	Functions that receive a pointer argument into an array may read
				438	or write out of bounds of that array if subsequent manual size
				439	calculations are incorrect. Such functions should be avoided if
				440	possible, or marked with the `UNSAFE_BUFFER_USAGE` attribute macro
				441	otherwise. This macro propagates to their callers that they must
				442	be called from inside an `UNSAFE_BUFFERS()` region (along with
				443	a corresponding safety comment explaining how the caller knows
				444	the call will be safe).
danakj	4e625fb	2024-03-06 20:47:46	[diff] [blame]	445
				446	The same is true for functions that accept an iterator instead
danakj	e7db1e3f3	2024-04-16 20:43:24	[diff] [blame]	447	of a range type. Some examples of each are `memcpy()` and
				448	`std::copy()`.
danakj	4e625fb	2024-03-06 20:47:46	[diff] [blame]	449
Tom Sepez	9e7f696	2024-06-13 00:21:59	[diff] [blame]	450	Again, calling such functions is unsafe and should be avoided.
				451	Replace such functions with an API built on base::span
danakj	4e625fb	2024-03-06 20:47:46	[diff] [blame]	452	or other range types which prevents any chance of OOB memory
				453	access. For instance, replace `memcpy()`, `std::copy()` and
				454	`std::ranges::copy()` with `base::span::copy_from()`. And
				455	replace `memset()` with `std::ranges::fill()`.