Learn the architecture - Providing protection for complex software
Version 1.0

Non-Confidential Issue 02
Document ID: 102433_0100_02_en
Copyright © 2020, 2022 Arm Limited (or its affiliates). All rights reserved.

Release information

Document history

Issue     Date             Confidentiality    Change
0100-01   8 January 2020   Non-Confidential   Initial release
0100-02   30 June 2022     Non-Confidential   Minor bug fix in Return-oriented programming.

Proprietary Notice

This document is protected by copyright and other related rights and the practice or
implementation of the information contained in this document may be protected by one or more
patents or pending patent applications. No part of this document may be reproduced in any form
by any means without the express prior written permission of Arm. No license, express or implied,
by estoppel or otherwise to any intellectual property rights is granted by this document unless
specifically stated.

Your access to the information in this document is conditional upon your acceptance that you
will not use or permit others to use the information for the purposes of determining whether
implementations infringe any third party patents.

THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO
WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION,
THE IMPLIED WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-
INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE WITH RESPECT TO THE
DOCUMENT. For the avoidance of doubt, Arm makes no representation with respect to, has
undertaken no analysis to identify or understand the scope and content of, third party patents,
copyrights, trade secrets, or other rights.

This document may include technical inaccuracies or typographical errors.

TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR
ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL,
INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND
REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS
DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

This document consists solely of commercial items. You shall be responsible for ensuring that
any use, duplication or disclosure of this document complies fully with any relevant export laws
and regulations to assure that this document or any portion thereof is not exported, directly
or indirectly, in violation of such export laws. Use of the word “partner” in reference to Arm’s
customers is not intended to create or refer to any partnership relationship with any other
company. Arm may make changes to this document at any time and without notice.

This document may be translated into other languages for convenience, and you agree that if there
is any conflict between the English version of this document and any translation, the terms of the
English version of the Agreement shall prevail.

The Arm corporate logo and words marked with ® or ™ are registered trademarks or trademarks
of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. Other brands
and names mentioned in this document may be the trademarks of their respective owners. Please
follow Arm’s trademark usage guidelines at [Link]

Copyright © 2020, 2022 Arm Limited (or its affiliates). All rights reserved.

Arm Limited. Company 02557590 registered in England.

110 Fulbourn Road, Cambridge, England CB1 9NJ.

(LES-PRE-20349)

Confidentiality Status

This document is Non-Confidential. The right to use, copy and disclose this document may be
subject to license restrictions in accordance with the terms of the agreement entered into by Arm
and the party that Arm delivered this document to.

Unrestricted Access is an Arm internal classification.

Product Status

The information in this document is Final, that is for a developed product.

Feedback

Arm® welcomes feedback on this product and its documentation. To provide feedback on the
product, create a ticket on [Link]

To provide feedback on the document, fill the following survey: [Link]
documentation-feedback-survey.

Inclusive language commitment


Arm values inclusive communities. Arm recognizes that we and our industry have used language
that can be offensive. Arm strives to lead the industry and create change.

We believe that this document contains no offensive language. To report offensive language in this
document, email terms@[Link].

Contents

1. Overview
2. Stack smashing and execution permissions
3. Return-oriented programming
4. Jump-oriented programming
5. Applying these techniques to real code
6. Detecting memory safety violations
7. Check your knowledge
8. Related information
9. Next steps

1. Overview
This guide introduces some common forms of attacks that are used against complex software
stacks. The guide also examines the features, including pointer authentication, branch target
identification and memory tagging, that are provided in Armv8-A to help mitigate against such
attacks. The guide is an overview of these features, and not a technical deep dive. You can use the
Related information section to explore some topics in this guide in more detail.

At the end of this guide, you will be able to:


• Define the terms Return-Oriented Programming (ROP) and Jump-Oriented Programming (JOP).
• List the features in Armv8-A that help protect against ROP and JOP attacks.
• Describe how memory tagging can be used to detect memory safety violations, like buffer
overruns or use-after-free.

Before you begin


We assume that you are familiar with the Arm memory model. If you are not, you might want to
first read our Memory model and Memory management guides.

If you are not familiar with security, we also recommend that you read our Introduction to security
guide before reading this guide.

2. Stack smashing and execution permissions
One of the oldest forms of attack is stack smashing. There are many types of stack smashing. The
basic form of stack smashing involves malicious software writing new opcodes into memory and
then attempting to execute the written memory. This process is illustrated here:

Figure 2-1: Stack smashing process

Typically, the memory that is used to launch the attack is stack memory. This is where the name
stack smashing comes from. To protect against stack smashing, modern processor architectures,
like the Arm architecture, have execution permissions. In Armv8-A, the main controls are execution
permission bits in the translation tables. If we focus only on EL0 and EL1:
• UXN - User (EL0) Execute-never
• PXN - Privileged Execute-never

Setting one of these bits marks the page as not executable. This means that any attempt to branch
to an address within that page triggers an exception, in the form of a Permission fault. There are
separate Privileged and Unprivileged bits. This is because application code needs to be executable
in user space (EL0) but should never be executed with kernel permissions (EL1/EL2). Keeping user code
non-executable at privileged levels also guards against another form of attack, which abuses system
calls to try to get privileged code to call code in user memory.

The following diagram shows a simplified, but typical, virtual address space for an application that is
running under an Operating System (OS), with the expected execution permissions:

Figure 2-2: Virtual address space

By convention, kernel space is at the top of memory and user space is at the bottom
of memory. Although this is not required by the architecture, it is the most common
layout and the examples in this guide follow this convention.

The architecture also provides control bits in the system control register, SCTLR_ELx, to make
all writable addresses non-executable. Enabling this control makes locations like the stack non-
executable.

A location that is writable at EL0 is never executable at EL1, regardless of how the PXN and
SCTLR_ELx controls are configured.
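
The way these controls combine can be sketched as a small decision function. This is a simplified model of the EL0 and EL1 case only, not a description of how any processor implements the check. The function and parameter names are hypothetical, and `write_implies_xn` stands in for the SCTLR_ELx control described above.

```python
def can_execute(el, uxn, pxn, writable_el0, writable_el1, write_implies_xn):
    """Return True if an instruction fetch from a page is permitted.

    Simplified model covering EL0 and EL1 only. All names are illustrative.
    """
    if el == 0:
        if uxn:
            return False
        # With the SCTLR_ELx control set, writable memory is not executable.
        if write_implies_xn and writable_el0:
            return False
        return True
    if el == 1:
        if pxn:
            return False
        # A location writable at EL0 is never executable at EL1.
        if writable_el0:
            return False
        if write_implies_xn and writable_el1:
            return False
        return True
    raise ValueError("this sketch only models EL0 and EL1")


# A user stack page: writable at EL0, with no execute-never bits set.
stack = dict(uxn=False, pxn=False, writable_el0=True, writable_el1=True)
print(can_execute(0, write_implies_xn=True, **stack))   # False
print(can_execute(1, write_implies_xn=False, **stack))  # False
```

In this model the stack page is not executable at EL1 even when PXN is clear and the SCTLR_ELx control is off, which matches the rule stated above.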

Together, these controls can provide robust protection against the kinds of attack that we have
described. The translation table attributes and write controls can block execution from any location
that the malicious code could write to, as you can see in the following diagram:

Figure 2-3: Protection against malicious code stack

3. Return-oriented programming
Features like the execution permissions that we described have made it increasingly difficult to
execute arbitrary code. This means that attackers use other approaches, like Return-Oriented
Programming (ROP). ROP takes advantage of the scale of the software stack in many modern
systems. An attacker analyzes the software in a system, looking for gadgets. A gadget is a useful
fragment of code, usually ending with a function return, for example:

...
ADD x0, x1, x2
RET

This code provides a gadget for adding two registers together. By scanning all the available libraries,
an attacker can build a library of gadgets. These gadgets are existing legal code, within executable
regions. This means that they are not affected by protections like execution permissions. The
attacker strings together a chain of gadgets, forming what is effectively a new program, made up of
existing code fragments. You can see an example in the following diagram:

Figure 3-1: Gadget attack code

Any library that is available in the address space for the process is a potential source of gadgets.
For example, the C library contains many functions, each offering potential gadgets. With so many
gadgets available, statistically enough gadgets are available to form any arbitrary new program.
Some compilers are even designed to compile to gadgets, rather than assembler. An ROP attack is
effective, because it is made up of existing legal code, so it is not trapped by execution permissions
or checks on executing from writable memory.

It is time-consuming for an attacker to find gadgets and create the sequence that is necessary to
produce a new program. However, this process can be automated and can be reused to attack
multiple systems. Address Space Layout Randomization (ASLR) makes it harder to automate an attack
and to reuse it across multiple systems, because gadget addresses differ between systems and runs.

Pointer authentication
Armv8.3-A introduces the option of pointer authentication. Pointer authentication can mitigate
against ROP attacks.

Pointer authentication takes advantage of the fact that pointers are stored in a 64-bit format, but
not all those bits are needed to represent the address. The following diagram shows the virtual
address space layout:

Figure 3-2: Virtual address space

You can see that there are potentially two 2^52-byte address ranges, one at the top of the address
space, and one at the bottom of the address space:

Bottom Range: 0x0000_0000_0000_0000 - 0x000F_FFFF_FFFF_FFFF

Top Range: 0xFFF0_0000_0000_0000 - 0xFFFF_FFFF_FFFF_FFFF

Any address that falls outside of both ranges is always invalid and results in a fault if accessed.
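
As an illustration of the two ranges, the following sketch classifies a 64-bit virtual address by looking at the bits above the configured address size. The function name and the `va_bits` parameter are hypothetical, and the sketch ignores pointer authentication and tagging, which are introduced later.

```python
def address_range(va, va_bits=52):
    """Classify a 64-bit virtual address as 'bottom', 'top', or None (invalid)."""
    upper = va >> va_bits                    # bits [63:va_bits]
    all_ones = (1 << (64 - va_bits)) - 1
    if upper == 0:
        return "bottom"
    if upper == all_ones:
        return "top"
    return None                              # outside both ranges


print(address_range(0x000F_FFFF_FFFF_FFFF))  # bottom
print(address_range(0xFFF0_0000_0000_0000))  # top
print(address_range(0x0010_0000_0000_0000))  # None
```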

Before the release of Armv8.2-A, the maximum size of each range was 2^48 bytes.

You can see that any valid virtual address will have its top 12 bits as 0x000 or 0xFFF. When
pointer authentication is enabled, the upper bits are used to store a signature and are not treated
as part of the address. This signature is referred to as a Pointer Authentication Code (PAC).

The PAC uses the top bits of the pointer. Bit[55] is reserved to indicate whether the top or bottom
region is being accessed. This is illustrated here:

Figure 3-3: Pointer Authentication Code

The exact number of bits that are available for the PAC depends on the configured size of the
virtual address space, and on whether tagged pointers are enabled. The smaller the virtual address
space, the more bits are available.
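
The relationship between address space size and PAC size can be worked through with a short calculation. This is a sketch under two assumptions: bit[55] is always reserved, as described above, and enabling tagged pointers reserves the top byte, bits [63:56], for the tag. The function name is hypothetical.

```python
def pac_field_bits(va_bits, tagged_pointers):
    """Number of pointer bits left over for the PAC.

    Assumes bit 55 selects the range and, when tagged pointers are enabled,
    that bits [63:56] hold the tag and are unavailable to the PAC.
    """
    if not 0 < va_bits <= 52:
        raise ValueError("va_bits must be between 1 and 52")
    low_field = 55 - va_bits                   # bits [54:va_bits]
    high_field = 0 if tagged_pointers else 8   # bits [63:56]
    return low_field + high_field


print(pac_field_bits(48, tagged_pointers=True))   # 7
print(pac_field_bits(48, tagged_pointers=False))  # 15
print(pac_field_bits(52, tagged_pointers=True))   # 3
```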

To protect against ROP attacks, at the start of a function the return address in the LR is signed.
This means that a PAC is added in the upper order bits of the register. Before returning, the return
address is authenticated using the PAC. If the check fails, an exception is generated when the
address is used for a branch. The following diagram shows an example:

Figure 3-4: Protection against ROP attacks

This change makes ROP attacks much harder to launch. This is because, to form the chain of
gadgets, the attacker needs to know the location of those gadgets, and needs correctly signed pointers to
those locations. To obtain a signed pointer, the attacker would need access to a signing gadget.

How is the PAC formed?


The architecture provides five 128-bit keys. Each key is stored in a pair of 64-bit System registers:
• Two keys, A and B, for instruction pointers
• Two keys, A and B, for data pointers
• One key for general use

The registers that store these keys are only accessible at EL1 and above.

For data and instruction addresses, the instruction used to create and check the PAC specifies
whether the A key or the B key is used. For a particular pointer, the instruction that generates the
PAC and the instruction that authenticates the PAC must agree on which key to use.

The signature is formed from the address itself, the key, and a modifier, as you can see here:

Figure 3-5: Key and modifier authentication

The architecture allows different implementations, for example from different vendors, to use
different encryption algorithms. The recommended algorithm is QARMA, which is required by SBSA
level 5. ID_AA64ISAR1_EL1 reports which algorithm is supported on a specific processor.

The instructions that generate and authenticate the PAC specify whether the modifier is another
processor register or is 0. The modifier needs to be a value which will be the same on entry and
exit if the function is called correctly. For example, the Stack Pointer (SP) can have a different value
every time that a function is called but will have the same value at the start and at the end of a
given call. Using the SP as a modifier gives you a PAC that is only valid for that call of the function.
This is because the SP will probably be in a different location on future calls.

The limited size of the PAC means that the strength of the signature is potentially low, depending
on the configured virtual address size. However, the keys typically have a limited life
span. Each running application can use different keys, and a given application can be given different
keys each time that it is launched. When forming a chain of gadgets, the attacker must get every
pointer correct, otherwise an exception will be raised.

How is the PAC checked?


Before use, the pointer must be authenticated. The authentication process is shown in this
diagram:

Figure 3-6: PAC check

The authentication operation regenerates the PAC and compares it with the value that is stored
in the pointer. If authentication succeeds, a pointer without the PAC is returned. If authentication
fails, an invalid pointer is returned. This means that an exception is raised if the pointer is used.
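
The sign and authenticate flow can be modeled in a few lines. This is only a sketch: it uses HMAC-SHA256 as a stand-in for the hardware algorithm (it is not QARMA), assumes a 48-bit virtual address space with tagged pointers so that the PAC occupies bits [54:48], handles bottom-range pointers only, and returns `None` where hardware would return an invalid pointer. All names are hypothetical.

```python
import hashlib
import hmac

VA_BITS = 48                          # assumed virtual address size
PAC_LOW_BIT = VA_BITS                 # PAC occupies bits [54:48] in this model
PAC_WIDTH = 55 - VA_BITS
PAC_MASK = ((1 << PAC_WIDTH) - 1) << PAC_LOW_BIT


def _compute_pac(address, modifier, key):
    # Stand-in MAC over the address and modifier, truncated to the PAC width.
    message = address.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "little") & ((1 << PAC_WIDTH) - 1)


def sign(pointer, modifier, key):
    """Insert a PAC into the unused upper bits of a bottom-range pointer."""
    address = pointer & ~PAC_MASK
    return address | (_compute_pac(address, modifier, key) << PAC_LOW_BIT)


def authenticate(pointer, modifier, key):
    """Return the pointer without its PAC, or None if the PAC does not match."""
    address = pointer & ~PAC_MASK
    stored = (pointer & PAC_MASK) >> PAC_LOW_BIT
    if stored == _compute_pac(address, modifier, key):
        return address
    return None  # hardware would return an invalid pointer that faults on use


key = bytes(range(16))                # a 128-bit key
lr, sp = 0x0000_0000_0040_1234, 0x0000_7FFF_FFFF_E000
signed_lr = sign(lr, sp, key)
print(authenticate(signed_lr, sp, key) == lr)                    # True
print(authenticate(signed_lr ^ (1 << PAC_LOW_BIT), sp, key))     # None
```

Using the SP as the modifier, as in the usage lines, ties the signature to one particular call of the function.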

New instructions
To support pointer authentication, new instructions are added to A64. Let’s look at some examples
of the operations that are related to the instruction pointers:

PACIxSP - Sign LR using SP as the modifier.

PACIxZ - Sign LR using 0 as the modifier.

PACIx - Sign Xn using a general-purpose register as modifier.

AUTIxSP - Authenticate LR using SP as the modifier.

AUTIxZ - Authenticate LR using 0 as the modifier.

AUTIx - Authenticate Xn using a general-purpose register as modifier.

BRAx - Indirect branch with pointer authentication.

BLRAx - Indirect branch with link, with pointer authentication.

RETAx - Function return with pointer authentication.

ERETAx - Exception return with pointer authentication.

In each case, replace x with A or B to select the wanted key.

The preceding list is not complete, but it shows the type of operations that are available. You can
refer to the Arm ARM for a complete list and detailed descriptions.

Use of the NOP space


Some of the new authentication instructions are in the NOP space. Applications or libraries that
protect themselves with these NOP-space instructions can run on older processors without pointer
authentication support. Although the older processors will not benefit from the protections, this
can be very useful in heterogeneous systems, as you can see in the following diagram:

Figure 3-7: Use of the NOP space

To provide backwards compatibility, this program uses separate instructions to
authenticate the LR and return. Ideally the combined authenticate and return
instructions, RETAx, would be used. However, the RETAx instruction does not use the
NOP instruction space. This means that it is not compatible with a processor that
does not support authentication.

Enabling pointer authentication


Pointer authentication is controlled by Exception level using SCTLR_ELx. SCTLR_ELx uses separate
controls for instruction checking and for data checking:
• EnIx - Enables instruction pointer authentication using key x.
• EnDx - Enables data pointer authentication using key x.

4. Jump-oriented programming
Jump-Oriented Programming (JOP) is similar to Return-Oriented Programming (ROP). In an ROP
attack, the software stack is scanned for gadgets that can be strung together to form a new
program. ROP attacks look for sequences that end in a function return (RET). In contrast, JOP
attacks target sequences that end in other forms of indirect (absolute) branches, such as those
generated for function pointers or case statements. You can see an example here:

Figure 4-1: Jump oriented programming

The attacker exploits the fact that BLR or BR instructions can target any executable address, and not
just the addresses that are entry points defined by the compiler or developer. This means that the
instructions can be hijacked to string gadgets together.

Branch target instructions


To help protect against JOP attacks, Armv8.5-A introduced Branch Target Instructions (BTIs). BTIs
are also called landing pads. The processor can be configured so that indirect branches (BR and
BLR) can only target landing pad instructions. If the target of an indirect branch is not a
landing pad, a Branch Target Exception is generated, as you can see here:

Figure 4-2: Branch target exception

The use of landing pads significantly reduces the number of possible targets for an indirect branch
and makes it harder to string chains of gadgets together to form a new program.

Enabling branch target checking


Support for landing pads is enabled for each page, using a new bit (GP bit) in the translation tables.
Per-page control allows a filesystem to contain a mixture of landing pad-protected code and legacy
code, which is illustrated here:

Figure 4-3: Landing pad

The encoding for BTI instructions, like the pointer-authentication instructions, is allocated within
the NOP space. BTI-protected code can still function when run on older processors that do not
support BTI, or when GP=0, although without the additional protection.

How BTI is implemented


PSTATE includes a field, BTYPE, that records the branch type. On executing an indirect branch, the
type of indirect branch is recorded in PSTATE.BTYPE. The following list shows the value BTYPE takes
for different branch instructions:
• BTYPE=0b11: BR, BRAA, BRAB, BRAAZ, BRABZ with any register other than X16 or X17
• BTYPE=0b10: BLR, BLRAA, BLRAB, BLRAAZ, BLRABZ
• BTYPE=0b01: BR, BRAA, BRAB, BRAAZ, BRABZ with X16 or X17

Executing any other type of instruction, including direct branches, causes BTYPE to be set to 0b00.

Why store two bits? A simple implementation could record only whether an indirect branch was in
progress or not. However, recording the type of indirect branch further limits the possibilities of
finding gadgets. The syntax of the BTI instruction includes an argument, specifying which types of
indirect branch it can be targeted by:

Argument   Accepted PSTATE.BTYPE   Use case
BTI c      0b10 and 0b01           Function calls
BTI j      0b11 and 0b01           Non-function call branches, like case statements
BTI jc     All                     All

When BTYPE is not 0b00, the processor checks whether the instruction being targeted is a landing pad. If it is
not a landing pad, or if it does not accept that type of indirect branch, an exception is generated.
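
The BTYPE rules and the landing pad arguments can be combined into a small model. This is a sketch that assumes the target is in a page with the GP bit set, and it does not model the PACIxSP and PACIxZ instructions acting as landing pads. The function names and the string encoding of instructions are hypothetical.

```python
BR_FAMILY = {"BR", "BRAA", "BRAB", "BRAAZ", "BRABZ"}
BLR_FAMILY = {"BLR", "BLRAA", "BLRAB", "BLRAAZ", "BLRABZ"}

# BTYPE values that each landing pad argument accepts, from the table above.
ACCEPTED_BTYPES = {
    "c": {0b10, 0b01},
    "j": {0b11, 0b01},
    "jc": {0b01, 0b10, 0b11},
}


def btype_after(mnemonic, register=None):
    """BTYPE value recorded after executing the given instruction."""
    if mnemonic in BLR_FAMILY:
        return 0b10
    if mnemonic in BR_FAMILY:
        return 0b01 if register in ("X16", "X17") else 0b11
    return 0b00  # any other instruction, including direct branches


def branch_target_ok(btype, landing_pad):
    """landing_pad is 'c', 'j', 'jc', or None if the target is not a BTI."""
    if btype == 0b00:
        return True               # no indirect branch in progress, no check
    if landing_pad is None:
        return False              # would raise a Branch Target Exception
    return btype in ACCEPTED_BTYPES[landing_pad]


print(branch_target_ok(btype_after("BLR", "X3"), "c"))   # True
print(branch_target_ok(btype_after("BR", "X3"), "c"))    # False
print(branch_target_ok(btype_after("BR", "X16"), "c"))   # True
```

The last line shows why 0b01 is accepted by `BTI c`: a branch through X16 or X17 may be linker-generated code entering a function.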

X16 and X17


Why does the architecture distinguish between indirect branches that use X16 or X17 and those
that do not?

X16 and X17 have special significance in the Procedure Call Standard used by Arm. They are
referred to as the intra-procedure call corruptible registers, or IP0 and IP1. They can be used by
static linkers for inserting branch-range extending veneers, or by dynamic linkers for handling jump
tables.

This is relevant to us because it means that a function might be entered directly from the caller
using BL or BLR or indirectly via linker generated code using X16 or X17. Therefore, the landing pad
for a function entry needs to be able to accept both.

Function entry and return


The function return instructions, RET, RETAA and RETAB, are also a form of indirect branch. If these
instructions were required to target a BTI, every function call would need to be followed by a BTI.
This would cause undesirable code bloat. Also, the pointer authentication feature already provides
a way to protect function returns.

For function entry, the pointer signing instructions PACIxSP and PACIxZ also act as landing pads,
in the same way as BTI instructions. This means that when the landing pad feature is used with pointer
authentication, there is no need to start every function with a BTI. This also avoids code bloat.

5. Applying these techniques to real code


In Return-oriented programming (ROP) and Jump-oriented programming (JOP), we explored
features that Arm introduced to the Arm architecture to mitigate against JOP-style and ROP-style
attacks. Now we will look at the compiler support for these features, and how enabling these
protections affects the number of gadgets that are available to attackers.

In this section, we refer to these versions of Arm Compiler 6 and the GNU Compiler Collection (GCC):
• Arm Compiler 6.11
• GCC 9.1

Compiler support for these features continues to evolve. Precise figures will vary based on the
versions that you use.

Build an image with pointer authentication and branch target identification


For Arm Compiler 6, GCC, and LLVM, generation of pointer authentication and BTI-enabled code is
controlled by:
• -mbranch-protection=<protection>

Where <protection> can be any combination of:


• pac-ret{+leaf+b-key}

◦ pac-ret enables return address signing for non-leaf functions using the A-key.
◦ +leaf increases the scope of return address signing to include leaf functions.
◦ +b-key uses B-key instructions to sign addresses instead of A-key instructions.
• bti protects code using Branch Target Identification.
• standard turns on all types of branch protection.
◦ Currently standard implies pac-ret+bti.
• none turns off all types of branch protection.
◦ This is the default if the -mbranch-protection flag is not provided.

Whether the combined or NOP-compatible instructions are generated depends on the architecture
version that the code is built for. When building for Armv8.3-A, or later, the compiler will use
the combined operations. When building for Armv8.2-A, or earlier, it will use the NOP compatible
instructions. For example:

-march=armv8.2-a -mbranch-protection=standard    -march=armv8.3-a -mbranch-protection=standard

enableInt                                        enableInt
0x00000000: d503233f PACIASP                     0x00000000: d503233f PACIASP
...                                              ...
...                                              ...
0x00000350: d50323bf AUTIASP                     0x00000350: d65f0bff RETAA
0x00000354: d65f03c0 RET

The function used in this example was taken from the example that accompanies
our guide Arm CoreLink Generic Interrupt Controller v3 and v4 Overview and built
with Arm Compiler 6.
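
The encodings in the preceding listing can be checked against the A64 HINT encoding space, which is what this guide calls the NOP space. The following sketch tests whether a 32-bit encoding falls in that space, using the fact that `NOP` is `HINT #0` with encoding 0xD503201F and that the hint number sits in bits [11:5]. The function names are hypothetical.

```python
HINT_MASK = 0xFFFFF01F    # every bit except the 7-bit hint number, bits [11:5]
HINT_VALUE = 0xD503201F   # encoding of HINT #0, which is NOP


def in_nop_space(encoding):
    """True if the encoding is a HINT, so older processors execute it as a NOP."""
    return (encoding & HINT_MASK) == HINT_VALUE


def hint_number(encoding):
    """Extract the hint number from an encoding in the NOP space."""
    if not in_nop_space(encoding):
        raise ValueError("not a HINT encoding")
    return (encoding >> 5) & 0x7F


for name, encoding in [("PACIASP", 0xD503233F), ("AUTIASP", 0xD50323BF),
                       ("RETAA", 0xD65F0BFF), ("RET", 0xD65F03C0)]:
    print(name, in_nop_space(encoding))
# PACIASP True
# AUTIASP True
# RETAA False
# RET False
```

This is consistent with the compiler choosing AUTIASP followed by RET when building for Armv8.2-A: RETAA is outside the NOP space, so it cannot be safely executed on a processor without pointer authentication.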

The compiler generates the instructions that are required to perform signing and authentication.
Generating and configuring keys is the responsibility of supervising software, typically an operating
system.

Reduction in available gadgets


GLIBC is a large library that is used in C or C++ applications. This means that it is a good target for
attackers, and a good place for us to see the effect of applying the measures to mitigate attacks.
Arm used this tool to measure the number of available gadgets and modified the tool to fit our
requirements.

The following graph shows the number of gadgets before and after the compiler options were
enabled:

Figure 5-1: ROP and JOP gadgets in GLIBC

By enabling both pointer authentication and branch target identification, the number of available
gadgets is reduced by 97.65%.

Effect on code size


The protection described in the preceding section is helpful but comes at a cost. One obvious cost
is the increase in code size. Here is an analysis of this cost:

Figure 5-2: Change in image size from enabling ROP and JOP mitigations

The graph shows that the code size effect on GLIBC is minimal. Turning on both
mitigations leads to a 2.9% code size increase, but this increase is smaller when compiling with
-march=armv8.3-a. Compiling for Armv8.3-A allows the compiler to use fused authenticate and
return instructions. This means that, for Armv8.3-A, the code size increase is only 1.6%.


6. Detecting memory safety violations


Some classes of vulnerability that are related to memory usage can be difficult to detect and test
for. Two examples of this are:

• Use after free - Applications continue to use allocated memory after releasing it, or after it is
out of scope. This is a violation of temporal memory safety.
• Buffer overrun, or overflow - Going beyond the bounds of an allocated structure or buffer,
usually because of insufficient bounds checking. This is a violation of spatial memory safety.

Armv8.5-A introduces the Memory Tagging Extension (MTE), also called memory coloring. Memory
tagging makes detecting memory safety violations easier and more efficient.

One of the first Internet-spread computer worms was the Internet Worm in 1988,
which exploited a buffer overrun. More than thirty years later, we are still seeing
attacks that exploit this type of programming bug.

Memory tagging
Regions of address space are allocated a tag, or lock. The upper bits of a virtual address are also
used to store a tag, or key. On a memory access, the processor compares the key in the issued
address with the lock that is assigned to that physical location. Here is an example:

Figure 6-1: Memory tagging

In the preceding diagram, two regions have been allocated, using tags 9 and 2.


For the first two pointers, the tag matches that of the accessed location. You can think of this as
the key fitting the lock. Accesses using these pointers would succeed as normal.

However, for the final pointer, the tag does not match that of the accessed location. This will be
captured as a tag check failure. We will look at what happens in this case later.

Let’s apply this mechanism to the problems that we identified earlier, starting with buffer overruns,
as you can see in this diagram:

Figure 6-2: Buffer overruns memory tagging

On the call to malloc() the C library will allocate the memory and assign a tag for the buffer. The
returned pointer will include the allocated tag. If software using the pointer goes beyond the limits
of the buffer, the tag comparison check will fail. This failure will allow us to detect the overrun.

Similarly, for use-after-free, on the call to malloc() the buffer gets allocated in memory and
assigned a tag value. The pointer that is returned by malloc() includes this tag. Later the buffer is
released. The C library might change the tag when the memory is released or might wait until the
memory is reused for some other purpose. If software continues to use the old pointer, it will have
the old tag value and the tag check will catch it.


Figure 6-3: Old tag value check

The total number of possible tags is small. Therefore, the same tag value might be
used for several different regions over time, or at the same time. However, with
careful tag allocation, sequential overruns or underruns can be detected. Wild
accesses are statistically likely to be caught.
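
The following Python sketch is a software model of the behavior described above. It is not code that uses MTE. It assumes a hypothetical bump allocator, `TaggedHeap`, that keeps one lock per 16-byte block and represents a pointer as a (key, address) pair. The class, its methods, and the `faults` helper are all illustrative names, and the tag selection policy (avoid the left neighbor's tag on allocation, avoid the old tag on release) is only one possible choice.

```python
import random

GRANULE = 16               # bytes covered by one tag in this model
TAG_VALUES = range(1, 16)  # four-bit tags; 0 marks unallocated memory here


class TagCheckFault(Exception):
    """Raised when the key in a pointer does not match the lock in memory."""


class TaggedHeap:
    def __init__(self, size, seed=0):
        self.locks = [0] * (size // GRANULE)  # one lock per 16-byte block
        self.sizes = {}                       # first block -> block count
        self.next_free = 0                    # bump allocator, never reuses
        self.rng = random.Random(seed)

    def _retag(self, start, count, avoid):
        # Pick a tag that differs from every value in `avoid`, then apply it.
        tag = self.rng.choice([t for t in TAG_VALUES if t not in avoid])
        self.locks[start:start + count] = [tag] * count
        return tag

    def malloc(self, nbytes):
        count = -(-nbytes // GRANULE)         # round up to whole blocks
        start = self.next_free
        if start + count > len(self.locks):
            raise MemoryError("model heap exhausted")
        left = self.locks[start - 1] if start else 0
        tag = self._retag(start, count, avoid={left})
        self.sizes[start] = count
        self.next_free += count
        return (tag, start * GRANULE)         # pointer = (key, address)

    def free(self, pointer):
        key, address = pointer
        start = address // GRANULE
        count = self.sizes.pop(start)
        self._retag(start, count, avoid={key})  # old pointers now mismatch

    def load(self, pointer, offset=0):
        key, address = pointer
        target = address + offset
        if not 0 <= target < len(self.locks) * GRANULE:
            raise IndexError("address outside the modeled memory")
        if key != self.locks[target // GRANULE]:
            raise TagCheckFault(hex(target))
        return True


def faults(heap, pointer, offset=0):
    """Return True if the access would be reported as a tag check failure."""
    try:
        heap.load(pointer, offset)
        return False
    except TagCheckFault:
        return True
```

In this model, with two adjacent 32-byte allocations `a` and `b`, `faults(heap, a, 32)` is true because the access runs into the block that carries the tag for `b`, and `faults(heap, a)` becomes true after `heap.free(a)` because the lock has changed while the pointer still holds the old key. An overrun that stays inside the last 16-byte block of an allocation is not caught by the model, because the comparison is made per block.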

Tags
To work with tags, the architecture gains several new instructions, including:
• IRG - Generates a random tag value and inserts it into a pointer
• STG - Sets the tag value for a block of memory
• STZG - Sets the tag value for a block of memory, and zeros the corresponding memory
◦ If the allocator is going to zero the allocated memory, STZG offers better performance than
separate zeroing and tagging.
• LDG - Reads the tag value for a block of memory

Tags are four bits and are stored in two places:


• Key - Stored in bits [59:56] of a pointer
◦ This requires pointer tagging to be enabled. We will discuss this later in the guide.
• Lock - A new address space, the tag address space, is added. The tag address space records the
tag that is assigned to each region of memory.

On allocating a block of memory, software allocates a tag either randomly, using IRG, or using a
custom algorithm. Each tag covers 16 bytes. This means that software needs to execute STZG or
STG multiple times to cover all the 16-byte blocks within the allocated memory.
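
As a rough illustration of the two points above, the following Python sketch shows how a four-bit key fits into bits [59:56] of a 64-bit pointer, and how many 16-byte blocks need a tag for an allocation of a given size. The function names and the example address are assumptions made for this sketch. Real software would use the instructions listed earlier rather than bit manipulation like this.

```python
TAG_SHIFT = 56                      # the key occupies bits [59:56]
TAG_MASK = 0xF << TAG_SHIFT
POINTER_MASK = (1 << 64) - 1        # keep values within 64 bits
GRANULE = 16                        # each tag covers 16 bytes


def set_key(pointer, tag):
    """Return the pointer with bits [59:56] replaced by the four-bit tag."""
    cleared = pointer & ~TAG_MASK & POINTER_MASK
    return cleared | ((tag & 0xF) << TAG_SHIFT)


def get_key(pointer):
    """Extract the four-bit key from bits [59:56] of the pointer."""
    return (pointer >> TAG_SHIFT) & 0xF


def granule_count(nbytes):
    """Number of 16-byte blocks that must be tagged to cover nbytes."""
    return (nbytes + GRANULE - 1) // GRANULE
```

For example, `set_key(0x00007FFF12345670, 0x9)` gives `0x09007FFF12345670`, and a 100-byte allocation spans `granule_count(100) == 7` blocks, so the lock must be written for seven 16-byte blocks.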

Tagged and untagged addresses


Not all memory accesses require tag checking. We describe an access as Checked or Unchecked,
depending on whether tag checking is carried out.

The following accesses are always Unchecked:


• Instruction fetches
• Translation table walks, including hardware updates of the Access Flag or Dirty state
• Data cache maintenance operations
• Accesses to the Allocation tags

For data accesses, a new memory attribute is added to indicate that accesses to this region should
be Checked:
• MemAttr[] == 0xF0: Inner+Outer Write-Back Cacheable, Read or Write-Allocate, Tagged

Data accesses to a region that is marked as Tagged are classed as Checked, unless one of the
following applies:
• TCR_ELx.TBI==0
• The Logical tag (bits [59:56] of the virtual address) is b0000 or b1111.
• The load or store uses the SP as a base register with an immediate offset, or no offset.
• It is a PC-relative load.
• PSTATE.TCO==1

Data accesses to any region without the Tagged attribute are Unchecked.

Loads or stores using the stack pointer with an immediate offset can be statically
checked at build time. This means that there is less benefit to checking with MTE.
The same principle applies to PC-relative loads.
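
The rules in this section can be summarized as a small decision function. The Python sketch below is a model only: the `DataAccess` fields are hypothetical names chosen for this example, and the `override` field is a stand-in for the final exception in the list above.

```python
from dataclasses import dataclass


@dataclass
class DataAccess:
    region_tagged: bool                  # region has the Tagged memory attribute
    tbi_enabled: bool = True             # models TCR_ELx.TBI == 1
    logical_tag: int = 0b0101            # bits [59:56] of the virtual address
    sp_base_simple_offset: bool = False  # SP base with immediate or no offset
    pc_relative_load: bool = False
    override: bool = False               # hypothetical: last exception in the list


def is_checked(access):
    """Return True if the model classes this data access as Checked."""
    if not access.region_tagged:
        return False                     # no Tagged attribute: Unchecked
    if not access.tbi_enabled:
        return False
    if access.logical_tag in (0b0000, 0b1111):
        return False
    if access.sp_base_simple_offset or access.pc_relative_load:
        return False
    if access.override:
        return False
    return True
```

Instruction fetches, translation table walks, cache maintenance operations, and accesses to the Allocation tags are outside this model, because they are always Unchecked.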

What happens when a comparison fails?


Let’s discuss what happens when the tag comparison fails. The architecture makes the behavior of
tag comparison failure configurable, controlled by SCTLR_ELx.TCF, or SCTLR_ELx.TCF0 for EL0:
• TCF==00 - Tag comparison failures are ignored.
• TCF==01 - Tag comparison failures are reported as a synchronous Data Abort. The address that
caused the failure is reported in FAR_ELx.
• TCF==10 - Tag comparison failures are reported asynchronously by updating bits in TFSR_ELx, or
TFSRE0_EL1 for EL0. Optionally, checks can be synchronized on exception entry, to allow check
failures to be attributed to a specific process.

The architecture provides both synchronous and asynchronous mechanisms to report tag
comparison failures. Synchronous checking makes debugging simpler, because it allows you to
identify the precise instruction and address that caused the failure. However, synchronous checking
typically has a significant performance impact. This performance impact might be acceptable in a
development environment but is too high for deployment.

Asynchronous checking is less costly. This means that asynchronous checking is potentially
acceptable even on production systems. Although asynchronous checking provides less precise
information on where the tag comparison failure occurred, it can provide some mitigation and be
used for profiling. Profiling allows problem areas to be identified, narrowing down the search area
for bugs.
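
The three TCF settings described in this section can be modeled as follows. This Python sketch only illustrates the difference between the reporting modes. The class name, the returned strings, and the single-bit `tfsr` field are assumptions of the sketch, not a description of the register layout.

```python
class TagFaultReporter:
    """Toy model of how a tag comparison failure is reported for a given TCF."""

    def __init__(self, tcf):
        self.tcf = tcf
        self.far = None   # models the faulting address reported in FAR_ELx
        self.tfsr = 0     # models a sticky failure flag in TFSR_ELx

    def on_tag_mismatch(self, address):
        if self.tcf == 0b00:
            return "ignored"
        if self.tcf == 0b01:
            # Synchronous: the exact address is known when the abort is taken.
            self.far = address
            return "synchronous data abort"
        if self.tcf == 0b10:
            # Asynchronous: only the fact that a failure happened is recorded.
            self.tfsr |= 1
            return "recorded asynchronously"
        raise ValueError("TCF encoding not described in this guide")
```

With `TagFaultReporter(0b01)`, a mismatch leaves the faulting address in `far`, which is what makes debugging simpler. With `TagFaultReporter(0b10)`, only the `tfsr` flag is set, so supervising software learns that a failure occurred but not where.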

Combining memory tagging and pointer authentication


Memory tagging and pointer authentication both use the upper bits of an address to store
additional information about the pointer: a tag for memory tagging, and a PAC for pointer
authentication.

Both technologies can be enabled at the same time. The size of the PAC is variable, depending on
the size of the virtual address space. When memory tagging is enabled at the same time, there are
fewer bits available for the PAC.
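
The trade-off can be put into numbers with a short Python sketch. It assumes the commonly described layout in which bit 55 selects the upper or lower address range, the PAC occupies the unused bits between the top of the virtual address and bit 54, and bits [63:56] are also available to the PAC only when the top byte is not being used for a tag. The function name and parameters are illustrative.

```python
def pac_field_bits(va_bits, top_byte_in_use):
    """Estimate the number of pointer bits available to hold a PAC.

    va_bits: size of the virtual address space, in bits.
    top_byte_in_use: True when bits [63:56] are reserved for a tag.
    """
    if not 0 < va_bits < 55:
        raise ValueError("va_bits must leave room below bit 55")
    bits = 55 - va_bits          # bits [54:va_bits]; bit 55 is not used
    if not top_byte_in_use:
        bits += 8                # bits [63:56] are also free for the PAC
    return bits
```

Under these assumptions, a 48-bit virtual address space leaves 15 bits for the PAC when the top byte is free, and 7 bits when memory tagging is enabled at the same time.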


7. Check your knowledge


The following questions will help you test your knowledge.

What is a gadget in Return-oriented programming (ROP) and Jump-oriented programming (JOP) attacks?
A gadget is a piece of existing code which ends in either a function return or an indirect
(absolute) branch. In ROP and JOP attacks, these gadgets are chained together to form new
programs.
Describe how Branch Target Identification (BTI) limits the scope of JOP attacks.
BTI restricts indirect branches to only targeting BTI instructions, or PACIxSP and PACIxZ
instructions. This greatly reduces the number of possible targets and makes it difficult to form
chains of gadgets.
When using pointer authentication, where is the signature of an address stored?
In the upper bits of the virtual address.
In the Arm Memory Tagging Extension (MTE), what happens when the tag issued alongside a
memory access does not match the allocation tag?
This situation is known as a tag checking failure. The behavior is configurable, via
SCTLR_ELx.TCF. The failure can be ignored, reported synchronously, or reported
asynchronously.
How many bits are used to store the logical tag in the Arm memory tagging extension?
4 bits, but the values 0b0000 and 0b1111 are reserved.


8. Related information
Here are some resources related to information in this guide:

• Arm architecture and reference manuals: Find technical manuals and documentation relating to
this guide and other similar topics
• Arm Community: Ask development questions, and find articles and blogs on specific topics from
Arm experts
• Armv8-A Instruction Set Architecture: More information on the Procedure Call Standard
• The QARMA Block Cipher Family: Information on the QARMA cipher from the International
Association for Cryptologic Research
• Control-Flow-Integrity: NSA paper on control flow protection. While not specific to the Arm
architecture, the paper provides good background reading on the topic

Detecting memory safety violations


• Adopting the Arm Memory Tagging Extension in Android: Google blog about their use of
memory tagging techniques to locate memory safety bugs
• Armv8.5-A Memory Tagging Extension: Arm white paper with a detailed description of the
memory tagging technology

Pointer authentication
• Armv8.3-A pointer authentication support: The patches that added support for pointer
authentication to the Linux kernel give information on how the technology is used in practice
• Code reuse attacks: the compiler story: Arm blog discussing the use of pointer authentication
and Branch Target Identification


9. Next steps
This guide introduced features available in the Arm architecture which can provide robust defenses
for complex software stacks. We have looked at the pointer authentication and branch target
identification extensions, which can be used to defend against ROP and JOP attacks. We also
looked at how memory tagging can be used to detect and locate potential vulnerabilities before
they are exploited.

Next you might want to learn about Arm’s TrustZone technology, another feature available in the
Arm architecture.

The knowledge in this guide, and in the TrustZone guide, will be useful to you as you design your
own complex systems, enabling you to decide which combination of technologies you should
deploy to protect different assets in the system.
