arm_compiler_user_guide_100748_0623_01_en
Version 6.23
User Guide
Non-Confidential Issue 01
Copyright © 2019–2024 Arm Limited (or its affiliates). 100748_6.23_01_en
All rights reserved.
Arm® Compiler for Embedded User Guide Document ID: 100748_6.23_01_en
Issue 01
Copyright © 2019–2024 Arm Limited (or its affiliates). All rights reserved.
This document (100748_6.23_01_en) was issued on 2024-10-16. There might be a later issue at
https://developer.arm.com/documentation/100748
Intended audience
This document is intended for software developers and provides a detailed description of the
features supported in Arm® Compiler for Embedded 6 and how to use them.
We believe that this document contains no offensive language. To report offensive language in this
document, email [email protected].
Feedback
Arm welcomes feedback on this product and its documentation. To provide feedback on the
product, create a ticket on https://support.developer.arm.com.
Contents
1. Getting Started
1.1 Tools and libraries provided with Arm Compiler for Embedded 6
1.2 Application development
1.3 About the Arm Compiler for Embedded toolchain assemblers
1.4 System requirements and installation
1.5 Accessing Arm Compiler for Embedded from Arm Development Studio
1.6 Accessing Arm Compiler for Embedded from the Arm Keil MDK
1.7 Compiling a Hello World example
1.8 Using the integrated assembler
1.9 Running bare-metal images
1.10 Architectures supported by Arm Compiler for Embedded 6
1.11 Using Arm Compiler for Embedded securely in a shared environment
1.12 Providing source code to Arm support
1.13 Build attributes
6. SVE Coding Considerations with Arm Compiler for Embedded 6
6.1 Assembling SVE code
6.2 Disassembling SVE object files
6.3 Running a binary in an AEMv8-A Base Fixed Virtual Platform (FVP)
6.4 Embedding SVE assembly code directly into C and C++ code
6.5 Using SVE and SVE2 intrinsics directly in your C code
Proprietary notice
1. Getting Started
Arm® Compiler for Embedded 6 is the most advanced C and C++ compilation toolchain from Arm
for Arm® Cortex® and Arm® Neoverse® processors. Arm Compiler for Embedded 6 is developed
alongside the Arm architecture. Therefore, Arm Compiler for Embedded 6 is tuned to generate
highly efficient code for embedded bare-metal applications ranging from small sensors to 64-bit
devices.
Arm Compiler for Embedded 6 is a component of Arm Development Studio and Arm Keil MDK.
The features and processors that Arm Compiler for Embedded 6 supports depend on the product
edition. See Compare Editions for Arm Development Studio.
You can use Arm Compiler for Embedded 6 from Arm Development Studio, Arm Keil MDK, or as a
standalone product.
The compiler is based on LLVM and Clang technology. Clang is a compiler front end for LLVM
that supports the C and C++ programming languages.
armasm
The legacy assembler. Only use armasm for legacy Arm-syntax assembly code. Use the
armclang integrated assembler and GNU syntax for all new assembly files.
The armasm legacy assembler is deprecated, and it has not been updated since
Arm Compiler 6.10. Also, armasm does not support:
• Armv8.4-A or later architectures.
• Certain backported options in Armv8.2-A and Armv8.3-A.
• Assembling SVE instructions.
• Armv8.1-M or later architectures, including MVE.
• All versions of the Armv8-R architecture.
armlink
The linker combines the contents of one or more object files with selected parts of one or
more object libraries to produce an executable program.
armar
The archiver enables sets of ELF object files to be collected together and maintained in
archives or libraries. If you do not change the files often, these collections reduce compilation
time because you do not have to recompile from source every time you use them. You can pass
such a library or archive to the linker in place of several ELF files. You can also distribute the
archive to a third-party application developer, because sharing the archive does not give away
the source code.
fromelf
The image conversion utility can convert Arm ELF images to binary formats. It can also
generate textual information about the input image, such as its disassembly, code size, and
data size.
C and C++ language and library support in Arm Compiler for Embedded 6
armclang inherits its C and C++ language support from Clang, so Arm progressively updates the
level of support as Clang evolves. However, the C and C++ library support might not match the
language support. For example, some library features, such as the filesystem library in C++, might
not apply to embedded development.
Arm does not guarantee the compatibility of C++ compilation units compiled
with different major or minor versions of Arm Compiler for Embedded and
linked into a single image. Therefore, Arm recommends that you always build
your C++ code from source with a single version of the toolchain.
Arm C libraries
The Arm C libraries provide:
• An implementation of the library features as defined in the C standards.
• Nonstandard extensions common to many C libraries.
Comments inside source files and header files that are provided by Arm might not
be accurate and must not be treated as documentation about the product.
For C and C++ language support and libc++ library support in Arm Compiler for Embedded 6, see:
• C language
• C++ language
• libc++ C++14
• libc++ C++17
Related information
Compiling a Hello World example on page 18
Common Arm Compiler for Embedded toolchain options on page 32
-S (armclang)
Arm C and C++ library directory structure
The following figure shows how the compilation tools are used for the development of a typical
application:
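[Figure: typical application development flow, from C/C++ source files (.c) to A32 and T32 object code (.o) containing code, data, and debug information]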
Arm® Compiler for Embedded 6 has more functionality than the set of product features that is
described in the documentation. The various features in Arm Compiler for Embedded 6 can have
different levels of support and guarantees. For more information, see Support level definitions.
• If you are migrating your toolchain from Arm Compiler 5 to Arm Compiler for
Embedded 6, see the Migration and Compatibility Guide. It contains information
on how to migrate your source code and toolchain build options.
• For a list of Arm Compiler for Embedded 6 documents, see the Arm Compiler
for Embedded documentation index on Arm Developer.
Related information
Compiling a Hello World example on page 18
Common Arm Compiler for Embedded toolchain options on page 32
-S (armclang)
The Arm Compiler for Embedded toolchain provides the following assemblers:
• The armclang integrated assembler. Use this to assemble assembly language code written in
GNU syntax.
• An optimizing inline assembler built into armclang. Use this to assemble assembly language
code written in GNU syntax that is used inline in C or C++ source code.
• The freestanding legacy assembler, armasm. Use armasm to assemble existing A64, A32, and T32
assembly language code written in armasm syntax.
The armasm legacy assembler is deprecated, and it has not been updated since
Arm Compiler 6.10. Also, armasm does not support:
◦ Armv8.4-A or later architectures.
◦ Certain backported options in Armv8.2-A and Armv8.3-A.
◦ Assembling SVE instructions.
◦ Armv8.1-M or later architectures, including MVE.
◦ All versions of the Armv8-R architecture.
The command-line option descriptions and related information in the Arm Compiler
for Embedded Reference Guide describe all the features that Arm Compiler for
Embedded supports. Any features not documented are not supported and are used
at your own risk. You are responsible for making sure that any generated code using
community features is operating correctly. See Support level definitions.
Related information
Using Assembly and Intrinsics in C or C++ Code on page 108
Assembling GNU syntax and armasm assembly code on page 103
Arm Compiler for Embedded Reference Guide
System Requirements
Arm Compiler for Embedded 6 is available for the following:
• x86_64 Windows
• x86_64 Windows for Arm® Keil® MDK
• x86_64 Linux
• AArch64 Linux
For more information on system requirements, see the Release Notes that are available with the
installer for your version on the Arm Compiler downloads index page.
The Linux installers of Arm Compiler for Embedded might be vulnerable to the
CVE-2022-43701 permission-based attack. For more information, see Installer
vulnerabilities CVE-2022-43701, CVE-2022-43702, and CVE-2022-43703.
Prerequisites
1. Click the link in the Product Download Hub page column of the Arm Compiler downloads
index to download the installer for your version. The download pack provided for use with Keil
MDK is not suitable for standalone use.
2. Obtain a license. Contact your Arm sales representative or Request a license.
If you are using a user-based license, see the User-based licensing User Guide.
If you have an older version of Arm Compiler for Embedded 6 and you want to upgrade, we
recommend that you uninstall the older version before you install the new one.
On Windows, Arm Compiler for Embedded requires the Universal C Runtime to be installed. For
more information, see Update for Universal C Runtime in Windows.
To allow you to run the armclang binary, it is dynamically linked to a copy of libstdc++ that is
installed under your chosen directory as part of Arm Compiler for Embedded. libstdc++ is not the
C++ standard library that you use to build a C++ project for Arm target devices.
shasum -c ./sw/checksums.txt
Windows installation
1. To verify the installed files on Windows, create the file check_checksum.bat containing
the following commands:
@ECHO OFF
setlocal enabledelayedexpansion
REM Cycle through each line of checksums.txt
for /F "tokens=*" %%L in (sw\checksums.txt) do (
    REM For each line grab two tokens: %%a (hash) %%b (file path)
    for /F "tokens=1,2 delims= " %%a in ("%%L") do (
        REM Run certutil on the file path, and cycle over its output lines
        for /F "usebackq tokens=* skip=1" %%C in (`certutil -hashfile "%%b" SHA256`) do (
            REM We only need the 2nd of 3 lines output by certutil
            REM (skip=1 ignores the first)
            set var=""
            REM Searching for the string 'CertUtil' allows us to ignore the 3rd
            for /F "usebackq delims=" %%x in (`echo "%%C"^|findstr /v "CertUtil"`) do (
                set var=%%x
            )
            REM If this is the 2nd 'hash' line of certutil, then it is time to compare
            REM Stop with a nonzero exit code as soon as one comparison fails
            if not "!var!" == """" (call :compare_hashes %%b %%a !var! || EXIT /B 1)
        )
    )
)
echo All hashes match.
EXIT /B 0
:compare_hashes
echo Checking file: %1
echo ... Expected checksum: "%2"
echo ... Received checksum: %3
if %3 == "%2" (echo ... Success) else (echo ... Failure && EXIT /B 11)
EXIT /B 0
To uninstall Arm Compiler for Embedded on Linux, delete the Arm Compiler for Embedded
installation directory for the compiler version you want to delete.
For more information on installation, see the Release Notes that are available with the installer for
your version on the Arm Compiler downloads index page.
Related information
Accessing Arm Compiler for Embedded from Arm Development Studio on page 17
Accessing Arm Compiler for Embedded from the Arm Keil MDK on page 18
For more information, see Create a new C or C++ project in the Arm Development Studio Getting
Started Guide.
Related information
System requirements and installation on page 14
For more information, see Manage Arm Compiler Versions in the μVision User's Guide.
Related information
System requirements and installation on page 14
A simple example
The source code that is used in the examples is a single C source file, hello.c, to display a greeting
message:
#include <stdio.h>
int main() {
printf("Hello World\n");
return 0;
}
You must first decide which target the executable is to run on. An Armv8-A target can run in
different states:
• AArch64 state targets execute A64 instructions using 64-bit and 32-bit general-purpose
registers.
• AArch32 state targets execute A32 or T32 instructions using 32-bit general-purpose registers.
The --target option determines which target state to compile for. This option is mandatory.
Compiling for an AArch64 target
To create an executable for an AArch64 target in a single step:
armclang --target=aarch64-arm-none-eabi hello.c
This command creates an executable file with the default name a.out. You can use the -o
option to specify a different name for the executable file.
This example compiles for an AArch64 state target. Because only --target is specified, the
compiler defaults to generating code that runs on any Armv8-A target. You can also use
-mcpu to target a specific processor.
Compiling for an AArch32 target
To create an executable for an AArch32 target in a single step:
armclang --target=arm-arm-none-eabi -mcpu=cortex-a53 hello.c
There is no default target for AArch32 state. You must specify either -march to target an
architecture or -mcpu to target a processor.
This example uses -mcpu to target the Cortex®-A53 processor. The compiler generates code
that is optimized specifically for the Cortex-A53, but might not run on other processors.
The Cortex-A53 supports both the A32 and T32 instruction sets. For more information, see
-marm and -mthumb.
The Arm Compiler for Embedded Reference Guide describes all the supported options. Some of
the most common options are listed in Common Arm Compiler for Embedded toolchain options.
...
main
0x000081a0: e92d4800 .H-. PUSH {r11,lr}
0x000081a4: e1a0b00d .... MOV r11,sp
0x000081a8: e24dd010 ..M. SUB sp,sp,#0x10
0x000081ac: e3a00000 .... MOV r0,#0
0x000081b0: e50b0004 .... STR r0,[r11,#-4]
0x000081b4: e30a19cc .... MOV r1,#0xa9cc
...
• Convert the ELF executable image to another format, for example a plain binary file:
See fromelf Command-line Options for the options from the fromelf tool.
armclang --target=aarch64-arm-none-eabi file1.c file2.c -o image.axf
This command compiles the two source files file1.c and file2.c into an executable file for an
AArch64 state target. The -o option specifies that the filename of the generated executable file is
image.axf.
However, more complex projects might have a large number of source files. It is not efficient to
compile every source file at every compilation, because many of the source files are unlikely to
change. To avoid compiling unchanged source files, you can compile and link as separate steps.
You can then use a build system (such as make) to compile only those source files that have
changed, and link the object code together. The armclang -c option tells the compiler to
compile to object code and stop before calling the linker:
armclang --target=aarch64-arm-none-eabi -c file1.c
armclang --target=aarch64-arm-none-eabi -c file2.c
armlink file1.o file2.o -o image.axf
• Compile file1.c to object code, and save using the default name file1.o.
• Compile file2.c to object code, and save using the default name file2.o.
• Link the object files file1.o and file2.o to produce an executable that is called image.axf.
If you later modify file2.c, you can rebuild the executable by recompiling only file2.c and then
linking the new file2.o with the existing file1.o to produce a new executable:
armclang --target=aarch64-arm-none-eabi -c file2.c
armlink file1.o file2.o -o image.axf
Related information
--target (armclang)
-march (armclang)
-mcpu (armclang)
Summary of armclang command-line options
The integrated assembler sets a minimum alignment of 4 bytes for a .text section.
However, if you define your own sections with the integrated assembler, then
you must include the .balign directive to set the correct alignment. For a section
containing T32 instructions, set the alignment to 2 bytes. For a section containing
A32 instructions, set the alignment to 4 bytes.
    .section StringCopy, "ax"
    .balign 4
    .global mystrcopy
    .type mystrcopy, "function"
mystrcopy:
    ldrb r2, [r1], #1
    strb r2, [r0], #1
    cmp r2, #0
    bne mystrcopy
    bx lr
The .section directive creates a new section in the object file named StringCopy. The characters
in the string following the section name are the flags for this section. The a flag marks this section
as allocatable. The x flag marks this section as executable.
The .balign directive aligns the subsequent code to a 4-byte boundary. The alignment is required
for compliance with the Procedure Call Standard for the Arm Architecture (AAPCS).
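The effect of .balign can be expressed as simple arithmetic: the assembler advances the current location to the next multiple of the requested alignment, inserting padding if necessary. The following C sketch models that calculation for power-of-two alignments, such as the 2-byte and 4-byte boundaries used for T32 and A32 code. The function name align_up is hypothetical and is not part of any Arm tool.

```c
#include <stdint.h>

/* Hypothetical helper: rounds offset up to the next multiple of align,
   where align is a power of two (for example 2 for T32, 4 for A32).
   An offset that is already aligned is returned unchanged. */
uintptr_t align_up(uintptr_t offset, uintptr_t align)
{
    return (offset + align - 1) & ~(align - 1);
}
```

For example, align_up(0x0a, 4) is 0x0c, and align_up(0x0c, 4) stays 0x0c because that offset is already aligned.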
The .global directive marks the symbol mystrcopy as a global symbol. This enables the symbol to
be referenced by external files.
The .type directive sets the type of the symbol mystrcopy to function. This helps the linker use
the proper linkage when the symbol is branched to from A32 or T32 code.
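For readers more familiar with C than with T32 assembly, the following sketch shows what the mystrcopy loop does: it copies one byte at a time, post-incrementing both pointers (the ldrb r2, [r1], #1 and strb r2, [r0], #1 pair), and stops after copying the terminating zero byte (cmp r2, #0 followed by bne). The function name mystrcopy_c is hypothetical and is not part of the example project; it only models the behavior of the assembly routine.

```c
/* Hypothetical C model of the mystrcopy assembly routine.
   Under the AAPCS, dst arrives in r0 and src in r1. */
void mystrcopy_c(char *dst, const char *src)
{
    char c;

    do {
        c = *src++;          /* ldrb r2, [r1], #1 */
        *dst++ = c;          /* strb r2, [r0], #1 */
    } while (c != '\0');     /* cmp r2, #0 ; bne mystrcopy */
}
```

Like strcpy(), neither version checks the destination size, so dst must have room for the whole string including its terminating zero byte.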
In this example, there is no default target for AArch32 state, so you must specify either -march to
target an architecture or -mcpu to target a processor. This example uses -march to target the
Armv8-M Mainline architecture. The integrated assembler accepts the same options for --target,
-march, -mcpu, and -mfpu as the compiler.
Some update releases and architecture extensions might not be fully supported
in this release. Where these are described, the level of support is indicated. See
Support level definitions.
For example, you can disassemble the code that is contained in the object file:
...
** Section #3 'StringCopy' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
Size : 14 bytes (alignment 4)
Address: 0x00000000
$t.0
mystrcopy
0x00000000: f8112b01 ...+ LDRB r2,[r1],#1
0x00000004: f8002b01 ...+ STRB r2,[r0],#1
0x00000008: 2a00 .* CMP r2,#0
0x0000000a: d1f9 .. BNE mystrcopy ; 0x0
0x0000000c: 4770 pG BX lr
...
The example shows the disassembly for the section StringCopy as created in the source file.
The presence of 16-bit opcodes shows that the code is in the T32 instruction set.
T32 is the default in this situation, because Armv8-M Mainline does not support
A32 code.
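The 16-bit versus 32-bit distinction in the listing follows from the T32 encoding rule: a halfword whose top five bits (bits [15:11]) are 0b11101, 0b11110, or 0b11111 is the first half of a 32-bit instruction, and any other halfword is a complete 16-bit instruction. The following sketch applies that rule to a stream of halfwords. The function names t32_is_32bit and t32_stream_size are hypothetical, and the sketch assumes a well-formed stream that does not end in the middle of a 32-bit instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if hw is the first halfword of a 32-bit T32
   instruction, that is, if bits [15:11] are 0b11101, 0b11110, or 0b11111. */
int t32_is_32bit(uint16_t hw)
{
    return (hw >> 11) >= 0x1D;
}

/* Walks a well-formed stream of T32 halfwords, stores the number of
   instructions in *count, and returns the stream size in bytes. */
size_t t32_stream_size(const uint16_t *hw, size_t n_halfwords, size_t *count)
{
    size_t i = 0;
    size_t instrs = 0;

    while (i < n_halfwords) {
        /* A 32-bit instruction occupies two halfwords. */
        i += t32_is_32bit(hw[i]) ? 2 : 1;
        instrs++;
    }
    *count = instrs;
    return i * 2;
}
```

Applied to the halfwords from the listing (0xf811, 0x2b01, 0xf800, 0x2b01, 0x2a00, 0xd1f9, 0x4770), it finds two 32-bit and three 16-bit instructions, 14 bytes in total, which matches the reported section size.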
For processors that support A32 and T32 code, you can explicitly mark the code as
A32 or T32 by adding the GNU assembly .arm or .thumb directive, respectively, at
the start of the source file.
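The C example is a single C source file main.c, containing a call to the mystrcopy function to copy a
string from one location to another:
extern void mystrcopy(char *dst, const char *src);

const char *source = "Hello World";
char dest[32];   /* large enough for the source string and its terminator */

int main(void) {
    mystrcopy(dest, source);
    return 0;
}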
An extern function declaration has been added for the mystrcopy function. The compiler cannot
check this declaration against the assembly implementation, so you must check the return type and
function parameters manually.
If you want to call the assembly function from a C++ source file, you must disable C++ name
mangling by using extern "C" instead of extern. For the above example, use:
extern "C" void mystrcopy(char *dst, const char *src);
To link the two object files main.o and mystrcopy.o and generate an executable image:
Related information
Mandatory armclang options on page 30
Summary of armclang command-line options
Sections
The linker creates information that is used to initialize global and static objects (data) and
uninitialized global and static objects (.bss). At startup, a bare-metal image initializes the data by
copying, and if necessary decompressing, the initial values, and sets the .bss to zero.
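Conceptually, this start-up step copies the initial values of initialized data from where the image stores them to where the program uses them, and clears the .bss. In Arm Compiler for Embedded, the C library start-up code (entered through __main) does this work using region tables that the linker generates, and it can also decompress data. The following sketch is only a simplified model: it uses plain arrays as stand-ins for the linker regions, does no decompression, and its function and parameter names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical model of bare-metal data initialization.
   load_data: initial values stored in the image
   data:      where initialized globals live at run time
   bss:       uninitialized globals, which must start as zero
   The regions are assumed not to overlap. */
void init_data_and_bss(const unsigned char *load_data, unsigned char *data,
                       size_t data_len, unsigned char *bss, size_t bss_len)
{
    if (data != load_data) {
        memcpy(data, load_data, data_len);   /* nothing to copy if already in place */
    }
    memset(bss, 0, bss_len);
}
```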
See your Arm Integrated Development Environment (IDE) documentation for more information on
configuring and running images:
• For Arm Development Studio, see the Arm Development Studio Getting Started Guide and Arm
Development Studio User Guide.
• For Arm® Keil® MDK, see Installation in the Arm Keil Microcontroller Development Kit (MDK)
Getting Started Guide.
By default, the C library in Arm Compiler for Embedded uses special functions to access the
input and output interfaces on the host computer. These functions implement a feature called
semihosting. Semihosting is useful when the input and output interfaces on the hardware are not
available during the early stages of application development.
When you want your application to use the input and output interfaces on the hardware, you must
retarget the required semihosting functions in the C library.
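As an illustration only, the following sketch retargets character output by providing its own fputc(), so that characters written through it go to a memory-mapped UART instead of the debugger. Retargeting fputc() is a common starting point, but depending on whether you use the standard C library or microlib you might need to retarget further functions, so check the Arm C and C++ library documentation for your configuration. The UART register address is a hypothetical placeholder, not a real device address. When building with armclang, the __use_no_semihosting symbol makes the link fail if any semihosting function is still referenced.

```c
#include <stdint.h>
#include <stdio.h>

#if defined(__ARMCC_VERSION)
/* armclang only: request a link-time error if any semihosting
   function is still pulled in from the C library. */
__asm(".global __use_no_semihosting");
#endif

/* Hypothetical UART transmit data register. Replace the placeholder
   address with the one from your device's reference manual. */
volatile uint32_t *uart_tx_reg = (volatile uint32_t *)0x40001000u;

/* Retargeted character output: every stream is sent to the UART.
   A real driver would also wait until the transmitter is ready. */
int fputc(int ch, FILE *f)
{
    (void)f;
    *uart_tx_reg = (uint32_t)(unsigned char)ch;
    return ch;
}
```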
See your Arm IDE documentation for more information on configuring debugger settings:
• For Arm Debugger settings, see Configuring a connection to a bare-metal hardware target in
the Arm Development Studio Getting Started Guide.
• For information on how to disable semihosting in Arm Keil MDK, see ARM: Application Builds
Without Error, But Does Not Run.
A bare-metal application that uses semihosting does not use the input and output interface of the
development platform. When the input and output interfaces on the development platform are
available, you must reimplement the necessary semihosting functions to use them.
For more information, see how to use the libraries in semihosting and nonsemihosting
environments.
Related information
Arm Development Studio Getting Started Guide
Arm Development Studio User Guide
Semihosting for AArch32 and AArch64
Some update releases and architecture extensions might not be fully supported
in this release. Where these are described, the level of support is indicated. See
Support level definitions.
Arm Compiler for Embedded supports the following architectures for bare-metal targets:
• Armv9-A and all update releases.
• Armv8-A and all update releases.
• Armv8-R.
• Armv8-M and all update releases.
• Armv7-A.
• Armv7-R.
• Armv7-M.
• Armv6-M.
When compiling code, the compiler must know which architecture to target in order to take
advantage of features specific to that architecture.
To specify a target, you must supply the target execution state (AArch32 or AArch64), together
with either a target architecture (for example Armv8-A) or a target processor (for example, the
Cortex®-A53 processor).
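To specify a target execution state (AArch64 or AArch32) with armclang, use the mandatory
--target command-line option:
--target=<arch>-<vendor>-<os>-<abi>
The supported targets are:
aarch64-arm-none-eabi
Generates A64 instructions for AArch64 state. If you do not specify -march or -mcpu, the
compiler generates code that runs on any Armv8-A target.
arm-arm-none-eabi
Generates A32 and T32 instructions for AArch32 state. You must use it together with -march
(to target an architecture) or -mcpu (to target a processor).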
To generate generic code that runs on any processor with a particular architecture, use the -march
option. Use the -march=list option to see all supported architectures.
To optimize your code for a particular processor, use the -mcpu option. Use the -mcpu=list option
to see all supported processors.
The --target, -march, and -mcpu options are armclang options. For all of the other
tools, such as armlink, use the --cpu option to specify target processors and
architectures.
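Because the target is fixed at compile time, source code that must behave differently per target can test the predefined macros that armclang sets from --target, -march, and -mcpu. The following sketch uses __aarch64__ (AArch64 state), __arm__ (AArch32 state), __thumb__ (T32 code), and the ACLE macro __ARM_ARCH, which encodes the architecture version. The function name describe_target is hypothetical. On a non-Arm host compiler, none of these macros is defined and the last branch is taken.

```c
#include <stdio.h>

/* Hypothetical helper: writes a short description of the compile-time
   target into buf, based on predefined target macros. */
void describe_target(char *buf, size_t len)
{
#if defined(__aarch64__)
    snprintf(buf, len, "AArch64 (A64), __ARM_ARCH=%d", __ARM_ARCH);
#elif defined(__arm__) && defined(__thumb__)
    snprintf(buf, len, "AArch32 (T32), __ARM_ARCH=%d", __ARM_ARCH);
#elif defined(__arm__)
    snprintf(buf, len, "AArch32 (A32), __ARM_ARCH=%d", __ARM_ARCH);
#else
    snprintf(buf, len, "not an Arm target");
#endif
}
```

For example, compiling this file with --target=arm-arm-none-eabi -mcpu=cortex-a53 -mthumb selects the T32 branch, and compiling it with --target=aarch64-arm-none-eabi selects the AArch64 branch.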
Related information
--target (armclang)
-march (armclang)
-mcpu (armclang)
--cpu (armlink)
Arm Glossary
If deploying Arm Compiler for Embedded into environments where security is a concern, then Arm
strongly recommends that you do all the following:
Preprocessing your source files with the armclang option -E might be useful when creating the
minimal example as part of a support case. To help the investigation, try to send only the single
image, object, source file, or function that is causing the issue, together with the command-line
options used.
If your source code uses preprocessor macros or includes other files with #include, you might need
to preprocess the source with the compiler before sharing it, so that the shared file contains
everything the compiler sees. To do this, pass the file through the preprocessor as follows:
Where <options> are your normal compile options, such as -O2, -g, -I, -D, but without -c.
Related information
Common Arm Compiler for Embedded toolchain options on page 32
-E (armclang)
Arm® Compiler for Embedded supports build attributes only for AArch32.
Build attributes approximate your intentions for the compatibility of the relocatable file produced
by the tool when compiling or assembling code. You express the intentions to the tool as
configuration options such as -mcpu or -mno-unaligned-access.
When compiling C and C++ code, armclang is in control of code generation and can guarantee
that the object file generated conforms to the intention. When using the assembler, you are in
control of code generation. In some cases the assembler can check that the source code conforms
to the intentions given on the command-line. For example, if the specified processor does not
support a particular instruction, the assembler can give an error message that the instruction is not
supported. However, some intentions cannot be easily checked by the assembler.
You can use the armclang integrated assembler with options that permit using unaligned data
accesses or options that affect the passing of arguments. When using such options, you must
ensure that the object file generated conforms to the intentions and purpose of the options:
• Compatibility can be given a mathematically precise definition using sets of demands placed on
an execution environment.
For example, a program is compatible with a processor if, and only if, the set of instructions the
program might try to execute is a subset of the instructions implemented by that processor.
• Target-related attributes describe the hardware-related demands a relocatable file places on an
execution environment through being included in an executable file for that environment.
For example, target-related attributes record whether use of the Arm® Thumb® Instruction
Set Architecture (ISA) is permitted, and at what architectural revision use is permitted. A pair of
values for these attributes describes the set of Thumb instructions that code is permitted to
execute and that the target processor must implement.
• Procedure call-related attributes describe features of the ABI contract that the ABI allows to
vary, such as:
◦ Whether floating-point parameters are passed in floating-point registers.
◦ The size of wchar_t.
◦ Whether enumerated values are containerized according to their size.
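To see why the containerization of enumerated values is part of the ABI contract, consider a structure that contains an enum. If one object file is compiled with -fshort-enums, which makes an enumerated type only as large as its values require, and another is compiled without it, the two files disagree on the size and layout of the same structure even though the source is identical. Build attributes record this choice in each object file so that tools can detect the mismatch. The type and function names below are hypothetical.

```c
#include <stddef.h>

/* Hypothetical types, used only to show how enum size affects layout. */
enum colour { RED, GREEN, BLUE };   /* every value fits in one byte */

struct pixel {
    enum colour c;         /* int-sized, or 1 byte with -fshort-enums */
    unsigned char alpha;
};

/* sizeof(struct pixel) is typically 8 with int-sized enums (4 + 1 + 3
   bytes of padding) and 2 with -fshort-enums, so objects built with
   different settings disagree about this layout. */
size_t pixel_size(void)
{
    return sizeof(struct pixel);
}
```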
You can also set intentions by using directives in the assembler source code. You can use
the armclang [COMMUNITY] option -mdefault-build-attributes to add the default build
attribute directives to your assembly code. To see how armclang encodes the build attributes
in the assembly code, specify the -S option. For example, the -mno-unaligned-access option sets the
Tag_CPU_unaligned_access attribute to 0:
.text
.syntax unified
.eabi_attribute 67, "2.09" @ Tag_conformance
.eabi_attribute 6, 14 @ Tag_CPU_arch
.eabi_attribute 7, 65 @ Tag_CPU_arch_profile
.eabi_attribute 8, 1 @ Tag_ARM_ISA_use
.eabi_attribute 9, 2 @ Tag_THUMB_ISA_use
...
.eabi_attribute 34, 0 @ Tag_CPU_unaligned_access
...
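The -mno-unaligned-access option affects how the compiler accesses data that might not be naturally aligned. A minimal, portable sketch of reading a 32-bit value from a possibly unaligned byte buffer is to copy it with memcpy(). The compiler can then choose a single load when unaligned accesses are permitted, or a sequence of byte loads when they are not. The function name read_u32_unaligned is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from p, which need not be 4-byte aligned.
   memcpy() avoids the undefined behavior of a misaligned pointer cast;
   the compiler decides which load instructions to use, based on options
   such as -mno-unaligned-access. */
uint32_t read_u32_unaligned(const unsigned char *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```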
If you have a specific language standard that you are targeting for assembler source code, we
recommend that you specify the language standard on the command line. You must specify
the language standard because the assembler does not detect non-conformance between the
assembler source code and the stated intentions.
Build attributes are encoded in a binary format. To decode the build attributes, use the
fromelf option --decode_build_attributes. To see a human-readable form, use the
--extract_build_attributes option.
Related information
Addenda to, and Errata in, the ABI for the Arm Architecture
Summary of armclang command-line options
-mdefault-build-attributes, -mno-default-build-attributes
armclang Integrated Assembler
--decode_build_attributes
--extract_build_attributes
Using Common Compiler Options
Specifying a target
To specify a target, use the --target option. The following targets are available:
• To generate A64 instructions for AArch64 state, specify --target=aarch64-arm-none-eabi.
• To generate A32 and T32 instructions for AArch32 state, specify --target=arm-arm-none-eabi.
To specify generation of either A32 or T32 instructions, use -marm or -mthumb respectively.
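To confirm which execution state and instruction set a compilation unit was built for, a sketch can test the predefined macros __aarch64__, __arm__, and __thumb__, which Arm compilers define according to --target and -marm or -mthumb. The function name selected_isa is hypothetical. On a non-Arm host compiler, none of these macros is defined.

```c
/* Returns a description of the state and instruction set selected by
   --target and -marm/-mthumb, based on predefined macros. */
const char *selected_isa(void)
{
#if defined(__aarch64__)
    return "A64 (AArch64 state)";
#elif defined(__arm__) && defined(__thumb__)
    return "T32 (AArch32 state, -mthumb)";
#elif defined(__arm__)
    return "A32 (AArch32 state, -marm)";
#else
    return "not an Arm target";
#endif
}
```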
Specifying an architecture
To generate code for a specific architecture, use the -march option. The supported architectures
vary according to the selected target.
To see a list of all the supported architectures for the selected target, use -march=list.
Specifying a processor
To generate code for a specific processor, use the -mcpu option. The supported processors vary
according to the selected target.
To see a list of all the supported processors for the selected target, use -mcpu=list.
It is also possible to enable or disable optional architecture features by using the +[no]feature
notation. For a list of the architecture features that your processor supports, see the processor
product documentation. See the Arm Compiler for Embedded Reference Guide for a list of
architecture features that Arm Compiler for Embedded supports.
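Features selected with the +[no]feature notation are visible to source code through Arm C Language Extensions (ACLE) feature macros. As a sketch, __ARM_FEATURE_CRC32 is defined when the CRC32 extension is enabled, for example with a +crc modifier on an architecture where it is optional. The function name crc32_enabled is hypothetical.

```c
/* Returns 1 if the CRC32 extension is enabled for this compilation
   (for example, by a +crc feature modifier), and 0 otherwise. */
int crc32_enabled(void)
{
#if defined(__ARM_FEATURE_CRC32)
    return 1;
#else
    return 0;
#endif
}
```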
Avoid specifying both the architecture (-march) and the processor (-mcpu) because
specifying both has the potential to cause a conflict. The compiler infers the correct
architecture from the processor.
• If you want to run code on one particular processor, specify the processor using
-mcpu. Performance is optimized, but code is only guaranteed to run on that
processor. If you specify a value for -mcpu, do not also specify a value for
-march.
Examples
These examples compile and link the input file helloworld.c:
• To compile for the Armv8-A architecture in AArch64 state, use:
armclang --target=aarch64-arm-none-eabi -march=armv8-a helloworld.c
Related information
--target (armclang)
-march (armclang)
-mcpu (armclang)
-marm (armclang)
-mthumb (armclang)
Summary of armclang command-line options
Option Description
-mcpu=name Generates code for the specified processor, for example -mcpu=cortex-a53,
-mcpu=cortex-a57, or -mcpu=cortex-a15.
-mcpu=list Displays a list of all the supported processors for the selected execution state.
-marm Requests that the compiler targets the A32 instruction set, which consists of 32-bit wide
instructions only. For example, --target=arm-arm-none-eabi -march=armv7-a
-marm. This option emphasizes performance.
The -mthumb option is not valid with AArch64 targets. The compiler ignores the -mthumb
option and generates a warning if used with AArch64 targets.
-mfloat-abi Specifies whether to use hardware instructions or software library functions for floating-point
operations.
-mfpu Specifies the target FPU architecture.
-g (armclang) Generates DWARF debug tables compatible with the DWARF 4 standard.
-E Executes only the preprocessor step.
-I Adds the specified directories to the list of places that are searched to find included files.
-o (armclang) Specifies the name of the output file.
-Onum Specifies the level of performance optimization to use when compiling source files.
-Os Balances code size against code speed.
-Oz Optimizes for code size.
-S Outputs the disassembly of the machine code that the compiler generates.
-### Displays diagnostic output showing the options that would be used to invoke the compiler and
linker. The compilation and link steps are not performed.
Option Description
--info (armlink) Displays information about linker operation. For example,
--info=sizes,unused,unusedsymbols displays information about all the following:
• Code and data sizes for each input object and library member in the image.
• Unused sections that --remove has removed from the code.
• Symbols that were removed with the unused sections.
--list=filename Redirects diagnostics output from options including --info and --map to the specified file.
--map Displays a memory map containing the address and the size of each load region, execution
region, and input section in the image, including linker-generated input sections.
--symbols Lists each local and global symbol that is used in the link step, and their values.
-o filename, --output=filename Specifies the name of the output file.
--keep=section_id Specifies input sections that unused section elimination must not remove.
--load_addr_map_info Includes the load addresses for execution regions and the input sections within them in the map
file.
The optional <options> specify additional information to include in the image information.
Valid <options> include -c to disassemble code, and -s to print the symbol and versioning
tables. You can also use <options> without specifying --text.
Option Description
--info (fromelf) Displays information about specific topics, for example --info=totals lists the Code, RO
Data, RW Data, ZI Data, and Debug sizes for each input object and library member in the
image.
Only use armasm to assemble legacy assembly code syntax. Use GNU syntax for
new assembly files, and assemble with the armclang integrated assembler.
Source language
By default, Arm Compiler for Embedded treats files with the .c extension as C source files. If you want
to compile a .c file, for example file.c, as a C++ source file, use the -xc++ option:
By default, Arm Compiler for Embedded treats files with the .cpp extension as C++ source files. If you
want to compile a .cpp file, for example file.cpp, as a C source file, use the -xc option:
The -x option only applies to input files that follow it on the command line.
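The choice of source language can change the meaning of code that is valid in both languages. As a small illustration, a character literal has type int in C but type char in C++, so the following function (a hypothetical name) returns different values for the same file.c depending on whether you compile it as C or with -xc++.

```c
#include <stddef.h>

/* In C, 'a' has type int, so this returns sizeof(int).
   In C++ (for example, file.c compiled with -xc++), 'a' has type char,
   so this returns 1. */
size_t char_literal_size(void)
{
    return sizeof('a');
}
```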
Some C and C++ language standards are supported as [COMMUNITY] features. See
Support level definitions.
armclang always applies the rules for type auto-deduction for copy-list-initialization
and direct-list-initialization from C++17, regardless of which C++ source language
mode a program is compiled for. For example, the compiler always deduces the type
of foo as int instead of std::initializer_list<int> in the following code:
auto foo{ 1 };
The default language standard for C code is gnu11 [COMMUNITY]. The default language standard
for C++ code is gnu++17. To specify a different source language standard, use the -std=<name>
option.
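To check which C standard a translation unit was compiled for, a sketch can inspect the standard __STDC_VERSION__ macro. With the default gnu11 mode, or with -std=c11, the value is 201112L; with -std=c99 it is 199901L; in C90 mode the macro is not defined. The function name c_standard_version is hypothetical.

```c
/* Returns the value of __STDC_VERSION__ for this translation unit,
   or 0 if the macro is not defined (as in C90 mode). */
long c_standard_version(void)
{
#if defined(__STDC_VERSION__)
    return __STDC_VERSION__;
#else
    return 0L;
#endif
}
```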
Compatibility of C++ compilation units
We do not guarantee the compatibility of C++ compilation units compiled with different
major or minor versions of Arm Compiler for Embedded and linked into a single image.
Also, the default language standards used can differ between versions of Arm Compiler for
Embedded.
We recommend that you always build your C++ code from source with a
single version of the toolchain.
If you are linking your project against a pre-built library provided by a third party, ensure that
the library was built with the same version of the compiler toolchain that you use to build your
project.
Arm Compiler for Embedded supports various language extensions, including GCC extensions,
which you can use in your source code. Some GCC extensions are only available when you
specify one of the GCC C or C++ language variants. Other GCC extensions are available without
specifying a language variant. Use the armclang option -Wgnu to see if a GNU extension is used. For
more information on language extensions, see the C Language Extensions in the Arm Compiler for
Embedded Reference Guide.
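As an illustration of a GCC extension that is available without specifying a GNU language variant, the conditional operator with an omitted middle operand (a ?: b) is accepted by default, and the -Wgnu option reports it. This is a sketch; the function name first_nonzero is hypothetical.

```c
/* Returns a if it is nonzero, otherwise b.
   "a ?: b" is the GNU conditional with an omitted operand; it evaluates a
   only once. The ISO C equivalent is "a ? a : b". */
int first_nonzero(int a, int b)
{
    return a ?: b;
}
```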
Because Arm Compiler for Embedded uses the available language extensions by default, it does
not adhere to the strict ISO standard. To check your code against the strict ISO standard for the source
language, use the -Wpedantic option. This option generates warnings where the source code violates the ISO
standard. Arm Compiler for Embedded does not support strict adherence to C++98 or C++03.
If you do not use -Wpedantic, Arm Compiler for Embedded uses the available language extensions
without warning. However, where language variants produce different behavior, the behavior is
that of the language variant that -std specifies.
Certain compiler optimizations can violate strict adherence to the ISO standard for
the language. To identify when these violations happen, use the -Wpedantic option.
The following example shows the use of a variable length array, which is a C99 feature. In this
example, the function declares an array i, with variable length <n>.
#include <stdlib.h>
void function(int n) {
int i[n];
}
Arm Compiler for Embedded does not warn when compiling the example for C99 with -Wpedantic:
Arm Compiler for Embedded does warn about variable length arrays when compiling the example
for C90 with -Wpedantic:
• std::declare_reachable()
• std::undeclare_reachable()
• std::declare_no_pointers()
• std::undeclare_no_pointers()
• std::get_pointer_safety()
Additional information
See the Arm Compiler for Embedded Reference Guide for information about Arm-specific language
extensions.
For more information about libc++ support, see Standard C++ library implementation definition, in
the Arm C and C++ Libraries and Floating-Point Support User Guide.
For [COMMUNITY] supported language features, see the Clang Compiler User's Manual.
The LLVM Clang project provides the following additional information about language compatibility:
• Language compatibility:
http://clang.llvm.org/compatibility.html
• Language extensions:
http://clang.llvm.org/docs/LanguageExtensions.html
• C++ implementation status:
http://clang.llvm.org/cxx_status.html
For more information about -fsanitize=undefined support, see -fsanitize, -fno-sanitize, in the Arm
Compiler for Embedded Reference Guide.
Related information
Standard C++ library implementation definition
Arm Compiler for Embedded Reference Guide
-fsized-deallocation, -fno-sized-deallocation
Arm Compiler for Embedded provides various optimization levels to control the different
optimization goals. The best optimization level for your application depends on your application
and optimization goal.
Using a higher optimization level for performance has a greater impact on the other goals, such as
a degraded debug experience, increased code size, and increased build time.
Optimizing for code size reduction also affects the other goals, for example through a degraded
debug experience, slower performance, and increased build time.
armclang provides a range of options to help you find a suitable approach for your requirements.
Consider whether code size reduction or faster performance is the goal that matters most for your
application, and then choose an option that matches your goal.
The creation of vector instructions can be inhibited with the armclang command-line option
-fno-vectorize.
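A loop like the following is a typical candidate for auto-vectorization, where the compiler might use SIMD instructions to process several elements per iteration. Compiling it with and without -fno-vectorize and comparing the -S output is one way to observe the effect. This is a sketch; the function name add_arrays is hypothetical.

```c
#include <stddef.h>

/* Adds two arrays element by element. The iterations are independent,
   so the compiler can vectorize the loop unless -fno-vectorize is given.
   restrict tells the compiler that the arrays do not overlap. */
void add_arrays(int *restrict out, const int *restrict a,
                const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```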
-Os provides code size reduction compared to -O3. It also degrades the debug experience
compared to -O1.
• Optimizations that might increase code size, such as loop unrolling and loop vectorization, are
disabled.
• Loops are generated as while loops instead of do-while loops.
• Outlining is enabled for AArch32 with M-profile and AArch64 targets only. The outliner
searches for identical sequences of code and puts them in a function, then replaces each
instance of the code sequence with calls to this function. Outlining reduces code size, but can
increase execution time. You can override this behavior with the -moutline and -mno-outline options.
If you want to compile at -Omin and use separate compile and link steps, then you must also
include -Omin on your armlink command line.
This level also performs other aggressive optimizations that might violate strict compliance with
language standards.
This level degrades the debug experience, and might result in increased code size compared to -O3.
At this optimization level, Arm Compiler for Embedded might violate strict compliance with
language standards. Use this optimization level for the fastest performance.
This level degrades the debug experience, and might result in increased code size compared to
-Ofast.
If you want to compile at -Omax and have separate compile and link steps, then you must also
include -Omax on your armlink command line.
int test()
{
int x=10, y=20;
int z;
z=x+y;
return 0;
}
The source file contains mostly dead code, such as int x=10 and z=x+y. In the following examples:
• At optimization level -O0, the compiler performs no optimization, and therefore generates code
for the dead code in the source file.
• At optimization level -O1, the compiler does not generate code for the dead code in the source
file.
test:
.fnstart
.pad #12
sub sp, sp, #12
mov r0, #10
str r0, [sp, #8]
mov r0, #20
str r0, [sp, #4]
ldr r0, [sp, #8]
add r0, r0, #20
str r0, [sp]
mov r0, #0
add sp, sp, #12
bx lr
test:
.fnstart
movs r0, #0
bx lr
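If you want the intermediate values in the example to stay in the generated code at -O1 and higher, for example so that you can inspect them in a debugger, one option is to qualify the variables as volatile. The compiler must then perform every access to them, so it cannot remove them as dead code. This is a sketch; the function name test_volatile is hypothetical.

```c
/* Same computation as test(), but volatile requires the compiler to keep
   every store and load, so the code is not removed as dead code. */
int test_volatile(void)
{
    volatile int x = 10, y = 20;
    volatile int z;
    z = x + y;
    return 0;
}
```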
Related information
Optimizing for code size or performance on page 84
Optimizing loops on page 65
Optimizing across modules with Link-Time Optimization on page 86
-O
armclang -gdwarf-3
When linking, there are several armlink options available to help improve the debug view:
• --debug. This option is the default.
• --no_remove to retain all input sections in the final image even if they are unused.
• --bestdebug. When different input objects are compiled with different optimization levels, this
option enables linking for the best debug illusion.
Higher optimization levels perform progressively more optimizations with correspondingly poorer
debug views.
The compiler attempts to automatically inline functions at all optimization levels except -O0.
However, the threshold at which the compiler decides to inline depends on the level. If you must
use optimization levels higher than -O0, disable automatic inlining with the armclang option
-fno-inline-functions. Linker inlining is disabled by default.
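To keep one particular function out of line without disabling inlining for the whole file, GCC-compatible compilers, including armclang, support the noinline function attribute. This is a sketch; the names square and sum_of_squares are hypothetical.

```c
/* __attribute__((noinline)) asks the compiler not to inline this function,
   so it remains a separate function that you can set breakpoints in. */
__attribute__((noinline)) int square(int v)
{
    return v * v;
}

/* Calls to square() remain calls, even at higher optimization levels. */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}
```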
• --emit_debug_overlay_relocs
Avoid using the following features when building an image for debugging:
• Link-Time Optimization. This feature performs aggressive optimizations and can remove large
chunks of code.
• The armlink option --no_debug.
• The armlink option --inline. This option changes the image in such a way that the debug
information might not correspond to the source code.
where:
<options>
are linker command-line options.
<input-file-list>
is a space-separated list of objects, libraries, or symbol definitions (symdefs) files.
For example, to link the object file hello_world.o into an executable image hello_world.axf:
The options implement the scatter-loading mechanism that describes the memory layout for the
image. The options that you use depend on the complexity of your image:
• For simple images, use the following memory map related options:
◦ --ro_base to specify the address of both the load and execution region containing the RO
output section.
◦ --rw_base to specify the address of the execution region containing the RW output section.
◦ --zi_base to specify the address of the execution region containing the ZI output section.
For objects that include eXecute-Only (XO) sections, the linker provides the
--xo_base option to locate the XO sections. These sections are in objects that are
targeted at Armv6-M, Armv7-M, or Armv8-M architectures, or in objects that
are built with the armclang option -mthumb. However, XO is not supported on
Armv6-M for any form of position independent code.
• For complex images, use a text format scatter-loading description file. This file is known as a
scatter file, and you specify it with the --scatter option.
You cannot use the memory map related options with the --scatter option.
Examples
The following example shows how to place code and data using the memory map related options:
In this example, --first is also included to make sure that the initialization routine
is executed first.
The following example shows a scatter file, scatter.scat, that defines an equivalent memory map:
ER_RW 0x400000
{
*(+RW)
}
ER_ZI 0x405000
{
*(+ZI)
}
}
A number of armclang options control the behavior of the linker. These options are translated to
equivalent armlink options.
In addition, the -Xlinker and -Wl options let you pass options directly to the linker from the
compiler command line. These options perform the same function, but use different syntaxes:
• The -Xlinker option specifies a single option, a single argument, or a single option=argument
pair. If you want to pass multiple options, use multiple -Xlinker options.
• The -Wl, option specifies a comma-separated list of options and arguments or
option=argument pairs.
For example, the following are all equivalent because armlink treats the single option
--list=diag.txt and the two options --list diag.txt equivalently:
-Wl,--list,diag.txt,--split
-Wl,--list=diag.txt,--split
The -### compiler option produces diagnostic output showing exactly how the
compiler and linker are invoked, displaying the options for each tool. With the -###
option, armclang only displays this diagnostic output. It does not compile source
files or invoke armlink.
The following example shows how to use the -Xlinker option to pass the --split option to the
linker, splitting the default load region containing the RO and RW output sections into separate
regions:
You can use fromelf --text to compare the differences in image content:
Arm Compiler for Embedded lists all the warnings and errors it encounters during the compiling
and linking process.
<file>
The filename that contains the error or warning.
<line>
The line number that contains the error or warning.
<col>
The column number that generated the message.
<type>
The type of the message, for example error or warning.
<message>
The message text. This text might end with a diagnostic flag of the form -W<flag>, for
example -Wvla-extension, to identify the error or warning. Only the messages that you can
suppress have an associated flag. Errors that you cannot suppress do not have an associated
flag.
The following are common options that control diagnostic output from armclang.
Option Description
-Wgnu Generates warnings about GCC extensions.
See Options to Control Error and Warning Messages in the Clang Compiler User's Manual for full
details about controlling diagnostics with armclang and for possible values for <flag>.
#include <stdlib.h>
#include <stdio.h>
return;
}
By default, armclang checks the format of printf() statements to ensure that the number of %
format specifiers matches the number of data arguments. By default, armclang also compiles for
the gnu11 standard for .c files. This language standard does not allow implicit function declarations.
Therefore, armclang generates the following diagnostic messages:
file.c:9:3: error: call to undeclared function 'call'; ISO C99 and later do not
support implicit function declarations [-Wimplicit-function-declaration]
9 | call(); /* This function has not been declared and is therefore an
implicit declaration */
| ^
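A sketch of a file that avoids these diagnostics declares each function before calling it and supplies one argument for each % format specifier. The function call matches the name in the example above; report is a hypothetical name.

```c
#include <stdio.h>

/* Declaring call() before it is used avoids the
   -Wimplicit-function-declaration error. */
void call(void);

int report(int a, int b)
{
    call();
    /* Two %d specifiers and two int arguments: no format diagnostic. */
    return printf("a=%d b=%d\n", a, b);
}

void call(void)
{
    /* Placeholder body. */
}
```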
Some diagnostic messages are suppressed by default. To see all diagnostic messages, use
-Weverything:
See Controlling Diagnostics via Pragmas in the Clang Compiler User's Manual for full details about
controlling diagnostics with armclang.
The following are some of the common options that control diagnostics:
#pragma clang diagnostic ignored "-W<name>"
Ignores the diagnostic message specified by <name>.
#pragma clang diagnostic warning "-W<name>"
Sets the diagnostic message specified by <name> to warning severity.
#pragma clang diagnostic error "-W<name>"
Sets the diagnostic message specified by <name> to error severity.
#pragma clang diagnostic fatal "-W<name>"
Sets the diagnostic message specified by <name> to fatal error severity.
#pragma clang diagnostic push
Saves the diagnostic state so that it can be restored.
#pragma clang diagnostic pop
Restores the last saved diagnostic state.
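A minimal sketch of the push, ignored, and pop pragmas: the -Wunused-variable warning is suppressed only between push and pop, and the previously saved diagnostic state applies after pop. The function names quiet and loud are hypothetical.

```c
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wunused-variable"
/* -Wunused-variable is ignored here, so the unused 'scratch' is not reported. */
int quiet(void)
{
    int scratch = 0;
    return 1;
}
#pragma clang diagnostic pop

/* The saved state is restored here: an unused variable in this function
   would be reported again if -Wunused-variable is enabled. */
int loud(void)
{
    return 2;
}
```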
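#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wextra-tokens"
#if file1
#endif file1 /* no warning when compiling with -Wextra-tokens */
#pragma clang diagnostic pop
#if file1
#endif file1 /* warning: extra tokens at end of #endif directive */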
The compiler only generates a warning for the second instance of #endif file1:
<type>
One of the following types:
Internal fault
Internal faults indicate an internal problem with the tool. Contact your supplier with
feedback.
Error
Errors indicate problems that cause the tool to stop.
Warning
Warnings indicate unusual conditions that might indicate a problem, but the tool
continues.
Remark
Remarks indicate common, but sometimes unconventional, tool usage. These
diagnostics are not displayed by default. The tool continues.
<prefix>
The tool that generated the message, one of:
• A - armasm
• L - armlink or armar
• Q - fromelf
<id>
A unique numeric message identifier.
<suffix>
The type of message, one of:
• E - Error
• W - Warning
• R - Remark
<message_text>
The text of the message.
Error: L6449E: While processing /home/scratch/a.out: I/O error writing file '/home/
scratch/a.out': Permission denied
All the diagnostic messages that are in this format, and any additional information, are in the Arm
Compiler for Embedded Errors and Warnings Reference Guide.
--diag_style=arm|ide|gnu
Specifies the display style for diagnostic messages.
--diag_suppress=<tag>[,<tag>]...
Suppresses the specified diagnostic messages. Use --diag_suppress=error to suppress all
errors that can be downgraded, or --diag_suppress=warning to suppress all warnings.
Reducing the severity of diagnostic messages might prevent the tool from
reporting important faults. Arm recommends that you do not reduce the
severity of diagnostics unless you understand the impact on your software.
--diag_warning=<tag>[,<tag>]...
Sets the specified diagnostic messages to Warning severity. Use --diag_warning=error to
set all errors that can be downgraded to warnings.
--errors=<filename>
Redirects the output of diagnostic messages to the specified file.
--remarks
armlink only. Enables the display of remark messages (including any messages redesignated
to remark severity using --diag_remark).
<tag> is the four-digit diagnostic number, <nnnn>, with the tool letter prefix, but without the letter
suffix indicating the severity. A full list of tags with the associated suffixes is in the Arm Compiler
for Embedded Errors and Warnings Reference Guide.
AREA ||.text||,CODE
x EQU 42
IF :LNOT: :DEF: sym
ASSERT x == 42
ENDIF
sym EQU 1
;END ; Commented out
"noend.s", line 3 (column 3): Error: A1163E: Unknown opcode x , expecting opcode or
Macro
3 00000000 x EQU 42
^
"noend.s", line 7 (column 3): Error: A1163E: Unknown opcode sym , expecting opcode
or Macro
7 00000000 sym EQU 1
^
"noend.s", line 9: Warning: A1313W: Missing END directive at end of file
9 00000000
Related information
-W (armclang)
The LLVM Compiler Infrastructure Project
Clang Compiler User's Manual
Arm Compiler for Embedded supports floating-point arithmetic by using one of the following:
• Libraries that implement floating-point arithmetic in software.
• Hardware floating-point registers and instructions that are available on most Arm-based
processors.
You can use various options that determine how Arm Compiler for Embedded generates code
for floating-point arithmetic. Depending on your target, you might need to specify one or more
of these options to generate floating-point code that correctly uses floating-point hardware or
software libraries.
To improve performance, the compiler can use floating-point registers instead of the stack. You can
disable this feature with the [COMMUNITY] option -mno-implicit-float.
Avoid specifying both the architecture (-march) and the processor (-mcpu) because
specifying both has the potential to cause a conflict. The compiler infers the correct
architecture from the processor.
• If you want to run code on one particular processor, specify the processor using
-mcpu. Performance is optimized, but code is only guaranteed to run on that
processor. If you specify a value for -mcpu, do not also specify a value for
-march.
The -mfpu option is ignored with AArch64 targets, for example
aarch64-arm-none-eabi. Use the -mcpu option to override the default FPU for
aarch64-arm-none-eabi targets. For example, to prevent the use of floating-point
instructions or floating-point registers for the aarch64-arm-none-eabi target, use the
-mcpu=name+nofp+nosimd option. Subsequent use of floating-point data types in this
mode is unsupported.
• Disabling floating-point arithmetic does not disable all the floating-point hardware because
the floating-point hardware is also used for Advanced Single Instruction Multiple Data (SIMD)
arithmetic. To disable all Advanced SIMD and floating-point hardware, use the +nofp+nosimd
extension on the -mcpu or -march options:
See -march and -mcpu in the Arm Compiler For Embedded Reference Guide for more information.
• On AArch32 targets, using -mfpu=none disables the hardware for both Advanced SIMD and
floating-point arithmetic. You can use -mfpu to selectively enable certain hardware features.
For example, if you want to use the hardware for Advanced SIMD operations on an Armv7
architecture-based processor, but not for floating-point arithmetic, then use -mfpu=neon.
• The Armv8.1-M architecture profile has optional support for the M-profile Vector Extension
(MVE). -march and -mcpu support certain MVE floating-point combinations.
See -march, -mcpu, and -mfpu in the Arm Compiler for Embedded Reference Guide for more information.
Floating-point linkage
Floating-point linkage refers to how the floating-point arguments are passed to and returned from
function calls.
For AArch64, you can use -mabi=<name> to specify the calling convention.
For AArch32, Arm Compiler for Embedded can use hardware linkage or software linkage. When
using software linkage, Arm Compiler for Embedded passes and returns floating-point values in
general-purpose registers. By default, Arm Compiler for Embedded uses software linkage. You can
use the -mfloat-abi option to force hardware linkage or software linkage.
Code with hardware linkage can be faster than the same code with software linkage. However,
code with software linkage can be more portable because it does not require the hardware floating-
point registers. Hardware floating-point is not available on some architectures such as Armv6-M,
or on processors where the floating-point hardware might be powered down for energy efficiency
reasons.
In AArch32 state, if you specify -mfloat-abi=soft, then specifying the -mfpu option
does not have an effect.
See the Arm Compiler for Embedded Reference Guide for more information on the -mfloat-abi option.
All objects to be linked together must have the same type of linkage. If you link
object files that have hardware linkage with object files that have software linkage,
then the image might have unpredictable behavior. When linking objects, specify
the armlink option --fpu=<name> where <name> specifies the correct linkage type
and floating-point hardware. This option enables the linker to provide diagnostic
information if it detects different linkage types.
See the Arm Compiler for Embedded Reference Guide for more information on how the armlink --fpu option specifies the linkage type and floating-point hardware.
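The following minimal sketch illustrates floating-point linkage at the source level, assuming an AArch32 target. The function names scale and fp_linkage are hypothetical. fp_linkage relies on the Arm C Language Extensions (ACLE) macro __ARM_PCS_VFP, which is defined when the hardware (VFP) variant of the procedure call standard is in use; the #else branch only lets the sketch compile on other hosts.

```c
/* Hypothetical example function.
   Software linkage (for example -mfloat-abi=soft or -mfloat-abi=softfp):
   x and k are passed in r0 and r1, and the result is returned in r0.
   Hardware linkage (-mfloat-abi=hard):
   x and k are passed in s0 and s1, and the result is returned in s0. */
float scale(float x, float k)
{
    return x * k;
}

/* Hypothetical helper: reports, at compile time, which floating-point
   linkage this translation unit was compiled for. */
const char *fp_linkage(void)
{
#if defined(__ARM_PCS_VFP)
    return "hardware";
#elif defined(__arm__)
    return "software";
#else
    return "not AArch32";
#endif
}
```

Because the two linkages place arguments in different registers, a caller compiled for one linkage cannot correctly call scale if it was compiled for the other, which is why all linked objects must agree.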
Related information
-mabi=<name> (armclang)
-mcpu (armclang)
-mfloat-abi (armclang)
-mfpu (armclang)
Floating-point support
Keyword options
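All keyword options, including keyword options with arguments, are preceded by a double dash --. An = or space character is required between the option and the argument.
A double dash on its own marks the end of the options, so that a following argument that begins with a dash is treated as an input file name rather than as an option. For example, the following command treats -ifile_1 as a file name:
armlink -- -ifile_1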
In some Unix shells, you might have to put quotes around the arguments to some command-line options, for example when an argument contains characters such as parentheses that the shell would otherwise interpret.
Declaring a variable as volatile ensures that the compiler does not optimize any use of the variable on the assumption that the variable is unused or unmodified.
You can also use volatile to tell the compiler that a block containing inline assembly code has
side-effects that the output, input, and clobber lists do not represent.
Arm® Compiler for Embedded does not guarantee that a single-copy atomic
instruction is used to access a volatile variable that is larger than the natural
architecture data size, even when one is available for the target processor. For
more information, see Volatile variables and Atomicity in the Arm architecture in the
following documents:
• Arm Architecture Reference Manual for A-profile architecture.
• ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition.
• A variable might be used to implement a sleep or timer delay. If the variable appears unused,
the compiler might remove the timer delay code, unless the variable is declared as volatile.
• In C++, an interrupt function might be defined in class scope but be called asynchronously by hardware. The member variable buffer_full is modified by the interrupt function. Although it is in class scope, it must still be declared as volatile, for example:
class myclass
{
public:
    int check_stream();
    void async_interrupt();
private:
    volatile bool buffer_full; // volatile: async_interrupt() can change it at any time
};
int myclass::check_stream()
{
    int count = 0;
    while (!buffer_full)
    {
        count++;
    }
    return count;
}
void myclass::async_interrupt()
{
    buffer_full = !buffer_full;
}
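The timer-delay case from the first bullet can be sketched in C as follows. This is a minimal illustration with a hypothetical function name, delay_loop; a busy-wait loop like this is only a crude delay whose duration depends on the processor and its clock speed.

```c
/* Hypothetical busy-wait delay. Because i is volatile, every read and
   write of i must take place, so the compiler cannot delete the loop even
   though it computes nothing that the rest of the program uses. */
unsigned int delay_loop(unsigned int iterations)
{
    volatile unsigned int i;
    for (i = 0; i < iterations; i++)
    {
        /* intentionally empty */
    }
    return i;
}
```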
In practice:
• We recommend that you declare the variables that you use to access memory-mapped
peripherals as volatile. Even with the minimum optimization level -O0, there is no guarantee
that a non-volatile variable is not going to be optimized.
• volatile is not a means of inter-thread communication or synchronization, and atomics must
be used for this purpose instead. That is:
◦ The _Atomic qualifier and <stdatomic.h> functions in C.
◦ The <atomic> library functions and templates in C++.
• Interrupt and signal handlers must use either atomics or variables of the type volatile
sig_atomic_t, but not arbitrary volatile-qualified types, to synchronize with other threads
of execution.
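The following minimal sketch, with hypothetical names, shows both recommendations in C: a volatile sig_atomic_t flag that a signal handler writes and the main program reads, and an _Atomic counter (atomic_int from <stdatomic.h>) for data that several threads update.

```c
#include <signal.h>
#include <stdatomic.h>

/* Hypothetical flag: written by the signal handler, read by the main program. */
static volatile sig_atomic_t stop_requested = 0;

/* Hypothetical counter shared between threads: an atomic object, not merely
   a volatile one. Static storage, so it starts at zero. */
static atomic_int packets_processed;

static void on_sigint(int sig)
{
    (void)sig;
    stop_requested = 1;
}

/* Installs the handler and raises SIGINT. raise() does not return until the
   handler has run, so the flag is already set when it is read here. */
int demo_signal_flag(void)
{
    if (signal(SIGINT, on_sigint) == SIG_ERR)
        return -1;
    raise(SIGINT);
    return (int)stop_requested;
}

/* Safe to call concurrently from several threads. */
void count_packet(void)
{
    atomic_fetch_add(&packets_processed, 1);
}

int packets_seen(void)
{
    return atomic_load(&packets_processed);
}
```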
If you are writing code that must access the AXI port, or any other memory-mapped location
that requires a particular access strategy, then declaring the location as a volatile variable is
not enough. You must also access the register with an __asm__ statement that contains the load or store instructions you need.
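As a minimal sketch of this approach, assuming a 32-bit memory-mapped register that must be read with a single LDR instruction, the following function performs the load in an __asm__ statement on Arm targets. The function name read_reg32 is hypothetical. The plain volatile read in the #else branch exists only so that the sketch also compiles on non-Arm hosts, and it does not guarantee any particular access.

```c
#include <stdint.h>

/* Hypothetical helper: reads a 32-bit register with exactly one LDR. */
static inline uint32_t read_reg32(const volatile uint32_t *addr)
{
    uint32_t value;
#if defined(__aarch64__)
    /* %w0 selects the 32-bit W view of the destination register. */
    __asm__ volatile("ldr %w0, [%1]" : "=r"(value) : "r"(addr) : "memory");
#elif defined(__arm__)
    __asm__ volatile("ldr %0, [%1]" : "=r"(value) : "r"(addr) : "memory");
#else
    value = *addr; /* host fallback only */
#endif
    return value;
}
```

The "memory" clobber stops the compiler from moving other memory accesses across the load.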
The following example shows a buffer loop:
int buffer_full;
int read_stream(void)
{
    int count = 0;
    while (!buffer_full)
    {
        count++;
    }
    return count;
}
The routine increments a counter in a loop until a status flag buffer_full is set to true. The state
of buffer_full can change asynchronously with program flow.
This example does not declare the variable buffer_full as volatile and is therefore wrong.
The disassembly in read_stream.s for the nonvolatile version of the buffer loop contains:
read_stream:
movw r0, :lower16:buffer_full
movt r0, :upper16:buffer_full
ldr r1, [r0]
mvn r0, #0
.LBB0_1:
add r0, r0, #1
cmp r1, #0
beq .LBB0_1 ; infinite loop
bx lr
In the disassembly of the nonvolatile example, the statement LDR r1, [r0] loads the value of buffer_full into register r1 outside the loop labeled .LBB0_1. Because buffer_full is not declared as volatile, the compiler assumes that its value cannot be modified outside the program. Having already read the value of buffer_full into r1, the compiler omits reloading the variable when optimizations are enabled, because its value cannot change. The result is that, if buffer_full is false when the loop is entered, the loop labeled .LBB0_1 never terminates.
The volatile version of the routine is identical, except that it declares the flag as volatile int buffer_full. As before, the state of buffer_full can change asynchronously with program flow.
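A minimal C sketch of the volatile version follows. Only the declaration of buffer_full differs from the nonvolatile routine; the code that sets the flag, for example an interrupt handler, is not shown.

```c
/* volatile: every loop iteration must reload buffer_full from memory,
   because another agent, such as an interrupt handler, can change it. */
volatile int buffer_full;

int read_stream(void)
{
    int count = 0;
    while (!buffer_full)
    {
        count++;
    }
    return count;
}
```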
The disassembly in read_stream.s for the volatile version of the buffer loop contains:
read_stream:
movw r1, :lower16:buffer_full
mvn r0, #0
movt r1, :upper16:buffer_full
.LBB1_1:
ldr r2, [r1] ; buffer_full
add r0, r0, #1
cmp r2, #0
beq .LBB1_1
bx lr
In the disassembly of the volatile example, the compiler assumes that the value of buffer_full can change outside the program, so it does not optimize away the load. The value of buffer_full is therefore loaded into register r2 inside the loop labeled .LBB1_1, and the assembly code that is generated for loop .LBB1_1 is correct.
Related information
Floating-point division by zero errors in C and C++ code on page 270
Volatile variables
armclang inline assembler
Loop unrolling
Each iteration of a loop has an overhead for checking the loop condition. You can reduce the impact of this overhead by unrolling some of the iterations, which in turn reduces the number of times the condition is checked. Use #pragma unroll (<n>) to unroll time-critical loops in your source code. However, unrolling loops has the disadvantage of increasing the code size. These pragmas are only effective at optimization levels -O2, -O3, -Ofast, and -Omax.
Manually unrolling loops in source code might hinder the automatic rerolling of
loops and other loop optimizations by the compiler. Arm recommends that you
use #pragma unroll instead of manually unrolling loops. See #pragma unroll[(n)],
#pragma unroll_completely in the Arm Compiler for Embedded Reference Guide for
more information.
The following examples show code with loop unrolling and code without loop unrolling:
Bit counting loop without unrolling
Create the file file.c containing:
int countSetBits1(unsigned int n)
{
    int bits = 0;
    while (n != 0)
    {
        if (n & 1) bits++;
        n >>= 1;
    }
    return bits;
}
Compiling file.c to assembly code with the armclang -S option produces a loop that is not unrolled:
countSetBits1:
...
cmp r0, #0
moveq r0, #0
bxeq lr
.LBB0_1:
mov r1, r0
mov r0, #0
.LBB0_2: @ =>This Inner Loop Header: Depth=1
and r2, r1, #1
lsrs r1, r1, #1
add r0, r0, r2
bne .LBB0_2
@ %bb.3:
bx lr
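The unrolled variant can be requested with #pragma unroll. The following minimal sketch assumes an unroll factor of 4 and a hypothetical function name, countSetBits2. The pragma takes effect only at -O2 and above, and a compiler that does not recognize it ignores it.

```c
/* Counts the set bits in n. The pragma asks the compiler to unroll four
   iterations of the loop; the result is the same as without it. */
int countSetBits2(unsigned int n)
{
    int bits = 0;
#pragma unroll(4)
    while (n != 0)
    {
        if (n & 1) bits++;
        n >>= 1;
    }
    return bits;
}
```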
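Adding #pragma unroll (4) immediately before the while loop and compiling the file again produces a loop in which four iterations are unrolled: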
countSetBits1:
...
cmp r0, #0
moveq r0, #0
bxeq lr
.LBB0_1:
mov r1, r0
mov r0, #0
b .LBB0_3
.LBB0_2: @ in Loop: Header=BB0_3 Depth=1
and r2, r2, #1
lsrs r1, r1, #4
add r0, r0, r2
bxeq lr
.LBB0_3: @ =>This Inner Loop Header: Depth=1
and r2, r1, #1
add r0, r0, r2
lsrs r2, r1, #1
beq .LBB0_5
@ %bb.4: @ in Loop: Header=BB0_3 Depth=1
and r2, r2, #1
add r0, r0, r2
lsrs r2, r1, #2
...
Arm® Compiler for Embedded can unroll loops completely only if the number of iterations is
known at compile time.
Loop vectorization
If your target has the Advanced Single Instruction Multiple Data (SIMD) unit, then Arm Compiler
for Embedded can use the vectorizing engine to optimize vectorizable sections of the code. At
optimization level -O1, you can enable vectorization using -fvectorize. At higher optimizations, -fvectorize is enabled by default and you can disable it using -fno-vectorize. See -fvectorize, -fno-vectorize in the Arm Compiler for Embedded Reference Guide for more information. When using
-fvectorize with -O1, vectorization might be inhibited in the absence of other optimizations which
might be present at -O2 or higher.
As an implementation becomes more complicated, the likelihood that the compiler can auto-
vectorize the code decreases. For example, loops with the following characteristics are particularly
difficult, or impossible, to vectorize:
• Loops with interdependencies between different loop iterations.
• Loops with break statements.
• Loops with complex conditions.
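Two of these patterns are shown in the following minimal sketch, in which the function names are hypothetical. In prefix_sum, each iteration needs the result of the previous one. In find_first, the break makes the number of iterations depend on the data. Either pattern can prevent or complicate auto-vectorization, although the exact outcome depends on the compiler version and options.

```c
/* Loop-carried dependency: out[i] depends on out[i - 1]. */
void prefix_sum(int *out, const int *in, int n)
{
    if (n <= 0)
        return;
    out[0] = in[0];
    for (int i = 1; i < n; i++)
        out[i] = out[i - 1] + in[i];
}

/* Early exit: the number of iterations depends on the data. */
int find_first(const int *a, int n, int key)
{
    int i;
    for (i = 0; i < n; i++)
    {
        if (a[i] == key)
            break;
    }
    return i; /* n if key is not found */
}
```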
The following examples show a loop that Advanced SIMD can vectorize, and a loop that cannot be
vectorized easily:
Vectorizable by Advanced SIMD
Copy the following into the file vectorize.c:
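A minimal sketch of such a file follows. The function name DoubleBuffer1 and the array name buffer come from the disassembly that follows; the array length of 24 and the doubling operation are assumptions inferred from its six groups of four 32-bit lanes, each shifted left by one.

```c
#define NUM 24 /* assumed array length */

int buffer[NUM];

/* Each iteration is independent of the others, so the compiler can process
   several elements at once with Advanced SIMD instructions. */
void DoubleBuffer1(void)
{
    for (int i = 0; i < NUM; i++)
        buffer[i] = buffer[i] * 2;
}
```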
In vectorize.s, the vectorized assembly code contains the Advanced SIMD instructions, for
example vld1, vshl, and vst1:
DoubleBuffer1:
...
movw r0, :lower16:buffer
movt r0, :upper16:buffer
vld1.64 {d16, d17}, [r0:128]
vshl.i32 q8, q8, #1
vst1.32 {d16, d17}, [r0:128]!
vld1.64 {d16, d17}, [r0:128]
vshl.i32 q8, q8, #1
vst1.32 {d16, d17}, [r0:128]!
vld1.64 {d16, d17}, [r0:128]
vshl.i32 q8, q8, #1
vst1.32 {d16, d17}, [r0:128]!
vld1.64 {d16, d17}, [r0:128]
vshl.i32 q8, q8, #1
vst1.32 {d16, d17}, [r0:128]!
vld1.64 {d16, d17}, [r0:128]
vshl.i32 q8, q8, #1
vst1.32 {d16, d17}, [r0:128]!
vld1.64 {d16, d17}, [r0:128]
vshl.i32 q8, q8, #1
vst1.64 {d16, d17}, [r0:128]
bx lr
nonvectorize.s shows that the Advanced SIMD instructions are not generated when
compiling the example with the non-vectorizable loop:
DoubleBuffer2:
...
movw r12, :lower16:buffer
movt r12, :upper16:buffer
ldm r12, {r1, r2, r3}
lsl r0, r3, #1
cmp r3, #32
lsl r2, r2, #1
str r0, [r12, #8]
lsl r1, r1, #1
stm r12, {r1, r2}
bgt .LBB0_8
@ %bb.1:
add r2, r12, #12
ldm r2, {r0, r1, r2}
lsl r3, r2, #1
lsl r1, r1, #1
cmp r2, #32
str r1, [r12, #16]
lsl r0, r0, #1
str r3, [r12, #20]
str r0, [r12, #12]
bgt .LBB0_8
...
add r2, r12, #72
ldm r2, {r0, r1, r2}
lsl r3, r2, #1
lsl r1, r1, #1
cmp r2, #32
str r1, [r12, #76]
lsl r0, r0, #1
str r3, [r12, #80]
str r0, [r12, #72]
bxgt lr
.LBB0_7:
add r2, r12, #84
add r3, r12, #84
ldm r2, {r0, r1, r2}
lsl r1, r1, #1
lsl r2, r2, #1
lsl r0, r0, #1
stm r3, {r0, r1, r2}
.LBB0_8:
bx lr
Using -fno-vectorize does not necessarily prevent the compiler from emitting Advanced SIMD
instructions. The compiler or linker might still introduce Advanced SIMD instructions, such as when
linking libraries that contain these instructions.
To prevent the compiler from emitting Advanced SIMD instructions for AArch64 targets, specify +nosimd using -march or -mcpu, for example -march=armv8-a+nosimd.
To prevent the compiler from emitting Advanced SIMD instructions for AArch32 targets, set the option -mfpu to a value that does not include Advanced SIMD, for example -mfpu=fp-armv8.
Writing loops with simple termination conditions, for example loops that count down to zero, is likely to result in better code.
The following sample implementations of a routine to calculate n! together show the loop
termination overhead. The first implementation calculates n! using an incrementing loop, while the
second routine calculates n! using a decrementing loop.
C code for incrementing loops, increment.c
int fact1(int n)
{
    int i, fact = 1;
    for (i = 1; i <= n; i++)
        fact *= i;
    return (fact);
}
The disassembly of fact1 contains:
fact1:
...
cmp r0, #1
itt lt
movlt r0, #1
bxlt lr
.LBB0_1:
mov r1, r0
movs r0, #1
movs r2, #0
.p2align 2
.LBB0_2: @ =>This Inner Loop Header: Depth=1
adds r2, #1
cmp r1, r2
mul r0, r2, r0
bne .LBB0_2
@ %bb.3:
bx lr
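A decrementing routine consistent with the fact2 listing that follows (a CBZ test for zero, then a loop that multiplies and decrements the counter) might look like this minimal sketch. The unsigned counter is an assumption, so the sketch expects n to be zero or positive.

```c
/* Decrementing loop: the counter runs from n down to zero, so the loop test
   is a comparison against zero, which a flag-setting SUBS can provide.
   Assumes n >= 0. */
int fact2(int n)
{
    unsigned int i, fact = 1;
    for (i = n; i != 0; i--)
        fact *= i;
    return (int)fact;
}
```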
fact2:
...
cbz r0, .LBB0_4
mov r1, r0
movs r0, #1
.p2align 2
.LBB0_2: @ =>This Inner Loop Header: Depth=1
muls r0, r1, r0
subs r1, #1
bne .LBB0_2
bx lr
.LBB0_4:
movs r0, #1
bx lr
Comparing the disassemblies shows that the ADDS and CMP instruction pair in the incrementing loop disassembly has been replaced with a single SUBS instruction in the decrementing loop disassembly. Because the SUBS instruction updates the status flags, including the Z flag, there is no need for an explicit CMP r1, r2 instruction.
Also, the variable n does not have to be available for the lifetime of the loop, reducing the number
of registers that have to be maintained. Having fewer registers to maintain eases register allocation.
If the original termination condition involves a function call, each iteration of the loop might call the function, even if the value it returns remains constant. In this case, counting down to zero is even more important, because the function is then called only once, to initialize the loop counter.
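In the following minimal sketch, get_limit is a hypothetical function that always returns the same value and counts how often it is called. The incrementing loop evaluates it each time the termination condition is tested, while the decrementing loop evaluates it only once, to initialize the counter. Both loops compute the same sum.

```c
static unsigned int get_limit_calls = 0;

/* Hypothetical: always returns 10, but each call has a side effect, so the
   compiler cannot evaluate it just once for the incrementing loop. */
static unsigned int get_limit(void)
{
    get_limit_calls++;
    return 10;
}

/* get_limit() is called before every iteration and once more to exit. */
unsigned int sum_up(void)
{
    unsigned int i, sum = 0;
    for (i = 0; i < get_limit(); i++)
        sum += i;
    return sum;
}

/* get_limit() is called once; the loop then counts down to zero. */
unsigned int sum_down(void)
{
    unsigned int i, sum = 0;
    for (i = get_limit(); i != 0; i--)
        sum += i - 1;
    return sum;
}
```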
The technique of initializing the loop counter to the number of iterations that are required, and
then decrementing down to zero, also applies to while and do statements.
Infinite loops
armclang considers infinite loops with no side-effects to be undefined behavior, as stated in the
C11 and C++11 standards. In certain situations armclang deletes or moves infinite loops that have
no side-effects, resulting in a program that eventually terminates, or does not behave as expected.
To ensure that a loop executes for an infinite length of time, Arm recommends writing infinite loops
containing an __asm volatile statement. The volatile keyword tells the compiler to consider
that the loop has potential side effects, and therefore prevents the loop from being removed by
optimization. It is also good practice to put the processor into a low-power state in such a loop until an event or interrupt occurs. The following example shows an infinite loop that contains an __asm volatile statement with an instruction that puts the processor into a low-power state until an event occurs:
void infinite_loop(void)
{
    while (1)
    {
        __asm volatile("wfe");
    }
}
The volatile keyword tells armclang not to delete or move the loop. The compiler considers the
loop to have side-effects, and so it is not removed during optimization.
The WFE (Wait for Event) assembler instruction gives a hint to the processor. Writing your loop this
way allows processors that implement the WFE instruction to enter a low power state until an event
or interrupt occurs, so the loop does not consume power unnecessarily. You could also use WFI (Wait for Interrupt) instead, which allows processors that implement WFI to wake on an interrupt signal rather than an event signal.
For more details on WFE and WFI, see the relevant Instruction Set Architecture document for the
processor you are compiling for.
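The same technique applies to a loop that waits for an interrupt handler to set a flag. In this minimal sketch, data_ready and the handler that sets it are hypothetical, and the preprocessor guard only lets the sketch compile on non-Arm hosts. A production version also needs to consider an interrupt that arrives after the flag is tested but before WFI executes; this sketch does not handle that case.

```c
/* Hypothetical flag, set to a nonzero value by an interrupt handler. */
volatile int data_ready = 0;

/* Waits in a low-power state until data_ready becomes nonzero. Reading the
   volatile flag is itself a side effect, so the loop is not removed. */
void wait_for_data(void)
{
    while (!data_ready)
    {
#if defined(__arm__) || defined(__aarch64__)
        __asm volatile("wfi"); /* sleep until an interrupt occurs */
#endif
    }
}
```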
Related information
Effect of the volatile keyword on compiler optimization on page 61
-O (armclang)
-S (armclang)
pragma unroll
-fvectorize (armclang)
• Arm Compiler for Embedded only inlines functions within the same compilation
unit, unless you use Link-Time Optimization. For more information, see
Optimizing across modules with Link-Time Optimization.
• C++ and C99 provide the inline language keyword. The effect of this inline
language keyword is identical to the effect of using the __inline__ compiler
keyword. However, the effect in C99 mode is different from the effect in C++
or other C that does not adhere to the C99 standard. For more information, see
Inline functions in the Arm Compiler for Embedded Reference Guide.
• Function inlining normally happens at higher optimization levels, such as -O2,
except when you specify __attribute__((always_inline)).
int bar(int a)
{
    a=a*(a+1);
    return a;
}
{
    i=bar(i);
    i=i-2;
    i=bar(i);
    i++;
    i=row(i);
    i++;
    return i;
}
In the example code, functions bar and row are identical but function row is always inlined. Compile the example at -O2 twice, once with -fno-inline-functions and once without it, and compare the generated code.
When compiling with -fno-inline-functions, the compiler does not inline the function bar. When
compiling without -fno-inline-functions, the compiler inlines the function bar. However, the
compiler always inlines the function row even though it is identical to function bar.
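The following self-contained sketch reproduces the example. The body of bar is taken from the listing above. row is assumed to be an identical function declared static inline with __attribute__((always_inline)), and the caller name foo is hypothetical.

```c
/* May or may not be inlined, depending on the optimization options. */
int bar(int a)
{
    a = a * (a + 1);
    return a;
}

/* Identical to bar, but always inlined, even with -fno-inline-functions. */
static inline __attribute__((always_inline)) int row(int a)
{
    a = a * (a + 1);
    return a;
}

/* Hypothetical caller. */
int foo(int i)
{
    i = bar(i);
    i = i - 2;
    i = bar(i);
    i++;
    i = row(i);
    i++;
    return i;
}
```

Inlining does not change the result: foo(2) evaluates bar(2) = 6, then bar(4) = 20, then row(21) = 462, and returns 463 with or without -fno-inline-functions.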
Related information
-fno-inline-functions (armclang)
__inline keyword
__attribute__((always_inline)) function attribute
__attribute__((noinline)) function attribute
• Several optimizations can introduce new temporary variables to hold intermediate results. The optimizations include common subexpression elimination (CSE), live range splitting, and structure splitting. The compiler tries to allocate these temporary variables to registers. If it cannot, it spills them to the stack. For more information about what these optimizations do, see Overview of optimizations.
• Generally, code that is compiled for processors that only support 16-bit encoded T32
instructions makes more use of the stack than A64 code, A32 code, and code that is compiled
for processors that support 32-bit encoded T32 instructions. This is because 16-bit encoded
T32 instructions have only eight registers available for allocation, compared to fourteen for A32
code and 32-bit encoded T32 instructions.
• The AAPCS and AAPCS64 require that some function arguments are passed through the stack
instead of the registers, depending on their type, size, and order.
Processors for embedded applications have limited memory and therefore the amount of space
available on the stack is also limited. You can use Arm® Compiler for Embedded to determine how
much stack space is used by the functions in your application code. The amount of stack that a
function uses depends on factors such as the number and type of arguments to the function, local
variables in the function, and the optimizations that the compiler performs.
The result of the calculation shows how the size of the stack has grown, in bytes.
• Use a Fixed Virtual Platform (FVP) that corresponds to the target processor or architecture. With
a map file, define a region of memory directly below your stack where access is forbidden. If
the stack overflows into the forbidden region, a data abort occurs, which a debugger can trap.
To examine the stack usage in your application, use the linker option --info=stack. The following
example code shows functions with different numbers of arguments:
int fact(int n)
{
    int f = 1;
    while (n>0)
    {
        f *= n--;
    }
    return f;
}
Copy the code example to file.c and compile it with the -g option. Compiling with -g generates the DWARF frame information that armlink requires for estimating the stack use. Then run armlink on the object file with the --info=stack option.
For the example code, armlink shows the amount of stack that the various functions use. Function
foo_mor has more arguments than function foo, and therefore uses more stack.
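A hypothetical completion of file.c that uses the function names from the call graph is sketched here; main is omitted. The argument lists of foo and foo_mor are assumptions, so the stack sizes armlink reports for this sketch might differ from the figures shown. The point is that foo_mor takes more arguments than fit in the AArch32 argument registers r0-r3.

```c
int fact(int n)
{
    int f = 1;
    while (n > 0)
    {
        f *= n--;
    }
    return f;
}

/* Hypothetical: two arguments, both passed in registers. */
int foo(int a, int b)
{
    return fact(a) + fact(b);
}

/* Hypothetical: six arguments. On AArch32, the AAPCS passes a-d in r0-r3
   and e and f on the stack. On AArch64, all six fit in w0-w5. */
int foo_mor(int a, int b, int c, int d, int e, int f)
{
    return fact(a) + fact(b) + fact(c) + fact(d) + fact(e) + fact(f);
}
```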
You can also examine stack usage with the linker option --callgraph. Linking with --callgraph outputs a file called FileImage.htm, which contains the stack usage information for the various functions in the application, for example:
[Stack]
Max Depth = 12
Call Chain = fact
[Called By]
>> foo_mor
>> foo
foo (ARM, 36 bytes, Stack size 8 bytes, file.o(.text))
[Stack]
Max Depth = 20
Call Chain = foo >> fact
[Calls]
>> fact
[Called By]
>> main
foo_mor (ARM, 76 bytes, Stack size 16 bytes, file.o(.text))
[Stack]
Max Depth = 28
Call Chain = foo_mor >> fact
[Calls]
>> fact
[Called By]
>> main
main (ARM, 76 bytes, Stack size 8 bytes, file.o(.text))
[Stack]
Max Depth = 36
Call Chain = main >> foo_mor >> fact
[Calls]
>> foo_mor
>> foo
[Called By]
>> __rt_entry_main (via BLX)
If individual data members in a structure are not packed, the compiler can add padding within the
structure for faster access to individual members, based on the natural alignment of each member.
Arm® Compiler for Embedded provides a pragma and attribute to pack the members in a structure
or union without any padding.
When using #pragma pack(n), the alignment of the structure is the alignment of the largest
member after applying #pragma pack(n) to the structure.
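The following minimal sketch checks these rules with sizeof and the C11 _Alignof operator. It assumes 1-, 2-, and 4-byte char, short, and int types with natural alignment, as on Arm targets. The type names are hypothetical variants of the struct stc used in the examples.

```c
/* Default layout: padding after one and after three. */
struct stc_default
{
    char one;   /* offset 0, then 1 byte of padding */
    short two;  /* offset 2 */
    char three; /* offset 4, then 3 bytes of padding */
    int four;   /* offset 8 */
};              /* size 12, alignment 4 */

#pragma pack(push, 2)
/* Members aligned to at most 2 bytes. */
struct stc_pack2
{
    char one;   /* offset 0, then 1 byte of padding */
    short two;  /* offset 2 */
    char three; /* offset 4, then 1 byte of padding */
    int four;   /* offset 6 */
};              /* size 10, alignment 2 */
#pragma pack(pop)

/* Only four is packed, so the largest unpacked member is short. */
struct stc_member_packed
{
    char one;                         /* offset 0, then 1 byte of padding */
    short two;                        /* offset 2 */
    char three;                       /* offset 4 */
    int four __attribute__((packed)); /* offset 5 */
};                                    /* 9 bytes + 1 byte tail padding: size 10, alignment 2 */
```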
Each example declares two objects c and d. Copy each example into file.c and compile it.
For each example use linker option --info=sizes to examine the memory used in file.o.
The linker output shows the total memory used by the two objects c and d. For example:
      Code (inc. data)   RO Data    RW Data    ZI Data      Debug   Object Name
        36          0          0          0         24          0   str.o
    ---------------------------------------------------------------------------
        36          0         16          0         24          0   Object Totals
struct stc
{
    char one;
    short two;
    char three;
    int four;
} c,d;
[Figure: memory layout of struct stc, showing the members and the padding between them]
    char one;
    short two;
    char three;
    int four;
} c,d;
[Figure: memory layout of the structure members and padding]
The alignment of the 10-byte structure is 2 bytes because the largest member without
__attribute__((packed)) is short:
struct stc
{
    char one;
    short two;
    char three;
    int __attribute__((packed)) four;
} c,d;
[Figure: memory layout of the structure, including the packed int member four and the trailing padding]
If you take the address of a packed member, in most cases, the compiler generates a
warning.
char x;
short y;
};
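Accessing a packed member by name is safe, because the compiler knows that the member might be misaligned. The risk comes from taking its address: the resulting pointer has an ordinary type, so code that dereferences it assumes natural alignment. A common workaround, shown in this minimal sketch with hypothetical names, is to copy the member into an aligned local variable and pass the address of the copy instead.

```c
#include <stdint.h>

/* Hypothetical packed record: value is at offset 1, not naturally aligned. */
struct __attribute__((packed)) record
{
    char tag;
    int32_t value;
};

/* Like most code, expects a naturally aligned pointer. */
static int32_t twice(const int32_t *p)
{
    return *p * 2;
}

int32_t record_twice(const struct record *r)
{
    /* twice(&r->value) would pass a possibly misaligned pointer; clang and
       GCC warn about this (-Waddress-of-packed-member). */
    int32_t copy = r->value; /* the compiler generates a safe access here */
    return twice(&copy);
}
```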
Related information
pragma pack
__attribute__((packed)) type attribute
__attribute__((packed)) variable attribute
Different optimizations often work against each other. That is, techniques for improving code
performance might result in increased code size, and techniques for reducing code size might
reduce performance. For example, the compiler can unroll small loops for higher performance, with
the disadvantage of increased code size.
The default optimization level is -O0. At -O0, armclang turns off most optimizations.
The following armclang options help you optimize for code performance:
-O1, -O2, or -O3
Specify the level of optimization to be used when compiling source files. A higher number
implies a higher level of optimization for performance.
-Ofast
Enables all the optimizations from -O3 together with other aggressive optimizations that
might violate strict compliance with language standards.
-Omax
Enables all the optimizations from -Ofast together with Link-Time Optimization (LTO).
The following armclang options help you optimize for code size:
-Os
Performs optimizations to reduce the code size at the expense of a possible increase in
execution time. This option aims for a balanced code size reduction and fast performance.
-Oz
Optimizes for smaller code size.
-Omin
Minimum image size. Specifically targets minimizing code size. Enables all the optimizations
from level -Oz, together with:
• LTO aimed at removing unused code and data, while also trying to optimize global
memory accesses.
• Virtual function elimination, which is a particular benefit to C++ users.
You can also set the optimization level for the linker with the armlink option
--lto_level. The optimization levels available for armlink are the same as the
armclang optimization levels.
-fshort-enums
Allows the compiler to set the size of an enumeration type to the smallest data type that can
hold all enumerator values.
-fshort-wchar
Sets the size of wchar_t to 2 bytes.
-fno-exceptions
C++ only. Disables the generation of code that is required to support C++ exceptions.
-fno-rtti
C++ only. Disables the generation of code that is required to support Run-Time Type
Information (RTTI) features.
-mthumb
In AArch32 state, A- and R-profile processors support both the A32 instruction set (formerly
ARM), and the T32 instruction set (formerly Thumb®).
T32 offers significant code size improvements compared to A32, with comparable
performance. Therefore, if you are compiling for AArch32 state for a target that supports
both A32 and T32 instructions, consider compiling with -mthumb to reduce the size of your
code.
The following armclang option helps you optimize for both code size and code performance:
-flto
Enables LTO, which enables the linker to make additional optimizations across multiple source
files. See Optimizing across modules with Link-Time Optimization for more information.
If you want to use LTO when invoking armlink separately, you can use the
armlink option --lto_level to select the LTO optimization level that matches
your optimization goal.
Also, choices you make during coding can affect optimization. For example:
• Optimizing loop termination conditions can improve both code size and performance. In
particular, loops with counters that decrement to zero usually produce smaller, faster code than
loops with incrementing counters.
• Manually unrolling loops by reducing the number of loop iterations, but increasing the amount
of work that is done in each iteration, can improve performance at the expense of code size.
• Reducing debug information in objects and libraries reduces the size of your image.
• Using inline functions offers a trade-off between code size and performance.
• Using intrinsics can improve performance.
The number and type of function arguments also affect code size and performance. For example:
• In AArch64 state, 8 integer and 8 floating-point arguments (16 in total) can be passed
efficiently. In AArch32 state, ensure that functions take four or fewer arguments if each
argument is a word or less in size.
• In C++, ensure that nonstatic member functions take fewer arguments than the efficient limit,
because in AArch32 state the implicit this pointer argument is usually passed in R0.
• Ensure that a function does a significant amount of work if it requires more than the efficient
limit of arguments. The work that the function does then outweighs the cost of passing the
stacked arguments.
• Put related arguments in a structure, and pass a pointer to the structure in any function call.
Pointing to a structure reduces the number of parameters and increases readability.
• For AArch32 state, minimize the number of long long parameters, because these use two
argument registers that have to be aligned on an even register index.
• For AArch32 state, minimize the number of double parameters when using software floating-
point.
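As a hedged sketch of the structure-pointer guideline, the hypothetical functions below compute the same value. In AArch32 state, offset_args() takes six arguments, so only the first four fit in R0-R3 and the remaining two are passed on the stack. offset_struct() takes a single pointer, which fits in R0. The structure and function names are illustrative only.

```c
#include <stdint.h>

/* Hypothetical grouping of six related parameters. */
typedef struct
{
    int32_t x, y;          /* origin */
    int32_t width, height; /* size */
    int32_t stride;        /* bytes per row */
    uint8_t bpp;           /* bytes per pixel */
} rect_params;

/* Six separate arguments: in AArch32 state, stride and bpp go on the stack. */
int32_t offset_args(int32_t x, int32_t y, int32_t width, int32_t height,
                    int32_t stride, uint8_t bpp)
{
    (void)width;
    (void)height;
    return y * stride + x * bpp;
}

/* One pointer argument: passed in R0 in AArch32 state. */
int32_t offset_struct(const rect_params *p)
{
    return p->y * p->stride + p->x * p->bpp;
}
```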
By default, the compiler optimizes each source module independently, translating C or C++ source
code into an ELF file containing object code. At link time, the linker combines all the ELF object
files into an executable by resolving symbol references and relocations. Compiling each source file
separately means that the compiler might miss some optimization opportunities, such as cross-
module inlining.
When Link-Time Optimization (LTO) is enabled, the compiler translates source code into an
intermediate form called LLVM bitcode. At link time, the linker collects all files containing bitcode
together and sends them to the link-time optimizer, libLTO. libLTO is provided as a library:
• libLTO.so on Linux.
• LTO.dll on Windows.
Collecting modules together means that the link-time optimizer can perform more optimizations
because it has more information about the dependencies between modules. The link-time
optimizer then sends a single ELF object file back to the linker. Finally, the linker combines all object
and library code to create an executable.
Figure: LTO build flow. C/C++ source (.c) → armclang -flto → ELF object containing bitcode (.o); armar → static library containing bitcode ELF objects.
In this figure, ELF Object containing Bitcode is an ELF file that does not contain
normal code and data. Instead, it contains a section that is called .llvm.lto that
holds LLVM bitcode. In Arm® Compiler for Embedded versions earlier than 6.21, the
section is called .llvmbc.
Sections .llvm.lto and .llvmbc are reserved. You must not create a .llvm.lto
or .llvmbc section with __attribute__((section("<name>"))), for example,
__attribute__((section(".llvmbc"))).
Procedure
1. At compilation time, use the armclang option -flto to produce ELF files suitable for LTO. These
ELF files contain bitcode in a .llvm.lto section.
The armclang options -Omax and -Omin automatically enable the -flto option.
2. At link time, use the armlink option --lto to enable LTO for the specified bitcode files.
If you use the -flto option without the -c option, armclang automatically
passes the --lto option to armlink.
The examples described in Link-Time Optimization examples show how to perform LTO across all
source files, or a subset of source files.
Partial linking
The armlink option --partial only works with ELF files. If the linker detects a file containing
bitcode, it gives an error message.
Scatter-loading
The output of the link-time optimizer is a single ELF object file that by default is given a
temporary filename. This ELF object file contains sections and symbols just like any other ELF
object file, and Input section selectors match the sections and symbols as normal.
Use the armlink option --lto_intermediate_filename to name the ELF object file output.
You can reference this ELF file name in the scatter file.
Arm recommends that you only perform LTO on code and data that does not require precise
placement in the scatter file. In other words, use general input section selectors such as
*(+RO) and .ANY(+RO) to select the sections that LTO generates. See Scatter file section or
object placement with Link-Time Optimization for an example of building an image with LTO
and a scatter file that places named sections.
Any use of libLTO other than the library supplied with Arm Compiler for Embedded 6 is
unsupported.
Other restrictions
• You cannot currently use LTO for building ROPI/RWPI images.
• Object files that LTO produces contain build attributes that are the default for the target
architecture. If you use the armlink options --cpu or --fpu when LTO is enabled, armlink
can incorrectly report that the attributes in the file that the link-time optimizer produces
are incompatible with the provided attributes.
• LTO can interfere with the correct reporting of errors when using file-scope inline
assembly. For more information, see File-scope inline assembly.
In this example, because armclang automatically calls armlink, the link-time optimizer uses
the same optimization level as armclang. Because no optimization level is specified for
armclang, armclang uses the default optimization level -O0, and the link-time optimizer uses
--lto_level=O0.
In this example, because armclang and armlink are called separately, they have
independent optimization levels. As no optimization level is specified for armclang
or armlink, armclang has the default optimization level -O0 and the link-time
optimizer has the default optimization level --lto_level=O2. You can call armclang
and armlink with any combination of optimization levels.
In this case, function() is called with the parameter a == 0, so printit() is not called at run time.
// main.c
extern int function(int a);
int main(void)
{
    return function(0);
}
// functions.c
#include <stdio.h>
void printit(void);
int function(int a)
{
    if (a == 0)
    {
        return 0;
    }
    else
    {
        printit();
        return 0;
    }
}
void printit(void)
{
    printf("a is non-zero.\n");
}
Procedure
1. Build the example code with LTO disabled:
armclang --target=arm-arm-none-eabi -march=armv7-a -O2 -c main.c -o main.o
armclang --target=arm-arm-none-eabi -march=armv7-a -O2 -c functions.c -o functions.o
armlink main.o functions.o -o image_without_lto.axf
fromelf --text -c -z image_without_lto.axf
The compiler cannot inline the call to function() because it is in a different object from
main(). Therefore, the compiler must keep the conditional call to printit() within function(),
because the compiler does not have any information about the value of the parameter a while
functions.c is being compiled:
...
$a.0
function
0x00008bd8: e3500000 ..P. CMP r0,#0
0x00008bdc: 0a000004 .... BEQ 0x8c18 ; function + 28
0x00008be0: e92d4800 .H-. PUSH {r11,lr}
0x00008be4: e3080c6a j... MOV r0,#0x8c6a
0x00008be8: e3400000 ..@. MOVT r0,#0
0x00008bec: fafffd1f .... BLX puts ; 0x8094
0x00008bf0: e8bd4800 .H.. POP {r11,lr}
0x00008bf4: e3a00000 .... MOV r0,#0
0x00008bf8: e12fff1e ../. BX lr
main
0x00008bfc: e3a00000 .... MOV r0,#0
0x00008c00: eafffff4 .... B function ; 0x8bfc
...
Also, printit() uses the Arm C library function printf(). In this example, printit() is inlined
into function(), and its call to printf() is optimized to a call to puts(). Therefore, the linker
must include the relevant C library code to support puts(). Because function() is only ever
called with a == 0, this C library code is never executed, yet it adds a large amount of code to
the image. The output from the fromelf utility shows the resulting overall image size:
Although the compiler does not have any information about the call to function() from main()
when compiling functions.c, at link time, it is known that:
• function() is only ever called once, with the parameter a == 0.
• printit() is never called.
• The Arm C library function puts() is never called.
Because LTO is enabled, this extra information is used to make the following optimizations:
• Inlining the call to function() into main().
• Removing the code to conditionally call printit() from function() entirely.
• Removing the C library code that allows use of the puts() function.
...
$a.0
main
0x00008128: e3a00000 .... MOV r0,#0
0x0000812c: e12fff1e ../. BX lr
...
As a result, the overall image is much smaller. The output from the fromelf utility shows the
reduced image size:
Related information
Optimizing for code size or performance on page 84
Optimizing across modules with Link-Time Optimization on page 86
How optimization affects the debug experience on page 101
-O (armclang)
specific regions, both the scatter file and the project source code must be modified to ensure the
placement works with LTO.
In general:
• Scatter files with object names that are used in input selection patterns, such as foo.o(+RO),
do not work with LTO.
• Scatter files with section names that are used in input selection patterns, where the section
name corresponds to an inlined function, do not work with LTO.
To use scatter file section or object placement with LTO, the following changes must be made to a
project:
• Compile all source files that are built with LTO enabled with -fno-inline-functions.
• Modify each source file that is built with LTO enabled to use #pragma clang section to place
all functions in that source file into sections with a name unique to that source file.
• Modify the scatter file to use section names instead of object file names.
Example code
The following example code is used in the example sections, unless specified otherwise. In this
code, all functions in foo.c must be placed in an execution region EXEC_FOO, and all functions in
bar.c must be placed in an execution region EXEC_BAR:
variables.c:
foo.c:
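#include <stdio.h>
extern const int foo_int; // defined in variables.c
void foo(void)
{
    printf("The answer from foo is: %d\n", foo_int);
}
bar.c:
#include <stdio.h>
extern const int bar_int; // defined in variables.c
void bar(void)
{
    printf("The answer from bar is: %d\n", bar_int);
}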
main.c:
void foo(void); // defined in foo.c
void bar(void); // defined in bar.c
int main(void)
{
    foo();
    bar();
    return 0;
}
scatter.scat:
LOAD 0x0
{
EXEC_ANY +0x0
{
.ANY(+RO, +RW, +ZI)
}
The memory map from the listing file image.lst shows that EXEC_FOO and EXEC_BAR contain code
from foo.c and bar.c respectively, as intended:
In this example, compiling variables.c without -flto has no effect on the result of running
the image. However, compiling the file without -flto is required when placing data with named
sections.
Also, the memory map from the listing file image.lst shows that EXEC_FOO and EXEC_BAR are empty:
These execution regions are empty because LTO has inlined all functions within foo.c and bar.c.
Therefore, the functions are no longer available for placement with a scatter file.
In this example, compiling variables.c without -flto has no effect on the result of running
the image. However, compiling the file without -flto is required when placing data with named
sections.
The reason is that, even though function inlining is disabled, all code from main.c, foo.c, and bar.c
is part of the same intermediate LTO object file. Therefore, at the final link stage within the LTO
process, foo.o and bar.o do not exist as separate object files.
The memory map in the listing file image.lst shows that the code from foo.c and bar.c is now
placed in the EXEC_ANY execution region instead:
In this example, lto-llvm-68b687.o is the LTO intermediate filename that the linker generates.
However, this filename might be different when linking again.
Although you can change the LTO intermediate filename using the armlink command-line option
--lto_intermediate_filename, doing so does not help in this use case. Instead, you must use section
names.
Example: Using section names for functions and data within a C language source file
The easiest way to specify section names for all functions or data within a C
language source file is to use #pragma clang section. Alternatively, you can use
__attribute__((section("<section>"))) for specific functions and data.
For this example, rewrite the example code in the files variables.c, foo.c, and bar.c as follows:
variables.c:
foo.c:
#include <stdio.h>
#pragma clang section text="foo_rotext"
extern const int foo_int; // defined in variables.c
void foo(void)
{
    printf("The answer is: %d", foo_int);
}
bar.c:
#include <stdio.h>
#pragma clang section text="bar_rotext"
extern const int bar_int; // defined in variables.c
void bar(void)
{
    printf("The answer is: %d", bar_int);
}
#pragma clang section text="foo_rotext" specifies that the compiler places the code generated
for foo.c in the named section foo_rotext.
Similar section names are specified in bar.c and variables.c for the code and data that each file
generates.
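The contents of variables.c are not reproduced here. A minimal sketch of what it might contain, assuming that foo_int and bar_int are const int objects placed in the foo_rodata and bar_rodata sections (as the symbol table description later in this example suggests), is shown below. The values 42 and 7 are hypothetical placeholders. An empty section name resets #pragma clang section to the default for any data that follows.

```c
/* variables.c: hypothetical sketch; the values are placeholders. */

/* Read-only data that foo.c uses goes into a section named foo_rodata. */
#pragma clang section rodata="foo_rodata"
const int foo_int = 42;

/* Read-only data that bar.c uses goes into a section named bar_rodata. */
#pragma clang section rodata="bar_rodata"
const int bar_int = 7;

/* Restore the default section name for any later read-only data. */
#pragma clang section rodata=""
```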
You can rewrite scatter.scat to place these section names as follows:
scatter.scat:
LOAD 0x0
{
EXEC_ANY +0x0
{
.ANY(+RO, +RW, +ZI)
}
Example: Building with LTO enabled, function inlining disabled, and using section names
instead of object file names
Build the modified example with:
Because the data is placed in named sections with a scatter file, and that data is in a separate
file from the code, you must build variables.c without -flto. See Scatter-loading in
Restrictions with Link-Time Optimization for more information.
The linker does not report any warnings. Also, the memory map from the listing file image.lst
shows that EXEC_FOO and EXEC_BAR contain the code from the expected sections:
The key difference between this LTO approach and the non-LTO approach with object file names is
that in this approach, the function names are not visible in the listing file. To verify that the sections
foo_rotext and bar_rotext contain the functions from foo.c and bar.c respectively, examine the
symbol table from the fromelf --text -s output:
...
** Section #8 '.symtab' (SHT_SYMTAB)
Size : 7328 bytes (alignment 4)
String table #9 '.strtab'
Last local symbol no. 309
The addresses for these functions in the output from the fromelf utility correspond to the
execution region addresses in the memory map from the listing file image.lst. The symbol table
also confirms the location of the int constant sections foo_rodata and bar_rodata.
Other considerations
Other approaches you might want to consider:
• If you plan to build a project with LTO eventually, it might be better to use section names
instead of object file names within scatter files using the method shown in this example. This
approach is compatible both with and without LTO.
• If you disable LTO, it is better to also remove -fno-inline-functions, because doing so allows
the compiler to perform inlining optimizations.
• If disabling function inlining entirely is not required, then use the attribute
__attribute__((noinline)) on each function that is not to be inlined. This approach can
help achieve a better balance between explicit code placement and cross-file function inlining
optimizations.
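A minimal sketch of the per-function approach, using the hypothetical functions foo_scale() and foo_scale_twice(): __attribute__((noinline)) keeps foo_scale() as a separate function that the scatter file can place, and __attribute__((section("foo_rotext"))) gives it the section name that the scatter file selects. Other functions without these attributes remain eligible for inlining.

```c
/* Hypothetical function that must stay in the foo_rotext section so that
   the scatter file can place it, for example in EXEC_FOO. */
__attribute__((noinline, section("foo_rotext")))
int foo_scale(int value)
{
    return value * 3;
}

/* This caller can itself be inlined elsewhere, but the calls to
   foo_scale() remain real calls because of noinline. */
int foo_scale_twice(int value)
{
    return foo_scale(foo_scale(value));
}
```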
Related information
Optimizing across modules with Link-Time Optimization on page 86
-fno-inline-functions (armclang)
-flto (armclang)
-O (armclang)
__attribute__((noinline)) function attribute
__attribute__((section("name"))) function attribute
__attribute__((section("name"))) variable attribute
#pragma clang section
--lto (armlink)
--lto_intermediate_filename (armlink)
Scatter-loading Features
Scatter File Syntax
Therefore, there is a trade-off between optimizing code and the debug experience.
For good debug experience, Arm recommends -O1 rather than -O0. When using -O1, the compiler
performs certain optimizations, but the structure of the generated code is still close to the source
code.
A literal pool is a block of memory embedded in the code to hold literal values. These values can be
constants or long branch addresses.
With armclang, you do not have to trade off literal pool sharing against unused section
elimination. For example, you might have five functions in separate sections. You can keep the five
functions in separate sections so that the linker can eliminate any that you do not use in your
image, and the subset of the functions that remain in the link can still share their literals.
Also, armclang allows a global approach to literal sharing. The linker can globally search for
opportunities to share literals, even between functions from different parts of the code base that
you might not have realized were using similar literals.
To make the best use of this feature, specify the armclang option -ffunction-sections, which is
the default setting. The -ffunction-sections option does not affect the literal pool generation
for a function. However, because the linker merging of literal pools only works on literal pools at
the end of a section, -ffunction-sections gives the optimization more opportunities. The correct
literal-merging behavior is visible only in the final image after linking, because the object files still
contain the unmerged versions of the literals.
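As a hedged illustration, the two hypothetical functions below come from unrelated parts of a code base but use the same 32-bit constant. With -ffunction-sections, each function is placed in its own section (for example, .text.checksum_seed and .text.header_magic). On targets where the compiler loads this constant from a literal pool at the end of each section, armlink can merge the two pools in the final image. Whether a literal pool is used at all depends on the target; for example, on architectures with MOVW and MOVT the compiler might build the constant with two instructions instead.

```c
#include <stdint.h>

/* Hypothetical function from one part of the code base. */
uint32_t checksum_seed(void)
{
    return 0x12345678u;
}

/* Hypothetical function from an unrelated part of the code base that
   happens to use the same literal value. */
uint32_t header_magic(void)
{
    return 0x12345678u;
}
```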
Related information
-ffunction-sections, -fno-function-sections
-mexecute-only
-O
Writing Assembly Code
The armasm legacy assembler is deprecated, and it has not been updated since Arm
Compiler 6.10. Also, armasm does not support:
• Armv8.4-A or later architectures.
• Certain backported options in Armv8.2-A and Armv8.3-A.
• Assembling SVE instructions.
• Armv8.1-M or later architectures, including MVE.
• All versions of the Armv8-R architecture.
The Migration and Compatibility Guide contains detailed information about the
differences between GNU syntax and armasm syntax assembly to help you migrate
legacy assembly code.
The following examples show equivalent GNU syntax and armasm assembly code for incrementing a
register in a loop.
.text
.file "file.S"
.section .text.main,"ax",@progbits
.p2align 2
.type main,@function
main:
MOV w5,#0x64 // W5 = 100
MOV w4,#0 // W4 = 0
B test_loop // branch to test_loop
loop:
ADD w5,w5,#1 // Add 1 to W5
ADD w4,w4,#1 // Add 1 to W4
test_loop:
CMP w4,#0xa // if W4 < 10, branch back to loop
BLT loop
.end
Use GNU syntax for newly created assembly files. Use the armclang integrated assembler to
assemble GNU assembly language source code. Typically, you invoke the armclang assembler as
follows:
END
You might have legacy assembly source files that use the armasm syntax. Use armasm to assemble
legacy armasm syntax assembly code. Typically, you invoke the armasm assembler as follows:
Related information
GNU Binutils - Using as
Migrating armasm syntax assembly code to GNU syntax
To debug Arm code, an Arm-compatible debugger expects the .debug_frame section to be present.
Arm® Compiler for Embedded 6 exclusively uses .debug_frame to keep the code size small. There
is a similarly formatted section called .eh_frame, used by the program itself for handling C++
exceptions. armclang does not include the .eh_frame section unless it is necessary.
The armclang integrated assembler does not automatically generate this information. Therefore,
you must add the information into your GNU-syntax assembly code using .cfi directives.
Adding .cfi directives for functions that return using the link register (LR) is easy. Using directives
to describe the location of variables in registers and the stack is more difficult. Because most
assembler functions do not use the stack, only a backtrace is required. Therefore, you need only
use a subset of the .cfi directives for most cases:
• .cfi_sections .debug_frame
• .cfi_startproc
• .cfi_endproc
To see where the armclang integrated assembler inserts the .cfi directives, compile the following
C code:
// test.c
int main(void)
{
return 0;
}
-g generates the .cfi directives. -O2 removes all use of the stack from main(). The armclang
integrated assembler generates the following assembly:
...
main:
.Lfunc_begin0:
.file 1 "<source_code_location>" "test.c"
.loc 1 1 0
.fnstart
.cfi_sections .debug_frame
.cfi_startproc
.loc 1 1 18 prologue_end
mov r0, #0
bx lr
.Ltmp0:
.Lfunc_end0:
.size main, .Lfunc_end0-main
.cfi_endproc
.cantunwind
.fnend
...
The function does not use the stack and returns using LR, so the .cfi_startproc, .cfi_endproc,
and .cfi_sections .debug_frame directives are sufficient.
Functions that do not return using LR require more directives to tell the debugger that the return
address is no longer in LR. For example:
mov r1, lr // r1 = lr
mov lr, #0 // use lr for something else.
bx r1 // return using r1
Here, more directives are needed after the mov lr, #0 instruction. For the complete set of .cfi
directives, see CFI directives.
Related information
Call Frame Information directives
By default, armclang uses the assembly code source file suffix to determine whether to run the C
preprocessor:
• The .s (lowercase) suffix indicates assembly code that does not require preprocessing.
• The .S (uppercase) suffix indicates assembly code that requires preprocessing.
The -x option lets you override the default by specifying the language of the subsequent source
files, rather than inferring the language from the file suffix. Specifically, -x assembler-with-cpp
indicates that the assembly code contains C preprocessor directives and armclang must run the C
preprocessor. The -x option only applies to input files that follow it on the command line.
Do not confuse the .ifdef assembler directive with the preprocessor #ifdef
directive:
• The preprocessor #ifdef directive checks for the presence of preprocessor
macros. These macros are defined using the #define preprocessor directive or
the armclang command-line option -D.
• The armclang integrated assembler .ifdef directive checks for code symbols.
These symbols are defined using labels or the .set directive.
The preprocessor runs first and performs textual substitutions on the source code.
This stage is when the #ifdef directive is processed. The source code is then
passed onto the assembler, when the .ifdef directive is processed.
For example:
• Use the -x assembler-with-cpp option to tell armclang that the assembly source file requires
preprocessing. This option is useful when you have existing source files with the lowercase
extension .s.
For example:
If you want to preprocess assembly files that contain legacy armasm-syntax assembly
code, then you must either:
• Use the .S filename suffix.
• Use separate steps for preprocessing and assembling.
For more information, see Command-line options for preprocessing assembly source
code in the Migration and Compatibility Guide.
Related information
Command-line options for preprocessing assembly source code
-E (armclang)
-x (armclang)
Using Assembly and Intrinsics in C or C++ Code
For example:
• To access features that are not available from C or C++, such as interfacing directly with device
hardware.
• To generate highly optimized code by using intrinsics or inline assembly to write sections of
your code.
There are several ways to have low-level control over the generated code:
• Intrinsics are functions that the compiler provides. An intrinsic function has the appearance of
a function call in C or C++, but compilation replaces the intrinsic by a specific sequence of low-
level instructions.
Arm compilers recognize Arm intrinsics, but the intrinsics are not guaranteed to work
with third-party compiler toolchains.
• Inline assembly lets you write assembly instructions directly in your C/C++ code, without the
overhead of a function call.
• Calling assembly functions from C/C++ lets you write standalone assembly code in a separate
source file. This code is assembled separately from the C/C++ code, and then integrated at link
time.
The C and C++ languages are suited to many tasks but they do not provide built-in support for
specific areas of application, for example Digital Signal Processing (DSP).
In a given application domain, there is usually a range of domain-specific operations that have to be
performed frequently. However, if specific hardware support is available, then these operations can
often be implemented more efficiently using the hardware support rather than in C or C++.
Using compiler intrinsics, you can achieve more complete coverage of target architecture
instructions than you might get from the instruction selection of the compiler.
An intrinsic function has the appearance of a function call in C or C++, but compilation replaces the
intrinsic by a specific sequence of low-level instructions.
• More information is given to the compiler than the underlying C and C++ language is able
to convey. This information enables the compiler to perform optimizations and to generate
instruction sequences that it cannot otherwise perform.
These performance benefits can be significant for real-time processing applications. However, care
is required because the use of intrinsics can decrease code portability.
Some intrinsics are necessary because the compiler does not otherwise recognize them. For many
cases, C code without intrinsics might be more efficient, more portable, and easier for the compiler
to optimize. When the compiler can create the instruction you require, C code without intrinsics
might be the better alternative.
#include <limits.h>
int L_add(const int a, const int b)
{
int c;
c = (unsigned int)a + b;
if (((a ^ b) & INT_MIN) == 0)
{
if ((c ^ a) & INT_MIN)
{
c = (a < 0) ? INT_MIN : INT_MAX;
}
}
return c;
}
...
L_add:
...
adds r2, r1, r0
eor.w r3, r2, r0
eors r1, r0
cmp.w r3, #-1
mov r3, r2
mvn r12, #-2147483648
it le
eorle.w r3, r12, r0, asr #31
cmp r1, #0
csel r0, r2, r3, mi
bx lr
...
3. Compile with:
...
saturating_add:
...
qadd r0, r0, r1
bx lr
...
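The source for saturating_add() is not shown above. As a hedged sketch, assuming an AArch32 target with the DSP extension, a version that uses the ACLE __qadd() intrinsic from arm_acle.h might look like the following. The preprocessor guard and the portable fallback, which relies on the GCC/Clang builtin __builtin_add_overflow(), are additions so that the same file also compiles for other targets.

```c
#include <limits.h>

#if defined(__ARM_FEATURE_DSP) && __ARM_FEATURE_DSP
#include <arm_acle.h>
#endif

/* Saturating 32-bit addition. */
int saturating_add(int a, int b)
{
#if defined(__ARM_FEATURE_DSP) && __ARM_FEATURE_DSP
    /* ACLE intrinsic: maps to a single QADD instruction. */
    return __qadd(a, b);
#else
    /* Portable fallback: detect signed overflow, then clamp. Overflow is
       only possible when a and b have the same sign, so the sign of a
       selects the limit. */
    int result;
    if (__builtin_add_overflow(a, b, &result))
    {
        return (a < 0) ? INT_MIN : INT_MAX;
    }
    return result;
#endif
}
```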
Example: C code that the compiler can convert to the required instruction
The previous example of the C implementation for a saturating add operation can be rewritten so
that the compiler can create the required qadd instruction directly:
// qadd.c
#include <limits.h>
...
qadd:
...
qadd r0, r0, r1
bx lr
...
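The body of qadd.c is not reproduced above. One common way to express a saturating add in plain C is to compute the exact sum in a wider type and clamp it, as in the following sketch with the hypothetical name qadd_c(). This is not necessarily the code that produced the output shown, and whether the compiler emits a single qadd instruction for it depends on the compiler version, target, and optimization level, so check the generated code with fromelf.

```c
#include <limits.h>

/* Saturating add in plain C, assuming a 32-bit int (as on Arm targets):
   a 64-bit long long can hold the exact sum of any two such ints, so
   the addition itself cannot overflow. */
int qadd_c(int a, int b)
{
    long long sum = (long long)a + b;
    if (sum > INT_MAX)
    {
        return INT_MAX;
    }
    if (sum < INT_MIN)
    {
        return INT_MIN;
    }
    return (int)sum;
}
```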
Related information
Compiler-specific intrinsics
ACLE support
NEON Programmer's Guide
These intrinsics are documented in the Custom Datapath Extension section of the Arm C Language
Extensions document.
Example
The following example shows how to use the ACLE intrinsics for CDE:
1. Create the foo.c file containing the following code:
#include <arm_cde.h>
In this file, the function foo() uses the __arm_cx2() ACLE intrinsic for CDE. This intrinsic
generates a CX2 instruction.
A CX2 instruction is a Custom class 2 instruction that computes a value from a source
register, an immediate, and optionally the original value of the destination register, and writes
the result to the destination register.
For example, the instruction CX2 p0, r0, r1, #2 sends the immediate 2 and the register R1 to
the CDE coprocessor p0, and writes the result returned by p0 to the register R0.
Where:
This intrinsic generates a variant of the CX2 instruction that does not use the destination
register value to compute the result.
2. Compile foo.c with the command:
The compiler generates a CX2 instruction with the expected operands, and returns the result of
the instruction in register R0.
3. Run the following fromelf command to examine the output:
...
$t.0
[Anonymous symbol #3]
foo
0x00000000: ee400004 @... CX2 p0,r0,r0,#4
0x00000004: 4770 pG BX lr
...
Related information
-march
-mcpu
--coprocN=value (fromelf)
ARM v8-M Supplement - CDE Reference Manual
The __asm keyword can incorporate inline assembly code into a function using the GNU inline
assembly syntax. For example:
#include <stdio.h>
int add(int i, int j)
{
    int res = 0;
    __asm ("ADD %[result], %[input_i], %[input_j]"
        : [result] "=r" (res)
        : [input_i] "r" (i), [input_j] "r" (j)
    );
    return res;
}
int main(void)
{
    int a = 1;
    int b = 2;
    int c = 0;
    c = add(a,b);
    printf("Result of %d + %d = %d\n", a, b, c);
    return 0;
}
The inline assembler does not support legacy assembly code written in armasm
assembler syntax. See the Migration and Compatibility Guide for more information
about migrating armasm syntax assembly code to GNU syntax.
Using inline assembly rather than writing a separate .s file has the following advantages:
• Shifts the burden of handling the procedure call standard (PCS) from the programmer to the
compiler. This includes allocating the stack frame and preserving all necessary callee-saved
registers.
• Inline assembly code gives the compiler more information about what the assembly code does.
• The compiler can inline the function that contains the assembly code into its callers.
• Inline assembly code can take immediate operands that depend on C-level constructs, such as
the size of a structure or the byte offset of a particular structure field.
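As an illustration of the last point, the following sketch loads a structure field whose byte offset is supplied to the assembly code as an immediate computed by offsetof. It assumes an AArch64 target. The names struct sensor and load_reading() are hypothetical, and the #else branch is a plain C fallback so that the sketch also builds on other hosts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure, for illustration only. */
struct sensor {
    uint32_t id;
    uint32_t reading;
};

/* Loads p->reading. The byte offset comes from offsetof, not a hard-coded 4. */
uint32_t load_reading(const struct sensor *p)
{
    uint32_t v;
#if defined(__aarch64__)
    __asm ("ldr %w[v], [%[p], %[off]]"
           : [v] "=r" (v)
           : [p] "r" (p), [off] "i" (offsetof(struct sensor, reading))
           : "memory");                 /* the instruction reads memory through p */
#else
    v = p->reading;                     /* host fallback */
#endif
    return v;
}
```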
Use the volatile qualifier for assembler instructions that have processor side effects that the
compiler might be unaware of. The volatile qualifier disables certain compiler optimizations that
might otherwise lead the compiler to remove the code block. The volatile qualifier is optional,
but consider using it on your assembly code blocks to ensure that the compiler does not remove
them when compiling with -O1 or higher.
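The following sketch shows the volatile qualifier on an instruction whose effect the compiler cannot see: a DSB barrier placed between two stores. The "memory" clobber additionally stops the compiler from moving memory accesses across the statement. The function name publish() is hypothetical, and the sketch assumes an Armv7 or later AArch32 target, or an AArch64 target. On other hosts, the #else branch falls back to a compiler-only barrier.

```c
#include <stdint.h>

/* Hypothetical example: write a payload, then a flag, with a data
 * synchronization barrier in between. */
void publish(volatile uint32_t *payload, volatile uint32_t *flag, uint32_t value)
{
    *payload = value;
#if defined(__aarch64__) || (defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7)
    /* DSB has a processor side effect that the compiler cannot see. volatile
     * and the "memory" clobber keep the statement in place at -O1 and above. */
    __asm volatile ("dsb sy" : : : "memory");
#else
    __asm volatile ("" : : : "memory"); /* host fallback: compiler barrier only */
#endif
    *flag = 1u;
}
```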
code
The assembly instruction, for example "ADD R0, R1, R2".
code_template
A template for an assembly instruction, for example "ADD %[result], %[input_i],
%[input_j]".
If you specify a code_template rather than code, you must specify the outputs before specifying
the optional inputs and clobber_list.
outputs
A list of output operands, separated by commas. Each operand consists of a symbolic name
in square brackets, a constraint string, and a C expression in parentheses. In this example,
there is a single output operand: [result] "=r" (res). The list can be empty. For example:
inputs
An optional list of input operands, separated by commas. Input operands use the same syntax
as output operands. In this example, there are two input operands: [input_i] "r" (i),
[input_j] "r" (j). The list can be empty.
clobber_list
A comma-separated list of strings. Each string is the name of a register that the assembly
code potentially modifies, but for which the final value is not important. To prevent the
compiler from using a register for a template string in an inline assembly string, add the
register to the clobber list.
For example, if a register holds a temporary value, include it in the clobber list. The compiler
avoids using a register in this list as an input or output operand, or using it to store another
value when the assembly code is executed.
The list can be empty. In addition to registers, the list can also contain special arguments:
"cc"
The instruction modifies the condition code flags.
"memory"
The instruction accesses unknown memory addresses.
The registers in clobber_list must use lowercase letters rather than uppercase letters. An
example instruction with a clobber_list is:
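A minimal sketch, assuming an AArch32 target in Arm state or Thumb-2 state: the hypothetical function add_and_double() uses r5 as a scratch register, and ADDS updates the condition flags, so both "r5" and "cc" appear in the clobber list. The #else branch is a plain C fallback so that the sketch builds on other hosts.

```c
#include <stdint.h>

/* Hypothetical helper: returns (i + j) * 2, using r5 as a scratch register. */
int32_t add_and_double(int32_t i, int32_t j)
{
    int32_t res;
#if defined(__arm__) && (!defined(__thumb__) || defined(__thumb2__))
    __asm ("ADDS r5, %[input_i], %[input_j]\n\t" /* r5 = i + j, also sets flags */
           "ADD %[result], r5, r5"               /* result = 2 * r5 */
           : [result] "=r" (res)
           : [input_i] "r" (i), [input_j] "r" (j)
           : "r5", "cc");                        /* lowercase register names */
#else
    res = (i + j) * 2;                           /* host fallback */
#endif
    return res;
}
```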
__asm ("my_label:\n\t");
Multiple instructions
You can write multiple instructions within the same __asm statement. This example shows an
interrupt handler written in one __asm statement for an Arm®v8-M mainline architecture.
void HardFault_Handler(void)
{
__asm (
"TST LR, #0x40\n\t"
"BEQ from_nonsecure\n\t"
"from_secure:\n\t"
"TST LR, #0x04\n\t"
"ITE EQ\n\t"
"MRSEQ R0, MSP\n\t"
"MRSNE R0, PSP\n\t"
"B hard_fault_handler_c\n\t"
"from_nonsecure:\n\t"
"MRS R0, CONTROL_NS\n\t"
"TST R0, #2\n\t"
"ITE EQ\n\t"
"MRSEQ R0, MSP_NS\n\t"
"MRSNE R0, PSP_NS\n\t"
"B hard_fault_handler_c\n\t"
);
}
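The handler branches to hard_fault_handler_c with R0 holding the active stack pointer, which points at the exception stack frame, so R0 arrives as the first argument of a C function. The following is a hypothetical sketch of that C function. It assumes the basic eight-word stack frame (R0-R3, R12, LR, PC, xPSR) without additional state context. The names fault_frame_t, decode_fault_frame(), and last_fault are illustrative, not part of any Arm API.

```c
#include <stdint.h>

/* Basic exception stack frame pushed by the processor on exception entry. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} fault_frame_t;

/* Copies the stacked registers out of the frame that R0 points to. */
fault_frame_t decode_fault_frame(const uint32_t *frame)
{
    fault_frame_t f;
    f.r0   = frame[0];
    f.r1   = frame[1];
    f.r2   = frame[2];
    f.r3   = frame[3];
    f.r12  = frame[4];
    f.lr   = frame[5];
    f.pc   = frame[6];   /* typically the address of the faulting instruction */
    f.xpsr = frame[7];
    return f;
}

volatile fault_frame_t last_fault;   /* inspect with a debugger */

/* Reached through "B hard_fault_handler_c" with R0 = MSP or PSP. */
void hard_fault_handler_c(const uint32_t *frame)
{
    last_fault = decode_fault_frame(frame);
    for (;;) {
        /* halt here */
    }
}
```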
Copy the above handler code to file.c and then you can compile it using:
Related information
armclang inline assembler
Migrating armasm syntax assembly code to GNU syntax
Semihosting for AArch32 and AArch64
You can define embedded assembly functions in C or C++ code using __attribute__((naked)).
For more information about the naked attribute, see the reference page in the Arm Compiler for
Embedded Reference Guide.
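A minimal sketch of a naked function, assuming an AArch64 or AArch32 target. Because the compiler emits no prologue or epilogue, the body must follow the procedure call standard by hand: it reads its arguments from the argument registers, leaves the result in the return register, and returns explicitly. A naked function body should contain only basic __asm statements, without operands. The name add_naked() is hypothetical, and the final #else branch is a plain C fallback for other hosts.

```c
#if defined(__aarch64__)
/* AArch64: a arrives in w0, b in w1, and the result is returned in w0. */
__attribute__((naked)) int add_naked(int a, int b)
{
    __asm ("add w0, w0, w1\n\t"
           "ret");
}
#elif defined(__arm__)
/* AArch32: a arrives in r0, b in r1, and the result is returned in r0. */
__attribute__((naked)) int add_naked(int a, int b)
{
    __asm ("ADDS r0, r0, r1\n\t"
           "BX lr");
}
#else
int add_naked(int a, int b) { return a + b; }   /* host fallback */
#endif
```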
Related information
Using the integrated assembler on page 21
Writing inline assembly code on page 112
armclang Integrated Assembler
However, in some situations you might want to make function calls from C/C++ code to assembly
code. For example:
• If you want to make use of existing assembly code, but the rest of your project is in C or C++.
• If you want to manually write critical functions directly in assembly code that can produce
better optimized code than compiling C or C++ code.
• If you want to interface directly with device hardware and if this is easier in low-level assembly
code than high-level C or C++.
For code portability, it is better to use intrinsics or inline assembly rather than
writing and calling assembly functions.
.global myadd
.p2align 2
.type myadd,%function
myadd:
add r0, r0, r1
bx lr
armclang requires that you explicitly specify the types of exported symbols
using the .type directive. If the .type directive is not specified in the above
example, the linker outputs warnings of the form:
#include <stdio.h>
extern int myadd(int a, int b);
int main(void)
{
int a = 4;
int b = 5;
printf("Adding %d and %d results in %d\n", a, b, myadd(a, b));
return (0);
}
3. Ensure that your assembly code complies with the Procedure Call Standard for the Arm
Architecture (AAPCS).
The AAPCS describes a contract between caller functions and callee functions. For example, for
integer or pointer types, it specifies that:
• Registers R0-R3 pass argument values to the callee function, with subsequent arguments
passed on the stack.
• Register R0 passes the result value back to the caller function.
• Caller functions must preserve R0-R3 and R12 themselves if they need these values after the
call, because these registers are allowed to be corrupted by the callee function.
• Callee functions must preserve R4-R11 and LR, because these registers are not allowed to
be corrupted by the callee function.
For more information, see the Application Binary Interface (ABI) documentation.
4. Compile both source files:
Related information
Procedure Call Standard for the Arm Architecture
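The AAPCS rules listed in step 3 can be illustrated with the following sketch, written as a naked function so that the C file is self-contained. It assumes an AArch32 target in Arm state or Thumb-2 state. The first four arguments arrive in R0-R3, the fifth is the first word on the stack at [SP], and the result is returned in R0. Only R0-R3 and R12 are modified, so no callee-saved registers need to be preserved. The name sum5() is hypothetical, and the #else branch is a plain C fallback for other hosts.

```c
#include <stdint.h>

#if defined(__arm__) && (!defined(__thumb__) || defined(__thumb2__))
/* Hypothetical AAPCS-compliant leaf function written in assembly. */
__attribute__((naked)) int32_t sum5(int32_t a, int32_t b, int32_t c,
                                    int32_t d, int32_t e)
{
    __asm ("ADD r0, r0, r1\n\t"    /* a + b (R0, R1)                     */
           "ADD r0, r0, r2\n\t"    /* + c (R2)                           */
           "ADD r0, r0, r3\n\t"    /* + d (R3)                           */
           "LDR r12, [sp]\n\t"     /* e: fifth argument, on the stack    */
           "ADD r0, r0, r12\n\t"   /* R12 may be corrupted by the callee */
           "BX lr");               /* result returned in R0              */
}
#else
int32_t sum5(int32_t a, int32_t b, int32_t c, int32_t d, int32_t e)
{
    return a + b + c + d + e;      /* host fallback */
}
#endif
```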
SVE Coding Considerations with Arm Compiler for Embedded 6
SVE is a SIMD instruction set for AArch64 that introduces the following architectural features for
High Performance Computing (HPC):
• Scalable vector length.
• Per-lane predication.
• Gather-load and scatter-store.
• Fault-tolerant speculative vectorization.
• Horizontal and serialized vector operations.
This release of the Arm Compiler for Embedded toolchain lets you:
• Assemble source code containing SVE instructions.
• Disassemble ELF object files containing SVE instructions.
• Compile C and C++ code for SVE-enabled targets.
• Use intrinsics to write SVE instructions directly from C code.
The Arm Compiler for Embedded toolchain only supports bare-metal applications.
For SVE compilation for Linux, use Arm Compiler for Linux. For more information,
see Arm Compiler for Linux.
Arm Compiler for Embedded supports auto-vectorization for SVE, but does not
include SVE-optimized libraries. Suitable SVE-optimized libraries are supplied with
Arm Compiler for Linux. For more information, see Arm Compiler for Linux.
The SVE architectural extension to the Arm®v8-A architecture (armv8-a+sve) provides SVE
instructions. Many of these SVE instructions make use of the p and z register classes.
The following example shows a simple assembly program that includes SVE instructions.
// example1.s
.global main
main:
mov x0, 0x90000000
mov x8, xzr
ptrue p0.s //SVE instruction
fcpy z0.s, p0/m, #5.00000000 //SVE instruction
orr w10, wzr, #0x400
loop:
st1w z0.s, p0, [x0, x8, lsl #2] //SVE instruction
incw x8 //SVE instruction
whilelt p0.s, x8, x10 //SVE instruction
b.any loop //SVE instruction
mov w0, wzr
ret
To assemble this source file into a binary object file, use armclang with an SVE-enabled target:
-march=armv8-a+sve
Specifies that the compiler targets the Armv8-A architecture profile with the SVE target
feature enabled.
The default for AArch64 is -march=armv8-a, that is, the Armv8-A architecture profile without
the SVE extension. You must explicitly specify +sve to assemble SVE instructions.
Armv8-A and later architectures support the SVE extension. For example, -march=armv8.1-a+sve.
example1.s
Input assembly language file.
-o example1.o
Output ELF object file.
Related information
Disassembling SVE object files on page 121
Procedure
1. Use the C file matmul_f64_sve.c from the example in Running a binary in an AEMv8-A Base
Fixed Virtual Platform (FVP).
2. Compile and use fromelf to view the disassembly:
armclang -c -O3 --target=aarch64-arm-none-eabi -march=armv8-a+sve -o
matmul_f64_sve.o matmul_f64_sve.c
fromelf -c matmul_f64_sve.o
...
** Section #3 '.text.matmul_f64_sve' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR]
Size : 432 bytes (alignment 4)
Address: 0x00000000
$x.0
matmul_f64_sve
0x00000000: fc1a0fea .... STR d10,[sp,#-0x60]!
0x00000004: a90457f6 .W.. STP x22,x21,[sp,#0x40]
0x00000008: aa0003f5 .... MOV x21,x0
0x0000000c: 04e0e3f6 .... CNTD x22
0x00000010: 90000000 .... ADRP x0,{pc} ; 0x10
...
0x00000190: 54fffe43 C..T B.CC {pc}-0x38 ; 0x158
0x00000194: a9454ff4 .OE. LDP x20,x19,[sp,#0x50]
0x00000198: a94457f6 .WD. LDP x22,x21,[sp,#0x40]
0x0000019c: a9435ff8 ._C. LDP x24,x23,[sp,#0x30]
0x000001a0: a94267fe .gB. LDP x30,x25,[sp,#0x20]
0x000001a4: 6d4123e9 .#Am LDP d9,d8,[sp,#0x10]
0x000001a8: fc4607ea ..F. LDR d10,[sp],#0x60
0x000001ac: d65f03c0 .._. RET
...
Related information
Assembling SVE code on page 119
The following example shows a complete command-line invocation of the FVP. Most of the lines
are required for correct program execution and do not need to be modified. $VECLEN, $CMDLINE, and
$BINARY are parameters that can be edited.
$FVP_BASE/FVP_Base_AEMvA \
--plugin $FVP_BASE/ScalableVectorExtension.so \
-C SVE.ScalableVectorExtension.veclen=$VECLEN \
--quiet \
--stat \
-C cluster0.NUM_CORES=1 \
-C bp.secure_memory=0 \
-C bp.refcounter.non_arch_start_at_default=1 \
-C cluster0.cpu0.semihosting-use_stderr=1 \
-C bp.vis.disable_visualisation=1 \
-C cluster0.cpu0.semihosting-cmd_line="$CMDLINE" \
-a cluster0.cpu0=$BINARY
Where:
$FVP_BASE
Specifies the path to the FVP.
$VECLEN
Defines the SVE vector width, in units of 64-bit (8-byte) blocks. The maximum value is 32,
which corresponds to the architectural maximum SVE vector width of 2048 bits (256 bytes).
The SVE architecture only supports vector lengths in increments of 128 bits (16 bytes), so all
values of $VECLEN must be even. For example, a value of 8 signifies a 512-bit vector width.
--quiet
Specifies that the FVP emits reduced output. For example, if --quiet is omitted, the messages
"Simulation is started" and "Simulation is terminating" are output to signify the start and
end of program execution.
--stat
Specifies that the FVP writes a short summary of program execution to standard output
following termination (even if --quiet is specified).
$CMDLINE
Specifies the command line to pass to your program. This command line is typically of the
form "./<binary_name> <arg1> <arg2>".
$BINARY
Specifies the path to the compiled binary that the FVP is to load and execute.
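The $VECLEN arithmetic described above can be checked with a small helper such as the following hypothetical sketch (the function names are illustrative only). It assumes that the valid range runs from 2, the architectural minimum of 128 bits, to 32, the maximum of 2048 bits, in even steps.

```c
#include <stdbool.h>

/* veclen counts 64-bit blocks. SVE vector lengths are multiples of 128 bits,
 * so veclen must be even, from 2 (128 bits) up to 32 (2048 bits). */
bool veclen_is_valid(int veclen)
{
    return veclen >= 2 && veclen <= 32 && veclen % 2 == 0;
}

/* Returns the SVE vector width in bits for a valid veclen, or -1 otherwise. */
int veclen_to_bits(int veclen)
{
    return veclen_is_valid(veclen) ? veclen * 64 : -1;
}
```

For example, veclen_to_bits(8) gives the 512-bit width mentioned above.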
A sample application
The following sample application, matmul_f64_sve.c, is derived from the matmul_f64 example
provided in SVE Programming Examples, and uses the svcntd, svdup_f64, svld1, svld1rq, and
svmla_lane SVE intrinsics:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <math.h>
#include <time.h>
#include <arm_sve.h>
#define A 128
#define B 128
#define C 128
float64_t *ptrIN_left;
float64_t *ptrIN_right;
float64_t *ptrOUT;
offsetIN_1 = K;
offsetIN_2 = 2*K;
offsetIN_3 = 3*K;
offsetOUT_1 = N;
offsetOUT_2 = 2*N;
offsetOUT_3 = 3*N;
acc2 = svdup_f64(0.0);
acc3 = svdup_f64(0.0);
ptrIN_left = &inLeft[x*K];
ptrIN_right = &inRight[y];
for (z=0; z<K; z+=2) {
inR_0 = svld1(p64_all, ptrIN_right);
inR_1 = svld1(p64_all, &ptrIN_right[offsetOUT_1]);
ptrIN_right += 2*N;
ptrIN_left += 2;
}
ptrOUT += vl;
}
}
}
// Disable all SVE traps by setting CPTR_EL3.EZ bit [8] and clearing CPTR_EL3.TFP bit [10]
void disable_sve_traps(void)
{
__asm volatile(
"MRS x0, CPTR_EL3\n"
"BIC x0, x0, #(1<<10)\n"
"ORR x0, x0, #(1<<8)\n"
"MSR CPTR_EL3, x0\n"
"ISB\n"
: : : "x0" // x0 is used as a scratch register, so declare it clobbered
);
}
disable_sve_traps();
srand((unsigned int)time(0));
{
inRight[x] = ((double)(rand() % 2000000) / 100.f) - 10000.0;
}
For FVP models, you can either use the disable_sve_traps() function or specify
the -C SVE.ScalableVectorExtension.enable_at_reset=true parameter.
#!/bin/bash
# fvp-run.sh
# Usage: fvp-run.sh [veclen] [binary]
# Executes the specified binary in the FVP, with no command-line
# arguments. The SVE register width is [veclen] x 64 bits. Only
# even values of veclen are valid.
#
#
# Set the FVP_BASE environment variable to point to the FVP directory.
#
# Set the ARMLMD_LICENSE_FILE environment variable to reference a license
# file or license server with entitlement for the FVP.
VECLEN=$1
CMDLINE=$2
$FVP_BASE/FVP_Base_AEMvA \
--plugin $FVP_BASE/ScalableVectorExtension.so \
-C SVE.ScalableVectorExtension.veclen=$VECLEN \
--quiet \
--stat \
-C cluster0.NUM_CORES=1 \
-C bp.secure_memory=0 \
-C bp.refcounter.non_arch_start_at_default=1 \
-C cluster0.cpu0.semihosting-use_stderr=1 \
-C bp.vis.disable_visualisation=1 \
-C cluster0.cpu0.semihosting-cmd_line="$CMDLINE" \
-a cluster0.cpu0=$CMDLINE
This script loads and executes the compiled binary with the FVP, and outputs the following
information:
Related information
Arm Compiler for Embedded Reference Guide
-o (armclang)
armclang -Xlinker option
armclang -Olevel option
-march (armclang)
--target (armclang)
This information assumes that you are familiar with details of the SVE Architecture,
including vector-width agnostic registers, predication, and WHILE operations.
The following sections describe information relating to SVE. For general information about writing
inline assembly code, see Writing inline assembly code.
Outputs
Each entry in outputs has one of the following forms:
[name] "=&register-class" (destination)
[name] "=register-class" (destination)
The first form has the register class preceded by =&. This form specifies that the assembly
instructions might read from one of the inputs (specified in the inputs section of the __asm
statement) after writing to the output.
The second form has the register class preceded by =. This form specifies that the assembly
instructions never read from inputs in this way. Using the second form is an optimization. It allows
the compiler to allocate the same register to the output as it allocates to one of the inputs.
Both forms specify that the assembly instructions produce an output that is stored in the C object
specified by destination. This can be any scalar value that is valid for the left-hand side of a C
assignment. The register-class field specifies the type of register that the assembly instructions
require. It can be one of:
r
The register for this output when used within the assembly instructions is a general-purpose
register (x0-x30).
w
The register for this output when used within the assembly instructions is a SIMD and
floating-point register (v0-v31).
It is not possible at present for outputs to contain an SVE vector or predicate value. All uses of SVE
registers must be internal to the inline assembly block.
It is the responsibility of the compiler to allocate a suitable output register and to copy that register
into the destination after the __asm statement is executed. The assembly instructions within the
instructions section of the __asm statement can use one of the following forms to refer to the
output value:
%[name]
Refers to an r-class output as x<N> or a w-class output as v<N>.
%w[name]
Refers to an r-class output as w<N>.
%s[name]
Refers to a w-class output as s<N>.
%d[name]
Refers to a w-class output as d<N>.
In all cases <N> represents the number of the register that the compiler has allocated to the output.
The use of these forms means that it is not necessary for the programmer to anticipate precisely
which register is selected by the compiler. The following example creates a function that returns
the value 10. It shows how the programmer is able to use the %w[res] form to describe the
movement of a constant into the output register without knowing which register is used.
int f()
{
int result;
__asm("movz %w[res], #10" : [res] "=r" (result));
return result;
}
In optimized output, the compiler picks the return register (register 0, accessed here as w0) for
res, resulting in the following assembly code:
Inputs
Within an __asm statement, each entry in the inputs section has the form:
[<name>] "<operand-type>" (<value>)
This construct specifies that the __asm statement uses the scalar C expression value as an input,
referred to within the assembly instructions as name. The <operand-type> field specifies how the
input value is handled within the assembly instructions. It can be one of the following:
r
The input is to be placed in a general-purpose register (x0-x30).
w
The input is to be placed in a SIMD and floating-point register (v0-v31).
[<output-name>]
The input is to be placed in the same register as output <output-name>. In this case the
[<name>] part of the input specification is redundant and can be omitted. The assembly
instructions can use the forms described in Outputs to refer to both the input and the
output. That is, %[<name>], %w[<name>], %s[<name>], and %d[<name>].
i
The input is an integer constant and is used as an immediate operand. The assembly
instructions use %[<name>] in place of immediate operand <#N>, where <N> is the numerical
value of <value>.
In the first two cases, it is the responsibility of the compiler to allocate a suitable register and to
ensure that it contains <value> on entry to the assembly instructions. The assembly instructions
must refer to these registers using the same syntax as for the outputs. That is, %[<name>],
%w[<name>], %s[<name>], and %d[<name>].
It is not possible at present for inputs to contain an SVE vector or predicate value. All uses of SVE
registers must be internal to the instructions section of the __asm statement.
This example shows an __asm directive with the same effect as the previous example, except that
an i-form input is used to specify the constant to be assigned to the result.
int f()
{
int result;
__asm("movz %w[res], %[value]" : [res] "=r" (result) : [value] "i" (10));
return result;
}
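The same techniques apply to w-class operands and to tied inputs. The following sketch, assuming an AArch64 target, uses %d[...] to refer to the 64-bit d<N> view of the registers allocated to w-class operands. It also uses an input of the form "[res]" (x) to place x in the same register as output res, combined with an i-form immediate. The function names are hypothetical, and each #else branch is a plain C fallback for other hosts.

```c
#include <stdint.h>

/* w-class operands: %d[...] names the 64-bit dN view of the allocated vN. */
double add_f64(double a, double b)
{
    double res;
#if defined(__aarch64__)
    __asm ("fadd %d[res], %d[a], %d[b]"
           : [res] "=w" (res)
           : [a] "w" (a), [b] "w" (b));
#else
    res = a + b;                          /* host fallback */
#endif
    return res;
}

/* Tied input: "[res]" places x in the same register as output res. */
int32_t add_ten(int32_t x)
{
    int32_t res;
#if defined(__aarch64__)
    __asm ("add %w[res], %w[res], %[k]"
           : [res] "=r" (res)
           : "[res]" (x), [k] "i" (10));
#else
    res = x + 10;                         /* host fallback */
#endif
    return res;
}
```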
Side effects
Many __asm statements have effects other than reading from inputs and writing to outputs. This is
particularly true of __asm statements that implement vectorized loops, since most such loops read
from or write to memory. The <clobber_list> section of an __asm statement tells the compiler
what these additional effects are. Each entry must be one of the following:
"memory"
The __asm statement reads from or writes to memory. This is necessary even if inputs contain
pointers to the affected memory.
"cc"
The __asm statement modifies the condition-code flags.
"x<N>"
The __asm statement modifies general-purpose register <N>.
"v<N>"
The __asm statement modifies SIMD and floating-point register <N>.
"z<N>"
The __asm statement modifies SVE vector register <N>. Since SVE vector registers extend the
SIMD and floating-point registers, this is equivalent to writing "v<N>".
"p<N>"
The __asm statement modifies SVE predicate register <N>.
Use of volatile
Sometimes an __asm statement might have dependencies and side effects that cannot be captured
by the __asm statement syntax. For example, suppose there are three separate __asm statements
(not three lines within a single __asm statement), that do the following:
• The first sets the floating-point rounding mode.
• The second executes on the assumption that the rounding mode set by the first statement is in
effect.
• The third statement restores the original floating-point rounding mode.
It is important that these statements are executed in order, but the __asm statement syntax
provides no direct method for representing the dependency between them. Instead, each
statement must add the keyword volatile after __asm. This prevents the compiler from removing
the __asm statement as dead code, even if the __asm statement does not modify memory and if
its results appear to be unused. The compiler always executes __asm volatile statements in their
original order.
For example:
An __asm volatile statement must still have a valid side effects list. For example,
an __asm volatile statement that modifies memory must still include "memory" in
the side-effects section.
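The rounding-mode scenario described above might be sketched as follows, assuming an AArch64 target. Separate __asm volatile statements read FPCR, set FPCR.RMode (bits [23:22]) to 0b11 (round toward zero), execute FRINTX, which rounds to an integral value using the current rounding mode, and then restore the original FPCR. Because every statement is volatile, the compiler keeps them all and preserves their order. The function name round_toward_zero() is hypothetical, and the #else branch is a plain C fallback for other hosts that is only valid when the magnitude of x is below 2^63.

```c
#include <stdint.h>

/* Hypothetical example: round x to an integral value toward zero. */
double round_toward_zero(double x)
{
#if defined(__aarch64__)
    uint64_t old_fpcr, new_fpcr;
    double r;
    __asm volatile ("mrs %[v], fpcr" : [v] "=r" (old_fpcr));
    new_fpcr = old_fpcr | (UINT64_C(3) << 22);   /* RMode = 0b11: toward zero */
    __asm volatile ("msr fpcr, %[v]" : : [v] "r" (new_fpcr));           /* set mode */
    __asm volatile ("frintx %d[r], %d[x]" : [r] "=w" (r) : [x] "w" (x)); /* use it   */
    __asm volatile ("msr fpcr, %[v]" : : [v] "r" (old_fpcr));           /* restore  */
    return r;
#else
    return (double)(int64_t)x;   /* host fallback: C conversion truncates toward zero */
#endif
}
```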
Labels
The compiler might output a given __asm statement more than once, either as a result of optimizing
the function that contains the __asm statement or as a result of inlining that function into some of
its callers. Therefore, __asm statements must not define named labels like .loop, since if the __asm
statement is written more than once, the output contains more than one definition of label .loop.
Instead, the assembler provides a concept of relative labels. Each relative label is simply a number
and is defined in the same way as a normal label. For example, relative label 1 is defined by:
1:
The assembly code can contain many definitions of the same relative label. Code that refers to a
relative label must add the letter f (forward) to refer to the next definition or the letter b (backward) to
refer to the previous definition. A typical assembly loop with a pre-loop test would therefore have
the following structure:
...pre-loop test...
b.none 2f
1:
...loop...
b.any 1b
2:
This structure allows the compiler output to contain many copies of this code without creating any
ambiguity.
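The following sketch applies this structure to a small AArch64 loop that sums n, n-1, ..., 1. Because it uses only the relative labels 1 and 2, the function can be inlined several times without producing duplicate label definitions. The sketch uses the GNU "+r" (read-write) constraint for brevity. The function name sum_down() is hypothetical, and the #else branch is a plain C fallback for other hosts.

```c
#include <stdint.h>

/* Hypothetical example: returns n + (n-1) + ... + 1, or 0 when n == 0. */
uint64_t sum_down(uint64_t n)
{
    uint64_t acc = 0;
#if defined(__aarch64__)
    __asm ("cbz %[n], 2f\n"                /* pre-loop test: skip if n == 0 */
           "1:\n\t"
           "add %[acc], %[acc], %[n]\n\t"
           "subs %[n], %[n], #1\n\t"
           "b.ne 1b\n"                     /* back to the previous 1: */
           "2:"                            /* target of the forward 2f */
           : [acc] "+r" (acc), [n] "+r" (n)
           :
           : "cc");
#else
    for (; n != 0; --n)                    /* host fallback */
        acc += n;
#endif
    return acc;
}
```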
Examples
The following example shows a simple function that performs a fused multiply-add operation
(x = a·b + c) across four passed-in arrays of a size specified by <n>:
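A scalar C version of such a function might look like the following hypothetical sketch (the name f_scalar() is illustrative), with restrict-qualified pointers to tell the compiler that the arrays do not overlap. The expression a[i] * b[i] + c[i] is written out explicitly. Depending on the floating-point contraction settings, the compiler might fuse it into a single multiply-add instruction.

```c
/* Scalar reference: x[i] = a[i] * b[i] + c[i] for i in [0, n). */
void f_scalar(double *restrict x, const double *restrict a,
              const double *restrict b, const double *restrict c,
              unsigned long n)
{
    for (unsigned long i = 0; i < n; ++i)
        x[i] = a[i] * b[i] + c[i];   /* may be contracted into one fused multiply-add */
}
```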
An __asm statement that exploits SVE instructions to achieve equivalent behavior might look like
the following:
void f(double *x, double *a, double *b, double *c, unsigned long n)
{
unsigned long i;
__asm ("whilelo p0.d, %[i], %[n] \n\
1: \n\
ld1d z0.d, p0/z, [%[a], %[i], lsl #3] \n\
ld1d z1.d, p0/z, [%[b], %[i], lsl #3] \n\
ld1d z2.d, p0/z, [%[c], %[i], lsl #3] \n\
fmla z2.d, p0/m, z0.d, z1.d \n\
st1d z2.d, p0, [%[x], %[i], lsl #3] \n\
uqincd %[i] \n\
whilelo p0.d, %[i], %[n] \n\
b.any 1b"
: [i] "=&r" (i)
: "[i]" (0),
[x] "r" (x),
[a] "r" (a),
[b] "r" (b),
[c] "r" (c),
[n] "r" (n)
: "memory", "cc", "p0", "z0", "z1", "z2");
}
Keeping the restrict qualifiers would be valid but would have no effect.
The input specifier "[i]" (0) indicates that the assembly statements take an input 0 in the same
register as output [i]. In other words, the initial value of [i] must be zero. The use of =& in the
specification of [i] indicates that [i] cannot be allocated to the same register as [x], [a], [b], [c],
or [n] (because the assembly instructions use those inputs after writing to [i]).
In this example, the C variable i is not used after the __asm statement. In effect the __asm
statement is simply reserving a register that it can use as scratch space. Including "memory" in the
side effects list indicates that the __asm statement reads from and writes to memory. The compiler
must therefore keep the __asm statement even though i is not used.
Introduction
The Arm C Language Extensions (ACLE) for SVE provide a set of types and accessors for SVE vectors
and predicates, and a function interface for all relevant SVE and SVE2 instructions.
The function interface is more general than the underlying architecture, so not every function maps
directly to an architectural instruction. The intention is to provide a regular interface and leave the
compiler to pick the best mapping to SVE or SVE2 instructions.
The Arm C Language Extensions specification has a detailed description of this interface, and must
be used as the primary reference. This section introduces a selection of features to help you get
started with the ACLE for SVE.
#ifdef __ARM_FEATURE_SVE
#include <arm_sve.h>
#endif /* __ARM_FEATURE_SVE */
All functions and types that are defined in the header file have the prefix sv, to reduce the chance
of collisions with other extensions.
svint8_t svuint8_t
For example, svint64_t represents a vector of 64-bit signed integers, and svfloat16_t represents
a vector of half-precision floating-point numbers.
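As a minimal sketch of these types in use, the following hypothetical function adds two float arrays with a predicated, vector-length-agnostic loop. svwhilelt_b32() builds a predicate that disables the lanes past the end of the arrays, so no separate tail loop is needed. The sketch uses the overloaded forms of svld1, svadd_x, and svst1, and assumes that the file is compiled for an SVE-enabled target. Otherwise, the #else branch supplies a scalar fallback.

```c
#include <stdint.h>
#ifdef __ARM_FEATURE_SVE
#include <arm_sve.h>
#endif /* __ARM_FEATURE_SVE */

/* Hypothetical example: out[i] = a[i] + b[i] for i in [0, n). */
void vadd_f32(float *out, const float *a, const float *b, int64_t n)
{
#ifdef __ARM_FEATURE_SVE
    /* svcntw() is the number of 32-bit lanes per vector, known only at run time. */
    for (int64_t i = 0; i < n; i += (int64_t)svcntw()) {
        svbool_t pg = svwhilelt_b32(i, n);        /* lanes with i + k < n are active */
        svfloat32_t va = svld1(pg, &a[i]);        /* inactive lanes read no memory */
        svfloat32_t vb = svld1(pg, &b[i]);
        svst1(pg, &out[i], svadd_x(pg, va, vb));  /* inactive lanes are not stored */
    }
#else
    for (int64_t i = 0; i < n; ++i)               /* scalar fallback */
        out[i] = a[i] + b[i];
#endif
}
```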
The main use of predicates is to select elements in a vector. When the elements in the vector
have N bytes, only the low bit in each sequence of N predicate bits is significant, as shown in the
following table:
Because of their unknown size at compile time, SVE types must not be used:
• To declare or define a static or thread-local storage variable.
• As the type of an array element.
• As the operand to a new expression.
• As the type of object that is deleted by a delete expression.
• As the argument to sizeof and _Alignof.
• With pointer arithmetic on pointers to SVE objects (this affects the +, -, ++, and -- operators).
• As members of unions, structures and classes.
• In standard library containers like std::vector.
For a comprehensive list of valid usage, refer to the Arm C Language Extensions specification.
sv<base>[_<disambiguator>][_<type0>][_<type1>]...[_<predication>]
<base>
For most functions, this name is the lowercase name of the SVE instruction. Sometimes,
letters indicating the type or size of data being operated on are omitted, where it can be
implied from the argument types.
Unsigned extending loads add a u to indicate that the data is zero extended, to more
explicitly differentiate them from their signed equivalent.
<disambiguator>
This field distinguishes between different forms of a function, for example:
• To distinguish between addressing modes
• To distinguish forms that take a scalar rather than a vector as the final argument.
<type0> <type1> ...
A list of types for vectors and predicates, starting with the return type and then each
argument type. For example, _s8, _u32, and _f32 represent a signed 8-bit integer, an
unsigned 32-bit integer, and a single-precision 32-bit floating-point type, respectively.
Predicate types are represented by, for example, _b8 and _b16, for predicates suitable
for 8-bit and 16-bit types respectively. A predicate type suitable for all element types is
represented by _b. Where a type is not needed to disambiguate between variants of a base
function, it is omitted.
<predication>
This suffix describes the inactive elements in the result of a predicated operation. It can be
one of the following:
• z - Zero predication: set all inactive elements of the result to zero.
• m - Merge predication: copy all inactive elements from the first vector argument.
• x - 'Don't care' predication: use this form when you do not care about the inactive
elements. The compiler is then free to choose between zeroing, merging, or unpredicated
forms to give the best code quality, but gives no guarantee of what data is left in inactive
elements.
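The difference between the three suffixes can be sketched with a scalar model of a predicated add, r = a + b, where pg marks the active lanes. This is a hypothetical model written in plain C, not an implementation of the intrinsics. For the _x case, the real intrinsics leave the inactive lanes unspecified, and the model simply leaves them untouched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar model of how _z, _m and _x treat inactive lanes of r = a + b. */
typedef enum { PRED_Z, PRED_M, PRED_X } pred_kind;

void model_pred_add(int32_t *r, const bool *pg, const int32_t *a,
                    const int32_t *b, int lanes, pred_kind kind)
{
    for (int k = 0; k < lanes; ++k) {
        if (pg[k])
            r[k] = a[k] + b[k];   /* active lane: always computed */
        else if (kind == PRED_Z)
            r[k] = 0;             /* _z: inactive lanes are zeroed */
        else if (kind == PRED_M)
            r[k] = a[k];          /* _m: copied from the first vector argument */
        /* _x: unspecified; this model leaves r[k] unchanged */
    }
}
```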
Addressing modes
Load, store, prefetch, and ADR functions have arguments that describe the memory area being
addressed. The first addressing argument is the base - either a single pointer to an element type,
or a 32-bit or 64-bit vector of addresses. The second argument, when present, offsets the base (or
bases) by some number of bytes, elements, or vectors. This offset argument can be an immediate
constant value, a scalar argument, or a vector of offsets.
Not every combination of the addressing modes exists. The following table gives examples of some
common addressing mode disambiguators, and describes how to interpret the address arguments:
Disambiguator | Interpretation
_s32offset, _s64offset, _u32offset, _u64offset | The offset argument is a vector of byte offsets. These offsets are signed or unsigned 32-bit or 64-bit numbers.
_s32index, _s64index, _u32index, _u64index | The offset argument is a vector of element-sized indices. These indices are signed or unsigned 32-bit or 64-bit numbers.
_offset | The offset argument is a scalar, and must be treated as a byte offset.
_index | The offset argument is a scalar, and must be treated as an index into an array of elements.
_vnum | The offset argument is a scalar, and must be treated as an index into an array of SVE vectors.
For example, the following gather load takes a base pointer and a vector of signed 32-bit indices:
svuint32_t svld1_gather_[s32]index[_u32](svbool_t pg, const uint32_t *base, svint32_t indices)
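The _index and _offset forms differ only in how the offset is scaled. The following scalar model, with hypothetical names and a hypothetical fixed vector length, sketches a 32-bit gather load in both forms. It follows the behavior described for predicated loads later in this section: inactive lanes perform no load and produce 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VL 4  /* hypothetical number of 32-bit lanes */

/* _index form: lane i reads base[indices[i]], so each index is scaled by sizeof(uint32_t). */
static void gather_index_u32(const bool pg[VL], const uint32_t *base,
                             const int32_t indices[VL], uint32_t out[VL])
{
    assert(base != NULL);
    for (int i = 0; i < VL; i++) {
        out[i] = pg[i] ? base[indices[i]] : 0;  /* inactive lanes: no load, result 0 */
    }
}

/* _offset form: lane i reads 4 bytes at (const unsigned char *)base + offsets[i],
 * so each offset is an unscaled byte offset. */
static void gather_offset_u32(const bool pg[VL], const uint32_t *base,
                              const int32_t offsets[VL], uint32_t out[VL])
{
    assert(base != NULL);
    for (int i = 0; i < VL; i++) {
        if (pg[i]) {
            memcpy(&out[i], (const unsigned char *)base + offsets[i], sizeof out[i]);
        } else {
            out[i] = 0;                         /* inactive lanes: no load, result 0 */
        }
    }
}
```

With uint32_t elements, index k and byte offset 4 * k select the same element.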
Similarly, arithmetic functions that take three vector inputs have an alternative form that takes two
vectors and one scalar.
To differentiate these forms, the disambiguator _n is added to the form that takes a scalar.
Short forms
Sometimes, it is possible to omit part of the full name, and still uniquely identify the correct form
of a function, by inspecting the argument types. Where omitting part of the full name is possible,
these simplified forms are provided as aliases to their fully named equivalents, and are used in
preference in the rest of this document.
In the Arm C Language Extensions specification, the portion that can be removed is enclosed in
square brackets. For example svclz[_s16]_m has the full name svclz_s16_m, and an overloaded
alias, svclz_m.
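The overloaded short forms behave much like C11 _Generic selection: the argument type picks the fully named variant. The sketch below is only an analogy with hypothetical names (clz_u16, clz_u32, and the short form clz). It does not show how the SVE header implements its aliases.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fully named variants: the element type is part of the name. */
static int clz_u16(uint16_t x)
{
    int n = 0;
    for (uint32_t mask = 0x8000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        n++;
    }
    return n;  /* 16 when x == 0 */
}

static int clz_u32(uint32_t x)
{
    int n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        n++;
    }
    return n;  /* 32 when x == 0 */
}

/* Short form: the argument type selects the fully named variant at compile time. */
#define clz(x) _Generic((x), uint16_t: clz_u16, uint32_t: clz_u32)(x)
```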
SVE2 intrinsics
SVE2 builds on SVE to add data-processing instructions that bring the benefits of scalable long
vectors to a wider class of applications. To enable only the base SVE2 instructions, use the +sve2
option with the armclang options -march or -mcpu. To enable additional optional SVE2 instructions,
use the following armclang options:
• +sve2-aes to enable scalable vector forms of AESD, AESE, AESIMC, AESMC, PMULLB, and PMULLT
instructions.
• +sve2-bitperm to enable the BDEP, BEXT, and BGRP instructions.
• +sve2-sha3 to enable scalable vector forms of the RAX1 instruction.
• +sve2-sm4 to enable scalable vector forms of SM4E and SM4EKEY instructions.
You can use one or more of these options. Each option also implies +sve2. For example,
+sve2-aes+sve2-bitperm+sve2-sha3+sve2-sm4 enables all base and optional instructions. For clarity,
you can also include +sve2 explicitly.
See -march and -mcpu in the Arm Compiler for Embedded Reference Guide for more information.
This example presents a step-1 daxpy implementation, where the indices of x and y start at 0 and
increment by 1 for each iteration. A C code implementation might look like the following:
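A minimal scalar sketch of such a step-1 daxpy computes dy[i] = da * dx[i] + dy[i]. The parameter order (n, da, dx, dy) is an assumption, chosen to match the registers used by the daxpy_1_1 assembly listing in this section (n in X0, da in D0, dx in X1, dy in X2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step-1 daxpy: dy[i] = da * dx[i] + dy[i] for 0 <= i < n.
 * Does nothing when n <= 0. */
void daxpy_1_1(int64_t n, double da, const double *dx, double *dy)
{
    assert(n <= 0 || (dx != NULL && dy != NULL));
    for (int64_t i = 0; i < n; i++) {
        dy[i] = da * dx[i] + dy[i];
    }
}
```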
Example notes
[1] - Initialize a predicate register to control the loop. _b64 specifies a predicate for 64-
bit elements. Conceptually, this operation creates an integer vector starting at i and
incrementing by 1 in each subsequent lane. The predicate lane is active if this value is less
than n. Therefore, this loop is safe, if inefficient, even if n ≤ 0. The same operation is used at
the bottom of the loop, to update the predicate for the next iteration.
[2] - Load some values into an SVE vector, which is guarded by the loop predicate. Lanes
where this predicate is false do not perform any load (and so do not generate a fault), and set
the result value to 0.0. The number of lanes that are loaded depends on the vector width,
which is only known at runtime.
[3] - Perform a floating-point multiply-add operation, and pass the result to a store. The _x on
the MLA indicates we do not care about the result for inactive lanes. This gives the compiler
maximum flexibility in choosing the most efficient instruction. The result of this operation is
stored at address &dy[i], guarded by the loop predicate. Lanes where the predicate is false
are not stored, and the value in memory retains its prior value.
[5] - ptest returns true if any lane of the (newly updated) predicate is active, which causes
control to return to the start of the while loop if there is any work left to do.
daxpy_1_1:
MOV Z2.D, D0 // da
MOV X3, #0 // i
WHILELT P0.D, X3, X0 // i, n
loop:
LD1D Z1.D, P0/Z, [X1, X3, LSL #3]
LD1D Z0.D, P0/Z, [X2, X3, LSL #3]
FMLA Z0.D, P0/M, Z1.D, Z2.D
ST1D Z0.D, P0, [X2, X3, LSL #3]
INCD X3 // i
WHILELT P0.D, X3, X0 // i, n
B.ANY loop
RET
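The control flow of the daxpy_1_1 listing can be modeled in portable C to show why it is safe for any n, including n ≤ 0. In this sketch, whilelt() and any_active() are hypothetical stand-ins for WHILELT and the B.ANY test, and VL is a fixed, hypothetical lane count. Real SVE code learns the vector length at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of 64-bit lanes, fixed only to make the model concrete. */
#define VL 4

/* Model of WHILELT: lane k is active if i + k < n (signed comparison). */
static void whilelt(int64_t i, int64_t n, bool pg[VL])
{
    for (int k = 0; k < VL; k++) {
        pg[k] = (i + k) < n;
    }
}

/* Model of the B.ANY test: true if any lane of the predicate is active. */
static bool any_active(const bool pg[VL])
{
    for (int k = 0; k < VL; k++) {
        if (pg[k]) {
            return true;
        }
    }
    return false;
}

/* Structured like daxpy_1_1: the body runs once before the first test, so with
 * n <= 0 it executes one iteration with every lane inactive (safe, if wasteful). */
void daxpy_predicated_model(int64_t n, double da, const double *dx, double *dy)
{
    bool pg[VL];
    int64_t i = 0;

    assert(n <= 0 || (dx != NULL && dy != NULL));
    whilelt(i, n, pg);                  /* initial predicate */
    do {
        for (int k = 0; k < VL; k++) {
            if (pg[k]) {                /* inactive lanes neither load nor store */
                dy[i + k] = da * dx[i + k] + dy[i + k];
            }
        }
        i += VL;                        /* INCD: advance by the vector length */
        whilelt(i, n, pg);              /* predicate for the next iteration */
    } while (any_active(pg));           /* B.ANY: loop while work remains */
}
```

Because the predicate masks the final partial vector, no separate scalar tail loop is needed when n is not a multiple of the vector length.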
Example notes
[1] - For each of x and y, initialize a vector of indices, starting at 0 for the first lane and
incrementing by incx and incy respectively in each subsequent lane.
[3] - Load a vector's worth of values, which are guarded by the loop predicate. Lanes where
this predicate is false do not perform any load (and so do not generate a fault), and set the
result value to 0.0. This time, a base + vector-of-indices gather load is used to load the
required non-consecutive values.
[4] - Perform a floating-point multiply-add operation, and pass the result to a store. This time,
the base + vector-of-indices scatter store is used to store each result in the correct index of
the dy[] array.
[5] - Instead of using i to calculate the load address, increment the base pointer by the vector
length multiplied by the stride.
[7] - Test the loop predicate to work out whether there is any more work to do, and loop
back if appropriate.
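The strided variant can be modeled in the same way. In this sketch, the gather loads and the scatter store become indexed array accesses through per-lane index vectors, and the base advances by VL times the stride after each iteration. The base is kept as an integer element offset rather than a pointer, to stay within the C pointer-arithmetic rules, and the strides are assumed to be positive. The function names and VL are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VL 4  /* hypothetical number of 64-bit lanes */

static void whilelt(int64_t i, int64_t n, bool pg[VL])
{
    for (int k = 0; k < VL; k++) {
        pg[k] = (i + k) < n;
    }
}

static bool any_active(const bool pg[VL])
{
    for (int k = 0; k < VL; k++) {
        if (pg[k]) {
            return true;
        }
    }
    return false;
}

/* dy[j*incy] = da * dx[j*incx] + dy[j*incy] for 0 <= j < n, positive strides only. */
void daxpy_strided_model(int64_t n, double da, const double *dx, int64_t incx,
                         double *dy, int64_t incy)
{
    int64_t idx_x[VL], idx_y[VL];
    int64_t base_x = 0, base_y = 0;   /* base offsets, in elements */
    int64_t i = 0;
    bool pg[VL];

    assert(incx > 0 && incy > 0);
    for (int k = 0; k < VL; k++) {    /* index vectors: 0, inc, 2*inc, ... */
        idx_x[k] = k * incx;
        idx_y[k] = k * incy;
    }

    whilelt(i, n, pg);
    do {
        for (int k = 0; k < VL; k++) {
            if (pg[k]) {
                double xv = dx[base_x + idx_x[k]];      /* gather load */
                double yv = dy[base_y + idx_y[k]];      /* gather load */
                dy[base_y + idx_y[k]] = da * xv + yv;   /* scatter store */
            }
        }
        base_x += VL * incx;          /* advance by vector length * stride */
        base_y += VL * incy;
        i += VL;
        whilelt(i, n, pg);
    } while (any_active(pg));
}
```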
Alignment support in Arm Compiler for Embedded 6
When a processor accesses instructions and data, the access is either aligned or unaligned. An
access is aligned if the address is a multiple of the element size. Otherwise, the access is unaligned.
The element size depends on the processor architecture and the data type, such as char and int.
For types such as structures, the alignment might be more complicated depending on the type of
each structure member.
Alignment has two distinct aspects: instruction alignment and data access alignment.
Instruction alignment
Instructions in the Arm architecture are aligned as follows:
• A32 and A64 instructions are word-aligned.
• T32 and ThumbEE instructions are halfword-aligned.
• Java bytecodes are byte-aligned.
Instruction alignment is defined as a power of 2. That is, an address a is 2^n-byte aligned only if
it is a multiple of 2^n.
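Because the alignment is a power of 2, checking it needs only a bit mask: an address is 2^n-byte aligned when its low n bits are zero. A minimal sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr is a multiple of align, where align is 2^n. */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);  /* align must be a power of 2 */
    return (addr & (align - 1)) == 0;                   /* low n bits must be zero */
}
```

For example, is_aligned(pc, 4) must hold for an A32 or A64 instruction address, and is_aligned(pc, 2) for a T32 instruction address, where pc is a hypothetical variable holding the fetch address.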
Any attempt to fetch an instruction from a misaligned location results in a PC alignment fault.
For a variable x of a basic type with a size of n bytes, such as int, x is aligned only if it is
placed at an n-byte aligned address. However, the size of more complex types does not contribute to
the alignment in the same way as for basic data types.
For a complex data type, such as a structure, the alignment is that of the member with the largest
alignment. Also, if all members in a structure are aligned, then the structure is aligned.
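C11 exposes these rules through _Alignof and _Static_assert. The following sketch assumes a typical ABI where int is at least as strictly aligned as short and char; under that assumption, the structure takes the alignment of its int member. The structure name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct mixed {
    char  a;   /* alignment 1 */
    short b;   /* alignment 2 on common ABIs */
    int   c;   /* alignment 4 on common ABIs */
};

/* The structure takes the alignment of its most-aligned member, and its size
 * is rounded up to a multiple of that alignment. */
_Static_assert(_Alignof(struct mixed) == _Alignof(int),
               "struct alignment follows its most-aligned member");
_Static_assert(sizeof(struct mixed) % _Alignof(struct mixed) == 0,
               "struct size is a multiple of its alignment");
```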
The following table shows the natural alignment requirement for some basic data types:
Table 7-1: Armv8 AArch32 alignment requirements of load and store instructions
Type | Size in bytes (bits) | Natural alignment requirement
char | 1 (8) | Address divisible by 1 (always aligned)
In practice, data might not always be aligned. You can override the natural alignment of a variable
in your source code with attributes or keywords, such as the __attribute__((aligned)) variable
attribute. Overriding the alignment, for example with attributes such as __attribute__((packed)),
or through unsafe pointer casts, can ultimately cause the compiler to generate code with unaligned
accesses. However, unaligned accesses might cause alignment faults. Your code might or might not
execute without fault depending on:
• Whether the processor supports unaligned accesses.
• Whether the instruction generated supports unaligned accesses. For example, LDRD does
not support unaligned accesses and generates a run-time exception if it attempts to access
unaligned data.
If natural alignment is the most efficient way that a processor can access data, why change it?
Using a custom alignment can significantly improve performance or save memory, especially with
structures.
For more information on Normal and Device memory and restrictions for each supported
architecture, see:
• ARMv6-M Architecture Reference Manual.
• ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition.
• ARMv7-M Architecture Reference Manual.
• Arm Architecture Reference Manual for A-profile architecture, for Armv8-A and Armv9-A
architectures.
• Armv8-M Architecture Reference Manual.
For example, the Arm® Cortex®-M0 processor does not support unaligned accesses. Therefore, if an
instruction accesses data that does not lie on its natural boundary, such as a word that does not
lie on a word boundary, the processor generates an Alignment fault.
Processors that implement architectures such as Armv7 can support unaligned accesses. In that case,
if an instruction loads data from memory, and this data does not lie on a word boundary, the
processor still completes the transaction. However, these unaligned accesses have a cost.
For example, data might begin at an address that is not divisible by 4, such as address 0x1001. In
this case, the processor must first access the data at address 0x1000 and then perform additional
work to extract the required data value starting at byte 0x1001. This operation takes time and
lowers performance. Therefore, having all data addresses aligned is more efficient. Data that spans
page boundaries or cache lines can also increase the number of transactions, and degrade
performance as a result.
Access alignment relates to the lower levels of the software stack, rather than to the source-code
level. It concerns the individual memory transactions that instructions perform, each of which
might access only part of a more complex piece of data.
For example, fetching a complex type from memory, such as a C struct that contains char and short
members, might require multiple load instructions. Whether the accesses are aligned is determined
by checking each load instruction. A load is aligned when the address of the load, after applying
offsets, is divisible by the size of the load. That is, when the natural boundaries for that access
size are honored.
Table note
[1] Assumes that the processor supports unaligned accesses.
Arm®v7 and later architectures must support unaligned data accesses for some load and store
instructions.
Table 7-3: Armv8 AArch32 alignment requirements of load and store instructions
Instructions | Alignment check | Result if check fails when SCTLR.A or HSCTLR.A is 0 | Result if check fails when SCTLR.A or HSCTLR.A is 1
LDRB, LDREXB, LDRBT, LDRSB, LDRSBT, STRB, STREXB, STRBT, TBB | None | - | -
LDRH, LDRHT, LDRSH, LDRSHT, STRH, STRHT, TBH | Halfword | Unaligned access | Alignment fault
LDREXH, STREXH, LDAH, STLH, LDAEXH, STLEXH | Halfword | Alignment fault | Alignment fault
LDR, LDRT, STR, STRT | Word | Unaligned access | Alignment fault
Table note
[1] The : character is the preferred separator, but @<align> is also supported.
Table notes
[1] These element and structure load and store instructions are only in the Advanced SIMD
Extension to the A32 and T32 instruction sets.
[2] The : character is the preferred separator, but @<align> is also supported.
Support for unaligned accesses is limited to a subset of load and store instructions:
• LDRB, LDRSB, and STRB.
• LDRH, LDRSH, and STRH.
• LDR and STR.
Also, unaligned accesses are only allowed to regions marked as Normal memory type. To enable
unaligned access support, clear the SCTLR.A bit in the system control coprocessor. When SCTLR.A is
1, alignment checking is enabled and the accesses shown in Table 7-3 fault instead. Attempts to
perform unaligned accesses when not allowed cause an Alignment fault, which is taken as a Data
Abort exception. See Unaligned data access for more information.
For example:
Most modern Arm processors have 64-bit or 128-bit interfaces. In this example, a processor
typically reads the 64-bit or 128-bit block containing bytes 0x8001, 0x8002, 0x8003, and 0x8004.
The processor discards the other bytes.
The four bytes of this load span both a 64-bit and 128-bit boundary. Therefore, with either
interface width, the processor has to perform two reads.
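To make these read counts concrete, the following sketch counts the aligned blocks of a given interface width that an access touches. It is a simplified model that ignores caches and bus protocols. The function name blocks_touched and the second address, 0x800E, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of width-aligned blocks touched by an access of size bytes at addr.
 * width is the interface width in bytes (for example 8 or 16) and must be a power of 2. */
static unsigned blocks_touched(uintptr_t addr, size_t size, uintptr_t width)
{
    assert(size > 0 && width != 0 && (width & (width - 1)) == 0);
    uintptr_t first = addr & ~(width - 1);              /* block holding the first byte */
    uintptr_t last = (addr + size - 1) & ~(width - 1);  /* block holding the last byte */
    return (unsigned)((last - first) / width + 1);
}
```

With either an 8-byte or a 16-byte interface, the 4-byte access at 0x8001 touches one block. A 4-byte access at 0x800E covers bytes 0x800E to 0x8011, crosses the boundary at 0x8010, and touches two blocks with either width.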
In both of these examples, unaligned accesses require more work from the hardware. Although this
hardware support is more efficient than the software routines that earlier processors required, it
is still less efficient than aligned accesses.
Pointer alignment in C
When compiling C, variables are by default architecturally aligned. A global of type int or
uint32_t is 4-byte aligned in memory. Similarly, a pointer of type int* is expected to contain a
4-byte aligned address.
Where this is not the case, or might not be the case, you must mark the variable or pointer with
the __unaligned keyword. This keyword tells the compiler that the variable or pointer is
potentially unaligned. That is, it reduces the expected alignment of the pointer to 1 byte. For more
information, see __unaligned.
For a structure layout, you must use the __attribute__((packed)) variable or type attribute to
ensure the smallest possible alignment of structure members. For more information, see:
• __attribute__((packed)) type attribute.
• __attribute__((packed)) variable attribute.
Compiler assumptions
When compiling for an Armv7-A or Armv7-R processor, Arm Compiler for Embedded assumes that
it can use unaligned accesses.
The -mno-unaligned-access option tells the compiler not to knowingly generate unaligned
accesses. What is the significance of knowingly?
As mentioned in the previous section, a pointer must contain an address with correct alignment for
the type:
• uint32_t* requires 4-byte alignment.
• uint16_t* requires 2-byte alignment.
• uint8_t* requires 1-byte alignment.
The compiler generates code on the assumption that a pointer is correctly aligned. It does not add
code to perform run-time checks. A pointer might contain an incorrectly aligned address for many
reasons. A common cause is casting, for example:
uint8_t tmp;
uint32_t* pMyPointer = (uint32_t*)(&tmp);
This code takes the address of a uint8_t variable, then casts that address as a uint32_t
pointer. The compiler still assumes that pMyPointer is correctly aligned for a uint32_t pointer.
The compiler might then unknowingly generate code that results in an unaligned access.
You can avoid this situation with the __unaligned qualifier, for example:
uint8_t tmp;
__unaligned uint32_t* pMyPointer = (__unaligned uint32_t*)(&tmp);
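In code that must also build with compilers that do not provide the __unaligned keyword, a widely used portable alternative (not the __unaligned keyword itself) is to copy the bytes with memcpy(). Because memcpy() makes no alignment assumption, the compiler is free to pick suitable instructions. A minimal sketch; load_u32_any_alignment is a hypothetical name, and this approach is intended for Normal memory only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a uint32_t from an address with no alignment guarantee. */
static uint32_t load_u32_any_alignment(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);  /* no alignment assumption about p */
    return v;
}
```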
Code Generation
When unaligned accesses are permitted, the compiler continues to use instructions that support
unaligned accesses, such as LDR and STR, for accesses through __unaligned pointers. However, it
does not use instructions that do not support unaligned accesses, such as LDM.
When unaligned accesses are not permitted, because you specified the compiler option
-mno-unaligned-access, the compiler accesses __unaligned data by performing a number of aligned
accesses. Usually, it does this by calling a library function such as __aeabi_uread4().
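The following is a hypothetical sketch of the kind of work such a routine does: four byte loads, which are always aligned, combined with shifts and ORs. It is not the Arm library implementation of __aeabi_uread4(). The name uread4_bytewise is an assumption, and the byte order is assumed to be little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assembles a little-endian 32-bit value from four byte loads. */
static uint32_t uread4_bytewise(const void *address)
{
    const uint8_t *p = (const uint8_t *)address;
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The shifts and ORs here are the extra operations that the Performance discussion refers to.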
Device Memory
Address regions that access peripherals rather than memory must be marked as Device memory.
Depending on the processor, this memory might be configured in the Memory Protection Unit (MPU)
or the Memory Management Unit (MMU). Unaligned accesses are not permitted to these regions
even when unaligned access support is enabled. If an unaligned access is attempted, the processor
generates an Alignment fault.
The compiler does not have any information about which address ranges are Device memory.
Therefore, it is your responsibility to ensure the alignment of accesses to devices. In practice,
peripheral registers are usually at aligned addresses. It is also usual to access peripheral registers
through volatile variables or pointers. Use of volatile restricts the compiler to accessing
the data with the size of access specified where possible. For more information on the restrictions
imposed on volatile types, see the Volatile Data Types section of the Procedure Call Standard for
the Arm Architecture.
It is also necessary to avoid using C library functions such as memcpy() to access Device memory,
because there is no guarantee of the type of accesses these functions use. If you need to copy a
buffer of memory to Device memory, you must provide a suitable copying routine and call this
routine instead of memcpy().
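A suitable copying routine typically uses volatile accesses of an explicit, aligned size. The following hypothetical sketch copies whole 32-bit words with one volatile store per word. It assumes that both buffers are 4-byte aligned and that the device accepts 32-bit writes, which you must confirm for your hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies count 32-bit words to Device memory, one aligned volatile store per word. */
static void copy_words_to_device(volatile uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(((uintptr_t)dst & 3u) == 0 && ((uintptr_t)src & 3u) == 0);  /* 4-byte aligned */
    for (size_t i = 0; i < count; i++) {
        dst[i] = src[i];  /* volatile: each store is performed, in order, as a 32-bit access */
    }
}
```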
Performance
If code frequently accesses unaligned data, there might be a performance advantage to enabling
unaligned accesses. However, the extent of this advantage depends on many factors. Even though
this support allows a single instruction to access unaligned data, it often requires multiple bus
accesses to occur. Therefore, the bus transactions performed by an unaligned access might be
similar to those performed by the multiple instructions used when unaligned access support is
disabled. The code without unaligned access support has to perform various shift and logical
operations. However, on a multi-issue processor the execution time of these operations might
be hidden by executing them in parallel with the memory accesses. There is also a function call
overhead when using functions such as __aeabi_uread4(), though branch prediction might reduce
the impact of using these functions.
Related information
-munaligned-access, -mno-unaligned-access
__unaligned
Volatile variables
How can I debug an Arm AArch64 Alignment Abort?
memcpy and memset unaligned access and alignment fault
How a C compiler places basic C data types in memory is not arbitrary. Data does not normally
start at arbitrary byte addresses in memory. Rather, each type except char has an alignment
requirement:
• A single-byte char can start on any byte address.
• A 2-byte short must start on an even address.
• A 4-byte int or float must start on an address divisible by 4.
• An 8-byte long or double must start on an address divisible by 8.
That is, basic C types on a standard Instruction Set Architecture (ISA) are self-aligned. Pointers,
whether 32-bit (4-byte) or 64-bit (8-byte) are also self-aligned.
Self-alignment makes access faster because it allows each typed value to be loaded or stored with a
single instruction. Without alignment constraints, the code might have to perform two or more
accesses for values that span machine-word boundaries. Characters are a special case: they are
equally expensive wherever they are located inside a single machine word, which is why they do not
have a preferred alignment.
To ensure natural alignment, it might be necessary to insert some padding between structure
elements or after the last element of a structure.
For more information, see Alignment at the source code and compilation level.
typedef struct
{
char a;
int b;
char c;
short d;
} my_struct_t;
After compiling, the layout in memory is determined by the int type, because that has the highest
alignment. For example:
For this example and the following examples, the most important part of the
address is the last two hexadecimal digits. The ?????? prefix stands for any address
where the compiler might place the data.
However, by changing the layout of the structure in the source code, you can help the compiler
reduce or remove the padding. For this example, change the struct to:
typedef struct
{
char a;
char c;
short d;
int b;
} my_struct_t;
typedef struct
{
char a;
short d;
int b;
} my_struct_t;
As a consequence, not only is the data aligned in memory, but all accesses and all generated
instructions are aligned.
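The effect of member order on padding can be checked with offsetof and sizeof. This sketch assumes a 2-byte short and a 4-byte int with natural alignment, as on AArch32, AArch64, and most other common ABIs. The typedef names are hypothetical versions of the three layouts above.

```c
#include <assert.h>
#include <stddef.h>

/* a@0, 3 bytes padding, b@4, c@8, 1 byte padding, d@10: 12 bytes in total. */
typedef struct { char a; int b; char c; short d; } original_t;

/* a@0, c@1, d@2, b@4: 8 bytes, no padding. */
typedef struct { char a; char c; short d; int b; } reordered_t;

/* a@0, 1 byte padding, d@2, b@4: 8 bytes. */
typedef struct { char a; short d; int b; } three_member_t;
```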
#include <stdio.h>
struct my_struct
{
char a;
short b;
int c;
};
struct my_struct f;
int main(void)
{
printf("%d\n", f.a + f.b + f.c);
return 0;
}
...
main
0x00000000: b580 .. PUSH {r7,lr}
0x00000002: f2400000 @... MOVW r0,#:LOWER16: f
0x00000006: f2c00000 .... MOVT r0,#:UPPER16: f
0x0000000a: 7801 .x LDRB r1,[r0,#0]
0x0000000c: f9b02002 ... LDRSH r2,[r0,#2]
0x00000010: 6840 @h LDR r0,[r0,#4]
0x00000012: 4411 .D ADD r1,r1,r2
0x00000014: 4401 .D ADD r1,r1,r0
0x00000016: a002 .. ADR r0,{pc}+0xa ; 0x20
0x00000018: f7fffffe .... BL __2printf
0x0000001c: 2000 . MOVS r0,#0
0x0000001e: bd80 .. POP {r7,pc}
...
# Symbol Name Value Bind Sec Type Vis Size
========================================================================
...
7 f 0x00000000 Gb 7 Data Hi 0x8
...
Register r0 holds the address of f, which has the 4-byte alignment of its int member. Therefore,
the LDRB at offset 0, the LDRSH at offset 2, and the LDR at offset 4 are all aligned accesses.
However, if space is a constraint you can force the compiler to overlook the alignment
requirements to save space. Arm® Compiler for Embedded 6 provides this feature with the
__attribute__((packed)) type attribute. For more information, see __attribute__((packed)) type
attribute.
#include <stdio.h>
struct __attribute__((packed)) my_struct
{
char a;
short b;
int c;
};
View the contents of the object file using the fromelf command:
...
main
0x00000000: b580 .. PUSH {r7,lr}
0x00000002: f2400000 @... MOVW r0,#:LOWER16: f
0x00000006: f2c00000 .... MOVT r0,#:UPPER16: f
0x0000000a: 7801 .x LDRB r1,[r0,#0]
0x0000000c: f9b02001 ... LDRSH r2,[r0,#1]
0x00000010: f8d00003 .... LDR r0,[r0,#3]
0x00000014: 4411 .D ADD r1,r1,r2
0x00000016: 4401 .D ADD r1,r1,r0
0x00000018: a002 .. ADR r0,{pc}+0xc ; 0x24
0x0000001a: f7fffffe .... BL __2printf
0x0000001e: 2000 . MOVS r0,#0
0x00000020: bd80 .. POP {r7,pc}
0x00000022: bf00 .. NOP
...
# Symbol Name Value Bind Sec Type Vis Size
========================================================================
...
7 f 0x00000000 Gb 7 Data Hi 0x7
...
You can see that the size of f has changed to 7 bytes rather than 8 in the unpacked version.
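The size difference can also be checked in C. The following sketch assumes a GCC-compatible compiler, such as armclang, that accepts __attribute__((packed)), and the same 2-byte short and 4-byte int as the listings. The packed offsets correspond to the #1 and #3 offsets in the disassembly above. The name my_struct_packed is hypothetical, chosen so that both layouts can coexist.

```c
#include <assert.h>
#include <stddef.h>

/* Natural layout: a@0, 1 byte padding, b@2, c@4, 8 bytes. */
struct my_struct {
    char a;
    short b;
    int c;
};

/* Packed layout: a@0, b@1, c@3, 7 bytes, no padding. */
struct __attribute__((packed)) my_struct_packed {
    char a;
    short b;
    int c;
};
```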
Although the members of f are no longer naturally aligned (b is at offset 1 and c is at offset 3),
you can force the compiler to perform only aligned accesses to the elements of f by using the
command-line option -mno-unaligned-access.
View the contents of the object file using the fromelf command:
...
main
0x00000000: b580 .. PUSH {r7,lr}
0x00000002: f2400000 @... MOVW r0,#:LOWER16: f
0x00000006: f2c00000 .... MOVT r0,#:UPPER16: f
0x0000000a: f9901002 .... LDRSB r1,[r0,#2]
0x0000000e: 7843 Cx LDRB r3,[r0,#1]
0x00000010: 7802 .x LDRB r2,[r0,#0]
0x00000012: f890c004 .... LDRB r12,[r0,#4]
0x00000016: ea432101 C..! ORR r1,r3,r1,LSL #8
0x0000001a: f8103f03 ...? LDRB r3,[r0,#3]!
0x0000001e: 4411 .D ADD r1,r1,r2
0x00000020: 7882 .x LDRB r2,[r0,#2]
0x00000022: 78c0 .x LDRB r0,[r0,#3]
0x00000024: ea43230c C..# ORR r3,r3,r12,LSL #8
0x00000028: ea422000 B.. ORR r0,r2,r0,LSL #8
0x0000002c: ea434000 C..@ ORR r0,r3,r0,LSL #16
All the accesses performed are aligned, because they are all byte accesses (LDRB and LDRSB).
Therefore, the structure still occupies 7 bytes, but the code relies only on aligned accesses, at
the cost of more instructions. Although aligned accesses are useful for performance reasons, other
factors outside the control of the compiler might degrade performance, for example, accesses across
page boundaries and caching.
Unsafe casting occurs when you take the address of a variable of one data type, and then cast it to
a pointer to another data type with a larger alignment requirement.
Even if you compile with the -mno-unaligned-access option, such casts can still produce unaligned
accesses at the assembly level, because the compiler assumes that the resulting pointer is
correctly aligned.
Related information
-munaligned-access, -mno-unaligned-access
--unaligned_access, --no_unaligned_access
#include <stdio.h>
...
main
0x00000000: b580 .. PUSH {r7,lr}
0x00000002: f2400000 @... MOVW r0,#:LOWER16: c
0x00000006: f2c00000 .... MOVT r0,#:UPPER16: c
0x0000000a: 6800 .h LDR r0,[r0,#0]
0x0000000c: 6801 .h LDR r1,[r0,#0]
0x0000000e: a002 .. ADR r0,{pc}+0xa ; 0x18
0x00000010: f7fffffe .... BL __2printf
0x00000014: 2000 . MOVS r0,#0
0x00000016: bd80 .. POP {r7,pc}
...
The last LDR instruction is an unaligned access, because it fetches a 4-byte integer starting at
address 0xc001, which crosses a word boundary.
You can detect unsafe casts with the -Wcast-align compiler option, for example:
To abort the compilation when this situation occurs, use the -Werror=cast-align compiler option,
for example:
Although the code performs unaligned accesses, it can still run on a processor that allows
unaligned accesses. However, some instructions, such as LDRD, only allow aligned accesses.
Therefore, providing an unaligned address to LDRD causes a fault. In most cases, the compiler
ensures that LDRD instructions always work with aligned addresses. The only exception arises from
unsafe pointer casting.
#include <stdio.h>
...
main
0x00000000: b580 .. PUSH {r7,lr}
0x00000002: f2400000 @... MOVW r0,#:LOWER16: c
0x00000006: f2c00000 .... MOVT r0,#:UPPER16: c
0x0000000a: 6800 .h LDR r0,[r0,#0]
0x0000000c: 21aa .! MOVS r1,#0xaa
0x0000000e: 7001 .p STRB r1,[r0,#0]
0x00000010: e9d01200 .... LDRD r1,r2,[r0,#0]
0x00000014: a002 .. ADR r0,{pc}+0xc ; 0x20
0x00000016: f7fffffe .... BL __2printf
0x0000001a: 2000 . MOVS r0,#0
0x0000001c: bd80 .. POP {r7,pc}
0x0000001e: bf00 .. NOP
You can still use the -Wcast-align and -Werror=cast-align compiler options to detect these
situations.
.globl ASMDELAY
ASMDELAY:
subs r0,r0,#1
bne ASMDELAY
bx lr
To experiment with different loop alignments, you can insert padding before the code. The
resulting performance difference depends on the padding that you insert.
In application code that you are developing, it is harder to quantify and qualify the effects of the
alignment on performance. The microarchitecture and cache interactions with your application can
influence the effects.
For example, a larger alignment generally improves execution performance. However, loops rely on
executing the same code repeatedly. Therefore, if the alignment is too large, the code might occupy
more memory than necessary and might not fit in the cache, which hinders performance.
Because this trade-off is difficult to quantify, making alignment decisions is difficult.
In general, you can try aligning loops and functions to the cache line size.
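The padding cost of a given alignment is simple arithmetic. The following sketch, with hypothetical helper names, rounds an address up to a 64-byte boundary, the typical cache line size mentioned below, and reports how many bytes of padding that requires.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds addr up to the next multiple of align (a power of 2). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Bytes of padding needed so that code or data starting at addr begins on an align boundary. */
static uintptr_t padding_to(uintptr_t addr, uintptr_t align)
{
    return align_up(addr, align) - addr;
}
```

For example, moving a loop that starts at 0x8004 to a 64-byte boundary costs 60 bytes of padding, which is the kind of memory overhead that must be weighed against any speedup.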
Processor caches transfer data from and to main memory in chunks called cache lines. A typical cache
line size is 64 bytes. Using an alignment larger than 64 bytes means crossing cache line boundaries,
which results in more fetches.
However, a larger cache line size could mean that data has enough space to be properly aligned,
and in general, the code would execute faster.
Alternatively, an alignment smaller than the cache line size might produce faster code because it
increases the use of the cache. However, fitting within these smaller boundaries might also mean
that data must be unaligned, which lowers performance.
The following [COMMUNITY] command-line options allow you to regulate the alignment of functions and
loops:
• -falign-functions.
• -falign-loops.
For more information about these options, see the Clang command line argument reference.
#include <stdio.h>
#include "struct_packed.c"
struct my_struct f;
int main(void)
{
printf("%d\n", f.a + f.b + f.c);
return 0;
}
Error: L6366E: main.o attributes are not compatible with the provided attributes .
Object main.o contains Build Attributes that are incompatible with the provided
attributes.
Tag_CPU_unaligned_access = The producer was permitted to generate architecture v6-
style unaligned data accesses (=1)
Finished: 2 information, 0 warning and 1 error messages.
If you add the -mno-unaligned-access option when compiling main.c, this error is not generated.
Related information
-munaligned-access, -mno-unaligned-access
--unaligned_access, --no_unaligned_access
Building for different target architectures
If your application includes assembly code, assembling with +nofp reports an error if
your assembly code contains floating-point instructions. Therefore, we recommend
that you assemble with both +nofp and -mabi=aapcs-soft.
Procedure
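1. Create the file main.c containing the following C code:
#include <stdio.h>
#include <math.h>
// test_nofp() is defined in another source file that is not shown here.
// This declaration assumes that it takes two float arguments and returns void.
void test_nofp(float a, float b);
int main(void)
{
    puts("Hello, world!");
    test_nofp(2.7f, -2.3f);
    return 0;
}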
Mapping Code and Data to the Target
You can specify object files directly on the command line or specify a user library containing object
files. The linker:
• Resolves symbolic references between the input object files.
• Extracts object modules from libraries to resolve otherwise unresolved symbolic references.
• Removes unused sections.
• Eliminates duplicate common groups and common code, data, and debug sections.
• Sorts input sections according to their attributes and names, and merges sections with similar
attributes and names into contiguous chunks.
• Organizes object fragments into memory regions according to the grouping and placement
information that is provided in a memory description.
• Assigns addresses to relocatable values.
• Generates either a partial object if requested, for input to another link step, or an executable
image.
The linker has a built-in memory description that it uses by default. However, you can override this
default memory description with command-line options or with a scatter file. The method that you
use depends on how much you want to control the placement of the various output sections in the
image:
• Allow the linker to automatically place the output sections using the default memory map for
the specified linking model. armlink uses default locations for the RO, RW, eXecute-Only (XO),
and ZI output sections.
• Use the memory map related command-line options to specify the locations of the RO, RW,
XO, and ZI output sections.
• Use a scatter file if you want to have the most control over where the linker places various
parts of your image. For example, you can place individual functions at specific addresses or
certain data structures at peripheral addresses.
XO sections are supported only for images that are targeted at Arm®v6-M, Armv7-
M, or Armv8-M architectures.
If the location of some code or data lies outside all the regions that are specified in your scatter file,
the linker attempts to create a load and execution region to contain that code or data.
Multiple code and data sections cannot occupy the same area of memory, unless
you place them in separate overlay regions.
The following table describes how the OVERLAY and PROTECTED scatter-loading attributes affect the
armlink options --merge and --merge_litpools. The terms const string and const value have the
following meanings:
const string
A string literal from an ELF section with the SHF_MERGE and SHF_STRINGS flags.
const value
A constant defined in a constant pool where the constant pool is in the same section as the
code that uses it.
| armlink option | Neither OVERLAY nor PROTECTED | OVERLAY | PROTECTED |
|---|---|---|---|
| --merge | … | … const strings within a region are merged. | … const strings within a region are merged. |
| --no_merge | Disables the merging of all const strings. | Disables the merging of all const strings. | Disables the merging of all const strings. |
| --merge_litpools | Merges all const values. | Prevents merging across regions marked OVERLAY. A const in an OVERLAY can be merged into a region that is not marked with either OVERLAY or PROTECTED. const values within a region are merged. | Prevents merging across regions marked PROTECTED with other regions. const values within a region are merged. |
| --no_merge_litpools | Disables the merging of all const values. | Disables the merging of all const values. | Disables the merging of all const values. |
Related information
--merge, --no_merge
--merge_litpools, --no_merge_litpools
Merging identical constants
Load region attributes
Execution region attributes
Properties of PIC
There are a number of ways of implementing PIC, each with its own set of trade-offs.
Relocation required
Relocation, sometimes called rebasing, is a model in which position independence is achieved only
by modifying the program at the locations that its relocations identify. In most models, an external
program such as a program loader applies the relocations to the read/write part of the program,
once, at load time. However, it is possible to bundle a loader into the program so that the program
can relocate itself.
PI models that require relocation by an external program are more flexible than those that do not,
but they require you to build a more complex loader.
Online or offline position independence
Most PI applications are relocated at run time, when the application loads. In many cases, the
run-time loader uses the ELF file and its data structures. It is also possible to construct a product
out of components such as a hypervisor and guest operating systems. When building a flash image,
it can help to construct the image from components that can be relocated when building the image,
even if the addresses are fixed at run time.
Shared Library Support or not
Supporting shared libraries presents some extra complexity. The library has its own code and
data separate from the program, and its address might not be known to the program at static
link time.
Fixed offset between code and data
A common implementation strategy, particularly when there is a Memory Management Unit
(MMU) available, is to place the data for a program at a fixed offset from the code. This
strategy permits PC-relative access to the data without relocations. This strategy might not
work for Cortex®-M processors, because each instantiation of the program requires the code
and data to be copied into RAM.
Data accessed through an offset from a static base
An alternative implementation strategy is also supported, particularly when there is no MMU
available. In this strategy, place all the data in a contiguous block of memory and reserve
a register, R9, as the static base. All data is accessed through offsets from the static base.
This strategy does not require any relationship between code and data addresses, so code can
be in flash and data can be at any point in RAM. The limitation of this strategy is that every
program and shared library has its own static base, so implementing shared libraries with their
own static data is more complicated.
For more information, see the Procedure Call Standard for the Arm Architecture.
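The static-base strategy can be modeled in ordinary C. The following sketch is not how the compiler generates RWPI code: an explicit pointer parameter stands in for R9, and the structure and function names are hypothetical. Because the shared code only uses offsets from the static base, each instance of the program can have its own RW data block anywhere in memory.

```c
/* Hypothetical RW data for one instance of a program. The offset of each
   member from the static base is fixed at static link time. */
struct rw_data {
    int counter;
    int last_value;
};

/* Shared code: every RW access goes through the static base
   (R9 in real RWPI code; here, the sb parameter). */
void record(struct rw_data *sb, int value)
{
    sb->counter += 1;
    sb->last_value = value;
}
```

Two struct rw_data objects at unrelated addresses can be passed to the same record() code, which mirrors two instances that share code in flash while each has its own data in RAM.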
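[Figure: PIC memory layout. The RO part contains the code, the PLT, and RO data; PC-relative relocations are resolved at static link time. The RW part, at a fixed offset from the RO part, contains the GOT, .got.plt, and RW data and ZI. The GOT can contain absolute or relative relocations, which might be resolved at load or run time.]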
For a more thorough explanation of PIC, see Position Independent Code (PIC) in shared
libraries. Although the examples there are for x86-64, the general principle is the same.
When a dynamic relocation can be resolved without a symbol lookup, the relocation can be
expressed as R_<ARCH>_RELATIVE. An example is a relocation to a non-preemptible definition in
the same module. To resolve an R_<ARCH>_RELATIVE relocation, a loader only needs to add the
displacement between the static link address and the address that the program is loaded at.
This displacement is the same for all relative relocations.
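The following C sketch shows the loader work for relative relocations. It models a REL-style relocation, where the addend is stored at the relocated location, which is the form typically used for AArch32 R_ARM_RELATIVE. AArch64 uses RELA-style relocations such as R_AARCH64_RELATIVE, where the addend is held in the relocation entry instead. The image is modeled as a hypothetical array of 32-bit words and the relocation table as a list of word indices; a real loader reads both from the ELF file.

```c
#include <stddef.h>
#include <stdint.h>

/* Apply REL-style relative relocations to a loaded image.
   image:     the program as loaded, viewed as 32-bit words
   offsets:   word index of each location to relocate
   count:     number of relocations
   link_base: address the image was statically linked at
   load_base: address the image was actually loaded at
   Every relocated word gets the same displacement added. */
void apply_relative_relocs(uint32_t *image, const size_t *offsets, size_t count,
                           uint32_t link_base, uint32_t load_base)
{
    uint32_t displacement = load_base - link_base; /* wraps modulo 2^32 */
    for (size_t i = 0; i < count; i++)
        image[offsets[i]] += displacement;
}
```

Unsigned wrap-around makes the same code work when the image is loaded below its link address.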
[Figure: relative relocations. The .got and .data.rel.ro sections in the RW part are at a fixed offset from the code, set at static link time. All relative relocations are resolved by adding the displacement to the location.]
• The --sysv option is intended for a sophisticated ELF loader that is able to resolve
dynamic relocations. The details of writing such a loader are outside the scope of
this document. For more information, see the section Program Loading and Dynamic
Linking in the System V ABI for the Arm 64-bit Architecture (AArch64).
• The --bare-metal-pie option is limited to single position independent executables,
but only needs a simple loader. See Bare-metal Position Independent Executables.
For systems without an MMU, the code and data must be copied into a contiguous free
block of memory, maintaining the fixed offset from code to data. It is not possible to
run code from flash and to have data in RAM.
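[Figure: bare-metal PIE layout. The RO code and the RW part, which contains the GOT, are a fixed distance apart; relative relocations are fixed up by the loader.]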
Bare-metal PIE can support C++ because the relocations are fixed up by the loader.
The main drawback is that the RO part and RW part have to be a fixed distance apart.
This fixed separation can make it more difficult to deploy in single address space
environments. The armlink option --bare_metal_pie is available to support the bare-
metal PIE linking model.
Available armclang command-line options
• -fbare-metal-pie
• -fpic, -fno-pic
• -fsysv, -fno-sysv
• -shared
Available armlink command-line options
• --bare_metal_pie
• --bare_metal_sysv
• --fpic
• --shared
• --sysv
| | no ROPI | ROPI |
|---|---|---|
| no RWPI | RO and RW data is accessed at an absolute address. | RO data access is PC-relative. RW data is accessed at an absolute address. |
| RWPI | RO data is accessed at an absolute address. RW data access is relative to a static base address. | RO data access is PC-relative. RW data access is relative to a static base address. |
In practice, the options are often used together, because usually either everything or nothing
must be position independent.
The default configurations for ROPI and RWPI do not require relocations.
ROPI
Instead of loading the address of RO data, the compiler loads an offset from the PC to
the RO data. This option means that the RO data must be placed at a fixed offset from
the code at static link time.
RWPI
The platform register r9 becomes the static base register. This register points to the
start of the static, RW, data for the program. All RW data are accessed using an offset
from the static base register. This option means that the offset to any datum from the
static base must be known at static link time.
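The effect of ROPI can be modeled in ordinary C by storing an offset instead of an absolute address. In the following sketch, a hypothetical image buffer starts with a word that holds the distance from that word to some RO data later in the same image, in the same way that ROPI code holds an offset from the PC. Because the code and RO data move together, the offset stays valid when the whole image is copied to another address, whereas a stored absolute pointer would not. The layout and function names are assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical image layout: a 4-byte "literal" at offset 0 that holds the
   distance from itself to a string stored at STRING_OFFSET. */
enum { IMAGE_SIZE = 32, STRING_OFFSET = 16 };

void build_image(unsigned char *image)
{
    int32_t rel = STRING_OFFSET; /* literal is at offset 0 */
    memset(image, 0, IMAGE_SIZE);
    memcpy(image, &rel, sizeof rel);
    memcpy(image + STRING_OFFSET, "ro data", 8); /* 7 chars + NUL */
}

/* Resolve the RO data relative to where the literal currently is,
   like a PC-relative load. */
const char *find_ro_data(const unsigned char *image)
{
    int32_t rel;
    memcpy(&rel, image, sizeof rel);
    return (const char *)(image + rel);
}
```

Copying the image with memcpy, without any fix-ups, still leaves find_ro_data() returning the string inside the copy.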
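[Figure: ROPI and RWPI layout. RO data is at a PC-relative offset from the RO code. No assumptions are made about the offset from RO to RW. R9 holds the static base, and RW data is accessed at an offset from the static base.]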
Linking a program that has a ROPI and RWPI part and a non-ROPI and non-RWPI part
is difficult. It is better to separate the ROPI and RWPI part and the non-ROPI and non-
RWPI part into two programs.
Related information
Bare-metal Position Independent Executables on page 190
SysV Dynamic Linking on page 386
Linking Models Supported by armlink
SysV Shared Libraries and Executables
Procedure
1. Create peripheral.c to place the my_peripheral variable at address 0x10000000.
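#include <stdio.h>
// Place my_peripheral in the section that the scatter file maps to 0x10000000
int my_peripheral __attribute__((section(".bss.ARM.__at_0x10000000")));
int main(void)
{
    printf("%d\n", my_peripheral);
    return 0;
}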
LR_2 0x01000000
{
ER_ZI +0 UNINIT
{
*(.bss)
}
}
LR_3 0x10000000
{
ER_PERIPHERAL 0x10000000 UNINIT
{
*(.bss.ARM.__at_0x10000000)
}
}
...
Load Region LR_3 (Base: 0x10000000, Size: 0x00000000, Max: 0xffffffff,
ABSOLUTE)
You might have to update your startup code to use the correct initial stack pointer. Some
processors, such as the Cortex®-M3 processor, require that you place the initial stack pointer in
the vector table. See Stack and heap configuration in AN179 - Cortex-M3 Embedded Software
Development for more details.
Procedure
1. Define two special execution regions in your scatter file that are named ARM_LIB_HEAP and
ARM_LIB_STACK.
2. Assign the EMPTY attribute to both regions.
Because the stack and heap are in separate regions, the library selects the non-default
implementation of __user_setup_stackheap() that uses the value of the symbols:
• Image$$ARM_LIB_STACK$$ZI$$Base.
• Image$$ARM_LIB_STACK$$ZI$$Limit.
• Image$$ARM_LIB_HEAP$$ZI$$Base.
• Image$$ARM_LIB_HEAP$$ZI$$Limit.
You can specify only one ARM_LIB_STACK or ARM_LIB_HEAP region, and you must allocate a size.
LOAD_FLASH ...
{
...
ARM_LIB_STACK 0x40000 EMPTY -0x20000 ; Stack region growing down
{ }
ARM_LIB_HEAP 0x28000000 EMPTY 0x80000 ; Heap region growing up
{ }
...
}
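The following C sketch computes the Base and Limit values for the ARM_LIB_STACK and ARM_LIB_HEAP regions in the example above. It assumes, consistent with the "growing down" comment, that a negative EMPTY length makes the given address the end of the region. The function name is hypothetical and the arithmetic is a model, not armlink code.

```c
#include <stdint.h>

/* Model of the bounds of an EMPTY execution region.
   address: the address given in the scatter file
   length:  the EMPTY length; a negative value means the region grows
            down, so address is the end (limit) of the region. */
void empty_region_bounds(uint32_t address, int32_t length,
                         uint32_t *base, uint32_t *limit)
{
    if (length < 0) {
        *base  = address - (uint32_t)(-(int64_t)length);
        *limit = address;
    } else {
        *base  = address;
        *limit = address + (uint32_t)length;
    }
}
```

Under this model, the stack region spans 0x20000 to 0x40000 and the heap region spans 0x28000000 to 0x28080000, which would be the values of the corresponding $$ZI$$Base and $$ZI$$Limit symbols.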
When the stack and heap are in the same region, ARM_LIB_STACKHEAP, __user_setup_stackheap() uses
the values of the symbols Image$$ARM_LIB_STACKHEAP$$ZI$$Base and
Image$$ARM_LIB_STACKHEAP$$ZI$$Limit.
If the initial entry point is not in a root region, the link fails and the linker gives an error message.
Example
Root region with the same load and execution address.
To specify a root region, use ABSOLUTE as the attribute for the execution region. You can either
specify the attribute explicitly or permit it to default, and use the same address for the first
execution region and the enclosing load region.
To make the execution region address the same as the load region address, either:
• Specify the same numeric value for both the base address for the execution region and the
base address for the load region.
• Specify a +0 offset for the first execution region in the load region.
If you specify an offset of zero (+0) for all subsequent execution regions in the load region, then
all execution regions not following an execution region containing ZI are also root regions.
Example
The following example shows an implicitly defined root region:
Use the FIXED execution region attribute to ensure that the load address and execution address of
a specific region are the same.
You can use the FIXED attribute to place any execution region at a specific address in ROM.
For example, the following memory map shows fixed execution regions:
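[Figure: memory map with fixed execution regions. A single load region spanning 0x4000 to 0x80000 contains init.o and *(RO), placed as FIXED and movable regions, with an empty area between them.]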
You can use this attribute to place a function or a block of data, for example a constant table or a
checksum, at a fixed address in ROM. This makes it easier to access the function or block of data
through pointers.
If you place two separate blocks of code or data at the start and end of ROM, some of the memory
contents might be unused. For example, you might place some initialization code at the start of
ROM and a checksum at the end of ROM. Use the * or .ANY module selector to flood fill the region
between the end of the initialization block and the start of the data block.
To make your code easier to maintain and debug, use the minimum number of placement
specifications in scatter files. Leave the detailed placement of functions and data to the linker.
There are some situations where using FIXED and a single load region is not
appropriate. Other techniques for specifying fixed locations are:
• If your loader can handle multiple load regions, place the RO code or data in its
own load region.
• If you do not require the function or data to be at a fixed location in ROM, use
ABSOLUTE instead of FIXED. The loader then copies the data from the load region
to the specified address in RAM. ABSOLUTE is the default attribute.
• To place a data structure at the location of memory-mapped I/O, use two load
regions and specify UNINIT. UNINIT ensures that the memory locations are not
initialized to zero.
LR1 0x8000
{
    ER_LOW +0 0x1000
    {
        *(+RO)
    }
    ; At this point the next available Load and Execution address is 0x8000 + size of
    ; contents of ER_LOW. The maximum size is limited to 0x1000, so the next available
    ; Load and Execution address is at most 0x9000
    ER_HIGH 0xF0000000 FIXED
    {
        *(+RW,+ZI)
    }
    ; The required execution address and load address is 0xF0000000. The linker inserts
    ; 0xF0000000 - (0x8000 + size of(ER_LOW)) bytes of padding so that the load address
    ; matches the execution address
}
; The other common misuse of FIXED is to give a lower execution address than the
; next available load address.
LR_HIGH 0x10000000
{
    ER_LOW 0x1000 FIXED
    {
        *(+RO)
    }
    ; The next available load address in LR_HIGH is 0x10000000. The required Execution
    ; address is 0x1000. Because the next available load address in LR_HIGH must
    ; increase monotonically, the linker cannot give ER_LOW a Load Address lower
    ; than 0x10000000
}
Use the following procedure to modify your source code to place functions and data in a specific
section using a scatter file.
Procedure
1. Create a C source file file.c to specify a section name foo for a variable and a section name
.bss.mybss for a zero-initialized variable z, for example:
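#include <stdio.h>
// Place variable in a section called foo (initial value chosen for illustration)
int variable __attribute__((section("foo"))) = 10;
// Place the zero-initialized variable z in a section called .bss.mybss
int z __attribute__((section(".bss.mybss")));
int main(void)
{
    int x = 4;
    int y = 7;
    z = x + y;
    printf("%d\n", variable);
    printf("%d\n", z);
    return 0;
}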
2. Create a scatter file to place the named section, scatter.scat, for example:
LR_1 0x0
{
ER_RO 0x0 0x4000
{
*(+RO)
}
ER_RW 0x4000 0x2000
{
*(+RW)
}
ER_ZI 0x6000 0x2000
{
*(+ZI)
}
ER_MYBSS 0x8000 0x2000
{
*(.bss.mybss)
}
The ARM_LIB_STACK and ARM_LIB_HEAP regions are required because the program is being linked
with the semihosting libraries.
If you omit file.o (foo) from the scatter file, the linker places the section in
the region of the same type. That is, ER_RW in this example.
In this example:
• __attribute__((section("foo"))) specifies that the linker is to place the global variable
variable in a section called foo.
...
Execution Region ER_MYBSS (Base: 0x00008000, Size: 0x00000004, Max:
0x00002000, ABSOLUTE)
Base Addr Size Type Attr Idx E Section Name
Object
• If scatter-loading is not used, the linker places the section foo in the default
ER_RW execution region of the LR_1 load region. It also places the section
.bss.mybss in the default execution region ER_ZI.
• If you have a scatter file that does not include the foo selector, then the
linker places the section in the defined RW execution region.
You can also place a function at a specific address by using .ARM.__at_<address> as its section
name. For example, to place the function sqr at 0x20000, specify
__attribute__((section(".ARM.__at_0x20000"))) on the declaration of sqr.
For more information, see Placement of functions and data at specific addresses.
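A minimal sketch of the sqr placement, assuming that sqr takes and returns an int, as in the main.c example in this chapter; the body of sqr and the macro name AT_0x20000 are illustrative. The attribute is applied only when the compiler defines __ELF__ (as GCC and Clang do for ELF targets), because other object formats use a different section-name syntax.

```c
/* Place sqr in the section .ARM.__at_0x20000 so that armlink places it
   at address 0x20000. On non-ELF hosts the attribute is omitted. */
#if defined(__ELF__)
#define AT_0x20000 __attribute__((section(".ARM.__at_0x20000")))
#else
#define AT_0x20000
#endif

AT_0x20000 int sqr(int n1)
{
    return n1 * n1;
}
```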
Related information
Semihosting for AArch32 and AArch64
In the ELF specification, a PT_LOAD program header can be loaded by examining the fields:
• p_offset
• p_vaddr
• p_paddr. The value of this field is always the same as p_vaddr for armlink.
• p_filesz
• p_memsz
The ELF loader copies p_filesz bytes from the file at offset p_offset to the address specified by
p_vaddr. The loader then creates p_memsz - p_filesz bytes of zero-initialized (ZI) data at address
p_vaddr + p_filesz.
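The following C sketch models these loader steps for a single PT_LOAD segment. The structure is a simplified, hypothetical stand-in that holds only the fields that the loader needs (a real loader would use the Elf32_Phdr or Elf64_Phdr type), and target memory is modeled as a byte array starting at a given base address.

```c
#include <stdint.h>
#include <string.h>

/* Simplified program header with only the fields the loader needs. */
struct segment_header {
    uint32_t p_offset; /* file offset of the segment contents */
    uint32_t p_vaddr;  /* address to load the segment at */
    uint32_t p_filesz; /* bytes to copy from the file */
    uint32_t p_memsz;  /* total bytes in memory, including ZI */
};

/* Load one segment into mem, which models target memory starting at
   mem_base. Returns 0 on success, or -1 if the segment is malformed or
   does not fit. */
int load_segment(const struct segment_header *ph, const unsigned char *file,
                 size_t file_size, unsigned char *mem, uint32_t mem_base,
                 size_t mem_size)
{
    if (ph->p_filesz > ph->p_memsz)
        return -1;
    if (ph->p_offset > file_size || ph->p_filesz > file_size - ph->p_offset)
        return -1;
    if (ph->p_vaddr < mem_base || ph->p_vaddr - mem_base > mem_size ||
        ph->p_memsz > mem_size - (ph->p_vaddr - mem_base))
        return -1;

    unsigned char *dest = mem + (ph->p_vaddr - mem_base);
    /* Copy p_filesz bytes from offset p_offset to p_vaddr ... */
    memcpy(dest, file + ph->p_offset, ph->p_filesz);
    /* ... then create p_memsz - p_filesz bytes of ZI data at p_vaddr + p_filesz. */
    memset(dest + ph->p_filesz, 0, ph->p_memsz - ph->p_filesz);
    return 0;
}
```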
int foo[0x10000];
int main(void)
{
return foo[0];
}
2. Create the file scatter.scat containing the following load and execution regions:
LR 0x8000
{
CODE +0
{
*(+RO)
}
RW_DATA +0
{
*(+RW)
}
/* ZI_DATA is not a root region */
ZI_DATA 0x10000000
{
*(+ZI)
}
}
LR_STACKHEAP 0x20000000
{
ARM_LIB_STACKHEAP +0 EMPTY 0x2000 {}
}
fromelf -s -v foo.axf
...
========================================================================
** Program header #0
** Section #5
...
179 foo 0x10000000 Gb 2 Data Hi 0x40000
...
If you use an ELF loader to create the memory based on the program header, then 0x402d0 -
0x2d0 bytes of ZI data are created at address 0x8000 + 0x2d0. This address does not match the
expected execution address of 0x10000000 as shown by the address of symbol foo.
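The mismatch can be checked with the same arithmetic that the loader uses. In the following sketch, the function names are hypothetical, and the program header values p_vaddr = 0x8000, p_filesz = 0x2d0, and p_memsz = 0x402d0 are the ones implied by the text above.

```c
#include <stdint.h>

/* Address at which an ELF loader creates ZI data for a PT_LOAD segment. */
uint32_t zi_start(uint32_t p_vaddr, uint32_t p_filesz)
{
    return p_vaddr + p_filesz;
}

/* Number of ZI bytes that the loader creates for the segment. */
uint32_t zi_size(uint32_t p_filesz, uint32_t p_memsz)
{
    return p_memsz - p_filesz;
}
```

zi_size(0x2d0, 0x402d0) is 0x40000, the size of foo (0x10000 four-byte ints), but zi_start(0x8000, 0x2d0) is 0x82d0 rather than 0x10000000.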
Where they are required, the compiler normally produces RO, RW, and ZI sections from a single
source file. These sections contain all the code and data from the source file.
Typically, you create a scatter file that defines an execution region at the required address with a
section description that selects only one section.
To place a function or variable at a specific address, it must be placed in its own section. There are
several ways to place a function or variable in its own section:
• By default, the compiler places each function and variable in individual ELF sections. To
override this default placement, use the -fno-function-sections or -fno-data-sections
compiler options.
• Place the function or data item in its own source file.
• Use __attribute__((section("<name>"))) to place functions and variables in a specially
named section, .ARM.__at_<address>, where <address> is the address to place the function or
variable. For example, __attribute__((section(".ARM.__at_0x4000"))).
<address> is the required address of the section. The compiler normalizes this address to eight
hexadecimal digits. You can specify the address in hexadecimal or decimal. Sections in the form of
.ARM.__at_<address> are referred to by the abbreviation __at.
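The following C sketch shows how a tool might recover <address> from an __at section name, accepting either hexadecimal with a 0x or 0X prefix or decimal. It also checks 4-byte alignment, which armlink requires for the base address of an execution region that it creates with --autoat. The function names are hypothetical; this is not how armlink itself is implemented.

```c
#include <stdint.h>
#include <string.h>

/* Parse the address from a section name of the form .ARM.__at_<address>.
   Returns 0 and stores the address on success, or -1 on failure. */
int parse_at_section(const char *name, uint64_t *address)
{
    static const char prefix[] = ".ARM.__at_";
    size_t prefix_len = sizeof prefix - 1;
    if (strncmp(name, prefix, prefix_len) != 0)
        return -1;

    const char *p = name + prefix_len;
    unsigned base = 10;
    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
        base = 16;
        p += 2;
    }
    if (*p == '\0')
        return -1; /* no digits */

    uint64_t value = 0;
    for (; *p != '\0'; p++) {
        unsigned digit;
        if (*p >= '0' && *p <= '9')
            digit = (unsigned)(*p - '0');
        else if (base == 16 && *p >= 'a' && *p <= 'f')
            digit = (unsigned)(*p - 'a' + 10);
        else if (base == 16 && *p >= 'A' && *p <= 'F')
            digit = (unsigned)(*p - 'A' + 10);
        else
            return -1; /* not a valid digit for this base */
        if (value > (UINT64_MAX - digit) / base)
            return -1; /* overflow */
        value = value * base + digit;
    }
    *address = value;
    return 0;
}

/* Nonzero if address is at least 4-byte aligned. */
int at_address_word_aligned(uint64_t address)
{
    return (address & 3u) == 0;
}
```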
The following example shows how to assign a variable to a specific address in C or C++ code:
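A minimal sketch, assuming a hypothetical variable name, address, and initial value. The helper macros, which are also hypothetical, build the section name from an address token by stringizing it and relying on adjacent string-literal concatenation. The attribute is applied only when the compiler defines __ELF__, because other object formats use a different section-name syntax.

```c
/* Two-level stringizing so that a macro argument is expanded first. */
#define AT_STR_(x) #x
#define AT_STR(x) AT_STR_(x)

/* Section name for an address, for example ".ARM.__at_0x5000". */
#define AT_SECTION_NAME(addr) ".ARM.__at_" AT_STR(addr)

#if defined(__ELF__)
#define PLACE_AT(addr) __attribute__((section(AT_SECTION_NAME(addr))))
#else
#define PLACE_AT(addr)
#endif

#define STATUS_ADDR 0x5000 /* hypothetical address */

/* Place status_word in the section .ARM.__at_0x5000. */
int status_word PLACE_AT(STATUS_ADDR) = 10;
```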
The name of the section is only significant if you are trying to match the section
by name in a scatter file. Without overlays, the linker automatically assigns __at
sections when you use the --autoat command-line option. This option is the
default. If you are using overlays, then you cannot use --autoat to place __at
sections.
For example, to specify the address as 0xE0001000 + MY_PREDEFINED_OFFSET, use the following
code:
Related information
Placement of functions and data at specific addresses on page 180
Restrictions on placing __at sections on page 181
You cannot use __at section placement with position independent execution
regions.
When linking with the --autoat option, the linker does not place __at sections with scatter-loading
selectors. Instead, the linker places the __at section in a compatible region. If no compatible region
is found, the linker creates a load region and an execution region for the __at section.
All linker execution regions created by --autoat have the UNINIT scatter-loading attribute. If you
require a Zero-Initialized (ZI) __at section to be zero-initialized, then it must be placed within a
compatible region. A linker execution region created by --autoat must have a base address that is
at least 4-byte aligned. If any region is incorrectly aligned, the linker produces an error message.
The linker considers an __at section with type RW compatible with RO.
The following example shows the sections .ARM.__at_0x0000 type RO, .ARM.__at_0x4000 type RW,
and .ARM.__at_0x8000 type RW:
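The following sketch shows C declarations that would produce those three sections: a const variable for the RO section and two initialized variables for the RW sections. The variable names and values are hypothetical. The attributes are applied only when the compiler defines __ELF__.

```c
#if defined(__ELF__)
#define IN_SECTION(name) __attribute__((section(name)))
#else
#define IN_SECTION(name)
#endif

/* RO: a const, initialized variable in .ARM.__at_0x0000. */
const int ro_value IN_SECTION(".ARM.__at_0x0000") = 10;

/* RW: initialized, writable variables. */
int rw_value IN_SECTION(".ARM.__at_0x4000") = 100;
int rw_other IN_SECTION(".ARM.__at_0x8000") = 200;
```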
The following scatter file shows how to place these __at sections automatically:
LR1 0x0
{
ER_RO 0x0 0x4000
{
*(+RO) ; .ARM.__at_0x0000 lies within the bounds of ER_RO
}
ER_RW 0x4000 0x2000
{
*(+RW) ; .ARM.__at_0x4000 lies within the bounds of ER_RW
}
ER_ZI 0x6000 0x2000
{
*(+ZI)
}
}
; The linker creates a load region and an execution region for the __at section
; .ARM.__at_0x8000 because it lies outside all candidate regions.
You can use the standard section-placement rules to place __at sections when using the
--no_autoat command-line option.
You cannot use __at section placement with position independent execution
regions.
The following example shows the placement of read-only sections .ARM.__at_0x2000 and the
read-write section .ARM.__at_0x4000. Load and execution regions are not created automatically in
manual mode. An error is produced if an __at section cannot be placed in an execution region.
The following example shows the placement of the variables in C or C++ code:
The following scatter file shows how to place __at sections manually:
LR1 0x0
{
ER_RO 0x0 0x2000
{
*(+RO) ; .ARM.__at_0x0000 is selected by +RO
}
ER_RO2 0x2000
{
*(.ARM.__at_0x02000) ; .ARM.__at_0x2000 is selected by the section named
; .ARM.__at_0x2000
}
ER2 0x4000
{
*(+RW, +ZI) ; .ARM.__at_0x4000 is selected by +RW
}
}
LR1 0x0
{
ER_FLASH 0x8000 0x2000
{
*(+RO) ; other code and read-only data, the
; __at section is automatically selected
}
ER2 0x4000
{
*(+RW +ZI) ; Any other RW and ZI variables
}
}
Procedure
1. Create a C file abs_address.c to define an integer and a string constant.
unsigned int const number = 0x12345678;
char* const string = "Hello World";
2. Create a scatter file, scatter.scat, to place the constants in the separate execution regions
ER_RONUMBERS and ER_ROSTRINGS.
LR_1 0x040000 ; load region starts at 0x40000
{ ; start of execution region descriptions
ER_RO 0x040000 ; load address = execution address
{
*(+RO +RW) ; all RO sections (must include section with
; initial entry point)
}
ER_RONUMBERS +0
{
*(.rodata.number, +RO-DATA)
}
ER_ROSTRINGS +0
{
*(.rodata.string, .rodata.str1.1, +RO-DATA)
}
; rest of scatter-loading description
4. Run fromelf on the image to view the contents of the output sections.
fromelf -c -d abs_address.axf
...
0x040000: 78 56 34 12 xV4.
0x040004: 48 65 6c 6c 6f 20 57 6f 72 6c 64 00 04 00 04 00 Hello World.....
...
5. Replace the ER_RONUMBERS and ER_ROSTRINGS sections in the scatter file with the following
ER_RODATA section:
ER_RODATA +0
{
abs_address.o(.rodata.number, .rodata.string, .rodata.str1.1, +RO-DATA)
}
0x040000: 78 56 34 12 48 65 6c 6c 6f 20 57 6f 72 6c 64 00 xV4.Hello World.
0x040010: 04 00 04 00 ....
The following procedure describes how to place the jump table in a ROM .rodata section.
Procedure
1. Create a C file jump.c.
Make the PFUNC type a pointer to a void function that has no parameters. You can then use
PFUNC to create an array of constant function pointers.
void jump(unsigned i)
{
if (i<=2)
table[i]();
}
3. Run fromelf on the image to view the contents of the output sections.
fromelf -c -d jump.o
The table is placed in the read-only section .rodata that you can place in ROM as required:
...
$a.0
[Anonymous symbol #24]
jump
0x00000000: e92d4800 .H-. PUSH {r11,lr}
0x00000004: e24dd008 ..M. SUB sp,sp,#8
0x00000008: e1a01000 .... MOV r1,r0
0x0000000c: e58d0004 .... STR r0,[sp,#4]
0x00000010: e3500002 ..P. CMP r0,#2
0x00000014: e58d1000 .... STR r1,[sp,#0]
0x00000018: 8a000006 .... BHI {pc}+0x20 ; 0x38
0x0000001c: eaffffff .... B {pc}+0x4 ; 0x20
0x00000020: e59d0004 .... LDR r0,[sp,#4]
0x00000024: e3001000 .... MOVW r1,#:LOWER16: table
0x00000028: e3401000 ..@. MOVT r1,#:UPPER16: table
0x0000002c: e7910100 .... LDR r0,[r1,r0,LSL #2]
0x00000030: e12fff30 0./. BLX r0
...
** Section #7 '.rodata.table' (SHT_PROGBITS) [SHF_ALLOC]
Size : 12 bytes (alignment 4)
Address: 0x00000000
0x000000: 00 00 00 00 00 00 00 00 00 00 00 00 ............
...
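The jump() fragment in step 1 does not show the declarations of PFUNC and table. The following self-contained sketch fills them in under stated assumptions: the three target functions and the call counters are hypothetical, and the table has three entries to match the i<=2 bounds check. Because table is a const array initialized with function addresses, the compiler can place it in a read-only section, such as the .rodata.table section in the fromelf output, when it is not generating position independent code.

```c
/* PFUNC: pointer to a void function that has no parameters. */
typedef void (*PFUNC)(void);

/* Hypothetical jump targets that record how often they are called. */
int calls[3];
static void f0(void) { calls[0]++; }
static void f1(void) { calls[1]++; }
static void f2(void) { calls[2]++; }

/* Array of constant function pointers: a candidate for .rodata. */
const PFUNC table[3] = { f0, f1, f2 };

void jump(unsigned i)
{
    if (i <= 2) /* bounds check: table has three entries */
        table[i]();
}
```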
#include <stdio.h>
The --map option displays the memory map of the image. Also, --autoat is the default.
...
To modify your source code to place code and data at a specific address using a scatter file:
1. Create the source file main.c containing the following code:
#include <stdio.h>
extern int sqr(int n1);
// Place at address 0x10000
const int gValue __attribute__((section(".ARM.__at_0x10000"))) = 3;
int main(void)
{
int squared;
squared=sqr(gValue);
printf("Value squared is: %d\n", squared);
return 0;
}
3. Create the scatter file scatter.scat containing the following load region:
LR1 0x0
{
ER1 0x0
{
*(+RO) ; rest of code and read-only data
}
ER2 +0
{
function.o
*(.ARM.__at_0x10000) ; Place gValue at 0x10000
}
; RW and ZI data to be placed at 0x200000
RAM 0x200000 (0x1FF00-0x2000)
{
*(+RW, +ZI)
}
ARM_LIB_STACK 0x800000 EMPTY -0x10000
{
}
ARM_LIB_HEAP +0 EMPTY 0x10000
{
}
}
The ARM_LIB_STACK and ARM_LIB_HEAP regions are required because the program is being linked
with the semihosting libraries.
4. Compile and link the sources:
The memory map shows that the variable is placed in the ER2 execution region at address 0x10000:
...
Execution Region ER2 (Base: 0x00002a54, Size: 0x0000d5b0, Max: 0xffffffff, ABSOLUTE)
In this example, the size of ER1 is unknown. Therefore, gValue might be placed in ER1 or ER2. To
make sure that gValue is placed in ER2, you must include the corresponding selector in ER2 and link
with the --no_autoat command-line option. If you omit --no_autoat, gValue is placed in a separate
load region LR$$.ARM.__at_0x10000 that contains the execution region ER$$.ARM.__at_0x10000.
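The address is encoded only in the section name, so a mistyped `.ARM.__at_<address>` name is unlikely to be recognized as an absolute placement. The following sketch shows one way to keep the name consistent by generating it with a pair of macros. `PLACE_AT` and `AT_SECTION_NAME` are hypothetical helpers, not Arm Compiler features. Only armlink gives the resulting section name its placement meaning. A host compiler simply emits an ordinary named section.

```c
#include <string.h>

/* Hypothetical helper macros (not part of Arm Compiler): build the section
 * name ".ARM.__at_<addr>" by stringizing the address token exactly as
 * written, so pass a literal such as 0x10000, not an expression. */
#define AT_SECTION_NAME(addr) ".ARM.__at_" #addr
#define PLACE_AT(addr) __attribute__((section(AT_SECTION_NAME(addr))))

/* Same placement request as the gValue declaration in main.c. */
const int gValue PLACE_AT(0x10000) = 3;

/* Returns 1 if the generated name matches the hand-written section name. */
int at_section_name_ok(void)
{
    return strcmp(AT_SECTION_NAME(0x10000), ".ARM.__at_0x10000") == 0;
}
```

Because the address is stringized rather than evaluated, `PLACE_AT(0x10000)` and `PLACE_AT(65536)` would produce different section names. Use the hexadecimal spelling that you also use in the scatter file selector.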
Related information
Semihosting for AArch32 and AArch64
armclang supports the -fropi and -frwpi options. You can use these options to
create bare-metal position independent executables.
Position independent code uses PC-relative addressing modes where possible and otherwise
accesses global data via the Global Offset Table (GOT). The address entries in the GOT and
initialized pointers in the data area are updated with the executable load address when the
executable runs for the first time.
All objects and libraries that are linked into the image must be compiled to be position
independent.
#include <stdio.h>
int main(void)
{
printf("Hello World!\n");
return 0;
}
To compile and automatically link this code for bare-metal PIE, use the -fbare-metal-pie option
with armclang:
Alternatively, you can compile with the armclang option -fbare-metal-pie and link with the
armlink option --bare_metal_pie as separate steps:
Legacy code that is compiled with armcc to be included in a bare-metal PIE must
be compiled with either the option --apcs=/fpic or, if it contains no references to
global data, the option --apcs=/ropi.
If you are using Link-Time Optimization (LTO), use the armlink option --lto_relocation_model=pic
to tell the link time optimizer to produce position independent code:
Restrictions
A bare-metal PIE executable must conform to the following:
• The .got section must be placed in a writable region.
• All references to symbols must be resolved at link time.
• The image must be linked Position Independent with a base address of 0x0.
• The code and data must be linked at a fixed offset from each other.
• The stack must be set up before the runtime relocation routine __arm_relocate_pie_ is called.
This means that the stack initialization code must only use PC-relative addressing if it is part of
the image code.
• It is the responsibility of the target platform that loads the PIE to ensure that the ZI region is
zero-initialized.
• The scatter file load region attribute PI is not supported for AArch64 state.
• Mixing absolute linked and bare-metal PIE images is not supported. You must link them as two
separate units.
• When writing assembly code for position independence, note that some instructions such as LDR let you
specify a label whose address the assembler places in a literal pool. For example:
ldr r0,=__main
Specifying a label in this way causes the link step to fail when building with --bare_metal_pie, because
it requires an absolute address in a read-only section. armlink returns an error message, for example:
LR 0x0
{
er_ro +0 {
*(+RO)
}
DYNAMIC_RELOCATION_TABLE +0 {
*(DYNAMIC_RELOCATION_TABLE)
}
got +0 {
*(.got)
}
er_rw +0 {
*(+RW)
}
er_zi +0 {
*(+ZI)
}
; Add any stack and heap section required by the user supplied
; stack/heap initialization routine here
}
Use the armlink option --bare_metal_pie, or use either the --sysv or --shared option with --fpic.
For AArch32 state, you can include the PI attribute for the load region, for example
LR 0x0 PI.
The linker generates the DYNAMIC_RELOCATION_TABLE section. This section must be placed in an
execution region called DYNAMIC_RELOCATION_TABLE. This allows the runtime relocation routine
__arm_relocate_pie_ that is provided in the C library to locate the start and end of the table using
the symbols Image$$DYNAMIC_RELOCATION_TABLE$$Base and Image$$DYNAMIC_RELOCATION_TABLE$$Limit.
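Conceptually, a runtime relocation step adds the actual load address to each word that the relocation table identifies. The sketch below models that idea on an ordinary buffer. The entry layout (`rel_entry`, a byte offset naming one 32-bit word) and the function name `apply_relative_relocs()` are hypothetical simplifications. They are not the format that `__arm_relocate_pie_` actually consumes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of applying relative relocations after loading a
 * position-independent image. Assumptions (not the real table format):
 *  - each entry is the byte offset of a 32-bit word in the image,
 *  - that word holds an address computed for a link-time base of 0x0,
 *  - the image has been loaded at load_base. */
typedef struct {
    uint32_t offset; /* byte offset of the word to patch */
} rel_entry;

void apply_relative_relocs(uint8_t *image, uint32_t load_base,
                           const rel_entry *begin, const rel_entry *end)
{
    for (const rel_entry *r = begin; r != end; ++r) {
        uint32_t word;
        memcpy(&word, image + r->offset, sizeof word); /* avoid unaligned access */
        word += load_base;                             /* 0x0-based -> real address */
        memcpy(image + r->offset, &word, sizeof word);
    }
}
```

In a linked image, `begin` and `end` would play the role of the table start and end that the C library locates through the Image$$DYNAMIC_RELOCATION_TABLE$$Base and Image$$DYNAMIC_RELOCATION_TABLE$$Limit symbols.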
When using a scatter file and the default entry code that the C library supplies, the linker requires
that you provide your own routine for initializing the stack and heap. This user-supplied stack and
heap routine runs before the routine __arm_relocate_pie_. Therefore, this routine must use only
PC-relative addressing.
Related information
--fpic (armlink)
--pie (armlink)
--bare_metal_pie (armlink)
--bare_metal_sysv (armlink)
--ref_pre_init (armlink)
--sysv
-fbare-metal-pie (armclang)
-fropi (armclang)
-frwpi (armclang)
Load region attributes
Use *armlib* or *libcxx* so that the linker can resolve library naming in your scatter file.
Some Arm C and C++ library sections must be placed in a root region, for example __main.o,
__scatter*.o, __dc*.o, and *Region$$Table. This list can change between releases. The linker can
place all these sections automatically in a future-proof way with InRoot$$Sections.
Related information
Region table format
To place all sections that must be in a root region, use the section selector InRoot$$Sections. For
example:
Related information
Region table format
To place C library code, specify the library path and library name as the module selector. You can
use wildcard characters if required. For example:
LR1 0x0
{
ROM1 0
{
* (InRoot$$Sections)
* (+RO)
}
ROM2 0x1000
{
*armlib/c_* (+RO) ; all Arm-supplied C library functions
}
RAM1 0x3000
{
*armlib* (+RO) ; all other Arm-supplied library code
; for example, floating-point libraries
}
RAM2 0x4000
{
* (+RW, +ZI)
}
}
The name armlib indicates the Arm C library files that are located in the directory
<install_directory>\lib\armlib.
Procedure
1. Create the following C++ program, foo.cpp:
#include <iostream>
2. To place the C++ library code, define the following scatter file, scatter.scat:
LR 0x8000
{
ER1 +0
{
*armlib*(+RO)
}
ER2 +0
{
*libcxx*(+RO)
}
ER3 +0
{
*(+RO)
{
*(+RW,+ZI)
}
}
The name *libcxx* matches <install_directory>\lib\libcxx, indicating the C++ library files
that are located in the libcxx directory.
3. Compile and link the sources:
armclang --target=arm-arm-none-eabi -march=armv8-a -c foo.cpp
armclang --target=arm-arm-none-eabi -march=armv8-a -c main.c
armlink --scatter=scatter.scat --map main.o foo.o -o foo.axf
To place sections that are not automatically assigned to specific execution regions, use the .ANY
module selector in a scatter file.
Usually, a single .ANY selector is equivalent to using the * module selector. However, unlike *, you
can specify .ANY in multiple execution regions.
The linker has default rules for placing unassigned sections when you specify multiple .ANY
selectors. You can override the default rules using the following command-line options:
• --any_contingency to permit extra space in any execution regions containing .ANY sections for
linker-generated content such as veneers and alignment padding.
• --any_placement to provide more control over the placement of unassigned sections.
• --any_sort_order to control the sort order of unassigned input sections.
The placement of data can cause some data to be removed and shrink the size of
the sections.
• Specify the maximum size for an execution region that the linker can fill with unassigned
sections.
The following are relevant operations in the linking process and their order:
1. .ANY placement.
2. String merging.
3. Region table creation.
4. Late library load (scatter-load functions).
5. Veneer generation + literal pool merging.
String and literal pool merging can reduce execution size, while region table creation, late library
load, and veneer generation can increase it. Padding also affects the execution size of the region.
Extra, more-specific operations can also increase or decrease execution size after
the .ANY placement, such as the generation of PLT/GOT and exception-section
optimizations.
When more than one .ANY selector is present in a scatter file, the linker sorts sections in
descending size order. It then takes the unassigned section with the largest size and assigns the
section to the most specific .ANY execution region that has enough free space. For example,
.ANY(.text) is judged to be more specific than .ANY(+RO).
If several execution regions are equally specific, then the section is assigned to the execution region
with the most available remaining space.
For example:
• You might have two equally specific execution regions where one has a size limit of 0x2000 and
the other has no limit. In this case, all the sections are assigned to the second unbounded .ANY
region.
• You might have two equally specific execution regions where one has a size limit of 0x2000 and
the other has a size limit of 0x3000. In this case, the first sections to be placed are assigned
to the second .ANY region of size limit 0x3000. This assignment continues until the remaining
size of the second .ANY region is reduced to 0x2000. From this point, sections are assigned
alternately between both .ANY execution regions.
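The assignment rule in the preceding bullets can be modeled in a few lines of C. The sketch makes several simplifying assumptions:

- The selectors are equally specific and have equal priority.
- The sections are already sorted by descending size.
- Contingency, padding, and veneers are ignored.
- Ties go to the region that appears first in the scatter file.

`worst_fit_choose()` and `worst_fit_place()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the default .ANY placement (worst_fit) among
 * equally specific execution regions with the same priority.
 * Assumptions: no contingency, padding, or veneers; sections are already
 * sorted in descending size order; ties go to the region that appears
 * first in the scatter file (lowest index). */
int worst_fit_choose(const uint32_t *free_space, size_t nregions, uint32_t size)
{
    int best = -1;
    for (size_t i = 0; i < nregions; ++i) {
        if (free_space[i] < size)
            continue;                     /* section does not fit here */
        if (best < 0 || free_space[i] > free_space[best])
            best = (int)i;                /* strictly more space wins */
    }
    return best;                          /* -1: no region can hold it */
}

/* Places each section in turn, writes the chosen region index (or -1)
 * to out[k], and returns the number of sections that could not be placed. */
size_t worst_fit_place(uint32_t *free_space, size_t nregions,
                       const uint32_t *sizes, size_t nsections, int *out)
{
    size_t failed = 0;
    for (size_t k = 0; k < nsections; ++k) {
        int r = worst_fit_choose(free_space, nregions, sizes[k]);
        out[k] = r;
        if (r < 0)
            ++failed;
        else
            free_space[r] -= sizes[k];
    }
    return failed;
}
```

Consider regions with limits of 0x2000 and 0x3000 bytes and 0x100-byte sections. In this model, the first sixteen sections go to the larger region. After that, placement alternates between the two regions, which matches the second bullet. You can approximate an unbounded region by passing UINT32_MAX as its free space.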
You can specify a maximum amount of space to use for unassigned sections with the execution
region attribute ANY_SIZE.
Use worst_fit when you want to fill regions evenly. With equal-sized regions and sections,
worst_fit fills regions cyclically.
If the linker attempts to fill a region to its limit, as it does with first_fit and best_fit, it might
overfill the region. This is because linker-generated content such as padding and veneers is
not known until sections have been assigned to .ANY selectors. If this occurs, you might see the
following error:
Error: L6220E: Execution region <regionname> size (<size> bytes) exceeds limit (<limit> bytes).
The --any_contingency option prevents the linker from filling the region up to its maximum.
It reserves a portion of the region's size for linker-generated content and fills this contingency
area only if no other regions have space. It is enabled by default for the first_fit and best_fit
algorithms, because they are most likely to exhibit this behavior.
Procedure
To prioritize the order of multiple .ANY sections, use the .ANY<num> selector, where <num> is a
non-negative integer starting at zero.
The highest priority is given to the selector with the highest integer.
er1 +0 512
{
.ANY1(+RO) ; evenly distributed with er3
}
er2 +0 256
{
.ANY2(+RO) ; Highest priority, so filled first
}
er3 +0 256
{
.ANY1(+RO) ; evenly distributed with er1
}
}
9.11.4 Specify the maximum region size permitted for placing unassigned sections
You can specify the maximum size in a region that armlink can fill with unassigned sections.
Use the execution region attribute ANY_SIZE <max_size> to specify the maximum size in a region
that armlink can fill with unassigned sections.
When ANY_SIZE is present, armlink does not attempt to calculate contingency and strictly follows
the .ANY priorities.
When ANY_SIZE is not present for an execution region containing a .ANY selector, and you specify
the --any_contingency command-line option, then armlink attempts to adjust the contingency for
that execution region. The aims are to:
• Never overflow a .ANY region.
• Make sure there is a contingency reserved space left in the given execution region. This space is
reserved for veneers and section padding.
If you specify --any_contingency on the command line, it is ignored for regions that have ANY_SIZE
specified. It is used as normal for regions that do not have ANY_SIZE specified.
Example
The following example shows how to use ANY_SIZE:
In this example:
• ER_1 has 0x100 reserved for linker-generated content.
• ER_2 has 0x50 reserved for linker-generated content. That is about the same as the automatic
contingency of --any_contingency.
• ER_3 has no reserved space. Therefore, 100% of the region is filled, with no contingency for
veneers. Omitting the ANY_SIZE parameter causes 98% of the region to be filled, with a two
percent contingency for veneers.
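The preceding bullets describe how ANY_SIZE and the automatic contingency limit how much of a region a .ANY fill can use. The sketch below is a rough model of that limit, under these assumptions:

- When ANY_SIZE is present, its value is used as-is, because armlink then does not calculate a contingency.
- Otherwise, with --any_contingency, two percent of the region is held back for veneers.
- The worst-case padding estimate that armlink also makes is not modeled.

`any_fill_limit()` is a hypothetical name.

```c
#include <stdint.h>

/* Rough model of the space a .ANY fill may use in one execution region.
 * Assumptions: ANY_SIZE, when present, is used as-is; otherwise, with
 * --any_contingency, 2% of the region is reserved for veneers. The
 * prospective padding estimate is not modeled. */
uint32_t any_fill_limit(uint32_t region_size, int has_any_size,
                        uint32_t any_size, int any_contingency)
{
    if (has_any_size)
        return any_size;                          /* strict, no contingency */
    if (any_contingency)
        return region_size -
               (uint32_t)((uint64_t)region_size * 2 / 100); /* keep ~98% */
    return region_size;                           /* fill to the limit */
}
```
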
The input section properties and ordering are shown in the following table:
LR 0x100
{
ER_1 0x100 0x10
{
.ANY
}
ER_2 0x200 0x10
{
.ANY
}
}
In this example:
• For first_fit, the linker first assigns all the sections it can to ER_1, then moves on to ER_2
because that is the next available region.
• For next_fit, the linker does the same as first_fit. However, when ER_1 is full it is marked as
FULL and is not considered again. In this example, ER_1 is full. ER_2 is then considered.
• For best_fit, the linker assigns sec1 to ER_1. It then has two regions of equal priority and
specificity, but ER_1 has less space remaining. Therefore, the linker assigns sec2 to ER_1, and
continues assigning sections until ER_1 is full.
The linker first assigns sec1 to ER_1. It then has two regions of equal specificity and priority. It assigns
sec2 to the one with the most free space, ER_2 in this example. The regions now have the same
amount of space remaining, so the linker assigns sec3 to the first one that appears in the scatter
file, that is ER_1.
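For each section, first_fit, best_fit, and worst_fit differ only in which fitting region they pick. The sketch below shows a simplified chooser for the three strategies, under these assumptions:

- The selectors are equally specific and have equal priority.
- Contingency, padding, and veneers are ignored.
- Ties go to the region listed first in the scatter file, as in the worst_fit walkthrough.

The `any_strategy` enum and `choose_region()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified per-section region choice for three .ANY strategies.
 * Assumptions: equally specific selectors with equal priority; no
 * contingency, padding, or veneers; ties resolve to the lowest index. */
typedef enum { FIRST_FIT, BEST_FIT, WORST_FIT } any_strategy;

int choose_region(any_strategy s, const uint32_t *free_space, size_t n,
                  uint32_t size)
{
    int pick = -1;
    for (size_t i = 0; i < n; ++i) {
        if (free_space[i] < size)
            continue;                                 /* does not fit */
        if (s == FIRST_FIT)
            return (int)i;                            /* first region that fits */
        if (pick < 0 ||
            (s == BEST_FIT && free_space[i] < free_space[pick]) ||  /* least left */
            (s == WORST_FIT && free_space[i] > free_space[pick]))   /* most left */
            pick = (int)i;
    }
    return pick;                                      /* -1 if nothing fits */
}
```

After one section lands in the first of two equal regions, first_fit and best_fit both keep choosing that region. worst_fit switches to the emptier second region.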
The behavior of worst_fit is the default behavior in this version of the linker, and it
is the only algorithm available in earlier linker versions.
The input section properties and ordering are shown in the following table:
Table 9-4: Input section properties for placement of sections with next_fit
Name Size
sec1 0x14
sec2 0x14
sec3 0x10
sec4 0x4
sec5 0x4
sec6 0x4
LR 0x100
{
ER_1 0x100 0x20
{
.ANY1(+RO-CODE)
}
ER_2 0x200 0x20
{
.ANY2(+RO)
}
ER_3 0x300 0x20
{
.ANY3(+RO)
}
}
The next_fit algorithm is different from the others in that it never revisits a region that is considered
to be full. This example also shows the interaction between priority and specificity of selectors.
This is the same for all the algorithms.
In this example:
• The linker places sec1 in ER_1 because ER_1 has the most specific selector. ER_1 now has 0x6
bytes remaining.
• The linker then tries to place sec2 in ER_1, because it has the most specific selector, but there
is not enough space. Therefore, ER_1 is marked as full and is not considered in subsequent
placement steps. The linker chooses ER_3 for sec2 because it has higher priority than ER_2.
• The linker then tries to place sec3 in ER_3. It does not fit, so ER_3 is marked as full and the linker
places sec3 in ER_2.
• The linker now processes sec4. This is 0x4 bytes so it can fit in either ER_1 or ER_3. Because
both of these regions have previously been marked as full, they are not considered. The linker
places all remaining sections in ER_2.
• If another section sec7 of size 0x8 exists, and is processed after sec6, the example fails to
link. The algorithm does not attempt to place the section in ER_1 or ER_3 because they have
previously been marked as full.
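The walkthrough above can be reproduced with a small model of next_fit. The model rests on these assumptions:

- Candidate regions are tried in order of specificity, then priority, then scatter-file order.
- Any region that cannot hold the current section is marked full and never revisited.
- Every selector matches every section.
- Specificity is represented as a plain integer.
- Contingency and padding are ignored.

This ordering is reconstructed from the example, not taken from armlink itself. `nf_region` and `next_fit_place()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified next_fit model reconstructed from the worked example.
 * Assumptions: every selector matches every section; a larger
 * specificity value means a more specific selector; contingency,
 * padding, and veneers are ignored. */
typedef struct {
    uint32_t free_space; /* bytes still available */
    int specificity;     /* larger = more specific, e.g. +RO-CODE vs +RO */
    int priority;        /* the <num> in .ANY<num> */
    bool full;           /* once set, the region is never revisited */
} nf_region;

/* Preference order: specificity, then priority, then scatter-file order. */
static bool nf_prefer(const nf_region *r, size_t a, size_t b)
{
    if (r[a].specificity != r[b].specificity)
        return r[a].specificity > r[b].specificity;
    if (r[a].priority != r[b].priority)
        return r[a].priority > r[b].priority;
    return a < b;
}

/* Returns the index of the region that receives the section, or -1. */
int next_fit_place(nf_region *r, size_t n, uint32_t size)
{
    for (;;) {
        size_t cand = n;                       /* n means "none found" */
        for (size_t i = 0; i < n; ++i)
            if (!r[i].full && (cand == n || nf_prefer(r, i, cand)))
                cand = i;
        if (cand == n)
            return -1;                         /* every region is marked full */
        if (r[cand].free_space >= size) {
            r[cand].free_space -= size;
            return (int)cand;
        }
        r[cand].full = true;                   /* too small: mark full, try next */
    }
}
```

To reproduce the example, model ER_1 as the more specific .ANY1(+RO-CODE) region and ER_2 and ER_3 as .ANY2(+RO) and .ANY3(+RO), each with 0x20 bytes. Feeding the sections in the order shown then gives sec1 to ER_1, sec2 to ER_3, and sec3 to sec6 to ER_2, and sec7 fails.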
The input section properties and ordering are shown in the following table:
sections_a.o sections_b.o
Name Size Name Size
seca_1 0x4 secb_1 0x4
seca_2 0x4 secb_2 0x4
seca_3 0x10 secb_3 0x10
seca_4 0x14 secb_4 0x14
The following table shows the order that the sections are processed by the .ANY assignment
algorithm.
With --any_sort_order=descending_size, sections of the same size use the creation index as a
tiebreaker.
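The descending_size ordering with its creation-index tiebreaker can be expressed as a qsort comparator. The sketch assumes that sections_a.o is given before sections_b.o, so that the seca_* sections receive lower creation indices. This is an assumption about how indices are assigned. `input_section` and `sort_descending_size()` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch of --any_sort_order=descending_size: larger sections first, and
 * sections of equal size ordered by creation index. How creation indices
 * are assigned is an assumption here (object order on the command line,
 * then section order within each object). */
typedef struct {
    const char *name;
    uint32_t size;
    unsigned creation_index;
} input_section;

static int by_descending_size(const void *pa, const void *pb)
{
    const input_section *a = pa;
    const input_section *b = pb;
    if (a->size != b->size)
        return a->size > b->size ? -1 : 1;           /* bigger first */
    return (a->creation_index > b->creation_index) -
           (a->creation_index < b->creation_index);  /* earlier first */
}

void sort_descending_size(input_section *s, size_t n)
{
    qsort(s, n, sizeof *s, by_descending_size);
}
```

Under that assumption, the sections from the table sort as seca_4, secb_4, seca_3, secb_3, seca_1, seca_2, secb_1, secb_2.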
Command-line example
The following linker command-line options are used for this example:
The following table shows the order that the sections are processed by the .ANY assignment
algorithm.
Name Size
secb_1 0x4
secb_2 0x4
secb_3 0x10
secb_4 0x14
The linker does not know the address of a section until it is assigned to a region. Therefore, when
filling .ANY regions, the linker cannot calculate the contingency space and cannot determine if
calling functions require veneers. The linker provides a contingency algorithm that gives a worst-
case estimate for padding and an extra two percent for veneers. To enable this algorithm, use the
--any_contingency command-line option.
The following diagram represents an example image layout during .ANY placement:
[Figure: Example image layout during .ANY placement, showing the execution region base, the .ANY image content, free space, the 2% contingency, and the region limit.]
The downward arrows for prospective padding show that the prospective padding continues to
grow as more sections are added to the .ANY selector.
Prospective padding is dealt with before the two percent veneer contingency.
When the prospective padding is cleared, the priority is set to zero. When the two percent is
cleared, the priority is decremented again.
You can also use the ANY_SIZE keyword on an execution region to specify the maximum amount of
space in the region to set aside for .ANY section assignments.
You can use the armlink command-line option --info=any to get extra information on where the
linker has placed sections. This information can be useful when trying to debug problems.
When there is only one .ANY selector, it might not behave identically to *. The
algorithms that are used to determine the size of the section and place data still run
with .ANY and they try to estimate the impact of changes that might affect the size
of sections. These algorithms do not run if * is used instead. When it is appropriate
to use one or the other of .ANY or *, then you must not use a single .ANY selector
that applies to a kind of data, such as RO, RW, or ZI. For example, .ANY (+RO).
Error: L6407E: Sections of aggregate size 0x128 bytes could not fit into .ANY selector(s).
However, increasing the execution region size by 0x128 bytes does not necessarily lead to a
successful link. The failure to link is because of the extra data, such as region table
entries, that might end up in the region after adding more sections.
Example
1. Create the following foo.c program:
#include <stdio.h>
struct S {
char A[8];
char B[4];
};
struct S s;
struct S* get()
{
return &s;
}
/* Declarations assumed for this listing; the original declarations are not shown. */
extern int sqr(int n1);
int gSquared;
int array[10];
int main(void) {
int i;
for (i=0; i<10; i++) {
array[i]=i*i;
printf("%d\n", array[i]);
}
gSquared=sqr(i);
printf("%d squared is: %d\n", i, gSquared);
return sizeof(array);
}
{
.ANY
}
ER_4 (ImageLimit(ER_3)) 0x1000
{
*(+RW,+ZI)
}
ARM_LIB_STACK 0x800000 EMPTY -0x10000
{
}
ARM_LIB_HEAP +0 EMPTY 0x10000
{
}
}
Procedure
To place veneers at a specific location, include the linker-generated symbol Veneer$$Code in a
scatter file. At most, one execution region in the scatter file can have the *(Veneer$$Code) section
selector.
If it is safe to do so, the linker places veneer input sections into the region identified by the
*(Veneer$$Code) section selector. It might not be possible for a veneer input section to be
assigned to the region because of address range problems or execution region size limitations. If
the veneer cannot be added to the specified region, it is added to the execution region containing
the relocated input section that generated the veneer.
Instances of *(IWV$$Code) in scatter files from earlier versions of Arm tools are
automatically translated into *(Veneer$$Code). Use *(Veneer$$Code) in new
descriptions.
Use the first line in the scatter file to specify a preprocessor command that the linker invokes to
process the file. The command is of the form:
#! preprocessor [preprocessor_flags]
You can:
• Add preprocessing directives to the top of the scatter file.
• Use simple expression evaluation in the scatter file.
LR1 ADDRESS
{
...
}
The linker parses the preprocessed scatter file and treats the directives as comments.
You can also use the --predefine command-line option to assign values to constants. For this
example:
1. Modify file.scat to delete the directive #define ADDRESS 0x20000000.
2. Specify the command:
armlink invokes armclang with the -I<scatter_file_path> option so that any preprocessor
directives with relative paths work. The linker only adds this option if the full name of the
preprocessor tool given is armclang or armclang.exe. This means that if an absolute path or
a relative path is given, the linker does not give the -I<scatter_file_path> option to the
preprocessor. This also happens with the --cpu option.
On Windows, .exe suffixes are handled, so armclang.exe is considered the same as armclang.
Executable names are case insensitive, so ARMCLANG is considered the same as armclang. The
portable way to write scatter file preprocessing lines is to use correct capitalization and omit the
.exe suffix.
This means:
• The string must be correctly quoted for the host system. The portable way to do this is to use
double-quotes.
• Single quotes and escaped characters are not supported and might not function correctly.
• The use of a double-quote character in a path name is not supported and might not work.
These rules also apply to any strings passed with the --predefine option.
All preprocessor executables must accept the -o <file> option to mean output to file and accept
the input as a filename argument on the command line. These options are automatically added
to the user command line by armlink. Any options to redirect preprocessing output in the user-
specified command line are not supported.
To reserve an empty block of memory, add an execution region in the scatter file and assign the
EMPTY attribute to that region.
The block of memory does not form part of the load region, but is assigned for use at execution
time. Because it is created as a dummy ZI region, the linker uses the following symbols to access it:
• Image$$<region_name>$$ZI$$Base.
• Image$$<region_name>$$ZI$$Limit.
• Image$$<region_name>$$ZI$$Length.
If the length is given as a negative value, the address is taken to be the end address of the region.
This address must be an absolute address and not a relative one.
In the following example, the execution region definition STACK 0x800000 EMPTY -0x10000 defines
a region that is called STACK. The region starts at address 0x7F0000 and ends at address 0x800000:
The dummy ZI region that is created for an EMPTY execution region is not initialized
to zero at runtime.
If the address is in relative (+<offset>) form and the length is negative, the linker generates an
error.
[Figure: EMPTY regions in memory. Stack: Base 0x7F0000, Limit 0x800000. Heap: Base 0x800000, Limit 0x810000.]
Image$$STACK$$ZI$$Base = 0x7f0000
Image$$STACK$$ZI$$Limit = 0x800000
Image$$STACK$$ZI$$Length = 0x10000
Image$$HEAP$$ZI$$Base = 0x800000
Image$$HEAP$$ZI$$Limit = 0x810000
Image$$HEAP$$ZI$$Length = 0x10000
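The symbol values above follow directly from each EMPTY region's address and length. A negative length makes the address the end of the region. The sketch below shows that arithmetic, assuming an absolute address. Relative (+<offset>) addresses with negative lengths are rejected by the linker anyway. `empty_region` and `empty_region_bounds()` are hypothetical names.

```c
#include <stdint.h>

/* Sketch of how an EMPTY region's Image$$<region>$$ZI$$Base, $$Limit, and
 * $$Length values follow from its absolute address and signed length.
 * A negative length means the address is the end of the region. */
typedef struct {
    uint32_t base;
    uint32_t limit;
    uint32_t length;
} empty_region;

empty_region empty_region_bounds(uint32_t address, int64_t length)
{
    empty_region r;
    if (length < 0) {                 /* address is the end address */
        r.length = (uint32_t)(-length);
        r.base = address - r.length;
        r.limit = address;
    } else {                          /* address is the start address */
        r.length = (uint32_t)length;
        r.base = address;
        r.limit = address + r.length;
    }
    return r;
}
```

For example, `empty_region_bounds(0x800000, -0x10000)` yields the STACK values, and `empty_region_bounds(0x800000, 0x10000)` yields the HEAP values.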
The EMPTY attribute applies only to an execution region. The linker generates a
warning and ignores an EMPTY attribute that is used in a load region definition.
The linker checks that the address space used for the EMPTY region does not overlap
any other execution region.
The linker provides the following built-in functions to help create load and execution regions on
page boundaries:
• Alignment on an execution region causes both the load address and execution
address to be aligned.
• The default page size is 0x8000. To change the page size, specify the --pagesize
linker command-line option.
To produce an ELF file with each execution region starting on a new page, and with code starting
on the next page boundary after the header information:
If you set up your ELF file in this way, then you can memory-map it onto an operating system in
such a way that:
• RO and RW data can be given different memory protections, because they are placed in
separate pages.
• The load address everything expects to run at is related to its offset in the ELF file by specifying
SizeOfHeaders() for the first load region.
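Starting a region on a page boundary amounts to rounding an address up to the next multiple of the page size. The sketch below shows only that arithmetic, using the default page size of 0x8000. It assumes a power-of-two alignment. `align_up()` and `DEFAULT_PAGE_SIZE` are hypothetical names, not scatter-file built-ins.

```c
#include <stdint.h>

/* Default armlink page size, as described above (change with --pagesize). */
#define DEFAULT_PAGE_SIZE 0x8000u

/* Rounds addr up to the next multiple of align; align must be a power of two. */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}
```

For example, a region that would otherwise start at 0x8034 starts at 0x10000 after rounding to the default page size.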
Aligning when it is convenient for you to modify the source and recompile
When it is convenient for you to modify the original source code, you can align at compile
time with __attribute__((aligned(n))), for example. armclang does not support the armcc __align(n) keyword.
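With armclang, you can request compile-time alignment with the GNU attribute `__attribute__((aligned(n)))`. C11 `_Alignas(n)` is the standard spelling. The sketch below aligns a variable to 8 bytes and checks the resulting address. The variable name `aligned_counter` and the helper `is_aligned()` are hypothetical.

```c
#include <stdint.h>

/* Request 8-byte alignment at compile time (GNU attribute syntax, accepted
 * by armclang); _Alignas(8) would be the C11 equivalent. */
__attribute__((aligned(8))) int aligned_counter = 0;

/* Returns 1 if p is a multiple of align; align must be a power of two. */
int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1u)) == 0;
}
```
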
Aligning when it is not convenient for you to modify the source and recompile
It might not be convenient for you to modify the source code for various reasons. For
example, your build process might link the same object file into several images with different
alignment requirements.
When it is not convenient for you to modify the source code, then you must use the
following alignment specifiers in a scatter file:
ALIGNALL
Increases the section alignment of all the sections in an execution region, for example:
OVERALIGN
Increases the alignment of a specific section, for example:
ER_DATA ...
{
*.o(.bar, OVERALIGN 8)
... ;selectors
}
Overlay support in Arm Compiler for Embedded 6
The solution is to create an overlay region where each piece of overlaid code is unloaded and
loaded by an overlay manager. Arm® Compiler for Embedded supports:
• An automatic overlay mechanism, where the linker decides how your code sections get
allocated to overlay regions.
• A manual overlay mechanism, where you manually arrange the allocation of the code sections.
Arm Compiler for Embedded does not support using both manual and automatic
overlays within the same program.
Related information
Automatically placing code sections in overlay regions on page 217
Overlay veneer on page 218
Overlay data tables on page 219
Limitations of automatic overlay support on page 220
About writing an overlay manager for automatically placed overlays on page 221
Each overlay region corresponds to an execution region that has the attribute AUTO_OVERLAY
assigned in the scatter file. armlink allocates one set of integer identifiers to each of these overlay
regions. It allocates another set of integer identifiers to each overlaid section with the name
.ARM.overlay<N> that is defined in the object files.
The numbers that are assigned to the overlay sections in your object files do not
match up to the numbers that you put in the .ARM.overlay<N> section names.
Procedure
1. Declare the functions that you want the armlink automatic overlay mechanism to process.
• In C, use the section function attribute, for example:
__attribute__((section(".ARM.overlay1"))) void foo(void)
{
    ...
}
__attribute__((section(".ARM.overlay2"))) void bar(void)
{
    ...
}
• In the armclang integrated assembler syntax, use the .section directive, for example:
    .section .ARM.overlay1,"ax",%progbits
    .global foo
    .p2align 2
    .type foo,%function
foo:
    .fnstart
    ...
    .fnend
    .section .ARM.overlay2,"ax",%progbits
    .global bar
    .p2align 2
    .type bar,%function
bar:
    .fnstart
    ...
    .fnend
• In armasm legacy assembler syntax, use the AREA directive, for example:
    AREA |.ARM.overlay1|,CODE
foo PROC
    ...
    ENDP
    AREA |.ARM.overlay2|,CODE
bar PROC
    ...
    ENDP
You can only overlay code sections. Data sections must never be overlaid.
2. Specify the locations to load the code sections from and to in a scatter file. Use the
AUTO_OVERLAY keyword on one or more execution regions.
The execution regions must not have any section selectors. For example:
OVERLAY_LOAD_REGION 0x10000000
{
OVERLAY_EXECUTE_REGION_A 0x20000000 AUTO_OVERLAY 0x10000 { }
OVERLAY_EXECUTE_REGION_B 0x20010000 AUTO_OVERLAY 0x10000 { }
}
In this example, armlink emits a program header table entry that loads all the overlay data
starting at address 0x10000000. Also, each overlay is relocated so that it runs correctly if copied
to address 0x20000000 or 0x20010000. armlink chooses one of these addresses for each
overlay.
3. When linking, specify the --overlay_veneers command-line option. This option causes armlink
to arrange function calls between two overlays, or between non-overlaid code and an overlay,
to be diverted through the entry point of an overlay manager.
To permit an overlay-aware debugger to track the overlay that is active, specify the
--emit_debug_overlay_section command-line option.
Related information
__attribute__((section("name"))) function attribute
AREA directive
Execution region attributes
--emit_debug_overlay_section linker option
--overlay_veneers linker option
A function call or return can transfer control between two overlays or between non-overlaid code
and an overlay. If the target function is not already present at its intended execution address, then
the target overlay has to be loaded.
To detect whether the target overlay is present, armlink can arrange for all such function calls
to be diverted through the overlay manager entry point, __ARM_overlay_entry. To enable this
feature, use the armlink command-line option --overlay_veneers. This option causes a veneer to
be generated for each affected function call, so that the call instruction, typically a BL instruction,
points at the veneer instead of the target function. The veneer in turn saves some registers on the
stack, loads some information about the target function and the overlay that it is in, and transfers
control to the overlay manager entry point. The overlay manager must then:
• Ensure that the correct overlay is loaded and then transfer control to the target function.
• Restore the stack and registers to the state they were left in by the original BL instruction.
• If the function call originated inside an overlay, make sure that returning from the called
function reloads the overlay being returned to.
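The following host-side model makes those three duties concrete in plain C. It is a sketch under stated assumptions, not an overlay manager. The names overlay_call(), region_current, load_count, and double_it() are hypothetical, and the model assumes a single overlay region. "Loading" only records which overlay is resident. The ordinary C call stack stands in for the register and stack handling that the real veneer and manager perform in assembly.

```c
#include <assert.h>

#define NO_OVERLAY 0   /* hypothetical ID for non-overlaid code */

/* Overlay resident in the single simulated overlay region. */
int region_current = NO_OVERLAY;
/* How many times an overlay had to be (re)loaded. */
int load_count = 0;

/* Simulated load: a real manager would copy the overlay's bytes here. */
static void ensure_loaded(int overlay_id)
{
    if (overlay_id != NO_OVERLAY && region_current != overlay_id) {
        region_current = overlay_id;
        load_count++;
    }
}

/* Models one call diverted through the overlay manager entry point. */
int overlay_call(int caller_overlay, int target_overlay, int (*fn)(int), int arg)
{
    assert(fn);
    ensure_loaded(target_overlay);   /* 1: make the target overlay resident */
    int result = fn(arg);            /*    and transfer control to it       */
    /* 2: register and stack restoration is implicit in this C model.      */
    ensure_loaded(caller_overlay);   /* 3: reload the caller's overlay       */
    return result;
}

/* Hypothetical function that lives in an overlay. */
int double_it(int x)
{
    return 2 * x;
}
```

With only one region, a call from overlay 2 into overlay 1 evicts overlay 2. The model therefore reloads overlay 2 before returning, which is the situation the third duty exists for.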
Related information
--overlay_veneers linker option
Region$$Count$$AutoOverlay
This symbol points to a single 16-bit integer (an unsigned short) giving the total number of
overlay regions. That is, the number of entries in the arrays Region$$Table$$AutoOverlay
and CurrLoad$$Table$$AutoOverlay.
Overlay$$Map$$AutoOverlay
This symbol points to an array containing a 16-bit integer (an unsigned short) per overlay. For
each overlay, this table indicates which overlay region the overlay expects to be loaded into
to run correctly.
Size$$Table$$AutoOverlay
This symbol points to an array containing a 32-bit word per overlay. For each overlay, this
table gives the exact size of the data for the overlay. This size might be less than the size of
its containing overlay region, because overlays typically do not fill their regions exactly.
In addition to the read-only tables, armlink also provides one piece of read/write memory:
CurrLoad$$Table$$AutoOverlay
This symbol points to an array containing a 16-bit integer (an unsigned short) for each
overlay region. The array is intended for the overlay manager to store the identifier of the
currently loaded overlay in each region. The overlay manager can then avoid reloading an
already-loaded overlay.
All these data tables are optional. If your code does not refer to any particular table, then it is
omitted from the image.
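The following sketch shows how an overlay manager might consult these tables before loading an overlay. The linker-defined symbol names contain $ characters and exist only in an armlink-produced image. The sketch therefore uses hypothetical stand-in arrays (overlay_map, size_table, curr_load) with example values, and it simulates target memory and overlay storage with host buffers. It assumes that overlay IDs index the per-overlay arrays directly from 0, which is a simplification: check the numbering your image actually uses. It does not model Region$$Table$$AutoOverlay.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGIONS  2        /* stands in for Region$$Count$$AutoOverlay */
#define NUM_OVERLAYS 3
#define REGION_SIZE  16
#define NOT_LOADED   0xFFFFu  /* hypothetical "region is empty" marker */

/* Hypothetical stand-ins for the linker-generated tables. */
const uint16_t overlay_map[NUM_OVERLAYS] = { 0, 1, 0 };   /* Overlay$$Map */
const uint32_t size_table[NUM_OVERLAYS]  = { 8, 12, 4 };  /* Size$$Table  */
uint16_t curr_load[NUM_REGIONS] = { NOT_LOADED, NOT_LOADED }; /* CurrLoad$$Table */

/* Simulated execution memory per region and simulated stored overlays. */
unsigned char region_mem[NUM_REGIONS][REGION_SIZE];
const unsigned char overlay_storage[NUM_OVERLAYS][REGION_SIZE] = {
    "overlay0", "overlay1 ok", "ov2"
};
int copies = 0;

/* Load overlay `id` into its region unless it is already resident. */
void load_auto_overlay(uint16_t id)
{
    assert(id < NUM_OVERLAYS);
    uint16_t region = overlay_map[id];   /* which region the overlay needs */
    if (curr_load[region] == id)
        return;                          /* already loaded: skip the copy */
    memcpy(region_mem[region], overlay_storage[id], size_table[id]);
    curr_load[region] = id;              /* remember what is now resident */
    copies++;
}
```

Copying only size_table[id] bytes, rather than the whole region, reflects the point above that an overlay rarely fills its region exactly.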
Related information
Automatic overlay support on page 216
In simple cases, this can still work. However, if the non-overlaid function calls something in
a second overlay that conflicts with the overlay of its calling function, then a runtime failure
occurs. For example:
void innermost(void);                /* overlaid, in a second overlay */
void call_via_ptr(void (*fn)(void)); /* overlaid, calls fn() */
void non_overlaid(void)
{
    innermost();
}
int main(void)
{
    // Call the overlaid function call_via_ptr() and pass it a pointer
    // to non_overlaid(). non_overlaid() then calls the function
    // innermost() in another overlay. If call_via_ptr() and innermost()
    // are allocated to the same overlay region by the linker, then there
    // is no way for call_via_ptr to have been reloaded by the time control
    // has to return to it from non_overlaid().
    call_via_ptr(non_overlaid);
}
Related information
Automatic overlay support on page 216
The overlay manager entry point __ARM_overlay_entry is the location that the linker-generated
veneers expect to jump to. The linker also provides some tables of data to enable the overlay
manager to find the overlays and the overlay regions to load.
The overlay manager might also have to modify the value it passes to the called function in lr so
that it points at a return thunk routine. This routine reloads the overlay of the calling function and
then returns control to the original lr value of the calling function.
There is no sensible place already available to store the original value of lr for the return thunk to
use. For example, there is nowhere on the stack that can contain the value. Therefore, the overlay
manager has to maintain its own stack-organized data structure. The data structure contains the
saved lr value and the corresponding overlay ID for each time the overlay manager substitutes a
return thunk during a function call, and keeps it synchronized with the main call stack.
Because the overlay manager has to maintain this extra parallel stack, you cannot use stack
manipulations unless they are customized to keep the parallel stack of the overlay
manager consistent. Examples of stack manipulations include cooperative
or preemptive thread switching, coroutines, and the setjmp() and longjmp()
functions.
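The following is a minimal sketch of the parallel stack described above, assuming a fixed maximum call depth. Each entry pairs the saved lr value with the ID of the overlay that the return thunk must reload. The type and function names are hypothetical, and addresses are plain integers. A real manager would manipulate these entries from assembly inside __ARM_overlay_entry and the return thunk.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEPTH 32   /* hypothetical maximum nesting of overlay calls */

typedef struct {
    uint32_t saved_lr;    /* caller's original return address */
    uint16_t overlay_id;  /* overlay to reload before returning */
} return_record_t;

return_record_t return_stack[MAX_DEPTH];
int return_depth = 0;

/* Called when the manager substitutes the return thunk address for lr. */
void push_return(uint32_t lr, uint16_t overlay_id)
{
    assert(return_depth < MAX_DEPTH);
    return_stack[return_depth].saved_lr = lr;
    return_stack[return_depth].overlay_id = overlay_id;
    return_depth++;
}

/* Called by the return thunk: recovers the caller's lr and overlay ID. */
return_record_t pop_return(void)
{
    assert(return_depth > 0);
    return return_stack[--return_depth];
}
```

Entries are popped in strict last-in, first-out order. Anything that unwinds the main call stack without passing through the return thunk, such as longjmp(), therefore leaves stale entries behind.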
The armlink option --info=auto_overlay causes the linker to write out a text summary of the
overlays in the image it outputs. The summary consists of the integer ID, start address, and size of
each overlay. You can use this information to extract the overlays from the image, for example from
the output of the fromelf option --bin. You can then put them in a separate peripheral storage
system and still know which chunk of data goes with which overlay ID when the overlay manager
has to load one of them.
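The following sketch shows that extraction step. It copies one overlay out of a binary image held in memory, given the start address and size that --info=auto_overlay reports. It assumes that the first byte of the binary corresponds to a known address, bin_base, so that an address maps to file offset address - bin_base. Confirm that mapping for your own fromelf output, because it depends on your load regions. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `size` bytes of the overlay at address `start` out of a flat binary
 * whose first byte corresponds to address `bin_base`.
 * Returns 0 on success, -1 if the overlay lies outside the binary. */
int extract_overlay(const unsigned char *bin, size_t bin_len, uint32_t bin_base,
                    uint32_t start, uint32_t size, unsigned char *out)
{
    assert(bin && out);
    if (start < bin_base)
        return -1;                          /* overlay starts before the file */
    size_t offset = (size_t)(start - bin_base);
    if (offset > bin_len || size > bin_len - offset)
        return -1;                          /* overlay runs past the file end */
    memcpy(out, bin + offset, size);
    return 0;
}
```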
Related information
Automatic overlay support on page 216
--info linker option
Arm® Compiler for Embedded does not support using both manual and automatic
overlays within the same program.
• The OVERLAY attribute for load regions and execution regions. Use this attribute in a scatter file
to indicate regions of memory where the linker assigns the overlay sections for loading into at
runtime.
• The following armlink command-line options to add extra debug information to the image:
◦ --emit_debug_overlay_relocs.
◦ --emit_debug_overlay_section.
This extra debug information permits an overlay-aware debugger to track which overlay is
active.
Related information
Manually placing code sections in overlay regions on page 223
Writing an overlay manager for manually placed overlays on page 225
The OVERLAY attribute allows you to place multiple execution regions at the same address. An
overlay manager is required to make sure that only one execution region is instantiated at a time.
Arm® Compiler for Embedded does not provide an overlay manager.
The following example shows the definition of a static section in RAM followed by a series of
overlays. Here, only one of these sections is instantiated at a time.
EMB_APP 0x8000
{
...
STATIC_RAM 0x0 ; contains most of the RW and ZI code/data
{
* (+RW,+ZI)
}
OVERLAY_A_RAM 0x1000 OVERLAY ; start address of overlay...
{
module1.o (+RW,+ZI)
}
OVERLAY_B_RAM 0x1000 OVERLAY
{
module2.o (+RW,+ZI)
}
... ; rest of scatter-loading description
}
The C library at startup does not initialize a region that is marked as OVERLAY. The contents of the
memory that is used by the overlay region is the responsibility of an overlay manager. If the region
contains initialized data, use the NOCOMPRESS attribute to prevent RW data compression.
You can use the linker defined symbols to obtain the addresses that are required to copy the code
and data.
You can use the OVERLAY attribute on a single region that is not at the same address as a different
region. Therefore, you can use an overlay region as a method to prevent the initialization of
particular regions by the C library startup code. As with any overlay region, you must manually
initialize them in your code.
An overlay region can have a relative base. The behavior of an overlay region with a +<offset>
base address depends on the regions that precede it and the value of +<offset>. If they have the
same +<offset> value, the linker places consecutive +<offset> regions at the same base address.
The following table shows the effect of +<offset> when used with the OVERLAY attribute. REGION1
appears immediately before REGION2 in the scatter file:
The following example shows the use of relative offsets with overlays and the effect on execution
region addresses:
EMB_APP 0x8000
{
CODE 0x8000
{
*(+RO)
}
# REGION1 Base = CODE limit
REGION1 +0 OVERLAY
{
module1.o(*)
}
# REGION2 Base = REGION1 Base
REGION2 +0 OVERLAY
{
module2.o(*)
}
# REGION3 Base = REGION2 Base = REGION1 Base
REGION3 +0 OVERLAY
{
module3.o(*)
}
# REGION4 Base = REGION3 Limit + 4
REGION4 +4 OVERLAY
{
module4.o(*)
}
}
If the length of the non-overlay area is unknown, you can use a zero relative offset to specify the
start address of an overlay so that it is placed immediately after the end of the static section.
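The base-address calculation in the scatter file example above can be written as a small function. This sketch follows only the comments in that example. A +0 OVERLAY region that follows another OVERLAY region shares its base address; in every other case shown, the region starts at the previous region's limit plus the offset. The sketch does not model consecutive OVERLAY regions that share a non-zero offset, and it ignores any alignment padding the linker might add. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t base;
    uint32_t limit;       /* base + size of the region's contents */
    bool     is_overlay;  /* region has the OVERLAY attribute */
} region_t;

/* Base address of a region declared as "+offset" immediately after prev. */
uint32_t relative_base(const region_t *prev, bool this_is_overlay, uint32_t offset)
{
    assert(prev);
    if (this_is_overlay && prev->is_overlay && offset == 0u)
        return prev->base;          /* +0 overlays share one base address */
    return prev->limit + offset;    /* otherwise start after prev ends */
}
```

For example, if CODE ended at a hypothetical limit of 0x9000, REGION1, REGION2, and REGION3 would all start at 0x9000. If REGION3 held 0x80 bytes, REGION4 would start at 0x9084.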
Related information
Load region descriptions
Load region attributes
Inheritance rules for load region address attributes
Considerations when using a relative address +offset for a load region
Considerations when using a relative address +offset for execution regions
--emit_debug_overlay_relocs linker option
--emit_debug_overlay_section linker option
ABI for the Arm Architecture: Support for Debugging Overlaid Programs
The overlay manager must ensure that the correct overlay segment is loaded before calling any
function in that segment. If a function from one overlay is called while a different overlay is loaded,
then some kind of runtime failure occurs. If such a failure is a possibility, the linker and compiler do
not warn you because it is not statically determinable. The same is true for a data overlay.
The central component of this overlay manager is a routine to copy code and data from the load
address to the execution address. This routine is based around the following linker defined symbols:
• Load$$execution_region_name$$Base, the load address.
• Image$$execution_region_name$$Base, the execution address.
• Image$$execution_region_name$$Length, the length of the execution region.
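On the target, armlink resolves these symbols, and loading a region is a memcpy() from the load address to the execution address for the region length. The following host-side sketch makes the same operation testable by using ordinary arrays in place of the load and execution areas. The struct and function names are hypothetical. The ZI clearing step corresponds to the memset() that the load_overlay() routine performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one overlay execution region. */
typedef struct {
    const void *load_base;  /* role of Load$$execution_region_name$$Base   */
    void       *exec_base;  /* role of Image$$execution_region_name$$Base  */
    size_t      length;     /* role of Image$$execution_region_name$$Length */
    void       *zi_base;    /* start of ZI data to clear, or NULL */
    size_t      zi_length;
} overlay_desc_t;

/* Copy the region from its load address to its execution address,
 * then zero its ZI data. */
void copy_overlay(const overlay_desc_t *d)
{
    assert(d);
    memcpy(d->exec_base, d->load_base, d->length);
    if (d->zi_base != NULL)
        memset(d->zi_base, 0, d->zi_length);
}
```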
The implementation of the overlay manager depends on the system requirements. This procedure
shows a simple method of implementing an overlay manager.
The copy routine that is called load_overlay() is implemented in overlay_manager.c. The routine
uses memcpy() and memset() functions to copy CODE and RW data overlays, and to clear ZI data
overlays.
The assembly file overlay_list.s lists all the required symbols. This file defines and exports two
common base addresses and a RAM space that is mapped to the overlay structure table:
code_base
data_base
overlay_regions
As specified in the scatter file, armlink places the two functions, func1() and func2(), and their
corresponding data in CODE_ONE, CODE_TWO, DATA_ONE, and DATA_TWO regions, respectively. armlink
has a special mechanism for replacing calls to functions with stubs. To use this mechanism, write a
small stub for each function in the overlay that might be called from outside the overlay.
In this example, two stub functions $Sub$$func1() and $Sub$$func2() are created for the two
functions func1() and func2() in overlay_stubs.c. These stubs call the overlay-loading function
load_overlay() to load the corresponding overlay. After the overlay manager finishes its overlay
loading task, the stub function can then call $Super$$func1 to call the loaded function func1() in
the overlay.
Procedure
1. Create the overlay_manager.c program to copy the correct overlay to the runtime addresses.
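/* overlay_manager.c
 * Basic overlay manager
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Number of overlays present */
#define NUM_OVERLAYS 2
/* One entry of overlay_regions; the layout must match overlay_list.s */
typedef struct
{
    void *load_ro_base;       /* Load$$CODE_<n>$$Base */
    void *load_rw_base;       /* Load$$DATA_<n>$$Base */
    void *exec_zi_base;       /* Image$$DATA_<n>$$ZI$$Base */
    unsigned int ro_length;   /* Image$$CODE_<n>$$Length */
    unsigned int zi_length;   /* Image$$DATA_<n>$$ZI$$Length */
} overlay_region_t;
/* Number of the overlay that is currently loaded (0 = none) */
static int current_overlay = 0;
/* Defined in overlay_list.s */
extern const overlay_region_t overlay_regions[NUM_OVERLAYS];
extern void * const code_base;  /* common execution address of CODE_ONE and CODE_TWO */
extern void * const data_base;  /* common execution address of DATA_ONE and DATA_TWO */
void load_overlay(int n)
{
    const overlay_region_t * selected_region;
    if(n == current_overlay)
    {
        printf("Overlay %d already loaded.\n", n);
        return;
    }
    /* boundary check */
    if(n<1 || n>NUM_OVERLAYS)
    {
        printf("Error - invalid overlay number %d specified\n", n);
        exit(1);
    }
    /* Load the corresponding overlay */
    printf("Loading overlay %d...\n", n);
    selected_region = &overlay_regions[n-1];
    /* Copy the code, then the RW data, which ends where the ZI data starts */
    memcpy(code_base, selected_region->load_ro_base, selected_region->ro_length);
    memcpy(data_base, selected_region->load_rw_base,
           (size_t)((char *)selected_region->exec_zi_base - (char *)data_base));
    /* Comment out the next line if your overlays have any static ZI variables
     * and should not be reinitialized each time, and move them out of the
     * overlay region in your scatter file */
    memset(selected_region->exec_zi_base, 0, selected_region->zi_length);
    /* Record the newly loaded overlay */
    current_overlay = n;
}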
2. Create a separate source file for each of the functions, func1.c for func1() and func2.c for
func2().
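// func1.c
#include <stdio.h>
#include <stdlib.h>
extern void foo(int x);
// RW and ZI data, placed in the DATA_ONE overlay region
char *func1_string = "func1's string";
int func1_values[20];
void func1(void)
{
    unsigned int i;
    printf("%s\n", func1_string);
    for(i = 19; i; i--)
    {
        func1_values[i] = rand();
        foo(i);
        printf("%d ", func1_values[i]);
    }
    printf("\n");
}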
// func2.c
#include <stdio.h>
extern void foo(int x);
// RW and ZI data, placed in the DATA_TWO overlay region
char *func2_string = "func2's string";
int func2_values[10];
void func2(void)
{
    printf("%s\n", func2_string);
    foo(func2_values[9]);
}
3. Create a main program, for example main.c, that calls the two overlaid functions.
// main.c
#include <stdio.h>
extern void func1(void);
extern void func2(void);
void foo(int x);
int main(void)
{
    printf("Start of main()...\n");
    func1();
    func2();
    /*
     * Call func2() again to demonstrate that we don't need to
     * reload the overlay
     */
    func2();
    func1();
    printf("End of main()...\n");
    return 0;
}
void foo(int x)
{
    (void)x;
    return;
}
4. Create overlay_stubs.c to provide two stub functions $Sub$$func1() and $Sub$$func2() for
the two functions func1() and func2().
// overlay_stubs.c
extern void $Super$$func1(void);
extern void $Super$$func2(void);
extern void load_overlay(int n);
void $Sub$$func1(void)
{
    load_overlay(1);
    $Super$$func1();
}
void $Sub$$func2(void)
{
    load_overlay(2);
    $Super$$func2();
}
5. Create overlay_list.s to define the common base addresses and the overlay structure table.
; overlay_list.s
    AREA overlay_list, DATA, READONLY
; Linker-defined symbols to use
    IMPORT ||Load$$CODE_ONE$$Base||
    IMPORT ||Load$$CODE_TWO$$Base||
    IMPORT ||Load$$DATA_ONE$$Base||
    IMPORT ||Load$$DATA_TWO$$Base||
    IMPORT ||Image$$CODE_ONE$$Base||
    IMPORT ||Image$$DATA_ONE$$Base||
    IMPORT ||Image$$DATA_ONE$$ZI$$Base||
    IMPORT ||Image$$DATA_TWO$$ZI$$Base||
    IMPORT ||Image$$CODE_ONE$$Length||
    IMPORT ||Image$$CODE_TWO$$Length||
    IMPORT ||Image$$DATA_ONE$$ZI$$Length||
    IMPORT ||Image$$DATA_TWO$$ZI$$Length||
; Symbols to export
    EXPORT code_base
    EXPORT data_base
    EXPORT overlay_regions
; Common execution base addresses of the overlay regions
code_base
    DCD ||Image$$CODE_ONE$$Base||
data_base
    DCD ||Image$$DATA_ONE$$Base||
; One entry per overlay; the layout must match overlay_region_t
; in overlay_manager.c
overlay_regions
; overlay 1
    DCD ||Load$$CODE_ONE$$Base||
    DCD ||Load$$DATA_ONE$$Base||
    DCD ||Image$$DATA_ONE$$ZI$$Base||
    DCD ||Image$$CODE_ONE$$Length||
    DCD ||Image$$DATA_ONE$$ZI$$Length||
; overlay 2
    DCD ||Load$$CODE_TWO$$Base||
    DCD ||Load$$DATA_TWO$$Base||
    DCD ||Image$$DATA_TWO$$ZI$$Base||
    DCD ||Image$$CODE_TWO$$Length||
    DCD ||Image$$DATA_TWO$$ZI$$Length||
    END
{
ROM_EXEC 0x24000000 0x04000000
{
* (InRoot$$Sections) ; All library sections that must be in a root region
; e.g. __main.o, __scatter*.o, * (Region$$Table)
* (+RO) ; All other code
}
RAM_EXEC 0x10000
{
* (+RW, +ZI)
}
Related information
Manual overlay support on page 222
Use of $Super$$ and $Sub$$ to patch symbol definitions
Embedded Software Development
It is important to consider the process for moving an embedded application from the development
or debugging environment to a system that runs standalone on target hardware.
For example, when you start work on software for an embedded application, you might not know
the details of target peripheral devices, the memory map, or even the processor itself.
To enable you to proceed with software development before such details are known, the
compilation tools have a default behavior that enables you to start building and debugging
application code immediately.
In the Arm C library, support for some ISO C functionality, for example program I/O, can be
provided by the host debugging environment. The mechanism that provides this functionality
is known as semihosting. When semihosting is executed, the debug agent suspends program
execution. The debug agent then uses the debug capabilities of the host (for example printf
output to the debugger console) to service the semihosting operation before code execution is
resumed on the target. The task performed by the host is transparent to the program running on
the target.
Related information
Semihosting for AArch32 and AArch64
For example, the following figure shows the C library implementing the function printf() by
writing to the debugger console window. This implementation is provided by calling _sys_write(),
a support function that executes a semihosting call, resulting in the default behavior using the
debugger instead of target peripherals.
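[Figure: The C library implementing printf() through semihosting. The ISO C functions that your application calls, for example printf(), are provided by the C library, and the semihosting support that they rely on is implemented by the debug agent in the debugging environment.]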
Related information
The Arm C and C++ libraries
The C and C++ library functions
Semihosting for AArch32 and AArch64
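[Figure: Default memory map. From 0x8000 upward: RO, RW, and ZI, then the HEAP, whose location is calculated by the linker, and the STACK, whose location comes from a semihosting call.]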
Processors that are based on Arm®v6-M and Armv7-M architectures have fixed
memory maps. Having fixed memory maps makes porting software easier between
different systems that are based on these processors.
The linker observes a set of rules to decide where in memory code and data are located:
[Figure: Linker placement of input sections, for example section A from file1.o and section A from file2.o, into RO (CODE), RW (DATA), and ZI areas.]
Generally, the linker sorts the input sections by attribute (RO, RW, ZI), by name, and then by
position in the input list.
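The following is a simplified model of that default ordering. It ignores the linker's other placement rules and any scatter file. The sketch sorts input-section descriptions with qsort(): first by attribute in the order RO, RW, ZI, then by section name (using strcmp() order, an assumption of this sketch), then by position in the input list. The types and names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { ATTR_RO = 0, ATTR_RW = 1, ATTR_ZI = 2 } attr_t;

typedef struct {
    attr_t      attr;
    const char *name;      /* input section name */
    int         position;  /* order in which the linker saw the section */
} input_section_t;

/* Comparator: attribute, then name, then input position. */
int compare_sections(const void *a, const void *b)
{
    const input_section_t *x = a, *y = b;
    if (x->attr != y->attr)
        return (int)x->attr - (int)y->attr;
    int by_name = strcmp(x->name, y->name);
    if (by_name != 0)
        return by_name;
    return x->position - y->position;
}

void sort_sections(input_section_t *s, size_t n)
{
    assert(s || n == 0);
    qsort(s, n, sizeof *s, compare_sections);
}
```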
To fully control the placement of code and data, you must use the scatter-loading mechanism.
Related information
Tailoring the C library to your target hardware on page 235
The image structure
Section placement with the linker
About scatter-loading
Scatter file syntax
Cortex-M1 Technical Reference Manual
Cortex-M3 Technical Reference Manual
Semihosting for AArch32 and AArch64
[Figure: Application startup. __rt_entry sets up the application stack and heap, initializes library functions, calls top-level constructors (C++), and handles exit from the application.]
__main is responsible for setting up the memory and __rt_entry is responsible for setting up the
run-time environment.
__main performs code and data copying, decompression, and zero initialization of the ZI data. It
then branches to __rt_entry to set up the stack and heap, initialize the library functions and static
data, and call any top level C++ constructors. __rt_entry then branches to main(), the entry to
your application. When the main application has finished executing, __rt_entry shuts down the
library, then hands control back to the debugger.
The function label main() has a special significance. The presence of a main() function forces the
linker to link in the initialization code in __main and __rt_entry. Without a function labeled main(),
the initialization sequence is not linked in, and as a result, some standard C library functionality is
not supported.
Related information
--startup=symbol, --no_startup (armlink)
Arm Compiler C Library Startup and Initialization
By default, the C library uses semihosting to provide device driver level functionality, enabling
a host computer to act as an input and an output device. This functionality is useful because
development hardware often does not have all the input and output facilities of the final system.
You can provide your own implementation of target-dependent C library functions to use target
hardware. Your implementations are automatically linked in to your image instead of the C library
implementations. The following figure shows this process, which is known as retargeting the C
library.
[Figure: Retargeting the C library. Target-independent C library functions call target-dependent functions, which are implemented either by semihosting support in the debug agent or by your retargeted user code that accesses the target hardware.]
For example, suppose that you have a peripheral I/O device, such as an LCD screen, and you want
to override the library implementation of fputc(), which writes to the debugger console, with one
that prints to the LCD. Because this implementation of fputc() is linked in to the final image, the
entire printf() family of functions prints to the LCD.
fputc() acts as an abstraction layer between target-dependent output and the C library standard
output functions.
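The following is a minimal sketch of such a retargeted fputc(). It assumes a hypothetical LCD driver function, lcd_write_char(), simulated here by a RAM buffer so that the sketch can run anywhere. On an Arm Compiler for Embedded target, linking this definition replaces the library fputc(). On a hosted C library, printf() does not necessarily call fputc(), so the sketch is best exercised by calling fputc() directly.

```c
#include <assert.h>
#include <stdio.h>

/* Simulated LCD contents: a real driver would write to the display. */
char lcd_text[80];
int  lcd_len = 0;

/* Hypothetical LCD driver entry point. */
void lcd_write_char(char c)
{
    assert(lcd_len < (int)sizeof lcd_text);
    if (lcd_len < (int)sizeof lcd_text - 1) {
        lcd_text[lcd_len++] = c;
        lcd_text[lcd_len] = '\0';
    }
}

/* Retargeted fputc(): ignores the stream and sends every character to the
 * LCD. Returns the character written, as the C standard requires. */
int fputc(int ch, FILE *stream)
{
    (void)stream;
    lcd_write_char((char)ch);
    return (unsigned char)ch;
}
```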
In a standalone application, you are unlikely to support semihosting operations. Therefore, you
must remove all calls to target-dependent C library functions or reimplement them with
non-semihosting functions.
Related information
Using the libraries in a nonsemihosting environment
Semihosting for AArch32 and AArch64
To build applications without the Arm standard C library, you must provide an alternative library
that reimplements the ISO standard C library functions that your application might need, such as
printf(). Your reimplemented library must be compliant with the Arm Embedded Application Binary
Interface (AEABI).
To instruct armclang to not use the Arm standard C library, you must use the armclang options
-nostdlib and -nostdlibinc. You must also use the armlink option --no_scanlib if you invoke the
linker separately.
You must also use the armclang option -fno-builtin to ensure that the compiler does not perform
any transformations of built-in functions. Without -fno-builtin, armclang might recognize calls to
certain standard C library functions, such as printf(), and replace them with calls to more efficient
alternatives in specific cases.
Example
This example reimplements the printf() function to simply return 1 or 0.
//my_lib.c:
int printf(const char *c, ...)
{
if(!c)
{
return 1;
}
else
{
return 0;
}
}
Use armclang and armar to create a library from your reimplemented printf() function. The
following example application source file, foo.c, calls printf():
//foo.c:
extern int printf(const char *c, ...);
void foo(void)
{
printf("Hello, world!\n");
}
Use armclang to build the example application source file using the -nostdlib, -nostdlibinc, and
-fno-builtin options. Then use armlink to link the example reimplemented library using the
--no_scanlib option.
If you do not use the -fno-builtin option, then the compiler transforms the printf() function to
the puts() function, and the linker generates an error because it cannot find the puts() function in
the reimplemented library.
Related information
C library structure on page 231
--startup (armlink)
In your final embedded system, without semihosting functionality, you are unlikely to use the
default memory map. Your target hardware usually has several memory devices located at different
address ranges. To make the best use of these devices, you must have separate views of memory at
load and run-time.
Scatter-loading enables you to describe the load and run-time memory locations of code and data
in a textual description file known as a scatter file. You pass this file to the linker on the command
line with the --scatter=filename option.
A single code or data section can only be placed in a single execution region. It cannot be split.
During startup, the C library initialization code in __main carries out the necessary copying of code
and data and the zeroing of data to move from the image load view to the execute view.
The overall layout of the memory map of devices based on the Arm®v6-M and
Armv7-M architectures is fixed. This fixed layout makes it easier to port software
between different systems based on these architectures.
Related information
Information about scatter files
--scatter=filename (armlink)
Armv7-M Architecture Reference Manual
Armv6-M Architecture Reference Manual
Semihosting for AArch32 and AArch64
The scatter-loading description syntax shown in the following figure reflects the functionality
provided by scatter-loading:
Related information
Information about scatter files
Scatter-loading images with a simple memory map
One restriction placed on scatter-loading is that the code and data responsible for creating
execution regions cannot be copied to another location. As a result, the following sections must be
included in a root region:
• __main.o and __scatter*.o containing the code that copies code and data
• __dc*.o that performs decompression
• Region$$Table section containing the addresses of the code and data to be copied or
decompressed.
Because these sections are defined as read-only, they are grouped by the * (+RO) wildcard syntax.
As a result, if * (+RO) is specified in a non-root region, these sections must be explicitly declared in
a root region using InRoot$$Sections.
Related information
Region Table format on page 241
About placing Arm C and C++ library code
The Region Table is tightly integrated with the Arm C library Default Initialization
Sequence described in Application startup. Arm reserves the right to change the
format of the Region Table in future releases. Arm does not offer support on how
the Arm C library uses the information in the Region Table.
The Region Table is delimited by the linker-defined symbols Region$$Table$$Base and
Region$$Table$$Limit. You must place the Region$$Table in a root execution region. See Placement of
Arm C and C++ library code for details. Each table entry comprises four 32-bit words for AArch32
ELF files and four 64-bit words for AArch64 ELF files:
Word 1: Load Address of source
Word 2: Execution Address of destination
Word 3: Execution Size of destination
Word 4: Address of the <handler_routine>
The addresses are in one of three formats depending on the contents of the bottom two bits of the
word:
The Arm C library has different handler routines that have the following function prototype:
The Default Initialization Sequence processes the table entries in order, calling the
<handler_routine> with the right parameters. Where the table entries are not absolute
addresses, the linker adds veneer routines that translate the offsets into absolute
addresses at run time.
<handler_routine>         Description
__scatterload_null        Does nothing.
__scatterload_copy        Copies the number of bytes specified by Execution Size of destination
                          from Load Address of source to Execution Address of destination.
__scatterload_zeroinit    Zero-initializes the number of bytes specified by Execution Size of
                          destination, starting from Execution Address of destination.
__decompress              Decompresses data starting at Load Address of source to Execution
                          Address of destination. The size of the decompressed data is
                          Execution Size of destination.
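To illustrate how the Default Initialization Sequence walks the Region Table, the following host-side C++ sketch models a table whose entries hold a source address, a destination address, a size, and a handler, and calls each handler in order, as __scatterload_copy, __scatterload_zeroinit, and __scatterload_null would. It is a simulation only. The entry layout, the handler signature, and the names RegionEntry, copy_handler, zeroinit_handler, null_handler, and run_region_table are assumptions made for illustration; they are not the Arm C library's format, which Arm reserves the right to change.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed handler shape, for this simulation only.
using Handler = void (*)(const unsigned char* src, unsigned char* dst, std::size_t size);

// Assumed entry layout: source, destination, size, handler.
struct RegionEntry {
    const unsigned char* load_addr;  // "Load Address of source"
    unsigned char* exec_addr;        // "Execution Address of destination"
    std::size_t exec_size;           // "Execution Size of destination"
    Handler handler;                 // routine that processes this entry
};

// Models __scatterload_copy: copy exec_size bytes from load to exec address.
void copy_handler(const unsigned char* src, unsigned char* dst, std::size_t size) {
    std::memcpy(dst, src, size);
}

// Models __scatterload_zeroinit: the source is not used.
void zeroinit_handler(const unsigned char*, unsigned char* dst, std::size_t size) {
    std::memset(dst, 0, size);
}

// Models __scatterload_null: does nothing.
void null_handler(const unsigned char*, unsigned char*, std::size_t) {}

// Processes entries from base up to, but not including, limit, in order.
void run_region_table(const RegionEntry* base, const RegionEntry* limit) {
    for (const RegionEntry* e = base; e != limit; ++e) {
        e->handler(e->load_addr, e->exec_addr, e->exec_size);
    }
}
```

In use, a table with one copy entry and one zero-initialization entry leaves the RW buffer holding the "load" image and the ZI buffer cleared, which mirrors what the library does for RW and ZI execution regions.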
Examples
Using the image generated by the example described in Writing an overlay manager for manually
placed overlays, the following examples show the fromelf output:
• To view the disassembly:
...
||Region$$Table$$Base||
DCD 0x24002ec8
DCD 0x00010000
DCD 0x00000010
DCD 0x2400003c
DCD 0x24002ed8
DCD 0x00010010
DCD 0x00000244
DCD 0x24000058
||Region$$Table$$Limit||
...
# Offset String
====================================
...
451 8751: Region$$Table$$Base
452 8771: Region$$Table$$Limit
...
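Because each AArch32 entry is four 32-bit words (16 bytes), the number of entries between Region$$Table$$Base and Region$$Table$$Limit is (Limit - Base) / 16. The following C++ sketch applies this to the eight words shown in the disassembly above, which gives two entries. The labels used when printing (load, exec, size, handler) are an assumption about the word order made for illustration, and the sketch does not interpret the bottom two bits of each word; entry_count and dump_entries are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Words copied from the fromelf disassembly between
// ||Region$$Table$$Base|| and ||Region$$Table$$Limit||.
const std::uint32_t kRegionTable[] = {
    0x24002ec8, 0x00010000, 0x00000010, 0x2400003c,
    0x24002ed8, 0x00010010, 0x00000244, 0x24000058,
};

constexpr std::size_t kWordsPerEntry = 4;  // AArch32: four 32-bit words per entry

// Number of entries = (Limit - Base) / (4 words * 4 bytes).
constexpr std::size_t entry_count(std::size_t table_bytes) {
    return table_bytes / (kWordsPerEntry * sizeof(std::uint32_t));
}

// Prints each entry. The field labels assume the word order
// (load address, execution address, size, handler) and are illustrative only.
void dump_entries() {
    const std::size_t n = entry_count(sizeof kRegionTable);
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint32_t* w = &kRegionTable[i * kWordsPerEntry];
        std::printf("entry %zu: load=0x%08x exec=0x%08x size=0x%x handler=0x%08x\n",
                    i, static_cast<unsigned>(w[0]), static_cast<unsigned>(w[1]),
                    static_cast<unsigned>(w[2]), static_cast<unsigned>(w[3]));
    }
}
```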
Related information
Application startup on page 234
The application stack and heap are set up during C library initialization. You can tailor stack and
heap placement by using the specially named ARM_LIB_HEAP, ARM_LIB_STACK, or ARM_LIB_STACKHEAP
execution regions. Alternatively, if you are not using a scatter file, you can reimplement the
__user_setup_stackheap() function.
Related information
Run-time memory models on page 244
One-region model
The application stack and heap grow towards each other in the same region of memory, as shown in
the following figure. In this run-time memory model, the heap is checked against the value of the
stack pointer when new heap space is allocated, for example, when malloc() is called.
[Figure: One-region model. STACK and HEAP share one region of memory; Stack Base 0x40000.]
Two-region model
The stack and heap are placed in separate regions of memory, as shown in the following figure. For
example, you might have a small block of fast RAM that you want to reserve for stack use only. For
a two-region model, you must import __use_two_region_memory.
In this run-time memory model, the heap is checked against the heap limit when new heap space is
allocated.
[Figure: Two-region model. HEAP between Heap Base 0x28000000 and Heap Limit 0x28080000; STACK in a separate region with Stack Base 0x40000.]
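The two models differ only in the bound that the heap is checked against when it grows. The following C++ sketch contrasts the two checks with a trivial bump allocator over simulated addresses. It is not how the Arm C library heap is implemented: HeapModel and its members are hypothetical, and the stack pointer is passed in explicitly instead of being read from the processor.

```cpp
#include <cstdint>

// Hypothetical bump allocator over simulated addresses, used only to
// contrast the one-region and two-region heap checks.
class HeapModel {
public:
    HeapModel(std::uint32_t heap_base, std::uint32_t heap_limit, bool two_region)
        : top_(heap_base), limit_(heap_limit), two_region_(two_region) {}

    // Returns the address of the new block, or 0 if the check fails.
    // current_sp is the simulated stack pointer at the time of the call.
    std::uint32_t allocate(std::uint32_t bytes, std::uint32_t current_sp) {
        // One-region model: the heap grows up towards the stack, so the new
        // top must stay below the current stack pointer.
        // Two-region model: the new top must stay within the heap limit.
        const std::uint32_t bound = two_region_ ? limit_ : current_sp;
        if (top_ > bound || bytes > bound - top_) {
            return 0;
        }
        const std::uint32_t block = top_;
        top_ += bytes;
        return block;
    }

private:
    std::uint32_t top_;    // next free heap address
    std::uint32_t limit_;  // heap limit, used by the two-region model only
    bool two_region_;
};
```

With the two-region values from the figure, an allocation succeeds as long as it fits below 0x28080000, regardless of where the stack pointer is.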
Related information
Stack pointer initialization and heap bounds
If you use a scatter file to tailor stack and heap placement, the linker includes a version of the
library heap and stack setup code that uses the linker-defined symbols for the ARM_LIB_* region
names. Alternatively, you can create your own implementation.
The reset handler is normally a short module coded in assembler that executes immediately on
system startup. As a minimum, your reset handler initializes stack pointers for the modes that
your application is running in. For processors with local memory systems, such as caches, TCMs,
MMUs, and MPUs, some configuration must be done at this stage in the initialization process.
After executing, the reset handler typically branches to __main to begin the C library initialization
sequence.
There are some components of system initialization, for example, the enabling of interrupts, that
are generally performed after the C library initialization code has finished executing. The block of
code labeled $Sub$$main() performs these tasks immediately before the main application begins
executing.
Related information
About using $Super$$ and $Sub$$ to patch symbol definitions
Specifying stack and heap using the scatter file
The vector table must be placed at a specific address, usually 0x0. To do this, you can use the
scatter-loading +FIRST directive, as shown in the following example.
The vector table for the microcontroller profiles is very different from the vector table of most
other Arm® architectures.
Related information
Vector table for AArch32 A and R profiles on page 247
Vector table for M-profile architectures on page 248
Information about scatter files
Scatter-loading images with a simple memory map
If required, you can include the FIQ handler at the end of the vector table to ensure it is handled
as efficiently as possible. See the following example. Using a literal pool means that addresses can
easily be modified later if necessary.
//----------------------------------------------------------------
// Exception Vector Table
//----------------------------------------------------------------
// Note: LDR PC instructions are used here, though branch (B) instructions
// could also be used, unless the exception handlers are >32MB away.
Vectors:
    ldr pc, Reset_Addr          // 0x00 Reset
    ldr pc, Undefined_Addr      // 0x04 Undefined instruction
    ldr pc, SVC_Addr            // 0x08 Supervisor call
    ldr pc, Prefetch_Addr       // 0x0C Prefetch abort
    ldr pc, Abort_Addr          // 0x10 Data abort
    nop                         // 0x14 Reserved
    ldr pc, IRQ_Addr            // 0x18 IRQ
    ldr pc, FIQ_Addr            // 0x1C FIQ (FIQ handler code can start here instead)

    // Literal pool holding the handler addresses
    .balign 4
Reset_Addr:     .word Reset_Handler
Undefined_Addr: .word Undefined_Handler
SVC_Addr:       .word SVC_Handler
Prefetch_Addr:  .word Prefetch_Handler
Abort_Addr:     .word Abort_Handler
IRQ_Addr:       .word IRQ_Handler
FIQ_Addr:       .word FIQ_Handler
This example assumes that you have ROM at location 0x0 on reset. Alternatively, you can use the
scatter-loading mechanism to define the load and execution address of the vector table. In that
case, the C library copies the vector table for you.
In Arm®v7-M and Armv8-M processors, you can specify the <vectorbaseaddress> in the
Vector Table Offset Register (VTOR) to relocate the vector table. The default location on reset
is 0x0 (CODE space). For Armv6-M, the vector table base address is fixed at 0x0. The word at
<vectorbaseaddress> holds the reset value of the main stack pointer.
The least significant bit, bit[0], of each address in the vector table must be set or a
HardFault exception is generated. If the table contains T32 symbol names, the Arm
Compiler for Embedded toolchain sets these bits for you.
When setting a different location, the offset, in bytes, must be aligned to:
• A power of 2.
• A minimum of 128 bytes.
• A minimum of 4*<N>, where <N> is the number of exceptions supported.
The minimum alignment is 128 bytes, which allows for 32 exceptions. 16 of these entries are reserved
for system exceptions, so the minimum alignment allows up to 16 interrupts.
To use more interrupts, you must increase the alignment by rounding the total number of exceptions
up to the next power of two. For example, if you require 21 interrupts, then the total number of
exceptions is 37, that is, 21 interrupts plus 16 reserved system exceptions. The alignment must be
on a 64-word (256-byte) boundary because the next power of 2 after 37 is 64.
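The alignment rule above is simple arithmetic, so it can be captured in a small helper. The following C++ sketch encodes only the rules stated in this section (power of 2, at least 128 bytes, at least 4 bytes per exception); the names vtor_alignment and vtor_offset_ok are hypothetical, and your device might impose further restrictions on where the table can live.

```cpp
#include <cstdint>

// Vector table entries reserved for system exceptions
// (entry 0 holds the initial main stack pointer value).
constexpr std::uint32_t kSystemExceptions = 16;

// Smallest power of two that is >= n. Assumes 1 <= n <= 2^31.
std::uint32_t next_pow2(std::uint32_t n) {
    std::uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Required VTOR alignment, in bytes, for a table with num_irqs interrupts:
// a power of 2, at least 128 bytes, and at least 4 bytes per exception.
std::uint32_t vtor_alignment(std::uint32_t num_irqs) {
    const std::uint32_t exceptions = kSystemExceptions + num_irqs;
    const std::uint32_t bytes = next_pow2(exceptions) * 4;
    return bytes < 128 ? 128 : bytes;
}

// True if table_addr satisfies the alignment for num_irqs interrupts.
bool vtor_offset_ok(std::uint32_t table_addr, std::uint32_t num_irqs) {
    return (table_addr & (vtor_alignment(num_irqs) - 1)) == 0;
}
```

For the 21-interrupt case in the text, vtor_alignment(21) gives 256 bytes, so a table at 0x20000100 is acceptable and one at 0x20000080 is not.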
Implementations might restrict where the vector table can be located. For example,
in Cortex®-M3 r0p0 to r2p0, the vector table cannot be in RAM space.
This information does not apply to Arm®v6-M, Armv7-M, and Armv8-M profiles.
This information assumes that an Arm processor begins fetching instructions at 0x0.
This is the standard behavior for systems based on Arm processors. However, some
Arm processors, for example the processors based on the Armv7-A architecture,
can be configured to begin fetching instructions from 0xFFFF0000.
There has to be a valid instruction at 0x0 at startup, so you must have nonvolatile memory located
at 0x0 at the moment of power-on reset. One way to achieve this is to have ROM located at 0x0.
However, there are some drawbacks to this configuration.
Arm® Compiler for Embedded 6 implements the Itanium C++ ABI and includes:
• A compiler (armclang) that can be used to compile programs written in C++.
• Two C++ libraries:
◦ The C++ standard library (libc++).
◦ The C++ run-time library (libc++abi).
• typeid.
Section 2.9 Run-Time Type Information (RTTI) of the Itanium C++ ABI describes when RTTI is
referenced and generated.
RTTI for basic types such as int and bool is stored in the runtime library. Therefore, object files
generated from a C++ program might reference RTTI defined in libc++abi. See section 2.9.2 Place of
Emission of the Itanium C++ ABI for more information.
The compiler also generates RTTI for a program that contains classes and structures with virtual
functions.
Use of RTTI requires linking with a significant portion of libc++abi, because libc++abi contains
several routines involved in processing RTTI. RTTI also brings in links to C++ exceptions, or to
software aborts, for the case where a type check fails, for example when typeid does not match.
Compiling your code with the armclang option -fno-rtti does not guarantee complete removal of RTTI.
The standard libc++ library is compiled to use RTTI, and libc++abi includes RTTI handling functions.
Therefore, you must also:
• Avoid using functions in the std:: namespace.
• Link against stub implementations of RTTI for basic types. For more information, see Avoid
linking in Run-Time Type Information.
Related information
-frtti, -fno-rtti
Helper functions
The Run-time ABI for the Arm Architecture document standardizes a set of helper functions that
all ABI-compliant Arm Compiler for Embedded toolchains must provide. The document gives the
following definition of a helper function:
A helper function is one that a relocatable file might refer to, even though its source includes no
standard headers, or no headers at all. A helper function usually implements some aspect of a
programming language not implemented by its standard library. For example, from C, floating-point
to integer conversions.
In some cases, a helper function might implement some aspect of standard library behavior not
implemented by any of its interface functions. For example, from the C library, errno.
A helper function might also implement an operation not implemented by the underlying hardware,
for example, integer division, floating-point arithmetic, or reading and writing misaligned data.
All ABI-compliant compilers can assume that these helper functions are present. Arm Compiler for
Embedded provides these helper functions in the standard C run-time library, and armclang uses
them.
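As an illustration of the kind of operation a helper function provides, the following C++ sketch implements unsigned 32-bit division by restoring shift-and-subtract, one classic way to divide on a processor without a hardware divide instruction. It is not the Arm library implementation: the Run-time ABI's real unsigned division helper is __aeabi_uidiv, with its own interface and its own handling of division by zero. The names soft_udiv and UDivResult are hypothetical, and the caller must ensure that the divisor is nonzero.

```cpp
#include <cstdint>

// Quotient and remainder of an unsigned 32-bit division.
struct UDivResult {
    std::uint32_t quotient;
    std::uint32_t remainder;
};

// Restoring shift-and-subtract division, one quotient bit per iteration.
// Precondition: divisor != 0.
UDivResult soft_udiv(std::uint32_t dividend, std::uint32_t divisor) {
    std::uint32_t quotient = 0;
    std::uint64_t remainder = 0;  // 64-bit so that the shift cannot overflow
    for (int bit = 31; bit >= 0; --bit) {
        // Bring the next dividend bit down into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1u);
        if (remainder >= divisor) {
            remainder -= divisor;
            quotient |= (1u << bit);
        }
    }
    return {quotient, static_cast<std::uint32_t>(remainder)};
}
```

The loop always runs 32 iterations, so its timing does not depend on the operands, which can matter in real-time code.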
You must write your C or C++ code in a way that avoids the standard library, and minimizes the use
of the runtime library by the compiler. Compiler and linker options are available to prevent them
using the library functions.
The following types of message identify content from the Arm libraries:
• Searching for ARM libraries in directory <path to directory containing Arm libraries>
• definition: <symbol>
For example:
...
Loading System Libraries.
Related information
--verbose
ABI for the Arm Architecture
For example:
• The project must use a certified Functional Safety (FuSa) C library to make it easier to fulfill the
safety requirements for the project.
• The project uses alternative libraries provided by the Operating System (OS) vendor.
• The project has some custom requirements to re-implement certain C library functionality.
The following sections expand on the information provided in Standalone C library functions.
The standard C library (standardlib) is the default C library that projects are likely to use.
Microlib is an alternative to the standard C library. It focuses in particular on smaller code size,
but has some documented limitations and restrictions.
The -fno-builtin option applies to functions such as printf, but does not apply to
__builtin_<name> functions, despite the name. The compiler knows something about functions such
as printf, and sometimes transforms the source code based on that understanding.
However, the compiler still expects the library to provide an implementation of those
functions.
For example, if your code calls printf("hello, world\n"), the compiler might convert
it into puts("hello, world") because it knows from the descriptions of those two
functions in the C standard that they perform the same operation. However, the puts()
function cannot perform all the operations of printf by itself. If you write a more
complicated call involving formatting such as %d, then use this option to ensure that the
compiler emits a call to the printf library function.
• -nobuiltininc prevents the compiler from using the built-in header files.
• -nostdlib prevents the compiler from using the Arm standard C and C++ libraries.
• -nostdlibinc prevents the compiler from using the Arm standard C and C++ library
header files.
To use the Arm FuSa C library with the libc++ header files, you must use the
-nobuiltininc, -nostdlibinc, and -nostdlib options. The FuSa C library
is different from the Arm standard C library because it is designed to work
without the built-in header files.
If you are working in a freestanding, non-hosted, environment you can specify the
[COMMUNITY] option -ffreestanding. This option:
• Asserts that compilation targets a freestanding environment.
• Implies -fno-builtin.
• Sets the macro __STDC_HOSTED__ to 0.
Linker options
• --no_scanlib prevents the linker from scanning the Arm libraries to resolve references.
As a consequence of using this option, the Arm supplied libraries are not used by the
linker and you must include your own libraries.
Without a function labeled main(), the initialization sequence is not linked in, and as a result, some
standard C library functionality is not supported.
Related information
Application startup on page 234
Avoid linking in Run-Time Type Information on page 257
-fno-builtin
-ffreestanding
-nostdlib
-nostdlibinc
--scanlib, --no_scanlib
--startup=symbol, --no_startup
__rt_entry
That is, you do not need to include any headers or call functions directly for the compiler to emit a
call to a function defined in the runtime library. The libc++abi library contains implementations of
these functions and some additional low-level support for libc++. Major components include:
• RTTI
• Exceptions
• New and Delete
• Terminate
• Static initialization guards
• Pure virtual abort handler
To link with your own ABI-compliant runtime library, specify the following armclang command-line
options:
• -fno-exceptions to disable the generation of code needed to support C++ exceptions.
• -fno-rtti to avoid typeinfo in object files and ensure no references to libc++abi typeinfo
functionality.
• -nobuiltininc to exclude the built-in header files.
• -nostdlib to pass --no_scanlib to the linker and to prevent the printf optimization. This
option disables the inclusion of both the C and C++ libraries.
• -nostdlibinc to not add the include and include/libcxx include directories.
• -nostdinc++ to disable standard #include directories for the C++ standard library.
• -I <project_runtime_library_header> to specify the location of your ABI-compliant runtime
library headers.
• -L <project_runtime_library> to specify the location of your ABI-compliant runtime library.
For more information on how to avoid linking in libc++abi, see Avoid linking in Run-Time Type
Information.
Related information
-fexceptions, -fno-exceptions
-frtti, -fno-rtti
-I
-nobuiltininc
-nostdlib
-nostdlibinc
libc++ is compiled with RTTI. You can avoid using libc++ by not calling any std:: namespace
functions. However, the typeinfos in libc++abi might still be referenced.
Avoiding libc++abi
Compiling all source code with the armclang option -fno-rtti does not guarantee complete
removal of RTTI from the linked program.
However, RTTI is not used, and armclang does not generate calls to the RTTI handling functions in
libc++abi, when all the following conditions are true:
• -fno-exceptions is used to disable C++ exceptions.
• dynamic_cast is not used in the application, or is used in such a way that RTTI is not required.
• typeid is not used in the application.
If your code includes typeid, then specifying -fno-rtti results in an error. However, an error is
output for dynamic_cast only if the way it is used requires RTTI.
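One common way to keep such code free of RTTI is to use dynamic_cast only for upcasts, which the C++ standard requires to be resolved statically, and to replace downcasts with an explicit type tag. The following C++ sketch illustrates this general technique; it is not an Arm-specific recommendation, the Shape, Circle, Square, as_shape, and as_circle names are hypothetical, and it is intended to build with -fno-rtti and -fno-exceptions. Shape deliberately avoids pure virtual functions and virtual destructors, which would pull in __cxa_pure_virtual and operator delete.

```cpp
// A hypothetical class hierarchy that avoids typeid and downcasting
// dynamic_cast, so it does not depend on RTTI.
class Shape {
public:
    enum class Kind { Circle, Square };
    explicit Shape(Kind k) : kind_(k) {}
    Kind kind() const { return kind_; }
    virtual int sides() const { return 0; }  // default: no straight sides
private:
    Kind kind_;
};

class Circle final : public Shape {
public:
    explicit Circle(int r) : Shape(Kind::Circle), r_(r) {}
    int radius() const { return r_; }
private:
    int r_;
};

class Square final : public Shape {
public:
    explicit Square(int s) : Shape(Kind::Square), s_(s) {}
    int sides() const override { return 4; }
    int side_length() const { return s_; }
private:
    int s_;
};

// Upcast: resolved at compile time, so it does not need RTTI.
inline const Shape* as_shape(const Circle* c) {
    return dynamic_cast<const Shape*>(c);
}

// Downcast: checks the tag instead of using dynamic_cast<const Circle*>.
inline const Circle* as_circle(const Shape* s) {
    return (s != nullptr && s->kind() == Shape::Kind::Circle)
               ? static_cast<const Circle*>(s)
               : nullptr;
}
```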
To ensure you avoid RTTI for the basic types being linked in from libc++abi, you must provide stub
implementations of RTTI for basic types as a placeholder.
Providing such stubs is sufficient to link the application, but not to use the C++ features that
depend on RTTI. That is, C++ exceptions, dynamic_cast, and typeid.
#include <iostream>
int main(void)
{
std::cout << "Hello World!" << std::endl;
return 0;
}
2. Create the typeinfo.s file containing the source code provided in typeinfo.s example source
code.
3. Create the scatter file scatter.sct containing the following:
The scatter file explicitly places the unused_rtti section in an UNINIT section to ensure that
the RTTI stubs do not occupy any memory.
4. Build the C++ and assembler code with the following commands:
The linker command includes an option to generate a listings file named hello.lst that
includes:
• The memory map and symbol listing for the final image.
• The verbose output to show how the linker resolved references to definitions, including the
references to the RTTI stubs.
The memory map shows that the execution region containing the RTTI data is treated as UNINIT
and does not occupy any space:
Execution Region UNUSED_RTTI (Exec base: 0x20000d18, Load base: 0x00016610, Size:
0x0000000c, Max: 0xffffffff, ABSOLUTE, UNINIT)
Exec Addr Load Addr Size Type Attr Idx E Section Name
Object
Related information
About Run-Time Type Information on page 250
-fexceptions, -fno-exceptions
-frtti, -fno-rtti
__cxa_deleted_virtual
Use when a virtual function is explicitly deleted, for example:
void foo() {
static MyClass my_c;
...
}
Instead of using a function-local static object, use a pointer to an object that is instantiated
with placement new. Dynamically allocated memory obtained with new is also possible, but you
probably want to avoid it. The pointer acts as an inline version of the guard, for example:
#include <new>  // declares placement operator new
void foo() {
    alignas(MyClass) static char buf[sizeof(MyClass)];
    static MyClass *cp = nullptr;
    if (cp == nullptr) {
        cp = new(buf) MyClass;  // construct in place on first call only
    }
    // ... use *cp ...
}
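This pattern is not thread-safe: if two threads call foo() at the same time, both might construct
the object. If your program is entirely single-threaded, using the static pointer is not an issue.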
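The following self-contained C++ version of the pattern adds a construction counter so that you can check that the object is constructed only once. It is a sketch under the same single-threaded assumption: the constructions counter and get_instance are hypothetical additions, and the object is never destroyed. buf and cp are still function-local statics, but they are constant-initialized, so the compiler does not need a guard for them.

```cpp
#include <new>  // placement operator new

// Hypothetical class; the counter records how many times it is constructed.
struct MyClass {
    static int constructions;
    int value;
    MyClass() : value(42) { ++constructions; }
};
int MyClass::constructions = 0;

// Lazily constructs a single MyClass in static storage without a
// dynamically initialized function-local static, so no guard is needed.
MyClass& get_instance() {
    alignas(MyClass) static unsigned char buf[sizeof(MyClass)];
    static MyClass* cp = nullptr;  // constant-initialized
    if (cp == nullptr) {
        cp = new (buf) MyClass;    // construct in place on first use
    }
    return *cp;
}
```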
__cxa_pure_virtual
When a pure virtual function is used, for example:
Do not use. Provide a default implementation that aborts if called. This method is not ideal
because the toolchain can provide diagnostics if it can prove a pure virtual function is going
to be called, and if there is no implementation of a pure virtual function.
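A minimal C++ sketch of that workaround follows: instead of declaring the function pure virtual, give it a default implementation that aborts, so that the vtable no longer refers to __cxa_pure_virtual. Driver and UartDriver are hypothetical names, and std::abort stands in for whatever fatal-error handling your system uses. The trade-off is the one described above: the compiler can no longer reject an object of a class that forgets to override the function.

```cpp
#include <cstdlib>

// Hypothetical driver interface. send() would normally be pure virtual
// (= 0); a default that aborts avoids the __cxa_pure_virtual reference.
class Driver {
public:
    virtual int send(int byte) {
        (void)byte;
        std::abort();  // reached only if a derived class did not override send()
    }
protected:
    ~Driver() = default;  // never deleted through a Driver pointer
};

class UartDriver final : public Driver {
public:
    int send(int byte) override {
        last_ = byte;  // a real driver would write to the UART here
        return 1;      // number of bytes sent
    }
    int last() const { return last_; }
private:
    int last_ = -1;
};
```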
new and delete
Use when non-placement new is called and there is no user-defined global operator new
overload or type-specific operator new overload present.
The implementation in libc++abi is only for those applications that are not
using libc++.
Provide your own implementations of new and delete. The libc++ header contains the
prototypes. Because these functions are not qualified, it is likely that you have to take the
prototypes from the standard so that they can be qualified. Arm expects that most bare-
metal applications use placement new and delete. Therefore, such applications can avoid
dynamic memory allocation.
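Another way to avoid the global new and delete is a type-specific overload. The following C++ sketch gives one class its own operator new and operator delete that hand out slots from a fixed, statically allocated pool, so that allocating that class involves neither a heap nor the library allocation functions. Message, kPoolSlots, g_pool, and g_used are hypothetical names. The allocation function is declared noexcept, so under -fno-exceptions a new-expression yields nullptr, without running the constructor, when the pool is exhausted. The pool is not thread-safe.

```cpp
#include <cstddef>
#include <new>

// Hypothetical class that is allocated from a fixed pool, not the heap.
class Message {
public:
    explicit Message(int id) : id_(id) {}
    int id() const { return id_; }

    // Non-throwing class-specific allocation: a new-expression returns
    // nullptr instead of throwing when no slot is free.
    static void* operator new(std::size_t size) noexcept;
    static void operator delete(void* p) noexcept;

    static constexpr std::size_t kPoolSlots = 4;

private:
    int id_;
};

namespace {
// Storage for kPoolSlots objects; each slot is suitably aligned because
// sizeof(Message) is a multiple of alignof(Message).
alignas(Message) unsigned char g_pool[Message::kPoolSlots][sizeof(Message)];
bool g_used[Message::kPoolSlots] = {};
}  // namespace

void* Message::operator new(std::size_t size) noexcept {
    if (size != sizeof(Message)) {
        return nullptr;  // derived classes are not supported by this pool
    }
    for (std::size_t i = 0; i < kPoolSlots; ++i) {
        if (!g_used[i]) {
            g_used[i] = true;
            return g_pool[i];
        }
    }
    return nullptr;  // pool exhausted
}

void Message::operator delete(void* p) noexcept {
    for (std::size_t i = 0; i < kPoolSlots; ++i) {
        if (p == g_pool[i]) {
            g_used[i] = false;  // return the slot to the pool
            return;
        }
    }
}
```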
Therefore, the initialization sequence of processors with local memory systems requires special
consideration.
The C library initialization code in __main is responsible for setting up the execution time memory
map of the image. Therefore, the run-time memory view of the processor must be set up before
branching to __main. This means that any MMU or MPU must be set up and enabled in the reset
handler.
Tightly Coupled Memories (TCM) must also be enabled before branching to __main, normally
before MMU/MPU setup, because you generally want to scatter-load code and data into TCMs.
Take care not to access memory that is masked by the TCMs once
they are enabled.
You might also encounter problems with cache coherency if caches are enabled before branching
to __main. Code in __main copies code regions from their load address to their execution address,
essentially treating instructions as data. As a result, some instructions can be cached in the data
cache, in which case they are not visible to the instruction path.
To avoid these coherency problems, enable caches after the C library initialization sequence
finishes executing.
Related information
Cortex-A Series Programmer's Guide for Armv8-A
Cortex-A Series Programmer's Guide for Armv7-A
Cortex-R Series Programmer's Guide for Armv7-R
; ***************************************************************
; This example does not apply to M-profile
; ***************************************************************
Mode_FIQ        EQU     0x11    ; CPSR mode bits for FIQ mode
Mode_IRQ        EQU     0x12    ; CPSR mode bits for IRQ mode
Mode_SVC        EQU     0x13    ; CPSR mode bits for Supervisor mode
I_Bit           EQU     0x80    ; when the I bit is set, IRQ is disabled
F_Bit           EQU     0x40    ; when the F bit is set, FIQ is disabled
Len_FIQ_Stack   EQU     256
Len_IRQ_Stack   EQU     256
stack_base      DCD     0x18000
;
Reset_Handler
        ; stack_base could be defined above, or located in a scatter file
        LDR     R0, stack_base
        ; Enter each mode in turn and set up the stack pointer
        MSR     CPSR_c, #Mode_FIQ:OR:I_Bit:OR:F_Bit ; Interrupts disabled
        MOV     sp, R0
        SUB     R0, R0, #Len_FIQ_Stack
        MSR     CPSR_c, #Mode_IRQ:OR:I_Bit:OR:F_Bit ; Interrupts disabled
        MOV     sp, R0
        SUB     R0, R0, #Len_IRQ_Stack
        MSR     CPSR_c, #Mode_SVC:OR:I_Bit:OR:F_Bit ; Interrupts disabled
        MOV     sp, R0
        ; Leave processor in SVC mode
The stack_base symbol can be a hard-coded address, or it can be defined in a separate assembler
source file and located by a scatter file.
The example allocates 256 bytes of stack for each of the Fast Interrupt Request (FIQ) and Interrupt
Request (IRQ) modes, but you can do the same for any other execution mode. To set up the stack
pointers, enter each mode with interrupts disabled, and assign the appropriate value to the stack pointer.
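The layout produced by the reset handler is plain arithmetic: each mode's stack pointer is set to the top of its own block, and the blocks are carved downwards from stack_base, because AArch32 stacks are full-descending. The following C++ sketch computes the same values as the assembler example for a stack_base of 0x18000 and 256-byte FIQ and IRQ stacks. It only models the layout; ModeStacks and stack_tops are hypothetical names, and on the target the stack pointers are set by the MSR and MOV sequence itself.

```cpp
#include <cstdint>

// Initial stack pointer of each mode, as set by the example reset handler.
struct ModeStacks {
    std::uint32_t fiq_sp;
    std::uint32_t irq_sp;
    std::uint32_t svc_sp;
};

// Each mode starts at the top of its block; the next block begins
// directly below the previous one.
ModeStacks stack_tops(std::uint32_t stack_base,
                      std::uint32_t len_fiq, std::uint32_t len_irq) {
    ModeStacks s{};
    std::uint32_t sp = stack_base;
    s.fiq_sp = sp;   // MOV sp, R0 in FIQ mode
    sp -= len_fiq;   // SUB R0, R0, #Len_FIQ_Stack
    s.irq_sp = sp;   // MOV sp, R0 in IRQ mode
    sp -= len_irq;   // SUB R0, R0, #Len_IRQ_Stack
    s.svc_sp = sp;   // MOV sp, R0 in SVC mode
    return s;
}
```

With the example values, the FIQ, IRQ, and SVC stack pointers start at 0x18000, 0x17F00, and 0x17E00 respectively.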
The stack pointer value set up in the reset handler is automatically passed as a parameter to
__user_initial_stackheap() by C library initialization code. Therefore, this value must not be
modified by __user_initial_stackheap().
Related information
Specifying stack and heap using the scatter file
Cortex-M3 Embedded Software Development
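This example shows how $Sub$$ and $Super$$ can be used in this way:
extern void cache_enable(void);   // your routine that enables caches
extern void int_enable(void);     // your routine that enables interrupts
extern void $Super$$main(void);   // refers to the original main()
void $Sub$$main(void)
{
    cache_enable();   // enables caches
    int_enable();     // enables interrupts
    $Super$$main();   // calls original main()
}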
The linker replaces the function call to main() with a call to $Sub$$main(). From there you can call
a routine that enables caches and another to enable interrupts.
Related information
Use of $Super$$ and $Sub$$ to patch symbol definitions
Much of the functionality that you are likely to implement at startup, both in the reset handler and
$Sub$$main, can only be done while executing in privileged modes, for example, on-chip memory
manipulation, and enabling interrupts.
If you want to run your application in a privileged mode, this is not an issue. Ensure that you
change to the appropriate mode before exiting your reset handler.
If you want to run your application in User mode, however, you can only change to User mode
after completing the necessary tasks in a privileged mode. The most likely place to do this is in
$Sub$$main().
The C library initialization code must use the same stack as the application. If you
need to use a non-User mode in $Sub$$main and User mode in the application,
you must exit your reset handler in System mode, which uses the User mode stack
pointer.
For example, if a target has a timer peripheral with two memory mapped 32-bit registers, a C
structure that maps to these registers is:
struct
{
volatile unsigned ctrl; /* timer control */
volatile unsigned tmr; /* timer value */
} timer_regs;
It is important that the contents of these registers are not zero-initialized during application startup,
because this is likely to change the state of your system. Marking an execution region with the
UNINIT attribute prevents ZI data in that region from being zero-initialized by __main.
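To make the register layout explicit and check it at compile time, you can describe the registers with fixed-width types and static_assert. The following C++ sketch does this for the timer example. TimerRegs, kTimerEnable, and timer_start are hypothetical names, and treating bit 0 of ctrl as an enable bit is an assumption for illustration; on the target, the object would still be placed in an UNINIT execution region at the peripheral address as described above. For a quick host check, you can call timer_start on an ordinary TimerRegs object.

```cpp
#include <cstddef>
#include <cstdint>

// Register block of a hypothetical timer peripheral.
struct TimerRegs {
    volatile std::uint32_t ctrl;  // timer control, offset 0x0
    volatile std::uint32_t tmr;   // timer value,   offset 0x4
};

// Catch layout mistakes at compile time.
static_assert(offsetof(TimerRegs, ctrl) == 0x0, "ctrl must be at offset 0");
static_assert(offsetof(TimerRegs, tmr) == 0x4, "tmr must be at offset 4");
static_assert(sizeof(TimerRegs) == 8, "two 32-bit registers, no padding");

// Assumption: bit 0 of ctrl enables the timer.
constexpr std::uint32_t kTimerEnable = 1u << 0;

// Loads a start value, then sets the enable bit with a read-modify-write
// that keeps the other control bits unchanged.
void timer_start(TimerRegs& regs, std::uint32_t start_value) {
    regs.tmr = start_value;
    regs.ctrl = regs.ctrl | kTimerEnable;
}
```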
Related information
Placement of functions and data at specific addresses on page 180
__attribute__((section("name"))) variable attribute
XOM allows you to protect your intellectual property by preventing executable code being read
by users. For example, you can place firmware in XOM and load user code and drivers separately.
Placing the firmware in XOM prevents users from trivially reading the code.
The Arm architecture does not directly support XOM. XOM is supported at the
memory device level.
Related information
Building applications for execute-only memory on page 264
Link-Time Optimization (LTO) does not honor the armclang option -mexecute-only. If you use
the armclang options -flto or -Omax, then the compiler cannot generate execute-only code.
Procedure
1. Compile your C or C++ code using the -mexecute-only option.
armclang --target=arm-arm-none-eabi -march=armv7-m -mexecute-only -c test.c -o test.o
The -mexecute-only option prevents the compiler from generating any data accesses to the
code sections.
To keep code and data in separate sections, the compiler disables the placement of literal pools
inline with code.
Compiled execute-only code sections in the ELF object file are marked with the
SHF_ARM_NOREAD flag.
2. Specify the memory map to the linker using either of the following:
• The +XO selector in a scatter file.
• The armlink --xo_base option on the command line.
The XO execution region is placed in a separate load region from the RO, RW, and ZI execution
regions.
Related information
Execute-only memory on page 264
-mexecute-only (armclang)
--execute_only (armasm)
--xo_base=address (armlink)
AREA directive
The linker normally removes the empty .text section during unused section elimination. However,
the unused section elimination does not occur when:
• The image has no entry point.
• You specify one of the following linker options:
◦ --no_remove
◦ --keep (<object-file-name>(.text))
If you use a scatter file to merge eXecute-Only (XO) and Read-Only (RO) sections into a single
executable region, then the XO sections lose the XO attribute and become RO.
When compiling with -fno-function-sections, all functions are placed in the .text section
with the SHF_ARM_PURECODE attribute. As a result, there are two sections with the name .text,
one with and one without the SHF_ARM_PURECODE attribute. You cannot select between the two
.text sections by name. Therefore, you must use attributes as the selectors in the scatter file to
differentiate between XO and RO sections.
Examples
The following example shows how Arm Compiler for Embedded 6 handles .text sections:
1. Create the file example.c containing:
void foo() {}
int main() {
foo();
}
2. Compile the program and examine the object file with fromelf.
...
3. Create a scatter file containing the following:
LR_XO 0x10000
{
ER_MAIN_FOO 0x10000
{
example.o(.text*)
}
}
LR_2 0x20000
{
ER_REST 0x20000
{
*(+RO, +ZI)
}
ARM_LIB_STACKHEAP 0x80000 EMPTY -0x1000 {}
}
4. Create an image file with armlink and examine the image file with fromelf:
...
5. Repeat the link with the linker option --no_remove and examine the image file with
fromelf.
The output shows that Section #1 does not have the SHF_ARM_PURECODE attribute:
...
The empty RO .text section is no longer removed and is placed in the same execution region
as .text.main and .text.foo. Therefore, these sections become read-only.
The same result is obtained when linking with --keep=example.o(.text) or if there is no main
or no entry point.
6. To ensure that the sections remain as execute-only, either:
• Change the scatter file to use the XO attribute selector as follows:
LR_XO 0x10000
{
ER_MAIN_FOO 0x10000
{
example.o(+XO)
}
}
LR_2 0x20000
{
ER_REST 0x20000
{
*(+RO, +ZI)
}
ARM_LIB_STACKHEAP 0x80000 EMPTY -0x1000 {}
}
• Explicitly place sections in their execution regions. However, compiling with
-fno-function-sections generates two .text sections with different attributes:
fromelf example.o
...
** Section #1 '.strtab' (SHT_STRTAB)
Size : 107 bytes
In this case, differentiating the sections by name only is not possible. If unused section
elimination does not remove the empty .text sections, the attribute selectors are required
to place the sections in different output sections.
Integer division by zero behavior for processors that support hardware division
instructions
For processors that support hardware division instructions, the behavior depends on the Divide by
Zero support of the processor:
• Trapping Divide by Zero errors.
• Returning a zero result on Divide by Zero.
For more information about the Divide by Zero support, see the Technical Reference Manual (TRM)
for your processor.
Integer division by zero behavior for processors that do not support hardware division
instructions
For processors that do not support hardware division instructions, such as the SDIV and UDIV
instructions, you cannot rely on the C and C++ library helper function __aeabi_idiv0() to trap and
identify integer division by zero errors. Instead, you must manually test the denominator before the
division operation takes place. For example:
#include <signal.h>
You can trap integer division by zero at run-time with the Undefined Behavior
Sanitizer (UBSan) functionality. See Overview of Undefined Behavior Sanitizer for
more information.
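A minimal sketch of testing the denominator before the division takes place. The function name checked_sdiv() is hypothetical, and reporting the error through a bool return value is an assumption: the original example includes <signal.h>, so your code might instead raise a signal such as SIGFPE when the check fails. The sketch also rejects INT_MIN / -1, whose quotient is not representable in int.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores x / y in *quot only when the division is
   well defined. Returns false, leaving *quot unchanged, when y == 0
   (division by zero) or when x == INT_MIN and y == -1 (overflow). */
bool checked_sdiv(int x, int y, int *quot)
{
    if (y == 0) {
        return false;   /* caller decides how to report division by zero */
    }
    if (x == INT_MIN && y == -1) {
        return false;   /* quotient does not fit in an int */
    }
    *quot = x / y;      /* denominator already tested, so this is safe */
    return true;
}
```

Testing in the caller keeps the check independent of whether the target has SDIV and UDIV, and independent of the behavior of __aeabi_idiv0().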
armclang assumes that the FPCR.DZE bit is never set to 1. armclang also incorrectly assumes that a
processor always automatically sets FPSR.DZC to 1 to indicate that a divide-by-zero operation has
occurred. Therefore, armclang can move a comparison with 0.0f after a potential divide-by-zero
operation, because it assumes a divide-by-zero operation does not affect program flow. However,
if the implementation supports floating-point exception trapping and your code sets FPCR.DZE to
1, a divide-by-zero operation does affect the program flow and might cause a processor exception.
If the processor does not support floating-point exception trapping, then setting FPCR.DZE to 1
might result in unexpected runtime behavior. Therefore, make sure your code is written such that
armclang avoids placing the division before the comparison.
However, because of the assumptions armclang makes about floating-point instructions, it might
compile the example C code for AArch64 state as follows:
This example shows that the division is performed before the comparison, and executed
unconditionally, which might be undesirable.
The following examples show how to work around the division by zero behavior in source code.
} else {
ret = x;
}
return ret;
}
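One further possible way to keep the division after the comparison is sketched below. It is an assumption, not necessarily one of the workarounds in this guide: the divisor is copied into a volatile object and read back only inside the guarded branch. Because the compiler cannot perform a volatile read speculatively, it cannot move the division above the comparison. The cost is an extra store and load. The function name div_or_x() is hypothetical; it mirrors the structure of the example above, which returns x when no division is performed.

```c
/* Hypothetical function: returns x / y when y is non-zero, otherwise x. */
float div_or_x(float x, float y)
{
    volatile float vy = y;  /* volatile copy of the divisor */
    float ret;

    if (y != 0.0f) {
        /* The volatile read of vy happens only on this path, so the
           division that depends on it cannot be hoisted above the test. */
        ret = x / vy;
    } else {
        ret = x;
    }
    return ret;
}
```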
11.28 Dealing with leftover debug data for code and data
removed by armlink
armlink eliminates unused functions to reduce code size. However, because the debug information
is not embedded on a function level but at the object level, the linker is unable to remove the
associated unused debug information.
When armlink removes code, it resolves references to addresses in the removed range to 0x0
by default. Therefore, any debug information for that code now points to address 0x00000000.
Resolving to 0x00000000 is a problem when the target processor has a vector table at that address
and you want to set a breakpoint at that address. Therefore, use --dangling-debug-address to
specify an unused address to use to resolve references to the removed code.
You could temporarily turn off the automatic removal of unused code with --no_remove.
However, this option increases the overall code size.
The default armclang option is -ffunction-sections. Therefore, when compiling a translation unit
containing two functions, the resulting .o file contains a separate code section for each function.
However, the debug data sections contain data for both functions.
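For illustration, consider a hypothetical translation unit such as the following. With -ffunction-sections, armclang is expected to place each function in its own section (for example, .text.used_helper and .text.unused_helper, following the .text.<function> naming seen earlier in this guide). If nothing in the image references unused_helper(), armlink can remove its code section. However, the debug data for all three functions remains in the shared debug sections of the same object file.

```c
/* Hypothetical file two_funcs.c. Each function is expected to get its own
   code section when compiled with -ffunction-sections. */

int used_helper(int x)
{
    return x * 2;
}

/* Never referenced by the rest of the image: armlink can remove this
   function's code section, but not its debug data. */
int unused_helper(int x)
{
    return x + 1;
}

/* Assumed to be reachable from the image entry point. */
int call_used(int x)
{
    return used_helper(x);
}
```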
At link time, one of the code sections might be referenced while the other is not. The linker
removes the unreferenced code section, but because debug data is organized per object rather than
per function, the retained debug data sections still contain data for both functions. The linker
applies the address relocations in that debug data that refer to the retained function, which
produces acceptable debug information for it. The relocations that refer to the removed function
remain, and the linker resolves them relative to the address supplied by --dangling-debug-address.
Typically, you use a high address well away from your code, but not at the very top of the address
range, for example:
This command forces any leftover debug data to be moved well away from the startup code
around 0x0 that you are trying to debug.
You must have enough virtual address space after the address specified with
--dangling-debug-address so that all the debug data relocated to that region safely points to nothing.
Related information
-ffunction-sections, -fno-function-sections
--dangling-debug-address=address
--remove, --no_remove
Arm® Compiler for Embedded provides scatter-loading features that can support complex memory
maps, such as overlapping regions or placing code and data into non-consecutive areas of memory.
Not all tools can handle the complex layouts that Arm Compiler for Embedded supports. Therefore,
Arm Compiler for Embedded provides a simplified mode when the following properties of the
regions are ensured:
• Each load region has a single relocation.
• There is at least one RO region and one root region.
• None of the regions are overlays or overlap.
Arm Compiler for Embedded provides the following armlink command-line options to modify the
output symbols and the addresses of the output image:
• --elf-output-format to modify the symbols and addresses of the output image to be
compatible with third-party tools.
• --scatterload-enabled or --no-scatterload-enabled to enable or disable the generation of
scatter-loading.
Region table generation is disabled when the --no-scatterload-enabled option is used, or when
the --elf-output-format is set to gnu. As such, the linker does not generate region table related
symbols such as Load$$LR. Applications that make use of Load$$LR fail to link.
Related information
--elf-output-format
--scatterload-enabled, --no-scatterload-enabled
__attribute__((section("name"))) variable attribute
Scatter-loading Features
Security features supported in Arm Compiler for Embedded
Varying the stack location at program startup to increase address diversity of the
stack pointer is also good practice to reduce the risk of attacks.
You must build Secure state code and Non-secure state code as two separate programs. Arm
Compiler for Embedded provides support for:
• Code generation for Non-secure entry functions.
• Code generation for calling Non-secure functions.
• Intrinsics to query memory permissions.
• Linker generation of gateway veneers.
Your Secure state code must perform the following operations to ensure the Secure state is
not compromised:
• Sanitize and verify the addresses provided by the Non-secure state.
• Clear all state, such as floating-point registers, before returning to the Non-secure state.
For more information, see Overview of building Secure and Non-secure images with the Armv8-M
Security Extension.
Stack protection
You access stack protection using the set of armclang options -fstack-protector* to make code
generation changes that detect stack smashing attacks.
Threat Model
The attacker is trying to perform a ROP attack by overwriting the return address on the stack
using an overflow.
Assumptions
Stack protection assumes the attacker:
• Has no access to higher level privilege.
• Does not have control of the stack.
• Only has read-only access to code.
• Can provide input to the program.
• Can disassemble code.
• Does not know the value of __stack_chk_guard or the location of __stack_chk_guard.
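The following host-runnable model illustrates what a stack canary check does and why the assumption about __stack_chk_guard matters. It is a conceptual sketch, not the code that -fstack-protector generates: the names demo_stack_guard and model_copy() are hypothetical, and the frame layout is simulated with an array. In generated code, the guard is __stack_chk_guard, and a mismatch results in a call to __stack_chk_fail().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __stack_chk_guard. A real system sets the
   guard to an unpredictable value at startup. */
static const uintptr_t demo_stack_guard = 0x5a17c0deu;

/* Models one stack frame: an 8-byte buffer followed by a canary slot.
   Copies n bytes into the buffer without checking n against the buffer
   size (the bug being modeled), then checks the canary.
   Returns 0 if the canary is intact, -1 if it was overwritten. */
int model_copy(const unsigned char *src, size_t n)
{
    enum { BUF_SIZE = 8 };
    unsigned char frame[BUF_SIZE + sizeof(uintptr_t)];

    /* "Prologue": store the guard value just above the buffer. */
    memcpy(frame + BUF_SIZE, &demo_stack_guard, sizeof demo_stack_guard);

    /* Overflowing copy. n is clamped to the whole frame only so that
       the model itself stays within defined behavior. */
    if (n > sizeof frame) {
        n = sizeof frame;
    }
    memcpy(frame, src, n);

    /* "Epilogue": compare the slot with the guard before returning. */
    uintptr_t check;
    memcpy(&check, frame + BUF_SIZE, sizeof check);
    return check == demo_stack_guard ? 0 : -1;
}
```

An attacker who knew the guard value could write it back into the slot during the overflow, which is why the threat model assumes the value and location of __stack_chk_guard are unknown.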
Library support for branch protection is available in the *a.* variants. See C and C++ library
naming conventions.
Threat Model
The attacker is trying to perform a ROP or JOP attack, by overwriting an address of an
indirect jump.
Assumptions
Branch target protection assumes the attacker:
• Has no access to higher level privilege.
• Only has read-only access to code.
• Has control of the stack, because other protections have not been applied or have failed.
• Can disassemble code.
• Can make as many attempts as they like to attack the program.
Protection mechanism
• You enable branch protection for the system on Armv8.5-A and later or Armv8.1-M or
for the memory pages covering the program in AArch64.
• An indirect branch that does not land on a landing pad instruction causes an abort. This
restricts the set of places that an attacker that compromises the system can jump to.
• The compiler inserts landing pad instructions that can be jumped to.
• Authors of assembly code are responsible for adding landing pad instructions to it.
Library support for return address signing is available in the *a.* variants. See C and C++ library
naming conventions.
Threat Model
Return address signing assumes the attacker:
• Has no access to higher level privilege.
• Only has read-only access to code.
• Can provide input to the program.
• Can disassemble code.
• Can make as many attempts as they like to attack the program.
Protection mechanism
Return address signing is similar to stack protection, but instead of a canary value, the return
address on the stack is signed on function entry and authenticated on function exit. An
attacker must be able to replace the return address with a signed value that successfully
authenticates.
For more information, see Armv8.1-M PACBTI extension mitigations against ROP and JOP style
attacks.
There is a finite number of tags, so this mechanism provides only probabilistic protection. The
immediate + offset form is not subject to checks.
CFI requires that you also enable Link-Time Optimization (LTO) with the armclang option -flto and
the armlink option --lto.
The armclang option -mharden-sls generates code that helps prevent a processor from speculating
past affected indirect branch instructions on AArch64 targets. For information about other branch
instructions, see the Straight-line speculation whitepaper.
Arm recommends using lower optimization levels for files with secure code. If you use higher
optimization levels, be aware of the following risks and the mitigations you can use:
• Removal of code that seems redundant to the compiler, but is an important check for some
security property. For example:
◦ Elimination of unused sections can remove a function or variable that is critical to security.
To prevent the removal of a function or variable, you can mark that function or variable
in source code with the __attribute__((used)) attribute. Alternatively, you can use the
armlink option --keep=<section_id>.
◦ Inlining can affect whether a function is protected. To prevent a function being inlined,
specify the __attribute__((noinline)) function attribute or the armclang option -fno-
inline-functions.
• Removal of memory stores that seem redundant to the compiler because the variable is not used
afterwards, which can leave sensitive data in memory. Similarly, removal of a seemingly unused
variable can prevent a function from being protected. To prevent the removal of a variable that
is essential to a security feature, declare that variable as volatile or use the
__attribute__((used)) attribute.
• Code transformations that make different execution paths take different amounts of time, which
can enable timing side-channel attacks.
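A minimal sketch of the volatile mitigation described above, applied to clearing sensitive data. The function name secure_wipe() is hypothetical. Writing through a pointer to volatile makes each store a volatile access, so the compiler performs every store and does not eliminate the zeroing, even though the buffer is never read again.

```c
#include <stddef.h>

/* Hypothetical helper: overwrites len bytes at buf with zeros.
   The stores go through a volatile-qualified pointer, so the compiler
   does not treat them as dead stores. */
void secure_wipe(void *buf, size_t len)
{
    volatile unsigned char *p = (volatile unsigned char *)buf;
    while (len--) {
        *p++ = 0;
    }
}
```

A plain memset() on a buffer that is about to go out of scope is exactly the kind of seemingly redundant store that an optimizing compiler can remove.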
Related information
Hardware errata and vulnerabilities on page 280
Effect of the volatile keyword on compiler optimization on page 61
-fno-inline-functions
__attribute__((used)) function attribute
--keep=section_id (armlink)
Elimination of unused sections
Where errata mitigations are available that can be applied using Arm® Compiler for Embedded, the
mitigations are provided through either armclang mitigations or armlink patches:
• To apply armclang mitigations, use the -mfix-<feature>-<ID> option. <feature> might be the
name of a processor, or the name of an Arm Compiler for Embedded feature. <ID> can be one
of the following combinations:
◦ <name>-<erratum_ID>, for example aes-1742098
◦ <erratum_ID>, for example 835769.
For example:
◦ To apply the AES erratum fix 1742098 for the Cortex®-A57 processor, use the command-
line option -mfix-cortex-a57-aes-1742098.
◦ To apply the fix for the CMSE vulnerability CVE-2021-35465, use the command-line option
-mfix-cmse-cve-2021-35465.
For example, to apply erratum 835769 for the Cortex-A53 processor, use the command-line
option --branchpatch=cortex-a53-835769.
To get information on the modification made to the program by the workaround, specify the
--info=patches option.
For more information about stack sealing, see the advisory notice Armv8-M
Stack Sealing vulnerability.
To build an image that runs in the Secure state, you must include the <arm_cmse.h> header in your
code and compile using the armclang command-line option -mcmse. Compiling in this way makes
the following features available:
• The Test Target, TT, instruction.
• TT instruction intrinsics.
• Non-secure function pointer intrinsics.
On startup, your Secure code must set up the Security Attribution Unit (SAU) and call the Non-
secure startup code.
typedef struct {
int p1;
int p2;
int p3;
int p4;
int p5;
} Params;
void your_api(int p1, int p2, int p3, int p4, int p5) {
Here, your_api_implementation(&p1) is the call to your existing function, which takes no more
than four arguments, the maximum allowed.
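The idea of packing more than four arguments into a structure, so that the call that crosses the security boundary carries only a single pointer, can be sketched as follows. The int return type, the body of your_api_implementation(), and the use of a local Params object are assumptions for illustration. In a real Secure entry function, any pointer received from the Non-secure state must be validated before it is dereferenced, for example with cmse_check_address_range() from <arm_cmse.h>.

```c
/* Structure that carries the five arguments (modeled on Params above). */
typedef struct {
    int p1;
    int p2;
    int p3;
    int p4;
    int p5;
} Params;

/* Hypothetical implementation that takes one pointer argument,
   well within the four-argument limit. */
int your_api_implementation(const Params *p)
{
    return p->p1 + p->p2 + p->p3 + p->p4 + p->p5;  /* placeholder work */
}

/* Wrapper with the original five-argument signature: packs the arguments
   into a Params object and passes a single pointer. */
int your_api(int p1, int p2, int p3, int p4, int p5)
{
    Params params = { p1, p2, p3, p4, p5 };
    return your_api_implementation(&params);
}
```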
__acle_se_entryname:
entryname:
bl entryname
2. The Secure gateway veneer consists of the SG instruction and a call to the entry function in the
Secure image using the B instruction:
entryname
SG
B.W __acle_se_entryname
3. The Secure image returns from the entry function using the BXNS instruction:
bxns lr
The following figure is a graphical representation of the calling sequence, but for clarity, the return
from the entry function is not shown:
Related information
Building a Secure image using the Armv8-M Security Extension on page 286
Building a Secure image using a previously generated import library on page 291
Building a Non-secure image that can call a Secure image on page 290
Whitepaper - Armv8-M Architecture Technical Overview
-mcmse
__attribute__((cmse_nonsecure_call)) function attribute
__attribute__((cmse_nonsecure_entry)) function attribute
Predefined macros
TT instruction intrinsics
Non-secure function pointer intrinsics
B instruction
BL instruction
BXNS instruction
SG instruction
TT, TTT, TTA, TTAT instruction
Placement of CMSE veneer sections for a Secure image
Arm recommends that Secure world software adds the value 0xfef5eda5 to the
top of the main and process stacks. Adding this value is known as stack sealing.
CMSIS 5.8.0 handles stack sealing. See CMSIS 5 for more information. For more
information about stack sealing, see the advisory notice Armv8-M Stack Sealing
vulnerability.
Procedure
1. Create an interface header file, myinterface_v1.h, to specify the C linkage for use by Non-
secure code:
#ifdef __cplusplus
extern "C" {
#endif
#ifdef __cplusplus
}
#endif
2. In the C program for your Secure code, secure.c, include the following:
#include <arm_cmse.h>
#include "myinterface_v1.h"
In addition to the implementation of the two entry functions, the code defines the function
func1() that is called only by Secure code.
If you are compiling the Secure code as C++, then you must add extern "C" to
the functions declared as __attribute__((cmse_nonsecure_entry)).
4. Enter the following command to see the disassembly of the machine code that armclang
generates:
$ armclang -c --target=arm-arm-none-eabi -march=armv8-m.main -mcmse -S secure.c
.text
...
.code 16
.thumb_func
...
func1:
.fnstart
...
bx lr
...
__acle_se_entry1:
entry1:
.fnstart
.save {r7, lr}
push {r7, lr}
...
bl func1
...
pop.w {r7, lr}
...
bxns lr
...
__acle_se_entry2:
entry2:
.fnstart
.save {r7, lr}
push {r7, lr}
...
bl entry1
...
pop.w {r7, lr}
bxns lr
...
main:
.fnstart
...
movs r0, #0
...
bx lr
...
An entry function does not start with a Secure Gateway (SG) instruction. The two symbols
__acle_se_<entry_name> and <entry_name> indicate the start of an entry function to the linker.
5. Create a scatter file containing the Veneer$$CMSE selector to place the entry function veneers in
a Non-Secure Callable (NSC) memory region.
LOAD_REGION 0x0 0x3000
{
EXEC_R 0x0
{
*(+RO,+RW,+ZI)
}
EXEC_NSCR 0x4000 0x1000
{
*(Veneer$$CMSE)
}
ARM_LIB_STACK 0x700000 EMPTY -0x10000
{
}
ARM_LIB_HEAP +0 EMPTY 0x10000
{
}
}
...
6. Link the object file using the armlink command-line option --import-cmse-lib-out and the
scatter file to create the Secure image:
$ armlink secure.o -o secure.axf --cpu 8-M.Main --import-cmse-lib-out
importlib_v1.o --scatter secure.scf
In addition to the final image, the link in this example also produces the import library,
importlib_v1.o, for use when building a Non-secure image. Assuming that the section with
veneers is placed at address 0x4000, the import library consists of a relocatable file containing
only a symbol table with the following entries:
When you link the relocatable file corresponding to this assembly code into an image, the linker
creates veneers in a section containing only entry veneers.
If you have an import library from a previous build of the Secure image, you
can ensure that the addresses in the output import library do not change when
producing a new version of the Secure image. To ensure that the addresses do
not change, specify the --import-cmse-lib-in command-line option together
with the --import-cmse-lib-out option. However, make sure the input and
output libraries have different names.
7. Enter the following command to see the entry veneers that the linker generates:
$ fromelf --text -s -c secure.axf
The following entry veneers are generated in the EXEC_NSCR eXecute-Only (XO) region for this
example:
...
The section with the veneers is aligned on a 32-byte boundary and padded to a 32-byte
boundary.
If you do not use a scatter file, the entry veneers are placed in an ER_XO section as the first
execution region, for example:
...
** Section #1 'ER_XO' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR + SHF_ARM_NOREAD]
Size : 32 bytes (alignment 32)
Address: 0x00008000
$t
entry1
0x00008000: e97fe97f .... SG ; [0x7e08]
0x00008004: f000b85a ..Z. B.W __acle_se_entry1 ; 0x80bc
entry2
0x00008008: e97fe97f .... SG ; [0x7e10]
0x0000800c: f000b868 ..h. B.W __acle_se_entry2 ; 0x80e0
...
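The disassembly in step 4 implies a secure.c along the following lines. This is a sketch, not the original source: the function bodies are assumptions chosen to match the calls in the listing (entry1 calls func1, entry2 calls entry1), the int parameter and return types are assumed from the Non-secure main() shown later, and main() is omitted. The CMSE_ENTRY macro is hypothetical. It applies __attribute__((cmse_nonsecure_entry)) only when the ACLE predefined macro __ARM_FEATURE_CMSE indicates Secure state support (its value is 3 when compiling with -mcmse for Armv8-M), so the same file also compiles without -mcmse.

```c
#if defined(__ARM_FEATURE_CMSE) && (__ARM_FEATURE_CMSE & 2)
#include <arm_cmse.h>
#define CMSE_ENTRY __attribute__((cmse_nonsecure_entry))
#else
#define CMSE_ENTRY /* not compiling with -mcmse: no entry attribute */
#endif

/* Declarations that myinterface_v1.h is assumed to provide. */
int entry1(int x);
int entry2(int x);

/* Helper that is called only by Secure code. */
int func1(int x)
{
    return x;  /* placeholder body */
}

/* Entry function callable from the Non-secure state; calls func1(). */
CMSE_ENTRY int entry1(int x)
{
    return func1(x);
}

/* Entry function callable from the Non-secure state; calls entry1(). */
CMSE_ENTRY int entry2(int x)
{
    return entry1(x);
}
```

With -mcmse, the compiler emits the __acle_se_entry1 and __acle_se_entry2 symbols for these functions, and armlink generates the SG veneers shown above.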
Next steps
After you have built your Secure image:
1. Pre-load the Secure image onto your device.
2. Deliver your device with the pre-loaded image, together with the import library package, to a
party who develops the Non-secure code for this device. The import library package contains:
• The interface header file, myinterface_v1.h.
• The import library, importlib_v1.o.
Related information
Building a Secure image using a previously generated import library on page 291
Building a Non-secure image that can call a Secure image on page 290
Whitepaper - Armv8-M Architecture Technical Overview
-c armclang option
-march armclang option
-mcmse armclang option
-S armclang option
--target armclang option
__attribute__((cmse_nonsecure_entry)) function attribute
SG instruction
--cpu armlink option
--import_cmse_lib_in armlink option
--import_cmse_lib_out armlink option
--scatter armlink option
--text fromelf option
The import library package identifies the entry points for the Secure image.
Procedure
1. Include the interface header file in the C program for your Non-secure code, nonsecure.c, and
use the entry functions as required.
#include <stdio.h>
#include "myinterface_v1.h"
int main(void) {
int val1, val2, x = 1; /* x must be initialized before it is passed to the entry functions */
val1 = entry1(x);
val2 = entry2(x);
if (val1 == val2) {
printf("val2 is equal to val1\n");
} else {
printf("val2 is different from val1\n");
}
return 0;
}
3. Create a scatter file for the Non-secure image, but without the Non-Secure Callable (NSC)
memory region.
LOAD_REGION 0x8000 0x3000
{
ER 0x8000
{
*(+RO,+RW,+ZI)
}
ARM_LIB_STACK 0x800000 EMPTY -0x10000
{
}
ARM_LIB_HEAP +0 EMPTY 0x10000
{
}
}
...
4. Link the object file using the import library, importlib_v1.o, and the scatter file to create the
Non-secure image.
$ armlink nonsecure.o importlib_v1.o -o nonsecure.axf --cpu=8-M.Main --scatter
nonsecure.scat
Related information
Building a Secure image using the Armv8-M Security Extension on page 286
Whitepaper - Armv8-M Architecture Technical Overview
-march armclang option
--target armclang option
--cpu armlink option
--scatter armlink option
The following procedure assumes that you have the import library package that is created in
Building a Secure image using the Arm®v8-M Security Extension.
Procedure
1. Create an interface header file, myinterface_v2.h, to specify the C linkage for use by Non-
secure code:
#ifdef __cplusplus
extern "C" {
#endif
#ifdef __cplusplus
}
#endif
2. Include the following in the C program for your Secure code, secure.c:
#include <arm_cmse.h>
#include "myinterface_v2.h"
In addition to the implementation of the two entry functions, the code defines the function
func1() that is called only by Secure code.
If you are compiling the Secure code as C++, then you must add extern "C" to
the functions declared as __attribute__((cmse_nonsecure_entry)).
4. To see the disassembly of the machine code that is generated by armclang, enter:
$ armclang -c --target=arm-arm-none-eabi -march=armv8-m.main -mcmse -S secure.c
.text
...
.code 16
.thumb_func
...
func1:
.fnstart
...
bx lr
...
__acle_se_entry1:
entry1:
.fnstart
.save {r7, lr}
push {r7, lr}
...
bl func1
pop.w {r7, lr}
...
bxns lr
...
__acle_se_entry4:
entry4:
.fnstart
.save {r7, lr}
An entry function does not start with a Secure Gateway (SG) instruction. The two symbols
__acle_se_<entry_name> and <entry_name> indicate the start of an entry function to the linker.
5. Create a scatter file containing the Veneer$$CMSE selector to place the entry function veneers in
a Non-Secure Callable (NSC) memory region.
LOAD_REGION 0x0 0x3000
{
EXEC_R 0x0
{
*(+RO,+RW,+ZI)
}
EXEC_NSCR 0x4000 0x1000
{
*(Veneer$$CMSE)
}
ARM_LIB_STACK 0x700000 EMPTY -0x10000
{
}
ARM_LIB_HEAP +0 EMPTY 0x10000
{
}
}
...
6. Link the object file using the armlink command-line options --import-cmse-lib-out and
--import-cmse-lib-in, together with the scatter file, to create the Secure image:
$ armlink secure.o -o secure.axf --cpu 8-M.Main --import-cmse-lib-out
importlib_v2.o --import-cmse-lib-in importlib_v1.o --scatter secure.scf
In addition to the final image, the link in this example also produces the import library,
importlib_v2.o, for use when building a Non-secure image. Assuming that the section with
veneers is placed at address 0x4000, the import library consists of a relocatable file containing
only a symbol table with the following entries:
When you link the relocatable file corresponding to this assembly code into an image, the linker
creates veneers in a section containing only entry veneers.
7. Enter the following command to see the entry veneers that the linker generates:
$ fromelf --text -s -c secure.axf
The following entry veneers are generated in the EXEC_NSCR eXecute-Only (XO) region for this
example:
...
** Section #3 'EXEC_NSCR' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR +
SHF_ARM_NOREAD]
Size : 64 bytes (alignment 32)
Address: 0x00004000
$t
entry1
0x00004000: e97fe97f .... SG ; [0x3e08]
0x00004004: f7fcb85e ..^. B __acle_se_entry1 ; 0xc4
entry2
0x00004008: e97fe97f .... SG ; [0x3e10]
0x0000400c: f7fcb86c ..l. B __acle_se_entry2 ; 0xe8
...
entry3
0x00004020: e97fe97f .... SG ; [0x3e28]
0x00004024: f7fcb872 ..r. B __acle_se_entry3 ; 0x10c
entry4
0x00004028: e97fe97f .... SG ; [0x3e30]
0x0000402c: f7fcb888 .... B __acle_se_entry4 ; 0x140
...
The section with the veneers is aligned on a 32-byte boundary and padded to a 32-byte
boundary.
If you do not use a scatter file, the entry veneers are placed in an ER_XO section as the first
execution region. The entry veneers for the existing entry points are placed in a CMSE veneer
section. For example:
...
** Section #1 'ER_XO' (SHT_PROGBITS) [SHF_ALLOC + SHF_EXECINSTR + SHF_ARM_NOREAD]
Size : 32 bytes (alignment 32)
Address: 0x00008000
$t
entry3
0x00008000: e97fe97f .... SG ; [0x7e08]
0x00008004: f000b87e ..~. B.W __acle_se_entry3 ; 0x8104
entry4
0x00008008: e97fe97f .... SG ; [0x7e10]
0x0000800c: f000b894 .... B.W __acle_se_entry4 ; 0x8138
...
$t
entry1
0x00004000: e97fe97f .... SG ; [0x3e08]
0x00004004: f004b85a ..Z. B.W __acle_se_entry1 ; 0x80bc
entry2
0x00004008: e97fe97f .... SG ; [0x3e10]
0x0000400c: f004b868 ..h. B.W __acle_se_entry2 ; 0x80e0
...
Next steps
After you have built your updated Secure image:
1. Pre-load the updated Secure image onto your device.
2. Deliver your device with the pre-loaded image, together with the new import library package,
to a party who develops the Non-secure code for this device. The import library package
contains:
• The interface header file, myinterface_v2.h.
• The import library, importlib_v2.o.
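To relate the listings above to source code, the following minimal sketch shows how the Secure source for the updated image might define the four entry functions. The function bodies are hypothetical, and so is the CMSE_ENTRY macro: it expands to the cmse_nonsecure_entry attribute only when the code is compiled with -mcmse (which sets bit 1 of __ARM_FEATURE_CMSE), so that the same logic can also be built and tested as ordinary C on a host.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper macro: apply the CMSE entry attribute only when the
 * compiler is generating Secure code for the Armv8-M Security Extension.
 * Elsewhere it expands to nothing, so the functions are ordinary C. */
#if defined(__ARM_FEATURE_CMSE) && (__ARM_FEATURE_CMSE & 2)
#define CMSE_ENTRY __attribute__((cmse_nonsecure_entry))
#else
#define CMSE_ENTRY
#endif

/* Entry points that existed in the first version of the interface.
 * With --import_cmse_lib_in, armlink keeps their SG + B.W veneers at the
 * addresses recorded in the input import library (0x4000 in the listings). */
CMSE_ENTRY int entry1(int x) { return x + 1; }
CMSE_ENTRY int entry2(int x) { return x * 2; }

/* Entry points added in the updated image: armlink creates new veneers for
 * them (in ER_XO at 0x8000 in the listing without a scatter file). */
CMSE_ENTRY int entry3(int x) { return x - 3; }

CMSE_ENTRY int entry4(int x)
{
    /* Real Secure code must validate every argument that arrives from
     * Non-secure state; this placeholder only rejects INT_MIN, whose
     * negation would overflow. */
    assert(x != INT_MIN);
    return -x;
}
```

The Non-secure developer would see only the declarations of these functions, through the interface header myinterface_v2.h, and the veneer addresses, through importlib_v2.o.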
Related information
Building a Secure image using the Armv8-M Security Extension on page 286
Building a Non-secure image that can call a Secure image on page 290
Whitepaper - Armv8-M Architecture Technical Overview
-c armclang option
-march armclang option
-mcmse armclang option
-S armclang option
--target armclang option
__attribute__((cmse_nonsecure_entry)) function attribute
SG instruction
--cpu armlink option
--import_cmse_lib_in armlink option
--import_cmse_lib_out armlink option
--scatter armlink option
--text fromelf option
The Armv8.1-M PACBTI extension consists of the following control-flow integrity approaches:
• Return address signing and authentication (PAC-RET) mitigates against Return Oriented
Programming (ROP) style attacks.
• BTI instruction placement (BTI) mitigates against Jump Oriented Programming (JOP) style attacks
and restricts the set of targets for an indirect branch.
For more information about ROP and JOP style attacks, see Learn the architecture: Providing
protection for complex software.
Startup initialization
If a source of true randomness is available, you must use it to select a random encryption key to
initialize PAC. Otherwise, you can use the following sequence for testing only:
.eabi_attribute Tag_PAC_extension, 1
.eabi_attribute Tag_PACRET_use, 1
.eabi_attribute Tag_BTI_extension, 1
.eabi_attribute Tag_BTI_use, 1
The output of PACBTI build attributes depends only on the command-line options given. The build
attributes are not affected by function attributes.
These attributes are output only when compiling C or C++ source. They are not output for
assembly files. If you are linking with objects that are compiled with a PACBTI feature enabled, Arm
recommends that you add the following code to your assembly language source files:
#if !defined(__ARM_64BIT_STATE)
#ifdef __ARM_FEATURE_PAC_DEFAULT
.eabi_attribute Tag_PAC_extension, 1
.eabi_attribute Tag_PACRET_use, 1
#endif
#ifdef __ARM_FEATURE_BTI_DEFAULT
.eabi_attribute Tag_BTI_extension, 1
.eabi_attribute Tag_BTI_use, 1
#endif
#endif
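C and C++ source can test the same ACLE feature macros, for example to report at run time which protections a build was compiled with. This is a minimal sketch and the function names are hypothetical. __ARM_FEATURE_PAC_DEFAULT is defined only when return address signing is enabled, and its value is a bitmask whose bits are defined by the Arm C Language Extensions (ACLE). __ARM_FEATURE_BTI_DEFAULT is defined to 1 only when BTI instructions are placed by default.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the __ARM_FEATURE_PAC_DEFAULT bitmask, or 0 if return address
 * signing is not enabled for this translation unit. For example, compile with:
 *   armclang --target=arm-arm-none-eabi -march=armv8.1-m.main+pacbti
 *            -mbranch-protection=standard -c report.c */
static int pac_default_mask(void)
{
#ifdef __ARM_FEATURE_PAC_DEFAULT
    return __ARM_FEATURE_PAC_DEFAULT;
#else
    return 0;
#endif
}

/* Returns 1 if BTI instructions are placed by default, otherwise 0. */
static int bti_default(void)
{
#ifdef __ARM_FEATURE_BTI_DEFAULT
    return 1;
#else
    return 0;
#endif
}

/* Prints the branch protection state that this file was compiled with. */
void report_branch_protection(void)
{
    printf("PAC-RET: %s (mask 0x%x)\n", pac_default_mask() ? "on" : "off",
           (unsigned)pac_default_mask());
    printf("BTI:     %s\n", bti_default() ? "on" : "off");
}
```

Like the build attributes, these macros reflect only the command-line options, so they describe the translation unit as a whole rather than any individual function.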
If the assembly source uses non-hint-space PACBTI instructions, you must change the directive for
the PAC extension to:
.eabi_attribute Tag_PAC_extension, 2
Without these directives, the linker might report an incompatible build attributes error.
Linker behavior
The following table shows the linker behavior for objects compiled with the Armv8.1-M PACBTI
feature and -mbranch-protection options:
The same attributes are generated for each -mbranch-protection option with or
without specifying the +pacbti feature.
There is only one library variant for the Armv8.1-M PACBTI extension. This variant
provides both pointer authentication and BTI. It is not possible to specify a library
variant that supports only one or the other.
You can override this behavior by using the linker option --library_security=<option>, as shown
in the following table:
You can use the linker option --info=bti to output a list of the BTI and non-BTI user objects in the
link.
Related information
-march
-mbranch-protection
-mcpu
__attribute__((target("options"))) function attribute
--info=topic[,topic,…] (armlink)
--library-security-protection
--require-bti
RME does not have an associated +[no]<feature> option for the -march or -mcpu
options, because the RME registers are available in the Armv9-A application profile
architecture without an additional extension.
• If you define the symbol __use_memtag_heap to enable the heap implementation that uses
memory tagging, you must place the heap in tagged memory.
• You must ensure that the tagged memory used for the stack and heap has an initial tag value of
zero.
When you enable memory tagging, the compiler checks whether expressions that evaluate to addresses
of objects on the stack are guaranteed to stay within the bounds of the object. If this cannot be
guaranteed, the compiler generates code to tag both the pointer and the object. When a tagged
pointer is dereferenced, the processor compares the tag in the pointer with the tag of the memory
location being accessed. If the tags do not match, the processor generates an exception. This check
helps prevent the pointer from accessing any object other than the one whose address was taken.
For example, if a pointer to a variable on the stack is passed to another function, then the compiler
might be unable to guarantee that this pointer is only used to access the same variable. In this
situation, the compiler generates memory tagging code. The memory tagging instructions apply a
unique tag to the pointer and to its corresponding allocation on the stack.
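The following minimal sketch shows the situation that the preceding paragraph describes. The function names are hypothetical. When it is compiled with stack tagging enabled (for example, --target=aarch64-arm-none-eabi -march=armv8.5-a+memtag -fsanitize=memtag-stack, with the stack in tagged memory), the compiler cannot prove that fill_buffer() accesses only buf, so it is expected to tag buf's stack slot and the pointer passed to fill_buffer(). Without memory tagging, the code behaves as ordinary C.

```c
#include <assert.h>
#include <stddef.h>

/* Fills dst[0..n-1]. noinline keeps the body out of the caller, so the
 * compiler cannot prove which addresses are accessed through dst. */
__attribute__((noinline)) void fill_buffer(unsigned char *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (unsigned char)(i + 1);
}

/* buf's address escapes to fill_buffer(), which makes buf a candidate for
 * stack tagging. Tags apply to 16-byte granules: an access through the
 * pointer that leaves buf's granules (for example, a buggy fill_buffer()
 * writing dst[16]) reaches memory with a different tag, and the processor
 * can then raise a tag check fault. */
int sum_filled_buffer(void)
{
    unsigned char buf[16];
    fill_buffer(buf, sizeof buf);
    int sum = 0;
    for (size_t i = 0; i < sizeof buf; ++i)
        sum += buf[i];
    return sum;
}
```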
Library support
To ensure full memory tagging protection, you must also link your code with the
library that provides memory tagging protection. For more information, see armlink
--library_security=protection.
armlink automatically selects the library with memory tagging protection if at least one object file
is compiled with pointer authentication using -mbranch-protection, and one of the following is
true:
• At least one object file is compiled with -fsanitize=memtag-stack.
• At least one object file includes the symbol __use_memtag_heap and is compiled with
-fsanitize=memtag-heap.
You can override the selected library by using the armlink option --library_security to specify
the library that you want to use.
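The selection rule above can be restated as a small C model, which might help when checking why a particular link picked the library with memory tagging protection. This is only an illustration of the documented rule, not armlink's implementation, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of how one object file was built. */
struct object_info {
    bool pac;                     /* pointer authentication via -mbranch-protection */
    bool memtag_stack;            /* compiled with -fsanitize=memtag-stack */
    bool memtag_heap;             /* compiled with -fsanitize=memtag-heap */
    bool uses_memtag_heap_symbol; /* includes the symbol __use_memtag_heap */
};

/* Returns true if, by the documented rule, armlink would select the library
 * with memory tagging protection when --library_security is not given. */
bool selects_memtag_library(const struct object_info *objs, size_t count)
{
    bool any_pac = false, any_stack = false, any_heap = false;
    for (size_t i = 0; i < count; ++i) {
        any_pac = any_pac || objs[i].pac;
        any_stack = any_stack || objs[i].memtag_stack;
        /* The heap condition requires both properties on the same object. */
        any_heap = any_heap ||
                   (objs[i].memtag_heap && objs[i].uses_memtag_heap_symbol);
    }
    return any_pac && (any_stack || any_heap);
}
```

Note that the pointer authentication condition and the memory tagging condition can be met by different object files, but the two heap conditions must be met by the same one.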
Related information
armclang -fsanitize, -fno-sanitize
armclang -fstack-protector, -fstack-protector-all, -fstack-protector-strong, -fno-stack-protector
armclang -mbranch-protection
armclang -mcpu
armlink --library_security=protection
Choosing a heap implementation for memory allocation functions
Arm C Language Extensions
You can enable any of the CFI schemes individually, or enable all schemes with -fsanitize=cfi
and then disable some of them with the -fno-sanitize option. For example, to disable the cfi-nvcall
and cfi-icall schemes, specify:
-fsanitize=cfi -fno-sanitize=cfi-nvcall,cfi-icall
If you enable at least one CFI scheme with -fsanitize, then you must also enable Link-Time
Optimization (LTO) with the armclang option -flto and the armlink option --lto.
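As an illustration of what the cfi-icall scheme checks, the following minimal sketch makes indirect calls through a function pointer. The names are hypothetical. When the file is compiled with -fsanitize=cfi-icall and -flto, and linked with armlink --lto, the compiler is expected to insert a check before the indirect call in apply() that the target function's type matches the pointer type. All the calls in this sketch are type-correct, so they pass the check.

```c
#include <assert.h>

typedef int (*binary_op)(int, int);

static int add(int a, int b) { return a + b; }
static int mul(int a, int b) { return a * b; }

/* Indirect call site: with cfi-icall enabled, the call is checked to make
 * sure that op points to a function of type int(int, int). */
int apply(binary_op op, int a, int b)
{
    return op(a, b);
}

/* Both candidates have the matching type, so calls through the returned
 * pointer pass the check. Casting a function of a different type to
 * binary_op and calling it is undefined behavior, and is the kind of call
 * that cfi-icall is designed to trap. */
binary_op select_op(char name)
{
    return name == '*' ? mul : add;
}
```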
CFI also uses an ignore list, which is a list of entities for which the CFI checks are relaxed. This
list is populated from the text file cfi_ignorelist.txt. Arm® Compiler for Embedded provides an
empty cfi_ignorelist.txt file. By default, armclang searches for this file in
<install_path>/lib/clang/<version>/share.
• You can change the default location that armclang searches for the cfi_ignorelist.txt file
with the -resource-dir=<path_to_resource_folder> option.
• If you want to clear the ignore list, then specify the armclang option
-fno-sanitize-ignorelist.
• If you want to extend the ignore list using your own ignore list files, then specify each file with
-fsanitize-ignorelist=<ignorelistfile>.
The member function pointer call checking scheme, cfi-mfcall, depends on the base type of the
member function pointer: armclang emits a full CFI check only if this base type is complete. To
make sure that armclang always emits a full CFI check, specify -fcomplete-member-pointers.
For more information about the CFI checks, see Control Flow Integrity.
Arm Compiler for Embedded does not support the -flto=thin and -fno-sanitize-trap
options.
Related information
Support level definitions on page 405
armclang -fcomplete-member-pointers
armclang -fsanitize, -fno-sanitize
armclang -fsanitize-ignorelist, -fno-sanitize-ignorelist
armclang -resource-dir
armclang -flto, -fno-lto
armlink --lto, --no_lto
To catch a particular kind of Undefined Behavior, specify the required check with the armclang
option -fsanitize=<ubsan_check>. For a complete list of checks, see Available checks at Undefined
Behavior Sanitizer.
The option -fsanitize=undefined enables all the UBSan checks except float-divide-by-zero,
unsigned-integer-overflow, implicit-conversion, local-bounds, and the nullability-* group of
checks. To prevent the non-minimal handlers mode from being enabled, you must also specify an
option that selects either the traps mode or the minimal handlers mode:
• To enable the traps mode for a particular check, specify the required check with the armclang
option -fsanitize-trap=<ubsan_check>. Alternatively, you can specify -fsanitize-trap=all to
use traps mode for all checks requested.
• To enable the minimal handlers mode, specify the armclang option
-fsanitize-minimal-runtime.
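The following minimal sketch shows operations that two UBSan checks instrument. The function names are hypothetical. Compiled with, for example, -fsanitize=signed-integer-overflow,shift -fsanitize-trap=all, armclang adds a check to each operation that executes a trap instruction if the operation would be undefined. The values used in the usage example are in range, so no check fires.

```c
#include <assert.h>
#include <stdint.h>

/* Instrumented by -fsanitize=signed-integer-overflow: the check traps if
 * the mathematical result of a + b does not fit in an int. */
int add_checked_by_ubsan(int a, int b)
{
    return a + b;
}

/* Instrumented by -fsanitize=shift: the check traps if shift is negative
 * or not less than 32, the width of uint32_t. */
uint32_t scale(uint32_t value, int shift)
{
    return value << shift;
}
```

In traps mode, no UBSan runtime library is needed: the check reduces to a compare and a conditional branch to a trap instruction.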
Related information
armclang -fsanitize, -fno-sanitize
armclang -fsanitize-minimal-runtime
armclang -fsanitize-trap, -fno-sanitize-trap
armclang -fsanitize-recover, -fno-sanitize-recover
Undefined Behavior Sanitizer
The armclang option -mharden-sls=<option> allows you to mitigate against the straight-line speculation (SLS) vulnerability.
For RET and BR instructions, the mitigation places a speculation barrier after each instruction to
prevent incorrect speculation past it. armclang uses the SB speculation barrier instruction if the
target supports it. Otherwise, it uses the DSB and ISB instructions.
For the BLR instruction, the mitigation replaces all instances of BLR with a BL and BR sequence, for
example:
blr x<N>                          // original instruction
bl __llvm_slsblr_thunk_x<N>       // replacement with the mitigation enabled
armclang creates a thunk __llvm_slsblr_thunk_x<N> for every X<N> register. Each thunk is placed
in a separate section named .text.__llvm_slsblr_thunk_x<N> that contains:
.section .text.__llvm_slsblr_thunk_x<N>,"axG",@progbits,__llvm_slsblr_thunk_x<N>,comdat
.hidden __llvm_slsblr_thunk_x<N> // -- Begin function __llvm_slsblr_thunk_x<N>
.weak __llvm_slsblr_thunk_x<N>
.p2align 4
.type __llvm_slsblr_thunk_x<N>,@function
__llvm_slsblr_thunk_x<N>:
br x<N>
dsb sy
isb
The register number in the thunk might be different from the register in the original
BLR instruction.
Because the BLR instruction is split into separate BL and BR instructions, the speculation barrier
after the BR in the thunk is not on the architectural execution path.
In Arm® Compiler for Embedded 6, the separate thunk code is globally visible and might be called
from a location where the SB instruction is locally disabled. Therefore, armclang always uses the
DSB and ISB speculation barrier instructions.
These thunk sections are generated in every object file. Because the sections are defined in comdat
groups, the linker includes only one instance of each thunk in the output, and the linker unused
section elimination feature removes all unused thunk sections.
If you use a scatter file, you can select all the thunk sections with the following input section
description:
*(.text.__llvm_slsblr_thunk_x*)
If you place these sections far away from the code that references them, the linker adds veneers to
reach them.
Arm does not provide compiler-generated mitigations for the other instructions mentioned in
the Straight-line speculation whitepaper.
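The following minimal sketch contains the two kinds of instruction that the mitigation changes. The names are hypothetical. Compiled for AArch64 with, for example, -mharden-sls=all, the indirect call in call_and_add() would normally be a BLR and is expected to become a BL to an __llvm_slsblr_thunk_x<N> thunk, and the function's RET is followed by a speculation barrier. You can inspect the result with the armclang -S option, or by disassembling the object with fromelf --text -c.

```c
#include <assert.h>

typedef int (*handler_fn)(int);

/* Example handler (hypothetical). */
int twice(int x) { return 2 * x; }

/* fn(arg) is an indirect call whose result is still needed afterwards, so
 * it is not a tail call: on AArch64 it is normally compiled as BLR.
 * With -mharden-sls=blr (or =all), armclang instead emits
 *     bl __llvm_slsblr_thunk_x<N>
 * where the thunk performs BR x<N> followed by DSB SY and ISB.
 * With -mharden-sls=retbr (or =all), a barrier also follows the RET. */
int call_and_add(handler_fn fn, int arg)
{
    assert(fn != 0);
    return fn(arg) + arg;
}
```

If the indirect call were in tail position (return fn(arg);), the compiler could emit a BR instead of a BLR, and the mitigation for BR would apply instead.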
Related information
-mharden-sls
BLR instruction
BR instruction
DSB instruction
ISB instruction
RET instruction
SB instruction
The following techniques are recommended to improve memory-safety of C and C++ code:
Develop code following coding guidelines
There are industry-accepted guidelines such as MISRA, AUTOSAR, CERT, and the C++ Core
Guidelines. In particular, the C++ guidelines focus on avoiding or encapsulating the use of
raw pointers and arrays by replacing them with smart pointers and standard C++ library
containers.
C++ provides more safety features than C. Therefore, C++ might be a better choice for
projects where safety is important and technical constraints allow the use of C++.
Visit Carnegie Mellon University and search for the following titles:
• SEI CERT C++ Coding Standard.
• SEI CERT C++ Coding Standard: Rules for Developing Safe, Reliable, and Secure Systems (2016
Edition).
Perform static analysis
Commercial and open source third-party tools, such as the LLVM Project clang-tidy, are
available. These tools can provide the most thorough analysis of the code, including checking
for compliance with coding guidelines. Arm® Compiler for Embedded also provides a set of
analyses and associated warnings, such as:
• -Wall and -Wextra.
• -Wformat=2.
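As a small illustration of these warnings, -Wformat=2 enables format-string checks such as -Wformat-security and -Wformat-nonliteral, which warn when a non-literal string is passed as the format argument of printf(). The function name in this minimal sketch is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Unsafe pattern, kept as a comment so that this file compiles cleanly:
 *     void log_unsafe(const char *msg) { printf(msg); }
 * -Wformat=2 warns about it: if msg came from untrusted input and contained
 * conversions such as %s or %n, printf() would read or write memory that
 * the caller never passed. */

/* Safe pattern: the format string is a literal, and msg is only data.
 * Returns the number of characters written, as printf() does. */
int log_message(const char *msg)
{
    return printf("%s\n", msg);
}
```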
Third-party fuzz testing tools are available to improve code coverage during testing. These
tools help you to find more memory-safety issues. Third-party bounded model checking tools
can verify memory-safety properties, among other properties, by using formal proof methods.
Related information
Security features supported in Arm Compiler for Embedded on page 274
-W (armclang)
Thread Local Storage
When using multiple threads, each thread that you create must have its own instance of the TLS
data area. When switching context, you must arrange for the thread pointer to point to the TLS
data area of the thread that is about to run.
Many functions in the standard C library use persistent state in the library. For example, the
global variable errno holds the error status from the library, so it must not be overwritten by other
threads.
The standard C library startup code and linker scatter file ensure that the TLS is instantiated once
for the main thread, for use in single-threaded systems.
For C multithreaded support in Arm Compiler for Embedded, see Multithreaded support in
Arm C libraries.
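The following minimal sketch shows per-thread library state in the style of errno, using the __thread keyword (available in GCC and Clang-based compilers). The variable and function names are hypothetical. Each thread that accesses last_error gets its own copy in its TLS data area, so an error recorded by one thread cannot overwrite the value that another thread sees.

```c
#include <assert.h>

/* Per-thread error status, similar in spirit to errno. Every thread has
 * its own instance, located in that thread's TLS data area. */
static __thread int last_error;

/* Records an error code for the calling thread only. */
void set_last_error(int code)
{
    last_error = code;
}

/* Returns the error code most recently recorded by the calling thread. */
int get_last_error(void)
{
    return last_error;
}
```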
TLS in C++
Arm Compiler for Embedded supports the __thread storage class keyword in C++.
Arm Compiler for Embedded supports the thread storage duration specifier of C++,
thread_local, for -std=c++11 or later. For C, this keyword is supported only in Arm Compiler
for Embedded 6.19 and later, when used with the -std=c2x [COMMUNITY] feature for C23
support.
For C++ multithreaded support in Arm Compiler for Embedded, see Multithreaded support
in Arm C++ libraries. The Arm C++ libraries support level for multithreaded applications is
[ALPHA].
If you specify the -ftls-model=<model> command-line option and your code includes the
__attribute__((tls_model("<model>"))) variable attribute, then the attribute overrides the
command-line option.
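For example, in the following minimal sketch the attribute requests the initial-exec model for one variable, while the other variable uses the model selected by -ftls-model (or the compiler default). The variable and function names are hypothetical.

```c
#include <assert.h>

/* Uses the model selected by -ftls-model=<model>, or the default. */
__thread int counter_default;

/* The attribute overrides -ftls-model for this variable only. */
__thread int counter_ie __attribute__((tls_model("initial-exec")));

/* Increments both thread-local counters for the calling thread and
 * returns their sum. */
int bump_both(void)
{
    ++counter_default;
    ++counter_ie;
    return counter_default + counter_ie;
}
```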
For executables that are statically linked, you need to use only the local-exec model. local-exec is
the least general and most efficient model.
For dynamic linking, you might consider using the initial-exec and global-dynamic models.
However, the compiler always selects a compatible model:
• global-dynamic is the most general but least efficient model, and you can use it anywhere.
• You can use the initial-exec model in shared libraries, provided that they are not loaded at
runtime with dlopen().
Arm Compiler for Embedded supports TLS in the following linking models:
• Thread local storage in the bare metal and shared library linking models.
• Thread local storage in the SysV linking model.
This example is not a complete solution and is provided only to show the Thread
Local Storage (TLS) features available in Arm Compiler for Embedded.
• At the start of the main() function, the example initializes TLS data by using linker-defined
symbols to find the data in memory. You must use the equivalent symbols for your own implementation.
• The example then accesses the initialized TLS data and prints it to the terminal.
• Compile with -mtp=<el> to specify the TPIDR_ELn register to use. For example, to use
TPIDR_EL0, compile with -mtp=el0.
• Place the TLS RW and ZI data using a scatter file in the following order of increasing addresses:
◦ If the TLS RW and ZI data is part of an existing load region:
1. Any RO code and data as needed.
2. TLS RW data.
3. No gaps, other than alignment padding.
4. TLS ZI data.
5. Any non-TLS RW and ZI data as needed.
◦ If the TLS RW and ZI data is in its own dedicated load region:
1. TLS RW data.
2. Make sure there are no gaps other than alignment padding.
3. TLS ZI data.
You can use the +tls-rw selector to select the TLS RW data. You can use the +tls-zi selector
to select the TLS ZI data. You must keep all the TLS data for the entire application in one
execution region.
This example places the TLS RW and ZI data in an existing load region called LOAD:
LOAD 0x80000000
{
STARTUP +0
{
startup.o (StartUp, +FIRST)
}
EXEC +0 {
*(+RO, +RW, +ZI)
}
;
; TLS RW region
; If the load region contains more execution regions
; than just TLS execution regions, then do not place
; any non-TLS RW or ZI data before TLS RW or ZI data
;
ER_TLS_RW +0 {
*(+tls-rw)
}
;
; TLS ZI region
; This must be immediately after the TLS RW region
;
ER_TLS_ZI +0 {
*(+tls-zi)
}
...
}
• Provide your own implementation of a function that initializes the TLS RW and ZI data for each
thread from its initial location in memory.
If your TLS RW data is in an execution region ER_TLS_RW and your TLS ZI data is in an execution
region ER_TLS_ZI, then you can use the following linker-defined symbols to determine the TLS
data attributes:
It is important that you use $$ZI$$ when referring to the TLS ZI data. Without it,
the linker does not include the ZI data when calculating the value of the linker-
defined symbol.
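A minimal sketch of such an initialization function follows. It assumes the execution region names ER_TLS_RW and ER_TLS_ZI from the scatter file above, and uses the armlink Load$$, Image$$...$$Base, Image$$...$$Length, and Image$$...$$ZI$$ symbols for those regions. The function names are hypothetical. The sketch does not show how each thread's block is allocated or where the thread pointer must point relative to it, which the TLS ABI for your target defines. The linker-symbol part is compiled only when __ARMCC_VERSION is defined, so the copy logic can also be tested on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Initializes one thread's TLS block from the TLS RW initialization image.
 * rw_len:    number of bytes of TLS RW data to copy.
 * zi_offset: offset of the TLS ZI data from the start of the block; any
 *            alignment padding between the RW and ZI data is zeroed too.
 * zi_len:    number of bytes of TLS ZI data to zero. */
void tls_init_block(unsigned char *block, const unsigned char *rw_init,
                    size_t rw_len, size_t zi_offset, size_t zi_len)
{
    assert(zi_offset >= rw_len);
    memcpy(block, rw_init, rw_len);
    memset(block + rw_len, 0, (zi_offset - rw_len) + zi_len);
}

#if defined(__ARMCC_VERSION)
/* armlink linker-defined symbols for ER_TLS_RW and ER_TLS_ZI. Only their
 * addresses are used; for the Length symbols, the address is the length in
 * bytes. Note the $$ZI$$ in the symbols that describe the TLS ZI data. */
extern unsigned char Load$$ER_TLS_RW$$Base[];
extern unsigned char Image$$ER_TLS_RW$$Base[];
extern unsigned char Image$$ER_TLS_RW$$Length[];
extern unsigned char Image$$ER_TLS_ZI$$ZI$$Base[];
extern unsigned char Image$$ER_TLS_ZI$$ZI$$Length[];

/* Offset of the TLS ZI data from the start of the TLS RW data. */
static size_t tls_zi_offset(void)
{
    return (size_t)((uintptr_t)Image$$ER_TLS_ZI$$ZI$$Base -
                    (uintptr_t)Image$$ER_TLS_RW$$Base);
}

/* Number of bytes that each thread's TLS block needs. */
size_t tls_block_size(void)
{
    return tls_zi_offset() + (size_t)(uintptr_t)Image$$ER_TLS_ZI$$ZI$$Length;
}

/* Initializes a caller-allocated block of at least tls_block_size() bytes. */
void tls_init_thread(unsigned char *block)
{
    tls_init_block(block, Load$$ER_TLS_RW$$Base,
                   (size_t)(uintptr_t)Image$$ER_TLS_RW$$Length,
                   tls_zi_offset(),
                   (size_t)(uintptr_t)Image$$ER_TLS_ZI$$ZI$$Length);
}
#endif
```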
2. Create the build, clean, and run scripts for your environment and place them in the project
folder. See the following for the contents of the scripts:
• Build and clean scripts for the AArch64 TLS local-exec static linking example.
• Run scripts for the AArch64 TLS local-exec static linking example.
3. Create the scatter file shown in Scatter file for the AArch64 TLS local-exec static linking
example, and place it in the project folder.
4. Create the asm, src, and obj folders in the project folder.
5. Create the assembly source files shown in Assembly source files for the AArch64 TLS local-exec
static linking example, and place them in the asm folder.
6. Create the C source files shown in C source files for the AArch64 TLS local-exec static linking
example, and place them in the src folder.
You can use the run.sh and run.bat scripts to run the example on the FVP_Base_Cortex-A53x1
FVP that is shipped with Arm Development Studio. You must provide the path to the directory
containing the FVP executable when running these scripts. For example, with Arm Development
Studio 2021.0 installed to the default installation directory:
• On Linux:
./run.sh /opt/arm/developmentstudio-2021.0/sw/models/bin
• On Windows:
When you run the example, it prints messages similar to the following:
The printed addresses are the run-time addresses after TLS data initialization. You can verify that
the RW address is not the link-time address of the TLS RW data by examining the memory map in the
tls_aarch64.lst file. For example:
Execution Region ER_TLS_RW (Exec base: 0x80004250, Load base: 0x80004250, Size: 0x00000004, Max: 0xffffffffffffffff, ABSOLUTE)
    Exec Addr    Load Addr    Size    Type    Attr    Idx    E Section Name    Object
Execution Region ER_TLS_ZI (Exec base: 0x80004254, Load base: 0x80004254, Size: 0x00000000, Max: 0xffffffffffffffff, ABSOLUTE)
    Exec Addr    Load Addr    Size    Type    Attr    Idx    E Section Name    Object
Related information
-ftls-model
-mtp
__attribute__((tls_model("model"))) variable attribute
--sysv
--bare_metal_sysv
Requirements and restrictions for using scatter files with SysV linking model
13.2 Build and clean scripts for the AArch64 TLS local-exec static linking example
The build script provides the armclang and armlink commands to build the Thread Local Storage
(TLS) example. Use the clean script to remove the files generated by these commands. There is a
build and clean script for both Windows and Linux environments.
# Link everything
armlink --cpu=8-A.64 --sysv --bare_metal_sysv --scatter=scatter.scat --diag_suppress=6329 --entry start64 --map --load_addr_map_info --list tls_aarch64.lst
rm obj/*
rm tls_aarch64.lst
rm tls_aarch64.axf
del obj\*
del tls_aarch64.lst
del tls_aarch64.axf
;**********************************************************
; Scatter file for Armv8-A Startup code on FVP Base model
; Copyright (c) 2014-2016 Arm Limited (or its affiliates). All rights reserved.
; Use, modification and redistribution of this file is subject to your possession
; of a valid End User License Agreement for the Arm Product of which these
; examples are part of and your compliance with all applicable terms and
; conditions of such licence agreement.
;**********************************************************
LOAD 0x80000000
{
STARTUP +0
{
startup.o (StartUp, +FIRST)
}
EXEC +0 {
*(+RO, +RW)
}
;
; TLS RW region
; If the load region contains more execution regions
; than just TLS execution regions, then do not place
; any non-TLS RW or ZI data before TLS RW or ZI data
;
ER_TLS_RW +0 {
*(+tls-rw)
}
;
; TLS ZI region
;
; GICv3 distributor
;
GICD +0 UNINIT 0x8000
{
GICv3_gicd.o (.bss.distributor)
}
;
; GICv3 redistributors
; 128KB for each redistributor in the system
;
GICR +0 UNINIT 0x80000
{
GICv3_gicr.o (.bss.redistributor)
}
;
; App stack
; All stacks and heap are aligned to a cache-line boundary
;
ARM_LIB_STACK +0 ALIGN 64 EMPTY 0x4000 {}
;
; Stack for EL3
;
EL3_STACKS +0 ALIGN 64 EMPTY 0x1000 {}
;
; Separate heap - import symbol __use_two_region_memory
; in source code for this to work correctly
;
ARM_LIB_HEAP +0 ALIGN 64 EMPTY 0xA0000 {}
;
; Strictly speaking, the L1 tables do not need to
; be so strongly aligned, but no matter
;
TTB0_L1 +0 ALIGN 4096 EMPTY 0x1000 {}
;
; Various sets of L2 tables
;
; Alignment is 4KB, since the code uses a 4K page
; granularity - larger granularities would require
; correspondingly stricter alignment
;
TTB0_L2_RAM +0 ALIGN 4096 EMPTY 0x1000 {}
;
; The startup code uses the end of this region to calculate
; the top of memory - do not place any RAM regions after it
;
TOP_OF_RAM +0 EMPTY 4 {}
;
; CS3 Peripherals is a 64MB region from 0x1c000000
;
; Place the UART peripheral registers data structure
; This is only really needed if USE_SERIAL_PORT is defined, but
; the linker will remove unused sections if not needed
PL011 0x1c090000 UNINIT 0x1000
{
uart.o (+ZI)
}
}
• startup.S
• v8_aarch64.S
• v8_mmu.h
• v8_system.h
• v8_utils.S
• vectors.S
//
// Private Peripheral Map for the v8 Architecture Envelope Model
//
// Copyright (c) 2012-2017 Arm Limited (or its affiliates). All rights reserved.
// Use, modification and redistribution of this file is subject to your possession of a
// valid End User License Agreement for the Arm Product of which these examples are part of
// and your compliance with all applicable terms and conditions of such licence agreement.
//
#ifndef PPM_AEM_H
#define PPM_AEM_H
//
// Distributor layout
//
#define GICD_CTLR 0x0000
#define GICD_TYPER 0x0004
#define GICD_IIDR 0x0008
#define GICD_IGROUP 0x0080
#define GICD_ISENABLE 0x0100
#define GICD_ICENABLE 0x0180
#define GICD_ISPEND 0x0200
#define GICD_ICPEND 0x0280
#define GICD_ISACTIVE 0x0300
#define GICD_ICACTIVE 0x0380
#define GICD_IPRIORITY 0x0400
#define GICD_ITARGETS 0x0800
#define GICD_ICFG 0x0c00
#define GICD_PPISR 0x0d00
#define GICD_SPISR 0x0d04
#define GICD_SGIR 0x0f00
#define GICD_CPENDSGI 0x0f10
#define GICD_SPENDSGI 0x0f20
#define GICD_PIDR4 0x0fd0
#define GICD_PIDR5 0x0fd4
#define GICD_PIDR6 0x0fd8
#define GICD_PIDR7 0x0fdc
#define GICD_PIDR0 0x0fe0
#define GICD_PIDR1 0x0fe4
#define GICD_PIDR2 0x0fe8
#define GICD_PIDR3 0x0fec
#define GICD_CIDR0 0x0ff0
#define GICD_CIDR1 0x0ff4
#define GICD_CIDR2 0x0ff8
#define GICD_CIDR3 0x0ffc
//
// CPU Interface layout
//
#define GICC_CTLR 0x0000
#define GICC_PMR 0x0004
#define GICC_BPR 0x0008
#define GICC_IAR 0x000c
#define GICC_EOIR 0x0010
#define GICC_RPR 0x0014
#define GICC_HPPIR 0x0018
#define GICC_ABPR 0x001c
#define GICC_AIAR 0x0020
#define GICC_AEOIR 0x0024
#define GICC_AHPPIR 0x0028
#define GICC_APR0 0x00d0
#define GICC_NSAPR0 0x00e0
#define GICC_IIDR 0x00fc
#define GICC_DIR 0x1000
#endif // PPM_AEM_H
// ------------------------------------------------------------
// Armv8-A Single-core EL3 AArch64 Startup Code
//
// Basic Vectors, MMU, caches and GICv3 initialization
//
// Exits in EL1 AArch64
//
// Copyright (c) 2014-2020 Arm Limited (or its affiliates). All rights reserved.
// Use, modification and redistribution of this file is subject to your possession of a
// valid End User License Agreement for the Arm Product of which these examples are part of
// and your compliance with all applicable terms and conditions of such licence agreement.
// ------------------------------------------------------------
#include "v8_mmu.h"
#include "v8_system.h"
.global el1_vectors
.global el2_vectors
.global el3_vectors
.global InvalidateUDCaches
.global ZeroBlock
.global SetPrivateIntSecurityBlock
.global SetSPISecurityAll
.global WakeupGICR
.global SyncAREinGICD
.global EnableGICD
.global __main
.global Image$$EXEC$$RO$$Base
.global Image$$TTB0_L1$$ZI$$Base
.global Image$$TTB0_L2_RAM$$ZI$$Base
.global Image$$TTB0_L2_PERIPH$$ZI$$Base
.global Image$$TOP_OF_RAM$$ZI$$Base
.global Image$$GICD$$ZI$$Base
.global Image$$ARM_LIB_STACK$$ZI$$Limit
.global Image$$EL3_STACKS$$ZI$$Limit
.global Image$$CS3_PERIPHERALS$$ZI$$Base
// use separate stack and heap, as anticipated by scatter.scat
.global __use_two_region_memory
// ------------------------------------------------------------
.global start64
.type start64, "function"
start64:
// Extract the core number from MPIDR_EL1 and store it in x19
// (defined by the AAPCS as callee-saved), so we can re-use it later
//
bl GetCPUID
mov x19, x0
core0_only:
//
// program the VBARs
//
ldr x1, =el1_vectors
msr VBAR_EL1, x1
msr VBAR_EL2, x1
//
// set lower exception levels as non-secure, with no access
// back to EL2 or EL3, and are AArch64 capable
//
mov x3, #(SCR_EL3_RW | \
SCR_EL3_SMD | \
SCR_EL3_NS) // Set NS bit, to access Non-secure registers
msr SCR_EL3, x3
isb
mov x0, #15
msr ICC_SRE_EL2, x0
isb
msr ICC_SRE_EL1, x0 // Non-secure copy of ICC_SRE_EL1
//
// no traps or VM modifications from the Hypervisor, EL1 is AArch64
//
mov x2, #HCR_EL2_RW
msr HCR_EL2, x2
//
// VMID is still significant, even when virtualization is not
// being used, so ensure VTTBR_EL2 is properly initialized
//
msr VTTBR_EL2, xzr
//
// VMPIDR_EL2 holds the value of the Virtualization Multiprocessor ID. This is the value returned by Non-secure EL1 reads of MPIDR_EL1.
// VPIDR_EL2 holds the value of the Virtualization Processor ID. This is the value returned by Non-secure EL1 reads of MIDR_EL1.
// Both of these registers are architecturally UNKNOWN at reset, and so they must be set to the correct value
// (even if EL2/virtualization is not being used), otherwise non-secure EL1 reads of MPIDR_EL1/MIDR_EL1 will return garbage values.
// This guarantees that any future reads of MPIDR_EL1 and MIDR_EL1 from Non-secure EL1 will return the correct value.
//
mrs x0, MPIDR_EL1
msr VMPIDR_EL2, x0
mrs x0, MIDR_EL1
msr VPIDR_EL2, x0
//
// neither EL3 nor EL2 trap floating point or accesses to CPACR
//
msr CPTR_EL3, xzr
msr CPTR_EL2, xzr
//
#ifdef CORTEXA
//
// Configure ACTLR_EL[23]
// ----------------------
//
// These bits are IMPLEMENTATION DEFINED, so are different for
// different processors
//
// For Cortex-A57, the controls we set are:
//
// Enable lower level access to CPUACTLR_EL1
// Enable lower level access to CPUECTLR_EL1
// Enable lower level access to L2CTLR_EL1
// Enable lower level access to L2ECTLR_EL1
// Enable lower level access to L2ACTLR_EL1
//
mov x0, #((1 << 0) | \
(1 << 1) | \
(1 << 4) | \
(1 << 5) | \
(1 << 6))
msr ACTLR_EL3, x0
msr ACTLR_EL2, x0
//
// configure CPUECTLR_EL1
//
// These bits are IMPLEMENTATION DEFINED, so they differ between
// processors
//
// SMPEN - bit 6 - Enables the processor to receive cache
// and TLB maintenance operations
//
// Note: For Cortex-A57/53 SMPEN should be set before enabling
// the caches and MMU, or performing any cache and TLB
// maintenance operations.
//
// This register has a defined reset value, so we use a
// read-modify-write sequence to set SMPEN
//
mrs x0, S3_1_c15_c2_1 // Read EL1 CPU Extended Control Register
orr x0, x0, #(1 << 6) // Set the SMPEN bit
msr S3_1_c15_c2_1, x0 // Write EL1 CPU Extended Control Register
isb
#endif
//
// That is the last of the control settings for now
//
// Note: no ISB after all these changes, because registers will not be
// accessed until after an exception return, which is itself a
// context synchronization event
//
//
// Set up some EL3 stack space, ready for calling the subroutines below.
//
ldr x0, =Image$$EL3_STACKS$$ZI$$Limit
mov sp, x0
//
// we need to configure the GIC while still in Secure state; specifically,
// all PPIs and SPIs have to be programmed as Group1 interrupts
//
//
// Before the GIC can be reliably programmed, we need to
// enable Affinity Routing, as this affects where the configuration
// registers are: with Affinity Routing enabled, some registers are
// in the Redistributor, whereas those same registers are in the
// Distributor with Affinity Routing disabled (that is, when in GICv2
// compatibility mode).
//
mov x0, #(1 << 4) | (1 << 5) // gicdctlr_ARE_S | gicdctlr_ARE_NS
mov x1, x19 // dosync: 0 on core 0 (set ARE), non-zero on other cores (wait for ARE)
bl SyncAREinGICD
//
// The Redistributor comes out of reset assuming the processor is
// asleep - correct that assumption
//
mov w0, w19
bl WakeupGICR
//
// Now we are ready to set security and other initializations
//
// This is a per-CPU configuration for these interrupts
//
// for the first cluster, CPU number is the redistributor index
//
mov w0, w19
mov w1, #1 // gicigroupr_G1NS
bl SetPrivateIntSecurityBlock
//
// While we are in the Secure World, set the priority mask low enough
// for it to be writable in the Non-Secure World
//
//mov x0, #16 << 3 // 5 bits of priority in the Secure world
mov x0, #0xFF // for Non-Secure interrupts
msr ICC_PMR_EL1, x0
//
// There is more to do to the GIC - call the utility routine to set
// all SPIs to Group1
//
mov w0, #1 // gicigroupr_G1NS
bl SetSPISecurityAll
//
// Set up EL1 entry point and "dummy" exception return information,
// then perform exception return to enter EL1
//
.global drop_to_el1
drop_to_el1:
adr x1, el1_entry_aarch64
msr ELR_EL3, x1
mov x1, #(AARCH64_SPSR_EL1h | \
AARCH64_SPSR_F | \
AARCH64_SPSR_I | \
AARCH64_SPSR_A)
msr SPSR_EL3, x1
eret
// ------------------------------------------------------------
// EL1 - Common start-up code
// ------------------------------------------------------------
.global el1_entry_aarch64
.type el1_entry_aarch64, "function"
el1_entry_aarch64:
//
// Now that we are in EL1, set up the application stack
//
ldr x0, =Image$$ARM_LIB_STACK$$ZI$$Limit
mov sp, x0
//
// Enable floating point
//
mov x0, #CPACR_EL1_FPEN
msr CPACR_EL1, x0
//
// Invalidate caches and TLBs for all stage 1
// translations used at EL1
//
// Cortex-A processors automatically invalidate their caches on reset
// (unless suppressed with the DBGL1RSTDISABLE or L2RSTDISABLE pins).
// It is therefore not necessary for software to invalidate the caches
// on startup; however, this is done here in case of a warm reset.
bl InvalidateUDCaches
tlbi VMALLE1
//
// Set TTBR0 Base address
//
// The CPUs share one set of translation tables that are
// generated by CPU0 at run-time
//
// TTBR1_EL1 is not used in this example
//
ldr x1, =Image$$TTB0_L1$$ZI$$Base
msr TTBR0_EL1, x1
//
// Set up memory attributes
//
// These equate to:
//
// 0 -> 0b01000100 = 0x00000044 = Normal, Inner/Outer Non-Cacheable
// 1 -> 0b11111111 = 0x0000ff00 = Normal, Inner/Outer WriteBack Read/Write Allocate
// 2 -> 0b00000100 = 0x00040000 = Device-nGnRE
//
mov x1, #0xff44
movk x1, #4, LSL #16 // bits [31:16] = 0x0004, so x1 = 0x000000000004ff44
msr MAIR_EL1, x1
//
// Set up TCR_EL1
//
// We are using only TTBR0 (EPD1 = 1), and the page table entries:
// - are using an 8-bit ASID from TTBR0
// - have a 4K granularity (TG0 = 0b00)
// - are outer-shareable (SH0 = 0b10)
// - are using Inner & Outer WBWA Normal memory ([IO]RGN0 = 0b01)
// - map
// + 32 bits of VA space (T0SZ = 0x20)
// + into a 32-bit PA space (IPS = 0b000)
//
// Field by field (TCR_EL1 bit positions in brackets):
//
//   T0SZ  [5:0]   = 0x20  -> 0x000020
//   EPD0  [7]     = 0
//   IRGN0 [9:8]   = 0b01  -> 0x000100
//   ORGN0 [11:10] = 0b01  -> 0x000400
//   SH0   [13:12] = 0b10  -> 0x002000
//   TG0   [15:14] = 0b00
//   T1SZ  [21:16] = 0
//   A1    [22]    = 0     (ASID from TTBR0)
//   EPD1  [23]    = 1     -> 0x800000
//   IRGN1, ORGN1, SH1, TG1 [31:24] = 0
//   IPS   [34:32] = 0b000
//   AS    [36]    = 0     (8-bit ASID)
//   TBI0, TBI1 [38:37] = 0
//
//   0b1000 0000 0010 0101 0010 0000 = 0x802520
//
// Note: the ISB ensures that the changes to system context take
// effect before the write to SCTLR_EL1.M that enables the MMU. On
// a "real" implementation this setup would likely work without an
// ISB, because of the amount of code that is executed before the
// MMU is enabled, but that would not be architecturally correct.
//
ldr x1, =0x0000000000802520
msr TCR_EL1, x1
isb
//
// Turn on the banked GIC distributor enable,
// ready for individual CPU enables later
//
mov w0, #(1 << 1) // gicdctlr_EnableGrp1A
bl EnableGICD
//
// Generate TTBR0 L1
//
// at 4KB granularity, 32-bit VA space, table lookup starts at
// L1, with 1GB regions
//
// we are going to create entries pointing to L2 tables for a
// couple of these 1GB regions, the first of which is the
// RAM on the VE board model - get the table addresses and
// start by emptying out the L1 page table (with a 4KB granule
// and a 32-bit VA space, it has only 4 entries)
//
// x21 = address of L1 tables
//
ldr x21, =Image$$TTB0_L1$$ZI$$Base
mov x0, x21
mov x1, #(4 << 3)
bl ZeroBlock
//
// time to start mapping the RAM regions - clear out the
// L2 tables and point to them from the L1 tables
//
// x22 = address of L2 tables, needs to be remembered in case
// we want to re-use the tables for mapping peripherals
//
ldr x22, =Image$$TTB0_L2_RAM$$ZI$$Base
mov x1, #(512 << 3)
mov x0, x22
bl ZeroBlock
//
// Get the start address of RAM (the EXEC region) into x4
// and calculate the offset into the L1 table (1GB per region,
// max 4GB)
//
// x23 = L1 table offset, saved for later comparison against
// peripheral offset
//
ldr x4, =Image$$EXEC$$RO$$Base
//
// TOP_OF_RAM in the scatter file marks the end of the
// Execute region in RAM: convert the end of this region to an
// offset too, being careful to round up, then calculate the
// number of entries to write
//
ldr x5, =Image$$TOP_OF_RAM$$ZI$$Base
sub x3, x5, #1
ubfx x3, x3, #21, #9
add x3, x3, #1
sub x3, x3, x2
//
// set x1 to the required page table attributes, then orr
// in the start address (rounded down to a 2MB boundary)
//
// L2 tables in our configuration cover 2MB per entry - map
// memory as Inner Shareable, Normal WBWA (MAIR[1]) with a flat
// VA->PA translation
//
bic x4, x4, #((1 << 21) - 1)
ldr x1, =(TT_S1_ATTR_BLOCK | \
(1 << TT_S1_ATTR_MATTR_LSB) | \
TT_S1_ATTR_NS | \
TT_S1_ATTR_AP_RW_PL1 | \
TT_S1_ATTR_SH_INNER | \
TT_S1_ATTR_AF | \
TT_S1_ATTR_nG)
orr x1, x1, x4
//
// factor the offset into the page table address and then write
// the entries
//
add x0, x22, x2, lsl #3
loop1:
subs x3, x3, #1
str x1, [x0], #8
add x1, x1, #0x200, LSL #12 // equivalent to add x1, x1, #(1 << 21): 2MB per entry
bne loop1
//
// now mapping the Peripheral regions - clear out the
// L2 tables and point to them from the L1 tables
//
// The assumption here is that all peripherals live within
// a common 1GB region (that is, that there is a single set of
// L2 pages for all the peripherals). We only use a UART
// and the GIC in this example, so the assumption is sound
//
// x24 = address of L2 peripheral tables
//
ldr x24, =Image$$TTB0_L2_PERIPH$$ZI$$Base
//
// get the GICD address into x4 and calculate
//
// Peripherals are in a separate 1GB region, and so have their own
// set of L2 tables - clean out the tables and add them to the L1
// table
//
mov x0, x24
mov x1, #512 << 3
bl ZeroBlock
orr x1, x24, #TT_S1_ATTR_PAGE
str x1, [x21, x25, lsl #3]
//
// there is only going to be a single 2MB region for GICD (in
// x4) - get this in terms of an offset into the L2 page tables
//
// with larger systems, it is possible that the GIC redistributor
// registers require extra 2MB pages, in which case extra code
// would be required here
//
nol2setup:
ubfx x2, x4, #21, #9
//
// set x1 to the required page table attributes, then orr
// in the start address (rounded down to a 2MB boundary)
//
// L2 tables in our configuration cover 2MB per entry - map
// memory as NS Device-nGnRE (MAIR[2]) with a flat VA->PA
// translation
//
bic x4, x4, #((1 << 21) - 1) // round start address down to 2MB
ldr x1, =(TT_S1_ATTR_BLOCK | \
(2 << TT_S1_ATTR_MATTR_LSB) | \
TT_S1_ATTR_NS | \
TT_S1_ATTR_AP_RW_PL1 | \
TT_S1_ATTR_AF | \
TT_S1_ATTR_nG)
orr x1, x1, x4
//
// only a single L2 entry for this, so no loop as we have for RAM, above
//
str x1, [x24, x2, lsl #3]
//
// we have CS3_PERIPHERALS that include the UART controller
//
// Again, the code is making assumptions - this time that the CS3_PERIPHERALS
// region uses the same 1GB portion of the address space as the GICD,
// and thus shares the same set of L2 page tables
//
// Get CS3_PERIPHERALS address into x4 and calculate the offset into the
// L2 tables
//
ldr x4, =Image$$CS3_PERIPHERALS$$ZI$$Base
ubfx x2, x4, #21, #9
//
// set x1 to the required page table attributes, then orr
// in the start address (rounded down to a 2MB boundary)
//
// L2 tables in our configuration cover 2MB per entry - map
// memory as NS Device-nGnRE (MAIR[2]) with a flat VA->PA
// translation
//
bic x4, x4, #((1 << 21) - 1) // round start address down to 2MB
ldr x1, =(TT_S1_ATTR_BLOCK | \
(2 << TT_S1_ATTR_MATTR_LSB) | \
TT_S1_ATTR_NS | \
TT_S1_ATTR_AP_RW_PL1 | \
TT_S1_ATTR_AF | \
TT_S1_ATTR_nG)
orr x1, x1, x4
//
// only a single L2 entry again - write it
//
str x1, [x24, x2, lsl #3]
//
// issue a barrier to ensure all table entry writes are complete
//
dsb ish
//
// Enable the MMU. Caches will be enabled later, after scatterloading.
//
mrs x1, SCTLR_EL1
orr x1, x1, #SCTLR_ELx_M
bic x1, x1, #SCTLR_ELx_A // Disable alignment fault checking; to enable it, change bic to orr
msr SCTLR_EL1, x1
isb
//
// Branch to C library init code
//
b __main
// ------------------------------------------------------------
// * Ensure all code on the path from the program entry up to and including
//   _platform_pre_stackheap_init is located in a root region.
.global _platform_pre_stackheap_init
.type _platform_pre_stackheap_init, "function"
.cfi_startproc
_platform_pre_stackheap_init:
dsb ish // ensure all previous stores have completed before invalidating
ic ialluis // I cache invalidate all inner shareable to PoU (which includes secondary cores)
dsb ish // ensure completion on inner shareable domain (which includes secondary cores)
isb
ret
.cfi_endproc
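The L1 and L2 index arithmetic used by the startup code above (1GB per L1 entry, 2MB per L2 block entry, and a 4-entry L1 table for a 32-bit VA space with a 4KB granule) can be checked on a host machine. The following C sketch mirrors the `ubfx xN, x4, #21, #9` index extraction, the round-up applied to the `TOP_OF_RAM` limit, and the fill loop that steps the output address by 2MB per entry. The function names are hypothetical, and the `attrs` argument stands in for the `TT_S1_ATTR_*` combination, whose values are not reproduced here.

```c
#include <stdint.h>

#define L2_BLOCK_SHIFT 21u                              /* 2MB per L2 block entry */
#define L2_BLOCK_SIZE  ((uint64_t)1 << L2_BLOCK_SHIFT)

/* L1 index: one entry per 1GB; 4 entries cover a 32-bit VA space */
static unsigned l1_index(uint64_t va)
{
    return (unsigned)((va >> 30) & 0x3u);
}

/* L2 index within a 1GB region, equivalent to ubfx xN, va, #21, #9 */
static unsigned l2_index(uint64_t va)
{
    return (unsigned)((va >> L2_BLOCK_SHIFT) & 0x1FFu);
}

/* Entries needed to cover [base, top), assuming both lie in the same 1GB
   region: the last byte (top - 1) decides how far to go, which rounds a
   partial 2MB block up */
static unsigned l2_entry_count(uint64_t base, uint64_t top)
{
    return l2_index(top - 1) + 1u - l2_index(base);
}

/* Write 'count' block descriptors giving a flat VA->PA mapping.
   'attrs' is a placeholder for the TT_S1_ATTR_* combination. */
static void fill_l2_blocks(uint64_t *l2, uint64_t base, unsigned count,
                           uint64_t attrs)
{
    uint64_t desc = attrs | (base & ~(L2_BLOCK_SIZE - 1));  /* bic ..., #((1 << 21) - 1) */
    uint64_t *entry = &l2[l2_index(base)];

    while (count-- > 0) {
        *entry++ = desc;
        desc += L2_BLOCK_SIZE;                              /* next 2MB of output address */
    }
}
```

Unlike the assembly loop, which decrements the count before testing it and therefore assumes at least one entry, `fill_l2_blocks()` writes nothing when `count` is 0.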
// ------------------------------------------------------------
// Armv8-A AArch64 - Common helper functions
//
// Copyright (c) 2012-2020 Arm Limited (or its affiliates). All rights reserved.
// Use, modification and redistribution of this file is subject to your possession of a
// valid End User License Agreement for the Arm Product of which these examples are part of
// and your compliance with all applicable terms and conditions of such licence agreement.
// ------------------------------------------------------------
#include "v8_system.h"
.text
.cfi_sections .debug_frame // put stack frame info into .debug_frame instead of .eh_frame
.global EnableCachesEL1
.global DisableCachesEL1
.global InvalidateUDCaches
.global GetMIDR
.global GetMPIDR
.global GetCPUID
// ------------------------------------------------------------
//
// void EnableCachesEL1(void)
//
// enable Instruction and Data caches
//
.type EnableCachesEL1, "function"
.cfi_startproc
EnableCachesEL1:
isb
ret
.cfi_endproc
// ------------------------------------------------------------
.type DisableCachesEL1, "function"
.cfi_startproc
DisableCachesEL1:
isb
ret
.cfi_endproc
// ------------------------------------------------------------
//
// void InvalidateUDCaches(void)
//
// Invalidate data and unified caches
//
.type InvalidateUDCaches, "function"
.cfi_startproc
InvalidateUDCaches:
// From the Armv8-A Architecture Reference Manual
b.ge loop_way
next_level:
add w10, w10, #2 // increment 2 x cache level
cmp w3, w10
b.gt loop_level
dsb sy // ensure completion of previous cache maintenance operation
isb
finished:
ret
.cfi_endproc
// ------------------------------------------------------------
//
// ID Register functions
//
Others:
mrs x0, MPIDR_EL1
ubfx x1, x0, #MPIDR_EL1_AFF0_LSB, #MPIDR_EL1_AFF_WIDTH
ubfx x2, x0, #MPIDR_EL1_AFF1_LSB, #MPIDR_EL1_AFF_WIDTH
add x0, x1, x2, LSL #2
ret
.cfi_endproc
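The ID-register helper above turns MPIDR_EL1 into a linear core number as Aff0 + 4 × Aff1, which gives unique numbers only while each cluster has at most four cores. A minimal C sketch of the same calculation follows, using the field positions that v8_system.h defines; the function names are hypothetical.

```c
#include <stdint.h>

#define MPIDR_EL1_AFF1_LSB  8
#define MPIDR_EL1_AFF0_LSB  0
#define MPIDR_EL1_AFF_WIDTH 8

/* Extract one 8-bit affinity field, like ubfx xN, x0, #lsb, #8 */
static uint64_t mpidr_aff(uint64_t mpidr, unsigned lsb)
{
    return (mpidr >> lsb) & ((1u << MPIDR_EL1_AFF_WIDTH) - 1u);
}

/* Linear CPU number: Aff0 + (Aff1 << 2), like add x0, x1, x2, LSL #2 */
static uint64_t cpu_number_from_mpidr(uint64_t mpidr)
{
    return mpidr_aff(mpidr, MPIDR_EL1_AFF0_LSB) +
           (mpidr_aff(mpidr, MPIDR_EL1_AFF1_LSB) << 2);
}
```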
//
#ifndef V8_MMU_H
#define V8_MMU_H
//
// Translation Control Register fields
//
// RGN field encodings
//
#define TCR_RGN_NC 0b00
#define TCR_RGN_WBWA 0b01
#define TCR_RGN_WT 0b10
#define TCR_RGN_WBRA 0b11
//
// Shareability encodings
//
#define TCR_SHARE_NONE 0b00
#define TCR_SHARE_OUTER 0b10
#define TCR_SHARE_INNER 0b11
//
// Granule size encodings
//
#define TCR_GRANULE_4K 0b00
#define TCR_GRANULE_64K 0b01
#define TCR_GRANULE_16K 0b10
//
// Physical Address sizes
//
#define TCR_SIZE_4G 0b000
#define TCR_SIZE_64G 0b001
#define TCR_SIZE_1T 0b010
#define TCR_SIZE_4T 0b011
#define TCR_SIZE_16T 0b100
#define TCR_SIZE_256T 0b101
//
// Translation Control Register fields
//
#define TCR_EL1_T0SZ_SHIFT 0
#define TCR_EL1_EPD0 (1 << 7)
#define TCR_EL1_IRGN0_SHIFT 8
#define TCR_EL1_ORGN0_SHIFT 10
#define TCR_EL1_SH0_SHIFT 12
#define TCR_EL1_TG0_SHIFT 14
#define TCR_EL1_T1SZ_SHIFT 16
#define TCR_EL1_A1 (1 << 22)
#define TCR_EL1_EPD1 (1 << 23)
#define TCR_EL1_IRGN1_SHIFT 24
#define TCR_EL1_ORGN1_SHIFT 26
#define TCR_EL1_SH1_SHIFT 28
#define TCR_EL1_TG1_SHIFT 30
#define TCR_EL1_IPS_SHIFT 32
#define TCR_EL1_AS (1 << 36)
#define TCR_EL1_TBI0 (1 << 37)
#define TCR_EL1_TBI1 (1 << 38)
//
//
// Inner and Outer Normal memory attributes use the same bit patterns
// Outer attributes just need to be shifted up
//
#define TT_S1_MAIR_OUTER_SHIFT 4
#endif // V8_MMU_H
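As a cross-check on the hand-assembled constants in the startup code, the MAIR_EL1 value (0x0004ff44, built with `mov` and `movk`) and the TCR_EL1 value (0x802520) can be rebuilt from the encodings in this header. The following is a minimal sketch under these assumptions: the relevant macros are copied in hex rather than as binary literals, the shifts are done on 64-bit values, and the `MAIR_*` attribute names are hypothetical.

```c
#include <stdint.h>

/* Encodings and shifts copied from v8_mmu.h (written in hex) */
#define TCR_RGN_WBWA           0x1u
#define TCR_SHARE_OUTER        0x2u
#define TCR_GRANULE_4K         0x0u
#define TCR_SIZE_4G            0x0u
#define TCR_EL1_T0SZ_SHIFT     0
#define TCR_EL1_IRGN0_SHIFT    8
#define TCR_EL1_ORGN0_SHIFT    10
#define TCR_EL1_SH0_SHIFT      12
#define TCR_EL1_TG0_SHIFT      14
#define TCR_EL1_IPS_SHIFT      32
#define TCR_EL1_EPD1           ((uint64_t)1 << 23)
#define TT_S1_MAIR_OUTER_SHIFT 4

/* Hypothetical names for the MAIR encodings used by the startup code */
#define MAIR_NORMAL_NC         0x4u   /* Normal, Non-cacheable (one nibble) */
#define MAIR_NORMAL_WBRWA      0xFu   /* Normal, Write-Back, R/W-Allocate (one nibble) */
#define MAIR_DEVICE_nGnRE      0x04u  /* Device-nGnRE (whole byte) */

/* Normal memory: outer attributes are the inner pattern shifted up */
static uint64_t normal_attr(uint64_t outer, uint64_t inner)
{
    return (outer << TT_S1_MAIR_OUTER_SHIFT) | inner;
}

static uint64_t mair_el1_value(void)
{
    return normal_attr(MAIR_NORMAL_NC, MAIR_NORMAL_NC)               /* Attr0 = 0x44 */
         | (normal_attr(MAIR_NORMAL_WBRWA, MAIR_NORMAL_WBRWA) << 8)  /* Attr1 = 0xff */
         | ((uint64_t)MAIR_DEVICE_nGnRE << 16);                      /* Attr2 = 0x04 */
}

static uint64_t tcr_el1_value(void)
{
    return ((uint64_t)32 << TCR_EL1_T0SZ_SHIFT)          /* T0SZ = 0x20: 2^(64-32) bytes of VA */
         | ((uint64_t)TCR_RGN_WBWA << TCR_EL1_IRGN0_SHIFT)
         | ((uint64_t)TCR_RGN_WBWA << TCR_EL1_ORGN0_SHIFT)
         | ((uint64_t)TCR_SHARE_OUTER << TCR_EL1_SH0_SHIFT)
         | ((uint64_t)TCR_GRANULE_4K << TCR_EL1_TG0_SHIFT)
         | TCR_EL1_EPD1                                   /* no TTBR1_EL1 table walks */
         | ((uint64_t)TCR_SIZE_4G << TCR_EL1_IPS_SHIFT);  /* 32-bit PA space */
}
```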
//
// Defines for v8 System Registers
//
// Copyright (c) 2012-2016 Arm Limited (or its affiliates). All rights reserved.
//
// AArch64 SPSR
//
#define AARCH64_SPSR_EL3h 0b1101
#define AARCH64_SPSR_EL3t 0b1100
#define AARCH64_SPSR_EL2h 0b1001
#define AARCH64_SPSR_EL2t 0b1000
#define AARCH64_SPSR_EL1h 0b0101
#define AARCH64_SPSR_EL1t 0b0100
#define AARCH64_SPSR_EL0t 0b0000
#define AARCH64_SPSR_RW (1 << 4)
#define AARCH64_SPSR_F (1 << 6)
#define AARCH64_SPSR_I (1 << 7)
#define AARCH64_SPSR_A (1 << 8)
#define AARCH64_SPSR_D (1 << 9)
#define AARCH64_SPSR_IL (1 << 20)
#define AARCH64_SPSR_SS (1 << 21)
#define AARCH64_SPSR_V (1 << 28)
#define AARCH64_SPSR_C (1 << 29)
#define AARCH64_SPSR_Z (1 << 30)
#define AARCH64_SPSR_N (1 << 31)
//
// Multiprocessor Affinity Register
//
#define MPIDR_EL1_AFF3_LSB 32
#define MPIDR_EL1_U (1 << 30)
#define MPIDR_EL1_MT (1 << 24)
#define MPIDR_EL1_AFF2_LSB 16
#define MPIDR_EL1_AFF1_LSB 8
#define MPIDR_EL1_AFF0_LSB 0
#define MPIDR_EL1_AFF_WIDTH 8
//
// Data Cache Zero ID Register
//
#define DCZID_EL0_BS_LSB 0
#define DCZID_EL0_BS_WIDTH 4
#define DCZID_EL0_DZP_LSB 4
#define DCZID_EL0_DZP (1 << 4)
//
// System Control Register
//
#define SCTLR_EL1_UCI (1 << 26)
#define SCTLR_ELx_EE (1 << 25)
#define SCTLR_EL1_E0E (1 << 24)
#define SCTLR_ELx_WXN (1 << 19)
#define SCTLR_EL1_nTWE (1 << 18)
#define SCTLR_EL1_nTWI (1 << 16)
#define SCTLR_EL1_UCT (1 << 15)
#define SCTLR_EL1_DZE (1 << 14)
#define SCTLR_ELx_I (1 << 12)
#define SCTLR_EL1_UMA (1 << 9)
#define SCTLR_EL1_SED (1 << 8)
#define SCTLR_EL1_ITD (1 << 7)
#define SCTLR_EL1_THEE (1 << 6)
#define SCTLR_EL1_CP15BEN (1 << 5)
#define SCTLR_EL1_SA0 (1 << 4)
#define SCTLR_ELx_SA (1 << 3)
//
// Architectural Feature Access Control Register
//
#define CPACR_EL1_TTA (1 << 28)
#define CPACR_EL1_FPEN (3 << 20)
//
// Architectural Feature Trap Register
//
#define CPTR_ELx_TCPAC (1 << 31)
#define CPTR_ELx_TTA (1 << 20)
#define CPTR_ELx_TFP (1 << 10)
//
// Secure Configuration Register
//
#define SCR_EL3_TWE (1 << 13)
#define SCR_EL3_TWI (1 << 12)
#define SCR_EL3_ST (1 << 11)
#define SCR_EL3_RW (1 << 10)
#define SCR_EL3_SIF (1 << 9)
#define SCR_EL3_HCE (1 << 8)
#define SCR_EL3_SMD (1 << 7)
#define SCR_EL3_EA (1 << 3)
#define SCR_EL3_FIQ (1 << 2)
#define SCR_EL3_IRQ (1 << 1)
#define SCR_EL3_NS (1 << 0)
//
// Hypervisor Configuration Register
//
#define HCR_EL2_ID (1 << 33)
#define HCR_EL2_CD (1 << 32)
#define HCR_EL2_RW (1 << 31)
#define HCR_EL2_TRVM (1 << 30)
#define HCR_EL2_HVC (1 << 29)
#define HCR_EL2_TDZ (1 << 28)
#endif // V8_SYSTEM_H
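The startup code combines several of these definitions into single immediates for SCR_EL3, HCR_EL2, and the SPSR_EL3 value used by the exception return to EL1. The sketch below reproduces those three values in C. It re-declares the bits it needs with a 64-bit type: the plain `1 << n` form suits the assembly sources, but in a C translation unit `1 << 31` and `1 << 33` overflow `int`, which is undefined behavior. The `BIT64` helper and the function names are hypothetical.

```c
#include <stdint.h>

#define BIT64(n) ((uint64_t)1 << (n))   /* hypothetical helper */

/* Bit positions as defined in v8_system.h */
#define SCR_EL3_RW        BIT64(10)
#define SCR_EL3_SMD       BIT64(7)
#define SCR_EL3_NS        BIT64(0)
#define HCR_EL2_RW        BIT64(31)
#define AARCH64_SPSR_EL1h 0x5u          /* M[3:0] = 0b0101 */
#define AARCH64_SPSR_F    BIT64(6)
#define AARCH64_SPSR_I    BIT64(7)
#define AARCH64_SPSR_A    BIT64(8)

/* SCR_EL3: lower ELs Non-secure and AArch64, SMC disabled */
static uint64_t scr_el3_value(void)
{
    return SCR_EL3_RW | SCR_EL3_SMD | SCR_EL3_NS;
}

/* HCR_EL2: only RW set, so EL1 executes in AArch64 */
static uint64_t hcr_el2_value(void)
{
    return HCR_EL2_RW;
}

/* SPSR_EL3 for the eret to EL1h, with FIQ, IRQ and SError masked */
static uint64_t spsr_for_el1h(void)
{
    return AARCH64_SPSR_EL1h | AARCH64_SPSR_F | AARCH64_SPSR_I | AARCH64_SPSR_A;
}
```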
//
// Simple utility routines for baremetal v8 code
//
// Copyright (c) 2013-2017 Arm Limited (or its affiliates). All rights reserved.
// Use, modification and redistribution of this file is subject to your possession of a
// valid End User License Agreement for the Arm Product of which these examples are part of
// and your compliance with all applicable terms and conditions of such licence agreement.
//
#include "v8_system.h"
.text
.cfi_sections .debug_frame // put stack frame info into .debug_frame instead of .eh_frame
//
// void *ZeroBlock(void *blockPtr, unsigned int nBytes)
//
// Zero fill a block of memory
//
// we fill the data 16 bytes at a time: check that
// the block size is a multiple of that
//
ubfx x2, x1, #0, #4
cbnz x2, incompatible
//
// we already have one register full of zeros, get another
//
mov x3, x2
//
// OK, set temporary pointer and away we go
//
add x0, x0, x1
loop0:
subs x1, x1, #16
stp x2, x3, [x0, #-16]!
b.ne loop0
//
// that's all - x0 will be back to its start value
//
ret
//
// parameters are incompatible with block size - return
// an indication that this is so
//
incompatible:
mov x0,#0
ret
.cfi_endproc
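For comparison, the following C sketch mirrors the behavior of ZeroBlock. It returns NULL when the size is not a multiple of 16. Otherwise it clears the block from its end toward its start, 16 bytes at a time, and returns the original pointer. One difference is deliberate: a zero-byte request returns immediately here, whereas the assembly loop decrements before testing and so assumes a non-zero size. The function name `zero_block_c` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* C equivalent of ZeroBlock (hypothetical name); NULL signals a bad size */
void *zero_block_c(void *blockPtr, unsigned int nBytes)
{
    unsigned char *p;

    if ((nBytes & 0xFu) != 0)                 /* ubfx x2, x1, #0, #4 ; cbnz x2, incompatible */
        return NULL;

    p = (unsigned char *)blockPtr + nBytes;   /* add x0, x0, x1 */
    while (nBytes != 0) {                     /* the assembly loop assumes nBytes > 0 */
        nBytes -= 16;
        p -= 16;
        memset(p, 0, 16);                     /* stp x2, x3, [x0, #-16]! with x2 = x3 = 0 */
    }
    return blockPtr;                          /* p has walked back to blockPtr */
}
```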
// ------------------------------------------------------------
// Armv8-A Vector tables
//
// Copyright (c) 2014-2016 Arm Limited (or its affiliates). All rights reserved.
// Use, modification and redistribution of this file is subject to your possession of a
// valid End User License Agreement for the Arm Product of which these examples are part of
// and your compliance with all applicable terms and conditions of such licence agreement.
// ------------------------------------------------------------
.global el1_vectors
.global el2_vectors
.global el3_vectors
.global c0sync1
.global irqHandler
.global fiqHandler
.global irqFirstLevelHandler
.global fiqFirstLevelHandler
.section EL1VECTORS, "ax"
.align 11
//
// Current EL with SP0
//
el1_vectors:
c0sync1: B c0sync1
.balign 0x80
c0irq1: B irqFirstLevelHandler
.balign 0x80
c0fiq1: B fiqFirstLevelHandler
.balign 0x80
c0serr1: B c0serr1
//
// Current EL with SPx
//
.balign 0x80
cxsync1: B cxsync1
.balign 0x80
cxirq1: B irqFirstLevelHandler
.balign 0x80
cxfiq1: B fiqFirstLevelHandler
.balign 0x80
cxserr1: B cxserr1
//
// Lower EL using AArch64
//
.balign 0x80
l64sync1: B l64sync1
.balign 0x80
l64irq1: B irqFirstLevelHandler
.balign 0x80
l64fiq1: B fiqFirstLevelHandler
.balign 0x80
l64serr1: B l64serr1
//
// Lower EL using AArch32
//
.balign 0x80
l32sync1: B l32sync1
.balign 0x80
l32irq1: B irqFirstLevelHandler
.balign 0x80
l32fiq1: B fiqFirstLevelHandler
.balign 0x80
l32serr1: B l32serr1
//----------------------------------------------------------------
.balign 0x80
c0irq2: B irqFirstLevelHandler
.balign 0x80
c0fiq2: B fiqFirstLevelHandler
.balign 0x80
c0serr2: B c0serr2
//
// Current EL with SPx
//
.balign 0x80
cxsync2: B cxsync2
.balign 0x80
cxirq2: B irqFirstLevelHandler
.balign 0x80
cxfiq2: B fiqFirstLevelHandler
.balign 0x80
cxserr2: B cxserr2
//
// Lower EL using AArch64
//
.balign 0x80
l64sync2: B l64sync2
.balign 0x80
l64irq2: B irqFirstLevelHandler
.balign 0x80
l64fiq2: B fiqFirstLevelHandler
.balign 0x80
l64serr2: B l64serr2
//
// Lower EL using AArch32
//
.balign 0x80
l32sync2: B l32sync2
.balign 0x80
l32irq2: B irqFirstLevelHandler
.balign 0x80
l32fiq2: B fiqFirstLevelHandler
.balign 0x80
l32serr2: B l32serr2
//----------------------------------------------------------------
//
.balign 0x80
c0serr3: B c0serr3
//
// Current EL with SPx
//
.balign 0x80
cxsync3: B cxsync3
.balign 0x80
cxirq3: B irqFirstLevelHandler
.balign 0x80
cxfiq3: B fiqFirstLevelHandler
.balign 0x80
cxserr3: B cxserr3
//
// Lower EL using AArch64
//
.balign 0x80
l64sync3: B l64sync3
.balign 0x80
l64irq3: B irqFirstLevelHandler
.balign 0x80
l64fiq3: B fiqFirstLevelHandler
.balign 0x80
l64serr3: B l64serr3
//
// Lower EL using AArch32
//
.balign 0x80
l32sync3: B l32sync3
.balign 0x80
l32irq3: B irqFirstLevelHandler
.balign 0x80
l32fiq3: B fiqFirstLevelHandler
.balign 0x80
l32serr3: B l32serr3
BL fiqHandler
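Each vector table above is aligned to 2KB (`.align 11`, so the base written to VBAR_ELx has bits [10:0] clear) and holds 16 entries spaced 0x80 bytes apart (`.balign 0x80`). The entries form four groups: current EL with SP0, current EL with SPx, lower EL using AArch64, and lower EL using AArch32. Each group contains the Synchronous, IRQ, FIQ, and SError vectors in that order. The following C sketch computes the resulting offsets; the enum and function names are hypothetical.

```c
#include <stdint.h>

enum vec_source { VEC_CUR_SP0 = 0, VEC_CUR_SPX = 1, VEC_LOWER_A64 = 2, VEC_LOWER_A32 = 3 };
enum vec_type   { VEC_SYNC = 0, VEC_IRQ = 1, VEC_FIQ = 2, VEC_SERROR = 3 };

#define VEC_ENTRY_SIZE 0x80u   /* .balign 0x80 between entries */
#define VEC_TABLE_SIZE 0x800u  /* 16 entries; .align 11 gives 2KB alignment */

/* Offset of a vector entry from the base address held in VBAR_ELx */
static uint32_t vector_offset(enum vec_source src, enum vec_type type)
{
    return ((uint32_t)src * 4u + (uint32_t)type) * VEC_ENTRY_SIZE;
}
```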
• GICv3_gicc.h
• GICv3_gicd.c
• GICv3_gicr.c
• main.c
• retarget.c
• sp804_timer.c
• sp804_timer.h
• timer_interrupts.c
• uart.c
• uart.h
• v8_aarch64.h
/*
* GICv3.h - data types and function prototypes for GICv3 utility routines
*
* Copyright (c) 2014-2017 Arm Limited (or its affiliates). All rights reserved.
* Use, modification and redistribution of this file is subject to your possession of a
* valid End User License Agreement for the Arm Product of which these examples are part of
* and your compliance with all applicable terms and conditions of such licence agreement.
*/
#ifndef GICV3_h
#define GICV3_h
#include <stdint.h>
/*
* extra flags for GICD enable
*/
typedef enum
{
gicdctlr_EnableGrp0 = (1 << 0),
gicdctlr_EnableGrp1NS = (1 << 1),
gicdctlr_EnableGrp1A = (1 << 1),
gicdctlr_EnableGrp1S = (1 << 2),
gicdctlr_EnableAll = (1 << 2) | (1 << 1) | (1 << 0),
gicdctlr_ARE_S = (1 << 4), /* Enable Secure state affinity routing */
gicdctlr_ARE_NS = (1 << 5), /* Enable Non-Secure state affinity routing */
gicdctlr_DS = (1 << 6), /* Disable Security support */
gicdctlr_E1NWF = (1 << 7) /* Enable "1-of-N" wakeup model */
} GICDCTLRFlags_t;
/*
* modes for SPI routing
*/
typedef enum
{
gicdirouter_ModeSpecific = 0,
gicdirouter_ModeAny = (1 << 31)
} GICDIROUTERBits_t;
typedef enum
{
gicdicfgr_Level = 0,
gicdicfgr_Edge = (1 << 1)
} GICDICFGRBits_t;
typedef enum
{
gicigroupr_G0S = 0,
gicigroupr_G1NS = (1 << 0),
gicigroupr_G1S = (1 << 2)
} GICIGROUPRBits_t;
typedef enum
{
gicrwaker_ProcessorSleep = (1 << 1),
gicrwaker_ChildrenAsleep = (1 << 2)
} GICRWAKERBits_t;
/**********************************************************************/
/*
* Utility macros & functions
*/
#define RANGE_LIMIT(x) ((sizeof(x) / sizeof((x)[0])) - 1)
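```c
/*
 * Usage sketch (illustrative, not part of GICv3.h): RANGE_LIMIT() gives
 * the highest valid index of an array, one less than its element count,
 * so it suits inclusive "<=" bounds checks. The array and function
 * names below are hypothetical.
 */
#include <stdint.h>

#ifndef RANGE_LIMIT
#define RANGE_LIMIT(x) ((sizeof(x) / sizeof((x)[0])) - 1)
#endif

static uint32_t example_priorities[32];

static int index_in_range(uint32_t i)
{
    return i <= RANGE_LIMIT(example_priorities);
}
```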
/**********************************************************************/
/*
* GIC Distributor Function Prototypes
*/
/*
* ConfigGICD - configure GIC Distributor prior to enabling it
*
* Inputs:
*
* flags - control flags
*
* Returns:
*
* <nothing>
*
* NOTE:
*
* ConfigGICD() will set an absolute flags value, whereas
* {En,Dis}ableGICD() will only {set,clear} the flag bits
* passed as a parameter
*/
void ConfigGICD(GICDCTLRFlags_t flags);
/*
* EnableGICD - top-level enable for GIC Distributor
*
* Inputs:
*
* flags - new control flags to set
*
* Returns:
*
* <nothing>
*
* NOTE:
*
* ConfigGICD() will set an absolute flags value, whereas
/*
* SyncAREinGICD - synchronise GICD Affinity Routing Enable bits
*
* Inputs
*
* flags - absolute flag bits to set in GIC Distributor
*
* dosync - flag whether to wait for ARE bits to match passed
* flag field (dosync = true), or whether to set absolute
* flag bits (dosync = false)
*
* Returns
*
* <nothing>
*
* NOTE:
*
* This function is used to resolve a race in an MP system whereby secondary
* CPUs cannot reliably program all Redistributor registers until the
* primary CPU has enabled Affinity Routing. The primary CPU will call this
* function with dosync = false, while the secondaries will call it with
* dosync = true.
*/
void SyncAREinGICD(GICDCTLRFlags_t flags, uint32_t dosync);
/*
* EnableSPI - enable a specific shared peripheral interrupt
*
* Inputs:
*
* id - which interrupt to enable
*
* Returns:
*
* <nothing>
*/
void EnableSPI(uint32_t id);
/*
* DisableSPI - disable a specific shared peripheral interrupt
*
* Inputs:
*
* id - which interrupt to disable
*
* Returns:
*
* <nothing>
*/
void DisableSPI(uint32_t id);
/*
* SetSPIPriority - configure the priority for a shared peripheral interrupt
*
* Inputs:
*
* id - interrupt identifier
*
* priority - 8-bit priority to program (see note below)
*
* Returns:
*
* <nothing>
*
* Note:
*
* The GICv3 architecture makes this function sensitive to the Security
* context in terms of what effect it has on the programmed priority: no
* attempt is made to adjust for the reduced priority range available
* when making Non-Secure accesses to the GIC
*/
void SetSPIPriority(uint32_t id, uint32_t priority);
/*
* GetSPIPriority - determine the priority for a shared peripheral interrupt
*
* Inputs:
*
* id - interrupt identifier
*
* Returns:
*
* interrupt priority in the range 0 - 0xff
*/
uint32_t GetSPIPriority(uint32_t id);
/*
* SetSPIRoute - specify interrupt routing when gicdctlr_ARE is enabled
*
* Inputs:
*
* id - interrupt identifier
*
* affinity - prepacked "dotted quad" affinity routing. NOTE: use the
* gicv3PackAffinity() helper routine to generate this input
*
* mode - select routing mode (specific affinity, or any recipient)
*
* Returns:
*
* <nothing>
*/
void SetSPIRoute(uint32_t id, uint64_t affinity, GICDIROUTERBits_t mode);
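```c
/*
 * Illustrative sketch, not the gicv3PackAffinity() helper referred to
 * above (its signature is not shown here): it packs four affinity
 * levels into the GICD_IROUTER<n> layout, with Aff0 in bits [7:0],
 * Aff1 in [15:8], Aff2 in [23:16] and Aff3 in [39:32]. Bit 31 is the
 * routing-mode bit that gicdirouter_ModeAny sets. The function name
 * is hypothetical.
 */
#include <stdint.h>

static uint64_t pack_affinity_sketch(uint8_t aff3, uint8_t aff2,
                                     uint8_t aff1, uint8_t aff0)
{
    return ((uint64_t)aff3 << 32) |
           ((uint64_t)aff2 << 16) |
           ((uint64_t)aff1 << 8)  |
           (uint64_t)aff0;
}
```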
/*
* GetSPIRoute - read ARE-enabled interrupt routing information
*
* Inputs:
*
* id - interrupt identifier
*
* Returns:
*
* routing configuration
*/
uint64_t GetSPIRoute(uint32_t id);
/*
/*
* GetSPITarget - read the set of processor targets for an interrupt
*
* Inputs
*
* id - interrupt identifier
*
* Returns
*
* 8-bit target bitmap
*/
uint32_t GetSPITarget(uint32_t id);
/*
* ConfigureSPI - set up an interrupt as edge- or level-triggered
*
* Inputs
*
* id - interrupt identifier
*
* config - desired configuration
*
* Returns
*
* <nothing>
*/
void ConfigureSPI(uint32_t id, GICDICFGRBits_t config);
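```c
/*
 * Illustrative sketch (hypothetical helpers, not part of GICv3.h): the
 * GICD_ICFGR<n> registers hold a 2-bit configuration field per
 * interrupt, 16 interrupts per 32-bit register, so a function such as
 * ConfigureSPI() has to locate the field like this. A field value of
 * gicdicfgr_Edge (0b10) selects edge-triggered, gicdicfgr_Level (0)
 * selects level-sensitive.
 */
#include <stdint.h>

/* Which GICD_ICFGR<n> register holds the field for this interrupt */
static uint32_t icfgr_index_sketch(uint32_t id)
{
    return id / 16u;
}

/* Return 'reg' with the field for 'id' replaced by 'config' */
static uint32_t icfgr_update_sketch(uint32_t reg, uint32_t id, uint32_t config)
{
    uint32_t shift = (id % 16u) * 2u;   /* field position within the register */

    reg &= ~(3u << shift);              /* clear the old 2-bit field */
    reg |= (config & 3u) << shift;      /* insert the new configuration */
    return reg;
}
```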
/*
* SetSPIPending - mark an interrupt as pending
*
* Inputs
*
* id - interrupt identifier
*
* Returns
*
* <nothing>
*/
void SetSPIPending(uint32_t id);
/*
* ClearSPIPending - mark an interrupt as not pending
*
* Inputs
*
* id - interrupt identifier
*
* Returns
*
* <nothing>
*/
void ClearSPIPending(uint32_t id);
/*
* GetSPIPending - query whether an interrupt is pending
*
* Inputs
*
* id - interrupt identifier
*
* Returns
*
* pending status
*/
uint32_t GetSPIPending(uint32_t id);
/*
* SetSPISecurity - mark a shared peripheral interrupt as
* security <group>
*
* Inputs
*
* id - which interrupt to mark
*
* group - the group for the interrupt
*
* Returns
*
* <nothing>
*/
void SetSPISecurity(uint32_t id, GICIGROUPRBits_t group);
/*
* SetSPISecurityBlock - mark a block of 32 shared peripheral
* interrupts as security <group>
*
* Inputs:
*
* block - which block to mark (for example, 1 = Ints 32-63)
*
* group - the group for the interrupts
*
* Returns:
*
* <nothing>
*/
void SetSPISecurityBlock(uint32_t block, GICIGROUPRBits_t group);
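```c
/*
 * Illustrative sketch (hypothetical helpers, not part of GICv3.h): group
 * configuration uses one bit per interrupt in GICD_IGROUPR<n>, 32
 * interrupts per register, which matches the 32-interrupt blocks used by
 * SetSPISecurityBlock() (block 1 = INTIDs 32-63).
 */
#include <stdint.h>

/* Which 32-interrupt block (GICD_IGROUPR<n> register) holds this INTID */
static uint32_t igroupr_block_sketch(uint32_t id)
{
    return id / 32u;
}

/* Bit mask for this INTID within its GICD_IGROUPR<n> register */
static uint32_t igroupr_bit_sketch(uint32_t id)
{
    return 1u << (id % 32u);
}
```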
/*
* SetSPISecurityAll - mark all shared peripheral interrupts
* as security <group>
*
* Inputs:
*
* group - the group for the interrupts
*
* Returns:
*
* <nothing>
*/
void SetSPISecurityAll(GICIGROUPRBits_t group);
/**********************************************************************/
/*
* GIC Re-Distributor Function Prototypes
*
* Rather than relying on a previously selected Redistributor, each
* Redistributor function identifies its target explicitly through the
* gicr argument, the index of that Redistributor's frame (GetGICR()
* finds the index for a given affinity)
*/
/*
* WakeupGICR - wake up a Redistributor
*
* Inputs:
*
* gicr - which Redistributor to wakeup
*
* Returns:
*
* <nothing>
*/
void WakeupGICR(uint32_t gicr);
/*
* EnablePrivateInt - enable a private (SGI/PPI) interrupt
*
* Inputs:
*
* gicr - which Redistributor to program
*
* id - which interrupt to enable
*
* Returns:
*
* <nothing>
*/
void EnablePrivateInt(uint32_t gicr, uint32_t id);
/*
* DisablePrivateInt - disable a private (SGI/PPI) interrupt
*
* Inputs:
*
* gicr - which Redistributor to program
*
* id - which interrupt to disable
*
* Returns:
*
* <nothing>
*/
void DisablePrivateInt(uint32_t gicr, uint32_t id);
/*
* SetPrivateIntPriority - configure the priority for a private
* (SGI/PPI) interrupt
*
* Inputs:
*
* gicr - which Redistributor to program
*
* id - interrupt identifier
*
* priority - 8-bit priority to program (see note below)
*
* Returns:
*
* <nothing>
*
* Note:
*
* The GICv3 architecture makes this function sensitive to the Security
* context in terms of what effect it has on the programmed priority: no
* attempt is made to adjust for the reduced priority range available
* when making Non-Secure accesses to the GIC
*/
void SetPrivateIntPriority(uint32_t gicr, uint32_t id, uint32_t priority);
/*
* GetPrivateIntPriority - query the priority for a private
* (SGI/PPI) interrupt
*
* Inputs:
*
/*
* SetPrivateIntPending - mark a private (SGI/PPI) interrupt as pending
*
* Inputs
*
* gicr - which Redistributor to program
*
* id - interrupt identifier
*
* Returns
*
* <nothing>
*/
void SetPrivateIntPending(uint32_t gicr, uint32_t id);
/*
* ClearPrivateIntPending - mark a private (SGI/PPI) interrupt as not pending
*
* Inputs
*
* gicr - which Redistributor to program
*
* id - interrupt identifier
*
* Returns
*
* <nothing>
*/
void ClearPrivateIntPending(uint32_t gicr, uint32_t id);
/*
* GetPrivateIntPending - query whether a private (SGI/PPI) interrupt is pending
*
* Inputs
*
* gicr - which Redistributor to program
*
* id - interrupt identifier
*
* Returns
*
* pending status
*/
uint32_t GetPrivateIntPending(uint32_t gicr, uint32_t id);
/*
* SetPrivateIntSecurity - mark a private (SGI/PPI) interrupt as
* security <group>
*
* Inputs
*
* gicr - which Redistributor to program
*
* id - which interrupt to mark
*
* group - the group for the interrupt
*
* Returns
*
* <nothing>
*/
/*
* SetPrivateIntSecurityBlock - mark all 32 private (SGI/PPI)
* interrupts as security <group>
*
* Inputs:
*
* gicr - which Redistributor to program
*
* group - the group for the interrupts
*
* Returns:
*
* <nothing>
*/
void SetPrivateIntSecurityBlock(uint32_t gicr, GICIGROUPRBits_t group);
/* EOF GICv3.h */
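The Distributor functions declared above map an interrupt ID onto a register index and a field position inside that register: one bit per interrupt (32 per register) for the enable, pending and group registers, and two bits per interrupt (16 per register) for GICD_ICFGR. GICD_IPRIORITYR needs no arithmetic because each interrupt has its own byte, so the array index is the INTID itself. The block number taken by SetSPISecurityBlock() is the same one-bit register index, which is why block 1 covers INTIDs 32-63. The following is a minimal host-side sketch of that arithmetic on a plain array, so the results can be checked without hardware; field_loc_t, one_bit_field(), two_bit_field() and set_cfg() are hypothetical names, not part of the example driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: location of one interrupt's field in a register array. */
typedef struct {
    uint32_t index; /* which 32-bit register in the array */
    uint32_t shift; /* bit offset of the field in that register */
    uint32_t mask;  /* field mask, already shifted into place */
} field_loc_t;

/* One bit per interrupt, 32 interrupts per register (the layout of
 * GICD_ISENABLER, GICD_ICENABLER, GICD_ISPENDR, GICD_ICPENDR, GICD_IGROUPR). */
static field_loc_t one_bit_field(uint32_t id)
{
    field_loc_t loc;
    loc.index = id >> 5;    /* id / 32 */
    loc.shift = id & 0x1fu; /* id % 32 */
    loc.mask  = 1u << loc.shift;
    return loc;
}

/* Two bits per interrupt, 16 interrupts per register (GICD_ICFGR). */
static field_loc_t two_bit_field(uint32_t id)
{
    field_loc_t loc;
    loc.index = id >> 4;          /* id / 16 */
    loc.shift = (id & 0xfu) << 1; /* 2 * (id % 16) */
    loc.mask  = 3u << loc.shift;
    return loc;
}

/* Read-modify-write of a 2-bit configuration field in a simulated
 * GICD_ICFGR array: only the target interrupt's field changes. */
static void set_cfg(uint32_t *icfgr, uint32_t id, uint32_t config)
{
    field_loc_t loc = two_bit_field(id);
    assert(config <= 3u);
    icfgr[loc.index] = (icfgr[loc.index] & ~loc.mask) | (config << loc.shift);
}
```

For example, INTID 34 (the SP804 timer interrupt handled by irqHandler() later in this example) falls in register 1, bit 2 of the one-bit-per-interrupt registers, and in GICD_ICFGR register 2, bits [5:4].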
/*
* GICv3_gicc.h - prototypes and inline functions for GICC system register operations
*
* Copyright (c) 2014-2017 Arm Limited (or its affiliates). All rights reserved.
* Use, modification and redistribution of this file is subject to your possession of a
* valid End User License Agreement for the Arm Product of which these examples are part of
* and your compliance with all applicable terms and conditions of such licence agreement.
*/
#ifndef GICV3_gicc_h
#define GICV3_gicc_h
/**********************************************************************/
typedef enum
{
sreSRE = (1 << 0),
sreDFB = (1 << 1),
sreDIB = (1 << 2),
sreEnable = (1 << 3)
} ICC_SREBits_t;
return retc;
}
return retc;
}
/**********************************************************************/
typedef enum
{
igrpEnable = (1 << 0),
igrpEnableGrp1NS = (1 << 0),
igrpEnableGrp1S = (1 << 1) // ICC_IGRPEN1_EL3.EnableGrp1S is bit 1
} ICC_IGRPBits_t;
/**********************************************************************/
typedef enum
{
ctlrCBPR = (1 << 0),
ctlrCBPR_EL1S = (1 << 0),
ctlrEOImode = (1 << 1),
ctlrCBPR_EL1NS = (1 << 1),
ctlrEOImode_EL3 = (1 << 2),
ctlrEOImode_EL1S = (1 << 3),
ctlrEOImode_EL1NS = (1 << 4),
ctlrRM = (1 << 5),
ctlrPMHE = (1 << 6)
} ICC_CTLRBits_t;
return retc;
}
static inline void setICC_CTLR_EL3(ICC_CTLRBits_t mode)
{
asm("msr ICC_CTLR_EL3, %0\n; isb" :: "r" ((uint64_t)mode));
}
return retc;
}
/**********************************************************************/
return retc;
}
return retc;
}
uint64_t retc;
return retc;
}
static inline uint64_t getICC_BPR1(void)
{
uint64_t retc;
asm("mrs %0, ICC_BPR1_EL1\n" : "=r" (retc));
return retc;
}
return retc;
}
/**********************************************************************/
typedef enum
{
sgirIRMTarget = 0,
sgirIRMAll = (1ull << 40)
} ICC_SGIRBits_t;
/*
* GICv3_gicd.c - generic driver code for GICv3 distributor
*
* Copyright (c) 2014-2017 Arm Limited (or its affiliates). All rights reserved.
* Use, modification and redistribution of this file is subject to your possession of a
* valid End User License Agreement for the Arm Product of which these examples are part of
* and your compliance with all applicable terms and conditions of such licence agreement.
*/
#include <stdint.h>
#include "GICv3.h"
typedef struct
{
volatile uint32_t GICD_CTLR; // +0x0000
const volatile uint32_t GICD_TYPER; // +0x0004
const volatile uint32_t GICD_IIDR; // +0x0008
/*
* GICD_ISENABLER has 32 interrupts for each register
*/
bank = (id >> 5) & RANGE_LIMIT(gicd.GICD_ISENABLER);
id &= 32 - 1;
return;
}
/*
* GICD_ICENABLER has 32 interrupts for each register
*/
bank = (id >> 5) & RANGE_LIMIT(gicd.GICD_ICENABLER);
id &= 32 - 1;
return;
}
uint32_t bank;
/*
* GICD_IPRIORITYR has one byte-wide entry for each interrupt
*/
bank = id & RANGE_LIMIT(gicd.GICD_IPRIORITYR);
gicd.GICD_IPRIORITYR[bank] = priority;
}
/*
* GICD_IPRIORITYR has one byte-wide entry for each interrupt
*/
bank = id & RANGE_LIMIT(gicd.GICD_IPRIORITYR);
return (uint32_t)(gicd.GICD_IPRIORITYR[bank]);
}
/*
* GICD_IROUTER has one doubleword-wide entry for each interrupt
*/
bank = id & RANGE_LIMIT(gicd.GICD_IROUTER);
/*
* GICD_IROUTER has one doubleword-wide entry for each interrupt
*/
bank = id & RANGE_LIMIT(gicd.GICD_IROUTER);
return gicd.GICD_IROUTER[bank];
}
/*
* GICD_ITARGETSR has one byte-wide entry for each interrupt
*/
bank = id & RANGE_LIMIT(gicd.GICD_ITARGETSR);
gicd.GICD_ITARGETSR[bank] = target;
}
/*
* GICD_ITARGETSR has one byte-wide entry for each interrupt
*/
/*
* GICD_ITARGETSR packs 4 interrupts into each 32-bit register, that is,
* 8 bits of target bitmap for each interrupt
*/
bank = id & RANGE_LIMIT(gicd.GICD_ITARGETSR);
return (uint32_t)(gicd.GICD_ITARGETSR[bank]);
}
tmp = gicd.GICD_ICFGR[bank];
tmp &= ~(3 << id);
tmp |= config << id;
gicd.GICD_ICFGR[bank] = tmp;
}
/*
* GICD_ISPENDR has 32 interrupts for each register
*/
bank = (id >> 5) & RANGE_LIMIT(gicd.GICD_ISPENDR);
id &= 0x1f;
/*
* GICD_ICPENDR has 32 interrupts for each register
*/
bank = (id >> 5) & RANGE_LIMIT(gicd.GICD_ICPENDR);
id &= 0x1f;
/*
* GICD_ICPENDR has 32 interrupts for each register
*/
bank = (id >> 5) & RANGE_LIMIT(gicd.GICD_ICPENDR);
id &= 0x1f;
/*
* GICD_IGROUPR has 32 interrupts for each register
*/
bank = (id >> 5) & RANGE_LIMIT(gicd.GICD_IGROUPR);
id &= 0x1f;
/*
* the single group argument is split into two separate
* registers, so filter out and remove the (new to gicv3)
* group modifier bit
*/
groupmod = (group >> 1) & 1;
group &= 1;
/*
* either set or clear the Group bit for the interrupt as appropriate
*/
if (group)
gicd.GICD_IGROUPR[bank] |= 1 << id;
else
gicd.GICD_IGROUPR[bank] &= ~(1 << id);
/*
* now deal with groupmod
*/
if (groupmod)
gicd.GICD_IGRPMODR[bank] |= 1 << id;
else
gicd.GICD_IGRPMODR[bank] &= ~(1 << id);
}
/*
* GICD_IGROUPR has 32 interrupts for each register
*/
block &= RANGE_LIMIT(gicd.GICD_IGROUPR);
/*
* get each bit of group config duplicated over all 32-bits in a word
*/
groupmod = (uint32_t)(((int32_t)group << (nbits - 1)) >> 31);
group = (uint32_t)(((int32_t)group << nbits) >> 31);
/*
* set the security state for this block of SPIs
*/
gicd.GICD_IGROUPR[block] = group;
gicd.GICD_IGRPMODR[block] = groupmod;
}
/*
* GICD_TYPER.ITLinesNumber (bits [4:0]) gives (number of supported INTIDs / 32) - 1,
* which is also the index of the last block of 32 interrupts. We iterate over all
* blocks down to, but excluding, block 0 (the SGI/PPI interrupts, which are not
* relevant here)
*/
for (block = (gicd.GICD_TYPER & ((1 << 5) - 1)); block > 0; --block)
SetSPISecurityBlock(block, group);
}
/* EOF GICv3_gicd.c */
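SetSPISecurity() splits its group argument into two bits: bit 0 is written to GICD_IGROUPR and bit 1, the group modifier introduced in GICv3, to GICD_IGRPMODR. When the GIC supports two Security states, the (IGRPMODR, IGROUPR) pair selects Secure Group 0 for (0,0), Non-secure Group 1 for (0,1) and Secure Group 1 for (1,0). SetSPISecurityBlock() then replicates each bit over a whole word by shifting it into the sign bit of an int32_t and shifting it back, which depends on signed-shift behavior that the C standard leaves undefined or implementation-defined. The sketch below shows a portable alternative based on unsigned negation; replicate_bit() and block_group_words() are hypothetical names, not functions of the example driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: copy bit 'bit' of 'value' into all 32 bits of the result.
 * (0u - 1u) wraps to 0xFFFFFFFF, so no signed shifts are involved. */
static uint32_t replicate_bit(uint32_t value, unsigned bit)
{
    assert(bit < 32u);
    return 0u - ((value >> bit) & 1u);
}

/* Hypothetical: the two words to write for one block of 32 interrupts,
 * given the 2-bit group encoding used by SetSPISecurity():
 * bit 0 -> IGROUPR, bit 1 -> IGRPMODR (group modifier). */
static void block_group_words(uint32_t group,
                              uint32_t *igroupr, uint32_t *igrpmodr)
{
    *igroupr  = replicate_bit(group, 0u);
    *igrpmodr = replicate_bit(group, 1u);
}
```

The same decoding applies to the Redistributor variants, which write GICR_IGROUPR0 and GICR_IGRPMODR0 instead.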
/*
* GICv3_gicr.c - generic driver code for GICv3 redistributor
*
* Copyright (c) 2014-2020 Arm Limited (or its affiliates). All rights reserved.
* Use, modification and redistribution of this file is subject to your possession of a
* valid End User License Agreement for the Arm Product of which these examples are part of
* and your compliance with all applicable terms and conditions of such licence agreement.
*/
#include "GICv3.h"
/*
* Physical LPI Redistributor register map
*/
typedef struct
{
volatile uint32_t GICR_CTLR; // +0x0000 - RW - Redistributor Control Register
const volatile uint32_t GICR_IIDR; // +0x0004 - RO - Implementer Identification Register
const volatile uint32_t GICR_TYPER[2]; // +0x0008 - RO - Redistributor Type Register
volatile uint32_t GICR_STATUSR; // +0x0010 - RW - Error Reporting Status Register, optional
volatile uint32_t GICR_WAKER; // +0x0014 - RW - Redistributor Wake Register
const volatile uint32_t padding1[2]; // +0x0018 - RESERVED
#ifndef USE_GIC600
volatile uint32_t IMPDEF1[8]; // +0x0020 - ?? - IMPLEMENTATION DEFINED
#else
volatile uint32_t GICR_FCTLR; // +0x0020 - RW - Function Control Register
volatile uint32_t GICR_PWRR; // +0x0024 - RW - Power Management Control Register
volatile uint32_t GICR_CLASS; // +0x0028 - RW - Class Register
const volatile uint32_t padding2[5]; // +0x002C - RESERVED
#endif
volatile uint64_t GICR_SETLPIR; // +0x0040 - WO - Set LPI Pending Register
volatile uint64_t GICR_CLRLPIR; // +0x0048 - WO - Clear LPI Pending Register
const volatile uint32_t padding3[8]; // +0x0050 - RESERVED
volatile uint64_t GICR_PROPBASER; // +0x0070 - RW - Redistributor Properties Base Address Register
volatile uint64_t GICR_PENDBASER; // +0x0078 - RW - Redistributor LPI Pending Table Base Address Register
const volatile uint32_t padding4[8]; // +0x0080 - RESERVED
volatile uint64_t GICR_INVLPIR; // +0x00A0 - WO - Redistributor Invalidate LPI Register
const volatile uint32_t padding5[2]; // +0x00A8 - RESERVED
volatile uint64_t GICR_INVALLR; // +0x00B0 - WO - Redistributor Invalidate All Register
const volatile uint32_t padding6[2]; // +0x00B8 - RESERVED
volatile uint64_t GICR_SYNCR; // +0x00C0 - RO - Redistributor Synchronize Register
const volatile uint32_t padding7[2]; // +0x00C8 - RESERVED
const volatile uint32_t padding8[12]; // +0x00D0 - RESERVED
volatile uint64_t IMPDEF2; // +0x0100 - WO - IMPLEMENTATION DEFINED
const volatile uint32_t padding9[2]; // +0x0108 - RESERVED
volatile uint64_t IMPDEF3; // +0x0110 - WO - IMPLEMENTATION DEFINED
const volatile uint32_t padding10[2]; // +0x0118 - RESERVED
} GICv3_redistributor_RD;
/*
* SGI and PPI Redistributor register map
*/
typedef struct
{
const volatile uint32_t padding1[32]; // +0x0000 - RESERVED
/*
* We have a multiplicity of GIC Redistributors; on the GIC-AEM and
* GIC-500 they are arranged as one 128KB region per redistributor: one
* 64KB page of GICR LPI registers, and one 64KB page of GICR Private
* Int registers
*/
typedef struct
{
union
{
GICv3_redistributor_RD RD_base;
uint8_t padding[64 * 1024];
} RDblock;
union
{
GICv3_redistributor_SGI SGI_base;
uint8_t padding[64 * 1024];
} SGIblock;
} GICv3_GICR;
/*
* use the scatter file to place GIC Redistributor base address
*
* although this code does not know how many Redistributor banks
* a particular system will have, we declare gicrbase as an array
* to avoid unwanted compiler optimizations when calculating the
* base of a particular Redistributor bank
*/
static const GICv3_GICR gicrbase[2] __attribute__((section (".bss.redistributor")));
/**********************************************************************/
/*
* utility functions to calculate base of a particular
* Redistributor bank
*/
return &(arraybase[gicr].SGIblock.SGI_base);
}
/**********************************************************************/
// This function walks a block of RDs to find one with the matching affinity
uint32_t GetGICR(uint32_t affinity)
{
GICv3_redistributor_RD* gicr;
uint32_t index = 0;
do
{
gicr = getgicrRD(index);
if (gicr->GICR_TYPER[1] == affinity)
return index;
index++;
}
while((gicr->GICR_TYPER[0] & (1<<4)) == 0); // Keep looking until GICR_TYPER.Last reports no more RDs in block
#ifdef USE_GIC600
/* GICR_PWRR fields */
#define PWRR_RDPD_SHIFT 0
#define PWRR_RDAG_SHIFT 1
#define PWRR_RDGPD_SHIFT 2
#define PWRR_RDGPO_SHIFT 3
/*
* Values to write to GICR_PWRR register to power redistributor
* for operating through the core (GICR_PWRR.RDAG = 0)
*/
#define PWRR_ON (0 << PWRR_RDPD_SHIFT)
#define PWRR_OFF (1 << PWRR_RDPD_SHIFT)
do {
while (((gicrRD->GICR_PWRR & PWRR_RDGPD) >> PWRR_RDGPD_SHIFT) != ((gicrRD->GICR_PWRR & PWRR_RDGPO) >> PWRR_RDGPO_SHIFT));
/* Power on redistributor */
gicrRD->GICR_PWRR=PWRR_ON;
/*
* step 1 - ensure GICR_WAKER.ProcessorSleep is off
*/
/*
* step 2 - wait for children asleep to be cleared
*/
while ((gicrRD->GICR_WAKER & gicrwaker_ChildrenAsleep) != 0)
continue;
/*
* OK, GICR is go
*/
return;
}
id &= 0x1f;
id &= 0x1f;
gicrSGI->GICR_ICENABLER = 1 << id;
}
/*
* GICR_IPRIORITYR has one byte-wide entry per interrupt
*/
id &= RANGE_LIMIT(gicrSGI->GICR_IPRIORITYR);
gicrSGI->GICR_IPRIORITYR[id] = priority;
}
/*
* GICR_IPRIORITYR has one byte-wide entry per interrupt
*/
id &= RANGE_LIMIT(gicrSGI->GICR_IPRIORITYR);
return (uint32_t)(gicrSGI->GICR_IPRIORITYR[id]);
}
/*
* GICR_ISPENDR is one 32-bit register
*/
id &= 0x1f;
/*
* GICR_ICPENDR is one 32-bit register
*/
id &= 0x1f;
gicrSGI->GICR_ICPENDR = 1 << id;
}
/*
* GICR_ISPENDR is one 32-bit register
*/
id &= 0x1f;
/*
* GICR_IGROUPR0 is one 32-bit register
*/
id &= 0x1f;
/*
* the single group argument is split into two separate
* registers, so filter out and remove the (new to gicv3)
* group modifier bit
*/
groupmod = (group >> 1) & 1;
group &= 1;
/*
* either set or clear the Group bit for the interrupt as appropriate
*/
if (group)
gicrSGI->GICR_IGROUPR0 |= 1 << id;
else
gicrSGI->GICR_IGROUPR0 &= ~(1 << id);
/*
* now deal with groupmod
*/
if (groupmod)
gicrSGI->GICR_IGRPMODR0 |= 1 << id;
else
gicrSGI->GICR_IGRPMODR0 &= ~(1 << id);
}
/*
* get each bit of group config duplicated over all 32 bits
*/
groupmod = (uint32_t)(((int32_t)group << (nbits - 1)) >> 31);
group = (uint32_t)(((int32_t)group << nbits) >> 31);
/*
* set the security state for the block of SGIs and PPIs
*/
gicrSGI->GICR_IGROUPR0 = group;
gicrSGI->GICR_IGRPMODR0 = groupmod;
}
/* EOF GICv3_gicr.c */
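GetGICR() matches the caller's affinity against GICR_TYPER[1], the upper word of GICR_TYPER, which packs Aff3, Aff2, Aff1 and Aff0 into bits [31:24], [23:16], [15:8] and [7:0]. MPIDR_EL1, as returned by GetMPIDR() in the AArch64 helper header, keeps Aff0 to Aff2 in bits [23:0] but Aff3 in bits [39:32], and also carries the MT, U and RES1 bits, so a value read from it needs repacking before the comparison. A minimal sketch, assuming the MPIDR_EL1 value is passed in as a parameter so that the code can run on a host; mpidr_to_gicr_affinity() and mpidr_aff0() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: repack MPIDR_EL1 affinity fields into the layout of
 * GICR_TYPER[63:32] (Aff3:Aff2:Aff1:Aff0). MT, U and bit 31 are dropped. */
static uint32_t mpidr_to_gicr_affinity(uint64_t mpidr)
{
    uint32_t aff3   = (uint32_t)((mpidr >> 32) & 0xffu); /* MPIDR_EL1[39:32] */
    uint32_t aff2_0 = (uint32_t)(mpidr & 0xffffffu);     /* MPIDR_EL1[23:0] */
    return (aff3 << 24) | aff2_0;
}

/* Hypothetical: Aff0 alone, MPIDR_EL1[7:0], as described for GetCPUID(). */
static uint32_t mpidr_aff0(uint64_t mpidr)
{
    return (uint32_t)(mpidr & 0xffu);
}
```

On the target, a call such as GetGICR(mpidr_to_gicr_affinity(GetMPIDR())) would then look up the Redistributor belonging to the calling core.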
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <string.h>
// You must implement this function. The register written here must
// match the one selected with -mtp=<el> during compilation.
// Declaring the function static and always_inline, with its body written
// as inline assembly, means each call compiles to a single MSR instruction
// instead of a full function call.
__attribute__((always_inline)) static inline void write_tp(void* tls_data)
{
__asm volatile("msr TPIDR_EL0, %0" : : "r"(tls_data) : "cc");
}
// Simple program that creates the TLS from the TLS template then prints
// out the values using TLS variables.
// This only works for local-exec TLS as all TLS variables are at a known
// offset from the thread pointer register.
int main(void) {
initialise_tls_from_mem(
// Start address of TLS RW data
(unsigned int*)&Image$$ER_TLS_RW$$Base,
// Number of bytes of TLS RW data
(size_t)&Image$$ER_TLS_RW$$Limit - (size_t)&Image$$ER_TLS_RW$$Base,
// Number of bytes of TLS ZI data
(size_t)&Image$$ER_TLS_ZI$$ZI$$Limit - (size_t)&Image$$ER_TLS_RW$$Limit
);
return 0;
}
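main() passes three values to initialise_tls_from_mem(), whose definition is not part of this excerpt: the start of the TLS RW template, its size (Image$$ER_TLS_RW$$Limit minus Image$$ER_TLS_RW$$Base) and the size of the zero-initialised part (Image$$ER_TLS_ZI$$ZI$$Limit minus Image$$ER_TLS_RW$$Limit). A minimal sketch of what such a routine has to do for one thread's block is to copy the initial values and zero the rest. The helper init_tls_block() is hypothetical, it assumes the caller supplies a suitably sized and aligned buffer, and it deliberately does not set the thread pointer, because the offset between the thread pointer and the TLS block is defined by the ABI rather than by this code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build one thread's TLS block from the image template.
 *   dst     - per-thread storage of at least rw_size + zi_size bytes,
 *             aligned as the TLS data requires (caller's responsibility)
 *   rw_init - initial values of the TLS RW data, for example the bytes
 *             starting at Image$$ER_TLS_RW$$Base
 *   rw_size - number of bytes of initialised TLS data
 *   zi_size - number of bytes of zero-initialised TLS data after it */
static void init_tls_block(void *dst, const void *rw_init,
                           size_t rw_size, size_t zi_size)
{
    assert(dst != NULL && rw_init != NULL);
    memcpy(dst, rw_init, rw_size);                /* copy initial values */
    memset((uint8_t *)dst + rw_size, 0, zi_size); /* clear the ZI part */
}
```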
/*
** Copyright (c) 2006-2014 Arm Limited (or its affiliates). All rights reserved.
** Use, modification and redistribution of this file is subject to your possession of a
** valid End User License Agreement for the Arm Product of which these examples are part of
** and your compliance with all applicable terms and conditions of such licence agreement.
*/
/*
** This file contains re-implementations of functions whose
** C library implementations rely on semihosting.
**
** Define USE_SERIAL_PORT to retarget the I/O only to the serial port.
** Otherwise, I/O is targeted to the debugger console using semihosting.
**
** Define STANDALONE to eliminate all use of semihosting-using functions too.
*/
#include <stdio.h>
#define TRUE 1
#define FALSE 0
/*
** Importing __use_no_semihosting ensures that our image doesn't link
** with any C Library code that makes direct use of semihosting.
**
** Build with STANDALONE to include this symbol.
*/
#ifdef STANDALONE
#define USE_SERIAL_PORT 1
asm(".global __use_no_semihosting");
#endif
/*
** Retargeted I/O
** ==============
/*
** These must be defined to avoid linking in stdio.o from the
** C Library
*/
struct __FILE { int handle; /* Add whatever you need here */};
FILE __stdout;
FILE __stdin;
/*
** __backspace must return the last char read to the stream, so
** fgetc() needs to keep a record of whether __backspace was
** called directly before it
*/
int last_char_read;
int backspace_called;
/*
** The effect of __backspace() should be to return the last character
** read from the stream, such that a subsequent fgetc() will
** return the same character again.
*/
#ifdef STANDALONE
/*
** Exception Signaling and Handling
** ================================
** The C library implementation of ferror() uses semihosting directly
** and must therefore be retargeted. This is a minimal reimplementation.
** _sys_exit() is called after the user's main() function has exited. The C library
** implementation uses semihosting to report to the debugger that the application has
** finished executing.
*/
#endif // STANDALONE
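The retargeting comments above describe a one-character pushback contract: after __backspace(), the next fgetc() returns the last character read once more. The following host-side sketch models that contract with two globals playing the roles of last_char_read and backspace_called, and a hypothetical string-backed byte source standing in for the serial port. It is not the retargeted fgetc() itself, whose body is not shown here; model_fgetc(), model_backspace(), read_port(), port_data and port_pos are hypothetical names, and the model omits the FILE * argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical byte source standing in for the serial port. */
static const char *port_data = "";
static size_t port_pos = 0;

static int read_port(void)
{
    if (port_data[port_pos] == '\0')
        return EOF; /* no more input */
    return (unsigned char)port_data[port_pos++];
}

/* Same roles as the globals in the retargeting code above. */
static int last_char_read;
static int backspace_called;

/* Model of fgetc(): after a backspace, return the previous character once. */
static int model_fgetc(void)
{
    if (backspace_called) {
        backspace_called = 0;
        return last_char_read;
    }
    last_char_read = read_port();
    return last_char_read;
}

/* Model of __backspace(): arrange for the last character to be replayed. */
static void model_backspace(void)
{
    backspace_called = 1;
}
```

Keeping the flag separate from the character means a backspace costs no extra read from the device, which matters when the source is a UART that cannot un-read data.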
// ------------------------------------------------------------
// SP804 Dual Timer
//
// Copyright (c) 2009-2017 Arm Limited (or its affiliates). All rights reserved.
// Use, modification and redistribution of this file is subject to your possession of a
// valid End User License Agreement for the Arm Product of which these examples are part of
// and your compliance with all applicable terms and conditions of such licence agreement.
// ------------------------------------------------------------
#include "sp804_timer.h"
struct sp804_timer
{
volatile uint32_t Time1Load; // +0x00
const volatile uint32_t Time1Value; // +0x04 - RO
volatile uint32_t Timer1Control; // +0x08
volatile uint32_t Timer1IntClr; // +0x0C - WO
const volatile uint32_t Timer1RIS; // +0x10 - RO
const volatile uint32_t Timer1MIS; // +0x14 - RO
volatile uint32_t Timer1BGLoad; // +0x18
};
// Instance of the dual timer, will be placed using the scatter file
struct sp804_timer* dual_timer;
dual_timer->Time1Load = load_value;
// Fixed setting: 32-bit counter, no prescaling, periodic mode
tmp = TIMER_SP804_CTRL_TIMERSIZE | TIMER_SP804_CTRL_PRESCALE_1 |
TIMER_SP804_CTRL_TIMERMODE;
return;
}
tmp = dual_timer->Timer1Control;
tmp = tmp | TIMER_SP804_CTRL_TIMEREN; // Set TimerEn (bit 7)
dual_timer->Timer1Control = tmp;
return;
}
tmp = dual_timer->Timer1Control;
tmp = tmp & ~TIMER_SP804_CTRL_TIMEREN; // Clear TimerEn (bit 7)
dual_timer->Timer1Control = tmp;
return;
}
{
return dual_timer->Time1Value;
}
void clearTimerIrq(void)
{
// A write to this register, of any value, clears the interrupt
dual_timer->Timer1IntClr = 1;
}
// ------------------------------------------------------------
// End of sp804_timer.c
// ------------------------------------------------------------
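The routines above that start and stop the timer use a read-modify-write of Timer1Control, so that only TimerEn (bit 7) changes and the size, mode and interrupt-enable settings written at initialisation are preserved. The following host-side sketch exercises the same pattern on an ordinary variable. The CTRL_* names are hypothetical stand-ins for the TIMER_SP804_CTRL_* macros from sp804_timer.h, whose definitions are not shown in this excerpt; the bit positions used (TimerSize bit 1, IntEnable bit 5, TimerMode bit 6, TimerEn bit 7) follow the SP804 control register layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the SP804 TimerXControl layout. */
#define CTRL_TIMERSIZE (1u << 1) /* 1 = 32-bit counter */
#define CTRL_INTENABLE (1u << 5) /* 1 = interrupt enabled */
#define CTRL_TIMERMODE (1u << 6) /* 1 = periodic mode */
#define CTRL_TIMEREN   (1u << 7) /* 1 = timer enabled */

/* Set only TimerEn, preserving every other control field. */
static void timer_enable(volatile uint32_t *ctrl)
{
    uint32_t tmp = *ctrl; /* read */
    tmp |= CTRL_TIMEREN;  /* modify */
    *ctrl = tmp;          /* write */
}

/* Clear only TimerEn, preserving every other control field. */
static void timer_disable(volatile uint32_t *ctrl)
{
    uint32_t tmp = *ctrl;
    tmp &= ~CTRL_TIMEREN;
    *ctrl = tmp;
}
```

On hardware, ctrl would point at the Timer1Control field of the scatter-placed dual_timer instance rather than at a local variable.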
// ------------------------------------------------------------
// SP804 Dual Timer
// Header File
//
// Copyright (c) 2009-2017 Arm Limited (or its affiliates). All rights reserved.
// Use, modification and redistribution of this file is subject to your possession of a
// valid End User License Agreement for the Arm Product of which these examples are part of
// and your compliance with all applicable terms and conditions of such licence agreement.
// ------------------------------------------------------------
#ifndef _SP804_TIMER_
#define _SP804_TIMER_
#include <stdint.h>
#endif
// ------------------------------------------------------------
// End of sp804_timer.h
// ------------------------------------------------------------
/* Copyright (c) 2016 Arm Limited (or its affiliates). All rights reserved. */
/* Use, modification and redistribution of this file is subject to your possession of a */
/* valid End User License Agreement for the Arm Product of which these examples are part of */
/* and your compliance with all applicable terms and conditions of such licence agreement. */
#include <stdio.h>
#include "GICv3.h"
#include "GICv3_gicc.h"
#include "sp804_timer.h"
if (state)
{
int max = (1 << 7);
value <<= 1;
if (value == max)
state = 0;
}
else
{
value >>= 1;
if (value == 1)
state = 1;
}
// --------------------------------------------------------
void irqHandler(void)
{
unsigned int ID;
ID = getICC_IAR1(); // readIntAck();
switch(ID)
{
case 34:
// Dual-Timer 0 (SP804)
printf("irqHandler() - External timer interrupt\n\n");
nudge_leds();
clearTimerIrq();
break;
default:
// Unexpected ID value
printf("irqHandler() - Unexpected INTID %d\n\n", ID);
break;
}
// --------------------------------------------------------
void fiqHandler(void)
{
unsigned int ID;
unsigned int aliased = 0;
ID = getICC_IAR0(); // readIntAck();
printf("fiqHandler() - Read %d from IAR0\n", ID);
switch(ID)
{
case 34:
// Dual-Timer 0 (SP804)
printf("fiqHandler() - External timer interrupt\n\n");
clearTimerIrq();
break;
default:
// Unexpected ID value
printf("fiqHandler() - Unexpected INTID %d\n\n", ID);
break;
}
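The LED update code shown before irqHandler(), which appears to be the body that irqHandler() reaches through nudge_leds() on each timer interrupt, walks a single lit bit from bit 0 up to bit 7 and back again, reversing direction at each end. The sketch below repackages that logic as a small state machine so that it can be stepped and checked on a host; led_bounce_t and led_bounce_step() are hypothetical names, and the state lives in a struct rather than in the example's own variables.

```c
#include <assert.h>

/* Hypothetical state for the LED "bounce" pattern. */
typedef struct {
    unsigned value; /* one-hot LED pattern, 1 to 128 */
    int moving_up;  /* 1 = shifting towards bit 7, 0 = towards bit 0 */
} led_bounce_t;

/* Advance the pattern by one position, reversing at bit 7 and bit 0. */
static void led_bounce_step(led_bounce_t *s)
{
    assert(s->value >= 1u && s->value <= (1u << 7));
    if (s->moving_up) {
        s->value <<= 1;
        if (s->value == (1u << 7))
            s->moving_up = 0;
    } else {
        s->value >>= 1;
        if (s->value == 1u)
            s->moving_up = 1;
    }
}
```

Starting from value 1 and moving up, the pattern takes 7 steps to reach bit 7 and another 7 to return, so it repeats every 14 timer interrupts.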
/*
* PL011 UART driver
*
* Copyright (c) 2005-2014 Arm Limited (or its affiliates). All rights reserved.
* Use, modification and redistribution of this file is subject to your possession of a
* valid End User License Agreement for the Arm Product of which these examples are part of
* and your compliance with all applicable terms and conditions of such licence agreement.
*/
#include <stdio.h>
#include "uart.h"
/*
* UART instance: will be placed using the scatter file
*/
static struct pl011_uart uart;
void UartInit(void)
{
/*
* ensure the UART is disabled
*/
uart.UARTCR = 0x0;
/*
* OK, now program this thing up
*/
uart.UARTECR = 0x0; // Clear the receive status (i.e. error) register
uart.UARTLCR_H = 0x0 | PL011_LCR_WORD_LENGTH_8 | PL011_LCR_FIFO_DISABLE | \
PL011_LCR_ONE_STOP_BIT | PL011_LCR_PARITY_DISABLE | PL011_LCR_BREAK_DISABLE;
uart.UARTIBRD = PL011_IBRD_DIV_38400;
uart.UARTFBRD = PL011_FBRD_DIV_38400;
/*
* mask and clear all interrupts
*/
uart.UARTIMSC = 0x0;
uart.UARTICR = PL011_ICR_CLR_ALL_IRQS;
return;
}
void uart_putc_polled(char c)
{
/* Wait for UART to become free */
/* Note that FIFOs are not being used here */
while (uart.UARTFR & PL011_FR_BUSY_FLAG);
char uart_getchar_polled(void)
{
/* Wait until a character has been received (UARTFR.RXFE, bit 4, clear) */
/* Note that FIFOs are not being used here */
while (uart.UARTFR & (1u << 4));
/* Read character received; UARTDR[7:0] holds the data */
return (char)(uart.UARTDR & 0xFF);
}
/*
* PL011 UART driver
*
* Copyright (c) 2005-2016 Arm Limited (or its affiliates). All rights reserved.
* Use, modification and redistribution of this file is subject to your possession of a
* valid End User License Agreement for the Arm Product of which these examples are part of
* and your compliance with all applicable terms and conditions of such licence agreement.
*/
#ifndef uart_h
#define uart_h
/*
* the layout of the UART device
*/
struct pl011_uart
{
volatile unsigned int UARTDR; // +0x00
volatile unsigned int UARTECR; // +0x04
const volatile unsigned int unused0[4]; // +0x08 to +0x14 reserved
const volatile unsigned int UARTFR; // +0x18 - RO
const volatile unsigned int unused1; // +0x1C reserved
volatile unsigned int UARTILPR; // +0x20
volatile unsigned int UARTIBRD; // +0x24
volatile unsigned int UARTFBRD; // +0x28
volatile unsigned int UARTLCR_H; // +0x2C
volatile unsigned int UARTCR; // +0x30
volatile unsigned int UARTIFLS; // +0x34
volatile unsigned int UARTIMSC; // +0x38
const volatile unsigned int UARTRIS; // +0x3C - RO
const volatile unsigned int UARTMIS; // +0x40 - RO
volatile unsigned int UARTICR; // +0x44 - WO
volatile unsigned int UARTDMACR; // +0x48
};
/*
* defines for control/status registers
*/
#define PL011_LCR_WORD_LENGTH_8 (0x60)
#define PL011_LCR_WORD_LENGTH_7 (0x40)
#define PL011_LCR_WORD_LENGTH_6 (0x20)
#define PL011_LCR_WORD_LENGTH_5 (0x00)
#define PL011_LCR_FIFO_ENABLE (0x10)
#define PL011_LCR_FIFO_DISABLE (0x00)
void UartInit(void);
void uart_putc_polled(char c);
char uart_getchar_polled(void);
#endif
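UartInit() writes the precomputed divisor macros PL011_IBRD_DIV_38400 and PL011_FBRD_DIV_38400, whose values are not shown in this excerpt. On the PL011 the baud-rate divisor is UARTCLK / (16 x baud rate): UARTIBRD takes the integer part, and UARTFBRD takes the fractional part multiplied by 64 and rounded. The sketch computes both in integer arithmetic; pl011_divisors() is a hypothetical helper, and the 24 MHz reference clock in the checks is an assumption for illustration, not the clock of any particular platform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: PL011 integer and fractional baud-rate divisors. */
static void pl011_divisors(uint32_t uartclk_hz, uint32_t baud,
                           uint32_t *ibrd, uint32_t *fbrd)
{
    assert(baud != 0u);
    /* divisor * 64 = 4 * clk / baud; compute 8 * clk / baud, add 1 and
     * halve to round to the nearest 1/64. */
    uint64_t div64 = ((8ull * uartclk_hz) / baud + 1u) / 2u;
    *ibrd = (uint32_t)(div64 >> 6);    /* integer part -> UARTIBRD */
    *fbrd = (uint32_t)(div64 & 0x3fu); /* 6-bit fraction -> UARTFBRD */
}
```

With a 24 MHz clock, 38400 baud gives a divisor of exactly 39.0625, that is IBRD 39 and FBRD 4.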
/*
*
* Armv8-A AArch64 common helper functions
*
* Copyright (c) 2012-2016 Arm Limited (or its affiliates). All rights reserved.
* Use, modification and redistribution of this file is subject to your possession of a
* valid End User License Agreement for the Arm Product of which these examples are part of
* and your compliance with all applicable terms and conditions of such licence agreement.
*/
#ifndef V8_AARCH64_H
#define V8_AARCH64_H
/*
* Parameters for data barriers
*/
#define OSHLD 1
#define OSHST 2
#define OSH 3
#define NSHLD 5
#define NSHST 6
#define NSH 7
#define ISHLD 9
#define ISHST 10
#define ISH 11
#define LD 13
#define ST 14
#define SY 15
/**********************************************************************/
/*
* function prototypes
*/
/*
* void InvalidateUDCaches(void)
* invalidates all Unified and Data Caches
*
* Inputs
* <none>
*
* Returns
* <nothing>
*
* Side Effects
* guarantees that all levels of cache will be invalidated before
* returning to caller
*/
void InvalidateUDCaches(void);
/*
* unsigned long long EnableCachesEL1(void)
* enables I- and D- caches at EL1
*
* Inputs
* <none>
*
* Returns
* New value of SCTLR_EL1
*
* Side Effects
* context will be synchronised before returning to caller
*/
unsigned long long EnableCachesEL1(void);
/*
* unsigned long long GetMIDR(void)
* returns the contents of MIDR_EL1
*
* Inputs
* <none>
*
* Returns
* MIDR_EL1
*/
unsigned long long GetMIDR(void);
/*
* unsigned long long GetMPIDR(void)
* returns the contents of MPIDR_EL1
*
* Inputs
* <none>
*
* Returns
* MPIDR_EL1
*/
unsigned long long GetMPIDR(void);
/*
* unsigned int GetCPUID(void)
* returns the Aff0 field of MPIDR_EL1
*
* Inputs
* <none>
*
* Returns
* MPIDR_EL1[7:0]
*/
unsigned int GetCPUID(void);
#endif
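The data-barrier parameters at the top of this header are the 4-bit option encodings of the DMB and DSB instructions: bits [3:2] select the shareability domain (outer shareable 0, non-shareable 1, inner shareable 2, full system 3) and bits [1:0] select the access types (loads 1, stores 2, all 3). The sketch rebuilds the constants from that split so that the table can be checked; the enum and function names are hypothetical. With armclang, a value like this can be passed to the ACLE barrier intrinsics __dmb() and __dsb() declared in arm_acle.h, a target-specific step that is not exercised here.

```c
#include <assert.h>

/* Hypothetical names for the two halves of the barrier option. */
enum barrier_domain { DOM_OUTER = 0, DOM_NONSHARE = 1, DOM_INNER = 2, DOM_SYSTEM = 3 };
enum barrier_types  { TYPES_LOADS = 1, TYPES_STORES = 2, TYPES_ALL = 3 };

/* Combine a shareability domain and access types into the 4-bit option. */
static unsigned barrier_option(enum barrier_domain d, enum barrier_types t)
{
    assert((unsigned)d <= 3u && (unsigned)t >= 1u && (unsigned)t <= 3u);
    return ((unsigned)d << 2) | (unsigned)t;
}
```

For example, the inner-shareable store barrier gives (2 << 2) | 2 = 10, matching ISHST, and the full-system barrier on all accesses gives 15, matching SY.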
Overview of the Linker
The command-line option descriptions and related information in the Arm Compiler
for Embedded Reference Guide describe all the features that Arm Compiler for
Embedded supports. Any features not documented are not supported and are used
at your own risk. You are responsible for making sure that any generated code using
community features is operating correctly. For more information, see Support level
definitions.
Syntax
armlink <options> <input-file-list>
where:
<options>
armlink command-line options.
<input-file-list>
A space-separated list of objects, libraries, or symbol definitions (symdefs) files.
Related information
input-file-list linker option
Linker Command-line Options
Related information
Elimination of unused sections
The structure of an Arm ELF image
Object files must be formatted as Arm® ELF. This format is described in:
• ELF for the Arm Architecture (IHI 0044).
• ELF for the Arm 64-bit Architecture (AArch64) (IHI 0056).
Related information
Overview of the Arm Librarian on page 399
Security features supported in Arm Compiler for Embedded on page 274
--import_cmse_lib_in=filename
Access symbols in another image
Scatter-loading Features
Scatter File Syntax
Linker Steering File Command Reference
ELF for the Arm Architecture
ELF for the Arm 64-bit Architecture (AArch64)
You can also use fromelf to convert an ELF executable image to other file formats,
or to display, process, and protect the content of an ELF executable image.
Related information
Security features supported in Arm Compiler for Embedded on page 274
Overview of the fromelf Image Converter on page 389
Partial linking model
Section placement with the linker
The structure of an Arm ELF image
--import_cmse_lib_out=filename
Getting Image Details
Procedure
To identify the source of some link errors, use --info inputs.
For example, you can search the output to locate undefined references from library objects or
multiply defined symbols caused by retargeting some library functions and not others. Search
backwards from the end of this output to find and resolve link errors.
You can also use the --verbose option to output similar text with additional information on the
linker operations.
Related information
Getting Image Details on page 379
--info=topic[,topic,…] (armlink)
--verbose (armlink)
Here, sizes gives a list of the Code and data sizes for each input object and library member in the
image. Using this option implies --info sizes,totals.
The following example shows the output in tabular format with the totals separated out for easy
reading:
Code (inc. data) RO Data RW Data ZI Data Debug Object Name
30 16 0 0 0 0 foo.o
56 10 960 0 1024 372 startup_ARMCM7.o
----------------------------------------------------------------------
88 26 992 0 5120 372 Object Totals
0 0 32 0 4096 0 (incl. Generated)
2 0 0 0 0 0 (incl. Padding)
----------------------------------------------------------------------
Code (inc. data) RO Data RW Data ZI Data Debug Library Member Name
8 0 0 0 0 68 __main.o
0 0 0 0 0 0 __rtentry.o
12 0 0 0 0 0 __rtentry2.o
8 4 0 0 0 0 __rtentry5.o
52 8 0 0 0 0 __scatter.o
26 0 0 0 0 0 __scatter_copy.o
28 0 0 0 0 0 __scatter_zi.o
10 0 0 0 0 68 defsig_exit.o
50 0 0 0 0 88 defsig_general.o
80 58 0 0 0 76 defsig_rtmem_inner.o
14 0 0 0 0 80 defsig_rtmem_outer.o
52 38 0 0 0 76 defsig_rtred_inner.o
14 0 0 0 0 80 defsig_rtred_outer.o
18 0 0 0 0 80 exit.o
76 0 0 0 0 88 fclose.o
470 0 0 0 0 88 flsbuf.o
236 4 0 0 0 128 fopen.o
26 0 0 0 0 68 fputc.o
248 6 0 0 0 84 fseek.o
66 0 0 0 0 76 ftell.o
94 0 0 0 0 80 h1_alloc.o
52 0 0 0 0 68 h1_extend.o
78 0 0 0 0 80 h1_free.o
14 0 0 0 0 84 h1_init.o
80 6 0 4 0 96 heapauxa.o
4 0 0 0 0 136 hguard.o
0 0 0 0 0 0 indicate_semi.o
138 0 0 0 0 168 init_alloc.o
312 46 0 0 0 112 initio.o
2 0 0 0 0 0 libinit.o
6 0 0 0 0 0 libinit2.o
16 8 0 0 0 0 libinit4.o
2 0 0 0 0 0 libshutdown.o
6 0 0 0 0 0 libshutdown2.o
0 0 0 0 96 0 libspace.o
0 0 0 0 0 0 maybetermalloc1.o
44 4 0 0 0 84 puts.o
8 4 0 0 0 68 rt_errno_addr_intlibspace.o
8 4 0 0 0 68 rt_heap_descriptor_intlibspace.o
78 0 0 0 0 80 rt_memclr_w.o
2 0 0 0 0 0 rtexit.o
10 0 0 0 0 0 rtexit2.o
70 0 0 0 0 80 setvbuf.o
240 6 0 0 0 156 stdio.o
0 0 0 12 252 0 stdio_streams.o
62 0 0 0 0 76 strlen.o
12 4 0 0 0 68 sys_exit.o
102 0 0 0 0 240 sys_io.o
0 0 12 0 0 0 sys_io_names.o
14 0 0 0 0 76 sys_wrch.o
2 0 0 0 0 68 use_no_semi.o
----------------------------------------------------------------------
2962 200 14 16 352 3036 Library Totals
12 0 2 0 4 0 (incl. Padding)
----------------------------------------------------------------------
----------------------------------------------------------------------
2962 200 14 16 352 3036 Library Totals
----------------------------------------------------------------------
==============================================================================
==============================================================================
==============================================================================
In this example:
Code (inc. data)
The number of bytes occupied by the code. In this image, there are 3050 bytes of code.
This value includes 226 bytes of inline data (inc. data), for example, literal pools, and short
strings.
RO Data
The number of bytes occupied by the RO data. This value is in addition to the inline data
included in the Code (inc. data) column.
RW Data
The number of bytes occupied by the RW data.
ZI Data
The number of bytes occupied by the ZI data.
Debug
The number of bytes occupied by the debug data, for example, debug input sections and the
symbol and string tables.
Object Totals
The number of bytes occupied by the objects when linked together to generate the image.
(incl. Generated)
armlink might generate image contents, for example, interworking veneers, and input
sections such as region tables. If the Object Totals row includes this type of data, it is
shown in this row.
Combined across all of the object files (foo.o and startup_ARMCM7.o), the example shows
that there are 992 bytes of RO data, of which 32 bytes are linker-generated RO data.
If the scatter file contains EMPTY regions, the linker might generate ZI data. In
the example, the 4096 bytes of ZI data labeled (incl. Generated) correspond
to an ARM_LIB_STACKHEAP execution region used to set up the stack and heap
in a scatter file as follows:
Library Totals
The number of bytes occupied by the library members that have been extracted and added
to the image as individual objects.
(incl. Padding)
If necessary, armlink inserts padding to force section alignment. If the Object Totals row
includes this type of data, it is shown in the associated (incl. Padding) row. Similarly, if the
Library Totals row includes this type of data, it is shown in its associated row.
In the example, there are 992 bytes of RO data in the object total, of which 0 bytes is linker-
generated padding, and 14 bytes of RO data in the library total, with 2 bytes of padding.
Grand Totals
Shows the true size of the image. In the example, there are 5120 bytes of ZI data (in Object
Totals) and 352 bytes of ZI data (in Library Totals), giving a total of 5472 bytes.
In the example, RW data compression is not enabled. If data is compressed, the RW value
changes.
ROM Totals
Shows the minimum size of ROM required to contain the image. This size does not include ZI
data or debug information, because neither is stored in the ROM.
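The totals described above can be checked with a few lines of arithmetic. The following sketch, which uses the hypothetical names size_row, grand_totals(), and rom_bytes(), adds the Object Totals and Library Totals rows column by column and computes the ROM size as Code + RO Data + RW Data. It assumes that RW data is not compressed, as in this example.

```c
/* One row of the armlink --info sizes output; all values are in bytes. */
struct size_row {
    unsigned long code;     /* Code column, which already includes inline data */
    unsigned long inc_data; /* (inc. data): literal pools, short strings */
    unsigned long ro_data;  /* RO Data, in addition to the inline data */
    unsigned long rw_data;  /* RW Data */
    unsigned long zi_data;  /* ZI Data */
    unsigned long debug;    /* Debug */
};

/* Grand Totals: Object Totals plus Library Totals, column by column. */
struct size_row grand_totals(struct size_row obj, struct size_row lib)
{
    struct size_row t;
    t.code = obj.code + lib.code;
    t.inc_data = obj.inc_data + lib.inc_data;
    t.ro_data = obj.ro_data + lib.ro_data;
    t.rw_data = obj.rw_data + lib.rw_data;
    t.zi_data = obj.zi_data + lib.zi_data;
    t.debug = obj.debug + lib.debug;
    return t;
}

/*
 * ROM size: ZI data and debug data are not stored in ROM, so only the
 * code, the RO data, and the (uncompressed) RW data initial values count.
 */
unsigned long rom_bytes(struct size_row t)
{
    return t.code + t.ro_data + t.rw_data;
}
```

With the values from the example, this arithmetic gives 3050 bytes of code (226 of them inline data), 1006 bytes of RO data, 16 bytes of RW data, and 5472 bytes of ZI data, for a ROM size of 3050 + 1006 + 16 = 4072 bytes.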
Related information
Getting Image Details on page 379
--info=topic[,topic,…] (armlink)
Procedure
1. Create the file s.c containing the following source code:
long long array[10] __attribute__ ((section ("ARRAY")));
int main(void)
{
return sizeof(array);
}
...
Execution Region ER_RW (Base: 0x00008360, Size: 0x00000050, Max: 0xffffffff, ABSOLUTE)
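The Size value of 0x00000050 (80 bytes) reported for ER_RW is consistent with the size of array: 10 elements of long long, which the AAPCS and AAPCS64 define as 8 bytes each. The following sketch computes the same value. It omits the section attribute because the attribute does not affect the size, and the function name array_bytes() is hypothetical.

```c
#include <stddef.h>

/* Same declaration as in s.c, without the section("ARRAY") attribute. */
long long array[10];

/* Bytes occupied by array: 10 elements * 8 bytes = 80 (0x50). */
size_t array_bytes(void)
{
    return sizeof(array);
}
```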
Related information
Using fromelf to find where a symbol is placed in an executable ELF image on page 396
--keep=section_id (armlink)
--map --no_map (armlink)
-o filename --output=filename (armlink)
-c compiler option
-march compiler option
-o compiler option
--target compiler option
SysV Dynamic Linking
Procedure
1. Create the file lib.c containing the following code:
__attribute__((visibility("default")))
int lib_func(int a)
{
return 5 * a;
}
3. Run fromelf with the --only option to see that the function lib_func() has the visibility set to
default and is present in the dynamic symbol table:
fromelf -s --only=.dynsym lib.so
...
** Section #2 '.dynsym' (SHT_DYNSYM) [SHF_ALLOC]
Size : 32 bytes (alignment 4)
Address: 0x00000110
String table #3 '.dynstr'
Last local symbol no. 0
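The .dynsym values in this output can be decoded by hand. Each ELF32 symbol table entry is 16 bytes, so the 32-byte section holds two entries: the reserved null entry at index 0 and, presumably, lib_func. A symbol's visibility is stored in the low two bits of its st_other field, and visibility("default") corresponds to STV_DEFAULT (0). The following sketch redeclares the entry layout locally instead of including <elf.h>, which is not available on every host; the helper names are hypothetical.

```c
#include <stdint.h>

/* Layout of an ELF32 symbol table entry (Elf32_Sym in the ELF specification). */
struct elf32_sym {
    uint32_t st_name;  /* offset of the name in the string table (.dynstr) */
    uint32_t st_value; /* symbol address */
    uint32_t st_size;  /* symbol size in bytes */
    uint8_t  st_info;  /* binding in the high 4 bits, type in the low 4 bits */
    uint8_t  st_other; /* visibility in the low 2 bits */
    uint16_t st_shndx; /* index of the section that defines the symbol */
};

/* Symbol visibility values from the generic ELF specification. */
enum { STV_DEFAULT = 0, STV_INTERNAL = 1, STV_HIDDEN = 2, STV_PROTECTED = 3 };

/* Number of entries in a symbol table section of the given size. */
unsigned symtab_entries(uint32_t section_size)
{
    return section_size / (uint32_t)sizeof(struct elf32_sym);
}

/* Extracts the visibility from st_other, as ELF32_ST_VISIBILITY does. */
unsigned symbol_visibility(uint8_t st_other)
{
    return st_other & 0x3u;
}
```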
Build the image and then run fromelf to examine the contents.
Procedure
1. Create the file app.c containing the following code:
#include <stdio.h>
int lib_func(int a);    /* defined in lib.so */
int main(void)
{
    printf("Result: %d.\n", lib_func(3));
    return 0;
}
0 DT_NEEDED 1 (lib.so)
1 DT_HASH 33100 (0x0000814c)
2 DT_STRTAB 33156 (0x00008184)
3 DT_SYMTAB 33124 (0x00008164)
4 DT_STRSZ 17
5 DT_SYMENT 16
6 DT_PLTRELSZ 8
7 DT_PLTGOT 77124 (0x00012d44)
8 DT_DEBUG 0 (0x00000000)
9 DT_JMPREL 33176 (0x00008198)
10 DT_PLTREL 17 (DT_REL)
11 DT_NULL 0
...
When executed, a platform-specific dynamic loader processes information in the dynamic array,
loads lib.so, resolves relocations in all loaded files, and passes control to the main executable.
The program then outputs:
Result: 15.
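Some dynamic array values are themselves tag numbers or sizes. DT_PLTREL holds 17, which is the tag number of DT_REL, so the PLT relocations use Elf32_Rel entries of 8 bytes each. With DT_PLTRELSZ equal to 8, there is therefore a single PLT relocation, presumably for the call to lib_func(). Similarly, DT_SYMENT is 16, the size of an ELF32 symbol table entry. The following sketch, with hypothetical function names, maps a subset of the tag numbers from the generic ELF specification to names and counts PLT relocations.

```c
#include <stddef.h>
#include <string.h>

/* A subset of d_tag values from the generic ELF specification. */
struct dt_entry {
    long tag;
    const char *name;
};

static const struct dt_entry dt_table[] = {
    { 0, "DT_NULL" },    { 1, "DT_NEEDED" }, { 2, "DT_PLTRELSZ" },
    { 3, "DT_PLTGOT" },  { 4, "DT_HASH" },   { 5, "DT_STRTAB" },
    { 6, "DT_SYMTAB" },  { 7, "DT_RELA" },   { 10, "DT_STRSZ" },
    { 11, "DT_SYMENT" }, { 17, "DT_REL" },   { 20, "DT_PLTREL" },
    { 21, "DT_DEBUG" },  { 23, "DT_JMPREL" },
};

#define DT_TABLE_LEN (sizeof dt_table / sizeof dt_table[0])

/* Returns the name of a tag, or NULL if it is not in the table above. */
const char *dt_name(long tag)
{
    for (size_t i = 0; i < DT_TABLE_LEN; i++)
        if (dt_table[i].tag == tag)
            return dt_table[i].name;
    return NULL;
}

/* Returns the tag number for a name, or -1 if it is not in the table. */
long dt_tag(const char *name)
{
    for (size_t i = 0; i < DT_TABLE_LEN; i++)
        if (strcmp(dt_table[i].name, name) == 0)
            return dt_table[i].tag;
    return -1;
}

/*
 * Number of PLT relocations described by DT_PLTRELSZ. The DT_PLTREL value
 * gives the entry type: DT_REL (17) entries are Elf32_Rel, 8 bytes each;
 * DT_RELA (7) entries are Elf32_Rela, 12 bytes each.
 */
unsigned long plt_reloc_count(unsigned long pltrelsz, long pltrel)
{
    if (pltrel == dt_tag("DT_REL"))
        return pltrelsz / 8;
    if (pltrel == dt_tag("DT_RELA"))
        return pltrelsz / 12;
    return 0;
}
```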
Overview of the fromelf Image Converter
armasm does not support features of Arm®v8.4-A and later architectures, even
those back-ported to Armv8.2-A and Armv8.3-A.
The command-line option descriptions and related information in the Arm Compiler
for Embedded Reference Guide describe all the features that Arm Compiler for
Embedded supports. Any features not documented are not supported and are used
at your own risk. You are responsible for making sure that any generated code using
community features is operating correctly. For more information, see Support level
definitions.
Related information
--bin (fromelf)
--elf (fromelf)
--i32 (fromelf)
--m32 (fromelf)
--text (fromelf)
--vhx (fromelf)
Procedure
To display the help information, enter:
fromelf --help
Related information
fromelf command-line syntax on page 390
--help (fromelf)
Syntax
fromelf <options> <input_file>
<options>
fromelf command-line options.
<input_file>
The ELF file or library file to be processed. When some options are used, multiple input files
can be specified.
Related information
fromelf Command-line Options
input_file (fromelf)
Using fromelf
Related information
--base [[object_file::]load_region_ID=]num (fromelf)
input_file (fromelf)
Examples
Consider an archive, test.a, containing the following ELF files:
bmw.o
bmw1.o
call_c_code.o
newtst.o
shapes.o
strmtst.o
The example also creates an output archive with the name test.a in the subdirectory strip_all.
The example also creates an output archive with the name test.a in the subdirectory
subset. The archive contains the processed files together with the remaining files that are
unprocessed.
To process the bmw.o, bmw1.o, and newtst.o files in the archive, enter:
On Unix systems your shell typically requires the parentheses to be escaped with
backslashes. Alternatively, enclose the complete section specifier in double quotes,
for example:
--entry="8+startup.o(startupseg)"
Related information
--disassemble (fromelf)
--elf (fromelf)
input_file (fromelf)
--output=destination (fromelf)
--strip=option[,option,…] (fromelf)
To help you to protect this code, fromelf provides the --strip option and the --privacy option.
These options remove or obscure the symbol names in the image. The option that you choose
depends on how much information you want to remove. These options affect image files differently
from object files.
Restrictions
You must use --elf with these options. Because you have to use --elf, you must also use --output.
Effect of the --privacy and --strip options for protecting code in image files
Option Effect
fromelf --elf --privacy Removes the whole symbol table.
Example
To produce a new ELF executable image with the complete symbol table removed and with the
various section names changed, enter:
Related information
Options to protect code in object files with fromelf on page 394
fromelf command-line syntax on page 390
--elf (fromelf)
--output=destination (fromelf)
--privacy (fromelf)
--strip=option[,option,…] (fromelf)
To help you to protect this code, fromelf provides the --strip option and the --privacy option.
These options remove or obscure the symbol names in the object. The option you choose depends
on how much information you want to remove. These options affect object files differently from
image files.
Restrictions
You must use --elf with these options. Because you have to use --elf, you must also use --output.
Effect of the --privacy and --strip options for protecting code in object files
Option | Local symbols | Section names | Mapping symbols | Build attributes
fromelf --elf --privacy | Removes those local symbols that can be removed without loss of functionality. | Gives section names a default value. For example, changes code section names to '.text'. | Present | Present
Example
To produce a new ELF object with the complete symbol table removed and various section names
changed, enter:
Related information
Options to protect code in image files with fromelf on page 393
fromelf command-line syntax on page 390
--elf (fromelf)
--output=destination (fromelf)
--privacy (fromelf)
--strip=option[,option,…] (fromelf)
You can specify some of the --emit options using the --text option.
Examples
To print the contents of the data sections of an ELF file, infile.axf, enter:
To print relocation information and the dynamic section contents for the ELF file infile2.axf,
enter:
Related information
fromelf command-line syntax on page 390
--emit=option[,option,…] (fromelf)
--text (fromelf)
The symbol table identifies the section where the symbol is placed.
Procedure
1. Create the file s.c containing the following source code:
long long arr[10] __attribute__ ((section ("ARRAY")));
int main()
{
return sizeof(arr);
}
** Section #24
Name : .symtab
Type : SHT_SYMTAB (0x00000002)
Flags : None (0x00000000)
Addr : 0x00000000
File Offset : 868 (0x364)
Size : 464 bytes (0x1d0)
Link : Section 1 (.strtab)
Info : Last local symbol no = 26
Alignment : 4
Entry Size : 16
The Sec column shows the section where the symbol is placed. In this example, it is section 5.
6. Locate the section identified for the symbol in the fromelf output, for example:
...
====================================
** Section #5
Name : ARRAY
Type : SHT_PROGBITS (0x00000001)
Flags : SHF_ALLOC + SHF_WRITE (0x00000003)
Addr : 0x00000000
File Offset : 88 (0x58)
Size : 80 bytes (0x50)
Link : SHN_UNDEF
Info : 0
Alignment : 8
Entry Size : 0
====================================
...
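The Flags and Entry Size fields in this output are plain numeric encodings. In the generic ELF specification, SHF_WRITE is 0x1 and SHF_ALLOC is 0x2, so the value 0x00000003 decodes to SHF_ALLOC + SHF_WRITE. Likewise, the 464-byte .symtab section with 16-byte entries holds 29 symbols. The following sketch, with hypothetical function names, performs both decodings. Its text is similar to, but not guaranteed to match, the fromelf output.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Section flag bits from the generic ELF specification (subset). */
#define SHF_WRITE     0x1u
#define SHF_ALLOC     0x2u
#define SHF_EXECINSTR 0x4u

/*
 * Writes a description such as "SHF_ALLOC + SHF_WRITE" for an sh_flags
 * value into buf, or "None" if no known bit is set. Requires len > 0.
 */
void describe_sh_flags(uint32_t flags, char *buf, size_t len)
{
    static const struct { uint32_t bit; const char *name; } known[] = {
        { SHF_ALLOC, "SHF_ALLOC" },
        { SHF_WRITE, "SHF_WRITE" },
        { SHF_EXECINSTR, "SHF_EXECINSTR" },
    };
    size_t used = 0;

    buf[0] = '\0';
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (flags & known[i].bit) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             used ? " + " : "", known[i].name);
            if (n < 0 || (size_t)n >= len - used)
                return; /* output truncated */
            used += (size_t)n;
        }
    }
    if (used == 0)
        snprintf(buf, len, "None");
}

/* Number of fixed-size entries in a section, for example a symbol table. */
uint32_t section_entries(uint32_t size, uint32_t entsize)
{
    return entsize ? size / entsize : 0;
}
```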
Related information
--text (fromelf)
Overview of the Arm Librarian
You can pass these libraries to the linker in place of several ELF object files.
A timestamp is also associated with each file that is added or replaced in a library.
When you create, add, or replace object files in a library, armar creates a symbol
table by default. However, debug symbols are not included by default.
The linker recognizes a collection of ELF files stored in an ar format file as a library. The
contents of each ELF file form a single member in the library.
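The following sketch shows how a program might recognize an ar format library and read the timestamp and size of its first member. It assumes the common System V/GNU ar layout: an 8-byte "!<arch>\n" magic string followed by 60-byte member headers whose fields are space-padded ASCII text. It is not armar's implementation, and the function names are hypothetical. Because armar creates a symbol table by default, the first member of a real library might be the symbol table rather than an ELF object.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define AR_MAGIC     "!<arch>\n" /* global header at the start of the file */
#define AR_MAGIC_LEN 8
#define AR_HDR_LEN   60          /* size of each member header */

/* Field offsets and widths within a member header. */
#define AR_DATE_OFF  16          /* modification time, decimal seconds */
#define AR_DATE_LEN  12
#define AR_SIZE_OFF  48          /* member size in bytes, decimal */
#define AR_SIZE_LEN  10
#define AR_FMAG_OFF  58          /* header terminator "`\n" */

/* Returns nonzero if buf (len bytes) starts with the ar magic string. */
int is_ar_archive(const char *buf, size_t len)
{
    return len >= AR_MAGIC_LEN && memcmp(buf, AR_MAGIC, AR_MAGIC_LEN) == 0;
}

/* Parses a space-padded decimal field that is not NUL-terminated. */
static long ar_decimal(const char *field, size_t width)
{
    char tmp[16];
    if (width >= sizeof tmp)
        return -1;
    memcpy(tmp, field, width);
    tmp[width] = '\0';
    return strtol(tmp, NULL, 10);
}

/* Reads the timestamp and size of the first member; returns 0 on success. */
int ar_first_member(const char *buf, size_t len, long *date, long *size)
{
    const char *hdr;
    if (!is_ar_archive(buf, len) || len < AR_MAGIC_LEN + AR_HDR_LEN)
        return -1;
    hdr = buf + AR_MAGIC_LEN;
    if (memcmp(hdr + AR_FMAG_OFF, "`\n", 2) != 0)
        return -1;
    *date = ar_decimal(hdr + AR_DATE_OFF, AR_DATE_LEN);
    *size = ar_decimal(hdr + AR_SIZE_OFF, AR_SIZE_LEN);
    return 0;
}
```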
However, you can define the symbol in an object in a user library and specify that library object
explicitly on the armlink command line.
Two particular symbol names affected by this behavior are __user_perthread_libspace and
_mutex_initialize.
Table 19-1: Effect of armlink commands for symbol definitions that affect library behavior
armlink commands | Description | Symbol accessible?
armlink file1.o file2.o libspace.o | libspace.o is explicitly specified on the command line. | Yes
armlink file1.o file2.o utils.a | The linker might not load the object that defines __user_perthread_libspace before the Arm C library. | No
armlink file1.o file2.o "utils.a(libspace.o)" | libspace.o is explicit even though it is in a specified library. | Yes
Related information
--remove, --no_remove (armlink)
Syntax
armar <options> <archive> [<file_list>]
<options>
armar command-line options.
<archive>
The filename of the library. A library file must always be specified.
<file_list>
The list of files to be processed.
Related information
armar Command-line Options
archive (armar)
file_list (armar)
This is the default if you do not specify any options or source files.
Example
To display the help information, enter:
armar --help
Overview of the armasm Legacy Assembler
Because armasm is deprecated, some newer architectural features are not supported.
Supported features
armasm supports the following:
• Unified Assembly Language (UAL) for both A32 and T32 code.
• Assembly language for A64 code.
• Advanced SIMD instructions in A64, A32, and T32 code.
• Floating-point instructions in A64, A32, and T32 code.
• Directives in assembly source code.
• Processing of user-defined macros.
• SDOT and UDOT instructions that are an optional extension in Arm®v8.2-A and Armv8.3-A.
armasm is a two-pass assembler. This is because assembly language source code often contains
forward references. A forward reference occurs when a label is used as an operand, for example as a
branch target, earlier in the code than the definition of the label. The assembler cannot know the
address of a forward-referenced label until it reads the definition of the label.
During each pass, the assembler performs different functions. In the first pass, the assembler:
• Checks the syntax of the instruction or directive. It faults if there is an error in the syntax, for
example if a label is specified on a directive that does not accept one.
• Determines the size of the instruction and data being assembled and reserves space.
• Determines offsets of labels within sections.
• Creates a symbol table containing label definitions and their memory addresses.
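Memory addresses of labels are determined and finalized in the first pass. Therefore, the assembly
code must not change during the second pass: all instructions must be seen in both passes. In
particular, you must not define a symbol after a :DEF: test for the symbol. The assembler faults if it
sees code in pass 2 that was not seen in pass 1.
For example, in the following code, :DEF: foo is false in pass 1 because foo is defined later in the
file, so num EQU 42 is skipped in pass 1 but processed in pass 2, and the assembler faults: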
        AREA x,CODE
        [ :DEF: foo
num     EQU 42
        ]
foo     DCD num
        END
Similarly, in the following code, :LNOT: :DEF: foo is true in pass 1 but false in pass 2. MOV r1, r2
is assembled in pass 1 but not in pass 2, so the address of foo differs between the passes and the
assembler reports an error:
        AREA x,CODE
        [ :LNOT: :DEF: foo
        MOV r1, r2
        ]
foo     MOV r3, r4
        END
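The first-pass and second-pass responsibilities described above can be illustrated with a toy model. In the following sketch, which is not armasm's implementation and uses hypothetical names, pass 1 walks the code, reserves space for each instruction, and records label offsets in a symbol table; pass 2 then resolves label operands, including forward references. The model assumes, as armasm requires, that the same instructions are seen in both passes, so the offsets recorded in pass 1 remain valid.

```c
#include <stddef.h>
#include <string.h>

/* A toy instruction: optional label, fixed size, optional label operand. */
struct insn {
    const char *label;  /* label defined on this line, or NULL */
    unsigned size;      /* bytes reserved for this instruction */
    const char *target; /* label used as an operand, or NULL */
};

struct sym {
    const char *name;
    unsigned addr;
};

/* Pass 1: reserve space and record each label's offset.
 * syms must have room for at least n entries. Returns the symbol count. */
size_t pass1(const struct insn *code, size_t n, struct sym *syms)
{
    size_t nsyms = 0;
    unsigned pc = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].label) {
            syms[nsyms].name = code[i].label;
            syms[nsyms].addr = pc;
            nsyms++;
        }
        pc += code[i].size;
    }
    return nsyms;
}

static long lookup(const struct sym *syms, size_t nsyms, const char *name)
{
    for (size_t i = 0; i < nsyms; i++)
        if (strcmp(syms[i].name, name) == 0)
            return (long)syms[i].addr;
    return -1; /* undefined symbol */
}

/* Pass 2: all label addresses are known, so forward references resolve.
 * out[i] receives the operand address, or -1 if there is none or it is
 * undefined. Returns the number of undefined references. */
int pass2(const struct insn *code, size_t n,
          const struct sym *syms, size_t nsyms, long *out)
{
    int errors = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = -1;
        if (code[i].target) {
            out[i] = lookup(syms, nsyms, code[i].target);
            if (out[i] < 0)
                errors++;
        }
    }
    return errors;
}
```

For a three-instruction program in which the first instruction branches to a label done defined on the third, pass 1 records done at offset 8, and pass 2 resolves the forward reference to 8.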
Related information
Directives that can be omitted in pass 2 of the assembler
Two pass assembler diagnostics
Instruction and directive relocations
--diag_error=tag[,tag,…]
--debug
Supporting reference information
Arm welcomes feedback regarding the use of all Arm Compiler for Embedded 6 features, and
intends to support users to a level that is appropriate for that feature. You can contact support at
https://developer.arm.com/support.
The following definitions clarify the levels of support and guarantees on functionality that are
expected from these features.
Product features
Product features are suitable for use in a production environment. The functionality is well-tested,
and is expected to be stable across feature and update releases.
• Arm intends to give advance notice of significant functionality changes to product features.
• If you have a support and maintenance contract, Arm provides full support for use of all
product features.
• Arm welcomes feedback on product features.
• Any issues with product features that Arm encounters or is made aware of are considered for
fixing in future versions of Arm Compiler for Embedded.
In addition to fully supported product features, some product features are only alpha or beta
quality.
Beta product features
Beta product features are implementation complete, but have not been sufficiently tested to
be regarded as suitable for use in production environments.
• Arm encourages the use of beta product features, and welcomes feedback on them.
• Any issues with beta product features that Arm encounters or is made aware of are
considered for fixing in future versions of Arm Compiler for Embedded.
Alpha product features
Alpha product features are not implementation complete, and are subject to change in future
releases, therefore the stability level is lower than in beta product features.
Community features
Arm Compiler for Embedded 6 is built on LLVM technology and preserves the functionality of that
technology where possible. This means that there are additional features available in Arm Compiler
for Embedded that are not listed in the documentation. These additional features are known as
community features. For information on these community features, see the Clang Compiler User's
Manual.
Where community features are referenced in the documentation, they are identified with
[COMMUNITY].
• Arm makes no claims about the quality level or the degree of functionality of these features,
except when explicitly stated in this documentation.
• Functionality might change significantly between feature releases.
• Arm makes no guarantees that community features are going to remain functional across
update releases, although changes are expected to be unlikely.
Some community features might become product features in the future, but Arm provides no
roadmap for this. Arm is interested in understanding your use of these features, and welcomes
feedback on them. Arm supports customers using these features on a best-effort basis, unless the
features are unsupported. Arm accepts defect reports on these features, but does not guarantee
that these issues are going to be fixed in future releases.
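[Figure: Arm Compiler for Embedded 6 toolchain components and integration boundaries. Labels: Source code, LLVM Project headers, Arm C library, armclang, clang, armasm, armlink, Scatter/Steering/Symdefs file, Image.]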
The dashed boxes are toolchain components, and any interaction between these components
is an integration boundary. Community features that span an integration boundary might have
significant limitations in functionality. The exception to this is if the interaction is codified in one
of the standards supported by Arm Compiler for Embedded 6. See Application Binary Interface
(ABI). Community features that do not span integration boundaries are more likely to work as
expected.
• Features primarily used when targeting hosted environments such as Linux or BSD might have
significant limitations, or might not be applicable, when targeting bare-metal environments.
• The Clang implementations of compiler features, particularly those that have been present for a
long time in other toolchains, are likely to be mature. The functionality of new features, such as
support for new language features, is likely to be less mature and therefore more likely to have
limited functionality.
Deprecated features
A deprecated feature is one that Arm plans to remove from a future release of Arm Compiler for
Embedded. Arm does not make any guarantee regarding the testing or maintenance of deprecated
features. Therefore, Arm does not recommend using a feature after it is deprecated.
For information on replacing deprecated features with supported features, see the Arm Compiler
for Embedded documentation and Release Notes. Where appropriate, each Arm Compiler
document includes notes for features that are deprecated, and also provides entries in the changes
appendix of that document.
Unsupported features
With both the product and community feature categories, specific features and use cases are
known not to function correctly, or are not intended for use with Arm Compiler for Embedded 6.
Limitations of product features are stated in the documentation. Arm cannot provide an exhaustive
list of unsupported features or use cases for community features. The known limitations on
community features are listed in Community features.
• C complex arithmetic is not supported, because of limitations in the current Arm C library.
• Complex numbers are defined in C++ as a template, std::complex. Arm Compiler for
Embedded supports std::complex with the float and double types, but not the long
double type because of limitations in the current Arm C library.
For C code that uses complex numbers, it is not sufficient to recompile with
the C++ compiler to make that code work. How you can use complex numbers
depends on whether or not you are building for Armv8-M targets. If you are not building
for Armv8-M targets, consider modifying the affected part of your project to use the C++
standard library type std::complex instead.
• You must take care when mixing translation units that are compiled with and without the
[COMMUNITY] -fsigned-char option, and that share interfaces or data structures.
The Arm ABI defines char as an unsigned byte, and this is the interpretation
used by the C libraries supplied with the Arm compilation tools.
• There are limitations with the Control Flow Integrity (CFI) sanitizer implementation,
-fsanitize=cfi, which requires Link-Time Optimization (LTO), -flto. The following are likely to
occur:
◦ When using features such as C++ I/O streams, the linker might report errors for a rejected
local symbol, L6654E, or that a symbol is not preserved by the LTO code generation, L6137E.
◦ The linker might report a diagnostic that a symbol has a size that extends outside of its
containing section, L6783E or L6784E.
Use the linker option --diag_suppress 6783 or --diag_suppress 6784 to suppress the
diagnostic.
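To see why mixing -fsigned-char with the default setting is risky, consider a byte value of 0xC8 stored in a plain char that is shared between translation units. The following sketch, with hypothetical function names, contrasts a comparison whose result depends on the signedness of plain char with one that converts to unsigned char first and therefore gives the same answer under both settings.

```c
#include <limits.h>

/* Nonzero if plain char is signed in this translation unit. Under the Arm
 * ABI default (unsigned char) this is 0; with -fsigned-char it is 1. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Fragile: for bytes >= 0x80 the result depends on the signedness of plain
 * char, so translation units built with different settings disagree. */
int is_high_byte_fragile(char c)
{
    return c > 127;
}

/* Robust: converting to unsigned char first gives the same result
 * regardless of how plain char is defined. */
int is_high_byte(char c)
{
    return (unsigned char)c > 127;
}
```

Using explicit unsigned char or signed char types in shared interfaces and data structures avoids the dependency on the -fsigned-char setting entirely.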
The linker can consume ELF format inputs containing DWARF 5, DWARF 4, DWARF 3, and
DWARF 2 format debug tables.
The fromelf utility can consume ELF format inputs containing DWARF 4, DWARF 3, and
DWARF 2 format debug tables. fromelf does not support DWARF 5.
The legacy assembler armasm generates DWARF 3 debug tables with the --debug option.
When assembling for AArch32, armasm can also generate DWARF 2 for backwards
compatibility with legacy and third-party tools.
ISO C
The compiler accepts ISO C90, C99, and C11 source as input.
ISO C++
The compiler accepts ISO C++98, C++11, C++14, and C++17 source as input.
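A translation unit can check which language standard it is being compiled against through the standard predefined macros; the standard itself is selected with the armclang -std option, for example -std=c++14. The helper name cxx_standard() below is hypothetical, and the sketch relies only on the __cplusplus values that ISO C++ defines for each revision. C code can make the equivalent check with __STDC_VERSION__, which is 199901L for C99 and 201112L for C11 and is not defined by C90.

```cpp
// Map the predefined __cplusplus macro to the ISO C++ revision that the
// compiler is using for this translation unit.
inline const char* cxx_standard() {
#if __cplusplus >= 201703L
    return "C++17 or later";
#elif __cplusplus >= 201402L
    return "C++14";
#elif __cplusplus >= 201103L
    return "C++11";
#else
    return "C++98";  // C++98 and C++03 both define __cplusplus as 199711L
#endif
}
```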
ELF
The toolchain produces relocatable and executable files in ELF format. The fromelf utility can
translate ELF files into other formats.
Related information
C++ implementation status in LLVM Clang
The Application Binary Interface (ABI) for the Arm Architecture (Base Standard) (BSABI) regulates
the inter-operation of binary code and development tools in Arm® architecture-based execution
environments, ranging from bare metal to major operating systems such as Arm Linux.
By conforming to this standard, objects produced by the toolchain can work together with object
libraries from different producers.
standard to govern the exchange of linkable and executable files between producers and
consumers.
AAELF
ELF for the Arm Architecture. Builds on the generic ELF standard to govern the exchange of
linkable and executable files between producers and consumers.
AAPCS64
Procedure Call Standard for the Arm 64-bit Architecture (AArch64). Governs the exchange of
control and data between functions at runtime. There is a variant of the AAPCS for each of
the major execution environment types supported by the toolchain.
AAPCS64 describes a number of different supported data models. Arm Compiler for
Embedded 6 implements the LP64 data model for AArch64 state.
AAPCS
Procedure Call Standard for the Arm 32-bit Architecture. Governs the exchange of control
and data between functions at runtime. There is a variant of the AAPCS for each of the major
execution environment types supported by the toolchain.
CLIBABI
C Library ABI for the Arm Architecture. Defines an ABI to the C library.
CPPABI64
C++ ABI for the Arm 64-bit Architecture. This specification builds on the generic C++ ABI
(originally developed for IA-64) to govern interworking between independent C++ compilers.
CPPABI
C++ ABI for the Arm 32-bit Architecture. This specification builds on the generic C++ ABI to
govern interworking between independent C++ compilers.
DBGOVL
Support for Debugging Overlaid Programs. Defines an extension to the ABI for the Arm
Architecture to support debugging overlaid programs.
EHABI
Exception Handling ABI for the Arm Architecture. Defines both the language-independent
and C++-specific aspects of how exceptions are thrown and handled.
RTABI
Run-time ABI for the Arm Architecture. Governs what independently produced objects
can assume of their execution environments by way of floating-point and compiler helper-
function support.
If you are upgrading from a previous toolchain release, ensure that you are using the most recent
versions of the Arm specifications.
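The AAPCS64 entry above states that Arm Compiler for Embedded 6 implements the LP64 data model for AArch64 state. As a minimal sketch, the following C++ compile-time checks assert that data model when the ACLE macro __aarch64__ is defined. The __arm__ branch assumes the ILP32 model of the 32-bit AAPCS (32-bit int, long, and pointers), which this section does not state explicitly. The helper names is_lp64() and is_ilp32() are hypothetical.

```cpp
#include <climits>

// LP64: int is 32 bits; long and pointers are 64 bits.
constexpr bool is_lp64() {
    return sizeof(int) * CHAR_BIT == 32 &&
           sizeof(long) * CHAR_BIT == 64 &&
           sizeof(void*) * CHAR_BIT == 64;
}

// ILP32: int, long, and pointers are all 32 bits.
constexpr bool is_ilp32() {
    return sizeof(int) * CHAR_BIT == 32 &&
           sizeof(long) * CHAR_BIT == 32 &&
           sizeof(void*) * CHAR_BIT == 32;
}

// Fail the build if the target does not use the expected data model.
#if defined(__aarch64__)
static_assert(is_lp64(), "AArch64 state is expected to use the LP64 data model");
#elif defined(__arm__)
static_assert(is_ilp32(), "AArch32 state is expected to use the ILP32 data model");
#endif
```

Because the checks are static_assert declarations, a mismatch is reported at compile time rather than at run time.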
• C Language Features
• Clang Language Extensions
• Attributes in Clang
Arm Compiler for Embedded provides full support only for the English locale.
Arm Compiler for Embedded provides support for multibyte characters, for example Japanese
characters, within comments in UTF-8 encoded files. This includes:
• /* */ comments in C source files, C++ source files, and GNU-syntax assembly files.
• // comments in C source files, C++ source files, and GNU-syntax assembly files.
• @ comments in GNU-syntax assembly files, for Arm architectures.
• ; comments in armasm-syntax assembly source files and armlink scatter files.
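A minimal sketch of a UTF-8 encoded C++ source file that uses multibyte (Japanese) characters in both the // and /* */ comment forms listed above. The support described here covers comments, so the sketch keeps identifiers and string literals in ASCII. The function name answer() is hypothetical.

```cpp
// 日本語のコメント (Japanese comment): a // comment with multibyte characters.

/* 複数行コメント (multi-line comment):
   a block comment with multibyte characters. */
inline int answer() {
    return 42;  // 戻り値 (return value)
}
```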
The environment variables that the toolchain uses are described in the following table.
Where an environment variable is identified as GCC compatible, the GCC documentation provides
full information about that environment variable. See
https://gcc.gnu.org/onlinedocs/gcc/Environment-Variables.html.
To set an environment variable on a Linux machine, open a bash shell and use the export
command. For example:
export ARM_TOOL_VARIANT=ult
The options listed appear before any options specified for the
armasm command in the makefile. Therefore, any options specified
in the makefile might override the options listed in this environment
variable.
ARMCOMPILER6_CLANGOPT An optional environment variable to define additional armclang
options that are to be used outside your regular makefile.
The options listed appear before any options specified for the
armclang command in the makefile. Therefore, any options
specified in the makefile might override the options listed in this
environment variable.
The options listed appear before any options specified for the
fromelf command in the makefile. Therefore, any options
specified in the makefile might override the options listed in this
environment variable.
ARMCOMPILER6_LINKOPT An optional environment variable to define additional linker options
that are to be used outside your regular makefile.
The options listed appear before any options specified for the
armlink command in the makefile. Therefore, any options
specified in the makefile might override the options listed in this
environment variable.
ARMROOT Your installation directory root, <install_directory>.
ARMLMD_LICENSE_FILE This environment variable specifies the location of your Arm license
file.
Note:
On Windows, the length of ARMLMD_LICENSE_FILE must not
exceed 260 characters.
The Arm Compiler for Embedded documentation describes features that are specific to, and
supported by, Arm Compiler for Embedded. Any features specific to Arm Compiler for Embedded
that are not documented are not supported and are used at your own risk. Although open-source
Clang features that Arm does not document are available, they are not supported by Arm and
are used at your own risk. You are responsible for making sure that any generated code using
unsupported or community features is operating correctly. For more information, see Support level
definitions.
See the third_party_licenses.txt file in your installation for details of open-source software
projects used.
Although Arm Compiler for Embedded 6 is based on Clang and LLVM technology, it:
• Is not based on the same revision as any specific release of the open-source
version of Clang or LLVM;
• Can contain changes introduced by Arm which are not included in the open-
source version.
.weak _ZTIPKe
.weak _ZTIPKf
.weak _ZTIPKg
.weak _ZTIPKh
.weak _ZTIPKi
.weak _ZTIPKj
.weak _ZTIPKl
.weak _ZTIPKm
.weak _ZTIPKn
.weak _ZTIPKo
.weak _ZTIPKs
.weak _ZTIPKt
.weak _ZTIPKv
.weak _ZTIPKw
.weak _ZTIPKx
.weak _ZTIPKy
.weak _ZTIPa
.weak _ZTIPb
.weak _ZTIPc
.weak _ZTIPd
.weak _ZTIPe
.weak _ZTIPf
.weak _ZTIPg
.weak _ZTIPh
.weak _ZTIPi
.weak _ZTIPj
.weak _ZTIPl
.weak _ZTIPm
.weak _ZTIPn
.weak _ZTIPo
.weak _ZTIPs
.weak _ZTIPt
.weak _ZTIPv
.weak _ZTIPw
.weak _ZTIPx
.weak _ZTIPy
.weak _ZTIa
.weak _ZTIb
.weak _ZTIc
.weak _ZTId
.weak _ZTIe
.weak _ZTIf
.weak _ZTIg
.weak _ZTIh
.weak _ZTIi
.weak _ZTIj
.weak _ZTIl
.weak _ZTIm
.weak _ZTIn
.weak _ZTIo
.weak _ZTIs
.weak _ZTIt
.weak _ZTIv
.weak _ZTIw
.weak _ZTIx
.weak _ZTIy
.weak _ZTSDh
.weak _ZTSDi
.weak _ZTSDn
.weak _ZTSDs
.weak _ZTSN10__cxxabiv116__enum_type_infoE
.weak _ZTSN10__cxxabiv116__shim_type_infoE
.weak _ZTSN10__cxxabiv117__array_type_infoE
.weak _ZTSN10__cxxabiv117__class_type_infoE
.weak _ZTSN10__cxxabiv117__pbase_type_infoE
.weak _ZTSN10__cxxabiv119__pointer_type_infoE
.weak _ZTSN10__cxxabiv120__function_type_infoE
.weak _ZTSN10__cxxabiv120__si_class_type_infoE
.weak _ZTSN10__cxxabiv121__vmi_class_type_infoE
.weak _ZTSN10__cxxabiv123__fundamental_type_infoE
.weak _ZTSN10__cxxabiv129__pointer_to_member_type_infoE
.weak _ZTSPDh
.weak _ZTSPDi
.weak _ZTSPDn
.weak _ZTSPDs
.weak _ZTSPKDh
.weak _ZTSPKDi
.weak _ZTSPKDn
.weak _ZTSPKDs
.weak _ZTSPKa
.weak _ZTSPKb
.weak _ZTSPKc
.weak _ZTSPKd
.weak _ZTSPKe
.weak _ZTSPKf
.weak _ZTSPKg
.weak _ZTSPKh
.weak _ZTSPKi
.weak _ZTSPKj
.weak _ZTSPKl
.weak _ZTSPKm
.weak _ZTSPKn
.weak _ZTSPKo
.weak _ZTSPKs
.weak _ZTSPKt
.weak _ZTSPKv
.weak _ZTSPKw
.weak _ZTSPKx
.weak _ZTSPKy
.weak _ZTSPa
.weak _ZTSPb
.weak _ZTSPc
.weak _ZTSPd
.weak _ZTSPe
.weak _ZTSPf
.weak _ZTSPg
.weak _ZTSPh
.weak _ZTSPi
.weak _ZTSPj
.weak _ZTSPl
.weak _ZTSPm
.weak _ZTSPn
.weak _ZTSPo
.weak _ZTSPs
.weak _ZTSPt
.weak _ZTSPv
.weak _ZTSPw
.weak _ZTSPx
.weak _ZTSPy
.weak _ZTSa
.weak _ZTSb
.weak _ZTSc
.weak _ZTSd
.weak _ZTSe
.weak _ZTSf
.weak _ZTSg
.weak _ZTSh
.weak _ZTSi
.weak _ZTSj
.weak _ZTSl
.weak _ZTSm
.weak _ZTSn
.weak _ZTSo
.weak _ZTSs
.weak _ZTSt
.weak _ZTSv
.weak _ZTSw
.weak _ZTSx
.weak _ZTSy
.weak _ZTVN10__cxxabiv116__enum_type_infoE
.weak _ZTVN10__cxxabiv116__shim_type_infoE
.weak _ZTVN10__cxxabiv117__array_type_infoE
.weak _ZTVN10__cxxabiv117__class_type_infoE
.weak _ZTVN10__cxxabiv117__pbase_type_infoE
.weak _ZTVN10__cxxabiv119__pointer_type_infoE
.weak _ZTVN10__cxxabiv120__function_type_infoE
.weak _ZTVN10__cxxabiv120__si_class_type_infoE
.weak _ZTVN10__cxxabiv121__vmi_class_type_infoE
.weak _ZTVN10__cxxabiv123__fundamental_type_infoE
.weak _ZTVN10__cxxabiv129__pointer_to_member_type_infoE
_ZTIDh:
_ZTIDi:
_ZTIDn:
_ZTIDs:
_ZTIN10__cxxabiv116__enum_type_infoE:
_ZTIN10__cxxabiv116__shim_type_infoE:
_ZTIN10__cxxabiv117__array_type_infoE:
_ZTIN10__cxxabiv117__class_type_infoE:
_ZTIN10__cxxabiv117__pbase_type_infoE:
_ZTIN10__cxxabiv119__pointer_type_infoE:
_ZTIN10__cxxabiv120__function_type_infoE:
_ZTIN10__cxxabiv120__si_class_type_infoE:
_ZTIN10__cxxabiv121__vmi_class_type_infoE:
_ZTIN10__cxxabiv123__fundamental_type_infoE:
_ZTIN10__cxxabiv129__pointer_to_member_type_infoE:
_ZTIPDh:
_ZTIPDi:
_ZTIPDn:
_ZTIPDs:
_ZTIPKDh:
_ZTIPKDi:
_ZTIPKDn:
_ZTIPKDs:
_ZTIPKa:
_ZTIPKb:
_ZTIPKc:
_ZTIPKd:
_ZTIPKe:
_ZTIPKf:
_ZTIPKg:
_ZTIPKh:
_ZTIPKi:
_ZTIPKj:
_ZTIPKl:
_ZTIPKm:
_ZTIPKn:
_ZTIPKo:
_ZTIPKs:
_ZTIPKt:
_ZTIPKv:
_ZTIPKw:
_ZTIPKx:
_ZTIPKy:
_ZTIPa:
_ZTIPb:
_ZTIPc:
_ZTIPd:
_ZTIPe:
_ZTIPf:
_ZTIPg:
_ZTIPh:
_ZTIPi:
_ZTIPj:
_ZTIPl:
_ZTIPm:
_ZTIPn:
_ZTIPo:
_ZTIPs:
_ZTIPt:
_ZTIPv:
_ZTIPw:
_ZTIPx:
_ZTIPy:
_ZTIa:
_ZTIb:
_ZTIc:
_ZTId:
_ZTIe:
_ZTIf:
_ZTIg:
_ZTIh:
_ZTIi:
_ZTIj:
_ZTIl:
_ZTIm:
_ZTIn:
_ZTIo:
_ZTIs:
_ZTIt:
_ZTIv:
_ZTIw:
_ZTIx:
_ZTIy:
_ZTSDh:
_ZTSDi:
_ZTSDn:
_ZTSDs:
_ZTSN10__cxxabiv116__enum_type_infoE:
_ZTSN10__cxxabiv116__shim_type_infoE:
_ZTSN10__cxxabiv117__array_type_infoE:
_ZTSN10__cxxabiv117__class_type_infoE:
_ZTSN10__cxxabiv117__pbase_type_infoE:
_ZTSN10__cxxabiv119__pointer_type_infoE:
_ZTSN10__cxxabiv120__function_type_infoE:
_ZTSN10__cxxabiv120__si_class_type_infoE:
_ZTSN10__cxxabiv121__vmi_class_type_infoE:
_ZTSN10__cxxabiv123__fundamental_type_infoE:
_ZTSN10__cxxabiv129__pointer_to_member_type_infoE:
_ZTSPDh:
_ZTSPDi:
_ZTSPDn:
_ZTSPDs:
_ZTSPKDh:
_ZTSPKDi:
_ZTSPKDn:
_ZTSPKDs:
_ZTSPKa:
_ZTSPKb:
_ZTSPKc:
_ZTSPKd:
_ZTSPKe:
_ZTSPKf:
_ZTSPKg:
_ZTSPKh:
_ZTSPKi:
_ZTSPKj:
_ZTSPKl:
_ZTSPKm:
_ZTSPKn:
_ZTSPKo:
_ZTSPKs:
_ZTSPKt:
_ZTSPKv:
_ZTSPKw:
_ZTSPKx:
_ZTSPKy:
_ZTSPa:
_ZTSPb:
_ZTSPc:
_ZTSPd:
_ZTSPe:
_ZTSPf:
_ZTSPg:
_ZTSPh:
_ZTSPi:
_ZTSPj:
_ZTSPl:
_ZTSPm:
_ZTSPn:
_ZTSPo:
_ZTSPs:
_ZTSPt:
_ZTSPv:
_ZTSPw:
_ZTSPx:
_ZTSPy:
_ZTSa:
_ZTSb:
_ZTSc:
_ZTSd:
_ZTSe:
_ZTSf:
_ZTSg:
_ZTSh:
_ZTSi:
_ZTSj:
_ZTSl:
_ZTSm:
_ZTSn:
_ZTSo:
_ZTSs:
_ZTSt:
_ZTSv:
_ZTSw:
_ZTSx:
_ZTSy:
_ZTVN10__cxxabiv116__enum_type_infoE:
_ZTVN10__cxxabiv116__shim_type_infoE:
_ZTVN10__cxxabiv117__array_type_infoE:
_ZTVN10__cxxabiv117__class_type_infoE:
_ZTVN10__cxxabiv117__pbase_type_infoE:
_ZTVN10__cxxabiv119__pointer_type_infoE:
_ZTVN10__cxxabiv120__function_type_infoE:
_ZTVN10__cxxabiv120__si_class_type_infoE:
_ZTVN10__cxxabiv121__vmi_class_type_infoE:
_ZTVN10__cxxabiv123__fundamental_type_infoE:
_ZTVN10__cxxabiv129__pointer_to_member_type_infoE:
.word 0
.word 0
.word 0
Arm publications
Arm periodically provides updates and corrections to its documentation. See https://developer.arm.com/
for errata documents, Knowledge Base Articles (KBAs), and Frequently Asked Questions (FAQs).
For full information about the base standard, software interfaces, and standards supported by Arm,
see https://developer.arm.com/architectures/system-architectures/software-standards/abi.
In addition, see the following documentation for specific information relating to Arm® products:
• Arm Architecture Reference Manuals.
• Cortex-A series processors.
• Cortex-R series processors.
• Cortex-M series processors.
• Cortex-X series processors.
• Neoverse series processors.
Other publications
This Arm Compiler for Embedded tools documentation is not intended to be an introduction to the
C or C++ programming languages. It does not try to teach programming in C or C++, and it is not a
reference manual for the C or C++ standards. Other publications provide general information about
programming.
This book explains how C++ evolved from its first design to the language in use today.
• Vandevoorde, D. and Josuttis, N.M., C++ Templates: The Complete Guide (2003). Addison-Wesley
Publishing Company, Reading, Massachusetts. ISBN 0-201-73484-2.
• Meyers, S., Effective C++ (3rd edition, 2005). Addison-Wesley Publishing Company, Reading,
Massachusetts. ISBN 978-0321334879.
The standard is available from national standards bodies (for example, AFNOR in France, ANSI
in the USA).
• Kernighan, B.W. and Ritchie, D.M., The C Programming Language (2nd edition, 1988). Prentice-
Hall, Englewood Cliffs, NJ, USA. ISBN 0-13-110362-8.
This book is co-authored by the original designer and implementer of the C language, and is
updated to cover the essentials of ANSI C.
Copyright © 2019–2024 Arm Limited (or its affiliates). All rights reserved.
Non-Confidential
Page 422 of 437
Arm® Compiler for Embedded User Guide Document ID: 100748_6.23_01_en
Issue 01
Supporting reference information
• Harbison, S.P. and Steele, G.L., A C Reference Manual (5th edition, 2002). Prentice-Hall,
Englewood Cliffs, NJ, USA. ISBN 0-13-089592-X.
This is a comprehensive treatment of ANSI and ISO standards for the C Library.
• Koenig, A., C Traps and Pitfalls (1989). Addison-Wesley, Reading, Massachusetts. ISBN 0-201-17928-8.
This explains how to avoid the most common traps in C programming. It provides informative
reading at all levels of competence in C.
See http://www.dwarfstd.org for the latest information about the Debug With Arbitrary Record
Format (DWARF) debug table standards and ELF specifications.
Proprietary Notice
This document is protected by copyright and other related rights and the use or implementation of
the information contained in this document may be protected by one or more patents or pending
patent applications. No part of this document may be reproduced in any form by any means
without the express prior written permission of Arm Limited ("Arm"). No license, express or implied,
by estoppel or otherwise to any intellectual property rights is granted by this document unless
specifically stated.
Your access to the information in this document is conditional upon your acceptance that you
will not use or permit others to use the information for the purposes of determining whether the
subject matter of this document infringes any third party patents.
The content of this document is informational only. Any solutions presented herein are subject
to changing conditions, information, scope, and data. This document was produced using
reasonable efforts based on information available as of the date of issue of this document.
The scope of information in this document may exceed that which Arm is required to provide,
and such additional information is merely intended to further assist the recipient and does not
represent Arm’s view of the scope of its obligations. You acknowledge and agree that you possess
the necessary expertise in system security and functional safety and that you shall be solely
responsible for compliance with all legal, regulatory, safety and security related requirements
concerning your products, notwithstanding any information or support that may be provided by
Arm herein. In addition, you are responsible for any applications which are used in conjunction
with any Arm technology described in this document, and to minimize risks, adequate design and
operating safeguards should be provided for by you.
This document may include technical inaccuracies or typographical errors. THIS DOCUMENT IS
PROVIDED "AS IS". ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES, EXPRESS,
IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES
OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A
PARTICULAR PURPOSE WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, Arm
makes no representation with respect to, and has undertaken no analysis to identify or understand
the scope and content of, any patents, copyrights, trade secrets, trademarks, or other rights.
TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR
ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL,
INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND
REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS
DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
Reference by Arm to any third party’s products or services within this document is not an express
or implied approval or endorsement of the use thereof.
This document consists solely of commercial items. You shall be responsible for ensuring that
any permitted use, duplication, or disclosure of this document complies fully with any relevant
export laws and regulations to assure that this document or any portion thereof is not exported,
directly or indirectly, in violation of such export laws. Use of the word “partner” in reference to
Arm’s customers is not intended to create or refer to any partnership relationship with any other
company. Arm may make changes to this document at any time and without notice.
This document may be translated into other languages for convenience, and you agree that if there
is any conflict between the English version of this document and any translation, the terms of the
English version of this document shall prevail.
The validity, construction and performance of this notice shall be governed by English Law.
The Arm corporate logo and words marked with ® or ™ are registered trademarks or trademarks
of Arm Limited (or its affiliates) in the US and/or elsewhere. Please follow Arm’s trademark usage
guidelines at https://www.arm.com/company/policies/trademarks. All rights reserved. Other brands
and names mentioned in this document may be the trademarks of their respective owners.
PRE-1121-V1.0
Product status
All products and services provided by Arm require deliverables to be prepared and made available
at different levels of completeness. The information in this document indicates the appropriate
level of completeness for the associated deliverables.
Revision history
These sections can help you understand how the document has changed over time.
Document history
Issue    Date             Confidentiality   Change
0623-01  16 October 2024  Non-Confidential  Arm Compiler for Embedded v6.23 Release.
0622-00  13 March 2024    Non-Confidential  Arm Compiler for Embedded v6.22 Release.
0621-00  11 October 2023  Non-Confidential  Arm Compiler for Embedded v6.21 Release.
0620-00  15 March 2023    Non-Confidential  Arm Compiler for Embedded v6.20 Release.
0619-00  12 October 2022  Non-Confidential  Arm Compiler for Embedded v6.19 Release.
0618-00  22 March 2022    Non-Confidential  Arm Compiler for Embedded v6.18 Release.
0617-00  20 October 2021  Non-Confidential  Arm Compiler for Embedded v6.17 Release.
0616-01  12 March 2021    Non-Confidential  Documentation update 1 for Arm Compiler v6.16 Release.
Change history
The first table is for the first release. Then, each table compares the new issue of the manual with
the last released issue of the manual. Release numbers match the revision history in Document
release information on page 426.
Added a row in the exceptions table for all standards, and moved the
following list items from the C++98 and C++03 rows to the new row:
• std::vector<bool>::const_reference
• std::bitset<N>
Added information about linking objects compiled with different C or C++ standards.
Topics affected:
• Selecting source language options.
• Linking object files to produce an executable.
Added a topic that describes the interaction of OVERLAY and PROTECTED attributes with
armlink merge options.
Topics affected:
• Interaction of OVERLAY and PROTECTED attributes with armlink merge options.
Added information about the effects of linking with a scatter file that has ZI data in an
execution region.
Topics affected:
• Automatic placement of __at sections.
Added a note to include a .balign directive when defining your own sections with the
armclang integrated assembler.
Topics affected:
• Using the integrated assembler.
Minor improvements to the Getting Started section about the compile and link steps, and
clarification of what the clobbered_list means when building programs with inline assembly code.
Topics affected:
• Compiling a Hello World example.
• Writing inline assembly code.
Updated the description of the -marm command-line option to clarify that it gives an error,
not a warning, when used with an M-profile architecture.
Topics affected:
• Common Arm Compiler for Embedded toolchain options.
Added a note for the workaround when entry functions or Non-secure function calls have more
than 4 arguments.
Topics affected:
• Overview of building Secure and Non-secure images with the Armv8-M Security Extension.
Conventions
The following subsections describe conventions used in Arm documents.
Glossary
The Arm Glossary is a list of terms used in Arm documentation, together with definitions for
those terms. The Arm Glossary does not contain terms that are industry standard unless the Arm
meaning differs from the generally accepted meaning.
Typographic conventions
Arm documentation uses typographical conventions to convey specific meaning.
Convention Use
italic Citations.
bold Interface elements, such as menu names.
For example:
SMALL CAPITALS Terms that have specific technical meanings as defined in the Arm® Glossary. For example,
IMPLEMENTATION DEFINED, IMPLEMENTATION SPECIFIC, UNKNOWN, and UNPREDICTABLE.
Your system requires the following. If you do not follow these requirements your
system will not work.
You are at risk of causing permanent damage to your system or your equipment, or
harming yourself.
A useful tip that might make it easier, better or faster to perform a task.
A reminder of something important that relates to the information you are reading.
Useful resources
This document contains information that is specific to this product. See the following resources for
other useful information.