ESD Unit 3 ARM 2024 Latest
What Is ARM?
• Advanced RISC Machine
Why ARM is most popular:
• ARM processors are among the most widely used, particularly
in portable devices, because of their low power
consumption and reasonable performance.
• ARM offers good performance for its power budget when
compared to other processors.
• ARM processors combine low power consumption with low cost.
• ARM makes quick and efficient application development
easy, which is a main reason for its popularity.
History of ARM Processor
• ARM Processor - 32 bit processor
• RISC (Reduced Instruction Set Computer) concept
introduced in 1980 at Stanford and Berkeley
• ARM was developed by Acorn Computers Limited
of Cambridge, England between 1983 & 1985
• ARM Limited was founded in 1990
• ARM Cores
• Licensed to partners to develop and fabricate new
microcontrollers
• Soft core
History of ARM
Historical remarks
• ARM’s parent company is Acorn Computers (UK).
• Acorn Computers started their Acorn RISC Machine
project in October 1983 (two years after the introduction
of the IBM PC) to develop their own powerful processor for
a line of business computers.
• The acronym ARM was coined originally at this time
(1983) from the designation Acorn RISC Machine.
• In 1990 the company Advanced RISC Machines Ltd. (ARM
Ltd.) was founded as a joint venture of Acorn Computers,
Apple Computers and VLSI Technology.
• Accordingly, also the interpretation of ARM was changed
to “Advanced RISC Machines”.
History of ARM
• ARM (ARM Holdings plc) is a British multinational
semiconductor company with its head office in Cambridge.
• The company designs and licenses low power embedded and
mobile ARM processors along with the appropriate design tools
but does not fabricate semiconductors.
• ARM designs currently dominate the embedded and mobile
markets (including smartphones and tablets).
• As of 2014 more than 50 billion ARM based processors have
been produced in total, up from 10 billion in 2008 [59], [19], as
indicated in the next Figure.
ARM's first office, 18th century barn just
outside of Cambridge.
ARM's headquarters in Cambridge
(UK)
ARM Connected Community – 900+
[Figure: ARM cores by architecture version (V4, ARMv5, ARMv6). Cores shown:
ARM7TDMI-S, ARM720T, ARM92xT, StrongARM, SC100, SC200, ARM9x6E, ARM926EJ-S,
ARM102xE, ARM1026EJ-S, XScale, ARM1136JF-S, ARM1176JZF-S, ARM1156T2F-S]
RISC
• Reduced Instruction Set Computer
• It contains a smaller number of instructions.
• Instruction pipelining and increased execution speed.
• Orthogonal instruction set (allows each instruction
to operate on any register and use any addressing mode).
CISC
• Complex Instruction Set Computer
• It contains a greater number of instructions.
• Instruction pipelining is harder to implement; classic CISC
designs lacked it.
• Non-orthogonal instruction set (not every instruction is allowed
to operate on any register and use any addressing mode).
Differences between RISC and CISC
RISC
• Operations are performed on registers only; the only memory
operations are load and store.
• A larger number of registers is available.
• The programmer needs to write more code to execute a task since
instructions are simpler.
• Instructions have a single, fixed length.
• Less silicon usage and pin count.
• Typically with Harvard Architecture.
CISC
• Operations are performed either on registers or memory
depending on the instruction.
• The number of general purpose registers is very limited.
• Instructions are like macros in C language.
• Instructions have variable length.
• More silicon usage since additional decoder logic is
required to implement the complex instruction decoding.
• Can be Harvard or Von Neumann Architecture.
RISC Design Principles(1)
• Simple operations
– Simple instructions that can execute in one cycle
• Register-to-register operations
– Only load and store operations access memory
– Rest of the operations on a register-to-register
basis
• Simple addressing modes
– A few addressing modes (1 or 2)
RISC Design Principles(2)
• Large number of registers
– Needed to support register-to-register operations
– Minimize the procedure call and return overhead
• Fixed-length instructions
– Facilitates efficient instruction execution
• Simple instruction format
– Fixed boundaries for various fields
Difference between Harvard and Von Neumann
Architectures
ARM processor features
• Load/store architecture.
• An orthogonal instruction set.
• Mostly single-cycle execution.
• Enhanced power-saving design.
• 64 and 32-bit execution states for scalable high performance.
• 32-bit RISC-processor core (32-bit instructions)
• 37 32-bit integer registers (16 visible at any one time)
• Pipelined (ARM7: 3 stages)
• Von Neumann-type bus structure (ARM7), Harvard (ARM9)
• 8 / 16 / 32 -bit data types
• 7 modes of operation (usr, fiq, irq, svc, abt, sys, und)
• Simple structure -> reasonably good speed / power
consumption ratio
ARM7TDMI
• ARM7TDMI is a core processor module embedded in many
ARM7 microprocessors.
• It is the most complex processor core module in ARM7
series.
– T: capable of executing the Thumb instruction set.
– D: debug support through an IEEE Std. 1149.1 JTAG boundary-scan
debugging interface.
– M: an enhanced multiplier with Multiply-And-Accumulate (MAC)
support for DSP applications.
– I: support for the embedded In-Circuit Emulator (EmbeddedICE).
• Three pipeline Stages: Instruction fetch, decode, and
Execution.
Features
• A 32-bit RISC processor core capable of executing
16-bit instructions (Von Neumann Architecture)
– High density code
• The Thumb set's 16-bit instruction length allows code to
approach about 65% of standard ARM code size while
retaining ARM 32-bit processor performance.
– Smaller die size
• About 72,000 transistors
• Occupying only about 4.8 mm² in a 0.6 µm semiconductor
technology.
– Lower power consumption
• Dissipates about 2 mW/MHz with 0.6 µm technology.
Features (2)
• Memory Access
– Data can be
• 8-bit (bytes)
• 16-bit (half words)
• 32-bit (words)
• Memory Interface
– Can interface to SRAM, ROM, DRAM
– Has four basic types of memory cycle
• idle cycle
• Non sequential cycle
• sequential cycle
• coprocessor register cycle
Instruction Pipeline
• The ARM processor uses an internal pipeline to increase
the rate of instruction flow to the processor, allowing
several operations to be undertaken simultaneously,
rather than serially.
• Pipelining is breaking down execution into multiple
steps, and executing each step in parallel.
• In classic ARM cores such as the ARM7, the instruction
pipeline consists of 3 stages.
• Basic 3 stage pipeline
– Fetch – Load from memory
– Decode – Identify instruction to execute
– Execute – Process instruction and write back result
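The overlap of the three stages can be sketched with a small Python model. This is an idealised illustration, not a cycle-accurate simulator: it assumes no stalls, branches or multi-cycle instructions, and the names `pipeline_schedule` and `program` are made up for this example.

```python
STAGES = ("Fetch", "Decode", "Execute")

def pipeline_schedule(instructions):
    """Return {cycle: {stage: instruction}} for an ideal 3-stage pipeline."""
    schedule = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1  # instruction i enters Fetch in cycle i + 1
            schedule.setdefault(cycle, {})[stage] = instr
    return schedule

program = ["ADD", "SUB", "MOV", "CMP"]
schedule = pipeline_schedule(program)
total_cycles = max(schedule)  # n instructions need n + 2 cycles in this model

for cycle in sorted(schedule):
    print(cycle, schedule[cycle])
```

In cycle 3 the model executes ADD, decodes SUB and fetches MOV, so once the pipeline is full one instruction completes per cycle.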
Instruction Pipeline
• ARM7 has a 3 stage pipeline
– Fetch, Decode, Execute
mreq  seq  Cycle  Use
0     0    N      Non-sequential memory access
0     1    S      Sequential memory access
1     0    I      Internal cycle – bus and memory inactive
1     1    C      Coprocessor register transfer – memory inactive
ARM7TDMI Interface Signals (3/4)
– Lock indicates that the processor should keep the bus to ensure the
atomicity of the read and write phase of a SWAP instruction
– \r/w, read or write
– mas[1:0], encode memory access size – byte, half-word or word
– bl[3:0], externally controlled enables on latches on each of the 4 bytes on
the data input bus
• MMU interface
– \trans (translation control), 0: user mode, 1: privileged mode
– \mode[4:0], bottom 5 bits of the CPSR (inverted)
– Abort, disallow access
• State
– T bit, whether the processor is currently executing ARM or Thumb
instructions
• Configuration
– Bigend, big-endian or little-endian
ARM7TDMI Interface Signals (4/4)
• Interrupt
– \fiq, fast interrupt request, higher priority
– \irq, normal interrupt request
– isync, allow the interrupt synchronizer to be passed
• Initialization
– \reset, starts the processor from a known state, executing from address
0x00000000
• ARM7TDMI characteristics
32x8 Multiplier
• Earlier ARM processors (prior to ARM7TDMI) used a
smaller, simpler multiplier block which required more
clock cycles to complete a multiplication.
• Introduction of this more complex 32x8 multiplier
reduced the number of cycles required for a
multiplication of two registers (32-bit * 32-bit) to a few
cycles (data dependent).
• Modern ARM processors are generally capable of
calculating at least a 32-bit product in a single cycle,
although some of the smallest Cortex-M processors
provide an implementation choice of a faster (single-
cycle) or a smaller (32 cycle) 32-bit multiplier block.
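The data-dependent cycle count of a 32x8 multiplier can be illustrated in Python. The sketch assumes the early-termination rule commonly documented for the ARM7TDMI: the multiplier operand is consumed 8 bits per pass, and the passes stop once the remaining upper bits are all zeros or all ones. The function name is hypothetical, and exact timings should be checked against the core's technical reference manual.

```python
def mul_internal_cycles(multiplier):
    """Number of 8-bit multiplier passes (1 to 4) under early termination."""
    multiplier &= 0xFFFFFFFF
    for m in (1, 2, 3):
        upper = multiplier >> (8 * m)        # bits [31 : 8*m]
        all_ones = (1 << (32 - 8 * m)) - 1   # the same field filled with ones
        if upper == 0 or upper == all_ones:
            return m
    return 4

print(mul_internal_cycles(5))           # small positive multiplier
print(mul_internal_cycles(0x12345678))  # needs all four passes
```

Small multipliers (positive or negative) therefore finish sooner than values that use all 32 bits.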
The ARM's Barrel Shifter
• The ARM arithmetic logic unit has a 32-bit barrel shifter that is capable of
shift and rotate operations. The second operand to many ARM and Thumb
data-processing and single register data-transfer instructions can be
shifted, before the data-processing or data-transfer is executed, as part of
the instruction.
• This can be used by various classes of ARM instructions to perform
comparatively complex operations in a single instruction.
• The barrel shifter can perform the following types of operation:
• LSL - shift left by n bits
• LSR - logical shift right by n bits
• ASR - arithmetic shift right by n bits (the bits fed into the top end
of the operand are copies of the original top (or sign) bit)
• ROR - rotate right by n bits
• RRX - rotate right extended by 1 bit. This is a 33 bit rotate, where
the 33rd bit is the PSR C flag.
• The barrel shifter is a functional unit which
can be used in a number of different
circumstances.
• It provides five types of shifts and rotates
which can be applied to Operand2.
• LSL – Logical Shift Left
– Example: Logical Shift Left by 4.
• LSR – Logical Shift Right
– Example: Logical Shift Right by 4.
• Examples
– MOV r0, r0, LSL #1 -Multiply R0 by two.
– MOV r1, r1, LSR #2 -Divide R1 by four (unsigned).
– MOV r2, r2, ASR #2 -Divide R2 by four (signed).
– MOV r3, r3, ROR #16 -Swap the top and bottom halves
of R3.
– ADD r4, r4, r4, LSL #4 -Multiply R4 by 17. (N = N + N * 16)
– RSB r5, r5, r5, LSL #5 -Multiply R5 by 31. (N = N * 32 - N)
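The five shift types and the examples above can be checked with a short Python model of 32-bit register arithmetic. The helper names (`lsl`, `lsr`, `asr`, `ror`, `rrx`) are chosen for this sketch only, and shift amounts are assumed to be in the range 0 to 31.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide

def lsl(x, n):
    return (x << n) & MASK

def lsr(x, n):
    return (x & MASK) >> n

def asr(x, n):
    x &= MASK
    if x & 0x80000000:      # negative value: reinterpret as signed
        x -= 1 << 32
    return (x >> n) & MASK  # Python's >> copies the sign bit

def ror(x, n):
    n %= 32
    x &= MASK
    return ((x >> n) | (x << (32 - n))) & MASK

def rrx(x, carry):
    """33-bit rotate right by one; returns (result, new carry)."""
    x &= MASK
    return ((carry & 1) << 31) | (x >> 1), x & 1

r4 = 3
print((r4 + lsl(r4, 4)) & MASK)   # ADD r4, r4, r4, LSL #4
r5 = 3
print((lsl(r5, 5) - r5) & MASK)   # RSB r5, r5, r5, LSL #5
print(hex(ror(0x12345678, 16)))   # MOV r3, r3, ROR #16
```

The ADD and RSB forms work because shifting left by 4 or 5 multiplies by 16 or 32, and the single add or subtract supplies the remaining +N or -N.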
What is AMBA?
• “The ARM AMBA (Advanced Microcontroller
Bus Architecture) protocol is an open
standard, on-chip interconnect specification
for the connection and management of
functional blocks in a System-on-Chip (SoC). It
facilitates right-first-time development of
multi-processor designs with large numbers of
controllers and peripherals. AMBA promotes
design re-use by defining common interface
standards for SoC modules.”
AMBA
• AMBA: Advanced Microcontroller Bus Architecture
– It is a specification for an on-chip bus, to enable
macrocells (such as a CPU, DSP, Peripherals, and memory
controllers) to be connected together to form a
microcontroller or complex peripheral chip.
– It defines
• A high-speed, high-bandwidth bus, the Advanced High
Performance Bus (AHB).
• A simple, low-power peripheral bus, the Advanced Peripheral Bus
(APB).
• Access for an external tester to permit modular testing and fast
test of cache RAM
• Essential housekeeping operations (reset/power-up, …)
AMBA protocol specifications
• The AMBA specification defines an on-chip
communications standard for designing high-performance
embedded microcontrollers. It is supported by ARM Limited
with wide cross-industry participation.
– The AMBA 5 specification defines the following
buses/interfaces:
• Advanced High-performance Bus (AHB5, AHB-Lite)
• Coherent Hub Interface (CHI)
– The AMBA 4 specification defines the following buses/interfaces:
• AXI Coherency Extensions (ACE) - widely used on the latest ARM
Cortex-A processors including Cortex-A7 and Cortex-A15
• AXI Coherency Extensions Lite (ACE-Lite)
• Advanced Extensible Interface 4 (AXI4)
• Advanced Extensible Interface 4 Lite (AXI4-Lite)
• Advanced Extensible Interface 4 Stream (AXI4-Stream v1.0)
• Advanced Trace Bus (ATB v1.1)
• Advanced Peripheral Bus (APB4 v2.0)
AMBA protocol specifications
• AMBA 3 specification defines four buses/interfaces:
– Advanced Extensible Interface (AXI3 or AXI v1.0) - widely used
on ARM Cortex-A processors including Cortex-A9
– Advanced High-performance Bus Lite (AHB-Lite v1.0)
– Advanced Peripheral Bus (APB3 v1.0)
– Advanced Trace Bus (ATB v1.0)
• AMBA 2 specification defines three buses/interfaces:
– Advanced High-performance Bus (AHB) - widely used on ARM7,
ARM9 and ARM Cortex-M based designs
– Advanced System Bus (ASB)
– Advanced Peripheral Bus (APB2 or APB)
• AMBA specification (First version) defines two
buses/interfaces:
– Advanced System Bus (ASB)
– Advanced Peripheral Bus (APB)
ARM7 Processor Architecture
• Features (LPC2148)
– 16/32-bit ARM7TDMI-S microcontroller in a tiny LQFP64
package.
– 8 to 40 kB of on-chip static RAM and 32 to 512 kB of on-chip
flash program memory. 128 bit wide interface/accelerator
enables high speed 60 MHz operation.
– In-System/In-Application Programming (ISP/IAP) via on-chip
boot-loader software. Single flash sector or full chip erase in 400
ms and programming of 256 bytes in 1 ms.
– Embedded ICE RT and Embedded Trace interfaces offer real-
time debugging with the on-chip Real Monitor software and
high speed tracing of instruction execution.
– USB 2.0 Full Speed compliant Device Controller with 2 kB of
endpoint RAM. In addition, the LPC2146/8 provide 8 kB of on-
chip RAM accessible to USB by DMA.
ARM7 Processor Architecture(2)
• Features (LPC2148)
– One or two 10-bit A/D converters provide a total of 6/14 analog
inputs, with conversion times as low as 2.44 µs per channel.
– Single 10-bit D/A converter provides variable analog output.
– Two 32-bit timers/external event counters (with four capture and four
compare channels each), PWM unit (six outputs) and watchdog.
– Low power real-time clock with independent power and dedicated 32
kHz clock input.
– Multiple serial interfaces including two UARTs, two Fast I2C-bus (400
kbit/s), SPI and SSP with buffering and variable data length
capabilities.
– Vectored interrupt controller with configurable priorities and vector
addresses.
– Up to 45 of 5 V tolerant fast general purpose I/O pins in a tiny LQFP64
package.
ARM7 Processor Architecture(3)
• Features (LPC2148)
– Up to nine edge or level sensitive external interrupt pins
available.
– 60 MHz maximum CPU clock available from programmable on-
chip PLL with settling time of 100 µs.
– On-chip integrated oscillator operates with an external crystal in
range from 1 MHz to 30 MHz and with an external oscillator up
to 50 MHz.
– Power saving modes include Idle and Power-down.
– Individual enable/disable of peripheral functions as well as
peripheral clock scaling for additional power optimization.
– Processor wake-up from Power-down mode via external
interrupt, USB, Brown-Out Detect (BOD) or Real-Time Clock
(RTC).
– Single power supply chip with Power-On Reset (POR) and BOD
circuits.
– CPU operating voltage range of 3.0 V to 3.6 V (3.3 V ±
10 %) with 5 V tolerant I/O pads.
LPC2148 Pin Configuration
NXP LPC214X - IC
ARM Registers
• ARM has a load store architecture
• General purpose registers can hold data or
address
• Total of 37 registers each 32 bit wide
• There are 18 active registers
– 16 data registers
– 2 status registers
ARM Registers (2)
• Registers R0 - R12 are general purpose
registers
• R13 is used as stack pointer (SP)
• R14 is used as link register (LR)
• R15 is used as the program counter (PC)
• CPSR – Current Program Status Register
• SPSR – Saved Program Status Register
ARM Registers (3)
• Three of the 16 visible registers have special roles:
– Stack pointer : Software normally uses R13 as a Stack Pointer
(SP). R13 is used by the PUSH and POP instructions in T variants.
– Link register :Register 14 is the Link Register (LR). This register
holds the address of the next instruction after a Branch and Link
(BL or BLX) instruction, which is the instruction used to make a
subroutine call. It is also used for return address information on
entry to exception modes. At all other times, R14 can be used as
a general-purpose register.
– Program counter: Register 15 is the Program Counter (PC). It
can be used in most instructions as a pointer to the instruction
which is two instructions after the instruction being executed. In
ARM state, all ARM instructions are four bytes long (one 32-bit
word) and are always aligned on a word boundary. In Thumb and
Jazelle states, the PC can be halfword (16-bit) and byte aligned
respectively.
ARM Registers (4)
• Program status register
– The current operating processor status is in the
Current Program Status Register (CPSR).
– CPSR is used to control and store CPU states
– CPSR is divided in four 8 bit fields
• Flags
• Status
• Extension
• Control
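A CPSR value can be unpacked as in the Python sketch below. The bit positions are taken from the ARM architecture rather than from the slide text: N, Z, C, V sit in bits 31 to 28 of the flags field; I, F, T and the five mode bits sit in the control field. The function name `decode_cpsr` and the sample value are illustrative.

```python
MODES = {
    0b10000: "usr", 0b10001: "fiq", 0b10010: "irq", 0b10011: "svc",
    0b10111: "abt", 0b11011: "und", 0b11111: "sys",
}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into condition flags, control bits and mode."""
    return {
        "N": (cpsr >> 31) & 1,  # negative
        "Z": (cpsr >> 30) & 1,  # zero
        "C": (cpsr >> 29) & 1,  # carry
        "V": (cpsr >> 28) & 1,  # overflow
        "I": (cpsr >> 7) & 1,   # IRQ disabled when set
        "F": (cpsr >> 6) & 1,   # FIQ disabled when set
        "T": (cpsr >> 5) & 1,   # Thumb state when set
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
    }

fields = decode_cpsr(0x600000D3)
print(fields)
```

For the sample value, Z and C are set, both interrupts are disabled, the core is in ARM state, and the mode is supervisor.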
Current Program Status Register (CPSR)
[Figure: program status register layout, bits 31 to 0]
Mode        Description
Supervisor  Entered on reset and when a Supervisor Call (SVC) instruction is executed
FIQ         Entered when a high-priority (fast) interrupt is raised
(Both are exception modes.)
[Figure: register banks, with one cpsr and an spsr for each of the five exception modes]
– Only one 32-bit data bus for both instructions and data.
[Figure: memory as words]
ARM Memory Interface
• The ARM7TDMI processor has a Von Neumann
architecture, with a single 32-bit data bus
carrying both instructions and data.
• Only load, store, and swap instructions can access
data from memory.
• Bus interface signals
– The signals in the ARM7TDMI processor bus
interface can be grouped into four categories:
• clocking and clock control
• address class signals
• memory request signals
• data timed signals.
ARM Memory Interface
• Bus cycle types
• The ARM7TDMI processor bus interface is pipelined.
• This gives the maximum time for a memory cycle to decode the address
and respond to the access request:
• memory request signals are broadcast in the bus cycle ahead of the bus cycle
to which they refer
• address class signals are broadcast half a clock cycle ahead of the bus cycle to
which they refer.
• A single memory cycle is shown in Figure.
ARM Memory Interface
• Bus cycle types are encoded on the nMREQ and SEQ signals as
listed in Table.
ARM Memory Interface
• Sequential (S cycle)
– (nMREQ, SEQ) = (0, 1)
– The ARM core requests a transfer to or from an address which is either the
same, or one word or one-half-word greater than the preceding address.
• Non-sequential (N cycle)
– (nMREQ, SEQ) = (0, 0)
– The ARM core requests a transfer to or from an address which is unrelated to
the address used in the preceding cycle.
• Internal (I cycle)
– (nMREQ, SEQ) = (1, 0)
– The ARM core does not require a transfer, as it is performing an internal function,
and no useful prefetching can be performed at the same time.
• Coprocessor register transfer (C cycle)
– (nMREQ, SEQ) = (1, 1)
– The ARM core wishes to use the data bus to communicate with a coprocessor,
but does not require any action by the memory system.
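The encoding of the four cycle types is a two-bit lookup, shown in Python below. The second helper applies the address rule stated for S cycles (same address, or one halfword or one word higher). Both function names are hypothetical, and the model ignores everything else a real memory controller would check.

```python
CYCLE_TYPES = {
    (0, 0): "N",  # non-sequential memory access
    (0, 1): "S",  # sequential memory access
    (1, 0): "I",  # internal cycle
    (1, 1): "C",  # coprocessor register transfer
}

def bus_cycle(nmreq, seq):
    """Decode the (nMREQ, SEQ) pair into a cycle type letter."""
    return CYCLE_TYPES[(nmreq, seq)]

def memory_cycle_for(prev_addr, addr):
    """Classify a memory access as sequential or non-sequential by address."""
    step = addr - prev_addr
    return "S" if step in (0, 2, 4) else "N"

print(bus_cycle(0, 1))
print(memory_cycle_for(0x1000, 0x1004))
print(memory_cycle_for(0x1000, 0x2000))
```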
ARM Instruction Set
• ARM instructions fall into one of the following
three categories:
– Data processing instructions.
– Data transfer instructions.
– Control flow instructions/Branching instructions.
Features of the ARM Instruction Set
• Load-store architecture
– Process values which are in registers
– Load, store instructions for memory data accesses
• 3-address data processing instructions
• Conditional execution of every instruction
• The inclusion of very powerful load and store multiple
register instructions
• Single-cycle execution of most instructions
• Open coprocessor instruction set extension
• Very dense 16-bit compressed instruction set (Thumb)
Load-store architecture
• ARM employs a load-store architecture.
– This means that the instruction set will only process
(add, subtract, and so on) values which are in registers
(or specified directly within the instruction itself), and
will always place the results of such processing into a
register.
– The only operations which apply to memory state are
ones which copy memory values into registers (load
instructions) or copy register values into memory
(store instructions).
– ARM does not support memory-to-memory
operations.
Thumb
• Thumb is a 16-bit instruction set
– Optimized for code density from C code
– Improved performance from narrow memory
– Subset of the functionality of the ARM instruction set
• Core has two execution states – ARM and Thumb
– Switch between them using BX instruction
• Thumb has characteristic features:
– Most Thumb instructions are executed unconditionally
– Many Thumb data process instruction use a 2-address
format
– Thumb instruction formats are less regular than ARM
instruction formats, as a result of the dense encoding.
Conditional Execution (1)
• One of ARM's most interesting features is that
each instruction carries a condition field.
• Most other instruction sets only allow conditional
execution of branch instructions, based on the
state of the condition flags.
• In ARM, almost all instructions can be
conditionally executed.
• If corresponding condition is true, the instruction is
executed. If the condition is false, the instruction is
turned into a nop.
Conditional Execution (2)
• The condition is specified by suffixing the instruction with a
condition code mnemonic.
• This improves code density and performance by reducing the
number of forward branch instructions.
• Without conditional execution:
CMP r3,#0
BEQ skip
ADD r0,r1,r2
skip
• With conditional execution:
CMP r3,#0
ADDNE r0,r1,r2
• In the following example, the instruction moves r1 to r0
only if carry is set.
MOVCS r0, r1
Table: Condition code suffixes
Suffix  Meaning                         Flags
EQ      Equal                           Z = 1
NE      Not equal                       Z = 0
CS      Carry set (identical to HS)     C = 1
CC      Carry clear (identical to LO)   C = 0
MI      Minus or negative result        N = 1
PL      Positive or zero result         N = 0
VS      Overflow                        V = 1
VC      No overflow                     V = 0
AL      Always. This is the default     -
Unsigned comparisons
HI      Higher                          C = 1 AND Z = 0
HS      Higher or same                  C = 1
LS      Lower or same                   C = 0 OR Z = 1
LO      Lower (identical to CC)         C = 0
Signed comparisons
GT      Greater than                    Z = 0 AND N = V
GE      Greater than or equal           N = V
LE      Less than or equal              Z = 1 OR N != V
LT      Less than                       N != V
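The table translates directly into a flag test per suffix. The Python sketch below is one way to write it; `CONDITIONS` and `condition_passed` are names invented for this example, and each flag is passed as 0 or 1.

```python
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "CS": lambda n, z, c, v: c == 1,
    "HS": lambda n, z, c, v: c == 1,
    "CC": lambda n, z, c, v: c == 0,
    "LO": lambda n, z, c, v: c == 0,
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VS": lambda n, z, c, v: v == 1,
    "VC": lambda n, z, c, v: v == 0,
    "HI": lambda n, z, c, v: c == 1 and z == 0,
    "LS": lambda n, z, c, v: c == 0 or z == 1,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "LE": lambda n, z, c, v: z == 1 or n != v,
    "AL": lambda n, z, c, v: True,
}

def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with this suffix would execute."""
    return CONDITIONS[cond](n, z, c, v)

# Flags after comparing two equal values: Z = 1 and C = 1
flags = dict(n=0, z=1, c=1, v=0)
print(condition_passed("EQ", **flags), condition_passed("HI", **flags))
```

An instruction whose condition fails is simply skipped, which is what lets ADDNE replace a compare-and-branch sequence.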
The Condition Field
• Condition codes and Status flags:
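[Figure: data processing instruction format, bits 31 to 0. Bits 31 to 28 hold
Cond. Other labelled fields: arithmetic/logic function, set condition codes,
first operand register, destination register, and the second operand. The
second operand is either an 8-bit immediate with a rotate field (#rot,
immediate alignment) or a register Rm with a shift type (Sh) and shift
amount (#shift).]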
The ARM instruction set
Data processing instructions:
• Arithmetic operations examples.
ADD r0, r1, r2 ;r0:= r1 + r2
ADC r0, r1, r2 ;r0:= r1 + r2 +C
SUB r0, r1, r2 ;r0:= r1 - r2
SBC r0, r1, r2 ;r0:= r1 - r2 + C - 1
RSB r0, r1, r2 ;r0:= r2 – r1
RSC r0, r1, r2 ;r0:= r2 – r1 + C – 1
• Some other Examples
– SUBGT r3, r3, #1
– RSBLES r4, r5, #5
– ADD r0, r2, r1, LSL #2
– RSB r4, r3, r2, LSL #3
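The arithmetic definitions above can be written out in Python, truncating every result to 32 bits. The carry-using forms exist mainly for multi-word arithmetic; `add64` below is a hypothetical helper that shows a 64-bit addition built from a low-word add followed by ADC on the high words.

```python
MASK = 0xFFFFFFFF

def add(a, b):    return (a + b) & MASK
def adc(a, b, c): return (a + b + c) & MASK
def sub(a, b):    return (a - b) & MASK
def sbc(a, b, c): return (a - b + c - 1) & MASK
def rsb(a, b):    return (b - a) & MASK
def rsc(a, b, c): return (b - a + c - 1) & MASK

def add64(a, b):
    """64-bit add from two 32-bit steps, as ADDS followed by ADC would do."""
    lo_sum = (a & MASK) + (b & MASK)
    carry = lo_sum >> 32               # carry out of the low word
    lo = lo_sum & MASK
    hi = adc(a >> 32, b >> 32, carry)  # high words plus the carry
    return (hi << 32) | lo

print(hex(add64(0x00000001FFFFFFFF, 1)))
print(sbc(10, 3, 1), sbc(10, 3, 0))
```

Note that for subtraction the carry flag acts as "not borrow": with C = 1, SBC behaves like a plain SUB.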
The ARM instruction set
Data processing instructions:
• Bit-wise logical operations.
– Perform the specified Boolean logic operation on each bit
pair of the input operands, so in the first case r0[i]:= r1[i]
AND r2[i] for each value of i from 0 to 31 inclusive, where
r0[i] is the ith bit of r0.
• AND, OR (here called ORR), XOR (here called EOR) logical operations
and BIC (stands for ‘bit clear’).
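The four bit-wise operations map onto Python's integer operators once results are masked to 32 bits. The function names are chosen for this sketch; BIC clears in the first operand every bit that is set in the second.

```python
MASK = 0xFFFFFFFF

def and_(a, b): return a & b & MASK
def orr(a, b):  return (a | b) & MASK
def eor(a, b):  return (a ^ b) & MASK
def bic(a, b):  return a & ~b & MASK  # a AND (NOT b)

print(bin(and_(0b1100, 0b1010)))
print(bin(orr(0b1100, 0b1010)))
print(bin(eor(0b1100, 0b1010)))
print(hex(bic(0xFF, 0x0F)))
```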
The ARM instruction set
Data processing instructions:
• Bit-wise logical operations examples.
The ARM instruction set
Data processing instructions:
• Comparison operations examples.
PRE cpsr = nzcvqiFt_USER
r0 = 4 r9 = 4
CMP r0, r9
POST cpsr = nZCvqiFt_USER
• You can see that both registers, r0 and r9, are equal before
executing the instruction.
• Prior to execution
– The value of the z flag is 0 and is represented by a lowercase z.
• After execution
– The z flag changes to 1 or an uppercase Z.
– The C flag is also set, because the subtraction 4 - 4 produces no borrow.
• This change indicates equality.
• The CMP is effectively a subtract instruction with the result
discarded.
The ARM instruction set
Data processing instructions:
• Comparison operations examples.
• compare
– CMP R1, R2 @ set cc on R1-R2
• compare negated
– CMN R1, R2 @ set cc on R1+R2
• bit test
– TST R1, R2 @ set cc on R1 and R2
• test equal
– TEQ R1, R2 @ set cc on R1 xor R2
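How the four comparison instructions set the flags can be modelled in Python. The sketch returns (N, Z, C, V) for CMP and CMN; for TST and TEQ it returns only (N, Z), on the assumption that V is left alone and C comes from the shifter. On ARM, C after a subtraction means "no borrow", so equal operands give C = 1. All function names are illustrative.

```python
MASK = 0xFFFFFFFF

def cmp_flags(a, b):
    """Flags (N, Z, C, V) for CMP a, b, that is a - b with the result discarded."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                       # set when no borrow occurs
    v = ((a ^ b) & (a ^ result)) >> 31    # signed overflow
    return n, z, c, v

def cmn_flags(a, b):
    """Flags (N, Z, C, V) for CMN a, b, that is a + b with the result discarded."""
    a &= MASK
    b &= MASK
    total = a + b
    result = total & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(total > MASK)                 # carry out of bit 31
    v = ((~(a ^ b)) & (a ^ result) & MASK) >> 31
    return n, z, c, v

def tst_flags(a, b):
    result = a & b & MASK
    return result >> 31, int(result == 0)

def teq_flags(a, b):
    result = (a ^ b) & MASK
    return result >> 31, int(result == 0)

print(cmp_flags(4, 4))
print(cmp_flags(3, 5))
print(teq_flags(0x55, 0x55))
```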
The ARM instruction set
Data processing instructions:
• Multiplication operations.
– The multiply instructions multiply the contents of a pair
of registers and, depending upon the instruction,
accumulate the result with another register.
– The long multiplies accumulate onto a pair of registers
representing a 64-bit value. The final result is placed in a
destination register or a pair of registers.
The ARM instruction set
Data processing instructions:
• Multiplication operations.
• Multiply:
MUL R0, R1, R2 ; R0 = (R1xR2)[31:0]
• Multiply-accumulate:
MLA r4, r3, r2, r1 ; r4 := (r3 x r2 + r1)[31:0]
• Multiplying two 32-bit integers gives a 64-bit result, the least significant
32 bits of which are placed in the result register and the rest are ignored.
• This can be viewed as multiplication in modulo arithmetic and gives the
correct result whether the operands are viewed as signed or unsigned
integers.
• Operand restrictions
– Immediate second operands are not supported.
– The result register must not be the same as the first source register.
– The destination register Rd must not be the same as the operand register Rm.
– R15 must not be used as an operand or as the destination register.
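The claim that the low 32 bits are correct for both signed and unsigned operands can be checked with a few lines of Python. `mul`, `mla` and `to_signed` are helper names made up for this sketch.

```python
MASK = 0xFFFFFFFF

def mul(rm, rs):
    """MUL: keep only the least significant 32 bits of the product."""
    return (rm * rs) & MASK

def mla(rm, rs, rn):
    """MLA: multiply, add the accumulator, keep the low 32 bits."""
    return (rm * rs + rn) & MASK

def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement signed value."""
    return x - (1 << 32) if x & 0x80000000 else x

minus_two = 0xFFFFFFFE            # bit pattern of -2
product = mul(minus_two, 3)
print(hex(product), to_signed(product))
print(mla(3, 4, 5))
```

Read as unsigned, 0xFFFFFFFE times 3 truncates to 0xFFFFFFFA; read as signed, the same bit pattern is -6, which is the expected signed result.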
The ARM instruction set
Data processing instructions:
• Register movement operations.
– Move is the simplest ARM instruction.
– It copies N into a destination register Rd, where N is a
register or immediate value.
– This instruction is useful for setting initial values and
transferring data between registers.
The ARM instruction set
Data processing instructions:
• Register movement operations.
PRE r5 = 5 r7 = 8
MOV r7, r5 ;r7 = r5
POST r5 = 5 r7 = 5
• This example shows a simple move instruction.
• The MOV instruction takes the contents of
register r5 and copies them into register r7,
• in this case, taking the value 5, and overwriting
the value 8 in register r7.
The ARM instruction set
Data processing instructions:
• Register movement operations.
– MVN r0, r2 ;r0= not r2
• The 'MVN' mnemonic stands for 'move negated';
• it leaves the result register set to the value
obtained by inverting every bit in the source
operand.
• Examples:
– MOVS r2, #10
– MVNEQ r1,#0
• Use MVN to:
– form a bit mask
– take the ones complement of a value.
Data operation varieties
• Logical shift:
– fills with zeroes
• Arithmetic shift:
– fills with sign bit on shift right
• RRX performs 33-bit rotate, including C bit
from CPSR above sign bit.
ARM shift operations
• The available shift operations are:
– LSL: logical shift left by 0 to 31 places; fill the
vacated bits at the least significant end of the
word with zeros.
– LSR: logical shift right by 1 to 32 places (as an immediate
shift amount); fill the vacated bits at the most significant
end of the word with zeros.
ARM shift operations
• The available shift operations are:
– ASL: arithmetic shift left; this is a synonym for LSL.
– ASR: arithmetic shift right by 1 to 32 places (as an immediate shift amount);
• fill the vacated bits at the MSB end of the word with
zeros if the source operand was positive, or with ones if
the source operand was negative.
ARM shift operations
• The available shift operations are:
– ROR: rotate right by 1 to 31 places (as an immediate shift amount);
– RRX: rotate right extended by 1 place;
Data transfer instructions
• Data transfer instructions move data between ARM
registers and memory.
• There are three basic forms of data transfer instruction in
the ARM instruction set:
– Single register load and store instructions.
• These instructions provide the most flexible way to transfer single
data items between an ARM register and memory.
• The data item may be a byte, a 32-bit word, or a 16-bit half-word.
– Multiple register load and store instructions.
• These instructions are less flexible than single register transfer
instructions, but enable large quantities of data to be transferred
more efficiently.
• They are used for procedure entry and exit, to save and restore
workspace registers, and to copy blocks of data around memory.
– Single register swap instructions.
• These instructions allow a value in a register to be exchanged with a
value in memory, effectively doing both a load and a store operation
in one instruction.
ARM load/store instructions
• The ARM is a Load/Store Architecture:
– Does not support memory to memory data processing
operations.
– Must move data values into registers before using them.
• This might sound inefficient, but in practice isn’t:
– Load data values from memory into registers.
– Process data in registers using a number of data processing
instructions which are not slowed down by memory access.
– Store results from registers out to memory.
ARM load/store instructions
• The ARM has three sets of instructions which interact
with main memory. These are:
– Single register data transfer (LDR/STR)
– Block data transfer (LDM/STM)
– Single Data Swap (SWP)
• The basic load and store instructions are:
– Load and Store Word or Byte or Halfword
• LDR / STR / LDRB / STRB / LDRH / STRH
Single-Register Load-Store Instructions
• Load Store instructions are used to transfer data
between memory and registers.
• Single Register Transfer
• These instructions are used to transfer a single data
item in and out of a register.
• Single register load and store instructions transfer
signed and unsigned bytes, 16-bit half words and 32-bit
words.
• Syntax:
– LDR/STR{cond}{word/half word/byte} Rd, <address>
ARM load/store instructions
• LDR, LDRH, LDRB : load (Word, half-word, byte)
• STR, STRH, STRB : store (Word, half-word, byte)
• Addressing modes:
– register indirect : LDR r0,[r1]
– with second register : LDR r0,[r1,-r2]
– with constant : LDR r0,[r1,#4]
Single register data transfer
• The basic load and store instructions are:
– Load and Store Word or Byte
• LDR / STR / LDRB / STRB
• ARM Architecture Version 4 also adds support for halfwords
and signed data.
– Load and Store Halfword
• LDRH / STRH
– Load Signed Byte or Halfword - load value and sign extend it to 32
bits.
• LDRSB / LDRSH
• All of these instructions can be conditionally executed by
inserting the appropriate condition code after STR / LDR.
– e.g. LDREQB
• Syntax:
– <LDR|STR>{<cond>}{<size>} Rd, <address>
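The difference between the plain and the signed loads is sign extension to 32 bits, sketched here in Python. The model assumes a little-endian, byte-addressed memory held in a `bytearray`; the function names mirror the mnemonics but are otherwise made up for this example.

```python
MASK = 0xFFFFFFFF

def ldrb(memory, addr):
    """Load a byte and zero-extend it to 32 bits."""
    return memory[addr]

def ldrsb(memory, addr):
    """Load a byte and sign-extend it to 32 bits."""
    value = memory[addr]
    return (value - 0x100) & MASK if value & 0x80 else value

def ldrh(memory, addr):
    """Load a halfword (little-endian) and zero-extend it."""
    return memory[addr] | (memory[addr + 1] << 8)

def ldrsh(memory, addr):
    """Load a halfword (little-endian) and sign-extend it."""
    value = ldrh(memory, addr)
    return (value - 0x10000) & MASK if value & 0x8000 else value

memory = bytearray([0xFE, 0xFF, 0x7F, 0x00])
print(hex(ldrb(memory, 0)), hex(ldrsb(memory, 0)))
print(hex(ldrh(memory, 0)), hex(ldrsh(memory, 0)))
```

The byte 0xFE loads as 254 with LDRB but as -2 (0xFFFFFFFE) with LDRSB.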
Single Register Load-Store Instructions
Data Transfer: Memory to
Register
• To transfer a word of data, we need to specify
two things:
–Register: r0-r15
–Memory address: more difficult
• How do we specify the memory address of data to
operate on?
• We will look at different ways of how this is done in
ARM
Remember: Load value/data FROM memory
Addressing Modes
• There are many ways in ARM to specify the
address; these are called addressing modes.
• Two basic classifications
1. Base register Addressing
▪ Register holds the 32 bit memory address
▪ Also called the base address
2. Base Displacement Addressing mode
▪ An effective address is calculated :
Effective address = < Base address +offset>
▪ Base address in a register as before
▪ Offset can be specified in different ways
Base Register Addressing Modes
• Specify a register which contains the memory
address
– In case of the load instruction (LDR) this is the memory
address of the data that we want to retrieve from memory
– In case of the store instruction (STR), this is the memory
address where we want to write the value which is
currently in a register
• Example: [r0]
–specifies the memory address pointed to by the
value in r0
Data Transfer: Memory to Register
• Load Instruction Syntax:
1 2, [3]
–where
1) operation name
2) register that will receive value
3) register containing pointer to memory
• ARM Instruction Name:
–LDR (meaning Load Register, so 32 bits or one
word are loaded at a time)
Data Transfer: Memory to Register
– LDR r2,[r1]
This instruction will take the address in r1, and then load a 4
byte value from the memory pointed to by it into register r2
• Note: r1 is called the base register
[Figure: memory, with registers r1 and r2]
STMIB/LDMIB  Increment Before  Rd -> mem32[address+4] / Rd <- mem32[address+4]
STMDA/LDMDA  Decrement After   Rd -> mem32[address] / Rd <- mem32[address]
STMDB/LDMDB  Decrement Before  Rd -> mem32[address-4] / Rd <- mem32[address-4]
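The addresses touched by a multiple-register transfer can be sketched in Python as a model, not ARM code. The function name `block_addresses` is illustrative. The model includes the Increment After (IA) mode for comparison and assumes that the lowest-numbered register always uses the lowest address.

```python
# Model of the word addresses used by LDM/STM with nregs registers.
# Returns the ascending address list and the base value after writeback.
def block_addresses(base, nregs, mode):
    size = 4 * nregs
    if mode == "IA":      # increment after
        start, final = base, base + size
    elif mode == "IB":    # increment before
        start, final = base + 4, base + size
    elif mode == "DA":    # decrement after
        start, final = base - size + 4, base - size
    elif mode == "DB":    # decrement before
        start, final = base - size, base - size
    else:
        raise ValueError("unknown mode: " + mode)
    return [start + 4 * i for i in range(nregs)], final

ib_addresses, ib_final = block_addresses(0x1000, 3, "IB")
db_addresses, db_final = block_addresses(0x1000, 3, "DB")
```

For three registers and a base of 0x1000, the IB form uses 0x1004 to 0x100C and the DB form uses 0xFF4 to 0xFFC.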
Multiple Data Transfer Instruction
• Block Copy
• Copy a block of memory, which is an exact multiple of 12 words long, from
the location pointed to by r12 to the location pointed to by r13. r14 points
to the end of block to be copied.
• ;r12 points to the start of the source data
• ;r14 points to the end of the source data
• ;r13 points to the start of the destination data
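The copy loop itself is not shown in these slides. The Python model below is one plausible reading, under stated assumptions: each pass moves 12 words with an increment-after load-multiple and store-multiple with writeback, and the end pointer holds the address just past the last source word. The name `block_copy` and the sample data are illustrative.

```python
# Model of a 12-words-per-pass block copy.
# mem is a dict of word address -> value; src, dst, end play the roles of r12, r13, r14.
def block_copy(mem, src, dst, end, words_per_pass=12):
    while src != end:                        # compare source pointer with end pointer
        regs = [mem[src + 4 * i] for i in range(words_per_pass)]   # load 12 words
        for i, value in enumerate(regs):                            # store 12 words
            mem[dst + 4 * i] = value
        src += 4 * words_per_pass            # writeback of the source pointer
        dst += 4 * words_per_pass            # writeback of the destination pointer
    return src, dst

mem = {0x1000 + 4 * i: i * i for i in range(24)}   # 24 words = two passes
final_src, final_dst = block_copy(mem, 0x1000, 0x2000, 0x1000 + 4 * 24)
```

Because the loop exits only when the source pointer equals the end pointer, the block length must be an exact multiple of 12 words, as the slide states.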
• Register Addressing:-
– The operand is held in a register instead of being given as data in the instruction.
– MOV R0, R1 ; R0 gets the value of R1
– ADD R0, R1, R2 ; R0 gets the sum of R1 and R2
Addressing Modes
• Indirect Addressing modes:-
• If the address of the variable is outside the 4 KB offset range, put
the address in a register and use that register in
indirect addressing mode.
• LDR R0, [R1] ; R0 gets the value stored at the
address held in register R1.
• STR R0, [R1] ; The memory location whose address is in
register R1 gets the value of R0.
Base relative addressing
• Relative Addressing modes:-
• Here the address of the memory operand is given by a register plus a
numeric displacement.
– Eg: LDR R0, [R1, #0x05] ;
– R0 gets data from the memory location pointed to by (R1 + 0x05)
– R1 remains unchanged.
– Eg: LDR R0, [R1, #0x05]! ;
– First: R1 gets R1 + 0x05
– Then: R0 gets data from the new memory location pointed to by R1.
– This is called PRE-INDEX Addressing.
– Eg: LDR R0, [R1], #0x05 ;
– First: R0 gets data from the memory location pointed to by R1
– Then: R1 gets R1 + 0x05
– This is called POST-INDEX Addressing.
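The three forms differ only in which address is used for the access and whether the base register is updated. A Python model (not ARM code) makes this concrete. The function name `ldr`, the mode labels, and the sample memory are illustrative, and an offset of 4 is used here so that the word access stays aligned.

```python
# Model of LDR Rd, [Rn, #off] / LDR Rd, [Rn, #off]! / LDR Rd, [Rn], #off.
# mem is a dict of word address -> value; regs is a dict of register name -> value.
def ldr(mem, regs, rd, rn, offset=0, mode="offset"):
    base = regs[rn]
    updated = (base + offset) & 0xFFFFFFFF
    addr = base if mode == "post" else updated   # post-index accesses the old base
    regs[rd] = mem[addr]
    if mode in ("pre", "post"):                  # both of these write the base back
        regs[rn] = updated
    return addr

mem = {0x200: 11, 0x204: 22}

regs_offset = {"r0": 0, "r1": 0x200}
ldr(mem, regs_offset, "r0", "r1", 4, "offset")   # LDR r0, [r1, #4]

regs_pre = {"r0": 0, "r1": 0x200}
ldr(mem, regs_pre, "r0", "r1", 4, "pre")         # LDR r0, [r1, #4]!

regs_post = {"r0": 0, "r1": 0x200}
ldr(mem, regs_post, "r0", "r1", 4, "post")       # LDR r0, [r1], #4
```

Plain offset and pre-index both load from 0x204, but only pre-index changes r1. Post-index loads from 0x200 and then moves r1 to 0x204.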
Base plus index addressing
• Base plus index addressing:
• the instruction specifies a base register and another register (the
index) which is added to the base to form the memory address.
• Here address of the memory operand is given by a sum of two
registers, where one register acts as the base, and the other acts as the
index register.
– Eg: LDR R0, [R1, R2];
– R0 gets data from the memory location pointed by (R1 + R2)
– R1 remains unchanged.
– Eg: LDR R0, [R1, R2]! ;
– First: R1 gets R1 + R2
– Then: R0 gets data from the new memory location pointed by R1.
– This is called PRE-INDEX Addressing.
– Eg: LDR R0, [R1], R2;
– First: R0 gets data from the memory location pointed by R1
– Then: R1 gets R1 + R2
– This is called POST-INDEX Addressing.
Base plus scaled index addressing
• Base plus scaled index addressing:
• Here address of the memory operand is given by a sum of two registers
• The first register acts as a base register. The second register can be scaled
by shifting left.
– Eg: LDR R0, [R1, R2, LSL #2];
– R0 gets data from the location pointed by (R1 + R2 left-shifted by 2)
– R1 remains unchanged.
– Eg: LDR R0, [R1, R2, LSL #2]!;
– First: R1 gets R1 + R2 left-shifted by 2
– Then: R0 gets data from the new memory location pointed by R1.
– This is called PRE-INDEX Addressing.
– Eg: LDR R0, [R1], R2, LSL #2;
– First: R0 gets data from the memory location pointed by R1
– Then: R1 gets R1 + R2 left-shifted by 2
– This is called POST-INDEX Addressing.
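A common use of the scaled index is stepping through an array of words: shifting the index left by 2 multiplies it by 4, the size of one word. The Python sketch below models only the address arithmetic; `scaled_address` and the base address 0x8000 are illustrative.

```python
# Effective address for [Rn, Rm, LSL #shift]: base + (index << shift), kept to 32 bits.
def scaled_address(base, index, shift):
    return (base + (index << shift)) & 0xFFFFFFFF

table_base = 0x8000                                   # hypothetical start of a word array
element_3 = scaled_address(table_base, 3, 2)          # like LDR R0, [R1, R2, LSL #2] with R2 = 3
```

With a base of 0x8000 and an index of 3, the access goes to 0x800C, the fourth word of the array.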
Memory Addressing Modes
• Pre-indexed mode
– The effective address of the operand is the sum of the
contents of the base register Rn and an offset value
• Pre-indexed with writeback mode
– The effective address of the operand is generated in
the same way as in the Pre-indexed mode, and then
the effective address is written back into Rn
• Post-indexed mode
– The effective address of the operand is the contents
of Rn. The offset is then added to this address and the
result is written back into Rn.
Register-indirect addressing
• The memory location to be accessed is held in a base register
– STR r0, [r1] ; Store contents of r0 to location pointed to
; by contents of r1.
– LDR r2, [r1] ; Load r2 with contents of memory location
; pointed to by contents of r1.
[Figure: r0 (source register for STR) = 0x5; r1 (base register) = 0x200; memory at address 0x200 holds 0x5; r2 (destination register for LDR) = 0x5]
Base plus offset addressing
• As well as accessing the actual location contained in the base
register, these instructions can access a location offset from
the base register pointer.
• This offset can be
– An unsigned 12-bit immediate value (i.e. 0 to 4095 bytes).
– A register, optionally shifted by an immediate value
• This can be either added or subtracted from the base
register:
– Prefix the offset value or register with ‘+’ (default) or ‘-’.
• This offset can be applied:
– before the transfer is made: Pre-indexed addressing
• optionally auto-incrementing the base register, by postfixing the
instruction with an ‘!’.
– after the transfer is made: Post-indexed addressing
• causing the base register to be auto-incremented.
Pre-indexed Addressing
• Example: STR r0, [r1,#12]
[Figure: r0 (source register for STR) = 0x5; r1 (base register) = 0x200; offset = 12; the value 0x5 is written to memory address 0x20c]
[Figure: r1 (original base register) = 0x200; offset = 12; r1 (updated base register) = 0x20c; r0 (source register for STR) = 0x5; memory address 0x200 holds 0x5]
[Figure: swap operation: (1) the memory location addressed by Rn is read into temp, (2) Rm is written to that memory location, (3) temp is written to Rd]
Swap and Swap Byte Instructions
• Example
• The swap instruction loads a word from memory into
register r0 and overwrites the memory with register r1.
• PRE mem32[0x9000] = 0x12345678
• r0 = 0x00000000
• r1 = 0x11112222
• r2 = 0x00009000
• SWP r0, r1, [r2]
• POST mem32[0x9000] = 0x11112222
• r0 = 0x12345678
• r1 = 0x11112222
• r2 = 0x00009000
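The PRE and POST states above can be reproduced with a short Python model (not ARM code). The function name `swp` is illustrative, and the model ignores the fact that the hardware performs the read and the write as one uninterruptible bus operation.

```python
# Model of SWP Rd, Rm, [Rn]: read the old memory word, write Rm, then update Rd.
def swp(mem, regs, rd, rm, rn):
    addr = regs[rn]
    temp = mem[addr]        # step 1: read the old memory word
    mem[addr] = regs[rm]    # step 2: overwrite memory with Rm
    regs[rd] = temp         # step 3: old memory word goes to Rd

mem = {0x9000: 0x12345678}
regs = {"r0": 0x00000000, "r1": 0x11112222, "r2": 0x00009000}
swp(mem, regs, "r0", "r1", "r2")   # SWP r0, r1, [r2]
```

Reading memory into a temporary first is what allows the same register to be used as both Rd and Rm.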
Control Flow Instructions
• This category of instructions neither processes
data nor moves it around; it simply determines
which instructions get executed next.
– Branch instructions
– Conditional branches
– Conditional execution
– Branch and link instructions
– Subroutine return instructions
– Supervisor calls
– Jump tables
Branch Instructions
• Change the sequential flow of instruction execution by
modifying the program counter.
– Branch : B{<cond>} label
– Branch with Link : BL{<cond>} sub_routine_label
Bits:   31-28 | 27-25 | 24 | 23-0
Field:  Cond  | 1 0 1 | L  | Offset
• Branch (B)
– jumps in a range of +/-32 MB.
• Branch with link (BL)
– suitable for subroutine calls: it stores the address of the
instruction after the BL in the link register (lr), and the
subroutine returns by restoring the program counter (pc) from
the link register.
Branch Instructions
• Table 1.1 shows the four branch operations with their
mnemonics and explanations.
• A branch instruction changes the PC to point to the target location
specified by the label, so the sequence of execution is altered
accordingly.
• The PC-relative offset for branch instructions is
calculated by:
– a) Taking the difference between the branch instruction
and the target address minus 8 (to allow for the pipeline).
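The offset calculation can be sketched in Python as a model. It assumes ARM state, where instructions are word-aligned, so the byte difference is shifted right by 2 and stored in the 24-bit Offset field. That is what gives the range of about +/-32 MB. The function names are illustrative.

```python
# Encode the 24-bit offset field of B/BL from the branch and target addresses.
def encode_branch_offset(branch_addr, target_addr):
    diff = target_addr - (branch_addr + 8)      # PC reads 8 bytes ahead (pipeline)
    if diff % 4 != 0:
        raise ValueError("target is not word-aligned relative to the branch")
    if not -(1 << 25) <= diff < (1 << 25):
        raise ValueError("target is outside the +/-32 MB range")
    return (diff >> 2) & 0xFFFFFF               # signed word offset, 24 bits

# Recover the target address from the offset field (sign-extend, shift, add PC).
def decode_branch_target(branch_addr, field):
    if field & 0x800000:
        field -= 1 << 24
    return (branch_addr + 8 + (field << 2)) & 0xFFFFFFFF

loop_to_self = encode_branch_offset(0x8000, 0x8000)   # a branch to itself
skip_one = encode_branch_offset(0x8000, 0x800C)       # skips two instructions
```

A branch to its own address needs an offset of -8 bytes, which is -2 words, so the field holds 0xFFFFFE.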
[Figure: nested subroutine calls]
caller:  BL func1
func1:   STMFD sp!,{regs,lr}
         :
         BL func2
         :
         LDMFD sp!,{regs,pc}
func2:   :
         MOV pc, lr
Branch and Link Instructions
• Perform a branch, save the address following the branch in
the link register, r14
BL SUBR ;branch to SUBR
… ;return here
SUBR … ;subroutine entry point
MOV PC,r14 ;return
• For nested subroutines, push r14 and any work registers
that must be preserved onto a stack in memory
BL SUB1
…
SUB1 STMFD r13!,{r0-r2,r14} ;save work and link regs
…
…
…
LDMFD r13!,{r0-r2,PC} ;restore work regs and return
Branch Instructions
• The most common way to switch program execution from one place
to another is to use the branch instruction:
B LABEL
…
LABEL …
• LABEL may come before or after the branch instruction.
• Example:
        B Forward
        ADD r1, r2, #4
        ADD r0, r6, #2
        ADD r3, r7, #4
Forward
        SUB r1, r2, #4
Backward
        ADD r1, r2, #4
        SUB r1, r2, #4
        ADD r4, r6, r7
        B Backward
Conditional Branches
• A conditional branch has a condition associated with it
and is only taken if the condition flags have the
required value; otherwise it is not taken
MOV r0,#0 ;initialize counter
Loop …
ADD r0,r0,#1 ;increment loop counter
CMP r0,#10 ;compare with limit
BNE Loop ;repeat if not equal
… ;else fall through
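The loop above depends on CMP setting the Z flag when its operands are equal. The Python model below (not ARM code) computes the four condition flags for a 32-bit comparison and then walks the same counting loop. The name `cmp_flags` is illustrative.

```python
# Flags produced by CMP a, b: the subtraction a - b is done, the result is discarded.
def cmp_flags(a, b):
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a - b) & 0xFFFFFFFF
    return {
        "N": result >> 31,                      # result is negative
        "Z": int(result == 0),                  # result is zero (operands equal)
        "C": int(a >= b),                       # no borrow occurred (unsigned)
        "V": ((a ^ b) & (a ^ result)) >> 31,    # signed overflow
    }

r0 = 0                                  # MOV r0,#0
passes = 0
while True:
    passes += 1                         # loop body
    r0 += 1                             # ADD r0,r0,#1
    if cmp_flags(r0, 10)["Z"] == 1:     # CMP r0,#10 ; BNE is not taken once Z is set
        break
```

BNE is taken while Z is clear, so the body runs ten times before execution falls through.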
Conditional Branches
Program Status Register Instructions
• You can switch modes using the MRS and MSR instructions:
MRS reads the CPSR into a general register, and MSR writes a new
value, including the mode bits, back to the CPSR.
• Changing the mode does not affect interrupts. If you want to disable
interrupts at the same time that you change mode you need to also
change the F and I interrupt bits in the CPSR.
• MRS Move to ARM register from status register (cpsr or spsr )
• 1. MRS<cond> Rd, cpsr
• 2. MRS<cond> Rd, spsr
• These instructions set Rd = cpsr and Rd = spsr, respectively. Rd must not be pc.
• MSR Move to status register (cpsr or spsr ) from an ARM register
• 1. MSR<cond> cpsr_<fields>, #<rotated_immed>
• 2. MSR<cond> cpsr_<fields>, Rm
• 3. MSR<cond> spsr_<fields>, #<rotated_immed>
• 4. MSR<cond> spsr_<fields>, Rm
Program Status Register Instructions
• MRS - Move to Register from Status
– MRS is used to read from either the CPSR or the SPSR.
It moves the value from the status register into a regular
register.
– The SPSR that will be read is the one that is active for the
CPU’s current mode.
– Example:
MRS R0, CPSR
MRS R1, SPSR
• Note
• Reading the SPSR while in user or system mode is not valid
and yields unpredictable results.
Program Status Register Instructions
• MSR - Move to Status from Register
– The MSR instruction is used to write to the CPSR
or the SPSR of the current mode.
– In User mode, writes to the CPSR control bits (mode and
interrupt masks) are ignored; only the condition flags can be changed.
– The control bits of the CPSR can only be written in a
privileged mode.
– Example:
MSR CPSR, R0
MSR SPSR, R1
Program Status Register Instructions
• These instructions alter selected bytes of the cpsr or spsr according to the
value of <mask>.
• The <fields> specifier is a sequence of one or more letters, determining
which bytes of <mask> are set. See Table A.9.
Action
1. cpsr = (cpsr & ~<mask>) | (<rotated_immed> & <mask>)
2. cpsr = (cpsr & ~<mask>) | (Rm & <mask>)
3. spsr = (spsr & ~<mask>) | (<rotated_immed> & <mask>)
4. spsr = (spsr & ~<mask>) | (Rm & <mask>)
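The masked update can be sketched in Python as a model. It uses the standard field letters, where c selects bits 7:0 (control), x bits 15:8, s bits 23:16 and f bits 31:24 (flags). The function name `msr` and the sample values are illustrative.

```python
# Byte masks selected by each letter of the <fields> specifier.
FIELD_MASKS = {"c": 0x000000FF, "x": 0x0000FF00, "s": 0x00FF0000, "f": 0xFF000000}

# Model of MSR: psr = (psr & ~mask) | (value & mask).
def msr(psr, value, fields):
    mask = 0
    for letter in fields:
        mask |= FIELD_MASKS[letter]
    return (psr & ~mask & 0xFFFFFFFF) | (value & mask)

old_cpsr = 0x600000D3
new_control = msr(old_cpsr, 0x1234561F, "c")   # like MSR cpsr_c, Rm
new_flags = msr(old_cpsr, 0x80000010, "f")     # like MSR cpsr_f, Rm
```

Only the bytes named in the field list change. Writing with the c field replaces the mode and interrupt bits and leaves the condition flags alone.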
Exceptions
• Exceptions are generated by internal and external
sources to cause the processor to handle an event,
such as an externally generated interrupt or an
attempt to execute an Undefined instruction.
• The processor state just before handling the
exception is normally preserved so that the original
program can be resumed when the exception routine
has completed.
• More than one exception can arise at the same time.
Exception handling
• Exception:
– Any condition that needs to halt normal
sequential execution of instructions
• ARM core is reset
• Instruction fetch or memory access fails
• Undefined instruction is encountered
• Software interrupt instruction is executed
• External interrupt has been raised
• The ARM architecture supports seven types of
exception.
• When an exception occurs, execution is forced
from a fixed memory address corresponding
to the type of exception. These fixed
addresses are called the exception vectors.
ARM Exception Types
• The ARM recognises seven different types of
exceptions.
– Reset
– Undefined instruction
– Software Interrupt (SWI)
– Prefetch Abort
– Data Abort
– IRQ
– FIQ
ARM Exceptions Types (Cont.)
• Reset
– Occurs when the processor reset pin is asserted
• For signalling Power-up
• For resetting as if the processor has just powered up
– Software reset
• Can be done by branching to the reset vector (0x0000)
• Undefined instruction
– Occurs when neither the processor nor any coprocessor
can recognize the currently executing
instruction
ARM Exceptions Types (Cont.)
• Software Interrupt (SWI)
– User-defined interrupt instruction
– Allows a program running in User mode to request
privileged operations that run in Supervisor mode
• For example, RTOS functions
• Prefetch Abort
– When an instruction is fetched from an illegal address, the
instruction is flagged as invalid
– However, instructions already in the pipeline continue
to execute until the invalid instruction is reached, and
then a Prefetch Abort is generated.
ARM Exceptions Types (Cont.)
• Data Abort
– A data transfer instruction attempts to load or store
data at an illegal address
• IRQ
– The processor external interrupt request pin is
asserted (LOW) and the I bit in the CPSR is clear
(enable)
• FIQ
– The processor external fast interrupt request pin is
asserted (LOW) and the F bit in the CPSR is clear
(enable)
ARM processor exceptions and modes
ARM Vector Table
• Exception handling is controlled by a vector table.
• It is a table of addresses that the ARM core branches to
when an exception is raised. Each entry normally holds a branch
instruction that directs the core to the ISR.
• This is a reserved area of 32 bytes at the bottom of the
memory map, with one word of space allocated to each
exception type.
• The vector table starts at 0x00000000 (ARMx20 processors
can optionally relocate the vector table to
0xffff0000).
• A vector table consists of a set of ARM instructions that
manipulate the PC (i.e. B, MOV, and LDR). These
instructions cause the PC to jump to a specific location that
can handle a specific exception or interrupt.
ARM exception vector locations
Exception handling process
• When an exception occurs, control passes through an area
of memory called the vector table. This is a reserved area
usually at the bottom of the memory map.
• Figure shows the exception handling process.
ARM Exception Priorities
Response to an Exception Handler
• When an exception occurs, the ARM:
– Copies the CPSR into the SPSR for the mode
in which the exception is to be handled.
• Saves the current mode, interrupt mask, and
condition flags.
– Changes the appropriate CPSR mode bits
• Change to the appropriate mode
• Map in the appropriate banked registers for that
mode
– Disable interrupts
• IRQs are disabled when any exception occurs.
• FIQs are disabled when a FIQ occurs, and on
reset
– Set lr_mode to the return address
– Set the program counter(PC) to the vector
address for the exception
Vector Table
0x1C FIQ
0x18 IRQ
0x14 (Reserved)
0x10 Data Abort
0x0C Prefetch Abort
0x08 Software Interrupt
0x04 Undefined Instruction
0x00 Reset
Returning From an Exception Handler
• To return, exception handler needs to:
– Restore the CPSR from spsr_mode
– Restore the program counter using the return
address stored in lr_mode
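The entry and return sequences can be sketched with a small Python model of the processor state (not ARM code). The mode-bit encodings (for example 0x12 for IRQ and 0x13 for Supervisor) are the standard ARM values and are not listed on these slides. The dictionary layout and function names are illustrative. The model leaves out the Thumb bit and the per-exception adjustment of the return address.

```python
VECTORS = {"reset": 0x00, "undef": 0x04, "swi": 0x08, "pabt": 0x0C,
           "dabt": 0x10, "irq": 0x18, "fiq": 0x1C}
ENTRY_MODE = {"reset": 0x13, "undef": 0x1B, "swi": 0x13, "pabt": 0x17,
              "dabt": 0x17, "irq": 0x12, "fiq": 0x11}
I_BIT, F_BIT, MODE_MASK = 0x80, 0x40, 0x1F

def enter_exception(cpu, kind, return_addr):
    mode = ENTRY_MODE[kind]
    cpu["spsr"][mode] = cpu["cpsr"]              # copy CPSR into SPSR_mode
    cpsr = (cpu["cpsr"] & ~MODE_MASK) | mode     # change the mode bits
    cpsr |= I_BIT                                # IRQs disabled on any exception
    if kind in ("fiq", "reset"):
        cpsr |= F_BIT                            # FIQs disabled on FIQ and reset
    cpu["cpsr"] = cpsr
    cpu["lr"][mode] = return_addr                # set lr_mode to the return address
    cpu["pc"] = VECTORS[kind]                    # jump to the exception vector

def return_from_exception(cpu):
    mode = cpu["cpsr"] & MODE_MASK
    cpu["pc"] = cpu["lr"][mode]                  # restore PC from lr_mode
    cpu["cpsr"] = cpu["spsr"][mode]              # restore CPSR from spsr_mode

cpu = {"cpsr": 0x60000010, "pc": 0x8000, "spsr": {}, "lr": {}}   # User mode
enter_exception(cpu, "irq", 0x8004)
state_in_handler = {"cpsr": cpu["cpsr"], "pc": cpu["pc"]}
return_from_exception(cpu)
```

Inside the handler the model shows IRQ mode with the I bit set and the PC at 0x18. After the return, the original CPSR and the saved return address are back in place.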
Interrupt Handlers
• There are two types of interrupts available on ARM processor.
– The first type is the interrupt caused by external events from hardware
peripherals.
– The second type is the SWI instruction.
• The ARM processor has two levels of external interrupt, FIQ and
IRQ, both of which are level-sensitive active LOW signals into the
core.
• For an interrupt to be taken, the relevant input must be LOW and
the disable bit in the CPSR must be clear.
• FIQs have higher priority than IRQs in two ways:
– 1 FIQs are serviced first when multiple interrupts occur.
– 2 Servicing a FIQ causes IRQs to be disabled, preventing them from
being serviced until after the FIQ handler has re-enabled them (usually
by restoring the CPSR from the SPSR at the end of the handler).
Assigning interrupts
• How are interrupts assigned?
• It is up to the system designer to decide
which hardware peripheral can produce which
interrupt request.
– Interrupt controller
• Routes multiple external interrupts to one of the two ARM interrupt
requests
– Standard design practice
• SWIs are reserved for calling privileged operating system routines
• IRQs are assigned to general-purpose interrupts
– A periodic timer
• FIQs are reserved for a single interrupt source that requires a
fast response time
– Direct memory access to move blocks of memory
– FIQ has a higher priority and shorter interrupt latency than IRQ
Interrupt Latency
• It is the interval of time from an
external interrupt signal being raised to the
first fetch of an instruction of the ISR for
that interrupt.
• System architects must balance between two
things,
– first is to handle multiple interrupts
simultaneously,
– second is to minimize the interrupt latency.
Interrupt Latency
• Minimization of the interrupt latency is achieved
by software handlers by two main methods,
– the first one is to allow nested interrupt handling so
the system can respond to new interrupts during
handling an older interrupt.
• This is achieved by enabling interrupts immediately after the
interrupt source has been serviced but before finishing the
interrupt handling.
– The second one is the possibility to give priorities to
different interrupt sources;
• this is achieved by programming the interrupt controller to
ignore interrupts of the same or lower priority than the
interrupt being handled if there is one.
Enabling and disabling Interrupt
• This is done by modifying the CPSR, using
four ARM instructions:
– MRS To read CPSR
– MSR To store in CPSR
– BIC Bit clear instruction
– ORR OR instruction
Enabling an IRQ/FIQ interrupt (0x80 for IRQ, 0x40 for FIQ):
MRS r1, cpsr
BIC r1, r1, #0x80/0x40
MSR cpsr_c, r1
Disabling an IRQ/FIQ interrupt (0x80 for IRQ, 0x40 for FIQ):
MRS r1, cpsr
ORR r1, r1, #0x80/0x40
MSR cpsr_c, r1
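The read-modify-write on the CPSR can be sketched in Python as a model of the bit operations only. The function names are illustrative; 0x80 is the I (IRQ disable) bit and 0x40 is the F (FIQ disable) bit.

```python
I_BIT = 0x80   # set = IRQ disabled
F_BIT = 0x40   # set = FIQ disabled

# BIC clears the disable bit, which enables the interrupt.
def enable_interrupt(cpsr, bit):
    return cpsr & ~bit & 0xFFFFFFFF

# ORR sets the disable bit, which disables the interrupt.
def disable_interrupt(cpsr, bit):
    return cpsr | bit

cpsr = 0x000000D3                                  # Supervisor mode, IRQ and FIQ disabled
irq_enabled = enable_interrupt(cpsr, I_BIT)        # MRS / BIC #0x80 / MSR cpsr_c
irq_disabled = disable_interrupt(irq_enabled, I_BIT)   # MRS / ORR #0x80 / MSR cpsr_c
```

Because the bits are disable masks, clearing a bit enables the interrupt and setting it disables the interrupt. The mode bits in the low five bits are left unchanged.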
Interrupt stack
• Stacks are needed extensively for context
switching between different modes when
interrupts are raised.
• The design of the exception stack depends on
two factors:
– OS Requirements.
– Target hardware.
• A good stack design tries to avoid stack overflow
because it causes instability in embedded systems.
Setting up the interrupt stacks
• Each operating system has its own requirements for stack design
– Stack pointers are initialized after reset
• Where the interrupt stack is placed depends upon the RTOS
requirements and the specific hardware being used.
• Two design decisions need to be made for the stacks:
– The location
– The size