03 Cpu Overview


Microprocessor Systems

Fall 2024

Overview
n CPU overview
n Cortex-M0+ Processor Core
n Cortex-M0+ Processor Core Registers
n Memory System and Addressing
n Thumb Instruction Set

CPU OVERVIEW
Central Processing Unit (CPU)
n CPU is the fundamental execution/processing unit
of the computer
n CPU consists of ALU, Control Unit, and Registers
n CPU is characterized by:
n Clock frequency
n Speed
n Data bus width
n Instruction set
n Addressing capability
n Addressing capacity
Internal Structure of CPU

n Intel 4004
n Introduced in 1971 as the first commercially available single-chip microprocessor
n 12-bit address bus
n 4-bit data bus
CPU Elements
n Program Counter (PC)
n Instruction Register (IR)
n Instruction Decoder
n Arithmetic and Logic Unit (ALU)
n General Purpose Registers
n Special Purpose Registers (SP, BP, IX, CCR, etc.)
n Control Unit (CU)
Internal Structure of CPU
Example: Motorola 6802
Registers in the Fetch Unit
n Program Counter: holds the memory location
of the next instruction.
n Instruction Register: holds the current
instruction being executed
Instruction Decoder
n It decodes the instructions and generates the
control signals
Arithmetic Logic Unit (ALU)
n ALU performs all arithmetic and logic
operations in a microprocessor
n ALU has two inputs (A, B) for the operands
and one input for a control signal that selects
the operation
n Operation and shift control bits determine which type of
operation to perform (F)
n Output is the result of the operation (R) and
status information (D)
n Status information is used to indicate special cases
n Zero: all result lines have value 0
n Overflow: signed integer overflow of add and
subtract functions
n For unsigned integers, the overflow flag does not provide any
useful information; the carry flag is used instead
Registers
n A register is a storage location in the CPU
n It is used to hold data or a memory address during
the execution of an instruction
n Because the register file is small and close to the
ALU, accessing data in registers is much faster than
accessing data in memory outside the CPU
n The register file makes program execution more
efficient
n The number of registers varies from computer to
computer
Condition Code Register or Flag Register
n Depending on the outcomes of Arithmetic or Logical
operations, we can branch and jump
n The eight-bit Condition Code Register (CCR) provides a
status report on the ALU's activity
n Carry/Borrow
n Half carry from bit 3 to bit 4
n oVerflow
n CCR also provides a status report after loading ACC
n Zero
n Negative

V Z N H C
Condition Code Register (CCR)
n They flag certain conditions resulting from the ALU
outcomes
n Example:
A = 01001000, B = 01111001
A      01001000
B    + 01111001
A+B    11000001    V=1 Z=0 N=1 H=1 C=0
n Depending on the outcomes of Arithmetic or
Logical operations, we can branch and jump
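The flag values in the example can be checked with a small model. This is only a sketch of an 8-bit adder with CCR-style flags; the function name `add8_flags` is an illustration, not part of any toolchain.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags) in the style of an 8-bit CCR."""
    total = a + b
    result = total & 0xFF
    flags = {
        "C": int(total > 0xFF),                        # carry out of bit 7
        "H": int(((a & 0x0F) + (b & 0x0F)) > 0x0F),    # half carry from bit 3 to bit 4
        "Z": int(result == 0),                         # all result bits are 0
        "N": (result >> 7) & 1,                        # copy of the result's sign bit
        # signed overflow: both operands have a sign that differs from the result's sign
        "V": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags

result, flags = add8_flags(0b01001000, 0b01111001)
print(f"{result:08b}", flags)
```

For the slide's operands this gives 11000001 with V=1, Z=0, N=1, H=1, C=0: two positive signed numbers produced a negative result, so V is set even though there is no carry.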
The Stack
n A stack is a last-in-first-out data structure
n A stack of a computer works just like a real
stack, e.g., of books. If you have a stack of
books, you can put another book on top:
[Figure: BOOK3 is placed on top of a stack that already holds BOOK2 and BOOK1]
n This is called a push
n All that happens is the stack gets one book
deeper, and the last book you added is on top
The Stack

n You can also take a book off the top of the stack:
[Figure: BOOK3 is taken off the top, leaving BOOK2 and BOOK1]
n This is called a pop
n The stack gets one book shorter, and the book you get
from the top is the one you added, or pushed, most
recently
n Because a pop gives you back the item you most recently
pushed, a stack is called a last-in-first-out, or LIFO,
structure
Stack Pointer
n The stack is a way of using the memory.
n All that's needed is some unused memory and an
index register, called the Stack Pointer (SP),
that always points to the next available (empty)
location just past the current top of the stack
n The stack grows toward lower addresses

Initial SP = $A000
Address   Contents   SP after the push
$A000     D0         $9FFF
$9FFF     D1         $9FFE
$9FFE     D2         $9FFD
$9FFD     D3         $9FFC
$9FFC     D4         $9FFB
$9FFB     (empty)
$9FFA     (empty)
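A minimal sketch of this stack convention, assuming byte-wide items and the initial SP of $A000 from the figure. The class name and the dictionary used as memory are hypothetical.

```python
class EmptyDescendingStack:
    """SP points at the next free location; the stack grows toward lower addresses."""

    def __init__(self, initial_sp=0xA000):
        self.sp = initial_sp
        self.memory = {}               # address -> value (hypothetical sparse memory)

    def push(self, value):
        self.memory[self.sp] = value   # store at the free location SP points to
        self.sp -= 1                   # then move SP down to the next free location

    def pop(self):
        self.sp += 1                   # move SP back up to the most recent item
        return self.memory.pop(self.sp)

stack = EmptyDescendingStack()
for item in (0xD0, 0xD1, 0xD2):        # placeholder values standing in for D0, D1, D2
    stack.push(item)
print(hex(stack.sp))                   # 0x9ffd
print(hex(stack.pop()))                # 0xd2, the most recently pushed value
```

The pop returns the last value pushed, which is the LIFO behaviour described above.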
Control Unit
n The control unit is a synchronous sequential
logic circuit that sends control signals to the data
processing unit, memory and other parts of the
system
n The signals from the control unit tell the data
processing unit to manipulate data according to
the algorithm built into the sequential logic circuit
n The control unit is instruction controlled;
therefore it can carry out more than one algorithm based
on its design (programmable)
n Typical control units recognize several hundred
different instruction codes
System Clock
n In order to regulate when the control unit issues its
control signals, computers use a system clock
n System clock generates regular pulses to synchronize
all system events and determine the speed at which
processing can occur
n Each fetch-execute instruction cycle is divided into
states, which are one clock pulse long
n Most instructions require multiple steps, and so require
several clock pulses to complete
(multi-cycle processor design)
n Some individual steps (e.g. a memory access) take
longer and may require additional clock pulses to
complete; these clock cycles spent waiting are called
wait states
System Clock

n The clock speed of a CPU determines how
often a new instruction can be executed, and is
measured in MHz or GHz
n For example: 1.7 GHz means that a computer
could execute 1,700,000,000 instructions per
second (if it executes one instruction per cycle)
System Clock
n However, all recent microprocessors overlap the fetching,
decoding and execution of a number of instructions at the
same time – this is called pipelining
n Therefore, clock speed is not necessarily an accurate
measure of performance, and other measurements are
required
Comparing Clocks
n It is difficult to compute CPU performance just by
comparing clock speed
n You must also consider how many clock cycles it
takes to execute 1 instruction
n How fast memory is
n How fast the bus is
n etc.

n In addition, there are different clocks in the
computer: the Control Unit and the whole CPU are
governed by the system clock
n There is usually a bus clock as well to regulate the
usage of the slower buses
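The effect of cycles per instruction can be illustrated with the usual execution-time relation (instructions × cycles per instruction ÷ clock frequency). The two CPUs below are hypothetical, and memory and bus speed are ignored.

```python
def execution_time_s(instruction_count, cycles_per_instruction, clock_hz):
    """Execution time = instructions x average cycles per instruction / clock frequency."""
    return instruction_count * cycles_per_instruction / clock_hz

# Hypothetical CPUs running the same 1,000,000-instruction program
fast_clock = execution_time_s(1_000_000, cycles_per_instruction=4, clock_hz=100e6)
slow_clock = execution_time_s(1_000_000, cycles_per_instruction=1, clock_hz=48e6)
print(fast_clock, slow_clock)
```

With these assumed numbers the 48 MHz CPU finishes first, because it needs a quarter of the cycles per instruction.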
CORTEX-M0+ CPU CORE
Microcontroller vs. Microprocessor
n Both have a CPU core to execute instructions
n Microcontroller has peripherals for embedded
interfacing and control
n Analog
n Non-logic level signals
n Timing
n Clock generators
n Communications
n point to point
n network
n Reliability and safety
[Figure: microcontroller block diagram with Arm Cortex-M0+ core, debug interfaces, interrupt controller, Micro Trace Buffer, system, memory, clocks, security and integrity, analog, timers, communication interfaces, Human-Machine Interface (HMI)]
Cortex-M0+ Core
An ISA defines the hardware/software interface
n A “contract” between architects and programmers
n Register set
n Instruction set
n Addressing modes
n Word size
n Data formats
n Operating modes
n Condition codes
n Calling conventions
n Really not part of the ISA (usually)
n Rather part of the ABI
n But the ISA often provides meaningful support.
Architectures and Memory Speed
n Load/Store Architecture
n Developed to simplify CPU design and improve performance
n Memory wall: CPU speeds keep improving faster than memory speeds
n Memory accesses slow down CPU, limit compiler optimizations
n Change instruction set to make most instructions independent of
memory
n Data processing instructions can access registers only
n Load data into the registers
n Process the data
n Store results back into memory
n More effective when more registers are available
n Register/Memory Architecture
n Data processing instructions can access memory or registers
n Memory wall is not very high at lower CPU speeds (e.g. under
50 MHz)
Arm Architecture
n The ARM is a Reduced Instruction Set Computer
(RISC), as it incorporates these typical RISC
architecture features:
n a large uniform register file
n a load/store architecture, where data-processing
operations only operate on register contents, not directly
on memory contents
n simple addressing modes, with all load/store addresses
being determined from register contents and instruction
fields only
n uniform and fixed-length instruction fields, to simplify
instruction decode

Arm Architecture
n In addition, the ARM architecture provides:
n control over both the Arithmetic Logic Unit (ALU) and
shifter in most data-processing instructions to maximize
the use of an ALU and a shifter
n auto-increment and auto-decrement addressing modes
to optimize program loops
n Load and Store Multiple instructions to maximize data
throughput
n conditional execution of almost all instructions to
maximize execution throughput
n These enhancements to a basic RISC architecture allow
ARM processors to achieve a good balance of high
performance, small code size, low power consumption, and
small silicon area
ARM Processor Core Registers
Stack Pointer
Note: there are two stack pointers!
n SP_process (PSP) used by:
n Base app code (when not running an exception handler)
n SP_main (MSP) used by:
n OS kernel
n Exception handlers
n App code w/ privileged access
n Which one is active is mode dependent
ARM Processor Core Registers (32 bits each)
n R0-R12 - General purpose registers for data
processing
n SP - Stack pointer (R13)
n Can refer to one of two SPs
n Main Stack Pointer (MSP)
n Process Stack Pointer (PSP)
n Uses MSP initially, and whenever in Handler mode
n When in Thread mode, can select either MSP or PSP
using SPSEL flag in CONTROL register.
n LR - Link Register (R14)
n Holds the return address when a subroutine is called with the Branch with Link
instruction (BL)
n PC - program counter (R15)
Operating Modes
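[Figure: state diagram. Reset enters Thread mode (MSP or PSP). Starting exception processing switches to Handler mode (MSP); when exception processing is completed, the core returns to Thread mode.]
n Which SP is active depends on operating mode, and
SPSEL (CONTROL register bit 1)
n Handler mode: always MSP
n Thread mode, SPSEL == 0: MSP
n Thread mode, SPSEL == 1: PSP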
ARM Processor Core Registers

n Program Status Register (PSR) provides three views of the same
register
n Application PSR (APSR)
n Condition code flag bits Negative, Zero, oVerflow, Carry
n Interrupt PSR (IPSR)
n Holds exception number of currently executing ISR
n Execution PSR (EPSR)
n Thumb state
ARM Processor Core Registers
n PRIMASK - Exception mask register
n Bit 0: PM Flag (Priority Mask Flag)
n Set to 1 to prevent activation of all exceptions with configurable
priority
n Access using CPS, MSR and MRS instructions
n Use to prevent data race conditions with code needing
atomicity
n CONTROL
n Bit 1: SPSEL flag
n Selects SP when in thread mode: MSP (0) or PSP (1)
n Bit 0: nPRIV flag
n Defines whether thread mode is privileged (0) or unprivileged (1)
n With OS environment,
n Threads use PSP (operating system threads)
n OS and exception handlers (ISRs) use MSP
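The CONTROL bits can be decoded as in the sketch below. The function name and the `handler_mode` argument are hypothetical; the bit positions are the ones listed above, and the rule that Handler mode is always privileged and always uses MSP is the standard Cortex-M behaviour.

```python
def decode_control(control, handler_mode=False):
    """Decode the CONTROL bits listed above; handler_mode is a hypothetical flag for the current mode."""
    npriv = control & 0x1          # bit 0: 0 = privileged, 1 = unprivileged (Thread mode)
    spsel = (control >> 1) & 0x1   # bit 1: 0 = MSP, 1 = PSP (Thread mode only)
    if handler_mode:
        # Handler mode always uses MSP and is always privileged
        return {"stack_pointer": "MSP", "privileged": True}
    return {
        "stack_pointer": "PSP" if spsel else "MSP",
        "privileged": npriv == 0,
    }

print(decode_control(0b00))         # reset state: MSP, privileged
print(decode_control(0b11))         # typical unprivileged thread on PSP
print(decode_control(0b11, True))   # exception handler: MSP, privileged
```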
Memory Maps For Cortex M0+ and MCU
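KL25Z128VLK4

Region                 Start address   End address   Size
SRAM_U (3/4 of SRAM)   0x2000_0000     0x2000_2FFF   12 KB
SRAM_L (1/4 of SRAM)   0x1FFF_F000     0x1FFF_FFFF   4 KB
Flash                  0x0000_0000     0x0001_FFFF   128 KB

n SRAM_L and SRAM_U together form the 16 KB SRAM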
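A short sketch of how an address maps onto these regions. The table layout and function name are hypothetical, and the end of SRAM_L is assumed to be 0x1FFF_FFFF, directly below SRAM_U.

```python
# Regions taken from the memory map above: (name, first address, last address)
REGIONS = [
    ("Flash",  0x0000_0000, 0x0001_FFFF),
    ("SRAM_L", 0x1FFF_F000, 0x1FFF_FFFF),   # end address assumed contiguous with SRAM_U
    ("SRAM_U", 0x2000_0000, 0x2000_2FFF),
]

def region_of(address):
    """Return the name of the region containing address, or None if it is not in the table."""
    for name, first, last in REGIONS:
        if first <= address <= last:
            return name
    return None

for name, first, last in REGIONS:
    print(name, (last - first + 1) // 1024, "KB")
print(region_of(0x2000_0800))
```

The sizes come out as 128 KB, 4 KB and 12 KB, which matches the 1/4 and 3/4 split of the 16 KB SRAM.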
Endianness
n For a multi-byte value, in what order
are the bytes stored?
n Little-Endian: Start with least-significant
byte (stored at the lowest address)
n Big-Endian: Start with most-significant
byte (stored at the lowest address)
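The two byte orders can be seen by packing the same 32-bit value both ways. This sketch uses Python's standard `struct` module; the value 0x12345678 is an arbitrary example.

```python
import struct

value = 0x12345678
little = struct.pack("<I", value)   # least-significant byte first
big = struct.pack(">I", value)      # most-significant byte first
print(little.hex(), big.hex())      # 78563412 12345678

# Reading bytes with the wrong byte order gives a different number
print(hex(int.from_bytes(little, "big")))   # 0x78563412
```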
ARMv6-M Endianness

n Instructions are always little-endian
n Loads and stores to Private Peripheral Bus are
always little-endian
n Data: Depends on implementation, or is selected by
reset configuration
ARM Cortex-M3 Memory Formats (Endian)

n Default memory format for ARM CPUs: LITTLE ENDIAN
n Bytes 0-3 hold the first stored word
n Bytes 4-7 hold the second stored word
n Processor contains a configuration pin BIGEND
n Enables hardware system developer to select
format:
n Little Endian
n Big Endian (BE-8)
n Pin is sampled on reset
n Cannot change endianness when out of reset
ARM, Thumb and Thumb-2 Instructions
n ARM instructions optimized for resource-rich high-performance
computing systems
n Deeply pipelined processor, high clock rate, wide (e.g. 32-bit) memory
bus
n Low-end embedded computing systems are different
n Slower clock rates, shallow pipelines
n Different cost factors – e.g. code size matters much more, bit and byte
operations critical
n Modifications to ARM ISA to fit low-end embedded computing
n 1995: Thumb instruction set
n 16-bit instructions
n Reduces memory requirements (and performance slightly)
n 2003: Thumb-2 instruction set
n Adds some 32 bit instructions
n Improves speed with little memory overhead
n CPU decodes instructions based on whether in Thumb state or ARM
state - controlled by T bit
Instruction Set

n Cortex-M0+ core implements ARMv6-M Thumb instructions
n Only uses Thumb instructions, always in Thumb state
n Most instructions are 16 bits long, some are 32 bits
n Most 16-bit instructions can only access low registers (R0-R7), but
some can access high registers (R8-R15)
n Thumb state is indicated by the T bit in the EPSR; an address loaded into the
PC by an instruction such as BX, BLX or POP must be odd (LSB = 1) to keep the T bit set
n Branching to an even address this way will cause an exception, since switching
back to ARM state is not allowed
n Conditional execution only supported for 16-bit branch
n 32 bit address space
n Half-word aligned instructions
n See ARMv6-M Architecture Reference Manual for specifics per
instruction (Section A6.7)
Assembler Instruction Format
n <operation> <operand1>, <operand2>, <operand3>
n There may be fewer operands
n First operand is typically destination (<Rd>)
n Other operands are sources (<Rn>, <Rm>)
n Examples
n ADDS <Rd>, <Rn>, <Rm>
n Add registers: <Rd> = <Rn> + <Rm>
n ANDS <Rdn>, <Rm>
n Bitwise and: <Rdn> = <Rdn> & <Rm>
n CMP <Rn>, <Rm>
n Compare: Set condition flags based on result of computing
<Rn> - <Rm>
Where Can the Operands Be Located?

n In a general-purpose register R
n Destination: Rd
n Source: Rm, Rn
n Both source and destination: Rdn
n Target: Rt
n Source for shift amount: Rs
n An immediate value encoded in instruction word
n In a condition code flag
n In memory
n Only for load, store, push and pop instructions
Update Condition Codes in APSR?

n “S” suffix indicates the instruction updates the APSR
n ADD vs. ADDS
n ADC vs. ADCS
n SUB vs. SUBS
n MOV vs. MOVS
Updating the APSR
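n SUB Rx, Ry
n Rx = Rx - Ry
n APSR unchanged
n Not encodable on ARMv6-M for general registers (only SUB SP, SP, #<imm7>); available on Thumb-2 cores
n SUBS Rx, Ry
n Rx = Rx - Ry
n APSR N, Z, C, V updated
n ADD Rx, Ry
n Rx = Rx + Ry
n APSR unchanged
n ADDS Rx, Ry
n Rx = Rx + Ry
n APSR N, Z, C, V updated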
Instruction Set Summary
Instruction Type          Instructions
Move                      MOV
Load/Store                LDR, LDRB, LDRH, LDRSH, LDRSB, LDM, STR, STRB, STRH, STM
Add, Subtract, Multiply   ADD, ADDS, ADCS, ADR, SUB, SUBS, SBCS, RSBS, MULS
Compare                   CMP, CMN
Logical                   ANDS, EORS, ORRS, BICS, MVNS, TST
Shift and Rotate          LSLS, LSRS, ASRS, RORS
Stack                     PUSH, POP
Branch                    B, BL, B{cond}, BX, BLX
Extend                    SXTH, SXTB, UXTH, UXTB
Reverse                   REV, REV16, REVSH
Processor State           SVC, CPSID, CPSIE, BKPT
No Operation              NOP
Hint                      SEV, WFE, WFI, YIELD
Load/Store Register
n ARM is a load/store architecture, so must process
data in registers (not memory)
n LDR: load register with word (32 bits) from memory
n LDR <Rt>, source address
n STR: store register contents (32 bits) to memory
n STR <Rt>, destination address
Modes for Addressing Memory
n Offset Addressing mode: [<Rn>, <offset>] accesses address
<Rn>+<offset>
n Base Register <Rn>
n Can be register R0-R7, SP or PC
n <offset> is added or subtracted from base register to create
effective address
n Can be an immediate constant
n Can be another register, used as index <Rm>
n Auto-update: Can write effective address back to base
register (ARM and Thumb-2 instruction sets; the ARMv6-M LDR and STR
instructions support offset addressing only)
n Pre-indexing: use effective address to access memory,
then update base register with that effective address
n Post-indexing: use base register to access memory, then
update base register with effective address
Addressing Modes
n Offset Addressing
n Offset is added or subtracted from base register
n Result used as effective address for memory access
n [<Rn>, <offset>]
n Examples:
n LDR R2, [R0] ; Load R2 with the word pointed to by R0
n STR R2, [R3] ; Store the word in R2 at the location pointed to by R3
n LDR R0, [R1, #20] ; Load R0 with the word pointed to by R1+20
Addressing Modes (continues)
n Pre-indexed Addressing
n Offset is applied to base register
n Result used as effective address for memory access
n Result written back into base register
n [<Rn>, <offset>]!
n Example:
n LDR R0, [R1, #4]! ; loads R0 with the word pointed at by R1+4
; then update the pointer by adding 4 to R1
Addressing Modes (continues)
n Post-indexed Addressing
n The address from the base register is used as the EA
n The offset is applied to the base and then written back
n [<Rn>], <offset>
n Example:
n LDR R0, [R1], #4 ; loads R0 with the word pointed at by R1
; then update the pointer by adding 4 to R1
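The difference between the three modes is only when, and whether, the base register changes. The sketch below models that with a dictionary as memory; the function names, addresses and data are all hypothetical.

```python
def ldr_offset(regs, mem, rt, rn, offset):
    """LDR Rt, [Rn, #offset]: access Rn+offset, base register unchanged."""
    regs[rt] = mem[regs[rn] + offset]

def ldr_pre_indexed(regs, mem, rt, rn, offset):
    """LDR Rt, [Rn, #offset]!: access Rn+offset, then write that address back to Rn."""
    address = regs[rn] + offset
    regs[rt] = mem[address]
    regs[rn] = address

def ldr_post_indexed(regs, mem, rt, rn, offset):
    """LDR Rt, [Rn], #offset: access Rn, then write Rn+offset back to Rn."""
    address = regs[rn]
    regs[rt] = mem[address]
    regs[rn] = address + offset

mem = {0x100: 11, 0x104: 22, 0x108: 33}    # hypothetical words in memory
regs = {"R0": 0, "R1": 0x100}

ldr_offset(regs, mem, "R0", "R1", 4)
print(regs["R0"], hex(regs["R1"]))          # 22 0x100
ldr_post_indexed(regs, mem, "R0", "R1", 4)
print(regs["R0"], hex(regs["R1"]))          # 11 0x104
ldr_pre_indexed(regs, mem, "R0", "R1", 4)
print(regs["R0"], hex(regs["R1"]))          # 33 0x108
```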
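Summary of ARM's Indexed Addressing Modes

Mode           Syntax              Effective address   Base register after access
Offset         [<Rn>, <offset>]    <Rn> + <offset>     unchanged
Pre-indexed    [<Rn>, <offset>]!   <Rn> + <offset>     <Rn> + <offset>
Post-indexed   [<Rn>], <offset>    <Rn>                <Rn> + <offset>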
Loading/Storing Smaller Data Sizes
n Some load and store instructions can handle half-word (16
bits) and byte (8 bits)
n Store just writes to half-word or byte
n STRH, STRB
n Loading a byte or half-word requires padding or extension:
What do we put in the upper bits of the register?
n Example: How do we extend 0x80 into a full word?
n Unsigned? Then 0x80 = 128, so zero-pad to extend to word
0x0000_0080 = 128
n Signed? Then 0x80 = -128, so sign-extend to word 0xFFFF_FF80
= -128

            Signed   Unsigned
Byte        LDRSB    LDRB
Half-word   LDRSH    LDRH
In-Register Size Extension
n Can also extend byte or half-word that is already in a
register
n Signed or unsigned (zero-pad)
n How do we extend 0x80 into a full word?
n Unsigned? Then 0x80 = 128, so zero-pad to extend to word
0x0000_0080 = 128
n Signed? Then 0x80 = -128, so sign-extend to word
0xFFFF_FF80 = -128

            Signed   Unsigned
Byte        SXTB     UXTB
Half-word   SXTH     UXTH
Example
n SXTB R0, R1 ; R0 = Sign Extend (R1[7:0])
n SXTH R0, R1 ; R0 = Sign Extend (R1[15:0])
n UXTB R0, R1 ; R0 = Zero Extend (R1[7:0])
n UXTH R0, R1 ; R0 = Zero Extend (R1[15:0])
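The 0x80 example can be reproduced with a small model of the two kinds of extension. This is a sketch: the helper names are hypothetical and a register is modelled as a 32-bit unsigned integer.

```python
def zero_extend(value, bits):
    """Keep the low `bits` bits and fill the upper bits of the 32-bit word with zeros."""
    return value & ((1 << bits) - 1)

def sign_extend(value, bits):
    """Keep the low `bits` bits and copy the sign bit into the upper bits of the 32-bit word."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):                    # sign bit set: value is negative
        value |= 0xFFFF_FFFF & ~((1 << bits) - 1)    # fill the upper bits with ones
    return value

def as_signed32(word):
    """Interpret a 32-bit register value as a two's-complement integer."""
    return word - (1 << 32) if word & 0x8000_0000 else word

print(hex(zero_extend(0x80, 8)), zero_extend(0x80, 8))                # 0x80 128
print(hex(sign_extend(0x80, 8)), as_signed32(sign_extend(0x80, 8)))   # 0xffffff80 -128
```

`zero_extend(x, 8)` corresponds to UXTB or LDRB, and `sign_extend(x, 8)` to SXTB or LDRSB; with 16 instead of 8 they correspond to the half-word instructions.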

Load/Store Multiple

n LDM/LDMIA: load multiple registers starting from
[base register], update base register afterwards
n LDM <Rn>!,<registers> (base register updated)
n LDM <Rn>,<registers> (base register not updated; on ARMv6-M this form is used when <Rn> is in <registers>)
n STM/STMIA: store multiple registers starting at [base
register], update base register after
n STM <Rn>!, <registers>
n LDMIA and STMIA are pseudo-instructions, translated
by assembler
n The accesses happen in order of increasing register
numbers, with the lowest numbered register using the
lowest memory address and the highest number register
using the highest memory address.
Example for LDM/STM
; Load the address of the source data array
ADR R0, SRC_ADR
; Load 6 consecutive words starting at SRC_ADR
; R0 is NOT modified
; Note: the forms without '!' below use 32-bit Thumb-2 encodings (e.g. Cortex-M3)
; All three instructions below do exactly the same thing
LDM R0, {R1-R6}
LDM R0, {R1, R2, R3, R4, R5, R6}
LDM R0, {R6, R1, R2, R4, R5, R3}
; Load the destination address
ADR R7, DEST_ADDR
; Store the 6 words starting at DEST_ADDR; R7 is NOT modified
STM R7, {R1-R6}
Example for LDM/STM
n This example demonstrates how to use the LDM/STM
instructions that do modify the base registers. The only
functional difference between the previous example and this
example is that R0 and R7 both get modified.

; Load Src and Dest Addresses
ADR R0, SRC_ADR
ADR R7, DEST_ADDR
; Use '!' to write back to the base address
; The base address is incremented based on the
; number of WORDs being accessed
LDM R0!, {R1-R3} ; R0 <- R0 + 12
STM R7!, {R1-R3} ; R7 <- R7 + 12
LDM R0!, {R1-R3} ; R0 <- R0 + 12
STM R7!, {R1-R3} ; R7 <- R7 + 12
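The write-back behaviour of the example above can be followed in a small model. This is a sketch only: memory is a dictionary of word addresses, and the helper names, addresses and data values are hypothetical.

```python
def reg_number(name):
    """'R3' -> 3, used to order a register list by register number."""
    return int(name[1:])

def ldm(regs, mem, base, reg_list):
    """LDM base!, {reg_list}: load consecutive words, lowest register number first."""
    address = regs[base]
    for reg in sorted(reg_list, key=reg_number):
        regs[reg] = mem[address]
        address += 4
    regs[base] = address          # write-back: base advanced by 4 bytes per register

def stm(regs, mem, base, reg_list):
    """STM base!, {reg_list}: store consecutive words, lowest register number first."""
    address = regs[base]
    for reg in sorted(reg_list, key=reg_number):
        mem[address] = regs[reg]
        address += 4
    regs[base] = address

# Hypothetical layout: six source words at 0x100, destination at 0x200
mem = {0x100 + 4 * i: 10 + i for i in range(6)}
regs = {"R0": 0x100, "R7": 0x200}
for _ in range(2):
    ldm(regs, mem, "R0", ["R1", "R2", "R3"])
    stm(regs, mem, "R7", ["R3", "R1", "R2"])    # order in the list does not matter
print(hex(regs["R0"]), hex(regs["R7"]))          # 0x118 0x218
print([mem[0x200 + 4 * i] for i in range(6)])    # [10, 11, 12, 13, 14, 15]
```

Each pair of instructions moves three words and leaves both pointers 12 bytes further on, so repeating the pair copies the next three words without any extra address arithmetic.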
Load Literal Value into Register
n Assembly pseudo-instruction: LDR <rd>, =value
n Assembler generates code to load <rd> with value
n Assembler selects best approach depending on value
n Load immediate
n MOV instruction provides 8-bit unsigned immediate operand (0-255)
n Load and shift immediate values
n Can use MOV, shift, rotate, sign extend instructions
n Load from literal pool
n 1. Place value as a 32-bit literal in the program’s literal pool (table of literal
values to be loaded into registers)
n 2. Use instruction LDR <rd>, [pc,#offset] where offset indicates position
of literal relative to program counter value
n Example formats for literal values (depends on compiler
and toolchain used)
n Decimal: 3909
n Hexadecimal: 0xa7ee
n Character: ‘A’
n String: “44??”
Example for LDR
LDR r1, =0xfff   ; loads 0xfff into r1
LDR r2, =place   ; loads the address of place into r2
                 ; (place is a label)
Move (Pseudo-)Instructions
n Copy data from one register to another without updating
condition flags
n MOV <Rd>, <Rm>
n Assembler translates pseudo-instructions into equivalent
instructions (shifts, rotates)
n Copy data from one register to another
and update condition flags
n MOVS <Rd>, <Rm>
n Copy immediate literal value (0-255)
into register and update condition flags
n MOVS <Rd>, #<imm8>
Examples
n MOVS r3, #0 ; immediate form updates flags
n MOV r0, r12 ; does not update flags
Stack Operations
n Push some or all of registers (R0-R7, LR) to stack
n PUSH {<registers>}
n Decrements SP by 4 bytes for each register saved
n Pushing LR saves return address
n PUSH {r1, r2, LR}
n Always pushes registers in same order
n Pop some or all of registers (R0-R7, PC) from stack
n POP {<registers>}
n Increments SP by 4 bytes for each register restored
n If PC is popped, then execution will branch to new PC value
after this POP instruction (e.g. return address)
n POP {r5, r6, r7}
n Always pops registers in same order (opposite of pushing)
Examples
n PUSH {R0, R1, R2} ; Memory[SP-4] = R2,
                    ; Memory[SP-8] = R1,
                    ; Memory[SP-12] = R0,
                    ; SP = SP-12
n POP {R0, R1, R2}  ; R0 = Memory[SP],
                    ; R1 = Memory[SP+4],
                    ; R2 = Memory[SP+8],
                    ; SP = SP+12
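The register ordering in PUSH and POP can be traced with the sketch below. It assumes a dictionary as memory, handles only R0-R7 (not LR or PC), and starts from a hypothetical SP value of 0x2000_0800; the helper names are illustrations.

```python
def reg_number(name):
    """'R2' -> 2, used to order a register list by register number."""
    return int(name[1:])

def push(regs, mem, reg_list):
    """PUSH {reg_list}: SP drops by 4 per register; lowest register goes to the lowest address."""
    ordered = sorted(reg_list, key=reg_number)
    regs["SP"] -= 4 * len(ordered)
    for i, reg in enumerate(ordered):
        mem[regs["SP"] + 4 * i] = regs[reg]

def pop(regs, mem, reg_list):
    """POP {reg_list}: reload registers from ascending addresses, then raise SP by 4 per register."""
    ordered = sorted(reg_list, key=reg_number)
    for i, reg in enumerate(ordered):
        regs[reg] = mem[regs["SP"] + 4 * i]
    regs["SP"] += 4 * len(ordered)

regs = {"R0": 1, "R1": 2, "R2": 3, "SP": 0x2000_0800}
mem = {}
push(regs, mem, ["R0", "R1", "R2"])
print(hex(regs["SP"]))                    # 0x200007f4
regs.update(R0=0, R1=0, R2=0)             # overwrite the registers
pop(regs, mem, ["R0", "R1", "R2"])
print(regs["R0"], regs["R1"], regs["R2"], hex(regs["SP"]))   # 1 2 3 0x20000800
```

Unlike the 6800-style stack earlier in the deck, SP here points at the last item pushed rather than at the next free location.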

Add Instructions
n Add registers, update condition flags
n ADDS <Rd>,<Rn>,<Rm>
n Add registers and carry bit, update condition
flags
n ADCS <Rdn>,<Rm>
n Add registers
n ADD <Rdn>,<Rm>
n Add immediate value to register
n ADDS <Rd>,<Rn>,#<imm3>
n ADDS <Rdn>,#<imm8>
Add Instructions with Stack Pointer
n Add SP and immediate value
n ADD <Rd>,SP,#<imm8>
n ADD SP,SP,#<imm7>
n Add SP value to register
n ADD <Rdm>, SP, <Rdm>
n ADD SP,<Rm>
Examples
n ADDS R0, R1, R2 ; R0 = R1 + R2
; Update APSR
n ADDS R0, R1, #0x01 ; R0 = R1 + Zero Extend (0x01)
; Update APSR
n ADDS R0, #0x01 ; R0 = R0 +Zero Extend ( 0x01)
; Update APSR
n ADD R0, R1 ; R0 = R0 + R1
n ADCS R0, R1 ; R0 = R0 + R1 + Carry
; Update APSR
n ADD R0, PC, #0x04 ; R0 = PC + 0x04

Address to Register Pseudo-Instruction
n Add immediate value to PC, write result in
register
n ADR <Rd>,<label>
n How is this used?
n Enables storage of constant data near program
counter
;First, load register R2 with address of const_data
ADR R2, const_data
;Second, load const_data into R2
LDR R2, [R2]
n Value must be close to current PC value
Subtract
n Subtract immediate from register, update condition
flags
n SUBS <Rd>,<Rn>,#<imm3>
n SUBS <Rdn>,#<imm8>
n Subtract registers, update condition flags
n SUBS <Rd>,<Rn>,<Rm>
n Subtract registers with carry, update condition flags
n SBCS <Rdn>,<Rm>
n Subtract immediate from SP
n SUB SP,SP,#<imm7>
Examples
n 64-bit addition
ADDS R0, R0, R2 ; add the least significant words
ADCS R1, R1, R3 ; add the most significant words with carry
n Multiword values do not have to use consecutive registers.
n Example shows instructions that subtract a 96-bit integer contained
in R1, R2, and R3 from another contained in R4, R5, and R6.
n The example stores the result in R4, R5, and R6.
n 96-bit subtraction
SUBS R4, R4, R1 ; subtract the least significant words
SBCS R5, R5, R2 ; subtract the middle words with carry
SBCS R6, R6, R3 ; subtract the most significant words with carry
n Example shows the RSBS instruction used to perform arithmetic negation
(2's complement) of a single register
n Arithmetic negation
RSBS R7, R7, #0 ; subtract R7 from zero
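The role of the carry flag in multiword arithmetic can be sketched as follows. The helper names are hypothetical, and each function returns the carry explicitly instead of updating an APSR.

```python
MASK32 = 0xFFFF_FFFF

def adds(a, b):
    """ADDS: 32-bit add, returns (result, carry)."""
    total = a + b
    return total & MASK32, int(total > MASK32)

def adcs(a, b, carry):
    """ADCS: 32-bit add including the incoming carry, returns (result, carry)."""
    total = a + b + carry
    return total & MASK32, int(total > MASK32)

def add64(x, y):
    """Add two 64-bit values one 32-bit word at a time, like the ADDS/ADCS pair above."""
    low, carry = adds(x & MASK32, y & MASK32)      # least significant words
    high, _ = adcs(x >> 32, y >> 32, carry)        # most significant words plus carry
    return (high << 32) | low

print(hex(add64(0x0000_0001_FFFF_FFFF, 0x0000_0000_0000_0001)))   # 0x200000000
```

The low words overflow and produce a carry, which ADCS adds into the high words. The same chaining extends to 96 bits or more.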
Multiply
n Multiply source registers, save lower word of
result in destination register, update condition
flags
n MULS <Rdm>, <Rn>, <Rdm>
n <Rdm> = <Rdm> * <Rn>
n Works for signed and unsigned operands, because the lower word of the product is the same
n Note: upper word of result is discarded
n MULS R0, R1, R0 ; R0 = R0 × R1
                  ; Update APSR
Logical Operations
n Bitwise AND registers, update condition flags
n ANDS <Rdn>,<Rm>
n Bitwise OR registers, update condition flags
n ORRS <Rdn>,<Rm>
n Bitwise Exclusive OR registers, update condition flags
n EORS <Rdn>,<Rm>
n Bitwise AND register and complement of second register,
update condition flags
n BICS <Rdn>,<Rm>
n Move inverse of register value to destination, update
condition flags
n MVNS <Rd>,<Rm>
n Update condition flags by ANDing two registers, discarding
result
n TST <Rn>, <Rm>
Compare
n Compare - subtracts second value from first,
discards result, updates APSR
n CMP <Rn>,#<imm8>
n CMP <Rn>,<Rm>
n Compare negative - adds two values, updates
APSR, discards result
n CMN <Rn>,<Rm>
Shift and Rotate
n Common features
n All of these instructions update APSR condition flags
n Shift/rotate amount (in number of bits) specified by last
operand
n Logical shift left - shifts in zeroes on right
n LSLS <Rd>,<Rm>,#<imm5>
n LSLS <Rdn>,<Rm>
n Logical shift right - shifts in zeroes on left
n LSRS <Rd>,<Rm>,#<imm5>
n LSRS <Rdn>,<Rm>
n Arithmetic shift right - shifts in copies of sign bit on left (to
maintain arithmetic sign)
n ASRS <Rd>,<Rm>,#<imm5>
n Rotate right
n RORS <Rdn>,<Rm>
Examples
n ASRS R7, R5, #9 ; Arithmetic shift right by 9 bits
n LSLS R1, R2, #3 ; Logical shift left by 3 bits with flag update
n LSRS R4, R5, #6 ; Logical shift right by 6 bits
n RORS R4, R4, R6 ; Rotate right by the value in the bottom byte of R6

Reversing Bytes
n REV - reverse all bytes
in word
n REV <Rd>,<Rm>
n REV16 - reverse bytes in
both half-words
n REV16 <Rd>,<Rm>
n REVSH - reverse bytes
in low half-word
(signed) and sign-extend
n REVSH <Rd>,<Rm>
Examples
n REV R3, R7 ; Reverse byte order of value in R7 and write it to R3
n REV16 R0, R0 ; Reverse byte order of each 16-bit half-word in R0
n REVSH R0, R5 ; Reverse signed half-word
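The three byte-reversal operations can be modelled on 32-bit integers as below. This is a sketch with hypothetical helper names and arbitrary example values.

```python
def rev(word):
    """REV: reverse the order of all four bytes in a 32-bit word."""
    return int.from_bytes(word.to_bytes(4, "little"), "big")

def rev16(word):
    """REV16: reverse the two bytes inside each 16-bit half-word."""
    return ((word & 0x00FF_00FF) << 8) | ((word & 0xFF00_FF00) >> 8)

def revsh(word):
    """REVSH: reverse the bytes of the low half-word, then sign-extend to 32 bits."""
    half = ((word & 0xFF) << 8) | ((word >> 8) & 0xFF)
    return half | 0xFFFF_0000 if half & 0x8000 else half

print(hex(rev(0x1234_5678)))     # 0x78563412
print(hex(rev16(0x1234_5678)))   # 0x34127856
print(hex(revsh(0x0000_3480)))   # 0xffff8034
```

REV is the operation needed to convert a word between little-endian and big-endian byte order.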
Changing Program Flow - Branches
n Branches (conditional and unconditional)
n Branch without link (i.e. no possibility of return) to target
n The PC is not saved!
n Unconditional Branches
n B <label>
n Target address must be within 2 KB of branch instruction
(-2048 B to +2046 B)
n Conditional Branches
n B<cond> <label>
n <cond> is condition - see next page
n B<cond> target address must be within 256 B of branch
instruction (-256 B to +254 B)
Condition Codes
n Append to branch instruction (B) to make
a conditional branch
n Full ARM instructions (not Thumb or
Thumb-2) support conditional execution of
arbitrary instructions
n Note: Carry bit = not-borrow for compares
and subtractions
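The sketch below shows how a condition suffix is evaluated from the N, Z, C and V flags after a compare. The flag tests follow the standard Arm condition-code definitions rather than a table recovered from this slide, and `cmp_flags` is a hypothetical model of CMP on 32-bit register values.

```python
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "HS": lambda f: f["C"] == 1,                      # also written CS
    "LO": lambda f: f["C"] == 0,                      # also written CC
    "MI": lambda f: f["N"] == 1,
    "PL": lambda f: f["N"] == 0,
    "VS": lambda f: f["V"] == 1,
    "VC": lambda f: f["V"] == 0,
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,      # unsigned higher
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,       # unsigned lower or same
    "GE": lambda f: f["N"] == f["V"],                 # signed >=
    "LT": lambda f: f["N"] != f["V"],                 # signed <
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
}

def cmp_flags(rn, rm):
    """CMP Rn, Rm: flags from the 32-bit subtraction Rn - Rm (result discarded)."""
    result = (rn - rm) & 0xFFFF_FFFF
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(rn >= rm),                               # carry = NOT borrow
        "V": ((rn ^ rm) & (rn ^ result)) >> 31,           # signed overflow
    }

flags = cmp_flags(5, 7)
print([name for name, test in CONDITIONS.items() if test(flags)])
# ['NE', 'LO', 'MI', 'VC', 'LS', 'LT', 'LE']
```

Comparing 5 with 7 needs a borrow, so C is 0, and both the unsigned (LO) and signed (LT) "less than" conditions hold.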
Changing Program Flow - Subroutines
n Call
n BL <label> - branch with link
n Store the return address in the link register (lr)
n Call subroutine at <label>
n PC-relative, range limited to PC+/-16MB
n Save return address in LR
n BLX <Rd> - branch with link and exchange
n Call subroutine at address in register Rd (exchange Rd with PC)
n Supports full 4GB address range
n LSB of target address must be set to 1 to ensure continued execution in Thumb state
n Save return address in LR
n Return
n BX <Rd> branch and exchange
n Branch to address specified by <Rd>
n LSB of target address must be set to 1 to ensure continued execution in
Thumb state
n Supports full 4 GB address space
n BX LR - Return from subroutine
Examples
B loopA ; Branch to loopA
BL funC ; Branch with link (Call) to function funC, return address
; stored in LR
BX LR ; Return from function call
BLX R0 ; Branch with link and exchange (Call) to an address stored
; in R0
BEQ labelD ; Conditionally branch to labelD if last flag setting
; instruction set the Z flag, else do not branch.

Special Register Instructions
n Move to Register from
Special Register
n MRS <Rd>, <spec_reg>
n Move to Special Register
from Register
n MSR <spec_reg>, <Rn>
n Change Processor State -
Modify PRIMASK register
n CPSIE i - Interrupt enable (clears PRIMASK)
n CPSID i - Interrupt disable (sets PRIMASK)
Other
n No Operation - does nothing!
n NOP
n Breakpoint - causes hard fault or debug halt - used
to implement software breakpoints
n BKPT #<imm8>
n Wait for interrupt - Pause program, enter low-power
state until a WFI wake-up event occurs (e.g. an
interrupt)
n WFI
n Supervisor call generates SVC exception (#11),
same as software interrupt
n SVC #<imm>
Exercise: What is the value of r2 at done?

...
start:
movs r0, #1
movs r1, #1
movs r2, #1
sub r0, r1
bne done
movs r2, #2
done:
b done
...
Solution: What is the value of r2 at done?

...
start:
movs r0, #1 // r0 ← 1, Z=0
movs r1, #1 // r1 ← 1, Z=0
movs r2, #1 // r2 ← 1, Z=0
sub r0, r1 // r0 ← r0-r1
// but Z flag untouched
// since sub vs subs
bne done // NE true when Z==0
// So, take the branch
movs r2, #2 // not executed
done:
b done // r2 is still 1
...
An example ARM assembly language program for the GNU assembler
.equ STACK_TOP, 0x20000800
.text
.syntax unified
.thumb
.global _start
.type start, %function

_start:
.word STACK_TOP, start
start:
movs r0, #10
movs r1, #0
loop:
adds r1, r0
subs r0, #1
bne loop
deadloop:
b deadloop
.end
What’s it all mean?

_start:
.word STACK_TOP, start /* Inserts word 0x20000800 */
/* Inserts word (start) */
start:
movs r0, #10 /* We’ve seen the rest ... */
movs r1, #0
loop:
adds r1, r0
subs r0, #1
bne loop
deadloop:
b deadloop
.end
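The loop can be traced with a short model, written only to show what ends up in r1. The variables mirror the registers and the function name is hypothetical.

```python
def run_loop():
    """Mirror of the loop above: r1 accumulates r0 while r0 counts down from 10 to 0."""
    r0, r1 = 10, 0                        # movs r0, #10 / movs r1, #0
    while True:
        r1 = (r1 + r0) & 0xFFFF_FFFF      # adds r1, r0
        r0 = (r0 - 1) & 0xFFFF_FFFF       # subs r0, #1 (sets Z when r0 reaches 0)
        if r0 == 0:                       # bne loop falls through once Z == 1
            break
    return r0, r1

print(run_loop())   # (0, 55)
```

When execution reaches deadloop, r1 holds 10 + 9 + ... + 1 = 55.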
