Introduction to Arm Assembly
Chapter 2
Sepehr Naimi
www.NicerLand.com
Topics
ARM’s CPU
Its architecture
Some simple programs
Data Memory access
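[Figure: block diagram of a microcontroller with a RISC architecture. The CPU reaches the program memory (Flash ROM) through the program bus. RAM, EEPROM, timers, the interrupt unit, the oscillator, ports, and other peripherals sit on the data bus, and the ports connect to the I/O pins.]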
ARM’s CPU
[Figure: ARM’s CPU: ALU, 16 general-purpose registers R0 to R15 (R13 = SP, R14 = LR, R15 = PC), the CPSR status register, the instruction register, and the instruction decoder]
Some simple instructions
1. MOV (MOVE)
MOV Rd, #k ;Rd = k (k is an 8-bit value)
Example:
MOV R5,#53 ;R5 = 53
MOV R9,#0x27 ;R9 = 0x27
MOV R3,#2_11101100 ;R3 = 0xEC (binary 11101100)
MOV Rd, Rs ;Rd = Rs
Example:
MOV R5,R2 ;R5 = R2
MOV R9,R7 ;R9 = R7
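This slide uses a simplified rule: k must fit in 8 bits (0 to 255). Under that rule, choosing between MOV Rd, #k and the LDR Rd, =k pseudo-instruction comes down to a range check. The C sketch below models only that rule. The helper name fits_mov_imm8 is hypothetical, and real Thumb-2 encodings also accept some larger constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper modeling this slide's rule for MOV Rd, #k:
   k must be an 8-bit value, i.e. 0 to 255 (0x00 to 0xFF). */
bool fits_mov_imm8(uint32_t k)
{
    return k <= 0xFFu;
}
```

All three MOV examples above pass this check: 53, 0x27, and 2_11101100 (0xEC). The value 5543 from the next slide does not.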
LDR pseudo-instruction (loading 32-bit values)
LDR Rd, =k
Rd = k
k is a 32-bit value
Example:
LDR R5,=5543
R5 = 5543
LDR R9,=0x123456
R9 = 0x123456
LDR R4,=2_10110110011011001
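The 2_ prefix marks a binary constant, so 2_10110110011011001 is 0x16CD9 (93401 in decimal). Below is a small C sketch of that conversion. arm_binary_literal is a hypothetical helper. It assumes the digits fit in 32 bits, and it returns 0 for malformed input, which cannot be told apart from the literal 2_0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts an armasm-style binary literal such as
   "2_10110110011011001" to its value (assumes it fits in 32 bits).
   Returns 0 on malformed input. */
uint32_t arm_binary_literal(const char *text)
{
    assert(text != NULL);
    if (strncmp(text, "2_", 2) != 0)
        return 0;                        /* not a base-2 literal */
    const char *digits = text + 2;
    if (*digits != '0' && *digits != '1')
        return 0;                        /* must start with a binary digit */
    char *end;
    unsigned long v = strtoul(digits, &end, 2);
    if (*end != '\0')
        return 0;                        /* stray non-binary character */
    return (uint32_t)v;
}
```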
Some simple instructions
2. Arithmetic calculation
Opcode destination, source1, source2
Opcodes: ADD, SUB, AND, etc.
Examples:
ADD R5,R2,R1 ;R5 = R2 + R1
SUB R5,R9,#23 ;R5 = R9 - 23

Instruction        Description
ADD Rd, Rn, Op2*   ADD Rn to Op2 and place the result in Rd
ADC Rd, Rn, Op2    ADD Rn to Op2 with Carry and place the result in Rd
AND Rd, Rn, Op2    AND Rn with Op2 and place the result in Rd
BIC Rd, Rn, Op2    AND Rn with NOT of Op2 and place the result in Rd
CMP Rn, Op2        Compare Rn with Op2 and set the status bits of CPSR**
CMN Rn, Op2        Compare Rn with negative of Op2 and set the status bits of CPSR
EOR Rd, Rn, Op2    Exclusive OR Rn with Op2 and place the result in Rd
MVN Rd, Op2        Store the NOT (one's complement) of Op2 in Rd
MOV Rd, Op2        Move (Copy) Op2 to Rd
ORR Rd, Rn, Op2    OR Rn with Op2 and place the result in Rd
RSB Rd, Rn, Op2    Subtract Rn from Op2 and place the result in Rd
RSC Rd, Rn, Op2    Subtract Rn from Op2 with carry and place the result in Rd
SBC Rd, Rn, Op2    Subtract Op2 from Rn with carry and place the result in Rd
SUB Rd, Rn, Op2    Subtract Op2 from Rn and place the result in Rd
TEQ Rn, Op2        Exclusive-OR Rn with Op2 and set the status bits of CPSR
TST Rn, Op2        AND Rn with Op2 and set the status bits of CPSR

* Op2 can be an immediate 8-bit value #K, which can be 0 to 255 in decimal (00 to FF in hex). Op2 can also be a register Rm. Rd, Rn, and Rm are any of the general-purpose registers.
** CPSR is discussed later in this chapter.
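To make the table concrete, here is a minimal C model of several of these operations on 32-bit register values. The op_ function names are illustrative, and the status flags are ignored. ADC and SBC take the carry flag as an explicit 0 or 1 argument. SBC follows the ARM convention of subtracting NOT carry as the borrow.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative C model of some data-processing instructions.
   Registers are treated as 32-bit unsigned values; flags are not modeled. */
uint32_t op_add(uint32_t rn, uint32_t op2) { return rn + op2; }
uint32_t op_sub(uint32_t rn, uint32_t op2) { return rn - op2; }
uint32_t op_rsb(uint32_t rn, uint32_t op2) { return op2 - rn; }   /* reverse subtract */
uint32_t op_and(uint32_t rn, uint32_t op2) { return rn & op2; }
uint32_t op_orr(uint32_t rn, uint32_t op2) { return rn | op2; }
uint32_t op_eor(uint32_t rn, uint32_t op2) { return rn ^ op2; }
uint32_t op_bic(uint32_t rn, uint32_t op2) { return rn & ~op2; }  /* bit clear */
uint32_t op_mvn(uint32_t op2)              { return ~op2; }       /* bitwise NOT */

/* c is the carry flag (0 or 1). SBC subtracts the borrow, which is NOT c. */
uint32_t op_adc(uint32_t rn, uint32_t op2, uint32_t c) { return rn + op2 + c; }
uint32_t op_sbc(uint32_t rn, uint32_t op2, uint32_t c) { return rn - op2 - (1u - c); }
```

For example, the slide's SUB R5,R9,#23 with R9 = 100 corresponds to op_sub(100, 23), which gives 77.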
A simple program
Write a program that calculates 19 + 95
MOV R6, #19 ;R6 = 19
MOV R2, #95 ;R2 = 95
ADD R6, R6, R2 ;R6 = R6 + R2
A simple program
Write a program that calculates 19 + 95 - 5
MOV R1, #19 ;R1 = 19
MOV R2, #95 ;R2 = 95
MOV R3, #5 ;R3 = 5
ADD R6, R1,R2 ;R6 = R1 + R2
SUB R6, R6,R3 ;R6 = R6 - R3
or, reusing R2:
MOV R1, #19 ;R1 = 19
MOV R2, #95 ;R2 = 95
ADD R6, R1,R2 ;R6 = R1 + R2
MOV R2, #5 ;R2 = 5
SUB R6, R6,R2 ;R6 = R6 - R2
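For comparison, here is a C sketch of the two programs. The variable names mirror the registers and are only illustrative. The first function returns 114 (0x72) and the second returns 109 (0x6D).

```c
#include <assert.h>
#include <stdint.h>

/* C equivalent of the first program: R6 = 19 + 95 */
uint32_t program_add(void)
{
    uint32_t r6 = 19, r2 = 95;   /* MOV R6, #19 / MOV R2, #95 */
    r6 = r6 + r2;                /* ADD R6, R6, R2 */
    return r6;
}

/* C equivalent of the second program (version that reuses R2): R6 = 19 + 95 - 5 */
uint32_t program_add_sub(void)
{
    uint32_t r1 = 19, r2 = 95;   /* MOV R1, #19 / MOV R2, #95 */
    uint32_t r6 = r1 + r2;       /* ADD R6, R1, R2 */
    r2 = 5;                      /* MOV R2, #5 */
    r6 = r6 - r2;                /* SUB R6, R6, R2 */
    return r6;
}
```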
Status Register (CPSR)
Bit:  D31 D30 D29 D28 ... D7 D6 D5 D4 D3 D2 D1 D0
CPSR: N   Z   C   V   Reserved I F T M4 M3 M2 M1 M0
N = Negative, Z = Zero, C = Carry, V = oVerflow, I = Interrupt, T = Thumb

Example 1: Show the status of the C and Z flags after the subtraction of 0x23 from 0xA5 in the following instructions:
LDR R0,=0xA5
LDR R1,=0x23
SUBS R0,R0,R1 ;subtract R1 from R0
Solution:
  0xA5    1010 0101
- 0x23    0010 0011
  0x82    1000 0010
R0 = 0x82
C = 1 because R1 is not bigger than R0 and there is no borrow from the D32 bit.
Z = 0 because R0 has a value other than zero after the subtraction.

Example 2: Show the status of the C and Z flags after the subtraction of 0x73 from 0x52 in the following instructions:
LDR R0,=0x52
LDR R1,=0x73
SUBS R0,R0,R1 ;subtract R1 from R0
Solution:
  0x00000052
- 0x00000073
  0xFFFFFFDF
R0 = 0xFFFFFFDF
C = 0 because R1 is bigger than R0 and there is a borrow from the D32 bit.
Z = 0 because R0 has a value other than zero after the subtraction.

Example 3: Show the status of the C and Z flags after the subtraction of 0x9C from 0x9C in the following instructions:
LDR R0,=0x9C
LDR R1,=0x9C
SUBS R0,R0,R1 ;subtract R1 from R0
Solution:
  0x9C    1001 1100
- 0x9C    1001 1100
  0x00    0000 0000
R0 = 0
C = 1 because R1 is not bigger than R0 and there is no borrow from the D32 bit.
Z = 1 because R0 is zero after the subtraction.

Example 4: Show the status of the C and Z flags after the addition of 0x38 and 0x2F in the following instructions:
MOV R6, #0x38 ;R6 = 0x38
MOV R7, #0x2F ;R7 = 0x2F
ADDS R6, R6,R7 ;add R7 to R6
Solution:
  0x38    0011 1000
+ 0x2F    0010 1111
  0x67    0110 0111
R6 = 0x67
C = 0 because there is no carry beyond the D31 bit.
Z = 0 because R6 (the result) has a value other than 0 after the addition.

Example 5: Show the status of the C and Z flags after the addition of 0x0000009C and 0xFFFFFF64 in the following instructions:
LDR R0,=0x9C
LDR R1,=0xFFFFFF64
ADDS R0,R0,R1 ;add R1 to R0
Solution:
   0x0000009C     00000000 00000000 00000000 10011100
+  0xFFFFFF64     11111111 11111111 11111111 01100100
  0x100000000   1 00000000 00000000 00000000 00000000
R0 = 0x00000000
C = 1 because there is a carry beyond the D31 bit.
Z = 1 because R0 (the result) has a value 0 in it after the addition.
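The five solutions above can be cross-checked with a small C model of how ADDS and SUBS set the N, Z, C, and V flags. This sketch assumes 32-bit unsigned register values, and the names Flags, adds, and subs are hypothetical. For subtraction, ARM sets C = 1 when there is no borrow. That is why C is 1 in the first and third subtraction examples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag record for the N, Z, C, V bits of CPSR. */
typedef struct { bool n, z, c, v; } Flags;

/* Model of ADDS Rd, Rn, Op2 */
uint32_t adds(uint32_t rn, uint32_t op2, Flags *f)
{
    uint64_t wide = (uint64_t)rn + op2;          /* keep bit 32 to see the carry */
    uint32_t r = (uint32_t)wide;
    f->n = (r >> 31) & 1;                        /* copy of bit D31 */
    f->z = (r == 0);
    f->c = (wide >> 32) & 1;                     /* carry beyond D31 */
    /* signed overflow: operands have the same sign, result sign differs */
    f->v = ((~(rn ^ op2) & (rn ^ r)) >> 31) & 1;
    return r;
}

/* Model of SUBS Rd, Rn, Op2: C = 1 means NO borrow (rn >= op2, unsigned) */
uint32_t subs(uint32_t rn, uint32_t op2, Flags *f)
{
    uint32_t r = rn - op2;
    f->n = (r >> 31) & 1;
    f->z = (r == 0);
    f->c = (rn >= op2);
    /* signed overflow: operands have different signs, result sign differs from rn */
    f->v = (((rn ^ op2) & (rn ^ r)) >> 31) & 1;
    return r;
}
```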
Harvard in ARM9 and Cortex
Memory Map in STM32F103
[Figure: memory map of the STM32F103, a 4 GB (0x0000 0000 to 0xFFFF FFFF) space addressed in 8-bit bytes: Flash from 0x0000 0000, SRAM from 0x2000 0000, Peripherals from 0x4000 0000 to 0x5FFF FFFF, FSMC from 0x6000 0000, and Cortex-M3 internal peripherals from 0xE000 0000]
LDR (Load register)
LDR Rd,[Rx] ;Rd = [Rx]
Example:
LDR R6,=0x90 ;R6 = 0x90
LDR R1,[R6] ;R1 = [0x90]
Example:
LDR R4,=0x20000000 ;R4 = 0x20000000
LDR R1,[R4] ;R1 = [0x20000000]
STR (Store register)
STR Rx,[Rd] ;[Rd] = Rx
Example:
LDR R5,=0x12345678 ;R5 = 0x12345678
LDR R2,=0x20000000 ;R2 = 0x20000000
STR R5,[R2] ;[R2] = R5, so [0x20000000] = 0x12345678
After running the STR instruction, locations 0x20000000 through 0x20000003 will be loaded with 0x78, 0x56, 0x34, and 0x12, respectively; the least significant byte is stored at the lowest address.
Address        Contents
0x2000 0003    0x12
0x2000 0002    0x34
0x2000 0001    0x56
0x2000 0000    0x78
Example: Write a program that copies the contents of location 0x80 into location 0x88.
Solution:
LDR R2,=0x80 ;R2 = 0x80
LDR R1,[R2] ;R1 = [0x80]
LDR R2,=0x88 ;R2 = 0x88
STR R1,[R2] ;[0x88] = R1
Example: Add the contents of location 0x90 to the contents of location 0x94 and store the result in location 0x20000300.
Solution:
LDR R6,=0x90 ;R6 = 0x90
LDR R1,[R6] ;R1 = [0x90]
LDR R6,=0x94 ;R6 = 0x94
LDR R2,[R6] ;R2 = [0x94]
ADD R2,R2,R1 ;R2 = R2 + R1
LDR R6,=0x20000300 ;R6 = 0x20000300
STR R2,[R6] ;[0x20000300] = R2
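Here is a minimal C model of the little-endian byte ordering shown above. A plain byte array stands in for memory, and an index into it stands in for an address (index 0 plays the role of 0x20000000). The names model_str and model_ldr are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of STR Rx,[Rd] on little-endian, byte-addressable memory:
   the least significant byte goes to the lowest address. */
void model_str(uint8_t *mem, uint32_t addr, uint32_t value)
{
    mem[addr]     = (uint8_t)(value);
    mem[addr + 1] = (uint8_t)(value >> 8);
    mem[addr + 2] = (uint8_t)(value >> 16);
    mem[addr + 3] = (uint8_t)(value >> 24);
}

/* Model of LDR Rd,[Rx]: reassembles the four bytes starting at addr. */
uint32_t model_ldr(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr]
         | (uint32_t)mem[addr + 1] << 8
         | (uint32_t)mem[addr + 2] << 16
         | (uint32_t)mem[addr + 3] << 24;
}
```

With this model, the copy example is model_str(mem, 0x88, model_ldr(mem, 0x80)) on a large enough array.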
LDRB, LDRH, STRB, STRH
Data Size    Bits   Load instruction used   Store instruction used
Byte         8      LDRB                    STRB
Half-word    16     LDRH                    STRH
Word         32     LDR                     STR

LDR Rd,[Rs]     STR Rs,[Rd]
LDRB Rd,[Rs]    STRB Rs,[Rd]
LDRH Rd,[Rs]    STRH Rs,[Rd]

Example: Assume that R5 = 0x40000200 and R1 = 0x41526374. After running the following instruction:
STRB R1, [R5]
location 0x40000200 will be loaded with 0x74 (the lowest byte of R1). Locations 0x40000201 through 0x40000203 are unchanged.
Address        Contents
0x4000 0203    -
0x4000 0202    -
0x4000 0201    -
0x4000 0200    0x74

Example: Assume that R5 = 0x40000200 and locations 0x40000200 through 0x40000203 contain 0x78, 0x56, 0x34, and 0x12, respectively. After running the following instruction:
LDRH R7, [R5]
R7 will be loaded with 0x00005678.
Address        Contents
0x4000 0203    0x12
0x4000 0202    0x34
0x4000 0201    0x56
0x4000 0200    0x78
R7: 0x00 0x00 0x56 0x78
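This sketch uses the same kind of byte-array model of memory, with hypothetical helper names. STRB writes only the low byte of the register. LDRH reads two bytes and zero-extends them, which is why R7 ends up as 0x00005678.

```c
#include <assert.h>
#include <stdint.h>

/* Model of STRB Rs,[Rd]: only the low byte of Rs is written. */
void model_strb(uint8_t *mem, uint32_t addr, uint32_t rs)
{
    mem[addr] = (uint8_t)rs;
}

/* Model of LDRH Rd,[Rs]: two bytes, little-endian, zero-extended to 32 bits. */
uint32_t model_ldrh(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr] | (uint32_t)mem[addr + 1] << 8;
}
```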
Memory Map in STM32F103
I/O Register   Address
GPIOA_LCKR     0x40010818
GPIOA_BRR      0x40010814
GPIOA_BSRR     0x40010810
GPIOA_ODR      0x4001080C
GPIOA_IDR      0x40010808
GPIOA_CRH      0x40010804
GPIOA_CRL      0x40010800
Example: Read the contents of GPIOA_IDR.
Solution:
LDR R1,=0x40010808 ;R1 = 0x40010808
LDR R2,[R1] ;R2 = [0x40010808]
Example: Write 0x53F6 into GPIOA_ODR.
Solution:
LDR R2,=0x53F6 ;R2 = 0x53F6
LDR R1,=0x4001080C ;R1 = 0x4001080C
STR R2,[R1] ;[0x4001080C] = 0x53F6
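Every address in the table is the GPIOA base 0x40010800 plus a fixed offset. The C sketch below only computes those addresses. GPIOA_BASE and the offset names are introduced here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names: the GPIOA base and register offsets from the table. */
#define GPIOA_BASE 0x40010800u

enum {
    CRL_OFFSET  = 0x00, CRH_OFFSET = 0x04, IDR_OFFSET = 0x08, ODR_OFFSET = 0x0C,
    BSRR_OFFSET = 0x10, BRR_OFFSET = 0x14, LCKR_OFFSET = 0x18
};

/* Address of a GPIOA register given its offset. */
uint32_t gpioa_reg(uint32_t offset)
{
    return GPIOA_BASE + offset;
}
```

On the STM32 itself, C code would typically write through a volatile pointer to that address, for example *(volatile uint32_t *)gpioa_reg(ODR_OFFSET) = 0x53F6. That line only makes sense on the target, not on a PC.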
Some Arm addressing modes
Immediate
MOV R1, #0x25 ;machine code: F04F0125
ADD R6, R6, #0x40
Register addressing mode
MOV R2, R4
ADD R3, R2, R1 ;machine code: EB020301
Register indirect (indexed)
STR R5, [R6]
LDR R10, [R3]
Assembler Directives
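Assembler
[Figure: from source to program memory. An editor program produces the assembly source myfile.a. The assembler program translates it into machine language (myfile.o) and a listing file (myfile.lst). The linker combines myfile.o with [otherFiles.o] and [scriptFile.scr] to produce myfile.map and myfile.hex, which is downloaded to the program memory.]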
Assembler directives vs. Instructions
Instructions (e.g. ADD, MOV) tell the CPU what
to do
Assembler directives tell the assembler what to
do
AREA
IMPORT and EXPORT
END
DCD, DCW, DCB
EQU
INCLUDE
AREA
AREA sectionName, attribute1, attribute2, …
Code:
AREA myCode, CODE, READONLY
Data:
AREA myData1, DATA, READWRITE
AREA myConst, DATA, READONLY
Example:
AREA MY_PROG,CODE,READONLY
__main
MOV R4, #6
ADD R1,R1,R2
…
myFunc
ADD R2,R3,R4
…
[Figure: STM32F103 memory map showing READONLY areas placed in Flash (from 0x0000 0000) and READWRITE areas placed in SRAM (from 0x2000 0000)]
IMPORT and EXPORT
File1.s
; from the main program:
IMPORT MY_FUNC
...
BL MY_FUNC ;call MY_FUNC function
...
File2.s
AREA OUR_EXAMPLE,CODE,READONLY
EXPORT MY_FUNC
IMPORT DATA1
MY_FUNC
LDR R1,=DATA1
...
First Assembly Program
EXPORT __main
AREA PROG_2_1, CODE, READONLY
__main
MOV R1, #0x25 ; R1 = 0x25
MOV R2, #0x34 ; R2 = 0x34
ADD R3, R2, R1 ; R3 = R2 + R1
HERE B HERE ; stay here forever
END ;end of source file
Defining Const. Values using DCD, DCW, and DCB
DCB allocates bytes of memory and initializes them.
Examples:
MYVALUE DCB 5
FIBO DCB 1,1,2,3,5,8
MY_MSG DCB "Hello World!"
DCW allocates a half-word of memory and initializes it.
Example:
MYVALUE DCW 25425
DCD allocates a word of memory and initializes it.
Example:
MYDATA DCD 0x200000, 0x30F5, 5000000
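As a rough C analogy, DCB, DCW, and DCD correspond to initialized arrays of 8-, 16-, and 32-bit elements. This assumes a little-endian target. The identifiers are adapted here because MYVALUE appears in two separate examples. One detail is worth noticing, assuming armasm semantics: DCB with a quoted string stores only the characters and adds no terminating zero. So MY_MSG occupies exactly 12 bytes.

```c
#include <assert.h>
#include <stdint.h>

const uint8_t  MYVALUE_B = 5;                       /* MYVALUE DCB 5 */
const uint8_t  FIBO[]    = {1, 1, 2, 3, 5, 8};      /* FIBO DCB 1,1,2,3,5,8 */
/* DCB "..." stores the characters only (no terminating zero assumed),
   so the C array is sized to exactly 12 characters. */
const char     MY_MSG[12] = "Hello World!";
const uint16_t MYVALUE_H = 25425;                   /* MYVALUE DCW 25425 (0x6351) */
const uint32_t MYDATA[]  = {0x200000, 0x30F5, 5000000}; /* MYDATA DCD ... */
```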
Storing Fixed Data in Program Memory
EXPORT __main
AREA PROG2_2, CODE, READONLY
__main LDR R2, =OUR_FIXED_DATA ; point to OUR_FIXED_DATA
LDRB R0, [R2] ; load R0 with the contents
; of memory pointed to by R2
ADD R1, R1, R0 ; add R0 to R1
HERE B HERE ; stay here forever
AREA LOOKUP_EXAMPLE, DATA, READONLY
OUR_FIXED_DATA
DCB 0x55, 0x33, 1, 2, 3, 4, 5, 6
DCD 0x23222120, 0x30
DCW 0x4540, 0x50
END
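The bytes that OUR_FIXED_DATA occupies can be written out as a C array. This sketch assumes a little-endian Cortex-M and the armasm behavior that DCD and DCW align to 4 and 2 bytes. Both already fall on aligned offsets (8 and 16), so no padding appears. The array shows why LDRB R0, [R2] reads 0x55.

```c
#include <assert.h>
#include <stdint.h>

/* Byte image of OUR_FIXED_DATA as it would sit in Flash (little-endian). */
const uint8_t OUR_FIXED_DATA[] = {
    0x55, 0x33, 1, 2, 3, 4, 5, 6,   /* DCB 0x55, 0x33, 1, 2, 3, 4, 5, 6 */
    0x20, 0x21, 0x22, 0x23,         /* DCD 0x23222120 (low byte first) */
    0x30, 0x00, 0x00, 0x00,         /* DCD 0x30 */
    0x40, 0x45,                     /* DCW 0x4540 */
    0x50, 0x00                      /* DCW 0x50 */
};

/* LDRB R0,[R2] with R2 = OUR_FIXED_DATA reads the first byte. */
uint8_t first_byte(void) { return OUR_FIXED_DATA[0]; }
```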
Allocating memory using SPACE
SPACE allocates memory without initializing.
Example 1: Allocating 4 bytes of memory:
MY_LONG SPACE 4
Example 2: Allocating 2 bytes:
ALFA SPACE 2
Example 3: Allocating an array of 20 bytes:
MY_ARRAY SPACE 20
Defining 3 variables A, B, and C
EXPORT __main
AREA OUR_PROG, CODE, READONLY
__main
; A = 5
LDR R0, =A ; R0 = Addr. of A
MOV R1, #5 ; R1 = 5
STR R1, [R0] ; init. A with 5
; B = 4
LDR R0, =B ; R0 = Addr. of B
MOV R1, #4 ; R1 = 4
STR R1, [R0] ; init. B with 4
; R1 = A
LDR R0, =A ; R0 = Addr. of A
LDR R1, [R0] ; R1 = value of A
; R2 = B
LDR R0, =B ; R0 = Addr. of B
LDR R2, [R0] ; R2 = value of B
; C = R1 + R2 (C = A + B)
ADD R3, R1, R2 ; R3 = A + B
LDR R0, =C ; R0 = Addr. of C
STR R3, [R0] ; C = R3
loop B loop

AREA OUR_DATA, DATA, READWRITE
; Allocates the following in SRAM
A SPACE 4
B SPACE 4
C SPACE 4
END

Equivalent C code:
int main()
{
    int a = 5;
    int b = 4;
    int c = a + b;
    while(1)
    {
    }
}
ALIGN
ALIGN is used to align data on a 32-bit or 16-bit boundary.
a)
DTA DCB 0x55
DCB 0x22
END
b)
DTA DCB 0x55
ALIGN 2
DCB 0x22
END
c)
DTA DCB 0x55
ALIGN 4
DCB 0x22
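Assume DTA itself starts on a 4-byte boundary. Then the byte 0x22 ends up at offset 1 in case a), offset 2 in case b), and offset 4 in case c), because ALIGN n advances the location counter to the next multiple of n. Here is a C sketch of that rounding; align_up is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds offset up to the next multiple of n (n must be a power of two),
   as ALIGN n does to the assembler's location counter. */
uint32_t align_up(uint32_t offset, uint32_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return (offset + n - 1) & ~(n - 1);
}
```

In all three cases 0x55 sits at offset 0, so the next free offset is 1. That offset stays 1 in a), becomes align_up(1, 2) = 2 in b), and becomes align_up(1, 4) = 4 in c).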
Assembler Directives
EQU and RN
name EQU value
Example 1:
COUNT EQU 0x25
MOV R1, #COUNT ;R1 = 0x25
MOV R2, #COUNT + 3 ;R2 = 0x28
Example 2:
GPIOA_ODR EQU 0x4001080C
name RN register
Example 1:
RESULT RN R2
MOV RESULT,#23
Example 2:
ProgCounter RN R15
Assembler Directives
INCLUDE
INCLUDE "filename.ext"
hFile.inc
GPIOA_CRL EQU 0x40010800
GPIOA_CRH EQU 0x40010804
GPIOA_IDR EQU 0x40010808
GPIOA_ODR EQU 0x4001080C
....
Program.s
INCLUDE "hFile.inc"
Power up in Cortex-M
Startup and main files
Startup_stm32f10x.s
AREA RESET, DATA, READONLY
EXPORT __Vectors
__Vectors DCD __initial_sp ; loc. 0 to 3 (Stack init)
DCD Reset_Handler ; loc. 4 to 7
...
Reset_Handler PROC
IMPORT __main
...
LDR R0, =__main
BX R0
;reserving 0x400 bytes for stack
AREA STACK, NOINIT, READWRITE, ALIGN=3
Stack_Mem SPACE 0x400
__initial_sp

main.s
AREA OUR_EXAMPLE,CODE,READONLY
EXPORT __main
__main
...
Flash memory and PC register
Address       Machine code
0x08000200    F04F0125
0x08000204    F04F0234
0x08000208    EB020301
0x0800020C    E7FE
0x0800020E
PC: 0x0800020C
[Figure: the CPU fetches instructions from the Flash ROM (program memory) over the 32-bit program bus at the address held in the PC, while RAM, ports, and I/O pins are reached over the 32-bit data bus]

main.lst
Line  Offset    Machine Code  Instruction
1     00000000                ; The program adds some data
2     00000000                EXPORT __main
3     00000000                AREA PROG_2_4, CODE, READONLY
4     00000000                __main
5     00000000  F04F 0125     MOV R1, #0x25 ; R1 = 0x25
6     00000004  F04F 0234     MOV R2, #0x34 ; R2 = 0x34
7     00000008  EB02 0301     ADD R3, R2, R1 ; R3 = R2 + R1
8     0000000C
9     0000000C  E7FE          HERE B HERE ; stay here forever
10    0000000E                END
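Most instructions here are 4 bytes long, but B HERE (E7FE) is a 2-byte Thumb instruction. That is why the last address is 0x0800020E. The C sketch below reproduces the addresses from the instruction sizes. It assumes the program is placed at 0x08000200 as in the figure, and instr_addr is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes in bytes of the four instructions in main.lst:
   MOV (F04F 0125), MOV (F04F 0234), ADD (EB02 0301), B (E7FE). */
static const uint32_t sizes[] = {4, 4, 4, 2};

/* Address of instruction i (0-based) when the code starts at base;
   i = 4 gives the address just past the last instruction. */
uint32_t instr_addr(uint32_t base, size_t i)
{
    assert(i <= sizeof sizes / sizeof sizes[0]);
    uint32_t addr = base;
    for (size_t k = 0; k < i; k++)
        addr += sizes[k];
    return addr;
}
```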
How to speed up the CPU
Increase the clock frequency
Higher frequency means more power consumption and more heat
Limitations
Change the architecture
Pipelining
Harvard
RISC
Pipeline
Non-pipeline: the CPU only fetches, decodes, or executes in a given time
Pipeline
Pipeline (Cont.)
LDR R2, [R4] ; R2 = [R4]
ADD R0,R0,R1 ; R0 = R0 + R1
SUB R3,R3,R4 ; R3 = R3 - R4
[Figure: the three instructions overlapping in the Fetch, Decode, and Execute stages]
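One way to quantify the benefit uses an idealized model: 3 stages, one clock per stage, and no stalls. In that model, n instructions take 3n clocks without pipelining but n + 2 clocks with it. For the three instructions above that is 9 versus 5 clocks. This is a simplified sketch with hypothetical function names, not a timing claim for any particular Cortex core.

```c
#include <assert.h>
#include <stdint.h>

/* Idealized 3-stage (fetch, decode, execute) model: one clock per stage, no stalls. */
uint32_t cycles_non_pipelined(uint32_t n) { return 3 * n; }

/* With pipelining, the first instruction needs 3 clocks, then one finishes per clock. */
uint32_t cycles_pipelined(uint32_t n) { return n == 0 ? 0 : n + 2; }
```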
Harvard Architecture
separate buses for opcodes and operands
Advantage: opcodes and operands can go in and out of the CPU
together.
Disadvantage: Using Harvard architecture in motherboards leads
to more cost in general purpose computers.
[Figure: Harvard architecture. The CPU connects to code memory and to data memory through separate control, data, and address buses.]
Changing the architecture
RISC vs. CISC
CISC (Complex Instruction Set Computer)
Put as many instructions as you can into the CPU
RISC (Reduced Instruction Set Computer)
Reduce the number of instructions and use the available
facilities more efficiently.
RISC architecture
Feature 1 (fixed instruction size)
RISC processors have a fixed instruction size, which makes the task of the instruction decoder easier.
In ARM the instructions are 4 bytes.
In Thumb2 the instructions are either 2 or 4 bytes.
In CISC processors instructions have different
lengths
E.g. in 8051
CLR C ; a 1-byte instruction
ADD A, #20H ; a 2-byte instruction
LJMP HERE ; a 3-byte instruction
RISC architecture
Feature 2: reduce the number of instructions
Pros: Reduces the number of used transistors
Cons:
Can make the assembly programming more difficult
Can lead to using more memory
RISC architecture
Feature 3: limit the addressing mode
Advantage
Hardwiring: with few addressing modes, the control unit can be hardwired
Disadvantage
Can make the assembly programming more difficult
RISC architecture
Feature 4: Load/Store
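LDR R8,=0x20 ;R8 = 0x20
LDR R0,[R8] ;R0 = [0x20]
LDR R8,=0x220 ;R8 = 0x220
LDR R1,[R8] ;R1 = [0x220]
ADD R0, R0,R1 ;R0 = R0 + R1
LDR R8,=0x230 ;R8 = 0x230
STR R0,[R8] ;[0x230] = R0
Only load (LDR) and store (STR) instructions access memory; ADD operates on registers only.
[Figure: microcontroller block diagram. The CPU (PC and instruction decoder) reaches the Flash ROM over the program bus, and RAM, USART, timers, the interrupt unit, the oscillator, ports, and other peripherals over the data bus. The ports connect to the I/O pins.]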
RISC architecture
Feature 5: more than 95% of instructions are
executed in 1 machine cycle
RISC architecture
Feature 6
RISC processors have at least 32 registers.
This decreases the need for the stack and for memory accesses.
In ARM there are 16 general purpose registers (R0
to R15)