ARM Prog Model 6 Subroutines PDF
Subroutine
A subroutine, also called a function or a procedure:
single-entry, single-exit
return to caller after it exits
When a subroutine is called, the Link Register (LR) holds the
memory address of the next instruction to be executed in the
calling program when the subroutine exits.
Subroutine calls
ARM subroutine linkage
Branch and link instruction:
BL foo ;copies the return address (address of the instruction after BL) to r14 (LR), then branches to foo
Saving/restoring multiple registers
LDM/STM – load/store multiple registers
LDMIA – increment address after xfer
LDMIB – increment address before xfer
LDMDA – decrement address after xfer
LDMDB – decrement address before xfer
LDM/STM default to LDMIA/STMIA
Examples:
ldmia r13!,{r8-r12,r14} ;r13 updated at end
stmda r13,{r8-r12,r14} ;r13 not updated at end
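The four modes differ only in which addresses are accessed and where the base register ends up when writeback (!) is used. A small C model of that arithmetic, assuming 4-byte registers; the names ldm_mode, ldm_lowest_addr and ldm_writeback are made up for illustration and do not belong to any ARM toolchain:

```c
#include <stdint.h>

/* Hypothetical model of the LDM/STM addressing modes (names are illustrative). */
typedef enum { MODE_IA, MODE_IB, MODE_DA, MODE_DB } ldm_mode;

/* Lowest memory address accessed when n registers are transferred.
   The lowest-numbered register in the list always uses this address. */
uint32_t ldm_lowest_addr(ldm_mode mode, uint32_t base, uint32_t n)
{
    switch (mode) {
    case MODE_IA: return base;              /* base, base+4, ...      */
    case MODE_IB: return base + 4;          /* base+4, base+8, ...    */
    case MODE_DA: return base - 4 * n + 4;  /* ..., base-4, base      */
    default:      return base - 4 * n;      /* MODE_DB: ..., base-4   */
    }
}

/* Value written back to the base register when '!' is present. */
uint32_t ldm_writeback(ldm_mode mode, uint32_t base, uint32_t n)
{
    if (mode == MODE_IA || mode == MODE_IB)
        return base + 4 * n;
    return base - 4 * n;
}
```

For ldmia r13!,{r8-r12,r14} the list holds six registers, so with r13 = 0x1000 the model gives a final r13 of 0x1018.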
The Stack
Stack is last-in-first-out (LIFO) storage
32-bit data
Stack pointer, SP or R13, points to top element of stack
SP decremented before data placed (“pushed”) onto stack
SP incremented after data removed (“popped”) from stack
PUSH and POP instructions used to store and retrieve data
PUSH {reglist} = STMDB sp!,{reglist}
POP {reglist} = LDMIA sp!,{reglist}
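[Figure: stack contents as the values 1, 2 and 3 are pushed and then removed with POP {R5}, POP {R4}, POP {R3}; SP moves with each operation. Address shown: 0x2000.7FFC]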
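The push and pop rules above can be sketched in C. This is a software model only, not hardware access; the type stack_model, the size STACK_WORDS and the function names are assumptions chosen for the sketch:

```c
#include <stdint.h>

/* Illustrative model of a full-descending stack. */
#define STACK_WORDS 8u

typedef struct {
    uint32_t mem[STACK_WORDS]; /* mem[STACK_WORDS-1] is the highest word */
    uint32_t sp;               /* index of the top element; STACK_WORDS means empty */
} stack_model;

void stack_init(stack_model *s) { s->sp = STACK_WORDS; }

/* PUSH: decrement SP first, then store (like STMDB sp!,{reg}). */
int stack_push(stack_model *s, uint32_t value)
{
    if (s->sp == 0) return -1;            /* overflow: no room left */
    s->sp -= 1;
    s->mem[s->sp] = value;
    return 0;
}

/* POP: load first, then increment SP (like LDMIA sp!,{reg}). */
int stack_pop(stack_model *s, uint32_t *value)
{
    if (s->sp == STACK_WORDS) return -1;  /* underflow: stack is empty */
    *value = s->mem[s->sp];
    s->sp += 1;
    return 0;
}
```

Pushing 1, 2, 3 and then popping three times returns 3, 2, 1 and leaves sp where it started, which is the LIFO behavior the slide describes.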
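Stack Growth Convention: Ascending vs Descending
[Figure: ascending vs descending stack growth. The descending convention, where SP is decremented on a push, is the one used in Cortex-M4.]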
Function QUAD
Functions
[Figure: control flow between main and Change(): main calls Change(), and Change() returns to main]
Subroutines
;------------Rand100------------
; Return R0=a random number between
; 1 and 100. Call Random and then divide
; the generated number by 100
; return the remainder+1
Rand100
        PUSH {LR}       ; SAVE Link
        BL   Random
        ;R0 is a 32-bit random number
        LDR  R1,=100
        BL   Divide
        ADD  R0,R3,#1
        POP  {LR}       ; Restore Link back
        BX   LR
        ; alternative: POP {PC} in place of POP {LR} / BX LR

;------------Divide------------
; find the unsigned quotient and remainder
; Inputs:  dividend in R0
;          divisor in R1
; Outputs: quotient in R2
;          remainder in R3
; dividend = divisor*quotient + remainder
Divide
        UDIV R2,R0,R1   ; R2=R0/R1, R2 is quotient
        MUL  R3,R2,R1   ; R3=(R0/R1)*R1
        SUB  R3,R0,R3   ; R3=R0%R1, R3 is remainder of R0/R1
        BX   LR         ; return
        ALIGN
        END
Stack to pass parameters
Steps taken by the high-level program (caller) and the subroutine (callee):
1) Caller: pushes inputs on the stack
2) Caller: calls subroutine
3) Subroutine: sees the inputs on stack (pops)
4) Subroutine: performs the action of the subroutine
5) Subroutine: pushes outputs on the stack
6) Caller: stack contains outputs (pop)
7) Caller: balance stack
ARM Architecture Procedure Call Standard
(AAPCS)
Application Binary Interface (ABI) standard for ARM
Allows assembly subroutine to be callable from C or callable from
someone else’s software
Parameters passed using registers and stack
Use registers R0, R1, R2, and R3 to pass the first four input
parameters (in order) into any function, C or assembly.
Pass additional parameters via the stack
Place the return value in register R0.
Functions can freely modify registers R0–R3 and R12.
If a function uses R4–R11, push the current register values
onto the stack, use the registers, and then pop the old
values off the stack before returning.
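The argument-placement rule above can be written as a small lookup. The sketch assumes every argument is a 32-bit value (64-bit arguments follow extra rules not modeled here); the type arg_location and the function aapcs_arg_location are hypothetical names:

```c
/* Illustrative model of AAPCS placement for 32-bit arguments only. */
typedef struct {
    int in_register;   /* 1: passed in R0-R3, 0: passed on the stack */
    int reg;           /* register number (0..3) when in_register is 1, else -1 */
    int stack_offset;  /* byte offset from SP at subroutine entry, else -1 */
} arg_location;

/* index is 0-based: index 0 is the first argument. */
arg_location aapcs_arg_location(int index)
{
    arg_location loc;
    if (index < 4) {
        loc.in_register = 1;
        loc.reg = index;               /* first four go in R0..R3, in order */
        loc.stack_offset = -1;
    } else {
        loc.in_register = 0;
        loc.reg = -1;
        loc.stack_offset = 4 * (index - 4);  /* fifth argument sits at [SP] */
    }
    return loc;
}
```

So for a six-argument call, arguments one to four map to R0 to R3, the fifth is at [SP] and the sixth at [SP, #4] when the subroutine is entered.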
ARM Procedure Call Standard
Register   Usage                            Preserved by subroutine   Notes
r0 (a1)    Argument 1 and return value      No     If return has 64 bits, then r0:r1 hold it. If argument 1 has 64 bits, r0:r1 hold it.
r1 (a2)    Argument 2                       No
r2 (a3)    Argument 3                       No     If the return has 128 bits, r0-r3 hold it.
r3 (a4)    Argument 4                       No     If more than 4 arguments, use the stack.
r4 (v1)    General-purpose V1               Yes    Variable register 1 holds a local variable.
r5 (v2)    General-purpose V2               Yes    Variable register 2 holds a local variable.
r6 (v3)    General-purpose V3               Yes    Variable register 3 holds a local variable.
r7 (v4)    General-purpose V4               Yes    Variable register 4 holds a local variable.
r8 (v5)    General-purpose V5               Yes    Variable register 5 holds a local variable.
r9 (v6)    Platform specific/V6             No     Usage is platform-dependent.
r10 (v7)   General-purpose V7               Yes    Variable register 7 holds a local variable.
r11 (v8)   General-purpose V8               Yes    Variable register 8 holds a local variable.
r12 (IP)   Intra-procedure-call register    No     It holds intermediate values between a procedure and the sub-procedure it calls.
r13 (SP)   Stack pointer                    Yes    SP has to be the same after a subroutine has completed.
r14 (LR)   Link register                    No     LR does not have to contain the same value after a subroutine has completed.
r15 (PC)   Program counter                  N/A    Do not directly change PC.
Example: R2 = R0*R0+R1*R1
        MOV R0,#3       ; R0: first argument
        MOV R1,#4       ; R1: second argument
        BL  SSQ
        MOV R2,R0
        B   ENDL
        ...
SSQ     MUL R2,R0,R0
        MUL R3,R1,R1
        ADD R2,R2,R3
        MOV R0,R2       ; R0: Return Value
        BX  LR
        ...

Equivalent C:
int SSQ(int x, int y){
    int z;
    z = x*x + y*y;
    return z;
}
Parameter-Passing: Registers
Caller
;--call a subroutine that
;uses registers for parameter passing
        MOV  R0,#7
        MOV  R1,#3
        BL   Exp
;; R2 becomes 7^3 = 343 (0x157)

Question: Is this AAPCS-compliant?

Callee
;---------Exp-----------
; Input: R0 and R1 have inputs XX and YY (non-negative)
; Output: R2 has the result XX raised to YY
; Destroys input R1
Exp
        ADDS r0,#0      ; check if XX is zero
        BEQ  Zero       ; skip algorithm if XX=0
        ADDS r1,#0      ; check if YY is zero
        BEQ  One        ; skip algorithm if YY=0
        MOV  r2,#1      ; Initial product is 1
More    MUL  r2,r0      ; multiply product with XX
        ADDS r1,#-1     ; Decrement YY
        BNE  More
        B    Retn       ; Done, so return
Zero    MOV  r2,#0      ; XX is 0 so result is 0
        B    Retn
One     MOV  r2,#1      ; YY is 0 so result is 1
Retn    BX   LR
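For comparison, the control flow of Exp can be modeled in C. This is a sketch of the same algorithm, with comments mapping each step to the assembly; the name exp_model is made up here:

```c
#include <stdint.h>

/* C model of the Exp subroutine's control flow (function name is illustrative). */
uint32_t exp_model(uint32_t xx, uint32_t yy)
{
    if (xx == 0) return 0;   /* Zero: XX is 0 so result is 0 (checked first) */
    if (yy == 0) return 1;   /* One: YY is 0 so result is 1 */
    uint32_t product = 1;    /* MOV r2,#1 */
    do {
        product *= xx;       /* MUL r2,r0 */
        yy -= 1;             /* ADDS r1,#-1 */
    } while (yy != 0);       /* BNE More */
    return product;
}
```

Because XX is tested before YY, the model (like the assembly) yields 0 for the input pair XX=0, YY=0.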
Parameter-Passing: Stack
Caller
;-------- call a subroutine that
; uses stack for parameter passing
        MOV  R0,#12
        MOV  R1,#5
        MOV  R2,#22
        MOV  R3,#7
        MOV  R4,#18
        PUSH {R0-R4}
; Stack has 12,5,22,7 and 18 (with 12 on top)
        BL   Max5
; Call Max5 to find the maximum of the five numbers
        POP  {R5}
;; R5 has the max element (22)

Callee
;---------Max5-----------
; Input: 5 signed numbers pushed on the stack
; Output: put only the maximum number on the stack
; Comments: The input numbers are removed from stack
numM    RN 1            ; current number
max     RN 2            ; maximum so far
count   RN 0            ; how many elements
Max5
        POP  {max}      ; get top element (top of stack) into max
        MOV  count,#4   ; 4 more to go
Again   POP  {numM}     ; get next element
        CMP  numM,max
        BLT  Next
        MOV  max,numM   ; new numM is the max
Next    ADDS count,#-1  ; one more checked
        BNE  Again
        PUSH {max}      ; found max so push it on stack
        BX   LR
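The stack traffic of Max5 can be traced with a C model in which the stack is an array and the stack pointer is an index that grows toward lower indices. The function name max5_model and this array representation are assumptions for the sketch:

```c
#include <stdint.h>

/* Illustrative model: 'stack' is an array, *sp indexes the top element,
   and the stack grows toward lower indices (full descending). */
void max5_model(int32_t *stack, unsigned *sp)
{
    int32_t max = stack[(*sp)++];          /* POP {max} */
    for (unsigned count = 4; count != 0; count--) {
        int32_t num = stack[(*sp)++];      /* POP {numM} */
        if (num >= max)                    /* BLT Next skips the update when num < max */
            max = num;
    }
    stack[--(*sp)] = max;                  /* PUSH {max} */
}
```

Five words are popped and one is pushed, so the stack pointer ends four words higher than at entry, with the maximum on top for the caller's POP {R5}.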
Parameter-Passing: Stack & Regs
Caller
;------call a subroutine that uses both
;stack and registers for parameter passing
        MOV  R0,#6      ; R0 elem count
        MOV  R1,#-14
        MOV  R2,#5
        MOV  R3,#32
        MOV  R4,#-7
        MOV  R5,#0
        MOV  R6,#-5
        PUSH {R4-R6}    ; rest on stack
; R0 has element count
; R1-R3 have first 3 elements;
; remaining parameters on Stack
        BL   MinMax
;; R0 has -14 and R1 has 32 upon return

Callee
;---------MinMax-----------
; Input: N numbers reg+stack; N passed in R0
; Output: Return in R0 the min and R1 the max
; Comments: The input numbers are removed from stack
MinMax
        PUSH {R1-R3}    ; put all elements on stack
        CMP  r0,#0      ; if N is zero nothing to do
        BEQ  DoneMM
        POP  {r2}       ; pop top and set it
        MOV  r1,r2      ; as the current min and max
loop    ADDS r0,#-1     ; decrement and check
        BEQ  DoneMM
        POP  {r3}
        CMP  r3,r1
        BLT  Chkmin
        MOV  r1,r3      ; new num is the max
Chkmin  CMP  r3,r2
        BGT  NextMM
        MOV  r2,r3      ; new num is the min
NextMM  B    loop
DoneMM  MOV  R0,r2      ; R0 has min (min is kept in r2)
        BX   LR
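The comparison logic of MinMax, sketched in C. The array elems stands for the N values in the order they would be popped (top of stack first); the names min_max and minmax_model, and the choice to return zeros when N is 0, are assumptions of this sketch rather than something the slide specifies:

```c
#include <stdint.h>

typedef struct { int32_t min; int32_t max; } min_max;

/* Illustrative model of the MinMax loop over n popped values. */
min_max minmax_model(const int32_t *elems, unsigned n)
{
    min_max r = {0, 0};
    if (n == 0) return r;               /* nothing to do */
    r.min = r.max = elems[0];           /* first POP seeds both min and max */
    for (unsigned i = 1; i < n; i++) {
        int32_t v = elems[i];           /* POP {r3} */
        if (v >= r.max) r.max = v;      /* BLT Chkmin skips when v < max */
        if (v <= r.min) r.min = v;      /* BGT NextMM skips when v > min */
    }
    return r;
}
```

With the caller's six values (-14, 5, 32, -7, 0, -5) the model gives min = -14 and max = 32, matching the comment after BL MinMax.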
Abstraction - Device Driver
Abstraction lets us modularize our code and choose what to expose to users and what
to hide from them.
A device driver is a good example: it exposes public routines for users of the driver
to call and keeps driver internals in private routines hidden from the user
(more on private routines later).
Port E LED Abstraction
RCC EQU 0x40023800 ;RCC base address (Reset and Clock Control)
AHB1ENR EQU 0x30 ;offset of RCC->AHB1ENR (clock enable register)
GPIOE EQU 0x40021000 ;GPIOE base address
MODER EQU 0x00 ;offset of GPIOE->MODER (mode register)
ODR EQU 0x14 ;offset of GPIOE->ODR (output data register)
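A sketch of how a driver might hide these constants behind small routines, written in C as pure functions on register values so it can run without hardware. The bit positions are assumptions based on an STM32F4-style layout (GPIOE clock enable at bit 4 of AHB1ENR, two MODER bits per pin with 01 meaning output), and the LED pin number is a hypothetical parameter because the slide does not name one:

```c
#include <stdint.h>

#define RCC      0x40023800u  /* RCC base address, from the EQU table */
#define AHB1ENR  0x30u        /* offset of RCC->AHB1ENR */
#define GPIOE    0x40021000u  /* GPIOE base address */
#define MODER    0x00u        /* offset of GPIOE->MODER */
#define ODR      0x14u        /* offset of GPIOE->ODR */

/* Address of a register = peripheral base + offset. */
uint32_t reg_addr(uint32_t base, uint32_t offset) { return base + offset; }

/* Assumption: bit 4 of AHB1ENR enables the GPIOE clock. */
uint32_t enable_gpioe_clock(uint32_t ahb1enr) { return ahb1enr | (1u << 4); }

/* Assumption: MODER holds 2 bits per pin and 01 selects output mode (pin < 16). */
uint32_t set_pin_output(uint32_t moder, unsigned pin)
{
    moder &= ~(3u << (2 * pin));   /* clear the pin's two mode bits */
    moder |=  (1u << (2 * pin));   /* write 01 = general-purpose output */
    return moder;
}

/* Public-style routines: callers never see bit positions. */
uint32_t led_on(uint32_t odr, unsigned pin)  { return odr | (1u << pin); }
uint32_t led_off(uint32_t odr, unsigned pin) { return odr & ~(1u << pin); }
```

On real hardware each function's result would be written back through a volatile pointer to the address given by reg_addr; that step is left out so the sketch stays testable on a host machine.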
Nested subroutine calls
Nested function calls in C:
Nested subroutine calls (1)
Nesting/recursion requires a “coding convention” to
save/pass parameters:
AREA Code1,CODE
Main LDR r13,=StackEnd ;r13 = address just past the stack area (stack empty)
MOV r1,#5 ;pass value 5 to func1
STR r1,[r13,#-4]! ; push argument onto stack
BL func1 ; call func1()
here B here
; Stack area
AREA Data1,DATA
Stack SPACE 20 ;allocate stack space
StackEnd
END
System Design
Partition the problem into manageable parts
Successive Refinement
Stepwise Refinement
Systematic Decomposition
-------------------------------------------------------------------------
Start with a task and decompose it into a set of simpler
subtasks
Subtasks are decomposed into even simpler sub-subtasks
Each subtask is simpler than the task itself
Ultimately, a subtask is so simple that it can be converted to software
Test the subtask before combining with other subtasks
Make design decisions
document decisions and subtask requirements
System Design
Four structured program building blocks:
“do A then do B” → sequential
“do A and B in either order” → sequential (parallel)
“if A, then do B” → conditional
“for each A, do B” → iterative
“do A until B” → iterative
“repeat A over & over forever” → iterative (condition always true)
“on external event do B” → interrupt
“every t msec do B” → interrupt
Successive Refinement