CSE205R01: COMPUTER ARCHITECTURE

B.Tech. CSE

Rajilal M V
School of Computing
ARM instruction set
• Data Processing Instructions
– manipulate data within registers
– move instructions, arithmetic instructions, logical instructions,
comparison instructions, and multiply instructions
– Most data processing instructions can process one of their
operands using the barrel shifter
– usage of S suffix on a data processing instruction
• it updates the flags in the cpsr
– Move and logical operations
• update the carry flag C, negative flag N, and zero flag Z
• carry flag is set from the result of the barrel shift as the last
bit shifted out
• N flag is set to bit 31 of the result
• Z flag is set if the result is zero
ARM instruction set
– Move Instructions
• simplest ARM instruction
• copies N into a destination register Rd, where N is a
register or immediate value
• useful for setting initial values and transferring
data between registers
ARM instruction set

• MOV instruction takes the contents of register r5 and
copies them into register r7
– in the example, r5 holds the value 5, which overwrites the value 8 in register r7
ARM instruction set
– Barrel Shifter
• A unique and powerful feature of the ARM processor is
– the ability to shift the 32-bit binary pattern in one of the
source registers left or right by a specific number of
positions before it enters the ALU
– This shift increases the power and flexibility of many data
processing operations
• Pre-processing or shift occurs within the cycle time of the
instruction
– particularly useful for loading constants into a register and
achieving fast multiplies or division by a power of 2
• data processing instructions that do not use the barrel
shifter
– MUL (multiply), CLZ (count leading zeros), and QADD
(signed saturated 32-bit add) instructions
ARM instruction set

• apply a logical shift left (LSL) to register Rm before
moving it to the destination register
• MOV instruction copies the shift operator result N
into register Rd
– N represents the result of the LSL operation
• the example multiplies register r5 by four and then places the result
into register r7
ARM instruction set
five different shift operations that you can
use within the barrel shifter: LSL, LSR, ASR, ROR, and RRX
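The five shift operations can be modelled in a few lines. The sketch below is an illustrative Python model of 32-bit values, not ARM code. The helper names and the restriction to shift amounts between 0 and 31 are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeros are shifted in on the right."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right: zeros are shifted in on the left."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right: bit 31 (the sign bit) is replicated."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32              # reinterpret as a negative Python int
    return (x >> n) & MASK32

def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rrx(x, carry_in):
    """Rotate right extended by one bit: the old carry enters bit 31.
    Returns (result, carry_out)."""
    x &= MASK32
    return ((carry_in & 1) << 31) | (x >> 1), x & 1
```

For instance, lsl(5, 2) gives 20, which is the multiply-by-four effect of MOV r7, r5, LSL #2 described earlier.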
ARM instruction set
fig illustrates a logical shift left by one
ARM instruction set
example of a MOVS instruction that shifts register r1 left by one bit
This multiplies register r1 by 2^1, that is, by two
C flag is updated in the cpsr because the S suffix is present in the instruction
mnemonic
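A minimal Python sketch of how MOVS with an LSL shift could derive the result and the flags, assuming a shift amount between 1 and 31. The function name movs_lsl and the sample value are illustrative and not taken from the slide.

```python
MASK32 = 0xFFFFFFFF

def movs_lsl(rm, shift):
    """Model MOVS Rd, Rm, LSL #shift for 1 <= shift <= 31.
    Returns (rd, n, z, c)."""
    rm &= MASK32
    rd = (rm << shift) & MASK32
    c = (rm >> (32 - shift)) & 1   # last bit shifted out of bit 31
    n = rd >> 31                   # N is bit 31 of the result
    z = int(rd == 0)               # Z is set when the result is zero
    return rd, n, z, c
```

With r1 = 0x80000004, movs_lsl(0x80000004, 1) returns 0x00000008 with C set, because bit 31 was the last bit shifted out.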
ARM instruction set
• lists the syntax for the different barrel shift operations available on data
processing instructions
• second operand N can be an immediate constant preceded by #, a register value
Rm, or the value of Rm processed by a shift
ARM instruction set
– Arithmetic Instructions
• implement addition and subtraction of 32-bit signed and
unsigned values

• N is the result of the shifter operation


ARM instruction set
• simple subtract instruction subtracts a value stored
in register r2 from a value stored in register r1
• result is stored in register r0
ARM instruction set
• reverse subtract instruction (RSB) subtracts r1
from the constant value #0, writing the result to r0
• can use this instruction to negate numbers
ARM instruction set
• SUBS instruction is useful for decrementing loop counters
• subtract the immediate value one from the value one stored in
register r1
• result value zero is written to register r1
• cpsr is updated with the Z and C flags being set (Z because the result is zero, C because the subtraction needed no borrow)
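The flag behaviour of SUBS can be sketched as below. This is an illustrative Python model under the assumption of 32-bit operands. The function names subs and rsb are chosen for the example, and rsb simply swaps the operands to show how reverse subtract negates a value.

```python
MASK32 = 0xFFFFFFFF

def subs(rn, op2):
    """Model SUBS Rd, Rn, op2. Returns (rd, n, z, c, v)."""
    rn &= MASK32
    op2 &= MASK32
    rd = (rn - op2) & MASK32
    n = rd >> 31
    z = int(rd == 0)
    c = int(rn >= op2)                       # C = NOT borrow (unsigned compare)
    # V: operands differ in sign and the result sign differs from rn
    v = ((rn ^ op2) & (rn ^ rd)) >> 31
    return rd, n, z, c, v

def rsb(rn, op2):
    """Model RSB Rd, Rn, op2: computes op2 - rn (result only)."""
    return subs(op2, rn)[0]
```

subs(1, 1) yields a zero result with Z and C set, matching the loop-counter case above, and rsb(5, 0) yields 0xFFFFFFFB, the two's complement of 5.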
ARM instruction set
– Using the Barrel Shifter with Arithmetic Instructions
• the following example illustrates the use of the inline barrel
shifter with an arithmetic instruction
– instruction multiplies the value stored in register r1 by three
– Register r1 is first shifted one location to the left to give
the value of twice r1
– ADD instruction then adds the result of the barrel shift
operation to register r1
– final result transferred into register r0 is equal to three times
the value stored in register r1
ARM instruction set
– Logical Instructions
• Logical instructions perform bitwise logical
operations on the two source registers
ARM instruction set
• example shows a logical OR operation between
registers r1 and r2
• r0 holds the result
ARM instruction set
• example shows a more complicated logical instruction
called BIC, which carries out a logical bit clear
– register r2 contains a binary pattern where every binary 1 in r2 clears a
corresponding bit location in register r1
• particularly useful when clearing status bits and is frequently
used to change interrupt masks in the cpsr
• logical instructions update the cpsr flags only if the S suffix is
present
• can use barrel-shifted second operands in the same way as the
arithmetic instructions
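As a rough illustration, ORR and BIC reduce to one bitwise expression each. The Python helpers below are a sketch with assumed names, and the operand values in the usage note are made up for the example.

```python
MASK32 = 0xFFFFFFFF

def orr(rn, rm):
    """ORR Rd, Rn, Rm: bitwise OR of the two operands."""
    return (rn | rm) & MASK32

def bic(rn, rm):
    """BIC Rd, Rn, Rm: every 1 bit in rm clears the matching bit of rn."""
    return rn & ~rm & MASK32
```

For example, bic(0b1111, 0b0101) gives 0b1010: bits 0 and 2 are cleared and the others are left alone.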
ARM instruction set
– Comparison Instructions
• used to compare or test a register with a 32-bit value
• update the cpsr flag bits according to the result, but do not
affect other registers
• After the bits have been set, the information can then be used
to change program flow by using conditional execution

● N is the result of the shifter operation


ARM instruction set
• example shows a CMP comparison instruction
• both registers, r0 and r9, are equal before executing the
instruction
• value of the z flag prior to execution is 0 and is
represented by a lowercase z
• After execution the z flag changes to 1 or an uppercase Z
• This change indicates equality
ARM instruction set
• CMP is effectively a subtract instruction with the result
discarded
• CMN (compare negative) instruction
– operand1 - ( - operand2)
– useful for comparing the values in registers against small
negative numbers (such as -1 which might be used to mark the
end of a data structure.)
• TST (test bits) instruction is a logical AND operation
– used to test if one or more bits are set
– first operand is the value to be tested; the second operand is
the bit mask
• TEQ (test equivalent) is a logical exclusive OR operation
– similar to TST, but differs in that it uses an exclusive-or
operation
– It can be used to determine if specific bits in two
operands are the same or different.
ARM instruction set
• For each, the results are discarded but the condition
bits are updated in the cpsr
• no need to apply S suffix for comparison instructions
to update the flags
• comparison instructions
– only modify the condition flags of the cpsr
– do not affect the registers being compared
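The four comparison instructions can be viewed as flag-only versions of SUB, ADD, AND, and EOR. The sketch below models that view in Python. The function names are assumptions, and only the N, Z, and (for CMP and CMN) C flags are modelled.

```python
MASK32 = 0xFFFFFFFF

def _nz(result):
    """Return (n, z) for a 32-bit result."""
    return result >> 31, int(result == 0)

def cmp_flags(rn, op2):
    """CMP: flags of rn - op2, result discarded. Returns (n, z, c)."""
    rn, op2 = rn & MASK32, op2 & MASK32
    n, z = _nz((rn - op2) & MASK32)
    return n, z, int(rn >= op2)

def cmn_flags(rn, op2):
    """CMN: flags of rn + op2, result discarded. Returns (n, z, c)."""
    total = (rn & MASK32) + (op2 & MASK32)
    n, z = _nz(total & MASK32)
    return n, z, int(total > MASK32)

def tst_flags(rn, mask):
    """TST: (n, z) of rn AND mask."""
    return _nz(rn & mask & MASK32)

def teq_flags(rn, op2):
    """TEQ: (n, z) of rn EOR op2."""
    return _nz((rn ^ op2) & MASK32)
```

cmn_flags(0xFFFFFFFF, 1) sets Z, which is how CMN detects that a register holds -1. None of the functions return a result value, mirroring the fact that only the cpsr changes.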
ARM instruction set
– Multiply Instructions
• multiply the contents of a pair of registers
• depending upon the instruction, accumulate the result with
another register
– long multiplies accumulate onto a pair of registers representing a 64-bit
value
– The final result is placed in a destination register or a pair of registers
ARM instruction set
• example shows a simple multiply instruction that multiplies
registers r1 and r2 together and places the result into register r0
• In this example, register r1 is equal to the value 2, and r2 is
equal to 2
• result, 4, is then placed into register r0
ARM instruction set
• long multiply instructions (SMLAL, SMULL, UMLAL, and
UMULL) produce a 64-bit result
• result is too large to fit a single 32-bit register
– result is placed in two registers labeled RdLo and RdHi
– RdLo holds the lower 32 bits of the 64-bit result, and
– RdHi holds the higher 32 bits of the 64-bit result
– example :
» instruction multiplies registers r2 and r3 and places the
result into register r0 and r1
» Register r0 contains the lower 32 bits, and register r1
contains the higher 32 bits of the 64-bit result
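How a 64-bit product is split across RdLo and RdHi can be sketched as follows. This models the unsigned case (UMULL) only. The function name and the operand values in the usage note are illustrative.

```python
MASK32 = 0xFFFFFFFF

def umull(rm, rs):
    """Model UMULL RdLo, RdHi, Rm, Rs: unsigned 32 x 32 -> 64 bits.
    Returns (rd_lo, rd_hi)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, (product >> 32) & MASK32
```

umull(0xF0000002, 0x00000002) returns (0xE0000004, 0x00000001): the low word goes to the first register and the overflow beyond 32 bits goes to the second.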
ARM instruction set

• Branch Instructions
– changes the flow of execution or is used to call a routine
– allows programs to have
• subroutines, if-then-else structures, and loops
– change of execution flow forces the program counter pc
to point to a new address
ARM instruction set
– example shows a forward and backward branch
– Because these loops are address specific, we do not
include the pre- and post-conditions
– forward branch skips three instructions
– backward branch creates an infinite loop
ARM instruction set
– Branches are used to change execution flow
– Most assemblers hide the details of a branch instruction
encoding by using labels
– In the previous example, forward and backward are the
labels
– branch labels
• placed at the beginning of the line
• used to mark an address that can be used later by the
assembler to calculate the branch offset
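The offset calculation the assembler performs for a label can be sketched as below. It relies on two ARM-state facts: the pc reads as the branch address plus 8, and the B instruction stores a signed 24-bit word offset. The function name and the addresses in the usage note are assumptions for the example.

```python
def branch_offset(branch_address, target_address):
    """Return the 24-bit offset field of an ARM-state B instruction."""
    # pc is two instructions (8 bytes) ahead of the branch when it executes
    offset = (target_address - (branch_address + 8)) >> 2
    if not -(1 << 23) <= offset < (1 << 23):
        raise ValueError("target out of range for a B instruction")
    return offset & 0xFFFFFF       # two's complement, 24 bits
```

A branch to its own address, as in the infinite backward loop, gives branch_offset(0x8000, 0x8000) == 0xFFFFFE, and a forward branch that skips three instructions gives branch_offset(0x8000, 0x8010) == 2.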
ARM instruction set
– branch with link, or BL, instruction
• similar to the B instruction
• but overwrites the link register lr with a return
address
• It performs a subroutine call
• example shows a simple fragment of code that
branches to a subroutine using the BL instruction
• To return from a subroutine, you copy the link
register to the pc
ARM instruction set
– branch exchange (BX) instruction
• uses an absolute address stored in register Rm
• primarily used to branch to and from Thumb code
• T bit in the cpsr is updated by the least significant bit
of the branch register
– BLX instruction
• updates the T bit of the cpsr with the least significant
bit of the branch register
• additionally sets the link register with the return
address
ARM instruction set
Load Store Instructions
• transfer data between memory and processor registers
• three types of load-store instructions:
– single-register transfer, multiple-register transfer, and
swap
• Single-Register Transfer
– used for moving a single data item in and out of a
register
– data types supported
• signed and unsigned words (32-bit),
• halfwords (16-bit),
• bytes (8-bit)
Load Store Instructions
– various load-store single-register transfer instructions
Load Store Instructions
– example
• LDR r0, [r1] ; = LDR r0, [r1, #0]
– loads a word from the address stored in register r1 and
places it into register r0
• STR r0, [r1] ; = STR r0, [r1, #0]
– storing the contents of register r0 to the address contained
in register r1
• offset from register r1 is zero
• Register r1 is called the base address register
Load Store Instructions
• Single-Register Load-Store Addressing Modes
– ARM instruction set provides different modes for addressing
memory
– These modes incorporate one of the indexing methods:
• preindex with writeback ⇒ first add the offset to the base address,
access memory at the updated address, and keep the updated
address in the base register
• preindex ⇒ access memory at base plus offset, leaving the base
register unchanged
• postindex ⇒ first access memory at the base address, then add
the offset to the base register
Load Store Instructions
– Preindex with writeback
• calculates an address from a base register plus address
offset
• then updates that address base register with the new
address
– preindex offset
• same as the preindex with writeback but does not
update the address base register
– Postindex
• only updates the address base register after the address
is used
– preindex mode is useful for accessing an element in a
data structure
– postindex and preindex with writeback modes are
useful for traversing an array
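The three indexing methods differ only in which address is accessed and whether the base register changes. The following Python sketch models memory as a dictionary from byte address to word. The function names and that memory model are assumptions for illustration.

```python
def ldr_preindex(mem, regs, rn, offset, writeback=False):
    """Model LDR Rd, [Rn, #offset] and, with writeback, LDR Rd, [Rn, #offset]!"""
    address = regs[rn] + offset      # address is computed before the access
    if writeback:
        regs[rn] = address           # base register keeps the new address
    return mem[address]

def ldr_postindex(mem, regs, rn, offset):
    """Model LDR Rd, [Rn], #offset"""
    value = mem[regs[rn]]            # access uses the unmodified base
    regs[rn] += offset               # base is updated afterwards
    return value
```

With the base at 0x9000 and an offset of 4, both preindex forms read the word at 0x9004, while postindex reads the word at 0x9000. Only the writeback and postindex forms leave the base at 0x9004, which is why they suit array traversal.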
Load Store Instructions

• fig shows addressing modes available for load and store of a 32-bit word or an unsigned byte
• A signed offset or register is denoted by “+/−”
– identifying that it is either a positive or negative offset from the base address register Rn
– base address register is a pointer to a byte in memory
– the offset specifies a number of bytes
• Immediate means
– address is calculated using the base address register and a 12-bit offset encoded in the
instruction
• Register means
– address is calculated using the base address register and a specific register’s contents
• Scaled means
– address is calculated using the base address register and a barrel shift operation
Load Store Instructions
• Multiple-Register Transfer
– transfer multiple registers between memory and the
processor in a single instruction
– transfer occurs from a base address register Rn
pointing into memory
– more efficient than single-register transfers
• for moving blocks of data around memory
• saving and restoring context and stacks
Load Store Instructions

– N is the number of registers in the list of registers


– Any subset of the current bank of registers can be transferred to
memory or fetched from memory
– base register Rn
• determines the source or destination address for a load-store
multiple instruction
• this register can be optionally updated following the transfer
– this occurs when register Rn is followed by the !
character
– similar to the single-register load-store using preindex
with writeback
Load Store Instructions
example
● register r0 is the base register Rn and is followed by !
○ indicating that the register is updated after the instruction is
executed
● within the load multiple instruction the registers are not individually
listed
○ Instead the “-” character is used to identify a range of registers
○ the range is from register r1 to r3 inclusive
○ Each register can also be listed, using a comma to separate each register
within “{” and “}” brackets
Load Store Instructions
– Graphical representation of previous example
Load Store Instructions
– replace the LDMIA instruction with LDMIB instruction and
use the same PRE conditions
• first word pointed to by register r0 is ignored
• register r1 is loaded from the next memory location
– After execution
• register r0 now points to the last loaded memory location
• This is in contrast with the LDMIA example, which pointed
to the next memory location
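The difference between increment-after and increment-before can be sketched as follows. This is an illustrative Python model, not the slide's listing. The function name ldm, the dictionary memory, and the addresses in the usage note are assumptions.

```python
def ldm(mem, regs, rn, reg_list, mode="IA", writeback=True):
    """Model LDMIA/LDMIB Rn{!}, {reg_list}. Words are 4 bytes apart.
    reg_list must be given in ascending register order."""
    address = regs[rn]
    for name in reg_list:
        if mode == "IB":
            address += 4             # increment before the access
        regs[name] = mem[address]
        if mode == "IA":
            address += 4             # increment after the access
    if writeback:
        regs[rn] = address           # the ! suffix updates the base
    return regs
```

Starting with r0 = 0x80010 and three registers in the list, both modes leave r0 at 0x8001C. With IA that is the location after the last word loaded, with IB it is the last word loaded, because IB skipped the first word.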
Load Store Instructions

● STMIB instruction stores the values 7, 8, 9 to memory
● We then corrupt registers r1 to r3
● LDMDA reloads the original values and restores the base pointer r0
Load Store Instructions
block memory copy example
● a simple routine that copies blocks of 32 bytes from a source address
location to a destination address location
● two load-store multiple instructions
○ which use the same increment after addressing mode
Load Store Instructions
● relies on registers r9, r10, and r11 being set up before the code is executed
● Registers r9 and r11 determine the data to be copied
● register r10 points to the destination in memory for the data
● LDMIA
○ loads the data pointed to by register r9 into registers r0 to r7
○ also updates r9 to point to the next block of data to be copied
● STMIA
○ copies the contents of registers r0 to r7 to the destination memory address
pointed to by register r10
○ also updates r10 to point to the next destination location
● CMP and BNE
○ compare pointers r9 and r11
○ to check whether the end of the block copy has been reached
○ If the block copy is complete, then the routine finishes;
○ otherwise the loop repeats with the updated values of register r9 and r10
○ BNE is the branch instruction B with a condition mnemonic NE (not equal)
■ If the previous compare instruction sets the condition flags to not equal,
the branch instruction is executed
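The loop described above can be sketched in Python as follows. The parameters src, dst, and end stand in for r9, r10, and r11. The dictionary memory model and the function name are assumptions, and the instruction names in the comments indicate which step each line corresponds to.

```python
def block_copy(mem, src, dst, end):
    """Copy 32-byte blocks from src to dst until src reaches end.
    Returns the final (src, dst) pointers."""
    while True:
        # LDMIA r9!, {r0-r7}: load eight words, then advance the source
        block = [mem[src + 4 * i] for i in range(8)]
        src += 32
        # STMIA r10!, {r0-r7}: store eight words, then advance the destination
        for i, word in enumerate(block):
            mem[dst + 4 * i] = word
        dst += 32
        # CMP r9, r11 followed by BNE: repeat while the pointers differ
        if src == end:
            break
    return src, dst
```

As in the routine described, the body runs at least once and the end address must lie a whole number of 32-byte blocks past the source, otherwise the equality test never succeeds.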
Load Store Instructions
• shows the memory map of the block memory copy and how the
routine moves through memory
Load Store Instructions
● Stack operations
○ ARM architecture uses the load-store multiple instructions to
carry out stack operations
■ pop operation (removing data from a stack) uses a load
multiple instruction;
■ push operation (placing data onto the stack) uses a store
multiple instruction
○ stack is either ascending (A) or descending (D)
■ Ascending stacks grow towards higher memory addresses
■ descending stacks grow towards lower memory addresses
○ When you use a full stack (F)
■ stack pointer sp points to an address that is the last used or
full location (i.e., sp points to the last item on the stack)
○ if you use an empty stack (E)
■ sp points to an address that is the first unused or empty
location (i.e., it points after the last item on the stack)
Load Store Instructions
• a number of load-store multiple addressing mode aliases
are available to support stack operations
• Next to the pop column is the actual load multiple
instruction equivalent
– for example, a full ascending stack would have the
notation FA appended to the load multiple
instruction: LDMFA
– This would be translated into an LDMDA instruction
Load Store Instructions
• ARM has specified an ARM-Thumb Procedure Call
Standard (ATPCS)
– defines how routines are called and how registers are
allocated
– In the ATPCS, stacks are defined as being full
descending stacks
– Thus, the LDMFD and STMFD instructions provide
the pop and push functions, respectively
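A minimal Python sketch of push and pop on a full descending stack, assuming a dictionary memory model and register lists given in ascending order. The function names mirror the mnemonics but are otherwise illustrative.

```python
def stmfd(mem, regs, reg_list):
    """Model STMFD sp!, {reg_list}: push onto a full descending stack."""
    for name in reversed(reg_list):   # highest register lands at the highest address
        regs["sp"] -= 4               # full stack: move sp first, then store
        mem[regs["sp"]] = regs[name]

def ldmfd(mem, regs, reg_list):
    """Model LDMFD sp!, {reg_list}: pop from a full descending stack."""
    for name in reg_list:
        regs[name] = mem[regs["sp"]]  # sp points at the last item pushed
        regs["sp"] += 4
```

Pushing two registers moves sp down by 8 bytes and leaves it pointing at the lowest-numbered register's value. Popping the same list restores both registers and the original sp.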
Load Store Instructions
push onto a full descending stack
Load Store Instructions
• SWAP instruction
– special case of a load-store instruction
– swaps the contents of memory with the contents of a
register
– is an atomic operation
• it reads and writes a location in the same bus
operation, preventing any other instruction from
reading or writing to that location until it completes
Load Store Instructions
– Swap cannot be interrupted by any other instruction or
any other bus access
• system “holds the bus” until the transaction is
complete
example: swap instruction loads a word from memory into register r0 and
overwrites the memory with register r1
Load Store Instructions
– instruction is particularly useful when implementing
semaphores and mutual exclusion in an operating
system
Load Store Instructions

– example shows a simple data guard that can be used to protect data
from being written by another task
– SWP instruction “holds the bus” until the transaction is complete
– address pointed to by the semaphore either contains the value 0 or 1
– When the semaphore equals 1, then the service in question is being
used by another process
– The routine will continue to loop around until the service is released
by the other process
• in other words, when the semaphore address location contains the
value 0
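The semaphore logic can be sketched as below. This Python model is single-threaded, so it only illustrates the read-then-write pattern. On the processor the atomicity comes from SWP holding the bus, which plain Python does not reproduce. All names are assumptions for the example.

```python
def swp(mem, address, new_value):
    """Model SWP Rd, Rm, [Rn]: return the old word and store the new one."""
    old = mem[address]
    mem[address] = new_value
    return old

def try_acquire(mem, semaphore):
    """One pass of the spin loop: write 1, succeed only if the old value was 0."""
    return swp(mem, semaphore, 1) == 0

def release(mem, semaphore):
    """Mark the service as free again."""
    mem[semaphore] = 0
```

A caller would repeat try_acquire until it returns True. Writing 1 unconditionally is harmless when the semaphore is already taken, because the value stays 1.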
ARM instruction set
• Software Interrupt Instruction
– causes a software interrupt exception
– provides a mechanism for applications to call operating system
routines

• When the processor executes an SWI instruction
– it sets the program counter pc to the offset 0x8 in the vector
table
– instruction also forces the processor mode to SVC, which
allows an operating system routine to be called in a privileged
mode
• Each SWI instruction has an associated SWI number
– used to represent a particular function call or feature
ARM instruction set

• Software Interrupt Instruction


– pc will be loaded with address 0x00000008
(vectors+0x8)
ARM instruction set
– a simple example of an SWI call with SWI number
0x123456, used by ARM toolkits as a debugging SWI
– Typically the SWI instruction is executed in user
mode
ARM instruction set
• SWI instructions are used to call operating system routines
– some form of parameter passing is required
– achieved using registers
– example,
• register r0 is used to pass the parameter 0x12
• return values are also passed back via registers
– Code called the SWI handler is required to process the SWI
call
– handler obtains the SWI number
• using the address of the executed instruction, which is
calculated from the link register lr
– The SWI number is determined by
SWI_Number = <SWI instruction> AND NOT(0xff000000)
– here the SWI instruction is the actual 32-bit SWI instruction
executed by the processor
ARM instruction set
• example
– shows the start of an SWI handler
implementation
– code fragment determines what
SWI number is being called and
places that number into register
r10
– load instruction first copies the
complete SWI instruction into
register r10
– BIC instruction masks off the top
bits of the instruction, leaving the
SWI number
– assumption is that the SWI has been
called from ARM state
– number in register r10 is then used
by the SWI handler to call the
appropriate SWI service routine
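The extraction step can be sketched in Python as follows, assuming the SWI was issued from ARM state so that the instruction sits 4 bytes before the address held in lr. The function names and the dictionary memory model are illustrative.

```python
def swi_number(instruction):
    """Clear bits [31:24] of the instruction, as BIC with #0xff000000 would."""
    return instruction & ~0xFF000000 & 0xFFFFFFFF

def fetch_swi_number(mem, lr):
    """Read the SWI instruction that precedes the return address in lr."""
    instruction = mem[lr - 4]        # ARM state: instructions are 4 bytes long
    return swi_number(instruction)
```

For the debugging SWI mentioned earlier, the always-executed encoding is 0xEF123456, and swi_number(0xEF123456) returns 0x123456.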
ARM instruction set

• Program Status Register Instructions


• ARM instruction set provides
– two instructions to directly control a program status
register (psr)
– MRS instruction
• transfers the contents of either the cpsr or spsr
into a register
– MSR instruction
• transfers the contents of a register into the cpsr
or spsr
– Both instructions are used to read and write the cpsr
and spsr
ARM instruction set

– field label in syntax


• can be any combination of control (c), extension (x), status
(s), and flags (f)
– fields relate to particular byte regions in a psr shown
below
ARM instruction set

➢ in the MSR instruction


■ _c Sets the control field mask, bits [7:0]
● c field controls the interrupt masks, Thumb state,
and processor mode
■ _x Sets the extension field mask, bits [15:8]
■ _s Sets the status field mask, bits [23:16]
■ _f Sets the flags field mask, bits [31:24]
ARM instruction set
– Example shows how to enable IRQ interrupts by clearing the I mask

• using both the MRS and MSR instructions to read from and then write to the
cpsr
• MRS first copies the cpsr into register r1
• BIC instruction clears bit 7 of r1
• Register r1 is then copied back into the cpsr, which enables IRQ interrupts
• this code preserves all the other settings in the cpsr and only modifies the I
bit in the control field
• above example is in SVC mode
– In user mode you can read all cpsr bits, but you can only update the condition flag field
f
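The read-modify-write sequence and the effect of the field mask can be modelled as below. This is a Python sketch with assumed names. The byte ranges come from the field definitions above, and the I bit is bit 7 as stated.

```python
FIELD_MASKS = {"c": 0x000000FF, "x": 0x0000FF00,
               "s": 0x00FF0000, "f": 0xFF000000}

def msr(psr, value, fields):
    """Model MSR psr_<fields>, Rm: only the selected bytes are written."""
    mask = 0
    for field in fields:
        mask |= FIELD_MASKS[field]
    return (psr & ~mask & 0xFFFFFFFF) | (value & mask)

def enable_irq(cpsr):
    """Model the MRS / BIC / MSR sequence that clears the I mask."""
    r1 = cpsr                  # MRS r1, cpsr
    r1 &= ~0x80                # BIC r1, r1, #0x80 (bit 7 is the I bit)
    return msr(cpsr, r1, "c")  # MSR cpsr_c, r1
```

Because only the control field is written back, the flag bits in the top byte are untouched even if r1 had been altered there.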
ARM instruction set
– Coprocessor Instructions
• used to extend the instruction set
• coprocessor can
– either provide additional computation capability or
– be used to control the memory subsystem including
caches and memory management
• coprocessor instructions
– include data processing, register transfer, and memory
transfer instructions
Note:
• we will cover only a short overview
– since these instructions are only used by cores
with a coprocessor
ARM instruction set

• cp field represents the coprocessor number between p0 and p15


• opcode fields describe the operation to take place on the
coprocessor
• Cn, Cm, and Cd fields describe registers within the coprocessor
• coprocessor operations and registers depend on the specific
coprocessor you are using
• Coprocessor 15 (CP15) is reserved for system control purposes,
such as memory management, write buffer control, cache control,
and identification registers
ARM instruction set
– Coprocessor 15 Instruction Syntax
• CP15 configures the processor core and has a set of
dedicated registers to store configuration information
• A value written into a register sets a configuration
attribute
– for example, switching on the cache
• CP15 is called the system control coprocessor
• Both MRC and MCR instructions are used to read and
write to CP15
– where register Rd is the core destination register
– Cn is the primary register
– Cm is the secondary register
– opcode2 is a secondary register modifier.
– secondary registers are also called “extended registers.”
ARM instruction set
• example shows a CP15 register being copied into a
general-purpose register

• CP15 register-0 contains the processor identification number


• This register is copied into the general-purpose register r10
ARM instruction set
• As an example, here is the instruction to move the
contents of CP15 control register c1 into register r1 of
the processor core:
MRC p15, 0, r1, c1, c0, 0
• We use a shorthand notation for CP15 reference


– that makes referring to configuration registers easier to
follow
– The reference notation uses the following format:
CP15:cX:cY:Z
ARM instruction set

• first term, CP15, defines it as coprocessor 15


• second term, after the separating colon, is the primary
register
– primary register X can have a value between 0 and 15
• third term is the secondary or extended register
– secondary register Y can have a value between 0 and 15
• last term, opcode2, is an instruction modifier and can
have a value between 0 and 7
• Some operations may also use a nonzero value w of
opcode1
– We write these as CP15:w:cX:cY:Z
ARM instruction set

• Loading Constants
– there is no ARM instruction to move a 32-bit
constant into a register
– Since ARM instructions are 32 bits in size
• they obviously cannot specify a general 32-bit
constant
– To aid programming
• there are two pseudoinstructions to move a 32-bit
value into a register
ARM instruction set

– first pseudoinstruction writes a 32-bit constant to a register using
whatever instructions are available
• It defaults to a memory read if the constant cannot be encoded
using other instructions
– second pseudoinstruction writes a relative address into a register,
which will be encoded using a pc-relative expression
ARM instruction set
– example shows an LDR instruction loading a 32-bit constant
0xff00ffff into register r0

● This example involves a memory access to load the constant,
which can be expensive for time-critical routines
ARM instruction set
– shows an alternative method to load the same constant into register r0
by using an MVN instruction
ARM instruction set
– there are alternatives to accessing memory but they depend upon the
constant you are trying to load
– Compilers and assemblers use clever techniques to avoid loading a
constant from memory
• These tools have algorithms
– to find the optimal number of instructions required to
generate a constant in a register and make extensive use of
the barrel shifter
– If the tools cannot generate the constant by these methods, then it is
loaded from memory
– LDR pseudoinstruction
• either inserts an MOV or MVN instruction to generate a value (if
possible)
• or generates an LDR instruction with a pc-relative address to
read the constant from a literal pool—a data area embedded
within the code
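The choice the LDR pseudoinstruction makes can be approximated as below. The sketch relies on the classic ARM immediate form, an 8-bit value rotated right by an even amount, and considers only single-instruction alternatives. The function names and returned strings are assumptions, and real tools may apply further techniques.

```python
MASK32 = 0xFFFFFFFF

def rol(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def is_arm_immediate(value):
    """True if value is an 8-bit number rotated right by an even amount."""
    value &= MASK32
    # undoing the rotation (rotating left) must leave at most 8 low bits
    return any(rol(value, rot) <= 0xFF for rot in range(0, 32, 2))

def ldr_pseudo(value):
    """Pick one way 'LDR Rd, =value' could be realised."""
    value &= MASK32
    if is_arm_immediate(value):
        return "MOV"
    if is_arm_immediate(~value & MASK32):
        return "MVN"
    return "LDR from literal pool"
```

For the constant used earlier, ldr_pseudo(0xFF00FFFF) returns "MVN", since its complement 0x00FF0000 is an encodable immediate. A value such as 0x55555555 falls through to the literal pool.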
ARM instruction set

– Table shows two pseudoinstruction conversions
– first conversion produces a simple MOV instruction
– second conversion produces a pc-relative load
– Another useful pseudoinstruction is the ADR instruction,
or address relative
• instruction places the address of the given label into register Rd,
using a pc-relative add or subtract
ARM instruction set

• Conditional Execution
– Most ARM instructions are conditionally
executed
• The instruction only executes if the condition code
flags pass a given condition or test
– using conditional execution instructions
• can increase performance and code density
– condition field
• two-letter mnemonic appended to the instruction
mnemonic
• default mnemonic is AL, or always execute
ARM instruction set
– conditional execution
• reduces the number of branches
• also reduces the number of pipeline flushes and thus
improves the performance of the executed code
• depends upon two components:
– condition field and condition flags
• the condition field is located in the instruction, and the condition flags are
located in the cpsr
ARM instruction set
• we will take the simple C code fragment shown in this
example
– compare the assembler output using non-conditional and
conditional instructions
• Let register r1 represent a and register r2 represent b
ARM instruction set

– left side code only uses conditional execution on the
branch instructions
– right side code with full conditional execution
• this dramatically reduces the number of instructions
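The slide's C fragment and assembler listings are not reproduced in this text. Purely as an illustration, the sketch below assumes a subtraction-based greatest common divisor loop over a and b (both positive). It models full conditional execution in Python: one compare per pass, followed by instructions that take effect only when their condition holds. The instruction names in the comments are an assumed rendering, not the slide's listing.

```python
def gcd_conditional(a, b):
    """Model a loop body of CMP / SUBGT / SUBLT / BNE for positive a and b."""
    while True:
        gt, lt = a > b, a < b   # CMP r1, r2 sets the flags once per pass
        if gt:
            a -= b              # SUBGT r1, r1, r2 executes only if a > b
        if lt:
            b -= a              # SUBLT r2, r2, r1 executes only if a < b
        if not (gt or lt):
            return a            # BNE falls through when the values are equal
```

Both subtract instructions are fetched on every pass, but at most one takes effect, so the loop needs a single branch instead of one per if-else arm.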
