
Lec02 03 Updated

The document discusses Instruction Set Architecture (ISA) and its performance metrics, including MIPS and MFLOPS. It explains Amdahl's Law, which predicts the speedup of a system based on enhancements and the fraction of the task affected. Additionally, it covers various architectures such as stack, accumulator, and general-purpose register architectures, highlighting their characteristics and implications for computer architecture design.

Advanced Computer Architecture

Lecture – 2-3
Instruction Set Architecture

Dr. M. Ilyas Fakhir


Metrics of Performance
Each level of the system has its own performance metric:
• Application: operations per second
• Programming language, compiler, Instruction Set Architecture: MIPS (millions of instructions per second), MFLOPS (millions of FP operations per second)
• Datapath, control: megabytes per second
• Function units, transistors, wires, pins: cycles per second (clock rate)

Amdahl's Law
• Suppose that enhancement E accelerates a fraction F of the task by a
factor S, and the remainder of the task is unaffected

[Figure: execution time of the task before and after the fraction F is enhanced by factor S; only the time spent in fraction F shrinks]

Amdahl's Law

Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{ExTime without } E}{\text{ExTime with } E} = \frac{\text{Performance with } E}{\text{Performance without } E}$$

Amdahl's Law

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$

$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$

Writing F for Fraction_enhanced and S for Speedup_enhanced:

$$\text{Speedup}_{overall} = \frac{1}{(1 - F) + \dfrac{F}{S}}$$

Amdahl's Law
• Overall speedup if we make 90% of a program run 10 times faster:
F = 0.9, S = 10

$$\text{Speedup}_{overall} = \frac{1}{(1 - 0.9) + 0.9/10} = \frac{1}{0.1 + 0.09} \approx 5.26$$

• Overall speedup if we make 80% of a program run 20% faster:
F = 0.8, S = 1.2, since (20 + 100)/100 = 1.2

$$\text{Speedup}_{overall} = \frac{1}{(1 - 0.8) + 0.8/1.2} = \frac{1}{0.2 + 0.667} \approx 1.154$$

Amdahl's Law
• The law is often used to predict the potential performance improvement of a system when adding more processors or improving the speed of individual processors.

P is the proportion of the system that can be improved
N is the number of processors in the system

$$\text{Speedup}_{overall} = \frac{1}{(1 - P) + \dfrac{P}{N}}$$

For example, if the part of a system that can be improved occupies 20% of the total execution time (P = 0.2), and we add 4 more processors to a single-processor system (N = 5), the speedup would be:

$$\text{Speedup}_{overall} = \frac{1}{(1 - 0.2) + 0.2/5} = \frac{1}{0.8 + 0.04} \approx 1.19$$

This means that the overall performance of the system would improve by about 19% with the addition of the 4 processors.

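The Amdahl's Law examples above can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function name `amdahl_speedup` and its parameter names are assumptions chosen for illustration.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the task is sped up by `factor`.

    Implements Speedup_overall = 1 / ((1 - F) + F / S).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


if __name__ == "__main__":
    # 90% of the program runs 10 times faster
    print(round(amdahl_speedup(0.9, 10), 2))
    # 80% of the program runs 20% faster (S = 1.2)
    print(round(amdahl_speedup(0.8, 1.2), 3))
    # 20% of the execution time spread over N = 5 processors
    print(round(amdahl_speedup(0.2, 5), 2))
```

Note how the unenhanced fraction dominates: even with a very large factor, the first case can never exceed 1 / (1 - 0.9) = 10.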
A Translation Hierarchy
• High Level Language (HLL) programs are first compiled (possibly into assembly), then linked, and finally loaded into main memory.

C program
→ Compiler → Assembly language program
→ Assembler → Object: machine language module (together with Object: library routine (machine language))
→ Linker → Executable: machine language program
→ Loader → Memory

MIPS R3000 Instruction Set Architecture (Summary)
° Machine Environment Target
° Instruction Categories
  - Load/Store
  - Computational
  - Jump and Branch
  - Floating Point (coprocessor)
° Registers: R0 - R31, PC, HI, LO
° 3 Instruction Formats: all 32 bits wide
  R: OP | Rs | Rt | Rd | sa | funct
  I: OP | Rs | Rt | Immediate
  J: OP | jump target

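Each of the three instruction formats packs fixed-width fields into one 32-bit word. A minimal Python sketch of the R and I formats follows. The field widths (6-bit OP, 5-bit Rs/Rt/Rd/sa, 6-bit funct, 16-bit Immediate) are the standard MIPS32 widths and are not stated on the slide; the function names are hypothetical.

```python
def pack_fields(fields):
    """Pack (value, width) pairs, most significant field first, into one word."""
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def encode_r_type(rs, rt, rd, sa, funct, op=0):
    """R format: OP | Rs | Rt | Rd | sa | funct (6 + 5 + 5 + 5 + 5 + 6 bits)."""
    return pack_fields([(op, 6), (rs, 5), (rt, 5), (rd, 5), (sa, 5), (funct, 6)])


def encode_i_type(op, rs, rt, immediate):
    """I format: OP | Rs | Rt | Immediate (6 + 5 + 5 + 16 bits)."""
    # Negative immediates are stored in 16-bit two's complement.
    return pack_fields([(op, 6), (rs, 5), (rt, 5), (immediate & 0xFFFF, 16)])


if __name__ == "__main__":
    # add $t0, $s1, $s2  ->  rd = 8, rs = 17, rt = 18, funct = 0x20
    print(hex(encode_r_type(rs=17, rt=18, rd=8, sa=0, funct=0x20)))
    # lw $t0, 12($s0)    ->  op = 0x23, rs = 16 (base), rt = 8
    print(hex(encode_i_type(op=0x23, rs=16, rt=8, immediate=12)))
```

Because every format is exactly 32 bits and the OP field is always in the same position, the hardware can start decoding before it knows which format it is looking at.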
Review C Operators/Operands
• Operators: +, -, *, /, % (mod); (7/4==1, 7%4==3)

• Operands:
– Variables: fahr, celsius
– Constants: 0, 1000, -17, 15.4

• In C (and most High Level Languages) variables are declared and given a type first
– Example:
int fahr, celsius;
int a, b, c, d, e;

Assembly Operators
° Syntax of Assembly Operator
1) operation by name (“Mnemonics”)
2) operand getting result (Register or Memory)
3) 1st operand for operation
4) 2nd operand for operation

° Ex. add b to c and put the result in a: add a, b, c
Called an Assembly Language Instruction

° Equivalent assignment statement in C:
a = b + c;

Assembly Operators/Instructions
° MIPS Assembly Syntax is rigid:
1 operation, 3 variables
Why? Keep Hardware simple via regularity

° How to do the following C statement?
a = b + c + d - e;

° Break into multiple instructions
add a, b, c # a = sum of b & c
add a, a, d # a = sum of b,c,d
sub a, a, e # a = b+c+d-e

° To the right of the sharp sign (#) is a comment terminated by the end of the line. It applies only to the current line.
C comments have the format /* comment */ and can span many lines

Compilation
° How to turn the notation that programmers prefer into notation the computer understands?

° Program to translate C statements into Assembly Language instructions; called a compiler

° Example: compile by hand this C code:
a = b + c;
d = a - e;

° Easy:
add a, b, c
sub d, a, e

° Big Idea: compiler translates notation from one level of computing abstraction to a lower level

Compilation 2
° Example: compile by hand this C code:
f = (g + h) - (i + j);

° First sum of g and h. Where to put result?
add f, g, h # f contains g+h

° Now sum of i and j. Where to put result?
Cannot use f !
Compiler creates temporary variable to hold sum: t1
add t1, i, j # t1 contains i+j

° Finally produce difference
sub f, f, t1 # f = (g+h)-(i+j)

Compilation -- Summary
° C statement (5 operands, 3 operators):
f = (g + h) - (i + j);
° Becomes 3 assembly instructions
(6 unique operands, 3 operators):
add f,g,h # f contains g+h
add t1,i,j # t1 contains i+j
sub f,f,t1 # f=(g+h)-(i+j)
° In general, each line of C produces many assembly instructions
One reason why people program in C vs. Assembly: fewer lines of code
Other reasons? (many!)

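The hand compilation above can be mimicked by a very small translator. The sketch below is an illustration only, not how a real C compiler works: it borrows Python's own `ast` parser to read an assignment that happens to be valid in both C and Python (without the trailing semicolon), supports only + and -, and always allocates a fresh temporary (t1, t2, ...) for each inner sum rather than reusing f as the slide does. The name `compile_assignment` is hypothetical.

```python
import ast

# Map Python AST operator nodes to the three-operand mnemonics used on the slides.
OPS = {ast.Add: "add", ast.Sub: "sub"}


def compile_assignment(statement):
    """Translate e.g. 'f = (g + h) - (i + j)' into 3-operand instructions."""
    assign = ast.parse(statement).body[0]
    target = assign.targets[0].id
    code = []
    temps = 0

    def emit(node, dest=None):
        nonlocal temps
        if isinstance(node, ast.Name):
            return node.id  # a plain variable needs no instruction
        if not isinstance(node, ast.BinOp) or type(node.op) not in OPS:
            raise ValueError("only variables, + and - are supported")
        left = emit(node.left)
        right = emit(node.right)
        if dest is None:  # inner result: the compiler creates a temporary
            temps += 1
            dest = f"t{temps}"
        code.append(f"{OPS[type(node.op)]} {dest}, {left}, {right}")
        return dest

    emit(assign.value, target)
    return code


if __name__ == "__main__":
    for line in compile_assignment("f = (g + h) - (i + j)"):
        print(line)
```

For the slide's statement this emits three instructions, the same count as the hand-compiled version, but with two temporaries instead of one.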
Assembly Design: Key Concepts
• Assembly language is essentially directly supported in hardware, therefore ...

• It is kept very simple!
– Limit on the type of operands
– Limit the set of operations that can be done to the absolute minimum.
• If an operation can be decomposed into simpler operations, don’t include it.

Instruction Set Architecture – ISA
• Our focus in the next couple of lectures will be the Instruction Set Architecture (ISA), which is the interface between hardware and software
• It plays a vital role in understanding computer architecture from any of the above-mentioned perspectives
• The design of hardware and software can’t be initiated without defining the ISA
• It describes the instruction word format and identifies the memory addressing for data manipulation and control operations

Taxonomy of Instruction Set
• Major advances in computer architecture are typically
associated with landmark instruction set designs – stack,
accumulator, general purpose register etc.
• Design decisions must take into account:
– technology
– machine organization
– programming languages
– compiler technology
– operating systems

Taxonomy of Instruction Set ... Cont’d
• Basic Differentiator: The type of internal storage of the operand
• Major Choices of ISA:
– Stack Architecture
– Accumulator Architecture
– General Purpose Register Architecture
  - Register – Memory
  - Register – Register (load/store)
  - Memory – Memory Architecture (obsolete)

Stack Architecture
• Both the operands are implicitly on the TOS (top of stack)
• Thus, it is also referred to as a Zero-Address machine
• The operand may be either an input (orange shade) or a result from the ALU (yellow shade)
• All operands are implicit (implied or inherited)
• The first operand is removed from the stack and the second operand is replaced by the result
[Figure: processor with stack (TOS) and ALU, connected to memory]

Stack Architecture
To execute: C=A+B
ADD instruction has implicit operands on the stack; operands are written to the stack using the PUSH instruction

PUSH A
PUSH B
ADD
POP C

Accumulator Architecture
• An accumulator is a special register within the CPU that serves both as the implicit source of one operand and as the result destination for arithmetic and logic operations.
• Thus, it accumulates or collects data and doesn’t serve as an address register at any time
• A limited number of accumulators, usually only one, are used
• The second operand is in memory, thus accumulator-based machines are also called 1-address machines
• They are useful when memory is expensive or when a limited number of addressing modes is to be used
[Figure: processor with accumulator and ALU, connected to memory]

Accumulator Architecture
To execute: C=A+B
ADD instruction has implicit operand A in the accumulator, written using the LOAD instruction; the second operand B is in memory at address B

Load A
ADD B
Store C

General Purpose Register Architecture
• Many general purpose registers are available within CPU
• Generally, CPU registers do not have dedicated functions
and can be used for a variety of purposes – address, data
and control
• A relatively small number of bits in the instruction is
needed to identify the register
• In addition to the GPRs, there are many dedicated or
special-purpose registers as well, but many of them are not
“visible” to the programmer
• GPR architecture has explicit operands either in register or
memory thus there may exist:
- Register – memory architecture
- Register – Register (Load/Store) Architecture
- Memory – Memory Architecture
General Purpose Register Architecture
Register – Memory Architecture
• One explicit operand is in a register and one in memory, and the result goes into the register
• The operand in memory is accessed directly
To execute: C=A+B
ADD instruction has explicit operand A loaded in a register, the operand B is in memory, and the result is in a register

Load R1, A
ADD R3, R1, B
Store R3, C
[Figure: processor with registers R1, R2, R3 and ALU, connected to memory]

General Purpose Register Architecture
Register – Register (Load/Store) Architecture
• The explicit operands in memory are first loaded into registers temporarily, and
• Results are transferred to memory by a Store instruction
To execute: C=A+B
ADD instruction has explicit operands A and B loaded in registers

Load R1, A
Load R2, B
ADD R3, R1, R2
Store R3, C
• Neither explicit operand is accessed from memory directly; Memory – Memory Architecture is obsolete
[Figure: processor with registers R1, R2, R3 and ALU, connected to memory]

Basic ISA Classes
Most real machines are hybrids of these:

Accumulator (1 register):
  1 address      add A          acc ← acc + mem[A]
  1+x address    addx A         acc ← acc + mem[A + x]
Stack:
  0 address      add            tos ← tos + next
General Purpose Register (can be memory/memory):
  2 address      add A B        EA[A] ← EA[A] + EA[B]
  3 address      add A B C      EA[A] ← EA[B] + EA[C]
Load/Store:
  3 address      add Ra Rb Rc   Ra ← Rb + Rc
                 load Ra Rb     Ra ← mem[Rb]
                 store Ra Rb    mem[Rb] ← Ra
Comparison:
Bytes per instruction? Number of instructions? Cycles per instruction?

Comparing Number of Instructions
Code sequence for (C = A + B) for four classes of instruction sets:

Stack     Accumulator   Register (register-memory)   Register (load-store)
Push A    Load A        Load R1,A                    Load R1,A
Push B    Add B         Add R1,B                     Load R2,B
Add       Store C       Store C,R1                   Add R3,R1,R2
Pop C                                                Store C,R3

MIPS is an example of the load-store class.

Another example:
Using different instruction formats, write pseudo-code to evaluate the following expression: Z = 4(A+B) – 16(C+58). Your code should not change the source operands.

3-Address       2-Address       1-Address    0-Address
ADD x, A, B     LOAD y, B       LDA C        PUSH C
MUL y, x, 4     ADD y, A        ADDA 58      PUSH 58
ADD r, C, 58    MUL y, 4        MULA 16      ADD
MUL s, r, 16    LOAD s, C       STA s        PUSH 16
SUB Z, y, s     ADD s, 58       LDA A        MUL
                MUL s, 16       ADDA B       PUSH A
                SUB y, s        MULA 4       PUSH B
                STORE Z, y      SUBA s       ADD
                                STA Z        PUSH 4
                                             MUL
                                             SUB
                                             POP Z

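The 0-address and 1-address programs for Z = 4(A+B) – 16(C+58) can be checked by running them on toy interpreters. The sketch below is an illustration under stated assumptions: the function names are hypothetical, operands that are not names in memory are treated as immediate constants, and SUB on the stack machine computes tos – next, following the "tos ← tos + next" convention given for add in Basic ISA Classes.

```python
def fetch(memory, operand):
    """Read a named memory location, or treat the operand as an immediate constant."""
    return memory[operand] if operand in memory else int(operand)


def run_stack(program, memory):
    """0-address machine: arithmetic operands are implicit on the stack."""
    stack = []
    for line in program:
        op, *args = line.split()
        if op == "PUSH":
            stack.append(fetch(memory, args[0]))
        elif op == "POP":
            memory[args[0]] = stack.pop()
        else:
            top, nxt = stack.pop(), stack.pop()
            if op == "ADD":
                stack.append(top + nxt)
            elif op == "SUB":
                stack.append(top - nxt)  # assumed order: tos - next
            elif op == "MUL":
                stack.append(top * nxt)
            else:
                raise ValueError(f"unknown instruction {op}")
    return memory


def run_accumulator(program, memory):
    """1-address machine: one operand is named, the other is the accumulator."""
    acc = 0
    for line in program:
        op, arg = line.split()
        if op == "STA":
            memory[arg] = acc
        elif op == "LDA":
            acc = fetch(memory, arg)
        elif op == "ADDA":
            acc += fetch(memory, arg)
        elif op == "MULA":
            acc *= fetch(memory, arg)
        elif op == "SUBA":
            acc -= fetch(memory, arg)
        else:
            raise ValueError(f"unknown instruction {op}")
    return memory


ZERO_ADDRESS = ["PUSH C", "PUSH 58", "ADD", "PUSH 16", "MUL", "PUSH A",
                "PUSH B", "ADD", "PUSH 4", "MUL", "SUB", "POP Z"]
ONE_ADDRESS = ["LDA C", "ADDA 58", "MULA 16", "STA s", "LDA A",
               "ADDA B", "MULA 4", "SUBA s", "STA Z"]

if __name__ == "__main__":
    values = {"A": 3, "B": 5, "C": 2}
    print(run_stack(ZERO_ADDRESS, dict(values))["Z"])
    print(run_accumulator(ONE_ADDRESS, dict(values))["Z"])
```

Both programs produce the same Z and leave A, B and C untouched; the stack version needs 12 instructions and the accumulator version 9, which matches the instruction counts in the table.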
Comparison of three GPR Architectures

Register-Register
Advantages
• Simple, fixed-length instruction decoding
• Simple code generation
• Similar number of clock cycles / instruction

Disadvantages
• Higher instruction count than architectures with memory references in instructions
• Lower instruction density leads to larger programs

Comparison of three GPR Architectures

• Register- Memory
Advantages
• Data can be accessed without separate Load first
• Instruction format is easy to encode
Disadvantages
• Operands are not equivalent since a source
operand (in a register) is destroyed in operation
• Encoding a register number and memory address
in each instruction may restrict the number of
registers
• CPI varies by operand location
Comparison of three GPR Architectures

• Memory- Memory
Advantages
• Most compact
• Doesn’t waste registers for temporary storages
Disadvantages
• Large variation in instruction size
• Large variation in work per instruction
• Memory accesses create a memory bottleneck

Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
→ Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
→ Separation of Programming Model from Implementation (1963-64)
   - High-level Language Based (B5000 1963)
   - Concept of a Family (IBM 360 1964)
→ General Purpose Register Machines
   - Complex Instruction Sets Computer (Vax, Intel 432 1977-80)
   - Load/Store Architecture (CDC 6600, Cray 1 1963-76)
→ Reduced Instruction Set Computer (Mips, Sparc, HP-PA, IBM RS6000, . . . 1987)

Types and Size of Operands
Types of an Operand
- Integer
- Single-precision floating point
- Character
Size of Operand
- Nibble: 4-bit
- Character: 8-bit
- Half precision or half word: 16-bit
- Single precision FP or word: 32-bit
- Double precision FP or double word: 64-bit

Categories of Instruction Set Operations
All computers provide a full set of the following operational instructions:

Arithmetic and Logic
- Integer add, sub, and, or, multiply, divide

Data Transfer
- Load, store and
- Move instructions with memory addressing

Control
- Branch, Jump, procedure call and return

Categories of Instruction Set Operations … Cont’d
The following support instructions may be provided at different levels, depending on the computer:

System
- Operating System call, Virtual Memory Management
Floating point
- Add, multiply, divide and compare
Decimal
- BCD add, multiply and Decimal to Character Conversion
String
- String move, compare and search
Graphics
- Pixel and vertex operations, compression / de-compression operations

Operand Addressing Modes
• An “effective address” is the binary bit pattern issued by the CPU to specify the location of operands in the CPU (register) or in memory

• Addressing modes are the ways of providing access paths to CPU registers and memory locations

• Commonly used addressing modes are:
- Immediate
- Register
- Direct or Absolute
- Indirect

Operand Addressing Modes
- Immediate: ADD R4, #24H    Reg[R4] ← Reg[R4] + 24H
Data for the instruction is part of the instruction itself
Used to hold source operands only; cannot be used for storing results

- Register: ADD R4, R3    Reg[R4] ← Reg[R4] + Reg[R3]
Operand is contained in a CPU register
No memory access needed, therefore it is fast

- Direct (or absolute): ADD R1, (1000)    Reg[R1] ← Reg[R1] + Mem[1000]
The address of the operand is specified as a constant, coded as part of the instruction
Limited address space: 2^(operand field size) locations

Commonly used addressing modes … cont’d

Indirect Addressing modes
The address of the memory location where the data is to be found is stored in the instruction as the operand, i.e., the operand is the address of an address

Large address space (2^(memory word size) locations) available

Two or more memory accesses are required

Commonly used addressing modes … cont’d
Types of Indirect addressing modes:
Register Indirect
Register Indirect Indexed
- Effective memory address is calculated by adding another register
(index register) to the value in a CPU register (usually referred to as
the base register)
Useful for accessing 2-D arrays
Register Indirect plus displacement
- Similarly, “based” refers to the situation when the constant refers to
the offset (displacement) of an array element with respect to the first
element. The address of the first element is stored in a register
Memory Indirect
Commonly used addressing modes … cont’d
Meanings of Indirect Addressing Modes
- Register Indirect
ADD R4, (R1)       Reg[R4] ← Reg[R4] + Mem[Reg[R1]]

- Register Indirect Indexed
ADD R4, (R1+R2)    Reg[R4] ← Reg[R4] + Mem[Reg[R1] + Reg[R2]]

- Register Indirect plus displacement
ADD R4, 100(R1)    Reg[R4] ← Reg[R4] + Mem[100 + Reg[R1]]

- Memory Indirect
ADD R4, @(R1)      Reg[R4] ← Reg[R4] + Mem[Mem[Reg[R1]]]

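The register-transfer meanings of the addressing modes can be read as a single operand-fetch routine. The following Python sketch is an illustration only: the dictionary-based register file and memory, the mode names, and the keyword argument names are all assumptions, and the register and memory contents are made up for the example.

```python
def fetch_operand(mode, reg, mem, **field):
    """Return the source operand selected by an addressing mode.

    `reg` maps register names to values, `mem` maps addresses to values,
    and `field` carries the fields encoded in the instruction.
    """
    if mode == "immediate":          # ADD R4, #24H
        return field["imm"]
    if mode == "register":           # ADD R4, R3
        return reg[field["r"]]
    if mode == "direct":             # ADD R1, (1000)
        return mem[field["addr"]]
    if mode == "register_indirect":  # ADD R4, (R1)
        return mem[reg[field["base"]]]
    if mode == "indexed":            # ADD R4, (R1+R2)
        return mem[reg[field["base"]] + reg[field["index"]]]
    if mode == "displacement":       # ADD R4, 100(R1)
        return mem[field["disp"] + reg[field["base"]]]
    if mode == "memory_indirect":    # ADD R4, @(R1): two memory accesses
        return mem[mem[reg[field["base"]]]]
    raise ValueError(f"unknown addressing mode {mode}")


if __name__ == "__main__":
    reg = {"R1": 100, "R2": 4, "R3": 7, "R4": 10}
    mem = {100: 200, 104: 5, 200: 9, 1000: 3}
    # ADD R4, (R1+R2): Reg[R4] <- Reg[R4] + Mem[Reg[R1] + Reg[R2]]
    reg["R4"] += fetch_operand("indexed", reg, mem, base="R1", index="R2")
    print(reg["R4"])
```

Counting the `mem[...]` lookups in each branch gives the number of data memory accesses per mode: none for immediate and register, one for the direct and register-based modes, and two for memory indirect.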
Special Addressing Modes
Used for stepping within loops; R2 points to the start of the array; each reference increments / decrements R2 by ‘d’, the size of the elements in the array

- Auto-increment: ADD R1, (R2)+
(i) Reg[R1] ← Reg[R1] + Mem[Reg[R2]]
(ii) Reg[R2] ← Reg[R2] + d

- Auto-decrement: ADD R1, -(R2)
(i) Reg[R2] ← Reg[R2] - d
(ii) Reg[R1] ← Reg[R1] + Mem[Reg[R2]]

- Scaled: ADD R1, 100(R2)[R3]
Reg[R1] ← Reg[R1] + Mem[100 + Reg[R2] + Reg[R3]*d]

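Auto-increment is what lets a single instruction both use an array element and advance to the next one. A minimal Python sketch of a loop that repeats ADD R1, (R2)+ is given below; the function name, the dictionary-based memory, and the default element size d = 4 are assumptions for illustration.

```python
def sum_with_autoincrement(mem, start, count, d=4):
    """Repeat ADD R1, (R2)+ `count` times, starting with R2 = start."""
    reg = {"R1": 0, "R2": start}
    for _ in range(count):
        reg["R1"] += mem[reg["R2"]]  # (i)  Reg[R1] <- Reg[R1] + Mem[Reg[R2]]
        reg["R2"] += d               # (ii) Reg[R2] <- Reg[R2] + d
    return reg


if __name__ == "__main__":
    # Three 4-byte elements stored at addresses 1000, 1004 and 1008.
    memory = {1000: 1, 1004: 2, 1008: 3}
    print(sum_with_autoincrement(memory, start=1000, count=3))
```

After the loop R1 holds the sum of the elements and R2 points one element past the end of the array, so no separate pointer-update instruction is needed inside the loop.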
Addressing Modes of Control Flow Instructions
- Branch (conditional)
a sort of displacement, in number of instructions, relative to PC

- Jump (Unconditional)
jump to an absolute address, independent of the position of PC

- Procedure call/return
control transfer with some state and return address saving, sometimes in a special link register or in some GPRs (General Purpose Registers)

Assembly Variables: Registers (1/4)

• Unlike HLL, assembly cannot use variables
– Why not? Keep Hardware Simple

• Assembly Operands are registers
– limited number of special locations built directly into the hardware
– operations can only be performed on these!

• Benefit: Since registers are directly in hardware, they are very fast

Assembly Variables: Registers (2/4)

• Drawback: Since registers are in hardware, there are a predetermined number of them
– Solution: MIPS code must be very carefully put together to efficiently use registers

• 32 registers in MIPS
– Why 32? Smaller is faster

• Each MIPS register is 32 bits wide
– Groups of 32 bits called a word in MIPS

Assembly Variables: Registers (3/4)

• Registers are numbered from 0 to 31

• Each register can be referred to by number or name

• Number references:
$0, $1, $2, … $30, $31

Assembly Variables: Registers (4/4)
• By convention, each register also has a name to make it easier to code

• For now:
$16 - $23 → $s0 - $s7
(correspond to C variables)
$8 - $15 → $t0 - $t7
(correspond to temporary variables)

• In general, use register names to make your code more readable

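MIPS Addressing Formats (Summary)
• How memory can be addressed in MIPS
1. Immediate addressing
op | rs | rt | Immediate
The operand is the Immediate field of the instruction itself.

2. Register addressing
op | rs | rt | rd | ... | funct
The operand is in a register.

3. Base addressing
op | rs | rt | Address
The operand (byte, halfword, or word) is in memory at Register + Address.

4. PC-relative addressing
op | rs | rt | Address
The target word is in memory at PC + Address.

5. Pseudodirect addressing
op | Address
The target word is in memory at the address formed by combining the PC with Address.
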
Data Transfer: Memory to Reg (1/4)

• To transfer a word of data, we need to specify two things:
– Register: specify this by number (0 - 31)
– Memory address: more difficult
  - Think of memory as a single one-dimensional array, so we can address it simply by supplying a pointer to a memory address.
  - Other times, we want to be able to offset from this pointer.

Data Transfer: Memory to Reg (2/4)
• To specify a memory address to copy from, specify two things:
– A register which contains a pointer to memory
– A numerical offset (in bytes)

• The desired memory address is the sum of these two values.

• Example: 8($t0)
– specifies the memory address pointed to by the value in $t0, plus
8 bytes

Data Transfer: Memory to Reg (3/4)
• Load Instruction Syntax:
1 2, 3(4)
– where
1) operation (instruction) name
2) register that will receive value
3) numerical offset in bytes
4) register containing pointer to memory

• Instruction Name:
– lw (meaning Load Word, so 32 bits or one word are loaded at a
time)

Data Transfer: Memory to Reg (4/4)
• Example: lw $t0, 12($s0)
This instruction will take the pointer in $s0, add 12 bytes to it, and
then load the value from the memory pointed to by this calculated
sum into register $t0

• Notes:
– $s0 is called the base register
– 12 is called the offset
– offset is generally used in accessing elements of array or structure:
base register points to beginning of array or structure

Data Transfer: Reg to Memory
• Also want to store value from a register into memory

• Store instruction syntax is identical to Load instruction syntax

• Instruction Name:
sw (meaning Store Word, so 32 bits or one word are stored at a time)

• Example: sw $t0, 12($s0)
This instruction will take the pointer in $s0, add 12 bytes to it, and then store the value from register $t0 into the memory address pointed to by the calculated sum

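The base-plus-offset address calculation used by lw and sw can be sketched in a few lines of Python. This is an illustration under stated assumptions: memory is modelled as a `bytearray`, the function names and the register dictionary are hypothetical, big-endian byte order is assumed (MIPS hardware can be configured for either order), word addresses are required to be multiples of 4, and only unsigned 32-bit values are handled.

```python
def effective_address(regs, offset, base):
    """Address = value in the base register + byte offset; words must be aligned."""
    address = regs[base] + offset
    if address % 4 != 0:
        raise ValueError(f"unaligned word address {address}")
    return address


def lw(regs, mem, rt, offset, base):
    """lw rt, offset(base): load the 32-bit word at base + offset into rt."""
    address = effective_address(regs, offset, base)
    regs[rt] = int.from_bytes(mem[address:address + 4], "big")


def sw(regs, mem, rt, offset, base):
    """sw rt, offset(base): store the 32-bit word in rt at base + offset."""
    address = effective_address(regs, offset, base)
    mem[address:address + 4] = regs[rt].to_bytes(4, "big")


if __name__ == "__main__":
    memory = bytearray(32)
    registers = {"$s0": 8, "$t0": 0x12345678, "$t1": 0}
    sw(registers, memory, "$t0", 12, "$s0")  # sw $t0, 12($s0) writes bytes 20..23
    lw(registers, memory, "$t1", 12, "$s0")  # lw $t1, 12($s0) reads them back
    print(hex(registers["$t1"]))
```

With $s0 = 8 and an offset of 12, both instructions touch the word at byte address 20, i.e. the sixth word if $s0 is taken as zero and the fourth word after the base otherwise.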
Addressing: Byte vs. word
• Every word in memory has an address, similar to an index in an array

• Early computers numbered words like C numbers elements of an array:
– Memory[0], Memory[1], Memory[2], …
Called the “address” of a word

• Computers needed to access 8-bit bytes as well as words (4 bytes/word)

• Today machines address memory as bytes, hence word addresses differ by 4:
– Memory[0], Memory[4], Memory[8], …

Role of Registers vs. Memory
• What if there are more variables than registers?
– Compiler tries to keep the most frequently used variables in registers
– It writes less frequently used variables to memory

• Why not keep all variables in memory?
– Smaller is faster: registers are faster than memory
– Registers are more versatile:
• MIPS arithmetic instructions can read 2 operands, operate on them, and write 1 per instruction
• MIPS data transfer instructions only read or write 1 operand per instruction, and perform no operation

Register Conventions (1/6)

• Caller: the calling function

• Callee: the function being called

• When the callee returns from executing, the caller needs to know which registers may have changed and which are guaranteed to be unchanged.

° Register Conventions: A set of generally accepted rules as to which registers will be unchanged after a procedure call (jal) and which may be changed.

Register Conventions (2/6)

• $0: No Change. Always 0.

• $v0-$v1: Change. These are expected to contain new values.

• $a0-$a3: Change. These are volatile argument registers.

• $t0-$t9: Change. That’s why they’re called temporary: any procedure may change them at any time.

Register Conventions (3/6)
• $s0-$s7: No Change. Very important, that’s why they’re called saved
registers. If the callee changes these in any way, it must restore the
original values before returning.
• $sp: No Change. The stack pointer must point to the same place before
and after the jal call, or else the caller won’t be able to restore values
from the stack.
• $ra: Change. The jal call itself will change this register.

• Example: Jump and link

jal label

– Copies the address of the next instruction into the register $ra (register 31)
and then jumps to the address label.

Register Conventions (4/6)
• What do these conventions mean?
– If function A calls function B, then function A must
save any temporary registers that it may be using onto
the stack before making a jal call.
– Function B must save any S (saved) registers it
intends to use before garbling up their values
– Remember: Caller/callee need to save only temporary
/ saved registers they are using, not all registers.

Register Conventions (5/6)
• Note that, if the callee is going to use some s registers, it must:
– save those s registers on the stack
– use the registers
– restore s registers from the stack
– jr $ra

• With the temp registers, the callee doesn’t need to save onto the stack.
• Therefore the caller must save those temp registers that it would like to preserve through the call.

Other Registers (6/6)
• $at: may be used by the assembler at any time; always unsafe to use

• $k0-$k1: may be used by the kernel at any time; unsafe to use

• $gp: global pointer

• $fp: frame pointer

• $sp: stack pointer

• $ra: return address