Chapter 3 Arithmetic For Computers (Revised)
3.1 Introduction
Operations on integers
Arithmetic
Where we've been:
  Performance (seconds, cycles, instructions)
  Abstractions: Instruction Set Architecture, Assembly Language and Machine Language
What's up ahead:
  Implementing the Architecture
[Figure: a 32-bit ALU with 32-bit inputs a and b, an operation select input, and a 32-bit result]
Numbers
Bits are just bits (no inherent meaning)
  conventions define the relationship between bits and numbers
Binary numbers (base 2)
  0000 0001 0010 0011 0100 0101 0110 0111 1000 1001...
  decimal: 0 ... 2^n - 1
Of course it gets more complicated:
  numbers are finite (overflow)
  fractions and real numbers
  negative numbers (e.g., no MIPS subi instruction; addi can add a negative number)
How do we represent negative numbers?
  i.e., which bit patterns will represent which numbers?
Possible Representations
Sign Magnitude, One's Complement, Two's Complement
Issues: balance, number of zeros, ease of operations
Which one is best? Why?
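To make the comparison concrete, the following is a minimal sketch that interprets the same bit pattern under each of the three conventions. The function names are hypothetical and the width is assumed to be between 2 and 31 bits.

```c
#include <stdint.h>

/* Interpret the low `bits` bits of `pattern` under each convention.
   Assumption for this sketch: 2 <= bits <= 31. */

int32_t sign_magnitude(uint32_t pattern, int bits) {
    uint32_t sign = (pattern >> (bits - 1)) & 1u;
    int32_t magnitude = (int32_t)(pattern & ((1u << (bits - 1)) - 1u));
    return sign ? -magnitude : magnitude;      /* 100...0 is "negative zero" */
}

int32_t ones_complement(uint32_t pattern, int bits) {
    uint32_t mask = (1u << bits) - 1u;
    pattern &= mask;
    if ((pattern >> (bits - 1)) & 1u)
        return -(int32_t)(~pattern & mask);    /* negative: invert every bit */
    return (int32_t)pattern;
}

int32_t twos_complement(uint32_t pattern, int bits) {
    uint32_t mask = (1u << bits) - 1u;
    pattern &= mask;
    if ((pattern >> (bits - 1)) & 1u)
        return (int32_t)pattern - (int32_t)mask - 1;   /* pattern - 2^bits */
    return (int32_t)pattern;
}
```

With 3 bits, the pattern 100 reads as 0 in sign magnitude, -3 in one's complement, and -4 in two's complement. The first two conventions each have two patterns for zero, which is one of the "number of zeros" issues listed above.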
MIPS
32 bit signed numbers:
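0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
0000 0000 0000 0000 0000 0000 0000 0001two = +1ten
0000 0000 0000 0000 0000 0000 0000 0010two = +2ten
...
0111 1111 1111 1111 1111 1111 1111 1110two = +2,147,483,646ten
0111 1111 1111 1111 1111 1111 1111 1111two = +2,147,483,647ten (maxint)
1000 0000 0000 0000 0000 0000 0000 0000two = -2,147,483,648ten (minint)
1000 0000 0000 0000 0000 0000 0000 0001two = -2,147,483,647ten
1000 0000 0000 0000 0000 0000 0000 0010two = -2,147,483,646ten
...
1111 1111 1111 1111 1111 1111 1111 1101two = -3ten
1111 1111 1111 1111 1111 1111 1111 1110two = -2ten
1111 1111 1111 1111 1111 1111 1111 1111two = -1ten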
Detecting Overflow
No overflow when adding a positive and a negative number
No overflow when signs are the same for subtraction
Overflow occurs when the value affects the sign:
  overflow when adding two positives yields a negative
  or, adding two negatives gives a positive
  or, subtract a negative from a positive and get a negative
  or, subtract a positive from a negative and get a positive
Consider the operations A + B, and A - B
  Can overflow occur if B is 0?
  Can overflow occur if A is 0?
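The sign rules above can be sketched in C as follows. The function names are hypothetical, and the arithmetic is done on unsigned values so that the wraparound is well defined in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* a + b overflows when a and b have the same sign
   and the wrapped sum has the opposite sign. */
bool add_overflows(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;                    /* wraps modulo 2^32 */
    return ((~(ua ^ ub)) & (ua ^ sum)) >> 31;
}

/* a - b overflows when a and b have different signs
   and the wrapped difference has a sign different from a. */
bool sub_overflows(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t diff = ua - ub;                   /* wraps modulo 2^32 */
    return ((ua ^ ub) & (ua ^ diff)) >> 31;
}
```

One case worth trying against the questions above is sub_overflows(0, INT32_MIN): with A equal to 0, the subtraction still overflows, because the negation of minint is not representable.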
Effects of Overflow
An exception (interrupt) occurs
  Control jumps to predefined address for exception
  Interrupted address is saved for possible resumption
Details based on software system / language
  example: flight control vs. homework assignment
Don't always want to detect overflow
  new MIPS instructions: addu, addiu, subu
  note: addiu still sign-extends!
  note: sltu, sltiu for unsigned comparisons
Roll over: circular buffers
Saturation: pixel lightness control
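As an illustration of the two behaviours named above, here is a small sketch on 8-bit values. The function names are hypothetical; the 8-bit width is chosen only to keep the numbers small.

```c
#include <stdint.h>

/* Roll over: the sum wraps modulo 256, as an index into a
   256-entry circular buffer would. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Saturation: the sum is clamped at the largest value, as one would
   want when brightening a pixel. */
uint8_t add_saturate_u8(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + (unsigned)b;
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}
```

Adding 10 to 250 gives 4 with roll over and 255 with saturation; neither case needs an overflow exception.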
Appendix: C.5
Different Implementations
Not easy to decide the best way to build something
  Don't want too many inputs to a single gate
  Don't want to have to go through too many gates
  For our purposes, ease of comprehension is important
Let's look at a 1-bit ALU for addition:
[Figure: 1-bit adder with inputs a, b, and CarryIn and outputs Sum and CarryOut]
How could we build a 1-bit ALU for add, and, and or? How could we build a 32-bit ALU?
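One possible answer to the second question, sketched in C: model the 1-bit adder as a function and chain 32 copies so that each CarryOut feeds the next CarryIn. The names full_adder and ripple_add32 are hypothetical.

```c
#include <stdint.h>

typedef struct {
    unsigned sum;
    unsigned carry_out;
} AdderBit;

/* 1-bit full adder: inputs are assumed to be 0 or 1. */
AdderBit full_adder(unsigned a, unsigned b, unsigned carry_in) {
    AdderBit r;
    r.sum = a ^ b ^ carry_in;
    r.carry_out = (a & b) | (a & carry_in) | (b & carry_in);
    return r;
}

/* 32-bit ripple-carry adder built from 32 full adders. */
uint32_t ripple_add32(uint32_t a, uint32_t b, unsigned carry_in,
                      unsigned *carry_out) {
    uint32_t result = 0;
    unsigned carry = carry_in & 1u;
    for (int i = 0; i < 32; i++) {
        AdderBit bit = full_adder((a >> i) & 1u, (b >> i) & 1u, carry);
        result |= (uint32_t)bit.sum << i;
        carry = bit.carry_out;          /* ripples into the next bit */
    }
    *carry_out = carry;
    return result;
}
```

The loop makes the cost visible: bit 31 cannot be computed until the carry has passed through all 31 lower positions.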
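[Figure: a 1-bit ALU whose Operation input selects the result (0: R = a AND b; 1: R = a OR b; 2: R = a + b), and a 32-bit ALU built from 32 copies of it (a0/b0/Result0 through a31/b31/Result31), with each CarryOut feeding the next CarryIn]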
Supporting slt
[Figure: (a) a 1-bit ALU with a Binvert control, CarryIn, and a Less input wired to input 3 of the result multiplexor; (b) the ALU for the most significant bit, which adds a Set output and overflow detection]
[Figure: the 32-bit ALU. The Less inputs of bits 1 to 31 are 0, the Set output of ALU31 drives the Less input of bit 0, and a Zero output is derived from Result0 to Result31]
Conclusion
We can build an ALU to support the MIPS instruction set
  key idea: use a multiplexor to select the output we want
  we can efficiently perform subtraction using two's complement
  we can replicate a 1-bit ALU to produce a 32-bit ALU
Our primary focus is comprehension; however,
  clever changes to organization can improve performance (similar to using better algorithms in software)
  we'll look at two examples, for addition and multiplication
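The key ideas in this summary can be sketched as a behavioural model in C. This is an illustration under assumptions, not the hardware itself: the names alu32 and AluOut are hypothetical, operation code 3 is taken to select slt (the Less input), and Set is taken as the sign bit of a - b, so slt is only correct here when the subtraction does not overflow.

```c
#include <stdint.h>

typedef struct {
    uint32_t result;
    unsigned zero;       /* 1 when every result bit is 0 */
    unsigned overflow;   /* signed overflow in the adder */
} AluOut;

AluOut alu32(uint32_t a, uint32_t b, unsigned binvert, unsigned operation) {
    uint32_t bb = binvert ? ~b : b;              /* Binvert: b or NOT b */
    uint32_t sum = a + bb + (binvert ? 1u : 0u); /* CarryIn = 1 gives a - b */
    AluOut out;

    switch (operation) {                         /* the output multiplexor */
    case 0:  out.result = a & bb;    break;      /* AND */
    case 1:  out.result = a | bb;    break;      /* OR  */
    case 2:  out.result = sum;       break;      /* add, or subtract with Binvert */
    default: out.result = sum >> 31; break;      /* slt: Set feeds Less of bit 0 */
    }
    out.zero = (out.result == 0);
    out.overflow = ((~(a ^ bb)) & (a ^ sum)) >> 31;
    return out;
}
```

For example, alu32(3, 6, 1, 3) computes 3 - 6, finds the sign bit set, and returns 1, which is the slt result for 3 < 6.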
Can you see the ripple? How could you get rid of it?
  c1 = b0c0 + a0c0 + a0b0
  c2 = b1c1 + a1c1 + a1b1
  c3 = b2c2 + a2c2 + a2b2
  c4 = b3c3 + a3c3 + a3b3
Appendix: C.6
Carry-lookahead adder
An approach in between our two extremes
Motivation:
  If we didn't know the value of carry-in, what could we do?
  When would we always generate a carry? gi = ai bi
  When would we propagate the carry? pi = ai + bi
Did we get rid of the ripple?
  c1 = g0 + p0c0
  c2 = g1 + p1c1
  c3 = g2 + p2c2
  c4 = g3 + p3c3
Expanding the carry chains:
  c2 = g1 + p1g0 + p1p0c0
Feasible! Why?
The carry chain does not disappear, but is much smaller: cn has n+1 terms
[Figure: a 16-bit adder built from four 4-bit ALUs (ALU0 to ALU3, producing Result0-3, Result4-7, Result8-11, and Result12-15) and a carry-lookahead unit that takes each group's P and G signals and supplies the carries C1 to C4]
How about constructing a 16-bit adder the CLA way?
  Can't build a 16-bit adder this way... (too big)
  Could use ripple carry of 4-bit CLA adders
  Better: use the CLA principle again!
    P0 = p3 p2 p1 p0
    P1 = p7 p6 p5 p4
    P2 = p11 p10 p9 p8
    P3 = p15 p14 p13 p12
    G0 = g3 + p3g2 + p3p2g1 + p3p2p1g0
    G1 = g7 + p7g6 + p7p6g5 + p7p6p5g4
    G2 = g11 + p11g10 + p11p10g9 + p11p10p9g8
    G3 = g15 + p15g14 + p15p14g13 + p15p14p13g12
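A 4-bit carry-lookahead adder can be sketched in C as below, with every carry written directly in terms of the generate and propagate signals and the incoming carry. The function name cla_add4 is hypothetical, and the code models the logic equations only, not gate delays.

```c
/* Adds two 4-bit values. Returns the 4-bit sum; *c4 receives the carry out. */
unsigned cla_add4(unsigned a, unsigned b, unsigned c0, unsigned *c4) {
    unsigned g[4], p[4];
    for (int i = 0; i < 4; i++) {
        unsigned ai = (a >> i) & 1u, bi = (b >> i) & 1u;
        g[i] = ai & bi;              /* generate:  gi = ai bi  */
        p[i] = ai | bi;              /* propagate: pi = ai + bi */
    }
    c0 &= 1u;

    /* Fully expanded carries: each depends only on g, p, and c0. */
    unsigned c1 = g[0] | (p[0] & c0);
    unsigned c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
    unsigned c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                | (p[2] & p[1] & p[0] & c0);
    *c4 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
        | (p[3] & p[2] & p[1] & g[0])
        | (p[3] & p[2] & p[1] & p[0] & c0);

    unsigned carries = c0 | (c1 << 1) | (c2 << 2) | (c3 << 3);
    return ((a ^ b) ^ carries) & 0xFu;   /* sum bit i = ai XOR bi XOR ci */
}
```

The first four terms of the expression for c4 are exactly the group generate G0 from the list above, and the product p3 p2 p1 p0 in the last term is the group propagate P0, which is how the same principle is reused one level up.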
3.3 Multiplication
Multiplication
[Figure: long-multiplication example showing the multiplicand, the multiplier, and the product]
Multiplication Hardware
[Figure: multiplication hardware; the product register is initially 0]
Optimized Multiplier
Faster Multiplier
Cost/performance tradeoff
Can be pipelined
MIPS Multiplication
Instructions
  multu rs, rt
  mfhi rd / mflo rd
    Move from HI/LO to rd
    Can test HI value to see if product overflows 32 bits
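A minimal sketch of shift-and-add multiplication, assuming unsigned operands. The function name multiply_u32 is hypothetical, and the two output words stand in for the HI and LO registers read by mfhi and mflo.

```c
#include <stdint.h>

/* Unsigned 32 x 32 -> 64-bit multiply by shift and add. */
void multiply_u32(uint32_t multiplicand, uint32_t multiplier,
                  uint32_t *hi, uint32_t *lo) {
    uint64_t product = 0;               /* product register, initially 0 */
    uint64_t shifted = multiplicand;    /* multiplicand, shifted left each step */

    for (int i = 0; i < 32; i++) {
        if (multiplier & 1u)            /* low multiplier bit is 1: add */
            product += shifted;
        shifted <<= 1;
        multiplier >>= 1;
    }
    *hi = (uint32_t)(product >> 32);    /* upper word, as in HI */
    *lo = (uint32_t)product;            /* lower word, as in LO */
}
```

After the call, a nonzero hi plays the role of the HI test mentioned above: it means the product did not fit in 32 bits.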
3.4 Division
Division
                  10010   quotient
divisor 1000 ) 10010100   dividend
              -1000
                  10
                  101
                  1010
                 -1000
                    100   remainder
n-bit operands yield (n+1)-bit quotient and n-bit remainder
If the divisor fits in the current dividend bits
  1 bit in quotient, subtract
Otherwise
  0 bit in quotient, bring down next dividend bit
Restoring division
  Do the subtract, and if remainder goes < 0, add divisor back
Signed division
  Divide using absolute values
  Adjust sign of quotient and remainder as required
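The restoring scheme and the signed adjustment can be sketched in C as follows. The function names are hypothetical. The sketch assumes a nonzero divisor, and it gives the remainder the sign of the dividend, which is one common convention rather than something the slide states.

```c
#include <stdint.h>

/* Restoring division on unsigned values. Assumes divisor != 0. */
void divide_u32(uint32_t dividend, uint32_t divisor,
                uint32_t *quotient, uint32_t *remainder) {
    int64_t rem = 0;
    uint32_t q = 0;
    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1u);  /* bring down next bit */
        rem -= divisor;                             /* do the subtract */
        if (rem < 0) {
            rem += divisor;                         /* restore; quotient bit 0 */
            q <<= 1;
        } else {
            q = (q << 1) | 1u;                      /* quotient bit 1 */
        }
    }
    *quotient = q;
    *remainder = (uint32_t)rem;
}

/* Signed division: divide absolute values, then adjust the signs.
   The final conversions assume two's complement wraparound. */
void divide_s32(int32_t dividend, int32_t divisor,
                int32_t *quotient, int32_t *remainder) {
    uint32_t ua = dividend < 0 ? 0u - (uint32_t)dividend : (uint32_t)dividend;
    uint32_t ub = divisor < 0 ? 0u - (uint32_t)divisor : (uint32_t)divisor;
    uint32_t q, r;
    divide_u32(ua, ub, &q, &r);
    if ((dividend < 0) != (divisor < 0))
        q = 0u - q;                     /* signs differ: negative quotient */
    if (dividend < 0)
        r = 0u - r;                     /* remainder follows the dividend */
    *quotient = (int32_t)q;
    *remainder = (int32_t)r;
}
```

Dividing 148 (10010100two) by 8 (1000two) with divide_u32 gives quotient 18 (10010two) and remainder 4 (100two).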
Division Hardware
[Figure: division hardware. The divisor register initially holds the divisor in its left half; the remainder register initially holds the dividend]
Optimized Divider
Faster Division
Faster dividers (e.g., SRT division) generate multiple quotient bits per step
MIPS Division
HI: 32-bit remainder
LO: 32-bit quotient
Instructions
  div rs, rt / divu rs, rt
  No overflow or divide-by-0 checking
Floating Point
Including very small and very large numbers
  -2.34 × 10^56    normalized
  +0.002 × 10^-4   not normalized
  +987.02 × 10^9   not normalized
In binary
  ±1.xxxxxxxtwo × 2^yyyy
Fields: S | Exponent | Fraction
x = (-1)^S × (1 + Fraction) × 2^(Exponent - Bias)
S: sign bit (0 = non-negative, 1 = negative)
Normalize significand: 1.0 ≤ |significand| < 2.0
  Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  Significand is Fraction with the "1." restored
Exponent: stored as actual exponent + Bias
  Ensures exponent is unsigned
  Single: Bias = 127; Double: Bias = 1023
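The field layout and the value formula can be checked with a short C sketch. It assumes that float is an IEEE 754 single-precision value with 1 sign bit, 8 exponent bits, and 23 fraction bits; the names FloatFields, float_fields, and float_value are hypothetical, and float_value handles normalized numbers only.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;        /* S */
    unsigned exponent;    /* biased exponent, 8 bits */
    uint32_t fraction;    /* 23 fraction bits, hidden 1 not included */
} FloatFields;

FloatFields float_fields(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);     /* reinterpret the 32 bits */
    FloatFields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

/* x = (-1)^S * (1 + Fraction) * 2^(Exponent - 127), for exponent 1..254. */
double float_value(FloatFields f) {
    double value = 1.0 + (double)f.fraction / 8388608.0;  /* 8388608 = 2^23 */
    int e = (int)f.exponent - 127;
    for (; e > 0; e--) value *= 2.0;
    for (; e < 0; e++) value /= 2.0;
    return f.sign ? -value : value;
}
```

For 0.75, which is 1.1two × 2^-1, the fields come out as S = 0, Exponent = 126, and a Fraction whose top bit is set (0x400000).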
Single-Precision Range
Smallest value
  Exponent: 00000001, so actual exponent = 1 - 127 = -126
  Fraction: 000...00, so significand = 1.0
  ±1.0 × 2^-126 ≈ ±1.2 × 10^-38
Largest value
  Exponent: 11111110, so actual exponent = 254 - 127 = +127
  Fraction: 111...11, so significand ≈ 2.0
  ±2.0 × 2^+127 ≈ ±3.4 × 10^+38
Double-Precision Range
Smallest value
  Exponent: 00000000001, so actual exponent = 1 - 1023 = -1022
  Fraction: 000...00, so significand = 1.0
  ±1.0 × 2^-1022 ≈ ±2.2 × 10^-308
Largest value
  Exponent: 11111111110, so actual exponent = 2046 - 1023 = +1023
  Fraction: 111...11, so significand ≈ 2.0
  ±2.0 × 2^+1023 ≈ ±1.8 × 10^+308
Floating-Point Precision
Relative precision
Floating-Point Example
Represent 0.75
Floating-Point Example
Single precision        Double precision        Object represented
Exponent   Fraction     Exponent   Fraction
0          0            0          0            0
0          Nonzero      0          Nonzero      ± denormalized number
1-254      Anything     1-2046     Anything     ± floating-point number
255        0            2047       0            ± infinity
255        Nonzero      2047       Nonzero      NaN (Not a Number)
Floating-Point Addition
2. Add significands
1.002 × 10^2
Floating-Point Addition
2. Add significands
FP Adder Hardware
Much more complex than integer adder
Doing it in one clock cycle would take too long
  Much longer than integer operations
  Slower clock would penalize all instructions
Can be pipelined
FP Adder Hardware
[Figure: FP adder datapath, annotated with Step 1 through Step 4]
FP Arithmetic Hardware
FP multiplier: like the FP adder, but uses a multiplier for significands instead of an adder
Operations: addition, subtraction, multiplication, division, reciprocal, square-root
FP ↔ integer conversion
Can be pipelined
FP Instructions in MIPS
FP hardware is coprocessor 1
Separate FP registers
  Programs generally don't do integer ops on FP data, or vice versa
  More registers with minimal code-size impact
FP load and store instructions
  lwc1, ldc1, swc1, sdc1
FP Instructions in MIPS
Single-precision arithmetic
Double-precision arithmetic
Comparison
  c.xx.s, c.xx.d (xx is eq, lt, le, ...)
  Sets or clears FP condition-code bit
Branch on FP condition code true or false
  bc1t, bc1f
FP Example: F to C
C code:
float f2c (float fahr) {
  return ((5.0/9.0)*(fahr - 32.0));
}
fahr in $f12, result in $f0, literals in global memory space
X = X + Y × Z
C code:
void mm (double x[][32], double y[][32], double z[][32]) {
  int i, j, k;
  for (i = 0; i != 32; i = i + 1)
    for (j = 0; j != 32; j = j + 1)
      for (k = 0; k != 32; k = k + 1)
        x[i][j] = x[i][j] + y[i][k] * z[k][j];
}
Addresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2
MIPS code:
    li   $t1, 32         # $t1 = 32 (row size/loop end)
    li   $s0, 0          # i = 0; initialize 1st for loop
L1: li   $s1, 0          # j = 0; restart 2nd for loop
L2: li   $s2, 0          # k = 0; restart 3rd for loop
    sll  $t2, $s0, 5     # $t2 = i * 32 (size of row of x)
    addu $t2, $t2, $s1   # $t2 = i * size(row) + j
    sll  $t2, $t2, 3     # $t2 = byte offset of [i][j]
    addu $t2, $a0, $t2   # $t2 = byte address of x[i][j]
    l.d  $f4, 0($t2)     # $f4 = 8 bytes of x[i][j]
L3: sll  $t0, $s2, 5     # $t0 = k * 32 (size of row of z)
    addu $t0, $t0, $s1   # $t0 = k * size(row) + j
    sll  $t0, $t0, 3     # $t0 = byte offset of [k][j]
    addu $t0, $a2, $t0   # $t0 = byte address of z[k][j]
    l.d  $f16, 0($t0)    # $f16 = 8 bytes of z[k][j]
Interpretation of Data
The BIG Picture
Associativity
x86 FP Architecture
8 × 80-bit extended-precision registers
  Used as a push-down stack
  Registers indexed from TOS: ST(0), ST(1), ...
Converted on load/store of memory operand
  Integer operands can also be converted on load/store
Result: poor FP performance
x86 FP Instructions
Data transfer     Arithmetic           Compare         Transcendental
FILD mem/ST(i)    FIADDP mem/ST(i)     FICOMP          FPATAN
FISTP mem/ST(i)   FISUBRP mem/ST(i)    FIUCOMP         F2XM1
FLDPI             FIMULP mem/ST(i)     FSTSW AX/mem    FCOS
FLD1              FIDIVRP mem/ST(i)                    FPTAN
FLDZ              FSQRT                                FPREM
                  FABS                                 FSIN
                  FRNDINT                              FYL2X
Optional variations
  I: integer operand
  P: pop operand from stack
  R: reverse operand order
  But not all combinations allowed
Extended to 16 registers in AMD64/EM64T
Registers can hold multiple FP operands
  2 × 64-bit double precision
  4 × 32-bit single precision
  Instructions operate on them simultaneously
Single-Instruction Multiple-Data
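Right shift divides by a power of 2 only for unsigned integers
Arithmetic right shift: replicate the sign bit
  e.g., -5 / 4: shifting 11111011two right by 2 gives 11111110two = -2, not -1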
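The difference between shifting and dividing can be seen in a short C sketch. It assumes the example is the division of -5 by 4. The function names are hypothetical, the shift count is assumed to be less than 32, and the arithmetic shift is written out by hand because C leaves >> on a negative int implementation-defined.

```c
#include <stdint.h>

/* Logical right shift: vacated high bits are filled with 0. */
uint32_t shift_right_logical(uint32_t x, unsigned n) {
    return x >> n;
}

/* Arithmetic right shift: vacated high bits replicate the sign bit. */
int32_t shift_right_arithmetic(int32_t x, unsigned n) {
    if (x >= 0)
        return x >> n;
    int32_t t = -(x + 1);       /* same bits as NOT x, and non-negative */
    return -(t >> n) - 1;       /* NOT (NOT x >> n): high bits become 1 */
}
```

shift_right_arithmetic(-5, 2) returns -2, while the C expression -5 / 4 evaluates to -1, so the shift rounds toward minus infinity where the division truncates toward zero. For non-negative values the two agree.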
Concluding Remarks
Signed and unsigned integers
Floating-point approximation to reals
Operations can overflow and underflow
MIPS ISA
  Core instructions: 54 most frequently used