Lect 13

The document discusses multiplication, division, and floating-point arithmetic in computers. It describes how multiplication and division are performed using long-multiplication and long-division (restoring) algorithms, how the corresponding hardware works, and how these operations are exposed in MIPS. It also covers IEEE 754 floating-point representation, arithmetic, and accuracy issues.


Lecture 13

15-02-2021

Multiplication

• Long-multiplication approach:

    multiplicand      1000
    multiplier      × 1001
                      1000
                     00000
                    000000
                   1000000
    product        1001000

• Length of the product is the sum of the operand lengths

Chapter 3 — Arithmetic for Computers — 2
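The long-multiplication procedure above maps directly onto a shift-and-add loop, one partial product per multiplier bit. A minimal Python sketch (the function name and `bits` parameter are illustrative, unsigned operands only):

```python
def shift_and_add_multiply(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Unsigned shift-and-add multiplication: one partial product per bit."""
    product = 0
    for _ in range(bits):
        if multiplier & 1:       # low multiplier bit selects this partial product
            product += multiplicand
        multiplicand <<= 1       # shift multiplicand left 1 bit
        multiplier >>= 1         # shift multiplier right 1 bit
    return product

# The worked example above: 1000ated as binary: 1000 x 1001 = 1001000
assert shift_and_add_multiply(0b1000, 0b1001, 4) == 0b1001000
```

With 32-bit operands the loop runs 32 times, matching the one-cycle-per-bit hardware on the following slides.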


Multiplication Hardware

[Figure: multiplication hardware; the product register is initially 0. The figure steps through the 1000 × 1001 example, producing 1001000.]
Optimized Multiplier

[Figure: optimized multiplier; the multiplier is initially placed in the right half of the product register (P), alongside the multiplicand register (M), stepping through the 1000 × 1001 = 1001000 example.]

• Important observation: shifting the product right 1 bit is equivalent to shifting the multiplicand 1 bit left

• One cycle per partial-product addition
• That's OK if the frequency of multiplications is low

Chapter 3 — Arithmetic for Computers — 6
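The optimized datapath can be modeled in a few lines. A Python sketch (illustrative names) in which the multiplier starts in the low half of the product register and the whole register shifts right each cycle, so the multiplicand never moves:

```python
def optimized_multiply(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Optimized multiplier: multiplier sits in the low half of the product
    register; the product shifts right instead of the multiplicand left."""
    mask = (1 << bits) - 1
    product = multiplier & mask                    # multiplier initially in right half
    for _ in range(bits):
        if product & 1:                            # low product bit = current multiplier bit
            product += (multiplicand & mask) << bits   # add into the left half
        product >>= 1                              # shift the whole product right 1 bit
    return product

assert optimized_multiply(0b1000, 0b1001, 4) == 0b1001000
```

Shifting the combined register right each cycle is exactly the "equivalent shift" observation from the slide.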
Fast Multiplier

• Uses multiple adders
  - Cost/performance tradeoff

• Can be pipelined
  - Several multiplications performed in parallel
MIPS Multiplication

• Two 32-bit registers for the product
  - HI: most-significant 32 bits
  - LO: least-significant 32 bits

• Instructions:
  - mult rs, rt / multu rs, rt
    64-bit product in HI/LO
  - mfhi rd / mflo rd
    Move from HI/LO to rd
    Can test the HI value to see if the product overflows 32 bits
  - mul rd, rs, rt
    Least-significant 32 bits of product → rd
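The HI/LO convention is easy to illustrate in software. A hedged Python model of multu (names are illustrative, not MIPS syntax):

```python
def mips_mult(rs: int, rt: int):
    """Model of MIPS multu: 64-bit product split across HI/LO."""
    product = (rs & 0xFFFFFFFF) * (rt & 0xFFFFFFFF)
    hi = (product >> 32) & 0xFFFFFFFF   # most-significant 32 bits -> HI
    lo = product & 0xFFFFFFFF           # least-significant 32 bits -> LO
    return hi, lo

hi, lo = mips_mult(0x12345678, 0x1000)
# hi != 0 here means the full product does not fit in 32 bits
assert (hi, lo) == (0x123, 0x45678000)
```

Testing `hi` against 0 is the software overflow check the slide mentions.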


§3.4 Division
Division

• Check for 0 divisor

• Long division approach
  - If divisor ≤ dividend bits:
    1 bit in quotient, subtract
  - Otherwise:
    0 bit in quotient, bring down next dividend bit

• Restoring division
  - Do the subtract, and if the remainder goes < 0, add the divisor back

• Signed division
  - Divide using absolute values
  - Adjust sign of quotient and remainder as required

  Worked example:

                   1001     quotient
    divisor 1000 ) 1001010  dividend
                  -1000
                     10
                     101
                     1010
                    -1000
                       10   remainder

• n-bit operands yield n-bit quotient and remainder

Chapter 3 — Arithmetic for Computers — 10
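Restoring division as described above can be sketched in Python (illustrative helper, unsigned operands only):

```python
def restoring_divide(dividend: int, divisor: int, bits: int = 32):
    """Restoring division: subtract; if the remainder goes negative,
    add the divisor back and emit a 0 quotient bit."""
    if divisor == 0:
        raise ZeroDivisionError("software must check for a 0 divisor")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # bring down next bit
        remainder -= divisor
        if remainder < 0:
            remainder += divisor                 # restore: quotient bit is 0
            quotient = quotient << 1
        else:
            quotient = (quotient << 1) | 1       # subtract stood: quotient bit is 1
    return quotient, remainder

# Slide example: 1001010 / 1000 = 1001 remainder 10 (74 / 8 = 9 r 2)
assert restoring_divide(0b1001010, 0b1000, 7) == (0b1001, 0b10)
```

Signed division would wrap this with absolute values and a sign fix-up, as the slide notes.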
Division Hardware

[Figure: division hardware; the divisor initially sits in the left half of its 64-bit register, the dividend initially in the remainder register. With 32-bit operands the algorithm takes 33 iterations. The figure traces the 1001010 ÷ 1000 example.]


FIGURE 3.10 Division example using the algorithm in Figure 3.9. The bit examined to determine the next step is circled in color.


Optimized Divider

[Figure: optimized divider; the dividend is initially placed in the combined Remainder/Quotient register.]

• One cycle per partial-remainder subtraction

• Looks a lot like a multiplier!
  - Same hardware can be used for both
Faster Division

• Can't use parallel hardware as in the multiplier:
  - Subtraction is conditional on the sign of the remainder

• Faster dividers (e.g. SRT division) generate multiple quotient bits per step
  - Still require multiple steps
MIPS Division

• Use HI/LO registers for the result
  - HI: 32-bit remainder
  - LO: 32-bit quotient

• Instructions:
  - div rs, rt / divu rs, rt
  - No overflow or divide-by-0 checking
    Software must perform checks if required
  - Use mfhi, mflo to access the result
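A small Python model of the divu convention (illustrative names; note that the divide-by-0 check lives in software, exactly as the slide says):

```python
def mips_div(rs: int, rt: int):
    """Model of MIPS divu: LO gets the quotient, HI the remainder.
    The hardware does no divide-by-0 check, so software must."""
    if rt == 0:
        raise ZeroDivisionError("check divisor in software before div")
    lo = (rs & 0xFFFFFFFF) // (rt & 0xFFFFFFFF)   # quotient  -> LO
    hi = (rs & 0xFFFFFFFF) % (rt & 0xFFFFFFFF)    # remainder -> HI
    return hi, lo

assert mips_div(74, 8) == (2, 9)   # HI = remainder 2, LO = quotient 9
```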


Floating Point

• Representation for non-integral numbers
  - Including very small and very large numbers

• Numbers in scientific notation:
  - –2.34 × 10^56      (normalized)
  - +0.002 × 10^–4     (not normalized)
  - +987.02 × 10^9     (not normalized)

• In binary:
  - ±1.xxxxxxx₂ × 2^yyyy   (normalized; 1.xxxxxxx is the significand)

• Types float and double in C
Floating Point Standard

• Defined by IEEE Std 754-1985

• Developed in response to divergence of representations:
  - Portability issues for scientific code

• Now almost universally adopted

• Two representations:
  - Single precision (32-bit)
  - Double precision (64-bit)
IEEE Floating-Point Format

    S | Exponent (single: 8 bits, double: 11 bits) | Fraction (single: 23 bits, double: 52 bits)

    x = (–1)^S × (1 + Fraction) × 2^(Exponent – Bias)

• S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)

• Normalized significand: 1.0 ≤ |significand| < 2.0
  - Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  - Significand is the Fraction with the "1." restored (view the "1 +" as supplying the binary point)

• Exponent: excess representation: actual exponent + Bias
  - Ensures the exponent field is unsigned
  - Single: Bias = 127; Double: Bias = 1023
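The format can be checked in software. A small Python sketch that decodes a normalized single-precision pattern with the formula above (`decode_single` is an illustrative name; subnormals and special values from the later slides are not handled):

```python
import struct

def decode_single(bits: int) -> float:
    """Evaluate x = (-1)^S * (1 + Fraction) * 2^(Exponent - Bias)
    for a 32-bit pattern holding a normalized single-precision number."""
    s = (bits >> 31) & 1             # sign bit
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    fraction = bits & 0x7FFFFF       # 23-bit fraction
    return (-1.0) ** s * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)

# 0 01111111 000...0: biased exponent 127 (actual 0), significand 1.0 -> +1.0
pattern = 0b0_01111111_00000000000000000000000
assert decode_single(pattern) == 1.0
# Cross-check against the machine's own single-precision interpretation
assert struct.unpack('>f', pattern.to_bytes(4, 'big'))[0] == 1.0
```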
Single-Precision Range

• Exponents 00000000 and 11111111 are reserved

• Smallest value:
  - Exponent: 00000001 ⇒ actual exponent = 1 – 127 = –126
  - Fraction: 000…00 ⇒ significand = 1.0
  - ±1.0 × 2^–126 ≈ ±1.2 × 10^–38

• Largest value:
  - Exponent: 11111110 ⇒ actual exponent = 254 – 127 = +127
  - Fraction: 111…11 ⇒ significand ≈ 2.0
  - ±2.0 × 2^+127 ≈ ±3.4 × 10^+38
Double-Precision Range

• Exponents 0000…00 and 1111…11 are reserved

• Smallest value:
  - Exponent: 00000000001 ⇒ actual exponent = 1 – 1023 = –1022
  - Fraction: 000…00 ⇒ significand = 1.0
  - ±1.0 × 2^–1022 ≈ ±2.2 × 10^–308

• Largest value:
  - Exponent: 11111111110 ⇒ actual exponent = 2046 – 1023 = +1023
  - Fraction: 111…11 ⇒ significand ≈ 2.0
  - ±2.0 × 2^+1023 ≈ ±1.8 × 10^+308
Floating-Point Precision

• Relative precision:
  - All fraction bits are significant
  - Single: approx 2^–23
    Equivalent to 23 × log10(2) ≈ 23 × 0.3 ≈ 6 decimal digits of precision
  - Double: approx 2^–52
    Equivalent to 52 × log10(2) ≈ 52 × 0.3 ≈ 16 decimal digits of precision
Floating-Point Example

• Represent –0.75
  - –0.75 = (–1)^1 × 1.1₂ × 2^–1
  - S = 1
  - Fraction = 1000…00₂
  - Exponent = –1 + Bias
    Single: –1 + 127 = 126 = 01111110₂
    Double: –1 + 1023 = 1022 = 01111111110₂

• Single: 1 01111110 1000…00
• Double: 1 01111111110 1000…00
Floating-Point Example

• Which number is represented by the following single-precision pattern?

    1 10000001 01000…00

  - S = 1
  - Fraction = 01000…00₂
  - Exponent = 10000001₂ = 129
  - x = (–1)^1 × (1 + 0.01₂) × 2^(129 – 127)
      = (–1) × 1.25 × 2^2
      = –5.0
Exercise 1

• Represent –85.125 in IEEE FP format
  - –85.125 = (–1)^1 × 85.125
  - 85 = 1010101₂
  - .125 = .001₂
  - 85.125 = 1010101.001₂ = 1.010101001₂ × 2^6
  - S = 1
  - Fraction = 010101001₂
  - Exponent = 127 + 6 = 133 = 10000101₂
  - Result: 1 10000101 01010100100…00
Exercise 2

• Represent 176.375 in IEEE FP format
  - 176.375 = 10110000.011₂ = 1.0110000011₂ × 2^7
  - Exponent = 127 + 7 = 134 = 10000110₂
  - Result: 0 10000110 01100000110000000000000
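Both exercises can be cross-checked against the machine's own single-precision encoder via Python's struct module (the helper name is illustrative):

```python
import struct

def single_bits(x: float) -> str:
    """32-bit IEEE single-precision pattern of x, as 'sign exponent fraction'."""
    b = format(struct.unpack('>I', struct.pack('>f', x))[0], '032b')
    return f"{b[0]} {b[1:9]} {b[9:]}"

# Exercise 1: -85.125 -> 1 10000101 010101001 followed by zeros
assert single_bits(-85.125) == "1 10000101 01010100100000000000000"
# Exercise 2: 176.375 -> 0 10000110 0110000011 followed by zeros
assert single_bits(176.375) == "0 10000110 01100000110000000000000"
```

Both values are exactly representable in single precision, so the round trip through `struct.pack` is exact.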
Special Values

[Figure: table of IEEE 754 special values, detailed on the following slides]
Denormal Numbers

• The smallest representable normalized number is ±1.0 × 2^–126

• How to close the gap to zero?

• Subnormal (denormal) numbers:
  - All the exponent bits are 0
  - The leading hidden bit of the significand is implied to be 0
Denormal Numbers

• Exponent = 000…0 ⇒ hidden bit is 0

    x = (–1)^S × (0 + Fraction) × 2^(1 – Bias)

• Denormal with Fraction = 000…0:

    x = (–1)^S × (0 + 0) × 2^(1 – Bias) = 0.0

  Two representations of 0.0!

• The largest subnormal number is ≈ 0.999999988 × 2^–126
  - It is close to the smallest normalized number, 1.0 × 2^–126
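Python floats are doubles, but the struct module can exercise the single-precision encodings discussed here. A short sketch of the smallest subnormal and the two zeros:

```python
import math
import struct

# Smallest positive normalized single: 1.0 * 2^-126
smallest_norm = 2.0 ** -126
# Smallest positive subnormal single: all-zero exponent, fraction = 0...01,
# value (2^-23) * 2^-126 = 2^-149
smallest_sub = struct.unpack('>f', (1).to_bytes(4, 'big'))[0]
assert smallest_sub == 2.0 ** -149
assert smallest_sub < smallest_norm   # subnormals fill the gap below 2^-126

# Two representations of zero: +0.0 and -0.0 compare equal but differ in bits
assert 0.0 == -0.0
assert struct.pack('>f', 0.0) != struct.pack('>f', -0.0)
assert math.copysign(1.0, -0.0) == -1.0   # the sign bit is still observable
```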
Infinities and NaNs

• Exponent = 111…1, Fraction = 000…0
  - ±Infinity
  - Can be used in subsequent calculations, avoiding the need for an overflow check

• Exponent = 111…1, Fraction ≠ 000…0
  - Not-a-Number (NaN)
  - Indicates an illegal or undefined result, e.g. 0.0 / 0.0
  - Can be used in subsequent calculations
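These propagation rules are observable from any IEEE-compliant language. In Python (note that Python raises an exception for 0.0 / 0.0 at the language level, so a NaN is produced here via inf - inf instead):

```python
import math

inf = float('inf')

# Infinities propagate through arithmetic instead of trapping
assert inf + 1.0 == inf
assert 1.0 / inf == 0.0

# An undefined result yields NaN; NaN propagates and never compares equal
nan = inf - inf
assert math.isnan(nan)
assert math.isnan(nan + 1.0)
assert nan != nan
```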
Floating-Point Addition

• Consider a 4-digit decimal example:
  9.999 × 10^1 + 1.610 × 10^–1

1. Align decimal points
   - Shift the number with the smaller exponent
   - 9.999 × 10^1 + 0.016 × 10^1

2. Add significands
   - 9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1

3. Normalize result & check for over/underflow
   - 1.0015 × 10^2

4. Round and renormalize if necessary
   - 1.002 × 10^2
Floating-Point Addition

• Now consider a 4-digit binary example:
  1.000₂ × 2^–1 + –1.110₂ × 2^–2 (in decimal: 0.5 + –0.4375)

1. Align binary points
   - Shift the number with the smaller exponent
   - 1.000₂ × 2^–1 + –0.111₂ × 2^–1

2. Add significands
   - 1.000₂ × 2^–1 + –0.111₂ × 2^–1 = 0.001₂ × 2^–1

3. Normalize result & check for over/underflow
   - 1.000₂ × 2^–4, with no over/underflow

4. Round and renormalize if necessary
   - 1.000₂ × 2^–4 (no change) = 0.0625
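The four steps can be traced exactly with rational arithmetic. A Python sketch of this particular example (alignment and normalization only; rounding is a no-op here because the result fits in 4 significand bits):

```python
from fractions import Fraction

# 1.000_2 x 2^-1  +  (-1.110_2 x 2^-2), as (significand, exponent) pairs
a_sig, a_exp = Fraction(1), -1            # 1.000_2
b_sig, b_exp = Fraction(-7, 4), -2        # -1.110_2 = -(1 + 1/2 + 1/4)

# 1. Align binary points: shift the number with the smaller exponent
shift = a_exp - b_exp
b_sig, b_exp = b_sig / 2 ** shift, a_exp  # now -0.111_2 x 2^-1

# 2. Add significands
s_sig, s_exp = a_sig + b_sig, a_exp       # 0.001_2 x 2^-1
assert s_sig == Fraction(1, 8)

# 3. Normalize: shift left until the significand is in [1, 2)
while s_sig != 0 and abs(s_sig) < 1:
    s_sig, s_exp = s_sig * 2, s_exp - 1   # ends at 1.000_2 x 2^-4

# 4. Round: nothing to do, the significand already fits in 4 bits
assert (s_sig, s_exp) == (Fraction(1), -4)
assert s_sig * Fraction(2) ** s_exp == Fraction(1, 16)   # = 0.0625
```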
FP Adder Hardware

• Much more complex than an integer adder

• Doing it in one clock cycle would make the clock cycle too long:
  - Much longer than integer operations
  - A slower clock would penalize all instructions

• FP adder usually takes several cycles
  - Can be pipelined
FP Adder Hardware

[Figure: FP adder datapath, organized into the four steps above: align exponents, add significands, normalize, round]
FP Arithmetic Hardware

• FP multiplier is of similar complexity to the FP adder
  - But uses a multiplier for significands instead of an adder

• FP arithmetic hardware usually supports:
  - Addition, subtraction, multiplication, division, reciprocal, square root
  - FP ↔ integer conversion

• Operations usually take several cycles
  - Can be pipelined
Accurate Arithmetic

• IEEE Std 754 specifies additional rounding control
  - Extra bits of precision (guard, round, sticky)
  - Choice of rounding modes
  - Allows the programmer to fine-tune the numerical behavior of a computation

• Not all FP units implement all options
  - Most programming languages and FP libraries just use the defaults

• Trade-off between hardware complexity, performance, and market requirements
Subword Parallelism

• Graphics and audio applications can take advantage of performing simultaneous operations on short vectors
  - Example: a 128-bit adder can perform:
    Sixteen 8-bit adds
    Eight 16-bit adds
    Four 32-bit adds

• Also called data-level parallelism, vector parallelism, or Single Instruction, Multiple Data (SIMD)
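The idea can be emulated with ordinary integers, a trick sometimes called SWAR ("SIMD within a register"). A Python sketch of four independent 32-bit adds inside one 128-bit value (function names are illustrative): the carry out of each lane's top bit is suppressed, then patched back in with XOR, so carries never cross lane boundaries.

```python
def swar_add_4x32(a: int, b: int) -> int:
    """Four independent 32-bit adds inside one 128-bit integer."""
    H = 0x80000000_80000000_80000000_80000000   # top bit of each 32-bit lane
    # Add with the top bits cleared (no carry can leave a lane),
    # then restore each lane's top sum bit with XOR.
    return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H)

def pack(lanes):   # pack four 32-bit values into one 128-bit integer
    return sum((v & 0xFFFFFFFF) << (32 * i) for i, v in enumerate(lanes))

def unpack(x):     # split a 128-bit integer back into four 32-bit lanes
    return [(x >> (32 * i)) & 0xFFFFFFFF for i in range(4)]

a, b = pack([1, 2, 0xFFFFFFFF, 7]), pack([10, 20, 1, 70])
# Lane 2 wraps around to 0 without spilling a carry into lane 3
assert unpack(swar_add_4x32(a, b)) == [11, 22, 0, 77]
```

Real SIMD hardware does the same thing with partitioned carry chains instead of masking.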
Associativity

• Parallel programs may interleave operations in unexpected orders
  - Assumptions of associativity may fail

  With x = –1.50E+38, y = 1.50E+38, z = 1.0:

    (x + y) + z = (–1.50E+38 + 1.50E+38) + 1.0 = 0.0 + 1.0 = 1.00E+00
    x + (y + z) = –1.50E+38 + (1.50E+38 + 1.0) = –1.50E+38 + 1.50E+38 = 0.00E+00

  (y + z rounds back to 1.50E+38: the 1.0 is lost to rounding, so the two orders disagree)

• Need to validate parallel programs under varying degrees of parallelism
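This failure is reproducible directly in any IEEE floating-point system. A Python check of the example above:

```python
x, y, z = -1.5e38, 1.5e38, 1.0

left = (x + y) + z    # x + y cancels exactly to 0.0, then + z gives 1.0
right = x + (y + z)   # y + z rounds back to 1.5e38: the 1.0 is lost

assert left == 1.0
assert right == 0.0
assert left != right  # floating-point addition is not associative
```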
Who Cares About FP Accuracy?

• Important for scientific code
  - But for everyday consumer use?
    "My bank balance is out by 0.0002¢!"

• The Intel Pentium FDIV bug
  - The market expects accuracy
  - See Colwell, The Pentium Chronicles
Concluding Remarks

• Bits have no inherent meaning
  - Interpretation depends on the instructions applied

• Computer representations of numbers
  - Finite range and precision
  - Need to account for this in programs
Concluding Remarks

• ISAs support arithmetic
  - Signed and unsigned integers
  - Floating-point approximation to reals
    Bounded range and precision
    Operations can overflow and underflow

• MIPS ISA
  - Core instructions: the 54 most frequently used
    Cover 100% of SPECINT, 97% of SPECFP
  - Other instructions: less frequent
