Floating Point Addition

This document discusses how to add two numbers in scientific notation. [1] The smaller number is rewritten so its exponent matches the larger number. [2] The mantissas are added. [3] The sum is normalized by adjusting the exponent so the mantissa is between 1 and 10. [4] The result is rounded if it has more digits than the reserved space for the mantissa.

Add the following two decimal numbers in scientific notation:


$8.70 \times 10^{-1}$ with $9.95 \times 10^{1}$

1. Rewrite the smaller number such that its exponent matches with
the exponent of the larger number.
$8.70 \times 10^{-1} = 0.087 \times 10^{1}$
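
The alignment step can be sketched in Python with the standard `decimal` module. The helper name `align` is an assumption made for illustration and does not come from the original notes.

```python
from decimal import Decimal

def align(mantissa: Decimal, exponent: int, target_exponent: int) -> Decimal:
    """Rewrite mantissa x 10**exponent so that it uses target_exponent instead."""
    shift = target_exponent - exponent   # how many places the point moves left
    return mantissa.scaleb(-shift)       # scaleb shifts the decimal point exactly

aligned = align(Decimal("8.70"), -1, 1)
print(aligned)   # 0.0870, i.e. 0.087 x 10**1
```

Using `Decimal` rather than `float` keeps the shifted mantissa exact, which matches the pencil-and-paper arithmetic.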

2. Add the mantissas

$9.95 + 0.087 = 10.037$, so the sum is $10.037 \times 10^{1}$

3. Put the result in Normalised Form

$10.037 \times 10^{1} = 1.0037 \times 10^{2}$ (shift the mantissa, adjust the exponent)

After normalising, check the exponent for overflow or underflow.
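
A minimal sketch of decimal normalisation, assuming the mantissa is held as a `decimal.Decimal` and the exponent as a plain integer. The function name `normalise` is hypothetical.

```python
from decimal import Decimal

def normalise(mantissa: Decimal, exponent: int) -> tuple[Decimal, int]:
    """Return (m, e) with 1 <= |m| < 10 and the value m x 10**e unchanged."""
    if mantissa == 0:
        return mantissa, exponent        # zero has no normalised form
    while abs(mantissa) >= 10:           # mantissa too large: shift right
        mantissa = mantissa.scaleb(-1)
        exponent += 1
    while abs(mantissa) < 1:             # mantissa too small: shift left
        mantissa = mantissa.scaleb(1)
        exponent -= 1
    return mantissa, exponent

print(normalise(Decimal("10.037"), 1))   # (Decimal('1.0037'), 2)
```

The sketch leaves out the exponent range check, because the decimal example does not state an exponent range.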

4. Round the result

If the mantissa does not fit in the space reserved for it, it has to be
rounded off.

For example, if only 4 digits are allowed for the mantissa:

$1.0037 \times 10^{2} \Rightarrow 1.004 \times 10^{2}$

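(A hidden bit exists only in binary floating-point numbers, where the leading digit of a normalised mantissa is always 1; a decimal mantissa must store its leading digit.)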
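
The rounding step can be sketched with `Decimal.quantize`. The notes do not name a rounding mode, so round-half-to-even is an assumption here. The helper name `round_mantissa` is also hypothetical.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_mantissa(mantissa: Decimal, digits: int) -> Decimal:
    """Round a normalised mantissa (1 <= |m| < 10) to `digits` significant digits."""
    quantum = Decimal(1).scaleb(-(digits - 1))   # 0.001 when digits == 4
    # Rounding mode is an assumption; the source only says "rounded off".
    return mantissa.quantize(quantum, rounding=ROUND_HALF_EVEN)

print(round_mantissa(Decimal("1.0037"), 4))   # 1.004
```

Rounding can carry into a new leading digit: 9.9996 rounded to 4 digits becomes 10.000. In that case the result has to be normalised again.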

Example addition in binary

Perform 0.5 + (-0.4375)

$0.5 = 0.1_2 \times 2^{0} = 1.000_2 \times 2^{-1}$ (normalised)

$-0.4375 = -0.0111_2 \times 2^{0} = -1.110_2 \times 2^{-2}$ (normalised)

1. Rewrite the smaller number such that its exponent matches with
the exponent of the larger number.
$-1.110_2 \times 2^{-2} = -0.1110_2 \times 2^{-1}$

2. Add the mantissas:


$1.000_2 \times 2^{-1} + (-0.1110_2 \times 2^{-1}) = 0.001_2 \times 2^{-1}$
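
One way to mimic this binary mantissa addition is to treat each aligned mantissa as a signed integer counting units of $2^{-4}$. This is a sketch under that assumption; `FRAC_BITS` and `to_fixed` are illustrative names, not part of the original notes.

```python
FRAC_BITS = 4  # assumed: fractional bits kept once the mantissas are aligned

def to_fixed(bits: str) -> int:
    """Convert a binary string such as '-0.1110' to an integer count of 2**-FRAC_BITS."""
    sign = -1 if bits.startswith("-") else 1
    whole, frac = bits.lstrip("+-").split(".")
    frac = frac.ljust(FRAC_BITS, "0")[:FRAC_BITS]   # pad with zeros to FRAC_BITS places
    return sign * int(whole + frac, 2)

a = to_fixed("1.000")      # 16, i.e. 1.0000 in binary
b = to_fixed("-0.1110")    # -14
total = a + b              # 2, i.e. 0.0010 in binary
# Scale back: mantissa units are 2**-FRAC_BITS, shared exponent is -1.
print(total, total / 2**FRAC_BITS * 2**-1)   # 2 0.0625
```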

3. Normalise the sum, checking for overflow/underflow:


$0.001_2 \times 2^{-1} = 1.000_2 \times 2^{-4}$

$-126 \le -4 \le 127$, so there is no overflow or underflow
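
Binary normalisation with the exponent range check can be sketched as follows. The mantissa is held as an exact `fractions.Fraction`, and the bounds are the $-126$ and $127$ used above. The function name is hypothetical, and raising an exception on underflow is a simplification that ignores subnormal numbers.

```python
from fractions import Fraction

E_MIN, E_MAX = -126, 127   # exponent range quoted in the text

def normalise_binary(mantissa: Fraction, exponent: int) -> tuple[Fraction, int]:
    """Return (m, e) with 1 <= |m| < 2, raising if e leaves [E_MIN, E_MAX]."""
    if mantissa == 0:
        return mantissa, exponent
    while abs(mantissa) >= 2:      # shift right, increase exponent
        mantissa /= 2
        exponent += 1
    while abs(mantissa) < 1:       # shift left, decrease exponent
        mantissa *= 2
        exponent -= 1
    if exponent > E_MAX:
        raise OverflowError("exponent above E_MAX")
    if exponent < E_MIN:
        raise ArithmeticError("exponent below E_MIN (underflow)")
    return mantissa, exponent

# 0.001 (binary) is 1/8; the shared exponent after alignment is -1.
print(normalise_binary(Fraction(1, 8), -1))   # (Fraction(1, 1), -4)
```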

4. Round the sum:

The sum fits in 4 bits, so rounding is not required.

Check: $1.000_2 \times 2^{-4} = 0.0625$, which is equal to $0.5 - 0.4375$

Correct!
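
The same check can be repeated with Python's built-in floats, which are binary floating-point numbers. All three values are exactly representable, so the comparison is exact. Note that `math.frexp` returns a mantissa in $[0.5, 1)$ rather than $[1, 2)$, so its exponent is one higher than in the normalised form above.

```python
import math

result = 0.5 + (-0.4375)
print(result)               # 0.0625
print(math.frexp(result))   # (0.5, -3), i.e. 0.5 x 2**-3 = 1.0 x 2**-4
```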