Floating Point Addition
Add the following two decimal numbers in scientific notation:
$8.70 \times 10^{-1}$ and $9.95 \times 10^{1}$
1. Rewrite the smaller number so that its exponent matches the exponent of the larger number:
$8.70 \times 10^{-1} = 0.087 \times 10^{1}$
2. Add the mantissas
$9.95 + 0.087 = 10.037$, so the sum is $10.037 \times 10^{1}$
3. Put the result in normalised form:
$10.037 \times 10^{1} = 1.0037 \times 10^{2}$ (shift the mantissa right one place and increment the exponent)
Check for overflow/underflow of the exponent after normalisation.
4. Round the result
If the mantissa does not fit in the space reserved for it, it has to be
rounded off.
For example, if only 4 digits are allowed for the mantissa:
$1.0037 \times 10^{2} \Rightarrow 1.004 \times 10^{2}$
(Only binary floating-point numbers have a hidden bit: the leading 1 of a normalised binary mantissa is implied rather than stored.)
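The four decimal steps can be sketched in Python with the standard decimal module, which keeps base-10 digits exact. This is a minimal illustration, not a reference implementation: the function name add_scientific and its digits parameter are hypothetical, round-half-up is assumed for step 4 (the notes do not name a rounding mode), and no exponent range is enforced.

```python
from decimal import Decimal, ROUND_HALF_UP


def add_scientific(m1, e1, m2, e2, digits=4):
    """Add m1 x 10**e1 and m2 x 10**e2 (mantissas as Decimal) in four steps.

    Hypothetical helper for illustration; no exponent overflow check.
    """
    # Step 1: make operand 1 the one with the larger exponent, then shift the
    # smaller operand's mantissa right so both share that exponent.
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 = m2.scaleb(e2 - e1)            # e.g. 8.70 -> 0.0870 for a 2-place shift
    # Step 2: add the mantissas; the common exponent is e1.
    m, e = m1 + m2, e1
    # Step 3: normalise so that 1 <= |m| < 10 (zero is left as is).
    while abs(m) >= 10:
        m, e = m.scaleb(-1), e + 1
    while m != 0 and abs(m) < 1:
        m, e = m.scaleb(1), e - 1
    # Step 4: round to `digits` significant digits (round half up assumed).
    q = Decimal(1).scaleb(1 - digits)  # 0.001 when digits == 4
    m = m.quantize(q, rounding=ROUND_HALF_UP)
    if abs(m) >= 10:                   # rounding carried, e.g. 9.9996 -> 10.000
        m, e = m.scaleb(-1).quantize(q, rounding=ROUND_HALF_UP), e + 1
    return m, e


print(add_scientific(Decimal("8.70"), -1, Decimal("9.95"), 1))
# (Decimal('1.004'), 2)
```

The extra check after rounding covers a case the steps above do not show: rounding can carry into a new digit, so the result may need normalising a second time. As a cross-check, adding Decimal("8.70E-1") and Decimal("9.95E1") inside decimal.localcontext() with prec set to 4 also gives 100.4, i.e. 1.004 × 10^2.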
Example addition in binary
Perform 0.5 + (-0.4375)
$0.5 = 0.1_2 \times 2^{0} = 1.000_2 \times 2^{-1}$ (normalised)
$-0.4375 = -0.0111_2 \times 2^{0} = -1.110_2 \times 2^{-2}$ (normalised)
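These normalised forms can be checked against IEEE 754 single precision, where a number is stored as a sign bit, an 8-bit exponent biased by 127, and a 23-bit fraction that omits the hidden leading 1. The sketch below uses Python's struct module to pull out those fields; float32_fields is a hypothetical helper name, and it only handles normalised numbers (not zero, subnormals, infinities or NaN).

```python
import struct


def float32_fields(x):
    """Split x, stored as IEEE 754 single precision, into its three fields.

    Returns (sign, unbiased exponent, 23-bit stored fraction). Valid for
    normalised numbers only; zero, subnormals, inf and NaN are not handled.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127   # remove the bias of 127
    fraction = bits & 0x7FFFFF               # hidden leading 1 is not stored
    return sign, exponent, fraction


for x in (0.5, -0.4375):
    s, e, f = float32_fields(x)
    # The top 3 stored bits give the 1.xxx form used in the notes.
    print(f"{x}: sign={s}, 1.{f >> 20:03b} x 2^{e}")
```

This prints 1.000 x 2^-1 for 0.5 and 1.110 x 2^-2 (with sign=1) for -0.4375, matching the forms above. Because biased exponent fields 0 and 255 are reserved, normalised single-precision exponents run from 1 - 127 = -126 to 254 - 127 = 127, which is the range used in the overflow/underflow check.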
1. Rewrite the smaller number so that its exponent matches the exponent of the larger number:
$-1.110_2 \times 2^{-2} = -0.1110_2 \times 2^{-1}$
2. Add the mantissas:
$1.000_2 \times 2^{-1} + (-0.1110_2 \times 2^{-1}) = 0.001_2 \times 2^{-1}$
3. Normalise the sum, checking for overflow/underflow:
$0.001_2 \times 2^{-1} = 1.000_2 \times 2^{-4}$
$-126 \le -4 \le 127 \Rightarrow$ no overflow or underflow (single-precision exponent range)
4. Round the sum:
The mantissa 1.000 fits in 4 bits, so no rounding is required.
Check: $1.000_2 \times 2^{-4} = 0.0625$, which equals $0.5 - 0.4375$.
Correct!
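The binary steps can be sketched on integer significands. This is an illustrative model, not how any particular FPU works: each operand is a hypothetical (sign, sig, exp) tuple with a 4-bit significand (1.110 is 0b1110), the names fp_add and to_float are made up, round-half-to-even is assumed for step 4, and out-of-range exponents raise an error instead of producing infinities or subnormals. For step 1 the sketch expresses both significands in units of the smaller exponent, which gives the same aligned sum as shifting the smaller operand right but cannot lose bits.

```python
# Illustrative 4-bit significand format: 1 leading bit + 3 fraction bits (1.xxx).
MANT_BITS = 4
# Single-precision normalised exponent range, as used in the example.
EMIN, EMAX = -126, 127


def fp_add(a, b):
    """Add two operands given as (sign, sig, exp) tuples.

    Each tuple stands for (-1)**sign * sig * 2**(exp - 3), where sig is a
    4-bit integer significand: 1.000 is 0b1000, 1.110 is 0b1110.
    """
    (s1, m1, e1), (s2, m2, e2) = a, b
    # Step 1: align exponents by working in units of the smaller exponent.
    lo = min(e1, e2)
    v1 = (m1 << (e1 - lo)) * (-1 if s1 else 1)
    v2 = (m2 << (e2 - lo)) * (-1 if s2 else 1)
    # Step 2: add the signed significands; the exact sum is total * 2**(lo - 3).
    total = v1 + v2
    if total == 0:
        return (0, 0, 0)  # hypothetical encoding for an exact zero
    sign, mag = int(total < 0), abs(total)
    # Step 3: normalise so the leading 1 lands in bit MANT_BITS - 1.
    shift = mag.bit_length() - MANT_BITS  # > 0: too many bits, < 0: too few
    exp = lo + shift
    if shift <= 0:
        mag <<= -shift
    else:
        # Step 4: round away the low `shift` bits (round half to even).
        q, r = divmod(mag, 1 << shift)
        half = 1 << (shift - 1)
        if r > half or (r == half and q & 1):
            q += 1
        mag = q
        if mag >> MANT_BITS:  # rounding carried out, e.g. 1.111 -> 10.000
            mag >>= 1
            exp += 1
    # Exponent check after normalisation.
    if exp > EMAX:
        raise OverflowError(f"exponent {exp} > {EMAX}")
    if exp < EMIN:
        raise ArithmeticError(f"underflow: exponent {exp} < {EMIN}")
    return (sign, mag, exp)


def to_float(x):
    """Convert a (sign, sig, exp) tuple back to a Python float."""
    sign, sig, exp = x
    return (-1) ** sign * sig * 2.0 ** (exp - (MANT_BITS - 1))


# 1.000 x 2^-1 plus -1.110 x 2^-2, as in the worked example.
result = fp_add((0, 0b1000, -1), (1, 0b1110, -2))
print(result, to_float(result))  # (0, 8, -4) 0.0625
```

The result (0, 8, -4) is 1.000 × 2^-4, the same answer as the hand calculation. Real hardware agrees: in Python, float.hex applied to 0.5 + -0.4375 returns '0x1.0000000000000p-4', i.e. 1.0 × 2^-4 in double precision.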