Decimal To Floating-Point Conversions: The Conversion Procedure

The document describes the procedure for converting decimal numbers to floating point format in binary. It involves: 1) Converting the integral and fractional parts of the decimal number to binary separately. 2) Appending an exponent of 2 and normalizing the number by moving the binary point. 3) Placing the mantissa in the mantissa field and adding the bias to the exponent, which is placed in the exponent field. 4) Setting the sign bit according to the original number. Several examples are provided to demonstrate converting decimal numbers to 8-bit and 32-bit IEEE floating point format using this procedure.

Uploaded by

Sanjana Khanna

Decimal to Floating-Point Conversions

The Conversion Procedure


The rules for converting a decimal number into floating point are as
follows:

A. Convert the absolute value of the number to binary, perhaps with a
fractional part after the binary point. This can be done by
converting the integral and fractional parts separately. The integral
part is converted with the techniques examined previously. The
fractional part can be converted by multiplication. This is basically
the inverse of the division method: we repeatedly multiply by 2, and
harvest each bit as it appears to the left of the decimal point.
B. Append $\times 2^0$ to the end of the binary number (which does not
change its value).
C. Normalize the number. Move the binary point so that exactly one bit,
the leading 1, lies to its left. Adjust the exponent of two so that the
value does not change.
D. Place the mantissa into the mantissa field of the number. Omit the
leading one, and fill with zeros on the right.
E. Add the bias to the exponent of two, and place it in the exponent
field. The bias is $2^{k-1} - 1$, where $k$ is the number of bits in the
exponent field. For the eight-bit format, $k = 3$, so the bias is
$2^{3-1} - 1 = 3$. For IEEE 32-bit, $k = 8$, so the bias is $2^{8-1} - 1 = 127$.
F. Set the sign bit, 1 for negative, 0 for positive, according to the sign
of the original number.
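
Steps A through F can be sketched in Python as a single function. This is only an illustration under a few assumptions that the text does not state: the input is nonzero, the result fits the format as a normalized number, and mantissa bits that do not fit are truncated rather than rounded. The name `encode` and its parameters `exp_bits` and `man_bits` are made up for this sketch.

```python
from fractions import Fraction

def encode(value, exp_bits, man_bits):
    """Return the bit pattern (as an int) for a nonzero value, following steps A-F.

    Assumptions for this sketch: value is nonzero, the result is a normalized
    number that fits the format, and excess mantissa bits are truncated.
    """
    x = abs(Fraction(value))              # step A: exact absolute value
    exponent = 0                          # step B: x * 2**0
    while x >= 2:                         # step C: normalize so that 1 <= x < 2
        x /= 2
        exponent += 1
    while x < 1:
        x *= 2
        exponent -= 1
    mantissa = int((x - 1) * 2**man_bits) # step D: drop the leading one, keep man_bits bits
    bias = 2**(exp_bits - 1) - 1          # step E: bias is 2^(k-1) - 1
    biased = exponent + bias
    sign = 1 if value < 0 else 0          # step F
    return (sign << (exp_bits + man_bits)) | (biased << man_bits) | mantissa

print(format(encode(2.625, 3, 4), "08b"))  # 01000101
```

Calling `encode(x, 3, 4)` targets the eight-bit format described here, and `encode(x, 8, 23)` targets the IEEE 32-bit layout.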

Using The Conversion Procedure

Convert 2.625 to our 8-bit floating point format.


A. The integral part is easy, $2_{10} = 10_2$. For the fractional part:
0.625 × 2 = 1.25 1 Generate 1 and continue with the rest.
0.25 × 2 = 0.5 0 Generate 0 and continue.
0.5 × 2 = 1.0 1 Generate 1 and nothing remains.
So $0.625_{10} = 0.101_2$, and $2.625_{10} = 10.101_2$.
B. Add an exponent part: $10.101_2 = 10.101_2 \times 2^0$.
C. Normalize: $10.101_2 \times 2^0 = 1.0101_2 \times 2^1$.
D. Mantissa: 0101
E. Exponent: $1 + 3 = 4 = 100_2$.
F. Sign bit is 0.
The result is 0 100 0101. Represented as hex, that is $45_{16}$.
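
The repeated multiplication used for the fractional part in this example can be reproduced with a short loop. The sketch below is an illustration only: the function name `fraction_bits` and the `max_bits` cutoff are assumptions of the sketch, and `Fraction` is used so that the doubling is exact.

```python
from fractions import Fraction

def fraction_bits(frac, max_bits=16):
    """Binary digits of 0 <= frac < 1, produced by repeated multiplication by 2."""
    frac = Fraction(frac)
    bits = ""
    while frac and len(bits) < max_bits:
        frac *= 2
        if frac >= 1:        # a 1 appeared left of the point: harvest it
            bits += "1"
            frac -= 1        # continue with the rest
        else:                # nothing left of the point: generate 0
            bits += "0"
    return bits

print(fraction_bits("0.625"))  # 101
```

The `max_bits` cutoff matters only for fractions whose binary expansion never terminates.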
Convert -4.75 to our 8-bit floating point format.
a. The integral part is $4_{10} = 100_2$. The fractional:
0.75 × 2 = 1.5 1 Generate 1 and continue with the rest.
0.5 × 2 = 1.0 1 Generate 1 and nothing remains.
So $4.75_{10} = 100.11_2$.
b. Normalize: $100.11_2 = 1.0011_2 \times 2^2$.
c. Mantissa is 0011, exponent is $2 + 3 = 5 = 101_2$, sign bit is 1.
So -4.75 is 1 101 0011 = $\mathrm{d3}_{16}$.
Convert 0.40625 to our 8-bit floating point format.
a. Converting:
0.40625 × 2 = 0.8125 0 Generate 0 and continue.
0.8125 × 2 = 1.625 1 Generate 1 and continue with the rest.
0.625 × 2 = 1.25 1 Generate 1 and continue with the rest.
0.25 × 2 = 0.5 0 Generate 0 and continue.
0.5 × 2 = 1.0 1 Generate 1 and nothing remains.
So $0.40625_{10} = 0.01101_2$.
b. Normalize: $0.01101_2 = 1.101_2 \times 2^{-2}$.
c. Mantissa is 1010, exponent is $-2 + 3 = 1 = 001_2$, sign bit is 0.
So 0.40625 is 0 001 1010 = $\mathrm{1a}_{16}$.
Convert -12.0 to our 8-bit floating point format.
a. $12_{10} = 1100_2$.
b. Normalize: $1100.0_2 = 1.1_2 \times 2^3$.
c. Mantissa is 1000, exponent is $3 + 3 = 6 = 110_2$, sign bit is 1.
So -12.0 is 1 110 1000 = $\mathrm{e8}_{16}$.
Convert decimal 1.7 to our 8-bit floating point format.
a. The integral part is easy, $1_{10} = 1_2$. For the fractional part:
0.7 × 2 = 1.4 1 Generate 1 and continue with the rest.
0.4 × 2 = 0.8 0 Generate 0 and continue.
0.8 × 2 = 1.6 1 Generate 1 and continue with the rest.
0.6 × 2 = 1.2 1 Generate 1 and continue with the rest.
0.2 × 2 = 0.4 0 Generate 0 and continue.
0.4 × 2 = 0.8 0 Generate 0 and continue.
0.8 × 2 = 1.6 1 Generate 1 and continue with the rest.
0.6 × 2 = 1.2 1 Generate 1 and continue with the rest.

The reason why the process seems to continue endlessly is
that it does. The number 7/10, which makes a perfectly
reasonable decimal fraction, is a repeating fraction in binary,
just as the fraction 1/3 is a repeating fraction in decimal. (1/3
repeats in binary as well.) We cannot represent this exactly as
a floating point number. The closest we can come in four bits
is .1011. Since we already have a leading 1, the best eight-bit
number we can make is 1.1011.
b. Already normalized: $1.1011_2 = 1.1011_2 \times 2^0$.
c. Mantissa is 1011, exponent is $0 + 3 = 3 = 011_2$, sign bit is 0.
The result is 0 011 1011 = $\mathrm{3b}_{16}$. This is not exact, of course. If
you convert it back to decimal, you get 1.6875.
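
The round trip back to 1.6875 can be checked with a small decoder for the eight-bit format. This sketch assumes the layout used in the examples (1 sign bit, 3 exponent bits, 4 mantissa bits, bias 3) and treats every pattern as a normalized number. Any special patterns the format may reserve, such as zero, are not handled. The name `decode8` is made up for illustration.

```python
def decode8(byte):
    """Decode an 8-bit pattern laid out as sign(1) | exponent(3) | mantissa(4), bias 3.

    Assumption: every pattern is a normalized number (no special cases).
    """
    sign = -1 if byte & 0x80 else 1
    exponent = ((byte >> 4) & 0x7) - 3   # remove the bias
    mantissa = 1 + (byte & 0xF) / 16     # restore the omitted leading one
    return sign * mantissa * 2.0**exponent

print(decode8(0x3b))  # 1.6875
```

The gap between 1.7 and 1.6875 is the price of keeping only four mantissa bits.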
Convert -1313.3125 to IEEE 32-bit floating point format.
a. The integral part is $1313_{10} = 10100100001_2$. The fractional:
0.3125 × 2 = 0.625 0 Generate 0 and continue.
0.625 × 2 = 1.25 1 Generate 1 and continue with the rest.
0.25 × 2 = 0.5 0 Generate 0 and continue.
0.5 × 2 = 1.0 1 Generate 1 and nothing remains.
So $1313.3125_{10} = 10100100001.0101_2$.
b. Normalize: $10100100001.0101_2 = 1.01001000010101_2 \times 2^{10}$.
c. Mantissa is 01001000010101000000000, exponent is
$10 + 127 = 137 = 10001001_2$, sign bit is 1.
So -1313.3125 is 1 10001001 01001000010101000000000
= $\mathrm{c4a42a00}_{16}$.
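
For the IEEE 32-bit format, the three fields can be cross-checked against the encoding Python itself produces. The sketch below uses the standard `struct` module, and the helper name `float32_fields` is hypothetical. It is a way to verify hand conversions, not part of the procedure itself.

```python
import struct

def float32_fields(value):
    """Split the IEEE 32-bit encoding of value into (sign, exponent bits, mantissa bits)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF       # 8-bit biased exponent
    mantissa = bits & 0x7FFFFF           # 23-bit mantissa field
    return sign, format(exponent, "08b"), format(mantissa, "023b")

print(float32_fields(-1313.3125))
# (1, '10001001', '01001000010101000000000')
```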
Convert 0.1015625 to IEEE 32-bit floating point format.
a. Converting:
0.1015625 × 2 = 0.203125 0 Generate 0 and continue.
0.203125 × 2 = 0.40625 0 Generate 0 and continue.
0.40625 × 2 = 0.8125 0 Generate 0 and continue.
0.8125 × 2 = 1.625 1 Generate 1 and continue with the rest.
0.625 × 2 = 1.25 1 Generate 1 and continue with the rest.
0.25 × 2 = 0.5 0 Generate 0 and continue.
0.5 × 2 = 1.0 1 Generate 1 and nothing remains.
So $0.1015625_{10} = 0.0001101_2$.
b. Normalize: $0.0001101_2 = 1.101_2 \times 2^{-4}$.
c. Mantissa is 10100000000000000000000, exponent is
$-4 + 127 = 123 = 01111011_2$, sign bit is 0.
So 0.1015625 is 0 01111011 10100000000000000000000 =
$\mathrm{3dd00000}_{16}$.
Convert 39887.5625 to IEEE 32-bit floating point format.
a. The integral part is $39887_{10} = 1001101111001111_2$. The
fractional:
0.5625 × 2 = 1.125 1 Generate 1 and continue with the rest.
0.125 × 2 = 0.25 0 Generate 0 and continue.
0.25 × 2 = 0.5 0 Generate 0 and continue.
0.5 × 2 = 1.0 1 Generate 1 and nothing remains.
So $39887.5625_{10} = 1001101111001111.1001_2$.
b. Normalize: $1001101111001111.1001_2 =
1.0011011110011111001_2 \times 2^{15}$.
c. Mantissa is 00110111100111110010000, exponent is
$15 + 127 = 142 = 10001110_2$, sign bit is 0.
So 39887.5625 is 0 10001110 00110111100111110010000
= $\mathrm{471bcf90}_{16}$.
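
The hex results of the 32-bit examples can also be checked in the reverse direction, by reading the eight hex digits back as a float. This is a minimal sketch using the standard `struct` module, and the helper name `hex_to_float32` is an assumption for illustration.

```python
import struct

def hex_to_float32(hex_digits):
    """Interpret eight hex digits as a big-endian IEEE 32-bit float."""
    (value,) = struct.unpack(">f", bytes.fromhex(hex_digits))
    return value

for digits in ("c4a42a00", "3dd00000", "471bcf90"):
    print(digits, hex_to_float32(digits))
```

All three values are exactly representable in 32 bits, so the decoded numbers match the original decimals exactly.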
