FPGA Based System Design
4. Fixed Point Arithmetic: Unsigned Number Representation
General Rule for Unsigned Representation in a Base-x System
(x is 2 in the binary system, 8 in the octal system, 10 in the decimal system and 16 in the hexadecimal system)
A base-x system uses x distinct symbols to represent a number: 0 and 1 in binary (base 2), 0 to 7 in octal, 0 to 9 in decimal, and 0 to 9 plus A to F in hexadecimal.
If an unsigned number is represented in a base-x system with n digits for the integer part and m digits for the fractional part, the weights of the digits are:
$x^{n-1} \cdots x^{3}\; x^{2}\; x^{1}\; x^{0} \,.\, x^{-1}\; x^{-2}\; x^{-3}\; x^{-4}\; x^{-5} \cdots x^{-m}$
(integer part to the left of the point, fractional part to the right)
The value of the number is the sum of each digit times its weight: $V = \sum_{i=-m}^{n-1} d_i\, x^{i}$, where $d_i$ is the digit in the position with weight $x^{i}$.
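Example of Decimal System
Weights: $\cdots\; 10^{3}\; 10^{2}\; 10^{1}\; 10^{0} \,.\, 10^{-1}\; 10^{-2}\; 10^{-3}\; 10^{-4}\; 10^{-5} \cdots$
i.e.  … 1000  100  10  1 . 0.1  0.01  0.001  0.0001  0.00001 …
e.g. a number in the decimal system:
$(5003.125)_{10} = 5\times10^{3} + 0\times10^{2} + 0\times10^{1} + 3\times10^{0} + 1\times10^{-1} + 2\times10^{-2} + 5\times10^{-3}$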
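Example of Binary System
Weights: $\cdots\; 2^{3}\; 2^{2}\; 2^{1}\; 2^{0} \,.\, 2^{-1}\; 2^{-2}\; 2^{-3}\; 2^{-4}\; 2^{-5} \cdots$
i.e.  … 8  4  2  1 . 0.5  0.25  0.125  0.0625  0.03125 …
e.g. a number in binary with 4 bits for the integer part and 3 bits for the fraction: $(1101.101)_2$
In decimal: $1\times2^{3} + 1\times2^{2} + 0\times2^{1} + 1\times2^{0} + 1\times2^{-1} + 0\times2^{-2} + 1\times2^{-3} = 8+4+0+1+0.5+0+0.125 = (13.625)_{10}$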
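The weighting rule above can be sketched in C as a digit-by-digit evaluator that reproduces both worked examples. This is a minimal illustration, not code from the slides: the function name `eval_base` and the convention of returning -1.0 for an invalid digit are hypothetical, and only bases 2 to 16 (digits 0-9, A-F) are assumed.

```c
#include <ctype.h>

/* Value of an unsigned base-x string such as "1101.101" (base 2) or
   "5003.125" (base 10): each digit times its weight x^i, summed.
   Hypothetical helper; assumes 2 <= base <= 16 and returns -1.0
   (hypothetical convention) if a character is not a valid digit. */
double eval_base(const char *s, int base)
{
    double value = 0.0;
    double weight = 1.0;      /* becomes x^-1, x^-2, ... after the point */
    int in_fraction = 0;

    for (; *s != '\0'; ++s) {
        int d;
        if (*s == '.') {
            in_fraction = 1;
            continue;
        }
        if (isdigit((unsigned char)*s))
            d = *s - '0';
        else if (isxdigit((unsigned char)*s))
            d = toupper((unsigned char)*s) - 'A' + 10;
        else
            return -1.0;
        if (d >= base)
            return -1.0;

        if (!in_fraction) {
            value = value * base + d;   /* earlier digits move up one weight */
        } else {
            weight /= base;             /* next fractional weight x^-k */
            value += d * weight;
        }
    }
    return value;
}
```

For example, `eval_base("1101.101", 2)` gives 13.625 and `eval_base("A.8", 16)` gives 10.5. Evaluating the integer part with Horner's rule avoids computing $x^{n-1}$ explicitly.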
SoC Design by Dr. Atif Raza Jafri
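In signed (two's complement) representation the MSB has a negative weight, so the MSB of a negative number is always 1 and the MSB of a non-negative number is always 0.
e.g. with 4 bits for the integer part and 3 bits for the fraction, the value -6.25 is written as:
Weights: $-2^{3},\ 2^{2},\ 2^{1},\ 2^{0} \,.\, 2^{-1},\ 2^{-2},\ 2^{-3}$
  -8   4   2   1 . 0.5  0.25  0.125
   1   0   0   1 . 1    1     0
i.e. $-8 + 1 + 0.5 + 0.25 = -6.25$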
Taking the 2's complement switches between the negative and the positive representation of a number:
e.g. with 4 bits for the integer part and 3 bits for the fraction, the value 6.25 is written as:
Weights: $-2^{3},\ 2^{2},\ 2^{1},\ 2^{0} \,.\, 2^{-1},\ 2^{-2},\ 2^{-3}$
  -8   4   2   1 . 0.5  0.25  0.125
   0   1   1   0 . 0    1     0
which is the 2's complement of 1001.110 (-6.25).
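A signed Q(n,m) bit pattern can be turned back into its value by reading the n+m bits as a two's complement integer and scaling by $2^{-m}$. The sketch below assumes the bits sit in the low n+m bits of a `uint32_t` with n+m at most 31; the name `q_to_double` is hypothetical.

```c
#include <stdint.h>

/* Value of a signed Q(n,m) word held in the low n+m bits of `raw`.
   Hypothetical helper; assumes 1 <= n+m <= 31. */
double q_to_double(uint32_t raw, int n, int m)
{
    int width = n + m;
    uint32_t mask = ((uint32_t)1 << width) - 1u;
    int64_t v = (int64_t)(raw & mask);     /* bits read as unsigned */

    /* Unsigned reading gives the MSB weight +2^(width-1); its true weight
       is -2^(width-1), so subtract 2^width when the MSB is set. */
    if (v & ((int64_t)1 << (width - 1)))
        v -= (int64_t)1 << width;

    return (double)v / (double)((int64_t)1 << m);   /* scale by 2^-m */
}
```

With the slide's patterns, `q_to_double(0x4E, 4, 3)` (1001.110) returns -6.25 and `q_to_double(0x32, 4, 3)` (0110.010) returns 6.25.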
How to Take the 2's Complement
Method 1 (used in hardware): invert each bit of the number and add 1 at the LSB position.
Method 2 (convenient on paper): starting from the right of the number, find the first '1'; keep all bits up to and including this '1' unchanged, and invert every bit to its left.
e.g. Method 1:
  -6.25          1001.110
  invert         0110.001
  add 1 at LSB  +0000.001
  6.25           0110.010
Method 2:
  -6.25          1001.110   (the first '1' from the right is in the $2^{-2}$ position)
  6.25           0110.010   (bits up to and including that '1' are unchanged; every bit to its left is inverted)
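Both methods can be sketched in C on a raw `width`-bit word, which makes it easy to check that they agree. The function names are hypothetical and the word length is assumed to be at most 31 bits.

```c
#include <stdint.h>

/* Method 1 (hardware): invert every bit, then add 1 at the LSB.
   Hypothetical helper; `width` is the word length, assumed 1..31. */
uint32_t twos_complement_invert_add(uint32_t word, int width)
{
    uint32_t mask = ((uint32_t)1 << width) - 1u;
    return (~word + 1u) & mask;            /* keep only `width` bits */
}

/* Method 2 (paper): keep the bits up to and including the first '1'
   from the right unchanged, invert every bit to its left. */
uint32_t twos_complement_scan(uint32_t word, int width)
{
    uint32_t mask = ((uint32_t)1 << width) - 1u;
    uint32_t result = 0;
    int i;

    word &= mask;
    for (i = 0; i < width; ++i) {          /* copy up to the first '1' */
        uint32_t bit = (word >> i) & 1u;
        result |= bit << i;
        if (bit)
            break;
    }
    for (++i; i < width; ++i)              /* invert every bit to its left */
        result |= (~(word >> i) & 1u) << i;
    return result;
}
```

For -6.25 in Q(4,3), the 7-bit pattern 1001110 (0x4E) gives 0110010 (0x32, i.e. 6.25) with both functions. Method 1 is a single adder in hardware; Method 2 needs a scan, which is why it suits paper work better.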
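Range of a Number
If a signed number is represented in Q(n,m) format (n bits for the integer part, including the sign bit, and m bits for the fraction), its range is:
$\text{Range}\{Q(n,m)\} = -2^{n-1} \text{ to } +2^{n-1} - 2^{-m}$
$\text{Range}\{Q(4,3)\} = -2^{4-1} \text{ to } +2^{4-1} - 2^{-3}$
$\text{Range}\{Q(4,3)\} = -8 \text{ to } +8 - 0.125$
$\text{Range}\{Q(4,3)\} = -8 \text{ to } +7.875$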
If a signed number has the most negative value (-8 in the above example), the MSB is 1 and all other bits are 0, e.g. -8 in Q(4,3) is:
  -8   4   2   1 . 0.5  0.25  0.125
   1   0   0   0 . 0    0     0
If a signed number has the largest positive value (7.875 in the above example), the MSB is 0 and all other bits are 1, e.g. 7.875 in Q(4,3) is:
  -8   4   2   1 . 0.5  0.25  0.125
   0   1   1   1 . 1    1     1
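The range formula and the two extreme bit patterns can be sketched in C as follows. The function names are hypothetical, and n >= 1 with n+m at most 31 is assumed.

```c
#include <stdint.h>

/* Most negative value of signed Q(n,m): -2^(n-1) (independent of m).
   Hypothetical helper; assumes n >= 1. */
double q_min(int n)
{
    return -(double)((uint32_t)1 << (n - 1));
}

/* Largest positive value of signed Q(n,m): 2^(n-1) - 2^-m */
double q_max(int n, int m)
{
    return (double)((uint32_t)1 << (n - 1)) - 1.0 / (double)((uint32_t)1 << m);
}

/* Raw (n+m)-bit patterns of the extremes: 100...0 and 011...1
   (assumes n + m <= 31) */
uint32_t q_min_bits(int n, int m) { return (uint32_t)1 << (n + m - 1); }
uint32_t q_max_bits(int n, int m) { return ((uint32_t)1 << (n + m - 1)) - 1u; }
```

`q_min(4)` and `q_max(4, 3)` give -8 and 7.875; `q_min_bits(4, 3)` is 0x40 (1000000) and `q_max_bits(4, 3)` is 0x3F (0111111), matching the two tables above.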
Converting a Number Represented in Fewer Bits into an Equivalent with More Bits
In the fractional part append zeros (zero extension); in the integer part perform sign extension (copy the MSB into the new high bits).
e.g. -6.25 in Q(4,3) is:
  -8   4   2   1 . 0.5  0.25  0.125
   1   0   0   1 . 1    1     0
The same value represented in Q(6,4) is:
  -32  16   8   4   2   1 . 0.5  0.25  0.125  0.0625
    1   1   1   0   0   1 . 1    1     0      0
(the two leading 1s are the sign extension; the trailing 0 is the zero extension)
$1\times(-32) + 1\times16 + 1\times8 + 0\times4 + 0\times2 + 1\times1 + 1\times0.5 + 1\times0.25 + 0\times0.125 + 0\times0.0625 = -6.25$
Converting a Number Represented in More Bits into an Equivalent with Fewer Bits
In the fractional part, either truncate or round off the excess LSBs.
In the integer part, detect overflow and underflow: saturate to the maximum value on overflow and to the minimum value on underflow; otherwise simply truncate the excess MSBs (they are only copies of the sign bit).
e.g. -6.25 in Q(6,4) is:
  -32  16   8   4   2   1 . 0.5  0.25  0.125  0.0625
    1   1   1   0   0   1 . 1    1     0      0
To represent the same value in Q(4,3), first check whether it fits in the target format: in Q(4,3) the range is -8 to 7.875, so there is no underflow. Then truncate 2 MSBs and 1 LSB, and the number is:
  -8   4   2   1 . 0.5  0.25  0.125
   1   0   0   1 . 1    1     0
$1\times(-8) + 0\times4 + 0\times2 + 1\times1 + 1\times0.5 + 1\times0.25 + 0\times0.125 = -6.25$
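Narrowing can be sketched on the integer code of a value (value × $2^{m}$, e.g. -6.25 in Q(6,4) is -100). This representation and the name `q_narrow` are assumptions for illustration; truncation is modelled as a floor, which is what dropping LSBs does in two's complement.

```c
#include <stdint.h>

/* Narrow a signed value from Q(n1,m1) to Q(n2,m2), given as its integer
   code (value * 2^m1). Returns the Q(n2,m2) code (value * 2^m2).
   Hypothetical helper; assumes m2 <= m1 and n1 + m1 <= 31. */
int32_t q_narrow(int32_t code, int m1, int n2, int m2)
{
    int shift = m1 - m2;
    int32_t max_code = ((int32_t)1 << (n2 + m2 - 1)) - 1;   /* 0111...1 */
    int32_t min_code = -((int32_t)1 << (n2 + m2 - 1));      /* 1000...0 */
    int32_t q;

    /* Fraction: drop `shift` LSBs, i.e. floor(code / 2^shift), written
       without right-shifting a negative value (implementation-defined in C). */
    if (code >= 0)
        q = code >> shift;
    else
        q = -((-code + (((int32_t)1 << shift) - 1)) >> shift);

    /* Integer part: saturate; if the value fits it is unchanged, which
       corresponds to dropping the excess (sign-copy) MSBs. */
    if (q > max_code)
        q = max_code;        /* overflow: maximize */
    else if (q < min_code)
        q = min_code;        /* underflow: minimize */
    return q;
}
```

`q_narrow(-100, 4, 4, 3)` returns -50, i.e. -50 / 8 = -6.25 in Q(4,3), as in the example. A Q(6,4) value of 20.0 (code 320) saturates to code 63, i.e. 7.875.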
Range Estimation after Arithmetic Operations:
Find the range of the operands to estimate the range of the output of an arithmetic operation.
The integer part determines the range, whereas the fractional part determines the accuracy (resolution).
Examples:
A (signed, Q(3,3)) => range is -4 to 3.875
$B = A^{2}$
The output B is never negative (so it can be unsigned). Its maximum value is $(-4)\times(-4) = 16$, and its smallest nonzero value is the square of the LSB weight, $2^{-3}\times2^{-3} = 2^{-6} = 0.015625$.
Since 16 needs 5 integer bits as an unsigned number and $2^{-6}$ needs 6 fractional bits:
B (unsigned, Q(5,6))
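The estimate for $B = A^{2}$ can be checked exhaustively, because signed Q(3,3) has only 64 codes (-32 to 31, value = code / 8). A^2 then has a code in units of $2^{-6}$. This brute-force check is an illustration, not a method from the slides; the function name is hypothetical.

```c
/* Largest code of B = A^2 for A in signed Q(3,3).
   A's codes run from -32 (-4.0) to 31 (3.875); the square's code is a*a
   in units of 2^-6. Hypothetical helper for illustration. */
int max_square_code_q3_3(void)
{
    int best = 0;
    for (int a = -32; a <= 31; ++a)
        if (a * a > best)
            best = a * a;
    return best;
}
```

The result is 1024, i.e. 16 × $2^{6}$. Unsigned Q(4,6) tops out at code $2^{10} - 1 = 1023$, so a fifth integer bit is needed; unsigned Q(5,6) reaches code 2047.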
Range Estimation after Arithmetic Operations: General Rules
After addition/subtraction the result requires 1 more integer bit than the operand with more integer bits; the fractional part keeps the larger number of fractional bits.
e.g. A (signed, Q(3,3)) => range (-4, 3.875)
B (signed, Q(2,3)) => range (-2, 1.875)
Let C = A + B; then the range is (-6, 5.75), hence C is (signed, Q(4,3)).
After multiplication the integer bits of the two operands add up and so do the fractional bits; the binary point is placed accordingly:
e.g. A (signed, Q(3,3)) => range (-4, 3.875)
B (signed, Q(2,3)) => range (-2, 1.875)
Let $C = A \times B$; then C is (signed, Q(3+2=5, 3+3=6)). The actual range is (-7.75, 8), since $3.875 \times (-2) = -7.75$ and $(-4) \times (-2) = 8$. The negative limit -7.75 would fit in 4 integer bits, but +8 cannot be represented with 4 integer bits (their maximum is 8 minus one LSB). Hence 5 integer bits are needed, covering -16 to 15.984375.
Note: division can also be seen as multiplication by the reciprocal.
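The two word-length rules, plus an exhaustive check of the product range for the example, can be sketched in C. The `qfmt` type and all function names are hypothetical; the product check works on integer codes (A: -32..31, B: -16..15, product in units of $2^{-6}$).

```c
/* Hypothetical helper type: signed Q(n,m) word-length description */
typedef struct { int n; int m; } qfmt;

static int max_int(int a, int b) { return a > b ? a : b; }

/* C = A + B or A - B: one extra integer bit, the larger fraction */
qfmt q_add_format(qfmt a, qfmt b)
{
    qfmt c = { max_int(a.n, b.n) + 1, max_int(a.m, b.m) };
    return c;
}

/* C = A * B: integer bits add and fractional bits add */
qfmt q_mul_format(qfmt a, qfmt b)
{
    qfmt c = { a.n + b.n, a.m + b.m };
    return c;
}

/* Exhaustive extreme of A * B for A in Q(3,3) and B in Q(2,3), as a code
   in units of 2^-6. Returns the maximum if want_max is nonzero, else the
   minimum. */
int q3_3_times_q2_3_extreme(int want_max)
{
    int lo = 0, hi = 0;
    for (int a = -32; a <= 31; ++a)
        for (int b = -16; b <= 15; ++b) {
            int p = a * b;
            if (p < lo) lo = p;
            if (p > hi) hi = p;
        }
    return want_max ? hi : lo;
}
```

The extremes are -496 and 512, i.e. -7.75 and 8 after dividing by 64. Since 512 exceeds the Q(4,6) maximum code $2^{9} - 1 = 511$, the fifth integer bit from `q_mul_format` is really needed.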
Summation Operation
e.g. $A_k$ (signed, Q(3,3)) => range (-4, 3.875)
$B_k$ (signed, Q(2,3)) => range (-2, 1.875)
Let $C_k = A_k \times B_k$; then $C_k$ is (signed, Q(5,6)). Its actual range is (-7.75, 8): the negative limit -7.75 would fit in 4 integer bits, but +8 cannot, so 5 integer bits are needed, covering -16 to 15.984375.
Let $D = \sum_{k=0}^{7} C_k$; the range of D is 8 times the range of $C_k$, i.e. (-62, 64).
Since $8 = 2^{3}$, 3 extra integer bits are needed, hence D is (signed, Q(8,6)).
In general, a sum of K terms, each in Q(n,m), is in $Q(\lceil \log_2 K \rceil + n,\ m)$: $\log_2 K$ is rounded up to the next integer.
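The rounded-up $\log_2 K$ can be computed with an integer loop, which avoids floating-point `log2()` landing just above an exact power of two. A minimal sketch; the name `sum_integer_bits` is hypothetical and K >= 1 is assumed.

```c
/* Integer bits of a sum of K terms, each in signed Q(n,m):
   n + ceil(log2 K); the fractional bits stay m.
   Hypothetical helper; assumes 1 <= K <= 2^31. */
int sum_integer_bits(int n, unsigned K)
{
    int extra = 0;
    while (((unsigned)1 << extra) < K)   /* smallest extra with 2^extra >= K */
        ++extra;
    return n + extra;
}
```

For the example above, `sum_integer_bits(5, 8)` gives 8, i.e. D in Q(8,6). With 9 terms one more bit would be needed: `sum_integer_bits(5, 9)` gives 9.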
Example in C to Quantize a Parameter:
A variable x has to be quantized in signed Q(4,3).
The range is -8 to 7.875.
#include <math.h>
#include <stdio.h>

int main(void)
{
    double x;
    int dummy;
    x = 1.135;                 /* ……. e.g. 1.135 */

    /* addressing the min/max values */
    if (x < -8)                /* underflow */
        x = -8;
    else if (x > 7.875)        /* overflow */
        x = 7.875;

    /* addressing the fraction part */
    x = x * pow(2, 3);         /* x = 9.08, in which .08 has to be eliminated */
    dummy = (int)floor(x);     /* dummy = 9; floor() rounds toward -inf, like dropping LSBs */
    x = (double)dummy;         /* x = 9 */
    x = x / pow(2, 3);         /* x = 1.125, which fits in signed Q(4,3) */

    printf("%f\n", x);         /* prints 1.125000 */
    return 0;
}