RISC-V processor
Malik Nauman
May 2024
Dedication
Abstract
Acknowledgement
I would like to express my deepest appreciation to the person who made it possible
for me to work on a project that required extreme attention. Special thanks to my
supervisor, Dr. Arshad Hussain for the ultimate support he gave and his advice
throughout this project. My thesis work took place at System on Chip lab located
in Electronics department, Quaid-e-Azam University. SOC lab facilitated with ad-
vanced technological software and provided me with all the necessary equipment and
given me great professional environment. Hereby, I would like to express my appre-
ciation to all the members of Electronic Department who played an important role
in making my project a successful attempt.
Most importantly, I thank my parents, who supported me and believed in me
throughout my bachelor's program.
Contents
Dedication i
Abstract ii
Acknowledgement iii
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Literature Review 3
2.1 Overview of RISC-V architecture . . . . . . . . . . . . . . . . . . . . 3
2.2 Single Cycle RISC-V . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Pipeline RISC-V . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Floating Point Unit and IEEE Standard . . . . . . . . . . . . . . . . 7
2.5 FPGA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
7 Implementation on FPGA 49
7.1 Synthesis of the Design . . . . . . . . . . . . . . . . . . . . . . . . . . 49
7.2 PMod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
7.3 Constraint File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
List of Figures
PC Program Counter
Chapter 1
Introduction
1.1 Background
1.2 Motivation
The demand for increased computational capacity continues to rise in domains
such as data science, artificial intelligence, and consumer electronics. With its
modular, flexible, open-source architecture, RISC-V is an attractive platform for
innovation in processor development. Many high-performance applications cannot
function effectively without floating-point arithmetic, yet creating a Floating-Point
Unit (FPU) that balances accuracy, efficiency, and complexity is difficult. Integrating
an FPU improves the capabilities of a pipelined RISC-V processor, but it also
introduces design challenges such as data management and control hazards. This
research addresses existing gaps by developing a method for efficient FPU
integration in pipelined RISC-V processors, aiming to advance processor
performance and reliability. The outcomes of this work have the potential to impact
both academic research and industry applications, contributing to the development
of more powerful and efficient computing systems.
1.3 Objective
The main objective of this thesis is to design a 5-stage pipelined RISC-V processor
with a Floating Point Unit (FPU) implemented from scratch, and to study the
behaviour of the processor in depth, in particular the interaction between its
components, i.e. the ALU, register file, memories, control unit, FPU, etc. The
complete processor is then implemented on the ZedBoard FPGA.
Chapter 2
Literature Review
RISC-V is an open instruction set architecture (ISA) based on Reduced Instruction
Set Computing (RISC) principles. It was originally designed to support computer
architecture research and education, but it has evolved so much over time that we
can now expect RISC-V processors in our laptops, mobile devices, and personal
computers sooner than we think. Big tech companies like NVIDIA, Google, Huawei,
Tesla, Samsung, and dozens of others have begun to adopt the RISC-V architecture,
and we will soon see products with RISC-V processors on board, thanks to improved
battery efficiency, security, and, most importantly, a royalty-free license. As a
result, organizations do not have to pay for the pricey Intel and ARM licenses,
which can be a pain to deal with when working on a project.
2.1 Overview of RISC-V architecture
RISC-V uses a standard naming convention to describe the ISA supported by a
given implementation. An ISA name starts with "RV", which stands for RISC-V,
followed by a number that can be 32, 64 or 128; this number indicates the width of
the integer register file and the size of the user address space. The number is
followed by a sequence of letters, which indicate the set of standard extensions
supported by the implementation. Extensions define the instructions supported by
an implementation, and the only extension required of every RISC-V implementation
is the "I" (integer) extension, which defines a set of 40 instructions. The RISC-V
specification itself also defines a number of standard extensions, which are the
extensions ratified by the RISC-V foundation. For example, the M extension defines
hardware multiplication and division instructions, the A extension defines the
atomic instructions, and so on.
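As an illustration of this naming convention, the following Python sketch parses an ISA string such as RV32IMAC into its register width and extension letters. The function and table names are my own; real ISA strings also allow version numbers and multi-letter extensions such as Zicsr, which this sketch ignores.

```python
# Sketch: parsing a RISC-V ISA naming string such as "RV32IMAC".
# Illustrative only: version numbers and multi-letter extensions are ignored.
import re

EXTENSION_NAMES = {
    "I": "base integer",
    "M": "hardware multiplication/division",
    "A": "atomic instructions",
    "F": "single-precision floating point",
    "D": "double-precision floating point",
    "C": "compressed instructions",
}

def parse_isa_string(isa):
    m = re.fullmatch(r"RV(32|64|128)([A-Z]+)", isa)
    if m is None:
        raise ValueError(f"not a recognised ISA string: {isa}")
    width = int(m.group(1))          # register width / user address-space size
    letters = list(m.group(2))
    if letters[0] != "I":            # the I extension is mandatory
        raise ValueError("every implementation must include I")
    return width, [EXTENSION_NAMES.get(x, "non-standard extension") for x in letters]

width, exts = parse_isa_string("RV32IMAC")
print(width)    # 32
print(exts[1])  # hardware multiplication/division
```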
One of the great things about RISC-V is that it allows for easy and modular
customization: you can add your own custom instructions and indicate this with an
X in the ISA string to denote a custom non-standard extension. For example,
RV32IMAC implements the base integer instructions together with the M, A and C
standard extensions.
2.2 Single Cycle RISC-V
A single-cycle processor executes each instruction in a single cycle: the whole
process, from fetching the instruction to writing back the result, completes in one
cycle. The control unit of a single-cycle processor is very simple, because each
instruction completes in one cycle and no non-architectural state is required. The
design includes essential components such as the control unit, arithmetic logic unit
(ALU), register file, instruction memory and program counter, all interconnected
through a straightforward datapath. The main disadvantage of this processor is
that the cycle time is limited by the slowest instruction (the load word instruction),
whose critical path is the longest.
2.4 Floating Point Unit and IEEE Standard
The floating point unit (FPU) is a math coprocessor designed to carry out
operations on floating-point numbers. The FPU is an optional hardware component;
some Cortex-M4 microcontrollers, for example, do not have one, so the datasheet
must be checked to see whether an FPU is present. Typical FPU operations include
addition, subtraction, multiplication, division, and square root. Certain FPUs can
also perform a variety of transcendental functions, including trigonometric and
exponential calculations; however, their precision may be low.
Until the early 1990s, computers were often built with a distinct FPU to perform
these types of calculations. Computer manufacturers then began incorporating the
FPU into the microprocessor itself, beginning with devices such as the Motorola
68000 series and the Intel Pentium series, and FPUs are now a common component
of the central processing unit (CPU).
• Hardware-integrated FPU
The Institute of Electrical and Electronics Engineers (IEEE) established the IEEE
Standard for Floating-Point Arithmetic (IEEE 754) in 1985. The standard defines
four common formats for representing floating-point numbers: half (16-bit), single
(32-bit), double (64-bit) and quad (128-bit) precision. In this thesis only single
precision is discussed.
Although 2's complement representations for negative numbers are frequently used
elsewhere, neither the fraction nor the exponent in the IEEE floating-point
representations uses 2's complement. Because the IEEE 754 designers wanted the
format to be easily sorted, they used a biased notation for the exponent and a
sign-magnitude scheme for the fractional component.
The IEEE 754 floating-point formats have three sub-fields: sign, fraction, and
exponent. The fractional part of the number is represented using a sign-magnitude
representation: that is, there is an explicit sign bit (S) for the fraction. The sign is
0 for positive numbers and 1 for negative numbers. In binary normalized scientific
notation, the leading bit before the binary point is always 1; the designers of the
IEEE format therefore decided to make it implied, representing only the bits after
the binary point.
The exponent in the IEEE floating-point formats uses what is known as a biased
notation. A biased representation is one in which every number is represented by
the number plus a certain bias. In the IEEE single-precision format, the bias is 127.
Hence, if the exponent is 1, it will be represented by 1 + 127 = 128; if the exponent
is -2, it will be represented by -2 + 127 = 125. Thus, stored exponents less than
127 indicate actual negative exponents, and stored exponents greater than 127
indicate actual positive exponents. The bias is 1023 in the double-precision format.
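The biased encoding can be checked directly on any machine that stores floats in IEEE 754 single precision. The sketch below (the helper name is my own) extracts the stored exponent field and reproduces the two examples above.

```python
# Sketch: the IEEE 754 biased exponent. The stored exponent field holds
# the actual exponent plus the bias (127 for single precision).
import struct

def exponent_field(x):
    """Return the raw 8-bit exponent field of a binary32 encoding of x."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF

print(exponent_field(2.0))   # actual exponent 1  -> stored 1 + 127 = 128
print(exponent_field(0.25))  # actual exponent -2 -> stored -2 + 127 = 125
```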
If a positive exponent becomes too large to fit in the exponent field, the situation
is called overflow; if a negative exponent is too large in magnitude to fit in the
exponent field, the situation is called underflow. The IEEE 754 standard also
supports special cases such as infinity, denormalized numbers, zero and NaN.
2.5 FPGA
FPGA stands for Field Programmable Gate Array: an integrated circuit designed to
be reprogrammed using a hardware description language such as Verilog HDL or
VHDL. The fact that FPGAs are reprogrammable distinguishes them from
application-specific integrated circuits (ASICs), which are integrated circuits
manufactured for specific design tasks; although one-time programmable FPGAs
exist, the dominant types are reprogrammable.
• Cost: FPGAs are getting cheaper to manufacture and are thus becoming more
popular in today's industry. FPGAs are used in fields such as aerospace and
defense, audio and video processing, and many other domains.
In this thesis the ZedBoard FPGA is used, and the software used is Xilinx Vivado.
Chapter 3
RISC-V Processor

3.2 Architectural Design
A microarchitecture can be divided into two interacting components: the datapath
and the control unit. The datapath includes memories, registers, ALUs, and
multiplexers. We work with the 32-bit RISC-V (RV32I) architecture, which is why
we use a 32-bit datapath. The datapath informs the control unit about the ongoing
instruction, and the control unit directs the datapath to execute that instruction.
We will discuss the basic components of the single-cycle RISC-V processor moving
forward in this section. Within the constraints of the single clock cycle, each
component must complete its part of the instruction.
• Program Counter
The program counter (PC) is an essential register in the CPU: it holds the
memory address of the instruction that the processor will fetch and carry out.
When running a program, the program counter tells the CPU which instruction
to retrieve from memory and run next.
Unless the program ends or a special circumstance, like a branch or jump
instruction, modifies the program flow, the process of carrying out instructions
and updating the program counter continues; the program counter is updated
to the address of the next instruction (or to the branch or jump target).
• Instruction Memory
Word-Addressable Memory: Each 32-bit data word has its own unique
address. Take the load word instruction (lw) as an example, in the format
lw r1, 10(r0), where r1 is the destination register, 10 is the offset (constant or
immediate) and r0 is the base register. The effective address is calculated
simply by adding the base register and the offset; in this example the address
is 0 + 10 = 10. After this instruction executes, r1 holds the data value at
address r0 + 10.
Now take another example, this time of store word. Suppose we want to
store the value of r4 into memory address 3. The format is sw r4, 0x3(r0),
where r4 is the source register, 0x3 is the offset and r0 is the base register.
Again the address is calculated by adding the base register and the offset, so
in this example the address is (0 + 0x3) = 3, and the value in register r4 is
written to word 3 of the memory.
Byte-Addressable Memory: Each data byte has its own unique address.
RISC-V is byte addressable, and a word is 32 bits, or 4 bytes, so consecutive
word addresses differ by 4: word n begins at byte address 4n. Take load word
in a byte-addressable memory, with the syntax lw r1, 10(r0). The word index
is obtained by adding the base register and the offset, and the corresponding
byte address is that index multiplied by 4. In this case the address is
(0 + 10) x 4 = 40, which is 0x28 in hexadecimal, and after the instruction
executes r1 holds the data value at address 40.
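The two addressing conventions described above can be summarised in a small sketch (the helper names are my own). The byte-address computation follows the word-index convention used in this example, where word n begins at byte address 4n.

```python
# Sketch of the two addressing conventions described above.
# Word-addressable: base + offset indexes a word directly.
# Byte-addressable: the same word index maps to byte address 4 * index.

def word_address(base, offset):
    return base + offset          # index of the word

def byte_address(base, offset):
    return (base + offset) * 4    # byte address of that word (4 bytes/word)

print(word_address(0, 10))        # 10
print(hex(byte_address(0, 10)))   # 0x28 (decimal 40)
```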
• Register File
The register file has two read ports and one write port. The read ports each
receive a 5-bit address input, A1 and A2, specifying one of 2^5 = 32 registers
as a source operand. RD1 and RD2 are the read-data outputs, which carry the
32-bit register values out of the register file. The write port accepts a 5-bit
address input A3, a write enable signal WE3, a 32-bit write data input WD3,
and a clock. On the positive edge of the clock, if the write enable WE3 is
asserted, the register file stores the write data WD3 into the register selected
by A3.
The data memory has a single read/write port. If its write enable WE is high,
it writes the data on the WD pin into address A on the rising edge of the
clock. If WE is low, it reads from address A onto the output RD.
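The behaviour of the register file described above can be sketched as a simple behavioural model (the class and method names are my own), assuming the usual RISC-V convention that register x0 always reads as zero.

```python
# Sketch: behavioural model of the register file described above,
# assuming RISC-V semantics where register x0 is hardwired to zero.
class RegisterFile:
    def __init__(self):
        self.regs = [0] * 32

    def read(self, a1, a2):
        """Combinational read ports RD1/RD2 for addresses A1/A2."""
        return self.regs[a1], self.regs[a2]

    def clock_edge(self, we3, a3, wd3):
        """On the rising clock edge, write WD3 to register A3 if WE3 is set."""
        if we3 and a3 != 0:            # writes to x0 are ignored
            self.regs[a3] = wd3 & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(we3=1, a3=9, wd3=0x2004)
rd1, rd2 = rf.read(9, 0)
print(hex(rd1), rd2)                   # 0x2004 0
```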
The instruction execution flow in the single-cycle processor consists of five stages,
each of which plays a vital role: Instruction Fetch (IF), Instruction Decode (ID),
Execute (EX), Memory (MEM) and Writeback (WB). Each instruction passes
through all of these stages to complete in one single cycle. This section covers the
operations performed in each stage, illustrating how an instruction flows from the
fetch stage to the writeback stage.
• Instruction Fetch (IF): This is the first stage of the instruction execution
flow. In this stage the processor retrieves the next instruction to be executed
from the instruction memory; the PC points to its memory address. The
fetched instruction and the PC are passed to the Decode stage.
• Instruction Decode (ID): In this stage the instruction is decoded to
determine its type, such as R-type, I-type, S-type, U-type, etc. After
determining the type of instruction, this stage generates the control signals
using the control unit. It also involves reading from the register file,
providing the values of the source registers r1 and r2 and the control
signals to the next stage.
• Memory (MEM): The memory access stage is where the CPU deals with
instructions, such as load and store, that need memory access. In the MEM
stage, an instruction that is either a load or a store accesses the data memory.
A load reads data from memory into the processor, and the result is passed
on to the writeback stage; a store takes data from the register file and writes
it to memory.
• RV32E: A reduced version of RV32I with fewer registers, intended for
embedded systems.
In this thesis only RV32I is implemented, and only some of its instructions, so we
will now discuss the RV32I base instruction set.
The RISC-V family of instruction sets includes a base integer instruction set
architecture called RV32I. RISC-V, pronounced "RISC Five", is an open-standard
instruction set architecture based on the principles of Reduced Instruction Set
Computing (RISC). The base integer ISA is entitled "RV32I", meaning a 32-bit
version of the RISC-V ISA with the "I" extension for the integer base instructions.
The RV32I base integer instruction set is the basis for RISC-V processors. The
ISA can be extended for specific purposes; for example, if you want a processor
that supports floating-point arithmetic, you can add the F extension.
• I-Type Instruction: I-type instructions require an immediate value and a
source register; the registers are utilized as follows.
rs1: base register, holds the base address for the memory operation.
• S-Type Instruction: S-type instructions store data from a source register to
memory; the effective address is calculated from a base register and an
immediate offset.
• U-Type Instruction: U-type instructions involve operations using a large
immediate value, which is used to construct addresses or constants.
• J-Type Instruction: J-type instructions involve jump operations in which
the target address is specified by an immediate value.
These are some of the basic RV32I instruction formats. The following are the 40
instructions supported by RV32I.
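The field boundaries shared by these formats can be illustrated with a short sketch that slices a 32-bit instruction word into its fields (field positions as in the RV32I base; the helper name is my own).

```python
# Sketch: slicing a 32-bit RV32I instruction word into the fields used by
# the formats above (R-type and I-type fields shown).
def decode_fields(instr):
    return {
        "opcode": instr & 0x7F,            # instr[6:0]
        "rd":     (instr >> 7)  & 0x1F,    # instr[11:7]
        "funct3": (instr >> 12) & 0x7,     # instr[14:12]
        "rs1":    (instr >> 15) & 0x1F,    # instr[19:15]
        "rs2":    (instr >> 20) & 0x1F,    # instr[24:20]
        "funct7": (instr >> 25) & 0x7F,    # instr[31:25]
        "imm_i":  instr >> 20,             # instr[31:20], unsigned here
    }

# add x2, x1, x3 -> funct7=0, rs2=3, rs1=1, funct3=0, rd=2, opcode=0110011
add_x2_x1_x3 = (3 << 20) | (1 << 15) | (2 << 7) | 0b0110011
f = decode_fields(add_x2_x1_x3)
print(f["rd"], f["rs1"], f["rs2"])         # 2 1 3
```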
3.4 Single Cycle Data Path
Now we will discuss the datapath of the processor. This section incrementally
constructs the single-cycle datapath, adding new elements to the state elements in
each step. To construct the datapath, we will use example instructions.
The PC holds the address of the instruction that is to be executed. The first step
is to fetch this instruction from the instruction memory. In Figure 3.15, the PC is
directly connected to the address input of the instruction memory. The instruction
memory reads out the 32-bit instruction, called Instr. The particular instruction
fetched determines what the processor will do. We start by establishing the
datapath connections for the lw instruction, and then discuss how the datapath
can be extended to cover more instructions.
To execute the lw instruction, we first need to read the source register containing
the base address. Recall that lw is an I-type instruction, and the base register is
specified in the rs1 field of the instruction, Instr[19:15]. As shown in Figure 3.16,
these instruction bits are fed to the A1 address of the register file, which reads the
register value onto RD1. In our example, the register file reads 0x2004 from x9.
An offset is also needed for the lw instruction. The offset is held in the 12-bit
immediate field of the instruction, Instr[31:20]. Since the value is signed, sign
extension to 32 bits is necessary. Sign extension is the process of copying the sign
bit into the most significant bits: ImmExt[11:0] = Instr[31:20] and ImmExt[31:12]
= Instr[31]. An Extend unit, shown in Figure 3.17, performs this sign extension. It
takes the 12-bit signed immediate from Instr[31:20] and produces the 32-bit
sign-extended immediate, ImmExt. In our example, the two's complement
immediate -4 is sign-extended from its 12-bit version, 0xFFC, to the 32-bit value
0xFFFFFFFC.
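The sign extension performed by the Extend unit can be sketched in a few lines (the helper name is my own), reproducing the example above.

```python
# Sketch: sign-extending the 12-bit immediate Instr[31:20] to 32 bits, as
# the Extend unit does: copy bit 11 of the immediate into bits 31:12.
def sign_extend_12(imm12):
    if imm12 & 0x800:                  # sign bit set -> negative value
        return imm12 | 0xFFFFF000      # fill the upper 20 bits with 1s
    return imm12

print(hex(sign_extend_12(0xFFC)))      # 0xfffffffc (-4 in two's complement)
print(hex(sign_extend_12(0x010)))      # 0x10
```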
To determine the address to read from memory, the CPU adds the offset to the
base address. An ALU is introduced in Figure 3.18 to carry out this addition. The
ALU receives two operands, SrcA and SrcB: SrcA is the base address from the
register file, and SrcB is the offset from the sign-extended immediate, ImmExt.
The ALU can perform numerous operations; the operation is specified by the 3-bit
ALUControl signal. The ALU takes 32-bit operands and returns a 32-bit
ALUResult. For the lw instruction, ALUControl is set to 000 in order to perform
addition. As seen in Figure 7.6, ALUResult is sent to the data memory as the
address to read.
This memory address is forwarded from the ALU to the address (A) port of the
data memory. The data is read from the data memory onto the ReadData bus and
finally written back to the destination register. While executing each instruction,
the processor must also compute the address of the next instruction, PCNext;
since instructions are 32 bits wide, the next instruction is at PC + 4.
This was the complete datapath for the load word instruction. In the same way we
can implement the store word instruction, R-type instructions, or any other
instruction we want the microprocessor to deal with. All instructions can be
implemented by examining the architecture and then controlling the signals.
There is a trade-off between the instructions the processor supports and the
complexity of the processor: the more instructions you add, the more complex the
design becomes. The complete datapath for a RISC-V microprocessor that can
deal with S-type, I-type, R-type, B-type and J-type instructions is as follows.

3.5 Single Cycle Control Unit
The control signals for the single-cycle processor are computed from funct3, funct7
and op. Since only bit 5 of funct7 is used in the RV32I instruction set, the control
unit only needs the following three inputs: op (Instr[6:0]), funct3 (Instr[14:12])
and funct7 bit 5 (Instr[30]).
The control unit is divided into two components. The Main Decoder determines
which type of instruction is being executed and produces the correct control
signals; the ALU Decoder determines which operation is to be performed.
Figure 3.22 shows the control signals produced by the Main Decoder, which were
determined as part of the datapath design. The type of instruction, given by the
opcode, dictates the control signals the Main Decoder sends to the datapath. The
Main Decoder produces the bulk of the datapath's control signals; in addition, it
produces two internal controller signals, Branch and ALUOp. The truth table can
be used to implement the logic for the Main Decoder using your favorite
combinational logic design techniques.
ALUControl is produced by the ALU Decoder from ALUOp and funct3. Following
Table 7.3, the ALU Decoder also uses funct7 bit 5 and op bit 5 when selecting
between the add and sub instructions.
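As a sketch of this add/sub selection, following the common Harris and Harris encoding (ALUControl 000 for add, 001 for subtract; the remaining operations are omitted here):

```python
# Sketch of the ALU Decoder logic described above, following a common
# textbook encoding (ALUControl 000 = add, 001 = subtract). For
# funct3 = 000, sub is selected only when both op[5] and funct7[5] are 1.
def alu_decoder(alu_op, funct3, op5, funct7b5):
    if alu_op == 0b00:                 # lw/sw: address = base + offset
        return 0b000                   # add
    if alu_op == 0b01:                 # beq: comparison by subtraction
        return 0b001                   # sub
    if funct3 == 0b000:                # R-type/I-type add or sub
        return 0b001 if (op5 and funct7b5) else 0b000
    raise NotImplementedError("other funct3 encodings omitted in this sketch")

print(alu_decoder(0b00, 0b010, 0, 0))  # 0 (add, for lw)
print(alu_decoder(0b10, 0b000, 1, 1))  # 1 (sub)
```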
3.6 Simulation
Now that our single-cycle RISC-V processor is complete, it is time to check
whether we get the expected output. To do this we first fill the instruction
memory with the instructions that we want to execute and then run the design.
Here is an example run of the single-cycle processor.
Chapter 4
A pipelined processor is divided into stages. There can be many stages, but for
simplicity we divide the processor into five.
• Instruction Fetch (IF): The CPU pulls the next instruction from memory
during the instruction fetch.
Each of these stages handles one instruction at a time as instructions flow through
them. By overlapping the execution of instructions, the processor spends less time
idle and performs better overall.
4.3 Hazards
Hazards arise when designing a pipelined processor. A hazard occurs when one
instruction depends on another instruction that has not yet completed. There are
several types of hazards; the important ones are as follows.
Data hazards are also known as Read After Write (RAW) hazards, because they
can occur when the result of one instruction is a source of a following instruction.
For example, suppose the first instruction is add x2,x1,x3 and the second
instruction reads x2, for instance sub x5,x2,x0. When the first instruction is in the
Memory stage, the second is in the Execute stage, and the updated result for x2
has not been stored in the register file yet. If this hazard is not handled, the
second instruction will get the wrong result.
Data hazards can be solved by inserting nops or by forwarding. Nops are not
considered good because they stall the processor, and therefore forwarding is used.
A RAW hazard is possible whenever one instruction depends on a result from
another instruction that has not yet been written into the register file. Provided
the result is computed quickly enough, forwarding can handle the hazard;
otherwise the pipeline must be stalled until the result is available.
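The forwarding decision can be sketched as follows (the signal names are my own, loosely following a stage-suffix convention): a result still in the Memory or Writeback stage is forwarded to the Execute stage when its destination register matches a source register and is actually going to be written.

```python
# Sketch: the RAW-hazard check behind forwarding (hypothetical names).
# rs_e: source register read in Execute; rd_m/rd_w: destination registers
# in Memory/Writeback; regwrite_*: whether those instructions write back.
def forward_select(rs_e, rd_m, regwrite_m, rd_w, regwrite_w):
    """Return which value should feed the ALU operand for rs_e."""
    if regwrite_m and rd_m == rs_e and rd_m != 0:
        return "forward_from_memory_stage"
    if regwrite_w and rd_w == rs_e and rd_w != 0:
        return "forward_from_writeback_stage"
    return "register_file"

# add x2,x1,x3 is in Memory while a dependent instruction reads x2 in Execute:
print(forward_select(rs_e=2, rd_m=2, regwrite_m=1, rd_w=5, regwrite_w=1))
# forward_from_memory_stage
```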
Control hazards occur when the decision of which instruction to fetch next has not
been made by the time the fetch takes place. This hazard occurs when a branch
instruction is executed.
Control hazards can be handled by simply stalling the pipeline until the branch is
resolved, or by predicting which instruction should be fetched and flushing the
pipeline if the prediction proves wrong.
4.4 Pipeline Control
The pipelined processor has the same control signals as the single-cycle processor,
since it includes the same control unit. The control unit inspects the op, funct3
and funct7 fields of the instruction in the Decode stage to produce the control
signals. The control signals must then be pipelined along through the stages
together with the instruction. The pipelined processor with pipeline control added
is shown in the following figure.
4.5 Performance Evaluation
Ideally, a pipelined processor should have a CPI of 1, since a new instruction is
issued (fetched) every cycle. However, as we saw, a stall or a flush takes one or
two cycles, so the actual CPI will be a few percent higher than 1, depending on
the program executed.
Chapter 5

5.1 Introduction to IEEE 754 Standard
In the 1970s, integrated-circuit-based microprocessors began to see wide use, and
effective floating-point arithmetic circuits supporting the new CPUs began to
emerge. As usual, several different proposals for encoding floating-point numbers
also emerged, along with the associated incompatibilities between manufacturers,
which hindered general interoperability. An IEEE standard was defined to offer a
uniform way of encoding floating-point numbers. The IEEE 754 Standard for
Floating-Point Arithmetic, first published in 1985, defined a number of formats
and special values for floating-point numbers; since then, the standard has
undergone minor revisions in 2008 and 2019. IEEE 754 is the most commonly
used method for encoding floating-point numbers, and it includes many variations:
half precision, single precision, double precision, etc. Since our processor is 32-bit,
we use the 32-bit floating-point format, which is single precision.
Single precision uses 32 bits and is also known as binary32. In single precision
there is 1 sign bit, 8 bits for the exponent and 23 bits for the mantissa.
If the sign bit is 0, the number is positive; if it is 1, the number is negative. The 8
exponent bits represent the exponent of the number, stored with a bias of 127
added. The bias is used so that both negative and positive exponents can be
represented: instead of a signed range of roughly -127 to 127, the stored value
ranges over 0 to 255.
The 23 mantissa bits represent the significant bits of the number. IEEE 754
employs the concept of an implied 1, because a number in binary scientific
notation can always be written in a form in which the whole-number part to the
left of the radix point is 1. In IEEE 754 the whole-number component of the
mantissa is therefore not included in the 23-bit field, which gives one extra bit of
accuracy to the encoded number. A binary32 word does not contain the presumed
1 during encoding; during decoding, the presumed 1 is added back to the mantissa,
recovering the original value. A number is said to be normalized if it is
represented in binary scientific notation with a single 1 to the left of the radix
point.
IEEE 754 also defines a set of special values, such as infinity, zero and NaN.
Unique codes for +0 and -0 are provided: zero is denoted by the exponent and
mantissa fields both containing all 0s, with the sign bit determining whether the
zero is positive or negative. IEEE 754 also provides unique codes for +infinity and
-infinity: infinity is represented by setting all the exponent bits to 1 and all the
mantissa bits to 0, and if any mantissa bit is non-zero with an all-ones exponent,
the value is NaN.
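These special encodings can be verified with a short sketch (the helper name is my own) that reinterprets raw 32-bit patterns as binary32 values.

```python
# Sketch: the special binary32 encodings described above, checked with
# Python's struct module. Exponent all 1s with a zero mantissa encodes
# infinity; any non-zero mantissa with that exponent encodes NaN.
import math
import struct

def decode_bits(bits):
    return struct.unpack(">f", struct.pack(">I", bits))[0]

pos_inf = decode_bits(0x7F800000)        # sign 0, exponent 0xFF, mantissa 0
neg_zero = decode_bits(0x80000000)       # sign 1, exponent 0, mantissa 0
a_nan = decode_bits(0x7F800001)          # non-zero mantissa -> NaN

print(math.isinf(pos_inf), pos_inf > 0)  # True True
print(neg_zero == 0.0)                   # True (but the sign bit is 1)
print(math.isnan(a_nan))                 # True
```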
Now that we know how to represent a number in the single-precision format, it is
time to learn how operations are performed on two binary32 numbers.
In binary32, addition and subtraction follow steps that are very similar to addition
and subtraction of base-10 numbers in scientific notation, with a few extra
processes. The first step is to split each input number into its three fields: the
sign bit, the exponent bits and the mantissa bits. This step is called unpacking.
It is during this phase that the bias is removed from the exponent and the
inferred 1 is added back to the mantissa field, increasing its size by one bit.
The second step is to modify the inputs so that they have the same exponent.
This is done by comparing the two exponents; the mantissa of the number with
the smaller exponent is shifted right by the difference of the two exponents.
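The unpack and align steps described above can be sketched as follows (the helper names are my own; normalisation and rounding of the final sum are omitted).

```python
# Sketch of the unpack-and-align steps for binary32 addition: split each
# operand into sign/exponent/mantissa, restore the implied 1, then shift
# the smaller-exponent operand's mantissa right by the exponent difference.
import struct

def unpack(x):
    """Unpack binary32 into (sign, unbiased exponent, 24-bit mantissa)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exp = ((bits >> 23) & 0xFF) - 127      # remove the bias of 127
    mant = (bits & 0x7FFFFF) | 0x800000    # restore the implied 1
    return sign, exp, mant

def align(x, y):
    """Shift the mantissa of the smaller-exponent operand so exponents match."""
    sx, ex, mx = unpack(x)
    sy, ey, my = unpack(y)
    if ex >= ey:
        return ex, mx, my >> (ex - ey)
    return ey, mx >> (ey - ex), my

# 1.5 = 1.1b * 2^0 and 0.375 = 1.1b * 2^-2: the second mantissa shifts by 2.
exp, m1, m2 = align(1.5, 0.375)
print(exp, hex(m1), hex(m2))               # 0 0xc00000 0x300000
```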
FPU Architecture
5.4 Integration with RISC-V ISA
Implementation Details
Chapter 6
Performance Analysis
Chapter 7
Implementation on FPGA
7.2 PMod
7.3 Constraint File