
6EC171ME22- Advanced Processor Architecture

32-Bit RISC-V Based Pipelined Processor Architecture Design

SPECIAL ASSIGNMENT REPORT

In partial fulfilment for the award of the degree of

MASTER OF TECHNOLOGY

in

VLSI DESIGN
Submitted by

Upadhye Ishan Nilesh

24MRV021

Guided by

Prof. Vijay Savani

Institute of Technology,
Nirma University, Ahmedabad-382481

Abstract

This project implements a pipelined 32-bit RISC-V processor in Verilog,
capable of executing arithmetic, logical, and shift operations. The design
follows a five-stage pipeline architecture: Instruction Fetch (IF), Instruction
Decode (ID), Execute (EX), Memory Access (MEM), and Write Back (WB).
The ALU supports fundamental operations such as addition, subtraction, AND, OR,
XOR, and shift left logical (SLL). The processor uses a Harvard architecture
with separate instruction and data memories. Pipeline registers between the
stages hold each stage's outputs for one clock cycle, so that instructions in
different stages do not overwrite one another's data.
Simulation and functional verification are performed in Vivado, and the
waveforms are analyzed to validate correct execution. This processor serves as a
foundation for further optimizations, including custom instruction extensions
and integration with peripherals.

Table of Contents

1 Introduction
2 Architecture Organization
3 Simulation Results
References
Appendix

1. Introduction

1.1) What is a RISC-V Processor?

A RISC-V processor is a Reduced Instruction Set Computing (RISC) processor that
implements the RISC-V Instruction Set Architecture (ISA). RISC-V is an open,
royalty-free ISA, which distinguishes it from proprietary ISAs such as ARM, x86,
and MIPS that require licensing fees. This openness allows researchers,
universities, startups, and established companies to design their own processors
without licensing restrictions, fostering innovation and customization.

1.2) What is Pipelining?

Pipelining is a technique used in processors to improve instruction throughput by
overlapping the execution of multiple instructions. Instead of completing one
instruction before starting the next, pipelining divides instruction execution into
stages and lets each stage work on a different instruction in the same clock cycle.

A simple in-order RISC-V core is commonly built as a 5-stage pipeline:

1. Fetch (IF): Retrieve the instruction from memory.
2. Decode (ID): Decode the instruction and determine control signals.
3. Execute (EX): Perform arithmetic, logic, or memory address calculation.
4. Memory Access (MEM): Read/write from memory (if applicable).
5. Writeback (WB): Write the result back to a register.
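
As a rough illustration of the throughput gain, the cycle counts below assume an ideal pipeline with no stalls or hazards. This is an assumption made for the example, not a measurement from this report. With k stages and N instructions, the pipelined machine needs k cycles to fill and then completes one instruction per cycle:

```latex
\[
T_{\text{pipelined}} = k + (N - 1), \qquad T_{\text{sequential}} = k \cdot N
\]
\[
k = 5,\; N = 3: \quad T_{\text{pipelined}} = 5 + 2 = 7 \text{ cycles}, \qquad
T_{\text{sequential}} = 5 \cdot 3 = 15 \text{ cycles}
\]
```

The values k = 5 and N = 3 correspond to the five stages listed above and to the three instructions exercised in the simulation section.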

1.3) Specifications of the 32-Bit RISC-V Processor

A 32-bit RISC-V architecture is a variant of the RISC-V (Reduced Instruction Set
Computing, fifth generation) open instruction set architecture (ISA) in which the
registers, program counter, addresses, and data paths are all 32 bits wide.

Fig 1.1 Block diagram of 32-Bit RISC-V Based 5-stage Pipelined Architecture

Fig 1.2 Instruction format of 32-Bit RISC-V
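
The register-type (R-type) layout used by the test instructions in the appendix can be sketched as a small field decoder. This is an illustration only and not part of the submitted design: the module name `rtype_fields` and its port names are hypothetical, while the bit positions are the standard RV32I ones and match the slices `[6:0]`, `[11:7]`, `[19:15]` and `[24:20]` used in the top module.

```verilog
// Hypothetical helper (not in the original design): splits a 32-bit
// RISC-V R-type instruction into its standard fields.
module rtype_fields (
    input  wire [31:0] instr,
    output wire [6:0]  opcode,  // instr[6:0]   operation class
    output wire [4:0]  rd,      // instr[11:7]  destination register
    output wire [2:0]  funct3,  // instr[14:12] operation selector
    output wire [4:0]  rs1,     // instr[19:15] first source register
    output wire [4:0]  rs2,     // instr[24:20] second source register
    output wire [6:0]  funct7   // instr[31:25] operation selector
);
    assign opcode = instr[6:0];
    assign rd     = instr[11:7];
    assign funct3 = instr[14:12];
    assign rs1    = instr[19:15];
    assign rs2    = instr[24:20];
    assign funct7 = instr[31:25];
endmodule
```

For example, `add x1, x2, x3` encodes as 32'h003100B3: funct7 = 0000000, rs2 = 3, rs1 = 2, funct3 = 000, rd = 1, opcode = 0110011.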

2. Architecture Organization

This RISC processor is built from five pipeline stages: Instruction Fetch (IF),
Instruction Decode (ID), Execute (EX), Memory Access (MEM), and Write Back (WB).
Pipeline registers (latches) separate the stages, so that the values belonging to
one instruction are not overwritten by the next instruction while several
instructions are in flight. These registers are named IF_ID, ID_EX, EX_MEM, and
MEM_WB. The other blocks are the instruction memory (IR_MEM), data memory
(DATA_MEM), register file, ALU, and control unit.
The operation of each unit and stage is explained below.

2.1) Instruction Memory

All instructions to be executed are stored in a ROM that acts as the instruction memory.
The program counter (pc_ir) holds the address of the next instruction to be fetched,
as shown in Fig. 2.1. The output is the 32-bit instruction, which is passed to the
IF/ID pipeline register. The PC is 32 bits wide, so it can address up to 2^32 locations.

Fig. 2.1) Instruction Memory
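
A minimal sketch of such a ROM is shown below. The port names follow the `instr_mem` instantiation in the appendix; the depth of 256 words, the file name `program.hex`, and the word-aligned, byte-addressed PC are assumptions made for the example, since the report does not list this module.

```verilog
// Sketch of a ROM-style instruction memory (assumed internals).
module instr_mem (
    input  wire [31:0] addr,   // byte address from the program counter
    output wire [31:0] instr   // 32-bit instruction at that address
);
    reg [31:0] rom [0:255];    // assumed depth: 256 words

    // Hypothetical file name; one 32-bit hex word per line.
    initial $readmemh("program.hex", rom);

    // Each instruction is 4 bytes, so the two low address bits are
    // dropped to index by word. The read is combinational.
    assign instr = rom[addr[9:2]];
endmodule
```

With this indexing, the PC values 0, 4 and 8 used in the testbench select ROM words 0, 1 and 2.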

2.2) Register File

The register file stores the operands and results used during instruction execution.
It consists of 32-bit registers selected by 5-bit addresses, with two read ports and
one write port. Reads are asynchronous: the registers selected by the rs1 and rs2
addresses appear on rs1_data and rs2_data without waiting for a clock edge. Writes
are controlled by reg_write and occur on the rising edge of the clock, updating the
register specified by rd with the value in rd_data. A reset signal clears all
registers to zero. For instance, in an ADD instruction (add x5, x1, x2), the register
file reads the values of x1 and x2, the ALU computes their sum, and the result is
written back to x5 if reg_write is asserted. The separate read and write ports let
the ID stage read operands in the same cycle in which the WB stage writes a result.

Fig. 2.2) Register File
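
The behaviour described above can be sketched as follows. The port names are taken from the `reg_file` instantiation in the appendix. The synchronous reset and the rule that writes to x0 are ignored are assumptions (the latter follows the RISC-V convention that x0 always reads as zero), not details confirmed by the report.

```verilog
// Sketch: 32 x 32-bit register file, two asynchronous read ports,
// one synchronous write port.
module reg_file (
    input  wire        clk,
    input  wire        reset,
    input  wire [4:0]  rs1, rs2, rd,
    input  wire        reg_write,
    input  wire [31:0] rd_data,
    output wire [31:0] rs1_data, rs2_data
);
    reg [31:0] regs [0:31];
    integer i;

    // Asynchronous (combinational) reads
    assign rs1_data = regs[rs1];
    assign rs2_data = regs[rs2];

    // Write on the rising clock edge; reset clears every register
    always @(posedge clk) begin
        if (reset) begin
            for (i = 0; i < 32; i = i + 1)
                regs[i] <= 32'b0;
        end else if (reg_write && rd != 5'd0) begin
            regs[rd] <= rd_data;   // assumed: x0 is never overwritten
        end
    end
endmodule
```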

2.3) Data Memory

The data memory stores and retrieves data during program execution and is used by
load and store instructions. The address input selects the memory location to be
accessed. When mem_read is asserted, the module drives the data stored at that
address onto read_data, as needed by lw (load word). When mem_write is asserted, the
value on write_data is written to the addressed location on the rising edge of the
clock, as needed by sw (store word). Writes are therefore synchronous to the clock,
while reads are asynchronous. In the pipeline, the memory is addressed by the ALU
result held in the EX/MEM register, and its output is captured by the MEM/WB
register.

Fig 2.3) Data Memory
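
A minimal sketch of this behaviour is given below. The port names follow the `data_memory` instantiation in the appendix; the depth of 256 words, word-only access, and the choice to return zero when mem_read is low are assumptions made for the example.

```verilog
// Sketch: data memory with asynchronous read and synchronous write.
module data_memory (
    input  wire        clk,
    input  wire        mem_read,
    input  wire        mem_write,
    input  wire [31:0] address,     // byte address from the ALU
    input  wire [31:0] write_data,
    output wire [31:0] read_data
);
    reg [31:0] mem [0:255];         // assumed depth: 256 words

    // Asynchronous read, gated by mem_read (assumed: 0 when not reading)
    assign read_data = mem_read ? mem[address[9:2]] : 32'b0;

    // Synchronous write on the rising clock edge
    always @(posedge clk) begin
        if (mem_write)
            mem[address[9:2]] <= write_data;
    end
endmodule
```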

2.4) Instruction fetch

The instruction fetch stage ends in the IF/ID register, which stores the fetched
instruction and its program counter (PC) value and forwards them to the decode stage.
The instr_in signal carries the 32-bit instruction fetched from instruction memory,
while pc_in holds the corresponding PC value. On the rising edge of the clock, these
values are latched, and the register outputs (instr_out and pc_out) present them to
the next pipeline stage. The reset signal clears the stored instruction and PC, so
that the pipeline starts from a known state.

Fig 2.4) Instruction fetch

Fig 2.5) Logic diagram of Instruction fetch
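
A pipeline register of this kind can be sketched in a few lines. The port names follow the `if_id_reg` instantiation in the appendix; the synchronous reset to zero is an assumption, since the report only states that reset clears the stored values.

```verilog
// Sketch: IF/ID pipeline register.
module if_id_reg (
    input  wire        clk,
    input  wire        reset,
    input  wire [31:0] pc_in,
    input  wire [31:0] instr_in,
    output reg  [31:0] pc_out,
    output reg  [31:0] instr_out
);
    always @(posedge clk) begin
        if (reset) begin
            // Clear the stage (assumed synchronous reset)
            pc_out    <= 32'b0;
            instr_out <= 32'b0;
        end else begin
            // Latch the fetched instruction and its PC for the ID stage
            pc_out    <= pc_in;
            instr_out <= instr_in;
        end
    end
endmodule
```

The ID/EX, EX/MEM and MEM/WB registers described in the following sections use the same latch-on-rising-edge pattern with a wider set of signals.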

2.5) Decode stage

The decode stage (ID/EX register) is responsible for passing control signals, register values,
and immediate data from the instruction decode stage to the execute stage in a pipelined
processor. It takes inputs such as ALU control (alu_ctrl_in), source selection (alu_src_in),
branch signal (branch_in), memory control signals (mem_read_in, mem_write_in,
mem_to_reg_in), and register control signals (reg_write_in, rd_in, rs1_in, rs2_in).
Additionally, it stores the instruction's immediate value (imm_in), program counter (pc_in),
and register read data (reg_data1_in, reg_data2_in). On the rising edge of the clock, these
values are latched and forwarded as outputs (alu_ctrl_out, imm_out, mem_read_out,
reg_data1_out, etc.), ensuring proper data flow into the execute stage. The reset signal clears
the stored values when necessary, maintaining control over pipeline execution.

Fig 2.6) Decode stage

2.6) Execute stage

The execute stage ends in the EX/MEM register, which stores the ALU result together
with the control signals for memory and register write operations and forwards them
to the memory access stage. Its inputs are the ALU result (alu_result_in), the memory
control signals (mem_read_in, mem_write_in, mem_to_reg_in), the destination register
(rd_in), and the register data to be written to memory (reg_data2_in). These values
are latched on the rising clock edge and output (alu_result_out, mem_read_out,
mem_write_out, rd_out, reg_data2_out) to the next pipeline stage. The register write
signal (reg_write_in) is also passed through so that the correct register is updated
at write-back. The reset signal clears the stored values.

Fig 2.7) Execute stage

Fig 2.8) Logic diagram of Execute Stage

2.7) Memory Write Back stage
The memory write-back stage (MEM/WB register) stores the results from the memory
access stage and forwards them to the register file for writing. Its inputs are the
ALU result (`alu_result_in`), memory data (`mem_data_in`), the memory-to-register
control signal (`mem_to_reg_in`), the destination register (`rd_in`), and the register
write signal (`reg_write_in`). These values are latched on the rising clock edge and
output (`alu_result_out`, `mem_data_out`, `mem_to_reg_out`, `rd_out`, `reg_write_out`)
for use in the register write-back phase. The reset signal clears the stored values.

Fig 2.9) Memory Access Stage

Fig 2.10) Logic diagram of Memory Access Stage

2.8) ALU unit
The ALU (Arithmetic Logic Unit) takes two operands (`op1` and `op2`) and performs
arithmetic and logical operations selected by the `alu_ctrl[3:0]` control signal. The
operations include AND, OR, ADD, SUB, XOR, shift left logical (SLL), and a less-than
comparison (`<`). A multiplexer selects the final result based on the control signal.
The ALU also generates a `zero_flag` output, which is asserted when the result is
zero and is useful for conditional branch instructions.

Fig 2.11) Logic diagram of ALU
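
An ALU with this interface might look like the sketch below. The port names `op1`, `op2`, `alu_ctrl` and `result` follow the `alu_risc` instantiation in the appendix, and the ADD, XOR and SLL codes (4'b0010, 4'b0100, 4'b0001) are the ones forced by the appendix testbench. The remaining control codes and the signed interpretation of the less-than comparison are hypothetical placeholders, since the report does not list them.

```verilog
// Sketch: combinational ALU with a result multiplexer and zero flag.
module alu_risc (
    input  wire [31:0] op1,
    input  wire [31:0] op2,
    input  wire [3:0]  alu_ctrl,
    output reg  [31:0] result,
    output wire        zero_flag
);
    always @(*) begin
        case (alu_ctrl)
            4'b0000: result = op1 & op2;          // AND (assumed code)
            4'b0001: result = op1 << op2[4:0];    // SLL (code from testbench)
            4'b0010: result = op1 + op2;          // ADD (code from testbench)
            4'b0011: result = op1 | op2;          // OR  (assumed code)
            4'b0100: result = op1 ^ op2;          // XOR (code from testbench)
            4'b0110: result = op1 - op2;          // SUB (assumed code)
            4'b0111: result = ($signed(op1) < $signed(op2)) ? 32'd1 : 32'd0;
                                                  // less-than (assumed code, signed)
            default: result = 32'b0;
        endcase
    end

    // High when the selected result is all zeros
    assign zero_flag = (result == 32'b0);
endmodule
```

The `case` statement plays the role of the result multiplexer: every operation is described in parallel, and `alu_ctrl` picks which one drives `result`.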

2.9) Control unit
The control unit is responsible for decoding the instruction and generating appropriate control
signals to guide data flow within the processor pipeline. It takes `opcode[6:0]`, `funct3[2:0]`,
and `funct7[6:0]` as inputs to determine the type of operation. The outputs include
`alu_op[3:0]` for selecting ALU operations, `alu_src` to choose between register and
immediate operands, `branch` to handle conditional jumps, `mem_read` and `mem_write` for
memory access, `mem_to_reg` to decide whether to write ALU or memory data to registers,
and `reg_write` to enable register updates. This unit plays a crucial role in executing
instructions efficiently within the pipeline.

Fig 2.12) Logic diagram of control unit
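
The decoding described above can be sketched as a purely combinational block. The port names come from this section, and the opcode and funct3/funct7 values are the standard RV32I encodings. The `alu_op` codes for ADD, XOR and SLL follow the appendix testbench; the other `alu_op` codes are hypothetical placeholders, and only a subset of instructions is decoded.

```verilog
// Sketch: main decoder for R-type, lw, sw and beq.
module control_unit (
    input  wire [6:0] opcode,
    input  wire [2:0] funct3,
    input  wire [6:0] funct7,
    output reg  [3:0] alu_op,
    output reg        alu_src,
    output reg        branch,
    output reg        mem_read,
    output reg        mem_write,
    output reg        mem_to_reg,
    output reg        reg_write
);
    always @(*) begin
        // Defaults: no memory access, no register write
        alu_op = 4'b0010; alu_src = 1'b0; branch = 1'b0;
        mem_read = 1'b0; mem_write = 1'b0; mem_to_reg = 1'b0; reg_write = 1'b0;

        case (opcode)
            7'b0110011: begin                     // R-type
                reg_write = 1'b1;
                case (funct3)
                    3'b000: alu_op = funct7[5] ? 4'b0110 : 4'b0010; // SUB : ADD
                    3'b001: alu_op = 4'b0001;     // SLL
                    3'b010: alu_op = 4'b0111;     // SLT (assumed code)
                    3'b100: alu_op = 4'b0100;     // XOR
                    3'b110: alu_op = 4'b0011;     // OR  (assumed code)
                    3'b111: alu_op = 4'b0000;     // AND (assumed code)
                    default: alu_op = 4'b0010;
                endcase
            end
            7'b0000011: begin                     // lw: address = rs1 + imm
                alu_src = 1'b1; mem_read = 1'b1;
                mem_to_reg = 1'b1; reg_write = 1'b1;
            end
            7'b0100011: begin                     // sw: address = rs1 + imm
                alu_src = 1'b1; mem_write = 1'b1;
            end
            7'b1100011: begin                     // beq: compare by subtraction
                branch = 1'b1; alu_op = 4'b0110;
            end
            default: ;                            // keep the defaults
        endcase
    end
endmodule
```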

2.10) Final synthesized design of the RISC-V based pipelined processor architecture
The final synthesized design integrates the functional units of the RISC-V processor:
the ALU, control unit, register file, instruction fetch and decode stages, memory
unit, and write-back logic. In Fig 2.13, the green wiring represents the data and
control paths between these components. The design includes multiplexers that select
the ALU operand and the write-back data, along with the control signals that steer
execution. Together these blocks form a functional five-stage RISC-V pipeline.

Fig 2.13) Final synthesized design

3. Simulation Results

The functionality of the pipelined RISC-V processor was verified by simulating three
arithmetic and logical operations:
1. ADD (R1 = R2 + R3): The ALU adds the register values R2 (3) and R3 (5), with an
expected result of 8 (0x00000008).
2. XOR (R4 = R5 ^ R6): The ALU performs a bitwise XOR of R5 (0xA5A5A5A5) and R6
(0x5A5A5A5A), with an expected result of 0xFFFFFFFF.
3. SLL (R7 = R8 << R9): A Shift Left Logical (SLL) operation shifts R8 (4) left by R9
(2), with an expected result of 16 (0x00000010).
The testbench uses Verilog force statements to drive the IF/ID outputs and the
ID-stage signals (operand values and control signals) directly, one instruction per
clock period, so that the three instructions occupy different pipeline stages at the
same time. The waveforms are compared against the expected results above to confirm
correct ALU functionality and pipeline execution.

Fig 2.14) Waveforms indicating pipelined execution of instructions
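
The expected values quoted above can be cross-checked independently of the processor with a few lines of Verilog. This stand-alone sketch uses the built-in operators rather than the report's ALU, and the module name `expected_results_check` is hypothetical.

```verilog
`timescale 1ns / 1ps

// Stand-alone check of the three expected results (not part of the design).
module expected_results_check;
    reg [31:0] a, b;

    initial begin
        a = 32'd3; b = 32'd5;                    // ADD: 3 + 5
        if ((a + b) !== 32'h00000008) $display("ADD mismatch");

        a = 32'hA5A5A5A5; b = 32'h5A5A5A5A;      // XOR of complementary patterns
        if ((a ^ b) !== 32'hFFFFFFFF) $display("XOR mismatch");

        a = 32'd4; b = 32'd2;                    // SLL: 4 << 2
        if ((a << b[4:0]) !== 32'h00000010) $display("SLL mismatch");

        $display("expected-value check finished");
        $finish;
    end
endmodule
```

With these inputs none of the mismatch messages should be printed. The same comparisons could be applied to the ALU result observed in the waveforms.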


References

[1] David Harris and Sarah L. Harris, Digital Design and Computer Architecture,
RISC-V Edition.
[2] Sandeep Prabhakaran et al., Design and Analysis of a Multi Clocked Pipelined
Processor Based on RISC-V, IEEE IC3IoT, 2022.

Appendix

RISC-V Top module code:


module riscv_pipeline (
    input clk, reset
);
    // IF Stage Signals
    // Note: pc, next_pc and imm_ext are declared but not driven in this listing;
    // the testbench below forces pc and the ID-stage signals directly.
    wire [31:0] pc, instr, next_pc;

    // IF/ID Pipeline Register Outputs
    wire [31:0] if_id_pc, if_id_instr;

    // ID Stage Signals
    wire [31:0] reg_data1, reg_data2, imm_ext;
    wire [4:0] rs1, rs2, rd;
    wire [3:0] alu_ctrl;
    wire alu_src, reg_write, mem_read, mem_write, mem_to_reg, branch;

    // ID/EX Pipeline Register Outputs
    wire [31:0] id_ex_pc, id_ex_reg_data1, id_ex_reg_data2, id_ex_imm;
    wire [4:0] id_ex_rs1, id_ex_rs2, id_ex_rd;
    wire [3:0] id_ex_alu_ctrl;
    wire id_ex_alu_src, id_ex_reg_write, id_ex_mem_read, id_ex_mem_write,
         id_ex_mem_to_reg, id_ex_branch;

    // EX Stage Signals
    wire [31:0] alu_result;

    // EX/MEM Pipeline Register Outputs
    wire [31:0] ex_mem_alu_result, ex_mem_reg_data2;
    wire [4:0] ex_mem_rd;
    wire ex_mem_reg_write, ex_mem_mem_read, ex_mem_mem_write, ex_mem_mem_to_reg;

    // MEM Stage Signals
    wire [31:0] mem_data;

    // MEM/WB Pipeline Register Outputs
    wire [31:0] mem_wb_mem_data, mem_wb_alu_result;
    wire [4:0] mem_wb_rd;
    wire mem_wb_reg_write, mem_wb_mem_to_reg;

    // WB Stage Signals
    wire [31:0] write_back_data;

    // ================= Instruction Fetch (IF) =================
    instr_mem imem (
        .addr(pc),
        .instr(instr)
    );

    if_id_reg if_id (
        .clk(clk), .reset(reset),
        .pc_in(pc), .instr_in(instr),
        .pc_out(if_id_pc), .instr_out(if_id_instr)
    );

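    // ================= Instruction Decode (ID) =================
    control_unit control (
        .opcode(if_id_instr[6:0]),
        // funct3/funct7 are listed as control-unit inputs in Section 2.9;
        // they are connected here on that basis.
        .funct3(if_id_instr[14:12]), .funct7(if_id_instr[31:25]),
        .reg_write(reg_write), .mem_read(mem_read), .mem_write(mem_write),
        .alu_src(alu_src), .mem_to_reg(mem_to_reg), .branch(branch),
        .alu_op(alu_ctrl)
    );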

    reg_file reg_file (
        .clk(clk), .reset(reset),
        .rs1(if_id_instr[19:15]), .rs2(if_id_instr[24:20]), .rd(mem_wb_rd),
        .reg_write(mem_wb_reg_write),
        .rd_data(write_back_data),
        .rs1_data(reg_data1), .rs2_data(reg_data2)
    );

    id_ex_reg id_ex (
        .clk(clk), .reset(reset),
        .pc_in(if_id_pc), .reg_data1_in(reg_data1), .reg_data2_in(reg_data2),
        .imm_in(imm_ext), .rs1_in(if_id_instr[19:15]), .rs2_in(if_id_instr[24:20]),
        .rd_in(if_id_instr[11:7]), .alu_ctrl_in(alu_ctrl), .alu_src_in(alu_src),
        .reg_write_in(reg_write), .mem_read_in(mem_read), .mem_write_in(mem_write),
        .mem_to_reg_in(mem_to_reg), .branch_in(branch),
        .pc_out(id_ex_pc), .reg_data1_out(id_ex_reg_data1),
        .reg_data2_out(id_ex_reg_data2),
        .imm_out(id_ex_imm), .rs1_out(id_ex_rs1), .rs2_out(id_ex_rs2),
        .rd_out(id_ex_rd),
        .alu_ctrl_out(id_ex_alu_ctrl), .alu_src_out(id_ex_alu_src),
        .reg_write_out(id_ex_reg_write), .mem_read_out(id_ex_mem_read),
        .mem_write_out(id_ex_mem_write), .mem_to_reg_out(id_ex_mem_to_reg),
        .branch_out(id_ex_branch)
    );

    // ================= Execute (EX) =================
    alu_risc alu_unit (
        // Second operand: immediate when alu_src is set, otherwise rs2 data
        .op1(id_ex_reg_data1), .op2(id_ex_alu_src ? id_ex_imm : id_ex_reg_data2),
        .alu_ctrl(id_ex_alu_ctrl),
        .result(alu_result)
    );

    ex_mem_reg ex_mem (
        .clk(clk), .reset(reset),
        .alu_result_in(alu_result), .reg_data2_in(id_ex_reg_data2),
        .rd_in(id_ex_rd), .reg_write_in(id_ex_reg_write),
        .mem_read_in(id_ex_mem_read), .mem_write_in(id_ex_mem_write),
        .mem_to_reg_in(id_ex_mem_to_reg),
        .alu_result_out(ex_mem_alu_result), .reg_data2_out(ex_mem_reg_data2),
        .rd_out(ex_mem_rd), .reg_write_out(ex_mem_reg_write),
        .mem_read_out(ex_mem_mem_read), .mem_write_out(ex_mem_mem_write),
        .mem_to_reg_out(ex_mem_mem_to_reg)
    );

    // ================= Memory (MEM) =================
    data_memory dmem (
        .clk(clk), .mem_read(ex_mem_mem_read), .mem_write(ex_mem_mem_write),
        .address(ex_mem_alu_result), .write_data(ex_mem_reg_data2),
        .read_data(mem_data)
    );

    mem_wb_reg mem_wb (
        .clk(clk), .reset(reset),
        .mem_data_in(mem_data), .alu_result_in(ex_mem_alu_result),
        .rd_in(ex_mem_rd), .reg_write_in(ex_mem_reg_write),
        .mem_to_reg_in(ex_mem_mem_to_reg),
        .mem_data_out(mem_wb_mem_data), .alu_result_out(mem_wb_alu_result),
        .rd_out(mem_wb_rd), .reg_write_out(mem_wb_reg_write),
        .mem_to_reg_out(mem_wb_mem_to_reg)
    );

    // ================= Write Back (WB) =================
    // Select loaded memory data when mem_to_reg is set, otherwise the ALU result
    assign write_back_data = mem_wb_mem_to_reg ? mem_wb_mem_data : mem_wb_alu_result;

endmodule

Testbench:

`timescale 1ns / 1ps

module riscv_pipeline_tb;

    reg clk, reset;

    // Instantiate the DUT (Device Under Test)
    riscv_pipeline uut (
        .clk(clk),
        .reset(reset)
    );

    // Generate Clock (10ns period = 100MHz)
    always #5 clk = ~clk;

    initial begin
        // Initialize Signals
        clk = 0;
        reset = 1;

        // Hold reset for 10 ns (one clock period), then release it
        #10 reset = 0;

        // ====== Instruction 1: ADD R1 = R2 + R3 (Arithmetic) ======
        // Instruction fields: funct7_rs2_rs1_funct3_rd_opcode
        force uut.pc = 32'h00000000;
        force uut.if_id_instr = 32'b0000000_00011_00010_000_00001_0110011; // ADD R1, R2, R3
        force uut.if_id_pc = 32'h00000000;

        // ID Stage Inputs
        force uut.reg_data1 = 32'h00000003; // R2 = 3
        force uut.reg_data2 = 32'h00000005; // R3 = 5
        force uut.alu_ctrl = 4'b0010;       // ALU ADD operation
        force uut.alu_src = 0;              // Use register data
        force uut.reg_write = 1;            // Enable write back

        #10;

        // ====== Instruction 2: XOR R4 = R5 ^ R6 (Logical) ======
        // Instruction fields: funct7_rs2_rs1_funct3_rd_opcode (funct3 = 100 for XOR)
        force uut.pc = 32'h00000004;
        force uut.if_id_instr = 32'b0000000_00110_00101_100_00100_0110011; // XOR R4, R5, R6
        force uut.if_id_pc = 32'h00000004;

        // ID Stage Inputs
        force uut.reg_data1 = 32'hA5A5A5A5; // R5
        force uut.reg_data2 = 32'h5A5A5A5A; // R6
        force uut.alu_ctrl = 4'b0100;       // XOR operation
        force uut.alu_src = 0;
        force uut.reg_write = 1;

        #10;

        // ====== Instruction 3: SLL R7 = R8 << R9 (Shift Left Logical) ======
        // Instruction fields: funct7_rs2_rs1_funct3_rd_opcode (funct3 = 001 for SLL)
        force uut.pc = 32'h00000008;
        force uut.if_id_instr = 32'b0000000_01001_01000_001_00111_0110011; // SLL R7, R8, R9
        force uut.if_id_pc = 32'h00000008;

        // ID Stage Inputs
        force uut.reg_data1 = 32'h00000004; // R8 = 4
        force uut.reg_data2 = 32'h00000002; // R9 = 2 (Shift amount)
        force uut.alu_ctrl = 4'b0001;       // Shift Left Logical
        force uut.alu_src = 0;
        force uut.reg_write = 1;

        // Let the last instruction drain through the remaining stages
        #40;

        // Stop Simulation
        $finish;
    end
endmodule
