EE-222: Microprocessor Systems
RISC-V CPU Datapath and Control Unit
Great Idea #1: Levels of Representation & Interpretation
• Higher-Level Language Program (e.g. C):
  temp = v[k];
  v[k] = v[k+1];
  v[k+1] = temp;
↓ Compiler
• Assembly Language Program (e.g. RISC-V):
  lw t0, 0(x2)
  lw t1, 4(x2)
  sw t1, 0(x2)
  sw t0, 4(x2)
↓ Assembler
• Machine Language Program (RISC-V):
  0000 1001 1100 0110 1010 1111 0101 1000
  1010 1111 0101 1000 0000 1001 1100 0110
  1100 0110 1010 1111 0101 1000 0000 1001
  0101 1000 0000 1001 1100 0110 1010 1111
↓ Machine Interpretation
• Hardware Architecture Description (e.g. block diagrams) ← we are here
↓ Architecture Implementation
• Logic Circuit Description (Circuit Schematic Diagrams)
§4.1 Introduction
The Processor
• We will examine two RISC-V implementations
– A simplified version
– A more realistic pipelined version
• Simple subset, shows most aspects
– Memory reference: ld, sd (RV64I; the RV32I equivalents are lw, sw)
– Arithmetic/logical: add, sub, and, or
– Control transfer: beq
Hardware Design Hierarchy
[Figure: design hierarchy from top to bottom: system; datapath and control (today's topic); code registers, state registers, combinational logic, multiplexer, comparator; register logic; switching networks. The lower levels come from your DLD course: make sure you know this material.]
Before we build our datapath:
A Review of Key Concepts
Review -- Combinational Logic
• Hardware is permanent: always compute everything you might want
• Use MUXes to pick from among the inputs
– S select bits choose one of 2^S inputs
• Ex: ALU
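The "compute everything, then pick" idea can be sketched in C. In this minimal sketch the hypothetical `alu` computes all four results on every call and a 4-to-1 `mux` (S = 2 select bits) passes one of them through; the function names and the `alu_sel` encoding are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplexer: an S-bit select value chooses one of n_inputs = 2^S inputs. */
static uint32_t mux(const uint32_t *inputs, unsigned n_inputs, unsigned sel) {
    assert(sel < n_inputs); /* the select value must name an existing input */
    return inputs[sel];
}

/* "Do everything, then pick": all four results are computed every time,
 * and a 4-to-1 mux (S = 2 select bits) passes one of them through.
 * The alu_sel encoding is an assumption made for this sketch. */
static uint32_t alu(uint32_t a, uint32_t b, unsigned alu_sel) {
    const uint32_t results[4] = {
        a + b, /* alu_sel = 0: add */
        a - b, /* alu_sel = 1: sub */
        a & b, /* alu_sel = 2: and */
        a | b  /* alu_sel = 3: or  */
    };
    return mux(results, 4u, alu_sel);
}
```

For example, `alu(7, 5, 1)` computes all four values but returns only the difference, 2. Real hardware does the same thing in parallel rather than sequentially.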
Sequential Elements
• Register: stores data in a circuit
– Uses a clock signal to determine when to update
the stored value
– Edge-triggered: update when Clk changes from 0
to 1
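[Figure: D flip-flop symbol (input D, clock Clk, output Q) and a timing diagram in which Q takes the value of D at each rising edge of Clk.]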
Sequential Elements
• Register with write control
– Only updates on clock edge when write control
input is 1
– Used when stored value is required later
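[Figure: register with D, Write, and Clk inputs and output Q; the timing diagram shows Q taking the value of D only at rising Clk edges while Write is 1.]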
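A register with write control can be modelled in C as a stored value that changes only when a rising clock edge coincides with Write = 1. This is a minimal sketch: `Register` and `reg_eval` are hypothetical names, and sampling the clock level once per call is a simplification, not how a hardware simulator works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register with write control: the stored value Q plus the clock level
 * seen at the previous evaluation (needed to detect a rising edge). */
typedef struct {
    uint32_t q;
    bool prev_clk;
} Register;

/* Evaluate the register at the current clock level. Q takes the value of D
 * only on a rising edge (Clk goes 0 -> 1) while Write is 1; otherwise it holds. */
static void reg_eval(Register *r, bool clk, bool write, uint32_t d) {
    bool rising_edge = clk && !r->prev_clk;
    if (rising_edge && write) {
        r->q = d;
    }
    r->prev_clk = clk;
}
```

Changing D while the clock stays high, or pulsing the clock while Write is 0, leaves Q unchanged.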
Propagation Delay in Gates
Gate Delay (Propagation Delay)
• Time that it takes for combinational gate output to
change after inputs change
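[Figure: two-input gate with inputs A, B and output Y; the timing diagram shows Y changing t_gate after A or B changes.]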
Path Delay
• Delay through a series of combinational gates:
– Specifically, the time it takes for the output of the series of
gates to change after the inputs to the path change.
• Example:
– Propagation delay for each gate is the same (1 ns in this
example)
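[Figure: gate network with inputs A, B, C, D applied at t = 0 and outputs X and Y; each signal is annotated with the time (in ns) at which it settles, and the slowest path settles at X after 5 ns.]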
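Because each gate output settles t_gate after its last input settles, the settling time of every signal is the delay of the longest path reaching it. A minimal C sketch over a hypothetical netlist (gates listed in topological order, one equal delay per gate) computes these arrival times; `Gate` and `arrival_times` are illustrative names, and the sketch reports worst-case settling only, ignoring glitches.

```c
#include <assert.h>

/* One gate in a hypothetical netlist. Signals 0..n_inputs-1 are the primary
 * inputs; gate g drives signal n_inputs + g. in1 is -1 for a one-input gate. */
typedef struct {
    int in0;
    int in1;
} Gate;

/* Fill t[] with the time each signal settles, assuming all primary inputs change
 * at time 0, every gate has the same delay t_gate, and gates are listed in
 * topological order (every gate appears after the gates that drive it). */
static void arrival_times(const Gate *gates, int n_gates, int n_inputs,
                          int t_gate, int *t) {
    for (int i = 0; i < n_inputs; i++) {
        t[i] = 0;
    }
    for (int g = 0; g < n_gates; g++) {
        assert(gates[g].in0 >= 0 && gates[g].in0 < n_inputs + g); /* topological order */
        int latest = t[gates[g].in0];
        if (gates[g].in1 >= 0 && t[gates[g].in1] > latest) {
            latest = t[gates[g].in1];
        }
        /* the output settles t_gate after its last input settles */
        t[n_inputs + g] = latest + t_gate;
    }
}
```

For a chain where A and B feed gate 0, gate 0 and C feed gate 1, gate 1 feeds inverter 2, and inverter 2 and D feed gate 3, 1 ns gates give the final output a settling time of 4 ns even though D arrives at time 0.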
Comments on the Circuit
• In the circuit above, if we apply all inputs at time 0, then
after 5 ns, all the outputs have settled to their final
values.
• Do you see any problem with this?
– Some outputs might settle earlier
– Some outputs may switch back and forth a few times
before settling to a final value
– What about input synchronization for the downstream logic?
How to Synchronize Inputs?
Add flip-flops!
[Figure: the same gate network with a D flip-flop on each input (A, B, C, D) and on each output (X, Y); the output flip-flops feed other circuitry.]
• Rising clock edge at time 0:
– Assuming no delay in the flip-flops, the outputs of the
source (left four) flip-flops change at time 0.
• One clock cycle later, the clock rises again, and the
destination flip-flops capture X and Y.
General Structure of a Digital System
• Digital systems are made up of many stages of flip-flops
and combinational logic.
DFF → Combinational Logic → DFF → Combinational Logic → DFF → …
You’ve Seen this Before
• Finite State Machines, shift registers, counters, etc…
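[Figure: an FSM (Inputs → Next State Logic → DFF → Output Logic → Outputs) and a COUNTER built from a register (output COUNT, clock CLK) whose value is fed back through a +1 adder (internal wire COUNT_INTERNAL).]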
Review: A General Sequential Circuit
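[Figure: input w feeds a combinational circuit that produces the next-state signals Y1, Y2; flip-flops clocked by Clock hold the present state y1, y2, which feeds back into the first circuit and into a second combinational circuit that produces output z.]
Y1, Y2 represent the NEXT state; y1, y2 represent the PRESENT state.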
Review: Sequential Circuit
• In a sequential circuit, the values of the outputs
depend on the past behavior of the circuit, as well
as the present values of its inputs.
– Moore: the outputs depend only on the present
state.
– Mealy: the outputs depend on both the present
state and the present values of the inputs.
[Figure: input W → combinational circuit → flip-flops (Clock) → state Q → combinational circuit → output Z.]
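The Moore/Mealy distinction shows up in when each machine reacts. This minimal C sketch uses a hypothetical example (z = 1 after two consecutive 1s on w); the state names and functions are illustrative. In each cycle both machines compute their output from the present state, then replace it with the next state at the clock edge.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical example: z = 1 when w has been 1 on two consecutive cycles. */

/* Moore machine: three states; z depends only on the present state. */
typedef enum { SEEN_NONE, SEEN_ONE, SEEN_TWO } MooreState;

static MooreState moore_next(MooreState y, bool w) {
    if (!w) {
        return SEEN_NONE;
    }
    return (y == SEEN_NONE) ? SEEN_ONE : SEEN_TWO;
}

static bool moore_z(MooreState y) { return y == SEEN_TWO; }

/* Mealy machine: one state bit (was the previous input 1?);
 * z depends on the present state AND the present input. */
static bool mealy_z(bool prev_w, bool w) { return prev_w && w; }

/* Run both machines for n cycles and record the output seen during each cycle. */
static void simulate(const bool *w, int n, bool *z_moore, bool *z_mealy) {
    assert(n >= 0);
    MooreState y_moore = SEEN_NONE;
    bool y_mealy = false;
    for (int i = 0; i < n; i++) {
        z_moore[i] = moore_z(y_moore);          /* output logic */
        z_mealy[i] = mealy_z(y_mealy, w[i]);
        y_moore = moore_next(y_moore, w[i]);    /* clock edge: next state becomes present */
        y_mealy = w[i];
    }
}
```

With w = 1, 1, 1, 0, 1 the Mealy output is 0, 1, 1, 0, 0 while the Moore output is 0, 0, 1, 1, 0. The Moore machine responds one cycle later because its output waits for the state to change.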
Review -- Synchronous Digital Systems (SDS) and Sequential Logic
Clocking Methodology
• Combinational logic transforms data during
clock cycles
– Between clock edges
– Input from state elements, output to a state
element
– Longest delay determines clock period
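"Longest delay determines clock period" can be made concrete. This sketch assumes every path starts at a flip-flop with a clock-to-Q delay and ends at one with a setup time, two parameters not named on the slide, so the minimum period is t_clk-to-q + (longest combinational delay) + t_setup. The function name and the numbers in the example are hypothetical.

```c
#include <assert.h>

/* Minimum clock period (ps) for a design whose register-to-register paths have
 * the given combinational delays. Assumes each path starts at a flip-flop
 * (clock-to-Q delay) and ends at a flip-flop (setup time). */
static int min_clock_period_ps(const int *comb_delay_ps, int n_paths,
                               int t_clk_to_q_ps, int t_setup_ps) {
    assert(n_paths > 0);
    int longest = comb_delay_ps[0];
    for (int i = 1; i < n_paths; i++) {
        if (comb_delay_ps[i] > longest) {
            longest = comb_delay_ps[i];
        }
    }
    return t_clk_to_q_ps + longest + t_setup_ps;
}
```

With paths of 500, 800, and 350 ps, a 30 ps clock-to-Q delay, and a 20 ps setup time, the period must be at least 30 + 800 + 20 = 850 ps (about 1.18 GHz). The faster paths simply finish early and wait.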
Agenda
• Datapath Overview
• Assembling the Datapath Part 1
• Processor Design Process
• Assembling the Datapath Part 2
The Processor
• Processor (CPU): Instruction Set Architecture
(ISA) implemented directly in hardware
– Datapath: part of the processor that contains the
hardware necessary to perform operations
required by the processor (“the brawn”)
– Control: part of the processor (also in hardware)
which tells the datapath what needs to be done
(“the brain”)
§4.3 Building a Datapath
Building a Datapath
• Datapath
– Elements that process data and addresses
in the CPU
• Registers, ALUs, muxes, memories, …
• We will build a RISC-V datapath incrementally
– Refining the overview design
Executing an Instruction
Very generally, what steps do you take (order
matters!) to figure out the effect/result of the
next RISC-V instruction?
– Get the instruction: add s0,t0,t1
– What instruction is it? add
– Gather data: read R[t0], R[t1]
– Perform operation: calc R[t0]+R[t1]
– Store result: save into s0
Instruction Fetch
[Figure: the PC register addresses instruction memory; an adder increments the PC by 4 to point to the next 32-bit instruction.]
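Basic Phases of Instruction Execution
[Figure: single-cycle datapath with PC, +4 adder, IMEM, register file Reg[] (rs1, rs2, rd), immediate (imm), ALU, DMEM, and a write-back mux, divided into five phases within one clock cycle.]
1. Instruction Fetch
2. Decode/Register Read
3. Execute
4. Memory
5. Register Write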
State Required by RV32I ISA
Each instruction reads and updates this state during execution:
• Registers (x0..x31)
− Register file (or regfile) Reg holds 32 registers x 32 bits/register: Reg[0].. Reg[31]
− First register read specified by rs1 field in instruction
− Second register read specified by rs2 field in instruction
− Write register (destination) specified by rd field in instruction
− x0 is always 0 (writes to Reg[0] are ignored)
• Program Counter (PC)
− Holds address of current instruction
• Memory (MEM)
− Holds both instructions & data, in one 32-bit byte-addressed memory space
− We’ll use separate memories for instructions (IMEM) and data (DMEM)
▪ Later we’ll replace these with instruction and data caches
− Instructions are read (fetched) from instruction memory (assume IMEM read-only)
− Load/store instructions access data memory
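The RV32I state listed above can be sketched as a C struct, with the "x0 is always 0" rule enforced in the register-file write port. This is a minimal sketch: `CpuState`, `reg_read`, `reg_write`, and the tiny memory sizes are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Memory sizes are hypothetical, kept tiny for illustration. */
#define IMEM_BYTES 1024u
#define DMEM_BYTES 1024u

/* RV32I architectural state with separate instruction and data memories. */
typedef struct {
    uint32_t reg[32];          /* Reg[0]..Reg[31], 32 bits each */
    uint32_t pc;               /* address of the current instruction */
    uint8_t imem[IMEM_BYTES];  /* byte-addressed instruction memory */
    uint8_t dmem[DMEM_BYTES];  /* byte-addressed data memory */
} CpuState;

/* Read port: rs1 or rs2 selects one of the 32 registers. */
static uint32_t reg_read(const CpuState *s, unsigned r) {
    assert(r < 32u);
    return s->reg[r];
}

/* Write port: writes to x0 are ignored, so Reg[0] always reads as 0. */
static void reg_write(CpuState *s, unsigned rd, uint32_t value) {
    assert(rd < 32u);
    if (rd != 0u) {
        s->reg[rd] = value;
    }
}
```

Dropping writes to x0 at the write port, rather than clearing Reg[0] after every instruction, mirrors the hardware choice of simply not wiring a write enable to register 0.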
R-Format Instructions
• Read two register operands
• Perform arithmetic/logical operation
• Write register result
Implementing the add instruction
add rd, rs1, rs2
• Instruction makes two changes to machine’s state:
− Reg[rd] = Reg[rs1] + Reg[rs2]
− PC = PC + 4
Datapath Walkthroughs (1/3)
• add x3,x1,x2 # x3 = x1+x2
1) IF: fetch this instruction, increment PC
2) ID: decode as add
then read R[1] and R[2]
3) EX: add the two values retrieved in ID
4) MEM: idle (not using memory)
5) WB: write result of EX into R[3]
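The walkthrough for add can be sketched as one C function with a comment per phase. This is a minimal sketch under stated assumptions: `step_add` and `bits` are hypothetical names, instruction memory is modelled as an array of 32-bit words indexed by PC/4, and only the add encoding is accepted. The field positions (rd = inst[11:7], rs1 = inst[19:15], rs2 = inst[24:20]) are the RISC-V R-format ones.

```c
#include <assert.h>
#include <stdint.h>

/* Extract inst[hi:lo] as an unsigned field (fields here are at most 7 bits wide). */
static uint32_t bits(uint32_t inst, unsigned hi, unsigned lo) {
    assert(hi >= lo && hi - lo < 31u);
    return (inst >> lo) & ((1u << (hi - lo + 1u)) - 1u);
}

/* Execute one add instruction in the five phases of the walkthrough. */
static void step_add(uint32_t reg[32], uint32_t *pc, const uint32_t *imem) {
    /* 1) IF: fetch the instruction and compute PC + 4 */
    uint32_t inst = imem[*pc / 4u];
    uint32_t next_pc = *pc + 4u;

    /* 2) ID: check that it is add (opcode 0x33, funct3 0, funct7 0), then read registers */
    assert(bits(inst, 6, 0) == 0x33u && bits(inst, 14, 12) == 0u && bits(inst, 31, 25) == 0u);
    uint32_t rd = bits(inst, 11, 7);
    uint32_t rs1 = bits(inst, 19, 15);
    uint32_t rs2 = bits(inst, 24, 20);
    uint32_t a = reg[rs1];
    uint32_t b = reg[rs2];

    /* 3) EX: add the two values (wraps modulo 2^32, like a 32-bit adder) */
    uint32_t result = a + b;

    /* 4) MEM: idle, add does not access data memory */

    /* 5) WB: write the result into rd (writes to x0 are ignored), then update the PC */
    if (rd != 0u) {
        reg[rd] = result;
    }
    *pc = next_pc;
}
```

The word `0x002081B3` encodes add x3,x1,x2. With R[1] = 10 and R[2] = 32, one call leaves R[3] = 42 and PC = 4.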
Example: add Instruction
add x3,x1,x2
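[Figure: add x3,x1,x2 on the datapath: the PC addresses instruction memory; the register file reads R[1] and R[2]; the ALU computes R[1] + R[2], which is written back to register 3; +4 updates the PC. Data memory and the immediate are not used.]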
Datapath for add
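[Figure: PC → IMEM → inst[31:0]; inst[19:15] drives AddrA, inst[24:20] drives AddrB, and inst[11:7] drives AddrD of the register file Reg[]; DataA = Reg[rs1] and DataB = Reg[rs2] feed the adder, whose output (alu) returns to DataD; pc+4 becomes the next PC; Control Logic drives RegWriteEnable (RegWEn).]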
Timing Diagram for add
[Figure: the datapath for add (as on the previous slide) above a timing diagram with one clock cycle per instruction.]
Signal        Cycle 1           Cycle 2
PC            1000              1004
PC+4          1004              1008
inst[31:0]    add x1,x2,x3      add x6,x7,x9
Reg[rs1]      Reg[2]            Reg[7]
Reg[rs2]      Reg[3]            Reg[9]
alu           Reg[2]+Reg[3]     Reg[7]+Reg[9]
Reg[1]        ???               Reg[2]+Reg[3]
Reg[1] receives Reg[2]+Reg[3] only at the rising clock edge that ends cycle 1.
Implementing the sub instruction
sub rd, rs1, rs2
• Almost the same as add, except the ALU now subtracts
the operands instead of adding them
• inst[30] selects between add and subtract
Datapath for add/sub
[Figure: the add datapath with the adder replaced by an ALU. Control Logic drives RegWEn (1 = write, 0 = no write) and ALUSel (Add = 0 / Sub = 1).]
Implementing other R-Format instructions
• All implemented by decoding the funct3 and funct7 fields and
selecting the appropriate ALU function
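Decoding funct3 and funct7 into an ALU function can be sketched as a C switch. This minimal sketch assumes a valid R-format encoding, where funct7 is either 0x00 or 0x20, so inst[30] (bit 5 of funct7) is the only funct7 bit examined: it selects sub over add and sra over srl. `r_type_alu` is a hypothetical name. The funct3 values are those of the RV32I R-format instructions.

```c
#include <assert.h>
#include <stdint.h>

/* ALU for the RV32I R-format instructions, selected by funct3 and inst[30]. */
static uint32_t r_type_alu(uint32_t inst, uint32_t a, uint32_t b) {
    assert((inst & 0x7Fu) == 0x33u);        /* opcode OP (R-format) */
    uint32_t funct3 = (inst >> 12) & 0x7u;  /* inst[14:12] */
    uint32_t alt = (inst >> 30) & 0x1u;     /* inst[30]: 1 selects sub / sra */
    uint32_t shamt = b & 0x1Fu;             /* shifts use the low 5 bits of rs2 */
    switch (funct3) {
    case 0x0: return alt ? a - b : a + b;   /* sub / add */
    case 0x1: return a << shamt;            /* sll */
    case 0x2: /* slt: flipping the sign bits turns a signed compare into an unsigned one */
        return ((a ^ 0x80000000u) < (b ^ 0x80000000u)) ? 1u : 0u;
    case 0x3: return (a < b) ? 1u : 0u;     /* sltu */
    case 0x4: return a ^ b;                 /* xor */
    case 0x5:                               /* sra / srl */
        if (alt && (a & 0x80000000u)) {
            return (a >> shamt) | ~(0xFFFFFFFFu >> shamt); /* shift in copies of the sign bit */
        }
        return a >> shamt;
    case 0x6: return a | b;                 /* or */
    default:  return a & b;                 /* and (funct3 = 0x7) */
    }
}
```

For example, `0x002081B3` (add x3,x1,x2) and `0x402081B3` (sub x3,x1,x2) differ only in inst[30], so the same operands 7 and 5 give 12 and 2 respectively.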
Implementing the addi instruction
• RISC-V Assembly Instruction:
addi x15,x1,-50
imm[11:0]       rs1      funct3   rd       opcode
111111001110    00001    000      01111    0010011
imm = -50       rs1 = 1  ADD      rd = 15  OP-Imm
Adding addi to datapath
[Figure: the add/sub datapath extended with an immediate generator (Imm. Gen) that turns inst[31:20] into imm[31:0], and a mux on the ALU's B input (0 = Reg[rs2], 1 = imm[31:0]). Control settings for addi: ImmSel=I, RegWEn=1, BSel=1, ALUSel=Add.]
I-Format immediates
[Figure: Imm. Gen (ImmSel=I) takes inst[31:20] and produces imm[31:0]: imm[31:11] are copies of inst[31] (sign extension) and imm[10:0] = inst[30:20].]
• High 12 bits of instruction (inst[31:20]) copied to low 12 bits
of immediate (imm[11:0])
• Immediate is sign-extended by copying value of inst[31] to
fill the upper 20 bits of the immediate value (imm[31:12])
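Imm. Gen for ImmSel=I and the BSel mux in front of the ALU can be sketched in C. This is a minimal sketch: `imm_gen_i` and `b_mux` are hypothetical names, and the immediate is returned as its 32-bit pattern (as on the imm[31:0] wire) rather than as a signed C integer.

```c
#include <assert.h>
#include <stdint.h>

/* Imm. Gen for ImmSel=I: imm[11:0] = inst[31:20]; imm[31:12] = copies of inst[31]. */
static uint32_t imm_gen_i(uint32_t inst) {
    uint32_t imm = inst >> 20;        /* inst[31:20] -> imm[11:0] */
    if (inst & 0x80000000u) {         /* inst[31] is the sign bit */
        imm |= 0xFFFFF000u;           /* fill imm[31:12] with ones */
    }
    return imm;
}

/* ALU B-input mux: BSel = 0 passes Reg[rs2], BSel = 1 passes the immediate. */
static uint32_t b_mux(uint32_t reg_rs2, uint32_t imm, int bsel) {
    assert(bsel == 0 || bsel == 1);
    return bsel ? imm : reg_rs2;
}
```

For addi x15,x1,-50 (encoded as `0xFCE08793`), `imm_gen_i` yields `0xFFFFFFCE`, the two's-complement pattern of -50. With Reg[x1] = 100 and BSel = 1, the ALU's 32-bit add produces 50.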
Adding addi to datapath
[Figure: the addi datapath from the previous slide, with control settings ImmSel=I, RegWEn=1, BSel=1, and ALUSel chosen per instruction (Add for addi; other settings for andi, sltiu, slti, xori, ori).]
• Also works for all other I-format arithmetic instructions
(slti, sltiu, andi, ori, xori, slli, srli, srai) just by changing ALUSel
Why Five Stages?
• Could we have a different number of stages?
– Yes, and other architectures do
• So why does RISC-V have five if instructions
tend to idle for at least one stage?
– The five stages are the union of all the operations
needed by all the instructions
– There is one instruction that uses all five stages:
load (lw/lb)
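Load is the one instruction that does useful work in every phase. The minimal C sketch below traces lw rd, imm(rs1) through ID, EX, MEM, and WB, with one comment per phase. It rests on several assumptions: IF has already produced `inst`, the PC + 4 update is omitted, and data memory is a hypothetical 64-byte array accessed with aligned, in-range addresses. `exec_lw` and `imm_i` are illustrative names. The byte order is RISC-V's standard little-endian layout.

```c
#include <assert.h>
#include <stdint.h>

#define DMEM_BYTES 64u /* hypothetical tiny data memory */

/* Sign-extend the I-format immediate in inst[31:20]. */
static uint32_t imm_i(uint32_t inst) {
    uint32_t imm = inst >> 20;
    return (inst & 0x80000000u) ? (imm | 0xFFFFF000u) : imm;
}

/* lw rd, imm(rs1) through ID, EX, MEM, and WB (IF assumed already done). */
static void exec_lw(uint32_t inst, uint32_t reg[32], const uint8_t *dmem) {
    /* ID: confirm lw (opcode LOAD, funct3 = 2), decode fields, read Reg[rs1], build imm */
    assert((inst & 0x7Fu) == 0x03u && ((inst >> 12) & 0x7u) == 0x2u);
    uint32_t rd = (inst >> 7) & 0x1Fu;
    uint32_t rs1 = (inst >> 15) & 0x1Fu;
    uint32_t base = reg[rs1];
    uint32_t imm = imm_i(inst);

    /* EX: the ALU adds base + imm to form the effective address */
    uint32_t addr = base + imm;
    assert(addr % 4u == 0u && addr <= DMEM_BYTES - 4u); /* sketch: aligned, in range */

    /* MEM: read four bytes, least significant byte at the lowest address */
    uint32_t word = (uint32_t)dmem[addr]
                  | ((uint32_t)dmem[addr + 1u] << 8)
                  | ((uint32_t)dmem[addr + 2u] << 16)
                  | ((uint32_t)dmem[addr + 3u] << 24);

    /* WB: write the loaded word into rd (x0 stays 0) */
    if (rd != 0u) {
        reg[rd] = word;
    }
}
```

For lw x5,8(x1) (encoded as `0x0080A283`) with R[1] = 4, the ALU computes address 12. If bytes 12 to 15 hold 0x78, 0x56, 0x34, 0x12, then R[5] becomes 0x12345678. By contrast, add leaves MEM idle.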
Administrivia
• Semester Project due on Monday, 6 May 2019
− Project display schedule will be out in the coming week
• Assignment 3: the last one ☺, and hopefully fun to do
− Watch the keynote by John Hennessy and David Patterson and
write down the key takeaways of the videos on one page (single-sided,
typed rather than handwritten, font size between 10 and 12 pt)
− Keynote link:
https://www.acm.org/hennessy-patterson-turing-lecture
− Due on Monday, 6 May 2019
▪ You'll be interviewed on it along with your project demo
− Submit the soft copy on LMS.
− Also submit the hard copy that day.