ECE/CS 250
Computer Architecture
Summer 2023
Processor Design: Datapath and Control
Tyler Bletsch
Duke University
Slides are derived from work by
Daniel J. Sorin (Duke), Amir Roth (Penn)
Where We Are in This Course Right Now
• So far:
• We know what a computer architecture is
• We know what kinds of instructions it might execute
• We know how to perform arithmetic and logic in an ALU
• Now:
• We learn how to design a processor in which the ALU is just one
component
• Processor must be able to fetch instructions, decode them, and execute
them
• There are many ways to do this, even for a given ISA
• Next:
• We learn how to design memory systems
2
This Unit: Processor Design
• Datapath components and timing
• Registers and register files
• Memories (RAMs)
• Mapping an ISA to a datapath
• Control
• Exceptions
[Figure: system layer stack: Application, OS, Compiler, Firmware, CPU, I/O, Memory, Digital Circuits, Gates & Transistors]
3
Readings
• Patterson and Hennessy
• Chapter 4: Sections 4.1-4.4
• Read this chapter carefully
• It has many more examples than I can cover in class
4
So You Have an ALU…
• Important reminder: a processor is just a big finite state
machine (FSM) that interprets some ISA
• Start with one instruction
add $3,$2,$4
• ALU performs just a small part of execution of instruction
• You have to read and write registers
• You have to fetch the instruction to begin with
• What about loads and stores?
• Need some sort of memory interface
• What about branches?
• Need some hardware for that, too
5
Datapath and Control
[Figure: processor block diagram. Fetch: PC and instruction memory. Datapath: register file and data memory. Control sits beneath both.]
• Datapath: registers, memories, ALUs (computation)
• Control: which registers read/write, which ALU operation
• Fetch: get instruction, translate into control
• Processor Cycle: Fetch → Decode → Execute
6
Building a Processor for an ISA
• Fetch is pretty straightforward
• Just need a register (called the Program Counter or PC) to hold the
next address to fetch from instruction memory
• Provide address to instruction memory → instruction memory provides
instruction at that address
• Let’s start with the datapath
1. Look at ISA
2. Make sure datapath can implement every instruction
7
Datapath for MIPS ISA
• Consider only the following instructions
add $1,$2,$3
addi $1,$2,<value>
lw $1,4($3)
sw $1,4($3)
beq $1,$2,PC_relative_target
j Absolute_target
• Why only these?
• Most other instructions are similar from datapath viewpoint
• I leave the ones that aren’t for you to figure out
8
Review: A Register
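[Figure: N-bit register built from DFFs (inputs D0..DN-1, outputs Q0..QN-1) sharing CLK and WE, next to its block symbol with inputs D and E and output Q; the symbol shown is a 32-bit register]
Note: Above is the “classic” register we learned before; we’re just
introducing a new symbol for the same thing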
• Register: DFF array with shared clock, write-enable (WE)
• Notice: both a clock and a WE (DFFWE = clock & registerWE)
• Convention I: clock represented by wedge
• Convention II: if no WE, DFF is written on every clock
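A minimal behavioral sketch of the register described above, in Python. The class name `Register` and the method `clock_edge` are hypothetical names chosen for illustration; the model only captures the rule that D is stored on a clock edge when WE is asserted.

```python
class Register:
    """N-bit register: captures d on a clock edge only when write-enable is set."""

    def __init__(self, width=32):
        self.mask = (1 << width) - 1  # keep values within N bits
        self.q = 0                    # current output Q

    def clock_edge(self, d, we=True):
        # Convention II: with no WE signal (we defaults to True),
        # the register is written on every clock.
        if we:
            self.q = d & self.mask
        return self.q
```

With `we=False` the stored value is unchanged, which is what lets control decide which instructions update a register.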
9
Uses of Registers
[Figure: processor block diagram (fetch: PC and instruction memory; datapath: register file and data memory; control)]
• A single register is good for some things
• PC: program counter
• Other things which aren’t the ISA registers (more later in semester)
10
What About the ISA Registers?
[Figure: register file block]
Write port: RDVAL (32 bits), RD (5 bits), WE (write enable)
Read port 1: RS1 (5 bits) in, RS1VAL (32 bits) out
Read port 2: RS2 (5 bits) in, RS2VAL (32 bits) out
RD = dest reg, RS = source reg
• Register file: the ISA (“architectural”, “visible”) registers
• Two read “ports” + one write “port”
• Maximum number of reads/writes in single instruction (R-type)
• Port: wires for accessing an array of data
• Data bus: width of data element (MIPS: 32 bits)
• Address bus: width of log2 number of elements (MIPS: 5 bits)
• Write enable: if it’s a write port
• M ports = M parallel and independent accesses
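The port structure above can be sketched behaviorally in Python. This is an illustrative model, not the circuit: `RegisterFile`, `read`, and `write` are hypothetical names, and the hard-wired zero register reflects MIPS `$0`.

```python
class RegisterFile:
    """32 x 32-bit register file with two read ports and one write port."""

    def __init__(self, num_regs=32, width=32):
        self.regs = [0] * num_regs      # 32 registers -> 5-bit addresses
        self.mask = (1 << width) - 1    # 32-bit data bus

    def read(self, rs1, rs2):
        # Two independent read ports: both values are available together.
        return self.regs[rs1], self.regs[rs2]

    def write(self, rd, rdval, we):
        # One write port, gated by write enable. MIPS $0 always reads as 0.
        if we and rd != 0:
            self.regs[rd] = rdval & self.mask
```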
11
Register File With Tri-State Read Ports
[Figure: register file circuit with tri-state read ports; signals RDVAL, RS1VAL, RS2VAL, WE, RD, RS1, RS2]
12
Another Useful Component: Memory
[Figure: memory block with inputs DATAIN, ADDRESS, WE and output DATAOUT]
• Memory: where instructions and data reside
• One read/write “port”: one access per cycle, either read or write
• One address bus
• One input data bus for writes, one output data bus for reads
• Actually, a more traditional definition of memory is
• One input/output data bus
• No clock → asynchronous “strobe” instead
13
Dramatis Personae
• Register (e.g., the PC)
• Memory (Mem)
• Register file: inputs rs1, rs2, rd, RDVAL, WE; outputs s1VAL, s2VAL
• Arithmetic Logic Unit (ALU): inputs In1, In2, ALUop (which math to do); output Result
• Shift left by two bits (<< 2)
• Adder that always adds 4 (+ 4)
• Adder (+): inputs In1, In2; output Result
• Mux: a “Which?” select input chooses the output
• Plain ol’ AND gate
• Sign extender (SX): converts to longer bit widths; preserves sign
(3) 0011 => 00000011 (still 3)
(-7) 1001 => 11111001 (still -7)
• Zero extender (0X): converts to longer bit widths for unsigned numbers
(3) 0011 => 00000011 (still 3)
(9) 1001 => 00001001 (still 9)
14
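The sign and zero extenders listed among the components above can be sketched in Python as follows. The function names are hypothetical; the examples reuse the 4-bit to 8-bit cases from the slide.

```python
def zero_extend(value, from_bits):
    """Pad with 0s on the left: the unsigned value is unchanged."""
    return value & ((1 << from_bits) - 1)

def sign_extend(value, from_bits, to_bits):
    """Copy the sign bit into every new upper bit position."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):            # sign bit set -> negative
        upper = ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
        value |= upper                            # fill upper bits with 1s
    return value

print(format(sign_extend(0b1001, 4, 8), "08b"))   # 11111001 (-7)
print(format(zero_extend(0b1001, 4), "08b"))      # 00001001 (9)
```

The same 4-bit pattern 1001 means -7 when sign-extended and 9 when zero-extended, which is why the datapath needs both units.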
Let’s Build A MIPS-like Datapath
15
Start With Fetch
[Figure: PC feeds instruction memory (Insn Mem); a +4 adder computes the next PC]
• PC and instruction memory
• A +4 incrementer computes default next instruction PC
• Why +4 (and not +1)? What will it be for 16-bit Duke 250/16?
16
First Instruction: add $rd, $rs, $rt
[Figure: fetch stage plus register file and ALU; instruction fields rs, rt, rd drive register file ports s1, s2, d; the ALU computes rs + rt]
R-type: Op(6) rs(5) rt(5) rd(5) Sh(5) Func(6)
• Add register file and ALU
17
Second Instruction: addi $rt, $rs, imm
[Figure: add datapath plus a sign extension (SX) unit that widens the 16-bit immediate to 32 bits, feeding a mux on the second ALU input]
I-type: Op(6) rs(5) rt(5) Immed(16)
• Destination register can now be either rd or rt
• Add sign extension unit and mux into second ALU input
18
Third Instruction: lw $rt, imm($rs)
[Figure: previous datapath plus data memory (Data Mem) with address input a and data port d]
I-type: Op(6) rs(5) rt(5) Immed(16)
• Add data memory, address is ALU output (rs+imm)
• Add register write data mux to select memory output or ALU output
19
Fourth Instruction: sw $rt, imm($rs)
[Figure: previous datapath with a new path (marked “?”) from the register file to the data memory data input d]
I-type: Op(6) rs(5) rt(5) Immed(16)
• Add path from second input register to data memory data input
• Disable RegFile’s WE signal
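The four instructions covered so far can be sketched as one behavioral Python function, with each mux from the slides appearing as a conditional. This is a sketch under stated assumptions: the name `execute`, the mnemonic-string opcode, and the dict used for data memory are all hypothetical simplifications.

```python
def sign_extend16(imm):
    """Interpret a 16-bit immediate as a signed value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def execute(op, rs, rt, rd, imm, regs, dmem):
    """Run one of add/addi/lw/sw. regs: list of 32 ints; dmem: dict addr -> word."""
    a = regs[rs]                                   # register file read port 1
    b = regs[rt]                                   # register file read port 2
    # ALU input B mux: register value for add, sign-extended immediate otherwise
    alu_in_b = b if op == "add" else sign_extend16(imm)
    alu_out = (a + alu_in_b) & 0xFFFFFFFF
    if op == "sw":
        dmem[alu_out] = b                          # store; RegFile WE is disabled
        return
    # Register write data mux: memory output for lw, ALU output otherwise
    write_data = dmem.get(alu_out, 0) if op == "lw" else alu_out
    dest = rd if op == "add" else rt               # destination register mux
    if dest != 0:                                  # MIPS $0 stays zero
        regs[dest] = write_data
```

For example, `addi` then `sw` then `lw` to the same address moves a value from one register to another through memory.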
20
Fifth Instruction: beq $1,$2,target
[Figure: previous datapath plus a << 2 shifter and an adder for the branch target; the ALU provides a zero output z]
I-type: Op(6) rs(5) rt(5) Immed(16)
• Add left shift unit (why?) and adder to compute PC-relative branch target
• Add mux to do what?
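A hedged Python sketch of the branch hardware, following the standard MIPS rule that the target is PC+4 plus the sign-extended immediate shifted left by two (the immediate counts 4-byte instructions, not bytes). The function names and the `br`/`zero` parameters are hypothetical stand-ins for the BR control signal and the ALU zero output.

```python
def branch_target(pc, imm16):
    """PC-relative target: PC + 4 + (sign_extend(imm16) << 2)."""
    offset = imm16 & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000                 # sign extension
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF

def next_pc(pc, imm16, br, zero):
    """PC input mux: take the branch only if this is a branch AND rs == rt."""
    taken = br and zero                   # the AND gate
    return branch_target(pc, imm16) if taken else (pc + 4) & 0xFFFFFFFF
```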
21
Sixth Instruction: j
[Figure: previous datapath plus a zero extender (0X) widening the 26-bit immediate to 32 bits, a << 2 shifter, and an additional PC input mux]
J-type: Op(6) Immed(26)
• Add shifter to compute left shift of 26-bit immediate
• Add additional PC input mux for jump target
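The jump target computation can be sketched in Python two ways. The first follows the simplified datapath on this slide (zero-extend, then shift). The second follows the full MIPS definition, in which the upper four bits come from PC+4; that detail is not shown on the slide and is included here only for comparison. Both function names are hypothetical.

```python
def jump_target_slide(imm26):
    """Slide datapath: zero-extend the 26-bit immediate, then shift left by 2."""
    return (imm26 & 0x03FFFFFF) << 2

def jump_target_mips(pc, imm26):
    """Full MIPS: top 4 bits from PC+4, low 28 bits from imm26 << 2."""
    return ((pc + 4) & 0xF0000000) | ((imm26 & 0x03FFFFFF) << 2)
```

The two agree whenever the upper four bits of PC+4 are zero.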
22
Seventh, Eighth, Ninth Instructions
• Are these the paths we would need for all instructions?
sll $1,$2,4 // shift left logical
• Like an arithmetic operation, but need a shifter too
slt $1,$2,$3 // set less than (slt)
• Like subtract, but need to write the condition bits, not the result
• Need zero extension unit for condition bits
• Need additional input to register write data mux
jal absolute_target // jump and link
• Like a jump, but also need to write PC+4 into $ra ($31)
• Need path from PC+4 adder to register write data mux
• Need to be able to specify $31 as an implicit destination
jr $31 // jump register
• Like a jump, but need path from register read to PC write mux
23
Clock Timing
• Must deliver clock(s) to avoid races
• Can’t write and read same value at same clock edge
• Particularly a problem for RegFile and Memory
• May create multiple clock edges (from single input clock) by
using buffers (to delay clock) and inverters
• For Homework 4 (the Duke 250/16 CPU):
• Keep the clock SIMPLE and GLOBAL
• You may need to do the PC on falling edge and everything else on
rising edge
• Changing clock edges in this way will separate PC++ from logic
• Otherwise, if the PC changes while the operation is occurring, the
instruction bits will change before the answer is computed ->
non-deterministic behavior
• Note: A cheap way to make something trigger on the other clock
edge is to NOT the clock on the way into that component
24
This Unit: Processor Design
• Datapath components and timing
• Registers and register files
• Memories (RAMs)
• Clocking strategies
• Mapping an ISA to a datapath
• Control
• Exceptions
[Figure: system layer stack: Application, OS, Compiler, Firmware, CPU, I/O, Memory, Digital Circuits, Gates & Transistors]
25
What Is Control?
[Figure: full datapath annotated with control signals BR, JP, Rwd, Rwe, ALUop, DMwe, Rdst, ALUinB]
• 9 signals control flow of data through this datapath
• MUX selectors, or register/memory write enable signals
• Datapath of current microprocessor has 100s of control signals
26
Example: Control for add
[Figure: full datapath annotated with the control signal values for add]
BR=0, JP=0, Rwd=0, Rwe=1, ALUop=0, DMwe=0, Rdst=1, ALUinB=0
• Rwe: Register Write Enable
• Rdst: Register Destination chooser
• ALUinB: ALU input B chooser
• ALUop: ALU operation (multi-bit)
• DMwe: Data Memory Write Enable
• Rwd: Register Write Data chooser
• BR: Branch?
• JP: Jump?
27
Example: Control for sw
[Figure: full datapath annotated with the control signal values for sw]
BR=0, JP=0, Rwd=X, Rwe=0, ALUop=0, DMwe=1, Rdst=X, ALUinB=1
• Difference between a sw and an add is 5 signals
• 3 if you don’t count the X (“don’t care”) signals
28
Example: Control for beq $1,$2,target
[Figure: full datapath annotated with the control signal values for beq]
BR=1, JP=0, Rwd=X, Rwe=0, ALUop=1, DMwe=0, Rdst=X, ALUinB=0
• Difference between a store and a branch is only 4 signals
30
How Is Control Implemented?
[Figure: the same datapath, with signals BR, JP, Rwd, Rwe, ALUop, DMwe, Rdst, ALUinB all coming from a block labeled “Control?”]
31
Implementing Control
• Each instruction has a unique set of control signals
• Most signals are function of opcode
• Some may be encoded in the instruction itself
• E.g., the ALUop signal is some portion of the MIPS Func field
+ Simplifies controller implementation
– Requires careful ISA design
• Options for implementing control
1. Use instruction type to look up control signals in a table
2. Design combinational logic whose outputs are control signals
• Either way, goal is same: turn instruction into control signals
32
Control Implementation: ROM
• ROM (read only memory): like a RAM but unwritable
• Bits in data words are control signals
• Lines indexed by opcode
• Example: ROM control for our simple datapath
opcode  BR  JP  ALUinB  ALUop  DMwe  Rwe  Rdst  Rwd
add     0   0   0       0      0     1    1     0
addi    0   0   1       0      0     1    0     0
lw      0   0   1       0      0     1    0     1
sw      0   0   1       0      1     0    0     0
beq     1   0   0       1      0     0    0     0
j       0   1   0       0      0     0    0     0
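The ROM above can be modeled as a lookup table. This Python sketch copies the bit values from the slide; keying the table by mnemonic instead of a numeric opcode, and the names `CONTROL_ROM` and `control_signals`, are assumptions made for readability.

```python
CONTROL_FIELDS = ("BR", "JP", "ALUinB", "ALUop", "DMwe", "Rwe", "Rdst", "Rwd")

# One ROM line per instruction; each bit in the data word is a control signal.
CONTROL_ROM = {
    "add":  (0, 0, 0, 0, 0, 1, 1, 0),
    "addi": (0, 0, 1, 0, 0, 1, 0, 0),
    "lw":   (0, 0, 1, 0, 0, 1, 0, 1),
    "sw":   (0, 0, 1, 0, 1, 0, 0, 0),
    "beq":  (1, 0, 0, 1, 0, 0, 0, 0),
    "j":    (0, 1, 0, 0, 0, 0, 0, 0),
}

def control_signals(opcode):
    """Look up the control word for an instruction and name each bit."""
    return dict(zip(CONTROL_FIELDS, CONTROL_ROM[opcode]))

print(control_signals("sw"))
```

Note that the ROM stores a concrete 0 where the earlier slides showed X (“don’t care”), since a stored bit must have some value.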
33
ROM vs. Combinational Logic
• A control ROM is fine for 6 insns and 9 control signals
• A real machine has 100+ insns and 300+ control signals
• Even “RISC”s have lots of instructions
• 30,000+ control bits (~4KB)
– Not huge, but hard to make fast
• Control must be faster than datapath
• Alternative: combinational logic
• It’s that thing we know how to do! Nice!
• Exploits observation: many signals have few 1s or few 0s
34
Control Implementation: Combinational Logic with a Decoder (one-hot representation)
• Example: combinational logic control for our simple datapath
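[Figure: the opcode feeds a decoder with one-hot output lines add, addi, lw, sw, beq, j; gates combine these lines into BR, JP, DMwe, Rwe, Rwd, Rdst, ALUop, ALUinB]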
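A Python sketch of the decoder approach: the opcode is decoded into one-hot lines, and each control signal is the OR of the lines for the instructions that set it to 1. The OR terms are derived from the six-instruction ROM table in this deck; the function names are hypothetical.

```python
OPCODES = ("add", "addi", "lw", "sw", "beq", "j")

def decode_one_hot(opcode):
    """Decoder: exactly one output line is 1."""
    return {name: int(name == opcode) for name in OPCODES}

def control_logic(opcode):
    """Each signal is an OR of the one-hot lines that assert it."""
    d = decode_one_hot(opcode)
    return {
        "BR":     d["beq"],
        "JP":     d["j"],
        "ALUinB": d["addi"] | d["lw"] | d["sw"],
        "ALUop":  d["beq"],
        "DMwe":   d["sw"],
        "Rwe":    d["add"] | d["addi"] | d["lw"],
        "Rdst":   d["add"],
        "Rwd":    d["lw"],
    }
```

Because most signals are 1 for only one or two instructions, most of them need a single wire or a small OR gate.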
35
Exceptions
• Exceptions and interrupts
• Infrequent (exceptional!) events
• I/O, divide-by-0, illegal instruction, page fault, protection fault, ctrl-C, ctrl-Z, timer
• Handling requires intervention from operating system
• End program: divide-by-0, protection fault, illegal insn, ^C
• Fix and restart program: I/O, page fault, ^Z, timer
• Handling should be transparent to application code
• Don’t want to (can’t) constantly check for these using insns
• Want “Fix and restart” equivalent to “never happened”
38
Exception Handling
• What does exception handling look like to software?
• When exception happens…
• Control transfers to OS at pre-specified exception handler address
• OS has privileged access to registers user processes do not see
• These registers hold information about exception
• Cause of exception (e.g., page fault, arithmetic overflow)
• Other exception info (e.g., address that caused page fault)
• PC of application insn to return to after exception is fixed
• OS uses privileged (and non-privileged) registers to do its “thing”
• OS returns control to user application
• Same mechanism available programmatically via SYSCALL
39
MIPS Exception Handling
• MIPS uses registers to hold state during exception handling
• These registers live on “coprocessor 0”
• $14: EPC (holds PC of user program during exception handling)
• $13: exception type (SYSCALL, overflow, etc.)
• $8: virtual address (that produced page/protection fault)
• $12: exception mask (which exceptions trigger OS)
• Exception registers accessed using two privileged
instructions mfc0, mtc0
• Privileged = user process can’t execute them
• mfc0: move (register) from coprocessor 0 (to user reg)
• mtc0: move (register) to coprocessor 0 (from user reg)
• Privileged instruction rfe restores user mode
• Kernel executes this instruction to restore user program
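A heavily simplified Python sketch of the flow described above, using the coprocessor 0 register numbers from this slide. Everything else is an assumption for illustration: the class `Cpu`, its method names, the handler address, and the cause code are hypothetical. Real MIPS also separates restoring user mode (rfe) from the jump back to the user PC, which this sketch merges into one step.

```python
HANDLER_ADDRESS = 0x80000180   # assumed value; the slides only say "pre-specified"
EPC, CAUSE, BAD_VADDR = 14, 13, 8   # coprocessor 0 register numbers from the slide

class Cpu:
    def __init__(self):
        self.pc = 0
        self.privileged = False
        self.cp0 = {EPC: 0, CAUSE: 0, BAD_VADDR: 0}

    def raise_exception(self, cause, bad_vaddr=0):
        self.cp0[EPC] = self.pc            # remember where to resume
        self.cp0[CAUSE] = cause            # why the OS was entered
        self.cp0[BAD_VADDR] = bad_vaddr    # faulting address, if any
        self.privileged = True             # enter privileged mode
        self.pc = HANDLER_ADDRESS          # transfer control to the OS

    def return_from_exception(self):
        self.privileged = False            # back to user mode
        self.pc = self.cp0[EPC]            # resume the user program
```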
40
Implementing Exceptions
• Why do architects care about exceptions?
• Because we use datapath and control to implement them
• More precisely… to implement aspects of exception handling
• Recognition of exceptions
• Transfer of control to OS
• Privileged OS mode
• Later in semester, we’ll talk more about exceptions (b/c we
need them for I/O)
42
Datapath with Support for Exceptions
[Figure: datapath extended with a co-processor register file and a PSR (processor status register); added control signals PSRs, PSRr, PCwC, CRwd, CRwe, ALUinAC]
• Co-processor register (CR) file needn’t be implemented as RF
• Independent registers connected directly to pertinent muxes
• PSR (processor status register): in privileged mode?
43
Summary
• We now know how to build a fully functional processor
• But …
• We’re still treating memory as a black box (actually two green boxes, to
be precise)
• Our fully functional processor is slow. Really, really slow.
44
“Single-Cycle” Performance
• Useful metric: cycles per instruction (CPI)
+ Easy to calculate for single-cycle processor: CPI = 1
• Seconds/program = (insns/program) * 1 CPI * (N seconds/cycle)
• ICQ: How many cycles/second in 3.8 GHz processor?
– Slow!
• Clock period must be elongated to accommodate longest operation
• In our datapath: lw
• Goes through five structures in series: insn mem, register file
(read), ALU, data mem, register file again (write)
• No one will buy a machine with a slow clock
• Not even your grandparents!
• Biggest issue: data memory itself is sloooooooooooooooooooooooow
• Next up: Speed up data memory!
• Later on: Faster processor cores!
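A worked example of the single-cycle timing argument, in Python. The per-structure delays are hypothetical numbers picked only to show the arithmetic; the point is that the clock period must cover the sum of the five structures on the lw path, and every instruction then pays that full period.

```python
def seconds_per_program(insns, cpi, seconds_per_cycle):
    """Seconds/program = (insns/program) * CPI * (seconds/cycle)."""
    return insns * cpi * seconds_per_cycle

# Hypothetical delays in nanoseconds for the five structures lw passes through.
LW_PATH_NS = {
    "insn mem": 2,
    "register file (read)": 1,
    "ALU": 2,
    "data mem": 2,
    "register file (write)": 1,
}

clock_period_ns = sum(LW_PATH_NS.values())        # longest operation sets the clock
runtime = seconds_per_program(1_000_000, 1, clock_period_ns * 1e-9)

# For the in-class question: 3.8 GHz means 3.8e9 cycles per second.
cycles_per_second = 3.8e9
```

Under these assumed delays an add, which never touches data memory, still takes the full period that lw needs.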
45
This Unit: Processor Design
• Datapath components and timing
• Registers and register files
• Memories (RAMs)
• Clocking strategies
• Mapping an ISA to a datapath
• Control
[Figure: system layer stack: Application, OS, Compiler, Firmware, CPU, I/O, Memory, Digital Circuits, Gates & Transistors]
Next up: Memory Systems
46