Notes - Unit 6
▪ Computer Specification: It is the description of its appearance to a programmer at the lowest level: the Instruction Set
Architecture (ISA). From the ISA, a high-level description of the hardware to implement the computer (i.e., the computer
architecture) is formulated.
[Figure: computer block diagram: input units, output units, memory (data/instructions), and the processor (CPU) with control unit, registers, and ALU, all connected through the address/data bus.]
[Figure: processor organization: control unit (PC, Instruction Memory, IR, Instruction Decoder) and datapath (Data Memory, ALU with flags V, C, N, Z); the Instruction Decoder issues the CONTROL_WORD (including FS) to the datapath.]
2 Instructor: Daniel Llamocca
ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT, OAKLAND UNIVERSITY
ECE-4710/5710: Computer Hardware Design Winter 2024
MEMORY OVERVIEW
RANDOM ACCESS MEMORY (RAM)
▪ Access to a word takes the same time regardless of its location, hence the name RAM.
▪ Number of words: 2^k n-bit words (2^k × n memory). Depth = m = 2^k, width = n.
▪ I/O description:
✓ DI: n-bit input data.
✓ DO: n-bit output data.
✓ AD: k-bit address. The addresses range from 0 to 2^k − 1.
✓ we: write enable. Also called R/W (sometimes this signal is active-low: R/W̅).
✓ en: enable. Other terms are CS (chip select) or E.
[Figure: RAM block with 2^k n-bit words: inputs DI (n bits), AD (k bits), we, en; output DO (n bits).]
▪ Operation:
en  we  Action
0   X   no read or write
1   0   read word from address pointed by AD
1   1   write on address pointed by AD
✓ In simple memory designs, we often tie en=1, so that we always read or write. Here, we is often renamed as wr_rd.
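The operation table above can be sketched as a small behavioral model. This is only an illustration in Python, not the VHDL used in the course; the class name `Ram` and the method `access` are hypothetical, and returning `None` when nothing is read is a modeling choice.

```python
class Ram:
    """Behavioral model of a 2**k x n RAM with en/we control (illustrative only)."""

    def __init__(self, k: int, n: int):
        self.k, self.n = k, n
        self.words = [0] * (1 << k)          # 2**k words, all cleared

    def access(self, en: int, we: int, ad: int, di: int = 0):
        """Apply one access; return the word read, or None when nothing is read."""
        if not en:
            return None                      # en=0: no read or write
        ad &= (1 << self.k) - 1              # keep only the k address bits
        if we:
            self.words[ad] = di & ((1 << self.n) - 1)   # en=1, we=1: write
            return None
        return self.words[ad]                # en=1, we=0: read


ram = Ram(k=3, n=4)
ram.access(en=1, we=1, ad=5, di=0b1011)      # write 1011 at address 5
print(ram.access(en=1, we=0, ad=5))          # read it back: 11
print(ram.access(en=0, we=1, ad=5, di=0))    # disabled: None, word unchanged
```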
MEMORY DECODING
▪ We can group memory blocks to build a larger memory. For a memory with 2^k n-bit words, if an individual memory block has 2^h words, we then need 2^(k−h) = 2^p memory blocks.
▪ Conceptual implementation:
✓ Data write: a decoder can be used to select the proper block to write on.
✓ Data read: a multiplexor (or 3-state buffer) can be used to select which memory block output to read from.
▪ The k-bit address of the memory can be divided into p + h bits.
[Figure: memory built from 2^p blocks of 2^h words: the lower h address bits, DI, and we are shared by all blocks; the upper p = k−h address bits drive a decoder (block enables) and the select input of the output MUX.]
▪ Example: 256K×8 memory out of four 64K×8 memory blocks. The 18-bit address of the 256K×8 memory is divided into 2 bits (to select the memory block to read/write data) and 16 bits (to select an individual address from a memory block).
✓ Note that for every bit we increase in the address, the number of words doubles.
✓ We can also concatenate the data inputs (and data outputs) of several memory blocks to create a memory with an increased word size.
Example: 256K×16 memory (address, we, and en inputs are shared among all the memory blocks).
[Figure: 256K×8 memory built from four blocks of 2^16 words (2-bit decoder and output MUX), and a 16-bit-wide memory built from two 8-bit-wide blocks that share ad, we, and en.]
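The address split described above can be sketched in Python as follows. This is a minimal model, assuming the upper p bits select the block and the lower h bits select the word inside it; `split_address` and `BankedMemory` are hypothetical names introduced for illustration.

```python
def split_address(ad: int, k: int, h: int):
    """Split a k-bit address into (block index, address within the block)."""
    p = k - h
    block = (ad >> h) & ((1 << p) - 1)       # upper p bits: decoder / MUX select
    offset = ad & ((1 << h) - 1)             # lower h bits: shared by all blocks
    return block, offset


class BankedMemory:
    """2**k words built from 2**(k-h) blocks of 2**h words each."""

    def __init__(self, k: int, h: int):
        self.k, self.h = k, h
        self.blocks = [[0] * (1 << h) for _ in range(1 << (k - h))]

    def write(self, ad: int, di: int):
        block, offset = split_address(ad, self.k, self.h)
        self.blocks[block][offset] = di      # the decoder enables only this block

    def read(self, ad: int) -> int:
        block, offset = split_address(ad, self.k, self.h)
        return self.blocks[block][offset]    # the MUX picks this block's DO


mem = BankedMemory(k=18, h=16)               # 256K words from four 64K blocks
mem.write(0x2ABCD, 0x5A)
print(split_address(0x2ABCD, 18, 16))        # (2, 43981)
print(mem.read(0x2ABCD))                     # 90
```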
RAM IMPLEMENTATION
Register-based:
▪ These designs can be described at RTL (on FPGAs or ASICs).
✓ There are different approaches for the behavior of the output data when writing. Simplest one: set all outputs to 0's.
▪ Read operation: Data is available as soon as the address of the desired word is fed.
▪ The figure shows a memory with n=4, k=3. A timing diagram is shown, where we tie en=1.
✓ Sometimes an output register is added (this delays output for 1 clock cycle).
✓ In many applications, it is common to set en=1. Here, we controls reading/writing.
▪ VHDL parametric code: RAM_emul.vhd (it has wr_rd = we; en: not used).
▪ This straightforward implementation is not used in real-world applications (unless the memory requirements are very small): flip flops are very expensive resources, and the decoder and MUX sizes double with every extra address bit (i.e., they grow exponentially with the address width).
[Figure: register-based RAM with n=4, k=3: eight 4-bit registers (0 to 7) whose enables come from a 3-to-8 decoder gated by we and en, and an output MUX selected by the address that drives DO.]
[Timing diagram (en=1); signals: clock, resetn, we, address, DO]
address 000  001  010  011  100  101  110  111  000  001  010  011  100  101  110  111  000
DO      0000 0000 0000 0000 0000 0000 0000 0000 1101 1110 1100 1010 0111 1011 0101 1111 1101
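To make the cost argument concrete, the resources of a register-based memory can be counted with a few lines of Python. This is a rough sketch: `register_ram_cost` is a hypothetical helper, and it counts only registers, decoder outputs, and MUX inputs (no routing or control logic).

```python
def register_ram_cost(k: int, n: int) -> dict:
    """Rough resource count for a register-based 2**k x n RAM."""
    words = 1 << k
    return {
        "flip_flops": words * n,       # one n-bit register per word
        "decoder_outputs": words,      # one enable line per register
        "mux_inputs": words,           # n-bit-wide MUX with 2**k inputs
    }


for k in (3, 4, 10):
    print(k, register_ram_cost(k, n=4))
```

For the memory in the figure (k=3, n=4) this gives 32 flip flops and an 8-output decoder; each extra address bit doubles all three counts.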
▪ One-bit implementation: the figure shows a SRAM cell logic model, a SRAM cell implementation with pass transistors and
NOT gates, and a SRAM cell implementation where the NOT gates are implemented with CMOS technology.
✓ FPGAs use SRAM-based technology to store the bitstream. BlockRAMs are also based on SRAM technology.
[Figure: SRAM cell model (SR latch with sel input), SRAM cell with pass transistors and NOT gates, and SRAM cell with CMOS NOT gates.]
▪ SRAM implementation: We first implement a 'bit slice', i.e., a column of the SRAM.
✓ RAM bit slice: m = 2^k
[Figure: SRAM bit slice: m cells selected by the word-select lines WS0 to WSm-1, write logic (DI, R/W, CS), and read logic (multi-input OR) that drives DO.]
Write logic:
CS  R/W  B   B̅    Comments
0   X    0   0    DO=0 (invalid)
1   0    0   0    Data kept on SRAM cells
1   1    DI  DI̅   Can write on an SRAM cell
Read logic:
If sel=0 for all cells, then C = C̅ = 0 for all cells and the SR latch keeps its value.
If CS=0, then DO=0 (invalid output).
If CS=1, only one cell should have sel=1. Here, the SR Latch stores the cell value (C) and DO=C.
✓ 2^k × n SRAM: It can be implemented with a group of SRAM bit slices. For example: 16x4 SRAM (n=4, k=4).
[Figure: 16x4 SRAM: a 4-to-16 decoder (inputs ad3..ad0, enable E) drives the word-select lines WS0 to WS15, which are shared by four bit slices; each slice has its own DI(i) and DO(i), and all slices share R/W and CS.]
3-state output buffers: They are included to optimize implementation of larger SRAMs. If we want to build a larger
SRAM out of smaller SRAMs (see memory decoding), we then do not need to use a large multiplexor:
[Figure: larger SRAM built from 2^p blocks of 2^h words: the upper p = k-h address bits feed a decoder that generates the chip selects CS0 to CS(2^p - 1); DI, r/w, and the lower h address bits are shared; the 3-state DO outputs are tied together.]
[Figure: 16x1 DRAM cell array as a 2D array (4x4): DRAM cell (word line, bit line), row decoder and column decoder (k/2 address bits each), sense amplifiers, and Data In/Out buffers; signals AD, RAS, R/W, DI/DO.]
▪ Synchronous DRAM (SDRAM): The external interface is clocked (DRAMs have asynchronous interfaces).
▪ Double Data Rate SDRAM (DDR SDRAM): Like SDRAM, but the data output is provided on both the positive and negative clock edges. Voltage: 2.5 V, frequency: 133 MHz. DDR2: 1.8 V; 266, 333, 400 MHz. DDR3: 1.5 V; 800 MHz.
[Figure: LIFO block: inputs DI, we, en; outputs DO, empty, full.]
If en = '1': If we = '1': SP ← SP-1, ST[SP] ← DI
             If we = '0': DO ← ST[SP], SP ← SP+1
else: DO ← ST[SP]
Flags: empty = 1 if SP=DEPTH, else 0; full = 1 if SP=0, else 0
[Figure: stack states for DEPTH=4 (addresses 0 to 3): SP=0, SP=1, SP=3, and SP=4 (Stack Empty).]
[Figure: datapath of the simple 8-bit computer: 4-bit PC (E_PC, sclr_PC) with adder, Instruction Memory (16x8), IR[7..0], registers R0 and R1 (enables L_R0, L_R1), ALU with Z flag (function select IR[7..5]), output register OUT (enable L_OP) driving the LEDs, input IN, multiplexers M1 to M6, and the Instruction Decoder with input stop_ID.]
INSTRUCTION SET
▪ Instructions are specified by the Instruction Register (IR).
[Figure: IR fields: OPCODE, Destination Register (DR), Source Register (SR), IMMEDIATE DATA.]
▪ opcode: IR[7..5]: This is the operation code of an instruction. This group of bits specifies an operation (such as add, subtract, shift, complement in the ALU). If it has m bits, there can be up to 2^m distinct instructions.
▪ Immediate Data: IR[3..0]. This is called an immediate operand since it is immediately available in the instruction.
INSTRUCTION DECODER
▪ This component is in charge of issuing control signals for the proper execution of instructions. The inputs to this circuit are
the Instruction Register (IR) and the Z flag. The outputs are all the control signals: M1, M2, M3, M4, M5, M6, L_R0, L_R1,
L_OP. Note that the Function Select (FS) output to the ALU is directly generated by IR[7..5].
▪ Also, if stop_ID=1, the following signals must be set to ‘0’: register enables in the Datapath (L_OP, L_R0, L_R1), and the PC
control signal M6. This is useful to pause execution of a program (PC and Datapath are not updated).
▪ This is a combinational circuit. The I/O relationship depends on how each instruction is defined.
1 0 1 DR X X X X
1 1 0 DR X X X X
0 0 1 DR d3 d2 d1 d0
✓ ADD DR, SR: Adds SR and DR, and copies the result onto DR
0 1 0 DR SR X X X
ADD R0,R0: 01000XXX  M4=0, M5=0, M2=0, M3=1, M1=0, L_R0=1, M6=0
ADD R0,R1: 01001XXX  M4=0, M5=0, M2=1, M3=1, M1=0, L_R0=1, M6=0
ADD R1,R0: 01010XXX  M4=0, M5=0, M2=1, M3=1, M1=0, L_R1=1, M6=0
ADD R1,R1: 01011XXX  M4=1, M5=0, M2=1, M3=1, M1=0, L_R1=1, M6=0
✓ ADDI DR, DATA: Adds immediate DATA and DR, and copies the result onto DR
0 1 1 DR d3 d2 d1 d0
0 0 0 DR SR X X X
✓ SR0 DR, SR: Shifts (to the right) the contents of SR and places the result onto DR
1 0 0 DR SR X X X
SR0 R0,R0: 10000XXX  M4=0, M5=0, M2=0, M3=1, M1=0, L_R0=1, M6=0
SR0 R0,R1: 10001XXX  M4=0, M5=0, M2=1, M3=1, M1=0, L_R0=1, M6=0
SR0 R1,R0: 10010XXX  M4=0, M5=0, M2=1, M3=1, M1=0, L_R1=1, M6=0
SR0 R1,R1: 10011XXX  M4=1, M5=0, M2=1, M3=1, M1=0, L_R1=1, M6=0
✓ JNZ DR, ADDRESS: Jumps to a certain instruction if the DR contents ≠ 0. This is how computers implement loops.
1 1 1 DR a3 a2 a1 a0
Example:
▪ Write an assembly program for a counter from 1 to 5: 1, 2, 3, 4, 5, 1, 2, 3, …. The count must be shown on the output
register (OUT).
start: loadi R0,1       R0 ← 1
       out R0           OUT = 1
       addi R0,1        R0 ← R0 + 1 = 2
       out R0           OUT = 2
       addi R0,1        R0 ← R0 + 1 = 3
       out R0           OUT = 3
       addi R0,1        R0 ← R0 + 1 = 4
       out R0           OUT = 4
       addi R0,1        R0 ← R0 + 1 = 5
       out R0           OUT = 5
       jnz R0, start
Example:
▪ Write an assembly program for a counter from 2 to 13: 2,3,…, 13,2,3,… The count must be shown on the output register
(OUT). Use labels to specify any address where your program jumps. Note that you can have only up to 16 instructions.
▪ Provide the contents of the Instruction Memory.
R0 counts from 2 to 13 while R1 counts from 4 to 15.
address  INSTRUCTION MEMORY  Assembly Program
0000     00100010            start: loadi R0,2       R0 ← 2
0001     00110100                   loadi R1,4       R1 ← 4
0010     11000000            loop:  out R0           OUT ← R0 (shows the count)
0011     01100001                   addi R0,1        R0 ← R0+1
0100     01110001                   addi R1,1        R1 ← R1+1
0101     11110010                   jnz R1, loop
0110     00100001                   loadi R0,1       R0 ← 1
0111     11100000                   jnz R0, start
1000     00000000
1001     00000000
1010     00000000
1011     00000000
1100     00000000
1101     00000000
1110     00000000
1111     00000000
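The program above can be checked with a small simulator. This is a sketch under stated assumptions: the opcodes (001 = loadi, 011 = addi, 110 = out, 111 = jnz) and the DR position (IR[4]) are read off the memory contents above, the registers and PC are assumed to be 4 bits wide, and any other opcode is treated as a no-op. The function name `run` is hypothetical.

```python
# Instruction Memory contents from the table above (addresses 0000 to 1111).
IM = [0b00100010, 0b00110100, 0b11000000, 0b01100001,
      0b01110001, 0b11110010, 0b00100001, 0b11100000] + [0] * 8


def run(im, steps):
    """Execute `steps` instructions and return the values written to OUT."""
    r = [0, 0]                      # R0, R1 (assumed 4 bits wide)
    pc, out = 0, []
    for _ in range(steps):
        ir = im[pc]
        opcode, dr, imm = ir >> 5, (ir >> 4) & 1, ir & 0xF
        next_pc = (pc + 1) & 0xF
        if opcode == 0b001:                     # loadi DR, DATA
            r[dr] = imm
        elif opcode == 0b011:                   # addi DR, DATA (wraps at 4 bits)
            r[dr] = (r[dr] + imm) & 0xF
        elif opcode == 0b110:                   # out DR
            out.append(r[dr])
        elif opcode == 0b111 and r[dr] != 0:    # jnz DR, ADDRESS
            next_pc = imm
        pc = next_pc
    return out


print(run(IM, 52))    # one full pass: the count 2, 3, ..., 13
```

The loop ends because R1 starts at 4 and wraps to 0 after twelve increments, which is what limits the count to 13.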
[Figure: control unit of the simple 8-bit computer: Control block (signals sclr, start, step, L_in, INST_LOAD, L_ex, D_ex, we_ex, isbranch, Z, stop_ID, sclr_PC, IM_WE), PC, Instruction Memory (16x8), IR, and Instruction Decoder (FS=IR[7..5]; outputs M1..M5, L_R0, L_R1, L_OP).]
[Figure: computer with a Register File, ALU (flags Z, V, N, C), and Data Memory; control word fields FS, DR, RW, OS, SA, SB, MB, MD, MW, JS; PC (E_PC, sclr_PC) with update logic that uses JA, JS, and OS.]
DATAPATH
▪ A generic datapath includes a Register File and an ALU (see previous figure). A Register File includes 2^M registers, so we need M bits to address all of these registers.
▪ Register File: The figure below depicts a Register File with M=2, resulting in 2^2 = 4 registers. Note how in this particular implementation, we use 2 data buses (Bus A and Bus B). Other implementations only use one Data Bus. We also include the connections to the ALU and to the Datapath inputs and outputs.
[Figure: REGISTER FILE (M=2): registers R3 to R0 with enables E3 to E0 generated by a decoder; two MUXes controlled by Select A (SA) and Select B (SB) drive Bus A and Bus B into the ALU (FS; flags Z, V, N, C; output Y); a MUX controlled by MD selects between Y and Data In (DI).]
▪ Arithmetic Logic Unit: The FS has 4 bits, and the following table lists all the possible operations. The input Data (A, B)
and output data (Y) are represented as signed numbers. Here, the flags Z, V, N, C are generated.
✓ In this particular implementation, the carry out (C) from a previous operation is not an input to the ALU. Instead, we
have to use a specific instruction that adds the carry in (or borrow in) to an operation when desired.
INSTRUCTION SET
▪ Instruction: Collection of bits that instructs the computer to perform a specific operation.
✓ Each instruction specifies: i) an operation the system is to perform, ii) the registers or memory words where the operands are to be found and the result is to be placed, and/or iii) which instruction to execute next.
✓ Instructions are usually stored in memory (RAM or ROM). To execute the instructions sequentially, we need the address in memory of the instruction to be executed. The address comes from the Program Counter (PC).
✓ Executing an instruction means activating the necessary sequence of microoperations in the datapath (e.g.: add, subtract, load, shift) and elsewhere required to perform the operation specified by the instruction.
Operation: This is specified by an instruction in memory. The Control Unit decodes the instruction in order to perform the required microoperations for the execution of the instruction.
Microoperation: This is specified by the control bits generated by the Instruction Decoder (ID). The execution of a computer operation often requires a sequence of microoperations, rather than a single microoperation.
Instruction Format
▪ The 16-bit instructions are generated by the Instruction Memory (IM) and written on the Instruction Register (IR). The
instruction format might have different fields depending on the instruction type. Some microprocessors (like the VBC) only
have one instruction type. In this particular implementation, we have 3 different instruction types:
✓ Register: Opcode, 2 Source Registers (SA, SB), and a Destination Register (DR).
✓ Immediate: Opcode, 1 Source Register (SA), a Destination Register (DR), and a 3-bit immediate operand (OP).
✓ Jump and Branch: Opcode, Source Register, and 6-bit signed address offset: No register or memory contents are
changed. Here, we only update the PC.
▪ The OPCODE specifies the operation to be executed, which must use data stored in the registers or in memory.
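Instruction fields (bit positions):
REGISTER:   OPCODE [15..9] | DR [8..6] (Destination Register) | SA [5..3] (Source Register A) | SB [2..0] (Source Register B)
IMMEDIATE:  OPCODE [15..9] | DR [8..6] | SA [5..3] | OP [2..0] (Operand)
JUMP AND BRANCH: OPCODE [15..9], SA [5..3], and the Address Offset (in the BRZ/BRN encodings given in these notes, the offset occupies IR[8..6] and IR[2..0]).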
List of Instructions
▪ Each instruction is denoted with a symbolic notation: the OPCODE is given a mnemonic, and the additional instruction fields
are denoted by literals. This symbolic notation (called Assembly Instruction), that represents the operation executed by the
instruction, is then converted to the binary representation by a program called Assembler.
▪ The table provides the instruction specification, i.e., a description of the operation performed by each instruction, including
the status bits affected by the instruction. We include a limited number of instructions; the designer can always add more
instructions that are supported by the Datapath and Control Unit.
Instruction         Opcode   Mnemonic  Format      Description              PC                                                  Status Bits
Move A              0000000  MOVA      RD, RA      R[DR] ← R[SA]            PC ← PC+1                                           N, Z
Increment           0000001  INC       RD, RA      R[DR] ← R[SA] + 1        PC ← PC+1                                           N, Z, C, V
Add                 0000010  ADD       RD, RA, RB  R[DR] ← R[SA] + R[SB]    PC ← PC+1                                           N, Z, C, V
Subtract            0000101  SUB       RD, RA, RB  R[DR] ← R[SA] - R[SB]    PC ← PC+1                                           N, Z, C, V
Decrement           0000110  DEC       RD, RA      R[DR] ← R[SA] - 1        PC ← PC+1                                           N, Z, C, V
AND                 0001000  AND       RD, RA, RB  R[DR] ← R[SA] ∧ R[SB]    PC ← PC+1                                           N, Z
OR                  0001001  OR        RD, RA, RB  R[DR] ← R[SA] ∨ R[SB]    PC ← PC+1                                           N, Z
Exclusive OR        0001010  XOR       RD, RA, RB  R[DR] ← R[SA] ⊕ R[SB]    PC ← PC+1                                           N, Z
NOT                 0001011  NOT       RD, RA      R[DR] ← not (R[SA])      PC ← PC+1                                           N, Z
Move B              0001100  MOVB      RD, RB      R[DR] ← R[SB]            PC ← PC+1                                           N, Z
Shift Right         0001101  SHR       RD, RB      R[DR] ← sr R[SB]         PC ← PC+1                                           N, Z
Shift Left          0001110  SHL       RD, RB      R[DR] ← sl R[SB]         PC ← PC+1                                           N, Z
Load Immediate      1001100  LDI       RD, OP      R[DR] ← OP               PC ← PC+1                                           N, Z
Add Immediate       1000010  ADI       RD, RA, OP  R[DR] ← R[SA] + OP       PC ← PC+1                                           N, Z
Load                0010000  LD        RD, RA      R[DR] ← M[R[SA]]         PC ← PC+1
Store               0100000  ST        RA, RB      M[R[SA]] ← R[SB]         PC ← PC+1
Branch on Zero      1100000  BRZ       RA, AD                               If R[SA] ≠ 0: PC ← PC+1; if R[SA] = 0: PC ← PC+AD   N, Z
Branch on Negative  1100001  BRN       RA, AD                               If R[SA] ≥ 0: PC ← PC+1; if R[SA] < 0: PC ← PC+AD   N, Z
Jump                1110000  JMP       RA                                   PC ← R[SA]
▪ Other ISAs do not generate status bits when transfers on the Bus B (e.g. Move B) are occurring.
▪ Note that the branch instructions generate N, Z because they require Bus A to be transferred in order to evaluate R[SA]
which might assert N or Z. The Jump instruction does not affect the status bits.
▪ Some considerations regarding the notation of the Instruction Description:
✓ R[DR]: This refers to the register whose number is DR. Example: if DR=2 → R2.
✓ M[R[SA]]: This refers to the memory address given by the value of the Register with number SA, e.g.: if SA=3 → M[R3].
▪ The following table shows an example with instructions in memory and a detailed description of them:
Address  Memory Contents   Other Fields      Assembly Instruction  Operation                    Comments
011001   0000101001010011  DR:1, SA:2, SB:3  SUB R1,R2,R3          R1 ← R2 – R3
100011   0100000000100101  SA:4, SB:5        ST R4,R5              M[R4] ← R5                   DR unused
101101   1000010010111110  DR:2, SA:7, OP:6  ADI R2,R7,6           R2 ← R7 + 6
110111   1100000101110100  AD:-20, SA:6      BRZ R6,-20            If R[SA] = 0: PC ← PC-20     -20=101100
111110   0010000101010000  DR:5, SA:2        LD R5,R2              R5 ← M[R2]                   SB unused
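The field extraction used in the table above can be sketched in Python. This is an illustrative decoder, not part of the course material: `decode` and `sign_extend` are hypothetical names, and the placement of the 6-bit offset AD in IR[8..6] and IR[2..0] is inferred from the BRZ encoding in the table.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret `value` as a two's complement number that is `bits` wide."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def decode(ir: int) -> dict:
    """Split a 16-bit instruction word into its fields."""
    dr, sa, sb = (ir >> 6) & 7, (ir >> 3) & 7, ir & 7
    return {
        "opcode": ir >> 9,                      # IR[15..9]
        "DR": dr,                               # IR[8..6]
        "SA": sa,                               # IR[5..3]
        "SB": sb,                               # IR[2..0], also the operand OP
        "AD": sign_extend((dr << 3) | sb, 6),   # offset: IR[8..6] and IR[2..0]
    }


sub = decode(0b0000101001010011)                # SUB R1,R2,R3
brz = decode(0b1100000101110100)                # BRZ R6,-20
print(sub["DR"], sub["SA"], sub["SB"])          # 1 2 3
print(brz["SA"], brz["AD"])                     # 6 -20
```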
Example:
▪ The following Assembly Program implements a counter from 2 to 13: 2,3,…, 13,2,3,…
As we cannot use 11 as a 3-bit immediate operand, we first load 7 on R1 and then add 4. R0 counts from 2 to 13 while R1 counts from 11 down to 0.
We use '---' to indicate the fields that are unused. This means we can assign any value to them.
Address  Instruction Memory   Assembly Program
000000   1001100 011 --- 100  start: LDI R3,4        R3 ← 4
000001   1001100 000 --- 010         LDI R0,2        R0 ← 2
000010   1001100 001 --- 111         LDI R1,7        R1 ← 7
000011   1000010 001 001 100         ADI R1,R1,4     R1 ← R1+4 = 11
000100   1000010 000 000 001  loop:  ADI R0,R0,1     R0 ← R0+1
000101   0000110 001 001 ---         DEC R1,R1       R1 ← R1-1
000110   1100000 111 001 011         BRZ R1,-5       If R1 = 0: PC ← PC-5 = 1
000111   1110000 --- 011 ---         JMP R3          PC ← R3 = 4
001000   0000000 000 000 000         MOVA R0,R0      R0 ← R0 (this is a NOP operation)
...
Example:
▪ The following Assembly Program stores numbers from 43 down to 29 in Data Memory (DM) on addresses 0 to 14.
✓ After the instruction ST R4,R6 is executed, the R6 value appears on DM output. This is also true after the instruction BRZ
R4,-7 is executed. This is because these instructions cause SA=4, which in turn makes AO=R4[5..0]. And R4[5..0] is
the DM address where the value of R6 was stored.
✓ At the instruction JMP R2, DM_DO shows the value at the address equal to R2 (SA=2 makes AO = R2[5..0])
Address  Instruction Memory   Assembly Program
000000   1001100 010 --- 101  start: LDI R2,5        R2 ← 5
000001   1001100 110 --- 111         LDI R6,7        R6 ← 7
000010   1000010 110 110 111         ADI R6,R6,7     R6 ← 14
000011   0000000 100 110 ---         MOVA R4,R6      R4 ← 14
000100   0000010 110 100 110         ADD R6,R4,R6    R6 ← 28
000101   0000001 110 110 ---  loop:  INC R6,R6       R6 ← R6+1
000110   0100000 --- 100 110         ST R4,R6        M[R4] ← R6
000111   1100000 111 100 001         BRZ R4,-7       If R4=0: PC ← PC-7 = 0
001000   0000110 100 100 ---         DEC R4,R4       R4 ← R4-1
001001   1110000 --- 010 ---         JMP R2          PC ← R2 = 5
001010   0000000 000 000 000         (NOP operation)
...
Data Memory (DM) contents after the program runs:
address  DM
000000   2B
000001   2A
000010   29
000011   28
000100   27
000101   26
000110   25
000111   24
001000   23
001001   22
001010   21
001011   20
001100   1F
001101   1E
001110   1D
...
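The Data Memory contents of this example can be reproduced with a short instruction-level simulator. It is only a sketch: it implements just the instructions this program uses, sets the unused '---' fields to 000, and assumes 8-bit registers, a 64-word Data Memory, and a 6-bit PC. The names `run`, `sext6`, and `PROGRAM` are hypothetical. Execution stops when the BRZ at address 7 sends the PC back to 0.

```python
MASK = 0xFF   # assumed register width: 8 bits

PROGRAM = [int(w.replace(" ", ""), 2) for w in (
    "1001100 010 000 101",   # start: LDI R2,5
    "1001100 110 000 111",   #        LDI R6,7
    "1000010 110 110 111",   #        ADI R6,R6,7
    "0000000 100 110 000",   #        MOVA R4,R6
    "0000010 110 100 110",   #        ADD R6,R4,R6
    "0000001 110 110 000",   # loop:  INC R6,R6
    "0100000 000 100 110",   #        ST R4,R6
    "1100000 111 100 001",   #        BRZ R4,-7
    "0000110 100 100 000",   #        DEC R4,R4
    "1110000 000 010 000",   #        JMP R2
)]


def sext6(v: int) -> int:
    """Sign-extend a 6-bit two's complement value."""
    return v - 64 if v & 0x20 else v


def run(program, max_steps=1000):
    """Run until the PC returns to address 0; return the Data Memory."""
    r, dm, pc = [0] * 8, [0] * 64, 0
    for _ in range(max_steps):
        ir = program[pc]
        op, dr, sa, sb = ir >> 9, (ir >> 6) & 7, (ir >> 3) & 7, ir & 7
        nxt = (pc + 1) & 0x3F
        if op == 0b0000000:   r[dr] = r[sa]                     # MOVA
        elif op == 0b0000001: r[dr] = (r[sa] + 1) & MASK        # INC
        elif op == 0b0000010: r[dr] = (r[sa] + r[sb]) & MASK    # ADD
        elif op == 0b0000110: r[dr] = (r[sa] - 1) & MASK        # DEC
        elif op == 0b1001100: r[dr] = sb                        # LDI
        elif op == 0b1000010: r[dr] = (r[sa] + sb) & MASK       # ADI
        elif op == 0b0100000: dm[r[sa] & 0x3F] = r[sb]          # ST
        elif op == 0b1100000 and r[sa] == 0:                    # BRZ taken
            nxt = (pc + sext6((dr << 3) | sb)) & 0x3F
        elif op == 0b1110000: nxt = r[sa] & 0x3F                # JMP
        pc = nxt
        if pc == 0:           # program jumped back to 'start': stop
            break
    return dm


print([f"{v:02X}" for v in run(PROGRAM)[:15]])   # 2B, 2A, ..., 1D
```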
INSTRUCTION DECODER
▪ The inputs to this circuit are the Instruction Register (IR) and the V, C, N, Z flags. The outputs are all the control signals:
DR, SA, SB, MB, MD, RW, MW, OS, JS, FS. In this implementation, the V, C, N, Z flags are only considered when branching.
▪ Also, if stop_ID=1, the following signals must be set to ‘0’: register enables in the Datapath, the DM write enable, and the
PC control signals OS and JS. This is useful to pause execution of a program (PC and Datapath are not updated).
▪ This is a combinational circuit. The I/O relationship depends on how each instruction is defined. We provide the output
signals for some instructions:
Instruction    Instruction Register  VCNZ  RW  DR   SA   SB   MB  MD  FS    MW  OS  JS
MOVA R1,R2     0000000001010---      ----  1   001  010  ---  -   0   0000  0   0   0
MOVA R7,R0     0000000111000---      ----  1   111  000  ---  -   0   0000  0   0   0
MOVB R0,R3     0001100000---011      ----  1   000  ---  011  0   0   0111  0   0   0
MOVB R6,R6     0001100110---110      ----  1   110  ---  110  0   0   0111  0   0   0
ADD R3,R2,R1   0000010011010001      ----  1   011  010  001  0   0   0010  0   0   0
ADD R6,R0,R0   0000010110000000      ----  1   110  000  000  0   0   0010  0   0   0
XOR R6,R1,R3   0001010110001011      ----  1   110  001  011  0   0   1010  0   0   0
XOR R5,R4,R5   0001010101100101      ----  1   101  100  101  0   0   1010  0   0   0
LDI R7,3       1001100111---011      ----  1   111  ---  ---  1   0   0111  0   0   0
LDI R5,4       1001100101---100      ----  1   101  ---  ---  1   0   0111  0   0   0
ADI R0,R1,7    1000010000001111      ----  1   000  001  ---  1   0   0010  0   0   0
ADI R2,R6,3    1000010010110011      ----  1   010  110  ---  1   0   0010  0   0   0
LD R3,R7       0010000011111---      ----  1   011  111  ---  -   1   1111  0   0   0
ST R1,R5       0100000---001101      ----  0   ---  001  101  0   -   1111  1   0   0
BRN R4,-5      1100001111100011      --0-  0   ---  100  ---  -   -   0000  0   0   0
BRN R4,-5      1100001111100011      --1-  0   ---  100  ---  -   -   0000  0   1   0
BRZ R3,12      1100000001011100      ---0  0   ---  011  ---  -   -   0000  0   0   0
BRZ R3,12      1100000001011100      ---1  0   ---  011  ---  -   -   0000  0   1   0
JMP R5         1110000---101---      ----  0   ---  101  ---  -   -   1111  0   -   1
▪ Branch instructions (BRN, BRZ): These instructions might affect the N and Z bits. Depending on how they affect these flag
bits, we either branch or increase the value of the PC.
▪ JMP, LD, ST: They use FS=1111 since in this case the V, C, N, Z flags are unaffected.
▪ Register File: 16 8-bit registers: s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, sA, sB, sC, sD, sE, sF.
▪ ALU: 8-bits. It supports:
✓ addition/subtraction with or without carry, bit-wise AND, OR, XOR
✓ arithmetic compare and bit-wise test operations
✓ shift and rotate operations
▪ Flags: C, Z, IE. They are stored in flip flops. This is different from the previous examples.
▪ Instruction Memory (IM): 1024 18-bit words. To optimize resources on an FPGA, this memory is implemented with BRAMs
instead of registers. So, there is a one-cycle delay when reading data.
✓ File for BRAM-based memory implementation (Artix-7 FPGA or 7-series PL): in_RAMgen.vhd
▪ Data Memory (DM): 64 8-bit words. It is called a ScratchPad memory. No delay: built out of decoder, register, and a MUX.
▪ Program Counter (PC): 10-bit. It supports up to 1024 instructions (0x000 to 0x3FF). When the PC reaches 0x3FF, it rolls
over to location 0x000. Computed jump instructions (like offset) are not supported, i.e., the Datapath does not control the
PC. The Stack Data might or might not be incremented by 1 by the Program Counter.
▪ Call/Return Stack: 31 locations (10-bit words). This allows the processor to handle nested subroutines.
▪ Interrupts: 5 cycles to respond and start ISR. An interrupt enable flag (IE) is required.
[Figure: PicoBlaze organization. CONTROL UNIT: Stack Control and Call/Return STACK (10-bit DI/DO), Branch Control, 10-bit PC (E_PC, sclr_PC), Instruction Memory (18-bit words, address IR[9..0]/PC, inputs IM_WE and IM_DI), 18-bit IR, and Instruction Decoder (inputs INT, Z, C, IE; outputs INT_ACK and the CONTROL_WORD, including FS and JA/CA). DATAPATH: Register File with 2^4 registers, 8-bit ALU (flags Z, C), Data Memory (6-bit address AO, DM_WE, DM_DI), and the I/O interface: PORT_ID, OUT_PORT, IN_PORT (8 bits each), READ_STROBE, WRITE_STROBE.]
CALL/RETURN STACK
▪ This LIFO structure allows for the implementation of functions (including nested functions) as well as handling interrupts.
▪ It stores up to 31 10-bit instruction addresses, enabling nested function calls up to 31 levels deep. Address: 5-bit pointer called Stack Pointer (SP) or Top of Stack (TOS). DI: input data. DO: output data.
[Figure: STACK block: inputs DI (10 bits), we, en, sclr; outputs DO (10 bits), SP (5 bits), empty, full.]
✓ At power-up: SP ← 31 (Stack is empty)
✓ If en = '1':
   If we = '1': SP ← SP-1, ST[SP] ← DI
   If we = '0': DO ← ST[SP], SP ← SP+1
  else: DO ← ST[SP]
✓ empty flag = 1 if SP = 31, else 0; full flag = 1 if SP = 0, else 0
▪ SP is 5 bits wide as SP ∈ [0,31]. Note that SP=31 means Stack is empty. Thus, there are only 31 addresses (0 to 30) where we can write data. ST[SP] denotes the contents of the Stack at address SP.
▪ The figure shows different stages of a Stack. When data is written onto the Stack, we say that we push a value onto it.
When data is read from the Stack, we say that we pop a value from it.
[Figure: Stack stages. Stack Empty: SP=31. With 3FA at address 30 and 2BA at address 29 (SP=29), writing DI=0FE on the Stack gives SP ← SP-1 = 28 and ST[28] ← 0FE. Reading from the Stack gives DO ← ST[28] and SP ← SP+1 = 29. Stack Full: SP=0.]
▪ Writing on the Stack (SP ← SP-1 followed by ST[SP] ← DI): To minimize delays, these operations are usually executed at the same time. That is, the hardware precomputes SP-1, and it executes ST[SP-1] ← DI and SP ← SP-1 simultaneously.
▪ Reading from Stack: To minimize delays, there is usually no latency on DO, i.e., DO is already showing the Top of the Stack.
▪ Subroutine Calls and Interrupt Event Handling:
✓ Call to Subroutine (or to ISR): To save PC on Stack, the Instruction Decoder issues we=en=1,sclr=0.
✓ Return from Subroutine (or from ISR): To restore PC from Stack, the Instruction Decoder issues we=0,en=1,sclr=0.
▪ PicoBlaze: The Call/Return Stack is implemented as a cyclic buffer. When the Stack is Full, it overwrites the oldest value.
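The push and pop rules of this section can be sketched as a small Python class. It is a behavioral illustration with hypothetical names (`CallReturnStack`, `push`, `pop`); raising an error on overflow or underflow is an assumption made here for clarity, and the cyclic overwrite mentioned for PicoBlaze is not modeled.

```python
class CallReturnStack:
    """LIFO with a 5-bit Stack Pointer; SP=31 means empty (illustrative)."""

    def __init__(self):
        self.st = [0] * 31            # writable addresses 0..30
        self.sp = 31                  # power-up (or sclr) value

    @property
    def empty(self) -> bool:
        return self.sp == 31

    @property
    def full(self) -> bool:
        return self.sp == 0

    def push(self, di: int):          # en=1, we=1
        if self.full:
            raise OverflowError("stack full (cyclic overwrite not modeled)")
        self.sp -= 1                  # SP <- SP-1
        self.st[self.sp] = di & 0x3FF # ST[SP] <- DI (10-bit word)

    def pop(self) -> int:             # en=1, we=0
        if self.empty:
            raise IndexError("stack empty")
        do = self.st[self.sp]         # DO <- ST[SP]
        self.sp += 1                  # SP <- SP+1
        return do


s = CallReturnStack()
for value in (0x3FA, 0x2BA, 0x0FE):   # same values as in the figure
    s.push(value)
print(s.sp)                           # 28
print(hex(s.pop()), s.sp)             # 0xfe 29
```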
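[Figure: PicoBlaze PC and datapath detail: 10-bit PC (E_PC, sclr_PC) with incrementer and a 4-input MUX controlled by JS (sources include JA/CA and 0x3FF); MUXes MA, MB, and MD; 8-bit ALU (FS; flags Z, C; buffers ZI and CI; IE set through SIE and LIE; INTP); I/O signals IN_PORT, PORT_ID, DO, RS, WS; 6-bit Data Memory address AO.]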
DATAPATH
▪ This datapath (see previous figure) includes: i) a Register File with 2^4 8-bit registers, ii) an ALU that stores the flags C and Z on flip flops, and iii) an I/O interface.
▪ The Datapath executes the microoperations required by an instruction based on the Control signals received from the
Instruction Decoder (ID):
✓ IE: We can set this flag bit to ‘1’ or ‘0’ at any time (using the signals SIE and LIE).
✓ Interrupt Handling: On an interrupt event, the datapath stores Z and C onto buffers (ZI and CI) and clears IE. On return from interrupt, Z and C are restored (with the ZI and CI values), and IE can be set to either '1' or '0'.
✓ I/O interface: READ_STROBE and WRITE_STROBE come from the Instruction Decoder (RS and WS).
▪ Register File: The architecture resembles that of the ‘Simple Computer’, except that there is only one output data bus.
▪ Arithmetic Logic Unit (ALU): The FS has 5 bits, and the following table lists all the possible operations. The input Data (A, B) and output data (Y) can be thought of as unsigned or signed integers. The exception is the subtraction operation, where A and B are unsigned integers. Here, the flags Z, C are generated.
✓ Here, the flags C and Z are stored in FFs each time the ALU executes an operation that changes C and Z. We can use
the C flag in our operations. If we need to use C, we grab the value from the flip flop.
✓ Note that unlike the ‘Simple Computer’ model, the C flag (carry out/borrow out) is an input to the ALU. We can then use
a specific instruction that uses the value of this C flag from the flip flop. This allows us to implement, for example, multi-
precision addition and subtraction.
✓ To handle interrupts, on an interrupt event we must store C and Z onto other flip flops CI and ZI. On return from interrupt, we restore the values of C and Z from CI and ZI. The flip flops CI and ZI are part of this ALU.
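As an illustration of why a stored C flag matters, a 16-bit addition can be built from two 8-bit ALU additions. The Python sketch below uses hypothetical helpers `add8` and `add16`; it models the arithmetic only, not the PicoBlaze instructions or their timing.

```python
def add8(a: int, b: int, cin: int = 0):
    """8-bit ALU addition: returns (result, carry out C, zero flag Z)."""
    total = a + b + cin
    result = total & 0xFF
    return result, total >> 8, int(result == 0)


def add16(x: int, y: int):
    """16-bit addition from two 8-bit additions; returns (sum, carry out)."""
    lo, c, _ = add8(x & 0xFF, y & 0xFF)          # plain add: produces C
    hi, c, _ = add8(x >> 8, y >> 8, cin=c)       # add with carry: consumes stored C
    return (hi << 8) | lo, c


print(hex(add16(0x12FF, 0x0001)[0]))             # 0x1300
```

The carry out of the low byte is held (in hardware, in the C flip flop) and fed into the high-byte addition, which is the step a carry-less ALU cannot do in a single instruction.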
▪ All instructions require two cycles. However, note that in this architecture, some operations (e.g.: compute the result of two registers, compute the result of a register and a constant, reading/writing from external ports) do require 2 cycles to update the results. On the other hand, other instructions (e.g.: reading and writing from Data Memory, clearing IE) take only 1 cycle. As a result, in this multi-cycle processor, we will see instructions that require either 1 or 2 microoperations to update registers, memory contents, and flags.
▪ This multi-cycle microprocessor has a uniform number of cycles (2) per instruction. Note that other processors can take a different number of cycles depending on the instruction, thereby making the control mechanism more complex.
INSTRUCTION SET
▪ Note that it is common practice to specify (design) the Instruction Set first and then build the Datapath based on the
instructions that need to be supported.
Instruction format:
▪ 18-bit instruction. The instruction format has different fields depending on the instruction type. PicoBlaze has 5 different
instruction types. Note that the Destination Register is given by sX.
✓ Register: Opcode, 2 Source Registers (sX, sY).
✓ Immediate: Opcode, 1 Source Register (sX), 8-bit immediate operand (OP).
✓ Single Register: Opcode, 1 Source Register (sX), 8-bit Opcode extension that further specifies the operation.
✓ Jump and Call: Opcode, 2-bit Opcode extension, 10-bit immediate operand (JA/CA).
✓ No Operand: Opcode, 12-bit Opcode extension.
Instruction fields (bit positions):
REGISTER:       OPCODE [17..12] | sX [11..8] (Source/Destination Register) | sY [7..4] (Source Register) | 0000 [3..0]
IMMEDIATE:      OPCODE [17..12] | sX [11..8] | OP [7..0] (Operand)
JUMP AND CALL:  OPCODE [17..12] | Opcode Ext [11..10] | JA/CA [9..0] (Address)
List of Instructions
▪ The table provides the description of the operation performed by each instruction, as well as the status bits affected by the
instruction. We provide the instructions in Assembly instruction format (mnemonic followed by literals).
✓ Constants Operands: kk = OP[7..0], ss = OP[5..0] (OP[7..6]=00), aaa = JA/CA[9..0].
✓ Addition: sX, sY, kk can be treated as unsigned or signed integers.
✓ Subtraction (unsigned): sX, sY, kk are treated as unsigned integers. C is interpreted as the borrow out (or borrow in).
▪ Status Bits (C, Z): They are stored in FFs. An instruction reads these bits when they are part of the operation. These bits can be updated after the instruction is executed. Z ← 1 if the result of the operation is 0. C ← 1 depending on the instruction.
▪ IN_PORT, OUT_PORT: I/O port names. PORT_ID: identifier or port address for an associated I/O operation.
▪ SP = TOS (top of the stack). ST[SP]: contents of the stack at address SP.
▪ Interrupts: CI, ZI: extra buffers to store C and Z when an interrupt hits so that we can restore them after the interrupt event. IE: Interrupt enable flag (also considered a status bit).
✓ On an interrupt event, the following occurs: CI ← C, ZI ← Z, IE ← 0, SP ← SP-1, ST[SP] ← PC, PC ← 3FF.
PROGRAMMING MODEL
▪ From the point of view of the programmer, PicoBlaze contains:
✓ 16 8-bit registers (s0-sF).
✓ 64-byte Data Memory (DM).
✓ 3 Status flags (C, Z, IE)
✓ Program Counter (PC): This 10-bit pointer handles a 1024-word Instruction Memory (IM).
✓ Stack Pointer (SP): this 5-bit pointer handles a 31-word Call/Return Stack.
▪ After an instruction is executed, the contents of these components are altered explicitly or implicitly.
SUBROUTINES (PicoBlaze)
▪ The CALL and RETURN instructions can implement a subroutine. Here, we need to interact with the Call/Return Stack.
▪ A subroutine call is started by the CALL instruction. The process goes as follows:
✓ The PC value is pushed on the Top of the Call/Return Stack: SP ← SP-1, ST[SP] ← PC
✓ The CALL instruction jumps to the start of a subroutine (aaa address): PC ← aaa
✓ Instructions in the subroutine are then executed until a RETURN instruction is reached. At that moment, the stored PC value is popped from Top of the Call/Return Stack, incremented by 1, and loaded onto PC: PC ← ST[SP]+1, SP ← SP+1
✓ The program returns to the instruction immediately after the original CALL instruction.
▪ Note that only PC is saved. The registers (s0-sF) and flags (C, Z) are not saved. Consider this when using subroutines.
▪ The following examples use unconditional CALL and RETURN instructions. This can be easily modified to allow conditional
CALL and RETURN instructions based on C and Z.
✓ Example: A subroutine call: Program Flow and Stack state.
Address  Program
...      main:   ...
004              and s1,5
005              add s1,s4
006              call myfun
007              xor s4,s3
...
030      myfun:  ...
...              ...
041              return
Stack state:
Before the call: SP=31 (Stack empty).
call myfun (PC=006): SP ← SP-1 = 30, ST[30] ← 006.
return (DO=006): DO ← ST[30] = 006, SP ← SP+1 = 31, PC ← 006+1 = 007.
✓ Example: Nested subroutine calls (2 levels): Program Flow and Stack state. Note how the Stack structure allows to save
(and restore) the PC values in the right order.
Address  Program
...      main:    ...
004               and s1,5
005               add s1,s4
006               call myfun1
007               xor s4,s3
...
030      myfun1:  ...
...               ...
037               call myfun2
038               load s0,s2
...               ...
04A               return
062      myfun2:  ...
...               ...
067               return
Stack state:
call myfun1 (PC=006): SP ← SP-1 = 30, ST[30] ← 006.
call myfun2 (PC=037): SP ← SP-1 = 29, ST[29] ← 037.
return from myfun2 (DO=037): DO ← ST[29] = 037, SP ← SP+1 = 30, PC ← 037+1 = 038.
return from myfun1 (DO=006): DO ← ST[30] = 006, SP ← SP+1 = 31, PC ← 006+1 = 007.
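The nested-call flow above can be traced with a few lines of Python. This is a sketch with hypothetical names (`trace`, `program`): only `call` and `return` are modeled, every other address is treated as an instruction that simply advances the PC, and the addresses are read as hexadecimal.

```python
def trace(program, start, stop):
    """Follow the PC from `start` until it reaches `stop`; return visited addresses."""
    stack, sp = [0] * 31, 31
    pc, visited = start, []
    while pc != stop:
        visited.append(pc)
        op, *arg = program.get(pc, ("nop",))
        if op == "call":
            sp -= 1
            stack[sp] = pc              # SP <- SP-1, ST[SP] <- PC
            pc = arg[0]                 # PC <- aaa
        elif op == "return":
            pc = stack[sp] + 1          # PC <- ST[SP]+1
            sp += 1                     # SP <- SP+1
        else:
            pc += 1
    return visited


program = {
    0x006: ("call", 0x030),             # main:   call myfun1
    0x037: ("call", 0x062),             # myfun1: call myfun2
    0x04A: ("return",),                 # end of myfun1
    0x067: ("return",),                 # end of myfun2
}
path = trace(program, start=0x004, stop=0x008)
print([f"{a:03X}" for a in path if a in (0x006, 0x037, 0x067, 0x038, 0x04A, 0x007)])
```

Because the most recent return address is always on top, the two RETURN instructions restore 038 and then 007, in that order.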
INTERRUPTS
▪ This is another mechanism to alter program execution. It is not initiated by an instruction, but rather by an external (to the microprocessor) request. When an interrupt event hits, the CPU stops normal program execution and performs some service (called Interrupt Service Routine) associated with the interrupt event, then returns to normal program execution.
▪ PicoBlaze™ provides a single interrupt signal INT. We can enable or disable this signal via the IE (interrupt enable) flag:
✓ To enable interrupts, we use the ENABLE INTERRUPT instruction (IE ← 1)
✓ To disable interrupts, we use the DISABLE INTERRUPT instruction (IE ← 0)
▪ Flow of an interrupt event:
✓ By default, after reset (input signal to the PicoBlaze), the INT input is disabled. We need to enable interrupts first.
✓ Once enabled, the INT signal must be asserted for at least two cycles in order to be recognized as an interrupt event.
✓ An interrupt forces PicoBlaze to (implicitly) execute the CALL 3FF instruction immediately after completing the instruction
being executed. The CALL 3FF instruction is a subroutine call to the last Instruction Memory (IM) location (0x3FF).
During the implicit execution of the CALL 3FF instruction, the following occurs:
Further interrupts are disabled: IE ← 0
Flags Z and C are saved in buffers: CI ← C, ZI ← Z
The PC value pointing to the instruction pre-empted by CALL 3FF is saved on the Stack: SP ← SP-1, ST[SP] ← PC
The PC value is updated: PC ← 3FF (updated at the same time its current value is placed onto the Stack).
✓ At 0x3FF, there is typically a JUMP instruction to a subroutine called the Interrupt Service Routine (ISR).
✓ The ISR ends with a RETURNI ENABLE or RETURNI DISABLE instruction. When it is executed, the PC value as well as the
Z and C values are restored, and IE is either enabled or disabled on exit. Once the interrupt process is finished,
the pre-empted instruction is executed.
▪ The figure shows the flow of an interrupt event for a program example.
✓ Note that each PC value lasts two clock cycles and is available one cycle before the corresponding instruction. This is
because pBlaze uses an Instruction Memory (BRAM) that takes a cycle to generate output data.
✓ INT signal: If asserted, it is recognized on the next clock edge. However, it must be asserted for at least two
clock cycles so that an FSM (inside the Instruction Decoder) detects INT on two clock edges and then generates a
one-cycle pulse INT_P (during the first cycle of CALL 3FF). This pulse is used to save Z, C, and PC, as well as to update
PC (PC ← 3FF) and clear IE. So, even though CALL 3FF does not appear on IR, it is implicitly executed via INT_P.
✓ INT_ACK: This is a delayed version of INT_P. INT_ACK is asserted on the second cycle of the CALL 3FF instruction to
indicate that the interrupt was recognized. INT_ACK can be used by an external interface to clear external interrupts.
✓ The RETURNI instruction restores PC (PC ← ST[SP], SP ← SP+1). Unlike the RETURN instruction, the restored PC
value from the Stack is not increased by 1. This is because the INT_P pulse saved the PC of the pre-empted instruction
XOR s3,9 (0x008), which now needs to be executed.
✓ Note that interrupt processing takes 5 cycles, measured from the moment the interrupt is recognized until the ISR starts.
▪ The starting address of the ISR (known as the Interrupt Vector) is stored in a particular memory location. This location is
known as the Vector Address. In the PicoBlaze, the Interrupt Vector is located at 0x3FF.
▪ Unlike other processors, note that in PicoBlaze only PC and the Z, C flags are saved during an interrupt process.
[Figure: flow of an interrupt event for a program example. Program listing and timing diagram.]
   Program:
      ...  main:  ...
      ...         ...
      005         enable interrupt
      006         add s5,03
      007         sub s4,05
      008         xor s3,9          (pre-empted instruction)
      009         input s5,02
      ...         ...
      071  isr:   test sA,7
      ...         ...
      07A         returni enable
      ...         ...
      3FF         jump isr
   Timing diagram (signals shown: clock, INT, INT_P, INT_ACK):
      IR[17..0]:  sub s4,05 | xor s3,9 | jump isr | test sA,7
      PC:         007 | 008 | 3FF | 071 | 072
      Annotations: interrupt asserted; interrupt recognized; ISR starts 5 clock cycles after the interrupt is recognized.
   call 3FF is implicitly executed (in one cycle):
      SP ← SP-1=30, ST[30] ← PC=008, PC ← 3FF, ZI ← Z, CI ← C, IE ← 0
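The register-level effects of an interrupt event can be summarized in a small behavioral model. This is a hedged Python sketch, not the PicoBlaze implementation: the class `InterruptModel` and its method names are hypothetical, the initial SP=31 is taken from the figure, and cycle timing (INT_P, INT_ACK, the 5-cycle latency) is deliberately not modeled.

```python
class InterruptModel:
    """Behavioral sketch of interrupt entry/exit (timing is not modeled)."""

    VECTOR = 0x3FF  # last Instruction Memory location

    def __init__(self):
        self.pc = 0
        self.sp = 31        # assumed initial SP, as in the figure
        self.st = {}        # Call/Return Stack
        self.z = 0
        self.c = 0
        self.zi = 0         # buffer for Z
        self.ci = 0         # buffer for C
        self.ie = 0         # interrupts are disabled after reset

    def enable_interrupt(self):
        self.ie = 1         # IE <- 1

    def disable_interrupt(self):
        self.ie = 0         # IE <- 0

    def interrupt(self):
        """Implicit CALL 3FF; returns False if interrupts are disabled."""
        if not self.ie:
            return False
        self.ie = 0                      # IE <- 0
        self.ci, self.zi = self.c, self.z  # CI <- C, ZI <- Z
        self.sp -= 1                     # SP <- SP-1
        self.st[self.sp] = self.pc       # ST[SP] <- PC (pre-empted instruction)
        self.pc = self.VECTOR            # PC <- 3FF
        return True

    def returni(self, enable):
        """RETURNI ENABLE/DISABLE: restored PC is NOT incremented."""
        self.pc = self.st[self.sp]       # PC <- ST[SP]
        self.sp += 1                     # SP <- SP+1
        self.c, self.z = self.ci, self.zi  # restore flags
        self.ie = 1 if enable else 0


if __name__ == "__main__":
    cpu = InterruptModel()
    cpu.enable_interrupt()
    cpu.pc, cpu.z, cpu.c = 0x008, 1, 0   # about to execute xor s3,9
    cpu.interrupt()
    print(f"PC={cpu.pc:03X} SP={cpu.sp} IE={cpu.ie}")
    cpu.z, cpu.c = 0, 1                  # the ISR changes the flags
    cpu.returni(enable=True)
    print(f"PC={cpu.pc:03X} Z={cpu.z} C={cpu.c} IE={cpu.ie}")
```

In this model, returning from the ISR brings PC back to 008 (the pre-empted instruction) rather than 009, which is the difference between RETURNI and RETURN described above.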
▪ I/O Interface: pBlaze can read/write data from/to external ports via this interface that consists of 5 signals.
✓ I/O Interface signals:
IN_PORT, OUT_PORT: 8-bit I/O ports connected to an external interface.
PORT_ID: 8-bit port identifier (or port address) for an associated I/O operation. It is valid for 2 clock cycles: this
allows enough time for an interface to respond and for reading data from a synchronous RAM.
READ_STROBE, WRITE_STROBE: These signals are associated with the read and write operations.
✓ INPUT operation:
INPUT sX, kk:      PORT_ID ← kk, sX ← IN_PORT
INPUT sX, (sY):    PORT_ID ← sY, sX ← IN_PORT
When the INPUT instruction appears on IR, PORT_ID is issued, and it is valid for two clock cycles. After those two
cycles, sX captures data on IN_PORT.
IN_PORT: It is connected to an external interface that allows selection (usually via a multiplexer) of up to 256 different
input sources (selected by PORT_ID).
READ_STROBE: It is asserted on the second cycle of the two-cycle INPUT instruction cycle. It is used to indicate
that pBlaze has acquired data (i.e., acknowledges receipt of data).
Example: INPUT s0,2.
Note how PORT_ID appears right after the instruction is issued. Then the external circuit has two cycles to provide
an input value (hence we can insert a register before IN_PORT to improve performance). Data is captured on the
rising edge right after the second clock cycle of the instruction.
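[Figure: timing diagram for INPUT s0,2 showing clock, IR[17..0] (INPUT s0,2), PORT_ID (0x02), IN_PORT, and register s0
(DATA), together with the 8-bit connections between pBlaze and the external interface.]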
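The selection of an input source by PORT_ID can be illustrated with a short Python sketch. It models only the data path (the external multiplexer and the register write), not the two-cycle timing or READ_STROBE. The function names and the choice to return 0 for an unconnected port are assumptions for illustration.

```python
def read_in_port(port_id, sources):
    """External multiplexer: PORT_ID selects one of up to 256 input sources.

    'sources' maps port identifiers to 8-bit values. Returning 0 for an
    unconnected port is an assumption of this sketch.
    """
    if not 0 <= port_id <= 0xFF:
        raise ValueError("PORT_ID is an 8-bit value")
    return sources.get(port_id, 0) & 0xFF


def input_instr(regs, sx, port_id, sources):
    """INPUT sX, kk: PORT_ID <- kk, sX <- IN_PORT."""
    regs[sx] = read_in_port(port_id, sources)


if __name__ == "__main__":
    regs = {"s0": 0x00}
    sources = {0x02: 0xA5, 0x03: 0x17}     # hypothetical input devices
    input_instr(regs, "s0", 0x02, sources)  # INPUT s0,2
    print(f"s0 = 0x{regs['s0']:02X}")
```

For INPUT sX, (sY), the same helper applies with `port_id` taken from the contents of sY.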
✓ OUTPUT operation:
OUTPUT sX, kk:     PORT_ID ← kk, OUT_PORT ← sX
OUTPUT sX, (sY):   PORT_ID ← sY, OUT_PORT ← sX
When the OUTPUT instruction appears on IR, PORT_ID and OUT_PORT are issued and they are valid for two clock
cycles. After those two cycles, that data is captured by the external interface.
OUT_PORT: It is connected to an external interface where a decoder is commonly used (with PORT_ID) to route data
onto a specific destination (up to 256 storage locations, e.g., registers).
WRITE_STROBE: It is asserted on the second cycle of the two-cycle OUTPUT instruction cycle. It indicates data is
valid and ready for capture. The external interface can use it as an enable to capture data.
Example: OUTPUT s1,(s9)
Note that PORT_ID and OUT_PORT appear immediately after the instruction. Then the external circuit has two
cycles to capture the output (hence we can include a pipelining stage in the decoder to improve performance). Data
is captured on the rising edge right after the second clock cycle of the instruction.
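[Figure: timing diagram and block diagram for the OUTPUT operation showing clock, PORT_ID, OUT_PORT (labelled "Contents
of s0"), and WRITE_STROBE. A decoder driven by PORT_ID and enabled (E) by WRITE_STROBE drives the enable (E) of the
external register, which captures the 8-bit OUT_PORT value.]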
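The role of WRITE_STROBE as a capture enable can be sketched as follows. This Python model is an illustration under stated assumptions: each call to `clock_edge` stands for the rising edge that ends one clock cycle, the external registers are a plain dictionary indexed by port address, and all function names are hypothetical.

```python
def clock_edge(ext_regs, port_id, out_port, write_strobe):
    """One rising clock edge seen by the external interface.

    The decoder (driven by PORT_ID) selects a register; the register
    captures OUT_PORT only when WRITE_STROBE is asserted.
    """
    if write_strobe and port_id in ext_regs:
        ext_regs[port_id] = out_port & 0xFF


def output_instr(ext_regs, port_id, value):
    """OUTPUT sX, kk: PORT_ID and OUT_PORT are held for two cycles.

    WRITE_STROBE is low in the first cycle and high in the second, so
    data is captured on the edge that ends the second cycle.
    Returns a snapshot of the registers after the first cycle.
    """
    clock_edge(ext_regs, port_id, value, write_strobe=0)  # end of cycle 1
    after_first_cycle = dict(ext_regs)
    clock_edge(ext_regs, port_id, value, write_strobe=1)  # end of cycle 2
    return after_first_cycle


if __name__ == "__main__":
    ext_regs = {0x04: 0x00, 0x05: 0x00}    # hypothetical external registers
    snapshot = output_instr(ext_regs, 0x04, 0x3C)
    print(snapshot[0x04], ext_regs[0x04])
```

Only the register addressed by PORT_ID changes, and only on the second edge; other registers keep their values.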