COE 485 Sem 1, 2023
Advanced Computer Architecture
MIPS Pipelining
Pipelining
• Overlapped execution of instructions
• Instruction level parallelism (concurrency)
• Example pipeline: assembly line
• Response time (latency) of an individual instruction is not reduced
• Instruction throughput increases
• Speedup ≈ number of steps (stages)
– Reality: Pipelining introduces overhead
Pipelining
• Start work ASAP!! Do not waste time!
[Figure: laundry analogy – two timelines from 6 PM to 2 AM showing several
loads of laundry (A, B, …): first done one after the other (not pipelined),
then with their steps overlapped (pipelined)]
Assume 30 min. for each task – wash, dry, fold, store – and that
separate tasks use separate hardware and so can be overlapped
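With those assumptions the benefit can be worked out directly (a quick
back-of-the-envelope check, not spelled out on the slide, assuming four loads
of laundry as in the 6 PM–2 AM timeline):
– Not pipelined: each load takes 4 × 30 min = 2 hours, so 4 loads take 4 × 2 h = 8 hours
– Pipelined: a new load starts every 30 min, so 4 loads finish in 2 h + 3 × 0.5 h = 3.5 hours
– Speedup = 8 / 3.5 ≈ 2.3, and it approaches 4 (the number of steps) as the number of loads grows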
Pipelining Example
• Assume: One instruction format (easy)
• Assume: Each instruction has 3 steps S1..S3
• Assume: Pipeline has 3 segments (one/step)
                Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
Instruction 1     S1        S2        S3
Instruction 2               S1        S2        S3
Instruction 3                         S1        S2        S3
(step Sk is carried out in segment #k; a new instruction enters the
pipeline every cycle, so from Cycle 3 onward all three segments are busy)
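In general, a pipeline with k segments needs k + (n − 1) cycles to finish n
instructions (a quick check against the table above, not part of the original
slide): here k = 3 and n = 3, so 3 + (3 − 1) = 5 cycles – which is why the
table ends at Cycle 5 – versus 3 × 3 = 9 cycles if the instructions were
executed one after another.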
Pipelined vs. Single-Cycle
Instruction Execution: the Plan
[Figure: single-cycle execution of
lw $1, 100($0)
lw $2, 200($0)
lw $3, 300($0)
– each instruction passes through instruction fetch, register read, ALU,
data access, and register write, and occupies a full 8 ns clock cycle
before the next instruction can start]
Assume 2 ns for a memory access or ALU operation and 1 ns for a register access:
therefore, the single-cycle clock is 8 ns and the pipelined clock cycle is 2 ns.
[Figure: pipelined execution of the same three lw instructions – the
instruction fetch, register read, ALU, data access, and register write
stages are overlapped, and a new instruction starts every 2 ns]
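Using the caption's numbers (a quick check, not on the original slide): the
three loads take 3 × 8 ns = 24 ns on the single-cycle machine. Pipelined, the
first lw completes after 5 × 2 ns = 10 ns and the next two follow at 12 ns and
14 ns, so the whole sequence finishes in 14 ns. Note that the latency of one
lw actually grows from 8 ns to 10 ns; the gain is throughput – one instruction
completing every 2 ns once the pipeline is full.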
Pipelining: Keep in Mind
• Pipelining does not reduce latency of a single task,
it increases throughput of entire workload
• Pipeline rate limited by longest stage
– potential speedup = number of pipe stages
– unbalanced lengths of pipe stages reduce speedup
• Time to fill the pipeline and time to drain it – while the
pipeline is not completely full – reduce speedup
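A rough model that captures these effects (a sketch, assuming k equal-length
stages of t ns each, n instructions, and no stalls):
– Time without pipelining ≈ n × k × t
– Time with pipelining = (k + n − 1) × t
– Speedup = n × k / (k + n − 1)
For k = 5 and n = 100 this gives 500 / 104 ≈ 4.8: close to, but never
reaching, the stage count of 5, and lower still for short instruction
sequences where fill and drain dominate.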
Pipelining MIPS
• What makes it easy with MIPS?
– all instructions are same length
• so fetch and decode stages are similar for all instructions
– just a few instruction formats
• simplifies instruction decode and makes it possible in one stage
– memory operands appear only in load/stores
• so memory access can be deferred to exactly one later stage
– operands are aligned in memory
• one data transfer instruction requires one memory access stage
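For reference (a standard MIPS encoding detail, not spelled out on the slide),
the formats needed by this subset are both 32 bits wide with the source
register fields in fixed positions:
– R-format (add, sub, and, or, slt): op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)
– I-format (lw, sw, beq):            op(6) rs(5) rt(5) immediate(16)
Because rs and rt always occupy the same bit positions, the register file can
be read in parallel with the rest of instruction decode.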
Pipelining MIPS
• What makes it hard?
– structural hazards: different instructions, at different stages in the
pipeline, want to use the same hardware resource
– control hazards: the next instruction to be put into the pipeline depends
on the outcome of a previous branch instruction already in the pipeline
– data hazards: an instruction in the pipeline requires data to be
computed by a previous instruction still in the pipeline
• Before actually building the pipelined datapath and control,
we first briefly examine these potential hazards individually…
Hazards (1/2)
• Structural
– Different instructions trying to use the same
functional unit (e.g. memory, register file)
– Solution: duplicate hardware
• Control (branches)
– Target address known only at the end of the 3rd
pipeline stage => stalls
– Solutions
• Prediction (static and dynamic): Loops
• Delayed branches
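As a concrete sketch of a delayed branch (registers and label chosen only for
illustration): MIPS defines one branch delay slot, and the instruction placed
in it executes whether or not the branch is taken, so the compiler tries to
fill it with useful, branch-independent work.
    beq  $t0, $t1, Target   # branch outcome known only later in the pipeline
    add  $t2, $t3, $t4      # delay slot: executes whether or not the branch is taken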
Hazards (2/2)
• Data hazards
– Dependency: Instruction depends on the result
of a previous instruction still in the pipeline
Add $s0, $t0, $t1
Sub $t2, $s0, $t3
– Stall: add three bubbles (no-ops) to the pipeline
– Solution: forwarding (route a result from a later pipeline
stage back to the earlier stage that needs it)
• MEM => EX
• EX => EX
– Code reordering to avoid stalls
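A concrete sketch of reordering (registers chosen only for illustration;
timing assumes the 5-stage MIPS pipeline with forwarding): a value loaded by
lw is not available until the end of MEM, so an immediately following use
stalls for one cycle even with forwarding. Moving an independent instruction
into that slot removes the stall.
    # before: one stall between lw and add
    lw   $t1, 0($t0)
    add  $t2, $t1, $t3      # uses $t1 too early: load-use stall
    sw   $t4, 4($t0)        # independent of $t1 and $t2
    # after reordering: no stall
    lw   $t1, 0($t0)
    sw   $t4, 4($t0)        # fills the load-use slot
    add  $t2, $t1, $t3      # $t1 forwarded from MEM in time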
Pipelined Datapath
• We now move to actually building a pipelined datapath
• First recall the 5 steps in instruction execution
1. Instruction Fetch & PC Increment (IF)
2. Instruction Decode and Register Read (ID)
3. Execution or calculate address (EX)
4. Memory access (MEM)
5. Write result into register (WB)
• Review: single-cycle processor
– all 5 steps done in a single clock cycle
– dedicated hardware required for each step
• What happens if we break the execution into multiple
cycles, but keep the extra hardware?
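To make the overlap concrete, here is a sketch (instructions chosen only for
illustration) of three instructions flowing through the five stages, one stage
per clock cycle CC1..CC7:
                       CC1   CC2   CC3   CC4   CC5   CC6   CC7
    lw  $t1, 0($t0)    IF    ID    EX    MEM   WB
    sw  $t2, 4($t0)          IF    ID    EX    MEM   WB
    add $t3, $t4, $t5              IF    ID    EX    MEM   WB
Every instruction still takes five cycles, but one instruction completes in
every cycle from CC5 onward.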
MIPS Pipeline
• MIPS subset
– Memory access: lw and sw
– Arithmetic and logic: add, sub, and, or, slt
– Branch: beq
• Steps (pipeline segments)
– IF: fetch instruction from memory
– ID: decode instruction and read registers
– EX: execute the operation or calculate address
– MEM: access an operand in data memory
– WB: write the result into a register
Designing ISA for Pipelining
• Instructions are all the same length
– Easy IF and ID
– Similar to multicycle datapath
• Few but consistent instruction formats
– Register IDs in the same place (rd, rs, rt)
– Decoding and register reading at the same time
• Memory operand only in lw and sw
• Operands are aligned in memory
Recall: Single-cycle Datapath
[Figure: single-cycle MIPS datapath]
Pipeline Representation
[Figure: pipeline representation of instruction execution]
Pipelined Datapath
[Figure: datapath partitioned into the IF, ID, EX, MEM, and WB stages]
Conclusions
• Pipelining improves efficiency by:
– Regularizing instruction format => simplicity
– Partitioning each instruction into steps
– Making each step have about the same work
– Keeping the pipeline almost always full (occupied) to
maximize processor throughput
• Pipeline control is complicated
– Forwarding
– Hazard detection and avoidance
Next: Pipeline operation