Pipeline - 1
1. Consider executing the following code on the pipelined datapath of Figure - I:
    add $2, $3, $1
    sub $4, $3, $5
    add $5, $3, $7
    add $7, $6, $1
    add $8, $2, $6
At the end of the fifth cycle of execution, which registers are being read and which register will be
written?
2. With regard to the program in problem 1 above, explain what the forwarding unit is doing
during the fifth cycle of execution. If any comparisons are being made, mention them.
3. Again with regard to the program in problem 1, explain what the hazard detection unit is doing
during the fifth cycle of execution. If any comparisons are being made, mention them.
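The comparisons asked about in problems 2 and 3 can be sketched in C. This is a minimal sketch that assumes the usual textbook formulation of a five-stage MIPS pipeline, not the exact wiring of Figure - I. The struct `PipelineRegs`, its field names, and the functions `forward_select` and `load_use_stall` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Pipeline-register fields inspected by the two units.
   All names are assumptions modelled on common textbook notation. */
typedef struct {
    int  if_id_rs, if_id_rt;      /* source registers of the instruction in ID  */
    int  id_ex_rs, id_ex_rt;      /* source registers of the instruction in EX  */
    bool id_ex_mem_read;          /* true if the instruction in EX is a load    */
    int  ex_mem_rd;               /* destination of the instruction in MEM      */
    bool ex_mem_reg_write;        /* true if that instruction writes a register */
    int  mem_wb_rd;               /* destination of the instruction in WB       */
    bool mem_wb_reg_write;        /* true if that instruction writes a register */
} PipelineRegs;

/* Forwarding unit: compare one EX-stage source register against the
   destinations held in EX/MEM and MEM/WB.
   Returns 2 to forward from EX/MEM, 1 to forward from MEM/WB,
   and 0 to use the value read from the register file.
   EX/MEM is checked first because it holds the more recent result. */
int forward_select(const PipelineRegs *p, int src)
{
    assert(src >= 0 && src < 32);
    if (p->ex_mem_reg_write && p->ex_mem_rd != 0 && p->ex_mem_rd == src)
        return 2;
    if (p->mem_wb_reg_write && p->mem_wb_rd != 0 && p->mem_wb_rd == src)
        return 1;
    return 0;
}

/* Hazard detection unit: stall only when the instruction in EX is a load
   whose destination (rt) matches a source of the instruction in ID. */
bool load_use_stall(const PipelineRegs *p)
{
    return p->id_ex_mem_read &&
           (p->id_ex_rt == p->if_id_rs || p->id_ex_rt == p->if_id_rt);
}
```

As a hypothetical case unrelated to the program above, if `sub $4, $2, $5` is in EX directly behind `add $2, $3, $1` in MEM, `forward_select` returns 2 for source register 2 and 0 for source register 5. To apply the sketch to the exercise, fill in the fields from the instructions that occupy ID, EX, MEM, and WB during cycle 5.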
Figure - I
Pipeline - 2 (continued)
1. This exercise is intended to help you understand the relationship between
delay slots, control hazards, and branch execution in a pipelined processor. In
this exercise, we assume that the following MIPS code is executed on a
pipelined processor with a five-stage pipeline, full forwarding, and a predict-taken
branch predictor:
a.
    Label1: lw  $1, 40($6)
            beq $2, $3, Label2    : Taken
            add $1, $6, $4
    Label2: beq $1, $2, Label1    : Not taken
            sw  $2, 20($4)
            and $1, $1, $4
b.
            add $1, $5, $3
    Label1: sw  $1, 0($2)
            add $2, $2, $3
            beq $2, $4, Label1    : Not taken
            add $5, $5, $1
            sw  $1, 0($2)
Draw the pipeline execution diagram for this code, assuming there are no delay slots
and that branches execute in the EX stage.
2. Assume that we have a multiple-issue pipelined processor with the following number
of pipeline stages, instructions issued per cycle, stage in which branch outcomes are
resolved, and branch predictor accuracy:
    Pipeline depth   Issue width   Branches execute in stage   Branch predictor accuracy   Branches as a % of instructions
a.  10               4             7                           80%                         20%
b.  25                             17                          92%                         25%
Control hazards can be eliminated by adding branch delay slots. How many delay slots
must follow each branch if we want to eliminate all control hazards in this processor?
3. This exercise examines how exception handling interacts with branch and load/store
instructions. Problems in this exercise refer to the following branch instruction and the
corresponding delay slot instruction:
    Branch and delay slot
a.  beq $1, $0, Label
    sw  $6, 50($1)
b.  beq $5, $0, Label
    nor $5, $4, $3
a. Assume that this branch is correctly predicted as taken, but then the instruction
at Label is an undefined instruction. Describe what is done in each pipeline stage
for each cycle, starting with the cycle in which the branch is decoded up to the
cycle in which the first instruction of the exception handler is fetched.
b. Repeat Exercise 3.1, but this time assume that the instruction in the delay slot
also causes a hardware error exception when it is in the MEM stage.
c. What is the value in the EPC if the branch is taken but the delay slot causes
an exception? What happens after the execution of the exception handler is
completed?
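For problem 2 above (the number of delay slots needed to eliminate control hazards), the counting argument can be sketched in C. This is a minimal sketch under two stated assumptions: the branch outcome becomes available at the end of the stage in which the branch executes, so the first correct-path fetch happens in the following cycle, and every cycle fetches a full issue group. The function `delay_slots` and its parameter `pos_in_group` (the zero-based position of the branch inside its own issue group) are hypothetical and do not come from the exercise.

```c
#include <assert.h>

/* Number of instructions fetched after a branch before its outcome can
   redirect the fetch stage.
   resolve_stage: pipeline stage (1-based) in which the branch executes.
   issue_width:   instructions fetched per cycle.
   pos_in_group:  0-based position of the branch within its issue group
                  (assumption: later slots of the same group also follow
                  the branch and therefore count as delay slots). */
int delay_slots(int resolve_stage, int issue_width, int pos_in_group)
{
    assert(resolve_stage >= 1 && issue_width >= 1);
    assert(pos_in_group >= 0 && pos_in_group < issue_width);

    /* Full groups fetched while the branch moves from stage 1
       to its resolution stage. */
    int later_groups = (resolve_stage - 1) * issue_width;

    /* Instructions that share the branch's own group but come after it. */
    int same_group = issue_width - 1 - pos_in_group;

    return later_groups + same_group;
}
```

Under these assumptions, a single-issue pipeline that resolves branches in stage 3 gives `delay_slots(3, 1, 0) == 2`. For row a, the result depends on where the branch sits in its group: `delay_slots(7, 4, 3)` gives 24 and `delay_slots(7, 4, 0)` gives 27.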
Cache
1. Why might a compiler perform the following optimization?
/* Before */
for (j = 0; j < 20; j++)
    for (i = 0; i < 200; i++)
        x[i][j] = x[i][j] + 1;

/* After */
for (i = 0; i < 200; i++)
    for (j = 0; j < 20; j++)
        x[i][j] = x[i][j] + 1;
2. Cache C1 is direct-mapped with 16 one-word blocks. Cache C2 is direct-mapped with 4
four-word blocks. Assume that the miss penalty for C1 is 8 memory bus clock cycles and
the miss penalty for C2 is 11 memory bus clock cycles. Assuming that the caches are
initially empty, find a reference string for which C2 has a lower miss rate but spends more
memory bus clock cycles on cache misses than C1. Use word addresses.
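Candidate reference strings for problem 2 can be checked with a small direct-mapped cache model. The following is a minimal sketch in C, assuming word addresses, an initially empty cache, and a fixed penalty per miss. The names `count_misses`, `miss_cycles`, and `MAX_BLOCKS` are hypothetical and were chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 64   /* upper bound on cache lines in this sketch */

/* Count misses for a sequence of word addresses on an initially empty
   direct-mapped cache with num_blocks lines of words_per_block words. */
int count_misses(const unsigned *addrs, size_t n,
                 unsigned num_blocks, unsigned words_per_block)
{
    assert(num_blocks >= 1 && num_blocks <= MAX_BLOCKS);
    assert(words_per_block >= 1);

    unsigned stored[MAX_BLOCKS];      /* block address held by each line */
    int valid[MAX_BLOCKS] = {0};      /* every line starts out empty     */
    int misses = 0;

    for (size_t k = 0; k < n; k++) {
        unsigned block_addr = addrs[k] / words_per_block;
        unsigned index = block_addr % num_blocks;

        /* The full block address is stored in place of a tag, which is
           sufficient to decide whether the line holds the right block. */
        if (!valid[index] || stored[index] != block_addr) {
            misses++;
            valid[index] = 1;
            stored[index] = block_addr;
        }
    }
    return misses;
}

/* Memory bus clock cycles spent on misses, given a fixed miss penalty. */
int miss_cycles(int misses, int penalty)
{
    return misses * penalty;
}
```

For the string 0, 1, 2, 3, the model reports 4 misses for C1 (`count_misses(refs, 4, 16, 1)`, 32 cycles at a penalty of 8) and 1 miss for C2 (`count_misses(refs, 4, 4, 4)`, 11 cycles at a penalty of 11). That string does not satisfy the exercise, since C2 wins on both measures, but the same two calls can be used to test other strings.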