18116029

This document contains the answers to four questions regarding assembly language programs, pipelining, and processor stage latencies. For the first question, it lists all the dependencies in a sample assembly program. For the second question, it calculates speedup for single-cycle and pipelined processors with and without stalls. For the third question, it analyzes the timing and cycles required to execute a loop in a pipelined processor with and without forwarding. For the fourth question, it calculates clock cycle times and speedup for single-cycle and pipelined processors given stage latencies, and determines optimal stage groupings for 3-stage and 6-stage pipelines.


CSN-221-Assignment-4

GURPREET SINGH-18116029
22 October 2019

1 Question
Consider the following assembly language program.
I1: MOV R3, R7
I2: LD R8, [R3]
I3: ADD R3, R3, 4
I4: LOAD R9, [R3]
I5: BNE R8, R9, I3
List all the dependencies in this code.

Answer

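True Dependency - RAW -

I1 => I2 (R3)
I2 => I5 (R8)
I1 => I3 (R3)
I3 => I4 (R3)
I4 => I5 (R9)

Output Dependency - WAW -

I1 => I3 (R3)

Anti-Dependency (False Dependency) - WAR -

I2 => I3 (R3)

These are the dependencies for a single pass from I1 to I5. Because I5 branches
back to I3, later iterations also carry dependencies across the loop (for
example, I3 => I3, RAW on R3), which are not listed here.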
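The classification above can be checked mechanically. The following is a minimal sketch, assuming each instruction is described by hand-transcribed sets of written and read registers (the `PROGRAM` table and the helper name `find_dependencies` are illustrative, not part of the assignment). A pair is reported only if no instruction in between overwrites the register, and only a single straight-line pass is considered, so loop-carried dependencies are ignored.

```python
# Each entry: (label, registers written, registers read).
# The sets are transcribed by hand from the program in the question.
PROGRAM = [
    ("I1", {"R3"}, {"R7"}),        # MOV R3, R7
    ("I2", {"R8"}, {"R3"}),        # LD R8, [R3]
    ("I3", {"R3"}, {"R3"}),        # ADD R3, R3, 4
    ("I4", {"R9"}, {"R3"}),        # LOAD R9, [R3]
    ("I5", set(), {"R8", "R9"}),   # BNE R8, R9, I3
]


def find_dependencies(program):
    """Return (raw, waw, war) lists of (earlier, later, register) tuples."""
    raw, waw, war = [], [], []
    for j, (later, w_later, r_later) in enumerate(program):
        for i in range(j):
            earlier, w_earlier, r_earlier = program[i]
            # Registers overwritten strictly between the two instructions.
            overwritten = set()
            for _, w_mid, _ in program[i + 1:j]:
                overwritten |= w_mid
            for reg in sorted((w_earlier & r_later) - overwritten):
                raw.append((earlier, later, reg))   # read after write
            for reg in sorted((w_earlier & w_later) - overwritten):
                waw.append((earlier, later, reg))   # write after write
            for reg in sorted((r_earlier & w_later) - overwritten):
                war.append((earlier, later, reg))   # write after read
    return raw, waw, war


raw, waw, war = find_dependencies(PROGRAM)
print("RAW:", raw)
print("WAW:", waw)
print("WAR:", war)
```

The "overwritten in between" filter is what keeps I1 => I4 out of the RAW list: I4 reads the value of R3 produced by I3, not the one produced by I1.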

2 Question
We have a single-stage, non-pipelined machine and a pipelined machine with 5
stages. The cycle time of the former is 5 ns and that of the latter is 1 ns.
a. Assuming no stalls, what is the speedup of the pipelined machine over the
single-stage machine?
b. Given that the pipeline stalls 1 cycle for 40% of the instructions, what is
the speedup now?

Answer

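a)
Let the number of instructions be n.
Time on the single-stage machine = n x 5 ns
Time on the pipelined machine = (5 + n - 1) x 1 ns
Speedup = 5n/(5+n-1) = 5n/(4+n)
When the number of instructions is very large (taking the limit n -> infinity),
speedup = 5
b) Average CPI = 1 + 0.4 x 1 = 1.4
Speedup = 5n/1.4n = 3.57 (for large n, ignoring the 4 cycles needed to fill the pipeline)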
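The two results can be reproduced numerically. This is a small sketch, assuming the timing model used in the answer: the non-pipelined machine takes `n` cycles of 5 ns, and the pipelined machine takes `stages - 1` fill cycles plus `n` issue cycles, each stretched by the average stall penalty. The function name and parameters are illustrative.

```python
def pipeline_speedup(n, stages=5, t_single=5.0, t_pipe=1.0,
                     stall_fraction=0.0, stall_cycles=1):
    """Speedup of the pipelined machine over the single-stage machine for n instructions."""
    time_single = n * t_single
    # (stages - 1) cycles to fill the pipeline, then one instruction per
    # effective CPI cycles, where CPI = 1 + stall_fraction * stall_cycles.
    cycles_pipe = (stages - 1) + n * (1 + stall_fraction * stall_cycles)
    return time_single / (cycles_pipe * t_pipe)


for n in (1, 10, 1000, 10**9):
    print(n, round(pipeline_speedup(n), 4),
          round(pipeline_speedup(n, stall_fraction=0.4), 4))
```

For small `n` the fill cycles dominate (a single instruction gets no speedup at all); as `n` grows the values approach 5 and 5/1.4.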

3 Question
Use the following code fragment.
I1: Loop: LD R1, 0[R2]
I2: DADDI R1, R1, 1
I3: SD 0[R2], R1
I4: DADDI R2, R2, 4
I5: DSUB R4, R3, R2
I6: BNEZ R4, Loop

a. List all the True RAW data dependencies.

b. Show the timing of this instruction sequence for a 5-stage pipeline along
with the number of cycles required to execute one iteration of the loop with no
forwarding.

c. Show the timing of this instruction sequence for a 5-stage pipeline along
with the number of cycles required to execute one iteration of the loop with
forwarding.
Assume registers can be written and read in the same cycle, during write back.
(The number of cycles for the execution of one iteration of the loop ends after
the A (ALU) stage of BNEZ instruction.)

Answer :

a) RAW Dependencies [Total = 4] :

I1 => I2 (R1)
I2 => I3 (R1)
I4 => I5 (R2)
I5 => I6 (R4)

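b) 16 Cycles

Without forwarding, a dependent instruction can decode only in the cycle in
which its producer writes back.
F = fetch, D = decode, E = execute (the A/ALU stage), M = memory, W = write back, S = stall.

Cycle     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
LD        F  D  E  M  W
DADDI        F  S  S  D  E  M  W
SD                    F  S  S  D  E  M  W
DADDI                          F  D  E  M  W
DSUB                              F  S  S  D  E  M  W
BNEZ                                       F  S  S  D  E
outside                                             F  S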

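c) 9 Cycles

With forwarding, only the load-use pair (LD followed by DADDI) needs a stall.
F = fetch, D = decode, E = execute (the A/ALU stage), M = memory, W = write back, S = stall.

Cycle     1  2  3  4  5  6  7  8  9
LD        F  D  E  M  W
DADDI        F  S  D  E  M  W
SD                 F  D  E  M  W
DADDI                 F  D  E  M  W
DSUB                     F  D  E  M
BNEZ                        F  D  E
outside                        F  S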
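The two cycle counts can be reproduced with a short scheduling sketch. It rests on assumptions read off the timing diagrams rather than stated in the question: each instruction is fetched in the cycle in which its predecessor decodes, stalls are taken before the decode stage, and E, M and W follow decode without further stalls. Without forwarding a consumer decodes no earlier than its producer's W cycle; with forwarding its E stage follows the producer's E (ALU result) or M (load result). The table `LOOP` and the function names are illustrative.

```python
# Loop body from the question: (name, destination, sources, is_load).
LOOP = [
    ("LD",    "R1", ("R2",),      True),
    ("DADDI", "R1", ("R1",),      False),
    ("SD",    None, ("R2", "R1"), False),
    ("DADDI", "R2", ("R2",),      False),
    ("DSUB",  "R4", ("R3", "R2"), False),
    ("BNEZ",  None, ("R4",),      False),
]


def schedule(program, forwarding):
    """Return [(name, fetch_cycle, decode_cycle)] under the assumed timing model."""
    producers = {}     # register -> (decode cycle of its producer, producer is a load)
    prev_decode = 1    # makes the first instruction fetch in cycle 1
    rows = []
    for name, dest, sources, is_load in program:
        fetch = prev_decode        # F overlaps the previous instruction's D
        decode = fetch + 1
        for reg in sources:
            if reg not in producers:
                continue           # value produced outside this iteration
            p_decode, p_is_load = producers[reg]
            if not forwarding:
                ready = p_decode + 3   # decode in the producer's W cycle
            elif p_is_load:
                ready = p_decode + 2   # E directly after the load's M
            else:
                ready = p_decode + 1   # E directly after the producer's E
            decode = max(decode, ready)
        if dest is not None:
            producers[dest] = (decode, is_load)
        rows.append((name, fetch, decode))
        prev_decode = decode
    return rows


def cycles_per_iteration(program, forwarding):
    """Cycle in which the final branch reaches its E (ALU) stage."""
    _, _, last_decode = schedule(program, forwarding)[-1]
    return last_decode + 1


print(cycles_per_iteration(LOOP, forwarding=False))
print(cycles_per_iteration(LOOP, forwarding=True))
```

The fetch and decode cycles returned by `schedule` correspond to the F and D columns of the two diagrams; the S entries are the cycles between them.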

4 Question
Individual stages of a processor have the following latencies.
Stage           F    D    A    M    W
Latency (ps)  210   90  110  240   50

If the processor is pipelined, each pipeline latch adds a latency of 20 ps to
the stage that precedes it. This is the so-called “setup latency”: the signals
need to be stable at the input of the latch for some amount of time before they
can be latched correctly at the end of the cycle. In the single-cycle approach,
by contrast, no pipeline is used, and in each cycle one instruction is executed
from start (F) to finish (W).

a. What is the clock cycle time if we implement this processor using single-
cycle approach (in ps)?

b. What is the clock cycle time if we implement this processor using a 5-stage
pipeline (in ps)?

c. What is the speedup of the pipelined processor over a single-cycle processor
if the single cycle processor has a CPI of 1 and the pipelined processor achieves
a CPI of 1.2?

d. If the processor must be implemented with a 3-stage pipeline, some of the
existing 5-stages must be combined (assume that the existing 5-stages can not
be split). Which of the existing five stages (F, D, A, M, W) should be placed
into which stage of the 3-stage pipeline to minimize the resulting clock cycle
time?

e. If the processor is to be implemented with a 6-stage pipeline, but the design
effort and time to market are such that there is only enough time to split one of
the five existing (F, D, A, M, W) stages into two new stages, which stage would
you choose to split?

Answer :

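a) Cycle Time : 210+90+110+240+50 = 700 ps

b) Cycle Time : 240+20 = 260 ps (slowest stage, M, plus the latch latency)

c) CPU Time = CPI x CT x #Instructions

CPU_A = 1 x 700 x N = 700N ps (single-cycle)

CPU_B = 1.2 x 260 x N = 312N ps (pipelined)

Speedup = CPU_A / CPU_B = 700/312 = 2.24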
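Parts a) to c) reduce to a few lines of arithmetic. The sketch below assumes, as the answer does, that the pipelined cycle time is the slowest stage plus the 20 ps latch latency; the variable names are illustrative.

```python
LATENCY_PS = {"F": 210, "D": 90, "A": 110, "M": 240, "W": 50}
LATCH_PS = 20

# a) Single-cycle: one instruction passes through every stage in one cycle.
single_cycle_ps = sum(LATENCY_PS.values())

# b) 5-stage pipeline: the slowest stage plus the latch latency sets the clock.
pipelined_ps = max(LATENCY_PS.values()) + LATCH_PS

# c) CPU time per instruction = CPI x cycle time; the instruction count cancels.
speedup = (1.0 * single_cycle_ps) / (1.2 * pipelined_ps)

print(single_cycle_ps, pipelined_ps, round(speedup, 2))
```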

d) 3 Stage pipeline :

Stage 1 : F - 210 ps

Stage 2 : D,A - 90 + 110 = 200 ps

Stage 3 : M,W - 240 + 50 = 290 ps

Cycle Time = 290 + 20 = 310 ps (slowest stage plus the latch latency, applied as in part b)
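Since adjacent stages can only be merged, there are just six ways to cut F, D, A, M, W into three groups, and they can be enumerated directly. This is a sketch under the same assumption as the answer, namely that the 20 ps latch latency is added to the slowest group; `best_grouping` is an illustrative helper name.

```python
from itertools import combinations

LATENCY_PS = {"F": 210, "D": 90, "A": 110, "M": 240, "W": 50}
LATCH_PS = 20


def best_grouping(latency, n_groups, latch):
    """Try every way to cut the stage sequence into n_groups contiguous groups."""
    names = list(latency)
    best = None
    for cuts in combinations(range(1, len(names)), n_groups - 1):
        bounds = (0, *cuts, len(names))
        groups = [names[a:b] for a, b in zip(bounds, bounds[1:])]
        slowest = max(sum(latency[s] for s in group) for group in groups)
        candidate = (slowest + latch, groups)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best


cycle_time, groups = best_grouping(LATENCY_PS, 3, LATCH_PS)
print(cycle_time, groups)
```

The next-best cut, F+D | A | M+W, has a slowest group of 300 ps, so the grouping F | D+A | M+W is the unique minimum.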

e) Split the stage having the maximum time, since it sets the cycle time.

Hence, we split the stage M (240 ps)
into two equal halves, each having a stage time of 120 ps.
The longest stage is then F. Therefore, the new reduced Cycle Time = 210 + 20 = 230 ps
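The choice of stage can be confirmed by trying each split in turn. The sketch assumes a stage can be divided into two equal halves (an idealisation also made in the answer) and that the latch latency is added to the slowest resulting stage; the helper name is illustrative.

```python
LATENCY_PS = {"F": 210, "D": 90, "A": 110, "M": 240, "W": 50}
LATCH_PS = 20


def cycle_time_after_split(latency, stage, latch):
    """Cycle time of the 6-stage pipeline obtained by halving one stage."""
    half = latency[stage] / 2
    others = [t for name, t in latency.items() if name != stage]
    return max(others + [half]) + latch


results = {s: cycle_time_after_split(LATENCY_PS, s, LATCH_PS) for s in LATENCY_PS}
best_stage = min(results, key=results.get)
print(results)
print(best_stage, results[best_stage])
```

Splitting any stage other than M leaves M as the bottleneck, so the cycle time stays at 260 ps in every other case.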
