Chapter 6 (Pipelining and Superscalar Techniques)
Chapter SIX
Pipelining and Superscalar Techniques
Linear Pipeline Processors
A linear pipeline processor is a cascade of processing stages connected linearly to perform a fixed function over a data stream flowing from one end to the other.
• A linear pipeline processor has k processing stages, namely 𝑆1 , 𝑆2 , 𝑆3 … 𝑆𝑘 .
• Inputs/operands are fed into the first stage 𝑆1 ; the results are passed to the next
stage 𝑆2 , and so on.
• Depending on how data flow is controlled through the pipeline, linear pipelines
are categorized into two models:
Asynchronous Model: In this model, when stage 𝑆𝑖 is ready to transmit, it sends a ready signal to
stage 𝑆𝑖+1 . Upon receiving the incoming data, 𝑆𝑖+1 returns an acknowledge signal to 𝑆𝑖 . This process
is known as a handshaking protocol.
The delay between any two adjacent stages may differ, so an asynchronous pipeline has a variable
throughput rate.
Synchronous Model: In this model, clocked latches (typically master-slave flip-flops) are used to interface
between stages, each holding its input until it is transmitted on the next clock pulse.
Upon arrival of the clock pulse, all latches transfer data to the next stage simultaneously.
All stages have approximately equal transfer delays. These delays determine the clock period and thus the
speed of the pipeline.
For a k-stage linear pipeline, k clock cycles are needed for the first data item to flow through to the last stage.
Successive tasks or operations are initiated one per cycle to enter the pipeline. Once the pipeline is
filled, one result emerges from the pipeline every cycle.
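The fill behavior described above can be sketched in a few lines; the 4-stage pipeline and task count below are illustrative assumptions, not from the text:

```python
# Minimal sketch of synchronous pipeline timing: with k stages and one
# initiation per cycle, task i (0-based) completes at cycle k + i.
def completion_cycles(k, n):
    """Cycle at which each of n tasks leaves a k-stage pipeline."""
    return [k + i for i in range(n)]

print(completion_cycles(4, 5))  # [4, 5, 6, 7, 8]
```

The first result appears after k = 4 cycles (the fill time); every later result follows one cycle apart, matching the claim above.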
Legend: 𝑆𝑖 = stage i, L = latch, 𝝉 = clock period, 𝝉𝑚 = maximum stage delay, d = latch delay, Ack = Acknowledge signal
• Clocking and Timing control
• Clock cycle: Let 𝝉𝑖 be the time delay of stage 𝑆𝑖 and d the time delay of a latch. The clock period 𝝉
is determined as below:
𝝉 = max{𝝉𝑖 : 1 ≤ 𝑖 ≤ 𝑘} + 𝑑 = 𝝉𝑚 + 𝑑
• Pipeline frequency: Defined as the inverse of the clock period:
𝑓 = 1/𝝉
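A minimal numeric sketch of these two definitions; the stage and latch delays below are made-up illustrative values:

```python
# Sketch: clock period = max stage delay + latch delay, frequency = 1/period.
# Stage delays and latch delay are illustrative assumptions (in ns).
stage_delays_ns = [10, 12, 9, 11]   # tau_i for a 4-stage pipeline
latch_delay_ns = 1                  # d

tau = max(stage_delays_ns) + latch_delay_ns  # tau = tau_m + d = 13 ns
f_ghz = 1 / tau                              # frequency in GHz (since 1/ns = GHz)

print(tau, round(f_ghz, 4))  # 13 0.0769
```

Note that the slowest stage alone sets the clock period: shortening the other stages would not speed up this pipeline.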
• Throughput: The maximum throughput f is reached when one result comes out of the pipeline every
cycle. The actual throughput may be lower than f, depending on the initiation rate of successive tasks
entering the pipeline: if more than one clock cycle elapses between successive task initiations, the
pipeline delivers fewer than one result per cycle.
• Clock skewing: Ideally, the clock pulses arrive at all stages at the same time. In practice, due to
unequal signal propagation delays, the same clock pulse may arrive at different stages with a time
offset s. This problem is known as clock skewing.
• To avoid race conditions, two constraints must be satisfied: 𝝉𝑚 ≥ 𝑡𝑚𝑎𝑥 + 𝑠 and
𝑑 ≤ 𝑡𝑚𝑖𝑛 − 𝑠, where 𝑡𝑚𝑎𝑥 = maximum time delay within a stage and 𝑡𝑚𝑖𝑛 = minimum time delay
within a stage.
Thus, when clock skew takes place, the clock period is bounded by:
𝑑 + 𝑡𝑚𝑎𝑥 + 𝑠 ≤ 𝝉 ≤ 𝝉𝑚 + 𝑡𝑚𝑖𝑛 − 𝑠
When 𝑠 = 0 (with 𝑡𝑚𝑎𝑥 = 𝝉𝑚 and 𝑡𝑚𝑖𝑛 = 𝑑), both bounds coincide and 𝝉 = 𝝉𝑚 + 𝑑.
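The skew bound can be sketched as a small checker; the delay values below are illustrative assumptions:

```python
# Sketch: check whether a candidate clock period tau satisfies the
# skew-constrained bound  d + t_max + s <= tau <= tau_m + t_min - s.
def period_ok(tau, tau_m, t_max, t_min, d, s):
    return d + t_max + s <= tau <= tau_m + t_min - s

# Illustrative values: tau_m = t_max = 12 ns, t_min = d = 1 ns.
print(period_ok(13, tau_m=12, t_max=12, t_min=1, d=1, s=0))  # True: tau = tau_m + d
print(period_ok(13, tau_m=12, t_max=12, t_min=1, d=1, s=1))  # False once skew appears
```

With zero skew the feasible range collapses to the single point 𝝉 = 𝝉𝑚 + 𝑑 = 13 ns; any nonzero skew makes that range empty here, which is why skew must be minimized in practice.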
• Speedup, Efficiency, Throughput
The total time required for n tasks in linear pipeline of k stages is:
𝑇𝑘 = [𝑘 + (𝑛 − 1)] 𝝉
where k cycles are needed to complete the first task and the remaining n − 1 tasks require one
additional cycle each.
For an equivalent non-pipelined processor, every task needs a flow-through delay of 𝑘𝝉, so for n tasks
the total time is 𝑇1 = 𝑛𝑘𝝉.
Speedup factor: The speedup factor of a k-stage pipelined processor over an equivalent non-pipelined
processor is
𝑆𝑘 = 𝑇1 / 𝑇𝑘 = 𝑛𝑘𝝉 / [𝑘 + (𝑛 − 1)]𝝉 = 𝑛𝑘 / [𝑘 + (𝑛 − 1)]
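The speedup formula can be evaluated numerically; the stage and task counts below are illustrative assumptions:

```python
# Sketch: S_k = n*k / (k + n - 1); as n grows, S_k approaches k.
def speedup(k, n):
    return n * k / (k + n - 1)

for n in (1, 4, 64, 1024):
    print(n, round(speedup(4, n), 3))
```

For a single task there is no speedup at all (S = 1), and the ideal speedup k is approached only asymptotically as the number of tasks n grows large.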
Performance/Cost Ratio (PCR): The ratio between the performance and the total pipeline cost is PCR.
Let t be the total flow-through delay of a k-stage pipeline, divided equally so that each stage
contributes 𝑡/𝑘.
With d the latch delay, the clock period is 𝑝 = 𝑡/𝑘 + 𝑑.
So the maximum throughput (performance) is:
𝑓 = 1/𝑝 = 1/(𝑡/𝑘 + 𝑑)
The total pipeline cost is estimated as 𝑐 + 𝑘ℎ, where c is the cost of all logic stages and h is the cost of
each latch.
Finally, the performance/cost ratio is defined as below:
𝑃𝐶𝑅 = 𝑓/(𝑐 + 𝑘ℎ) = 1 / [(𝑡/𝑘 + 𝑑)(𝑐 + 𝑘ℎ)]
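PCR can be explored numerically to find the stage count that maximizes it; setting the derivative of the denominator to zero gives the analytic optimum 𝑘0 = √(𝑡𝑐/𝑑ℎ). The parameter values below are illustrative assumptions, not from the text:

```python
# Sketch: PCR(k) = 1 / ((t/k + d) * (c + k*h)).  Minimizing the denominator
# t*c/k + t*h + d*c + d*h*k over k gives k0 = sqrt(t*c / (d*h)).
t, d, c, h = 64.0, 1.0, 16.0, 1.0   # illustrative flow-through delay, latch delay, costs

def pcr(k):
    return 1.0 / ((t / k + d) * (c + k * h))

best_k = max(range(1, 65), key=pcr)   # brute-force integer optimum
k0 = (t * c / (d * h)) ** 0.5         # analytic optimum
print(best_k, k0)  # 32 32.0
```

Too few stages waste the pipeline's parallelism; too many stages inflate latch cost and latch delay, so PCR peaks at an intermediate stage count.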
• Efficiency and Throughput
Efficiency: The efficiency 𝐸𝑘 of a k-stage linear pipeline is defined as the speedup per stage:
𝐸𝑘 = 𝑆𝑘/𝑘 = 𝑛𝑘 / 𝑘[𝑘 + (𝑛 − 1)] = 𝑛 / [𝑘 + (𝑛 − 1)]
Pipeline throughput (𝐻𝑘 ): It is defined as the number of tasks performed per unit of time:
𝐻𝑘 = 𝑛 / [𝑘 + (𝑛 − 1)]𝝉 = 𝑛𝑓 / [𝑘 + (𝑛 − 1)]
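Both quantities can be sketched together; the stage count, task count, and clock frequency below are illustrative assumptions:

```python
# Sketch: E_k = n / (k + n - 1) and H_k = E_k * f tasks per unit time.
def efficiency(k, n):
    return n / (k + n - 1)

def throughput(k, n, f):
    return efficiency(k, n) * f

# Illustrative: 4-stage pipeline, 64 tasks, 500 MHz clock.
print(round(efficiency(4, 64), 4), throughput(4, 64, 500e6))
```

Note that 𝐻𝑘 = 𝐸𝑘 · 𝑓: throughput reaches the maximum f exactly when the efficiency reaches 1, i.e., in the limit of many tasks.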
Fig 6.3
Fig 6.4(a): Reservation table for function X Fig 6.4(b): Reservation table for function Y
The number of columns in the reservation table is called the evaluation time.
For example, the evaluation time of function X is eight clock cycles and that of function Y is six.
• Latency analysis:
• Latency: The number of clock cycles between two initiations of the pipeline. A latency of k means
that two initiations are separated by k clock cycles.
• Collision: Any attempt by any two or more initiations to use the same pipeline stage at the same time
creates collision. Collision indicates resource conflicts between two initiations in the pipeline.
• Forbidden latency: Latencies that cause collisions are called forbidden latencies. As an example, for
function X from fig 6.4(a), the latency differences per stage are:
For stage 𝑆1 = {(6-1), (8-1), (8-6)} = {5, 7, 2}
For stage 𝑆2 = {(4-2)} = {2}
For stage 𝑆3 = {(5-3), (7-3), (7-5)} = {2, 4}
Then the set of forbidden latencies for function X is: {2, 4, 5, 7}
• Collision vectors: From a given reservation table, it is easy to distinguish the set of permissible
latencies from the set of forbidden latencies. For a reservation table with n columns, the maximum
forbidden latency m satisfies m ≤ n − 1. A permissible latency p lies in the range 1 ≤ p ≤ m − 1, and it
should be as small as possible. The combined set of permissible and forbidden latencies can be
displayed by a collision vector. This is an m-bit vector
𝐶 = (𝐶𝑚 𝐶𝑚−1 …𝐶2 𝐶1 )
• The value of 𝐶𝑖 =1 if latency i causes a collision and 𝐶𝑖 =0 if latency i is permissible. It is
always the case that 𝐶𝑚 =1, since m is the maximum forbidden latency.
• For the function X, we find the collision vector, 𝐶𝑋 = (𝐶7 𝐶6 𝐶5 𝐶4 𝐶3 𝐶2 𝐶1 ) = (1011010)
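The two steps above (stage-time differences, then bit encoding) can be sketched directly from the stage usage times given in the text for function X (𝑆1 = {1, 6, 8}, 𝑆2 = {2, 4}, 𝑆3 = {3, 5, 7}):

```python
# Sketch: derive forbidden latencies and the collision vector from a
# reservation table, encoded as stage -> list of time steps used.
table = {"S1": [1, 6, 8], "S2": [2, 4], "S3": [3, 5, 7]}

def forbidden_latencies(table):
    lats = set()
    for times in table.values():
        # Any pairwise time difference within one stage is forbidden.
        lats |= {b - a for a in times for b in times if b > a}
    return sorted(lats)

def collision_vector(forbidden):
    m = max(forbidden)
    # Bit string C_m ... C_1: bit i is 1 iff latency i is forbidden.
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))

print(forbidden_latencies(table))              # [2, 4, 5, 7]
print(collision_vector({2, 4, 5, 7}))          # 1011010
```

Both outputs reproduce the values derived above for function X.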
• State diagram
Specifies the permissible state transitions among successive initiations based on the collision vector.
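The transition rule behind such a state diagram is standard: from a state, an initiation with permissible latency p leads to the state obtained by right-shifting the current state by p bits and ORing in the initial collision vector. A sketch of this rule, starting from 𝐶𝑋 = (1011010) as given above:

```python
# Sketch: state transitions from a collision vector.  From state S, an
# initiation with permissible latency p leads to (S >> p) | C_initial.
M = 7                       # vector width = maximum forbidden latency
C = int("1011010", 2)       # initial collision vector for function X

def successors(state):
    return {p: (state >> p) | C
            for p in range(1, M + 1)
            if not (state >> (p - 1)) & 1}  # bit C_p == 0 => permissible

print({p: format(s, "07b") for p, s in successors(C).items()})
# {1: '1111111', 3: '1011011', 6: '1011011'}
```

From the initial state, only the permissible latencies 1, 3, and 6 produce transitions (latencies greater than m are always permissible and return to the initial state).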
• Differences between linear and nonlinear pipeline processors:
• Linear pipelines are static pipelines: they perform fixed functions. Nonlinear pipelines are dynamic
pipelines: they can be reconfigured to perform variable functions at different times.
• A linear pipeline has only streamline connections. A nonlinear pipeline has streamline connections as
well as feed-forward and feedback connections.
• It is easy to partition a given function into a sequence of linearly ordered subfunctions. In a nonlinear
pipeline, function partitioning is relatively difficult because the stages are interconnected with loops in
addition to streamline connections.
• The output of a linear pipeline is produced from the last stage. The output of a nonlinear pipeline is
not necessarily produced from the last stage.
• The reservation table of a linear pipeline is trivial, in the sense that data flows in a linear streamline.
The reservation table of a nonlinear pipeline is non-trivial, in the sense that there is no single linear
streamline for data flow.
• A static pipeline is specified by a single reservation table. A dynamic pipeline is specified by more
than one reservation table.