IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 15, NO. 10, OCTOBER 1996

1237

Utilizing the Retiming-Skew Equivalence in a Practical Algorithm for Retiming Large Circuits
Sachin S. Sapatnekar, Member, IEEE, and Rahul B. Deokar
Abstract: The introduction of clock skew at an edge-triggered flip-flop has an effect that is similar to the movement of the flip-flop across combinational logic module boundaries; these are continuous and discrete optimizations with the same effect. While this fact has been recognized before, this paper, for the first time, utilizes this information to find an optimal retiming efficiently. The clock period is guaranteed to be at most one gate delay larger than the optimal clock period found using skew alone; note that since skew is a continuous optimization, it is possible that this optimal period may not be achievable by retiming. The method views the circuit hierarchically, first solving the clock skew problem at one level above the gate level, and then using local transformations at the gate level to perform retiming for the optimal clock period. The solution is thus divided into two phases. In Phase A, the clock skew optimization problem is solved with the objective of minimizing the clock period, while ensuring that the difference between the maximum and the minimum skew is minimized. Next, in Phase B, retiming is employed and some flip-flops are relocated across gates in an attempt to set the values of all skews as close to zero as possible.

Manuscript received August 11, 1994; revised March 13, 1995 and June 20, 1996. This work was supported in part by the National Science Foundation Faculty Early Career Development Award under Contract MIP-9502556. This paper was recommended by Associate Editor F. Somenzi. The authors are with the Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011 USA. Publisher Item Identifier S 0278-0070(96)07467-2.

I. INTRODUCTION

The importance of optimizing the timing behavior of VLSI circuits probably needs no introduction to any reader of this paper, and a great deal of effort has been invested in research in this field. This paper considers the method of retiming [1], which relocates flip-flops within a network to achieve faster clocking speeds. A novel approach to retiming that utilizes the solution of the clock skew optimization problem [2] forms the backbone of this paper. The introduction of clock skew at a flip-flop has an effect that is similar to the movement of the flip-flop across combinational logic module boundaries. This was observed in [2], which stated that clock skew and retiming are continuous and discrete optimizations with the same effect. Although the designer can choose between the two transformations, these methods can, in general, complement each other. The equivalence between retiming and skew has been observed and used in earlier work, for example, in [3]-[5]. The contribution of this work is that it exploits this equivalence and presents a method that finds an optimal retiming efficiently, with a clock period that is guaranteed to be at most one gate delay larger than the optimal clock period. The method views the circuit hierarchically, first solving the clock skew problem at one level above the gate level, and then using local transformations at the gate level to perform retiming for the optimal clock period. The algorithm has been named ASTRA (A Skew-To-Retiming Algorithm). The clock skew problem is first solved in polynomial time using efficient graph-theoretic techniques, which take advantage of the structure of the problem to arrive at an efficient solution. Like [2], our technique is illustrated on single-phase clocked circuits containing edge-triggered flip-flops. The advantage of using this graph algorithm is that it not only minimizes the clock period but, unlike a simplex-based linear programming approach, also ensures that the difference between the maximum and minimum skews is minimized at the optimal clock period.

The complexity of solving the retiming problem at the gate level using the technique in [1] is $O(|G|^2 \log |G|)$, where $|G|$ is the number of gates in the circuit; this could be phenomenally large, and a verbatim implementation, such as that in SIS [6], is incapable of handling large circuits. However, a clever implementation, such as that proposed very recently in [7], can render the problem of finding a retiming of a circuit at the minimum clock period tractable. To place our method in perspective with that in [7], it must be pointed out that the two were research efforts carried out independently and in parallel. While [7] performs the important task of debunking the myth that the Leiserson-Saxe algorithm cannot be applied to retime large gate-level circuits, our approach shows an alternative view of retiming, by way of clock skew optimization.
Apart from taking different approaches to solving the retiming problem, the two also differ in that the work presented here directly provides a solution to the problem of jointly finding an optimal retiming and optimal clock skews for a minimum period circuit. The work in [7] also performs minimum area retiming; we leave that as a topic for further research. The run-times of our approach and of that in [7] are essentially the same for the minimum period retiming problem.

astra: (Sanskrit) a sophisticated weapon.

0278-0070/96$05.00 © 1996 IEEE


Fig. 1. The advantages of nonzero clock skew.

Fig. 2. Retiming for clock period optimization.

Another method, developed independently and concurrently in [8], effectively formulates the problem as an integer linear program of minimizing the clock period subject to long path constraints between latch pairs, which is superficially similar to our formulation. The method then relaxes the integrality constraints and calculates the optimum clock period by solving this continuous linear program. The work concludes that the optimal integer solution can be obtained immediately as the ceiling of the clock period provided by the continuous linear program solution. Although it is not specifically stated in [8], this is true only for circuits with unit-delay gates, and it is easy to construct counterexamples of circuits with nonunit-delay gates where this method would not work. Our algorithm uses a separate relocation phase after solving the linear program and can handle arbitrary gate delays; in fact, the result in [8] on the optimality of the ceiling for unit gate delays can be shown using a procedure and proof technique similar to that in Theorem 9 of this paper. We also differ from [8] in that we solve the continuous linear program using efficient graph-theoretic techniques.

In this paper, the circuit is assumed to be composed of gates with constant delays, and we assume the presence of a flip-flop (FF) at each primary input and each primary output. The solution is divided into two phases. In Phase A, the clock skew optimization problem is solved with the objective of minimizing the clock period, while ensuring that the difference between the maximum and the minimum skew is minimized. This difference indicates how many gates may have to be traversed in the next phase, so it is important that it be small. In Phase B, the skew solution is translated to a retiming, and some flip-flops are relocated across gates in an attempt to set the values of all skews as close to zero as possible.
The designer may choose to achieve the optimal clock period by using a combination of clock skew and retiming; alternatively, any skews that could not be set exactly to zero could now be forced to zero. This could cause the clock period to increase; however, it is shown that this increase is no greater than one gate delay.

The paper is organized as follows. The equivalence between retiming and clock skew is first shown in Sections II and III. The algorithm for the clock skew optimization phase and a few related theoretical results are described in Section IV. Next, in Section V, the process of transforming the solution of Phase A to a retiming solution is described, followed by a description of the properties of this solution in Section VI. Finally, we present experimental results in Section VII and conclude the paper in Section VIII.

II. CLOCK SKEW OPTIMIZATION AND RETIMING

In a sequential VLSI circuit, clock signals may not arrive at all of the flip-flops at the same time because of differences in interconnect delays on the clock distribution network. Thus, there is a skew between the clock arrival times at different flip-flops. In a single-phase clocked circuit with no clock skew, the designer must ensure that each input-output path of a combinational circuit block has a delay that is less than the clock period. In the presence of skew, however, the relation grows more complex, as one must compensate for this effect in designing the combinational circuit blocks. One approach, followed by several researchers [9]-[13], is to design the clock distribution network so as to ensure zero clock skew. An alternative approach views clock skews as a manageable resource rather than a liability, and intentionally introduces skews to improve the performance of the circuit. To illustrate how this may be done, consider the following example. In Fig. 1, assume the delays of the inverters to be 1.0 unit each. The combinational circuit blocks CC1 and CC2 have delays of 3.0 and 1.0 units, respectively, and therefore, the fastest allowable clock has a period of 3.0 units. However, if a skew of +1.0 unit is applied to the clock line to flip-flop B, the circuit can run with a clock period of 2.0 units. This approach was formalized in the work by Fishburn [2], where the clock skew optimization problem was formulated as a linear program that may be solved to find the optimal clock period.

A second approach that is exploited here to improve the performance of the circuit is the procedure of retiming [1]. Retiming involves the relocation of flip-flops across logic gates to allow the circuit to be operated under a faster clock, without changing its functionality. To illustrate, consider again the example in Fig. 1, where the clock period was minimized to 2.0 units by introducing a skew of +1.0 unit at flip-flop B. Alternatively, the period can still be minimized to 2.0 units by moving flip-flop B to the left across the inverter I3. This results in the combinational circuit blocks CC1 and CC2 having delays of 2.0 units each, as seen in Fig. 2. This approach is formalized in [1]. If one were to imagine the circuit as being drawn with its inputs to the left and outputs to the right, then the conversion of a negative (positive) skew to zero skew would involve the relocation of flip-flops to the right (left). In this paper, we will use the terms right and left to denote the direction of signal propagation and the direction opposite to that of signal propagation, respectively.
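The arithmetic of the Fig. 1 and Fig. 2 example can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: only the long-path (zero-clocking) constraint is modeled, the setup time is taken as zero (as the example implicitly does), CC1 is read as lying between flip-flops A and B and CC2 between B and a flip-flop we call C, and the helper name `min_period` is hypothetical.

```python
def min_period(skews, paths, t_setup=0.0):
    """Smallest P meeting x_i + d(i,j) + T_setup <= x_j + P on every listed path
    (long-path / zero-clocking constraint only; hold constraints are ignored)."""
    return max(skews[i] + d + t_setup - skews[j] for (i, j), d in paths.items())


# Fig. 1 (as read from the text): A -> CC1 (delay 3.0) -> B -> CC2 (delay 1.0) -> C.
paths = {("A", "B"): 3.0, ("B", "C"): 1.0}
zero_skew = {"A": 0.0, "B": 0.0, "C": 0.0}
skew_at_b = {"A": 0.0, "B": 1.0, "C": 0.0}   # +1.0 unit of skew at flip-flop B
print(min_period(zero_skew, paths))          # 3.0
print(min_period(skew_at_b, paths))          # 2.0

# Fig. 2: B moved left across inverter I3, shifting 1.0 unit of delay from CC1 to CC2.
retimed_paths = {("A", "B"): 2.0, ("B", "C"): 2.0}
print(min_period(zero_skew, retimed_paths))  # 2.0
```

Both transformations reach the same 2.0-unit period, which is exactly the retiming-skew equivalence that the rest of the paper exploits.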


Fig. 3. Equivalence between retiming and skew.

Fig. 4. The clock skew optimization problem.

III. EQUIVALENCE BETWEEN CLOCK SKEW AND RETIMING

A more formal presentation of the equivalence between clock skew and retiming is given here. Consider a flip-flop $j$ in a circuit, as shown in Fig. 3(a). For every combinational path from a flip-flop $i$ to $j$ with delay $d(i,j)$, the following constraints must hold to prevent zero-clocking and double-clocking, respectively:

$$x_i + d(i,j) + T_{\text{setup}} \le x_j + P, \qquad x_i + d(i,j) \ge x_j + T_{\text{hold}} \tag{1}$$

where $x_i$ and $x_j$ are the skews at flip-flops $i$ and $j$, respectively. Similar constraints can be written for every combinational path from flip-flop $j$ to $k$ with delay $d(j,k)$. The statement of the following result was provided in an informal way in [2], and we have included a brief proof.

Theorem 1: For a circuit that operates at a clock period $P$, satisfying the double-clocking and zero-clocking delay constraints,
a) retiming a flip-flop by moving it to the left across a gate of delay $d_1$ is equivalent to decreasing its skew by $d_1$;
b) retiming a flip-flop by moving it to the right across a gate of delay $d_2$ is equivalent to increasing its skew by $d_2$.

Proof: We prove statement a) with the help of Fig. 3(b); the proof of b) is analogous. Consider the case where flip-flop $j$ is moved to the left across gate $G_1$, which has delay $d_1$. For a combinational path from a flip-flop $i$ to $j$, if $x_j$ and $x_j'$ are, respectively, the skews at flip-flop $j$ before and after the relocation operation, then the following relationships must be satisfied:

$$x_i + [d(i,j) - d_1] + T_{\text{setup}} \le x_j' + P, \qquad x_i + [d(i,j) - d_1] \ge x_j' + T_{\text{hold}} \tag{2}$$

From (1), this implies that setting $x_j' = x_j - d_1$ achieves the same effect. Similarly, for a combinational path from $j$ to a flip-flop $k$, we can show that setting $x_j' = x_j - d_1$ in the original circuit achieves the same effect. □

Therefore, if one were to calculate the optimal clock skews, one could retime the circuit by moving flip-flops with positive (negative) skews to the left (right) until the skews at the flip-flops are nearly equal to zero. It must be noted that since gate delays take on discrete values, it is not possible to guarantee that the skew at a flip-flop can always be reduced to zero through retiming operations.

An alternative view of the same procedure is as follows. Retiming may be thought of as a sequence of movements of flip-flops across gates (Theorem 9.1 in [14]). We may start from the final retimed circuit, where all of the skews are zero and the zero-clocking constraints are met, and perform the sequence of movements in reverse order. This procedure can be used to move all flip-flops back to their initial locations, using Theorem 1 to keep track of the changed clock skews at each flip-flop. Therefore, the optimal retiming is equivalent to applying skews at the inputs of flip-flops. Note that the optimal clock period provided by the clock skew optimization procedure must, by definition, be no greater than the clock period for the set of clock skews thus obtained. Any differences arise because clock skew optimization is a continuous optimization, while retiming is a discrete optimization. The following corollary follows.

Corollary 2: The clock period obtained by an optimal retiming can be achieved via clock skew optimization. The clock period provided by the clock skew optimization procedure is less than or equal to that provided by the method of retiming.

IV. PHASE A: OPTIMIZING CLOCK SKEWS

Given a combinational circuit segment that lies between two flip-flops $i$ and $j$, as shown in Fig. 4, if $x_i$ and $x_j$ are the skews at the two flip-flops, then the following inequations must be satisfied:

$$x_i + \underline{d}(i,j) \ge x_j + T_{\text{hold}}, \qquad x_i + \overline{d}(i,j) + T_{\text{setup}} \le x_j + P$$

where $\underline{d}(i,j)$ ($\overline{d}(i,j)$) is the minimum (maximum) delay of any combinational path between flip-flops $i$ and $j$. In [2], the clock skew problem for minimizing the clock period is solved via the following linear program:

$$\begin{aligned} \text{minimize } & P \\ \text{subject to } & x_i - x_j \ge T_{\text{hold}} - \underline{d}(i,j) \\ & x_j - x_i + P \ge T_{\text{setup}} + \overline{d}(i,j) \end{aligned} \tag{3}$$

for every pair $(i,j)$ of flip-flops such that there is at least one combinational path from flip-flop $i$ to flip-flop $j$.
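Theorem 1 can be sanity-checked numerically against the constraints of (3): moving flip-flop $j$ left across a gate of delay $d_1$ removes $d_1$ from every path into $j$, adds it to every path out of $j$, and, by Theorem 1(a), lowers $x_j$ by $d_1$, so every slack should be unchanged. This is a sketch, not the authors' code: the helper names `lp3_slacks` and `move_left` are hypothetical, we assume the gate $G_1$ drives only flip-flop $j$, and the sample values are dyadic fractions so that floating-point comparisons are exact.

```python
def lp3_slacks(x, dmin, dmax, P, t_setup, t_hold):
    """Slack of each constraint of LP (3); (x, P) is feasible iff every slack is >= 0.
    dmin/dmax map a flip-flop pair (i, j) to the min/max combinational delay from i to j."""
    s = {}
    for (i, j), d_hi in dmax.items():
        s[("setup", i, j)] = (x[j] - x[i] + P) - (t_setup + d_hi)
        s[("hold", i, j)] = (x[i] - x[j]) - (t_hold - dmin[(i, j)])
    return s


def move_left(x, dmin, dmax, j, d1):
    """Retime flip-flop j to the left across a gate of delay d1 (assumed to drive only j)."""
    def shift(delays):
        # Paths ending at j lose d1; paths starting at j gain d1.
        return {(a, b): v - (d1 if b == j else 0.0) + (d1 if a == j else 0.0)
                for (a, b), v in delays.items()}
    x_new = dict(x)
    x_new[j] = x[j] - d1  # Theorem 1(a): the skew decreases by d1
    return x_new, shift(dmin), shift(dmax)


x = {"i": 0.5, "j": 1.5, "k": 0.5}
dmin = {("i", "j"): 2.0, ("j", "k"): 1.0}
dmax = {("i", "j"): 4.0, ("j", "k"): 2.5}
args = dict(P=4.0, t_setup=0.25, t_hold=0.125)

before = lp3_slacks(x, dmin, dmax, **args)
after = lp3_slacks(*move_left(x, dmin, dmax, "j", 1.0), **args)
print(before == after)  # True
```

Because both the setup and the hold slacks are preserved, the check also illustrates why Theorem 1, unlike the multi-gate case discussed later, holds for double-sided constraints.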


In this paper, we will consider single-sided constraints only, and will ignore the short path constraints. We thus obtain the clock skews that correspond to the minimum clock period. Our strategy is to transform the skew solution to a retiming solution to achieve the minimum clock period. The rationale behind this approach is that, as will be shown in the next subsection, the clock period obtained thus is no larger than that obtained with the inclusion of double-sided constraints. This minimum clock period can be preserved while reconciling short path and logic signal separation constraint violations [15] by using an algorithm for minimum padding, such as the one in [16].

A. Solution to the Clock Period Optimization Problem
1) Formation of the constraint graph: The linear program (3) without the short path constraints is rewritten as

$$\begin{aligned} \text{minimize } & P \\ \text{subject to } & x_j - x_i + P \ge T_{\text{setup}} + \overline{d}(i,j). \end{aligned} \tag{4}$$

Notice that for a constant value of $P$, the constraint matrix reduces to a system of difference constraints, which can be represented by a constraint graph [17]. A feasible solution to the linear program exists if the corresponding constraint graph $G_1(P)$ contains no positive cycles, and the minimum clock period corresponds to the smallest value of $P$ at which no positive cycle exists. The skews at all primary inputs and primary outputs are assumed to be zero; this is represented by a host node in the constraint graph, similar in principle to the notion in [1]. Observe that the constraint set of the linear program (4) is a subset of the constraint set of the linear program (3). Therefore, the optimal period for the LP above must be less than or equal to that for the LP that handles double-sided constraints. If $\overline{d}(i,j)$ is finite, then a directed edge from $x_i$ to $x_j$, with weight $T_{\text{setup}} + \overline{d}(i,j) - P$, is constructed in $G_1(P)$ in accordance with the long path delay constraint.

B. Calculating the Worst-Case Flip-Flop-to-Flip-Flop Delays

For any input $i$, the procedure [2] for computing $\overline{d}(i,j)$ for all $j$ involves setting the arrival time at input $i$ to zero, and that at all other inputs to $-\infty$. The resulting signal arrival time at each output $j$, found using PERT [18], is the value of $\overline{d}(i,j)$. However, performing this procedure directly would lead to large computation times.

It was observed during the symbolic propagation of constraints in [19] that, in most cases, a flip-flop at the input to a combinational block exercises only a small fraction of all of the paths between the inputs and the outputs of the block. Based on this observation, we develop an efficient procedure for calculating the values of $\overline{d}(i,j)$. The use of this procedure was found to give run-time improvements of several orders of magnitude over the direct multiple applications of PERT described in the previous paragraph. Since we will be dealing with combinational blocks only, in this subsection only, we will refer to the flip-flops at the inputs (outputs) of a combinational block as primary inputs (outputs).

The level number level(k) of each gate k in the circuit is first computed by a single PERT run; the level number is defined as the largest number of gates on a path from a primary input to the gate, inclusive of the gate. In other words, the level numbers follow a topological ordering of the gates. To find $\overline{d}(i,o)$, the largest delay from primary input $i$ to each primary output $o$, we conduct an event-driven PERT-like exercise starting at flip-flop $i$, as described in the following pseudocode. During the process, we maintain a set of queues, known as level queues. The level queue indexed by $k$ contains all gates at level $k$ that have had an input processed.

    Initialize all level queues to be empty;
    currentlevel = 0;
    currentgate = flip-flop i;
    while (currentgate != nil) {
        if (currentgate != i)
            /* fanins j not reached from i count as -infinity */
            d(i, currentgate) = max over all fanins j of currentgate of d(i, j), plus delay(currentgate);
        else
            d(i, i) = 0;
        for (all fanouts k of currentgate)
            Append gate k to the level queue indexed by level(k), if it is not already on that queue;
        if (some level queue is nonempty) {
            currentlevel = lowest level number whose level queue is nonempty;
            currentgate = head of the level queue for currentlevel;
            delete currentgate from its level queue;
        }
        else
            currentgate = nil;
    }

At each step, an element from the lowest unprocessed level is plucked from its level queue, and the worst-case delay from flip-flop $i$ to its output is computed. All of its fanouts are then placed on their corresponding level queues, unless they have already been placed on these queues. Note that, by construction, no gate is processed until the delays to all of its inputs that are affected by flip-flop $i$ have been computed, since such inputs must necessarily have a lower level number.

C. Theoretical Results on the Optimal Clock Period

The following theoretical results are proved for the optimal clock period.

Lemma 3: If $P = P_1$ does not permit a feasible solution to the linear program (4), then neither does $P = P_2 < P_1$.

Proof: If $P = P_1$ is such that it does not permit a feasible solution to the linear program, then the constraint graph $G_1(P_1)$ has a positive cycle, $C$.

If $P_2 < P_1$, then since the weight of the cycle $C$ satisfies

$$\sum_{q=(i,j) \in C} \left(T_{\text{setup}} + \overline{d}(i,j) - P_2\right) > \sum_{q=(i,j) \in C} \left(T_{\text{setup}} + \overline{d}(i,j) - P_1\right) > 0 \tag{6}$$

$C$ will be a positive cycle in $G_1(P_2)$ too. Hence, $P_2$ does not permit a feasible solution to the linear program (4). □

Corollary 4: If $P = P_1$ permits a feasible solution to the linear program (4), then so does $P = P_2 > P_1$.

Proof: Assume that $P_2$ does not permit a feasible solution. Then, by Lemma 3, neither does $P_1$, which is a contradiction. □

Lemma 5a: An upper bound, $P_{\text{high}}$, on the optimal clock period, $P_{\text{opt}}$, is given by

$$P_{\text{high}} = \max_{q=(i,j)} \left(\overline{d}(i,j) + T_{\text{setup}}\right) \tag{7}$$

where $q = (i,j)$ is an edge in $G_1(P)$.

Proof: Suppose all clock skews are set to 0. Then, from the equation for the zero-clocking constraints, we have

$$P \ge \overline{d}(i,j) + T_{\text{setup}} \qquad \forall (i,j) \in G_1(P). \tag{8}$$

Therefore,

$$P = \max_{(i,j) \in G_1(P)} \left(\overline{d}(i,j) + T_{\text{setup}}\right) \tag{9}$$

is a valid solution, and is thus an upper bound on the optimal clock period. □

Lemma 5b: A lower bound, $P_{\text{low}}$, on the optimal clock period $P_{\text{opt}}$ is given by

where $q = (i,j)$ is an edge in $G_1(P)$, and $d_{\text{loop}}$ is defined as the largest weight of any edge in $G_1(P)$ from any node (including the host node) to itself, if such a loop exists, and 0 otherwise.

Proof: The weight of any edge that starts and ends at the same node, $\overline{d}(i,i) + T_{\text{setup}}$, is clearly a lower bound on the clock period. If no such edge exists, then the lower bound is calculated as follows. Let $C$ be the critical cycle at the optimal clock period $P_{\text{opt}}$, i.e., a cycle with weight zero. (Clearly, one such cycle must exist, or else $P_{\text{opt}}$ could be reduced further.) Let $k$ be the number of edges in $C$. Therefore,

$$0 = \sum_{(i,j) \in C} \left[\overline{d}(i,j) + T_{\text{setup}} - P_{\text{opt}}\right]$$

The nonnegativity of the right-hand side is trivial.

Theorem 6: A finite solution to the clock period minimization problem exists.

Proof: From Lemma 5, the solution $P_{\text{opt}}$ to the clock period minimization problem is bounded by $0 \le P_{\text{low}} \le P_{\text{opt}} \le P_{\text{high}} < \infty$. □

D. The Clock Skew Optimization Procedure

The skeletal pseudocode describing the algorithm for finding the optimal clock period proceeds as follows, using a binary search on the value of the clock period:

    Construct the constraint graph;
    Pmax = Phigh;    ... (Lemma 5a)
    Pmin = Plow;     ... (Lemma 5b)
    while ((Pmax - Pmin) > ε) {
        P = (Pmax + Pmin) / 2;
        if (G1(P) has a positive cycle)
            Pmin = P;    /* P is infeasible (Lemma 3) */
        else
            Pmax = P;    /* P is feasible (Corollary 4) */
    }

In the above algorithm, the presence of a positive cycle in $G_1(P)$ may be tested using the Bellman-Ford algorithm [17]. If the skews are initialized to 0, the Bellman-Ford solution achieves the objective of minimizing $|x_{\max} - x_{\min}|$ [17]. On a graph with $V$ vertices and $E$ edges, the computational complexity of this algorithm is $O(V \cdot E)$. The number of iterations of the binary search is $\log_2[(P_{\text{high}} - P_{\text{low}})/\varepsilon]$. The time required to form the constraint graph may be as large as $|f| \cdot |G|$, where $|f|$ is the maximum number of inputs to any combinational stage, though in practice this upper bound is seldom achieved. Therefore, the iterative procedure above, when carried to convergence, provides the solution to the linear program (4) with a worst-case time complexity of

$$O\left(F \cdot E \cdot \log_2\left[(P_{\text{high}} - P_{\text{low}})/\varepsilon\right] + |f| \cdot |G|\right) \tag{11}$$

where $F$ is the number of flip-flops in the circuit, $E$ is the number of pairs of flip-flops connected by a combinational path, $P_{\text{high}}$ and $P_{\text{low}}$ are as defined in Lemma 5, and $\varepsilon$, defined in the pseudocode above, corresponds to the degree of accuracy required. We point out that for real circuits, $E = O(F)$. We caution the reader that the complexity shown above is not a genuine indication of the complexity if the implementation is cleverly carried out, using back-pointers during the Bellman-Ford process [20] and the procedure in Section IV-B for the $\overline{d}(i,j)$ calculations.

In the solution found above, all skews must necessarily be nonnegative, since the weight of each node in the Bellman-Ford algorithm was initialized to zero. Also, in general, the skew at the host node (corresponding to primary inputs and outputs) could be nonzero. Our objective is to ensure a zero skew at the primary input and output nodes, since we do not have the flexibility of retiming these, and hence we modify this solution. Note that if $x = [x_1, x_2, \ldots, x_n]$ is a solution to a system of difference constraints in $x$, then so is $x' = [(x_1 + k), (x_2 + k), \ldots, (x_n + k)]$. Therefore, by selecting $k$ to be the negative of the skew at the host node, a solution $x'$ with a zero skew at the host is found.
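Sections IV-B and IV-D can be combined into a small end-to-end sketch of Phase A. It is a sketch under stated assumptions, not the authors' implementation: the netlist format (`fanin`/`fanout` dictionaries), all helper names, and the merging of every primary-input and primary-output flip-flop into a single node named `host` are hypothetical; a binary heap keyed on level number stands in for the set of level queues; the search starts from the self-loop bound used in the proof of Lemma 5b (or 0 when there is no self-loop), a valid but possibly looser lower bound; and the tolerance `tol` guards the positive-cycle test against floating-point round-off.

```python
import heapq


def gate_levels(fanin, is_ff):
    """level(g): largest number of gates on a path from a flip-flop output to g, inclusive.
    Flip-flop outputs act as primary inputs (level 0); the logic must be acyclic."""
    level = {}

    def visit(n):
        if is_ff[n]:
            return 0
        if n not in level:
            level[n] = 1 + max((visit(f) for f in fanin[n]), default=0)
        return level[n]

    for n in fanin:
        level[n] = visit(n)
    return level


def worst_case_delays(src, fanin, fanout, delay, is_ff, level):
    """Level-ordered PERT from flip-flop src; returns {k: dbar(src, k)} for reached flip-flops k."""
    arrival = {src: 0.0}  # fanins never reached from src are implicitly at -infinity
    result, heap, queued = {}, [], set()

    def enqueue(n):
        if n not in queued:
            queued.add(n)
            heapq.heappush(heap, (level[n], n))

    for g in fanout[src]:
        enqueue(g)
    while heap:  # always pops from the lowest nonempty level
        _, g = heapq.heappop(heap)
        t = max(arrival[f] for f in fanin[g] if f in arrival)
        if is_ff[g]:  # reached a flip-flop's D input: record it, do not propagate
            result[g] = t
            continue
        arrival[g] = t + delay[g]
        for k in fanout[g]:
            enqueue(k)
    return result


def bellman_ford_skews(nodes, dbar, P, t_setup, tol=1e-9):
    """Longest paths in G1(P) from an all-zero start; None if G1(P) has a positive cycle."""
    x = {n: 0.0 for n in nodes}
    for _ in range(len(nodes)):
        changed = False
        for (i, j), d in dbar.items():
            w = t_setup + d - P  # edge weight from x_j - x_i >= T_setup + dbar(i,j) - P
            if x[i] + w > x[j] + tol:
                x[j] = x[i] + w
                changed = True
        if not changed:
            return x
    return None


def min_clock_period(nodes, dbar, t_setup, host, eps=1e-6):
    p_high = max(d + t_setup for d in dbar.values())  # Lemma 5a: zero skews are feasible
    p_low = max([0.0] + [d + t_setup for (i, j), d in dbar.items() if i == j])
    while p_high - p_low > eps:
        p = (p_high + p_low) / 2
        if bellman_ford_skews(nodes, dbar, p, t_setup) is None:
            p_low = p
        else:
            p_high = p
    x = bellman_ford_skews(nodes, dbar, p_high, t_setup)
    shift = x[host]  # pin the host (primary I/O) skew to zero
    return p_high, {n: v - shift for n, v in x.items()}


# Hypothetical netlist with the Fig. 1 block delays: host -> I1 -> I2 -> I3 -> B -> I4 -> host.
fanin = {"host": ["I4"], "I1": ["host"], "I2": ["I1"], "I3": ["I2"], "B": ["I3"], "I4": ["B"]}
fanout = {"host": ["I1"], "I1": ["I2"], "I2": ["I3"], "I3": ["B"], "B": ["I4"], "I4": ["host"]}
delay = {"I1": 1.0, "I2": 1.0, "I3": 1.0, "I4": 1.0}
is_ff = {n: n in ("host", "B") for n in fanin}
level = gate_levels(fanin, is_ff)
dbar = {(i, k): d for i in ("host", "B")
        for k, d in worst_case_delays(i, fanin, fanout, delay, is_ff, level).items()}
P, skews = min_clock_period(["host", "B"], dbar, t_setup=0.0, host="host")
print(round(P, 3), round(skews["B"], 3))  # 2.0 1.0
```

On this example the sketch recovers the 2.0-unit period and the +1.0 skew at B from Section II. The heap keeps the "lowest nonempty level first" discipline of the pseudocode in Section IV-B without scanning every level queue.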


Fig. 5. Retiming across a combinational block.

V. PHASE B: SKEW MINIMIZATION BY RETIMING

A. Introduction

In Phase B, the magnitudes of the clock skews obtained from Phase A are reduced toward zero by applying retiming transformations. This employs relocation of the flip-flops across logic gates while maintaining the optimal clock period previously found. After the skew magnitudes have been reduced as much as possible, the retimed circuit may be implemented either by applying the requisite skews at the flip-flops (to get the minimum achievable clock period) or by setting all skews to zero (to get a clock period that is, as will be shown in Section VI, no more than one gate delay above the optimum). Since any flip-flop to be moved must have a nonzero skew, we divide the relocations into two categories: flip-flops with negative skews and flip-flops with positive skews.

Before we proceed, we state the following result, which is something of a generalization of Theorem 1. The theorem pertains to Fig. 5, where the flip-flops at the inputs are retimed across a combinational block to its outputs.

Theorem 7: a) Retiming transformations may be used to move flip-flops from all of the inputs of any combinational block to all of its outputs. The equivalent skew of the relocated flip-flop at output $j$, considering long path constraints only, is given by

$$x_j = \max_{1 \le i \le n} \left(x_i + \overline{d}(i,j)\right) \tag{12}$$

where the $x_i$, $1 \le i \le n$, are the skews at the input flip-flops, $x_j$ is the equivalent skew at output $j$, and $\overline{d}(i,j)$ is the worst-case delay of any path from $i$ to $j$.

b) Similarly, flip-flops may be moved from all of the outputs of any combinational block to all of its inputs, and the equivalent skew at input $k$, considering long path constraints only, is given by

$$x_k = \min_{1 \le j \le m} \left(x_j - \overline{d}(k,j)\right) \tag{13}$$

where the $x_j$, $1 \le j \le m$, are the skews at the output flip-flops, $x_k$ is the equivalent skew at input $k$, and $\overline{d}(k,j)$ is the worst-case delay of any path from $k$ to $j$.

The proof is omitted and proceeds along the lines of that for Theorem 1. We point out here that, in general, it is not possible to come up with an equivalent skew value that satisfies both long and short path constraints. For example, when we consider short path constraints, moving flip-flops from the inputs to the outputs requires that the new skew be

$$x_j = \min_{1 \le i \le n} \left(x_i + \underline{d}(i,j)\right)$$

(using the same notation as Theorem 7(a)), which is incompatible with the requirement stated above except in the special case where all paths from $i$ to $j$ have the same delay. However, this is not a serious problem here, since we only consider the long path constraints, as stated in Section IV.

While the preceding paragraph may superficially seem to contradict Theorem 1, which is valid for both long path and short path constraints, we reassure the reader that this is not so. Theorem 1 can be considered to be a special case where the combinational subcircuit consists of only one gate, so that the longest and shortest path delays of the subcircuit are clearly equal.

Fig. 6. Retiming for a negative skew flip-flop.

²We assume here that each flip-flop fans out to exactly one gate. If the fanout of a flip-flop is larger than one, then it is replicated at each fanout branch. The replicated flip-flops have exactly one fanout gate, and each such flip-flop is considered in turn.

1) Case 1: Negative skew reduction: Consider the case of a flip-flop $j$, shown in Fig. 6(a), that has a negative skew at the conclusion of Phase A.² If we consider the gate $p$ to which $j$ fans out, we may find its transitive fanins and identify


the flip-flops at the input of the combinational subcircuit to which $p$ belongs. Through retiming operations, it is possible to transform the circuit in Fig. 6(a) to the one in Fig. 6(b), and the equivalent skews at each flip-flop in Fig. 6(b) are calculated. At this point, it need only be noted that the equivalent skews for these flip-flops are found without physically moving them to the gate inputs. To see why the transformation from Fig. 6(a) to (b) is always possible, note that the combinational block shown in the figure can be replaced by a set of input-output delay constraints. We may then apply the result of Theorem 7(a) to obtain the equivalent skews; the complete procedure will be described later. There now exists a set of $n$ virtual flip-flops³ at the input to gate $p$, as shown in Fig. 6(b). Let $x_1, x_2, \ldots, x_n$ be the skews of these flip-flops. They must satisfy the constraints
$$x_k + \text{delay} \le x_i + P \qquad \forall\, 1 \le k \le n \tag{14}$$

where $P$ is the clock period, $x_i$ is the skew at an output flip-flop, $\mathrm{FF}_i$ (not shown), of the combinational block to which $\mathrm{FF}_1, \ldots, \mathrm{FF}_n$ are input flip-flops, and delay is the largest combinational delay from the input of gate $p$ to $\mathrm{FF}_i$. Obviously, from the above constraints,
$$\max_{1 \le k \le n}(x_k) + \text{delay} \le x_i + P. \tag{15}$$
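The virtual skews $x_1, \ldots, x_n$ that enter (14) and (15) come from Theorem 7, whose formulas (12) and (13) are plain max/min reductions over worst-case path delays. The following is a minimal sketch; the function names are hypothetical, and we assume that an input with no path to the output is simply left out of the delay map, since it imposes no long-path constraint.

```python
def skew_forward(x_in, dbar_to_j):
    """Theorem 7(a), eq. (12): equivalent skew at output j after moving the flip-flops
    at the block inputs i to output j (long-path constraints only).
    x_in[i]: skew at input flip-flop i; dbar_to_j[i]: worst-case delay from i to j."""
    return max(x_in[i] + d for i, d in dbar_to_j.items())


def skew_backward(x_out, dbar_from_k):
    """Theorem 7(b), eq. (13): equivalent skew at input k after moving the flip-flops
    at the block outputs j back to input k (long-path constraints only)."""
    return min(x_out[j] - d for j, d in dbar_from_k.items())


# A single unit-delay gate reduces to Theorem 1(b): -0.75 becomes 0.25.
print(skew_forward({"a": -0.75}, {"a": 1.0}))                      # 0.25
# Two inputs reaching output j through worst-case delays 2.0 and 1.0.
print(skew_forward({"a": -1.0, "b": -0.5}, {"a": 2.0, "b": 1.0}))  # 1.0
# Two outputs reached from input k through worst-case delays 2.0 and 1.0.
print(skew_backward({"u": 1.0, "v": 0.5}, {"u": 2.0, "v": 1.0}))   # -1.0
```

Taking the maximum in the forward direction (and the minimum in the backward one) is what makes the long-path constraints of every relocated path hold simultaneously; it is also why no single value can honor the short-path constraints, which would call for the minimum instead.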

Now, there can exist two scenarios:

1) All of the $n$ flip-flops at the inputs have negative skews. In this case, the maximum of their skews is negative, and hence the set of flip-flops may be moved across the gate $p$, as shown in Fig. 6(c). If the sign of the skew were to change after the relocation, the relocation would not be carried out unless it reduced the magnitude of the skew; if it did not, the flip-flop would be left in its current location with a skew of $\max_{1 \le i \le n}(x_i)$. For example, a flip-flop with a skew of $-0.75$, to be moved across a gate with unit delay, would have a new skew value of $0.25$; such a relocation would be desirable, since it reduces the magnitude of the clock skew (this idea is better understood in the context of Lemma 8 and Theorem 9, to be stated and proved later in this paper). However, it would be undesirable to move a flip-flop with skew $-0.25$ across a unit delay gate, since its new skew, $0.75$, would be larger in magnitude. Therefore, in either case, this effects a reduction in the magnitudes of the skew values, as is desirable.

2) One or more of the virtual flip-flops has a positive effective skew. In this case, the skew at the flip-flop $j$ under consideration may be set to zero without violating any timing constraints, since the maximum skew at the input to gate $p$ would be unchanged by this operation. One example where this may occur is when one of the inputs to the combinational block is a primary input of the sequential circuit; the equivalent skew of the corresponding virtual flip-flop at the gate input would be positive. Note that, due to this, it is never necessary for a primary input to be relocated, which is in conformance with our assumption that these are immovable.

³We refer to these flip-flops as virtual flip-flops because we do not physically move them to the input of gate $p$ at this point.

Fig. 7. Retiming for a positive skew flip-flop.

2) Case 2: Positive skew reduction: In the case of a flip-flop $j$ that has a positive skew at the end of Phase A, as shown in Fig. 7(a),⁴ the procedure parallels that described above. Through retiming operations, it is possible to transform the circuit in Fig. 7(a) to the one in Fig. 7(b); the equivalent skews at each flip-flop in Fig. 7(b) are calculated using Theorem 7(b), and the precise procedure will be described later. Therefore, at the output of the gate $p$, there now exists a set of $n$ virtual flip-flops, as shown in Fig. 7(b), with effective skews $x_1, x_2, \ldots, x_n$. The skews at these flip-flops need to satisfy the constraints

$$x_i + \mathit{delay} \le x_k + P \qquad \forall\, 1 \le k \le n \tag{16}$$

where x_i is the skew at an input flip-flop, FF_i (not shown), of the combinational block to which FF_1, ..., FF_n are output flip-flops, and delay is the largest combinational delay from FF_i to the output of gate p. The above constraints reduce to

$$x_i + \mathit{delay} \le \min_{1 \le k \le n} (x_k) + P. \tag{17}$$
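The single-gate decisions of Cases 1 and 2 can be condensed into a short sketch. It is an illustration under stated assumptions, not the ASTRA implementation: the function names `case1_negative` and `case2_positive` are hypothetical, skews and delays are plain floats in the same units, the skews passed in are the equivalent skews of the virtual flip-flops at gate p (including j's own), and we assume, reading constraints (15) and (17), that a flip-flop that is not relocated may take the extreme value max(x_k) in Case 1 or min(x_k) in Case 2. The Case 2 slack refinement follows the rule given with the Case 2 scenarios.

```python
from typing import Sequence, Tuple


def case1_negative(input_skews: Sequence[float], gate_delay: float) -> Tuple[bool, float]:
    """Case 1: a negative-skew flip-flop j fans out to gate p.

    input_skews: equivalent skews x_1..x_n of the (virtual) flip-flops at the
    inputs of p, including j's own skew. gate_delay: delay d of gate p.
    Returns (relocate_across_p, new_skew).
    """
    if any(x > 0 for x in input_skews):
        # Scenario 2: some input already has a positive effective skew, so
        # j's skew can be zeroed in place without violating (15).
        return False, 0.0
    # Assumption: an unmoved j may take max(x_k), the only term in (15).
    stay = max(input_skews)
    moved = stay + gate_delay      # skew after relocating the set across p
    # Relocate only if the skew magnitude shrinks (a sign change is allowed).
    if abs(moved) < abs(stay):
        return True, moved
    return False, stay


def case2_positive(output_skews: Sequence[float], gate_delay: float,
                   slack: float = 0.0) -> Tuple[bool, float]:
    """Case 2 (mirror image): a positive-skew flip-flop j is fed by gate p.

    output_skews: equivalent skews at the outputs of p, including j's own.
    slack: slack(r) of the relevant input r of p (0 if none).
    """
    if any(x < 0 for x in output_skews):
        # Scenario 2: min(x_k) in (17) is unaffected, so zero j in place.
        return False, 0.0
    # Assumption: an unmoved j may take min(x_k), the only term in (17).
    stay = min(output_skews)
    moved = stay - gate_delay      # skew after relocating the set across p
    if abs(moved) >= abs(stay):
        return False, stay
    if moved >= 0:
        # Slack refinement: zero if slack(r) > min(x) - d, else subtract slack(r).
        moved = 0.0 if slack > moved else moved - slack
    return True, moved
```

With the numbers from the Case 1 discussion, `case1_negative([-0.75], 1.0)` returns `(True, 0.25)`, whereas `case1_negative([-0.25], 1.0)` returns `(False, -0.25)`, because moving that flip-flop would raise its skew magnitude to 0.75.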

As before, we may have one of two scenarios:
1) If all of the n flip-flops in the array have positive skews, the minimum of their skews is positive, and hence the set of flip-flops may be moved across the gate p, as illustrated in Fig. 7(c). If the sign of the skew were to change after the relocation, the relocation would not be carried out unless it reduced the magnitude of the skew. Therefore, in either case, this effects a reduction in the positive skew values, as is desirable. One may also take advantage of slacks in the combinational paths to reduce the skews at flip-flops. If input r to gate p has a slack, slack(r) (i.e., the worst-case delay at input r could have been increased by slack(r) without changing the worst-case delay to the output of gate p), then the skew may be further reduced by slack(r). If slack(r) > min(x_i) - d, the skew is set to zero; if not, it is set to min(x_i) - d - slack(r).
2) If one or more of the virtual flip-flops has a negative skew value, then the skew at the flip-flop j under consideration is set to zero. This violates no timing constraints, since it leaves the minimum skew at an output of gate p unchanged.

⁴Each flip-flop has exactly one fanin, and hence the problem of multiplicity described in a footnote for Case 1 is irrelevant.

Fig. 8. Handling flip-flops with multiple fanouts. [Panels (a) and (b); each flip-flop shown is labeled Skew = x.]

B. Minimization Procedure for Case 1

The steps involved in minimizing the skew magnitudes for flip-flops with negative skews are outlined below:

Step 1: All flip-flops in the circuit with negative skews are placed on a queue, Q. For each flip-flop, we consider one fanout at a time; in other words, the situation shown in Fig. 8(a) is transformed to the form in Fig. 8(b).

Step 2: Let j be the flip-flop that is currently at the head of the queue, and p the gate that it fans out to. We will assume for now that p is a gate; the case where it is the input of a flip-flop will be dealt with separately. The equivalent skew at every other fanin node of p is found. We do not physically move flip-flops to the inputs of gate p at this time, and hence we imagine that the inputs to gate p are a set of virtual flip-flops. The equivalent skews are found as follows. Consider a flip-flop j with negative skew, as shown in Fig. 9. The gate p to which it fans out is added to the tail of a queue R.⁵ A reverse PERT is employed to backtrace along the fanin cone of gate p. When a gate is encountered, it is added to the queue. In Fig. 9, gate x is first added to R, and in the next step, gate y is added. The backtrace continues until a flip-flop is encountered. In the example in Fig. 9, the backtrace terminates when flip-flop x is encountered. During this process, we keep track of the worst-case delay, d, to gate p. As a consequence of Theorem 3, if the skew (calculated in Phase A) at flip-flop x is t units, then its equivalent skew at the input to gate p is t + d units.

Step 3: If any equivalent skew at a virtual flip-flop is positive, then the skew at j is set to zero and it is not relocated; if not, the skew after retiming is found using the criteria described earlier. If the magnitude of this skew is smaller than the current skew at j, then j and all of the virtual flip-flops at the input to p are retimed across p. (Notice that if the skew changes sign after retiming, then the magnitude of the retimed skew could become larger. Only those sign-changing moves that reduce the skew magnitude are permitted.) Note that the motion of the virtual flip-flops to their new location may entail replicating these flip-flops, as shown in Fig. 8. For example, if flip-flop j were to be moved across gate p, a new flip-flop would have to be created at y_2 with an equivalent skew corresponding to the skew of flip-flop x, retimed to position y_2. The new skews are found as explained in the previous section. Any such flip-flops that have a negative skew (as will happen most of the time, unless relocation changed the sign of the skew) are now placed at the tail of the queue, Q, and are processed later.

Step 4: If the retimed flip-flop has a negative skew, it is placed at the tail of Q.

Step 5: If Q is not empty, go to Step 1; if not, the magnitudes of all negative skew values have been minimized.

⁵Note that the queue R is distinct from the queue Q.

Fig. 9. Reproduction of flip-flops in backtrace.

In Step 2 above, if the flip-flop j fans out to another flip-flop, which we shall call k, then since there is no combinational delay between the flip-flops, and retiming preserves the zero-clocking constraints, it must be true that

$$x_j + T_{setup} \le x_k + P, \quad \text{i.e.,} \quad x_j \le x_k + P - T_{setup}.$$

If the right-hand side (RHS) is positive, then x_j can be set to zero without violating any constraints. If not, then x_k ≤ T_setup - P < 0, which implies that k is a flip-flop that will eventually move to the right, thereby allowing flip-flop j the leeway to move as well. Therefore, if this is the case, the skew of flip-flop j is set to

$$x_j = x_k + P - T_{setup} < 0 \tag{18}$$

and the flip-flop j is reprocessed after flip-flop k has been processed (i.e., its skew is set to the value calculated above, and it is placed at the tail of Q). It will be shown in Lemma 8 that all such flip-flops will eventually be processed, and their skews set to nearly zero. Interestingly, the latter case was almost never seen to happen in the examples presented in this paper. Note that in spite of Theorem 7, it is still necessary to go through the reverse PERT procedure to ensure that flip-flops are created at fanout points like y_2.

C. Minimization Procedure for Case 2

The steps involved in minimizing the skews at flip-flops with positive skews are analogous to those described in Section V-B, and are outlined below:

Step 1: All flip-flops in the circuit with positive skews are placed on a queue, Q. Note that each flip-flop has precisely one fanin.

Step 2: Let j be the flip-flop that is currently at the head of the queue, and p the gate that fans into it. As in Case 1, we will postpone the discussion of the case where j has a fanin from a flip-flop. The equivalent skew of the virtual flip-flops at every other fanout node of p is found. The procedure for finding the equivalent skews is as follows. Consider a flip-flop j with positive skew, as shown in Fig. 7. The gate p that fans into it is added to the tail of a queue R.⁶ Analogously to the procedure for Case 1, a forward PERT is employed to trace the fanout cone of gate p. When a gate is encountered, it is added to the queue. The trace continues until a flip-flop is encountered. During the process, we keep track of the worst-case delay, d, from gate p. As a consequence of Theorem 1, if the optimal skew at a flip-flop is t units, then its equivalent skew at the output of gate p is t - d units.

Step 3: If any equivalent skew at a virtual flip-flop is negative, then the skew at j is set to zero and it is not relocated; if not, the skew after retiming is found using the criteria described earlier. If the magnitude of this skew is smaller than the current skew at j, then j and all of the virtual flip-flops at the output of p are retimed across p. The new skews are found as explained in the previous section. Any such flip-flops that have a positive skew (as will happen most of the time, unless relocation changed the sign of the skew) are now placed at the tail of the queue Q and are processed later.

Step 4: If the retimed flip-flop has a positive skew, it is placed at the tail of Q.

Step 5: If Q is not empty, go to Step 1.

⁶As before, note that R is distinct from the queue Q.

In Step 2 above, if the flip-flop j has a fanin from another flip-flop, which we shall call k, then since there is no combinational delay between the flip-flops, and retiming preserves the zero-clocking constraints, it must be true that

$$x_k + T_{setup} \le x_j + P, \quad \text{i.e.,} \quad x_j \ge x_k - P + T_{setup}.$$

If the RHS is negative, then x_j can be set to zero without violating any constraints. If not, then we have x_k ≥ P - T_setup > 0, which implies that k is a flip-flop that will eventually move to the left, thereby allowing flip-flop j the leeway to move as well. Therefore, if this is the case, the skew of flip-flop j is set to

$$x_j = x_k - P + T_{setup} > 0 \tag{19}$$

and the flip-flop j is reprocessed after flip-flop k has been processed. It will be shown in Lemma 8 that all such flip-flops will eventually be processed. As in the analogous situation in Case 1 described in the previous section, the latter case was almost never observed in the ISCAS89 examples.

VI. PROPERTIES OF THE RETIMING PROCEDURE

Lemma 8: At the end of the retiming procedure in Phase B, the skew at each flip-flop is no more than half a gate delay.

Proof: Assume, for purposes of contradiction, that the magnitude of the skew at some flip-flop after Phase B is larger than half a gate delay. Assume also that each flip-flop has a single fanout; in case of multiple fanouts, the flip-flop can be replicated as shown in Fig. 8. Then one of two possibilities exists:
1) If the flip-flop fans out (fans in) to a gate G when the skew is negative (positive), then using the procedure described above, the skew magnitude can be reduced, either to zero, keeping the flip-flop in its current location, or by moving the flip-flop across gate G. This contradicts the fact that Phase B is complete.
2) If the flip-flop i fans out to another flip-flop j (a primary output is also considered to be a flip-flop), then the skew of flip-flop i is either set to zero (if possible) or to a value whose magnitude is smaller than that of x_i. This can be seen from (18), (19), and from Lemma 5b (which affirms that P_opt > T_setup).

Now, consider the case of negative skew flip-flops; the situation for positive skew flip-flops is analogous. Assume that we have a situation where flip-flops j_1, j_2, ..., j_m are connected directly to each other in a chain and their skews remain negative at the end of Phase B. The flip-flops must necessarily be connected in a cycle; if not, there would be a first (leftmost) flip-flop in the chain that is connected either to a gate or to a flip-flop with zero skew, which implies that their skew magnitudes can be reduced, and contradicts the assumption that Phase B is complete.

Now when each flip-flop in the cycle is processed, the magnitude of its skew is reduced, as shown above. Hence, if the procedure described in Section V-B is applied, then each skew magnitude must eventually be reduced to zero, which contradicts the assumption that Phase B is over. Therefore, we cannot have a negative skew flip-flop connected directly to a negative skew flip-flop at the end of Phase B. □

Theorem 9: If, at the end of the retiming procedure, all skews are set to zero, then the optimal clock period for this circuit is no more than P_opt + d_max, where P_opt is the optimal clock period found in Phase A, and d_max is the maximum delay of any gate in the circuit.

Proof: By Lemma 8, the skew at each flip-flop after Phase B is no more than half a gate delay. Hence, to ensure that the double-clocking constraint is satisfied, it is easy to see that one may increase the period from P_opt by the largest skew difference x_i - x_j, where x_i, x_j are the skews of any pair of flip-flops that are connected by a combinational path. Clearly, since |x_k| ≤ d_max/2 for all k, the clock period required to ensure that all double-clocking constraints are satisfied is no more than P_opt + d_max. Note also that since any period achievable by retiming is also achievable using skews, the final period can be no better than P_opt, which is the optimal period using skew optimization. □

VII. EXPERIMENTAL RESULTS

The algorithm was implemented as a C program, ASTRA. Experimental results from running ASTRA on all circuits in the ISCAS89 benchmark suite (including the Addendum93 circuits) are presented in Table I. For each circuit, the table provides data that describes its size in terms of the number of combinational gates, |G|, and flip-flops, |F|_init. All gates are assumed to have unit delays (although the algorithm is certainly not restricted to unit delay circuits), and the setup and hold times are arbitrarily set to 0. P_max is the upper bound on the clock period provided by Lemma 5a; note that P_max corresponds to the clock period in the original circuit. The next column shows the optimal value, P_opt, calculated at the end of Phase A of ASTRA using clock skew optimization. At the end of retiming in Phase B of ASTRA, any skews that could not be set exactly to zero are forced to zero, and the new period P_ret is shown, along with the corresponding percentage improvement over the initial period, P_max. This period corresponds to the maximum delay of any combinational segment in the retimed circuit. As expected, in each case P_ret is within one gate delay of P_opt. The CPU times for running ASTRA on an HP 735 workstation for each of the two phases, and the total CPU time, are shown for all these circuits. Note that the time for Phase A includes the time for calculating the values of d(i,j), the maximum delay from flip-flop i to flip-flop j, as well as the time for the Bellman-Ford iterations. The last column shows the number of flip-flops in the retimed circuit.

For example, for s15850.1, a circuit with 9772 gates and 534 flip-flops, the value of P_max is found to be 82.0 units. At the end of Phase A (skew optimization), ASTRA calculates the value of P_opt to be 63.0 units. The value of P_ret at the end of Phase B of ASTRA is 63.0 units. The improvement in the clock period after the two phases is calculated as

$$\%\,\text{change} = \frac{P_{max} - P_{ret}}{P_{ret}} \times 100\%$$

and is found to be 30.2% for s15850.1. The number of flip-flops increased in the process from 534 to 572. It can be seen from the table that in 36 out of 44 circuits, ASTRA caused the clock period to improve, with the improvement being as much as 220.7% in the case of s6669. Although it is theoretically possible for retiming to reduce the number of flip-flops in the circuit, this was never seen to happen. It should be stressed that the techniques presented here relocate flip-flops across the minimal number of levels of gates; however, they do not minimize the number of flip-flops, and this is a topic for further research.

It is worth noting that the CPU times for ASTRA are rather small; even the largest circuit could be retimed in just over a minute. The run-times of ASTRA versus the circuit size, |G|, are shown in Fig. 10. It is very difficult to infer a general relationship between the circuit size and the execution time. However, it may be inferred from these graphs and from Table I that for large circuits where large improvements in the clock period are possible, the algorithm is likely to have relatively larger run-times. A second point that can be inferred is that very often, when the clock period is reduced tremendously, the run-time increases because the amount of work required of Phase B increases.

Fig. 10. CPU time versus circuit complexity. [Horizontal axis: Number of Gates.]

VIII. CONCLUSION

An approach that takes advantage of the equivalence between retiming and clock skew is presented, and is used to perform gate-level retiming. Results on all of the circuits in the ISCAS89 benchmark suite have been presented, showing that these circuits can easily be handled by this algorithm.

The chief reason for the improvement is that ASTRA takes a global view of retiming by first solving the clock skew problem in a smaller number of variables. In the second phase, local transformations are used to perform the retiming. The logic behind this approach is that a circuit would have to be very poorly designed indeed to require enormous computation time for the local transformations, and hence in most practical cases, the latter phase takes only a small amount of computation; this is borne out by our experimental results.

It must be pointed out that the algorithm performs retiming only for timing optimization and does not take into account the fact that retiming may cause initial states to change. Therefore, in its current avatar, it is more applicable to the timing optimization of pipelined circuits than to the optimization of control unit circuitry, unless such circuits are designed using the techniques in [21].

The ASTRA algorithm may easily be adapted to perform retiming to satisfy a given clock period, P_spec. Phase A then consists of a single pass through the graph G1(P_spec). If P_spec is infeasible, this will be reported immediately; if not, the skew solution at the end of Phase A may be translated to a retiming solution using the methods of Phase B.

The algorithm directly provides a result to the combined clock skew optimization and retiming problem. The use of this algorithm to minimize the number of flip-flops using retiming only is a topic for further research. Another possible extension is the use of deliberate skews with retiming to minimize the number of flip-flops.

ACKNOWLEDGMENT

The authors would like to thank Dr. J. Fishburn of AT&T Bell Laboratories and Prof. L.-F. Chao of Iowa State University for some helpful discussions. They also thank the anonymous reviewers for their helpful comments.

TABLE I
RESULTS OF CLOCK SKEW OPTIMIZATION

Circuit     |G|     |F|init  P_max  P_opt  P_ret  % change  CPU time
s4863       2342    104      58.0   30.0   30.0   93.3%     0.24 s
s5378       2779    179      25.0   21.0   21.0   19.0%     0.24 s
s6669       3080    239      93.0   29.0   29.0   220.7%    0.32 s
s9234.1     5597    211      58.0   38.0   38.0   52.6%     0.8 s
s13207.1    7951    638      59.0   51.0   51.0   15.7%     0.98 s
s15850.1    9772    534      82.0   63.0   63.0   30.2%     3.31 s
s35932      16065   1728     29.0   27.0   27.0   7.4%      3.11 s
s38417      22179   1636     47.0   31.5   32.0   46.9%     56.2 s
s38584.1    19253   1426     56.0   48.0   48.0   16.7%     3.8 s
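As a quick consistency check on Table I, the sketch below recomputes %change = (P_max - P_ret)/P_ret from the tabulated periods and checks the bound P_opt ≤ P_ret ≤ P_opt + d_max from Theorem 9 with d_max = 1 (unit gate delays). The names `TABLE_I`, `percent_change`, and `inconsistent_rows` are hypothetical, the data are the nine rows listed in Table I, and the 0.05 tolerance assumes the reported percentages were rounded to one decimal place.

```python
from typing import Dict, List, Tuple

# Rows of Table I: circuit -> (P_max, P_opt, P_ret, reported % change).
TABLE_I: Dict[str, Tuple[float, float, float, float]] = {
    "s4863": (58.0, 30.0, 30.0, 93.3),
    "s5378": (25.0, 21.0, 21.0, 19.0),
    "s6669": (93.0, 29.0, 29.0, 220.7),
    "s9234.1": (58.0, 38.0, 38.0, 52.6),
    "s13207.1": (59.0, 51.0, 51.0, 15.7),
    "s15850.1": (82.0, 63.0, 63.0, 30.2),
    "s35932": (29.0, 27.0, 27.0, 7.4),
    "s38417": (47.0, 31.5, 32.0, 46.9),
    "s38584.1": (56.0, 48.0, 48.0, 16.7),
}

UNIT_GATE_DELAY = 1.0  # Table I assumes unit gate delays, so d_max = 1


def percent_change(p_max: float, p_ret: float) -> float:
    """%change = (P_max - P_ret) / P_ret * 100."""
    return 100.0 * (p_max - p_ret) / p_ret


def inconsistent_rows(
    table: Dict[str, Tuple[float, float, float, float]] = TABLE_I,
) -> List[str]:
    """Return circuits whose row disagrees with %change or the Theorem 9 bound."""
    bad = []
    for name, (p_max, p_opt, p_ret, reported) in table.items():
        # Reported percentages are rounded to one decimal place.
        if abs(percent_change(p_max, p_ret) - reported) > 0.05:
            bad.append(name)
        # Theorem 9: P_opt <= P_ret <= P_opt + d_max.
        elif not (p_opt <= p_ret <= p_opt + UNIT_GATE_DELAY):
            bad.append(name)
    return bad


if __name__ == "__main__":
    print(inconsistent_rows())  # prints [] for the rows above
```

Every listed row passes both checks; for s15850.1, for instance, (82.0 - 63.0)/63.0 gives 30.16%, which rounds to the reported 30.2%.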


REFERENCES
[1] C. E. Leiserson and J. B. Saxe, "Retiming synchronous circuitry," Algorithmica, vol. 6, pp. 5-35, 1991.
[2] J. P. Fishburn, "Clock skew optimization," IEEE Trans. Comput., vol. 39, pp. 945-951, July 1990.
[3] H.-G. Martin, "Retiming by combination of relocation and clock delay adjustment," in Proc. Euro. Design Automation Conf., 1993, pp. 384-389.
[4] B. Lockyear and C. Ebeling, "Minimizing the effect of clock skew via circuit retiming," Tech. Rep. UW-CSE-93-05-04, Dept. Comput. Sci. Eng., Univ. Washington, Seattle, 1993.
[5] L.-F. Chao and E. H.-M. Sha, "Retiming and clock skew for synchronous systems," in Proc. IEEE Int. Symp. Circuits Syst., 1994, pp. 1.283-1.286.
[6] E. M. Sentovich et al., "SIS: A system for sequential circuit synthesis," Tech. Rep. UCB/ERL M92/41, Electron. Res. Lab., Univ. California at Berkeley, May 1992.
[7] N. Shenoy and R. Rudell, "Efficient implementation of retiming," in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, 1994, pp. 226-233.
[8] S. Chakradhar, "A fast optimal retiming algorithm for sequential circuits," Tech. Rep. 93-CO19-4-5506-5, CCRL, NEC USA, Inc., Princeton, NJ, 1993.
[9] M. A. B. Jackson, A. Srinivasan, and E. S. Kuh, "Clock routing for high-performance ICs," in Proc. ACM/IEEE Design Automation Conf., 1990, pp. 573-579.
[10] A. Kahng, J. Cong, and G. Robins, "High-performance clock routing based on recursive geometric matching," in Proc. ACM/IEEE Design Automation Conf., 1991, pp. 322-327.
[11] R.-S. Tsay, "Exact zero skew," in Proc. IEEE Int. Conf. Computer-Aided Design, 1991, pp. 336-339.
[12] S. Pullela, N. Menezes, and L. T. Pillage, "Reliable nonzero clock skew trees using wire width optimization," in Proc. ACM/IEEE Design Automation Conf., 1993, pp. 165-170.
[13] S. Pullela, N. Menezes, J. Omar, and L. T. Pillage, "Skew and delay optimization for reliable buffered clock trees," in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, 1993, pp. 556-562.
[14] L.-F. Chao, "Scheduling and behavioral transformations for parallel systems," Ph.D. dissertation, Dept. Comput. Sci., Princeton Univ., 1993.
[15] D. Joy and M. Ciesielski, "Clock period minimization with wave pipelining," IEEE Trans. Computer-Aided Design, vol. 12, pp. 461-472, Apr. 1993.
[16] N. V. Shenoy, R. K. Brayton, and A. L. Sangiovanni-Vincentelli, "Minimum padding to satisfy short path constraints," in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, 1993, pp. 156-161.
[17] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms. New York: McGraw-Hill, 1990.
[18] S. Even, Graph Algorithms. Potomac, MD: Comput. Sci. Press, 1979.
[19] W. Chuang, S. S. Sapatnekar, and I. [Link], "Timing and area optimization for standard cell VLSI circuit design," IEEE Trans. Computer-Aided Design, vol. 14, pp. 308-320, Mar. 1995.
[20] T. G. Szymanski, "Computing optimal clock schedules," in Proc. ACM/IEEE Design Automation Conf., 1992, pp. 399-404.
[21] H. J. Touati and R. K. Brayton, "Computing the initial states of retimed circuits," IEEE Trans. Computer-Aided Design, vol. 12, pp. 157-162, Jan. 1993.

Sachin S. Sapatnekar (S'86-M'93), for a photograph and biography, see p. 1011 of the August 1996 issue of this TRANSACTIONS.

Rahul B. Deokar received the B.E. degree in electronics engineering from Victoria Jubilee Technical Institute (VJTI), Bombay, in 1992 and the M.S. degree in computer engineering from Iowa State University, Ames, in 1994. Since August 1994, he has been a Member of Technical Staff with the Design Automation Center of AT&T Bell Laboratories, Murray Hill, NJ. His research interests include graph theory, algorithms for performance and power optimization (employing transistor sizing, retiming, and skew optimization), and logic and timing simulation.

