EE 465 Final Project
Fall, 2013
Lab 9: System Power Optimization
Xia, Jiwei
Li, Muzi
12/10/2013
EE 465 FINAL DESIGN PROJECT December 10, 2013
EE 465 Fall 2013
Final Design Project
Introduction: In this project, we design a triangle rendering engine.
We first write Verilog code for the circuit and synthesize it with RTL
Compiler to obtain the schematic, then use Encounter to produce the layout.
After recording the results from RTL Compiler and Encounter, we redo the
design to optimize the circuit. Our options are as follows:
1. Using pipelining to reduce the area. In this lab, the main use of
pipelining is to reduce the number of multipliers by reusing one
multiplier over several clock cycles.
2. Applying clock gating to turn off unnecessary activity in the circuit.
This may save power and area.
3. Modifying the logic of the essential part to make the calculation
more efficient.
Project description: Given the relevant files and a skeleton of the
Verilog code, we design a triangle rendering engine.
Files given:
Triangle Rendering Engine description (pdf)
Supporting files (tar): input.dat, testfixture.v, triangle.sdc, triangle.v,
expect.dat, triangle.vhd
The block overview is shown in Fig. 1. What we need to design is the
shaded part on the right. The I/O interface is described in Fig. 2.
Fig. 1 block overview
Fig. 2 I/O Interface
Functional Example
The inputs are the three vertices (x1,y1), (x2,y2) and (x3,y3) of a
triangle, which are (1,1), (6,3) and (1,6) in this case.
Note: the inputs are constrained so that x1=x3 and y1<y2<y3.
These vertices produce the triangle shown in Fig.3.
The triangle rendering engine outputs the valid coordinates in the
following order:
(1,1), (1,2), (2,2), (3,2), (1,3), (2,3), (3,3), (4,3), (5,3), (6,3), (1,4),
(2,4), (3,4), (4,4), (1,5), (2,5), (1,6)
Fig.3 Example of a triangle
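As a cross-check of the listed order, the following simulation-only sketch enumerates the example triangle in raster order (row by row, left to right). It uses the multiplier-based edge tests from the Calculation section of this report. The module name triangle_ref_model and the localparam names are assumptions made for illustration; they are not part of the supplied files, and the sketch only covers the case x1 = x3 < x2.

```verilog
// Simulation-only reference model (hypothetical; not part of the supplied files).
// Lists every grid point that is inside or on the edge of the example triangle.
module triangle_ref_model;
  // Example vertices from the text; x1 == x3 < x2 and y1 < y2 < y3 are assumed.
  localparam integer X1 = 1, Y1 = 1;
  localparam integer X2 = 6, Y2 = 3;
  localparam integer X3 = 1, Y3 = 6;

  integer x, y, bot, top, count;

  initial begin
    count = 0;
    // Scan the 8x8 grid row by row, as the engine does.
    for (y = 0; y < 8; y = y + 1) begin
      for (x = 0; x < 8; x = x + 1) begin
        // Divider-free edge tests (same expressions as bot_line / top_line).
        bot = (x - X1) * (Y2 - Y1) - (X2 - X1) * (y - Y1);
        top = (x - X3) * (Y2 - Y3) - (X2 - X3) * (y - Y3);
        if (y >= Y1 && x >= X1 && bot <= 0 && top >= 0) begin
          $display("(%0d,%0d)", x, y);
          count = count + 1;
        end
      end
    end
    $display("%0d points", count);
    $finish;
  end
endmodule
```

Run in any Verilog simulator, this should print the 17 coordinates of the list above in the same order, from (1,1) to (1,6).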
Design methodology and details
For coordinate imports:
From the test bench we know that the coordinates of the triangle are
input one pair per CYCLE.
`define CYCLE 100 // Modify your clock period here (unit: 0.1ns)
Each CYCLE is 100 * 0.1 ns = 10 ns.
For three points, we need to wait three cycles to receive the complete
set of coordinates.
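if (Start_importing && ~busy) begin
  current_i <= current_i +1;     // advance to the next point
end
if (current_i==3'b001) begin     // first point
  x1 <= xii;
  y1 <= yii;
end
if (current_i==3'b010) begin     // second point
  x2 <= xii;
  y2 <= yii;
end
if (current_i==3'b011 ) begin    // third point
  x3 <= xii;
  y3 <= yii;
end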
After the counter reaches 3, we set busy to 1, which indicates that we
are busy calculating and prevents new data from being received from the
test bench.
Calculation
To judge whether a point is inside or on the edge of the triangle, we
need to use the equation provided.
Note: for the upper line, we need to invert the inequality sign.
However, these equations require a divider, which is difficult to
implement. More importantly, a divider consumes more area and power
than a multiplier. As a result, we modify the equations to use
multipliers instead.
reg signed [7:0] bot_line;
reg signed [7:0] top_line;
bot_line <=(X_next-x1)*(y2-y1)-(x2-x1)*(Y_next-y1);
top_line <=(X_next-x3)*(y2-y3)-(x2-x3)*(Y_next-y3);
To keep the modified inequalities equivalent to the original ones, we
compare the values only when Y lies between y1 and y3. This ensures that
the inequality sign does not change direction when both sides are
multiplied by the denominator.
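The step from the divider form to the multiplier form can be written out as follows. The provided equation is not reproduced in this report, so the slope form on the left is an assumption; the inequalities on the right are the ones used in the code, for the case x1 = x3 < x2 with y1 < y2 < y3.

```latex
\begin{align*}
% Bottom edge from (x_1,y_1) to (x_2,y_2): the denominator y_2 - y_1 is positive
X - x_1 \le \frac{(x_2 - x_1)(Y - y_1)}{y_2 - y_1}
  &\iff (X - x_1)(y_2 - y_1) - (x_2 - x_1)(Y - y_1) \le 0 \\
% Top edge from (x_3,y_3) to (x_2,y_2): the denominator y_2 - y_3 is negative,
% so multiplying by it reverses the inequality
X - x_3 \le \frac{(x_2 - x_3)(Y - y_3)}{y_2 - y_3}
  &\iff (X - x_3)(y_2 - y_3) - (x_2 - x_3)(Y - y_3) \ge 0
\end{align*}
```

As a check with the example triangle (1,1), (6,3), (1,6): for the point (3,2), bot_line = 2*2 - 5*1 = -1 <= 0 and top_line = 2*(-3) - 5*(-4) = 14 >= 0, so the point is accepted; for (4,2), bot_line = 3*2 - 5*1 = 1 > 0, so it is rejected. Since all coordinates are 3-bit values, each factor lies in [-7, 7], each product in [-49, 49] and each difference in [-98, 98], which fits the signed 8-bit registers declared above.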
For a triangle like this, with x1=x3<x2:
if (Y>=y1 && x1<x2 && X>=x1 && bot_line<=0 && top_line>=0 ) begin
  po<=1;
  xo<= X;
  yo<= Y;
end
Fig.4 Example of a triangle
For a triangle like this, with x1=x3>x2:
if (Y>=y1 && x1>x2 && X<=x1 && bot_line>=0 && top_line<=0 ) begin
  po<=1;
  xo<= X;
  yo<= Y;
end
Fig.5 Example of a triangle
Verilog Design
Code Version 1
Code
module triangle (clk, reset, nt, xi, yi, busy, po, xo, yo);
  input clk, reset, nt;
  input [2:0] xi, yi;
  output busy, po;
  output [2:0] xo, yo;
  reg [2:0] x1,y1;            // coordinates of points 1, 2, 3
  reg [2:0] x2,y2;
  reg [2:0] x3,y3;
  reg [2:0] x11,y11;          // coordinates of points 4, 5, 6
  reg [2:0] x22,y22;          // passed to x1,y1, x2,y2, x3,y3 later for calculation
  reg [2:0] x33,y33;
  reg [2:0] xii, yii;         // temporary storage of xi, yi
  reg busy,po;
  reg [2:0] xo,yo;
  reg [2:0] current_i;        // counter indicating which point's coordinates are
                              // being recorded, e.g. current_i=1: x1=xi, y1=yi
  reg [2:0] X;
  reg [2:0] X_next;           // next value of X
  reg [2:0] Y;
  reg [2:0] Y_next;           // next value of Y
  reg Start_importing;        // whether coordinates are still being recorded
  reg signed [7:0] bot_line;  // expressions for judging whether the point is
  reg signed [7:0] top_line;  // on the left or right of each line
  reg cycle;                  // which triangle is being calculated:
                              // 0 = first triangle, 1 = second triangle

  always@(posedge clk or posedge reset) begin
    if(reset) begin
      busy <=0;               // reset all registers
      po <= 0;
      x2 <=3'bzzz;
      y2 <=3'bzzz;
      x1 <=3'bzzz;
      y1 <=3'bzzz;
      x3 <=3'bzzz;
      y3 <=3'bzzz;
      x22 <=3'bzzz;
      y22 <=3'bzzz;
      x11 <=3'bzzz;
      y11 <=3'bzzz;
      x33 <=3'bzzz;
      y33 <=3'bzzz;
      xo <=3'bzzz;
      yo <=3'bzzz;
      current_i <=3'b001;
      X <= 3'b0;
      Y <=3'b0;
      X_next<=3'b001;
      Y_next<=3'b000;
      Start_importing<=0;
      cycle <=0;
    end
    else begin                // start importing coordinates from the testbench
      xii<=xi;
      yii<=yi;
      if (nt && ~busy) Start_importing <=1;
      // current_i increases from 1 to 6 to indicate which point to record
      if (Start_importing && ~busy) begin
        current_i <= current_i +1;
      end
      if (Start_importing && current_i==3'b001) begin
        x1 <= xii;
        y1 <= yii;
      end
      if (current_i==3'b010) begin
        x2 <= xii;
        y2 <= yii;
      end
      if (current_i==3'b011 ) begin
        x3 <= xii;
        y3 <= yii;
      end
      if (current_i==3'b100 ) begin
        x11 <= xii;
        y11 <= yii;
      end
      if (current_i==3'b101) begin
        x22 <= xii;
        y22 <= yii;
      end
      if (current_i==3'b110 && ~busy) begin
        x33 <= xii;
        y33 <= yii;
        busy <= 1;
        Start_importing <=0;
      end
      if (busy) begin         // start judging whether the point is inside the triangle
        X<=X+1;               // loop from (0,0) to (7,7)
        if (X==3'b111) begin Y<= Y+1; end
        X_next<=X_next+1;
        if (X_next==3'b111) begin Y_next<= Y_next+1; end
        bot_line <=(X_next-x1)*(y2-y1)-(x2-x1)*(Y_next-y1);
        top_line <=(X_next-x3)*(y2-y3)-(x2-x3)*(Y_next-y3);
        po <=0;
        if (Y>=y1 && x1<x2 && X>=x1 && bot_line<=0 && top_line>=0 ) begin
          po<=1;
          xo<= X;
          yo<= Y;
        end
        if (Y>=y1 && x1>x2 && X<=x1 && bot_line>=0 && top_line<=0 ) begin
          po<=1;
          xo<= X;
          yo<= Y;
        end
        if (Y==3'b111 && X==3'b111) begin   // current triangle is finished
          cycle <=cycle +1;                 // move to the second triangle
          // Pass the coordinates of the second triangle to the first three
          // register pairs so that the expressions for bot_line and top_line
          // do not need to be modified.
          x2 <=x22;
          y2 <=y22;
          x1 <=x11;
          y1 <=y11;
          x3 <=x33;
          y3 <=y33;
          xo <=3'bzzz;
          yo <=3'bzzz;
          current_i <=3'b001;
          X <= 3'b0;
          Y <=3'b0;
          X_next<=3'b001;
          Y_next<=3'b000;
          if (cycle) busy <=0;              // both triangles are done
        end
      end
    end
  end
endmodule
Simulation Result
In Transcript window:
# ****** START to VERIFY the Triangel Rendering Enginen OPERATION ******
#
# Waiting for the rendering operation of the triangle points with:
# (x1, y1)=(1, 0)
# (x2, y2)=(7, 2)
# (x3, y3)=(1, 7)
# Waiting for the rendering operation of the triangle points with:
# (x1, y1)=(6, 1)
# (x2, y2)=(0, 3)
# (x3, y3)=(6, 6)
# PASS! All data have been generated successfully!
# ---------------------------------------------
# Total delay: 126000 ns
# ---------------------------------------------
# ** Note: $finish : /home/alfredx/ee330/Formaltest.v(147)
# Time: 12800 ns Iteration: 1 Instance: /test
Fig.6 Simulation Result for version 1
RTL Compiler Result:
Schematic
Timing report
Cost Group : 'clk' (path_group 'clk')
Timing slack : 629ps
Start-point : nt
End-point : Start_importing_reg/SE
Fig.7 Schematic for version 1
Power report
============================================================
                     Leakage      Dynamic        Total
Instance   Cells   Power(nW)    Power(nW)    Power(nW)
------------------------------------------------------------
triangle     528   20742.271   348657.011   369399.281
Area report
============================================================
Instance   Cells   Cell Area   Net Area   Total Area   Wireload
------------------------------------------------------------
triangle     528        2883          0         2883   ZeroWireload (S)
Encounter result
After selecting File -> RTL Synthesis
Fig.8 Layout after RTL Synthesis
Select Floorplan -> Specify Floorplan
After clicking Apply
After the floorplan is done
Fig.9 Layout after RTL Synthesis
Fig.10 Layout after Floorplan
Select Power -> Power Planning -> Add Ring
Select Place -> Place Standard Cell
Fig.11 Layout after Adding Ring
Fig.12 Layout after Place Standard Cell
Report Power
Total Power
-----------------------------------------------------------------------------------------
Total Internal Power: 0.2861 52.78%
Total Switching Power: 0.2328 42.94%
Total Leakage Power: 0.02322 4.283%
Total Power: 0.542
-----------------------------------------------------------------------------------------
Power Units = 1mW
Area Information
Area =280* 148 = 41440
Fig.12 Area Measurement
Debug Timing
The clock period used in the triangle.sdc constraint file is 5 ns.
Fig.13 Debug Timing
Code Version 2 (Optimized Circuit)
module triangle (clk, reset, nt, xi, yi, busy, po, xo, yo);
  input clk, reset, nt;
  input [2:0] xi, yi;
  output busy, po;
  output [2:0] xo, yo;
  reg [2:0] x1,y1;            // coordinates of points 1, 2, 3
  reg [2:0] x2,y2;
  reg [2:0] x3,y3;
  reg [2:0] x11,y11;          // coordinates of points 4, 5, 6
  reg [2:0] x22,y22;          // passed to x1,y1, x2,y2, x3,y3 later for calculation
  reg [2:0] x33,y33;
  reg [2:0] xii, yii;         // temporary storage of xi, yi
  reg busy,po,EN;
  reg [2:0] xo,yo;
  reg [2:0] current_i;        // counter indicating which point's coordinates are
                              // being recorded, e.g. current_i=1: x1=xi, y1=yi
  reg [2:0] X;
  reg [2:0] Y;
  reg Start_importing;        // whether coordinates are still being recorded
  reg signed [6:0] bot_line;  // expressions for judging whether the point is
  reg signed [6:0] top_line;  // on the left or right of each line
  reg cycle;                  // which triangle is being calculated:
                              // 0 = first triangle, 1 = second triangle
  wire ENCLK1,ENCLK2;
  assign ENCLK1 = clk|EN;     // held high (gated off) while EN=1
  assign ENCLK2 = clk|(~EN);  // held high (gated off) while EN=0
  reg signed [6:0] m0,m1;     // pipeline registers for the two products
  reg [1:0] sel_tmp;          // step counter of the calculation

  always@(posedge clk or posedge reset) begin
    if(reset) begin
      busy <=0;               // reset all registers
      po <= 0;
      EN <= 0;
      x2 <=3'bzzz;
      y2 <=3'bzzz;
      x1 <=3'bzzz;
      y1 <=3'bzzz;
      x3 <=3'bzzz;
      y3 <=3'bzzz;
      x22 <=3'bzzz;
      y22 <=3'bzzz;
      x11 <=3'bzzz;
      y11 <=3'bzzz;
      x33 <=3'bzzz;
      y33 <=3'bzzz;
      xo <=3'bzzz;
      yo <=3'bzzz;
      current_i <=3'b001;
      X <= 3'b0;
      Y <=3'b0;
      Start_importing<=0;
      cycle <=0;
      sel_tmp<=0;
    end
    else begin                // start importing coordinates from the testbench
      if(~busy) begin
        xii<=xi;
        yii<=yi;
        if (nt) Start_importing <=1;
        // current_i increases from 1 to 6 to indicate which point to record
        if (Start_importing) begin
          current_i <= current_i +1;
        end
        if (current_i==3'b001) begin
          x1 <= xii;
          y1 <= yii;
        end
        if (current_i==3'b010) begin
          x2 <= xii;
          y2 <= yii;
        end
        if (current_i==3'b011 ) begin
          x3 <= xii;
          y3 <= yii;
        end
        if (current_i==3'b100 ) begin
          x11 <= xii;
          y11 <= yii;
        end
        if (current_i==3'b101) begin
          x22 <= xii;
          y22 <= yii;
        end
        if (current_i==3'b110 ) begin
          x33 <= xii;
          y33 <= yii;
          busy <= 1;
          Start_importing <=0;
        end
      end
    end
  end

  always@(posedge ENCLK1)
  begin
    if (busy) begin           // start judging whether the point is inside the triangle
      X<=X+1;                 // loop from (0,0) to (7,7)
      if (X==3'b111) Y<= Y+1;
      EN <=1;                 // hand over to the calculation block (ENCLK2)
      if (Y>=y1 && x1<x2 && X>=x1 && bot_line<=0 && top_line>=0 ) begin
        po<= 1;
        xo<= X;
        yo<= Y;
      end
      if (Y>=y1 && x1>x2 && X<=x1 && bot_line>=0 && top_line<=0 ) begin
        po<= 1;
        xo<= X;
        yo<= Y;
      end
      if (Y==3'b111 && X==3'b111) begin   // current triangle is finished
        cycle <=cycle +1;                 // move to the second triangle
        // Pass the coordinates of the second triangle to the first three
        // register pairs so that the expressions for bot_line and top_line
        // do not need to be modified.
        x2 <=x22;
        y2 <=y22;
        x1 <=x11;
        y1 <=y11;
        x3 <=x33;
        y3 <=y33;
        xo <=3'bzzz;
        yo <=3'bzzz;
        X <= 3'b0;
        Y <=3'b0;
        if (cycle) busy <=0;              // both triangles are done
      end
    end
  end

  always@(posedge ENCLK2)
  begin
    if (Y>=y1 && Y<=y2) begin             // rows that need bot_line only
      sel_tmp<=sel_tmp+1;
      m1<= m0;                            // move m0 to m1
      top_line <=0;
      // ~sel_tmp is nonzero for 00, 01 and 10; for 01 and 10 the
      // assignments below override m0, so this acts as step 0.
      if (~sel_tmp) m0<=(X-x1)*(y2-y1);          // step 0: first product into m0
      if (sel_tmp == 2'b01) m0<=(x2-x1)*(Y-y1);  // step 1: second product into m0
      if (sel_tmp==2'b10) begin                  // step 2: bot_line <= m1-m0
        bot_line <=m1-m0;
        EN<=0;
        sel_tmp<=0;
        m0<=0;
        m1<=0;
      end
    end
    else if (Y>y2 && Y<=y3) begin         // rows that need top_line only
      sel_tmp<=sel_tmp+1;
      m1<= m0;                            // move m0 to m1
      bot_line<=0;
      if (~sel_tmp) m0<=(X-x3)*(y2-y3);          // step 0: first product into m0
      if (sel_tmp == 2'b01) m0<=(x2-x3)*(Y-y3);  // step 1: second product into m0
      if (sel_tmp==2'b10) begin                  // step 2: top_line <= m1-m0
        top_line <=m1-m0;
        EN<=0;
        sel_tmp<=0;
        m0<=0;
        m1<=0;
      end
    end
    if (Y<y1 ||Y>y3) EN<=0;               // rows outside the triangle need no calculation
    if (po==1)po<=0;
  end
endmodule
Optimization Strategy
In the original code,
bot_line <=(X_next-x1)*(y2-y1)-(x2-x1)*(Y_next-y1);
top_line <=(X_next-x3)*(y2-y3)-(x2-x3)*(Y_next-y3);
we calculate the values of bot_line and top_line at the same time, which
is inefficient and power consuming: when Y<=y2 we only need the value of
bot_line, and when Y>y2 we only need the value of top_line.
Furthermore, these two equations use 4 multipliers in total, which may
take a lot of area. As a result, we use pipelining to reduce the number
of multipliers.
Here's the modified code:
always@(posedge ENCLK2)
begin
  if (Y>=y1 && Y<=y2) begin             // rows that need bot_line only
    sel_tmp<=sel_tmp+1;
    m1<= m0;                            // move m0 to m1
    top_line <=0;
    // ~sel_tmp is nonzero for 00, 01 and 10; for 01 and 10 the
    // assignments below override m0, so this acts as step 0.
    if (~sel_tmp) m0<=(X-x1)*(y2-y1);          // step 0: first product into m0
    if (sel_tmp == 2'b01) m0<=(x2-x1)*(Y-y1);  // step 1: second product into m0
    if (sel_tmp==2'b10) begin                  // step 2: bot_line <= m1-m0
      bot_line <=m1-m0;
      EN<=0;
      sel_tmp<=0;
      m0<=0;
      m1<=0;
    end
  end
  else if (Y>y2 && Y<=y3) begin         // rows that need top_line only
    sel_tmp<=sel_tmp+1;
    m1<= m0;                            // move m0 to m1
    bot_line<=0;
    if (~sel_tmp) m0<=(X-x3)*(y2-y3);          // step 0: first product into m0
    if (sel_tmp == 2'b01) m0<=(x2-x3)*(Y-y3);  // step 1: second product into m0
    if (sel_tmp==2'b10) begin                  // step 2: top_line <= m1-m0
      top_line <=m1-m0;
      EN<=0;
      sel_tmp<=0;
      m0<=0;
      m1<=0;
    end
  end
  if (Y<y1 ||Y>y3) EN<=0;               // rows outside the triangle need no calculation
  if (po==1)po<=0;
end
By using pipelining, only one multiplication is active at a time, which
saves a lot of power.
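The idea of computing a*b - c*d with one multiplier over several cycles can also be sketched on a single clock, without gated clocks. The module below is a hedged illustration and not the implementation used in this report (which uses m0, m1 and sel_tmp on ENCLK2); the module name shared_mult_sketch and all port names are hypothetical. The operands are signed 4-bit, which is enough for differences of 3-bit coordinates.

```verilog
// Sketch only: one multiplier shared over two steps to form a*b - c*d.
// All names here are hypothetical and chosen for illustration.
module shared_mult_sketch (
  input  wire              clk,
  input  wire              rst,     // synchronous reset, active high
  input  wire              start,   // one-cycle pulse; hold operands stable until done
  input  wire signed [3:0] a, b, c, d,
  output reg  signed [7:0] result,  // a*b - c*d
  output reg               done     // high for one cycle when result is valid
);
  localparam IDLE = 2'd0, FIRST = 2'd1, SECOND = 2'd2;

  reg        [1:0] step;
  reg signed [7:0] first;           // holds a*b while c*d is being formed

  // Operand multiplexers in front of the single multiplier.
  wire signed [3:0] op_l = (step == FIRST) ? a : c;
  wire signed [3:0] op_r = (step == FIRST) ? b : d;
  wire signed [7:0] product = op_l * op_r;

  always @(posedge clk) begin
    if (rst) begin
      step   <= IDLE;
      done   <= 1'b0;
      first  <= 8'sd0;
      result <= 8'sd0;
    end else begin
      done <= 1'b0;
      case (step)
        IDLE:    if (start) step <= FIRST;
        FIRST:   begin                       // product = a*b
                   first <= product;
                   step  <= SECOND;
                 end
        SECOND:  begin                       // product = c*d
                   result <= first - product;
                   done   <= 1'b1;
                   step   <= IDLE;
                 end
        default: step <= IDLE;
      endcase
    end
  end
endmodule
```

In this sketch the result is registered three clock edges after start is sampled, so the area saved by sharing the multiplier is paid for with extra cycles per point.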
In addition, to make pipelining convenient, we only need to extend the
number of clock cycles for the block that calculates bot_line and
top_line, while the other parts remain unchanged.
To do this, we moved the calculation into a separate always block that
is driven by ENCLK2.
This is a kind of clock gating.
The type of clock gating cell we use is shown in Fig.14.
Here is the structure we use:
input clk;
wire ENCLK1,ENCLK2;
assign ENCLK1 = clk|EN;     // held high (gated off) while EN=1
assign ENCLK2 = clk|(~EN);  // held high (gated off) while EN=0
always@(posedge clk or posedge reset) begin
  // do some operation to EN
end
always@(posedge ENCLK1) begin
  // scan X, Y and test the current point
end
always@(posedge ENCLK2) begin
  // calculate bot_line / top_line
end
Fig.14 Integrated clock gating cell using DFF
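For comparison with the OR-based gating above, a widely used textbook form of clock gate latches the enable while the clock is low and ANDs it with the clock. The sketch below is an assumption for illustration only: it is not necessarily the cell of Fig.14, nor the cell that RTL Compiler inserted, and the names clock_gate_sketch, en and gclk are hypothetical.

```verilog
// Sketch of a latch-based clock gate (all names are hypothetical).
module clock_gate_sketch (
  input  wire clk,
  input  wire en,    // enable; sampled while clk is low
  output wire gclk   // gated clock
);
  reg en_latched;

  // Level-sensitive latch, transparent while clk is low. Holding the
  // enable during the high phase keeps gclk free of glitches and
  // partial pulses when en changes.
  always @(clk or en)
    if (!clk)
      en_latched <= en;

  assign gclk = clk & en_latched;
endmodule
```

With this structure the gated clock idles low when disabled, whereas clk|EN idles high; either way the registers behind the gate see no rising edges while gated off.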
Simulation Result
In Transcript window:
# ****** START to VERIFY the Triangel Rendering Enginen OPERATION ******
#
# Waiting for the rendering operation of the triangle points with:
# (x1, y1)=(1, 0)
# (x2, y2)=(7, 2)
# (x3, y3)=(1, 7)
# Waiting for the rendering operation of the triangle points with:
# (x1, y1)=(6, 1)
# (x2, y2)=(0, 3)
# (x3, y3)=(6, 6)
# PASS! All data have been generated successfully!
# ---------------------------------------------
# Total delay: 464000 ns
# ---------------------------------------------
# ** Note: $finish : /home/alfredx/ee330/Formaltest.v(147)
# Time: 46600 ns Iteration: 1 Instance: /test
Fig.14 Simulation Result for version 2
RTL Compiler Result:
Schematic
Timing report
Cost Group : 'cg_enable_group_clk' (path_group 'cg_enable_group_clk')
Timing slack : 678ps
Start-point : reset
End-point : RC_CG_HIER_INST2/RC_CGIC_INST/E
Fig.15 Schematic for version 2
Power report
============================================================
                     Leakage      Dynamic        Total
Instance   Cells   Power(nW)    Power(nW)    Power(nW)
------------------------------------------------------------
triangle     341   11485.826   147180.865   158666.690
Area report
============================================================
Instance   Cells   Cell Area   Net Area   Total Area   Wireload
------------------------------------------------------------
triangle     341        1604          0         1604   ZeroWireload (S)
Encounter result
Physical View
After floor plan
Power ring
Report Power
Total Power
-----------------------------------------------------------------------------------------
Total Internal Power: 0.1725 67.07%
Total Switching Power: 0.07019 27.3%
Total Leakage Power: 0.01449 5.633%
Total Power: 0.2572
Area Information
Area = 262*128 = 33536
Debug Timing
The clock period used in the .sdc file is also 5 ns.
Comparison for both circuits
RTL Result:
            Power (nW)   Cell Area   Clock Period
Circuit 1   369,399      2883        5 ns
Circuit 2   158,666      1604        5 ns
Encounter Result:
            Power (mW)   Area (um^2)   Clock Period
Circuit 1   0.542        41,440        5 ns
Circuit 2   0.2572       33,536        5 ns
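The relative changes implied by these tables can be worked out as follows. The figures use only numbers reported in this document; the last line uses the "Total delay" values from the two simulation transcripts and is included because the saving in power and area comes with a longer rendering time.

```latex
\begin{align*}
\text{RTL power reduction:}       \quad & 1 - \frac{158{,}666}{369{,}399} \approx 57.0\% \\
\text{RTL cell area reduction:}   \quad & 1 - \frac{1604}{2883} \approx 44.4\% \\
\text{Encounter power reduction:} \quad & 1 - \frac{0.2572}{0.542} \approx 52.5\% \\
\text{Encounter area reduction:}  \quad & 1 - \frac{33{,}536}{41{,}440} \approx 19.1\% \\
\text{Simulated total delay ratio:} \quad & \frac{464000}{126000} \approx 3.68
\end{align*}
```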
Conclusion: We did our best in this project and spent roughly 50 hours
on it in total. The difficulty was just right for us to experience the
design process.
From this project, we learned quite a lot about how to write Verilog
code and how it differs from other languages. We also learned to use
non-blocking assignments consistently; otherwise our code could not
pass synthesis.
One suggestion is to have RTL Compiler and Encounter installed in the
lab, rather than always having to connect remotely to the server to run
synthesis and produce the layout. The server is sometimes unstable or
shut down, which is frustrating.