Frequency Optimization Objective During System Prototyping On Multi-FPGA Platform
Research Article
Copyright © 2013 Mariem Turki et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Multi-FPGA hardware prototyping is becoming increasingly important in the system-on-chip design cycle. However, after
partitioning the design on the multi-FPGA platform, the number of inter-FPGA signals is greater than the number of physical
connections available on the prototyping board. Therefore, these signals must be time-multiplexed, which lowers the system
frequency. The way in which the design is partitioned affects the number of inter-FPGA signals. In this work, we propose a set
of constraints to be taken into account during the partitioning task. Then, the resulting inter-FPGA signals are routed with an
iterative routing algorithm in order to obtain the best multiplexing ratio. Signals are grouped and then routed using Pathfinder,
an intra-FPGA routing algorithm that we adapt to the inter-FPGA routing problem. Several scenarios
are proposed to obtain the best results in terms of prototyping system frequency. Using this technique, the system
frequency is improved by an average of 12.8% compared to a constructive routing algorithm.
which appear at the interface and which should be transmitted between FPGAs, is significantly higher than the number of available traces between those FPGAs. The communication of interpartition signals between FPGAs is based on routing algorithms. The most used routing algorithm involves the determination of the shortest feasible path between FPGAs, using available board interconnect resources for each cut signal [7]. This approach is not recursive and leads inevitably to a blockage.

In this paper, we propose a set of constraints to be considered during the partitioning task. These constraints are intended to get the best results in terms of the number of cut signals and the critical path optimization. We also propose a new approach to route the resulting inter-FPGA signals, based on a signal multiplexing technique. To reach this goal, we use an iterative routing algorithm called Pathfinder [8]. This algorithm was originally used to route intra-FPGA signals. We extend it to inter-FPGA signals in order to obtain the best routing results.

The rest of this paper is organized as follows. Section 2 is dedicated to the different techniques used in the state of the art to route the inter-FPGA signals. Section 3 describes the different steps of the design prototyping flow. In Section 4, we present the proposed routing algorithm, which was initially used to route intra-FPGA signals. Section 5 explains the scenarios we propose to test the performance of the routing algorithm. These scenarios include the inter-FPGA signal form and also the routing graph direction. In Section 6, we describe the multiplexing IP that we use to transfer the multiplexed signals. Section 7 is dedicated to the experimental results and to the evaluation of the proposed methods. Finally, Section 8 concludes the paper.

2. Related Works

To address the inter-FPGA signal routing problem, the authors in [9, 10] proposed heuristic algorithms to route multiterminal signals in partial crossbar architectures. In [11, 12], multiterminal signals are decomposed into two-terminal nets. The routing algorithm is then applied to these nets.

In this paper, our goal is to find the best signal shape which gives the best routing results. For this reason, many scenarios are applied with the proposed routing graph in order to get the best system frequency.

To remedy the pin-count limitation, Babb et al. [13, 14] introduced time multiplexing of I/O pins. Multiplexing means that multiple design signals are assembled and serialized through the same board connection and then demultiplexed at the receiving FPGA. This technique dramatically increases the available inter-FPGA communication bandwidth. On the other hand, it makes the prototyping system much slower since the system clock period is composed of several phases. Each phase contains a number of slots. Consequently, in each phase, the selected signals are transmitted, each in a slot, between a pair of FPGAs. Signals are selected based on their criticality, which is calculated from the logic dependency analysis. A signal is selected if all signals it depends upon have been routed in previous phases. The router then uses the shortest path analysis with a cost function based on pin utilisation to route as many selected signals as possible, routing the most critical signals first. Any selected signals which cannot be routed are delayed to the next phase. In this technique, all the signals are multiplexed, without promoting the signals on the critical path. Yet some critical signals could be left unmultiplexed to obtain a better performance in terms of system frequency. Another disadvantage, related to the combinational loops, is that any unpredictable delay of an inter-FPGA signal causes the transmission of nonupdated values and then system errors. Even though such a sophisticated approach may realize faster verification speed, it decreases the reliability of circuit verification, which is the most critical issue of circuit verification.

In [15, 16], the authors proposed a new multiplexing approach based on integer linear programming. The main objective of this study is to select which signals must be multiplexed and which must not. Using this technique, all signals are transmitted on each phase, but only those with updated values are considered. Since all the signals are transmitted in each phase, the number of slots per phase increases and the system frequency is decreased. This technique, like the one in [9, 10], uses a constructive routing algorithm which is not optimized. In fact, when a signal is already routed, it cannot be rerouted to release the routing resources it currently uses to another signal that has a greater need for them. This disadvantage is addressed by the iterative routing algorithm proposed in this study. The objective of our iterative approach is that all the signals negotiate the use of the routing resources. Each physical wire is used by the signal which has the greatest need for it. This negotiation is done through several iterations to solve all the conflicts, unlike the constructive routing algorithm, which runs in a single iteration.

3. Prototyping Flow

To prototype an ASIC design on a multi-FPGA platform, the input circuit is transformed into multi-FPGA configuration bitstreams to be downloaded onto the prototyping board. Figure 1 presents the prototyping flow.

3.1. Logic Synthesis. The HDL description files of the prototyping architecture are mapped onto the target library of FPGA primitives. In this paper, the benchmarks are synthesized with the Synplify industrial tool [18]. The output of this task is a postsynthesis Verilog netlist.

3.2. Partitioning. After mapping the netlist onto the target technology, it is divided into partitions, each of which can fit into a single target FPGA. The partitioner performs K-way partitioning with a multiobjective function. The partitioning step is very critical since it has a significant impact on the performance of the prototyping system. In this study, we use the WASGA partitioning tool of Flexras Technologies [19]. For
International Journal of Reconfigurable Computing 3
[Figure 1: Prototyping flow (HDL design, logic synthesis, design netlist, constraints, routing and multiplexing, individual FPGA compilation).]
this tool, we set some constraints in order to have a good trade-off between the following criteria.

(a) Minimize the Number of Cut Signals. For big designs, it is difficult, if not impossible, to find a partitioning solution which meets the constraint related to the number of physical connections between FPGAs. As will be explained subsequently, the solution is to make a postpartitioning process allowing a number of signals to share the same physical wire in different time fractions. The insertion of these multiplexers increases the delays on combinatorial paths. These delays are correlated to the number of multiplexed signals (multiplexing ratio). Thus, the main goal of the partitioner is to reduce the number of the cut signals in order to get the lowest rate of multiplexing.

On the other side, the ratio between the number of cut signals and the number of available wires should be balanced between all pairs of FPGA. Therefore, the objective of the partitioning tool is to minimize the $C_p$ parameter presented in

$$C_p = \sqrt{\sum_{p=0}^{N} \left(\frac{S_p}{T_p}\right)^2}, \tag{1}$$

with $N$ being the number of FPGA pairs in the prototyping platform, $S_p$ being the number of signals between the pair $p$ of FPGAs, and $T_p$ being the number of available tracks between the same pair.

Finally, the partitioner aims to provide guidance about the signals which should not be multiplexed since they affect the critical path.

(b) Combinatorial Paths. The system frequency is imposed by the delay of the longest combinatorial path (between two registers). The delay on a combinatorial path is strongly correlated with the number of times a path crosses the border
of an FPGA, called combinatorial hop. This is because the transmission through inter-FPGA connection is much slower than the one inside the FPGA. Therefore, it is important to absorb the signals belonging to the critical combinatorial paths. In Figure 2(a), the number of combinatorial hops is equal to 2, and the number of cut signals is equal to 2. If the partitioner identifies the best module to move, the partitioning solution will be improved since the number of inter-FPGA signals and the number of combinatorial hops will be reduced as shown in Figure 2(b).

[Figure 2: Combinatorial hop example. (a) Partitioning solution with cut signals = 2, combi-hop = 2. (b) Partitioning solution with cut signals = 1, combi-hop = 1.]

(c) Logical Resources Limitation. The number of logical resources in the FPGA circuits is limited. During the partitioning, an occupancy rate constraint is set, so the partitioning tool must take into account this number and try to make a partitioning solution which meets the available resources. These resources are heterogeneous since they include different types (LUT, RAM, DSP, etc.). The occupancy rate should consider the additional logical area which will be occupied by the multiplexing IPs after the inter-FPGA routing tasks.

Unlike most of commercial tools, the partitioning tool used in our experiments operates on synthesized netlists which gives accurate information about the size of the design so it can meet the available logical resources of each FPGA.

3.3. Routing and Multiplexing. The system clock is the clock of the logic design being prototyped. The system clock period is divided into a number of slots as shown in Figure 3. Each signal is transmitted between a pair of FPGA within one slot period. These slots are controlled by an I/O clock which is faster than that of the system in order to transfer all the signals within one system clock period.

The system clock period is given by the following equation:

$$T_{\text{SYS CLK}} = \text{settle start} + \text{comm delay} + \text{settle end}. \tag{2}$$

Settle start and settle end correspond to the intradelay of propagation inside the source and destination FPGA, respectively. During the intra-FPGA place and route tasks, we define a multicycle path constraint to set the intra-delay propagation to 3 times the intercommunication period; that is, $3 \times T_{\text{IOclk}}$ in order to relax the timing constraint inside each FPGA. The comm delay is the delay of the inter-FPGA communication. This delay should be reduced in order to optimize the system frequency. The communication delay is represented by the following expression:

$$\text{Comm delay} = T_{\text{mux}} + T_{\text{routing hop}} + T_{\text{latencies}}. \tag{3}$$

$T_{\text{mux}}$ is the amount of delay spent to transfer all signals via the same physical wire and it is proportional to the multiplexing ratio. The $T_{\text{routing hop}}$ is the delay spent to cross all the routing hops. In fact, the number of routing hops is the number of FPGAs to cross to route a signal between the source and the destination. Finally, $T_{\text{latencies}}$ is the latency of the SERDES modules.

In order to reduce the multiplexing ratio, the effort should be spent on the routing task. Indeed, using an appropriate routing algorithm, the router can find the optimized solution related to the given constraints. As shown in Figure 4, the router takes as input the architecture of the prototyping platform, the list of cut signals to be routed, and the initial mux ratio parameter which is the number of inter-FPGA signals to be transmitted through the same physical wire. This parameter is calculated as the max of the multiplexing ratio of all the FPGA pairs. The mux ratio of one FPGA pair is the ratio between the number of signals and the number of connection wires between these two FPGAs.

Figure 4 shows the proposed flow to reduce the multiplexing ratio. Depending on the given inputs, the router tries to route all inter-FPGA signals by meeting the mux ratio calculated initially. If a feasible solution exists, the mux ratio is decremented and the router attempts to find another routing solution with the new mux ratio. Otherwise, the router exits with the best obtained multiplexing ratio.

3.4. FPGA Place and Route. Once the routing is achieved, the multiplexing IPs are inserted on the source and destination FPGAs to ensure the inter-FPGA signals transmission in the corresponding time slots. One netlist is generated for each FPGA. Each netlist must be processed with FPGA specific automated place and route software to generate configuration bitstreams.

4. Inter-FPGA Signals Routing Strategy

To route inter-FPGA signals, it is necessary to find an algorithm that can assign, in an optimized manner, signals to the
available resources. The techniques mentioned in Section 2 use constructive routing algorithm. This algorithm keeps the track of the reserved and available physical connections between FPGAs. The router applies Dijkstra’s shortest path algorithm [20] to determine the shortest path between the source and destination FPGAs. If the shortest path exists, the capacity of all used resources is decremented; then, they cannot be used to route the next signals. Otherwise, router returns unsuccessfully. The main disadvantage of this method is its irreversibility. Indeed, when a signal is already routed, it cannot be rerouted to leave the routing resources currently used to another signal that has the greatest need for these resources. In the example of Figure 5, signals are routed randomly. If the signal S1 is first routed through FPGA1, then S2 cannot be routed since the wire between FPGA1 and FPGA2 is used by S1. In this case, the design is considered nonroutable. To avoid this problem, we route the inter-FPGA signals by an iterative routing algorithm. Among existing techniques, the Pathfinder routing algorithm seems to be best

[Figure 3: System clock period divided into I/O clock slots (MUX IP between FPGA 1 and FPGA 2, comm delay, SYS CLK, I/O CLK).]

4.2. Routing Algorithm: Pathfinder. Pathfinder is used primarily for routing intra-FPGA signals. We adapt it to deal with the inter-FPGA signals [21]. Pathfinder uses an iterative, negotiation-based approach to successfully route all the signals. The routing problem for a given signal is to find a directed tree embedded in $G$ that connects the source of the signal to each of its FPGA destinations. During the first routing iteration, the signals are freely routed without paying attention to resource sharing. Individual signals are routed using Dijkstra’s shortest path algorithm [20]. At the end of the first iteration, resources may be congested because multiple signals have used them. During subsequent iterations, the cost of using a resource is increased, based on the number of signals that share the resource and the history of congestion on that resource. Thus, signals are forced to negotiate for routing resources. If a resource is highly congested, nets which can use lower congestion alternatives are forced to do so. On the other hand, if the alternatives are more congested than the resource, then a signal may still use that resource.
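The negotiation described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes two-terminal signals only, models each inter-FPGA link as a resource with an integer capacity, and uses the classic Pathfinder cost form (base + history) × present-congestion penalty. The names `shortest_path`, `pathfinder`, `pres_fac`, and `pres_growth`, as well as the toy three-FPGA board, are hypothetical.

```python
import heapq

def shortest_path(links, cost, src, dst):
    """Dijkstra over directed links; returns the list of links used, or None."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for link in links:
            if link[0] != node:
                continue
            nd = d + cost(link)
            if nd < dist.get(link[1], float("inf")):
                dist[link[1]] = nd
                prev[link[1]] = link
                heapq.heappush(heap, (nd, link[1]))
    if dst not in dist:
        return None
    path = []
    while dst != src:
        path.append(prev[dst])
        dst = prev[dst][0]
    return path[::-1]

def pathfinder(links, signals, max_iters=30, pres_fac=0.5, pres_growth=2.0):
    """links: {(src_fpga, dst_fpga): capacity}; signals: {name: (src, dst)}."""
    occupancy = {link: 0 for link in links}
    history = {link: 0.0 for link in links}
    routes = {}
    for iteration in range(max_iters):
        # First iteration: route freely, ignoring resource sharing.
        pf = 0.0 if iteration == 0 else pres_fac

        def cost(link):
            overuse = max(0, occupancy[link] + 1 - links[link])
            return (1.0 + history[link]) * (1.0 + pf * overuse)

        for name, (src, dst) in signals.items():
            for link in routes.pop(name, []):  # rip up the previous route
                occupancy[link] -= 1
            path = shortest_path(links, cost, src, dst)
            if path is None:
                return None
            for link in path:
                occupancy[link] += 1
            routes[name] = path
        congested = [l for l in links if occupancy[l] > links[l]]
        if not congested:
            return routes
        for link in congested:  # remember past congestion on this resource
            history[link] += occupancy[link] - links[link]
        pres_fac *= pres_growth
    return None

# Toy board (assumption): three FPGAs, one wire per directed link.
links = {(0, 1): 1, (1, 2): 1, (0, 2): 1}
signals = {"A": (0, 2), "B": (0, 2)}
routes = pathfinder(links, signals)
```

In this toy case both signals first pick the direct link 0→2. On the second iteration the congestion cost pushes A onto the detour through FPGA 1 while B keeps the direct link, which is the kind of rerouting a constructive router cannot perform.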
[Figure 5: Example of a routing conflict between signals S1 and S2 (FPGA 0, FPGA 1).]

[Figure: routing graph with FPGA nodes F0, F1, and F2 (FPGA 0, FPGA 1, FPGA 2) and pin nodes P0-F0, P1-F0, P0-F1, P1-F1, P2-F1, and P0-F2.]
Observing the final routing results, we notice that inter-FPGA signals can be directly routed between source and destination FPGAs or intermediate through-hops may be necessary.

5.1. Convention. All FPGAs on the prototyping board are indexed sequentially, starting at 0. We say that a signal has a direct direction if the index of its source FPGA is lower than that of its destination FPGA. A signal with indirect direction is directed the opposite way.

[Figure (flowchart): unidirectional routing graph modelling; start Pathfinder with the mux ratio; success? (yes).]
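The outer loop described in Section 3.3 (route with the current mux ratio, decrement it on success, stop at the first failure) can be sketched as follows. This is only an illustration: `minimize_mux_ratio` and `toy_router` are hypothetical names, and the router is abstracted as any callable that returns a routing solution or `None`.

```python
def minimize_mux_ratio(try_route, initial_ratio):
    """Lower the mux ratio while routing still succeeds.

    try_route(ratio) must return a routing solution, or None on failure.
    Returns (best_ratio, solution), or None if even the initial ratio fails.
    """
    best = None
    ratio = initial_ratio
    while ratio >= 1:
        solution = try_route(ratio)
        if solution is None:
            break  # keep the best ratio obtained so far
        best = (ratio, solution)
        ratio -= 1
    return best

# Stand-in router (assumption): pretend routing is feasible for ratio >= 4.
def toy_router(ratio):
    return f"routed at {ratio}" if ratio >= 4 else None

best = minimize_mux_ratio(toy_router, initial_ratio=7)
```

In a real flow, `try_route` would wrap the iterative router with the link capacities implied by the current mux ratio.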
5.3.2. Two-Terminal Signal. In order to make the design more flexible, we decompose the multiterminal signals into branches, each with one source and only one destination. The Pathfinder routing algorithm tries to find separately a routing path for each branch. With this decomposition, only the solution shown in Figure 10(a) is feasible.

6. Multiplexing IP

The approach described above determines which signals are to be multiplexed together. These signals are transmitted through the same physical wire and transferred using 2 multiplexing IPs placed in the sending and receiving FPGAs, as shown in Figure 11. To ensure the inter-FPGA communications,

[Figure 11: Multiplexing IPs. Sending IP in FPGA 0 (design partition, multiplexer, OSERDES, LVDS) and receiving IP in FPGA 1 (LVDS, ISERDES, demultiplexer, design partition), each with a clock generator (SYS CLK, I/O CLK); data is carried on the pair Data p / Data n.]

Since the comm delay causes the biggest delay, we neglect the effect of the intradelays in the sending and the receiving FPGAs defined in (2). Therefore, the system frequency is represented in

$$\text{Sys freq} = \frac{\text{I/O frequency}}{\text{NB}_{R\ \text{hop}} \times 5 + 12 + \text{mux ratio}/2}. \tag{7}$$

7. Experimental Results

We use the benchmark generator [22] to generate several synthetic designs. The generated benchmarks are hierarchical since the partitioner operates on high levels of hierarchy in order to reduce the partitioning runtime and the number of managed elements. The targeted multi-FPGA prototyping board that we use for the experiments is a DNV6F6PCIe from the Dini group [17]. As shown in Figure 12, this board contains 6 Virtex-6 LX550T FPGAs, all using the same FF1759 package, meaning that they have the same number of total user I/Os. The inter-FPGA clock frequency is set to 500 MHz. Applying this frequency on the multiplexing IPs (ISERDES/OSERDES with LVDS), the inter-FPGA communication data rate on this board is 1 Gbps using double data rate (DDR).

To map the designs onto this board, we use the WASGA partitioning flow provided by Flexras Technologies [19]. WASGA partitions the designs and outputs the list of inter-FPGA signals that should be routed. After routing these signals, using the routing methodology detailed in this paper, WASGA generates a netlist for each FPGA which contains the multiplexing IP to ensure the transmission of the multiplexed signals. The resulting netlists are entered into the FPGA flow to execute the place and route and the bitstream generation individually for each FPGA.

Firstly, we set the constraints listed in Section 3.2 to the WASGA partitioning tool. Table 1 presents the routing results obtained by the WASGA flow and the CERTIFY partitioning tool [23]. The number of cut signals obtained by the WASGA partitioner is considerably lower than the number obtained by CERTIFY. WASGA aims to optimize the number of combinational hops. Therefore, for all the tested designs, the mux ratio results are improved compared to those obtained by CERTIFY. Table 1 also shows the number of routing hops used in each benchmark. The number of routing hops is the number of FPGAs crossed by a signal from the source until reaching its destination. Results presented in Table 1 reflect the importance of partitioning on the system frequency. Note that the last 2 benchmarks are not routable (NR) with the CERTIFY tool. In fact, as the number of cut signals grows, the mux ratio becomes larger. CERTIFY provides multiplexing IP with a maximum width equal to 32 [24]. Thus, all the designs which need a multiplexing ratio ≥ 32 are not routable.

On the other hand, we tried to select the best shape of the routing signals. Table 2 shows the results for each routing scenario described in Section 5. These scenarios are defined depending on the signal shape and the routing graph. Four scenarios are selected to test the performance of the iterative routing algorithm on the multi-FPGA prototyping platform.

Table 1: Comparison between routing results of WASGA and CERTIFY partitioning tools.

                 WASGA                                      CERTIFY
Benchmarks       Cut signals  NB FPGA  R hop  Mux ratio     Cut signals  NB FPGA  R hop  Mux ratio
CPU20 occ10      1545         6        0      3             3316         6        0      10
CPU20 occ20      1002         4        0      3             1634         4        0      4
CPU30 occ20      1710         4        0      3             3076         4        0      6
CPU30 occ30      1487         4        0      4             2521         3        0      7
CPU50 occ30      2819         4        0      5             5279         4        0      11
CPU50 occ50      2202         4        0      6             4019         3        0      9
CPU125 occ50     7809         6        1      11            NR           NR       NR     NR
CPU125 occ65     7644         5        0      12            NR           NR       NR     NR
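The two partitioning quantities defined in Sections 3.2 and 3.3, the balance criterion of Equation (1) and the initial mux ratio (the maximum per-pair ratio of signals to wires), can be computed as in the sketch below. The per-pair numbers are made up for illustration, the function names are hypothetical, and rounding the per-pair ratio up to an integer is an assumption rather than something the text states.

```python
import math

def pair_mux_ratio(cut_signals, wires):
    # Assumption: a fractional ratio is rounded up to a whole number of slots.
    return math.ceil(cut_signals / wires)

def initial_mux_ratio(pairs):
    """pairs: {(fpga_a, fpga_b): (S_p, T_p)}; maximum ratio over all FPGA pairs."""
    return max(pair_mux_ratio(s, t) for s, t in pairs.values())

def balance_cost(pairs):
    """C_p of Equation (1): square root of the sum of squared S_p / T_p ratios."""
    return math.sqrt(sum((s / t) ** 2 for s, t in pairs.values()))

# Illustrative data (not from the paper): signals and wires per FPGA pair.
pairs = {("A", "B"): (300, 100), ("B", "C"): (400, 100)}
ratio = initial_mux_ratio(pairs)   # max(3, 4)
cost = balance_cost(pairs)         # sqrt(3**2 + 4**2)
```

A partition that moves signals from the heavily loaded pair to the lightly loaded one lowers both values, which is the balancing effect Equation (1) is meant to reward.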
[Figure 12: Prototyping board based on six Virtex-6 FPGAs from the Dini group [17].]
In these experiments, we use benchmarks where 70% of signals are multiterminal ones.

(i) In scenario 1, multiterminal signals are routed on a unidirectional routing graph.
(ii) In scenario 2, two-terminal signals are routed on a unidirectional routing graph where the node capacity can be greater than or equal to 1.
(iii) In scenario 3, multiterminal signals are grouped into GSignals. These GSignals are routed on a bidirectional routing graph where all node capacities are set to 1.
(iv) Finally, in the fourth scenario, the two-terminal branches are combined into groups and routed on a bidirectional routing graph.

Results show that routing on a bidirectional graph gives much better results since the router has more flexibility to select the routing path. On the other hand, routing multiterminal signals is not always optimal: even though the mux ratio of scenario 3 is sometimes lower than that of scenario 4, the larger number of routing hops it uses penalizes the system frequency.
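The signal preparation behind scenario 4 (decompose multiterminal signals into two-terminal branches, then combine branches into groups) might look like the following sketch. The grouping rule used here, branches sharing the same source and destination FPGA and packed in chunks of at most `max_group_size`, is an assumption made for illustration; the function names are hypothetical.

```python
from collections import defaultdict

def to_branches(signals):
    """signals: {name: (src_fpga, [dst_fpga, ...])} -> list of (name, src, dst)."""
    return [(name, src, dst)
            for name, (src, dsts) in signals.items()
            for dst in dsts]

def group_branches(branches, max_group_size):
    """Group branches sharing the same (src, dst) pair, at most max_group_size each."""
    by_pair = defaultdict(list)
    for name, src, dst in branches:
        by_pair[(src, dst)].append(name)
    groups = []
    for (src, dst), names in by_pair.items():
        for i in range(0, len(names), max_group_size):
            groups.append((src, dst, names[i:i + max_group_size]))
    return groups

# Illustrative netlist: signal "a" is multiterminal (FPGA 0 to FPGAs 1 and 2).
signals = {"a": (0, [1, 2]), "b": (0, [1]), "c": (0, [1]), "d": (2, [0])}
branches = to_branches(signals)
groups = group_branches(branches, max_group_size=2)
```

Each resulting group has one source and one destination, so it can be routed as a single two-terminal object and later mapped onto one multiplexed wire.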
Table 3: Comparison between OAR and NCR routing results.

                 OAR                                 NCR
Benchmarks       R hop  Mux ratio  Freq (MHz)        R hop  Mux ratio  Freq (MHz)   Gain
CPU50 occ30      0      9          29.41             0      9          29.41        0%
CPU125 occ50     2      16         16.66             1      16         20           20.04%
CPU150 occ30     3      24         12.82             1      29         15.62        21.84%
CPU150 occ50     2      51         10.41             1      51         11.62        11.65%
CPU375 occ80     2      51         10.41             1      51         11.62        11.65%
CPU375 occ85     2      79         8.06              2      69         8.77         8.8%
CPU700 occ80     2      134        5.61              2      109        6.49         15.68%
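The frequency columns of the OAR/NCR comparison can be cross-checked against Equation (7) with the 500 MHz I/O clock given in Section 7. The sketch below assumes that the term mux ratio/2 is rounded up to a whole number of I/O clock cycles; this rounding is not stated in the text, but it is what makes the computed values agree with the tabulated ones. The function name and the row list are illustrative.

```python
import math

def system_frequency_mhz(io_freq_mhz, routing_hops, mux_ratio):
    """Equation (7), with mux_ratio / 2 rounded up (assumption)."""
    cycles = routing_hops * 5 + 12 + math.ceil(mux_ratio / 2)
    return io_freq_mhz / cycles

# (OAR hops, OAR mux ratio, NCR hops, NCR mux ratio), one tuple per benchmark row.
rows = [(0, 9, 0, 9), (2, 16, 1, 16), (3, 24, 1, 29), (2, 51, 1, 51),
        (2, 51, 1, 51), (2, 79, 2, 69), (2, 134, 2, 109)]

gains = []
for oar_hops, oar_mux, ncr_hops, ncr_mux in rows:
    f_oar = system_frequency_mhz(500, oar_hops, oar_mux)
    f_ncr = system_frequency_mhz(500, ncr_hops, ncr_mux)
    gains.append((f_ncr / f_oar - 1) * 100)

average_gain = sum(gains) / len(gains)
```

With these inputs the average gain comes out close to the 12.8% reported in the text; small differences from the per-row Gain column come from the table using frequencies truncated to two decimals.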
Since we have demonstrated that scenario 4 gives usually the best results, we apply Pathfinder and the obstacle avoidance routing algorithms (constructive algorithms) to route inter-FPGA signals, all with one source and one destination (branch), and grouped into GSignals. Table 3 shows the results of comparison. OAR means obstacle avoidance routing and NCR refers to negotiated congestion routing. Results show the important impact of the NCR iterative routing and its efficiency to improve system performance. The frequency is increased on average by 12.8% and the impact of NCR is important for highly congested partitioning results. In fact, thanks to its negotiation aspect, it avoids easily local minima and reduces the path length from a source FPGA to a destination FPGA. In addition, it leads to a good trade-off between maximum multiplexing ratio and routing hops.

8. Conclusion

Prototyping is no longer optional due to the cost of chips and difficulty to simulate huge designs. To validate designs more efficiently, the highest frequency should be reached. The system frequency depends on the way the inter-FPGA signals are routed. In this paper, we presented our approach to route these inter-FPGA signals. We set a number of constraints to the partitioning tool in order to get the best partitioning solution which leads to the optimal routing. We extend the Pathfinder routing algorithm to deal with the inter-FPGA signals. In order to select the best signal shape, we tested the performance of this iterative routing algorithm on four scenarios.

The best scenario in terms of system frequency consists in grouping signals into GSignals where each one has 1 source and only 1 destination. Compared to common obstacle avoidance algorithms, we obtain a significant prototyping system frequency improvement of 12.8%.

Acknowledgment

This research paper is made possible through the help and support of the Feder European Grant.

References

[1] C. Huang, Y. Yin, and C. Hsu, “SoC HW/SW verification and validation,” in Proceedings of the 16th Asia and South Pacific Design Automation Conference (ASP-DAC ’11), pp. 297–300, January 2011.
[2] M. Santarini, “ASIC prototyping: make versus buy,” EDN, vol. 50, no. 24, pp. 30–40, 2005.
[3] D. Amos, A. Lesea, and R. Richter, FPGA-Based Prototyping Methodology Manual, Synopsys, 2011.
[4] I. Kuon and J. Rose, “Measuring the gap between FPGAs and ASICs,” in Proceedings of the 14th ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 21–30, February 2006.
[5] H. Krupnova, “Mapping multi-million gate SoCs on FPGAs: industrial methodology and experience,” in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE ’04), vol. 2, pp. 1236–1241, February 2004.
[6] S. Asaad, R. Bellofatto, B. Brezzo et al., “A cycle-accurate, cycle-reproducible multi-FPGA system for accelerating multi-core processor simulation,” in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA ’12), pp. 153–162, February 2012.
[7] J. Babb, R. Tessier, M. Dahl, S. Z. Hanono, D. M. Hoki, and A. Agarwal, “Logic emulation with virtual wires,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 16, no. 6, pp. 609–626, 1997.
[8] L. McMurchie and C. Ebeling, “PathFinder: a negotiation-based performance-driven router for FPGAs,” in Proceedings of the International Workshop on Field Programmable Gate Array, pp. 111–117, February 1995.
[9] A. Ejnioui and N. Ranganathan, “Multiterminal net routing for partial crossbar-based multi-FPGA systems,” IEEE Transactions on Very Large Scale Integration Systems, vol. 11, no. 1, pp. 71–78, 2003.
[10] X. Song, W. N. N. Hung, A. Mishchenko, M. Chrzanowska-Jeske, A. Kennings, and A. Coppola, “Board-level multiterminal net assignment for the partial cross-bar architecture,” IEEE Transactions on Very Large Scale Integration Systems, vol. 11, no. 3, pp. 511–513, 2003.
[11] W. Mak and D. F. Wong, “Board-level multiterminal net routing for FPGA-based logic emulation,” ACM Transactions on Design Automation of Electronic Systems, vol. 2, no. 2, pp. 151–157, 1997.
[12] W. Mak and D. F. Wong, “On optimal board-level routing for FPGA-based logic emulation,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 16, no. 3, pp. 282–289, 1997.
[13] J. Babb, R. Tessier, and A. Agarwal, “Virtual wires: overcoming pin limitations in FPGA-based logic emulators,” in Proceedings of the IEEE Workshop on FPGAs for Custom Computing Machines (FCCM ’93), pp. 142–151, April 1993.