Scaling Up Physical Design: Challenges and Opportunities
Guojie Luo (1,3)    Wentai Zhang (1)    Jiaxi Zhang (1)
[email protected]    [email protected]    [email protected]

Jason Cong (2,3,1)
[email protected]

(1) Center for Energy-Efficient Computing and Applications, School of EECS, Peking University
(2) Computer Science Department, University of California, Los Angeles
(3) PKU-UCLA Joint Research Institute in Science and Engineering
ABSTRACT
Due to the continuous scaling of integration density and the increasing diversity of customized designs, there are increasing demands on the scalability and customization of EDA tools and flows. Commercial EDA tools usually provide a TCL scripting interface to extract and modify the design information for a flexible design flow. However, we observe that current TCL scripting is not designed for complete netlist extraction, resulting in a significant degradation in performance. For example, it takes over 20 minutes to extract the complete netlist of a 466K-cell design using TCL. This extraction may be repeated several times when interfacing between the existing EDA platforms and the actual distributed EDA algorithms. This drastic decrease in efficiency is a great barrier to customized EDA tool development. In this paper, we propose to build a distributed framework on top of TCL to accelerate the netlist extraction, and use distributed detailed placement as an example to demonstrate its capability. This framework is promising for scaling out physical design algorithms to run on a cluster.

CCS Concepts
• Hardware → Physical design (EDA); Methodologies for EDA; • Computing methodologies → Parallel algorithms; Distributed algorithms;

Keywords
Physical Design; Detailed Placement; FPGA; TCL; Distributed Computing; Spark

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ISPD'16, April 03-06, 2016, Santa Rosa, CA, USA
© 2016 ACM. ISBN 978-1-4503-4039-7/16/04 ... $15.00
DOI: http://dx.doi.org/10.1145/2872334.2872342

1. INTRODUCTION
The computational demand of physical design and EDA tools keeps growing, due to the increasing complexity of electronics designs. The semiconductor industry is involved in new applications including medical technology, automotive, robotics, and energy systems, which in turn exposes some complex design problems and thus needs greater computational power [8].

EDA vendors have already adopted distributed storage solutions for data management [5, 6]. Some EDA tools provide multi-threaded solutions to relieve the runtime issue. However, the emerging and mainstream distributed computing infrastructures are not fully adopted by EDA tools, due to the cost of rewriting software and the unclear pricing models. Meanwhile, EDA users have various needs for customized tools in their flows. In addition to the core physical design steps, users may develop additional tools for their specific needs. There is a strong need for a powerful and extensible framework to design EDA tools and flows.

A related topic is putting EDA tools and solutions in the cloud [30]. These recent efforts mainly spread across the applications of training, demonstrations, and web-based collaboration. They can be viewed as the interaction-intensive end of the design platform, which creates opportunities and hides the complexity for the development of parallel and distributed computing tools at the compute-intensive end. Another example from academia, called bX [27], provides computational power for regression testing of EDA algorithms.

Here, we investigate a distributed computing framework for EDA algorithms and flows. We take advantage of the progress in computational engines (e.g., Spark [37]) in the big data ecosystem, and design a framework capable of interfacing with existing commercial EDA platforms. In this way, academia will be able to design more distributed EDA algorithms and test them with industrial-grade design examples, and industry will have a low-cost path to migrate some of its design tools to this distributed framework. The interface implements design query and modification through the TCL scripting supported by mainstream EDA tools. TCL scripting is a good candidate to implement a general interface across commercial EDA tools, and there have been some practices of customized in-house EDA tools developed using TCL [35, 16]. However, we observe that existing TCL support is not designed for high-throughput queries. The extraction of the whole netlist takes a non-negligible amount of time. We propose a distributed parser to efficiently read design data from existing EDA platforms, and provide a solution to maintain data consistency when interfacing distributed
EDA algorithms with existing tools. We also demonstrate the capability of this framework using a distributed detailed placement algorithm.

The remainder of this paper is organized as follows. Section 2 states the background and related work. Section 3 proposes the design and details of a distributed EDA framework, with an application example of detailed placement. Section 4 then describes the experimental results. Section 5 discusses the challenges and opportunities for a scalable EDA framework.

2. BACKGROUND AND RELATED WORK
First, we summarize the latest efforts in putting EDA in the cloud. We also give a short introduction to techniques such as Spark and Docker from the big data and cloud ecosystem. Though it may not be an urgent task for EDA companies to take advantage of the progress in these techniques, it is about time to design new distributed physical design algorithms for scalability. In the next section, we propose a distributed computing framework that can be built on top of existing EDA platforms.

A concept related to the scalability of EDA tools is cloud EDA. EDA vendors have been investigating the opportunity of offering their solutions in the cloud [30]. Cadence provides Hosted Design Solutions [14] as a production design environment in the private cloud. Synopsys has put its functional verification solution VCS in the cloud [19]. Mentor Graphics offers the cloud-based SystemVision [7] for modeling and design of electro-mechanical systems. Besides, multiple companies have put their products in the cloud, including Altium, OneSpin, Plunify, Tabula, etc. Recently, IBM started providing its high-performance services for EDA through SiCAD [21].

On the other hand, the MapReduce [13] programming model and infrastructure have been widely used for scalable data processing in the big data ecosystem. Users of MapReduce implement all computations using only two functions, map and reduce, where a unit computation takes <key,value> pairs as input and generates another set of <key,value> pairs as output. Though the MapReduce model seems less flexible and has lower peak performance [29] than the traditional message-passing model [17], it increases productivity and achieves even better performance for programs written by non-experts in distributed computing. Spark [37] is an open-source data-parallel computation engine using the MapReduce programming model. Different from Hadoop [34], a previous open-source implementation of MapReduce, Spark enables efficient iterative algorithms and interactive queries by keeping data in memory using Resilient Distributed Datasets (RDDs). Moreover, it supports general computation DAGs and allows optional specification of a data partitioner to avoid data movement. Since most EDA algorithms are iterative, Spark is a suitable engine to scale out EDA tools.

Linux containers are a lightweight virtualization technique, which provides an isolated kernel namespace for processes, file system, and network without running a full OS on virtual hardware. Docker [4] is a representative implementation. A comparative study of virtual machines (VMs) and containers shows that containers have equal or better performance than VMs and lower overhead in OS interaction [15]. The basic idea of a container is illustrated in Figure 1. We will show in the next section that containers are useful for managing existing EDA platforms in the environment of Linux clusters.

Figure 1: Virtual machine vs. container.

Inspired by the techniques above, we propose a distributed EDA framework for the scalability of physical design algorithms. Compared with existing cloud EDA solutions, it is a computational infrastructure that can be deployed in either a public or a private cloud.

3. FRAMEWORK DESIGN

3.1 Overview
The overall design of our proposed framework for distributed EDA algorithms and flows is illustrated in Figure 2. Existing EDA platforms are supported using TCL scripts for design data extraction and modification. The design data and the FPGA architecture or technology information are presented on top of the distributed computing engine of Spark. While we can follow OpenAccess as the data model of the post-synthesis design, it is promising to design and develop portable physical design algorithms and flows in the distributed computing engine of Spark.

Figure 2: The distributed EDA framework.

3.2 Interfacing Existing EDA Platforms
TCL is a de facto standard for mainstream commercial EDA products. One has to import the design data from existing EDA platforms to the distributed computing framework using TCL scripts. The outline of the parser code is shown in Listing 1, which is implemented for the Xilinx
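To make the map/reduce contract of Section 2 concrete, here is a minimal pure-Python sketch of the <key,value> model; it mimics the MapReduce semantics only (it is not the Spark API), and the (cell, net) records are our own toy illustration.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records, mapper):
    """Apply the user-supplied map function; each record emits <key,value> pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def reduce_phase(pairs, reducer):
    """Group the emitted pairs by key and apply the user-supplied reduce function."""
    pairs.sort(key=itemgetter(0))
    return {key: reducer(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))}

# Toy workload: count the pins on each net from (cell, net) connection records.
records = [("c1", "n1"), ("c2", "n1"), ("c3", "n2")]
fanout = reduce_phase(map_phase(records, lambda r: [(r[1], 1)]),  # emit <net, 1>
                      lambda key, values: sum(values))            # sum per net
print(fanout)  # {'n1': 2, 'n2': 1}
```

In Spark the same computation would be expressed on an RDD, with the partitioner deciding which worker holds which keys.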
Vivado platform. The scripts are only slightly different for the Altera Quartus platform.

Listing 1: Extract netlist from Vivado using TCL
# Dump every cell in the design
foreach cell [ get_cells ... ] {
    puts ...
}
# Dump the cell attached to each pin of each net
foreach net [ get_nets ... ] {
    foreach pin [ get_pins -of $net ] {
        set cell [ get_cells -of $pin ]
        puts ...
    }
}

However, we observe that TCL execution on existing FPGA EDA platforms is relatively slow. For example, it takes over 20 minutes to extract the complete netlist of a 466K-cell design using TCL. This extraction may be repeated several times when interfacing between the existing EDA platforms and the actual distributed EDA algorithms. Moreover, this runtime cannot be directly reduced by adding computing resources. In order to solve this issue and make the distributed computing framework meaningful, we propose a parallel parser to accelerate the execution of TCL scripts.

The parser design is illustrated in Figure 3. In this example, four instances of Vivado execute on two server nodes. The parser reads partial data from each instance, and then combines and converts the design into an in-memory RDD in Spark. The RDD can be partitioned into sub-designs for further MapReduce steps, with an example shown in the next subsection.

Figure 3: Parallel reads for efficient parsing.

After a design is processed by a distributed algorithm in Spark, the result has to be written back if a conventional step in Vivado is needed. This procedure is outlined in Figure 4. First, as shown in Figure 4a, the updated design (from data v0.1 to v0.2) in Spark is written back to a single instance of Vivado. Second, the outdated instances (data v0.1) are stopped and removed; at the same time, the remaining instance can run a conventional step in Vivado to obtain a further updated design (data v0.3), as shown in Figure 4b. Last, the remaining instance can be replicated as in Figure 4c, so that parallel parsing can be performed to run another distributed algorithm in Spark. The instances of Vivado are managed using Docker, where they can be killed or replicated conveniently. The checkpoint and restore operations are able to save and restore the in-memory data, and thus can be used to replicate a live instance [31].

Figure 4: Writing strategy for data consistency. (a) Write data back to a single instance when a conventional step in Vivado is needed. (b) Kill the outdated instances, and run a conventional step in the remaining instance. (c) Replicate the instance with the latest data, and get ready for the next parallel parsing.

In this way, we are able to develop new distributed algorithms in Spark on top of Vivado. The runtime of the TCL-based parser is accelerated by parallel reads from multiple identical instances of Vivado. The data consistency of data write-backs can be guaranteed by using Docker to manage the Vivado instances. This methodology can be applied straightforwardly to any other mainstream EDA platform that supports TCL scripting.

3.3 Example Application: Detailed Placement
Detailed placement algorithms [23, 12, 28] usually include global swapping and local swapping. As an illustrative example, we implement a brute-force algorithm of local swapping on Spark to examine the capability of the distributed EDA framework.

The information produced by the preceding TCL parsing stage contains the lists of cells, nets, the initial placement, and the feasible placement locations. The netlist and the placement region are processed and partitioned into "DP tile" structures after a few MapReduce steps. Each DP tile consists of the partial netlists and subregion information needed by the regional local swapping algorithm. The set of DP tiles is the RDD for a map operator to do parallel swapping. This map operator takes a DP tile as input and generates a new DP tile after local swapping.

Compared with the conventional sequential solution, our method partitions the whole FPGA into N×N DP tiles and performs local swapping in each tile. During the swapping
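As an illustration of the parallel parser's sharding step, the sketch below splits the cell list into one chunk per Vivado instance and builds a per-chunk extraction query. The `make_range_query` helper and its embedded TCL string are our own hypothetical illustration, not the paper's code; only the even-partitioning idea is taken from the text.

```python
def partition(items, n_parts):
    """Split a list into n_parts nearly equal chunks, one per Vivado instance."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def make_range_query(chunk):
    """Build a hypothetical per-chunk TCL query for one Vivado instance."""
    return "foreach cell { %s } { puts ... }" % " ".join(chunk)

cells = ["cell_%d" % i for i in range(10)]
chunks = partition(cells, 4)  # 4 Vivado instances -> chunk sizes 3, 3, 2, 2
queries = [make_range_query(c) for c in chunks]
# Each query runs in one instance; the partial outputs are unioned into one RDD.
```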
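The kill-and-replicate cycle of Figure 4 can be sketched as a command plan. The container names, the `vivado:latest` image tag, and the use of `docker commit`/`docker run` below are our own simplified illustration of the idea; the paper's actual flow relies on CRIU-based live checkpoint and restore [31] rather than image commits.

```python
def writeback_plan(instances, survivor):
    """List the shell commands for one write-back cycle: kill the outdated
    containers, snapshot the surviving one, then re-clone it for parsing."""
    kill = ["docker kill %s" % name for name in instances if name != survivor]
    snapshot = ["docker commit %s vivado:latest" % survivor]
    clone = ["docker run -d --name %s vivado:latest" % name
             for name in instances if name != survivor]
    return kill + snapshot + clone

plan = writeback_plan(["vivado0", "vivado1", "vivado2", "vivado3"], "vivado0")
for cmd in plan:
    print(cmd)
```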
in each individual tile, we sweep a sliding window and enumerate all possible permutations of the cells in this window to pick a partially best solution. Sizes of 3×2 and 2×3 are selected for the sliding window, so the 6! = 720 permutations of each window can be examined in a reasonable amount of time. After completing one iteration, the best permutation of cells is committed, and the sliding window moves to the next position. We send all the tiles in N rounds, and a group of map operations processes N tiles in parallel. These N tiles are chosen such that no two tiles are on the same row or column, so that the estimation of wirelength improvement in each tile is consistent. The partitioning and local swapping scheme are illustrated in Figure 5.

Figure 5: Distributed detailed placement scheme.

4. EXPERIMENTAL RESULTS
Our experiments are run on a Linux cluster with four nodes, each with two 6-core Intel Xeon E5-2620 v3 processors at 2.40GHz and 64GB of memory. The distributed computing engine is powered by Spark 1.5.2 and the HDFS of Hadoop 2.6.3. The four nodes are connected by Gigabit Ethernet.

A summary of the test cases is given in Table 1. The test cases are obtained from the Titan benchmarks [24]. They are synthesized and placed using Xilinx Vivado 2015.3 targeting the VC707 board (part name XC7VX485TFFG1761-2). The total number of logic cells and a short description are also included in the table.

Table 1: Summary of the test cases
name             #cells   description
SLAM spheric      87K     spherical coordinates algorithm for SLAM
bitcoin miner    222K     two-core version of the bitcoin FPGA miner
guassianblur d1  466K     one of the pipelined loops for 3D Gaussian convolution

4.1 Distributed TCL Parser
After synthesis and placement, the next step in our experiments is to load the data from Xilinx Vivado into memory. We use TCL scripts to extract the design information.

The parsing time of the three test cases in Vivado is illustrated in Figure 6. The "load design" part is a one-time execution to load a design in Vivado. The "foreach cell" and "foreach net" parts correspond to the execution of the TCL scripts in Listing 1. These two parts are usually executed multiple times when there are multiple interactions between the Spark program and the Vivado services. The runtime of TCL execution is too long for such interactions, and hurts the speedup from any distributed algorithm in Spark.

Figure 6: Decomposition of parsing time.

Thus, we use the parallel parser illustrated in Figure 3, and achieve about 3× speedup at the cost of 3× memory using four instances of Vivado. The TCL runtime and memory consumption are shown in Table 2. The runtime can be further reduced using more Vivado instances.

Table 2: TCL runtime and memory consumption using four instances of Vivado
test case        TCL time (min)  decr.   memory (GB)  incr.
SLAM spheric          1.0        4.0×        6.4      3.2×
bitcoin miner         2.1        3.9×        8.0      3.2×
guassianblur d1       6.9        2.9×       11.0      3.0×

The current support for live replication of Vivado instances using Docker and CRIU [1] is experimental. The experiments in [10] show that it takes about 9 seconds to checkpoint and restore a container with 1GB of memory. These features are under active development by the Linux container communities, and we can expect runtime improvements in the near future.

4.2 Distributed Detailed Placement
The distributed detailed placement algorithm is written in Python, and is executed by the command "spark-submit --executor-memory 4G dplace.py".

The runtime results of the distributed detailed placement are listed in Table 3, and the quality of wirelength improvement is similar to that of the sequential version. The results show great potential for speedup in distributed computing.

Table 3: Runtime of distributed detailed placement with different numbers of parallel tiles
test case        runtime (min), 1 tile   runtime (min), 48 tiles
SLAM spheric              36                       18
bitcoin miner             51                       20
guassianblur d1          611                       25

5. CHALLENGES AND OPPORTUNITIES
In the previous sections, we demonstrated a proof-of-concept distributed EDA framework to scale out physical design algorithms. In this section, we highlight the challenges and opportunities to attract the efforts of the physical design community to design and develop new distributed algorithms.
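The sliding-window enumeration of Section 3.3 can be sketched as follows. The half-perimeter wirelength (HPWL) cost model and the tiny fixed-pin example are our own simplification of the brute-force local swapping step; a 3×2 window would pass six cells and six sites, giving the 6! = 720 trials mentioned above.

```python
from itertools import permutations

def hpwl(placement, nets):
    """Half-perimeter wirelength; placement maps a cell name to an (x, y) site."""
    total = 0
    for net in nets:
        xs = [placement[c][0] for c in net]
        ys = [placement[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def best_window_permutation(cells, sites, placement, nets):
    """Enumerate every assignment of the window's cells to its sites and keep
    the assignment with the smallest wirelength over the given nets."""
    best_cost, best_assign = None, None
    for perm in permutations(sites):
        trial = dict(placement)
        trial.update(zip(cells, perm))
        cost = hpwl(trial, nets)
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, dict(zip(cells, perm))
    return best_assign, best_cost

# Two movable cells, two window sites, two fixed cells outside the window.
placement = {"a": (0, 0), "b": (1, 0), "f1": (1, 5), "f2": (0, 5)}
assign, cost = best_window_permutation(
    ["a", "b"], [(0, 0), (1, 0)], placement, [("a", "f1"), ("b", "f2")])
print(assign, cost)  # {'a': (1, 0), 'b': (0, 0)} 10
```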
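The N-round tile schedule described in Section 3.3 (no two tiles of a round sharing a row or column) can be generated with a simple diagonal pattern. This is one scheme consistent with the paper's description, not necessarily the authors' exact implementation.

```python
def tile_rounds(n):
    """Schedule the n*n DP tiles into n rounds; round k processes the tiles
    (row, (row + k) % n), so no two tiles in a round share a row or column."""
    return [[(row, (row + k) % n) for row in range(n)] for k in range(n)]

for round_tiles in tile_rounds(3):
    print(round_tiles)
# [(0, 0), (1, 1), (2, 2)]
# [(0, 1), (1, 2), (2, 0)]
# [(0, 2), (1, 0), (2, 1)]
```

Each round is handed to a group of map operations, so its n tiles are swapped in parallel without conflicting wirelength estimates.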
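The diminishing return from adding Vivado instances follows Amdahl's law, since the one-time "load design" part stays serial. A quick back-of-the-envelope check; the 25% serial fraction below is an assumed figure for illustration, not a value measured in the paper:

```python
def amdahl_speedup(serial_fraction, workers):
    """Overall speedup when only the (1 - serial_fraction) part parallelizes."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)

# If the serial step were 25% of the parsing time (assumption), four parallel
# instances could give at most ~2.3x, and sixteen instances only ~3.4x.
print(round(amdahl_speedup(0.25, 4), 2))   # 2.29
print(round(amdahl_speedup(0.25, 16), 2))  # 3.37
```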
The following are the necessary components to make the distributed EDA framework for scalable physical design generic and useful.

Algorithmic kernels. These algorithmic kernels are preferred to be either a map operator in the MapReduce distributed computing paradigm for the partitioned design data, or a composed series of MapReduce operations. There have already been some widely used physical design tools developed. FLUTE [11] is one such example, which is adopted by most global routers in the ISPD routing contests [26, 25] to construct rectilinear Steiner minimal trees for multi-pin nets. The algorithmic kernels are analogous to the cognitive computing services for language, speech, vision, and data on IBM Bluemix [3]. It is necessary and challenging to build and maintain a library of such algorithmic kernels.

Standard interfaces. The standard interfaces include the ones connecting algorithmic kernels to form a complete EDA algorithm, as well as the ones connecting existing EDA platforms and the distributed computing framework. Though the format of raw design data will keep changing due to new design rules and new objectives, it is feasible to provide a conversion operator as an algorithmic kernel for backward compatibility. Relatively stable standard interfaces across different EDA platforms and raw design data will extend the lifetime of an algorithmic kernel and help the growth of the algorithmic library. OpenAccess [18] sets a good example of an open-source data model and API for physical design. Given the necessity of a distributed computing framework to scale out physical design, it is about time to define a new set of standard interfaces in this context.

Flow composition. The current implementation of OpenAccess only supports flow composition by tool-by-tool inter-operation. Given the algorithmic kernels and standard interfaces, the distributed computing framework will be able to support both tool-by-tool inter-operation (macro-flow composition) and the connection of algorithmic kernels (micro-flow composition). The former is conventional, and it will help innovations in EDA design flows when there is a sufficient number of tools supported in the framework. The latter is promising for keeping up the innovations in EDA algorithms. For example, there has been a series of routability-driven placement contests [33, 32, 36, 9]. Most of these algorithms share similar algorithmic kernels, and vary in some of these kernels and in the detailed tuning of the flow. Such research activities could attract broader attention if the existing kernels and flows could be reused, so that a group new to this area does not need to start from scratch but can focus on the innovation of the critical kernels (e.g., the routability estimator, the inflation strategy, the placement objective, etc.).

The distributed computing framework has potential benefits in the following aspects.

Scalability. There are extensive efforts to develop distributed computing engines like Spark and Tachyon [22], which are motivated by big data applications. The porting of EDA algorithms to new distributed engines can take advantage of such progress in the big data ecosystem, and keep up with the scaling of design complexity in the long run. Though there are results [29] showing that Spark is one order of magnitude slower than MPI for specific data sets, it has a data management infrastructure that is better at handling node failures and data replication. Moreover, these emerging distributed engines lower the barrier to getting involved with distributed computing, and bring more EDA experts to implement algorithmic kernels and flows in the framework.

Reproducible results. It is hoped that the algorithmic kernels and flows can be encapsulated with their dependent dynamic linking libraries using Linux container technologies like Docker. In this way, they are executable on any mainstream Linux cluster without configuration or compilation issues, and generate reproducible results. There is also an opportunity to provide "cloud" services for such a distributed EDA framework, so that the design data, benchmarks, and design flows can be shared in the community, similar to GitHub [2]. Moreover, when the flows are executed in the cloud by the masses on some existing benchmarks, it is possible to apply the idea of data deduplication to skip the execution of the first few stages if a flow has been executed before. This will save runtime for the development of late-stage physical design algorithms.

Collaborative innovation. Collaborative innovation comes with the standard interfaces and the execution of the algorithmic kernels and flows in the distributed computing framework, and it is promising for bridging the gap between industry and academia. The opportunity of "cloud" services to share design data, benchmarks, and design flows is a way to boost collaborations. On one hand, when the results of the algorithmic kernels and a design flow from academia are reproducible, it will be easier for industry to try the flow and get direct access to the new ideas from academia. On the other hand, since the framework is compatible with existing EDA platforms, industry can set up an evaluation system like ImageNet [20] for academia to submit their tools and flows in an executable form, with industry-grade design data, without worrying about sensitive data leakage.

Education. The distributed computing framework creates opportunities for instructors to provide a design flow to students in a quick way. Students are not only able to see an example of the whole design flow (the highest level of a composed flow) conveniently, but it will also be much easier than nowadays to replace a design step with their own algorithm. The lower barrier to getting familiar with design flows and experimenting with EDA algorithms is promising for attracting more students to understand the EDA field.

6. CONCLUSION
In this paper, we propose a distributed computing framework for extreme-scale EDA algorithm development. Furthermore, we outline the challenges and opportunities of how such a framework will benefit the innovations in both EDA algorithms and flows, as well as the collaboration between industry and academia.

Specifically, our proposed framework enables the design and development of new distributed EDA algorithms while being compatible with commercial EDA design platforms. This framework uses the TCL language to interact with existing EDA platforms, and converts the design information to a distributed in-memory data structure in Spark. The current TCL support in existing EDA platforms is mainly designed for customized flows and lightweight customized tools. As a result, we observe that the extraction of the complete design information using TCL takes a significant amount of time, and this will cancel out the speedup from the distributed EDA algorithms in our proposed framework, according to Amdahl's law. To solve this issue, we start multiple
instances to open the same design on multiple server nodes, and extract the design data in a distributed way. The design data is stored in memory for further processing in the distributed computing engine of Spark. To demonstrate the proposed framework, we implement a distributed detailed placement algorithm and show a substantial speedup.

In the end, we summarize the challenges and opportunities of scaling out physical design algorithms in a distributed framework. It is promising that such a framework will accelerate the innovations in both large-scale EDA algorithms and new EDA design flows.

7. ACKNOWLEDGMENTS
This work is partly supported by National Natural Science Foundation of China (NSFC) Grant 61202073, Research Fund for the Doctoral Program of Higher Education of China (MoE/RFDP) Grant 20120001120124, and Beijing Natural Science Foundation (BJNSF) Grant 4142022.

8. REFERENCES
[1] CRIU, a project to implement checkpoint/restore functionality for Linux in userspace. http://www.criu.org/. [Online; accessed Feb 5, 2016].
[2] GitHub. https://github.com/. [Online; accessed Feb 5, 2016].
[3] IBM Watson Developer Cloud. http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/. [Online; accessed Feb 5, 2016].
[4] What is Docker? https://www.docker.com/what-docker. [Online; accessed Feb 5, 2016].
[5] EMC Isilon Storage Best Practices for Electronic Design Automation. Technical Report H11909, EMC Corporation, 2013.
[6] EMC Isilon NAS: Performance at Scale for Electronic Design Automation. Technical Report H13233.1, EMC Corporation, 2014.
[7] systemvision.com: A Cloud-based Engineering Community for System Modeling & Design. Technical Report MGC 04-15 1033380-w, Mentor Graphics Corporation, 2015.
[8] R. I. Bahar, A. K. Jones, S. Katkoori, P. H. Madden, D. Marculescu, and I. L. Markov. Workshops on Extreme Scale Design Automation (ESDA) Challenges and Opportunities for 2025 and Beyond. Technical report, 2014.
[9] I. S. Bustany, D. Chinnery, J. R. Shinnerl, and V. Yutsis. ISPD 2015 Benchmarks with Fence Regions and Routing Blockages for Detailed-Routing-Driven Placement. In Proceedings of the 2015 International Symposium on Physical Design (ISPD'15), pages 157-164, 2015.
[10] Y. Chen. Checkpoint and Restore of Micro-service in Docker Containers. In Proceedings of the 3rd International Conference on Mechatronics and Industrial Informatics, pages 915-918, 2015. Atlantis Press.
[11] C. Chu and Y. C. Wong. FLUTE: Fast lookup table based rectilinear Steiner minimal tree algorithm for VLSI design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(1):70-83, 2008.
[12] J. Cong and M. Xie. A Robust Mixed-Size Legalization and Detailed Placement Algorithm. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(8):1349-1362, 2008.
[13] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of the 6th Symposium on Operating Systems Design & Implementation (OSDI'04), page 10, 2004.
[14] L. Drenan. Cadence Hosted Design Solutions: Software-as-a-service capability for the semiconductor industry. Technical Report 702 6/13 SA/DM/PDF, Cadence Design Systems, 2013.
[15] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio. An updated performance comparison of virtual machines and Linux containers. Technical Report RC25482 (AUS1407-001), IBM Research, 2014.
[16] J. Friesen. An approach for better debuggability of Tcl-driven EDA methodologies. In CDNLive Silicon Valley 2015, 2015.
[17] W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing, 22(6):789-828, 1996.
[18] M. Guiney and E. Leavitt. An introduction to OpenAccess: an open source data model and API for IC design. In Proceedings of the Asia and South Pacific Conference on Design Automation (ASP-DAC'06), pages 434-436, 2006.
[19] D. Hsu. EDA in the Clouds: Myth Busting. Synopsys Insight, 2011.
[20] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09), pages 248-255, 2009.
[21] R. C. Johnson. IBM Renting Its EDA Tools, 2015.
[22] H. Li, A. Ghodsi, M. Zaharia, S. Shenker, and I. Stoica. Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks. In Proceedings of the ACM Symposium on Cloud Computing (SOCC'14), pages 1-15, 2014.
[23] M. Pan, N. Viswanathan, and C. Chu. An efficient and effective detailed placement algorithm. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD'05), pages 48-55, 2005.
[24] K. E. Murray, S. Whitty, S. Liu, J. Luu, and V. Betz. Timing-Driven Titan: Enabling Large Benchmarks and Exploring the Gap between Academic and Commercial CAD. ACM Transactions on Reconfigurable Technology and Systems, 8(2):1-18, 2015.
[25] G.-J. Nam, C. Sze, and M. Yildiz. The ISPD global routing benchmark suite. In Proceedings of the 2008 International Symposium on Physical Design (ISPD'08), page 156, 2008.
[26] G.-J. Nam, M. Yildiz, D. Z. Pan, and P. H. Madden.
ISPD placement contest updates and ISPD 2007 global routing contest. In Proceedings of the 2007 International Symposium on Physical Design (ISPD'07), page 167, 2007.
[27] A. Ng and I. Markov. Toward Quality EDA Tools and Tool Flows Through High-Performance Computing. In Proceedings of the Sixth International Symposium on Quality of Electronic Design (ISQED'05), pages 22-27, 2005.
[28] S. Popovych, H.-H. Lai, C.-M. Wang, Y.-L. Li, W.-H. Liu, and T.-C. Wang. Density-aware Detailed Placement with Instant Legalization. In Proceedings of the 51st Annual Design Automation Conference (DAC'14), pages 1-6, 2014.
[29] J. L. Reyes-Ortiz, L. Oneto, and D. Anguita. Big Data Analytics in the Cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf. Procedia Computer Science, 53(1):121-130, 2015.
[30] L. Stok. The Next 25 Years in EDA: A Cloudy Future? IEEE Design & Test, 31(2):40-46, 2014.
[31] M. Tessel, M. Crosby, and D. Mónica. Full Sail Ahead: What's Next For Container Technology. In LinuxCon + CloudOpen + ContainerCon NA 2015, 2015.
[32] N. Viswanathan, C. Alpert, C. Sze, Z. Li, and Y. Wei. The DAC 2012 routability-driven placement contest and benchmark suite. In Proceedings of the 49th Annual Design Automation Conference (DAC'12), page 774, 2012.
[33] N. Viswanathan, C. J. Alpert, C. Sze, Z. Li, G.-J. Nam, and J. A. Roy. The ISPD-2011 routability-driven placement contest and benchmark suite. In Proceedings of the 2011 International Symposium on Physical Design (ISPD'11), page 141, 2011.
[34] T. White. Hadoop: The Definitive Guide. O'Reilly Media, 4th edition, 2015.
[35] L. Wu. Accelerating Physical Design Flow in Laker with TCL Applications and Third Party Tool Integration. In SNUG Taiwan 2015, 2015.
[36] V. Yutsis, I. S. Bustany, D. Chinnery, J. R. Shinnerl, and W.-H. Liu. ISPD 2014 benchmarks with sub-45nm technology rules for detailed-routing-driven placement. In Proceedings of the 2014 International Symposium on Physical Design (ISPD'14), pages 161-168, 2014.
[37] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauly, M. J. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI'12), pages 15-28, 2012.