Goswami and Bhatia - 2023 - Application of Machine Learning in FPGA EDA Tool Development
ABSTRACT With the recent advances in hardware technologies like advanced CPUs and GPUs and the large
availability of open-source libraries, machine learning has penetrated various domains, including Electronic
Design Automation (EDA). EDA consists of multiple stages, from high-level synthesis and logic synthesis
to placement and routing. Traditionally, estimating resources and areas from one level of design abstraction
to the next level uses mathematical, statistical, and analytical approaches. However, as the technology node
decreases and the number of cells inside the chip increases, the traditional estimation methods fail to correlate
with the actual post-route values. Machine-learning (ML) based methodologies pave a strong path towards
accurately estimating post-route values. In this paper, we present a comprehensive survey of the existing
literature on ML applications in EDA, emphasizing FPGA design automation tools. We discuss
how ML is applied in different stages to predict congestion, power, performance, and area (PPA), both for
High-Level Synthesis (HLS) and Register Transfer Level (RTL)-based FPGA designs, as well as its use in design
space exploration and in Computer-Aided Design (CAD) tool parameter settings to optimize
timing and area requirements. Reinforcement learning is widely applied in both FPGA and ASIC physical
design flows and is also discussed in this paper. We also discuss various ML models, like classical regression
and classification, convolutional neural networks, reinforcement learning, and graph convolutional networks,
and their application in EDA.
Electronic Design Automation (EDA) is the branch of science or technology that deals with the tools
used to design integrated circuits. IC design is a highly complex process consisting of multiple
stages, with an average turnaround time ranging from a few days to a few years. While designing an
IC, it is always good practice to estimate the power, performance, area, congestion, and wirelength
of the design down the flow at an early stage. Traditionally, this is done using statistical,
mathematical, or analytical methods [3], [4], [5]. Although the analytical and mathematical models
produce fairly accurate estimates, these methods are computationally expensive and take a long time
(a few hours) to perform basic estimations like congestion [3], [4] and wirelength [6]. These
methods generally solve every problem from scratch and do not utilize any experience or knowledge.
On the other side of the technology world, another domain, Machine Learning (ML), is emerging fast
and has found applications in almost every aspect of human life. ML has seen applications in
various domains like image recognition [7], [8], natural language processing [9], [10], [11], audio
and signal processing [12], [13], [14], etc. This is because the hardware used for training the
models, like GPUs, has progressed significantly, with multiple processors fabricated inside one
die, along with the easy availability of open-source training frameworks like TensorFlow,
scikit-learn, and PyTorch [15], [16], [17]. ML has also penetrated the EDA domain in the last five
to six years. Problems that were previously solved using analytical methods [3], [4], [5], [6],
[18] are now reframed as ML problems. The issues that took a few minutes to hours using analytical
or mathematical methods can now be solved using ML-based pre-trained models within a few seconds.
Machine learning has found its application in almost all stages of IC design, which include timing,
power, and area estimation; routing congestion estimation, both using classical ML and deep learning
methods; design space exploration; CAD tool parameter tuning; wirelength (WL) estimation; and
lithographic hotspot detection. For this paper, we have surveyed the literature in the ML
application domain in EDA tool design and optimization, focusing on FPGA EDA tools. We have
clustered the applications into the following five categories:
i. Power, performance and area prediction in RTL and HLS designs
ii. Design Space Exploration (DSE)
iii. CAD tool optimization using ML
iv. Congestion estimation in HLS and RTL designs
v. ML in FPGA Physical Design Flow
In Fig. 1, we have shown how ML is applied at various stages of the FPGA EDA flow. The figure also
shows the various subcategories and ML algorithms applied at each flow stage. We have discussed
both analytical and mathematical-based PPA estimators [19], [20], [21] and ML-based estimators
[22], [23], [24], [25], [26], where the design entry is either HLS or RTL input. The prediction
models use traditional ML methods like regression and classification and advanced methods like deep
learning and graph convolution [26], [27] to estimate the PPA of a circuit. For autotuning and
recommendation of CAD tool parameters [28], [29], [30], Bayesian methods are used to select the
hyperparameters of the CAD tools. Yanghua et al. perform feature selection to reduce the number of
FPGA CAD parameters to consider [31] from 80 to 8 features without compromising the quality of
results. To reduce the parameter-tuning effort, Kwon et al. [32] propose a CAD tool parameter
recommender system that involves learning a collaborative prediction model through tensor
decomposition and regression. ML-based design space exploration of HLS designs is discussed in
[33], [34], [35], and [36]. Reference [30] uses Bayesian optimization to perform DSE to optimize an
HLS-based CNN architecture, while Carloni [35] uses transfer learning to optimize the designs during
DSE. Congestion estimation is another problem that machine learning can solve accurately and
quickly. In [37] and [38], the congestion estimation of RTL designs on FPGA uses classical
regression-based ML models. Reference [39]
predicts the routing congestion of HLS designs using ML, while [40] and [41] use convolutional
neural network (CNN) methods to forecast the congestion on post-placed images of logic netlists.
As shown in Fig. 1, ML can be applied at different stages of the logical and physical design flow,
from quality of results (QoR) estimations to hyperparameter selection and design space exploration.
In this paper, we comprehensively review how ML is applied in different stages of the FPGA physical
design flow, along with some insights into future developments and open research areas. The rest of
the paper is organized as follows: in Section II, we discuss the other works in which the
application of ML in EDA tool development for both ASICs and FPGAs has been studied; in Section III,
we discuss the various ML models applied in EDA tool design, which include classical regression and
classification, CNN, reinforcement learning, transfer learning, and graph convolution network (GCN).
From Section IV to Section VIII, we discuss in detail the five categories in which ML is applied for
FPGA CAD. Finally, Section IX summarizes the paper's contents and looks into future trends and
directions.

II. RELATED SURVEYS IN ML BASED EDA TOOL DEVELOPMENT
There are a few survey papers in which different works in the ML application domain in EDA have
been studied [42], [43], [44], [45], [46], [47], [48]. Most of these works are either too generic
for ASIC [42], [45] or too specific to a particular domain [43], [44], [48]. These papers discuss
how ML has been applied to predict or improve the CAD tools' performance to design ASICs. Discussion
of ML-based FPGA CAD tools has been neglected in survey papers. Reference [42] is a comprehensive
survey in which the authors perform a detailed analysis of how ML can be applied to various stages
of the ASIC design flow. They discuss ML applications for ASIC physical design, power delivery
network and IR drop analysis, lithography and mask production, analog IC design, device sizing
automation, and verification and testing. In this work, all the predictions are targeted at the
nanometer-scale ASIC physical design flow. They have minimal discussion on the FPGA front, and only
on the HLS part; nothing has been discussed about prediction in the various stages of FPGA physical
design. In [43], the authors survey exclusively the application of graph neural networks (GNN) to
the ASIC physical design flow. The paper [43] primarily discusses the theoretical details of GNN,
and the discussion on applying GNN to EDA is minimal. In another work, [44], the authors explicitly
discuss the application of ML in analog circuit design. They mention the ANN-based analog IC design
process, the hybrid analog IC design automation process, and the application of ANN and RL in analog
IC manufacturing. In [45], the authors briefly discuss the application of ML in the placement and
routing of ASIC designs. There is no mention of FPGAs, HLS, and recent methods like CNN, GNN, and
RL. They only discuss wirelength estimation in placement and Design Rule Check (DRC) hotspot
detection in post-routed circuits. Papers [46] and [49] discuss hotspot and DRC violation prediction
using ML in ASIC circuits and static timing analysis using ML. These two papers are more roadmap
papers than survey papers; they discuss the different domains in which ML can be applied to future
IC designs. Similar to [46], [47] is a roadmap that discusses performance limitations of
traditional compute and storage systems and the systems and infrastructure considerations for
performing machine learning at scale. This paper also briefly discusses how ML can be applied to
solve functional verification and debug problems. Even though [48] is not a proper survey paper, the
authors discuss how RL can be applied to solve various circuit design problems like DRC violations,
layout, and routing issues. This work focuses heavily on using Reinforcement Learning (RL) to solve
post-route violation issues.
As the above discussion shows, most existing surveys are highly focused on IC development or EDA
tools for ASICs. Even though there are many recent works in which ML has been applied to solve
various FPGA physical design problems, their discussion has been limited to the "Related Works"
section of the papers only. No comprehensive survey is available that discusses the existing work on
ML-based FPGA CAD tool design. To address this gap, we created this detailed survey, in which we
exclusively review various works in which ML has been applied to multiple FPGA CAD tool design
stages. We discuss both ML-based HLS tools and ML-based RTL tools for FPGAs. This paper surveys the
recent advancements in GNN and RL, which have been widely applied to the synthesis, placement, and
routing of FPGA circuits.

III. BASICS OF MACHINE LEARNING
ML algorithms are broadly divided into two classes:
(i) Supervised Learning
(ii) Unsupervised Learning
Most of the ML models used in EDA fall under supervised learning, including regression,
classification, Convolutional Neural Networks, and Graph Convolution Networks. The CAD tool
optimization or Design Space Exploration tools use Bayesian learning and the Gaussian process to set
the hyperparameters of the CAD tools. In Table 1, we have mapped the different EDA applications to
the ML models used. In this section, we discuss in detail the ML algorithms mentioned in column 1 of
Table 1.
(i) Linear Regression: Linear regression is a method to find the relationship between two
continuous variables; one is independent, and the other is dependent. Linear regression tries to
create a statistical relationship that may not always be deterministic. Linear regression tries to
obtain a straight line that shows the relationship between the independent and the dependent
variable. For most of the linear regression models, the error
is measured using the least squared error. A linear regression line has the equation Y = a + bX,
where X is the explanatory variable and Y is the dependent variable. The slope of the line is b,
and a is the intercept; both are determined during model training.
(ii) Random Forest Regression (RF): Random forest is a supervised learning method that uses
ensemble methods for learning. Ensemble learning is a technique that combines multiple weak learners
to create a strong learner. In random forests, these weak learners are decision trees. The outputs
from the individual learners are averaged to generate the final output. Random forest belongs to
the bagging class of learners where, during training, data samples are selected at random with
replacement. Bagging makes each model run independently and then aggregates the outputs at the end
without preference for any model. Because of the bagging method, the chances of overfitting are
lower in a random forest. Random forest is a very fast training method because each tree can be
trained in parallel, and the inputs and outputs of each tree are not related to one another. To
summarize, the random forest algorithm merges the output of multiple decision trees to generate the
final output.
(iii) Multivariate Adaptive Regression Splines (MARS): Multivariate adaptive regression splines
[65] is an algorithm that automatically builds a piecewise linear model, capturing nonlinear
behavior by combining multiple small linear functions known as steps. In MARS, non-linearity is
introduced by using step functions. There are no polynomial terms in the MARS equation. Equation 1
shows the linear stepwise equation:

$$y_i = \beta_0 + \beta_1 C_1(x_i) + \beta_2 C_2(x_i) + \cdots + \beta_d C_d(x_i) + \epsilon_i \tag{1}$$

MARS is an adaptive procedure for regression and is well-suited for high-dimensional problems
(i.e., many inputs). The bends in the step functions are known as "knots". Too many knots in the
MARS equation may lead to overfitting.
(iv) Multi Layer Perceptron based Regression (MLP): MLP-based regression models comprise multiple
perceptrons known as neurons. This type of model falls into the feedforward class of artificial
neural networks. Generally, Artificial Neural Networks (ANNs) are used for classification, but they
can be used for regression, too. MLP-based models are trained using a backpropagation algorithm.
Each network consists of an input layer of neurons, a few hidden layers, and an output layer. The
weighted sum computed by each neuron is linear. To make the output non-linear and emulate real-world
behavior, activation functions like "tanh", "ReLU" and "sigmoid" are added.
(v) Gradient Boost Regression (XGB): Similar to random forest regression, gradient boost regression
is also an ensemble class of ML model. Boosting is a method of converting weak learners into strong
learners. Unlike random forest regression, where we use fully grown trees, in XGB the trees are weak
learners; hence, shallow trees can also be used for fast learning. In boosting, each new tree is fit
on a modified version of the original data set. XGB uses multiple trees in sequence, each of which
fixes the error of the previous tree. XGB uses a gradient descent algorithm on the loss function to
minimize the error generated by the previous trees. The XGB method keeps adding trees one after
another iteratively until there is no improvement in the loss. The loss function used is either sum
of squared error (SSE) or mean square error (MSE) [66].
(vi) Graph Convolution Network (GCN): With the wide application of deep learning, neural networks
are an effective and efficient model for tasks like classification and regression. However, neural
networks like ANN and CNN only take vectors or tensors as input data, which makes it difficult to
work on graphs. Defferrard et al. generalize convolutional neural networks from regular grids (e.g.,
images) to general
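
The sequential error-fixing idea described for gradient boost regression in item (v) can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not the implementation used by XGBoost or by any surveyed work: the weak learners are depth-1 trees (stumps) on a single feature, the loss is squared error, and the function names (`fit_stump`, `gradient_boost`), the toy data, the number of trees, and the learning rate are all hypothetical choices made for this example.

```python
# Sketch of gradient boosting for squared-error loss with depth-1 trees (stumps).
# All names, data, and hyperparameters are illustrative assumptions.

def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def fit_stump(xs, targets):
    """Fit a one-split regression tree on a single feature.

    Tries every threshold between distinct x values and keeps the one
    with the lowest sum of squared errors (SSE).
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Add stumps one after another, each fit to the current residuals."""
    base = sum(ys) / len(ys)              # initial constant prediction
    preds = [base] * len(ys)
    stumps, history = [], []
    for _ in range(n_trees):
        # For squared error, the negative gradient is proportional to the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
        history.append(mse(ys, preds))    # training loss after this tree

    def model(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return model, history


# Toy data: a nonlinear target that a single straight line cannot fit.
xs = [float(i) for i in range(20)]
ys = [x * x for x in xs]
model, history = gradient_boost(xs, ys)
print(f"training MSE after 1 tree: {history[0]:.1f}, after 20 trees: {history[-1]:.1f}")
```

Each stump on its own is a weak learner, but because every new stump is fit to what the current ensemble still gets wrong, the training loss recorded in `history` does not increase from one round to the next. A practical implementation would also monitor a held-out set and stop adding trees once that loss stops improving.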
Area (WLPA), Net Cut Per Region (NCPR) etc., as their features to the regression model. Goswami et
al. [71] considered a total of nine features for their model, which include LUT and flip-flop
utilization, incoming/outgoing and fully local nets in each gcell, and the relative location of each
gcell inside the FPGA device. All three works tested and verified their models on the ISPD 2016 [72]
benchmarks. While [37] and [38] divided their dataset into 70% training and 30% test data, [71]
predicted on individual benchmarks. To create their models, the three works used conventional
regression methods like linear regression, random forest, and gradient boost.

approach can achieve high prediction accuracy for large FPGA designs while relying on
well-engineered features that encode the placement and connectivity information for large-scale
designs. The main contributions of [40] include selecting features only from the post-placed netlist
image, estimating the routing channel utilization by forecasting the full congestion heat map
instead of hotspots only, and integrating with the placement tool to estimate routing congestion on
the fly during placement. Both tools can be easily integrated with open-source academic placers like
UTPlaceF and GPlace [74], [75] and achieve very high accuracy.
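
As a rough illustration of the evaluation protocol mentioned above (a 70%/30% train/test split combined with a conventional regression model), the sketch below fits a least-squares line Y = a + bX and scores it on held-out data. It is a sketch under stated assumptions only: the data are synthetic, the single "utilization" feature and the "congestion" target are hypothetical stand-ins for the per-gcell features used in the surveyed works, and the helper names are invented for this example.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into training and test subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def fit_line(xs, ys):
    """Ordinary least squares for Y = a + b*X (one explanatory variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept
    return a, b


def mean_squared_error(a, b, rows):
    """Average squared prediction error of the fitted line on the given rows."""
    return sum((y - (a + b * x)) ** 2 for x, y in rows) / len(rows)


# Synthetic data (hypothetical): congestion grows linearly with utilization,
# plus Gaussian noise with standard deviation 2.
rng = random.Random(42)
rows = []
for _ in range(100):
    utilization = rng.uniform(0.0, 1.0)
    congestion = 10.0 + 80.0 * utilization + rng.gauss(0.0, 2.0)
    rows.append((utilization, congestion))

train, test = train_test_split(rows)          # 70 training rows, 30 test rows
a, b = fit_line([x for x, _ in train], [y for _, y in train])
test_mse = mean_squared_error(a, b, test)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}, test MSE = {test_mse:.2f}")
```

Reporting the error on the 30% of samples the model never saw during fitting is what makes the accuracy figure meaningful; scoring on the training rows alone would overstate how well the model generalizes to new designs.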
A. DESIGN SPACE EXPLORATION USING STANDALONE ML
In [36] and [79], the authors use traditional ML algorithms to find the Pareto optimal curve. While
in [36], the authors have proposed a transductive experimental design (TED) based

C. DESIGN SPACE EXPLORATION USING COMBINATION OF ML AND METAHEURISTICS METHODS
There are a few works [33], [34] that fuse both metaheuristics-based methods like Simulated Annealing
(SA), genetic algorithm (GA), gradient descent, Ant Colony Optimization (ACO), etc., and machine
learning models. In [33], the authors use decision trees to speed up the simulated annealing
algorithm. Here, the authors use a decision tree to reduce the design space, which speeds up the DSE
and achieves results comparable to a non-pruned standard simulated annealing-based DSE. In the
initial phases, standard simulated annealing generates designs based on the previous design's cost
function. The initial designs are used to generate a decision tree to decide which of the attributes
contribute to maximizing the cost function. The attributes from the decision tree that show high
correlation are fixed, while the less important attributes are selected pseudo-randomly. This
considerably reduces the design space and allows finding the Pareto front faster than a regular
simulated annealer. In [34], the authors use ML to select the hyperparameters of the metaheuristic
algorithms used in the DSE. Examples of the metaheuristic parameters selected in this work are the
initial and final temperature, the descent rate, and the exit condition in simulated annealing and,
in the GA case, the number of parent pairs and the mutation and crossover rate. The authors also
proposed a combined SA, GA, and ACO concurrent multiheuristic design space explorer. In [80],
Goswami et al. combined heuristics-based simulated annealing and an ML regression model to design a
fast design space explorer for real designs. The initial part of the DSE algorithm runs on logic
synthesis-based results, and once a sufficient amount of data (design points) has been generated,
they create an ML model and switch to fast ML-based predictive DSE. They used this work to run DSE
on different CNN architectures [83]. In [26], the authors created a fast DSE by applying GNN on LLVM
IR graphs, which predicts a design's resource and latency requirements.

D. SUMMARY AND FUTURE DIRECTIONS FOR ML BASED DSE
One of the figures of merit for measuring the performance of a DSE tool is called "Average Distance
to Reference Set" (ADRS). This parameter measures how close the predicted DSE points are to the
exhaustive search-based DSE points. If the value of ADRS is close to zero, it is a very good design
space explorer. The larger this value is, the worse the performance of the explorer. An explorer
aims to minimize the ADRS or runtime. This is done by either minimizing the design space [35], [79]
or by applying transfer learning from a different set of designs [35], [53] or from another platform
like ASIC [82]. Most of the papers [34], [35], [36], [53], [80], [82] measure the performance of
their DSE tool in terms of ADRS. In Table 3, column 3, we show what QoR measures are being used to
measure the performance of the DSE tools as discussed in the papers, while in column 4, we report
the average ADRS values of the various designs used in each paper. For papers where QoR is measured
using some other parameters like runtime, resource or latency requirements, the average of those
values is reported. In columns 5 and 6, the benchmarks' names and the design space's maximum size
are shown, respectively. Although a few standard benchmarks like CHStone [84], MachSuite [85],
PolyBench [86] and S2CBench [87] are available, most researchers generate their own datasets [35],
[36], [79], [81], [83]. Even if they use the existing benchmarks, they generate their own design
versions. Hence, a fair comparison of the works is difficult to perform. Generating datasets to
train ML models for PPA estimation or DSE is an extremely time-consuming process, which can take
weeks or months. To address this issue, the benchmarks in [88], [89], and [90] have been created,
specifically curated for ML training. These are large datasets similar to computer vision datasets
like ImageNet [91] or CIFAR-10 [92], which are off-the-shelf datasets that ML-based EDA researchers
can directly use for training purposes. Another new direction of research is to eliminate the
synthesis tools from the loop by using methods like LLVM [93] or MLIR [94], which were used in [26]
and [80].

VI. MACHINE LEARNING IN FPGA CAD TOOL PARAMETER SELECTION
As the technology node decreases, the CAD tools used to design the ICs are also becoming very
complex. Many parameters are involved in EDA tools, which results in a huge design space. The
runtime complexity of the CAD tools is also very large, and designs take multiple weeks to
synthesize at each design point. Modern CAD tools used for logical
FIGURE 4. High level diagram showing how ML is used to recommend CAD tool parameters.
synthesis and physical design have hundreds of parameters, which are set to meet various timing and
area requirements of the designs. The design space of these tools is enormous, and manually
selecting the tool parameters takes a very long time and may not always generate the optimum design
in terms of power, performance, and area. To recommend tool parameters and minimize design space in
EDA tools, researchers have used Bayesian optimization [29], [30], [95], classification [28], [31],
[54], Principal Component Analysis for design space pruning [95], and tensor decomposition and
regression [32]. Researchers minimized the tools' runtime and generated area- and timing-efficient
designs using these methods. This section discusses seven recent works in which ML is applied to
minimize search space in EDA tools and generate optimized designs. In Fig. 4, we have shown the
high-level diagram of how ML can recommend CAD tool parameters for FPGA design. The input to the
recommender system is generally the RTL code, a set of logic synthesis and physical design
parameters that need to be tuned, and the desired specs in terms of area and performance. The
recommender system considers the specs and the RTL code and, using ML, suggests the best set of
tool parameters to be used. All the works discussed in this section use a variant of this flow.

A. RECOMMENDER SYSTEM TO SUGGEST PARAMETERS TO MEET TIMING
The works discussed in [28], [31], and [55] propose methods to suggest CAD tool parameters to
maximize total negative slack (TNS). InTime [28] is a plugin for FPGA CAD tools that can
automatically select tool parameter assignments for each design by using machine learning heuristics
and cheap cloud computing resources. While modern CAD tools have hundreds of parameters, InTime uses
only 25 parameters to suggest the best timing solutions. InTime is an iterative algorithm organized
as a series of concurrent CAD runs. Each round, which consists of multiple concurrent runs, is an
opportunity to generate candidate CAD parameter combinations and acquire data for analysis. Within
each round, InTime uses a supervised learning approach to train classifiers that evaluate the
effectiveness of a given combination of CAD parameter selections toward increasing timing slack.
Reference [31] is an extension of InTime [28]. In this paper, the authors proposed a method to
select the best CAD tool parameters by framing the task as a classification problem. They minimized
the search space using the Principal Component Analysis method. To train the model, they used
off-the-shelf ML models like Logistic regression (glm), Bagging (treebag), Random Forest (rf),
Support Vector Machine (SVM) (svmRadial) and Neural Network (nnet). Here also, the authors minimized
the clock period using the suggested CAD parameters.
Similar to [28] and [31], LAMBDA [55] also tries to maximize TNS. In [28] and [31], parameters of a
single stage are tuned, i.e., only for logic synthesis, placement or routing. But in [55],
parameters of multiple stages are tuned simultaneously. They combine the features from multiple
stages to predict the post-route QoR of the designs. They addressed the problem as a regression
problem and used gradient boost regression to solve it.

B. GENERALIZED PPA RECOMMENDER SYSTEM
There are a few other works [29], [30], [32], [54] which are much more generalized than those
discussed in Section VI-A. These works suggest parameters not only to optimize timing but also to
optimize other metrics like power and area. Reference [29] proposed a method that is used to
automate the flow selection in IC design using the Bayesian optimization method. Their optimization
targets a cost function consisting of power, performance, and area values. They automate the flow
selection at the logic synthesis and the place and route stages. They have used Gaussian process
regression as the surrogate function for the Bayesian model. Using Bayesian optimization, the
authors tuned six parameters, four of which are in the logic synthesis stage and two in the place
and route stage. Similar to [29], [32] also suggests tool parameters at both the logic synthesis and
place and route stages, using a two-step process. In the first stage, an ML model is trained
offline. They have used macros and small partitions of large-scale, high-performance server
processor chips for training. Once the model is trained, in the second stage, whenever a new macro
arrives, the tool takes as input the macro name (if it is a previously seen macro) or some baseline
synthesis results (for a new macro), together with the set of cost functions to be optimized, and
recommends the set of tool parameters to meet the objective function. Although the paper presents a
nice idea about the recommendation system, nothing has been mentioned about the ML model or feature
sets. Also, the paper does not discuss the accuracy of the resulting QoR of the recommendation
system.
Reference [54] proposes a totally new approach compared to the already discussed ones. Instead of
suggesting hyperparameters for CAD tools, this work suggests which tool will be the best to use to
meet the specs of a particular design. They address the problem as a binary classification problem;
for placement purposes, they selected two popular academic placers, GPlace3.0 [75] and UTPlaceF
[74]. This work does not discuss the quality of the generated placed circuits in terms of timing,
area, or power.

C. SUMMARY AND FUTURE DIRECTIONS
In Table 4, we have summarized the work done in the ML-based CAD tool parameter optimization domain.
Except for [29], all the other works use classification or regression. In the classification ones
discussed in [28] and [31], the authors predict whether the QoR (area/timing, etc.) will be met or
not met using a certain set of tool parameters. Similarly, in [54], a binary classification method
is used to suggest which of the two academic placers is suitable to place and route a certain
design. Most of the works [28], [29], [31] operate only at one stage of the physical design flow,
i.e., either at synthesis, placement or routing. They cannot optimize parameters in multiple stages
in parallel. This issue is addressed in LAMBDA [55], where they try to simultaneously autotune the
hyperparameters in multiple stages, from logic synthesis to routing. Bayesian optimization is a
popular hyperparameter optimization method which is widely used in ML frameworks to select optimal
parameters. More research can be done on using Bayesian optimization methods for automatically
recommending parameters in the EDA flow at different stages.

TABLE 4. Summary of CAD tool autotuning works.

VII. MACHINE LEARNING TO PREDICT POWER, PERFORMANCE AND AREA ESTIMATION IN FPGA DESIGNS
Power, performance, and area estimation are crucial in any VLSI physical design flow. The earlier we
predict the post-route QoR of a design, the better, so that the designer can go back and set the
tool parameters to meet the specs. A few recent works use analytical or mathematical models or
machine learning models to predict the post-route QoR of both HLS and RTL designs. Works like COMBA
[20], Lin-Analyzer [19] and Aladdin [21] use analytical methods to estimate the post-route PPA of
HLS designs, while "Fast and Accurate" [22], Pyramid [25], XPPE [23], HLSPredict [23] and PowerGear
[63] use ML-based methods to estimate the post-route QoR of HLS designs. In [50] and [51], ML has
been used to predict timing in RTL-based FPGA design, while in [62] and [63], GNN is used to predict
power in FPGA designs.

A. ANALYTICAL METHODS FOR PPA ESTIMATION OF HLS DESIGNS
Aladdin [21], COMBA [20], and Lin-Analyzer [19] are three recent works that estimate power,
performance, or area for HLS designs, either individually or together, using analytical and
mathematical approaches without the need for any ML models. Aladdin [21] is a pre-RTL power
performance simulator designed to enable rapid design space exploration of accelerator-centric
systems. This framework takes high-level language descriptions of algorithms as inputs and uses
dynamic data dependence graphs (DDDG) to represent an accelerator without generating RTL. Starting
with an unconstrained program DDDG, which corresponds to an initial representation of accelerator
hardware, Aladdin applies optimizations and constraints to the graph to create a realistic model of
accelerator activity. Aladdin generates different optimized DDDGs and applies various mathematical
calculations to estimate the cycle-wise power, timing, and area requirement.
COMBA [20] is an analytical engine that is used to suggest pragmas during DSE. To generate the
pragma recommendations, COMBA uses a database called recursive data collector (RDC) and a
metric-guided design space exploration (MGDSE) algorithm. The input to the tool is high-level HLS
code written in C/C++. COMBA converts the HLS code into LLVM IR [73] code and corresponding control
and dataflow graphs (CDFG). By running analysis on the LLVM IR code and the CDFGs, they create a
mathematical model to estimate an FPGA design's latency and resource requirement. Based on the estimated resource and latency requirement, MGDSE suggests pragmas for the next iteration. One major drawback of this work is that it compares its estimates against the estimates produced by Xilinx Vivado HLS after the C-synthesis stage. Since the values reported by Vivado HLS are highly inaccurate [22], [25], [96], the estimates of COMBA inherit that inaccuracy. However, its latency estimation is very good compared to the post-HLS estimation by the Vivado HLS tool.

B. ML BASED METHODS
In [22] and [25], the authors predicted the post-route resource and timing requirements of HLS designs by using regression models based on post-synthesis log files. Generally, the QoR reported by an HLS tool after the C-synthesis stage varies significantly from the post-route results. In [22], the authors analyzed the log files generated after the C-synthesis stage to extract various features to minimize the error. Later, they created regression models, including Artificial Neural Networks, Linear Regression, and XGBoost, to predict the post-route area and performance of the designs. The work described in Pyramid [25] is very similar to [22]. Although the prediction results are good, a major drawback of these works is that C-synthesis must be run on the designs to extract features. Also, they did not report the error/accuracy on individual benchmarks, and the details of the features used are missing. In [96], the authors created regression models to predict the post-route resource, timing, and latency of HLS designs based on features from the high-level C++ code and LLVM IR code. In [97], the authors built their predictor using four different regression models. They extracted the features from the post-high-level-synthesis log files and the labels from a customized activity generator. They later used the model to create a design space explorer for latency vs. power consumption and achieved very low ADRS values. However, the work does not mention at which stage the labels are generated. A random forest-based ML model to predict net delay in FPGA placement was proposed in [51]. Most of the features are extracted from the post-synthesis netlist and include fanout, Half Perimeter Wirelength (HPWL), congestion, critical path length, etc. The authors then applied the Recursive Feature Elimination (RFE) method to reduce the number of features to 20 and used them to predict delay in RTL-based designs.

C. GRAPH LEARNING BASED METHODS
References [26] and [27] use graph learning methods to predict the post-route timing of HLS designs. In [27], the authors proposed a graph-based learning method to estimate the delay of post-mapped arithmetic operations in HLS designs. Since Graph Neural Networks (GNN) can predict properties of nodes and edges, the authors leveraged this capability to predict the delay of mapped arithmetic operations. They convert the HLS code into LLVM IR and the corresponding Control and Dataflow Graphs (CDFG) to generate the graphs. They created a set of microcircuits from the CHStone [84] benchmark suite to test their model and achieved a 72% reduction in root-mean-square error compared to the operation delay estimation in Vivado HLS.

IronMan [26] is a unified HLS PPA predictor and DSE tool that uses ML in both stages. In the first stage, a graph neural network estimates the LUT requirement and timing of the HLS designs. In the second stage, Reinforcement Learning is used to perform the DSE. The authors also created an intermediate stage called Code Transformer (CT), which converts the high-level HLS code into CDFGs. In this work, they proposed a transformation methodology that maps multiplication operations to LUTs or DSPs in an optimized manner depending on the latency and resource requirement. They tested on eight real and synthetic benchmarks from the CHStone and MachSuite benchmark suites and achieved less than 10% Mean Absolute Percentage Error (MAPE) for both resource and timing requirements. Even though they discussed a DSE, nothing is mentioned about the ADRS values compared to metaheuristics-based approaches like SA and GA. The strategy for DSE is also not properly discussed in the paper.

References [62] and [63] are two recent works in which GNNs have been applied to predict power in RTL and HLS designs. In [63], the authors addressed early-stage power estimation in HLS designs using a GNN. They considered the impact of interconnects in power modeling to make their predictor accurate. The HLS design is converted into graphs in which the nodes represent operations and the node features indicate micro-architectures. The edges represent interconnects, and the edge features represent netlist activities. They then created a customized GNN to predict dynamic and total power in an HLS design. Traditionally, in RTL-based designs, power is estimated using gate-level simulation of the synthesized netlist, switching activity estimation (SAE), and input-output register toggle rates. This method is very slow and can take months. In [62], the authors used gate-level netlists and the corresponding input port and register toggle rates over a power window from simulation as the training graph input, and the toggle rates from gate-level simulation as labels. They used the model to predict SAE, which can later be converted to power.

D. CROSS PLATFORM BASED METHODS
References [23] and [24] use a methodology in which training is done on one set of architectures/platforms but prediction is done on FPGAs only. Unlike RTL code, HLS code written in high-level languages like C/C++ can run on any architecture, such as CPUs, GPUs, and microcontrollers. The authors leveraged this in [23] and [24]. In [23], the authors run multiple designs on different Xilinx FPGA devices and predict the performance or speedup on Arm A9 devices. To create the model, they extracted features from the post-C-synthesis log files, which are very similar to "Fast and Accurate" [22]
and Pyramid [25]. This model helps designers of a heterogeneous system decide which portion of a design to run on the FPGA part and which portion to run on the Arm part. The major drawback of this work is that the authors did not consider the effects of pragmas on their HLS designs.

In [24], the power and performance of an HLS design running on an FPGA are predicted by an ML model based on features generated after running the design on CPUs. HLSPredict ensures time synchronicity between the host CPU and the target FPGA accelerator by using sub-traces for model training: sub-traces are epochs of workload execution time in the form of CPU counter measurements for the host and FPGA cycle counts for the target. The authors perform a detailed analysis of a design running on an x86 CPU by identifying the CPU microarchitectural subsystems and correlating them with the post-route performance and power when the same design runs on an FPGA.

E. SUMMARY AND FUTURE DIRECTIONS
In Table 5, we have summarized the works done to predict FPGA designs' PPA using analytical methods and ML-based models. Most of the works in Table 5 predict the post-route behavior of HLS designs. Models like XPPE, "Fast and Accurate", and Pyramid [22], [23], [25] rely on post-C-synthesis log files to extract features; hence they are slower than LLVM-based models like [20], [26], [27], and [96]. Most of the works discussed here use HLS as their design entry because of the fast synthesis and implementation time and the large availability of benchmarks. The RTL-based works shown in Table 5 [21], [51], [62] use features from synthesized gate-level netlists to predict a design's power and timing requirements. While most of the works discussed here predict either resource, timing, or latency, the work in [96] and [98] is a comprehensive one that can predict all four post-route QoR metrics of a design, viz., latency, resource requirement, timing, and power. There is no work in which more advanced ML models like BERT, Transformers, or autoencoders are used to enhance and predict the performance of EDA tools. Using these modern techniques, post-route PPA estimation could be made both faster and more accurate. In a very recent work [99] (August 2023), a Large Language Model (LLM) has been used to generate Verilog code from a design description in plain English. The generated code can be synthesized with both FPGA and ASIC design tools. Another work shows, for the first time, how to generate hardware security properties automatically using LLMs. The authors created their own BERT model, Hardware security-BERT, which can read SoC design documentation and generate pertinent hardware

TABLE 5. Summary of machine learning based power, performance and area prediction tools.
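The log-file-based estimators summarized in this section, such as [22] and [25], share a simple pattern: features taken from the C-synthesis report are mapped to post-route values by a regression model, and accuracy is reported with an error metric such as MAPE. The sketch below illustrates only that pattern. It assumes a single hypothetical feature (the HLS-reported LUT count), made-up training numbers, and a closed-form one-variable least-squares fit standing in for the ANN, linear regression, and XGBoost models used in the cited works.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))

# Hypothetical data: LUT counts from the HLS report vs. post-route LUT counts.
hls_luts = [1200, 2500, 4100, 8000, 15000]
post_route_luts = [900, 1800, 3100, 5900, 11200]

# Learn a calibration from HLS-reported values to post-route values.
slope, intercept = fit_line(hls_luts, post_route_luts)
calibrated = [slope * x + intercept for x in hls_luts]

# Error of the raw HLS report vs. error of the calibrated prediction.
raw_error = mape(post_route_luts, hls_luts)
calibrated_error = mape(post_route_luts, calibrated)
```

In a realistic setting the model would be evaluated on designs held out from training, and the feature vector would contain many report entries rather than one; the example only shows where the regression and the MAPE computation sit in the flow.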
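Several of the DSE works surveyed in this section are judged by their ADRS (Average Distance from Reference Set) values. As a hedged illustration, the sketch below assumes the definition commonly used in the HLS DSE literature: for every point of a reference Pareto front, take the closest point of the approximate front, where closeness is the worst relative excess in any objective (clipped at zero), and average these distances. The (area, latency) points are hypothetical.

```python
def adrs(reference_front, approx_front):
    """Average Distance from Reference Set for minimized objectives; lower is better."""
    def gap(ref, cand):
        # Worst relative excess of cand over ref across objectives, clipped at zero.
        return max(0.0, *((c - r) / r for r, c in zip(ref, cand)))

    total = sum(min(gap(ref, cand) for cand in approx_front)
                for ref in reference_front)
    return total / len(reference_front)

# Hypothetical (area, latency) Pareto points.
reference = [(100, 50), (150, 30), (220, 20)]   # e.g. from exhaustive search
found = [(100, 50), (165, 30), (220, 24)]       # e.g. from an ML-guided explorer

score = adrs(reference, found)
```

An ADRS of zero means the explorer recovered the reference front exactly; here the second and third reference points are missed by 10% and 20% respectively, which averages to 0.1 over the three points.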
only one state is used to guide the agent instead of using two state spaces at different temperatures of SA. Also, in [56] and [57], seven directed moves were used for the annealer. In [58], only one move is used: the Random Move of [56]. Here, the RL agent is guided by four resource types in place of moves. Reference [59] is quite similar to [56], [57], and [60]. Here also, the authors applied RL to optimize the moves of the SA-based placer in VTR [101], [125]. However, the details of the moves are not discussed in the work, nor are the results presented prominently.

In [60], Mallappa et al. applied RL in the detailed placement stage. They treated detailed placement as a two-stage hierarchical problem: (i) coarse-grained refinement and (ii) fine-grained refinement. They applied RL in Stage 1 to select the optimal sliding window size and the order in which the sliding windows must be rearranged. In Stage 2, Satisfiability Modulo Theories (SMT) is applied for fine-grained refinement. This work is purely RL-based; the authors did not combine it with any metaheuristics-based method like SA or Genetic Algorithm.

In [120], the authors addressed the placement problem as a CNN problem based on electrostatic density. They proposed a CNN-inspired analytical global placement algorithm for large-scale FPGAs. A novel density framework was constructed to remedy the high computation time by casting the 2D electrostatic-based density constraints into a CNN. This is the first and one of the only known works in which a CNN has been directly applied to the placement problem.

C. MACHINE LEARNING IN FPGA ROUTING
Although ML has been widely applied to predict congestion, routing violations, or DRC violations in routing, as discussed in Section IV, there are very few works in which ML is directly applied to guide the routing algorithm. In one such work [61], the authors applied RL to optimize the routing algorithm and compared it against the conventional negotiated congestion routing method used in the Pathfinder [111] routing algorithm. They used a cost function similar to that of [111] and optimized it using RL. However, the paper does not discuss the details of the RL algorithm or how the state table is created. This is the only work where the authors tried to apply ML in routing itself instead of merely predicting congestion or routing violations [37], [38], [41], [71].

D. SUMMARY AND FUTURE DIRECTION
In Table 6, we have summarized the application of ML in different stages of the FPGA physical design flow. Reinforcement Learning is very similar to human learning. Just as humans learn from their mistakes and retrain themselves, so does RL. Hence, RL is very suitable for physical synthesis optimization in EDA. Based on different reward functions, which are generally the cost functions of heuristic algorithms, the RL agent guides the optimizer. Hence, it is widely used in placement [56], [57], [59], [60], although its application in routing and synthesis is limited. Future works in this direction may include generative AI like GAN, which would generate the placed circuit from a description such as the timing and area requirements. Just as GAN is used to create virtual congestion in [40] and [41], GAN could also be used for routing on the placed circuit in the future. In Table 6, we classify the three stages of physical design and the ML algorithms applied in each stage.

TABLE 6. Summary of ML based physical design tools.

IX. CONCLUSION
ML is a growing field in the current technology domain, with many applications in computer vision, image processing, audio and video processing, NLP, etc. It is now being applied in the EDA domain. CAD-based IC design technology has existed since the 1980s, and a large amount of data is available at various stages of the physical design flow for various technology nodes. If we can utilize the data from past IC designs for ASICs and FPGAs, we can build effective ML-based tools, resulting in fast IC design tools. Another challenge of ML-based IC design tools is the availability of specialists. The designer must have exceptionally good knowledge of both the VLSI design flow and ML technologies. As discussed in this paper, almost all the available ML algorithms are now being applied in EDA tool design. However, most of the work discussed in this paper is still nascent in academic labs, and making it commercially available in vendor-supplied tools may take some time. Two recent tools, AMD Xilinx Vitis AI [126] and Synopsys DSO [127], use ML as part of their implementation algorithms.

REFERENCES
[1] G. E. Moore, "Cramming more components onto integrated circuits, Reprinted from Electronics, volume 38, number 8, April 19, 1965, pp. 114 ff.," IEEE Solid-State Circuits Soc. Newslett., vol. 11, no. 3, pp. 33–35, Sep. 2006.
[2] R. H. Dennard, F. H. Gaensslen, H. Yu, V. L. Rideout, E. Bassous, and A. R. LeBlanc, "Design of ion-implanted MOSFET's with very small physical dimensions," IEEE J. Solid-State Circuits, vol. SSC-9, no. 5, pp. 256–268, Oct. 1974.
[3] P. Kannan, S. Balachandran, and D. Bhatia, "On metrics for comparing interconnect estimation methods for FPGAs," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 4, pp. 381–385, Apr. 2004.
[4] S. Balachandran and D. Bhatia, "A priori wirelength and interconnect estimation based on circuit characteristics," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 7, pp. 1054–1065, Jul. 2005.
[5] P. Kannan, S. Balachandran, and D. Bhatia, "fGREP - Fast generic routing demand estimation for placed FPGA circuits," in Field-Programmable Logic and Applications, G. Brebner and R. Woods, Eds. Berlin, Germany: Springer, 2001, pp. 37–47.
[6] A. E. Caldwell, A. B. Kahng, S. Mantik, I. L. Markov, and A. Zelikovsky, "On wirelength estimations for row-based placement," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 18, no. 9, pp. 1265–1278, Sep. 1999.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. NIPS, vol. 1, 2012, pp. 1097–1105.
[8] S. Liu and W. Deng, "Very deep convolutional neural network based image classification using small training sample size," in Proc. 3rd IAPR Asian Conf. Pattern Recognit. (ACPR), Nov. 2015, pp. 730–734.
[9] T. P. Nagarhalli, V. Vaze, and N. K. Rana, "Impact of machine learning in natural language processing: A review," in Proc. 3rd Int. Conf. Intell. Commun. Technol. Virtual Mobile Netw. (ICICV), Feb. 2021, pp. 1529–1534.
[10] A. Le Glaz, Y. Haralambous, D.-H. Kim-Dufor, P. Lenca, R. Billot, T. C. Ryan, J. Marsh, J. DeVylder, M. Walter, S. Berrouiguet, and C. Lemey, "Machine learning and natural language processing in mental health: Systematic review," J. Med. Internet Res., vol. 23, no. 5, May 2021, Art. no. e15708.
[11] E. Mankolli and V. Guliashki, "Machine learning and natural language processing: Review of models and optimization problems," in ICT Innovations 2020. Machine Learning and Applications, V. Dimitrova and I. Dimitrovski, Eds. Cham, Switzerland: Springer, 2020, pp. 71–86.
[12] J. Long, X. Wang, W. Zhou, J. Zhang, D. Dai, and G. Zhu, "A comprehensive review of signal processing and machine learning technologies for UHF PD detection and diagnosis (I): Preprocessing and localization approaches," IEEE Access, vol. 9, pp. 69876–69904, 2021.
[13] Rahul, "Review of signal processing techniques and machine learning algorithms for power quality analysis," Adv. Theory Simulations, vol. 3, no. 10, Oct. 2020, Art. no. 2000118.
[14] X. Dong, D. Thanou, L. Toni, M. Bronstein, and P. Frossard, "Graph signal processing for machine learning: A review and new perspectives," IEEE Signal Process. Mag., vol. 37, no. 6, pp. 117–127, Nov. 2020.
[15] (2019). Scikit-Learn, ML in Python. [Online]. Available: https://scikit-learn.org/
[16] (2021). An End-to-End Open Source Machine Learning Platform. [Online]. Available: https://www.tensorflow.org/
[17] (2021). Pytorch. [Online]. Available: https://www.pytorch.org/
[18] X. Yang, R. Kastner, and M. Sarrafzadeh, "Congestion estimation during top-down placement," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 1, pp. 72–80, Jan. 2002.
[19] G. Zhong, A. Prakash, Y. Liang, T. Mitra, and S. Niar, "Lin-analyzer: A high-level performance analysis tool for FPGA-based accelerators," in Proc. 53rd ACM/EDAC/IEEE Design Autom. Conf. (DAC), Jun. 2016, pp. 1–6.
[20] J. Zhao, L. Feng, S. Sinha, W. Zhang, Y. Liang, and B. He, "COMBA: A comprehensive model-based analysis framework for high level synthesis of real applications," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2017, pp. 430–437.
[21] Y. Sophia Shao, B. Reagen, G.-Y. Wei, and D. Brooks, "Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures," in Proc. ACM/IEEE 41st Int. Symp. Comput. Archit. (ISCA), Jun. 2014, pp. 97–108.
[22] S. Dai, Y. Zhou, H. Zhang, E. Ustun, E. F. Y. Young, and Z. Zhang, "Fast and accurate estimation of quality of results in high-level synthesis with machine learning," in Proc. IEEE 26th Annu. Int. Symp. Field-Program. Custom Comput. Mach. (FCCM), Apr. 2018, pp. 129–132.
[23] H. M. Makrani, H. Sayadi, T. Mohsenin, S. Rafatirad, A. Sasan, and H. Homayoun, "XPPE: Cross-platform performance estimation of hardware accelerators using machine learning," in Proc. 24th Asia South Pacific Design Autom. Conf., Jan. 2019, pp. 727–732.
[24] K. O'Neal, M. Liu, H. Tang, A. Kalantar, K. DeRenard, and P. Brisk, "HLSPredict: Cross platform performance prediction for FPGA high-level synthesis," in Proc. IEEE/ACM Int. Conf. Computer-Aided Design (ICCAD), Nov. 2018, pp. 1–8.
[25] H. M. Makrani, F. Farahmand, H. Sayadi, S. Bondi, S. M. P. Dinakarrao, H. Homayoun, and S. Rafatirad, "Pyramid: Machine learning framework to estimate the optimal timing and resource usage of a high-level synthesis design," in Proc. 29th Int. Conf. Field Program. Log. Appl. (FPL), Sep. 2019, pp. 397–403.
[26] N. Wu, Y. Xie, and C. Hao, "IronMan: GNN-assisted design space exploration in high-level synthesis via reinforcement learning," in Proc. Great Lakes Symp. (VLSI), Jun. 2021, pp. 39–44.
[27] E. Ustun, C. Deng, D. Pal, Z. Li, and Z. Zhang, "Accurate operation delay prediction for FPGA HLS using graph neural networks," in Proc. IEEE/ACM Int. Conf. Comput. Aided Design (ICCAD), Nov. 2020, pp. 1–9.
[28] N. Kapre, H. Ng, K. Teo, and J. Naude, "InTime: A machine learning approach for efficient selection of FPGA CAD tool parameters," in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, Feb. 2015, pp. 23–26.
[29] Y. Ma, Z. Yu, and B. Yu, "CAD tool design space exploration via Bayesian optimization," 2019, arXiv:1912.06460.
[30] B. Reagen, J. M. Hernández-Lobato, R. Adolf, M. Gelbart, P. Whatmough, G.-Y. Wei, and D. Brooks, "A case for efficient accelerator design space exploration via Bayesian optimization," in Proc. IEEE/ACM Int. Symp. Low Power Electron. Design (ISLPED), Jul. 2017, pp. 1–6.
[31] Q. Yanghua, H. Ng, and N. Kapre, "Boosting convergence of timing closure using feature selection in a learning-driven approach," in Proc. 26th Int. Conf. Field Program. Log. Appl. (FPL), Aug. 2016, pp. 1–9.
[32] J. Kwon, M. M. Ziegler, and L. P. Carloni, "A learning-based recommender system for autotuning design flows of industrial high-performance processors," in Proc. 56th ACM/IEEE Design Autom. Conf. (DAC), Jun. 2019, pp. 1–6.
[33] A. Mahapatra and B. C. Schafer, "Machine-learning based simulated annealer method for high level synthesis design space exploration," in Proc. Electron. Syst. Level Synth. Conf. (ESLsyn), May 2014, pp. 1–6.
[34] Z. Wang and B. C. Schafer, "Machine learning to set meta-heuristic specific parameters for high-level synthesis design space exploration," in Proc. 57th ACM/IEEE Design Autom. Conf. (DAC), Jul. 2020, pp. 1–6.
[35] J. Kwon and L. P. Carloni, "Transfer learning for design-space exploration with high-level synthesis," in Proc. ACM/IEEE 2nd Workshop Mach. Learn. CAD (MLCAD), Nov. 2020, pp. 163–168, doi: 10.1145/3380446.3430636.
[36] H.-Y. Liu and L. P. Carloni, "On learning-based methods for design-space exploration with high-level synthesis," in Proc. 50th ACM/EDAC/IEEE Design Autom. Conf. (DAC), May 2013, pp. 1–7.
[37] D. Maarouf, A. Alhyari, Z. Abuowaimer, T. Martin, A. Gunter, G. Grewal, S. Areibi, and A. Vannelli, "Machine-learning based congestion estimation for modern FPGAs," in Proc. 28th Int. Conf. Field Program. Log. Appl. (FPL), Aug. 2018, pp. 427–4277.
[38] C.-W. Pui, G. Chen, Y. Ma, E. F. Y. Young, and B. Yu, "Clock-aware ultrascale FPGA placement with machine learning routability prediction: (Invited paper)," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2017, pp. 929–936.
[39] J. Zhao, T. Liang, S. Sinha, and W. Zhang, "Machine learning based routing congestion prediction in FPGA high-level synthesis," in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Mar. 2019, pp. 1130–1135.
[40] C. Yu and Z. Zhang, "Painting on placement: Forecasting routing congestion using conditional generative adversarial nets," in Proc. 56th Annu. Design Autom. Conf., Jun. 2019, pp. 26–31.
[41] M. B. Alawieh, W. Li, Y. Lin, L. Singhal, M. A. Iyer, and D. Z. Pan, "High-definition routing congestion prediction for large-scale FPGAs," in Proc. 25th Asia South Pacific Design Autom. Conf. (ASP-DAC), Jan. 2020, pp. 26–31.
[42] G. Huang, J. Hu, Y. He, J. Liu, M. Ma, Z. Shen, J. Wu, Y. Xu, H. Zhang, K. Zhong, X. Ning, Y. Ma, H. Yang, B. Yu, H. Yang, and Y. Wang, "Machine learning for electronic design automation: A survey," ACM Trans. Des. Autom. Electron. Syst., vol. 26, no. 5, pp. 1–46, 2021.
[43] D. S. Lopera, L. Servadei, G. N. Kiprit, S. Hazra, R. Wille, and W. Ecker, "A survey of graph neural networks for electronic design automation," in Proc. ACM/IEEE 3rd Workshop Mach. Learn. CAD (MLCAD), Aug. 2021, pp. 1–6.
[44] R. Mina, C. Jabbour, and G. E. Sakr, "A review of machine learning techniques in analog integrated circuit design automation," Electronics, vol. 11, no. 3, pp. 1–20, 2022.
[45] V. Hamolia and V. Melnyk, "A survey of machine learning methods and applications in electronic design automation," in Proc. 11th Int. Conf. Adv. Comput. Inf. Technol. (ACIT), Sep. 2021, pp. 757–760.
[46] A. B. Kahng, "Machine learning applications in physical design: Recent results and directions," in Proc. Int. Symp. Phys. Design, Mar. 2018, pp. 68–73.
[47] M. Pandey, "Machine learning and systems for building the next generation of EDA tools," in Proc. 23rd Asia South Pacific Design Autom. Conf. (ASP-DAC), Jan. 2018, pp. 411–415.
[48] H. Ren, B. Khailany, M. Fojtik, and Y. Zhang, "Machine learning and algorithms: Let us team up for EDA," IEEE Des. Test, vol. 40, no. 1, pp. 70–76, Feb. 2023.
[49] A. B. Kahng, "Machine learning for CAD/EDA: The road ahead," IEEE Des. Test, vol. 40, no. 1, pp. 8–16, Feb. 2023.
[50] H. Hu, J. Hu, F. Zhang, B. Tian, and I. Bustany, "Machine-learning based delay prediction for FPGA technology mapping," in Proc. 24th ACM/IEEE Workshop Syst. Level Interconnect Pathfinding, Nov. 2023, pp. 1–6.
[51] T. Martin, G. Grewal, and S. Areibi, "A machine learning approach to predict timing delays during FPGA placement," in Proc. IEEE Int. Parallel Distrib. Process. Symp. Workshops (IPDPSW), Jun. 2021, pp. 124–127.
[52] G. Singh, D. Diamantopoulos, J. Gómez-Luna, S. Stuijk, H. Corporaal, and O. Mutlu, "LEAPER: Fast and accurate FPGA-based system performance prediction via transfer learning," in Proc. IEEE 40th Int. Conf. Comput. Design (ICCD), Oct. 2022, pp. 499–508.
[53] L. Ferretti, J. Kwon, G. Ansaloni, G. Di Guglielmo, L. P. Carloni, and L. Pozzi, "Leveraging prior knowledge for effective design-space exploration in high-level synthesis," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 39, no. 11, pp. 3736–3747, Nov. 2020.
[54] A. Al-hyari, Z. Abuowaimer, D. Maarouf, S. Areibi, and G. Grewal, "An effective FPGA placement flow selection framework using machine learning," in Proc. 30th Int. Conf. Microelectron. (ICM), Dec. 2018, pp. 164–167.
[55] E. Ustun, S. Xiang, J. Gui, C. Yu, and Z. Zhang, "LAMDA: Learning-assisted multi-stage autotuning for FPGA design closure," in Proc. IEEE 27th Annu. Int. Symp. Field-Program. Custom Comput. Mach. (FCCM), Apr. 2019, pp. 74–77.
[56] M. A. Elgammal, K. E. Murray, and V. Betz, "RLPlace: Using reinforcement learning and smart perturbations to optimize FPGA placement," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 41, no. 8, pp. 2532–2545, Aug. 2022.
[57] M. A. Elgammal, K. E. Murray, and V. Betz, "Learn to place: FPGA placement using reinforcement learning and directed moves," in Proc. Int. Conf. Field-Program. Technol. (ICFPT), Dec. 2020, pp. 85–93.
[58] K. E. Murray and V. Betz, "Adaptive FPGA placement optimization via reinforcement learning," in Proc. ACM/IEEE 1st Workshop Mach. Learn. CAD (MLCAD), Sep. 2019, pp. 1–6.
[59] J. Zhang, F. Deng, and X. Yang, "FPGA placement optimization with deep reinforcement learning," in Proc. 2nd Int. Conf. Comput. Eng. Intell. Control (ICCEIC), Nov. 2021, pp. 73–76.
[60] U. Mallappa, S. Pratty, and D. Brown, "RLPlace: Deep RL guided heuristics for detailed placement optimization," in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Mar. 2022, pp. 120–123.
[61] U. Farooq, N. Ul Hasan, I. Baig, and M. Zghaibeh, "Efficient FPGA routing using reinforcement learning," in Proc. 12th Int. Conf. Inf. Commun. Syst. (ICICS), May 2021, pp. 106–111.
[62] Y. Zhang, H. Ren, and B. Khailany, "GRANNITE: Graph neural network inference for transferable power estimation," in Proc. 57th ACM/IEEE Design Autom. Conf. (DAC), Jul. 2020, pp. 1–6.
[63] Z. Lin, Z. Yuan, J. Zhao, W. Zhang, H. Wang, and Y. Tian, "PowerGear: Early-stage power estimation in FPGA HLS via heterogeneous edge-centric GNNs," in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Mar. 2022, pp. 1341–1346.
[64] N. Wu, H. Yang, Y. Xie, P. Li, and C. Hao, "High-level synthesis performance prediction using GNNs: Benchmarking, modeling, and advancing," in Proc. 59th ACM/IEEE Design Autom. Conf., Jul. 2022, pp. 49–54.
[65] J. H. Friedman, "Multivariate adaptive regression splines," Ann. Statist., vol. 19, no. 1, pp. 1–67, Mar. 1991.
[66] J. Elith, J. R. Leathwick, and T. Hastie, "A working guide to boosted regression trees," J. Animal Ecol., vol. 77, no. 4, pp. 802–813, Jul. 2008.
[67] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," 2016, arXiv:1606.09375.
[68] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," 2016, arXiv:1609.02907.
[69] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, "Deep reinforcement learning: A brief survey," IEEE Signal Process. Mag., vol. 34, no. 6, pp. 26–38, Nov. 2017.
[70] L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," J. Artif. Intell. Res., vol. 4, no. 1, pp. 237–285, Jan. 1996.
[71] P. Goswami and D. Bhatia, "Congestion prediction in FPGA using regression based learning methods," Electronics, vol. 10, no. 16, p. 1995, Aug. 2021.
[72] S. Yang, A. Gayasen, C. Mulpuri, S. Reddy, and R. Aggarwal, "Routability-driven FPGA placement contest," in Proc. Int. Symp. Phys. Design, Apr. 2016, pp. 139–143.
[73] C. Lattner and V. Adve, "LLVM: A compilation framework for lifelong program analysis & transformation," in Proc. Int. Symp. Code Gener. Optim. (CGO), Mar. 2004, pp. 75–86.
[74] W. Li, S. Dhar, and D. Z. Pan, "UTPlaceF: A routability-driven FPGA placer with physical and congestion aware packing," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2016, pp. 1–7.
[75] Z. Abuowaimer, D. Maarouf, T. Martin, J. Foxcroft, G. Gréwal, S. Areibi, and A. Vannelli, "GPlace3.0: Routability-driven analytic placer for UltraScale FPGA architectures," ACM Trans. Design Autom. Electron. Syst., vol. 23, no. 5, pp. 1–33, Oct. 2018.
[76] B. C. Schafer and Z. Wang, "High-level synthesis design space exploration: Past, present, and future," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 39, no. 10, pp. 2628–2639, Oct. 2020.
[77] B. C. Schafer, "Parallel high-level synthesis design space exploration for behavioral IPs of exact latencies," ACM Trans. Design Autom. Electron. Syst., vol. 22, no. 4, pp. 1–20, May 2017.
[78] B. C. Schafer, "Probabilistic multiknob high-level synthesis design space exploration acceleration," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 35, no. 3, pp. 394–406, Mar. 2016.
[79] P. Meng, A. Althoff, Q. Gautier, and R. Kastner, "Adaptive threshold non-Pareto elimination: Re-thinking machine learning for system level design space exploration on FPGAs," in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Mar. 2016, pp. 918–923.
[80] P. Goswami, B. C. Schaefer, and D. Bhatia, "Machine learning based fast and accurate high level synthesis design space exploration: From graph to synthesis," Integration, vol. 88, pp. 116–124, Jan. 2023.
[81] A. Sohrabizadeh, C. H. Yu, M. Gao, and J. Cong, "AutoDSE: Enabling software programmers design efficient FPGA accelerators," in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, Feb. 2021, p. 147, doi: 10.1145/3431920.3439464.
[82] S. Liu, F. C. Lau, and B. C. Schafer, "Accelerating FPGA prototyping through predictive model-based HLS design space exploration," in Proc. 56th ACM/IEEE Design Autom. Conf. (DAC), Jun. 2019, pp. 1–6.
[83] P. Goswami, M. Shahshahani, and D. Bhatia, "Robust estimation of FPGA resources and performance from CNN models," in Proc. 35th Int. Conf. VLSI Design 21st Int. Conf. Embedded Syst. (VLSID), Feb. 2022, pp. 144–149.
[84] Y. Hara, H. Tomiyama, S. Honda, H. Takada, and K. Ishii, "CHStone: A benchmark program suite for practical C-based high-level synthesis," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2008, pp. 1192–1195.
[85] B. Reagen, R. Adolf, Y. S. Shao, G.-Y. Wei, and D. Brooks, "MachSuite: Benchmarks for accelerator design and customized architectures," in Proc. IEEE Int. Symp. Workload Characterization (IISWC), Oct. 2014, pp. 110–119.
[86] L.-N. Pouchet. (2020). Polybench Benchmarks. [Online]. Available: https://web.cse.ohio-state.edu/~pouchet.2/software/polybench/
[87] B. C. Schafer and A. Mahapatra, "S2CBench: Synthesizable SystemC benchmark suite for high-level synthesis," IEEE Embedded Syst. Lett., vol. 6, no. 3, pp. 53–56, Sep. 2014.
[88] P. Goswami, M. Shahshahani, and D. Bhatia, "MLSBench: A synthesizable dataset of HLS designs to support ML based design flows," in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, Feb. 2020, pp. 1–6.
[89] P. Goswami, M. Shahshahani, and D. Bhatia, "MLSBench: A benchmark set for machine learning based FPGA HLS design flows," in Proc. IEEE 13th Latin Amer. Symp. Circuits Syst. (LASCAS), Mar. 2022, pp. 1–4.
[90] Y. Zhou, U. Gupta, S. Dai, R. Zhao, N. Srivastava, H. Jin, J. Featherston, Y.-H. Lai, G. Liu, G. A. Velasquez, W. Wang, and Z. Zhang, "Rosetta: A realistic high-level synthesis benchmark suite for software programmable FPGAs," in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, Feb. 2018, pp. 269–278.
[91] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255.
[92] N. V. Krizhevsky and G. Hinton, The CIFAR-10 Dataset. Toronto, ON, Canada: Univ. Toronto, 2009. [Online]. Available: https://www.cs.toronto.edu/~kriz/cifar.html
[93] (2019). LLVM Compiler. [Online]. Available: https://www.llvm.org
[94] (2020). Multi-Level Intermediate Representation Overview. [Online]. Available: https://mlir.llvm.org/
[95] N. Kapre, B. Chandrashekaran, H. Ng, and K. Teo, ‘‘Driving timing convergence of FPGA designs through machine learning and cloud computing,’’ in Proc. IEEE 23rd Annu. Int. Symp. Field-Program. Custom Comput. Mach., May 2015, pp. 119–126.
[96] P. Goswami and D. Bhatia, ‘‘Predicting post-route quality of results estimates for HLS designs using machine learning,’’ in Proc. 23rd Int. Symp. Quality Electron. Design (ISQED), Apr. 2022, pp. 45–50.
[97] Z. Lin, J. Zhao, S. Sinha, and W. Zhang, ‘‘HL-pow: A learning-based power modeling framework for high-level synthesis,’’ in Proc. 25th Asia South Pacific Design Autom. Conf. (ASP-DAC), Jan. 2020, pp. 574–580.
[98] P. Goswami, ‘‘Machine learning based prediction in FPGA CAD,’’ Ph.D. dissertation, Dept. Elect. Eng., Univ. Texas at Dallas, Richardson, TX, USA, May 2022.
[99] S. Thakur, B. Ahmad, H. Pearce, B. Tan, B. Dolan-Gavitt, R. Karri, and S. Garg, ‘‘VeriGen: A large language model for verilog code generation,’’ 2023, arXiv:2308.00708.
[100] X. Meng, A. Srivastava, A. Arunachalam, A. Ray, P. H. Silva, R. Psiakis, Y. Makris, and K. Basu, ‘‘Unlocking hardware security assurance: The potential of LLMs,’’ 2023, arXiv:2308.11042.
[101] V. Betz and J. Rose, ‘‘VPR: A new packing, placement and routing tool for FPGA research,’’ in Proc. Int. Conf. Field-Program. Log. Appl., 1997, pp. 213–222.
[102] J. M. Kleinhans, G. Sigl, F. M. Johannes, and K. J. Antreich, ‘‘GORDIAN: VLSI placement by quadratic programming and slicing optimization,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 10, no. 3, pp. 356–365, Mar. 1991.
[103] W. Wang, Q. Meng, and Z. Zhang, ‘‘A survey of FPGA placement algorithm research,’’ in Proc. 7th IEEE Int. Conf. Electron. Inf. Emergency Commun. (ICEIEC), Jul. 2017, pp. 498–502.
[104] S.-C. Chen and Y.-W. Chang, ‘‘FPGA placement and routing,’’ in Proc. IEEE/ACM Int. Conf. Computer-Aided Design (ICCAD), Nov. 2017, pp. 914–921.
[105] G. Sergey, Z. Daniil, and C. Rustam, ‘‘Simulated annealing based placement optimization for reconfigurable systems-on-chip,’’ in Proc. IEEE Conf. Russian Young Researchers Electr. Electron. Eng. (EIConRus), Jan. 2019, pp. 1597–1600.
[106] J. Yuan, J. Chen, L. Wang, X. Zhou, Y. Xia, and J. Hu, ‘‘ARBSA: Adaptive range-based simulated annealing for FPGA placement,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 38, no. 12, pp. 2330–2342, Dec. 2019.
[107] P. Goswami and D. Bhatia, ‘‘Floorplanning of partially reconfigurable design on heterogeneous FPGA (abstract only),’’ in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, Feb. 2016, p. 275.
[108] W. Li, Y. Lin, and D. Z. Pan, ‘‘ElfPlace: Electrostatics-based placement for large-scale heterogeneous FPGAs,’’ in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2019, pp. 1–8.
[109] C.-W. Pui, G. Chen, W.-K. Chow, K.-C. Lam, J. Kuang, P. Tu, H. Zhang, E. F. Y. Young, and B. Yu, ‘‘RippleFPGA: A routability-driven placement for large-scale heterogeneous FPGAs,’’ in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2016, pp. 1–8.
[110] T. Liang, G. Chen, J. Zhao, S. Sinha, and W. Zhang, ‘‘AMF-placer: High-performance analytical mixed-size placer for FPGA,’’ in Proc. IEEE/ACM Int. Conf. Comput. Aided Design (ICCAD), Nov. 2021, pp. 1–9.
[111] L. McMurchie and C. Ebeling, ‘‘PathFinder: A negotiation-based performance-driven router for FPGAs,’’ in Proc. 3rd Int. ACM Symp. Field-Program. Gate Arrays, Feb. 1995, pp. 111–117.
[112] J. Wang, J. Mai, Z. Di, and Y. Lin, ‘‘A robust FPGA router with concurrent intra-CLB rerouting,’’ in Proc. 28th Asia South Pacific Design Autom. Conf. (ASP-DAC), Jan. 2023, pp. 529–534.
[113] K. E. Murray, S. Zhong, and V. Betz, ‘‘AIR: A fast but lazy timing-driven FPGA router,’’ in Proc. 25th Asia South Pacific Design Autom. Conf. (ASP-DAC), Jan. 2020, pp. 338–344.
[114] M. Shen and G. Luo, ‘‘Corolla: GPU-accelerated FPGA routing based on subgraph dynamic expansion,’’ in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, Feb. 2017, pp. 105–114.
[115] Y. Lin, S. Dhar, W. Li, H. Ren, B. Khailany, and D. Z. Pan, ‘‘DREAMPlace: Deep learning toolkit-enabled GPU acceleration for modern VLSI placement,’’ in Proc. 56th ACM/IEEE Design Autom. Conf. (DAC), Jun. 2019, pp. 1–6.
[116] A. Ludwin, V. Betz, and K. Padalia, ‘‘High-quality, deterministic parallel placement for FPGAs on commodity hardware,’’ in Proc. 16th Int. ACM/SIGDA Symp. Field Program. Gate Arrays, Feb. 2008, pp. 14–23.
[117] C. Fobel, G. Grewal, and D. Stacey, ‘‘A scalable, serially-equivalent, high-quality parallel placement methodology suitable for modern multicore and GPU architectures,’’ in Proc. 24th Int. Conf. Field Program. Log. Appl. (FPL), Sep. 2014, pp. 1–8.
[118] M. An, J. G. Steffan, and V. Betz, ‘‘Speeding up FPGA placement: Parallel algorithms and methods,’’ in Proc. IEEE 22nd Annu. Int. Symp. Field-Program. Custom Comput. Mach., May 2014, pp. 178–185.
[119] R. Manimegalai, E. Siva Soumya, V. Muralidharan, B. Ravindran, V. Kamakoti, and D. Bhatia, ‘‘Placement and routing for 3D-FPGAs using reinforcement learning and support vector machines,’’ in Proc. 18th Int. Conf. VLSI Design Held Jointly With 4th Int. Conf. Embedded Syst. Design, Jan. 2005, pp. 451–456.
[120] H. Wang, X. Tong, C. Ma, R. Shi, J. Chen, K. Wang, J. Yu, and Y.-W. Chang, ‘‘CNN-inspired analytical global placement for large-scale heterogeneous FPGAs,’’ in Proc. 59th ACM/IEEE Design Autom. Conf., Jul. 2022, pp. 637–642.
[121] G. Zhou and J. H. Anderson, ‘‘Area-driven FPGA logic synthesis using reinforcement learning,’’ in Proc. 28th Asia South Pacific Design Autom. Conf. (ASP-DAC), Jan. 2023, pp. 159–165.
[122] R. Brayton and A. Mishchenko, ‘‘ABC: An academic industrial-strength verification tool,’’ in Computer Aided Verification. Springer, 2010, pp. 24–40.
[123] K. Zhu, M. Liu, H. Chen, Z. Zhao, and D. Z. Pan, ‘‘Exploring logic optimizations with reinforcement learning and graph convolutional network,’’ in Proc. ACM/IEEE 2nd Workshop Mach. Learn. CAD (MLCAD), Nov. 2020, pp. 145–150.
[124] A. Hosny, S. Hashemi, M. Shalan, and S. Reda, ‘‘DRiLLS: Deep reinforcement learning for logic synthesis,’’ in Proc. 25th Asia South Pacific Design Autom. Conf. (ASP-DAC), Jan. 2020, pp. 581–586.
[125] K. E. Murray, O. Petelin, S. Zhong, J. M. Wang, M. Eldafrawy, J.-P. Legault, E. Sha, A. G. Graham, J. Wu, M. J. P. Walker, H. Zeng, P. Patros, J. Luu, K. B. Kent, and V. Betz, ‘‘VTR 8: High-performance CAD and customizable FPGA architecture modelling,’’ ACM Trans. Reconfigurable Technol. Syst., vol. 13, no. 2, pp. 1–55, Jun. 2020.
[126] (2023). Vitis AI. [Online]. Available: https://www.xilinx.com/products/design-tools/vitis/vitis-ai.html
[127] (2023). Synopsys DSO. [Online]. Available: https://www.synopsys.com/ai/chip-design/dso-ai.html

PINGAKSHYA GOSWAMI (Student Member, IEEE) received the B.Tech. degree in electronics and communication engineering from Tezpur University, India, in 2012, and the M.S. and Ph.D. degrees in electrical engineering from The University of Texas at Dallas, in 2016 and 2022, respectively. He is currently a Software Engineer with Lattice Semiconductor. His research interests include EDA tool development for FPGAs and ASICs and hardware accelerator design on FPGAs.

DINESH BHATIA (Senior Member, IEEE) received the bachelor’s degree in electrical engineering from the Regional Engineering College, Suratkal, India, in 1985, and the master’s and Ph.D. degrees in computer science from The University of Texas at Dallas, TX, USA, in 1987 and 1990, respectively. He is currently a Faculty Member with the Erik Jonsson School of Engineering and Computer Science, The University of Texas at Dallas, where he directs research activities within the IDEA Laboratory. He has served on technical program committees of several international conferences related to field-programmable gate arrays (FPGAs), field-programmable technology, and system-level design using FPGAs. His research interests include system-level design, power and energy systems, and architecture and computer-aided design for FPGAs. He has served on the Editorial Board of the IEEE Transactions on Computers. He was a Distinguished Lecturer of the IEEE Circuits and Systems Society, from 2007 to 2008.