
IEEE/CAA JOURNAL OF AUTOMATICA SINICA 1

AI-Based Modeling and Data-Driven Evaluation for Smart Manufacturing Processes

Mohammadhossein Ghahramani, Member, IEEE, Yan Qiao, Member, IEEE, MengChu Zhou, Fellow, IEEE, Adrian O'Hagan, and James Sweeney

Abstract—Smart manufacturing refers to optimization techniques that are implemented in production operations by utilizing advanced analytics approaches. With the widespread increase in deploying industrial internet of things (IIoT) sensors in manufacturing processes, there is a progressive need for optimal and effective approaches to data management. Embracing machine learning and artificial intelligence to take advantage of manufacturing data can lead to efficient and intelligent automation. In this paper, we conduct a comprehensive analysis based on evolutionary computing and deep learning algorithms toward making semiconductor manufacturing smart. We propose a dynamic algorithm for gaining useful insights about semiconductor manufacturing processes and for addressing various challenges. We elaborate on the utilization of a genetic algorithm and neural network to propose an intelligent feature selection algorithm. Our objective is to provide an advanced solution for controlling manufacturing processes and to gain perspective on various dimensions that enable manufacturers to access effective predictive technologies.

Index Terms—Artificial intelligence (AI), cyber physical systems, feature selection, genetic algorithms (GA), industrial internet of things (IIoT), machine learning, neural network (NN), smart manufacturing.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. This work was supported in part by the Science and Technology Development Fund (FDCT) of Macau (011/2017/A) and the National Natural Science Foundation of China (61803397). Recommended by Associate Editor Xin Luo. (Corresponding author: MengChu Zhou.)

Citation: M. Ghahramani, Y. Qiao, M. C. Zhou, A. O'Hagan, and J. Sweeney, "AI-based modeling and data-driven evaluation for smart manufacturing processes," IEEE/CAA J. Autom. Sinica, pp. 1–12, 2020. DOI: 10.1109/JAS.2020.1003114.

M. Ghahramani is with University College Dublin (UCD), Belfield, Dublin 4 D04, Ireland (e-mail: [email protected]; [email protected]). Y. Qiao is with the Institute of Systems Engineering, Macau University of Science and Technology, Macau 999078, China (e-mail: [email protected]). M. C. Zhou is with the Helen and John C. Hartmann Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102, USA (e-mail: [email protected]). A. O'Hagan is with University College Dublin, Belfield, Dublin 4 D04, Ireland (e-mail: [email protected]). J. Sweeney is with the Royal College of Surgeons in Ireland (RCSI), Dublin 8 D08, Ireland (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JAS.2020.1003114

Authorized licensed use limited to: Carleton University. Downloaded on March 29,2020 at 07:35:57 UTC from IEEE Xplore. Restrictions apply.

I. Introduction

OVER recent decades, the manufacturing industry witnessed tremendous advances in the form of four major paradigm shifts. In the latest industrial revolution, Industry 4.0, manufacturing has embraced the industrial internet of things (IIoT) [1], [2] and machine learning (ML) to enable machinery to boost performance through self-optimization [3]–[7]. Employing computer control over manufacturing phases can make industry processes smart. Broadly speaking, smart manufacturing (SM) can be defined as a data-driven approach that leverages IoT devices and various monitoring sensors. Deploying modern technologies, e.g., IoT coupled with cloud computing, in manufacturing provides access to valuable data at different levels, i.e., manufacturing enterprise, manufacturing equipment, and manufacturing processes. With the prodigious amount of manufacturing data at hand, computational intelligence (CI) enables us to transform data into real-time manufacturing insights. Manufacturing, then, can be controlled by leading-edge CI and artificial intelligence (AI), and tasks are modelled based on experimental observations, to enhance productivity while reducing costs.

[Fig. 1. Different levels of automation and their corresponding data analytics (ERP: enterprise resource planning; MES: manufacturing execution systems; SCADA: supervisory control and data acquisition; HMI: human machine interface; PLC: programmable logic controller; CNC: computer numerical control; RTU: remote terminal unit). The figure pairs the automation levels — business planning (ERP), operation management (MES), monitoring (SCADA/HMI), sensing & manipulating (PLC, CNC, RTU), and production process (sensors & signals) — with data-analytics tasks such as implementation, optimization, anomaly detection, real-time preprocessing, and data acquisition.]

Cost-effective and sustainable manufacturing has become the focus of academia and industry. In doing so, it is of great importance to identify which factors play a pivotal role in process outcomes. An integrated model based on manufacturing processes and data analytics is demonstrated in Fig. 1. The model has been divided into different layers and can be considered as a computer-integrated manufacturing (CIM) model from which computational intelligence can take control of the entire production process. At the business planning level, all decisions regarding the end product are made. The operational decisions related to optimizing processes are managed at the operation management level. At the monitoring level, different sensor-based monitoring approaches, e.g., anomaly detection methods, are employed. Finally, data acquisition and real-time processing are performed at the production process level and sensing level, respectively.


The approach implemented in this work aims to mitigate cost and production risks and to promote sustainable development of semiconductor manufacturing. Moving towards an optimal system, i.e., one that is adaptive and intelligent, is not a trivial task; however, embedding intelligent algorithms in automation and semiconductor production could be beneficial for both reducing cost and enhancing product quality. The main focus of smart manufacturing studies is on product life-cycle management, manufacturing process management, industry-specific communication protocols, and manufacturing strategies. Recent advances in technology-based solutions, e.g., IoT, cloud/fog computing, and big data, can expedite and simplify the production process and make new development of manufacturing possible [8]–[12]. These advances should drive the evolution of manufacturing architectures into integrated networks of automation devices and enable the smart characteristics of being self-adaptive, self-sensing, and self-organizing. Providing such solutions includes addressing several challenges, e.g., data volume, data quality, and data merging.

Traditional fault detection and diagnosis systems interpret sensory signals as single values [13]. Then, these values are fed into a model to verify product status. The main drawback of this approach is that it fails to determine the most important features/operations involved in semiconductor production and may result in the loss of sensory data. Moreover, sensory data might contain noise, outliers, and missing values and can be characterized by heterogeneous structures. To address these concerns, our goal is to propose an intelligent and dynamic algorithm consisting of a feature extraction mode. Generally speaking, predicting the quality of products is an imbalanced classification problem, and semiconductor manufacturing is not an exception. To be specific, the dataset is imbalanced because the defective rate in manufacturing processes is quite low in practice. To address this potential issue, a proper imbalance-handling technique needs to be taken into account to improve model performance. Such implementation is discussed in the following sections. Moreover, we propose an integrated algorithm to solve a multiobjective problem based on an artificial neural network (ANN) and a genetic algorithm (GA), to establish a fault diagnosis solution by extracting the most relevant features and then using these features as an input for classifiers. It should be mentioned that multiobjective evolutionary algorithms (MOEA) are divided into different categories, i.e., decomposition-based and dominance-based methods. In this work, a decomposition method (weighted sum technique) based on a binary GA and an ANN is proposed. This approach is practicable for all kinds of manufacturing analyses in the context of feature extraction/selection, dynamic optimization, and fault detection. Specifically, we investigate the following:

1) How a hybrid model based on an evolutionary algorithm combined with an ANN can be proposed to model nonlinearity;

2) How to integrate the capabilities of ML combined with AI to implement a highly flexible and personalized smart manufacturing environment;

3) Whether combining ML with AI can outperform traditional methods.

Given the extracted features, various classification methods are tested and the one with the minimum classification error rate is selected. Also, a comparison between the proposed solution and traditional methods is presented. The integrated approach is shown to outperform the others in terms of the accuracy and performance of a manufacturing system. This model can also be useful for fault detection without requiring specialized knowledge. In this implementation, we have encountered several issues, e.g., handling imbalanced data and balancing exploration and exploitation in an optimization process. To address such concerns, various scenarios are discussed throughout the paper.

Research Methodology and Contributions

Modern embedded systems, an emerging area of ML, AI, and IoT, can be a promising solution for efficient, cost-effective manufacturing production. Semiconductor manufacturing is a highly interdisciplinary, complex, and costly process comprising various phases. Failures during the manufacturing phases result in faulty products. Hence, detecting the causes of failures is crucial for effective policy-making and is a challenging task at the business planning stage demonstrated in Fig. 1. This can be achieved by fully exploring production phases and extracting the relevant manufacturing features involved in production. Therefore, fault detection and feature extraction are of much importance. Accordingly, we deal with implementing a model for feature extraction and classification in semiconductor manufacturing. The solution involves developing a model for monitoring processes based on ML and AI algorithms to enhance the overall output of manufacturing processes by extracting the most relevant features. Interpreting these features (manufacturing processes), consequently, provides us with the ability to quickly identify the root cause of a defect. Such an efficient model contributes to cost reduction and productivity improvement.

While the greatest challenge in this work is the feature extraction/feature selection task, some other data-related issues, such as imbalanced data and outliers, should first be addressed. These data preparation steps aim to transform raw data into meaningful and useful forms that can be used to distinguish data patterns and consequently enable us to implement effective strategies. To solve the imbalanced classification issue, we have adopted a synthetic minority over-sampling algorithm to boost the small number of defective cases and assign a higher cost to the misclassification of defective products than to that of normal products. A confidence interval is defined, and outliers have been identified based on this measurement and eliminated. Then, the initial set of data is fed into a feature selection algorithm. Feature extraction aims to project high-dimensional data sets into lower-dimensional ones in which relevant features can be preserved. These features, then, are used to distinguish patterns.
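The preparation-then-selection workflow described above (outlier removal, minority over-sampling, then feature selection feeding a classifier) can be sketched as a pipeline skeleton. Only the outlier step is implemented in earnest here, using a Mahalanobis distance against a chi-square cutoff as in Section III; the other stages are simplified stand-ins (plain duplication instead of synthetic over-sampling, a variance filter instead of the GA + ANN search), and all names and the synthetic data are illustrative, not the paper's implementation.

```python
# Hypothetical skeleton of the data-preparation + feature-selection flow.
import numpy as np

rng = np.random.default_rng(0)

def remove_outliers(X, y, cutoff=20.483):     # 97.5% quantile of chi-square, 10 dof
    """Drop rows whose squared Mahalanobis distance from the mean exceeds cutoff."""
    mu = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)   # squared distance per row
    keep = d2 <= cutoff
    return X[keep], y[keep]

def oversample_minority(X, y):
    """Stand-in re-balancing: duplicate minority rows until classes match.
    (A SMOTE-style method would interpolate new samples instead.)"""
    minority = int(2 * y.sum() < len(y))       # 1 if ones are the rare class
    idx = np.flatnonzero(y == minority)
    extra = rng.choice(idx, size=len(y) - 2 * len(idx), replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])

def select_features(X, y):
    """Stand-in for the GA + ANN search: keep the higher-variance half."""
    order = np.argsort(X.var(axis=0))[::-1]
    return np.sort(order[: X.shape[1] // 2])

X = rng.normal(size=(200, 10))
y = (rng.random(200) < 0.1).astype(int)        # imbalanced binary labels
X, y = remove_outliers(X, y)
X, y = oversample_minority(X, y)
cols = select_features(X, y)
print(len(cols), "features kept;", int(y.sum()), "positives of", len(y))
```

After the three stages, the class counts are equal and half of the columns remain; a real pipeline would now fit the fault classifier on `X[:, cols]`.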


The proposed dynamic feature selection model is based on an integrated algorithm including a meta-heuristic method (GA) and an artificial neural network. We have implemented a binary GA to determine the optimal number of features and its relevant cost, which are used to create a predictive model. Our goal is a solution with low cost values in a search process. The cost function has been defined by using a multilayer perceptron that is considered as an embedded part of the feature selection algorithm. GA consists of different phases, i.e., parent selection, crossover, mutation, and creation of the final population (the selected features) [14]. Parent selection is a crucial part of GA and consists of a finite repetition of different operations, i.e., selection of parent strings, recombination, and mutation. The objective in a reproductive phase is to select cost-efficient chromosomes from the population, which create offspring for the next generation. To address exploration and exploitation and to avoid premature convergence, we have proposed a selection scheme combining different crossover operations. The mentioned issue is heavily related to the loss of diversity. The proposed solution also eliminates the cost scaling issue and adjusts the selection pressure throughout the selection phase. We adjust the balance between exploration and exploitation by recombining crossover operators with adjustment of their probabilities. A discussion on determining the exploration and exploitation rate is presented in the following sections. Consequently, offspring are created by adjusting such probabilities throughout the mating pool by establishing a hybrid roulette-tournament pick operator. Selected features are fed to a predictive model to determine fault status. It is worth mentioning that the algorithm considers two major conflicting objectives: minimizing the number of features and maximizing the classification performance. Consequently, the results of the proposed model are compared with traditional approaches. The experiments have verified the effectiveness and efficiency of our approach as compared to those in the literature. In summary, the overall objective is to propose an AI-based multi-objective feature selection method together with an efficient classification algorithm to scrutinise manufacturing processes.

The remainder of this paper is organized as follows: some related work about manufacturing processes, feature extraction, and the application of AI is described in Section II; a preprocessing procedure is discussed in Section III; the proposed approach with its associated discussions is given in Section IV; the experimental settings and the classification results are shown in Section V; and the future work and conclusions are presented in Section VI.

II. Related Work

Recently, the rapid evolution of high-throughput technologies has resulted in the exponential growth of manufacturing data [15]. Since traditional approaches toward data management are impractical due to high dimensionality, proposing an effective and efficient data management strategy has become crucial. To do so, ML can help develop strategies to automatically identify patterns from high-dimensional datasets. The key to leveraging manufacturing data lies in constant monitoring of processes, which can be associated with different issues, e.g., noisy signals. Dimensionality reduction and feature selection/extraction methods, e.g., principal component analysis (PCA), linear discriminant analysis (LDA), and canonical correlation analysis (CCA), play a critical role in dealing with noise and redundant features and must be considered as a preprocessing stage of manufacturing data analysis, which leads to better insights and robust decisions [16]. Some previous manufacturing fault detection studies have focused on utilizing the mentioned techniques for extracting the most relevant features and for classification. Feature selection methods can be divided into three main categories, i.e., filter, wrapper, and embedded methods. Filter methods act by ranking the features. In wrapper methods, features are selected based on the performance of predictors. Finally, embedded methods include variable selection as part of the training process without splitting the data into training and testing sets.

In [17], the authors have utilized PCA to extract features to decrease the computational cost and complexity. Given the extracted features, they have implemented a classification algorithm to infer whether a semiconductor device is a defective or normal sample. To that end, they have adopted a k-nearest neighbors (KNN) classification method. Cherry et al. have constructed another model based on a multiway PCA (MPCA) to monitor stream data [18]. A decision tree algorithm has been developed in [19] to explore various types of defective devices. A KNN method has been utilized in [20], and Euclidean distance has been considered to measure similarities among features. Verdier et al. have improved the performance of a KNN algorithm tailored for fault detection in semiconductor manufacturing by defining a similarity measurement based on Mahalanobis distance [21]. A support vector machine (SVM) is used to detect semiconductor failures in [22]. The authors have developed their approach based on an RBF kernel to address the high dimensionality issue. In [23], an incremental clustering method is adopted for fault detection. A Bayesian model has been proposed to infer a manufacturing process. The authors have considered the root causes of manufacturing problems. However, their approach heavily relies on an expert's knowledge regarding the related field. Zheng et al. have proposed a convolutional neural network [24]. They have decomposed multivariate time-series datasets into univariate ones. Then, features have been extracted and an MLP-based method has been implemented for data classification. Lee et al. have compared the performance of different fault detection models, including feature extraction algorithms and classification approaches [25]. They have revealed that developing an algorithm based on features that are not suitable for a specific model can deteriorate the performance of classifiers significantly. Therefore, it is desirable to consider both feature extraction and classification stages simultaneously to maximize a model's performance.
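The wrapper idea mentioned above — scoring candidate feature subsets by the performance of a predictor trained on them — can be made concrete with a toy sketch. It exhaustively searches two-feature subsets with a nearest-centroid classifier standing in for the predictor; this is purely illustrative (the data, names, and exhaustive search are assumptions), not the paper's GA-based wrapper.

```python
# Toy wrapper-style feature selection: score every k-feature subset by
# held-out accuracy of a simple nearest-centroid classifier.
import itertools
import numpy as np

def accuracy_on_subset(X, y, cols):
    """Train/test a nearest-centroid classifier using only columns `cols`."""
    Xs = X[:, cols]
    half = len(y) // 2
    Xtr, ytr, Xte, yte = Xs[:half], y[:half], Xs[half:], y[half:]
    c0 = Xtr[ytr == 0].mean(axis=0)
    c1 = Xtr[ytr == 1].mean(axis=0)
    pred = (np.linalg.norm(Xte - c1, axis=1)
            < np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return (pred == yte).mean()

def wrapper_select(X, y, k=2):
    """Exhaustively score all k-feature subsets; return the best-scoring one."""
    best = max(itertools.combinations(range(X.shape[1]), k),
               key=lambda cols: accuracy_on_subset(X, y, list(cols)))
    return list(best)

# Toy data: only features 0 and 2 carry the class signal.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 5))
X[:, 0] += 3 * y
X[:, 2] -= 3 * y
print(wrapper_select(X, y, k=2))   # recovers the informative pair
```

Exhaustive search is only feasible for tiny feature counts; with hundreds of features, as in this paper's setting, a stochastic search such as a GA replaces the `itertools.combinations` enumeration.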


Most studies in the literature have focused on using PCA and KNN algorithms for manufacturing data classification. However, PCA-based approaches project features to another space based on a linear combination of original features. Therefore, they cannot be interpreted in the original feature space [26]. Moreover, most of the PCA-related work has considered linear PCA, which is not efficient in exploring non-linear patterns. Although these techniques try to cover maximum variance among manufacturing variables, inappropriate selection of parameters, e.g., principal components, may result in great data loss. KNN is a memory-based classifier; hence, in cases of high-dimensional data sets, its performance degrades dramatically with data size. To overcome the mentioned concerns, an efficient global search method (e.g., evolutionary computation (EC) techniques) should be considered to better address feature selection problems [27]. These techniques are well known for their global search ability. Derrac et al. [28] have proposed a cooperative co-evolutionary algorithm for feature selection based on a GA. The proposed method addresses the feature selection task in a single process. However, it should be mentioned that EC algorithms are stochastic methods, which may produce different solutions when using different starting points. Therefore, the proposed model suffers from an instability issue. Zamalloa et al. [29] have utilized a GA-based method to rank features. Consequently, features have been selected given the rank orders. A potential drawback of this work is that the proposed method might lead to data loss. Moreover, this solution has not considered the correlation among features.

To address the mentioned concerns, we have proposed our solution based on a dynamic feature selection method consisting of different modes to provide information on the variables that are crucial for fault diagnosis. To that end, we have integrated an ANN into our model in order to examine nonlinear relationships among features. Advanced computing and AI can provide manufacturing with a higher degree of intelligence and low-cost sensing and improve efficiency [30]. The process of conducting intelligent manufacturing can be regarded in two ways: firstly, the manufacturing industry has become a great contributor to the service industry, and secondly, the lines between cyber and physical systems are becoming blurred. Hence, architectural approaches like service-oriented architectures (cloud manufacturing) can be taken into account in manufacturing modes and systems. In such distributed and heterogeneous systems, manufacturing resources can be aggregated based on an efficient service-oriented manufacturing model and processed/monitored in an effective way. Application of those solutions can pave the way for large-scale analysis and lead to high productivity.

Developing a successful model includes various steps, e.g., data cleansing and data transformation, to reveal insights. As the quality of data affects the analysis, it is essential to employ a data preprocessing procedure. Such discussion is presented next.

III. Data Preprocessing

The data set used in this work is obtained from a semiconductor factory: the semiconductor manufacturing (SECOM) dataset. It consists of various operation observations, i.e., wafer fabrication production data, including 590 features (operation measurements). The target feature is binomial (Failure and Success), referring to the production status, and encoded as 0 and 1. The first step in data analysis is data cleansing to address a variety of data quality issues, e.g., noise, outliers, inconsistency, and missing values. We have dealt with missing values and noise resulting from inexact data collection. These can negatively affect later processes. Outlier labelling methods and the T-squared statistic (T²) have been utilized. Any observation beyond the interval has been eliminated.

A. Outlier Detection

Suppose that F = \{f_1, f_2, \ldots, f_m\} denotes the feature set and L = \{Failure, Success\} the label set, where m is the number of features. Matrix X \in \mathbb{R}^{n \times m} can be defined as

X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix}    (1)

where \mathbb{R} is the real number set, X_i (the ith observation) is defined as an m-tuple (m is the number of features) containing all features, and n is the number of observations. The label feature Y is as follows:

Y = [y_1, y_2, \ldots, y_n]^T    (2)

where y_i is the corresponding label (Success or Failure) for the ith observation (X_i) and T is the transpose operator.

We utilize the Mahalanobis distance of each observation (X_i) from the mean, i.e.,

D = (X_i - \bar{X}) S^{-1} (X_i - \bar{X})^T    (3)

where S^{-1} is the inverse of the m \times m variance-covariance matrix (scatter matrix) and \bar{X} = (\sum_{i=1}^{n} X_i)/n. The Mahalanobis distance values are chi-square distributed. The variance-covariance matrix can be calculated as

S = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^T (X_i - \bar{X})    (4)

where the cutoff \alpha is determined by the chosen confidence level. Given \alpha, if (X_i - \bar{X}) S^{-1} (X_i - \bar{X})^T > \alpha^2, then X_i is treated as an outlier and eliminated. For this purpose, a quantile of the \chi^2 distribution (e.g., the 97.5% quantile) is considered.

B. Handling an Imbalanced Data Set

The observations that have been labeled as Failure are relatively rare (104 cases) as compared to the Success class. Hence, we face an imbalanced classification issue. In other words, the Success class (the majority) outnumbers the Failure class (the minority), and the two classes do not make up equal portions of our data set. Two distinctive approaches can be considered to deal with this issue: 1) skew-insensitive methods; 2) re-sampling methods. The first category addresses the problem by assigning a cost to the training data set, while the second one adjusts the original data set such that a more balanced class distribution is achieved. Re-sampling methods have become standard approaches and have been dominantly utilized recently [31]–[33]. They can be classified into different categories, e.g., sampling strategies, wrapper approaches, and ensemble-based methods.
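The re-sampling idea can be illustrated with a generic SMOTE-style sketch: each synthetic minority row is placed on the segment between a real minority sample and one of its k nearest minority neighbours. This is a simplified, generic interpolation, not the density-based SMOTE variant adopted in this work; the data and function names are illustrative.

```python
# Generic SMOTE-style over-sampling sketch: interpolate between a
# minority sample and one of its k nearest minority neighbours.
import numpy as np

def smote_like(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic rows by nearest-neighbour interpolation."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1 : k + 1]        # k nearest, skipping the point itself
        j = rng.choice(nbrs)
        lam = rng.random()                      # random position on the segment
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.asarray(out)

rng = np.random.default_rng(7)
X_minority = rng.normal(size=(20, 3))           # toy minority-class rows
synthetic = smote_like(X_minority, n_new=30, rng=rng)
print(synthetic.shape)                          # (30, 3)
```

Because every synthetic row is a convex combination of two real minority rows, the method densifies the minority region rather than merely duplicating points, which is the over-generalization trade-off discussed below for the plain SMOTE family.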


proper method is crucial, otherwise it can be problematic, e.g., and some genetic operators, e.g., crossover and mutation, to
data loss and overfitting, and can result in a poor outcome. generate a new generation by recombining a population’s
Our goal in this phase is to relatively balance class chromosomes. Then, fitter individuals are selected according
distribution. To do so, we have utilized a synthetic minority to a cost-function (objective-function) in a reproduction phase.
over-sampling technique. There are various over-sampling GA maintains its effectiveness from two sources: exploration
algorithms, such as SMOTE, Borderline-SMOTE, and Safe- and exploitation. The former can be considered as a process of
Level-SMOTE, just to name a few. The mentioned methods exploring a search space (by genetic search operators, e.g.,
create synthetic samples based on the nearest neighbour crossover operation), while the latter is the process of
approach and can be negatively impacted by the over- employing a mutation operator and modifying offsprings’
generalization issue. To overcome these problems in this chromosomes. A balance between the mentioned abilities
work, a density-based SMOTE [34], [35] technique is utilized (exploration and exploitation) should be maintained. To that
and by synthetically adding Failure class instances we make end, beneficial aspects of existing solutions (individuals with
the distribution more balanced. It is an over-sampling method lower costs) should be exploited. Moreover, exploring the
in which the Failure class is over-sampled by generating its feature space in order to find an optimal solution (optimal
synthetic instances. features) is crucial. While a crossover operation is the main
search operator, a mutation operator is employed to avoid
C. Feature Selection
premature convergence. The level of exploration/exploitation
As stated, the data set consists of nearly 600 features. Data can be controlled by selection processes, e.g., selection
sets with high dimensions can cause serious challenges such pressure parameter. Selecting an appropriate pressure
as overfitting in learning processes, known as the curse of measurement (β in this work) can maintain a balance between
dimensionality. To address these challenges the exploration and exploitation. Such discussion is provided in
dimensionality needs to be reduced and different approaches Section IV. Parameter β has been used in the parent selection
have been proposed in the literature. Generally speaking, stage and candidate individuals have been taken into account
dimensionality reduction can be considered as an approach to in the generation production. This operation, iteratively, has
eliminate redundant (or noisy) features. It can be divided into two categories: feature extraction and feature selection. The former refers to methods (e.g., PCA and LDA) that map the original features to a new feature space of lower dimensionality, while the latter aims to select a subset of features such that the trained model (based on the selected features) minimizes redundancy and maximizes relevance to the target feature. PCA (a classic approach to dimensionality reduction), multidimensional scaling, and independent component analysis (ICA) all suffer from a global linearity issue. To address this shortcoming, nonlinear techniques have been proposed: kernel PCA, Laplacian eigenmaps, and semidefinite embedding. Since reconstructing observations (after the projection phase) in these nonlinear methods is not a trivial task, finding the corresponding pattern is sometimes impractical. In a feature extraction approach, observations are projected into another space, so there is no physical meaning linking the newly generated features to the original ones. Hence, feature selection methods are superior in terms of readability and interpretability. Therefore, to avoid the complexity and uncertainty that feature extraction techniques bring, a feature selection approach has been adopted in this work. To this end, we have proposed an integrated approach consisting of a metaheuristic algorithm (GA) and an artificial neural network. GA is a heuristic search method inspired by Charles Darwin's theory of natural evolution. Since selecting features can be considered a binary problem, we have developed our model based on a binary GA that treats candidate features (chromosomes in GA terminology) as bit-strings.

GA relies on a population of individuals to explore a search space. Each individual is a set of chromosomes, encoded as strings of 0 (if the corresponding feature is not selected) and 1 (if the feature is selected). GA utilizes an initial population, and its genetic operations are repeated until the termination criteria (number of iterations or number of function evaluations (NFE)) are met. The best individual (the one with the minimum cost) is selected, and in this way the optimal features are identified. Fig. 2 displays our proposed feature selection model.

IV. Feature Selection Model

As mentioned earlier, our objective is to modify the output of each iteration (a subset of features) by searching the feature space and finding proper values for the input features such that the measured cost is minimized. Given Fig. 2, our proposed feature selection model consists of different phases. It starts with defining an initial population, i.e., individuals including m-dimensional chromosomes

ζ = (v_1, v_2, . . . , v_m)    (5)

where v_i is either 1 or 0 and corresponds to the status of the ith variable (feature), selected or not. While some individuals are admitted to the new generation unchanged, others may be subject to genetic operators (crossover and mutation). The cost related to each individual is evaluated by the ANN in the second phase, presented in Section IV-A. These costs are utilized (in the parent selection phase) to determine which offspring are used to create a new generation. The objective is to select two individuals (with lower costs) from the population such that newly created offspring inherit such patterns from their parents. Generally speaking, there are several approaches to selecting parents, such as random selection, rank selection, stochastic universal sampling (SUS), tournament selection, and Boltzmann selection. There is no selection pressure parameter in the random selection method, and hence it is usually avoided. Rank selection and SUS suffer from premature convergence, and applying such approaches may easily result in a local optimum. To avoid this situation and maintain good diversity, we have employed the Boltzmann selection method, which is inspired by simulated annealing.
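A minimal sketch of the Boltzmann selection just mentioned (the toy population, cost values, β, and seed below are illustrative placeholders, not the paper's actual settings):

```python
import math
import random

def boltzmann_probs(costs, beta=1.0):
    """p(i) is proportional to exp(-beta * J_i): lower-cost
    individuals receive a higher selection probability."""
    weights = [math.exp(-beta * j) for j in costs]
    total = sum(weights)
    return [w / total for w in weights]

def select_parents(population, costs, n_parents, beta=1.0, seed=0):
    """Roulette-wheel sampling: stochastic sampling with replacement,
    with slice sizes proportional to the Boltzmann probabilities."""
    rng = random.Random(seed)
    probs = boltzmann_probs(costs, beta)
    return rng.choices(population, weights=probs, k=n_parents)

# Toy example: individual 0 has the lowest cost, so it is the
# most likely parent under the Boltzmann scheme.
pop = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
probs = boltzmann_probs([0.1, 0.5, 0.9], beta=2.0)
assert probs[0] > probs[1] > probs[2]
```

Increasing β sharpens the selection pressure toward low-cost individuals, which is the trade-off between exploitation and diversity discussed in the text.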

Authorized licensed use limited to: Carleton University. Downloaded on March 29,2020 at 07:35:57 UTC from IEEE Xplore. Restrictions apply.
6 IEEE/CAA JOURNAL OF AUTOMATICA SINICA

[Figure: GA loop — initialize population, cost evaluation, termination criteria (NFE), parent selection, crossover, mutation, create new population; an N-feature bit string (selected features in each iteration) feeds an MLP with input, hidden, and output layers.]
Fig. 2. Feature selection model using artificial neural network and genetic algorithm.
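The bit-string encoding of (5) and the "initialize population / cost evaluation" phases shown in Fig. 2 can be sketched as follows (a hedged sketch: m, the population size, Ω, and the random stand-in for the MLP's error are illustrative assumptions, not the experimental configuration):

```python
import random

rng = random.Random(42)
m = 12          # number of candidate features (illustrative)
pop_size = 8    # population size (illustrative)

# Each individual is an m-dimensional bit string; 1 = feature selected.
population = [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop_size)]

def cost(chromosome, omega=0.01):
    # Stand-in for the ANN-based cost defined later, J = eps*(1 + Omega*|X'|);
    # here a random eps replaces the MLP's mean squared error.
    eps = rng.random()
    return eps * (1.0 + omega * sum(chromosome))

costs = [cost(ind) for ind in population]
best = population[costs.index(min(costs))]   # lowest-cost individual
```

In the actual model, `cost` would train/evaluate the MLP on the selected feature subset rather than draw a random error.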

The probability of an individual being selected is calculated according to the Boltzmann probability

p(i) = e^{−β J_i} / ∑_{k=1}^{η_p} e^{−β J_k}    (6)

where η_p is the size of the initial population, J is the defined cost function, and β is the selection pressure. Parents are thus selected with probabilities derived from the costs measured in the prior phase, which means that individuals with a lower cost are more likely to be chosen than ones with a greater cost. It should be mentioned that we have selected the β parameter such that ∑_{i∈H} p(i) = 0.7, where H is the set of the best half of the individuals (the population is sorted according to its cost values and η_p/2 of them are selected). Consequently, the Roulette wheel method is utilized for sampling (selecting parents using stochastic sampling with replacement based on the Boltzmann probability function). A circular wheel is considered and divided into η_p slices, each of which is proportional to the corresponding probability. The wheel is spun, and the individual related to the slice on which it stops is then selected. We have repeated this procedure until our predefined number of parents is selected. In this way, individuals with the largest cost values have the smallest chance of being selected. Once parents have been selected according to the weighted slots, crossover operations are applied to them. On this basis, the chromosomes of the selected parents are combined to create new offspring. As demonstrated in Fig. 3, a random portion of the first individual is swapped with a random portion of the second one. In this process, the chromosome combination can be carried out in different ways, e.g., single-point, double-point, or uniform crossover. In single-point crossover, one random position in the array of bits is selected and the exchange takes place there, while in the double-point method, two positions are chosen and the chromosomes between them are swapped. In uniform crossover, parents' chromosomes are selected for random exchange: parents contribute to creating new offspring based on a bit string known as the crossover mask. Let ξ be the predefined crossover mask, e.g., ξ = {1, 1, 0, 0, 0, 1, . . . , 0, 0, 1}. As discussed earlier, after the initial population is created, the parent selection operation should be conducted in the reproduction phase. Our goal is to select the individuals with minimum costs in the population; consequently, parents are selected to create offspring for the next generation. The cost function, and the way we have integrated the ANN to calculate this measurement, is described next.

Fig. 3. Reproduction phase.

A. Cost Function and MLP

Our objective in the feature selection phase is to explore a hypothesis space, find the optimal number of features, and consequently reduce the dimensionality. In other words, we are looking for a subset of the original dataset, J : X′ ⊆ X → R, such that two criteria are met. The cost function takes a different subset of features and the target values as input, and the corresponding costs are calculated. Given the conventions adopted earlier, let X be the original feature set with cardinality |X| = m. Now, let J(X′) be an evaluation measure to be optimized given the criteria below:

1) Set |X′| = k < m. Find X′ ⊂ X such that J(X′) is maximized; this is equal to minimizing the mean squared error

ϵ = (1/n) ∑_{i=1}^{n} (Y_i − J_i)².    (7)

2) Find the optimal features (in terms of both the number of features and discrimination) while minimizing |X′|.

It should be mentioned that we are facing a multi-objective optimization problem [36]. We define our objective function

GHAHRAMANI et al.: AI-BASED MODELING AND DATA-DRIVEN EVALUATION FOR SMART MANUFACTURING PROCESSES 7

with weights as

J = ϵ × (1 + (Ω × |X′|))    (8)

where |X′| is the number of selected features in each iteration, and Ω can be considered a cost parameter for choosing new features. If Ω = 0, all features are selected, while a large value of Ω results in no feature being selected. This parameter is a trade-off between relevancy and redundancy and must be designated carefully. As stated, our objective is to minimize the objective function J. In doing so, we have integrated an artificial neural network (ANN) and GA. GA takes the defined cost function (i.e., the feature-selection cost J) as input and employs the ANN to calculate cost values. Iteratively, different individuals (bit strings consisting of 0s and 1s, where 1 means a feature is selected and 0 means it is not) are generated and evaluated by GA's operations. A multilayer perceptron (MLP) is utilized to calculate ϵ in each iteration. The MLP is trained with the Levenberg-Marquardt algorithm (since it converges faster and more accurately on our problem) and consists of two layers of adaptive weights (15 neurons in the hidden layer) with full connectivity among neurons in the input and hidden layers. All costs are calculated, and the best features are selected such that the corresponding cost is minimized. To summarize the procedures, the pseudo code of the feature selection model is presented in Algorithm 1.

V. Analysis Procedures

As mentioned earlier, in this work we deal with a classification problem with a relatively large number of variables. It has been widely discussed [37] that irrelevant variables may deteriorate the performance of algorithms. The application of feature extraction/selection methods makes it possible to choose a subset of features and thus helps achieve reliable performance. Most studies in the literature have considered feature selection as a single-objective problem, while our solution is based on a multi-objective approach. In this section, different approaches, i.e., conventional feature extraction methods and the model proposed in this work, are compared. Our objective is to demonstrate that an intelligent algorithm can outperform the results of other competing classification methods.

A. Feature Extraction Methods

Different scenarios in the context of feature extraction are available to remove irrelevant features. All solutions have been considered as a pre-processing task in order to increase the learning accuracy. These conventional methods can be categorized into filter, wrapper, embedded, and hybrid techniques. Filter methods are divided into univariate and multivariate layers; the relevance of features is evaluated based on ranking techniques. Wrapper methods, e.g., sequential selection and heuristic search algorithms, are basically search algorithms in which relevant features are selected by training and testing a classification model. Embedded methods are performed based on dependencies among features. Finally, hybrid methods are based on a combination of other approaches and consist of different phases. These methods have some serious drawbacks which can make their results unrealistic. Filter methods do not consider the features' dependencies or the relationship between independent and dependent features. There is a high risk of overfitting in the wrapper approach. Embedded methods are more of a local discrimination approach than a global one, and hybrid methods are computationally expensive. Next, we compare the results of the models implemented in this work.

B. Results

The experiment has been conducted on a computer with a quad-core Intel i9-7900X 8 GHz CPU and 32 GB memory. It was equipped with an NVIDIA GeForce GTX 1080 GPU with 8 GB memory. The parallel algorithm has been implemented with CUDA programming.

The proposed algorithm for feature selection is based on an adaptive and dynamic GA combined with a neural network. Our meta-heuristic method evaluates various subsets of features to optimize our defined cost function, whose calculation has been given to a multilayer perceptron. We consider the volume of our data and the number of features and samples when defining the initial population rate. We choose the number of neurons based on a trial-and-error method. It should be mentioned that we have used the neural network as a cost function, and in this context the main objective is to decrease the cost function's values. The algorithm gets the initial solutions (manufacturing operations) and obtains the optimal features after a series of iterative computations (given the termination criteria, e.g., the number of function evaluations). Fig. 4 displays the cost values in each iteration.

Fig. 4. Cost function value versus number of function evaluations (NFE).

Finally, we have examined various classification techniques, and the most appropriate one is selected. To do so, different classification models, e.g., Gaussian support vector machine, random forest, linear discriminant, k-NN, and SVM with RBF kernel, have been tested. The classifiers' performance is evaluated according to their classification accuracies. The ability of each method to accurately predict the correct class is measured and expressed as a percentage. ROC curves are used to determine the predictive performance of the examined classification algorithms. The area under a ROC curve can be considered as an evaluation criterion to select the best classification algorithm.
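As an illustration of AUC-based model comparison, the area under the ROC curve can be computed directly from classifier scores via the Mann-Whitney statistic (the labels and score vectors below are hypothetical, not the paper's results):

```python
def auc_score(y_true, scores):
    """AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen positive sample outranks a randomly chosen negative one."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 for p in pos for n in neg if p > n)
    ties = sum(0.5 for p in pos for n in neg if p == n)
    return (wins + ties) / (len(pos) * len(neg))

# Hypothetical scores from two classifiers on the same test labels.
y = [0, 0, 0, 1, 1, 1]
scores_a = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]   # separates the classes
scores_b = [0.60, 0.40, 0.70, 0.50, 0.30, 0.80]   # weaker ranking

auc_a, auc_b = auc_score(y, scores_a), auc_score(y, scores_b)
best = "A" if auc_a >= auc_b else "B"   # the AUC closest to 1 wins
```

Here classifier A ranks every positive above every negative (AUC = 1), so it would be selected, mirroring how Gaussian SVM is chosen in Table I.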


When the area under the curve approaches 1, it indicates that the classification has been carried out correctly. Fig. 5 shows the AUC-ROC curves resulting from implementing the different classification methods.

Fig. 5. ROC curves for different classification methods: (a) linear discriminant; (b) random forest; (c) logistic regression; (d) Gaussian SVM; (e) k-NN; and (f) SVM with RBF kernel.

Some statistical results (e.g., the percentage of correct predictions) are also provided in Table I. Given the demonstrated results, Gaussian SVM has been selected as the classification model. Some other feature selection methods are also utilized to compare their results with our proposed approach; this comparison is presented next.

TABLE I
Comparisons of Different Classification Machine Learning Models

Classification method      Positive predictive value (Success class) (%)    False discovery rate (Failure class) (%)
Linear discriminant        64                                               36
Random forest              83                                               4
Logistic regression        63                                               35
Gaussian SVM               91                                               6
Adaptive k-NN              84                                               16
SVM with RBF kernel        71                                               25

C. Conventional Methods

As discussed in the previous sections, most studies regarding manufacturing data analysis have considered PCA-based approaches, which aim to detect the directions of most variation. Together with PCA, we have tested the most popular algorithms, e.g., family-wise error rate (FWE), false discovery rate (FDR), sequential forward selection (SFS), sequential backward selection (SBS), filtration feature selection (FFS), correlation-based feature selection (CFS), Lasso regression, and ensemble methods, for feature extraction [38]. We have implemented these traditional methods to reduce the dimensionality of our data set and compared the results. To do so, the extracted features are used as the input for the chosen classifier. Fig. 6 displays the analysis based on the Lasso regression method.

The experimental results (Table II) show that our proposed model is superior to the conventional ones. The corresponding accuracy rate of the proposed model is over 90%. An ROC comparison between our method and two of the traditional techniques is demonstrated in Fig. 7.

VI. Conclusions and Future Work

The goal of manufacturing enterprises is to develop cost-effective and competitive products. Manufacturing intelligence can significantly improve effectiveness by bridging business and manufacturing models with the help of low-cost sensor data. It aims to achieve a high level of intelligence with the latest appropriate technology-based computing, advanced analytics, and new levels of Internet connectivity. The landscape of Industry 4.0 includes achieving visibility on real-time processes, mutual recognition, and establishing an effective relationship among the workforce, equipment, and products. Most work in the area of manufacturing data analysis is based on PCA-based approaches.


Fig. 6. Selecting features based on a conventional method, i.e., Lasso regression. The panels show the Lasso coefficient estimates and the curve of the measurements for the degrees of freedom of the Lasso.
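The behaviour shown in Fig. 6 — coefficients shrinking to exactly zero as the regularisation strength grows — can be sketched in closed form for an orthonormal design, where the Lasso estimate is a soft-thresholded least-squares coefficient (the coefficient values below are hypothetical):

```python
def soft_threshold(b, lam):
    """Closed-form Lasso for an orthonormal design:
    beta_j = sign(b_j) * max(|b_j| - lam, 0)."""
    return [(1.0 if x > 0 else -1.0) * max(abs(x) - lam, 0.0) for x in b]

ols = [2.5, -1.2, 0.4, 0.05, -0.8]   # hypothetical least-squares coefficients

# Sweep the regularisation strength, as in the Fig. 6 coefficient paths.
path = {lam: soft_threshold(ols, lam) for lam in (0.0, 0.5, 1.5)}
n_selected = {lam: sum(1 for c in beta if c != 0.0)
              for lam, beta in path.items()}
# n_selected: 5 features at lam=0.0, 3 at lam=0.5, 1 at lam=1.5
```

A larger penalty keeps fewer nonzero coefficients, which is exactly how the Lasso baseline in Table II arrives at its subset of 54 features.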

They are not able to recognize nonlinear relationships among features and extract complex patterns. To address this concern, we have proposed a dynamic feature selection method based on GA and ANN. We have compared the results achieved in this work with traditional approaches to prove the effectiveness of our proposed solution. As a part of our future work, we plan to consider other MOEAs, e.g., dominance-based algorithms, for solving our optimization problem in a way that both feature selection objective functions are optimized simultaneously. Moreover, we will also compare the current model with other evolutionary algorithms proposed for feature selection.

TABLE II
Algorithms Comparisons

Feature extraction method                    Selected features    Accuracy (train data) (%)    Accuracy (test data) (%)
Lasso regression                             54                   78.8                         74.2
PCA                                          48                   87.8                         79.2
Univariate method (FWE)                      15                   74.3                         47.8
Controlling false discovery rate (FDR)       12                   60.7                         41.5
Select percentile                            71                   84.0                         81.5
Sequential forward selection                 88                   81.2                         72.9
Sequential backward selection                40                   78.0                         72.4
Filtration feature selection                 111                  73.9                         43.6
Correlation-based feature selection (CFS)    92                   65.2                         61.0
Ensemble                                     76                   73.3                         70.9
Proposed method                              36                   93.7                         90.2

Appendix. Pseudo-Code for the Feature Selection Model

GA has different parameters, and the performance of a GA-based model depends on them. We have discussed how they have been selected throughout this work. Table III reveals the impact of different parameter settings.

Algorithm 1: Pseudo-code for the feature selection model
Input: GA(|F|, CostFn)
Output: Individual ← [Selected features (a binary vector)]
1  IMax ← Maximum number of iterations
2  θ ← Crossover rate, µ ← Mutation rate
3  η_p ← Size of population
4  # Initialise population
5  for i ← 1, ..., η_p do
6      Pop(i).position ← randomly generated chromosome
7      Pop(i).cost ← CostFn(Pop(i).position)
8  end
9  Pop ← Sort population(Pop);
10 # Main loop
11 η_c ← Size of crossover (based on θ)
12 η_m ← Size of mutation (based on µ)
13 for i ← 1, ..., IMax do
14     # Crossover operation
15     Calculate probabilities: Pr(s ∈ Pop) = exp(−β · J_s/LargestCost) / ∑_{k=1}^{η_p} exp(−β · J_k/LargestCost)
16     for j ← 1, ..., η_c do
17         Select two parents (P1 and P2) with the Roulette wheel method given the probabilities measured above;
18         [Offspring(j,1).position, Offspring(j,2).position] ← CrossoverFn(P1.position, P2.position)
19         [Offspring(j,1).cost, Offspring(j,2).cost] ← [CostFn(Offspring(j,1).position), CostFn(Offspring(j,2).position)]
20     end
21
22     # Mutation operation
23     for j ← 1, ..., η_m do
24         Select one parent (P) with the Roulette wheel method
25         Mutant(j).position ← MutationFn(P.position)
26         Mutant(j).cost ← CostFn(Mutant(j).position)
27     end
28     Pop ← [Pop, Offsprings, Mutants];
29     Pop ← Sort population and select the first η_p individuals;
30     BestSolution ← first chromosome's position, Pop(1).position;
31     BestCost(i) ← first chromosome's cost, Pop(1).cost;
32 end
33 return Individual (selected features);
34 ___________________________________________________
35 Function [O1, O2] = CrossoverFn(P1, P2)
36     CrossoverMethod ← {Single-point, Multi-point, Uniform crossover}

Fig. 7. Comparing ROC results from the (a) proposed method vs (b) PCA vs (c) Lasso regression.

TABLE III
Comparing the Results of our Hybrid Model Given Different Parameter Settings

Crossover rate    Mutation rate    Population size    Neurons    Corresponding cost for selected solution in different iterations
0.6               0.2              50                 10         2.0171 2.0171 2.0171 2.0021 2.0021 ... 0.6785 0.6785 0.6715 0.6715 0.6642
0.6               0.2              100                15         1.8264 1.8124 1.8124 1.8124 1.7023 ... 0.5211 0.5117 0.5011 0.5011 0.5011
0.6               0.2              150                20         1.7111 1.7114 1.6617 1.6617 1.5668 ... 0.5862 0.5808 0.5808 0.5631 0.5631
0.6               0.3              200                10         1.8311 1.8311 1.8311 1.8266 1.8266 ... 0.5507 0.5507 0.5446 0.5446 0.5446
0.6               0.3              250                15         1.5182 1.3109 1.3109 1.1102 1.1102 ... 0.4902 0.4902 0.4902 0.4852 0.4852
0.6               0.3              300                20         1.5330 1.5330 1.5330 1.4016 1.4001 ... 0.5062 0.4767 0.4767 0.4767 0.4561
0.7               0.2              300                10         1.0539 1.0539 0.9547 0.9547 0.9531 ... 0.3214 0.3112 0.3112 0.3072 0.3072
0.7               0.2              350                15         0.8413 0.8411 0.8231 0.8231 0.7877 ... 0.2813 0.2813 0.2813 0.2743 0.2708
0.7               0.2              400                20         0.8613 0.8532 0.8152 0.7462 0.7462 ... 0.2491 0.2491 0.2491 0.2491 0.2491
0.7               0.3              450                10         0.6013 0.5182 0.5182 0.5468 0.5012 ... 0.1926 0.1926 0.1926 0.1859 0.1859
0.7               0.3              500                15         0.4165 0.4002 0.4002 0.3922 0.3774 ... 0.1088 0.1088 0.0991 0.0991 0.0982
0.7               0.3              550                20         0.3911 0.3891 0.3891 0.3496 0.3347 ... 0.1068 0.0932 0.0932 0.0932 0.0932
0.8               0.2              550                10         0.2662 0.2662 0.2615 0.2612 0.2508 ... 0.0997 0.0997 0.0997 0.0974 0.0974
0.8               0.2              600                15         0.2227 0.2215 0.2016 0.2016 0.1835 ... 0.0762 0.0762 0.0742 0.0713 0.0713
0.8               0.2              650                20         0.1844 0.1808 0.1808 0.1808 0.1808 ... 0.0788 0.0788 0.0781 0.0781 0.0781
0.8               0.3              650                10         0.1542 0.1523 0.1523 0.1478 0.1478 ... 0.0661 0.0661 0.0652 0.0652 0.0652
0.8               0.3              700                15         0.1227 0.1214 0.1214 0.1205 0.1205 ... 0.0484 0.0484 0.0481 0.0472 0.0415
0.8               0.3              750                20         0.1304 0.1302 0.1246 0.1246 0.1228 ... 0.0493 0.0493 0.0478 0.0478 0.0478

37     Randomly select one method given the probabilities defined for each of them;
38     return two offspring;
39 end
40 Function M = MutationFn(P)
41     Apply the mutation operator;
42     return mutant;
43 end
44 Function J = CostFn(dataset)
45     Employ the ANN;
46     return ϵ × (1 + (Ω × |X′|));
47 end

References

[1] S. Jeschke, C. Brecher, T. Meisen, D. Ozdemir, and T. Eschert, “Industrial internet of things and cyber manufacturing systems,” Industrial Internet of Things, Springer Series in Wireless Technology, Springer, Cham, pp. 3–19, 2017.
[2] T. Lojka, M. Miskuf, and I. Zolotova, “Industrial IoT gateway with machine learning for smart manufacturing,” IFIP Advances in Information and Communication Technology, pp. 759–766, 2017.
[3] J. Tavcar and I. Horvath, “A review of the principles of designing smart cyber-physical systems for run-time adaptation: learned lessons and open issues,” IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 49, pp. 145–158, 2019.
[4] B. Chen, D. W. C. Ho, W. Zhang, and L. Yu, “Distributed dimensionality reduction fusion estimation for cyber-physical systems under DoS attacks,” IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 49, pp. 455–468, 2019.
[5] P. Palensky, E. Widl, and A. Elsheikh, “Simulating cyber-physical energy systems: challenges, tools and methods,” IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 44, pp. 318–326, 2014.
[6] Y. Liu, Y. Peng, B. Wang, S. Yao, and Z. Liu, “Review on cyber-physical systems,” IEEE/CAA J. Autom. Sinica, vol. 4, pp. 27–40, 2017.
[7] G. Fortino, F. Messina, D. Rosaci, G. M. Sarne, and C. Savaglio, “A trust-based team formation framework for mobile intelligence in smart

factories,” IEEE Trans. Industrial Informatics, 2020.
[8] M. Ghahramani, M. C. Zhou, and C. T. Hon, “Analysis of mobile phone data under a cloud computing framework,” in Proc. 14th IEEE Int. Conf. Networking, Sensing and Control, Calabria, Italy, 2017.
[9] M. Ghahramani, M. C. Zhou, and C. T. Hon, “Extracting significant mobile phone interaction patterns based on community structures,” IEEE Trans. Intelligent Transportation Systems, vol. 20, pp. 1031–1041, 2019.
[10] M. Ghahramani, M. C. Zhou, and C. T. Hon, “Mobile phone data analysis: a spatial exploration toward hotspot detection,” IEEE Trans. Automation Science and Engineering, vol. 16, pp. 351–362, 2019.
[11] M. H. Ghahramani, M. C. Zhou, and C. T. Hon, “Toward cloud computing QoS architecture: analysis of cloud systems and cloud services,” IEEE/CAA J. Autom. Sinica, vol. 4, pp. 6–18, 2017.
[12] D. P. Bertsekas, “Feature-based aggregation and deep reinforcement learning: a survey and some new implementations,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 1, pp. 1–31, Jan. 2019.
[13] Z. Gao, C. Cecati, and S. X. Ding, “A survey of fault diagnosis and fault-tolerant techniques, part II: fault diagnosis with knowledge-based and hybrid/active approaches,” IEEE Trans. Industrial Electronics, vol. 62, pp. 3768–3774, 2015.
[14] S. P. Hoseini Alinodehi, S. Moshfe, M. Saber Zaeimian, A. Khoei, and K. Hadidi, “High-speed general purpose genetic algorithm processor,” IEEE Trans. Cybernetics, vol. 46, pp. 1551–1565, 2016.
[15] J. Wan, S. Tang, D. Li, S. Wang, C. Liu, H. Abbas, and A. V. Vasilakos, “A manufacturing big data solution for active preventive maintenance,” IEEE Trans. Industrial Informatics, vol. 13, pp. 2039–2047, 2017.
[16] S. M. Meerkov and M. T. Ravichandran, “Combating curse of dimensionality in resilient monitoring systems: conditions for lossless decomposition,” IEEE Trans. Cybernetics, vol. 47, pp. 1263–1272, 2017.
[17] Q. P. He and J. Wang, “Principal component based k-nearest-neighbor rule for semiconductor process fault detection,” in Proc. American Control Conf., Seattle, USA, 2008, DOI: 10.1109/ACC.2008.4586721.
[18] G. A. Cherry and S. J. Qin, “Principal component based k-nearest-neighbor rule for semiconductor process fault detection,” IEEE Trans. Semiconductor Manufacturing, vol. 19, pp. 159–172, 2006.
[19] S. He, G. Wang, M. Zhang, and D. Cook, “Multivariate process monitoring and fault identification using multiple decision tree classifiers,” Int. J. Production Research, pp. 3355–3371, 2013.
[20] Q. He and J. Wang, “Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes,” IEEE Trans. Semiconductor Manufacturing, vol. 20, no. 4, pp. 345–354, 2007.
[21] G. Verdier and A. Ferreira, “Adaptive mahalanobis distance and k-nearest neighbor rule for fault detection in semiconductor manufacturing,” IEEE Trans. Semiconductor Manufacturing, vol. 24, no. 1, pp. 59–68, 2011.
[22] R. Baly and H. Hajj, “Wafer classification using support vector machines,” IEEE Trans. Semiconductor Manufacturing, vol. 25, no. 3, pp. 373–383, 2012.
[23] J. Kwak, T. Lee, and C. O. Kim, “An incremental clustering-based fault detection algorithm for class-imbalanced process data,” IEEE Trans. Semiconductor Manufacturing, vol. 28, no. 3, pp. 318–328, 2015.
[24] Y. Zheng, Q. Liu, E. Chen, Y. Ge, and J. Zhao, “Time series classification using multi-channels deep convolutional neural networks,” in Proc. WAIM, Macau, pp. 298–310, 2014.
[25] K. Lee, S. Cheon, and C. Kim, “A convolutional neural network for fault classification and diagnosis in semiconductor manufacturing processes,” IEEE Trans. Semiconductor Manufacturing, vol. 30, pp. 135–142, May 2017.
[26] J. Zhang, H. Chen, S. Chen, and X. Hong, “An improved mixture of probabilistic PCA for nonlinear data-driven process monitoring,” IEEE Trans. Cybernetics, vol. 49, 2019.
[27] B. Xue, M. Zhang, W. N. Browne, and X. Yao, “A survey on evolutionary computation approaches to feature selection,” IEEE Trans. Evolutionary Computation, vol. 20, pp. 606–626, 2016.
[28] J. Derrac, S. Garcia, and F. Herrera, “A first study on the use of coevolutionary algorithms for instance and feature selection,” in Proc. Hybrid Artificial Intelligence Systems, Berlin, Germany, 2009.
[29] M. Zamalloa, G. Bordel, L. J. Rodriguez, and M. Penagarikano, “Feature selection based on genetic algorithms for speaker recognition,” in Proc. IEEE Odyssey Speaker Lang. Recognit. Workshop, USA, 2006, pp. 1–8.
[30] L. Dong, S. Chai, B. Zhang, S. K. Nguang, and A. Savvaris, “Stability of a class of multiagent tracking systems with unstable subsystems,” IEEE Trans. Cybernetics, vol. 47, pp. 2193–2202, 2017.
[31] S. Wang and X. Yao, “Multiclass imbalance problems: analysis and potential solutions,” IEEE Trans. Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, pp. 1119–1130, 2012.
[32] H. Liu, M. Zhou, and Q. Liu, “An embedded feature selection method for imbalanced data classification,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 3, pp. 703–715, May 2019.
[33] Q. Kang, X. Chen, S. Li, and M. C. Zhou, “A noise-filtered undersampling scheme for imbalanced classification,” IEEE Trans. Cybernetics, vol. 47, no. 12, pp. 4263–4274, Dec. 2018.
[34] C. Bunkhumpornpat, K. Sinapiromsaran, and C. Lursinsap, “DBSMOTE: density based synthetic minority over-sampling technique,” Applied Intelligence, vol. 36, pp. 1–21, 2011.
[35] X. Zhang, Y. Zhuang, W. Wang, and W. Pedrycz, “Transfer boosting with synthetic instances for class imbalanced object recognition,” IEEE Trans. Cybernetics, vol. 48, 2018.
[36] A. Gupta, Y. Ong, L. Feng, and K. Tan, “Multiobjective multifactorial optimization in evolutionary multitasking,” IEEE Trans. Cybernetics, vol. 49, pp. 1652–1665, 2017.
[37] C. Hou, F. Nie, X. Li, D. Yi, and Y. Wu, “Joint embedding learning and sparse regression: a framework for unsupervised feature selection,” IEEE Trans. Cybernetics, vol. 44, pp. 793–804, 2013.
[38] C. Hou, F. Nie, X. Li, D. Yi, and Y. Wu, “A survey on feature selection methods,” Computers & Electrical Engineering, vol. 40, no. 1, pp. 16–28, 2014.

Mohammadhossein Ghahramani (S’15–M’18) obtained the B.S. degree and M.S. degree in information technology engineering from Amirkabir University of Technology-Tehran Polytechnic, Iran, and Ph.D. degree in computer technology and application from Macau University of Science and Technology in 2018. He was a Technical Manager and Senior Data Analyst of the Information Centre of Institute for Research in Fundamental Sciences from 2008 to 2014. He is currently a Post-Doctoral Research Fellow at University College Dublin (UCD), Ireland. He is also a Member of the Insight Centre for Data Analytics at UCD. His research interests include smart cities, machine learning, artificial intelligence, internet of things, and big data. Dr. Ghahramani was a Recipient of the Best Student Paper Award of 2018 IEEE International Conference on Networking, Sensing and Control. He has served as a Reviewer of over 10 journals including IEEE Transactions on Cybernetics, IEEE Transactions on Neural Networks and Learning Systems, and IEEE Transactions on Industrial Informatics.

Yan Qiao (M’16) received the B.S. and Ph.D. degrees in industrial engineering and mechanical engineering from Guangdong University of Technology in 2009 and 2015, respectively. From Sept. 2014 to Sept. 2015, he was a Visiting Student with the Department of Electrical and Computer Engineering, New Jersey Institute of Technology, USA. From Jan. 2016 to Dec. 2017, he was a Post-Doctoral Research Associate with the Institute of Systems Engineering, Macau University of Science and Technology. Since Jan. 2018, he is an Assistant Professor with the Institute of Systems Engineering, Macau University of Science and Technology. He has over 60 publications, including one book chapter and over 30 international journal papers. His research interests include discrete event systems, production planning,


Petri nets, scheduling, and control. Dr. Qiao was a Recipient of the QSI Best Application Paper Award Finalist of 2011 IEEE International Conference on Automation Science and Engineering, the Best Student Paper Award of 2012 IEEE International Conference on Networking, Sensing and Control, and the Best Conference Paper Award Finalist of 2016 IEEE International Conference on Automation Science and Engineering. He has served as a Reviewer for a number of journals.

MengChu Zhou (S’88–M’90–SM’93–F’03) received the B.S. degree in control engineering from Nanjing University of Science and Technology in 1983, the M.S. degree in automatic control from Beijing Institute of Technology in 1986, and the Ph.D. degree in computer and systems engineering from Rensselaer Polytechnic Institute, USA, in 1990. He joined New Jersey Institute of Technology (NJIT), USA, in 1990, and is now a Distinguished Professor of electrical and computer engineering. His research interests are in Petri nets, intelligent automation, internet of things, big data, web services, and intelligent transportation. He has over 800 publications, including 12 books, over 500 journal papers (over 400 in IEEE transactions), 12 patents, and 29 book chapters. He is the Founding Editor of the IEEE Press Book Series on Systems Science and Engineering and Editor-in-Chief of the IEEE/CAA Journal of Automatica Sinica. He is a Recipient of the Humboldt Research Award for US Senior Scientists from the Alexander von Humboldt Foundation, the Franklin V. Taylor Memorial Award, and the Norbert Wiener Award from the IEEE Systems, Man and Cybernetics Society. He is a Life Member of the Chinese Association for Science and Technology-USA and served as its President in 1999. He is a Fellow of the International Federation of Automatic Control (IFAC), the American Association for the Advancement of Science (AAAS), and the Chinese Association of Automation (CAA).

Adrian O’Hagan is a Lecturer and Researcher in statistics and actuarial science at University College Dublin (UCD), Ireland. He holds a degree in actuarial science and the M.Sc. and Ph.D. degrees in statistics from UCD. He uses cutting-edge statistical and data analytics techniques to solve real industrial problems in modelling and pricing risk, working with leading insurers and financial institutions. He currently supervises Ph.D. students in statistical genetics with actuarial applications and in statistics and actuarial science, and is expanding his research group in the FinTech space. He serves as an Examiner for the Institute and Faculty of Actuaries and is a Referee for several leading statistics journals.

James Sweeney is a Lecturer and Researcher in statistics at the Royal College of Surgeons in Ireland (RCSI), Ireland, with the Ph.D. degree in statistical climatology. His research interests range across the fields of spatial analysis, high-performance computing and simulation, and statistical applications in medicine and agriculture. Dr. Sweeney’s core strengths are in the analysis of extremely large datasets, particularly those comprised of multivariate, spatially indexed data, with practical applications including the modelling of house price information in Dublin, as well as evaluating the speed and cost of abrupt climate change.
