
Advanced Engineering Informatics 47 (2021) 101209


Evolutionary digital twin: A new approach for intelligent industrial product development

Ting Yu Lin a,b,c, Zhengxuan Jia a,b,c, Chen Yang d,*, Yingying Xiao a,b,c, Shulin Lan e, Guoqiang Shi a,b,c, Bi Zeng a,b,c, Heyu Li a,b,c

a State Key Laboratory of Complex Product Intelligent Manufacturing System Technology, Beijing Institute of Electronic System Engineering, Beijing, PR China
b Beijing Complex Product Advanced Manufacturing Engineering Research Center, Beijing Simulation Center, Beijing, PR China
c Science and Technology on Special System Simulation Laboratory, Beijing Simulation Center, Beijing, PR China
d School of Computer Science and Technology, Beijing Institute of Technology, Beijing, PR China
e School of Economics and Management, University of the Chinese Academy of Sciences, Beijing, PR China

ARTICLE INFO

Keywords: Evolutionary digital twin; Intelligent industrial product; Collaborative evolution; Approximate world; Multiple cyber spaces; Simple evolution paradigm; Model evolution paradigm

ABSTRACT

To fulfill increasingly difficult and demanding tasks in the ever-changing complex world, intelligent industrial products need to be developed with higher flexibility and adaptability. The digital twin (DT) offers a possible means, owing to its ability to provide candidate behavior adjustments based on "feedback" received from its physical part. However, such candidate adjustments are deterministic and thus lack flexibility and adaptability. To address this problem, this paper proposes an extended concept, the evolutionary digital twin (EDT), and an EDT-based new mode for intelligent industrial product development. With the proposed EDT, a more precise approximated model of the physical world can be established through supervised learning, based on which collaborative exploration for optimal policies via parallel simulation in multiple cyber spaces can be performed through reinforcement learning. Hence, more flexibility and adaptability can be brought to industrial products through machine learning (such as supervised learning and reinforcement learning) based self-evolution. As a primary verification of the effectiveness of the proposed approach, a case study has been carried out; the experimental results confirm the effectiveness of the EDT-based development mode.

1. Introduction

A new round of global scientific and technological revolution and industrial revolution is coming and will bring about a subversive impact on industries around the world. In particular, the rapid development of new-generation information and communication technologies (ICTs) such as 5G, the Internet of Things, cloud computing and big data, as well as artificial intelligence (AI), is reinforcing the connection between humans, computing machines, and the physical world through data and information, promoting the intelligence of industrial products to the higher levels defined in literature [1]. An industrial product with the capability of cognition, cooperation and quick adaptation to the complex changing world is called an intelligent industrial product (IntelliIndusProd) in this paper. Such IntelliIndusProds are of increasing importance nowadays and in the future, as our systems are required to accomplish diversified and complex tasks more autonomously in a complex world that is partially observable, non-cooperative, and dynamically changing. The core of a nuclear power plant, the multiple AGVs of a logistics system, and the multiple vehicles of vehicle networking [2,3,4] are all good examples of industrial products that should run autonomously in a rapidly changing environment through dynamic gaming based on partial information.

However, the R&D of IntelliIndusProds is much more difficult than that of traditional industrial products, as they are conceived to work in harsh, complex working conditions and accomplish demanding (multiple) missions. On one hand, nonlinearity, uncertainty, self-organization and emergence pose great difficulty and complexity for theoretical modeling; on the other hand, tactic and model design for IntelliIndusProds under complex environments becomes even harder for engineers, due to limited human capability for precise abstraction and rapid cognition of high-dimensional information. It is arduous to develop an IntelliIndusProd in a short time with enough flexibility and adaptability through theoretical modeling alone, which relies only on human knowledge and experience.

* Corresponding author.
E-mail address: [email protected] (C. Yang).

https://doi.org/10.1016/j.aei.2020.101209
Received 10 January 2020; Received in revised form 7 October 2020; Accepted 16 November 2020
Available online 1 January 2021
1474-0346/© 2020 Elsevier Ltd. All rights reserved.

Digital twin technology, which integrates the cyber space and the physical space [5], has the potential to increase the flexibility, adaptability and intelligence of industrial products. First, the virtual product (digital model) can better perceive the real world through real-time "feedback" from its physical product (counterpart); second, the virtual product can synthesize the sensing data and provide the physical product with better action policies. However, such a digital twin can hardly solve the above problem effectively. First, faced with demanding tasks, it may be difficult to construct (only by human knowledge) a virtual product that fully reflects its physical counterpart's characteristics, due to the nonlinearity, uncertainty, self-organization and emergence of the complex world. For example, when establishing the virtual product of a physical robot arm, the working environment of that robot arm, including the production line layout, the process to be executed, etc., is usually dynamic, with perturbations, drastically changing or even unknown for different tasks, and thus extremely difficult to model in advance. Second, the existence of various working conditions makes it difficult to pre-design a policy-making module that can generate effective action policies for the physical product. Taking again the robot arm as an example: as the motion control law is established through human experience based on the virtual product, when faced with totally different tasks the control law may need to be re-designed, and such a process normally takes several months.

Therefore the concept of the digital twin needs to be further developed and extended to enhance its self-learning, self-adaptation and self-growing capability based on machine intelligence.

First, it is necessary to establish multiple cognitive models of the real world, called the approximate worlds, which keep approximating the different scenes of the real world through continuous learning. Through the approximate worlds, the IntelliIndusProd can fully understand the real world, and can explore different scenes in parallel and in a low-cost, risk-free and super real-time way when it has to deal with the uncertainty of the partially observable, non-cooperative, and dynamically changing world.

Second, it is necessary to train the virtual product to adapt to the approximate worlds. With the approximate worlds gradually approaching the real world, the self-growing virtual product can finally adapt to the real world and give better action policies to the physical product. In addition, we can train multiple virtual products in parallel in a single approximate world, which generates more data through simulation to speed up the exploration process. In other words, this new kind of digital twin is an evolving digital twin equipped with multiple cyber spaces, namely multiple approximate worlds, where the virtual product can evolve its behavior and policy to better approximate the physical product and make better decisions.

This paper proposes an Evolutionary Digital Twin (EDT) approach for IntelliIndusProd development, and is organized as follows: Section 2 presents related work; Sections 3 and 4 introduce the EDT concept and the architecture of an EDT; Section 5 discusses the concrete establishment of a product design oriented EDT in detail; Section 6 describes an application case study and discusses related results; Section 7 concludes the paper with future work.

2. Related work

2.1. Simulation driven product design

The innovation of digital design and manufacturing is the key to this new round of industrial revolution [23]. Currently, digital methods for product development can be used to quickly define and design a prototype, not only in terms of product structure, but also product function and behavior [24]. The multi-disciplinary virtual prototype can support unified modeling and collaborative simulation across different disciplines to evaluate and optimize virtual products in the cyber space [25]. Model-based systems engineering can establish the entire lifecycle model of a product through multiple views, supporting the continuous evolution and verification of the product [26]. An advanced parallel simulator under multi-core environments has been proposed to address the challenges of collaborative simulation of complex variable-structure systems that exhibit changes at both the structural and behavioral levels, with increasingly big and complex models [31]. Recently, digital twin-driven product design approaches have been proposed to optimize the design of the virtual product based on the data feedback from the physical product [6]. However, the above approaches mainly depend on human knowledge and abstraction capability without better utilizing the power of new AI algorithms and models.

2.2. Digital twin and parallel control

The first formal definition of the digital twin dates back to 2012, given by Glaessgen and Stargel [5] at NASA. A digital twin is an integrated multi-physics, multiscale, probabilistic simulation of an as-built system that uses the best available physical models, sensor updates, history data, etc., to mirror the life of its corresponding physical product [5]. Rosen et al. [8] give another definition from the perspective of autonomous systems, which are required to respond rapidly to unexpected events without central re-planning. According to [8], a digital twin is an ultra-realistic model that reflects the state of the process and the behavior of the autonomous system in interaction with its environment in the real world.

From the above definitions, it can be summarized that a digital twin is an ultra-realistic model of the physical system, established at different scales, synthesizing knowledge, data, and physical models from different domains and equipment. Such a definition, in fact, places great emphasis on model validity, which is the core value of the digital twin but is difficult to achieve in engineering, even though related concepts, architectures [9], and even key technologies have already been proposed. This is mainly because human knowledge is limited in the face of the complex real world. Thus, starting from current human cognition of the world with traditional theories, we can scarcely build a digital twin that reproduces 100% of the behavior and characteristics of its physical product, especially the details. However, when complex systems are taken into consideration, a minor mistake in such details can result in huge differences in behavior. Furthermore, to our knowledge, most of the research on digital twins concentrates on optimizing the performance of the physical product through human knowledge based optimization of the virtual product [8,10]. Similar to the modelling precision limit of approaches based totally on human knowledge, such optimization ability is also limited.

Faced with similar difficulty in the control law design domain, Fei-yue et al. [11] proposed the parallel control theory. In this theory, they proposed constructing an equivalent artificial system (not exactly the same as the physical one) that works in parallel with the real-world system. In this artificial system, planning and optimization algorithms can be designed and experimented with for better control laws. Meanwhile, the behavior of this artificial system is further corrected by the data collected from the real system. Although such parallel control theory is promising and potentially effective in control law design, where the core purpose is to eliminate deviations, it might be less applicable in cases where policies should be made based on human knowledge or precise knowledge about the state or evolution dynamics of the system, for example the autonomous collaborative robot arm pairs on a production line. Even so, the concept of parallel control has provided good insights into the design of our EDT.

2.3. Reinforcement learning based design

Recent years have witnessed rapid development of machine learning, especially deep learning and deep reinforcement learning. For example, in 2016, AlphaGo, designed by DeepMind of Google, mastered the game of Go and easily defeated the world champions Lee Sedol and Ke Jie by big scores [12]. With this victory, DeepMind published their research on


the development of AlphaGo in Nature, followed by another article on AlphaZero, which mastered the game of Go without human supervision. In 2019, faced with the more challenging game StarCraft II, AlphaStar [13], developed by DeepMind, again defeated professional human players, with a score of 10:1. From the details revealed in the articles [13], it can be seen that, provided with a perfectly described interaction environment and task targets, an agent can converge through autonomous learning to behavior strategies largely superior to manually developed ones, or even to optimal strategies. Hence, the reinforcement learning based design pattern has started to attract increasing attention in both industry and academia, and is being applied to designs in different domains.

Among these application domains, the design of robot control algorithms has been widely studied and practically applied in the control of real robots. Asada et al. [14] developed a ping-pong-playing robot based on the Q-learning algorithm [16]. Their robot control algorithm could drive the robot to hit the ball to required positions based only on visual information. Deisenroth et al. [15] proposed a model-based policy search method to train a robot to accomplish a block-building task. Although the application systems in the above work were real robot machines, the robot control commands were constrained and limited by human-scripted rules when applied. This may hinder better policy discovery in complicated tasks, as human knowledge or cognition is, to some extent, limited. Deep reinforcement learning based robot control design (without human-scripted rules in control commands), on the other hand, can hardly be applied to real robots and still remains at the simulation stage [17,18]. Zhang et al. [19,20] applied the DQN algorithm to train a three-joint robot for a grasping task in a simulation environment. However, when applied to a physical robot machine, the performance was less satisfactory due to the differences between the simulation environment and the real world. Similar problems also exist in other works [21].

Hence, to address the above problems encountered in simulation driven product design, we propose the concept of the EDT, which allows the designed product to persistently and intelligently optimize itself over its whole lifecycle. With our proposed EDT, learning agents can be equipped with a model of high and gradually increasing confidence, and in turn converge to behavior policies with higher performance in real products.

3. EDT and persistent reinforcement based product design paradigm

3.1. Common product development process

The common product development process follows a step-by-step procedure that is mainly driven and led by humans across three worlds: the expected world, the interpreted world and the external world [24]. The product lifecycle can be roughly divided into three steps, as shown in Fig. 1. First, in the primary design step, according to the ideal world (corresponding to the expected world in [24]), which is based on the human understanding of the real world (corresponding to the external world in [24]), an ideal product model is established by means of mathematical and physical modeling in the cyber space. Second, in the detailed design, simulation and test step, an approximate world (corresponding to the interpreted world in [24]) of the real world is constructed through modeling and simulation, together with possible semi-physical simulation in verification. In this step, a digital or semi-physical prototype product based on the approximate world in the cyber space or the physical space is established. Finally, in the operation and service step, a real product is fabricated and released to provide services. In the traditional development mode, the function, performance and structure of the final product are fixed, with extremely limited flexibility and adaptability. Facing new scenarios and requirements, corrections or innovations from humans are needed, and the above three steps need to be repeated. Usually, each product development cycle incurs high cost and takes a long time (several years in extreme cases), which cannot meet users' needs for fast product delivery, upgrades and iterations. In other words, this product development mode cannot support fast product innovation and development (within a relatively short timeframe) in the coming 4th industrial revolution era.

3.2. Connotation of EDT

To address the above issue, based on the industrial Internet and integrating new-generation ICT, new-generation AI technology and product development field technology, the EDT provides a novel product development mode in which different forms of learning and searching in multiple parallel cyber spaces are introduced, allowing the IntelliIndusProd to evolve and possess better adaptability.

[Fig. 1 shows the three steps: conceptual design in cyber space (ideal product / ideal world), detailed design, simulation & test in cyber and physical space (prototype product / approximate world), and operation & service in physical space (real product / real world), each driven by humans.]

Fig. 1. Common process of product development.


The EDT, as an extension of the DT, is also composed of virtual and physical parts, namely the virtual product and the physical product in terms of industrial product development. Mainly based on machine intelligence and supplemented by human intelligence, the EDT can help effectively establish the operation law under various uncertainties and gradually achieve a well-approximated model of the real world. Such capability can well guide product design improvement and operation optimization, and hence support the development of IntelliIndusProds with enough flexibility and adaptability for the complex world and its tasks. The EDT, as shown in Fig. 2, includes some notable features:

(1) Compared with the current digital twin, the EDT explicitly builds an approximate world corresponding to the real world, which can focus on and simulate specific aspects (views) of the real world according to the R&D requirements. Besides, via the EDT, such an established approximate world can support repeated tests and experiments of the designed products and evolve with the new product designs at the same time.
(2) Compared with the current digital twin, whose cyber space and physical space are mapped through a bijection, the EDT builds multiple cyber spaces to construct different models of the real world with uncertainties, and of the product at different resolutions, from different aspects (views) of the real world according to the development demands.
(3) Compared with the current digital twin, which mainly uses a cyber space to predict the operation effects of the product, the EDT builds a development method using multiple cyber spaces for fast parallel learning and searching, which allows the approximate world to keep approaching the real world, and the product scheme to keep adapting to the real world.

3.3. A new product development paradigm

3.3.1. New process of product development
The process of new product development, mainly using machine intelligence and assisted by human intelligence, is an evolving process based on the EDT, as shown in Fig. 3. First, if the product belongs to a product family, evolution can be performed through learning from the data of similar products fed back by their EDTs in the early step of design, which is similar to the in-use product [29]. Initial models of both the product and the approximate world may differ from, or cannot fully reflect, the real or physical ones. However, through the evolution and learning process, such differences can be gradually decreased. Moreover, together with this evolution process, the action policies of the physical product should be optimized. Second, based on the EDT, traditional simulation, test, operation and service would no longer be separated processes. Instead, they are combined with the evolution of both the prototype product (virtual product) and the approximate world. While emphasizing autonomous evolution, the above development process does not negate the value of human and theoretical modeling, which, on the contrary, may also play an important role in extracting more value from the data [7].

In this new product development process, there are two paradigms of digital twin evolution: the simple evolution paradigm and the model evolution paradigm, which are discussed in detail as follows.

3.3.2. Simple evolution paradigm
In the simple evolution paradigm, the models and parameters of the product and the world are deterministic; the operation policy is the only adjustable factor. As shown in Fig. 4, at the beginning, the state of the approximate world is mapped from the real world's state. In multiple cyber spaces, the virtual product can execute actions from different operation policies and bring about state changes in the corresponding approximate worlds. This is a super real-time process. If the policy space is huge, searching and planning algorithms such as Monte Carlo Tree Search would be applied to explore the feasible solution space in parallel in multiple cyber spaces. Finally, the policy used in the physical space can be optimized using the searching and evaluation results generated in the cyber space.

3.3.3. Model evolution paradigm
In the model evolution paradigm, the behavior policies are not the only variable for the virtual product of an EDT; the cognition models of the real world would also change due to gradually increased information completeness and certainty.

To this end, as shown in Fig. 5, supervised learning and unsupervised learning can be adopted to construct different cognition models of the real world, supporting the evolution of the approximate world towards the real world, while a reinforcement learning approach can be applied for policy model construction, allowing the search for effective and even optimal behavior policies through interaction between the virtual product and the approximate world.

[Fig. 2 shows the virtual product and approximate world replicated across multiple cyber spaces (linked by transfer learning, reinforcement learning and (Monte Carlo) tree searching), mirroring the physical product living in the real world (linked by supervised/unsupervised learning).]

Fig. 2. Core features of the EDT.
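The multi-cyber-space structure of Fig. 2 (one physical product mirrored by several virtual product / approximate world pairs, each modeling a different view of the real world) can be expressed as a minimal data-structure sketch. All class and field names below are illustrative assumptions, not terminology fixed by the paper:

```python
from dataclasses import dataclass, field

@dataclass
class CyberSpace:
    """One cyber space: an approximate world plus the virtual product living in it."""
    approximate_world: dict   # learned world-model parameters (placeholder)
    virtual_product: dict     # behavior/policy model (placeholder)

@dataclass
class EvolutionaryDigitalTwin:
    """EDT = one physical part plus several parallel cyber spaces at different resolutions."""
    physical_product: str
    cyber_spaces: list = field(default_factory=list)

    def spawn_cyber_space(self, view: str) -> CyberSpace:
        # Each cyber space focuses on a specific aspect (view) of the real world.
        cs = CyberSpace(approximate_world={"view": view},
                        virtual_product={"policy": None})
        self.cyber_spaces.append(cs)
        return cs

# A robot-arm EDT with two views of the real world, echoing the paper's example.
edt = EvolutionaryDigitalTwin(physical_product="robot arm")
edt.spawn_cyber_space("production-line layout")
edt.spawn_cyber_space("process dynamics")
```

Unlike the bijective cyber/physical mapping of a conventional DT, the one-to-many `cyber_spaces` list is what enables parallel exploration.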


[Fig. 3 maps the common process onto the EDT: ideal product / ideal world → prototype product / approximate world → real product (with its virtual product living alongside the physical product) / real world, with the virtual product and approximate world evolving mainly by AI and less by humans.]

Fig. 3. New product development process.

[Fig. 4 depicts (Monte Carlo) tree searching repeated along the time axis across multiple cyber spaces.]

Fig. 4. Simple evolution paradigm.
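As a concrete illustration of the simple evolution paradigm (Section 3.3.2), the sketch below evaluates many candidate operation policies by super-real-time rollouts in parallel cyber spaces and returns the best one for the physical space. It is a minimal stand-in: the paper names Monte Carlo Tree Search, whereas this sketch uses plain Monte Carlo rollout search, and the toy dynamics, target state and policy form are assumptions for illustration only:

```python
import random

def approximate_world_step(state: float, action: float) -> float:
    # Deterministic approximate-world dynamics (illustrative toy model).
    return state + action - 0.1 * state

def rollout(policy: float, state: float, horizon: int = 20) -> float:
    """Simulate a constant-action policy in one cyber space; reward = end near target 1.0."""
    for _ in range(horizon):
        state = approximate_world_step(state, policy)
    return -abs(state - 1.0)

def parallel_policy_search(initial_state: float, n_spaces: int = 50, seed: int = 0) -> float:
    """Each cyber space tries one candidate policy; the best is sent to the physical space."""
    rng = random.Random(seed)
    candidates = [rng.uniform(-0.5, 0.5) for _ in range(n_spaces)]
    return max(candidates, key=lambda p: rollout(p, initial_state))

best = parallel_policy_search(initial_state=0.0)
```

In a real EDT the `n_spaces` rollouts would run concurrently in separate cyber-space instances; here they are evaluated sequentially for brevity.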

However, differences still exist between the behavior of the virtual product and that of the physical product, due to errors in training, forming a gap between the cyber space (including the virtual product and the approximate world) and the physical space (including the physical product and the real world). Thus, to mitigate this gap, transfer learning can be applied to develop a product with enough adaptability to the above errors, allowing the converged policy models to be applied in the physical world.

4. EDT based intelligent industrial product development system architecture

4.1. Overview

The intelligence of the IntelliIndusProd is not possible without the effective fusion of big data, algorithms and computing power, for both the supervised learning of the approximate world and the reinforcement learning of the virtual product.

(1) Big data: big data is not only collected from the physical space, but also generated from the cyber space. (2) Algorithms: algorithms mainly operate in the data analysis engine, the intelligent optimization engine and the machine learning engine. (3) Computing power: enough computing power is needed to support data generation in the approximate world and the virtual product, and to support the costly operation of the corresponding engines.

Furthermore, as the EDT contains multiple cyber spaces, much more computing power is needed to process big data from the different cyber spaces. An IntelliIndusProd often runs on the industrial edge, where computing resources and data processing capability are limited. Therefore, the development system should adopt an architecture that integrates cloud and edge computing power.

As shown in Fig. 6, for both the simple evolution paradigm and the model evolution paradigm, the edge collects and pre-processes the data of the physical product and the real world, and then feeds the data back to the cloud; the cloud supports the supervised learning of multiple approximate worlds and the reinforcement learning of multiple virtual


[Fig. 5 depicts, for successive scenarios along the time axis, supervised/unsupervised learning of the approximate world, reinforcement learning of the policy, and transfer learning across scenarios.]

Fig. 5. Model evolution paradigm.
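The supervised-learning half of the model evolution paradigm (Section 3.3.3) — fitting a cognition model of the real world from feedback data — can be sketched as follows. The linear dynamics, the closed-form least-squares fit, and all names are illustrative assumptions; the reinforcement-learning policy search and the transfer-learning correction of Fig. 5 are omitted for brevity:

```python
import random

def real_world_step(s: float, a: float) -> float:
    # Real dynamics, unknown to the twin: s' = 0.8*s + 0.5*a (toy example).
    return 0.8 * s + 0.5 * a

def fit_world_model(transitions):
    """Supervised learning step: least-squares fit of s' = w_s*s + w_a*a."""
    # Normal equations for the two parameters, written out explicitly.
    Sss = sum(s * s for s, a, ns in transitions)
    Saa = sum(a * a for s, a, ns in transitions)
    Ssa = sum(s * a for s, a, ns in transitions)
    Ssn = sum(s * ns for s, a, ns in transitions)
    San = sum(a * ns for s, a, ns in transitions)
    det = Sss * Saa - Ssa * Ssa
    w_s = (Ssn * Saa - San * Ssa) / det
    w_a = (San * Sss - Ssn * Ssa) / det
    return w_s, w_a

# Feedback data collected from the physical side.
rng = random.Random(1)
data = []
for _ in range(100):
    s, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append((s, a, real_world_step(s, a)))

w_s, w_a = fit_world_model(data)
# With more feedback data, the approximate world converges to the real dynamics.
```

A policy learned by reinforcement learning against the fitted `(w_s, w_a)` model would then be corrected by transfer learning before deployment, as the paper's gap-mitigation step describes.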

[Fig. 6 depicts the cloud performing confrontation training of multiple approximate worlds and reinforcement learning of multiple virtual product instances, updating the newest intelligence to edges 1 through n (scenarios 1 through n), which feed back data of the physical product and the real world.]

Fig. 6. Cloud-edge integrated architecture.
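The cloud-edge loop of Fig. 6 — edges collect and pre-process physical-side data, the cloud aggregates it and trains, then pushes the newest intelligence back to every edge — can be sketched as a minimal feedback cycle. The class and method names are illustrative assumptions, and the "training" step is reduced to a version bump:

```python
class Edge:
    """Industrial edge node: collects and pre-processes data, runs the latest policy."""
    def __init__(self, scenario: str):
        self.scenario = scenario
        self.buffer = []          # pre-processed samples awaiting upload
        self.model_version = 0    # version of the intelligence currently deployed

    def collect(self, sample):
        self.buffer.append(sample)

    def upload(self):
        data, self.buffer = self.buffer, []
        return data

class Cloud:
    """Cloud side: aggregates edge data and trains the shared intelligence."""
    def __init__(self):
        self.dataset = []
        self.version = 0

    def train_round(self, edges):
        for e in edges:
            self.dataset.extend(e.upload())   # feedback from the physical side
        self.version += 1                     # stand-in for a learning update
        for e in edges:
            e.model_version = self.version    # push the newest intelligence to edges

cloud = Cloud()
edges = [Edge("scenario-1"), Edge("scenario-2")]
edges[0].collect((0.0, 0.1, 0.05))
edges[1].collect((1.0, -0.2, 0.7))
cloud.train_round(edges)
```

The key design point mirrored here is the asymmetry: heavy learning stays in the cloud, while the resource-limited edge only collects data and runs the most recently pushed model.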

product instances efficiently. The edge would always update to the latest intelligence to improve flexibility and adaptability.

4.2. Hierarchical description

As shown in Fig. 7, the system architecture of the EDT based IntelliIndusProd development consists of several layers, including the Physical layer, the Access/communication layer, the Edge processing platform layer, the Cloud development service platform layer and the Application layer, along with two cross-cutting parts: Information security management, and Standards and specifications. The Cloud development service platform layer further includes the Virtualizing layer, the Cloud development service support layer and the Portal layer. The details of this architecture are explained as follows.

(1) Physical layer

It not only includes the physical product and the real world, but also the related development resources and capabilities (such as computing power) required by the operation of multiple cyber spaces.

(2) Access/communication layer

It not only includes the access and communication of the edge to the physical product and the real world, but also the access and communication of the cloud to each edge.

(3) Edge processing platform layer

It not only supports the time-efficient operation of the intelligent industrial product, but also supports its local evolution by providing a data analysis engine, an intelligent optimization engine and a machine learning engine, along with local data and computing power.

(4) Cloud development service platform layer

1) Virtualizing layer

It not only includes the virtualization of traditional development resources and capabilities such as computing power, but also the virtualization of the physical product and the real world, namely the virtual product and the approximate world. The virtual product and the approximate world can be encapsulated by container technology such as Docker, and can be created as multiple instances in the cloud or be deployed to the edge computing devices.

[Fig. 7 lays out the layers described in the text: the physical layer (physical intelligent industrial product, real world, development resources and capabilities); the access/communication layer (information terminals, IoT gateway, software service bus, perception and control interfaces); the edge processing platform layer (edge pervasive terminals, edge processing service support engines, edge virtual resources); the cloud development service platform layer with its virtualizing layer (cloud virtual development resources, cloud virtual product and approximate world, cloud virtual development capability), cloud development service support layer (IaaS, DaaS, high performance cloud simulation engine, data analysis / intelligent optimization / machine learning engines as PaaS, SaaS app services) and portal layer (platform operator, service provider and service consumer portals); and the application layer (model training and optimization in the cloud, model update and decision support at the edge, data feedback in return); all flanked by information security management and standards/specifications.]

Fig. 7. Global architecture of the EDT based product design system.


2) Cloud development service support layer

(a) Basic services

The basic services module provides the core support of the whole system, including the data support, the computing power support, and the algorithm support in the form of services. Data as a service (DaaS) manages and provides the data collected from both the real-world systems and the virtualized approximated-world systems to support the data-driven evolution process of the virtual intelligent industrial product. Infrastructure as a service (IaaS) provides elastic computing power for large-scale parallel simulation of multiple virtual intelligent industrial products in multiple approximated-world instances, which is further supported by the high performance cloud simulation engine based on the Docker technique. Platform as a service (PaaS), realized via the data analysis engine, the intelligent optimization engine and the machine learning engine, supports the evolution of the EDT with abundant powerful algorithms.

(b) Application services

The application services module provides shared domain-related applications. With this module, users could define data-driven intelligent industrial product development tasks with the assistance of the system modelling and simulation language and the collaborative design service (Software as a Service, SaaS). Large-scale simulation instances are defined and executed in this part, continuously generating data from the interaction between the virtual intelligent industrial product and the approximated world, supporting the collaborative evolution of the virtual model.

3) Portal layer

Stakeholders, including end users, could jointly carry out various activities in the whole life cycle of the intelligent industrial product, such as describing the ideal product and supervising the evolution of the prototype product.

(5) Application layer

It reflects that, for both the simple evolution paradigm and the model evolution paradigm, the EDT could evolve the intelligent industrial product to develop real intelligence based on the integration of cloud and edge.

5. Product development-oriented EDT construction and evolution

5.1. EDT construction design

As described above, the EDT has the characteristic of persistent enhancement in terms of both its virtual product precision and its physical product policy optimization. Thus, it should be constructed naturally equipped with the ability to evolve, in terms of both the virtual product and the physical product.

To achieve this, the virtual product is designed to be composed of a totally or partially parameterized policy module and a behavior module that is also parameterized. The function of the latter is to approximate the behavior of the physical product, while that of the former is to generate action commands for the latter to accomplish tasks. The physical product, on the other hand, is designed to be composed also of a totally or partially parameterized policy module whose parameters are a copy of those of the virtual product's policy module through update, as shown in the example of a robot arm EDT in Fig. 8. As for the approximate world, it is also equipped with a behavior module of similar kind and function as that of the virtual product.

With such a construction design, the upgrade of the virtual product relies on model adjustment based on the measurement data collected from the physical world and the simulation data from the approximate world, while that of the physical product relies on the update of its behavior policy and the application of the generated policy.

It is thus required that the physical product is able to apply the generated policy and that the physical product, the virtual product, and the approximate world are adjustable through programs. Based on this, the construction of the dual parts of the EDT is discussed in detail in the following parts.

5.1.1. Physical product construction

In an EDT, the physical product serves as a collector of the physical

Fig. 8. Global design of the EDT.
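The threshold-gated parameter copy from the virtual policy module to the physical one (Section 5.2 performs the transmission only once the policies have drifted far enough apart) might be sketched as follows. The function names and the L2 distance metric are illustrative assumptions of ours, not taken from the paper:

```python
import math

def policy_distance(virtual_params, physical_params):
    """L2 distance between the two policy modules' parameter vectors."""
    return math.sqrt(sum((v - p) ** 2 for v, p in zip(virtual_params, physical_params)))

def maybe_sync(virtual_params, physical_params, threshold=0.5):
    """Overwrite the physical policy only when the virtual one has drifted enough."""
    if policy_distance(virtual_params, physical_params) > threshold:
        return list(virtual_params), True    # parameters transmitted to the physical product
    return physical_params, False            # keep the current policy (behavior stability)

params, synced = maybe_sync([1.0, 2.0], [0.0, 2.0])
print(params, synced)  # [1.0, 2.0] True
```

Gating the copy on a threshold is what gives the physical product the gradual, stability-preserving evolution the paper describes.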


Fig. 9. Physical product construction for robot arm EDT.

measurement data and an actuator and verifier of the policy learned by the virtual product. Thus, the physical product should have the ability to convert the policy into commands directly applicable to its physical sub-components and continuously collect measured data about itself and the surrounding environment. Based on these functional requirements, the following key points should be well determined in the construction of the physical product in an EDT.

(1) Measurement data collection and preprocessing

Measurement data of a physical system originate from measurements from different sensors which, taking the robot arm EDT shown in Fig. 9 as an example, include angular sensors, angular velocity sensors, cameras, infrared sensors, etc. Therefore, in order to gather sufficient data with a wide enough range for training the virtual product, the physical product needs to be equipped with a wide range of accurate sensors. Meanwhile, as different sensors may have different sample frequencies, a pre-processing module in the edge processing platform layer is needed to unify the time lines of the different measurement data, through either unifying the sample frequency or introducing sample interpolation for measurement data from sensors with a lower sample frequency. Through this coordination process, the data are collected at the same frequency and could be merged together to provide the virtual product with a comprehensive measurement data set of both the physical product and its living environment.

Moreover, besides the time line unifying processing, further data processing is required before the collected data could be transmitted to the virtual product and utilized for training, due to the existence of noise, abnormality and misalignment in the preprocessed data. Thus, in the operation mode, the data flow from the physical world into the sensors, pass through the pre-processing module and the filter and alignment module, and are finally transmitted to the cloud where they could be utilized for virtual product training.

(2) Policy update and application

To automatically update the policy of the physical product, a parameterized policy module which is the same as that of the virtual product is constructed. At each policy update, the parameters of the physical product's policy module are updated with those of the virtual product's policy module, and hence update the physical product behavior. Moreover, as the policy generated by the policy module is digital, while the motion or dynamics of a physical system is usually continuous, converting modules, such as stepping motors, that convert the digital policy into a continuous behavior policy are equipped, as shown in Fig. 9. These converting modules could convert the behavior policy online, and the behavior of the robot arm in the example in Fig. 9 could be controlled directly or indirectly by the frequency of the pulse signal generated by the policy module.

5.1.2. Virtual product construction
In an EDT, the virtual product, composed of a policy module and a behavior module, serves as both an imitator and an optimal policy searcher of the physical product. In the operation mode, the behavior module corrects its behavior based on the measurement data received from its physical counterpart, and the policy module searches for an optimal behavior policy in the approximate world with the support of the high performance computer cluster in the cloud. Accordingly, both the behavior module and the policy module of the virtual product should be designed as auto-adjustable through deterministic or stochastic learning programs.

To make the behavior of the virtual product adjustable through measurement-data-based learning and simulation-based reinforcement learning in the approximate world, both the policy module and the behavior module of the virtual product can be modelled in different parameterized forms: partially parameterized and totally parameterized, as shown in Fig. 10.
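The time-line unification step described in (1) above, resampling a lower-rate sensor stream onto a shared clock via linear interpolation, can be sketched in plain Python. The sensor names, rates, and values are made up for illustration:

```python
from bisect import bisect_right

def resample(timestamps, values, target_times):
    """Linearly interpolate one sensor stream onto a shared time line (ms)."""
    out = []
    for t in target_times:
        if t <= timestamps[0]:
            out.append(values[0])           # hold the first sample
        elif t >= timestamps[-1]:
            out.append(values[-1])          # hold the last sample
        else:
            i = bisect_right(timestamps, t)
            t0, t1 = timestamps[i - 1], timestamps[i]
            v0, v1 = values[i - 1], values[i]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out

# A 50 Hz goniometer stream merged onto the 100 Hz time line of a faster sensor.
angle_t, angle_v = [0, 20, 40], [0.0, 1.0, 2.0]      # one sample every 20 ms
common_t = [0, 10, 20, 30, 40]                       # one sample every 10 ms
print(resample(angle_t, angle_v, common_t))          # [0.0, 0.5, 1.0, 1.5, 2.0]
```

Once every stream is resampled onto the common time line, the rows can be merged into the comprehensive measurement data set the text describes.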


Fig. 10. Behavior module construction of the EDT.

In the totally parameterized modelling mode, the virtual product is either modelled by different neural networks according to its desired function or established from different components, each of which is constructed by a neural network imitating its behavior. Such a modelling mode normally results in a large number of parameters to be optimized when the behavior of the physical product is complex. Such non-structural characteristics, combined with a large number of adjustable parameters, make the model hard to optimize.

Different from the totally parameterized modelling mode, the partially parameterized mode introduces prior information about the physical product into the modelling process, including the system structure, some deterministic theory-based part models, some manual behavior fitting functions, etc., as shown in Fig. 10. In this mode, the non-parameterized approach and the parameterized approach are combined in two ways. In the first way, part of the model is established based on theories, with some influencing parameters determined from outside, for example by a neural network. In the second, part of the model further consists of a non-parameterized or partially parameterized sub-part and a totally parameterized sub-part. In such a construction form, the number of parameters to be optimized could be drastically decreased, as the model is restricted to only some parts of the entire system, and the parameterized parts are based on some empirical or theory-based models. But such a modelling mode has its drawbacks compared to the first mode, such as a sophisticated training process and relatively limited behavior fitting capability.

Hence, in the construction of the virtual product's behavior module of an EDT, the extent and the approach of parameterization need to be considered delicately based on the features of the system and its extensibility. The same also applies to the construction of the approximated world models.

5.2. EDT evolution design

Currently, model evolutions are achieved mainly using machine


learning, deep learning, and reinforcement learning approaches. For the first two approaches, models are trained mainly in a supervised or semi-supervised mode based on collected data, while model evolution in the last approach is accomplished based on data collected via interactions between the current agent and the environment. All three approaches have achieved good results in various tasks. However, in the evolution process of an EDT, none of these evolution approaches could succeed alone. Hence, in this paper, a coordinated evolution approach designed for the EDT is proposed.

An EDT is composed of a virtual product and a physical product. However, the evolution of an EDT mainly concentrates on the virtual product side, while the evolution of the physical product is achieved through a simple transmission of parameters, given that both the virtual product and the physical product have a policy module with the same structure. Meanwhile, the parameter transmission could be performed once the difference between the policy generated by the up-to-date model and that given by the previous version of the model exceeds a certain threshold. In such a way, the physical product is gradually evolved, while guaranteeing its behavior stability at the same time.

The evolution of the virtual product concerns the evolution of its behavior module through supervised learning and that of its policy module through reinforcement learning, which optimizes the behavior of an EDT according to different tasks. Moreover, as both evolutions take place simultaneously, a coevolution strategy that coordinates these two evolution processes also plays a key role in the evolution of the virtual product.

5.2.1. Supervised learning based behavior module evolution
The purpose of the behavior module evolution is to achieve consistency between the behavior of the physical product and that of the virtual product. Taking the robot arm EDT as an example, in a well-evolved virtual robot arm, each joint should turn exactly the same angle as that of the physical robot arm, given the same input command. This consistency could result in an exact virtualization of the physical robot, which could serve to optimize the control policy of the real robot through reinforcement learning.

Therefore, the training of the behavior module follows the typical supervised learning process. The behavior module is established according to Section 5.1. As shown in Fig. 11, with the parameterized parts built using neural networks, such a process takes the input commands, labelled by the output behavior of the physical product, as training data, and takes the mean square error between the output behavior of the virtual product and that of the physical product as the loss function. The parameter optimization is accomplished through minimizing the loss function by tuning the neural network parameters.

5.2.2. Reinforcement learning based policy module evolution
The reinforcement learning process aims to optimize the policy adopted by the EDT. Different from supervised learning processes based on well-labelled data, the reinforcement learning process learns through interaction between the agent and its living environment with no pre-collected labelled data, and relies on the virtual product behavior module. The complete training process under the collaborative training framework is shown in Fig. 12.

As shown in Fig. 12, the exact virtualization of the physical product allows the creation of multiple cyber spaces, where the policy module accumulates experience (in the form of collected data, accumulated parameter gradients, etc.) and updates its policy model through parallel interaction with multiple instances of different virtual product models.

In each of the multiple parallel interactions and trainings, the policy module collects observations of the environment and the reward it gains after applying its output policy based on previous observations. With batches of such interaction data collected, the policy module accumulates experience through policy gradient [22] or policy optimization [17] methods. Finally, with the accumulated experience collected from the different interaction and training processes, the parameters of the policy module are updated through application of the synthesized gradient or direct parameter assignment.

5.2.3. Coevolution of the supervised learning and reinforcement learning
The above two processes of the virtual product evolution are carried out independently. However, coupling does exist between them. As stated above, the entire policy optimization process of the policy module is based on the correctness of the virtualization, namely the virtual product behavior module. Thus, if the behavior of the behavior module is largely different from that of the physical product, the behavior policy provided by the policy module, which is trained based on data collected through simulation interaction with the virtual product in the approximate world, would be useless or even dangerous if applied in the real world. Accordingly, in the EDT evolution process, the evolution of the policy module will not start until good precision has been achieved by the behavior module.

Furthermore, as described in Section 5.1, the EDT is designed to support, to some extent, system scalability, which indicates that some functions or parts of the physical system could be modified according to the needs of product upgrade. Under such a condition, the behavior module will be updated first, followed by the update of the policy module, to adapt to this change, specifically realizing the support of system extensibility.

6. Case study: EDT based development application in robot arm control

In this section, an application of EDT based development in robot arm control command calculation is presented. We first introduce

Fig. 11. Supervised learning of behavior module.
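As a minimal caricature of the supervised loop in Fig. 11 (not the paper's actual network), a one-parameter "behavior module" can be fitted by gradient descent on the squared error between its output and the measured physical behavior. The 0.8 gain, the learning rate, and the command values are invented for the sketch:

```python
# Hypothetical physical product: a joint that responds to command u with angle 0.8*u.
# The gain 0.8 is unknown to the learner and must be recovered from labelled data.
def physical_arm(u):
    return 0.8 * u

w = 0.0     # parameter of the behavior module: predicted angle = w * u
lr = 0.05
# Input commands labelled by the measured output behavior, as in Fig. 11.
data = [(u, physical_arm(u)) for u in [0.5, 1.0, 1.5, 2.0]]
for epoch in range(200):
    for u, y in data:
        pred = w * u
        grad = 2 * (pred - y) * u     # derivative of the squared error w.r.t. w
        w -= lr * grad
print(round(w, 3))  # 0.8
```

The real behavior module replaces the scalar `w` with neural-network weights, but the loop (predict, measure the MSE against the physical product, step the parameters) is the same.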


Fig. 12. Reinforcement learning of policy module.
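The parallel scheme of Fig. 12, in which several approximate-world instances each contribute a gradient estimate that is then synthesized into one update of the shared policy parameters, can be caricatured as below. The per-worker gradient here is a made-up stand-in for real policy-gradient rollouts:

```python
def worker_gradient(params, env_seed):
    # Stand-in for one cyber-space instance: a real worker would run policy
    # rollouts in its own approximate-world instance and return the resulting
    # policy-gradient estimate.
    return [(p - env_seed) * 0.1 for p in params]

def synthesized_update(params, env_seeds, lr=1.0):
    """Average the per-instance gradients and apply one shared parameter update."""
    grads = [worker_gradient(params, s) for s in env_seeds]
    n = len(grads)
    mean_grad = [sum(g[k] for g in grads) / n for k in range(len(params))]
    return [p - lr * g for p, g in zip(params, mean_grad)]

updated = synthesized_update([1.0, 2.0], env_seeds=[0.0, 2.0, 4.0])
print(updated)  # each parameter is nudged by the averaged (synthesized) gradient
```

Averaging the workers' gradients is one concrete reading of the "synthesized gradient" the text mentions; direct parameter assignment from the best worker would be the alternative.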

[Fig. 13 illustrates the four manual steps, all performed by humans and together usually taking months: 1. set up the safe activity area; 2. adjust the motion path step by step and one by one; 3. integrate and debug the collaboration; 4. implement the pilot run and improvement.]

Fig. 13. Manual adjustment of robot arms before being equipped in a production line.


a robot arm dynamic model in the production line, followed by the design to construct an EDT for the robot arm control. Finally, some related results on the construction process and system performance are discussed.

6.1. Robot arm control in the production line

Manufacturers around the world are turning to automation to help solve the labor shortage, increase productivity and improve product quality. Robot arms provide a cost-effective, flexible, and safe automation solution for a wide range of production tasks, including machining, product packaging, product sorting, etc.

At present, most of the busy robots on the production line are based on manual pre-adjustment, undertaking fixed tasks, and running in a non-interference scope. However, with the increasing demand for personalized and intelligent production, they are required to perform diversified and complex tasks, constantly undertake and adapt to new tasks, and work closely with each other or with people autonomously. Hence, they are becoming the IntelliIndusProd that this paper focuses on.

Normally, a robot arm is composed of five subsystems, namely the driving system, the transmission system, the actuators, the control system, and the detecting system. The control system stands in the core position of a robot arm, as it coordinates the dynamics of the motors on different axes to accomplish a production-related task. Hence, the design of this control system is the core of the design of a robot arm.

However, the control system of most robot arms could only drive the robot arm to perform predefined action sequences, which are realized through manual guidance and adjustment, as shown in Fig. 13. During this adjustment process, normally, a step-by-step guiding process is needed. In this process, the robot is controlled by the technician through a wired controller to accomplish a given task. The commands sent by the technician through the controller are transformed into executable code lines and stored in the computer. As the task that a robot arm needs to achieve is usually delicate, with high precision, the human-controlled process needs to be slow enough to protect both the robot and the product from being damaged. After this controlled process, the command execution needs to be further accelerated to meet the production needs. As a result, for each robot arm and each task, it would require about two to three months for a task-oriented adjustment before its utilization on the product line. Furthermore, as the whole adjustment process is task oriented, any change of the pre-defined task would require a re-adjustment of the robot arm.

With the increasing needs for individualized and small-lot production [28], the flexibility of the production line becomes more and more important. Under such circumstances, the above manual adjustment and re-adjustment of robot arms turn out to be a bottleneck for efficient production, where our proposed EDT approach can be utilized. Our approach, applied in the design of the robot arm, allows the robot arm to adapt itself to different tasks, and even to changes in the robot arm constitution. We will describe the construction of a robot arm control EDT based on our proposed method in Section 6.2.

6.2. Construction of a robot arm control EDT

6.2.1. Constructing a robot arm EDT model
The construction of the virtual product and the physical product is described in Figs. 14 and 15. As the environment is simply an object with a table and a floor, the approximate world could be established deterministically.

For the construction of the virtual product, geometric data and dynamic characteristic data are first collected through the product description and geometric measurements. These collected data are further utilized for the geometric modelling and the dynamic modelling of the virtual product model. Concerning these two modelling processes, Unity [30] is adopted as the modelling tool, as it supports both 3D modelling and dynamic modelling, and hence could combine these two parts together, forming a unified virtual product model.

Moreover, in the process of dynamic modelling, the aforementioned partially parameterized model is adopted. In this model, the dynamics of each arm and joint are determined through dynamic-characteristic-based modelling, while the interactions between different arms and joints are modelled through neural networks. Specifically, the mass and the moment of inertia of each arm, and the angle range and the motor torque range of each joint, are determined manually, while the joint angles between successive arms, together with their changing rates, are provided by the neural networks. Based on this, the behavior of a virtual product model could be described by joint angles, joint angle velocities, and joint positions.

As for the policy module that is in charge of generating task oriented

Fig. 14. Virtual product model construction process.


Fig. 15. Physical product construction process.

control commands, on the other hand, it is also modelled as a neural network taking the task-related information (for example the position of the cargo in a cargo-fetching task) as input, and it outputs the change rate of the joint angles.

Compared to the construction of the virtual product, that of the physical product is less complicated. As depicted in Fig. 15, the raw robot arm is first equipped with different sensors, including goniometers, angular velocity sensors, industrial cameras, etc. These sensors collect measurement data including but not limited to the joint positions, the joint angles and the joint angle change rates, which could be further utilized in the evolution of the virtual product model. Furthermore, the physical product is connected to a computer where the policy module and the converter are equipped. Of these two parts, the former is only a copy of the policy module of the virtual product, while the latter converts the target joint angle values generated by the former into target joint angle command signals which could be directly applied to the robot arm machine.

6.2.2. Robot arm EDT evolution
With the EDT model constructed above, the evolution of the robot arm EDT is implemented as follows. For the supervised learning based behavior module evolution, the implementation is simply an application of the backpropagation process on a structured neural network, with one neural network part for each correlation between arms. The network training process is as described in Section 4.2.

The evolution of the policy module is a bit more complicated. In this case, the Deep Deterministic Policy Gradient (DDPG) [22] algorithm is adopted as the reinforcement learning algorithm to drive the evolution of the policy module. The DDPG algorithm, originating from the Deterministic Policy Gradient (DPG) [27] algorithm, is proposed for solving reinforcement learning problems, especially those with a continuous action space, where an agent needs to learn an action policy $\pi$ through interaction with an environment, aiming at maximizing the expected return from the start, $R_1$:

$$R_t = \sum_{i=t}^{T} \gamma^{i-t} r(s_i, a_i) \quad (1)$$

The DDPG algorithm utilizes an actor-critic structure, with the actor $\pi(s; \theta^{\pi})$ serving as the policy function and the critic $Q(s, a; \theta^{Q})$ as the action-value function, parameterized respectively with $\theta^{\pi}$ and $\theta^{Q}$, where $s$ denotes the observation state and $a$ represents the action taken. In the training process, $\theta^{Q}$ is updated via minimizing the expected value of the temporal difference error

$$E_{\pi'}\!\left[ Q(s_t, a_t; \theta^{Q}) - \left( r(s_t, a_t) + \gamma Q(s_{t+1}, a_{t+1}; \theta^{Q}) \right) \right] \quad (2)$$

with $\pi'$ an $\varepsilon$-greedy policy based on the policy $\pi$, while the policy parameter $\theta^{\pi}$ is updated in the direction of the deterministic policy gradient through the equations

$$\delta_t = r(s_t, a_t) + \gamma Q(s_{t+1}, \pi(s_{t+1}); \theta^{Q}) - Q(s_t, a_t; \theta^{Q}) \quad (3)$$

$$\theta^{\pi}_{t+1} = \theta^{\pi}_t + \alpha_{\pi} \, \nabla_a Q(s, a; \theta^{Q})\big|_{s=s_t,\, a=\pi(s_t; \theta^{\pi})} \, \nabla_{\theta^{\pi}} \pi(s; \theta^{\pi}_t)\big|_{s=s_t} \quad (4)$$

With the experience replay and target network mechanisms further introduced, the DDPG algorithm further stabilizes the evolution process.

Applying the DDPG algorithm to our case, the observation state $s$ is designed as follows. The observation state provided by the behavior module of the virtual product is composed of the variables below:

$$\begin{cases} \mathit{diff\_jt}_i = (\mathit{joint}_i - \mathit{tgt})/2 \\ \mathit{diff\_jj}_i = (\mathit{joint}_i - \mathit{joint}_0)/2 \\ \mathit{diff\_th}_j = (\mathit{tpoint}_j - \mathit{hpoint}_j)/4 \quad j = 1, 2, 3, 4 \\ \mathit{diff\_hj}_{ij} = (\mathit{hpoint}_j - \mathit{joint}_i)/4 \quad j = 1, 2, 3, 4 \\ \mathit{collision} \end{cases} \quad (5)$$

where $\mathit{joint}_i$ denotes the three-dimensional position of joint $i$, $\mathit{tgt}$ represents the three-dimensional position of the target, $\mathit{tpoint}_j$ is the three-dimensional position of the point $j$ just beneath the object, $\mathit{hpoint}_j$ is the three-dimensional position of the point $j$ on the gripper, and $\mathit{collision}$ is the occurrence of a collision. From these five parts, the final observation of the agent is

$$\mathit{observation} = [\, s_1 \; s_2 \; s_3 \; s_4 \; s_5 \,]$$
$$s_i = [\, \mathit{diff\_jt}_i \; \mathit{diff\_jj}_i \; \mathit{diff\_hj}_{i1} \; \mathit{diff\_hj}_{i2} \; \mathit{diff\_hj}_{i3} \; \mathit{diff\_hj}_{i4} \,] \quad \forall i \in \{1, 2, 3, 4\}$$
$$s_5 = [\, \mathit{diff\_th}_1 \; \mathit{diff\_th}_2 \; \mathit{diff\_th}_3 \; \mathit{diff\_th}_4 \; \mathit{collision} \,]$$

For the design of the reward function $r(s_i, a_i)$ that guides the robot


arm to converge to the optimal policy through the DDPG algorithm, the reward is designed to guide the gripper to the position below the object, with

$$\begin{cases} \mathit{jre}_1 = \left( \sum_{i=0}^{3} \| \mathit{tpoint}_i - \mathit{hpoint}_i \| \right) / 4 \\ \mathit{jre}_2 = \left( \sum_{i=0}^{3} | \mathit{hpoint\_x}_i | \right) / 4 \\ \mathit{jre}_3 = \left( \sum_{i=0}^{3} | \mathit{hpoint\_y}_i | \right) / 4 \\ \mathit{jre}_4 = \cos(\mathit{hvect}, \mathit{tvect}) \end{cases} \quad (6)$$

where $\mathit{jre}_i$, $i = 1, 2, 3, 4$, denotes reward part $i$, $\mathit{hpoint\_x}$ represents the $x$ coordinate of $\mathit{hpoint}$, and $\mathit{hpoint\_y}$ the $y$ coordinate of $\mathit{hpoint}$. $\mathit{hvect}$ represents the normal vector of the gripper plane, and $\mathit{tvect}$ is the normal vector of the object-bottom plane.

Finally, these two evolution processes above are combined together according to the coevolution mode designed in Section 4.
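A direct transcription of the four terms of Eq. (6) might look like the following. The gripper points, under-object points, and plane normals in the example are illustrative placeholders, and how the four parts are weighted into a single scalar reward is not specified here:

```python
import math

def reward_parts(tpoints, hpoints, hvect, tvect):
    """The four reward terms of Eq. (6) for one observation."""
    jre1 = sum(math.dist(t, h) for t, h in zip(tpoints, hpoints)) / 4  # mean gripper/under-object distance
    jre2 = sum(abs(h[0]) for h in hpoints) / 4                         # mean |x| of the gripper points
    jre3 = sum(abs(h[1]) for h in hpoints) / 4                         # mean |y| of the gripper points
    dot = sum(a * b for a, b in zip(hvect, tvect))
    jre4 = dot / (math.hypot(*hvect) * math.hypot(*tvect))             # cosine of the plane-normal angle
    return jre1, jre2, jre3, jre4

# Gripper sitting exactly 1 unit below the object, planes parallel:
tp = [(0.0, 0.0, 1.0)] * 4
hp = [(0.0, 0.0, 0.0)] * 4
print(reward_parts(tp, hp, (0.0, 0.0, 1.0), (0.0, 0.0, 1.0)))  # (1.0, 0.0, 0.0, 1.0)
```

Driving jre1, jre2 and jre3 down while driving jre4 toward 1 corresponds to the stated goal of guiding the gripper to the position below the object with the two planes aligned.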

6.3. Results and discussions

Fig. 16. Episodic accumulated reward evolution.

Based on the settings described above, in our simulation experiment, additional Gaussian noise has also been applied to the observation of the agent, i.e. the observation introduced in Section 6.2.2, which forces our agent to learn the control policy instead of overfitting to a single application scene. Moreover, with the same models and algorithms, objects placed at different positions around the robot could be fetched successfully by the robot arm controlled by our trained policy model. Results of a single object-reaching task are shown in Fig. 17.

Fig. 17. Control precision of the robot arm in the cargo fetching task (distance between the robot hand and the bottom of the object).

The episodic accumulated reward evolution chart shows the variation of the accumulated reward value that the agent could gain in each episode throughout the model evolution process. As expressed in Fig. 16, at the beginning, the reward seems to be only random noise, which indicates that our designed policy module has not yet captured the rules of the task and just behaves randomly. Then, with the progress of evolution, the episodic reward starts to rise with a gradually narrowing variation range. This shows that our policy module has mastered the optimal control policy, indicating the effectiveness of our approach. This could be further confirmed by the results on distance variation over time with a well-trained policy module, as shown in Fig. 17. From Fig. 17, it could be seen that, with the progress of time, the distance between the robot arm hand and the bottom of the object decreases and rapidly falls below 1 cm. This result shows that through the control policy learned by the policy module, the robot arm
could accomplish cargo fetching tasks as expected, which confirms the results above.

7. Conclusion and future work

Motivated by the core idea of developing intelligent industrial products with autonomous learning and self-adaptation capability, we proposed an EDT approach. The contributions of this paper mainly include:

First, a new concept of EDT has been proposed in this paper via machine learning approaches, to address the lack of flexibility and adaptability in traditional industrial product development. With the newly proposed EDT, a more precise virtual product, together with behavior policies of higher performance, could be developed.

Second, a coevolution approach of the approximate world and the product has been proposed, where the former converges to the behavior of the real world and the latter explores excellent behavior policies that could be applied in the real world. Moreover, multiple virtual spaces have been designed, allowing for more efficient and effective design of different policy models corresponding to different situations, which in turn could largely improve the adaptability of the established policy model to varying and uncertain environments.

Third, two evolution paradigms for the virtual product, namely the simple evolution paradigm and the model evolution paradigm, have been proposed. The former allows policy improvement via super real-time deep search based on techniques like Monte Carlo Tree Search, while the latter adopts a reinforcement learning approach to ameliorate the policy performance of the product, based on recognition abstracted through supervised learning and unsupervised learning. Moreover, via transfer learning, the gap between the virtual space and the physical space was merged for application of the learned policy.

Our future work will concentrate on the following two aspects:

(1) the abstract model and evolution formalism which form the theoretical basis of intelligent industrial product development.
(2) the system of swarm intelligent industrial product systems developed based on EDT, which supports collaborative recognition and decision with multiple agents.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence


Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The research is supported by the National Key R&D Program of China under Grant No. 2018YFB1701600 and the Beijing Institute of Technology Research Fund Program for Young Scholars.
