Evolutionary Digital Twin: A New Approach For Intelligent Industrial Product Development
A R T I C L E  I N F O

Keywords:
Evolutionary digital twin
Intelligent industrial product
Collaborative evolution
Approximate world
Multiple cyber spaces
Simple evolution paradigm
Model evolution paradigm

A B S T R A C T

To fulfill increasingly difficult and demanding tasks in the ever-changing complex world, intelligent industrial products need to be developed with higher flexibility and adaptability. The digital twin (DT) offers a possible means, due to its ability to provide candidate behavior adjustments based on the "feedback" received from its physical part. However, such candidate adjustments are deterministic, and thus lack flexibility and adaptability. To address this problem, this paper proposes an extended concept, the evolutionary digital twin (EDT), and an EDT-based new mode for intelligent industrial product development. With our proposed EDT, a more precise approximate model of the physical world can be established through supervised learning, based on which the collaborative exploration for optimal policies via parallel simulation in multiple cyber spaces can be performed through reinforcement learning. Hence, more flexibility and adaptability can be brought to industrial products through machine learning (such as supervised learning and reinforcement learning) based self-evolution. As a primary verification of the effectiveness of our proposed approach, a case study has been carried out. The experimental results confirm the effectiveness of the EDT-based development mode.
* Corresponding author.
E-mail address: [email protected] (C. Yang).
https://doi.org/10.1016/j.aei.2020.101209
Received 10 January 2020; Received in revised form 7 October 2020; Accepted 16 November 2020
Available online 1 January 2021
1474-0346/© 2020 Elsevier Ltd. All rights reserved.
T.Y. Lin et al. Advanced Engineering Informatics 47 (2021) 101209
Digital twin technology, which integrates the cyber space and the physical space [5], has the potential to increase the flexibility, adaptability and intelligence of the industrial product. First, the virtual product (digital model) could better perceive the real world through the real-time "feedback" from its physical counterpart; second, the virtual product could synthesize the sensing data and provide the physical product with better action policies. However, such a digital twin could hardly solve the above problem effectively. First, faced with demanding tasks, it may be difficult to construct (only by human knowledge) a virtual product that could fully reflect its physical counterpart's characteristics, due to the nonlinearity, uncertainty, self-organization and emergence of the complex world. For example, during establishment of the virtual product of a physical robot arm, the working environment of the robot arm, including the production line layout, the process to be executed, etc., is usually dynamic, subject to perturbation, drastic change or even unknown conditions for different tasks, and is thus extremely difficult to model in advance. Second, the existence of various working conditions makes it difficult to pre-design a policy-making module that can generate effective action policies for the physical product. Taking again the robot arm as an example: as the motion control law is established through human experience based on the virtual product, when faced with totally different tasks the control law may need to be re-designed, and such a process normally takes several months.

Therefore, the concept of digital twin needs to be further developed and extended to enhance its self-learning, self-adaptation and self-growing capability based on machine intelligence.

First, it is necessary to establish multiple cognitive models of the real world, called the approximate worlds, which keep approximating different scenes of the real world through continuous learning. Through the approximate worlds, an IntelliIndusProd could fully understand the real world, and could explore different scenes in parallel, in a low-cost, risk-free and super real-time way, when it has to deal with the uncertainty of the partially observable, non-cooperative and dynamically changing world.

Second, it is necessary to train the virtual product to adapt to the approximate worlds. With the approximate worlds gradually approaching the real world, the self-growing virtual product could finally adapt to the real world and give better action policies for the physical product. In addition, we could train multiple virtual products in parallel in a single approximate world, which could generate more data through simulation to speed up the exploration process. In other words, the new kind of digital twin is an evolving digital twin equipped with multiple cyber spaces, namely multiple approximate worlds, where the virtual product could evolve its behavior and policy to better approximate the physical product and make better decisions.

This paper proposes an Evolutionary Digital Twin (EDT) approach for IntelliIndusProd development, and is organized as follows: Section 2 presents related work; Sections 3 and 4 introduce the EDT concept and the architecture of an EDT; Section 5 discusses the concrete establishment of a product design oriented EDT in detail; Section 6 describes an application case study and discusses related results; Section 7 concludes the paper with future work.

2. Related work

2.1. Simulation driven product design

The innovation of digital design and manufacturing is the key to this new round of industrial revolution [23]. Currently, digital methods for product development can be used to quickly define and design a prototype, not only in terms of product structure, but also of product function and behavior [24]. The multi-disciplinary virtual prototype could support unified modeling and collaborative simulation across different disciplines to evaluate and optimize virtual products in the cyber space [25]. Model-based systems engineering could establish the entire life cycle model of a product through multiple views, supporting the continuous evolution and verification of the product [26]. An advanced parallel simulator under multi-core environments has been proposed to address the challenges of collaborative simulation of complex variable-structure systems that exhibit changes at both the structural and behavior levels, with increasingly big and complex models [31]. Recently, digital twin-driven product design approaches have been proposed to optimize the design of the virtual product based on the data feedback from the physical product [6]. However, the above approaches mainly depend on human knowledge and abstraction capability, without fully utilizing the power of new AI algorithms and models.

2.2. Digital twin and parallel control

The first formal definition of digital twin dates back to 2012, given by Glaessgen and Stargel [5] at NASA: a digital twin is an integrated multi-physics, multiscale, probabilistic simulation of an as-built system that uses the best available physical models, sensor updates, history data, etc., to mirror the life of its corresponding physical product [5]. Rosen et al. [8] give another definition from the perspective of autonomous systems, which are required to respond rapidly to unexpected events without central re-planning. According to [8], a digital twin is an ultra-realistic model that reflects the state of the process and the behavior of the autonomous system in interaction with its environment in the real world.

From the above definitions, it can be summarized that a digital twin is an ultra-realistic model of the physical system, established at different scales, synthesizing knowledge, data and physical models from different domains and equipment. Such a definition, in fact, places great emphasis on model validity, which is the core value of digital twin but is difficult to achieve in engineering, even though related concepts, architectures [9] and even key technologies have already been proposed. This is mainly due to the fact that human knowledge is limited in the face of the complex real world. Thus, starting from current human cognition of the world with traditional theories, we could scarcely build a digital twin that reproduces 100% of the behavior and characteristics of its physical product, especially the details. However, when complex systems are taken into consideration, a minor mistake in such details can result in huge differences in behavior. Furthermore, to our knowledge, most research on digital twin concentrates on optimizing the performance of the physical product through human knowledge based optimization of the virtual product [8,10]. Like the modelling precision of approaches based totally on human knowledge, such optimization ability is also limited.

Faced with a similar difficulty in the control law design domain, Fei-Yue Wang et al. [11] proposed the parallel control theory. In this theory, they proposed constructing an equivalent artificial system (not exactly the same as the physical one) that works in parallel with the real-world system. In this artificial system, planning and optimization algorithms can be designed and experimented with for better control laws. Meanwhile, the behavior of this artificial system is further corrected by the data collected from the real system. Although parallel control theory is promising and potentially effective in control law design, where the core purpose is to eliminate deviations, it might be less applicable in cases where policies should be made based on human knowledge or precise knowledge about the state or evolution dynamics of the system, for example autonomous collaborative robot arm pairs on the production line. Even so, the concept of parallel control has provided good insights for the design of our EDT.

2.3. Reinforcement learning based design

Recent years have witnessed the rapid development of machine learning, especially deep learning and deep reinforcement learning. For example, in 2016, AlphaGo, designed by DeepMind of Google, mastered the game of Go and defeated the world champions Lee Sedol and Ke Jie by big scores [12]. With this victory, DeepMind published their research on
the development of AlphaGo in Nature, followed by another one on AlphaZero, which mastered the game of Go without human supervision. In 2019, faced with the more challenging task StarCraft II, AlphaStar [13], developed by DeepMind, again defeated professional human players, with a score of 10:1. From the details revealed in the articles [13], it can be seen that, provided with a perfectly described interaction environment and task targets, the agent could converge through autonomous learning to behavior strategies largely superior to those manually developed, or even to optimal strategies. Hence, the reinforcement learning based design pattern has started to attract increasing attention in both industry and academia, and is being applied to designs in different domains.

Among these application domains, the design of robot control algorithms has been widely studied and has been practically applied in the control of real robots. Asada et al. [14] developed a ping-pong-playing robot based on the Q-learning algorithm [16]. Their control algorithm could drive the robot to hit the ball to required positions based only on visual information. Deisenroth et al. [15] proposed a model based policy search method to train a robot to accomplish a block-building task. Although the application systems in the above work were real robot machines, the robot control commands were constrained and limited by human-scripted rules when applied. This may hinder better policy discovery in complicated tasks, as human knowledge or cognition is, to some extent, limited. Deep reinforcement learning based robot control design (without human-scripted rules in control commands), on the other hand, could hardly be applied to real robots, and still stays at the simulation stage [17,18]. Zhang et al. [19,20] applied the DQN algorithm to train a three-joint robot for a grasping task in a simulation environment. However, when applied to a physical robot machine, the performance was less satisfactory due to the differences between the simulation environment and the real world. Similar problems also exist in other work [21].

Hence, to address the above problems encountered in simulation driven product design, we propose the concept of the EDT, which allows the designed product to persistently and intelligently optimize itself over its whole lifecycle. With our proposed EDT, learning agents can be equipped with a model of high and gradually increasing confidence, which in turn converges to behavior policies with higher performance in real products.

3. EDT and persistent reinforcement based product design paradigm

3.1. Common product development process

The common product development process follows a step-by-step procedure which is mainly driven and led by humans across three worlds: the expected world, the interpreted world and the external world [24]. The product lifecycle can be roughly divided into three steps, as shown in Fig. 1. First, in the primary design step, according to the ideal world (corresponding to the expected world in [24]), which is based on the human understanding of the real world (corresponding to the external world in [24]), an ideal product model is established by means of mathematical and physical modeling in the cyber space. Second, in the detailed design, simulation and test step, an approximate world (corresponding to the interpreted world in [24]) of the real world is constructed through modeling and simulation, together with possible semi-physical simulation in verification. In this step, a digital or semi-physical prototype product based on the approximate world is established in the cyber space or the physical space. Finally, in the operation and service step, a real product is fabricated and released to provide services. In the traditional development mode, the function, performance and structure of the final product are fixed, with extremely limited flexibility and adaptability. Facing new scenarios and requirements, corrections or innovations from humans are needed, and the above three steps need to be repeated. Usually, each product development cycle incurs high cost and takes a long time (several years in extreme cases), which cannot meet users' needs for fast product delivery, upgrades and iterations. In other words, this product development mode cannot support fast product innovation and development (within a relatively short timeframe) in the coming 4th industrial revolution era.

3.2. Connotation of EDT

To address the above issue, based on the industrial Internet and integrating new-generation ICT, new-generation AI technology and product development field technology, the EDT provides a novel product development mode in which different forms of learning and searching in multiple parallel cyber spaces are introduced, allowing the IntelliIndusProd to evolve and possess better adaptability.
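The development mode described above can be read as a closed loop: measurements from the physical product refine an approximate world model, better policies are searched for in that model, and the winning policy is pushed back to the physical product. The toy sketch below illustrates one such cycle; the scalar linear world model, the gain grid, and all function names are our own illustrative assumptions, not part of the paper:

```python
import random

def fit_world_model(transitions):
    """Supervised-learning step (illustrative): estimate the gain a in
    s' = a*s + u from observed (s, u, s') transitions (least squares)."""
    num = sum((s_next - u) * s for s, u, s_next in transitions)
    den = sum(s * s for s, u, s_next in transitions)
    return num / den if den else 1.0

def search_policy(a, gains, horizon=20, s0=1.0):
    """Policy-search step (illustrative): pick the feedback gain k
    (action u = -k*s) that best drives the state to zero when rolled
    out in the *learned* approximate world s' = (a - k)*s."""
    def cost(k):
        s, c = s0, 0.0
        for _ in range(horizon):
            s = (a - k) * s
            c += s * s
        return c
    return min(gains, key=cost)

def edt_loop(real_a=0.9, cycles=3):
    """One evolutionary cycle: observe -> fit model -> search policy,
    repeated; the policy found in simulation is applied to the 'real'
    system in the next cycle."""
    k = 0.0
    transitions = []
    rng = random.Random(0)
    for _ in range(cycles):
        s = rng.uniform(0.5, 1.5)
        for _ in range(10):  # interact with the "real world"
            u = -k * s
            s_next = real_a * s + u
            transitions.append((s, u, s_next))
            s = s_next
        a_hat = fit_world_model(transitions)
        k = search_policy(a_hat, gains=[0.0, 0.3, 0.6, 0.9, 1.2])
    return a_hat, k
```

With these toy dynamics the loop recovers the true gain and the dead-beat feedback policy; a real EDT would replace the scalar model with learned behavior modules and the grid search with reinforcement learning.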
The EDT, as an extension of the DT, is also composed of virtual and physical parts, namely the virtual product and the physical product in terms of industrial product development. Mainly based on machine intelligence and supplemented by human intelligence, the EDT could help effectively establish the operation law under various uncertainties and gradually achieve a well approximated model of the real world. Such capability could guide product design improvement and operation optimization, and hence support the development of IntelliIndusProds with enough flexibility and adaptability for the complex world and its tasks.

The EDT, as shown in Fig. 2, includes some notable features:

(1) Compared with the current digital twin, the EDT explicitly builds an approximate world corresponding to the real world, which can focus on and simulate specific aspects (views) of the real world according to the R&D requirements. Besides, via the EDT, the established approximate world can support repeated tests and experiments of the designed products and, at the same time, evolve with the new product designs.
(2) Compared with the current digital twin, whose cyber space and physical space are mapped through a bijection, the EDT builds multiple cyber spaces to construct different models of the real world with uncertainties, and of the product at different resolutions, from different aspects (views) of the real world according to the development demands.
(3) Compared with the current digital twin, which mainly uses a cyber space to predict the operation effects of the product, the EDT builds a development method using multiple cyber spaces for fast parallel learning and searching, which allows the approximate world to keep approaching the real world, and the product scheme to keep adapting to the real world.

3.3. A new product development paradigm

3.3.1. New process of product development
The process of new product development, mainly using machine intelligence and assisted by human intelligence, is an evolving process based on the EDT, as shown in Fig. 3. First, if the product belongs to a product family, evolution could be performed in the early step of design through learning from the data of similar products fed back by their EDTs, which is similar to the in-use product [29]. Initial models of both the product and the approximate world may differ from, or cannot fully reflect, the real or physical ones. However, through the evolution and learning process, such differences can be gradually decreased. Moreover, along with this evolution process, the action policies of the physical product should be optimized. Second, based on the EDT, the traditional simulation, test, operation and service would no longer be separated processes. Instead, they are combined with the evolution of both the prototype product (virtual product) and the approximate world. While emphasizing autonomous evolution, the above development process does not negate the value of human and theoretical modeling, which, on the contrary, may also play an important role in extracting more value from the data [7].

In this new product development process, there are two paradigms of digital twin evolution: the simple evolution paradigm and the model evolution paradigm, which are discussed in detail as follows.

3.3.2. Simple evolution paradigm
In the simple evolution paradigm, the models and parameters of the product and the world are deterministic; the operation policy is the only adjustable factor. As shown in Fig. 4, at the beginning, the state of the approximate world is mapped from the real world's state. In multiple cyber spaces, the virtual product can execute actions from different operation policies and bring about state changes in the corresponding approximate worlds. The process is a super real-time process. If the policy space is huge, searching and planning algorithms like Monte Carlo Tree Search would be applied to explore the feasible solution space in parallel in multiple cyber spaces. Finally, the policy used in the physical space can be optimized using the searching and evaluation results generated in the cyber space.

3.3.3. Model evolution paradigm
In the model evolution paradigm, the behavior policies are not the only variable for the virtual product of an EDT; the cognition models of the real world would also change, due to the gradually increasing information completeness and certainty.

To this end, as shown in Fig. 5, supervised learning and unsupervised learning could be adopted to construct different cognition models of the real world, supporting the evolution of the approximate world towards the real world, while the reinforcement learning approach could be applied for policy model construction, allowing the search for effective and even optimal behavior policies through interaction between the virtual product and the approximate world.

[Fig. 2 residue: the virtual product lives in the approximate worlds of the multi-cyber space, and the physical product lives in the real world; the two sides are connected through transfer learning, reinforcement learning and (Monte Carlo) tree search.]
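The simple evolution paradigm fixes the world model and varies only the policy, so the search amounts to rolling candidate policies out in several copies of the approximate world (in parallel and in super real time) and keeping the best. A minimal sketch follows, assuming a toy one-dimensional deterministic world and brute-force enumeration in place of the Monte Carlo Tree Search mentioned in the text; all names and the scoring rule are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def rollout(world_step, s0, actions):
    """Simulate one candidate policy (an action sequence) in a private
    copy of the approximate world; return its cumulative reward."""
    s, total = s0, 0.0
    for a in actions:
        s, r = world_step(s, a)
        total += r
    return total

def parallel_policy_search(world_step, s0, action_set, horizon, workers=4):
    """Score candidate action sequences in parallel 'cyber spaces'
    (here simply threads) and return the best sequence and its score."""
    candidates = list(product(action_set, repeat=horizon))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda c: rollout(world_step, s0, c),
                               candidates))
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]

# Toy deterministic world: state is a 1-D position with the goal at 0;
# reward is the negative distance to the goal after each move.
def toy_world_step(s, a):
    s2 = s + a
    return s2, -abs(s2)
```

For huge policy spaces, exhaustive enumeration would be replaced by a guided planner such as MCTS; the structure of the search (many isolated world copies evaluated concurrently) stays the same.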
[Fig. 3 residue: the virtual product and the approximate world, evolving mainly by AI and less by humans, bridge the ideal world, the approximate world and the real world.]
Fig. 3. New product development process.
...
However, differences still exist between the behavior of the virtual product and that of the physical product, due to the existence of errors in training, forming a gap between the cyber space (including the virtual product and the approximate world) and the physical space (including the physical product and the real world). Thus, to mitigate this gap, the transfer learning approach could be applied to develop a product with enough adaptability to the above errors, allowing the converged policy models to be applied in the physical world.

4. EDT based intelligent industrial product development system architecture

4.1. Overview

The intelligence of the IntelliIndusProd is not possible without the effective fusion of big data, algorithms and computing power, for both the supervised learning of the approximate world and the reinforcement learning of the virtual product. (1) Big data: big data is not only collected from the physical space, but also generated in the cyber space. (2) Algorithms: algorithms mainly operate in the data analysis engine, the intelligent optimization engine and the machine learning engine. (3) Computing power: enough computing power is needed to support the data generation of the approximate world and the virtual product, and to support the costly operation of the corresponding engines.

Furthermore, as the EDT contains multiple cyber spaces, much more computing power is needed to process big data from the different cyber spaces. An IntelliIndusProd often runs on the industrial edge, where computing resources and data processing capability are limited. Therefore, the development system should adopt an architecture that integrates the cloud and the edge computing power.

As shown in Fig. 6, for both the simple evolution paradigm and the model evolution paradigm, the edge would collect and pre-process the data of the physical product and the real world, and then feed the data back to the cloud; the cloud could support the supervised learning of multiple approximate worlds and the reinforcement learning of multiple virtual
...
[Figure residue: supervised/unsupervised learning, reinforcement learning and transfer learning stages arranged along a time axis.]
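The transfer-learning step mentioned above is meant to make policies trained in an imperfect approximate world survive the move to the physical world. One common concrete tactic, training the policy against a family of perturbed world models so that it tolerates modelling error (often called domain randomization), can be sketched as follows. The linear world family, the worst-case selection criterion and all names are our illustrative assumptions, not the paper's prescribed method:

```python
import random

def make_world(a):
    """A member of a family of approximate worlds s' = a*s + u that
    differ only in the (imperfectly known) gain a."""
    def step(s, u):
        return a * s + u
    return step

def robust_gain(a_nominal, spread, gains, horizon=15, n_worlds=8, seed=0):
    """Domain-randomization-style policy selection: choose the feedback
    gain k (u = -k*s) with the best *worst-case* cost across several
    perturbed worlds, so the resulting policy tolerates errors in the
    learned model when deployed on the physical product."""
    rng = random.Random(seed)
    worlds = [make_world(a_nominal + rng.uniform(-spread, spread))
              for _ in range(n_worlds)]
    def worst_cost(k):
        worst = 0.0
        for step in worlds:
            s, c = 1.0, 0.0
            for _ in range(horizon):
                s = step(s, -k * s)
                c += s * s
            worst = max(worst, c)
        return worst
    return min(gains, key=worst_cost)
```

A gain chosen this way is close to the nominal optimum but is certified against the whole perturbed family rather than against a single possibly wrong model.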
product instances efficiently. The edge would always update to the latest intelligence to improve flexibility and adaptability.

4.2. Hierarchical description

As shown in Fig. 7, the system architecture of the EDT based IntelliIndusProd development consists of several layers, including the Physical layer, the Access/communication layer, the Edge processing platform layer, the Cloud development service platform layer and the Application layer, along with two cross-cutting parts: Information security management, and Standards and specifications. The Cloud development service platform layer further includes the Virtualizing layer, the Cloud development service support layer and the Portal layer. The details of this architecture are explained as follows.

(1) Physical layer

It not only includes the physical product and the real world, but also the related development resources and capabilities (such as computing power) required for the operation of multiple cyber spaces.

(2) Access/communication layer

It not only includes the access and communication of the edge to the physical product and the real world, but also the access and communication of the cloud to each edge.

(3) Edge processing platform layer

It not only supports the time-efficient operation of the intelligent industrial product, but also supports its local evolution by providing a data analysis engine, an intelligent optimization engine and a machine learning engine, along with local data and computing power.

(4) Cloud development service platform layer

1) Virtualizing layer

It not only includes the virtualization of traditional development resources and capabilities such as computing power, but also the virtualization of the physical product and the real world, namely the virtual product and the approximate world. The virtual product and the approximate world could be encapsulated by container technology such as Docker, and could be created as multiple instances in the cloud or be deployed to edge computing devices.
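The point of containerizing the virtual product and the approximate world is that many isolated instances can be stamped out from one template and evolve independently. The isolation can be imitated locally as follows; this is a stand-in for container instantiation, not actual Docker tooling, and all names are illustrative:

```python
import copy

class WorldInstance:
    """One isolated copy of the approximate world (a stand-in for a
    containerized instance running in the cloud or at the edge)."""
    def __init__(self, model, instance_id):
        # Deep-copy the template so instances evolve independently,
        # mirroring the isolation a container image instance provides.
        self.model = copy.deepcopy(model)
        self.instance_id = instance_id
        self.steps = 0

    def step(self, action):
        """Advance this instance's private world state."""
        self.steps += 1
        self.model["state"] = self.model["a"] * self.model["state"] + action
        return self.model["state"]

def create_instances(model, n):
    """Create n isolated world instances from one template model."""
    return [WorldInstance(model, i) for i in range(n)]
```

Stepping one instance leaves the template and the sibling instances untouched, which is what allows many virtual products to explore in parallel without interfering with each other.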
[Fig. 7 residue: the cloud and the edge exchange data feedback, model training and optimization, model updates/decision support and app services; within the Cloud development service platform layer, the Virtualizing layer hosts cloud virtual development resources, the cloud virtual intelligent industrial product and approximate world, and cloud virtual development capabilities; the Application layer sits on top.]
2) Cloud development service support layer

(a) Basic services

The basic services module provides the core support of the whole system, including data support, computing power support and algorithm support, in the form of services. Data as a service (DaaS) manages and provides the data collected from both the real-world systems and the virtualized approximate-world systems, to support the data driven evolution process of the virtual intelligent industrial product. Infrastructure as a service (IaaS) provides elastic computing power for large scale parallel simulation of multiple virtual intelligent industrial products in multiple approximate world instances, further supported by the high performance cloud simulation engine based on the Docker technique. Platform as a service (PaaS), realized via the data analysis engine, the intelligent optimization engine and the machine learning engine, supports the evolution of the EDT with abundant powerful algorithms.

(b) Application services

The application services module provides shared domain related applications. With this module, users can define data driven intelligent industrial product development tasks with the assistance of a system modelling and simulation language and a collaborative design service (Software as a Service, SaaS). Large scale simulation instances are defined and executed in this part, continuously generating data from the interaction between the virtual intelligent industrial product and the approximate world, and supporting the collaborative evolution of the virtual model.

3) Portal layer

Stakeholders, including end users, can jointly carry out various activities over the whole life cycle of the intelligent industrial product, such as describing the ideal product and supervising the evolution of the prototype product.

(5) Application layer

It reflects that, for both the simple evolution paradigm and the model evolution paradigm, the EDT could evolve the intelligent industrial product to develop real intelligence based on the integration of cloud and edge.

5. Product development-oriented EDT construction and evolution

5.1. EDT construction design

As described above, the EDT has the characteristic of persistent enhancement in terms of both its virtual product's precision and its physical product's policy optimization. Thus, it should be constructed naturally equipped with the ability to evolve, in terms of both the virtual product and the physical product.

To achieve this, the virtual product is designed to be composed of a totally or partially parameterized policy module and a behavior module that is also parameterized. The function of the latter is to approximate the behavior of the physical product, while that of the former is to generate action commands for the latter to accomplish tasks. The physical product, on the other hand, is designed to be composed also of a totally or partially parameterized policy module, whose parameters are a copy, through updates, of those of the virtual product's policy module, as shown in the example of a robot arm EDT in Fig. 8. As for the approximate world, it is also equipped with a behavior module of a similar kind and function to that of the virtual product.

With such a construction design, the upgrade of the virtual product relies on model adjustment based on the measurement data collected from the physical world and the simulation data from the approximate world, while that of the physical product relies on the update of its behavior policy and the application of the generated policy.

It is thus required that the physical product is able to apply the generated policy, and that the physical product, the virtual product and the approximate world are adjustable through programs. Based on this, the construction of the dual parts of the EDT is discussed in detail in the following parts.
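Because the virtual and physical products carry the same parameterized policy module, the policy update described above is literally a parameter copy from the virtual module to the physical one. A minimal sketch; the linear policy form and all names are our assumptions:

```python
class PolicyModule:
    """A (totally) parameterized policy: action = w1*obs + w0.
    The same module class is instantiated in both the virtual and the
    physical product; only the parameter values differ over time."""
    def __init__(self, w1=0.0, w0=0.0):
        self.params = {"w1": w1, "w0": w0}

    def act(self, obs):
        return self.params["w1"] * obs + self.params["w0"]

    def load_from(self, other):
        # Policy update step: overwrite our parameters with a *copy* of
        # the (better-trained) virtual product's parameters, so later
        # training in the cloud cannot mutate the deployed policy.
        self.params = dict(other.params)

virtual_policy = PolicyModule()
physical_policy = PolicyModule()

# ...after training in the approximate world, the virtual policy has
# converged to some parameters (values here are arbitrary)...
virtual_policy.params.update({"w1": -0.8, "w0": 0.1})
physical_policy.load_from(virtual_policy)
```

After `load_from`, the physical module reproduces the virtual module's behavior while remaining an independent object, matching the text's requirement that the physical product be updatable through programs.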
measurement data and an actuator and verifier of the policy learned by (2) Policy update and application
the virtual product. Thus, the physical product should have the ability to
convert the policy into commands directly applicable to its physical sub- To automatically update the policy of the physical product, a
components and continuously collect measured data about itself and the parameterized policy module which is the same with that of the virtual
surrounding environment. Based on these functional requirements, the product is constructed. At each policy update, the parameters of the
following key points should be well determined in the construction of the physical product in an EDT.

(1) Measurement data collection and preprocessing

Measurement data of a physical system originate from measurements from different sensors which, taking the robot arm EDT shown in Fig. 9 as an example, include angular sensors, angular velocity sensors, cameras, infrared sensors, etc. Therefore, in order to gather sufficient data with a wide enough range for training the virtual product, the physical product needs to be equipped with a wide range of accurate sensors. Meanwhile, as different sensors may have different sample frequencies, a pre-processing module in the edge processing platform layer is needed to unify the time lines of the different measurement data, either by unifying the sample frequency or by introducing sample interpolation for measurement data from sensors with a lower sample frequency. Through this coordination process, the data are collected at the same frequency and can be merged to provide the virtual product with a comprehensive measurement data set covering both the physical product and its living environment.

Moreover, besides the time line unification, further data processing is required before the collected data can be transmitted to the virtual product and used for training, due to the existence of noise, abnormality and misalignment in the preprocessed data. Thus, in the operation mode, the data flow from the physical world into the sensors, pass through the pre-processing module and the filter and alignment module, and are finally transmitted to the cloud, where they can be utilized for virtual product training.

The parameters of the physical product's policy module are updated with those of the virtual product's policy module, which in turn updates the physical product's behavior. Moreover, as the policy generated by the policy module is digital, while the motion or dynamics of a physical system is usually continuous, converting modules, such as stepping motors, that convert the digital policy into a continuous behavior policy are equipped, as shown in Fig. 9. These converting modules convert the behavior policy online; the behavior of the robot arm in the example in Fig. 9, for instance, can be controlled directly or indirectly by the frequency of the pulse signal generated by the policy module.

5.1.2. Virtual product construction

In an EDT, the virtual product, composed of a policy module and a behavior module, serves as both an imitator and an optimal policy searcher of the physical product. In the operation mode, the behavior module corrects its behavior based on the measurement data received from its physical counterpart, and the policy module searches for an optimal behavior policy in the approximate world with the support of the high-performance computer cluster in the cloud. Accordingly, both the behavior module and the policy module of the virtual product should be designed to be auto-adjustable through deterministic or stochastic learning programs.

To make the behavior of the virtual product adjustable through measurement-data-based learning and simulation-based reinforcement learning in the approximate world, both the policy module and the behavior module of the virtual product can be modelled in two parameterized forms: partially parameterized and totally parameterized, as shown in Fig. 10.
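As a minimal illustration of the partially parameterized form, the sketch below combines a theory-based dynamics core (rigid-body acceleration from a known moment of inertia) with a small trainable residual that stands in for the neural-network part. The class name, the linear residual, and the sample data are our own illustrative assumptions, not the paper's actual robot arm model.

```python
class PartiallyParameterizedJoint:
    """Behavior model of one joint: a theory-based core plus a trainable
    residual (a linear stand-in for the neural-network sub-part)."""

    def __init__(self, inertia):
        self.inertia = inertia   # known from product description (non-parameterized part)
        self.w = [0.0, 0.0]      # trainable residual weights (parameterized part)

    def predict(self, torque, velocity):
        # Theory-based part: angular acceleration from rigid-body dynamics
        theory = torque / self.inertia
        # Parameterized part: learned correction for unmodelled friction/coupling
        residual = self.w[0] * velocity + self.w[1]
        return theory + residual

    def sgd_step(self, torque, velocity, measured_accel, lr=0.01):
        # One supervised-learning step on the squared error: only the
        # residual weights are tuned, as in the partially parameterized mode
        err = self.predict(torque, velocity) - measured_accel
        self.w[0] -= lr * 2 * err * velocity
        self.w[1] -= lr * 2 * err
        return err * err

joint = PartiallyParameterizedJoint(inertia=2.0)
# Measurements from a "physical" joint with extra viscous friction:
# true acceleration = torque/2 - 0.1 * velocity
data = [(4.0, 1.0, 1.9), (2.0, 3.0, 0.7), (6.0, 2.0, 2.8)]
for _ in range(2000):
    for torque, vel, accel in data:
        joint.sgd_step(torque, vel, accel)
# After training, the residual recovers the friction term: w ≈ [-0.1, 0.0]
```

Because the physics core already explains most of the behavior, only two residual parameters have to be optimized here, which is the point of the partially parameterized mode.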
T.Y. Lin et al. Advanced Engineering Informatics 47 (2021) 101209
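The sample-frequency unification performed by the pre-processing module (Section 5.1.1) can be sketched as follows; the function name, the sensor rates, and the values are illustrative assumptions rather than the paper's implementation.

```python
def resample_linear(timestamps, values, target_times):
    """Interpolate one sensor stream onto a shared timeline so streams
    sampled at different frequencies can be merged. Timestamps are in
    milliseconds; outside the measured range the nearest sample is held."""
    out = []
    for t in target_times:
        if t <= timestamps[0]:
            out.append(values[0])
        elif t >= timestamps[-1]:
            out.append(values[-1])
        else:
            # locate the first sample at or after t and interpolate linearly
            i = next(k for k, ts in enumerate(timestamps) if ts >= t)
            t0, t1 = timestamps[i - 1], timestamps[i]
            v0, v1 = values[i - 1], values[i]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out

# A 5 Hz angle stream (one sample every 200 ms) lifted onto a 10 Hz timeline
angles_5hz = resample_linear([0, 200, 400], [0.0, 10.0, 20.0],
                             [0, 100, 200, 300, 400])
# angles_5hz -> [0.0, 5.0, 10.0, 15.0, 20.0]
```

After every stream has been resampled onto the same timeline, the per-timestamp vectors can be merged into the comprehensive measurement data set described above.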
In the totally parameterized modelling mode, the virtual product is either modelled by different neural networks according to its desired functions, or established from different components, each of which is constructed by a neural network imitating its behavior. Such a modelling mode normally results in a large number of parameters to be optimized when the behavior of the physical product is complex. Such non-structural characteristics, combined with a large number of adjustable parameters, make the model hard to optimize.

Different from the totally parameterized modelling mode, the partially parameterized mode introduces prior information about the physical product into the modelling process, including the system structure, some deterministic theory-based part models, some manually designed behavior fitting functions, etc., as shown in Fig. 10. In this mode, the non-parameterized approach and the parameterized approach are combined in two ways. In the first way, part of the model is established based on theories, with some influencing parameters determined from outside, for example by a neural network. In the second way, part of the model further consists of a non-parameterized or partially parameterized sub-part and a totally parameterized sub-part. In such a construction form, the number of parameters to be optimized can be drastically decreased, as the parameterized model is restricted to only some parts of the entire system, and the remaining parts are based on empirical or theory-based models. But such a modelling mode has its drawbacks compared to the first mode, such as a more sophisticated training process and a relatively limited behavior fitting capability.

Hence, in the construction of the virtual product's behavior module of an EDT, the extent and the approach of parameterization need to be considered delicately, based on the features of the system and its extensibility. The same also applies to the construction of the approximate world models.

5.2. EDT evolution design

Currently, model evolutions are achieved mainly using machine
learning, deep learning, and reinforcement learning approaches. As for the first two approaches, models are trained mainly in a supervised or semi-supervised mode based on collected data, while model evolution in the last approach is accomplished based on data collected via interactions between the current agent and the environment. All three approaches have achieved good results in various tasks. However, in the evolution process of an EDT, none of these evolution approaches could succeed alone. Hence, in this paper, a coordinated evolution approach designed for the EDT is proposed.

An EDT is composed of a virtual product and a physical product. However, the evolution of an EDT mainly concentrates on the virtual product side, while the evolution of the physical product is achieved through a simple transmission of parameters, given that both the virtual product and the physical product have a policy module with the same structure. Meanwhile, the parameter transmission can be performed once the difference between the policy generated by the up-to-date model and that given by the previous version of the model exceeds a certain threshold. In such a way, the physical product is gradually evolved while its behavior stability is guaranteed at the same time.

The evolution of the virtual product concerns the evolution of its behavior module through supervised learning and that of its policy module through reinforcement learning, which optimizes the behavior of an EDT according to different tasks. Moreover, as both evolutions take place simultaneously, a coevolution strategy that coordinates these two evolution processes also plays a key role in the evolution of the virtual product.

5.2.1. Supervised learning based behavior module evolution

The purpose of the behavior module evolution is to achieve consistency between the behavior of the physical product and that of the virtual product. Taking the robot arm EDT as an example, in a well evolved virtual robot arm, each joint should turn exactly the same angle as that of the physical robot arm, given the same input command. This consistency results in an exact virtualization of the physical robot, which can serve to optimize the control policy of the real robot through reinforcement learning.

Therefore, the training of the behavior module follows the typical supervised learning process. The behavior module is established according to Section 5.1. As shown in Fig. 11, with the parameterized parts built using neural networks, this process takes the input commands, labelled by the output behavior of the physical product, as training data, and takes the mean square error between the output behavior of the virtual product and that of the physical product as the loss function. The parameter optimization is accomplished by minimizing the loss function through tuning the neural network parameters.

5.2.2. Reinforcement learning based policy module evolution

The reinforcement learning process aims to optimize the policy adopted by the EDT. Different from supervised learning processes based on well labelled data, the reinforcement learning process learns through interaction between the agent and its living environment with no pre-collected labelled data, and relies on the virtual product behavior module. The complete training process under the collaborative training framework is shown in Fig. 12.

As shown in Fig. 12, the exact virtualization of the physical product allows the creation of multiple cyberspaces, where the policy module accumulates experience (in the form of collected data, accumulated parameter gradients, etc.) and updates its policy model through parallel interaction with multiple instances of different virtual product models.

In each of the multiple parallel interactions and trainings, the policy module collects observations of the environment and the reward it gains after applying its output policy based on previous observations. With batches of such interaction data collected, the policy module accumulates experience through policy gradient [22] or policy optimization [17] methods. Finally, with the accumulated experience collected from the different interaction and training processes, the parameters of the policy module are updated through application of the synthesized gradient or through direct parameter assignment.

5.2.3. Coevolution of the supervised learning and reinforcement learning

The above two processes of virtual product evolution are carried out independently. However, coupling does exist between them. As stated above, the entire policy optimization process of the policy module is based on the correctness of the virtualization, namely the virtual product behavior module. Thus, if the behavior of the behavior module is largely different from that of the physical product, the behavior policy provided by the policy module, which is trained on data collected through simulated interaction with the virtual product in the approximate world, would be useless or even dangerous if applied in the real world. Accordingly, in the EDT evolution process, the evolution of the policy module does not start until good precision has been achieved by the behavior module.

Furthermore, as described in Section 5.1, the EDT is designed to support, to some extent, system scalability, which indicates that some functions or parts in the physical system could be modified according to the needs of product upgrade. Under such a condition, the behavior module is first updated, followed by the update of the policy module, to adapt to this change, thereby realizing the support of system extensibility.

6. Case study: EDT based development application in robot arm control

In this section, an application of EDT based development in the robot arm control command calculation is presented. We first introduce
Fig. 13. Manual adjustment of robot arms before being installed in a production line.
a robot arm dynamic model in the production line, followed by the design and construction of an EDT for robot arm control. Finally, some related results on the construction process and system performance are discussed.

6.1. Robot arm control in the production line

Manufacturers around the world are turning to automation to help solve labor shortages, increase productivity, and improve product quality. Robot arms provide a cost-effective, flexible, and safe automation solution for a wide range of production tasks, including machining, product packaging, product sorting, etc.

At present, most of the busy robots on the production line are based on manual pre-adjustment, undertaking fixed tasks and running within a non-interference scope. However, with the increasing demand for personalized and intelligent production, they are required to perform diversified and complex tasks, constantly undertake and adapt to new tasks, and work closely with each other or with people autonomously. Hence, they are becoming the IntelliIndusProd that this paper focuses on.

Normally, a robot arm is composed of five subsystems, namely the driving system, the transmission system, the actuators, the control system, and the detecting system. The control system stands in the core position of a robot arm, as it coordinates the dynamics of the motors on different axes to accomplish a production related task. Hence, the design of this control system is the core of the design of a robot arm.

However, the control system of most robot arms can only drive the robot arm to perform predefined action sequences, which are realized through manual guidance and adjustment, as shown in Fig. 13. During this adjustment process, normally, a step-by-step guiding process is needed. In this process, the robot is controlled by a technician through a wired controller to accomplish a given task. The commands sent by the technician through the controller are transformed into executable code lines and stored in the computer. As the task that a robot arm needs to achieve is usually delicate, with high precision, the human-controlled process needs to be slow enough to protect both the robot and the product from being damaged. After this controlled process, the command execution needs to be further accelerated to meet the production needs. As a result, for each robot arm and each task, about two to three months of task oriented adjustment would be required before its utilization on the production line. Furthermore, as the whole adjustment process is task oriented, any changes to the pre-defined task would require a re-adjustment of the robot arm.

With the increasing needs for individualized and small-lot production [28], the flexibility of the production line becomes more and more important. Under such circumstances, the above manual adjustment and re-adjustment of robot arms turn out to be a bottleneck in efficient production, where our proposed EDT approach can be utilized. Our approach, applied in the design of the robot arm, allows the robot arm to adapt itself to different tasks, and even to changes in the robot arm constitution. We will describe the construction of a robot arm control EDT based on our proposed method in Section 6.2.

6.2. Construction of a robot arm control EDT

6.2.1. Constructing a robot arm EDT model

The construction of the virtual product and the physical product is described in Figs. 14 and 15. As the environment simply consists of an object with a table and a floor, the approximate world could be established deterministically.

For the construction of the virtual product, geometric data and dynamic characteristic data are first collected through the product description and geometric measurements. These collected data are further utilized for geometric modelling and dynamic modelling of the virtual product model. Concerning these two modelling processes, Unity [30] is adopted as the modelling tool, as it supports both 3D modelling and dynamic modelling, and hence can combine these two parts together, forming a unified virtual product model.

Moreover, in the process of dynamic modelling, the aforementioned partially parameterized model is adopted. In this model, the dynamics of each arm and joint are determined through dynamic-characteristic-based modelling, while the interactions between different arms and joints are modelled through neural networks. Specifically, the mass and the moment of inertia of each arm, and the angle range and the motor torque range of each joint, are determined manually, while the joint angles between successive arms, together with their changing rates, are provided by the neural networks. Based on this, the behavior of the virtual product model can be described by joint angles, joint angle velocities, and joint positions.

As for the policy module that is in charge of generating task oriented
control commands, on the other hand, it is also modelled as a neural network, taking the task related information (for example, the position of the cargo in a cargo-fetching task) as input and outputting the change rates of the joint angles.

Compared to the construction of the virtual product, that of the physical product is less complicated. As depicted in Fig. 15, the raw robot arm is first equipped with different sensors, including goniometers, angular velocity sensors, industrial cameras, etc. These sensors collect measurement data including, but not limited to, the joint positions, the joint angles and the joint angle change rates, which can be further utilized in the evolution of the virtual product model. Furthermore, the physical product is connected to a computer where the policy module and the converter are equipped. For these two parts, the former is simply a copy of the policy module of the virtual product, while the latter converts the target joint angle values generated by the former into target joint angle command signals, which can be directly applied to the robot arm machine.

6.2.2. Robot arm EDT evolution

With the EDT model constructed above, the evolution of the robot arm EDT is implemented as follows. For the supervised learning based behavior module evolution, the implementation is simply an application of the backpropagation process on a structured neural network, with one neural network part for each correlation between arms. The network training process is as described in Section 4.2.

The evolution of the policy module is a bit more complicated. In this case, the Deep Deterministic Policy Gradient (DDPG) [22] algorithm is adopted as the reinforcement learning algorithm to drive the evolution of the policy module. The DDPG algorithm, originating from the Deterministic Policy Gradient (DPG) [27] algorithm, is proposed for solving reinforcement learning problems, especially those with a continuous action space, where an agent needs to learn an action policy π through interaction with an environment, aiming at maximizing the expected return from the start, R_1, with

R_t = Σ_{i=t}^{T} γ^(i−t) r(s_i, a_i)    (1)

The DDPG algorithm utilizes an actor-critic structure, with the actor π(s; θ^π) serving as the policy function and the critic Q(s, a; θ^Q) as the action-value function, parameterized respectively with θ^π and θ^Q, where s denotes the observation state and a represents the action taken. In the training process, θ^Q is updated by minimizing the expected squared temporal difference error

E_{π′}[(Q(s_t, a_t; θ^Q) − (r(s_t, a_t) + γ Q(s_{t+1}, a_{t+1}; θ^Q)))²]    (2)

with π′ an ε-greedy policy based on the policy π, while the policy parameter θ^π is updated in the direction of the deterministic policy gradient through the equations

δ_t = r(s_t, a_t) + γ Q(s_{t+1}, π(s_{t+1}); θ^Q) − Q(s_t, a_t; θ^Q)    (3)

θ^π_{t+1} = θ^π_t + α_π ∇_a Q(s, a; θ^Q)|_{s=s_t, a=π(s_t; θ^π_t)} ∇_{θ^π} π(s; θ^π_t)|_{s=s_t}    (4)

With the experience replay and target network mechanisms further introduced, the DDPG algorithm further stabilizes the evolution process.

Applying the DDPG algorithm to our case, the observation state s is designed as follows. The observation state provided by the behavior module of the virtual product is composed of the variables below:

diff_jt_i = (joint_i − tgt)/2
diff_jj_i = (joint_i − joint_0)/2
diff_th_j = (tpoint_j − hpoint_j)/4,   j = 1, 2, 3, 4    (5)
diff_hj_ij = (hpoint_j − joint_i)/4,   j = 1, 2, 3, 4
collision

where joint_i denotes the three-dimensional position of joint i, tgt represents the three-dimensional position of the target, tpoint_j is the three-dimensional position of the point j just beneath the object, hpoint_j is the three-dimensional position of the point j on the gripper, and collision indicates the occurrence of a collision. With these five parts, the final observation of the agent is

observation = [s_1 s_2 s_3 s_4 s_5]
s_i = [diff_jt_i  diff_jj_i  diff_hj_i1  diff_hj_i2  diff_hj_i3  diff_hj_i4],   ∀i ∈ {1, 2, 3, 4}
s_5 = [diff_th_1  diff_th_2  diff_th_3  diff_th_4  collision]

For the design of the reward function r(s_i, a_i) that guides the robot
the work reported in this paper.

Acknowledgement

The research is supported by the National Key R&D Program of China under Grant No. 2018YFB1701600 and the Beijing Institute of Technology Research Fund Program for Young Scholars.

References

[1] B.H. Li, X. Chai, B. Hou, et al., An Industrial Internet in the Age of "Intelligence+" - Cloud Manufacturing System 3.0 (Manufacturing Cloud 3.0), in: International Conference on Industrial Internet, 2019.
[2] G. Xiong, J. Hou, F. Wang, T.R. Nyberg, J. Zhang, M.C. Fu, Parallel system method to improve safety and reliability of nuclear power plants, Intell. Control Automation (2011).
[3] F.-Y. Wang, P.K. Wong, Intelligent systems and technology for integrative and predictive medicine: An ACP approach, ACM Trans. Intell. Syst. Technol. 4 (2) (2013) 1–6, https://doi.org/10.1145/2438653.2438667.
[4] E. Lapira, D. Brisset, H. Davari Ardakani, D. Siegel, J. Lee, Wind turbine performance assessment using multi-regime modeling approach, Renew. Energy 45 (2012) 86–95, https://doi.org/10.1016/j.renene.2012.02.018.
[5] E.H. Glaessgen, D. Stargel, The Digital Twin Paradigm for Future NASA and US Air Force Vehicles, in: 53rd Struct. Dyn. Mater. Conf., Special Session: Digital Twin, Honolulu, HI, US, 2012, 1–14.
[6] J. Leng, D. Yan, Q. Liu, H. Zhang, G. Zhao, L. Wei, ... X. Chen, Digital twin-driven joint optimisation of packing and storage assignment in large-scale automated high-rise warehouse product-service system, Int. J. Comput. Integrated Manuf. (2019) 1–18.
[7] L.R. Goldberg, The book of why: the new science of cause and effect, Notices Am. Math. Soc. 66 (07) (2019) 1, https://doi.org/10.1090/noti1912.
[8] R. Rosen, G. von Wichert, G. Lo, K.D. Bettenhausen, About the importance of autonomy and digital twins for the future of manufacturing, IFAC-PapersOnLine 48 (3) (2015) 567–572, https://doi.org/10.1016/j.ifacol.2015.06.141.
[9] T. Gabor, L. Belzner, M. Kiermeier, M.T. Beck, A. Neitz, A simulation-based architecture for smart cyber-physical systems, in: IEEE International Conference on Autonomic Computing, 2016, 374–379.
[10] R. Söderberg, K. Wärmefjord, J.S. Carlson, L. Lindkvist, Toward a Digital Twin for real-time geometry assurance in individualized production, CIRP Annals 66 (1) (2017) 137–140, https://doi.org/10.1016/j.cirp.2017.04.038.
[11] F.Y. Wang, D.R. Liu, G. Xiong, Parallel control theory of complex systems and applications, Complex Syst. Complexity Sci. 9 (3) (2012) 1–12.
[12] D. Silver, A. Huang, C.J. Maddison, A. Guez, D. Hassabis, Mastering the game of go with deep neural networks and tree search, Nature 529 (7587) (2016) 484–489.
[13] V. Zambaldi, D. Raposo, A. Santoro, V. Bapst, Y. Li, I. Babuschkin, et al., Relational deep reinforcement learning, 2018.
[14] M. Asada, S. Noda, S. Tawaratsumida, K. Hosoda, Purposive behavior acquisition for a real robot by vision-based reinforcement learning, Mach. Learn. 23 (2-3) (1996) 279–303, https://doi.org/10.1007/BF00117447.
[15] M.P. Deisenroth, C.E. Rasmussen, D. Fox, Learning to control a low-cost manipulator using data-efficient reinforcement learning, in: Robotics: Science and Systems VII, 2011, 57–64.
[16] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, Human-level control through deep reinforcement learning, Nature 518 (7540) (2015) 529–533, https://doi.org/10.1038/nature14236.
[17] N. Heess, S. Sriram, J. Lemmon, J. Merel, G. Wayne, Y. Tassa, ... D. Silver, Emergence of locomotion behaviours in rich environments, arXiv preprint arXiv:1707.02286, 2017.
[18] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms, arXiv preprint arXiv:1707.06347, 2017.
[19] F. Zhang, J. Leitner, M. Milford, B. Upcroft, P. Corke, Towards vision-based deep reinforcement learning for robotic motion control, arXiv preprint arXiv:1511.03791, 2015.
[20] F. Zhang, J. Leitner, M. Milford, P. Corke, Modular deep q networks for sim-to-real transfer of visuo-motor policies, arXiv preprint arXiv:1610.06781, 2016.
[21] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, W. Zaremba, Hindsight experience replay, Adv. Neural Inform. Process. Syst. (2017) 5048–5058.
[22] T.P. Lillicrap, J.J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Wierstra, Continuous control with deep reinforcement learning, arXiv preprint arXiv:1509.02971, 2015.
[23] S.F. Qin, K. Cheng, Future digital design and manufacturing: embracing industry 4.0 and beyond, Chin. J. Mech. Eng. 05 (2017) 12–14.
[24] J.S. Gero, U. Kannengiesser, The situated function–behaviour–structure framework, Des. Stud. 25 (4) (2004) 373–391.
[25] H. Zhang, H. Wang, D. Chen, G. Zacharewicz, A model-driven approach to multidisciplinary collaborative simulation for virtual product development, Adv. Eng. Inform. 24 (2) (2010) 167–179.
[26] J. Estefan, MBSE methodology survey, Insight 12 (4) (2009) 16–18.
[27] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, M. Riedmiller, Deterministic policy gradient algorithms, in: ICML, Beijing, China, 21-26 June 2014, 387–395.
[28] C. Yang, S. Lan, W. Shen, G.Q. Huang, X. Wang, T. Lin, Towards product customization and personalization in IoT-enabled cloud manufacturing, Cluster Comput. 20 (2) (2017) 1717–1730.
[29] C. Yang, W. Shen, X. Wang, Application of Internet of Things in manufacturing, in: 2016 IEEE 20th International Conference on Computer Supported Cooperative Work in Design (CSCWD), May 2016, 670–675.
[30] Unity User Manual, https://docs.unity3d.com/Manual/ModelingOptimizedCharacters.html.
[31] C. Yang, P. Chi, X. Song, T.Y. Lin, B.H. Li, X. Chai, An efficient approach to collaborative simulation of variable structure systems on multi-core machines, Cluster Comput. 19 (1) (2016) 29–46.