computer systems and other devices that are part of the IoT framework can be effective if they
are intelligent. This intelligence can be imparted by using methods from the field of AI.
2.3 Artificial Intelligence (AI)
McCarthy (1989) introduced AI as a concept, arguing that computers are machines that compute
mathematical formulas but lack common-sense knowledge. For an artificial machine such as a
computer to be considered intelligent, it should be able to provide mathematical reasoning for
common-sense problems. That is one of the fundamental arguments that conceptualized the field
of AI as we know it today. Meanwhile, Russell and Norvig defined AI more broadly, extending
the definition to thinking or acting humanly and/or rationally (Russell & Norvig, 2002).
For a computer system to perform a rational or human-level intelligent task, the machine needs
to learn from its environment and operate under the given constraints; this is called machine
learning. Goodfellow et al. (2012) defined machine learning as the ability of a system to learn
or identify patterns from data and to make decisions based on the identified patterns. Machine
learning has evolved over the years and is now used in many industry-wide problems. The rise of
smart devices and data has created a huge opportunity for machine learning methods to be
applied to problems across the board. That is not to say that machine learning methods can only
be applied to large amounts of data. Traditionally, machine learning has been categorized into
three main categories, namely: supervised learning, unsupervised learning, and reinforcement
learning (Goodfellow et al., 2012). In supervised learning, an algorithm is trained on labeled
data in such a way that it predicts the output variable. In unsupervised learning, an algorithm
learns patterns and structure from unlabeled data, without access to an output variable. Both
these methods require a certain amount of
data, whereas reinforcement learning operates slightly differently from supervised learning
and unsupervised learning. In reinforcement learning, an agent is given a task to perform
in a given environment under a reward policy: the agent receives rewards upon successful
completion of a task and penalties otherwise. This learning approach resembles the way humans
learn; it was motivated by reasoning about how humans learn and become intelligent (Russell &
Norvig, 2002).
In this research, we primarily use supervised learning methods and, at times, reinforcement
learning methods. The reason is that the problems being solved require algorithms to detect
patterns given desired outputs. Artificial Neural Networks (ANN) are among the supervised
learning methods that excel at such problems. Moreover, there has been huge progress in these
methods since 2012, when Krizhevsky et al. (2012) used a variant of ANN called the
Convolutional Neural Network (CNN) for classification tasks over a large dataset. It was one of
the early large-scale successes of the CNN since its inception (LeCun et al., 1998), and it is, in
essence, where deep learning (DL) originated.
Goodfellow et al. (2012) relate DL to the depth of a neural network model in terms of its layers,
which improves its pattern recognition ability. DL has significantly changed the course of AI in
general. Deng and Yu (2013) defined DL as a class of non-linear learning methods that extract
layered representations of data and recognize patterns from those representations. According to
them, DL has wide-ranging applications; it has revolutionized object detection and Natural
Language Processing (NLP). DL is used in this research for object detection and pattern
recognition tasks in Chapter 3, and a novel approach that uses DL to help solve combinatorial
optimization problems is proposed in Chapter 4.
CHAPTER 3
DEEP LEARNING-BASED CYBER-PHYSICAL SYSTEM FRAMEWORK FOR REAL-
TIME INDUSTRIAL OPERATIONS
3.1 Introduction
The fourth industrial revolution (Industry 4.0) reorganizes the control of the product life
cycle (Rüßmann, 2015). Industry 4.0 refers to the increased usage of internet-based applications
and the digitization of processes in the industry (Lasi et al., 2014). This internet-based
digitization provides the opportunity for applications to be real-time (Almada-Lobo, 2015) and
self-learning (J. Lee et al., 2014). Real-time, self-learning applications can connect
different fragments of the industry and improve overall functionality on their own (Lasi et al.,
2014). Furthermore, intelligent industrial applications are important to develop not only for
the increased operating ability they give businesses, but also from a sustainability perspective.
Sustainability is a core factor for businesses, as identified by the United Nations (UN)
Sustainability 2030 agenda (Jamwal, Agrawal, Sharma, & Giallanza, 2021). For this research, we
take this holistic approach in developing an application framework that provides
reproducibility and repeatability.
To understand the real opportunity to broaden the scope of the framework’s application,
this research also considers the dynamics between Industry 4.0 and sustainability. Müller et al.
(2018) suggested that the challenges of this dynamic lie in fitting the organization and
production to strategic and operational drivers. This suggests that such a repeatable framework
offers high value to industry if it leverages real-time and intelligence components and has
applications in processes. For these reasons, the techniques emerging from Industry 4.0,
such as the Internet of Things (IoT) and Cyber-Physical Systems (CPS), offer key contributions
to the sustainability of businesses directly or indirectly (Jamwal, Agrawal, Sharma, Kumar, &
Kumar, 2021). Bai et al. (2020) developed an evaluation method for the impacts of Industry 4.0
on sustainability and concluded that Industry 4.0 improves the sustainability dynamics of the
industry. They ran experiments considering the UN sustainable development goals.
The IoT and CPS combined promise to introduce improved operations in a vast
variety of systems such as smart production systems, smart transportation, and infrastructure
systems (Vaidya et al., 2018). Cyber-Physical Systems refer to the integration of computing and
physical systems to satisfy desired functional operations. Internet of Things refers to the
interconnection of several entities (both cyber and physical) over the web that facilitates the
transmission of information across those entities; thereby enabling automation. Therefore, there
are three important components to the CPS, which are: a cyber component, a physical
component, and a networking component. In this paper, we use these three CPS components to
develop an intelligent framework that can perform industrial operations (Jamwal et al., 2020;
Pivoto et al., 2021).
According to (Kevin, 2010), the idea behind the use of IoT and CPS was to reduce
human interaction in data gathering and processing, relying instead on the increased ability of
cyber systems to gather data and communicate with each other over the internet. In other words,
different cyber systems operate in synchronization to achieve shared objectives with the
assistance of the internet (E. A. Lee, 2008). In essence, having capable cyber systems operating
on the internet with less human interaction requires these systems to possess a certain
intelligence.
The three components of the CPS are selected for the paper in such a way that they are
capable of intelligent operations (Radanliev et al., 2021). There is a range of physical
components that can be included while developing a CPS (Mazumder et al., 2021; Wolf & Tech,
2009). However, for this paper, we use a robotic system as the primary physical component.
The rise in the use of robots has many reasons, a major one being that robots can perform
repetitive tasks: many industrial operations involve repetitive processes, and manual
operations are not precise in repetitive tasks (Alvarez-de-los-Mozos & Renteria, 2017; Heyer,
2010).
Robots can perform a task repetitively over a specified period and produce
similar outputs; hence, robotic systems possess better repeatability than manual operations
(Zupančič & Bajd, 1998). Therefore, robots working on repetitive tasks help improve the
accuracy and precision of products, reduce defects, reduce human fatigue, and increase
workplace safety overall. Human fatigue varies across industries and depends upon the
task. For example, pick-and-place operations on a plastic toy and on an airplane wing are
significantly different. Human operators can perform a pick-and-place operation on plastic
toys easily; however, a pick-and-place operation on an airplane wing raises safety concerns,
as the wing is heavy to lift. Even the pick-and-place of plastic toys causes human fatigue
when performed over long working periods. Due to these weight and time considerations,
robots have become a go-to solution for industrial operations.
However, the manufacturing industry heavily deals with variability. For example, an
automobile manufacturing plant may require the assembly of multiple different models of an
automobile, and companies commonly offer a variety of products. Different products and
models imply variability in manufacturing operations, and in highly automated plants this
variability requires changing the robots' code. The amount of hard coding required to program a
robot for a task is time-consuming and expensive. These additional time
delays and cost increments negatively affect the plant's ability to reduce costs and operate
efficiently. Therefore, robots that can handle variability are an important solution to the complex
problem of industrial operations (Luo, 2020a).
Intelligent robotic systems can have a huge impact on the manufacturing industry. This
change is expected to revolutionize the industry in general and is referred to as the fourth
industrial revolution, or simply Industry 4.0 (Rajkumar et al., 2010). Zhong et al. (2017)
mentioned that automation in manufacturing needs new technological concepts to make it
intelligent, which can be applied as part of design, production, and management.
The field that deals with making artificial systems intelligent is Artificial Intelligence
(AI). The term AI was coined by John McCarthy in the 1950s. According to (McCarthy, 1989),
AI is mathematical reasoning for common-sense problems. This mathematical reasoning is used
in the form of identifying patterns. Advanced artificial (cyber and physical) intelligent systems
extract patterns from the available data in such a way that they can be used as information.
Several machine learning techniques based on artificial neural networks (ANN) are primarily
used to perform pattern recognition. The ANN and the variants used for object detection are
described later in Section 3.2.2. Figure 3.1 details a framework of concepts that are offered in
the literature and defined by their innovators.
[Figure 3.1 diagram: within a Cyber-Physical System, a computer performs human-controlled machine vision for object detection and communicates with the robot over TCP/IP; the robot performs the automated physical operation, and the IP location memory is updated in the computer.]
Figure 3.1. Conceptual Framework. CPS, IoT, and Industry 4.0 are displayed to visualize the
differences. Moreover, the components and their data exchange are also visualized.
The goal of this paper is to create an intelligent CPS framework, which functions over the
internet and performs real-time object detection and appropriate control actions based on the
identified objects. We detail the proposed framework and demonstrate it with real-world
experiments using a Universal Robot (UR5) for pick and place operations based on object
detection. The proposed algorithm for the CPS framework operates on real-time data gathering
and processing with detection in a time frame of 0.2 seconds per image (Ren et al., 2017). The
proposed CPS works on Real-Time Data Exchange (RTDE) (Universal Robots, 2019) which is
operating on the internet using TCP/IP standards. The proposed framework improves the
operator’s safety and simplifies the operational process in general by keeping the control in the
hands of an operator. This framework can be implemented in different industrial settings with a
computer (a cyber-system), a robot (a physical component), and the internet (a networking
component). This CPS approach is discussed in the literature as revolutionary (Industry 4.0) but
is not often formulated in a way that yields an intelligent, repeatable
application. This paper provides a clear baseline for creating a CPS in various industrial settings.
Moreover, in the paper, we show that it is crucial to consider the latency associated with the
cyber system and the networking element while recreating this experiment in a practical
environment. CPS latency is important for dynamic experiments, and we therefore propagate
latency using a sampling-resampling method. The results of the propagated latency provide
important considerations for practical use, which is an important contribution as well.
The paper is organized as follows. Section 3.2 provides background information and the
methodology for this paper. Section 3.3 contains Algorithm 1 for the framework and the
experimental setup for validating the framework. Section 3.4 explains latency in CPS in detail,
showing the factors that should be considered while setting up this experiment in real life to get
the best out of the CPS. Lastly, Section 3.5 provides the conclusion and future directions for
research.
3.2 Background
The objective of the paper is to create an intelligent CPS that implements different types
of physical control actions (Derler et al., 2012; Mörth et al., 2020). For implementation, we
consider a conventional robot operation: the pick-and-place of parts based on objects detected
using a vision system. The motivation behind this objective is to improve the operator’s safety
and simplify the operational process on the production floor. This problem, in short, can be
termed artificial-intelligence-based industrial operation. A deep learning method provides the
basis of the artificial intelligence, which contributes to Industry 4.0 (Miškuf & Zolotová, 2016).
There is scope in the existing literature for improvement using deep-learning-based CPS for
better production fit (Jamwal,
Agrawal, Sharma, Kumar, Kumar, et al., 2021; Müller et al., 2018; Saufi et al., 2019).
This section explains background information on the terminologies and concepts used in
the experiment. The framework is developed considering the three components mentioned
above: a cyber system, a networking element, and a physical system. The corresponding
components of the experiment are a laptop computer operating as the cyber system, socket-based
communication using TCP/IP protocol standards as the networking element, and a UR5 robot
operating as the physical system. The laptop computer performs the intelligence tasks: robot
programming for control purposes and a deep learning methodology that includes variants of
neural networks. The following subsections provide a comprehensive review of each of the
methods used in the methodology section, along with a review of the networking element and
protocol standards.
3.2.1 Robot Programming
The robot used for this research is a UR5 robot from Universal Robots. This is a
6-Degrees-of-Freedom (DoF) robot with a payload capacity of 5 kg. Figure 3.9 in the case study
shows the UR5 robot performing the pick-and-place operation. The robot can be programmed
using its own software, PolyScope, or by using its SCRIPT language (URScript) or Python.
Robotic arms generally have applications in operations such as pick-and-place, assembly,
welding, quality inspection, 3D printing, etc.
Robot programming can be categorized into two types: a) online programming, and b)
offline programming. Online programming is when the robot is programmed while functioning;
offline programming is when the robot program is simulated in programming software before the
real program is deployed to the robot. Offline programming is a helpful way of avoiding
malfunctions and dangerous incidents that a programmer may not have anticipated while
building the solution approach. However, offline programming adds an extra step of simulating
and testing the programs, which are then extended to online programming (Alvarez-de-los-Mozos
& Renteria, 2017).
The usual robot-based manufacturing systems assume minimal variability in the work
environment. To counter any variability, manufacturing systems (e.g., robots) contain sensors
that are responsible for detecting variable actions in the environment. The use of sensors in a
system increases the cost of automation. Therefore, to increase the speed of control actions (by
directly deploying the control actions via online programming) and decrease the costs (by
decreasing the use of sensors), the use of the internet and DL processes is welcome progress.
The TensorFlow and Keras toolboxes, available in the Python programming language,
were used to implement the Object Detection and Control Algorithm (ODCA; Algorithm 1). The
Python code creates a socket connection to send SCRIPT code that controls the motion of the
robot, as well as to receive joint data from the robot. For object detection, an ordinary laptop
webcam was used for image processing.
3.2.2 Artificial Neural Networks
Artificial Neural Networks (ANN) are used as the framework for object detection. There
are various types of ANNs and several techniques to train them. One of the widely used models
is the Convolutional Neural Network (CNN), an ANN architecture for image classification and
object detection in which images serve as input to the network. Several variants of the CNN
have been developed to improve the efficiency of model training. Brief explanations of the CNN
and its variants, R-CNN, Fast R-CNN, and Faster R-CNN, are given below.
a) Convolutional Neural Networks (CNN)
In a CNN, convolutional and pooling layers extract features from the input in matrix form,
which are then fed to the Fully Connected (FC) layers. The FC layers, at the end, map the data to
the classified output (Krizhevsky et al., 2012).
Figure 3.2. A CNN architecture showing convolution layers and fully connected layers for image
classification.
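The layer stack described above can be sketched in Keras as follows; the input size, filter counts, and the four-class output are illustrative placeholders, not the configuration used in this research:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Convolution + pooling layers extract features; FC layers classify.
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),            # small RGB image (placeholder size)
    layers.Conv2D(16, 3, activation="relu"),    # convolutional feature extraction
    layers.MaxPooling2D(),                      # pooling reduces spatial size
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),        # fully connected (FC) layer
    layers.Dense(4, activation="softmax"),      # e.g., 4 playing-card classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Training such a model then amounts to calling `model.fit` on labeled images, as in the supervised-learning setting described earlier.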
b) R-CNN
To improve computational speed and move toward real-time object detection, a
variation of the CNN called the R-CNN was developed, where R stands for region proposals (or
proposed regions). Proposed regions are regions that have a higher probability of containing
an object within the image. Figure 3.3 below shows that CNNs are computed only for the
proposed regions instead of classifying all the pixels in the image (Girshick et al., 2014;
Gkioxari et al., 2015). (Gkioxari et al., 2015) recorded 50 seconds per image for classification.
Figure 3.3. An R-CNN Architecture Where the CNNs are Computed only for the Proposed
Regions.
c) Fast R-CNN
Girshick (2015) proposed another development of the R-CNN architecture to improve
the run-time. In Fast R-CNN, only one CNN is computed over the whole image, based on the
Regions of Interest (RoI). RoI pooling layers then feed the data to the FC layers, where the
classification is performed. The run-time for Fast R-CNN was 2 seconds per image (Girshick, 2015).
Figure 3.4. A Fast R-CNN Architecture. CNN is computed only for the proposed regions
followed by the similar FC layers classifying in the end.
d) Faster R-CNN
In the Faster R-CNN architecture, only one CNN is computed over the image, producing
feature maps from which region proposals are created for the Regions of Interest (RoI) (Ren et
al., 2017). An RoI is an area of the image that requires classification. This process significantly
reduces the computational time to 0.2 seconds per image (Ren et al., 2017) while detecting
objects in the video feed.
Figure 3.5. A Faster R-CNN Architecture. CNN is computed only for the proposed regions. This
process is repeated in the form of feature maps and region proposals followed by similar FC
layers classifying in the end.
e) Dataset
The Microsoft COCO (Common Objects in Context) dataset (Microsoft, 2015) is used,
following the structured directory layout of the Faster R-CNN model. Objects are labeled
manually with the help of image labeling software. For example, 4 different classes (playing
cards) were added with 296 images of the playing cards. These 296 images were taken in
different environments, with different backgrounds and object positions within the image, which
increased the likelihood of accurate object detection. After setting up the data directory, the
programming is performed in Python. The Python code is built on the Anaconda platform, where
various deep learning toolboxes were used, the major ones being TensorFlow and Keras.
f) Training and Testing
The model was trained on the 296 images for almost 3,197 computational iterations. The
training process took almost 9 hours to achieve a 0.05 Root Mean Square (RMS) error. In
training, the loss (or cost) function was optimized using the gradient descent algorithm. Gradient
descent finds a local or global minimum of a function by iteratively stepping in the direction of
the negative gradient to reduce the error (Mandic, 2004). This means gradient descent enables
the model to gradually converge: the parameter updates shrink as the change in the loss
approaches its minimum (Cauchy, 1847); in other words, the model converges. The optimization
results are shown in Figure 3.9 in Section 3.3. After training the Faster R-CNN model, the
trained model was fed to the ODCA. Algorithm 1 (ODCA) is explained in the next section. In
the testing process, the ODCA performed successful object detection of the playing cards,
identifying the objects correctly with more than 80% probability.
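As a toy illustration of the update rule (not the actual training code, which optimizes a high-dimensional network loss), gradient descent on a one-parameter quadratic loss looks as follows:

```python
# Gradient descent on the loss L(w) = (w - 3)^2, whose gradient is
# dL/dw = 2*(w - 3); the minimum is at w = 3.
def gradient(w):
    return 2.0 * (w - 3.0)

w = 0.0    # initial parameter value
lr = 0.1   # learning rate (step size)
for _ in range(100):
    w -= lr * gradient(w)  # step against the gradient to reduce the loss

# After the loop, w has converged very close to the minimizer 3.0
```

Each iteration moves the parameter opposite to the gradient, and the steps shrink as the slope flattens near the minimum, which is exactly the convergence behavior described above.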
3.2.3 Robot Socket Communication
The UR5 robot allows us to communicate with it by creating a socket connection on
port 30003. This port streams data from the robot in the form of byte packets of size 1108
bytes (Xue & Zhu, 2009). Specific bytes in the packet carry specific information about the
robot's state. A function called 'get_data' was created in the Python program to receive this data.
In networking terms, the laptop computer works as the client and the UR5 control unit works as
the server; the server is accessed via the IP address of the robot controller. In the data
transfers, control lies with the laptop and can be exercised through the command options
(Espinosa et al., 2008; Xue & Zhu, 2009). Figure 3.6 shows the laptop controlling the robot
shown in Figure 3.9.
Figure 3.6. Experimental Set-up. A playing card is shown to the laptop webcam to perform the
object detection.
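A sketch of such a 'get_data' routine is given below; the robot IP and the byte offset of the joint angles are assumptions for illustration, since the exact packet layout depends on the controller software version:

```python
import socket
import struct

ROBOT_IP = "192.168.1.10"  # placeholder robot controller address
PORT = 30003
PACKET_SIZE = 1108         # byte-packet size cited in this chapter

# Offset of the actual joint positions within the packet is version-
# dependent; 252 is an assumed value for illustration, not a guarantee.
Q_ACTUAL_OFFSET = 252

def get_data():
    """Receive one state packet from the robot and unpack six joint angles."""
    with socket.create_connection((ROBOT_IP, PORT), timeout=2.0) as sock:
        packet = b""
        while len(packet) < PACKET_SIZE:
            chunk = sock.recv(PACKET_SIZE - len(packet))
            if not chunk:
                break
            packet += chunk
    # UR controllers stream big-endian doubles; unpack six joint angles (radians)
    return struct.unpack(">6d", packet[Q_ACTUAL_OFFSET:Q_ACTUAL_OFFSET + 48])
```

The control loop can poll this function to compare the robot's actual joint positions against the commanded targets.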
3.3 Methodology and Experimentation
The components of the proposed framework are shown in Figure 3.7. The framework
consists of a physical system, here a UR5, though it can be any robotic or actuation system; a
networking element over which data are exchanged; and a cyber system. A laptop is used as the
cyber system; it performs the object detection processing and the respective control action
corresponding to each detected object. These control actions are transmitted to the robotic
system through a Real-Time Data Exchange (RTDE) system operating on TCP/IP standards.
The cyber system, therefore, operates on the internet for data exchange with the physical
system, in contrast to other CPS such as self-driving cars.