Article
Deep-Learning-Incorporated Augmented Reality Application
for Engineering Lab Training
John Estrada 1 , Sidike Paheding 2,∗ , Xiaoli Yang 1 and Quamar Niyaz 1
1 Department of Electrical and Computer Engineering, Purdue University Northwest,
Hammond, IN 46323, USA; estradag@[Link] (J.E.); yangx@[Link] (X.Y.); qniyaz@[Link] (Q.N.)
2 Department of Applied Computing, Michigan Technological University, Houghton, MI 49931, USA
* Correspondence: spahedin@[Link]
Abstract: Deep learning (DL) algorithms have achieved significantly high performance in object
detection tasks. At the same time, augmented reality (AR) techniques are transforming the ways that
we work and connect with people. With the increasing popularity of online and hybrid learning, we
propose a new framework for improving students’ learning experiences with electrical engineering
lab equipment by incorporating the abovementioned technologies. The DL-powered automatic object detection component integrated into the AR application is designed to recognize equipment such as multimeters, oscilloscopes, wave generators, and power supplies. A deep neural network model, namely
MobileNet-SSD v2, is implemented for equipment detection using TensorFlow’s object detection
API. When a piece of equipment is detected, the corresponding AR-based tutorial will be displayed
on the screen. The mean average precision (mAP) of the developed equipment detection model
is 81.4%, while the average recall of the model is 85.3%. Furthermore, to demonstrate practical
application of the proposed framework, we develop a multimeter tutorial where virtual models
are superimposed on real multimeters. The tutorial includes images and web links as well to help
users learn more effectively. The Unity3D game engine is used as the primary development tool
for this tutorial to integrate DL and AR frameworks and create immersive scenarios. The proposed framework can be a useful foundation for AR and machine-learning-based frameworks for industrial and educational training.
Keywords: artificial intelligence; augmented reality; machine learning; object detection; computer in education; lab equipment tutorial
computers and mobile devices normally provide [5]. Both AI and XR have the potential
to be powerful workplace tools. For example, teams at various locations could work
together in a virtual environment using AI and XR technologies to create new products and
prototypes seamlessly. The applications of AI and XR span many fields, ranging from workflow optimization in healthcare processes and industrial training procedures to interactive educational systems [6]. Augmented reality (AR) is one of the XR realities and is commonly used on mobile and tablet devices.
Deep learning (DL), a sub-field of machine learning (ML), is built on artificial neural networks (ANNs), algorithms inspired by the structure and function of the human
brain. ML has made significant advances in recent years because of the need for increased
automation and intelligence [7]. XR refers to immersive technology that encompasses three
distinct realities: AR, mixed reality (MR), and virtual reality (VR). AR superimposes three-
dimensional objects on the physical world, requiring the use of mobile devices to create
interactions. MR is a technology that combines the physical and digital worlds to create
immersive physical experiences. Users interact with both the physical and digital worlds
by using their five senses. VR is a fully digitized world in which users can completely
immerse themselves in the computer-generated world by using virtual reality devices [8].
Many AR apps have recently been developed. AUREL [9] is an interactive application
that aids in the understanding of specific STEM topics. It enhances the learning experience
by projecting 3D models onto physical 2D textures that are part of the AR system, drawing
virtual objects using the mobile display, and placing them onto a specific image tracked by the camera. The ML-based image detection uses the camera feed as input to detect specific images based on a trained dataset. Nonetheless, the application is limited to flat image recognition, leaving room to extend the idea to full object recognition.
An AR application [10] was implemented to detect a breadboard and instruct students
on how to build a circuit. Their system scans a circuit diagram for circuit symbols and
their connections. These components are then arranged by a neural network. The AR
system provides a 3D visualization of the scanned circuit diagram which students can use
as a guided tutorial to build real circuitry. Another study [11] created and tested an augmented reality application to teach power electronics to beginners. Two AR applications for RLC circuits and Buck–Boost converters were built, and the experimental results showed that they had a positive effect on students when compared to traditional teaching methods, indicating improved cognitive performance. Although augmented reality has made its way into STEM education, to our knowledge there is no general non-linear framework that can guide the development of an AR-based tutorial. Furthermore, the present study aims to facilitate a smooth transition from real-time object recognition using deep learning methods to interactive tutorials using AR technologies, a step of the process with clear potential for improvement.
In this paper, we discuss the design and implementation of an AR- and DL-based
smartphone app to assist students in learning how to use electrical lab equipment such
as multimeters. A similar framework can be applied to develop AR- and DL-based apps
for other equipment in the future. The paper is structured as follows. Section 2 provides
an overview of the DL and AR techniques suitable for this type of application. Section 3
illustrates the design and implementation of the smartphone app using different AR and
DL frameworks. The experimental results are discussed in Section 4. Finally, the paper
ends with a conclusion and future works in Section 5.
2.2.1. Marker-Based AR
Marker-based AR is triggered by pre-defined markers and allows the user to choose where to place the virtual object. Barcodes and QR codes are commonly used as images or photo symbols placed on flat surfaces. The program recognizes the marker when the mobile device camera focuses on the target image, and the virtual information is then projected onto the marker and displayed on the device. Marker-based AR spans many levels of complexity [34]. For example, some applications display virtual information only while the device is focused on the marker, while others save that virtual information and allow users to view it again when the device is focused on a different section. Marker-based AR technology leverages images from the real world or QR codes to extract points, lines, corners, textures, and other properties [35]. These images serve as reference track points in the physical world on which AR experiences are superimposed.
2.2.2. Marker-Less AR
Marker-less AR is more versatile than marker-based AR. It interacts with real objects without the need for pre-defined markers, giving users more freedom: for example, a virtual object can be positioned anywhere on a real object. Users can
experiment with different styles and locations digitally without having to move anything in
their immediate surroundings [36]. Marker-less AR collects data from the device hardware
such as a camera, a GPS, a digital compass, and an accelerometer for the AR program to
function. Marker-less AR applications rely on computer vision algorithms to distinguish
objects, and they can function in the real world without specific markers [37,38]. There are
four types of marker-less AR discussed as follows:
(a) Location-based AR: In this type of AR, simultaneous localization and mapping (SLAM)
technology is used to track the user’s location as the map is generated and updated
on the user’s mobile device [39]. To display AR content in the physical environment,
the user must detect a surface with a mobile device [40,41]. As an example, the world-
famous AR-based game app, Pokemon Go, uses SLAM technology that allows its users
to battle, navigate, and search for 3D interactive objects based on their geographical
locations [42].
(b) Superimposition-based AR: Superimposition-based AR applications can provide an
additional view along with the original view of the object. Object recognition is
required to determine the type of object to partially or completely replace an object
in the user’s environment with a digital image [43,44]. Using HoloLens glasses,
surgeons can superimpose images previously gathered through scanners or X-rays on
the patient’s body during the operation. They can anticipate potential problems using
this approach.
(c) Projection-based AR: Projection-based AR (also known as projection mapping and aug-
mented spatial reality) is a technique that does not require the use of head-mounted
or hand-held devices. This method allows augmented information to be viewed
immediately from a natural perspective. Using projection mapping, projection-based
AR turns an uneven surface into a projection screen. This method allows for the
creation of optical illusions [45].
(d) Outlining-based AR: This type of AR employs image recognition to create contours
or forms and highlight components of the real world using special cameras. The resulting outlines designate specific items with lines, making scenes easier for the human eye to interpret.
Vuforia’s Model Target is an example of outlining-based AR. Vuforia is a platform
that enables developers to quickly incorporate AR technology into their applications.
Model Targets allow apps to recognize and track real-world objects based on their
shape [46].
In our project, we built a superimposition-based AR app. We built user interfaces
on top of lab equipment, allowing step-by-step instructions to be incorporated into the
application for users to understand and learn how to use specific equipment. Using AR
technology, immersive experiences are created in a variety of ways. It does, however, have
some limitations such as the inability to recognize multiple objects at once. On the other
hand, DL models show high performances in recognizing multiple objects at the same
time. Integrating AR apps with DL models helps trigger specific AR scenarios based on the objects the camera is aimed at and allows an AR scenario to track a single object without degrading mobile device performance.
Figure 2. Design framework of AR-based smartphone app for lab equipment training.
MobileNet is a low-latency and low-power model that can be tailored to meet the
resource constraints of various use cases. For multi-scale object detection, MobileNetv2, as the backbone network, provides a number of feature maps with different dimensions to the SSD convolutional layers, which use small convolutional filters to predict class scores and box offsets for a fixed set of default bounding boxes. MobileNet-SSDv2 extracts features from images, which are then processed through SSD predictor layers that progressively reduce feature map size to recognize objects at various scales [26,48], as shown in Figure 3. The MobileNet-SSDv2 detector improves on the SSD detector by combining MobileNetv2 with a feature pyramid network (FPN) while maintaining memory efficiency.
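To illustrate the idea of multi-scale feature maps, the following sketch builds a MobileNetV2 backbone with Keras and exposes two intermediate feature maps of different resolutions, as an SSD-style head would consume them. This is an illustration only, not the exact training code used in this work; the layer names and the 300 × 300 input size are assumptions based on the standard Keras MobileNetV2 implementation.

```python
import tensorflow as tf

# Illustrative sketch: expose two MobileNetV2 feature maps of different
# spatial resolutions, as an SSD-style detector would consume them.
# Layer names are assumptions based on tf.keras.applications.MobileNetV2.
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(300, 300, 3), include_top=False, weights=None)

feature_layer_names = ['block_13_expand_relu',  # higher-resolution map
                       'out_relu']              # lower-resolution map
outputs = [backbone.get_layer(name).output for name in feature_layer_names]
feature_extractor = tf.keras.Model(inputs=backbone.input, outputs=outputs)

# Each returned feature map would feed its own small convolutional predictor
# for class scores and box offsets over a fixed set of default boxes.
for fmap in feature_extractor(tf.zeros((1, 300, 300, 3))):
    print(fmap.shape)
```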
(a) Image Dataset: The model was given input of 643 images collected from various
perspective views and in different lighting settings. Each image is of 4032 × 3024 pixels
in size. These images must be annotated before they can be used to train the model. The LabelImg software [50] is used in the annotation process [51]; it allows users to draw a rectangle around a specific area of the image. During training, the annotations help the model precisely locate the object in the image. The outlined bounding box coordinates are generated and saved in an XML file, as sketched below.
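As an illustration of what these annotation files contain, the following sketch parses one LabelImg XML file (Pascal VOC style) and returns the image size and the bounding box coordinates; the file path and class name shown are placeholders.

```python
import xml.etree.ElementTree as ET

def read_labelimg_annotation(xml_path):
    """Parse a LabelImg (Pascal VOC style) XML file into image size and boxes."""
    root = ET.parse(xml_path).getroot()
    size = root.find('size')
    width, height = int(size.find('width').text), int(size.find('height').text)
    boxes = []
    for obj in root.findall('object'):
        label = obj.find('name').text          # e.g. "multimeter"
        bb = obj.find('bndbox')
        boxes.append((label,
                      int(bb.find('xmin').text), int(bb.find('ymin').text),
                      int(bb.find('xmax').text), int(bb.find('ymax').text)))
    return width, height, boxes

# Example (placeholder path):
# print(read_labelimg_annotation('annotations/multimeter_001.xml'))
```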
(b) TensorFlow Dataset: To make computation in the DL framework efficient, TF records use a binary file format. Furthermore, TF records enable the dataset to be stored as a sequence of binary strings, which improves the model's performance while using less disk space. We converted the XML files generated by LabelImg into TF binary records using a Python script; a minimal sketch of this conversion is shown below. The last step in configuring the TF dataset is to create a .pbtxt file containing all of the categorical label classes associated with the TF record file.
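The sketch below shows one way such a conversion script can be written. It assumes the helper functions from the TF object detection API's dataset_util module, placeholder file paths, and boxes in the (label, xmin, ymin, xmax, ymax) form produced by the parser above; it writes one tf.train.Example per image in the field layout expected by the API.

```python
import tensorflow as tf
from object_detection.utils import dataset_util

def to_tf_example(image_path, width, height, boxes, label_map):
    """Build one tf.train.Example from an image and its (label, xmin, ymin, xmax, ymax) boxes."""
    with tf.io.gfile.GFile(image_path, 'rb') as f:
        encoded_jpg = f.read()
    labels = [label_map[b[0]] for b in boxes]            # numeric class ids from the label map
    names  = [b[0].encode('utf8') for b in boxes]
    xmins  = [b[1] / width  for b in boxes]
    ymins  = [b[2] / height for b in boxes]
    xmaxs  = [b[3] / width  for b in boxes]
    ymaxs  = [b[4] / height for b in boxes]
    feature = {
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(image_path.encode('utf8')),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(b'jpg'),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(names),
        'image/object/class/label': dataset_util.int64_list_feature(labels),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Placeholder usage: write all examples into a single TF record file.
# with tf.io.TFRecordWriter('train.record') as writer:
#     writer.write(to_tf_example(...).SerializeToString())
```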
(c) Configuration File: Multiple pre-trained models based on the common objects in con-
text (COCO) dataset are available in TF. These models can be used to set up DL models
prior to training on a new dataset. Table 1 lists several popular architectures with
pre-trained models. For instance, ssd_MobileNet_v1_coco is the SSD with a MobileNet
v1 configuration, ssd_inception_v2_coco represents an SSD with an Inception v2 con-
figuration, and faster_rcnn_resnet101_coco stands for Faster R-CNN with a Resnet-101
(v1) configuration. All these configurations have been derived for the COCO dataset.
From Table 1, it can be observed that ssd_MobileNet_v1_coco reaches the fastest infer-
ence speed of 30 ms but with the lowest mean average precision (mAP). In contrast,
faster_rcnn_resnet101_coco has the slowest inference speed but the highest mAP of 32.
We tested both MobileNet SSD v2 and faster RCNN [52] and concluded that Mo-
bileNet SSD v2 performs faster inference in mobile devices than the faster-RCNN
model in our study. Using a pre-trained model saves time and computing resources.
A configuration file, in addition to the pre-trained model, is also required; it must match the architecture of the pre-trained model. It is recommended to fine-tune the model to maximize the prediction outcome. The process of fine-tuning is divided into two steps: restoring weights and updating weights. After we completed these requirements, we ran the Python code provided by the TF API to start the training job (a sketch of adapting the configuration file is shown below). Following training, the API generates a training checkpoint file in a specific format named .ckpt; this binary file contains all of the weights, biases, and other variables’ values.
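The sketch below illustrates one way the pipeline configuration can be adapted programmatically with the config_util helper of the TF object detection API before launching training; the file paths, batch size, and similar values are placeholders for illustration rather than the exact settings used in this work.

```python
from object_detection.utils import config_util

# Load a pre-trained ssd_mobilenet_v2 pipeline configuration (placeholder path).
configs = config_util.get_configs_from_pipeline_file('pipeline.config')

# Point the model at our lab equipment dataset before fine-tuning.
configs['model'].ssd.num_classes = 4                                 # four equipment classes
configs['train_config'].fine_tune_checkpoint = 'pretrained/model.ckpt'
configs['train_config'].batch_size = 24                              # placeholder value
configs['train_input_config'].label_map_path = 'label_map.pbtxt'
configs['train_input_config'].tf_record_input_reader.input_path[:] = ['train.record']

# Write the edited configuration back for the training job.
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, 'training/')
```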
(d) Inference: After training the model, the last step is to put it into production and feed the
model with live data to calculate the predicted output. Before testing, we can evaluate
the model’s accuracy using mAP. In Section 4, the evaluation result is described in
detail. We also need a lightweight version of the model to perform inference, so we use the OpenCV library.
Table 1. Comparison of pre-trained models in Model Zoo collection based on COCO dataset [13].
The ssd_MobileNet_v1_coco and ssd_MobileNet_v2_coco are the SSD models with MobileNet v1 and v2
configurations, respectively; ssd_inception_v2_coco represents SSD with Inception v2 configuration,
and faster_rcnn_resnet101_coco stands for Faster R-CNN with Resnet-101 (v1) configuration. All these
configurations are used for the COCO dataset.
In addition, training produces a frozen model, a ready-to-use inference model that generates output from live input data; this frozen graph is stored in a Protobuf (.pb) file. The Protobuf model contains the graph definition and trained parameters in a binary format. The text graph representation of the frozen model is a human-readable format required by the OpenCV library and is kept as a .pbtxt file. After creating the corresponding files, it is time to examine and test the trained model. We use the VideoCapture function from OpenCV to test the model: it loads the input video from the PC webcam, and the model then predicts the relevant labels and object locations, with an enclosing rectangle indicating each object's pixel location within the input image. Finally, with the Protobuf and the configuration file, we can use the Unity3D game engine and OpenCV to create our application by triggering AR scenarios based on the detection of electrical lab equipment performed by the DL model during its inference.
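A minimal sketch of this webcam test loop is shown below. It assumes illustrative file names for the exported model (frozen_inference_graph.pb and its .pbtxt text graph), a 300 × 300 SSD input size, class ids starting at 1, and a class list ordered as in our label map; these are assumptions rather than the exact script used in the application.

```python
import cv2

# Load the frozen graph and its text graph representation (placeholder names).
net = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'graph.pbtxt')
classes = ['multimeter', 'oscilloscope', 'wave generator', 'power supply']

cap = cv2.VideoCapture(0)                     # PC webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    # SSD MobileNet v2 expects a fixed-size input blob (assumed 300 x 300 here).
    blob = cv2.dnn.blobFromImage(frame, size=(300, 300), swapRB=True)
    net.setInput(blob)
    detections = net.forward()                # shape: [1, 1, N, 7]
    for det in detections[0, 0]:
        class_id, confidence = int(det[1]), float(det[2])
        if confidence >= 0.9:                 # threshold used in the app
            x1, y1 = int(det[3] * w), int(det[4] * h)
            x2, y2 = int(det[5] * w), int(det[6] * h)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, '%s %.2f' % (classes[class_id - 1], confidence),
                        (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow('detections', frame)
    if cv2.waitKey(1) == 27:                  # press Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```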
image within a series of photos) of the captured scene. It improves the quality of the image
captured by the camera so that an AR tracker component can correctly treat it. It uses the
latter to analyze the image and search the database for matches, which may include one or
more targets. Finally, the program renders virtual material such as photographs, videos,
models, and animations on the device screen, creating a hybrid image of what we perceive
as holographs. The process of generating AR targets is depicted in Figure 5 and can be
described in the following steps:
(a) Object Scanning: It is the primary tool for generating an object data file, which is
required for generating an object target in the target manager on the Vuforia webpage.
The ObjectScanner [53] app, which users can download from the Vuforia developer website after creating a developer account, is used, and the scanning environment is configured. The Vuforia developer portal provides a printable target
image that defines the target position and orientation relative to the local coordinate
space to scan the object and collect data points. It also distinguishes and removes
undesirable areas. This printable target image is used in conjunction with an Android
application, which is available for free download from the Vuforia official website.
During scanning, the printable target image must be placed under the object to be
scanned. Using the Vuforia scanning mobile application, the user can start collecting
data points from the object. To achieve the best scanning quality, it is recommended
to work in a noise-free environment with moderately bright and diffuse lighting. It is
also recommended to avoid objects with reflective surfaces. In this work, a multimeter
met all of the requirements, and a successful scanning was achieved.
(b) Data File: Following the scanning, an object data file is created. The mobile app will
also show how many scanning points the object has. The completed scanning area
is evidenced by a square grid that changes color from gray to green. The object data
file contains all the object’s information. There is a test scenario to determine whether
the scanned object has sufficient data for augmentation. In this scenario, a green
rectangular prism will be drawn in one of the object corners relative to the target
image coordinate space.
(c) Target Management System: Vuforia has a framework that allows developers to choose
from various target types, such as image targets, multi-targets, cylinder targets,
object targets, and VuMarks. The system will process and manage the data for visual
examination. A developer license is required to fully utilize the Vuforia manager
web-based tool, which includes access to a target manager panel where a database
can be uploaded, and targets can be added to the management system. The 3D object
option must be selected when selecting the target type, and the object data file must
be uploaded.
(d) Export Target Dataset: After the web tool processes the information, the database can be downloaded for the desired platform. The downloaded database is packaged so that it can be used in the primary development process as well as to create AR experiences in Unity.
(a) Setup Environment: The setup starts with the creation of a new project using a Unity
hub. After creating and opening the project, it is essential to switch to a different
build platform because Unity allows us to create once and deploy anywhere. In other
words, we can select a platform from the list of available platforms in Unity, such as
Windows, WebGL, Android, iOS, or any gaming console. We chose Android as the
deployed platform for this project. The platform can be changed in the build settings
window, which can be accessed via the File menu. Additionally, the rendering, scripting,
and project configuration must be modified.
(b) Libraries and Assets:
(1) OpenCV Library: OpenCV For Unity [54] is a program that uses AI algorithms
to analyze and interpret images on computers or mobile devices. This Unity
asset store product allows users to test AI pre-trained models that can be used
to run algorithms and executable applications on mobile devices. The model
employs a script that requires a binary file of a DL model with trained weights
(the network weights are not modified at this stage) and a network configuration file. This script is granted access to the device
resource, specifically the camera, so that the script can pass input to the model
and start object detection inference, which will generate bounding boxes and
labels around the object detected.
(2) Vuforia Engine: This library allows Unity to create AR experiences for mobile
devices. It is a collection of scripts and pre-made components for developing
Vuforia apps in Unity. It includes API libraries written in the C# language that
expose the Vuforia APIs. This library supports all traceable functions as well
as high-level access to device hardware such as the device camera.
(3) Assets: Assets are graphical representations of any items that could be used in the project. They consist of user interface sprites, 3D models, images, materials, and sounds, each with its own design and functionality. Photoshop is used to create art sprites, such as images for a virtual manual, and Blender, a 3D modeling software, is used to create the 3D models.
(c) Scenarios creation
(1) Main menu: The application includes a menu scenario, as shown in Figure 7,
that will allow the user to select various modes based on their preferences. It
includes a tutorial that teaches students how to use the application. There is
a training mode to help students learn more about lab equipment or electri-
cal components.
(2) Object detection: In this case, the DL model is used in conjunction with the
OpenCV library in Unity. The application has access to the device’s camera
from which it will infer the object detection model provided by the object detec-
tion framework. Furthermore, depending on the object that is being targeted,
the application automatically generates bounding boxes around the desired
object with its respective label and confidence. When the user points to the
desired equipment, a bottom panel will appear with the option to load the AR
experience or continue looking for other lab equipment. The OpenCV library allows us to specify the desired confidence threshold during model inference. The model draws a green rectangle around the detected equipment. The detection confidence threshold is set to 90%, which means that the confidence must be greater than or equal to 90% for a detection to be marked with a rectangular bounding box. This threshold was chosen because the pieces of lab equipment are visually quite distinct from one another, and a score of 90% ensures that detected equipment is reported with high confidence.
(3) Learning scenarios: A 3D visual aid is provided in this scenario to understand
the essential functions of the equipment selected during the detection scenario.
Figure 8 shows how users will be able to access an interactive 3D model representation.
4. Experimental Results
In this section, the performance of the DL and AR frameworks is discussed.
Precision = TP / (TP + FP)    (2)
• Recall: The percentage of true positives that were correctly identified by the model.
A model with a recall of 1.0 produces zero false negatives. Recall can be computed
as a ratio of True Positives predictions and the sum of TP and False Negatives (FN),
as shown in Equation (3). FN is defined as an incorrect prediction of the positive class
as the negative class.
Recall = TP / (TP + FN)    (3)
• Intersection over Union (IoU): It is also known as the Jaccard index used for measuring
the similarity and diversity of sample sets. In an object detection task, it describes
the similarity between the predicted bounding box and the ground truth bounding
box. Equation (4) expresses IoU in terms of area of the prediction and ground truth
bounding boxes.
IoU = Area of Overlap / Area of Union    (4)
mAP = (∑_{k=1}^{n} AP_k) / n    (6)
where APk is the average precision of class k, and n is the number of classes.
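The following sketch restates Equations (2)–(4) and (6) in code form; the [x1, y1, x2, y2] box format and the per-class AP values in the example call are hypothetical and only illustrate how the quantities relate.

```python
import numpy as np

def iou(box_a, box_b):
    """Equation (4): IoU of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def precision(tp, fp):
    """Equation (2): true positives over all positive predictions."""
    return tp / float(tp + fp)

def recall(tp, fn):
    """Equation (3): true positives over all actual positives."""
    return tp / float(tp + fn)

def mean_average_precision(per_class_ap):
    """Equation (6): mean of the per-class average precisions."""
    return float(np.mean(list(per_class_ap.values())))

# Hypothetical per-class AP values for the four equipment classes.
print(mean_average_precision({'multimeter': 0.86, 'oscilloscope': 0.80,
                              'wave generator': 0.79, 'power supply': 0.81}))
```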
Figure 10 shows our model during inference when the new dataset was fed to the
model. It demonstrates the correct detection of four types of lab equipment in a single
shot when the confidence threshold value (i.e., the threshold applied to the confidence score to decide whether a detection is an object of interest; predicted bounding boxes with confidence scores above the threshold are considered positive boxes, and those below it are discarded) is greater than or equal to 90%. The experimental results shown in Table 2 support
that our model has a high mAP. In practice, the DL model can recognize all of the electrical
lab equipment that has been pre-selected.
Figure 10. Example of automatic equipment detection by the DL model. The number above each
green bounding box indicates confidence score of the model, which is the probability that a bounding
box contains an object of interest, for the detection.
Table 2. Mean average precision (mAP) with different IoU scores of the trained MobileNet-SSD
V2 model.
Table 2 shows the average precision and average recall for a given IoU score and mAP.
The IoU ranges between 0.50 and 0.95. Using 193 testing images, our proposed model achieved a mAP of 81.4% and an average recall of 85.3%. Some failure
cases were due to the low ambient lighting and a lack of training datasets with varying
lighting conditions.
The DL model deployed on a mobile device uses CPU resources of the device to infer
and predict objects. We ran the DL model on two different mobile devices to evaluate
performances of devices for real-time prediction. We used the frame per second (FPS) unit,
which measures the number of images that the mobile device screen displays every second.
According to Table 3, the Samsung device achieves 8.5 FPS and the OnePlus device achieves 5.5 FPS, indicating that device hardware resources are needed to accelerate inference performance.
Figure 11. Image reference of the Samsung S21 camera with a light intensity of 350 lux (Left) and
25 lux (Right).
We included a toggle button in the test AR scenario during the evaluation to indicate
whether the application keeps detecting the multimeter. Table 4 shows the results of
the AR-based detection experiments. During our preliminary tests, we discovered that the camera was not tracking the multimeter because it was out of focus. We therefore included a script in our Unity engine project that allowed us to control the mobile camera's focus.
The evaluation table includes focus parameters that will help us decide whether to include
this feature in the AR experience. We chose 50 cm and 100 cm for our evaluation because
these are the typical distances between the lab equipment and the students. The final
column contains the result in True/False format, indicating whether the multimeter was
detected. We concluded that Vuforia can detect objects even in low light conditions.
However, the distance will have an impact on the detection results. According to our
table evaluation, the focus parameter increases the likelihood of detecting a multimeter in
different light intensities, but it also depends on the camera resolution.
Table 4. AR-based object detection experiments using different devices, luminous intensities, dis-
tances between mobile camera and multimeter, and camera’s focus.
Device    Luminous Intensity (lux)    Distance (cm)    Focus    Detection Status
1 25 50 No Yes
1 25 100 No No
1 25 50 Yes Yes
1 25 100 Yes Yes
1 150 50 No Yes
1 150 100 No No
1 150 50 Yes Yes
1 150 100 Yes Yes
1 350 50 No Yes
1 350 100 No Yes
1 350 50 Yes Yes
1 350 100 Yes Yes
2 25 50 No Yes
2 25 100 No No
2 25 50 Yes Yes
2 25 100 Yes No
2 150 50 No Yes
2 150 100 No No
2 150 50 Yes Yes
2 150 100 Yes No
2 350 50 No Yes
2 350 100 No No
2 350 50 Yes Yes
2 350 100 Yes Yes
References
1. Mejías Borrero, A.; Andújar Márquez, J. A pilot study of the effectiveness of augmented reality to enhance the use of remote labs
in electrical engineering education. J. Sci. Educ. Technol. 2012, 21, 540–557. [CrossRef]
2. Singh, G.; Mantri, A.; Sharma, O.; Dutta, R.; Kaur, R. Evaluating the impact of the augmented reality learning environment on
electronics laboratory skills of engineering students. Comput. Appl. Eng. Educ. 2019, 27, 1361–1375. [CrossRef]
3. Ray, S. A quick review of machine learning algorithms. In Proceedings of the 2019 International Conference on Machine Learning,
Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, 14–16 February 2019; pp. 35–39.
4. Xue, M.; Zhu, C. A study and application on machine learning of artificial intelligence. In Proceedings of the 2009 International
Joint Conference on Artificial Intelligence, Hainan, China, 25–26 April 2009; pp. 272–274.
5. Gong, L.; Fast-Berglund, Å.; Johansson, B. A framework for extended reality system development in manufacturing. IEEE Access
2021, 9, 24796–24813. [CrossRef]
6. Nisiotis, L.; Alboul, L. Work-In-Progress-An Intelligent Immersive Learning System Using AI, XR and Robots. In Proceedings of
the 2021 7th International Conference of the Immersive Learning Research Network (iLRN), 17 May–10 June 2021; pp. 1–3.
7. Khomh, F.; Adams, B.; Cheng, J.; Fokaefs, M.; Antoniol, G. Software engineering for machine-learning applications: The road
ahead. IEEE Softw. 2018, 35, 81–84. [CrossRef]
8. Hu, M.; Weng, D.; Chen, F.; Wang, Y. Object Detecting Augmented Reality System. In Proceedings of the 2020 IEEE 20th
International Conference on Communication Technology (ICCT), Nanning, China, 28–31 October 2020; pp. 1432–1438.
9. Ang, I.J.X.; Lim, K.H. Enhancing STEM Education Using Augmented Reality and Machine Learning. In Proceedings of the 2019
7th International Conference on Smart Computing & Communications (ICSCC), Miri, Malaysia, 28–30 June 2019; pp. 1–5.
10. Thiwanka, N.; Chamodika, U.; Priyankara, L.; Sumathipala, S.; Weerasuriya, G. Augmented Reality Based Breadboard Circuit
Building Guide Application. In Proceedings of the 2018 3rd International Conference on Information Technology Research
(ICITR), Moratuwa, Sri Lanka, 5–7 December 2018; pp. 1–6.
11. Sandoval Pérez, S.; Gonzalez Lopez, J.M.; Villa Barba, M.A.; Jimenez Betancourt, R.O.; Molinar Solís, J.E.; Rosas Ornelas, J.L.;
Riberth García, G.I.; Rodriguez Haro, F. On the Use of Augmented Reality to Reinforce the Learning of Power Electronics for
Beginners. Electronics 2022, 11, 302. [CrossRef]
12. Technologies, U. Unity Real-Time Development Platform. 2005. Available online: [Link] (accessed on 8 April 2022).
13. Yu, H.; Chen, C.; Du, X.; Li, Y.; Rashwan, A.; Hou, L.; Jin, P.; Yang, F.; Liu, F.; Kim, J.; et al. TensorFlow 1 Detection Model Zoo.
2020. Available online: [Link] (accessed on 28 April 2022).
14. Chen, X.W.; Lin, X. Big data deep learning: Challenges and perspectives. IEEE Access 2014, 2, 514–525. [CrossRef]
15. Alom, M.Z.; Taha, T.M.; Yakopcic, C.; Westberg, S.; Sidike, P.; Nasrin, M.S.; Hasan, M.; Van Essen, B.C.; Awwal, A.A.; Asari, V.K.
A state-of-the-art survey on deep learning theory and architectures. Electronics 2019, 8, 292. [CrossRef]
16. Deng, L.; Liu, Y. Deep Learning in Natural Language Processing; Springer: Berlin/Heidelberg, Germany, 2018.
17. Deng, L.; Hinton, G.; Kingsbury, B. New types of deep neural network learning for speech recognition and related applications:
An overview. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver,
BC, Canada, 26–31 May 2013; pp. 8599–8603.
18. Nishani, E.; Çiço, B. Computer vision approaches based on deep learning and neural networks: Deep neural networks for video
analysis of human pose estimation. In Proceedings of the 2017 6th Mediterranean Conference on Embedded Computing (MECO),
Bar, Montenegro, 11–15 June 2017; pp. 1–4.
19. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998,
86, 2278–2324. [CrossRef]
20. Pandiya, M.; Dassani, S.; Mangalraj, P. Analysis of Deep Learning Architectures for Object Detection-A Critical Review. In
Proceedings of the 2020 IEEE-HYDCON, Hyderabad, India, 11–12 September 2020; pp. 1–6.
21. Arora, D.; Garg, M.; Gupta, M. Diving deep in deep convolutional neural network. In Proceedings of the 2020 2nd Interna-
tional Conference on Advances in Computing, Communication Control and Networking (ICACCCN), Greater Noida, India,
18–19 December 2020; pp. 749–751.
22. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-based convolutional networks for accurate object detection and segmentation.
IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 142–158. [CrossRef] [PubMed]
23. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
24. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of
the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
25. Ryu, J.; Kim, S. Chinese character detection using modified single shot multibox detector. In Proceedings of the 2018 18th
International Conference on Control, Automation and Systems (ICCAS), PyeongChang, Korea, 17–20 October 2018; pp. 1313–1315.
26. Chiu, Y.C.; Tsai, C.Y.; Ruan, M.D.; Shen, G.Y.; Lee, T.T. Mobilenet-SSDv2: An improved object detection model for embedded
systems. In Proceedings of the 2020 International Conference on System Science and Engineering (ICSSE), Kagawa, Japan,
31 August–3 September 2020; pp. 1–5.
27. Heirman, J.; Selleri, S.; De Vleeschauwer, T.; Hamesse, C.; Bellemans, M.; Schoofs, E.; Haelterman, R. Exploring the possibilities of
Extended Reality in the world of firefighting. In Proceedings of the 2020 IEEE International Conference on Artificial Intelligence
and Virtual Reality (AIVR), Utrecht, The Netherlands, 14–18 December 2020; pp. 266–273.
28. Andrade, T.M.; Smith-Creasey, M.; Roscoe, J.F. Discerning User Activity in Extended Reality Through Side-Channel Accelerometer
Observations. In Proceedings of the 2020 IEEE International Conference on Intelligence and Security Informatics (ISI), Arlington,
VA, USA, 9–10 November 2020; pp. 1–3.
29. Dandachi, G.; Assoum, A.; Elhassan, B.; Dornaika, F. Machine learning schemes in augmented reality for features detection.
In Proceedings of the 2015 Fifth International Conference on Digital Information and Communication Technology and Its
Applications (DICTAP), Beirut, Lebanon, 29 April–1 May 2015; pp. 101–105.
30. Sendari, S.; Anggreani, D.; Jiono, M.; Nurhandayani, A.; Suardi, C. Augmented reality performance in detecting hardware
components using marker based tracking method. In Proceedings of the 2020 4th International Conference on Vocational
Education and Training (ICOVET), Malang, Indonesia, 19 September 2020; pp. 1–5.
31. Mahurkar, S. Integrating YOLO Object Detection with Augmented Reality for iOS Apps. In Proceedings of the 2018 9th IEEE
Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 8–10
November 2018; pp. 585–589.
32. El Filali, Y.; Krit, S.D. Augmented reality types and popular use cases. Int. J. Eng. Sci. Math. 2019, 8, 91–97.
33. Poetker, B. What Is Augmented Reality? (+Most Common Types of AR Used Today). 2018. Available online: [Link]
.com/articles/augmented-reality (accessed on 8 April 2022).
34. Gao, Y.F.; Wang, H.Y.; Bian, X.N. Marker tracking for video-based augmented reality. In Proceedings of the 2016 International
Conference on Machine Learning and Cybernetics (ICMLC), Jeju Island, Korea, 10–13 July 2016; Volume 2, pp. 928–932.
35. Sendari, S.; Firmansah, A. Performance analysis of augmented reality based on vuforia using 3d marker detection. In Proceedings
of the 2020 4th International Conference on Vocational Education and Training (ICOVET), Malang, Indonesia, 19 September 2020;
pp. 294–298.
36. Vidya, K.; Deryl, R.; Dinesh, K.; Rajabommannan, S.; Sujitha, G. Enhancing hand interaction patterns for virtual objects in mobile
augmented reality using marker-less tracking. In Proceedings of the 2014 International Conference on Computing for Sustainable
Global Development (INDIACom), New Delhi, India, 5–7 March 2014; pp. 705–709.
37. Beier, D.; Billert, R.; Bruderlin, B.; Stichling, D.; Kleinjohann, B. Marker-less vision based tracking for mobile augmented reality.
In Proceedings of the Second IEEE and ACM International Symposium on Mixed and Augmented Reality, Tokyo, Japan,
7–10 October 2003; pp. 258–259.
38. Pooja, J.; Vinay, M.; Pai, V.G.; Anuradha, M. Comparative analysis of marker and marker-less augmented reality in education. In
Proceedings of the 2020 IEEE International Conference for Innovation in Technology (INOCON), Bangalore, India, 6–8 November
2020; pp. 1–4.
39. Batuwanthudawa, B.; Jayasena, K. Real-Time Location based Augmented Reality Advertising Platform. In Proceedings of the
2020 2nd International Conference on Advancements in Computing (ICAC), Malabe, Sri Lanka, 10–11 December 2020; Volume 1,
pp. 174–179.
40. Unal, M.; Bostanci, E.; Sertalp, E.; Guzel, M.S.; Kanwal, N. Geo-location based augmented reality application for cultural heritage
using drones. In Proceedings of the 2018 2nd International Symposium on Multidisciplinary Studies and Innovative Technologies
(ISMSIT), Ankara, Turkey, 19–21 October 2018; pp. 1–4.
41. Argotti, Y.; Davis, L.; Outters, V.; Rolland, J.P. Dynamic superimposition of synthetic objects on rigid and simple-deformable real
objects. Comput. Graph. 2002, 26, 919–930. [CrossRef]
42. Ketchell, S.; Chinthammit, W.; Engelke, U. Situated storytelling with SLAM enabled augmented reality. In Proceedings of the
17th International Conference on Virtual-Reality Continuum and Its Applications in Industry, Brisbane, QLD, Australia, 14–16
November 2019; pp. 1–9.
43. Knopp, S.; Klimant, P.; Schaffrath, R.; Voigt, E.; Fritzsche, R.; Allmacher, C. Hololens ar-using vuforia-based marker tracking
together with text recognition in an assembly scenario. In Proceedings of the 2019 IEEE International Symposium on Mixed and
Augmented Reality Adjunct (ISMAR-Adjunct), Beijing, China, 10–18 October 2019; pp. 63–64.
44. Soulami, K.B.; Ghribi, E.; Labyed, Y.; Saidi, M.N.; Tamtaoui, A.; Kaabouch, N. Mixed-reality aided system for glioblastoma
resection surgery using microsoft HoloLens. In Proceedings of the 2019 IEEE International Conference on Electro Information
Technology (EIT), Brookings, SD, USA, 20–22 May 2019; pp. 79–84.
45. Lee, J.D.; Wu, H.K.; Wu, C.T. A projection-based AR system to display brain angiography via stereo vision. In Proceedings of the
2018 IEEE 7th Global Conference on Consumer Electronics (GCCE), Nara, Japan, 9–12 October 2018; pp. 130–131.
46. Vuforia Developer Library. Introduction to Model Targets. 2021. Available online: [Link] (accessed on 8 April 2022).
47. Zhang, S.; Tian, J.; Zhai, X.; Ji, Z. Detection of Porcine Huddling Behaviour Based on Improved Multi-view SSD. In Proceedings
of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; pp. 5494–5499.
48. Rios, A.C.; dos Reis, D.H.; da Silva, R.M.; Cuadros, M.A.D.S.L.; Gamarra, D.F.T. Comparison of the YOLOv3 and SSD MobileNet
v2 Algorithms for Identifying Objects in Images from an Indoor Robotics Dataset. In Proceedings of the 2021 14th IEEE
International Conference on Industry Applications (INDUSCON), Sao Paulo, Brazil, 15–18 August 2021; pp. 96–101.
49. Phadnis, R.; Mishra, J.; Bendale, S. Objects talk-object detection and pattern tracking using tensorflow. In Proceedings of the 2018
Second International Conference on Inventive Communication and Computational Technologies (ICICCT), Coimbatore, India,
20–21 April 2018; pp. 1216–1219.
50. Lin, T. LabelImg. 2015. Available online: [Link] (accessed on 8 April 2022).
51. Kilic, I.; Aydin, G. Traffic sign detection and recognition using tensorflow’s object detection API with a new benchmark dataset.
In Proceedings of the 2020 International Conference on Electrical Engineering (ICEE), Istanbul, Turkey, 25–27 September 2020; pp. 1–5.
52. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings
of the 28th Annual Conference on Neural Information Processing Systems, Montreal, Canada, 7–12 December 2015; pp. 91–99.
53. Park, Y.; Chin, S. An Efficient Method of Scanning and Tracking for AR. Int. J. Adv. Cult. Technol. 2019, 7, 302–307.
54. Enoxsoftware. About OpenCV for Unity. 2016. Available online: [Link]
about-opencv-for-unity (accessed on 8 April 2022).