Traffic Guidance System Via Multiple Camera
Domain Adaptation Using Deep Learning
Tien Do, Xuan Le, and Phong Nguyen
[email protected],{thanhxuan8054,nguyenthanhphong}@gmail.com
http://www.springer.com/lnns
Abstract. Traffic congestion in densely populated urban areas poses
significant challenges due to the rapid increase in vehicle numbers and
the limitations of current traffic management systems. Reliable vehicle
detection, tracking, and guidance systems are essential for addressing
these issues but often struggle to perform effectively under diverse con-
ditions such as varying lighting, weather, and road infrastructure. One
major obstacle is the absence of comprehensive datasets representing
complex urban scenarios and the difficulty of achieving generalization
across different environments. To address this, a new, diverse dataset has
been developed, specifically tailored for vehicle detection and tracking in
urban traffic conditions. Based on this dataset, a guidance system in-
corporating domain adaptation techniques has been designed to enhance
performance across varying environments. Experimental evaluations con-
ducted on multiple urban traffic scenarios demonstrate significant im-
provements in detection accuracy, tracking reliability, and adaptability,
offering a robust solution for enhancing traffic management in modern
cities.
Keywords: Deep Learning, Domain Adaptation
1 Introduction
Transportation has long been recognized as a critical driver of economic growth
and societal development, forming the backbone of modern urban life. However,
with the rapid increase in vehicle usage, traffic congestion has emerged as a
pressing global issue. Particularly in urban areas, where population density is
highest and demand for mobility continues to surge, road infrastructure has
struggled to expand at the same pace, leading to slower speeds, longer travel
times, and significant queuing. Traffic congestion disrupts daily routines and
diminishes the quality of community life, making it a challenge that extends
beyond simple delays. Studies have shown that congestion imposes significant
economic costs, including billions of dollars in wasted fuel and lost productivity
annually. For instance, research conducted in Ho Chi Minh City demonstrated
that traffic congestion heavily impacts daily routines and leads to lower quality
of life for residents, particularly commuters forced to spend extended periods in
gridlocked traffic [3], [8].
The consequences of traffic congestion extend beyond economic losses. Pro-
longed exposure to congested traffic conditions has been linked to increased
stress levels, frustration, and even health issues. Vehicle emissions, including
particulate matter (PM2.5) and carbon dioxide (CO2), contribute to air pol-
lution, which can result in respiratory illnesses, cardiovascular problems, and
reduced life expectancy for individuals living near heavily congested roads. A
study highlighted that emissions from congested traffic significantly increase the
risk of morbidity and mortality for nearby residents [13]. Furthermore, research
has identified carbon dioxide as a primary greenhouse gas produced by traffic
congestion, emphasizing its role in exacerbating global environmental challenges
[1].
In addition to its environmental and health impacts, traffic congestion dis-
rupts the social fabric of communities. Daily routines, which form the foundation
of individual productivity and well-being, are particularly affected. Women, who
often bear a larger share of household responsibilities such as childcare, cook-
ing, and cleaning, experience greater disruptions to their routines due to traffic
congestion. A study noted that women and working professionals often face sig-
nificant delays in reaching workplaces or completing their daily responsibilities,
leading to increased stress and decreased productivity [2]. These findings under-
score how congestion impacts not only individuals but also the broader economic
and social productivity of cities.
The rise of artificial intelligence (AI) offers a transformative opportunity
to address these persistent challenges. AI technologies, capable of processing
large-scale data, recognizing patterns, and making real-time decisions, present
innovative solutions for traffic management. Applications such as real-time traf-
fic monitoring, predictive analytics, and vehicle guidance systems have shown
potential to alleviate congestion and improve urban mobility. The implementa-
tion of AI-driven solutions can enable traffic systems to adapt dynamically to
varying conditions, optimize routes, and reduce delays. Moreover, AI-powered
systems can address long-standing limitations of traditional methods, such as
the inability to scale effectively across diverse urban environments or provide
accurate predictions in complex scenarios.
Building on this foundation, this study investigates the integration of AI and
domain adaptation techniques to tackle traffic congestion. A novel dataset has
been developed to capture diverse urban traffic scenarios, enabling a guidance
system to learn and generalize effectively. The system incorporates advanced AI-
driven strategies to optimize vehicle flow, reduce congestion, and improve daily
routines disrupted by traffic. By addressing the challenges identified in earlier
studies, this work aims to provide a practical and generalizable solution for
urban traffic management.
The remainder of this paper is organized as follows: related studies are discussed
in Section 2, followed by the proposed method in Section 3, experimental results
in Section 4, and the conclusion and discussion in Section 5.
2 Related Works
2.1 Dataset-Related Works
The development of vehicle-related datasets has played a transformative role
in advancing research in vehicle detection, classification, and re-identification.
Since 2015, a wide variety of datasets have been introduced, each addressing
specific research needs while presenting unique strengths and limitations. These
datasets have provided researchers with standardized benchmarks, enabling the
evaluation and improvement of machine learning models. However, the varying
focus and scope of these datasets underscore the need for region-specific datasets
to address unique challenges.
The CompCars dataset, introduced in 2015 [11], is one of the earliest comprehensive
datasets for vehicle classification and verification. It offers a rich collection of
images taken from diverse angles, including internal and external perspectives,
paired with detailed attribute annotations. These features make it highly ef-
fective for tasks like brand and model classification. Despite its strengths, the
dataset’s focus on controlled environments limits its usefulness in dynamic, real-
world traffic scenarios, where vehicles are often partially visible or subjected to
variable lighting conditions.
In the same year, the VEDAI Dataset [9] was introduced to address vehicle
detection in aerial imagery. With its high-resolution aerial images annotated
for various vehicle types, this dataset has been invaluable for testing detection
algorithms designed for aerial surveillance and remote sensing. The challenges
posed by small object sizes and varied orientations in the dataset have advanced
research in these niche areas. However, its aerial focus makes it unsuitable for
addressing ground-level urban traffic challenges, where the interaction between
vehicles and pedestrians is crucial.
The VehicleID dataset, introduced in 2016 [5], offers a unique approach to
vehicle re-identification, focusing on matching vehicles across multiple camera
views in surveillance settings. The dataset includes numerous images of vehicles
captured at different locations, making it a valuable resource for developing algo-
rithms aimed at urban monitoring systems. Despite its utility, the dataset lacks
diversity in environmental conditions, such as weather variations and differing
traffic densities, which are critical in broader urban traffic scenarios.
Another influential dataset is VeRi-776, introduced in 2016 [14], which tar-
gets vehicle re-identification in urban settings. This dataset captures vehicles
under varied orientations, lighting conditions, and occlusions, complemented by
detailed annotations like vehicle models and license plate information. These
features have contributed to advancements in robust re-identification methods.
However, its focus on urban surveillance limits its ability to generalize to broader
multi-vehicle detection and tracking tasks, especially in highly congested traffic
environments.
The UA-DETRAC dataset, introduced in 2017 [7], is specifically designed for multi-
object detection and tracking in real-world traffic conditions. With extensive
traffic video sequences annotated for bounding boxes and object categories, the
dataset serves as a benchmark for testing algorithms in dynamic and complex
scenarios. While it provides a rich source of data, its lack of representation for
extreme conditions such as heavy rain or nighttime traffic creates challenges for
developing all-weather, robust systems.
In 2020, the VehicleX Dataset [12] introduced a novel synthetic approach to
dataset creation. Using the Unity engine, it offers customizable vehicle attributes,
providing flexibility for algorithm testing and adaptation. The dataset has been
particularly impactful in vehicle re-identification and domain adaptation studies.
However, as synthetic data, it may require additional adaptation steps to ensure
its effective application in real-world scenarios.
While these datasets have significantly contributed to research advancements,
they often fail to capture the unique challenges of real-world traffic in regions
like Vietnam. Urban traffic in Vietnam is characterized by high vehicle density,
diverse types of vehicles, and dynamic environmental conditions, which are often
underrepresented in existing datasets. To address this gap, this study introduces
a new dataset specifically tailored to Vietnam’s urban streets. By capturing the
distinct traffic patterns and conditions, this dataset aims to provide a robust
foundation for developing adaptive vehicle detection and guidance systems suited
to local challenges.
2.2 Vehicle Domain Adaptation Works
Domain adaptation techniques have become increasingly relevant in vehicle de-
tection, addressing the challenges posed by domain shifts between training and
deployment environments. These techniques enable models to adapt effectively
to new conditions, such as different lighting, weather, or geographical settings,
without requiring extensive labeled data from the target domain. Recent stud-
ies in this field demonstrate significant advancements in domain adaptation for
vehicle detection.
One notable study, Unsupervised Domain Adaptation for Remote-Sensing
Vehicle Detection Using Domain-Specific Channel Recalibration [6], introduces
a method that recalibrates feature channels to align domain-specific character-
istics. By focusing on feature alignment, this approach improves detection ac-
curacy in remote-sensing applications where the domain gap between satellite
images and ground-based datasets can hinder performance.
Another significant contribution is Domain Adaptation for Vehicle Detec-
tion from Bird’s Eye View LiDAR Point Cloud Data. This research employs a
CycleGAN-based framework to address the domain shift between synthetic and
real LiDAR data [10]. By bridging this gap, the method enhances vehicle detec-
tion performance in autonomous driving scenarios, where LiDAR data is often
used for perception tasks.
The study Domain Adaptation Based Object Detection for Autonomous
Driving in Adverse Weather Conditions explores object detection under chal-
lenging conditions like fog and rain [4]. It proposes an unsupervised domain
adaptation method that incorporates both image-level and object-level adapta-
tions to improve detection accuracy in adverse weather, addressing one of the
critical limitations of existing detection systems.
In the study Stepwise Domain Adaptation for Object Detection in Autonomous
Driving, a stepwise domain adaptation method is introduced to minimize diver-
gence across domains in object detection tasks. This method gradually aligns
source and target domains, focusing on features that are robust to domain shifts,
thereby enhancing object detection in complex scenarios such as urban traffic
environments.
These studies highlight the potential of domain adaptation techniques to
overcome domain-specific challenges in vehicle detection. By addressing the dis-
crepancies between training and deployment environments, such methods im-
prove the robustness and applicability of detection systems in diverse and dy-
namic conditions. The insights and methods developed in these works serve as
a foundation for integrating domain adaptation into vehicle detection systems
tailored to the unique characteristics of traffic scenarios in regions like Vietnam.
3 Proposed Method
3.1 Proposed Dynamic Vehicle View Dataset (DVV)
This study introduces the Dynamic Vehicle View (DVV) Dataset, a novel dataset
specifically curated to address the unique traffic challenges of Vietnam’s urban
environments, particularly in Ho Chi Minh City. Data was collected from six
distinctive streets, including a prominent four-way crossroad, to encompass a
broad spectrum of traffic scenarios. Each location was recorded for one hour,
ensuring a comprehensive representation of real-world traffic dynamics.
To capture diverse perspectives and minimize the effects of occlusion, video
footage was recorded from two complementary viewpoints: a straight-forward
view and an inclined view of the road. These angles were deliberately chosen
to capture intricate vehicle interactions and mitigate the challenges posed by
overlapping vehicles, which are common in dense urban traffic.
Following data collection, the raw video footage was preprocessed to prepare
it for model training. Frames were extracted at a rate of one frame per second to
achieve a balance between capturing temporal dynamics and avoiding redundant
information. Each extracted frame was cropped to focus solely on the road area,
systematically removing extraneous background elements such as pedestrians,
roadside infrastructure, and advertisements. This preprocessing step was crit-
ical to ensure the dataset emphasizes vehicle dynamics and interactions while
eliminating noise that could interfere with model training.
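To make this step concrete, a minimal sketch of the extraction and cropping procedure is given below, assuming OpenCV-readable video files; the crop coordinates, output naming, and fallback frame rate are illustrative placeholders rather than the exact values used for the DVV dataset.

```python
# Sketch: extract one frame per second and crop to the road area (assumed values).
import cv2

def extract_frames(video_path: str, out_dir: str, crop=(0, 100, 640, 740)):
    """Save one road-cropped frame per second of video; returns frames saved."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS metadata is missing
    x0, y0, x1, y1 = crop                    # hypothetical road region per camera view
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % int(round(fps)) == 0:       # keep roughly one frame per second
            road = frame[y0:y1, x0:x1]       # drop pedestrians, signage, background
            cv2.imwrite(f"{out_dir}/frame_{saved:05d}.jpg", road)
            saved += 1
        idx += 1
    cap.release()
    return saved
```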
The DVV dataset contains 9,900 images, with a total of 95,486 annotations
across five vehicle categories: Motorcycles, Cars, Vans, Trucks, and Buses. The
average image size is 640x640 pixels, ensuring consistent resolution for training
and evaluation. These annotations were evenly distributed across training, vali-
dation, and testing subsets, maintaining a balanced representation of all vehicle
classes to enhance model performance across diverse scenarios.
Fig. 1. Overview of the DVV Dataset showing class distributions and annotation statis-
tics.
Annotation quality was ensured through a two-stage labeling process. First,
the DINO model, a state-of-the-art vision transformer-based object detection
framework, was employed for initial automated annotation. The DINO model
was selected for its precision and ability to generalize across datasets, provid-
ing a strong baseline for annotation. Subsequently, manual refinement of these
annotations was performed to address inconsistencies and ensure high-quality
labeling. Each frame was meticulously reviewed and adjusted to rectify any er-
rors or omissions introduced during the automated phase. This hybrid approach
of automated and manual labeling ensured the consistency and robustness of the
annotations.
The DVV dataset reflects the distinct characteristics of Vietnamese urban
traffic, featuring various vehicle types such as motorcycles, cars, vans, buses,
and trucks. It also captures complex traffic dynamics, including high-density
congestion, frequent lane changes, and diverse vehicle interactions. By capturing
data from multiple angles and diverse urban locations, the dataset provides a
comprehensive and contextually relevant resource for training and evaluating
vehicle detection models. Furthermore, the manual refinement of annotations
enhances the dataset’s quality, ensuring its suitability for developing robust and
reliable detection systems.
The DVV dataset addresses the limitations of existing global datasets by
providing a localized and contextually nuanced resource tailored to Vietnam’s
unique urban traffic conditions. This dataset serves as the foundation for the
proposed method, enabling the development of vehicle detection systems specif-
ically optimized for the region and its challenges.
3.2 Training and Evaluation
The dataset, consisting of 9,568 annotated images, was split into an 80:20 ra-
tio for training and validation. Data augmentation techniques, such as random
cropping, brightness adjustments, horizontal flipping, and scaling, were applied
during training to enhance model robustness and simulate varying real-world
conditions. All models were trained under identical conditions, including the
same hardware and software configurations, to maintain consistency across ex-
periments.
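For illustration, the augmentations listed above could be assembled with the albumentations library as in the sketch below; the probabilities and limits shown are assumed values, not the exact training configuration used in these experiments.

```python
# Sketch of a bounding-box-aware augmentation pipeline (assumed parameters).
import albumentations as A

train_transform = A.Compose(
    [
        A.RandomSizedBBoxSafeCrop(height=640, width=640, p=0.5),  # random cropping
        A.RandomBrightnessContrast(brightness_limit=0.2, p=0.5),  # brightness adjustment
        A.HorizontalFlip(p=0.5),                                  # horizontal flipping
        A.RandomScale(scale_limit=0.2, p=0.5),                    # scaling
        A.Resize(height=640, width=640),                          # restore 640x640 input size
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# Usage: augmented = train_transform(image=img, bboxes=boxes, class_labels=labels)
```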
The newly developed dataset was used to train three state-of-the-art object
detection models: YOLOv8, DETR, and Faster R-CNN. These models were cho-
sen to evaluate the dataset’s ability to support robust vehicle detection across
diverse scenarios. The training process and evaluation were conducted to ensure
a fair comparison of the models’ performance on the dataset.
After training, the models were tested on the validation set, and their perfor-
mance was compared. mAP scores revealed the ability of the dataset to support
high-accuracy detection across different traffic scenarios. Precision and recall
values demonstrated the models’ effectiveness in identifying vehicles of various
types, including motorbikes, cars, buses, and trucks. The inference speed pro-
vided insights into the suitability of the models for real-time traffic monitoring
and guidance systems.
3.3 Domain Discriminator
The discriminator in the model is designed to extract meaningful features from
both source and target datasets. Training in a supervised manner enables the
model to leverage these features for more precise predictions. The supervised
and unsupervised discriminators share the same feature extraction layers, but
their output layers differ. During training, backpropagation ensures that updates
to the weights in one model influence the other, promoting a unified learning
process.
The supervised discriminator produces outputs corresponding to N classes us-
ing a softmax activation function. On the other hand, the unsupervised discrim-
inator utilizes the pre-softmax outputs from the supervised model and applies
a custom activation function to process these outputs. The custom activation
is designed to compute a normalized sum of the exponential outputs, enabling
the unsupervised discriminator to assess confidence levels. The formula for this
custom activation is as follows:
D(x) = \frac{Z(x)}{Z(x) + 1}    (1)
where Z(x) is defined as:
Z(x) = \sum_{n=1}^{N} \exp[l_n(x)]    (2)

where l_n(x) denotes the n-th pre-softmax logit of the supervised discriminator.
The outputs of this equation range between 0.0 and 1.0. If the pre-softmax
probabilities exhibit low entropy (i.e., high confidence), the custom activation
output approaches 1.0. Conversely, if the probabilities have high entropy (i.e.,
low confidence), the custom activation output nears 0.0.
This mechanism allows the unsupervised discriminator to output high-confidence
predictions for source dataset samples while producing lower confidence for tar-
get dataset samples. Such an approach is highly efficient, as it reuses the same
feature extraction layers for both the supervised and unsupervised discrimina-
tors, ensuring consistency and computational efficiency.
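A minimal PyTorch sketch of this activation is shown below. It uses the identity Z/(Z + 1) = sigmoid(log Z), so Eqs. (1)-(2) reduce to a numerically stable logsumexp over the pre-softmax logits; the function name is illustrative.

```python
# Sketch of the custom activation in Eqs. (1)-(2): D(x) = Z(x) / (Z(x) + 1),
# with Z(x) the sum of exponentiated class logits.
import torch

def unsupervised_confidence(logits: torch.Tensor) -> torch.Tensor:
    """logits: (batch, N) pre-softmax outputs of the supervised discriminator."""
    log_z = torch.logsumexp(logits, dim=1)  # log Z(x), stable even for large logits
    return torch.sigmoid(log_z)             # equals Z / (Z + 1), in (0, 1)
```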
3.4 Vehicle Navigation Domain Adaptation (VNDA) Framework
The VNDA framework addresses the domain shift challenge between the source
domain (training data) and the target domain (real-world deployment data). In
the unsupervised domain adaptation setting, we assume the presence of source images
Xs and their corresponding labels Ys , originating from the source domain dis-
tribution ps (x, y). Additionally, a target dataset Xt is drawn from the target
domain distribution pt (x, y), where the target labels are unavailable.
The goal is to learn a target encoder Mt that produces feature representations
aligned with the source encoder Ms , allowing the classifier Cs trained on the
source domain to generalize effectively to the target domain. This alignment is
achieved using adversarial training to align the feature distributions Ms (Xs ) and
Mt (Xt ). The framework consists of the following steps:
Fig. 2. Overall VNDA framework architecture
Step 1: Training the Source Encoder and Discriminator
In the initial phase, the source encoder Ms and discriminator D are trained to
classify domain origin and align the source domain features. This step minimizes
the following adversarial loss for the discriminator:
\arg\min_{D} \mathcal{L}_{adv}^{D}(X_s, X_t, M_s, M_t) = -\mathbb{E}_{x_s \sim X_s}[\log D(M_s(x_s))] - \mathbb{E}_{x_t \sim X_t}[\log(1 - D(M_t(x_t)))]    (3)

where D(M_s(x_s)) represents the probability of correctly classifying features
from the source domain and 1 - D(M_t(x_t)) indicates the likelihood of identifying
target features as non-source.
Step 2: Training the Target Encoder
To align the target features with the source features, the target encoder Mt
is trained to fool the discriminator D. This ensures that D cannot distinguish
between source and target domain features. The adversarial loss for the target
encoder is defined as:
\arg\min_{M_t} \mathcal{L}_{adv}^{M}(X_t, D) = -\mathbb{E}_{x_t \sim X_t}[\log D(M_t(x_t))]    (4)
By minimizing this loss, the target encoder Mt learns to produce domain-
invariant features, effectively bridging the gap between source and target distri-
butions.
Step 3: Feature Alignment
Through iterative adversarial training, the feature extractor aligns the feature
distributions of the source (Ms (Xs )) and target (Mt (Xt )) domains. This align-
ment ensures that the classifier Cs , trained on source domain features, can gen-
eralize effectively to the target domain without requiring labeled target data.
Step 4: Testing
In the testing phase, the aligned target encoder Mt is combined with the pre-
trained source classifier Cs to predict labels for target domain samples Xt . Since
the feature distributions of the source and target domains are aligned during
training, the classifier Cs generalizes effectively to the target domain, allowing
accurate predictions.
Step 5: Fine-Tuning with Target Domain Annotations
If a small subset of manually annotated target domain data is available, it can be
used for fine-tuning the model. The fine-tuning process updates both the target
encoder Mt and the classifier Cs to adapt to specific characteristics of the target
domain, such as unique vehicle types, lighting conditions, or traffic patterns.
Loss Functions Summary
The VNDA framework optimizes two primary loss functions:
1. Adversarial Loss for the Discriminator:

\mathcal{L}_{adv}^{D}(X_s, X_t, M_s, M_t) = -\mathbb{E}_{x_s \sim X_s}[\log D(M_s(x_s))] - \mathbb{E}_{x_t \sim X_t}[\log(1 - D(M_t(x_t)))]    (5)

2. Adversarial Loss for the Target Encoder:

\mathcal{L}_{adv}^{M}(X_t, D) = -\mathbb{E}_{x_t \sim X_t}[\log D(M_t(x_t))]    (6)
The total loss combines these components, ensuring effective domain align-
ment while retaining high accuracy in the target domain.
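As a sketch, the two objectives can be written in PyTorch as follows, assuming the discriminator D ends in a sigmoid so its output is a source-domain probability; detaching the target features in the discriminator update is an implementation choice in the spirit of the alternating training above, not a detail prescribed by the framework.

```python
# Sketch of the adversarial objectives in Eqs. (5)-(6); Ms, Mt, D are
# placeholder modules (feature encoders and a sigmoid-output discriminator).
import torch
import torch.nn.functional as F

def discriminator_loss(D, Ms, Mt, xs, xt):
    """Eq. (5): D labels source features as 1 and target features as 0."""
    src = D(Ms(xs))                # probabilities for source features
    tgt = D(Mt(xt).detach())       # no gradient into Mt during the D update
    return (F.binary_cross_entropy(src, torch.ones_like(src))
            + F.binary_cross_entropy(tgt, torch.zeros_like(tgt)))

def target_encoder_loss(D, Mt, xt):
    """Eq. (6): Mt is trained to make target features look source-like."""
    tgt = D(Mt(xt))
    return F.binary_cross_entropy(tgt, torch.ones_like(tgt))
```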
3.5 Navigation System
The navigation system is designed to simulate an urban road network, where each
road is monitored by cameras to collect real-time traffic data. Vehicle speeds are
assumed to range between 30 km/h and 40 km/h, representing typical urban
conditions. The traffic data, including vehicle counts and their directions, is
used to categorize the traffic levels and dynamically adjust routing policies. The
system implements an A* algorithm for pathfinding, using traffic data to avoid
congested routes.
Traffic Monitoring and Classification: Each road segment in the network is
classified into one of four traffic conditions based on the number of vehicles
and the road length. The classification is as follows:
\text{Traffic Level} =
\begin{cases}
\text{Free flow}, & \text{vehicle count} < 5\ (50\,\text{m}) \text{ or } < 10\ (100\,\text{m}) \\
\text{Light traffic}, & 5 \le \text{vehicle count} < 10\ (50\,\text{m}) \text{ or } 10 \le \text{vehicle count} < 18\ (100\,\text{m}) \\
\text{Moderate traffic}, & 10 \le \text{vehicle count} < 15\ (50\,\text{m}) \text{ or } 18 \le \text{vehicle count} < 25\ (100\,\text{m}) \\
\text{Heavy congestion}, & \text{vehicle count} \ge 15\ (50\,\text{m}) \text{ or } \ge 25\ (100\,\text{m})
\end{cases}
This classification system is used to dynamically adjust road weights during the
pathfinding process, ensuring that heavily congested roads are deprioritized.
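This rule translates directly into code; the sketch below covers the two annotated road lengths (50 m and 100 m) and, as an assumption, applies the 100 m thresholds to any segment longer than 50 m.

```python
# Sketch of the traffic-level rule above (thresholds taken from the text).
def traffic_level(vehicle_count: int, road_length_m: int) -> str:
    free, light, moderate = (5, 10, 15) if road_length_m <= 50 else (10, 18, 25)
    if vehicle_count < free:
        return "Free flow"
    if vehicle_count < light:
        return "Light traffic"
    if vehicle_count < moderate:
        return "Moderate traffic"
    return "Heavy congestion"
```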
Pathfinding Using the A* Algorithm: To determine the optimal path from
a source road to a destination, the system uses the A* algorithm. The cost of
traveling along a road segment is determined by the number of vehicles and the
road length. The cost function is defined as:

Cost = vehicle count + road length

The A* algorithm minimizes the total estimated cost:

f(n) = g(n) + h(n)
where:
– g(n) is the cumulative cost from the start node to the current node n.
– h(n) is the heuristic estimate of the cost from n to the goal node.
– f (n) is the total estimated cost through node n.
Traffic Cost Function: The road weights are dynamically calculated using
traffic data. The cost of traversing a road is defined as:
Cost(n, m) = vehicle count[m] + road length[m]
This ensures that roads with higher congestion or longer lengths are penalized
during the routing process.
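A compact sketch of the resulting traffic-aware A* search follows; the adjacency-list graph and the pluggable heuristic are illustrative assumptions, not the data structures of the deployed system.

```python
# Sketch: A* over a road graph with Cost(n, m) = vehicle_count[m] + road_length[m].
import heapq

def a_star(graph, vehicle_count, road_length, heuristic, start, goal):
    """graph: {node: [neighbors]}; heuristic(n) estimates remaining cost to goal."""
    open_set = [(heuristic(start), 0, start, [start])]  # (f, g, node, path)
    best_g = {start: 0}
    while open_set:
        f, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path, g
        for nbr in graph[node]:
            g_new = g + vehicle_count[nbr] + road_length[nbr]  # traffic-aware cost
            if g_new < best_g.get(nbr, float("inf")):
                best_g[nbr] = g_new
                heapq.heappush(open_set,
                               (g_new + heuristic(nbr), g_new, nbr, path + [nbr]))
    return None, float("inf")  # no route found
```

With heuristic = lambda n: 0 the search reduces to Dijkstra's algorithm, a safe admissible choice when no distance estimate between roads is available.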
3.6 Evaluation Metrics
Mean Average Precision (mAP): mAP was calculated at IoU thresholds
of 0.5 (mAP@0.5) and 0.75 (mAP@0.75) to evaluate detection accuracy and
the models' ability to distinguish objects under varying overlap conditions. It is
defined as:
mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i    (7)

where AP_i is the average precision for the i-th class, and N is the total
number of classes.
Precision (P) measures the proportion of correctly detected objects among
all detections:
P = \frac{TP}{TP + FP}    (8)
Recall (R) represents the proportion of true positives out of all ground-truth
objects:
R = \frac{TP}{TP + FN}    (9)
F1 Score: The harmonic mean of precision and recall is used to assess the
balance between these two measures:
F_1 = 2 \cdot \frac{P \cdot R}{P + R}    (10)
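Once detections are matched to ground truth at a given IoU threshold, these metrics reduce to a few lines of code; the sketch below assumes per-class TP/FP/FN counts and per-class AP values are computed upstream.

```python
# Sketch of Eqs. (7)-(10) from pre-computed detection statistics.
def detection_metrics(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp) if tp + fp else 0.0            # Eq. (8)
    recall = tp / (tp + fn) if tp + fn else 0.0               # Eq. (9)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)                     # Eq. (10)
    return precision, recall, f1

def mean_average_precision(ap_per_class):
    """Eq. (7): mean of per-class average precision values."""
    return sum(ap_per_class) / len(ap_per_class)
```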
4 Experimental Results
4.1 DVV Dataset Training Results
The training results on the DVV dataset demonstrated the effectiveness of three
state-of-the-art object detection models: YOLOv8, Faster R-CNN, and DETR,
illustrated in Figure 3. Among these, YOLOv8 achieved the highest mean av-
erage precision (mAP) of 0.966, slightly outperforming Faster R-CNN at 0.964
and DETR at 0.954, showcasing its superior accuracy in detecting vehicles across
diverse scenarios. In terms of precision, YOLOv8 excelled with a score of 0.912,
significantly outperforming DETR (0.89) and Faster R-CNN (0.11), indicating
its capability to reduce false positives effectively. Recall values were comparable
among the models, with YOLOv8 achieving 0.915, slightly higher than Faster
R-CNN (0.913) and DETR (0.911). Furthermore, YOLOv8 demonstrated the
lowest training loss at 0.39, compared to Faster R-CNN (0.43) and DETR (0.44),
highlighting its efficiency in converging during training. These results establish
YOLOv8 as the most effective model on the DVV dataset, making it the pre-
ferred choice for the domain adaptation framework.
Fig. 3. Training results of the models on the DVV dataset
4.2 VNDA and Navigation System Performance Results
The system performs well across all seven domains, as illustrated in Figure 5, with
its highest performance observed in Domain 1, where it shows strong accuracy
in detecting relevant objects while minimizing errors. However, as the domains
increase in complexity, there is a noticeable decline in performance, particularly
in the more challenging Domain 7. Despite this, the system is able to adapt
effectively across a range of environments, though further optimization may be
required to maintain consistent performance as the complexity of the domain
increases.
The system’s ability to adapt to new domains is evident, though challenges
remain in handling more difficult conditions. Improvements could be made to
ensure that the system maintains a high level of performance across all domains,
especially in terms of object detection accuracy and minimizing errors.
When it comes to processing speed, as illustrated in Figure 6, the system
consistently maintains a stable FPS rate across all domains, averaging around 10
FPS. The FPS fluctuates slightly due to inherent noise, which is expected given
the variable computational load and scene complexity. Despite these fluctuations,
the system demonstrates that it can handle different domain conditions without
significant degradation in processing speed, ensuring reliable performance for
real-time applications.
In summary, while the system demonstrates strong adaptation across do-
mains and maintains a stable FPS, further optimization could help reduce per-
formance degradation in more complex domains, improving both accuracy and
processing speed for more demanding real-world applications.
Fig. 4. Traffic Guidance System
Fig. 5. Performance of YOLOv8-DVV across 7 domains
Fig. 6. FPS performance of YOLOv8-DVV across 7 domains
5 Conclusion and Discussion
5.1 Conclusion
This study introduces a domain adaptation framework for vehicle detection in
urban traffic, utilizing a new Dynamic Vehicle View (DVV) dataset tailored to
Vietnam’s urban environments. Experimental results show that YOLOv8 out-
performs other models in terms of mAP, Precision, and Recall, making it the
ideal choice for this system. The framework demonstrated consistent perfor-
mance across seven diverse domains, with minimal performance degradation in
more complex environments. The system maintained a stable FPS of 10, ensuring
real-time capability, which is crucial for urban traffic management applications.
5.2 Discussion
The system performs well across various domains but faces challenges in more
complex environments, where performance slightly decreases. This highlights
the need for further optimization to maintain consistent accuracy in difficult
conditions. The domain adaptation approach was successful in bridging domain
shifts, enabling the system to generalize across different traffic scenarios. Future
improvements could focus on enhancing the model’s robustness under extreme
weather conditions and complex traffic situations, ensuring broader applicability
in real-world settings.
References
1. Afifa, Kashaf Arshad, Nazim Hussain, Muhammad Hamza Ashraf, and Muham-
mad Zafar Saleem. Air pollution and climate change as grand challenges to sus-
tainability. Science of The Total Environment, 928:172370, June 2024.
2. Gregory Fields, David Hartgen, Adrian Moore, and Robert W. Poole Jr. Relieving
Congestion by Adding Road Capacity and Tolling. International Journal of Sus-
tainable Transportation, 3(5-6):360–372, August 2009.
3. Thi Phuong Linh Le and Tu Anh Trinh. Encouraging Public Transport Use to
Reduce Traffic Congestion and Air Pollutant: A Case Study of Ho Chi Minh City,
Vietnam. Procedia Engineering, 142:236–243, January 2016.
4. Jinlong Li, Runsheng Xu, Xinyu Liu, Jin Ma, Baolu Li, Qin Zou, Jiaqi Ma, and
Hongkai Yu. Domain Adaptation based Object Detection for Autonomous Driving
in Foggy and Rainy Weather. IEEE Transactions on Intelligent Vehicles, pages
1–12, 2024.
5. Hongye Liu, Yonghong Tian, Yaowei Wang, Lu Pang, and Tiejun Huang. Deep
relative distance learning: Tell the difference between similar vehicles. In Proceed-
ings of the IEEE Conference on Computer Vision and Pattern Recognition, pages
2167–2175, 2016.
6. Weixing Liu, Jun Liu, and Bin Luo. Unsupervised Domain Adaptation for Remote-
Sensing Vehicle Detection Using Domain-Specific Channel Recalibration. IEEE
Geoscience and Remote Sensing Letters, 20:1–5, 2023.
7. Siwei Lyu, Ming-Ching Chang, Dawei Du, Longyin Wen, Honggang Qi, Yuezun
Li, Yi Wei, Lipeng Ke, Tao Hu, Marco Del Coco, Pierluigi Carcagnì, Dmitriy
Anisimov, Erik Bochinski, Fabio Galasso, Filiz Bunyak, Guang Han, Hao Ye,
Hong Wang, Kannappan Palaniappan, Koray Ozcan, Li Wang, Liang Wang, Mar-
tin Lauer, Nattachai Watcharapinchai, Nenghui Song, Noor M. Al-Shakarji, Shuo
Wang, Sikandar Amin, Sitapa Rujikietgumjorn, Tatiana Khanova, Thomas Sikora,
Tino Kutschbach, Volker Eiselein, Wei Tian, Xiangyang Xue, Xiaoyi Yu, Yao Lu,
Yingbin Zheng, Yongzhen Huang, and Yuqi Zhang. UA-DETRAC 2017: Report
of AVSS2017 & IWT4S Challenge on Advanced Traffic Monitoring. In 2017 14th
IEEE International Conference on Advanced Video and Signal Based Surveillance
(AVSS), pages 1–7, August 2017.
8. Minh Quyen Nguyen, Thi Thanh Xuan Pham, and Thi Thuy Hoa Phan. Traffic
Congestion: A Prominent Problem in Vietnam Current Situation and Solutions.
European Journal of Engineering and Technology Research, 4(9):112–116, Septem-
ber 2019.
9. Sebastien Razakarivony and Frederic Jurie. Vehicle detection in aerial imagery : A
small target detection benchmark. Journal of Visual Communication and Image
Representation, 34:187–203, January 2016.
10. Khaled Saleh, Ahmed Abobakr, Mohammed Attia, Julie Iskander, Darius Naha-
vandi, Mohammed Hossny, and Saeid Nahavandi. Domain Adaptation for Vehicle
Detection from Bird’s Eye View LiDAR Point Cloud Data. In 2019 IEEE/CVF
International Conference on Computer Vision Workshop (ICCVW), pages
3235–3242, October 2019.
11. Linjie Yang, Ping Luo, Chen Change Loy, and Xiaoou Tang. A Large-Scale Car
Dataset for Fine-Grained Categorization and Verification. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pages 3973–3981, 2015.
12. Yue Yao, Liang Zheng, Xiaodong Yang, Milind Naphade, and Tom Gedeon. Simu-
lating content consistent vehicle datasets with attribute descent. In ECCV, 2020.
13. Kai Zhang and Stuart Batterman. Air pollution and health risks due to vehicle
traffic. Science of The Total Environment, 450-451:307–316, April 2013.
14. Zhedong Zheng, Tao Ruan, Yunchao Wei, Yi Yang, and Tao Mei. VehicleNet:
Learning Robust Visual Representation for Vehicle Re-Identification. IEEE Trans-
actions on Multimedia, 23:2683–2693, 2021.