A Survey of Autonomous Driving: Common
Practices and Emerging Technologies
Ekim Yurtsever∗ , Jacob Lambert∗ , Alexander Carballo∗ ,
Kazuya Takeda∗ †
Abstract—Automated driving systems (ADSs) promise a safe,
comfortable and efficient driving experience. However,
fatalities
involving vehicles equipped with ADSs are on the rise. The full
potential of ADSs cannot be realized unless the robustness of
the state-of-the-art is improved further. This paper discusses unsolved
problems and surveys the technical aspect of automated driving.
Studies regarding present challenges, high-level system
architec-
tures, emerging methodologies and core functions: localization,
mapping, perception, planning, and human machine interface,
were thoroughly reviewed. Furthermore, the state-of-the-art was
implemented on our own platform and various algorithms were
compared in a real-world driving setting. The paper concludes
with an overview of available datasets and tools for ADS
development.
I. INTRODUCTION
ACCORDING to a recent technical report by the
National Highway Traffic Safety Administration (NHTSA), 94%
of road accidents are caused by human errors [1]. Automated
driving systems (ADSs) are being developed with the promise
of preventing accidents, reducing emissions, transporting the
mobility-impaired and reducing driving related stress [2].
Annual social benefits of ADSs are projected to reach nearly
$800 billion by 2050 through congestion mitigation, road ca-
sualty reduction, decreased energy consumption and increased
productivity caused by the reallocation of driving time [3].
Eureka Project PROMETHEUS [4] was carried out in Eu-
rope between 1987-1995, and it was one of the earliest major
automated driving studies. The project led to the development
of VITA II by Daimler-Benz, which succeeded in automat-
ically driving on highways [5]. DARPA Grand Challenge,
organized by the US Department of Defense in 2004, was
the first major automated driving competition, in which all of the
participants failed to finish the 150-mile off-road course. The
difficulty of the challenge was due to the rule that no human
intervention at any level was allowed during the finals. Another
similar DARPA Grand Challenge was held in 2005. This time
five teams managed to complete the off-road track without any
human interference [6].
Fully automated driving in urban scenes was seen as the
biggest challenge of the field since the earliest attempts.
During DARPA Urban Challenge [7], held in 2007, many
different research groups around the globe tried their ADSs
in a test environment that was modeled after a typical urban
scene. Six teams managed to complete the event. Even though
this competition was the biggest and most significant event
up to that time, the test environment lacked certain aspects
of a real-world urban driving scene, such as pedestrians and
cyclists. Nevertheless, the fact that six teams managed to
complete the challenge attracted significant attention. After
the DARPA Urban Challenge, several more automated driving
competitions such as [8]–[10] were held in different countries.

∗ E. Yurtsever, J. Lambert, A. Carballo and K. Takeda are with Nagoya
University, Furo-cho, Nagoya, 464-8603, JAPAN.
† K. Takeda is also with Tier4 Inc., Nagoya, JAPAN.
Corresponding author: Ekim Yurtsever, [email protected]
Common practices in system architecture have been estab-
lished over the years. Most of the ADSs divide the massive
task of automated driving into subcategories and employ an
array of sensors and algorithms on various modules. More
recently, end-to-end driving systems started to emerge as an
alternative to modular approaches. Deep learning models have
become dominant in many of these tasks [11]. A high level
classification of ADS architectures is given in Figure 1.
The accumulated knowledge in vehicle dynamics, break-
throughs in computer vision caused by the advent of deep
learning [12] and availability of new sensor modalities, such
as lidar [13], catalyzed ADS research and industrial imple-
mentation. Furthermore, an increase in public interest and
market potential precipitated the emergence of ADSs with
varying degrees of automation. However, robust automated
driving in urban environments has not been achieved yet [14].
Accidents caused by immature systems [15]–[18] undermine
trust, and furthermore, they cost lives. As such, a thorough
investigation of unsolved challenges and the state-of-the-art is
deemed necessary here.
The Society of Automotive Engineers (SAE) refers to
hardware-software systems that can execute dynamic driving
tasks (DDT) on a sustainable basis as ADS [19]. There are also
vernacular alternative terms such as "autonomous driving" and
"self-driving car" in use. Despite being commonly used, SAE
advises against these terms, as they are unclear and misleading.
In this paper we follow SAE's convention.
The present paper attempts to provide a structured and com-
prehensive overview of the hardware-software architectures of
state-of-the-art ADSs. Moreover, emerging trends such as end-
to-end driving and connected systems are discussed in detail.
There are overview papers on the subject, which covered
several core functions [20], [21], and which concentrated on
the motion planning aspect [22], [23]. However, a survey that
covers: present challenges, available and emerging high-level
system architectures, individual core functions such as local-
ization, mapping, perception, planning, vehicle control, and
human-machine interface altogether does not exist. The aim
of this paper is to fill this gap in the literature with a thorough
survey. In addition, a detailed summary of available datasets,
software stacks, and simulation tools is presented here.
TABLE I: Comparison of survey papers

Related work | Connected systems | End-to-end | Localization | Perception | Assessment | Planning | Control | HMI | Datasets & software | Implementation
[20]  | - | - | X | X | - | X | X | - | - | X
[21]  | - | - | X | X | - | X | - | - | - | -
[22]  | - | - | - | - | - | X | X | - | - | -
[23]  | - | - | - | - | - | X | - | - | - | -
[24]  | X | X | X | X | - | - | - | - | - | -
[25]  | X | - | X | - | - | - | - | - | X | -
[26]  | X | - | - | - | - | - | - | - | - | -
[27]  | - | - | X | X | - | X | X | - | - | X
[28]  | - | X | - | - | X | X | X | - | - | -
Ours  | X | X | X | X | X | X | - | X | X | X
Another contribution of this paper is the detailed comparison and
analysis of alternative approaches through implementation. We
implemented the state-of-the-art on our own platform using open-
source software. A comparison of existing overview papers and
our work is shown in Table I.
The remainder of this paper is written in eight sections.
Section II is an overview of present challenges. Details of
automated driving system components and architectures are
given in Section III. Section IV presents a summary of state-
of-the-art localization techniques followed by Section V, an in-
depth review of perception models. Assessment of the driving
situation and planning are discussed in Section VI and VII
respectively. In Section VIII, current trends and shortcomings
of human machine interface are introduced. Datasets and
available tools for developing automated driving systems are
given in Section IX.
II. PROSPECTS AND CHALLENGES
A. Social impact
Widespread usage of ADSs is not imminent; yet, it is still
possible to foresee their potential impact and benefits to a
certain degree.
1) Problems that can be solved: preventing traffic acci-
dents, mitigating traffic congestions, reducing emissions
2) Arising opportunities: reallocation of driving time, trans-
porting the mobility impaired
3) New trends: consuming Mobility as a Service (MaaS),
logistics revolution
Widespread deployment of ADSs can reduce the societal
loss caused by erroneous human behavior such as distraction,
driving under the influence and speeding [3].
Globally, the elderly population (over 60 years old) is growing
faster than younger groups [29]. Increasing the mobility
of the elderly with ADSs can have a huge impact on the quality
of life and productivity of a large portion of the population.
A shift from personal vehicle-ownership towards consuming
Mobility as a Service (MaaS) is an emerging trend. Currently,
ride-sharing has lower costs compared to vehicle-ownership
under 1000 km annual mileage [30]. The ratio of owned to
shared vehicles is expected to be 50:50 by 2030 [31]. Large
scale deployment of ADSs can accelerate this trend.
B. Challenges
ADSs are complicated robotic systems that operate in inde-
terministic environments. As such, there are myriad scenarios
with unsolved issues. This section discusses the high level
challenges of driving automation in general. More minute,
task-specific details are discussed in corresponding sections.
The Society of Automotive Engineers (SAE) defined five
levels of driving automation in [19]. In this taxonomy, level
zero stands for no automation at all. Primitive driver assistance
systems such as adaptive cruise control, anti-lock braking
systems and stability control start with level one [32]. Level
two is partial automation to which advanced assistance systems
such as emergency braking or collision avoidance [33], [34]
are integrated. With the accumulated knowledge in the vehicle
control field and the experience of the industry, level two
automation became a feasible technology. The real challenge
starts above this level.
Level three is conditional automation: the driver can focus
on tasks other than driving during normal operation but has
to respond quickly to an emergency alert from the vehicle and
be ready to take over. In addition, level three ADSs operate
only in limited operational design domains (ODDs) such as
highways. Audi claims to have built the first production car to
achieve level three automation in limited highway conditions [35].
However, taking over the control manually from the automated
mode by the driver raises another issue. Recent studies [36],
[37] investigated this problem and found that the takeover
situation increases the collision risk with surrounding vehicles.
The increased likelihood of an accident during a takeover is a
problem that is yet to be solved.
Human attention is not needed at all at levels four and five.
However, level four can only operate in limited ODDs
where special infrastructure or detailed maps exist. In the case
of departure from these areas, the vehicle must stop the trip
by automatically parking itself. The fully automated system,
level five, can operate in any road network and any weather
condition. No production vehicle is capable of level four or
level five driving automation yet. Moreover, Toyota Research
Institute stated that no one in the industry is even close to
attaining level five automation [38].
Level four and above driving automation in urban road
networks is an open and challenging problem. The envi-
ronmental variables, from weather conditions to surrounding
human behavior, are highly indeterministic and difficult to
predict. Furthermore, system failures lead to accidents: in the
Hyundai competition, one of the ADSs crashed because of rain
[15]; Google's ADS hit a bus while changing lanes because it
failed to estimate the bus's speed [16]; and Tesla's Autopilot
failed to recognize a white truck and collided with it, killing
the driver [17].

Fig. 1: A high level classification of automated driving system architectures

Fig. 2: A generic end-to-end system information flow diagram
Fatalities [17], [18] caused by immature ADSs undermine
trust. According to a recent survey [30], the majority of
consumers question the safety of the technology, and they want
a significant amount of control over the development and use
of ADS. On the other hand, extremely cautious ADSs are also
making a negative impression [39].
Ethical dilemmas pose another set of challenges. In an
inevitable accident situation, how should the system behave
[40]? Experimental ethics were proposed regarding this issue
[41].
Risk and reliability certification is another task yet to be
solved. Like in aircraft, ADSs need to be designed with high
redundancies that will minimize the chance of a catastrophic
failure. Even though there are promising projects in this regard,
such as DeepTest [42], the design-simulation-test-redesign-
certification procedure has not yet been established by either
the industry or the rule-makers.
Finally, various optimization goals such as time to reach the
destination, fuel efficiency, comfort, and ride-sharing optimiza-
tion increase the complexity of an already difficult problem.
As such, carrying out all of the dynamic driving tasks safely
under strict conditions outside a well-defined, geofenced area
has not been achieved yet and remains an open problem.
III. SYSTEM COMPONENTS AND ARCHITECTURE
A. System architecture
Classification of system architectures is shown in Figure
1. ADSs are designed either as standalone, ego-only sys-
tems [20], [43] or connected multi-agent systems [44]–[46].
Furthermore, these design philosophies are realized with two
alternative approaches: modular [20], [43], [47]–[54] or end-
to-end designs [55]–[63].
TABLE II: End-to-end driving architectures

Related works | Learning/training strategy | Pros/cons
[55]–[59] | Direct supervised deep learning | Imitates the target data, usually a human driver. Can be trained offline. Poor generalization performance.
[60], [61] | Deep reinforcement learning | Learns the optimum way of driving. Requires online interaction. Urban driving has not been achieved yet.
[62], [63] | Neuroevolution | No backpropagation. Requires online interaction. Real-world driving has not been achieved yet.
1) Ego-only systems: The ego-only approach is to carry
out all of the necessary automated driving operations on a single
self-sufficient vehicle at all times, whereas a connected ADS
may or may not depend on other vehicles and infrastructure
elements given the situation. Ego-only is the most common
approach amongst the state-of-the-art ADSs [20], [43], [47]–
[52], [52]–[54]. We believe this is due to the practicality
of having a self-sufficient platform for development and the
additional challenges of connected systems.
2) Modular systems: Modular systems, referred to as the
mediated approach in some works [55], are structured as a
pipeline of separate components linking sensory inputs to
actuator outputs [11]. Core functions of a modular ADS
can be summarized as: localization and mapping, perception,
assessment, planning and decision making, vehicle control,
and human-machine interface. Typical pipelines [20], [43],
[47]–[52], [52]–[54] start with feeding raw sensor inputs to
localization and object detection modules, followed by scene
prediction and decision making. Finally, motor commands are
generated at the end of the stream by the control module [11],
[64].
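To make the data flow of such a pipeline concrete, the following is a minimal Python sketch of a modular ADS skeleton. The module names, interfaces, and the VehicleCommand fields are illustrative assumptions rather than the interface of any cited system; each stage would wrap the algorithms surveyed in the following sections.

```python
from dataclasses import dataclass

# Hypothetical module interfaces illustrating the modular pipeline described above.

@dataclass
class VehicleCommand:
    steering: float   # rad
    throttle: float   # [0, 1]
    brake: float      # [0, 1]

class ModularADS:
    def __init__(self, localizer, detector, planner, controller):
        self.localizer = localizer
        self.detector = detector
        self.planner = planner
        self.controller = controller

    def step(self, camera_frame, lidar_scan, imu_reading) -> VehicleCommand:
        """One pass through the pipeline: sensing -> scene understanding -> actuation."""
        pose = self.localizer.update(lidar_scan, imu_reading)      # localization & mapping
        objects = self.detector.detect(camera_frame, lidar_scan)   # perception
        trajectory = self.planner.plan(pose, objects)              # assessment & planning
        return self.controller.track(trajectory, pose)             # vehicle control
```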
Developing individual modules separately divides the chal-
lenging task of automated driving into an easier-to-solve set
of problems [65]. These sub-tasks have their corresponding
literature in robotics [66], computer vision [67] and vehicle
dynamics [32], which makes the accumulated know-how and
expertise directly transferable. This is a major advantage of
modular systems. In addition, functions and algorithms can
be integrated or built upon each other in a modular design.
For example, a safety constraint [68] can be implemented on top of
a sophisticated planning module to force some hard-coded
emergency rules without modifying the inner workings of
the planner. This enables designing redundant but reliable
architectures.
The major disadvantages of modular systems are being
prone to error propagation [11] and over-complexity. In the
unfortunate Tesla accident, an error in the perception module,
misclassification of a white trailer as sky, propagated down
the pipeline until failure, causing the first ADS related fatality
[42].
3) End-to-end systems: End-to-end driving systems, re-
ferred to as direct perception in some studies [55], generate
ego-motion directly from sensory inputs. Ego-motion can be
either the continuous operation of the steering wheel and pedals
or a discrete set of actions, e.g., accelerating and turning left.
There are three main approaches for end-to-end driving: direct
supervised deep learning [55]–[59], neuroevolution [62], [63]
and the more recent deep reinforcement learning [60], [61].
The flow diagram of a generic end-to-end driving system is
shown in Figure 2 and comparison of the approaches is given
in Table II.
The earliest end-to-end driving system dates back to
ALVINN [56], where a 3-layer fully connected network was
trained to output the direction that the vehicle should follow.
An end-to-end system for off-road driving was introduced in
[57]. With the advances in artificial neural network research,
deep convolutional and temporal networks became feasible
for automated driving tasks. A deep convolutional neural
network that takes image as input and outputs steering was
proposed in [58]. A spatiotemporal network, an FCN-LSTM
architecture, was developed for predicting ego-vehicle motion
in [59]. DeepDriving is another convolutional model that tries
to learn a set of discrete perception indicators from the image
input [55]. This approach is not entirely end-to-end though;
the proper driving actions for the perception indicators have
to be generated by another module. All of the mentioned
methods follow direct supervised training strategies. As such,
ground truth is required for training. Usually, the ground truth
is the ego-action sequence of an expert human driver, and the
network learns to imitate the driver. This raises an important
design question: should the ADS drive like a human?
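As a concrete illustration of the direct supervised strategy discussed above, the following PyTorch sketch trains a small convolutional network to map a camera frame to a steering command by imitating logged human steering. The architecture, tensor sizes, and dummy data are illustrative assumptions and do not reproduce the networks of [56]–[59].

```python
import torch
import torch.nn as nn

class SteeringNet(nn.Module):
    """Toy end-to-end model: RGB image -> single steering command."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(48 * 4 * 4, 100), nn.ReLU(), nn.Linear(100, 1)
        )

    def forward(self, img):
        return self.head(self.features(img))

model = SteeringNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# One supervised step on a dummy batch of camera frames and human steering labels.
images = torch.rand(8, 3, 120, 160)          # stand-in for recorded camera frames
human_steering = torch.rand(8, 1) * 2 - 1    # stand-in for logged steering in [-1, 1]
pred = model(images)
loss = loss_fn(pred, human_steering)         # imitate the human driver
optimizer.zero_grad()
loss.backward()
optimizer.step()
```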
A novel deep reinforcement learning model, Deep Q Net-
works (DQN), combined reinforcement learning with deep
learning [69]. In summary, the goal of the network is to select
a set of actions that maximize cumulative future rewards. A
deep convolutional neural network was used to approximate
the optimal action reward function. Actions are generated
first with random initialization. Then, the network adjusts
its parameters through experience instead of direct supervised
learning. An automated driving framework using DQN was
introduced in [60], where the network was tested in a sim-
ulation environment. The first real world run with DQN was
achieved in a countryside road without traffic [61]. DQN based
systems do not imitate the human driver, instead, they learn
the ”optimum” way of driving.
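The following PyTorch sketch illustrates the DQN mechanics summarized above: an epsilon-greedy policy over a discrete action set and a temporal-difference update toward a target network. The state dimension, action set, and hyperparameters are illustrative assumptions, not the setup used in [60] or [61].

```python
import random
import torch
import torch.nn as nn

# Hypothetical discrete driving action set, for illustration only.
ACTIONS = ["keep_lane", "accelerate", "brake", "steer_left", "steer_right"]
STATE_DIM, GAMMA, EPSILON = 16, 0.99, 0.1

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, len(ACTIONS)))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, len(ACTIONS)))
target_net.load_state_dict(q_net.state_dict())   # periodically synced copy in a full DQN
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def select_action(state):
    """Epsilon-greedy action selection over the estimated action values."""
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    with torch.no_grad():
        return int(q_net(state).argmax())

def dqn_update(state, action, reward, next_state, done):
    """One temporal-difference update toward the Bellman target."""
    q_value = q_net(state)[action]
    with torch.no_grad():
        target = reward + (0.0 if done else GAMMA * target_net(next_state).max())
    loss = nn.functional.mse_loss(q_value, torch.as_tensor(target))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Dummy transition from a simulated environment step.
s, s_next = torch.rand(STATE_DIM), torch.rand(STATE_DIM)
a = select_action(s)
dqn_update(s, a, reward=1.0, next_state=s_next, done=False)
```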
Neuroevolution refers to using evolutionary algorithms to
train artificial neural networks [70]. End-to-end driving with
neuroevolution is not as popular as DQN and direct supervised
learning. To the best of our knowledge, real-world end-to-
end driving with neuroevolution has not been achieved yet. However,
some promising simulation results were obtained [62], [63].
ALVINN was trained with neuroevolution and outperformed
the direct supervised learning version [62]. An RNN was
trained with neuroevolution in [63] using a driving simulator.
The biggest advantage of neuroevolution is the removal of
backpropagation and, hence, of the need for direct supervision.
End-to-end driving is promising; however, it has not been
implemented in real-world urban scenes yet, except for limited
demonstrations. The biggest shortcomings of end-to-end sys-
tems in general are the lack of hard-coded safety measures and
interpretability [65]. In addition, DQN and neuroevolution have
one major disadvantage over direct supervised learning: these
networks must interact with the environment online and fail
many times in order to learn the desired behavior. In contrast, direct
supervised networks can be trained offline with labeled data.
4) Connected systems: There is no operational connected
ADS in use yet; however, some researchers [44]–[46] be-
lieve this emerging technology will be the future of driving
automation. With the use of Vehicular Ad hoc NETworks
(VANETs), the basic operations of automated driving can be
distributed amongst agents. V2X is a term that stands for
“vehicle to everything.” From mobile devices of pedestrians
to stationary sensors on a traffic light, an immense amount of
data can be accessed by the vehicle with V2X [26]. By sharing
detailed information of the traffic network amongst peers [71],
shortcomings of the ego-only platforms such as sensing range,
blind spots, and computational limits may be eliminated. More
V2X applications that will increase safety and traffic efficiency
are expected to emerge in the foreseeable future [72].
VANETs can be realized in two different ways: conventional
IP based networking and Information-Centric Networking
(ICN) [44]. For vehicular applications, large amounts of data have to be
distributed amongst agents over intermittent, less-than-ideal
connections while maintaining high mobility [46]. Con-
ventional host-based IP networking cannot function prop-
erly under these conditions. On the other hand, in information-
centric networking, vehicles stream query messages to an area
instead of a direct address and they accept corresponding
responses from any sender [45]. Since vehicles are highly
mobile and dispersed on the road network, the identity of
the information source becomes less relevant. In addition,
local data often carries more crucial information for immediate
driving tasks such as avoiding a rapidly approaching vehicle
on a blind spot.
Early works, such as the CarSpeak system [73], proved that
vehicles can utilize each other’s sensors and use the shared
information to execute some dynamic driving tasks. However,
without reducing the huge amount of continuous driving data,
sharing information between hundreds of thousands of vehicles
in a city is not feasible. A semiotic framework
that integrates different sources of information and converts
raw sensor data into meaningful descriptions was introduced
in [74] for this purpose. In [75], the term Vehicular Cloud
Computing (VCC) was coined and its main advantages
over conventional Internet cloud applications were introduced.
Sensors are the primary cause of the difference. In VCC,
sensor information is kept on the vehicle and only shared if
there is a local query from another vehicle. This potentially
saves the cost of uploading/downloading a constant stream
of sensor data to the web. Besides, the high relevance of
local data increases the feasibility of VCC. Regular cloud
computing was compared to vehicular cloud computing and
it was reported that VCC is technologically feasible [76]. The
term ”Internet of Vehicles” (IoV) was proposed for describing
a connected ADS [44] and the term ”vehicular fog” was
introduced in [45].
Establishing an efficient VANET with thousands of vehicles
in a city is a huge challenge. For an ICN based VANET,
some of the challenging topics are security, mobility, routing,
naming, caching, reliability and multi-access computing [77].
Fig. 3: Ricoh Theta V panoramic images collected using our
data collection platform on the Nagoya University campus. Note
that some distortion still remains on the periphery of the image.
In summary, even though the potential benefits of a connected
system are huge, the additional challenges increase the com-
plexity of the problem to a significant degree. As such, there
is no operational connected system yet.
B. Sensors and hardware
State-of-the-art ADSs employ a wide selection of onboard
sensors. High sensor redundancy is needed in most of the
tasks for robustness and reliability. Hardware units can be
categorized into five groups: exteroceptive sensors for perception,
proprioceptive sensors for internal vehicle state monitoring
tasks, communication arrays, actuators, and computational
units.
Exteroceptive sensors are mainly used for perceiving the
environment, which includes dynamic and static objects, e.g.,
drivable areas, buildings, pedestrian crossings. Camera, lidar,
radar and ultrasonic sensors are the most commonly used
modalities for this task. A detailed comparison of exteroceptive
sensors is given in Table III.
1) Monocular Cameras: Cameras can sense color and are
passive, i.e., they do not emit any signal for measurements. Sens-
ing color is extremely important for tasks such as traffic light
recognition. Furthermore, 2D computer vision is an established
field with remarkable state-of-the-art algorithms. Moreover,
a passive sensor does not interfere with other systems since
it does not emit any signals. However, cameras have certain
shortcomings. Illumination conditions affect their performance
drastically, and depth information is difficult to obtain from a
single camera. There are promising studies [83] to improve
monocular camera based depth perception, but modalities
that are not negatively affected by illumination and weather
conditions are still necessary besides cameras for dynamic
driving tasks. Other camera types gaining interest for ADS
include flash cameras [78], thermal cameras [80], [81], and
event cameras [79].
2) Omnidirectional Cameras: For 360◦ 2D vision, omnidirec-
tional cameras are used as an alternative to camera arrays. They
have seen increasing use, with increasingly compact and high
performance hardware being constantly released. A panoramic
view is particularly desirable for applications such as naviga-
tion, localization and mapping [84].
Fig. 4: DAVIS240 events, overlaid on the image (left) and
the corresponding RGB image from a different camera (right),
collected by our data collection platform at a road crossing
near Nagoya University. The motion of the cyclist and vehicle
causes brightness changes which trigger events.
Fig. 5: The ADS equipped Prius of Nagoya University. We
have used this vehicle to perform core automated driving
functions.
3) Event Cameras: Event cameras are among the newer
sensing modalities that have seen use in ADS [85]. Event
cameras record data asynchronously for individual pixels with
respect to visual stimulus. The output is therefore an irregular
sequence of data points, or events triggered by changes in
brightness. The response time is in the order of microseconds
[86]. The main limitation of current event cameras is pixel
size and image resolution. For example, the DAVIS240 image
shown in Figure 4 has a pixel size of 18.5 × 18.5 µm and a
resolution of 240×180. Recently, a driving dataset with event
camera data has been published [85].
4) Radar: Radar, lidar and ultrasonic sensors are very
useful in covering the shortcomings of cameras. Depth infor-
mation, i.e. distance to objects, can be measured effectively
to retrieve 3D information with these sensors, and they are
not affected by illumination conditions. However, they are
active sensors. Radars emit radio waves that bounce back
from objects and measure the time of each bounce. Emissions
from active sensors can interfere with other systems. Radar
is a well established technology that is both lightweight and
cost effective. For example, radars can fit inside side-mirrors.
Radars are cheaper and can detect objects at longer distances
than lidars. However, lidars are more accurate.
TABLE III: Exteroceptive sensors

Modality | Affected by illumination | Affected by weather | Color | Depth | Range | Accuracy | Size | Cost
Lidar | - | X | - | X | medium (< 200 m) | high | large* | high*
Radar | - | - | - | X | high | medium | small | medium
Ultrasonic | - | - | - | X | short | low | small | low
Camera | X | X | X | - | - | - | smallest | lowest
Stereo Camera | X | X | X | X | medium (< 100 m) | low | medium | low
Flash Camera [78] | X | X | X | X | medium (< 100 m) | low | medium | low
Event Camera [79] | limited | X | - | - | - | - | smallest | low
Thermal Camera [80], [81] | - | X | - | - | - | - | smallest | low

* The cost, size and weight of lidars have started to decrease recently [82]
TABLE IV: Onboard sensor setup of ADS equipped vehicles

Platform | # 360◦ rotating lidars | # stationary lidars | # Radars | # Cameras
Ours | 1 | - | - | 4
Boss [43] | 1 | 9 | 5 | 2
Junior [20] | 1 | 2 | 6 | 4
BRAiVE [48] | - | 5 | 1 | 10
RobotCar [49] | - | 3 | - | 4
Google car (Prius) [51] | 1 | - | 4 | 1
Uber car (XC90) [52] | 1 | - | 10 | 7
Uber car (Fusion) [52] | 1 | 7 | 7 | 20
Bertha [53] | - | - | 6 | 3
Apollo Auto [54] | 1 | 3 | 2 | 2
5) Lidar: Lidar operates on a similar principle to that of
radar but emits infrared light waves instead of radio waves.
It has much higher accuracy than radar under 200 meters.
Weather conditions such as fog or snow have a negative impact
on the performance of lidar. Another aspect is the sensor size.
Smaller sensors are preferred on the vehicle because of limited
space and aerodynamic restraints. Lidar is larger than radar.
In [87], human sensing performance is compared to ADS.
One of the key findings of this study is that even though
human drivers are still better at reasoning in general, the
perception capability of ADSs with sensor-fusion can exceed
humans, especially in degraded conditions such as insufficient
illumination.
6) Proprioceptive sensors: Proprioceptive sensing is an-
other crucial category. Vehicle states such as speed, accel-
eration and yaw must be continuously measured in order
to operate the platform safely with feedback. Almost all
of the modern production cars are equipped with adequate
proprioceptive sensors. Wheel encoders are mainly used for
odometry, Inertial Measurement Units (IMU) are employed
for monitoring the velocity and position changes, tachometers
are utilized for measuring speed and altimeters for altitude.
These signals can be accessed through the CAN protocol of
modern cars.
Besides sensors, an ADS needs actuators to manipulate the
vehicle and advanced computational units for processing and
storing sensor data.
7) Full size cars: There are numerous instrumented vehi-
cles introduced by different research groups such as Stanford’s
Junior [20], which employs an array of sensors with different
modalities for perceiving external and internal variables. Boss
[43] won the DARPA Urban Challenge with an abundance of
sensors. RobotCar [49] is a cheaper research platform aimed
for data collection. In addition, different levels of driving
automation have been introduced by the industry; Tesla’s
Autopilot [88] and Google’s self driving car [89] are some
examples. Bertha [53] was developed by Daimler and has four
120◦ short-range radars, two long-range radars on the sides,
a stereo camera, a wide-angle monocular color camera on the
dashboard, and another wide-angle camera facing the back. Our
vehicle is shown in Figure 5. A detailed comparison of sensor
setups of 10 different full-size ADSs is given in Table IV.
8) Large vehicles and trailers: The earliest intelligent trucks
were developed for the PATH program in California [98],
which utilized magnetic markers on the road. Fuel economy is
an essential topic in freight transportation, and methods such
as platooning have been developed for this purpose. Platooning
is a well-studied phenomenon; it reduces drag and therefore
fuel consumption [99]. In semi-autonomous truck platooning,
the lead truck is driven by a human driver, and several
automated trucks follow it; forming a semi-autonomous road-
train as defined in [100]. The SARTRE European Union project [101]
introduced such a system that satisfies three core conditions:
using the already existing public road network, sharing the
traffic with non-automated vehicles and not modifying the road
infrastructure. A platoon consisting of three automated trucks
was formed in [99] and significant fuel savings were reported.
Tractor-trailer setup poses an additional challenge for au-
tomated freight transport. Conventional control methods such
as feedback linearization [102] and fuzzy control [103] were
used for path tracking without considering the jackknifing
constraint. The possibility of jackknifing, the collision of the
truck and the trailer with each other, increases the difficulty of
the task [104]. A control safety governor design was proposed
in [104] to prevent jackknifing while reversing.
IV. LOCALIZATION AND MAPPING
Localization is the task of finding ego-position relative
to a reference frame in an environment [106], and it is
fundamental to any mobile robot. It is especially crucial for
ADSs [25]; the vehicle must use the correct lane and position
itself in it accurately. Furthermore, localization is an elemental
requirement for global navigation.

TABLE V: Localization techniques

Methods | Robustness | Cost | Accuracy | Size | Computational requirements | Related works
Absolute positioning sensors | low | low | low | small | lowest | [90]
Odometry/dead reckoning | low | low | low | smallest | low | [91]
GPS-IMU fusion | medium | medium | low | small | low | [92]
SLAM | medium-high | medium | high | large | very high | [93]
A priori map-based: landmark search | high | medium | high | large | medium | [94], [95]
A priori map-based: point cloud matching | highest | highest | highest | largest | high | [96], [97]

Fig. 6: We used NDT matching [97], [105] to localize our
vehicle in the Nagoya University campus. White points belong
to the offline map and the colored ones were obtained from
online scans. The objective is to find the best match between
colored points and white points, thus localizing the vehicle.
The remainder of this section details the three most common
approaches that use solely on-board sensors: Global Position-
ing System and Inertial Measurement Unit (GPS-IMU) fusion,
Simultaneous Localization And Mapping (SLAM), and state-
of-the-art a priori map-based localization. Readers are referred
to [106] for a broader localization overview. A comparison of
localization methods is given in Table V.
A. GPS-IMU fusion
The main principle of GPS-IMU fusion is correcting accu-
mulated errors of dead reckoning in intervals with absolute
position readings [107]. In a GPS-IMU system, changes in
position and orientation are measured by IMU, and this
information is processed for localizing the robot with dead
reckoning. There is a significant drawback of IMUs, and of
dead reckoning in general, though: errors accumulate with time
and often lead to failure in long-term operations [108]. With
the integration of GPS readings, the accumulated errors of the
IMU can be corrected in intervals.
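The fusion principle described above can be sketched as follows with NumPy: the pose is propagated by dead reckoning at a high rate and blended with an absolute GPS fix whenever one arrives. The motion model, rates, and the fixed blending gain are simplifying assumptions; a practical system would use a Kalman-type filter with proper covariances [107].

```python
import numpy as np

def dead_reckon(pose, velocity, yaw_rate, dt):
    """Propagate (x, y, yaw) with a constant-velocity unicycle model (IMU/odometry step)."""
    x, y, yaw = pose
    x += velocity * np.cos(yaw) * dt
    y += velocity * np.sin(yaw) * dt
    yaw += yaw_rate * dt
    return np.array([x, y, yaw])

def gps_correct(pose, gps_xy, gain=0.5):
    """Blend an absolute GPS position fix into the dead-reckoned pose.
    A real system would weight this by sensor covariances (e.g., a Kalman filter)."""
    corrected = pose.copy()
    corrected[:2] = (1 - gain) * pose[:2] + gain * np.asarray(gps_xy)
    return corrected

pose = np.array([0.0, 0.0, 0.0])
for step in range(100):
    pose = dead_reckon(pose, velocity=10.0, yaw_rate=0.01, dt=0.1)   # 10 Hz IMU/odometry
    if step % 10 == 0:                                               # 1 Hz GPS fix
        noisy_gps = pose[:2] + np.random.normal(0, 2.0, size=2)      # simulated GPS reading
        pose = gps_correct(pose, noisy_gps)
print(pose)
```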
GPS-IMU systems by themselves cannot be used for vehicle
localization as they do not meet the performance criteria [109].
In the 2004 DARPA Grand Challenge, the red team from
Carnegie Mellon University [92] failed the race because of
a GPS error. The accuracy required for automated driving
in urban scenes cannot be realized with the current GPS-
IMU technology. Moreover, in dense urban environments, the
accuracy drops further, and the GPS stops functioning from
time to time because of tunnels [107] and high buildings.
Fig. 7: Creating a 3D pointcloud map by aggregating
scans. We used Autoware [110] for mapping.
Even though GPS-IMU systems by themselves do not
meet the performance requirements and cannot be utilized
except for high-level route planning, they are used for initial pose
estimation in tandem with lidar and other sensors in state-of-
the-art localization systems [109].
B. Simultaneous localization and mapping
Simultaneous localization and mapping (SLAM) is the act
of online map making and localizing the robot in it at the
same time. A priori information about the environment is not
required in SLAM. It is a common practice in robotics, espe-
cially in indoor environments. However, due to the high com-
putational requirements and environmental challenges, running
SLAM algorithms outdoors is less efficient than localization
with a pre-built map [111].
Team MIT used a SLAM approach in the DARPA Urban
Challenge [112] and finished in 4th place, whereas
the winner, Carnegie Mellon's Boss [43], and the runner-up,
Stanford's Junior [20], both utilized a priori information. In
spite of not having the same level of accuracy and efficiency,
SLAM techniques have one major advantage over a priori
methods: they can work anywhere.
SLAM based methods have the potential to replace a priori
techniques if their performances can be increased further [24].
We refer the readers to [25] for a detailed SLAM survey in
the intelligent vehicle domain.
C. A priori map-based localization
The core idea of a priori map-based localization techniques
is matching: localization is achieved through the comparison of
online readings to the information on the detailed a priori map
and finding the location of the best possible match [109]. Often
an initial pose estimation, for example with a GPS, is used
at the beginning of the matching process. There are various
approaches to map building and preferred modalities.

Fig. 8: Annotating a 3D point cloud map with topological
information. A large number of annotators were employed to
build the map shown on the right-hand side. The point-cloud and
annotated maps are available on [113].
Changes in the environment affect the performance of map-
based methods negatively. This effect is prevalent especially
in rural areas where past information of the map can deviate
from the actual environment because of changes in roadside
vegetation and constructions [114]. Moreover, this method
requires an additional step of map making.
There are two different map-based approaches; landmark
search and matching.
1) Landmark search: Landmark search is computationally
less expensive in comparison to point cloud matching. It is a
robust localization technique as long as a sufficient number of
landmarks exist. In an urban environment, poles, curbs, signs
and road markers can be used as landmarks.
A road marking detection method using lidar and Monte
Carlo Localization (MCL) was used in [94]. In this method,
road markers and curbs were matched to an offline 3D map to
find the location of the vehicle. A vision based road marking
detection method was introduced in [115]. Road markings
detected by a single front camera were compared and matched
to a low-volume digital marker map with global coordinates.
Then, a particle filter was employed to update the position
and heading of the vehicle with the detected road markings and
GPS-IMU output. A road marking detection based localization
technique using two cameras directed towards the ground,
GPS-IMU dead reckoning, odometry, and a precise marker
location map was proposed in [116]. Another vision based
method with a single camera and geo-referenced traffic signs
was presented in [117].
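A minimal NumPy sketch of the Monte Carlo Localization loop used by several of these landmark-based methods is given below: particles are propagated with a motion model, reweighted by how well predicted landmark ranges match the measurements, and resampled. The landmark map, noise levels, and range-only measurement model are toy assumptions for illustration.

```python
import numpy as np

# Toy 2D landmark map (e.g., surveyed road-marker positions) -- illustrative only.
LANDMARKS = np.array([[10.0, 5.0], [20.0, -3.0], [35.0, 8.0]])
N_PARTICLES, RANGE_STD = 500, 0.5

particles = np.random.uniform([-5, -5, -np.pi], [45, 15, np.pi], size=(N_PARTICLES, 3))
weights = np.full(N_PARTICLES, 1.0 / N_PARTICLES)

def predict(particles, v, yaw_rate, dt):
    """Propagate every particle with a noisy unicycle motion model."""
    noise = np.random.normal(0, [0.1, 0.1, 0.01], size=particles.shape)
    particles[:, 0] += v * np.cos(particles[:, 2]) * dt
    particles[:, 1] += v * np.sin(particles[:, 2]) * dt
    particles[:, 2] += yaw_rate * dt
    return particles + noise

def update(particles, weights, measured_ranges):
    """Reweight particles by how well predicted landmark ranges match the measurement."""
    for i, lm in enumerate(LANDMARKS):
        expected = np.linalg.norm(particles[:, :2] - lm, axis=1)
        weights *= np.exp(-0.5 * ((expected - measured_ranges[i]) / RANGE_STD) ** 2)
    weights += 1e-300                      # avoid total weight collapse
    return weights / weights.sum()

def resample(particles, weights):
    """Duplicate likely particles, drop unlikely ones, and reset the weights."""
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# One filter cycle with simulated range measurements to the landmarks.
true_pose = np.array([12.0, 4.0])
ranges = np.linalg.norm(LANDMARKS - true_pose, axis=1) + np.random.normal(0, RANGE_STD, 3)
particles = predict(particles, v=0.0, yaw_rate=0.0, dt=0.1)
weights = update(particles, weights, ranges)
particles, weights = resample(particles, weights)
print("estimated position:", np.average(particles[:, :2], axis=0, weights=weights))
```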
This approach has one major disadvantage: landmark depen-
dency makes the system prone to failure where the number of
landmarks is insufficient.
2) Point cloud matching: The state-of-the-art localization
systems use multi-modal point cloud based approaches. In
summary, the online-scanned point cloud, which covers a
smaller area, is translated and rotated around its center it-
eratively to be compared against the larger a priori point
cloud map. The position and orientation that give the best
match/overlap of points between the two point clouds constitute the
localized position of the sensor relative to the map. For initial
pose estimation, GPS is commonly used along with dead reckoning.
We used this approach to localize our vehicle. The matching
process is shown in Fig. 6 and the map-making in Fig. 7 and
Fig. 8.
In the seminal work of [109], a point cloud map collected
with lidar was used to augment inertial navigation and lo-
calization. A particle filter maintained a three-dimensional
vector of 2D coordinates and the yaw angle. A multi-modal
approach with probabilistic maps was utilized in [96] to
achieve localization in urban environments with less than 10
cm RMS error. Instead of comparing two point clouds point
by point and discarding the mismatched reads, the variance
of all observed data was modeled and used for the match-
ing task. A matching algorithm for lidar scans using multi-
resolution Gaussian Mixture Maps (GMM) was proposed in
[118]. Iterative Closest Point (ICP) was compared against
Normal Distribution Transform (NDT) in [105], [119]. In
NDT, accumulated sensor readings are transformed into a
grid that is represented by the mean and covariance obtained
from the scanned points that fall into its cells/voxels. NDT
proved to be more robust than point-to-point ICP matching.
An improved version of 3D NDT matching was proposed in
[97], and [114] augmented NDT with road marker matching.
An NDT-based Monte Carlo Localization (MCL) method that
utilizes an offline static map and a constantly updated short-
term map was developed by [120]. In this method, NDT
occupancy grid was used for the short-term map and it was
utilized only when and where the static map failed to give
sufficient explanations.
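The following NumPy sketch illustrates the core of NDT matching as described above: the map cloud is summarized per voxel by a mean and covariance, and a candidate pose is scored by evaluating the transformed scan points under those Gaussians. The voxel size, the coarse grid search over candidate poses, and the synthetic data are simplifying assumptions; production implementations such as [97], [105] optimize the pose with gradient-based methods instead.

```python
import numpy as np

VOXEL = 2.0  # voxel edge length in meters (illustrative)

def build_ndt_grid(map_points):
    """Summarize each occupied voxel of the map by the mean and inverse covariance of its points."""
    grid = {}
    keys = np.floor(map_points / VOXEL).astype(int)
    for key in np.unique(keys, axis=0):
        pts = map_points[(keys == key).all(axis=1)]
        if len(pts) >= 5:                              # need enough points for a stable covariance
            cov = np.cov(pts.T) + 1e-3 * np.eye(3)     # regularize to keep it invertible
            grid[tuple(key)] = (pts.mean(axis=0), np.linalg.inv(cov))
    return grid

def ndt_score(scan_points, pose, grid):
    """Score a candidate (x, y, z, yaw) pose: transform the scan and sum Gaussian terms."""
    x, y, z, yaw = pose
    R = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                  [np.sin(yaw),  np.cos(yaw), 0.0],
                  [0.0, 0.0, 1.0]])
    transformed = scan_points @ R.T + np.array([x, y, z])
    score = 0.0
    for p in transformed:
        cell = grid.get(tuple(np.floor(p / VOXEL).astype(int)))
        if cell is not None:
            mean, inv_cov = cell
            d = p - mean
            score += np.exp(-0.5 * d @ inv_cov @ d)
    return score

# Toy map: two thin perpendicular "walls"; the scan is a shifted subsample of the map.
wall_x = np.column_stack([10.0 + np.random.normal(0, 0.05, 3000),
                          np.random.uniform(0, 50, 3000),
                          np.random.uniform(0, 3, 3000)])
wall_y = np.column_stack([np.random.uniform(0, 50, 3000),
                          20.0 + np.random.normal(0, 0.05, 3000),
                          np.random.uniform(0, 3, 3000)])
map_cloud = np.vstack([wall_x, wall_y])
scan = map_cloud[::10] - np.array([1.0, 0.5, 0.0])     # true offset to recover: (1.0, 0.5)

grid = build_ndt_grid(map_cloud)
candidates = [(dx, dy, 0.0, 0.0) for dx in np.arange(0.0, 2.1, 0.5)
                                 for dy in np.arange(0.0, 2.1, 0.5)]
best = max(candidates, key=lambda c: ndt_score(scan, c, grid))
print("best candidate offset:", best[:2])
```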
Making and maintaining maps is time- and resource-con-
suming. Therefore, some researchers such as [95] argue that
methods relying on a priori maps are not feasible given the size of
road networks and how rapidly they change.
3) 2D to 3D matching: Matching online 2D readings to a
3D a priori map is an emerging technology. This approach
requires only a camera on the ADS equipped vehicle instead
of the more expensive lidar. The a priori map still needs to be
created with a lidar though.
A monocular camera was used to localize the vehicle in
a point cloud map in [121]. With an initial pose estimation,
2D synthetic images were created from the offline 3D point
cloud map and they were compared with normalized mutual
information to the online images received from the camera.
This method increases the computational load of the localiza-
tion task. Another vision matching algorithm was introduced
in [122] where a stereo camera setup was utilized to compare
online readings to synthetic depth images generated from a 3D
prior.

TABLE VI: Comparison of 2D bounding box estimation
architectures on the ImageNet1K test set, ordered by Top-5
error. The number of parameters (Num. Params) and the number
of layers (Num. Layers) hint at the computational cost of each
algorithm.

Architecture | Num. Params (×10^6) | Num. Layers | ImageNet1K Top-5 Error %
Incept.ResNet v2 [127] | 30 | 95 | 4.9
Inception v4 [127] | 41 | 75 | 5
ResNet101 [128] | 45 | 100 | 6.05
DenseNet201 [129] | 18 | 200 | 6.34
YOLOv3-608 [123] | 63 | 53+1 | 6.2
ResNet50 [128] | 26 | 49 | 6.7
GoogLeNet [130] | 6 | 22 | 6.7
VGGNet16 [131] | 134 | 13+2 | 6.8
AlexNet [12] | 57 | 5+2 | 15.3
Camera-based localization approaches may become popular
in the future, as their hardware requirements are cheaper than
those of lidar-based systems.
V. PERCEPTION
Perceiving the surrounding environment and extracting in-
formation which may be critical for safe navigation is a
core objective of ADSs. A variety of tasks, using different sensing
modalities, fall under the umbrella of perception. Building
on decades of computer vision research, cameras are the
most commonly used sensor for perception, with 3D vision
becoming a strong alternative/supplement.
The remainder of this section is divided into core per-
ception tasks. We discuss image-based object detection in
Section V-A1, semantic segmentation in Section V-A2, 3D
object detection in Section V-A3, road and lane detection in
Section V-C and object tracking in Section V-B.
A. Detection
1) Image-based Object Detection: Object detection refers
to identifying the location and size of objects of interest. Both
static objects, from traffic lights and signs to road crossings,
and dynamic objects such as other vehicles, pedestrians or
cyclists are of concern to an ADS. Generalized object detection
has a long-standing history as a central problem in computer
vision, where the goal is to determine whether or not objects
of specific classes are present in an image, then to determine
their size via a rectangular bounding box. This section mainly
discusses state-of-the-art object detection methods, as they
represent the starting point of several other tasks in an ADS
pipeline, such as object tracking and scene understanding.
Object recognition research started more than 50 years ago,
but only recently, in the late 1990s and early 2000s, has
algorithm performance reached a level of relevance for driving
automation. In 2012, the deep convolutional neural network
(DCNN) AlexNet [12] shattered the ImageNet image recogni-
tion challenge [132]. This resulted in a near complete shift of
focus to supervised learning and in particular deep learning for
object detection. There exists a number of extensive surveys on
general image-based object detection [133]–[135]. Here, the
focus is on the state-of-the-art methods that could be applied
to ADS.
While state-of-the-art methods all rely on DCNNs, there
currently exists a clear distinction between them:
1) Single stage detection frameworks use a single network
to produce object detection locations and class predic-
tion simultaneously.
2) Region proposal detection frameworks have two
distinct stages, where general regions of interest are
first proposed, then categorized by separate classifier
networks.
Region proposal methods are currently leading detection
benchmarks, but at the cost of requiring high computational power
and generally being difficult to implement, train and fine-
tune. Meanwhile, single stage detection algorithms tend to
have fast inference time and low memory cost, which is well-
suited for real-time driving automation. YOLO (You Only
Look Once) [136] is a popular single stage detector, which has
been improved continuously [123], [137]. Their network uses a
DCNN to extract image features on a coarse grid, significantly
reducing the resolution of the input image. A fully-connected
neural network then predicts class probabilities and bounding
box parameters for each grid cell and class. This design makes
YOLO very fast, the full model operating at 45 FPS and
a smaller model operating at 155 FPS for a small accuracy
trade-off. More recent versions of this method, YOLOv2,
YOLO9000 [137] and YOLOv3 [123] briefly took over the
PASCAL VOC and MS COCO benchmarks while maintaining
low computation and memory cost. Another widely used
algorithm, even faster than YOLO, is the Single Shot Detector
(SSD) [138], which uses standard DCNN architectures such
as VGG [131] to achieve competitive results on public bench-
marks. SSD performs detection on a coarse grid similar to
YOLO, but also uses higher resolution features obtained early
in the DCNN to improve detection and localization of small
objects.
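In practice, both single stage and region proposal detectors expose a similar interface of boxes, class labels, and confidence scores. The following sketch runs torchvision's pretrained Faster R-CNN (a two-stage, region proposal detector) on a placeholder frame; the score threshold and input image are illustrative assumptions (torchvision >= 0.13 is assumed for the weights argument), and a YOLO- or SSD-style model would be dropped in the same way for lower latency.

```python
import torch
import torchvision

# Pretrained two-stage (region proposal) detector from torchvision's model zoo.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Placeholder for a front-camera frame; a real pipeline would feed actual camera images in [0, 1].
image = torch.rand(3, 480, 640)

with torch.no_grad():
    detections = model([image])[0]   # list of images in, list of result dicts out

# Keep confident detections only; each box is (x1, y1, x2, y2) in pixel coordinates.
keep = detections["scores"] > 0.5
for box, label, score in zip(detections["boxes"][keep],
                             detections["labels"][keep],
                             detections["scores"][keep]):
    print(f"class {int(label)} at {box.tolist()} with confidence {float(score):.2f}")
```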
Considering both accuracy and computational cost is essen-
tial for detection in ADS; the detection needs to be reliable,
but also operate better than real-time, to allow as much time
as possible for the planning and control modules to react to
those objects. As such, single stage detectors are often the
detection algorithms of choice for ADSs. However, as shown
in Table VI, region proposal networks (RPN), used in two-
stage detection frameworks, have proven to be unmatched
in terms of object recognition and localization accuracy, and
computational cost has improved greatly in recent years. They
are also better suited for other tasks related to detection,
such as semantic segmentation as discussed in Section V-A2.
Through transfer learning, RPNs achieving multiple percep-
tion tasks simultaneously are becoming increasingly feasible
for online applications [124]. RPNs may replace single stage
detection networks for ADS applications in the near future.
Omnidirectional Cameras: 360 degree vision, or at least
panoramic vision, is necessary for higher levels of automation.
This can be achieved through camera arrays, though precise
extrinsic calibration between each camera is then necessary to
make image stitching possible.

Fig. 9: An urban scene near Nagoya University, with camera
and lidar data collected by our experimental vehicle and object
detection outputs from state-of-the-art perception algorithms.
(a) A front facing camera's view, with bounding box results from
YOLOv3 [123] and (b) instance segmentation results from
Mask R-CNN [124]. (c) Semantic segmentation masks produced by
DeepLabv3 [125]. (d) The 3D lidar data with object detection
results from SECOND [126]. Amongst the four, only the 3D
perception algorithm outputs range to detected objects.

Alternatively, omnidirectional cameras can be used, or a smaller array of cameras with
very wide angle fisheye lenses. These are however difficult to
intrinsically calibrate; the spherical images are highly distorted
and the camera model used must account for mirror reflections
or fisheye lens distortions, depending on the camera model
producing the panoramic images [139], [140]. The accuracy
of the model and calibration dictates the quality of undistorted
images produced, on which the aforementioned 2D vision
algorithms are used. An example of fisheye lenses producing
two spherical images then combined into one panoramic image
is shown in Figure 3. Some distortions inevitably remain,
but despite these challenges in calibration, omnidirectional
cameras have been used for many applications such as SLAM
[141] and 3D reconstruction [142].
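A typical pre-processing step before applying the 2D algorithms above to fisheye imagery is undistortion with a fisheye camera model. The OpenCV sketch below assumes placeholder intrinsics and distortion coefficients; in a real system these come from calibration.

```python
import cv2
import numpy as np

# Placeholder intrinsics and fisheye distortion coefficients; in practice these come
# from a calibration procedure (e.g., cv2.fisheye.calibrate with a checkerboard).
K = np.array([[350.0, 0.0, 640.0],
              [0.0, 350.0, 480.0],
              [0.0, 0.0, 1.0]])
D = np.array([[-0.05], [0.01], [0.0], [0.0]])      # k1..k4 of the fisheye model

image = np.zeros((960, 1280, 3), dtype=np.uint8)   # stand-in for a captured fisheye frame

# Precompute the undistortion maps once, then remap every incoming frame.
new_K = K.copy()
map1, map2 = cv2.fisheye.initUndistortRectifyMap(
    K, D, np.eye(3), new_K, (1280, 960), cv2.CV_16SC2)
undistorted = cv2.remap(image, map1, map2, interpolation=cv2.INTER_LINEAR)
print(undistorted.shape)
```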
Event Cameras: Event cameras are a fairly new modality
which output asynchronous events usually caused by move-
ment in the observed scene, as shown in Figure 4. This makes
the sensing modality interesting for dynamic object detection.
The other appealing factor is their response time on the order
of microseconds [86], as frame rate is a significant limitation
for high-speed driving. The sensor resolution remains an issue,
but new models are rapidly improving.
They have been used for a variety of applications closely
related to ADS. A recent survey outlines progress in pose
estimation and SLAM, visual-inertial odometry and 3D recon-
struction, as well as other applications [143]. Most notably, a
dataset for end-to-end driving with event cameras was recently
published, with preliminary experiments showing that the
output of an event camera can, to some extent, be used to
predict car steering angle [85].
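Since most downstream networks expect frame-like inputs, a common simple treatment of the asynchronous event stream is to accumulate events into fixed-rate frames. The NumPy sketch below does this for a synthetic stream at a DAVIS240-like resolution; the window length and event format are illustrative assumptions.

```python
import numpy as np

WIDTH, HEIGHT = 240, 180   # DAVIS240-like resolution
WINDOW = 0.01              # accumulate events over 10 ms windows

# Synthetic event stream: columns are (timestamp [s], x, y, polarity in {-1, +1}).
n_events = 20000
events = np.column_stack([
    np.sort(np.random.uniform(0.0, 0.1, n_events)),
    np.random.randint(0, WIDTH, n_events),
    np.random.randint(0, HEIGHT, n_events),
    np.random.choice([-1, 1], n_events),
])

def accumulate(events, t_start, window=WINDOW):
    """Sum event polarities per pixel within [t_start, t_start + window)."""
    frame = np.zeros((HEIGHT, WIDTH), dtype=np.float32)
    t, x, y, p = events.T
    mask = (t >= t_start) & (t < t_start + window)
    np.add.at(frame, (y[mask].astype(int), x[mask].astype(int)), p[mask])
    return frame

frame = accumulate(events, t_start=0.05)
print("non-zero pixels in 10 ms window:", int((frame != 0).sum()))
```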
Poor Illumination and Changing Appearance: The main
drawback of using cameras is that changes in lighting con-
ditions can significantly affect their performance. Low light
conditions are inherently difficult to deal with, while changes
in illumination due to shifting shadows, intemperate weather,
or seasonal changes, can cause algorithms to fail, in particular
supervised learning methods. For example, snow drastically
alters the appearance of scenes and hides potentially key
features such as lane markings. An easy alternative is to use
a different sensing modality for perception, but lidar also
has difficulties with some weather conditions like snow [144],
and radars lack the necessary resolution for many perception
tasks [47]. A sensor fusion strategy is often employed to avoid
any single point of failure [145].
Thermal imaging through infrared sensors is also used for
object detection in low light conditions, which is particularly
effective for pedestrian detection [146]. Camera-only methods
which attempt to deal with dynamic lighting conditions di-
rectly have also been developed. Both attempting to extract
lighting invariant features [147] and assessing the quality of
features [148] have been proposed. Pre-processed, illumination-
invariant images have been applied to ADSs [149] and were shown
to improve localization, mapping and scene classification
capabilities over long periods of time. Still, dealing with
the unpredictable conditions brought forth by inadequate or
changing illumination remains a central challenge preventing
the widespread implementation of ADS.
2) Semantic Segmentation: Beyond image classification
and object detection, computer vision research has also tackled
the task of image segmentation. This consists of classifying
each pixel of an image with a class label. This task is of
particular importance to driving automation as some objects
of interest are poorly defined by bounding boxes, in particular
roads, traffic lines, sidewalks and buildings. A segmented
scene in an urban area can be seen in Figure 9. As opposed to
semantic segmentation, which labels pixels based on a class,
instance segmentation algorithms further separate instances
of the same class, which is important in the context of driving
automation. In other words, objects which may have different
trajectories and behaviors must be differentiated from each
other. We used the COCO dataset [150] to train the instance
segmentation algorithm Mask R-CNN [124] with the sample
result shown in Figure 9.
Segmentation has recently started to become feasible for real-
time applications. Developments in this field closely
parallel progress in general image-based object detection.
The aforementioned Mask R-CNN [124] is a generalization
of Faster R-CNN [151]. The multi-task network can achieve
accurate bounding box estimation and instance segmentation
simultaneously and can also be generalized to other tasks like
pedestrian pose estimation with minimal domain knowledge.
Running at 5 fps, it is approaching real-time
usability for ADSs.
Unlike Mask-RCNN’s CNN which is more akin to those
used for object detection through its use of region proposal
networks, segmentation networks usually employ a combi-
nation of convolutions, for feature extractions, followed by
deconvolutions, also called transposed convolutions, to obtain
pixel resolution labels [152], [153]. Feature pyramid networks
are also commonly used, for example in PSPNet [154], which
also introduced dilated convolutions for segmentation. This
idea of sparse convolutions was then used to develop DeepLab
[155], with the most recent version being the current state-of-
the-art for object segmentation [125]. We employed DeepLab
with our ADS and a segmented frame is shown in Figure 9.
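As an illustration of how such a segmentation network is used at inference time, the sketch below runs torchvision's pretrained DeepLabV3 on a placeholder frame and takes the per-pixel argmax as the label map. The specific checkpoint, normalization constants, and input size are illustrative (torchvision >= 0.13 is assumed); this is not necessarily the configuration used on our platform.

```python
import torch
import torchvision

# Pretrained DeepLabV3 segmentation model from torchvision's model zoo.
model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT")
model.eval()

# Placeholder camera frame, normalized with the usual ImageNet statistics.
frame = torch.rand(1, 3, 480, 640)
mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

with torch.no_grad():
    logits = model((frame - mean) / std)["out"]   # shape: (1, num_classes, H, W)

# Per-pixel class label: the argmax over the class dimension.
label_map = logits.argmax(dim=1)[0]
print("segmentation mask shape:", tuple(label_map.shape))
print("classes present:", torch.unique(label_map).tolist())
```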
While most segmentation networks are as of yet too slow
and computationally expensive to be used in an ADS, it is
important to note that many of these segmentation networks
are initially trained for different tasks, such as bounding box
estimation, and then generalized to segmentation. Fur-
thermore, these networks were shown to learn universal feature
representations of images, and can be generalized for many
tasks. This suggests the possibility that single, generalized
perception networks may be able to tackle all the different
perception tasks required for an ADS.
3) 3D Object Detection: Given their affordability, availabil-
ity and widespread research, the camera is used by nearly all
algorithms presented so far as the primary perception modality.
However, cameras have limitations that are critical to ADS.
Aside from illumination which was previously discussed,
camera-based object detection occurs in the projected image
space and therefore the scale of the scene is unknown. To make
use of this information for dynamic driving tasks like obstacle
avoidance, it is necessary to bridge the gap from 2D image-
based detection to the 3D, metric space. Depth estimation is
therefore necessary, which is in fact possible with a single
camera [156], though stereo or multi-view systems are more
robust [157]. These algorithms necessarily need to solve an
expensive image matching problem, which adds a significant
amount of processing cost to an already complex perception
pipeline.
A relatively new sensing modality, the 3D lidar, offers an
alternative for 3D perception. The 3D data collected inherently
solves the scale problem, and since they have their own
emission source, they are far less dependent on lighting
conditions and less susceptible to intemperate weather. The
sensing modality collects sparse 3D points representing the
surfaces of the scene, as shown in Figure 10, which are
challenging to use for object detection and classification. The
appearance of objects changes with range, and after some
distance, very few data points per object are available to
detect an object. This poses some challenges for detection,
but since the data is a direct representation of the world,
it is more easily separable. Traditional methods often used
euclidean clustering [158] or region-growing methods [159]
for grouping points into objects. This approach has been made
much more robust through various filtering techniques, such
as ground filtering [160] and map-based filtering [161]. We
implemented a 3D object detection pipeline to get clustered
objects from raw point cloud input. An example of this process
is shown in Figure 10.
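A minimal version of this traditional pipeline can be sketched as follows: near-ground points are removed with a simple height threshold and the remaining points are grouped by density-based clustering, with scikit-learn's DBSCAN standing in for the Euclidean clustering of [158]. The thresholds and the synthetic scene are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def detect_objects(points, ground_height=0.2, eps=0.7, min_points=10):
    """Naive lidar object detection: drop near-ground points, then cluster the rest.

    points: (N, 3) array of x, y, z in the vehicle frame with z up.
    Returns a list of (centroid, cluster_points) for each detected object.
    """
    # 1. Crude ground filtering by a height threshold (a real system would fit a plane
    #    or use the map-based filtering mentioned above).
    non_ground = points[points[:, 2] > ground_height]

    # 2. Density-based clustering as a stand-in for Euclidean clustering.
    labels = DBSCAN(eps=eps, min_samples=min_points).fit_predict(non_ground)

    objects = []
    for label in set(labels) - {-1}:          # -1 marks noise points
        cluster = non_ground[labels == label]
        objects.append((cluster.mean(axis=0), cluster))
    return objects

# Synthetic scene: a flat ground plane plus two box-like obstacles.
ground = np.column_stack([np.random.uniform(-20, 20, 3000),
                          np.random.uniform(-20, 20, 3000),
                          np.random.normal(0.0, 0.03, 3000)])
obstacle_a = np.random.normal([5, 2, 0.8], 0.3, size=(200, 3))
obstacle_b = np.random.normal([-8, 4, 0.9], 0.3, size=(200, 3))
scene = np.vstack([ground, obstacle_a, obstacle_b])

for centroid, cluster in detect_objects(scene):
    print(f"object with {len(cluster)} points around {np.round(centroid, 2)}")
```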
As with image-based methods, machine learning has also
recently taken over 3D detection methods. These methods have
also notably been applied to RGB-D cameras [162], which produce
similar, but colored, point clouds; given their limited range and
unreliability outdoors, RGB-D cameras have not been used for ADS
applications. A 3D representation of point data, through a 3D
occupancy grid called voxel grids, was first applied for object
detection in RGB-D data [162]. Shortly thereafter, a similar
approach was used on point clouds created by lidars [163].
Inspired by image-based methods, 3D CNNs are used, despite
being computationally very expensive.
The first convincing results for point cloud-only 3D bound-
ing box estimation were produced by VoxelNet [164]. In-
stead of hand-crafting input features computed during the
discretization process, VoxelNet learned an encoding from raw
point cloud data to voxel grid. Their voxel feature encoder
(VFE) uses a fully connected neural network to convert the
variable number of points in each occupied voxel to a feature
vector of fixed size. The voxel grid encoded with feature
vectors was then used as input to an aforementioned RPN
for multi-class object detection. This work was then improved
both in terms of accuracy and computational efficiency by
SECOND [126] by exploiting the natural sparsity of lidar
data. We employed SECOND and a sample result is shown
in Figure 9. Several algorithms have been produced recently,
with accuracy constantly improving as shown in Table VII,
yet the computational complexity of 3D convolutions remains
an issue for real-time use.
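As an illustration of the voxel feature encoding idea, the sketch below applies a shared fully connected layer to the points inside each voxel and max-pools them into a fixed-size feature vector. It is a simplified stand-in, not the exact VoxelNet VFE (which also concatenates the aggregated feature back to the per-point features); the dimensions are arbitrary.

```python
import torch
import torch.nn as nn

class SimpleVFE(nn.Module):
    """Simplified voxel feature encoder: a point-wise fully connected
    layer followed by a max-pool over the points in each voxel."""
    def __init__(self, in_dim=4, out_dim=64):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, points, mask):
        # points: (V, T, in_dim) padded points per voxel; mask: (V, T) valid flags.
        # Empty voxels should be dropped before calling this module.
        x = torch.relu(self.fc(points))                 # point-wise features
        x = x.masked_fill(~mask.unsqueeze(-1), float("-inf"))
        voxel_feat, _ = x.max(dim=1)                    # (V, out_dim) per voxel
        return voxel_feat

# Toy usage: 10 occupied voxels, up to 35 (x, y, z, intensity) points each.
vfe = SimpleVFE()
pts = torch.randn(10, 35, 4)
valid = torch.ones(10, 35, dtype=torch.bool)
print(vfe(pts, valid).shape)   # torch.Size([10, 64])
```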
Another option for lidar-based perception is 2D projection
of point cloud data. There are two main representations of
point cloud data in 2D, the first being a so-called depth image
shown in Figure 14, largely inspired by camera-based methods
that perform 3D object detection through depth estimation
[165] and methods that operate on RGB-D data [166]. The
VeloFCN network [167] proposed using a single-channel depth
image as input to a shallow, single-stage convolutional neural
network that produces 3D vehicle proposals, and many other
algorithms have since adopted this approach. Depth images have
also been used for the semantic classification of lidar points
[168].

Fig. 10: Outline of a traditional method for object detection
from 3D point cloud data. Various filtering and data reduction
methods are applied first, followed by clustering. The resulting
clusters are shown as differently colored points in the 3D lidar
data of pedestrians collected by our data collection platform.

TABLE VII: Average Precision (AP) in % on the KITTI 3D
object detection test set, car class, ordered by moderate-category
accuracy. These algorithms use only point cloud data.

Algorithm                T [s]   Easy   Moderate   Hard
PointRCNN [176]          0.10    85.9   75.8       68.3
PointPillars [177]       0.02    79.1   75.0       68.3
SECOND [126]             0.04    83.1   73.7       66.2
IPOD [178]               0.20    82.1   72.6       66.3
F-PointNet [179]         0.17    81.2   70.4       62.2
VoxelNet (Lidar) [164]   0.23    77.5   65.1       57.7
MV3D (Lidar) [169]       0.24    66.8   52.8       51.3
The other 2D projection that has seen increasing popularity,
in part due to the new KITTI benchmark, is projection to
bird’s eye view (BV) image. This is a top-view image of the
point cloud, as shown in Figure 15. BV images discretize space
purely in 2D, so lidar points that differ only in height occlude
each other. The MV3D algorithm [169] used camera images,
depth images and multi-channel BV images, with each channel
corresponding to a different range of heights so as to
minimize these occlusions. Several other works have reused
camera-based algorithms and trained efficient networks for
3D object detection on 2D BV images [170]–[173]. State-of-
the-art algorithms are currently being evaluated on the KITTI
dataset [174] and nuScenes dataset [175] as they offer labeled
3D scenes. Table VII shows the leading methods on the KITTI
benchmark, alongside detection times. 2D methods are far less
computationally expensive, but recent methods that take point
sparsity into account [126] are real-time viable and rapidly
approaching the accuracies necessary for integration in ADS.
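For illustration, the following minimal sketch projects a point cloud into a single-channel bird's eye view height map with numpy. The ranges and resolution are arbitrary assumptions, and a multi-channel BV input in the spirit of MV3D would simply repeat this for several height slices.

```python
import numpy as np

def birds_eye_view(points, x_range=(0.0, 60.0), y_range=(-30.0, 30.0), res=0.2):
    """Project an (N, 3) lidar point array to a top-view height image.
    Each pixel stores the maximum point height falling into that cell."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y, z = x[keep], y[keep], z[keep]
    cols = ((x - x_range[0]) / res).astype(int)
    rows = ((y - y_range[0]) / res).astype(int)
    width = int((x_range[1] - x_range[0]) / res)
    height = int((y_range[1] - y_range[0]) / res)
    bv = np.full((height, width), -np.inf)
    np.maximum.at(bv, (rows, cols), z)   # keep the max height per cell
    bv[np.isinf(bv)] = 0.0               # empty cells get a neutral value
    return bv

bv = birds_eye_view(np.random.uniform([0, -30, -2], [60, 30, 2], size=(5000, 3)))
print(bv.shape)
```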
Radar: Radar sensors have already been used for various
perception applications, in different types of vehicles, with
different models operating at complementary ranges. While not
as accurate as lidar, radar can detect objects at long range and
estimate their velocity [112]. Its lack of precision in
estimating the shape of objects is a major drawback when it is
used in perception systems [47]; the resolution is simply too
low. As such, it can be used for range estimation to large
objects such as vehicles, but it struggles with pedestrians and
static objects. Another issue is the very limited field of view
of most radars, which forces a complicated array of radar
sensors to cover the full field of view. Nevertheless, radars
have seen widespread use as an ADAS component, for applications
including proximity warning and adaptive cruise control [144].
While radar and lidar are often seen as competing sensing
modalities, they will likely be used in tandem in fully
automated driving systems: radars offer very long range, low
cost and strong robustness to poor weather, while lidars offer
precise object localization capabilities, as discussed in
Section IV.
Sonar devices are similar to radar, but their extremely short
range (< 2 m) and poor angular resolution limit their use to
very-near obstacle detection [144].
B. Object Tracking
Object tracking is also often referred to as multiple object
tracking (MOT) [180] and detection and tracking of multi-
ple objects (DATMO) [181]. For fully automated driving in
complex and high speed scenarios, estimating location alone
is insufficient. It is necessary to estimate dynamic objects’
heading and velocity so that a motion model can be applied to
track each object over time and predict its future trajectory to
avoid collisions. These trajectories must generally be estimated
in the vehicle frame to be used by the planner, so range
information must be obtained through multi-camera systems,
lidars or radar sensors. 3D lidars are often used for their
precise range information and large field of view, which allow
tracking over longer periods of time. To better cope with the
limitations and uncertainties of different sensing modalities, a
sensor fusion strategy is often used for tracking [43].

Fig. 11: A scene with several tracked pedestrians and a cyclist,
tracked with a basic particle filter at an urban road
intersection. Past trajectories are shown in white, with current
heading and speed indicated by the direction and magnitude of
the arrows; sample collected by our data collection platform.
Commonly used object trackers rely on simple data asso-
ciation techniques followed by traditional filtering methods.
When objects are tracked in 3D space at high frame rate,
nearest neighbor methods are often sufficient for establishing
associations between objects. Image-based methods, however,
need to establish some appearance model, which may consider
the use of color histograms, gradients and other features
such as KLT to evaluate the similarity [182]. Point cloud
based methods may also use similarity metrics such as point
density and Hausdorff distance [161], [183]. Since association
errors are always a possibility, multiple hypothesis tracking
algorithms [184] are often employed, which ensures tracking
algorithms can recover from poor data association at any single
time step. It is common to use an occupancy map as a shared
frame to which all sensors contribute and in which data
association is performed, especially when multiple sensors are
used [185]. To obtain smooth dynamics, the detection results
are filtered by traditional Bayes filters. Kalman filtering is
sufficient for simple linear models, while the extended and
unscented Kalman filters [186] are used to handle nonlinear
dynamic models [187]. We implemented a basic particle filter based
object-tracking algorithm, and an example of tracked pedes-
trians in contrasting camera and 3D lidar perspective is shown
in Figure 11.
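The sketch below is a minimal constant-velocity Kalman tracker for a single object centroid, intended only to make the filtering step above concrete; it is not our particle-filter implementation, and the process and measurement noise values are tuning assumptions.

```python
import numpy as np

class CVKalmanTracker:
    """Constant-velocity Kalman filter for one tracked object.
    State: [x, y, vx, vy]; measurement: the object centroid [x, y]."""
    def __init__(self, x0, y0, dt=0.1):
        self.x = np.array([x0, y0, 0.0, 0.0])
        self.P = np.eye(4) * 10.0
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.1   # process noise (placeholder tuning)
        self.R = np.eye(2) * 0.5   # measurement noise (placeholder tuning)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

# In a full tracker, each new detection would first be associated to the
# closest predicted position (nearest neighbor) before calling update().
trk = CVKalmanTracker(5.0, 2.0)
for z in [(5.2, 2.1), (5.5, 2.2), (5.9, 2.3)]:
    trk.predict()
    trk.update(z)
print(trk.x)   # estimated position and velocity
```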
Physical models for the object being tracked are also often
used for more robust tracking. In that case, non-parametric
methods such as particle filters are used, and physical pa-
rameters such as the size of the object are tracked alongside
dynamics [188]. More involved filtering methods such as Rao-
Blackwellized particle filters have also been used to keep track
of both dynamic variables and vehicle geometry variables for
an L-shape vehicle model [189]. Various models have been
proposed for vehicles and pedestrians, while some models
generalize to any dynamic object [190].
Finally, deep learning has also been applied to the problem
of tracking, particularly for images. Tracking in monocu-
lar images was achieved in real-time through a CNN-based
method [191], [192]. Multi-task networks that estimate object
dynamics are also emerging [193], which further suggests that
generalized networks tackling multiple perception tasks may
be the future of ADS perception.
C. Road and Lane Detection
Bounding box estimation methods previously covered are
useful for defining some objects of interest but are inadequate
for continuous surfaces like roads. Determining the drivable
surface is critical for ADS and has been specifically researched
as a subset of the detection problem. While drivable surface
can be determined through semantic segmentation, automated
vehicles need to understand road semantics to properly ne-
gotiate the road. An understanding of lanes, and how they
are connected through merges and intersections remains a
challenge from the perspective of perception. In this section,
we provide an overview of current methods used for road
and lane detection, and refer the reader to in-depth surveys
of traditional methods [194] and the state-of-the-art methods
[195], [196].
This problem is usually subdivided in several tasks, each
unlocking some level of automation. The simplest is determin-
ing the drivable area from the perspective of the ego-vehicle.
The road can then be divided into lanes, and the vehicle's host
lane can be determined. Host lane estimation over a reasonable
distance enables ADAS technologies such as lane departure
warning, lane keeping and adaptive cruise control [194], [198].
More challenging still is determining the other lanes and their
direction [199], and finally understanding complex semantics
such as the current and future direction of lanes, or merging
and turning lanes [43]. These ADAS and ADS technologies have
different requirements in terms of task, detection distance and
reliability, but fully automated driving will require a complete
semantic understanding of road structures and the ability to
detect several lanes at long range [195]. Annotated maps as
shown in Figure 8 are extremely useful for understanding lane
semantics.
Current methods on road understanding typically first rely
on exteroceptive data preprocessing. When cameras are used,
this usually means performing image color corrections to
normalize lighting conditions [200]. For lidar, several filtering
methods can be used to reduce clutter in the data such as
ground extraction [160] or map-based filtering [161]. For any
sensing modality, identifying dynamic objects that conflict
with the static road scene is an important pre-processing
step. Then, road and lane feature extraction is performed on
the corrected data. Color statistics and intensity information
[201], gradient information [202], and various other filters
have been used to detect lane markings. Similar methods have
been used for road estimation, where the usual uniformity of
roads and the elevation gap at their edges allow region-growing
methods to be applied [203]. Stereo camera systems [204],
as well as 3D lidars [201], have been used to determine the 3D
structure of roads directly. More recently, machine learning-
based methods which either fuse maps with vision [196] or use
fully appearance-based segmentation [205] have been used.
Fig. 12: Assessing the overall risk level of driving scenes. We
employed an open-source1 deep spatiotemporal video-based risk
detection framework [197] to assess the image sequences shown
in this figure.
Once surfaces are estimated, model fitting is used to es-
tablish the continuity of the road and lanes. Geometric fitting
through parametric models such as lines [206] and splines
[201] has been used, as well as non-parametric continuous
models [207]. Models that assume parallel lanes have been
used [198], and more recently models integrating topological
elements such as lane splitting and merging were proposed
[201].
Temporal integration completes the road and lane segmenta-
tion pipeline. Here, vehicle dynamics are used in combination
with a road tracking system to achieve smooth results. Dynamic
information can also be used alongside Kalman filtering [198]
or particle filtering [204] to further smooth the estimates.
Road and lane estimation is a well-researched field and
many methods have already been integrated successfully for
lane keeping assistance systems. However, most methods
remain riddled with assumptions and limitations, and truly
general systems which can handle complex road topologies
have yet to be developed. Through standardized road maps
which encode topology and emerging machine learning-based
road and lane classification methods, robust systems for driv-
ing automation are slowly taking shape.
VI. ASSESSMENT
A robust ADS should constantly evaluate the overall risk
level of the situation and predict the intentions of human
drivers and pedestrians around itself. The lack of an acute
assessment mechanism can lead to accidents. This section discusses
assessment under three subcategories: overall risk and uncer-
tainty assessment, human driving behavior assessment, and
driving style recognition.
A. Risk and uncertainty assessment
Overall assessment can be summarized as quantifying the
uncertainties and the risk level of the driving scene. It is a
promising methodology that can increase the safety of ADS
pipelines [11].
Using Bayesian methods to quantify the uncertainties of deep
neural networks was proposed in [208]. A
Bayesian deep learning architecture was designed for prop-
agating uncertainty throughout an ADS pipeline, and the
advantage of it over conventional approaches was shown in a
hypothetical scenario [11]. In summary, each module conveys
and accepts probability distributions instead of exact outcomes
throughout the pipeline, which increases the overall robustness
of the system.
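As a minimal illustration of how a module can emit a distribution instead of a point estimate, the sketch below uses Monte Carlo dropout, one common approximation of Bayesian deep learning; it is not the specific architecture of [11] or [208], and the network, feature dimensions and sample count are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# A small regression head with dropout. Keeping dropout active at test
# time and averaging several stochastic forward passes yields an
# approximate predictive mean and variance (Monte Carlo dropout).
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
                      nn.Dropout(p=0.2), nn.Linear(64, 1))

def mc_dropout_predict(model, x, n_samples=50):
    model.train()                      # keep dropout stochastic at inference
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.var(dim=0)

x = torch.randn(8, 16)                 # e.g. a batch of intermediate features
mean, var = mc_dropout_predict(model, x)
print(mean.shape, var.shape)           # downstream modules consume both
```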
An alternative approach is to assess the overall risk level of
the driving scene separately, i.e., outside the pipeline. Sensory
inputs were fed into a risk inference framework in [74], [209]
to detect unsafe lane change events using Hidden Markov
Models (HMMs) and language models. Recently, a deep
spatiotemporal network that infers the overall risk level of a
driving scene was introduced in [197]. Implementation of this
method is available open-source1. We employed this method
to assess the risk level of a lane change as shown in Figure
12.
B. Surrounding driving behavior assessment
Understanding surrounding human driver intention is most
relevant to medium to long term prediction and decision mak-
ing. In order to increase the prediction horizon of surrounding
object behavior, human traits should be considered and incor-
porated into the prediction and evaluation steps. Understanding
surrounding driver intention from the perspective of an ADS
is not yet common practice in the field; as such, a
state-of-the-art has not been established.
In [210], a target vehicle’s future behavior was predicted
with a hidden Markov model (HMM), and the prediction time
horizon was extended by 56% by learning human driving traits.
The proposed system tagged observations with predefined
maneuvers. Then, the features of each type were learned in
a data-centric manner with HMMs. Another learning based
approach was proposed in [211], where a Bayesian network
classifier was used to predict maneuvers of individual drivers
on highways. A framework for long term driver behavior pre-
diction using a combination of a hybrid state system and HMM
was introduced in [212]. Surrounding vehicle information was
integrated with ego-behavior through a symbolization framework
in [74], [209]. Detecting dangerous cut-in maneuvers was
achieved with an HMM framework that was trained on safe and
dangerous data in [213]. Lane change events were predicted 1.3
seconds in advance with support vector machines (SVM) and
Bayesian filters [214].

1https://github.com/Ekim-Yurtsever/DeepTL-Lane-Change-Classification
The main challenges are the short observation window for
understanding the intention of humans and real-time high-
frequency computation requirements. Most of the time the
ADS can observe a surrounding vehicle only for seconds.
Complicated driving behavior models that require longer ob-
servation periods cannot be utilized under these circumstances.
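To make the HMM-based maneuver classification described above concrete, the sketch below trains one Gaussian HMM per predefined maneuver class and labels a new observation window by maximum log-likelihood. It is only a sketch: the hmmlearn library, the two classes, and the synthetic two-dimensional features are illustrative assumptions, not the setups of [210] or [213].

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)

def make_sequences(offset, n_seq=20, length=30):
    """Synthetic feature sequences (e.g. speed and lateral offset)."""
    seqs = [rng.normal(offset, 1.0, size=(length, 2)) for _ in range(n_seq)]
    return np.concatenate(seqs), [length] * n_seq

# Train one HMM per predefined maneuver class.
models = {}
for maneuver, offset in [("lane_keep", 0.0), ("lane_change", 2.0)]:
    X, lengths = make_sequences(offset)
    models[maneuver] = GaussianHMM(n_components=3, covariance_type="diag",
                                   n_iter=50, random_state=0).fit(X, lengths)

# Classify a short, unseen observation window by log-likelihood.
window = rng.normal(2.0, 1.0, size=(30, 2))
scores = {name: hmm.score(window) for name, hmm in models.items()}
print(max(scores, key=scores.get), scores)
```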
C. Driving style recognition
In 2016, Google’s self-driving car collided with an oncom-
ing bus [16] during a lane change. The ADS assumed that
the bus driver was going to yield and let the self-driving car
merge in. However, the bus driver accelerated instead. This
accident might have been prevented if the ADS had understood
this particular bus driver's individual driving style and
predicted their behavior.
Driving style is a broad term without an established com-
mon definition. A thorough review of the literature can be
found in [215] and a survey of driving style recognition
algorithms for intelligent vehicles is given in [216]. Readers
are referred to these papers for a complete review.
Typically, driving style is defined with respect to either
aggressiveness [217]–[221] or fuel consumption [222]–[226].
For example, [227] introduced a rule-based model that clas-
sified driving styles with respect to jerk. This model decides
whether a maneuver is aggressive or calm by a set of rules
and jerk thresholds. Drivers were categorized with respect to
their average speed in [228]. In conventional methods, the
total number and meaning of the driving style classes are
predefined. The vast majority of the driving style recognition
literature uses two or three classes: driving style was
categorized into three classes in [229]–[231] and into two
classes in [74], [209], [217], [218], [222]. Representing
driving style in a continuous domain is uncommon, but there are
some studies. In [232], driving style was depicted as a
continuous value between -1 and +1, standing for mild and
active driving respectively. Details of these classifications
are given in Table VIII.
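In the spirit of the rule-based jerk model of [227] described above, the following minimal sketch labels a maneuver as aggressive or calm by thresholding the jerk computed from a speed trace. The threshold value and the synthetic speed profiles are arbitrary placeholders, not the rules of [227].

```python
import numpy as np

def classify_maneuver(speed, dt=0.1, jerk_threshold=2.0):
    """Label a maneuver from a speed trace [m/s] by thresholding jerk
    (the time derivative of acceleration). Threshold is a placeholder."""
    accel = np.gradient(speed, dt)
    jerk = np.gradient(accel, dt)
    return "aggressive" if np.max(np.abs(jerk)) > jerk_threshold else "calm"

t = np.arange(0, 10, 0.1)
calm_speed = 10 + 0.5 * t              # gentle, steady acceleration
harsh_speed = 10 + 5 * np.sin(3 * t)   # rapid speed oscillations
print(classify_maneuver(calm_speed), classify_maneuver(harsh_speed))
```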
More recently, machine learning based approaches have
been utilized for driving style recognition. Principal compo-
nent analysis was used to detect five distinct driving classes
in an unsupervised manner in [233], and a GMM-based driver
model was used to identify individual drivers with success in
[234]; car-following and pedal operation behavior were
investigated separately in the latter study. Another GMM-based
driving style recognition model was proposed for electric
vehicle range prediction in [235]. In [217], aggressive event
detection with dynamic time warping was presented, where the
authors reported a high success score. Bayesian approaches were
utilized in [236] for modeling driving style on roundabouts and
in [237] to assess critical braking situations. Bag-of-words
and K-means clustering were used to represent individual
driving features in [238].
A stacked autoencoder was used to extract unique driving
signatures from different drivers, and then macro driving style
centroids were found with clustering [239]. Another autoencoder
network was used to extract road-type specific driving features
[240]. Similarly, driving behavior was encoded in a 3-channel
RGB space with a deep sparse autoencoder to visualize individual
driving styles [241].

TABLE VIII: Driving style categorization

Related work   # Classes            Methodology           Class details
[233]          5                    PCA                   non-aggressive to very aggressive
[242]          3                    NN, SVM, DT           expert/typical/low-skill
[229]          3                    FL                    sporty/normal/comfortable
[230]          3                    PCMLP                 aggressive/moderate/calm
[239]          3                    SAE & K-means         unidentified clusters
[74]           2                    non-param. Bayesian   risky/safe
[217]          2                    DTW                   aggressive/non-aggressive
[218]          2                    RB                    sudden/safe
[232]          Continuous [−1, 1]   NN                    mild to active
A successful integration of driving style recognition into
a real-world ADS pipeline has not been reported yet. However,
these studies are promising, and they point to a possible new
direction in ADS development.
VII. PLANNING AND DECISION MAKING
A brief overview of planning and decision making is given
here. For more detailed treatments of this topic, the reader is
referred to [22], [27], [244].
A. Global planning
Planning can be categorized into two sub-tasks: global
route planning and local path planning. The global planner
is responsible for finding the route on the road network from
origin to the final destination. The user usually defines the
final destination. Global navigation is a well-studied subject,
and high-performance route planning has been an industry
standard for more than a decade. Almost all modern production cars are
equipped with navigation systems that utilize GPS and offline
maps to plan a global route.
Route planning is formulated as finding the point-to-point
shortest path in a directed graph, and conventional methods
are examined under four categories in [244]: goal-directed,
separator-based, hierarchical and bounded-hop techniques. A*
search [245] is a standard goal-directed path planning algorithm
and has been used extensively in various fields for almost
50 years.
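For illustration, a minimal A* search over a toy directed road graph is sketched below. The graph, edge costs and heuristic values are made-up placeholders; the heuristic is assumed to be an admissible lower bound on the remaining cost (e.g., straight-line distance).

```python
import heapq

def a_star(graph, heuristic, start, goal):
    """Point-to-point shortest path on a directed graph.
    graph: {node: [(neighbor, edge_cost), ...]}; heuristic: {node: lower bound}."""
    frontier = [(heuristic[start], 0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        for nxt, w in graph.get(node, []):
            new_cost = cost + w
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier,
                               (new_cost + heuristic[nxt], new_cost, nxt, path + [nxt]))
    return None

# Toy road graph; nodes could be intersections, edges road segments.
graph = {"A": [("B", 2.0), ("C", 5.0)], "B": [("C", 2.0), ("D", 4.0)],
         "C": [("D", 1.0)], "D": []}
heuristic = {"A": 3.0, "B": 2.0, "C": 1.0, "D": 0.0}
print(a_star(graph, heuristic, "A", "D"))   # (5.0, ['A', 'B', 'C', 'D'])
```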
The main idea of separator-based techniques is to remove a
subset of vertices [246] or arcs from the graph and compute an
overlay graph over it. Using the overlay graph to calculate
the shortest path results in faster queries.
Hierarchical techniques take advantage of the road hier-
archy. For example, the road hierarchy in the US can be
listed from top to bottom as freeways, arterials, collectors and
local roads respectively. For a route query, the importance
of hierarchy increases as the distance between the origin and
the destination grows. The shortest path may then be neither the
fastest nor the most desirable route. Moving away from the
destination, and thus making the route slightly longer, in order
to reach the closest highway ramp may result in a shorter travel
time than following the shortest path over local roads. The
Contraction Hierarchies (CH) method was proposed in [247] for
exploiting road hierarchy.

Fig. 13: Global plan and the local paths. The annotated vector
map shown in Figure 8 was utilized by the planner. We employed
OpenPlanner [243], a graph-based planner, here to illustrate a
typical planning approach.

TABLE IX: Local planning techniques

Approach                 Methods                                              Pros and cons
Graph search             Dijkstra [250], A* [245], state lattice [251]        Slow and jerky
Sampling based           RPP [252], RRT [253], PRM [254]                      Fast solution but jerky
Curve interpolation      Clothoids [255], polynomials [256],                  Smooth but slow
                         Bezier [257], splines [100]
Numerical optimization   Num. non-linear opt. [258], Newton's method [259]    Increases computational cost but improves quality
Deep learning            FCN [260], segmentation network [261]                High imitation performance, but no hard-coded safety measures
Precomputing distances between selected vertices and utilizing
them at query time is the basis of bounded-hop techniques.
Precomputed shortcuts can be utilized partly or exclusively for
navigation. However, the naive approach of precomputing all
possible routes between every pair of vertices is impractical
for large networks. One possible solution is hub labeling (HL)
[248], which also requires preprocessing. A label associated
with a vertex consists of nearby hub vertices and the distances
to them. These labels satisfy the condition that at least one
shared hub vertex must exist between the labels of any two given
vertices. HL is the fastest known query-time algorithm for route
planning [244], at the expense of high storage usage.
Combinations of the above algorithms are popular in state-of-
the-art systems. For example, [249] combined a separator-based
method with a bounded-hop method to create the Transit Node
Routing with Arc Flags (TNR+AF) algorithm. Modern route
planners can answer a query in milliseconds.
B. Local planning
The objective of the local planner is to execute a global
plan without failing. In other words, in order to complete its
trip, the ADS must find trajectories to avoid obstacles and
satisfy optimization criteria in the configuration space (C-
space), given a starting and destination point. A detailed local
planning review is presented in [23], where the taxonomy of
motion planning is divided into four groups: graph-based
planners, sampling-based planners, interpolating curve plan-
ners and numerical optimization approaches. After a summary
of these conventional planners, the emerging deep learning-
based planners are introduced at the end of this section.
Graph-based local planners use the same techniques as
graph-based global planners such as Dijkstra [250] and A*
[245], which output discrete paths rather than continuous ones.
This can lead to jerky trajectories [23]. A more advanced
graph-based planner is the state lattice algorithm. Like all
graph-based methods, the state lattice discretizes the decision
space. High-dimensional lattice nodes, which typically encode
2D position, heading and curvature [251], are used to create a
grid first. Then, the connections between the nodes are
precomputed with an inverse path generator to build the state
lattice. During the planning phase, a cost function, which
usually considers proximity to obstacles and deviation from the
goal, is used to find the best path from the precomputed path
primitives. State lattices can handle high dimensions and are
well suited to local planning in dynamic environments; however,
the computational load is high and the discretization resolution
limits the planner's capacity [23].
A detailed overview of Sampling Based Planning (SBP)
methods can be found in [262]. In summary, SBP tries to
build the connectivity of the C-space by randomly sampling
it. Randomized Potential Planner (RPP) [252] is one of the
earliest SBP approaches, where random walks are generated
to escape local minima. The probabilistic roadmap method
(PRM) [254] and rapidly-exploring random tree (RRT) [253]
are the most commonly used SBP algorithms. PRM first
samples the C-space during its learning phase and then makes
a query with the predefined origin and destination points on
the roadmap. RRT, on the other hand, is a single query planner.
The path between the start and goal configurations is
incrementally built with random tree-like branches. RRT is
faster than PRM, and both are probabilistically complete [253],
which means a path that satisfies the given conditions is
guaranteed to be found given enough runtime. The main
disadvantage of SBP is, again, jerky trajectories [23].

Fig. 14: A depth image produced from synthetic lidar data,
generated in the CARLA [264] simulator.
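As an illustration of sampling-based planning, the following minimal 2D RRT sketch grows a tree from the start by repeatedly sampling the space and extending the nearest node toward the sample. The circular obstacles, step size and workspace bounds are arbitrary assumptions, and the returned polyline is typically jerky, consistent with the discussion above.

```python
import random
import math

def rrt(start, goal, obstacles, x_max=50.0, y_max=50.0, step=1.0, iters=5000, seed=0):
    """Minimal 2D RRT. obstacles: list of circles (cx, cy, radius)."""
    random.seed(seed)
    nodes = [start]
    parent = {0: None}

    def collides(p):
        return any(math.hypot(p[0] - cx, p[1] - cy) <= r for cx, cy, r in obstacles)

    for _ in range(iters):
        sample = (random.uniform(0, x_max), random.uniform(0, y_max))
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        near = nodes[i]
        d = math.dist(near, sample)
        if d == 0:
            continue
        new = (near[0] + step * (sample[0] - near[0]) / d,
               near[1] + step * (sample[1] - near[1]) / d)
        if collides(new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) < step:          # goal reached: backtrack the path
            path, k = [goal], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None

path = rrt((1.0, 1.0), (45.0, 45.0), obstacles=[(25.0, 25.0, 5.0)])
print(len(path) if path else "no path found")
```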
Interpolating curve planners fit a curve to a known set of
points [23], e.g. way-points generated from the global plan
or a discrete set of future points from another local planner.
The main obstacle avoidance strategy is to interpolate a new
collision-free path that first deviates from, and then
re-enters, the initially planned trajectory. The new path is
generated by fitting a curve to a new set of points: an exit
point from the currently traversed trajectory, newly sampled
collision-free points, and a re-entry point on the initial
trajectory. The resulting trajectory is smooth; however, the
computational load is usually higher than that of other
methods. There are various
curve families that are used commonly such as clothoids [255],
polynomials [256], Bezier curves [257] and splines [100].
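For illustration, a cubic Bezier curve between an exit point and a re-entry point is sketched below. Note that only the first and last control points lie on the curve; the intermediate points, which here stand in for sampled collision-free points, act as control points that pull the path away from obstacles. All coordinates are made up.

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=50):
    """Evaluate a cubic Bezier curve defined by four control points.
    Returns an (n, 2) array of smooth path samples."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    p0, p1, p2, p3 = map(np.asarray, (p0, p1, p2, p3))
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

# Exit point, two intermediate control points, and a re-entry point.
path = cubic_bezier([0.0, 0.0], [5.0, 2.5], [10.0, 2.5], [15.0, 0.0])
print(path[0], path[-1])   # starts at the exit point, ends at the re-entry point
```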
Optimization based motion planners improve the quality of
already existing paths with optimization functions. A*
trajectories were optimized with numeric non-linear functions in
[258]. The Potential Field Method (PFM) was improved in [259]
by solving its inherent oscillation problem using Newton's
method, thereby obtaining C1 continuity.
Recently, Deep Learning (DL) and reinforcement learning
based local planners started to emerge as an alternative. Fully
convolutional 3D neural networks can generate future paths
from sensory input such as lidar point clouds [260]. An
interesting take on the subject is to segment image data with
path proposals using a deep segmentation network [261].
Planning a safe path in occluded intersections was achieved in
a simulation environment using deep reinforcement learning
in [263]. The main difference between end-to-end driving
and deep learning based local planners is the output: the
former outputs direct vehicle control signals such as steering
and pedal operation, whereas the latter generates a trajectory.
This enables DL planners to be integrated into conventional
pipelines [28].
Deep learning based planners are promising, but they are
not widely used in real-world systems yet. The lack of
hard-coded safety measures, generalization issues and the need
for labeled data are among the issues that must be addressed.
Fig. 15: Bird’s eye view perspective of 3D lidar data, a sample
from the KITTI dataset [174].
VIII. HUMAN MACHINE INTERFACE
Vehicles communicate with their drivers/passengers through
their HMI module. The nature of this communication greatly
depends on the objective, which can be divided into two:
primary driving tasks and secondary tasks. The interaction
intensity of these tasks depends on the automation level.
Whereas a manually operated, level-zero conventional car
requires constant user input for operation, a level-five ADS may
need user input only at the beginning of the trip. Furthermore,
the purpose of interaction may affect intensity. A shift from
executing primary driving tasks to monitoring the automation
process raises new HMI design requirements.
There are several investigations such as [265], [266] about
automotive HMI technologies, mostly from the distraction
point of view. Manual user interfaces for secondary tasks are
preferable to their visual counterparts [265]. The main reason
is that vision is indispensable for primary driving tasks and
has no alternative. Visual interface interactions require
glances with durations between 0.6 and 1.6 seconds, with a
mean of 1.2 seconds [265]. As such, secondary-task interfaces
that require vision are distracting and detrimental to driving.
Auditory User Interfaces (AUI) are good alternatives to
visually taxing HMI designs. AUIs are omni-directional: even
if the user is not attending, the auditory cues are hard to miss
[267]. The main challenge of audio interaction is automatic
speech recognition (ASR). ASR is a mature field; however, the
vehicle domain poses additional challenges, such as degraded
performance caused by uncontrollable cabin conditions like wind
and road noise [268]. Beyond simple voice commands,
conversational natural language interaction with an ADS is still
an unrealized concept with many unsolved challenges [269].
The biggest HMI challenge is at level three and four
automation. The user and the ADS need to have a mutual
understanding; otherwise, they will not be able to grasp the
intentions of each other [266]. The transition from manual to
automated driving, and vice versa, remains failure-prone in the
state-of-the-art. Recent research showed that drivers exhibit low
cognitive load when monitoring automated driving compared
to doing a secondary task [284]. Even though some experimen-
tal systems can recognize driver-activity with a driver facing
camera based on head and eye-tracking [285], and prepare
the driver for handover with visual and auditory cues [286] in
simulation environments, a real world system with an efficient
handover interaction module does not exist at the moment.
This is an open problem [287] and future research should focus
on delivering better methods to inform and prepare the driver
in order to ease the transition [37].

TABLE X: Driving datasets

Dataset                      Image  Lidar  2D ann.*  3D ann.*  Ego signals  Naturalistic  POV                        Multi-trip  All weathers  Day & night
Cityscapes [270]             X      -      X         -         -            -             Vehicle                    -           -             -
Berkeley DeepDrive [271]     X      -      X         -         -            -             Vehicle                    -           X             X
Mapillary [272]              X      -      X         -         -            -             Vehicle                    -           X             X
Oxford RobotCar [49]         X      X      -         -         -            -             Vehicle                    X           X             X
KITTI [174]                  X      X      X         X         -            -             Vehicle                    -           -             -
H3D [273]                    X      X      -         X         -            -             Vehicle                    -           -             -
ApolloScape [274]            X      X      X         X         -            -             Vehicle                    -           -             -
nuScenes [175]               X      X      X         X         -            -             Vehicle                    -           X             X
Udacity [275]                X      X      X         X         -            -             Vehicle                    -           -             -
DDD17 [85]                   X      -      X         -         X            -             Vehicle                    -           X             X
Comma2k19 [276]              X      -      -         -         X            -             Vehicle                    -           -             X
LiVi-Set [277]               X      X      -         -         X            -             Vehicle                    -           -             -
NU-drive [278]               X      -      -         -         X            Semi          Vehicle                    X           -             -
SHRP2 [279]                  X      -      -         -         X            X             Vehicle                    -           -             -
100-Car [280]                X      -      -         -         X            X             Vehicle                    -           X             X
euroFOT [281]                X      -      -         -         X            X             Vehicle                    -           -             -
TorontoCity [282]            X      X      X         X         -            -             Aerial, panorama, vehicle  -           -             -
KAIST multi-spectral [283]   X      X      X         -         -            -             Vehicle                    -           -             X

*2D and 3D annotations can vary from bounding boxes to
segmentation masks. Readers are referred to the sources for
details of the datasets.
IX. DATASETS AND AVAILABLE TOOLS
A. Datasets and Benchmarks
Datasets are crucial for researchers and developers because
most algorithms and tools have to be trained and tested before
deployment on the road.
Typically, sensory inputs are fed into a stack of algorithms
with various objectives. A common practice is to test and
validate these functions separately on annotated datasets. For
example, the output of cameras, 2D vision, can be fed into
an object detection algorithm to detect surrounding vehicles
and pedestrians. Then, this information can be used in an-
other algorithm for planning purposes. Even though these
two algorithms are connected in the stack of this example,
the object detection part can be worked on and validated
separately during the development process. Since computer
vision is a well-studied field, there are annotated datasets for
object detection and tracking specifically. The existence of
these datasets accelerates the development process and enables
interdisciplinary research teams to work with each other much
more efficiently. For end-to-end systems, the dataset has to
include additional ego-vehicle signals, chiefly steering and
longitudinal control signals.
As learning approaches emerged, so did training datasets
to support them. The PASCAL VOC dataset [288], which grew from
2005 to 2012, was one of the first datasets featuring a large
amount of data with classes relevant to ADSs.
However, the images often featured single objects, in scenes
and scales that are not representative of what is encountered
in driving scenarios. In 2012, the KITTI Vision Benchmark
[174] remedied this situation by providing a relatively large
amount of labeled driving scenes. It remains one of the most
widely used datasets for applications related to driving
automation. Yet in terms of quantity of data and number of
labeled classes, it is far inferior to generic image databases
such as ImageNet [132] and COCO [150]. While no doubt useful
for training, generic image databases lack adequate context for
testing the capabilities of an ADS. UC Berkeley DeepDrive [271]
is a recent dataset with annotated image data. The Oxford
RobotCar dataset [49] contains over 1000 km of driving data
collected in the UK with six cameras, lidar, GPS and INS;
however, it is not annotated.
ApolloScape is a very recent dataset that is not fully public yet
[274]. Cityscapes [270] is commonly used for computer vision
algorithms as a benchmark set. Mapillary Vistas is a large
annotated image dataset [272]. The TorontoCity benchmark [282]
is a very detailed dataset; however it is not public yet. The
nuScenes dataset is the most recent urban driving dataset with
lidar and image sensors [175]. Comma.ai has released a part
of their dataset [289] which includes 7.25 hours of driving.
DDD17 [85] contains around 12 hours of driving data. The
LiVi-Set [277] is a new dataset that includes lidar, image and
driving behavior data.
Naturalistic driving data is another type of dataset, one that
concentrates on the individual element of driving: the driver.
SHRP2 [279] includes over 3000 volunteer participants'
driving data over a 3-year collection period. Other naturalistic
driving datasets are the 100-Car study [280], euroFOT [281]
and NUDrive [278]. Table X shows a comparison of these
datasets.
B. Open-source frameworks and simulators
Open source frameworks are very useful for both re-
searchers and the industry. These frameworks can “democra-
tize” ADS development. Autoware [110], Apollo [290], Nvidia
DriveWorks [291] and openpilot [292] are amongst the most
widely used software stacks capable of running an ADS platform
in the real world. We utilized Autoware [110] to realize core
automated driving functions in this study.
Simulations also have an important place for ADS devel-
opment. Since instrumenting an experimental vehicle is still
costly and conducting experiments on public road networks is
highly regulated, a simulation environment is beneficial for
developing certain algorithms and modules before road tests.
Furthermore, highly dangerous scenarios, such as a collision
with a pedestrian, can easily be tested in simulation.
CARLA [264] is an urban driving simulator developed
for this purpose. TORCS [293] was developed for race track
simulation. Some researchers even used computer games such
as Grand Theft Auto V [294]. Gazebo [295] is a common
simulation environment for robotics. For traffic simulations,
SUMO [296] is a widely used open-source platform. [297]
proposed different concepts of integrating real-world measure-
ments into the simulation environment.
X. CONCLUSIONS
In this survey on automated driving systems, we outlined
some of the key innovations as well as existing systems. While
the promise of automated driving is enticing and already
marketed to consumers, this survey has shown that clear gaps
remain in the research. Several architecture models have been
proposed, from fully modular to completely end-to-end, each
with their own shortcomings. The optimal sensing modality for
localization, mapping and perception is still disagreed upon,
algorithms still lack accuracy and efficiency, and the need for
a proper online assessment has become apparent. Less than
ideal road conditions are still an open problem, as well as
dealing with intemperate weather. Vehicle-to-vehicle commu-
nication is still in its infancy, while centralized, cloud-based
information management has yet to be implemented due to the
complex infrastructure required. Human-machine interaction is
an under-researched field with many open problems.
The development of automated driving systems relies on
advancements in both scientific disciplines and new
technologies. As such, we discussed recent research developments
that are likely to have a significant impact on automated
driving technology, either by overcoming the weaknesses of
previous methods or by proposing alternatives.
This survey has shown that through inter-disciplinary academic
collaboration and support from industries and the general pub-
lic, the remaining challenges can be addressed. With directed
efforts towards ensuring robustness at all levels of automated
driving systems, safe and efficient roads are just beyond the
horizon.
REFERENCES
[1] S. Singh, “Critical reasons for crashes investigated in the
national motor
vehicle crash causation survey,” Tech. Rep., 2015.
[2] T. J. Crayton and B. M. Meier, “Autonomous vehicles:
Developing
a public health research agenda to frame the future of
transportation
policy,” Journal of Transport & Health, vol. 6, pp. 245–252,
2017.
[3] W. D. Montgomery, R. Mudge, E. L. Groshen, S. Helper, J.
P.
MacDuffie, and C. Carson, “America’s workforce and the self-
driving
future: Realizing productivity gains and spurring economic
growth,”
2018.
[4] Eureka. E! 45: Programme for a european traffic system
with highest
efficiency and unprecedented safety.
https://www.eurekanetwork.org/
project/id/45. [Retrieved May 19, 2019].
[5] B. Ulmer, “Vita ii-active collision avoidance in real traffic,”
in Intel-
ligent Vehicles’ 94 Symposium, Proceedings of the. IEEE,
1994, pp.
1–6.
[6] M. Buehler, K. Iagnemma, and S. Singh, “The 2005 darpa
grand
challenge: the great robot race,” vol. 36, 2007.
[7] ——, “The darpa urban challenge: autonomous vehicles in
city traffic,”
vol. 56, 2009.
[8] A. Broggi, P. Cerri, M. Felisa, M. C. Laghi, L. Mazzei, and
P. P. Porta,
“The vislab intercontinental autonomous challenge: an extensive
test
for a platoon of intelligent vehicles,” International Journal of
Vehicle
Autonomous Systems, vol. 10, no. 3, pp. 147–164, 2012.
[9] A. Broggi, P. Cerri, S. Debattisti, M. C. Laghi, P. Medici,
D. Molinari,
M. Panciroli, and A. Prioletti, “Proud-public road urban
driverless-car
test,” IEEE Transactions on Intelligent Transportation Systems,
vol. 16,
no. 6, pp. 3508–3519, 2015.
[10] P. Cerri, G. Soprani, P. Zani, J. Choi, J. Lee, D. Kim, K.
Yi, and
A. Broggi, “Computer vision at the hyundai autonomous
challenge,”
in 14th International Conference on Intelligent Transportation
Systems
(ITSC). IEEE, 2011, pp. 777–783.
[11] R. McAllister, Y. Gal, A. Kendall, M. Van Der Wilk, A.
Shah,
R. Cipolla, and A. V. Weller, “Concrete problems for
autonomous
vehicle safety: advantages of bayesian deep learning.”
International
Joint Conferences on Artificial Intelligence, Inc., 2017.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet
classification
with deep convolutional neural networks,” in Advances in
neural
information processing systems, 2012, pp. 1097–1105.
[13] B. Schwarz, “Lidar: Mapping the world in 3d,” Nature
Photonics,
vol. 4, no. 7, p. 429, 2010.
[14] S. Hecker, D. Dai, and L. Van Gool, “End-to-end learning
of driving
models with surround-view cameras and route planners,” in
Proceed-
ings of the European Conference on Computer Vision (ECCV),
2018,
pp. 435–453.
[15] D. Lavrinc. This is how bad self-driving cars suck in rain.
https://jalopnik.com/this-is-how-bad-self-driving-cars-suck-in-
the-
rain-1666268433. [Retrieved December 16, 2018].
[16] A. Davies. Google’s self-driving car caused its first crash.
https://www.wired.com/2016/02/googles-self-driving-car-may-
caused-
first-crash/. [Retrieved December 16, 2018].
[17] M. McFarland. Who’s responsible when an autonomous car
crashes? https://money.cnn.com/2016/07/07/technology/tesla-
liability-
risk/index.html. [Retrieved June 4, 2019].
[18] T. B. Lee. Autopilot was active when a tesla crashed into a
truck,
killing driver. https://arstechnica.com/cars/2019/05/feds-
autopilot-was-
active-during-deadly-march-tesla-crash/. [Retrieved May 19,
2019].
[19] SAE, “Taxonomy and definitions for terms related to
driving automa-
tion systems for on-road motor vehicles,” SAE J3016, 2016,
Tech. Rep.
[20] J. Levinson, J. Askeland, J. Becker, J. Dolson, D. Held, S.
Kammel,
J. Z. Kolter, D. Langer, O. Pink, V. Pratt et al., “Towards fully
autonomous driving: Systems and algorithms,” in Intelligent
Vehicles
Symposium (IV), 2011 IEEE. IEEE, 2011, pp. 163–168.
[21] M. Campbell, M. Egerstedt, J. P. How, and R. M. Murray,
“Autonomous
driving in urban environments: approaches, lessons and
challenges,”
Philosophical Transactions of the Royal Society of London A:
Math-
ematical, Physical and Engineering Sciences, vol. 368, no.
1928, pp.
4649–4672, 2010.
[22] B. Paden, M. Čáp, S. Z. Yong, D. Yershov, and E. Frazzoli,
“A survey of
motion planning and control techniques for self-driving urban
vehicles,”
IEEE Transactions on intelligent vehicles, vol. 1, no. 1, pp. 33–
55,
2016.
[23] D. González, J. Pérez, V. Milanés, and F. Nashashibi, “A
review of mo-
tion planning techniques for automated vehicles,” IEEE
Transactions
on Intelligent Transportation Systems, vol. 17, no. 4, pp. 1135–
1145,
2016.
[24] J. Van Brummelen, M. O’Brien, D. Gruyer, and H.
Najjaran, “Au-
tonomous vehicle perception: The technology of today and
tomorrow,”
Transportation research part C: emerging technologies, vol. 89,
pp.
384–406, 2018.
[25] G. Bresson, Z. Alsayed, L. Yu, and S. Glaser,
“Simultaneous localiza-
tion and mapping: A survey of current trends in autonomous
driving,”
IEEE Transactions on Intelligent Vehicles, vol. 20, pp. 1–1,
2017.
[26] K. Abboud, H. A. Omar, and W. Zhuang, “Interworking of
dsrc and
cellular network technologies for v2x communications: A
survey,”
IEEE transactions on vehicular technology, vol. 65, no. 12, pp.
9457–
9470, 2016.
[27] C. Badue, R. Guidolini, R. V. Carneiro, P. Azevedo, V. B.
Cardoso,
A. Forechi, L. F. R. Jesus, R. F. Berriel, T. M. Paixão, F. Mutz
et al.,
“Self-driving cars: A survey,” arXiv preprint arXiv:1901.04407,
2019.
[28] W. Schwarting, J. Alonso-Mora, and D. Rus, “Planning and
decision-
making for autonomous vehicles,” Annual Review of Control,
Robotics,
and Autonomous Systems, vol. 1, pp. 187–210, 2018.
[29] Department of Economic and Social Affairs (DESA),
Population Di-
vision, “The 2017 revision, key findings and advance tables,” in
World
Population Prospects. United Nations, 2017, no.
ESA/P/WP/248.
[30] Deloitte. 2019 deloitte global automotive consumer study –
advanced
vehicle technologies and multimodal transportation, global
focus coun-
tries.
https://www2.deloitte.com/content/dam/Deloitte/us/Documents/
manufacturing/us-global-automotive-consumer-study-2019.pdf.
[Retrieved May 19, 2019].
[31] Federatione Internationale de l’Automobile (FiA) Region 1.
The auto-
motive digital transformation and the economic impacts of
existing data
access models. https://www.fiaregion1.com/wp-
content/uploads/2019/
03/The-Automotive-Digital-Transformation Full-study.pdf.
[Retrieved
May 19, 2019].
[32] R. Rajamani, “Vehicle dynamics and control,” 2011.
[33] M. R. Hafner, D. Cunningham, L. Caminiti, and D. Del
Vecchio,
“Cooperative collision avoidance at intersections: Algorithms
and ex-
periments,” IEEE Transactions on Intelligent Transportation
Systems,
vol. 14, no. 3, pp. 1162–1175, 2013.
[34] A. Colombo and D. Del Vecchio, “Efficient algorithms for
collision
avoidance at intersections,” in Proceedings of the 15th ACM
inter-
national conference on Hybrid Systems: Computation and
Control.
ACM, 2012, pp. 145–154.
[35] P. E. Ross, “The audi a8: the world’s first production car
to achieve
level 3 autonomy,” IEEE Spectrum, 2017.
[36] C. Gold, M. Körber, D. Lechner, and K. Bengler, “Taking
over control
from highly automated vehicles in complex traffic situations:
the role
of traffic density,” Human factors, vol. 58, no. 4, pp. 642–652,
2016.
[37] N. Merat, A. H. Jamson, F. C. Lai, M. Daly, and O. M.
Carsten,
“Transition to manual: Driver behaviour when resuming control
from
a highly automated vehicle,” Transportation research part F:
traffic
psychology and behaviour, vol. 27, pp. 274–282, 2014.
[38] E. Ackerman, “Toyota’s gill pratt on self-driving cars and
the reality
of full autonomy,” IEEE Spectrum, 2017.
[39] J. D’Onfro. ‘I hate them’: Locals reportedly are frustrated
with
alphabet’s self-driving cars.
https://www.cnbc.com/2018/08/28/locals-
reportedly-frustrated-with-alphabets-waymo-self-driving-
cars.html.
[Retrieved May 19, 2019].
[40] J.-F. Bonnefon, A. Shariff, and I. Rahwan, “The social
dilemma of
autonomous vehicles,” Science, vol. 352, no. 6293, pp. 1573–
1576,
2016.
[41] ——, “Autonomous vehicles need experimental ethics: Are
we ready
for utilitarian cars?” arXiv preprint arXiv:1510.03346, 2015.
[42] Y. Tian, K. Pei, S. Jana, and B. Ray, “Deeptest: Automated
testing of
deep-neural-network-driven autonomous cars,” in Proceedings
of the
40th International Conference on Software Engineering. ACM,
2018,
pp. 303–314.
[43] C. Urmson, J. Anhalt, D. Bagnell, C. Baker, R. Bittner, M.
Clark,
J. Dolan, D. Duggins, T. Galatali, C. Geyer et al., “Autonomous
driving
in urban environments: Boss and the urban challenge,” Journal
of Field
Robotics, vol. 25, no. 8, pp. 425–466, 2008.
[44] M. Gerla, E.-K. Lee, G. Pau, and U. Lee, “Internet of
vehicles: From
intelligent grid to autonomous cars and vehicular clouds,” in
IEEE
World Forum on Internet of Things (WF-IoT). IEEE, 2014, pp.
241–
246.
[45] E.-K. Lee, M. Gerla, G. Pau, U. Lee, and J.-H. Lim,
“Internet of
vehicles: From intelligent grid to autonomous cars and
vehicular fogs,”
International Journal of Distributed Sensor Networks, vol. 12,
no. 9,
p. 1550147716665500, 2016.
[46] M. Amadeo, C. Campolo, and A. Molinaro, “Information-
centric
networking for connected vehicles: a survey and future
perspectives,”
IEEE Communications Magazine, vol. 54, no. 2, pp. 98–104,
2016.
[47] J. Wei, J. M. Snider, J. Kim, J. M. Dolan, R. Rajkumar, and
B. Litkouhi,
“Towards a viable autonomous driving research platform,” in
Intelligent
Vehicles Symposium (IV), 2013 IEEE. IEEE, 2013, pp. 763–
770.
[48] A. Broggi, M. Buzzoni, S. Debattisti, P. Grisleri, M. C.
Laghi,
P. Medici, and P. Versari, “Extensive tests of autonomous
driving tech-
nologies,” IEEE Transactions on Intelligent Transportation
Systems,
vol. 14, no. 3, pp. 1403–1415, 2013.
[49] W. Maddern, G. Pascoe, C. Linegar, and P. Newman, “1
year, 1000 km:
The oxford robotcar dataset,” The International Journal of
Robotics
Research, vol. 36, no. 1, pp. 3–15, 2017.
[50] N. Akai, L. Y. Morales, T. Yamaguchi, E. Takeuchi, Y.
Yoshihara,
H. Okuda, T. Suzuki, and Y. Ninomiya, “Autonomous driving
based
on accurate localization using multilayer lidar and dead
reckoning,”
in IEEE 20th International Conference on Intelligent
Transportation
Systems (ITSC). IEEE, 2017, pp. 1–6.
[51] E. Guizzo, “How google’s self-driving car works,” IEEE
Spectrum
Online, vol. 18, no. 7, pp. 1132–1141, 2011.
[52] H. Somerville, P. Lienert, and A. Sage. Uber’s use of fewer
safety
sensors prompts questions after arizona crash. Business news,
Reuters,
March 2018. [Retrieved December 16, 2018].
[53] J. Ziegler, P. Bender, M. Schreiber, H. Lategahn, T.
Strauss, C. Stiller,
T. Dang, U. Franke, N. Appenrodt, C. G. Keller et al., “Making
Bertha
drive – an autonomous journey on a historic route,” IEEE
Intelligent
Transportation Systems Magazine, vol. 6, no. 2, pp. 8–20, 2014.
[54] Baidu. Apollo auto. https://github.com/ApolloAuto/apollo.
[Retrieved
May 1, 2019].
[55] C. Chen, A. Seff, A. Kornhauser, and J. Xiao,
“Deepdriving: Learning
affordance for direct perception in autonomous driving,” in
Proceedings
of the IEEE International Conference on Computer Vision,
2015, pp.
2722–2730.
[56] D. A. Pomerleau, “Alvinn: An autonomous land vehicle in
a neural
network,” in Advances in neural information processing
systems, 1989,
pp. 305–313.
[57] U. Muller, J. Ben, E. Cosatto, B. Flepp, and Y. L. Cun,
“Off-road
obstacle avoidance through end-to-end learning,” in Advances
in neural
information processing systems, 2006, pp. 739–746.
[58] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B.
Flepp,
P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang et al.,
“End to
end learning for self-driving cars,” arXiv preprint
arXiv:1604.07316,
2016.
[59] H. Xu, Y. Gao, F. Yu, and T. Darrell, “End-to-end learning
of driving
models from large-scale video datasets,” arXiv preprint, 2017.
[60] A. E. Sallab, M. Abdou, E. Perot, and S. Yogamani, “Deep
reinforce-
ment learning framework for autonomous driving,” Electronic
Imaging,
vol. 2017, no. 19, pp. 70–76, 2017.
[61] A. Kendall, J. Hawke, D. Janz, P. Mazur, D. Reda, J.-M.
Allen, V.-D.
Lam, A. Bewley, and A. Shah, “Learning to drive in a day,”
arXiv
preprint arXiv:1807.00412, 2018.
[62] S. Baluja, “Evolution of an artificial neural network based
autonomous
land vehicle controller,” IEEE Transactions on Systems, Man,
and
Cybernetics-Part B: Cybernetics, vol. 26, no. 3, pp. 450–463,
1996.
[63] J. Koutnı́k, G. Cuccu, J. Schmidhuber, and F. Gomez,
“Evolving large-
scale neural networks for vision-based reinforcement learning,”
in
Proceedings of the 15th annual conference on Genetic and
evolutionary
computation. ACM, 2013, pp. 1061–1068.
[64] S. Behere and M. Torngren, “A functional architecture for
autonomous
driving,” in First International Workshop on Automotive
Software
Architecture (WASA). IEEE, 2015, pp. 3–10.
[65] L. Chi and Y. Mu, “Deep steering: Learning end-to-end
driv-
ing model from spatial and temporal visual cues,” arXiv
preprint
arXiv:1708.03798, 2017.
[66] J.-P. Laumond et al., Robot motion planning and control.
Springer,
1998, vol. 229.
[67] R. Jain, R. Kasturi, and B. G. Schunck, Machine vision.
McGraw-Hill
New York, 1995, vol. 5.
[68] S. J. Anderson, S. B. Karumanchi, and K. Iagnemma,
“Constraint-
based planning and control for safe, semi-autonomous operation
of
vehicles,” in 2012 IEEE intelligent vehicles symposium. IEEE,
2012,
pp. 383–388.
[69] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J.
Veness, M. G.
Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G.
Ostrovski
et al., “Human-level control through deep reinforcement
learning,”
Nature, vol. 518, no. 7540, p. 529, 2015.
[70] D. Floreano, P. Dürr, and C. Mattiussi, “Neuroevolution:
from architec-
tures to learning,” Evolutionary Intelligence, vol. 1, no. 1, pp.
47–62,
2008.
[71] H. T. Cheng, H. Shan, and W. Zhuang, “Infotainment and
road
safety service support in vehicular networking: From a
communication
perspective,” Mechanical Systems and Signal Processing, vol.
25, no. 6,
pp. 2020–2038, 2011.
[72] J. Wang, Y. Shao, Y. Ge, and R. Yu, “A survey of vehicle
to everything
(v2x) testing,” Sensors, vol. 19, no. 2, p. 334, 2019.
[73] S. Kumar, L. Shi, N. Ahmed, S. Gil, D. Katabi, and D. Rus,
“Carspeak:
a content-centric network for autonomous driving,” in
Proceedings of
the ACM SIGCOMM 2012 conference on Applications,
technologies,
architectures, and protocols for computer communication. ACM,
2012, pp. 259–270.
[74] E. Yurtsever, S. Yamazaki, C. Miyajima, K. Takeda, M.
Mori, K. Hit-
omi, and M. Egawa, “Integrating driving behavior and traffic
context
through signal symbolization for data reduction and risky lane
change
detection,” IEEE Transactions on Intelligent Vehicles, vol. 3,
no. 3, pp.
242–253, 2018.
[75] M. Gerla, “Vehicular cloud computing,” in Ad Hoc
Networking Work-
shop (Med-Hoc-Net), 2012 The 11th Annual Mediterranean.
IEEE,
2012, pp. 152–155.
[76] M. Whaiduzzaman, M. Sookhak, A. Gani, and R. Buyya,
“A survey
on vehicular cloud computing,” Journal of Network and
Computer
applications, vol. 40, pp. 325–344, 2014.
[77] I. Din, B.-S. Kim, S. Hassan, M. Guizani, M. Atiquzzaman,
and
J. Rodrigues, “Information-centric network-based vehicular
communi-
cations: overview and research opportunities,” Sensors, vol. 18,
no. 11,
p. 3957, 2018.
[78] C. Jang, C. Kim, K. Jo, and M. Sunwoo, “Design factor
optimization of
3d flash lidar sensor based on geometrical model for automated
vehicle
and advanced driver assistance system applications,”
International
Journal of Automotive Technology, vol. 18, no. 1, pp. 147–156,
2017.
[79] A. I. Maqueda, A. Loquercio, G. Gallego, N. Garcı́a, and
D. Scara-
muzza, “Event-based vision meets deep learning on steering
prediction
for self-driving cars,” in Proceedings of the IEEE Conference
on
Computer Vision and Pattern Recognition (CVPR), 2018, pp.
5419–
5427.
[80] C. Fries and H.-J. Wuensche, “Autonomous convoy driving
by night:
The vehicle tracking system,” in Proceedings of the IEEE
International
Conference on Technologies for Practical Robot Applications
(TePRA).
IEEE, 2015, pp. 1–6.
[81] Q. Ha, K. Watanabe, T. Karasawa, Y. Ushiku, and T.
Harada, “Mfnet:
Towards real-time semantic segmentation for autonomous
vehicles
with multi-spectral scenes,” in Proceedings of the 2017
IEEE/RSJ
International Conference on Intelligent Robots and Systems
(IROS).
IEEE, 2017, pp. 5108–5115.
[82] T. B. Lee. How 10 leading companies are trying to make powerful, low-cost lidar. https://arstechnica.com/cars/2019/02/the-ars-technica-guide-to-the-lidar-industry/. [Retrieved May 19, 2019].
[83] A. Saxena, S. H. Chung, and A. Y. Ng, “Learning depth
from single
monocular images,” in Advances in neural information
processing
systems, 2006, pp. 1161–1168.
[84] J. Janai, F. Güney, A. Behl, and A. Geiger, “Computer
vision for
autonomous vehicles: Problems, datasets and State-of-the-Art,”
pre-
print, Apr. 2017.
[85] J. Binas, D. Neil, S.-C. Liu, and T. Delbruck, “Ddd17:
End-to-end
davis driving dataset,” arXiv preprint arXiv:1711.01458, 2017.
[86] P. Lichtsteiner, C. Posch, and T. Delbruck, “A 128×128 120 dB 15 µs latency asynchronous temporal contrast vision sensor,” IEEE J. Solid-State Circuits, vol. 43, no. 2, pp. 566–576, Feb. 2008.
[87] B. Schoettle, “Sensor fusion: A comparison of sensing
capabilities of
human drivers and highly automated vehicles,” University of
Michi-
gan, Sustainable Worldwide Transportation, Tech. Rep. SWT-
2017-12,
August 2017.
[88] Tesla Motors. Autopilot press kit. https://www.tesla.com/presskit/autopilot#autopilot. [Retrieved December 16, 2018].
[89] SXSW Interactive 2016. Chris Urmson explains Google self-driving car project. https://www.sxsw.com/interactive/2016/chris-urmson-explain-googles-self-driving-car-project/. [Retrieved December 16, 2018].
[90] M. A. Al-Khedher, “Hybrid GPS-GSM localization of
automobile
tracking system,” arXiv preprint arXiv:1201.2630, 2012.
[91] K. S. Chong and L. Kleeman, “Accurate odometry and
error modelling
for a mobile robot,” in IEEE International Conference on
Robotics and
Automation (ICRA), vol. 4. IEEE, 1997, pp. 2783–2788.
[92] C. Urmson, J. Anhalt, M. Clark, T. Galatali, J. P.
Gonzalez, J. Gowdy,
A. Gutierrez, S. Harbaugh, M. Johnson-Roberson, H. Kato et
al.,
“High speed navigation of unrehearsed terrain: Red team
technology for
grand challenge 2004,” Robotics Institute, Carnegie Mellon
University,
Pittsburgh, PA, Tech. Rep. CMU-RI-TR-04-37, 2004.
[93] T. Bailey and H. Durrant-Whyte, “Simultaneous
localization and map-
ping (slam): Part ii,” IEEE Robotics & Automation Magazine,
vol. 13,
no. 3, pp. 108–117, 2006.
[94] A. Hata and D. Wolf, “Road marking detection using lidar
reflective
intensity data and its application to vehicle localization,” in
17th
International Conference on Intelligent Transportation Systems
(ITSC).
IEEE, 2014, pp. 584–589.
[95] T. Ort, L. Paull, and D. Rus, “Autonomous vehicle
navigation in rural
environments without detailed prior maps,” in International
Conference
on Robotics and Automation, 2018.
[96] J. Levinson and S. Thrun, “Robust vehicle localization in
urban envi-
ronments using probabilistic maps,” in IEEE International
Conference
on Robotics and Automation (ICRA). IEEE, 2010, pp. 4372–
4378.
[97] E. Takeuchi and T. Tsubouchi, “A 3-d scan matching using
improved
3-d normal distributions transform for mobile robotic mapping,”
in
IEEE/RSJ International Conference on Intelligent Robots and
Systems
(IROS). IEEE, 2006, pp. 3068–3073.
[98] S. E. Shladover, “Path at 20-history and major milestones,”
IEEE
Transactions on intelligent transportation systems, vol. 8, no. 4,
pp.
584–592, 2007.
[99] A. Alam, B. Besselink, V. Turri, J. Martensson, and K. H.
Johansson,
“Heavy-duty vehicle platooning for sustainable freight
transportation:
A cooperative method to enhance safety and efficiency,” IEEE
Control
Systems, vol. 35, no. 6, pp. 34–56, 2015.
[100] C. Bergenhem, S. Shladover, E. Coelingh, C. Englund,
and S. Tsugawa,
“Overview of platooning systems,” in Proceedings of the 19th
ITS
World Congress, Oct 22-26, Vienna, Austria (2012), 2012.
[101] E. Chan, “Sartre automated platooning vehicles,” Towards
Innovative
Freight and Logistics, vol. 2, pp. 137–150, 2016.
[102] A. K. Khalaji and S. A. A. Moosavian, “Robust adaptive
controller for
a tractor–trailer mobile robot,” IEEE/ASME Transactions on
Mecha-
tronics, vol. 19, no. 3, pp. 943–953, 2014.
[103] J. Cheng, B. Wang, and Y. Xu, “Backward path tracking
control for
mobile robot with three trailers,” in International Conference on
Neural
Information Processing. Springer, 2017, pp. 32–41.
[104] M. Hejase, J. Jing, J. M. Maroli, Y. B. Salamah, L.
Fiorentini, and
Ü. Özgüner, “Constrained backward path tracking control using
a plug-
in jackknife prevention system for autonomous tractor-trailers,”
in 2018
21st International Conference on Intelligent Transportation
Systems
(ITSC). IEEE, 2018, pp. 2012–2017.
[105] M. Magnusson, “The three-dimensional normal-
distributions transform:
an efficient representation for registration, surface analysis, and
loop
detection,” PhD dissertation, Örebro Universitet, 2009.
[106] S. Kuutti, S. Fallah, K. Katsaros, M. Dianati, F.
Mccullough, and
A. Mouzakitis, “A survey of the state-of-the-art localization
techniques
and their potentials for autonomous vehicle applications,” IEEE
Inter-
net of Things Journal, vol. 5, no. 2, pp. 829–846, 2018.
[107] F. Zhang, H. Stähle, G. Chen, C. C. C. Simon, C. Buckl,
and A. Knoll,
“A sensor fusion approach for localization with cumulative
error
elimination,” in 2012 IEEE International Conference on
Multisensor
Fusion and Integration for Intelligent Systems (MFI). IEEE,
2012,
pp. 1–6.
[108] W.-W. Kao, “Integration of gps and dead-reckoning
navigation sys-
tems,” in Vehicle Navigation and Information Systems
Conference,
1991, vol. 2. IEEE, 1991, pp. 635–643.
[109] J. Levinson, M. Montemerlo, and S. Thrun, “Map-based
precision
vehicle localization in urban environments,” in Robotics:
Science and
Systems III, W. Burgard, O. Brock, and C. Stachniss, Eds. MIT
Press,
2007, ch. 16, pp. 4372–4378.
[110] S. Kato, E. Takeuchi, Y. Ishiguro, Y. Ninomiya, K.
Takeda, and
T. Hamada, “An open approach to autonomous vehicles,” IEEE
Micro,
vol. 35, no. 6, pp. 60–68, 2015.
[111] A. Ranganathan, D. Ilstrup, and T. Wu, “Light-weight
localization for
vehicles using road markings,” in IEEE/RSJ International
Conference
on Intelligent Robots and Systems (IROS). IEEE, 2013, pp.
921–927.
[112] J. Leonard, J. How, S. Teller, M. Berger, S. Campbell, G.
Fiore,
L. Fletcher, E. Frazzoli, A. Huang, S. Karaman et al., “A
perception-
driven autonomous urban vehicle,” Journal of Field Robotics,
vol. 25,
no. 10, pp. 727–774, 2008.
[113] Autoware. https://github.com/autowarefoundation/autoware. [Retrieved June 12, 2019].
[114] N. Akai, L. Y. Morales, E. Takeuchi, Y. Yoshihara, and
Y. Ninomiya,
“Robust localization using 3d ndt scan matching with
experimentally
determined uncertainty and road marker matching,” in
Intelligent
Vehicles Symposium (IV), 2017 IEEE. IEEE, 2017, pp. 1356–
1363.
[115] J. K. Suhr, J. Jang, D. Min, and H. G. Jung, “Sensor
fusion-based
low-cost vehicle localization system for complex urban
environments,”
IEEE Transactions on Intelligent Transportation Systems, vol.
18, no. 5,
pp. 1078–1086, 2017.
[116] D. Gruyer, R. Belaroussi, and M. Revilloud, “Accurate
lateral position-
ing from map data and road marking detection,” Expert Systems
with
Applications, vol. 43, pp. 1–8, 2016.
[117] X. Qu, B. Soheilian, and N. Paparoditis, “Vehicle
localization using
mono-camera and geo-referenced traffic signs,” in Intelligent
Vehicles
Symposium (IV), 2015 IEEE. IEEE, 2015, pp. 605–610.
[118] R. W. Wolcott and R. M. Eustice, “Fast lidar localization
using mul-
tiresolution gaussian mixture maps,” in IEEE International
Conference
on Robotics and Automation (ICRA). IEEE, 2015, pp. 2814–
2821.
[119] M. Magnusson, A. Nuchter, C. Lorken, A. J. Lilienthal,
and
J. Hertzberg, “Evaluation of 3d registration reliability and
speed-a
comparison of icp and ndt,” in IEEE International Conference
on
Robotics and Automation (ICRA). IEEE, 2009, pp. 3907–3912.
[120] R. Valencia, J. Saarinen, H. Andreasson, J. Vallvé, J.
Andrade-Cetto,
and A. J. Lilienthal, “Localization in highly dynamic
environments
using dual-timescale ndt-mcl,” in IEEE International
Conference on
Robotics and Automation (ICRA). IEEE, 2014, pp. 3956–3962.
[121] R. W. Wolcott and R. M. Eustice, “Visual localization
within lidar maps
for automated urban driving,” in IEEE/RSJ International
Conference on
Intelligent Robots and Systems (IROS). IEEE, 2014, pp. 176–
183.
[122] C. McManus, W. Churchill, A. Napier, B. Davis, and P.
Newman, “Dis-
traction suppression for vision-based pose estimation at city
scales,” in
IEEE International Conference on Robotics and Automation
(ICRA).
IEEE, 2013, pp. 3762–3769.
[123] J. Redmon and A. Farhadi, “YOLOv3: An incremental
improvement,”
Apr. 2018.
[124] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-
CNN,” in
2017 IEEE International Conference on Computer Vision
(ICCV), Oct.
2017, pp. 2980–2988.
[125] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H.
Adam,
“Encoder-Decoder with atrous separable convolution for
semantic
image segmentation,” Feb. 2018.
[126] Y. Yan, Y. Mao, and B. Li, “SECOND: Sparsely
embedded convolu-
tional detection,” Sensors, vol. 18, no. 10, Oct. 2018.
[127] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi,
“Inception-v4,
Inception-ResNet and the impact of residual connections on
learning,”
Feb. 2016.
[128] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual
learning for image
recognition,” in 2016 IEEE Conference on Computer Vision and
Pattern
Recognition (CVPR). IEEE, Jun. 2016, pp. 770–778.
[129] G. Huang, Z. Liu, L. van der Maaten, and K. Q.
Weinberger, “Densely
connected convolutional networks,” Aug. 2016.
[130] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D.
Anguelov,
D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper
with
convolutions,” in Proceedings of the IEEE conference on
computer
vision and pattern recognition, 2015, pp. 1–9.
[131] K. Simonyan and A. Zisserman, “Very deep convolutional
networks
for Large-Scale image recognition,” Computing Research
Repository
CoRR, vol. abs/1409.1556, 2015.
[132] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in CVPR, 2009.
[133] A. Andreopoulos and J. K. Tsotsos, “50 years of object
recognition:
Directions forward,” Comput. Vis. Image Underst., vol. 117, no.
8, pp.
827–891, Aug. 2013.
[134] Z.-Q. Zhao, P. Zheng, S.-T. Xu, and X. Wu, “Object
detection with
deep learning: A review,” Jul. 2018.
[135] L. Liu, W. Ouyang, X. Wang, P. Fieguth, J. Chen, X. Liu,
and
M. Pietikäinen, “Deep learning for generic object detection: A
survey,”
Sep. 2018.
[136] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi,
“You only
look once: Unified, real-time object detection,” 2016 IEEE
Conference
on Computer Vision and Pattern Recognition (CVPR), pp. 779–
788,
2016.
[137] J. Redmon and A. Farhadi, “Yolo9000: Better, faster,
stronger,”
2017 IEEE Conference on Computer Vision and Pattern
Recognition
(CVPR), pp. 6517–6525, 2017.
[138] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-
Y. Fu, and
A. C. Berg, “SSD: Single shot MultiBox detector,” Dec. 2015.
[139] C. Geyer and K. Daniilidis, “A unifying theory for central
panoramic
systems and practical implications,” in Computer Vision —
ECCV 2000.
Springer Berlin Heidelberg, 2000, pp. 445–461.
[140] D. Scaramuzza, A. Martinelli, and R. Siegwart, “A
toolbox for easily
calibrating omnidirectional cameras,” in 2006 IEEE/RSJ
International
Conference on Intelligent Robots and Systems, Oct. 2006, pp.
5695–
5701.
[141] D. Scaramuzza and R. Siegwart, “Appearance-Guided
monocular om-
nidirectional visual odometry for outdoor ground vehicles,”
IEEE
Trans. Rob., vol. 24, no. 5, pp. 1015–1026, Oct. 2008.
[142] M. Schönbein and A. Geiger, “Omnidirectional 3D
reconstruction
in augmented manhattan worlds,” in 2014 IEEE/RSJ
International
Conference on Intelligent Robots and Systems, Sep. 2014, pp.
716–
723.
[143] G. Gallego, T. Delbruck, G. Orchard, C. Bartolozzi, B.
Taba, A. Censi,
S. Leutenegger, A. Davison, J. Conradt, K. Daniilidis, and D.
Scara-
muzza, “Event-based vision: A survey,” pre-print, Apr. 2019.
[144] R. H. Rasshofer and K. Gresser, “Automotive radar and
lidar systems
for next generation driver assistance functions,” Adv. Radio
Sci., vol. 3,
no. B.4, pp. 205–209, May 2005.
[145] P. Radecki, M. Campbell, and K. Matzen, “All weather
perception:
Joint data association, tracking, and classification for
autonomous
ground vehicles,” pre-print, May 2016.
[146] P. Hurney, P. Waldron, F. Morgan, E. Jones, and M.
Glavin, “Review
of pedestrian detection techniques in automotive far-infrared
video,”
IET Intel. Transport Syst., vol. 9, no. 8, pp. 824–832, 2015.
[147] N. Carlevaris-Bianco and R. M. Eustice, “Learning visual
feature
descriptors for dynamic lighting conditions,” in 2014 IEEE/RSJ
In-
ternational Conference on Intelligent Robots and Systems, Sep.
2014,
pp. 2769–2776.
[148] V. Peretroukhin, W. Vega-Brown, N. Roy, and J. Kelly,
“PROBE-GK:
Predictive robust estimation using generalized kernels,” pre-
print, Aug.
2017.
[149] W. Maddern, A. Stewart, C. McManus, B. Upcroft, W.
Churchill, and
P. Newman, “Illumination invariant imaging: Applications in
robust
vision-based localisation, mapping and classification for
autonomous
vehicles,” in Proceedings of the Visual Place Recognition in
Changing
Environments Workshop, IEEE International Conference on
Robotics
and Automation (ICRA), Hong Kong, China, vol. 2, 2014, p. 3.
[150] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R.
Girshick, J. Hays,
P. Perona, D. Ramanan, C. Lawrence Zitnick, and P. Dollár,
“Microsoft
COCO: Common objects in context,” May 2014.
[151] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN:
Towards Real-
Time object detection with region proposal networks,” Jun.
2015.
[152] H. Noh, S. Hong, and B. Han, “Learning deconvolution network for semantic segmentation,” in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ser. ICCV ’15. Washington, DC, USA: IEEE Computer Society, 2015, pp. 1520–1528. [Online]. Available: http://dx.doi.org/10.1109/ICCV.2015.178
[153] O. Ronneberger, P. Fischer, and T. Brox, “U-Net:
Convolutional net-
works for biomedical image segmentation,” May 2015.
[154] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid
scene parsing
network,” Dec. 2016.
[155] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and
A. L. Yuille,
“DeepLab: Semantic image segmentation with deep
convolutional nets,
atrous convolution, and fully connected CRFs,” pre-print, Jun.
2016.
[156] X. Ma, Z. Wang, H. Li, W. Ouyang, and P. Zhang,
“Accurate monoc-
ular 3D object detection via Color-Embedded 3D reconstruction
for
autonomous driving,” Mar. 2019.
[157] X. Cheng, P. Wang, and R. Yang, “Learning depth with
convolutional
spatial propagation network,” 2018.
[158] R. B. Rusu, “Semantic 3d object maps for everyday
manipulation in
human living environments,” October 2009.
[159] W. Wang, K. Sakurada, and N. Kawaguchi, “Incremental
and enhanced
Scanline-Based segmentation method for surface reconstruction
of
sparse LiDAR data,” Remote Sensing, vol. 8, no. 11, p. 967,
Nov.
2016.
[160] P. Narksri, E. Takeuchi, Y. Ninomiya, Y. Morales, and N.
Kawaguchi,
“A slope-robust cascaded ground segmentation in 3D point
cloud
for autonomous vehicles,” in 2018 IEEE International
Conference on
Intelligent Transportation Systems (ITSC), Nov. 2018, pp. 497–
504.
[161] J. Lambert, L. Liang, Y. Morales, N. Akai, A. Carballo,
E. Takeuchi,
P. Narksri, S. Seiya, and K. Takeda, “Tsukuba challenge 2017
dynamic
object tracks dataset for pedestrian behavior analysis,” Journal
of
Robotics and Mechatronics (JRM), vol. 30, no. 4, Aug. 2018.
[162] S. Song and J. Xiao, “Sliding shapes for 3D object
detection in depth
images,” in Proceedings of the European Conference on
Computer
Vision ECCV 2014. Springer International Publishing, 2014, pp.
634–
651.
[163] D. Z. Wang and I. Posner, “Voting for voting in online
point cloud
object detection,” in Proceedings of Robotics: Science and
Systems,
July 2015.
[164] Y. Zhou and O. Tuzel, “VoxelNet: End-to-End learning
for point cloud
based 3D object detection,” Nov. 2017.
[165] X. Chen, K. Kundu, Y. Zhu, A. G. Berneshawi, H. Ma, S.
Fidler, and
R. Urtasun, “3D object proposals for accurate object class
detection,”
in Advances in Neural Information Processing Systems 28, C.
Cortes,
N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, Eds.
Curran
Associates, Inc., 2015, pp. 424–432.
[166] D. Lin, S. Fidler, and R. Urtasun, “Holistic scene
understanding for
3D object detection with RGBD cameras,” in 2013 IEEE
International
Conference on Computer Vision, Dec. 2013, pp. 1417–1424.
[167] B. Li, T. Zhang, and T. Xia, “Vehicle detection from 3D
lidar using
fully convolutional network,” in Proceedings of Robotics:
Science and
Systems, June 2016.
[168] L. Liu, Z. Pan, and B. Lei, “Learning a rotation invariant
detector with
rotatable bounding box,” Nov. 2017.
[169] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, “Multi-view
3D object
detection network for autonomous driving,” in 2017 IEEE
Conference
on Computer Vision and Pattern Recognition (CVPR), July
2017, pp.
6526–6534.
[170] M. Ren, A. Pokrovsky, B. Yang, and R. Urtasun, “SBNet:
Sparse blocks
network for fast inference,” in Proceedings of the IEEE
Conference on
Computer Vision and Pattern Recognition, 2018, pp. 8711–
8720.
[171] W. Ali, S. Abdelkarim, M. Zahran, M. Zidan, and A. El
Sallab,
“YOLO3D: End-to-end real-time 3D oriented object bounding
box
detection from LiDAR point cloud,” Aug. 2018.
[172] B. Yang, W. Luo, and R. Urtasun, “PIXOR: Real-time 3D
object detec-
tion from point clouds,” in 2018 IEEE/CVF Conference on
Computer
Vision and Pattern Recognition (CVPR), Jun. 2018, pp. 7652–
7660.
[173] D. Feng, L. Rosenbaum, and K. Dietmayer, “Towards safe
autonomous
driving: Capture uncertainty in the deep neural network for lidar
3D
vehicle detection,” Apr. 2018.
[174] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for
autonomous
driving? the kitti vision benchmark suite,” in IEEE Conference
on
Computer Vision and Pattern Recognition (CVPR). IEEE, 2012,
pp.
3354–3361.
[175] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong,
Q. Xu, A. Kr-
ishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuScenes: A
multimodal
dataset for autonomous driving,” arXiv preprint
arXiv:1903.11027,
2019.
[176] S. Shi, X. Wang, and H. Li, “PointRCNN: 3D object
proposal gener-
ation and detection from point cloud,” Dec. 2018.
[177] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O.
Beijbom,
“PointPillars: Fast encoders for object detection from point
clouds,”
Dec. 2018.
[178] Z. Yang, Y. Sun, S. Liu, X. Shen, and J. Jia, “IPOD:
Intensive point-
based object detector for point cloud,” Dec. 2018.
[179] C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas,
“Frustum PointNets
for 3D object detection from RGB-D data,” Nov. 2017.
[180] W. Luo, J. Xing, A. Milan, X. Zhang, W. Liu, X. Zhao,
and T.-K. Kim,
“Multiple object tracking: A literature review,” Sep. 2014.
[181] A. Azim and O. Aycard, “Detection, classification and
tracking of
moving objects in a 3D environment,” in 2012 IEEE Intelligent
Vehicles
Symposium, Jun. 2012, pp. 802–807.
[182] J. Shi and C. Tomasi, “Good features to track,” in 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Jun. 1994, pp. 593–600.
[183] M.-P. Dubuisson and A. K. Jain, “A modified Hausdorff distance for object matching,” in Proceedings of 12th International Conference on Pattern Recognition, vol. 1, Oct. 1994, pp. 566–568.
[184] S. Hwang, N. Kim, Y. Choi, S. Lee, and I. S. Kweon,
“Fast multiple
objects detection and tracking fusing color camera and 3D
LIDAR
for intelligent vehicles,” in 2016 13th International Conference
on
Ubiquitous Robots and Ambient Intelligence (URAI), Aug.
2016, pp.
234–239.
[185] T. Nguyen, B. Michaelis, A. Al-Hamadi, M. Tornow, and
M. Meinecke,
“Stereo-Camera-Based urban environment perception using
occupancy
grid and object tracking,” IEEE Trans. Intell. Transp. Syst., vol.
13,
no. 1, pp. 154–165, Mar. 2012.
[186] J. Ziegler, P. Bender, T. Dang, and C. Stiller, “Trajectory
planning
for bertha — a local, continuous method,” in 2014 IEEE
Intelligent
Vehicles Symposium Proceedings, Jun. 2014, pp. 450–457.
[187] A. Ess, K. Schindler, B. Leibe, and L. Van Gool, “Object
detection and
tracking for autonomous navigation in dynamic environments,”
Int. J.
Rob. Res., vol. 29, no. 14, pp. 1707–1725, Dec. 2010.
[188] A. Petrovskaya and S. Thrun, “Model based vehicle
detection and
tracking for autonomous urban driving,” Auton. Robots, vol. 26,
no.
2-3, pp. 123–139, Apr. 2009.
[189] M. He, E. Takeuchi, Y. Ninomiya, and S. Kato, “Precise
and effi-
cient model-based vehicle tracking method using Rao-
Blackwellized
and scaling series particle filters,” in 2016 IEEE/RSJ
International
Conference on Intelligent Robots and Systems (IROS), Oct.
2016, pp.
117–124.
[190] D. Z. Wang, I. Posner, and P. Newman, “Model-free
detection and
tracking of dynamic objects with 2D lidar,” Int. J. Rob. Res.,
vol. 34,
no. 7, pp. 1039–1063, Jun. 2015.
[191] B. Huval, T. Wang, S. Tandon, J. Kiske, W. Song, J.
Pazhayampallil,
M. Andriluka, P. Rajpurkar, T. Migimatsu, R. Cheng-Yue, F.
Mujica,
A. Coates, and A. Y. Ng, “An empirical evaluation of deep
learning
on highway driving,” pre-print, Apr. 2015.
[192] D. Held, S. Thrun, and S. Savarese, “Learning to track at
100 FPS
with deep regression networks,” pre-print, Apr. 2016.
[193] S. Chowdhuri, T. Pankaj, and K. Zipser, “MultiNet:
Multi-Modal Multi-
Task learning for autonomous driving,” pre-print, Sep. 2017.
[194] J. C. McCall and M. M. Trivedi, “Video-based lane
estimation and
tracking for driver assistance: survey, system, and evaluation,”
IEEE
Trans. Intell. Transp. Syst., vol. 7, no. 1, pp. 20–37, Mar. 2006.
[195] A. B. Hillel, R. Lerner, D. Levi, and G. Raz, “Recent
progress in road
and lane detection: a survey,” Machine vision and applications,
vol. 25,
no. 3, pp. 727–745, 2014.
[196] C. Fernández, D. Fernández-Llorca, and M. A. Sotelo, “A
hybrid
Vision-Map method for urban road detection,” Journal of
Advanced
Transportation, vol. 2017, Oct. 2017.
[197] E. Yurtsever, Y. Liu, J. Lambert, C. Miyajima, E.
Takeuchi, K. Takeda,
and J. H. L. Hansen, “Risky action recognition in lane change
video
clips using deep spatiotemporal networks with segmentation
mask
transfer,” in 2019 IEEE Intelligent Transportation Systems
Conference
(ITSC), Oct 2019, pp. 3100–3107.
[198] R. Labayrade, J. Douret, J. Laneurit, and R. Chapuis, “A
reliable
and robust lane detection system based on the parallel use of
three
algorithms for driving safety assistance,” IEICE Trans. Inf.
Syst.,
vol. 89, no. 7, pp. 2092–2100, 2006.
[199] Y. Jiang, F. Gao, and G. Xu, “Computer vision-based multiple-lane detection on straight road and in a curve,” in 2010 International Conference on Image Analysis and Signal Processing, Apr. 2010, pp. 114–117.
[200] M. Paton, K. MacTavish, C. J. Ostafew, and T. D.
Barfoot, “It’s not
easy seeing green: Lighting-resistant stereo visual teach &
repeat using
color-constant images,” in 2015 IEEE International Conference
on
Robotics and Automation (ICRA). IEEE, 2015, pp. 1519–1526.
[201] A. S. Huang, D. Moore, M. Antone, E. Olson, and S.
Teller, “Finding
multiple lanes in urban road networks with vision and lidar,”
Auton.
Robots, vol. 26, no. 2, pp. 103–122, Apr. 2009.
[202] H. Cheng, B. Jeng, P. Tseng, and K. Fan, “Lane detection
with moving
vehicles in the traffic scenes,” IEEE Trans. Intell. Transp. Syst.,
vol. 7,
no. 4, pp. 571–582, Dec. 2006.
[203] J. M. Álvarez, A. M. López, and R. Baldrich, “Shadow
resistant road
segmentation from a mobile monocular system,” in Pattern
Recognition
and Image Analysis. Springer Berlin Heidelberg, 2007, pp. 9–
16.
[204] R. Danescu and S. Nedevschi, “Probabilistic lane tracking
in difficult
road scenarios using stereovision,” IEEE Trans. Intell. Transp.
Syst.,
vol. 10, no. 2, pp. 272–282, Jun. 2009.
[205] J. Long, E. Shelhamer, and T. Darrell, “Fully
convolutional networks
for semantic segmentation,” in Proceedings of the IEEE
conference on
computer vision and pattern recognition, 2015, pp. 3431–3440.
[206] A. Borkar, M. Hayes, and M. T. Smith, “Robust lane
detection and
tracking with ransac and kalman filter,” in 2009 16th IEEE
Interna-
tional Conference on Image Processing (ICIP), Nov. 2009, pp.
3261–
3264.
[207] A. V. Nefian and G. R. Bradski, “Detection of drivable
corridors for
Off-Road autonomous navigation,” in 2006 International
Conference
on Image Processing, Oct. 2006, pp. 3025–3028.
[208] Y. Gal, “Uncertainty in deep learning,” Ph.D. dissertation, University of Cambridge, 2016.
[209] S. Yamazaki, C. Miyajima, E. Yurtsever, K. Takeda, M.
Mori, K. Hit-
omi, and M. Egawa, “Integrating driving behavior and traffic
context
through signal symbolization,” in Intelligent Vehicles
Symposium (IV),
2016 IEEE. IEEE, 2016, pp. 642–647.
[210] X. Geng, H. Liang, B. Yu, P. Zhao, L. He, and R. Huang,
“A scenario-
adaptive driving behavior prediction approach to urban
autonomous
driving,” Applied Sciences, vol. 7, no. 4, p. 426, 2017.
[211] M. Bahram, C. Hubmann, A. Lawitzky, M. Aeberhard, and
D. Wollherr,
“A combined model-and learning-based framework for
interaction-
aware maneuver prediction,” IEEE Transactions on Intelligent
Trans-
portation Systems, vol. 17, no. 6, pp. 1538–1550, 2016.
[212] V. Gadepally, A. Krishnamurthy, and Ü. Özgüner, “A
framework for
estimating long term driver behavior,” Journal of advanced
transporta-
tion, vol. 2017, 2017.
[213] P. Liu, A. Kurt, K. Redmill, and U. Ozguner,
“Classification of highway
lane change behavior to detect dangerous cut-in maneuvers,” in
The
Transportation Research Board (TRB) 95th Annual Meeting,
vol. 2,
2015.
[214] P. Kumar, M. Perrollaz, S. Lefevre, and C. Laugier,
“Learning-based
approach for online lane change intention prediction,” in 2013
IEEE
Intelligent Vehicles Symposium (IV). IEEE, 2013, pp. 797–802.
[215] F. Sagberg, Selpi, G. F. Bianchi Piccinini, and J.
Engström, “A review
of research on driving styles and road safety,” Human factors,
vol. 57,
no. 7, pp. 1248–1275, 2015.
[216] C. M. Martinez, M. Heucke, F.-Y. Wang, B. Gao, and D.
Cao, “Driving
style recognition for intelligent vehicle control and advanced
driver
assistance: A survey,” IEEE Transactions on Intelligent
Transportation
Systems, vol. 19, no. 3, pp. 666–676, 2018.
[217] D. A. Johnson and M. M. Trivedi, “Driving style
recognition using
a smartphone as a sensor platform,” in 14th International
Conference
on Intelligent Transportation Systems (ITSC). IEEE, 2011, pp.
1609–
1615.
[218] M. Fazeen, B. Gozick, R. Dantu, M. Bhukhiya, and M. C.
González,
“Safe driving using mobile phones,” IEEE Transactions on
Intelligent
Transportation Systems, vol. 13, no. 3, pp. 1462–1468, 2012.
[219] N. Karginova, S. Byttner, and M. Svensson, “Data-driven
methods for
classification of driving styles in buses,” SAE Technical Paper,
Tech.
Rep., 2012.
[220] A. Doshi and M. M. Trivedi, “Examining the impact of
driving style
on the predictability and responsiveness of the driver: Real-
world and
simulator analysis,” in Intelligent Vehicles Symposium (IV),
2010 IEEE.
IEEE, 2010, pp. 232–237.
[221] V. Vaitkus, P. Lengvenis, and G. Žylius, “Driving style
classification
using long-term accelerometer information,” in 19th
International Con-
ference On Methods and Models in Automation and Robotics
(MMAR).
IEEE, 2014, pp. 641–644.
[222] F. Syed, S. Nallapa, A. Dobryden, C. Grand, R. McGee,
and D. Filev,
“Design and analysis of an adaptive real-time advisory system
for
improving real world fuel economy in a hybrid electric
vehicle,” SAE
Technical Paper, Tech. Rep., 2010.
[223] A. Corti, C. Ongini, M. Tanelli, and S. M. Savaresi,
“Quantitative driv-
ing style estimation for energy-oriented applications in road
vehicles,”
in IEEE International Conference on Systems, Man, and
Cybernetics
(SMC). IEEE, 2013, pp. 3710–3715.
[224] E. Ericsson, “Independent driving pattern factors and
their influence on
fuel-use and exhaust emission factors,” Transportation Research
Part
D: Transport and Environment, vol. 6, no. 5, pp. 325–345, 2001.
[225] V. Manzoni, A. Corti, P. De Luca, and S. M. Savaresi,
“Driving style
estimation via inertial measurements,” in 13th International
Conference
on Intelligent Transportation Systems (ITSC), 2010, pp. 777–
782.
[226] J. S. Neubauer and E. Wood, “Accounting for the
variation of driver
aggression in the simulation of conventional and advanced
vehicles,”
SAE Technical Paper, Tech. Rep., 2013.
[227] Y. L. Murphey, R. Milton, and L. Kiliaris, “Driver’s style
classification
using jerk analysis,” in IEEE Workshop on Computational
Intelligence
in Vehicles and Vehicular Systems (CIVVS). IEEE, 2009, pp.
23–28.
[228] E. Yurtsever, K. Takeda, and C. Miyajima, “Traffic
trajectory history
and drive path generation using gps data cloud,” in Intelligent
Vehicles
Symposium (IV), 2015 IEEE. IEEE, 2015, pp. 229–234.
[229] D. Dörr, D. Grabengiesser, and F. Gauterin, “Online
driving style
recognition using fuzzy logic,” in 17th International Conference
on
Intelligent Transportation Systems (ITSC). IEEE, 2014, pp.
1021–
1026.
[230] L. Xu, J. Hu, H. Jiang, and W. Meng, “Establishing style-
oriented driver
models by imitating human driving behaviors,” IEEE
Transactions on
Intelligent Transportation Systems, vol. 16, no. 5, pp. 2522–
2530, 2015.
[231] B. V. P. Rajan, A. McGordon, and P. A. Jennings, “An
investigation
on the effect of driver style and driving events on energy
demand of
a phev,” World Electric Vehicle Journal, vol. 5, no. 1, pp. 173–
181,
2012.
[232] A. Augustynowicz, “Preliminary classification of driving
style with
objective rank method,” International journal of automotive
technology,
vol. 10, no. 5, pp. 607–610, 2009.
[233] Z. Constantinescu, C. Marinoiu, and M. Vladoiu, “Driving
style analy-
sis using data mining techniques,” International Journal of
Computers
Communications & Control, vol. 5, no. 5, pp. 654–663, 2010.
[234] C. Miyajima, Y. Nishiwaki, K. Ozawa, T. Wakita, K. Itou,
K. Takeda,
and F. Itakura, “Driver modeling based on driving behavior and
its
evaluation in driver identification,” Proceedings of the IEEE,
vol. 95,
no. 2, pp. 427–437, 2007.
[235] A. Bolovinou, I. Bakas, A. Amditis, F. Mastrandrea, and
W. Vinciotti,
“Online prediction of an electric vehicle remaining range based
on
regression analysis,” in Electric Vehicle Conference (IEVC),
2014 IEEE
International. IEEE, 2014, pp. 1–8.
[236] A. Mudgal, S. Hallmark, A. Carriquiry, and K. Gkritza,
“Driving
behavior at a roundabout: A hierarchical bayesian regression
analysis,”
Transportation research part D: transport and environment, vol.
26,
pp. 20–26, 2014.
[237] J. C. McCall and M. M. Trivedi, “Driver behavior and
situation aware
brake assistance for intelligent vehicles,” Proceedings of the
IEEE,
vol. 95, no. 2, pp. 374–387, 2007.
[238] E. Yurtsever, C. Miyajima, S. Selpi, and K. Takeda, “Driving signature extraction,” in FAST-zero’15: 3rd International Symposium on Future Active Safety Technology Toward zero traffic accidents, 2015.
[239] E. Yurtsever, C. Miyajima, and K. Takeda, “A traffic flow
simulation
framework for learning driver heterogeneity from naturalistic
driving
data using autoencoders,” International Journal of Automotive
Engi-
neering, vol. 10, no. 1, pp. 86–93, 2019.
[240] K. Sama, Y. Morales, N. Akai, H. Liu, E. Takeuchi, and
K. Takeda,
“Driving feature extraction and behavior classification using an
au-
toencoder to reproduce the velocity styles of experts,” in 2018
21st
International Conference on Intelligent Transportation Systems
(ITSC).
IEEE, 2018, pp. 1337–1343.
[241] H. Liu, T. Taniguchi, Y. Tanaka, K. Takenaka, and T.
Bando, “Visu-
alization of driving behavior based on hidden feature extraction
by
using deep learning,” IEEE Transactions on Intelligent
Transportation
Systems, vol. 18, no. 9, pp. 2477–2489, 2017.
[242] Y. Zhang, W. C. Lin, and Y.-K. S. Chin, “A pattern-
recognition
approach for driving skill characterization,” IEEE transactions
on
intelligent transportation systems, vol. 11, no. 4, pp. 905–916,
2010.
[243] H. Darweesh, E. Takeuchi, K. Takeda, Y. Ninomiya, A.
Sujiwo, L. Y.
Morales, N. Akai, T. Tomizawa, and S. Kato, “Open source
integrated
planner for autonomous navigation in highly dynamic
environments,”
Journal of Robotics and Mechatronics, vol. 29, no. 4, pp. 668–
684,
2017.
[244] H. Bast, D. Delling, A. Goldberg, M. Müller-Hannemann,
T. Pajor,
P. Sanders, D. Wagner, and R. F. Werneck, “Route planning in
transportation networks,” in Algorithm engineering. Springer,
2016,
pp. 19–80.
[245] P. E. Hart, N. J. Nilsson, and B. Raphael, “A formal basis
for the
heuristic determination of minimum cost paths,” IEEE
transactions on
Systems Science and Cybernetics, vol. 4, no. 2, pp. 100–107,
1968.
[246] D. Van Vliet, “Improved shortest path algorithms for
transport net-
works,” Transportation Research, vol. 12, no. 1, pp. 7–20, 1978.
[247] R. Geisberger, P. Sanders, D. Schultes, and C. Vetter,
“Exact routing
in large road networks using contraction hierarchies,”
Transportation
Science, vol. 46, no. 3, pp. 388–404, 2012.
[248] E. Cohen, E. Halperin, H. Kaplan, and U. Zwick,
“Reachability
and distance queries via 2-hop labels,” SIAM Journal on
Computing,
vol. 32, no. 5, pp. 1338–1355, 2003.
[249] R. Bauer, D. Delling, P. Sanders, D. Schieferdecker, D.
Schultes, and
D. Wagner, “Combining hierarchical and goal-directed speed-up
tech-
niques for dijkstra’s algorithm,” Journal of Experimental
Algorithmics
(JEA), vol. 15, pp. 2–3, 2010.
[250] D. Delling, A. V. Goldberg, A. Nowatzyk, and R. F.
Werneck, “Phast:
Hardware-accelerated shortest path trees,” Journal of Parallel
and
Distributed Computing, vol. 73, no. 7, pp. 940–952, 2013.
[251] M. Pivtoraiko and A. Kelly, “Efficient constrained path
planning via
search in state lattices,” in International Symposium on
Artificial
Intelligence, Robotics, and Automation in Space, 2005, pp. 1–7.
[252] J. Barraquand and J.-C. Latombe, “Robot motion
planning: A
distributed representation approach,” The International Journal
of
Robotics Research, vol. 10, no. 6, pp. 628–649, 1991.
[253] S. M. LaValle and J. J. Kuffner Jr, “Randomized
kinodynamic plan-
ning,” The international journal of robotics research, vol. 20,
no. 5,
pp. 378–400, 2001.
[254] L. Kavraki, P. Svestka, and M. H. Overmars,
“Probabilistic roadmaps
for path planning in high-dimensional configuration spaces,”
vol. 1994,
1994.
[255] H. Fuji, J. Xiang, Y. Tazaki, B. Levedahl, and T. Suzuki,
“Trajectory
planning for automated parking using multi-resolution state
roadmap
considering non-holonomic constraints,” in Intelligent Vehicles
Sympo-
sium Proceedings, 2014 IEEE. IEEE, 2014, pp. 407–413.
[256] P. Petrov and F. Nashashibi, “Modeling and nonlinear
adaptive control
for autonomous vehicle overtaking.” IEEE Transactions on
Intelligent
Transportation Systems, vol. 15, no. 4, pp. 1643–1656, 2014.
[257] J. P. Rastelli, R. Lattarulo, and F. Nashashibi, “Dynamic
trajectory
generation using continuous-curvature algorithms for door to
door
assistance vehicles,” in Intelligent Vehicles Symposium
Proceedings,
2014 IEEE. IEEE, 2014, pp. 510–515.
[258] D. Dolgov, S. Thrun, M. Montemerlo, and J. Diebel, “Path
planning
for autonomous vehicles in unknown semi-structured
environments,”
The International Journal of Robotics Research, vol. 29, no. 5,
pp.
485–501, 2010.
[259] J. Ren, K. A. McIsaac, and R. V. Patel, “Modified
newton’s method
applied to potential field-based navigation for mobile robots,”
IEEE
Transactions on Robotics, vol. 22, no. 2, pp. 384–391, 2006.
[260] L. Caltagirone, M. Bellone, L. Svensson, and M. Wahde,
“Lidar-based
driving path generation using fully convolutional neural
networks,” in
2017 IEEE 20th International Conference on Intelligent
Transportation
Systems (ITSC). IEEE, 2017, pp. 1–6.
[261] D. Barnes, W. Maddern, and I. Posner, “Find your own
way: Weakly-
supervised segmentation of path proposals for urban autonomy,”
in
2017 IEEE International Conference on Robotics and
Automation
(ICRA). IEEE, 2017, pp. 203–210.
[262] M. Elbanhawi and M. Simic, “Sampling-based robot motion planning: A review,” IEEE Access, vol. 2, pp. 56–77, 2014.
[263] D. Isele, R. Rahimi, A. Cosgun, K. Subramanian, and K.
Fujimura,
“Navigating occluded intersections with autonomous vehicles
using
deep reinforcement learning,” in 2018 IEEE International
Conference
on Robotics and Automation (ICRA). IEEE, 2018, pp. 2034–
2039.
[264] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V.
Koltun, “Carla:
An open urban driving simulator,” arXiv preprint
arXiv:1711.03938,
2017.
[265] C. A. Pickering, K. J. Burnham, and M. J. Richardson, “A
review
of automotive human machine interface technologies and
techniques
to reduce driver distraction,” in 2nd Institution of Engineering
and
Technology international conference on system safety. IET,
2007, pp.
223–228.
[266] O. Carsten and M. H. Martens, “How can humans
understand their
automated cars? hmi principles, problems and solutions,”
Cognition,
Technology & Work, vol. 21, no. 1, pp. 3–20, 2019.
[267] P. Bazilinskyy and J. de Winter, “Auditory interfaces in
automated
driving: an international survey,” PeerJ Computer Science, vol.
1, p.
e13, 2015.
[268] M. Peden, R. Scurfield, D. Sleet, D. Mohan, A. A. Hyder,
E. Jarawan,
and C. D. Mathers, “World report on road traffic injury
prevention.”
World Health Organization Geneva, 2004.
[269] D. R. Large, L. Clark, A. Quandt, G. Burnett, and L.
Skrypchuk,
“Steering the conversation: a linguistic exploration of natural
language
interactions with a digital assistant during simulated driving,”
Applied
ergonomics, vol. 63, pp. 53–61, 2017.
[270] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M.
Enzweiler, R. Be-
nenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes
dataset
for semantic urban scene understanding,” in Proceedings of the
IEEE
conference on computer vision and pattern recognition, 2016,
pp.
3213–3223.
[271] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan,
and T. Darrell,
“Bdd100k: A diverse driving video database with scalable
annotation
tooling,” arXiv preprint arXiv:1805.04687, 2018.
[272] G. Neuhold, T. Ollmann, S. R. Bulò, and P. Kontschieder,
“The
mapillary vistas dataset for semantic understanding of street
scenes,”
in ICCV, 2017, pp. 5000–5009.
[273] A. Patil, S. Malla, H. Gang, and Y.-T. Chen, “The h3d
dataset for
full-surround 3d multi-object detection and tracking in crowded
urban
scenes,” arXiv preprint arXiv:1903.01568, 2019.
[274] X. Huang, X. Cheng, Q. Geng, B. Cao, D. Zhou, P. Wang,
Y. Lin,
and R. Yang, “The apolloscape dataset for autonomous driving,”
arXiv
preprint arXiv:1803.06184, 2018.
[275] Udacity. Udacity dataset. https://github.com/udacity/self-driving-car/tree/master/datasets. [Retrieved April 30, 2019].
[276] H. Schafer, E. Santana, A. Haden, and R. Biasini. (2018)
A commute
in data: The comma2k19 dataset.
[277] Y. Chen, J. Wang, J. Li, C. Lu, Z. Luo, H. Xue, and C.
Wang,
“Lidar-video driving dataset: Learning driving policies
effectively,” in
Proceedings of the IEEE Conference on Computer Vision and
Pattern
Recognition, 2018, pp. 5870–5878.
[278] K. Takeda, J. H. Hansen, P. Boyraz, L. Malta, C.
Miyajima, and
H. Abut, “International large-scale vehicle corpora for research
on
driver behavior on the road,” IEEE Transactions on Intelligent
Trans-
portation Systems, vol. 12, no. 4, pp. 1609–1623, 2011.
[279] A. Blatt, J. Pierowicz, M. Flanigan, P.-S. Lin, A.
Kourtellis, C. Lee,
P. Jovanis, J. Jenness, M. Wilaby, J. Campbell et al.,
“Naturalistic
driving study: Field data collection.” Transportation Research
Board,
National Academy of Sciences, 2015.
[280] S. G. Klauer, F. Guo, J. Sudweeks, and T. A. Dingus, “An
analysis
of driver inattention using a case-crossover approach on 100-car
data.”
National Highway Traffic Safety Administration (NHTSA),
2010.
[281] M. Benmimoun, A. Pütz, A. Zlocki, and L. Eckstein,
“euroFOT: Field
operational test and impact assessment of advanced driver
assistance
systems: Final results,” in Proceedings of the FISITA 2012
World
Automotive Congress. Springer, 2013, pp. 537–547.
[282] S. Wang, M. Bai, G. Mattyus, H. Chu, W. Luo, B. Yang,
J. Liang,
J. Cheverie, S. Fidler, and R. Urtasun, “Torontocity: Seeing the
world
with a million eyes,” in IEEE International Conference on
Computer
Vision (ICCV). IEEE, 2017, pp. 3028–3036.
[283] Y. Choi, N. Kim, S. Hwang, K. Park, J. S. Yoon, K. An,
and I. S.
Kweon, “KAIST multi-spectral day/night data set for
autonomous
and assisted driving,” IEEE Transactions on Intelligent
Transportation
Systems, vol. 19, no. 3, pp. 934–948, 2018.
[284] S. Sibi, H. Ayaz, D. P. Kuhns, D. M. Sirkin, and W. Ju,
“Monitoring
driver cognitive load using functional near infrared
spectroscopy in par-
tially autonomous cars,” in 2016 IEEE Intelligent Vehicles
Symposium
(IV). IEEE, 2016, pp. 419–425.
[285] C. Braunagel, E. Kasneci, W. Stolzmann, and W.
Rosenstiel, “Driver-
activity recognition in the context of conditionally autonomous
driv-
ing,” in 2015 IEEE 18th International Conference on Intelligent
Transportation Systems. IEEE, 2015, pp. 1652–1657.
[286] M. Walch, K. Lange, M. Baumann, and M. Weber,
“Autonomous
driving: investigating the feasibility of car-driver handover
assistance,”
in Proceedings of the 7th International Conference on
Automotive User
Interfaces and Interactive Vehicular Applications. ACM, 2015,
pp.
11–18.
[287] J. H. Hansen, C. Busso, Y. Zheng, and A. Sathyanarayana,
“Driver
modeling for detection and assessment of driver distraction:
Examples
from the utdrive test bed,” IEEE Signal Processing Magazine,
vol. 34,
no. 4, pp. 130–142, 2017.
[288] M. Everingham, A. Zisserman, C. K. Williams, L. Van
Gool, M. Allan,
C. M. Bishop, O. Chapelle, N. Dalal, T. Deselaers, G. Dorkó et
al.,
“The 2005 pascal visual object classes challenge,” in Machine
Learning
Challenges Workshop. Springer, 2005, pp. 117–176.
[289] E. Santana and G. Hotz, “Learning a driving simulator,”
arXiv preprint
arXiv:1608.01230, 2016.
[290] H. Fan, F. Zhu, C. Liu, L. Zhang, L. Zhuang, D. Li, W.
Zhu, J. Hu,
H. Li, and Q. Kong, “Baidu apollo em motion planner,” arXiv
preprint
arXiv:1807.08048, 2018.
[291] NVIDIA. DriveWorks SDK. https://developer.nvidia.com/driveworks. [Retrieved December 9, 2018].
[292] CommaAI. OpenPilot. https://github.com/commaai/openpilot. [Retrieved December 9, 2018].
[293] B. Wymann, C. Dimitrakakis, A. Sumner, E. Espié, and C. Guionneau. (2015) TORCS: the open racing car simulator. http://www.cse.chalmers.se/~chrdimi/papers/torcs.pdf. [Retrieved May 2, 2019].
[294] S. R. Richter, Z. Hayder, and V. Koltun, “Playing for
benchmarks,” in
International Conference on Computer Vision (ICCV), vol. 2,
2017.
[295] N. P. Koenig and A. Howard, “Design and use paradigms
for gazebo,
an open-source multi-robot simulator,” in IEEE/RSJ
International Con-
ference on Intelligent Robots and Systems (IROS), vol. 4. IEEE,
2004,
pp. 2149–2154.
[296] D. Krajzewicz, G. Hertkorn, C. Rössel, and P. Wagner, “SUMO (simulation of urban mobility) - an open-source traffic simulation,” in Proceedings of the 4th Middle East Symposium on Simulation and Modelling (MESM2002), 2002, pp. 183–187.
[297] J. E. Stellet, M. R. Zofka, J. Schumacher, T. Schamm, F.
Niewels,
and J. M. Zöllner, “Testing of advanced driver assistance
towards
automated driving: A survey and taxonomy on existing
approaches
and open questions,” in 18th International Conference on
Intelligent
Transportation Systems (ITSC). IEEE, 2015, pp. 1455–1462.
Ekim Yurtsever (Member, IEEE) received his B.S.
and M.S. degrees from Istanbul Technical University
in 2012 and 2014 respectively. He is currently a
Ph.D. candidate in Information Science at Nagoya
University, Japan.
His research interests include automated driving
systems and machine learning.
Jacob Lambert (Student Member, IEEE) received his B.S. in Honours Physics in 2014 from McGill University in Montreal, Canada. He received his M.A.Sc. in 2017 from the University of Toronto, Canada, and is currently a Ph.D. candidate at Nagoya University, Japan.
His current research focuses on 3D perception
through lidar sensors for autonomous robotics.
Alexander Carballo (Member, IEEE) received his Dr.Eng. degree from the Intelligent Robot Laboratory, University of Tsukuba, Japan. From 1996 to 2006, he worked as a lecturer at the School of Computer Engineering, Costa Rica Institute of Technology. From 2011 to 2017, he worked in research and development at Hokuyo Automatic Co., Ltd. Since 2017, he has been a Designated Assistant Professor at the Institutes of Innovation for Future Society, Nagoya University, Japan.
His main research interests are lidar sensors, robotic perception and autonomous driving.
Kazuya Takeda (Senior Member, IEEE) received his B.E.E., M.E.E., and Ph.D. from Nagoya University, Japan. From 1985, he worked at Advanced Telecommunication Research Laboratories and at KDD R&D Laboratories, Japan. In 1995, he started a research group for signal processing applications at Nagoya University.
He is currently a Professor at the Institutes of Innovation for Future Society, Nagoya University, and with Tier IV Inc. He also serves as a member of the Board of Governors of the IEEE ITS Society.
His main focus is investigating driving behavior using data-centric approaches, utilizing signal corpora of real driving behavior.
Soft-furnishing-By-Architect-A.F.M.Mohiuddin-Akhand.doc
UNIT III MENTAL HEALTH NURSING ASSESSMENT
Cell Types and Its function , kingdom of life
Orientation - ARALprogram of Deped to the Parents.pptx
Empowerment Technology for Senior High School Guide
ChatGPT for Dummies - Pam Baker Ccesa007.pdf
advance database management system book.pdf
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
Introduction to Building Materials
RMMM.pdf make it easy to upload and study
Chinmaya Tiranga quiz Grand Finale.pdf
Radiologic_Anatomy_of_the_Brachial_plexus [final].pptx
A systematic review of self-coping strategies used by university students to ...

A Survey of Autonomous Driving CommonPractices and Emerging.docx

recently, end-to-end driving systems started to emerge as an alternative to modular approaches. Deep learning models have become dominant in many of these tasks [11]. A high-level classification of ADS architectures is given in Figure 1.

The accumulated knowledge in vehicle dynamics, breakthroughs in computer vision caused by the advent of deep learning [12], and the availability of new sensor modalities, such as lidar [13], catalyzed ADS research and industrial implementation. Furthermore, an increase in public interest and market potential precipitated the emergence of ADSs with varying degrees of automation. However, robust automated driving in urban environments has not been achieved yet [14]. Accidents caused by immature systems [15]–[18] undermine trust, and furthermore, they cost lives. As such, a thorough investigation of unsolved challenges and the state-of-the-art is deemed necessary here.

The Society of Automotive Engineers (SAE) refers to hardware-software systems that can execute dynamic driving tasks (DDT) on a sustainable basis as ADSs [19]. There are also vernacular alternative terms such as "autonomous driving" and "self-driving car" in use. Nonetheless, despite being commonly used, SAE advises not to use them, as these terms are unclear and misleading. In this paper we follow SAE's convention.

The present paper attempts to provide a structured and comprehensive overview of the hardware-software architectures of state-of-the-art ADSs. Moreover, emerging trends such as end-to-end driving and connected systems are discussed in detail.
There are overview papers on the subject, which covered several core functions [20], [21], and which concentrated on the motion planning aspect [22], [23]. However, a survey that covers present challenges, available and emerging high-level system architectures, and individual core functions such as localization, mapping, perception, planning, vehicle control, and human-machine interface altogether does not exist. The aim of this paper is to fill this gap in the literature with a thorough survey. In addition, a detailed summary of available datasets, software stacks, and simulation tools is presented here.

TABLE I: Comparison of survey papers (survey coverage)
Related work | Connected systems | End-to-end | Localization | Perception | Assessment | Planning | Control | HMI | Datasets & software | Implementation
[20]  | - | - | X | X | - | X | X | - | - | X
[21]  | - | - | X | X | - | X | - | - | - | -
[22]  | - | - | - | - | - | X | X | - | - | -
[23]  | - | - | - | - | - | X | - | - | - | -
[24]  | X | X | X | X | - | - | - | - | - | -
[25]  | X | - | X | - | - | - | - | - | X | -
[26]  | X | - | - | - | - | - | - | - | - | -
[27]  | - | - | X | X | - | X | X | - | - | X
[28]  | - | X | - | - | X | X | X | - | - | -
Ours  | X | X | X | X | X | X | - | X | X | X

Another contribution of this paper is the detailed comparison and analysis of alternative approaches through implementation. We implemented the state-of-the-art in our platform using open-source software.
Comparison of existing overview papers and our work is shown in Table I.

The remainder of this paper is written in eight sections. Section II is an overview of present challenges. Details of automated driving system components and architectures are given in Section III. Section IV presents a summary of state-of-the-art localization techniques, followed by Section V, an in-depth review of perception models. Assessment of the driving situation and planning are discussed in Sections VI and VII respectively. In Section VIII, current trends and shortcomings of human machine interfaces are introduced. Datasets and available tools for developing automated driving systems are given in Section IX.

II. PROSPECTS AND CHALLENGES

A. Social impact

Widespread usage of ADSs is not imminent; yet, it is still possible to foresee their potential impact and benefits to a certain degree:
1) Problems that can be solved: preventing traffic accidents, mitigating traffic congestion, reducing emissions
2) Arising opportunities: reallocation of driving time, transporting the mobility impaired
3) New trends: consuming Mobility as a Service (MaaS), logistics revolution

Widespread deployment of ADSs can reduce the societal loss caused by erroneous human behavior such as distraction, driving under the influence and speeding [3]. Globally, the elder group (over 60 years old) is growing
  • 7. faster than the younger groups [29]. Increasing the mobility of elderly with ADSs can have a huge impact on the quality of life and productivity of a large portion of the population. A shift from personal vehicle-ownership towards consuming Mobility as a Service (MaaS) is an emerging trend. Currently, ride-sharing has lower costs compared to vehicle-ownership under 1000 km annual mileage [30]. The ratio of owned to shared vehicles is expected to be 50:50 by 2030 [31]. Large scale deployment of ADSs can accelerate this trend. B. Challenges ADSs are complicated robotic systems that operate in inde- terministic environments. As such, there are myriad scenarios with unsolved issues. This section discusses the high level challenges of driving automation in general. More minute, task-specific details are discussed in corresponding sections. The Society of Automotive Engineers (SAE) defined five levels of driving automation in [19]. In this taxonomy, level zero stands for no automation at all. Primitive driver assistance systems such as adaptive cruise control, anti-lock braking systems and stability control start with level one [32]. Level two is partial automation to which advanced assistance systems such as emergency braking or collision avoidance [33], [34] are integrated. With the accumulated knowledge in the vehicle control field and the experience of the industry, level two automation became a feasible technology. The real challenge starts above this level. Level three is conditional automation; the driver could focus on tasks other than driving during normal operation, however, s/he has to quickly respond to an emergency alert from the vehicle and be ready to take over. In addition, level three ADS
operate only in limited operational design domains (ODDs) such as highways. Audi claims to have produced the first production car capable of level three automation in limited highway conditions [35]. However, taking over control manually from the automated mode raises another issue. Recent studies [36], [37] investigated this problem and found that the takeover situation increases the collision risk with surrounding vehicles. The increased likelihood of an accident during a takeover is a problem that is yet to be solved.

Human attention is not needed in any degree at levels four and five. However, level four can only operate in limited ODDs where special infrastructure or detailed maps exist. In the case of departure from these areas, the vehicle must stop the trip by automatically parking itself. The fully automated system, level five, can operate in any road network and any weather condition. No production vehicle is capable of level four or level five driving automation yet. Moreover, the Toyota Research Institute stated that no one in the industry is even close to attaining level five automation [38].

Level four and above driving automation in urban road networks is an open and challenging problem. The environmental variables, from weather conditions to surrounding human behavior, are highly indeterministic and difficult to predict. Furthermore, system failures lead to accidents: in the Hyundai competition, one of the ADSs crashed because of rain [15]; Google's ADS hit a bus while changing lanes because it failed to estimate the speed of the bus [16]; and Tesla's Autopilot failed to recognize a white truck and collided with it, killing the driver [17]. Fatalities [17], [18] caused by immature ADSs undermine trust. According to a recent survey [30], the majority of consumers question the safety of the technology, and they want a significant amount of control over the development and use of ADSs. On the other hand, extremely cautious ADSs are also making a negative impression [39].

Fig. 1: A high level classification of automated driving system architectures

Fig. 2: A generic end-to-end system information flow diagram

Ethical dilemmas pose another set of challenges. In an inevitable accident situation, how should the system behave [40]? Experimental ethics were proposed regarding this issue [41]. Risk and reliability certification is another task yet to be solved. Like aircraft, ADSs need to be designed with high redundancies that minimize the chance of a catastrophic failure. Even though there are promising projects in this regard, such as DeepTest [42], the design-simulation-test-redesign-certification procedure has still not been established by the industry or the rule-makers.

Finally, various optimization goals such as time to reach the destination, fuel efficiency, comfort, and ride-sharing optimization increase the complexity of an already difficult-to-solve problem. As such, carrying out all of the dynamic driving tasks safely under strict conditions outside a well defined, geofenced area has not been achieved yet and remains an open problem.

III. SYSTEM COMPONENTS AND ARCHITECTURE

A. System architecture

Classification of system architectures is shown in Figure
1. ADSs are designed either as standalone, ego-only systems [20], [43] or connected multi-agent systems [44]–[46]. Furthermore, these design philosophies are realized with two alternative approaches: modular [20], [43], [47]–[54] or end-to-end designs [55]–[63].

TABLE II: End-to-end driving architectures
Related works | Learning/training strategy | Pros/cons
[55]–[59] | Direct supervised deep learning | Imitates the target data, usually a human driver. Can be trained offline. Poor generalization performance.
[60], [61] | Deep reinforcement learning | Learns the optimum way of driving. Requires online interaction. Urban driving has not been achieved yet.
[62], [63] | Neuroevolution | No backpropagation. Requires online interaction. Real-world driving has not been achieved yet.

1) Ego-only systems: The ego-only approach is to carry out all of the necessary automated driving operations on a single self-sufficient vehicle at all times, whereas a connected ADS
  • 11. may or may not depend on other vehicles and infrastructure elements given the situation. Ego-only is the most common approach amongst the state-of-the-art ADSs [20], [43], [47]– [52], [52]–[54]. We believe this is due to the practicality of having a self-sufficient platform for development and the additional challenges of connected systems. 2) Modular systems: Modular systems, referred as the mediated approach in some works [55], are structured as a pipeline of separate components linking sensory inputs to actuator outputs [11]. Core functions of a modular ADS can be summarized as: localization and mapping, perception, assessment, planning and decision making, vehicle control, and human-machine interface. Typical pipelines [20], [43], [47]–[52], [52]–[54] start with feeding raw sensor inputs to localization and object detection modules, followed by scene prediction and decision making. Finally, motor commands are generated at the end of the stream by the control module [11], [64]. Developing individual modules separately divides the chal- lenging task of automated driving into an easier-to-solve set of problems [65]. These sub-tasks have their corresponding literature in robotics [66], computer vision [67] and vehicle dynamics [32], which makes the accumulated know-how and expertise directly transferable. This is a major advantage of modular systems. In addition, functions and algorithms can be integrated or build upon each other in a modular design. E.g, a safety constraint [68] can be implemented on top of a sophisticated planning module to force some hard-coded emergency rules without modifying the inner workings of the planner. This enables designing redundant but reliable architectures. The major disadvantages of modular systems are being prone to error propagation [11] and over-complexity. In the
unfortunate Tesla accident, an error in the perception module, the misclassification of a white trailer as sky, propagated down the pipeline until failure, causing the first ADS-related fatality [42].

3) End-to-end systems: End-to-end driving systems, referred to as direct perception in some studies [55], generate ego-motion directly from sensory inputs. Ego-motion can be either the continuous operation of the steering wheel and pedals or a discrete set of actions, e.g., accelerating and turning left. There are three main approaches to end-to-end driving: direct supervised deep learning [55]–[59], neuroevolution [62], [63] and the more recent deep reinforcement learning [60], [61]. The flow diagram of a generic end-to-end driving system is shown in Figure 2 and a comparison of the approaches is given in Table II.

The earliest end-to-end driving system dates back to ALVINN [56], where a 3-layer fully connected network was trained to output the direction that the vehicle should follow. An end-to-end system for off-road driving was introduced in [57]. With the advances in artificial neural network research, deep convolutional and temporal networks became feasible for automated driving tasks. A deep convolutional neural network that takes an image as input and outputs steering was proposed in [58]. A spatiotemporal network, an FCN-LSTM architecture, was developed for predicting ego-vehicle motion in [59]. DeepDriving is another convolutional model that tries to learn a set of discrete perception indicators from the image input [55]. This approach is not entirely end-to-end, though: the proper driving actions based on the perception indicators have to be generated by another module. All of the mentioned methods follow direct supervised training strategies.
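To make the direct supervised strategy concrete, the sketch below shows a minimal image-to-steering regression setup in the spirit of [56], [58]. It is an illustrative outline only, not the architecture of any cited work; the network shape, loss and training loop are assumptions, and the data loader is left abstract.

# Minimal sketch of direct supervised end-to-end driving: a CNN maps a
# front-camera image to a steering command and is trained to imitate
# logged human steering angles.
import torch
import torch.nn as nn

class SteeringNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(            # convolutional feature extractor
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(                 # regression head -> one steering value
            nn.Flatten(), nn.Linear(64, 50), nn.ReLU(), nn.Linear(50, 1)
        )

    def forward(self, image):
        return self.head(self.features(image))

def train_epoch(model, loader, optimizer):
    # loader is assumed to yield (image batch, steering batch of shape (B, 1))
    loss_fn = nn.MSELoss()                         # imitate the human steering signal
    for images, steering in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), steering)
        loss.backward()
        optimizer.step()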
As such, ground truth is required for training. Usually, the ground truth is the ego-action sequence of an expert human driver, and the network learns to imitate the driver. This raises an important design question: should the ADS drive like a human?

A novel deep reinforcement learning model, Deep Q Networks (DQN), combined reinforcement learning with deep learning [69]. In summary, the goal of the network is to select a set of actions that maximize cumulative future rewards. A deep convolutional neural network was used to approximate the optimal action reward function. Actions are generated first with random initialization. Then, the network adjusts its parameters with experience instead of direct supervised learning. An automated driving framework using DQN was introduced in [60], where the network was tested in a simulation environment. The first real-world run with DQN was achieved on a countryside road without traffic [61]. DQN-based systems do not imitate the human driver; instead, they learn the "optimum" way of driving.

Neuroevolution refers to using evolutionary algorithms to train artificial neural networks [70]. End-to-end driving with neuroevolution is not as popular as DQN and direct supervised learning. To the best of our knowledge, real-world end-to-end driving with neuroevolution has not been achieved yet. However, some promising simulation results were obtained [62], [63]. ALVINN was trained with neuroevolution and outperformed the direct supervised learning version [62]. An RNN was trained with neuroevolution in [63] using a driving simulator. The biggest advantage of neuroevolution is the removal of backpropagation and, hence, of the need for direct supervision.

End-to-end driving is promising; however, it has not been implemented in real-world urban scenes yet, except for limited demonstrations. The biggest shortcomings of end-to-end systems in general are the lack of hard-coded safety measures and
  • 14. interpretability [65]. In addition, DQN and neuroevolution has one major disadvantage over direct supervised learning: these networks must interact with the environment online and fail a lot to learn the desired behavior. On the contrary, direct supervised networks can be trained offline with labeled data. 4) Connected systems: There is no operational connected ADS in use yet, however, some researchers [44]–[46] be- lieve this emerging technology will be the future of driving automation. With the use of Vehicular Ad hoc NETworks (VANETs), the basic operations of automated driving can be distributed amongst agents. V2X is a term that stands for “vehicle to everything.” From mobile devices of pedestrians to stationary sensors on a traffic light, an immense amount of data can be accessed by the vehicle with V2X [26]. By sharing detailed information of the traffic network amongst peers [71], shortcomings of the ego-only platforms such as sensing range, blind spots, and computational limits may be eliminated. More V2X applications that will increase safety and traffic efficiency are expected to emerge in the foreseeable future [72]. VANETs can be realized in two different ways: conventional IP based networking and Information-Centric Networking (ICN) [44]. For vehicular applications, lots of data have to be distributed amongst agents with intermittent and in less than ideal connections while maintaining high mobility [46]. Con- ventional IP-host based Internet protocol cannot function prop- erly under these conditions. On the other hand, in information- centric networking, vehicles stream query messages to an area instead of a direct address and they accept corresponding responses from any sender [45]. Since vehicles are highly mobile and dispersed on the road network, the identity of the information source becomes less relevant. In addition, local data often carries more crucial information for immediate driving tasks such as avoiding a rapidly approaching vehicle
  • 15. on a blind spot. Early works, such as the CarSpeak system [73], proved that vehicles can utilize each other’s sensors and use the shared information to execute some dynamic driving tasks. However, without reducing huge amounts of continuous driving data, sharing information between hundreds of thousand vehicles in a city could not become feasible. A semiotic framework that integrates different sources of information and converts raw sensor data into meaningful descriptions was introduced in [74] for this purpose. In [75], the term Vehicular Cloud Computing (VCC) was coined and the main advantages of it over conventional Internet cloud applications was introduced. Sensors are the primary cause of the difference. In VCC, sensor information is kept on the vehicle and only shared if there is a local query from another vehicle. This potentially saves the cost of uploading/downloading a constant stream of sensor data to the web. Besides, the high relevance of local data increases the feasibility of VCC. Regular cloud computing was compared to vehicular cloud computing and it was reported that VCC is technologically feasible [76]. The term ”Internet of Vehicles” (IoV) was proposed for describing a connected ADS [44] and the term ”vehicular fog” was introduced in [45]. Establishing an efficient VANET with thousands of vehicles in a city is a huge challenge. For an ICN based VANET, some of the challenging topics are security, mobility, routing, naming, caching, reliability and multi-access computing [77]. Fig. 3: Ricoh Tetha V panoramic images collected using our data collection platform, in Nagoya University campus. Note some distortion still remains on the periphery of the image.
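The information-centric pattern described above, where a vehicle queries a road region rather than a specific host and accepts replies from any nearby peer, can be sketched schematically as follows. This is an illustrative abstraction, not an implementation of any VANET or ICN protocol from the cited works; the message fields and the in-memory pool are assumptions.

# Schematic sketch of an ICN-style, region-scoped query: a vehicle asks
# for observations about an area of the road network and accepts any
# matching response regardless of which peer sent it.
from dataclasses import dataclass

@dataclass
class RegionQuery:
    center_xy: tuple      # the query is addressed to a location, not a host
    radius_m: float
    data_name: str        # e.g. "obstacles" or "signal_phase"

@dataclass
class Observation:
    source_id: str        # sender identity is largely irrelevant to the consumer
    position_xy: tuple
    data_name: str
    payload: dict

def matches(query: RegionQuery, obs: Observation) -> bool:
    dx = obs.position_xy[0] - query.center_xy[0]
    dy = obs.position_xy[1] - query.center_xy[1]
    return (obs.data_name == query.data_name
            and dx * dx + dy * dy <= query.radius_m ** 2)

def resolve(query: RegionQuery, broadcast_pool: list) -> list:
    """Collect every observation in the pool that satisfies the query."""
    return [obs for obs in broadcast_pool if matches(query, obs)]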
  • 16. In summary, even though the potential benefits of a connected system is huge, the additional challenges increase the com- plexity of the problem to a significant degree. As such, there is no operational connected system yet. B. Sensors and hardware State-of-the-art ADSs employ a wide selection of onboard sensors. High sensor redundancy is needed in most of the tasks for robustness and reliability. Hardware units can be categorized into five: exteroceptive sensors for perception, proprioceptive sensors for internal vehicle state monitoring tasks, communication arrays, actuators, and computational units. Exteroceptive sensors are mainly used for perceiving the environment, which includes dynamic and static objects, e.g., drivable areas, buildings, pedestrian crossings. Camera, lidar, radar and ultrasonic sensors are the most commonly used modalities for this task. A detailed comparison of exteroceptive sensors is given in Table III. 1) Monocular Cameras: Cameras can sense color and is passive, i.e. does not emit any signal for measurements. Sens- ing color is extremely important for tasks such as traffic light recognition. Furthermore, 2D computer vision is an established field with remarkable state-of-the-art algorithms. Moreover, a passive sensor does not interfere with other systems since it does not emit any signals. However, cameras have certain shortcomings. Illumination conditions affect their performance drastically, and depth information is difficult to obtain from a single camera. There are promising studies [83] to improve monocular camera based depth perception, but modalities that are not negatively affected by illumination and weather conditions are still necessary besides cameras for dynamic driving tasks. Other camera types gaining interest for ADS
  • 17. include flash cameras [78], thermal cameras [80], [81], and event cameras [79]. 2) Omnidirection Camera: For 360◦ 2D vision, omnidirec- tional cameras are used as an alternative to camera arrays. It have seen increasing use, with increasingly compact and high performance hardware being constantly released. Panoramic view is particularly desirable for applications such as naviga- tion, localization and mapping [84]. Fig. 4: DAVIS240 events, overlayed on the image (left) and corresponding RBG image from a different camera (right), collected by our data collection platform, at a road crossing near Nagoya University. The motion of the cyclist and vehicle causes brightness changes which trigger events. Fig. 5: The ADS equipped Prius of Nagoya University. We have used this vehicle to perform core automated driving functions. 3) Event Cameras: Event cameras are among the newer sensing modalities that have seen use in ADS [85]. Event cameras record data asynchronously for individual pixels with respect to visual stimulus. The output is therefore an irregular sequence of data points, or events triggered by changes in brightness. The response time is in the order of microseconds [86]. The main limitation of current event cameras is pixel size and image resolution. For example, the DAVIS40 image shown in Figure 4 has a pixel size of 18.5 × 18.5 µm and a resolution of 240×180. Recently, a driving dataset with event camera data has been published [85]. 4) Radar: Radar, lidar and ultrasonic sensors are very useful in covering the shortcomings of cameras. Depth infor- mation, i.e. distance to objects, can be measured effectively to retrieve 3D information with these sensors, and they are
not affected by illumination conditions. However, they are active sensors. Radars emit radio waves that bounce back from objects and measure the time it takes each reflection to return. Emissions from active sensors can interfere with other systems. Radar is a well-established technology that is both lightweight and cost-effective; for example, radars can fit inside side mirrors. Radars are cheaper and can detect objects at longer distances than lidars. However, lidars are more accurate.

TABLE III: Exteroceptive sensors
Modality | Affected by illumination | Affected by weather | Color | Depth | Range | Accuracy | Size | Cost
Lidar | - | X | - | X | medium (< 200 m) | high | large* | high*
Radar | - | - | - | X | high | medium | small | medium
Ultrasonic | - | - | - | X | short | low | small | low
Camera | X | X | X | - | - | - | smallest | lowest
Stereo Camera | X | X | X | X | medium (< 100 m) | low | medium | low
Flash Camera [78] | X | X | X | X | medium (< 100 m) | low | medium | low
Event Camera [79] | limited | X | - | - | - | - | smallest | low
Thermal Camera [80], [81] | - | X | - | - | - | - | smallest | low
* Cost, size and weight of lidars have started to decrease recently [82]

TABLE IV: Onboard sensor setup of ADS equipped vehicles
Platform | # 360° rotating lidars | # stationary lidars | # Radars | # Cameras
Ours | 1 | - | - | 4
Boss [43] | 1 | 9 | 5 | 2
Junior [20] | 1 | 2 | 6 | 4
BRAiVE [48] | - | 5 | 1 | 10
RobotCar [49] | - | 3 | - | 4
Google car (Prius) [51] | 1 | - | 4 | 1
Uber car (XC90) [52] | 1 | - | 10 | 7
Uber car (Fusion) [52] | 1 | 7 | 7 | 20
Bertha [53] | - | - | 6 | 3
Apollo Auto [54] | 1 | 3 | 2 | 2

5) Lidar: Lidar operates on a principle similar to that of radar, but it emits infrared light waves instead of radio waves. It has much higher accuracy than radar under 200 meters. Weather conditions such as fog or snow have a negative impact on the performance of lidar. Another aspect is the sensor size: smaller sensors are preferred on the vehicle because of limited space and aerodynamic constraints, and lidar is larger than radar.

In [87], human sensing performance is compared to ADSs. One of the key findings of this study is that even though human drivers are still better at reasoning in general, the perception capability of ADSs with sensor fusion can exceed that of humans, especially in degraded conditions such as insufficient illumination.
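As a brief illustration of the time-of-flight principle shared by radar and lidar above, the range to a target follows directly from the round-trip time of the emitted wave. The numbers below are made-up examples, not measurements from any cited platform.

# Range from round-trip time of flight: the wave travels to the target
# and back, so the one-way distance is c * t / 2.
C = 299_792_458.0  # speed of light in m/s (radio and infrared alike)

def range_from_tof(round_trip_s: float) -> float:
    return C * round_trip_s / 2.0

# Example: a return arriving 1.0 microsecond after emission corresponds
# to a target roughly 150 m away.
print(range_from_tof(1.0e-6))   # ~149.9 m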
  • 20. 6) Proprioceptive sensors: Proprioceptive sensing is an- other crucial category. Vehicle states such as speed, accel- eration and yaw must be continuously measured in order to operate the platform safely with feedback. Almost all of the modern production cars are equipped with adequate proprioceptive sensors. Wheel encoders are mainly used for odometry, Inertial Measurement Units (IMU) are employed for monitoring the velocity and position changes, tachometers are utilized for measuring speed and altimeters for altitude. These signals can be accessed through the CAN protocol of modern cars. Besides sensors, an ADS needs actuators to manipulate the vehicle and advanced computational units for processing and storing sensor data. 7) Full size cars: There are numerous instrumented vehi- cles introduced by different research groups such as Stanford’s Junior [20], which employs an array of sensors with different modalities for perceiving external and internal variables. Boss won the [43] DARPA Urban Challenge with an abundance of sensors. RobotCar [49] is a cheaper research platform aimed for data collection. In addition, different levels of driving automation have been introduced by the industry; Tesla’s Autopilot [88] and Google’s self driving car [89] are some examples. Bertha [53] is developed by Daimler and has 4 120◦ short-range radars, two long-range range radar on the sides, stereo camera, wide angle-monocular color camera on the dashboard, another wide-angle camera for the back. Our vehicle is shown in Figure 5. A detailed comparison of sensor setups of 10 different full-size ADSs is given in Table IV. 8) Large vehicles and trailers: Earliest intelligent trucks were developed for the PATH program in California [98],
which utilized magnetic markers on the road. Fuel economy is an essential topic in freight transportation, and methods such as platooning have been developed for this purpose. Platooning is a well-studied phenomenon; it reduces drag and therefore fuel consumption [99]. In semi-autonomous truck platooning, the lead truck is driven by a human driver and several automated trucks follow it, forming a semi-autonomous road-train as defined in [100]. The Sartre European Union project [101] introduced such a system that satisfies three core conditions: using the already existing public road network, sharing the traffic with non-automated vehicles, and not modifying the road infrastructure. A platoon consisting of three automated trucks was formed in [99] and significant fuel savings were reported.

The tractor-trailer setup poses an additional challenge for automated freight transport. Conventional control methods such as feedback linearization [102] and fuzzy control [103] were used for path tracking without considering the jackknifing constraint. The possibility of jackknifing, the collision of the truck and the trailer with each other, increases the difficulty of the task [104]. A control safety governor design was proposed in [104] to prevent jackknifing while reversing.

IV. LOCALIZATION AND MAPPING

Localization is the task of finding the ego-position relative to a reference frame in an environment [106], and it is fundamental to any mobile robot. It is especially crucial for ADSs [25]; the vehicle must use the correct lane and position itself in it accurately. Furthermore, localization is an elemental requirement for global navigation.

TABLE V: Localization techniques
Methods | Robustness | Cost | Accuracy | Size | Computational requirements | Related works
Absolute positioning sensors | low | low | low | small | lowest | [90]
Odometry/dead reckoning | low | low | low | smallest | low | [91]
GPS-IMU fusion | medium | medium | low | small | low | [92]
SLAM | medium-high | medium | high | large | very high | [93]
A priori map-based: landmark search | high | medium | high | large | medium | [94], [95]
A priori map-based: point cloud matching | highest | highest | highest | largest | high | [96], [97]

Fig. 6: We used NDT matching [97], [105] to localize our vehicle in the Nagoya University campus. White points belong to the offline map and the colored ones were obtained from online scans. The objective is to find the best match between colored points and white points, thus localizing the vehicle.

The remainder of this section details the three most common approaches that use solely on-board sensors: Global Positioning System and Inertial Measurement Unit (GPS-IMU) fusion, Simultaneous Localization And Mapping (SLAM), and state-of-the-art a priori map-based localization. Readers are referred to [106] for a broader localization overview. A comparison of localization methods is given in Table V.

A. GPS-IMU fusion

The main principle of GPS-IMU fusion is correcting the accumulated errors of dead reckoning at intervals with absolute position readings [107]. In a GPS-IMU system, changes in position and orientation are measured by the IMU, and this information is processed to localize the robot with dead reckoning. There is a significant drawback of the IMU, and of dead reckoning in general: errors accumulate with time
  • 23. and often leads to failure in long-term operations [108]. With the integration of GPS readings, the accumulated errors of the IMU can be corrected in intervals. GPS-IMU systems by themselves cannot be used for vehicle localization as they do not meet the performance criteria [109]. In the 2004 DARPA Grand Challenge, the red team from Carnegie Mellon University [92] failed the race because of a GPS error. The accuracy required for automated driving in urban scenes cannot be realized with the current GPS- IMU technology. Moreover, in dense urban environments, the accuracy drops further, and the GPS stops functioning from time to time because of tunnels [107] and high buildings. Fig. 7: Creating a 3D pointcloud map with congregation of scans. We used Autoware [110] for mapping. Even though GPS-IMU systems by themselves do not meet the performance requirements and cannot be utilized except high-level route planning, they are used for initial pose estimation in tandem with lidar and other sensors in state-of- the-art localization systems [109]. B. Simultaneous localization and mapping Simultaneous localization and mapping (SLAM) is the act of online map making and localizing the robot in it at the same time. A priori information about the environment is not required in SLAM. It is a common practice in robotics, espe- cially in indoor environments. However, due to the high com- putational requirements and environmental challenges, running SLAM algorithms outdoors is less efficient than localization with a pre-built map [111]. Team MIT used a SLAM approach in DARPA urban challenge [112] and finished it in the 4th place. Whereas,
  • 24. the winner, Carnegie Mellon‘s Boss [43] and the runner-up, Stanford‘s Junior [20], both utilized a priori information. In spite of not having the same level of accuracy and efficiency, SLAM techniques have one major advantage over a priori methods: they can work anywhere. SLAM based methods have the potential to replace a priori techniques if their performances can be increased further [24]. We refer the readers to [25] for a detailed SLAM survey in the intelligent vehicle domain. C. A priori map-based localization The core idea of a priori map-based localization techniques is matching: localization is achieved through the comparison of online readings to the information on the detailed a priori map and finding the location of the best possible match [109]. Often an initial pose estimation, for example with a GPS, is used Fig. 8: Annotating a 3D point cloud map with topological information. A large number of annotators were employed to build the map shown on the right-hand side. The point-cloud and annotated maps are available on [113]. at the beginning of the matching process. There are various approaches to map building and preferred modalities. Changes in the environment affect the performance of map- based methods negatively. This effect is prevalent especially in rural areas where past information of the map can deviate from the actual environment because of changes in roadside vegetation and constructions [114]. Moreover, this method requires an additional step of map making.
  • 25. There are two different map-based approaches; landmark search and matching. 1) Landmark search: Landmark search is computationally less expensive in comparison to point cloud matching. It is a robust localization technique as long as a sufficient amount of landmarks exists. In an urban environment, poles, curbs, signs and road markers can be used as landmarks. A road marking detection method using lidar and Monte Carlo Localization (MCL) was used in [94]. In this method, road markers and curbs were matched to an offline 3D map to find the location of the vehicle. A vision based road marking detection method was introduced in [115]. Road markings detected by a single front camera were compared and matched to a low-volume digital marker map with global coordinates. Then, a particle filter was employed to update the position and heading of the vehicle with the detected road markings and GPS-IMU output. A road marking detection based localization technique using; two cameras directed towards the ground, GPS-IMU dead reckoning, odometry, and a precise marker location map was proposed in [116]. Another vision based method with a single camera and geo-referenced traffic signs was presented in [117]. This approach has one major disadvantage; landmark depen- dency makes the system prone to fail where landmark amount is insufficient. 2) Point cloud matching: The state-of-the-art localization systems use multi-modal point cloud based approaches. In summary, the online-scanned point cloud, which covers a smaller area, is translated and rotated around its center it- eratively to be compared against the larger a priori point cloud map. The position and orientation that gives the best
  • 26. match/overlap of points between the two point clouds give the localized position of the sensor relative to the map. For initial pose estimation, GPS is used commonly along dead reckoning. We used this approach to localize our vehicle. The matching process is shown in fig. 6 and the map-making in fig. 7 and fig. 8. In the seminal work of [109], a point cloud map collected with lidar was used to augment inertial navigation and lo- calization. A particle filter maintained a three-dimensional vector of 2D coordinates and the yaw angle. A multi-modal approach with probabilistic maps was utilized in [96] to achieve localization in urban environments with less than 10 cm RMS error. Instead of comparing two point clouds point by point and discarding the mismatched reads, the variance of all observed data was modeled and used for the match- ing task. A matching algorithm for lidar scans using multi- resolution Gaussian Mixture Maps (GMM) was proposed in [118]. Iterative Closest Point (ICP) was compared against Normal Distribution Transform (NDT) in [105], [119]. In NDT, accumulated sensor readings are transformed into a grid that is represented by the mean and covariance obtained from the scanned points that fall into its’ cells/voxels. NDT proved to be more robust than point-to-point ICP matching. An improved version of 3D NDT matching was proposed in [97], and [114] augmented NDT with road marker matching. An NDT-based Monte Carlo Localization (MCL) method that utilizes an offline static map and a constantly updated short- term map was developed by [120]. In this method, NDT occupancy grid was used for the short-term map and it was utilized only when and where the static map failed to give sufficient explanations. Map-making and maintaining it is time and resource con- suming. Therefore some researchers such as [95] argue that
methods with a priori maps are not feasible given the size of road networks and the rapid changes they undergo.

3) 2D to 3D matching: Matching online 2D readings to a 3D a priori map is an emerging technology. This approach requires only a camera on the ADS-equipped vehicle instead of the more expensive lidar, although the a priori map still needs to be created with a lidar. A monocular camera was used to localize the vehicle in a point cloud map in [121]. With an initial pose estimation, 2D synthetic images were created from the offline 3D point cloud map and compared, using normalized mutual information, to the online images received from the camera. This method increases the computational load of the localization task. Another vision matching algorithm was introduced in [122], where a stereo camera setup was utilized to compare online readings to synthetic depth images generated from a 3D prior. Camera-based localization approaches can become popular in the future as their hardware requirement is cheaper than that of lidar-based systems.

TABLE VI: Comparison of 2D bounding box estimation architectures on the test set of ImageNet1K, ordered by top-5 error. The number of parameters (Num. Params) and the number of layers (Num. Layers) hint at the computational cost of each algorithm.
Architecture | Num. Params (×10^6) | Num. Layers | ImageNet1K Top 5 Error %
Incept.ResNet v2 [127] | 30 | 95 | 4.9
Inception v4 [127] | 41 | 75 | 5
ResNet101 [128] | 45 | 100 | 6.05
DenseNet201 [129] | 18 | 200 | 6.34
YOLOv3-608 [123] | 63 | 53+1 | 6.2
ResNet50 [128] | 26 | 49 | 6.7
GoogLeNet [130] | 6 | 22 | 6.7
VGGNet16 [131] | 134 | 13+2 | 6.8
AlexNet [12] | 57 | 5+2 | 15.3

V. PERCEPTION

Perceiving the surrounding environment and extracting information which may be critical for safe navigation is a central objective. A variety of tasks, using different sensing modalities, fall under the umbrella of perception. Building on decades of computer vision research, cameras are the most commonly used sensor for perception, with 3D vision becoming a strong alternative or supplement.

The remainder of this section is divided into core perception tasks. We discuss image-based object detection in Section V-A1, semantic segmentation in Section V-A2, 3D object detection in Section V-A3, road and lane detection in Section V-C, and object tracking in Section V-B.

A. Detection

1) Image-based Object Detection: Object detection refers to identifying the location and size of objects of interest. Both static objects, from traffic lights and signs to road crossings, and dynamic objects such as other vehicles, pedestrians or cyclists are of concern to an ADS. Generalized object detection
  • 29. has a long-standing history as a central problem in computer vision, where the goal is to determine whether or not objects of specific classes are present in an image, then to determine their size via a rectangular bounding box. This section mainly discusses state-of-the-art object detection methods, as they represent the starting point of several other tasks in an ADS pipe, such as object tracking and scene understanding. Object recognition research started more than 50 years ago, but only recently, in the late 1990s and early 2000s, has algorithm performance reached a level of relevance for driving automation. In 2012, the deep convolutional neural network (DCNN) AlexNet [12] shattered the ImageNet image recogni- tion challenge [132]. This resulted in a near complete shift of focus to supervised learning and in particular deep learning for object detection. There exists a number of extensive surveys on general image-based object detection [133]–[135]. Here, the focus is on the state-of-the-art methods that could be applied to ADS. While state-of-the-art methods all rely on DCNNs, there currently exist a clear distinction between them: 1) Single stage detection frameworks use a single network to produce object detection locations and class predic- tion simultaneously. 2) Region proposal detection frameworks have two distinct stages, where general regions of interest are first proposed, then categorized by separate classifier networks. Region proposal methods are currently leading detection benchmarks, but at the cost requiring high computation power, and generally being difficult to implement, train and fine-
  • 30. tune. Meanwhile, single stage detection algorithms tend to have fast inference time and low memory cost, which is well- suited for real-time driving automation. YOLO (You Only Look Once) [136] is a popular single stage detector, which has been improved continuously [123], [137]. Their network uses a DCNN to extract image features on a coarse grid, significantly reducing the resolution of the input image. A fully-connected neural network then predicts class probabilities and bounding box parameters for each grid cell and class. This design makes YOLO very fast, the full model operating at 45 FPS and a smaller model operating at 155 FPS for a small accuracy trade-off. More recent versions of this method, YOLOv2, YOLO9000 [137] and YOLOv3 [123] briefly took over the PASCAL VOC and MS COCO benchmarks while maintaining low computation and memory cost. Another widely used algorithm, even faster than YOLO, is the Single Shot Detector (SSD) [138], which uses standard DCNN architectures such as VGG [131] to achieve competitive results on public bench- marks. SSD performs detection on a coarse grid similar to YOLO, but also uses higher resolution features obtained early in the DCNN to improve detection and localization of small objects. Considering both accuracy and computational cost is essen- tial for detection in ADS; the detection needs to be reliable, but also operate better than real-time, to allow as much time as possible for the planning and control modules to react to those objects. As such, single stage detectors are often the detection algorithms of choice for ADSs. However, as shown in Table VI, region proposal networks (RPN), used in two- stage detection frameworks, have proven to be unmatched in terms of object recognition and localization accuracy, and computational cost has improved greatly in recent years. They are also better suited for other tasks related to detection, such as semantic segmentation as discussed in Section V-A2. Through transfer learning, RPNs achieving multiple percep-
  • 31. tion tasks simultaneously are become increasingly feasible for online applications [124]. RPNs can replace single stage detection networks for ADS applications in the near future. Omnidirectional Cameras: 360 degree vision, or at least panoramic vision, is necessary for higher levels of automation. This can be achieved through camera arrays, though precise extrinsic calibration between each camera is then necessary to make image stitching possible. Alternatively, omnidirectional Fig. 9: An urban scene near Nagoya University, with camera and lidar data collected by our experimental vehicle and object detection outputs from state-of-the-art perception algorithms. (a) A front facing camera’s view, with bounding box results from YOLOv3 [123] and (b) instance segmentation results from MaskRCNN [124]. (c) Semantic segmentation masks produced by DeepLabv3 [125]. (d) The 3D Lidar data with object detection results from SECOND [126]. Amongst the four, only the 3D perception algorithm outputs range to detected objects. cameras can be used, or a smaller array of cameras with very wide angle fisheye lenses. These are however difficult to intrinsically calibrate; the spherical images are highly distorted and the camera model used must account for mirror reflections or fisheye lens distortions, depending on the camera model producing the panoramic images [139], [140]. The accuracy of the model and calibration dictates the quality of undistorted images produced, on which the aforementioned 2D vision algorithms are used. An example of fisheye lenses producing two spherical images then combined into one panoramic image is shown in Figure 3. Some distortions inevitably remain, but despite these challenges in calibration, omnidirectional
  • 32. cameras have been used for many applications such as SLAM [141] and 3D reconstruction [142]. Event Cameras: Event cameras are a fairly new modality which output asynchronous events usually caused by move- ment in the observed scene, as shown in Figure 4. This makes the sensing modality interesting for dynamic object detection. The other appealing factor is their response time on the order of microseconds [86], as frame rate is a significant limitation for high-speed driving. The sensor resolution remains an issue, but new models are rapidly improving. They have been used for a variety of applications closely related to ADS. A recent survey outlines progress in pose estimation and SLAM, visual-inertial odometry and 3D recon- struction, as well as other applications [143]. Most notably, a dataset for end-to-end driving with event cameras was recently published, with preliminary experiments showing that the output of an event camera can, to some extent, be used to predict car steering angle [85]. Poor Illumination and Changing Appearance: The main drawback with using camera is that changes in lighting con- ditions can significantly affect their performance. Low light conditions are inherently difficult to deal with, while changes in illumination due to shifting shadows, intemperate weather, or seasonal changes, can cause algorithms to fail, in particular supervised learning methods. For example, snow drastically alters the appearance of scenes and hides potentially key features such as lane markings. An easy alternative is to use an alternate sensing modalities for perception, but lidar also has difficulties with some weather conditions like snow [144], and radars lack the necessary resolution for many perception tasks [47]. A sensor fusion strategy is often employed to avoid any single point of failure [145].
  • 33. Thermal imaging through infrared sensors are also used for object detection in low light conditions, which is particularly effective for pedestrian detection [146]. Camera-only methods which attempt to deal with dynamic lighting conditions di- rectly have also been developed. Both attempting to extract lighting invariant features [147] and assessing the quality of features [148] have been proposed. Pre-processed, illumination invariant images have applied to ADS [149] and were shown to improve localization, mapping and scene classification capabilities over long periods of time. Still, dealing with the unpredictable conditions brought forth by inadequate or changing illumination remains a central challenge preventing the widespread implementation of ADS. 2) Semantic Segmentation: Beyond image classification and object detection, computer vision research has also tackled the task of image segmentation. This consists of classifying each pixel of an image with a class label. This task is of particular importance to driving automation as some objects of interest are poorly defined by bounding boxes, in particular roads, traffic lines, sidewalks and buildings. A segmented scene in an urban area can be seen in Figure 9. As opposed to semantic segmentation, which labels pixels based on a class, instance segmentation algorithms further separates instances of the same class, which is important in the context of driving automation. In other words, objects which may have different trajectories and behaviors must be differentiated from each other. We used the COCO dataset [150] to train the instance segmentation algorithm Mask R-CNN [124] with the sample result shown in Figure 9. Segmentation has recently started being feasible for real-
  • 34. time applications. Developments in this field are very much parallel progress in general, image-based object detection. The aforementioned Mask R-CNN [124] is a generalization of Faster R-CNN [151]. The multi-task network can achieve accurate bounding box estimation and instance segmentation simultaneously and can also be generalized to other tasks like pedestrian pose estimation with minimal domain knowledge. Running at 5 fps means it is approaching the area of real-time use for ADS. Unlike Mask-RCNN’s CNN which is more akin to those used for object detection through its use of region proposal networks, segmentation networks usually employ a combi- nation of convolutions, for feature extractions, followed by deconvolutions, also called transposed convolutions, to obtain pixel resolution labels [152], [153]. Feature pyramid networks are also commonly used, for example in PSPNet [154], which also introduced dilated convolutions for segmentation. This idea of sparse convolutions was then used to develop DeepLab [155], with the most recent version being the current state-of- the-art for object segmentation [125]. We employed DeepLab with our ADS and a segmented frame is shown in Figure 9. While most segmentation networks are as of yet too slow and computationally expensive to be used in ADS, it is important to notice that many of these segmentations networks are initially trained for different tasks, such as bounding box estimation, then generalized to segmentation networks. Fur- thermore, these networks were shown to learn universal feature representations of images, and can be generalized for many tasks. This suggests the possibility that single, generalized perception networks may be able to tackle all the different perception tasks required for an ADS. 3) 3D Object Detection: Given their affordability, availabil- ity and widespread research, the camera is used by nearly all
3) 3D Object Detection: Given their affordability, availability and widespread research, the camera is used by nearly all algorithms presented so far as the primary perception modality. However, cameras have limitations that are critical for ADSs. Aside from the illumination issues discussed previously, camera-based object detection occurs in the projected image space, and therefore the scale of the scene is unknown. To make use of this information for dynamic driving tasks like obstacle avoidance, it is necessary to bridge the gap from 2D image-based detection to the 3D, metric space. Depth estimation is therefore necessary, which is in fact possible with a single camera [156], though stereo or multi-view systems are more robust [157]. These algorithms need to solve an expensive image matching problem, which adds a significant amount of processing cost to an already complex perception pipeline.
A relatively new sensing modality, the 3D lidar, offers an alternative for 3D perception. The 3D data collected inherently solves the scale problem, and since the sensor has its own emission source, it is far less dependent on lighting conditions and less susceptible to intemperate weather. The sensing modality collects sparse 3D points representing the surfaces of the scene, as shown in Figure 10, which are challenging to use for object detection and classification. The appearance of objects changes with range, and beyond some distance very few data points per object are available for detection. This poses some challenges for detection, but since the data is a direct representation of the world, it is more easily separable. Traditional methods often used Euclidean clustering [158] or region-growing methods [159] for grouping points into objects. This approach has been made much more robust through various filtering techniques, such as ground filtering [160] and map-based filtering [161]. We implemented a 3D object detection pipeline to get clustered objects from raw point cloud input. An example of this process is shown in Figure 10.
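The following is a minimal sketch of such a traditional pipeline: naive ground removal by a height threshold followed by Euclidean clustering over a KD-tree. It is illustrative only; the thresholds are placeholder values, and the pipeline on our platform relies on the more robust filtering methods cited above rather than this simplification.

```python
import numpy as np
from scipy.spatial import cKDTree

def euclidean_clusters(points, radius=0.5, min_points=10):
    """Group 3D points into clusters by region growing over a KD-tree."""
    tree = cKDTree(points)
    unvisited = np.ones(len(points), dtype=bool)
    clusters = []
    for seed in range(len(points)):
        if not unvisited[seed]:
            continue
        queue, members = [seed], []
        unvisited[seed] = False
        while queue:
            idx = queue.pop()
            members.append(idx)
            for nb in tree.query_ball_point(points[idx], r=radius):
                if unvisited[nb]:
                    unvisited[nb] = False
                    queue.append(nb)
        if len(members) >= min_points:
            clusters.append(np.array(members))
    return clusters

def detect_objects(cloud, ground_height=-1.5):
    """Naive ground filtering by a height threshold, then clustering.
    cloud: (N, 3) array of x, y, z points in the sensor frame.
    Returns the non-ground points and index arrays for each cluster."""
    non_ground = cloud[cloud[:, 2] > ground_height]
    return non_ground, euclidean_clusters(non_ground)
```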
  • 36. As with image-based methods, machine learning has also recently taken over 3D detection methods. These methods have also notably been applied to RGB-D [162], which produce similar, but colored, point clouds; with their limited range and unreliability outdoors, RGB-D have not been used for ADS applications. A 3D representation of point data, through a 3D occupancy grid called voxel grids, was first applied for object detection in RGB-D data [162]. Shortly thereafter, a similar approach was used on point clouds created by lidars [163]. Inspired by image-based methods, 3D CNNs are used, despite being computationally very expensive. The first convincing results for point cloud-only 3D bound- ing box estimation were produced by VoxelNet [164]. In- stead of hand-crafting input features computed during the discretization process, VoxelNet learned an encoding from raw point cloud data to voxel grid. Their voxel feature encoder (VFE) uses a fully connected neural network to convert the variable number of points in each occupied voxel to a feature vector of fixed size. The voxel grid encoded with feature vectors was then used as input to an aforementioned RPN for multi-class object detection. This work was then improved both in terms of accuracy and computational efficiency by SECOND [126] by exploiting the natural sparsity of lidar data. We employed SECOND and a sample result is shown in Figure 9. Several algorithms have been produced recently, with accuracy constantly improving as shown in Table VII, yet the computational complexity of 3D convolutions remains an issue for real-time use. Another option for lidar-based perception is 2D projection of point cloud data. There are two main representations of point cloud data in 2D, the first being a so-called depth image shown in Figure 14, largely inspired by camera-based methods that perform 3D object detection through depth estimation
[165] and methods that operate on RGB-D data [166]. The VeloFCN network [167] proposed to use single-channel depth image as input to a shallow, single-stage convolutional neural network which produced 3D vehicle proposals, with many other algorithms adopting this approach. Another use of depth image was shown for semantic classification of lidar points [168].

Fig. 10: Outline of a traditional method for object detection from 3D pointcloud data. Various filtering and data reduction methods are used first, followed by clustering. The resulting clusters are shown by the different colored points in the 3D lidar data of pedestrians collected by our data collection platform.

TABLE VII: Average Precision (AP) in % on the KITTI 3D object detection test set car class, ordered based on moderate category accuracy. These algorithms only use pointcloud data.
Algorithm | T [s] | Easy | Moderate | Hard
PointRCNN [176] | 0.10 | 85.9 | 75.8 | 68.3
PointPillars [177] | 0.02 | 79.1 | 75.0 | 68.3
SECOND [126] | 0.04 | 83.1 | 73.7 | 66.2
IPOD [178] | 0.20 | 82.1 | 72.6 | 66.3
F-PointNet [179] | 0.17 | 81.2 | 70.4 | 62.2
VoxelNet (Lidar) [164] | 0.23 | 77.5 | 65.1 | 57.7
MV3D (Lidar) [169] | 0.24 | 66.8 | 52.8 | 51.3

The other 2D projection that has seen increasing popularity, in part due to the new KITTI benchmark, is projection to
bird's eye view (BV) image. This is a top-view image of the point cloud, as shown in Figure 15. As such, bird's eye view images necessarily discretize space purely in 2D, so lidar points which vary in height alone necessarily occlude each other. The MV3D algorithm [169] used camera images, depth images, as well as multi-channel BV images, with each channel corresponding to a different range of heights, so as to minimize these occlusions. Several other works have reused camera-based algorithms and trained efficient networks for 3D object detection on 2D BV images [170]–[173]. State-of-the-art algorithms are currently being evaluated on the KITTI dataset [174] and the nuScenes dataset [175], as they offer labeled 3D scenes. Table VII shows the leading methods on the KITTI benchmark, alongside detection times. 2D methods are far less computationally expensive, but recent methods that take point sparsity into account [126] are real-time viable and rapidly approaching the accuracies necessary for integration in ADSs.
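To make the BV representation concrete, the following is a minimal sketch of projecting a lidar sweep into a two-channel bird's eye view grid (maximum height and point density per cell). The grid extents, resolution and channel choices are illustrative assumptions, not the encoding used by any particular method cited above.

```python
import numpy as np

def birds_eye_view(cloud, x_range=(0.0, 70.0), y_range=(-40.0, 40.0),
                   resolution=0.1):
    """Project an (N, >=3) lidar cloud (x, y, z, ...) into a 2-channel
    bird's eye view grid: max height and log point density per cell."""
    x, y, z = cloud[:, 0], cloud[:, 1], cloud[:, 2]
    keep = (x >= x_range[0]) & (x < x_range[1]) & \
           (y >= y_range[0]) & (y < y_range[1])
    x, y, z = x[keep], y[keep], z[keep]

    cols = int((x_range[1] - x_range[0]) / resolution)
    rows = int((y_range[1] - y_range[0]) / resolution)
    ci = ((x - x_range[0]) / resolution).astype(int)   # forward axis -> columns
    ri = ((y - y_range[0]) / resolution).astype(int)   # lateral axis -> rows

    height = np.full((rows, cols), -np.inf, dtype=np.float32)
    density = np.zeros((rows, cols), dtype=np.float32)
    np.maximum.at(height, (ri, ci), z)                 # max height per cell
    np.add.at(density, (ri, ci), 1.0)                  # point count per cell
    height[np.isinf(height)] = 0.0                     # empty cells
    return np.stack([height, np.log1p(density)], axis=0)
```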
Radar
Radar sensors have already been used for various perception applications, in different types of vehicles, with different models operating at complementary ranges. While not as accurate as the lidar, radar can detect objects at long range and estimate their velocity [112]. The lack of precision in estimating the shape of objects is a major drawback when it is used in perception systems [47]; the resolution is simply too low. As such, it can be used for range estimation to large objects like vehicles, but it is challenging for pedestrians or static objects. Another issue is the very limited field of view of most radars, forcing a complicated array of radar sensors to cover the full field of view. Nevertheless, radars have seen widespread use as an ADAS component, for applications including proximity warning and adaptive cruise control [144]. While radar and lidar are often seen as competing sensing modalities, they will likely be used in tandem in fully automated driving systems. Radars offer very long range, low cost and complete robustness to poor weather, while lidars offer precise object localization capabilities, as discussed in Section IV. A sensor similar to the radar is the sonar, though its extremely short range of < 2 m and poor angular resolution limit its use to very near obstacle detection [144].

B. Object Tracking
Object tracking is also often referred to as multiple object tracking (MOT) [180] and detection and tracking of multiple objects (DATMO) [181]. For fully automated driving in complex and high-speed scenarios, estimating location alone is insufficient. It is necessary to estimate dynamic objects' heading and velocity so that a motion model can be applied to track each object over time and predict its future trajectory to avoid collisions. These trajectories must generally be estimated in the vehicle frame to be used by planning, so range information must be obtained through multiple camera systems, lidars or radar sensors. 3D lidars are often used for their precise range information and large field of view, allowing tracking over longer periods of time. To better cope with the limitations and uncertainties of different sensing modalities, a sensor fusion strategy is often used for tracking [43].

Fig. 11: A scene with several tracked pedestrians and a cyclist, tracked with a basic particle filter at an urban road intersection. Past trajectories are shown in white, with current heading and speed shown by the direction and magnitude of the arrow; sample collected by our data collection platform.
Commonly used object trackers rely on simple data association techniques followed by traditional filtering methods. When objects are tracked in 3D space at a high frame rate, nearest neighbor methods are often sufficient for establishing associations between objects. Image-based methods, however, need to establish an appearance model, which may use color histograms, gradients and other features such as KLT features to evaluate similarity [182]. Point cloud based methods may also use similarity metrics such as point density and the Hausdorff distance [161], [183]. Since association errors are always a possibility, multiple hypothesis tracking algorithms [184] are often employed, which ensures that tracking algorithms can recover from poor data association at any single time step. Using an occupancy map as a shared frame to which all sensors contribute, and then performing data association in that frame, is also common, especially when multiple sensors are used [185].
To obtain smooth dynamics, the detection results are filtered by traditional Bayes filters. Kalman filtering is sufficient for simple linear models, while the extended and unscented Kalman filters [186] are used to handle nonlinear dynamic models [187]. We implemented a basic particle filter based object-tracking algorithm, and an example of tracked pedestrians in contrasting camera and 3D lidar perspectives is shown in Figure 11. Physical models of the object being tracked are also often used for more robust tracking. In that case, non-parametric methods such as particle filters are used, and physical parameters such as the size of the object are tracked alongside its dynamics [188]. More involved filtering methods such as Rao-Blackwellized particle filters have also been used to keep track of both dynamic variables and vehicle geometry variables for an L-shaped vehicle model [189]. Various models have been proposed for vehicles and pedestrians, while some models generalize to any dynamic object [190].
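As an illustration of the filtering step, the following is a minimal sketch of a constant-velocity Kalman filter that tracks the 2D position and velocity of a single associated object in the vehicle frame. The time step and noise magnitudes are illustrative placeholders, not tuned values from our platform, and the particle filter we actually used is not shown here.

```python
import numpy as np

class ConstantVelocityKalman:
    """Tracks [x, y, vx, vy] of one object from noisy position detections."""
    def __init__(self, dt=0.1, q=0.5, r=0.3):
        self.x = np.zeros(4)                      # state estimate
        self.P = np.eye(4) * 10.0                 # state covariance
        self.F = np.eye(4)                        # constant-velocity motion model
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.zeros((2, 4))                 # we only measure position
        self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(4) * q                    # process noise
        self.R = np.eye(2) * r                    # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x

    def update(self, z):
        """z: measured [x, y] position of the associated detection."""
        y = z - self.H @ self.x                   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x
```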
Finally, deep learning has also been applied to the problem of tracking, particularly for images. Tracking in monocular images was achieved in real time through CNN-based methods [191], [192]. Multi-task networks which estimate object dynamics are also emerging [193], which further suggests that generalized networks tackling multiple perception tasks may be the future of ADS perception.

C. Road and Lane Detection
The bounding box estimation methods covered previously are useful for defining some objects of interest but are inadequate for continuous surfaces like roads. Determining the drivable surface is critical for ADSs and has been specifically researched as a subset of the detection problem. While the drivable surface can be determined through semantic segmentation, automated vehicles need to understand road semantics to properly negotiate the road. An understanding of lanes, and how they are connected through merges and intersections, remains a challenge from the perspective of perception. In this section, we provide an overview of current methods used for road and lane detection, and refer the reader to in-depth surveys of traditional methods [194] and state-of-the-art methods [195], [196].
This problem is usually subdivided into several tasks, each unlocking some level of automation. The simplest is determining the drivable area from the perspective of the ego-vehicle. The road can then be divided into lanes, and the vehicle's host lane can be determined. Host lane estimation over a reasonable distance enables ADAS technologies such as lane departure warning, lane keeping and adaptive cruise control [194], [198]. Even more challenging is determining other lanes and their direction [199], and finally understanding complex semantics, such as their current and future direction, or merging and turning lanes [43]. These ADAS or ADS technologies have different
  • 42. criteria both in terms of task, detection distance and reliability rates, but fully automated driving will require a complete, semantic understanding of road structures and the ability to detect several lanes at long ranges [195]. Annotated maps as shown in Figure 8 are extremely useful for understanding lane semantics. Current methods on road understanding typically first rely on exteroceptive data preprocessing. When cameras are used, this usually means performing image color corrections to normalize lighting conditions [200]. For lidar, several filtering methods can be used to reduce clutter in the data such as ground extraction [160] or map-based filtering [161]. For any sensing modality, identifying dynamic objects which conflicts with the static road scene is an important pre-processing step. Then, road and lane feature extraction is performed on the corrected data. Color statistics and intensity information [201], gradient information [202], and various other filters have been used to detect lane markings. Similar methods have been used for road estimation, where the usual uniformity of roads and elevation gap at the edge allows for region growing methods to be applied [203]. Stereo camera systems [204], as well as 3D lidars [201], have been used determine the 3D structure of roads directly. More recently, machine learning- based methods which either fuse maps with vision [196] or use fully appearance-based segmentation [205] have been used. Fig. 12: Assessing the overall risk level of driving scenes. We employed an open-source1 deep spatiotemporal video-based risk detection framework [197] to assess the image sequences shown in this figure. Once surfaces are estimated, model fitting is used to es- tablish the continuity of the road and lanes. Geometric fitting
  • 43. through parametric models such as lines [206] and splines [201] have been used, as well as non-parametric continuous models [207]. Models that assume parallel lanes have been used [198], and more recently models integrating topological elements such as lane splitting and merging were proposed [201]. Temporal integration completes the road and lane segmenta- tion pipeline. Here, vehicle dynamics are used in combination with a road tracking system to achieve smooth results. Dy- namic information can also be used alongside Kalman filtering [198] or particle filtering [204] to achieve smoother results. Road and lane estimation is a well-researched field and many methods have already been integrated successfully for lane keeping assistance systems. However, most methods remain riddled with assumptions and limitations, and truly general systems which can handle complex road topologies have yet to be developed. Through standardized road maps which encode topology and emerging machine learning-based road and lane classification methods, robust systems for driv- ing automation are slowly taking shape. VI. ASSESSMENT A robust ADS should constantly evaluate the overall risk level of the situation and predict the intentions of human drivers and pedestrians around itself. A lack of acute assess- ment mechanism can lead to accidents. This section discusses assessment under three subcategories: overall risk and uncer- tainty assessment, human driving behavior assessment, and driving style recognition. A. Risk and uncertainty assessment Overall assessment can be summarized as quantifying the
uncertainties and the risk level of the driving scene. It is a promising methodology that can increase the safety of ADS pipelines [11].
Using Bayesian methods to quantify and measure the uncertainties of deep neural networks was proposed in [208]. A Bayesian deep learning architecture was designed for propagating uncertainty throughout an ADS pipeline, and its advantage over conventional approaches was shown in a hypothetical scenario [11]. In summary, each module conveys and accepts probability distributions instead of exact outcomes throughout the pipeline, which increases the overall robustness of the system.
An alternative approach is to assess the overall risk level of the driving scene separately, i.e., outside the pipeline. Sensory inputs were fed into a risk inference framework in [74], [209] to detect unsafe lane change events using Hidden Markov Models (HMMs) and language models. Recently, a deep spatiotemporal network that infers the overall risk level of a driving scene was introduced in [197]. An implementation of this method is available open-source.1 We employed this method to assess the risk level of a lane change, as shown in Figure 12.

B. Surrounding driving behavior assessment
Understanding surrounding human driver intention is most relevant to medium- to long-term prediction and decision making. In order to increase the prediction horizon of surrounding object behavior, human traits should be considered and incorporated into the prediction and evaluation steps. Understanding surrounding driver intention from the perspective of an ADS is not a common practice in the field; as such, a state-of-the-art has not been established yet.
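Several of the works cited here and in the following paragraphs build on HMMs over short observation sequences. The following is a minimal sketch of that recipe using the hmmlearn library: one Gaussian HMM is fit per maneuver class, and a new observation sequence is labeled by the model with the highest likelihood. The feature vector, the number of hidden states and the two example classes are illustrative assumptions, not the setup of [74], [209] or [210].

```python
import numpy as np
from hmmlearn import hmm

def train_maneuver_models(sequences_by_class, n_states=3):
    """Fit one Gaussian HMM per maneuver class.
    sequences_by_class: dict mapping class name -> list of (T_i, D) arrays of
    observed features (e.g., lateral offset, yaw rate, speed)."""
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.vstack(seqs)                       # concatenated observations
        lengths = [len(s) for s in seqs]          # per-sequence lengths
        model = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=100)
        model.fit(X, lengths)
        models[label] = model
    return models

def classify(models, observation):
    """Label a new (T, D) observation with the highest-likelihood model."""
    scores = {label: m.score(observation) for label, m in models.items()}
    return max(scores, key=scores.get)

# Hypothetical usage with two classes, e.g. "safe" vs "risky" lane changes:
# models = train_maneuver_models({"safe": safe_seqs, "risky": risky_seqs})
# label = classify(models, new_sequence)
```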
In [210], a target vehicle's future behavior was predicted with a hidden Markov model (HMM), and the prediction time horizon was extended by 56% by learning human driving traits. The proposed system tagged observations with predefined maneuvers. Then, the features of each type were learned in a data-centric manner with HMMs. Another learning-based approach was proposed in [211], where a Bayesian network classifier was used to predict maneuvers of individual drivers on highways. A framework for long-term driver behavior prediction using a combination of a hybrid state system and an HMM was introduced in [212]. Surrounding vehicle information was integrated with ego-behavior through a symbolization framework in [74], [209]. Detecting dangerous cut-in maneuvers was achieved with an HMM framework that was trained on safe and dangerous data in [213]. Lane change events were predicted 1.3 seconds in advance with support vector machines (SVM) and Bayesian filters [214].
The main challenges are the short observation window for understanding the intention of humans and the real-time, high-frequency computation requirements. Most of the time, the ADS can observe a surrounding vehicle only for seconds. Complicated driving behavior models that require longer observation periods cannot be utilized under these circumstances.
1 https://github.com/Ekim-Yurtsever/DeepTL-Lane-Change-Classification

C. Driving style recognition
  • 46. In 2016, Google’s self-driving car collided with an oncom- ing bus [16] during a lane change. The ADS assumed that the bus driver was going to yield and let the self-driving car merge in. However, the bus driver accelerated instead. This accident could have been prevented if the ADS understood this particular bus driver’s individual, unique driving style and predicted his behavior. Driving style is a broad term without an established com- mon definition. A thorough review of the literature can be found in [215] and a survey of driving style recognition algorithms for intelligent vehicles is given in [216]. Readers are referred to these papers for a complete review. Typically, driving style is defined with respect to either aggressiveness [217]–[221] or fuel consumption [222]–[226]. For example, [227] introduced a rule-based model that clas- sified driving styles with respect to jerk. This model decides whether a maneuver is aggressive or calm by a set of rules and jerk thresholds. Drivers were categorized with respect to their average speed in [228]. In conventional methods, total number and meaning of driving style classes are predefined beforehand. The vast majority of driving style recognition literature uses two or three classes. In [229]–[231] driving style was categorized into three and into two in [74], [209], [217], [218], [222]. Representing driving style in a continuous domain is uncommon, but there are some studies. In [232], driving style was depicted as a continuous value between -1 and +1, which stands for mild and active respectively. Details of classification is given in Table VIII. More recently, machine learning based approaches have been utilized for driving style recognition. Principal compo- nent analysis was used and five distinct driving classes were detected in an unsupervised manner in [233] and A GMM based driver model was used to identify individual drivers
with success in [234]. Car-following and pedal operation behavior were investigated separately in the latter study. Another GMM-based driving style recognition model was proposed for electric vehicle range prediction in [235]. In [217], aggressive event detection with dynamic time warping was presented, where the authors reported a high success score. Bayesian approaches were utilized in [236] for modeling driving style on roundabouts and in [237] to assess critical braking situations. Bag-of-words and K-means clustering were used to represent individual driving features in [238]. A stacked autoencoder was used to extract unique driving signatures from different drivers, and then macro driving style centroids were found with clustering [239]. Another autoencoder network was used to extract road-type specific driving features [240]. Similarly, driving behavior was encoded in a 3-channel RGB space with a deep sparse autoencoder to visualize individual driving styles [241].

TABLE VIII: Driving style categorization
Related work | # Classes | Methodology | Class details
[233] | 5 | PCA | non-aggressive to very aggressive
[242] | 3 | NN, SVM, DT | expert/typical/low-skill
[229] | 3 | FL | sporty/normal/comfortable
[230] | 3 | PCMLP | aggressive/moderate/calm
[239] | 3 | SAE & K-means | unidentified clusters
[74] | 2 | non-param. Bayesian | risky/safe
[217] | 2 | DTW | aggressive/non-aggressive
[218] | 2 | RB | sudden/safe
[232] | Continuous [−1, 1] | NN | mild to active
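A large share of the unsupervised studies above follow the same recipe: summarize each trip into a feature vector, then cluster the vectors into a small number of styles. The sketch below illustrates this with K-means from scikit-learn; the specific features and the choice of three clusters are illustrative assumptions rather than the setup of any cited work.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def trip_features(speed, accel, jerk):
    """Summarize one trip's 1D signals into a fixed-size feature vector."""
    return np.array([
        speed.mean(), speed.std(),
        np.abs(accel).mean(), np.percentile(np.abs(accel), 95),
        np.abs(jerk).mean(), np.percentile(np.abs(jerk), 95),
    ])

def cluster_driving_styles(trips, n_styles=3):
    """trips: list of (speed, accel, jerk) tuples. Returns a cluster label
    per trip, i.e. an unsupervised driving style assignment."""
    X = np.vstack([trip_features(*t) for t in trips])
    X = StandardScaler().fit_transform(X)        # normalize feature scales
    return KMeans(n_clusters=n_styles, n_init=10).fit_predict(X)
```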
A successful integration of driving style recognition into a real-world ADS pipeline has not been reported yet. However, these studies are promising, and they point to a possible new direction in ADS development.

VII. PLANNING AND DECISION MAKING
A brief overview of planning and decision making is given here. For more detailed treatments of this topic, the reader is referred to [22], [27], [244].

A. Global planning
Planning can be categorized into two sub-tasks: global route planning and local path planning. The global planner is responsible for finding the route on the road network from the origin to the final destination, which is usually defined by the user. Global navigation is a well-studied subject, and high performance has become an industry standard for more than a decade. Almost all modern production cars are equipped with navigation systems that utilize GPS and offline maps to plan a global route.
Route planning is formulated as finding the point-to-point shortest path in a directed graph, and conventional methods are examined under four categories in [244]: goal-directed, separator-based, hierarchical and bounded-hop techniques. A* search [245] is a standard goal-directed path planning algorithm and has been used extensively in various fields for almost 50 years.
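The following is a minimal sketch of goal-directed A* search on a small road graph given as an adjacency dictionary, using a straight-line-distance heuristic. The graph representation and node coordinates are illustrative assumptions; production route planners operate on far larger networks with the preprocessing techniques described below.

```python
import heapq
import math
from itertools import count

def a_star(graph, coords, start, goal):
    """graph: {node: [(neighbor, edge_cost), ...]}, coords: {node: (x, y)}.
    Returns the shortest path from start to goal as a list of nodes."""
    def h(n):  # admissible heuristic: straight-line distance to the goal
        (x1, y1), (x2, y2) = coords[n], coords[goal]
        return math.hypot(x2 - x1, y2 - y1)

    tie = count()                                 # breaks ties in the heap
    open_set = [(h(start), 0.0, next(tie), start, None)]
    came_from = {}                                # node -> predecessor
    g_score = {start: 0.0}
    while open_set:
        _, g, _, node, parent = heapq.heappop(open_set)
        if node in came_from:                     # already expanded
            continue
        came_from[node] = parent
        if node == goal:                          # reconstruct the route
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        for neighbor, cost in graph.get(node, []):
            tentative = g + cost
            if tentative < g_score.get(neighbor, float("inf")):
                g_score[neighbor] = tentative
                heapq.heappush(open_set, (tentative + h(neighbor), tentative,
                                          next(tie), neighbor, node))
    return None                                   # start and goal not connected
```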
The main idea of separator-based techniques is to remove a subset of vertices [246] or arcs from the graph and compute an overlay graph over it. Using the overlay graph to calculate the shortest path results in faster queries.
Hierarchical techniques take advantage of the road hierarchy. For example, the road hierarchy in the US can be listed from top to bottom as freeways, arterials, collectors and local roads. For a route query, the importance of hierarchy increases as the distance between the origin and the destination gets longer. The shortest path may no longer be the fastest or the most desirable route. Moving away from the destination, and thus making the route slightly longer in order to take the closest highway ramp, may result in a faster travel time than following the shortest path over local roads. The Contraction Hierarchies (CH) method was proposed in [247] for exploiting road hierarchy.

Fig. 13: Global plan and the local paths. The annotated vector map shown in Figure 8 was utilized by the planner. We employed OpenPlanner [243], a graph-based planner, here to illustrate a typical planning approach.

TABLE IX: Local planning techniques
Approach | Methods | Pros and cons
Graph search | Dijkstra [250], A* [245], State lattice [251] | Slow and jerky
Sampling based | RPP [252], RRT [253], PRM [254] | Fast solution but jerky
Curve interpolation | clothoids [255], polynomials [256], Bezier [257], splines [100] | Smooth but slow
Numerical optimization | num. non-linear opt. [258], Newton's method [259] | increases computational cost but improves quality
Deep learning | FCN [260], segmentation network [261] | high imitation performance, but no hard-coded safety measures

Precomputing distances between selected vertices and utilizing them at query time is the basis of bounded-hop
  • 51. techniques. Precomputed shortcuts can be utilized partly or exclusively for navigation. However, the naive approach of precomputing all possible routes from every pair of vertices is impractical in most cases with large networks. One possible solution to this is to use hub labeling (HL) [248]. This approach requires preprocessing also. A label associated with a vertex consists of nearby hub vertices and the distance to them. These labels satisfy the condition that at least one shared hub vertex must exist between the labels of any given two vertices. HL is the fastest query time algorithm for route planning [244], in the expense of high storage usage. A combination of the above algorithms are popular in state- of-the-art systems. For example, [249] combined a separator with a bounded-hop method and created the Transit Node Routing with Arc Flags (TNR+AF) algorithm. Modern route planners can make a query in milliseconds. B. Local planning The objective of the local planner is to execute a global plan without failing. In other words, in order to complete its trip, the ADS must find trajectories to avoid obstacles and satisfy optimization criteria in the configuration space (C- space), given a starting and destination point. A detailed local planning review is presented in [23] where the taxonomy of motion planning was divided into four groups; graph-based planners, sampling-based planners, interpolating curve plan- ners and numerical optimization approaches. After a summary of these conventional planners, the emerging deep learning- based planners are introduced at the end of this section. Graph-based local planners use the same techniques as graph-based global planners such as Dijkstra [250] and A* [245], which output discrete paths rather than continuous ones. This can lead to jerky trajectories [23]. A more advanced
graph-based planner is the state lattice algorithm. Like all graph-based methods, the state lattice discretizes the decision space. High-dimensional lattice nodes, which typically encode 2D position, heading and curvature [251], are used to create a grid first. Then, the connections between the nodes are precomputed with an inverse path generator to build the state lattice. During the planning phase, a cost function, which usually considers proximity to obstacles and deviation from the goal, is utilized for finding the best path with the precomputed path primitives. State lattices can handle high dimensions and are well suited to local planning in dynamic environments; however, the computational load is high and the discretization resolution limits the planner's capacity [23].
A detailed overview of Sampling Based Planning (SBP) methods can be found in [262]. In summary, SBP tries to build the connectivity of the C-space by randomly sampling it. The Randomized Potential Planner (RPP) [252] is one of the earliest SBP approaches, where random walks are generated for escaping local minima. The probabilistic roadmap method (PRM) [254] and the rapidly-exploring random tree (RRT) [253] are the most commonly used SBP algorithms. PRM first samples the C-space during its learning phase and then makes a query with the predefined origin and destination points on the roadmap. RRT, on the other hand, is a single-query planner. The path between the start and goal configurations is incrementally built with random tree-like branches. RRT is faster than PRM, and both are probabilistically complete [253], which means that a path satisfying the given conditions is guaranteed to be found given enough runtime.

Fig. 14: A depth image produced from synthetic lidar data, generated in the CARLA [264] simulator.
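The following is a minimal 2D sketch of the RRT construction just described: sample a configuration, extend the nearest tree node toward it by a fixed step, and stop once the goal region is reached. The collision check is a placeholder callback, and the step size, goal bias and tolerances are illustrative values.

```python
import math
import random

def rrt(start, goal, is_free, x_lim, y_lim, step=1.0,
        goal_tol=1.5, max_iters=5000):
    """Basic RRT in the plane. is_free(p, q) should return True when the
    straight segment from p to q is collision-free (placeholder check)."""
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # Sample a random configuration, occasionally biased toward the goal.
        sample = goal if random.random() < 0.05 else (
            random.uniform(*x_lim), random.uniform(*y_lim))
        # Find the nearest tree node and steer toward the sample by one step.
        i_near = min(range(len(nodes)),
                     key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        theta = math.atan2(sample[1] - near[1], sample[0] - near[0])
        new = (near[0] + step * math.cos(theta),
               near[1] + step * math.sin(theta))
        if not is_free(near, new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        if math.dist(new, goal) < goal_tol:       # close enough: extract path
            path, i = [new], len(nodes) - 1
            while parent[i] is not None:
                i = parent[i]
                path.append(nodes[i])
            return path[::-1]
    return None                                    # no path found in time
```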
The main disadvantage of SBP is, again, the jerky trajectories [23].
Interpolating curve planners fit a curve to a known set of points [23], e.g., way-points generated from the global plan or a discrete set of future points from another local planner. The main obstacle avoidance strategy is to interpolate new collision-free paths that first deviate from, and then re-enter, the initially planned trajectory. The new path is generated by fitting a curve to a new set of points: an exit point from the currently traversed trajectory, newly sampled collision-free points, and a re-entry point on the initial trajectory. The resultant trajectory is smooth; however, the computational load is usually higher compared to other methods. Various curve families are commonly used, such as clothoids [255], polynomials [256], Bezier curves [257] and splines [100].
Optimization-based motion planners improve the quality of already existing paths with optimization functions. A* trajectories were optimized with numerical non-linear functions in [258]. The Potential Field Method (PFM) was improved in [259] by solving its inherent oscillation problem using Newton's method, obtaining C1 continuity.
Recently, deep learning (DL) and reinforcement learning based local planners have started to emerge as an alternative. Fully convolutional 3D neural networks can generate future paths from sensory input such as lidar point clouds [260]. An interesting take on the subject is to segment image data with path proposals using a deep segmentation network [261]. Planning a safe path in occluded intersections was achieved in a simulation environment using deep reinforcement learning in [263]. The main difference between end-to-end driving and deep learning based local planners is the output: the former outputs direct vehicle control signals such as steering and pedal operation, whereas the latter generates a trajectory. This enables DL planners to be integrated into conventional
  • 54. pipelines [28]. Deep learning based planners are promising, but they are not widely used in real-world systems yet. Lack of hard-coded safety measures, generalization issues, need for labeled data are some of the issues that need to be addressed. Fig. 15: Bird’s eye view perspective of 3D lidar data, a sample from the KITTI dataset [174]. VIII. HUMAN MACHINE INTERFACE Vehicles communicate with their drivers/passengers through their HMI module. The nature of this communication greatly depends on the objective, which can be divided into two: primary driving tasks and secondary tasks. The interaction intensity of these tasks depend on the automation level. Where a manually operated, level zero conventional car requires constant user input for operation, a level five ADS may need user input only at the beginning of the trip. Furthermore, the purpose of interaction may affect intensity. A shift from executing primary driving tasks to monitoring the automation process raises new HMI design requirements. There are several investigations such as [265], [266] about automotive HMI technologies, mostly from the distraction point of view. Manual user interfaces for secondary tasks are more desired than their visual counterparts [265]. The main reason is vision is absolutely necessary and has no alternative for primary driving tasks. Visual interface interactions require glances with durations between 0.6 and 1.6 seconds with a mean of 1.2 seconds [265]. As such, secondary task interfaces that require vision is distracting and detrimental for driving. Auditory User Interfaces (AUI) are good alternatives to visually taxing HMI designs. AUIs are omni-directional: even
if the user is not attending, the auditory cues are hard to miss [267]. The main challenge of audio interaction is automatic speech recognition (ASR). ASR is a very mature field; however, the vehicle domain poses additional challenges, such as low performance caused by uncontrollable cabin conditions like wind and road noise [268]. Beyond simple voice commands, conversational natural language interaction with an ADS is still an unrealized concept with many unsolved challenges [269].
The biggest HMI challenge is at level three and four automation. The user and the ADS need to have a mutual understanding; otherwise, they will not be able to grasp each other's intentions [266]. The transition from manual to automated driving and vice versa is prone to failure in the state-of-the-art. Recent research showed that drivers exhibit low cognitive load when monitoring automated driving compared to performing a secondary task [284]. Even though some experimental systems can recognize driver activity with a driver-facing camera based on head and eye tracking [285], and prepare the driver for handover with visual and auditory cues [286] in simulation environments, a real-world system with an efficient handover interaction module does not exist at the moment. This is an open problem [287], and future research should focus on delivering better methods to inform/prepare the driver for easing the transition [37].

TABLE X: Driving datasets
Dataset | Image | LIDAR | 2D annotation* | 3D annotation* | ego signals | Naturalistic | POV | Multi trip | all weathers | day & night
Cityscapes [270] | X | - | X | - | - | - | Vehicle | - | - | -
Berkley DeepDrive [271] | X | - | X | - | - | - | Vehicle | - | X | X
Mapillary [272] | X | - | X | - | - | - | Vehicle | - | X | X
Oxford RobotCar [49] | X | X | - | - | - | - | Vehicle | X | X | X
KITTI [174] | X | X | X | X | - | - | Vehicle | - | - | -
H3D [273] | X | X | - | X | - | - | Vehicle | - | - | -
ApolloScape [274] | X | X | X | X | - | - | Vehicle | - | - | -
nuScenes [175] | X | X | X | X | - | - | Vehicle | - | X | X
Udacity [275] | X | X | X | X | - | - | Vehicle | - | - | -
DDD17 [85] | X | - | X | - | X | - | Vehicle | - | X | X
Comma2k19 [276] | X | - | - | - | X | - | Vehicle | - | - | X
LiVi-Set [277] | X | X | - | - | X | - | Vehicle | - | - | -
NU-drive [278] | X | - | - | - | X | Semi | Vehicle | X | - | -
SHRP2 [279] | X | - | - | - | X | X | Vehicle | - | - | -
100-Car [280] | X | - | - | - | X | X | Vehicle | - | X | X
euroFOT [281] | X | - | - | - | X | X | Vehicle | - | - | -
TorontoCity [282] | X | X | X | X | - | - | Aerial, panorama, vehicle | - | - | -
KAIST multi-spectral [283] | X | X | X | - | - | - | Vehicle | - | - | X
*2D and 3D annotation can vary from bounding boxes to segmentation masks. Readers are referred to the sources for details of the datasets.

IX. DATASETS AND AVAILABLE TOOLS
A. Datasets and Benchmarks
Datasets are crucial for researchers and developers because most algorithms and tools have to be tested and trained before going on the road. Typically, sensory inputs are fed into a stack of algorithms with various objectives. A common practice is to test and validate these functions separately on annotated datasets. For example, the output of cameras, i.e., 2D vision, can be fed into an object detection algorithm to detect surrounding vehicles and pedestrians. Then, this information can be used in another algorithm for planning purposes. Even though these two algorithms are connected in the stack of this example, the object detection part can be worked on and validated separately during the development process. Since computer vision is a well-studied field, there are annotated datasets specifically for object detection and tracking. The existence of these datasets speeds up the development process and enables interdisciplinary research teams to work with each other much more efficiently. For end-to-end systems, the dataset has to include additional ego-vehicle signals, chiefly steering and longitudinal control signals.
As learning approaches emerged, so did training datasets to support them. The PASCAL VOC dataset [288], which grew from 2005 to 2012, was one of the first datasets featuring a large amount of data with relevant classes for ADSs.
  • 58. However, the images often featured single objects, in scenes and scales that are not representative of what is encountered in driving scenarios. In 2012, the KITTI Vision Benchmark [174] remedied this situation by providing a relatively large amount of labeled driving scenes. It remains now as one of the most widely used datasets for applications related to driving automation. Yet in terms of quantity of data and number of labeled classes, it is far inferior to generic image databases such as ImageNet [132] and COCO [150]. While no doubt useful for training, it remains that generic image databases lack the adequate context to test the capabilities of ADS. UC Berkeley DeepDrive [271] is a recent dataset with annotated image data. The Oxford RobotCar [49] is used to collect over 1000 km of driving data with six cameras, lidar, GPS and INS in the UK. The dataset is not annotated though. ApolloScape is a very recent dataset that is not fully public yet [274]. Cityscapes [270] is commonly used for computer vision algorithms as a benchmark set. Mapillary Vistas is a big image dataset with annotations [272]. TorontoCity benchmark [282] is a very detailed dataset; however it is not public yet. The nuScenes dataset is the most recent urban driving dataset with lidar and image sensors [175]. Comma.ai has released a part of their dataset [289] which includes 7.25 hours of driving. In DDD17 [85] around 12 hours of driving data is recorded. The LiVi-Set [277] is a new dataset that has lidar, image and driving behavior. Naturalistic driving data is another type of dataset that concentrates on the individual element of the driving: the driver. SHRP2 [279] includes over 3000 volunteer participants’ driving data over a 3-year collection period. Other naturalistic driving datasets are the 100-Car study [280], euroFOT [281] and NUDrive [278]. Table X shows the comparison of these datasets.
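Since KITTI [174] is the benchmark on which most of the 3D detection and tracking methods discussed above are evaluated, the following is a small sketch of parsing its per-frame object labels into Python objects. The field layout follows the format described in the KITTI object development kit; the file path in the usage comment is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class KittiObject:
    cls: str                 # e.g. 'Car', 'Pedestrian', 'Cyclist'
    bbox: Tuple[float, ...]  # 2D box in image coords: (left, top, right, bottom)
    dims: Tuple[float, ...]  # 3D box size in meters: (height, width, length)
    location: Tuple[float, ...]  # 3D position in camera coords: (x, y, z)
    rotation_y: float        # yaw around the camera Y axis

def load_kitti_labels(label_path: str) -> List[KittiObject]:
    """Parse one KITTI object-label .txt file (one object per line,
    15 space-separated fields as described in the KITTI devkit)."""
    objects = []
    with open(label_path) as f:
        for line in f:
            v = line.split()
            if v[0] == "DontCare":
                continue
            objects.append(KittiObject(
                cls=v[0],
                bbox=tuple(map(float, v[4:8])),
                dims=tuple(map(float, v[8:11])),
                location=tuple(map(float, v[11:14])),
                rotation_y=float(v[14])))
    return objects

# Hypothetical usage:
# objs = load_kitti_labels("training/label_2/000123.txt")
```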
B. Open-source frameworks and simulators
Open-source frameworks are very useful for both researchers and the industry. These frameworks can "democratize" ADS development. Autoware [110], Apollo [290], Nvidia DriveWorks [291] and openpilot [292] are amongst the most used software stacks capable of running an ADS platform in the real world. We utilized Autoware [110] to realize core automated driving functions in this study.
Simulations also have an important place in ADS development. Since the instrumentation of an experimental vehicle still has a high cost and conducting experiments on public road networks is highly regulated, a simulation environment is beneficial for developing certain algorithms/modules before road tests. Furthermore, highly dangerous scenarios, such as a collision with a pedestrian, can be tested in simulations with ease. CARLA [264] is an urban driving simulator developed for this purpose. TORCS [293] was developed for race track simulation. Some researchers have even used computer games such as Grand Theft Auto V [294]. Gazebo [295] is a common simulation environment for robotics. For traffic simulations, SUMO [296] is a widely used open-source platform. Different concepts for integrating real-world measurements into the simulation environment were proposed in [297].
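As an example of how such a simulator is driven from code, the following is a minimal sketch of connecting to a CARLA server, spawning an ego vehicle and streaming lidar sweeps through CARLA's Python API. It follows the 0.9.x API as we recall it; blueprint identifiers, attribute names and the measurement interface may differ between CARLA versions, so treat it as an assumption-laden sketch rather than a verified recipe.

```python
import carla

# Connect to a CARLA server assumed to be running locally on the default port.
client = carla.Client("localhost", 2000)
client.set_timeout(5.0)
world = client.get_world()

# Spawn an ego vehicle at one of the map's predefined spawn points.
blueprints = world.get_blueprint_library()
vehicle_bp = blueprints.filter("vehicle.*")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(vehicle_bp, spawn_point)
vehicle.set_autopilot(True)

# Attach a spinning lidar to the roof and stream sweeps to a callback.
lidar_bp = blueprints.find("sensor.lidar.ray_cast")
lidar_bp.set_attribute("range", "70")
lidar_tf = carla.Transform(carla.Location(x=0.0, z=2.4))
lidar = world.spawn_actor(lidar_bp, lidar_tf, attach_to=vehicle)

def on_scan(measurement):
    # Each measurement corresponds to one lidar sweep.
    print("lidar sweep received at t =", measurement.timestamp)

lidar.listen(on_scan)
```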
X. CONCLUSIONS
In this survey on automated driving systems, we outlined some of the key innovations as well as existing systems. While the promise of automated driving is enticing and already marketed to consumers, this survey has shown that there remain clear gaps in the research. Several architecture models have been proposed, from fully modular to completely end-to-end, each with their own shortcomings. The optimal sensing modality for localization, mapping and perception is still disagreed upon, algorithms still lack accuracy and efficiency, and the need for a proper online assessment has become apparent. Less-than-ideal road conditions are still an open problem, as is dealing with intemperate weather. Vehicle-to-vehicle communication is still in its infancy, while centralized, cloud-based information management has yet to be implemented due to the complex infrastructure required. Human-machine interaction is an under-researched field with many open problems.
The development of automated driving systems relies on advancements in both scientific disciplines and new technologies. As such, we discussed recent research developments that are likely to have a significant impact on automated driving technology, either by overcoming the weaknesses of previous methods or by proposing alternatives. This survey has shown that through interdisciplinary academic collaboration and support from industries and the general public, the remaining challenges can be addressed. With directed efforts towards ensuring robustness at all levels of automated driving systems, safe and efficient roads are just beyond the horizon.

REFERENCES
[1] S. Singh, “Critical reasons for crashes investigated in the national motor vehicle crash causation survey,” Tech. Rep., 2015.
[2] T. J. Crayton and B. M. Meier, “Autonomous vehicles: Developing a public health research agenda to frame the future of transportation policy,” Journal of Transport & Health, vol. 6, pp. 245–252,
  • 61. 2017. [3] W. D. Montgomery, R. Mudge, E. L. Groshen, S. Helper, J. P. MacDuffie, and C. Carson, “America’s workforce and the self- driving future: Realizing productivity gains and spurring economic growth,” 2018. [4] Eureka. E! 45: Programme for a european traffic system with highest efficiency and unprecedented safety. https://www.eurekanetwork.org/ project/id/45. [Retrieved May 19, 2019]. [5] B. Ulmer, “Vita ii-active collision avoidance in real traffic,” in Intel- ligent Vehicles’ 94 Symposium, Proceedings of the. IEEE, 1994, pp. 1–6. [6] M. Buehler, K. Iagnemma, and S. Singh, “The 2005 darpa grand challenge: the great robot race,” vol. 36, 2007. [7] ——, “The darpa urban challenge: autonomous vehicles in city traffic,” vol. 56, 2009. [8] A. Broggi, P. Cerri, M. Felisa, M. C. Laghi, L. Mazzei, and P. P. Porta, “The vislab intercontinental autonomous challenge: an extensive test for a platoon of intelligent vehicles,” International Journal of Vehicle
  • 62. Autonomous Systems, vol. 10, no. 3, pp. 147–164, 2012. [9] A. Broggi, P. Cerri, S. Debattisti, M. C. Laghi, P. Medici, D. Molinari, M. Panciroli, and A. Prioletti, “Proud-public road urban driverless-car test,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 6, pp. 3508–3519, 2015. [10] P. Cerri, G. Soprani, P. Zani, J. Choi, J. Lee, D. Kim, K. Yi, and A. Broggi, “Computer vision at the hyundai autonomous challenge,” in 14th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2011, pp. 777–783. [11] R. McAllister, Y. Gal, A. Kendall, M. Van Der Wilk, A. Shah, R. Cipolla, and A. V. Weller, “Concrete problems for autonomous vehicle safety: advantages of bayesian deep learning.” International Joint Conferences on Artificial Intelligence, Inc., 2017. [12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105. [13] B. Schwarz, “Lidar: Mapping the world in 3d,” Nature Photonics, vol. 4, no. 7, p. 429, 2010.
  • 63. [14] S. Hecker, D. Dai, and L. Van Gool, “End-to-end learning of driving models with surround-view cameras and route planners,” in Proceed- ings of the European Conference on Computer Vision (ECCV), 2018, pp. 435–453. [15] D. Lavrinc. This is how bad self-driving cars suck in rain. https://jalopnik.com/this-is-how-bad-self-driving-cars-suck-in- the- rain-1666268433. [Retrieved December 16, 2018]. [16] A. Davies. Google’s self-driving car caused its first crash. https://www.wired.com/2016/02/googles-self-driving-car-may- caused- first-crash/. [Retrieved December 16, 2018]. [17] M. McFarland. Who’s responsible when an autonomous car crashes? https://money.cnn.com/2016/07/07/technology/tesla- liability- risk/index.html. [Retrieved June 4, 2019]. [18] T. B. Lee. Autopilot was active when a tesla crashed into a truck, killing driver. https://arstechnica.com/cars/2019/05/feds- autopilot-was- active-during-deadly-march-tesla-crash/. [Retrieved May 19, 2019]. [19] SAE, “Taxonomy and definitions for terms related to driving automa- tion systems for on-road motor vehicles,” SAE J3016, 2016, Tech. Rep. [20] J. Levinson, J. Askeland, J. Becker, J. Dolson, D. Held, S.
  • 64. Kammel, J. Z. Kolter, D. Langer, O. Pink, V. Pratt et al., “Towards fully autonomous driving: Systems and algorithms,” in Intelligent Vehicles Symposium (IV), 2011 IEEE. IEEE, 2011, pp. 163–168. [21] M. Campbell, M. Egerstedt, J. P. How, and R. M. Murray, “Autonomous driving in urban environments: approaches, lessons and challenges,” Philosophical Transactions of the Royal Society of London A: Math- ematical, Physical and Engineering Sciences, vol. 368, no. 1928, pp. 4649–4672, 2010. [22] B. Paden, M. Čáp, S. Z. Yong, D. Yershov, and E. Frazzoli, “A survey of motion planning and control techniques for self-driving urban vehicles,” IEEE Transactions on intelligent vehicles, vol. 1, no. 1, pp. 33– 55, 2016. [23] D. González, J. Pérez, V. Milanés, and F. Nashashibi, “A review of mo- tion planning techniques for automated vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 4, pp. 1135– 1145, 2016. [24] J. Van Brummelen, M. O’Brien, D. Gruyer, and H. Najjaran, “Au- tonomous vehicle perception: The technology of today and tomorrow,”
  • 65. Transportation research part C: emerging technologies, vol. 89, pp. 384–406, 2018. [25] G. Bresson, Z. Alsayed, L. Yu, and S. Glaser, “Simultaneous localiza- tion and mapping: A survey of current trends in autonomous driving,” IEEE Transactions on Intelligent Vehicles, vol. 20, pp. 1–1, 2017. [26] K. Abboud, H. A. Omar, and W. Zhuang, “Interworking of dsrc and cellular network technologies for v2x communications: A survey,” IEEE transactions on vehicular technology, vol. 65, no. 12, pp. 9457– 9470, 2016. [27] C. Badue, R. Guidolini, R. V. Carneiro, P. Azevedo, V. B. Cardoso, A. Forechi, L. F. R. Jesus, R. F. Berriel, T. M. Paixão, F. Mutz et al., “Self-driving cars: A survey,” arXiv preprint arXiv:1901.04407, 2019. [28] W. Schwarting, J. Alonso-Mora, and D. Rus, “Planning and decision- making for autonomous vehicles,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 1, pp. 187–210, 2018. https://www.eurekanetwork.org/project/id/45 https://www.eurekanetwork.org/project/id/45 https://jalopnik.com/this-is-how-bad-self-driving-cars-suck-in- the-rain-1666268433
  • 66. https://jalopnik.com/this-is-how-bad-self-driving-cars-suck-in- the-rain-1666268433 https://www.wired.com/2016/02/googles-self-driving-car-may- caused-first-crash/ https://www.wired.com/2016/02/googles-self-driving-car-may- caused-first-crash/ https://money.cnn.com/2016/07/07/technology/tesla-liability- risk/index.html https://money.cnn.com/2016/07/07/technology/tesla-liability- risk/index.html https://arstechnica.com/cars/2019/05/feds-autopilot-was-active- during-deadly-march-tesla-crash/ https://arstechnica.com/cars/2019/05/feds-autopilot-was-active- during-deadly-march-tesla-crash/ [29] Department of Economic and Social Affairs (DESA), Population Di- vision, “The 2017 revision, key findings and advance tables,” in World Population Prospects. United Nations, 2017, no. ESA/P/WP/248. [30] Deloitte. 2019 deloitte global automotive consumer study – advanced vehicle technologies and multimodal transportation, global focus coun- tries. https://www2.deloitte.com/content/dam/Deloitte/us/Documents/ manufacturing/us-global-automotive-consumer-study-2019.pdf. [Retrieved May 19, 2019]. [31] Federatione Internationale de l’Automobile (FiA) Region 1. The auto- motive digital transformation and the economic impacts of existing data
  • 67. access models. https://www.fiaregion1.com/wp- content/uploads/2019/ 03/The-Automotive-Digital-Transformation Full-study.pdf. [Retrieved May 19, 2019]. [32] R. Rajamani, “Vehicle dynamics and control,” 2011. [33] M. R. Hafner, D. Cunningham, L. Caminiti, and D. Del Vecchio, “Cooperative collision avoidance at intersections: Algorithms and ex- periments,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1162–1175, 2013. [34] A. Colombo and D. Del Vecchio, “Efficient algorithms for collision avoidance at intersections,” in Proceedings of the 15th ACM inter- national conference on Hybrid Systems: Computation and Control. ACM, 2012, pp. 145–154. [35] P. E. Ross, “The audi a8: the world’s first production car to achieve level 3 autonomy,” IEEE Spectrum, 2017. [36] C. Gold, M. Körber, D. Lechner, and K. Bengler, “Taking over control from highly automated vehicles in complex traffic situations: the role of traffic density,” Human factors, vol. 58, no. 4, pp. 642–652, 2016. [37] N. Merat, A. H. Jamson, F. C. Lai, M. Daly, and O. M.
  • 68. Carsten, “Transition to manual: Driver behaviour when resuming control from a highly automated vehicle,” Transportation research part F: traffic psychology and behaviour, vol. 27, pp. 274–282, 2014. [38] E. Ackerman, “Toyota’s gill pratt on self-driving cars and the reality of full autonomy,” IEEE Spectrum, 2017. [39] J. D’Onfro. ‘I hate them’: Locals reportedly are frustrated with alphabet’s self-driving cars. https://www.cnbc.com/2018/08/28/locals- reportedly-frustrated-with-alphabets-waymo-self-driving- cars.html. [Retrieved May 19, 2019]. [40] J.-F. Bonnefon, A. Shariff, and I. Rahwan, “The social dilemma of autonomous vehicles,” Science, vol. 352, no. 6293, pp. 1573– 1576, 2016. [41] ——, “Autonomous vehicles need experimental ethics: Are we ready for utilitarian cars?” arXiv preprint arXiv:1510.03346, 2015. [42] Y. Tian, K. Pei, S. Jana, and B. Ray, “Deeptest: Automated testing of deep-neural-network-driven autonomous cars,” in Proceedings of the 40th International Conference on Software Engineering. ACM, 2018, pp. 303–314.
  • 69. [43] C. Urmson, J. Anhalt, D. Bagnell, C. Baker, R. Bittner, M. Clark, J. Dolan, D. Duggins, T. Galatali, C. Geyer et al., “Autonomous driving in urban environments: Boss and the urban challenge,” Journal of Field Robotics, vol. 25, no. 8, pp. 425–466, 2008. [44] M. Gerla, E.-K. Lee, G. Pau, and U. Lee, “Internet of vehicles: From intelligent grid to autonomous cars and vehicular clouds,” in IEEE World Forum on Internet of Things (WF-IoT). IEEE, 2014, pp. 241– 246. [45] E.-K. Lee, M. Gerla, G. Pau, U. Lee, and J.-H. Lim, “Internet of vehicles: From intelligent grid to autonomous cars and vehicular fogs,” International Journal of Distributed Sensor Networks, vol. 12, no. 9, p. 1550147716665500, 2016. [46] M. Amadeo, C. Campolo, and A. Molinaro, “Information- centric networking for connected vehicles: a survey and future perspectives,” IEEE Communications Magazine, vol. 54, no. 2, pp. 98–104, 2016. [47] J. Wei, J. M. Snider, J. Kim, J. M. Dolan, R. Rajkumar, and B. Litkouhi, “Towards a viable autonomous driving research platform,” in Intelligent
Vehicles Symposium (IV), 2013 IEEE. IEEE, 2013, pp. 763–770.

Ekim Yurtsever (Member, IEEE) received his B.S. and M.S. degrees from Istanbul Technical University in 2012 and 2014, respectively. He is currently a Ph.D. candidate in Information Science at Nagoya University, Japan. His research interests include automated driving systems and machine learning.
Jacob Lambert (Student Member, IEEE) received his B.S. in Honours Physics in 2014 from McGill University in Montreal, Canada, and his M.A.Sc. in 2017 from the University of Toronto, Canada. He is currently a Ph.D. candidate at Nagoya University, Japan. His current research focuses on 3D perception with lidar sensors for autonomous robotics.

Alexander Carballo (Member, IEEE) received his Dr.Eng. degree from the Intelligent Robot Laboratory, University of Tsukuba, Japan. From 1996 to 2006, he worked as a lecturer at the School of Computer Engineering, Costa Rica Institute of Technology. From 2011 to 2017, he worked in research and development at Hokuyo Automatic Co., Ltd. Since 2017, he has been a Designated Assistant Professor at the Institutes of Innovation for Future Society, Nagoya University, Japan. His main research interests are lidar sensors, robotic perception and autonomous driving.

Kazuya Takeda (Senior Member, IEEE) received his B.E.E., M.E.E., and Ph.D. from Nagoya University, Japan. From 1985, he worked at the Advanced Telecommunication Research Laboratories and at KDD R&D Laboratories, Japan. In 1995, he started a research group for signal processing applications at Nagoya University. He is currently a Professor at the Institutes of Innovation for Future Society, Nagoya University, and with Tier IV Inc. He also serves as a member of the Board of Governors of the IEEE ITS Society. His main focus is investigating driving behavior using data-centric approaches, utilizing signal corpora of real driving behavior.
King Faisal University
College of Computer Science and IT, Department of Computer Science
Computer Science MS Program, 1440/1441H – 2019/2020
MAS: Assignment 2, Due date: 2020

Q. Please write a summary of the article in your own words, covering all of its sections. The summary may span 5–6 pages, with an added section for your own comments (strengths and weaknesses of the paper). Marks: 30