Deep Learning for Highway Driving
I. INTRODUCTION
Fig. 1: Sample output from our neural network capable of lane and vehicle detection.

Since the DARPA Grand Challenges for autonomous vehicles, there has been an explosion in applications and research for self-driving cars. Among the different environments for self-driving cars, highway and urban roads are on opposite ends of the spectrum. In general, highways tend to be more predictable and orderly, with road surfaces typically well-maintained and lanes well-marked. In contrast, residential or urban driving environments feature a much higher degree of unpredictability with many generic objects, inconsistent lane-markings, and elaborate traffic flow patterns. The relative regularity and structure of highways has facilitated some of the first practical applications of autonomous driving technology. Many automakers have begun pursuing highway auto-pilot solutions designed to mitigate driver stress and fatigue and to provide additional safety features; for example, certain advanced-driver assistance systems (ADAS) can both keep cars within their lane and perform front-view car detection. Currently, human drivers retain liability and, as such, must keep their hands on the steering wheel and be prepared to control the vehicle in the event of any unexpected obstacle or catastrophic incident. Financial considerations contribute to a substantial performance gap between commercially available auto-pilot systems and the fully self-driving cars developed by Google and others. Namely, today's self-driving cars are equipped with expensive but critical sensors, such as LIDAR, radar, and high-precision GPS coupled with highly detailed maps.

In today's production-grade autonomous vehicles, critical sensors include radar, sonar, and cameras. Long-range vehicle detection typically requires radar, while nearby car detection can be solved with sonar. Computer vision can play an important role in lane detection as well as redundant object detection at moderate distances. Radar works reasonably well for detecting vehicles, but has difficulty distinguishing between different metal objects and thus can register false positives on objects such as tin cans. Also, radar provides little orientation information and has a higher variance on the lateral position of objects, making localization difficult on sharp bends. The utility of sonar is both compromised at high speeds and, even at slow speeds, limited to a working distance of about 2 meters. Compared to sonar and radar, cameras generate a richer set of features at a fraction of the cost. By advancing computer vision, cameras could serve as a reliable redundant sensor for autonomous driving. Despite its potential, computer vision has yet to assume a significant role in today's self-driving cars. Classic computer vision techniques simply have not provided the robustness required for production-grade vehicles; these techniques require intensive hand-engineering, road modeling, and special case handling. Considering the seemingly infinite number of specific driving situations, environments, and unexpected obstacles, the task of scaling classic computer vision to robust, human-level performance would prove monumental and is likely unrealistic.
Deep learning, or neural networks, represents an alternative approach to computer vision. It shows considerable promise as a solution to the shortcomings of classic computer vision. Recent progress in the field has advanced the feasibility of deep learning applications for solving complex, real-world problems, and industry has responded with increasing uptake of the technology. Deep learning is data-centric, requiring heavy computation but minimal hand-engineering. In the last few years, an increase in available storage and compute capabilities has enabled deep learning to achieve success in supervised perception tasks, such as image detection. A neural network, after training for days or even weeks on a large data set, can be capable of inference in real time with a model size no larger than a few hundred MB [9]. State-of-the-art neural networks for computer vision require very large training sets coupled with extensive networks capable of modeling such immense volumes of data. For example, the ILSVRC data set, where neural networks achieve top results, contains 1.2 million images in over 1000 categories.

By using expensive existing sensors which are currently used for self-driving applications, such as LIDAR and mm-accurate GPS, and calibrating them with cameras, we can create a video data set containing labeled lane-markings and annotated vehicles with location and relative speed. By building a labeled data set covering all types of driving situations (rain, snow, night, day, etc.), we can evaluate neural networks on this data to determine whether they are robust in every driving environment and situation for which we have training data.

In this paper, we detail an empirical evaluation on the data set we collect. In addition, we explain the neural network that we applied for detecting lanes and cars, as shown in Figure 1.

II. RELATED WORK

Recently, computer vision has been expected to play a larger role within autonomous driving. However, due to its history of relatively low precision, it is typically used in conjunction with either other sensors or other road models [3], [4], [6], [7]. Cho et al. [3] use multiple sensors, such as LIDAR, radar, and computer vision, for object detection. They then fuse these sensors together in a Kalman filter using motion models on the objects. Held et al. [4] use only a deformable parts based model on images to get detections, then use road models to filter out false positives. Caraffi et al. [6] use a WaldBoost detector along with a tracker to generate pixel-space detections in real time. Jazayeri et al. [7] rely on temporal information of features for detection, and then filter out false positives with a front-view motion model.

In contrast to these object detectors, we do not use any road or motion-based models; instead we rely only on the robustness of a neural network to make reasonable predictions. In addition, we currently do not rely on any temporal features, and the detector operates independently on single frames from a monocular camera. To make up for the lack of other sensors, which estimate object depth, we train the neural network to predict depth based on labels extracted from radar returns. Although the model only predicts a single depth value for each object, Eigen et al. have shown how a neural network can predict entire depth maps from single images [12]. The network we train likely learns some model of the road for object detection and depth predictions, but it is never explicitly engineered and instead learns from the annotations alone.

Before the widespread adoption of Convolutional Neural Networks (CNNs) within computer vision, deformable parts based models were the most successful methods for detection [13]. After the popular CNN model AlexNet [9] was proposed, state-of-the-art detection shifted towards CNNs for feature extraction [1], [14], [10], [15]. Girshick et al. developed R-CNN, a two-part system which used Selective Search [16] to propose regions and AlexNet to classify them. R-CNN achieved state-of-the-art results on Pascal by a large margin; however, due to its nearly 1000 classification queries and inefficient re-use of convolutions, it remains impractical for real-time implementations. Szegedy et al. presented a more scalable alternative to R-CNN that relies on the CNN to propose higher quality regions compared to Selective Search. This reduces the number of region proposals down to as low as 79 while keeping the mAP competitive with Selective Search. An even faster approach to image detection, called Overfeat, was presented by Sermanet et al. [1]. By using a regular pattern of "region proposals", Overfeat can efficiently reuse convolution computations from each layer, requiring only a single forward pass for inference.

For our empirical evaluation, we use a straightforward application of Overfeat, due to its efficiency, and combine this with labels similar to the ones proposed by Szegedy et al. We describe the model and similarities in the next section.

III. REAL TIME VEHICLE DETECTION

Convolutional Neural Networks (CNNs) have had the largest success in image recognition in the past 3 years [9], [17], [18], [19]. From these image recognition systems, a number of detection networks were adapted, leading to further advances in image detection. While the improvements have been staggering, not much consideration had been given to the real-time detection performance required for some applications. In this paper, we present a detection system capable of operating at greater than 10Hz using nothing but a laptop GPU. Due to the requirements of highway driving, we need to ensure that the system can detect cars more than 100m away and can operate at speeds greater than 10Hz; this distance requires higher image resolutions than are typically used, in our case 640 × 480. We use the Overfeat CNN detector, which is very scalable and simulates a sliding window detector in a single forward pass through the network by efficiently reusing convolutional results on each layer [1]. Other detection systems, such as R-CNN, rely on selecting as many as 1000 candidate windows, where each is evaluated independently and does not reuse convolutional results.

In our implementation, we make a few minor modifications to Overfeat's labels in order to handle occlusions of cars, predict lanes, and accelerate performance during inference. We will first provide a brief overview of the original implementation and will then address the modifications. Overfeat converts an image recognition CNN into a "sliding window" detector by providing a larger resolution image and transforming the fully connected layers into convolutional layers.
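This conversion is what lets Overfeat share computation across windows: a fully connected head trained on fixed-size feature patches becomes a convolution, so a larger input yields a whole grid of window scores in one pass. A minimal numpy sketch of the equivalence, using toy sizes rather than Overfeat's actual dimensions:

```python
import numpy as np

# Toy illustration of the Overfeat trick: a fully connected layer trained on
# k x k feature patches is equivalent to a k x k convolution, so evaluating a
# larger feature map produces a grid of "sliding window" outputs in one pass.
# (Hypothetical sizes; not the real network's dimensions.)

rng = np.random.default_rng(0)
k = 3                                # spatial extent the FC head was trained on
W = rng.standard_normal((k * k,))    # FC weights for a single output unit

def fc_on_patch(patch):
    """Classifier head applied to one k x k window."""
    return float(patch.ravel() @ W)

def fc_as_conv(feat):
    """Same head applied convolutionally over a larger feature map."""
    H, Wd = feat.shape
    out = np.empty((H - k + 1, Wd - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = fc_on_patch(feat[i:i + k, j:j + k])
    return out

feat = rng.standard_normal((6, 6))   # larger input -> 4 x 4 output grid
grid = fc_as_conv(feat)
assert grid.shape == (4, 4)
# Each grid cell matches evaluating the window detector independently:
assert np.isclose(grid[1, 2], fc_on_patch(feat[1:4, 2:5]))
```

The sketch only demonstrates the equivalence; in the real network the savings come from the convolutional layers below the head, whose feature maps are computed once and reused by every window, rather than recomputed per crop as in R-CNN-style pipelines.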
Fig. 3: overfeat-mask
A. Lane Detection
The CNN used for vehicle detection can be easily extended
for lane boundary detection by adding an additional class.
Whereas the regression for the vehicle class predicts a five-dimensional value (four for the bounding box and one for
depth), the lane regression predicts six dimensions. Similar to
the vehicle detector, the first four dimensions indicate the two
end points of a local line segment of the lane boundary. The
remaining two dimensions indicate the depth of the endpoints
with respect to the camera. Fig 4 visualizes the lane boundary
ground truth label overlaid on an example image. The green
tiles indicate locations where the detector is trained to fire, and the line segments represented by the regression labels are explicitly drawn.

Fig. 4: Example of lane boundary ground truth

The line segments have their ends connected to form continuous splines. The depths of the line segments are color-coded such that the closest segments are red and the furthest ones are blue. Due to our data collection methods for lane labels, we are able to obtain ground truth even for lane boundaries occluded by other objects. This forces the neural network to learn more than a simple paint detector; it must use context to predict lanes where there are occlusions.
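The label layout described above can be made concrete with a short sketch. The helper names and values here are illustrative assumptions, not the paper's exact label encoding:

```python
import numpy as np

# Illustrative sketch of the regression targets described above: each "on"
# tile regresses a 5-vector for vehicles (bounding box + one depth) and a
# 6-vector for lane boundaries (two segment endpoints + two depths).
# Hypothetical helpers; the paper's actual encoding is not reproduced here.

def vehicle_target(x1, y1, x2, y2, depth_m):
    """Five regression dimensions: bounding box corners and one depth."""
    return np.array([x1, y1, x2, y2, depth_m], dtype=np.float32)

def lane_target(p_near, p_far, d_near, d_far):
    """Six regression dimensions: the two endpoints of a local line
    segment plus the depth of each endpoint w.r.t. the camera."""
    (x1, y1), (x2, y2) = p_near, p_far
    return np.array([x1, y1, x2, y2, d_near, d_far], dtype=np.float32)

car = vehicle_target(120, 80, 180, 130, depth_m=42.0)
seg = lane_target((310, 400), (330, 360), d_near=8.0, d_far=14.5)
assert car.shape == (5,) and seg.shape == (6,)
```

Connecting consecutive segment endpoints then yields the continuous splines, and the per-endpoint depths allow the color-coded 3D visualization.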
Similar to the vehicle detector, we use L1 loss to train the regressor, with mini-batch stochastic gradient descent for optimization. The learning rate is controlled by a variant of the momentum scheduler [11]. To obtain semantic lane information, we use DBSCAN to cluster the line segments into lanes. Fig 5 shows our lane predictions after DBSCAN clustering; different lanes are represented by different colors. Since our regressor outputs depths as well, we can predict the lane shapes in 3D using inverse camera perspective mapping.

Fig. 5: Example output of lane detector after DBSCAN clustering

IV. EXPERIMENTAL SETUP

A. Data Collection

Our research vehicle is a 2014 Infiniti Q50. The car currently uses the following sensors: 6x Point Grey Flea3 cameras, 1x Velodyne HDL32E LIDAR, and 1x Novatel SPAN-SE receiver. We also have access to the Q50 built-in Continental mid-range radar system. The sensors are connected to a Linux PC with a Core i7-4770k processor.

Once the raw videos are collected, we annotate the 3D locations of vehicles and lanes as well as the relative speed of all the vehicles. To get vehicle annotations, we follow the conventional approach of using Amazon Mechanical Turk to get accurate bounding box locations within pixel space. Then, we match bounding boxes and radar returns to obtain the distance and relative speed of the vehicles.
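The box-to-radar matching step can be sketched as follows, assuming radar returns have already been projected into image coordinates via the camera calibration; the data layout and tie-breaking rule are illustrative assumptions, not the paper's exact procedure:

```python
# Simplified sketch of the annotation-matching step: each AMT bounding box
# is paired with a radar return whose projected image location falls inside
# the box, inheriting that return's distance and relative speed.
# Data structures and the nearest-return tie-break are assumptions.

def match_boxes_to_radar(boxes, radar_returns):
    """boxes: list of (x1, y1, x2, y2) in pixels; radar_returns: list of
    (u, v, range_m, rel_speed_mps) already projected into the image."""
    matches = []
    for (x1, y1, x2, y2) in boxes:
        inside = [r for r in radar_returns
                  if x1 <= r[0] <= x2 and y1 <= r[1] <= y2]
        if inside:
            # Prefer the closest return when several project into the box.
            u, v, range_m, speed = min(inside, key=lambda r: r[2])
            matches.append({"box": (x1, y1, x2, y2),
                            "distance_m": range_m,
                            "rel_speed_mps": speed})
    return matches

boxes = [(100, 50, 200, 120), (300, 60, 360, 110)]
returns = [(150, 90, 35.0, -2.1), (320, 80, 60.0, 1.4), (150, 85, 80.0, 0.0)]
out = match_boxes_to_radar(boxes, returns)
assert out[0]["distance_m"] == 35.0 and out[1]["distance_m"] == 60.0
```

Boxes with no return inside simply receive no depth label under this scheme; the radar/camera calibration errors discussed in the evaluation would surface here as returns projecting outside their true box.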
Fig. 8: Lane detection results on different lateral lanes. (a) Ego-lane left border. (b) Ego-lane right border. (c) Left adjacent lane left border. (d) Right adjacent lane right border.

and 30 mins of driving. The accuracy of the vehicle bounding box predictions was measured using Intersection Over Union (IOU) against the ground truth boxes from Amazon Mechanical Turk (AMT). A bounding box prediction matched with ground truth if IOU ≥ 0.5. The performance of our car detection as a function of depth can be seen in Fig 9. Nearby false positives pose the largest problems for ADAS systems, as they could cause the system to needlessly apply the brakes. In our system, we found overpasses and shading effects to cause the largest problems. Two examples of these situations are shown in Fig 10.

As a baseline for our car detector, we compared the detection results to the Continental mid-range radar within our data collection vehicle. While matching radar returns to ground truth bounding boxes, we found that although radar had nearly 100% precision, false positives were being introduced through errors in radar/camera calibration. Therefore, to ensure a fair comparison, we matched every radar return to a ground truth bounding box even if IOU < 0.5, giving our radar returns 100% precision. This comparison is shown in Fig 11; the F1 score for radar is simply the recall.

In addition to the bounding box locations, we measured the accuracy of the predicted depth by using radar returns as ground truth. The standard error in the depth predictions as a function of depth can be seen in Fig 12.

For a qualitative review of the detection system, we have uploaded a 1.5 hour video of the vehicle detector run on our test set. This may be found at [Link]/GJ0cZBkHoHc. A short video of our lane detector may also be found online at [Link]/__f5pqqp6aM. In these videos, we evaluate the detector on every frame independently and display the raw detections, without the use of any Kalman filters or road models. The red locations in the video correspond to the mask detectors that are activated. This network was only trained on the rear view of cars traveling in the same direction, which is why cars across the highway barrier are commonly missed.

Fig. 11: Radar Comparison to Vehicle Detector
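The IOU ≥ 0.5 matching criterion used in the evaluation above can be sketched directly for axis-aligned boxes given as (x1, y1, x2, y2):

```python
# Intersection-over-union matching for axis-aligned boxes, as used in the
# evaluation: a prediction matches ground truth when IOU >= 0.5.

def iou(a, b):
    """IOU of two boxes (x1, y1, x2, y2) with x1 < x2, y1 < y2."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def matches_ground_truth(pred, gt, threshold=0.5):
    return iou(pred, gt) >= threshold

assert iou((0, 0, 2, 2), (0, 0, 2, 2)) == 1.0
# Half-overlapping unit-height boxes: IOU = 2 / 6, below the threshold.
assert matches_ground_truth((0, 0, 2, 2), (1, 0, 3, 2)) is False
```

Under the relaxed radar comparison described above, this threshold is waived for radar returns, which are matched to ground truth boxes even when IOU < 0.5.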
We have open sourced the code for the vehicle and lane detector online at [Link]/brodyh/caffe. Our repository was forked from the original Caffe code base from the BVLC group [20].

Fig. 12: Car Detector Depth Performance

V. CONCLUSION

By using camera, LIDAR, radar, and GPS, we built a highway data set consisting of 17 thousand image frames with vehicle bounding boxes and over 616 thousand image frames with lane annotations. We then trained on this data using a CNN architecture capable of detecting all lanes and cars in a single forward pass. Using a single GTX 780 Ti, our system runs at 44Hz, which is more than adequate for real-time use. Our results show existing CNN algorithms are capable of good performance in highway lane and vehicle detection. Future work will focus on acquiring frame-level annotations that will allow us to develop new neural networks capable of using temporal information across frames.

ACKNOWLEDGMENT

This research was funded in part by Nissan, who generously donated the car used for data collection. We thank our colleague Yuta Yoshihata from Nissan, who provided technical support and expertise on vehicles that assisted the research. In addition, the authors would like to thank the author of Overfeat, Pierre Sermanet, for his helpful suggestions on image detection.

REFERENCES

[1] Sermanet, Pierre, et al. "Overfeat: Integrated recognition, localization and detection using convolutional networks." arXiv preprint arXiv:1312.6229 (2013).
[2] Rothengatter, Talib, and Enrique Carbonell Vaya, eds. "Traffic and transport psychology: Theory and application." International Conference of Traffic and Transport Psychology, May 1996, Valencia, Spain. Pergamon/Elsevier Science Inc, 1997.
[3] Cho, Hyunggi, et al. "A multi-sensor fusion system for moving object detection and tracking in urban driving environments." Robotics and Automation (ICRA), 2014 IEEE International Conference on. IEEE, 2014.
[4] Held, David, Jesse Levinson, and Sebastian Thrun. "A probabilistic framework for car detection in images using context and scale." Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012.
[5] Levinson, Jesse, et al. "Towards fully autonomous driving: systems and algorithms." Intelligent Vehicles Symposium, 2011.
[6] Caraffi, Claudio, et al. "A system for real-time detection and tracking of vehicles from a single car-mounted camera." Intelligent Transportation Systems (ITSC), 2012 15th International IEEE Conference on. IEEE, 2012.
[7] Jazayeri, Amirali, et al. "Vehicle detection and tracking in car video based on motion model." Intelligent Transportation Systems, IEEE Transactions on 12.2 (2011): 583-595.
[8] Bradski, Gary. "The OpenCV library." Dr. Dobb's Journal 25.11 (2000): 120-126.
[9] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
[10] Szegedy, Christian, Alexander Toshev, and Dumitru Erhan. "Deep neural networks for object detection." Advances in Neural Information Processing Systems. 2013.
[11] Sutskever, Ilya, et al. "On the importance of initialization and momentum in deep learning." Proceedings of the 30th International Conference on Machine Learning (ICML-13). 2013.
[12] Eigen, David, Christian Puhrsch, and Rob Fergus. "Depth map prediction from a single image using a multi-scale deep network." Advances in Neural Information Processing Systems. 2014.
[13] Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." Pattern Analysis and Machine Intelligence, IEEE Transactions on 32.9 (2010): 1627-1645.
[14] Szegedy, Christian, et al. "Scalable, High-Quality Object Detection." arXiv preprint arXiv:1412.1441 (2014).
[15] Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014.
[16] Uijlings, Jasper R. R., et al. "Selective search for object recognition." International Journal of Computer Vision 104.2 (2013): 154-171.
[17] Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).
[18] He, Kaiming, et al. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." arXiv preprint arXiv:1502.01852 (2015).
[19] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
[20] Jia, Yangqing, et al. "Caffe: Convolutional architecture for fast feature embedding." Proceedings of the ACM International Conference on Multimedia. ACM, 2014.