
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/COMST.2017.2707140, IEEE Communications Surveys & Tutorials

State-of-the-Art Deep Learning: Evolving Machine Intelligence Toward Tomorrow's Intelligent Network Traffic Control Systems

Zubair Md. Fadlullah, Senior Member, IEEE, Fengxiao Tang, Student Member, IEEE, Bomin Mao, Student Member, IEEE, Nei Kato, Fellow, IEEE, Osamu Akashi, Takeru Inoue, and Kimihiro Mizutani

Abstract—Currently, the network traffic control systems are mainly composed of the Internet core and wired/wireless heterogeneous backbone networks. Recently, these packet-switched systems have been experiencing an explosive growth in network traffic due to the rapid development of communication technologies. The existing network policies are not sophisticated enough to cope with the continually varying network conditions arising from the tremendous traffic growth. Deep learning, with the recent breakthrough in the machine learning/intelligence area, appears to be a viable approach for network operators to configure and manage their networks in a more intelligent and autonomous fashion. While deep learning has received significant research attention in a number of other domains such as computer vision, speech recognition, and robotics, its applications in network traffic control systems are relatively recent and have garnered rather little attention. In this paper, we address this point and indicate the necessity of surveying the scattered works on deep learning applications for various network traffic control aspects. In this vein, we provide an overview of the state-of-the-art deep learning architectures and algorithms relevant to network traffic control systems. We also discuss the deep learning enablers for network systems. In addition, we discuss, in detail, a new use case, i.e., deep learning based intelligent routing. We demonstrate the effectiveness of the deep learning based routing approach in contrast with the conventional routing strategy. Furthermore, we discuss a number of open research issues, which researchers may find useful in the future.

Index Terms—Machine learning, machine intelligence, artificial neural network, deep learning, deep belief system, network traffic control, routing.

Z. M. Fadlullah, F. Tang, B. Mao, and N. Kato are with the Graduate School of Information Sciences, Tohoku University, Sendai, Japan. Emails: {zubair, fengxiao.tang, bomin.mao, kato}@it.is.tohoku.ac.jp
O. Akashi, T. Inoue, and K. Mizutani are with the Nippon Telegraph and Telephone Corporation (NTT) Network Innovation Laboratories. Emails: {akashi.osamu, inoue.takeru, mizutani.kimihiro}@lab.ntt.co.jp

I. INTRODUCTION

Recently, the rapid development of the current Internet and mobile communications industry has contributed to increasingly large-scale, heterogeneous, dynamic, and systematically complex networks [1]–[3]. The core networks have grown substantially larger as greater switching capacities are introduced in the Internet core, and more and bigger routers with more/faster radio links are deployed in the wireless enterprise backbone networks. Such complex network systems confront a myriad of challenges including management, maintenance, and network traffic optimization [4]. Furthermore, most of these packet-switched networks are experiencing a sharp growth in data traffic owing to the rapid development of mobile user equipment, social networking applications and services, and so forth. The existing network policies are not adequate to adapt to the continually changing network conditions arising from the explosive traffic growth. In these years of tremendous growth in network traffic, while network operators have frequently expressed concern regarding declining profits [5], it is almost the perfect time to rethink how network traffic control can be improved. Therefore, incorporating intelligence into network traffic control systems can play a significant role in guaranteeing Quality of Service (QoS) in Internet Protocol (IP)-based networks [6]. Over the past few decades, machine learning (ML) has been exploited to intelligently dictate traffic control in wired/wireless networks [7]–[10]. Since the early excitement stirred by machine intelligence in the 1950s, smaller subsets of machine intelligence have been impacting a myriad of applications over the last three decades, as shown in Fig. 1. Recently, an even smaller subset of machine intelligence and ML techniques, known as deep learning, has emerged with the potential of creating even larger disruptions [11]–[13]. Notice from Fig. 1 that our focus is different from the traditional one, as we aim to investigate how the state-of-the-art deep learning applications may disrupt computer networks, particularly the network traffic control systems. In order to understand why deep learning systems are anticipated to replace their predecessors (i.e., conventional ML techniques), refer to the various types of ML techniques (supervised, unsupervised, or reinforcement learning) and the different algorithms in Fig. 2, which may be used to implement intelligent decision-making for network traffic control systems. Among the ML techniques, both supervised and unsupervised Artificial Neural Networks (ANNs) have been exploited in a variety of networking fields, ranging from routing to intrusion detection. While conventional, shallow ANNs have frequently been utilized for traffic prediction for proactive network management, their performance is practically limited [6], [14], [15]. This limitation arises from the fact that increasing the number of hidden layers of these ANNs does not improve the performance pertaining to decisions on network operations (e.g., scheduling, routing, and so forth). Very recently, however, there has been a breakthrough in the way deep learning systems such as Deep Belief Networks, Deep ANNs, and Deep Boltzmann

1553-877X (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

[Figure 1 — Evolution of Deep Learning Techniques: a timeline showing Machine Intelligence (1950–1980, early research on Artificial Intelligence (AI) stirs excitement), Machine Learning (ML) (1980–2010, ML techniques begin to flourish), and Deep Learning (2010–, deep learning breakthroughs may significantly impact many research domains including computer vision, speech recognition, robotics, AI gaming, and networking). The figure contrasts the traditional focus (legacy applications of machine intelligence, ML, and deep learning, e.g., the well-studied recognition of objects and speech) with our focus (investigating how deep learning applications are disrupting network traffic control systems, an emerging inter-disciplinary area).]

Fig. 1: Evolution of deep learning from conventional Machine Intelligence and Machine Learning paradigms. Our focus, in this
paper, compared to the traditional one, is on exploring how deep learning applications are disrupting network traffic control
systems.

Machines can lead to significant performance gain [16]–[19]. However, the applications of such deep learning systems have been limited mainly to image/character/pattern recognition and natural language processing. Inspired by the way these deep learning systems provide much better performance in contrast with contemporary ML algorithms, we conduct a survey on the state-of-the-art deep learning techniques for intelligent network traffic control. The contributions of our paper are as follows. First, we provide an overview of the state-of-the-art deep learning techniques. Second, we identify the various network traffic control themes where deep learning techniques have been applied in a rather scattered manner. In our survey, for each of these networking themes, we provide a brief overview of the traditional ML techniques, and then describe how the deep learning methodology can achieve much-improved performance in contrast with the traditional ML approaches. Third, among the various networking themes, we stress a new deep learning application for intelligent routing operations of a backbone network by providing a detailed system model of the deep neural network computational model, step-by-step training and execution details, and experimental results. We also show how the deep learning based intelligent routing technique can outperform conventional routing strategies such as the Open Shortest Path First (OSPF) routing strategy. Fourth, we provide an elaborate discussion on various open research issues related to deep learning applications in networking problems.

The remainder of the paper is organized as follows. In Sec. II, we provide an overview of the state-of-the-art deep learning techniques which may be useful for network traffic control systems. In Sec. III, the deep learning enablers for network systems are discussed. Several commercially available deep learning platforms are also described in that section. Next, in Sec. IV, the state-of-the-art deep learning applications in various networking related systems are extensively surveyed. In Sec. V, a new application of deep learning in the network traffic control system, i.e., deep learning based routing, is discussed. The open research issues related to network-centric deep learning applications are discussed in Sec. VI. Finally, the paper is concluded in Sec. VII.

II. OVERVIEW OF DEEP LEARNING ARCHITECTURES AND APPLICATIONS

Deep learning is a branch of ML based on a set of algorithms which construct computational models aiming to represent high-level data abstractions. In the literature, deep learning has also been referred to as deeply structured learning, hierarchical learning, deep feature learning, and deep representation learning. One of the most common and popular deep learning structures is the multiple-layered model of inputs, commonly known as the deep neural network, which comprises multiple levels of non-linear operations. Other deep learning architectures also exist in the literature [20]. Prior to 2006, searching the parameter space of deep architectures was a formidable research challenge. Recently, deep learning algorithms have largely solved this problem [21]. In this section, we first present a brief overview of the state-of-the-art deep learning architectures and algorithms, followed by their applications in several prominent fields, to highlight the research gap between deep learning applications in network traffic control systems and those in other disciplines.


[Figure 2 — Machine Learning techniques and representative applications:
- Supervised learning — Classification (Binary Decision Tree (BDT), Naïve Bayesian classifier, Neural Network (NN), Convolutional NN (CNN), Deep Belief Network (DBN), Recurrent NN (RNN)): image classification; character recognition; facial recognition; surveillance systems (intrusion detection, etc.); diagnostics. Regression (NN, trees): advertising and business intelligence (Google ads, etc.); weather forecasting; market forecasting; political campaigns.
- Reinforcement learning (Deep Q Networks, Double Q-learning, prioritized experience replay): real-time decisions; game Artificial Intelligence (AI); learning tasks; skill acquisition; personal assistants (Google Now, Microsoft Cortana, Apple Siri, etc.); autonomous ("self-driving") cars.
- Unsupervised learning — Dimensionality reduction (Stacked Auto-Encoders (SAE), Local Linear Embedding, auto-associative NN): big data visualization; feature elicitation; structure discovery; meaningful compression. Clustering (CNN, DBN, K-Means): recommendation engines (Amazon web services, Netflix, etc.); customer segmentation; targeted marketing; filtering algorithms; newsfeeds. Density estimation (Gaussian mixture, kernel density, Deep Boltzmann Machine (DBM), generalized denoising auto-encoders): economics (risk prediction, etc.).]

Fig. 2: Various Machine Learning techniques exploited for solving a myriad of computer science problems. It may be noticed that the deep learning techniques highlighted in the figure have emerged recently, with their use mainly restricted to object recognition; they have not been applied extensively to intelligent network traffic control systems.
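Among the techniques listed in Fig. 2, K-Means is compact enough to illustrate in a few lines. The sketch below (plain Python; the toy 1-D data, two clusters, and initial centroid guesses are illustrative choices, not taken from the paper) shows the two alternating steps that define the algorithm:

```python
# Minimal K-Means sketch (toy example; data and cluster count are illustrative).
# K-Means alternates two steps until the centroids stop moving:
#   1) assign each point to its nearest centroid,
#   2) move each centroid to the mean of its assigned points.

def kmeans(points, centroids, max_iters=100):
    assign = []
    for _ in range(max_iters):
        # Step 1: index of the nearest centroid for every point.
        assign = [min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
                  for p in points]
        # Step 2: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, assign

points = [1.0, 1.2, 0.8, 8.0, 8.4, 7.6]          # two obvious groups
centroids, assign = kmeans(points, [0.0, 10.0])  # arbitrary initial guesses
print(centroids)  # roughly [1.0, 8.0]
print(assign)     # [0, 0, 0, 1, 1, 1]
```

Being unsupervised, the algorithm receives no labels; the cluster structure emerges solely from the distances between points, which is the distinction the taxonomy in Fig. 2 draws between the supervised and unsupervised branches.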

A. Deep Learning Architectures

Deep learning architectures are bio-inspired in the sense that brains have a deep architecture [22], [23], and computers can emulate such deep architectures. Human brains organize their concepts in a hierarchical fashion. For instance, the human brain first learns simpler concepts, and then combines them to represent more abstract ideas. Motivated by this learning technique, researchers have devoted a lot of effort to using many levels of abstraction and processing to solve computational problems. Depending on how the deep architectures are intended to be used, they can be broadly categorized into three types, namely, generative, discriminative, and hybrid deep architectures. A generative deep architecture aims to characterize the high-order correlation properties of the input data for synthesis purposes. On the other hand, a discriminative deep architecture is used for pattern classification or recognition purposes. By combining the generative and discriminative deep architectures, a hybrid model may be constructed, particularly to carry out discrimination tasks, which are aided by the optimized outputs obtained from the generative architecture [20]. It is worth noting that the hybrid deep architecture is not the same as feeding the outputs of a traditional neural network to a Hidden Markov Model (HMM) [24]–[26]. However, regardless of their purpose, with the exception of Convolutional Neural Networks (CNNs), deep architectures could not be successfully trained before 2006. The state-of-the-art deep learning algorithms still hinge upon multi-layer architectures according to the work conducted by Bengio et al. [27]. The key difference is the introduction of greedy layer-wise unsupervised pre-training, which aims to learn a hierarchy of features from a massive, unlabeled dataset, one level at a time (a use example is described in Sec. II-A3). In other words, in each level, a new transformation of the previously learned features is performed and served as the input to the next level [28]–[31]. The features obtained through this process can then be used as the input to either a standard supervised ML predictor (e.g., Support Vector Machines (SVMs), Conditional Random Fields (CRFs), and so forth) or a deep supervised neural network. Since we are interested in deep learning architectures, consider the latter. Each iteration of unsupervised feature learning adds a layer of weights to the deep neural network. At the final stage, the set of layers comprising the learned weights can be used to initialize a deep supervised predictor (e.g., a neural network classifier) or a deep generative model (e.g., a Deep Boltzmann Machine (DBM) [18]). For interested readers, we briefly overview the relevant deep learning models, namely, convolutional and recurrent neural networks, stacked auto-encoders, DBMs, and deep reinforcement learning.

1) Convolutional Neural Networks (CNNs): The Convolutional Neural Network, also known as CNN or ConvNet, is a discriminative deep architecture [32]. At first glance, a CNN may appear to be quite similar to an ordinary ANN since both architectures consist of neurons having learnable (i.e., tunable) weights and biases. Readers familiar with a traditional feed-forward ANN may recall that it receives an input (i.e., a single vector) and transforms the input through a number of hidden layers, as depicted in Fig. 3(a). Each hidden layer comprises a set of learning units called neurons. The neurons of a hidden layer are connected in a fully meshed fashion with those of the previous layer. The last fully connected layer is referred to as the output layer. Each


(a) Shallow ANN architecture. (b) Deep ANN architecture. (c) CNN architecture (convolution, pooling, and fully connected layers producing a class prediction). (d) RNN architecture. (e) Auto-Encoder.
Fig. 3: The architectures of shallow (i.e., regular), deep, convolutional, and recurrent neural networks.
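The fully connected forward pass of Fig. 3(a)/(b) can be written down directly: each neuron takes a weighted sum of its inputs plus a bias and applies a non-linear activation, and stacking more layers makes the network "deeper". A minimal plain-Python sketch (the layer sizes, weight values, and the choice of a sigmoid activation are illustrative assumptions, not taken from the paper):

```python
import math

def sigmoid(z):
    # Non-linear activation applied to each neuron's weighted sum.
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    # One fully connected layer: every neuron sees every input
    # (the "fully meshed" connectivity described in the text).
    return [sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]

def forward(x, layers):
    # Stacking more (weights, biases) pairs yields a deeper ANN.
    for weights, biases in layers:
        x = layer_forward(x, weights, biases)
    return x

# Toy 2-input -> 3-hidden -> 1-output network with arbitrary weights.
layers = [
    ([[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[1.0, -1.0, 0.5]], [0.2]),                                  # output layer
]
output = forward([1.0, 2.0], layers)
print(output)  # a single activation value in (0, 1)
```

The dimensionality problem discussed next is visible even here: with full connectivity, the number of weights per layer is (inputs × neurons), which explodes for inputs such as high-resolution images; convolutional layers avoid this by sharing small localized filters.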

neuron receiving several inputs takes a weighted sum of them, passes it through an activation function, and responds with an output. In contrast with the traditional feed-forward neural network, i.e., a shallow ANN where the input is typically a vector, a deep ANN architecture is shown in Fig. 3(b). The deep ANN structure is a conventional training approach in many areas, but its full connectivity between nodes suffers from the curse of dimensionality: when the input becomes too large and complex, such as high-resolution images, the conventional ANN cannot process it well. Therefore, convolution layers were proposed instead of full connectivity in the neural network layers. The architecture of a deep CNN, which deals with a multi-channeled image as the input, is shown in Fig. 3(c). To deal with this complex input, the CNN consists of several layers of convolutions with nonlinear activation functions to compute the output. As a consequence, the CNN comprises localized connections whereby each region of the input is connected to a neuron in the output. Each layer applies different filters (in the order of hundreds to thousands) and combines their results. In addition, the CNN contains pooling layers for subsampling. During the training phase, a CNN automatically learns the values of its filters based on the given task. For instance, for classifying images, assume that raw pixels, e.g., the image of a cat as shown in Fig. 3(c), are the input of a CNN. In its first layer, the CNN may learn to detect edges from the raw pixels. Then, in the second layer, the CNN employs the edges to detect simple shapes. In this fashion, in the subsequent (i.e., higher) layers, by using these shapes, the CNN may be able to learn higher-level features like facial shapes and so forth. In the final layer, a classifier is used to exploit these high-level features. Deep CNNs have proved to be quite successful in learning task-specific features, which have provided much-improved results, particularly on different computer vision tasks, in contrast with contemporary ML techniques. Generally, CNNs are trained by employing supervised learning methods whereby a large number of input-output pairs are essential. However, obtaining a substantially large training set has been a key challenge in applying CNNs


for solving a new task. In addition, CNNs have been reported to be extensively used in unsupervised methods [33]–[35].

2) Recurrent Neural Networks (RNNs) or Long Short-Term Memory Networks (LSTMs): The Recurrent Neural Network (RNN) can be considered to be a deep generative architecture [36], as shown in Fig. 3(d). The depth of an RNN may be as large as the length of the input data sequence. Therefore, the RNN is particularly useful for modeling sequence data in text and speech. Despite this potential strength, their use was restricted until recently due to the so-called "vanishing gradient" problem [37]. New optimization methods to train generative RNNs that modify stochastic gradient descent have appeared in the literature recently [38]–[40].

3) Stacked Auto-Encoder: Stacked auto-encoders are a good example to demonstrate how the earlier-mentioned greedy layer-wise unsupervised pre-training can be exploited. An auto-encoder refers to an ANN that aims to learn an efficient coding [41] by encoding a set of data, as depicted in Fig. 3(e). The encoded data conveys a compressed representation of the data set. In other words, the auto-encoder can be exploited to perform data compression or dimensionality reduction. The architecture of a typical auto-encoder is as follows. There is an input layer followed by a number of significantly smaller hidden layers, and finally an output layer. The hidden layers encode the input data set while the output layer attempts to reconstruct the input layer. According to Bourlard et al. [42], if the auto-encoder architecture consists of just linear neurons or just a single sigmoid hidden layer, the optimal solution obtained by the auto-encoder is closely related to Principal Component Analysis (PCA). The features learned by one auto-encoder are then used to train another layer of auto-encoders, and so forth. The weights learned via this process are eventually used to initialize a deep neural network [43].

4) Deep Boltzmann Machines: The next deep architecture worth mentioning is the Deep Boltzmann Machine, or DBM. However, readers may first need to briefly review the shallow Boltzmann machine architectures, i.e., the basic Boltzmann machine and the restricted Boltzmann machine [44]. A Boltzmann machine (BM) is a network of binary, stochastic units with an "energy" defined for the network. A typical BM architecture is depicted in Fig. 4(a). While learning is ineffective in a shallow BM, it can be made quite efficient in an architecture called the Restricted Boltzmann Machine (RBM), which does not permit connections among units in the same layer (refer to Fig. 4(b)). After training one RBM, the activities of its hidden units can be treated as data for training a higher-level RBM. This method of stacking RBMs permits training many layers of hidden units in an efficient manner, and constitutes one of the most common deep learning strategies. As each new layer is added, the overall generative model of the stack improves significantly. In addition, RBMs [45] offer a popular architecture to carry out pre-training. For instance, Hinton et al. [28] considered a stack of RBMs whereby the learned feature activations of a single RBM serve as the input data for training the following RBM. Once the pre-training is completed, the RBMs are unfolded to construct a deep network that is fine-tuned by employing the back-propagation algorithm [46]. The stack of RBMs is also referred to as a Deep Boltzmann Machine (DBM) or a Deep Belief Network (DBN). According to [28], [47], the pre-trained DBM is used to initialize a deep neural network, which is then trained with back-propagation similarly to the stacked auto-encoder explained in Sec. II-A3.

5) Deep Reinforcement Learning: Reinforcement learning combines the strengths of both supervised and unsupervised learning methods. In a reinforcement learning method, sparse and time-delayed labels (referred to as "rewards") are used, based on which the agent has to learn how to behave in a given environment. In other words, reinforcement learning enables an "agent" to learn by interacting with its environment. In this manner, the agent continues to build upon its experience so as to maximize its long-term rewards. The most well-known reinforcement learning technique is Q-learning [48], a simplified architecture of which is presented in Fig. 4(c). Recently, deep Q networks, patented by Google, have emerged to represent the Q-function of Q-learning with deep neural network architectures [49]. Many improvements to the deep Q networks have recently been proposed, including Double Q-learning [50], prioritized experience replay [51], the dueling network architecture [52], the extension to continuous action spaces [53], and so forth.

B. Significance of Deep Architectures

A key reason to use the afore-mentioned deep architectures is that they can more efficiently represent a non-linear function [21] compared to shallow ML architectures. In other words, for a non-linear function to be compactly represented, a significantly large (i.e., deep) architecture is required. Another reason for using deep architectures is that they can support transfer learning, i.e., the ability of a learning algorithm to share knowledge across various tasks. Because deep learning algorithms learn features which capture underlying factors, they may be useful for carrying out other tasks. This idea of knowledge sharing was demonstrated in [27]. In addition, in the two transfer learning challenges held in 2011, learning algorithms leveraging deep architectures exhibited superior performance compared to existing learning algorithms [54], [55]. Furthermore, the works conducted by Glorot et al. [56] and Chen et al. [57] demonstrated the successful application of deep learning architectures in fields related to transfer learning such as domain adaptation.

C. Deep Learning Applications

In recent years, deep learning research has gained remarkable momentum in both academia and industry. In particular, deep learning techniques have had a terrific impact on several computer science and engineering fields including object recognition, speech recognition, natural language processing, robotics, driverless cars, and AI gaming. The state-of-the-art deep learning applications in these areas are briefly discussed below.

1) Object Recognition: Object recognition has been regarded as a non-trivial task for computers [60], [61]. The Mixed National Institute of Standards and Technology


(a) Boltzmann Machine (BM) architecture. (b) Restricted Boltzmann Machine (RBM) architecture. (c) Deep Q-networks (a network mapping a state to one Q-value per action).

Fig. 4: Simplified architectures of BM, RBM, and deep Q-networks. Note that the BMs and RBMs can be used to construct
the deep BM and the DBN, respectively.
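Since Fig. 4(c) shows the deep Q-network only schematically, it may help to recall the tabular Q-learning update that the deep variant approximates with a neural network: Q(s,a) ← Q(s,a) + α[r + γ max_a′ Q(s′,a′) − Q(s,a)]. The sketch below applies this update on a toy two-state environment; the environment, learning rate, discount factor, and exploration rate are all illustrative assumptions, not taken from the paper:

```python
import random

# Toy environment: states 0 and 1. Action 1 moves to state 1, action 0
# stays put; a reward is earned whenever the step ends in state 1.
def step(state, action):
    next_state = 1 if action == 1 else state
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

alpha, gamma, epsilon = 0.5, 0.9, 0.1   # illustrative hyper-parameters
Q = [[0.0, 0.0], [0.0, 0.0]]            # tabular Q[state][action]

random.seed(0)
for episode in range(200):
    s = 0
    for _ in range(5):
        # Epsilon-greedy policy: mostly exploit, occasionally explore.
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda act: Q[s][act])
        s2, r = step(s, a)
        # The Q-learning update; a deep Q-network replaces the table Q
        # with a neural network trained toward the same target.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print(Q[0])  # the action leading to the rewarding state scores higher
```

The rewards here are exactly the sparse, time-delayed labels described in Sec. II-A5: the agent is never told which action is correct, only what reward each interaction yields, and the learned Q-values come to prefer the action that moves toward the rewarding state.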

(MNIST) database [62], comprising a training set of 60,000 examples and a test set of 10,000 examples of handwritten digits, has frequently been considered as the benchmark for many prominent ML techniques. Until 2006, SVMs were known to be the most suitable among the contemporary ML methods to recognize the handwritten digits available in the MNIST database. Then, Hinton et al. [28], [63] and Bengio et al. [21] demonstrated that deep learning algorithms are able to outperform the SVMs by a substantial margin in carrying out object recognition tasks. Rifai et al. [64] and Ciresan et al. [65] demonstrated further effective applications of deep learning systems to the MNIST digit image classification problem, with error rates of 0.81% and 0.27%, respectively. In addition, researchers have shown that deep learning can be applied to more sophisticated object recognition in natural images, e.g., those available in the ImageNet dataset [66]. In particular, Krizhevsky et al. [33] demonstrated that deep learning algorithms can improve the state-of-the-art error rate in image recognition from 26.1% to 15.3%. A practical example, with which readers may easily be able to connect, is Google's deep learning project of integrating image recognition into Google Photos [67].

2) Speech Recognition and Signal Processing: There exist many applications of convolutional neural networks for speech recognition and signal processing in the literature [68]–[70]. The recent research on deep learning architectures and algorithms has revived the interest in neural networks and representation learning, particularly in the speech recognition area [71]–[73]. In addition to research endeavors in academia, industry-based researchers are also devoting a lot of effort toward deep learning applications for speech recognition and signal processing. For instance, the Microsoft Audio Video Indexing Service (MAVIS) speech system based on deep learning showed a significant drop in the error rate compared to contemporary ML techniques (e.g., Gaussian mixtures for acoustic modeling) used for recognizing speech in a dataset comprising over 300 hours of audio records [74]. Despite their good performance, the input vectors used in the state-of-the-art deep neural networks are of fixed dimension. Therefore, such a deep architecture acts as a static classifier that may not be suitable for speech sequence recognition. This is because the dimensionality of inputs and/or outputs may be variable

1553-877X (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/COMST.2017.2707140, IEEE
Communications Surveys & Tutorials

in sequence recognition. On the other hand, the HMM, based on dynamic programming operations, is particularly useful for modeling speech sequence data with variable length. Therefore, to exploit their respective advantages, researchers [75] have considered the joint use of the static classifier (i.e., the deep neural network) and the HMM. Furthermore, deep RNNs and CNNs have also been utilized to deal with the afore-mentioned dimensionality problem involving the inputs and/or outputs [58], [76], [77]. These methods were applied to several musical datasets, and the results demonstrated a relative error improvement from 5% to 30% compared to the existing polyphonic transcription method [39], [64]. Furthermore, a practical example of the deep learning application to speech recognition can be found in Apple's deep neural network-based upgrade to the smart assistant "Siri", which significantly improved its accuracy [78].

3) Natural Language Processing: Natural Language Processing, frequently referred to as NLP in the literature, is another domain where both state-of-the-art ML techniques and deep learning algorithms have been reported to be extensively utilized. For instance, the concept of symbolic data representation through distributed representation was introduced way back in 1986 [79]. However, the idea of using deep architectures to represent symbolic data, also known as "word embedding", was successfully realized only in 2003 via statistical language modeling [80]. The practical application was, however, not possible until 2008, when Weston et al. [81] exploited deep CNNs to carry out the word embedding task. Furthermore, they demonstrated the transfer learning capability of deep architectures through the SENNA system [59] to share representations across various NLP tasks such as named-entity recognition, learning language models, chunking, part-of-speech tagging, semantic role-labeling, and so forth. While the language model recognition was shown to require labeled data for training, the training of the other tasks could be carried out by adopting unsupervised methods. On the other hand, conventional NLP approaches depend heavily upon supervised methods, i.e., manual extraction of features from the language dataset that are passed to a shallow classification algorithm (e.g., an SVM with a linear kernel). Thus, the deep learning paradigm significantly differs from the state-of-the-art NLP approaches, and exhibits much superior performance compared to the shallow architectures using supervised methods for complex tasks. In addition, deep RNN applications have also appeared in the recent literature to successfully address various aspects of NLP. For example, Socher et al., in their works in [76] and [82], exploited RNNs for carrying out sentiment analysis for semantic compositionality and parsing, respectively. Furthermore, RNN models were used to achieve improved word prediction accuracy [83] using the Wall Street Journal benchmark task [84] and to carry out statistical machine translation [85]. On the other hand, other deep learning structures have also been used to solve NLP problems. For instance, recursive Auto-Encoders were utilized by Socher et al. [77] to significantly improve full sentence paraphrase detection. Also, Google's deep learning initiative is worth noting for the speech recognition capabilities of Google Translate [86].

4) Robotics: In recent years, there has been a growing interest in using deep learning architectures in advanced robotics [87], [88]. For example, deep neural networks as function approximators in reinforcement learning for physical systems like robots were used in [89]. CNNs exploiting RGB-D data have been used by a plethora of robotics applications, e.g., object detection, scene semantic segmentation, and grasping [90]. The work in [91] proposed a deep learning architecture to detect the palm and fingertip positions of stable grasps by employing partial object views. The deep architecture was shown to be particularly useful for generalizing grasp experiences to both known and novel objects. Furthermore, a novel vision-based robotic grasping system based on a max-pooling CNN (MPCNN) appeared in [92]. On the other hand, a new real-time deep learning algorithm for dynamically training several robots was presented in [93]. The real-time learning of high-dimensional features for robotics applications via deep learning techniques is another important topic, which was discussed in [94]. In addition, other topics in robotics such as obstacle detection [95] and context-dependent social mapping [96] are also being addressed by researchers through deep learning methods.

5) Automated Vehicles: Automated vehicles, also referred to as self-driving vehicles or driverless cars, are becoming popular due to the advancement of autonomous navigation and situational awareness systems. Such systems require the integration of numerous sensors on board the vehicle, which makes it challenging for ML algorithms to provide real-time driving decisions [97]. As a consequence, deep neural networks are now being utilized for analyzing the multi-modal sensor inputs [13], [17], [98], [99] of autonomous vehicles.

6) AI Games: In addition to the afore-mentioned applications, deep learning algorithms are extensively used for playing AI games. For instance, in the game of "Go", with its enormous search space and incredibly high number of board positions/moves, the AlphaGo program [100], whose deep neural networks harness the combined strength of supervised learning from human experts and reinforcement learning, achieved a 99.8% winning rate against other state-of-the-art ML-driven Go programs. Also, for the first time in history, AlphaGo defeated a champion Go player, which attests to the success of the underlying deep learning method. The winning streak of AlphaGo against human champions all over the world attracted a lot of media attention. As a result, there is a huge interest among researchers in utilizing deep learning architectures and algorithms to solve a myriad of AI gaming challenges [101], which were considered to be benchmark problems in the recent past.

D. Research Gap - Deep Learning Implication for Computer Networks

As discussed in the earlier segments of this section, it is evident that deep learning applications continue to span a wide range of topics in computer science and engineering. In the networking domain, however, the use of deep learning architectures and algorithms for solving various


problems of communication networks has not been formally surveyed. In other words, a number of recent research works have exploited deep learning methodologies in networking themes; however, they have been rather scattered in the literature. The implication of using deep learning for computer networks is enormous in the sense that it may create a new inter-disciplinary research area. In contrast with the other computer science areas such as object recognition, speech recognition, NLP, and so forth, why exactly have the deep learning applications not received holistic attention from network researchers? What existing networking applications can be complemented by deep learning techniques? How should the networking metrics to be measured and predicted using deep learning techniques be characterized? What applications exist in computer vision, speech recognition, robotics, and so forth, that may also be analogously applicable to network traffic control systems? Hence, it is important to thoroughly review which networking areas currently overlap with the deep learning space. To the best of our knowledge, no prior research work has surveyed the deep learning applications in computer networks, particularly network traffic control systems. Therefore, in this paper, we stress surveying the state-of-the-art deep learning applications for network traffic control systems. However, before delving into the survey, we need to understand the key deep learning enablers for network systems, which are discussed in the following section.

III. DEEP LEARNING ENABLERS FOR NETWORK TRAFFIC CONTROL SYSTEMS

In this section, we discuss the key deep learning enablers for network traffic control systems. In order to ultimately provide intelligent network traffic control, i.e., the so-called Knowledge Defined Networking [102], a joint consideration of Software Defined Networking (SDN), network analytics, and deep learning is essential. In today's highly networked society, vast amounts of data are generated, which can be utilized to effectively train deep learning systems. Because the conventional ANNs and other ML techniques are not inherently efficient and/or scalable enough to cope with such large-volume data, the networking community did not pursue their use in the long run. If sufficient labeled training data is available, deep learning techniques could be exploited to explicitly predict the traffic delivery routes and make intelligent decisions on scheduling, bandwidth reservation, and so forth. Nowadays, the network datasets, comprising information such as inbound traffic pattern history, packet drops, failure of network nodes, and so forth, are much bigger than in the past. If deep learning could bring the improvements in performance to network traffic control systems that it has brought to computer vision, speech recognition, and the other areas mentioned in Sec. II-C, it could, in essence, enable the Knowledge Defined Networking.

Another deep learning enabler for network traffic control systems is the set of better algorithms that have appeared in the recent literature, including stochastic gradient descent variants (RMSprop, Adagrad, Adam, and so forth), efficient regularizers (e.g., Dropout, L1/L2, and so on), Greedy Layer-Wise training, and so forth. In order to support these much-improved learning algorithms, the recent advancement in commodity hardware (i.e., in terms of processing, memory, and Input/Output bus speeds) is also acting as a key enabler for leveraging deep learning architectures for network traffic control systems. For instance, Graphics Processing Units (GPUs) with a low-latency shared memory system provide a massive computation platform. This is anticipated to drive the next generation Software Defined Routers (SDRs) in a significant manner. Additionally, benefiting from the hardware innovation, recent GPU-based SDRs are reported to have a much improved packet processing capability. Furthermore, the GPUs on an SDR may be particularly useful for executing deep learning algorithms for learning various networking situations and acting according to the acquired knowledge. Therefore, the grounds for using deep learning algorithms in network traffic control systems are now ready. In addition, technology giants including Google, Microsoft, Apple, Facebook, Nvidia, Amazon, and so on are heavily investing in GPU-accelerated deep learning initiatives with their own deep learning platforms. For instance, Google's open-source platform TensorFlow and its DeepMind unit (which successfully trained the earlier-mentioned AlphaGo program in Sec. II-C6), Microsoft's Computational Network Toolkit (CNTK), Facebook's DeepText, Nvidia's DGX-1, and Amazon's Deep Scalable Sparse Tensor Network Engine (DSSTNE) all use GPUs to parallelize the deep network training [103]. Broadly speaking, the deep neural networks in these libraries carry out the training phase via data and model parallelism. In other words, a massive dataset is split into batches, which are distributed to parallel models executing on separate hardware in order to conduct simultaneous training. For interested readers, the prominent platforms available for driving real-time packet processing capabilities in SDRs are described in the remainder of this section.

A. TensorFlow

Google's TensorFlow is an open-source interface for accessing the state-of-the-art ML and deep learning algorithms. It permits the use of both CPUs and GPUs in a scalable manner. In other words, it can be used by individual users (e.g., on smart phones/tablets) and also by large-scale distributed systems. TensorFlow has been used for conducting research and deploying deep learning systems into production in the applications described in Sec. II-C. However, to the best of our knowledge, the TensorFlow API has not been used for developing SDRs. Perhaps the reason behind this is the cloud-based execution of the deep learning algorithms in TensorFlow, which may not fit the SDR needs.

B. Torch

Based on the Lua programming language, Torch offers several machine/deep learning algorithms with fast and efficient GPU support. An interesting feature of Torch is the way it can be embedded in mobile operating systems and FPGA backends. Torch has already been used within Facebook, Google, Twitter, IDIAP, and several other social messengers


and research hubs. However, its use for network traffic control systems is rather limited, similar to the case of TensorFlow.

C. Caffe

The Convolutional Architecture for Fast Feature Embedding, referred to as Caffe, provides researchers with a robust framework of state-of-the-art deep learning algorithms. Currently, Caffe drives a number of startup prototypes in vision, speech, and multimedia. It is anticipated that it may be able to power large-scale industrial applications including network traffic control systems.

D. Deeplearning4j

Deeplearning4j (DL4J) offers an open-source neural network library for the Java and Scala programming languages. This distributed deep learning framework integrates well with Apache Hadoop [104] and Spark [105] using an arbitrary number of CPUs and/or GPUs. In terms of speed, its performance is comparable with that of Caffe exploiting multiple GPUs. On the other hand, it exhibits better performance than Google's TensorFlow or Torch.

E. WILL

WILL is a high-performance deep learning framework [106], which is implemented in C++ and compatible with interfaces to C, Python, and assembly language. It has learning performance and speed comparable with those of Caffe and DL4J, and is more convenient to configure. Furthermore, WILL is a cross-platform framework, suitable for Windows, Unix, and even embedded systems. The work in [107] demonstrated the first ever deep learning based routing strategy, built on WILL.

F. CUDA

Nvidia's CUDA offers a parallel computing platform and programming model based on GPUs. The deep neural network library in CUDA comprises primitives for the standard deep learning routines, e.g., forward and backward convolution, pooling, normalization, activation of layers, and so forth. It provides one of the fastest state-of-the-art deep learning libraries for deep CNNs and RNNs. In addition, it is among the top performers in the image-processing benchmarks conducted by Chintala et al. [108]. An interesting feature of CUDA-based SDRs is the ability to write the deep learning code directly in high-level languages (e.g., C/C++ and so forth) and send it to the GPU without the need for assembly language programming. The NVIDIA Titan X, comprising 3584 CUDA cores and 12 GB of on-board GDDR5X memory, is anticipated to efficiently parallelize the training process of deep neural networks. Such GPUs are being used to drive real-time image/packet processing capabilities [109].

It is evident from the discussion in this section that researchers have access to a wide range of deep learning libraries, which they are able to apply to solve the problems of various research domains. However, the deep learning applications in network traffic control systems have not been surveyed as extensively as those in other research domains. In the following section, we identify the various networking areas which have witnessed applications of deep learning.

IV. APPLICATIONS OF DEEP LEARNING IN NETWORK-RELATED AREAS

In this section, we first identify a number of network-related systems where state-of-the-art deep learning has been applied, namely wireless ad hoc and sensor networks, network traffic classification, origin flow prediction, social networks, mobility prediction, and cognitive radio networks. Then, for each of the mentioned areas, we briefly discuss the shortcomings of the existing ML approaches and then survey the relevant deep learning applications, providing an insightful discussion for each of the aforementioned networking areas.

A. Deep Learning in Wireless Sensor Networks

Over the decades, Wireless Sensor Networks (WSNs) [110]–[117] have enjoyed a plethora of ML applications [118]. Statistical learning algorithms such as Bayesian inference [119] and Gaussian Process Regression (GPR) [120] have been used to a limited extent for various purposes in WSNs. While these methods require a significantly small number of training samples, they also require accurate statistical knowledge and are, therefore, not widely adopted in the WSN context [118]. Instead, reinforcement learning, neural networks, and decision trees have been the popular ML algorithms used in WSNs, which typically consist of many autonomous, tiny, low-power, and low-cost sensor nodes that collect various types of information (e.g., thermal, acoustic, pressure, chemical, and other ambient data). For instance, supervised learning methods to address localization and object targeting in WSNs have been extensively employed in the works conducted by [121]–[123]. The prominent ML-based works dealing with intelligent scheduling in the Medium Access Control (MAC) layer of the WSN can be found in [124]–[126]. Additionally, security, particularly intrusion detection in the WSN, has been widely studied using Machine Intelligence theory, as appeared in [127]–[130]. Furthermore, other research areas in the WSN have also enjoyed ML applications, including event detection and query processing [131]–[133], and QoS, data integrity, and fault detection [134]–[136]. In addition, the K-Nearest Neighbor (K-NN) algorithm, a light-weight supervised learning method, was applied extensively in WSNs, particularly for query processing tasks [131], [132]. However, when WSNs produce significantly high-dimensional data (i.e., exceeding 10 dimensions), the K-NN is reported to lead to inaccurate results [137]. The decision tree is another classification method which has been frequently applied to WSNs [138]. However, those applications are limited to linearly separable data and suffer from high complexity, since constructing complete optimal learning trees is an NP-complete problem [139]. Furthermore, the use of ANNs, particularly deep neural network architectures, has been limited in the literature [140] due to the high


computational requirement to learn the weights and the high management overhead in distributed WSN topologies. For node localization in WSNs, neural networks with self-organizing maps (i.e., Kohonen's maps) and Learning Vector Quantization (LVQ) [141] have been used. However, the data set required for training such neural networks was reported not to be sufficient by Hinton et al. [28]. As a consequence, instead of deep learning architectures, many researchers opted to employ the SVM for classifying data points using labeled training samples [138], [142]. In particular, the SVM algorithm, which typically optimizes a quadratic function with linear constraints, has been extensively exploited for node localization [143]–[145] and security applications [127], [130], [146]–[148] in WSNs.

In addition to the supervised learning methods, unsupervised learning approaches have also been widely used in WSNs, particularly to classify the sample set into various groups based on their similarity. Naturally, the unsupervised ML algorithms have been widely adopted for clustering of WSN nodes and data aggregation problems [149]–[155]. Among the most frequently used clustering methods in WSNs, K-means clustering [156] is notable. For dimensionality reduction, researchers have used a multivariate method exploiting PCA [157]. On the other hand, to exploit the distributed nature of WSNs, reinforcement learning, particularly Q-learning, is used extensively and efficiently for WSN routing problems [158]–[161]. However, the state-of-the-art deep Q-networks described in Sec. II-A5 are yet to be exploited for WSNs. Furthermore, Alsheikh et al. [118] recommended deep learning methods [21] and the non-negative matrix factorization algorithm [162] as more efficient unsupervised learning methods for WSNs.

B. Network Traffic Classification and Deep Learning

Traffic classification by network traffic control systems is another area where ML continues to be of long-term interest to the networking community [163]–[171]. This is because the state-of-the-art ML, and now deep learning techniques, are anticipated to efficiently perform network traffic classification useful for network monitoring, QoS, intrusion detection, and so forth in a wide variety of network settings [172]. Researchers used supervised ML techniques to label all available network traffic traces with previously known applications in [173]–[175]. However, the supervised learning techniques were considered impractical by Wang et al. [172] because of the limited training data and the manner in which new applications continue to appear. On the other hand, unsupervised ML techniques were investigated in [170], [171], [176]. The works in [170], [171] provided automatic network traffic classification by exploiting traffic features, e.g., the time-interval between packet arrivals, packet-size, recurring traffic patterns, and so forth. While classifiers like naive Bayes, decision trees, and neural networks [177]–[179] are extensively used in these works, the training processes are not real-time. This rendered the automatic traffic classification inefficient, similar to the unsupervised method. In order to remedy the issue, in [176], Erman et al. employed an unlabeled database for the training. However, due to its high complexity, a semi-supervised learning technique was proposed by Erman et al. [180]. Such unsupervised and/or semi-supervised learning algorithms, however, may not be directly applied to next generation Software Defined Networks (SDNs) without taking into consideration the decoupled control and data planes. The work in [172] presented a traffic classification engine to first conduct local traffic identification at the SDN edge switches, and then perform a "global" traffic classification via the network controller, which is responsible for training, constructing, and refining QoS policies based on the learned traffic information. Once a large amount of data on network traffic flows and their corresponding labels is available, the problem of protocol classification was demonstrated to be efficiently solved by deep learning architectures like deep neural networks [15] and/or stacked Auto-Encoders [181]. It was also mentioned in [181], [182] that the latter performs better than the deep neural networks in classifying any flow data to a predefined protocol with an accuracy sufficient for use in a real application. Also, the use of deep architectures was shown to reveal highly probable anomalous and/or disguised flows. Furthermore, the unknown network flows were estimated to be over 17%. The work clearly showed that the deep learning models are able to distinguish more than half of the flows that are difficult to identify by the traditional ML algorithms.

C. Network Flow Prediction with Deep Learning

In addition to network traffic classification, flow prediction is another important area of the network traffic control systems which has recently witnessed a growing number of deep learning applications. A network traffic flow is defined as a sequence of data packets sharing the same context between source-destination pairs, including Transport Control Protocol (TCP) connections, media streams, and so forth. In order to manage the limited networking resources, information on flow characteristics like the burst size (i.e., packet number and packet-size) and the inter-burst gap is often used. In the case of SDNs, the flow information is particularly useful for programming routers, mitigating wireless interference, scheduling congested data traffic, and so on. Among traditional ML techniques, Basu et al. [183] investigated a wide range of time-series models for Internet data traffic, such as the auto-regressive moving average process. Claffy et al. [4] showed how to estimate the original packet-size distribution of a flow from the packet sampling performed at the routers. On the other hand, the work in [184] proposed a method to obtain the original frequencies of flow lengths from a sparse packet sampling. That work was similar in spirit to the face recognition system in [185], which performs classification via a sparse representation of features. Similar sparse-representation-based pooling to construct higher-level features for traffic flow forecasting was investigated by Raina et al. [186], [187]. The "origin" flow pattern inference is performed for the traffic generated from a Wireless Local Area Network (WLAN) in [188] by applying SVMs. On the other hand, deep learning architectures were introduced by


Coates et al. [189] to pool over multiple features for the flow prediction. On the other hand, the work in [12] presented a novel deep learning based sparse coding with forced incoherent dictionary atoms [190]–[192] for conducting origin flow prediction. Furthermore, Oliveira et al. [182] demonstrated the viability of stacked Auto-Encoders for short-term network traffic forecasting based on the Internet traffic time-series obtained from the DataMarket dataset [193].

D. Deep Learning in Social Networks

Recently, social networks [195] have become a hot research topic as beyond-4G mobile networks continue to emerge, attracting deep learning applications [194]. In mobile social networks, predicting the users' behavior toward a certain application, service, location, preference, and so forth, based on their interactions, is essential for the network operators. E-commerce sites, online marketing, advertisement display networks, and so forth use such information on the social network users' intentions and behaviors to maximize their efficiency [196]. The different patterns exhibited by the social network users include the time spent per application, search frequency, recurring visits, and so on [197]. These patterns can be exploited to quantify their search behavior using ML techniques [198]. The work in [199] uses a probabilistic generative process to model user exploratory and purchase history. It also introduced a latent context variable to take into account both spatial and temporal features. Thus, the work aims to predict the social network users' decisions for specific contexts, and accordingly provide them with appropriate recommendations. This is also evident in the recent application of deep learning approaches to predict user activity within web content [199]. The deep learning algorithms are replacing the state-of-the-art ML techniques such as logistic regression and boosted decision trees. The deep neural networks, in particular, are superior to the conventional ML methods because of their capability to capture the non-linear relationships among the input features from the social network users. In addition, the deep learning architectures are shown to possess a superior modeling strength compared to the existing ML approaches [199]. Furthermore, Perozzi et al. [200] proposed DeepWalk, a novel deep architecture to encode social relations in a continuous vector space. The advantage of the DeepWalk architecture consists in its ability to exploit local information obtained

they developed a feature representation method. By combining the link samples and their values based on the link prediction and the obtained features, they further proposed the use of RBMs for link prediction with a significantly high accuracy. Moreover, Lazreg et al. [205] used deep learning for social media analysis in emergency situations. Their deep learning system was used to extract features and patterns related to the text and concepts available in crisis-related social media posts and harness them to obtain an idea regarding the crisis. They also demonstrated the great potential of deep learning architectures to play a substantial role in future social networks in noisy emergency situations, i.e., during crisis events. It is also worth mentioning that government-based initiatives, e.g., the "anticipatory intelligence" program [206], leverage deep neural networks to analyze social network data to predict the possible occurrence of social unrest. In the remainder of the section, we discuss deep learning applications in other network traffic control aspects, namely mobility prediction, cognitive radio, and self-organized networks.

E. Mobility Prediction with Deep Learning

With the proliferation of 4G (such as Long Term Evolution (LTE) and WiMax), beyond-4G, and heterogeneous cellular networks, along with the recent advances in portable devices, users can nowadays enjoy ubiquitous mobile network access. As a consequence, mobility prediction is a critical issue for the operators in order to determine capacity estimation, resource allocation, and so forth [216]–[219]. Chon et al. [220] stressed predicting human mobility as a critical need for broad-domain applications (e.g., ranging from simple home preheating and sending dinner coupons to epidemic control [221], [222], urban planning [223]–[225], traffic forecasting systems [226]–[228], and resource management of mobile communications [3], [229], [230]). With an aim to offer appropriate services to mobile users in a timely fashion, they proposed time-resolved place and path prediction via monitoring users' mobility. Such research works addressing mobility prediction have considered ML techniques for a while. With the emergence of the state-of-the-art deep learning methods, mobility prediction is anticipated to be even more accurate than with the conventional ML approaches. For instance, Sundsy et al. [231] considered a large, raw mobile phone data set spanning over 3 months to construct a deep learning model. Compared with the traditional data mining algorithms, the deep learning
from truncated random walks to learn latent representations. based method showed a significantly accurate classification of
The latent representation learned through the deep architecture the mobile users with different socio-economic groups that
was demonstrated to outperform several multi-label network could be used for predicting mobility of the users. The result
traffic classification methods such as SpectralClustering [201], also indicated that the deep learning algorithms are able to
Modularity [202], EdgeCluster [203], and the weighted-vote capture the complex relationships between various dimensions
Relational Neighbor (wvRN) [204] for social networks like of the massive data without suffering from the overfitting of
BlogCatalog, Flickr, and YouTube. Also, the deep learning the training data. In addition, using only a single dimension
based representations were shown to be scalable and par- of the data in its raw form, the deep learning model was
allelizable, which can exploit the state-of-the-art GPUs for demonstrated to achieve a 7% better performance in contrast
efficiently executing deep learning algorithms. On the other with the baseline constructed with custom engineered features
hand, Liu et al. [11] considered an unsupervised method with from multiple data dimensions. Furthermore, Sundsy et al.
little samples from an online social network service model indicated that finding a general representation of mobile data
in order to perform the social links prediction. In addition, can be exploited for a number of other prediction tasks.
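As a concrete point of reference for the deep models discussed above, a conventional shallow baseline for next-place prediction can be sketched in a few lines: a first-order Markov model that predicts the most frequently observed successor of the current location. This is our own illustration of the kind of conventional predictor the surveyed deep methods aim to outperform; the trajectories, location names, and function names below are hypothetical, not taken from the cited works.

```python
from collections import Counter, defaultdict

def train_markov(trajectories):
    """Count location-to-location transitions (first-order Markov baseline)."""
    transitions = defaultdict(Counter)
    for traj in trajectories:
        for here, nxt in zip(traj, traj[1:]):
            transitions[here][nxt] += 1
    return transitions

def predict_next(transitions, current):
    """Return the most frequently observed successor of `current`, if any."""
    if current not in transitions:
        return None
    return transitions[current].most_common(1)[0][0]

# Toy check-in sequences (cell/place IDs); purely illustrative.
trajs = [["home", "work", "gym", "home"],
         ["home", "work", "cafe", "home"],
         ["home", "work", "gym", "home"]]
model = train_markov(trajs)
print(predict_next(model, "work"))  # → gym
```

A deep model replaces the transition table with learned representations, which is what lets it exploit long histories and side information that such a counting baseline cannot.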

1553-877X (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/COMST.2017.2707140, IEEE
Communications Surveys & Tutorials


Ghouti et al. [232] proposed mobility prediction using fully-complex extreme learning machines. These deep learning structures are based on fully-complex activation functions known as CELMs [233], which can operate without the need to tune the parameters of the connections between the input and hidden layers. Arbitrary computational nodes are applied irrespective of the training data so as to achieve a significantly low training error while estimating the output weights using a least-squares solution. On the other hand, Song et al. [234] constructed a deep RNN to predict urban human mobility. However, Zhao et al. [219] argued that the conventional RNN fails to capture the long temporal features associated with the input sequence. Therefore, they designed a specific RNN architecture tailored for sequence prediction tasks in order to learn time series with long time spans while automatically estimating the optimal time lags. Their deep RNN architecture was demonstrated to be able to predict the future movements and transportation modes of the users with an accuracy of 80% or more. On the other hand, Ouyang et al. [235] identified two representative works that pioneered in transforming the state-of-the-art deep learning algorithms into an effective online learning model. First, Zhou et al. [236] employed denoising Auto-Encoders for online learning, which are capable of learning new patterns, but at the cost of slow learning speed due to a large parameter space and a lack of support for parallel execution. As a remedy, Xiao et al. [237] introduced a hierarchical training algorithm using a deep CNN model, which supports parallel execution and is particularly suitable for massive mobility datasets. Based on the lessons learned from these two works, Ouyang et al. [235] proposed an online deep learning framework called "DeepSpace" in order to predict human moving paths by exploiting a deep CNN architecture that can deal with parallel online data. The "DeepSpace" framework, along with the traditional CNN architecture, was applied to a mobile cellular network dataset obtained from a city of southeast China. The results demonstrated that the "DeepSpace" framework outperforms the traditional CNN-based approach by learning the spatio-temporal features of the network-level mobile data in a much more efficient manner.

F. Deep Learning in Cognitive Radio and Self-Organized Networks

In the remainder of this section, we first describe the applications of deep learning in Cognitive Radio Networks (CRNs) [238], [239], and then discuss how the deep learning methods are also becoming useful for Self-Organized Networks (SONs).

Traditionally, CRNs rely on intelligent learning techniques to learn from and adapt to their environment, and deep learning applications are recently appearing as a promising CRN enabler [240]. Most of the contemporary research works involving CRNs attempted to employ policy-based radios, which are hard-coded with a myriad of rules for the radios to behave in specific situations and/or applications [241]. In other words, in a typical CRN, there is a reasoning engine that permits the radios to remember lessons learned in the past and act quickly in the future. As a simple example of a rule-based CRN, consider an IEEE 802.11 modulation controller, which changes its modulation in response to the varying values of the Signal-to-Noise Ratio (SNR) [242]. This kind of cognitive decision making is based on reasoning rather than learning. The work in [243] points out the significant difference between reasoning and learning in CRNs. By doing so, the work aims to formalize the application of ML to CRNs to deal with problems like capacity maximization, dynamic spectrum access, and so forth. Abbas et al. [244] presented the evaluation and challenges of various learning techniques' applications. However, Zorzi et al. [245] stressed the fact that the ML techniques applied to CRNs have traditionally adopted shallow architectures. Recently, they presented a novel perspective on CRN optimization by employing learning and distributed intelligence, and described how deep learning architectures can be utilized. Furthermore, O'Shea et al. [246] demonstrated the viability of using deep CNNs in CRNs. Their work builds upon successful ML algorithms for image and voice recognition to flexibly learn features across many different tasks with deep CNNs. In contrast with the conventional ML approaches, their deep learning based method exhibited much better performance when applied to blind temporal learning on large and densely encoded time series datasets obtained from the CRN.

The work conducted by Zorzi et al. [245] focused on unsupervised training of stacked RBMs to introduce a variety of deep learning models for CRNs. The deep learning methods have also been considered by Zorzi et al. to be currently the state-of-the-art in CRN modeling. Furthermore, they highlight the CUDA framework (refer to Sec. III-F), which is identified as an extremely powerful parallel computing architecture exploiting GPUs for efficiently constructing incredibly large-scale deep learning models containing millions of connection weights [247], [248], capable of being trained in an unsupervised manner by using the huge number of patterns in the currently available datasets [33]. In addition, the deep learning system is shown to be able to build rich and abstract representations, which can serve as input to a variant of the Q-learning algorithm, as discussed in Sec. II, to improve the overall behavior of the CRN. Another application of the deep learning architecture, in the form of a DBN, was given in [245] to classify the primary user agents in a CRN. This application, referred to as COBANET, demonstrated a substantial reduction of the number of labeled data and an improvement of the classification accuracy. With COBANET, the CRN performance in terms of spectrum sensing and handoff delay for changing channels improved in a significant manner compared to existing methods [249].

Also, in future wireless networks, self-organization has been identified as a critical issue by a number of researchers [250]. The key idea behind the so-called Self-Organized Networks (SONs) is for them to mimic the capabilities of biological systems (e.g., swarms of insects/fish) to autonomously adapt to the changes in the surrounding environment. Research attention is growing in developing ML algorithms to construct SONs [251] that not only autonomously adapt to varying conditions but also learn based on experience.
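Since several of the works discussed here build on Q-learning variants, the "learning from experience" ingredient can be made concrete with a minimal tabular Q-learning update. The states, actions, and reward rule below are hypothetical stand-ins (a radio choosing between two channels), not drawn from the cited papers; this is only a sketch of the update rule itself.

```python
import random

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: Q(s,a) += alpha * (TD target - Q(s,a))."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])

# Hypothetical scenario: channel ch1 pays off when the band is idle,
# ch2 when it is busy. The agent learns this from trial and error.
states, actions = ["idle", "busy"], ["ch1", "ch2"]
Q = {s: {a: 0.0 for a in actions} for s in states}

random.seed(0)
for _ in range(500):
    s = random.choice(states)
    # Epsilon-greedy action selection: mostly exploit, sometimes explore.
    a = max(Q[s], key=Q[s].get) if random.random() > 0.2 else random.choice(actions)
    r = 1.0 if (s == "idle") == (a == "ch1") else 0.0
    q_update(Q, s, a, r, random.choice(states))

print(Q["idle"]["ch1"] > Q["idle"]["ch2"])  # → True (learned preference)
```

A deep Q-network replaces the table `Q` with a neural network, which is what makes the approach viable for the large state spaces found in real CRNs and SONs.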


Zorzi et al. [245] hypothesized that deep learning techniques are likely to play a pivotal role in the learning aspect of the SONs. Furthermore, when combined with reinforcement learning, the deep learning systems for SONs are indicated to be capable of performing network optimization [252]. This was implemented by Razavi et al. [253], [254], who proposed reinforcement learning combined with fuzzy logic to optimize the down-tilt of the antennas of an eNB (i.e., a 4G base station) to achieve self-healing, self-configuration, and self-optimization functions. Their proposed learning architecture exhibited robustness to environmental changes and demonstrated improved performance over comparable heuristics. On the other hand, a Fuzzy Q-Learning algorithm to jointly optimize the coverage and capacity of a wireless cellular network was proposed by Islam et al. [255]. Indeed, the reinforcement learning based approach is becoming the mainstream direction for CRNs, as shown in the work conducted by Nil et al. [256]. In that work, a heuristically accelerated reinforcement learning technique, referred to as HARL, was applied to the problem of dynamic spectrum sharing in LTE cellular networks. By utilizing external information, e.g., a Radio Environment Map (REM), for guiding and speeding up the learning process, HARL achieved a significant reduction in the secondary system's interference to the primary systems of the considered CRN. Furthermore, it resulted in a much higher system throughput in contrast with the state-of-the-art reinforcement learning approaches.

In this section, we extensively discussed the state-of-the-art deep learning applications for a number of network environments. In particular, we stressed investigating how deep learning applications are disrupting the networking related systems in various settings, from basic WSNs to advanced CRNs. In the following section, we describe a new application of deep learning for routing in order to facilitate intelligent network traffic control systems.

V. New Area: Deep Learning Based Routing

As discussed in the earlier sections, deep learning can be used in a wide range of networking related areas. Furthermore, with the development of new deep learning techniques, in this section, we present a new area of intelligent traffic control systems facilitated by deep learning based routing.

Since network traffic has grown exponentially in recent years, traffic control is essential to ensure the QoS, especially in real-time multimedia networks where packet retransmissions due to traffic congestion are not a sensible option [207]. Routing management is a crucial aspect of traffic control, as poorly chosen paths can lead to network congestion, and the subsequent retransmissions of the lost packets may further aggravate the congestion. In traditional routing protocols, the main concept is to choose the path having the maximum or minimum value of some metric, for instance, the Shortest Path (SP) algorithm [208]. However, these methods have different shortcomings. For example, the traditional SP algorithm has the problem of slow convergence, which is not suitable for dynamic networks, since the slow response to network changes can lead to severe congestion [209]. To overcome this shortcoming of conventional routing methods, researchers have explored adopting machine learning to manage the paths intelligently [6]–[10]. However, to the best of our knowledge, contemporary researchers did not exploit deep learning for network traffic control. From hereon, we introduce a proof-of-concept of applying the deep learning technique for performing intelligent traffic control in future networks.

[Fig. 5: The considered wireless mesh backbone network — a 4 × 4 grid of routers R1 through R16.]

To clearly show how the deep learning architecture can be used for traffic control, we suppose a 4 × 4 wireless mesh backbone network as shown in Fig. 5. In the considered network, assume that the packets are generated only in edge routers and destined for other edge routers, since the access terminals are all connected to the edge routers while the inner routers just play the role of forwarding packets. Each edge router is assumed to run several DBNs to construct the whole paths to other edge routers and attach its packets with the corresponding paths. The inner routers do not need to run the DBNs and just read the path to forward the packets. For each DBN, the units in its bottom layer are characterized as the traffic patterns of all routers in the network, while the top layer represents the next node for an origin-destination pair. As the number of routers in the network is 16, the bottom layer consists of 16 units. We adopt a 16-dimensional vector format for the output to represent the next node predicted by the DBN, such that each of its elements has a binary value. Furthermore, only a single element in this vector can be 1. The position or order of the element in the vector having the value of 1 indicates the next node. As a path is composed of several routers, several DBNs are needed to build a complete path. Since the number of edge routers is 12 and every DBN only outputs the next node from one router to a destination router, the number of DBNs in the network is 180. To relieve the training burden, each router in the network trains the DBNs giving the next node from itself for each of its destination routers.

As mentioned earlier, the training of a DBN can be separated into two steps. The first step is pre-training of the architecture with the Greedy Layer-Wise training method, and the following step is to fine-tune the architecture with the backpropagation method. The details can be referred to in previous works [107]. Since the training of the deep learning architecture is to output the desired values, we usually utilize the distance between the practical output and the desired output to measure the training errors.
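The 16-dimensional one-hot output characterization and the output distance just described can be sketched concretely. The helper names below are our own (the paper does not define an API), and the "soft" output vector is a hypothetical DBN response used only for illustration.

```python
def encode_next_node(node_index, num_routers=16):
    """One-hot 16-dimensional output vector: a single 1 marks the next node."""
    vec = [0.0] * num_routers
    vec[node_index] = 1.0
    return vec

def decode_next_node(output_vec):
    """Interpret a (possibly soft) DBN output as the index of the next node."""
    return max(range(len(output_vec)), key=lambda i: output_vec[i])

def mse(practical, desired):
    """Distance between the practical and desired outputs (training error)."""
    return sum((p - d) ** 2 for p, d in zip(practical, desired)) / len(practical)

desired = encode_next_node(5)      # next hop is router R6 (index 5)
practical = [0.01] * 16
practical[5] = 0.88                # hypothetical soft DBN output
print(decode_next_node(practical))          # → 5
print(round(mse(practical, desired), 6))    # → 0.000994
```

Chaining one such next-node decision per hop, from origin to destination, yields the complete path that the edge router attaches to its packets.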


[Fig. 6: The error propagates layer by layer and the value changes. A neuron j in layer l receives the weighted input w^l_{ji} from neuron i in layer l−1.]

To find a method to decrease this distance, we first need to know how the training errors are produced. To easily understand how the error is defined, imagine that the error is represented by a demon icon as depicted in Fig. 6. The demon sits at the j-th neuron in the l-th layer. The demon interferes with the neuron's operation when the input to the neuron comes in. It adds a little change (i.e., a noise) to the neuron's weighted input, so that instead of generating only the expected output, the neuron outputs the expected output along with the noise. This change propagates through the subsequent layers in the DBN, finally causing the overall cost to change by a substantial amount [210]. Consequently, to minimize the value of the cost function, we need to adjust the values of the weights. The mathematical procedure to conduct this is to utilize the derivative of the cost function to update the weight w^(l)_{ji} as shown in Fig. 6. To sufficiently adjust the weights, we need to apply the backpropagation method several times until the value of the cost function reaches the requirement. In Table I, we list the values of the weights and the Mean Square Error (MSE), which is used to measure the value of the cost function. It can be clearly seen, from Table I, that the value of the MSE decreases with the number of backpropagation steps.

TABLE I: The values of the weights and the Mean Square Error (MSE) for different numbers of backpropagation steps.

  Step   Weight         Value       MSE
  100    w^1_{0,0}      0.825148
         w^1_{1,0}     −0.600210
         ...            ...         2.918418 × 10^−5
         w^3_{11,15}   −0.074882
         w^3_{12,15}   −0.052039
  200    w^1_{0,0}      0.825954
         w^1_{1,0}     −0.598363
         ...            ...         6.719093 × 10^−6
         w^3_{11,15}   −0.143467
         w^3_{12,15}   −0.121055
  ...    ...            ...         ...
  500    w^1_{0,0}      0.825623
         w^1_{1,0}     −0.597710
         ...            ...         9.843218 × 10^−7
         w^3_{11,15}   −0.231773
         w^3_{12,15}   −0.209574
  ...    ...            ...         ...
  1000   w^1_{0,0}      0.825115
         w^1_{1,0}     −0.597787
         ...            ...         4.882770 × 10^−7
         w^3_{11,15}   −0.263871
         w^3_{12,15}   −0.241689

After the training phase, every router obtains the values of the weights and biases of the DBNs that predict the next nodes from itself toward the edge routers. The next step is that every router forwards these values to the edge routers. Therefore, once every edge router gets the traffic patterns of all routers, it can utilize the DBNs to construct the whole paths to the other edge routers.

Next, based on the simulation topology in [107], the performance of the deep learning based routing is evaluated. Since the computations of all the routers were outsourced to a single machine, the evaluation was restricted to a medium scale wireless mesh backbone network comprising the 16 routers shown in Fig. 5, rather than a full-scale backbone network topology. Note that this scale of simulation is sufficient as long as it demonstrates that the proposed deep learning system outperforms conventional routing strategies such as OSPF. The data and control packet sizes are both set to 1 Kb. The link bandwidths are set to 8 Mbps, which is reasonable for this scale of wireless mesh backbone [211]. Every node is assumed to have an unlimited buffer. The overall data packet generation rate in the considered network is varied between 7.68 Mbps and 14.4 Mbps. For comparison with the adopted deep learning system, OSPF is used as the benchmark method. In the conducted simulations, the time-interval of each path updating phase is set to 0.25 s, during which signaling is exchanged once.

To decide the number of layers and the number of units in every hidden layer, different deep learning structures are compared first, as shown in Table II. In the training process of the deep learning system, the MSE is often used as the stopping condition of the training. Here, we use this value to measure the performance of different deep learning structures. For simplicity, 12 structures are compared, in which the number of layers is varied from 4 to 6 while the number of nodes in every hidden layer is varied in the range of {14, 16, 18, 20}.

TABLE II: Comparison of the learning structures for the considered network.

  MSE (10^−5)        layers
  nodes          4        5        6
  14           2.157    2.224    2.232
  16           2.15     2.223    2.229
  18           2.15     2.218    2.224
  20           2.159    2.212    2.225

It can be noticed that when the number of nodes in the hidden layers is fixed, the value of the MSE grows bigger as the number of layers grows. This indicates that 4 layers are sufficient for our training data. In other words, more layers will cause the problem of over-fitting for the considered scenario. On the other hand, the changing trend of the MSE with the number of nodes in the hidden layers is not the same for different numbers of layers.
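The trend reported in Table I — the MSE shrinking as the number of backpropagation (gradient) steps grows — can be reproduced at toy scale. The sketch below is our own illustration with a single linear neuron and made-up data, not the paper's DBN; it records the error at the same step counts used in Table I.

```python
# Toy (input, target) pairs lying on the line t = x + 0.1; purely illustrative.
data = [(0.0, 0.1), (0.5, 0.6), (1.0, 1.1)]
w, b, lr = 0.0, 0.0, 0.1

def cost():
    """Mean squared error of the one-neuron model y = w*x + b."""
    return sum((w * x + b - t) ** 2 for x, t in data) / len(data)

errors = {}
for step in range(1, 1001):
    # Derivative of the cost with respect to each weight, then a
    # gradient-descent update — the update rule described above.
    gw = sum(2 * (w * x + b - t) * x for x, t in data) / len(data)
    gb = sum(2 * (w * x + b - t) for x, t in data) / len(data)
    w, b = w - lr * gw, b - lr * gb
    if step in (100, 200, 500, 1000):
        errors[step] = cost()

print(errors[100] > errors[200] > errors[500] > errors[1000])  # → True
```

Because the cost here is a convex quadratic and the learning rate is small, every step strictly decreases the error, mirroring the monotone MSE column of Table I.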


[Fig. 7: Comparison of signaling overhead, throughput, and average per-hop delay between the benchmark method (OSPF) and the new, proof-of-concept, deep learning system with different overall data packet generation rates (7.68, 11.52, and 14.4 Mbps). (a) Comparison of signaling overhead. (b) Comparison of aggregate throughput (Mbps). (c) Comparison of average per-hop delay (ms).]

For instance, for the deep learning systems comprising 4 layers, the MSE reaches its minimum value when the number of nodes in every hidden layer is 16 or 18, which means that these two types of structures can achieve similar performance. However, the training will be more complex with the increase of nodes in the hidden layers. Therefore, we choose the deep learning structure of 4 layers with 16 nodes in every hidden layer for the evaluation of the network performance metrics shown in Figs. 7(a), 7(b), and 7(c). As shown in these figures, three network performance metrics are evaluated for the conventional OSPF and the deep learning approach. First, the signaling overhead is compared with the benchmark method (i.e., OSPF) in Fig. 7(a). The results show that the signaling overhead of OSPF is much higher compared to that of the deep learning system, no matter how the generation rate is changed. The lower signaling achieved by our proposal can be explained as follows. While the conventional OSPF method needs all the routers to frequently flood their respective neighbors with the routing information, in the deep learning method, only the edge routers' traffic patterns are sufficient to compute the whole paths with an accuracy as high as 95%. Therefore, in the deep learning method, only the edge routers need to exchange traffic patterns among themselves, and the traffic patterns of the inner routers are arbitrarily set in the running phase. Then, in Fig. 7(b), the throughputs of the two methods are compared. It can be observed that the throughput achieved by the deep learning system is almost equal to the average data generation rate of the routers, since it avoids congestion and packet drops by evaluating routes to the destination much faster than the conventional OSPF. Finally, Fig. 7(c) demonstrates that the deep learning method finds the appropriate routes quickly enough so that the average per-hop delay of the adopted deep learning system is significantly lower than that of the benchmark method.

In this section, we highlighted the deep learning application to routing in a backbone network, and provided a detailed description of how to choose an appropriate deep learning system, how to characterize the inputs and outputs of the system, and how to train the system. In addition, simulation results were provided to demonstrate how the deep learning system outperforms the conventional routing strategy. However, it is worth noting that the deep learning application for network traffic control systems is still at a preliminary stage of implementation, and therefore, various issues need to be considered as future research issues [212].

VI. Open Research Issues

A number of research challenges are anticipated to emanate from the deep learning applications for network traffic control systems. In this section, we discuss these open research issues.

A. Training data processing

Deep learning is a widely used technique in many areas, and big data often serve as its input for training. Such big data are collected from the real world and processed into training sets. However, can such raw data sets be directly used in deep learning? In most learning systems, we require the training sets to be without redundancy while having accuracy and balance. However, in the real world, the available data sets are not always so perfect. For example, some dominant classes may be represented in the data much more than other, minority classes [213]. Furthermore, the data may contain many redundant and mislabeled samples [214]. Large training sets require large memory and enormous amounts of training time, and contribute to large training costs. Decreasing redundant (i.e., useless) training data is essential to improve the training effect. Besides normal training data processing, different deep learning application areas still have their own special demands. For instance, consider the intelligent translation area, which confronts the challenge of determining how to select non-domain-specific language model training data [215]. In our prior work [our-WCM], in order to train network intelligent routing, we needed large amounts of training data from existing routing strategies. Because of the difference of data between different routing strategies, we only used OSPF-based routing data as our training data. However, some portions of such datasets may still have imbalance, and cannot be directly used in the training phase. Hence, more accurate data sets to characterize the routing paths are required. This is why, in Sec. V, we described the need to use many characterization strategies to pre-process and filter data. Indeed, there is still room for further improvement of dataset selection.
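The two pre-processing steps called for above — removing redundant records and rebalancing over-represented classes — can be sketched minimally as follows. The routing records, labels, and function names are hypothetical illustrations, not the pipeline actually used in the prior work.

```python
from collections import Counter
import random

def deduplicate(samples):
    """Drop exact duplicate (features, label) records, keeping first-seen order."""
    seen, out = set(), []
    for features, label in samples:
        key = (tuple(features), label)
        if key not in seen:
            seen.add(key)
            out.append((features, label))
    return out

def balance_by_undersampling(samples, seed=0):
    """Randomly undersample majority classes down to the rarest class size."""
    rng = random.Random(seed)
    by_label = {}
    for s in samples:
        by_label.setdefault(s[1], []).append(s)
    n = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    return balanced

# Hypothetical routing records: (feature vector, next-hop label).
raw = [([1, 2], "A"), ([1, 2], "A"), ([3, 4], "A"), ([5, 6], "B")]
clean = balance_by_undersampling(deduplicate(raw))
print(Counter(label for _, label in clean))  # one "A" and one "B" remain
```

Undersampling is only one option; oversampling the minority class or weighting the loss are common alternatives when discarding data is too costly.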


For example, using real routing data instances of multiple routing protocols, using more than 3-dimensional input data to characterize the routing path, and so forth need to be considered in the future. All in all, how to select and process training data to suit the training system and improve the quality of the training data still poses a significant research challenge.

B. Insights on the deep learning problem

Due to its profound impact in the AI area, deep learning is dramatically changing the researchers' way of thinking and of interpreting the representation of problems, which are currently being solved with analytics. The deep learning technique moves away from instructing the computer how to computationally solve a problem toward training the computer to solve the problem by itself. However, researchers must gain insight on how deep learning actually works. In other words, researchers should not take the deep learning technique for granted by arbitrarily applying it to solve large computational problems. Instead, how the deep learning works through the training process should be carefully and deeply understood. A deep neural network contains layers of nodes that represent features of objects. Can we have a method to analyze how those neural nodes construct such features of the neural network? If this is possible, the structure of the deep learning system may be improved easily, and even newer, more sophisticated deep learning structures can be invented.

C. Optimize GPU-based Deep Learning for Software Defined Routers

Software Defined Routers or SDRs are becoming an attractive platform for flexible packet processing. Furthermore, the provision of GPU-based deep learning in SDRs is both an opportunity and a challenge for researchers. Particularly, CUDA streams provide an interesting way to optimize GPU-based applications, particularly for SDRs. The CUDA streams can increase the parallelism and throughput in a significant way. However, more research is needed in this aspect to encourage researchers in realizing that the future of networking heavily hinges upon SDNs. Therefore, more investigation is required on the GPU-based SDRs, such as how to further improve the packet generation rate, how to improve resource management, and so forth, to make an optimized execution of the deep learning algorithms. Also, on-chip deep learning can be another area where researchers should devote a lot of attention. This is because while the SDRs can be a

D. Scalability of Deep Learning Applications for Network Traffic Control Systems

Specific deep learning algorithms may work well for specific applications. In addition, the time versus space tradeoff could present a significant challenge for using deep learning algorithms for network traffic control systems. This may lead to scalability issues of using deep learning applications for large network traffic control systems, such as the Internet. For instance, in the case of the SDR exploiting deep learning for the packet forwarding/routing scenario, the SDR not only requires a highly parallelized packet processing methodology but also needs to have a significantly huge storage. Furthermore, fast storage such as Ternary Content Addressable Memory (TCAM), Solid State Disks (SSDs), and so forth are expensive and may not be scalable without adequate encoding schemes for deep learning inputs and outputs. Therefore, how to practically expand the storage for the SDRs to contain the training data for the deep learning system is a formidable research challenge. Using a high-speed Storage Area Network (SAN) may be a viable approach that can provide access to the relevant dataset for training the deep learning system while moving the unwanted dataset to a backup archive storage. In the future, researchers need to adequately address the aforementioned scalability issue and develop appropriate solutions to deal with the same.

E. Deep Learning in the Internet-of-Things

Recently, the Internet-of-Things (IoT) emerged as a hot research area that aims to connect billions of things (e.g., sensors, objects, machines, devices, and so forth) in order to collect and process detailed information about events and environments to solve various challenges [2], [258]. As described in Sec. II, Google, Microsoft, Amazon, and other technology giants are heavily investing in deep learning techniques. These deep learning techniques can lead to the so-called deep linking of the IoT. Deep linking refers to a unified protocol or interface that allows applications to trigger and communicate with one another behind the scenes. For instance, the calendar application collaborates with the navigation application in a smartphone to indicate when the user should leave work to avoid the rush-hour traffic, and arrive at the favorite restaurant right on schedule. The IoT prototype called GreenIQ (a smart irrigation system for gardens) may know when it is about to rain, and avoid activating the garden sprinklers. The deep learning algorithms can be the behind-the-scene enablers of this deep linking interface in the IoT applications. In other words, the
proof-of-concept of the ability in deep learning algorithms for state-of-the-art IoT paradigm is restricted to connectivity. The
performing effective packets routing, the hardware implemen- IoT systems need to step beyond connectivity into the realm
tation of deep learning can be much more efficient. With the of intelligence empowered by deep learning. For instance,
current state-of-the-art hardware, a tempting idea could be to the smart home is a frequently referred to the use case of
explore the use of integrated graphics processors instead of a typical IoT system that exhibits connectivity of devices,
using dedicated discrete GPUs. By doing so, the GPU could gadgets, wearable things, and so forth in the user home.
be co-located with the CPU on the Quick-Path-Interconnect However, the intelligent aspect of the smart home requires
(QPI). In other words, moving the location from the typical deep learning, which is becoming prevalent in more and more
PCI-express bus to QPI can offer more bus bandwidth to devices. The networks that connect the things need to be
memory, and theoretically integrated graphics processors may somehow augmented with the deep learning systems. Future
be able to achieve even higher throughput as indicated in [257]. research works need to deal with how to realize such an
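To make the earlier point about parallelized packet processing in GPU-based SDRs concrete, the following minimal sketch (not from the paper; the two-layer network, toy weights, and packet feature names are illustrative assumptions) shows how a deep-learning-based router can score a whole batch of packet feature vectors with per-layer matrix-style operations, the regular access pattern that GPUs and CUDA streams exploit, instead of looping over packets one at a time:

```python
import math

# Toy parameters for a 2-input, 3-hidden-unit, 1-output scoring network.
# In a real SDR these would be trained weights; here they are illustrative.
W1 = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.8]]  # hidden layer: 3 x 2
W2 = [0.7, -0.6, 0.5]                        # output layer: 1 x 3

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score_one(x):
    """Per-packet inference: one feature vector -> one forwarding score."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in W1]
    return sigmoid(sum(w * v for w, v in zip(W2, h)))

def score_batch(X):
    """Batched inference: stack the packets into a matrix and apply each
    layer to all rows at once.  On a GPU each layer becomes one matrix
    multiply (the operation that CUDA streams overlap with transfers);
    here plain Python loops emulate the same arithmetic."""
    H = [[sigmoid(sum(w * v for w, v in zip(row, x))) for row in W1]
         for x in X]
    return [sigmoid(sum(w * v for w, v in zip(W2, h))) for h in H]

# Hypothetical per-packet features: [normalized queue length, link load].
packets = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.2]]
batch_scores = score_batch(packets)
```

Both paths compute identical scores; the batched form is what makes GPU (and integrated-GPU-over-QPI) deployments pay off, since the arithmetic intensity per memory transfer grows with the batch size.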

1553-877X (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/COMST.2017.2707140, IEEE
Communications Surveys & Tutorials


F. Deep Learning Approaches to Mobile Edge Computing

In next-generation networks, the various analytics on massive IoT data and big data are expected to pose a significant challenge for conventional data-driven and rule-based expert systems. While such systems are easy to implement, they are usually slow to adapt to new data types. As a consequence, deep learning techniques are being considered for analytics in data centers as well as at the network edge. At first glance, deep learning approaches to data analytics may not appear intimately connected with network traffic control systems. However, the fusion of these various disciplines, i.e., communication networks, data analytics and computing, and deep learning, is imminent. In such an interdisciplinary paradigm, where should the deep learning algorithms be implemented? Data centers, which typically comprise large processing and memory resources, robust networks, and huge storage supporting massive datasets, may be able to make the best use of deep learning algorithms for state-of-the-art web analytics. On the other hand, for analytics tasks requiring immediate or near-immediate response times (e.g., real-time IoT analytics), researchers are showing a growing interest in analytics at the network edge, referred to as Mobile Edge Computing. This paradigm, also referred to as edge computing or edge analytics, may present a challenge for deep learning algorithms simply because the equipment available at the network edge may not have sufficient resources to execute a robust deep learning system. In other words, whether a sophisticated deep learning system consisting of a complex computational circuit with millions of parameters can be used effectively for real-time edge analytics is genuinely a daunting challenge for the future. Currently, deep learning libraries such as TensorFlow, which require less memory and storage while supporting heterogeneous distributed systems, may be effective for real-time analytics tasks exploiting mobile edge nodes; however, this requires more investigation. Furthermore, current deep learning architectures draw on only a tiny fraction of what is known about real neurons and brains. Researchers are working on further variations of deep/extreme learning machines, which may change the analytics and big data mining research area completely. For instance, Intel recently announced that an autonomous vehicle may generate up to 4,000 GB of data every day [259]. On-board deep learning algorithms are required to absorb this huge amount of data from a plethora of sensors in the vehicle. Accordingly, the deep learning system has to decide how much data should be processed locally and how much information should be uploaded to the cloud, while ensuring that driving decisions are made in real time to guarantee the safety of the driver. Again, this creates an intricately complex inter-discipline of communication networks, deep learning, data mining, and perhaps even robotics. This is just one example indicating that, in the future, there may be unique areas for applying deep learning techniques in which network traffic control systems will be subject to a much more complex inter-disciplinary trend.

G. Deep Learning for Network Security

Network security is another area that may initially appear unrelated to network traffic control systems. However, security is a critical aspect of any network traffic control system. In the network security research area, a plethora of ML-based approaches exist in the literature [177], [178]. Most of these approaches deal with identifying anomalous traffic patterns as potential risks to the network. Deep learning can offer a more accurate classification of network threats, from malware detection to Distributed Denial of Service (DDoS) attacks. Nevertheless, deep learning algorithms still need to be implemented in intrusion detection systems, the deployment of which has been a key challenge in the literature. Therefore, how to enable deep learning applications to detect potential threats at the user scale may be explored. Deep learning may have a huge impact on cyber security, particularly in detecting zero-day malware, new malware, and sophisticated Advanced Persistent Threats (APTs). State-of-the-art deep learning methods can be expected to outperform traditional ML techniques because the deep learning paradigm can provide accurate information on suspicious (i.e., anomalous) activity without intervention or supervised training from human analysts. In addition, deep learning approaches are much more robust to significantly large sets of encrypted data compared to traditional ML-based intrusion detection systems. Hence, the deep learning methodology, theoretically, should be adopted by IT organizations. However, organizational policy may be a major obstacle to adopting deep learning systems to combat malicious threats, and researchers have to address this appropriately in future research.

Also, for a huge dataset containing both benign and malicious data, high dimensionality means that the intrusion detection method needs to employ a dimensionality reduction technique to estimate the presence of malicious data. Therefore, the underlying deep learning algorithm of the intrusion detection system needs to be able to perform dimensionality reduction efficiently. In this vein, Auto-Encoders may be a viable candidate. However, for the online detection case in a substantially large network, this may be particularly challenging since the deep learning algorithm needs to ingest a significantly large training set almost in real time. In addition, normality poisoning poses another challenge to unsupervised deep learning based intrusion detection methodologies (e.g., based on DBNs), as it may compromise the sanctity of the data used for detecting malicious activities. In other words, the deep learning algorithm needs to take into account the likelihood that anomalous data may be cleverly hidden within normal information in the network, thereby making the whole detection process self-defeating [260].

VII. CONCLUSION

Deep learning is a new breed of machine intelligence technique that is gaining much popularity and wide use in various computer science fields, such as object recognition, speech recognition, signal processing, robotics, and AI gaming. However, the application of deep learning in network systems has only just started to receive research attention.


In this survey paper, we discussed state-of-the-art machine learning and new deep learning research in network-related areas such as WSNs and social networks, network traffic classification, network flow prediction, mobility prediction, and so forth. Furthermore, we provided a comprehensive guide on how deep learning applications can stimulate a new area of research involving smart network traffic control systems. In particular, we focused on the newly emerging deep learning based routing and provided a step-by-step description of the deep learning technique used for intelligent network routing. In addition, simulation results were provided to demonstrate the superior performance of the deep learning based routing method compared to the conventional routing strategy. Furthermore, we discussed a number of open research issues and indicated how deep learning, networking, and computing are heading toward an intricate yet imminent inter-disciplinary area, which future researchers need to embrace.

REFERENCES

[1] S. Chen, H. Xu, D. Liu, B. Hu, and H. Wang, “A vision of IoT: Applications, challenges, and opportunities with China perspective,” IEEE Internet of Things Journal, vol. 1, no. 4, pp. 349–359, Aug. 2014.
[2] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, “Internet of Things (IoT): A vision, architectural elements, and future directions,” Future Generation Computer Systems, vol. 29, no. 7, pp. 1645–1660, Sep. 2013.
[3] Y. Zheng, F. Liu, and H. Hsieh, “U-Air: When urban air quality inference meets big data,” in The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, Aug. 2013, pp. 1436–1444.
[4] K. C. Claffy, G. C. Polyzos, and H.-W. Braun, “Application of sampling methodologies to network traffic characterization,” in ACM SIGCOMM, San Francisco, CA, USA, 1993.
[5] J. S. Marcus, “The Economic Impact of Internet Traffic Growth on Network Operators,” 2014, WIK-Consult (for Google, Inc.), Germany. [Online]. Available: http://www.wik.org/uploads/media/Google Two-Sided Mkts.pdf
[6] M. Barabas, G. Boanea, and V. Dobrota, “Multipath routing management using neural networks-based traffic prediction,” in The 3rd International Conference on Emerging Network Intelligence, Lisbon, Portugal, Nov. 2011.
[7] Zhang and Thomopoulos, “Neural network implementation of the shortest path algorithm for traffic routing in communication networks,” in International 1989 Joint Conference on Neural Networks, San Diego, CA, USA, Aug. 1989, p. 591.
[8] J. Barbancho, C. León, F. J. Molina, and A. Barbancho, “A new QoS routing algorithm based on self-organizing maps for wireless sensor networks,” Telecommunication Systems, vol. 36, no. 1, pp. 73–83, Nov. 2007.
[9] M. K. M. Ali and F. Kamoun, “Neural networks for shortest path computation and routing in computer networks,” IEEE Transactions on Neural Networks, vol. 4, no. 6, pp. 941–954, Nov. 1993.
[10] G. Boanea, M. Barabas, A. B. Rus, V. Dobrota, and J. Domingo-Pascual, “Performance evaluation of a situation aware multipath routing solution,” in 2011 RoEduNet International Conference 10th Edition: Networking in Education and Research, Iasi, Romania, Jun. 2011, pp. 1–6.
[11] F. Liu, B. Liu, C. Sun, M. Liu, and X. Wang, “Deep Learning Approaches for Link Prediction in Social Network Services,” Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 425–432.
[12] Y. L. Gwon and H. T. Kung, “Inferring origin flow patterns in Wi-Fi with deep learning,” in Proceedings of the 11th International Conference on Autonomic Computing (ICAC ’14), vol. 18-20, Philadelphia, PA, USA, Jun. 2014.
[13] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng, “Multimodal Deep Learning,” in International Conference on Machine Learning (ICML), Bellevue, USA, Jun. 2011.
[14] Z. Li and R. Wang, “A multipath routing algorithm based on traffic prediction in wireless mesh networks,” Communications and Network, vol. 1, no. 2, pp. 82–90, Aug. 2009.
[15] S. Chabaa, A. Zeroual, and J. Antari, “Identification and prediction of Internet traffic using artificial neural networks,” Journal of Intelligent Learning Systems and Applications, vol. 2, no. 3, pp. 147–155, Jul. 2010.
[16] H. Goh, N. Thome, M. Cord, and J.-H. Lim, “Top-down regularization of deep belief networks,” in Proceedings of the 26th International Conference on Neural Information Processing Systems, ser. NIPS’13, Lake Tahoe, Nevada, USA, 2013, pp. 1878–1886.
[17] N. Srivastava and R. R. Salakhutdinov, “Multimodal learning with deep Boltzmann machines,” in Advances in Neural Information Processing Systems, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds., Curran Associates, Inc., 2012, pp. 2222–2230.
[18] R. Salakhutdinov and G. E. Hinton, “Deep Boltzmann Machines,” in AISTATS, Florida, USA, Apr. 2009.
[19] R. Salakhutdinov and G. Hinton, “An Efficient Learning Procedure for Deep Boltzmann Machines,” Neural Computation, vol. 24, no. 8, pp. 1967–2006, Aug. 2012.
[20] L. Deng, “A tutorial survey of architectures, algorithms, and applications for deep learning,” APSIPA Transactions on Signal and Information Processing, vol. 3, no. 2, Jan. 2014.
[21] Y. Bengio, “Learning Deep Architectures for AI,” Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, Jan. 2009.
[22] P. E. Utgoff and D. J. Stracuzzi, “Many-Layered Learning,” Neural Computation, vol. 14, no. 10, pp. 2497–2529, Nov. 2002.
[23] Y. Bengio and Y. LeCun, “Scaling learning algorithms towards AI,” Cambridge, MA, USA: MIT Press, Aug. 2007.
[24] Y. Bengio, R. De Mori, G. Flammia, and R. Kompe, “Global optimization of a neural network-hidden Markov model hybrid,” IEEE Transactions on Neural Networks, vol. 3, no. 2, pp. 252–259, Mar. 1992.
[25] H. A. Bourlard and N. Morgan, Connectionist Speech Recognition: A Hybrid Approach, Norwell, MA, USA: Kluwer Academic Publishers, 1993.
[26] N. Morgan, “Deep and Wide: Multiple Layers in Automatic Speech Recognition,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 7–13, Jan. 2012.
[27] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, Aug. 2013.
[28] G. Hinton and R. Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks,” Science, vol. 313, no. 5786, pp. 504–507, Jul. 2006.
[29] G. E. Hinton, S. Osindero, and Y. W. Teh, “A Fast Learning Algorithm for Deep Belief Nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, Jul. 2006.
[30] Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle et al., “Greedy layer-wise training of deep networks,” Advances in Neural Information Processing Systems, vol. 19, p. 153, Aug. 2007.
[31] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum learning,” in Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada, Jun. 2009, pp. 41–48.
[32] A. Dosovitskiy, J. T. Springenberg, M. A. Riedmiller, and T. Brox, “Discriminative unsupervised feature learning with convolutional neural networks,” CoRR, vol. abs/1406.6909, Apr. 2014.
[33] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds., Harrahs and Harveys, USA: Curran Associates, Inc., Dec. 2012, pp. 1097–1105.
[34] K. Kavukcuoglu, P. Sermanet, Y.-L. Boureau, K. Gregor, M. Mathieu, and Y. LeCun, “Learning convolutional feature hierarchies for visual recognition,” in Advances in Neural Information Processing Systems, J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, Eds., Vancouver, Canada: Curran Associates, Inc., Dec. 2010, pp. 1090–1098.
[35] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural Computation, vol. 1, no. 4, pp. 541–551, Dec. 1989.
[36] I. Sutskever, J. Martens, and G. Hinton, “Generating text with recurrent neural networks,” in Proceedings of the 28th International Conference on Machine Learning (ICML ’11), L. Getoor and T. Scheffer, Eds., New York, NY, USA: ACM, Jun. 2011, pp. 1017–1024.


[37] S. Hochreiter, “The vanishing gradient problem during learning recurrent neural nets and problem solutions,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 6, no. 2, pp. 107–116, Oct. 1998.
[38] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, “Recurrent Neural Network based Language Model,” in INTERSPEECH, T. Kobayashi, K. Hirose, and S. Nakamura, Eds., Makuhari, Japan, Sep. 2010, pp. 1045–1048.
[39] Y. Bengio, N. Boulanger-Lewandowski, and R. Pascanu, “Advances in optimizing recurrent networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, May 2013, pp. 8624–8628.
[40] I. Sutskever, “Training recurrent neural networks,” Ph.D. dissertation, University of Toronto, 2013.
[41] C.-Y. Liou, J.-C. Huang, and W.-C. Yang, “Modeling word perception using the Elman network,” Neurocomputing, vol. 71, no. 16–18, pp. 3150–3157, Jun. 2008.
[42] H. Bourlard and Y. Kamp, “Auto-association by multilayer perceptrons and singular value decomposition,” Biological Cybernetics, vol. 59, no. 4, pp. 291–294, Sep. 1988.
[43] R. Socher, Y. Bengio, and C. D. Manning, “Deep Learning for NLP (Without Magic),” in Tutorial Abstracts of ACL 2012, ser. ACL’12, Jul. 2012, pp. 5–5.
[44] Y. Freund and D. Haussler, “Unsupervised learning of distributions on binary vectors using two layer networks,” University of California at Santa Cruz, Santa Cruz, CA, USA, Tech. Rep., Jun. 1994.
[45] M. A. Côté and H. Larochelle, “An Infinite Restricted Boltzmann Machine,” Neural Computation, vol. 28, no. 7, pp. 1265–1288, Jul. 2016.
[46] T. Oohori, H. Naganuma, and K. Watanabe, “A New Backpropagation Learning Algorithm for Layered Neural Networks with Nondifferentiable Units,” Neural Computation, vol. 19, no. 5, pp. 1422–1435, May 2007.
[47] C. L. P. Chen, C. Y. Zhang, L. Chen, and M. Gan, “Fuzzy Restricted Boltzmann Machine for the Enhancement of Deep Learning,” IEEE Transactions on Fuzzy Systems, vol. 23, no. 6, pp. 2163–2173, Dec. 2015.
[48] C. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3–4, pp. 279–292, Oct. 1992.
[49] “Methods and apparatus for reinforcement learning,” https://www.google.com/patents/US20150100530, (accessed Dec. 2016).
[50] H. van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” CoRR, vol. abs/1509.06461, Nov. 2015.
[51] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” CoRR, vol. abs/1511.05952, Nov. 2015.
[52] Z. Wang, N. de Freitas, and M. Lanctot, “Dueling network architectures for deep reinforcement learning,” CoRR, vol. abs/1511.06581, Nov. 2015.
[53] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” CoRR, vol. abs/1509.02971, Sep. 2015.
[54] I. J. Goodfellow, A. Courville, and Y. Bengio, “Large-scale feature learning with spike-and-slab sparse coding,” CoRR, vol. abs/1206.6407, Jun. 2012.
[55] Y. Bengio, G. Mesnil, Y. Dauphin, and S. Rifai, “Better mixing via deep representations,” CoRR, vol. abs/1207.4404, Jun. 2012.
[56] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS-11), G. J. Gordon and D. B. Dunson, Eds., Ft. Lauderdale, FL, USA: Journal of Machine Learning Research - Workshop and Conference Proceedings, Apr. 2011, pp. 315–323.
[57] J. Chen and X. Liu, “Transfer Learning with One-class Data,” Pattern Recognition Letters, vol. 37, pp. 32–40, Feb. 2014.
[58] J. Weston, F. Ratle, H. Mobahi, and R. Collobert, “Deep Learning via Semi-Supervised Embedding,” in Neural Networks: Tricks of the Trade, G. Montavon, G. Orr, and K.-R. Müller, Eds., Berlin, Germany: Springer, May 2012.
[59] R. Collobert, “Deep Learning for Efficient Discriminative Parsing,” in 14th International Conference on Artificial Intelligence and Statistics (AISTATS), Ft. Lauderdale, FL, USA, Apr. 2011.
[60] N. Kato, M. Suzuki, S. Omachi, H. Aso, and Y. Nemoto, “A handwritten character recognition system using directional element feature and asymmetric Mahalanobis distance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 3, pp. 258–262, Mar. 1999.
[61] K. Saruta, N. Kato, M. Abe, and Y. Nemoto, “High Accuracy Recognition of ETL9B Using Exclusive Learning Neural Network-II: ELNET-II,” IEICE Transactions on Information and Systems (Special Issue on Character Recognition and Document Understanding), vol. 79, no. 5, pp. 516–522, May 1996.
[62] “MNIST database,” http://yann.lecun.com/exdb/mnist/, (accessed Dec. 2016).
[63] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, Jan. 2014.
[64] S. Rifai, Y. Bengio, Y. Dauphin, and P. Vincent, “A generative process for sampling contractive auto-encoders,” in 29th International Conference on Machine Learning (ICML 2012), Edinburgh, Scotland, U.K., Jun. 2012.
[65] D. C. Ciresan, U. Meier, and J. Schmidhuber, “Multi-column deep neural networks for image classification,” CoRR, vol. abs/1202.2745, Feb. 2012.
[66] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211–252, Sep. 2015.
[67] Emerging Technology from the arXiv, “Google Unveils Neural Network with Superhuman Ability to Determine the Location of Almost Any Image,” https://www.technologyreview.com/s/600889/google-unveils-neural-network-with-superhuman-ability-to-determine-the-location-of-almost/, (accessed Dec. 2016).
[68] T. Landauer, C. Kamm, and S. Singhal, “Learning a Minimally Structured Back Propagation Network to Recognize Speech,” in Proc. 9th Annu. Conf. Cogn. Sci. Soc., Seattle, WA, USA, 1987.
[69] Q. Zhu, B. Chen, N. Morgan, and A. Stolcke, “Tandem Connectionist Feature Extraction for Conversational Speech Recognition,” Machine Learning for Multimodal Interaction, vol. 3361, pp. 223–231, 2005.
[70] O. Abdel-Hamid, A. R. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, “Convolutional Neural Networks for Speech Recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, pp. 1533–1545, Oct. 2014.
[71] F. Seide, G. Li, X. Chen, and D. Yu, “Feature Engineering in Context-Dependent Deep Neural Networks for Conversational Speech Transcription,” in IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU’11), Hilton Waikoloa Village Resort, Waikoloa, HI, USA, Dec. 2011.
[72] F. Seide, G. Li, and D. Yu, “Conversational Speech Transcription Using Context-Dependent Deep Neural Networks,” in 12th Annual Conference of the International Speech Communication Association (Interspeech’11), Florence, Italy, Aug. 2011.
[73] D. Yu, M. L. Seltzer, J. Li, J. Huang, and F. Seide, “Feature Learning in Deep Neural Networks - A Study on Speech Recognition Tasks,” CoRR, vol. abs/1301.3605, Jan. 2013. [Online]. Available: http://arxiv.org/abs/1301.3605
[74] “Microsoft Audio Video Indexing Service (MAVIS),” https://www.microsoft.com/en-us/research/project/mavis/, 2016 (accessed Dec. 2016).
[75] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning,” in Proceedings of the 30th International Conference on Machine Learning (ICML-13), vol. 28, no. 3, Atlanta, USA, May 2013, pp. 1139–1147.
[76] R. Socher, A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts, “Recursive deep models for semantic compositionality over a sentiment treebank,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), vol. 1631, Seattle, USA, Oct. 2013.
[77] R. Socher, E. H. Huang, J. Pennin, C. D. Manning, and A. Y. Ng, “Dynamic pooling and unfolding recursive autoencoders for paraphrase detection,” in Advances in Neural Information Processing Systems, J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, Eds., Granada, Spain: Curran Associates, Inc., Dec. 2011, pp. 801–809.
[78] “The iBrain is here and it’s already inside your phone,” https://backchannel.com/an-exclusive-look-at-how-ai-and-machine-learning-work-at-apple-8dbfb131932b#.43bf9cm00, (accessed Aug. 2016).
[79] G. E. Hinton, “Learning Distributed Representations of Concepts,” in Proceedings of the 8th Annual Conference of the Cognitive Science Society, 1986.


[80] Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin, “A Neural Probabilistic Language Model,” Journal of Machine Learning Research, vol. 3, pp. 1137–1155, Mar. 2003.
[81] R. Collobert and J. Weston, “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in Proceedings of the 25th International Conference on Machine Learning, ser. ICML ’08, New York, NY, USA: ACM, 2008, pp. 160–167.
[82] R. Socher, J. Bauer, C. D. Manning, and A. Y. Ng, “Parsing With Compositional Vector Grammars,” in ACL, Feb. 2013.
[83] A. Deoras, T. Mikolov, S. Kombrink, and K. Church, “Approximate inference: A sampling based modeling technique to capture complex dependencies in a language model,” Speech Communication, vol. 2012, no. 8, pp. 1–16, Aug. 2012.
[84] “Wall Street Journal-based Continuous Speech Recognition (CSR) Corpus,” http://catalog.ldc.upenn.edu/docs/LDC94S13A/wsj1.txt, (accessed Dec. 2016).
[85] H. Schwenk, A. Rousseau, and M. Attik, “Large, pruned or continuous space language models on a GPU for statistical machine translation,” in NAACL Workshop on the Future of Language Modeling, Montreal, Quebec, Canada, Jun. 2012.
[86] D. Castelvecchi, “Deep learning boosts Google Translate tool,” Nature News, http://www.nature.com/news/deep-learning-boosts-google-translate-tool-1.20696, (accessed Dec. 2016).
[87] T. Schmidt, R. Newcombe, and D. Fox, “Self-supervised Visual Descriptor Learning for Dense Correspondence,” IEEE Robotics and Automation Letters, vol. 1, no. 99, pp. 1–1, Jan. 2016.
[88] G. Pasquale, C. Ciliberto, L. Rosasco, and L. Natale, “Object identification from few examples by improving the invariance of a Deep Convolutional Neural Network,” in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, Oct. 2016, pp. 4904–4911.
[89] T. De Bruin, J. Kober, K. Tuyls, and R. Babuška, “Improved deep reinforcement learning for robotics through distribution-based experience retention,” in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, Oct. 2016, pp. 3947–3952.
[90] L. Porzi, S. R. Bulo, A. Penate-Sanchez, E. Ricci, and F. Moreno-Noguer, “Learning Depth-aware Deep Representations for Robotic

Systems and Knowledge Discovery (ICNC-FSKD), ChangSha, China, Aug. 2016, pp. 578–582.
[100] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016.
[101] “The Game Imitation: A Portable Deep Learning Model for Modern Gaming AI,” http://cs231n.stanford.edu/reports2016/113 Report.pdf, (accessed Dec. 2016).
[102] A. Mestres, A. Rodríguez-Natal, J. Carner, P. Barlet-Ros, E. Alarcón, M. Solé, V. Muntés, D. Meyer, S. Barkai, M. J. Hibbett, G. Estrada, K. Maruf, F. Coras, V. Ermagan, H. Latapie, C. Cassar, J. Evans, F. Maino, J. C. Walrand, and A. Cabellos, “Knowledge-defined networking,” CoRR, vol. abs/1606.06222, Nov. 2016.
[103] “Deep learning comp sheet: Deeplearning4j vs. Torch vs. Theano vs. Caffe vs. TensorFlow,” http://deeplearning4j.org/compare-dl4j-torch7-pylearn.html, 2016 (accessed Aug. 2016).
[104] T. White, Hadoop: The Definitive Guide, 1st ed., O’Reilly Media, Inc., 2009.
[105] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster computing with working sets,” in Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, ser. HotCloud’10, Berkeley, CA, USA, 2010.
[106] “Will API,” https://scarsty.gitbooks.io/will/content/, 2016 (accessed Aug. 2016).
[107] N. Kato, Z. M. Fadlullah, B. Mao, F. Tang, O. Akashi, T. Inoue, and K. Mizutani, “The Deep Learning Vision for Heterogeneous Network Traffic Control: Proposal, Challenges, and Future Perspective,” Dec. 2016, to appear.
[108] S. Chintala, “Understanding Natural Language with Deep Neural Networks Using Torch,” https://devblogs.nvidia.com/parallelforall/understanding-natural-language-deep-neural-networks-using-torch/, (accessed Dec. 2016).
[109] P. Roquero, J. Ramos, V. Moreno, I. González, and J. Aracil, “High-speed TCP flow record extraction using GPUs,” The Journal of Supercomputing, vol. 71, no. 10, pp. 3851–3876, Jul. 2015.
Perception,” IEEE Robotics and Automation Letters, vol. 1, no. 99, [110] A. E.A.A. Abdulla, H. Nishiyama, J. Yang, N. Ansari, and N. Kato,
pp. 1–1, Jan. 2016. “HYMN: A Novel Hybrid Multi-hop Routing Algorithm to Improve the
[91] J. Varley, J. Weisz, J. Weiss, and P. Allen, “Generating multi-fingered Longevity of WSNs,” IEEE Transactions on Wireless Communications,
robotic grasps via deep learning,” in 2015 IEEE/RSJ International Con- vol. 11, no. 7, pp. 2531-2541, July 2012.
ference on Intelligent Robots and Systems (IROS), Hamburg, Germany, [111] H. Nakayama, Z. M. Fadlullah, N. Ansari, and N. Kato, “A Novel
Sep. 2015, pp. 4415–4420. Scheme for WSAN Sink Mobility based on Clustering and Set Packing
[92] J. Yu, K. Weng, G. Liang, and G. Xie, “A vision-based robotic Techniques,” IEEE Transactions on Automatic Control, vol. 56, no. 10,
grasping system using deep learning for 3D object recognition and pose pp. 2381-2389, Oct. 2011.
estimation,” in 2013 IEEE International Conference on Robotics and [112] K. Suto, H. Nishiyama, N. Kato, and C-W. Huang, “An Energy-
Biomimetics (ROBIO), Shenzhen, China, Dec. 2013, pp. 1175–1180. Efficient and Delay-Aware Wireless Computing System for Industrial
[93] A. S. Polydoros, L. Nalpantidis, and V. Krger, “Real-time deep learning Wireless Sensor Networks,” IEEE Access, vol. 3, pp. 1026-1035, July
of robotic manipulator inverse dynamics,” in 2015 IEEE/RSJ Interna- 2015.
tional Conference on Intelligent Robots and Systems (IROS), hamburg, [113] A. E.A.A. Abdulla, H. Nishiyama, and N. Kato, “Extending the
Germany, Sep. 2015, pp. 3442–3448. Lifetime of Wireless Sensor Networks: A Hybrid Routing Algorithm,”
[94] J. Li, P. Ozog, J. Abernethy, R. M. Eustice, and M. Johnson-Roberson, Computer Communications Journal, vol. 35, no. 9, pp. 1056-1063, May
“Utilizing high-dimensional features for real-time robotic applications: 2012.
Reducing the curse of dimensionality for recursive Bayesian estima- [114] H. Nakayama, N. Ansari, A. Jamalipour, and N. Kato, “Fault-resilient
tion,” in 2016 IEEE/RSJ International Conference on Intelligent Robots Sensing in Wireless Sensor Networks,” Computer Communications,
and Systems (IROS), Daejeon, Korea, Oct. 2016, pp. 1230–1237. Special Issue on Security on Wireless Ad Hoc and Sensor Networks,
[95] M. Mancini, G. Costante, P. Valigi, and T. A. Ciarfuglia, “Fast Vol. 30, No. 11-12, pp. 2376-2384, Sept. 2007.
robust monocular depth estimation for Obstacle Detection with fully [115] Y. Kawamoto, H. Nishiyama, Z. M. Fadlullah, and N. Kato, “Effective
convolutional networks,” in 2016 IEEE/RSJ International Conference Data Collection via Satellite-Routed Sensor System (SRSS) to Realize
on Intelligent Robots and Systems (IROS), Daejeon, Korea, Oct. 2016, Global-Scaled Internet of Things,” IEEE Sensors Journal, vol. 13, no.
pp. 4296–4303. 10, pp. 3645-3654, Oct. 2013.
[96] K. Charalampous, I. Kostavelis, and A. Gasteratos, “Context-dependent [116] D. Takaishi, H. Nishiyama, N. Kato, and R. Miura, “Towards Energy
social mapping,” in 2016 IEEE International Conference on Imaging Efficient Big Data Gathering in Densely Distributed Sensor Networks,”
Systems and Techniques (IST), Chania, Crete Island, Greece, Oct. 2016, IEEE Transactions on Emerging Topics in Computing (TETC), vol. 2,
pp. 30–35. no. 3, pp. 388-397, Sept. 2014.
[97] M. Giering, V. Venugopalan, and K. Reddy, “Multi-modal sensor [117] H. Nishiyama, T. Ngo, N. Ansari, and N. Kato, “On Minimizing
registration for vehicle perception via deep neural networks,” in 2015 the Impact of Mobility on Topology Control in Mobile Ad Hoc
IEEE High Performance Extreme Computing Conference (HPEC), Networks,” IEEE Transactions on Wireless Communications, vol. 11,
Waltham, MA, USA, Sep. 2015, pp. 1–6. no.3, pp.1158-1166, Mar. 2012.
[98] Y. Kim, H. Lee, and E. M. Provost, “Deep learning for robust [118] M. A. Alsheikh, S. Lin, D. Niyato, and H. Tan, “Machine Learning in
feature generation in audiovisual emotion recognition,” in ICASSP. Wireless Sensor Networks: Algorithms, Strategies, and Applications,”
Vancouver, BC, Canada: IEEE, May 2013, pp. 3687–3691. CoRR, vol. abs/1405, no. 4463, May 2014.
[99] R. Qian, Y. Yue, F. Coenen, and B. Zhang, “Traffic sign recognition [119] G. E. Box and G. C. Tiao, “Bayesian inference in statistical analysis,”
with convolutional neural network based on max pooling positions,” John Wiley & Sons, vol. 40, no. 1, Jul. 2011.
in 2016 12th International Conference on Natural Computation, Fuzzy [120] C. E. Rasmussen, ‘‘Gaussian processes for machine learning,” Cam-
bridge, MA, USA, Feb, 2006.

1553-877X (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/COMST.2017.2707140, IEEE
Communications Surveys & Tutorials




[228] B. Pan, Y. Zheng, D. Wilkie, and C. Shahabi, “Crowd sensing of traffic and learning in cognitive radio networks,” in 2010 18th European
anomalies based on human mobility and social media,” in Proceedings Signal Processing Conference, Aalborg, Denmark, Aug. 2010, pp. 860–
of the 21st ACM SIGSPATIAL International Conference on Advances in 864.
Geographic Information Systems, Orlando, Florida, USA, Nov. 2013, [250] O. G. Aliu, A. Imran, M. A. Imran, and B. Evans, “A survey of self
pp. 344–353. organisation in future cellular networks,” IEEE Commun. Surveys Tuts,
[229] W. Rao, K. Zhao, E. Lagerspetz, P. Hui, and S. Tarkoma, “Energyaware vol. 15, no. 1, pp. 336–361, Feb. 2013.
keyword search on mobile phones,” in in Proceedings of MCC SIG- [251] N. Agoulmine, ‘‘Autonomic network management principles: from
COMM, Helsinki, Finland, Aug. 2012, pp. 59–64. concepts to applications,” Academic Press - Elsevier, Oct. 2011.
[230] K. Zhao, M. P. Chinnasamy, and S. Tarkoma, “Automatic city region [252] M. Dirani and Z. Altman, ‘‘A cooperative reinforcement learning
analysis for urban routing,” in IEEE ICDMW, Atlantic City, NJ, USA, approach for inter-cell interference coordination in OFDMA cellular
Nov. 2015, pp. 1136–1142. networks,” Avignon, France: Proc. IEEE 8th Int. Symp. Modeling
Optim. Mobile, Ad Hoc, Wireless Netw. (WiOpt), May 2010.

1553-877X (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/COMST.2017.2707140, IEEE
Communications Surveys & Tutorials


[253] R. Razavi, S. Klein, and H. Claussen, “Self-optimization of capacity and coverage in LTE networks using a fuzzy reinforcement learning approach,” in Proc. IEEE 21st Int. Symp. Pers. Indoor Mobile Radio Commun. (PIMRC), Istanbul, Turkey, Sep. 2010.
[254] ——, “A fuzzy reinforcement learning approach for self-optimization of coverage in LTE networks,” Bell Labs Tech. J., vol. 15, no. 3, pp. 153–175, Dec. 2010.
[255] M. N. Islam and A. Mitschele-Thiel, “Reinforcement learning strategies for self-organized coverage and capacity optimization,” in Proc. IEEE Wireless Communications Network Conf. (WCNC), Le Palais des Congres de Paris, Paris, France, Apr. 2012, pp. 2818–2823.
[256] N. Morozs, T. Clarke, and D. Grace, “Heuristically accelerated reinforcement learning for dynamic secondary spectrum sharing,” IEEE Access, vol. 3, pp. 2771–2783, Dec. 2015.
[257] S. Han, K. Jang, K. Park, and S. Moon, “Building a single-box 100 Gbps software router,” in 17th IEEE Workshop on Local Metropolitan Area Networks (LANMAN), Chapel Hill, NC, USA, May 2010.
[258] D. Singh, G. Tripathi, and A. J. Jara, “A survey of Internet-of-Things: Future vision, architecture, challenges and services,” in IEEE World Forum on Internet of Things (WF-IoT), Seoul, Korea, Mar. 2014.
[259] “Intel Autonomous Cars,” http://www.intel.eu/content/www/eu/en/it-managers/autonamous-cars.html, 2016 (accessed Dec. 2016).
[260] Y. Li, R. Ma, and R. Jiao, “A Hybrid Malicious Code Detection Method Based on Deep Learning,” International Journal on Security and Its Applications, vol. 9, no. 5, pp. 205–216, 2015.

Zubair Md. Fadlullah (M’11-SM’13) is currently an Associate Professor with the Graduate School of Information Sciences, Tohoku University, Japan. He received the BSc with Honors in Computer Science and Information Technology from the Islamic University of Technology (IUT), Bangladesh, in 2003. He completed the MSc and PhD in Applied Information Science at Tohoku University in 2008 and 2011, respectively. His research interests are in the areas of 5G, smart grid, network security, intrusion detection, game theory, quality-of-security-service provisioning mechanisms, and deep learning. He received the Dean’s Award and the President’s Award from Tohoku University in 2011, the IEEE Asia Pacific Outstanding Researcher Award in 2015, and the NEC Foundation Prize for research contributions in 2016. He also received several best paper awards at the Globecom, IC-NIDC, and IWCMC conferences.

Fengxiao Tang (S’15) received the B.E. degree in measurement and control technology and instrument from the Wuhan University of Technology, Wuhan, China, in 2012 and the M.S. degree in software engineering from the Central South University, Changsha, China, in 2015. Currently, he is pursuing the Ph.D. degree at the GSIS, Tohoku University, Japan. His research interests are unmanned aerial vehicle systems, game theory optimization, and deep learning.

Bomin Mao (S’15) received the BSc degree in telecommunications engineering and the M.S. degree in electronics and telecommunications engineering at Xidian University, China, in 2012 and 2015, respectively. Currently, he is pursuing the Ph.D. degree at the GSIS, Tohoku University, Japan. His research interests involve wireless networks, software defined networks, and quality of service, particularly with applications of machine intelligence and deep learning.

Nei Kato is a full professor and the Director of the Research Organization of Electrical Communication (ROEC), Tohoku University, Japan. He has been engaged in research on computer networking, wireless mobile communications, satellite communications, ad hoc & sensor & mesh networks, smart grid, IoT, Big Data, and pattern recognition. He has published more than 350 papers in prestigious peer-reviewed journals and conferences. He is the Editor-in-Chief of IEEE Network Magazine (2015.7-), the Associate Editor-in-Chief of the IEEE Internet of Things Journal (2013-), an Area Editor of IEEE Transactions on Vehicular Technology (2014-), and the Chair of the IEEE Communications Society Sendai Chapter. He served as a Member-at-Large on the Board of Governors, IEEE Communications Society (2014-2016), a Vice Chair of the Fellow Committee of the IEEE Computer Society (2016), and a member of the IEEE Computer Society Award Committee (2015-2016) and the IEEE Communications Society Award Committee (2015-2017). He has also served as the Chair of the Satellite and Space Communications Technical Committee (2010-2012) and the Ad Hoc & Sensor Networks Technical Committee (2014-2015) of the IEEE Communications Society. His awards include the Minoru Ishida Foundation Research Encouragement Prize (2003), the Distinguished Contributions to Satellite Communications Award from the IEEE Communications Society Satellite and Space Communications Technical Committee (2005), the FUNAI Information Science Award (2007), the TELCOM System Technology Award from the Foundation for Electrical Communications Diffusion (2008), the IEICE Network System Research Award (2009), the IEICE Satellite Communications Research Award (2011), the KDDI Foundation Excellent Research Award (2012), the IEICE Communications Society Distinguished Service Award (2012), the IEICE Communications Society Best Paper Award (2012), the Distinguished Contributions to Disaster-resilient Networks R&D Award from the Ministry of Internal Affairs and Communications, Japan (2014), the Outstanding Service and Leadership Recognition Award 2016 from the IEEE Communications Society Ad Hoc & Sensor Networks Technical Committee, the Radio Achievements Award from the Ministry of Internal Affairs and Communications, Japan (2016), and Best Paper Awards from IEEE ICC/GLOBECOM/WCNC/VTC. Nei Kato is a Distinguished Lecturer of the IEEE Communications Society and the Vehicular Technology Society. He is a fellow of IEEE and IEICE.

Osamu Akashi received BSc and MSc degrees in information science from Tokyo Institute of Technology in 1987 and 1989, respectively. He received his Ph.D. degree in mathematical and computing sciences from Tokyo Institute of Technology in 2001. He joined the Nippon Telegraph and Telephone Corporation (NTT) Software Laboratories in 1989, and is a senior research scientist at the NTT Network Innovation Laboratories. His research interests are in the areas of distributed systems, multi-agent systems, and network architectures. He is a member of ACM, IEICE, IPSJ and JSSST.

Takeru Inoue is a distinguished researcher at NTT Network Innovation Laboratories. He was an ERATO researcher at the Japan Science and Technology Agency from 2011 through 2013. His research interests widely cover algorithmic approaches in computer networks. He received the B.E., M.E., and Ph.D. degrees from Kyoto University, Japan, in 1998, 2000, and 2006, respectively.

Kimihiro Mizutani (M) is a researcher at NTT Network Innovation Labs. He received the M.S. degree in information systems from the Nara Institute of Science and Technology in 2010. His research interest is future Internet architecture. He received the best student paper award from the International Conference on Communication Systems and Application (ICCSA) in 2010. He also received research awards from IPSJ and IEICE in 2010 and 2013, respectively. He is a member of IEICE and the IEEE Communications Society.
