Bridging Machine Learning and Computer Network Research: A Survey
REVIEW PAPER
Received: 6 November 2017 / Accepted: 14 November 2018 / Published online: 30 November 2018
© China Computer Federation (CCF) 2018
Abstract
With the booming development of artificial intelligence (AI), a series of relevant applications are emerging and driving a sweeping transformation of the industry. As the core technology of AI, machine learning (ML) shows great potential in solving network challenges. Network optimization, in return, brings significant performance gains for ML applications, in particular distributed machine learning. In this paper, we survey the research that combines ML technologies with computer network research.
Keywords Artificial intelligence · Machine learning · Computer network · Directions and challenges
The rest of the paper is organized as follows. Section 2 provides an overview of machine learning based network management solutions. Section 3 reviews the significant works in machine learning based network security and privacy solutions. Section 4 surveys representative works in network for distributed machine learning. We summarize and bring forth the future directions in Sect. 5.
2 Machine learning based network management solutions

Operation & management have always been a major part of network engineering, especially with the booming of cloud computing and the development of data centers. Since the scale of networks has grown to a great extent, it becomes more challenging for network operators to manage such large networks and guarantee the service quality provided to customers. Traditional operating methodology requires a large amount of manual work, which places a laborious burden on network operators. The rise of ML techniques has brought new opportunities to free network operators from this heavy workload. Meanwhile, through machine learning techniques, system performance can be optimized and resources can be better utilized. Specifically, ML techniques have been applied in the following aspects of network operation & management.
2.1 Intelligent maintenance and fault detection

Performance anomalies can damage the service quality provided to customers, and it is a critical issue for network operators to detect or prevent anomalies in routine maintenance. Many explorative works have been conducted focusing on the design of anomaly detectors Fontugne et al. (2010), Soule et al. (2005), Li et al. (2006), Yamada et al. (2013), Ashfaq et al. (2010). However, traditional monitoring mechanisms are time-consuming and less effective, as they require expert knowledge from operators and involve laborious manual work. Examining this, some novel solutions have been proposed to utilize ML techniques for network system maintenance.
2.1.1 Maintenance with sufficient history data

Given sufficient history data, it is feasible to train a detection model so that it can replace human operators in monitoring the performance of the network system. Opprentice Liu et al. (2015) is a novel framework to detect performance anomalies with reference to KPI data. The main objective of Opprentice is to choose suitable detectors and tune them to detect real-time anomalies without the participation of network operators. To attain this, Opprentice constructs a Random Forest model from the history data accumulated by experienced operators. Human operators can interact with the system periodically to set hyper-parameters and input fresh training data for correction. Most of the time, Opprentice is able to independently conduct online detection of possible anomalies without the participation of network operators. In this way, the accumulated history data is utilized and human labor is much reduced. The Winnowing Algorithm Lutu et al. (2014) is another successful application in router management; it distinguishes unintended limited visibility prefixes (LVPs) caused by misconfigurations or unforeseen routing policies. The essence of the Winnowing Algorithm is decision-tree based classification. In the training process, many decision trees are established based on the labeled data. With the boosted tree model, the Winnowing Algorithm is able to distinguish unintended LVPs from those that are the stable expression of intended routing policies, and thereby detect anomalous events in the routing system.
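To make the tree-ensemble idea concrete, the following sketch trains a random forest on simple windowed KPI statistics. The features, synthetic data and labels are illustrative stand-ins, not Opprentice's actual detectors or datasets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def kpi_features(series, window=10):
    """Turn a KPI time series into per-point features (an illustrative set:
    rolling mean, rolling std, and deviation from the rolling mean)."""
    feats = []
    for i in range(window, len(series)):
        w = series[i - window:i]
        feats.append([w.mean(), w.std(), series[i] - w.mean()])
    return np.array(feats)

# Synthetic KPI with injected spikes standing in for operator-labeled anomalies.
rng = np.random.default_rng(0)
kpi = rng.normal(100, 5, 2000)
anomaly_idx = rng.choice(len(kpi), 40, replace=False)
kpi[anomaly_idx] += 50
labels = np.zeros(len(kpi), dtype=int)
labels[anomaly_idx] = 1

X = kpi_features(kpi)
y = labels[10:]                      # align labels with windowed features
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("flagged points:", int(clf.predict(X).sum()))
```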
2.1.2 Maintenance without sufficient history data

On the other hand, when there is not enough data to train a suitable model for the practical scenario, transfer learning becomes a considerable choice. The SMART solution Botezatu et al. (2016), proposed by researchers at IBM, focuses on disk replacement in the data center and trains a classification model to determine whether a disk should be replaced. Considering that the data come from different scenarios with different distributions, they take advantage of transfer learning to eliminate sample selection bias Zadrozny (2004). In this way, the data from different scenarios are transferred to train the ML model and further help the operator with disk replacement.
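The Zadrozny-style correction for sample selection bias can be sketched as follows: a domain classifier estimates how source-domain samples should be re-weighted so they resemble the target distribution. All data, labels and model choices here are hypothetical, not SMART's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_src = rng.normal(0.0, 1.0, (2000, 4))   # e.g. disk stats from one cluster
X_tgt = rng.normal(0.5, 1.2, (2000, 4))   # stats from the deployment cluster

# Domain classifier: P(sample comes from target | x).
X_dom = np.vstack([X_src, X_tgt])
y_dom = np.r_[np.zeros(len(X_src)), np.ones(len(X_tgt))]
dom = LogisticRegression(max_iter=1000).fit(X_dom, y_dom)

p_tgt = dom.predict_proba(X_src)[:, 1]
weights = p_tgt / (1.0 - p_tgt)           # density-ratio estimate per sample

# The task model (a toy 'needs replacement' label) is then fit on weighted data.
y_task = (X_src[:, 0] > 1.0).astype(int)
task = LogisticRegression(max_iter=1000).fit(X_src, y_task, sample_weight=weights)
```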
2.2 Resource management and job scheduling

Resource management and job scheduling have always been hot topics in the data center, especially as the scale grows large and the communication becomes more complicated Chowdhury and Stoica (2015), Ma et al. (2016). Usually the problem can be formulated as an NP-hard problem, and past non-AI works adopt simple heuristic methods to solve it Ballani et al. (2011), Li et al. (2016), Xie et al. (2012), Zhu et al. (2012). Nowadays there is an emerging trend to utilize classic intelligent algorithms to find better solutions efficiently. Generally speaking, a proper resource management solution focuses on two aspects of demands, i.e. utilization-driven and energy-saving. On one hand, the solution is expected to improve resource utilization and accelerate progress. On the other hand, energy consumption should be reduced in the data center, especially in the dynamic scenario that involves VM (or container) migration. Besides utilization-driven and energy-saving objectives, there are also some works pursuing a hybrid objective to reach a better trade-off among various performance metrics.
2.2.1 Utilization-driven solution

Libra Ma et al. (2016) is a representative work focusing on the first aspect, and it aims to maximize the isolation guarantee with an optimal placement for parallel applications in the data center. In order to maximize the isolation guarantee, Tabu search is adopted in Libra to help find an optimal solution for container placement at an affordable computation cost. Besides traditional intelligent algorithms such as Tabu search, novel techniques, including deep learning and reinforcement learning, are also applied in resource management and job scheduling. DeepRM Mao et al. (2016) is a pioneering work in adopting deep reinforcement learning technology to manage resources in network systems. It defines the average job slowdown to quantify the normalized progress rate of jobs and aims to minimize this metric with a standard policy gradient algorithm. DeepRM proves the feasibility of deep RL techniques in resource management problems and motivates fresh ideas for subsequent research.
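As a concrete illustration of the policy-gradient idea behind DeepRM, the toy loop below trains a softmax policy with REINFORCE to minimize average slowdown. The three-action environment is a stand-in, not DeepRM's simulator.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = np.zeros(3)                        # preference per scheduling action
true_slowdown = np.array([2.0, 1.2, 3.5])  # hidden mean slowdown per action

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

baseline, lr = 0.0, 0.1
for step in range(3000):
    p = softmax(theta)
    a = rng.choice(3, p=p)
    reward = -(true_slowdown[a] + rng.normal(0, 0.1))  # minimize slowdown
    baseline += 0.01 * (reward - baseline)             # variance reduction
    grad_logp = -p
    grad_logp[a] += 1.0                                # d log pi(a) / d theta
    theta += lr * (reward - baseline) * grad_logp      # REINFORCE update

print("learned policy:", np.round(softmax(theta), 3))  # favors the fastest action
```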
…is also implemented with Q-learning to balance QoS revenue and power consumption in geo-distributed data centers Zhou et al. (2016). The objective function quantifies QoS revenue and power consumption with a simple weighted sum; besides, optimization techniques are integrated to accelerate the solving computation. By tuning the hyper-parameter in the weighted sum, adaptive policies can be generated from the RL model to cater to various services in geo-distributed data centers.

Moreover, there are also some research works focusing on predicting the future situation to determine the current resource management solution. For example, user demand can be modeled with neural network classifiers, and adaptive solutions are then generated to determine the resource configuration and job scheduling in the data center Bao et al. (2016). Recent works, such as Tan et al. (2017), Zhang et al. (2017), He et al. (2013), adopt online learning techniques to predict future workload and reduce the cost of time and resources.
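The core loop of such an online workload predictor can be sketched as an autoregressive model updated by SGD as new load measurements stream in. The cited works use richer online-learning machinery; this is only a minimal illustration with synthetic data.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(2000)
load = 50 + 10 * np.sin(2 * np.pi * t / 100) + rng.normal(0, 1, len(t))

lag, lr = 4, 1e-5
w = np.zeros(lag + 1)
errs = []
for i in range(lag, len(load) - 1):
    x = np.r_[1.0, load[i - lag:i]]   # bias term + last `lag` samples
    pred = w @ x                      # predict the next load value
    err = load[i] - pred              # observe the truth, then update
    w += lr * err * x                 # online least-mean-squares step
    errs.append(abs(err))

print("mean abs error, last 100 steps:", round(float(np.mean(errs[-100:])), 2))
```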
2.3 Service performance optimization
2.3.2 Web search optimization

Apart from the quality management of video streaming, web search is another hot field in QoS optimization. From the perspective of user experience, response time is a key attribute to consider during QoS optimization for web services. The optimization against high search response time (HSRT) is challenging with a heavy workload. Such a situation motivates the combination of machine learning techniques and HSRT optimization. FOCUS Liu et al. (2016) is the first work in this direction; it adopts a decision tree model to learn domain knowledge automatically from search logs and identify the major factors responsible for HSRT conditions.
2.3.3 CDN service optimization

PACL (Privacy-Aware Contextual Localizer) Das et al. (2014) adopts a similar technique to FOCUS, but it aims to learn users' contextual location and further improve the quality of content delivery. PACL models the mobile traffic with a decision tree, together with pruning techniques. The most significant attributes are then identified to imply user location contexts. With the predicted context information, PACL is able to choose the nearest CDN node for content delivery, thus reducing the waiting time in data transfer.
2.3.4 Congestion control optimization

Machine learning can also be integrated into the TCP congestion control (CC) mechanism to improve network performance. For example, it has been used to classify congestive and non-congestive loss Jayaraj et al. (2008), to forecast TCP throughput Mirza et al. (2007), and for better RTT estimation Nunes et al. (2011). Remy Winstein et al. (2013) formalizes the multi-user congestion control problem as an MDP and learns the optimal policy offline. It needs intense offline computation, and the performance of the RemyCCs depends on the accuracy of the network and traffic models. PCC Dong et al. (2015) adaptively adjusts its sending rate based on continuous profiling, but it is entirely rate-based and its performance depends on the accuracy of the clocking. The learnability of TCP CC was examined in Sivaraman et al. (2014), where RemyCC was used to understand what kinds of imperfect knowledge about the network model would hurt the learnability of TCP CC more than others. Q-learning based TCP Li et al. (2016) is the first attempt (that we know of) to use Q-learning to design the TCP CC.
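A minimal sketch of the Q-learning idea applied to congestion control is shown below. The coarse state grid, reward and link model are invented for illustration and do not reproduce the cited design.

```python
import random

ACTIONS = [-10, 0, +10]                  # change applied to cwnd (packets)
Q = {(s, a): 0.0 for s in range(3) for a in range(3)}
alpha, gamma, eps = 0.1, 0.9, 0.1
CAPACITY = 100.0

def reward_of(cwnd):
    """Reward throughput, penalize overshoot (a stand-in for loss)."""
    return min(cwnd, CAPACITY) - 2.0 * max(cwnd - CAPACITY, 0.0)

def state_of(cwnd):
    return 0 if cwnd < 80 else (1 if cwnd <= 120 else 2)

cwnd = 50.0
state = state_of(cwnd)
for _ in range(20000):
    if random.random() < eps:
        a = random.randrange(3)                         # explore
    else:
        a = max(range(3), key=lambda x: Q[(state, x)])  # exploit
    cwnd = min(max(cwnd + ACTIONS[a], 10.0), 200.0)
    nxt = state_of(cwnd)
    target = reward_of(cwnd) + gamma * max(Q[(nxt, x)] for x in range(3))
    Q[(state, a)] += alpha * (target - Q[(state, a)])   # Q-learning update
    state = nxt

print([max(range(3), key=lambda x: Q[(s, x)]) for s in range(3)])
```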
2.4 Traffic analysis

Traditional clustering and classification methods are widely applied in earlier works that aim to find valuable information in the large amount of data packets Santiago et al. (2012), Baralis et al. (2013), Franc et al. (2015), Bartos et al. (2016), Xu et al. (2015), Antonakakis et al. (2012). Via clustering and classification, similar patterns are mined out among data packets, which are helpful for applications such as security analysis and user profiling. The combination of classification and traffic analysis has remained a hot topic in recent years; nevertheless, some other machine learning algorithms have also come into use for traffic analysis.
2.4.1 Natural language processing for traffic analysis

Proword Zhang et al. (2014) leverages natural language processing techniques in protocol analysis. First, Proword designs a Voting Experts (VE) algorithm to select the most probable boundary positions for word partitioning. Based on the candidate feature words extracted by the VE algorithm, Proword tries to mine out the protocol features from these words. The candidate words are ranked with pre-defined scoring rules, and the top k of them serve as the feature words. The combination of NLP and protocol analysis demonstrates a higher accuracy than traditional protocol analysis methods.
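The rank-and-select stage can be sketched as below. Note that Proword's actual Voting Experts step votes on boundary positions using entropy statistics; this simplified frequency-based scoring skips it, so the sketch only conveys the candidate-ranking idea.

```python
from collections import Counter

def top_feature_words(messages, n_min=2, n_max=5, k=10):
    """Enumerate byte n-grams as candidate words and rank them by a simple
    frequency-times-length score (an illustrative stand-in for Proword's
    pre-defined scoring rules)."""
    counts = Counter()
    for msg in messages:
        for n in range(n_min, n_max + 1):
            for i in range(len(msg) - n + 1):
                counts[msg[i:i + n]] += 1
    scored = {w: c * len(w) for w, c in counts.items() if c > 1}
    return sorted(scored, key=scored.get, reverse=True)[:k]

msgs = [b"GET /index HTTP/1.1", b"GET /login HTTP/1.1", b"POST /login HTTP/1.1"]
print(top_feature_words(msgs))   # protocol keywords such as b' HTTP/' rank high
```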
2.4.2 Exploratory factor analysis for traffic analysis

Exploratory factor analysis (EFA) emerges as a novel factor analysis technique, and it is believed to be more effective than traditional principal component analysis (PCA) techniques for multivariate analysis. The recent work Furno et al. (2017) applies the EFA technique to mobile traffic data and bridges temporal and spatial structure analysis. This work fills the gap in this joint area with better or equal results compared to state-of-the-art solutions.
2.4.3 Transfer learning for traffic analysis

Transfer learning also contributes to traffic analysis and security threat detection in Bartos et al. (2016), which proposes a classification system to detect both known and previously unseen security threats. Since there may be biases between the training domain and the target domain, the knowledge acquired by a traditionally trained model cannot be directly applied to the target cases. Through the transfer learning technique, the feature values are transformed into an invariant representation for domain adaptation. Such an invariant representation is integrated into the classification system, which helps to categorize traffic into malicious or legitimate classes.
Traffic analysis is perhaps the area most closely related to machine learning techniques, and it has involved a lot of effort from computer network researchers. The broad applications of traffic analysis lie in security research, such as user profiling and anomaly detection. We will further discuss these aspects in the following section.
3 Machine learning based network security and privacy solutions

Network security and privacy are a broad area covering a range of issues, including anomaly detection, user authentication, attack defense, privacy protection and other aspects. With the explosion of network data in recent years, traditional methodologies are confronted with more challenges in the detection and defense of emerging attacks. Inspired by the success in traditional areas, researchers try to use classic ML methodology (e.g. Naive Bayes Nandi et al. (2016), KNN Wang et al. (2014), regression Nandi et al. (2016), decision trees Zheng et al. (2014), Soska and Christin (2014), SVM Franc et al. (2015), random forests Hayes and Danezis (2016)) to solve a series of complicated security problems.
3.1 Anomaly detection on cloud platform

Anomalies such as misconfigurations and vulnerability attacks have a great impact on system security and stability, and their detection can be very challenging without sufficient knowledge of the system conditions. In most cases, detection requires analyzing a large amount of system logs, which is not feasible for human operators. Machine learning techniques thus provide alternatives to manual work.

Since the cloud platform is more and more popular, many attackers are targeting the rich resources of cloud platforms; thus, how to detect anomalous behavior on a cloud platform is arousing wide concern among researchers. FraudStormML Neuvirth et al. (2015) adopts supervised regression methods to detect fraudulent use of cloud resources in Microsoft Azure. It can detect fraud storms with reference to their utilization of resources, such as bandwidth and computation units, thus raising early alerts to prevent the illegal consumption of cloud computing resources. APFC Zhang et al. (2014) is another system, developed with a hierarchical clustering algorithm, to automatically profile the maximum capacities of various cross-VM covert channels on different cloud platforms. Covert channels may cause information leakage and be utilized by hackers to threaten system security.
The maturity of NLP techniques brings new ideas to anomaly detection, and it proves to be an efficient way to detect anomalous issues based on execution logs. Nandi et al. (2016) follow this idea and construct a graph abstraction from the execution logs: the executions in the distributed applications are abstracted as nodes and workflows are abstracted as edges to connect them. Naive Bayes and linear regression models are used to detect whether there is any anomaly hidden in the graph. Such a mapping from anomaly detection to graph mining gains good performance in efficiency and scalability.

In most of these works Neuvirth et al. (2015), Zhang et al. (2014), classic algorithms are trained on system logs and other KPIs, so the key factor in obtaining more precise results is finding a well-organized set of features, which tends to generalize poorly. More precise results and better generalization can be gained from another view Nandi et al. (2016).
3.2 Authentication and privacy protection

Authentication guarantees user privacy and prevents information leakage. The user identity is supposed to be authenticated with a reliable mechanism, and then the user is granted authorization for access.

The popularity of mobile devices causes more problems for user authentication and privacy protection. On one hand, numerous applications on the phone collect users' behavior data to provide better service, but the collected information may be exploited by adversaries, further incurring leakage and threatening user privacy. On the other hand, mobile users may be unaware of information protection, so their credentials may easily be stolen by others. A complex password and/or a secondary verification mechanism can reduce the risk of authentication attacks, but they also bring significant inconvenience to users.
Observing the behavioral differences among users, Zheng et al. (2014) designed a novel authentication mechanism to improve privacy by profiling a user's behavior according to their habits while using a smartphone. More specifically, they model the nearest-neighbor distance and a decision score based on features (e.g. acceleration, pressure, size and time) to judge whether the behaviors belong to the same user and whether authentication should be granted.
Besides analysis on the user side, analysis on the adversaries' side is also worthwhile. It has been observed that the behavior of users and adversaries keeps changing dynamically Wang and Zhang (2014). The changes of the user's behavior and the adversary's actions can be modeled with a two-state Markov chain for inferring the user's behavior. The interaction between user and adversary is generalized as a zero-sum game: the user tries to change his behavior to make the adversary fail to predict his next state, whereas the adversary tries to predict the user's behavior by adjusting their strategies. The zero-sum game can be solved by a minimax learning algorithm with provable convergence to obtain the optimal strategies for users to protect their privacy against malicious adversaries.
Fingerprinting techniques are regarded as an effective method for behavior identification even under circumstances with encryption, and they can be utilized by both defenders and attackers. AppScanner Taylor et al. (2016) is a typical work which uses automatic fingerprinting to identify Android apps based on the analysis of the network traffic they generate. It adopts an SVM classifier and a random forest to establish its identification model. By using flow characteristics such as packet length and traffic direction, AppScanner is able to identify sensitive apps, which may invite attacks. Since AppScanner focuses on statistical characteristics, it works well against encryption. In addition, fingerprinting attacks are seen as a serious threat to online privacy.
Privacy is a big concern with the popularity of mobile devices: more knowledge of user behavior helps the service provider offer a better user experience, but it can also be exploited by malicious attackers to put the owner in danger. So it is more like an adversarial game, and many works mentioned before bring novel ideas for these scenarios. In general, how to figure out malicious behaviors while keeping a good user experience should be taken into consideration.
3.3 Web security and attack detection

Web security is another rigorous issue today, and many websites suffer attacks caused by vulnerabilities and other factors. The recent k-fingerprinting attack Hayes and Danezis (2016) employs website fingerprinting techniques to launch an attack even when confronted with a large amount of noisy data and encrypted traffic. It adopts a random decision forest to construct the website fingerprints and trains it on a set of features (such as bursts or packet lengths instead of plain text). It proves to be an efficient methodology for identifying which websites a victim is visiting based on historical data, which can be used in further attack behaviors.
How to identify malicious websites is arousing great interest from academia. In general, traffic inspection serves as a common and effective method for malicious behavior identification. The recent work Soska and Christin (2014) proposes a general approach for predicting a website's propensity to become malicious in the future. It adopts a C4.5 decision tree trained on relevant features (e.g. derived from traffic statistics, the file system, and webpage structure and contents) to identify whether a page is going to become malicious.
interest from academic. In general, traffic inspection serves to be effective to identify malicious traffic from tremendous
as a common and effective method for malicious behavior flow data.
identification. The recent Soska and Christin (2014) pro- The methods mentioned above may not work so well
poses a general approach for predicting websites propensity compared to raw traffic analysis, limited by the obfuscation
to become malicious in the future. It adopts C4.5 decision and encryption, some more advanced methodology may be
tree trained by Relevant features (e.g. distributed from traf- proposed further to cater to those scenarios, on the other
fic statistic, file system webpage structure and contents) to hand, many works are trying to remedy the limited condition
identify whether the page is going to be malicious. from other views.
However, the increasing variety of network applications SpiderWeb Stringhini et al. (2013) offers a new approach
Apart from the deficiency of labeled data for training, another problem in malicious behavior detection lies in the difficulty of understanding the data representation, since attackers may hide their behaviors with traffic obfuscation to escape being tracked. Confronted with such a problem, a robust representation suite is proposed in Bartos et al. (2016) for classifying evolving malicious behaviors from obfuscated traffic. It groups sets of network flows into bags and represents them with a combination of feature values and feature differences. The representation is designed to be resilient to feature shifting and scaling and oblivious to bag permutation and size changes. The proposed optimization method learns the parameters of the representation automatically from the training data (SVM-based learning of the number and size of the histogram bins), allowing the classifiers to create robust models of malicious behaviors capable of detecting previously unseen malicious variants and behavior changes.
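The invariance idea can be illustrated with a simple per-bag standardization; the published representation additionally learns feature-difference histograms, so this sketch captures only the shift/scale resilience.

```python
import numpy as np

def invariant_bag_repr(flow_features):
    """Standardize each feature within a bag of flows, so adding a constant
    or rescaling a feature (simple obfuscation) leaves the representation
    unchanged. An illustrative sketch, not the published method."""
    X = np.asarray(flow_features, dtype=float)     # rows: flows in one bag
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-9
    return (X - mu) / sigma

bag = np.array([[100, 0.2], [120, 0.25], [90, 0.15]])     # e.g. bytes, IAT
shifted = bag * np.array([3.0, 1.0]) + np.array([40, 0])  # obfuscated copy
print(np.allclose(invariant_bag_repr(bag), invariant_bag_repr(shifted)))  # True
```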
Unlike malicious obfuscation, traffic encryption, which is usually applied for privacy protection, also poses great challenges to identifying malicious flow streams. To analyze malicious traffic flows with encrypted payloads, meta-information, such as packet length and time interval, is used in Comar et al. (2013) to train the model. A two-level framework is constructed, combining an existing IDS and a self-developed SVM algorithm, which proves to be effective in identifying malicious traffic from tremendous flow data.
The methods mentioned above may not work as well as raw traffic analysis, being limited by obfuscation and encryption; more advanced methodologies may be proposed to cater to those scenarios. On the other hand, many works are trying to remedy the limited conditions from other views.
SpiderWeb Stringhini et al. (2013) offers a new approach to detect malicious web pages by using redirection graphs. It collects HTTP redirection data from a large and diverse collection of web users, aggregates the different redirection chains that lead to a specific web page, and then analyzes the characteristics of the redirection graph, extracting 28 features to represent it. By inputting these features into an SVM classifier, SpiderWeb is able to identify malicious web pages more accurately than previous methods.
Going deeper, MEERKAT Borgolte et al. (2015) brings a novel approach based on the "look and feel" of a website to identify whether the website has been defaced. Different from previous works, MEERKAT leverages recent computer vision techniques and directly takes a snapshot as input. It can automatically learn high-level features from the data and does not rely on additional information supplied by the website's operator. MEERKAT employs a stacked autoencoder neural network to "feel" the high-level features. The features are extracted by the machine automatically and input into a feedforward neural network to identify the defaced websites.
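A minimal one-hidden-layer autoencoder conveys the feature-learning idea; MEERKAT's real model is a deep stacked autoencoder over website screenshots, whereas the data here is random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.random((500, 64))                 # stand-in for flattened image patches
d, h, lr = 64, 16, 0.01
W1, b1 = rng.normal(0, 0.1, (d, h)), np.zeros(h)
W2, b2 = rng.normal(0, 0.1, (h, d)), np.zeros(d)

for epoch in range(200):
    z = np.tanh(X @ W1 + b1)              # encode
    out = z @ W2 + b2                     # decode (linear output)
    err = out - X                         # reconstruction error
    # Backpropagation for the squared reconstruction loss.
    gW2 = z.T @ err / len(X);  gb2 = err.mean(axis=0)
    dz = (err @ W2.T) * (1 - z ** 2)
    gW1 = X.T @ dz / len(X);   gb1 = dz.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

recon_err = ((np.tanh(X @ W1 + b1) @ W2 + b2 - X) ** 2).mean()
print("reconstruction MSE:", round(float(recon_err), 4))  # falls as training runs
```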
Web security is rigorous and harder to solve by ML-based methodology when confronting obfuscation and encryption technologies. On the one hand, researchers can develop more advanced models catering to such harsh conditions; meanwhile, some novel angles from other areas may bring new chances.
3.4 Barriers in ML-based security research

Researchers have focused on applying classic ML methodology to solve security issues and have achieved great success. However, more barriers still remain in this area, and here we summarize three main aspects as follows.
Challenge of model design Network security issues are mainly analyzed on the basis of traffic traces and system logs. Current research simply borrows models from other areas (such as computer vision and pattern recognition) into the security scenario. However, it remains a key concern how to design more effective models to mine the complex relationships hidden in these data. Wang (2015) brings a novel idea on how to map network traces to other areas: it maps traffic to a bitmap and adopts an autoencoder to distinguish different traffic, inspired by CV. It works well on unencrypted raw network traffic but is not suitable for encrypted traffic. Other work [Nandi et al. (2016), mapping system logs to DAG problems] also gives inspiration on how to map network security issues to traditional areas. How to design effective models catered to network security scenarios deserves to be deeply explored in the future.
Lack of training datasets In traditional areas (e.g. CV, NLP and speech recognition), many datasets (e.g. ImageNet, MNIST, SQuAD, Billion Words, bAbI, TED-LIUM, LibriSpeech) are public to academia and industry, and researchers can access those resources to develop more advanced models. In the security area, many factors (such as political and commercial concerns) constrain access to ground-truth network datasets, thus making it even harder to apply machine learning technologies to solve network security issues.
Adversarial and game theory The security problem is more like a competitive game with many factors involved. On the one hand, defenders are trying to get more precise results by tuning parameters and advancing models; on the other hand, attackers are trying to obfuscate their malicious behavior with normal data. This inverse relationship between defender and attacker makes ML-based security issues more complicated, since ML techniques can be leveraged by both attackers and defenders.
In summary, ML possesses great potential for security research. However, more factors need to be considered and many open problems remain in this area.
4 Network for distributed machine learning platform

The rapid development of the Internet has led to the explosion of business data as well as the increasing complexity of training models. The time-consuming training process and heavy workload make it nearly impossible to undertake these tasks on one single machine; therefore, distributed computing becomes an alternative way to consider. In recent years, there have been some representative works conducted towards distributed machine learning platforms, such as Hadoop hadoop (2009), Spark Zaharia et al. (2010), GraphLab Low et al. (2014), DistBelief Dean et al. (2012), TensorFlow Abadi et al. (2016), MXNet Chen et al. (2015), etc. Generally speaking, there are a couple of major issues to consider during the construction of an efficient distributed machine learning platform: (1) network topology, (2) parallelism and synchronization, (3) communication and scalability, etc.
4.1 Network topology for distributed machine learning

The architecture design of the distributed platform can impose significant impacts on the execution efficiency and overall performance; meanwhile, it has a close relationship to other issues, such as fault tolerance and scalability. So far, two types of architectural prototypes have been proposed, i.e. the Parameter Server-based (PS-based) architecture and the Ring-based architecture.
4.1.1 PS-based architecture

The PS-based architecture Chilimbi et al. (2014) is illustrated in Fig. 1. In the PS-based design, the machines (or nodes; in this paper, we use node and machine as synonyms) are organized in a centralized way and there is a functional difference among them. Some nodes work as parameter servers (PSs) whereas the others work as workers. PSs are responsible for managing the parameters of the model and …
4.2 Parallelism and synchronization
…takes the vertex as the minimum granularity and distributes the vertices onto different nodes. The edges reflect the dependency between vertices, and multiple copies of edges are stored to avoid the loss of dependency information. TuX2, on the other hand, cuts vertices and replicates them into several copies stored on several nodes. Such a design proves to be effective in handling power-law graphs and matches the PS-based architecture better.
4.2.3 Synchronization and asynchronization

Synchronization and asynchronization are a concerning issue in either parallelism mode: synchronization can sometimes cause serious communication costs, while asynchronization, on the other hand, can lead to frustrating results and incur more iterations. Since neither mechanism is perfect, some strategies have been proposed to combine the benefits of the two mechanisms.
K-bounded delay Li et al. (2014a) can be regarded as a trade-off between synchronization and asynchronization in model updates. It relaxes the synchronization constraints and allows the fastest worker node to surpass the slowest one by no more than K rounds. Only when the gap goes beyond K rounds will the fastest node be blocked for synchronization. K is a user-defined hyperparameter and varies in different models. In particular, when K is set to zero, the K-bounded delay mechanism turns into the synchronous one; when K is infinite, it turns into the asynchronous one.
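The bounded-staleness rule can be captured in a few lines; the class below is an illustrative sketch, not the API of any parameter-server system.

```python
class BoundedDelayGate:
    """Minimal sketch of K-bounded (stale-synchronous) progress control:
    a worker may advance as long as it is at most K rounds ahead of the
    slowest worker; otherwise it blocks."""

    def __init__(self, num_workers, k):
        self.clock = [0] * num_workers  # completed iterations per worker
        self.k = k

    def can_proceed(self, worker_id):
        # k = 0 degenerates to fully synchronous (BSP) execution;
        # a very large k approaches fully asynchronous execution.
        return self.clock[worker_id] - min(self.clock) <= self.k

    def finish_round(self, worker_id):
        self.clock[worker_id] += 1

gate = BoundedDelayGate(num_workers=4, k=2)
gate.finish_round(0); gate.finish_round(0)
print(gate.can_proceed(0))  # True: worker 0 is exactly K rounds ahead
```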
4.3 Communication optimization

The poor performance of a distributed system often has much to do with the communication latency. This is even more distinctive with the popularity of GPU-based computing such as NVIDIA nvidia et al. (2017) and AMD amd (2017). The GPU is efficient for parallel computing with a lot of computing cores, which is in great need of an efficient communication mechanism. The increasing training scale can incur expensive communication cost and cause severe bottlenecks for the platform. To mitigate the communication bottleneck and achieve a satisfactory performance, there are some tricks worth considering.
4.3.1 Efficient communication protocols

As one of the most typical communication protocols, TCP is widely applied to distributed machine learning systems. However, its drawbacks, such as slow start, naive congestion control and high latency, seriously damage system performance. Inspired by this, many researchers try to introduce more efficient communication protocols to improve scalability and distribution, including RDMA, GPUDirect RDMA, NVLink, etc.

Remote Direct Memory Access (RDMA) Archer and Blocksome (2012) is another high-performance communication protocol, which is aimed at accessing memory on another machine directly. RDMA can minimize packet-processing overhead and latency with the assistance of a dependable protocol implemented in hardware, zero-copy, and kernel-bypass technologies. With those features, RDMA can achieve 100 Gbps throughput and less than 5 μs latency. RDMA has great advantages over TCP and has been applied to distributed machine learning systems such as TensorFlow Jia et al. (2017), Abadi et al. (2016). To further release the potential of RDMA, GPUDirect RDMA gpudirect (2018) enables a direct path for data exchange between the GPU and a third-party peer device using standard features of PCIe (e.g. network interfaces), instead of relying on the assistance of the CPU, which incurs extra copying and latency. Related work Yi et al. (2017) is trying to introduce GPUDirect RDMA to improve the performance of distributed machine learning systems.

Recently, systems with multiple GPUs and CPUs are becoming common in AI computing. These GPUs and CPUs communicate with each other via PCIe. However, as GPUs gain more and more computation ability, the traditional PCIe bandwidth is increasingly becoming the bottleneck at the multi-GPU system level, driving the need for a faster and more scalable multiprocessor interconnect. The NVIDIA NVLink nvlink et al. (2018) technology addresses this interconnection issue by providing higher bandwidth, more links, and improved scalability for multi-GPU and multi-GPU/CPU system configurations. A single NVIDIA Tesla V100 GPU supports up to six NVLink connections and a total bandwidth of 300 GB/s, 10X the bandwidth of PCIe Gen 3. Servers like the new NVIDIA DGX-1 dgx (2017) take advantage of these technologies to gain greater scalability for ultrafast deep learning training.
4.3.2 Data compression and communication filter

Usually, the parameters are stored and transferred in the key-value format, which may cause redundancy, because values can be small (floats or integers) and the same keys are transferred during each interaction between server and worker. To mitigate this, several tricky strategies are adopted.

Transferring the updated portion of parameters Li et al. (2014a), Hsieh et al. (2017) Since parameters in a model are represented as structured mathematical objects, such as vectors, matrices, or tensors, and typically only a part of the object is updated at each iteration, only the partial (or full) matrix is transferred between them, thus greatly reducing the communication cost.

Transferring the values instead of key-value pairs Due to the range-based push and pull, a range of key-value pairs is communicated at each iteration.
When the same range is chosen again, it is likely that only the values are changed while the keys are unmodified. If both the sender and receiver have cached these keys, only the values, together with a signature of the keys, need to be transferred between them. Therefore, the network bandwidth is effectively doubled Li et al. (2014b).
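A sketch of the key-caching trick follows, with an assumed message format; the signature field and cache handling are illustrative, not the actual wire protocol of Li et al. (2014b).

```python
import hashlib
import pickle

def pack_update(keys, values, key_cache):
    """If the receiver is assumed to have already cached this exact key
    range, send only the values plus a short signature of the keys;
    otherwise send both keys and values."""
    sig = hashlib.sha1(pickle.dumps(keys)).hexdigest()
    if sig in key_cache:
        return {"sig": sig, "values": values}            # values-only message
    key_cache.add(sig)
    return {"sig": sig, "keys": keys, "values": values}  # first transfer

# Usage: the same key range sent twice only carries the keys once.
cache = set()
msg1 = pack_update([3, 5, 8], [0.1, -0.2, 0.7], cache)   # includes keys
msg2 = pack_update([3, 5, 8], [0.4, 0.0, -0.1], cache)   # values + signature
```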
Compressing the transferred data Since the values transferred are compressible numbers, such as zeros, small integers, and 32-bit floats with an excessive level of precision, communication cost can be reduced by using lossless or lossy data compression algorithms. Li et al. (2014b) compress the sparse matrix by eliminating most zero values; gRPC Abadi et al. (2016) eliminates redundancy to decrease the transferred data with novel compression algorithms; Wei et al. (2015) use a 16-bit float in place of the 32-bit float value to improve bandwidth utilization; and Chilimbi et al. (2014), Zhang et al. (2017) decompose the gradient of the fully-connected layer into two vectors to decrease the transferred data.
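The 16-bit trick is easy to illustrate with NumPy; this is a generic fp32-to-fp16 round trip, not any system's actual codec.

```python
import numpy as np

grad = np.random.randn(1_000_000).astype(np.float32)

wire = grad.astype(np.float16)        # halves the bytes on the wire
restored = wire.astype(np.float32)    # receiver widens back to fp32

print("bytes sent:", wire.nbytes, "vs", grad.nbytes)   # 2 MB vs 4 MB
print("max abs error:", float(np.max(np.abs(grad - restored))))
```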
Besides, Li et al. (2014b), Hsieh et al. (2017) have observed that many updates in one iteration are negligible in changing the parameters. To balance computation efficiency and communication, Li et al. (2014b), Chen et al. (2015) adopted a KKT (Karush-Kuhn-Tucker) threshold to filter out the most insignificant updates and transfer only those updates which can dramatically affect the parameters while keeping convergence; Gaia Hsieh et al. (2017) has also adopted a dynamic filter threshold to pick out the significant updates, making it efficient to train a model over the WAN.
make it efficient training a model over WAN.
Gradient descent (GD) has been widely used in various kinds Lots of network-related challenges are expected to be solved
of distributed machine learning scenarios, which requires or mitigated with the integration of ML technologies. As
considerable communication between GPUs because each introduced in the paper, QoS optimization and Traffic analy-
GPU must exchange both gradients and parameter values on sis gain much benefit from the machine learning techniques.
every update step. To reduce the communication cost, the “Network by AI” will remain as a hot topic, which aims to
batch computation idea is then applied to the optimization of adopt ML technologies to solve network problems. Towards
GD and becomes a prevalent method in distributed machine this direction, some major points should be concerned.
learning. Mini-batch gradient descent (MBGD) Cotter et al.
(2011) divides the training data into several parts (each part 1. Data Data collection is a key step for most ML tech-
is called one batch) and uses one batch to update param- niques. The quality and quantity of data can significantly
eters. Although large mini-batches are prone to reduce more affect the following modeling and training process.
communication cost, they may slow down convergence rate However, network-related data may touch the individ-
in practice Byrd et al. (2012). Thus, the size of mini-batch ual privacy and usually unavailable. For instance, the
should not be set too large. With the moderate batch size, encryption improves the barrier for data accessing and
parameters can be updated frequently. Meanwhile, com- fails many analytical methods. Further, when the data is
pared with stochastic gradient descent (SGD), which uses accessible, the preprocessing of the network data also
one sample each time to update its model, MBGD retains requires special consideration. Noisy data and irrel-
good convergence properties Li et al. (2014c) and enjoys a evant features may damage the accuracy of the training
better robustness to noises since the batch data can smooth models. The filtering and cleaning of network data are
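A plain single-machine MBGD loop makes the batching trade-off concrete: each batch costs one parameter update, which in the distributed setting corresponds to one communication round. The least-squares task is illustrative.

```python
import numpy as np

def mbgd(X, y, batch_size=32, lr=0.1, epochs=20):
    """Mini-batch gradient descent for least-squares regression."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        order = np.random.permutation(n)
        for start in range(0, n, batch_size):
            b = order[start:start + batch_size]
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)  # batch gradient
            w -= lr * grad                              # one update per batch
    return w

X = np.random.randn(1000, 5)
w_true = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ w_true + 0.01 * np.random.randn(1000)
print(np.round(mbgd(X, y), 2))  # close to w_true
```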
D-PSGD Lian et al. (2017) has recently been proposed to utilize a ring-based topology for improving distributed machine learning performance. During each iteration of D-PSGD, each node calculates gradients based on the values from the last iteration and the local training dataset; then it collects the previous parameters from its neighbors (there are two neighbors in the ring-based topology) and averages the three copies of parameters (i.e. the two copies from the neighbors as well as the local parameters on itself) to replace the local parameters. Finally, it uses the calculated gradients to update the fresh parameters. D-PSGD is proven to have the same computation complexity as well as smoother bandwidth consumption compared to traditional topologies such as the PS-based topology. Compared to AllReduce, it gains better performance by reducing the number of communications in a high-latency network. Besides, it follows a parallel workflow and overlaps the time for parameter collection and gradient updates, thus gaining better performance.
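The neighborhood-averaging step of a ring topology can be sketched as below; gradient computation is stubbed out, and the update order is a simplified synchronous rendering of D-PSGD, not the published algorithm in full.

```python
import numpy as np

def dpsgd_step(params, grads, lr=0.05):
    """One ring-averaging iteration: node i averages its own parameter copy
    with its two ring neighbors, then applies its local gradient."""
    n = len(params)
    new_params = []
    for i in range(n):
        left, right = params[(i - 1) % n], params[(i + 1) % n]
        mixed = (params[i] + left + right) / 3.0   # neighborhood averaging
        new_params.append(mixed - lr * grads[i])   # local SGD update
    return new_params

# Five nodes drift toward consensus while optimizing locally.
params = [np.random.randn(4) for _ in range(5)]
grads = [np.zeros(4) for _ in range(5)]
for _ in range(50):
    params = dpsgd_step(params, grads)
print(np.round(np.std(params, axis=0), 4))  # near-zero spread across nodes
```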
5 Future direction for network & ML

The advent of ML revolutions brings fresh vitality to computer network research, whereas the improvement of network performance also provides better support for ML computations. The combination of computer network and ML technology is a frontier area, and many open issues still remain to be explored. Generally speaking, future research will focus on two main dimensions.
5.1 Network by AI

Lots of network-related challenges are expected to be solved or mitigated with the integration of ML technologies. As introduced in this paper, QoS optimization and traffic analysis gain much benefit from machine learning techniques. "Network by AI" will remain a hot topic, which aims to adopt ML technologies to solve network problems. Towards this direction, some major points should be considered.
1. Data Data collection is a key step for most ML techniques. The quality and quantity of data can significantly affect the subsequent modeling and training process. However, network-related data may touch individual privacy and is usually unavailable. For instance, encryption raises the barrier for data access and defeats many analytical methods. Further, when the data is accessible, the preprocessing of network data also requires special consideration. Noisy data and irrelevant features may damage the accuracy of the trained models. The filtering and cleaning of network data are expected to involve much effort and skill. The lack of labeled network data is also a big challenge.
2. Modeling Confronted with a variety of models and training algorithms, it can be difficult to make the proper choices that match the scenario. In prior works, some classic models and methods are employed to solve network problems, such as basic SVM, linear regression, etc. To gain better performance, more advanced models are applied to better fit the practical cases. No doubt deep learning and reinforcement learning provide more powerful tools for complex network problems. However, the modeling of a training process should be conducted with a full understanding of the practical problems. The abuse of deep learning and reinforcement learning may not gain much benefit.
5.2 Network for ML

…eter in training, so how to design a system with slack fault tolerance to improve the efficiency of the ML system is an open issue.

Network has always played a fundamental role in computer engineering. The recent development of ML technology brings lots of novel ideas and methods for network research. It is believed that the combination of network and ML will generate more innovations and create more value in the near future.

Acknowledgements This work is supported by the National Natural Science Foundation of China under Grant No. 61772305.
References

Byrd, R.H., Chin, G.M., Nocedal, J., Wu, Y.: Sample size selection in optimization methods for machine learning. Math. Program. 134(1), 127–155 (2012)
Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., Zhang, Z.: MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems. CoRR abs/1512.01274 (2015)
Chilimbi, T., Suzue, Y., Apacible, J., Kalyanaraman, K.: Project Adam: building an efficient and scalable deep learning training system. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pp. 571–582. USENIX Association, Broomfield, CO (2014)
Chowdhury, M., Stoica, I.: Efficient coflow scheduling without prior knowledge, pp. 393–406 (2015)
Comar, P.M., Liu, L., Saha, S., Tan, P.N., Nucci, A.: Combining supervised and unsupervised learning for zero-day malware detection. In: 2013 Proceedings IEEE INFOCOM, pp. 2022–2030 (2013)
Cotter, A., Shamir, O., Srebro, N., Sridharan, K.: Better mini-batch algorithms via accelerated gradient methods. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24, pp. 1647–1655. Curran Associates, Inc. (2011)
Das, A.K., Pathak, P.H., Chuah, C.N., Mohapatra, P.: Contextual localization through network traffic analysis. In: INFOCOM, 2014 Proceedings IEEE, pp. 925–933. IEEE (2014)
Dean, J., Corrado, G.S., Monga, R., Chen, K., Devin, M., Le, Q.V., Mao, M.Z., Ranzato, M., Senior, A., Tucker, P., Yang, K., Ng, A.Y.: Large scale distributed deep networks. In: NIPS'12, pp. 1223–1231. Curran Associates Inc., USA (2012)
Dong, M., Li, Q., Zarchy, D., Godfrey, P.B., Schapira, M.: PCC: re-architecting congestion control for consistent high performance. NSDI 1, 2 (2015)
Fontugne, R., Borgnat, P., Abry, P., Fukuda, K.: MAWILab: combining diverse anomaly detectors for automated anomaly labeling and performance benchmarking. In: International Conference on CoNEXT, pp. 1–12 (2010)
Foundation, T.A.S.: Hadoop project. http://hadoop.apache.org/core/ (2009)
Franc, V., Sofka, M., Bartos, K.: Learning detector of malicious network traffic from weak labels. In: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015), Part III, vol. 9286, pp. 85–99. Springer (2015)
Furno, A., Fiore, M., Stanica, R.: Joint spatial and temporal classification of mobile traffic demands. In: INFOCOM 2017, 36th Annual IEEE International Conference on Computer Communications (2017)
Guo, C., Lu, G., Li, D., Wu, H., Zhang, X., Shi, Y., Tian, C., Zhang, Y., Lu, S.: BCube: a high performance, server-centric network architecture for modular data centers, pp. 63–74 (2009)
Hayes, J., Danezis, G.: k-fingerprinting: a robust scalable website fingerprinting technique. In: 25th USENIX Security Symposium (USENIX Security 16), pp. 1187–1203. USENIX Association, Austin, TX (2016)
He, T., Goeckel, D., Raghavendra, R., Towsley, D.: Endhost-based shortest path routing in dynamic networks: an online learning approach. In: INFOCOM, 2013 Proceedings IEEE, pp. 2202–2210 (2013)
Hsieh, K., Harlap, A., Vijaykumar, N., Konomis, D., Ganger, G.R., Gibbons, P.B., Mutlu, O.: Gaia: geo-distributed machine learning approaching LAN speeds. In: 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pp. 629–647. USENIX Association, Boston, MA (2017)
Jayaraj, A., Venkatesh, T., Murthy, C.S.R.: Loss classification in optical burst switching networks using machine learning techniques: improving the performance of TCP. IEEE J. Sel. Areas Commun. 26(6), 45–54 (2008)
Jia, C., Liu, J., Jin, X., Lin, H., An, H., Han, W., Wu, Z., Chi, M.: Improving the performance of distributed TensorFlow with RDMA. Int. J. Parallel Program. 3, 1–12 (2017)
Jiang, J., Sekar, V., Milner, H., Shepherd, D., Stoica, I., Zhang, H.: CFA: a practical prediction system for video QoE optimization. In: NSDI, pp. 137–150 (2016)
Li, D., Chen, C., Guan, J., Zhang, Y., Zhu, J., Yu, R.: DCloud: deadline-aware resource allocation for cloud computing jobs. IEEE Trans. Parallel Distrib. Syst. 27(8), 2248–2260 (2016)
Li, M., Andersen, D.G., Park, J.W., Smola, A.J., Ahmed, A., Josifovski, V., Long, J., Shekita, E.J., Su, B.Y.: Scaling distributed machine learning with the parameter server. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI'14), pp. 583–598. USENIX Association, Berkeley, CA, USA (2014a)
Li, M., Andersen, D.G., Smola, A.J., Yu, K.: Communication efficient distributed machine learning with the parameter server. In: International Conference on Neural Information Processing Systems, pp. 19–27. MIT Press, Cambridge (2014b)
Li, M., Zhang, T., Chen, Y., Smola, A.J.: Efficient mini-batch training for stochastic optimization. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 661–670. ACM (2014c)
Li, W., Zhou, F., Meleis, W., Chowdhury, K.: Learning-based and data-driven TCP design for memory-constrained IoT. In: Distributed Computing in Sensor Systems, pp. 199–205. IEEE (2016)
Li, X., Bian, F., Crovella, M., Diot, C., Govindan, R., Iannaccone, G., Lakhina, A.: Detection and identification of network anomalies using sketch subspaces. In: ACM SIGCOMM Conference on Internet Measurement, pp. 147–152 (2006)
Lian, X., Zhang, C., Zhang, H., Hsieh, C.J., Zhang, W., Liu, J.: Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent (2017)
Liu, D., Zhao, Y., Sui, K., Zou, L., Pei, D., Tao, Q., Chen, X., Tan, D.: FOCUS: shedding light on the high search response time in the wild. In: IEEE INFOCOM 2016, The IEEE International Conference on Computer Communications, pp. 1–9 (2016)
Liu, D., Zhao, Y., Xu, H., Sun, Y., Pei, D., Luo, J., Jing, X., Feng, M.: Opprentice: towards practical and automatic anomaly detection through machine learning. Tokyo, Japan (2015)
Low, Y., Gonzalez, J.E., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: GraphLab: a new framework for parallel machine learning. CoRR abs/1408.2041 (2014)
Lutu, A., Bagnulo, M., Cid-Sueiro, J., Maennel, O.: Separating wheat from chaff: winnowing unintended prefixes using machine learning. In: IEEE INFOCOM 2014, IEEE Conference on Computer Communications, pp. 943–951 (2014)
Ma, S., Jiang, J., Li, B., Li, B.: Maximizing container-based network isolation in parallel computing clusters. In: IEEE International Conference on Network Protocols, pp. 1–10 (2016)
Mao, H., Alizadeh, M., Menache, I., Kandula, S.: Resource management with deep reinforcement learning. In: HotNets '16, pp. 50–56. ACM, New York, NY, USA (2016)
Mao, H., Netravali, R., Alizadeh, M.: Neural adaptive video streaming with Pensieve, pp. 197–210. ACM (2017)
Mirza, M., Sommers, J., Barford, P., Zhu, X.: A machine learning approach to TCP throughput prediction. In: ACM SIGMETRICS Performance Evaluation Review, vol. 35, pp. 97–108. ACM (2007)
NVIDIA: GPU applications: transforming computational research and engineering. http://www.nvidia.com/object/machine-learning.html (2017)
NVIDIA: Developing a Linux kernel module using GPUDirect RDMA. https://docs.nvidia.com/cuda/gpudirect-rdma/index.html (2018)
NVIDIA: NVLink fabric. https://www.nvidia.com/en-us/data-center/nvlink/ (2018)
NVIDIA: NVIDIA DGX-1: the fastest deep learning system. https://devblogs.nvidia.com/parallelforall/dgx-1-fastest-deep-learning-system/ (2017)
Nandi, A., Mandal, A., Atreja, S., Dasgupta, G.B., Bhattacharya, S.: Anomaly detection using program control flow graph mining from execution logs. In: KDD '16, pp. 215–224. ACM, New York, NY, USA (2016)
Neuvirth, H., Finkelstein, Y., Hilbuch, A., Nahum, S., Alon, D., Yom-Tov, E.: Early detection of fraud storms in the cloud. In: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2015), Part III, vol. 9286, pp. 53–67. Springer-Verlag New York, Inc., New York, NY, USA (2015)
Nunes, B.A., Veenstra, K., Ballenthin, W., Lukin, S., Obraczka, K.: A machine learning approach to end-to-end RTT estimation and its application to TCP, pp. 1–6. IEEE (2011)
Research, B.: Bringing HPC techniques to deep learning. http://research.baidu.com/bringing-hpc-techniques-deep-learning/ (2017)
Santiago del Rio, P.M., Rossi, D., Gringoli, F., Nava, L., Salgarelli, L., Aracil, J.: Wire-speed statistical classification of network traffic on commodity hardware, pp. 65–72. ACM (2012)
Sivaraman, A., Winstein, K., Thaker, P., Balakrishnan, H.: An experimental study of the learnability of congestion control. In: ACM SIGCOMM Computer Communication Review, vol. 44, pp. 479–490. ACM (2014)
Soska, K., Christin, N.: Automatically detecting vulnerable websites before they turn malicious. In: 23rd USENIX Security Symposium (USENIX Security 14), pp. 625–640. USENIX Association, San Diego, CA (2014)
Soule, A., Taft, N.: Combining filtering and statistical methods for anomaly detection. In: Conference on Internet Measurement 2005, Berkeley, California, USA, pp. 31–31 (2005)
Stringhini, G., Kruegel, C., Vigna, G.: Shady paths: leveraging surfing crowds to detect malicious web pages. In: CCS '13, pp. 133–144. ACM, New York, NY, USA (2013)
Sun, Y., Yin, X., Jiang, J., Sekar, V., Lin, F., Wang, N., Liu, T., Sinopoli, B.: CS2P: improving video bitrate selection and adaptation with data-driven throughput prediction. In: Proceedings of the 2016 ACM SIGCOMM Conference, pp. 272–285. ACM (2016)
Tan, H., Han, Z., Li, X., Lau, F.C.M.: Online job dispatching and scheduling in edge-clouds (2017)
Taylor, V.F., Spolaor, R., Conti, M., Martinovic, I.: AppScanner: automatic fingerprinting of smartphone apps from encrypted network traffic. In: 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 439–454 (2016)
Wang, G., Wang, T., Zheng, H., Zhao, B.Y.: Man vs. machine: practical adversarial detection of malicious crowdsourcing workers. In: 23rd USENIX Security Symposium (USENIX Security 14), pp. 239–254. USENIX Association, San Diego, CA (2014)
Wang, W., Zhang, Q.: A stochastic game for privacy preserving context sensing on mobile phone. In: IEEE INFOCOM 2014, IEEE Conference on Computer Communications, pp. 2328–2336 (2014)
Wang, Z.: The applications of deep learning on traffic identification. BlackHat USA (2015)
Wei, J., Dai, W., Qiao, A., Ho, Q., Cui, H., Ganger, G.R., Gibbons, P.B., Gibson, G.A., Xing, E.P.: Managed communication and consistency for fast data-parallel iterative analytics, pp. 381–394. ACM (2015)
Winstein, K., Balakrishnan, H.: TCP ex machina: computer-generated congestion control. In: ACM SIGCOMM Computer Communication Review, vol. 43, pp. 123–134. ACM (2013)
Xiao, W., Xue, J., Miao, Y., Li, Z., Chen, C., Wu, M., Li, W., Zhou, L.: TuX2: distributed graph computation for machine learning. In: 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pp. 669–682. USENIX Association, Boston, MA (2017)
Xie, D., Ding, N., Hu, Y.C., Kompella, R.: The only constant is change: incorporating time-varying network reservations in data centers. ACM SIGCOMM Comput. Commun. Rev. 42(4), 199–210 (2012)
Xing, E.P., Ho, Q., Dai, W., Kim, J.K., Wei, J., Lee, S., Zheng, X., Xie, P., Kumar, A., Yu, Y.: Petuum: a new platform for distributed machine learning on big data. IEEE Trans. Big Data 1(2), 49–67 (2015)
Xu, Q., Liao, Y., Miskovic, S., Mao, Z.M., Baldi, M., Nucci, A., Andrews, T.: Automatic generation of mobile app signatures from traffic observations. In: 2015 IEEE Conference on Computer Communications (INFOCOM), pp. 1481–1489 (2015)
Xu, Y., Yao, J., Jacobsen, H.A., Guan, H.: Cost-efficient negotiation over multiple resources with reinforcement learning. Barcelona, Spain (2016)
Yamada, M., Kimura, A., Naya, F., Sawada, H.: Change-point detection with feature selection in high-dimensional time-series data. J. Catalysis 111(1), 50–58 (2013)
Yi, B., Xia, J., Chen, L., Chen, K.: Towards zero copy dataflows using RDMA. In: Proceedings of the SIGCOMM Posters and Demos 2017. ACM (2017)
Zadrozny, B.: Learning and evaluating classifiers under sample selection bias. In: ICML '04, p. 114. ACM, New York, NY, USA (2004)
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud'10), pp. 10–10. USENIX Association, Berkeley, CA, USA (2010)
Zhang, H., Zheng, Z., Xu, S., Dai, W., Ho, Q., Liang, X., Hu, Z., Wei, J., Xie, P., Xing, E.P.: Poseidon: an efficient communication architecture for distributed deep learning on GPU clusters. In: 2017 USENIX Annual Technical Conference (USENIX ATC 17), pp. 181–193. USENIX Association, Santa Clara, CA (2017)
Zhang, R., Qi, W., Wang, J.: Cross-VM covert channel risk assessment for cloud computing: an automated capacity profiler. In: 2014 IEEE 22nd International Conference on Network Protocols, pp. 25–36 (2014)
Zhang, X., Wu, C., Li, Z., Lau, F.C.M.: Proactive VNF provisioning with multi-timescale cloud resources: fusing online learning and online optimization. In: IEEE INFOCOM 2017, IEEE Conference on Computer Communications (INFOCOM), pp. 1–9. IEEE (2017)
Zhang, Z., Zhang, Z., Lee, P.P., Liu, Y., Xie, G.: Proword: an unsupervised approach to protocol feature word extraction. In: INFOCOM, 2014 Proceedings IEEE, pp. 1393–1401. IEEE (2014)
Zheng, R., Le, T., Han, Z.: Approximate online learning for passive monitoring of multi-channel wireless networks. Proc. IEEE INFOCOM 12(11), 3111–3119 (2013)
Zheng, N., Bai, K., Huang, H., Wang, H.: You are how you touch: user verification on smartphones via tapping behaviors. In: 2014 IEEE 22nd International Conference on Network Protocols, pp. 221–232 (2014)
Zhou, X., Wang, K., Jia, W., Guo, M.: Reinforcement learning-based adaptive resource management of differentiated services in geo-distributed data centers. Barcelona, Spain (2016)
Zhu, J., Li, D., Wu, J., Liu, H., Zhang, Y., Zhang, J.: Towards bandwidth guarantee in multi-tenancy cloud computing networks. In: IEEE International Conference on Network Protocols, pp. 1–10 (2012)
Yang Cheng is currently a Ph.D. student in the Department of Computer Science, Tsinghua University. His research interests include networking systems and distributed machine learning systems.

…Ph.D. degree with the Department of Computer Science and Technology, Tsinghua University, Beijing, China. His research interests include data center networking, software defined networking and cloud computing.