A survey on deploying mobile deep learning applications: a systemic and technical perspective
Yingchun Wang, Jingyi Wang, Weizhan Zhang, Yufeng Zhan, Song Guo, Qinghua
Zheng, Xuanyu Wang
PII: S2352-8648(21)00029-8
DOI: https://doi.org/10.1016/j.dcan.2021.06.001
Reference: DCAN 283
Please cite this article as: Y. Wang, J. Wang, W. Zhang, Y. Zhan, S. Guo, Q. Zheng, X. Wang, A
survey on deploying mobile deep learning applications: a systemic and technical perspective, Digital
Communications and Networks, https://doi.org/10.1016/j.dcan.2021.06.001.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.
© 2021 Chongqing University of Posts and Telecommunications. Production and hosting by Elsevier
B.V. on behalf of KeAi Communications Co. Ltd.
Yingchun Wang a, Jingyi Wang a, Weizhan Zhang ∗a, Yufeng Zhan b, Song Guo b, Qinghua Zheng a, Xuanyu Wang a
a MOEKLINNS Lab, School of Computer Science and Technology, Xi’an Jiaotong University, Shaanxi 710049, China
b Department of Computing, The Hong Kong Polytechnic University, HK 999077, China
Abstract
With the rapid development of mobile devices and deep learning, mobile smart applications using deep learning technology have sprung up. Such applications satisfy multiple needs of users, network operators and service providers, and have rapidly become a main research focus. In recent years, deep learning has achieved tremendous success in image processing, natural language processing, language analysis and other research fields. Although task performance has been greatly improved, the resources required to run these models have increased significantly. This poses a major challenge for deploying such applications on resource-restricted mobile devices. Mobile intelligence needs faster mobile processors, more storage space, smaller but more accurate models, and even the assistance of other network nodes. To help readers concisely establish a global view of the entire research direction, we classify the latest works in this field into two categories, local optimization on mobile devices and distributed optimization, based on the computational position of machine learning tasks. We also list a few typical scenarios to make readers realize the importance and indispensability of mobile deep learning applications. Finally, we conjecture what the future may hold for research on deploying deep learning applications on mobile devices, which may help to stimulate new ideas.
Keywords:
Deep learning, mobile computing, distributed offloading, distributed caching
∗ Weizhan Zhang (Corresponding author) (email: [email protected]).

1. Introduction

The development of smart phones, laptops and other new mobile devices has further promoted the development of AI applications on mobile devices.

In this paper, we define a deep model that has been trained, applied to a specific service, and designed to run on mobile devices as a mobile deep learning application (MDLA). Its training may be cloud-based, or it may be based on edge devices using federated learning technology, which is not our focus. The focus of our investigation is the reasoning (inference) process of these mobile deep learning models. Cameras, microphones, and sensors can obtain various types of information, such as video, audio, and acceleration, from the real world. These kinds of data are then provided to MDLAs. Based on this, MDLAs have developed rapidly and attracted widespread attention due to their tangible benefits for users from all sides. For example, MDLAs benefit users by performing malicious software detection [1], app recommendation [2], user verification [3, 4], mobile visual tasks [5, 6], mobile web browsing optimization [7], human activity monitoring [8], medical health monitoring [9, 10] and other smart fields [11–17].

For network operators and third-party service providers, the deployment of MDLAs can support mobile crowdsourcing scenarios [18–21], distributed machine learning [22–24], federated learning [25], multiple smart IoT applications [26], and other services using mobile big data [27, 28]. The intelligence in mobile applications is changing the way people live, work and interact with the world.

Although these intelligent applications are wonderful, they require a lot of storage, calculation, and high consumption of power and network bandwidth, and users may be reluctant to download them to their mobile devices. These requirements, which contradict the limited resources of mobile devices, become a main bottleneck for the development of MDLAs. For example, the CNN, serving as the main method in mobile vision tasks, is executed for each input as a cascade of layers, mainly including convolution layers, pooling layers and fully connected (FC) layers; it produces intermediate results called feature maps and outputs inference results. Such CNN executions are known for their high time and space complexity. A typical CNN model, such as AlexNet, occupies more than 200 MB of memory, has 60 million parameters and needs 720 million FLOPs. Another, VGG-16, occupies more than 500 MB of memory, has 138 million parameters and needs 15,300 million FLOPs. Even though these smart mobile vision applications are wonderful, they need so much storage, calculation, and high consumption of power and network bandwidth that users may be reluctant to download them to their mobile devices.

It is very important to study the deployment of MDLAs on mobile devices, as it makes it possible to migrate a large number of centralized applications to the mobile end. Some of them have changed our daily life. Users may have had to manually record their meal information in the past, but now this can be achieved with a smart spoon. Besides, there are many other important public scenes. For example, by carefully designing the offloading strategy, Lu et al. make it possible to deploy a CNN on the user's mobile phone to detect important objectives, such as criminals, by crowd sensing. Without MDLAs, the intelligence of the machine will stay in the centralized cloud and never appear in front of people on the edge. So our work is dedicated to investigating some classic and state-of-the-art research on deploying MDLAs.

Recently, two directions have been explored to solve this problem. The first addresses the problem from the perspective of the software and hardware of local mobile devices; that is, the goal is to run MDLAs locally on mobile devices without the help of a third party. The key method is to reduce the resource requirements of running MDLAs, and there are some popular solutions. One approach is to compress the deep learning model: even though this might influence its accuracy, it decreases the demand for computation and storage resources, and an essential problem is how to balance the two counterparts [29–35]. Another possible solution is to reduce computational needs by reusing intermediate computing results [6, 29, 36, 37] or maximizing the rate of utilizing device resources through precise dispatching among multiple deep learning tasks [5, 38, 39]. Moreover, much work has been done to develop deep learning frameworks suitable for mobile devices [40–46]. Finally, improving the hardware of mobile devices to support the operation of MDLAs can be taken into consideration.

Another research direction is to gain support from background servers with sufficient resources. Here, 'sufficient resources' means that the current resources of background servers are sufficient for the running of tasks to be offloaded and that the …
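The AlexNet and VGG-16 storage figures quoted earlier follow directly from the parameter counts: with 32-bit (4-byte) floating-point weights, memory is roughly parameters × 4 bytes. A quick sanity check (the parameter counts are the ones quoted in the text; the script itself is only an illustration):

```python
# Rough memory footprint of CNN weights stored as 32-bit floats.
# Parameter counts are those quoted in the introduction above.
models = {
    "AlexNet": 60_000_000,    # ~60 million parameters
    "VGG-16": 138_000_000,    # ~138 million parameters
}

BYTES_PER_FP32 = 4

for name, params in models.items():
    mb = params * BYTES_PER_FP32 / (1024 ** 2)
    print(f"{name}: ~{mb:.0f} MB of weights")
```

This yields roughly 229 MB for AlexNet and 526 MB for VGG-16, consistent with the "more than 200 MB" and "more than 500 MB" figures in the text.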
… [50]. Second, in 2014, ETSI and IBM jointly established the Mobile Edge Computing (MEC) standardization group, and formally proposed the concept of standardized MEC [51]. MEC offers IT and cloud computing in a Radio Access Network (RAN); its location is closer to mobile users, which further reduces transfer latency. However, due to its limited resources compared with cloud resources, problems such as multi-user …

… communication in recent years enables mobile devices to transfer computing to other nearby computing devices, such as smart phones, home-use computers, etc. [57]. Even though such a strategy minimizes the distance between task-initiating devices and the computing location, considering the complicated mobility of users and the resource availability of the cloudlet, the offloading strategy needs to be delicately designed within its limited coverage area. In addition, the caching of services and data on background servers and communication optimization between mobile devices and backgrounds are also key research areas, but these are not the focus of this paper.

Fig. 1 shows an overview of deploying MDLAs. A mobile user can execute local computation or choose distributed deployment (by offloading). The up arrow indicates the offloading process. Computing tasks from mobile devices are first offloaded to the edge server. If the edge server cannot satisfy the resource needs, tasks can be further offloaded to the remote cloud with sufficient resources. We can also use D2D communication to transfer some relatively light computing to devices … of ad-hoc cloudlets due to the closer distance.

Some investigations have been carried out in related fields. Mao's work [58] mainly discusses mobile edge computing from the perspectives of communication technology and computing resources, and studies the offloading of mobile computing to edge servers through computation … cases and service environments, followed by a detailed illustration of the standardized MEC infrastructure. Finally, this work discusses three key areas of MEC computation offloading: offloading strategy, edge server resource allocation and user mobility management. Dao's work [60] discusses how to deploy multimedia applications using deep learning on mobile devices; it focuses on the local computation of deep multimedia applications on mobile devices and discusses them from the two aspects of software and hardware. Kumar's work [61] focuses on the distributed deployment of mobile computing tasks, surveying it from the perspective of computation offloading. However, from these works we can learn about the running of MDLAs only in certain respects: offloading computation to certain places such as edge servers, studying distributed deployment only through offloading techniques, or discussing only a special kind of MDLA, such as deep multimedia applications. None of these works studied MDLAs in a comprehensive way.

This paper studies MDLAs from the perspectives of systematization and networking, and conducts a comprehensive survey on the challenges and corresponding state-of-the-art solutions from two directions: locally optimized running and distributed deployment. The rest of this paper is organized as follows: Section 2 introduces the deployment of MDLAs on mobile devices; Section 3 illustrates the distributed deployment of MDLAs; Section 4 classifies MDLAs according to different beneficiaries; Section 5 lists possible future research directions in related fields; and Section 6 summarizes this article.

2. Deployment on mobile devices

In this section, we discuss how to run MDLAs locally on mobile devices. The main idea is to reduce the resource requirements for running deep learning tasks or to design deep learning frameworks suitable for mobile devices to optimize an MDLA's computation. To reduce resource usage, we can make the following considerations: firstly, the natural idea is to reduce the amount of calculation and storage space required by the model. …

2.1. Deep model compression

… model compression, and this paper summarizes the methods based on space-memory and time-computation.

2.1.1. Reducing model spatial complexity

The spatial complexity of a deep neural network is determined by the number and size of parameters in the deep model. By reducing the spatial complexity of the deep model, its memory footprint can be greatly reduced. Based on this, we can classify related works into two categories: cutting down the number of model parameters, including pruning and sharing, and reducing their size, as in weight quantization. The related specific researches are as follows.

a) Pruning

Pruning aims to reduce the number of model parameters. The basic idea is to select and delete some trivial parameters that have little influence on the model's accuracy, and then retrain the model to recover its performance.

Nonstructural pruning removes trivial neurons …
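The select-delete-retrain loop described above can be sketched in a few lines. This is a generic magnitude-based (nonstructural) pruning pass, not the algorithm of any specific cited work; the `evaluate` callback is a stand-in for real retraining and accuracy measurement:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest magnitude; everything at or below it is deleted.
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def iterative_prune(weights, evaluate, min_accuracy, step=0.25):
    """Increase sparsity step by step; stop once the pruned model fails to
    reach the minimum accuracy set by the user (retraining would go here)."""
    best = weights
    sparsity = step
    while sparsity < 1.0:
        candidate = magnitude_prune(weights, sparsity)
        if evaluate(candidate) < min_accuracy:
            break
        best = candidate
        sparsity += step
    return best
```

In practice each pruning step would be followed by fine-tuning before evaluation; the sketch only shows the control flow of the iteration.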
… iteration ends when the pruned model fails to reach the minimum accuracy set by the users.

Middle hidden-layer pruning directly deletes some layers in networks. Rueda et al. [32] propose a method that maximizes units to combine neurons into more complex convex functions and selects neurons based on the local relevance of each neuron's activation on the training set. Li et al. [62] divide the network layers into weight layers, such as convolutional layers or fully connected layers, and non-weight layers, such as pooling layers or activation layers. A non-weight layer incurs less theoretical computation but a longer computation time due to memory data access time and other reasons. The authors combine a non-weight layer with a weight layer and eliminate independent non-weight layers, which significantly reduces the inference time.

b) Parameter sharing

There are weighted edges between layers, and the number of weight parameters increases with a rise in the number of nodes. Therefore, to reduce parameters, approaches are designed to share the weights of certain edges. A simple example is that if every layer has 1000 nodes, then …

For example, the main idea of Han's work [63] is to compute multiple clustering centers of the weights using k-means clustering. This approach clusters weights to the nearest centroids and compensates for the weights by fine-tuning training. Chen hashes parameters into corresponding hash buckets, and parameters in the same bucket share a single value [33].

c) Network quantization

Network quantization compresses initial networks by reducing the number of bits for each weight parameter. Quantization uses low-precision data to represent the original high-precision data. The traditional technique is to convert 32-bit floating-point to 16-bit floating-point, 16-bit fixed-point, 8-bit fixed-point and so on. There are some related technologies, such as Binarized Neural Networks, Ternary Weight Networks, and the XNOR Network. In Li's work [34], he proposes a higher-order binarization scheme and executes binarization both on the input layer and the weight layer of the CNN. Specifically, this method recursively performs residual quantization to improve the effect of binary approximation. Yin et al. accelerate the operation of DNNs on low-power computing hardware by understanding the Straight-Through Estimator in training activation-quantized neural nets [64].

d) Direct design of small models

Some works expect to yield small and accurate deep models by more delicate model design or direct training of promising network architectures rather than iterative execution of training and pruning.

For instance, the authors of [65] propose a small CNN architecture called SqueezeNet and put forward a network architecture similar to Inception, called a "Fire Module", which consists of a squeeze convolutional layer with 1 × 1 filters and an expanded convolutional layer with both 1 × 1 and 3 × 3 filters. It reduces the computation overhead of convolutional layers by reducing the size of filter kernels and the number of channels.

… depthwise convolution layers.

Lin et al. [35] attempt to overthrow the previous idea of network pruning. Their goal is not to prune after training a large model, but to design a small model architecture and train it from the beginning. Through a large number of assessments of state-of-the-art pruning algorithms, they found that it is the network structure that counts, rather than which weights are retained.

2.1.2. Lowering time complexity

Time compression of models means lowering the inference time complexity of deep models with no notable increase in the training time complexity. By reducing the time complexity of the deep model, we may be able to reduce the amount of battery power it requires. Common methods that reduce the time complexity are as follows.
a) Low-rank decomposition

… low-rank tensor decomposition to eliminate redundancy in the convolution kernels. Their method obtains the global optimizer of the decomposition, and based on this, they use a new method to train low-rank constrained CNNs from scratch [67].

b) Computation acceleration

Computation acceleration usually speeds up forward propagation by improving matrix multiplication, abating numerical resolution, etc. Park's work proposes a kind of tensor-based multiplication to achieve efficient computation between a dense matrix and a sparse matrix. It pre-locates the values that will be zero to avoid calculating them when multiplying matrices [68].

c) Knowledge distillation

Knowledge distillation was first proposed by Bulica in 2006 [69], and Hinton summarized and developed it in 2014 [70]. The main idea of knowledge distillation is to train a small network model to imitate the knowledge of a large network trained in advance, similar to the relationship between teacher and student. The large network is the "teacher", and the small network is the "student". The general method is to learn the softmax classification outputs of teacher models. For example, Hinton et al. reduce the amount of computation in a deep learning network by following a less cumbersome model. To transfer the generalization ability of the cumbersome model to a small model, they use the class probabilities produced by the cumbersome model as "soft targets" for training the small model and add a small term to the objective function to encourage the small model to predict the true targets.

d) Data transformation

… each of them has different trade-offs between model size and accuracy. Manually balancing these trade-offs and designing deep models for each of them is very difficult because there are so many factors to consider. In the most recent work [72], the authors propose an automated Mobile Neural Architecture Search (MNAS) approach using deep reinforcement learning to search for a model for a specific trade-off. They also propose a novel factorized hierarchical search space to obtain a good trade-off between flexibility and search space size. Using deep learning and automatic searching avoids the complexity of manual design and is a promising new direction to build mobile deep models. For these lightweight convolutional neural networks (such as MobileNets) designed for … Conv) are their key operations. Taking into account the characteristics of current mobile hardware and software systems, Zhang et al. proposed techniques to re-optimize the implementations of these operations based on the ARM architecture [73].

2.2. Reuse of intermediate results

In addition to modifying the structure of deep learning models, it is possible to reuse the intermediate computing results of deep models by caching part of the computing results based on the partial similarity of the data. As a result, this can decrease the model's computational resource needs. The intermediate results to be reused can be considered at multiple granularities, including the middle layers' computational results for the same model and a similar input, a shared similarity search for different models and the same input, similar semantic computational results for the
same models on different devices and a similar input, etc. The basic idea of these techniques is to reduce the computational resources of running MDLAs.

2.2.1. Data reuse among image frames

Data reuse among image frames is often applied to the same model and similar inputs, typically in continuous mobile vision: a serial mobile video image stream obtained from omnipresent cameras on mobile and wearable devices supports diverse vision MDLAs, including recognition assistance, lifestyle monitors, street navigation, etc. The CNN is a state-of-the-art vision processing method, which can be regarded as a group of stacked layers. Each input frame generates intermediate results called feature maps and outputs reasoning results. Because of these characteristics, we can reuse its layer processing results for similar continuous images.

… that is, similarity between frames close in time. Referencing the heuristic approach in video compression, DeepCache propagates areas of reusable results in frames using the model's inner structure … successive frames in the first-person video. It computes the current frame by reusing the middle results of the previous frame through the inner structure of the convolution layer rather than simply reusing the final output.

2.2.2. Data reuse among multiple deep tasks

Data reuse among tasks is often implemented for different deep models and the same input. MDLAs might have several models for different but related tasks with the same input executed during a similar time, and it is a waste of resources to repeat the underlying feature extraction calculation. The popular idea is to share partial computing results of models across multiple tasks. For example, an MDLA may aim to infer the race, age and gender of the user. One choice is to train a DNN for each task, which incurs the cost of one inference per task. DNNs can be treated as bottom layers extracting the feature representation and high layers aiming for different tasks. The abstract features computed at the bottom could be used by multiple high layers. Hence, the main idea of data reuse among multiple deep tasks is that feature representations computed by lower layers could be shared among different high layers to save computational costs.

The authors of MCDNN apply the idea above and present "class" clustering, a DNN classifier specialized for similar tasks. They aim to dominate context and provide similar classes with specialized, light models. More importantly, the model needs to perform well in recognition when the input does not belong to any of the classes, classifying it as the "other" class. The specialized model is concatenated with a generic model, and only if the specialized model reports the input as the "other" class does the general model perform further classification [36].

2.2.3. Data reuse among multiple devices

… similar inputs. There are many scenarios of running the same MDLA on adjacent multiple devices, and these application cases often process similar … designed Adaptive Locality-Sensitive Hashing (A-LSH) and Homogenized k Nearest Neighbors (H-kNN). The former achieves extensible and constant lookup, while the latter provides high-quality reuse and a tunable accuracy guarantee [37].

2.3. Resource dispatch among multiple tasks

There is usually more than one MDLA running on a mobile system. Joint resource dispatch optimization among all these deep models, instead of independent optimization for a single deep task, can maximize the overall performance of all MDLAs. For instance, the architecture of DeepEye removes a crucial limitation of executing multiple deep learning models on resource-limited mobile devices by presenting a novel inference software pipeline. Its goal is to combine the execution of computation-heavy convolution layers with high-memory-cost fully connected layers, which
realizes partial execution of multiple models simultaneously, especially for CNNs [38].

NestDNN considers that the resources available on mobile devices are dynamic due to events such as starting new applications, closing current applications or modifying application priorities. Based on this consideration, the approach presents a multitask resource-aware deep learning framework for mobile vision systems. It dynamically selects the optimal resource-accuracy balance for each deep learning model so that the models' resource needs are compatible with the available resources in the runtime system [5].

Geng et al. solve an energy-saving local core-offloading problem for multiple deep learning tasks running on multicore mobile devices. They formalize the problem as a mixed-integer nonlinear programming problem and then propose a heuristic algorithm to jointly solve the offloading decision and task scheduling problems. This strategy prioritizes tasks of various applications to satisfy both application time constraints and task-dependency requirements. To reduce the search cost, they recursively inspect tasks and move them to the right CPU cores to minimize the energy cost [39].

2.4. Deep learning frameworks for mobile devices

There are many deep learning frameworks for resource-sufficient computation platforms, such as Caffe and TensorFlow, which mainly optimize the training process while bringing little profit in the inference phase. Recently, some work has focused on developing professional software libraries to train and deploy deep models on resource-limited mobile devices. The general idea is to combine traditional frameworks and accelerate the inference process of trained networks to significantly reduce the resources needed for running MDLAs.

For example, DeepLearningKit supports using CNNs on mobile devices possessing a GPU under an iOS system. It can speed up the inference phases of deep models trained by Caffe [74]. DeepX, as a software accelerator, also optimizes the computation, memory, and energy cost of the inference phases of previously trained deep networks. Its method is to divide the network's computation into simpler pieces that can be arranged more efficiently. Each piece can be run on a different processor (e.g., GPU, CPU) to achieve sufficient utilization of the computation ability of mobile devices [75].

In Table 1, we list some current state-of-the-art deep learning frameworks on mobile devices.

… MDLAs and presents four ideas: reducing deep model complexity, reusing intermediate computing results, performing resource dispatch among multiple tasks, and developing deep learning frameworks for mobile devices.
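The decomposition idea behind accelerators like DeepX, splitting a network's computation into pieces and placing each piece on the processor where it finishes soonest, can be illustrated with a toy greedy scheduler. Everything here (the per-layer costs, processor speeds, and the greedy rule) is a simplified assumption for illustration, not DeepX's actual algorithm, and it ignores data-transfer costs and layer dependencies:

```python
def assign_layers(layer_flops, processor_speed):
    """Greedily place each piece of work on the processor that would finish
    it earliest, given per-processor speeds in FLOPs per second.
    Returns the assignment list and each processor's busy time."""
    busy_until = {p: 0.0 for p in processor_speed}
    assignment = []
    for flops in layer_flops:
        # Pick the processor with the earliest projected finish time.
        p = min(processor_speed,
                key=lambda q: busy_until[q] + flops / processor_speed[q])
        busy_until[p] += flops / processor_speed[p]
        assignment.append(p)
    return assignment, busy_until

# Hypothetical workload: per-piece cost in FLOPs, two processors (made-up numbers).
layers = [8e9, 1e9, 4e9, 0.5e9]
speeds = {"gpu": 4e9, "cpu": 1e9}   # FLOPs per second
plan, finish = assign_layers(layers, speeds)
```

With these numbers the heavy pieces land on the GPU and the light ones fill idle CPU time, which is the kind of utilization the text describes.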
Table 1: An overview of the popular deep learning frameworks for mobile terminals

Name | Company | Android | iOS | CPU | GPU | DSP | Year | Open source | Characteristics | On-device training
TensorFlow Lite [40] | Google | ✓ | ✓ | ✓ | ✓ | × | 2017 | ✓ | Lightweight, cross-platform, fast | ×
Caffe2 [41] | Facebook | ✓ | ✓ | ✓ | ✓ | × | 2017 | ✓ | Lightweight, modular, scalable | ✓
Core ML 2 [42] | Apple | × | ✓ | ✓ | ✓ | × | 2018 | × | Weight quantification, batch forecasting | ×
NCNN [43] | Tencent | ✓ | ✓ | ✓ | × | × | 2017 | ✓ | Cross-platform, no third-party dependence, compiler-level optimization, extremely fast, scalable | ×
FeatherCNN [44] | Tencent | ✓ | ✓ | ✓ | × | × | | ✓ | Lightweight (hundreds of KB), no third-party dependency, scalable | ✓
SNPE [45] | Qualcomm | ✓ | × | ✓ | ✓ | ✓ | | | Executes any depth model, greatly limited by hardware | ×
… | | | | | | | | | Compiler-level optimization, supports various framework model transformations |
Paddle [76] | Baidu | ✓ | ✓ | ✓ | ✓ | ✓ | 2017 | ✓ | Multi-hardware platform, deep model quantization and compression | ×
… | … the filters as well as to prune the filters iteratively
[32] | Output units are maximized and multiple neurons are merged into more complex convex function representations
[33] | Parameters are mapped to the corresponding hash bucket, where parameters in the same bucket have the same value
[34] | The weights and inputs of CNN network layers are binarized so that the computing speed is faster and the memory consumption is smaller
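The binarization entry ([34]) above can be made concrete with the standard trick used by binary-weight networks: approximate a real-valued weight tensor W by α·sign(W), where α is the mean absolute value, and optionally binarize the residual again for a higher-order approximation. This is a generic sketch of that family of schemes, not the exact method of [34]:

```python
import numpy as np

def binarize(weights):
    """Approximate `weights` as alpha * sign(weights), where alpha is the
    mean absolute value (the L2-optimal scaling factor for a sign tensor)."""
    alpha = np.mean(np.abs(weights))
    signs = np.where(weights >= 0, 1.0, -1.0)
    return alpha, signs

def residual_binarize(weights, order=2):
    """Recursively binarize the residual, as higher-order schemes do:
    W ≈ a1*B1 + a2*B2 + ...  Each extra term refines the approximation."""
    residual = weights.astype(float)
    terms = []
    for _ in range(order):
        alpha, signs = binarize(residual)
        terms.append((alpha, signs))
        residual = residual - alpha * signs
    return terms
```

Each sign tensor needs only one bit per weight plus one scaling factor, which is where the memory and speed savings quoted in the table come from.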
… image analysis and a great deal of image rendering to powerful cloud servers [50]. This kind of AR MDLA is computationally intensive, has a high power cost and is a typical application type to be offloaded to clouds. The authors use Pokemon Go, a popular AR game, to test their platform.

However, the mobile cloud still faces multiple challenges. Obviously, because of its relatively long distance from users, mobile cloud computation is not suitable for all offloading cases, especially for interaction-intensive applications. MCC also places massive additional loads on the radio and backhaul of mobile networks. Dinh et al. list the technical challenges of MCC in detail [81]. In mobile communication, because of the characteristics of wireless networks, e.g., lack of wireless resources, traffic congestion and multiple rates, MCC faces challenges including low bandwidth, service availability and heterogeneity. Regarding the computational aspect, there are difficulties such as efficient and dynamic computation offloading under variable conditions, user and data security problems, productivity of data access, and contextual awareness. These are not the focus of this article.

3.1.2. Edge network

The long distance from users and restricted backhaul bandwidth have made it difficult for MCC to cope well with all offloading scenarios in recent years, especially for latency-sensitive and interaction-intensive MDLAs. Over time, people have continued to explore how to move resources and services closer to users to reduce access delays and energy consumption. Then, mobile edge computing, ad hoc clouds, and other novel computational architectures emerged. Mobile Edge Computing (MEC) is a promising solution to compensate for the MCC problem. It is closer to mobile users, providing IT and cloud computing in wireless access networks [51]. As one of the key techniques of the 5G era, MEC has some advantages, including low latency, high bandwidth, real-time wireless network information, and location awareness. In recent years, there have been many works studying the offloading of mobile computation to MEC servers. For instance, Liu et al. study the offloading of computation-intensive Mobile AR (MAR) tasks to servers in edge networks. They design an edge network orchestrator and build a measurement-based analytical model to express the trade-off between latency and accuracy. They also propose a corresponding algorithm as the core component of the orchestrator to perform server assignment and frame resolution. In their edge-based MAR system, mobile tasks are first offloaded to the orchestrator and then sent to the edge server [56].

In recent years, ad hoc mobile cloudlets [57] have emerged as a closer offloading point for mobile users, who can offload their computation to peer devices ad hoc to save local energy and resources. When we use cloudlets, there are more communication methods than in MEC and MCC, such as Bluetooth, Wi-Fi Direct, and other direct communication techniques, which may lead to lower latency. Van et al. discuss the optimal offload decision for mobile users in ad hoc mobile clouds. Mobile users can offload computation to nearby cloudlet devices through Device-to-Device (D2D) communication-powered cellular mobile networks. The authors developed an offload scheme based on Deep Reinforcement Learning (DRL), aiming to make an optimal offloading policy by considering the uncertainty of users and cloudlets as well as the resource availability of cloudlets [82].

3.1.3. Collaboration of remote cloud and edge networks

MCC offers high-capacity service and rich computing resources, but it is too far from mobile terminals and faces high latency. MEC is closer to mobile users and offers low latency, but it has limited computing resources and low-capacity queries. Ad hoc mobile cloudlets are the closest to mobile users and offer more communication methods and low latency. Users can access cloudlets through one hop in the wireless network, but resources on cloudlets are extremely limited and offer few services. Therefore, when multiple mobile users offload their tasks to cloudlets, resources on cloudlets are likely to be depleted, and the rejection rate of new requests is high. Each type of MDLA has different characteristics, such as latency sensitivity and computation intensity. Therefore, combining MCC, MEC and ad hoc mobile cloudlets, learning from each one's strong points and closing the gaps, would be a promising approach for MDLAs.

For example, Teerapittayanon proposes a Distributed Deep Neural Network (DDNN) based on distributed computing hierarchies, including clouds, edge networks and terminal devices. The DDNN can accommodate DNN inference in the cloud, as well as rapid and local inference at edge servers and terminal devices using shallow parts of the DNN. With the support of extensively distributed computing hierarchies, a DDNN can extend neural networks in scale and geographic reach [83]. Its distributed method also improves the sensor fusion ability, fault tolerance and data confidentiality of DNNs, and it can be applied to large MDLAs because of its more robust and safer operation.

In [84], Vitor et al. provide a new strategy to simplify the combination of cloud and fog facilities in IoT scenarios, called the Combined Fog-Cloud (CFC). It introduces a QoS-aware service allo…

Josilo et al. study the resource allocation problem of multiple self-benefiting mobile users offloading to mobile clouds [52]. They define this as a non-cooperative game problem and present an efficient decentralized algorithm to jointly optimize their offloading strategies. Their algorithm converges to a pure-strategy Nash equilibrium. Finally, an upper bound for the price of anarchy in the game is provided for the two cloud resource models they propose.
cation problem and expresses it as an integer opti- Liu et al. introduce a joint multi-resource alloca-
mization problem to satisfy capacity requirements tion framework located in a cloud computation sys-
as well as minimize latency. tem based on the Semi-Markov Decision Process
Lin et al. coordinate the computational re- (SMDP). The goal of the framework is to maxi-
of
sources of cloudlets and remote clouds to fully uti- mize the overall advantages of the entire system by
lize these two systems. Additionally, they develop constructing an optimal strategy of wireless band-
ro
a system reward model for wireless bandwidth and width and computing resource allocation for mul-
computational resource allocation. They formu- tiple mobile users in MCC that has a low service-
late the problem as a semi-Markov decision pro-
cess and use the LP solver tool to solve it as a linear
programming problem[53].
-p
denial rate and processing latency[53].
Zheng et al. adopt a multi-user stochastic game-
theoretic approach in an MCC dynamic offloading
re
environment. Mobile users are in a dynamic state,
3.2. Computation mode active or inactive, and radio channels vary stochas-
lP
strategy, that is, running MDLAs locally, perform- tential strategy with at least one Nash Equilibrium
ing complete offloading or dividing MDLAs into (NE). They propose a multiagent stochastic learn-
several independent subtasks to partially offload. ing algorithm with convergent speed to solve the
ur
idation of calculation results. Keshtkarjahromi et get is to minimize the amount of centralized of-
al. propose a Coding Cooperative Computing Pro- floading to the cloud caused by a lack of service
tocol (C3P). It dynamically offloads encoded sub- caching or insufficient resources at the edge. Then,
tasks of MDLAs to multiple computable locations they propose a bicriteria algorithm that provably
in the edge network and can adapt to time-varying achieves approximation guarantees while violating
edge resources [85]. resource constraints in a bounded way [87].
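The game-theoretic decentralized schemes above (e.g., Josilo et al. and Zheng et al.) can be illustrated with best-response dynamics on a simple congestion game. This is a minimal sketch, not the surveyed algorithms: each user chooses between local execution and offloading over a shared channel, and all constants (CPU speeds, bandwidth, task sizes) are hypothetical.

```python
# Illustrative sketch: decentralized best-response dynamics for a simple
# multi-user offloading game. The offloading cost grows with the number of
# users sharing the wireless channel, so repeated best responses converge
# to a pure-strategy Nash equilibrium. All numbers are hypothetical.

def local_cost(user):
    return user["cycles"] / user["cpu_hz"]            # seconds on device

def offload_cost(user, n_offloaders, bandwidth=10e6, server_hz=10e9):
    share = bandwidth / max(n_offloaders, 1)          # equal channel split
    return user["data_bits"] / share + user["cycles"] / server_hz

def best_response_dynamics(users, max_rounds=100):
    """Iterate until no user wants to change its decision (a pure NE)."""
    offload = [False] * len(users)
    for _ in range(max_rounds):
        changed = False
        for i, u in enumerate(users):
            n = sum(offload) - offload[i] + 1         # offloaders if i offloads
            best = offload_cost(u, n) < local_cost(u)
            if best != offload[i]:
                offload[i], changed = best, True
        if not changed:
            break
    return offload

users = [{"cycles": 2e9, "cpu_hz": 1e9, "data_bits": 8e5},   # compute-heavy task
         {"cycles": 1e7, "cpu_hz": 2e9, "data_bits": 8e7}]   # data-heavy task
decisions = best_response_dynamics(users)
```

In this toy instance the compute-heavy task offloads while the data-heavy task stays local, mirroring the intuition that transmission cost can dominate the decision.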
3.2.4. Multi-server multi-user
Under multi-server multi-user conditions, there are more problems to be noted. We define them as follows:

Resource allocation on a single server: Resources on edge servers are limited and may fail to satisfy all requests from the covered area. We define the allocation of limited server resources among multiple users as a resource allocation problem.

Users' request routing across multiple servers: The density of base stations in the 5G era will reach 50 BSs/km2 [86], which will lead to users being located in complex multi-base-station environments with overlapping coverage areas. Such complicated, multi-unit situations make it difficult for users to decide where to offload their MDLA tasks to achieve optimal performance, and we define this as a request routing problem.

Placement of services: Services can be cached on edge servers close to users to provide lower service access latency. There are three prerequisites: (i) edge servers can only cache a limited number of services; (ii) users' requests can only be routed to servers holding the service they request; and (iii) users are in a complex multi-unit environment. We must decide how to allocate various services among multiple edge servers to respond to more requests and maximize overall performance. We define this as a service placement problem and discuss its solution in subsection C.

Balancing offloading among edge servers: The distribution of mobile users presents high spatial variety and mobility. These characteristics cause an imbalance in the workload among edge servers and influence the request-response time. We can define two problems from this: a load-balancing problem and a resource-sharing problem.

With these definitions, we can now provide some examples of offloading under multi-server multi-user modes. Each example faces one or more of the problems above.

Poularakis et al. formulate the Joint Service Placement and Request Routing problem (JSPRR) in a multi-unit MEC network. Its optimization target is to minimize the amount of centralized offloading to the cloud caused by a lack of service caching or insufficient resources at the edge. They then propose a bicriteria algorithm that provably achieves approximation guarantees while violating resource constraints in a bounded way [87].

Liu et al. primarily focus on users' request-routing problems. They study dynamic allocation of users' offloading requests among multiple edge servers in a MAR system [56].

Chen's work focuses on resource-allocation and request-routing problems. It studies offloading in a super-dense computing network based on the idea of software-defined networking. The authors formulate the offloading strategy as an NP-hard mixed-integer nonlinear programming problem and further divide it into two subproblems: the convex subproblem of resource allocation and the 0-1 programming subproblem of request routing. They use alternating optimization techniques to find a solution [88].

The work of [89] mainly focuses on the resource-allocation problem and resource sharing among multiple edge computing servers. It models these problems as a multi-objective optimization problem and constructs a framework based on Cooperative Game Theory (CGT) in which every edge server first satisfies its own offloading requests and then shares the remaining resources with other servers. The authors present an O(N) algorithm and achieve a Pareto-optimal allocation.
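The joint placement-and-routing problems above (e.g., JSPRR) can be illustrated with a small greedy sketch. This is not the bicriteria algorithm of [87]; it only shows the structure of the decision. Server capacities, the demand list and the reachability map are hypothetical.

```python
# Minimal greedy sketch of joint service placement and request routing:
# place the most-demanded services on capacity-limited edge servers, then
# route each request to a reachable server hosting that service; anything
# left over is centrally offloaded to the cloud. All inputs are hypothetical.

from collections import Counter

def place_and_route(requests, reachable, capacity):
    """requests: list of (user, service); reachable: user -> set of servers;
    capacity: server -> number of service slots.
    Returns (placement, n_cloud), where n_cloud counts requests that must
    still be offloaded to the central cloud."""
    demand = Counter(svc for _, svc in requests)
    placement = {s: set() for s in capacity}
    # Greedy placement: most-demanded services first, on the least-loaded
    # server that still has a free slot.
    for svc, _ in demand.most_common():
        for server in sorted(capacity, key=lambda s: len(placement[s])):
            if len(placement[server]) < capacity[server]:
                placement[server].add(svc)
                break
    # Routing: a request is served at the edge only by a reachable server
    # holding the requested service; otherwise it goes to the cloud.
    n_cloud = 0
    for user, svc in requests:
        if not any(svc in placement[s] for s in reachable[user]):
            n_cloud += 1
    return placement, n_cloud

requests = [("u1", "detect"), ("u2", "detect"), ("u2", "translate"),
            ("u3", "asr")]
reachable = {"u1": {"bs1"}, "u2": {"bs1", "bs2"}, "u3": {"bs2"}}
capacity = {"bs1": 1, "bs2": 1}
placement, n_cloud = place_and_route(requests, reachable, capacity)
```

With only one slot per base station, the "asr" request cannot be served at the edge, which is exactly the centralized-offloading quantity that JSPRR-style formulations try to minimize.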
3.3. Offloading decisions
Offloading the tasks of various MDLAs to backgrounds with sufficient resources is the basic idea of the distributed deployment of MDLAs. A key challenge when making an offloading decision is deciding when and how to offload, because offloading is not always beneficial; unstable network conditions, frequent interactions or large amounts of input data may lead to large transmission latency and high energy consumption. However, making the best offloading decision is not an easy and straightforward task. Different MDLA types involve different factors and different weightings of demands, including accuracy, latency, energy consumption, etc. We also need to consider the state of the whole system, including the device temperature, current task number, network type, state of the background server, etc. The inherent complexity and diversity of these factors have led to a variety of studies on computing offloading decision-making. For the offloading mode, computing offloading can be divided into two strategies: complete offloading and partial offloading. Decision-making methods can be divided into rule-based and learning-based methods.

3.3.1. Offloading mode
A typical MDLA can be simply divided into three parts: data acquisition, data preprocessing and data analysis. Data acquisition often requires hardware integrated into the mobile device. Therefore, due to this hardware limitation, it must stay on the mobile device and cannot be offloaded. For the other two subtasks, the optimal offloading decision should be made by comprehensively considering the resources needed, the amount of communication data between subtasks, battery power and the current network bandwidth. In addition, it should be noted that during partial offloading, the order of offloading among subtasks is the reverse of the running order.

a) Complete offloading
Highly integrated or relatively simple tasks cannot be easily partitioned, and for them complete offloading is considered to be the current optimal solution. In [58], the authors define such a computing task model as a binary offloading task model and use three fields to represent its properties: the task input data size, the time limitation and the calculation workload. These three features are the basic attributes of an MDLA, and we can essentially use them, together with the current network bandwidth dynamics, to make offloading decisions.

In one study [59], Pavel Mach and Zdenek Becvar investigate complete offloading from three perspectives: minimizing latency, minimizing energy consumption under delay constraints, and trading off delay and energy consumption. In another project [50], the authors present a platform for offloading MAR tasks to powerful cloud servers completely. They implement this system using a thin-client design.

b) Partial offloading
An MDLA consists of many components and can be divided into multiple partitions to achieve fine-grained (partial) computation offloading.

In partial offloading, we must note the following three points: (i) The dependence among partitions and components influences the execution order of components and the offloading sequence. Components higher in the execution order have a lower offloading priority. (ii) Hardware-constrained components must be executed locally on mobile devices; for example, in a mobile video analysis application, we obtain an image or video stream through a camera on the mobile device, and this step cannot be offloaded. (iii) The size of data exchanged between components and the amount of computation of each component should also be taken into consideration. The tendency is usually to offload components with a large amount of computation or little data traffic with other components, or to offload in the reverse order of execution.

We may consider three aspects of potential strategies: (i) offloading some subtasks of the MDLA to the background to reduce the calculation delay and energy consumption of mobile devices; (ii) offloading part of the processed data instead of all initial data to reduce the transmission delay and energy consumption; and (iii) protecting the security and privacy of users' mobile data. We give some examples of previous work to illustrate these points.

For point a, a mobile video analysis task can be divided into video capture, frame extraction and frame detection. Video capture can only be done by cameras on mobile devices and must be performed locally because of hardware limitations. Frame extraction and frame detection can be offloaded sequentially according to network connection conditions and battery power. Notably, the order of offloading of these two subtasks is constrained: the first subtask to be executed is the last subtask to be offloaded.

For point b, Jain et al. aim to use environmental fingerprinting to achieve immersive, highly contextualized MDLAs, especially MAR. This visual diversity requires matching a unique visual signature against millions of database entries. The computation is heavy, and considerable visual data needs to be offloaded to the cloud. The authors identify the low-entropy characteristics of visual "features" and design a system named VisualPrint to offload only the most distinctive visual data, reducing the time of network transmission [49].
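The split-point choice behind partial offloading can be sketched as picking where to cut a linear pipeline (e.g., capture, frame extraction, frame detection). This is an illustrative model, not a surveyed system; the stage costs, data sizes and bandwidths are hypothetical, and capture is pinned to the device as discussed above.

```python
# Toy sketch of partial offloading for a linear pipeline: stages [0, k) run
# on the device and stages [k, n) are offloaded, with the data produced at
# the split crossing the wireless link. Stage 0 (capture) is hardware-bound,
# so k >= 1. All stage costs, data sizes and bandwidths are hypothetical.

def split_latency(local_s, remote_s, out_bits, k, bandwidth_bps):
    """Latency when the first k stages run locally and the rest remotely."""
    transfer = out_bits[k - 1] / bandwidth_bps       # data crossing the split
    return sum(local_s[:k]) + transfer + sum(remote_s[k:])

def best_split(local_s, remote_s, out_bits, bandwidth_bps):
    n = len(local_s)
    return min(range(1, n + 1),
               key=lambda k: split_latency(local_s, remote_s, out_bits,
                                           k, bandwidth_bps))

local_s  = [0.01, 0.30, 2.00]      # capture, extraction, detection on device
remote_s = [None, 0.03, 0.10]      # capture cannot run remotely
out_bits = [8e6, 4e5, 1e3]         # data produced by each stage

k_fast = best_split(local_s, remote_s, out_bits, bandwidth_bps=20e6)
k_slow = best_split(local_s, remote_s, out_bits, bandwidth_bps=1e5)
```

Under the fast link the pipeline is cut after frame extraction (offloading the heavy detection stage), while under the slow link everything stays on the device, matching the bandwidth-dependent behavior described above.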
For point c, when offloading the data of an MDLA to the background for better and faster execution, mobile users face the risk of data privacy exposure. Partial offloading is beneficial to privacy protection in data exchange [22] [23] [20]. Intermediate data in deep learning models usually have semantics different from those of the raw data. For example, it is difficult to recover the original information by only observing the features extracted from the original data by CNN filters. Therefore, we can offload the higher layers of the deep learning model, and then offload the abstract data processed by the bottom layers on the mobile side to the background. In many works on distributed deep learning, the model on a mobile device is regarded as a worker and is combined with a central server to train the whole deep model. Partial local computing abstracts the user's private data to a certain extent and protects data privacy and safety when it is offloaded to the central server. This can also be applied in a similar way to most mobile crowdsourcing scenarios.

3.3.2. Decision method
a) Rule-based offloading decisions
A rule-based offloading decision usually treats whether to offload as the output of a combinatorial optimization problem under a set of constraints. This method formulates the problem by measuring the context of current task execution under specific constraints and optimization objectives, and uses mathematical techniques to solve the problem and output the decision scheme.

In the field of vision, we list some works on rule-based offloading decisions. In [90] and [91], Ran et al. make extensive measurements to understand the trade-offs between video quality, network conditions, battery consumption, processing delay, and model accuracy, and formulate them as an optimization problem; then, they use a measurement-driven mathematical framework to efficiently solve this combinatorial optimization problem.

In [18], Lu focuses on mobile video analysis; the task publisher needs mobile crowdsourced videos to identify specific objects. A video crowd processing platform is designed, and offloading decisions are made under both Wi-Fi and mobile cellular network connections. Under a Wi-Fi connection, the optimization goal is minimizing completion time, and an algorithm named split-shift is proposed. Under cellular connections subject to data usage constraints, the optimization goal is the trade-off between processing time and energy consumption.

The authors of [56] consider dynamically assigning the workload of a mobile AR system to multiple mobile edge servers to maximize the performance of the MAR system. Liu et al. formulate the trade-off among network latency, computational latency, and analytic accuracy in MAR systems and develop a multi-objective optimization problem to select the optimal edge server and video frame resolution for MAR users. They design a fast and accurate (FACT) algorithm to solve this multi-objective optimization problem based on convex optimization theory.

In other fields, the following works make rule-based offloading decisions for a single MDLA. Sundar et al. study offloading decisions for an MDLA consisting of a set of dependent tasks in a general cloud computing system comprising a heterogeneous local processor network and a remote cloud server. Their optimization target is to minimize the execution cost of the entire application under per-subtask completion time constraints. They propose a heuristic algorithm named ITAGs to solve this NP-hard problem [80].

The work of [92] aims at minimizing task delay. This optimization problem takes the queue state of the task buffer, the execution state of local processing units and the state of transmission units as inputs to determine whether to offload completely.

Xu, Chen and Zhou regard the minimization of the computational delay and device energy consumption on the server as the optimization target, and their constraint conditions are the cache capacity of the edge server, the maximum delay limitation of tasks, and the battery power of the device. Their system outputs a service placement layout on servers and an offloading decision for devices.

Third, for offloading multiple MDLAs on multiple mobile devices, we list the following works, most of which jointly optimize the offloading decisions of these tasks. The authors of [93], [94], and [95] study the joint optimization problem of multitask offloading across multiple mobile devices. They measure the arrival rate of data packets in each time slot and the current network channel conditions as input, use the completion time limitation of each task as a set of constraints, and aim to minimize the energy consumption of the mobile devices. They finally output the offloading decision for each mobile device and the allocation of wireless and computing resources on the server among multiple tasks.

The work of [96] also considers the joint optimization of multiple tasks on multiple mobile devices. It minimizes the trade-off between the energy consumption of mobile devices and task execution latency, and outputs the offloading decision for each task and its optimal wireless channel selection.
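The rule-based decisions above share a common skeleton: measure the current context, evaluate a weighted cost for each option, and pick the cheaper one. The following sketch is a hypothetical instance of that skeleton, not any surveyed system; all power and speed constants are assumptions.

```python
# Minimal rule-based decision sketch: offload iff the weighted
# latency/energy cost of offloading beats local execution under the
# current context (bandwidth, battery). All constants are hypothetical.

def decide_offload(cycles, data_bits, bw_bps, battery_frac,
                   cpu_hz=1e9, server_hz=1e10,
                   p_cpu_w=2.0, p_tx_w=1.0):
    """Return True if offloading has the lower weighted cost."""
    t_local = cycles / cpu_hz                        # seconds on device
    e_local = p_cpu_w * t_local                      # joules on device
    t_off = data_bits / bw_bps + cycles / server_hz  # upload + server compute
    e_off = p_tx_w * (data_bits / bw_bps)            # device only transmits
    # Low battery shifts the weight from latency toward energy.
    w_energy = 1.0 - battery_frac
    cost = lambda t, e: (1 - w_energy) * t + w_energy * e
    return cost(t_off, e_off) < cost(t_local, e_local)

# Good Wi-Fi, full battery: offloading wins on latency.
d1 = decide_offload(cycles=5e9, data_bits=8e6, bw_bps=50e6, battery_frac=1.0)
# Poor cellular link: transmission dominates, so run locally.
d2 = decide_offload(cycles=5e9, data_bits=8e6, bw_bps=2e5, battery_frac=1.0)
```

The battery-dependent weighting stands in for the constraint sets (delay limits, energy budgets) that the surveyed formulations encode more rigorously.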
b) Learning-based offloading decisions
Making offloading decisions based on learning begins by collecting, quantifying and characterizing the current running context of the program, including battery power, application properties, user mobility, network status, etc., and then uses them as input to a deep learning model that outputs whether and how to offload at the current moment. At present, learning-based methods mainly use two basic strategies: DNNs, which are usually used to construct classifiers, and DRL, which has excellent performance in decision-making.

In [97], the authors propose a novel mechanism for optimizing offloading performance by using crowd-sensed evidence traces and constructing a DNN offloading-decision classifier. They believe that for MDLAs, data from one device is obviously not enough to quantify the individual factors affecting offloading due to their inherent complexity and diversity. Huber Flores et al. aggregate samples from a larger community of devices and design an evidence analyzer using a DNN to identify times beneficial for offloading by analyzing evidence traces collected through crowdsensing.

Duc Van Le et al. propose a Deep Reinforcement Learning (DRL)-based offloading scheme that enables users to make near-optimal offloading decisions under uncertainty in user and cloudlet movements and cloudlet resource availability. They propose a Markov Decision Process (MDP)-based offloading problem formulation and then use a deep reinforcement learning scheme called a Deep Q-Network (DQN) to learn an efficient solution for the proposed MDP-based offloading problem [82].

In addition to making an appropriate offloading decision, we can improve the effectiveness of offloading through special offloading mechanisms. For example, Wasiur et al. use a queuing-theoretic description of a collaborative uploading scenario, split data into chunks and offload them over multiple paths; finally, these chunks are merged at the destination [98]. This method can reduce the network transmission delay significantly and can be generally applied to other offloading work. This approach is a special offloading technique rather than a black box that takes the network state as input and outputs an offloading decision. We can also consider other special offloading methods for each field; although such methods may be unusual and not suitable for general work, for offloading tasks in certain fields they are, compared with conventional methods, a good way to further improve offloading performance.

3.4. Distributed cache
In the broad use of MDLAs, we can observe two points. First, a user-requested service has a high degree of repeatability; that is, the same application in the application store is downloaded and run on thousands of different mobile devices by thousands of users. By deploying these services in a mobile edge network, mobile users can easily offload their MDLA data to edge servers under good network conditions, which can greatly reduce the MDLA's execution latency. Whether a service is cached on a certain edge server directly determines whether users in its coverage region can offload their computing to the edge. Second, in MAR applications and many other video and image MDLAs, similar video content may be repeatedly requested by many users. Because video transmission takes up a large bandwidth, fetching the video from the CDN for each request would cause great bandwidth waste through repeated content transmission. Therefore, we should adopt an intelligent cache strategy in the mobile network to enable mobile users to obtain content from a nearby cache, which could significantly reduce the data access time of the MDLA and greatly eliminate the influence of network connection dynamics.

It has been shown that caches in 3G mobile networks and 4G LTE networks can reduce mobile traffic by 1/3 to 2/3 [99]. In addition, the energy efficiency of the 4G network can be improved. The evolution of the green 5G network can be effectively promoted by the intelligent caching of popular content to reduce traffic load.

3.4.1. Cache content
For MDLAs, there are two kinds of content to be cached in the edge network: the deep models of MDLAs, which we call services, and the MDLA input data. Caching services requested by a large number of users in the edge network enables mobile users to offload the corresponding computation, and the benefit depends on the popularity of the cached services. MDLA input data generally comprises data types with large transmission bandwidths, such as videos, images and common data.
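The repeatability argument above can be made concrete with a Zipf-like popularity model: when requests are heavily skewed toward a few items, caching a small fraction of the catalog already absorbs a large share of traffic. The catalog size, skew exponent and cache sizes below are hypothetical.

```python
# Sketch of why caching pays off for highly repeatable requests: under a
# Zipf-like popularity distribution, caching only the top-k items serves a
# disproportionately large fraction of requests. All parameters are
# hypothetical.

def zipf_popularity(n_items, alpha=0.8):
    """Normalized Zipf(alpha) request probabilities for ranks 1..n_items."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def top_k_hit_ratio(popularity, k):
    """Fraction of requests served if the k most popular items are cached."""
    return sum(sorted(popularity, reverse=True)[:k])

pop = zipf_popularity(n_items=1000)
hit_10 = top_k_hit_ratio(pop, 10)      # cache 1% of the catalog
hit_100 = top_k_hit_ratio(pop, 100)    # cache 10% of the catalog
```

The diminishing return from `hit_10` to `hit_100` is the quantity that cache sizing and placement decisions in the following subsections trade against storage cost.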
a) Service
A service cache stores the deep model and its associated databases on an edge server or a nearby computable device, which allows users to offload the corresponding computing tasks to the edge. Since we can only cache a limited number of MDLA services in resource-constrained edge servers at the same time, we must carefully decide which services to cache to maximize the profit of offloading for the overall system. The services that we cache on an edge server decide which tasks users can offload. If we cache the most popular services, the system may obtain the maximum performance benefit.

Xu et al. formalize the joint service caching and task-offloading problem in MEC-enabled dense cellular networks to minimize computation latency under a long-term energy consumption constraint. They develop a novel online algorithm named OREO to perform stochastic service caching online without requiring future information [55].

Wang et al. focus on mobile VR applications with the support of online social networks. They divide VR applications into two parts: Service Entities (SEs) on servers and client entities on mobile devices. They define the Edge Service Entity Placement problem (ESEP) as the problem of deciding where to place the SE of each user among the edge servers to maximize the economic profits of the edge servers as well as achieve the desired level of QoS for users, and they propose an iterative algorithm named ITEM to solve this problem [100].

b) Data
The input data of MDLAs can be divided into two types: (i) real-time data acquired by hardware on mobile devices, such as images for object classification; and (ii) offline data acquired from the content provider in the central storage center, for example, multimedia data to support VR/AR and panoramic views. Distributed data caching works only for offline data, not for real-time data. Therefore, for real-time data, we discuss data compression and transmission; for offline data, we discuss distributed caching.

First, for real-time data, the operation is offloading, and the main problem is the limited, dynamic wireless bandwidth. In addition, we have to face the reality that most deep models are very sensitive to data noise [101], so we need to offload high-quality data. Xie and Kim developed a DNN-aware data compression framework named GRACE to compress the real-time image and video data acquired on IoT devices, which reduces the network bandwidth consumption of data transmission without affecting the performance of DNN inference on edge servers [102].

Second, for non-real-time data, in many distributed MDLA deployments, users need to exchange data frequently with the server, or many users may request the same multimedia content from the CDN repeatedly. In a traditional cloud-based architecture, content is usually obtained from the central data storage center, far away from users, which is not suited to frequent data exchange and large numbers of access requests. This produces considerable delay and degrades users' QoS. Therefore, in recent years, it has become increasingly popular to cache data at places close to users, such as nearby edge servers or ad hoc devices.

Zhang et al. observe that more bandwidth is required for VR video applications to achieve high temporal and spatial fidelity. They design a VR video delivery system based on Named Data Networking (NDN) and propose an integrated hotspot-based and popularity-based caching policy to cache the content that is most likely to be requested, reducing the transmission delay of VR videos and enhancing user experience [103].

Hao et al. study knowledge-centric proactive edge caching in mobile content distribution networks. The high dynamics of mobile video streams and complex user playback behaviors make it difficult to decide which content should be cached through popularity-based investigations or probability-based predictions. This work optimizes the caching configuration based on semantic information about the online playback behavior of 5G multimedia service users. They mathematically formulate this NP-complete caching optimization problem and propose a greedy online caching configuration algorithm to minimize the overall delivery cost of video streaming while maximizing the edge caching utilization ratio [104].

Mohan et al. propose an efficient edge caching mechanism leveraging edge resources to predict and store data required for upcoming computations. Their solution groups caches according to the workloads of different services. They further develop methods for populating caches and ensuring the coherence of cached data [105].
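The service-selection decision discussed in this subsection can be sketched as a knapsack-style greedy choice on one storage-constrained server. This is only an illustration of the trade-off; the surveyed algorithms (e.g., OREO [55], ITEM [100]) solve richer online and multi-server variants, and the catalog below is hypothetical.

```python
# Illustrative greedy sketch: a resource-constrained edge server picks which
# MDLA services (deep models) to cache under a storage budget, favoring
# services with high request rates per megabyte. All numbers are hypothetical.

def greedy_service_cache(services, budget_mb):
    """services: list of (name, size_mb, requests_per_s).
    Greedy by benefit density (requests served per MB of storage)."""
    chosen, used = [], 0.0
    ranked = sorted(services, key=lambda s: s[2] / s[1], reverse=True)
    for name, size, _ in ranked:
        if used + size <= budget_mb:
            chosen.append(name)
            used += size
    return chosen

catalog = [("object-detect", 200, 50.0),   # heavy but very popular
           ("translate",      50, 20.0),
           ("style-transfer", 300,  5.0),
           ("asr",           100, 30.0)]
cached = greedy_service_cache(catalog, budget_mb=350)
```

Density-based greedy is a standard heuristic for such knapsack-like placement problems; it is not optimal in general, which is why the surveyed works resort to approximation algorithms with provable guarantees.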
3.4.2. Cache location
From ad hoc devices to remote clouds, there are many possible cache locations. Considering the characteristics of full-IP cellular networks, we can divide them into three main storage locations: EPC core networks, Radio Access Networks (RANs) and ad hoc devices. However, when deciding the caching location, we need to consider a basic problem: although a closer cache reduces the redundant transmission of identical content in the rest of the network and relieves the core, most networks are organized in a tree distribution hierarchy, and the closer the caching location is to the user, the fewer users it covers. Fewer users are served by edge servers than by more central cloud storage, and if no user requests data at a certain cache point, it may not be necessary to actively bring replicated content to that edge point. Therefore, intelligent selection of cache locations and optimization of content placement are required.

In Fig. 2, we show the transmission of content in four cases: no cache, using a core network cache, using a wireless access network cache, and using an ad hoc device cache with D2D communication. In the "no cache" case, every content request needs to be transmitted through a complex network to retrieve content from a remote ISP, resulting in great storage redundancy and transmission delays. After adding the core network cache, the communication between the ISP and the core network can be somewhat reduced; after adding the RAN cache, the traffic between the access network and the core network can be significantly reduced; if the device cache and D2D communication are further added, the transmission delay can be further reduced.

Fig. 2: Different content transfer requests due to different cache locations. From left to right: no cache, core network caching, edge network caching, ad hoc cloudlet with D2D link.

a) Core network

b) Radio access network
Recently, many works prefer pushing content caches to edges closer to users. Especially in the emerging 5G network, Base Stations (BSs) are naturally equipped with edge servers (such as the Nvidia Jetson TX2) and provide storage capabilities for cache services. By caching appropriate content at the nearby edge, viewers can obtain the target content locally instead of from a remote CDN server, which not only provides better QoE with lower latency but also saves core network traffic costs. RAN caching typically caches content in the eNB, and it is mainly divided into two categories:

Macro base station: a macro base station has large coverage to serve more users and has rich storage resources for a better cache hit ratio. In the work of Gu [106], the authors analyze the caching distribution problem in a macro base station as an NP-hard problem and propose a heuristic algorithm to solve it.

Micro base station: compared with a macro base station, a micro base station has less storage space and smaller coverage, so it may have a lower cache hit ratio in terms of the diversity of cached content. However, micro base stations bring greater flexibility; more importantly, cooperative content sharing between micro base stations can jointly optimize users' requests to improve the cache hit rate. In addition, one of the greatest advantages of micro base stations over macro base stations is that they are closer to the user, so they can bring smaller delays.
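The four cases of Fig. 2 can be summarized with a back-of-the-envelope expected-latency model: a request is served by the nearest tier that holds the content, so each added cache tier trims the miss traffic that travels further. The hit probabilities and latencies below are purely hypothetical.

```python
# Sketch of the cache-location trade-off in Fig. 2: each tier has a hit
# probability and a round-trip latency, ordered nearest first, and a request
# is served by the closest tier that holds the content. All probabilities
# and latencies are hypothetical.

def expected_latency(tiers):
    """tiers: list of (hit_prob, latency_s), nearest tier first.
    The last tier (the remote ISP/CDN origin) must always hit."""
    expected, p_miss = 0.0, 1.0
    for hit, latency in tiers:
        expected += p_miss * hit * latency
        p_miss *= (1.0 - hit)
    return expected

no_cache  = [(1.0, 0.200)]                              # remote ISP only
with_edge = [(0.3, 0.010),                              # D2D / device cache
             (0.2, 0.050),                              # RAN cache
             (1.0, 0.200)]                              # remote ISP
lat_none = expected_latency(no_cache)
lat_edge = expected_latency(with_edge)
```

Even with the modest hit ratios assumed here, the nearer tiers cut the expected latency well below the cache-free baseline, which is the qualitative effect Fig. 2 illustrates.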
c) Ad hoc devices
Content can also be cached on ad hoc nodes or on the devices of users with similar characteristics, using D2D communication to assist transmission.

Aravindh Ramane et al. cache data on family auxiliary nodes or the mobile devices of socially related users and then connect them together in a distributed way to realize content sharing. They design an edge-caching architecture named Wi-Stitch, an "edge-stitching" distributed content transmission network [108]. In their recent work [109], Wi-Stitch is extended by solving two main problems: (i) a shared node may not have enough bandwidth to share content associated with its limited Wi-Fi AP; and (ii) Wi-Stitch may produce multiple copies of popular content but insufficient copies of less popular content. The authors formulate these as optimization problems and solve them by strategically placing content for sharing within a geographically localized cell.

Akshay Mete and Sharayu Moharir combine a central server with multiple end users. They form a content delivery system that supports three content delivery modes: (i) the central server stores the entire content catalog and delivers the requested content to mobile users; (ii) mobile devices have a limited cache and can transmit content to each other through D2D communication; and (iii) the central server transmits the content to some mobile users, who then transmit it to other users. Their goal is to decide which content to cache on the end devices to minimize the service cost [110].

Zhang et al. consider the QoS of a two-hop wireless connection with a delay constraint in a multimedia big data offloading architecture. When two mobile users request the same multimedia content, one downloads the data from a BS and uses D2D to forward it to the other. The authors propose three optimal single-hop transmission power allocation schemes to support this two-hop wireless link transmission while ensuring the bounded QoS requirements of the two single hops [111].
20 Wang Yingchun, et al.
popularity, potential popularity, storage size, and the viewpoint of users moves with the movement
the location of its existing copies in the network of the football. In the game, most players are only
topology. It is more challenging than traditional concerned about their own situation. These areas
cache strategies, including LRU, LFU, FIFO, etc., of interest, or video clips, are defined as hot spots
and its goal is to improve the cache hit rate and and should be prioritized in caching. Sometimes,
reduce the network transmission bandwidth con- there are multiple hot spots in a view. For exam-
sumption. ple, the work of [103] models the attraction of all
a) Popularity-based viewpoints in a VR view to cache the content that
Caching content with higher popularity can is most likely to be requested.
maximize the total QoE of all users within a cer- c) User preference-based
tain region. The popularity of content is defined The user preference profile includes the proba-
as the ratio of the number of requests for specific bilities of a specific user requesting each content
content to the total number of user requests. It is over a certain period of time, and there are signif-
restricted to a certain area in a certain period[112]. icant differences among different individuals. This
of
However, it is worth noting that the popularity of is because users usually have a strong preference
content is not static; it follows a Zipf distribution, for specific content categories. Users’ preferences
ro
which is a power-law distribution[113]. Therefore, can be predicted according to the historical content
it is necessary to update the popularity of content requirements of users and the similarity between
in a popularity-based cache strategy.
Zhang et al. design a data structure to record
the content popularity of each router. In addition,
-p
users. This information can be widely used in rec-
ommendation systems. Because of the character
of preference, user numbers under a certain prefer-
re
each router needs to communicate with the oth- ence category will not be too large, so it is suitable
ers to calculate global popularity information[103]. to be applied to cache servers with small coverage,
lP
The work of [114] records the global popularity such as SBs or home cloudlets.
of every video segment at every viewpoint. Large d) Learning-based
popularity means that this video segment has been First, content popularity is region-specific and
requested many times by users, so the cache of the not fixed, so it is difficult to capture. Second,
na
segment is even more meaningful. in most cases, the content we cache is a video
The work of [115] analyzes the dynamic adapt- stream that faces highly dynamic and complex user
ability of popularity in the cache algorithm from playback behavior. Therefore, a learning-based
ur
two aspects: (i) learning the accuracy of fixed pop- caching policy using knowledge of content demand
ularity distribution; and (ii) learning the changing history is very promising. For instance, in the work
Jo
speed of popularity for certain content. Based on of [78], the authors use multiagent reinforcement
both of these aspects, they propose a novel hybrid learning to design content cache policy in mobile
algorithm to learn popularity changes faster and D2D networks without the need for acquiring real-
better. time requirements and popularity. They propose a
Another important fact is that the distribution of belief-based Modified Combinatorial Upper Con-
content popularity in a large area is often differ- fidence Bound (MCUCB) algorithm to solve the
ent from the distribution in a small area. There- problem of large joint action.
fore, the measurement of content popularity faces Hao et al. implement a cache policy based on
the difficulty of spatial granularity knowledge, not user playback behavior[104]. They use deep be-
only because the coverage of various types of edge lief networks to capture the semantic information
servers is different but also because the users are of users and infer the video that will be requested
in dynamic flow between multiple units. This will in the future based on the user’s playback mode.
also have an impact on the prediction of content The video is actively cached in the edge network.
popularity, especially for edge servers in SBs with
a small coverage. 3.5. Summary of this chapter
b) Hot spot-based This section investigates the distributed deploy-
In many multimedia MDLAs such as VR/AR, ment scheme of MDLAs from three perspectives:
video streaming services, and real-time interactive deployment architecture, offloading decisions and
games, the user usually looks at the most attrac- distributed caches. The main work is summarized
tive viewpoint. For example, in a football game, in Table 3:
Table 3: Summary of distributed deployment schemes for MDLAs

Offloading location
- Remote cloud [50, 81]: remote cloud networks are rich in resources but have large transmission delays
- Edge network [56, 82, 116]: network resources are not as great as those of the remote cloud

Offloading mode
- SS-MU [52–54]: server resource allocation and radio channel contention issues

Cache content
- Services [55, 100]: caching app services and related databases in edge servers
- Data [103, 105]: caching data frequently requested by users, particularly video

Cache locations
- Core network [99]: caching at the cloud with rich memory, but far from users and with large transmission delays
- RAN [99, 106]: caching at the eNB or nearby edge servers, leading to faster response and lower latency
- Peer devices [108, 110, 111]: caching data closer to users, further reducing transmission waiting

Cache policy
- Popularity [103, 112–115]: uses the probability of certain content being requested in a certain period of time, which is related to the specific region; caching content with greater popularity is more important
- Hot spot [103]: the highest-interest point of users in images or videos; caching high-quality hot-spot content can improve users' experience
- Preferences of users: a user typically has a strong preference for particular content categories; we need to predict and cache the content that individuals prefer
- Learning [104]: learning-based caching strategy with content-request history knowledge, which has high dynamic adaptability
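To make the popularity-based policy summarized above concrete, the sketch below draws a request trace from a Zipf popularity distribution (the power-law model from [113]) and compares the expected hit ratio of caching the k most popular items against a plain LRU baseline. The catalog size, cache size, and skew parameter are illustrative assumptions of ours, not values taken from the surveyed papers.

```python
import random

def zipf_weights(n_items: int, s: float = 1.0):
    """Unnormalized Zipf popularity: item i (1-indexed) has weight 1 / i**s."""
    return [1.0 / (i ** s) for i in range(1, n_items + 1)]

def top_k_hit_ratio(weights, k: int) -> float:
    """Expected hit ratio when the k most popular items are cached."""
    total = sum(weights)
    return sum(sorted(weights, reverse=True)[:k]) / total

def lru_hit_ratio(trace, capacity: int) -> float:
    """Measured hit ratio of a plain LRU cache over a request trace."""
    cache = []  # front = most recently used
    hits = 0
    for item in trace:
        if item in cache:
            hits += 1
            cache.remove(item)
        elif len(cache) >= capacity:
            cache.pop()  # evict the least recently used item
        cache.insert(0, item)
    return hits / len(trace)

if __name__ == "__main__":
    rng = random.Random(0)
    weights = zipf_weights(n_items=100, s=1.0)
    trace = rng.choices(range(100), weights=weights, k=5000)
    print(f"popularity top-10 hit ratio: {top_k_hit_ratio(weights, 10):.3f}")
    print(f"LRU(10) hit ratio on Zipf trace: {lru_hit_ratio(trace, 10):.3f}")
```

Because popularity must be re-estimated as it drifts, a real policy would periodically refresh the weights rather than cache a fixed top-k forever.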
vices or edge servers. If the calculation is complex, it can be further offloaded to the remote cloud. This deep learning is lightweight, portable, and close to users. User-oriented MDLAs center intelligence around the user. For example, the real-time video or image acquired by a mobile-side camera supports deploying object recognition and tracking technology on the mobile terminal so that the mobile device has vision, which can be used in tasks such as face recognition for user authentication and local video analysis. We can use the camera on a mobile device to collect lip motion video, and the lip-reading information can be used not only for deaf and mute information input but also for lip-reading authentication [3]. Augmented reality and virtual reality technology provide virtual superhuman vision for us and have attracted much interest from academia and industry.

… analysis crowdsources multimedia data to locate criminals; traffic forecasting collects complaints from a large number of mobile users about accidents and congestion on the ground in the early peak period so that service providers can provide more accurate real-time traffic condition reports and obtain better economic benefits. (ii) Distributed deep learning tasks [22–24]: distributed deep learning on mobile devices makes more efficient and convenient use of the large quantity of data generated by mobile terminals; it not only solves the problem of large data sets and large models in the traditional centralized training mode but also effectively addresses the data privacy of mobile terminal users. (iii) There are also various internet of things applications [26] and applications that use mobile big data to develop various services [27, 28].
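The distributed deep learning direction above can be illustrated with a minimal sketch of the federated averaging step from [25]: clients train locally, and only their model parameters, weighted by local data size, are aggregated, so raw user data never leaves the device. The flattened-parameter representation and the names below are simplifications we introduce for illustration only.

```python
def fed_avg(client_weights, client_sizes):
    """One FedAvg aggregation round: average client models weighted by
    local dataset size, so no raw user data leaves the device."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n / total for w, n in zip(client_weights, client_sizes))
        for j in range(n_params)
    ]

if __name__ == "__main__":
    # Two hypothetical clients with flattened model parameters.
    clients = [[1.0, 0.0], [3.0, 2.0]]
    sizes = [100, 300]  # the second client holds 3x more data
    print(fed_avg(clients, sizes))  # pulled toward the data-rich client
```

In a full system this averaging step would alternate with several epochs of local SGD on each device; only the aggregation is shown here.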
… continuous feedback. Therefore, first, we should consider how to enrich mobile information acquisition devices, which are not limited to the existing cameras, microphones, temperature sensors, etc., to obtain data with more comprehensive feature dimensions. Second, how to improve the accuracy of the data acquired by mobile information acquisition devices, reduce the quantity of dirty data and improve the ability to acquire accurate data in poor environments are also areas where mobile information acquisition devices need to be improved. Third, the heterogeneity of sensor quality across devices is also one of the key issues [120]. On the one hand, mobile devices are usually equipped with nonprofessional sensors, and the sensor quality of different devices may be uneven, which leads to uneven quality of the obtained sensor data; in third-party mobile applications, for example, this has a great impact on the accuracy of deep learning tasks. On the other hand, the workload of the mobile system is unpredictable, which may result in different sampling rates in different time periods, so the quality of sensor data may be unstable over time. Work has been done [121] to solve these problems, but there are still some shortcomings, such as the execution time and energy consumption of the whole optimization framework on mobile devices.

In terms of the mobile device market, it can be seen that mobile devices tend to become smaller, thinner and more portable. All of these factors limit the size of the device hardware, such as the CPU, heat sink area and memory chip, which restricts the performance of MDLAs. Improving the computing and storage ability of mobile devices, for example by introducing deep learning chipsets and GPUs into mobile devices, is one of the key development directions of mobile devices in the future.

5.1.3. Mobile device power consumption
As is well known, the computation of most MDLAs is very heavy, especially for vision applications, while the battery capacity of most mobile devices is restricted and not enough to support complete local operation. Although we can offload the computation to the background for high-performance processing, sometimes, due to high data privacy requirements and poor network conditions, some users may prefer to run their applications locally on the mobile end. Besides, offloading itself is also energy-consuming. So, how to improve the endurance of mobile devices is a key problem. We can study it by reducing the energy consumption of MDLAs and by improving the battery capacity of mobile devices.

5.2. Data management
5.2.1. Data personalized management
Smartphones have become the main computing platform for millions of people. They also represent a new set of input devices: millions of cameras, microphones, GPS devices and many other types of sensors that generate massive data at every moment. With the increase in the number of connected devices, including mobile phones, tablets and laptops, there is an urgent need for personalized management of mobile user data. There are two main questions. The first is how to store massive data: these data cannot be completely stored on mobile devices, because the limited storage capacity of a mobile device can only support storing a small quantity of running data and personal data, so massive data have to be stored elsewhere, such as in domestic microclouds and on other mobile devices. The second is how massive data can be managed in a personalized way. For users, this concerns convenient data acquisition, data consistency, data recovery, data updating and cleaning, etc. For enterprises, in core data mining, marketing, maintenance of member service data, etc., there are many factors to be considered. Because of the massive size and frequent updates of data, a more detailed design is needed in MDLA data management.

5.2.2. Data privacy and security
Offloading user data to the background will inevitably face data privacy and security problems. First, for highly private data, privacy issues in communication, such as eavesdropping, may occur in the process of offloading. Second, there is the problem of how to protect the privacy of user data after offloading it to the background server.
Third, massive data have brought great value, including a large number of data models and information; this raises the questions of how to obtain the user's consent to use them and how to protect the user's privacy while using these data with the user's consent. The former may require the design of nontechnical mechanisms such as reward mechanisms. The latter requires the design and support of technical measures such as data encryption and feature abstraction.

5.3. System and network
5.3.1. Distributed systems for MDLAs
Most offloading of MDLAs is related to the work of distributed computing and caching. The traditional design of highly available distributed systems usually needs to achieve redundancy, state synchronization, resource scheduling, system self-inspection, fault recovery, convenient scaling, etc. In the process of offloading MDLAs, we need to consider not only the typical factors above but also the characteristics of mobile devices and deep learning models, as well as the challenges brought by the diversity of offloading locations. Mobile devices move and rely on unstable wireless networks and cellular connections, which makes it difficult for offloading to achieve high fault tolerance and a stable state. Mobility also affects offloading and cache decisions by changing a device's location among multiple servers. This raises the questions of (i) how to model the spatiotemporal characteristics of users; (ii) how to cache a user's content in a mobility-aware way; (iii) how to improve the hit rate of the edge cache when users request cached content; and (iv) how to ensure the continuity of services when the mobility of users is unknown. All of these questions have to be considered after the introduction of mobility. What's more, we now also have to consider the deep learning model's distributed training and parallel inference.

5.3.2. Advancing communication
As content and computing migrate to the edge side, the vigorous development of MDLAs shows the characteristics of low delay and high reliability in computing and of distribution and high bandwidth in content. This poses the development requirements of ultra-large bandwidth, ultra-large connection, ultra-reliability and low delay for new communication networks. Boccardi's work has identified five key technologies of 5G: device-centered architecture, millimeter-wave technology, large-scale MIMO systems, more intelligent devices, and Machine-to-Machine (M2M) communication [122]. We will discuss the communication challenges of MDLAs based on these five technologies.

a) Device-centered architecture: In the past, as the basic unit of the wireless access network, the cell played an important role in controlling the uplink and downlink transmission of data services. Recently, however, the focus has gradually moved from core networks to peripheral devices, and the traditional cell-centric architecture has been disrupted. We need to redefine the network architecture for the new era, and we face several challenges. First, we must study ultra-dense heterogeneous networks: MDLAs make the density of heterogeneous access in mobile cellular units increase rapidly, and a simple, single communication network architecture is not enough to meet intensive and diversified user needs. The design of communication is now affected by the type of MDLAs popular in an area and by user mobility, and the coordination compensation between different layers of the network architecture also needs to be considered. In addition, with the rapid increase in base station density, achieving more flexible adaptive resource scheduling between base stations for MDLAs also urgently needs to be solved. Second, MDLAs also need 5G technology with strong connectivity and highly intensive deployment.

b) Communication technology: (1) Millimeter waves: 5G has drawn attention to millimeter waves, which bring greater bandwidth, richer spectrum resources, more high-frequency antennas and higher propagation accuracy. However, millimeter waves are easily affected by the environment, and their propagation distance is short, so we need more technologies to improve signal anti-interference abilities and reduce path loss. (2) Communication between mobile devices: To share content and jointly perform computation between mobile devices wirelessly, more efficient D2D communication needs to be developed. Efforts need to be made to design user-sharing schemes, covering both hardware and content.

5.4. New application types
People always need more types of MDLAs and new mobile devices. To start with, the recent app market has shown us its unlimited possibilities. For example, apps that run on traditional mobile devices (including but not limited to recommendations from nearby smart friends) are
voice recognition modules that allow users to issue voice commands in social software. Besides, the combination of visual services and DL results in super-visual services: auto-beautifying of multimedia data, 360-degree panoramic transmission, viewpoint HD, super-resolution reconstruction, post-occlusion visual extension, visual authentication, etc. Moreover, intelligence in online shopping can be applied to building a user image for commodity recommendation, false goods and poor seller analysis [123], etc. In addition, some novel applications on new types of mobile devices have appeared in recent years, such as wearable mobile devices [124], AR/VR glasses, smart tableware [12], and driverless cars [125]. In fact, these apps are not enough. Human social activities, work activities and physical activities, as well as sensory activities such as vision, hearing, smell, taste and touch, will be combined with mobile deep learning in the future. The new design of MDLAs will make our lives much easier.

6. Conclusion
There are two ways of deploying MDLAs. One way is to execute them locally on mobile devices. The main methods include (i) reducing complexity by improving the deep learning algorithm or by redesigning the model architecture to be suitable for mobile terminals; and (ii) reusing the intermediate results of deep models.

Acknowledgments
This work was supported by grants including No. 61472317 and No. 61502379, the MOE Innovation Research Team No. IRT 17R86, and the Project of China Knowledge Centre for Engineering Science and Technology.

References
[1] L. Wei, W. Luo, J. Weng, Y. Zhong, X. Zhang, Z. Yan, Machine learning-based malicious application detection of android, IEEE Access 5 (2017) 25591–25601. doi:10.1109/ACCESS.2017.2771470.
[2] S. Xu, L. Zhang, A. Li, X. Y. Li, C. Ruan, W. Huang, Appdna: App behavior profiling via graph-based deep learning, in: 2018 IEEE Conference on Computer Communications, INFOCOM 2018, Honolulu, HI, USA, April 16-19, 2018, 2018, pp. 1475–1483.
[3] L. Lu, J. Yu, Y. Chen, H. Liu, Y. Zhu, L. Kong, M. Li, Lip reading-based user authentication through acoustic sensing on smartphones, IEEE/ACM Transactions on Networking 27 (1) (2019) 1–14.
[4] B. Zhou, J. Lohokare, R. Gao, F. Ye, Echoprint: Two-factor authentication using acoustics and vision on smartphones, in: Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, MobiCom 2018, New Delhi, India, October 29 - November 02, 2018, 2018, pp. 321–336.
[5] B. Fang, X. Zeng, M. Zhang, Nestdnn: Resource-aware multi-tenant on-device deep learning for continuous mobile vision, in: Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, MobiCom 2018, New Delhi, India, October 29 - November 02, 2018, 2018.
[6] …, in: Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, MobiCom '18, ACM, New York, NY, USA, 2018, pp. 129–144.
[7] J. Ren, L. Gao, H. Wang, Z. Wang, Optimise web browsing on heterogeneous mobile platforms: A machine learning based approach, in: 2017 IEEE Conference on Computer Communications, INFOCOM 2017, Atlanta, GA, USA, May 1-4, 2017, 2017, pp. 1–9.
[12] Q. Huang, Z. Yang, Q. Zhang, Smart-u: Smart utensils know what you eat, in: 2018 IEEE Conference on Computer Communications, INFOCOM 2018, Honolulu, HI, USA, April 16-19, 2018, 2018, pp. 1439–1447.
[13] T. Zhao, J. Liu, Y. Wang, H. Liu, Y. Chen, Ppg-based finger-level gesture recognition leveraging wearables, in: 2018 IEEE Conference on Computer Communications, INFOCOM 2018, Honolulu, HI, USA, April 16-19, 2018, 2018, pp. 1457–1465.
[14] M. Cheung, J. She, L. Liu, Deep learning-based online counterfeit-seller detection, in: IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops, INFOCOM Workshops 2018, Honolulu, HI, USA, April 15-19, 2018, 2018, pp. 51–56.
[15] Y. Zou, G. Wang, K. Wu, L. M. Ni, Smartsensing: Sensing through walls with your smartphone!, in: 11th IEEE International Conference on Mobile Ad Hoc and Sensor Systems, MASS 2014, Philadelphia, PA, USA, October 28-30, 2014, 2014, pp. 55–63.
[16] T. Meng, X. Jing, Z. Yan, W. Pedrycz, A survey on machine learning for data fusion, Information Fusion 57 (2020) 115–129.
[17] J. Wang, X. Jing, Z. Yan, Y. Fu, W. Pedrycz, L. T. Yang, A survey on trust evaluation based on machine learning, ACM Comput. Surv. 53 (5).
[18] Z. Lu, K. S. Chan, S. Pu, T. L. Porta, Crowdvision: A computing platform for video crowdprocessing using deep learning, IEEE Transactions on Mobile Computing PP (99) (2018) 1–1.
[19] Y. Tian, W. Wei, Q. Li, F. Xu, S. Zhong, Mobicrowd: Mobile crowdsourcing on location-based social networks, in: 2018 IEEE Conference on Computer Communications, INFOCOM 2018, Honolulu, HI, USA, April 16-19, 2018, 2018, pp. 2726–2734.
[20] Q. Xu, R. Zheng, When data acquisition meets data analytics: A distributed active learning framework for optimal budgeted mobile crowdsensing, in: 2017 IEEE Conference on Computer Communications, INFOCOM 2017, Atlanta, GA, USA, May 1-4, 2017, 2017, pp. 1–9.
[21] S. He, K. G. Shin, Steering crowdsourced signal map construction via bayesian compressive sensing, in: 2018 IEEE Conference on Computer Communications, INFOCOM 2018, Honolulu, HI, USA, April 16-19, 2018, 2018, pp. 1016–1024.
[22] T. Tuor, S. Wang, T. Salonidis, B. Ko, K. K. Leung, Demo abstract: Distributed machine learning at resource-limited edge nodes, in: IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops, INFOCOM Workshops 2018, Honolulu, HI, USA, April 15-19, 2018, 2018, pp. 1–2.
[23] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, K. Chan, When edge meets learning: Adaptive control for resource-constrained distributed machine learning, in: 2018 IEEE Conference on Computer Communications, INFOCOM 2018, Honolulu, HI, USA, April 16-19, 2018, 2018, pp. 63–71.
[24] Y. Bao, Y. Peng, C. Wu, Z. Li, Online job scheduling in distributed machine learning clusters, in: 2018 IEEE Conference on Computer Communications, INFOCOM 2018, Honolulu, HI, USA, April 16-19, 2018, 2018, pp. 495–503.
[25] B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. y Arcas, Communication-efficient learning of deep networks from decentralized data, in: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA, Vol. 54 of Proceedings of Machine Learning Research, PMLR, 2017, pp. 1273–1282.
[26] L. He, K. Ota, M. Dong, Learning iot in edge: Deep learning for the internet of things with edge computing, IEEE Network 32 (1) (2018) 96–101.
[27] Y. Chen, L. Shu, L. Wang, Poster abstract: Traffic flow prediction with big data: A deep learning based time series model, in: 2017 IEEE Conference on Computer Communications Workshops, INFOCOM Workshops, Atlanta, GA, USA, May 1-4, 2017, 2017, pp. 1010–1011.
[28] Y. Hou, P. Zhou, J. Xu, D. O. Wu, Course recommendation of mooc with big data support: A contextual online learning approach, in: IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2018, pp. 106–111.
[29] L. N. Huynh, Y. Lee, R. K. Balan, Deepmon: Mobile gpu-based deep learning framework for continuous vision applications, in: Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '17, ACM, New York, NY, USA, 2017, pp. 82–95.
[30] S. Han, J. Pool, J. Tran, W. J. Dally, Learning both weights and connections for efficient neural networks, in: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, NIPS'15, 2015, pp. 1135–1143.
[31] S. Anwar, K. Hwang, W. Sung, Structured pruning of deep convolutional neural networks, J. Emerg. Technol. Comput. Syst. 13 (3) (2017) 32:1–32:18.
[32] F. Moya Rueda, R. Grzeszick, G. A. Fink, Neuron pruning for compressing deep networks using maxout architectures, in: V. Roth, T. Vetter (Eds.), Pattern Recognition, Springer International Publishing, Cham, 2017, pp. 177–188.
[33] W. Chen, J. T. Wilson, S. Tyree, K. Q. Weinberger, Y. Chen, Compressing neural networks with the hashing trick, Computer Science (2015) 2285–2294.
[34] Z. Li, B. Ni, W. Zhang, X. Yang, G. Wen, Performance guaranteed network acceleration via high-order residual quantization, in: 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
[35] Z. Liu, M. Sun, T. Zhou, G. Huang, T. Darrell, Rethinking the value of network pruning, in: International Conference on Learning Representations, 2019.
[36] S. Han, H. Shen, M. Philipose, S. Agarwal, A. Wolman, A. Krishnamurthy, Mcdnn: An approximation-based execution framework for deep stream processing under resource constraints, in: Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '16, ACM, New York, NY, USA, 2016, pp. 123–136.
[37] P. Guo, B. Hu, R. Li, W. Hu, Foggycache: Cross-device approximate computation reuse, 2018, pp. 19–34. doi:10.1145/3241539.3241557.
[38] A. Mathur, N. D. Lane, S. Bhattacharya, A. Boran, C. Forlivesi, F. Kawsar, Deepeye: Resource efficient local execution of multiple deep vision models using wearable commodity hardware, in: Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '17, Niagara Falls, NY, USA, June 19-23, 2017, 2017, pp. 68–81.
[39] Y. Geng, Y. Yang, G. Cao, Energy-efficient computation offloading for multicore-based mobile devices, in: INFOCOM 2018 - IEEE Conference on Computer Communications, Proceedings - IEEE INFOCOM, Institute of Electrical and Electronics Engineers Inc., United States, 2018, pp. 46–54. doi:10.1109/INFOCOM.2018.8485875.
[40] Google, Tensorflow lite, https://tensorflow.google.cn/lite/guide.
[41] Facebook, Caffe2, https://caffe2.ai/blog/2018/05/02/Caffe2_PyTorch_1_0.html.
[42] Apple, Core ML 2, https://developer.apple.com/documentation/coreml.
[43] Tencent, Ncnn, https://github.com/Tencent/ncnn.
[44] Tencent, Feathercnn, https://github.com/Tencent/FeatherCNN.
[45] Qualcomm, Snpe, https://developer.qualcomm.com/software/qualcomm-neural-processing-sdk.
[46] Xiaomi, Mace, https://github.com/XiaoMi/mace.
[47] Amazon, Amazon ec2, https://aws.amazon.com/de/ec2/.
[48] T. Y.-H. Chen, L. Ravindranath, S. Deng, P. Bahl, H. Balakrishnan, Glimpse: Continuous, real-time object recognition on mobile devices, in: Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, SenSys '15, ACM, New York, NY, USA, 2015, pp. 155–168.
[49] P. Jain, J. Manweiler, R. Roy Choudhury, Low bandwidth offload for mobile ar, in: Proceedings of the 12th International Conference on Emerging Networking EXperiments and Technologies, CoNEXT '16, ACM, New York, NY, USA, 2016, pp. 237–251.
[50] R. Shea, A. Sun, S. Fu, J. Liu, Towards fully offloaded cloud-based ar: Design, implementation and experience, in: Proceedings of the 8th ACM on Multimedia Systems Conference, MMSys '17, ACM, New York, NY, USA, 2017, pp. 321–330.
[51] Mobile-edge computing introductory technical white paper.
[52] …offloading for mobile cloud computing in dense wireless networks, IEEE Transactions on Mobile Computing PP (99) (2016) 1–1.
[53] Y. Liu, M. J. Lee, Y. Zheng, Adaptive multi-resource allocation for cloudlet-based mobile cloud computing system, IEEE Transactions on Mobile Computing 15 (10) (2016) 2398–2410.
[54] J. Zheng, C. Yueming, W. Yuan, S. X. Sherman, Dynamic computation offloading for mobile cloud computing: A stochastic game-theoretic approach, IEEE Transactions on Mobile Computing PP (99) (2018) 1–1.
[55] J. Xu, L. Chen, P. Zhou, Joint service caching and task offloading for mobile edge computing in dense networks, in: IEEE INFOCOM 2018 - IEEE Conference on Computer Communications, 2018, pp. 207–215. doi:10.1109/INFOCOM.2018.8485977.
[56] Q. Liu, S. Huang, J. Opadere, T. Han, An edge network orchestrator for mobile augmented reality, in: 2018 IEEE Conference on Computer Communications, INFOCOM 2018, Honolulu, HI, USA, April 16-19, 2018, 2018, pp. 756–764.
[58] Y. Mao, C. You, J. Zhang, K. Huang, K. B. Letaief, A survey on mobile edge computing: The communication perspective, IEEE Communications Surveys and Tutorials 19 (4) (2017) 2322–2358.
[59] P. Mach, Z. Becvar, Mobile edge computing: A survey on architecture and computation offloading, IEEE Communications Surveys and Tutorials 19 (3) (2017) 1628–1656.
[60] K. Ota, M. S. Dao, V. Mezaris, F. G. B. De Natale, Deep learning for mobile multimedia: A survey, ACM Transactions on Multimedia Computing, Communications and Applications 13 (3s) (2017) 1–22.
[61] K. Kumar, J. Liu, Y.-H. Lu, B. Bhargava, A survey of computation offloading for mobile systems, Mobile Networks and Applications 18 (1) (2013) 129–140.
[62] D. Li, X. Wang, D. Kong, Deeprebirth: Accelerating deep neural network execution on mobile devices.
[63] S. Han, H. Mao, W. J. Dally, Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding.
[64] P. Yin, J. Lyu, S. Zhang, S. J. Osher, Y. Qi, J. Xin, Understanding straight-through estimator in training activation quantized neural nets, in: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019.
[65] F. N. Iandola, M. W. Moskewicz, K. Ashraf, S. Han, W. J. Dally, K. Keutzer, Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5mb model size, CoRR abs/1602.07360.
[66] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, Mobilenets: Efficient convolutional neural networks for mobile vision applications, CoRR abs/1704.04861.
[67] C. Tai, T. Xiao, X. Wang, W. E, Convolutional neural networks with low-rank regularization, in: 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings.
[68] J. Park, S. Li, W. Wen, P. T. P. Tang, H. Li, Y. Chen, P. Dubey, Faster cnns with direct sparse convolutions and guided pruning.
[69] C. Bucila, R. Caruana, A. Niculescu-Mizil, Model compression, in: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006, 2006, pp. 535–541.
[70] G. E. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network, CoRR abs/1503.02531.
[71] T. Chen, L. Lin, W. Zuo, X. Luo, L. Zhang, Learning a wavelet-like auto-encoder to accelerate deep neural networks, in: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, 2018, pp. 6722–6729.
[72] M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, Q. V. Le, Mnasnet: Platform-aware neural architecture search for mobile, in: IEEE Conference on
[57] M. Chen, Y. Hao, Y. Li, C.-F. Lai, D. Wu, On the com- Computer Vision and Pattern Recognition, CVPR 2019,
putation offloading at ad hoc cloudlet: architecture and Long Beach, CA, USA, June 16-20, 2019, 2019, pp.
service modes, IEEE Communications Magazine 53 (6) 2820–2828.
(2015) 18–24. [73] P. Zhang, E. Lo, B. Lu, High performance depthwise
28 Wang Yingchun, et al.
and pointwise convolutions on mobile devices, in: The heterogeneity-aware coded cooperative computation at
Thirty-Fourth AAAI Conference on Artificial Intelli- the edge, in: 2018 IEEE 26th International Conference
gence, AAAI 2020, The Thirty-Second Innovative Ap- on Network Protocols (ICNP), 2018.
plications of Artificial Intelligence Conference, IAAI [86] X. Ge, S. Tu, G. Mao, C.-X. Wang, T. Han, 5g ultra-
2020, The Tenth AAAI Symposium on Educational Ad- dense cellular networks, IEEE Wireless Communica-
vances in Artificial Intelligence, EAAI 2020, New York, tions 23 (1) (2016) 72–79.
NY, USA, February 7-12, 2020, AAAI Press, 2020, pp. [87] K. Poularakis, J. Llorca, A. M. Tulino, I. Taylor, L. Tas-
6795–6802. siulas, Joint service placement and request routing in
[74] A. Tveit, T. Morland, T. B. Røst, Deeplearningkit - multi-cell mobile edge computing networks, in: 2019
an gpu optimized deep learning framework for apple’s IEEE Conference on Computer Communications, INFO-
ios, os x and tvos developed in metal and swift, ArXiv COM 2019,Paris, France, April 29 - May 2, 2019, 2019,
abs/1605.04614. pp. 10–18.
[75] N. D. Lane, S. Bhattacharya, P. Georgiev, C. For- [88] M. Chen, Y. Hao, Task offloading for mobile edge com-
livesi, L. Jiao, L. Qendro, F. Kawsar, Deepx: puting in software defined ultra-dense network, IEEE
A software accelerator for low-power deep learn- Journal on Selected Areas in Communications 36 (3)
ing inference on mobile devices, in: 2016 15th (2018) 587–597.
ACM/IEEE International Conference on Information [89] F. Zafari, J. Li, K. K. Leung, D. Towsley, A. Swami,
of
Processing in Sensor Networks (IPSN), 2016, pp. 1–12. A game-theoretic approach to multi-objective resource
doi:10.1109/IPSN.2016.7460664. sharing and allocation in mobile edge clouds, CoRR
[76] BaiDu, Paddle model, https://github.com/ abs/1808.06937.
ro
PaddlePaddle/models. [90] X. Ran, H. Chen, X. Zhu, Z. Liu, J. Chen, Deepdecision:
[77] N. Makris, V. Passas, T. Korakis, L. Tassiulas, Employ- A mobile deep learning framework for edge video an-
ing mec in the cloud-ran: An experimental analysis, in: alytics, in: IEEE INFOCOM 2018 - IEEE Conference
Proceedings of the 2018 on Technologies for the Wire-
less Edge Workshop,EdgeTech@MobiCom 2018, New
Delhi, India, November 2, 2018, 2018, pp. 15–19.
-p
[91]
on Computer Communications, 2018, pp. 1421–1429.
doi:10.1109/INFOCOM.2018.8485905.
X. Ran, H. Chen, Z. Liu, J. Chen, Delivering deep learn-
re
[78] W. Jiang, G. Feng, S. Qin, T. S. P. Yum, Efficient ing to mobile devices via offloading, in: Proceedings of
d2d content caching using multi-agent reinforcement the Workshop on Virtual Reality and Augmented Real-
learning, in: IEEE INFOCOM 2018-IEEE Conference ity Network, VR/AR Network’17, ACM, New York, NY,
lP
local execution of multiple deep vision models using on Information Theory (ISIT), 2016, pp. 1451–1455.
wearable commodity hardware, in: Proceedings of the doi:10.1109/ISIT.2016.7541539.
15th Annual International Conference on Mobile Sys- [93] Z. Chen, W. Hu, J. Wang, S. Zhao, B. Amos, G. Wu,
tems, Applications, and Services, MobiSys ’17, ACM, K. Ha, K. Elgazzar, P. Pillai, R. Klatzky, D. Siewiorek,
ur
New York, NY, USA, 2017, pp. 68–81. M. Satyanarayanan, An empirical study of latency in an
[80] S. Sundar, B. Liang, Offloading dependent tasks with emerging class of edge computing applications for wear-
communication delay and deadline constraint, in: 2018 able cognitive assistance, in: Proceedings of the Second
Jo
IEEE Conference on Computer Communications, IN- ACM/IEEE Symposium on Edge Computing, SEC ’17,
FOCOM 2018, Honolulu, HI, USA, April 16-19, 2018, ACM, New York, NY, USA, 2017, pp. 14:1–14:14.
2018, pp. 37–45. [94] M. Kamoun, W. Labidi, M. Sarkiss, Joint resource al-
[81] H. T. Dinh, C. Lee, D. Niyato, W. Ping, A survey of mo- location and offloading strategies in cloud enabled cel-
bile cloud computing: architecture, applications, and ap- lular networks, in: 2015 IEEE International Confer-
proaches, Wireless Communications and Mobile Com- ence on Communications (ICC), 2015, pp. 5529–5534.
puting 13 (18) (2013) 1587–1611. doi:10.1109/ICC.2015.7249203.
[82] D. V. Le, C. K. Tham, A deep reinforcement learn- [95] W. Labidi, M. Sarkiss, M. Kamoun, Energy-optimal re-
ing based offloading scheme in ad-hoc mobile clouds, source scheduling and computation offloading in small
in: IEEE INFOCOM 2018 - IEEE Conference on Com- cell networks, in: 2015 22nd International Confer-
puter Communications Workshops, INFOCOM Work- ence on Telecommunications (ICT), 2015, pp. 313–318.
shops 2018, Honolulu, HI, USA, April 15-19,2018, doi:10.1109/ICT.2015.7124703.
2018, pp. 760–765. [96] X. Chen, L. Jiao, W. Li, X. Fu, Efficient multi-user com-
[83] S. Teerapittayanon, B. McDanel, H. T. Kung, Distributed putation offloading for mobile-edge cloud computing,
deep neural networks over the cloud, the edge and end IEEE/ACM Transactions on Networking 24 (5) (2016)
devices, in: 37th IEEE International Conference on Dis- 2795–2808. doi:10.1109/TNET.2015.2487344.
tributed Computing Systems,ICDCS 2017, Atlanta, GA, [97] H. Flores, P. Hui, P. Nurmi, E. Lagerspetz, S. Tarkoma,
USA, June 5-8, 2017, 2017, pp. 328–339. J. Manner, V. Kostakos, Y. Li, X. Su, Evidence-
[84] V. B. C. Souza, W. Ramírez, X. Masip-Bruin, E. Marín- aware mobile computational offloading, IEEE Transac-
Tordera, G. Ren, G. Tashakor, Handling service al- tions on Mobile Computing 17 (8) (2018) 1834–1850.
location in combined fog-cloud scenarios, in: 2016 doi:10.1109/TMC.2017.2777491.
IEEE International Conference on Communications, [98] W. R. KhudaBukhsh, B. Alt, S. Kar, A. Rizk, H. Koeppl,
ICC 2016,Kuala Lumpur, Malaysia, May 22-27, 2016, Collaborative uploading in heterogeneous networks: Op-
2016, pp. 1–5. timal and adaptive strategies, in: 2018 IEEE Conference
[85] Y. Keshtkarjahromi, Y. Xing, H. Seferoglu, Dynamic on Computer Communications, INFOCOM 2018,Hon-
A survey on deploying mobile deep learning applications: a systemic and technical perspective 29
olulu, HI, USA, April 16-19, 2018, 2018, pp. 1–9. Computer Communications, IEEE, 2019, pp. 82–90.
[99] X. Wang, M. Chen, T. Taleb, A. Ksentini, V. C. M. Le- [112] D. Liu, B. Chen, C. Yang, A. F. Molisch, Caching at the
ung, Cache in the air: exploiting content caching and de- wireless edge: design aspects, challenges, and future di-
livery techniques for 5g systems, IEEE Communications rections, IEEE Communications Magazine 54 (9) (2016)
Magazine 52 (2) (2014) 131–139. 22–28.
[100] L. Wang, L. Jiao, T. He, J. Li, M. Mühlhäuser, Service [113] A. Tatar, M. D. D. Amorim, S. Fdida, P. Antoniadis, A
entity placement for social virtual reality applications in survey on predicting the popularity of web content, Jour-
edge computing, in: 2018 IEEE Conference on Com- nal of Internet Services and Applications 5 (1) (2014) 8.
puter Communications, INFOCOM 2018,Honolulu, HI, [114] C. Bernardini, T. Silverston, F. Olivier, Mpc:popularity-
USA, April 16-19, 2018, 2018, pp. 468–476. based caching strategy for content centric networks, in:
[101] J. Su, D. V. Vargas, K. Sakurai, One pixel attack for fool- Proceedings of IEEE International Conference on Com-
ing deep neural networks, IEEE Transactions on Evolu- munications,ICC 2013, Budapest, Hungary, June 9-13,
tionary Computation(2019). 2013, 2013, pp. 3619–3623.
[102] K.-H. K. Xiufeng Xie, Source compression with [115] J. Li, S. Shakkottai, J. C. S. Lui, V. Subramanian, Ac-
bounded dnn perception loss for iot edge computer curate learning or fast mixing? dynamic adaptability of
vision, in: Proceedings of the 25th Annual Interna- caching algorithms, IEEE Journal on Selected Areas in
tional Conference on Mobile Computing and Network- Communications 36 (6) (2018) 1314–1330.
of
ing(2019), MobiCom ’19, ACM, 2019. [116] S. Misra, N. Saha, Detour: Dynamic task offloading in
[103] Y. Zhang, X. Jiang, Y. Wang, K. Lei, Cache and delivery software-defined fog for iot applications, IEEE Journal
of vr video over named data networking, in: IEEE IN- on Selected Areas in Communications 37 (5) (2019) 1–
ro
FOCOM 2018 - IEEE Conference on Computer Com- 1.
munications Workshops, INFOCOM Workshops 2018, [117] K. Poularakis, J. Llorca, A. M. Tulino, I. Taylor,
Honolulu, HI, USA, April 15-19,2018, 2018, pp. 280– L. Tassiulas, Joint service placement and request routing
285.
[104] H. Hao, C. Xu, M. Wang, H. Xie, Y. Liu, D. O.
Wu, Knowledge-centric proactive edge caching over mo-
-p in multi-cell mobile edge computing networks, CoRR
abs/1901.08946.
[118] H. Gong, K. Xing, W. Du, A user activity pattern mining
re
bile content distribution network, in: IEEE INFOCOM system based on human activity recognition and location
2018 - IEEE Conference on Computer Communications service, in: IEEE INFOCOM 2018 - IEEE Conference
Workshops, INFOCOM Workshops 2018, Honolulu, HI, on Computer Communications Workshops, INFOCOM
lP
USA, April 15-19,2018, 2018, pp. 450–455. Workshops 2018, Honolulu, HI, USA, April 15-19,2018,
[105] N. Mohan, P. Zhou, K. Govindaraj, J. Kangasharju, 2018, pp. 1–2.
Managing data in computational edge clouds, in: Pro- [119] H. Zhang, A. Wang, D. Li, W. Xu, Deepvoice: A
ceedings of the Workshop on Mobile Edge Communica- voiceprint-based mobile health framework for parkin-
na
tions, MECOMM@SIGCOMM 2017, Los Angeles, CA, son’s disease identification, in: 2018 IEEE EMBS Inter-
USA, August 21, 2017, 2017, pp. 19–24. national Conference on Biomedical & Health Informat-
[106] J. Gu, W. Wang, A. Huang, H. Shan, Proactive stor- ics, BHI 2018, Las Vegas, NV, USA, March 4-7, 2018,
age at caching-enable base stations in cellular networks, 2018, pp. 214–217.
ur
in: 24th IEEE Annual International Symposium on [120] A. Stisen, H. Blunck, S. Bhattacharya, T. S. Prentow,
Personal, Indoor, and Mobile Radio Communications, M. B. Kjærgaard, A. K. Dey, T. Sonne, M. M. Jensen,
PIMRC 2013, London, United Kingdom,September 8- Smart devices are different: Assessing and mitigating-
Jo
11, 2013, 2013, pp. 1543–1547. mobile sensing heterogeneities for activity recognition,
[107] F. Wang, F. Wang, J. Liu, R. Shea, L. Sun, Intelligent in: Proceedings of the 13th ACM Conference on Embed-
video caching at network edge: A multi-agent deep re- ded Networked Sensor Systems, SenSys 2015, Seoul,
inforcement learning approach, in: IEEE INFOCOM South Korea, November 1-4, 2015, 2015, pp. 127–140.
2020 - IEEE Conference on Computer Communications, [121] S. Yao, Y. Zhao, H. Shao, D. Liu, S. Liu, Y. Hao, A. Piao,
2020. S. Hu, S. Lu, T. F. Abdelzaher, Sadeepsense: Self-
[108] A. Raman, N. Sastry, A. Sathiaseelan, J. Chandaria, attention deep learning framework for heterogeneous on-
A. Secker, Wi-stitch: Content delivery in converged edge device sensors in internet of things applications, in: 2019
networks, in: Proceedings of the Workshop on Mobile IEEE Conference on Computer Communications, INFO-
Edge Communications, MECOMM@SIGCOMM 2017, COM 2019, Paris, France, April 29 - May 2, 2019, 2019,
Los Angeles, CA, USA, August 21, 2017, 2017, pp. 13– pp. 1243–1251.
18. [122] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta,
[109] A. Raman, N. Sastry, N. Mokari, M. Salehi, T. Faisal, P. Popovski, Five disruptive technology directions for 5g,
A. Secker, J. Chandaria, Care to share?: An empirical IEEE Communications Magazine 52 (2) (2014) 74–80.
analysis of capacity enhancement by sharing at the edge, [123] M. Cheung, J. She, L. Liu, Deep learning-based on-
in: Proceedings of the 2018 on Technologies for the line counterfeit-seller detection, in: IEEE INFOCOM
Wireless Edge Workshop, EdgeTech@MobiCom 2018, 2018 - IEEE Conference on Computer Communications
New Delhi, India, November 2, 2018, 2018, pp. 27–31. Workshops, INFOCOM Workshops 2018, Honolulu, HI,
[110] A. Mete, S. Moharir, Caching policies for d2d-assisted USA, April 15-19, 2018, 2018, pp. 51–56.
content delivery systems, in: Proceedings of the 2018 [124] W. Chang, Y. Yu, J. Chen, Z. Zhang, S. Ko, T. Yang,
on Technologies for the Wireless Edge Workshop, ACM, C. Hsu, L. Chen, M. Chen, A deep learning based wear-
2018, pp. 3–7. able medicines recognition system for visually impaired
[111] X. Zhang, Q. Zhu, D2d offloading for statistical qos people, in: IEEE International Conference on Artifi-
provisionings over 5g multimedia mobile wireless net- cial Intelligence Circuits and Systems, AICAS 2019,
works, in: IEEE INFOCOM 2019-IEEE Conference on Hsinchu, Taiwan, March 18-20, 2019, 2019, pp. 207–
30 Wang Yingchun, et al.
208.
[125] C. Hodges, S. An, H. Rahmani, M. Bennamoun, Deep
learning for driverless vehicles, in: Handbook of Deep
Learning Applications, 2019, pp. 83–99.
of
ro
-p
re
lP
na
ur
Jo
Conflict of interest
The authors declare that they have no conflicts of interest with respect to this work.
We further declare that we have no commercial or associative interests that represent a conflict of
interest in connection with the submitted work.