
Computer Networks 207 (2022) 108820

Contents lists available at ScienceDirect

Computer Networks
journal homepage: www.elsevier.com/locate/comnet

A survey: Distributed Machine Learning for 5G and beyond


Omar Nassef a,∗, Wenting Sun b, Hakimeh Purmehdi b, Mallik Tatipamula b, Toktam Mahmoodi a
a Centre of Telecommunication Research, Kings College London, United Kingdom
b Ericsson

ARTICLE INFO

Keywords:
Machine Learning
Distributed machine learning
Distributed inference
Latency
5G networks

ABSTRACT

5G is the fifth generation of cellular networks. It enables billions of connected devices to gather and share information in real time; a key facilitator in Industrial Internet of Things (IoT) applications. It has more capabilities in terms of bandwidth, latency/delay, processing power and flexibility to utilize either edge or cloud resources. Furthermore, 6G is expected to be equipped with the new capability to converge ubiquitous communication, computation, sensing and controlling for a variety of sectors, which heightens the complexity of an already heterogeneous environment. This increased complexity, combined with energy efficiency and Service Level Agreement (SLA) requirements, makes the application of Machine Learning (ML) and distributed ML necessary. A decentralized approach stemming from distributed learning is a very attractive option compared with a centralized architecture for model learning and inference. Distributed ML exploits recent Artificial Intelligence (AI) technology advancements to allow collaborative ML, whilst safeguarding private data and minimizing both communication and computation overhead, along with addressing ultra-low latency requirements. In this paper, we review a number of distributed ML architectures and designs that focus on optimizing communication, computation and resource distribution. Privacy, information security and compute frameworks are also analyzed and compared with respect to different distributed ML approaches. We summarize the major contributions and trends in this area and highlight the potential of distributed ML to help researchers and practitioners make informed decisions on selecting the right ML approach for 5G and Beyond related AI applications. To enable distributed ML for 5G and Beyond, communication, security and computing platforms often counterbalance each other; thus, consideration and optimization of these aspects at an overall system level is crucial to realize the full potential of AI for 5G and Beyond. These different aspects do not only pertain to 5G, but will also enable careful design of distributed machine learning architectures to circumvent the same hurdles that will inevitably burden network generations beyond 5G. This is the first survey paper that brings together all these aspects for distributed ML.

1. Introduction

5G is a main driving force for improved communication performance. It is a prime catalyst for more workloads being executed on the edge, coupled with more low-latency data-driven use cases and applications [1]. The adoption of 5G, applied to multiple aggregated frequency bands, helps improve communication to and from remote devices by orders of magnitude. Lowering operational costs and ensuring returns on network investments are key priorities that service providers are looking to achieve using AI. The added complexity of multi-layer, multi-band aggregated 5G networks drives the demand for ML-enabled automation to maintain or reduce operational costs for service providers.

To achieve the desired 5G network performance while meeting energy efficiency, security and latency requirements, multiple ML approaches can be considered, which are explored extensively in the literature. We will start by introducing 5G, followed by presenting the opportunities of applying distributed ML for 5G. We will then review how the popular distributed ML topologies, such as Federated Learning (FL) and Split Learning (SL), are being designed and utilized to solve specific problems related to 5G. These distributed ML approaches adopt and build upon computation and communication optimization, resource distribution enhancement, security and privacy preservation techniques, and compute frameworks to create a harmonious environment, where a breadth of diverse data from different entities can be exploited to potentially support 5G applications.

∗ Corresponding author.
E-mail addresses: [email protected] (O. Nassef), [email protected] (W. Sun), [email protected] (H. Purmehdi),
[email protected] (M. Tatipamula), [email protected] (T. Mahmoodi).

https://doi.org/10.1016/j.comnet.2022.108820
Received 7 July 2021; Received in revised form 25 January 2022; Accepted 27 January 2022
Available online 12 February 2022
1389-1286/© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Fig. 1. Diagrammatic view of the organization of this survey.

The contribution of this paper includes:

• Thorough examination of optimization approaches for communication and computation, emphasizing the methods that reduce overhead whilst maintaining model accuracy.
• Discussion of resource distribution enhancement in the context of mobile networks, where the prominent algorithms are evaluated and analyzed.
• We delve into privacy and security in a distributed ML setting to ensure its viability as a privacy-preserving approach for 5G.
• Finally, we review different ML frameworks with focus on their impact on communication and computation to support efficient ML implementations for 5G.

The paper is structured as follows: a brief overview of related work is provided in Section 2. Communication and computation optimization is detailed in Section 3, whereas resource allocation is analyzed in Section 4. The privacy and security elements that safeguard participating entities are explored in Section 5. The various efforts undertaken at the ML framework level to create inter-operable and efficient ML solutions are discussed in Section 6. Finally, in Sections 7 and 8, we conclude by summarizing the future work and trends. The paper structure is visually presented in Fig. 1.

2. Background

5G is the fifth generation technology standard for broadband cellular networks. It runs on the same radio frequencies that are currently being used for smartphones, WiFi networks and satellite communications. 5G also introduces new frequency bands, known as mid-band and high-band frequencies, to increase coverage distance [2]. The main advantage of 5G networks is their ability to offer greater bandwidth and higher speeds. The increased bandwidth makes networks capable of not just serving cellphones, but also of being utilized as a general internet service provider for a diverse range of applications, such as IoT and more advanced machine-to-machine communication applications.

Ultra-reliable Low-latency Communication (URLLC), Enhanced Mobile Broadband (eMBB) and Massive Machine-Type Communications (mMTC) are key pillars that 5G offers to help realize use cases that require high speed, large-scale deployment of low-powered IoT devices, or 99.99% reliable low-latency communication [3]. As the name suggests, URLLC facilitates use cases that require ultra-reliable communication with very low latency. URLLC offers an end-to-end latency in the order of milliseconds between the User Equipment (UE) and Base Station (BS), which makes it especially attractive for use cases that demand real-time transmission, such as intelligent transport systems [4]. On the other hand, the aim of mMTC is to support a high density of devices in a specific location, catering to the lower data rates that IoT devices usually transmit at, over a longer communication range. mMTC facilitates a longer battery life for UE devices, which is appealing for use cases where device installation is difficult. Finally, eMBB caters to the set of use cases where a higher data rate is essential, as well as managing exploding traffic requirements. Additionally, eMBB is able to support mobility use cases along with its capability of connecting to small and micro cells [5].

2.1. Why is 5G different?

Frequency bands limit the amount of information that radio waves can carry; if the limit is reached, the allocation may become Pareto-optimal, thus affecting other users (e.g. 4G). 5G grows the network capacity as a result of increased bandwidth, which allows catering to a larger number of user devices and enables faster data transmission speeds. Not only does 5G provide more capacity for existing tasks, but it also opens up opportunities for new innovative use cases, such as securely streaming high-quality video from a connected ambulance to a hospital. It enables a range of new types of smart devices and industry digitization applications. To scale these applications, AI solutions are expected to help manage the coordination of devices, radio and compute resources [6].

Although 4G brought great changes, allowing streaming of video and music on the go, it operated under a one-size-fits-all policy for connectivity. 5G, on the other hand, is designed to connect devices that extend beyond smartphones, by offering a variety of connections that cater to specific application and device needs. For example, it can provide a connection that is energy conservative, suitable for devices such as smart watches, yet is also capable of providing an extremely stable and fast connection for devices such as industrial robots [6]. Extending this, network slicing can be enabled for 5G; slices of the network can be tailored for a specific purpose and act as their own independent networks. Each slice can optimize the characteristics needed for a specific service without wasting resources unnecessarily.

Previous mobile network generations, like 4G, may have difficulty handling numerous devices in the same location. 5G solves this problem by intelligently transmitting to each device with high precision, handling as many as 1 million devices per square kilometer. This precision reduces the noise in 5G, so that it is easier to connect many devices.


While 4G made cloud services usable on mobile phones, 5G technology takes this to a new level. Due to its immense processing power, a 5G network extends beyond a traditional network; it can act as a distributed data center that performs processing tasks, either using the full power of centralized resources or the responsiveness of edge computing. As a result, the processing of intense tasks, such as AR filters or games, could be handled by the network instead of a phone, improving performance and conserving energy. This makes new types of battery-powered devices viable (e.g. light-weight AR glasses), and paves the way for performance-reliant applications such as coordinated fleets of connected delivery drones.

Furthermore, 5G introduces a number of differentiating technologies which set it further apart from previous generations, especially in terms of MEC. For instance, Software-Defined Networking (SDN) simplifies network management and the deployment of new services by splitting the control and the data plane. More specifically, the control plane handles the policies on the cloud, whilst the data plane assesses whether to forward traffic based on the control plane policies [7]. Moreover, Network Function Virtualisation (NFV) allows the execution of network functions on virtualisation software located on servers, making them flexible, automated and scalable. This prevents the network functions from all happening on the cloud [8]. On the other hand, massive Multiple-Input Multiple-Output (MIMO) allows for an increase in the signal-to-noise ratio without the trade-off of additional transmission power, resulting in increased network capacity, thanks to the simultaneous offload of tasks to an edge server from a UE [9]. Another important enabler of MEC as a result of 5G enhancement is 5G NR, the next generation wireless access technology [10]. It facilitates connectivity from a range of different devices and networks, resulting in lower latency and better scalability. 5G also enables device-to-device communication, utilizing ad-hoc links to communicate directly with other UEs in close proximity, without needing the signal to traverse to the base station first, which helps to reduce traffic congestion and improve overall network throughput [11].

These new capabilities of 5G require coordination and optimization of compute and data transmission distribution across the network. Distributed AI presents great potential to leverage the flexibility of utilizing both centralized and distributed resources in a 5G network. It also preserves data privacy, optimizes network performance and addresses the additional complexity brought by a variety of use cases and applications.

2.2. Opportunities and challenges with 5G

The global 5G standardization ensures that devices and networks will work together regardless of user location and device specification. 5G has many capabilities to enhance our connected lives; however, it also has its unique challenges.

The very nature of 5G to support heterogeneous devices and applications adds complexity to the network. The traditional way of managing networks will no longer be sufficient to support these complicated tasks, such as coordinating machine-to-machine communication with low latency and high reliability. ML, especially distributed ML, can be used to address these complex challenges.

5G has immense processing power and can act as more than just a network. The full range of processing tasks that can be carried out on 5G networks also poses energy efficiency challenges. ML and distributed ML can be used to help optimize network resource allocations, as well as network performance, to achieve energy efficiency targets.

5G provides additional flexibility to allocate tasks on the edge or cloud across the entire distributed network. Leveraging this flexibility while fulfilling the SLA requirements (including latency, performance, connectivity, availability, privacy and security), without incurring additional operational cost, could be challenging. This is another potential area for ML and distributed ML to play a role.

2.3. Mobile edge computing

MEC is an essential service for the implementation of 5G networks and IoT. It is considered the best method of delivering computation and communication resources to mobile devices [12]. The basic idea of MEC is to run applications and processing tasks closer to the cellular customer, at the edge of the network. MEC technology is designed to enable flexible and rapid deployment of applications, while providing a distributed computing environment for application and service hosting. The ability to store and process content close to cellular subscribers for faster response times is especially relevant for real-time applications that demand minimal latency.

Furthermore, applications can also be exposed to real-time radio access network information, which creates more potential for future 5G applications. Edge computing combined with 5G creates tremendous opportunities in every industry. It brings computation and data storage closer to where data is generated, enabling better data control, reduced costs, faster insights and actions, and continuous operations. In fact, by 2025, it is estimated that 75% of enterprise data will be processed at the edge, compared to only 10% today [13].

However, the convergence of MEC and 5G also presents a number of challenges. One is to handle the delay-sensitive data generated by the UEs properly, so that the real-time nature of the use cases is not overlooked. To prevent significant delays from directly accessing the cloud, edge servers can be used as an intermediary for efficient transmission. Another challenge is to provide accurate prediction of network demand so that sufficient resources are provisioned to handle it. More specifically, the split between cloud offloading and local processing on the edge cloud is difficult to predict accurately, more so in real time. This challenge is carried further when resource management is taken into consideration, since the edge nodes do not enjoy the full computing capabilities that the cloud does. Optimally allocating resources based on the application type and requirements, which vary constantly, is complicated. Additionally, to improve the Quality of Experience (QoE) and the Quality of Service (QoS), service providers must maintain a holistic view of the users, the demand for the different use cases, and the requirements in different geographical locations in order to cater to the different constraints. A highly recurring challenge expressed throughout many works covered in this paper is device and communication heterogeneity, which makes it difficult to devise a one-size-fits-all solution. Hence, complex distributed systems are needed that operate collaboratively in order to cater to the heterogeneous entities. Although communication between the UE and the cloud across the network has been reduced, privacy and security remain a vital challenge due to the dynamic network requirements and the exploding number of heterogeneous devices in a network. This places emphasis on a scalable and comprehensive security management system, which may introduce its own overhead.

The challenges listed above may pose difficulty to the adoption of MEC in 5G and may also be present beyond 5G. However, they offer a foundation to start catering for the different requirements and constraints arising from various use cases, in order to design systems that could perform more optimally [14].

Because of the close connection between 5G and MEC, when it comes to distributed ML on 5G and Beyond, it is critical to analyze how compute and communication should be distributed and optimized among multiple cloud servers and the mobile edge devices. We cover this topic extensively in Section 3, while device heterogeneity and resource management are discussed in Section 4, privacy and security in Section 5, and scalable architecture planning in Section 6. The topics covered in this paper do not only pertain to 5G, but also pave a way to address these fundamental challenges that will be present beyond 5G.


2.4. 6G and applications

In the years to come, even 5G may not be enough for the explosive mobile traffic growth, the expected high QoE requirements, and disruptive use cases that demand impeccable performance [15]. 6G introduces new possibilities beyond the established 5G standards in terms of peak data rate, user-experienced data rate, latency, mobility, connection density, energy efficiency, peak spectral efficiency and area traffic capacity [16]. The reliability of 6G is expected to improve by two orders of magnitude, reaching 99.99999%. The 6G signal bandwidth, which can support either single or multiple Radio Frequency (RF) carriers, should reach 1 GHz or higher, approaching THz communications [16]. The positioning accuracy of 6G, which is vital for many industrial applications especially in indoor environments, is anticipated to reach centimeter level, as opposed to the meter level obtained by 5G.

There are many potential enabling technologies which can work in conjunction to realize 6G, from the physical spectrum and networking conceptualizations, to the integration of multiple paradigms into network communication [16].

As part of the physical spectrum, Millimeter Wave Frequencies (mmWave) are a vital component of future 6G networks, broadening the available bandwidth of new carrier frequencies and improving the portability and integration of antennas by allowing for increased antenna array dimensions. However, mmWave will carry its drawbacks of high propagation loss and line-of-sight dependency into 6G.

Going beyond mmWave and visible light, THz and optical frequencies are important factors in wireless technologies, with the ability to provide extremely high bandwidth and the benefit of being insensitive to atmospheric effects. However, many technical challenges will be solely dependent on the hardware implementations. To manage the physical spectrum, dynamic spectrum management is required to improve radio resource utilization in 6G.

Network conceptualizations will be a key enabler of 6G, consolidating network infrastructure to become more flexible and intelligent. Software and virtualisation are at the forefront. NFV provides the flexibility for various scenarios and requirements. It will continue to be as essential to 6G as it is to the 5G Core Network, though applying NFV effectively is a complex problem. Integrated with NFV, SDN further improves the flexibility of network management and enables service modularity, playing an important role in the management, orchestration and architecture of communication technologies for 5G and beyond. However, there remain many open research questions on exploiting SDN effectively to bring optimal performance with respect to QoS and Key Points of Interest (KPI) [17].

6G is anticipated to merge a variety of different paradigms into mobile networks. Such paradigms include block-chain, intelligent edge computing, communication-computing-control convergence and AI [15]. As mobile networks become more heterogeneous and their complexity increases, the difficulty of optimizing performance magnifies with the increased dimensionality. Block-chain could be applied to enhance the implementation of important services due to its nature of immutability, decentralization and enhanced security [18]. Intelligent edge computing has been facilitating 5G adoption and is expected to carry over into future communication generations. By optimizing the performance of network services and allocating resources efficiently, it enables automation of orchestration and management tasks [19]. It is envisioned that 6G will have the ability to converge ubiquitous communication, computation, sensing and controlling for a variety of sectors such as the tactile internet [20]. The adoption of AI may alleviate the complexity of all of the above mentioned tasks.

It can be shown that even with early 6G concepts [21], the aspects highlighted in this paper which burden distributed machine learning in 5G (optimizing communication, computation and resource distribution, privacy and information security, and compute frameworks) are highly extendable to 6G, with the network generations being the medium that facilitates distributed machine learning with evolving scale and capabilities.

2.5. Connection between 5G and AI

5G networks integrate a wider range of connected nodes, such as smart devices and sensors, by catering to a huge number of different applications. To allow low-latency applications to function and preserve data privacy, there has been a paradigm shift in mobile computing and 5G, from centralized mobile cloud computing towards mobile edge computing (MEC). The need to draw meaningful insight from large amounts of decentralized data whilst preserving its privacy has given rise to distributed ML on the mobile edge. Extensive efforts have been made in recent years, in both academia and industry, to develop these technologies.

To adopt machine learning in 5G network and MEC applications, we have to (i) decompose the model itself, to train (or make inference with) its components individually; or (ii) scale or parallelize the training process to perform model updates at distributed locations associated with data containers. This makes distributed ML approaches critically important [22].

The main problems that MEC faces when applying AI in the context of 5G are: offloading decisions, resource allocation, server deployment and overhead [12].

The authors of [12] argue that the efficiency of MEC is burdened by high complexity and dimensionality. Approaches such as heuristic methods greatly simplify the scenario. Work in [23] focuses on the combination of edge computing and Deep Learning (DL), specifically segmenting the relationship into four categories: DL applications on edge, DL inference on edge, DL training on edge, and optimizing edge with DL. DL applications on edge explores the technical frameworks to provide intelligent services, for example real-time video analytics, intelligent manufacturing and smart homes/cities. DL inference on edge focuses on architecture requirements and optimization methods to reduce latency without hindering the accuracy of the models; the techniques discussed encompass early exit of inference, model selection, model optimization and computation sharing. As for DL training on the edge, the authors investigated the role of FL and its enhancements in dealing with a distributed learning environment, focusing mainly on FedAvg. The authors also explored the optimizations that DL can bring to the edge itself to improve its QoS, such as effective caching, edge task offloading, and edge management and maintenance.

To address the gaps observed in the literature, we fully explore distributed ML with respect to compute, communication, privacy and resource allocation. This article offers a complete overview of all the interconnecting elements needed to apply ML for 5G. It helps to ensure the success of adopting such a concept in both alleviating challenges of 5G, as well as addressing the requirements arising from a diverse range of distributed applications enabled by 5G.

2.6. Deep learning approaches on edge

Distributed ML on the edge is being explored extensively as an efficient training approach for deep learning models [24]. The main driving factor is the promise of an improved model that incorporates data from a variety of data sources, whilst security, communication and computation overhead are kept in check. There are two main distributed ML approaches: FL and SL.

FL allows collaborative training of a shared model while keeping the data on the remote device [25]. Though many variations of FL now exist, with tweaks and improvements, the general process of learning in an FL setting is common. Firstly, the server decides the model hyper-parameters and sends the global model to the clients. The clients then train the local model on their local dataset, and send the learnt parameters to the FL server. In the third step, the server aggregates the local model weights to create a global model. The process is repeated until the model converges [26].
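To make the aggregation step concrete, below is a minimal, self-contained sketch of one FedAvg-style training loop in the spirit of the process just described; the linear model, toy data and hyper-parameters are illustrative assumptions, not details taken from the surveyed works.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1, epochs=5):
    # Client-side step: train the received global model on private local data.
    w = w.copy()
    for _ in range(epochs):
        w -= lr * 2 * X.T @ (X @ w - y) / len(y)   # least-squares gradient
    return w

def fedavg_round(w_global, clients):
    # Server-side step: broadcast, collect the local models, and aggregate
    # them weighted by each client's sample count (FedAvg-style averaging).
    updates = [local_update(w_global, X, y) for X, y in clients]
    sizes = [len(y) for _, y in clients]
    return np.average(updates, axis=0, weights=sizes)

# Toy clients whose private datasets are drawn around the same true model.
w_true = np.array([2.0, -1.0])
clients = [(X, X @ w_true + 0.1 * rng.normal(size=len(X)))
           for X in (rng.normal(size=(n, 2)) for n in (20, 40, 80))]

w = np.zeros(2)
for _ in range(20):                 # repeat until the model converges
    w = fedavg_round(w, clients)
print(w)                            # approaches w_true; raw data never leaves a client
```

Only model weights cross the network in this loop, which is what gives FL its privacy-preserving character.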


Fig. 2. FL MEC architecture, where the end devices pass the model weights to the edge server. The edge server then delivers the weights to the central cloud for model aggregation, after which the new model is distributed back to the edge devices for a full FL cycle.

Fig. 2 presents the architecture of FL, whereby the global model aggregates each of the remote models' weights for learning. FL directly prevents the sharing of raw data.

Three common flavors of FL are present: Vertical FL, Horizontal FL and Federated Transfer Learning [27]. Horizontal FL is reserved for datasets that share the same feature space, but where the intersection of the samples is minimal. Consider a language model for a keyboard auto-correct feature: the users share the same feature space, but the data generated by each user is different. On the other hand, Vertical FL is applicable in use cases where applications share the same samples but in different feature spaces; for example, building a targeted advertisement and recommendation system from the user samples recorded by a single entity. While the feature set of each of the ML models is different, the data samples are shared. Finally, Federated Transfer Learning is utilized in instances where not only the feature space but also the samples are different. As such, Federated Transfer Learning can be used to provide solutions for the entire sample and feature space. This can be demonstrated by providing a general solution to autonomous vehicles and unmanned aerial vehicles, which share neither the same feature space nor samples.

A challenge in FL is the trade-off between communication overhead and computation cost [28,29]. These compromises do not only pertain to FL, but extend to distributed ML in general. However, the trade-off between communication and computation is more apparent in FL due to its intricacies of system design and statistical heterogeneity [30]. Communication and computation efficiency, and the trade-off between the two, are discussed further in this paper.

On the other hand, SL splits the computation between the remote device and the server, allowing cooperative model training without the need for devices to share data or model specifics with the server. The cut layer is the divide between the device and server computation. To train the model, the remote device performs forward-propagation up to the cut layer, sending the output to the server to complete the calculation on the rest of the layers. For back-propagation, the process is mirrored: the server starts the gradient calculation from the last layer down to the cut layer, passing the gradient output to the remote device to continue the propagation through the rest of the layers. This procedure is repeated until model training is complete [31]. SL is abstractly showcased in Fig. 3, and a minimal sketch of one training step is given below. With SL, the cut layer can be arbitrarily placed in the NN, tailoring the partition of the NN to the computational resources of the clients and the server. This offers greater flexibility in the end-to-end design of the distributed ML environment.

Fig. 3. Vanilla SL architecture depicting the computation split of an NN between client and server.

There are two main aspects to SL: topology and training. A vanilla topology considers clients training a partial NN up to the cut layer, sending the output to the server, and the server computing gradients back to the same layer, sending the gradients back to the client. There exist different configurations catering to different data anonymization requirements, such as U-shaped, vertically partitioned and Tor-like topologies [32].

Distributed ML exploits a server-client connection topology to train a statistical model and concurrently keep remote data private. Different approaches are applicable for different scenarios. Several attributes of distributed ML are vital for design: computation and communication overhead, security and privacy, topology, dataset distribution and, finally, the number of required clients. Specifically, SL is applicable when there are numerous clients in the environment, which brings a faster convergence and a more accurate model overall. On the other hand, FL is suitable when communication overhead is important, especially in an environment with a modest number of connected devices.
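The following is a minimal sketch of the vanilla SL training step described above, written with PyTorch; the layer sizes, cut-layer position, optimizers and toy data are illustrative assumptions.

```python
import torch
import torch.nn as nn

# An arbitrary cut of a small NN: the client holds the layers before the cut
# layer, the server holds the rest. Only cut-layer activations and gradients
# cross the network; raw data and layer weights stay where they are.
client_net = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
server_net = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 2))
opt_client = torch.optim.SGD(client_net.parameters(), lr=0.1)
opt_server = torch.optim.SGD(server_net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def split_training_step(x, y):
    # Client: forward-propagate up to the cut layer, send the output onward.
    cut_activations = client_net(x)
    sent = cut_activations.detach().requires_grad_()

    # Server: complete the forward pass, then back-propagate to the cut layer.
    loss = loss_fn(server_net(sent), y)
    opt_server.zero_grad()
    loss.backward()
    opt_server.step()

    # Client: receive the cut-layer gradient and continue back-propagation.
    opt_client.zero_grad()
    cut_activations.backward(sent.grad)
    opt_client.step()
    return loss.item()

x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))   # stand-in private batch
print(split_training_step(x, y))
```

Moving the cut layer deeper into the client network shifts compute onto the device and shrinks what the server learns about the inputs, which is exactly the tailoring knob described above.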
2.7. Related surveys

Although there are many good surveys in the literature, each providing specific value, this survey is unique in that it is the first to study recent research that has combined the distributed machine learning concept with the distributed nature of telecommunication networks. We also go into depth to distill the fundamental mechanisms that enable a distributed machine learning strategy to be successful in various applications. Readers can use this survey not only as a way to gain perspective and knowledge, but also to apply the basic principles behind these distributed machine learning techniques to design their own novel approaches.

Work in [22] bridged the gap between deep learning and mobile and wireless networking research, by presenting a comprehensive survey of the crossovers between the two areas. Several use cases of applying deep learning in 5G networks are brought up and their challenges identified, including MIMO, smart grid, Self-Organizing Networks, and millimeter wave beamforming. Limitations of deep learning approaches in mobile and wireless networks are analyzed, namely slow and unstable communication, heterogeneous devices, and privacy and security concerns. However, in [22] it is assumed that big data alone can fuel the performance and eliminate domain expertise by applying hierarchical feature extraction. As a result, [22] advocates that deep learning can be an addition applied on top of wireless networking technologies, leveraging the greater amount of available data. In this article, we argue that it is critical to combine domain knowledge rather than eliminate it. We not only illustrate the interconnection between domain and ML in resource allocation, communication and computation, and privacy, but also highlight the need to combine domain and design knowledge from wireless networks into ML system design and implementation, in order to derive effective optimization strategies to enable AI in 5G.

Work in [33] presents a conceptual model for 5G and beyond. It shows the use and role of ML techniques in each layer of the model. The authors also review some classical and contemporary ML techniques, such as supervised and unsupervised learning, Reinforcement Learning (RL), DL and FL, in the context of wireless communication systems. However, there are various approaches for distributed ML apart from FL which have not been touched on.

[34] discusses the inability of 5G wireless heterogeneous networks to fully reach their potential, due to the complexity of the various generated data and protocols which are supposed to cooperate within the network. The authors believe that the utilization of AI will facilitate easier network management to reduce the network challenges.


The various AI approaches and their efficacy on heterogeneous networks are reviewed. However, the challenges in fact extend beyond what is discussed by the authors, where memory and computation complexities remain a big hurdle in heterogeneous networks.

[35] dives into 5G wireless networks' adoption, benefits, and approaches. The survey describes 5G applications, architecture and services. The authors review common ML methods such as supervised, unsupervised and reinforcement learning. The enablers of certain ML applications, such as IoT, smart technology and the tactile internet, related to the adoption of 5G are expanded upon in the paper. However, the survey did not cover distributed ML as part of the AI methodologies that are both necessary and have great potential to benefit 5G and its applications.

[36] provides a comprehensive survey on coded distributed computing. It addresses specifically the communication costs that arise from distributed computing. The authors provide an excellent reference on the fundamentals of coded distributed computing (CDC), basic CDC schemes and several specific CDC approaches. CDC, as a newly evolved area, could provide a theoretical foundation to optimize communication cost for distributed machine learning, especially for network applications and 5G. Although CDC addresses communication cost, straggler effects, fault tolerance, privacy and security, the theory and functional blocks have not yet been explored beyond MapReduce. The survey does not cover resource allocation, computation frameworks or privacy and security aspects in detail; neither are the interconnections or balance among these factors mentioned.

[37] focuses on distributed machine learning architecture and topology, specifically model parallelization and data parallelization. The authors review machine learning algorithms according to their categories, defined based on different criteria. Model and data parallelization is then analyzed based on the defined categories of machine learning algorithms, where recommendations are provided. The authors also cover compute frameworks. However, this article does not address the unique needs and requirements of telecommunication networks. It misses the discussions on resource allocation and detailed analysis of privacy and security trade-offs, which are extremely important for telecommunication applications due to the constraints on resources and stringent Service Level Agreement (SLA) requirements.

A reference table is compiled below to help readers locate the right reference material and cross-examine the surveys, which the authors evaluated with distinctly different focuses and perspectives in mind (see Table 1).

Table 1
Summary of survey papers for distributed machine learning.

Deep learning in mobile and wireless networking: A survey [22]
∙ State-of-the-art deep learning approaches
∙ Applications and limitations of deep learning approaches in mobile and wireless networking
∙ Pipelines and systems for deep learning applications in mobile and wireless networks

Machine Learning Techniques for 5G and beyond [33]
∙ Review of classical and contemporary ML techniques, including supervised and unsupervised learning, reinforcement learning, deep learning and federated learning, in the context of wireless communication systems
∙ Future vision of 6G networks and the role of ML at the application and infrastructure level

Survey on machine learning in 5G [34]
∙ Introduction of basic machine learning concepts and algorithms, such as supervised learning, unsupervised learning and reinforcement learning
∙ Introduction of some use cases of applying machine learning to 5G

A survey of 5G network systems: challenges and machine learning approaches [35]
∙ Brief introduction to 5G and machine learning approaches
∙ Several machine learning (ML) techniques and ML-inspired approaches to handle 5G network challenges

A comprehensive survey on coded distributed computing: Fundamentals, challenges, and networking applications [36]
∙ Fundamentals of coded distributed computing (CDC) and basic CDC schemes
∙ CDC approaches and applications to reduce communication cost

A survey on distributed machine learning [37]
∙ Challenges and opportunities of distributed machine learning
∙ Overview of the distributed systems available

2.8. Our scope

Although there have been excellent surveys that discuss the adoption of AI and ML in 5G and Beyond, there remains a gap in the literature on the adoption of distributed machine learning in the mobile network scenario, with the various factors that contribute to successful training and inference (with their workloads distributed in different parts of the network). The majority of the existing works mentioned in Section 2 cover a broad range of ML methods and their applications in mobile networks. However, they do not delve into distributed ML at the system and architecture level as a viable and necessary facilitator to apply these ML techniques in mobile networks. Gathering the previous works, this survey paper adopts systematic analysis and provides insights to tackle the challenges of using machine learning approaches in a distributed environment such as mobile networks. Specifically, we focus on computation and communication optimization, end-to-end data and workflow design, resource distribution between network and device entities, the privacy and security of such entities and, finally, the different compute frameworks that can allow the adoption of distributed learning with their inherent resource management designs. The main goal of this work is to fill the gap in the literature and provide the reader with an overview and comparative analysis. It can be used to design and balance specific requirements (such as data pipeline and infrastructure, mobile network resource optimization, QoS, environmental and energy consumption, ML prediction accuracy, latency, etc.) for the use cases of interest.

3. Communication and computation optimization

Modern machine learning is shifting from centralized to distributed architectures. State-of-the-art models are now typically trained using multiple CPUs or GPUs, and data is increasingly being collected and processed in networks of resource-constrained devices, e.g., IoT devices, smart phones, or wireless sensors [38]. In a 5G context, the heterogeneous devices connected within a network will put additional requirements on machine learning to be carried out in a distributed manner, while maintaining training and inference performance. Earlier works focused on addressing the computation and communication bottlenecks between CPUs and GPUs. In the 5G world, which adopts a broader range of devices, the architectures and the communication among them need to be considered to reduce the associated overheads.

In this section, we address the communication and computation issues that are crucial for applying distributed machine learning approaches. The communication and computation optimization approaches summarized in this section could be applicable for training or inference, and sometimes both (see Table 2).

In the next section, we will put these issues into the context of mobile networks, where radio resource utilization will require optimization to support both network functionality operations and distributed machine learning applications.

Distributed ML is often bound by the communication overhead, as training and inference algorithms depend on frequent transmissions of gradients and intermediate tensors between nodes. Using 4G as an example of a communication medium, the transmission speed is roughly one gradient per second [38]. In the absence of communication optimization, the delay could impair latency significantly, which heightens the necessity and importance of optimizing communication for any distributed machine learning task. When multiple distributed resource-constrained devices are present in the network, compute needs to be coordinated to prevent excessive training and inference times.
interest. to be coordinated to prevent excessive training and inference times.


Table 2
Summary of techniques to improve training and inference efficiency.

Approach: Reduce the size of the model by reducing the number of parameters or weights needed (adaptive models, Section 3.1)
∙ Suitability for training: Applicable. It directly reduces the resources needed to train.
∙ Suitability for inference: Not directly applicable; however, it helps to reduce the resources needed to compute an inference result and speeds up inference due to the resulting smaller model.

Approach: Reduce data and gradient dimension or precision (compression, Section 3.2)
∙ Suitability for training: Applicable, in the sense that it reduces the payload for each communication round between edge and host platforms during training. However, it may increase the training iterations needed to achieve reasonable accuracy.
∙ Suitability for inference: Applicable. It reduces inference data size and workload.

Approach: Optimize query and message passing (graph models, Section 3.3)
∙ Suitability for training: Not applicable, except for very special cases (e.g. a feed-forward DNN which matches specifically to certain application and network run times [39]).
∙ Suitability for inference: Applicable. It reduces inference workload by finding the optimal computational scheduling and message passing.

Parameters, especially NN gradients, need to be communicated across distributed devices frequently, with iterative distributions and aggregations; delays stemming from resource-insufficient devices could result in global inference and training latency. The problem becomes more severe with 5G, where there is an increasing complexity of managing more devices with lower latency requirements. The balance between computation, communication and accuracy becomes critical to achieve both prediction and training performance as well as efficiency objectives [40].

In MEC scenarios, with many and varied users, servers and applications, problems such as where to offload tasks generated by devices, how many resources to allocate to each user, and how to handle inter-server communication become complicated. These problems are characterized by parameters with exceedingly high levels of dimensionality, resulting in too much data in need of processing, thus complicating the task of finding efficient configurations for both computation and communication optimizations. The importance of combining both the communication and computation planes in the service model is highlighted in [12], advocating the necessity to trace the effect of modifying these parameters on the performance of MEC, in order to optimize compute and communications effectively.

The authors of [41] discuss the modeling of communication and computation tasks, more specifically computation on the mobile device and computation on MEC servers. This leads to the rationale behind incorporating advanced wireless communication techniques, such as interference cancellation and adaptive power control, to better utilize the wireless channel conditions of MEC for energy consumption. For computation latency, on the other hand, the authors suggest load-balancing and intelligent scheduling policies. Concepts including computation offloading, radio and computation resource allocation, MEC scheduling and multi-server cooperation are highlighted. [41] proposes offloading latency-insensitive and computation-intensive tasks to the central cloud, leaving the latency- and computation-sensitive tasks for the edge server, to cater for multiple computation tasks on heterogeneous servers. In addition, prioritizing offloading tasks for users that have stringent latency requirements and heavy computational loads allows the computational latency of the MEC to decrease.

In [43], the integration of different MEC architectures and standardization activities into mobile networks is discussed. One challenge for any user device is to decide when it should offload its computations onto the MEC, and how much of its computations should be offloaded. This decision is crucial in both energy saving and computation efficiency. The decision is influenced by many factors, such as the security and privacy of the data for the user, the user's preference on where to perform the computation, the quality of the radio connection, and the availability and capability of the MEC and cloud to perform the process. Considering these factors, there are three strategies for offloading to MEC: local execution (which takes place on the user's device), and full and partial offloading to the MEC. Fig. 4 gives a good example of these strategies. The goal of these algorithms is to balance two important metrics: energy saving on the user's device, and satisfying the delay constraints imposed by the network; a simple sketch of this trade-off follows below. Various mechanisms under these three categories are explained and compared in [43]. The offloaded application can be processed in either a single node or multiple nodes, depending on the possibility of it being partitioned. If partitioning the application is possible, distributed computation methods would be the primary solution. Similarly, in [43], MEC resource allocation for a variety of single-node and multi-node processing schemes is compared based on delay minimization, QoS for user satisfaction, and energy consumption of the computing nodes. It is important to consider a balance between the communication and computation resources to preserve a level of satisfaction for these metrics.

Fig. 4. Illustration of edge and host partition scenarios [42].
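Below is a minimal sketch of the offloading decision just described, choosing between local execution and full offloading so as to minimize device energy under a delay constraint; the linear cost models and all constants are illustrative assumptions (partial offloading would interpolate between the two options).

```python
def choose_offloading(task_cycles, input_bits, f_local, f_mec,
                      uplink_bps, energy_per_cycle, tx_power, deadline):
    # Local execution: the device spends CPU energy but no radio energy.
    local_delay = task_cycles / f_local
    local_energy = task_cycles * energy_per_cycle

    # Full offloading: the device spends radio energy; the MEC server computes.
    tx_delay = input_bits / uplink_bps
    offload_delay = tx_delay + task_cycles / f_mec
    offload_energy = tx_power * tx_delay

    # Keep only strategies that satisfy the delay constraint, then pick the
    # one that minimizes energy consumption on the user's device.
    feasible = {}
    if local_delay <= deadline:
        feasible["local execution"] = local_energy
    if offload_delay <= deadline:
        feasible["full offloading"] = offload_energy
    if not feasible:
        return "infeasible within the delay constraint"
    return min(feasible, key=feasible.get)

# Example: a 0.5-Gcycle task with a 2-Mbit input on a 1-GHz device,
# a 10-GHz MEC server and a 50-Mbps uplink, under a 0.6 s deadline.
print(choose_offloading(5e8, 2e6, 1e9, 1e10, 5e7, 1e-9, 0.5, 0.6))
```

In practice the channel quality, server load and privacy factors listed above would enter these cost terms, which is what makes the real decision a high-dimensional optimization problem.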
In mobile and edge scenarios, to achieve real-time prediction at the user's end, the edge often serves as a parameter server to both execute and train the ML model [44] (Fig. 5). The edge server performs gradient synchronization by collecting all gradients and averaging them to update the parameters. The updated parameters are then sent back to each edge node for the next training step, which is known as parameter synchronization. To achieve efficiency, in [45] a design is implemented that allows the communication between nodes to be asynchronous, using clients to process data while servers synchronize parameters and perform global updates, so that the synchronization does not block computation; a minimal sketch of this loop is given below.

Fig. 5. Example of edge training [44].

Due to the need for parameter synchronization, it is important to ensure that the user's connectivity is guaranteed. Communication is crucial, as the computed parameters will need to be synchronized between edge and cloud. This could be a major bottleneck for distributed ML tasks. Several options to deal with the mobility of users while connected to the MEC are explored in [43], which include: (a) properly setting the transmission power for low-mobility users to keep them connected to the same serving edge; (b) for handover of mobile users, creating a virtual machine for their computation which is transferred to the next edge node(s) to keep the connection and satisfy QoS; (c) finding a new path to deliver the computed data from the hosting edge node to the user in its new serving cell. The mentioned approaches require sophisticated resource allocation and network management optimization algorithms to handle the requirements in real time all over the network.
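The following is a minimal sketch of the gradient/parameter synchronization loop described above, with a simulated edge parameter server and workers; the quadratic objective, thread counts and learning rate are illustrative assumptions.

```python
import threading
import queue
import numpy as np

params = np.zeros(4)                   # model held at the edge parameter server
params_lock = threading.Lock()
grad_queue = queue.Queue()

def edge_server(rounds, num_workers, lr=0.1):
    # Gradient synchronization: collect one gradient per worker, average them,
    # then update the shared parameters; parameter synchronization happens
    # when workers read the parameters back.
    global params
    for _ in range(rounds):
        grads = [grad_queue.get() for _ in range(num_workers)]
        with params_lock:
            params -= lr * np.mean(grads, axis=0)

def worker(seed, rounds):
    rng = np.random.default_rng(seed)
    for _ in range(rounds):
        with params_lock:
            w = params.copy()          # pull the (possibly stale) parameters
        grad = 2 * (w - 1.0) + 0.01 * rng.normal(size=4)  # noisy local gradient
        grad_queue.put(grad)           # push; workers never block on each other

rounds, workers = 100, 3
threads = [threading.Thread(target=worker, args=(i, rounds)) for i in range(workers)]
threads.append(threading.Thread(target=edge_server, args=(rounds, workers)))
for t in threads:
    t.start()
for t in threads:
    t.join()
print(params)                          # converges towards the optimum (all ones)
```

Because workers compute on whatever parameters they last pulled, the synchronization never blocks computation, which is the asynchrony benefit attributed to [45] above.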


Table 3
Summary of adaptive models.

Approach: Adaptive models

Key idea:
∙ Reduce the number of Deep Neural Network (DNN) parameters without affecting accuracy.
∙ Optimize communication and computation load by eliminating unnecessary weights.

Proposed solutions:
∙ Smart splitting and grouping [47,48].
∙ Pruning and selection of nodes or branches [49,50].
∙ Communication-aware compression, sparsification and quantization.
∙ Federated Distillation (FD) and Federated Augmentation (FAug) [51].
∙ Federated Weighted Inter-client Transfer (FedWeIT) [52].

Advantages:
∙ Reduces the number of model parameters and enables model parallelization [50].
∙ Minimizes communicated bits and payload size, particularly when the model size is large [51].
∙ Optimizes the client-to-server communication cost while addressing inter-client interference and inter-client knowledge transfer challenges [52].

Disadvantages:
∙ The best partition point for a DNN architecture depends on its topology [48].
∙ Needs to be tuned and customized for a specific task, process or data.
∙ Communication and computation cost depends on the compression level for the specific technology and use-case.

The authors of [46] explored various optimization methods to accelerate the convergence of the algorithms under efficient edge communication requirements, and the corresponding applications where these methods are beneficial. Four classes of algorithms are investigated: zero-order, first-order and second-order optimization methods, as well as federated optimization. As discussed earlier, the advantages of distributed ML go beyond the optimization of computation, storage and communication resources. Distributed ML can also address concerns on privacy, security, reliability and latency. More specifically, for ultra-low latency wireless networks, privacy, security, reliability and latency ought to be considered at a vast scale. Thus, AI systems should be restructured and redesigned, with approaches such as model or data partition, to help implement communication-efficient algorithms [46].

In the remaining part of this section, we will illustrate the details of three main approaches.

3.1. Adaptive models

Communication can be adapted to minimize unnecessary bandwidth requirements while maximizing the efficiency of each communicated bit. At the system level, this can be achieved by using event triggering for sparsity, inherently streamlining communication [53] (see Table 3).

To improve transmission efficiency, communication-aware adaptive tuning can be carried out to communicate gradient sparsification (minimization of the coding length of stochastic gradients) efficiently [38]. The adaptive tuning relies on a data-dependent measure of the objective function to continuously improve, and adapts the compression level to maximize the gradient descent per communicated bit. To minimize inter-device communication overhead, Federated Distillation (FD) [54] and Federated Augmentation (FAug) [51] are used to achieve around 26x less communication overhead while maintaining 95-98% test accuracy compared to FL. In this implementation, a distributed model training algorithm with a much smaller communication payload is used, where the payload size depends not on the model size but on the output dimension. FAug helps to counter the effect that user-generated data samples are likely to become non-Independently and Identically Distributed (non-i.i.d.) across devices: by using a generative model, it augments the local data towards yielding an i.i.d. dataset. The adaptation can also happen at the NN architecture level [52], where network weights are additively decomposed into global shared parameters and sparse task-specific parameters. This decomposition minimizes interference between incompatible tasks, and also allows inter-client knowledge transfer by communicating the sparse task-specific parameters.

Gradient coding and grouping is used in [47]. Leveraging storage and computation redundancy, it uses adaptive selection to reduce the communication and computation loads. The combination of stochastic gradient coding, grouping and adaptive selection provides robustness and reduced communication overhead, with only a limited increase in computation cost for both high and low tail distributions of the computing time.

From an inference perspective, work in [48,55] attempted to tackle communication optimization. By segmenting the actual DNN, a portion of the computation is performed on the device whilst the rest is offloaded. Logically, leaving the computation-heavy layers to the cloud would result in better performance. However, this approach depends on the correct profiling of the layer performance, which is extremely difficult due to the obscurity of NNs and the unknown optimizations occurring on proprietary hardware.

Alternatively, authors in [56,57] introduced early exit points for the DNN; a small sketch follows below. At each exit point, the entropy of a classification result (e.g., by softmax) is used as a measure of confidence in the prediction. If the entropy of a test sample is below a learned threshold value, meaning that the classifier is confident in the prediction, the sample exits the network with the prediction result at this exit point, and is not processed by the higher network layers. These approaches create a new DNN which utilizes the weights of the original model, and is then trained to reduce accuracy error at the branches. Aside from the benefit of reducing unnecessary computation, if the model is pre-trained then transferring the weights is trivial. If fewer training iterations are performed, the exchange of model weights across distributed devices is reduced, thus reducing the communication bandwidth needed.
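Below is a minimal sketch of the entropy-thresholded early exit just described; the random linear stages, exit heads and thresholds are illustrative stand-ins for trained branches.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    # Shannon entropy of the classifier output: low entropy = high confidence.
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def early_exit_inference(x, segments, exit_heads, thresholds):
    # Run the network segment by segment; a sample leaves at the first exit
    # whose entropy falls below the threshold, skipping all higher layers.
    h = x
    for i, (segment, head, tau) in enumerate(zip(segments, exit_heads, thresholds)):
        h = segment(h)
        probs = softmax(head(h))
        if entropy(probs) < tau:
            return int(probs.argmax()), i            # early exit taken
    return int(probs.argmax()), len(segments) - 1    # fell through to the end

rng = np.random.default_rng(0)
segments = [(lambda h, W=rng.normal(size=(8, 8)): np.tanh(W @ h)) for _ in range(3)]
exit_heads = [(lambda h, W=rng.normal(size=(4, 8)): W @ h) for _ in range(3)]
label, exit_taken = early_exit_inference(rng.normal(size=8),
                                         segments, exit_heads,
                                         thresholds=[0.3, 0.8, np.inf])
print(label, "from exit", exit_taken)
```

In a distributed deployment the early exits would typically sit on the device and the higher layers on the edge or cloud, so a confident early exit also saves a network round trip.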


Table 4
Summary of compression approaches.

Approach: Compression

Key idea:
∙ Send only the important gradients to reduce the communication bandwidth.
∙ Represent gradients and weights in a compact way to reduce model size.
∙ Feature encoding on input and output features (lossy and lossless) to reduce the required bandwidth for communication.
∙ Measure the importance of gradients and weights to reduce irrelevant updates.

Proposed solutions:
∙ Deep Gradient Compression (DGC) [58].
∙ Sparsified stochastic gradients [44,59].
∙ Partitioning a DNN between the edge and the host platform [42].
∙ Communication-Mitigated Federated Learning (CMFL) [60].
∙ Bayesian compression [61].

Advantages:
∙ Reduces the required communication bandwidth and improves scalability by sending only the important gradients (sparse updates).
∙ Enhances energy efficiency and throughput.
∙ Slashes the number of communication rounds, hence reducing communication overhead.

Disadvantages:
∙ Requires additional processing (momentum correction, local gradient clipping, momentum factor masking and warm-up training) to preserve accuracy.
∙ The partitioned network needs to be fine-tuned.
∙ Reduces convergence accuracy, hence requiring more training iterations.
∙ Additional computational overhead caused by checking the updates' relevance.
∙ Manual inspection may be required for tuning the pruning threshold.

Pruning the DNN architecture has also been identified as an effective approach to reduce communication overhead. Authors in [49] proposed pruning fully connected layers in a DNN with the aim of maintaining the accuracy level whilst balancing partitions for effective parallelizability. This allows processing to occur on different cores with reduced energy consumption and improved inference speed. The pruning of fully connected layers can be extended to fit a distributed ML approach in order to reduce latency and energy consumption on the distributed nodes. SplitNet [50] introduces parameter reduction and model parallelization through a horizontal split of the DNN weights. It subdivides networks for the exclusive clustering of different classes, generating independent pruned sub-networks. This work has shown superior parallelization performance compared to base networks, which makes it an attractive feature in distributed ML. As the pruned architecture is relatively lightweight, the communication requirement for exchanging the weights will also be lower. However, this approach could incur additional computational complexity and training time for large DNNs to cluster exclusive sub-networks.

3.2. Compression by sparsification and quantization

The gradient exchange in distributed machine learning is not just costly but also sometimes wasteful: 99.9% of the gradient exchanges in distributed Stochastic Gradient Descent (SGD) are redundant [58]. Compression by sparsification and quantization of the weights can help reduce the communication bandwidth significantly, as showcased in Fig. 6 (see Table 4).

Fig. 6. Illustration of weight compression [58].

Gradient sparsification is used to minimize the coding length of stochastic gradients, and thresholds are often used to decide which gradients to keep. Quantization uses low-precision values to compute the gradients. Both intend to remove redundancy and improve the compression ratio with little sacrifice in accuracy.

In [58], deep gradient compression is used together with momentum correction, local gradient clipping, momentum factor masking, warm-up training and hierarchical threshold selection to achieve compression ratios in the range between 270x and 600x without losing accuracy. In [62], a three-stage pipeline of pruning, trained quantization and Huffman coding reduces the storage requirement of neural networks by 35x to 49x without affecting their accuracy. A general framework for atomic sparsification of stochastic gradients is proposed in [59], which gives a random unbiased sparsification of the atoms that minimizes variance.

Alternatively, authors in [42] recognized that offloading earlier layers may be sub-optimal due to feature sizes and bandwidth-related network bottlenecks. As a result, the authors suggested feature encoding with lossy and lossless compression to reduce the sparsity of information and the bandwidth required for communication, minimizing the offload overhead. It is important to note that the accuracy of the DNN decreases as a result of compression; therefore, it is not advised when critical applications that depend on high accuracy are of interest.

It was shown that sparsifying the singular value decomposition of neural network gradients, rather than their coordinates, can lead to significantly faster distributed training. A Bayesian approach is adopted to prune the network in [61]: hierarchical priors are used to prune nodes instead of individual weights, and posterior uncertainties are used to determine the optimal fixed-point precision to encode the weights.

To address the communication and scalability challenges in distributed machine learning, three approaches are implemented in [63]: (a) periodic averaging, where models are updated locally at devices and only periodically averaged at the server; (b) partial device participation, where only a fraction of devices participate in each round of the training; and (c) quantized message-passing, where the edge nodes quantize their updates before uploading them to the parameter server.

Moreover, in [44], sparsification is used to choose important gradient coordinates. Only the important gradient coordinates are transmitted to the cloud for synchronization, after which momentum residual accumulation, tracking out-of-date residual gradient coordinates, is used to avoid the low convergence rates caused by sparse updates. A similar approach is used in [60] to identify and preclude irrelevant updates from being uploaded, reducing the network footprint by approximately 13.97×.
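The following minimal sketch shows the mechanics behind top-k sparsification with residual (error) feedback, which underpins schemes such as [44,58]. The vector sizes, the value of k and the plain accumulation rule are illustrative assumptions; DGC additionally applies momentum correction, clipping and masking.

```python
import numpy as np

def sparsify_topk(grad, residual, k):
    """Transmit only the k largest-magnitude coordinates of the
    accumulated gradient; keep the remainder locally as a residual."""
    acc = grad + residual                        # error feedback
    idx = np.argpartition(np.abs(acc), -k)[-k:]  # important coordinates
    sparse = np.zeros_like(acc)
    sparse[idx] = acc[idx]                       # what is actually sent
    return sparse, acc - sparse                  # new local residual

rng = np.random.default_rng(1)
residual = np.zeros(1000)
for _ in range(3):
    grad = rng.normal(size=1000)
    update, residual = sparsify_topk(grad, residual, k=10)  # 99% sparse
    # only the 10 surviving (index, value) pairs travel uplink each round
```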


Table 5
Summary of graphical models.

Approach: Graph models.

Key idea:
∙ Construct graphical models to optimize communication quality and minimize the computation and communication cost [64].
∙ Distributed message passing.
∙ Formulating the problem of optimal computation scheduling of DNNs as a graph navigation problem [39].

Proposed solutions:
∙ Spanning tree formation, junction tree formation, and message passing.
∙ Solving the scheduling of DNNs at layer granularity in the mobile cloud computing environment by shortest path and integer linear programming (ILP) [39].
∙ Dividing the computation and memory requirements onto multiple machines to perform inference in large scale graphical models [65].

Trade-offs:
Advantages:
∙ Both network-related and application-specific information can be used to minimize the communication and computation required.
∙ Provides an energy- and performance-efficient method of querying DNNs for the mobile side.
∙ Benefits the cloud server by reducing the amount of its workload and communications.
∙ Preserves properties of convex belief propagation.
Disadvantages:
∙ Tight coupling between the application and networking layers makes customization necessary.
∙ Requires the problem to be partitioned and specific networks to be utilized to make full use of the approach.
∙ Constraints are required to reduce the problem to NP-Complete.

3.3. Graphical models

Leveraging topological structures of the network, distributed inference can be implemented by designing robust architectures and defining efficient protocols (e.g. using probabilistic graphical models to define message passing). Work in [64] demonstrated an efficient distributed algorithm for optimizing the choice of junction tree to minimize the communication and computation required by inference. The nodes of the sensor network first organize themselves into a spanning tree so that neighbors have high-quality wireless connections. Using pairwise communication between neighbors in this tree, the nodes compute the information necessary to transform the spanning tree into a junction tree for the inference problem. In addition, these two algorithms jointly optimize the junction tree to minimize the computation and communication required for inference. Finally, the inference problem is solved exactly via message passing on the junction tree.

Graph models are also explored in [65], where a decomposition method is proposed for message passing algorithms to ensure applicability to non sub-modular potentials, distributing inference using graph cuts. It applies model-parallel learning and inference algorithms which exploit distributed computation and memory storage to handle large scale graphical models as well as a small number of training instances. It extends a primal–dual message-passing algorithm to minimize transmission overhead when considering distributed memory environments.

A directed acyclic graph to model the granularity of DNNs is adopted in [39]; the learning problem can then be framed as a shortest path optimization. The cost of the path can be broken down into mobile execution, cloud execution, downloading input/output data and uploading input data. Unfortunately, this work depends on the application specifics and network run-times, and requires the DNN to be feed-forward (excluding recurrent neural networks), which reduces the generalizability of the approach.

A hybrid algorithm [66] using Bayesian non-parametric models to decompose and partition latent measures into finite ones has been demonstrated to be easily distributed, allowing scalable inference without sacrificing asymptotic convergence guarantees (see Table 5).
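To make the message-passing machinery concrete, the sketch below runs exact sum-product on a tiny three-node chain, the kind of tree structure that the junction-tree constructions above generalize. The potentials, states and topology are toy assumptions, not the sensor-network setup of [64].

```python
import numpy as np

# Chain a - b - c with binary states; phi are local evidences held by
# different nodes and psi is a shared pairwise potential on each edge.
phi = {"a": np.array([0.9, 0.1]),
       "b": np.array([0.5, 0.5]),
       "c": np.array([0.2, 0.8])}
psi = np.array([[1.0, 0.5],
                [0.5, 1.0]])

def message(src, incoming):
    """Message sent by src along an edge: multiply local evidence with
    messages from src's other neighbors, then sum out src's own state."""
    belief = phi[src].copy()
    for m in incoming:
        belief *= m
    return psi.T @ belief

m_ab = message("a", [])            # leaves send inward first
m_cb = message("c", [])
belief_b = phi["b"] * m_ab * m_cb  # exact on a tree: no loops to worry about
print(belief_b / belief_b.sum())   # marginal of node b
```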
Remarks. To improve inter-device communication and computation efficiency in distributed machine learning, it is instrumental to maximize the reduction offered by the latest sparsification and quantization approaches in order to shrink the payload that needs to be transmitted and synchronized across devices. Knowledge distillation approaches, such as federated distillation, which exchange a representation of the model output using logit vectors or equally efficient representations instead of the model parameters, seem to produce a significant lift in communication efficiency.

Adaptive algorithms built on top of efficient model exchange by grouping, adaptive selection and dynamic adjustment could potentially render superbly efficient algorithms with little sacrifice in machine learning model performance (most of the time the accuracy can be preserved at a very high level). Graphical models are often intractable and difficult to solve. However, if the network topology can be mapped carefully, some approximate tree or chain structures allow the flexibility to utilize various graph cut, graph coloring and graph traversal techniques to further improve the communication and computation efficiency and scalability of distributed machine learning algorithms.

Fig. 7. Diversity of available resources — Conventional vs. new types of resources overview.

4. Resource distribution optimization

The new radio access paradigm is driven by large demands for access to 5G and beyond networks. With 5G and beyond, it is expected that various types of services will be utilized, and data traffic will continue to explode. Resource allocation and management of conventional wireless networks, known as radio and network resources (or, simply, communication resources), has traditionally been a challenge for wireless networks. With 5G and beyond, the pool of resources is expanded to more diverse types such as CPU/GPU or storage resources. This brings additional challenges to the resource allocation problem [67–70], as illustrated in Fig. 7. On the other hand, by re-distributing real-time compute workloads to the edge, additional gains can be realized. The Ericsson 5G uplink booster [71] is an example where radio performance was significantly improved by shifting the uplink signal processing load to the edge radio heads, via a software change alone: ''The uplink coverage is extended with an application coverage gain up to 10 dB, that is equivalent to 90% increase of application coverage, and the cell-edge uplink speed can be increased by a factor of 10''.

To exploit the full potential of efficient and effective 5G and beyond networks, there have been many attempts to deploy AI/ML approaches within wireless communication networks [68,69]. However, centralized algorithms cannot satisfy the low latency requirement of 5G and beyond (for near-real time applications) and are prone to security and privacy issues. Adopting distributed ML is imperative to address the majority of these challenges by transferring the training process into edge nodes or by employing devices' computation and storage resources [32,67,72] to accomplish the training tasks. In a real world scenario, devices in the network have different capabilities for computing and storage. The performance of the distributed algorithms could be severely affected by weaker devices. On the other side of the spectrum, the stronger devices can be exhausted during the training process and consume more resources compared to other devices, which may result in less training cooperation. This imbalance between weaker and stronger devices is referred to as system heterogeneity. Furthermore, the data distribution and the quality of data on each device could also differ; this is considered another source of heterogeneity and can affect distributed ML performance dramatically.


Thus, allocation of the communication and computation resources is vital for any distributed ML and network resource management task.

Work done in [73] addressed the resource allocation issues for MEC to combat the limited resources at the edge, while at the same time trying to optimize communication without compromising model accuracy. The following resource allocation issues have been extensively discussed: (a) participant selection; (b) joint radio and computation resource management; (c) adaptive aggregation; and (d) incentive mechanisms. The work also mentioned the integration of computation and communication via over-the-air computation for MEC. Federated learning, as a way to decentralize ML and preserve data privacy, is discussed in a comprehensive survey [73]. It covered not only FL applications in a generic ML context, but also considered the potential of FL as an enabling technology for optimizing mobile edge networks, in areas such as cell association, computation offloading and vehicular networks. The communication bandwidth, the heterogeneity of participating devices and the lack of privacy in the presence of malicious participants are listed as challenges for FL applied to MEC. In this section, we focus on different aspects of resource optimization and allocation to improve the performance of FL. The structure of a common FL framework with the different device heterogeneities is illustrated in Fig. 8, where a server and its corresponding workers are represented. The server could be a base station, and the worker could be an edge node or a mobile device. As shown in the figure, the server is responsible for selecting among heterogeneous workers to guarantee its performance during the learning and inference processes.

Fig. 8. Distributed learning architecture overview, based on FL structure.

4.1. Client selection

The promised performance of any distributed ML approach is constrained by two factors: node heterogeneity and wireless channel constraints. Node heterogeneity includes various features of the device as well as of the data. Network devices usually have limited power and computation resources; both affect the training efficiency. Besides, the size and distribution of data over different contributing nodes can cause some bias in the performance of the local model, which can affect overall FL performance severely. Moreover, the bandwidth and latency constraints of wireless networks influence distributed training efficiency. Thus, it is critical that the server can allocate these limited resources efficiently while the requirements of the training are satisfied, which is more challenging when prior knowledge of the network condition is missing or limited [74–78].

FL and SL require i.i.d. data for the training process over the union of all data samples to minimize the expected loss. If the distribution of the test data varies from the training data, the performance loss increases. In practical scenarios there are more non-i.i.d. datasets, i.e. there is a high probability that the model is trained on one data distribution but the target task involves a newer and different distribution. With FL, an accuracy reduction is then expected; however, with the agnostic FL introduced in [74], the model on the server is optimized and trained for network nodes with an unknown mix of data distributions, thus reducing the risk of data mismatch. This paper considered various data sets and demonstrated that the approach achieved better performance, in terms of processing time and accuracy, compared to vanilla FL.

The distributed learning server decides on the data, CPU, and battery of each device in the distributed network to maintain low latency and low power consumption during the training process. Making this decision is a critical challenge for the server, as mobile environment resources are stochastic and time-dependent. In [75], a method based on Deep Q-Network (DQN) is introduced to allow the server to allocate the limited resources of the mobile devices optimally without prior knowledge of the network. Their proposed algorithm is capable of minimizing the energy consumption and training latency compared to greedy and random resource allocation methods. For a heterogeneous data scenario, this method intends to consider more devices with higher data quality, due to their impact on the accuracy of the global model. Reinforcement learning is also used in [76], where a double-DQN is proposed to tackle the over-optimism problem of DQN. In this approach, two DNNs are employed: one online parameter set which is updated at each iteration, and one parameter set fixed for each device. After several iterations the device's parameters are reset to the online parameters. This paper showed that the achieved reward of the double-DQN is higher than that of conventional algorithms, i.e. the model learns the optimal decision better, although the network environment is stochastic and uncertain. The proposed solution also assumes fairness among the UEs in terms of energy consumption. It is depicted that with any number of UEs, the system converges to the same average utility, and the convergence speed has a reverse correlation with the number of UEs.

Fairness is an important topic in both machine learning algorithms and wireless communication resource management. Data could have different distributions or sizes, which will affect the performance of the model on each device as well as the global model, and some devices may suffer more from poor accuracy. Moreover, distributed ML is affected severely by the resource constraints on the devices. The optimization of the resource allocation problem in distributed ML usually attempts to minimize the aggregated training loss; however, this can harm some devices by exhausting them. Thus, a fair distributed learning model should consider both the available resources and the data.

An agnostic framework was proposed in [74] to investigate fairness in the scenario where the distribution of the data set on each device is different and no prior information is available. Considering fairness in client selection to avoid imbalanced resource allocation among devices is investigated in [77,78]. An approach inspired by the fairness mechanism in telecommunication is introduced in [77] and adapted to FL to provide a more uniform accuracy distribution among the FL workers. In this method, the device with a higher loss is assigned higher relative weights. The results demonstrated that fair FL is more flexible and efficient compared with the baseline method; it is able to reduce the variance of accuracy across devices and provide uniform performance among devices irrespective of their loss. Satisfying fairness and accuracy in a large-scale network becomes more important as traffic requests explode. In [78], a lightweight and scalable fair FL approach is proposed, which collects updates from each device and then selects a subset of devices to update the model for the next step. The results demonstrated that this method reduced the processing time and increased the accuracy compared to vanilla FL.
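A minimal sketch of loss-aware client selection is shown below: devices reporting a higher local loss are sampled more often, pushing the accuracy distribution toward uniformity. The power-of-loss sampling rule and the exponent q are simplifying assumptions in the spirit of, but not identical to, the mechanisms in [77,78].

```python
import numpy as np

def select_clients(losses, num_selected, q=1.0, rng=None):
    """Sample clients with probability increasing in their local loss,
    so poorly served devices participate (and weigh in) more often."""
    rng = rng or np.random.default_rng()
    w = np.asarray(losses, dtype=float) ** q  # q tunes how aggressive fairness is
    return rng.choice(len(losses), size=num_selected, replace=False, p=w / w.sum())

# toy round: eight devices report their latest local loss to the server
losses = [0.2, 1.5, 0.3, 0.9, 2.1, 0.4, 0.8, 1.1]
print(select_clients(losses, num_selected=3, q=2.0))
```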


4.2. Physical link optimization

Distributed machine learning models depend on wireless communication channel characteristics such as bandwidth, latency, noise, and interference, which create a big challenge for low-latency, accurate machine learning [79–81].

In [79], a low-latency implementation of distributed ML called broadband analog aggregation is proposed, in which each edge node trains a local model, the updates are simultaneously transmitted over the transmission channel, and the waveforms are aggregated on the uplink channel. This approach exploits two trade-offs: a Signal-to-Noise-Ratio (SNR)–truncation trade-off, defined between the received SNR and a model update quality measure called the truncation ratio; and a model update reliability–data quantity trade-off, which is related to the scheduling of devices inside the cell. These two trade-offs are fundamental for network planning and optimization goals. It is shown that this approach considerably reduces latency compared to the traditional Orthogonal Frequency-Division Multiple Access (OFDMA) method, and the reduction is linear with respect to the number of devices within the network.

The edge devices, usually with limited power resources for their transmissions, need to transmit their data to the server frequently, while the available bandwidth on the uplink channel is limited for all these transmissions. Thus, the transmitted local models are expected to suffer from loss and errors. An analog and a digital distributed stochastic gradient descent method in [80] try to minimize this loss. In each iteration of the digital method, the algorithm selects one device based on its channel quality; the selected device then employs efficient coding techniques to transmit its quantized gradient updates at high rates to the server. In contrast, the analog method exploits the superposition feature of the uplink channel, and each device reduces the dimension of its large parameter vector to transmit the model updates via the limited-bandwidth channel. By proposing a power allocation mechanism to align the received vectors at the server, the results demonstrated that the analog technique achieves more efficient transmission on band-limited channels than the digital method.

Unfortunately, if the aggregated model updates include errors, a considerable reduction in training accuracy is expected, which directly affects the prediction accuracy. This distortion can grow faster as the number of worker nodes (devices) increases. On the other hand, it is shown that the training algorithm can converge faster if the number of worker nodes increases [82]. To address this dilemma, an approach based on joint device selection and beamforming is introduced in [81] to tackle the multi-access channel constraints. This method attempts to find the maximum number of devices which satisfy the mean-square-error requirement, to reduce the model aggregation errors. Using a sparse and low-rank optimization solution, a fast model aggregation approach based on a weighted average of the local updates is shown to enhance the communication efficiency.
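The superposition property exploited by analog over-the-air aggregation can be simulated in a few lines, as below. Perfect channel knowledge, channel-inversion power control and a single antenna are simplifying assumptions; they stand in for, but do not reproduce, the truncation and scheduling machinery of [79–81].

```python
import numpy as np

rng = np.random.default_rng(3)
K, d = 20, 64                            # devices and model dimension
local_updates = rng.normal(size=(K, d))

# Each device pre-scales by its (known) channel gain so the transmitted
# waveforms superpose coherently into one aggregated signal at the server.
h = rng.uniform(0.5, 1.5, size=K)        # per-device channel gains
tx = local_updates / h[:, None]          # channel-inversion power control
noise = rng.normal(scale=0.05, size=d)   # receiver noise
received = (h[:, None] * tx).sum(axis=0) + noise

estimate = received / K                  # server's one-shot estimate of the mean
truth = local_updates.mean(axis=0)
print(np.abs(estimate - truth).max())    # residual error is set by channel noise
```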
4.3. Incentive mechanism

Distributed ML brings AI computation onto the edge or device level while satisfying security and privacy concerns. However, motivating devices with limited computational resources to contribute to the learning process is difficult. According to the literature, incentive mechanisms are good candidates to encourage network nodes to participate in network activities like distributed learning. However, these mechanisms conflict with the private nature of distributed learning, such as the practice of not sharing data between devices. This limitation has prohibited many FL-based approaches from being implemented at scale or from achieving their promised performance in practical scenarios [83–90]. Nevertheless, it is still valuable to selectively adapt incentive mechanisms to distributed learning methods to help devices participate in a collective learning environment and improve model accuracy and performance without compromising on data privacy.

In [88,89], hierarchical FL is employed to address the challenges of edge association and the related resource allocation, where the players are workers, edge servers, and the model owner. The hierarchical FL preserves the privacy of the workers, and the introduced model associates the edge nodes with the workers in the first layer of the algorithm, using an evolutionary game method. The second layer of the model considers a Stackelberg differential game to maximize the profit of the model owner by deciding on the reward mechanism. An incentive mechanism is also applied to an FL-based algorithm for an internet of vehicles application [90], employing UAVs to facilitate an intelligent transport system. In this paper, the authors introduce a method that provides a learning mechanism preserving the privacy of the UAVs through data collection, and a multi-dimensional contract-based mechanism to ensure the coverage availability of the UAVs and the profit maximization of the model owner. Although the proposed method presents performance in terms of the profit gain for the model owner, the paper does not include a comparison of its proposed method with other incentive models as a baseline to evaluate the achievable profit margins under the same UAV scenario.

Work in [83] proposed an incentive-based FL between the server and the edge devices using game theory. By employing an optimization of communication resources, such as radio resources on RANs, and computation resources, such as network devices or edge equipment, the cost of learning is minimized. The mechanism is detailed as follows: from all worker nodes, the devices with enough energy, processing time, and CPU capacity are selected to perform computations with low latency, and enough radio resources should be allocated during the model update transmission period. A Stackelberg game-based method is introduced, in which each device reports its local CPU resources to the server (i.e. the BS) to receive the offered reward rate and maximize its local utility function. The server collects all these reports, updates its global reward model and shares its offered rewards with the heterogeneous devices. Then, the workers update their strategies based on this offer and try to maximize their utility functions accordingly. The experimental evaluation illustrated that this approach is an effective model for the interaction between workers and server, improving learning performance.

Alternatively, in [84], the communication among workers and the server is ensured by introducing an adapted method from cooperative relay networks. After finishing the local training on the worker side, the updated model is transferred to the server by a relaying mechanism between other nearby devices. Each worker charges the server either for its computation or for relaying, at some price. This method requires a dual optimization algorithm to choose the relay(s) with low transmission power as well as low interference to accomplish the model delivery to the server. It is implemented based on a Stackelberg game model to analyze transmission strategies while optimizing the pricing mechanisms. The outcomes illustrated that workers under higher interference prefer to relay their model updates via other devices rather than transmitting directly to the server, to save more energy. On the other hand, since the devices with bigger data sizes need more computation time, they are more eager to relay their neighbors' updates, i.e. they offer a cheaper relaying price. Thus, those workers which finish their computation faster can use the neighbors to relay their updates and perform more energy-efficient transmission rather than transmitting directly to the server. Similarly, in [85], the equilibrium solution based on the Stackelberg game is employed to investigate the trade-off between the CPU utilization of the workers and the budget allocation of the server. The outcomes emphasized the importance of optimizing resource allocation for distributed ML models in practical scenarios.

In [86], the limitation of data sharing in FL is addressed by proposing a Deep Reinforcement Learning (DRL)-based incentive mechanism. In this method, the server and the workers each have their own state spaces, policies and rewards. The state space of the server includes its past payment strategy and the workers' past participation history. In contrast, the worker's state space includes the server's current payment strategy as well as the history of the other workers' training strategies.


Table 6
Comparison of resource optimization approaches.

Approach: Client-selection mechanisms.
Key idea:
∙ Selects a subset of clients from the set of clients to obtain weights from.
∙ Reduces convergence time by forgoing waiting on all clients for updates.
∙ Optimizes the computation and communication load on the BS by filtering to a number of clients.
Proposed solutions:
∙ Centralized optimization model based on a mix of various distributions of the clients [74].
∙ Deep Q-Learning approaches to select the best cut of clients [75,76].
∙ Fairness-focused client selection mechanism, to overcome the trade-off [77,78].
Trade-offs:
Advantages:
∙ Computation and communication resources are taken into account in the training process.
∙ Best set of devices selected to ensure model accuracy while robust data distribution and energy efficiency are satisfied.
Disadvantages:
∙ Fairness cannot be guaranteed with device selection.
∙ May skew the model generalizability and accuracy, as the data space may not be represented by the selected devices.

Approach: Physical link optimizations.
Key idea:
∙ Reduction of the load on the wireless medium through the utilization of physical links.
∙ Weights are projected to lower dimensions such that only important gradients are sent, to reduce the burden on the communication medium.
∙ Minimizing the number of clients that can send their quantized weights to the server at one time.
Proposed solutions:
∙ Signal superposition [79,81], enabling faster model aggregation.
∙ Stochastic gradient descent to cater for fading and imperfect channels [80].
Trade-offs:
Advantages:
∙ Improves the exchange of model parameters between server and client.
∙ Technologies such as beam-forming and user selection synergize with FL.
∙ Enables designs robust to data bias, imperfect channel state information and low power IoT.
Disadvantages:
∙ Channel noise can negatively impact the training experience.
∙ Realistic channel models are difficult to accurately evaluate, leading to an unanticipated FL training process.

Approach: Incentive mechanisms.
Key idea:
∙ Offers rewards for UE participation, increasing computing power allocation [83] or local iterations [85].
∙ Assigns a UE a reputation/reliability based on their interaction, which correlates with the importance of the UE to the global model.
∙ More contributing UEs results in more training diversity and less training delay, more BS utility and less UE benefit due to more competition [86].
Proposed solutions:
∙ Stackelberg game method [83–86], to assign rewards to the clients.
∙ Contract theory [87] to form a reliability assessment of the participating clients.
Trade-offs:
Advantages:
∙ Appropriately balances computation and communication resources whilst ensuring participation.
∙ Improves global model accuracy and generalizability as a wider data distribution is used from different devices.
∙ Introduces competition and quantifies contribution and reliability from different participants.
Disadvantages:
∙ Mobility of UEs or selfish UEs may cause unforeseen impacts.
∙ The network topology may have a large weight in the sensitivity of the incentive mechanisms.
∙ The approach is sensitive to the number of UEs and the budget.

The policy of the server node is to take an action that encourages the workers to collaborate more, while the policy of the workers is to determine the training strategy. Since both state spaces are continuous, the problem is modeled by Q-learning, and neural networks are employed to find the optimal policies on both sides. For the reward system, the server agent uses a utility function based on the action taken as the reward, whereas the edge nodes employ a function based on their previous learning strategies as well as the current action taken. The results showed that the DRL approach outperforms the Stackelberg equilibrium method.
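To ground the Stackelberg formulations that recur throughout this subsection, the toy sketch below has the server (leader) post a reward rate while each worker (follower) best-responds with a CPU effort maximizing its own utility. The quadratic energy cost, logarithmic value function and grid search are illustrative assumptions, not the utilities used in [83–86].

```python
import numpy as np

c = np.array([0.5, 1.0, 2.0, 4.0])   # workers' energy-cost coefficients

def worker_best_response(r):
    # each worker maximizes u_i(f) = r*f - c_i*f**2, giving f_i* = r/(2*c_i)
    return r / (2 * c)

def server_utility(r, alpha=1.0):
    f = worker_best_response(r)
    value = alpha * np.log1p(f.sum())  # diminishing value of total compute
    return value - r * f.sum()         # minus the total payment

# leader move: pick the reward rate anticipating the followers' responses
rates = np.linspace(0.01, 1.0, 200)
best = rates[np.argmax([server_utility(r) for r in rates])]
print(best, worker_best_response(best))
```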
Another interesting perspective on incentive mechanisms is introduced in [87]. To address the computational resource allocation problem and the selection of devices, the reputation and trustworthiness of each worker is introduced as a metric. Then, the selection algorithm tries to choose more reliable workers for the computation mission. The reputation of the workers is securely managed by employing blockchain. Finally, the proposed mechanism encourages the better-reputed workers with high-quality data to contribute more frequently to the learning process compared to the other nodes. It is shown that this approach improves the accuracy of reliable FL by attracting stronger workers into the collaborative learning task (see Table 6).

Remarks. High performance of distributed ML methods, such as FL and SL, is achievable when a large number of devices are contributing and all have sufficient computation resources. However, this is unlikely in practical scenarios like wireless networks with heterogeneous resources on devices. To achieve the promised performance of distributed ML algorithms, device selection based on the available resources, as well as the existence of a reliable link between the worker(s) and the server, are key enablers of high-accuracy and low-latency model training mechanisms. Moreover, energy consumption is another concern, which can be addressed by using device selection along with link optimization mechanisms, to ensure that the collaborating devices can maintain the latency and energy efficiency requirements of the designed network.

The resourceful devices positively impact the accuracy of distributed machine learning models; however, they might be exhausted due to their high-capacity CPUs or storage spaces. For that reason, network nodes might be reluctant to contribute to distributed learning. To encourage such users, incentive mechanisms can be applied, mostly using the Stackelberg equilibrium or related methods, to improve the accuracy while satisfying the data privacy requirement.

5. Privacy and security

One of the main reasons for the introduction of distributed machine learning is to ensure data privacy whilst allowing collaborative model calibration. Even though distributed machine learning approaches do not strictly require data exchange between participants, this does not prevent attacks from malicious servers or participants that may infer sensitive data, posing a threat to the anonymity and privacy of the data.

As the employed security and privacy measures become more complex, the computational overhead that facilitates these features inflates as well. Increasing the amount of noise in differential privacy is an example, whereby de-noising and adding noise in each iteration requires further computational power, adding to the overhead of implementing such an approach. Hence, there is often a large trade-off between security and computational complexity. This is further highlighted in relation to distributed machine learning [91], as the overhead becomes more intertwined between communication costs, computation costs, privacy and model accuracy.


Extensive exploration has been carried out in order to obtain a favorable balance between model accuracy and computational costs without compromising security severely.

Both SL and FL somewhat abide by the concepts of local and global privacy [92]. Local privacy ensures that the communication between the server and the participant is private and secure, whereas global privacy pertains to masking global updates from all untrusted third parties [93].

A majority of the threats that distributed machine learning faces are inherited from machine and deep learning threats, such as information exploitation attacks [94], model poisoning attacks [95] and data poisoning attacks [96]. The introduction of distributed machine learning inadvertently hatched new threat spaces such as free-loading attacks [97].

With respect to the distributed machine learning architecture and mode of operation, such as secure aggregation for FL [98,99], we focus our discussions on differential privacy and secure multiparty computation, which are critical areas especially for distributed machine learning approaches (e.g. FL and SL), as shown in Table 7.

5.1. Differential privacy

Differential privacy allows the sharing of statistical inference on a dataset without sacrificing the individual data [100]. In the context of distributed machine learning, it is an enabling factor which allows data collection and aggregation while maintaining the privacy of the client dataset.

Authors in [101] introduced the notion of differential privacy with respect to NNs and continuously improved upon similar work in [102], where the authors proposed the addition of noise to the trained model weights before sharing these weights with the server; the server removes the added noise, obtaining the true trained model weights. The training phase of a participant stops when a pre-defined threshold on the probability of malicious inference of parameters is met, reducing the risk of information leakage from shared parameters. Before the addition of noise, however, gradients can be clipped in order to bound their influence on the global aggregation [103].

Expanding on this are [104,105], which are specifically catered to FL. Noise is applied to the selected participants required to train the global model before communicating with the server. This obscures the involvement of participants, which protects the FL process from a malicious participant and negates the possibility of individual information being learnt from the global model.

Differential privacy has been incorporated into a variety of learning frameworks such as TensorFlow [106] and Opacus (PyTorch) [107]. It is important to note that differential privacy does not guarantee total security and privacy; it is one of the tools to be used in conjunction with other approaches to achieve such a goal. Specifically, an assumption of differential privacy is that the server is trustful, which may still pose a threat to the privacy of participants.
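The clip-then-add-noise recipe referenced above can be stated compactly, as in the sketch below: each per-example gradient is clipped to bound its influence [103], and Gaussian noise calibrated to the clipping bound is added to the average. The noise multiplier here is a free parameter; a real deployment would derive it from an (epsilon, delta) privacy accountant.

```python
import numpy as np

def dp_sanitize(per_example_grads, clip_norm=1.0, noise_mult=1.1, rng=None):
    """Clip each gradient to clip_norm, average, then add Gaussian noise
    proportional to the clipping bound (DP-SGD-style sanitization)."""
    rng = rng or np.random.default_rng()
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    mean = np.mean(clipped, axis=0)
    sigma = noise_mult * clip_norm / len(per_example_grads)
    return mean + rng.normal(scale=sigma, size=mean.shape)

rng = np.random.default_rng(4)
grads = [rng.normal(size=10) for _ in range(32)]
print(dp_sanitize(grads))
```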
5.2. Secure multiparty computation

Secure multiparty computation is concerned with calculating a shared function globally whilst maintaining the privacy of the individual inputs, as shown in [108,109]. This is directly applicable and beneficial to distributed machine learning, since a gradient is computed by multiple parties with the aim of keeping each client's input data private from the server and the other clients [110,111].

Multiple implementations of secure multiparty computation have been introduced in machine learning [112,113] and NN models [114,115]. It is especially beneficial in distributed machine learning when used to safeguard the dataset from other clients. However, there is a significant trade-off between privacy and the communication and computation overhead. The framework needs to be designed carefully and used with other technologies to guarantee information privacy without greatly impacting the accuracy. A main benefit of secure multiparty computation is that it is a lossless method, able to protect each individual model update, yet allowing the server to calculate the exact aggregated result at each global model update whilst retaining the original accuracy [117].

Homomorphic encryption is a form of encryption which can be an enabling factor for secure multiparty computation. The uniqueness of homomorphic encryption stems from the ability to carry out mathematical functions without the need for prior decryption: although the method operates on the encrypted data, it achieves the same results. This is very beneficial in a distributed machine learning setting to protect the participant parameters from an honest-but-curious server (one that aims to keep the FL services working but still extracts information) [118]. Homomorphic encryption is not a stand-alone technology. Different flavors of homomorphic encryption exist, such as the use of ideal lattices in [119], the removal of modulus switching in [120], and [121] where bootstrapping has been negated. Homomorphic encryption is usually combined with other methods to achieve privacy while aiming to minimize computational overhead. Authors in [122] combined both differential privacy and homomorphic encryption to concurrently prevent honest-but-curious servers and malicious participants from exploiting information in an FL setting, by encrypting with a homomorphic scheme before sending to the server. Other examples include work in [123,124], which focus specifically on large-scale deployments with millions of records, while [125] is concerned with big data on cloud deployments.
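The lossless-aggregation property mentioned above is easy to see in a pairwise additive-masking sketch, shown below. Each pair of clients agrees on a random mask that one adds and the other subtracts; the masks cancel exactly in the server's sum. Key agreement, dropout recovery and the other machinery of practical secure aggregation [98,99] are deliberately omitted.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 8
updates = [rng.normal(size=d) for _ in range(3)]  # clients' model updates
n = len(updates)

# Pairwise masks m_ij: client i adds it, client j subtracts it.
masks = {(i, j): rng.normal(size=d) for i in range(n) for j in range(i + 1, n)}

def masked_update(i):
    x = updates[i].copy()
    for (a, b), m in masks.items():
        if a == i:
            x += m
        elif b == i:
            x -= m
    return x  # individually looks random to the server

aggregate = sum(masked_update(i) for i in range(n))
print(np.allclose(aggregate, sum(updates)))  # True: the exact, lossless sum
```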


Table 7
Comparison of different privacy and security approaches.

Approach: Differential privacy.
Key idea:
∙ Quantifies the information that can be obtained about an individual from a dataset.
∙ Adds noise to reduce the information that can be deduced about each participant.
∙ Can adopt a central, distributed or hybrid server for differential privacy [126].
Proposed solutions:
∙ Deliberately adds noise and randomness to the dataset to obscure the impact of a user on the dataset [127].
∙ Gradient clipping to bound influence on the global model [103].
∙ Noise applied to only specific participants [105].
∙ Gaussian/Laplacian/Binomial functions used to approximate the de-noised values [104,128,129].
Trade-offs:
Advantages:
∙ Protection against information exploitation attacks.
Disadvantages:
∙ Increased privacy comes at a larger computational cost.
∙ The amount of noise and randomness negatively influences the accuracy of the model.
∙ Assumes that the server is trustworthy.

Approach: Secure multiparty computation.
Key idea:
∙ Parties agree upon a common function where the outputs are shared without disclosing the private inputs.
∙ Multiple parties participate to create a trusted third party to compute gradients.
∙ Securely share the information with a subset of participants without enabling information leakage.
Proposed solutions:
∙ Prevent the server from learning the relationship between the information and the participant through secure shuffling [130].
∙ Private information retrieval, computation and information-theoretic approaches, such that the server has zero knowledge of the requesting party [131].
∙ Can utilize homomorphic encryption.
Trade-offs:
Advantages:
∙ Protects against malicious participants in the same environment without affecting functionality.
∙ Protects against information exploitation attacks.
Disadvantages:
∙ Assumes that the server/third party is trustworthy; as such, does not protect against honest-but-curious or malicious servers.
∙ Accuracy and privacy have an inversely proportional relationship.
∙ Communication and computation costs increase as the environment scales.

Approach: Homomorphic encryption.
Key idea:
∙ Allows mathematical computation on encrypted data, without requiring access to the plain text.
∙ Can be used to enable secure multiparty computation.
Proposed solutions:
∙ ElGamal and lattice-based solutions which allow both addition and multiplication [118,119,121].
∙ Recurrent renewal of secret keys, preventing cipher attacks [132].
∙ Secret key distribution to avoid relying on a central/trusted third party [133].
∙ Adoption in scalable environments with large numbers of records or cloud deployments [124,125].
Trade-offs:
Advantages:
∙ Protects from an honest-but-curious server and other curious participants.
Disadvantages:
∙ The strength of the approach is not guaranteed, which could be a cause of information leakage.
∙ Requires a dedicated design to utilize the encryption method and account for the increased computational load.
∙ Depends on the type of computation taking place in the distributed learning environment, as the approach does not cover all mathematical expressions.

Remarks. The design of privacy and security measures cannot depend on a single approach [134], especially for 5G and beyond, where services become more integrated. For instance, two approaches that can be combined are differential privacy and secure multiparty computation. While differential privacy ensures that the information of a single party remains anonymous and private, secure multiparty computation adds another layer of security by allowing operations to occur on encrypted data. This reduces the need for repeated encryption and decryption operations, thus saving computational resources.

The trade-off between computational load, privacy and accuracy has yet to be explored extensively in the literature with respect to combinations of technologies that utilize different forms of security and privacy in a distributed machine learning environment.

6. Compute frameworks

To fundamentally address the compute efficiency issue in distributed systems, each step of the programming paradigm, as well as the model execution, needs to be designed and optimized, especially for deep learning tasks. This becomes even more prevalent when AI is put into MEC. Computational frameworks are important elements of any machine learning model and its implementation. The modern popular machine learning compute frameworks offer remarkable reference designs and optimization techniques; namely, the compute graph, where the bound symbolic expression is presented for evaluation; resource dependency, which defines the resource dependencies in a distributed system; and data communication, which handles the device-to-machine and machine-to-machine synchronization [135]. In this section, we review some of these computational frameworks and explore their features and capabilities for different ML scenarios. Many of the concepts can be expanded at the system level when we design distributed machine learning solutions for mobile networks.

6.1. Caffe

Caffe was first designed for computer vision; it has since been adopted and improved to incorporate speech recognition, robotics, neuroscience, and astronomy [136]. The compute graph is realized by arbitrary directed acyclic graphs (DAGs) defined with the Protocol Buffer language. Resource dependency is optimized by the separation of the representation from the actual implementation: Caffe model definitions are written as config files using the Protocol Buffer language, and Caffe supports network architectures in the form of arbitrary DAGs. Abstracting this configuration from its underlying location on the host or GPU makes seamless switching between heterogeneous platforms possible. For data communication, Caffe uses blobs to store and communicate data in 4-dimensional arrays. This design provides a unified memory interface to conceal the computational and mental overhead of mixed CPU/GPU operation by synchronizing from the CPU host to the GPU device as needed. Compute and communication optimization is achieved by loading data from the disk to a blob in the CPU, calling a CUDA kernel to do the GPU computation, and ferrying the blob off to the next layer. Since the Caffe implementation is completely C++ based, integration with C++ is straightforward. The design allows the efficiency required for scalability.

6.2. Tensorflow

Tensorflow [106] is an interface for expressing and executing machine learning algorithms on a wide variety of heterogeneous systems. The compute graph in Tensorflow represents a data flow computation, with extensions to allow certain kinds of nodes (e.g. Variable) to maintain and update persistent state, coupled with branching and looping control structures within the graph (Fig. 9).

Fig. 9. Distributed subgraph execution with Tensorflow [106].

Resource dependency is optimized by a placement algorithm with registered kernels and operations. It uses simulated activity on each device with a greedy heuristic driven by a cost model to optimize resource allocation at run time. Specifically, the feasible set of devices for each node is first computed, then union-find is applied on the graph, where co-location constraints are used to compute the graph components that must be placed together. However, the heuristics used in the placement algorithm may break down and hinder run time performance when a few tensors hold on to a lot of scarce GPU memory, as gradient nodes are automatically added to the compute graph.

Compute and communication optimization is achieved by defining a cost model that incorporates both computation and communication aspects, and by using Send and Receive nodes to coordinate and transfer data across devices. This allows all communication to be housed inside the Send and Receive implementations, which simplifies the rest of the run time. By handling communication in this manner, it also allows the scheduling of individual nodes of the graph on different devices to be decentralized into workers. Furthermore, control-flow and data-flow based on graph partitioning and rewriting are used to handle cyclic data flows and execute graphs in a distributed, coordinated manner. The compute is optimized by wrapping already-optimized libraries (e.g. BLAS, GPU libraries) for matrix multiplications on different devices as kernels.
must be placed together. However, the heuristics used in the placement devices as kernels.

15
O. Nassef et al. Computer Networks 207 (2022) 108820

structure, the GPU and CPU operators, and the basic parallel primitives
are used to provide the bindings generated using YAML meta-data files.
Different from previously mentioned compute frameworks, PyTorch
implements the eager execution concept to handle the resource de-
pendency. The resource dependency is implemented with a custom
allocator which builds up a cache of CUDA memory and reassigns
it to later allocations without further use of CUDA APIs. The main
allocation strategy is to use asynchronous GPU execution operators by
leveraging the CUDA stream mechanism and overlapping the execution
of Python code on CPU with tensor operators on GPU. It lets the GPU
saturate and reach peak performance. PyTorch also relies on a reference
counting scheme to track the number of uses of each tensor, and frees
the underlying memory immediately once this count reaches zero.
This ensures that memory is released exactly when tensors become
unneeded.
To optimize compute and communication, strict separation between
its control (i.e. program branches, loops) and data flow (i.e. tensors
Fig. 10. MXNet overview [135].
and the operations performed on them) is exercised in PyTorch. The
resolution of the control flow is handled by Python and optimized with
C++ code executed on the host CPU, and results in a linear sequence
6.3. MXNet of operator invocations on the device. Operators can be run either on
CPU or on GPU. The allocator was tuned for the specific memory usage
MXNet is a computation and memory efficient framework that runs patterns of deep learning to improve compute efficiency. The one-
on various heterogeneous systems, ranging from mobile devices to pool-per-stream design assumption simplifies the implementation and
distributed GPU clusters [135]. improves the performance of the allocator. Exceptions to the one stream
Compute graph is declared with multi-output symbolic expressions, design are handled by carefully inserting additional synchronization
where symbolic expressions and tensor operations are embedded in to avoid bad interactions with the allocator. PyTorch extends the
a unified fashion to realize large scale DNN applications. Imperative Python multiprocessing module and automatically moves the data of
tensor computation with NDArray is used to fill the gap between the tensors sent to other processes to shared memory instead of sending it
declarative symbolic expression and the host language. The binded over the communication channel, thus simplifying communication and
symbolic expression (compute graph) is further optimized by simpli- improving overall performance (see Table 8).
fying sub graph operations, leveraging on existing optimized libraries
(e.g. BLAS, GPU libraries) or manually implementing well optimized Remarks. The obvious trend in the development of compute frame-
‘‘big’’ operations, such as a layer in NN (Fig. 10). works is that with the hardware performance significantly improved,
Before evaluation, MXNet transforms the graph to optimize the especially the GPUs, complicated scheduling (e.g. the scheduling of
efficiency and allocate memory to internal variables. To handle re- individual nodes of the graph on different devices to be decentralized
source dependency, resource units are registered with mutable tags into workers or multi-threaded scheduling) might be avoided by priori-
to the dependency engine. The multi thread scheduling process will tizing allocation of critical resources and recycling memory timely with
allocate resources to execute operations pushed to it if dependencies reference counting mechanisms. Separating data and control flows can
are resolved. Two heuristics strategies with linear time complexity also create more flexibility to optimize compute and communication
are designed to implement additional optimizations. The first strategy, both at architecture level and at process/operation serialization level.
called inplace, simulates the procedure of traversing the graph, and Binding with different programming languages and different devices
keeps a reference counter to recycle memory. The second strategy, still remain challenging, especially when new devices and new chip
named co-share, allows two nodes to share a piece of memory if and sets are being developed constantly. However, binding only needs to be
only if they cannot be run in parallel. Exploring co-share imposes implemented once for resource registration. Then the new devices will
one additional dependency constraint. In particular, each time upon be taken into consideration by the optimization algorithms or inherent
scheduling, among the pending paths in the graph, the longest path is heuristics. This means carefully designed end to end process with
found to perform needed memory allocations. potential extendability to future devices will be critical to implement
A distributed key–value store for data synchronization over multiple any distributed machine learning solutions with desired performances.
devices, KVStore, is designed to handle compute and communication
optimization. It supports two primitives: push a key–value pair from 7. Conclusion
a device to the store, and pull the value on a key from the store.
The dependency engine is used to schedule the KVStore operations In this work, we have provided structural overview of distributed
and manage the data consistency. Data synchronization between the machine learning and detailed analysis on the various distributed
devices and inter machine synchronization are managed with a two machine learning approaches, including their variants. Enhancements
level structure to keep the compute and communication efficient. from diverse disciplines such as communication and computation opti-
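The push/pull primitives are easiest to see in a single-process sketch, given below. A real KVStore shards keys across server nodes, runs asynchronously under the dependency engine, and supports pluggable updaters; the in-place SGD updater and the class name TinyKVStore here are illustrative assumptions.

```python
import numpy as np

class TinyKVStore:
    """Single-process stand-in for a distributed key-value store."""
    def __init__(self):
        self.store = {}

    def push(self, key, grad, lr=0.1):
        # fold a pushed gradient into the stored value (SGD-style updater)
        if key not in self.store:
            self.store[key] = np.zeros_like(grad)
        self.store[key] -= lr * grad

    def pull(self, key):
        return self.store[key].copy()

kv = TinyKVStore()
kv.push("w", np.ones(4))         # one device pushes its gradient for key "w"
kv.push("w", 0.5 * np.ones(4))   # another device pushes
print(kv.pull("w"))              # all devices pull the synchronized value
```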
mization, privacy and security have been probed to determine their
6.4. Pytorch feasibility and expediency in ongoing advancements for distributed
machine learning.
PyTorch [137] provides an array-based programming model ac- From a system design perspective, with improved hardware perfor-
celerated by GPUs and is differentiable via automatic differentiation mance, such as the GPUs, complicated scheduling might be avoided
integrated in the Python ecosystem. by prioritizing allocation of critical resources and recycling memory
Instead of a static compute graph, PyTorch uses dynamic eager execution to enable immediate execution of dynamic tensor computations with automatic differentiation and GPU acceleration. The tensor data structure also allows bidirectional exchange of data with external libraries, and PyTorch relies on reference counting to track and free the underlying memory as soon as it is no longer needed.
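A minimal example of this eager execution model, using the standard PyTorch autograd API: every operation executes immediately, and gradients are derived from the recorded operations rather than from a pre-compiled static graph.

    import torch

    # Tensors are created and operated on eagerly; requires_grad=True asks
    # autograd to record the operations performed on x.
    x = torch.ones(2, 2, requires_grad=True)
    y = (x * 3 + 1).sum()   # executes immediately, no graph-build step

    y.backward()            # differentiate through the recorded operations
    print(x.grad)           # d(y)/d(x): a 2x2 tensor filled with 3.0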


From a system design perspective, with improved hardware performance, especially the GPUs, complicated scheduling (e.g. the scheduling of individual nodes of the graph decentralized into workers on different devices, or multi-threaded scheduling) might be avoided by prioritizing the allocation of critical resources and recycling memory in a timely manner with reference-counting mechanisms. Separating data and control flows can also create more flexibility to optimize compute and communication, both at the architecture level and at the process/operation serialization level. Binding with different programming languages and different devices still remains challenging, especially when new devices and new chipsets are being developed constantly. However, binding only needs to be implemented once for resource registration; the new devices will then be taken into consideration by the optimization algorithms or inherent heuristics. This means a carefully designed end-to-end process with potential extendability to future devices will be critical to implementing any distributed machine learning solution with the desired performance. Table 8 compares these compute frameworks.

Table 8
Comparison of compute frameworks.

Caffe [136]
Key idea: ∙ Separation of representation and implementation. ∙ Using blobs for data storage: blobs provide a unified memory interface, holding batches of images (or other data), parameters, or parameter updates.
Proposed solution: ∙ Supports network architectures in the form of arbitrary directed acyclic graphs.
Advantages: ∙ Switching between a CPU and GPU implementation is exactly one function call. ∙ Caffe does all the bookkeeping for any directed acyclic graph of layers, ensuring correctness of the forward and backward passes. ∙ Blobs conceal the computational and mental overhead of mixed CPU/GPU operation by synchronizing from the CPU host to the GPU device as needed.
Disadvantages: ∙ No clear separation of control and data flow.

TensorFlow [106]
Key idea: ∙ Using a directed graph to describe TensorFlow computation. ∙ Abstract computation is represented as operations with attributes.
Proposed solution: ∙ The graph represents a dataflow computation, with extensions for allowing nodes to maintain and update persistent state and for branching and looping control structures. ∙ Distributed coordination mechanism to execute graphs with control flow.
Advantages: ∙ Allows a distributed coordination mechanism to execute graphs with control flow. ∙ Enforces orderings, e.g. controlling the peak memory usage. ∙ Allows flexibility for resource usage and scheduling optimizations.
Disadvantages: ∙ Heuristics used to determine the order of graph execution could break down, causing memory issues and limiting the size of computations.

PyTorch [137]
Key idea: ∙ Allows for bidirectional exchange of data with external libraries. ∙ Strict separation between control and data flow. ∙ One-pool-per-stream design. ∙ Eager execution. ∙ Reference counting.
Proposed solution: ∙ Overlaps the execution of Python code on the CPU with tensor operators on the GPU. ∙ Implements a custom allocator which incrementally builds up a cache of CUDA memory and reassigns it to later allocations, tuned for the specific memory usage patterns of deep learning. ∙ Relies on reference counting to track and free the underlying memory immediately.
Advantages: ∙ Interoperable and extensible. ∙ GPU optimized. ∙ High performance.
Disadvantages: ∙ When GPU resources are not rich or available, the optimization potential is limited. ∙ Memory management depends on the availability of reference counting in the underlying library and language.

MXNet [135]
Key idea: ∙ Unified symbolic expression and tensor operation handling. ∙ Heuristics and graph-optimized memory allocation. ∙ Distributed data synchronization over multiple devices.
Proposed solution: ∙ Uses imperative tensor computation to fill the gap between the declarative symbolic expression and the host language. ∙ Uses a compute graph to evaluate bound symbolic expressions. ∙ Uses a dependency engine to continuously schedule the pushed operations for execution once dependencies are resolved.
Advantages: ∙ Multi-language support. ∙ Computation and memory efficient. ∙ Runs on various heterogeneous systems, ranging from mobile devices to distributed GPU clusters.
Disadvantages: ∙ Designing the heuristics could be difficult when performance and efficiency need to be balanced.

7. Conclusion

In this work, we have provided a structural overview of distributed machine learning and a detailed analysis of the various distributed machine learning approaches, including their variants. Enhancements from diverse disciplines, such as communication and computation optimization, privacy and security, have been probed to determine their feasibility and expediency in ongoing advancements for distributed machine learning.

From a system design perspective, with improved hardware performance, such as the GPUs, complicated scheduling might be avoided by prioritizing allocation of critical resources and recycling memory in a timely manner. Separating data and control flows is also needed to provide enough flexibility in the design to optimize compute and communication.

In terms of communication optimization, maximizing payload reduction with sparsification and quantization is often used to improve inter-device communication efficiency in distributed machine learning. Efficient ways to exchange model information are also critical. Approaches such as grouping, adaptive selection and dynamic adjustment of the model parameters, as well as probabilistic graphical models used to optimize message passing based on network topology, are also effective ways to further improve the communication efficiency and scalability of distributed machine learning algorithms.
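As a concrete sketch of these two payload-reduction techniques, the snippet below combines top-k sparsification with uniform 8-bit quantization of a gradient tensor before transmission; the 1% density and the simple uniform quantizer are illustrative choices rather than a prescription from any single surveyed scheme.

    import torch

    def compress(grad: torch.Tensor, density: float = 0.01):
        """Top-k sparsification followed by uniform 8-bit quantization."""
        flat = grad.flatten()
        k = max(1, int(density * flat.numel()))
        _, idx = torch.topk(flat.abs(), k)   # keep k largest-magnitude entries
        kept = flat[idx]
        scale = kept.abs().max() / 127.0     # uniform int8 quantization step
        q = torch.round(kept / scale).to(torch.int8)
        return idx, q, scale                 # payload: k indices, k bytes, 1 scale

    def decompress(idx, q, scale, shape):
        flat = torch.zeros(torch.Size(shape).numel())
        flat[idx] = q.to(torch.float32) * scale
        return flat.reshape(shape)

    grad = torch.randn(256, 256)
    idx, q, scale = compress(grad)
    restored = decompress(idx, q, scale, grad.shape)
    print(f"transmitted {idx.numel()} of {grad.numel()} gradient entries")

In practice, schemes such as deep gradient compression [58] also accumulate the dropped residual locally and add it to the next round's gradient, which preserves convergence despite the aggressive sparsification.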
Providing a balance between communication and computation resources is a key factor in achieving the promised high performance of wireless networks, which is usually a challenge given heterogeneous resources on the devices as well as communication link limitations. These two aspects impact the accuracy and latency of the distributed machine learning mechanism, which eventually affect the overall performance of wireless networks. Jointly optimized communication and computation algorithms can be considered as a solution, although concepts including energy efficiency, fairness and the cooperation of a large number of network devices should also be considered and evaluated to acquire a robust and well-designed network management and resource allocation mechanism.

In the privacy and security aspect, many forms of privacy-preserving approaches have been proposed for distributed machine learning architectures; however, it is vital not to depend on just one form of security-preserving approach but to combine them. There is a large trade-off between computation overhead and privacy, which inevitably extends the impact on accuracy and communication overhead. As such, the design process must be security and privacy oriented, and at the same time balance the computation and communication overhead so as not to incur an immense burden disrupting the distributed machine learning environment.
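One frequently combined building block is differential privacy layered on top of other protections such as secure aggregation: below is a hedged sketch of the clip-and-noise step used by DP-SGD-style mechanisms [102]; the clipping norm and noise multiplier are illustrative values, and the (epsilon, delta) privacy accounting is omitted.

    import torch

    def privatize_update(update: torch.Tensor, clip_norm: float = 1.0,
                         noise_multiplier: float = 1.1) -> torch.Tensor:
        """Clip an update to a bounded L2 norm, then add calibrated Gaussian
        noise; the per-coordinate noise scales with clip_norm, which is where
        the privacy/accuracy/overhead trade-off appears."""
        scale = torch.clamp(clip_norm / (update.norm() + 1e-12), max=1.0)
        clipped = update * scale
        noise = torch.randn_like(clipped) * noise_multiplier * clip_norm
        return clipped + noise

    client_update = torch.randn(10_000)
    print(privatize_update(client_update).norm())  # bounded signal plus noise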
8. Future work and open challenges

As we discussed in this paper, the future direction of distributed machine learning is highly influenced by different aspects of technology advancements, from computing hardware, cloud and edge servers, and more efficient high-performance software and algorithms, to privacy and security. The development of wireless networks, particularly edge computing, has a significant impact on the scale and areas of application of distributed machine learning algorithms. In this regard, cross-optimization on the allocation of various heterogeneous resources, i.e. communication and computation resources, is still an open research area that deserves more attention.

On the other hand, 5G and beyond wireless networks are required to support near-real-time applications. This means the tolerable delay is in the order of μs. The optimization methods and algorithms for distributed machine learning will be expected to fulfill this latency requirement along with the machine learning performance target in the near future. The design and development of future network architectures will need to be adapted in order to make effective use of these machine learning algorithms. The new structure of network design and management, with considerations of the various requirements for distributed and agile AI/ML approaches, could be another potential future direction.


Other areas to explore for distributed machine learning for 5G include channel allocation policies, asynchronous FL [73], model segmentation, DL architectures on the edge, as well as data and model parallelism [12].

On top of that, future directions of research should also take into account sustainability and environmental friendliness, such as green MEC [41], with cache-enabled MEC as an initial solution alongside mobility management.

The future work highlighted in this article, focused on interdisciplinary areas, is intended to pave the way for further research and investigations. It invites new discussions in the critical areas that will help realize the potential of distributed machine learning for the future 5G network and beyond.

CRediT authorship contribution statement

Omar Nassef: Writing – original draft, Writing – review & editing. Wenting Sun: Writing – original draft, Writing – review & editing. Hakimeh Purmehdi: Writing – original draft, Writing – review & editing. Mallik Tatipamula: Writing – review & editing, Supervision. Toktam Mahmoodi: Writing – review & editing, Supervision.

Declaration of competing interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have an impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.comnet.2022.108820.

Acknowledgment

The authors would like to acknowledge the many valuable discussions and suggestions provided by Arthur Brisebois and Bassant Selim, who contributed to this work.

References

[1] V. Veeravalli, P. Varshney, Distributed inference in wireless sensor networks, Philos. Trans. Ser. A Math. Phys. Eng. Sci. 370 (2012) 100–117.
[2] 5G explained. [Online]. Available: https://www.ericsson.com/en/5g/what-is-5g.
[3] M. Shafi, A.F. Molisch, P.J. Smith, T. Haustein, P. Zhu, P. De Silva, F. Tufvesson, A. Benjebbour, G. Wunder, 5G: A tutorial overview of standards, trials, challenges, deployment, and practice, IEEE J. Sel. Areas Commun. 35 (6) (2017) 1201–1221.
[4] O. Nassef, L. Sequeira, E. Salam, T. Mahmoodi, Building a lane merge coordination for connected vehicles using deep reinforcement learning, IEEE Internet Things J. 8 (4) (2021) 2540–2557.
[5] A. Ghosh, A. Maeder, M. Baker, D. Chandramouli, 5G evolution: A view on 5G cellular technology beyond 3GPP release 15, IEEE Access 7 (2019) 127639–127651.
[6] 5G vs 4G: What is the difference? [Online]. Available: https://www.ericsson.com/en/5g/what-is-5g/5g-vs-4g.
[7] Y. Li, M. Chen, Software-defined network function virtualization: A survey, IEEE Access 3 (2015) 2542–2553.
[8] I. Sarrigiannis, E. Kartsakli, K. Ramantas, A. Antonopoulos, C. Verikoukis, Application and network VNF migration in a MEC-enabled 5G architecture, in: 2018 IEEE 23rd International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), 2018, pp. 1–6.
[9] S. Wang, X. Zhang, Y. Zhang, L. Wang, J. Yang, W. Wang, A survey on mobile edge networks: Convergence of computing, caching and communications, IEEE Access 5 (2017) 6757–6779.
[10] E. Dahlman, S. Parkvall, J. Skold, 5G NR: The Next Generation Wireless Access Technology, Academic Press, 2020.
[11] Y. He, J. Ren, G. Yu, Y. Cai, D2D communications meet mobile edge computing for enhanced computation capacity in cellular networks, IEEE Trans. Wireless Commun. 18 (2019) 1750–1763.
[12] T.K. Rodrigues, K. Suto, H. Nishiyama, J. Liu, N. Kato, Machine learning meets computation and communication control in evolving edge and cloud: Challenges and future perspective, IEEE Commun. Surv. Tutor. 22 (1) (2020) 38–67.
[13] R. van der Meulen, G. Research, What edge computing means for infrastructure and operations leaders, 2018. [Online]. Available: https://www.gartner.com/smarterwithgartner/what-edge-computing-means-for-infrastructure-and-operations-leaders.
[14] N. Hassan, K.-L.A. Yau, C. Wu, Edge computing in 5G: A review, IEEE Access 7 (2019) 127276–127289.
[15] W. Jiang, B. Han, M.A. Habibi, H.D. Schotten, The road towards 6G: A comprehensive survey, IEEE Open J. Commun. Soc. 2 (2021) 334–366.
[16] H. Tataria, M. Shafi, A.F. Molisch, M. Dohler, H. Sjöland, F. Tufvesson, 6G wireless systems: Vision, requirements, challenges, insights, and opportunities, Proc. IEEE (2021) 1–34.
[17] B. Ji, Y. Wang, K. Song, C. Li, H. Wen, V.G. Menon, S. Mumtaz, A survey of computational intelligence for 6G: Key technologies, applications and trends, IEEE Trans. Ind. Inf. (2021) 1.
[18] M. Tahir, M.H. Habaebi, M. Dabbagh, A. Mughees, A. Ahad, K.I. Ahmed, A review on application of blockchain in 5G and beyond networks: Taxonomy, field-trials, challenges and opportunities, IEEE Access 8 (2020) 115876–115904.
[19] M. Nasimi, M.A. Habibi, B. Han, H.D. Schotten, Edge-assisted congestion control mechanism for 5G network using software-defined networking, in: 15th International Symposium on Wireless Communication Systems (ISWCS), 2018, pp. 1–5.
[20] G. Zhao, M.A. Imran, Z. Pang, Z. Chen, L. Li, Toward real-time control in future wireless networks: Communication-control co-design, IEEE Commun. Mag. 57 (2) (2019) 138–144.
[21] M. Katz, M. Matinmikko-Blue, M. Latva-Aho, 6Genesis flagship program: Building the bridges towards 6G-enabled wireless smart society and ecosystem, in: 2018 IEEE 10th Latin-American Conference on Communications (LATINCOM), 2018, pp. 1–9.
[22] C. Zhang, P. Patras, H. Haddadi, Deep learning in mobile and wireless networking: A survey, IEEE Commun. Surv. Tutor. 21 (3) (2019) 2224–2287.
[23] X. Wang, Y. Han, V.C.M. Leung, D. Niyato, X. Yan, X. Chen, Convergence of edge computing and deep learning: A comprehensive survey, IEEE Commun. Surv. Tutor. 22 (2) (2020) 869–904.
[24] W. Shi, J. Cao, Q. Zhang, Y. Li, L. Xu, Edge computing: Vision and challenges, IEEE Internet Things J. 3 (2016) 1.
[25] J. Konečný, H.B. McMahan, D. Ramage, P. Richtárik, Federated optimization: Distributed machine learning for on-device intelligence, 2016.
[26] W.Y.B. Lim, N.C. Luong, D.T. Hoang, Y. Jiao, Y.-C. Liang, Q. Yang, D. Niyato, C. Miao, Federated learning in mobile edge networks: A comprehensive survey, 2019.
[27] Q. Yang, Y. Liu, T. Chen, Y. Tong, Federated machine learning: Concept and applications, 2019.
[28] S. Wang, T. Tuor, T. Salonidis, K.K. Leung, C. Makaya, T. He, K. Chan, Adaptive federated learning in resource constrained edge computing systems, 2018.
[29] D. Ye, R. Yu, M. Pan, Z. Han, Federated learning in vehicular edge computing: A selective model aggregation approach, IEEE Access 8 (2020) 23920–23935.
[30] T. Li, A.K. Sahu, A. Talwalkar, V. Smith, Federated learning: Challenges, methods, and future directions, IEEE Signal Process. Mag. 37 (3) (2020) 50–60. [Online]. Available: http://dx.doi.org/10.1109/MSP.2020.2975749.
[31] P. Vepakomma, O. Gupta, T. Swedish, R. Raskar, Split learning for health: Distributed deep learning without sharing raw patient data, 2018.
[32] V. Praneeth, G. Otkrist, S. Tristan, R. Ramesh, Split learning for health: Distributed deep learning without sharing raw patient data, 2018.
[33] J. Kaur, M.A. Khan, M. Iftikhar, M. Imran, Q. Emad Ul Haq, Machine learning techniques for 5G and beyond, IEEE Access 9 (2021) 23472–23488.
[34] M. Rohini, G. Suganya, N. Selvakumar, D. Shanthi, Survey on machine learning in 5G, Int. J. Eng. Res. Technol. (IJERT) 9 (1) (2020) 569–576.
[35] H. Fourati, R. Maaloul, L. Chaari, A survey of 5G network systems: challenges and machine learning approaches, Int. J. Mach. Learn. Cybern. 12 (2) (2021) 385–431.
[36] J.S. Ng, W.Y.B. Lim, N.C. Luong, Z. Xiong, A. Asheralieva, D. Niyato, C. Leung, C. Miao, A comprehensive survey on coded distributed computing: Fundamentals, challenges, and networking applications, IEEE Commun. Surv. Tutor. 23 (3) (2021) 1800–1837.
[37] J. Verbraeken, M. Wolting, J. Katzy, J. Kloppenburg, T. Verbelen, J.S. Rellermeyer, A survey on distributed machine learning, 2019.
[38] S. Khirirat, S. Magnússon, A. Aytekin, M. Johansson, A flexible framework for communication-efficient machine learning: from HPC to IoT, 2020.
[39] A.E. Eshratifar, M.S. Abrishami, M. Pedram, JointDNN: An efficient training and inference engine for intelligent mobile cloud computing services, IEEE Trans. Mob. Comput. (2019) 1.
[40] Z. Yang, M. Chen, W. Saad, C.S. Hong, M. Shikh-Bahaei, H.V. Poor, S. Cui, Delay minimization for federated learning over wireless communication networks, 2020.
[41] Y. Mao, C. You, J. Zhang, K. Huang, K.B. Letaief, A survey on mobile edge computing: The communication perspective, IEEE Commun. Surv. Tutor. 19 (4) (2017) 2322–2358.


[42] J.H. Ko, T. Na, M.F. Amir, S. Mukhopadhyay, Edge-host partitioning of deep neural networks with feature space encoding for resource-constrained internet-of-things platforms, in: 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2018, pp. 1–6.
[43] P. Mach, Z. Becvar, Mobile edge computing: A survey on architecture and computation offloading, IEEE Commun. Surv. Tutor. 19 (3) (2017) 1628–1656.
[44] Z. Tao, Q. Li, ESGD: Communication efficient distributed deep learning on the edge, in: USENIX Workshop on Hot Topics in Edge Computing, USENIX Association, Boston, MA, 2018. [Online]. Available: https://www.usenix.org/conference/hotedge18/presentation/tao.
[45] M. Li, L. Zhou, Z. Yang, A. Li, F. Xia, D. Andersen, A. Smola, Parameter server for distributed machine learning, 2013.
[46] Y. Shi, K. Yang, T. Jiang, J. Zhang, K.B. Letaief, Communication-efficient edge AI: Algorithms and systems, IEEE Commun. Surv. Tutor. 22 (4) (2020) 2167–2191.
[47] J. Zhang, O. Simeone, LAGC: Lazily aggregated gradient coding for straggler-tolerant and communication-efficient distributed learning, 2019.
[48] Y. Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars, L. Tang, Neurosurgeon: Collaborative intelligence between the cloud and mobile edge, SIGARCH Comput. Archit. News 45 (1) (2017) 615–629. [Online]. Available: https://doi.org/10.1145/3093337.3037698.
[49] S. Shahhosseini, A. Albaqsami, M. Jasemi, S. Hessabi, N. Bagherzadeh, Partition pruning: Parallelization-aware pruning for deep neural networks, 2019, Computing Research Repository (CoRR), abs/1901.11391. [Online]. Available: http://arxiv.org/abs/1901.11391.
[50] J. Kim, Y. Park, G. Kim, S.J. Hwang, SplitNet: Learning to semantically split deep networks for parameter reduction and model parallelization, in: D. Precup, Y.W. Teh (Eds.), Proceedings of the 34th International Conference on Machine Learning, in: Proceedings of Machine Learning Research, vol. 70, PMLR, International Convention Centre, Sydney, Australia, 2017, pp. 1866–1874. [Online]. Available: http://proceedings.mlr.press/v70/kim17b.html.
[51] E. Jeong, S. Oh, H. Kim, J. Park, M. Bennis, S.-L. Kim, Communication-efficient on-device machine learning: Federated distillation and augmentation under non-IID private data, 2018.
[52] J. Yoon, W. Jeong, G. Lee, E. Yang, S.J. Hwang, Federated continual learning with weighted inter-client transfer, 2020.
[53] A. Mitra, S. Bagchi, S. Sundaram, Event-triggered distributed inference, arXiv:2004.01302.
[54] J.-H. Ahn, O. Simeone, J. Kang, Wireless federated distillation for distributed edge learning with heterogeneous data, 2019.
[55] H.-J. Jeong, H.-J. Lee, C.H. Shin, S.-M. Moon, IONN: Incremental offloading of neural network computations from mobile devices to edge servers, in: Proceedings of the ACM Symposium on Cloud Computing, in: SoCC '18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 401–411. [Online]. Available: https://doi.org/10.1145/3267809.3267828.
[56] S. Teerapittayanon, B. McDanel, H. Kung, BranchyNet: Fast inference via early exiting from deep neural networks, 2017.
[57] E. Li, Z. Zhou, X. Chen, Edge intelligence: On-demand deep learning model co-inference with device-edge synergy, in: Proceedings of the 2018 Workshop on Mobile Edge Communications, in: MECOMM'18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 31–36. [Online]. Available: https://doi.org/10.1145/3229556.3229562.
[58] Y. Lin, S. Han, H. Mao, Y. Wang, B. Dally, Deep gradient compression: Reducing the communication bandwidth for distributed training, in: International Conference on Learning Representations, 2018. [Online]. Available: https://openreview.net/forum?id=SkhQHMW0W.
[59] H. Wang, S. Sievert, S. Liu, Z. Charles, D. Papailiopoulos, S. Wright, ATOMO: Communication-efficient learning via atomic sparsification, in: S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (Eds.), Advances in Neural Information Processing Systems, Vol. 31, Curran Associates, Inc., 2018. [Online]. Available: https://proceedings.neurips.cc/paper/2018/file/33b3214d792caf311e1f00fd22b392c5-Paper.pdf.
[60] L. Wang, W. Wang, B. Li, CMFL: Mitigating communication overhead for federated learning, in: 39th International Conference on Distributed Computing Systems (ICDCS), 2019, pp. 954–964.
[61] C. Louizos, K. Ullrich, M. Welling, Bayesian compression for deep learning, in: I. Guyon, U. von Luxburg, S. Bengio, H.M. Wallach, R. Fergus, S.V.N. Vishwanathan, R. Garnett (Eds.), Annual Conference on Neural Information Processing Systems, 2017, pp. 3288–3298. [Online]. Available: https://proceedings.neurips.cc/paper/2017/hash/69d1fc78dbda242c43ad6590368912d4-Abstract.html.
[62] S. Han, H. Mao, W.J. Dally, Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding, 2015.
[63] A. Reisizadeh, A. Mokhtari, H. Hassani, A. Jadbabaie, R. Pedarsani, FedPAQ: A communication-efficient federated learning method with periodic averaging and quantization, 2020.
[64] M. Paskin, C. Guestrin, J. McFadden, A robust architecture for distributed inference in sensor networks, in: Fourth International Symposium on Information Processing in Sensor Networks, 2005, pp. 55–62.
[65] A.G. Schwing, T. Hazan, M. Pollefeys, R. Urtasun, Distributed algorithms for large scale learning and inference in graphical models, PAMI.
[66] A. Dubey, M.M. Zhang, E.P. Xing, S.A. Williamson, Distributed, partially collapsed MCMC for Bayesian nonparametrics, 2020.
[67] Z. Xu, Z. Yang, J. Xiong, J. Yang, X. Chen, ELFISH: Resource-aware federated learning on heterogeneous edge devices, 2019.
[68] S. Manap, K. Dimyati, M.N. Hindia, M.S. Abu Talip, R. Tafazolli, Survey of radio resource management in 5G heterogeneous networks, IEEE Access 8 (2020) 131202–131223.
[69] F. Hussain, S.A. Hassan, R. Hussain, E. Hossain, Machine learning for resource management in cellular and IoT networks: Potentials, current solutions, and open challenges, IEEE Commun. Surv. Tutor. 22 (2) (2020) 1251–1275.
[70] M. Polese, R. Jana, V. Kounev, K. Zhang, S. Deb, M. Zorzi, Machine learning at the edge: A data-driven architecture with applications to 5G cellular networks, IEEE Trans. Mob. Comput. (2020).
[71] Ericsson, Ericsson Uplink Booster. [Online]. Available: https://www.ericsson.com/en/networks/offerings/5g/uplink-booster.
[72] C. Thapa, M.A.P. Chamikara, S. Camtepe, SplitFed: When federated learning meets split learning, 2020, arXiv preprint arXiv:2004.12088.
[73] W.Y.B. Lim, N.C. Luong, D.T. Hoang, Y. Jiao, Y.C. Liang, Q. Yang, D. Niyato, C. Miao, Federated learning in mobile edge networks: A comprehensive survey, IEEE Commun. Surv. Tutor. 22 (3) (2020) 2031–2063.
[74] M. Mohri, G. Sivek, A.T. Suresh, Agnostic federated learning, 2019.
[75] T.T. Anh, N.C. Luong, D. Niyato, D.I. Kim, L.-C. Wang, Efficient training management for mobile crowd-machine learning: A deep reinforcement learning approach, 2018.
[76] H.T. Nguyen, N.C. Luong, J. Zhao, C. Yuen, D. Niyato, Resource allocation in mobility-aware federated learning networks: A deep reinforcement learning approach, 2019.
[77] T. Li, M. Sanjabi, A. Beirami, V. Smith, Fair resource allocation in federated learning, 2020.
[78] T. Nishio, R. Yonetani, Client selection for federated learning with heterogeneous resources in mobile edge, in: ICC 2019 - 2019 IEEE International Conference on Communications (ICC), IEEE, 2019. [Online]. Available: http://dx.doi.org/10.1109/ICC.2019.8761315.
[79] G. Zhu, Y. Wang, K. Huang, Broadband analog aggregation for low-latency federated edge learning (extended version), 2018.
[80] M.M. Amiri, D. Gunduz, Federated learning over wireless fading channels, 2019.
[81] K. Yang, T. Jiang, Y. Shi, Z. Ding, Federated learning via over-the-air computation, 2018.
[82] B. McMahan, E. Moore, D. Ramage, S. Hampson, B.A. y Arcas, Communication-efficient learning of deep networks from decentralized data, in: Artificial Intelligence and Statistics, PMLR, 2017, pp. 1273–1282.
[83] L.U. Khan, S.R. Pandey, N.H. Tran, W. Saad, Z. Han, M.N.H. Nguyen, C.S. Hong, Federated learning for edge networks: Resource optimization and incentive mechanism, 2019.
[84] S. Feng, D. Niyato, P. Wang, D.I. Kim, Y.-C. Liang, Joint service pricing and cooperative relay communication for federated learning, 2018.
[85] Y. Sarikaya, O. Ercetin, Motivating workers in federated learning: A stackelberg game perspective, 2019.
[86] Y. Zhan, P. Li, Z. Qu, D. Zeng, S. Guo, A learning-based incentive mechanism for federated learning, IEEE Internet Things J. (2020) 1.
[87] J. Kang, Z. Xiong, D. Niyato, S. Xie, J. Zhang, Incentive mechanism for reliable federated learning: A joint optimization approach to combining reputation and contract theory, IEEE Internet Things J. 6 (6) (2019) 10700–10714.
[88] W.Y.B. Lim, J.S. Ng, Z. Xiong, J. Jin, Y. Zhang, D. Niyato, C. Leung, C. Miao, Decentralized edge intelligence: A dynamic resource allocation framework for hierarchical federated learning, IEEE Trans. Parallel Distrib. Syst. 33 (3) (2021) 536–550.
[89] W.Y.B. Lim, J.S. Ng, Z. Xiong, D. Niyato, C. Miao, D.I. Kim, Dynamic edge association and resource allocation in self-organizing hierarchical federated learning networks, IEEE J. Sel. Areas Commun. 39 (12) (2021) 3640–3653.
[90] W.Y.B. Lim, J. Huang, Z. Xiong, J. Kang, D. Niyato, X.-S. Hua, C. Leung, C. Miao, Towards federated learning in UAV-enabled internet of vehicles: A multi-dimensional contract-matching approach, IEEE Trans. Intell. Transp. Syst. (2021).
[91] N. Carlini, C. Liu, U. Erlingsson, J. Kos, D. Song, The secret sharer: Evaluating and testing unintended memorization in neural networks, 2018.
[92] A. Bhowmick, J. Duchi, J. Freudiger, G. Kapoor, R. Rogers, Protection against reconstruction and its applications in private federated learning, 2018.
[93] J. Li, M. Khodak, S. Caldas, A. Talwalkar, Differentially private meta-learning, 2019.
[94] R. Shokri, V. Shmatikov, Privacy-preserving deep learning, in: 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2015, pp. 909–910.


[95] A.N. Bhagoji, S. Chakraborty, P. Mittal, S. Calo, Analyzing federated learning through an adversarial lens, 2018.
[96] C. Fung, C.J.M. Yoon, I. Beschastnikh, Mitigating sybils in federated learning poisoning, 2018.
[97] H. Kim, J. Park, M. Bennis, S.-L. Kim, Blockchained on-device federated learning, 2018.
[98] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. McMahan, S. Patel, D. Ramage, A. Segal, K. Seth, Practical secure aggregation for privacy-preserving machine learning, 2017, pp. 1175–1191.
[99] Y. Liu, Z. Ma, X. Liu, S. Ma, S. Nepal, R. Deng, Boosting privately: Privacy-preserving federated extreme boosting for mobile crowdsensing, 2019.
[100] C. Dwork, A. Roth, The algorithmic foundations of differential privacy, Found. Trends Theor. Comput. Sci. 9 (2013).
[101] S. Song, K. Chaudhuri, A.D. Sarwate, Stochastic gradient descent with differentially private updates, in: IEEE Global Conference on Signal and Information Processing, 2013, pp. 245–248.
[102] M. Abadi, A. Chu, I. Goodfellow, H.B. McMahan, I. Mironov, K. Talwar, L. Zhang, Deep learning with differential privacy, in: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016. [Online]. Available: http://dx.doi.org/10.1145/2976749.2978318.
[103] C. Dwork, Differential privacy: A survey of results, TAMC 4978 (2008) 1–19.
[104] R.C. Geyer, T. Klein, M. Nabi, Differentially private federated learning: A client level perspective, 2017.
[105] H.B. McMahan, D. Ramage, K. Talwar, L. Zhang, Learning differentially private recurrent language models, 2017.
[106] M. Abadi, et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org. [Online]. Available: https://www.tensorflow.org/.
[107] Facebook, Opacus, 2020, https://github.com/pytorch/opacus.
[108] R. Bost, R. Popa, S. Tu, S. Goldwasser, Machine learning classification over encrypted data, 2015.
[109] B.D. Rouhani, M.S. Riazi, F. Koushanfar, DeepSecure: Scalable provably-secure deep learning, 2017.
[110] P. Mohassel, M. Rosulek, Y. Zhang, Fast and secure three-party computation, 2015, pp. 591–602.
[111] J. Furukawa, Y. Lindell, A. Nof, O. Weinstein, High-throughput secure three-party computation for malicious adversaries and an honest majority, 2017, pp. 225–255.
[112] P. Mohassel, Y. Zhang, SecureML: A system for scalable privacy-preserving machine learning, in: 2017 IEEE Symposium on Security and Privacy (SP), 2017, pp. 19–38.
[113] A. Gason, P. Schoppmann, B. Balle, M. Raykova, J. Doerner, S. Zahur, D. Evans, Privacy-preserving distributed linear regression on high-dimensional data, Report 2016/892, Cryptology ePrint Archive, 2016.
[114] A. Dalskov, D. Escudero, M. Keller, Secure evaluation of quantized neural networks, 2019.
[115] N. Agrawal, A.S. Shamsabadi, M.J. Kusner, A. Gascón, QUOTIENT: Two-party secure neural network training and prediction, 2019.
[116] P. Mohassel, P. Rindal, ABY3: A mixed protocol framework for machine learning, 2018, pp. 35–52.
[117] V. Chen, V. Pastro, M. Raykova, Secure computation for machine learning with SPDZ, 2019.
[118] A. Acar, H. Aksu, A.S. Uluagac, M. Conti, A survey on homomorphic encryption schemes: Theory and implementation, 2017.
[119] C. Gentry, Fully homomorphic encryption using ideal lattices, in: Proceedings of the Annual ACM Symposium on Theory of Computing, Vol. 9, 2009, pp. 169–178.
[120] Z. Brakerski, Fully homomorphic encryption without modulus switching from classical GapSVP, Proc. Adv. Cryptol.-Crypto 7417 (2012).
[121] Z. Brakerski, C. Gentry, V. Vaikuntanathan, (Leveled) fully homomorphic encryption without bootstrapping, in: Electronic Colloquium on Computational Complexity (ECCC), Vol. 18, 2011, p. 111.
[122] L.T. Phong, Y. Aono, T. Hayashi, L. Wang, S. Moriai, Privacy-preserving deep learning via additively homomorphic encryption, IEEE Trans. Inf. Forensics Secur. 13 (5) (2018) 1333–1345.
[123] V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, N. Taft, Privacy-preserving ridge regression on hundreds of millions of records, in: IEEE Symposium on Security and Privacy, 2013, pp. 334–348.
[124] C. Chen, J. Zhou, L. Wang, X. Wu, W. Fang, J. Tan, L. Wang, X. Ji, A. Liu, H. Wang, C. Hong, When homomorphic encryption marries secret sharing: Secure large-scale sparse logistic regression and applications in risk control, 2020.
[125] Q. Zhang, L.T. Yang, Z. Chen, Privacy preserving deep computation model on cloud for big data feature learning, IEEE Trans. Comput. 65 (5) (2016) 1351–1362.
[126] B. Avent, A. Korolova, D. Zeber, T. Hovden, B. Livshits, BLENDER: Enabling local search with a hybrid differential privacy model, J. Priv. Confid. 9 (2) (2019).
[127] C. Dwork, F. Mcsherry, K. Nissim, A. Smith, Calibrating noise to sensitivity in private data analysis, in: Proceedings of the 3rd Theory of Cryptography Conference, Springer, 2006, pp. 265–284.
[128] L. Melis, G. Danezis, E.D. Cristofaro, Efficient private statistics with succinct sketches, 2016.
[129] N. Agarwal, A.T. Suresh, F. Yu, S. Kumar, H.B. Mcmahan, cpSGD: Communication-efficient and differentially-private distributed SGD, 2018.
[130] A. Bittau, U. Erlingsson, P. Maniatis, I. Mironov, A. Raghunathan, D. Lie, M. Rudominer, U. Kode, J. Tinnes, B. Seefeld, Prochlo, in: Proceedings of the 26th Symposium on Operating Systems Principles, ACM, 2017.
[131] A. Ali, T. Lepoint, S. Patel, M. Raykova, P. Schoppmann, K. Seth, K. Yeo, Communication–computation trade-offs in PIR, Report 2019/1483, Cryptology ePrint Archive, 2019, https://eprint.iacr.org/2019/1483.
[132] M. Chenal, Q. Tang, On key recovery attacks against existing somewhat homomorphic encryption schemes, Report 2014/535, Cryptology ePrint Archive, 2014, https://eprint.iacr.org/2014/535.
[133] E. Roth, D. Noble, B. Falk, A. Haeberlen, Honeycrisp: Large-scale differentially private aggregation without a trusted core, 2019, pp. 196–210.
[134] Y. Sun, J. Liu, J. Wang, Y. Cao, N. Kato, When machine learning meets privacy in 6G: A survey, IEEE Commun. Surv. Tutor. 22 (4) (2020) 2694–2724.
[135] T. Chen, et al., MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems, 2015.
[136] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, T. Darrell, Caffe: Convolutional architecture for fast feature embedding, 2014.
[137] A. Paszke, et al., PyTorch: An imperative style, high-performance deep learning library, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8026–8037. [Online]. Available: http://papers.nips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.

Omar Nassef obtained his M.Sc. degree in Computer Science from King's College London, and is currently undertaking a Ph.D. at the Centre for Telecommunications Research (CTR). His research interests include multipath protocols, machine learning and 5G communications. Omar has contributed to a number of research projects including 5GCAR and ANIARA.

Wenting Sun is a Senior Data Science Manager at Ericsson. She leads a team of data scientists and data engineers developing cutting-edge AI/ML applications in the telecommunication domain and drives some of the AI-related open source initiatives for Ericsson. Wenting has been working in the AI/ML field for the last 17 years and has 10+ peer-reviewed publications. She holds a bachelor's degree in electrical and electronics engineering and a Ph.D. degree in intelligent control.

Hakimeh Purmehdi is a data scientist at the Ericsson Global Artificial Intelligence Accelerator, where she leads innovative AI/ML solutions for future wireless communication networks. She received her Ph.D. degree in electrical engineering from the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada. After completing a postdoc in AI and image processing at the Radiology Department, University of Alberta, she co-founded Corowave, a start-up developing a resilient-to-movement, non-contact and continuous vital-signs monitoring platform and sensor solution, leveraging radio frequency technology and machine learning. Before joining Ericsson, she was with Microsoft Research (MSR) as a research engineer, and collaborated on the development of the TextWorld project, a testbed for reinforcement learning research. Her research focuses on the intersection of wireless communication (5G and beyond, including resource management and edge computing), AI solutions (such as online learning, federated learning, reinforcement learning, deep learning), optimization, and biotech.


Mallik Tatipamula is CTO at Ericsson, leading the evolution of Ericsson's technology and championing the company's next phase of innovation and growth driven by 5G distributed multi-cloud deployments. He also leads O-RAN and 6G research efforts. Prior to Ericsson, he held several leadership positions at F5 Networks, Juniper, Cisco, Motorola, Nortel and IIT Chennai. Since 2011, he has been a visiting professor at King's College London. He is a Fellow of the Canadian Academy of Engineering (CAE) and The Institution of Engineering and Technology (IET). He received UC Berkeley's Garwood Center for Corporate Innovation Award, the "CTO/Technologist of the year" award (sponsored by NTT) from the World Communications Awards (WCA), the "IEEE ComSoc Distinguished Industry Leader Award", the "IET Achievement Medal in Telecommunications" and "CTO of the year" from the Silicon Valley Business Journal (2019–2020). He received his Ph.D., Master's, and Bachelor's degrees from the University of Tokyo, IIT (Chennai), and the NIT, Warangal, India, respectively.

Toktam Mahmoodi received the B.Sc. degree in electrical engineering from the Sharif University of Technology, Iran, and the Ph.D. degree in telecommunications from King's College London, U.K. She was a Visiting Research Scientist with F5 Networks, San Jose, CA, in 2013, a Post-Doctoral Research Associate with the ISN Research Group, Electrical and Electronic Engineering Department, Imperial College from 2010 to 2011, and a Mobile VCE Researcher from 2006 to 2009. She has also worked in the mobile and personal communications industry from 2002 to 2006, and in an R&D team developing the DECT standard for WLL applications. She has contributed to, and led, a number of FP7, H2020 and EPSRC funded projects advancing mobile and wireless communication networks. Toktam is currently Head of the Centre for Telecommunications Research at the Department of Informatics, King's College London. Her research interests include 5G communications, network virtualization, and low latency networking.
