

An Empirical Study of Federated Learning on IoT-Edge Devices: Resource Allocation and Heterogeneity

Kok-Seng Wong*, Member, IEEE, Manh Nguyen-Duc, Khiem Le-Huy, Long Ho-Tuan, Cuong Do-Danh, Member, IEEE, and Danh Le-Phuoc, Member, IEEE

arXiv:2305.19831v1 [cs.LG] 31 May 2023

Abstract—Nowadays, billions of phones, IoT and edge devices around the world generate data continuously, enabling many Machine Learning (ML)-based products and applications. However, due to increasing privacy concerns and regulations, these data tend to reside on devices (clients) instead of being centralized for performing traditional ML model training. Federated Learning (FL) is a distributed approach in which a single server and multiple clients collaboratively build an ML model without moving data away from clients. Whereas existing studies on FL have their own experimental evaluations, most experiments were conducted using a simulation setting or a small-scale testbed. This might limit the understanding of FL implementation in realistic environments. In this empirical study, we systematically conduct extensive experiments on a large network of IoT and edge devices (called IoT-Edge devices) to present FL real-world characteristics, including learning performance and operation (computation and communication) costs. Moreover, we mainly concentrate on heterogeneous scenarios, the most challenging issue in FL. By investigating the feasibility of on-device implementation, our study provides valuable insights for researchers and practitioners, promoting the practicality of FL and assisting in improving the current design of real FL systems.

Index Terms—Federated Learning, IoT-Edge Devices, On-Device Training, Empirical Study.

Fig. 1. The standard FL framework.

I. INTRODUCTION

By the end of 2018, there were an estimated 22 billion IoT devices in use around the world, and this number is increasing fast. Forecasts suggest that by 2030 the number of IoT devices will increase to around 50 billion [1]. Also, 100 billion ARM CPUs, which currently dominate the IoT market, have been shipped so far [2]. This installed base is a key enabler for many industrial and societal domains, especially Artificial Intelligence (AI) and Machine Learning (ML) powered applications [3]. However, due to increasing privacy concerns and regulations [4], especially in sensitive domains like healthcare or finance, these valuable assets mostly remain inaccessible and cannot be centralized for conducting traditional ML model training.

To address this issue, Federated Learning (FL) [5] was proposed, which allows multiple parties (clients) to train a shared global model collaboratively in a decentralized fashion without sharing any private dataset. In general, a standard FL framework, as illustrated in Fig. 1, consists of two main steps: (1) Client training, in which clients train models on their local data for several epochs and send their trained models to a central server, and (2) Model aggregation, in which the server aggregates those models to establish a global model and distributes this global model back to the clients. This two-step procedure is repeated for numerous rounds until the global model converges or a target level of accuracy is reached.

Although FL has recently received considerable attention from the research community [6, 7] thanks to several advantages such as scalability and data privacy protection, it still has many serious challenges that lead to difficulties for real-world implementation. Specifically, clients in a federation differ from each other in terms of computational and communication capacity. For instance, the hardware resources (memory, CPU/GPU, or connectivity) of various IoT and edge devices (IoT-Edge devices) like Raspberry Pi devices or NVIDIA Jetson devices are much different. Therefore, considering all clients equally might lead to suboptimal efficiency. Furthermore, the training data owned by each client can be non-independent and identically distributed (Non-IID), and of different quality and quantity.

Manuscript received X 2023; revised X 2023; accepted X 2023. Date of publication X 2023; date of current version X 2023.
Kok-Seng Wong, Khiem Le-Huy, Long Ho-Tuan, and Cuong Do-Danh are with the College of Engineering and Computer Science, VinUniversity, Hanoi, Vietnam. (e-mail: {wong.ks, khiem.lh, long.ht, cuong.dd}@vinuni.edu.vn)
Manh Nguyen-Duc and Danh Le-Phuoc are with the Technische Universität Berlin, Straße des 17. Juni 135, 10623 Berlin, Germany. (e-mail: [email protected], [email protected])
Kok-Seng Wong, Manh Nguyen-Duc, and Khiem Le-Huy contributed equally to this work.
* Corresponding author: [email protected] (Kok-Seng Wong)

These challenges make FL impractical and limit the motivation of parties to join the federation for training.

Despite the aforementioned real-world issues, most existing studies on FL heavily rely on simulation settings or small-scale testbeds of devices [8, 9, 10] to examine the behavior of their systems. While simulation settings are useful for controlled testing and development of FL models, they face significant challenges in adequately covering all operational aspects of real-world deployments. Specifically, existing simulators cannot emulate crucial aspects of realistic execution environments, such as resource consumption (e.g., memory, CPU/GPU usage, battery life) and network connectivity (e.g., bandwidth and network congestion). These factors significantly impact the performance of FL systems, as demonstrated in Section IV. Additionally, other realistic environment aspects such as data distribution, underlying software libraries, and execution settings introduce further challenges that can affect FL performance. Therefore, this motivates us to conduct more comprehensive evaluations of such aspects to ensure their effectiveness and scalability.

In Section II, we observe a lack of experimental studies that systematically investigate the implementation of FL on real devices and assess the impact of intrinsic heterogeneity on performance and costs. Although there have been some attempts to implement FL on IoT-Edge devices at small scales with simplistic settings, it is desirable to have more reproducible experiments in larger and more realistic settings. Hence, to the best of our knowledge, our study pushes the experiment scale and complexity to a new level.

A. Objectives, Research Questions and Scope

To identify potential issues and limitations on real devices that may not be apparent in simulated environments, we focus our study on the impact of resource allocation and heterogeneity independently, and on their combined effects in realistic environments. To achieve this, we focus on the following research questions (RQ):
• RQ1: What are the behaviors of FL implementation in realistic environments compared to a simulation setting? In this RQ, we compare many simulation and on-device deployment aspects. We want to see how well simulation results can represent reality, because FL experiments conducted in a controlled laboratory setting may not accurately reflect the challenges and complexities of realistic device-based environments.
• RQ2: How do resource allocation and heterogeneity affect the learning performance and operation costs? There are several factors that can affect FL deployment. This RQ focuses on the client participation rate, communication bandwidth, and device and data heterogeneity. We test each factor independently to learn its impact on the behaviors of FL. Specifically, we want to observe the impact of varying the number and type of devices, the bandwidth, and the data distribution on the FL process for each factor.
• RQ3: How do these two factors, resource allocation and heterogeneity, simultaneously affect the learning performance and operation costs? This RQ is an essential study for understanding the impact of the combined factors specified in RQ2. Additionally, we aim to find the dominant factor in the behaviors of FL in a real-world deployment.

To answer these questions, we need stable FL systems that can be deployed on our targeted hardware, i.e., Raspberry Pi 3 (Pi3), Raspberry Pi 4 (Pi4), Jetson Nano (Nano), and Jetson TX2 (TX2), and that can support GPUs on edge computing boards. While many algorithms are accompanied by source code, only Federated Averaging (FedAvg) [5] can satisfy our requirements, owing to its popularity. FedAvg has been extensively studied and evaluated in the literature, with a large number of works reporting its performance characteristics and limitations in simulations. However, understanding of its behavior on real devices is still limited (cf. Section II). Hence, we focus on FedAvg in this paper and leave other algorithms for future work. Nevertheless, our experiment design in Section III is general enough to be replicated with other algorithms, given that their implementations are stable enough to run on the targeted devices.

B. Our Key Findings

In this light, our extensive set of experiments reported in Section IV reveals the following key findings:
• The on-device settings can achieve training accuracy similar to their simulation counterparts, with similar convergence behaviors. However, when it comes to operational behaviors related to computation and communication, the on-device settings show much more complicated behavior patterns in realistic IoT-Edge deployments.
• The disparity in computational and networking resources among the participating devices leads to longer model update (local and global) exchange times, because high-computation devices need to wait for the server to receive and aggregate local updates from low-computation devices. This hints that an oversimplified emulation of these aspects in a simulation setting is highly likely to lead to unexpected outcomes of an FL algorithm at the deployment phase.
• Data heterogeneity is the most dominant factor in FL performance, followed by the number of clients. The performance of the global model is affected most by the data distribution (i.e., Non-IID and Extreme Non-IID) of each participating client, especially for challenging learning tasks. Hence, combined with the disparity in computational and networking resources, FL on diverse IoT-Edge devices in realistic deployment settings needs further understanding of on-device behaviors when all these factors act in tandem.

C. Paper Outline

The rest of this article is organized as follows. Section II presents preliminaries to our work and discusses some existing surveys and empirical studies on FL. In Section III, we present our experimental design, followed by our results and findings in Section IV. Finally, we give further discussions in Section V and conclude this empirical study in Section VI.

II. PRELIMINARIES AND RELATED WORKS

A. Federated Learning

In the standard FL framework, data for learning tasks is acquired and processed locally at the IoT-Edge nodes, and only the trained model parameters are transmitted to the central server for aggregation. In general, along with an initialization stage, FL involves the following stages:
• Stage 0 (Initialization): The aggregation server S first initiates the weight w0 of the global model and hyperparameters such as the number of communication rounds T, the number of selected clients for each round N, and the local training details.
• Stage 1 (Client training): All selected clients C1, C2, C3, ..., CN receive the current global weight from S. Next, each Ci updates its local model parameters w_i^t using its local dataset D_i, where t denotes the current communication round. Upon completion of the local training, all selected clients send their local weights to S for model aggregation.
• Stage 2 (Model Aggregation): S aggregates the received local weights based on a certain mechanism and then sends the aggregated weights back to the clients for the next round of local training.

B. Federated Averaging Algorithm

Federated Averaging (FedAvg) is the de facto FL algorithm that is included in most FL systems [5]. As shown in Algorithm 1, FedAvg aggregates the locally trained model parameters by weighted averaging, proportional to the amount of local data D_i that each client C_i holds (corresponding to Stage 2 above). Note that many advanced FL algorithms with different purposes have been introduced in the last few years (e.g., FedProx [11] and FedMA [12]) [13, 14].

Algorithm 1 FedAvg Algorithm [5].
 1: Aggregation Server executes:
 2:   initialize: w ← w0
 3:   for each round t = 1, 2, 3, ..., T do
 4:     for each client i = 1, 2, 3, ..., N in parallel do
 5:       w_i^t ← w^{t-1}
 6:       w_i^t ← ClientTraining(w_i^t, D_i)
 7:     end for
 8:     // ModelAggregation
 9:     w^{t+1} ← (Σ_{i=1}^{N} n_i w_i^t) / (Σ_{i=1}^{N} n_i)
10:   end for
11:   return: w^T
12:
13: ClientTraining(w_i, D_i):   // Run on client C_i
14:   for each epoch e = 1, 2, 3, ..., E do
15:     w_i ← w_i − η ∇ℓ(w_i; D_i)
16:   end for
17:   return: w_i
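
To make the aggregation step concrete, the weighted average in line 9 of Algorithm 1 can be sketched in a few lines of PyTorch-style Python. This is an illustrative sketch of the averaging rule only, not the exact implementation used in our testbed.

    import torch

    def fedavg_aggregate(client_weights, client_sizes):
        # Weighted average of client state_dicts, proportional to the
        # number of local samples n_i (line 9 of Algorithm 1).
        total = sum(client_sizes)
        agg = {k: torch.zeros_like(v, dtype=torch.float32)
               for k, v in client_weights[0].items()}
        for w, n in zip(client_weights, client_sizes):
            for k in agg:
                agg[k] += (n / total) * w[k].float()
        return agg

Clients holding more data thus pull the global model proportionally harder, which is exactly why the quantity of data held by each client matters in the heterogeneous settings studied below.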
C. Related Works

Several theoretical surveys and simulation-based empirical studies on FL are available in the literature. Nguyen et al. [15] explore and analyze the potential of FL for enabling a wide range of IoT services, including IoT data sharing, data offloading and caching, attack detection, localization, mobile crowdsensing, and IoT privacy and security. Imteaj et al. [16] discuss the implementation challenges and issues when applying FL to an IoT environment. Zhu et al. [17] provide a detailed analysis of the influence of Non-IID data on different types of ML models in both horizontal and vertical FL. Li et al. [18] conduct extensive experiments to evaluate state-of-the-art FL algorithms on Non-IID data silos and find that Non-IID data does bring significant challenges to the learning accuracy of FL algorithms, and that none of the existing state-of-the-art FL algorithms outperforms the others in all cases. Recently, Matsuda et al. [19] benchmark the performance of existing personalized FL methods through comprehensive experiments to evaluate the characteristics of each method and find that there are no champion methods. Caldas et al. [20] propose LEAF, a modular simulation-based benchmarking framework for learning in federated settings. LEAF includes a suite of open-source federated datasets, a rigorous evaluation framework, and a set of reference implementations. To the best of our knowledge, we are the first to consider an empirical study of FL on IoT-Edge devices.

For real-world FL implementation, Wu et al. [8] present FedAdapt, an adaptive offloading FL framework based on reinforcement learning and clustering to identify which layers of the DNN should be offloaded from each device onto a server. Experiments are carried out on a lab-based testbed including two Pi3s, two Pi4s, and one Jetson Xavier. Sun et al. [9] propose a model selection and adaptation system for FL (FedMSA), which includes a hardware-aware model selection algorithm, and demonstrate the effectiveness of their method on a network of two Pi4s and five Nanos. Mills et al. [10] propose adapting FedAvg to use a distributed form of Adam optimization, then test their method on a small testbed of five Pi2s and five Pi3s. Furthermore, Zhang et al. [21] build the FedIoT platform for on-device anomaly data detection and evaluate their platform on a network of ten Pi4s. However, these attempts are still small-scale and do not represent real-world environments.

TABLE I
COMPARISON BETWEEN OUR WORK AND OTHERS.

Empirical Studies | Simulation-based | Device-based, Small-scale (up to 10) | Device-based, Large-scale (up to 64) | Device Heter.*
[18, 19, 20]      | ✓ | ✗ | ✗ | ✗
[21, 9, 10]       | ✓ | ✓ | ✗ | ✗
Ours              | ✓ | ✓ | ✓ | ✓
* Device Heterogeneity: Study of different types of IoT devices

III. EXPERIMENTAL DESIGN

This section describes how we designed our experiments to answer the research questions in Section I-A. Starting with data preparation, we then implement FL on IoT-Edge devices with different settings based on the evaluation factors we defined.

Fig. 2. Our Methodology.

After that, we use a bag of metrics to analyze the impact of these factors individually and their combined effects in different aspects. Fig. 2 illustrates this workflow in detail.

A. Data Preparation and Models

1) Datasets: We use two datasets in this work: CIFAR10 [22] and CIFAR100 [22], which are commonly used in previous studies on FL [11, 18]. CIFAR10 consists of 60000 32x32 color images and is the simpler of the two. The images are labeled with one of 10 exclusive classes. There are 6000 images per class, with 5000 training and 1000 testing images. CIFAR100 also consists of 60000 32x32 color images but is more challenging to train; each image comes with one of 100 fine-grained labels. There are 600 images per class, with 500 training and 100 testing images.

2) Data Partitioning: The CIFAR10 and CIFAR100 datasets are not originally separated for FL, so we need to divide these two datasets synthetically. While the test sets are kept at the server for testing the aggregated model, we divide the training set of each dataset into 64 disjoint partitions with an equal number of samples, in three different ways, to simulate three scenarios of heterogeneity: IID, Non-IID, and Extreme Non-IID (ExNon-IID). The IID strategy adopts an independent and random division; as shown in Fig. 3(a) and 3(b), the data distribution in each client is basically the same. The Non-IID and ExNon-IID strategies use the biased divisions proposed in [5, 23]. Specifically, the whole dataset is sorted according to the labels and divided into different chunks, and these chunks are then randomly assigned to different clients. The number of chunks affects the degree of heterogeneity across clients. As shown in Fig. 3(c)-(f), each client in Non-IID contains approximately four and ten data classes in CIFAR10 and CIFAR100, respectively, while each client in ExNon-IID contains only one and two data classes in CIFAR10 and CIFAR100, respectively, which simulates extreme data heterogeneity across clients. A sketch of this chunk-based division is given below.
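
The helper below is our own illustrative sketch of the label-sorted chunk division; the function name and the chunks_per_client values are assumptions chosen so that the resulting number of classes per client roughly matches Fig. 3 (e.g., one chunk per client for the ExNon-IID split of CIFAR10).

    import numpy as np

    def chunk_partition(labels, num_clients=64, chunks_per_client=4, seed=0):
        # Sort sample indices by label, cut them into equal chunks, and
        # randomly assign chunks_per_client chunks to each client.
        # Fewer chunks per client -> fewer classes per client -> more extreme.
        rng = np.random.default_rng(seed)
        order = np.argsort(labels)
        chunks = np.array_split(order, num_clients * chunks_per_client)
        ids = rng.permutation(len(chunks))
        return [np.concatenate([chunks[c] for c in
                ids[i * chunks_per_client:(i + 1) * chunks_per_client]])
                for i in range(num_clients)]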
3) Model Architecture: Following previous works [5, 20], we study a popular CNN model designed for image classification tasks, called CNN3, on the two datasets. The model includes only two 5x5 convolution layers (the first with 32 channels, the second with 64), each followed by a ReLU activation function and 2x2 max pooling. After that, one fully connected layer with 512 units and ReLU activation is added, followed by a softmax layer as the classifier. The number of output units is 10 for CIFAR10 and 100 for CIFAR100. Owing to its simple architecture, the model does not need massive resources for training, making it suitable for deployment on IoT-Edge devices.
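
Based on this description, CNN3 can be written in PyTorch roughly as below. This is a sketch: the padding and other minor details are our assumptions, and the final softmax is applied implicitly through the cross-entropy loss.

    import torch.nn as nn

    class CNN3(nn.Module):
        def __init__(self, num_classes=10):   # 10 for CIFAR10, 100 for CIFAR100
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=5, padding=2), nn.ReLU(),
                nn.MaxPool2d(2),               # 32x32 -> 16x16
                nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
                nn.MaxPool2d(2),               # 16x16 -> 8x8
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(64 * 8 * 8, 512), nn.ReLU(),
                nn.Linear(512, num_classes),   # softmax via cross-entropy loss
            )

        def forward(self, x):
            return self.classifier(self.features(x))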

Fig. 3. Data distribution of the first 24 clients in the CIFAR10 and CIFAR100 datasets.

TABLE II
HARDWARE SPECIFICATIONS.

Machine     | Memory                         | CPU                                               | GPU                | Connectivity
Pi 3        | 1GB LPDDR2 900 MHz (32 bit)    | 4-Core ARM-A53 1.20 GHz                           | –                  | 100 Mbps
Pi 4        | 8GB LPDDR4 3200 MHz (32 bit)   | 4-Core ARM-A72 1.50 GHz                           | –                  | 1 Gbps
Jetson Nano | 4GB LPDDR4 1600 MHz (64 bit)   | 4-Core ARM-A57 1.43 GHz                           | Maxwell 4GB x1     | 1 Gbps
Jetson TX2  | 8GB LPDDR4 1866 MHz (64 bit)   | 4-Core ARM-A57 2.00 GHz & 2-Core Denver2 2.00 GHz | Pascal 8GB x1      | 1 Gbps
Server      | 2048GB DDR4 2666 MHz (64 bit)  | Intel Xeon Gold 5117 2.00 GHz                     | Tesla V100 16GB x2 | 1 Gbps
Simulation  | 256GB LPDDR4 2666 MHz (64 bit) | Intel Xeon Gold 6242 2.80 GHz                     | RTX 3090 24GB x4   | –

B. Hardware and Software Specifications

In the past few years, many IoT-Edge devices have entered the market with different prices and abilities. In this work, we use the most popular ones, namely the Pi3, Pi4, Nano, and TX2. Different types of devices from different generations have different resources and processing capabilities. A diverse pool of devices helps us more accurately represent the real world. Our devices are connected to a workstation, which is used as the server, via a network of IoT-Edge devices and switches. Fig. 4 is a snapshot of our infrastructure. In more detail, Table II provides the specifications of these devices; the server machine and the simulation machine are also described there.

For the software, we use the PyTorch [24] framework version 1.13.1 to implement the deep learning components, and the Flower [25] framework version 1.11.0 for the FedAvg algorithm. Additionally, we use Docker technology to create a separate container on each device to perform local training.
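
To give a flavor of this stack, a device-side client can be wired up with Flower's NumPyClient interface roughly as below. This is a sketch: the train helper, the server address, and the model/trainloader objects are placeholders (e.g., the CNN3 sketch above and a DataLoader over the local partition), and exact API details may vary across Flower versions.

    import flwr as fl
    import torch

    class CifarClient(fl.client.NumPyClient):
        def __init__(self, model, trainloader):
            self.model, self.trainloader = model, trainloader

        def get_parameters(self, config):
            return [v.cpu().numpy() for v in self.model.state_dict().values()]

        def set_parameters(self, parameters):
            keys = self.model.state_dict().keys()
            self.model.load_state_dict(
                {k: torch.tensor(v) for k, v in zip(keys, parameters)})

        def fit(self, parameters, config):
            self.set_parameters(parameters)
            train(self.model, self.trainloader, epochs=2)  # hypothetical helper
            return self.get_parameters(config), len(self.trainloader.dataset), {}

    # Evaluation is done server-side in our setup, so no evaluate() is required.
    fl.client.start_numpy_client(server_address="<SERVER_IP>:8080",
                                 client=CifarClient(model, trainloader))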
C. Evaluation Metrics

In this study, we use a comprehensive set of metrics to characterize and quantify the impact of heterogeneity factors on the behaviors of FL implementation in realistic environments. Specifically, test accuracy and convergence speed are used to evaluate the learning performance. Averaged training time, memory utilization, and GPU/CPU utilization are used to measure the computational costs. Finally, we use the averaged model update (local and global) exchange time between the clients and the aggregation server to measure the communication cost. Table III provides concise definitions of all the metrics we use.
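
As a sketch of how the computational-cost metrics can be collected on a device, one can sample process statistics around each local training call; the psutil-based snippet below is illustrative (GPU utilization on the Jetson boards would require a separate tool such as tegrastats).

    import time
    import psutil

    def timed_round(train_fn):
        # Wrap one round of local training and report Table III-style costs.
        psutil.cpu_percent(interval=None)            # reset the CPU counter
        start = time.time()
        train_fn()
        return {"train_time_s": time.time() - start,
                "mem_util_pct": psutil.virtual_memory().percent,
                "cpu_util_pct": psutil.cpu_percent(interval=None)}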

TABLE III
EVALUATION METRIC DEFINITIONS.

Category           | Metric                   | Unit       | Definition
Performance        | Test Accuracy            | Percentage | Accuracy of the global model on the test set at the server
Performance        | Convergence Speed        | No. Rounds | The number of communication rounds the global model needs to converge
Computational Cost | Avg Training Time        | Second     | Average local training time per round over all clients
Computational Cost | Avg Memory Utilization   | Percentage | Average memory utilization during training over all clients
Computational Cost | Avg GPU/CPU Utilization  | Percentage | Average GPU/CPU utilization during training over all clients
Communication Cost | Avg Update Exchange Time | Second     | Average time interval per round from when clients send the model to the server until they receive it back

D. Experiments Setup

1) Behaviors of On-Device FL Implementation (RQ1): First of all, we conduct a baseline experiment in simulation. Particularly, we simulate eight clients, where each client holds one of the first eight partitions (12.5% of all partitions) of the CIFAR10 IID dataset. For the training settings, we train the simple CNN3 model described above for 500 communication rounds; at each round, the model is trained for 2 local epochs at the clients, the SGD optimizer is used with a learning rate of 0.01, and the batch size is set to 16 (a sketch of the corresponding client update loop is given below). To answer RQ1 as described in Section I-A, we then turn the simulation environment in the above experiment into realistic environments by sequentially using eight Pi3s, eight Pi4s, and eight Nanos as clients. These devices are connected to a server machine via Ethernet connections. For comparison, all training settings are kept as in the baseline. We use all metrics defined in Table III to describe the behaviors of the FL implementation. The results and conclusions are shown in Section IV-A.
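
For reference, the baseline's local-training settings (2 local epochs of SGD with learning rate 0.01 and batch size 16) correspond to a client update loop along the following lines; this is a sketch under the stated hyperparameters, with the batch size fixed in the DataLoader.

    import torch
    import torch.nn as nn

    def local_update(model, loader, epochs=2, lr=0.01, device="cpu"):
        # One FedAvg round on a client: E epochs of SGD on the local partition.
        model.to(device).train()
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for x, y in loader:                  # DataLoader(..., batch_size=16)
                opt.zero_grad()
                loss_fn(model(x.to(device)), y.to(device)).backward()
                opt.step()
        return model.state_dict()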

2) Impact of Single Factor (RQ2): For RQ2, we consider two critical factors in FL, namely resource allocation and heterogeneity. Resource allocation includes the number of participating clients and the connection's communication bandwidth, and heterogeneity includes device heterogeneity and data heterogeneity (statistical heterogeneity). To explore the impact of these factors, we conduct extensive experiments that are shown in detail in Fig. 5. Training settings are the same as in the baseline experiment of RQ1. By conducting the experiments defined in Fig. 5, we can observe what happens when the number of participating clients increases, when the communication bandwidth is saturated, and when intrinsic heterogeneity is introduced across clients. The results and conclusions for the RQ2 experiments are provided in Section IV-B.

3) Impact of Combined Factors (RQ3): After observing the impact of resource allocation and heterogeneity individually by addressing RQ2, we aim to explore more realistic scenarios where these two factors appear simultaneously. First, we vary the number of participating clients and increase the degree of heterogeneity in client devices concurrently. Second, we again vary the number of participating clients in different data heterogeneity settings (IID, Non-IID, and ExNon-IID) to observe the accuracy and convergence speed. Fig. 6 shows these experiments in detail. Additionally, training settings are the same as in the baseline experiment of RQ1. By conducting these experiments, we expect to gain more valuable insights beyond those gained from RQ2. Also, we aim to figure out the dominant factors in the behaviors of FL in real-device deployment. The results and conclusions for the RQ3 experiments are provided in Section IV-C.

Fig. 4. IoT-Edge Federated Learning Testbed.

IV. EXPERIMENTAL RESULTS

A. Behaviors of On-Device FL Implementation (RQ1)

Table IV provides detailed results of the RQ1 experiments, where we compare real-device FL implementations to the simulation baseline. Details of the experimental setup are described in Section III-D1. Since all four experiments use the same eight partitions of the CIFAR10 IID dataset and the same training details, it is reasonable that the test accuracy and convergence speed in these experiments are consistent. In terms of computational cost, training time increases exponentially when we change the devices from TX2 and Nano to Pi4, and then Pi3. From the resource utilization, Pi3 devices seem to be overloaded even when training a small model like CNN3, while Nano devices can handle the task more easily thanks to GPU support. Additionally, the update exchange time roughly doubles when we change the devices from Nano to Pi4, and then Pi3. These observations raise a need for more efficient FL frameworks that are suitable for low-end devices like the Pi3, and even for the ever more numerous weaker, lower-cost IoT devices and sensors with extremely limited computational capacity.

B. Impact of Single Factor on FL Implementation (RQ2)

In this set of experiments, we observe the results of the RQ2 experiments and analyze what happens when the number of participating clients increases, when the communication bandwidth is constrained, and when intrinsic heterogeneity is introduced across clients.

1) Impact of the Resource Allocation:

Impact of the Number of Clients. Fig. 7 and Fig. 8 show the effect of the number of participating clients on the learning performance and the communication cost. Generally, increasing the number of clients means more data is involved in training the global model, resulting in an improvement in test accuracy. However, this also leads to high diversity across client model parameters, which can slow down the convergence process. We also observe that when the number of clients increases from 32 to 64, the improvement in test accuracy is negligible, while the update exchange time goes up dramatically. From this observation, we can empirically verify the assumption that more participating clients do not guarantee better accuracy but can lead to large congestion in communication and increase the update exchange time. In this setting, it is easy to observe that 32 is the optimal number of participating clients. Therefore, we use only 32 clients in the remaining RQ2 experiments.

Fig. 5. Experiments Setup for Studying the Impact of Single Factor (RQ2).

Fig. 6. Experiments Setup for Studying the Impact of Combined Factors (RQ3).

Impact of the Communication Bandwidth. Next, we investigate the effect of the connection bandwidth on the update exchange time. One interesting point obtained from Fig. 9 is that the update exchange time increases substantially as we decrease the bandwidth. Specifically, when we halve the bandwidth from 100Mbps to 50Mbps, the update exchange time increases approximately 4 times. Furthermore, it increases about 8 times when the bandwidth is constrained by a factor of four, from 100Mbps to 25Mbps. This observation motivates FL algorithms that are suitable for low-bandwidth systems.
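
A back-of-envelope calculation illustrates this sensitivity. The CNN3 sketch above has roughly 2.2M parameters, i.e., about 8.6 MB of float32 weights that must travel downlink and uplink every round. The numbers below are our own illustrative estimates of the transfer floor on an ideal link; the measured exchange times are higher because they also include server-side waiting and congestion.

    params = 2_156_000                 # approx. parameter count of the CNN3 sketch
    size_mb = params * 4 / 1e6         # float32 payload per direction, ~8.6 MB
    for bw_mbps in (100, 50, 25):
        t = 2 * size_mb * 8 / bw_mbps  # download + upload, ideal link
        print(f"{bw_mbps} Mbps -> at least {t:.1f} s of pure transfer per round")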

TABLE IV
BEHAVIORS OF ON-DEVICE FL IMPLEMENTATION (RQ1).

Hardware   | Test Accuracy | Convergence Speed (no. rounds) | Avg Training Time (s) | Avg Memory Util. (%) | Avg GPU/CPU Util. (%) | Avg Update Exchange Time (s)
Pi 3       | 0.662 | 322 | 161.148 | 40.851 | – / 73.188      | 52.471
Pi 4       | 0.664 | 338 | 22.739  | 11.414 | – / 42.312      | 25.523
Nano       | 0.660 | 339 | 5.211   | 77.213 | 56.177 / 11.309 | 12.915
Simulation | 0.667 | 314 | 1.524   | 13.363 | 11.118 / 0.578  | 3.949
GPU/CPU utilization is reported as GPU % / CPU %; "–" indicates that the device has no GPU.

2) Impact of the Heterogeneity:

Impact of the Device Heterogeneity. Following the experiments in Fig. 5, we investigate the impact of heterogeneity across client devices. From Table V below, we can observe that in a federation of heterogeneous devices, more powerful devices such as the Nano or TX2 need only a couple of seconds to finish local training, while weaker devices like the Pi3 and Pi4 need much longer. However, in a naive FedAvg framework, the server needs to wait for all clients regardless of their strength, which is the reason why the update exchange time of more powerful devices is higher than that of weaker devices; this diminishes all the benefits that high-end devices bring. This observation suggests a need for better client selection strategies, based on the clients' computational power, in realistic systems to leverage the presence of high-end devices.

TABLE V
IMPACT OF THE DEVICE HETEROGENEITY.

Exps      | Devices | Avg Training Time (s) | Avg Update Exchange Time (s)
Exp 2.1.3 | 32 Pi 3 | 161.872 | 87.090
Exp 2.3.1 | 16 Pi 3 | 166.641 | 72.826
          | 16 Pi 4 | 22.715  | 216.448
Exp 2.3.2 | 12 Pi 3 | 170.227 | 65.391
          | 12 Pi 4 | 22.687  | 215.620
          | 4 Nano  | 4.971   | 233.224
          | 4 TX2   | 4.126   | 234.161

Fig. 7. Impact of the Number of Clients on Test Accuracy.
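
The waiting effect can be made concrete with a toy model of a synchronous round: the server cannot aggregate until the slowest client finishes, so a fast device's idle time per round is bounded below by the gap in training times. The sketch below plugs in the average training times of Exp 2.3.2 from Table V (our own illustrative calculation, ignoring transfer times).

    train_time = {"Pi3": 170.2, "Pi4": 22.7, "Nano": 5.0, "TX2": 4.1}  # s, Table V
    round_time = max(train_time.values())  # synchronous FedAvg waits for stragglers
    idle = {d: round(round_time - t, 1) for d, t in train_time.items()}
    print(idle)  # e.g., a TX2 sits idle for roughly 166 s of every round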
Fig. 8. Impact of the Number of Clients on Update Exchange Time.
Impact of the Data Heterogeneity. Heterogeneous data, or distribution shift, is the most challenging issue in FL. Most existing works on this issue only consider conventional Non-IID data scenarios. As discussed above, in this study we further explore extreme cases of heterogeneity, i.e., ExNon-IID. Figs. 10(a) and 10(b) show the effect of data heterogeneity on FL for the CIFAR10 and CIFAR100 datasets, respectively. As observed from these results, ExNon-IID scenarios degrade the accuracy on the test sets significantly compared to the IID and Non-IID cases. Additionally, ExNon-IID scenarios tend to cause periods of fluctuation during training and slow down the convergence process. This suggests that the development of FL algorithms needs to tackle not only Non-IID cases but also ExNon-IID ones.

Fig. 9. Impact of the Bandwidth on Update Exchange Time.

Fig. 10. Impact of the Data Heterogeneity.

In summary, we have found that increasing the number of participating clients generally leads to an improvement in accuracy, due to the increase in data samples used for training. However, when we substantially increase the number of clients (i.e., from 32 to 64), the improvement is not significant, while the update exchange time goes up dramatically. Moreover, data heterogeneity also affects the global model's accuracy significantly, especially in ExNon-IID cases. Besides heterogeneity in the labels of the local datasets, other types of data heterogeneity, such as quantity heterogeneity or distribution heterogeneity, are also important and might degrade the model's accuracy much further; however, these types of data heterogeneity are still under-explored. In addition, the update exchange time is directly affected by the communication bandwidth. Also, we show that better client selection strategies are essential when dealing with heterogeneous devices, in order to leverage the presence of high-end devices and reduce the update exchange time. However, this is quite challenging in a real deployment, where the distributions of computing power and data are not known a priori and cannot be simulated in a controlled setting.

C. Impact of Combined Factors on FL Implementation (RQ3)

This part reports the experimental results of RQ3 and draws insights for when the two factors, resource allocation and heterogeneity, appear simultaneously. Also, we aim to figure out the dominant factors in the FL behaviors in real-device deployment.

Combined Impact of the Number of Clients and Device Heterogeneity. We focus on investigating the effect of the number of clients and of device heterogeneity across clients on the update exchange time. Fig. 11 shows the average update exchange time of each type of device used in experiments 3.1.4 to 3.1.6. By comparing these results with the results in Fig. 8 and Table V, we can draw a fascinating insight: with the same number of clients, heterogeneity in the federation can help reduce the overall update exchange time, and this gap seems more significant with a smaller number of clients. Unlike in homogeneous scenarios, where clients mostly finish local training and upload their local models to the server simultaneously, which causes considerable congestion, in heterogeneous scenarios clients with more powerful devices complete their work earlier, followed sequentially by the weaker devices. This helps reduce the congestion in communication. These observations also suggest that a large number of clients and the resulting congestion have a significantly negative effect on the update exchange time, and they raise a need for novel FL algorithms capable of handling situations with massive numbers of clients.

Fig. 11. Combined Impact of the Number of Clients and Device Heterogeneity on Update Exchange Time.

Combined Impact of the Number of Clients and Data Heterogeneity. We continue by studying the effect of the number of clients and data heterogeneity simultaneously. Fig. 12 shows the test accuracy of the global model in experiments 3.2.1 to 3.2.16. From Fig. 12(a), 12(c), and 12(e), we can see that when increasing the number of clients from 32 to 64, the improvement in the IID case is negligible. However, the improvement is more significant in the Non-IID and ExNon-IID cases, which means that a large number of participating clients is essential in heterogeneous data scenarios. Moreover, the negative effect of ExNon-IID data on the more challenging dataset, CIFAR100, seems more serious. Therefore, we can conclude that data heterogeneity is the most dominant factor in the model's test accuracy, especially on challenging datasets.

Fig. 12. Combined Impact of the Number of Clients and Data Heterogeneity.

In summary, we have found that the communication congestion caused by a large number of clients has a significant negative effect on the update exchange time. However, increasing the number of clients leads to improvements in accuracy, especially in heterogeneous data scenarios. Also, data heterogeneity is the most dominant factor affecting the model's test accuracy, especially on challenging datasets. Going beyond the fundamental image classification task, data heterogeneity might hurt the model's performance even further in other advanced tasks, such as object detection or segmentation, which are under-explored in the current literature. Interestingly, we also observe that homogeneous devices can sometimes behave differently. This may be caused by various implicit factors such as the power supply, network conditions, hardware and software variations, or user behavior.

V. DISCUSSIONS

In this section, we first discuss the practicality of FL on IoT-Edge devices (based on our experimental results) and then discuss other essential factors to consider while designing an FL system for IoT devices.

A. Practicality of FL on IoT-Edge Devices

FL requires local processing on the device, which can be challenging on lightweight devices with limited processing power. In addition, storing the model updates locally can be challenging due to the limited storage capacity. Another challenge is the unreliable connectivity of IoT devices. Federated learning requires a stable and reliable network connection for devices to communicate with each other and with the aggregation server. However, IoT-Edge devices are often deployed in remote locations with limited network connectivity.

In this study, we observed that the practicality of FL on IoT-Edge devices depends on the combined effects of various factors such as device availability (number of participating clients), communication constraints (bandwidth availability), and the heterogeneity of data (data distribution) and devices (computational capability and hardware configuration). These factors are interdependent and affect each other; hence, a comprehensive analysis of the practicality of FL on IoT devices should consider all these factors together. For example, the computational capability of devices can affect communication overhead, as devices with lower computational capability may take longer to process and transmit data, resulting in higher communication latency and overhead. Similarly, the heterogeneity of devices can affect the robustness of FL algorithms, as the presence of devices with varying characteristics can introduce heterogeneity in the data and make it challenging to train accurate models.

To address the processing power and storage capacity issues, we need to design models that are optimized for lightweight devices and implement compression or distillation techniques to reduce the size of the updates. There is also a need to implement techniques such as asynchronous updates and checkpointing to ensure that the training process can continue even when devices are disconnected due to network connectivity issues; a minimal sketch of the latter is given below.
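
As a minimal sketch of the checkpointing idea (one reasonable realization, not a prescribed design), a client can persist the last completed round and model state after every update so that training resumes after a disconnect; the checkpoint path here is a hypothetical example.

    import os
    import torch

    CKPT = "fl_client_ckpt.pt"        # hypothetical path on the device's storage

    def save_checkpoint(model, round_idx):
        torch.save({"round": round_idx, "state": model.state_dict()}, CKPT)

    def load_checkpoint(model):
        # Resume from the last completed round, or start fresh at round 0.
        if os.path.exists(CKPT):
            ckpt = torch.load(CKPT)
            model.load_state_dict(ckpt["state"])
            return ckpt["round"] + 1
        return 0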
B. Other Considerable Factors

Besides the factors studied in this work, when designing FL systems it is essential to consider other factors that can cause IoT devices not to perform well in FL, such as the power supply of the devices, the specifications of the memory cards, and the performance of the aggregation server.

1) Power Supply: The amount of power available to the device can impact its processing capability. If the device has a limited power supply, it may not be able to perform complex computations or transmit large amounts of data efficiently. Furthermore, the quality and reliability of the power supply can affect the device's stability and longevity. Power surges or outages can damage the device's components, leading to reduced performance and potentially even complete failure. As shown in [26], when the battery life of the devices decreased, the accuracy of the global model also decreased significantly. Hence, it is crucial to ensure that devices used in FL have access to a reliable power supply with sufficient capacity to handle the demands of the learning process.

2) Memory Card Usage: The speed and capacity of the memory card can indirectly affect the overall performance of the IoT device itself. If the memory card is slow or has limited capacity, it may result in slower data processing and storage, slowing down the overall FL process. Also, the reliability and durability of the memory card can impact FL performance. For instance, if the memory card fails or becomes corrupted, it can result in the loss of data, which can negatively impact the accuracy and effectiveness of the FL model.

3) Performance of the Aggregation Server: The performance of the aggregation server is crucial to the success of the FL process and can have a significant impact on the participating IoT devices. The aggregation server needs to have sufficient computational resources to process the incoming model updates from the IoT devices. If the server is overloaded, this can cause delays or even crashes in the system, affecting the IoT devices involved. This can be particularly problematic if the IoT devices have limited resources themselves, as they may not be able to handle the increased workload.

VI. CONCLUSIONS AND FUTURE WORKS

The results of our experiments have revealed several important findings: (1) our simulation of FL has shown that it can be a valuable tool for algorithm testing and evaluation, but its effectiveness in accurately representing the reality of IoT-Edge deployment is very limited; (2) the disparity in computational resources among IoT devices can significantly impact the update exchange time; and (3) data heterogeneity is the most dominant factor in the presence of other factors, especially when working in tandem with computation and network factors.

Moving forward, several areas could be explored to expand on the findings of this study. Firstly, considering the diversity of devices used in FL, it would be valuable to test the approach on a more comprehensive range of devices with different hardware, operating systems, and network connections to ensure the effectiveness and robustness of the approach. Secondly, the dataset selection process used for training the FL model could be further optimized to increase accuracy and efficiency, and to ensure that the results represent all potential use cases. Additionally, to expand the scope of the study's findings, exploring other FL algorithms beyond the standard FedAvg algorithm could be beneficial. These alternative algorithms could be better suited to specific scenarios or applications and may provide insights into how to improve the performance of FL on IoT-Edge devices. Lastly, the study may miss out on the potential benefits of other FL algorithms that are better suited for specific scenarios or applications. For instance, FedProx [11] is designed to handle heterogeneity in data across devices and can improve the convergence rate of the FL process. It is important to note that these future improvements do not affect the objectives and scope of the current study.
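
Concretely, FedProx modifies the local objective of client i with a proximal term that keeps local updates close to the current global model w^t, as formulated in [11]:

    \min_{w} \; h_i(w; w^t) = F_i(w) + \frac{\mu}{2}\, \lVert w - w^t \rVert^2,

where F_i is client i's local empirical loss and µ ≥ 0 controls the strength of the proximal term (µ = 0 recovers FedAvg's local update).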
In particular, we plan to extend our study to a broader range of scenarios by examining the impact of varying network conditions, communication protocols, and the resource usage of FL. In addition, we want to conduct a comprehensive analysis to measure the resource consumption of FL, including battery life and network bandwidth usage. We also want to focus on real-world applications of FL on IoT devices, including developing FL-based solutions for specific IoT use cases such as environmental monitoring and predictive maintenance, and evaluating their performance in realistic environments.

REFERENCES

[1] Internet of Things (IoT) and Non-IoT Active Device Connections Worldwide from 2010 to 2025. https://www.statista.com/statistics/1101442/iot-number-of-connected-devices-worldwide.
[2] Enabling Mass IoT Connectivity as Arm Partners Ship 100 Billion Chips. https://community.arm.com/arm-community-blogs/b/internet-of-things-blog/posts/enabling-mass-iot-connectivity-as-arm-partners-ship-100-billion-chips.
[3] Christoph Gröger. There Is No AI Without Data. Commun. ACM, 64(11):98–108, 2021.
[4] Paul Voigt and Axel von dem Bussche. The EU General Data Protection Regulation (GDPR). Springer Cham, 2017.
[5] H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 2017.
[6] Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. Federated Machine Learning: Concept and Applications. ACM Trans. on Intelligent Systems and Technology, 2019.
[7] Qinbin Li, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, Yuan Li, Xu Liu, and Bingsheng He. A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection. IEEE Trans. on Knowledge and Data Engineering, 2021.
[8] Di Wu, Rehmat Ullah, Paul Harvey, Peter Kilpatrick, Ivor Spence, and Blesson Varghese. FedAdapt: Adaptive Offloading for IoT Devices in Federated Learning. IEEE Internet of Things Journal, 9(21):20889–20901, 2022.
[9] Rui Sun, Yinhao Li, Tejal Shah, Ringo W. H. Sham, Tomasz Szydlo, Bin Qian, Dhaval Thakker, and Rajiv Ranjan. FedMSA: A Model Selection and Adaptation System for Federated Learning. Sensors, 22(19), 2022.
[10] Jed Mills, Jia Hu, and Geyong Min. Communication-Efficient Federated Learning for Wireless Edge Intelligence in IoT. IEEE Internet of Things Journal, 7(7):5986–5994, 2020.

[11] Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. Federated Optimization in Heterogeneous Networks. In I. Dhillon, D. Papailiopoulos, and V. Sze, editors, Proceedings of Machine Learning and Systems, pages 429–450, 2020.
[12] Hongyi Wang, Mikhail Yurochkin, Yuekai Sun, Dimitris Papailiopoulos, and Yasaman Khazaeni. Federated Learning with Matched Averaging. In International Conference on Learning Representations, 2020.
[13] Huiming Chen, Huandong Wang, Qingyue Long, Depeng Jin, and Yong Li. Advancements in Federated Learning: Models, Methods, and Privacy. arXiv, 2023.
[14] Bingyan Liu, Nuoyan Lv, Yuanchun Guo, and Yawen Li. Recent Advances on Federated Learning: A Systematic Survey. arXiv, 2023.
[15] Dinh C. Nguyen, Ming Ding, Pubudu N. Pathirana, Aruna Seneviratne, Jun Li, and H. Vincent Poor. Federated Learning for Internet of Things: A Comprehensive Survey. IEEE Communications Surveys and Tutorials, 23, 2021.
[16] Ahmed Imteaj, Urmish Thakker, Shiqiang Wang, Jian Li, and M. Hadi Amini. A Survey on Federated Learning for Resource-Constrained IoT Devices. IEEE Internet of Things Journal, 9, 2022.
[17] Hangyu Zhu, Jinjin Xu, Shiqing Liu, and Yaochu Jin. Federated Learning on Non-IID Data: A Survey. Neurocomputing, 465:371–390, 2021.
[18] Qinbin Li, Yiqun Diao, Quan Chen, and Bingsheng He. Federated Learning on Non-IID Data Silos: An Experimental Study. In 2022 IEEE 38th International Conference on Data Engineering (ICDE), 2022.
[19] Koji Matsuda, Yuya Sasaki, Chuan Xiao, and Makoto Onizuka. An Empirical Study of Personalized Federated Learning. arXiv, 2022.
[20] Sebastian Caldas, Sai Meher Karthik Duddu, Peter Wu, Tian Li, Jakub Konečný, H. Brendan McMahan, Virginia Smith, and Ameet Talwalkar. LEAF: A Benchmark for Federated Settings. In Workshop on Federated Learning for Data Privacy and Confidentiality, 2019.
[21] Tuo Zhang, Chaoyang He, Tianhao Ma, Lei Gao, Mark Ma, and Salman Avestimehr. Federated Learning for Internet of Things. In Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, SenSys '21, pages 413–419, New York, NY, USA, 2021. Association for Computing Machinery.
[22] Alex Krizhevsky. Learning Multiple Layers of Features from Tiny Images. Technical Report, University of Toronto, 2009.
[23] Yue Zhao, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, and Vikas Chandra. Federated Learning with Non-IID Data. arXiv, 2018.
[24] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems, 32, 2019.
[25] Daniel J. Beutel, Taner Topal, Akhil Mathur, Xinchi Qiu, Javier Fernandez-Marques, Yan Gao, Lorenzo Sani, Kwing Hei Li, Titouan Parcollet, Pedro Porto Buarque de Gusmão, et al. Flower: A Friendly Federated Learning Research Framework. arXiv:2007.14390, 2020.
[26] Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Konečný, Stefano Mazzocchi, Brendan McMahan, et al. Towards Federated Learning at Scale: System Design. Proceedings of Machine Learning and Systems, 1:374–388, 2019.

BIOGRAPHY SECTION

Kok-Seng Wong (Member, IEEE) received his first degree in Computer Science (Software Engineering) from the University of Malaya, Malaysia, in 2002, and an M.Sc. (Information Technology) degree from the Malaysia University of Science and Technology (in collaboration with MIT) in 2004. He obtained his Ph.D. from Soongsil University, South Korea, in 2012. He is currently an Associate Professor in the College of Engineering and Computer Science, VinUniversity. To this end, he conducts research that spans the areas of security, data privacy, and AI security while maintaining a strong relevance to the privacy-preserving framework.

Duc-Manh Nguyen received a Master's degree in Information Science and Technology from the University of Information Science and Technology, North Macedonia. Currently, he is a PhD candidate and a research assistant at the Technical University of Berlin. His research focuses on Robotics and Edge Computing with Machine Learning, particularly Cooperative Perception for Autonomous Vehicles.

Khiem Le-Huy received an Honors Bachelor's degree in Mathematics and Computer Science from the Vietnam National University, Ho Chi Minh City. He was a Research Intern at the Smart Health Center, VinBigData JSC, and is currently a Research Assistant at the College of Engineering and Computer Science, VinUniversity, Hanoi, Vietnam. His research interests include Efficient Machine Learning and AI for Biomedical Applications.

Long Ho-Tuan received an Honors Bachelor's degree in Computer Science from the Vietnam National University, Hanoi, Vietnam. Currently, he is a Research Assistant at the College of Engineering and Computer Science, VinUniversity, Hanoi, Vietnam. His research interests include Federated Learning and AI for Biomedical Applications.

Cuong Do-Danh (Member, IEEE) received his B.Sc. degree in electronics and telecommunication from Vietnam National University, Hanoi, Vietnam, in 2004, an M.Eng. degree in electronics from Chungbuk National University, Korea, in 2007, and a Ph.D. degree in electronics from Cork Institute of Technology, Ireland, in 2012. He spent more than four years as a postdoctoral researcher at the University of Cambridge, U.K., working in the fields of both MEMS and CMOS circuits for low-power sensors and timing applications. He is now an Assistant Professor at VinUniversity. His research interests include sensors for medical applications and sensor fusion.

Danh Le-Phuoc (Member, IEEE) is a DFG Principal Investigator at the Technical University of Berlin. His research interests include linked data and semantic computing for the IoT-Edge-Cloud continuum, neural-symbolic AI, databases, pervasive computing, and semantic stream processing and reasoning. Le-Phuoc received a PhD in computer science from the National University of Ireland. Contact him at [email protected].

VII. APPENDIX A: POWER AND STORAGE

The testbed utilized in this research project consists of a diverse array of devices, including the Raspberry Pi 3, the Raspberry Pi 4, and various models from the NVIDIA Jetson family. These devices are equipped with different types of storage, which come in varying capacities and speeds. Additionally, the devices are powered by a variety of power supplies, with power outputs ranging from 7.5W to 15W. The detailed specifications are shown in Figure 13.

Fig. 13. Device power and storage specifications.



VIII. APPENDIX B: COMPARING THE CONVERGENCE TIME OF DIFFERENT HARDWARE PROFILES

This appendix presents the results of an experiment conducted to compare the convergence time of different hardware profiles in Federated Learning. The experiment involved training models on three distinct sets of devices: 8 Raspberry Pi 3 devices, 8 Raspberry Pi 4 devices, and 8 simulation threads on a high-end computer. The experiment used CIFAR10 IID data and monitored the progress of the models on different metrics, namely test accuracy, test loss, and convergence speed. The three sub-figures in Figure 14 provide a visual representation of the performance differences observed among the hardware profiles.

(a) Test Accuracy (b) Test Loss (c) Convergence Time

Fig. 14. Experiment of Federated Learning on different sets of devices. Setup with 8 Raspberry Pi 3 vs 8 Raspberry Pi 4 vs 8 simulation threads of a high-end computer on CIFAR10 IID data.

IX. APPENDIX C: THE IMPACT OF THE NUMBER OF CLIENTS

This appendix presents the findings from an experiment conducted to examine the impact of the number of clients on Federated Learning. The experiment involved training models on Raspberry Pi 3 devices using CIFAR10 IID data, with the number of clients gradually increasing from 8 to 16, 32, and finally 64.

(a) Test Accuracy (b) Test Loss (c) Convergence Time

Fig. 15. Experiment of Federated Learning with the number of clients increasing from 8 and 16 to 32 and 64. Setup on Raspberry Pi 3 devices using CIFAR10 IID data.

X. APPENDIX D: HETEROGENEITY IN DATA AND DEVICE PROFILES

This appendix examines how differences in both data and device profiles impact Federated Learning. The first figure compares the performance of models trained on different types of data: identical (IID), non-identical (Non-IID), and extremely non-identical (extreme Non-IID). We used 32 Raspberry Pi 4 devices and CIFAR100 data for this experiment. The figure shows how the models perform under these varying data distributions, highlighting the challenges posed by diverse data.

The second figure explores the performance differences among diverse device profiles. We used three setups of 32 devices each: 32 Raspberry Pi 3, 32 Raspberry Pi 4, and a mix of 32 devices of different types. The models were trained on CIFAR10 identical (IID) data. This figure compares the performance achieved by these different device profiles, giving insights into the advantages and limitations of diversity in Federated Learning scenarios.

(a) Test Accuracy (b) Test Loss (c) Convergence Time

Fig. 16. Comparison between IID, Non-IID, and extreme Non-IID data. Setup on 32 Raspberry Pi 4 devices and CIFAR100 data.

(a) Test Accuracy (b) Convergence Time

Fig. 17. Heterogeneous devices comparison. Setup on 32 Raspberry Pi 3 vs 32 Raspberry Pi 4 vs a mix of 32 devices consisting of 12 Raspberry Pi 3 + 12 Raspberry Pi 4 + 4 NVIDIA Jetson TX2 + 4 NVIDIA Jetson Nano, using CIFAR10 IID data.
