Heuristic Based Federated Learning With Adaptive Hyperparameter Tuning For Households Energy Prediction
Energy prediction is significant for modern power grids, ensuring their efficient operation, mitigating instability,
and optimizing resource allocation and renewable energy source integration1. In recent years, progress has been
made in ML forecasting for energy prediction2,3. The accuracy and reliability of energy forecasts have been
improved by leveraging sophisticated models and large datasets to anticipate demand and supply fluctuations
more precisely. However, large amounts of data are utilized in the training process to create effective prediction
models. Since household energy data contains sensitive information about individuals’ behaviours, ensuring
privacy in learning while still achieving good performance is an open research topic4. Even with strong privacy
and security guarantees, household residents are often reluctant to grant access to their energy data for
storage in centralized cloud silos, where it can be further processed and used for model training purposes5.
Recently, Federated Learning (FL) has emerged as a promising approach in the field of energy prediction,
particularly for electrical load forecasting. It enables local prediction model training on data collected
and stored on household devices at the edge and offers advantages for training models on distributed data,
including improved efficiency and enhanced data privacy. Taïk et al.6 conducted one of the first studies on
electrical load forecasting using edge computing and FL. They employed Long short-term memory (LSTM)
in a federated scenario to predict residential load for 200 houses in Texas. Their approach highlighted the
benefits of personalization through re-training, achieving a 5% performance increase in terms of root mean
square deviation (RMSE) and mean absolute percentage error (MAPE). Similarly, Liu et al.7 introduced an FL
framework for smart grids, integrating power consumption data with weather features from 60 transformer
stations in Zhuhai, China. This study utilized LSTMs and boosting trees, comparing horizontal and vertical FL
models using MSE as the performance metric. The work emphasized the importance of securing power traces in
collaborative learning environments.
1 Distributed Systems Research Laboratory, Computer Science Department, Technical University of Cluj-Napoca,
G. Barițiu 26-28, Cluj-Napoca 400027, Romania. 2 Decision Support Systems Laboratory, School of Electrical &
Computer Engineering, National Technical University of Athens, Ir. Politechniou 9, Athens 157 73, Greece. email:
[email protected]; [email protected]
Further research indicated the diminishing performance of FL when dealing with non-independent and
identically distributed (non-IID) data8,9. This prompted several researchers to experiment with clustering
techniques. Savi et al.10 explored short-term load forecasting (STLF) at the edge, using FL and clustering
methodologies. The prediction model was based on LSTMs and incorporated weather data. They compared
FL with clustering learning in terms of accuracy, impact of clustering, scalability, and communication cost,
with the K-means FL model achieving the best performance in most metrics. Briggs et al.11 conducted a similar
study with an LSTM-based model enhanced with weather data. They tested several scenarios, comparing FL,
centralized learning, local learning, and Hierarchical Clustering (HC). The results showed that FL approaches
outperformed centralized learning but underperformed local learning. However, with a personalization step, FL
and its clustered variant (FL + HC) improved performance by up to 5% over localized learning while maintaining
data privacy. Additionally, FL + HC with fine-tuning significantly reduced computational demands, requiring up
to 10 times fewer samples for optimal model performance. He et al.12 tested residential STLF on 250 households
from Australia, using LSTM models and K-means clustering in a federated setting. It showcased the importance
of clustering and indicated that FL can be particularly useful for collaborative training in cases of users with
missing historical data. More advanced clustering techniques have been used in13,14. Tun et al.13 implemented
bi-directional LSTM models with ordering points to identify the clustering structure for STLF on data from 22
households in British Columbia. Their comparison between clustered and non-clustered approaches revealed
the benefits of clustering in improving forecast accuracy. Gholizadeh et al.14 introduced hyperparameter-
based clustering for electrical load forecasting on 75 households in Edmonton, comparing FL with centralized
and local learning using RMSE. The results revealed that the clustering method significantly reduced the
convergence time and that FL performed worse than local learning and better than centralized learning in
individual load forecasting. Fernández et al.15 focused on privacy-preserving FL for residential STLF, testing
various architectures and scenarios. Their findings suggest that FL performs worse than centralized learning
in terms of accuracy, although its performance increases proportionally with the number of participating clients.
Additionally, clustering methods enhance forecasting accuracy, while complex model architectures involve high
computational costs and pose risks of overfitting. Duttagupta et al.16 explored lightweight FL for distributed load
forecasting using a feedforward neural network model, demonstrating that lightweight models could indeed
achieve comparable performance to more complex architectures. The experiments highlighted the potential of
FL in reducing computational costs while maintaining accuracy.
A limited number of studies have experimented with variations of the federated aggregation algorithms in
energy prediction. Wang et al.17 introduced SecFedAProx-LSTM, an adaptive FL framework for multiparty
wind power forecasting, based on an LSTM model, a variation of the FedProx framework, and secure aggregation.
Their method demonstrated three key advantages. It provided more accurate and reliable forecasts compared
to Multilayer Perceptron, Convolutional Neural Network, Recurrent Neural Network, and Gated recurrent unit
(GRU) models and achieved faster convergence and improved accuracy in the presence of statistical heterogeneity
compared to FedProx, especially as the number of clients increased. Additionally, it ensured privacy without
requiring a third party for key generation, using Decentralized Multi-Client Functional Encryption for secure
aggregation. Fekri et al.18 experimented with two federated aggregation algorithms: FedSGD and FedAVG. Both
achieved higher accuracy than individual and central models for one-hour forecasting, with FedAVG slightly
better. For 24-hour forecasting, FedAVG outperformed all methods, while FedSGD had convergence issues. The
approach maintained high accuracy even when new smart meters joined post-training. Some approaches aim to
ensure a more efficient federated model aggregation. Hu Y. et al.19 propose an aggregation method that considers
the characteristics of individual datasets of the training nodes, enabling participants to make element-wise
contributions to improve the learning performance and convergence speed. Hu Z. et al. propose in20 a multi-
objective optimization approach for FL that converges to Pareto stationary solutions. The aggregation algorithm
considers individual objectives and the overall collaborative objective. Chifu et al.21 introduced FedWOA, a FL
model for predicting renewable energy production using time series data from local prosumer nodes. Utilizing
the Whale Optimization Algorithm (WOA) to aggregate LSTM model weights, FedWOA addresses data
heterogeneity and variations in generation patterns. With K-means clustering for non-IID data management,
FedWOA improved prediction accuracy by 25% for MSE and 16% for mean absolute error (MAE) compared to
FedAVG, demonstrating good convergence and reduced loss. This approach enables precise forecasts for small-
scale energy prosumers through decentralized data and collaborative global model optimization.
Finally, the hyperparameters of local models may significantly impact the performance of FL for energy
prediction. Improving hyperparameter selection such as learning rate, batch size, or number of epochs and
dynamically adjusting them can increase convergence speed and enhance the learning of local models22.
However, communication overhead and convergence speed between the edge devices and the cloud server may
affect the prediction accuracy and training efficiency23. Heuristic-based approaches are often used to find the
optimal hyperparameter settings, as they efficiently explore large search spaces by balancing exploration
and exploitation in the search for the optimal configuration22. Kundroo et al.24 highlight the importance of selecting
the appropriate configuration of hyperparameters for both model performance and training efficiency. In their
case, the clients are responsible for hyperparameter optimization, by dynamically adjusting the learning rate and
number of epochs according to the model training loss. Qolomany et al. propose a Particle Swarm Optimization
algorithm for hyperparameter tuning of deep long short-term memory models25. The number of communication
rounds needed to find the best solution is reduced compared to a grid search method. Al-Wesabi et al.26 use the
Pelican Optimization Algorithm to fine-tune the hyperparameters of a belief network for attack detection on
local IoT devices. A heuristic approach for hyperparameter tuning was applied for spiking neural networks
in27. This type of neural network has many hyperparameters, and the Cuckoo Search Algorithm, Grasshopper
Optimization Algorithm, and Polar Bears Algorithm were tested for their optimization. An Orchard meta-heuristic
optimization algorithm is proposed by Bukhari et al. in28 for hyperparameter tuning of an FL model that predicts
photovoltaic power generation. The optimization problem solutions are composed of architectural information
for the proposed Conv-SGRU model, the learning rate, and the dropout rate. Michalakopoulos et al.29 propose a federated
framework for collaborative model training across decentralized prosumer energy data without compromising
sensitive information. They leverage clustering algorithms that utilize the models’ hyperparameters as the input
space and integrate the differential privacy aggregator. The privacy-preserving transfer learning for short-term
building energy consumption predictions is addressed in30. The federated model learns transferable knowledge,
and the hyperparameter fine-tuning process is made during the training phase using a grid search algorithm
to find the optimal configuration regarding model architecture, learning rate, and the used optimizer. The grid
search algorithm is also used for hyperparameter selection in31 in different FL settings for residential energy
consumption prediction.
The paper explores a novel hierarchical FL solution for household energy consumption prediction that
incorporates clustering techniques, simulated annealing (SA), and genetic algorithms (GAs) for efficient model
aggregation and hyperparameter tuning. We address the challenge of effective and adaptive hyperparameter
tuning for heterogeneous energy profiles by using a clustering technique. Similar energy profiles are grouped and
linked for aggregation at the fog level. The GA efficiently explores the hyperparameter configurations, selecting
and sending only the most promising ones to the validation nodes for evaluation. Additionally, there is a need
for effective hyperparameter tuning methods that can scale to numerous households and massive datasets. These
methods should be capable of handling the diverse FL deployments and consider the limited computational
resources available at the edge. To address this gap, a hierarchical SA optimization is used as an efficient
aggregation method at the fog and cloud layers. The method improves performance by prioritizing updates from
the better-performing models and enhances training efficiency by focusing on early updates. Finally, the
GA-based hyperparameter optimization process reduces the computational effort of edge nodes by using only one
hyperparameter configuration at a time for training and validation. In this way, we address significant challenges
in FL, such as optimizing the communication between edge devices and the fog/cloud to reduce overhead, while
maintaining the prediction performance of the global model. This is especially relevant for household
energy consumption prediction, where the energy data is non-IID and a node with a larger dataset and higher
energy profile magnitude should not necessarily have a greater influence on the global model. Additionally, it is
important to consider that, especially in the early stages, some prediction models may perform poorly on edge
nodes but still contribute positively to the global model.
The remainder of the paper is structured as follows: the Methods section introduces the proposed FL solution
for household energy prediction, the Results section details the evaluation and validation results, and the
Conclusion section summarizes the paper and highlights future work.
Methods
Figure 1 presents the proposed three-layer FL architecture for energy consumption prediction of a set of
households, h ∈ H . The edge nodes refer to gateway devices located in buildings, which are used to train
local prediction models on the data stored locally. These devices then send updates of their learned models to
the upper fog layer. Since households have different energy consumption profiles with varying patterns and
amplitudes, their effective grouping into distinct clusters is important for prediction accuracy. To this end, we
have used our clustering solution from32, with one change: the extra features related to peak demand hours
are removed, as they play no role in understanding the time series patterns that we are trying to categorize.
Therefore, each fog device is associated with a cluster $c \in C$ of households $H^c$, with $\bigcup_{c \in C} H^c = H$, enabling them
to contribute to a shared prediction model on the fog layer. The top cloud layer is responsible for efficiently
aggregating the fog layer updates into a global prediction model.
A round of communication from the top-layer cloud to the fog clusters, from each cluster to its households,
and back represents an iteration. We denote by $K$ the total number of iterations needed to complete
the training of the global federated model. The top layer is responsible for initializing and storing the global
weights $w(k)$ after each iteration $k \in \{1, \dots, K\}$, and a set of hyperparameter configurations of the global model, $\psi$.
Also, $\lambda$ is a cumulative hyperparameter for the cloud model and $\alpha(k)$ is the computed performance of the
global model on iteration $k$. Each fog layer cluster $c \in C$ has a set of hyperparameter configurations $\psi^c$ from
which it selects the best configuration $\varphi^c_{best}(k)$ and sends it to the edge layer. The cumulative hyperparameter
of the cluster model is denoted as $\lambda^c$ and its performance on iteration $k$ as $\alpha^c(k)$. Additionally, the
cluster-associated vector of weights on iteration $k$, $w^c(k)$, is updated by aggregating the weights received from each
edge node. Finally, the household edge nodes are responsible for the training and validation of the model. They
receive the initial weights and configuration from the fog and update and evaluate their performance considering
the current configuration of the hyperparameters. The performance of the updated model on iteration $k$ is
denoted as $\alpha^c_p(k)$. The computed weights on the prosumer node are $w^c_p(k)$.
For each cluster $c$, the edge nodes $H^c$ are split into training nodes $H^c_t$ and validation nodes $H^c_v$, such that
$H^c = H^c_t \cup H^c_v$. We define the learning of the global federated model as a multi-objective optimization
problem. On the edge layer, for each training node $h^c_p(k) \in H^c_t$ the objective is to minimize the loss on its
training data set $D_{h_p}$, given the weights of the local model $w^c_p(k)$ and the best hyperparameter configuration
$\varphi^c_{best}(k)$ sampled from the set of fog configurations. The objective function is expressed as:
$$f^{h^c_p}_{obj}\left(\varphi^c_i(k)\right) = \min_{w^c_p(k) \in \mathbb{R}^d} \mathrm{Loss}_{\varphi^c_{best}(k)}\left(D_{h_p}, w^c_p(k)\right), \quad h_p \in H^c_t \tag{1}$$
On the fog layer, the objective at each cluster is to minimize the sum of the losses computed on both training
and validation edge nodes. This involves minimizing the total loss from all household nodes in the cluster
by aggregating the weights from edge nodes within the cluster and selecting the optimal hyperparameter
configuration for training. The objective function is:
$$f^{c}_{obj} = \min_{\varphi^c_{best}(k) \in \psi^c,\; w^c(k) \in \mathbb{R}^d} \; \sum_{h^c_p \in H^c} f^{h^c_p}_{obj}\left(\varphi^c_{best}(k)\right) \tag{2}$$
where $c$ is the cluster of edge nodes, $\psi^c$ is the set of hyperparameter configurations for the cluster, and $w^c(k)$ is the set
of edge models in the cluster.
The cloud layer’s global objective is to minimize the overall loss on all edge nodes by efficiently aggregating
the updates received from the fog nodes:
$$f^{g}_{obj} = \min_{\cup\, w^c(k) \in \mathbb{R}^d} f^{c}_{obj}, \quad c \in C \tag{3}$$
In other words, the optimization problem is to efficiently aggregate the model weights both on fog and cloud
layers and to find the best hyperparameter configuration of nodes such that the sum of edge node training and
validation losses is minimized.
score (6.4). The detailed GA as well as the population update process involving offspring generation, removing
the worst candidates, and fitness score computation is described in the Hyperparameter Tuning section. The
fog nodes select the best chromosome from the population based on fitness score (6.5). The weights and the
best-selected chromosome are broadcast to all the training edge nodes $h^c_p$ from $H^c_t$ (7). Each edge node
trains the model with the given hyperparameter configuration on its dataset $D_{h_p}$ (8) and sends to the fog node
the updated weights $w^c_p(k)$ and its performance $\alpha^c_p(k)$ (9). Using the SA process, the fog node aggregates the
received updates (10) and sends the aggregated model weights $w^c(k)$ and performance $\alpha^c(k)$ to the cloud (11).
Finally, the cloud aggregates the model updates received from fog nodes (12) and the process is repeated for the
remaining iterations.
Algorithm 1: SA Aggregation
The method returns a new set of aggregated weights, the performance of the aggregated model, and the
updated cumulative hyperparameter. Firstly, the algorithm computes two factors: $\mu_{prev}$, based on the current
cumulative hyperparameter, and $\mu_{new}$, based on the current temperature (line 5). Afterward, for each set of weights
(line 6), the difference $\nabla E$ between the performances of the previous aggregated model and the current
updates is computed, and a random number $\gamma$ is selected between 0 and 1 (lines 7–8). If the performance of
the updated weights is higher than that of the previous aggregated model, or with a given probability influenced
by $\gamma$, $\nabla E$, $T_{current}$ and a constant $k_B$, the model is aggregated (lines 9–11). The weighting of the new weights and
the aggregated model is given by $\mu_{prev}$ and $\mu_{new}$, and the cumulative hyperparameter is updated with $\mu_{new}$.
Finally, the performance of the aggregated weights is computed as the maximum between the previous
performance of the aggregated model and the performances of all the updated models (line 14). The
Boltzmann constant $k_B$ is used so that the acceptance test follows the Boltzmann probability distribution: the
random value $\gamma$ is compared against the probability that the system is found in a state with a difference of
performance $\nabla E$, so that the search moves, as a function of temperature, toward better or random states.
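The aggregation steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Algorithm 1: the exact formulas for $\mu_{prev}$ and $\mu_{new}$ are not given in the text, so the code assumes $\mu_{prev} = \lambda$ and $\mu_{new} = T_{current}$, assumes the acceptance probability $\exp(-\nabla E / (k_B T_{current}))$, and uses `k_b=1.0` as a scaling constant. The function name `sa_aggregate` and its parameters are hypothetical.

```python
import math
import random


def sa_aggregate(w_prev, alpha_prev, lam, updates, t_current, k_b=1.0, rng=random):
    """One simulated-annealing style aggregation pass (illustrative sketch).

    w_prev     -- previous aggregated weights (list of floats)
    alpha_prev -- performance of the previous aggregate (higher is better)
    lam        -- cumulative hyperparameter
    updates    -- list of (weights, performance) pairs received from child nodes
    t_current  -- current temperature, must be > 0
    """
    mu_prev = lam        # ASSUMPTION: weight of the running aggregate
    mu_new = t_current   # ASSUMPTION: weight given to each accepted update
    w_agg = list(w_prev)
    for w_i, alpha_i in updates:
        delta_e = alpha_prev - alpha_i  # positive when the update performs worse
        gamma = rng.random()
        # Accept better updates always; accept worse ones with Boltzmann probability.
        if delta_e < 0 or gamma < math.exp(-delta_e / (k_b * t_current)):
            total = mu_prev + mu_new
            w_agg = [(mu_prev * a + mu_new * b) / total for a, b in zip(w_agg, w_i)]
            mu_prev = total  # the cumulative hyperparameter grows by mu_new
    # Performance of the aggregate: maximum over the previous and all received values.
    alpha_new = max([alpha_prev] + [alpha for _, alpha in updates])
    return w_agg, alpha_new, mu_prev
```

Under these assumptions, early iterations (high temperature) give new updates more weight and accept worse-performing updates more often, while later iterations change the aggregate less.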
Hyperparameter tuning
The GA34 is used to find the best configuration of the hyperparameters for each fog node corresponding to a cluster.
The population is initialized with a set of hyperparameter configurations $\psi^c = \{\varphi^c_1, \varphi^c_2, \varphi^c_3, \dots, \varphi^c_{\psi_{size}}\}$,
where $\varphi^c_i$ is the $i$th chromosome of the population:
$$\varphi^c_i = (\eta_i, batch_i, epoch_i, P_i, N_{ft_i}) \tag{4}$$
The genes represent hyperparameters that significantly influence model performance in federated energy
prediction tasks. The learning rate $\eta \in [10^{-4}, 10^{-2}]$ is tuned to find a balance between stable convergence
and faster training; the batch size $batch \in [16, 128]$ allows for exploring different trade-offs between
computational efficiency and capturing complex consumption patterns; the number of epochs $epoch \in [1, 100]$
ensures flexibility in fitting seasonal and varying consumption behaviours without overfitting; the early stopping
patience $P \in [1, 20]$ helps to detect convergence and prevent unnecessary training, accommodating data
irregularities; and the number of fine-tuning layers $N_{ft} \in [1, 10]$ controls how much of the pre-trained model
is adapted to local conditions. For population initialization, $\psi_{size}$ individuals are randomly generated with
each hyperparameter value drawn from its defined range, enabling a broad search space for discovering effective
configurations.
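As an illustration of the chromosome layout in Eq. (4) and the random initialization, the following Python sketch draws each gene from the ranges listed above. The published implementation uses the Java library Jenetics, so this is not the authors' code; the names `Chromosome` and `init_population` are hypothetical, and sampling the learning rate on a logarithmic scale is an assumption (the text only states that values are drawn from the defined ranges).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Chromosome:
    lr: float        # learning rate, eta in [1e-4, 1e-2]
    batch: int       # batch size in [16, 128]
    epochs: int      # number of epochs in [1, 100]
    patience: int    # early stopping patience in [1, 20]
    n_ft: int        # number of fine-tuning layers in [1, 10]


def random_chromosome(rng):
    """Draw one hyperparameter configuration from the ranges given in the text."""
    return Chromosome(
        lr=10 ** rng.uniform(-4, -2),   # ASSUMPTION: log-uniform sampling
        batch=rng.randint(16, 128),
        epochs=rng.randint(1, 100),
        patience=rng.randint(1, 20),
        n_ft=rng.randint(1, 10),
    )


def init_population(size, seed=None):
    """Create the initial population psi^c with `size` random chromosomes."""
    rng = random.Random(seed)
    return [random_chromosome(rng) for _ in range(size)]
```

For example, `init_population(20, seed=0)` returns 20 configurations whose genes all lie inside the stated ranges.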
The GA-based hyperparameter tuning is defined in Algorithm 2. It receives the current temperature from
SA_AGG, $T_{current}$, the population of chromosomes $\psi^c$, the validation node $h^c_v(k)$ and the current cluster-level
aggregated weights $w^c(k)$. Firstly, the candidates for crossover, $\varphi^c_{p1}$ and $\varphi^c_{p2}$, are selected as the best two
hyperparameter configurations in the population (line 6). The new offspring $\varphi_{o1}$ and $\varphi_{o2}$ are generated by
crossover between the selected candidates (line 7) and are added to the survivor population (line 8). Single-point
crossover is used for offspring generation, which involves swapping segments of two parent chromosomes
at a random point. We have set a crossover probability of 60%, meaning that for a given
pair of parents, there is a 60% chance that crossover will be applied to produce offspring. If $\varphi_{p1}$ and $\varphi_{p2}$ are
the parent chromosomes, $\varphi_{o1}$ and $\varphi_{o2}$ are the offspring chromosomes, $r$ is the crossover point and $n$ is the
length of the chromosomes, then the formula is:
$$\varphi_{o1} = \left(\varphi_{p1}[0:r],\ \varphi_{p2}[r:n]\right), \quad \varphi_{o2} = \left(\varphi_{p2}[0:r],\ \varphi_{p1}[r:n]\right) \tag{5}$$
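Eq. (5) maps directly onto sequence slicing. The sketch below is a minimal Python illustration in which chromosomes are plain tuples of genes; the function name `single_point_crossover` is hypothetical, and returning unchanged copies of the parents when crossover is not applied is an assumption, since the text does not say what happens in that case.

```python
import random


def single_point_crossover(p1, p2, p_cross=0.6, rng=random):
    """Single-point crossover of two equal-length chromosomes, as in Eq. (5)."""
    n = len(p1)
    if n < 2 or rng.random() >= p_cross:
        # ASSUMPTION: without crossover, the offspring are copies of the parents.
        return tuple(p1), tuple(p2)
    r = rng.randint(1, n - 1)              # crossover point, so both segments are non-empty
    o1 = tuple(p1[:r]) + tuple(p2[r:])     # phi_o1 = (phi_p1[0:r], phi_p2[r:n])
    o2 = tuple(p2[:r]) + tuple(p1[r:])     # phi_o2 = (phi_p2[0:r], phi_p1[r:n])
    return o1, o2
```

With five genes per chromosome, the crossover point falls between 1 and 4, so each offspring inherits at least one gene from each parent whenever crossover is applied.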
For the mutation process, each gene in the offspring has a 3% probability of being changed to a random value
from its domain. After the generation of the offspring, the new population $\psi^c_{new}$ is obtained by replacing the
two chromosomes with the lowest fitness scores with the newly generated offspring (lines 8–9). Only some of
the chromosomes from the population are selected in the current iteration to be evaluated on the validation edge
node $h^c_v(k) \in H^c_v$ (lines 10–15). The probability of a chromosome $\varphi^c_i$ being selected is given by a randomly
generated value (line 11), its current fitness score $f(\varphi^c_i)$, the constant $k_B$ and the temperature $T_{current}$ (line 12).
For each selected chromosome in the new population, the randomly chosen validation edge node $h_v(k) \in H^c_v$
receives the current cluster-level aggregated weights $w^c(k)$ and the hyperparameter configuration corresponding
to the chromosome to compute the fitness score. The fitness score is determined by computing the loss of fitting the
model with the received weights and hyperparameters (line 13). If the chromosome is not selected for evaluation,
the previous fitness is kept. Finally, the algorithm returns the new population (line 16).
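The mutation, replacement, and selective evaluation steps can be sketched as follows. This is a hedged illustration rather than the authors' Algorithm 2: the exact selection formula is not given in the text, so the code assumes a chromosome is evaluated when a random value falls below $\exp(-(f_{best} - f(\varphi^c_i)) / (k_B T_{current}))$, assumes fitness is the negated validation loss (higher is better), and assumes new offspring are always evaluated. All function names and the `evaluate` callback, which stands in for the round trip to the validation edge node, are hypothetical.

```python
import math
import random


def mutate(chrom, samplers, p_mut=0.03, rng=random):
    """Resample each gene from its domain with probability p_mut."""
    return tuple(s(rng) if rng.random() < p_mut else g for g, s in zip(chrom, samplers))


def replace_worst(population, fitness, offspring):
    """Drop the len(offspring) lowest-fitness chromosomes and append the offspring.

    Expects a float fitness for every current member; offspring get fitness None.
    """
    order = sorted(range(len(population)), key=lambda i: fitness[i])  # worst first
    keep = sorted(order[len(offspring):])
    new_pop = [population[i] for i in keep] + list(offspring)
    new_fit = [fitness[i] for i in keep] + [None] * len(offspring)
    return new_pop, new_fit


def evaluate_selected(population, fitness, evaluate, t_current, k_b=1.0, rng=random):
    """Re-evaluate only a temperature-dependent subset of the population."""
    known = [f for f in fitness if f is not None]
    f_best = max(known) if known else 0.0
    new_fitness = []
    for chrom, f in zip(population, fitness):
        if f is None:
            selected = True  # ASSUMPTION: unevaluated offspring are always evaluated
        else:
            # ASSUMPTION: Boltzmann-style selection relative to the best known fitness
            selected = rng.random() < math.exp(-(f_best - f) / (k_b * t_current))
        new_fitness.append(evaluate(chrom) if selected else f)
    return new_fitness
```

Under these assumptions, a high temperature sends most configurations to the validation node, while a low temperature restricts evaluation to the best configuration and the new offspring, which limits the work done at the edge.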
Fig. 3. Hourly energy consumption data for each household (recorded daily across the dataset).
Fig. 4. Households’ energy profiles analysis: (a) Number of households by daily energy consumption range
and (b) Hourly energy consumption.
the dataset, a data cleaning process was undertaken before data analysis. Initially, data points with missing or
erroneous values were removed to ensure data integrity, resulting in a final sample of 4,438 households for this
study.
For solution evaluation, a wide array of features was considered to capture various aspects of energy
consumption patterns. These features are categorized into several groups, each contributing uniquely to the
predictive power of the federated model. We considered temporal features like the hour of the day, day of the
week, and month of the year to capture daily, weekly, and seasonal patterns in energy
consumption. To capture short-term trends and variability, statistical features such as moving averages, rolling
mean, and maximum and minimum values were used (see Table 1).
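A feature pipeline of this kind might look like the following Pandas sketch. The column name `energy_kwh`, the function name `add_features`, and the 24-sample rolling window are assumptions made for illustration; the actual feature set is the one listed in Table 1.

```python
import pandas as pd


def add_features(df, value_col="energy_kwh", window=24):
    """Add temporal and rolling statistical features to a time-indexed frame.

    df must have a DatetimeIndex; value_col and window are hypothetical choices.
    """
    out = df.copy()
    # Temporal features: daily, weekly, and seasonal position of each sample.
    out["hour"] = out.index.hour
    out["day_of_week"] = out.index.dayofweek
    out["month"] = out.index.month
    # Statistical features: short-term level and variability over a rolling window.
    roll = out[value_col].rolling(window=window, min_periods=1)
    out["rolling_mean"] = roll.mean()
    out["rolling_max"] = roll.max()
    out["rolling_min"] = roll.min()
    return out
```

Setting `min_periods=1` keeps the first rows of each series usable instead of producing missing values before the window is full.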
Figure 4 presents an overview of the daily energy consumption of households from the dataset. Figure 4 (a)
shows the distribution of households in the dataset by their daily energy consumption range. Most households
in the dataset have an average daily energy consumption that falls within the interval of 0 to 10 or 10 to 20 kWh/day.
The average daily energy consumption is computed across all households as an hourly average and is
illustrated in Fig. 4 (b).
Figure 5 (a) represents the monthly average energy consumption. The average is computed across all
households, and the seasons are represented with different colours. The lowest energy
consumption occurs during the summer months (yellow) and the highest during the winter (blue). Figure 5 (b)
presents a heatmap of the average energy consumption for each day of the week and how it varies by
month. The colour intensity of the heatmap indicates the value of the energy consumption, from blue (high)
to light yellow (low).
We have clustered the households’ prosumers based on the energy profile features using the methodology
presented in31. In the process, a normalization procedure was applied using the Min-Max normalization method,
which scales all values to a range of 0.0 to 1.0. Specifically, the minimum value of each feature is transformed to
0, the maximum value to 1, and all other values to a decimal between 0 and 1. This normalization step is crucial
for mitigating the impact of varying data magnitudes on subsequent clustering analyses, thereby preventing
associated biases. The applied data preparation process aims to enhance the robustness of the clustering analyses
by normalizing data scales and facilitating the use of distance-based metrics in data exploration. Three clustering
algorithms, K-means36, K-medoids37, and Hierarchical clustering38 are applied to segment the data based on the
features of each load profile. Determining the optimal number of clusters in clustering analysis is challenging, as
it typically cannot be precisely known in advance. Therefore, the various clustering algorithms are tested over a
predefined range of clusters, from 2 to 30. This extensive range is systematically explored to determine the most
appropriate number of clusters using three evaluation metrics: the Silhouette Score (SIL), the Davies-Bouldin
Index (DBI), and the Calinski-Harabasz Index (CHI). Table 2 shows that the optimal number of clusters for our
case is three. The only exceptions are observed with K-medoids, where the optimal number of clusters is two.
However, as discussed in previous studies, K-medoids is not reliable for tasks of this nature. Consequently, its
results are excluded from further analysis. On the other hand, the results of K-means and Hierarchical clustering
mostly agree, with only minor exceptions. Since K-means achieves better scores across all evaluation metrics
(SIL, DBI, and CHI), the labels selected by this algorithm will be incorporated into the proposed solution for
further assessment.
Fig. 5. Statistical features analysis: (a) Overall monthly energy consumption and (b) Day of the week energy
consumption by month.
                        Evaluation metric
Clustering algorithm    SIL    DBI    CHI
K-means                 3      3      3
HAC                     3      3      3
K-medoids               2      2      2
Table 2. The optimal number of clusters for the clustering algorithms based on three evaluation metrics.
Fig. 6. SIL scores for every clustering algorithm under the selected range.
Figure 6 illustrates the SIL scores for all clustering algorithms evaluated across the selected range of cluster
numbers, offering a clear comparison of their performance. The figure highlights the consistent superiority of
K-means, as it achieves the highest SIL scores for most of the tested configurations. This trend underscores the
robustness of K-means in identifying well-separated and compact clusters.
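The cluster-count sweep can be illustrated with scikit-learn. The sketch below covers only K-means and is not the authors' clustering pipeline: the function names `sweep_kmeans` and `best_k` are hypothetical, and the `n_init` and `seed` values are arbitrary choices. It applies Min-Max scaling, fits K-means for each candidate number of clusters, and records the three metrics, where higher is better for SIL and CHI and lower is better for DBI.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import MinMaxScaler


def sweep_kmeans(features, k_values, seed=0):
    """Score K-means clusterings of Min-Max scaled features for each k."""
    x = MinMaxScaler().fit_transform(features)  # each feature scaled to [0, 1]
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(x)
        scores[k] = {
            "SIL": silhouette_score(x, labels),
            "DBI": davies_bouldin_score(x, labels),
            "CHI": calinski_harabasz_score(x, labels),
        }
    return scores


def best_k(scores):
    """Pick the preferred k per metric (max SIL, min DBI, max CHI)."""
    return {
        "SIL": max(scores, key=lambda k: scores[k]["SIL"]),
        "DBI": min(scores, key=lambda k: scores[k]["DBI"]),
        "CHI": max(scores, key=lambda k: scores[k]["CHI"]),
    }
```

Calling `best_k(sweep_kmeans(profile_features, range(2, 31)))` on a feature matrix (here the hypothetical `profile_features`) would reproduce the 2 to 30 sweep described above for K-means.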
Fig. 7. Normalized median values for each cluster during the day (generated using Python 3.12.3 [41] and
Matplotlib [48]).
Fig. 8. Evaluation Setup: Households data and Physical Device Assignments (created with Microsoft
PowerPoint [60]).
In Fig. 7, a visual depiction of the clustering outcomes derived from our methodology is presented, overlaying
the time-series data. Each cluster is represented by a unique color to enhance visual distinction, with its respective
median trend line displayed in the same color to emphasize the central tendency within the cluster. The clusters
reveal subtle yet meaningful variations in energy consumption patterns, primarily distinguished by the volume
of usage, providing insights into the underlying structure of our dataset. More specifically, Cluster 1 exhibits the
largest magnitude in daytime peaks, reflecting higher activity levels. Cluster 2 shows a moderate level of energy
usage, with peaks smaller than those of Cluster 1, but still pronounced compared to Cluster 0. Despite these
differences, all clusters share a common temporal structure influenced by similar daily cycles across the dataset.
The evaluation setup for each layer in the federated architecture is presented in Fig. 8. The edge devices,
represented by different versions of Raspberry Pi, are mapped to the corresponding households in the dataset.
For each cluster, a fog device (Intel Core i3 with 8 GB RAM) was used for the aggregation and hyperparameter
tuning process. The edge devices are connected to the fog node that represents the cluster to which the consumer
belongs.
We have developed applications for script handling and communication exchanges, each corresponding to
a layer of the federated architecture, using Spring Boot 3.2.5 with Java 17 [40]. Maven is used for dependency
management, and communication among nodes is established using Representational State Transfer
(REST). Python 3.12.3 [41] and TensorFlow 2.18 [42] are used for building scripts for data and model
manipulation. The applications and the scripts run in Docker containers deployed on the federated architecture
nodes, providing a virtual environment featuring the following libraries: (i) TensorFlow for managing and
creating models, (ii) Pandas [43] for reading data from comma-separated values (CSV) files and processing it
through feature engineering pipelines, and (iii) Scikit-learn [44] for scaling tasks. Additionally, Scipy [45] was used for
special functions, such as the Boltzmann constant, while Argparse [46] handled parsing arguments from the stack.
Protobuf [47] was used for building the image, and Matplotlib [48] for generating plots. The GA for hyperparameter
optimization is implemented using the Java library Jenetics 7.0.0 [49] and is deployed in the Docker containers on
the fog nodes. The SA algorithm for model aggregation is implemented from scratch and runs in the Docker
containers on the cloud and fog nodes. For monitoring network traffic and hyperparameters, we used features of the Spring
Framework along with a custom caching mechanism to capture the state of the algorithms across iterations. The
code of our federated solution is available on GitHub [50].
The energy prediction model is designed using the Keras library51 as a stack of sequential layers, with LSTM
as the core layer and ReLU52 as the activation function. The input is a sequence of length 48 with 6 features
per time step. The first LSTM layer contains 32 units; this value was determined empirically through repeated
trials, by relating the number of units to the quality of the predictions. A second LSTM layer with 64 units is
then applied. Finally, a Dense layer with 16 units is followed by an output Dense layer with 1 unit that produces
the predicted value. To update the model’s weights, we used the Adam optimizer53 with Mean Squared Error (MSE)
as the loss function.
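To give a sense of the size of this architecture, the short sketch below counts its trainable parameters. It is an illustration rather than the authors' code: it assumes the standard LSTM formulation with four gates and bias terms (the Keras default), and that the Dense layers receive the 64-dimensional output of the second LSTM layer. The helper names are hypothetical.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Trainable parameters of a standard LSTM layer (4 gates, with bias)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Trainable parameters of a fully connected layer with bias."""
    return input_dim * units + units


n_features = 6  # input features per time step (from the text)

# Layer sizes follow the description above; the sequence length (48)
# does not influence the parameter count of recurrent layers.
layer_params = {
    "lstm_32": lstm_params(n_features, 32),
    "lstm_64": lstm_params(32, 64),
    "dense_16": dense_params(64, 16),
    "dense_1": dense_params(16, 1),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count}")
print(f"total: {total_params}")
```

Under these assumptions the model has 30,881 trainable parameters, most of them in the second LSTM layer, which is also the dimensionality that enters the aggregation step of each federated round.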
Figure 9 reports the prediction accuracy of our FL methodology compared with other state-of-the-art
methods, using the average MSE for households’ energy prediction over several iterations
(executed on daily energy profiles from 2013-07-10 to 2013-07-20). Over these iterations, we analysed the
performance of the aggregated model at the cloud node, as well as the execution time and the volume of data
transmitted over the network. Compared with FedAVG39, the hyperparameter tuning method helps the
model converge earlier and, by finding the optimal hyperparameters for training, prevents the spikes of the MSE
during iterations.
We have compared the accuracy of our FL energy prediction model for each edge device, representing a
household. The results presented in Table 3 show that, on average, the model outperforms the considered baseline
represented by the FedAVG algorithm. Our solution effectively captures patterns in household energy profiles
through clustering and hyperparameter tuning. It demonstrates superior performance in scenarios where
FedAVG struggles, such as for households with device IDs MAC001198 and MAC000321. By introducing
greater variance in the energy data used during training and later in cluster-level cross-validation,
our model generalizes well. For the rest of the households used in testing, it achieves accuracy similar to
FedAVG while minimizing prediction deviations.
The execution time and the network traffic are measured over iterations to have an overview of the costs
implied by the integration of the proposed aggregation method and hyperparameter tuning process. Figure 10
shows the execution time for each iteration involving a complete federated energy prediction model update.
Fig. 9. Average prediction accuracy of our federated model compared with state of the art methods.
Fig. 10. Execution time for each prediction model update iteration.
Because the genetic heuristic must evaluate many combinations of parameters in the search space, it adds
computational overhead. The time depends on how many chromosomes are selected for validation during the GA
evolution and on how fast the edge nodes respond with the computed performance or the updated model. Additionally,
the execution time increases because of the steps that involve additional communication with the households’
validation nodes inside the same cluster.
The computational complexity of our solution is influenced by the additional complexity brought by the GA
for hyperparameter optimization and by the simulated annealing (SA) solution for prediction model aggregation.
In the case of the GA, the complexity for each cluster $c$ is directly influenced by the size of the initial population
$\varphi_i^c$, the number of iterations $I$, and the complexity of the fitness function $O_{f-GA}$:

$$O\left(\varphi_i^c \cdot I \cdot O_{f-GA}\left(|H^c| + |H_t^c| \cdot \left(avgsize(D_{h_p}) + w_p^c(k)\right) \cdot I + |\Phi|\right)\right), \quad h_p \in H_t^c$$ (6)

where $|H^c|$ is the total number of edge nodes in the cluster, $|H_t^c|$ the number of training nodes in the cluster,
$avgsize(D_{h_p})$ the average data size per training node ($D_{h_p}$), $w_p^c(k)$ the dimensionality of the model, and
$|\Phi|$ the number of hyperparameter configurations.
The computational cost of SA for model aggregation per cluster $c$ depends on the number of edge devices in
the cluster, $|H^c|$, and on the complexity of the objective function, which is the model aggregation loss:

$$O_{f-SA}\left(I \cdot \left(|H^c| \cdot w_p^c(k) + |H^c| \cdot avgsize(D_{h_p})\right)\right)$$ (7)
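As an illustration of where this cost comes from, the following is a minimal sketch of simulated annealing applied to model aggregation. It is not the implementation from the repository: it assumes that the search runs over convex combination coefficients of the edge models, that models are flat weight vectors, and that a plain Metropolis rule exp(-delta/T) with geometric cooling is used. The names `aggregate`, `simulated_annealing`, and `toy_loss` are hypothetical.

```python
import math
import random


def aggregate(models, weights):
    """Weighted average of flat weight vectors, one vector per edge model."""
    dim = len(models[0])
    return [sum(w * m[j] for w, m in zip(weights, models)) for j in range(dim)]


def normalise(weights):
    """Rescale the coefficients so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def simulated_annealing(models, loss_fn, iterations=200, t0=1.0, cooling=0.97, seed=0):
    rng = random.Random(seed)
    n = len(models)
    current = [1.0 / n] * n  # start from plain averaging
    current_loss = loss_fn(aggregate(models, current))
    best, best_loss = current, current_loss
    temperature = t0
    for _ in range(iterations):
        # Perturb one coefficient, keeping all of them positive and summing to 1.
        candidate = list(current)
        i = rng.randrange(n)
        candidate[i] = max(1e-6, candidate[i] + rng.uniform(-0.1, 0.1))
        candidate = normalise(candidate)
        # Each evaluation touches every model once: |H| * w operations.
        cand_loss = loss_fn(aggregate(models, candidate))
        delta = cand_loss - current_loss
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_loss = candidate, cand_loss
            if current_loss < best_loss:
                best, best_loss = current, current_loss
        temperature *= cooling  # geometric cooling schedule (assumed)
    return best, best_loss


# Toy usage: three "models" of dimension 3, one of them far from the target.
target = [1.0, -2.0, 0.5]
models = [[1.1, -1.9, 0.4], [0.9, -2.2, 0.7], [3.0, 0.0, -1.0]]


def toy_loss(vec):
    """Stand-in for the aggregation loss: MSE against a known target vector."""
    return sum((a - b) ** 2 for a, b in zip(vec, target)) / len(vec)


uniform_loss = toy_loss(aggregate(models, [1.0 / 3] * 3))
weights, loss = simulated_annealing(models, toy_loss)
print(f"uniform averaging loss: {uniform_loss:.4f}, annealed loss: {loss:.4f}")
```

Each of the I iterations rebuilds the aggregated model from all edge models in the cluster and evaluates the loss once, which is why the number of devices and the model dimensionality appear as multiplicative factors in the cost.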
Despite the additional complexity brought by the GA and SA algorithms, the execution time for each model
iteration remains within reasonable bounds for solutions requiring day-ahead energy prediction
for energy prosumers. Additionally, the accuracy gains are significant compared to other state-of-the-art
federated models. The complexity could be managed by selecting and sampling only a subset of edge devices or model
parameters to approximate the objective function, reducing the dependence on the number of edge devices per
cluster and the prediction model dimensionality.
Figure 11 shows the data transmission overhead brought by our federated solution for all the layers. The
quantity of data transmitted at the edge and fog layers is computed as an average across all nodes. The proposed
FL methodology has minimal impact on incoming and outgoing traffic among nodes on different architectural layers,
which is beneficial when network resources are limited, as is the case for edge nodes in smart grids. In our
case, the hyperparameter tuning reduces the size of model updates sent between nodes at the edge and fog layers, as
the GA tunes efficiency parameters such as batch size, learning rate, and update frequency. Therefore, the FL-based
solution can scale more effectively across larger energy networks with many households associated with edge
devices without overwhelming the data network infrastructure. Additionally, the low network traffic overhead
of our solution reduces the energy consumption of edge devices, which is particularly important
in households where energy management often overlaps with the integration of smart homes into energy grids.
GA-based hyperparameter optimization minimizes the communication rounds that are required for accurate
households’ energy prediction. This not only optimizes the use of data network resources but also smooths the
data transmission patterns between nodes, making the data flow in federated prediction model updates more
stable and manageable. Therefore, our federated energy prediction model converges faster, leading to quicker
decision-making on edge and fog devices and contributing to the management of microgrids.
The best fitness score and the diversity of each fog population are represented in Fig. 12. The fitness score
(see Fig. 12a) is computed as the performance of the hyperparameter configuration on the selected validation
node. The fitness score of the best chromosome becomes more stable in the later iterations, as the algorithm progresses.
This stability reflects a more refined and accurate prediction model as the FL process converges: the federated
model approaches an optimal solution across all households’ edge nodes, leading to better energy predictions.
The diversity of the population on each fog node (see Fig. 12b) helps prevent premature convergence and ensures
a more robust, globally optimal solution for the federated energy prediction model. The diversity varies based
on local conditions, such as households’ energy data heterogeneity. However, the clustering of households based
on energy profiles and the cross-validation of the model between the edge nodes of the same cluster help
in exploring a wide solution space. Our federated model explores not only individual households’ patterns
but also broader trends within the cluster, widening the solution space, as the model benefits from both local
(individual household) and group (cluster) data patterns. Consequently, different fog nodes can host distinct
local populations of chromosomes, representing local solutions to the energy prediction problem.
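The interplay between best fitness and population diversity can be illustrated with a small genetic algorithm. The sketch below is written in Python for readability and is not the Jenetics-based implementation: the search space, the surrogate `fitness` function (standing in for the validation MSE returned by an edge node), the diversity measure (share of distinct chromosomes), and the selection, crossover, and mutation operators are all assumptions made for the example.

```python
import math
import random

# Hypothetical search space for two hyperparameters.
BATCH_SIZES = [16, 32, 64, 128]
LEARNING_RATES = [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]


def random_chromosome(rng):
    """A chromosome is a (batch size, learning rate) pair."""
    return (rng.choice(BATCH_SIZES), rng.choice(LEARNING_RATES))


def fitness(chromosome):
    """Placeholder for the validation MSE reported by an edge node (lower is better)."""
    batch, lr = chromosome
    return (math.log10(lr) + 3) ** 2 + 0.1 * (math.log2(batch) - 5) ** 2


def diversity(population):
    """Share of distinct chromosomes in the population (assumed measure)."""
    return len(set(population)) / len(population)


def evolve(pop_size=10, generations=15, mutation_rate=0.2, seed=1):
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    history = []  # (best fitness, diversity) per generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        history.append((fitness(ranked[0]), diversity(population)))
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0]]  # elitism: the best chromosome survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a[0], b[1])  # crossover: one gene from each parent
            if rng.random() < mutation_rate:
                child = random_chromosome(rng)  # mutation: resample the chromosome
            children.append(child)
        population = children
    best = min(population, key=fitness)
    return best, history


best, history = evolve()
print("best chromosome:", best)
for generation, (score, div) in enumerate(history):
    print(f"generation {generation}: best fitness {score:.4f}, diversity {div:.2f}")
```

Because of elitism, the best fitness in this sketch can only stay the same or improve from one generation to the next, while diversity is free to fluctuate as selection narrows the population and mutation widens it again.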
Fig. 11. The volume of network traffic for cloud, fog, and edge: (a) incoming and (b) outgoing.
Fig. 12. Genetic-based hyperparameter tuning: (a) best fitness score and (b) diversity for each fog node over
iterations.
To benchmark the energy prediction accuracy of our methodology, we have used the FedAVG,
FedProx and FedMIME implementations from the TensorFlow Federated framework. FedProx is an extension of
FedAVG that incorporates a regularization term to handle heterogeneous client data and improve stability in
non-iid settings, whilst FedMIME is a personalized federated learning method. The energy consumption values were
scaled using Standard Scaler, the dataset was split into training and testing sets (80%-20%), and the federated
model was trained over 10 communication rounds. The metrics were computed on the testing set for each client,
using the global model. For FedProx, we set the proximal strength to 0.01 to balance stabilizing updates from
heterogeneous data and allowing local model adaptation, and the Yogi client optimizer was used with a learning
rate of 0.01. For FedMIME, the Yogi optimizer was used for both the base and server optimizers, with learning rates
of 0.001 and 0.01, respectively. Table 4 presents the average values of these metrics computed over all
clients and the relative improvement of our solution.

| Metric | FedAVG | FedProx | FedMIME | Our federated model | Relative improvement over FedAVG (%) | Relative improvement over FedProx (%) | Relative improvement over FedMIME (%) |
|---|---|---|---|---|---|---|---|
| MAE | 0.12014 | 0.10431 | 0.50415 | 0.07438 | 38.08 | 28.69 | 85.25 |
| RMSE | 0.24993 | 0.22456 | 0.69997 | 0.10062 | 59.74 | 55.19 | 85.62 |
| R2 | 0.95495 | 0.96366 | 0.31188 | 0.96797 | 28.91 | 11.85 | 95.35 |

Table 4. Average prediction accuracy of our solution compared with state-of-the-art aggregation methods.
Our federated model demonstrates consistent performance improvements over FedAVG, FedProx, and
FedMIME across all evaluated metrics. Compared to FedAVG, the MAE decreased on average by 38%, the RMSE
by 59%, and the R2 metric improved by 28%. Similarly, the average accuracy improvements over FedProx were
28% for MAE, 55% for RMSE, and 11% for R2. The prediction performance of FedMIME was worse than that of
FedAVG and FedProx due to its focus on personalization and the relatively small number of training examples
(hourly consumption data over less than one year). Thus, the improvement was higher in this case (over 85%).
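The relative improvements can be recomputed from the average metric values in Table 4. The sketch below assumes that the improvement for MAE and RMSE is the percentage reduction of the error, and that for R2 it is the percentage reduction of the unexplained variance, 1 − R2. The second assumption is ours: it is the convention that reproduces the reported R2 figures, whereas a plain relative change in R2 would give only about 1.4% over FedAVG. The function name is hypothetical.

```python
# Average metric values taken from Table 4.
ours = {"MAE": 0.07438, "RMSE": 0.10062, "R2": 0.96797}
baselines = {
    "FedAVG": {"MAE": 0.12014, "RMSE": 0.24993, "R2": 0.95495},
    "FedProx": {"MAE": 0.10431, "RMSE": 0.22456, "R2": 0.96366},
    "FedMIME": {"MAE": 0.50415, "RMSE": 0.69997, "R2": 0.31188},
}


def relative_improvement(metric, baseline, ours_value):
    """Percentage reduction of the error; for R2 the error is taken as 1 - R2."""
    if metric == "R2":
        baseline, ours_value = 1.0 - baseline, 1.0 - ours_value
    return 100.0 * (baseline - ours_value) / baseline


improvements = {
    name: {m: relative_improvement(m, vals[m], ours[m]) for m in ours}
    for name, vals in baselines.items()
}

for name, row in improvements.items():
    print(name, {m: round(v, 2) for m, v in row.items()})
```

With these conventions the recomputed values agree with the reported ones to within roughly 0.01 percentage points, the remaining differences being attributable to rounding.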
As a final note, the hierarchical FL methodology and adaptive hyperparameter tuning strategy presented here
are not restricted to energy prediction and can be applied in diverse fields characterized by data decentralization
and privacy concerns. Examples include distributed healthcare analytics (e.g., hospital-level patient data)54,55,
language modeling56,57, traffic58, and telecommunications59 forecasting among others. In each case, grouping
similar data sources into clusters and adjusting hyperparameters to local conditions enhances performance,
robustness, and scalability. Likewise, the GA-based hyperparameter tuning method is equally domain-agnostic.
It can efficiently search large and complex hyperparameter spaces to identify near-optimal configurations
without requiring explicit assumptions about the underlying data distribution or the nature of the predictive task.
This flexibility makes the proposed approach readily transferable to other fields where FL and hyperparameter
optimization are needed.
Conclusions
The proposed hierarchical federated learning solution for household energy prediction captures
household energy patterns well through clustering and hyperparameter tuning, excelling in scenarios where
FedAVG underperforms, with an average accuracy improvement of about 20%. It ensures good generalization
by introducing greater variance in training and cluster-level cross-validation while achieving accuracy
comparable to FedAVG in scenarios where FedAVG excels (around 4%). Additionally, it outperforms FedProx
and FedMIME, with significant gains in prediction accuracy. The network traffic is kept below 30 KB, and
hyperparameter tuning reduces model update sizes and communication rounds by 30%, making the approach
efficient in resource-constrained networks.
Data availability
All data generated or analysed during this study are included in this published article.
References
1. Sarmas, E. et al. Revving up energy autonomy: A forecast-driven framework for reducing reverse power flow in microgrids. Sustain. Energy Grids Netw. 38, 101376. https://doi.org/10.1016/j.segan.2024.101376 (Jun. 2024).
2. Aslam, S. et al. A survey on deep learning methods for power load and renewable energy forecasting in smart microgrids. Renew. Sustain. Energy Rev. 144, 110992. https://doi.org/10.1016/j.rser.2021.110992 (Jul. 2021).
3. Zhu, J. et al. Review and prospect of data-driven techniques for load forecasting in integrated energy systems. Appl. Energy 321, 119269. https://doi.org/10.1016/j.apenergy.2022.119269 (Sep. 2022).
4. Olusogo Popoola, M. et al. A critical literature review of security and privacy in smart home healthcare schemes adopting IoT & blockchain: Problems, challenges and solutions. Blockchain: Research and Applications 5 (2), 100178, ISSN 2096-7209. https://doi.org/10.1016/j.bcra.2023.100178 (2024).
5. Vigurs, C., Maidment, C., Fell, M. & Shipworth, D. Customer privacy concerns as a barrier to sharing data about energy use in smart local energy systems: A rapid realist review. Energies 14 (5), 1285 (2021).
6. Taïk, A. & Cherkaoui, S. Electrical Load Forecasting Using Edge Computing and Federated Learning, in ICC 2020–2020 IEEE International Conference on Communications (ICC), Jun. 2020, pp. 1–6. https://doi.org/10.1109/ICC40277.2020.9148937
7. Liu, H., Zhang, X., Shen, X. & Sun, H. A federated learning framework for smart grids: Securing power traces in collaborative learning, Nov. 01, 2021, arXiv: arXiv:2103.11870. https://doi.org/10.48550/arXiv.2103.11870
8. Li, Q., Diao, Y., Chen, Q. & He, B. Federated Learning on Non-IID Data Silos: An Experimental Study, in 2022 IEEE 38th International Conference on Data Engineering (ICDE), May 2022, pp. 965–978. https://doi.org/10.1109/ICDE53745.2022.00077
9. Zhu, H., Xu, J., Liu, S. & Jin, Y. Federated learning on non-IID data: A survey, Neurocomputing, vol. 465, pp. 371–390, Nov. 2021. https://doi.org/10.1016/j.neucom.2021.07.098
10. Savi, M. & Olivadese, F. Short-Term energy consumption forecasting at the edge: A federated learning approach. IEEE Access. 9, 95949–95969. https://doi.org/10.1109/ACCESS.2021.3094089 (2021).
11. Briggs, C., Fan, Z. & Andras, P. Federated learning for Short-Term residential load forecasting. IEEE Open. Access. J. Power Energy. 9, 573–583. https://doi.org/10.1109/OAJPE.2022.3206220 (2022).
12. He, Y., Luo, F., Ranzi, G. & Kong, W. Short-Term Residential Load Forecasting Based on Federated Learning and Load Clustering, in 2021 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Oct. 2021, pp. 77–82. https://doi.org/10.1109/SmartGridComm51999.2021.9632314
13. Tun, Y. L., Thar, K., Thwal, C. M. & Hong, C. S. Federated Learning based Energy Demand Prediction with Clustered Aggregation, in IEEE International Conference on Big Data and Smart Computing (BigComp), Jan. 2021, pp. 164–167. https://doi.org/10.1109/BigComp51126.2021.00039
14. Gholizadeh, N. & Musilek, P. Federated learning with hyperparameter-based clustering for electrical load forecasting, Internet Things, vol. 17, p. 100470, Mar. 2022. https://doi.org/10.1016/j.iot.2021.100470
15. Fernández, J. D., Menci, S. P., Lee, C. M., Rieger, A. & Fridgen, G. Privacy-preserving federated learning for residential short-term load forecasting. Appl. Energy. 326, 119915. https://doi.org/10.1016/j.apenergy.2022.119915 (Nov. 2022).
16. Duttagupta, A., Zhao, J. & Shreejith, S. Exploring Lightweight Federated Learning for Distributed Load Forecasting, in 2023 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Oct. 2023, pp. 1–6. https://doi.org/10.1109/SmartGridComm57358.2023.10333889
17. Wang, Y. & Guo, Q. Privacy-Preserving and adaptive federated deep learning for multiparty wind power forecasting. IEEE Trans. Ind. Appl. 1–11. https://doi.org/10.1109/TIA.2024.3430229 (2024).
18. Fekri, M. N., Grolinger, K. & Mir, S. Distributed load forecasting using smart meter data: federated learning with recurrent neural networks. Int. J. Electr. Power Energy Syst. 137, 107669. https://doi.org/10.1016/j.ijepes.2021.107669 (May 2022).
19. Hu, Y., Ren, H., Hu, C., Deng, J. & Xie, X. An Element-Wise Weights Aggregation Method for Federated Learning, 2023 IEEE International Conference on Data Mining Workshops (ICDMW), Shanghai, China, 2023, pp. 188–196. https://doi.org/10.1109/ICDMW60847.2023.00031
20. Hu, Z., Shaloudegi, K., Zhang, G. & Yu, Y. Federated Learning Meets Multi-Objective Optimization, in IEEE Transactions on Network Science and Engineering, vol. 9, no. 4, pp. 2039–2051, 1 July-Aug. (2022). https://doi.org/10.1109/TNSE.2022.3169117
21. Chifu, V., Cioara, T., Anitiei, C., Pop, C. & Anghel, I. FedWOA: A Federated Learning Model that uses the Whale Optimization Algorithm for Renewable Energy Prediction, Sep. 19, 2023, arXiv: arXiv:2309.10337. https://doi.org/10.48550/arXiv.2309.10337
22. Raiaan, M. A. K., Sakib, S., Fahad, N. M., Mamun, A. A., Rahman, M. A., Shatabda, S. & Mukta, M. S. H. A systematic review of hyperparameter optimization techniques in convolutional neural networks. Decis. Analytics J. 11, 100470. https://doi.org/10.1016/j.dajour.2024.100470 (2024).
23. Jingwen Zhou, S., Pal, C., Dong, K. & Wang Enhancing quality of service through federated learning in edge-cloud architecture. Ad Hoc Netw. 156, 103430. https://doi.org/10.1016/j.adhoc.2024.103430 (2024).
24. Kundroo, M. & Kim, T. Federated learning with hyper-parameter optimization. J. King Saud University-Computer Inform. Sci. 35 (9), 101740 (2023).
25. Qolomany, B., Ahmad, K., Al-Fuqaha, A. & Qadir, J. Particle Swarm Optimized Federated Learning For Industrial IoT and Smart City Services, GLOBECOM 2020–2020 IEEE Global Communications Conference, Taipei, Taiwan, pp. 1–6, (2020). https://doi.org/10.1109/GLOBECOM42002.2020.9322464
26. Fahd, N. et al. Pelican optimization algorithm with federated learning driven attack detection model in internet of things environment. Future Generation Comput. Syst. 148, 118–127. https://doi.org/10.1016/j.future.2023.05.029 (2023).
27. Połap, D. et al. A heuristic approach to the hyperparameters in training spiking neural networks using spike-timing-dependent plasticity. Neural Comput. Applic. 34, 13187–13200. https://doi.org/10.1007/s00521-021-06824-8 (2022).
28. Bukhari, S. M. S., Moosavi, S. K. R., Zafar, M. H., Mansoor, M., Mohyuddin, H., Ullah, S. S., … Sanfilippo, F. Federated transfer learning with orchard-optimized Conv-SGRU: A novel approach to secure and accurate photovoltaic power forecasting. Renewable Energy Focus 48, 100520 (2024).
29. Vasilis Michalakopoulos, E. et al. A machine learning-based framework for clustering residential electricity load profiles to enhance demand response programs. Appl. Energy. 361, 122943. https://doi.org/10.1016/j.apenergy.2024.122943 (2024).
30. Li, J. et al. Federated learning-based short-term Building energy consumption prediction method for solving the data silos problem. Build. Simul. 15, 1145–1159. https://doi.org/10.1007/s12273-021-0871-y (2022).
31. Petrangeli, E., Tonellotto, N. & Vallati, C. Performance evaluation of federated learning for residential energy forecasting. IoT 3 (3), 381–397 (2022).
32. Vasilis Michalakopoulos, E., Sarantinopoulos, E., Sarmas, V. & Marinakis Empowering federated learning techniques for privacy-preserving PV forecasting. Energy Rep. 12, 2244–2256. https://doi.org/10.1016/j.egyr.2024.08.033 (2024).
33. Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by simulated annealing. Science 220 (4598), 671–680 (1983).
34. Man, K. F., Tang, K. S. & Kwong, S. Genetic Algorithms: Concepts and Applications, IEEE Transactions on Industrial Electronics, Vol. 43, No. 5, 519 (October 1996).
35. UK Power Networks. SmartMeter Energy Consumption Data in London Households, https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-households
36. Arthur, D. & Vassilvitskii, S. K-means++: the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (SODA ‘07). Society for Industrial and Applied Mathematics, USA, 1027–1035. (2007).
37. Kaufman, L. & Rousseeuw, P. J. Finding Groups in Data: An Introduction to Cluster Analysis, ISBN:9780471878766, Online ISBN:9780470316801 (1990). https://doi.org/10.1002/9780470316801
38. Murtagh, F. & Contreras, P. Algorithms for hierarchical clustering: an overview. WIREs Data Min. Knowl. Discov. 2, 86–97. https://doi.org/10.1002/widm.53 (2012).
39. McMahan, H. B., Moore, E., Ramage, D., Hampson, S. & Agüera y Arcas, B. Communication-Efficient Learning of Deep Networks from Decentralized Data, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) (2017).
40. Spring Boot https://spring.io/projects/spring-boot
41. Python https://www.python.org/downloads/release/python-3123/
42. TensorFlow https://github.com/tensorflow/tensorflow/releases
43. Pandas https://pandas.pydata.org/.
44. scikit-learn https://scikit-learn.org/stable/
45. SciPy https://scipy.org/.
46. argparse https://docs.python.org/3/library/argparse.html
47. Protobuf https://protobuf.dev/.
48. Matplotlib https://matplotlib.org/.
49. Jenetics https://jenetics.io/.
50. Heuristic-Based Federated Learning on GitHub. https://github.com/mihaid150/Heuristic-Adaptive-Federated-Learning
51. Keras https://keras.io/.
52. Agarap, A. F. Deep Learning using Rectified Linear Units (ReLU), (2018). https://arxiv.org/abs/1803.08375
53. Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization, (2014). https://arxiv.org/abs/1412.6980
54. Brisimi, T. S. et al. Federated learning of predictive models from federated electronic health records. Int. J. Med. Inf. 112, 59–67. https://doi.org/10.1016/j.ijmedinf.2018.01.007 (Apr. 2018).
55. Choudhury, O. et al. Differential Privacy-enabled Federated Learning for Sensitive Health Data, Feb. 27, 2020, arXiv: arXiv:1910.02578. https://doi.org/10.48550/arXiv.1910.02578
56. McMahan, H. B., Ramage, D., Talwar, K. & Zhang, L. Learning Differentially Private Recurrent Language Models, Feb. 23, 2018, arXiv: arXiv:1710.06963. https://doi.org/10.48550/arXiv.1710.06963
57. Wu, X., Liang, Z. & Wang, J. FedMed: A Federated Learning Framework for Language Modeling, Sensors, vol. 20, no. 14, Art. no. 14, Jan. (2020). https://doi.org/10.3390/s20144048
58. Liu, Y., Yu, J. J. Q., Kang, J., Niyato, D. & Zhang, S. Privacy-Preserving Traffic Flow Prediction: A Federated Learning Approach, IEEE Internet Things J., vol. 7, no. 8, pp. 7751–7763, Aug. (2020). https://doi.org/10.1109/JIOT.2020.2991401
59. Perifanis, V., Pavlidis, N., Koutsiamanis, R. A. & Efraimidis, P. S. Federated learning for 5G base station traffic forecasting. Comput. Netw. 235, 109950. https://doi.org/10.1016/j.comnet.2023.109950 (Nov. 2023).
60. Microsoft PowerPoint https://www.microsoft.com/ro-ro/microsoft-365/powerpoint
Acknowledgements/Funding
This research received funding from the European Union’s Horizon Europe research and innovation program
under Grant Agreements number 101136216 (Hedge-IoT) and 101103998 (DEDALUS). Views and opinions
expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union
or the European Climate, Infrastructure, and Environment Executive Agency. Neither the European Union nor
the granting authority can be held responsible for them.
Author contributions
Conceptualization, T.C., I.A., V.M. and Ef.S.; methodology, T.C., L.T., El.S. and V.M.; writing – original draft
preparation, L.T., M.D., T.C., I.A., V.M., Ef.S. and El.S.; writing – review and editing, L.T., M.D., T.C., I.A., V.M.,
Ef.S. and El.S. All authors read and agreed to the submitted version of the manuscript.
Declarations
Competing interests
The authors declare no competing interests.
Additional information
Correspondence and requests for materials should be addressed to T.C. or I.A.