Abstract
Prediction problems arise routinely in practical applications, where the underlying systems are usually complex. For multi-model data from such complex systems, effectively identifying the pattern of each subset of the data is essential for subsequent prediction. This paper focuses on the prediction of multi-model data and proposes an adaptive regression algorithm (CFM-MSVR) that combines a clustering feedback mechanism with a modified support vector regression. The clustering feedback mechanism (CFM) clusters samples based on their residuals under each forecasting model, enabling the discovery of the original data-generation models. Meanwhile, it estimates the number of clusters intelligently from the number of samples in each cluster, reducing computational cost and dependence on empirical settings. In the regression stage, the modified support vector regression (MSVR) leverages the non-dominated sorting genetic algorithm II (NSGA-II) to optimise the parameters of the support vector regression, thereby improving the generalisation of each sub-model. The proposed method is evaluated on a simulated dataset, four real-world datasets, and the 2012 Global Energy Forecasting Competition dataset. Results show that CFM-MSVR achieves a MAPE% of 1.52 on the energy prediction task, demonstrating its strong performance in complex forecasting scenarios.
1 Introduction
Prediction problems are common in real applications such as power scheduling [1], bioinformatics [2], and environmental science [3]. The systems behind real applications are generally complex, so the prediction task is often challenging, particularly for multi-model data, where effectively identifying the model underlying each data point is both difficult and crucial to achieving satisfactory predictions. Therefore, this paper concentrates on multi-model data prediction by accurately recognising and learning the data of each underlying model.
In predictive-method research, scholars often assume that the data come from a single model, and many types of models have been developed, such as statistical models [4,5,6], Kalman filter methods [7, 8], artificial neural networks [9,10,11], deep learning models [12,13,14], and fuzzy logic methods [15,16,17]. These models have been applied in fields such as epidemic forecasting [18, 19], portfolio selection [20], weather forecasting [21, 22], financial forecasting [23, 24], and chloride diffusion forecasting in building materials [25]. Although such single-model approaches can provide good predictions [20, 26,27,28,29], their effectiveness is often constrained in complex multi-model scenarios because of their limited model assumptions; their prediction performance improves once the models underlying the data are diagnosed and modelled separately. Therefore, several clustering-based predictive models have been proposed to address multi-model data modelling. Algorithms such as K-means clustering [30,31,32], the fuzzy C-means algorithm [33], and multi-objective semi-supervised clustering models [34, 35] are first used to identify the data modes; models such as binary logistic regression [36], the Lion-Wolf-based deep belief network [37], kernel ridge regression [31], grey prediction [38], and support vector machines [39, 40] are then applied for prediction. To improve accuracy, feature extraction such as principal component analysis is also carried out. One of the most representative works is [34], in which the authors established a multi-objective semi-supervised clustering model for a predictive task on clinical data, designing two objective functions to minimise the cost function of K-medians clustering and the mean square error of cross-validation; the K-medians clustering was employed to diagnose the model of each data subset. Later, [35] improved upon [34] by modifying the objective functions to minimise the deviation of the data points within clusters and the prediction error of the response; notably, the non-dominated sorting genetic algorithm II (NSGA-II) was used to solve the multi-objective optimisation problem, and local regression was used to learn each model. The works in [30, 31, 33, 34, 35, 36, 37] all cluster the data first and then learn each cluster separately. Because the data within each cluster come from one model and share similar characteristics, this approach improves prediction accuracy and stability. The multi-objective optimisation models in [34] and [35] also help to overcome the sensitivity of clustering to its initial settings.
However, the optimal number of clusters cannot be determined directly, and finding it by trial and error is computationally expensive. In addition, clustering the data by simple Manhattan distance does not separate data from different models well, since such data are distributed in an interleaved manner. Furthermore, real-world data are usually non-linear, so linear regression is ineffective and a non-linear prediction model is required. Non-linear support vector regression (SVR) is one of the most popular methods for non-linear regression [41], but its performance is sensitive to parameter settings [42]; a dynamic parameter-adjustment mechanism can therefore further improve computational efficiency [43].
Building on these ideas, we propose an adaptive regression algorithm that forecasts multi-model data through a clustering procedure, combining a cluster-feedback mechanism (CFM) with a modified support vector regression (MSVR). Specifically, the CFM discovers the data models via a clustering process [44]. By adaptively selecting the number of clusters based on prediction residuals, the CFM not only reduces computational cost but also enhances prediction accuracy, while directing the algorithm towards the unknown original data-generation models. After the clustering process, non-linear SVR with a radial basis function forecasts the samples in each cluster. To improve the generalisation ability of each forecasting model, the MSVR uses NSGA-II to find better parameters for each SVR model. In summary, the contributions of this paper are as follows:
(1) An adaptive regression algorithm, named CFM-MSVR and based on a cluster-feedback mechanism, is proposed to forecast multi-model data. In particular, the CFM clusters the samples according to the residual of each sample under each forecasting model, which yields clusters closer to the real models and improves the forecasting accuracy.
(2) Based on the clusters adjusted by the CFM, the number of clusters is estimated intelligently from the number of samples in each cluster, which decreases the computational cost, the number of parameters, and the dependence on empirical settings.
(3) Focusing on non-linear forecasting problems, we propose the MSVR, which combines non-linear SVR with the non-dominated sorting genetic algorithm II (NSGA-II). To avoid overfitting, NSGA-II jointly optimises the number of support vectors and the prediction error, promoting generalisation and forecasting ability.
(4) We demonstrate the effectiveness of the proposed algorithm in terms of computational cost, generalisation ability, and forecasting accuracy on one simulated dataset, four real datasets, and a case study.
The organisation of this paper is as follows: Sect. 2 outlines the non-linear SVR algorithm and NSGA-II. Section 3 describes the proposed adaptive regression algorithm (CFM-MSVR) in detail. Section 4 tests the proposed method on one simulated dataset and four real datasets. Section 5 illustrates the performance of CFM-MSVR on data from the Global Energy Forecasting Competition 2012 (GEFCom2012). Section 6 concludes the paper.
2 Background
This section introduces the non-linear SVR algorithm and NSGA-II, which together form the basis of the proposed MSVR.
2.1 The non-linear SVR algorithm
The non-linear SVR algorithm is an effective forecasting method proposed by [45]. Compared with linear regression, non-linear SVR performs better when forecasting high-dimensional data [46]. Given a set of training data \(\{(x_1, y_1),(x_2, y_2),\cdots ,(x_N, y_N)\}\) with size N, where \(x_i \in R^{D},\ i=1, 2,\cdots , N\) are input feature vectors and \(y_i \in R^1,\ i=1, 2,\cdots , N\) are the corresponding labels, we use a non-linear function f(x) to regress the input data in high-dimensional space as follows:
\(f(x) = w^{T}\varphi (x) + b,\)
where \(x \in R^D\) is the input vector, w is the weight vector, and b is the bias. \(\varphi (x)\) is a non-linear function used to map x to a high-dimensional Hilbert space [47]. The weight vector w and the bias b can be obtained by solving the following optimisation problem:
\(\min \limits _{w,\, b,\, \zeta ,\, \zeta ^*}\ \frac{1}{2}\Vert w\Vert ^2 + C\sum \limits _{i=1}^{N}(\zeta _i + \zeta _i^*)\quad \text {s.t.}\quad y_i - w^T\varphi (x_i) - b \le \epsilon + \zeta _i,\quad w^T\varphi (x_i) + b - y_i \le \epsilon + \zeta _i^*,\quad \zeta _i,\ \zeta _i^* \ge 0, \qquad (1)\)
where C is the penalty factor used to avoid overfitting, \(\zeta _i\) and \(\zeta _i^*\) are slack variables, and \(\epsilon\) is a constant [48]. Finally, the regression function can be expressed as follows:
\(f(x) = \sum \limits _{i=1}^{N}(\alpha _i - \alpha _i^*)\, k(x, x_i) + b, \qquad (2)\)
where \(\alpha _i\) and \(\alpha _i^*\) are Lagrange multipliers and \(k(x, x_i) = <\varphi (x), \varphi (x_i)>\) is the kernel function. Common kernel functions include the linear, polynomial, and Gaussian kernels.
Remark 1
The selection of the penalty factor C, the kernel function \(k(x, x_i)\), and \(\epsilon\) strongly affects the forecasting accuracy. However, selecting these parameters is difficult and is mostly done by experience [49].
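To make the sensitivity in Remark 1 concrete, the following minimal sketch fits a non-linear SVR with an RBF kernel under several \((C, \epsilon )\) settings; scikit-learn is our choice of implementation here, as the paper does not prescribe one:

```python
import numpy as np
from sklearn.svm import SVR

# Toy data: a noisy non-linear signal.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

# C, epsilon, and the kernel are exactly the quantities Remark 1 flags as
# hard to choose: different settings give markedly different models.
for C, eps in [(0.1, 0.5), (10.0, 0.1), (1000.0, 0.01)]:
    f = SVR(kernel="rbf", C=C, epsilon=eps).fit(X, y)
    mse = np.mean((f.predict(X) - y) ** 2)
    print(f"C={C}, eps={eps}: {len(f.support_)} support vectors, MSE={mse:.4f}")
```

Even on this toy signal, the number of support vectors and the training error vary widely across the three settings, which is why the MSVR in Sect. 3.2 tunes these parameters automatically.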
2.2 The non-dominated sorting genetic algorithm II
The non-dominated sorting genetic algorithm II (NSGA-II), proposed by [50], is one of the most popular algorithms for solving multi-objective optimisation problems. Compared with NSGA, NSGA-II adds elitism and decreases the computational complexity and the number of parameters [51]. NSGA-II contains three main procedures: fast non-dominated sorting, crowding distance, and the selection operator [51]. A brief description of these procedures follows.
2.2.1 Fast non-dominated sorting
Fast non-dominated sorting is based on the concept of Pareto dominance. Suppose that a multi-objective (minimisation) problem includes n decision variables \(u_i,\ i = 1, 2, \cdots , n\) and k objective functions \(g_j,\ j = 1, 2, \cdots , k\). For any \(p,q\in \{1, 2, \cdots , n\}\), if \(g_j(u_p) \le g_j(u_q)\) for all \(j=1,\cdots ,k\) and \(g_j(u_p) < g_j(u_q)\) for at least one j, we say that \(u_p\) dominates \(u_q\), denoted by \(u_p \succ u_q\).
Moreover, a decision variable u is named as Pareto optimal if u is not dominated by any other variable.
The procedure of fast non-dominated sorting is as follows (a Python sketch is given after the steps):
Step 1: Initialize \(\Psi (u_i) = 0\) and \(S(u_i) = \emptyset\), where \(\Psi (u_i)\) and \(S(u_i)\) are the number of variables which dominate \(u_i\) and the set of solutions which are dominated by solution \(u_i\). For all \(i, j = 1,2,\cdots , n\), if \(u_i \succ u_j\), let \(\Psi (u_j) = \Psi (u_j) + 1\) and put \(u_j\) in \(S(u_i)\).
Step 2: Let \(k = 1\), and let \(P_{k}\) denote the set of solutions in the kth front. Set \((r_1, r_2,\cdots , r_n)^T = \textbf{0}\).
Step 3: Find \(Q \subset \{1,2,\cdots ,n\}\) such that \(\Psi (u_i) = 0\) and \(r_i = 0\) for all \(i \in Q\). Let \(r_i = k\) for all \(i \in Q\), and put the corresponding \(u_i\) into \(P_k\).
Step 4: For all \(i \in Q\), find each solution \(u_q\) in \(S(u_i)\) and let \(\Psi (u_q) = \Psi (u_q) - 1\).
Step 5: If \(r_i \ne 0\) for all \(i = 1, 2, \cdots , n\), stop. Otherwise, \(k = k + 1\), go to Step 3.
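The steps above translate directly into code. The following is a minimal Python sketch of fast non-dominated sorting under the minimisation convention; the variable names (G for the matrix of objective values) are ours:

```python
import numpy as np

def dominates(a, b):
    # u_p dominates u_q: no worse in every objective and strictly better
    # in at least one (minimisation convention of Sect. 2.2.1).
    return np.all(a <= b) and np.any(a < b)

def fast_non_dominated_sort(G):
    """G: (n, k) array of objective values; returns the front rank r_i >= 1
    of each solution, following Steps 1-5 above."""
    n = len(G)
    S = [[] for _ in range(n)]        # S(u_i): solutions dominated by u_i
    Psi = np.zeros(n, dtype=int)      # Psi(u_i): count of dominators of u_i
    for i in range(n):
        for j in range(n):
            if i != j and dominates(G[i], G[j]):
                S[i].append(j)
                Psi[j] += 1
    r = np.zeros(n, dtype=int)
    front = [i for i in range(n) if Psi[i] == 0]   # first front P_1
    k = 1
    while front:
        nxt = []
        for i in front:
            r[i] = k
            for q in S[i]:
                Psi[q] -= 1
                if Psi[q] == 0:       # q belongs to the next front
                    nxt.append(q)
        front, k = nxt, k + 1
    return r
```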
2.2.2 Crowding distance
Although fast non-dominated sorting ranks the solutions, it is difficult to judge which of two solutions in the same rank is better. The crowding distance is therefore calculated to rank solutions within the same front. For each objective j, the solutions are sorted in ascending order of \(g_j\), and the crowding distance of \(u_s\) accumulates the normalised gap between its two neighbours in that order:
\(cd(u_s) = \sum \limits _{j=1}^{k}\frac{g_j(u_{s+1}) - g_j(u_{s-1})}{g_j^{max} - g_j^{min}}, \qquad (3)\)
where \(g_j^{max}\) is the largest value among \(g_j(u_s),\ s = 1, 2,\cdots , n\); the definition of \(g_j^{min}\) is analogous, and the boundary solutions of each objective are assigned an infinite distance. The solution with the larger crowding distance is considered the better one.
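A corresponding minimal sketch of the crowding-distance computation (with the usual convention of keeping boundary solutions by giving them an infinite distance) is:

```python
import numpy as np

def crowding_distance(G):
    """G: (n, k) objective values of the solutions in one front;
    returns cd(u_s) for each solution, per the definition above."""
    n, k = G.shape
    cd = np.zeros(n)
    for j in range(k):
        order = np.argsort(G[:, j])             # sort the front by g_j
        span = G[order[-1], j] - G[order[0], j]
        cd[order[0]] = cd[order[-1]] = np.inf   # keep boundary solutions
        if span > 0 and n > 2:
            # interior solutions: normalised gap between the two neighbours
            cd[order[1:-1]] += (G[order[2:], j] - G[order[:-2], j]) / span
    return cd
```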
2.2.3 Selection operator
When generating an offspring population, n solutions are selected from the combined 2n solutions based on fast non-dominated sorting and the crowding distance. For \(p,q \in \{1, 2, \cdots , 2n\}\), the rule for selecting between \(u_p\) and \(u_q\) is: (1) if \(r_p < r_q\), retain \(u_p\) in the next generation; (2) if \(r_p = r_q\), compute the crowding distances \(cd(u_p)\) and \(cd(u_q)\); if \(cd(u_p)> cd(u_q)\), retain \(u_p\), otherwise retain \(u_q\).
With the components outlined above, the complete NSGA-II process is shown as follows.
First, an initial population \(u = \{u_1, u_2, \cdots , u_n\}\) is generated. Then a new generation \(u' = \{u'_1, u'_2, \cdots , u'_n\}\) is produced by crossover and mutation. Let \(u^* = u \cup u'\) and rank all members of \(u^*\) by fast non-dominated sorting and crowding distance. After that, n members are selected by the selection operator as the population u of the next generation. The workflow of NSGA-II is shown in Fig. 1.
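Putting the procedures together, the environmental selection step of one NSGA-II generation can be sketched as follows, reusing the two functions above; this is an illustration rather than a full implementation (crossover and mutation are omitted):

```python
import numpy as np

def nsga2_select(pop, G, n):
    """Keep n of the 2n combined solutions `pop` with objectives G,
    by front rank and then by crowding distance."""
    G = np.asarray(G, dtype=float)
    r = fast_non_dominated_sort(G)
    cd = np.zeros(len(pop))
    for front in np.unique(r):                 # crowding within each front
        idx = np.where(r == front)[0]
        cd[idx] = crowding_distance(G[idx])
    # lower rank wins; ties are broken by the larger crowding distance
    order = sorted(range(len(pop)), key=lambda i: (r[i], -cd[i]))
    return [pop[i] for i in order[:n]]
```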
3 Proposed Method
In this section, the details of the adaptive regression algorithm (CFM-MSVR) for forecasting multi-model data via a clustering process are presented. The two main components of CFM-MSVR are the cluster-feedback mechanism (CFM) and the modified SVR (MSVR). First, the CFM is used to discover the data models and adjust the number of clusters. Then, the MSVR is used to fit the regression models and tune the parameters of each model. Finally, the complete CFM-MSVR, built from the CFM and MSVR, is presented in detail.
3.1 The cluster-feedback mechanism
First, we initialise m clusters. The m centres \(\{c_1, c_2, \cdots , c_m\}\) are randomly chosen from the samples \(\{(x_1, y_1),(x_2, y_2),\cdots ,(x_N, y_N)\}\). Let \(Cl = \{Cl_i,\ i = 1,2,\cdots , m\}\), where \(Cl_i\) is the set of the ith cluster. Then, for each sample \((x_t,y_t),\ t=1,\cdots ,N\), let \(d((x_t,y_t),c_i)\) be the Manhattan distance from \((x_t,y_t)\) to \(c_i\). Find \(i^* \in \{1,2,\cdots ,m\}\) such that \(d((x_t, y_t), c_{i^*})\) is the minimum of \(\{d((x_t, y_t), c_1),\cdots , d((x_t, y_t), c_m)\}\), and add \((x_t,y_t)\) to \(Cl_{i^*}\).
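A minimal vectorised sketch of this initial assignment step is given below; the function name and the NumPy-based implementation are ours:

```python
import numpy as np

def initial_clusters(X, y, m, rng):
    """One assignment pass of Sect. 3.1: draw m random centres and assign
    each joint sample (x_t, y_t) to the nearest centre in Manhattan (L1)
    distance."""
    Z = np.column_stack([X, y])                       # samples (x_t, y_t)
    centres = Z[rng.choice(len(Z), size=m, replace=False)]
    d = np.abs(Z[:, None, :] - centres[None, :, :]).sum(axis=2)  # (N, m)
    labels = d.argmin(axis=1)                         # i* per sample
    return labels, centres
```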
Supposing we have obtained the forecasting models \(f_i,\ i = 1,2,\cdots ,m\), the predictive residual of each sample under each model, \(r_i(x_t, y_t),\ i = 1,2,\cdots ,m, \ t = 1,2,\cdots ,N\), can be calculated as follows:
\(r_i(x_t, y_t) = \left| y_t - f_i(x_t)\right| . \qquad (4)\)
Let \(p = \arg \min \limits _{i}(r_i(x_t, y_t))\) and put the sample \((x_t, y_t)\) into the pth cluster \(Cl_p\).
In addition, let \(size(Cl_i)\) be the number of samples in \(Cl_i\). After all of the samples are regrouped, \(size(Cl_i),\ i=1,2,\cdots ,m\) changes accordingly, and some clusters may contain only a few samples. Using Eq. (5), we adjust the number of clusters:
\(Cl^* = \left\{ Cl_i \mid size(Cl_i) \ge \hat{\epsilon },\ i = 1, 2, \cdots , m\right\} , \qquad (5)\)
where \(Cl^*\) is the adjusted set of clusters and \(\hat{\epsilon }\) is a threshold value. \(len(Cl^*)\) is the number of clusters in \(Cl^*\). Then, if \(len(Cl^*) \ne len(Cl)\), choose centres randomly from the samples again in the next iteration. Otherwise, the clusters in Cl are all retained and used in the next iteration. The steps of the CFM are illustrated in Algorithm 1.
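The following sketch illustrates one feedback pass of Algorithm 1 under our reading of the text: samples are regrouped to the model with the smallest residual (Eq. (4)), and clusters smaller than the threshold are dropped (Eq. (5)). The fitted models are assumed to expose a predict() method:

```python
import numpy as np

def cfm_step(X, y, models, eps_hat):
    """One feedback pass of Algorithm 1 (a sketch, under our reading):
    regroup each sample to the model with the smallest residual, then
    drop clusters smaller than the threshold eps_hat."""
    R = np.column_stack([np.abs(y - f.predict(X)) for f in models])  # r_i
    labels = R.argmin(axis=1)                  # p = argmin_i r_i(x_t, y_t)
    sizes = np.bincount(labels, minlength=len(models))
    keep = [i for i, s in enumerate(sizes) if s >= eps_hat]          # Cl*
    changed = len(keep) != len(models)         # len(Cl*) != len(Cl)
    return labels, keep, changed
```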
3.2 The modified SVR
Because the selection of the penalty factor C and \(\epsilon\) influences the performance of SVR models, we use NSGA-II to adjust C and \(\epsilon\). Supposing that the samples \(\{(x_1, y_1),(x_2, y_2),\cdots ,(x_N, y_N)\}\) have been grouped into m clusters (\(Cl_i = \{(x_t, y_t)\mid (x_t, y_t)\ \text {is grouped into the }i \text {th cluster}\},\ i = 1, 2, \cdots , m\)), we first initialise n agents \(u_j = (\{C_1, \epsilon _1\}, \{C_2, \epsilon _2\}, \cdots , \{C_m, \epsilon _m\}),\ j = 1, 2, \cdots , n\). Then, for each agent \(u_j\), the regression functions \(f^{(j)} = \{f_i(Cl_i\mid C_i, \epsilon _i),\ i = 1,2,\cdots ,m\}\) can be trained on the data in \(Cl_i\) with the parameters \(\{C_i, \epsilon _i\}\) by Eq. (1), giving m SVR models \(f^{(j)}\) for the jth agent. Based on these models, the two objective functions, the number of support vectors and the sum of the validation errors, can be calculated. The optimisation problem is as follows:
\(\min \limits _{C,\, \epsilon }\ (g_1, g_2),\quad g_1 = \sum \limits _{i=1}^{m}\psi \big (f_i^{(j)}(Cl_i\mid C_i, \epsilon _i)\big ),\quad g_2 = \sum \limits _{i=1}^{m} err\big (f_i^{(j)}(Cl_i\mid C_i, \epsilon _i)\big ), \qquad (6)\)
where \(C = (C_1,C_2,\cdots ,C_m)\), \(\epsilon = (\epsilon _1, \epsilon _2,\cdots , \epsilon _m)\), \(\psi (f_i^{(j)}(Cl_i\mid C_i, \epsilon _i))\) is the number of support vectors in \(f_i^{(j)}(Cl_i\mid C_i, \epsilon _i)\), and \(err(\cdot )\) is the validation error of the corresponding sub-model. Then \(u_j\) is updated by NSGA-II until the last iteration is finished, after which we need to find the best agent in \(P_1\) [34]. First, we normalise the objective function values of the agents in \(P_1\):
\(NormG_k^{(p)} = \frac{g_k^{(p)} - mean(g_k)}{std(g_k)},\quad k = 1, 2,\ p = 1, 2, \cdots , n_p, \qquad (7)\)
where \(n_p\) is the number of agents in the first front, \(g_k^{(p)}\) is the kth objective function value of the pth agent in \(P_1\), and \(mean(g_k)\) and \(std(g_k)\) stand for the average and standard deviation of the kth objective function over all agents in \(P_1\).
Then a positive vector P can be determined by the following equation:
\(P = \Big [ \min \limits _{p = 1,\cdots ,n_p} NormG_1^{(p)},\ \min \limits _{p = 1,\cdots ,n_p} NormG_2^{(p)}\Big ] , \qquad (8)\)
where \(\min \limits _{p = 1,\cdots ,n_p} NormG_k^{(p)},\ k=1,2\) is the minimum value of \(NormG_k^{(p)}\), attained at the index \(p_k\). Finally, the similarity between each agent in the first front and P can be calculated as follows:
\(Sim(p) = \frac{\langle NormG^{(p)},\ P\rangle }{\Vert NormG^{(p)}\Vert \, \Vert P\Vert },\quad p = 1, 2, \cdots , n_p, \qquad (9)\)
where \(NormG^{(p)} = [NormG_1^{(p)},\ NormG_2^{(p)}]\). The agent with the highest similarity is chosen. The overall steps of the MSVR are illustrated in Algorithm 2.
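The two objectives of Eq. (6) can be evaluated per agent as in the sketch below; the hold-out validation split and the scikit-learn SVR are our assumptions, since the paper does not fix these implementation details. NSGA-II then evolves the agents on \((g_1, g_2)\), and the best agent in \(P_1\) is selected via Eqs. (7)-(9):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

def evaluate_agent(clusters, params):
    """Objectives of one agent u_j = [(C_1, eps_1), ..., (C_m, eps_m)]:
    g1 = total number of support vectors, g2 = summed validation error.
    `clusters` is a list of (X_i, y_i) pairs, one per cluster."""
    g1, g2 = 0, 0.0
    for (Xc, yc), (C, eps) in zip(clusters, params):
        Xtr, Xval, ytr, yval = train_test_split(
            Xc, yc, test_size=0.2, random_state=0)
        f = SVR(kernel="rbf", C=C, epsilon=eps).fit(Xtr, ytr)
        g1 += len(f.support_)                          # psi(f_i)
        g2 += np.mean((f.predict(Xval) - yval) ** 2)   # validation error
    return g1, g2
```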
3.3 The proposed CFM-MSVR
This section presents the adaptive regression algorithm (CFM-MSVR) for forecasting multi-model data via the clustering process. CFM-MSVR uses non-linear regression to forecast the multi-model data, and the CFM finds the different models quickly, improving the forecasting accuracy: the CFM separates the data belonging to the different models, and the MSVR computes the forecasting models, so their combination forecasts multi-model data quickly and accurately. The main steps of CFM-MSVR are as follows.
Step 1: Initialise the agents and parameters, and randomly choose the centres from the input data.
Step 2: Cluster the data based on the Manhattan distance.
Step 3: Use MSVR to calculate the forecasting models for each cluster.
Step 4: Regroup the data and adjust the number of clusters based on each model by the CFM. If the number of clusters changes, randomly choose the centres again. Go to Step 3 and repeat until the maximum number of iterations is reached; a code skeleton of this loop follows Fig. 2 below.
The overall structures of the proposed CFM-MSVR are shown in Fig. 2, where Iter and iter are the iterations of CFM and MSVR, respectively.
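Under our reading of Fig. 2, the overall loop can be sketched as follows. Here fit_msvr is a hypothetical wrapper that runs NSGA-II over evaluate_agent from Sect. 3.2 and returns one tuned SVR per cluster; the other helpers are the sketches given earlier:

```python
import numpy as np

def cfm_msvr(X, y, m=12, eps_hat=40, max_iter=10, seed=0):
    """Skeleton of the CFM-MSVR loop in Fig. 2 (an illustration, not the
    authors' exact implementation)."""
    rng = np.random.default_rng(seed)
    labels, _ = initial_clusters(X, y, m, rng)             # Steps 1-2
    for _ in range(max_iter):                              # CFM iterations
        clusters = [(X[labels == i], y[labels == i]) for i in range(m)]
        models = fit_msvr(clusters)          # Step 3 (hypothetical wrapper)
        labels, keep, changed = cfm_step(X, y, models, eps_hat)  # Step 4
        if changed:                          # clusters were dropped:
            m = len(keep)                    # re-draw the centres
            labels, _ = initial_clusters(X, y, m, rng)
    return models, labels
```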
Remark 2
The settings of the initial number of clusters m and the threshold value \(\hat{\epsilon }\) are usually based on experience. The initial number of clusters tends to be set to a large value. When forecasting data with N samples, a threshold value of about \(\frac{N}{2m}\) is suitable: if the threshold is too small, the number of clusters will not change, while if it is too large, some clusters will be deleted erroneously. Moreover, when the data are heavily affected by noise, using more models to forecast one dataset may yield better results.
4 Numerical simulation
In this section, the proposed CFM-MSVR is tested on one constructive dataset and four real datasets to demonstrate its effectiveness.
When experimenting on the constructive data, the validation error of CFM-MSVR is recorded and discussed. To verify the superiority of the proposed CFM-MSVR, its results are compared with those of the method in [35], denoted MethodG. In addition, to verify the effectiveness of the CFM, we replace the linear regression in MethodG with the MSVR (denoted MethodG-MSVR) and compare its results as well.
When experimenting on the four real datasets, the computational cost and forecasting accuracy of CFM-MSVR are discussed, and the results are compared with those of MethodG. In addition, to assess the stability of the proposed CFM-MSVR, the differential entropy in each iteration of CFM is recorded.
4.1 Constructive data
We construct samples generated by two models; our goal is to identify the model to which each sample belongs and to forecast the constructed data with the proposed CFM-MSVR. Here we uniformly sample 450 data points from \([-8\pi , -3\pi ]\cup [-3\pi ,-\pi ]\cup [\pi ,3\pi ]\cup [3\pi ,8\pi ]\) as x, and the label value y(x) is defined as follows:
In addition, normally distributed noise is added to the original data (x, y), and the data are then standardised.
In this experiment, the initial number of clusters is \(m = 6\) and \(Max\_Iter = 10\) is used in the CFM; 25 iterations and 50 search agents are used in the MSVR. Moreover, the radial basis function (RBF) is chosen as the kernel function, and the threshold \(\hat{\epsilon }\) is 80. In NSGA-II, the crossover rate is set to 0.8 and the mutation rate to 0.01. For MethodG, the initial number of clusters is set to 2 and the number of iterations to 50.
Figure 3 shows the results of the numerical simulation using CFM-MSVR and MethodG. With CFM-MSVR, all data are assigned to their original model correctly, which verifies the effectiveness of the proposed method. Where the two groups of data are densely interleaved, the CFM still separates the data of the two models successfully, whereas MethodG, which relies on the shortest distance, has difficulty separating them. Moreover, although the true number of models is unknown when the number of clusters is set, CFM-MSVR with the CFM finds the true number of clusters and recovers the two models.
The validation error of CFM-MSVR is recorded and compared with the results of MethodG and MethodG-MSVR. Table 1 shows the minimum and average validation error (\(g_2\)) in each iteration for the three methods; CFM-MSVR achieves the best overall forecasting accuracy. The average validation error of MethodG is 1.43e+04, whereas that of CFM-MSVR is 95.5, the smallest of all, showing that the use of the CFM improves the average performance. Meanwhile, the average validation error of MethodG-MSVR is 184; comparing MethodG with MethodG-MSVR shows that the MSVR significantly improves forecasting accuracy on data from complex systems.
4.2 Real data
In this section, the proposed CFM-MSVR is tested on four real datasets, Boston Housing, Concrete Compressive Strength, QSAR Aquatic Toxicity, and Concrete Slump Test, all of which are publicly available from https://archive.ics.uci.edu/datasets. The details of the four datasets are given in Table 2. We discuss the validation error and computational cost of CFM-MSVR on the four real datasets and compare its results with those of MethodG.
In this experiment, the initial number of clusters in CFM-MSVR is set to 12, \(Max\_Iter\) and \(max\_iter\) are set to 10 and 50, and the number of agents is 25. Moreover, the radial basis function is selected as the kernel function, and the threshold \(\hat{\epsilon }\) is set to 40 for Data 1 and Data 3, 80 for Data 2, and 5 for Data 4. In NSGA-II, the crossover and mutation rates are set to 0.8 and 0.01, respectively. For MethodG, the number of iterations is 50 and the number of agents is 25, with the number of clusters varied from 12 down to 2. The experiments are conducted in a Python 3.7 environment on a 1.4 GHz quad-core Intel Core i5 with 8 GB of RAM.
To assess the stability of the proposed algorithm, the differential entropy of the distances between the samples and their cluster centres is calculated after each iteration of CFM (Eq. (10)), where N is the number of samples, \(X = (X_1,\cdots ,X_N)\) is the sample data, and \(C(X_i)\) is the centre of the cluster containing the sample \(X_i\).
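Since the exact estimator of Eq. (10) is not reproduced above, the sketch below uses one plausible choice, the closed-form differential entropy of a Gaussian fitted to the sample-to-centre distances; this particular estimator is our assumption:

```python
import numpy as np

def distance_entropy(Z, centres, labels):
    """Differential entropy of the sample-to-centre distances. The exact
    estimator of Eq. (10) is not reproduced in the text; as an assumption,
    we use the closed form for a Gaussian fit: h = 0.5*ln(2*pi*e*var(d))."""
    d = np.abs(Z - centres[labels]).sum(axis=1)   # Manhattan d(X_i, C(X_i))
    return 0.5 * np.log(2 * np.pi * np.e * np.var(d))
```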
The results of CFM-MSVR and MethodG on the four real datasets are recorded in Table 3, which contains the computational cost and forecasting accuracy. The differential entropy in each iteration is also recorded and displayed in Fig. 4.
When forecasting complex real data, the proposed CFM-MSVR performs better than MethodG. For example, on the high-dimensional dataset Data 1, the minimum validation error of CFM-MSVR is 32.12, whereas that of MethodG is 164.91, much higher. Meanwhile, CFM-MSVR significantly decreases the computational cost: on Data 1, the time costs of CFM-MSVR and MethodG are 8360 s and 19,081 s, respectively. Using CFM-MSVR to forecast complex multi-model data is therefore more effective.
To verify the stability of the proposed method, the differential entropy of Eq. (10) in each iteration of CFM on the four real datasets is shown in Fig. 4. The differential entropy decreases gradually over the iterations and tends to converge, which indicates the stability of the proposed CFM-MSVR.
5 The global energy forecasting competition 2012
In this section, CFM-MSVR is tested on data from GEFCom2012. The data of Zone 21 are chosen; the hourly load from 2004 to 2006 (26,280 samples) is used as the training set, and the hourly data of 2007 (8760 samples) as the test set. Considering the total hourly load of the 20 power stations, \(L_t\), \(M_t\), \(W_t\), \(H_t\), and \(T_t\) are the inputs and \(y_t\) is the output, where \(y_t\) is the real load at time t, \(L_t = t\) is a linear trend term, \(M_t\), \(W_t\), and \(H_t\) denote the month, the day of the week, and the hour of the day at time t, and \(T_t\) is the temperature at time t.
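A minimal sketch of this feature construction with pandas is given below; the input frame df, with a DatetimeIndex and columns load and temp, is our assumed layout:

```python
import pandas as pd

def make_features(df):
    """Build the inputs (L_t, M_t, W_t, H_t, T_t) and output y_t from an
    hourly frame `df` with a DatetimeIndex and columns `load` and `temp`
    (the frame layout is our assumption)."""
    out = pd.DataFrame(index=df.index)
    out["L"] = range(1, len(df) + 1)     # linear trend term L_t = t
    out["M"] = df.index.month            # month M_t
    out["W"] = df.index.dayofweek        # day of the week W_t
    out["H"] = df.index.hour             # hour of the day H_t
    out["T"] = df["temp"]                # temperature T_t
    return out, df["load"]               # y_t: real load at time t
```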
To measure the performance of each model, the mean absolute percentage error (MAPE) is calculated as follows:
\(MAPE = \frac{1}{N_{test}}\sum \limits _{t=1}^{N_{test}}\left| \frac{y_t - \hat{y}_t}{y_t}\right| \times 100\%, \qquad (11)\)
where \(N_{test}\) is the size of the test set, \(y_t\) is the real load at time t, and \(\hat{y}_t\) is the forecast load at time t. MAPE treats errors proportionally and reduces the impact of outliers, making it more robust than the root mean square error (RMSE) for measuring prediction performance. To demonstrate the superiority of the proposed CFM-MSVR, its MAPE values are compared with those of MethodG and the other methods mentioned in [52].
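A one-line implementation of the MAPE defined above:

```python
import numpy as np

def mape(y_true, y_pred):
    """MAPE (in percent) as defined in Eq. (11)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
```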
In CFM-MSVR, the number of agents is set to 15 and the initial number of clusters to 80, and the maximum numbers of iterations of CFM and MSVR are set to 10 and 5, respectively. The threshold value \(\hat{\epsilon }\) is 100. In NSGA-II, the crossover and mutation rates are set to 0.8 and 0.01, respectively. For MethodG, the number of agents is 15 and the number of iterations is 50. On all datasets the results of the algorithm converge after 10 iterations, so for ease of comparison the results shown in Fig. 4 are those of the first 10 iterations.
With CFM-MSVR, after the clusters with few samples are deleted, the number of clusters stays at around 40, which represents the number of models in the GEFCom2012 data; Fig. 5(a) shows the curve of the number of clusters. With the help of the CFM and MSVR, different models are found, which correspond to different electricity-load modes. In addition, we test the stability of CFM-MSVR via the differential entropy, which decreases gradually and settles at around 366.39 by the last iteration, as shown in Fig. 5(b).
CFM-MSVR generally predicts the data with lower residuals than MethodG. The loads forecast by CFM-MSVR and MethodG are shown in Fig. 6(a), and Fig. 6(b) illustrates the residuals of the hourly load for the two methods. As the figures show, on the test data CFM-MSVR improves the forecasting accuracy through the CFM and MSVR. Although some maximum and minimum points are not predicted well, the overall forecasting result of CFM-MSVR is excellent, with residuals much lower than those of MethodG.
Figure 7 shows the results of CFM-MSVR, MethodG, and the other methods mentioned in [52], including IRLS_bis, IRLS_log, and other popular forecasting methods; the results reported in [52] are used. The MAPE% of CFM-MSVR is 1.52, whereas that of MethodG is 24.18: compared with MethodG, the non-linear forecasting approach is clearly effective, and CFM-MSVR obtains more accurate forecasts. Several well-known methods, including IRLS_bis, MLR, ANN, SVR, GRR, and RFR, are also compared; for example, the MAPE% of IRLS_bis and MLR are 5.3 and 5.22, respectively. CFM-MSVR performs best among all of the methods. In particular, compared with SVR (MAPE% of 5.23), CFM-MSVR lowers the MAPE% by using the CFM to identify the samples of the different models and NSGA-II to find better parameters for each forecasting model. To sum up, the performance of CFM-MSVR is stable and significant: combined with the CFM and MSVR, the MAPE% on the test data is 1.52, and the forecasting accuracy of CFM-MSVR surpasses that of the other forecasting methods.
6 Conclusions
This paper proposes an adaptive regression algorithm (CFM-MSVR) that forecasts multi-model data via a clustering process. First, the cluster-feedback mechanism recognises each model and adjusts the samples in each cluster; the modified SVR is then applied to forecast the data in each cluster. The proposed method is tested on one constructive dataset and four real datasets, where the results illustrate the superiority of CFM-MSVR in computational cost and forecasting accuracy; for instance, on Data 1 the validation error is 32.12 and the time cost is 8360 s. Moreover, CFM-MSVR is tested on the GEFCom2012 dataset, and the results are compared with MethodG and other popular forecasting methods. The MAPE% of CFM-MSVR is 1.52, which shows that CFM-MSVR performs better on complex multi-model data.
Despite extensive experiments, this study may not fully capture the diversity and dynamics of real-world applications, potentially limiting the algorithm's generalisability. Moreover, the cluster-feedback mechanism needs further improvement, especially in setting the threshold value \(\hat{\epsilon }\): a more intelligent setting of \(\hat{\epsilon }\) would improve the effectiveness and accuracy of identifying each model. We will explore these aspects in future research.
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Mashlakov A, Pournaras E, Nardelli PH, Honkapuro S. Decentralized cooperative scheduling of prosumer flexibility under forecast uncertainties. Appl Ener. 2021;290: 116706.
Dai Y, Chen H, Zhuang S, Feng X, Fang Y, Tang H, et al. Immunodominant regions prediction of nucleocapsid protein for SARS-CoV-2 early diagnosis: a bioinformatics and immunoinformatics study. Pathogens Global Health. 2020;114(8):463–70.
Chen T, Yan ZA, Xu D, Wang M, Huang J, Yan B, et al. Current situation and forecast of environmental risks of a typical lead-zinc sulfide tailings impoundment based on its geochemical characteristics. J Environ Sci. 2020;93:120–8.
Shams SR, Jahani A, Kalantary S, Moeinaddini M, Khorasani N. The evaluation on artificial neural networks (ANN) and multiple linear regressions (MLR) models for predicting SO2 concentration. Urban Climate. 2021;37: 100837.
Li X, Liu Y, Fan L, Shi S, Zhang T, Qi M. Research on the prediction of dangerous goods accidents during highway transportation based on the ARMA model. J Loss Prev Process Ind. 2021;72: 104583.
Singh S, Mohapatra A. Repeated wavelet transform based ARIMA model for very short-term wind speed forecasting. Renew Energy. 2019;136:758–68.
Singh KK, Kumar S, Dixit P, Bajpai MK. Kalman filter based short term prediction model for COVID-19 spread. Appl Intell. 2021;51(5):2714–26.
Aly HH. An intelligent hybrid model of neuro Wavelet, time series and Recurrent Kalman Filter for wind speed forecasting. Sustain Energy Technol Assess. 2020;41: 100802.
Huang X, Jagota V, Espinoza-Muñoz E, Flores-Albornoz J. Tourist hot spots prediction model based on optimized neural network algorithm. Int J Syst Assu Eng Manag. 2022;13(1):63–71.
Hu H, Wang L, Tao R. Wind speed forecasting based on variational mode decomposition and improved echo state network. Renew Energy. 2021;164:729–51.
Sun W, Huang C. A carbon price prediction model based on secondary decomposition algorithm and optimized back propagation neural network. J Clean Prod. 2020;243: 118671.
Yan X, Weihan W, Chang M. Research on financial assets transaction prediction model based on LSTM neural network. Neural Comput Appl. 2021;33(1):257–70.
Srivastava T, Vedanshu Tripathi M. Predictive analysis of RNN, GBM and LSTM network for short-term wind power forecasting. J Stat Manag Syst. 2020;23(1):33–47.
Zhang J, Wei Y, Tan Z. An adaptive hybrid model for short term wind speed forecasting. Energy. 2020;190: 115615.
Khorramdel B, Chung C, Safari N, Price G. A fuzzy adaptive probabilistic wind power prediction framework using diffusion kernel density estimators. IEEE Trans Power Syst. 2018;33(6):7109–21.
Garg C, Namdeo A, Singhal A, Singh P, Shaw RN, Ghosh A. Adaptive fuzzy logic models for the prediction of compressive strength of sustainable concrete. Cham: Springer; 2022.
Bagherian-Marandi N, Ravanshadnia M, Akbarzadeh-T MR. Two-layered fuzzy logic-based model for predicting court decisions in construction contract disputes. Arti Intell Law. 2021;29(4):453–84.
Tomar A, Gupta N. Prediction for the spread of COVID-19 in India and effectiveness of preventive measures. Sci Total Environ. 2020;728: 138762.
Torrealba-Rodriguez O, Conde-Gutiérrez R, Hernández-Javier A. Modeling and prediction of COVID-19 in Mexico applying mathematical and computational models. Chaos, Solitons & Fractals. 2020;138: 109946.
Bodnar T, Lindholm M, Niklasson V, Thorsén E. Bayesian portfolio selection using VaR and CVaR. Appl Math Comput. 2022;427: 127120.
Moosavi A, Rao V, Sandu A. Machine learning based algorithms for uncertainty quantification in numerical weather prediction models. J Comput Sci. 2021;50: 101295.
Goliatt L, Yaseen ZM. Development of a hybrid computational intelligent model for daily global solar radiation prediction. Expert Syst Applic. 2022;12: 118295.
Cheng D, Yang F, Xiang S, Liu J. Financial time series forecasting with multi-modality graph neural network. Pattern Recogn. 2022;121: 108218.
Lv P, Shu Y, Xu J, Wu Q. Modal decomposition-based hybrid model for stock index prediction. Expert Syst Appl. 2022;202: 117252.
Liu QF, Iqbal MF, Yang J, Lu XY, Zhang P, Rauf M. Prediction of chloride diffusivity in concrete using artificial neural network: modelling and performance evaluation. Constr Build Mater. 2021;268: 121082.
Tian Z, Chen H. Multi-step short-term wind speed prediction based on integrated multi-model fusion. Appl Energy. 2021;298: 117248.
Zhang Y, Zhang R, Ma Q, Wang Y, Wang Q, Huang Z, et al. A feature selection and multi-model fusion-based approach of predicting air quality. ISA Trans. 2020;100:210–20.
Tobore I, Kandwal A, Li J, Yan Y, Omisore OM, Enitan E, et al. Towards adequate prediction of prediabetes using spatiotemporal ECG and EEG feature analysis and weight-based multi-model approach. Knowl-Based Syst. 2020;209: 106464.
Ahmed K, Sachindra D, Shahid S, Iqbal Z, Nawaz N, Khan N. Multi-model ensemble predictions of precipitation and temperature using machine learning algorithms. Atmos Res. 2020;236: 104806.
Jia B, Li R, Wang C, Qiu C, Wang X. Cluster-based content caching driven by popularity prediction. CCF Trans High Perf Comput. 2022;3:66.
Seok HS. Enhancing performance of gene expression value prediction with cluster-based regression. Genes Genom. 2021;43(9):1059–64.
Dileep P, Rao KN, Bodapati P, Gokuruboyina S, Peddi R, Grover A, et al. An automatic heart disease prediction using cluster-based bi-directional LSTM (C-BiLSTM) algorithm. Neural Comput Appl. 2023;35(10):7253–66.
Li S, Chang J, Chu M, Li J, Yang A. A blast furnace coke ratio prediction model based on fuzzy cluster and grid search optimized support vector regression. Appl Intell. 2022;1:10.
Akbarzadeh Khorshidi H, Aickelin U, Haffari G, Hassani-Mahmooei B. Multi-objective semi-supervised clustering to identify health service patterns for injured patients. Health Inf Sci Syst. 2019;7(1):1–8.
Ghasemi Z, Khorshidi HA, Aickelin U. Multi-objective Semi-supervised clustering for finding predictive clusters. Expert Syst Appl. 2022;195: 116551.
Rubio-Rivas M, Corbella X. Clinical phenotypes and prediction of chronicity in sarcoidosis using cluster analysis in a prospective cohort of 694 patients. Eur J Intern Med. 2020;77:59–65.
Ramanathan L, Parthasarathy G, Vijayakumar K, Lakshmanan L, Ramani S. Cluster-based distributed architecture for prediction of student’s performance in higher education. Clust Comput. 2019;22(1):1329–44.
Luo H, Wang J, Lin D, Kong L, Zhao Y, Guan YL. A novel energy-efficient approach based on clustering using grey prediction in WSNs for IoT infrastructures. IEEE Internet Things J. 2024;11(14):24748–60.
Candelieri A, Giordani I, Archetti F, Barkalov K, Meyerov I, Polovinkin A, et al. Tuning hyperparameters of a SVM-based water demand forecasting system through parallel global optimization. Comput Opera Res. 2019;106:202–9.
Chen S, Jq Wang, Hy Zhang. A hybrid PSO-SVM model based on clustering algorithm for short-term atmospheric pollutant concentration forecasting. Technol Forecast Soc Chang. 2019;146:41–54.
Wang YG, Wu J, Hu ZH, McLachlan GJ. A new algorithm for support vector regression with automatic selection of hyperparameters. Pattern Recogn. 2023;133: 108989.
Wu J, Wang YG. A working likelihood approach to support vector regression with a data-driven insensitivity parameter. Int J Machine Learn Cyber. 2022;1:17.
Ye S, Zhou K, Zain AM, Wang F, Yusoff Y. A modified harmony search algorithm and its applications in weighted fuzzy production rule extraction. Front Inf Technol Elect Eng. 2023;24(11):1574–90.
Yang Y, Zhou H, Wu J, Liu CJ, Wang YG. A novel decompose-cluster-feedback algorithm for load forecasting with hierarchical structure. Int J Elect Power Energy Syst. 2022;142: 108249.
Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–97.
Balogun AL, Rezaie F, Pham QB, Gigović L, Drobnjak S, Aina YA, et al. Spatial prediction of landslide susceptibility in western Serbia using hybrid support vector regression (SVR) with GWO, BAT and COA algorithms. Geosci Front. 2021;12(3): 101104.
Yang Y, Che J, Deng C, Li L. Sequential grid approach based support vector regression for short-term electric load forecasting. Appl Energy. 2019;238:1010–21.
Panahi M, Sadhasivam N, Pourghasemi HR, Rezaie F, Lee S. Spatial prediction of groundwater potential mapping based on convolutional neural network (CNN) and support vector regression (SVR). J Hydrol. 2020;588: 125033.
Utami NA, Maharani W, Atastina I. Personality classification of facebook users according to big five personality using SVM (support vector machine) method. Proc Comput Sci. 2021;179:177–84.
Deb K, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput. 2002;6(2):182–97.
Kumar M, Guria C. The elitist non-dominated sorting genetic algorithm with inheritance (i-NSGA-II) and its jumping gene adaptations for multi-objective optimization. Inf Sci. 2017;382:15–37.
Aflaki A, Gitizadeh M, Kantarci B. Accuracy improvement of electrical load forecasting against new cyber-attack architectures. Sustain Cities Soc. 2022;77: 103523.
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions. The work is supported by the Chinese Fundamental Research Funds for the Central Universities (WUT: 213114009) and the Natural Science Foundation of Shandong Province, China (No. ZR2024QF057). Additional funding was provided by the Innovation Team Project of the Guangdong Provincial Department of Education (Grant No. 2022WCXTD009), the Key Field Special Project of Scientific Research by the Guangdong Provincial Department of Education (Grant No. 2024ZDZX2088), and the Guangdong Province Graduate Education Innovation Plan Project (Grant No. 2024SFKC_042). This work is also supported by the Science and Technology Innovation Program of Hunan Province (2022RC4028).
Author information
Contributions
Shangrui Zhao and Weiqi Yu: Methodology, Software and Writing - Original Draft; Yulu Wu and Xi’an Li: Writing-Reviewing and Editing; You-Gan Wang and Jinran Wu: Writing-Reviewing and Editing. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhao, S., Yu, W., Wu, Y. et al. An adaptive regression algorithm with a clustering process for multi-modal data prediction. Discov Computing 28, 94 (2025). https://doi.org/10.1007/s10791-025-09565-7