Big Data Prediction Orig
Abstract—In the realm of Internet of Things (IoT) systems, accurately forecasting the runtime of inferencing models on heterogeneous devices is paramount for optimizing resource allocation, particularly in the contexts of Model Distributed Inferencing (MDI) and Data Distributed Inferencing (DDI). This paper delves into the application of Gradient Boosting Regression (GBR) as a predictive modeling technique for estimating runtime in both MDI and DDI scenarios. GBR presents an equitable trade-off between interpretability, robustness against noise, and suitability for moderately sized datasets. The study reviews previous research on IoT inference optimization. It accentuates the multifaceted intricacies of device diversity and the significance of model interpretability within MDI and DDI setups. The primary contribution of this research is the novel application of the GBR model to predict machine learning inferencing runtime in MDI and DDI contexts. This approach is invaluable when empirical data is limited and characterizing the behavior of newly introduced devices is imperative. The paper elaborates on the GBR algorithm's utilization, hyperparameters, and custom loss functions tailored explicitly for MDI and DDI. The results section exemplifies GBR's performance across various computational regimes, including MDI and DDI. It offers insights into the model's balance between accuracy and complexity. A performance comparison with prior models underscores GBR's efficacy in predicting runtime in MDI and DDI. This work contributes to the ongoing discourse on IoT optimization and predictive modeling.

I. Introduction

A. Background and Motivation

Many Internet of Things (IoT) devices face resource constraints due to their small size. With the growing prevalence of these devices, innovative strategies for conducting inferencing on resource-constrained IoT devices are becoming increasingly imperative. While cloud computing often serves as a means to offload resource-intensive computations, there are situations where this approach may not be feasible, particularly in high-stakes scenarios. In addition to providing low latency and scalability, edge computing must accommodate resource-intensive demands such as DNN inference and coordination among edge devices in heterogeneous processing environments and dynamic network conditions [1]. In response to this challenge, innovative solutions have emerged, notably Model Distributed Inferencing (MDI) and Data Distributed Inferencing (DDI), which distribute the computational workload across multiple devices at the network's edge. The effectiveness of these approaches hinges on two critical factors: the accuracy of the inferencing results and the time required to execute these tasks.

Both MDI and DDI processes necessitate the exchange of data between devices. Given the diversity in resources among IoT devices, it becomes paramount to comprehensively understand the inferencing run-time of each device under varying bandwidth constraints. Additionally, gaining insights into the resource cost associated with data offloading between nodes is essential for assessing the trade-offs in deploying MDI or DDI. It is important to acknowledge that modeling this process for all possible operational scenarios is impractical. Therefore, there is a pressing need to develop a method capable of estimating trade-offs beyond benchmarking data.

One of the challenges in creating such models lies in the inherent noise and intricate nonlinearity present in the data derived from benchmarking exercises. Conventional linear models, such as multiple regression, are ill-suited for handling these complexities, as they struggle to generalize effectively. One viable approach to mitigating this issue is to employ Ridge Regression, wherein the regression coefficients are penalized. This regularization technique significantly enhances the model's predictive capabilities. However, it comes at the cost of interpretability, a
critical element in comprehending how different subvariants interact within the system. In the context of this research study, the attainment of requisite granularity for neural network processing necessitates the execution of multiple benchmarking iterations, rendering this approach operationally cumbersome. To address this challenge, we turn to boosting methodologies. Boosting, a prominent machine learning technique, combines the predictions generated by multiple individual models, typically characterized as weak learners, thereby producing a prediction model of heightened accuracy and robustness. The underpinning principle of boosting is the sequential training of a series of models, each directed towards rectifying the errors of its predecessor. This sequential ensemble of models collectively contributes to the final predictions. In the present study, Gradient Boosting Regression (GBR) assumes a pivotal role in predicting run-time duration. Several intrinsic attributes of GBR contribute to its selection as the predictive model, including interpretability, resilience to noise, ability to capture non-linear relationships within data, and suitability for handling modestly sized datasets. This deliberate choice strikes a judicious equilibrium between pursuing heightened predictive precision and managing model complexity. Notably, this selection proves particularly advantageous in scenarios characterized by constraints on data availability and in situations where elucidating the nuanced intricacies within the prediction process assumes paramount significance.

B. Research Objectives

C. Contribution of the Study

II. Related Work

A. Review of Previous Research on IoT Inference Optimization

This section examines methods for predicting run times on IoT devices and explores DDI/MDI predictions. Effective task scheduling in heterogeneous device environments necessitates thoroughly considering resource disparities among such devices, as highlighted by [2]. It is imperative to possess in-depth knowledge of these diverse devices' architectures, capabilities, and, in some cases, energy efficiency profiles to optimize their performance.

[3], for example, predict both power consumption and performance for applications that run on heterogeneous computing systems. They employ a multifaceted approach to achieve this, including identifying machine-independent application phases via offline benchmark analysis, using neural network models to analyze intricate cross-platform relationships, and integrating performance counter measurements during run-time to increase prediction accuracy.

Interpretability is a crucial consideration when it comes to predictive modeling of IoT devices. Gradient-boosting models offer easily understandable results, particularly those based on gradient-boosted decision trees. This interpretability is valuable for gaining insights into the factors influencing predictions, such as run-time in the research of [3]. Their use of neural networks can pose challenges in interpretability [4], making it less straightforward to discern the critical determinants of run-time. In addition, gradient-boosting models inherently produce feature importance scores [5], which aid in identifying influential factors within applications and architectures. These models exhibit robustness [6] to outliers within datasets, gracefully handling extreme cases without necessitating extensive data preprocessing. This resilience is especially relevant when dealing with smaller datasets [7], as gradient-boosting models often yield reliable results without the need for intricate hyperparameter tuning, simplifying the implementation process compared to the more complex and data-demanding nature of neural networks.

[8] measured the performance of specific hardware devices when running machine learning models. They considered various metrics, including power consumption, inference time, and accuracy, providing insights into how different devices perform under various conditions and workloads. This approach can be valuable for selecting the most appropriate hardware for specific AI tasks. In contrast, we use a gradient-boosting regressor to estimate run-time on heterogeneous devices, creating a predictive model based on historical data. Our approach relies on statistical and machine learning techniques to make run-time predictions without directly measuring the devices' performance. This approach is practical when empirical data is limited and new devices entering an ecosystem require characterization.

B. Existing Predictive Modeling Approaches

C. Gap in the Literature

III. Methodology

In this section, we discuss the methodology for predicting run time using the Gradient Boosting Regressor algorithm.
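As a concrete illustration of the approach outlined above, the following sketch fits scikit-learn's GradientBoostingRegressor to synthetic benchmark records. The feature names ('bandwidth', 'nodes') and the data-generating process are illustrative assumptions, not the paper's actual dataset:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the benchmarking records (illustrative only:
# the feature names and data-generating process are assumptions).
rng = np.random.default_rng(0)
bandwidth = rng.uniform(1.0, 100.0, size=200)       # e.g. Mbps
nodes = rng.integers(2, 5, size=200).astype(float)  # compute nodes
run_time = 500.0 / bandwidth + 30.0 * nodes + rng.normal(0.0, 2.0, size=200)

X = np.column_stack([bandwidth, nodes])
gbr = GradientBoostingRegressor(
    n_estimators=200,   # number of sequentially fitted weak learners (trees)
    learning_rate=0.1,  # shrinkage applied to each tree's contribution
    max_depth=3,        # shallow trees keep the ensemble interpretable
    random_state=0,
)
gbr.fit(X, run_time)

# Gradient boosting exposes feature importance scores directly [5],
# which aid in identifying the influential predictors.
print(dict(zip(["bandwidth", "nodes"], gbr.feature_importances_)))
```

The shallow-tree configuration reflects the trade-off discussed earlier: each weak learner is individually simple and inspectable, while the sequential ensemble captures the non-linear bandwidth/run-time relationship.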
The custom quantile loss function is tailored for quantile regression, evaluating quantile-specific errors and adapting to a spectrum of quantile-related investigations. Collectively, these functions enable a holistic evaluation of gradient boosting models, embracing diverse loss criteria and ensuring their adaptability across a range of data characteristics. This systematic approach substantially heightens the prospects of selecting the most appropriate gradient boosting model for the idiosyncrasies of specific datasets and research objectives. Furthermore, careful documentation of results and custom loss functions augments transparency and bolsters the reproducibility of the research.

A. Evaluation Metrics

In the evaluation of our model's performance, we employ the following metrics:

Mean Absolute Error (MAE): This metric quantifies the average absolute difference between the predicted and actual run times, providing a measure of the model's accuracy.

R-squared (R²) Score: The R² score is utilized to gauge the proportion of variance in the run times that can be predicted by our model, elucidating its predictive capability.

Cross-Validation Analysis: To rigorously assess our model's generalization performance, we adopt a k-fold cross-validation methodology. Specifically, we employ a five-fold cross-validation approach (k=5) to scrutinize the model's robustness and performance across distinct data subsets.

V. Experimental Results

In this section, we present the experimental results obtained from the Gradient Boosting Regressor model.

A. Description of the Dataset

The dataset used for experimentation consists of records with columns 'bandwidth,' 'nodes,' 'run_time_per_sec,' and 'dist_type.' It was preprocessed as described in the Data Preprocessing section.

B. Predictive Modeling Results

In Table V, we summarize the results of training a gradient-boosting regressor model on our dataset:

TABLE V: Performance Metrics of Gradient Boosting Regressor

Compute Nodes | Mean Absolute Error (MAE) | R-Squared
2             | 8.76                      | 0.99
3             | 118.78                    | 0.95
4             | 125.83                    | 0.98

The table presents the performance metrics for a gradient-boosting regressor applied to an MDI and DDI dataset. The regression model utilized bandwidth, compute nodes, and bandwidth reserved as predictor variables to estimate the response variable, the runtime of inferencing on IoT devices.

Compute Nodes: This column represents the number of compute nodes employed in the computational tasks. As a critical factor in parallel computing, compute nodes profoundly impact runtime; consequently, we segment the data based on the number of compute nodes.

Mean Absolute Error (MAE): The MAE measures the absolute differences between the predicted and actual values. It quantifies the average magnitude of errors in predicting the runtime. Smaller MAE values indicate better model accuracy.

R-Squared (R²): R-squared is a measure of the goodness of fit of the regression model. It ranges from 0 to 1, with higher values indicating a better fit. In this context, R² reflects the proportion of the variance in runtime explained by the predictor variables.

The results in Table V reveal insightful information. Notably, the GBR achieved a low Mean Absolute Error (MAE) of 8.76 when two (2) compute nodes were present. This value indicates that the model's predictions were, on average, very close to the actual runtime values. Moreover, the R-squared value of 0.99 indicates that the chosen predictors explained approximately 99% of the variance in runtime.

When the number of compute nodes increased to three (3), the MAE rose substantially to 118.78, indicating larger prediction errors. However, the model still exhibited a reasonably high R-squared value of 0.95, indicating that it could explain a significant portion of the variance in runtime. With the addition of the compute nodes, the model remains informative despite its increased complexity.

Similarly, with four (4) compute nodes, the MAE increased to 125.83, although it remained within a reasonable range. The R-squared value of 0.98 highlights the model's ability to explain most of the variance in runtime, even with the higher number of compute nodes.

The table illustrates the performance of the Gradient Boosting Regressor across different numbers of compute nodes. It demonstrates the trade-off between model accuracy (as indicated by MAE) and model explanatory power (as indicated by R-squared).
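The evaluation protocol above (MAE, R², and five-fold cross-validation) can be sketched as follows; the synthetic data stands in for the preprocessed dataset and its column names are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

# Illustrative stand-in for the preprocessed dataset.
rng = np.random.default_rng(2)
X = np.column_stack([rng.uniform(1.0, 100.0, 500),            # 'bandwidth'
                     rng.integers(2, 5, 500).astype(float)])  # 'nodes'
y = 600.0 / X[:, 0] + 40.0 * X[:, 1] + rng.normal(0.0, 3.0, 500)

gbr = GradientBoostingRegressor(random_state=0)

# Five-fold cross-validation (k=5) scoring both MAE and R^2;
# scikit-learn reports errors as negated scores, hence the sign flip.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
mae_per_fold = -cross_val_score(gbr, X, y, cv=cv,
                                scoring="neg_mean_absolute_error")
r2_per_fold = cross_val_score(gbr, X, y, cv=cv, scoring="r2")
print("MAE per fold:", mae_per_fold.round(2))
print("R^2 per fold:", r2_per_fold.round(3))
```

Per-group metrics such as those in Table V would follow the same pattern, with the data first partitioned by the number of compute nodes.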
D. Discussion of Results

We discuss the implications and significance of the predictive modeling results, highlighting any insights gained from the analysis.

Fig. 2: Compute Nodes - 3
References
[1] J. Chen and X. Ran, “Deep Learning with Edge
Computing: A Review,” Proceedings of the IEEE, vol.
107, no. 8, pp. 1655–1674, Aug 2019. [Online]. Available:
https://doi.org/10.1109/jproc.2019.2921977
[2] C. Gregg, M. Boyer, K. Hazelwood, and
K. Skadron, “Dynamic Heterogeneous Scheduling
Decisions Using Historical Runtime Data,” Workshop
on Applications for Multi-and Many-Core Processors
(A4MMC), pp. 1–12, 1 2011. [Online]. Available:
https://www.cs.virginia.edu/~skadron/Papers/gregg_a4mmc11.pdf
[3] Y. Kim, P. Mercati, A. More, E. Shriver, and T. Rosing,
“P4: Phase-based power/performance prediction of
heterogeneous systems via neural networks,” 2017
IEEE/ACM International Conference on Computer-
Aided Design (ICCAD), 11 2017. [Online]. Available:
https://doi.org/10.1109/iccad.2017.8203843
[4] Z. C. Lipton, “The mythos of model interpretability,”
ACM Queue, vol. 16, no. 3, pp. 31–57, 6 2018. [Online].
Available: https://doi.org/10.1145/3236386.3241340
[5] J. H. Friedman, “Greedy function approximation: A
gradient boosting machine.” Annals of Statistics,
vol. 29, no. 5, 10 2001. [Online]. Available:
https://doi.org/10.1214/aos/1013203451
[6] J. Friedman, “Stochastic gradient boosting,”
Computational Statistics & Data Analysis, vol. 38,
no. 4, pp. 367–378, 2 2002. [Online]. Available:
https://doi.org/10.1016/s0167-9473(01)00065-2
[7] T. Chen and C. Guestrin, “XGBoost: A scalable tree
boosting system,” Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery and
Data Mining, 8 2016. [Online]. Available:
https://doi.org/10.1145/2939672.2939785
[8] S. P. Baller, A. Jindal, M. Chadha, and
M. Gerndt, “DeepEdgeBench: Benchmarking deep
neural networks on edge devices,” arXiv (Cor-
nell University), 8 2021. [Online]. Available:
https://arxiv.org/pdf/2108.09457.pdf
[9] T. Hastie, R. Tibshirani, and J. H. Friedman, The
elements of statistical learning, 1 2009. [Online].
Available: https://doi.org/10.1007/978-0-387-84858-7
[10] “Scikit-Learn Ensemble: GradientBoostingRegressor,”
https://scikit-learn.org/stable/modules/generated/
[11] C. Anderson, M. Dwyer, and K. S. Chan, “Optimizing
machine learning inference performance on iot devices:
trade-offs and insights from statistical learning,” SPIE
Proceedings, 2023.