This study investigates the use of machine learning algorithms, specifically random forest (RF), support vector machine (SVM), and eXtreme gradient boosting (XGBoost), to predict hydrogen production rates in proton exchange membrane water electrolysis (PEMWE). The RF model outperformed the others, achieving high predictive accuracy, and the study emphasizes the importance of optimizing operational parameters like cell voltage and current for enhanced hydrogen production. The findings suggest further research on additional parameters to improve optimization and efficiency in PEMWE systems.

Uploaded by

Aditto Rahman
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
51 views12 pages

1 s2.0 S0098135424003727 Main

This study investigates the use of machine learning algorithms, specifically random forest (RF), support vector machine (SVM), and eXtreme gradient boosting (XGBoost), to predict hydrogen production rates in proton exchange membrane water electrolysis (PEMWE). The RF model outperformed the others, achieving high predictive accuracy, and the study emphasizes the importance of optimizing operational parameters like cell voltage and current for enhanced hydrogen production. The findings suggest further research on additional parameters to improve optimization and efficiency in PEMWE systems.

Uploaded by

Aditto Rahman
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

Computers and Chemical Engineering 194 (2025) 108954

Contents lists available at ScienceDirect

Computers and Chemical Engineering


journal homepage: www.elsevier.com/locate/compchemeng

Machine learning in PEM water electrolysis: A study of hydrogen production and operating parameters
Ibrahim Shomope a, Amani Al-Othman a,b,*, Muhammad Tawalbeh c,d, Hussam Alshraideh e,f, Fares Almomani g,*

a Department of Chemical and Biological Engineering, American University of Sharjah, P.O. Box 26666, Sharjah, United Arab Emirates
b Energy, Water and Sustainable Environment Research Center, College of Engineering, American University of Sharjah, P.O. Box 26666, Sharjah, United Arab Emirates
c Sustainable and Renewable Energy Engineering Department, University of Sharjah, P.O. Box 27272, Sharjah, United Arab Emirates
d Sustainable Energy & Power Systems Research Centre, RISE, University of Sharjah, P.O. Box 27272, Sharjah, United Arab Emirates
e Department of Industrial Engineering, American University of Sharjah, P.O. Box 26666, Sharjah, United Arab Emirates
f Industrial Engineering Department, Jordan University of Science and Technology, P.O. Box 3030, Irbid, Jordan
g Department of Chemical Engineering, Qatar University, P.O. Box 2713, Doha, Qatar

ARTICLE INFO

Keywords: Water electrolysis; Machine learning (ML); Hydrogen production; Random forest; Support vector machine; eXtreme gradient boosting algorithm

ABSTRACT

Proton exchange membrane water electrolysis (PEMWE) powered by renewable energy stands out as a promising technology for the sustainable production of high-purity hydrogen. This study employed three machine learning algorithms, random forest (RF), support vector machine (SVM), and eXtreme gradient boosting (XGBoost), to predict hydrogen production in PEMWE. Model performance was evaluated using root mean squared error (RMSE), coefficient of determination (R²), and mean absolute error (MAE) metrics. The top-performing models, RF and XGBoost, were further refined through hyperparameter tuning. The final models demonstrated high reliability in predicting hydrogen production rates, with RF consistently outperforming XGBoost. The RF model achieved a predictive accuracy of R² = 0.9898, RMSE = 19.99 mL/min, and MAE = 10.41 mL/min, while the XGBoost model achieved R² = 0.9894, RMSE = 20.43 mL/min, and MAE = 11.50 mL/min. Partial dependency plots (PDPs) emphasized the critical role of optimizing both cell voltage and current to maximize hydrogen production in PEMWE. These insights provide valuable guidance for operational adjustments, ensuring optimal system performance for high efficiency and productivity. The study suggests further research on the impact of parameters like temperature and power density on hydrogen production, incorporating them for better optimization.

1. Introduction

Hydrogen has emerged as a vital component in the global energy transition, playing a key role in the development of clean and sustainable energy solutions (Sharifishourabi et al., 2024). As a versatile fuel, hydrogen provides substantial energy output and finds applications across various sectors, including industrial processes, residential energy supply, and transportation through fuel cell electric vehicles (FCEVs) (Verma et al., 2023). One of the main advantages of hydrogen as a fuel is its clean combustion process, which produces only water as a by-product, thereby eliminating harmful carbon emissions. Furthermore, hydrogen has a high energy density of 140 MJ/kg, far surpassing that of conventional solid fuels (Saravanan et al., 2020). Global hydrogen production is estimated at approximately 500 billion cubic meters annually, sourced from both renewable and non-renewable resources, including fossil fuels, coal gasification, biomass, biological sources, and water electrolysis (Acar and Dincer, 2014; Boyano et al., 2011; Huang and Dincer, 2014; Rezaei et al., 2024; Das, 2001; Velasquez-Jaramillo et al., 2024).

Among the various methods of hydrogen production, proton exchange membrane water electrolysis (PEMWE) has gained significant attention due to its ability to generate high-purity hydrogen using renewable energy sources such as solar and wind (Abdelkareem et al., 2023; Kumar and Lim, 2022). The PEMWE operates by splitting water molecules into hydrogen (H₂) and oxygen (O₂) using electrical energy. Despite its environmental advantages, PEMWE currently accounts for

* Corresponding authors.
E-mail address: [email protected] (F. Almomani).

https://doi.org/10.1016/j.compchemeng.2024.108954
Received 14 August 2024; Received in revised form 26 October 2024; Accepted 20 November 2024
Available online 27 November 2024
0098-1354/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
I. Shomope et al. Computers and Chemical Engineering 194 (2025) 108954

only about 4 % of global hydrogen production, primarily due to financial constraints (Dunn, 2002). However, the increasing deployment of renewable energy sources is expected to boost the growth of PEMWE. Compared to other water electrolysis methods, such as alkaline water electrolysis (AWE) and solid oxide electrolysis (SOE) (Ong et al., 2023), PEMWE offers distinct advantages, including higher current density, compact system design, and superior energy efficiency, making it a promising technology for hydrogen production (Tawalbeh et al., 2022; Nikolaidis and Poullikkas, 2017).

Accurately predicting hydrogen production rates in PEMWE presents several challenges due to the complexity of the system's operational parameters. Hydrogen production efficiency is influenced by various factors, such as gas diffusion layers, membrane types, voltage, current, and temperature (Olabi et al., 2024). Traditional modeling techniques often struggle to capture the nonlinear relationships between these parameters, limiting their effectiveness in optimizing the process. Historically, several principle-based methods have been employed to predict hydrogen production rates in water electrolysis systems (Kumar and Himabindu, 2019; Choi, 2004). These methods include thermodynamic models based on Gibbs free energy and the Nernst equation, as well as electrochemical models such as the Butler-Volmer equation, which describes the kinetics of the electrochemical reaction (Oliveira et al., 2013). While these models provide valuable insights, they are often constrained by simplifying assumptions and may not fully capture the complexity of real-world systems. For instance, thermodynamic models do not account for mass transport limitations, and the Butler-Volmer equation is highly sensitive to parameters like exchange current density and charge transfer coefficient (Sezer et al., 2024; Grigoriev et al., 2006). Additionally, empirical models, such as polynomial regressions (PRs), are frequently used to estimate relationships between operational parameters and hydrogen production. However, these models often require extensive experimental data and are less adaptable to varying system configurations (Hernández-Gómez et al., 2020). Scaling these models from lab-scale to industrial systems is challenging due to differences in materials, design, and environmental factors. Moreover, conducting experimental trials to explore the full operational range of variables is both time-consuming and costly. Given these limitations, traditional models struggle to accurately predict hydrogen production rates, particularly in systems with highly nonlinear and interdependent operational parameters. This underscores the need for advanced modeling techniques capable of handling such complexities while providing accurate predictions across a broad range of operating conditions.

Machine learning (ML) has emerged as a powerful tool to address the challenges of modeling hydrogen production. Unlike traditional methods, ML algorithms excel at identifying complex nonlinear relationships between input and output variables, making them well-suited for systems like PEMWE, where multiple interdependent factors influence hydrogen production rates. ML techniques also facilitate more efficient analysis and optimization, reducing the need for extensive experimental trials by offering accurate predictions based on existing data (Ming et al., 2023).

Numerous studies have explored the application of ML techniques to hydrogen production via water electrolysis. For instance, Bilgiç et al. (2023) utilized an artificial neural network (ANN) model to analyze multiple input parameters, such as magnetic field, electrode material, and electrolyte type, achieving excellent predictive capabilities, with a correlation coefficient (R = 0.973) and a minimal mean squared error (MSE = 0.01125). Mohamed et al. (2022) evaluated five ML algorithms, ANN, PR, support vector machine (SVM), K-nearest neighbor (KNN), and decision tree (DT) regressor, for their effectiveness in hydrogen production simulations. Box and whisker plots were used to determine the most efficient materials for current density, with ANN outperforming the others in terms of mean absolute error (MAE) during both the training and testing phases. Kabir et al. (2023) applied various ML models to enhance green hydrogen production via dark fermentation and PEM technologies, with KNN and RF proving to be the most effective based on regression coefficients, RMSE, and MAE. Recently, Shomope et al. (2025) employed a multilayer perceptron artificial neural network (MLP–ANN) to predict biohydrogen production from dark fermentation of organic waste biomass. Their results showed that the model was effective in predicting the biohydrogen production and provided a valuable tool for optimizing the fermentation process.

Cheng et al. (2023) conducted a comparative analysis of green hydrogen production in China using photovoltaic-powered water electrolysis. Their study showed that SVM outperformed the FbProphet algorithm, as indicated by higher R² values (0.968 vs. 0.955) and lower RMSE values (71.1 kg/km² vs. 81.3 kg/km²). In a separate study, Zhang et al. (2022) developed an ML-based approach using the eXtreme gradient boosting (XGBoost) algorithm with K-fold cross-validation to predict hydrogen production efficiency in water electrolysis, finding that XGBoost outperformed other algorithms, including DTs and random forest (RF), in predicting electrocatalytic performance. Recently, Hayatzadeh et al. (2024) examined the effectiveness of support vector regression (SVR) and ANN in analyzing data from commercial PEMWE systems, focusing on the impact of operating temperature, current density, and catalyst loading on cell performance. Despite the extensive research, there remains a gap in the literature concerning the comparative performance of multiple ML algorithms in predicting hydrogen production rates, specifically for PEMWE systems.

This study seeks to bridge this gap by conducting a comprehensive evaluation of three distinct ML algorithms, RF, SVM, and XGBoost, for predicting hydrogen production rates in PEMWE systems. These algorithms were chosen for their complementary strengths: RF's resistance to overfitting, ability to capture complex non-linear relationships (Odabaşı et al., 2022), and capacity to handle both numerical and categorical variables while providing feature importance estimates to better understand the influence of operational parameters on hydrogen production. XGBoost offers scalability, rapid processing, and the ability to minimize errors while managing multicollinearity and missing data (Zhang et al., 2020), making it ideal for refining model predictions in PEMWE. SVM excels in handling high-dimensional spaces, effectively modeling both linear and non-linear relationships (Valizadeh et al., 2024), and its kernel trick captures intricate interactions between input variables, crucial for identifying subtle patterns in the dataset. While most previous studies (Abdelkareem et al., 2022), including a recent work by the authors (Tawalbeh et al., 2024), have predominantly used single algorithms, such as ANN, to predict hydrogen production, this study provides a broader comparative analysis across multiple ML algorithms. This approach enables a deeper understanding of the factors that affect hydrogen production and the operational limits of PEMWE systems. Additionally, this study employs recursive feature elimination (RFE) for feature selection and hyperparameter optimization to enhance model performance and accuracy.

The novelty of this research lies in its integration of ML techniques with experimental data to predict hydrogen production rates accurately while providing insights into optimizing key operational parameters, such as cell voltage and current. The use of Partial Dependency Plots (PDPs) further highlights the operational boundaries and limiting behaviors of PEMWE systems, enabling more effective optimization. By addressing these specific objectives, this study contributes to the practical advancements of PEMWE technologies and the optimization of hydrogen production through innovative ML methodologies. The findings also offer valuable guidance for future research, such as integrating additional parameters like temperature, pressure, and power density for more comprehensive optimization.

2. Theory and methodology

This section presents the systematic approach adopted in this study, starting with data collection, including its sources and the pre-processing steps undertaken. Following this, the feature selection
method using RFE is explained. The application of ML algorithms, RF, SVM, and XGBoost, is also discussed, along with the process of cross-validation to ensure robustness and mitigate overfitting. The methodology continues with a detailed explanation of model training and hyperparameter tuning, concluding with an evaluation of the models using metrics such as MAE, RMSE, and R². These steps are visually represented in Fig. 1, which summarizes the overall methodological framework.

To implement and evaluate the ML models, Python (version 3.9.7) was utilized within the Anaconda environment (version 22.11.1). Jupyter Notebook (version 6.5.2) and the scikit-learn library (version 1.5.0) were employed for the ML tasks, while SciPy (version 1.9.3), matplotlib (version 3.6.0), and seaborn (version 0.11.2) were essential tools for statistical analysis and data visualization.

2.1. Data collection and description

Developing ML algorithms requires a substantial amount of well-curated and organized training data to ensure robust predictive performance and generalizability. In this study, 450 experimental data points were gathered from published sources (Mohamed et al., 2022; Rozain et al., 2016; Brightman et al., 2015; Ayers et al., 2016; Sarno and Ponticorvo, 2019; Ju et al., 2017; Ramakrishna et al., 2016; Kaya et al., 2021; Song et al., 2008; Grigoriev et al., 2008; Ruck et al., 2022; Mayousse et al., 2011; Zhou et al., 2016; Pushkarev et al., 2023; Wang et al., 2022), representing a wide range of operational parameters and experimental conditions. The dataset comprises both numerical and categorical variables, with eight variables in each category. The numerical variables include parameters such as anode and cathode flow area, cell temperature, power, and flow rates, while the categorical variables capture qualitative aspects such as cell design types, anode and cathode catalysts, membrane types, and electrolyte compositions.

Table 1
Type and range of values of the PEMWE parameters.

| Parameter | Category/Range of value |
|---|---|
| Anode/Cathode type | Porous titanium, porous carbon, 304 stainless steel, carbon plate, titanium |
| Membrane type | Nafion115, Nafion117, Nafion112, Nafion110 |
| Cathode catalyst | 10 % platinum, nano sheets of molybdenum disulfide (MoS2), 20 % platinum, 30 % platinum, 4 mg/cm² platinum, 0.5 mg/cm² platinum, 5 % palladium, 10 % palladium |
| Anode catalyst | Iridium oxide, ruthenium oxide or iridium oxide, 40 % platinum and 20 % ruthenium oxide, 2 mg/cm² platinum and 2 mg/cm² iridium oxide, 1 mg/cm² iridium oxide, ruthenium oxide, platinum, and ruthenium oxide |
| Anolyte/Catholyte | Deionized water, wastewater + 2 L of 0.1 mol of sulfuric acid, wastewater + 2 L of 0.05 sodium metabisulfite (Na2S2O5), 0.5 mol sulfuric acid, 6 mol of ethanol water (CH3CH2OH), normal water, pure water, 4 mol methanol, hybrid sulfur (SO2) |
| Cell design type | Single or bipolar |
| Cell design number | 1–20 cells |
| Anode/Cathode flow area (cm²) | 6–75,000 |
| Voltage (V) | 0.5–32 |
| Cell current (A) | 0–125 |
| Power (W) | 0–1300 |
| Water flow rate (mL/min) | 5–1500 |
| Cell temperature (K) | 298–360 |
| Hydrogen flow rate (mL/min) | 0–5000 |
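The numeric ranges in Table 1 lend themselves to a simple sanity check on records compiled from the literature. The sketch below is illustrative only: the paper does not describe such a check, and the field names and the `out_of_range` helper are hypothetical. Only the numeric bounds are taken from Table 1.

```python
# Numeric bounds transcribed from Table 1; the key names are hypothetical.
NUMERIC_RANGES = {
    "cell_design_number": (1, 20),
    "flow_area_cm2": (6, 75_000),
    "voltage_V": (0.5, 32),
    "cell_current_A": (0, 125),
    "power_W": (0, 1300),
    "water_flow_mL_min": (5, 1500),
    "cell_temperature_K": (298, 360),
    "hydrogen_flow_mL_min": (0, 5000),
}


def out_of_range(record):
    """Return the names of fields whose values fall outside the Table 1 bounds.

    Fields that are absent or None are skipped here (assumption: missing
    values are handled separately, e.g. by imputation).
    """
    flagged = []
    for name, (low, high) in NUMERIC_RANGES.items():
        value = record.get(name)
        if value is not None and not (low <= value <= high):
            flagged.append(name)
    return flagged
```

A record such as `{"voltage_V": 40, "cell_temperature_K": 290}` would be flagged on both fields, while a record inside all bounds returns an empty list.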

Fig. 1. Methodological framework of the study.
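The workflow summarized in Fig. 1 can be sketched end to end with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' implementation: the data are synthetic, the column names are hypothetical, and the choices of mean imputation, one-hot encoding, standardization, the number of features kept by RFE, and the forest sizes are placeholders chosen for illustration. Note that `r2_score` computes 1 − SS_res/SS_tot, which may differ slightly from a squared-correlation definition of R².

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in data (NOT the paper's dataset); column names are hypothetical.
df = pd.DataFrame({
    "cell_voltage_V": rng.uniform(0.5, 32.0, n),
    "cell_current_A": rng.uniform(0.0, 125.0, n),
    "cell_temperature_K": rng.uniform(298.0, 360.0, n),
    "water_flow_mL_min": rng.uniform(5.0, 1500.0, n),
    "membrane_type": rng.choice(["Nafion115", "Nafion117", "Nafion112"], n),
})
# Toy target driven by current plus noise, only so the sketch has signal to learn.
y = 7.0 * df["cell_current_A"] + rng.normal(0.0, 5.0, n)
# Blank out a few entries so the mean-imputation step has something to do.
df.loc[rng.choice(n, 10, replace=False), "cell_temperature_K"] = np.nan

numeric = ["cell_voltage_V", "cell_current_A", "cell_temperature_K", "water_flow_mL_min"]
categorical = ["membrane_type"]

preprocess = ColumnTransformer(
    [
        ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                          ("scale", StandardScaler())]), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ],
    sparse_threshold=0.0,  # always return a dense matrix
)

model = Pipeline([
    ("prep", preprocess),
    # RFE with an RF estimator; keeping 4 features is an arbitrary choice here.
    ("rfe", RFE(RandomForestRegressor(n_estimators=50, random_state=0),
                n_features_to_select=4)),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
])

# 80:20 split; the test portion is never seen during fitting.
X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.2, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
r2 = r2_score(y_test, pred)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.4f}")
```

Wrapping the pre-processing and feature selection in one `Pipeline` means both are fitted on the training split only, which keeps information from the test split out of the model.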


Table 1 presents a detailed summary of the various types and ranges of parameter values obtained from the datasets, illustrating the diversity and scope of the operational conditions. Table 2 provides a statistical summary of the numerical variables, offering further insights into the distribution and characteristics of the data. Additionally, Fig. 2 visualizes Pearson's correlation coefficients (PCC) between the numerical independent variables using a heatmap, revealing the relationships among these features. This comprehensive and well-structured dataset forms the foundation for the development of robust ML models in this study.

Table 2
Statistical description of the numerical variables.

| Variables | Unit | Mean | Standard deviation |
|---|---|---|---|
| Cathode area | cm² | 10,437.31 | 17,593.29 |
| Anode area | cm² | 9252.22 | 17,505.89 |
| Cell design number | − | 13.06 | 12.72 |
| Cell voltage | V | 4.05 | 7.65 |
| Cell current | A | 28.83 | 28.09 |
| Power | W | 62.72 | 96.84 |
| Water flow rate | mL/min | 61.03 | 198.96 |
| Cell temperature | K | 302.64 | 54.50 |
| Hydrogen flow rate | mL/min | 258.61 | 449.82 |

2.2. Data cleaning and pre-processing

Data cleaning and pre-processing are critical steps in ensuring the quality and reliability of the ML models. The raw dataset underwent a rigorous cleaning and preparation process before analysis. Initially, the dataset was carefully examined for missing values. To minimize the impact of missing data on model performance, mean imputation techniques were applied to fill in the gaps, ensuring the dataset's integrity was maintained. In addition to addressing missing values, the dataset contained several categorical variables that needed to be converted into a numerical format suitable for ML algorithms. One-hot encoding was employed to transform these categorical variables into a binary matrix, where each category was represented by a binary column (0 or 1). This method ensured no artificial ordinal relationships were introduced between the categories, maintaining the validity of the data representation (Kim and Boukouvala, 2020).

Multicollinearity, or the presence of highly correlated variables, can adversely affect the performance of regression models. To mitigate this, a correlation matrix was computed to identify pairs of highly correlated features (correlation coefficient > 0.85). For each pair, the less significant feature was removed to reduce redundancy without losing valuable information. This step ensured that the remaining features were independent, enhancing the robustness of the models. Furthermore, numerical features were standardized using the 'StandardScaler' from Python's scikit-learn library (Pedregosa et al., 2011). Standardization to zero mean and unit standard deviation is crucial for algorithms sensitive to feature scaling, such as SVM and KNN (Alaca et al., 2024). This careful approach to data cleaning and pre-processing ensured that the dataset was free from inconsistencies and biases, paving the way for the development of accurate and reliable ML models.

2.3. Machine learning model development

2.3.1. Selection of input features by recursive feature elimination

Feature selection is critical to improving both the performance and interpretability of ML models. In this study, RFE was employed to identify the most significant features for predicting the target variable, hydrogen flow rate. After the cleaning and pre-processing stages, the dataset comprised 16 independent features, including various sub-categorical variables. To streamline the model and reduce the risk of overfitting, RFE was applied to select the most relevant features. The RFE operates by iteratively removing the least important features and building models on the remaining ones until the optimal number of features is achieved (Darst et al., 2018). Specifically, an RF regressor was used as the estimator for RFE due to its ability to manage high-dimensional data and provide feature importance scores. The primary advantage of RFE in this context lies in its ability to retain the physical meaning of each feature, allowing for a clearer interpretation of how various operational parameters, such as voltage, current, and temperature, affect the hydrogen production rate. This makes RFE particularly valuable for this study, where understanding the individual contributions of operational parameters is critical for optimizing water electrolysis.

Principal component analysis (PCA) is another widely used technique for dimensionality reduction that transforms the original features into orthogonal principal components. While PCA is effective at reducing dimensionality, it generates new composite features that are linear combinations of the original variables, potentially obscuring the physical interpretation of individual features (Shlens, 2014; Jolliffe and Cadima, 2016). In contrast, RFE ranks and selects the most important features based on their direct contributions to model predictions while preserving their physical significance. Given the importance of maintaining a clear understanding of how individual operational parameters influence hydrogen production in water electrolysis, RFE was selected as the feature selection method for this study.

After applying RFE, the model was trained on the dataset, and features were ranked based on their importance. Cross-validation techniques were utilized to validate the robustness of the selected features and ensure that the reduced feature set would enhance model performance. Ultimately, the final set of ten input features, membrane type, anode and cathode catalyst, cathode area, cell design number, cell current, cell voltage, power, cell temperature, and water flow rate, were identified as the most impactful predictors of hydrogen production.

2.3.2. Machine learning techniques

The field of ML offers a wide range of regression algorithms, with no single algorithm being optimal for every task, as suggested by the no-free-lunch (NFL) theorem (Gómez and Rojas, 2016). In this study, three ML algorithms were employed: RF, XGBoost, and SVM.

The RF is an ensemble learning technique frequently used for both classification and regression tasks. This method constructs multiple DTs during training and aggregates the outputs to determine the final prediction, either by taking the majority vote (for classification) or by averaging the results (for regression). The RF is widely known for its robustness, speed, and versatility, making it a cornerstone of ML applications. Key advantages of RF include its ability to reduce overfitting, handle noisy data, and provide high prediction accuracy (Madaan and Pandey, 2024; Li et al., 2018).

XGBoost is another widely recognized ML method designed for both regression and classification tasks. As an ensemble learning algorithm, it combines the predictions of multiple weaker models to form a stronger predictive model. Furthermore, XGBoost operates within a gradient boosting framework, where new models are iteratively added to correct errors made by the previous models. This technique has gained significant popularity due to its efficiency, its capability to handle missing data, its built-in overfitting prevention strategies, and its minimal requirement for data preprocessing, such as normalization (Wang et al., 2024).

SVMs are a powerful category of supervised ML algorithms used for both classification and regression tasks. SVMs work by mapping features into an n-dimensional space and identifying a hyperplane that best separates the data into different classes. This separation is designed to maximize the margin between classes, providing effective classification performance. SVMs are particularly well-suited for complex datasets due to their ability to capture intricate patterns through the use of kernel functions, which can handle nonlinear relationships (Chiang et al., 2004). The popularity of SVMs can be attributed to their effectiveness in high-dimensional spaces, versatility in kernel selection, memory efficiency, and robustness against
Fig. 2. Heatmap of Pearson’s correlation matrix between input and target variables.

overfitting (Lu et al., 2020).

For readers interested in a deeper understanding of the mathematical and physical principles underlying these ML models, several comprehensive references are available (Chen and Guestrin, 2016; Friedman, 2002; Segal, 2004). These works provide detailed explanations and analyses of the mathematical and statistical methods used in ML, serving as valuable resources for those seeking to expand their understanding of the field.

2.3.3. Training of machine learning models

In developing ML algorithms, splitting the data into training and testing sets is essential to ensure unbiased model evaluation. The training set is used to fit the model, while the testing set is reserved for evaluating the model's predictive performance. In this study, an 80:20 split was used, with 80 % of the data allocated for training and 20 % for testing.

Three ML models were evaluated: RF, XGBoost, and SVM. The RF model builds an ensemble of DTs, combining their predictions through voting (for classification) or averaging (for regression). Similarly, XGBoost constructs an ensemble of learners, typically DTs, within a gradient boosting framework, iteratively improving the model's accuracy. In the case of SVM (using the SVR approach), the algorithm finds a function that minimizes the margin of error within a specified tolerance, focusing on data points lying outside this margin, known as support vectors. Initial results indicated that the RF and XGBoost models exhibited superior test accuracy compared to the SVM model. However, it is important to clarify that the exclusion of the SVM model from hyperparameter tuning was not due to an inherent flaw in the algorithm. Rather, this was a decision based on the SVM's relatively lower performance on the initial test set. It is fully acknowledged that, with appropriate hyperparameter tuning, such as selecting different kernel functions, the SVM model could potentially outperform the other models.

It is also worth noting that the original test data were never used during the training process, ensuring that the predictions made with these data represent an unbiased evaluation of the developed models. A detailed discussion of the models' performance is provided in Section 3.2.

2.4. Model assessment

To evaluate the predictive performance of the ML algorithms, three statistical metrics were employed: mean absolute error (MAE), root
mean squared error (RMSE), and the coefficient of determination (R²). Lower values of MAE and RMSE, combined with higher R² scores, indicate superior predictive accuracy. The equations defining these metrics are provided in Eqs. (1) to (3):

$$\mathrm{MAE} = \frac{1}{n}\sum \left| y - y' \right| \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum \left( y - y' \right)^{2}} \tag{2}$$

$$R^{2} = \left( \frac{n\sum y y' - \left(\sum y\right)\left(\sum y'\right)}{\sqrt{\left[ n\sum y^{2} - \left(\sum y\right)^{2} \right]\left[ n\sum y'^{2} - \left(\sum y'\right)^{2} \right]}} \right)^{2} \tag{3}$$

where y′ is the predicted value, y is the experimental value, and n is the total number of data points in the datasets.

3. Results and discussion

3.1. Cross validation and hyperparameter optimization

Table 3
Summary of the hyperparameter tuning through GridSearchCV function in scikit-learn package.

| Model | Hyperparameter tuned | Values considered | Optimal value |
|---|---|---|---|
| RF | n_estimators | 100, 200 | 100 |
| RF | max_depth | Full trees, 10, 20 | 10 |
| RF | min_samples_split | 2, 5 | 2 |
| RF | min_samples_leaf | 1, 2 | 1 |
| XGBoost | learning_rate | 0.01, 0.1, 0.2 | 0.2 |
| XGBoost | n_estimators | 100, 200 | 200 |
| XGBoost | max_depth | 3, 6, 9 | 3 |
| XGBoost | reg_alpha | 0.01, 0.1 | 0.01 |
| XGBoost | reg_lambda | 0.01, 0.1 | 0.01 |

Table 4
Machine learning models performance.

| ML models | R² (train) | R² (test) | MAE (train, mL/min) | MAE (test, mL/min) | RMSE (train, mL/min) | RMSE (test, mL/min) |
|---|---|---|---|---|---|---|
| RF | 0.9961 | 0.9898 | 4.46 | 10.41 | 12.54 | 19.99 |
| XGBoost | 0.9997 | 0.9894 | 2.38 | 11.50 | 3.61 | 20.43 |
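The search space listed in Table 3 can be written down as parameter grids for scikit-learn's `GridSearchCV`. The sketch below is an illustration under stated assumptions rather than the authors' script: "Full trees" is read as `max_depth=None`, the scoring metric is a guess (the paper does not state it), the XGBoost grid is kept as a plain dictionary so that the example does not depend on the xgboost package, and `tune_random_forest` is a hypothetical helper name.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, ParameterGrid

# RF rows of Table 3; None stands in for "Full trees" (assumption).
RF_GRID = {
    "n_estimators": [100, 200],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}

# XGBoost rows of Table 3, shown as data only (no xgboost import needed here).
XGB_GRID = {
    "learning_rate": [0.01, 0.1, 0.2],
    "n_estimators": [100, 200],
    "max_depth": [3, 6, 9],
    "reg_alpha": [0.01, 0.1],
    "reg_lambda": [0.01, 0.1],
}


def tune_random_forest(X, y, grid=RF_GRID, cv=5, seed=0):
    """Exhaustive grid search with k-fold cross-validation.

    Returns the fitted GridSearchCV object; the scoring choice is an
    assumption made for this sketch.
    """
    search = GridSearchCV(
        estimator=RandomForestRegressor(random_state=seed),
        param_grid=grid,
        cv=cv,
        scoring="neg_root_mean_squared_error",
    )
    return search.fit(X, y)


# Number of candidate settings the exhaustive search would evaluate per fold.
n_rf_candidates = len(ParameterGrid(RF_GRID))    # 2 * 3 * 2 * 2
n_xgb_candidates = len(ParameterGrid(XGB_GRID))  # 3 * 2 * 3 * 2 * 2
```

Calling `tune_random_forest(X_train, y_train).best_params_` on a training split would return the best-scoring combination; with 5 folds, every candidate is fitted five times, which is why the grids are kept small.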


In this study, a 5-fold cross-validation strategy was employed to ensure robust model performance and generalization. This approach provides a comprehensive evaluation by partitioning the dataset into five subsets (folds), iteratively training the model on four of them, and validating it on the remaining held-out fold. Averaging the performance metrics across all folds offers a more reliable estimate of the model's predictive performance.

During model training, hyperparameter tuning (Yang and Shami, 2020) was critical in optimizing model performance. For the RF model, parameters such as the number of trees, maximum depth, minimum samples split, and minimum samples per leaf were fine-tuned. Preliminary analyses indicated that narrower ranges for certain hyperparameters, like the minimum samples split (set between 2 and 5), provided the most significant improvement in performance within computational constraints. This tuning aimed to balance model complexity and generalization capabilities. Similarly, for the XGBoost model, key parameters like the learning rate, number of trees, and maximum depth were adjusted. The hyperparameter tuning process, conducted using the GridSearchCV function from the scikit-learn library (Géron, 2024), involved an exhaustive search across predefined parameter grids. This search sought to identify the combination of parameters that would maximize model performance while minimizing overfitting.

The results from hyperparameter tuning guided the selection of optimal values for each model, such as the number of trees for RF and the learning rate for XGBoost. Both RF and XGBoost models, fine-tuned with these optimal parameters, demonstrated strong predictive capabilities, reflected by high R² scores and low RMSE and MAE values. Once the optimal parameters were determined, the models were retrained using the entire dataset to maximize their predictive power. This final retraining step, with optimized configurations, ensured that the models fully exploited the dataset to accurately predict hydrogen production rates in PEMWE. Table 3 summarizes the hyperparameter search ranges for each model and the selected optimal values.

3.2. Predictive performance of machine learning models

The prediction results for hydrogen flow rate are presented in Table 4, with model performance evaluated using MAE, RMSE, and R² on the training and test sets. The MAE measures the average magnitude of prediction errors, with lower MAE values indicating greater accuracy in predicting hydrogen flow rates. R² measures the proportion of variance in the hydrogen flow rate explained by the input features, where higher R² values represent better model performance. As seen in Table 4, during the training phase, each model achieved an R² score exceeding 0.9, demonstrating excellent predictive performance. Among the models, XGBoost slightly outperformed RF during training, achieving the highest R² score (0.9997) and the lowest MAE (2.38 mL/min) and RMSE (3.61 mL/min). The RF model followed closely, with an R² score of 0.9961, alongside MAE and RMSE values of 4.46 mL/min and 12.54 mL/min, respectively.

However, the testing set results (Table 4) show that the RF model outperformed XGBoost on all three metrics. The RF model achieved the highest R² score of 0.9898, with the lowest MAE (10.41 mL/min) and RMSE (19.99 mL/min), demonstrating robust generalization to unseen data. The XGBoost model performed slightly worse on the test set, attaining an R² score of 0.9894 and RMSE and MAE values of 20.43 mL/min and 11.50 mL/min, respectively. These results indicate that the RF model exhibited stronger predictive performance and generalizability on the testing data compared to the XGBoost model, particularly in terms of hydrogen flow rate prediction accuracy, which is critical for optimizing PEM water electrolysis efficiency.

To further support these findings, Fig. 3 presents scatter plots comparing the predicted hydrogen flow rate to the actual hydrogen flow rate for both models. The scatter plot for the RF model (a) reveals a strong linear relationship between predicted and actual values, with data points closely aligning along the line of best fit, signifying a high level of prediction accuracy. This alignment indicates that the RF model effectively captures the underlying patterns in the data, accurately predicting hydrogen production across a wide range of conditions (Jin et al., 2020). The plot for the XGBoost model (b) also displays a strong linear relationship, though with slightly more dispersion of data points, suggesting a minor reduction in prediction accuracy. The increased dispersion could be attributed to the XGBoost model's slightly lower sensitivity to feature interactions compared to the RF model, leading to minor deviations in prediction performance (Chen and Guestrin, 2016). In summary, while both models demonstrate strong predictive capabilities for hydrogen flow rate, the RF model shows tighter clustering of data points around the line of best fit, highlighting its superior performance in capturing the relationship between the operational parameters and hydrogen production.

Fig. 3. Scatter plots showing the predicted vs actual hydrogen production rate of different ML models: (a) RF, (b) XGBoost models.

3.3. Feature importance analysis

In ML, feature importance represents a score that indicates the relative influence of each input feature in predicting the target variable. The underlying concept is that certain features have a more substantial impact on the model's output than others. Features with higher importance scores are more critical for accurate predictions, while features with lower scores have less influence. In this study, the RF algorithm was used to determine the relative importance of the input features in predicting hydrogen flow rate.

Fig. 4. Feature importance of the operating parameters on hydrogen rate based on the RF algorithm.

Fig. 4 presents the feature importance scores, as calculated by the RF model. The x-axis shows the importance scores, and the y-axis lists the input features in descending order of importance. As illustrated, cell voltage (V) emerges as the most influential feature in predicting the hydrogen production rate in PEMWE, with a considerably higher importance score than the other features. This prominence is physically plausible, since cell voltage directly drives the electrochemical reactions in the electrolyzer, influencing the separation of

hydrogen and oxygen molecules. This is followed by cell current (A), which also plays a substantial role due to its direct relation to the rate of electron transfer during the electrolysis process. Both of these features, voltage and current, are fundamental to the electrolysis process, directly influencing the rate of hydrogen production. Beyond these key features, water flow rate and power (W) also contribute to the predictions, though their importance is notably lower than that of cell voltage and current, likely because they affect the process indirectly through operational efficiency rather than the core electrochemical reaction. The remaining features, such as cell temperature (K), cathode catalyst (10 % platinum), cell design number, anode type (titanium), membrane type (Nafion 117), and catholyte (deionized water), exhibit minimal importance within the model, indicating that although these parameters influence system performance, their role in hydrogen production is secondary.

3.4. Extended range analysis of voltage and current impact on hydrogen production

This section presents an extended range analysis aimed at evaluating the limiting behavior of hydrogen production by varying the key operational parameters: cell voltage and cell current. The goal was to provide deeper insights into the operational boundaries of PEMWE. Fig. 5(a) and Fig. 5(b) illustrate the results of this analysis using the RF and XGBoost models, respectively. These 3D surface plots depict the predicted hydrogen production rates over an extended range of cell voltage (0.5 to 3.5 V) and cell current (0 to 100 A).

The RF model reveals a strong relationship between cell voltage, cell current, and hydrogen production, aligning with the electrochemical principles of water electrolysis. As cell voltage increases, hydrogen production also rises significantly, and similarly, higher cell currents contribute to a substantial increase in hydrogen production. This reflects the direct role of voltage and current in driving the electrochemical reaction that separates water into hydrogen and oxygen, where higher voltages and currents accelerate reaction rates. However, the model indicates a plateau at higher voltages and currents, suggesting a saturation point beyond which further increases in voltage or current provide diminishing returns in hydrogen production. This plateau effect corresponds to the physical limits of the electrolyzer, where factors such as increased resistive losses and heat generation reduce system efficiency at high operational levels (Babic et al., 2017). This insight is crucial in identifying the optimal operational range for maximizing efficiency while minimizing unnecessary energy expenditure.

On the other hand, the XGBoost model exhibits a similar trend, with hydrogen production rising in response to increasing cell voltage and current. However, the model presents a slightly different gradient and saturation behavior compared to the RF model. In particular, the XGBoost model shows a more gradual increase in hydrogen production with respect to voltage and current, and the saturation effect appears less pronounced. This more gradual trend may reflect the XGBoost model's sensitivity to minor changes in operational parameters, indicating its utility in fine-tuning operational conditions. This distinction between the two models highlights the importance of evaluating multiple ML models to capture different aspects of the operational dynamics of PEMWE systems.

Key insights from this extended analysis indicate that increasing both cell voltage and current enhances hydrogen production, a finding consistent with the fundamental principles of electrolysis where greater electrical input stimulates higher hydrogen generation (Cheng et al., 2007). However, the identified saturation points indicate that beyond certain voltage and current thresholds, efficiency gains diminish. This reinforces the physical reality that while increasing electrical inputs boosts hydrogen production, excessive inputs may lead to inefficient energy use and material wear (Carmo et al., 2013). Additionally, the RF model demonstrates a more pronounced saturation effect compared to XGBoost, suggesting that RF may be more sensitive to changes in operational parameters, offering a clearer delineation of the optimal operating ranges. On the other hand, the XGBoost model's smoother transition in hydrogen production rates could be beneficial in scenarios where more nuanced adjustments to operational parameters are required.

Fig. 5. Extended analysis of hydrogen production rates using (a) RF and (b) XGBoost models, illustrating the effect of varying cell voltage (0.5 to 3.5 V) and cell current (0 to 100 A) on predicted hydrogen flow rate (mL/min). This analysis highlights the relationship between operational parameters and hydrogen production, including observed saturation points at higher voltages and currents. These findings provide insights into the optimal operational ranges for PEMWE systems.

3.5. Partial dependency plots for best machine learning models

The results of this study indicate that the RF and XGBoost models are the most effective ML algorithms for predicting the hydrogen production rate in PEMWE. Partial Dependency Plots (PDPs) were employed to analyze how specific input features affect the predictions of these models. PDPs are particularly valuable for complex models, as they allow for the exploration of non-linear patterns and feature interactions by varying one or two input features across their ranges while averaging the model's predictions over the observed values of all other features (Parr and Wilson, 2021). This approach aids in making the model's decision-making process more transparent, facilitating model validation, and providing insights into its behavior for non-experts.

A PDP analysis was conducted on the two most influential operating parameters: cell voltage and cell current. The results are visualized in Fig. 6. For the RF model, the PDP of cell voltage (V) shows a relatively stable hydrogen production rate up to approximately 3.0 V, after which there is a sharp increase. This indicates that, beyond a certain threshold, increased cell voltage significantly enhances the electrochemical reactions within PEMWE, boosting hydrogen generation efficiency (El-Shafie, 2023; Zeng and Zhang, 2010). Similarly, the PDP for cell current (A) exhibits a gradual increase in hydrogen production with rising current, followed by a plateau at higher current values. This plateau suggests the onset of system limitations, where factors such as increased heat generation or resistive losses prevent the system from efficiently handling higher current densities. This physical interpretation aligns with the electrochemical constraints of PEMWE systems.

The XGBoost model's PDP exhibits similar trends. The PDP for cell voltage (V) reveals a sharp increase in hydrogen production beyond 3.0 V, although the transition is more gradual than that of the RF model, reflecting a more nuanced interpretation of voltage effects. The PDP for cell current (A) also displays a steep initial increase, followed by a plateau, reinforcing the observation of a saturation point beyond which additional current does not lead to substantial improvements in hydrogen production. This behavior highlights the physical limits of the electrolyzer, where excessive current input fails to translate into proportionate hydrogen production increases due to inefficiencies (Millet et al., 2010). This underscores the importance of optimizing both voltage and current to maximize hydrogen production, as excessive increases in either parameter yield diminishing returns.

Fig. 6. Partial dependency plots for the best ML models: (a) and (b) RF, (c) and (d) XGBoost. The plots depict the effect of cell voltage (V) and cell current (A) on hydrogen production rate (mL/min), emphasizing the key operational ranges for optimizing hydrogen production in PEM water electrolysis.

3.6. Limitations and future work

While the ML algorithms employed in this study, RF, SVM, and XGBoost, have demonstrated strong predictive accuracy and provided valuable insights into hydrogen production in PEMWE systems, several limitations need to be acknowledged. One key limitation is the reliance on the dataset used in this research. Although the dataset encompasses a broad range of operational conditions, it may not fully represent the diverse operating environments of PEMWE systems. As such, this could constrain the models' generalizability when applied to significantly different configurations, limiting their practical applicability in real-

world settings.

Another limitation arises from the exclusion of certain variables, such as minor fluctuations in temperature or system variations, that were assumed to have minimal impact on hydrogen production. In practice, these factors may play a role in system efficiency, and their omission could reduce the models' precision, especially when applied to more complex or sensitive environments. While the ML models successfully capture key relationships between critical parameters, such as cell voltage and current, their ability to account for the intricate interplay of all operational variables is limited, potentially reducing predictive performance under specific conditions.

Furthermore, the static nature of the ML models presents challenges in real-time, dynamic environments where parameters like voltage and current fluctuate rapidly. The models, as developed, may experience reduced predictive accuracy when applied to scenarios with fast-changing operational conditions or extreme environments (e.g., very high or low temperatures) that fall outside the range of the training data. Additionally, the risk of overfitting, particularly with complex models such as RF and XGBoost, remains a limitation. Despite the application of hyperparameter optimization to mitigate overfitting, highly complex systems could lead to reduced model generalizability when exposed to new, unseen data.

Looking ahead, future research should address these limitations by expanding the dataset to encompass a wider range of operational conditions, including extreme cases, to further test the robustness of the models. Incorporating additional variables, such as temperature fluctuations and localized environmental factors, could enhance the models' predictive capabilities. Moreover, the integration of ML with physics-based or physics-informed modeling approaches offers a promising avenue for improving both prediction accuracy and interpretability. These hybrid models could provide a more holistic understanding of the physical processes governing hydrogen production, allowing for better optimization of PEMWE systems.

4. Conclusions

This study employed ML techniques, namely RF, XGBoost, and SVM, to develop predictive models for hydrogen production rates in PEMWE. A dataset of 450 experimental data points, comprising 16 features sourced from peer-reviewed literature, was compiled for model training. To enhance model accuracy, RFE was applied to identify the most relevant input features, with the RF algorithm consistently providing the most accurate predictions. The RF model achieved an R² score of 0.9898, an MAE of 10.41 mL/min, and an RMSE of 19.99 mL/min, outperforming the other models.

The feature selection process not only improved model performance but also highlighted key factors influencing PEMWE efficiency. Extended analysis using the RF and XGBoost models revealed that increasing cell voltage and current results in higher hydrogen production, aligning with the fundamental principles of electrolysis. However, saturation points were observed, indicating the importance of carefully managing operational parameters to optimize hydrogen production, reduce costs, and extend the lifespan of the system. While the SVM initially showed lower predictive accuracy compared to RF and XGBoost, this was before hyperparameter tuning. With further fine-tuning, such as exploring different kernel functions, the SVM model could potentially achieve improved performance.

This study also suggests potential avenues for further research, including incorporating additional parameters like temperature, pressure, and power density into the models to enable more comprehensive optimization. Although the results are promising, challenges related to data accessibility remain. It is recommended that the research community make experimental data more readily available to enhance data-driven approaches to PEMWE optimization.

In conclusion, this research demonstrates the effectiveness of ML techniques, particularly RF and XGBoost, in accurately predicting hydrogen production rates in PEMWE systems. The insights provided offer valuable guidance for optimizing PEMWE operations and set the foundation for future advancements in hydrogen energy production.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to acknowledge the financial support provided by the American University of Sharjah through FRG22-C-E06 and Petrofac Endowed Chair funds. The authors would also like to thank the University of Sharjah, United Arab Emirates for financial support via the targeted research grant project number: 23020406306.

References

Sharifishourabi, M., Dincer, I., Mohany, A., 2024. Implementation of experimental techniques in ultrasound-driven hydrogen production: a comprehensive review. Int. J. Hydrogen Energy 62, 1183–1204. https://doi.org/10.1016/j.ijhydene.2024.03.013.
Verma, A., Rathore, K., Srivastava, R., 2023. Application of machine learning approach for green hydrogen. Solar-Driven Green Hydrogen Generation and Storage. Elsevier, pp. 525–543. https://doi.org/10.1016/B978-0-323-99580-1.00004-2.
Saravanan, P., Khan, M.R., Yee, C.S., Vo, D.-V.N., 2020. An overview of water electrolysis technologies for the production of hydrogen. New Dimensions in Production and Utilization of Hydrogen. Elsevier, pp. 161–190. https://doi.org/10.1016/B978-0-12-819553-6.00007-6.
Acar, C., Dincer, I., 2014. Comparative assessment of hydrogen production methods from renewable and non-renewable sources. Int. J. Hydrogen Energy 39 (1), 1–12. https://doi.org/10.1016/j.ijhydene.2013.10.060.
Boyano, A., Blanco-Marigorta, A.M., Morosuk, T., Tsatsaronis, G., 2011. Exergoenvironmental analysis of a steam methane reforming process for hydrogen production. Energy 36 (4), 2202–2214. https://doi.org/10.1016/j.energy.2010.05.020.
Huang, J., Dincer, I., 2014. Parametric analysis and assessment of a coal gasification plant for hydrogen production. Int. J. Hydrogen Energy 39 (7), 3294–3303. https://doi.org/10.1016/j.ijhydene.2013.12.054.
Rezaei, M., Sameti, M., Nasiri, F., 2024. Design optimization for an integrated tri-generation of heat, electricity, and hydrogen powered by biomass in cold climates. Int. J. Thermofluids 22, 100618. https://doi.org/10.1016/j.ijft.2024.100618.
Das, D., 2001. Hydrogen production by biological processes: a survey of literature. Int. J. Hydrogen Energy 26 (1), 13–28. https://doi.org/10.1016/S0360-3199(00)00058-6.
Velasquez-Jaramillo, M., García, J.-G., Vasco-Echeverri, O., 2024. Techno economic model to analyze the prospects of hydrogen production in Colombia. Int. J. Thermofluids 22, 100597. https://doi.org/10.1016/j.ijft.2024.100597.
Abdelkareem, M.A., et al., 2023. Optimized solar photovoltaic-powered green hydrogen: current status, recent advancements, and barriers. Sol. Energy 265, 112072. https://doi.org/10.1016/j.solener.2023.112072.
Kumar, S.S., Lim, H., 2022. An overview of water electrolysis technologies for green hydrogen production. Energy Reports 8, 13793–13813. https://doi.org/10.1016/j.egyr.2022.10.127.
Dunn, S., 2002. Hydrogen futures: toward a sustainable energy system. Int. J. Hydrogen Energy 27 (3), 235–264. https://doi.org/10.1016/S0360-3199(01)00131-8.
Ong, S., Al-Othman, A., Tawalbeh, M., Aug. 2023. Emerging technologies in prognostics for fuel cells including direct hydrocarbon fuel cells. Energy 277, 127721. https://doi.org/10.1016/j.energy.2023.127721.
Tawalbeh, M., Alarab, S., Al-Othman, A., Javed, R.M.N., Aug. 2022. The operating parameters, structural composition, and fuel sustainability aspects of PEM fuel cells: a mini review. Fuels 3 (3), 449–474. https://doi.org/10.3390/fuels3030028.
Nikolaidis, P., Poullikkas, A., 2017. A comparative overview of hydrogen production processes. Renewable and Sustainable Energy Reviews 67, 597–611. https://doi.org/10.1016/j.rser.2016.09.044.
Olabi, A.G., et al., 2024. Multiple-criteria decision-making for hydrogen production approaches based on economic, social, and environmental impacts. Int. J. Hydrogen Energy 52, 854–868. https://doi.org/10.1016/j.ijhydene.2023.10.293.
Kumar, S.S., Himabindu, V., 2019. Hydrogen production by PEM water electrolysis – A review. Mater. Sci. Energy Technol. 2 (3), 442–454. https://doi.org/10.1016/j.mset.2019.03.002.
Choi, P., 2004. A simple model for solid polymer electrolyte (SPE) water electrolysis. Solid State Ionics 175 (1–4), 535–539. https://doi.org/10.1016/j.ssi.2004.01.076.
Oliveira, L.F.L., Jallut, C., Franco, A.A., 2013. A multiscale physical model of a polymer electrolyte membrane water electrolyzer. Electrochim. Acta 110, 363–374. https://doi.org/10.1016/j.electacta.2013.07.214.
Sezer, N., Bayhan, S., Fesli, U., Sanfilippo, A., 2024. A comprehensive review of the state-of-the-art of proton exchange membrane water electrolysis. Mater. Sci. Energy Technol. https://doi.org/10.1016/j.mset.2024.07.006.


Grigoriev, S., Porembsky, V., Fateev, V., 2006. Pure hydrogen production by PEM electrolysis for hydrogen energy. Int. J. Hydrogen Energy 31 (2), 171–175. https://doi.org/10.1016/j.ijhydene.2005.04.038.
Hernández-Gómez, Á., Ramirez, V., Guilbert, D., 2020. Investigation of PEM electrolyzer modeling: electrical domain, efficiency, and specific energy consumption. Int. J. Hydrogen Energy 45 (29), 14625–14639. https://doi.org/10.1016/j.ijhydene.2020.03.195.
Ming, W., et al., 2023. A systematic review of machine learning methods applied to fuel cells in performance evaluation, durability prediction, and application monitoring. Int. J. Hydrogen Energy 48 (13), 5197–5228. https://doi.org/10.1016/j.ijhydene.2022.10.261.
Bilgiç, G., Öztürk, B., Atasever, S., Şahin, M., Kaplan, H., 2023. Prediction of hydrogen production by magnetic field effect water electrolysis using artificial neural network predictive models. Int. J. Hydrogen Energy 48 (53), 20164–20175. https://doi.org/10.1016/j.ijhydene.2023.02.082.
Mohamed, A., Ibrahem, H., Kim, K., 2022. Machine learning-based simulation for proton exchange membrane electrolyzer cell. Energy Reports 8, 13425–13437. https://doi.org/10.1016/j.egyr.2022.09.135.
Kabir, M.M., et al., 2023. Machine learning-based prediction and optimization of green hydrogen production technologies from water industries for a circular economy. Desalination 567, 116992. https://doi.org/10.1016/j.desal.2023.116992.
Cheng, G., et al., 2023. Analysis and prediction of green hydrogen production potential by photovoltaic-powered water electrolysis using machine learning in China. Energy 284, 129302. https://doi.org/10.1016/j.energy.2023.129302.
Zhang, Z., Ren, B., Du, X., Chen, L., 2022. An XGBoost based prediction model for electrochemical characteristics of hydrogen production by water electrolysis. In: 2022 4th Int. Conf. Power Energy Technol. ICPET 2022, pp. 1163–1168. https://doi.org/10.1109/ICPET55165.2022.9918340.
Hayatzadeh, A., Fattahi, M., Rezaveisi, A., 2024. Machine learning algorithms for operating parameters predictions in proton exchange membrane water electrolyzers: anode side catalyst. Int. J. Hydrogen Energy 56, 302–314. https://doi.org/10.1016/j.ijhydene.2023.12.149.
Odabaşı, Ç., Dologlu, P., Gülmez, F., Kuşoğlu, G., Çağlar, Ö., 2022. Investigation of the factors affecting reverse osmosis membrane performance using machine-learning techniques. Comput. Chem. Eng. 159, 107669. https://doi.org/10.1016/j.compchemeng.2022.107669.
Zhang, X., Yan, C., Gao, C., Malin, B.A., Chen, Y., 2020. Predicting missing values in medical data via XGBoost regression. J. Healthc. Informatics Res. 4 (4), 383–394. https://doi.org/10.1007/s41666-020-00077-1.
Valizadeh, A., Amirhosseini, M.H., Ghorbani, Y., 2024. Predictive precision in battery recycling: unveiling lithium battery recycling potential through machine learning. Comput. Chem. Eng. 183, 108623. https://doi.org/10.1016/j.compchemeng.2024.108623.
Abdelkareem, M.A., et al., 2022. Progress of artificial neural networks applications in hydrogen production. Chem. Eng. Res. Des. 182, 66–86. https://doi.org/10.1016/j.cherd.2022.03.030.
Tawalbeh, M., Shomope, I., Al-Othman, A., Alshraideh, H., 2024. Prediction of hydrogen production in proton exchange membrane water electrolysis via neural networks. Int. J. Thermofluids, 100849. https://doi.org/10.1016/j.ijft.2024.100849.
Mohamed, A., Ibrahem, H., Yang, R., Kim, K., 2022. Optimization of proton exchange membrane electrolyzer cell design using machine learning. Energies 15 (18), 6657. https://doi.org/10.3390/en15186657.
Rozain, C., Mayousse, E., Guillet, N., Millet, P., 2016. Influence of iridium oxide loadings on the performance of PEM water electrolysis cells: part I–Pure IrO2-based anodes. Appl. Catal. B Environ. 182, 153–160. https://doi.org/10.1016/j.apcatb.2015.09.013.
Brightman, E., Dodwell, J., van Dijk, N., Hinds, G., 2015. In situ characterisation of PEM water electrolysers using a novel reference electrode. Electrochem. commun. 52, 1–4. https://doi.org/10.1016/j.elecom.2015.01.005.
Ayers, K.E., et al., 2016. Pathways to ultra-low platinum group metal catalyst loading in proton exchange membrane electrolyzers. Catal. Today 262, 121–132. https://doi.org/10.1016/j.cattod.2015.10.019.
Sarno, M., Ponticorvo, E., 2019. High hydrogen production rate on RuS2@MoS2 hybrid nanocatalyst by PEM electrolysis. Int. J. Hydrogen Energy 44 (9), 4398–4405. https://doi.org/10.1016/j.ijhydene.2018.10.229.
Ju, H., Giddey, S., Badwal, S.P.S., 2017. The role of nanosized SnO2 in Pt-based electrocatalysts for hydrogen production in methanol assisted water electrolysis. J. Power Sources 177 (2), 281–285. https://doi.org/10.1016/j.jpowsour.2007.11.072.
Ruck, S., et al., 2022. Carbon supported NiRu nanoparticles as effective hydrogen evolution catalysts for anion exchange membrane water electrolyzers. J. Phys. Energy 4 (4), 044007. https://doi.org/10.1088/2515-7655/ac95cd.
Mayousse, E., Maillard, F., Fouda-Onana, F., Sicardy, O., Guillet, N., Aug. 2011. Synthesis and characterization of electrocatalysts for the oxygen evolution in PEM water electrolysis. Int. J. Hydrogen Energy 36 (17), 10474–10481. https://doi.org/10.1016/j.ijhydene.2011.05.139.
Zhou, W., et al., 2016. Recent developments of carbon-based electrocatalysts for hydrogen evolution reaction. Nano Energy 28, 29–43. https://doi.org/10.1016/j.nanoen.2016.08.027.
Pushkarev, A.S., Pushkareva, I.V., du Preez, S.P., Bessarabov, D.G., 2023. PGM-free electrocatalytic layer characterization by electrochemical impedance spectroscopy of an anion exchange membrane water electrolyzer with nafion ionomer as the bonding agent. Catalysts 13 (3), 554. https://doi.org/10.3390/catal13030554.
Wang, L., et al., 2022. Deciphering the exceptional performance of NiFe hydroxide for the oxygen evolution reaction in an anion exchange membrane electrolyzer. ACS Appl. Energy Mater. 5 (2), 2221–2230. https://doi.org/10.1021/acsaem.1c03761.
Kim, S.H., Boukouvala, F., 2020. Surrogate-based optimization for mixed-integer nonlinear problems. Comput. Chem. Eng. 140, 106847. https://doi.org/10.1016/j.compchemeng.2020.106847.
Pedregosa, F., et al., 2011. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.
Alaca, Y., Emin, B., Akgul, A., 2024. A comparative study of deep learning models and classification algorithms for chemical compound identification and Tox21 prediction. Comput. Chem. Eng. 189, 108805. https://doi.org/10.1016/j.compchemeng.2024.108805.
Darst, B.F., Malecki, K.C., Engelman, C.D., 2018. Using recursive feature elimination in random forest to account for correlated variables in high dimensional data. BMC Genet 19 (Suppl 1), 1–6. https://doi.org/10.1186/s12863-018-0633-8.
Shlens, J., 2014. A tutorial on principal component analysis.
Jolliffe, I.T., Cadima, J., 2016. Principal component analysis: a review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 374 (2065), 20150202. https://doi.org/10.1098/rsta.2015.0202.
Gómez, D., Rojas, A., 2016. An empirical overview of the no free lunch theorem and its effect on real-world machine learning classification. Neural Comput 28 (1), 216–228. https://doi.org/10.1162/NECO_a_00793.
Madaan, A., Pandey, J., 2024. Development of machine learning based model for low-temperature PEM fuel cells. Comput. Chem. Eng. 188, 108754. https://doi.org/10.1016/j.compchemeng.2024.108754.
Li, Y., et al., 2018. Random forest regression for online capacity estimation of lithium-ion batteries. Appl. Energy 232, 197–210. https://doi.org/10.1016/j.apenergy.2018.09.182.
Wang, F., Zhao, J., Van Hoang, V., 2024. Prediction of variables involved in TEG Dehydration using hybrid models based on boosting algorithms. Comput. Chem. Eng. 188, 108747. https://doi.org/10.1016/j.compchemeng.2024.108747.
Chiang, L.H., Kotanchek, M.E., Kordon, A.K., 2004. Fault diagnosis based on Fisher discriminant analysis and support vector machines. Comput. Chem. Eng. 28 (8), 1389–1401. https://doi.org/10.1016/j.compchemeng.2003.10.002.
Lu, Q., Forbes, M.G., Loewen, P.D., Backström, J.U., Dumont, G.A., Gopaluni, R.B., 2020. Support vector machine approach for model-plant mismatch detection. Comput. Chem. Eng. 133, 106660. https://doi.org/10.1016/j.compchemeng.2019.106660.
Chen, T., Guestrin, C., 2016. XGBoost: a scalable tree boosting system. https://doi.org/10.1145/2939672.2939785.
Friedman, J.H., 2002. Stochastic gradient boosting. Comput. Stat. Data Anal. 38 (4), 367–378. https://doi.org/10.1016/S0167-9473(01)00065-2.
Segal, M.R., 2004. Machine learning benchmarks and random forest regression. Cent. Bioinforma. Mol. Biostat. 15.
Yang, L., Shami, A., 2020. On hyperparameter optimization of machine learning algorithms: theory and practice. Neurocomputing 415, 295–316. https://doi.org/10.1016/j.neucom.2020.07.061.
Géron, A., 2024. Hands-On Machine Learning with Scikit-Learn and TensorFlow. O'Reilly Media.
Jin, Z., Shang, J., Zhu, Q., Ling, C., Xie, W., Qiang, B., 2020. RFRSF: employee turnover prediction based on random forests and survival analysis. Lect. Notes Comput. Sci.
Electrochim. Acta 229, 39–47. https://doi.org/10.1016/j.electacta.2017.01.106. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 12343,
Ramakrishna, S.U.B., Srinivasulu Reddy, D., Shiva Kumar, S., Himabindu, V., 2016. 503–515. https://doi.org/10.1007/978-3-030-62008-0_35. LNCS.
Nitrogen doped CNTs supported Palladium electrocatalyst for hydrogen evolution Babic, U., Suermann, M., Büchi, F.N., Gubler, L., Schmidt, T.J., 2017. Critical
reaction in PEM water electrolyser. Int. J. Hydrogen Energy 41 (45), 20447–20454. review—identifying critical gaps for polymer electrolyte water electrolysis
https://doi.org/10.1016/j.ijhydene.2016.08.195. development. J. Electrochem. Soc. 164 (4), F387–F399. https://doi.org/10.1149/
Kaya, M.F., Demir, N., Rees, N.V., El-Kharouf, A., 2021. Magnetically modified 2.1441704jes.
electrocatalysts for oxygen evolution reaction in proton exchange membrane (PEM) Cheng, X., et al., 2007. A review of PEM hydrogen fuel cell contamination: impacts,
water electrolyzers. Int. J. Hydrogen Energy 46 (40), 20825–20834. https://doi.org/ mechanisms, and mitigation. J. Power Sources 165 (2), 739–756. https://doi.org/
10.1016/j.ijhydene.2021.03.203. 10.1016/j.jpowsour.2006.12.012.
Shomope, I., Tawalbeh, M., Al-Othman, A., Almomani, F., 2025. Predicting biohydrogen Carmo, M., Fritz, D.L., Mergel, J., Stolten, D., 2013. A comprehensive review on PEM
production from dark fermentation of organic waste biomass using multilayer water electrolysis. Int. J. Hydrogen Energy 38 (12), 4901–4934. https://doi.org/
perceptron artificial neural network (MLP–ANN). Comput. Chem. Eng. 192, 108900. 10.1016/j.ijhydene.2013.01.151.
Song, S., Zhang, H., Ma, X., Shao, Z., Baker, R.T., Yi, B., 2008. Electrochemical Parr, T., Wilson, J.D., 2021. Partial dependence through stratification. Mach. Learn. with
investigation of electrocatalysts for the oxygen evolution reaction in PEM water Appl. 6, 100146. https://doi.org/10.1016/j.mlwa.2021.100146.
electrolyzers. Int. J. Hydrogen Energy 33 (19), 4955–4961. https://doi.org/
10.1016/j.ijhydene.2008.06.039.
Grigoriev, S.A., Millet, P., Fateev, V.N., 2008. Evaluation of carbon-supported Pt and Pd
nanoparticles for the hydrogen evolution reaction in PEM water electrolysers.


