Physics-informed neural network (PINN) modeling of charged particle multiplicity
using the two-component framework in heavy-ion collisions: A comparison with data-driven neural networks

Akash Das¹ [email protected] Satya Ranjan Nayak² [email protected] B. K. Singh^1,2 [email protected] [email protected] ¹Discipline of Natural Sciences, PDPM Indian Institute of Information Technology Design & Manufacturing, Jabalpur 482005, INDIA.
²Department of Physics, Institute of Science,
Banaras Hindu University (BHU), Varanasi 221005, INDIA.

(November 7, 2025)

Abstract

In this study, we employ a conventional deep neural network (NN) framework integrated with physics-based constraints to predict charged hadron multiplicity ( $N_{\text{ch}}$ ) in heavy-ion collisions. The goal is to assess the performance of a purely data-driven deep neural network in comparison to a physics-informed neural network (PINN). To accomplish this, we have taken data generated from the HYDJET++ model for testing and training purposes. We train our neural network frameworks using the data of one million individual ${}^{96}_{40}\text{Zr}+^{96}_{40}\text{Zr}$ collision events. Our PINN model successfully extracts the hard-scattering fraction ( $x$ ) by learning its underlying relation from the event data. For further testing and comparison with the conventional NN, we take data of ${}^{96}_{44}\text{Ru}+^{96}_{44}\text{Ru}$ (isobar of Zr) and ${}^{197}_{79}\text{Au}+^{197}_{79}\text{Au}$ collisions using the same simulation model. We found that the NN model needs more time to train with physics. However, once trained, the PINN model is capable of accurately predicting data that it has not encountered during training, such as Au+Au collision results. Especially in a region of sparse data corresponding to high $N_{\text{ch}}$ in our study, PINN has a clear advantage over a simple NN.

^†^†preprint: APS/123-QED

I Introduction

Heavy-ion collisions at relativistic energies provide a unique environment to study the properties of the quark-gluon plasma (QGP) [1, 2]. Experiments have been conducted on various colliding systems, such as ${}^{197}_{79}\text{Au}+^{197}_{79}\text{Au}$ , ${}^{96}_{44}\text{Ru}+^{96}_{44}\text{Ru}$ , and ${}^{96}_{40}\text{Zr}+^{96}_{40}\text{Zr}$ , to explore the collective behavior, nuclear structure properties, and isobaric effects [4, 3]. The charged particle multiplicity ( $N_{\text{ch}}$ ) and its correlations serve as a probe to know the initial geometry and particle production mechanism.

Relativistic Heavy Ion Collider (RHIC) and Large Hadron Collider (LHC) record particle tracks in detectors following high-energy nuclear collisions. These tracks are then correlated with the collision geometry using the Glauber model calculations [5, 6]. The Glauber framework provides the number of nucleons taking part in the collision ( $N_{\text{part}}$ ) and the number of binary nucleon-nucleon collisions ( $N_{\text{coll}}$ ) corresponding to the impact parameter ( $b$ ). The $N_{\text{ch}}$ is empirically related to $N_{\text{part}}$ and $N_{\text{coll}}$ via a two-component formula:

\frac{dN_{\text{ch}}}{d\eta}=n_{pp}\left[(1-x)\cdot\frac{N_{\text{part}}}{2}+x\cdot N_{\text{coll}}\right]

(1)

Since $N_{\text{part}}$ counts the total number of participating nucleons from both nuclei, we divide it by 2 to get the effective number of participant pairs. $x$ represents the weight of the hard scattering contribution, and $n_{pp}$ is the average number of charged hadrons produced in individual proton-proton collisions. The contributions from hard processes scale with $N_{\text{coll}}$ and soft processes with $N_{\text{part}}$ .

In recent years, neural networks emerged as a promising tool in heavy-ion collision physics, enabling data-driven modeling of complex nonlinear dependencies [7, 8]. Although effective, the standard neural networks (NN) rely purely on the statistical correlations of the data and lack physical interoperability. To overcome this limitation of conventional NN, a more advanced version of NN was introduced, known as Physics-Informed Neural Networks (PINN) [9]. This embeds physical laws directly into the NN’s loss function and allows simultaneous learning from the data and theoretical constraints. Thus, PINN improves the robustness of the learning model in regions where the data are sparse.

In this work, we employed both a conventional neural network (NN) and a physics-informed neural network (PINN) to predict $N_{\text{ch}}$ in isobaric collisions. The models are trained on Zr+Zr data generated from the HYDJET++ model [10, 11], while testing is performed on Ru+Ru and Au+Au data generated by the same process. As an isobar system, Ru+Ru collision data serves as the seen test dataset, since the model was trained on Zr+Zr data. In contrast, the Au+Au collision data represents a completely unseen dataset during the training phase. The comparative analysis of NN and PINN predictions in these testing systems highlights the role of physics constraints in improving generalization across both seen and unseen collision systems. The study provides an initial step toward integrating PINN for analysis and prediction of observables in relativistic heavy-ion collisions.

The letter is organized as follows. In section- $\mathrm{II}$ , we have discussed broadly about the NN architecture, Pythia, and HYDJET++ models. In section- $\mathrm{III}$ , we provide the result and discussion part. In section- $\mathrm{IV}$ , we have summarized our work.

II Model Formalism

II.1 Physics framework

The heavy-ion dataset was generated using the HYDJET++ event generator. HYDJET++ is a combination of two independent algorithms, i.e, PYQUEN for hard interactions and FASTMC for thermal particle production. The simulation starts by calculating the number of participants and the number of binary collisions using the Glauber model. Then the initial parton spectra are generated using PYTHIA6 [12]. PYQUEN incorporates the collisional and radiational energy loss of the partons inside the dense QGP medium. The thermal state is generated on freezeout hypersurfaces with preset freezeout conditions. The model assumes both chemical freezeout and thermal freezeout with $T_{ch}>T_{th}$ [13, 14]. The final hadronization is performed according to the LUND string fragmentation model [15, 16].

The multiplicity in pp collisions ( $n_{pp}$ ) was generated using PYTHIA 8 [17, 18]. PYTHIA is an all-purpose Monte Carlo event generator used to simulate a wide range of collision systems. The hard interaction between the valency quarks is calculated using standard pQCD tools. The softer interactions between the beam remnants are simulated using the Multi-Parton interaction (MPI) framework. A detailed description can be found in the corresponding papers [19]. The parton showers are simulated simultaneously and added back to the final state. A color reconnection [20] is performed between the $q\bar{q}$ dipoles to reduce the number of long strings and control the multiplicity. Finally, the strings are fragmented into hadrons according to the Lund string fragmentation model [15, 16]. A detailed description of PYTHIA can be found in the PYTHIA manual [12].

II.2 Model Architecture and Data Processing

In this study, we employed the PyTorch deep learning framework [21] to construct and train the PINN for predicting $N_{\text{ch}}$ based on event-level observables. Here, event-level observables denote the $N_{\text{part}}$ , $N_{\text{coll}}$ and the impact parameter ( $b$ ) for each collision event. PyTorch’s ecosystem offers a flexible and modular platform for implementing the NN framework. It has some core features such as GPU acceleration, nn.Module, automatic differentiation, learning rate scheduling, gradient clipping, etc., which help to achieve stable and efficient optimization.

For training the neural networks, we employ a dataset generated from simulated Zr+Zr collision events at RHIC energies using the HYDJET++ event generator. The dataset consists of four features per event:- $b$ , $N_{\text{part}}$ , $N_{\text{coll}}$ , and $N_{\text{ch}}$ . The input vector X = (b, $N_{\text{part}}$ , $N_{\text{coll}}$ ) is fed into a fully connected feedforward network, while the target variable Y = $N_{\text{ch}}$ serves as a prediction target.

Before training, all the input features and the target variables are normalized to the range $[0,1]$ using min-max scaling [22]. The normalized data is converted to PyTorch tensors and split into an $80$ %- $20$ % train-test partition using random_split. The batch size is adjusted according to the size of the training dataset. For a training dataset of one million samples, a batch size of 1000 is used to ensure efficient training using PyTorch’s DataLoader [21].

The neural network architecture consists of an input layer of dimension 3, followed by three hidden layers containing 128, 64, and 32 neurons, respectively. The first hidden layer uses a hyperbolic tangent (tanh) activation function, followed by Rectified Linear Unit (ReLU) activations in subsequent layers [23]. Dropout layers with a rate of 0.1 are applied after the second and third hidden layers to mitigate overfitting. A learning rate of 0.0001 is employed for training, and the mean squared error (MSE) is used as the loss function. The output layer produces a single scalar prediction corresponding to $N_{\text{ch}}$ .

A physics module is incorporated in parallel with the data-driven neural network to enforce known physical constraints. This module implements a fixed analytical expression for charged hadron multiplicity inspired by the Glauber two-component model Eq. (1). In this module, $n_{\text{pp}}$ is treated as a non-trainable buffer using register_buffer. On the other hand $x$ is taken as a scaler trainable parameter defined as self.x = nn.Parameter(torch.tensor(7.0, dtype=torch.float32)). The value 7.0 is a random initial guess for the $x$ value. Since it is wrapped with nn.Parameter, it is registered as a model parameter and included in the gradient-based optimization.

The overall training is guided by a composite loss function:

\mathcal{L}_{\text{total}}=\mathcal{L}_{\text{data}}+\lambda\,\mathcal{L}_{\text{physics}}

(2)

The $\mathcal{L}_{\text{data}}$ quantifies how much difference there is between the NN prediction and simulated data, while $\mathcal{L}_{\text{physics}}$ penalizes deviation from the already known physics-based model prediction. Both losses use the MSELoss function to ensure robustness against the outliers. The hyperparameter $\lambda$ adjusts how much importance we give to the physics rules in matching the data.

The training was carried out on Google Colab using an NVIDIA T4 accelerator with 16 GB of memory. The T4 provides hardware acceleration optimized for deep learning, enabling efficient parallel computations. PyTorch was employed as the deep learning framework, taking advantage of Colab’s GPU/TPU backend for faster neural network training and experimentation. The PINN takes about 1.5 to 2 times longer to train compared to the data-driven neural network.

Metric	$\bm{\lambda=0.1}$	$\bm{\lambda=1}$
$R^{2}$	0.9880	0.9841
RMSE	23.16	26.07
MAE	14.85	17.16

Table 1: Performance metrics for different values of

\lambda

, trained and tested on Zr+Zr data points for 100 epochs.

III Result and discussions

The input data distributions are shown in Fig. 1. An intrinsic non-uniformity is present in the data. In the Glauber model, events are sampled with a probability proportional to $2\pi b\,db$ , so smaller $b$ values are less probable than intermediate ones. The distribution peaks around $b\sim 2$ fm before dropping off due to the finite nuclear size. On account of the reduced number of events at small impact parameters ( $b$ ), the $N_{\text{part}}$ distribution exhibits an exponential fall-off. The $N_{\text{coll}}$ and $N_{\text{ch}}$ are directly determined by the the $N_{\text{part}}$ , therefore their distributions follow the same trend. Due to this, our model has fewer training samples in the high- $N_{\text{ch}}$ region. To handle this sparse-data regime, a PINN is a good choice.

A total of $10^{6}$ minimum bias Zr+Zr collision events were simulated at $\sqrt{s_{NN}}=200$ GeV using the HYDJET++ model, assuming spherical nuclear configurations obtained from the density functional theory (DFT) calculations [24]. Those events are selected that fall under the pseudorapidity cut $|\eta|<0.5$ .

The code for purely data-driven and physics-based models is nearly the same, with the latter incorporating an additional physics-informed loss term. We started with predicting the value of $x$ using PINN. For that, we took $x$ as a trainable parameter in our code. The value of $n_{pp}$ is fixed to 3.602 as obtained with the help of PYTHIA. The model trains on 1 million minimum-bias Zr+Zr collision data points converged within 32 epochs (early stopped), yielding a predicted value of $x\approx 0.4097$ , which we round off to 0.41. To validate this result, we fixed $x$ together with $n_{pp}$ and performed multiple trials by varying the value of $x$ from 0.1 to 1.0. The parity plot confirms that $x=0.41$ provides the best agreement with the data, as shown in Fig. 2.

Taking $x$ as a fixed parameter, we conducted trials to determine the value of $\lambda$ that provides a good balance between data loss and physics loss. The corresponding plots are presented in Fig. 3, obtained using one-tenth of the original dataset (0.1 million events) of Zr+Zr collisions and trained for 100 epochs. The total data set is divided into an $80$ %- $20$ % split, with $20$ % reserved for testing. Performance metrics [25] are tabulated in the Table I. Since $\lambda=0.1$ does not strongly enforce physics, it is more suitable for fitting the data at a few number of epochs. As the value of $\lambda$ increases, the neural network output and physics-based model show strong agreement. At high $N_{\text{ch}}$ , due to the sparsity of data, the NN gets less guidance from the data. Hence, the prediction follows the physics model strongly.

The comparison between PINN and purely data-driven models is performed using limited Zr+Zr data to evaluate how much data is sufficient for effective training. Corresponding evaluation matrices are tabulated in Table II. For training and testing, only 500 data points are used on an 80–20% train-test split. A batch size of 10 is employed and the value of $\lambda$ is taken as $0.1$ . We tested the model with and without physics for 10 epochs, and the results clearly indicate that incorporating physical constraints provides a distinct advantage, especially when the model is trained with limited data. The loss curve for this is plotted in Fig. 4.

10 Epochs
	PINN	Normal NN
$R^{2}$	0.9789	0.9742
RMSE	31.19	34.68
MAE	20.30	23.75

Table 2: Performance metrics for 500 data points for PINN and normal NN over different epochs.

The model trained on Zr+Zr collisions is evaluated on Ru+Ru and Au+Au data to assess generalizability. We trained it with one million Zr data points with a physics weight of $\lambda=0.1$ over $100$ epochs. Testing datasets of 1000 events each are taken from Ru+Ru and Au+Au collisions. The PINN model predicted Ru’s $N_{\text{ch}}$ with high accuracy as it is an isobar of Zr (as shown in Fig. 5). We further evaluated the model on Au data, which was completely absent during the training process, serving as a fully unseen test set. In Fig. 6, the predicted $N_{\text{ch}}$ of Au is shown for three different conditions, namely (a) without physics constraints, (b) with PINN ( $\lambda=0.1$ ), and (c) with PINN ( $\lambda=1.0$ ). Without physics-based loss, the model underpredicts around $N_{\text{ch}}$ = $400$ . That shows that the purely data-driven model is poor at predicting unknown $N_{\text{ch}}$ . Meanwhile, with physics imposed, the model predicts better for Au+Au data. The prediction accuracy improves for $\lambda=1.0$ compared to $\lambda=0.1$ , as the former predicts higher $N_{\text{ch}}$ values that are not present in the Zr’s training dataset. So, despite being trained on Zr+Zr data with a fixed hard-scattering fraction ( $x$ ), the PINN is successful in reproducing the charged-particle multiplicity trends in Au+Au collisions. This explains that the model has learned the Glauber-type scaling law and can generalize the learned physical relation across collision systems.

IV Summary

In summary, we started training our models with Zr+Zr collisions data generated by the HYDJET++ model. A limited number of data points are present in the low impact parameter region, due to Glauber statistics. So, the use of a physics-informed neural network (PINN) helps to compensate for this scarcity by incorporating physics constraints into the learning process. We have taken $x$ as a scalar trainable parameter, and the model predicts a value of $x$ =0.41. Then we have verified this value of $x$ by taking both $x$ and $n_{pp}$ fixed. In our study, with the increasing value of $\lambda$ , the normal neural network output and physics-based output show strong agreement. We also find that physical constraints can make the model learn better with fewer training data points. With PINN, the trained model can predict even on unseen test datasets, as we have shown using isobaric collisions of Ru+Ru data and unseen Au+Au data.

Thus, it is expected that our work will enlighten the role of PINN over conventional NN in heavy-ion collision problems. This approach will help the high-energy physics community to learn complex nonlinear relations between different observables in the future. PINN can specifically be effective at reducing the number of simulated events and the overall computation time in heavy-ion collision phenomenological analysis. PINN can also help predict the outcomes of Beam Energy Scan (BES) studies, provided that data at specific energies and reliable theoretical constraints are available.

V Data availability

Data and codes will be made available on request.

VI Acknowledgment

We are grateful to Dr. Stephen Baek for his valuable lectures on PINN and L. Pon Vijaya Kanthan for his guidance on coding. BKS sincerely acknowledges financial support from the Institute of Eminence (IoE), BHU Grant number 6031. AD acknowledges the financial support through the Institute fellowship from IIITDM Jabalpur. SRN acknowledges the financial support from the UGC Non-NET fellowship and IoE research incentive during the research work.

References

[1] J. C. Collins and M. J. Perry, Phys. Rev. Lett. 34 (1975), 1353 doi:10.1103/PhysRevLett.34.1353
[2] E. V. Shuryak, Phys. Lett. B 78 (1978), 150 doi:10.1016/0370-2693(78)90370-2
[3] M. Abdallah et al. [STAR], Phys. Rev. C 105 (2022) no.1, 014901 doi:10.1103/PhysRevC.105.014901 [arXiv:2109.00131 [nucl-ex]].
[4] J. Adam et al. [STAR], Phys. Rev. C 102 (2020) no.5, 054913 doi:10.1103/PhysRevC.102.054913 [arXiv:2006.00582 [nucl-ex]].
[5] K. Adcox et al. [PHENIX], Phys. Rev. Lett. 86 (2001), 3500-3505 doi:10.1103/PhysRevLett.86.3500 [arXiv:nucl-ex/0012008 [nucl-ex]].
[6] C. Adler et al. [STAR], Phys. Rev. Lett. 87 (2001), 112303 doi:10.1103/PhysRevLett.87.112303 [arXiv:nucl-ex/0106004 [nucl-ex]].
[7] D. Guest, K. Cranmer and D. Whiteson, Ann. Rev. Nucl. Part. Sci. 68 (2018), 161-181 doi:10.1146/annurev-nucl-101917-021019 [arXiv:1806.11484 [hep-ex]].
[8] J. Duarte and J. R. Vlimant, doi:10.1142/9789811234033_0012 [arXiv:2012.01249 [hep-ph]].
[9] M. Raissi, P. Perdikaris and G. E. Karniadakis, J. Comput. Phys. 378, 686–707 (2019). doi:10.1016/j.jcp.2018.10.045
[10] I. P. Lokhtin, L. V. Malinina, S. V. Petrushanko, A. M. Snigirev, I. Arsene and K. Tywoniuk, Comput. Phys. Commun. 180 (2009), 779-799 doi:10.1016/j.cpc.2008.11.015 [arXiv:0809.2708 [hep-ph]].
[11] I. P. Lokhtin, L. V. Malinina, S. V. Petrushanko and A. M. Snigirev, Phys. Atom. Nucl. 73 (2010), 2139–2147 doi:10.1134/S1063778810120203 [DOI]
[12] T. Sjostrand, S. Mrenna and P. Z. Skands, JHEP 05 (2006), 026 doi:10.1088/1126-6708/2006/05/026 [arXiv:hep-ph/0603175 [hep-ph]].
[13] N. S. Amelin, R. Lednicky, T. A. Pocheptsov, I. P. Lokhtin, L. V. Malinina, A. M. Snigirev, I. A. Karpenko and Y. M. Sinyukov, Phys. Rev. C 74 (2006), 064901 doi:10.1103/PhysRevC.74.064901 [arXiv:nucl-th/0608057 [nucl-th]].
[14] N. S. Amelin, R. Lednicky, I. P. Lokhtin, L. V. Malinina, A. M. Snigirev, I. A. Karpenko, Y. M. Sinyukov, I. Arsene and L. Bravina, Phys. Rev. C 77 (2008), 014903 doi:10.1103/PhysRevC.77.014903 [arXiv:0711.0835 [hep-ph]].
[15] B. Andersson, Camb. Monogr. Part. Phys. Nucl. Phys. Cosmol. 7 (1997), 1-471 Cambridge University Press, 1998, ISBN 978-1-009-40129-6, 978-1-009-40125-8, 978-1-009-40128-9, 978-0-521-01734-3, 978-0-521-42094-5, 978-0-511-88149-7 doi:10.1017/9781009401296
[16] B. Andersson, S. Mohanty and F. Soderberg, Eur. Phys. J. C 21 (2001), 631-647 doi:10.1007/s100520100757 [arXiv:hep-ph/0106185 [hep-ph]].
[17] C. Bierlich, G. Gustafson, L. Lönnblad and H. Shah, JHEP 10 (2018), 134 doi:10.1007/JHEP10(2018)134 [arXiv:1806.10820 [hep-ph]].
[18] C. Bierlich, S. Chakraborty, N. Desai, L. Gellersen, I. Helenius, P. Ilten, L. Lönnblad, S. Mrenna, S. Prestel, C. T. Preuss et al., SciPost Phys. Codeb. 2022 (2022), 8 doi:10.21468/SciPostPhysCodeb.8 [DOI]
[19] I. Helenius and C. O. Rasmussen, Eur. Phys. J. C 79 (2019) no. 5, 413 doi:10.1140/epjc/s10052-019-6914-1 [DOI]
[20] T. Sjostrand and M. van Zijl, Phys. Rev. D 36 (1987), 2019 doi:10.1103/PhysRevD.36.2019
[21] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai and S. Chintala, arXiv:1912.01703 [cs.LG] [arXiv]
[22] S. G. K. Patro and K. K. Sahu, arXiv:1503.06462 [cs.OH] [arXiv]
[23] S. R. Dubey, S. K. Singh and B. B. Chaudhuri, [arXiv:2109.14545 [cs.LG]].
[24] H. j. Xu, H. Li, X. Wang, C. Shen and F. Wang, Phys. Lett. B 819 (2021), 136453 doi:10.1016/j.physletb.2021.136453 [arXiv:2103.05595 [nucl-th]].
[25] A. Botchkarev, doi:10.48550/arXiv.1809.03006

Physics-informed neural network (PINN) modeling of charged particle multiplicity using the two-component framework in heavy-ion collisions: A comparison with data-driven neural networks