Bulk-boundary decomposition of neural networks

Donghee Lee, Hye-Sung Lee, and Jaeok Yi. Department of Physics, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea

(November 2025)

Abstract

We present the bulk–boundary decomposition as a new framework for understanding the training dynamics of deep neural networks. Starting from the stochastic gradient descent formulation, we show that the Lagrangian can be reorganized into a data-independent bulk term and a data-dependent boundary term. The bulk captures the intrinsic dynamics set by network architecture and activation functions, while the boundary reflects stochastic interactions from training samples at the input and output layers. This decomposition exposes the local and homogeneous structure underlying deep networks. As a natural extension, we develop a field-theoretic formulation of neural dynamics based on this decomposition.

Introduction— Deep neural networks have achieved remarkable empirical success across diverse domains, yet the fundamental principles governing their learning dynamics remain unclear [1, 2, 3]. Unlike many natural physical systems, which are often isotropic, neural networks are engineered structures possessing an inherently directional and anisotropic organization [4, 5]. Furthermore, the training of a deep neural network is governed by nonlocal interactions, which stem primarily from the loss function [6]. The loss function couples parameters across all layers, obscuring any notion of locality along the network’s depth.¹ This nonlocality is one of the key obstacles that makes the study of neural networks difficult.

¹ Nonlocal correlations in neural networks have been studied in several papers [4, 6, 7, 8].

Several previous studies have attempted to interpret deep learning through physics-inspired approaches [3, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]. However, only a few of these techniques carry over, and then only partially, because powerful analytical frameworks such as the continuum limit rely on locality. Although several works have explored incorporating field-theoretic techniques into deep learning [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], the fundamental locality of the underlying degrees of freedom has not yet been examined in detail.

In this work, we introduce the bulk–boundary decomposition (BBD) as a new framework for analyzing neural-network training dynamics.² Starting from the stochastic gradient descent formulation, we show that the training Lagrangian can be reorganized into a data-independent bulk, governed by the architecture and activation functions, and a data-dependent boundary, encoding stochastic interactions from the training samples at the input and output layers. This decomposition makes the local and homogeneous structure of deep networks manifest, permitting the separate analysis of architectural and statistical contributions.

² We emphasize that our dynamical bulk–boundary decomposition, which separates the data and architectural sectors, is distinct from the well-known topological bulk–boundary correspondence studied in contexts such as AdS/CFT. The latter’s relation to neural networks has been explored in Refs. [44, 45, 46, 47].

Although the microscopic interactions are local, long-range order can emerge through the collective effect of training, characterizing the network’s trained state. Because field theory provides an effective language for describing long-range order, we construct a field-theoretic formulation of neural dynamics based on the BBD. Within this setting, the bulk exhibits locality, translational symmetry, and directional anisotropy along the depth dimension, while the boundary serves as the interface through which data influence the system.

The bulk–boundary framework thus offers a unified theoretical foundation for connecting optimization, generalization, and information propagation in deep learning with the organizing principles of field theory and statistical physics.

Fundamentals— A deep neural network consists of neurons $h^{(m+1)}_{i}$ connected by weights $W^{(m)}_{ij}$ and biases $b^{(m)}_{i}$. Here, $m=0,\dots,M-1$ indexes the layers (depth), while $i=1,\dots,N_{m+1}$ and $j=1,\dots,N_{m}$ label neurons in adjacent layers.³

³ In this work, we promote the pre-activations $z_{i}^{(m)}$, rather than $h_{i}^{(m)}$, to serve as the fundamental degrees of freedom. For brevity, we hereafter refer to $z_{i}^{(m)}$ simply as neurons, as the context makes this distinction clear.

Each neuron obeys the recursive relation

\begin{split}h^{(m+1)}_{i}&=\sigma(z^{(m+1)}_{i}),\\ z^{(m+1)}_{i}&=\sum_{j}W^{(m)}_{ij}h^{(m)}_{j}+b^{(m)}_{i},\end{split}

(1)

where $\sigma$ is a nonlinear activation function. At the input layer ($m=0$), $h^{(0)}_{i}=X_{i}$; at the output layer ($m=M$), $h^{(M)}_{i}=Z_{i}$.⁴

⁴ While a distinct activation $\sigma_{\text{post}}$ such as softmax is often used for post-processing in the final layer (i.e., $h_{i}^{(M)}=\sigma_{\text{post}}(z_{i}^{(M)})$), we use the identity function for simplicity, such that $h_{i}^{(M)}=z_{i}^{(M)}$.
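As a concrete illustration of Eq. (1), the following minimal sketch evaluates the recursion in plain Python. The function name `forward`, the choice $\sigma=\tanh$, and the toy weights are assumptions made for this example only; the identity is used on the final layer, as in the text.

```python
import math

def forward(weights, biases, x, sigma=math.tanh):
    """Return the pre-activations [z^(1), ..., z^(M)] of Eq. (1) for input x."""
    num_layers = len(weights)          # M
    h = list(x)                        # h^(0) = X
    zs = []
    for m, (W, b) in enumerate(zip(weights, biases)):
        # z^(m+1)_i = sum_j W^(m)_ij h^(m)_j + b^(m)_i
        z = [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i
             for row, b_i in zip(W, b)]
        zs.append(z)
        # h^(m+1) = sigma(z^(m+1)), except on the output layer where h^(M) = z^(M)
        h = z if m == num_layers - 1 else [sigma(v) for v in z]
    return zs

# Toy 2-2-1 network; every number here is illustrative.
weights = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, -1.0]]]
biases = [[0.0, 0.0], [0.5]]
zs = forward(weights, biases, [0.0, 0.0])
output = zs[-1]                        # Z = z^(M)
```

The function returns the pre-activations of every layer rather than only the output, since these are the quantities later promoted to degrees of freedom.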

Training optimizes the parameters $W$ and $b$ to minimize a loss function $\ell(Z,Y)$ that quantifies the mismatch between the network output $Z$ and the target $Y$ . Typical choices—summarized in Table 1—include mean-squared error, cross-entropy, and Kullback–Leibler divergence, many of which originate from principles in information theory and statistical physics. With learning rate $\eta$ , the stochastic gradient descent (SGD) update is

\Delta W=-\eta\partial_{W}\ell,\qquad\Delta b=-\eta\partial_{b}\ell,

(2)

where a randomly selected training pair $(X,Y)$ introduces stochasticity into each update. Thus, the parameters evolve as dynamical variables in a potential landscape determined by $\ell$ .
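The update rule of Eq. (2) can be sketched for the simplest possible case. The example below assumes a single linear layer $Z=WX+b$ with the MSE loss of Table 1, for which the gradients are available in closed form; the target rule $Y=2X+1$, the learning rate, and the name `sgd_step` are hypothetical choices for illustration.

```python
import random

def sgd_step(W, b, x, y, eta):
    """One update of Eq. (2) for a single linear layer Z = W X + b with MSE loss."""
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
         for row, b_i in zip(W, b)]
    # d ell / d Z_i for ell = sum_i (Z_i - Y_i)^2
    dz = [2.0 * (z_i - y_i) for z_i, y_i in zip(z, y)]
    # d ell / d W_ij = dz_i * X_j,  d ell / d b_i = dz_i
    W_new = [[w_ij - eta * dz_i * x_j for w_ij, x_j in zip(row, x)]
             for row, dz_i in zip(W, dz)]
    b_new = [b_i - eta * dz_i for b_i, dz_i in zip(b, dz)]
    return W_new, b_new

rng = random.Random(0)                 # fixed seed for reproducibility
W, b = [[0.0]], [0.0]
for _ in range(2000):
    x = [rng.uniform(-1.0, 1.0)]       # randomly selected training input X
    y = [2.0 * x[0] + 1.0]             # hypothetical target rule Y = 2 X + 1
    W, b = sgd_step(W, b, x, y, eta=0.05)
```

Each iteration draws a fresh pair $(X,Y)$, so the trajectory of $(W,b)$ is stochastic even though every individual step is deterministic given the sample.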

The discrete iteration can be approximated by a continuous-time limit,

\dot{W}=-\eta\partial_{W}\ell,\qquad\dot{b}=-\eta\partial_{b}\ell,

(3)

which represents the overdamped limit of a damped dynamical system [48, 49],

\ddot{W}+\gamma\dot{W}+\partial_{W}\ell=0,\qquad\ddot{b}+\gamma\dot{b}+\partial_{b}\ell=0,

(4)

where $\gamma=1/\eta$ is an effective damping coefficient. Such formulations link gradient descent to the equations of motion for dissipative physical systems. This system admits a Lagrangian formulation with action

\begin{split}S&=\int dt\,e^{\gamma t}\,L(t),\\ L(t)&=\frac{1}{2}\sum_{i,j,m}\Big(\dot{W}_{ij}^{(m)}\Big)^{2}+\frac{1}{2}\sum_{i,m}\Big(\dot{b}_{i}^{(m)}\Big)^{2}\\ &\hskip 113.81102pt-\ell\Big(Z(W,b,X),Y\Big).\end{split}

(5)

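Here, the loss acts as a potential energy, while the kinetic terms describe the temporal evolution of the parameters. The exponential factor $e^{\gamma t}$ accounts for dissipation: varying $S$ with respect to $W$ gives $\frac{d}{dt}\big(e^{\gamma t}\dot{W}\big)+e^{\gamma t}\partial_{W}\ell=0$, which reduces to Eq. (4) after dividing by $e^{\gamma t}$, and likewise for $b$. This underscores the analogy between optimization dynamics and dissipative physical systems.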
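The relation between Eqs. (3) and (4) can be checked numerically. The sketch below assumes a one-parameter toy loss $\ell(w)=kw^{2}/2$ and simple explicit time stepping; neither choice comes from the text. With $\gamma=1/\eta$ large compared with $2\sqrt{k}$, the damped trajectory should closely follow the gradient flow.

```python
import math

def grad_loss(w, k=1.0):
    """Gradient of the toy loss ell(w) = k * w**2 / 2 (an illustrative choice)."""
    return k * w

def gradient_flow(w0, eta, t_end, dt):
    """Integrate Eq. (3), w' = -eta * d ell/dw, with explicit Euler steps."""
    w = w0
    for _ in range(round(t_end / dt)):
        w -= dt * eta * grad_loss(w)
    return w

def damped_dynamics(w0, eta, t_end, dt):
    """Integrate Eq. (4), w'' + gamma w' + d ell/dw = 0, with gamma = 1/eta."""
    gamma = 1.0 / eta
    w, v = w0, 0.0                     # start at rest
    for _ in range(round(t_end / dt)):
        v += dt * (-gamma * v - grad_loss(w))
        w += dt * v
    return w

eta = 0.1                              # gamma = 10, so gamma**2 >> 4k (overdamped)
w_flow = gradient_flow(1.0, eta, t_end=10.0, dt=1e-3)
w_damped = damped_dynamics(1.0, eta, t_end=10.0, dt=1e-3)
```

For this quadratic loss the gradient flow has the exact solution $w(t)=w_{0}e^{-\eta kt}$, which gives a reference value against which both integrations can be compared.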

Table 1: Representative loss functions $\ell(Z,Y)$ used in various learning tasks [50, 51]. Here, $Z_{i}$ and $Y_{i}$ denote the model outputs and the corresponding labels in supervised learning, while $p_{i}$ and $q_{i}$ represent the true and predicted class probabilities in probabilistic settings, including unsupervised learning.

Loss Function	Expression
Mean-Squared Error (MSE)	$\displaystyle\ell=\sum_{i}(Z_{i}-Y_{i})^{2}$
Cross-Entropy	$\displaystyle\ell=-\sum_{i}p_{i}\log q_{i}$
Kullback–Leibler Divergence	$\displaystyle\ell=\sum_{i}p_{i}\log({p_{i}}/{q_{i}})$
Hinge Loss	$\displaystyle\ell=\max(0,1-ZY)$
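The expressions in Table 1 translate directly into code. The following sketch uses plain Python lists; the function names are chosen here for illustration, and the hinge loss is written for a single real-valued score with a label in $\{-1,+1\}$, which is the usual convention and an assumption about the intended usage.

```python
import math

def mse(z, y):
    """Mean-squared error as written in Table 1 (sum of squared differences)."""
    return sum((z_i - y_i) ** 2 for z_i, y_i in zip(z, y))

def cross_entropy(p, q):
    """-sum_i p_i log q_i; terms with p_i = 0 contribute nothing."""
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)

def kl_divergence(p, q):
    """sum_i p_i log(p_i / q_i); terms with p_i = 0 contribute nothing."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

def hinge(z, y):
    """max(0, 1 - z*y) for a real-valued score z and a label y in {-1, +1}."""
    return max(0.0, 1.0 - z * y)

p = [0.5, 0.25, 0.25]                  # illustrative true distribution
q = [0.25, 0.5, 0.25]                  # illustrative predicted distribution
entropy_p = cross_entropy(p, p)        # Shannon entropy of p
```

The cross-entropy and the Kullback–Leibler divergence differ only by the entropy of $p$, which does not depend on the model, so both lead to the same gradients with respect to the network parameters.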

This Lagrangian formulation provides the foundation for the bulk–boundary decomposition introduced next, in which locality along the depth dimension and the separation between architecture-driven and data-driven dynamics become explicit.

Bulk–Boundary Decomposition— Equation (1) shows that each neuron is determined locally by those in the preceding layer, suggesting a form of locality along the network depth. Making this locality explicit requires reformulating the degrees of freedom. In the conventional formulation, the fundamental variables are the weights $W^{(m)}_{ij}$ and biases $b^{(m)}_{i}$ , but the loss function $\ell(Z,Y)$ becomes a nested, highly nonlocal function when expressed in these variables. Consequently, a direct expansion of the Lagrangian in Eq. (5) produces terms that couple parameters across all layers, thereby obscuring the local structure of the information flow, as illustrated in Fig. 1(a).

To make this locality explicit, it is necessary to reconsider the role of the neurons. Although locality was previously discussed on the basis of the relationship between adjacent neurons, neurons themselves were not treated as degrees of freedom of the system. If these neurons are promoted to the status of degrees of freedom, properties such as locality can be analyzed more directly. Here, we aim to develop such a framework by leveraging the change of variables and the SGD process.

In the SGD process, an input vector $X$ is provided at a given time $t$ , which in turn determines neurons $z_{i}^{(m)}$ throughout the network. To endow these neurons $z_{i}^{(m)}$ with dynamics, their evolution must be inherited from the original dynamical components, $W_{ij}^{(m)}$ and $b_{i}^{(m)}$ . We implement this by performing a change of variables, promoting the $z_{i}^{(m)}$ to be degrees of freedom while replacing the bias parameters $b_{i}^{(m)}$ . This substitution is defined by the network’s recursive relation, Eq. (1), which can be rewritten as

b_{i}^{(m)}=z_{i}^{(m+1)}-\sum_{j}W_{ij}^{(m)}\sigma(z_{j}^{(m)}).

(6)
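The change of variables in Eq. (6) is invertible for a fixed input: given $(W,b,X)$ one obtains the neurons $z$, and given $(W,z,X)$ one recovers the biases. A minimal round-trip sketch follows, assuming $\sigma=\tanh$ and a toy network; the helper names are hypothetical.

```python
import math

def matvec(W, h):
    """Return the vector with components sum_j W_ij h_j."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in W]

def preactivations(weights, biases, x, sigma=math.tanh):
    """Neurons z^(1..M) computed from (W, b, X) via Eq. (1)."""
    h, zs = list(x), []
    for W, b in zip(weights, biases):
        z = [s_i + b_i for s_i, b_i in zip(matvec(W, h), b)]
        zs.append(z)
        h = [sigma(v) for v in z]      # input to the next layer
    return zs

def biases_from_neurons(weights, zs, x, sigma=math.tanh):
    """Invert the map with Eq. (6): b^(m) = z^(m+1) - W^(m) h^(m), h^(0) = X."""
    h, bs = list(x), []
    for W, z_next in zip(weights, zs):
        bs.append([z_i - s_i for z_i, s_i in zip(z_next, matvec(W, h))])
        h = [sigma(v) for v in z_next]
    return bs

# Toy 2-2-1 network; all numbers are illustrative.
weights = [[[0.3, -0.7], [1.1, 0.2]], [[0.5, -0.4]]]
biases = [[0.1, -0.2], [0.05]]
x = [0.8, -1.5]
zs = preactivations(weights, biases, x)
recovered = biases_from_neurons(weights, zs, x)
```

Because the map between $b$ and $z$ is one-to-one at fixed $(W,X)$, no information is lost when the biases are traded for the neurons as dynamical variables.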

After this substitution, the loss function takes the simple form given in Table 1, because the network output $Z_{i}$ is identified with the dynamical variable $z_{i}^{(M)}$ rather than expressed as a nested function of all weights and biases. The full Lagrangian resulting from this substitution is given by

\begin{split}L&=L_{\text{bulk}}+L_{\text{boundary}},\\[3.0pt] L_{\text{bulk}}&=\frac{1}{2}\sum_{i,j,m}\!\left(\dot{W}_{ij}^{(m)}\right)^{2}+\frac{1}{2}\sum_{i,m}\!\left(\dot{z}_{i}^{(m)}\right)^{2}\\ &\qquad\quad-\sum_{i,j,m}\dot{z}_{i}^{(m+1)}\partial_{t}\!\Big(W_{ij}^{(m)}\sigma(z_{j}^{(m)})\Big)\\ &\qquad\quad+\frac{1}{2}\sum_{i,m}\Big[\sum_{j}\partial_{t}\!\Big(W_{ij}^{(m)}\sigma(z_{j}^{(m)})\Big)\Big]^{2},\\ L_{\text{boundary}}&=\frac{1}{2}\sum_{i}\Big[\sum_{j}\partial_{t}\!\Big(W_{ij}^{(0)}X_{j}\Big)\Big]^{2}\\ &\qquad\quad-\sum_{i,j}\dot{z}_{i}^{(1)}\partial_{t}\!\Big(W_{ij}^{(0)}X_{j}\Big)-\ell(Z,Y).\end{split}

(7)

In the basis of degrees of freedom $(W,z)$ , a natural separation emerges: the bulk degrees of freedom, which interact independently of data, and the boundary degrees of freedom, whose interactions are driven by data $(X,Y)$ . From this perspective, the Lagrangian can be decomposed into two parts, $L_{\text{bulk}}$ and $L_{\text{boundary}}$ . We refer to this as bulk–boundary decomposition. In this picture, the effects of training examples are confined to the input and output boundaries ( $m=0$ and $m=M$ ), while the internal architecture contributes only to the bulk dynamics.
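The origin of the individual terms can be traced with a short worked expansion. The sketch below uses only Eqs. (5) and (6); the shorthand $S_{i}^{(m)}$ is introduced here for illustration and does not appear elsewhere in the text.

```latex
% Shorthand (introduced here): signal arriving at neuron i of layer m+1
S_i^{(m)} \equiv \sum_j W_{ij}^{(m)} h_j^{(m)}, \qquad
h_j^{(0)} = X_j, \quad h_j^{(m)} = \sigma\big(z_j^{(m)}\big) \ \ (m \ge 1).

% Eq. (6) gives \dot{b}_i^{(m)} = \dot{z}_i^{(m+1)} - \dot{S}_i^{(m)},
% so the bias kinetic term of Eq. (5) expands as
\frac{1}{2}\big(\dot{b}_i^{(m)}\big)^{2}
  = \frac{1}{2}\big(\dot{z}_i^{(m+1)}\big)^{2}
  - \dot{z}_i^{(m+1)}\,\dot{S}_i^{(m)}
  + \frac{1}{2}\big(\dot{S}_i^{(m)}\big)^{2}.
```

The terms with $m\geq 1$ involve only $(W,z)$ and are collected in the bulk. The $m=0$ terms contain the input $X$ and, together with $-\ell(Z,Y)$, make up the boundary. The one-half prefactor of the squared term follows directly from expanding the square.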

Figure 1: (a) Lagrangian description with weights and biases of Eq. (5).

This separation enables a systematic analysis of the two sectors. The boundary part encodes stochasticity from data sampling, whereas the bulk part describes the deterministic evolution governed by the network architecture and activation functions. Although the reformulated Lagrangian appears algebraically more involved, it exposes the key physical structure: the kinetic terms now couple only adjacent layers $(m,m+1)$ , thereby making the locality along the depth direction explicit.

The BBD provides a natural framework for revealing the symmetry structure of deep neural networks. For architectures repeating the same layer structure, the bulk Lagrangian is invariant under $m\rightarrow m+1$ , analogous to discrete translational symmetry in lattice field theories, where the ‘depth’ index plays the role of an effective spatial coordinate. A similar consideration applies to symmetries along the width direction. Such symmetry principles provide a natural bridge between deep learning dynamics and the analytic tools of field theory, which will be developed in the following section.

Figure 1(b) schematically describes the decomposition and its local nature. Because its layout resembles typical neural network diagrams, existing intuitions about network structure can be readily applied within the BBD framework. The explicit separation of data-dependent and data-independent sectors constitutes the core theoretical innovation of this work. It provides the structural foundation upon which the subsequent field-theoretic formulation is built.

Field Description— The bulk–boundary decomposition reveals that the underlying dynamics are local along the depth direction, with interactions confined to adjacent layers. From this perspective, the network’s mapping of an input $X$ to an output $Z$ —a process that spans the entire network depth—may potentially be interpreted as an emergent long-range order phenomenon arising from these local interactions. To investigate this long-range order, we may adopt a field-theoretic approach, as is standard in physics for analyzing collective phenomena such as critical behavior or the emergence of magnetization in spin systems [52, 53, 54]. In this section, we provide a field-theoretic formulation that naturally emerges from the discrete Lagrangian derived in the previous section. Such an approach is expected to provide a promising framework for exploring the long-range order in deep neural networks.

Given that locality is a foundational property in many field theories, the BBD provides a promising framework, as it makes the locality along the depth direction explicit. However, this notion of locality is difficult to extend to the width direction, since there is no well-defined concept of spatial distance between neurons residing within the same layer. In the continuum limit of a fully connected architecture, each neuron $z_{i}^{(m)}$ can be represented by a field $z(\mathbf{x})$ , and Eq. (6) becomes

b(\mathbf{x})=z(\mathbf{x})-\int d\mathbf{x}^{\prime}\,W(\mathbf{x},\mathbf{x}^{\prime})\sigma[z(\mathbf{x}^{\prime})],

(8)

where the integral kernel $W(\mathbf{x},\mathbf{x}^{\prime})$ couples different width coordinates, thereby rendering the system nonlocal in $\mathbf{x}$ .

One approach to formulating a field theory for neural networks that is local along the width direction is to restrict the network architecture itself. This can be achieved by imposing a structure in which neurons interact only with nearby units in that dimension—a property inherent to convolutional neural networks. As a simple illustration, we consider the local architecture shown in Fig. 2. In this configuration, neurons connect only to neighboring ones through weights $W$ , and periodic boundary conditions are imposed along the width direction.
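A layer of this kind can be sketched in a few lines. The example assumes that each neuron receives exactly two connections, from the units at width positions $i$ and $i+1$ of the previous layer, with indices taken modulo the width; this connectivity pattern and the function names are assumptions for illustration and may differ from the architecture of Fig. 2.

```python
def local_layer(w_a, w_b, b, h):
    """Pre-activations of a layer where neuron i sees only inputs i and i+1 (mod N)."""
    n = len(h)
    return [w_a[i] * h[i] + w_b[i] * h[(i + 1) % n] + b[i] for i in range(n)]

def as_dense(w_a, w_b):
    """Equivalent fully connected weight matrix; all other entries are zero."""
    n = len(w_a)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] += w_a[i]
        W[i][(i + 1) % n] += w_b[i]    # wraps around: periodic boundary condition
    return W

n = 5
w_a = [0.5, -1.0, 0.25, 2.0, 1.5]      # illustrative weights
w_b = [1.0, 0.75, -0.5, 0.1, -2.0]
b = [0.0] * n
h = [1.0, 2.0, 3.0, 4.0, 5.0]

z_local = local_layer(w_a, w_b, b, h)
W_dense = as_dense(w_a, w_b)
z_dense = [sum(W_dense[i][j] * h[j] for j in range(n)) + b[i] for i in range(n)]
nonzeros_per_row = [sum(1 for v in row if v != 0.0) for row in W_dense]
```

The dense matrix is banded with a wrap-around entry in the last row, which is the discrete counterpart of a kernel $W(\mathbf{x},\mathbf{x}^{\prime})$ supported only near $\mathbf{x}=\mathbf{x}^{\prime}$.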

To construct the field theories for the bulk and boundary sectors, we treat the discrete indices as continuous coordinates, where $x$ denotes the depth and $y$ the width. We begin by deriving the field description for the boundary contributions, where the data-driven effects emerge. All the boundary components from Eq. (7) can be aligned along a one-dimensional line, such that the boundary field theory is formulated in one dimension. From the network connectivity, we define the spatial coordinate $y$ of $z$, $w$, $X$, and $Y$ according to their connecting relations. Based on the distances between these coordinates, the discrete sums can be naturally expanded in powers of the lattice spacing $a_{y}$, yielding a field-theoretic action.⁵

⁵ The complete Lagrangian depends on a specific choice of these coordinates, which we do not detail here; we present only a few representative terms of the full expression for illustration.

\begin{split}L_{\text{input}}&=\int dy\,\Big[2\hat{X}^{2}\dot{\hat{w}}^{2}+2\dot{\hat{X}}^{2}\hat{w}^{2}+4\dot{\hat{w}}\dot{\hat{X}}\hat{w}\hat{X}\\ &\qquad\qquad\qquad\qquad\quad+2a_{y}^{2}\dot{\hat{X}}^{2}\hat{w}\partial_{y}^{2}\hat{w}+\cdots\Big]\\ L_{\text{output}}&=-\int dy\,\ell(\hat{z},\hat{Y}).\end{split}

(9)

Here, $\hat{w}(y,t)$, $\hat{z}(y,t)$, $\hat{X}(y,t)$, and $\hat{Y}(y,t)$ are coarse-grained fields corresponding to the synaptic, neuronal, input, and target variables, respectively. Equation (9) makes the dependence on $X$ and $Y$ explicit. Because the training samples are stochastic, a statistical physics framework is a natural candidate for incorporating this effect, and this extension represents a promising direction for future work.
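The leading coefficients of $L_{\text{input}}$ can be checked with a short calculation. The sketch below is one reading that reproduces the first three terms of Eq. (9). It assumes that each first-layer neuron receives exactly two connections and that, at zeroth order in $a_{y}$, both carry the same coarse-grained values $\hat{w}$ and $\hat{X}$. These assumptions are not stated in the text, and the actual coordinate assignment may differ.

```latex
% Assumption (not from the text): two connections per neuron, so at leading order
\sum_j W_{ij}^{(0)} X_j \;\longrightarrow\; 2\,\hat{w}\,\hat{X} + \cdots

% The m = 0 bias kinetic term of Eq. (5), with Eq. (6) inserted, then contains
\frac{1}{2}\Big[\partial_t\big(2\hat{w}\hat{X}\big)\Big]^{2}
  = 2\big(\dot{\hat{w}}\hat{X} + \hat{w}\dot{\hat{X}}\big)^{2}
  = 2\hat{X}^{2}\dot{\hat{w}}^{2} + 2\dot{\hat{X}}^{2}\hat{w}^{2}
    + 4\,\dot{\hat{w}}\,\dot{\hat{X}}\,\hat{w}\,\hat{X}.
```

Under these assumptions, the terms proportional to $a_{y}^{2}$ would arise from the next order of the expansion, where the two connected units sit at slightly different width coordinates.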

The bulk field theory is constructed in the same way, by expanding the discrete sums of $L_{\text{bulk}}$ in powers of the lattice spacings $a_{x}$ and $a_{y}$:

\begin{split}L_{\text{bulk}}=\int dxdy\Big[\dot{\hat{w}}^{2}+&\frac{1}{2}\dot{\hat{z}}^{2}+2a_{x}\partial_{x}\partial_{t}(\hat{\sigma}\hat{w})\dot{\hat{z}}\\ &+\frac{2}{3}a_{x}^{2}[\partial_{t}(\hat{\sigma}\partial_{x}\hat{w})]^{2}\\ &+\frac{16}{3}a_{y}^{2}[\partial_{t}(\hat{\sigma}\partial_{y}\hat{w})]^{2}+\cdots\Big],\end{split}

(10)

where $\hat{\sigma}=\sigma(\hat{z})$ . This Lagrangian contains all the information about how the signal from the input boundary is transmitted to the output boundary. Exploring its implications will be an interesting direction for future work.

Figure 2: Illustration of the local neural network architecture considered in the example. Each neuron interacts only with nearby neurons through weights $W$, ensuring locality along the width direction. Periodic boundary conditions are imposed along the width direction. The depth direction corresponds to layer index $m$, along which locality and translation symmetry emerge through the bulk–boundary decomposition.

Discussion and Outlook— The bulk-boundary decomposition provides a new perspective on deep learning by separating architectural dynamics from data-driven stochasticity. It reveals that, despite being engineered systems, neural networks possess intrinsic locality and translational symmetry structures that can be analyzed through physical principles. Since locality and homogeneity are foundational assumptions in physics, many standard analytical techniques rely on these properties. Consequently, few of these methods are directly applicable to systems lacking such symmetries. In this context, the BBD offers a valuable framework for applying physics-based analyses to deep neural networks.

The locality revealed by the BBD suggests that long-range order may emerge during the training dynamics. This phenomenon can be investigated in various ways, such as via a field-theoretic approach. Considering that the goal of training is to make a deep neural network accurately approximate a target function, this emergent order may be correlated with the successfully trained state of the system.

Future work can extend this approach in several directions. The statistical-mechanical description of boundary stochasticity may clarify how generalization arises from effective thermal ensembles, while symmetry-breaking analyses could connect network anisotropy to dynamical phase transitions. The BBD framework thus opens a route toward a unified theoretical understanding of learning, bridging the dynamics of artificial networks with the organizing principles of condensed matter and field theory.

Acknowledgments— This work was supported in part by the National Research Foundation of Korea (Grant No. RS-2024-00352537).

References

[1] F.-L. Fan, J. Xiong, M. Li, and G. Wang, On interpretability of artificial neural networks: A survey, IEEE Trans. Radiat. Plasma Med. Sci. 5, 741 (2021).
[2] X. Li, H. Xiong, X. Li, X. Wu, X. Zhang, J. Liu, J. Bian, and D. Dou, Interpretable deep learning: Interpretation, interpretability, trustworthiness, and beyond, Knowl. Inf. Syst. 64, 3197 (2022).
[3] Y. Bahri, J. Kadmon, J. Pennington, S. S. Schoenholz, J. Sohl-Dickstein, and S. Ganguli, Statistical mechanics of deep learning, Annu. Rev. of Condens. Matter Phys. 11, 501 (2020).
[4] S. S. Schoenholz, J. Gilmer, S. Ganguli, and J. Sohl-Dickstein, Deep information propagation, arXiv:1611.01232.
[5] G. A. D’Inverno, Z. Hu, L. Davy, M. Unser, G. Rozza, and J. Dong, Revisiting deep information propagation: Fractal frontier and finite-size effects, arXiv:2508.03222.
[6] D. Lee, H.-S. Lee, and J. Yi, Dynamic neuron approach to deep neural networks: Decoupling neurons for renormalization group analysis, Phys. Rev. Res. 7, 023276 (2025), arXiv:2410.00396 [cond-mat.stat-mech].
[7] H. You, Y. Yu, M. D’Elia, T. Gao, and S. Silling, Nonlocal kernel network (nkn): A stable and resolution-independent deep neural network, Journal of Computational Physics 469, 111536 (2022).
[8] S. Lanthaler, Z. Li, and A. M. Stuart, Nonlocality and nonlinearity implies universality in operator learning, arXiv:2304.13221.
[9] P. Carnevali and S. Patarnello, Exhaustive thermodynamical analysis of boolean learning networks, Europhys. Lett. 4, 1199 (1987).
[10] N. Tishby, E. Levin, and S. A. Solla, Consistent inference of probabilities in layered networks: Predictions and generalizations, in Proceedings IEEE International Conference on Neural Networks (IEEE, Piscataway, NJ, 1989) pp. 403–409.
[11] H. Sompolinsky, N. Tishby, and H. S. Seung, Learning from examples in large neural networks, Phys. Rev. Lett. 65, 1683 (1990).
[12] E. Levin, N. Tishby, and S. A. Solla, A statistical approach to learning and generalization in layered neural networks, Proc. IEEE 78, 1568 (1990).
[13] H. S. Seung, H. Sompolinsky, and N. Tishby, Statistical mechanics of learning from examples, Phys. Rev. A 45, 6056 (1992).
[14] A. Engel and C. Van den Broeck, Statistical mechanics of learning (Cambridge University Press, 2001).
[15] M. Mézard and A. Montanari, Information, physics, and computation (Oxford University Press, 2009).
[16] G. Györgyi, First-order transition to perfect generalization in a neural network with binary synapses, Phys. Rev. A 41, 7097 (1990).
[17] R. Koebarle and W. K. Theumann, Neural Networks and Spin Glasses—Proceedings of the Statphys 17 Workshop (World Scientific, Singapore, 1990).
[18] M. Advani, S. Lahiri, and S. Ganguli, Statistical mechanics of complex neural systems and high dimensional data, J. Stat. Mech., P03014 (2013).
[19] P. Mehta and D. J. Schwab, Exact mapping between the variational renormalization group and deep learning, arXiv:1410.3831.
[20] L. Zdeborová and F. Krzakala, Statistical physics of inference: thresholds and algorithms, Advances in Physics 65, 453 (2016), https://doi.org/10.1080/00018732.2016.1211393.
[21] D. Ghio, Y. Dandi, F. Krzakala, and L. Zdeborová, Sampling with flows, diffusion, and autoregressive neural networks from a spin-glass perspective, Proceedings of the National Academy of Sciences 121, e2311810121 (2024), https://www.pnas.org/doi/pdf/10.1073/pnas.2311810121.
[22] J. Lee, Y. Bahri, R. Novak, S. S. Schoenholz, J. Pennington, and J. Sohl-Dickstein, Deep neural networks as gaussian processes, arXiv:1711.00165 [stat.ML].
[23] S. S. Schoenholz, J. Pennington, and J. Sohl-Dickstein, A correspondence between random neural networks and statistical field theory, arXiv:1710.06570 [stat.ML].
[24] S. Sonoda and N. Murata, Transport analysis of infinitely deep neural network, arXiv:1605.02832 [cs.LG].
[25] E. Dyer and G. Gur-Ari, Asymptotics of Wide Networks from Feynman Diagrams, arXiv:1909.11304 [cs.LG].
[26] Y. Sho, Non-gaussian processes and neural networks at finite widths, arXiv:1910.00019 [stat.ML].
[27] J.-W. Lee, Quantum fields as deep learning, J. Korean Phys. Soc. 76, 684 (2020), arXiv:1708.07408 [physics.gen-ph].
[28] M. Helias and D. Dahmen, Statistical Field Theory for Neural Networks, Vol. 970 (2020).
[29] H. Erbin, V. Lahoche, and D. O. Samary, Non-perturbative renormalization for the neural network-QFT correspondence, Mach. Learn. Sci. Tech. 3, 015027 (2022), arXiv:2108.01403 [hep-th].
[30] H. Erbin, V. Lahoche, and D. O. Samary, Renormalization in the neural network-quantum field theory correspondence (2022), arXiv:2212.11811 [hep-th].
[31] J. Halverson, Building Quantum Field Theories Out of Neurons, arXiv:2112.04527 [hep-th].
[32] J. Halverson, A. Maiti, and K. Stoner, Neural Networks and Quantum Field Theory, Mach. Learn. Sci. Tech. 2, 035002 (2021), arXiv:2008.08601 [cs.LG].
[33] R. Bondesan and M. Welling, The hintons in your neural network: a quantum field theory view of deep learning, arXiv:2103.04913 [quant-ph].
[34] K. Segadlo, B. Epping, A. van Meegen, D. Dahmen, M. Krämer, and M. Helias, Unified field theoretical approach to deep and recurrent neuronal networks, J. Stat. Mech. 2022, 103401 (2022).
[35] S. Krippendorf and M. Spannowsky, A duality connecting neural network and cosmological dynamics, Mach. Learn. Sci. Tech. 3, 035011 (2022), arXiv:2202.11104 [gr-qc].
[36] O. Cohen, O. Malka, and Z. Ringel, Learning curves for overparametrized deep neural networks: A field theory perspective, Phys. Rev. Res. 3, 023034 (2021).
[37] A. Maiti, K. Stoner, and J. Halverson, Symmetry-via-Duality: Invariant Neural Network Densities from Parameter-Space Correlators, arXiv:2106.00694 [cs.LG].
[38] J. Halverson, J. Naskar, and J. Tian, Conformal Fields from Neural Networks, arXiv:2409.12222 [hep-th].
[39] M. Demirtas, J. Halverson, A. Maiti, M. D. Schwartz, and K. Stoner, Neural network field theories: non-gaussianity, actions, and locality, Mach. Learn. Sci. Technol. 5, 015002 (2024).
[40] V. Vanchurin, Emergent field theories from neural networks, arXiv:2411.08138 [hep-th].
[41] J. N. Howard, M. S. Klinger, A. Maiti, and A. G. Stapleton, Bayesian RG flow in neural network field theories, SciPost Phys. Core 8, 027 (2025), arXiv:2405.17538 [hep-th].
[42] Z. Ringel, N. Rubin, E. Mor, M. Helias, and I. Seroussi, Applications of statistical field theory in deep learning, arXiv:2502.18553 [stat.ML].
[43] D. Lee, H.-S. Lee, and J. Yi, Synaptic field theory for neural networks, Phys. Rev. D 112, L031902 (2025).
[44] W.-C. Gan and F.-W. Shu, Holography as deep learning, Int. J. Mod. Phys. D 26, 1743020 (2017), arXiv:1705.05750 [gr-qc].
[45] K. Hashimoto, S. Sugishita, A. Tanaka, and A. Tomiya, Deep learning and the AdS/CFT correspondence, Phys. Rev. D 98, 046019 (2018), arXiv:1802.08313 [hep-th].
[46] K. Hashimoto, S. Sugishita, A. Tanaka, and A. Tomiya, Deep Learning and Holographic QCD, Phys. Rev. D 98, 106014 (2018), arXiv:1809.10536 [hep-th].
[47] K. Hashimoto, AdS/CFT correspondence as a deep Boltzmann machine, Phys. Rev. D 99, 106017 (2019), arXiv:1903.04951 [hep-th].
[48] G. Parisi, Statistical Field Theory, Frontiers in Physics (Addison-Wesley, Redwood City, CA, 1988).
[49] T. L. H. Watkin, A. Rau, and M. Biehl, The statistical mechanics of learning a rule, Rev. Mod. Phys. 65, 499 (1993).
[50] S. Jadon, A survey of loss functions for semantic segmentation, in 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) (IEEE, 2020) pp. 1–7.
[51] L. Ciampiconi, A. Elwood, M. Leonardi, A. Mohamed, and A. Rozza, A survey and taxonomy of loss functions in machine learning, arXiv:2301.05579.
[52] V. L. Ginzburg and L. D. Landau, On the theory of superconductivity, Zh. Eksp. Teor. Fiz. 20, 1064 (1950).
[53] K. G. Wilson and J. Kogut, The renormalization group and the $\epsilon$ expansion, Physics Reports 12, 75 (1974).
[54] M. Aizenman and R. Fernández, On the critical behavior of the magnetization in high-dimensional ising models, Journal of Statistical Physics 44, 393 (1986).