Bulk-boundary decomposition of neural networks

Donghee Lee, Hye-Sung Lee, Jaeok Yi, Department of Physics, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
(November 2025)
Abstract

We present the bulk–boundary decomposition as a new framework for understanding the training dynamics of deep neural networks. Starting from the stochastic gradient descent formulation, we show that the Lagrangian can be reorganized into a data-independent bulk term and a data-dependent boundary term. The bulk captures the intrinsic dynamics set by network architecture and activation functions, while the boundary reflects stochastic interactions from training samples at the input and output layers. This decomposition exposes the local and homogeneous structure underlying deep networks. As a natural extension, we develop a field-theoretic formulation of neural dynamics based on this decomposition.

Introduction— Deep neural networks have achieved remarkable empirical success across diverse domains, yet the fundamental principles governing their learning dynamics remain unclear [1, 2, 3]. Unlike many natural physical systems, which are often isotropic, neural networks are engineered structures with an inherently directional, anisotropic organization [4, 5]. Furthermore, the training of a deep neural network is governed by nonlocal interactions, which stem primarily from the loss function [6]: the loss couples parameters across all layers, obscuring any notion of locality along the network's depth. (Footnote 1: Nonlocal correlations in neural networks have been studied in several papers [4, 7, 8, 6].) This nonlocality is one of the key obstacles that make neural networks difficult to study.

Several previous studies have attempted to interpret deep learning through physics-inspired approaches [3, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]. However, only a few of these techniques apply, and then only partially, because locality underlies powerful analytical frameworks such as the continuum limit. Although several works have explored incorporating field-theoretic techniques into deep learning [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], the locality of the underlying degrees of freedom has not yet been examined in detail.

In this work, we introduce the bulk–boundary decomposition (BBD) as a new framework for analyzing neural-network training dynamics. (Footnote 2: We emphasize that our dynamical bulk–boundary decomposition, which separates the data and architectural sectors, is distinct from the well-known holographic bulk–boundary correspondence studied in contexts such as AdS/CFT. The relation of the latter to neural networks has been explored in Refs. [45, 46, 47, 44].) Starting from the stochastic gradient descent formulation, we show that the training Lagrangian can be reorganized into a data-independent bulk, governed by the architecture and activation functions, and a data-dependent boundary, encoding stochastic interactions from the training samples at the input and output layers. This decomposition makes the local and homogeneous structure of deep networks manifest, permitting separate analysis of the architectural and statistical contributions.

Although the microscopic interactions are local, long-range order can emerge through the collective effect of training, characterizing the network’s trained state. Because field theory provides an effective language for describing long-range order, we construct a field-theoretic formulation of neural dynamics based on the BBD. Within this setting, the bulk exhibits locality, translational symmetry, and directional anisotropy along the depth dimension, while the boundary serves as the interface through which data influence the system.

The bulk–boundary framework thus offers a unified theoretical foundation for connecting optimization, generalization, and information propagation in deep learning with the organizing principles of field theory and statistical physics.

Fundamentals— A deep neural network consists of neurons $h^{(m+1)}_{i}$ connected by weights $W^{(m)}_{ij}$ and biases $b^{(m)}_{i}$. Here, $m=0,\dots,M-1$ indexes the layers (depth), while $i=1,\dots,N_{m+1}$ and $j=1,\dots,N_{m}$ label neurons in adjacent layers. Each neuron (Footnote 3: In this work, we promote the pre-activations $z^{(m)}_{i}$, rather than $h^{(m)}_{i}$, to serve as the fundamental degrees of freedom. For brevity, we hereafter refer to $z^{(m)}_{i}$ simply as neurons, as the context makes this distinction clear.) obeys the recursive relation

$$
\begin{split}
h^{(m+1)}_{i} &= \sigma\big(z^{(m+1)}_{i}\big),\\
z^{(m+1)}_{i} &= \sum_{j} W^{(m)}_{ij}\, h^{(m)}_{j} + b^{(m)}_{i},
\end{split}\tag{1}
$$

where $\sigma$ is a nonlinear activation function. At the input layer ($m=0$), $h^{(0)}_{i}=X_{i}$; at the output layer ($m=M$), $h^{(M)}_{i}=Z_{i}$. (Footnote 4: While a distinct activation $\sigma_{\text{post}}$ such as softmax is often used for post-processing in the final layer, i.e., $h^{(M)}_{i}=\sigma_{\text{post}}(z^{(M)}_{i})$, we use the identity function for simplicity, so that $h^{(M)}_{i}=z^{(M)}_{i}$.)
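As a minimal sketch of Eq. (1), the pure-Python forward pass below returns the pre-activations $z^{(1)},\dots,z^{(M)}$, using $\sigma=\tanh$ in the hidden layers and the identity at the output layer. The choice of $\tanh$, the nested-list layout `weights[m][i][j]` $=W^{(m)}_{ij}$, and the helper `random_params` are illustrative assumptions, not part of the formulation above.

```python
import math
import random


def random_params(widths, seed=0):
    """Hypothetical initializer: widths = [N_0, ..., N_M]; returns W^(m), b^(m)."""
    rng = random.Random(seed)
    weights = [[[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                for _ in range(n_out)]
               for n_in, n_out in zip(widths[:-1], widths[1:])]
    biases = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for n_out in widths[1:]]
    return weights, biases


def forward(X, weights, biases, sigma=math.tanh):
    """Eq. (1): z^(m+1)_i = sum_j W^(m)_ij h^(m)_j + b^(m)_i, with h^(0) = X.

    Returns [z^(1), ..., z^(M)]; the last entry is the output Z because the
    final layer uses the identity (h^(M) = z^(M))."""
    M = len(weights)
    h = list(X)  # h^(0) = X
    zs = []
    for m in range(M):
        z = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(weights[m], biases[m])]
        zs.append(z)
        # sigma on hidden layers, identity on the output layer
        h = z if m == M - 1 else [sigma(v) for v in z]
    return zs


if __name__ == "__main__":
    W, b = random_params([3, 4, 4, 2])
    zs = forward([0.5, -1.0, 2.0], W, b)
    print("output Z =", zs[-1])
```

Keeping every $z^{(m)}$ rather than only the output anticipates the change of variables below, in which these pre-activations become dynamical degrees of freedom.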

Training optimizes the parameters $W$ and $b$ to minimize a loss function $\ell(Z,Y)$ that quantifies the mismatch between the network output $Z$ and the target $Y$. Typical choices, summarized in Table 1, include mean-squared error, cross-entropy, and Kullback–Leibler divergence, many of which originate from principles in information theory and statistical physics. With learning rate $\eta$, the stochastic gradient descent (SGD) update is

$$
\Delta W = -\eta\,\partial_{W}\ell,\qquad \Delta b = -\eta\,\partial_{b}\ell,\tag{2}
$$

where a randomly selected training pair $(X,Y)$ introduces stochasticity into each update. Thus, the parameters evolve as dynamical variables in a potential landscape determined by $\ell$.
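A minimal sketch of the update in Eq. (2), restricted for readability to a single layer ($M=1$ with identity output, so $Z = WX + b$) and the MSE loss of Table 1. For $\ell=\sum_i (Z_i-Y_i)^2$ the gradients are $\partial\ell/\partial W_{ij} = 2(Z_i-Y_i)X_j$ and $\partial\ell/\partial b_i = 2(Z_i-Y_i)$; deeper networks would obtain them by backpropagation. The helper names and the toy data are hypothetical.

```python
import random


def sgd_step(W, b, X, Y, eta):
    """One SGD update, Eq. (2), for Z = W X + b and l = sum_i (Z_i - Y_i)^2."""
    Z = [sum(w * x for w, x in zip(row, X)) + bi for row, bi in zip(W, b)]
    g = [2.0 * (z - y) for z, y in zip(Z, Y)]            # dl/dZ_i = dl/db_i
    W_new = [[w - eta * gi * x for w, x in zip(row, X)]  # dl/dW_ij = g_i X_j
             for row, gi in zip(W, g)]
    b_new = [bi - eta * gi for bi, gi in zip(b, g)]
    return W_new, b_new


def train(data, eta=0.05, steps=2000, seed=0):
    """Repeated SGD; each step draws one random pair (X, Y) from `data`."""
    rng = random.Random(seed)
    n_in, n_out = len(data[0][0]), len(data[0][1])
    W = [[0.0] * n_in for _ in range(n_out)]
    b = [0.0] * n_out
    for _ in range(steps):
        X, Y = rng.choice(data)  # stochasticity enters only through this draw
        W, b = sgd_step(W, b, X, Y, eta)
    return W, b


if __name__ == "__main__":
    # Hypothetical noise-free data generated from Y = 2X - 1.
    data = [([x], [2.0 * x - 1.0]) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    print(train(data))  # approaches W = [[2.0]], b = [-1.0]
```

Because the toy data are noise-free, every sampled pair pulls the parameters toward the same solution; with noisy labels the sampling in `train` would leave the parameters fluctuating around it.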

The discrete iteration can be approximated by a continuous-time limit,

$$
\dot W = -\eta\,\partial_{W}\ell,\qquad \dot b = -\eta\,\partial_{b}\ell,\tag{3}
$$

which represents the overdamped limit of a damped dynamical system [48, 49],

$$
\ddot W + \gamma\dot W + \partial_{W}\ell = 0,\qquad \ddot b + \gamma\dot b + \partial_{b}\ell = 0,\tag{4}
$$

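where $\gamma=1/\eta$ is an effective damping coefficient. Neglecting the inertial terms $\ddot W$ and $\ddot b$ relative to the damping terms, which is justified for large $\gamma$ (small $\eta$), reduces Eq. (4) to Eq. (3). Such formulations link gradient descent to the equations of motion of dissipative physical systems. This system admits a Lagrangian formulation with action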

$$
\begin{split}
S &= \int dt\, e^{\gamma t}\, L(t),\\
L(t) &= \frac{1}{2}\sum_{i,j,m}\big(\dot W^{(m)}_{ij}\big)^{2} + \frac{1}{2}\sum_{i,m}\big(\dot b^{(m)}_{i}\big)^{2} - \ell\big(Z(W,b,X),Y\big).
\end{split}\tag{5}
$$

Here, the loss acts as a potential energy, while the kinetic terms describe the temporal evolution of the parameters. The exponential factor $e^{\gamma t}$ accounts for dissipation and underscores the analogy between optimization dynamics and dissipative physical systems.
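The Euler–Lagrange equations of Eq. (5) read $\frac{d}{dt}\big(e^{\gamma t}\dot W\big) + e^{\gamma t}\partial_W\ell = 0$, which after dividing by $e^{\gamma t}$ is exactly Eq. (4), and likewise for $b$. The sketch below illustrates the overdamped limit numerically for a hypothetical one-parameter quadratic loss $\ell = kW^2/2$: integrating Eq. (4) with $\gamma=1/\eta$ and small $\eta$ tracks the gradient flow of Eq. (3). The loss, the semi-implicit Euler integrator, and the step size are assumptions chosen for illustration.

```python
import math


def damped_flow(k, w0, eta, T, dt=1e-3):
    """Integrate Eq. (4), W'' + gamma W' + dl/dW = 0, with gamma = 1/eta.

    Assumes the toy loss l = k W^2 / 2, so dl/dW = k W. Uses semi-implicit
    (symplectic) Euler, which needs roughly dt < 2 / gamma to stay stable."""
    gamma = 1.0 / eta
    w, v = w0, -eta * k * w0  # start with the gradient-flow velocity
    for _ in range(int(round(T / dt))):
        v += dt * (-gamma * v - k * w)
        w += dt * v
    return w


def gradient_flow(k, w0, eta, T):
    """Exact solution of Eq. (3) for the same loss: W(T) = w0 exp(-eta k T)."""
    return w0 * math.exp(-eta * k * T)


if __name__ == "__main__":
    for eta in (0.01, 0.5):
        T = 1.0 / eta  # same amount of gradient-flow "progress" in both cases
        print(eta, damped_flow(1.0, 1.0, eta, T), gradient_flow(1.0, 1.0, eta, T))
```

For $\eta=0.01$ the two trajectories agree closely. For $\eta=0.5$ the damping is too weak to suppress the inertial term: Eq. (4) is then critically damped and gives about $0.27$ at $T=2$, versus $e^{-1}\approx 0.37$ from Eq. (3).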

Table 1: Representative loss functions $\ell(Z,Y)$ used in various learning tasks [50, 51]. Here, $Z_{i}$ and $Y_{i}$ denote the model outputs and the corresponding labels in supervised learning, while $p_{i}$ and $q_{i}$ represent the true and predicted class probabilities in probabilistic settings, including unsupervised learning. For the hinge loss, $Z$ is a scalar score and $Y\in\{-1,+1\}$ a binary label.
Loss function | Expression
Mean-squared error (MSE) | $\ell=\sum_{i}(Z_{i}-Y_{i})^{2}$
Cross-entropy | $\ell=-\sum_{i}p_{i}\log q_{i}$
Kullback–Leibler divergence | $\ell=\sum_{i}p_{i}\log(p_{i}/q_{i})$
Hinge loss | $\ell=\max(0,\,1-ZY)$
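The entries of Table 1 translate directly into code. In this plain-Python sketch, the convention that terms with $p_i=0$ are dropped ($0\log 0 := 0$) and the scalar-score convention for the hinge loss are standard choices assumed here rather than taken from the table.

```python
import math


def mse(Z, Y):
    """Mean-squared error, l = sum_i (Z_i - Y_i)^2 (no 1/N factor, as in Table 1)."""
    return sum((z - y) ** 2 for z, y in zip(Z, Y))


def cross_entropy(p, q):
    """l = -sum_i p_i log q_i; terms with p_i = 0 are dropped (0 log 0 := 0)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """l = sum_i p_i log(p_i / q_i), with the same 0 log 0 := 0 convention."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hinge(Z, Y):
    """l = max(0, 1 - Z Y) for a scalar score Z and a label Y in {-1, +1}."""
    return max(0.0, 1.0 - Z * Y)


if __name__ == "__main__":
    p, q = [0.5, 0.25, 0.25], [0.4, 0.4, 0.2]
    # Cross-entropy = entropy of p + KL(p || q), directly from the definitions.
    print(cross_entropy(p, q), cross_entropy(p, p) + kl_divergence(p, q))
```

The final check shows why cross-entropy and KL divergence yield the same gradients with respect to $q$: they differ only by the entropy of $p$, which does not depend on the network.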

This Lagrangian formulation provides the foundation for the bulk–boundary decomposition introduced next, in which locality along the depth dimension and the separation between architecture-driven and data-driven dynamics become explicit.

Bulk–Boundary Decomposition— Equation (1) shows that each neuron is determined locally by those in the preceding layer, suggesting a form of locality along the network depth. Making this locality explicit requires reformulating the degrees of freedom. In the conventional formulation, the fundamental variables are the weights $W^{(m)}_{ij}$ and biases $b^{(m)}_{i}$, but the loss function $\ell(Z,Y)$ becomes a nested, highly nonlocal function when expressed in these variables. Consequently, a direct expansion of the Lagrangian in Eq. (5) produces terms that couple parameters across all layers, thereby obscuring the local structure of the information flow, as illustrated in Fig. 1(a).

To make this locality explicit, we reconsider the role of the neurons. Although the locality above follows from the relation between adjacent neurons, the neurons themselves are not degrees of freedom in Eq. (5); they are derived quantities. Promoting them to degrees of freedom allows properties such as locality to be analyzed directly. We construct such a framework through a change of variables within the SGD dynamics.

In the SGD process, an input vector $X$ is provided at a given time $t$, which in turn determines the neurons $z^{(m)}_{i}$ throughout the network. To endow these neurons with dynamics, their evolution must be inherited from the original dynamical variables, $W^{(m)}_{ij}$ and $b^{(m)}_{i}$. We implement this by a change of variables that promotes the $z^{(m)}_{i}$ to degrees of freedom in place of the biases $b^{(m)}_{i}$. For fixed $X$ and $W$, Eq. (1) maps the biases one-to-one onto the pre-activations, so this substitution loses no information. It is defined by the network's recursive relation, Eq. (1), which can be rewritten as

$$
b^{(m)}_{i} = z^{(m+1)}_{i} - \sum_{j} W^{(m)}_{ij}\,\sigma\big(z^{(m)}_{j}\big).\tag{6}
$$
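Since Eq. (6) simply solves Eq. (1) for the bias, the change of variables $b\to z$ can be inverted at fixed $X$ and $W$. The sketch below, with hypothetical helper names and $\sigma=\tanh$ assumed, runs a forward pass, recovers every $b^{(m)}$ from the pre-activations via Eq. (6), and confirms that the original biases are reproduced.

```python
import math
import random


def forward(X, weights, biases, sigma=math.tanh):
    """Eq. (1); returns [z^(1), ..., z^(M)] with an identity output layer."""
    h, zs = list(X), []
    for m, (W, b) in enumerate(zip(weights, biases)):
        z = [sum(w * hj for w, hj in zip(row, h)) + bi for row, bi in zip(W, b)]
        zs.append(z)
        h = z if m == len(weights) - 1 else [sigma(v) for v in z]
    return zs


def biases_from_neurons(X, weights, zs, sigma=math.tanh):
    """Eq. (6): b^(m)_i = z^(m+1)_i - sum_j W^(m)_ij sigma(z^(m)_j).

    zs[m] holds z^(m+1); for m = 0 the input X replaces sigma(z^(0)),
    because h^(0) = X."""
    biases = []
    for m, W in enumerate(weights):
        h_prev = list(X) if m == 0 else [sigma(v) for v in zs[m - 1]]
        biases.append([zi - sum(w * hj for w, hj in zip(row, h_prev))
                       for zi, row in zip(zs[m], W)])
    return biases


if __name__ == "__main__":
    rng = random.Random(1)
    widths = [3, 5, 5, 2]  # hypothetical layer sizes N_0, ..., N_M
    weights = [[[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
               for n_in, n_out in zip(widths[:-1], widths[1:])]
    biases = [[rng.gauss(0, 1) for _ in range(n)] for n in widths[1:]]
    X = [rng.gauss(0, 1) for _ in range(widths[0])]
    recovered = biases_from_neurons(X, weights, forward(X, weights, biases))
    err = max(abs(a - c) for ra, rc in zip(biases, recovered) for a, c in zip(ra, rc))
    print("max |b - b_recovered| =", err)  # zero up to floating-point rounding
```

The output layer needs no special case here: $z^{(M)}$ only ever appears on the left of Eq. (6), never inside $\sigma$.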

For $m=0$, the factor $\sigma(z^{(0)}_{j})$ in Eq. (6) stands for the input $X_{j}$, since $h^{(0)}_{j}=X_{j}$. After this substitution, the loss function retains the simple forms listed in Table 1, because the network output $Z_{i}$ is identified with the dynamical variable $z^{(M)}_{i}$ rather than being a nested function of $W$ and $b$. Inserting Eq. (6) into Eq. (5), the full Lagrangian becomes

$$
\begin{split}
L &= L_{\text{bulk}} + L_{\text{boundary}},\\[3pt]
L_{\text{bulk}} &= \frac{1}{2}\sum_{i,j,m}\big(\dot W^{(m)}_{ij}\big)^{2} + \frac{1}{2}\sum_{i,m}\big(\dot z^{(m)}_{i}\big)^{2}\\
&\quad - \sum_{i,j,m}\dot z^{(m+1)}_{i}\,\partial_{t}\big(W^{(m)}_{ij}\sigma(z^{(m)}_{j})\big)\\
&\quad + \frac{1}{2}\sum_{i,m}\Big[\sum_{j}\partial_{t}\big(W^{(m)}_{ij}\sigma(z^{(m)}_{j})\big)\Big]^{2},\\
L_{\text{boundary}} &= \frac{1}{2}\sum_{i}\Big[\sum_{j}\partial_{t}\big(W^{(0)}_{ij}X_{j}\big)\Big]^{2}\\
&\quad - \sum_{i,j}\dot z^{(1)}_{i}\,\partial_{t}\big(W^{(0)}_{ij}X_{j}\big) - \ell(Z,Y).
\end{split}\tag{7}
$$

In the basis of degrees of freedom $(W,z)$, a natural separation emerges: the bulk degrees of freedom interact independently of the data, whereas the interactions of the boundary degrees of freedom are driven by the data $(X,Y)$. Accordingly, the Lagrangian decomposes into $L_{\text{bulk}}$ and $L_{\text{boundary}}$, which we refer to as the bulk–boundary decomposition. In Eq. (7), the bulk sums containing $\sigma(z^{(m)}_{j})$ run over the hidden layers $m=1,\dots,M-1$, while the $m=0$ terms, in which $X_{j}$ replaces $\sigma(z^{(0)}_{j})$, form $L_{\text{boundary}}$ together with the loss. In this picture, the effects of training examples are confined to the input and output boundaries ($m=0$ and $m=M$), while the internal architecture contributes only to the bulk dynamics.
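Eq. (6) gives $\dot b^{(m)}_i = \dot z^{(m+1)}_i - u^{(m)}_i$ with $u^{(m)}_i \equiv \sum_j \partial_t\big(W^{(m)}_{ij}\sigma(z^{(m)}_j)\big)$, so each bias kinetic term expands as $\tfrac12(\dot b^{(m)}_i)^2 = \tfrac12(\dot z^{(m+1)}_i)^2 - \dot z^{(m+1)}_i u^{(m)}_i + \tfrac12(u^{(m)}_i)^2$. The sketch below checks this expansion numerically for one pair of adjacent layers, using hypothetical smooth trajectories $W(t)$, $z^{(m)}(t)$, $z^{(m+1)}(t)$, $\sigma=\tanh$, and central finite differences for all time derivatives.

```python
import math


def trajectories(t):
    """Hypothetical smooth time dependence for one pair of adjacent layers."""
    W = [[0.5 + 0.3 * math.sin(t), -0.2 * math.cos(2 * t)],
         [0.1 * t, 0.7 - 0.4 * math.sin(t)]]        # W^(m)_ij(t)
    z_m = [math.cos(t), 0.5 * math.sin(3 * t)]       # z^(m)_j(t)
    z_next = [t ** 2, math.exp(-t)]                  # z^(m+1)_i(t)
    return W, z_m, z_next


def drive(t, sigma=math.tanh):
    """sum_j W^(m)_ij sigma(z^(m)_j) for each i; its time derivative is u_i."""
    W, z_m, _ = trajectories(t)
    return [sum(w * sigma(zj) for w, zj in zip(row, z_m)) for row in W]


def bias(t):
    """Eq. (6): b^(m)_i = z^(m+1)_i - sum_j W^(m)_ij sigma(z^(m)_j)."""
    _, _, z_next = trajectories(t)
    return [zi - di for zi, di in zip(z_next, drive(t))]


def ddt(f, t, h=1e-5):
    """Central finite-difference time derivative of a vector-valued f(t)."""
    return [(a - b) / (2 * h) for a, b in zip(f(t + h), f(t - h))]


def kinetic_terms(t):
    b_dot = ddt(bias, t)
    z_dot = ddt(lambda s: trajectories(s)[2], t)
    u = ddt(drive, t)
    lhs = 0.5 * sum(v * v for v in b_dot)              # (1/2) sum_i b_dot_i^2
    rhs = (0.5 * sum(v * v for v in z_dot)             # (1/2) sum_i z_dot_i^2
           - sum(zd * ui for zd, ui in zip(z_dot, u))  # cross term
           + 0.5 * sum(ui * ui for ui in u))           # (1/2) sum_i u_i^2
    return lhs, rhs


if __name__ == "__main__":
    print(kinetic_terms(0.3))  # the two numbers agree up to rounding
```

Only quantities from adjacent layers $m$ and $m+1$ enter `kinetic_terms`, which is the locality that the decomposition makes explicit.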

(a) Lagrangian description with weights and biases of Eq. (5).
(b) Bulk–boundary decomposition: Lagrangian description with weights and neurons of Eq. (7).
Figure 1: (a) and (b) illustrate two equivalent representations of a deep neural network. The input and output layers act as data-dependent boundaries, while the interior bulk represents architecture-dependent dynamics. Local interactions along the depth direction lead to translational symmetry and enable a field-theoretic continuum limit.

This separation enables a systematic analysis of the two sectors. The boundary part encodes stochasticity from data sampling, whereas the bulk part describes the deterministic evolution governed by the network architecture and activation functions. Although the reformulated Lagrangian appears algebraically more involved, it exposes the key physical structure: the kinetic terms now couple only adjacent layers $(m,m+1)$, thereby making the locality along the depth direction explicit.

The BBD provides a natural framework for revealing the symmetry structure of deep neural networks. For architectures repeating the same layer structure, the bulk Lagrangian is invariant under $m\rightarrow m+1$, analogous to discrete translational symmetry in lattice field theories, where the depth index plays the role of an effective spatial coordinate. A similar consideration applies to symmetries along the width direction. Such symmetry principles provide a natural bridge between deep learning dynamics and the analytic tools of field theory, which will be developed in the following section.

Figure 1(b) schematically describes the decomposition and its local nature. Because its layout resembles typical neural network diagrams, existing intuitions about network structure can be readily applied within the BBD framework. The explicit separation of data-dependent and data-independent sectors constitutes the core theoretical innovation of this work. It provides the structural foundation upon which the subsequent field-theoretic formulation is built.

Field Description— The bulk–boundary decomposition reveals that the underlying dynamics are local along the depth direction, with interactions confined to adjacent layers. From this perspective, the network's mapping of an input $X$ to an output $Z$, a process that spans the entire network depth, may be interpreted as an emergent long-range order arising from these local interactions. To investigate this long-range order, we adopt a field-theoretic approach, as is standard in physics for analyzing collective phenomena such as critical behavior or the emergence of magnetization in spin systems [52, 53, 54]. In this section, we present a field-theoretic formulation that follows naturally from the discrete Lagrangian derived in the previous section and offers a promising framework for exploring long-range order in deep neural networks.

Because locality is a foundational property of many field theories, the BBD is a natural starting point: it makes locality along the depth direction explicit. This notion of locality, however, is difficult to extend to the width direction, since there is no well-defined spatial distance between neurons in the same layer. In the width-continuum limit of a fully connected architecture, the neurons $z^{(m)}_{i}$ of layer $m$ are represented by a field $z^{(m)}(\mathbf{x})$, and Eq. (6) becomes

$$
b^{(m)}(\mathbf{x}) = z^{(m+1)}(\mathbf{x}) - \int d\mathbf{x}'\, W^{(m)}(\mathbf{x},\mathbf{x}')\,\sigma\big[z^{(m)}(\mathbf{x}')\big],\tag{8}
$$

where the integral kernel $W^{(m)}(\mathbf{x},\mathbf{x}')$ couples every pair of width coordinates, rendering the system nonlocal in $\mathbf{x}$.

One approach to formulating a field theory for neural networks that is local along the width direction is to restrict the network architecture itself. This can be achieved by imposing a structure in which neurons interact only with nearby units in that dimension, a property inherent to convolutional neural networks. As a simple illustration, we consider the local architecture shown in Fig. 2. In this configuration, neurons connect only to neighboring ones through weights $W$, and periodic boundary conditions are imposed along the width direction.
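The local architecture can be sketched as a banded weight matrix with periodic boundary conditions. In the hypothetical example below, each neuron couples only to its nearest neighbors $j=i-1,i,i+1 \pmod N$; the tap values, zero biases, and $\tanh$ are assumptions. Perturbing a single input neuron then changes only neurons within distance $d$ after $d$ layers, a discrete light cone along the width direction, whereas a generic dense kernel $W(\mathbf{x},\mathbf{x}')$ as in Eq. (8) would spread the perturbation to every neuron after one layer.

```python
import math


def local_weights(n, taps=(0.5, 1.0, 0.5)):
    """Banded n x n weight matrix with periodic boundary conditions.

    Neuron i couples only to j = i-1, i, i+1 (mod n); tap values are hypothetical."""
    r = len(taps) // 2
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, w in enumerate(taps):
            W[i][(i + k - r) % n] = w
    return W


def layer(W, h, sigma=math.tanh):
    """One layer of Eq. (1) with zero biases (an illustrative simplification)."""
    return [sigma(sum(w * x for w, x in zip(row, h))) for row in W]


def affected(n, depth, eps=1e-3):
    """Indices that change after `depth` layers when input neuron 0 is perturbed."""
    W = local_weights(n)
    h_ref = [0.1 * i for i in range(n)]
    h_pert = list(h_ref)
    h_pert[0] += eps
    for _ in range(depth):
        h_ref, h_pert = layer(W, h_ref), layer(W, h_pert)
    return [i for i in range(n) if h_ref[i] != h_pert[i]]


if __name__ == "__main__":
    for d in range(1, 4):
        print(d, affected(12, d))
```

The wrap-around indices (11, 10, ...) in the output come from the periodic boundary conditions along the width direction.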

To construct the field theories for the bulk and boundary sectors, we treat the discrete indices as continuous coordinates, with $x$ denoting the depth and $y$ the width. We begin with the boundary contributions, where the data-driven effects enter. All boundary components of Eq. (7) can be arranged along a one-dimensional line, so the boundary field theory is formulated in one dimension. Using the network connectivity, we assign a width coordinate $y$ to each $z$, $w$, $X$, and $Y$ according to their connections. Expanding the discrete sums in powers of the lattice spacing $a_{y}$ between these coordinates then yields a field-theoretic action. (Footnote 5: The complete Lagrangian depends on the specific choice of these coordinates, which we do not detail here; we present only several representative terms from the full expression for illustration.)

$$
\begin{split}
L_{\text{input}} &= \int dy\,\Big[2\hat X^{2}\dot{\hat w}^{2} + 2\dot{\hat X}^{2}\hat w^{2} + 4\dot{\hat w}\dot{\hat X}\hat w\hat X\\
&\qquad\qquad + 2a_{y}^{2}\dot{\hat X}^{2}\hat w\,\partial_{y}^{2}\hat w + \cdots\Big],\\
L_{\text{output}} &= -\int dy\,\ell(\hat z,\hat Y),
\end{split}\tag{9}
$$

where we define the coarse-grained fields $\hat w(y,t)$, $\hat z(y,t)$, $\hat X(y,t)$, and $\hat Y(y,t)$ corresponding to the synaptic, neuronal, input, and output variables, respectively. Equation (9) makes the dependence on the training data explicit: $X$ enters only through $L_{\text{input}}$ and $Y$ only through $L_{\text{output}}$. Given the stochastic nature of the training samples, a statistical physics framework is a natural candidate for incorporating this effect, which we leave for future work.
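To see how coefficients such as those in Eq. (9) can arise, consider one hypothetical coordinate choice (the text does not fix it): each first-layer neuron at $y$ receives the two inputs at $y\pm a_y$, with the corresponding weights assigned to the same positions. Taylor-expanding the neighbor sum and inserting it into the squared term $\big[\sum_j\partial_t(W^{(0)}_{ij}X_j)\big]^2$, with the factor $\tfrac12$ inherited from the bias kinetic energy $\tfrac12\dot b^2$ in Eq. (5), gives

```latex
\begin{aligned}
\sum_{s=\pm1}\hat w(y+s a_y)\,\hat X(y+s a_y)
  &= 2\,\hat w\hat X + a_y^{2}\,\partial_y^{2}(\hat w\hat X) + O(a_y^{4}),\\[4pt]
\frac12\Big[\partial_t\sum_{s=\pm1}\hat w\hat X\Big]^{2}
  &= 2\big[\partial_t(\hat w\hat X)\big]^{2}
   + 2a_y^{2}\,\partial_t(\hat w\hat X)\,\partial_t\partial_y^{2}(\hat w\hat X)
   + O(a_y^{4})\\
  &= 2\hat X^{2}\dot{\hat w}^{2} + 2\dot{\hat X}^{2}\hat w^{2}
   + 4\dot{\hat w}\dot{\hat X}\hat w\hat X
   + 2a_y^{2}\dot{\hat X}^{2}\hat w\,\partial_y^{2}\hat w + \cdots
\end{aligned}
```

Under this assumption, the four displayed terms coincide with those shown in $L_{\text{input}}$; the remaining $O(a_y^2)$ pieces of the product, the cross term with $\dot z^{(1)}$, and any dependence on a different coordinate assignment are absorbed into the ellipsis.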

The bulk field theory is constructed in the same way and takes a similar form:

$$
\begin{split}
L_{\text{bulk}} = \int dx\,dy\,\Big[&\dot{\hat w}^{2} + \frac{1}{2}\dot{\hat z}^{2} + 2a_{x}\,\partial_{x}\partial_{t}(\hat\sigma\hat w)\,\dot{\hat z}\\
&+ \frac{2}{3}a_{x}^{2}\big[\partial_{t}(\hat\sigma\,\partial_{x}\hat w)\big]^{2}\\
&+ \frac{16}{3}a_{y}^{2}\big[\partial_{t}(\hat\sigma\,\partial_{y}\hat w)\big]^{2} + \cdots\Big],
\end{split}\tag{10}
$$

where $\hat\sigma=\sigma(\hat z)$ and $a_{x}$ is the lattice spacing along the depth direction. This Lagrangian governs how signals from the input boundary propagate to the output boundary. Exploring its implications is an interesting direction for future work.

[Figure 2 schematic: a row of input neurons $X$ and a row of output neurons $Y$ at the two ends of the depth direction, with rows of hidden neurons $z$ in between; each neuron connects only to nearby neurons in the adjacent layer through weights $W$.]
Figure 2: Illustration of the local neural network architecture considered in the example. Each neuron interacts only with nearby neurons through weights $W$, ensuring locality along the width direction. Periodic boundary conditions are imposed along the width direction. The depth direction corresponds to the layer index $m$, along which locality and translation symmetry emerge through the bulk–boundary decomposition.

Discussion and Outlook— The bulk-boundary decomposition provides a new perspective on deep learning by separating architectural dynamics from data-driven stochasticity. It reveals that, despite being engineered systems, neural networks possess intrinsic locality and translational symmetry structures that can be analyzed through physical principles. Since locality and homogeneity are foundational assumptions in physics, many standard analytical techniques rely on these properties. Consequently, few of these methods are directly applicable to systems lacking such symmetries. In this context, the BBD offers a valuable framework for applying physics-based analyses to deep neural networks.

The locality revealed by the BBD suggests that long-range order may emerge during the training dynamics. This phenomenon can be investigated in various ways, such as via a field-theoretic approach. Considering that the goal of training is to make a deep neural network accurately approximate a target function, this emergent order may be correlated with the successfully trained state of the system.

Future work can extend this approach in several directions. The statistical-mechanical description of boundary stochasticity may clarify how generalization arises from effective thermal ensembles, while symmetry-breaking analyses could connect network anisotropy to dynamical phase transitions. The BBD framework thus opens a route toward a unified theoretical understanding of learning, bridging the dynamics of artificial networks with the organizing principles of condensed matter and field theory.

Acknowledgments— This work was supported in part by the National Research Foundation of Korea (Grant No. RS-2024-00352537).

References