Statistics

Showing new listings for Wednesday, 11 February 2026

Total of 103 entries

New submissions (showing 41 of 41 entries)

[1] arXiv:2602.09058 [pdf, html, other]
Title: Persistent Entropy as a Detector of Phase Transitions
Matteo Rucco
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)

Persistent entropy (PE) is an information-theoretic summary statistic of persistence barcodes that has been widely used to detect regime changes in complex systems. Despite its empirical success, a general theoretical understanding of when and why persistent entropy reliably detects phase transitions has remained limited, particularly in stochastic and data-driven settings. In this work, we establish a general, model-independent theorem providing sufficient conditions under which persistent entropy provably separates two phases. We show that persistent entropy exhibits an asymptotically non-vanishing gap across phases. The result relies only on continuity of persistent entropy along the convergent diagram sequence, or under mild regularization, and is therefore broadly applicable across data modalities, filtrations, and homological degrees. To connect asymptotic theory with finite-time computations, we introduce an operational framework based on topological stabilization, defining a topological transition time by stabilizing a chosen topological statistic over sliding windows, and a probability-based estimator of critical parameters within a finite observation horizon. We validate the framework on the Kuramoto synchronization transition, the Vicsek order-to-disorder transition in collective motion, and neural network training dynamics across multiple datasets and architectures. Across all experiments, stabilization of persistent entropy and collapse of variability across realizations provide robust numerical signatures consistent with the theoretical mechanism.
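
As a minimal sketch of the statistic this abstract refers to, the snippet below computes persistent entropy using its usual definition: the Shannon entropy of the bar lengths of a barcode, normalized to sum to one. The function name `persistent_entropy` and the choice to drop infinite bars are assumptions made for illustration; they are not taken from the paper.

```python
import math


def persistent_entropy(barcode):
    """Shannon entropy of normalized bar lengths.

    `barcode` is an iterable of (birth, death) pairs. Bars with infinite
    death or non-positive length are skipped (an illustrative assumption).
    """
    lengths = [d - b for b, d in barcode if math.isfinite(d) and d > b]
    total = sum(lengths)
    if total == 0:
        return 0.0
    # p_i = l_i / L, PE = -sum_i p_i * log(p_i)
    return -sum((l / total) * math.log(l / total) for l in lengths)


# Four bars of equal length give the maximal value log(4).
equal_bars = [(0.0, 1.0), (0.5, 1.5), (1.0, 2.0), (2.0, 3.0)]
print(persistent_entropy(equal_bars))
```

A barcode dominated by one long bar yields a value near zero, while many bars of similar length yield a value near the logarithm of the number of bars, which is why the statistic can shift when the topology of the data reorganizes.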

[2] arXiv:2602.09061 [pdf, html, other]
Title: Optimal information deletion and Bayes' theorem
Hans Montcho, Håvard Rue
Subjects: Methodology (stat.ME); Information Theory (cs.IT); Statistics Theory (math.ST)

In this same journal, Arnold Zellner published a seminal paper on Bayes' theorem as an optimal information processing rule. This result led to the variational formulation of Bayes' theorem, which is the central idea in generalized variational inference. Almost 40 years later, we revisit these ideas, but from the perspective of information deletion. We investigate rules which update a posterior distribution into an antedata distribution when a portion of data is removed. In such context, a rule which does not destroy or create information is called the optimal information deletion rule and we prove that it coincides with the traditional use of Bayes' theorem.

[3] arXiv:2602.09145 [pdf, html, other]
Title: Estimating causal effects of functional treatments with modified functional treatment policies
Ziren Jiang, Erjia Cui, Jared D. Huling
Subjects: Methodology (stat.ME)

Functional data are increasingly prevalent in biomedical research. While functional data analysis has been established for decades, causal inference with functional treatments remains largely unexplored. Existing methods typically focus on estimating the causal average dose response functional (ADRF), which requires strong positivity assumptions and offers limited interpretability. In this work, we target a new causal estimand, the modified functional treatment policy (MFTP), which focuses on estimating the average potential outcome when each individual slightly modifies their treatment trajectory from the observed one. A major challenge for this new estimand is the need to define an average over an infinite-dimensional object with no density. By proposing a novel definition of the population average over a functional variable using a functional principal component analysis (FPCA) decomposition, we establish the causal identifiability of the MFTP estimand. We further derive outcome regression, inverse probability weighting, and doubly robust estimators for the MFTP, and provide theoretical guarantees under mild regularity conditions. The proposed estimators are validated through extensive simulation studies. Applying our MFTP framework to the National Health and Nutrition Examination Survey (NHANES) accelerometer data, we estimate the causal effects of reducing disruptive nighttime activity and low-activity duration on all-cause mortality.

[4] arXiv:2602.09161 [pdf, html, other]
Title: Minimum Distance Summaries for Robust Neural Posterior Estimation
Sherman Khoo, Dennis Prangle, Song Liu, Mark Beaumont
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Simulation-based inference (SBI) enables amortized Bayesian inference by first training a neural posterior estimator (NPE) on prior-simulator pairs, typically through low-dimensional summary statistics, which can then be cheaply reused for fast inference by querying it on new test observations. Because NPE is estimated under the training data distribution, it is susceptible to misspecification when observations deviate from the training distribution. Many robust SBI approaches address this by modifying NPE training or introducing error models, coupling robustness to the inference network and compromising amortization and modularity. We introduce minimum-distance summaries, a plug-in robust NPE method that adapts queried test-time summaries independently of the pretrained NPE. Leveraging the maximum mean discrepancy (MMD) as a distance between observed data and a summary-conditional predictive distribution, the adapted summary inherits strong robustness properties from the MMD. We demonstrate that the algorithm can be implemented efficiently with random Fourier feature approximations, yielding a lightweight, model-free test-time adaptation procedure. We provide theoretical guarantees for the robustness of our algorithm and empirically evaluate it on a range of synthetic and real-world tasks, demonstrating substantial robustness gains with minimal additional overhead.
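
The random Fourier feature approximation mentioned above can be sketched as follows. This is only an illustration of estimating a squared MMD between two samples under a Gaussian kernel; it does not implement the paper's summary adaptation or its summary-conditional predictive distribution. The names `make_rff`, `mean_embedding`, and `mmd2_rff`, and the bandwidth and feature count, are assumptions.

```python
import math
import random


def make_rff(dim, n_features, bandwidth, rng):
    """Return a feature map phi whose inner products approximate the
    Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    freqs = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
             for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def phi(x):
        return [scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return phi


def mean_embedding(sample, phi):
    """Average feature vector of a sample (approximate kernel mean embedding)."""
    feats = [phi(x) for x in sample]
    return [sum(col) / len(feats) for col in zip(*feats)]


def mmd2_rff(xs, ys, phi):
    """Biased estimate of squared MMD: squared distance between mean embeddings."""
    mx, my = mean_embedding(xs, phi), mean_embedding(ys, phi)
    return sum((a - b) ** 2 for a, b in zip(mx, my))


rng = random.Random(0)
phi = make_rff(dim=1, n_features=200, bandwidth=1.0, rng=rng)
same = [[rng.gauss(0.0, 1.0)] for _ in range(200)]
shifted = [[rng.gauss(3.0, 1.0)] for _ in range(200)]
print(mmd2_rff(same, shifted, phi))
```

With the feature map fixed, each sample is reduced to a single mean vector, so the cost is linear in the sample size rather than quadratic as for the exact kernel estimate.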

[5] arXiv:2602.09167 [pdf, html, other]
Title: Mean regression for (0,1) responses via beta scale mixtures
Arno Otto, Andriëtte Bekker, Johan Ferreira, Lebogang Rathebe
Comments: 21 pages, 11 figures
Subjects: Methodology (stat.ME)

To achieve a greater general flexibility for modeling heavy-tailed bounded responses, a beta scale mixture model is proposed. Each member of the family is obtained by multiplying the scale parameter of the conditional beta distribution by a mixing random variable taking values on all or part of the positive real line and whose distribution depends on a single parameter governing the tail behavior of the resulting compound distribution. These family members allow for a wider range of values for skewness and kurtosis. To validate the effectiveness of the proposed model, we conduct experiments on both simulated data and real datasets. The results indicate that the beta scale mixture model demonstrates superior performance relative to the classical beta regression model and alternative competing methods for modeling responses on the bounded unit domain.

[6] arXiv:2602.09170 [pdf, html, other]
Title: Quantifying Epistemic Uncertainty in Diffusion Models
Aditi Gupta, Raphael A. Meyer, Yotam Yaniv, Elynn Chen, N. Benjamin Erichson
Comments: Will appear in the Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (AISTATS) 2026
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

To ensure high-quality outputs, it is important to quantify the epistemic uncertainty of diffusion models. Existing methods are often unreliable because they mix epistemic and aleatoric uncertainty. We introduce a method based on Fisher information that explicitly isolates epistemic variance, producing more reliable plausibility scores for generated data. To make this approach scalable, we propose FLARE (Fisher-Laplace Randomized Estimator), which approximates the Fisher information using a uniformly random subset of model parameters. Empirically, FLARE improves uncertainty estimation in synthetic time-series generation tasks, achieving more accurate and reliable filtering than other methods. Theoretically, we bound the convergence rate of our randomized approximation and provide analytic and empirical evidence that last-layer Laplace approximations are insufficient for this task.

[7] arXiv:2602.09208 [pdf, html, other]
Title: Some Bayesian Perspectives on Clinical Trials
Alexandra Sokolova, Vadim Sokolov, Nick Polson
Subjects: Methodology (stat.ME)

We provide a Bayesian perspective on three interconnected aspects of clinical trial design: prior specification, sequential adaptive allocation, and decision-theoretic optimization. For prior specification, we argue that treatment effects in clinical trials are known a priori to be small, rendering default noninformative priors such as Jeffreys' prior inappropriate; priors calibrated to historical effect sizes or LD50 relationships are both more honest and more efficient. For sequential design, we show how Thompson's (1933) probability-matching rule connects to modern adaptive randomization, and how backward induction on sufficient statistics -- following \citet{christen2003} and \citet{carlin1998} -- reduces the seemingly intractable infinite-horizon stopping problem to a finite table. For trial optimization, we review the utility-based framework of \citet{thall2004} that jointly models efficacy and toxicity, enabling dose-finding designs that maximize patient benefit rather than merely controlling error rates. We illustrate these ideas through the ECMO trial, the CALGB 49907 breast cancer trial, and modern platform trials, and discuss the 2026 FDA draft guidance on Bayesian methodology.
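
To make the probability-matching idea concrete, here is a minimal sketch for binary outcomes with independent Beta posteriors: each arm is chosen with probability equal to the posterior probability that it is the best arm, which can be done by drawing one sample per arm and taking the largest. The function name `thompson_assign` and the prior parameters `prior_a` and `prior_b` are hypothetical; the uniform default is only a placeholder and could be replaced by a prior calibrated to historical effect sizes, in line with the abstract's argument.

```python
import random


def thompson_assign(successes, failures, rng, prior_a=1.0, prior_b=1.0):
    """Return the index of the arm to assign to the next patient.

    One draw is taken from each arm's Beta(prior_a + s, prior_b + f)
    posterior, and the arm with the largest draw is selected.
    """
    draws = [rng.betavariate(prior_a + s, prior_b + f)
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


rng = random.Random(0)
# Arm 0: 12 successes, 8 failures; arm 1: 9 successes, 11 failures.
picks = [thompson_assign([12, 9], [8, 11], rng) for _ in range(1000)]
print(picks.count(0) / len(picks))  # empirical allocation probability for arm 0
```

Because allocation probabilities track the posterior, assignment drifts toward the better-performing arm as evidence accumulates while still occasionally sampling the other arm.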

[8] arXiv:2602.09219 [pdf, html, other]
Title: Goodness-of-fit testing for nonlinear inverse problems with random observations
Remo Kretschmann, Han Cheng Lie
Comments: 44 pages
Subjects: Statistics Theory (math.ST)

This work is concerned with nonparametric goodness-of-fit testing in the context of nonlinear inverse problems with random observations. Bayesian posterior distributions based upon a Gaussian process prior distribution are proven to contract at a certain rate uniformly over a set of true parameters. The corresponding posterior mean is shown to converge uniformly at the posterior contraction rate in the sense of satisfying a concentration inequality. Distinguishability for bounded alternatives separated from a composite null hypothesis at the posterior contraction rate is established using infimum plug-in tests based on the posterior mean and also on maximum a posteriori estimators. The results are applied to a class of inverse problems governed by ordinary differential equation initial value problems that is widely used in pharmacokinetics. For this class, uniform posterior contraction rates are proven and then used to establish distinguishability.

[9] arXiv:2602.09240 [pdf, other]
Title: Optimal Estimation in Orthogonally Invariant Generalized Linear Models: Spectral Initialization and Approximate Message Passing
Yihan Zhang, Hong Chang Ji, Ramji Venkataramanan, Marco Mondelli
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)

We consider the problem of parameter estimation from a generalized linear model with a random design matrix that is orthogonally invariant in law. Such a model allows the design to have an arbitrary distribution of singular values and only assumes that its singular vectors are generic. It is a vast generalization of the i.i.d. Gaussian design typically considered in the theoretical literature, and is motivated by the fact that real data often have a complex correlation structure so that methods relying on i.i.d. assumptions can be highly suboptimal. Building on the paradigm of spectrally-initialized iterative optimization, this paper proposes optimal spectral estimators and combines them with an approximate message passing (AMP) algorithm, establishing rigorous performance guarantees for these two algorithmic steps. Both the spectral initialization and the subsequent AMP meet existing conjectures on the fundamental limits to estimation -- the former on the optimal sample complexity for efficient weak recovery, and the latter on the optimal errors. Numerical experiments suggest the effectiveness of our methods and accuracy of our theory beyond orthogonally invariant data.

[10] arXiv:2602.09247 [pdf, html, other]
Title: Motivating REML via Prediction-Error Covariances in EM Updates for Linear Mixed Models
Andrew T. Karl
Subjects: Computation (stat.CO)

We present a computational motivation for restricted maximum likelihood (REML) estimation in linear mixed models using an expectation--maximization (EM) algorithm. At each iteration, maximum likelihood (ML) and REML solve the same mixed-model equations for the best linear unbiased estimator (BLUE) of the fixed effects and the best linear unbiased predictor (BLUP) of the random effects. They differ only in the trace adjustments used in the variance-component updates: ML uses conditional covariances of the random effects given the data, whereas REML uses prediction-error covariances from Henderson's C-matrix, reflecting uncertainty from estimating the fixed effects. Short R code makes this switch explicit, exposes the key matrices for classroom inspection, and reproduces lme4 ML and REML fits.

[11] arXiv:2602.09267 [pdf, other]
Title: Estimating the distance at which narwhal $(\textit{Monodon monoceros})$ respond to disturbance: a penalized threshold hidden Markov model
Fanny Dupont, Marianne Marcoux, Nigel E. Hussey, Jackie Dawson, Marie Auger-Méthé
Comments: 22 pages
Subjects: Applications (stat.AP)

Understanding behavioural responses to disturbances is vital for wildlife conservation. For example, in the Arctic, the decrease in sea ice has opened new shipping routes, increasing the need for impact assessments that quantify the distance at which marine mammals react to vessel presence. This information can then guide targeted mitigation policies, such as vessel slow-down regulations and delineation of avoidance areas. Using telemetry data to determine distances linked to deviations from normal behaviour requires advanced statistical models, such as threshold hidden Markov models (THMMs). While these are powerful tools, they do not assess whether the estimated threshold reflects a meaningful behavioural shift. We introduce a lasso-penalized THMM that builds on computationally efficient methods to impose penalties on HMMs and present a new, efficient penalized quasi-restricted maximum-likelihood estimator. Our framework is capable of estimating thresholds and assessing whether the disturbance effects are meaningful. With simulations, we demonstrate that our lasso method effectively shrinks spurious threshold effects towards zero. When applied to narwhal $\textit{(Monodon monoceros)}$ movement data, our analysis suggests that narwhal react to vessels up to 4 kilometres away by decreasing movement persistence and spending more time in deeper waters (average maximum depth of 356m). Overall, we provide a broadly applicable framework for quantifying behavioural responses to stimuli, with applications ranging from determining reaction thresholds to disturbance to estimating the distances at which terrestrial species, such as elephants, detect water.

[12] arXiv:2602.09277 [pdf, html, other]
Title: Mutual Information Collapse Explains Disentanglement Failure in $β$-VAEs
Minh Vu, Xiaoliang Wan, Shuangqing Wei
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The $\beta$-VAE is a foundational framework for unsupervised disentanglement, using $\beta$ to regulate the trade-off between latent factorization and reconstruction fidelity. Empirically, however, disentanglement performance exhibits a pervasive non-monotonic trend: benchmarks such as MIG and SAP typically peak at intermediate $\beta$ and collapse as regularization increases. We demonstrate that this collapse is a fundamental information-theoretic failure, where strong Kullback-Leibler pressure promotes marginal independence at the expense of the latent channel's semantic informativeness. By formalizing this mechanism in a linear-Gaussian setting, we prove that for $\beta > 1$, stationarity-induced dynamics trigger a spectral contraction of the encoder gain, driving latent-factor mutual information to zero. To resolve this, we introduce the $\lambda\beta$-VAE, which decouples regularization pressure from informational collapse via an auxiliary $L_2$ reconstruction penalty $\lambda$. Extensive experiments on dSprites, Shapes3D, and MPI3D-real confirm that $\lambda > 0$ stabilizes disentanglement and restores latent informativeness over a significantly broader range of $\beta$, providing a principled theoretical justification for dual-parameter regularization in variational inference backbones.

[13] arXiv:2602.09279 [pdf, html, other]
Title: Stochastic EM Estimation and Inference for Zero-Inflated Beta-Binomial Mixed Models for Longitudinal Count Data
John Barrera, Ana Arribas-Gil, Dae-Jin Lee, Cristian Meza
Comments: 21 pages, 4 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

Analyzing overdispersed, zero-inflated, longitudinal count data poses significant modeling and computational challenges, which standard count models (e.g., Poisson or negative binomial mixed effects models) fail to adequately address. We propose a Zero-Inflated Beta-Binomial Mixed Effects Regression (ZIBBMR) model that augments a beta-binomial count model with a zero-inflation component, fixed effects for covariates, and subject-specific random effects, accommodating excessive zeros, overdispersion, and within-subject correlation. Maximum likelihood estimation is performed via a Stochastic Approximation EM (SAEM) algorithm with latent variable augmentation, which circumvents the model's intractable likelihood and enables efficient computation. Simulation studies show that ZIBBMR achieves accuracy comparable to leading mixed-model approaches in the literature and surpasses simpler zero-inflated count formulations, particularly in small-sample scenarios. As a case study, we analyze longitudinal microbiome data, comparing ZIBBMR with an external Zero-Inflated Beta Regression (ZIBR) benchmark; the results indicate that applying both count- and proportion-based models in parallel can enhance inference robustness when both data types are available.

[14] arXiv:2602.09351 [pdf, html, other]
Title: Supervised Learning of Functional Outcomes with Predictors at Different Scales: A Functional Gaussian Process Approach
R. Jacob Andros, Rajarshi Guhaniyogi, Devin Francom, Donatella Pasqualini
Subjects: Methodology (stat.ME)

The analysis of complex computer simulations, often involving functional data, presents unique statistical challenges. Conventional regression methods, such as function-on-function regression, typically associate functional outcomes with both scalar and functional predictors on a per-realization basis. However, simulation studies often demand a more nuanced approach to disentangle nonlinear relationships of functional outcome with predictors observed at multiple scales: domain-specific functional predictors that are fixed across simulation runs, and realization-specific global predictors that vary between runs. In this article, we develop a novel supervised learning framework tailored to this setting. We propose an additive nonlinear regression model that flexibly captures the influence of both predictor types. The effects of functional predictors are modeled through spatially-varying coefficients governed by a Gaussian process prior. Crucially, to capture the impact of global predictors on the functional outcome, we introduce a functional Gaussian process (fGP) prior. This new prior jointly models the entire collection of unknown, spatially-indexed nonlinear functions that encode the effects of the global predictors over the entire domain, explicitly accounting for their spatial dependence. This integrated architecture enables simultaneous learning from both predictor types, provides principled strategies to quantify their respective contributions in predicting the functional outcome, and delivers rigorous uncertainty estimates for both model parameters and predictions. The utility and robustness of our approach are demonstrated through multiple synthetic datasets and a real-world application involving outputs from the Sea, Lake, and Overland Surges from Hurricanes (SLOSH) model.

[15] arXiv:2602.09356 [pdf, html, other]
Title: Regularized geometric quantiles and universal linear distribution functionals
Dimitri Konen, Gilles Stupfler
Subjects: Statistics Theory (math.ST)

Geometric quantiles are popular location functionals to build rank-based statistical procedures in multivariate settings. They are obtained through the minimization of a non-smooth convex objective function. As a result, the singularity of the directional derivatives leads to numerical instabilities and poor sample properties as well as surprising `phase transitions' from empirical to population distributions. To solve these issues, we introduce a regularized version of geometric distribution functions and quantiles that are provably close to the usual geometric concepts and share their qualitative properties, both in the empirical and continuous case, while allowing for a much broader applicability of asymptotic results without any moment condition. We also show that any linear assignment of probability measures (such as the univariate distribution function), that is also translation- and orthogonal-equivariant, necessarily coincides with one of our regularized geometric distribution functions.

[16] arXiv:2602.09394 [pdf, html, other]
Title: The Critical Horizon: Inspection Design Principles for Multi-Stage Operations and Deep Reasoning
Seyed Morteza Emadi
Comments: 49 pages, 5 figures
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG)

Manufacturing lines, service journeys, supply chains, and AI reasoning chains share a common challenge: attributing a terminal outcome to the intermediate stage that caused it. We establish an information-theoretic barrier to this credit assignment problem: the signal connecting early steps to final outcomes decays exponentially with depth, creating a critical horizon beyond which no algorithm can learn from endpoint data alone. We prove four results. First, a Signal Decay Bound: sample complexity for attributing outcomes to early stages grows exponentially in the number of intervening steps. Second, Width Limits: parallel rollouts provide only logarithmic relief, with correlation capping the effective number of independent samples. Third, an Objective Mismatch: additive reward aggregation optimizes the wrong quantity when sequential validity requires all steps to be correct. Fourth, Optimal Inspection Design: uniform checkpoint spacing is minimax-optimal under homogeneous signal attenuation, while a greedy algorithm yields optimal non-uniform schedules under heterogeneous attenuation. Together, these results provide a common analytical foundation for inspection design in operations and supervision design in AI.

[17] arXiv:2602.09405 [pdf, other]
Title: Is Memorization Helpful or Harmful? Prior Information Sets the Threshold
Chen Cheng, Rina Foygel Barber
Comments: 33 pages, 3 figures
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Statistics Theory (math.ST)

We examine the connection between training error and generalization error for arbitrary estimating procedures, working in an overparameterized linear model under general priors in a Bayesian setup. We find determining factors inherent to the prior distribution $\pi$, giving explicit conditions under which optimal generalization necessitates that the training error be (i) near interpolating relative to the noise size (i.e., memorization is necessary), or (ii) close to the noise level (i.e., overfitting is harmful). Remarkably, these phenomena occur when the noise reaches thresholds determined by the Fisher information and the variance parameters of the prior $\pi$.

[18] arXiv:2602.09457 [pdf, other]
Title: From Average Sensitivity to Small-Loss Regret Bounds under Random-Order Model
Shinsaku Sakaue, Yuichi Yoshida
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)

We study online learning in the random-order model, where the multiset of loss functions is chosen adversarially but revealed in a uniformly random order. Building on the batch-to-online conversion by Dong and Yoshida (2023), we show that if an offline algorithm admits a $(1+\varepsilon)$-approximation guarantee and the effect of $\varepsilon$ on its average sensitivity is characterized by a function $\varphi(\varepsilon)$, then an adaptive choice of $\varepsilon$ yields a small-loss regret bound of $\tilde O(\varphi^{\star}(\mathrm{OPT}_T))$, where $\varphi^{\star}$ is the concave conjugate of $\varphi$, $\mathrm{OPT}_T$ is the offline optimum over $T$ rounds, and $\tilde O$ hides polylogarithmic factors in $T$. Our method requires no regularity assumptions on loss functions, such as smoothness, and can be viewed as a generalization of the AdaGrad-style tuning applied to the approximation parameter $\varepsilon$. Our result recovers and strengthens the $(1+\varepsilon)$-approximate regret bounds of Dong and Yoshida (2023) and yields small-loss regret bounds for online $k$-means clustering, low-rank approximation, and regression. We further apply our framework to online submodular function minimization using $(1\pm\varepsilon)$-cut sparsifiers of submodular hypergraphs, obtaining a small-loss regret bound of $\tilde O(n^{3/4}(1 + \mathrm{OPT}_T^{3/4}))$, where $n$ is the ground-set size. Our approach sheds light on the power of sparsification and related techniques in establishing small-loss regret bounds in the random-order model.

[19] arXiv:2602.09512 [pdf, html, other]
Title: Continuous mixtures of Gaussian processes as models for spatial extremes
Lorenzo Dell'Oro, Carlo Gaetan, Thomas Opitz
Subjects: Methodology (stat.ME); Computation (stat.CO)

Spatial modelling of extreme values allows studying the risk of joint occurrence of extreme events at different locations and is of significant interest in climatic and other environmental sciences. A popular class of dependence models for spatial extremes is that of random location-scale mixtures, in which a spatial "baseline" process is multiplied or shifted by a random variable, potentially altering its extremal dependence behaviour. Gaussian location-scale mixtures retain benefits of their Gaussian baseline processes while overcoming some of their limitations, such as symmetry, light tails and weak tail dependence. We review properties of Gaussian location-scale mixtures and develop novel constructions with interesting features, together with a general algorithm for conditional simulation from these models. We leverage their flexibility to propose extended extreme-value models that allow for appropriately modelling not only the tails but also the bulk of the data. This is important in many applications and avoids the need to explicitly select the events considered as extreme. We propose new solutions for likelihood inference in parametric models of Gaussian location-scale mixtures, in order to avoid the numerical bottleneck given by the latent location and scale variables that can lead to high computational cost of standard likelihood evaluations. The effectiveness of the models and of the inference methods is confirmed with simulated data examples, and we present an application to wildfire-related weather variables in Portugal. Although not detailed here, the approaches would also be straightforward to use for modelling multivariate (non-spatial) data.

[20] arXiv:2602.09537 [pdf, html, other]
Title: A joint QoL-Survival framework with debiased estimation under truncation by death
Torben Martinussen, Klaus K. Holst, Christian Bressen Pipper, Per Kragh Andersen
Subjects: Methodology (stat.ME)

Evaluating quality-of-life (QoL) outcomes in populations with high mortality risk is complicated by truncation by death, since QoL is undefined for individuals who do not survive to the planned measurement time. We propose a framework that jointly models the distribution of QoL and survival without extrapolating QoL beyond death. Inspired by multistate formulations, we extend the joint characterization of binary health states and mortality to continuous QoL outcomes. Because treatment effects cannot be meaningfully summarized in a single one-dimensional estimand without strong assumptions, our approach simultaneously considers both survival and the joint distribution of QoL and survival with the latter conveniently displayed in a simplex. We develop assumption-lean, semiparametric estimators based on efficient influence functions, yielding flexible, root-n consistent estimators that accommodate machine-learning methods while making transparent the conditions these must satisfy. The proposed method is illustrated through simulation studies and two real-data applications.

[21] arXiv:2602.09542 [pdf, html, other]
Title: High Dimensional Mean Test for Shrinking Random Variables with Applications to Backtesting
Liujun Chen, Chen Zhou
Subjects: Methodology (stat.ME)

We propose a high dimensional mean test framework for shrinking random variables, where the underlying random variables shrink to zero as the sample size increases. By pooling observations across overlapping subsets of dimensions, we estimate subset means and test whether the maximum absolute mean deviates from zero. This approach overcomes cancellations that occur in simple averaging and remains valid even when marginal asymptotic normality fails. We establish theoretical properties of the test statistic and develop a multiplier bootstrap procedure to approximate its distribution. The method provides a flexible and powerful tool for the validation and comparative backtesting of value-at-risk. Simulations show superior performance in high-dimensional settings, and a real-data application demonstrates its practical effectiveness in backtesting.

[22] arXiv:2602.09595 [pdf, html, other]
Title: Sharp Bounds for Treatment Effect Generalization under Outcome Distribution Shift
Amir Asiaee, Samhita Pal, Cole Beck, Jared D. Huling
Subjects: Methodology (stat.ME)

Generalizing treatment effects from a randomized trial to a target population requires the assumption that potential outcome distributions are invariant across populations after conditioning on observed covariates. This assumption fails when unmeasured effect modifiers are distributed differently between trial participants and the target population. We develop a sensitivity analysis framework that bounds how much conclusions can change when this transportability assumption is violated. Our approach constrains the likelihood ratio between target and trial outcome densities by a scalar parameter $\Lambda \geq 1$, with $\Lambda = 1$ recovering standard transportability. For each $\Lambda$, we derive sharp bounds on the target average treatment effect -- the tightest interval guaranteed to contain the true effect under all data-generating processes compatible with the observed data and the sensitivity model. We show that the optimal likelihood ratios have a simple threshold structure, leading to a closed-form greedy algorithm that requires only sorting trial outcomes and redistributing probability mass. The resulting estimator runs in $O(n \log n)$ time and is consistent under standard regularity conditions. Simulations demonstrate that our bounds achieve nominal coverage when the true outcome shift falls within the specified $\Lambda$, provide substantially tighter intervals than worst-case bounds, and remain informative across a range of realistic violations of transportability.
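
The sort-and-redistribute idea can be illustrated on a simplified, covariate-free version of the problem. The sketch below assumes the sensitivity model reduces to per-observation weights that lie in [1/Λ, Λ] and average to one, and it bounds a single reweighted outcome mean; the paper's exact constraint set, its conditioning on covariates, and its estimator may differ. The function name `weighted_mean_bounds` is hypothetical.

```python
def weighted_mean_bounds(outcomes, lam):
    """Lower and upper bounds on mean(w * y) over weights w with
    1/lam <= w_i <= lam and mean(w) == 1 (illustrative assumption)."""
    if lam < 1:
        raise ValueError("lam must be at least 1")

    def upper(ys):
        n = len(ys)
        lo, hi = 1.0 / lam, lam
        weights = [lo] * n            # start every weight at the floor
        budget = n - n * lo           # mass still to hand out so weights sum to n
        # Give the remaining mass to the largest outcomes first.
        for i in sorted(range(n), key=lambda k: ys[k], reverse=True):
            step = min(hi - lo, budget)
            weights[i] += step
            budget -= step
            if budget <= 0:
                break
        return sum(w * y for w, y in zip(weights, ys)) / n

    # The lower bound is the upper bound of the negated outcomes, negated back.
    return -upper([-y for y in outcomes]), upper(outcomes)


print(weighted_mean_bounds([0, 1, 2], lam=2.0))  # → (0.5, 1.5)
print(weighted_mean_bounds([0, 1, 2], lam=1.0))  # → (1.0, 1.0)
```

Sorting dominates the cost, which matches the O(n log n) running time quoted in the abstract. Under these assumptions, bounds on a treatment effect would follow by pairing the upper bound for one arm with the lower bound for the other.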

[23] arXiv:2602.09619 [pdf, html, other]
Title: Discrete-time, discrete-state multistate Markov models from the perspective of algebraic statistics
Dario Gasbarra, Kaie Kubjas, Sangita Kulathinal, Nataliia Kushnerchuk, Fatemeh Mohammadi, Etienne Sebag
Subjects: Statistics Theory (math.ST); Algebraic Geometry (math.AG)

We study discrete-time, discrete-state multistate Markov models from the perspective of algebraic statistics. These models are widely studied in event history analysis, and are characterized by the state space, the initial distribution and the transition probabilities. A finite path under the multistate Markov model is a particular set of states occupied at finite time instances $\{1, \dots, n\}$. The main goal of this paper is to establish a bridge between event history analysis and algebraic statistics. The joint probabilities of finite paths in these models have a natural monomial parametrization in terms of the initial distribution and the transition probabilities. We study the polynomial relations among joint path probabilities. When the statistical constraints on the parameters are disregarded, nonhomogeneous multistate Markov models of arbitrary order can be viewed as slices of decomposable hierarchical models. This yields a complete description of their vanishing ideals as toric ideals generated by explicit families of binomials. Moreover, the variety of this vanishing ideal equals the nonhomogeneous multistate Markov model on the probability simplex. In contrast, homogeneous multistate Markov models exhibit different algebraic behavior, as time homogeneity imposes additional polynomial relations, leading to vanishing ideals that are strictly larger than in the nonhomogeneous case. We also derive families of binomial relations that vanish on homogeneous multistate Markov models. We investigate maximum likelihood estimation from statistical and algebraic perspectives. For nonhomogeneous models, classical and algebraic formulas agree; in the homogeneous case, the algebraic approach is more complex. Lastly, we provide data applications where we demonstrate the statistical theory to obtain the maximum likelihood estimates of the parameters under specific multistate Markov models.
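
As a small illustration of the monomial parametrization, the sketch below computes path probabilities for a first-order nonhomogeneous chain with two states and three time points, then checks one binomial relation numerically. The specific numbers are made up, and the relation shown is only an example of the kind of binomial that vanishes on the model; it is not claimed to be the generating set derived in the paper.

```python
import numpy as np

def path_probability(path, init, transitions):
    """P(X_1 = path[0], ..., X_n = path[-1]) for a first-order chain.

    transitions[t] is the matrix for the t-th step, so a nonhomogeneous
    chain simply passes a different matrix for each step.
    """
    prob = init[path[0]]
    for t in range(len(path) - 1):
        prob *= transitions[t][path[t], path[t + 1]]
    return prob

# Hypothetical parameters: initial distribution and two transition matrices.
init = np.array([0.3, 0.7])
P1 = np.array([[0.9, 0.1], [0.4, 0.6]])
P2 = np.array([[0.2, 0.8], [0.5, 0.5]])

def p(*path):
    return path_probability(path, init, [P1, P2])

# Paths sharing the middle state j satisfy
# p(i, j, k) * p(i', j, k') == p(i, j, k') * p(i', j, k).
lhs = p(0, 1, 0) * p(1, 1, 1)
rhs = p(0, 1, 1) * p(1, 1, 0)
```

Both sides are the same monomial in the parameters, which is why the difference vanishes regardless of the numerical values chosen.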

[24] arXiv:2602.09632 [pdf, html, other]
Title: Bayesian network approach to building an affective module for a driver behavioural model
Dorota Młynarczyk, Gabriel Calvo, Francisco Palmi-Perales, Carmen Armero, Virgilio Gómez-Rubio, Ana de la Torre-García, Ricardo Bayona Salvador
Subjects: Applications (stat.AP)

This paper focuses on the affective component of a driver behavioural model (DBM). This component specifically models some drivers' mental states such as mental load and active fatigue, which may affect driving performance. We have used Bayesian networks (BNs) to explore the dependencies between various relevant random variables and assess the probability that a driver is in a particular mental state based on their physiological and demographic conditions. Through this approach, our goal is to improve our understanding of driver behaviour in dynamic environments, with potential applications in traffic safety and autonomous vehicle technologies.

[25] arXiv:2602.09651 [pdf, html, other]
Title: The Entropic Signature of Class Speciation in Diffusion Models
Florian Handke, Dejan Stančević, Felix Koulischer, Thomas Demeester, Luca Ambrogioni
Comments: 21 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Diffusion models do not recover semantic structure uniformly over time. Instead, samples transition from semantic ambiguity to class commitment within a narrow regime. Recent theoretical work attributes this transition to dynamical instabilities along class-separating directions, but practical methods to detect and exploit these windows in trained models are still limited. We show that tracking the class-conditional entropy of a latent semantic variable given the noisy state provides a reliable signature of these transition regimes. By restricting the entropy to semantic partitions, the entropy can furthermore resolve semantic decisions at different levels of abstraction. We analyze this behavior in high-dimensional Gaussian mixture models and show that the entropy rate concentrates on the same logarithmic time scale as the speciation symmetry-breaking instability previously identified in variance-preserving diffusion. We validate our method on EDM2-XS and Stable Diffusion 1.5, where class-conditional entropy consistently isolates the noise regimes critical for semantic structure formation. Finally, we use our framework to quantify how guidance redistributes semantic information over time. Together, these results connect information-theoretic and statistical physics perspectives on diffusion and provide a principled basis for time-localized control.

[26] arXiv:2602.09704 [pdf, html, other]
Title: Extended Isolation Forest with feature sensitivities
Illia Donhauzer
Comments: The automated classifier suggested cs.LG. We believe the paper is primarily machine learning theory, and we would appreciate cross-listing to cs.LG or stat.ML if deemed appropriate
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)

Compared to theoretical frameworks that assume equal sensitivity to deviations in all features of data, the theory of anomaly detection allowing for variable sensitivity across features is less developed. To the best of our knowledge, this issue has not yet been addressed in the context of isolation-based methods, and this paper represents the first attempt to do so. This paper introduces an Extended Isolation Forest with feature sensitivities, which we refer to as the Anisotropic Isolation Forest (AIF). In contrast to the standard EIF, the AIF enables anomaly detection with controllable sensitivity to deviations in different features or directions in the feature space. The paper also introduces novel measures of directional sensitivity, which allow quantification of AIF's sensitivity in different directions in the feature space. These measures enable adjustment of the AIF's sensitivity to task-specific requirements. We demonstrate the performance of the algorithm by applying it to synthetic and real-world datasets. The results show that the AIF enables anomaly detection that focuses on directions in the feature space where deviations from typical behavior are more important.

[27] arXiv:2602.09720 [pdf, html, other]
Title: Continual Learning for non-stationary regression via Memory-Efficient Replay
Pablo García-Santaclara, Bruno Fernández-Castro, Rebeca P. Díaz-Redondo, Martín Alonso-Gamarra
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Data streams are rarely static in dynamic environments like Industry 4.0. Instead, they constantly change, making traditional offline models outdated unless they can quickly adjust to the new data. This need can be adequately addressed by continual learning (CL), which allows systems to gradually acquire knowledge without incurring the prohibitive costs of retraining them from scratch. Most research on continual learning focuses on classification problems, while very few studies address regression tasks. We propose the first prototype-based generative replay framework designed for online task-free continual regression. Our approach defines an adaptive output-space discretization model, enabling prototype-based generative replay for continual regression without storing raw data. Evidence obtained from several benchmark datasets shows that our framework reduces forgetting and provides more stable performance than other state-of-the-art solutions.

[28] arXiv:2602.09731 [pdf, html, other]
Title: Bayesian identification of early warning signals for long-range dependent climatic time series
Sigrunn H. Sørbye, Eirik Myrvoll-Nilsen, Håvard Rue
Comments: 27 pages, 9 figures
Subjects: Methodology (stat.ME)

Detecting early warning signals in climatic time series is essential for anticipating critical transitions and tipping points. Common statistical indicators include increased variance and lag-one autocorrelation prior to bifurcation points. However, these indicators are sensitive to observational noise, long-term mean trends, and long-memory dependence, all of which are prevalent in climatic time series. Such effects can easily obscure genuine signals or generate spurious detections. To address these challenges, we employ a flexible Bayesian framework for modelling time-varying autocorrelation in long-range dependent time series, also accounting for time-varying variance. The approach uses a mixture of two fractional Gaussian noise processes with a time-dependent weight function to represent fractional Gaussian noise with a time-varying Hurst exponent. Inference is performed via integrated nested Laplace approximation, enabling joint estimation of mean trends and handling of irregularly sampled observations. The strengths and limitations of detecting changes in the autocorrelation are investigated in extensive simulations. Applied to real climatic data sets, we find evidence of early warning signals in a reconstructed Atlantic multidecadal variability index, while dismissing such signals for paleoclimate records spanning the Dansgaard-Oeschger events.

[29] arXiv:2602.09762 [pdf, html, other]
Title: Asymptotic analysis of the Gaussian kernel matrix for partially noisy data in high dimensions
Kensuke Aishima
Subjects: Statistics Theory (math.ST); Numerical Analysis (math.NA)

The Gaussian kernel is one of the most important kernels, applicable to many research fields, including scientific computing and data science. In this paper, we present asymptotic analysis of the Gaussian kernel matrix in high dimensions under a statistical model of noisy data. The main result combines Karoui's asymptotic analysis with procedures of constrained low rank matrix approximations. More specifically, Karoui clarified an important asymptotic structure of the Gaussian kernel matrix, leading to strong consistency of the eigenvectors, though the eigenvalues are inconsistent. This paper focuses on the above results and presents a consistent estimator with the use of the smallest eigenvalue, whenever the target kernel matrix tends to low rank in the asymptotic regime. Importantly, asymptotic analysis is given under a statistical model representing partial noise. Although a naive estimator is inconsistent, applying an optimization method for low rank approximations with constraints, we overcome the difficulty caused by the inconsistency, resulting in a new estimator with strong consistency in rank deficient cases.

[30] arXiv:2602.09833 [pdf, html, other]
Title: Density estimation from batched broken random samples
Hancheng Bi, Bernhard Schmitzer, Thilo D. Stier
Comments: 18 pages, 4 figures
Subjects: Statistics Theory (math.ST)

The broken random sample problem was first introduced by DeGroot, Feder, and Goel (1971, Ann. Math. Statist.): in each observation (batch), a random sample of $M$ i.i.d. point pairs $((X_i,Y_i))_{i=1}^M$ is drawn from a joint distribution with density $p(x,y)$, but we can observe only the unordered multisets $(X_i)_{i=1}^M$ and $(Y_i)_{i=1}^M$ separately; that is, the pairing information is lost. For large $M$, inferring $p$ from a single observation has been shown to be essentially impossible. In this paper, we propose a parametric method based on a pseudo-log-likelihood to estimate $p$ from $N$ i.i.d. broken sample batches, and we prove a fast convergence rate in $N$ for our estimator that is uniform in $M$, under mild assumptions.

[31] arXiv:2602.09845 [pdf, html, other]
Title: Estimating Individual Customer Lifetime Values with R: The CLVTools Package
Markus Meierer, Patrick Bachmann, Jeffrey Näf, Patrik Schilter, René Algesheimer
Subjects: Computation (stat.CO)

Customer lifetime value (CLV) describes a customer's long-term economic value for a business. This metric is widely used in marketing, for example, to select customers for a marketing campaign. However, modeling CLV is challenging. When relying on customers' purchase histories, the input data is sparse. Additionally, given its long-term focus, prediction horizons are often longer than estimation periods. Probabilistic models are able to overcome these challenges and, thus, are a popular option among researchers and practitioners. The latter also appreciate their applicability for both small and big data as well as their robust predictive performance without any fine-tuning requirements. Their popularity is due to three characteristics: data parsimony, scalability, and predictive accuracy. The R package CLVTools provides an efficient and user-friendly implementation framework to apply key probabilistic models such as the Pareto/NBD and Gamma-Gamma model. Further, it provides access to the latest model extensions to include time-invariant and time-varying covariates, parameter regularization, and equality constraints. This article gives an overview of the fundamental ideas of these statistical models and illustrates their application to derive CLV predictions for existing and new customers.

[32] arXiv:2602.09847 [pdf, html, other]
Title: Stabilized Maximum-Likelihood Iterative Quantum Amplitude Estimation for Structural CVaR under Correlated Random Fields
Alireza Tabarraei
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Conditional Value-at-Risk (CVaR) is a central tail-risk measure in stochastic structural mechanics, yet its accurate evaluation under high-dimensional, spatially correlated material uncertainty remains computationally prohibitive for classical Monte Carlo methods. Leveraging bounded-expectation reformulations of CVaR compatible with quantum amplitude estimation, we develop a quantum-enhanced inference framework that casts CVaR evaluation as a statistically consistent, confidence-constrained maximum-likelihood amplitude estimation problem. The proposed method extends iterative quantum amplitude estimation (IQAE) by embedding explicit maximum-likelihood inference within a rigorously controlled interval-tracking architecture. To ensure global correctness under finite-shot noise and the non-injective oscillatory response induced by Grover amplification, we introduce a stabilized inference scheme incorporating multi-hypothesis feasibility tracking, periodic low-depth disambiguation, and a bounded restart mechanism governed by an explicit failure-probability budget. This formulation preserves the quadratic oracle-complexity advantage of amplitude estimation while providing finite-sample confidence guarantees and reduced estimator variance. The framework is demonstrated on benchmark problems with spatially correlated lognormal Young's modulus fields generated using a Nystrom low-rank Gaussian kernel model. Numerical results show that the proposed estimator achieves substantially lower oracle complexity than classical Monte Carlo CVaR estimation at comparable confidence levels, while maintaining rigorous statistical reliability. This work establishes a practically robust and theoretically grounded quantum-enhanced methodology for tail-risk quantification in stochastic continuum mechanics.

[33] arXiv:2602.09911 [pdf, html, other]
Title: Doubly Robust Machine Learning for Population Size Estimation with Missing Covariates: Application to Gaza Conflict Mortality
Mateo Dulce Rubio, Edward H. Kennedy, Nicholas P. Jewell
Subjects: Methodology (stat.ME)

Population size estimation from capture-recapture data is central for studying hard-to-reach populations, incorporating auxiliary covariates to account for heterogeneous capture probabilities and recapture dependencies. However, missing attributes pose a critical methodological challenge due to reluctance to share sensitive information, data collection limitations, and imperfect record linkage. Existing approaches either ignore missingness or rely on a priori imputation, potentially introducing substantial bias. In this work, we develop a novel nonparametric estimation framework using a Missing at Random assumption to identify capture probabilities under missing covariates. Using semiparametric efficiency theory, we construct one-step estimators that combine efficiency, robustness, and finite-sample validity: they approximately achieve the nonparametric efficiency bound, accommodate flexible machine learning methods through a doubly robust structure, and provide approximately valid inference for any sample size. Simulations demonstrate substantial improvements over naive imputation approaches, with our doubly robust ML estimators maintaining valid inference even at high missingness rates where competing methods fail. We apply our methodology to re-estimate mortality in the Gaza Strip from October 7, 2023, to June 30, 2024, using three-list capture-recapture data with missing demographic information. Our approach yields more conservative yet precise estimates compared to previous methods, indicating the true death toll exceeds official statistics by approximately 26%. Our framework provides practitioners with principled tools for handling incomplete data in conflict settings and other applications with hard-to-reach populations.

[34] arXiv:2602.09936 [pdf, html, other]
Title: The Catastrophic Failure of The k-Means Algorithm in High Dimensions, and How Hartigan's Algorithm Avoids It
Roy R. Lederman, David Silva-Sánchez, Ziling Chen, Gilles Mordant, Amnon Balanov, Tamir Bendory
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

Lloyd's k-means algorithm is one of the most widely used clustering methods. We prove that in high-dimensional, high-noise settings, the algorithm exhibits catastrophic failure: with high probability, essentially every partition of the data is a fixed point. Consequently, Lloyd's algorithm simply returns its initial partition - even when the underlying clusters are trivially recoverable by other methods. In contrast, we prove that Hartigan's k-means algorithm does not exhibit this pathology. Our results show the stark difference between these algorithms and offer a theoretical explanation for the empirical difficulties often observed with k-means in high dimensions.
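
The difference between the two update rules can be seen even on a toy one-dimensional example. The sketch below uses the standard textbook forms of both rules: Lloyd reassigns a point only if another centroid is strictly closer, while Hartigan moves a point whenever the size-corrected change in the k-means objective is negative. The data and helper names are hypothetical, and this low-dimensional toy only illustrates the mechanism; it does not reproduce the high-dimensional regime analyzed in the paper.

```python
import numpy as np

def centroids(X, labels, k):
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])

def lloyd_is_fixed_point(X, labels, k):
    """True if every point is already assigned to a nearest centroid."""
    C = centroids(X, labels, k)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    own = d2[np.arange(len(X)), labels]
    return bool(np.all(own <= d2.min(axis=1) + 1e-12))

def hartigan_improving_move(X, labels, k):
    """Return (point index, new cluster) for an objective-lowering move, else None."""
    C = centroids(X, labels, k)
    sizes = np.bincount(labels, minlength=k)
    for i, x in enumerate(X):
        a = labels[i]
        if sizes[a] == 1:
            continue  # never empty a cluster
        # Objective decrease from removing x from its current cluster a.
        gain = sizes[a] / (sizes[a] - 1) * np.sum((x - C[a]) ** 2)
        for b in range(k):
            if b == a:
                continue
            # Objective increase from adding x to cluster b.
            cost = sizes[b] / (sizes[b] + 1) * np.sum((x - C[b]) ** 2)
            if cost < gain:
                return i, b
    return None

X = np.array([[0.0], [2.0], [3.5]])
labels = np.array([0, 0, 1])
stuck = lloyd_is_fixed_point(X, labels, 2)
move = hartigan_improving_move(X, labels, 2)
```

Here the partition {0, 2} and {3.5} is a fixed point for Lloyd's rule, yet moving the point 2.0 into the second cluster lowers the objective from 2.0 to 1.125, which Hartigan's rule detects.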

[35] arXiv:2602.09959 [pdf, other]
Title: Statistical-Computational Trade-offs in Learning Multi-Index Models via Harmonic Analysis
Hugo Latourelle-Vigeant, Theodor Misiakiewicz
Comments: 91 pages
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)

We study the problem of learning multi-index models (MIMs), where the label depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown $\mathsf{s}$-dimensional projection $\boldsymbol{W}_*^\mathsf{T} \boldsymbol{x} \in \mathbb{R}^\mathsf{s}$. Exploiting the equivariance of this problem under the orthogonal group $\mathcal{O}_d$, we obtain a sharp harmonic-analytic characterization of the learning complexity for MIMs with spherically symmetric inputs -- which refines and generalizes previous Gaussian-specific analyses. Specifically, we derive statistical and computational complexity lower bounds within the Statistical Query (SQ) and Low-Degree Polynomial (LDP) frameworks. These bounds decompose naturally across spherical harmonic subspaces. Guided by this decomposition, we construct a family of spectral algorithms based on harmonic tensor unfolding that sequentially recover the latent directions and (nearly) achieve these SQ and LDP lower bounds. Depending on the choice of harmonic degree sequence, these estimators can realize a broad range of trade-offs between sample and runtime complexity. From a technical standpoint, our results build on the semisimple decomposition of the $\mathcal{O}_d$-action on $L^2 (\mathbb{S}^{d-1})$ and the intertwining isomorphism between spherical harmonics and traceless symmetric tensors.

[36] arXiv:2602.09982 [pdf, html, other]
Title: Kelly Betting as Bayesian Model Evaluation: A Framework for Time-Updating Probabilistic Forecasts
Michael Beuoy
Comments: 31 pages, 10 figures
Subjects: Methodology (stat.ME)

This paper proposes a new way of evaluating the accuracy and validity of probabilistic forecasts that change over time (such as an in-game win probability model, or an election forecast). Under this approach, each model to be evaluated is treated as a canonical Kelly bettor, and the models are pitted against each other in an iterative betting contest. The growth or decline of each model's bankroll serves as the evaluation metric. Under this approach, market consensus probabilities and implied model credibilities can be updated real time as each model updates, and do not require one to wait for the final outcome. Using a simulation model, it will be shown that this method is in general more accurate than traditional average log-loss and Brier score methods at distinguishing a correct model from an incorrect model. This Kelly approach is shown to have a direct mathematical and conceptual analogue to Bayesian inference, with bankroll serving as a proxy for Bayesian credibility.
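
The analogy between bankroll and Bayesian credibility can be sketched for a single resolved binary event. The sketch assumes that the market probability is the bankroll-weighted average of the model probabilities and that each Kelly bettor's wealth is multiplied by the ratio of its own probability to the market probability for the realized outcome. These are standard properties of Kelly betting at market odds, but the paper's treatment of forecasts that update before the outcome is known may differ; `kelly_round` is a hypothetical name.

```python
import numpy as np

def kelly_round(bankrolls, probs, outcome):
    """One betting round on a binary event.

    bankrolls: current wealth of each model (any positive scale)
    probs:     each model's probability that the event occurs
    outcome:   1 if the event occurred, 0 otherwise
    Returns the updated bankrolls and the market consensus probability.
    """
    bankrolls = np.asarray(bankrolls, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Assumed market consensus: wealth-weighted average belief.
    market = np.dot(bankrolls, probs) / bankrolls.sum()
    model_lik = probs if outcome == 1 else 1.0 - probs
    market_lik = market if outcome == 1 else 1.0 - market
    # Same form as Bayes' rule: prior * likelihood / evidence.
    return bankrolls * model_lik / market_lik, market

new_bankrolls, market = kelly_round([0.5, 0.5], [0.8, 0.4], outcome=1)
```

Total wealth is conserved in each round, so normalized bankrolls behave like posterior model weights under this reading.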

[37] arXiv:2602.10012 [pdf, html, other]
Title: Doubly Robust Estimation of Desirability of Outcome Ranking (DOOR) Probability with Application to MDRO Studies
Shiyu Shu, Toshimitsu Hamasaki, Scott Evans, Lauren Komarow, David van Duin, Guoqing Diao
Subjects: Methodology (stat.ME)

In observational studies, adjusting for confounders is required if a treatment comparison is planned. A crude comparison of the primary endpoint without covariate adjustment will suffer from biases, and the addition of regression models could improve precision by incorporating imbalanced covariates and thus help make correct inference. Desirability of outcome ranking (DOOR) is a patient-centric benefit-risk evaluation methodology designed for randomized clinical trials. Still, robust covariate adjustment methods could further expand the compatibility of this method in observational studies. In DOOR analysis, each participant's outcome is ranked based on pre-specified clinical criteria, where the most desirable rank represents a good outcome with no side effects and the least desirable rank is the worst possible clinical outcome. We develop a causal framework for estimating the population-level DOOR probability, via the inverse probability of treatment weighting method, G-Computation method, and a Doubly Robust method that combines both. The performance of the proposed methodologies is examined through simulations. We also perform a causal analysis of the Multi-Drug Resistant Organism (MDRO) network within the Antibacterial Resistance Leadership Group (ARLG), comparing the benefit:risk between Mono-drug therapy and Combination-drug therapy.

[38] arXiv:2602.10018 [pdf, html, other]
Title: Online Selective Conformal Prediction with Asymmetric Rules: A Permutation Test Approach
Mingyi Zheng, Ying Jin
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)

Selective conformal prediction aims to construct prediction sets with valid coverage for a test unit conditional on it being selected by a data-driven mechanism. While existing methods in the offline setting handle any selection mechanism that is permutation invariant to the labeled data, their extension to the online setting -- where data arrives sequentially and later decisions depend on earlier ones -- is challenged by the fact that the selection mechanism is naturally asymmetric. As such, existing methods only address a limited collection of selection mechanisms.
In this paper, we propose PErmutation-based Mondrian Conformal Inference (PEMI), a general permutation-based framework for selective conformal prediction with arbitrary asymmetric selection rules. Motivated by full and Mondrian conformal prediction, PEMI identifies all permutations of the observed data (or a Monte-Carlo subset thereof) that lead to the same selection event, and calibrates a prediction set using conformity scores over this selection-preserving reference set. Under standard exchangeability conditions, our prediction sets achieve finite-sample exact selection-conditional coverage for any asymmetric selection mechanism and any prediction model. PEMI naturally incorporates additional offline labeled data, extends to selection mechanisms with multiple test samples, and achieves FCR control with fine-grained selection taxonomies. We further work out several efficient instantiations for commonly-used online selection rules, including covariate-based rules, conformal p/e-values-based procedures, and selection based on earlier outcomes. Finally, we demonstrate the efficacy of our methods across various selection rules on a real drug discovery dataset and investigate their performance via simulations.

[39] arXiv:2602.10026 [pdf, html, other]
Title: Degrees-of-Freedom Approximations for Conditional-Mean Inference in Random-Lot Stability Analysis
Andrew T. Karl, Heath Rushing, Richard K. Burdick, Jeff Hofer
Subjects: Methodology (stat.ME)

Linear mixed models are widely used for pharmaceutical stability trending when sufficient lots are available. Expiry support is typically based on whether lot-specific conditional-mean confidence limits remain within specification through a proposed expiry. These limits depend on the denominator degrees-of-freedom (DDF) method used for $t$-based inference. We document an operationally important boundary-proximal phenomenon: when a fitted random-effect variance component is close to zero, Satterthwaite DDF for conditional-mean predictions can collapse, inflating $t$ critical values and producing unnecessarily wide and sometimes nonmonotone pointwise confidence limits on scheduled time grids. In contrast, containment DDF yields stable degrees of freedom and avoids sharp discontinuities as variance components approach the boundary. Using a worked example and simulation studies, we show that DDF choice can materially change pass/fail conclusions even when observed data comfortably meet specifications. Containment-based inference with the full random-effects model provides a single modeling framework that avoids the discontinuities introduced by data-dependent model reduction at arbitrary cutoffs. When containment is unavailable, a 10\% variance-contribution reduction workflow mitigates extreme Satterthwaite behavior by simplifying the random-effects structure only when fitted contributions at the proposed expiry are negligible. An AICc step-down is also evaluated but is best treated as a sensitivity analysis, as it can be liberal when the margin between the mean trend and the specification limit at the proposed expiry is small.

[40] arXiv:2602.10055 [pdf, html, other]
Title: The weak law of large numbers for the friendship paradox index
Mingao Yuan
Subjects: Statistics Theory (math.ST)

The friendship paradox index is a network summary statistic used to quantify the friendship paradox, which describes the tendency for an individual's friends to have more friends than the individual. In this paper, we utilize Markov's inequality to derive the weak law of large numbers for the friendship paradox index in a random geometric graph, a widely used model for networks with spatial dependence and geometry. For the uniform random geometric graph, where the nodes are uniformly distributed in a space, the friendship paradox index is asymptotically equal to $1/4$. In contrast, in nonuniform random geometric graphs, the nonuniform node distribution leads to distinct limiting properties for the index. In the relatively sparse regime, the friendship paradox index is still asymptotically equal to $1/4$, the same as in the uniform case. In the intermediate sparse regime, however, the index converges in probability to $1/4$ plus a constant that is explicitly dependent on the node distribution. Finally, in the relatively dense case, the index diverges to infinity as the graph size increases. Our results highlight the sharp contrast between the uniform case and its nonuniform counterpart.

[41] arXiv:2602.10103 [pdf, html, other]
Title: Minimax properties of gamma kernel density estimators under $L^p$ loss and $β$-Hölder smoothness of the target
Frédéric Ouimet
Comments: 32 pages, 3 figures
Subjects: Statistics Theory (math.ST); Probability (math.PR)

This paper considers the asymptotic behavior in $\beta$-Hölder spaces, and under $L^p$ loss, of the gamma kernel density estimator introduced by Chen [Ann. Inst. Statist. Math. 52 (2000), 471-480] for the analysis of nonnegative data, when the target's support is assumed to be upper bounded. It is shown that this estimator can achieve the minimax rate asymptotically for a suitable choice of bandwidth whenever $(p,\beta)\in [1,3)\times(0,2]$ or $(p,\beta)\in [3,4)\times ((p-3)/(p-2),2]$. It is also shown that this estimator cannot be minimax when either $p\in [4,\infty)$ or $\beta\in (2,\infty)$.
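
For readers unfamiliar with the estimator, the sketch below evaluates a gamma kernel density estimate at a point, assuming the commonly cited form $\hat f(x) = n^{-1}\sum_{i} K_{x/b+1,\,b}(X_i)$, where $K_{\alpha,\theta}$ is the gamma density with shape $\alpha$ and scale $\theta$ and $b$ is the bandwidth. The exact variant analyzed in the paper may differ, and the function name is hypothetical.

```python
import math

def gamma_kde(x, data, b):
    """Gamma kernel density estimate at x >= 0 with bandwidth b > 0.

    Assumes all observations in `data` are strictly positive.
    """
    shape = x / b + 1.0
    total = 0.0
    for xi in data:
        # Log of the gamma density with shape `shape` and scale `b` at xi,
        # computed in log space for numerical stability.
        log_pdf = ((shape - 1.0) * math.log(xi) - xi / b
                   - shape * math.log(b) - math.lgamma(shape))
        total += math.exp(log_pdf)
    return total / len(data)

value_at_zero = gamma_kde(0.0, [1.0], b=1.0)
```

Because the kernel's shape varies with the evaluation point, the estimator places no mass on the negative half-line, which is the usual motivation for using it with nonnegative data.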

Cross submissions (showing 12 of 12 entries)

[42] arXiv:2602.09196 (cross-list from cs.LG) [pdf, html, other]
Title: Fair Feature Importance Scores via Feature Occlusion and Permutation
Camille Little, Madeline Navarro, Santiago Segarra, Genevera Allen
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

As machine learning models increasingly impact society, their opaque nature poses challenges to trust and accountability, particularly in fairness contexts. Understanding how individual features influence model outcomes is crucial for building interpretable and equitable models. While feature importance metrics for accuracy are well-established, methods for assessing feature contributions to fairness remain underexplored. We propose two model-agnostic approaches to measure fair feature importance. First, we propose to compare model fairness before and after permuting feature values. This simple intervention-based approach decouples a feature and model predictions to measure its contribution to training. Second, we evaluate the fairness of models trained with and without a given feature. This occlusion-based score enjoys dramatic computational simplification via minipatch learning. Our empirical results reflect the simplicity and effectiveness of our proposed metrics for multiple predictive tasks. Both methods offer simple, scalable, and interpretable solutions to quantify the influence of features on fairness, providing new tools for responsible machine learning development.

[43] arXiv:2602.09235 (cross-list from cs.LG) [pdf, html, other]
Title: RAPID: Risk of Attribute Prediction-Induced Disclosure in Synthetic Microdata
Matthias Templ, Oscar Thees, Roman Müller
Comments: 29 pages, 5 figures
Subjects: Machine Learning (cs.LG); Applications (stat.AP); Methodology (stat.ME)

Statistical data anonymization increasingly relies on fully synthetic microdata, for which classical identity disclosure measures are less informative than an adversary's ability to infer sensitive attributes from released data. We introduce RAPID (Risk of Attribute Prediction--Induced Disclosure), a disclosure risk measure that directly quantifies inferential vulnerability under a realistic attack model. An adversary trains a predictive model solely on the released synthetic data and applies it to real individuals' quasi-identifiers. For continuous sensitive attributes, RAPID reports the proportion of records whose predicted values fall within a specified relative error tolerance. For categorical attributes, we propose a baseline-normalized confidence score that measures how much more confident the attacker is about the true class than would be expected from class prevalence alone, and we summarize risk as the fraction of records exceeding a policy-defined threshold. This construction yields an interpretable, bounded risk metric that is robust to class imbalance, independent of any specific synthesizer, and applicable with arbitrary learning algorithms. We illustrate threshold calibration, uncertainty quantification, and comparative evaluation of synthetic data generators using simulations and real data. Our results show that RAPID provides a practical, attacker-realistic upper bound on attribute-inference disclosure risk that complements existing utility diagnostics and disclosure control frameworks.
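
The continuous-attribute version of the measure, as described above, reduces to a simple proportion. The sketch below assumes the attacker's predictions for real individuals are already available and measures relative error against the true value; the function name and example numbers are hypothetical, and the categorical score is not sketched because its exact normalization is not given in the abstract.

```python
import numpy as np

def relative_error_hit_rate(true_values, predicted_values, tolerance):
    """Share of records whose prediction is within `tolerance` relative error."""
    true_values = np.asarray(true_values, dtype=float)
    predicted_values = np.asarray(predicted_values, dtype=float)
    within = (np.abs(predicted_values - true_values)
              <= tolerance * np.abs(true_values))
    return float(within.mean())

# Two of the three predictions fall within 10% of the true value.
risk = relative_error_hit_rate([100.0, 200.0, 50.0], [105.0, 150.0, 50.0], 0.10)
```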

[44] arXiv:2602.09314 (cross-list from cs.LG) [pdf, html, other]
Title: Clarifying Shampoo: Adapting Spectral Descent to Stochasticity and the Parameter Trajectory
Runa Eschenhagen, Anna Cai, Tsung-Hsien Lee, Hao-Jun Michael Shi
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Optimizers leveraging the matrix structure in neural networks, such as Shampoo and Muon, are more data-efficient than element-wise algorithms like Adam and Signum. While in specific settings, Shampoo and Muon reduce to spectral descent analogous to how Adam and Signum reduce to sign descent, their general relationship and relative data efficiency under controlled settings remain unclear. Through extensive experiments on language models, we demonstrate that Shampoo achieves higher token efficiency than Muon, mirroring Adam's advantage over Signum. We show that Shampoo's update applied to weight matrices can be decomposed into an adapted Muon update. Consistent with this, Shampoo's benefits can be exclusively attributed to its application to weight matrices, challenging interpretations agnostic to parameter shapes. This admits a new perspective that also avoids shortcomings of related interpretations based on variance adaptation and whitening: rather than enforcing semi-orthogonality as in spectral descent, Shampoo's updates are time-averaged semi-orthogonal in expectation.
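
As a point of reference for the semi-orthogonality mentioned in the abstract above, the sketch below computes a semi-orthogonal direction from a gradient matrix with a full SVD. This is only an illustration of the concept. It is not claimed to be how Shampoo or Muon compute their updates, and the function name and matrix sizes are assumptions.

```python
import numpy as np

def semi_orthogonal_direction(G):
    """Replace all singular values of G by 1, giving U @ Vt from the thin SVD."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
G = rng.normal(size=(4, 3))          # stand-in for a weight-matrix gradient
O = semi_orthogonal_direction(G)
gram = O.T @ O                       # should be the 3x3 identity
alignment = float(np.sum(G * O))     # equals the nuclear norm of G
nuclear_norm = float(np.linalg.svd(G, compute_uv=False).sum())
```

The columns of `O` are orthonormal, and its inner product with `G` equals the sum of the singular values of `G`, which is why this direction is the steepest-descent step under the spectral norm.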

[45] arXiv:2602.09317 (cross-list from cs.LG) [pdf, html, other]
Title: SnareNet: Flexible Repair Layers for Neural Networks with Hard Constraints
Ya-Chi Chu, Alkiviades Boukas, Madeleine Udell
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Neural networks are increasingly used as surrogate solvers and control policies, but unconstrained predictions can violate physical, operational, or safety requirements. We propose SnareNet, a feasibility-controlled architecture for learning mappings whose outputs must satisfy input-dependent nonlinear constraints. SnareNet appends a differentiable repair layer that navigates in the constraint map's range space, steering iterates toward feasibility and producing a repaired output that satisfies constraints to a user-specified tolerance. To stabilize end-to-end training, we introduce adaptive relaxation, which designs a relaxed feasible set that snares the neural network at initialization and shrinks it into the feasible set, enabling early exploration and strict feasibility later in training. On optimization-learning and trajectory planning benchmarks, SnareNet consistently attains improved objective quality while satisfying constraints more reliably than prior work.

[46] arXiv:2602.09456 (cross-list from cs.LG) [pdf, html, other]
Title: Taming the Monster Every Context: Complexity Measure and Unified Framework for Offline-Oracle Efficient Contextual Bandits
Hao Qin, Chicheng Zhang
Comments: 40 pages (13 pages main body, 24 pages supplementary materials)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose an algorithmic framework, Offline Estimation to Decisions (OE2D), that reduces contextual bandit learning with general reward function approximation to offline regression. The framework allows near-optimal regret for contextual bandits with large action spaces with $O(\log T)$ calls to an offline regression oracle over $T$ rounds, and makes $O(\log\log T)$ calls when $T$ is known. The design of the OE2D algorithm generalizes Falcon~\citep{simchi2022bypassing} and its linear reward version~\citep[][Section 4]{xu2020upper} in that it chooses an action distribution that we term "exploitative F-design" that simultaneously guarantees low regret and good coverage that trades off exploration and exploitation. Central to our regret analysis is a new complexity measure, the Decision-Offline Estimation Coefficient (DOEC), which we show is bounded in bounded Eluder dimension per-context and smoothed regret settings. We also establish a relationship between DOEC and Decision Estimation Coefficient (DEC)~\citep{foster2021statistical}, bridging the design principles of offline- and online-oracle efficient contextual bandit algorithms for the first time.

[47] arXiv:2602.09566 (cross-list from cs.LG) [pdf, html, other]
Title: ECG-IMN: Interpretable Mesomorphic Neural Networks for 12-Lead Electrocardiogram Interpretation
Vajira Thambawita, Jonas L. Isaksen, Jørgen K. Kanters, Hugo L. Hammer, Pål Halvorsen
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Methodology (stat.ME)

Deep learning has achieved expert-level performance in automated electrocardiogram (ECG) diagnosis, yet the "black-box" nature of these models hinders their clinical deployment. Trust in medical AI requires not just high accuracy but also transparency regarding the specific physiological features driving predictions. Existing explainability methods for ECGs typically rely on post-hoc approximations (e.g., Grad-CAM and SHAP), which can be unstable, computationally expensive, and unfaithful to the model's actual decision-making process. In this work, we propose the ECG-IMN, an Interpretable Mesomorphic Neural Network tailored for high-resolution 12-lead ECG classification. Unlike standard classifiers, the ECG-IMN functions as a hypernetwork: a deep convolutional backbone generates the parameters of a strictly linear model specific to each input sample. This architecture enforces intrinsic interpretability, as the decision logic is mathematically transparent and the generated weights (W) serve as exact, high-resolution feature attribution maps. We introduce a transition decoder that effectively maps latent features to sample-wise weights, enabling precise localization of pathological evidence (e.g., ST-elevation, T-wave inversion) in both time and lead dimensions. We evaluate our approach on the PTB-XL dataset for classification tasks, demonstrating that the ECG-IMN achieves competitive predictive performance (AUROC comparable to black-box baselines) while providing faithful, instance-specific explanations. By explicitly decoupling parameter generation from prediction execution, our framework bridges the gap between deep learning capability and clinical trustworthiness, offering a principled path toward "white-box" cardiac diagnostics.

[48] arXiv:2602.09639 (cross-list from cs.LG) [pdf, html, other]
Title: Blind denoising diffusion models and the blessings of dimensionality
Zahra Kadkhodaie, Aram-Alexandre Pooladian, Sinho Chewi, Eero Simoncelli
Comments: 40 pages, 12 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We analyze, theoretically and empirically, the performance of generative diffusion models based on \emph{blind denoisers}, in which the denoiser is not given the noise amplitude in either the training or sampling processes. Assuming that the data distribution has low intrinsic dimensionality, we prove that blind denoising diffusion models (BDDMs), despite not having access to the noise amplitude, \emph{automatically} track a particular \emph{implicit} noise schedule along the reverse process. Our analysis shows that BDDMs can accurately sample from the data distribution in polynomially many steps as a function of the intrinsic dimension. Empirical results corroborate these mathematical findings on both synthetic and image data, demonstrating that the noise variance is accurately estimated from the noisy image. Remarkably, we observe that schedule-free BDDMs produce samples of higher quality compared to their non-blind counterparts. We provide evidence that this performance gain arises because BDDMs correct the mismatch between the true residual noise (of the image) and the noise assumed by the schedule used in non-blind diffusion models.

[49] arXiv:2602.09643 (cross-list from math.PR) [pdf, html, other]
Title: A simple proof of the discreteness of Dirichlet processes
Nils Lid Hjort
Comments: Based on pages 18-19 in N.L. Hjort's graduate thesis, 1976
Subjects: Probability (math.PR); Statistics Theory (math.ST)

That Dirichlet processes are discrete with probability 1 is demonstrated once more. And yes, these two pages spent fifty years in Norwegian.

[50] arXiv:2602.09969 (cross-list from cs.LG) [pdf, html, other]
Title: Causal Identification in Multi-Task Demand Learning with Confounding
Varun Gupta, Vijay Kamble
Subjects: Machine Learning (cs.LG); Econometrics (econ.EM); Machine Learning (stat.ML)

We study a canonical multi-task demand learning problem motivated by retail pricing, in which a firm seeks to estimate heterogeneous linear price-response functions across a large collection of decision contexts. Each context is characterized by rich observable covariates yet typically exhibits only limited historical price variation, motivating the use of multi-task learning to borrow strength across tasks. A central challenge in this setting is endogeneity: historical prices are chosen by managers or algorithms and may be arbitrarily correlated with unobserved, task-level demand determinants. Under such confounding by latent fundamentals, commonly used approaches, such as pooled regression and meta-learning, fail to identify causal price effects.
We propose a new estimation framework that achieves causal identification despite arbitrary dependence between prices and latent task structure. Our approach, Decision-Conditioned Masked-Outcome Meta-Learning (DCMOML), involves carefully designing the information set of a meta-learner to leverage cross-task heterogeneity while accounting for endogenous decision histories. Under a mild restriction on price adaptivity in each task, we establish that this method identifies the conditional mean of the task-specific causal parameters given the designed information set. Our results provide guarantees for large-scale demand estimation with endogenous prices and small per-task samples, offering a principled foundation for deploying causal, data-driven pricing models in operational environments.

[51] arXiv:2602.10014 (cross-list from cs.LG) [pdf, html, other]
Title: A Task-Centric Theory for Iterative Self-Improvement with Easy-to-Hard Curricula
Chenruo Liu, Yijun Dong, Yiqiu Shen, Qi Lei
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Iterative self-improvement fine-tunes an autoregressive large language model (LLM) on reward-verified outputs generated by the LLM itself. In contrast to the empirical success of self-improvement, the theoretical foundation of this generative, iterative procedure in a practical, finite-sample setting remains limited. We make progress toward this goal by modeling each round of self-improvement as maximum-likelihood fine-tuning on a reward-filtered distribution and deriving finite-sample guarantees for the expected reward. Our analysis reveals an explicit feedback loop where better models accept more data per iteration, supporting sustained self-improvement while explaining eventual saturation of such improvement. Adopting a task-centric view by considering reasoning tasks with multiple difficulty levels, we further prove quantifiable conditions on model initialization, task difficulty, and sample budget where easy-to-hard curricula provably achieve better guarantees than training on fixed mixtures of tasks. Our analyses are validated via Monte-Carlo simulations and controlled experiments on graph-based reasoning tasks.

[52] arXiv:2602.10045 (cross-list from cs.CV) [pdf, other]
Title: Conformal Prediction Sets for Instance Segmentation
Kerri Lu, Dan M. Kluger, Stephen Bates, Sherrie Wang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

Current instance segmentation models achieve high performance on average predictions, but lack principled uncertainty quantification: their outputs are not calibrated, and there is no guarantee that a predicted mask is close to the ground truth. To address this limitation, we introduce a conformal prediction algorithm to generate adaptive confidence sets for instance segmentation. Given an image and a pixel coordinate query, our algorithm generates a confidence set of instance predictions for that pixel, with a provable guarantee for the probability that at least one of the predictions has high Intersection-Over-Union (IoU) with the true object instance mask. We apply our algorithm to instance segmentation examples in agricultural field delineation, cell segmentation, and vehicle detection. Empirically, we find that our prediction sets vary in size based on query difficulty and attain the target coverage, outperforming existing baselines such as Learn Then Test, Conformal Risk Control, and morphological dilation-based methods. We provide versions of the algorithm with asymptotic and finite sample guarantees.

[53] arXiv:2602.10056 (cross-list from cs.LG) [pdf, html, other]
Title: WildCat: Near-Linear Attention in Theory and Practice
Tobias Schröder, Lester Mackey
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We introduce WildCat, a high-accuracy, low-cost approach to compressing the attention mechanism in neural networks. While attention is a staple of modern network architectures, it is also notoriously expensive to deploy due to resource requirements that scale quadratically with the input sequence length $n$. WildCat avoids these quadratic costs by only attending over a small weighted coreset. Crucially, we select the coreset using a fast but spectrally-accurate subsampling algorithm -- randomly pivoted Cholesky -- and weight the elements optimally to minimise reconstruction error. Remarkably, given bounded inputs, WildCat approximates exact attention with super-polynomial $O(n^{-\sqrt{\log(\log(n))}})$ error decay while running in near-linear $O(n^{1+o(1)})$ time. In contrast, prior practical approximations either lack error guarantees or require quadratic runtime to guarantee such high fidelity. We couple this advance with a GPU-optimized PyTorch implementation and a suite of benchmark experiments demonstrating the benefits of WildCat for image generation, image classification, and language model KV cache compression.
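
The abstract above names randomly pivoted Cholesky as the subsampling step. The sketch below shows the standard form of that algorithm on a generic positive semidefinite matrix. How WildCat applies it to attention and how it weights the selected coreset are not described in the abstract and are not shown here. The function name and the toy matrix are assumptions.

```python
import numpy as np

def rp_cholesky(A, k, seed=0):
    """Rank-k approximation A ~ F @ F.T of a PSD matrix via randomly pivoted Cholesky."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    F = np.zeros((n, k))
    d = np.diag(A).astype(float).copy()      # diagonal of the current residual
    pivots = []
    for i in range(k):
        if d.sum() <= 0:                     # residual is numerically zero
            break
        s = rng.choice(n, p=d / d.sum())     # pivot drawn proportionally to the residual diagonal
        g = A[:, s] - F[:, :i] @ F[s, :i]    # residual column at the pivot
        F[:, i] = g / np.sqrt(g[s])
        d = np.clip(d - F[:, i] ** 2, 0.0, None)
        pivots.append(int(s))
    return F, pivots

# Toy PSD matrix of rank 3 (assumed), so three pivots should recover it.
rng = np.random.default_rng(0)
B = rng.normal(size=(20, 3))
A = B @ B.T
F, pivots = rp_cholesky(A, k=3)
rel_err = np.linalg.norm(A - F @ F.T) / np.linalg.norm(A)
```

Each step reads only one column of `A`, so the cost is O(nk^2) rather than the O(n^2) needed to form the full matrix. The selected `pivots` play the role of the subsampled indices.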

Replacement submissions (showing 50 of 50 entries)

[54] arXiv:2310.01153 (replaced) [pdf, html, other]
Title: Measuring Evidence against Exchangeability and Group Invariance with E-values
Nick W. Koning
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)

We study e-values for quantifying evidence against exchangeability and general invariance of a random variable under a compact group. We start by characterizing such e-values, and explaining how they nest traditional group invariance tests as a special case. We show they can be easily designed for an arbitrary test statistic, and computed through Monte Carlo sampling. We prove a result that characterizes optimal e-values for group invariance against optimality targets that satisfy a mild orbit-wise decomposition property. We apply this to design expected-utility-optimal e-values for group invariance, which include both Neyman-Pearson-optimal tests and log-optimal e-values. Moreover, we generalize the notion of rank- and sign-based testing to compact groups, by using a representative inversion kernel. In addition, we characterize e-processes for group invariance for arbitrary filtrations, and provide tools to construct them. We also describe test martingales under a natural filtration, which are simpler to construct. Peeking beyond compact groups, we encounter e-values and e-processes based on ergodic theorems. These nest e-processes based on de Finetti's theorem for testing exchangeability.

[55] arXiv:2312.05319 (replaced) [pdf, html, other]
Title: Hyperbolic Network Latent Space Model with Learnable Curvature
Jinming Li, Gongjun Xu, Ji Zhu
Journal-ref: Journal of the American Statistical Association 2026
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

Network data is ubiquitous in various scientific disciplines, including sociology, economics, and neuroscience. Latent space models are often employed in network data analysis, but the geometric effect of latent space curvature remains a significant, unresolved issue. In this work, we propose a hyperbolic network latent space model with a learnable curvature parameter. We theoretically justify that learning the optimal curvature is essential to minimizing the embedding error across all hyperbolic embedding methods beyond network latent space models. A maximum-likelihood estimation strategy, employing manifold gradient optimization, is developed, and we establish the consistency and convergence rates for the maximum-likelihood estimators, both of which are technically challenging due to the non-linearity and non-convexity of the hyperbolic distance metric. We further demonstrate the geometric effect of latent space curvature and the superior performance of the proposed model through extensive simulation studies and an application using a Facebook friendship network.

[56] arXiv:2402.15004 (replaced) [pdf, html, other]
Title: Repro Samples Method for a Performance Guaranteed Inference in General and Irregular Inference Problems
Minge Xie, Peng Wang
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

Rapid advancements in data science require us to have fundamentally new frameworks to tackle prevalent but highly non-trivial "irregular" inference problems, to which the large sample central limit theorem does not apply. Typical examples are those involving discrete or non-numerical parameters and those involving non-numerical data, etc. In this article, we present an innovative, wide-reaching, and effective approach, called "repro samples method," to conduct statistical inference for these irregular problems plus more. The development relates to but improves several existing simulation-inspired inference approaches, and we provide both exact and approximate theories to support our development. Moreover, the proposed approach is broadly applicable and subsumes the classical Neyman-Pearson framework as a special case. For the often-seen irregular inference problems that involve both discrete/non-numerical and continuous parameters, we propose an effective three-step procedure to make inferences for all parameters. We also develop a unique matching scheme that turns the discreteness of discrete/non-numerical parameters from an obstacle for forming inferential theories into a beneficial attribute for improving computational efficiency. We demonstrate the effectiveness of the proposed general methodology using various examples, including a case study example on a Gaussian mixture model with unknown number of components. This case study example provides a solution to a long-standing open inference question in statistics on how to quantify the estimation uncertainty for the unknown number of components and other associated parameters. Real data and simulation studies, with comparisons to existing approaches, demonstrate the far superior performance of the proposed method.

[57] arXiv:2408.00955 (replaced) [pdf, html, other]
Title: Aggregation Models with Optimal Weights for Distributed Gaussian Processes
Haoyuan Chen, Rui Tuo
Comments: 34 pages, 8 figures, 2 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

Gaussian process (GP) models have received increasing attention in recent years due to their superb prediction accuracy and modeling flexibility. To address the computational burdens of GP models for large-scale datasets, distributed learning for GPs is often adopted. Current aggregation models for distributed GPs are not time-efficient when incorporating correlations between GP experts. In this work, we propose a novel approach for aggregated prediction in distributed GPs. The technique is suitable for both the exact and sparse variational GPs. The proposed method incorporates correlations among experts, leading to better prediction accuracy with manageable computational requirements. As demonstrated by empirical studies, the proposed approach results in more stable predictions in less time than state-of-the-art consistent aggregation models.

[58] arXiv:2412.06582 (replaced) [pdf, other]
Title: Optimal estimation in private distributed functional data analysis
Gengyu Xue, Zhenhua Lin, Yi Yu
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)

We systematically investigate the preservation of differential privacy in functional data analysis, beginning with functional mean estimation and extending to varying coefficient model estimation. Our work introduces a distributed learning framework involving multiple servers, each responsible for collecting several sparsely observed functions. This hierarchical setup introduces a mixed notion of privacy. Within each function, user-level differential privacy is applied to $m$ discrete observations. At the server level, central differential privacy is deployed to account for the centralised nature of data collection. Across servers, only private information is exchanged, adhering to federated differential privacy constraints. To address this complex hierarchy, we employ minimax theory to reveal several fundamental phenomena: from sparse to dense functional data analysis, from user-level to central and federated differential privacy costs, and the intricate interplay between different regimes of functional data analysis and privacy preservation.
To the best of our knowledge, this is the first study to rigorously examine functional data estimation under multiple privacy constraints. Our theoretical findings are complemented by efficient private algorithms and extensive numerical evidence, providing a comprehensive exploration of this challenging problem.

[59] arXiv:2502.14566 (replaced) [pdf, html, other]
Title: Feasible Dose-Response Curves for Continuous Treatments Under Positivity Violations
Han Bao, Michael Schomaker
Comments: 43 pages (30 without appendix), 8 figures
Subjects: Methodology (stat.ME); Applications (stat.AP)

Positivity violations can complicate estimation and interpretation of causal dose-response curves (CDRCs) for continuous interventions. Weighting-based methods are designed to handle limited overlap, but the resulting weighted targets can be hard to interpret scientifically. Modified treatment policies can be less sensitive to support limitations, yet they typically target policy-defined effects that may not align with the original dose-response question. We develop an approach that addresses limited overlap while remaining close to the scientific target of the CDRC. Our work is motivated by the CHAPAS-3 trial of HIV-positive children in Zambia and Uganda, where clinically relevant efavirenz concentration levels are not uniformly supported across covariate strata. We introduce a diagnostic, the non-overlap ratio, which quantifies, as a function of the target intervention level, the proportion of the population for whom that level is not supported given observed covariates. We also define an individualized most feasible intervention: for each child and target concentration, we retain the target when it is supported, and otherwise map it to the nearest supported concentration. The resulting feasible dose-response curve answers: if we try to set everyone to a given concentration, but it is not realistically attainable for some individuals, what outcome would be expected after shifting those individuals to their nearest attainable concentration? We propose a plug-in g-computation estimator that combines outcome regression with flexible conditional density estimation to learn supported regions and evaluate the feasible estimand. Simulations show reduced bias under positivity violations and recovery of the standard CDRC when support is adequate. An application to CHAPAS-3 yields a stable and interpretable concentration-response summary under realistic support constraints.
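
The "most feasible intervention" and the non-overlap ratio described in the abstract above can be illustrated with a small sketch. It assumes that each individual's supported concentrations are already available as a list of closed intervals, whereas the paper learns supported regions through conditional density estimation. All names and numbers are hypothetical.

```python
def most_feasible(target, supported_intervals):
    """Return target if it is supported, otherwise the nearest supported value."""
    best = None
    for lo, hi in supported_intervals:
        candidate = min(max(target, lo), hi)   # projection of target onto [lo, hi]
        if best is None or abs(candidate - target) < abs(best - target):
            best = candidate
    return best

def non_overlap_ratio(target, supports):
    """Share of individuals for whom the target level is not supported."""
    unsupported = sum(1 for s in supports if most_feasible(target, s) != target)
    return unsupported / len(supports)

# Hypothetical supported concentration ranges for two individuals.
supports = [
    [(1.0, 2.0), (5.0, 6.0)],   # a target of 3.0 is unsupported, nearest value is 2.0
    [(2.5, 4.0)],               # a target of 3.0 is supported
]
shifted = [most_feasible(3.0, s) for s in supports]
ratio = non_overlap_ratio(3.0, supports)
```

With these inputs `shifted` is `[2.0, 3.0]` and `ratio` is 0.5: the first individual is moved to the nearest attainable concentration and the second keeps the target.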

[60] arXiv:2502.18223 (replaced) [pdf, html, other]
Title: Principled priors for Bayesian inference of circular models
Xiang Ye, Janet Van Niekerk, Håvard Rue
Comments: 46 pages, 21 figures
Subjects: Methodology (stat.ME)

Advancements in computational power and methodologies have enabled research on massive datasets. However, tools for analyzing data with directional or periodic characteristics, such as wind directions and customers' arrival times on a 24-hour clock, remain underdeveloped. While statisticians have proposed circular distributions for such analyses, significant challenges persist in constructing circular statistical models, particularly in the context of Bayesian methods. These challenges stem from limited theoretical development and a lack of historical studies on prior selection for circular distribution parameters.
In this article, we propose a principled, practical and systematic framework for selecting priors that effectively prevents overfitting in circular scenarios, especially when there is insufficient information to guide prior selection. We introduce well-examined Penalized Complexity (PC) priors for the most widely used circular distributions. Comprehensive comparisons with existing priors in the literature are conducted through simulation studies and a practical case study. Finally, we discuss the contributions and implications of our work, providing a foundation for further advancements in constructing Bayesian circular statistical models.

[61] arXiv:2503.03065 (replaced) [pdf, html, other]
Title: Meta-analysis of median survival times with inverse-variance weighting
Sean McGrath, Cheng-Han Yang, Jonathan Kimmelman, Omer Ozturk, Russell Steele, Andrea Benedetti
Subjects: Methodology (stat.ME)

We consider the problem of meta-analyzing outcome measures based on median survival times. Primary studies with time-to-event outcomes often report estimates of median survival times and confidence intervals based on the Kaplan-Meier estimator. However, outcome measures based on median survival are rarely meta-analyzed, as standard inverse-variance weighted methods require within-study standard errors that are typically not reported. In this article, we consider an inverse-variance weighted approach to meta-analyze median survival times that estimates the within-study standard errors from the reported confidence intervals. We show that this method consistently estimates the standard error of median survival when applied to confidence intervals constructed by the Brookmeyer-Crowley method. We conduct a series of simulation studies evaluating the performance of this approach at the study level (i.e., for estimating the standard error of median survival) and the meta-analytic level (i.e., for estimating the pooled median, difference of medians, and ratio of medians) for commonly used confidence intervals for median survival, including the Brookmeyer-Crowley method and nonparametric bootstrap. We find that this approach often performs comparably to a benchmark approach that uses the true within-study standard errors for meta-analyzing median-based outcome measures when within-study sample sizes are moderately large (e.g., above 50). However, when the effective sample sizes are small, the method can yield biased estimates of within-study standard errors. We illustrate an application of this approach in a meta-analysis evaluating survival benefits of being assigned to experimental arms versus comparator arms in randomized trials for non-small cell lung cancer therapies.
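
One common way to recover a standard error from a reported confidence interval is to divide the interval width by twice the normal quantile, after which the studies can be pooled with inverse-variance weights. The sketch below assumes this Wald-type conversion and a fixed-effect model. The abstract does not state the exact formula the authors use, and the study values are invented for illustration.

```python
import math
from statistics import NormalDist

def se_from_ci(lower, upper, level=0.95):
    """Standard error implied by a symmetric normal-theory interval (assumed form)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)

def pool_fixed_effect(estimates, ses):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# Hypothetical median survival times (months) with 95% confidence intervals.
medians = [10.0, 12.0]
intervals = [(8.04, 11.96), (10.04, 13.96)]
ses = [se_from_ci(lo, hi) for lo, hi in intervals]
pooled, pooled_se = pool_fixed_effect(medians, ses)
```

Both intervals have the same width, so the studies receive equal weight and the pooled median is 11.0. Confidence intervals for medians are often asymmetric, which is one reason the abstract reports that the approach can be biased when effective sample sizes are small.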

[62] arXiv:2505.08654 (replaced) [pdf, other]
Title: Holistic Multi-Scale Inference of the Leverage Effect: Efficiency under Dependent Microstructure Noise
Ziyang Xiong, Zhao Chen, Christina Dan Wang
Subjects: Methodology (stat.ME); Econometrics (econ.EM); Statistical Finance (q-fin.ST)

This paper addresses the long-standing challenge of estimating the leverage effect from high-frequency data contaminated by dependent, non-Gaussian microstructure noise. We depart from the conventional reliance on pre-averaging or volatility "plug-in" methods by introducing a holistic multi-scale framework that operates directly on the leverage effect. We propose two novel estimators: the Subsampling-and-Averaging Leverage Effect (SALE) and the Multi-Scale Leverage Effect (MSLE). Central to our approach is a shifted window technique that constructs a noise-unbiased base estimator, significantly simplifying the multi-scale architecture. We provide a rigorous theoretical foundation for these estimators, establishing central limit theorems and stable convergence results that remain valid under both noise-free and dependent-noise settings. The primary contribution to estimation efficiency is a specifically designed weighting strategy for the MSLE estimator. By optimizing the weights based on the asymptotic covariance structure across scales and incorporating finite-sample variance corrections, we achieve substantial efficiency gains over existing benchmarks. Extensive simulation studies and an empirical analysis of 30 U.S. assets demonstrate that our framework consistently yields smaller estimation errors and superior performance in realistic, noisy market environments.

[63] arXiv:2505.17133 (replaced) [pdf, html, other]
Title: Learning Probabilities of Causation with Mask-Augmented Data
Shuai Wang, Yizhou Sun, Judea Pearl, Ang Li
Comments: arXiv admin note: text overlap with arXiv:2502.08858
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Probabilities of causation play a central role in modern decision making. Tian and Pearl first introduced formal definitions and derived tight bounds for three binary probabilities of causation, such as the probability of necessity and sufficiency (PNS). However, estimating these probabilities requires both experimental and observational distributions specific to each subpopulation, which are often unreliable or impractical to obtain from limited population-level data. To solve this problem, we propose two machine learning models: Exact-MLP and Mask-MLP, which are trained on a small set of reliable subpopulations and are able to predict PNS bounds for all other subpopulations. We validate our models across four Structural Causal Models (SCMs), each evaluated on population-level data with sample sizes between 100k and 200k. Our models achieve average mean absolute errors (MAEs) of roughly 0.03 on main tasks, reducing MAE by about 80% relative to the corresponding baselines. These results demonstrate both the feasibility of machine learning models for learning probabilities of causation and the effectiveness of the proposed approach.

[64] arXiv:2505.21208 (replaced) [pdf, html, other]
Title: Input Convex Kolmogorov Arnold Networks
Thomas Deschatre, Xavier Warin
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

This article presents an input convex neural network architecture using Kolmogorov-Arnold networks (ICKAN). Two specific networks are presented: the first is based on a low-order, linear-by-part, representation of functions, and a universal approximation theorem is provided. The second is based on cubic splines, for which only numerical results support convergence. We demonstrate on simple tests that these networks perform competitively with classical input convex neural networks (ICNNs). In a second part, we use the networks to solve some optimal transport problems needing a convex approximation of functions and demonstrate their effectiveness. Comparisons with ICNNs show that cubic ICKANs produce results similar to those of classical ICNNs.

[65] arXiv:2506.05776 (replaced) [pdf, html, other]
Title: Analyzing the retraining frequency of global forecasting models: towards more stable forecasting systems
Marco Zanotti
Subjects: Applications (stat.AP); Other Statistics (stat.OT)

Forecast stability, that is, the consistency of predictions over time, is essential in business settings where sudden shifts in forecasts can disrupt planning and erode trust in predictive systems. Despite its importance, stability is often overlooked in favor of accuracy. In this study, we evaluate the stability of point and probabilistic forecasts across several retraining scenarios using three large forecasting datasets and ten different global forecasting models. To analyze stability in the probabilistic setting, we propose a new model-agnostic, distribution-free, and scale-free metric that measures probabilistic stability: the Scaled Multi-Quantile Change (SMQC). The results show that less frequent retraining not only preserves but often improves forecast stability, challenging the need for frequent retraining. Moreover, the study shows that accuracy and stability are not necessarily conflicting objectives when adopting a global modeling approach. The study promotes a shift toward stability-aware forecasting practices, proposing a new metric to evaluate forecast stability effectively in probabilistic settings, and offering practical guidelines for building more stable and sustainable forecasting systems.

[66] arXiv:2506.05905 (replaced) [pdf, html, other]
Title: Sequential Monte Carlo approximations of Wasserstein--Fisher--Rao gradient flows
Francesca R. Crucinio, Sahani Pathiraja
Comments: Changes from v1: the study of tempered dynamics was removed in favour of a larger experimental section
Subjects: Methodology (stat.ME); Numerical Analysis (math.NA); Computation (stat.CO); Machine Learning (stat.ML)

We consider the problem of sampling from a probability distribution $\pi$. It is well known that this can be written as an optimisation problem over the space of probability distributions in which we aim to minimise the Kullback--Leibler divergence from $\pi$. We consider several partial differential equations (PDEs) whose solution is a minimiser of the Kullback--Leibler divergence from $\pi$ and connect them to well-known Monte Carlo algorithms. We focus in particular on PDEs obtained by considering the Wasserstein--Fisher--Rao geometry over the space of probabilities and show that these lead to a natural implementation using importance sampling and sequential Monte Carlo. We propose a novel algorithm to approximate the Wasserstein--Fisher--Rao flow of the Kullback--Leibler divergence and conduct an extensive empirical study to identify when these algorithms outperform other popular Monte Carlo algorithms.

[67] arXiv:2507.02552 (replaced) [pdf, html, other]
Title: Covariance scanning for adaptively optimal change point detection in high-dimensional linear models
Haeran Cho, Housen Li
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)

This paper investigates the detection and estimation of a single change in high-dimensional linear models. We derive minimax lower bounds for the detection boundary and the estimation rate, which uncover a phase transition governed by the sparsity of the covariance-weighted differential parameter. This form of "inherent sparsity" captures a delicate interplay between the covariance structure of the regressors and the change in regression coefficients on the detectability of a change point. Complementing the lower bounds, we introduce two covariance scanning-based methods, McScan and QcScan, which achieve minimax optimal performance (up to possible logarithmic factors) in the sparse and the dense regimes, respectively. In particular, QcScan is the first method shown to achieve consistency in the dense regime and further, we devise a combined procedure which is adaptively minimax optimal across sparse and dense regimes without the knowledge of the sparsity. Computationally, covariance scanning-based methods avoid costly computation of Lasso-type estimators and attain worst-case computation complexity that is linear in the dimension and sample size. Additionally, we consider the post-detection estimation of the differential parameter and the refinement of the change point estimator. Simulation studies support the theoretical findings and demonstrate the computational and statistical efficiency of the proposed covariance scanning methods.

[68] arXiv:2507.09093 (replaced) [pdf, html, other]
Title: Sharp High-Probability Rates for Nonlinear SGD under Heavy-Tailed Noise via Symmetrization
Aleksandar Armacki, Dragana Bajovic, Dusan Jakovetic, Soummya Kar
Comments: 43 pages, 1 figure
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

We study convergence in high-probability of SGD-type methods in non-convex optimization and the presence of heavy-tailed noise. To combat the heavy-tailed noise, a general black-box nonlinear framework is considered, subsuming nonlinearities like sign, clipping, normalization and their smooth counterparts. Our first result shows that nonlinear SGD (N-SGD) achieves the rate $\widetilde{\mathcal{O}}(t^{-1/2})$, for any noise with unbounded moments and a symmetric probability density function (PDF). Crucially, N-SGD has exponentially decaying tails, matching the performance of linear SGD under light-tailed noise. To handle non-symmetric noise, we propose two novel estimators, based on the idea of noise symmetrization. The first, dubbed Symmetrized Gradient Estimator (SGE), assumes a noiseless gradient at any reference point is available at the start of training, while the second, dubbed Mini-batch SGE (MSGE), uses mini-batches to estimate the noiseless gradient. Combined with the nonlinear framework, we get N-SGE and N-MSGE methods, respectively, both achieving the same convergence rate and exponentially decaying tails as N-SGD, while allowing for non-symmetric noise with unbounded moments and PDF satisfying a mild technical condition, with N-MSGE additionally requiring bounded noise moment of order $p \in (1,2]$. Compared to works assuming noise with bounded $p$-th moment, our results: 1) are based on a novel symmetrization approach; 2) provide a unified framework and relaxed moment conditions; 3) imply optimal oracle complexity of N-SGD and N-SGE, strictly better than existing works when $p < 2$, while the complexity of N-MSGE is close to existing works. Compared to works assuming symmetric noise with unbounded moments, we: 1) provide a sharper analysis and improved rates; 2) facilitate state-dependent symmetric noise; 3) extend the strong guarantees to non-symmetric noise.

[69] arXiv:2507.15529 (replaced) [pdf, html, other]
Title: Algorithms for Approximating Conditionally Optimal Bounds
George Bissias
Subjects: Computation (stat.CO)

This work develops algorithms for non-parametric confidence regions for samples from a univariate distribution whose support is a discrete mesh bounded on the left. We generalize the theory of Learned-Miller to preorders over the sample space. In this context, we show that the lexicographic low and lexicographic high orders are in some way extremal in the class of monotone preorders. From this theory we derive several approximation algorithms: 1) Closed form approximations for the lexicographic low and high orders with error tending to zero in the mesh size; 2) A polynomial-time approximation scheme for quantile orders with error tending to zero in the mesh size; 3) Monte Carlo methods for calculating quantile and lexicographic low orders applicable to any mesh size.

[70] arXiv:2508.13366 (replaced) [pdf, html, other]
Title: Monotonic Path-Specific Effects: Application to Estimating Educational Returns
Aleksei Opacic
Subjects: Applications (stat.AP); General Economics (econ.GN); Methodology (stat.ME)

Conventional research on educational effects typically either employs a "years of schooling" measure of education, or dichotomizes attainment as a point-in-time treatment. Yet, such a conceptualization of education is misaligned with the sequential process by which individuals make educational transitions. In this paper, I propose a causal mediation framework for the study of educational effects on outcomes such as earnings. The framework considers the effect of a given educational transition as operating indirectly, via progression through subsequent transitions, as well as directly, net of these transitions. I demonstrate that the average treatment effect (ATE) of education can be additively decomposed into mutually exclusive components that capture these direct and indirect effects. The decomposition has several special properties which distinguish it from conventional mediation decompositions of the ATE, properties which facilitate less restrictive identification assumptions as well as identification of all causal paths in the decomposition. An analysis of the returns to high school completion in the NLSY97 cohort suggests that the payoff to a high school degree stems overwhelmingly from its direct labor market returns. Mediation via college attendance, completion and graduate school attendance is small because of individuals' low counterfactual progression rates through these subsequent transitions.

[71] arXiv:2508.21536 (replaced) [pdf, html, other]
Title: Triply Robust Panel Estimators
Susan Athey, Guido Imbens, Zhaonan Qu, Davide Viviano
Subjects: Methodology (stat.ME); Econometrics (econ.EM)

This paper studies estimation of causal effects in a panel data setting. We introduce a new estimator, the Triply RObust Panel (TROP) estimator, that combines (i) a flexible model for the potential outcomes based on a low-rank factor structure on top of a two-way-fixed effect specification, with (ii) unit weights intended to upweight units similar to the treated units and (iii) time weights intended to upweight time periods close to the treated time periods. We study the performance of the estimator in a set of simulations designed to closely match several commonly studied real data sets. We find that there is substantial variation in the performance of the estimators across the settings considered. The proposed estimator outperforms two-way-fixed-effect/difference-in-differences, synthetic control, matrix completion and synthetic-difference-in-differences estimators. We investigate what features of the data generating process lead to this performance, and assess the relative importance of the three components of the proposed estimator. We have two recommendations. Our preferred strategy is that researchers use simulations closely matched to the data they are interested in, along the lines discussed in this paper, to investigate which estimators work well in their particular setting. A simpler approach is to use more robust estimators such as synthetic difference-in-differences or the new triply robust panel estimator which we find to substantially outperform two-way fixed effect estimators in many empirically relevant settings.

[72] arXiv:2509.09569 (replaced) [pdf, html, other]
Title: Measuring football fever through wearable technology: A case study on the German cup final
Timo Adam, Jonas Bauer, Christian Deutscher, Christiane Fuchs, Tamara Schamberger, David Winkelmann
Subjects: Applications (stat.AP)

Football is the world's most popular sport, evoking strong physiological and emotional responses among its fans. Yet, the specific dynamics of fan attachment to matches have received little attention in the literature. In this paper, we quantify these dynamics through a unique case study from professional football: the 2025 cup final of the German Football Association (DFB) between first-division club VfB Stuttgart and third-division club Arminia Bielefeld. We collected high-resolution smartwatch data, including heart rate and stress level, from 229 Arminia Bielefeld fans over approximately 12 weeks, complemented by survey responses on club attachment, match attendance, and personal characteristics from a subset of 37 participants. By combining physiological data with survey information, we analyse variations in emotional engagement across individuals and contexts, as well as physiological reactions to key match events. This approach provides rare, data-driven insights into the football fever that captivates fans during high-stakes competitions. Furthermore, we compare the vital parameters recorded on the day of the match with baseline levels on non-matchdays throughout the entire observation period. Our findings reveal pronounced physiological responses among fans, beginning hours before the match and peaking at kick-off.

[73] arXiv:2509.21996 (replaced) [pdf, html, other]
Title: A Nonparametric Discrete Hawkes Model with a Collapsed Gaussian-Process Prior
Trinnhallen Brisley, Gordon Ross, Daniel Paulin
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Hawkes process models are used in settings where past events increase the likelihood of future events occurring. Many applications record events as counts on a regular grid, yet discrete-time Hawkes models remain comparatively underused and are often constrained by fixed-form baselines and excitation kernels. In particular, there is a lack of flexible, nonparametric treatments of both the baseline and the excitation in discrete time. To this end, we propose the Gaussian Process Discrete Hawkes Process (GP-DHP), a nonparametric framework that places Gaussian process priors on both the baseline and the excitation and performs inference through a collapsed latent representation. This yields smooth, data-adaptive structure without prespecifying trends, periodicities, or decay shapes, and enables maximum a posteriori (MAP) estimation with near-linear-time \(O(T\log T)\) complexity. A closed-form projection recovers interpretable baseline and excitation functions from the optimized latent trajectory. In simulations, GP-DHP recovers diverse excitation shapes and evolving baselines. In case studies on U.S. terrorism incidents and weekly Cryptosporidiosis counts, it improves test predictive log-likelihood over standard parametric discrete Hawkes baselines while capturing bursts, delays, and seasonal background variation. The results indicate that flexible discrete-time self-excitation can be achieved without sacrificing scalability or interpretability.

[74] arXiv:2510.08174 (replaced) [pdf, other]
Title: Dimension-free Bounds for Covariance Estimation with Tensor-Train Structure
Artsiom Patarusau, Nikita Puchkin, Maxim Rakhuba, Fedor Noskov
Subjects: Statistics Theory (math.ST)

We consider a problem of covariance estimation from a sample of i.i.d. high-dimensional random vectors. To avoid the curse of dimensionality, we impose an additional assumption on the structure of the covariance matrix $\Sigma$. To be more precise, we study the case when $\Sigma$ can be approximated by a sum of double Kronecker products of smaller matrices in a tensor train (TT) format. Our setup naturally extends widely known Kronecker sum and CANDECOMP/PARAFAC models but admits richer interaction across modes. We suggest an iterative polynomial time algorithm based on TT-SVD and higher-order orthogonal iteration (HOOI) adapted to Tucker-2 hybrid structure. We derive non-asymptotic dimension-free bounds on the accuracy of covariance estimation taking into account hidden Kronecker product and tensor train structures. The efficiency of our approach is illustrated with numerical experiments.

[75] arXiv:2510.12636 (replaced) [pdf, html, other]
Title: Adapting Noise to Data: Generative Flows from 1D Processes
Jannis Chemseddine, Gregor Kornhardt, Richard Duong, Gabriele Steidl
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Analysis of PDEs (math.AP)

The default Gaussian latent in flow-based generative models poses challenges when learning certain distributions such as heavy-tailed ones. We introduce a general framework for learning data-adaptive latent distributions using one-dimensional quantile functions, optimized via the Wasserstein distance between noise and data. The quantile-based parameterization naturally adapts to both heavy-tailed and compactly supported distributions and shortens transport paths. Numerical results confirm the method's flexibility and effectiveness achieved with negligible computational overhead.

[76] arXiv:2510.15632 (replaced) [pdf, html, other]
Title: Robust estimation of polyserial correlation coefficients: A density power divergence approach
Max Welz
Comments: 69 pages (32 main text), 19 figures and 5 tables in total
Journal-ref: Forthcoming in Psychometrika (2026+)
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP)

The association between a continuous and an ordinal variable is commonly modeled through the polyserial correlation model. However, this model, which is based on a partially-latent normality assumption, may be misspecified in practice, due to, for example (but not limited to), outliers or careless responses. The typically used maximum likelihood (ML) estimator is highly susceptible to such misspecification: One single observation not generated by partially-latent normality can suffice to produce arbitrarily poor estimates. As a remedy, we propose a novel estimator of the polyserial correlation model designed to be robust against the adverse effects of observations discrepant to that model. The estimator leverages density power divergence estimation to achieve robustness by implicitly downweighting such observations; the ensuing weights constitute a useful tool for pinpointing potential sources of model misspecification. The proposed estimator generalizes ML and is consistent as well as asymptotically Gaussian. As the price for robustness, some efficiency must be sacrificed, but substantial robustness can be gained while maintaining more than 98% of ML efficiency. We demonstrate our estimator's robustness and practical usefulness in simulation experiments and an empirical application in personality psychology where our estimator helps identify outliers. Finally, the proposed methodology is implemented in free open-source software.

[77] arXiv:2512.01965 (replaced) [pdf, html, other]
Title: Predicting Onsets and Dry Spells of the West African Monsoon Season Using Machine Learning Methods
Colin Bobocea, Yves Atchadé
Subjects: Applications (stat.AP)

The beginning of the rainy season and the occurrence of dry spells in West Africa are notoriously difficult to predict; however, these are the key indicators farmers use to decide when to plant crops, and they have a major influence on overall yield. While many studies have shown correlations between global sea surface temperatures and characteristics of the West African monsoon season, there are few that effectively implement this information into machine learning (ML) prediction models. In this study we investigated the best ways to define our target variables, onset and dry spell, and produced methods to predict them for upcoming seasons using sea surface temperature teleconnections. Defining our target variables required the use of a combination of two well-known definitions of onset. We then applied custom statistical techniques -- like total variation regularization and predictor selection -- to the two models we constructed, the first being a linear model and the other an adaptive-threshold logistic regression model. We found mixed results for onset prediction, with spatial verification showing signs of significant skill, while temporal verification showed little to none. For dry spells, though, we found significant accuracy through the analysis of multiple binary classification metrics. These models overcome some limitations that current approaches have, such as being computationally intensive and needing bias correction. We also introduce this study as a framework to use ML methods for targeted prediction of certain weather phenomena using climatologically relevant variables. As we apply ML techniques to more problems, we see clear benefits for fields like meteorology and lay out a few new directions for further research.

[78] arXiv:2512.14609 (replaced) [pdf, html, other]
Title: Asymptotic Inference for Rank Correlations
Marc-Oliver Pohle, Jan-Lukas Wermuth, Christian H. Weiß
Subjects: Methodology (stat.ME); Econometrics (econ.EM)

Kendall's tau and Spearman's rho are widely used tools for measuring dependence. Surprisingly, when it comes to asymptotic inference for these rank correlations, some fundamental results and methods have not yet been developed, in particular for discrete random variables and in the time series case, and concerning variance estimation in general. Consequently, asymptotic confidence intervals are not available. We provide a comprehensive treatment of asymptotic inference for classical rank correlations, including Kendall's tau, Spearman's rho, Goodman-Kruskal's gamma, Kendall's tau-b, and grade correlation. We derive asymptotic distributions for both iid and time series data, resorting to asymptotic results for U-statistics, and introduce consistent variance estimators. This enables the construction of confidence intervals and tests, generalizes classical results for continuous random variables and leads to corrected versions of widely used tests of independence. We analyze the finite-sample performance of our variance estimators, confidence intervals, and tests in simulations and illustrate their use in case studies.
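
For readers unfamiliar with the two statistics named above, the following is a minimal sketch of the classical sample versions of Kendall's tau (the tau-a variant, which does not correct for ties) and Spearman's rho (Pearson correlation of average ranks). It does not reproduce the asymptotic distributions, variance estimators, or confidence intervals developed in the paper; the function names `kendall_tau_a`, `average_ranks`, and `spearman_rho` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            score += 1  # concordant pair
        elif prod < 0:
            score -= 1  # discordant pair; tied pairs contribute 0
    return score / (n * (n - 1) / 2)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(kendall_tau_a(x, y), spearman_rho(x, y))
```

In this toy example two of the ten pairs are discordant, so tau-a is (8 - 2) / 10; with discrete or tied data, which the abstract singles out as a gap in existing theory, tau-a and tau-b generally differ.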

[79] arXiv:2601.14049 (replaced) [pdf, html, other]
Title: Tail-Aware Density Forecasting of Locally Explosive Time Series: A Neural Network Approach
Elena Dumitrescu, Julien Peignon, Arthur Thomas
Subjects: Methodology (stat.ME)

This paper proposes a Mixture Density Network specifically designed for forecasting time series that exhibit locally explosive behavior. By incorporating skewed t-distributions as mixture components, our approach offers enhanced flexibility in capturing the skewed, heavy-tailed, and potentially multimodal nature of predictive densities associated with bubble dynamics modeled by mixed causal-noncausal ARMA processes. In addition, we implement an adaptive weighting scheme that emphasizes tail observations during training and hence leads to accurate density estimation in the extreme regions most relevant for financial applications. Equally important, once trained, the MDN produces near-instantaneous density forecasts. Through extensive Monte Carlo simulations and two empirical applications, on the natural gas price and inflation, we show that the proposed MDN-based framework delivers superior forecasting performance relative to existing approaches.

[80] arXiv:2601.17400 (replaced) [pdf, html, other]
Title: Variational autoencoder for inference of nonlinear mixed effect models based on ordinary differential equations
Zhe Li, Mélanie Prague, Rodolphe Thiébaut, Quentin Clairon
Subjects: Methodology (stat.ME)

We propose a variational autoencoder (VAE) approach for parameter estimation in nonlinear mixed-effects models based on ordinary differential equations (NLME-ODEs) using longitudinal data from multiple subjects. In moderate dimensions, likelihood-based inference via the stochastic approximation EM algorithm (SAEM) is widely used, but it relies on Markov Chain Monte-Carlo (MCMC) to approximate subject-specific posteriors. As model complexity increases or observations per subject are sparse and irregular, performance often deteriorates due to a complex, multimodal likelihood surface which may lead to MCMC convergence difficulties. We instead estimate parameters by maximizing the evidence lower bound (ELBO), a regularized surrogate for the marginal likelihood. A VAE with a shared encoder amortizes inference of subject-specific random effects by avoiding per-subject optimization and the use of MCMC. Beyond pointwise estimation, we quantify parameter uncertainty using an observed-information-based variance estimator and verify that practical identifiability of the model parameters is not compromised by nuisance parameters introduced in the encoder. We evaluate the method in three simulation case studies (pharmacokinetics, humoral response to vaccination, and TGF-$\beta$ activation dynamics in asthmatic airways) and on a real-world antibody kinetics dataset, comparing against SAEM baselines.

[81] arXiv:2601.19186 (replaced) [pdf, html, other]
Title: Double Fairness Policy Learning: Integrating Action Fairness and Outcome Fairness in Decision-making
Zeyu Bian, Lan Wang, Chengchun Shi, Zhengling Qi
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Fairness is a central pillar of trustworthy machine learning, especially in domains where accuracy- or profit-driven optimization is insufficient. While most fairness research focuses on supervised learning, fairness in policy learning remains less explored. Because policy learning is interventional, it induces two distinct fairness targets: action fairness (equitable action assignments) and outcome fairness (equitable downstream consequences). Crucially, equalizing actions does not generally equalize outcomes when groups face different constraints or respond differently to the same action. We propose a novel double fairness learning (DFL) framework that explicitly manages the trade-off among three objectives: action fairness, outcome fairness, and value maximization. We integrate fairness directly into a multi-objective optimization problem for policy learning and employ a lexicographic weighted Tchebyshev method that recovers Pareto solutions beyond convex settings, with theoretical guarantees on the regret bounds. Our framework is flexible and accommodates various commonly used fairness notions. Extensive simulations demonstrate improved performance relative to competing methods. In applications to a motor third-party liability insurance dataset and an entrepreneurship training dataset, DFL substantially improves both action and outcome fairness while incurring only a modest reduction in overall value.

[82] arXiv:2601.20152 (replaced) [pdf, other]
Title: Concentration Inequalities for Exchangeable Tensors and Matrix-valued Data
Chen Cheng, Rina Foygel Barber
Comments: 45 pages, 3 figures
Subjects: Statistics Theory (math.ST); Probability (math.PR); Machine Learning (stat.ML)

We study concentration inequalities for structured weighted sums of random data, including (i) tensor inner products and (ii) sequential matrix sums. We are interested in tail bounds and concentration inequalities for those structured weighted sums under exchangeability, extending beyond the classical framework of independent terms.
We develop Hoeffding and Bernstein bounds provided with structure-dependent exchangeability. Along the way, we recover known results for weighted sums of exchangeable random variables and i.i.d. sums of random matrices to the optimal constants. Notably, we develop a sharper concentration bound for combinatorial sums of matrix arrays than the results previously derived from Chatterjee's method of exchangeable pairs.
For applications, the richer structures provide us with novel analytical tools for estimating the average effect of multi-factor response models and studying fixed-design sketching methods in federated averaging. We apply our results to these problems, and find that our theoretical predictions are corroborated by numerical evidence.

[83] arXiv:2602.00716 (replaced) [pdf, html, other]
Title: Emergence of Distortions in High-Dimensional Guided Diffusion Models
Enrico Ventura, Beatrice Achilli, Luca Ambrogioni, Carlo Lucibello
Comments: 29 pages, 16 figures
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)

Classifier-free guidance (CFG) is the de facto standard for conditional sampling in diffusion models, yet it often leads to a loss of diversity in generated samples. We formalize this phenomenon as generative distortion, defined as the mismatch between the CFG-induced sampling distribution and the true conditional distribution. Considering Gaussian mixtures and their exact scores, and leveraging tools from statistical physics, we characterize the onset of distortion in a high-dimensional regime as a function of the number of classes. Our analysis reveals that distortions emerge through a phase transition in the effective potential governing the guided dynamics. In particular, our dynamical mean-field analysis shows that distortion persists when the number of modes grows exponentially with dimension, but vanishes in the sub-exponential regime. Consistent with prior finite-dimensional results, we further demonstrate that vanilla CFG shifts the mean and shrinks the variance of the conditional distribution. We show that standard CFG schedules are fundamentally incapable of preventing variance shrinkage. Finally, we propose a theoretically motivated guidance schedule featuring a negative-guidance window, which mitigates loss of diversity while preserving class separability.

[84] arXiv:2602.03970 (replaced) [pdf, other]
Title: Statistical Guarantees for Reasoning Probes on Looped Boolean Circuits
Anastasis Kratsios, Giulia Livieri, A. Martina Neuman
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Metric Geometry (math.MG); Statistics Theory (math.ST)

We study the statistical behaviour of reasoning probes in a stylized model of looped reasoning, given by Boolean circuits whose computational graph is a perfect $\nu$-ary tree ($\nu\ge 2$) and whose output is appended to the input and fed back iteratively for subsequent computation rounds. A reasoning probe has access to a sampled subset of internal computation nodes, possibly without covering the entire graph, and seeks to infer which $\nu$-ary Boolean gate is executed at each queried node, representing uncertainty via a probability distribution over a fixed collection of $\mathtt{m}$ admissible $\nu$-ary gates. This partial observability induces a generalization problem, which we analyze in a realizable, transductive setting.
We show that, when the reasoning probe is parameterized by a graph convolutional network (GCN)-based hypothesis class and queries $N$ nodes, the worst-case generalization error attains the optimal rate $\mathcal{O}(\sqrt{\log(2/\delta)}/\sqrt{N})$ with probability at least $1-\delta$, for $\delta\in (0,1)$. Our analysis combines snowflake metric embedding techniques with tools from statistical optimal transport. A key insight is that this optimal rate is achievable independently of graph size, owing to the existence of a low-distortion one-dimensional snowflake embedding of the induced graph metric. As a consequence, our results provide a sharp characterization of how structural properties of the computational graph govern the statistical efficiency of reasoning under partial access.

[85] arXiv:2602.07632 (replaced) [pdf, html, other]
Title: Scalable Mean-Field Variational Inference via Preconditioned Primal-Dual Optimization
Jinhua Lyu, Tianmin Yu, Ying Ma, Naichen Shi
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

In this work, we investigate the large-scale mean-field variational inference (MFVI) problem from a mini-batch primal-dual perspective. By reformulating MFVI as a constrained finite-sum problem, we develop a novel primal-dual algorithm based on an augmented Lagrangian formulation, termed primal-dual variational inference (PD-VI). PD-VI jointly updates global and local variational parameters in the evidence lower bound in a scalable manner. To further account for heterogeneous loss geometry across different variational parameter blocks, we introduce a block-preconditioned extension, P$^2$D-VI, which adapts the primal-dual updates to the geometry of each parameter block and improves both numerical robustness and practical efficiency. We establish convergence guarantees for both PD-VI and P$^2$D-VI under a properly chosen constant step size, without relying on conjugacy assumptions or explicit bounded-variance conditions. In particular, we prove $O(1/T)$ convergence to a stationary point in general settings and linear convergence under strong convexity. Numerical experiments on synthetic data and a real large-scale spatial transcriptomics dataset demonstrate that our methods consistently outperform existing stochastic variational inference approaches in terms of convergence speed and solution quality.

[86] arXiv:2602.07681 (replaced) [pdf, html, other]
Title: Mapping Drivers of Greenness: Spatial Variable Selection for MODIS Vegetation Indices
Qishi Zhan, Cheng-Han Yu, Yuchi Chen, Zhikang Dong, Rajarshi Guhaniyogi
Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI)

Understanding how environmental drivers relate to vegetation condition motivates spatially varying regression models, but estimating a separate coefficient surface for every predictor can yield noisy patterns and poor interpretability when many predictors are irrelevant. Motivated by MODIS vegetation index studies, we examine predictors from spectral bands, productivity and energy fluxes, observation geometry, and land surface characteristics. Because these relationships vary with canopy structure, climate, land use, and measurement conditions, methods should both model spatially varying effects and identify where predictors matter. We propose a spatially varying coefficient model where each coefficient surface uses a tensor product B-spline basis and a Bayesian group lasso prior on the basis coefficients. This prior induces predictor level shrinkage, pushing negligible effects toward zero while preserving spatial structure. Posterior inference uses Markov chain Monte Carlo and provides uncertainty quantification for each effect surface. We summarize retained effects with spatial significance maps that mark locations where the 95 percent posterior credible interval excludes zero, and we define a spatial coverage probability as the proportion of locations where the credible interval excludes zero. Simulations recover sparsity and achieve prediction. A MODIS application yields a parsimonious subset of predictors whose effect maps clarify dominant controls across landscapes.
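
The two summaries defined in the abstract above (a spatial significance map and a spatial coverage probability) can be sketched as follows. This is only an illustration that assumes the per-location 95 percent credible interval bounds have already been computed from posterior draws; the function names `significance_map` and `spatial_coverage` and the toy numbers are hypothetical and are not taken from the paper.

```python
def significance_map(lower, upper):
    """Flag each location whose credible interval excludes zero.

    lower, upper: per-location interval bounds (assumed precomputed,
    e.g. the 2.5% and 97.5% posterior quantiles of one coefficient).
    """
    return [lo > 0 or hi < 0 for lo, hi in zip(lower, upper)]


def spatial_coverage(lower, upper):
    """Proportion of locations where the credible interval excludes zero."""
    flags = significance_map(lower, upper)
    return sum(flags) / len(flags)


# Toy bounds for one predictor at four locations (made-up values).
lower = [0.1, -0.2, -0.5, 0.3]
upper = [0.4, 0.1, -0.1, 0.9]
print(significance_map(lower, upper))
print(spatial_coverage(lower, upper))
```

Here three of the four intervals exclude zero, so the coverage for this predictor is 0.75; a predictor shrunk toward zero by the group lasso prior would be expected to have coverage near 0.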

[87] arXiv:2406.05637 (replaced) [pdf, other]
Title: A Generalized Version of Chung's Lemma and its Applications
Li Jiang, Xiao Li, Andre Milzarek, Junwen Qiu
Comments: 38 pages
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)

Chung's Lemma is a classical tool for establishing asymptotic convergence rates of (stochastic) optimization methods under strong convexity-type assumptions and appropriate polynomial diminishing step sizes. In this work, we develop a generalized version of Chung's Lemma, which provides a simple non-asymptotic convergence framework for a more general family of step size rules. We demonstrate broad applicability of the proposed generalized lemma by deriving tight non-asymptotic convergence rates for a large variety of stochastic methods. In particular, we obtain partially new non-asymptotic complexity results for stochastic optimization methods, such as Stochastic Gradient Descent (SGD) and Random Reshuffling (RR), under a general $(\theta,\mu)$-Polyak-Lojasiewicz (PL) condition and for various step size strategies, including polynomial, constant, exponential, and cosine step size rules. Notably, as a by-product of our analysis, we observe that exponential step sizes exhibit superior adaptivity to both landscape geometry and gradient noise; specifically, they achieve optimal convergence rates without requiring exact knowledge of the underlying landscape or separate parameter selection strategies for noisy and noise-free regimes. Our results demonstrate that the developed variant of Chung's Lemma offers a versatile, systematic, and streamlined approach to establish non-asymptotic convergence rates under general step size rules.
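
As a rough illustration of the four step size families named in the abstract, the sketch below writes each one as a function of the iteration index. The exact parameterizations (`alpha0`, `p`, `final_ratio`, `total`) are common textbook forms chosen here as assumptions; the paper may define its schedules differently.

```python
import math

# Hypothetical parameterizations of four step size rules (assumptions, not
# taken from the paper): k is the iteration index, total is the horizon.

def polynomial_step(k, alpha0=0.1, p=1.0):
    """Polynomially diminishing rule: alpha0 / (k + 1)^p."""
    return alpha0 / (k + 1) ** p

def constant_step(k, alpha0=0.1):
    """Constant rule: the same step at every iteration."""
    return alpha0

def exponential_step(k, total, alpha0=0.1, final_ratio=0.01):
    """Geometric decay from alpha0 at k=0 to alpha0*final_ratio at k=total."""
    return alpha0 * final_ratio ** (k / total)

def cosine_step(k, total, alpha0=0.1):
    """Cosine decay from alpha0 at k=0 to 0 at k=total."""
    return 0.5 * alpha0 * (1.0 + math.cos(math.pi * k / total))

if __name__ == "__main__":
    total = 100
    for k in (0, 50, 100):
        print(k,
              polynomial_step(k),
              constant_step(k),
              exponential_step(k, total),
              cosine_step(k, total))
```

The exponential and cosine rules need the horizon `total` up front, whereas the polynomial and constant rules do not; that difference is one reason the schedules are usually analyzed separately.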

[88] arXiv:2504.03560 (replaced) [pdf, html, other]
Title: Stochastic Optimization with Optimal Importance Sampling
Liviu Aolaritei, Bart P.G. Van Parys, Henry Lam, Michael I. Jordan
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

Importance Sampling (IS) is a widely used variance reduction technique for enhancing the efficiency of Monte Carlo methods, particularly in rare-event simulation and related applications. Despite its effectiveness, the performance of IS is highly sensitive to the choice of the proposal distribution and often requires stochastic calibration. While the design and analysis of IS have been extensively studied in estimation settings, applying IS within stochastic optimization introduces a lesser-known fundamental challenge: the decision variable and the importance sampling distribution are mutually dependent, creating a circular optimization structure. This interdependence complicates both convergence analysis and variance control. In this paper, we consider the generic setting of convex stochastic optimization with linear constraints. We propose a single-loop stochastic approximation algorithm, based on a variant of Nesterov's dual averaging, that jointly updates the decision variable and the importance sampling distribution, notably without time-scale separation or nested optimization. The method is globally convergent and achieves the minimal asymptotic variance among stochastic gradient schemes, which moreover matches the performance of an oracle sampler adapted to the optimal solution and thus effectively resolves the circular optimization challenge.
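
To make the sensitivity to the proposal distribution concrete, here is a minimal sketch of plain importance sampling for a rare-event probability. It is only the estimation setting the abstract contrasts against, not the paper's optimization algorithm; the target (a standard normal tail), the mean-shifted Gaussian proposal, and all function names are assumptions chosen for illustration.

```python
import math
import random

def exact_tail(threshold):
    """P(X > threshold) for X ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))

def naive_tail_estimate(threshold, n, rng):
    """Plain Monte Carlo: almost every sample misses a rare event."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n

def is_tail_estimate(threshold, n, rng):
    """Importance sampling with proposal N(threshold, 1).

    The likelihood ratio of N(0,1) to N(threshold,1) at x is
    exp(-threshold * x + threshold^2 / 2).
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n

rng = random.Random(0)
threshold = 4.0
truth = exact_tail(threshold)
naive_est = naive_tail_estimate(threshold, 20000, rng)
is_est = is_tail_estimate(threshold, 20000, rng)

if __name__ == "__main__":
    print("exact:", truth)
    print("naive Monte Carlo:", naive_est)
    print("importance sampling:", is_est)
```

In this sketch the proposal is fixed in advance. In the optimization setting described in the abstract, a good proposal depends on the decision variable, which is itself being updated; that is the circular dependence the paper addresses.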

[89] arXiv:2505.10919 (replaced) [pdf, html, other]
Title: A Physics-Informed Spatiotemporal Deep Learning Framework for Turbulent Systems
Luca Menicali, Andrew Grace, David H. Richter, Stefano Castruccio
Subjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG); Machine Learning (stat.ML)

Fluid thermodynamics underpins atmospheric dynamics, climate science, industrial applications, and energy systems. However, direct numerical simulations (DNS) of such systems can be computationally prohibitive. To address this, we present a novel physics-informed spatiotemporal surrogate model for Rayleigh-Benard convection (RBC), a canonical example of convective fluid flow. Our approach combines convolutional neural networks, for spatial dimension reduction, with an innovative recurrent architecture, inspired by large language models, to model long-range temporal dynamics. Inference is penalized with respect to the governing partial differential equations to ensure physical interpretability. Since RBC exhibits turbulent behavior, we quantify uncertainty using a conformal prediction framework. This model replicates key physical features of RBC dynamics while significantly reducing computational cost, offering a scalable alternative to DNS for long-term simulations.

[90] arXiv:2505.18879 (replaced) [pdf, other]
Title: Efficient Online Random Sampling via Randomness Recycling
Thomas L. Draper, Feras A. Saad
Journal-ref: Proceedings of the 2026 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 2473-2511. Society for Industrial and Applied Mathematics, 2026
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Information Theory (cs.IT); Probability (math.PR); Computation (stat.CO)

This article studies the fundamental problem of using i.i.d. coin tosses from an entropy source to efficiently generate random variables $X_i \sim P_i$ $(i \ge 1)$, where $(P_1, P_2, \dots)$ is a random sequence of rational discrete probability distributions subject to an \textit{arbitrary} stochastic process. Our method achieves an amortized expected entropy cost within $\varepsilon > 0$ bits of the information-theoretically optimal Shannon lower bound using $O(\log(1/\varepsilon))$ space. This result holds both pointwise in terms of the Shannon information content conditioned on $X_i$ and $P_i$, and in expectation to obtain a rate of $\mathbb{E}[H(P_1) + \dots + H(P_n)]/n + \varepsilon$ bits per sample as $n \to \infty$ (where $H$ is the Shannon entropy). The combination of space, time, and entropy properties of our method improves upon the Knuth and Yao (1976) entropy-optimal algorithm and Han and Hoshi (1997) interval algorithm for online sampling, which require unbounded space. It also uses exponentially less space than the more specialized methods of Kozen and Soloviev (2022) and Shao and Wang (2025) that generate i.i.d. samples from a fixed distribution. Our online sampling algorithm rests on a powerful algorithmic technique called \textit{randomness recycling}, which reuses a fraction of the random information consumed by a probabilistic algorithm to reduce its amortized entropy cost.
On the practical side, we develop randomness recycling techniques to accelerate a variety of prominent sampling algorithms. We show that randomness recycling enables state-of-the-art runtime performance on the Fisher-Yates shuffle when using a cryptographically secure pseudorandom number generator, and that it reduces the entropy cost of discrete Gaussian sampling. Accompanying the manuscript is a performant software library in the C programming language.

[91] arXiv:2505.19013 (replaced) [pdf, html, other]
Title: Faithful Group Shapley Value
Kiljae Lee, Ziqi Liu, Weijing Tang, Yuan Zhang
Comments: Accepted to NeurIPS 2025
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); General Economics (econ.GN); Machine Learning (stat.ML)

Data Shapley is an important tool for data valuation, which quantifies the contribution of individual data points to machine learning models. In practice, group-level data valuation is desirable when data providers contribute data in batch. However, we identify that existing group-level extensions of Data Shapley are vulnerable to shell company attacks, where strategic group splitting can unfairly inflate valuations. We propose Faithful Group Shapley Value (FGSV) that uniquely defends against such attacks. Building on original mathematical insights, we develop a provably fast and accurate approximation algorithm for computing FGSV. Empirical experiments demonstrate that our algorithm significantly outperforms state-of-the-art methods in computational efficiency and approximation accuracy, while ensuring faithful group-level valuation.
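
The splitting incentive described above can be reproduced in a toy game. The sketch below computes exact Shapley values by enumerating orderings, treating each group as one player. The utility (square root of the total number of contributed points) and the group sizes are hypothetical choices, and this is the naive group-as-player extension, not FGSV.

```python
from itertools import permutations
import math

def shapley_values(players, utility):
    """Exact Shapley values by averaging marginal gains over all orderings."""
    values = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = []
        for p in order:
            before = utility(coalition)
            coalition.append(p)
            values[p] += utility(coalition) - before
    return {p: v / len(orderings) for p, v in values.items()}

def make_utility(sizes):
    """Hypothetical concave utility: sqrt of the number of pooled data points."""
    def utility(coalition):
        return math.sqrt(sum(sizes[p] for p in coalition))
    return utility

# Honest setting: providers A and B each contribute two points.
honest = {"A": 2, "B": 2}
honest_vals = shapley_values(list(honest), make_utility(honest))

# Provider A splits into two shell groups holding one point each.
split = {"A1": 1, "A2": 1, "B": 2}
split_vals = shapley_values(list(split), make_utility(split))

a_before = honest_vals["A"]
a_after = split_vals["A1"] + split_vals["A2"]

if __name__ == "__main__":
    print("A's value when honest:", a_before)
    print("A's combined value after splitting:", a_after)
```

The data and the total utility are unchanged, yet A's combined valuation rises after the split while B's falls, which is the kind of inflation a faithful group-level valuation is meant to rule out.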

[92] arXiv:2506.13865 (replaced) [pdf, other]
Title: Connecting phases of matter to the flatness of the loss landscape in analog variational quantum algorithms
Kasidit Srimahajariyapong, Supanut Thanasilp, Thiparat Chotibut
Comments: 17+9 pages, 9+7 figures
Subjects: Quantum Physics (quant-ph); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Variational quantum algorithms (VQAs) promise near-term quantum advantage, yet parametrized quantum states commonly built from the digital gate-based approach often suffer from scalability issues such as barren plateaus, where the loss landscape becomes flat. We study an analog VQA ansatz composed of $M$ quenches of a disordered Ising chain, whose dynamics is native to several quantum simulation platforms. By tuning the disorder strength we place each quench in either a thermalized phase or a many-body-localized (MBL) phase and analyse (i) the ansatz's expressivity and (ii) the scaling of loss variance. Numerics show that both phases reach maximal expressivity at large $M$, but barren plateaus emerge at far smaller $M$ in the thermalized phase than in the MBL phase. Exploiting this gap, we propose an MBL initialisation strategy: initialise the ansatz in the MBL regime at intermediate quench $M$, enabling an initial trainability while retaining sufficient expressivity for subsequent optimization. The results link quantum phases of matter and VQA trainability, and provide practical guidelines for scaling analog-hardware VQAs.

[93] arXiv:2506.22499 (replaced) [pdf, html, other]
Title: Scalable Dynamic Origin-Destination Demand Estimation Enhanced by High-Resolution Satellite Imagery Data
Jiachao Liu, Pablo Guarda, Koichiro Niinuma, Sean Qian
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Applications (stat.AP)

This study presents a novel integrated framework for dynamic origin-destination demand estimation (DODE) in multi-class mesoscopic network models, incorporating high-resolution satellite imagery together with conventional traffic data from local sensors. Unlike sparse local detectors, satellite imagery offers consistent, city-wide road and traffic information of both parking and moving vehicles, overcoming data availability limitations. To extract information from imagery data, we design a computer vision pipeline for class-specific vehicle detection and map matching, generating link-level traffic density observations by vehicle class. Building upon this information, we formulate a computational graph-based DODE framework that calibrates dynamic network states by jointly matching observed traffic counts/speeds from local sensors with density measurements derived from satellite imagery. To assess the accuracy and robustness of the proposed framework, we conduct a series of numerical experiments using both synthetic and real-world data. The results demonstrate that supplementing traditional data with satellite-derived density significantly improves estimation performance, especially for links without local sensors. Real-world experiments also show the framework's potential for practical deployment on large-scale networks. Sensitivity analysis further evaluates the impact of data quality related to satellite imagery data.

[94] arXiv:2507.05526 (replaced) [pdf, html, other]
Title: Estimating Interventional Distributions with Uncertain Causal Graphs through Meta-Learning
Anish Dhir, Cristiana Diaconu, Valentinian Mihai Lungu, James Requeima, Richard E. Turner, Mark van der Wilk
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

In scientific domains -- from biology to the social sciences -- many questions boil down to \textit{What effect will we observe if we intervene on a particular variable?} If the causal relationships (e.g., a causal graph) are known, it is possible to estimate the intervention distributions. In the absence of this domain knowledge, the causal structure must be discovered from the available observational data. However, observational data are often compatible with multiple causal graphs, making methods that commit to a single structure prone to overconfidence. A principled way to manage this structural uncertainty is via Bayesian inference, which averages over a posterior distribution on possible causal structures and functional mechanisms. Unfortunately, the number of causal structures grows super-exponentially with the number of nodes in the graph, making computations intractable. We propose to circumvent these challenges by using meta-learning to create an end-to-end model: the Model-Averaged Causal Estimation Transformer Neural Process (MACE-TNP). The model is trained to predict the Bayesian model-averaged interventional posterior distribution, and its end-to-end nature bypasses the need for expensive calculations. Empirically, we demonstrate that MACE-TNP outperforms strong Bayesian baselines. Our work establishes meta-learning as a flexible and scalable paradigm for approximating complex Bayesian causal inference, which can be scaled to increasingly challenging settings in the future.

[95] arXiv:2507.06556 (replaced) [pdf, html, other]
Title: Spectra of high-dimensional sparse random geometric graphs
Yifan Cao, Yizhe Zhu
Comments: 26 pages, 4 figures
Subjects: Probability (math.PR); Combinatorics (math.CO); Statistics Theory (math.ST)

We analyze the spectral properties of the high-dimensional random geometric graph $G(n, d, p)$, formed by sampling $n$ i.i.d. vectors $\{v_i\}_{i=1}^{n}$ uniformly on a $d$-dimensional unit sphere and connecting each pair $\{i,j\}$ whenever $\langle v_i, v_j \rangle \geq \tau$ so that $p=\mathbb P(\langle v_i,v_j\rangle \geq \tau)$. This model defines a nonlinear random matrix ensemble with dependent entries. We show that if $d =\omega( np\log^{2}(1/p))$ and $np\to\infty$, the limiting spectral distribution of the normalized adjacency matrix $\frac{A}{\sqrt{np(1-p)}}$ is the semicircle law. To our knowledge, this is the first such result for $G(n, d, p)$ in the sparse regime. In the constant sparsity case $p=\alpha/n$, we further show that if $d=\omega(\log^2(n))$ the limiting spectral distribution of $A$ in $G(n,d,\alpha/n)$ coincides with that of the Erdős-Rényi graph $G(n,\alpha/n)$.
Our approach combines the classical moment method in random matrix theory with a novel recursive decomposition of closed-walk graphs, leveraging block-cut trees and ear decompositions, to control the moments of the empirical spectral distribution. A refined high trace analysis further yields a near-optimal bound on the second eigenvalue when $np=\Omega(\log^4 (n))$, removing technical conditions previously imposed in (Liu et al. 2023). As an application, we demonstrate that this improved eigenvalue bound sharpens the parameter requirements on $d$ and $p$ for spontaneous synchronization on random geometric graphs in (Abdalla et al. 2024) under the homogeneous Kuramoto model.

[96] arXiv:2510.23631 (replaced) [pdf, html, other]
Title: Beyond Pairwise: Empowering LLM Alignment With Ranked Choice Modeling
Yuxuan Tang, Yifan Feng
Comments: Accepted by The Fourteenth International Conference on Learning Representations (ICLR 2026)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME); Machine Learning (stat.ML)

Alignment of large language models (LLMs) has predominantly relied on pairwise preference optimization, where annotators select the better of two responses to a prompt. While simple, this approach overlooks the opportunity to learn from richer forms of human feedback, such as multiway comparisons and top-$k$ rankings. We introduce Ranked Choice Preference Optimization (RCPO), a unified framework that bridges preference optimization with (ranked) choice modeling via maximum likelihood estimation. RCPO supports both utility-based and rank-based models, subsumes several pairwise methods (such as DPO and SimPO) as special cases, and provides principled training objectives for richer feedback formats. We instantiate this framework with two representative models (Multinomial Logit and Mallows-RMJ). Experiments on Llama-3-8B-Instruct, Gemma-2-9B-it, and Mistral-7B-Instruct across in-distribution and out-of-distribution settings show that RCPO consistently outperforms competitive baselines. RCPO shows that directly leveraging ranked preference data, combined with the right choice models, yields more effective alignment. It offers an extensible foundation for incorporating (ranked) choice modeling into LLM training.
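
To illustrate how a ranked-choice likelihood generalizes the pairwise case, the sketch below evaluates the log-likelihood of a ranking under a multinomial logit (Plackett-Luce) model, where each choice is a softmax over the items not yet ranked. This is the generic textbook likelihood, not the RCPO objective itself; the function names and the convention that `utilities` lists items in ranked order are assumptions.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def ranking_log_likelihood(utilities, top_k=None):
    """Log-likelihood of a (partial) ranking under a multinomial logit model.

    utilities: item utilities, with the ranked items first in ranked order;
               when top_k is given, items after position top_k are unranked.
    top_k:     number of leading positions that were actually ranked
               (defaults to a full ranking).
    """
    n = len(utilities)
    k = n if top_k is None else min(top_k, n)
    total = 0.0
    for i in range(k):
        # Item i is chosen from everything not yet ranked.
        total += utilities[i] - log_sum_exp(utilities[i:])
    return total

def log_sigmoid(x):
    """log(1 / (1 + exp(-x))), written to avoid overflow."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

if __name__ == "__main__":
    print(ranking_log_likelihood([2.0, 0.5]))          # pairwise comparison
    print(log_sigmoid(2.0 - 0.5))                      # same value
    print(ranking_log_likelihood([1.0, 0.3, -0.2, 0.0], top_k=2))
```

With two items the likelihood collapses to the logistic (Bradley-Terry) form of the utility gap, which is the sense in which pairwise objectives appear as a special case of the ranked setting.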

[97] arXiv:2511.20605 (replaced) [pdf, html, other]
Title: How to Purchase Labels? A Cost-Effective Approach Using Active Learning Markets
Xiwen Huang, Pierre Pinson
Comments: Accepted for publication in INFORMS Journal on Data Science (IJDS). This is the authors' preprint
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We introduce and analyse active learning markets as a way to purchase labels, in situations where analysts aim to acquire additional data to improve model fitting, or to better train models for predictive analytics applications. This comes in contrast to the many proposals that already exist to purchase features and examples. By originally formalising the market clearing as an optimisation problem, we integrate budget constraints and improvement thresholds into the label acquisition process. We focus on a single-buyer-multiple-seller setup and propose the use of two active learning strategies (variance based and query-by-committee based), paired with distinct pricing mechanisms. They are compared to benchmark baselines including random sampling and a greedy knapsack heuristic. The proposed strategies are validated on real-world datasets from two critical application domains: real estate pricing and energy forecasting. Results demonstrate the robustness of our approach, consistently achieving superior performance with fewer labels acquired compared to conventional methods. Our proposal comprises an easy-to-implement practical solution for optimising data acquisition in resource-constrained environments.

[98] arXiv:2512.13123 (replaced) [pdf, html, other]
Title: Stopping Rules for SGD via Anytime-Valid Confidence Sequences
Liviu Aolaritei, Michael I. Jordan
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

Deciding when to stop stochastic gradient descent (SGD) has long remained unresolved in a statistically rigorous sense. While SGD is routinely monitored as it runs, the classical theory of SGD provides guarantees only at pre-specified iteration horizons and offers no valid way to decide, based on the observed trajectory, when further computation is justified. We address this gap by developing anytime-valid confidence sequences for stochastic gradient methods, which remain valid under continuous monitoring and directly induce statistically valid, trajectory-dependent stopping rules: stop as soon as the current upper confidence bound on an appropriate performance measure falls below a user-specified tolerance. The confidence sequences are constructed using nonnegative supermartingales, are time-uniform, and depend only on observable quantities along the SGD trajectory, without requiring prior knowledge of the optimization horizon. In convex optimization, this yields anytime-valid certificates for weighted suboptimality of projected SGD under general stepsize schedules, without assuming smoothness or strong convexity. In nonconvex optimization, it yields time-uniform certificates for weighted first-order stationarity under smoothness assumptions. We further characterize the stopping-time complexity of the resulting stopping rules under standard stepsize schedules. To the best of our knowledge, this is the first framework that provides statistically valid, time-uniform stopping rules for SGD across both convex and nonconvex settings based solely on its observed trajectory.

[99] arXiv:2512.15771 (replaced) [pdf, html, other]
Title: Solving PDEs With Deep Neural Nets under General Boundary Conditions
Chenggong Zhang
Comments: 7 pages, 2 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA); Machine Learning (stat.ML)

Partial Differential Equations (PDEs) are central to modeling complex systems across physical, biological, and engineering domains, yet traditional numerical methods often struggle with high-dimensional or complex problems. Physics-Informed Neural Networks (PINNs) have emerged as an efficient alternative by embedding physics-based constraints into deep learning frameworks, but they face challenges in achieving high accuracy and handling complex boundary conditions. In this work, we extend the Time-Evolving Natural Gradient (TENG) framework to address Dirichlet boundary conditions, integrating natural gradient optimization with numerical time-stepping schemes, including Euler and Heun methods, to ensure both stability and accuracy. By incorporating boundary condition penalty terms into the loss function, the proposed approach enables precise enforcement of Dirichlet constraints. Experiments on the heat equation demonstrate the superior accuracy of the Heun method due to its second-order corrections and the computational efficiency of the Euler method for simpler scenarios. This work establishes a foundation for extending the framework to Neumann and mixed boundary conditions, as well as broader classes of PDEs, advancing the applicability of neural network-based solvers for real-world problems.
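
The accuracy gap between the two time-stepping schemes can be seen on a scalar test problem. The sketch below applies explicit Euler and Heun steps to the decay equation y' = -y, which stands in here for a single mode of a semi-discretized heat equation. It does not involve neural networks or the TENG framework, and all names are assumptions for illustration.

```python
import math

def euler_step(f, t, y, h):
    """First-order explicit Euler step."""
    return y + h * f(t, y)

def heun_step(f, t, y, h):
    """Second-order Heun step: Euler predictor, trapezoidal corrector."""
    k1 = f(t, y)
    k2 = f(t + h, y + h * k1)
    return y + 0.5 * h * (k1 + k2)

def integrate(step, f, y0, t_end, n_steps):
    """Advance y from t=0 to t=t_end with a fixed step size."""
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y

def decay(t, y):
    """Right-hand side of y' = -y, whose exact solution is exp(-t)."""
    return -y

exact = math.exp(-1.0)
euler_err = abs(integrate(euler_step, decay, 1.0, 1.0, 100) - exact)
heun_err = abs(integrate(heun_step, decay, 1.0, 1.0, 100) - exact)

if __name__ == "__main__":
    print("Euler error:", euler_err)
    print("Heun error:", heun_err)
```

Heun evaluates the right-hand side twice per step, so the higher accuracy costs roughly double the work per step, which is the trade-off the abstract describes.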

[100] arXiv:2512.23190 (replaced) [pdf, html, other]
Title: A Simple, Optimal and Efficient Algorithm for Online Exp-Concave Optimization
Yi-Han Wang, Peng Zhao, Zhi-Hua Zhou
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

Online eXp-concave Optimization (OXO) is a fundamental problem in online learning, where the goal is to minimize regret when loss functions are exponentially concave. The standard algorithm, Online Newton Step (ONS), guarantees an optimal $O(d \log T)$ regret, where $d$ is the dimension and $T$ is the time horizon. Despite its simplicity, ONS may face a computational bottleneck due to the Mahalanobis projection at each round. This step costs $\Omega(d^\omega)$ arithmetic operations for bounded domains, even for simple domains such as the unit ball, where $\omega \in (2,3]$ is the matrix-multiplication exponent. As a result, the total runtime can reach $\tilde{O}(d^\omega T)$, particularly when iterates frequently oscillate near the domain boundary. This paper proposes a simple variant of ONS, called LightONS, which reduces the total runtime to $O(d^2 T + d^\omega \sqrt{T \log T})$ while preserving the optimal regret. Deploying LightONS with the online-to-batch conversion implies a method for stochastic exp-concave optimization with runtime $\tilde{O}(d^3/\epsilon)$, thereby answering an open problem posed by Koren [2013]. The design leverages domain-conversion techniques from parameter-free online learning and defers expensive Mahalanobis projections until necessary, thereby preserving the elegant structure of ONS and enabling LightONS to act as an efficient plug-in replacement in broader scenarios, including gradient-norm adaptivity, parametric stochastic bandits, and memory-efficient OXO.
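
For reference, the standard ONS round can be written as below. This is the textbook form of the algorithm rather than LightONS, and the symbols $g_t$ (loss gradient at $x_t$), $\gamma$ (a parameter tied to the exp-concavity constant), $\epsilon$ (regularization), and $\mathcal{X}$ (the feasible domain) are notation assumed here, not taken from the abstract.

```latex
\begin{align*}
A_t &= A_{t-1} + g_t g_t^\top, \qquad A_0 = \epsilon I, \\
\tilde{x}_{t+1} &= x_t - \frac{1}{\gamma} A_t^{-1} g_t, \\
x_{t+1} &= \operatorname*{arg\,min}_{x \in \mathcal{X}}
           \; (x - \tilde{x}_{t+1})^\top A_t \, (x - \tilde{x}_{t+1}).
\end{align*}
```

The rank-one update of $A_t$ allows $A_t^{-1}$ to be maintained in $O(d^2)$ time per round via the Sherman-Morrison formula, so the last line, the Mahalanobis projection onto $\mathcal{X}$, is the step that dominates the cost whenever $\tilde{x}_{t+1}$ leaves the domain.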

[101] arXiv:2512.24968 (replaced) [pdf, html, other]
Title: The Impact of LLMs on Online News Consumption and Production
Hangcheng Zhao, Ron Berman
Subjects: General Economics (econ.GN); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Applications (stat.AP)

Large language models (LLMs) change how consumers acquire information online; their bots also crawl news publishers' websites for training data and to answer consumer queries; and they provide tools that can lower the cost of content creation. These changes lead to predictions of adverse impact on news publishers in the form of lowered consumer demand, reduced demand for newsroom employees, and an increase in news "slop." Consequently, some publishers strategically responded by blocking LLM access to their websites using the this http URL file standard.
Using high-frequency granular data, we document four effects related to the predicted shifts in news publishing following the introduction of generative AI (GenAI). First, we find a moderate decline in traffic to news publishers occurring after August 2024. Second, using a difference-in-differences approach, we find that blocking GenAI bots can be associated with a reduction of total website traffic to large publishers compared to not blocking. Third, on the hiring side, we do not find evidence that LLMs are replacing editorial or content-production jobs yet. The share of new editorial and content-production job listings increases over time. Fourth, regarding content production, we find no evidence that large publishers increased text volume; instead, they significantly increased rich content and used more advertising and targeting technologies.
Together, these findings provide early evidence of some unforeseen impacts of the introduction of LLMs on news production and consumption.

[102] arXiv:2601.07752 (replaced) [pdf, other]
Title: A Unified Framework for Debiased Machine Learning: Riesz Representer Fitting under Bregman Divergence
Masahiro Kato
Subjects: Econometrics (econ.EM); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)

Estimating the Riesz representer is central to debiased machine learning for causal and structural parameter estimation. We propose generalized Riesz regression, a unified framework for estimating the Riesz representer by fitting a representer model via Bregman divergence minimization. This framework includes various divergences as special cases, such as the squared distance and the Kullback--Leibler (KL) divergence, where the former recovers Riesz regression and the latter recovers tailored loss minimization. Under suitable pairs of divergence and model specifications (link functions), the dual problems of the Riesz representer fitting problem correspond to covariate balancing, which we call automatic covariate balancing. Moreover, under the same specifications, the sample average of outcomes weighted by the estimated Riesz representer satisfies Neyman orthogonality even without estimating the regression function, a property we call automatic Neyman orthogonalization. This property not only reduces the estimation error of Neyman orthogonal scores but also clarifies a key distinction between debiased machine learning and targeted maximum likelihood estimation (TMLE). Our framework can also be viewed as a generalization of density ratio fitting under Bregman divergences to Riesz representer estimation, and it applies beyond density ratio estimation. We provide convergence analyses for both reproducing kernel Hilbert space (RKHS) and neural network model classes. A Python package for generalized Riesz regression is released as genriesz and is available at this https URL.
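
As a worked instance of the squared-distance case, the display below states the defining property of the Riesz representer and the corresponding empirical objective, using the average treatment effect as the example. The notation ($W = (Y, D, Z)$ for outcome, treatment, and covariates; $m$ for the moment functional; $e(Z)$ for the propensity score) is assumed here and may differ from the paper's.

```latex
\begin{align*}
&\text{Riesz representer } \alpha_0: \quad
  \mathbb{E}\big[m(W; g)\big] = \mathbb{E}\big[\alpha_0(D, Z)\, g(D, Z)\big]
  \quad \text{for all square-integrable } g, \\
&\text{Squared-loss fit:} \quad
  \hat{\alpha} \in \operatorname*{arg\,min}_{\alpha \in \mathcal{A}}
  \frac{1}{n} \sum_{i=1}^{n} \Big[ \alpha(D_i, Z_i)^2 - 2\, m(W_i; \alpha) \Big], \\
&\text{ATE example:} \quad
  m(W; g) = g(1, Z) - g(0, Z), \qquad
  \alpha_0(D, Z) = \frac{D}{e(Z)} - \frac{1 - D}{1 - e(Z)}.
\end{align*}
```

The objective follows from expanding $\mathbb{E}[(\alpha - \alpha_0)^2]$ and using the defining property to replace $\mathbb{E}[\alpha_0 \alpha]$ with $\mathbb{E}[m(W; \alpha)]$, so $\alpha_0$ never has to be known in closed form.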

[103] arXiv:2602.08681 (replaced) [pdf, other]
Title: The Theory and Practice of MAP Inference over Non-Convex Constraints
Leander Kurscheidt, Gabriele Masina, Roberto Sebastiani, Antonio Vergari
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In many safety-critical settings, probabilistic ML systems have to make predictions subject to algebraic constraints, e.g., predicting the most likely trajectory that does not cross obstacles. These real-world constraints are rarely convex, nor are the densities considered (log-)concave. This makes computing this constrained maximum a posteriori (MAP) prediction efficiently and reliably extremely challenging. In this paper, we first investigate under which conditions we can perform constrained MAP inference over continuous variables exactly and efficiently and devise a scalable message-passing algorithm for this tractable fragment. Then, we devise a general constrained MAP strategy that interleaves partitioning the domain into convex feasible regions with numerical constrained optimization. We evaluate both methods on synthetic and real-world benchmarks, showing our approaches outperform constraint-agnostic baselines, and scale to complex densities intractable for SoTA exact solvers.
