Detecting non-uniform patterns on high-dimensional hyperspheres
Abstract
We propose a new probabilistic characterization of the uniform distribution on the hypersphere in terms of the distribution of inner products, extending the ideas of [cuesta2009projection; cuesta2007sharp] in a data-driven manner. Using this characterization, we define a new distance that quantifies the deviation of an arbitrary distribution from uniformity.
As an application, we construct a novel nonparametric test for the problem of testing uniformity, namely the task of determining whether a set of i.i.d. random points on the -dimensional hypersphere is approximately uniformly distributed. Under the null, the proposed test statistic converges in distribution to the supremum of a Brownian bridge, and the test can detect any alternative lying outside a ball of radius with respect to the proposed distance, in both high and low-dimensional settings.
We then prove a matching lower bound with respect to this distance and study its behavior when restricted to parametric models. In particular, we show that the minimax detection thresholds with respect to this distance coincide with the usual minimax thresholds in two important families: (i) the class of Fisher–von Mises–Langevin (FvML) alternatives, and (ii) a class of low-rank uniform distributions. Thus, the proposed test is optimal in these models. We also derive the limiting distributions of the test under the corresponding local alternatives.
As a byproduct of our analysis, we determine the detection threshold in the high-dimensional regime for testing the intrinsic dimension of the uniform distribution on ; that is, for testing whether the distribution is uniformly supported on against the alternative that it is uniformly distributed on for some -dimensional linear subspace .
1 Introduction
Testing whether a sample from an unknown distribution is uniformly distributed over a domain is a classical problem in statistical theory. In the discrete case, this problem has been extensively studied by statisticians, computer scientists, and probabilists; see [bhattacharya2024sparse; balakrishnan2018hypothesis] and references therein. For the continuous case, one of the most common and intriguing settings is the unit hypersphere, not only because of its rich mathematical structure but also due to its importance in statistical analysis on non-Euclidean spaces. Below, we briefly formulate the problem and review relevant literature.
Consider the hypersphere , where is the Euclidean distance. The observed data points are denoted by with for all . We are mainly interested in the high-dimensional case where one assumes is a sequence diverging to infinity. Assume that the data ’s are drawn independently from an unknown distribution supported on the hypersphere . The uniform distribution on the hypersphere is denoted by . The uniformity testing problem can be formulated as
| (1) |
In fixed and small dimensions, the uniformity testing problem has been investigated extensively in the last few decades. An incomplete list of early results concerning the case includes the Kuiper test (see, for example, [Kuiper]), the Watson test (see, for example, [watson]) and the Hodges–Ajne test (see, for example, [Ajne]). In arbitrary but fixed dimensions, the class of Sobolev-based tests was introduced in [Gine], and these tests were shown to be universally consistent against any absolutely continuous alternative with -integrable densities. Notable developments of Sobolev tests include the data-driven procedures proposed by [Bogdan] and [Jupp]. Readers are referred to the survey papers [survey-uni; pewsey2021recent] for recent progress on this problem and for a list of recent testing procedures. The consistency and optimality of various testing procedures in fixed dimensions have been well studied in the literature, and can also be found in [survey-uni]. Recent results in the fixed-dimensional settings include [garcia2021cramer; garcia2023projection; fernandez2023new; boucher2025modified; boucher2025runs].
In the era of big data, there has been an increasing interest in studying high-dimensional directional statistics, which assumes the dimensions diverge to infinity. For example, in shape analysis and nonparametric statistics, a popular approach is to consider sign-based procedures, in which one projects the observations onto the hyperspheres and carries out statistical inference based on the projected data. This approach is robust in high dimensions since the concentration of measure phenomenon implies that the majority of information from the data is captured by the directions rather than the magnitudes of the observations. We give a brief overview of the high-dimensional directional statistics literature below.
In [Dryden05], the author investigates the asymptotic properties of high-dimensional spherical distributions and their applications to brain shape modeling. Specifically, the study involved statistical modeling of a sample of MRI images of adult brains. After normalization, each brain image was represented as a unit vector with dimension . A natural question in this modeling task is whether some simple, well-known distributions (such as the uniform distribution) provide a good fit for the data. Clustering analysis on high-dimensional hyperspheres has been studied in [Banerjee04; Banerjee03]. Potential applications of high-dimensional uniformity tests were illustrated in [Juan2001], in which the authors relate the multivariate outlier detection problem to the uniformity testing problem. Sign-based procedures in high dimensions have been considered in [Zou14] in the context of sphericity testing and in [WPL15], where the authors propose a high-dimensional nonparametric mean test.
From a different perspective than the directional statistical viewpoint discussed above, our primary motivation for studying the high-dimensional analog of (1) arises from deep learning theory. In overparameterized neural networks, regularization is crucial for preventing overfitting and improving generalization. In [xie2017diverse], it was shown that optimizing one-hidden-layer neural networks with approximately uniformly distributed neurons can help avoid spurious local minima. Furthermore, empirical studies in [lin2020regularizing; liu2018learning] have shown that regularization methods promoting uniformity among neurons effectively reduce the generalization error in deep networks. Such methods are fundamentally tied to the question of whether a random set of points on the unit hypersphere is approximately uniformly distributed. The overparameterized nature of deep networks makes it natural to study this question in the high-dimensional settings. Given the complexity of many deep networks, including heavy-tailed or strongly correlated structures (see [mahoney2019traditional] for details), we focus on detecting non-uniformity in a non-parametric manner. This perspective shifts the attention away from traditional parametric modeling goals—such as optimality and asymptotic local power within a parametric class of distributions—towards prioritizing simplicity of implementation and universal consistency.
Despite the vast literature concerning fixed-dimensional tests, much less is known about the uniformity testing problem in the high-dimensional context with diverging dimensions. When the dimension diverges to infinity, many of the existing procedures require highly non-trivial adjustments to work properly. Moreover, there is usually no tractable limiting distribution under uniformity, and the power is typically low due to the curse of dimensionality. To the best of the authors’ knowledge, only three high-dimensional tests have been investigated in the literature. We give a short overview of these tests below.
1. Rayleigh test in [Cutting-P-V] and [Ley-P]. This test can be formulated in terms of a U-statistic of the data points with the inner product kernel, i.e.
| (2) |
2. Bingham test in [Cutting-P-V2; Zou14] and [Ley-P]. This test is also based on a U-statistic of the data points, but with a quadratic inner product kernel, i.e.
| (3) |
3. Packing test in [Jiang13]. This test is based on the smallest angle, i.e.
| (4) |
The asymptotic distributions of these test statistics, as well as their non-null behaviors have been studied rigorously over the last few years; see, for example, [Cutting-P-V; Cutting-P-V2; Ley-P; Ley-P-2; Ley-P-V]. It is known that the Rayleigh test and the Bingham test enjoy a doubly robust property: under the null hypothesis and the single assumption , both and converge in distribution to the standard normal distribution. This feature is highly desirable since no restriction on the dependence between and is imposed, and neither resampling procedures nor tuning parameters are needed to get the critical values of such tests. Regarding the packing test , it is known that, under the null hypothesis and the mild assumption , converges in distribution to the Gumbel distribution with CDF (see [Jiang13] and also [Jiang12]).
Each of these three tests has its own advantages and disadvantages. However, a common limitation is that they are each optimal only for a specific class of (parametric) alternatives: they perform well against certain models but may be essentially powerless outside those classes. Given the inherently nonparametric nature of the uniformity testing problem, it is therefore natural to prioritize robustness and optimality over a broad range of alternatives when designing testing procedures.
The primary objective of this article is to address this issue by approaching the problem (1) from a probabilistic and geometric perspective, rather than relying on the likelihood-based framework commonly used in statistics. We introduce a novel pseudometric to quantify deviations from uniformity (see (13) below). Unlike most classical distances between probability measures, this distance takes into account geometric deviations from uniformity; see Section 4 for further discussion. We then define a test statistic (see (8) below) that is naturally based on this distance and is universally consistent in fixed dimension. A key advantage of our test is its model-free nature: it imposes no structural assumptions on the underlying class of distributions, since it is not derived from likelihood inference. In high-dimensional settings, the proposed test enjoys a “doubly robust” property analogous to that of the Rayleigh and Bingham tests, yet remains intrinsically nonparametric. It admits a simple asymptotic theory in the high-dimensional regime, and comes with a consistency theory that is not restricted to any particular parametric class of alternatives. Our contributions can be summarized as follows.
• We propose a new distance (see (13) for a precise definition) to quantify deviations from uniformity. This distance does not require the alternatives to be absolutely continuous with respect to the uniform distribution and is therefore well suited for analyzing singular alternatives. A natural test statistic associated with this distance (see in (8)) is introduced and shown to converge in distribution to the supremum of a Brownian bridge under the null (Theorem 1).
- •
• We investigate how the distance behaves when restricted to parametric models, and how it relates to the minimax testing problem in those settings. In particular, we study two concrete models: the Fisher–von Mises–Langevin model and a low-rank uniform distribution model. We derive the local limiting distribution and the local power of in these two models (Propositions 2 and 3), and show that is asymptotically the supremum of a shifted Brownian bridge under such alternatives, where the shift is a smooth function that vanishes at the end points.
As a direct consequence of the local limiting distributions, we see that the minimax detection threshold with respect to coincides with the usual minimax detection rates within the corresponding parametric model (Propositions 7 and 8 in Appendix A.6). This means that the threshold at which matches the minimax rate for testing uniformity in the corresponding parametric model. This phenomenon seems to hold for other models as well; see the discussion in Section 4.
• As a byproduct of our analysis, we obtain an information-theoretic lower bound for testing the intrinsic dimension of the uniform distribution. This result is new and is of independent interest. In this low-rank uniform distribution model, the detection thresholds of the four tests , , , and our proposed test can be identified precisely.
We find that in this low-rank model, only the Bingham test and our proposed test attain the optimal detection threshold, and that our test is the only one that achieves the optimal rate simultaneously in both of the parametric models considered above; see Table 1 below.
We would also like to point out that there are other approaches in the literature that are not based on likelihood inference, such as the family of Sobolev tests originally proposed in [Gine] and the projection-based tests introduced in [cuesta2009projection], with further developments in recent works [garcia2021cramer; garcia2023projection; fernandez2023new]. Each of these approaches uses a different characterization of the uniform distribution: the Sobolev tests rely on the eigenfunctions of the Laplacian, while the projection-based tests are based on the one-dimensional distributions obtained by projecting the data onto all possible directions. However, none of these tests extend easily to high-dimensional settings, as they either involve tuning parameters or require resampling methods to implement. In contrast, our proposed test offers a simple and interpretable asymptotic theory in high-dimensional settings, which we discuss below. A detailed comparison between our test and the class of projection-based tests is provided in Section A.1.
The rest of the paper is organized as follows. The proposed pseudo-distance and the test are presented in Section 2. The lower bounds and local limiting distributions are provided in Section 3. Further discussion of the proposed pseudo-distance and test can be found in Section 4. Section 5 contains the conclusions and some remarks. The proofs of the main results are presented in Section 6. Some simulations, technical results, proofs and discussions are provided in Appendix A.
2 Measuring uniformity deviation and testing procedure
2.1 Notation and preliminaries
Throughout the paper, we consider the hypersphere , where is the Euclidean distance. We always assume without stating that the dimension diverges to infinity. The observed data points are denoted by with for all . We assume that ’s are drawn independently from an unknown distribution supported on the hypersphere . The uniform distribution on the hypersphere is denoted by .
For a pair of data points , we denote by the inner product formed by and . Under , the distribution of is known to have density (see, for example, Lemmas 11 and 12 in [Jiang13])
| (5) |
In the formula (5) above, and are taken to be in . The CDF and density of a standard normal distribution will be denoted by and , respectively.
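For numerical work, the null law of the inner product can be evaluated directly. The following is a minimal sketch and is not taken from the paper: it assumes the sphere is the unit sphere of R^p with p ≥ 2, and it uses the standard fact that the inner product of two independent uniform points has density proportional to (1 − t²)^((p−3)/2) on (−1, 1). The function names and the Simpson step count are illustrative choices.

```python
import math


def _log_norm_const(p):
    # log of Gamma(p/2) / (sqrt(pi) * Gamma((p-1)/2)), the normalizing constant
    return math.lgamma(p / 2) - 0.5 * math.log(math.pi) - math.lgamma((p - 1) / 2)


def inner_product_density(t, p):
    """Density at t of X1.X2 for independent uniform X1, X2 on the unit sphere of R^p."""
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(_log_norm_const(p) + 0.5 * (p - 3) * math.log1p(-t * t))


def inner_product_cdf(t, p, steps=2000):
    """CDF at t, by Simpson's rule after the substitution t = sin(theta).

    In the variable theta the density is c * cos(theta)^(p-2), which stays
    bounded on [-pi/2, pi/2] for every p >= 2. `steps` must be even.
    """
    t = max(-1.0, min(1.0, t))
    a, b = -math.pi / 2, math.asin(t)
    if b <= a:
        return 0.0
    h = (b - a) / steps

    def g(theta):
        return max(math.cos(theta), 0.0) ** (p - 2)

    total = g(a) + g(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return math.exp(_log_norm_const(p)) * total * h / 3
```

For p = 3 the density is constant and equal to 1/2 on (−1, 1), so the CDF is (1 + t)/2, which gives a quick sanity check of the implementation.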
Throughout the paper, we will use or to denote the uniform distribution. The notation is used to indicate the -fold product measures of the uniform distribution.
2.2 A characterization of the uniform distribution
Let us start with an important observation regarding the random inner product of two i.i.d. points on the hypersphere.
Proposition 1.
Let be any Borel probability measure on , and let denote the uniform distribution on . Suppose are drawn independently from , and are drawn independently from . If
| (6) |
then .
Proposition 1 establishes the identifiability of the uniform distribution in terms of the inner product, asserting that the distribution of the inner product uniquely characterizes . To the best of our knowledge, this characterization is new and may be of independent interest. Notably, we do not impose any regularity assumptions on the measure in the statement of Proposition 1, and the result holds even if is highly singular. For instance, the characterization applies to probability measures supported on sets of lower dimensions. Proposition 1 allows us to construct tests that work against any type of alternative, distinguishing it from other omnibus tests in the literature, which typically require alternatives to have -integrable densities with respect to .
A related but distinct characterization of the uniform distribution in terms of projections was proposed in [cuesta2009projection]; see Section A.1 for further details and comparisons with the class of projection-based tests, which are built upon this characterization. Some recent extensions of this characterization to other types of distributions can be found in [fraiman2023cramer; fraiman2024application; fraiman2023quantitative]. Specifically, the characterization in [cuesta2009projection] is as follows. Let be random variables on for some and , then under some regularity conditions,
| (7) |
Broadly speaking, the characterization (7) relies on projections onto independent, uniformly distributed directions. To construct testing procedures using (7), one often needs to sample the direction repeatedly. In contrast, the left-hand side of (6) does not involve any uniformly distributed direction and is completely data-driven. Consequently, our method requires neither integrating over all directions , as was done in [escanciano2006consistent; garcia2023projection; fernandez2023new] nor sampling repeatedly, as was done in [cuesta2009projection].
The key challenge in proving Proposition 1 is that the arguments used to establish (7) are no longer applicable. Specifically, the proof of the characterization (7) in [cuesta2007sharp; cuesta2009projection] relies on a sharp version of the Cramér–Wold device, which cannot be applied to (6) because and follow different laws. The proof of Proposition 1, instead, relies on a subtle application of the Lebesgue differentiation theorem, which will be detailed in Section 6.1.
The first motivation for our model-free testing comes from Proposition 1 above. Intuitively, instead of carrying out inference directly on the data points, one can construct tests based on some unique features of the random inner product under . Here we choose the CDF as such a feature to construct the test, which is based on estimating the CDF of from the data and rejecting if the estimated CDF differs too much from the true CDF . Under , we know that the CDF of is of beta type with specific parameters (see equation (9) below). Another advantage of the random inner products is that they are one-dimensional objects, which are cheap to compute. Moreover, they become “asymptotically independent” as the dimension gets larger. The last phenomenon has been observed in a number of works related to Haar matrices; see, for example, [Jiang05; Jiang09; D-F] and the references therein.
Another motivation for our model-free test is based on the following observation regarding the three existing tests , and defined in (2), (3) and (4), respectively. It is reasonable to argue that the primary cause of their model dependence is the inability to capture all the information available from the data. In particular, the Rayleigh test only takes advantage of the linear kernel, which is powerless for models involving axial data, such as the Watson distributions. On the other hand, the Bingham test uses a quadratic kernel and performs poorly on non-axial data. The packing test uses extreme values and, as pointed out in [Jiang13], it does not examine whether there is a gap in the data or not.
To fully utilize all information available from the data, we will look at the empirical distributions generated from all random inner products, instead of using a particular U-statistic. To be more precise, let be the data, we define
In light of Proposition 1, the empirical measure intuitively captures the characteristics of the underlying unknown distribution. One can then compare and , which is the distribution of under , via the Kolmogorov distance to see how far the data are from uniformity. To define the test formally, we write
| (8) |
where
| (9) |
and .
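The construction above can be sketched numerically. This is an illustration under stated assumptions, not the paper's implementation: it assumes the statistic in (8) is √N times the Kolmogorov distance between the empirical CDF of the N = n(n − 1)/2 pairwise inner products and the null CDF, which is one reading of the definition and may differ from the exact normalization used there. The null CDF is tabulated on a grid and linearly interpolated; all function names and the grid size are hypothetical choices.

```python
import bisect
import math
import random


def null_cdf_table(p, grid=4001):
    """Tabulate the null CDF of the inner product at the points t = sin(theta)."""
    thetas = [-math.pi / 2 + math.pi * k / (grid - 1) for k in range(grid)]
    # In the variable theta the null density is proportional to cos(theta)^(p-2).
    dens = [max(math.cos(th), 0.0) ** (p - 2) for th in thetas]
    cum = [0.0]
    for k in range(1, grid):
        cum.append(cum[-1] + 0.5 * (dens[k - 1] + dens[k]))  # trapezoid rule
    return [math.sin(th) for th in thetas], [c / cum[-1] for c in cum]


def cdf_at(t, ts, cdf):
    """Linear interpolation of the tabulated CDF."""
    if t <= ts[0]:
        return 0.0
    if t >= ts[-1]:
        return 1.0
    j = bisect.bisect_right(ts, t)
    w = (t - ts[j - 1]) / (ts[j] - ts[j - 1])
    return cdf[j - 1] + w * (cdf[j] - cdf[j - 1])


def uniformity_statistic(points):
    """sqrt(N) * sup_t |F_n(t) - F_p(t)| over all pairwise inner products (assumed form)."""
    n, p = len(points), len(points[0])
    ts, cdf = null_cdf_table(p)
    rho = sorted(
        sum(a * b for a, b in zip(points[i], points[j]))
        for i in range(n)
        for j in range(i + 1, n)
    )
    big_n = len(rho)
    dist = 0.0
    for k, r in enumerate(rho):
        f = cdf_at(r, ts, cdf)
        # The empirical CDF jumps from k/N to (k+1)/N at the k-th order statistic.
        dist = max(dist, (k + 1) / big_n - f, f - k / big_n)
    return math.sqrt(big_n) * dist


def sample_uniform_sphere(n, p, rng):
    """Uniform points on the unit sphere, obtained by normalizing Gaussian vectors."""
    pts = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in g))
        pts.append([x / norm for x in g])
    return pts
```

A call such as `uniformity_statistic(sample_uniform_sphere(40, 50, random.Random(0)))` returns a moderate value under uniformity, whereas a fully degenerate sample (all points equal) gives the maximal value √N.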
2.3 Testing procedure
In what follows, we assume depends on and sometimes we write for clarity. Recall that the Brownian bridge has the same distribution as , where is a standard Brownian motion. Our next result establishes the asymptotic distribution of under .
Theorem 1.
The asymptotic distribution in Theorem 1 is the same as that of the classical nonparametric Kolmogorov–Smirnov test. The exact expression of the Brownian bridge’s maximum is known to be
| (10) |
see, for example, [brownian-bridge].
At first glance, one can see that is the supremum of a degenerate U-process. Moreover, the underlying distribution of the U-process is also allowed to change with the dimension. To the best of our knowledge, this situation has not been explored in the literature, although limit theory for non-degenerate U-processes is well studied. To establish the null distribution of , we rely on a special property of the uniform distribution on the sphere: under , the normalized inner products ’s are pairwise independent and asymptotically normal. Moreover, as discussed above, the dependence between them becomes weaker as the dimension increases. Thus, one should expect the asymptotic distribution of to be the same as that of the classical Kolmogorov–Smirnov test in the i.i.d. setting. Remarkably, Theorem 1 presents a stark departure from classical results concerning degenerate U-processes with a fixed distribution. Typically, the asymptotic distributions of such statistics lack closed-form expressions; see (12) below. However, the divergence of gives a convenient asymptotic distribution, as demonstrated in Theorem 1.
Thanks to Theorem 1 and the expression (10), one can easily calculate the critical value for the -level test. The test rejects the null hypothesis if , where is chosen such that
The quantiles of the Kolmogorov–Smirnov distribution are well understood in the literature and can be calculated precisely via (10); for example, . This is also the critical value we choose in the simulation. From now on, we will use to indicate the -level test based on , which is
| (11) |
where is defined in (8).
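The critical value can be computed from the classical series for the Kolmogorov distribution. The sketch below is an illustration and assumes that (10) refers to the two-sided supremum, that is, P(sup |B(t)| ≤ x) = 1 − 2 Σ_{k≥1} (−1)^(k−1) exp(−2k²x²) for a Brownian bridge B on [0, 1]. The function names, the truncation at 100 terms and the bisection bracket are illustrative choices.

```python
import math


def kolmogorov_cdf(x, terms=100):
    """P(sup_t |B(t)| <= x) for a standard Brownian bridge B on [0, 1]."""
    if x <= 0:
        return 0.0
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x) for k in range(1, terms + 1)
    )
    return 1.0 - 2.0 * series


def critical_value(alpha, lo=0.2, hi=5.0, tol=1e-10):
    """Solve P(sup |B| > s) = alpha for s by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - kolmogorov_cdf(mid) > alpha:
            lo = mid  # tail probability still too large: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under this assumption, `critical_value(0.05)` is approximately 1.358 and `critical_value(0.01)` is approximately 1.628, the familiar Kolmogorov–Smirnov quantiles.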
From Theorem 1, one can see that the test is doubly robust: there is no restriction on the way diverges to infinity. Among the three known high-dimensional tests, only the Rayleigh test in (2) and the Bingham test in (3) satisfy this property. The Packing test requires a mild regularity condition and thus is not doubly robust. It is worth noting that the test is also valid in the fixed scenario, albeit without the asymptotic distribution provided in Theorem 1. Indeed, when is fixed, it is known that
| (12) |
where is defined in (8) and is a stochastic process whose marginal distributions are equal to linear combinations of chi-squared distributions; see Theorem 7 in [nolan1988functional]. The asymptotic distribution in (12) does not have a tractable expression, and Monte Carlo simulation is needed to approximate the test’s critical value.
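The Monte Carlo calibration in fixed dimension can be sketched as follows. This is an illustration only, restricted to the unit sphere of R^3, where the inner product of two independent uniform points is uniform on [−1, 1] and the null CDF is exactly (1 + t)/2. It assumes the same √N normalization of the Kolmogorov distance as a reading of (8), with N = n(n − 1)/2; the function names, sample size and number of replications are hypothetical choices.

```python
import math
import random


def sample_uniform_sphere(n, p, rng):
    """Uniform points on the unit sphere, obtained by normalizing Gaussian vectors."""
    pts = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in g))
        pts.append([x / norm for x in g])
    return pts


def statistic_dim3(points):
    """sqrt(N) * sup_t |F_n(t) - (1 + t)/2| for points on the unit sphere of R^3."""
    n = len(points)
    rho = sorted(
        sum(a * b for a, b in zip(points[i], points[j]))
        for i in range(n)
        for j in range(i + 1, n)
    )
    big_n = len(rho)
    dist = 0.0
    for k, r in enumerate(rho):
        f = min(1.0, max(0.0, 0.5 * (1.0 + r)))  # exact null CDF when p = 3
        dist = max(dist, (k + 1) / big_n - f, f - k / big_n)
    return math.sqrt(big_n) * dist


def monte_carlo_critical_value(n, alpha, reps, rng):
    """Empirical (1 - alpha) quantile of the statistic simulated under uniformity."""
    sims = sorted(statistic_dim3(sample_uniform_sphere(n, 3, rng)) for _ in range(reps))
    index = min(reps - 1, math.ceil((1.0 - alpha) * reps) - 1)
    return sims[index]
```

For instance, `monte_carlo_critical_value(20, 0.05, 300, random.Random(1))` returns a simulated 5% critical value for n = 20; larger values of `reps` reduce the simulation error.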
2.4 Model-free consistency
For any two probability measures and supported on , define the pseudometric
| (13) |
where , are drawn independently from and , are drawn independently from .
The distance in (13) is only a pseudometric, in the sense that does not imply . However, in light of Proposition 1, we can see that yields . Thus, the pseudometric can be used as a quantitative measure of the deviation from the null. Based on this pseudometric, we define a consistency criterion, namely the -separation condition. The precise definition can be formulated as follows.
Condition 1 (separation condition).
Given a sequence , let be a sequence of probability measures on . We say that the sequence satisfies the -separation condition if
| (14) |
where is the pseudo metric defined in (13).
The separation condition (14) measures the departure from the null hypothesis in terms of the pseudometric , which is the Kolmogorov distance between the random inner products drawn under and . Interestingly, the rate in (14) is of order , which differs from the usual rate. This is due to the degenerate nature of and the form of . Moreover, condition (14) is nonparametric in nature and requires neither parametric assumptions nor regularity conditions: the sequence of alternatives may or may not have densities with respect to the uniform measure , and it is not restricted to any parametric class of distributions that contains .
We do not require to converge to infinity in (14), and the fixed setting is also covered in (14). The assumption that is diverging is only required to control the size of the test via the asymptotic result in Theorem 1. In the fixed scenario, one can use Monte Carlo simulation to approximate the test’s critical value as stated in (12). If we keep fixed and consider a fixed alternative, then by Proposition 1, (14) always holds. Therefore, in the fixed-dimensional case, (14) is the same as the universal consistency property of the Sobolev tests. Furthermore, condition (14) remains valid in the high-dimensional settings, making it a natural analogue of the universal consistency property that operates in both fixed and high-dimensional cases. Given the separation condition (14), it can be shown that the test is consistent.
Theorem 2.
The rate in condition (14) is sharp: we prove a matching lower bound in Theorem 3 below, which also does not impose any restriction between and . Interestingly, if one restricts the model to a parametric class, the threshold at which the distance scales like often coincides with the minimax rate within that model. We do not have a proof or a result of this type for an arbitrary sequence of alternatives. However, we will investigate in detail below two examples of this type, and derive the local limiting distributions along a sequence of local alternatives at the minimax thresholds.
3 Lower bound and non-null results
3.1 An information lower bound
Define the set of test functions based on a sample of size as
| (15) |
Using Le Cam’s mixture argument, we can show the following.
Theorem 3.
Suppose . For small enough, we have
| (16) |
In fixed-dimensional settings, Theorem 3 is straightforward to prove since one can directly apply Le Cam’s two-point argument to a perturbation of size of the uniform distribution. The non-trivial aspect of Theorem 3 lies in establishing the result in high-dimensional settings, and doing so without imposing any growth condition on and .
The worst-case construction in the proof of Theorem 3 is based on the Fisher–von Mises–Langevin (FvML) distributions. This choice is motivated by simulation results showing that the test exhibits power very close to that of the Rayleigh test, which is the optimal invariant test within this model [Cutting-P-V].
3.2 Local limiting distribution under the FvML alternatives
The FvML distributions are one of the most common types of alternatives for uniformity testing and have been investigated in the recent line of works [Cutting-P-V; Cutting-P-V2]; see also the references therein. To describe the FvML distributions, let us introduce a general class of “monotone” rotationally symmetric densities following [Cutting-P-V; Cutting-P-V2].
Let be a smooth and strictly increasing function. Define the family of densities
Here is the concentration parameter and is the location parameter. Most of the common distributions in directional statistics belong to this class of distributions. Two common choices are
• Watson distributions. This corresponds to the case [Cutting-P-V2].
• FvML distributions. This corresponds to the case [Cutting-P-V].
In this subsection, we will investigate the local power and consistency of the test under the class of FvML distributions. It is known that within the class of FvML distributions, the threshold is the minimax rate: when is below this threshold, no rotationally invariant test can be consistent. Moreover, when is above this threshold, the Rayleigh test is consistent and is also optimal in the sense of Le Cam.
Let be the quantile function of the standard normal distribution and let be the standard Gaussian density, . Our main result regarding the FvML alternatives is the following.
Proposition 2.
Let . If , then
under the sequence of FvML alternatives with concentration parameter , where is the Brownian bridge.
It follows directly from Proposition 2 that the asymptotic power of under the class of FvML distributions is given by
From the display above, we can see that is consistent at the contiguity rate . By Proposition 7 in the Appendix A.6, we get
The display above indicates that the local alternatives at the minimax threshold for are the same as that of the parametric FvML model. In other words, the distance captures precisely the minimax rate of testing uniformity in the FvML model.
The asymptotic power above does not have any closed-form expression, but we observe in simulation that its power is slightly lower than that of the Rayleigh test, which is expected due to the LAN expansion in [Cutting-P-V]. Note that in this regime, the Packing test and the Bingham test are both blind. We further know from [Cutting-P-V2] that the detection threshold for the Bingham test in this model is .
3.3 Local limiting distribution under a low-rank model
Consider the set of -dimensional hyperplanes in . We denote this set by , which is known to form the Grassmannian manifold; see [chikuse2003statistics] for a comprehensive overview.
We are interested in testing uniformity against the class of low-rank uniform distributions:
| (17) |
for some .
Obviously, the case corresponds to the null . The problem is essentially about detecting whether the uniform distribution has a low-rank structure in it. We are interested in the hard regime where is close to , and we will thus assume that such that and . In this regime, we can show that
Proposition 3.
Suppose and . Let be the Brownian bridge. Then,
where and are the quantile function and density function of a standard normal distribution, respectively.
The proof of Proposition 3 follows directly from Proposition 8 in the Appendix A.6 and Theorem 1. Proposition 3 shows that the test is consistent at the threshold , with asymptotic power given by
It is natural to ask whether the rate is optimal. The answer is yes, which is the claim of the theorem below.
Theorem 4.
Theorem 4 claims that in the high-dimensional regime, as long as is small enough, no test based on a sample size of can be consistent. This suggests that the test is rate-optimal in this model. To the best of our knowledge, this information lower bound is new and has not been studied before. The most technical part of the proof is to analyze the likelihood ratio against a random distribution over the Grassmannian .
By Proposition 8 in the Appendix A.6, we have
where is the standard Gaussian density. Therefore, the local alternatives at the minimax threshold for are the same as that of the low-rank model. In other words, the distance captures precisely the minimax rate of testing uniformity in the low-rank model.
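The distance between a low-rank uniform distribution and the uniform distribution can also be evaluated numerically. The sketch below is an illustration under two assumptions: that (13) is the Kolmogorov distance between the two inner-product laws, and that the sphere is the unit sphere of R^p. It relies on the fact that two independent points drawn uniformly from the intersection of the sphere with a q-dimensional linear subspace have an inner product distributed as under the uniform distribution on the unit sphere of R^q, so the distance depends only on (p, q). The function names and grid size are hypothetical.

```python
import math


def null_cdf_on_grid(dim, grid):
    """Null CDF of the inner product in dimension `dim`, tabulated at t = sin(theta)
    for theta on a uniform grid of [-pi/2, pi/2]."""
    thetas = [-math.pi / 2 + math.pi * k / (grid - 1) for k in range(grid)]
    # In the variable theta the density is proportional to cos(theta)^(dim-2).
    dens = [max(math.cos(th), 0.0) ** (dim - 2) for th in thetas]
    cum = [0.0]
    for k in range(1, grid):
        cum.append(cum[-1] + 0.5 * (dens[k - 1] + dens[k]))  # trapezoid rule
    return [c / cum[-1] for c in cum]


def low_rank_distance(p, q, grid=20001):
    """Grid approximation of sup_t |F_q(t) - F_p(t)|, the Kolmogorov distance between
    the inner-product laws in dimensions q and p (assumed reading of (13))."""
    f_p = null_cdf_on_grid(p, grid)
    f_q = null_cdf_on_grid(q, grid)
    # Both tables share the same t-grid, so the sup is taken pointwise.
    return max(abs(a - b) for a, b in zip(f_p, f_q))
```

As a rough check, a first-order normal approximation (a heuristic, not a statement from the paper) suggests that for large p and small (p − q)/p the value is close to φ(1)(p − q)/(2p), which is about 0.003 for p = 400 and q = 390.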
Let us now compare the local power of the four tests . A nice feature of this low-rank model is that the detection thresholds of all four tests can be computed precisely.
• Recall the Rayleigh test from (2). It is easy to check that as . Thus, is not consistent, even when . Its maximum power will not exceed . However, a two-sided version of , which rejects if is large, is consistent in the regime .
• For the Bingham test in (3), we have
as , under . Therefore, in the regime , the asymptotic power of the Bingham test is
where is the -quantile of the standard normal distribution. This shows that the Bingham test achieves the optimal rate as suggested by Theorem 4, with local power given by the above.
-
•
Finally, regarding the Packing test in (4), we have
where is a standard Gumbel law. Thus, the test is consistent iff . This detection threshold is strictly sub-optimal, but is still better than the Rayleigh test.
From the above, we can see that only the proposed test and the Bingham test achieve the optimal rate. Although the power function of does not have a closed-form expression, we find in simulation studies that the local power of the Bingham test is greater than that of the proposed test.
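The contrast between the Rayleigh and Bingham tests in the low-rank model can be checked numerically. The definitions (2) and (3) are not reproduced in this section, so the sketch below assumes the standardized high-dimensional forms that are common in the literature, namely sqrt(2p)/n times the sum of pairwise inner products for the Rayleigh statistic and p/n times the sum of centered squared inner products for the Bingham statistic. The helper names and the use of the first k coordinates as the subspace are illustrative assumptions.

```python
import numpy as np

def unit_rows(n, p, rng):
    """n i.i.d. uniform points on S^{p-1}, stored as rows."""
    g = rng.standard_normal((n, p))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def low_rank_rows(n, p, k, rng):
    """Uniform points on S^{p-1} restricted to the first k coordinates."""
    x = np.zeros((n, p))
    x[:, :k] = unit_rows(n, k, rng)
    return x

def rayleigh_stat(x):
    """Assumed form: sqrt(2p)/n * sum_{i<j} x_i' x_j."""
    n, p = x.shape
    s = x.sum(axis=0)
    sum_pairs = 0.5 * (s @ s - np.sum(x * x))
    return np.sqrt(2.0 * p) / n * sum_pairs

def bingham_stat(x):
    """Assumed form: p/n * sum_{i<j} ((x_i' x_j)^2 - 1/p)."""
    n, p = x.shape
    gram = x @ x.T
    sum_sq_pairs = 0.5 * (np.sum(gram ** 2) - np.sum(np.diag(gram) ** 2))
    n_pairs = n * (n - 1) / 2.0
    return p / n * (sum_sq_pairs - n_pairs / p)

rng = np.random.default_rng(0)
x_null = unit_rows(200, 100, rng)
x_alt = low_rank_rows(200, 100, 50, rng)
print("Rayleigh:", rayleigh_stat(x_null), rayleigh_stat(x_alt))
print("Bingham: ", bingham_stat(x_null), bingham_stat(x_alt))
```

Under the low-rank alternative the inner products remain centered by symmetry, so the Rayleigh statistic stays of order one, whereas the squared inner products have mean 1/k rather than 1/p, which shifts the Bingham statistic upward.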
4 When are the distance and test useful?
In this section, we discuss several advantages of the distance and explain why the proposed test is useful. First, the distance is a measure of “symmetry” and differs from classical metrics between probability measures such as total variation, Hellinger, or chi-squared distances. These distances are not tailored to the orthogonally invariant structure of the problem and, in particular, they do not reflect geometric features such as concentration along lower-dimensional subspaces.
To illustrate the difference with such metrics, consider the class of low-rank uniform distributions introduced in Section 3.3. In terms of total variation distance, we always have
for all subspaces with dimension less than or equal to .
Thus, density-based distances such as total variation are not well-suited for alternatives that are singular with respect to the uniform distribution. In contrast, the distance is sensitive to geometric deviations: it detects that low-rank uniform distributions have large “empty regions” compared to the standard uniform distribution, and it also captures the concentration patterns of FvML distributions at the optimal rate. We also note that the Wasserstein distance is another metric that can detect geometric features and is sensitive to changes in intrinsic dimension. However, constructing tests based on the Wasserstein distance is substantially more involved, both analytically and computationally.
Another interesting feature of the distance , for which we do not yet have a fully general theory, is that, when restricted to many parametric, high-dimensional classes of distributions, the threshold at which converges to a non-zero limit often coincides with the minimax rate of testing uniformity within that family. One can show this for some other models, such as the Watson distributions or the spiked covariance distributions. An asymptotic expansion for the distance along a parametric model can often be obtained using an Edgeworth-type expansion (although the computation can be tedious).
Regarding the test , we observe that it achieves the optimal detection rates in both the FvML model and the low-rank model, with explicit power functions available in each case, albeit without being the locally most powerful test in either setting. As discussed above, we believe that is rate-optimal for a broad range of parametric models; that is, we conjecture that the threshold at which converges to a positive limit coincides with the minimax rate for testing uniformity in that parametric model. This has been verified for the two models considered in Sections 3.2 and 3.3. One can also establish this correspondence for the Watson distributions via an Edgeworth-type expansion, although deriving the local limiting distributions would require a LAN expansion similar to that in [Cutting-P-V]. At present, however, we are not aware of a unified framework for computing the local power of across different parametric families.
In what follows, we examine yet another class of distributions whose geometric structure is close to that of the uniform distribution, thereby providing further insight into the behavior of .
-
•
The class of -spherical distributions. This model arises by projecting heavy-tailed random vectors with i.i.d. components onto the unit sphere. It was introduced in [heiny2022limiting; dornemann2025limiting] in the study of a heavy-tailed analog of the Marchenko–Pastur law for sample correlation matrices. Subsequently, [jiang2025asymptotic] showed that both the Rayleigh test (2) and the Bingham test (3) are inconsistent for this model in the proportional regime , while the packing test (4) remains consistent.
Formally, the -spherical distribution is defined as
where is a -dimensional random vector with i.i.d. symmetric components, each regularly varying with index .
Although this model does not represent a local alternative to the uniform distribution, the geometric behavior of its samples is remarkably similar: in both cases, the points are nearly orthogonal in high dimensions. The subtle difference between and is that, under , there are a few points that are either very close to each other or almost aligned along straight lines through the origin. Intuitively, since almost all the points are orthogonal, the tests that are based on a single polynomial of the inner products, like the Rayleigh test and Bingham test, would fail to be consistent.
By [cohen2020heavy] (see the discussion after Theorem 4.1), it follows that
for some non-degenerate random variable that can be written as the ratio of independent stable random variables. Since , we have , and hence as . Therefore,
As , the first probability converges to , while the second converges to , so for all large enough,
Thus, the test is also consistent in this model.
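The geometric picture described above (most pairs nearly orthogonal, a few pairs nearly aligned) can be reproduced in a small simulation. This is a sketch under stated assumptions: the components are taken to be symmetrized Pareto variables with tail index alpha, which is one convenient regularly varying choice and not necessarily the one used in the cited works, and the function names are hypothetical.

```python
import numpy as np

def sample_heavy_tailed_spherical(n, p, alpha, rng):
    """Project vectors with i.i.d. symmetric Pareto-type entries (tail index
    alpha) onto the unit sphere. The Pareto choice is an assumption."""
    u = 1.0 - rng.random((n, p))              # uniform on (0, 1]
    magnitude = u ** (-1.0 / alpha)           # P(|Z| > t) = t^(-alpha), t >= 1
    sign = rng.choice([-1.0, 1.0], size=(n, p))
    z = sign * magnitude
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def sample_uniform_sphere(n, p, rng):
    g = rng.standard_normal((n, p))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def max_abs_inner_product(x):
    """Largest |x_i' x_j| over distinct pairs."""
    gram = x @ x.T
    np.fill_diagonal(gram, 0.0)
    return float(np.max(np.abs(gram)))

rng = np.random.default_rng(0)
x_heavy = sample_heavy_tailed_spherical(300, 100, alpha=0.5, rng=rng)
x_unif = sample_uniform_sphere(300, 100, rng)
print("median |inner product|:",
      np.median(np.abs((x_heavy @ x_heavy.T)[np.triu_indices(300, k=1)])),
      np.median(np.abs((x_unif @ x_unif.T)[np.triu_indices(300, k=1)])))
print("max |inner product|:   ",
      max_abs_inner_product(x_heavy), max_abs_inner_product(x_unif))
```

In such runs the extreme inner products are the visible difference between the two samples, which is consistent with the heuristic that statistics built from a single polynomial of the inner products have difficulty separating the two models.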
The behaviour of the four tests is summarized in Table 1. Only the proposed test remains consistent or rate-optimal across all three models.
| Test / model | FvML | Low rank | -spherical |
|---|---|---|---|
| (2) | , optimal [Cutting-P-V] | , sub-optimal | inconsistent [jiang2025asymptotic] |
| (3) | , sub-optimal [Cutting-P-V] | , optimal | inconsistent [jiang2025asymptotic] |
| (4) | blind at , sub-optimal [jiang2025asymptotic] | , sub-optimal | consistent [jiang2025asymptotic] |
| (8) | , optimal Proposition 2 | , optimal Proposition 3 | consistent |
5 Conclusions and remarks
In this paper, we propose a novel distance to quantify deviations from uniformity, together with a test naturally associated with this distance. We show that the test enjoys very simple asymptotic properties in high dimensions and admits a model-free consistency theory. We establish optimal detection rates with respect to the proposed distance and show that the test attains these rates. Furthermore, we show that, when restricted to parametric models, the proposed distance precisely captures the usual notion of local alternatives; this is verified for the FvML model and for a low-rank uniform distribution model. As a consequence of our analysis, we obtain the detection threshold for testing the intrinsic dimension of the uniform distribution. We now make some remarks.
- 1.
-
2.
We believe that the proposed distance characterizes the minimax rates for other models as well. For example, one can show this for the Watson model and for certain spiked-covariance models, although the analysis in those cases requires specific restrictions on the joint growth of and .
-
3.
One can also investigate a procedure based on an -type distance instead, for which similar results are expected to hold. We leave this as a direction for future work.
6 Proofs
6.1 Proof of Proposition 1
Before presenting the proof, we first state a version of Lebesgue’s differentiation theorem for smooth and complete Riemannian manifolds. The version provided below is not the most general result but is sufficient for our purposes.
Lemma 1.
Let be a smooth, complete Riemannian manifold with the corresponding geodesic distance . Suppose is a non-negative, finite Borel measure on the metric space and is a non-negative integrable function with respect to . Then, we have
| (18) |
for -almost everywhere . Here is the open ball with respect to the geodesic distance .
Proof of Lemma 1. Define
It is easy to see that . By Theorem A.1 in [jost2021probabilistic], there exists a measurable set such that and
| (19) |
exists and is finite for all . Also by Theorem A.1 in [jost2021probabilistic], is the Radon–Nikodym derivative (up to a null set) whenever is complete. Thus, -almost everywhere. The proof is completed.
In the argument of Lemma 1, the completeness of is needed only to deduce that in (19) is equal to the Radon–Nikodym derivative -almost everywhere. Results of this type hold for various metric spaces with different structures, see the classical monograph [GMT] for more details.
Proof of Proposition 1. It is easy to see that given the assumptions in Proposition 1, we have
| (20) |
for any bounded, measurable function . We will show that (20) implies . To see this, fix and define
Let and define the Radon–Nikodym derivatives , . It follows that and thus,
| (21) |
-almost surely. Plugging into (20) gives
Thus,
| (22) |
Moreover, by Fubini’s theorem,
Additionally,
Therefore, from (22) and the two equalities above, we get
| (23) |
for all .
Let be the geodesic distance on . It is easy to check that is a Polish space and that for all ,
where and is the closed ball with center at , radius , and with respect to . Hence, for all , one can rewrite (23) as
| (24) |
where we have used Lemma 8 to write
for all . Note that (24) holds since the right-hand side in the expression above is constant across .
Since is a smooth, complete Riemannian manifold with respect to the canonical Riemannian metric, Lemma 1 can be applied to to deduce that
as , for -almost surely since as . Therefore, the above display together with (21), (24) and Fatou’s lemma yields
Moreover, Hölder’s inequality gives
The two bounds above imply that the integral is exactly and for -almost surely . This in turn yields almost surely with respect to . From (21), we also get and the conclusion follows.
6.2 Proof of Theorem 3
Fix sufficiently small and define
In other words, is a FvML distribution with location and concentration parameter . Consider the least favorable distribution
| (25) |
By Proposition 7, for all , we have
whenever is sufficiently large and is small enough.
6.3 Proof of Theorem 1
Define
| (26) |
We will show that the process converges in distribution to in the Skorohod space , for all . Some basic properties regarding the topology on this space can be found in [Dehling].
Here is the CDF of a standard normal distribution and is the Brownian bridge. We do not work directly with the space (see [vogel2010weak] for more details) since the supremum functional is not almost surely continuous on this space, see Remark 1 below.
Step 1: Convergence in . Suppose . To show the convergence in , we need to check that
Condition 2 (Finite-dimensional convergence in distribution).
For any grid , one has converges in distribution to .
Condition 3 (Tightness).
For any , we have
| (27) |
Proposition 4.
Let be a sequence of measurable functions such that and
| (28) | ||||
| (29) |
Then,
| (30) |
Applying Proposition 4 to kernels of the form , we get the convergence of finite-dimensional distributions. Note that condition (29) is satisfied because is asymptotically standard normal.
By applying Lemma 6 to the class of functions (which has VC-dimension ) and using the degeneracy of the kernels, we obtain
The proof is completed by first letting and then letting .
Step 2: Continuous mapping and negligibility of the tail. By the continuous mapping theorem, for all , we have
To deduce the result, it suffices to show that for all
| (31) |
The above is equivalent to showing that
Since the proofs of these two limits are identical, we will only prove the former. We again apply Lemma 6 to the VC-type class of functions
to deduce that
where the variance profile is defined as
It is easy to check that converges to as . This finishes the proof.
Remark 1.
The reason we do not work directly with is that the functional
is not almost surely continuous at .
In fact, the topology on is equivalent to the coarsest topology such that the projection map from to is continuous for all (see Section 3 in [vogel2010weak] for more details). Since this topology only sees the behavior of the process on bounded intervals, modifying the process on a diverging sequence still yields convergence in , but the supremum functional can blow up.
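To make the object studied in Theorem 1 concrete, the following sketch computes a Kolmogorov–Smirnov-type statistic from pairwise inner products. The statistic (8) and the process (26) are defined earlier in the paper and are not reproduced here, so the form below is an assumption: the empirical distribution of the N = n(n-1)/2 pairwise inner products is compared with their exact null law, and the supremum is scaled by sqrt(N), a normalization chosen so that the pointwise null variance is F(t)(1 - F(t)) by the pairwise independence in Lemma 8. The p-value uses the classical series for the supremum of the absolute value of a Brownian bridge and is meaningful only as a high-dimensional approximation. All function names are hypothetical.

```python
import math
import numpy as np

def inner_product_null_cdf(t, p, grid_size=200001):
    """CDF of x'y for independent uniform x, y on S^{p-1}, p >= 3. The density
    is proportional to (1 - s^2)^((p-3)/2); integrated by the trapezoid rule."""
    s = np.linspace(-1.0, 1.0, grid_size)
    dens = (1.0 - s ** 2) ** ((p - 3) / 2.0)
    steps = 0.5 * (dens[1:] + dens[:-1]) * np.diff(s)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    return np.interp(t, s, cdf / cdf[-1])

def pairwise_ks_statistic(x):
    """sqrt(N) * sup_t |empirical CDF of pairwise inner products - null CDF|."""
    n, p = x.shape
    ips = np.sort((x @ x.T)[np.triu_indices(n, k=1)])
    m = ips.size
    f0 = inner_product_null_cdf(ips, p)
    upper = np.max(np.arange(1, m + 1) / m - f0)
    lower = np.max(f0 - np.arange(0, m) / m)
    return math.sqrt(m) * max(upper, lower)

def kolmogorov_sf(x, terms=100):
    """P(sup |Brownian bridge| > x) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2)."""
    if x <= 0:
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

def unit_rows(n, p, rng):
    g = rng.standard_normal((n, p))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

rng = np.random.default_rng(0)
stat = pairwise_ks_statistic(unit_rows(100, 50, rng))
print(stat, kolmogorov_sf(stat))
```

Using the exact null law of the inner product rather than its Gaussian limit avoids a bias of order 1/p in the centering, which would otherwise be amplified by the sqrt(N) scaling when n is large relative to p.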
6.4 Proof of Theorem 2
Consider a sequence of laws such that . Put and define
For , rewrite in terms of Hoeffding’s projection as
where
Define
Roughly speaking, is the point where the main contribution from the Hoeffding projection term arises, and is the point where the main contribution from the deterministic perturbation arises.
It suffices to consider the following two cases.
Case 1: . In this case, we can estimate
It is easy to check that
where the second inequality follows from Lemma 6.
Consequently,
Since and , the test rejects with probability tending to one.
Case 2: . In this case, by a subsequence argument, we can assume that . Notice that
where
and the second line follows from the fact that
By Lemma 6 and the assumption that , we deduce that . Now, thanks to the Berry–Esseen bound for sums of i.i.d. random variables (see, for example, Theorem 3.7 in [chen2010normal]) and the fact that , we have
The proof in this case is completed by employing Lemma 2 below with
to get
Lemma 2.
Suppose is a sequence of random variables such that
where is a normal distribution with variance .
Let be any sequence of real numbers (not necessarily bounded), and let be a sequence of random variables such that . Then
6.5 Proof of Proposition 2
As in the proof of Theorem 1, we need to check three conditions:
-
•
Convergence in finite-dimensional distributions. Recall in (26). We need to show that for all ,
(32) under the FvML distributions with concentration parameter .
-
•
Tightness. This condition is equivalent to (27), for all spaces with .
-
•
Negligibility of the tail. This condition is (31).
To show (32), recall that by Proposition 4 and the Cramér–Wold device, we have
under uniformity, where is a standard normal, is the Rayleigh test as in (2) and has the joint distribution of a Brownian bridge evaluated at . Moreover, the correlation between and can be specified as (this is also the covariance limit in the proof of Proposition 7)
with , for all .
We then obtain (32) from the convergence above by using the LAN expansion (52) (see [Cutting-P-V] for a proof) and Le Cam’s third lemma.
We now show (27) and (31). Since their proofs are similar, we will only show (27). Assume the contrary. Then there exist and and a sequence such that
| (33) |
where is the corresponding subsequence of FvML alternatives. Put
Recall in (50). Thanks to Proposition 5, we have
| (34) |
since and under uniformity (which holds because (27) holds under uniformity). Note that the second equality in the display above follows from the fact that the distribution of is invariant under rotations:
for all orthogonal matrices .
6.6 Proof of Theorem 4
Let us start with a useful result for calculating likelihood ratios between distributions that are invariant under group actions. For terminology related to group actions and maximal invariants, we refer the reader to Chapters 2 and 3 of the monograph [eaton1989group].
For the reader’s convenience, we briefly recall the relevant concepts. A group is said to act on a space if there exists a mapping that is compatible with the group operation. A measurable mapping is called an invariant if
An invariant is called a maximal invariant if, whenever for some , there exists such that .
Lemma 3.
Let be a Polish space, and suppose a compact group acts on continuously. Let and be two Borel probability measures on that are invariant under the action of . Let be a continuous maximal invariant for some Polish space . Define the induced laws
Then whenever . Moreover, when this holds and , we have
The proof of Lemma 3 can be found in Appendix A.4. We now construct the least favorable alternative. Let denote the normalized left Haar measure on the Grassmannian (so that it is a probability measure). Define
Roughly speaking, is the joint distribution of under , while is the law obtained by first sampling a -dimensional subspace and then sampling
From now on, we let denote the data matrix whose columns are . We will apply Lemma 3 to show that exists and to derive its explicit form. Note that, although one can also use the Blaschke–Petkantschin formula to compute this integral (see, for example, Chapter 7 of [schneider2008stochastic]), the computation is quite lengthy. Define
where is the set of all symmetric matrices of size .
Note that the matrix is nothing but the sample correlation matrix without centering by the sample mean. We can then apply Lemma 2.1 in [jiang2019determinant] and Theorem 5.1.3 in [muirhead2009aspects] to get the density of under as
| (35) |
We have in the above formula instead of in [muirhead2009aspects] because there is no centering term in , and Lemma 2.1 in [jiang2019determinant] asserts that this difference amounts to a shift of one unit in .
Here joint densities of the upper-diagonal entries of . Equivalently, it can also be regarded as a measure on , defined as the pushforward measure of the Lebesgue measure on an open subset of to via the natural embedding.
Similarly, under , the density of is given by
| (36) |
It is easy to see that the two laws in (35) and (36) are mutually absolutely continuous. Also, these two densities are well-defined due to our assumption that . Thus, Lemma 3 gives
| (37) |
where the normalizing constant satisfies
Similarly to the proof of Theorem 3, we only need to show that
whenever with defined in (37).
From Lemma 5.2 in [jiang2015likelihood], we find that
for where is the multivariate Gamma function defined as in (5.1) of [jiang2015likelihood]. The specific form of is not relevant to our proof as we will only need the asymptotic result from Proposition 5.1 of [jiang2015likelihood]. These asymptotic results are collected in Lemma 4 in Appendix A.5 below.
Appendix A Technical results, discussions and other proofs
A.1 Comparison with projection-based tests
At a high level, our newly developed procedure follows the philosophy of projection-based tests, initially developed in [cuesta2009projection]. Their test relies on the characterization (7). This characterization can be shown using a variant of the Cramér-Wold device, although the same argument does not apply to Proposition 1. Based on (7), [cuesta2009projection] proposed a test that rejects for large values of
| (39) |
where is drawn independently from the data and .
The test in [cuesta2009projection] uses the same critical value as the Kolmogorov–Smirnov test. In practice, is drawn multiple times from and one gets a corresponding -value for every such . The test rejects if the smallest -value is below a threshold. More specifically, one picks a large number and draws independently from . The test in [cuesta2009projection] rejects at -level if
where is the critical value of the Kolmogorov–Smirnov test and is the -quantile of the left-hand side. However, as the asymptotic theory for this test remains unresolved, computationally intensive Monte Carlo methods are often required to approximate .
Subsequent works, such as [escanciano2006consistent; garcia2023projection; garcia2021cramer] (see also the references therein), addressed this issue by integrating over all possible directions , resulting in test statistics of the form
| (40) |
for some weight function , where the expectation above is taken with respect to .
Test statistics like (40) exhibit desirable properties, similar to the Anderson–Darling and Cramér–von Mises tests. However, the expectation with respect to in (40) often lacks a closed-form expression, requiring Monte Carlo simulations for approximation. Additionally, their asymptotic distributions frequently involve weighted sums of chi-squared distributions, complicating the computation of tail probabilities. For example, the tests in [garcia2021cramer; garcia2023projection] rely on Imhof’s method to approximate the critical value. Our proposed test offers two key advantages over projection-based tests:
-
1.
Reduction in computational cost: Unlike projection-based methods, our test avoids sampling random directions or integrating over all possible directions, which often requires complex procedures to approximate the critical values. Theorem 1 demonstrates that when the dimension is large, the tail probabilities of our test statistic are much simpler to approximate, eliminating the need for Monte Carlo simulation.
-
2.
Flexibility in high-dimensional settings: While existing projection-based tests are valid only in fixed-dimensional scenarios, our test extends seamlessly to high-dimensional settings, including cases where is large and is small. Extending projection-based tests to such settings is highly non-trivial, as it requires understanding how the eigenvalues of the associated Hilbert–Schmidt operator shrink to zero as the dimension increases.
A.2 Simulation studies
In this subsection, we conduct simulation studies to compare the power of with the three existing tests in (2), (3) and (4). We let in the experiments. The empirical powers of the four tests are reported in Figure 1 below. The yellow curve corresponds to the proposed test . We can see that the empirical power agrees well with the theoretical analysis: the proposed test is nearly as good as the Rayleigh test in the FvML model and is the only test that remains consistent under both models.


A.3 Proof of Proposition 4
To prove (30), we employ a martingale central limit theorem. We will use Corollary 3.1 in [Hall]. Define
| (41) | ||||
| (42) |
Thanks to Lemma 8 in Section A, the sequence is a martingale difference sequence with respect to the natural sigma-fields . This means that . To get (30) via the martingale CLT from Corollary 3.1 in [Hall], we need to verify the following two conditions
| (43) |
for every fixed , and
| (44) |
To verify the Lindeberg condition (43), it suffices to show that
| (45) |
By the pairwise independence property (see Lemma 8), we get and thus,
| (46) |
where is the limit in (28). To bound the -th moment terms in (45), we first note that
due to the assumption and Lemma 8. Similarly,
Consequently, for some universal constant , we get
where the last assertion follows from the second statement of Lemma 8. Summing the above display over all , we arrive at
for some universal constant . This in turn yields
by (44) and (46). The last term in the above display goes to by assumption (29), and thus (45) follows. This in turn concludes the proof of (43).
We now prove (44). Recall defined in (42), which can be rewritten as
Let be i.i.d. realizations of the uniform distribution on . Define
| (47) |
It is easy to see that for any , we have
Thus, we have
where
To prove (44), we only need to show that . Let . Note that for any two pairs and , we have
unless . This is due to the assumption and Lemma 8. Consequently,
where we use the fact that in the last bound. By (46), so it suffices to verify that , which is the content of Lemma 10 in Section A. This concludes the proof of (44), which together with (43) implies (30).
A.4 Proof of Lemma 3
Suppose for some measurable subset . Let and . By the disintegration theorem (see, for example, Appendix F of [pollard2002user]),
and similarly
For each , define the -invariant conditional laws (such sets are called fibers) by
for measurable , where is the normalized Haar probability measure on .
Roughly speaking, we integrate the disintegration kernels over the group so that the resulting kernels are -invariant, up to some null sets in . Because and are -invariant, we still have
Each fiber is a single -orbit, in the sense that it is generated by for some , because is a maximal invariant. Thus acts transitively on every such fiber. By Theorem 4.5 of [eaton1989group], a transitive compact group action admits a unique -invariant probability measure on each orbit. Hence,
| (48) |
Since , we have
and therefore for -almost every as well, because . Using on this set of ,
Thus .
Now, for the statement regarding the likelihood ratio, let
Then, by (48),
On the other hand,
Hence,
which implies
This completes the proof.
A.5 Likelihood ratio analysis
This section contains the analysis of the likelihood ratio’s second moment used in Sections 3.2 and 3.3.
We first analyze the second moment of the likelihood ratio between the FvML distributions with randomized locations and the uniform distributions. We parametrize the FvML distributions as
with and . From this, we find that
| (49) |
Some basic properties of the above normalizing constant are collected in Lemma 12. Define the likelihood ratio
| (50) |
Our first result is an asymptotic formula for the moment of the likelihood ratio.
Proposition 5.
Proof of Proposition 5. Recall the form of in (50). To compute the second moment, take two independent copies of the uniform distribution and write
Note that , where has the law as in (5). Thus, we have
where has a symmetric Beta-type distribution as in (5).
Put
By the third property in Lemma 12, we have
Consequently,
The proof is completed by noting that the sequence converges to a standard normal distribution and is exponentially tight.
The next asymptotic result was used in the proof of Theorem 4. Recall that the multivariate Gamma function is defined as
| (51) |
for all complex numbers such that .
The multivariate Gamma function reduces to the usual Gamma function for . The following lemma is taken from Lemma 5.1 and Proposition 5.1 in [jiang2015likelihood].
Lemma 4.
Let be the standard Gamma function and be the multivariate Gamma function as in (51). We have
-
•
Uniformly for all ,
as .
-
•
Uniformly for all ,
as with , where
Proof of Lemma 4. The first item follows from Lemma 5.1 in [jiang2015likelihood] and the second item follows from Proposition 5.1 in [jiang2015likelihood].
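For numerical work with the quantities in Lemma 4, it is convenient to evaluate the multivariate Gamma function on the log scale. The display (51) is not reproduced in this section, so the sketch below assumes the standard definition, in which the multivariate Gamma function of order m at a equals pi^{m(m-1)/4} times the product of Gamma(a - (i-1)/2) over i = 1, ..., m, restricted here to real arguments. The function name is hypothetical.

```python
import math

def log_multivariate_gamma(a, m):
    """log of the multivariate Gamma function of order m at real a, assuming
    Gamma_m(a) = pi^(m(m-1)/4) * prod_{i=1}^m Gamma(a - (i-1)/2), a > (m-1)/2."""
    if a <= (m - 1) / 2.0:
        raise ValueError("requires a > (m - 1) / 2")
    log_pi_term = m * (m - 1) / 4.0 * math.log(math.pi)
    return log_pi_term + sum(math.lgamma(a - (i - 1) / 2.0)
                             for i in range(1, m + 1))

# Order m = 1 reduces to the usual Gamma function.
print(log_multivariate_gamma(2.5, 1), math.lgamma(2.5))
```

Working with logarithms avoids overflow when both the order and the argument grow, which is the regime covered by the asymptotic statements above.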
Proposition 6.
A.6 Kolmogorov distance asymptotic results
Let us start with a simple observation.
Lemma 5.
Suppose and are two sequences of probability measures such that the likelihood ratio exists. Let be a sequence of random variables. If
then
Proof of Lemma 5. Observe that
The second inequality can be proven similarly. The proof is completed.
Proposition 7.
Let with . Suppose are two i.i.d. random points on . Then, for any , we have
as , where is a FvML distribution on with concentration parameter . Moreover, the convergence above is uniform in , which also means , where is the metric in (13).
We did not specify the location parameter of the FvML distributions in the statement of Proposition 7 since the distribution of the inner product does not depend on the choice of location. The proof of Proposition 7 is based on an application of Le Cam’s third lemma and the high-dimensional LAN result in [Cutting-P-V].
The interesting feature of this approach is that no growth condition on and is assumed. We do not know whether direct analyses based on the Edgeworth expansion or spherical harmonics can yield the same result. Proposition 7 holds without any growth condition on and because of some special properties of the Bessel functions of the first kind, which were exploited in [Cutting-P-V] and Proposition 5 above.
Proof of Proposition 7. Fix and consider the sequence of random variables
Recall in (50). It was shown in [Cutting-P-V] that under uniformity, we have the LAN expansion
| (52) |
where is the Rayleigh test in (2).
Recall that is the CDF of a standard normal. By using Proposition 4 and the Cramér–Wold device, we have
under uniformity, where
with being a standard normal in the expression above. The function is nothing but the opposite sign of the standard Gaussian density.
The convergence in expectation in the display above follows from the fact that the normalized inner products are asymptotically normal and have uniformly bounded fourth moments. Thus,
under uniformity. By using Le Cam’s third lemma and (52), we obtain that under ,
Since is the sum of pairwise independent, mean zero, bounded random variables under uniformity rescaled by , its fourth moment is uniformly bounded by a universal constant. Combining this fact with the expansion (52), Lemma 5 and Proposition 5, we obtain the mean convergence
This completes the proof of the first claim.
We now prove the uniform convergence. Observe that for a fixed
It is a standard fact that if a sequence of equicontinuous functions converges pointwise on a compact set, then the convergence is uniform (this is a corollary of the Arzelà–Ascoli theorem, see Theorem 4.43 in [folland1999real]). To use this fact, let us check that the functions
| (53) |
are equicontinuous.
The second term is obviously smooth and has bounded derivatives, so we only have to treat the first term. By the second estimate in Lemma 5, for all , we have
Since the densities of under (given by (55) below) are uniformly bounded for large (the upper bound can be taken as for ), the equicontinuity follows. In fact, the argument above also gives Hölder continuity of the sequences in (53) with exponent and with the same Hölder constant.
Since the functions in (53) are equicontinuous and converge pointwise to in , the convergence is uniform. Thus, we have
for all .
It suffices to show that the first term in the last display goes to as . To see this, use the second estimate in Lemma 5 again to get
Consequently,
where . By (54) below, we have
Therefore,
The last term goes to whenever . The proof is completed.
The next result gives the expansion in terms of the distance for the low-rank model (17). Recall that and are the uniform distributions on -sphere and -sphere, respectively.
Proposition 8.
Suppose , , and . Let be i.i.d. sampled from either or . Then, we have
as , for all . Moreover, the convergence above is uniform in , which also means
The rate is sharper than a direct application of the Berry–Esseen bound and requires a more careful analysis. We will prove a stronger result that the -distance between the densities of and a standard normal is of order . To see this, note that by (5), the density of has the form
| (55) |
A direct calculation yields
Also, for , we have
uniformly.
Thus,
where the first equality follows from the fact that on the interval . Consequently, (54) follows.
To finish the proof, notice that
for all . The convergence above is uniform whenever . The proof is completed.
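The size of the discrepancy in the low-rank model can also be evaluated numerically. The sketch below assumes that the distance in (13) is the Kolmogorov distance between the laws of the inner product of two independent points, which is what the heading of this subsection suggests, and that the inner product of two independent uniform points on a sphere of dimension d - 1 has density proportional to (1 - s^2)^{(d-3)/2}. The function names are hypothetical.

```python
import numpy as np

def inner_product_cdf_on_grid(dim, grid):
    """CDF on `grid` of x'y for independent uniform x, y on S^{dim-1}, dim >= 3,
    obtained by trapezoid integration of (1 - s^2)^((dim-3)/2)."""
    dens = (1.0 - grid ** 2) ** ((dim - 3) / 2.0)
    steps = 0.5 * (dens[1:] + dens[:-1]) * np.diff(grid)
    cum = np.concatenate(([0.0], np.cumsum(steps)))
    return cum / cum[-1]

def kolmogorov_distance_low_rank(p, k, grid_size=200001):
    """sup_t |F_k(t) - F_p(t)| for the inner-product laws on S^{k-1}, S^{p-1}."""
    grid = np.linspace(-1.0, 1.0, grid_size)
    diff = inner_product_cdf_on_grid(k, grid) - inner_product_cdf_on_grid(p, grid)
    return float(np.max(np.abs(diff)))

p = 400
for k in (390, 380, 360, 320):
    print(k, (p - k) / p, kolmogorov_distance_low_rank(p, k))
```

The printed values allow a quick visual check of how the distance scales with the relative dimension deficit (p - k)/p when k is close to p.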
A.7 Other technical results
Let us start with a concentration inequality for degenerate U-processes, taken from [cattaneo2024uniform]. Let be a measurable space and let be i.i.d. -valued random variables with common law . Let be a pointwise measurable class of measurable functions .
Define the (canonical) degenerate -process of order two by
for . Assume that
-
1.
Each is symmetric, i.e. for all .
-
2.
There exists a measurable envelope such that for all and all .
-
3.
For any probability measure on and , let
Suppose that the envelope is VC-type in the sense that there exist constants and such that, for all ,
where the supremum is taken over all finite discrete probability measures on .
Under the three conditions above, we have
Lemma 6 (Lemma SA37 from [cattaneo2024uniform]).
Let be any deterministic quantity satisfying
and define the random variable
Then there exists a universal constant such that
Proof of Lemma 6. See [cattaneo2024uniform].
Lemma 7.
Suppose are matrices of size such that . Then, there exists an orthogonal matrix of size such that .
Proof of Lemma 7. Notice that the assumption implies that since and have the same kernel. We will use the notation to indicate the column space of a matrix . Therefore, the map
is well-defined for all . Note that since their kernels are identical.
Moreover, is an isometry because
Since , admits a linear isometric extension to . Since linear isometries are represented by orthogonal matrices, we have
for some orthogonal matrix of size and all . The proof is completed.
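The construction in this proof can be checked numerically. The sketch below assumes the hypothesis of Lemma 7 is the Gram identity $A^\top A = B^\top B$, as the isometry argument suggests. The names `A`, `B`, `Q_true`, `Q_hat` and `orthogonal_link` are hypothetical. The code does not extend the isometry by hand as the proof does; it recovers an orthogonal matrix through the orthogonal Procrustes solution, a swapped-in technique that attains a zero residual whenever an exact orthogonal link exists.

```python
import numpy as np

def orthogonal_link(A, B):
    """Return an orthogonal Q minimizing ||Q A - B||_F (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(B @ A.T)
    return U @ Vt

rng = np.random.default_rng(0)
m, n, r = 5, 4, 2

# A is rank-deficient on purpose, so that the extension step matters.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
# A random orthogonal matrix, used only to manufacture B with B^T B = A^T A.
Q_true, _ = np.linalg.qr(rng.standard_normal((m, m)))
B = Q_true @ A

Q_hat = orthogonal_link(A, B)

gram_gap = np.abs(A.T @ A - B.T @ B).max()            # hypothesis holds
fit_gap = np.abs(Q_hat @ A - B).max()                 # conclusion: B = Q_hat A
orth_gap = np.abs(Q_hat.T @ Q_hat - np.eye(m)).max()  # Q_hat is orthogonal
print(gram_gap, fit_gap, orth_gap)
```

The recovered `Q_hat` need not equal `Q_true`: on the orthogonal complement of the column space of `A` the extension is not unique, which is consistent with the lemma asserting existence only.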
Lemma 8.
Let , and be i.i.d. realizations of the uniform distribution on and be a bounded measurable function. Then, we have
• almost surely.
• and are independent.
Proof of Lemma 8. The first claim is a consequence of the rotational invariance of the uniform distribution. Conditioning on , there exists an orthogonal matrix such that . Thus, with probability one, we have
where we use the fact that is independent of in the last equality. Similarly, one can also show that . This concludes the proof of the first claim.
For the second claim, take any bounded measurable functions and . By conditioning on , we get
where we use the conclusion of the first statement in the last equality. This concludes the proof.
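A small Monte Carlo experiment illustrates the first claim. It assumes the claim states that the conditional expectation of $f(\langle X, Y\rangle)$ given $X$ does not depend on $X$. The dimension `p`, the sample size `n`, the test function `f` and the two conditioning points `x1`, `x2` are hypothetical choices made for illustration only.

```python
import numpy as np

def sample_sphere(n, p, rng):
    """Draw n points uniformly from the unit sphere in R^p by normalizing Gaussians."""
    z = rng.standard_normal((n, p))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

rng = np.random.default_rng(1)
p, n = 5, 200_000

def f(t):
    """A bounded test function on [-1, 1]."""
    return t ** 2

Y = sample_sphere(n, p, rng)
x1 = np.eye(p)[0]              # first conditioning point
x2 = np.ones(p) / np.sqrt(p)   # a different conditioning point

cond_mean_1 = f(Y @ x1).mean()  # estimate of E[f(<X, Y>) | X = x1]
cond_mean_2 = f(Y @ x2).mean()  # estimate of E[f(<X, Y>) | X = x2]
exact = 1.0 / p                 # E[<x, Y>^2] = 1/p for every unit vector x
print(cond_mean_1, cond_mean_2, exact)
```

Both estimates should agree with $1/p$ up to Monte Carlo error, whichever unit vector is used for conditioning.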
A direct consequence of Lemma 8 is that, for any bounded measurable function , the terms defined in (41) are degenerate U-statistics of order (see, for example, Section 5.3 of the monograph [Dehling] for a comprehensive introduction to U-statistics and their limit theory). This fact was used frequently throughout the proof of Theorem 1. The next lemma is used to check the conditions of the martingale CLT in the proof of Theorem 1. It gives a simpler form of the distribution of joint angles.
Lemma 9.
Let be an integer. Assume and are fixed vectors. Let be a random vector uniformly distributed over and be i.i.d. standard normal. Set . Then, for any bounded measurable function , we have
Proof of Lemma 9. We can assume without loss of generality. Suppose and are constant vectors with . Let be an orthogonal matrix whose first two rows are and , respectively. Then, by the Haar invariance of the uniform distribution on the sphere, and are identically distributed. In particular, the top two entries of , that is, have the same distribution as that of . Write
The advantage of doing so is the trivial observation that and are orthogonal unit vectors. Hence and have the same law as that of . This implies that have the same law as that of
As a result,
where the last expectation is taken over , hence it is a function of .
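The reduction to two Gaussian coordinates can be illustrated by simulation. The sketch assumes one reading of Lemma 9: for unit vectors $a, b$ with $\langle a, b\rangle = \rho$ and $U$ uniform on the sphere, the pair $(a^\top U, b^\top U)$ has the same law as $(Z_1, \rho Z_1 + \sqrt{1-\rho^2}\, Z_2)/\|Z\|$ for a standard normal vector $Z$. The values of `p`, `n`, `rho` and the vectors `a`, `b` are hypothetical. Only one mixed moment is compared, which is a necessary consequence of equality in law and not a full test of it.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, rho = 6, 200_000, 0.4

# Two generic unit vectors a, b with inner product rho.
a = rng.standard_normal(p)
a /= np.linalg.norm(a)
c = rng.standard_normal(p)
c -= (c @ a) * a               # make c orthogonal to a
c /= np.linalg.norm(c)
b = rho * a + np.sqrt(1 - rho ** 2) * c

# Left side: joint angles of a uniform point U with a and b.
G = rng.standard_normal((n, p))
U = G / np.linalg.norm(G, axis=1, keepdims=True)
lhs_moment = np.mean((U @ a) * (U @ b))

# Right side: an independent Gaussian vector Z; only its first two entries
# and its norm enter the representation.
Z = rng.standard_normal((n, p))
R = np.linalg.norm(Z, axis=1)
first = Z[:, 0] / R
second = (rho * Z[:, 0] + np.sqrt(1 - rho ** 2) * Z[:, 1]) / R
rhs_moment = np.mean(first * second)

exact = rho / p   # E[(a^T U)(b^T U)] = a^T b / p, since E[U U^T] = I / p
print(lhs_moment, rhs_moment, exact)
```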
Lemma 10.
Let be defined in (47) for a sequence of measurable functions . Assume additionally that
for some constant independent of . Then, we have
as .
Proof of Lemma 10. It suffices to prove Lemma 10 when is bounded. Indeed, suppose the claim has been proved for all bounded ; then write
where the expectation is taken with respect to the law of . Then,
For every fixed the first term tends to zero when . We then deduce the result by letting and noting that
Now assume that is bounded. Let be drawn from the uniform distribution on independently from and . Thanks to Lemma 9, we can write
where is a vector consisting of i.i.d. standard normal entries. Set and let be the density of . We can write
for any fixed . By Proposition 5 in [Jiang13], we get as and hence, it suffices to bound the first integrand over . Thanks to Lemma 11, for we get the bound
by choosing in Lemma 11. Note that since , which gives us
where we use the bound (56) in the last inequality. This in turn yields
since the -norm of is uniformly bounded. Consequently,
The proof is completed by taking and then taking .
Lemma 11.
Let be a vector consisting of i.i.d. standard normal entries. Then, for any bounded measurable function and , we have
for some universal constant .
Proof of Lemma 11. The conclusion follows from the following two total variation distance bounds
| (56) |
and
| (57) |
for some universal constant and for all large enough in the first estimate. We next explain (56) and (57).
The first estimate (56) is a consequence of the Diaconis-Freedman theorem (see Theorem 2.8 in the monograph [Meckes] and also the paper [D-F]). It provides a sharp bound in total variation between the joint distribution of the first few entries of and the standard multivariate normal random vector of the same length. The second estimate (57) is elementary and can be proven directly by estimating the difference between the two corresponding densities. To see this, write
where
The term is obviously of order as and thus, we only need to bound . In polar coordinates, can be rewritten as
Moreover, we have
where we use the elementary inequalities for all and for all , where depends only on . Thus, we have
This concludes the proof of (57).
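The Gaussian approximation behind (56) can be made concrete in the simplest case of a single coordinate. The sketch below evaluates, by numerical integration, the total variation distance between $\sqrt{p}\,U_1$ (with $U$ uniform on $S^{p-1}$) and a standard normal, using the classical density $\Gamma(p/2)\,\{\sqrt{\pi}\,\Gamma((p-1)/2)\}^{-1}(1-t^2)^{(p-3)/2}$ of $U_1$. The function name `tv_first_coordinate` and the grid size are hypothetical. The scaling by $\sqrt{p}$ is assumed to match the normalization used in the text.

```python
import math
import numpy as np

def tv_first_coordinate(p, grid=20001):
    """TV distance between sqrt(p) * U_1 (U uniform on S^{p-1}) and N(0, 1)."""
    half_width = math.sqrt(p)
    s = np.linspace(-half_width, half_width, grid)
    # Log normalizing constant of the density of sqrt(p) * U_1.
    log_c = (math.lgamma(p / 2) - math.lgamma((p - 1) / 2)
             - 0.5 * math.log(math.pi * p))
    base = np.clip(1.0 - s ** 2 / p, 0.0, None)
    dens = math.exp(log_c) * base ** ((p - 3) / 2)
    phi = np.exp(-s ** 2 / 2) / math.sqrt(2 * math.pi)
    gap = np.abs(dens - phi)
    # Trapezoidal rule on the support of sqrt(p) * U_1.
    inside = np.sum((gap[1:] + gap[:-1]) * np.diff(s)) / 2
    # Normal mass outside the support, where the first density vanishes.
    outside = math.erfc(math.sqrt(p / 2))
    return 0.5 * (inside + outside)

tvs = {p: tv_first_coordinate(p) for p in (25, 100, 400)}
for p, tv in tvs.items():
    print(p, tv, p * tv)
```

If the distance decays at rate $1/p$, the printed products `p * tv` should remain of the same order across the three dimensions.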
The next lemma collects some elementary properties and bounds for the normalizing constant of the FvML distributions.
Lemma 12.
Recall in (49). The following statements hold:
• If is the modified Bessel function of the first kind (see [Ley-Verdebout] for details), then
• For all , we have
where
• For all , we have
for all and .
Proof of Lemma 12. The first property can be found in [M-Jupp], pages and . The second property can be found in Section 3 of [hornik2013amos]. Let us use the first two properties to prove the last one.
We first show that
| (58) |
To see this, we apply the second property with to obtain
From the upper bound, it is clear that the Bessel ratio is always less than . Consider two cases as follows.
Case 1: . In this case, the result is trivial since
Case 2: . In this case, put . Use the lower bound to get
where the last line follows from the fact that
Finally, to deduce the third property, we integrate (58) to get
This completes the proof.
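The behavior of the Bessel ratio used in this proof can be inspected numerically. The sketch assumes the ratio in question is $R_\nu(t) = I_{\nu+1}(t)/I_\nu(t)$ with $\nu = p/2 - 1$. It checks the classical Amos-type pair $t/(\nu+1+\sqrt{t^2+(\nu+1)^2}) \le R_\nu(t) \le t/(\nu+\sqrt{t^2+\nu^2})$, which may be weaker than the bounds of [hornik2013amos] used in the lemma. The helper names `log_bessel_i` and `bessel_ratio` are hypothetical, and the Bessel functions are evaluated from their power series so that the standard library suffices.

```python
import math

def log_bessel_i(nu, t, terms=400):
    """log I_nu(t) from the power series, summed in log space for stability."""
    logs = [(2 * k + nu) * math.log(t / 2)
            - math.lgamma(k + 1) - math.lgamma(k + nu + 1)
            for k in range(terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def bessel_ratio(nu, t):
    """R_nu(t) = I_{nu+1}(t) / I_nu(t)."""
    return math.exp(log_bessel_i(nu + 1, t) - log_bessel_i(nu, t))

# Sanity check against the closed form for nu = 1/2: R(t) = coth(t) - 1/t.
closed_form_gap = abs(bessel_ratio(0.5, 2.0) - (1 / math.tanh(2.0) - 0.5))

checks = []
for p in (3, 10, 50):
    nu = p / 2 - 1
    for t in (0.5, 2.0, 10.0, 30.0):
        r = bessel_ratio(nu, t)
        lower = t / (nu + 1 + math.sqrt(t ** 2 + (nu + 1) ** 2))
        upper = t / (nu + math.sqrt(t ** 2 + nu ** 2))
        checks.append(lower <= r <= upper and r < 1)
all_ok = all(checks)
print(closed_form_gap, all_ok)
```

The power series is adequate for the moderate arguments used here. For large concentration parameters an exponentially scaled routine would be a safer choice.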