Detecting non-uniform patterns on high-dimensional hyperspheres

Tiefeng Jiang
School of Data Science, Chinese University of Hong Kong, Shenzhen
Tuan Pham
Department of Statistics and Data Science, University of Texas, Austin
Abstract

We propose a new probabilistic characterization of the uniform distribution on the hypersphere in terms of the distribution of inner products, extending the ideas of [cuesta2009projection; cuesta2007sharp] in a data-driven manner. Using this characterization, we define a new distance that quantifies the deviation of an arbitrary distribution from uniformity.

As an application, we construct a novel nonparametric test for the problem of testing uniformity, namely the task of determining whether a set of $n$ i.i.d. random points on the $p$-dimensional hypersphere is approximately uniformly distributed. Under the null, the suitably normalized test statistic converges in distribution to the supremum of a Brownian bridge, and the test can detect any alternative lying outside a ball of radius $1/n$ with respect to the proposed distance, in both high- and low-dimensional settings.

We then prove a matching lower bound with respect to this distance and study its behavior when restricted to parametric models. In particular, we show that the minimax detection thresholds with respect to this distance coincide with the usual minimax thresholds in two important families: (i) the class of Fisher–von Mises–Langevin (FvML) alternatives, and (ii) a class of low-rank uniform distributions. Thus, the proposed test is optimal in these models. We also derive the limiting distributions of the test under the corresponding local alternatives.

As a byproduct of our analysis, we determine the detection threshold in the high-dimensional regime for testing the intrinsic dimension of the uniform distribution on $\mathbb{S}^{p-1}$; that is, for testing whether the distribution is uniformly supported on $\mathbb{S}^{p-1}$ against the alternative that it is uniformly distributed on

$$\mathbb{S}^{p-1}\cap H,$$

for some $k$-dimensional linear subspace $H\subset\mathbb{R}^{p}$.

1 Introduction

Testing whether a sample from an unknown distribution is uniformly distributed over a domain is a classical problem in statistical theory. In the discrete case, this problem has been extensively studied by statisticians, computer scientists, and probabilists; see [bhattacharya2024sparse; balakrishnan2018hypothesis] and references therein. For the continuous case, one of the most common and intriguing settings is the unit hypersphere, not only because of its rich mathematical structure but also due to its importance in statistical analysis on non-Euclidean spaces. Below, we briefly formulate the problem and review relevant literature.

Consider the hypersphere $\mathbb{S}^{p-1}=\left\{x\in\mathbb{R}^{p}:\|x\|_{2}=1\right\}$, where $\|\cdot\|_{2}$ is the Euclidean norm. The observed data points are denoted by $\bm{X}_{1},\bm{X}_{2},\dots,\bm{X}_{n}$ with $\bm{X}_{i}\in\mathbb{S}^{p-1}$ for all $i=1,2,\dots,n$. We are mainly interested in the high-dimensional case, where $p=p_{n}$ is a sequence diverging to infinity. Assume that the $\bm{X}_{i}$ are drawn independently from an unknown distribution $\mu$ supported on the hypersphere $\mathbb{S}^{p-1}$. The uniform distribution on $\mathbb{S}^{p-1}$ is denoted by $\mbox{Unif}(\mathbb{S}^{p-1})$. The uniformity testing problem can be formulated as

$$H_{0}:\mu=\mbox{Unif}(\mathbb{S}^{p-1})\quad\mbox{against}\quad H_{1}:\mu\neq\mbox{Unif}(\mathbb{S}^{p-1}). \tag{1}$$

In fixed and small dimensions, the uniformity testing problem has been investigated extensively over the last few decades. An incomplete list of early results concerning the case $p=2$ includes the Kuiper test (see, for example, [Kuiper]), the Watson test (see, for example, [watson]) and the Hodges–Ajne test (see, for example, [Ajne]). In arbitrary but fixed dimensions, the class of Sobolev-based tests was introduced in [Gine] and shown to be universally consistent against any absolutely continuous alternative with an $L^{2}$-integrable density. Notable developments of Sobolev tests include the data-driven procedures proposed by [Bogdan] and [Jupp]. Readers are referred to the survey papers [survey-uni; pewsey2021recent] for recent progress on this problem and for a list of recent testing procedures. The consistency and optimality of various testing procedures in fixed dimensions have been well studied in the literature, and can also be found in [survey-uni]. Recent results in the fixed-dimensional setting include [garcia2021cramer; garcia2023projection; fernandez2023new; boucher2025modified; boucher2025runs].

In the era of big data, there has been increasing interest in high-dimensional directional statistics, in which the dimension is assumed to diverge to infinity. For example, in shape analysis and nonparametric statistics, a popular approach is to consider sign-based procedures, in which one projects the observations onto the hypersphere and carries out statistical inference based on the projected data. This approach is robust in high dimensions since the concentration of measure phenomenon implies that the majority of the information in the data is captured by the directions rather than the magnitudes of the observations. We give a brief overview of the high-dimensional directional statistics literature below.

In [Dryden05], the author investigates the asymptotic properties of high-dimensional spherical distributions and their applications to brain shape modeling. Specifically, the study involved statistical modeling of a sample of $n=74$ MRI images of adult brains. After normalization, each brain image was represented as a unit vector of dimension $p=62{,}501$. A natural question in this modeling task is whether some simple, well-known distribution (such as the uniform distribution) provides a good fit for the data. Clustering analysis on high-dimensional hyperspheres has been studied in [Banerjee04; Banerjee03]. Potential applications of high-dimensional uniformity tests were illustrated in [Juan2001], in which the authors relate the multivariate outlier detection problem to the uniformity testing problem. Sign-based procedures in high dimensions have been considered in [Zou14] in the context of sphericity testing and in [WPL15], where the authors propose a high-dimensional nonparametric mean test.

From a different perspective than the directional statistical viewpoint discussed above, our primary motivation for studying the high-dimensional analog of (1) arises from deep learning theory. In overparameterized neural networks, regularization is crucial for preventing overfitting and improving generalization. In [xie2017diverse], it was shown that optimizing one-hidden-layer neural networks with approximately uniformly distributed neurons can help avoid spurious local minima. Furthermore, empirical studies in [lin2020regularizing; liu2018learning] have shown that regularization methods promoting uniformity among neurons effectively reduce the generalization error in deep networks. Such methods are fundamentally tied to the question of whether a random set of points on the unit hypersphere is approximately uniformly distributed. The overparameterized nature of deep networks makes it natural to study this question in the high-dimensional settings. Given the complexity of many deep networks, including heavy-tailed or strongly correlated structures (see [mahoney2019traditional] for details), we focus on detecting non-uniformity in a non-parametric manner. This perspective shifts the attention away from traditional parametric modeling goals—such as optimality and asymptotic local power within a parametric class of distributions—towards prioritizing simplicity of implementation and universal consistency.

Despite the vast literature concerning fixed-dimensional tests, much less is known about the uniformity testing problem in the high-dimensional context with diverging dimensions. When the dimension diverges to infinity, many of the existing procedures require highly non-trivial adjustments to work properly. Moreover, there is usually no tractable limiting distribution under uniformity, and the power is typically low due to the curse of dimensionality. To the best of the authors’ knowledge, there are only three high-dimensional tests that have been investigated in literature. We give a short overview of such tests below.

  1. Rayleigh test in [Cutting-P-V] and [Ley-P]. This test can be formulated in terms of a U-statistic of the data points with the inner product kernel, i.e.

    $$R_{n}:=\frac{\sqrt{2p}}{n}\sum_{1\leq i<j\leq n}\bm{X}^{\top}_{i}\bm{X}_{j}. \tag{2}$$

  2. Bingham test in [Cutting-P-V2; Zou14] and [Ley-P]. This test is also based on a U-statistic of the data points, but with a quadratic inner product kernel, i.e.

    $$B_{n}:=\frac{p}{n}\sum_{1\leq i<j\leq n}\Big[\left(\bm{X}^{\top}_{i}\bm{X}_{j}\right)^{2}-\frac{1}{p}\Big]. \tag{3}$$

  3. Packing test in [Jiang13]. This test is based on the smallest angle, i.e.

    $$P_{n}:=p\cdot\max_{1\leq i<j\leq n}\left(\bm{X}^{\top}_{i}\bm{X}_{j}\right)^{2}-4\log n+\log\log n. \tag{4}$$

The asymptotic distributions of these test statistics, as well as their non-null behaviors, have been studied rigorously over the last few years; see, for example, [Cutting-P-V; Cutting-P-V2; Ley-P; Ley-P-2; Ley-P-V]. It is known that the Rayleigh test $R_{n}$ and the Bingham test $B_{n}$ enjoy a doubly robust property: under the null hypothesis and the single assumption $\min\{n,p\}\to\infty$, both $R_{n}$ and $B_{n}$ converge in distribution to the standard normal distribution. This feature is highly desirable since no restriction on the dependence between $p$ and $n$ is imposed, and neither resampling procedures nor tuning parameters are needed to obtain the critical values of such tests. Regarding the packing test $P_{n}$, it is known that, under the null hypothesis and the mild assumption $p\gg(\log n)^{2}$, $P_{n}$ converges in distribution to the Gumbel distribution with CDF $\exp\left(-(8\pi)^{-1/2}e^{-x/2}\right)$ (see [Jiang13] and also [Jiang12]).
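As a rough illustration of (2), (3) and (4), the following Python sketch computes the three statistics from the pairwise inner products. The function names `sample_uniform_sphere`, `pairwise_inner_products` and `rayleigh_bingham_packing` are hypothetical and are not taken from the cited papers. The sampler normalizes standard Gaussian vectors, which is a standard way to draw from $\mbox{Unif}(\mathbb{S}^{p-1})$, and plain Python loops are used for transparency rather than speed.

```python
import math
import random


def sample_uniform_sphere(n, p, rng):
    """Draw n i.i.d. points from Unif(S^{p-1}) by normalising Gaussian vectors."""
    points = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(v * v for v in g))
        points.append([v / norm for v in g])
    return points


def pairwise_inner_products(points):
    """Return the inner products X_i^T X_j over all pairs i < j."""
    n = len(points)
    return [
        sum(a * b for a, b in zip(points[i], points[j]))
        for i in range(n)
        for j in range(i + 1, n)
    ]


def rayleigh_bingham_packing(points):
    """Compute (R_n, B_n, P_n) as written in (2), (3) and (4)."""
    n, p = len(points), len(points[0])
    rho = pairwise_inner_products(points)
    r_n = math.sqrt(2 * p) / n * sum(rho)                    # linear kernel, (2)
    b_n = p / n * sum(r * r - 1.0 / p for r in rho)          # quadratic kernel, (3)
    p_n = (p * max(r * r for r in rho)
           - 4 * math.log(n) + math.log(math.log(n)))        # extreme value, (4)
    return r_n, b_n, p_n


# Illustrative run under H0 (the sample sizes are arbitrary choices).
rng = random.Random(0)
sample = sample_uniform_sphere(n=40, p=30, rng=rng)
print(rayleigh_bingham_packing(sample))
```

Under $H_{0}$ with both $n$ and $p$ large, repeated runs of this sketch should produce values of $R_{n}$ and $B_{n}$ that look like draws from $N(0,1)$, and values of $P_{n}$ that follow the Gumbel law quoted above.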

Each of these three tests has its own advantages and disadvantages. However, a common limitation is that they are each optimal only for a specific class of (parametric) alternatives: they perform well against certain models but may be essentially powerless outside those classes. Given the inherently nonparametric nature of the uniformity testing problem, it is therefore natural to prioritize robustness and optimality over a broad range of alternatives when designing testing procedures.

The primary objective of this article is to address this issue by approaching the problem (1) from a probabilistic and geometric perspective, rather than relying on the likelihood-based framework commonly used in statistics. We introduce a novel pseudometric to quantify deviations from uniformity (see (13) below). Unlike most classical distances between probability measures, this distance takes into account geometric deviations from uniformity; see Section 4 for further discussion. We then define a test statistic $T_{n}$ (see (8) below) that is naturally based on this distance and is universally consistent in fixed dimension. A key advantage of our test is its model-free nature: it imposes no structural assumptions on the underlying class of distributions, since it is not derived from likelihood inference. In high-dimensional settings, the proposed test enjoys a “doubly robust” property analogous to that of the Rayleigh and Bingham tests, yet remains intrinsically nonparametric. It admits a simple asymptotic theory in the high-dimensional regime, and comes with a consistency theory that is not restricted to any particular parametric class of alternatives. Our contributions can be summarized as follows.

  • We propose a new distance $d$ (see (13) for a precise definition) to quantify deviations from uniformity. This distance does not require the alternatives to be absolutely continuous with respect to the uniform distribution and is therefore well suited for analyzing singular alternatives. A natural test statistic associated with this distance (see $T_{n}$ in (8)) is introduced and shown to converge in distribution to the supremum of a Brownian bridge under the null (Theorem 1).

  • We prove a lower bound of order $1/n$ for testing with respect to $d$ (Theorem 3), and we show that the proposed test achieves this lower bound (Theorem 2). Both results are established without imposing any restriction between the sample size $n$ and the dimension $p$.

  • We investigate how the distance $d$ behaves when restricted to parametric models, and how it relates to the minimax testing problem in those settings. In particular, we study two concrete models: the Fisher–von Mises–Langevin model and a low-rank uniform distribution model. We derive the local limiting distribution and the local power of $T_{n}$ in these two models (Propositions 2 and 3), and show that $T_{n}$ is asymptotically the supremum of a shifted Brownian bridge under such alternatives, where the shift is a smooth function that vanishes at the end points.

    As a direct consequence of the local limiting distributions, we see that the minimax detection threshold with respect to $d$ coincides with the usual minimax detection rate within the corresponding parametric model (Propositions 7 and 8 in Appendix A.6). In other words, the threshold at which $d\asymp 1/n$ matches the minimax rate for testing uniformity in the corresponding parametric model. This phenomenon seems to hold for other models as well; see the discussion in Section 4.

  • As a byproduct of our analysis, we obtain an information-theoretic lower bound for testing the intrinsic dimension of the uniform distribution. This result is new and is of independent interest. In this low-rank uniform distribution model, the detection thresholds of the four tests $R_{n}$, $B_{n}$, $P_{n}$, and our proposed test can be identified precisely.

    We find that in this low-rank model, only the Bingham test and our proposed test attain the optimal detection threshold, and that our test is the only one that achieves the optimal rate simultaneously in both of the parametric models considered above; see Table 1 below.

We would also like to point out that there are other approaches in the literature that are not based on likelihood inference, such as the family of Sobolev tests originally proposed in [Gine] and the projection-based tests introduced in [cuesta2009projection], with further developments in recent works [garcia2021cramer; garcia2023projection; fernandez2023new]. Each of these approaches uses a different characterization of the uniform distribution: the Sobolev tests rely on the eigenfunctions of the Laplacian, while the projection-based tests are based on the one-dimensional distributions obtained by projecting the data onto all possible directions. However, none of these tests extend easily to high-dimensional settings, as they either involve tuning parameters or require resampling methods to implement. In contrast, our proposed test offers a simple and interpretable asymptotic theory in high-dimensional settings, which we discuss below. A detailed comparison between our test and the class of projection-based tests is provided in Section A.1.

The rest of the paper is organized as follows. The proposed pseudo-distance and the test are presented in Section 2. The lower bounds and local limiting distributions are provided in Section 3. Further discussion of the proposed pseudo-distance and test can be found in Section 4. Section 5 contains the conclusions and some remarks. The proofs of the main results are presented in Section 6. Some simulations, technical results, proofs and discussions are provided in Appendix A.

2 Measuring uniformity deviation and testing procedure

2.1 Notation and preliminaries

Throughout the paper, we consider the hypersphere $\mathbb{S}^{p-1}=\left\{x\in\mathbb{R}^{p}:\|x\|_{2}=1\right\}$, where $\|\cdot\|_{2}$ is the Euclidean norm. We always assume, without stating it explicitly, that the dimension $p=p_{n}$ diverges to infinity. The observed data points are denoted by $\bm{X}_{1},\bm{X}_{2},\dots,\bm{X}_{n}$ with $\bm{X}_{i}\in\mathbb{S}^{p-1}$ for all $i=1,2,\dots,n$. We assume that the $\bm{X}_{i}$ are drawn independently from an unknown distribution $\mu$ supported on the hypersphere $\mathbb{S}^{p-1}$. The uniform distribution on $\mathbb{S}^{p-1}$ is denoted by $\mbox{Unif}(\mathbb{S}^{p-1})$.

For a pair of data points $(\bm{X}_{i},\bm{X}_{j})$, we denote by $\bm{X}^{\top}_{i}\bm{X}_{j}$ the inner product of $\bm{X}_{i}$ and $\bm{X}_{j}$. Under $H_{0}$, the distribution of $\bm{X}_{1}^{\top}\bm{X}_{2}$ is known to satisfy (see, for example, Lemmas 11 and 12 in [Jiang13])

$$\mathbb{P}\left(a\leq\bm{X}_{1}^{\top}\bm{X}_{2}\leq b\right)=\frac{1}{\sqrt{\pi}}\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p-1}{2}\right)}\cdot\int_{a}^{b}\left(1-\rho^{2}\right)^{\frac{p-3}{2}}d\rho. \tag{5}$$

In formula (5) above, $a$ and $b$ are taken to be in $(-1,1)$. The CDF and density of the standard normal distribution $N(0,1)$ will be denoted by $\Phi(t)$ and $\phi(t)$, respectively.

Throughout the paper, we will use $\mu_{0}$ or $\mathbb{P}_{0}$ to denote the uniform distribution. The notation $\mathbb{P}_{0n}$ is used to indicate the $n$-fold product measure of the uniform distribution.

2.2 A characterization of the uniform distribution

Let us start with an important observation regarding the random inner product of two i.i.d. points on the hypersphere.

Proposition 1.

Let $\nu$ be any Borel probability measure on $\mathbb{S}^{p-1}$, and let $\mu_{0}$ denote the uniform distribution on $\mathbb{S}^{p-1}$. Suppose $\bm{X}_{1},\bm{Y}_{1}$ are drawn independently from $\nu$, and $\bm{X},\bm{Y}$ are drawn independently from $\mu_{0}$. If

$$\bm{X}_{1}^{\top}\bm{Y}_{1}\stackrel{d}{=}\bm{X}^{\top}\bm{Y}, \tag{6}$$

then $\nu\equiv\mu_{0}$.

Proposition 1 establishes the identifiability of the uniform distribution in terms of the inner product, asserting that the distribution of the inner product uniquely characterizes $\mbox{Unif}(\mathbb{S}^{p-1})$. To the best of our knowledge, this characterization is new and may be of independent interest. Notably, we do not impose any regularity assumptions on the measure $\nu$ in the statement of Proposition 1, and the result holds even if $\nu$ is highly singular. For instance, the characterization applies to probability measures supported on sets of lower dimension. Proposition 1 allows us to construct tests that work against any type of alternative, which distinguishes them from other omnibus tests in the literature, which typically require alternatives to have $L^{2}$-integrable densities with respect to $\mbox{Unif}(\mathbb{S}^{p-1})$.

A related but distinct characterization of the uniform distribution in terms of projections was proposed in [cuesta2009projection]; see Section A.1 for further details and comparisons with the class of projection-based tests, which are built upon this characterization. Some recent extensions of this characterization to other types of distributions can be found in [fraiman2023cramer; fraiman2024application; fraiman2023quantitative]. Specifically, the characterization in [cuesta2009projection] is as follows. Let $\bm{X}_{1},\bm{X}_{2}$ be random vectors on $\mathbb{S}^{p-1}$ for some $p\geq 1$ and let $\bm{U}\sim\mbox{Unif}(\mathbb{S}^{p-1})$. Then, under some regularity conditions,

$$\bm{X}_{1}^{\top}\bm{U}\stackrel{d}{=}\bm{X}_{2}^{\top}\bm{U}\Leftrightarrow\bm{X}_{1}\stackrel{d}{=}\bm{X}_{2}. \tag{7}$$

Broadly speaking, the characterization (7) relies on projections onto independent, uniformly distributed directions. To construct testing procedures using (7), one often needs to sample the direction $\bm{U}$ repeatedly. In contrast, the left-hand side of (6) does not involve any uniformly distributed direction and is completely data-driven. Consequently, our method requires neither integrating over all directions $\bm{U}$, as was done in [escanciano2006consistent; garcia2023projection; fernandez2023new], nor sampling $\bm{U}$ repeatedly, as was done in [cuesta2009projection].

The key challenge in proving Proposition 1 is that the arguments used to establish (7) are no longer applicable. Specifically, the proof of the characterization (7) in [cuesta2007sharp; cuesta2009projection] relies on a sharp version of the Cramér–Wold device, which cannot be applied to (6) because $\bm{Y}_{1}$ and $\bm{Y}$ follow different laws. The proof of Proposition 1 relies instead on a subtle application of the Lebesgue differentiation theorem, which will be detailed in Section 6.1.

The first motivation for our model-free testing comes from Proposition 1 above. Intuitively, instead of carrying out inference directly on the data points, one can construct tests based on some unique features of the random inner product under $H_{0}$. Here we choose the CDF as such a feature: the test estimates the CDF of $\bm{X}^{\top}_{1}\bm{X}_{2}$ from the data and rejects $H_{0}$ if the estimated CDF differs too much from the true CDF $F_{0}$. Under $H_{0}$, we know that $\bm{X}^{\top}_{1}\bm{X}_{2}$ has a beta-type distribution with specific parameters (see equation (9) below). Another advantage of the random inner products is that they are one-dimensional objects, which keeps the computational cost low. Moreover, they become “asymptotically independent” as the dimension gets larger. This last phenomenon has been observed in a number of works related to Haar matrices; see, for example, [Jiang05; Jiang09; D-F] and the references therein.

Another motivation for our model-free test is based on the following observation regarding the three existing tests $R_{n}$, $B_{n}$ and $P_{n}$ defined in (2), (3) and (4), respectively. It is reasonable to argue that the primary cause of their model dependence is the inability to capture all the information available in the data. In particular, the Rayleigh test $R_{n}$ only takes advantage of the linear kernel, which is powerless for models involving axial data, such as the Watson distributions. On the other hand, the Bingham test $B_{n}$ uses a quadratic kernel and performs poorly on non-axial data. The packing test $P_{n}$ uses extreme values, and as pointed out in [Jiang13], it does not examine whether there is a gap in the data or not.

To fully utilize all information available in the data, we will look at the empirical distribution generated from all random inner products, instead of using a particular U-statistic. To be more precise, let $\bm{X}_{1},\bm{X}_{2},\dots,\bm{X}_{n}$ be the data. We define

$$\hat{\mu}_{n}=\frac{2}{n(n-1)}\sum_{1\leq i<j\leq n}\delta_{\bm{X}^{\top}_{i}\bm{X}_{j}}.$$

In light of Proposition 1, the empirical measure $\hat{\mu}_{n}$ intuitively captures the characteristics of the underlying unknown distribution. One can then compare $\hat{\mu}_{n}$ with $\bm{X}^{\top}_{1}\bm{X}_{2}\big|_{H_{0}}$, the distribution of $\bm{X}^{\top}_{1}\bm{X}_{2}$ under $H_{0}$, via the Kolmogorov distance to see how far the data are from uniformity. To define the test formally, we write

$$T_{n}:=\sup_{t\in[-1,1]}\Big|\frac{2}{n(n-1)}\sum_{1\leq i<j\leq n}\mathbf{1}_{\left\{\bm{X}^{\top}_{i}\bm{X}_{j}\leq t\right\}}-m(t)\Big|, \tag{8}$$

where

$$m(t):=\mathbb{P}\left(\bm{X}^{\top}_{1}\bm{X}_{2}\leq t\,\Big|\,H_{0}\right)=\mathbb{P}\left(1-2U\leq t\right), \tag{9}$$

and $U\sim\mbox{Beta}\left(\frac{p-1}{2},\frac{p-1}{2}\right)$; these parameters follow from the density in (5), since $1-\rho^{2}=4u(1-u)$ when $\rho=1-2u$.
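The statistic in (8) can be evaluated exactly from the sorted inner products, as for the classical Kolmogorov–Smirnov statistic. The sketch below is one possible implementation under stated assumptions: the names `null_cdf` and `t_statistic` are hypothetical, $p\geq 3$ is assumed so that the density in (5) is bounded, and $m(t)$ is obtained by Simpson integration of that density rather than through a Beta CDF routine. The final lines draw a small uniform sample by normalizing Gaussian vectors, purely as a usage example.

```python
import math
import random


def null_cdf(t, p, steps=2000):
    """m(t) = P(X1^T X2 <= t) under H0, via Simpson's rule on the density in (5).

    Assumes p >= 3 (bounded integrand); `steps` must be even.
    """
    t = max(-1.0, min(1.0, t))
    # Normalising constant Gamma(p/2) / (sqrt(pi) * Gamma((p-1)/2)), on the log scale.
    log_c = math.lgamma(p / 2) - math.lgamma((p - 1) / 2) - 0.5 * math.log(math.pi)
    const = math.exp(log_c)

    def density(r):
        return const * max(0.0, 1.0 - r * r) ** ((p - 3) / 2)

    h = (t + 1.0) / steps
    total = density(-1.0) + density(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(-1.0 + k * h)
    return total * h / 3.0


def t_statistic(points):
    """T_n from (8): Kolmogorov distance between the empirical law of the
    pairwise inner products and the null CDF m."""
    n, p = len(points), len(points[0])
    rho = sorted(
        sum(a * b for a, b in zip(points[i], points[j]))
        for i in range(n)
        for j in range(i + 1, n)
    )
    n_pairs = len(rho)
    sup = 0.0
    for k, r in enumerate(rho, start=1):
        m = null_cdf(r, p)
        # The supremum is attained at a jump of the empirical CDF:
        # just before the jump ((k-1)/N) or at the jump (k/N).
        sup = max(sup, abs(k / n_pairs - m), abs((k - 1) / n_pairs - m))
    return sup


# Usage example on a small uniform sample (sizes chosen arbitrarily).
rng = random.Random(0)
sample = []
for _ in range(20):
    g = [rng.gauss(0.0, 1.0) for _ in range(10)]
    norm = math.sqrt(sum(v * v for v in g))
    sample.append([v / norm for v in g])
print("T_n =", t_statistic(sample))
```

The cost is dominated by the $n(n-1)/2$ evaluations of $m$; for large samples one would likely replace the numerical integration by a library Beta CDF, which is a design choice left open here.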

2.3 Testing procedure

In what follows, we assume $p$ depends on $n$ and sometimes we write $p=p_{n}$ for clarity. Recall that the Brownian bridge $\left\{B_{t};0\leq t\leq 1\right\}$ has the same distribution as $\left\{W_{t}-tW_{1};0\leq t\leq 1\right\}$, where $\left\{W_{t}\right\}$ is a standard Brownian motion. Our next result establishes the asymptotic distribution of $T_{n}$ under $H_{0}$.

Theorem 1.

Let $\left\{B_{t}\right\}_{0\leq t\leq 1}$ be a standard Brownian bridge on $[0,1]$. If $\min\left\{n,p\right\}\to\infty$, then

$$\sqrt{\frac{n(n-1)}{2}}\,T_{n}\xrightarrow{d}\max_{t\in[0,1]}|B_{t}|,$$

where $T_{n}$ is defined in (8).

The asymptotic distribution in Theorem 1 is the same as that of the classical nonparametric Kolmogorov–Smirnov test. The distribution of the maximum of the absolute value of the Brownian bridge is known explicitly: for $x>0$,

$$\mathbb{P}\left(\max_{t\in[0,1]}|B_{t}|>x\right)=2\sum_{k=1}^{\infty}(-1)^{k+1}\exp\left(-2k^{2}x^{2}\right); \tag{10}$$

see, for example, [brownian-bridge].

At first glance, one can see that $T_{n}$ is the supremum of a degenerate U-process. Moreover, the underlying distribution of the U-process is also allowed to change with the dimension. To the best of our knowledge, this situation has not been explored in the literature, although limit theory for non-degenerate U-processes is well studied. To establish the null distribution of $T_{n}$, we rely on a special property of the uniform distribution on the sphere: under $H_{0}$, the normalized inner products $\sqrt{p}\,\bm{X}^{\top}_{i}\bm{X}_{j}$ are pairwise independent and asymptotically normal. Moreover, as discussed above, the dependence between them becomes weaker as the dimension increases. Thus, one should expect the asymptotic distribution of $T_{n}$ to be the same as that of the classical Kolmogorov–Smirnov test in the i.i.d. setting. Remarkably, Theorem 1 presents a stark departure from classical results concerning degenerate U-processes with a fixed distribution. Typically, the asymptotic distributions of such statistics lack closed-form expressions; see (12) below. However, the divergence of $p$ gives a convenient asymptotic distribution, as demonstrated in Theorem 1.

Thanks to Theorem 1 and the expression (10), one can easily calculate the critical value $c_{\alpha}$ for the $\alpha$-level test. The test rejects the null hypothesis if $T_{n}\geq c_{\alpha}\sqrt{2/(n(n-1))}$, where $c_{\alpha}$ is chosen such that

$$\mathbb{P}\left(\max_{t\in[0,1]}|B_{t}|\leq c_{\alpha}\right)=1-\alpha.$$

The quantile $c_{\alpha}$ of the Kolmogorov–Smirnov distribution is well understood in the literature and can be calculated precisely via (10); for example, $c_{0.05}\approx 1.36$. This is also the critical value we choose in the simulations. From now on, we will use $\phi_{n}$ to denote the $\alpha$-level test based on $T_{n}$, defined by

$$q_{n}(\alpha):=\frac{\sqrt{2}\,c_{\alpha}}{\sqrt{n(n-1)}}\quad\text{and}\quad\phi_{n}:=\mathbf{1}_{\left\{T_{n}\geq q_{n}(\alpha)\right\}}, \tag{11}$$

where $T_{n}$ is defined in (8).
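The critical value and the decision rule in (11) are simple to evaluate numerically. The following sketch, with hypothetical function names `kolmogorov_tail`, `critical_value` and `uniformity_test`, truncates the series (10) and solves for $c_{\alpha}$ by bisection; the truncation level and the bisection bracket are illustrative choices, and the asymptotic p-value it reports relies on Theorem 1, so it is only meaningful when both $n$ and $p$ are large.

```python
import math


def kolmogorov_tail(x, terms=100):
    """P(max_t |B_t| > x) from the alternating series (10), truncated at `terms`."""
    if x <= 0.0:
        return 1.0
    return 2.0 * sum(
        (-1) ** (k + 1) * math.exp(-2.0 * k * k * x * x)
        for k in range(1, terms + 1)
    )


def critical_value(alpha, lo=0.3, hi=5.0, iters=100):
    """Solve kolmogorov_tail(c) = alpha by bisection (the tail decreases in c)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kolmogorov_tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def uniformity_test(t_n, n, alpha=0.05):
    """Apply (11) to an observed value of T_n.

    Returns (reject, q_n(alpha), asymptotic p-value).
    """
    c_alpha = critical_value(alpha)
    q_n = math.sqrt(2.0) * c_alpha / math.sqrt(n * (n - 1))
    p_value = kolmogorov_tail(math.sqrt(n * (n - 1) / 2.0) * t_n)
    return t_n >= q_n, q_n, p_value


print(round(critical_value(0.05), 3))  # the 1.36 value quoted in the text, to three decimals
```

Given an observed statistic `t_n` from a sample of size `n`, a call such as `uniformity_test(t_n, n)` returns the rejection decision together with the threshold $q_{n}(\alpha)$.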

From Theorem 1, one can see that the test $\phi_{n}$ is doubly robust: there is no restriction on the way $p$ diverges to infinity. Among the three known high-dimensional tests, only the Rayleigh test $R_{n}$ in (2) and the Bingham test $B_{n}$ in (3) satisfy this property. The packing test $P_{n}$ requires the mild regularity condition $p/(\log n)^{2}\to\infty$ and thus is not doubly robust. It is worth noting that the test $\phi_{n}$ is also valid in the fixed-$p$ scenario, albeit without the asymptotic distribution provided in Theorem 1. Indeed, when $p$ is fixed, it is known that

$$\sqrt{\frac{n(n-1)}{2}}\,T_{n}\xrightarrow{d}\max_{t\in[0,1]}|Q_{t}|, \tag{12}$$

where $T_{n}$ is defined in (8) and $\left\{Q_{t}\right\}_{0\leq t\leq 1}$ is a stochastic process whose marginal distributions are linear combinations of chi-squared distributions; see Theorem 7 in [nolan1988functional]. The asymptotic distribution in (12) does not have a tractable expression, and Monte Carlo simulation is needed to approximate the test’s critical value.

2.4 Model-free consistency

For any two probability measures $\mu$ and $\nu$ supported on $\mathbb{S}^{p-1}$, define the pseudometric

$$d(\mu,\nu):=\sup_{t\in[-1,1]}\Big|\mathbb{P}_{\mu}\left(\bm{X}_{1}^{\top}\bm{Y}_{1}\leq t\right)-\mathbb{P}_{\nu}\left(\bm{X}^{\top}\bm{Y}\leq t\right)\Big|, \tag{13}$$

where $\bm{X}_{1}$, $\bm{Y}_{1}$ are drawn independently from $\mu$ and $\bm{X}$, $\bm{Y}$ are drawn independently from $\nu$.

The distance $d$ in (13) is only a pseudometric, in the sense that $d(\mu,\nu)=0$ does not imply $\mu\equiv\nu$. However, in light of Proposition 1, we can see that $d(\mu,\mbox{Unif}(\mathbb{S}^{p-1}))=0$ yields $\mu\equiv\mbox{Unif}(\mathbb{S}^{p-1})$. Thus, the pseudometric $d$ can be used as a quantitative measure of the deviation from the null. Based on this pseudometric, we define a consistency criterion, namely the $1/n$-separation condition. The precise definition can be formulated as follows.

Condition 1 (separation condition).

Given a sequence $\left\{(n,p_{n});n\geq 1\right\}$, let $\mu_{n}$ be a sequence of probability measures on $\mathbb{S}^{p_{n}-1}$. We say that the sequence $\left\{\mu_{n}\right\}_{n\geq 1}$ satisfies the $1/n$-separation condition if

$$n\cdot d\left(\mu_{n},\mbox{Unif}\left(\mathbb{S}^{p_{n}-1}\right)\right)\to\infty, \tag{14}$$

where $d$ is the pseudometric defined in (13).

The separation condition (14) measures the departure from the null hypothesis in terms of the pseudometric $d$, which is the Kolmogorov distance between the laws of the random inner products under $H_{0}$ and $H_{1}$. Notably, the rate in (14) is of order $n^{-1}$ rather than the usual $n^{-1/2}$. This is due to the degenerate nature of $H_{0}$ and the form of $d$. Moreover, condition (14) is nonparametric in nature and requires neither parametric assumptions nor regularity conditions: the sequence of alternatives $\mu_{n}$ may or may not have densities with respect to the uniform measure $\mu_{0}$, and it is not restricted to any parametric class of distributions that contains $\mbox{Unif}(\mathbb{S}^{p-1})$.
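To illustrate how $d$ handles a singular alternative, consider the low-rank uniform model mentioned in the introduction. The short computation below is only a sketch: the symbols $\mu_{H}$ (the uniform distribution on $\mathbb{S}^{p-1}\cap H$ for a $k$-dimensional subspace $H$, with $2\leq k<p$) and $m_{q}$ (the null CDF (9) in dimension $q$) are notation introduced here for illustration.

```latex
% If X_1, Y_1 are i.i.d. from mu_H, expressing them in an orthonormal basis of H
% shows that X_1^T Y_1 has the law of the inner product of two independent
% uniform points on S^{k-1}. Hence
\[
  \mathbb{P}_{\mu_{H}}\left(\bm{X}_{1}^{\top}\bm{Y}_{1}\leq t\right)=m_{k}(t),
  \qquad\text{so that}\qquad
  d\left(\mu_{H},\mbox{Unif}(\mathbb{S}^{p-1})\right)
  =\sup_{t\in[-1,1]}\left|m_{k}(t)-m_{p}(t)\right|.
\]
```

Although $\mu_{H}$ has no density with respect to $\mu_{0}$, this distance is well defined, and it is strictly positive because the inner product has variance $1/k$ under $\mu_{H}$ and $1/p$ under $\mu_{0}$.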

We do not require $p_{n}$ to diverge to infinity in (14); the fixed-$p$ setting is also covered. The assumption that $p$ diverges is only required to control the size of the test via the asymptotic result in Theorem 1. In the fixed-$p$ scenario, one can use Monte Carlo simulation to approximate the test’s critical value, as noted after (12). If we keep $p$ fixed and consider a fixed alternative, then by Proposition 1, (14) always holds. Therefore, in the fixed-dimensional case, consistency under (14) corresponds to the universal consistency property of the Sobolev tests. Furthermore, condition (14) remains meaningful in high-dimensional settings, making it a natural analogue of universal consistency that operates in both fixed and high-dimensional cases. Given the separation condition (14), the test $\phi_{n}$ is consistent, as the next result shows.

Theorem 2.

Let $\mu_{n}$ be a sequence of probability measures on $\mathbb{S}^{p_{n}-1}$ which satisfies the separation condition (14). Then $\lim_{n\to\infty}\mathbb{P}_{\mu_{n}}(\phi_{n}=1)=1$. Here $\phi_{n}=\phi_{n}(\bm{X}_{1},\bm{X}_{2},\dots,\bm{X}_{n})$ is the test in (11).

The rate $1/n$ in condition (14) is sharp: we prove a matching lower bound in Theorem 3 below, which also does not impose any restriction on the relative growth of $p$ and $n$. Interestingly, if one restricts the model to a parametric class, the threshold at which the distance $d$ scales like $1/n$ often coincides with the minimax rate within that model. We do not have a proof or a result of this type for an arbitrary sequence of alternatives. However, we investigate two examples of this type in detail below, and derive the local limiting distributions along sequences of local alternatives at the minimax thresholds.

3 Lower bound and non-null results

3.1 An information lower bound

Define the set of test functions based on a sample of size $n$ as

\[ \mathcal{T}_{n}:=\left\{\phi=\phi\left(\bm{X}_{1},\bm{X}_{2},\dots,\bm{X}_{n}\right):\left(\mathbb{S}^{p-1}\right)^{n}\to\left\{0,1\right\}\right\}. \tag{15} \]

Using Le Cam's mixture argument, we can show the following.

Theorem 3.

Suppose $\min\left\{p,n\right\}\to\infty$. For $\varepsilon$ small enough, we have

\[ \liminf_{n\to\infty}\left\{\inf_{\phi\in\mathcal{T}_{n}}\left\{\mathbb{P}_{\mu_{0}}\left(\phi=1\right)+\sup_{d(\nu,\mu_{0})\geq\frac{\varepsilon}{n}}\mathbb{P}_{\nu}\left(\phi=0\right)\right\}\right\}\geq 1/4. \tag{16} \]

In fixed-dimensional settings, Theorem 3 is straightforward to prove, since one can directly apply Le Cam's two-point argument to a perturbation of size $\Theta(1/\sqrt{n})$ of the uniform distribution. The non-trivial aspect of Theorem 3 lies in establishing the result in high-dimensional settings, and in doing so without imposing any growth condition on $p$ and $n$.

The worst-case construction in the proof of Theorem 3 is based on the Fisher–von Mises–Langevin (FvML) distributions. This choice is motivated by simulation results showing that the test exhibits power very close to that of the Rayleigh test, which is the optimal invariant test within this model [Cutting-P-V].

3.2 Local limiting distribution under the FvML alternatives

The FvML distributions are one of the most common types of alternatives for uniformity testing and have been investigated in a recent line of work [Cutting-P-V; Cutting-P-V2]; see also the references therein. To describe the FvML distributions, let us introduce a general class of “monotone” rotationally symmetric densities following [Cutting-P-V; Cutting-P-V2].

Let $f:\mathbb{R}\to\mathbb{R}^{+}$ be a smooth and strictly increasing function. Define the family of densities

\[ \bm{x}\mapsto c_{f,\kappa}\cdot\exp\Big[f\left(\kappa\left(\bm{x}^{\top}\bm{\mu}\right)\right)\Big]\,d\mu_{0}\left(\bm{x}\right). \]

Here $\kappa>0$ is the concentration parameter and $\bm{\mu}\in\mathbb{S}^{p-1}$ is the location parameter. Most of the common distributions in directional statistics belong to this class. Two common choices are

  • Watson distributions. This corresponds to the case $f(x)=e^{x^{2}}$ [Cutting-P-V2].

  • FvML distributions. This corresponds to the case $f(x)=e^{x}$ [Cutting-P-V].

In this subsection, we investigate the local power and consistency of the test $T_{n}$ under the class of FvML distributions. It is known that, within this class, the threshold $\kappa\sim p^{3/4}/\sqrt{n}$ is the minimax rate: when $\kappa$ is below this threshold, no rotationally invariant test can be consistent. Moreover, when $\kappa$ is above this threshold, the Rayleigh test is consistent and is also optimal in the sense of Le Cam.

Let $\Phi^{-1}:[0,1]\to\mathbb{R}$ be the quantile function of the standard normal distribution, and let $\phi$ be the standard Gaussian density, $\phi(x)=(1/\sqrt{2\pi})\exp\left(-x^{2}/2\right)$. Our main result regarding the FvML alternatives is the following.

Proposition 2.

Let $\kappa_{n}=\tau_{n}p^{3/4}/\sqrt{n}$. If $\tau_{n}\to\tau\in(0,\infty)$, then

\[ T_{n}\xrightarrow{d}\sup_{t\in[0,1]}\left|B_{t}-\frac{\tau^{2}}{\sqrt{2}}\phi\left(\Phi^{-1}(t)\right)\right| \]

under the sequence of FvML alternatives with concentration parameter $\kappa_{n}$, where $\left\{B_{t};0\leq t\leq 1\right\}$ is the Brownian bridge.

It follows directly from Proposition 2 that the asymptotic power of $T_{n}$ under the class of FvML distributions is given by

\[ \mathbb{P}\left(\sup_{t\in[0,1]}\left|B_{t}-\frac{\tau^{2}}{\sqrt{2}}\phi\left(\Phi^{-1}(t)\right)\right|\geq q_{\alpha}\right). \]

From the display above, we can see that $T_{n}$ is consistent at the contiguity rate $p^{3/4}/\sqrt{n}$. By Proposition 7 in Appendix A.6, we get

\[ n\times d\left(\mathrm{FvML}\left(\tau_{n}p^{3/4}/\sqrt{n}\right),\mu_{0}\right)\to\frac{\tau^{2}}{\sqrt{2\pi}}. \]

The display above indicates that the local alternatives at the minimax threshold for $d$ are the same as those of the parametric FvML model. In other words, the distance $d$ captures precisely the minimax rate of testing uniformity in the FvML model.

The asymptotic power above does not have a closed-form expression, but we observe in simulations that it is slightly lower than that of the Rayleigh test, which is expected in view of the LAN expansion in [Cutting-P-V]. Note that in this regime, the Packing test $P_{n}$ and the Bingham test $B_{n}$ are both blind. We further know from [Cutting-P-V2] that the detection threshold for the Bingham test in this model is $p^{3/4}/n^{1/4}$.
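Since the limiting power has no closed form, it can be approximated by simulating the shifted Brownian bridge of Proposition 2 on a grid. The sketch below is an illustration under stated assumptions, not the simulation code used in the paper: the function names are hypothetical, the supremum is taken over a finite grid of `m` points (which slightly underestimates it), and the critical value `q_alpha` must be supplied by the user. The value 1.358 used in the usage note is the classical 95% quantile of $\sup_{t}|B_{t}|$ and is appropriate only if $q_{\alpha}$ in (12) is defined as that quantile.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sup_abs_shifted_bridge(drift, rng):
    """Maximum over the grid of |B_t - drift(t)|, with B a Brownian bridge on the grid."""
    m = len(drift) + 1  # interior grid points t_j = j / m, j = 1, ..., m - 1
    step_sd = math.sqrt(1.0 / m)
    walk = [0.0]
    for _ in range(m):
        walk.append(walk[-1] + rng.gauss(0.0, step_sd))
    # Bridge construction: B_t = W_t - t * W_1.
    return max(
        abs(walk[j] - (j / m) * walk[m] - drift[j - 1]) for j in range(1, m)
    )


def fvml_drift(tau, m):
    """Drift tau^2 / sqrt(2) * phi(Phi^{-1}(t)) evaluated at t_j = j / m."""
    scale = tau ** 2 / math.sqrt(2.0)
    return [
        scale * STD_NORMAL.pdf(STD_NORMAL.inv_cdf(j / m)) for j in range(1, m)
    ]


def asymptotic_power(tau, q_alpha, m=200, reps=1000, seed=0):
    """Monte Carlo estimate of P(sup_t |B_t - drift(t)| >= q_alpha)."""
    rng = random.Random(seed)
    drift = fvml_drift(tau, m)
    hits = sum(sup_abs_shifted_bridge(drift, rng) >= q_alpha for _ in range(reps))
    return hits / reps
```

For example, `asymptotic_power(0.0, 1.358)` should return a value close to the nominal level 0.05, while `asymptotic_power(3.0, 1.358)` should be close to 1; finer grids and more replications reduce the discretisation and Monte Carlo errors.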

3.3 Local limiting distribution under a low-rank model

Consider the set of $k$-dimensional linear subspaces of $\mathbb{R}^{p}$. We denote this set by $G(k,p)$, which is known as the Grassmannian manifold; see [chikuse2003statistics] for a comprehensive overview.

We are interested in testing uniformity against the class of low-rank uniform distributions:

\[ H_{1}:\quad\exists\ (k,\mathcal{H})\ \text{s.t.}\ \mathcal{H}\in G(k,p)\ \text{and}\ \mu=\mathrm{Unif}\left(\mathcal{H}\cap\mathbb{S}^{p-1}\right) \tag{17} \]

for some $k\in\{1,\dots,p-1\}$.

The case $k=p$ corresponds to the null $H_{0}$. The problem is essentially about detecting whether the uniform distribution has a low-rank structure. We are interested in the hard regime where $k$ is close to $p$, and we thus assume that $k=k_{n}$ is such that $\min\left\{k,p,n\right\}\to\infty$ and $k/p\to 1$. In this regime, we can show the following.

Proposition 3.

Suppose $p/n\to\infty$ and $\left(1-k/p\right)n\to\tau\in(0,\infty)$. Let $\left\{B_{t}\right\}$ be the Brownian bridge. Then,

\[ T_{n}\xrightarrow{d}\sup_{t\in[0,1]}\left|B_{t}-\frac{\tau}{2\sqrt{2}}\cdot\Phi^{-1}(t)\,\phi\left(\Phi^{-1}(t)\right)\right|, \]

where $\Phi^{-1}$ and $\phi$ are the quantile function and density function of a standard normal distribution, respectively.

The proof of Proposition 3 follows directly from Proposition 8 in Appendix A.5 and Theorem 1. Proposition 3 shows that the test $T_{n}$ is consistent at the threshold $k=(1-\tau/n)p$, with asymptotic power given by

\[ \mathbb{P}\left(\sup_{t\in[0,1]}\left|B_{t}-\frac{\tau}{2\sqrt{2}}\cdot\Phi^{-1}(t)\,\phi\left(\Phi^{-1}(t)\right)\right|\geq q_{\alpha}\right). \]

It is natural to ask whether the rate $k=(1-\Omega(1/n))p$ is optimal. The answer is yes, as the theorem below shows.

Theorem 4.

Suppose $p/n\to\infty$ and put $\delta(k)=(1-k/p)n$. Then, for some $\varepsilon>0$ sufficiently small, we have

\[ \liminf_{n\to\infty}\left\{\inf_{\phi\in\mathcal{T}_{n}}\left\{\mathbb{P}_{\mu_{0}}\left(\phi=1\right)+\sup_{\mu_{k}\in H_{1}:\,\delta(k)\geq\varepsilon}\left\{\mathbb{P}_{\mu_{k}}\left(\phi=0\right)\right\}\right\}\right\}\geq 1/4, \]

where $\mathcal{T}_{n}$ is the class of tests based on the data as defined in (15) and the supremum is taken over all alternatives $\mu_{k}$ of the form (17) such that $\delta(k)\geq\varepsilon$.

Theorem 4 shows that, in the high-dimensional regime, as long as $(1-k/p)n$ is small enough, no test based on a sample of size $n$ can be consistent. This suggests that the test $T_{n}$ is rate-optimal in this model. To the best of our knowledge, this information lower bound is new. The most technical part of the proof is the analysis of the likelihood ratio against a random distribution over the Grassmannian $G(k,p)$.

By Proposition 8 in Appendix A.6, we have

\[ n\times d\left(\mathrm{Unif}\left(H_{k}\cap\mathbb{S}^{p-1}\right),\mu_{0}\right)\to\frac{\tau}{2}\cdot\sup_{u\in\mathbb{R}}|u\,\phi(u)|, \]

where $\phi$ is the standard Gaussian density. Therefore, the local alternatives at the minimax threshold for $d$ are the same as those of the low-rank model. In other words, the distance $d$ captures precisely the minimax rate of testing uniformity in the low-rank model.
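The constant in the limit above can be made explicit. The following short computation uses only elementary calculus on the standard Gaussian density $\phi$ and is offered as a sanity check rather than as part of the formal argument. Since $u\,\phi(u)\to 0$ as $|u|\to\infty$, the supremum is attained at a critical point:

```latex
\frac{d}{du}\bigl[u\,\phi(u)\bigr] = (1-u^{2})\,\phi(u) = 0
\iff u=\pm 1,
\qquad
\sup_{u\in\mathbb{R}}|u\,\phi(u)| = \phi(1) = \frac{1}{\sqrt{2\pi e}} \approx 0.2420,
\qquad\text{so}\qquad
\frac{\tau}{2}\cdot\sup_{u\in\mathbb{R}}|u\,\phi(u)| = \frac{\tau}{2\sqrt{2\pi e}} \approx 0.1210\,\tau .
```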

Let us now compare the local power of the four tests $R_{n},B_{n},P_{n},T_{n}$. A nice feature of this low-rank model is that the detection thresholds of all four tests can be computed precisely.

  • Recall the Rayleigh test $R_{n}$ from (2). It is easy to check that $\sqrt{p/k}\cdot R_{n}\xrightarrow{d}N\left(0,1\right)$ as $n\to\infty$. Thus, $R_{n}$ is not consistent, even when $k/p\to 0$. Its maximum power will not exceed $1/2$. However, a two-sided version of $R_{n}$, which rejects if $|R_{n}|$ is large, is consistent in the regime $k/p\to 0$.

  • For the Bingham test $B_{n}$ in (3), we have

    \[ \frac{k}{p}\left[B_{n}-\frac{p(n-1)}{2}\left(\frac{1}{k}-\frac{1}{p}\right)\right]\xrightarrow{d}N\left(0,1\right) \]

    as $n\to\infty$, under $H_{1}$. Therefore, in the regime $n(1-k/p)\to\tau\in(0,\infty)$, the asymptotic power of the Bingham test is

    \[ 1-\Phi\left(z_{\alpha}-\tau/2\right), \]

    where $z_{\alpha}$ is the $(1-\alpha)$-quantile of the standard normal distribution. This shows that the Bingham test achieves the optimal rate suggested by Theorem 4, with local power given by the display above.

  • Finally, regarding the Packing test $P_{n}$ in (4), we have

    \[ \frac{p}{k}\cdot P_{n}-\left(1-\frac{k}{p}\right)\left(4\log n-\log\log n\right)\to G, \]

    where $G$ follows a standard Gumbel law. Thus, the test $P_{n}$ is consistent if and only if $(\log n)(1-k/p)\to\infty$. This detection threshold is strictly sub-optimal, but is still better than that of the Rayleigh test.
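The local power of the Bingham test stated in the list above, $1-\Phi(z_{\alpha}-\tau/2)$, is explicit and easy to tabulate. The snippet below is a small illustration using Python's standard library; the function name `bingham_local_power` is a hypothetical choice made here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bingham_local_power(tau, alpha=0.05):
    """Asymptotic power 1 - Phi(z_alpha - tau / 2), with z_alpha the (1 - alpha)-quantile."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(z_alpha - tau / 2.0)


if __name__ == "__main__":
    # Power as a function of the local parameter tau = lim n (1 - k / p).
    for tau in (0.0, 1.0, 2.0, 4.0, 8.0):
        print(f"tau = {tau:4.1f}  power = {bingham_local_power(tau):.3f}")
```

At $\tau=0$ the function returns the nominal level $\alpha$, and the power increases monotonically to 1 as $\tau$ grows.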

From the above, we can see that only the proposed test $T_{n}$ and the Bingham test $B_{n}$ achieve the optimal rate. Although the power function of $T_{n}$ does not have a closed-form expression, we find in simulation studies that the local power of the Bingham test is greater than that of the proposed test.

4 When are the distance $d$ and test $T_{n}$ useful?

In this section, we discuss several advantages of the distance $d$ and explain why the proposed test $T_{n}$ is useful. First, the distance $d$ is a measure of “symmetry” and differs from classical metrics between probability measures such as total variation, Hellinger, or chi-squared distances. These distances are not tailored to the orthogonally invariant structure of the problem and, in particular, they do not reflect geometric features such as concentration along lower-dimensional subspaces.

To illustrate the difference with such metrics, consider the class of low-rank uniform distributions introduced in Section 3.3. In terms of total variation distance, we always have

\[ d_{\rm TV}\left(\mathrm{Unif}\left(\mathbb{S}^{p-1}\right),\,\mathrm{Unif}\left(H\cap\mathbb{S}^{p-1}\right)\right)=1 \]

for all subspaces $H$ with dimension less than or equal to $p-1$.

Thus, density-based distances such as total variation are not well-suited for alternatives that are singular with respect to the uniform distribution. In contrast, the distance $d$ is sensitive to geometric deviations: it detects that low-rank uniform distributions have large “empty regions” compared to the standard uniform distribution, and it also captures the concentration patterns of FvML distributions at the optimal rate. We also note that the Wasserstein distance is another metric that can detect geometric features and is sensitive to changes in intrinsic dimension. However, constructing tests based on the Wasserstein distance is substantially more involved, both analytically and computationally.

Another interesting feature of the distance $d$, for which we do not yet have a fully general theory, is that, when restricted to many parametric, high-dimensional classes of distributions, the threshold at which $n\,d\left(\mu_{n},\mu_{0}\right)$ converges to a non-zero limit often coincides with the minimax rate of testing uniformity within that family. One can show this for some other models, such as the Watson distributions or the spiked covariance distributions. An asymptotic expansion of the distance $d$ along a parametric model can often be obtained through an Edgeworth-type expansion, although the computation can be tedious.

Regarding the test $T_{n}$, we observe that it achieves the optimal detection rates in both the FvML model and the low-rank model, with explicit power functions available in each case, albeit without being the locally most powerful test in either setting. As discussed above, we believe that $T_{n}$ is rate-optimal for a broad range of parametric models; that is, we conjecture that the threshold at which $n\,d(\mu_{n},\mu_{0})$ converges to a positive limit coincides with the minimax rate for testing uniformity in that parametric model. This has been verified for the two models considered in Sections 3.2 and 3.3. One can also establish this correspondence for the Watson distributions via an Edgeworth-type expansion, although deriving the local limiting distributions would require a LAN expansion similar to that in [Cutting-P-V]. At present, however, we are not aware of a unified framework for computing the local power of $T_{n}$ across different parametric families.

In what follows, we examine yet another class of distributions whose geometric structure is close to that of the uniform distribution, thereby providing further insight into the behavior of $T_{n}$.

  • The class of $\alpha$-spherical distributions. This model arises by projecting heavy-tailed random vectors with i.i.d. components onto the unit sphere. It was introduced in [heiny2022limiting; dornemann2025limiting] in the study of a heavy-tailed analog of the Marchenko–Pastur law for sample correlation matrices. Subsequently, [jiang2025asymptotic] showed that both the Rayleigh test (2) and the Bingham test (3) are inconsistent for this model in the proportional regime $p/n\to c\in(0,\infty)$, while the packing test (4) remains consistent.

    Formally, the $\alpha$-spherical distribution $\mu_{\alpha,p}$ is defined as the law of

    \[ \frac{\mathbf{X}}{\|\mathbf{X}\|}, \]

    where $\mathbf{X}$ is a $p$-dimensional random vector with i.i.d. symmetric components, each regularly varying with index $\alpha\in(0,2)$.

    Although this model does not represent a local alternative to the uniform distribution, the geometric behavior of its samples is remarkably similar: in both cases, the points are nearly orthogonal in high dimensions. The subtle difference between $\mu_{\alpha,p}$ and $\mu_{0}$ is that, under $\mu_{\alpha,p}$, there are a few points that are either very close to each other or almost aligned along straight lines through the origin. Intuitively, since almost all the points are orthogonal, tests that are based on a single polynomial of the inner products, like the Rayleigh test and the Bingham test, fail to be consistent.

    By [cohen2020heavy] (see the discussion after Theorem 4.1), it follows that

    \[ p^{1/\alpha}\,\mathbf{X}^{\top}\mathbf{Y}\;\xrightarrow{d}\;Z_{\alpha} \]

    for some non-degenerate random variable $Z_{\alpha}$ that can be written as the ratio of independent stable random variables. Since $\alpha\in(0,2)$, we have $1/\alpha-1/2>0$, and hence $p^{1/\alpha-1/2}/2\to\infty$ as $p\to\infty$. Therefore,

    \[ \begin{aligned}
    d\bigl(\mu_{\alpha,p},\mu_{0}\bigr) &=\sup_{t\in[-1,1]}\left|\mathbb{P}_{\mu_{\alpha,p}}\left(\mathbf{X}^{\top}\mathbf{Y}\leq t\right)-\mathbb{P}_{\mu_{0}}\left(\mathbf{X}^{\top}\mathbf{Y}\leq t\right)\right| \\
    &\geq\left|\mathbb{P}_{\mu_{\alpha,p}}\left(p^{1/\alpha}\mathbf{X}^{\top}\mathbf{Y}\leq\frac{p^{1/\alpha-1/2}}{2}\right)-\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\,\mathbf{X}^{\top}\mathbf{Y}\leq\frac{1}{2}\right)\right|.
    \end{aligned} \]

    As $p\to\infty$, the first probability converges to $1$, while the second converges to $\Phi(1/2)$, so for all $p$ large enough,

    \[ d\bigl(\mu_{\alpha,p},\mu_{0}\bigr)\;\geq\;\frac{1-\Phi(1/2)}{2}. \]

    Thus, the test $T_{n}$ is also consistent in this model.
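The lower bound on $d(\mu_{\alpha,p},\mu_{0})$ derived above can be illustrated numerically. The sketch below is a hedged illustration, not the authors' code: it assumes symmetric Pareto components (one particular regularly varying choice), replaces the null probability $\mathbb{P}_{\mu_{0}}(\sqrt{p}\,\mathbf{X}^{\top}\mathbf{Y}\leq 1/2)$ by its large-$p$ limit $\Phi(1/2)$, and uses the hypothetical function names `alpha_spherical_point` and `empirical_gap`.

```python
import math
import random
from statistics import NormalDist


def alpha_spherical_point(p, alpha, rng):
    """Project a vector with i.i.d. symmetric Pareto(alpha) entries onto the unit sphere."""
    x = [
        rng.choice((-1.0, 1.0)) * (1.0 - rng.random()) ** (-1.0 / alpha)
        for _ in range(p)
    ]
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def empirical_gap(p, alpha, pairs=500, t=0.5, seed=0):
    """|P_hat(sqrt(p) X^T Y <= t) - Phi(t)| estimated from independent pairs (X, Y)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(pairs):
        x = alpha_spherical_point(p, alpha, rng)
        y = alpha_spherical_point(p, alpha, rng)
        inner = sum(a * b for a, b in zip(x, y))
        hits += math.sqrt(p) * inner <= t
    return abs(hits / pairs - NormalDist().cdf(t))


if __name__ == "__main__":
    # Estimated gap at t = 1/2 for p = 200 and alpha = 0.5.
    print(empirical_gap(p=200, alpha=0.5))
```

A gap that stays bounded away from zero as $p$ grows is consistent with the bound displayed above; the choice of $\alpha$, $p$ and the number of pairs here is arbitrary.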

The behaviour of the four tests is summarized in Table 1. Only the proposed test $T_{n}$ remains consistent or rate-optimal across all three models.

Table 1: Asymptotic detection boundaries (up to constants) of various tests under FvML, low-rank, and $\alpha$-spherical alternatives.

| Test / model | FvML | Low rank | $\alpha$-spherical |
| --- | --- | --- | --- |
| $R_{n}$ (2) | $p^{3/4}/\sqrt{n}$, optimal [Cutting-P-V] | $k=o(p)$, sub-optimal | inconsistent [jiang2025asymptotic] |
| $B_{n}$ (3) | $p^{3/4}/n^{1/4}$, sub-optimal [Cutting-P-V] | $k=\bigl(1-\Omega(1/n)\bigr)p$, optimal | inconsistent [jiang2025asymptotic] |
| $P_{n}$ (4) | blind at $p^{3/4}/\sqrt{n}$, sub-optimal [jiang2025asymptotic] | $k=\bigl(1-\Omega(1/\log n)\bigr)p$, sub-optimal | consistent [jiang2025asymptotic] |
| $T_{n}$ (8) | $p^{3/4}/\sqrt{n}$, optimal (Proposition 2) | $k=\bigl(1-\Omega(1/n)\bigr)p$, optimal (Proposition 3) | consistent |

5 Conclusions and remarks

In this paper, we propose a novel distance to quantify deviations from uniformity, together with a test naturally associated with this distance. We show that the test enjoys very simple asymptotic properties in high dimensions and admits a model-free consistency theory. We establish optimal detection rates with respect to the proposed distance and show that the test attains these rates. Furthermore, we show that, when restricted to parametric models, the proposed distance precisely captures the usual notion of local alternatives; this is verified for the FvML model and for a low-rank uniform distribution model. As a consequence of our analysis, we obtain the detection threshold for testing the intrinsic dimension of the uniform distribution. We now make some remarks.

  1. It is of independent interest to extend Proposition 1 to other types of spherical distributions. We conjecture that the conclusion of Proposition 1 remains valid for any two Borel probability measures on the sphere, up to an orthogonal transformation, under suitable regularity conditions.

  2. We believe that the proposed distance characterizes the minimax rates for other models as well. For example, one can show this for the Watson model and for certain spiked-covariance models, although the analysis in those cases requires specific restrictions on the joint growth of $p$ and $n$.

  3. One can also investigate a procedure based on an $L^{2}$-type distance instead, for which similar results are expected to hold. We leave this as a direction for future work.

6 Proofs

6.1 Proof of Proposition 1

Before presenting the proof, we first state a version of Lebesgue’s differentiation theorem for smooth and complete Riemannian manifolds. The version provided below is not the most general result but is sufficient for our purposes.

Lemma 1.

Let $(M,g)$ be a smooth, complete Riemannian manifold with corresponding geodesic distance $d$. Suppose $\nu$ is a non-negative, finite Borel measure on the metric space $(M,d)$ and $f$ is a non-negative function that is integrable with respect to $\nu$. Then, we have

\[ \lim_{r\to 0}\frac{\int_{B(x,r)}f(y)\,d\nu(y)}{\nu\left(B(x,r)\right)}=f(x) \tag{18} \]

for $\nu$-almost every $x\in M$. Here $B(x,r)$ is the open ball with respect to the geodesic distance $d$.

Proof of Lemma 1. Define

\[ \nu_{1}(A)=\int_{A}f(x)\,d\nu(x). \]

It is easy to see that $\nu_{1}\ll\nu$. By Theorem A.1 in [jost2021probabilistic], there exists a measurable set $S_{0}\subset M$ such that $\nu(S_{0})=0$ and

\[ D(x)=\lim_{r\to 0}\frac{\nu_{1}\left(B(x,r)\right)}{\nu\left(B(x,r)\right)} \tag{19} \]

exists and is finite for all $x\in M\setminus S_{0}$. Also by Theorem A.1 in [jost2021probabilistic], $D(x)$ is the Radon–Nikodym derivative $d\nu_{1}/d\nu$ (up to a null set) whenever $(M,d)$ is complete. Thus, $D(x)=f(x)$ $\nu$-almost everywhere. The proof is completed. $\square$

In the argument of Lemma 1, the completeness of $(M,d)$ is needed only to deduce that $D(x)$ in (19) is equal to the Radon–Nikodym derivative $f(x)$ $\nu$-almost everywhere. Results of this type hold for various metric spaces with different structures; see the classical monograph [GMT] for more details.

Proof of Proposition 1. It is easy to see that, given the assumptions in Proposition 1, we have

\[ \int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}g(\bm{x}^{\top}\bm{y})\,d\nu(\bm{x})\,d\nu(\bm{y})=\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}g(\bm{x}^{\top}\bm{y})\,d\mu_{0}(\bm{x})\,d\mu_{0}(\bm{y}) \tag{20} \]

for any bounded, measurable function $g:[-1,1]\to\mathbb{R}$. We will show that (20) implies $\nu\equiv\mu_{0}$. To see this, fix $\eta\in(0,2]$ and define

\[ g_{\eta}(t):=\frac{\mathbf{1}_{(1-\eta,1]}(t)}{\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})\,d\mu_{0}(\bm{x})\,d\mu_{0}(\bm{y})}. \]

Let $\mu_{1}=\frac{\nu+\mu_{0}}{2}$ and define the Radon–Nikodym derivatives $\frac{d\nu}{d\mu_{1}}(\bm{x})=f(\bm{x})$ and $\frac{d\mu_{0}}{d\mu_{1}}(\bm{x})=h(\bm{x})$. It follows that $d\mu_{1}(\bm{x})=\frac{f(\bm{x})+h(\bm{x})}{2}\,d\mu_{1}(\bm{x})$ and thus,

\[ f+h=2 \tag{21} \]

$\mu_{1}$-almost surely. Plugging $g_{\eta}$ into (20) gives

\[ \frac{\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})\,d\nu(\bm{x})\,d\nu(\bm{y})}{\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})\,d\mu_{0}(\bm{x})\,d\mu_{0}(\bm{y})}=\underbrace{\frac{\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})\,d\mu_{0}(\bm{x})\,d\mu_{0}(\bm{y})}{\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})\,d\mu_{0}(\bm{x})\,d\mu_{0}(\bm{y})}}_{=1}. \]

Thus,

\[ \int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})\,d\nu(\bm{x})\,d\nu(\bm{y})=\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})\,d\mu_{0}(\bm{x})\,d\mu_{0}(\bm{y}). \tag{22} \]

Moreover, by Fubini's theorem,

\[ \begin{aligned}
&\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})\,d\nu(\bm{x})\,d\nu(\bm{y}) \\
&\quad=\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})\,f(\bm{x})f(\bm{y})\,d\mu_{1}(\bm{x})\,d\mu_{1}(\bm{y}) \\
&\quad=\int_{\mathbb{S}^{p-1}}f(\bm{x})\left(\int_{1-\eta<\bm{x}^{\top}\bm{y}\leq 1}f(\bm{y})\,d\mu_{1}(\bm{y})\right)d\mu_{1}(\bm{x}).
\end{aligned} \]

Additionally,

\[ \int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})\,d\mu_{0}(\bm{x})\,d\mu_{0}(\bm{y})=\mathbb{P}_{\mu_{0}}\left(1-\eta<\bm{x}^{\top}\bm{y}\leq 1\right). \]

Therefore, from (22) and the two equalities above, we get

\[ \int_{\mathbb{S}^{p-1}}f(\bm{x})\left(\int_{1-\eta<\bm{x}^{\top}\bm{y}\leq 1}f(\bm{y})\,d\mu_{1}(\bm{y})\right)d\mu_{1}(\bm{x})=\mathbb{P}_{\mu_{0}}\left(1-\eta<\bm{x}^{\top}\bm{y}\leq 1\right) \tag{23} \]

for all $\eta\in(0,2]$.

Let $d(\bm{x},\bm{y})=\arccos\left(\bm{x}^{\top}\bm{y}\right)$ be the geodesic distance on $\mathbb{S}^{p-1}$. It is easy to check that $\left(\mathbb{S}^{p-1},d\right)$ is a Polish space and that for all $\bm{x}\in\mathbb{S}^{p-1}$,

\[ \left\{\bm{y}:1-\eta<\bm{x}^{\top}\bm{y}\leq 1\right\}=B(\bm{x},f_{\eta}), \]

where $f_{\eta}=\arccos(1-\eta)$ and $B(\bm{x},r)$ is the open ball with center $\bm{x}$ and radius $r$ with respect to $d$ (the ball is open because $\arccos$ is strictly decreasing, so $1-\eta<\bm{x}^{\top}\bm{y}$ is equivalent to $d(\bm{x},\bm{y})<f_{\eta}$). Hence, for all $\eta\in(0,2]$, one can rewrite (23) as

\[ \int_{\mathbb{S}^{p-1}}f(\bm{x})\left(\frac{\int_{B(\bm{x},f_{\eta})}f(\bm{y})\,d\mu_{1}(\bm{y})}{\int_{B(\bm{x},f_{\eta})}h(\bm{y})\,d\mu_{1}(\bm{y})}\right)d\mu_{1}(\bm{x})=1 \tag{24} \]

where we have used Lemma 8 to write

\[ \mathbb{P}_{\mu_{0}}\left(1-\eta<\bm{x}^{\top}\bm{y}\leq 1\right)=\mu_{0}\left(B(\bm{x},f_{\eta})\right)=\int_{B(\bm{x},f_{\eta})}h(\bm{y})\,d\mu_{1}(\bm{y}) \]

for all $\bm{x}\in\mathbb{S}^{p-1}$. Note that (24) holds since the right-hand side in the expression above is constant across $\bm{x}$.

Since $\left(\mathbb{S}^{p-1},d\right)$ is a smooth, complete Riemannian manifold with respect to the canonical Riemannian metric, Lemma 1 can be applied to $\mu_{1}$ to deduce that

\[ \frac{\int_{B(\bm{x},f_{\eta})}f(\bm{y})\,d\mu_{1}(\bm{y})}{\int_{B(\bm{x},f_{\eta})}h(\bm{y})\,d\mu_{1}(\bm{y})}=\frac{\int_{B(\bm{x},f_{\eta})}f(\bm{y})\,d\mu_{1}(\bm{y})}{\mu_{1}\left(B(\bm{x},f_{\eta})\right)}\cdot\left(\frac{\int_{B(\bm{x},f_{\eta})}h(\bm{y})\,d\mu_{1}(\bm{y})}{\mu_{1}\left(B(\bm{x},f_{\eta})\right)}\right)^{-1}\to\frac{f(\bm{x})}{h(\bm{x})}, \]

as $\eta\to 0$, for $\mu_{1}$-almost every $\bm{x}$, since $f_{\eta}\to 0$ as $\eta\to 0$. Therefore, the above display together with (21), (24) and Fatou's lemma yields

\[ \begin{aligned}
\int_{\mathbb{S}^{p-1}}\frac{f(\bm{x})^{2}}{2-f(\bm{x})}\,d\mu_{1}(\bm{x}) &=\int_{\mathbb{S}^{p-1}}\frac{f(\bm{x})^{2}}{h(\bm{x})}\,d\mu_{1}(\bm{x}) \\
&=\int_{\mathbb{S}^{p-1}}\lim_{\eta\to 0}f(\bm{x})\left(\frac{\int_{B(\bm{x},f_{\eta})}f(\bm{y})\,d\mu_{1}(\bm{y})}{\int_{B(\bm{x},f_{\eta})}h(\bm{y})\,d\mu_{1}(\bm{y})}\right)d\mu_{1}(\bm{x}) \\
&\leq\liminf_{\eta\to 0}\int_{\mathbb{S}^{p-1}}f(\bm{x})\left(\frac{\int_{B(\bm{x},f_{\eta})}f(\bm{y})\,d\mu_{1}(\bm{y})}{\int_{B(\bm{x},f_{\eta})}h(\bm{y})\,d\mu_{1}(\bm{y})}\right)d\mu_{1}(\bm{x})=1.
\end{aligned} \]

Moreover, Hölder's inequality gives

\[ \begin{aligned}
\int_{\mathbb{S}^{p-1}}\frac{f(\bm{x})^{2}}{2-f(\bm{x})}\,d\mu_{1}(\bm{x}) &=\int_{\mathbb{S}^{p-1}}\frac{f(\bm{x})^{2}}{2-f(\bm{x})}\,d\mu_{1}(\bm{x})\cdot\int_{\mathbb{S}^{p-1}}\left[2-f(\bm{x})\right]d\mu_{1}(\bm{x}) \\
&\geq\left(\int_{\mathbb{S}^{p-1}}f(\bm{x})\,d\mu_{1}(\bm{x})\right)^{2}=1.
\end{aligned} \]

The two bounds above imply that the integral $\int_{\mathbb{S}^{p-1}}f(\bm{x})^{2}\cdot\left(2-f(\bm{x})\right)^{-1}d\mu_{1}(\bm{x})$ is exactly $1$ and that $f(\bm{x})^{2}=(2-f(\bm{x}))^{2}$ for $\mu_{1}$-almost every $\bm{x}$. This in turn yields $f\equiv 1$ almost surely with respect to $\mu_{1}$. From (21), we also get $h\equiv 1$, and the conclusion follows. $\square$

6.2 Proof of Theorem 3

Fix $\varepsilon>0$ sufficiently small and define

\[ \kappa_{n}:=\frac{\varepsilon p^{3/4}}{\sqrt{n}};\quad\frac{d\mu_{n,\bm{\theta}}}{d\mu_{0}}(\bm{x})\propto\exp\left(\kappa_{n}\left(\bm{x}^{\top}\bm{\theta}\right)\right). \]

In other words, $\mu_{n,\bm{\theta}}$ is an FvML distribution with location $\bm{\theta}$ and concentration parameter $\kappa_{n}$. Consider the least favorable distribution

\[ \mu_{n}^{*}:=\mathbb{E}_{\bm{\theta}\sim\mu_{0}}\left[\prod_{i=1}^{n}\frac{d\mu_{n,\bm{\theta}}}{d\mu_{0}}(\bm{X}_{i})\right]. \tag{25} \]

By Proposition 7, for all $\bm{\theta}\in\mathbb{S}^{p-1}$, we have

\[ d\left(\mu_{n,\bm{\theta}},\mu_{0}\right)\geq\frac{\varepsilon^{2}}{10n} \]

whenever $\min\left\{p,n\right\}$ is sufficiently large and $\varepsilon$ is small enough.

Consequently, Le Cam's mixture argument yields

\[ \begin{aligned}
\liminf_{n\to\infty}\left\{\inf_{\phi\in\mathcal{T}_{n}}\left\{\mathbb{P}_{\mu_{0}}\left(\phi=1\right)+\sup_{d(\nu,\mu_{0})\geq\frac{\varepsilon}{n}}\mathbb{P}_{\nu}\left(\phi=0\right)\right\}\right\} &\geq\liminf_{n\to\infty}\Big\{1-d_{\rm TV}\left(\mu_{0},\mu_{n}^{*}\right)\Big\} \\
&\geq\liminf_{n\to\infty}\Big\{1-\sqrt{\mathbb{E}L_{n}^{2}-1}\Big\},
\end{aligned} \]

where $L_{n}$ is the likelihood ratio defined in (50).

By Proposition 5, we know that $\mathbb{E}L_{n}^{2}-1\leq e^{\varepsilon^{2}}-1=O(\varepsilon^{2})$ for small $\varepsilon>0$. Thus, we get (16) by choosing $\varepsilon$ small enough. The proof is completed. $\square$
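For concreteness, taking the two bounds $1-\sqrt{\mathbb{E}L_{n}^{2}-1}$ and $\mathbb{E}L_{n}^{2}-1\leq e^{\varepsilon^{2}}-1$ at face value, one can read off how small $\varepsilon$ needs to be for the constant $1/4$ in (16). The following elementary computation is only an illustration of the final step and ignores any rescaling of $\varepsilon$ between the separation radius and the concentration parameter:

```latex
1-\sqrt{e^{\varepsilon^{2}}-1}\;\geq\;\frac{1}{4}
\iff e^{\varepsilon^{2}}-1\leq\frac{9}{16}
\iff \varepsilon^{2}\leq\log\frac{25}{16}=2\log\frac{5}{4}\approx 0.446
\iff \varepsilon\leq\sqrt{2\log\tfrac{5}{4}}\approx 0.668 .
```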

6.3 Proof of Theorem 1

Define

\[ S_{n}(t):=\sqrt{\frac{2}{n(n-1)}}\sum_{i<j}\left[\mathbf{1}_{\left\{\sqrt{p}\,\bm{X}_{i}^{\top}\bm{X}_{j}\leq t\right\}}-\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\,\bm{X}_{1}^{\top}\bm{X}_{2}\leq t\right)\right]. \tag{26} \]

We will show that the process $\left\{S_{n}(t);n\geq 1\right\}$ converges in distribution to $\left\{B_{\Phi(t)}\right\}$ in the Skorohod space $D[a,b]$, for all real numbers $a<b$. Some basic properties regarding the topology on this space can be found in [Dehling].

Here $\Phi$ is the CDF of a standard normal distribution and $\left\{B_{u};0\leq u\leq 1\right\}$ is the Brownian bridge. We do not work directly with the space $D\left(\mathbb{R}\right)$ (see [vogel2010weak] for more details) since the supremum functional is not almost surely continuous on this space; see Remark 1 below.
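To make the process in (26) concrete, note that with $N=n(n-1)/2$ pairs one has $S_{n}(t)=\sqrt{N}\,\bigl(F_{N}(t)-F_{0}(t)\bigr)$, where $F_{N}$ is the empirical CDF of the scaled inner products $\sqrt{p}\,\bm{X}_{i}^{\top}\bm{X}_{j}$ and $F_{0}$ is their null CDF. Its supremum over $t$ is therefore a Kolmogorov–Smirnov-type quantity and can be computed exactly from the sorted inner products. The sketch below is illustrative only: the function name `sup_abs_process` is hypothetical, and by default the null CDF is replaced by its high-dimensional limit $\Phi$, which is an approximation for finite $p$.

```python
import math
import random
from statistics import NormalDist


def sup_abs_process(sample, null_cdf=NormalDist().cdf):
    """sup_t |S_n(t)| for S_n as in (26); `null_cdf` stands in for the null CDF F_0."""
    n, p = len(sample), len(sample[0])
    values = sorted(
        math.sqrt(p) * sum(a * b for a, b in zip(sample[i], sample[j]))
        for i in range(n)
        for j in range(i + 1, n)
    )
    pairs = len(values)  # N = n (n - 1) / 2
    sup = 0.0
    for k, v in enumerate(values):
        f0 = null_cdf(v)
        # The empirical CDF jumps from k / N to (k + 1) / N at v.
        sup = max(sup, abs((k + 1) / pairs - f0), abs(f0 - k / pairs))
    return math.sqrt(pairs) * sup


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 40, 50
    sample = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(v * v for v in g))
        sample.append([v / norm for v in g])  # uniform point on the sphere
    print(sup_abs_process(sample))
```

Because $F_{0}$ is continuous and $F_{N}$ is a step function, checking the values just before and at each jump is enough to recover the supremum over all $t$.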

Step 1: Convergence in $D[a,b]$. Suppose $a<b$. To show the convergence in $D[a,b]$, we need to check the following two conditions.

Condition 2 (Finite-dimensional convergence in distribution).

For any grid $a\leq t_{1}<t_{2}<\dots<t_{k}\leq b$, the vector $(S_{n}(t_{1}),S_{n}(t_{2}),\dots,S_{n}(t_{k}))$ converges in distribution to $\left(B_{\Phi(t_{1})},B_{\Phi(t_{2})},\dots,B_{\Phi(t_{k})}\right)$.

Condition 3 (Tightness).

For any $\varepsilon>0$, we have

\[ \lim_{\delta\to 0}\limsup_{n\to\infty}\mathbb{P}\left(\sup_{|t-s|\leq\delta}\Big|S_{n}(t)-S_{n}(s)\Big|>\varepsilon\right)=0. \tag{27} \]

To check Condition 2, we will make use of the following result whose proof is given in Section A.3.

Proposition 4.

Let $h_{n}:\mathbb{R}\to\mathbb{R}$ be a sequence of measurable functions such that $\mathbb{E}h_{n}(\bm{X}^{\top}_{1}\bm{X}_{2})=0$ and

\begin{align}
\mathrm{Var}\left(h_{n}(\bm{X}^{\top}_{1}\bm{X}_{2})\right) &\to\sigma^{2}>0; \tag{28}\\
\frac{\mathbb{E}\left(h_{n}^{4}(\bm{X}^{\top}_{1}\bm{X}_{2})\right)}{n\left(\mathbb{E}h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\right)^{2}} &\to 0. \tag{29}
\end{align}

Then,

\[
\sqrt{\frac{2}{n(n-1)}}\sum_{1\leq i<j\leq n}h_{n}\left(\bm{X}^{\top}_{i}\bm{X}_{j}\right)\xrightarrow{d}N(0,\sigma^{2}). \tag{30}
\]

Applying Proposition 4 to linear combinations of the centered kernels $h_{n}(x)=\mathbf{1}_{\left\{\sqrt{p}x\leq t\right\}}-\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}_{1}^{\top}\bm{X}_{2}\leq t\right)$, together with the Cramér–Wold device, we get the convergence of the finite-dimensional distributions. Note that condition (29) is satisfied because $\sqrt{p}\bm{X}_{1}^{\top}\bm{X}_{2}$ is asymptotically standard normal.

We now check the tightness condition (27). Note that under uniformity,

\[
c_{n,\delta}:=\sup_{|t-s|\leq\delta}\mathbb{P}\left(s<\sqrt{p}\bm{X}_{1}^{\top}\bm{X}_{2}<t\right)=\sup_{|t-s|\leq\delta}\mathbb{P}\left(s\leq Z\leq t\right)+O\left(p^{-1/2}\right)
\]

as $p\to\infty$, where $Z\sim N(0,1)$.

By applying Lemma 6 to the class of functions $\left\{\mathbf{1}_{\left\{s\leq\sqrt{p}\cdot\bm{x}^{\top}\bm{y}\leq t\right\}}\right\}_{a\leq s<t\leq b}$ (which has VC-dimension $2$) and using the degeneracy of the kernels, we obtain

\[
\mathbb{P}\left(\sup_{|t-s|\leq\delta}\left|S_{n}(t)-S_{n}(s)\right|\geq\varepsilon\right)\lesssim c_{n,\delta}\left[1+\log\left(c_{n,\delta}\right)\right]+\frac{\log\left(c_{n,\delta}^{-1}\right)}{\sqrt{n}}.
\]

The proof is completed by first letting $n\to\infty$ and then letting $\delta\to 0$.

Step 2: Continuous mapping and negligibility of the tail. By the continuous mapping theorem, for all $a>0$, we have

\[
\sup_{t\in[-a,a]}|S_{n}(t)|\xrightarrow{d}\sup_{t\in[-a,a]}|B_{\Phi(t)}|.
\]

To deduce the result, it suffices to show that for all $\varepsilon>0$

\[
\lim_{a\to\infty}\mathbb{P}\left(\sup_{|t|>a}|S_{n}(t)|>\varepsilon\right)=0. \tag{31}
\]

The above is equivalent to showing that

\[
\lim_{a\to\infty}\mathbb{P}\left(\sup_{t>a}|S_{n}(t)|>\varepsilon\right)=0\quad\text{and}\quad\lim_{a\to\infty}\mathbb{P}\left(\sup_{t<-a}|S_{n}(t)|>\varepsilon\right)=0.
\]

Since the proofs of these two limits are identical, we only prove the former. We again apply Lemma 6 to the VC-type class of functions

\[
\left\{\mathbf{1}_{\left\{\sqrt{p}\cdot\bm{x}^{\top}\bm{y}\leq t\right\}}\right\}_{t\in(a,\infty)}
\]

to deduce that

\[
\mathbb{P}\left(\sup_{t>a}|S_{n}(t)|>\varepsilon\right)\lesssim\tau_{a}\left[1+\log\left(\tau_{a}\right)\right]+\frac{1}{\sqrt{n}}
\]

where the variance profile $\tau_{a}$ is defined as

\[
\tau_{a}:=\sup_{t>a}\left\{\mathrm{Var}\left(\mathbf{1}_{\left\{\sqrt{p}\bm{X}_{1}^{\top}\bm{X}_{2}\leq t\right\}}\right)\right\}\leq 1-\mathbb{P}\left(\sqrt{p}\cdot\bm{X}_{1}^{\top}\bm{X}_{2}\leq a\right).
\]

It is easy to check that $\tau_{a}$ converges to $0$ as $a\to\infty$. This finishes the proof. $\square$
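Because $\Phi$ maps $\mathbb{R}$ continuously onto $(0,1)$, the limit $\sup_{t\in\mathbb{R}}|B_{\Phi(t)}|$ has the same law as $\sup_{0\leq u\leq 1}|B_{u}|$, whose survival function is given by the classical Kolmogorov series $2\sum_{k\geq 1}(-1)^{k-1}e^{-2k^{2}x^{2}}$. The sketch below computes an upper quantile of this law by bisection. The function names are ours, and how such a quantile enters the rejection rule is as specified in the main text, not here.

```python
import math

def kolmogorov_sf(x, terms=200):
    """P(sup_{0<=u<=1} |B_u| > x) for a standard Brownian bridge B."""
    if x <= 0.0:
        return 1.0
    s = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                  for k in range(1, terms + 1))
    return min(1.0, max(0.0, s))

def kolmogorov_quantile(alpha, lo=0.2, hi=5.0, tol=1e-10):
    """Upper alpha-quantile found by bisection on [lo, hi].

    The bracket [0.2, 5.0] is an illustrative choice that covers all
    levels alpha that are not extremely close to 0 or 1.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kolmogorov_sf(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

q05 = kolmogorov_quantile(0.05)
q01 = kolmogorov_quantile(0.01)
```

No Monte Carlo simulation is involved: the series converges very quickly for the values of `x` that matter at conventional levels.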

Remark 1.

The reason we do not work directly with $D(\mathbb{R})$ is that the functional

\begin{align*}
\mathcal{S}:D(\mathbb{R}) &\to\mathbb{R}^{+}\\
f &\mapsto\sup_{t\in\mathbb{R}}|f(t)|
\end{align*}

is not almost surely continuous at $\left\{B_{\Phi(u)}\right\}_{u\in\mathbb{R}}$.

In fact, the topology on $D(\mathbb{R})$ is equivalent to the coarsest topology such that the projection map from $D(\mathbb{R})$ to $D[a,b]$ is continuous for all $a<b$ (see Section 3 in [vogel2010weak] for more details). Since this topology only sees the behavior of the process on bounded intervals, modifying the process along a diverging sequence of time points still yields convergence in $D(\mathbb{R})$, but the supremum functional can blow up.

6.4 Proof of Theorem 2

Consider a sequence of laws $\mu_{n}$ such that $n\,d\left(\mu_{n},\mu_{0}\right)\to\infty$. Put $h_{t}(\bm{x},\bm{y})=\mathbf{1}_{\left\{\langle\bm{x},\bm{y}\rangle\leq t\right\}}$ and define

\[
g_{t,n}(\bm{x}):=\mathbb{E}_{\mu_{n}}\left(h_{t}\left(\bm{x},\bm{y}\right)\mid\bm{x}\right).
\]

For $t\in[-1,1]$, rewrite $T_{n}(t)$ via the Hoeffding decomposition as

\[
T_{n}(t)=T_{n,1}(t)+T_{n,2}(t)+d_{t}
\]

where

\begin{align*}
T_{n,1}(t) &:=\frac{2}{n}\sum_{i=1}^{n}\Big[g_{t,n}\left(\bm{X}_{i}\right)-\mathbb{E}g_{t,n}\left(\bm{X}_{i}\right)\Big];\\
T_{n,2}(t) &:=\frac{2}{n(n-1)}\sum_{1\leq i<j\leq n}\Big[h_{t}\left(\bm{X}_{i},\bm{X}_{j}\right)-g_{t,n}\left(\bm{X}_{i}\right)-g_{t,n}\left(\bm{X}_{j}\right)+\mathbb{E}h_{t}\left(\bm{X}_{i},\bm{X}_{j}\right)\Big];\\
d_{t} &:=\mathbb{P}_{\mu_{n}}\left(\bm{X}_{1}^{\top}\bm{X}_{2}\leq t\right)-\mathbb{P}_{\mu_{0}}\left(\bm{X}_{1}^{\top}\bm{X}_{2}\leq t\right).
\end{align*}

Define

\begin{align*}
V_{n} &:=\max_{t\in[-1,1]}\mathrm{Var}\left(g_{t,n}\left(\bm{X}_{1}\right)\right);\\
t_{n} &:=\operatorname*{argmax}_{t\in[-1,1]}\mathrm{Var}\left(g_{t,n}\left(\bm{X}_{1}\right)\right);\\
\alpha_{n} &:=\operatorname*{argmax}_{t\in[-1,1]}\left|d_{t}\right|;\\
v &:=\limsup_{n\to\infty}(nV_{n}).
\end{align*}

Roughly speaking, $t_{n}$ is the point at which the Hoeffding projection term $T_{n,1}$ contributes the most, and $\alpha_{n}$ is the point at which the deterministic perturbation $d_{t}$ contributes the most.
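The decomposition of $T_{n}(t)$ used in this proof can be sanity-checked numerically. The sketch below is an illustration under an assumption that is not in the paper: the law $\mu_{n}$ is replaced by a hypothetical distribution with finitely many equally weighted atoms on the unit circle, so that $g_{t,n}$ and $\mathbb{E}h_{t}$ are exact finite averages. It verifies the algebraic identity $U_{n}-\mathbb{E}h_{t}=T_{n,1}(t)+T_{n,2}(t)$, where $U_{n}$ denotes the U-statistic with kernel $h_{t}$. Subtracting $\mathbb{P}_{\mu_{0}}(\bm{X}_{1}^{\top}\bm{X}_{2}\leq t)$ from $U_{n}$ instead then produces the extra term $d_{t}$.

```python
import math
import random

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def hoeffding_check(sample, support, t):
    """Return (U - theta, T1, T2) for the kernel h_t(x, y) = 1{<x, y> <= t}.

    `support` lists the atoms of a hypothetical law that puts equal mass
    on each atom, so g_t and theta = E h_t are exact finite averages.
    """
    n, m = len(sample), len(support)
    h = lambda x, y: 1.0 if inner(x, y) <= t else 0.0
    g = lambda x: sum(h(x, y) for y in support) / m
    theta = sum(g(y) for y in support) / m
    gs = [g(x) for x in sample]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    scale = 2.0 / (n * (n - 1))
    u_stat = scale * sum(h(sample[i], sample[j]) for i, j in pairs)
    t1 = (2.0 / n) * sum(gv - theta for gv in gs)
    # Note the two distinct projections g(X_i) and g(X_j).
    t2 = scale * sum(h(sample[i], sample[j]) - gs[i] - gs[j] + theta
                     for i, j in pairs)
    return u_stat - theta, t1, t2

rng = random.Random(1)
support = [(math.cos(2 * math.pi * k / 7), math.sin(2 * math.pi * k / 7))
           for k in range(7)]
sample = [rng.choice(support) for _ in range(25)]
lhs, t1, t2 = hoeffding_check(sample, support, 0.3)
```

The identity holds for any sample and any threshold, since $\sum_{i<j}\left[g(\bm{X}_{i})+g(\bm{X}_{j})\right]=(n-1)\sum_{i}g(\bm{X}_{i})$.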

It suffices to consider the following two cases.

Case 1: $v<\infty$. In this case, we can estimate

\begin{align*}
n\cdot\sup_{t\in[-1,1]}|T_{n}(t)| &\geq n\cdot|d_{\alpha_{n}}|-n\cdot\left|T_{n,1}\left(\alpha_{n}\right)+T_{n,2}(\alpha_{n})\right|\\
&=n\cdot d\left(\mu_{n},\mu_{0}\right)-n\cdot\left|T_{n,1}\left(\alpha_{n}\right)+T_{n,2}(\alpha_{n})\right|.
\end{align*}

It is easy to check that

\[
\mathrm{Var}\left(T_{n,1}\left(\alpha_{n}\right)\right)\leq\frac{4V_{n}}{n}\quad\text{and}\quad\left|T_{n,2}(\alpha_{n})\right|\leq\sup_{t\in[-1,1]}\left|T_{n,2}(t)\right|=O_{\mathbb{P}}\left(n^{-1}\right)
\]

where the second bound follows from Lemma 6.

Consequently,

\[
n\cdot\sup_{t\in[-1,1]}|T_{n}(t)|\geq n\cdot d\left(\mu_{n},\mu_{0}\right)-O_{\mathbb{P}}\left(\sqrt{nV_{n}}\right)-O_{\mathbb{P}}(1).
\]

Since $\limsup_{n\to\infty}\left(nV_{n}\right)=v<\infty$, the sequence $nV_{n}$ is bounded. As $n\,d\left(\mu_{n},\mu_{0}\right)\to\infty$, the test rejects with probability tending to one.

Case 2: $v=\infty$. In this case, by a subsequence argument, we can assume that $nV_{n}\to\infty$. Notice that

\begin{align*}
\mathbb{P}\left(\text{rejecting the null}\right) &=\mathbb{P}\left(\sup_{t\in[-1,1]}|T_{n}(t)|\geq\frac{q_{\alpha}}{n}(1+o(1))\right)\\
&\geq\mathbb{P}\left(\left|T_{n,1}(t_{n})+d_{t_{n}}\right|\geq\frac{q_{\alpha}}{n}(1+o(1))+\left|T_{n,2}\left(t_{n}\right)\right|\right)\\
&=\mathbb{P}\left(\left|\sqrt{\frac{n}{V_{n}}}\cdot T_{n,1}(t_{n})+d_{t_{n}}\sqrt{\frac{n}{V_{n}}}\right|\geq T^{*}_{n}\right)
\end{align*}

where

\[
T^{*}_{n}:=\frac{q_{\alpha}(1+o(1))}{\sqrt{nV_{n}}}+\sqrt{\frac{n}{V_{n}}}\left|T_{n,2}\left(t_{n}\right)\right|,
\]

and the second line follows from the fact that

\[
\sup_{t\in[-1,1]}|T_{n}(t)|\geq\left|T_{n,1}(t_{n})+d_{t_{n}}\right|-\left|T_{n,2}\left(t_{n}\right)\right|.
\]

By Lemma 6 and the assumption that $nV_{n}\to\infty$, we deduce that $T_{n}^{*}=o_{\mathbb{P}}(1)$. Now, thanks to the Berry–Esseen bound for sums of i.i.d. random variables (see, for example, Theorem 3.7 in [chen2010normal]) and the fact that $0\leq g_{t_{n},n}\leq 1$, we have

\begin{align*}
\sup_{x\in\mathbb{R}}\left|\mathbb{P}\!\left(\sqrt{\frac{n}{V_{n}}}\,T_{n,1}(t_{n})\leq x\right)-\mathbb{P}\!\left(N(0,4)\leq x\right)\right| &\lesssim\frac{\mathbb{E}\left|g_{t_{n},n}\left(\bm{X}_{1}\right)-\mathbb{E}g_{t_{n},n}\left(\bm{X}_{1}\right)\right|^{3}}{\sqrt{n}\cdot V_{n}^{3/2}}\\
&\leq\frac{\mathbb{E}\left|g_{t_{n},n}\left(\bm{X}_{1}\right)-\mathbb{E}g_{t_{n},n}\left(\bm{X}_{1}\right)\right|^{2}}{\sqrt{n}\cdot V_{n}^{3/2}}\\
&=\frac{V_{n}}{\sqrt{n}\cdot V_{n}^{3/2}}=\frac{1}{\sqrt{nV_{n}}}\to 0.
\end{align*}

The proof in this case is completed by applying Lemma 2 below with

\[
X_{n}=\sqrt{\frac{n}{V_{n}}}\,T_{n,1}(t_{n});\quad Y_{n}=T_{n}^{*};\quad a_{n}=d_{t_{n}}\sqrt{\frac{n}{V_{n}}}
\]

to get

\[
\lim_{n\to\infty}\mathbb{P}\left(\left|\sqrt{\frac{n}{V_{n}}}\cdot T_{n,1}(t_{n})+d_{t_{n}}\sqrt{\frac{n}{V_{n}}}\right|\geq T^{*}_{n}\right)=1.
\]

$\square$

Lemma 2.

Suppose $\{X_{n}\}$ is a sequence of random variables such that

\[
\sup_{t\in\mathbb{R}}\left|\mathbb{P}\left(X_{n}\leq t\right)-\mathbb{P}\left(N(0,4)\leq t\right)\right|\to 0
\]

where $N(0,4)$ is a centered normal random variable with variance $4$.

Let $\{a_{n}\}$ be any sequence of real numbers (not necessarily bounded), and let $\{Y_{n}\}$ be a sequence of random variables such that $Y_{n}\xrightarrow{\mathbb{P}}0$. Then

\[
\lim_{n\to\infty}\mathbb{P}\!\left(|X_{n}+a_{n}|\geq|Y_{n}|\right)=1.
\]

Proof of Lemma 2. Fix $\varepsilon>0$ and write

\begin{align*}
\mathbb{P}\!\Big(|X_{n}+a_{n}|\leq|Y_{n}|\Big) &\leq\mathbb{P}\!\Big(|X_{n}+a_{n}|\leq|Y_{n}|,|Y_{n}|\leq\varepsilon\Big)+\mathbb{P}\Big(|Y_{n}|>\varepsilon\Big)\\
&\leq\mathbb{P}\left(|X_{n}+a_{n}|\leq\varepsilon\right)+\mathbb{P}\Big(|Y_{n}|>\varepsilon\Big)\\
&=\mathbb{P}\left(-\varepsilon-a_{n}\leq N(0,4)\leq\varepsilon-a_{n}\right)+\mathbb{P}\Big(|Y_{n}|>\varepsilon\Big)+o(1)\\
&\leq 2\varepsilon\cdot\sup_{t\in\mathbb{R}}\left\{\frac{1}{2\sqrt{2\pi}}\exp\left(-t^{2}/8\right)\right\}+\mathbb{P}\Big(|Y_{n}|>\varepsilon\Big)+o(1)\\
&\leq 2\varepsilon+\mathbb{P}\Big(|Y_{n}|>\varepsilon\Big)+o(1).
\end{align*}

The proof is completed by taking $n\to\infty$ and then letting $\varepsilon\to 0$. $\square$
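The key step in the proof above is the anti-concentration bound $\mathbb{P}\left(|N(0,4)+a|\leq\varepsilon\right)\leq 2\varepsilon\cdot\frac{1}{2\sqrt{2\pi}}$, which holds uniformly in the shift $a$. The following sketch checks this bound numerically on a grid of shifts. The function names and the grid are illustrative choices, not part of the paper.

```python
import math

def normal_cdf(x, sd=2.0):
    """CDF of a centered normal with standard deviation sd (variance 4 by default)."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))

def window_prob(a, eps, sd=2.0):
    """P(|N(0, sd^2) + a| <= eps) = P(-eps - a <= N(0, sd^2) <= eps - a)."""
    return normal_cdf(eps - a, sd) - normal_cdf(-eps - a, sd)

eps = 0.05
# Interval length 2*eps times the maximum of the N(0, 4) density.
bound = 2.0 * eps / (2.0 * math.sqrt(2.0 * math.pi))
shifts = [x / 10.0 for x in range(-500, 501)]
worst = max(window_prob(a, eps) for a in shifts)
```

The largest probability is attained at $a=0$ and stays below the bound, which is why no boundedness assumption on $\{a_{n}\}$ is needed in Lemma 2.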

6.5 Proof of Proposition 2

As in the proof of Theorem 1, we need to check three conditions:

  • Convergence in finite-dimensional distributions. Recall $S_{n}(t)$ in (26). We need to show that for all $t_{1}<t_{2}<\dots<t_{k}$,

    \[
    \left(S_{n}(t_{1}),\dots,S_{n}(t_{k})\right)\xrightarrow{d}\left(B_{\Phi(t_{1})}-\frac{\tau^{2}\exp\left(-t_{1}^{2}/2\right)}{2\sqrt{\pi}},\dots,B_{\Phi(t_{k})}-\frac{\tau^{2}\exp\left(-t_{k}^{2}/2\right)}{2\sqrt{\pi}}\right) \tag{32}
    \]

    under the FvML distributions with concentration parameter $\kappa_{n}$.

  • Tightness. This condition is equivalent to (27), for all spaces $D[a,b]$ with $a<b$.

  • Negligibility of the tail. This condition is (31).

To show (32), recall that by Proposition 4 and the Cramér–Wold device, we have

\[
\left(S_{n}(t_{1}),\dots,S_{n}(t_{k}),R_{n}\right)\xrightarrow{d}\left(\bm{B}_{k},Z\right)
\]

under uniformity, where $Z$ is a standard normal random variable, $R_{n}$ is the Rayleigh test statistic as in (2), and $\bm{B}_{k}=\left(B_{\Phi(t_{1})},\dots,B_{\Phi(t_{k})}\right)$ is distributed as a Brownian bridge evaluated at $\Phi(t_{1}),\Phi(t_{2}),\dots,\Phi(t_{k})$. Moreover, the covariance between $Z$ and $\bm{B}_{k}$ is given by (this is also the covariance limit in the proof of Proposition 7)

\[
\mathbb{E}\left(B_{\Phi(t_{i})}Z\right)=\mathbb{E}\left(Z^{*}\cdot\mathbf{1}_{\left\{Z^{*}\leq t_{i}\right\}}\right)=\frac{-\exp\left(-t_{i}^{2}/2\right)}{\sqrt{2\pi}}
\]

with $Z^{*}\sim N(0,1)$, for all $1\leq i\leq k$.
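For completeness, the value of $\mathbb{E}\left(Z^{*}\mathbf{1}_{\{Z^{*}\leq t\}}\right)$ can be verified directly. Writing $\varphi$ for the standard normal density (our notation) and using the identity $\varphi'(z)=-z\varphi(z)$, we get

\[
\mathbb{E}\left(Z^{*}\mathbf{1}_{\{Z^{*}\leq t\}}\right)=\int_{-\infty}^{t}z\,\varphi(z)\,dz=-\int_{-\infty}^{t}\varphi'(z)\,dz=-\varphi(t)=-\frac{\exp\left(-t^{2}/2\right)}{\sqrt{2\pi}}.
\]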

We then obtain (32) from the convergence above by using the LAN expansion (52) (see [Cutting-P-V] for a proof) and Le Cam's third lemma.

We now show (27) and (31). Since their proofs are similar, we only show (27). Assume the contrary. Then there exist $\varepsilon,\varepsilon_{1}>0$ and a sequence $\left\{\delta_{k},n_{k}\right\}_{k\geq 1}$ with $\delta_{k}\to 0$ and $n_{k}\to\infty$ such that

\[
\liminf_{k\to\infty}\left\{\mathbb{P}_{\mu_{n_{k}}}\left(\sup_{|t-s|\leq\delta_{k}}\Big|S_{n_{k}}(t)-S_{n_{k}}(s)\Big|>\varepsilon\right)\right\}\geq\varepsilon_{1}, \tag{33}
\]

where $\mu_{n_{k}}$ is the corresponding subsequence of FvML alternatives. Put

\[
\mathcal{A}_{k}=\left\{\sup_{|t-s|\leq\delta_{k}}\Big|S_{n_{k}}(t)-S_{n_{k}}(s)\Big|>\varepsilon\right\}.
\]

Recall $L_{n}$ in (50). Thanks to Proposition 5, we have

\begin{align}
\mathbb{P}_{\mu_{n_{k}}}\left(\mathcal{A}_{k}\right)=\int_{\mathcal{A}_{k}}1\,d\mathbb{P}_{\mu_{n_{k}}} &=\int_{\mathcal{A}_{k}}L_{n_{k}}\,d\mathbb{P}_{0}\notag\\
&\leq\sqrt{\mathbb{P}_{0}\left(\mathcal{A}_{k}\right)}\cdot\sqrt{\mathbb{E}_{\mathbb{P}_{0}}\left(L_{n_{k}}^{2}\right)}\to 0 \tag{34}
\end{align}

since $\mathbb{E}_{\mathbb{P}_{0}}\left(L_{n_{k}}^{2}\right)$ stays bounded and $\mathbb{P}_{0}\left(\mathcal{A}_{k}\right)\to 0$ (because (27) holds under uniformity). Note that the second equality in the display above follows from the fact that the distribution of $S_{n}(t)$ is invariant under rotations:

\[
S_{n}(t)\left(\bm{X}_{1},\dots,\bm{X}_{n}\right)\stackrel{d}{=}S_{n}(t)\left(\bm{O}\bm{X}_{1},\dots,\bm{O}\bm{X}_{n}\right)
\]

for all orthogonal matrices $\bm{O}$.

Since (33) contradicts (34), (27) must hold. This finishes the proof. $\square$

6.6 Proof of Theorem 4

Let us start with a useful result for calculating likelihood ratios between distributions that are invariant under group actions. For terminology related to group actions and maximal invariants, we refer the reader to Chapters 2 and 3 of the monograph [eaton1989group].

For the reader's convenience, we briefly recall the relevant concepts. A group $G$ is said to act on a space $\mathcal{X}$ if there exists a mapping $G\times\mathcal{X}\to\mathcal{X}$ that is compatible with the group operation. A measurable mapping $T:\mathcal{X}\to\mathcal{Y}$ is called an invariant if

\[
T(x)=T(gx),\qquad\forall\,g\in G.
\]

An invariant $T$ is called a maximal invariant if, whenever $T(x)=T(y)$ for some $x,y\in\mathcal{X}$, there exists $g\in G$ such that $x=gy$.

Lemma 3.

Let $\mathcal{X}$ be a Polish space, and suppose a compact group $G$ acts on $\mathcal{X}$ continuously. Let $\mathbb{P}$ and $\mathbb{Q}$ be two Borel probability measures on $\mathcal{X}$ that are invariant under the action of $G$. Let $T:\mathcal{X}\to\mathcal{Y}$ be a continuous maximal invariant for some Polish space $\mathcal{Y}$. Define the induced laws

\[
\mathbb{P}_{T}:=\mathbb{P}\circ T^{-1},\qquad\mathbb{Q}_{T}:=\mathbb{Q}\circ T^{-1}.
\]

Then $\mathbb{P}\ll\mathbb{Q}$ whenever $\mathbb{P}_{T}\ll\mathbb{Q}_{T}$. Moreover, when this holds and $X\sim\mathbb{Q}$, we have

\[
\frac{d\mathbb{P}}{d\mathbb{Q}}(X)=\frac{d\mathbb{P}_{T}}{d\mathbb{Q}_{T}}\!\bigl(T(X)\bigr)\quad\mathbb{Q}\text{-almost surely}.
\]

The proof of Lemma 3 can be found in Appendix A.4. We now construct the least favorable alternative. Let $\Pi_{k,p}$ denote the normalized left Haar measure on the Grassmannian $G(k,p)$ (so that it is a probability measure). Define

\begin{align*}
\mathbb{P}_{0n} &:=\underbrace{\mathrm{Unif}\!\left(\mathbb{S}^{p-1}\right)\otimes\cdots\otimes\mathrm{Unif}\!\left(\mathbb{S}^{p-1}\right)}_{n\text{-times}},\\
\mathbb{P}_{1n} &:=\int_{G(k,p)}\underbrace{\mathrm{Unif}\!\left(H\cap\mathbb{S}^{p-1}\right)\otimes\cdots\otimes\mathrm{Unif}\!\left(H\cap\mathbb{S}^{p-1}\right)}_{n\text{-times}}\,\Pi_{k,p}(dH).
\end{align*}

Roughly speaking, $\mathbb{P}_{0n}$ is the joint distribution of $\mathbf{X}_{1},\ldots,\mathbf{X}_{n}$ under $H_{0}$, while $\mathbb{P}_{1n}$ is the law obtained by first sampling a $k$-dimensional subspace $H\sim\Pi_{k,p}$ and then sampling

\[
(\mathbf{X}_{1},\ldots,\mathbf{X}_{n})\mid H\stackrel{\text{i.i.d.}}{\sim}\mathrm{Unif}\!\left(H\cap\mathbb{S}^{p-1}\right).
\]

From now on, we let $\mathbf{X}$ denote the $p\times n$ data matrix whose columns are $\mathbf{X}_{1},\ldots,\mathbf{X}_{n}$. We will apply Lemma 3 to show that $d\mathbb{P}_{1n}/d\mathbb{P}_{0n}$ exists and to derive its explicit form. Note that, although one can also use the Blaschke–Petkantschin formula to compute this integral (see, for example, Chapter 7 of [schneider2008stochastic]), the computation is quite lengthy. Define

\begin{align*}
\mathcal{X} &:=\left(\mathbb{S}^{p-1}\right)^{n};\\
\mathcal{Y} &:=\left\{\mathcal{C}\in\mathrm{Sym}_{n}\left(\mathbb{R}\right):\mathcal{C}\succ 0\ \text{and}\ \mathcal{C}_{ii}=1,\ \forall\,1\leq i\leq n\right\}
\end{align*}

where $\mathrm{Sym}_{n}\left(\mathbb{R}\right)$ is the set of all symmetric matrices of size $n$.

By Lemma 7, the map

\begin{align*}
T:\mathcal{X} &\to\mathcal{Y}\\
\bm{X} &\mapsto\bm{X}^{\top}\bm{X}
\end{align*}

is a maximal invariant under the action of the group of orthogonal matrices.
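The invariance half of this statement is easy to illustrate numerically: rotating every column of the data matrix by the same orthogonal matrix leaves the Gram matrix $\bm{X}^{\top}\bm{X}$ unchanged. The sketch below uses a single plane (Givens) rotation as the orthogonal matrix. The helper names and the small dimensions are illustrative assumptions, and the code does not address maximality, which is the content of Lemma 7.

```python
import math
import random

def gram(cols):
    """Gram matrix X^T X of the data matrix whose columns are `cols`."""
    return [[sum(a * b for a, b in zip(u, v)) for v in cols] for u in cols]

def givens(v, i, j, angle):
    """Apply the rotation by `angle` in the (i, j) coordinate plane to v."""
    w = list(v)
    c, s = math.cos(angle), math.sin(angle)
    w[i] = c * v[i] - s * v[j]
    w[j] = s * v[i] + c * v[j]
    return w

rng = random.Random(2)
p, n = 6, 4
cols = []
for _ in range(n):
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in g))
    cols.append([x / norm for x in g])  # a point on S^{p-1}

rotated = [givens(v, 0, 3, 0.7) for v in cols]  # same rotation for all columns
g_before, g_after = gram(cols), gram(rotated)
max_diff = max(abs(g_before[a][b] - g_after[a][b])
               for a in range(n) for b in range(n))
```

Here `max_diff` is zero up to floating-point rounding, and the diagonal entries of both Gram matrices equal one because the columns lie on the sphere.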

Note that the matrix $\bm{G}:=\bm{X}^{\top}\bm{X}$ is nothing but the sample correlation matrix without centering by the sample mean. We can then apply Lemma 2.1 in [jiang2019determinant] and Theorem 5.1.3 in [muirhead2009aspects] to get the density of $\bm{G}$ under $\mathbb{P}_{0n}$ as

\[
f\left(\bm{G}\right)\propto\det\left(\bm{G}\right)^{(p-n-1)/2}\,d\bm{G}. \tag{35}
\]

The exponent in (35) involves $p-1$ rather than the $p-2$ of [muirhead2009aspects] because there is no centering term in $\bm{G}$; Lemma 2.1 in [jiang2019determinant] asserts that removing the centering amounts to a shift of one unit in $p$.

Here $d\bm{G}$ denotes the Lebesgue measure on the entries of $\bm{G}$ above the diagonal. Equivalently, it can be regarded as a measure on $\mathcal{Y}$, defined as the pushforward of the Lebesgue measure on an open subset of $\mathbb{R}^{n(n-1)/2}$ to $\mathcal{Y}$ via the natural embedding.

Similarly, under $\mathbb{P}_{1n}$, the density of $\bm{G}$ is given by

\[
f\left(\bm{G}\right)\propto\det\left(\bm{G}\right)^{(k-n-1)/2}\,d\bm{G}. \tag{36}
\]

It is easy to see that the two laws in (35) and (36) are mutually absolutely continuous. Also, these two densities are well defined due to our assumption that $n+1\leq k\leq p$. Thus, Lemma 3 gives

\[
\mathcal{L}_{n}:=\frac{d\mathbb{P}_{1n}}{d\mathbb{P}_{0n}}\left(\bm{X}_{1},\dots,\bm{X}_{n}\right)=\frac{d\mathbb{P}_{1n}\circ T^{-1}}{d\mathbb{P}_{0n}\circ T^{-1}}\left(\bm{G}\right)=C\left(\frac{k-p}{2}\right)\cdot\det\left(\bm{G}\right)^{(k-p)/2}, \tag{37}
\]

where the normalizing constant $C(\theta)$ is defined by

\[
C\left(\theta\right):=\left[\mathbb{E}_{\mathbb{P}_{0n}}\left(\det\left(\bm{G}\right)^{\theta}\right)\right]^{-1}.
\]

Similarly to the proof of Theorem 3, we only need to show that

\[
\mathbb{E}_{\mathbb{P}_{0n}}\left(\mathcal{L}_{n}^{2}\right)=1+o(1)
\]

whenever $n\left(1-k/p\right)\to 0$, where $\mathcal{L}_{n}$ is defined in (37).

From Lemma 5.2 in [jiang2015likelihood], we find that

\[
\mathbb{E}_{\mathbb{P}_{0n}}\left(\det\left(\bm{G}\right)^{\theta}\right)=\left[\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p}{2}+\theta\right)}\right]^{n}\cdot\frac{\Gamma_{n}\left(\frac{p}{2}+\theta\right)}{\Gamma_{n}\left(\frac{p}{2}\right)}
\]

for $\theta>-\max\left\{1,(p-n)/2\right\}$, where $\Gamma_{n}$ is the multivariate Gamma function defined as in (5.1) of [jiang2015likelihood]. The specific form of $\Gamma_{n}$ is not relevant to our proof, as we only need the asymptotic result from Proposition 5.1 of [jiang2015likelihood]. These asymptotic results are collected in Lemma 4 in Appendix A.5 below.

Consequently, with $\Delta:=k-p$ we obtain

\begin{align*}
\mathbb{E}_{\mathbb{P}_{0n}}\left(\mathcal{L}_{n}^{2}\right)=\frac{C\left(\frac{\Delta}{2}\right)^{2}}{C\left(\Delta\right)} &=\left[\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p}{2}+\Delta\right)}\right]^{n}\cdot\frac{\Gamma_{n}\left(\frac{p}{2}+\Delta\right)}{\Gamma_{n}\left(\frac{p}{2}\right)}\cdot\left[\frac{\Gamma\left(\frac{p+\Delta}{2}\right)}{\Gamma\left(\frac{p}{2}\right)}\right]^{2n}\cdot\frac{\Gamma^{2}_{n}\left(\frac{p}{2}\right)}{\Gamma^{2}_{n}\left(\frac{p+\Delta}{2}\right)}\\
&=\exp\left\{F_{n}(\Delta)-2F_{n}\left(\frac{\Delta}{2}\right)\right\}
\end{align*}

where

\[
F_{n}(\Delta):=n\log\left[\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p}{2}+\Delta\right)}\right]+\log\left[\frac{\Gamma_{n}\left(\frac{p}{2}+\Delta\right)}{\Gamma_{n}\left(\frac{p}{2}\right)}\right]. \tag{38}
\]

The proof is completed by applying Proposition 6 in Appendix A.5, which states that

\[
F_{n}(\Delta)-2F_{n}\left(\frac{\Delta}{2}\right)\to 0
\]

as $n\to\infty$ such that $n(1-k/p)\to 0$. $\square$
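The quantity $F_{n}(\Delta)-2F_{n}(\Delta/2)=\log\mathbb{E}_{\mathbb{P}_{0n}}\left(\mathcal{L}_{n}^{2}\right)$ can also be evaluated numerically from (38) with log-gamma functions. The sketch below is only illustrative: it assumes that $\Gamma_{n}$ is the standard multivariate gamma function $\Gamma_{n}(a)=\pi^{n(n-1)/4}\prod_{j=1}^{n}\Gamma\left(a-\frac{j-1}{2}\right)$, which we have not checked against (5.1) of [jiang2015likelihood], and the function names and parameter values are ours.

```python
import math

def log_multigamma(a, n):
    """log Gamma_n(a) under the ASSUMED standard definition
    Gamma_n(a) = pi^{n(n-1)/4} * prod_{j=1}^{n} Gamma(a - (j-1)/2)."""
    return (n * (n - 1) / 4.0) * math.log(math.pi) + sum(
        math.lgamma(a - (j - 1) / 2.0) for j in range(1, n + 1))

def F(theta, n, p):
    """F_n(theta) from (38); requires p/2 + theta - (n-1)/2 > 0."""
    a = p / 2.0
    return (n * (math.lgamma(a) - math.lgamma(a + theta))
            + log_multigamma(a + theta, n) - log_multigamma(a, n))

def log_second_moment(n, p, k):
    """log E(L_n^2) = F_n(Delta) - 2 F_n(Delta / 2) with Delta = k - p."""
    delta = float(k - p)
    return F(delta, n, p) - 2.0 * F(delta / 2.0, n, p)

near = log_second_moment(5, 2000, 1999)   # n (1 - k/p) = 0.0025
mild = log_second_moment(10, 200, 199)    # n (1 - k/p) = 0.05
far = log_second_moment(10, 200, 150)     # n (1 - k/p) = 2.5
```

Under the stated assumption, the value is close to zero when $n(1-k/p)$ is small and grows as $k$ moves away from $p$, in line with the limit used to conclude the proof.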

Appendix A Technical results, discussions and other proofs

A.1 Comparison with projection-based tests

At a high level, our newly developed procedure follows the philosophy of projection-based tests, initially developed in [cuesta2009projection]. Their test relies on the characterization (7). This characterization can be shown using a variant of the Cramér–Wold device, although the same argument does not apply to Proposition 1. Based on (7), [cuesta2009projection] proposed a test that rejects for large values of

\[
D_{n,\bm{U}}:=\sup_{x\in[-1,1]}\left|\frac{1}{n}\sum_{i=1}^{n}\mathbb{I}\left\{\bm{X}_{i}^{\top}\bm{U}\leq x\right\}-\mathbb{P}\left(\bm{e}_{1}^{\top}\bm{U}\leq x\right)\right|, \tag{39}
\]

where $\bm{U}\sim\mathrm{Unif}(\mathbb{S}^{p-1})$ is drawn independently of the data and $\bm{e}_{1}=(1,0,\dots,0)$.

The test in [cuesta2009projection] uses the same critical value as the Kolmogorov–Smirnov test. In practice, $\bm{U}$ is drawn multiple times from $\mathrm{Unif}\left(\mathbb{S}^{p_{n}-1}\right)$ and one gets a corresponding $p$-value for every such $\bm{U}$. The test rejects if the smallest $p$-value is below a threshold. More specifically, one picks a large number $k$ and draws $\bm{U}_{1},\dots,\bm{U}_{k}$ independently from $\mathrm{Unif}\left(\mathbb{S}^{p_{n}-1}\right)$. The test in [cuesta2009projection] rejects at level $\alpha$ if

\[
\min_{1\leq i\leq k}\mathbb{P}\left(D_{n,\bm{U}_{i}}>K_{\alpha}\,\Big|\,\bm{X}_{1},\dots,\bm{X}_{n}\right)\leq c_{\alpha}
\]

where $K_{\alpha}$ is the critical value of the Kolmogorov–Smirnov test and $c_{\alpha}$ is the $(1-\alpha)$-quantile of the left-hand side. However, as the asymptotic theory for this test remains unresolved, computationally intensive Monte Carlo methods are often required to approximate $c_{\alpha}$.
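For a single direction $\bm{U}$, the statistic $D_{n,\bm{U}}$ in (39) is an ordinary one-sample Kolmogorov–Smirnov distance between the projected data and the null law of $\bm{e}_{1}^{\top}\bm{U}$. A minimal sketch of this computation follows. The function name and the `proj_cdf` argument are our own, and the usage example is restricted to $p=3$, where $\bm{e}_{1}^{\top}\bm{U}$ is uniform on $[-1,1]$ so that the null CDF is $(x+1)/2$.

```python
import math
import random

def ks_projection_stat(points, direction, proj_cdf):
    """sup_x |F_n(x) - F(x)| for the projections X_i^T U.

    proj_cdf is the continuous null CDF F of e_1^T U; the supremum of the
    difference is attained at (or just before) the order statistics.
    """
    n = len(points)
    proj = sorted(sum(a * b for a, b in zip(x, direction)) for x in points)
    d = 0.0
    for i, v in enumerate(proj, start=1):
        f = proj_cdf(v)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def unit_vector(p, rng):
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

cdf_p3 = lambda x: (x + 1.0) / 2.0  # null CDF of e_1^T U when p = 3
rng = random.Random(3)
data = [unit_vector(3, rng) for _ in range(50)]
u = unit_vector(3, rng)          # one random direction
d_stat = ks_projection_stat(data, u, cdf_p3)
```

Repeating this for several independent directions and combining the resulting $p$-values is what makes the calibration of the combined test costly.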

Subsequent works, such as [escanciano2006consistent; garcia2023projection; garcia2021cramer] (see also the references therein), addressed this issue by integrating over all possible directions $\bm{U}$, resulting in test statistics of the form

\[
\mathbb{E}_{\bm{U}}\left[\int_{-1}^{1}\left(\frac{1}{n}\sum_{i=1}^{n}\mathbb{I}\left\{\bm{X}_{i}^{\top}\bm{U}\leq x\right\}-\mathbb{P}\left(\bm{e}_{1}^{\top}\bm{U}\leq x\right)\right)^{2}w(x)\,dx\right], \tag{40}
\]

for some weight function $w(x)\in L^{2}\left([-1,1]\right)$, where the expectation above is taken with respect to $\bm{U}$.

Test statistics like (40) exhibit desirable properties, similar to the Anderson–Darling and Cramér–von Mises tests. However, the expectation with respect to $\bm{U}$ in (40) often lacks a closed-form expression, requiring Monte Carlo simulations for approximation. Additionally, their asymptotic distributions frequently involve weighted sums of chi-squared distributions, complicating the computation of tail probabilities. For example, the tests in [garcia2021cramer; garcia2023projection] rely on Imhof's method to approximate the critical value. Our proposed test offers two key advantages over projection-based tests:

  1. Reduction in computational cost: Unlike projection-based methods, our test avoids sampling random directions or integrating over all possible directions, which often requires complex procedures to approximate the critical values. Theorem 1 demonstrates that when the dimension is large, the tail probabilities of our test statistic are much simpler to approximate, eliminating the need for Monte Carlo simulation.

  2. Flexibility in high-dimensional settings: While existing projection-based tests are valid only in fixed-dimensional scenarios, our test extends seamlessly to high-dimensional settings, including cases where $p$ is large and $n$ is small. Extending projection-based tests to such settings is highly non-trivial, as it requires understanding how the eigenvalues of the associated Hilbert–Schmidt operator shrink to zero as the dimension increases.

A.2 Simulation studies

In this subsection, we conduct simulation studies to compare the power of $T_{n}$ with that of the three existing tests $R_{n}$, $B_{n}$, $P_{n}$ in (2), (3) and (4), respectively. We let $n=p=80$ in the experiments. The empirical powers of the four tests are reported in Figure 1 below. The yellow curve corresponds to the proposed test $T_{n}$. We can see that the empirical power agrees well with the theoretical analysis: the proposed test is nearly as powerful as the Rayleigh test in the FvML model and is the only test that remains consistent under both models.

Figure 1: Empirical power under the FvML model and the low rank model

A.3 Proof of Proposition 4

To prove (30), we employ a martingale central limit theorem, namely Corollary 3.1 in [Hall]. Define

\begin{align}
Z_{n} &:=\sum_{1\leq i<j\leq n}h_{n}\left(\bm{X}^{\top}_{i}\bm{X}_{j}\right), \tag{41}\\
s_{n}^{2} &:=\mathrm{Var}(Z_{n}),\notag\\
Y_{n,i} &:=\sum_{j=1}^{i-1}h_{n}(\bm{X}^{\top}_{i}\bm{X}_{j}),\notag\\
Q_{n} &:=\sum_{i=2}^{n}\mathbb{E}\left(Y_{n,i}^{2}\,\Big|\,\bm{X}_{1},\bm{X}_{2},\dots,\bm{X}_{i-1}\right). \tag{42}
\end{align}

Thanks to Lemma 8 in Section A, the sequence $\left\{Y_{n,i};2\leq i\leq n\right\}$ is a martingale difference sequence with respect to the natural sigma-fields $\mathcal{F}_{i}=\sigma\left(\bm{X}_{1},\bm{X}_{2},\dots,\bm{X}_{i}\right)$. This means that $\mathbb{E}\left(Y_{n,i}\mid\mathcal{F}_{i-1}\right)=0$. To get (30) via the martingale CLT from Corollary 3.1 in [Hall], we need to verify the following two conditions:

\[
s_{n}^{-2}\sum_{i=2}^{n}\mathbb{E}\Big[Y_{n,i}^{2}\cdot\mathbf{1}_{\left\{|Y_{n,i}|>\varepsilon s_{n}\right\}}\Big]\to 0, \tag{43}
\]

for every fixed $\varepsilon>0$, and

\[
s_{n}^{-2}Q_{n}\xrightarrow{\mathbb{P}}1. \tag{44}
\]

To verify the Lindeberg condition (43), it suffices to show that

\[
s_{n}^{-4}\sum_{i=2}^{n}\mathbb{E}Y_{n,i}^{4}\to 0. \tag{45}
\]

By the pairwise independence property (see Lemma 8), the $\binom{n}{2}$ summands of $Z_{n}$ are uncorrelated, so $s_{n}^{2}=\binom{n}{2}\,\mathbb{E}h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})=\frac{n^{2}}{2}(1+o(1))\cdot\mathbb{E}h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})$ and thus,

\[
s_{n}^{4}=\frac{n^{4}}{4}(1+o(1))\cdot\sigma^{4}, \tag{46}
\]

where $\sigma^{2}$ is the limit in (28). To bound the fourth-moment terms in (45), we first note that

\begin{align*}
\mathbb{E}\Big[h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\cdot h_{n}(\bm{X}^{\top}_{1}\bm{X}_{3})\cdot h_{n}(\bm{X}^{\top}_{1}\bm{X}_{4})\Big] &=\mathbb{E}\,\mathbb{E}\Big[h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\cdot h_{n}(\bm{X}^{\top}_{1}\bm{X}_{3})\cdot h_{n}(\bm{X}^{\top}_{1}\bm{X}_{4})\,\Big|\,\bm{X}_{1}\Big]\\
&=\mathbb{E}\Big[\mathbb{E}\left(h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\,\Big|\,\bm{X}_{1}\right)\cdot\mathbb{E}\left(h_{n}(\bm{X}^{\top}_{1}\bm{X}_{3})\,\Big|\,\bm{X}_{1}\right)\cdot\mathbb{E}\left(h_{n}(\bm{X}^{\top}_{1}\bm{X}_{4})\,\Big|\,\bm{X}_{1}\right)\Big]\\
&=0,
\end{align*}

due to the assumption $\mathbb{E}h_{n}(\bm{X}_{1}^{\top}\bm{X}_{2})=0$ and Lemma 8. Similarly,

\begin{align*}
\mathbb{E}\Big[h_{n}^{3}(\bm{X}^{\top}_{1}\bm{X}_{2})\cdot h_{n}(\bm{X}^{\top}_{1}\bm{X}_{3})\Big] &=\mathbb{E}\,\mathbb{E}\Big[h_{n}^{3}(\bm{X}^{\top}_{1}\bm{X}_{2})\cdot h_{n}(\bm{X}^{\top}_{1}\bm{X}_{3})\,\Big|\,\bm{X}_{1}\Big]\\
&=\mathbb{E}\Big[\mathbb{E}\left(h_{n}^{3}(\bm{X}^{\top}_{1}\bm{X}_{2})\,\Big|\,\bm{X}_{1}\right)\cdot\mathbb{E}\left(h_{n}(\bm{X}^{\top}_{1}\bm{X}_{3})\,\Big|\,\bm{X}_{1}\right)\Big]\\
&=0.
\end{align*}

Consequently, for some universal constant $C$, we get

\begin{align*}
\mathbb{E}Y_{n,i}^{4} &\leq\sum_{j=1}^{i-1}\mathbb{E}h_{n}^{4}(\bm{X}^{\top}_{i}\bm{X}_{j})+C\cdot\sum_{1\leq r\neq t\leq i-1}\mathbb{E}\Big[h_{n}^{2}(\bm{X}^{\top}_{i}\bm{X}_{r})\cdot h_{n}^{2}(\bm{X}^{\top}_{i}\bm{X}_{t})\Big]\\
&\leq(i-1)\cdot\mathbb{E}h_{n}^{4}(\bm{X}^{\top}_{1}\bm{X}_{2})+Ci^{2}\cdot\mathbb{E}\Big[h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\cdot h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{3})\Big]\\
&=(i-1)\cdot\mathbb{E}h_{n}^{4}(\bm{X}^{\top}_{1}\bm{X}_{2})+Ci^{2}\cdot\left(\mathbb{E}\Big[h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\Big]\right)^{2},
\end{align*}

where the last assertion follows from the second statement of Lemma 8. Sum up the above display over all 2in2\leq i\leq n, we arrive at

i=2n𝔼Yn,i4\displaystyle\sum_{i=2}^{n}\mathbb{E}Y_{n,i}^{4} C1(n2𝔼hn4(𝑿1𝑿2)+n3𝔼2[hn2(𝑿1𝑿2)])\displaystyle\leq C_{1}\cdot\Big(n^{2}\cdot\mathbb{E}h_{n}^{4}(\bm{X}^{\top}_{1}\bm{X}_{2})+n^{3}\cdot\mathbb{E}^{2}\Big[h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\Big]\Big)
(C1+1)n3𝔼hn4(𝑿1𝑿2),\displaystyle\leq(C_{1}+1)\cdot n^{3}\cdot\mathbb{E}h_{n}^{4}(\bm{X}^{\top}_{1}\bm{X}_{2}),

for some universal constant C1C_{1}. This in turn yields

sn4i=2n𝔼Yn,i4(C1+1)𝔼[hn4(𝑿1𝑿2)]n𝔼2[hn2(𝑿1𝑿2)]s_{n}^{-4}\sum_{i=2}^{n}\mathbb{E}Y_{n,i}^{4}\leq(C_{1}+1)\cdot\frac{\mathbb{E}\Big[h_{n}^{4}(\bm{X}^{\top}_{1}\bm{X}_{2})\Big]}{n\cdot\mathbb{E}^{2}\Big[h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\Big]}

by (44) and (46). The last term in the above display goes to 0 by assumption (29), which implies (45). This in turn concludes the proof of (43).

We now prove (44). Recall QnQ_{n} defined in (42), which can be rewritten as

Qn\displaystyle Q_{n} =i=2n[j=1i1𝔼(hn2(𝑿i𝑿j)|𝑿𝒋)+1rti1𝔼(hn(𝑿i𝑿r)hn(𝑿i𝑿t)|𝑿r,𝑿t)]\displaystyle=\sum_{i=2}^{n}\Big[\sum_{j=1}^{i-1}\mathbb{E}\left(h_{n}^{2}(\bm{X}^{\top}_{i}\bm{X}_{j})\Big|\bm{X_{j}}\right)+\sum_{1\leq r\neq t\leq i-1}\mathbb{E}\left(h_{n}(\bm{X}^{\top}_{i}\bm{X}_{r})\cdot h_{n}(\bm{X}^{\top}_{i}\bm{X}_{t})\Big|\bm{X}_{r},\bm{X}_{t}\right)\Big]
=i=2n(i1)𝔼hn2(𝑿1𝑿2)+i=2n1rti1𝔼(hn(𝑿i𝑿r)hn(𝑿i𝑿t)|𝑿r,𝑿t).\displaystyle=\sum_{i=2}^{n}(i-1)\cdot\mathbb{E}h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})+\sum_{i=2}^{n}\sum_{1\leq r\neq t\leq i-1}\mathbb{E}\left(h_{n}(\bm{X}^{\top}_{i}\bm{X}_{r})\cdot h_{n}(\bm{X}^{\top}_{i}\bm{X}_{t})\Big|\bm{X}_{r},\bm{X}_{t}\right).

Let $\bm{X},\bm{Y},\bm{Z}$ be i.i.d. draws from the uniform distribution on $\mathbb{S}^{p-1}$. Define

Hn(𝑿,𝒀):=𝔼[hn(𝑿T𝒁)hn(𝒀T𝒁)|𝑿,𝒀].\displaystyle H_{n}(\bm{X},\bm{Y}):=\mathbb{E}\Big[h_{n}\left(\bm{X}^{T}\bm{Z}\right)\cdot h_{n}\left(\bm{Y}^{T}\bm{Z}\right)\Big|\bm{X},\bm{Y}\Big]. (47)

It is easy to see that for any 1rti11\leq r\neq t\leq i-1, we have

Hn(𝑿r,𝑿t)=𝔼(hn(𝑿i𝑿r)hn(𝑿i𝑿t)|𝑿r,𝑿t).H_{n}(\bm{X}_{r},\bm{X}_{t})=\mathbb{E}\left(h_{n}(\bm{X}^{\top}_{i}\bm{X}_{r})\cdot h_{n}(\bm{X}^{\top}_{i}\bm{X}_{t})\Big|\bm{X}_{r},\bm{X}_{t}\right).

Thus, we have

Qn\displaystyle Q_{n} =i=2n(i1)𝔼hn2(𝑿1𝑿2)+i=2n1rti1Hn(𝑿r,𝑿t)\displaystyle=\sum_{i=2}^{n}(i-1)\cdot\mathbb{E}h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})+\sum_{i=2}^{n}\sum_{1\leq r\neq t\leq i-1}H_{n}(\bm{X}_{r},\bm{X}_{t})
=sn2(1+o(1))+Qn,\displaystyle=s_{n}^{2}\cdot(1+o(1))+Q_{n}^{*},

where

Qn\displaystyle Q_{n}^{*} :=i=2n1rti1Hn(𝑿r,𝑿t).\displaystyle:=\sum_{i=2}^{n}\sum_{1\leq r\neq t\leq i-1}H_{n}(\bm{X}_{r},\bm{X}_{t}).

To prove (44), we only need to show that $\mbox{Var}\left(Q_{n}^{*}/s_{n}^{2}\right)\to 0$. Let $A=\left\{(r,t):1\leq r\neq t\leq n-1\right\}$. Note that for any two pairs $(r,t)\in A$ and $(r^{\prime},t^{\prime})\in A$, we have

𝔼[Hn(𝑿r,𝑿t)Hn(𝑿r,𝑿t)]=0,\mathbb{E}\Big[H_{n}(\bm{X}_{r},\bm{X}_{t})\cdot H_{n}(\bm{X}_{r^{\prime}},\bm{X}_{t^{\prime}})\Big]=0,

unless {r,t}={r,t}\left\{r,t\right\}=\left\{r^{\prime},t^{\prime}\right\}. This is due to the assumption 𝔼hn(𝑿1𝑿2)=0\mathbb{E}h_{n}(\bm{X}_{1}^{\top}\bm{X}_{2})=0 and Lemma 8. Consequently,

\[
\begin{aligned}
\mbox{Var}\left(Q_{n}^{*}\right)
&=\mbox{Var}\Big(\sum_{(r,t)\in A}\Big[n-\max\left\{r,t\right\}\Big]\cdot H_{n}(\bm{X}_{r},\bm{X}_{t})\Big)\\
&=2\sum_{(r,t)\in A}\Big[n-\max\left\{r,t\right\}\Big]^{2}\cdot\mathbb{E}H_{n}^{2}(\bm{X}_{r},\bm{X}_{t})\\
&\leq O(n^{4})\cdot\mathbb{E}H_{n}^{2}(\bm{X}_{1},\bm{X}_{2}),
\end{aligned}
\]

where we use the fact that |A|n2|A|\leq n^{2} in the last bound. By (46), sn4=σ4n4(1+o(1))s_{n}^{4}=\sigma^{4}n^{4}(1+o(1)) so it suffices to verify that 𝔼Hn2(𝑿1,𝑿2)0\mathbb{E}H_{n}^{2}(\bm{X}_{1},\bm{X}_{2})\to 0, which is the content of Lemma 10 in Section A. This concludes the proof of (44), which together with (43) implies (30). \square

A.4 Proof of Lemma 3

Suppose (A)=0\mathbb{Q}(A)=0 for some measurable subset A𝒳A\subset\mathcal{X}. Let XX\sim\mathbb{P} and YY\sim\mathbb{Q}. By the disintegration theorem (see, for example, Appendix F of [pollard2002user]),

(A)=𝒴(YAT1(t)T(Y)=t)T(dt),\mathbb{Q}(A)=\int_{\mathcal{Y}}\mathbb{Q}\bigl(Y\in A\cap T^{-1}(t)\mid T(Y)=t\bigr)\,\mathbb{Q}_{T}(dt),

and similarly

(A)=𝒴(XAT1(t)T(X)=t)T(dt).\mathbb{P}(A)=\int_{\mathcal{Y}}\mathbb{P}\bigl(X\in A\cap T^{-1}(t)\mid T(X)=t\bigr)\,\mathbb{P}_{T}(dt).

For each $t$, define the $G$-invariant conditional laws on $T^{-1}(t)$ (such sets are called fibers) by

𝕂Pt(B):=G(gXBT(X)=t)Π(dg),𝕂Qt(B):=G(gYBT(Y)=t)Π(dg),\mathbb{K}_{P}^{t}(B):=\int_{G}\mathbb{P}(gX\in B\mid T(X)=t)\,\Pi(dg),\qquad\mathbb{K}_{Q}^{t}(B):=\int_{G}\mathbb{Q}(gY\in B\mid T(Y)=t)\,\Pi(dg),

for measurable BT1(t)B\subset T^{-1}(t), where Π\Pi is the normalized Haar probability measure on GG.

Roughly speaking, we integrate the disintegration kernels over the group GG so that the resulting kernels are GG-invariant, up to some null sets in 𝒴\mathcal{Y}. Because \mathbb{P} and \mathbb{Q} are GG-invariant, we still have

(A)=𝒴𝕂Qt(AT1(t))T(dt),(A)=𝒴𝕂Pt(AT1(t))T(dt).\mathbb{Q}(A)=\int_{\mathcal{Y}}\mathbb{K}_{Q}^{t}(A\cap T^{-1}(t))\,\mathbb{Q}_{T}(dt),\qquad\mathbb{P}(A)=\int_{\mathcal{Y}}\mathbb{K}_{P}^{t}(A\cap T^{-1}(t))\,\mathbb{P}_{T}(dt).

Each fiber T1(t)T^{-1}(t) is a single GG-orbit, in the sense that it is generated by {gxt,gG}\left\{gx_{t},g\in G\right\} for some xt𝒳x_{t}\in\mathcal{X}, because TT is a maximal invariant. Thus GG acts transitively on every such fiber. By Theorem 4.5 of [eaton1989group], a transitive compact group action admits a unique GG-invariant probability measure on each orbit. Hence,

\[
\mathbb{K}_{P}^{t}\equiv\mathbb{K}_{Q}^{t}\quad\text{for $\mathbb{Q}_{T}$-almost every $t$}. \tag{48}
\]

Since (A)=0\mathbb{Q}(A)=0, we have

\[
\mathbb{K}_{Q}^{t}(A\cap T^{-1}(t))=0\quad\text{for $\mathbb{Q}_{T}$-almost every $t$},
\]

and therefore for $\mathbb{P}_{T}$-almost every $t$ as well, because $\mathbb{P}_{T}\ll\mathbb{Q}_{T}$. Using $\mathbb{K}_{P}^{t}=\mathbb{K}_{Q}^{t}$ on this set of $t$,

(A)=𝒴𝕂Pt(AT1(t))T(dt)=0.\mathbb{P}(A)=\int_{\mathcal{Y}}\mathbb{K}_{P}^{t}(A\cap T^{-1}(t))\,\mathbb{P}_{T}(dt)=0.

Thus \mathbb{P}\ll\mathbb{Q}.

Now, for the statement regarding the likelihood ratio, let

L(t):=dTdT(t).L(t):=\frac{d\mathbb{P}_{T}}{d\mathbb{Q}_{T}}(t).

Then, by (48),

(A)=𝒴𝕂Pt(AT1(t))T(dt)=𝒴𝕂Qt(AT1(t))L(t)T(dt).\mathbb{P}(A)=\int_{\mathcal{Y}}\mathbb{K}_{P}^{t}(A\cap T^{-1}(t))\,\mathbb{P}_{T}(dt)=\int_{\mathcal{Y}}\mathbb{K}_{Q}^{t}(A\cap T^{-1}(t))L(t)\,\mathbb{Q}_{T}(dt).

On the other hand,

𝒳𝟏A(x)L(T(x))(dx)=𝒴𝕂Qt(AT1(t))L(t)T(dt).\int_{\mathcal{X}}\mathbf{1}_{A}(x)\,L(T(x))\,\mathbb{Q}(dx)=\int_{\mathcal{Y}}\mathbb{K}_{Q}^{t}(A\cap T^{-1}(t))L(t)\,\mathbb{Q}_{T}(dt).

Hence,

(A)=AL(T(x))(dx),\mathbb{P}(A)=\int_{A}L(T(x))\,\mathbb{Q}(dx),

which implies

dd(x)=L(T(x))=dTdT(T(x)),-a.s.\frac{d\mathbb{P}}{d\mathbb{Q}}(x)=L(T(x))=\frac{d\mathbb{P}_{T}}{d\mathbb{Q}_{T}}(T(x)),\quad\mathbb{Q}\text{-a.s.}

This completes the proof. \square

A.5 Likelihood ratio analysis

This section contains the analysis of the likelihood ratio’s second moment used in Sections 3.2 and 3.3.

We first analyze the second moment of the likelihood ratio between the FvML distributions with randomized locations and the uniform distribution. We parametrize the FvML distributions as

dFvml:=Cp(κ)exp(κ𝝁,𝒙)d0.\displaystyle d\mathbb{P}_{\rm Fvml}:=C_{p}(\kappa)\cdot\exp\left(\kappa\langle\bm{\mu},\bm{x}\rangle\right)d\mathbb{P}_{0}.

with κ(0,)\kappa\in(0,\infty) and 𝝁𝕊p1\bm{\mu}\in\mathbb{S}^{p-1}. From this, we find that

Cp(κ)=[𝔼0exp(κ𝝁,𝒙)]1=[1πΓ(p2)Γ(p12)11eκt(1t2)p32𝑑t]1.\displaystyle C_{p}(\kappa)=\left[\mathbb{E}_{\mathbb{P}_{0}}\exp\left(\kappa\langle\bm{\mu},\bm{x}\rangle\right)\right]^{-1}=\left[\frac{1}{\sqrt{\pi}}\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p-1}{2}\right)}\cdot\int_{-1}^{1}e^{\kappa t}\left(1-t^{2}\right)^{\frac{p-3}{2}}dt\right]^{-1}. (49)
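As a quick numerical sanity check of (49), the following Python sketch evaluates $\log C_{p}(\kappa)$ by a midpoint rule and compares it with $-\kappa^{2}/(2p)$, the leading term one expects because a single coordinate of a uniform point on $\mathbb{S}^{p-1}$ has variance $1/p$. The function name `log_normalizer`, the grid size and the test values of $p$ and $\kappa$ are illustrative assumptions, not part of the paper.

```python
import math

def log_normalizer(p: int, kappa: float, grid: int = 20000) -> float:
    """Return log C_p(kappa) from (49), using a midpoint rule on [-1, 1]."""
    # log of Gamma(p/2) / (sqrt(pi) * Gamma((p-1)/2)), the density constant
    log_const = (math.lgamma(p / 2) - math.lgamma((p - 1) / 2)
                 - 0.5 * math.log(math.pi))
    h = 2.0 / grid
    total = 0.0
    for i in range(grid):
        t = -1.0 + (i + 0.5) * h  # midpoints never hit t = -1 or t = 1
        total += math.exp(kappa * t + 0.5 * (p - 3) * math.log1p(-t * t))
    # C_p(kappa) is the reciprocal of the normalized integral
    return -(log_const + math.log(total * h))

p, kappa = 100, 2.0  # hypothetical values chosen for illustration
print(log_normalizer(p, kappa), -kappa ** 2 / (2 * p))
```

With $\kappa=0$ the routine should return a value close to $0$, which gives a simple check of the normalization.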

Some basic properties of the above normalizing constant are collected in Lemma 12. Define the likelihood ratio

\[
L_{n}:=\mathbb{E}_{\bm{\mu}\sim\mathbb{P}_{0}}\left[\frac{d\mathbb{P}_{\rm Fvml}^{\otimes n}}{d\mathbb{P}_{0}^{\otimes n}}\right]=\left(C_{p}(\kappa)\right)^{n}\cdot\mathbb{E}_{\bm{\mu}\sim\mathbb{P}_{0}}\left[\prod_{i=1}^{n}\exp\Big(\kappa\langle\bm{\mu},\bm{X}_{i}\rangle\Big)\right]. \tag{50}
\]

Our first result is an asymptotic formula for the second moment of the likelihood ratio.

Proposition 5.

Let $\kappa=\tau p^{3/4}/\sqrt{n}$ for some $\tau>0$. If $\min\left\{p,n\right\}\to\infty$, then

𝔼(Ln2)=exp(τ4/2+o(1))\mathbb{E}\left(L_{n}^{2}\right)=\exp\left(\tau^{4}/2+o(1)\right)

where LnL_{n} is defined as in (50).

Proof of Proposition 5. Recall the form of $L_{n}$ in (50). To compute the second moment, take two independent copies $\bm{\mu},\bm{\mu}_{1}$ drawn from the uniform distribution and write

\[
\begin{aligned}
\mathbb{E}L_{n}^{2}
&=\left(C_{p}(\kappa)\right)^{2n}\cdot\mathbb{E}_{\bm{X}}\left[\mathbb{E}_{(\bm{\mu},\bm{\mu}_{1})}\left[\prod_{i=1}^{n}\exp\Big(\kappa\langle\bm{\mu}+\bm{\mu}_{1},\bm{X}_{i}\rangle\Big)\right]\right]\\
&=C_{p}(\kappa)^{2n}\cdot\mathbb{E}_{(\bm{\mu},\bm{\mu}_{1})}\left[\mathbb{E}_{\bm{X}}\left[\prod_{i=1}^{n}\exp\left(\kappa\|\bm{\mu}+\bm{\mu}_{1}\|\cdot\Big\langle\frac{\bm{\mu}+\bm{\mu}_{1}}{\|\bm{\mu}+\bm{\mu}_{1}\|},\bm{X}_{i}\Big\rangle\right)\right]\right]\\
&=\mathbb{E}\left[\frac{C_{p}(\kappa)^{2n}}{C_{p}(\kappa\|\bm{\mu}+\bm{\mu}_{1}\|)^{n}}\right].
\end{aligned}
\]

Note that 𝝁+𝝁1=d2(1+U)\|\bm{\mu}+\bm{\mu}_{1}\|\stackrel{{\scriptstyle d}}{{=}}\sqrt{2(1+U)}, where UU has the law as in (5). Thus, we have

𝔼Ln2=𝔼[Cp(κ)2nCp(κ2(1+U))n]=𝔼[exp(n(2logCp(κ)logCp(κ2(1+U))))]\mathbb{E}L_{n}^{2}=\mathbb{E}\left[\frac{C_{p}(\kappa)^{2n}}{C_{p}(\kappa\sqrt{2(1+U)})^{n}}\right]=\mathbb{E}\left[\exp\left(n\left(2\log C_{p}(\kappa)-\log C_{p}\left(\kappa\sqrt{2(1+U)}\right)\right)\right)\right]

where UU has a symmetric Beta-type distribution as in (5).

Put

Ln1:=2logCp(κ)logCp(κ2(1+U)).L_{n1}:=2\log C_{p}(\kappa)-\log C_{p}\left(\kappa\sqrt{2(1+U)}\right).

By the third property in Lemma 12, we have

\[
\begin{aligned}
\left|L_{n1}+\frac{\kappa^{2}}{p}-\frac{\kappa^{2}(1+U)}{p}\right|
&\leq 2\left|\log C_{p}(\kappa)+\frac{\kappa^{2}}{2p}\right|+\left|\log C_{p}\left(\kappa\sqrt{2(1+U)}\right)+\frac{\kappa^{2}(1+U)}{p}\right|\\
&=O\left(\frac{\kappa^{4}}{p^{3}}\right)=O\left(\frac{\tau^{4}}{n^{2}}\right).
\end{aligned}
\]

Consequently,

\[
\begin{aligned}
\mathbb{E}L_{n}^{2}=\mathbb{E}\left[\exp\left(nL_{n1}\right)\right]
&=\mathbb{E}\left[\exp\left(\frac{\kappa^{2}n}{p}\cdot U+O\left(n^{-1}\right)\right)\right]\\
&=\mathbb{E}\left[\exp\left(\tau^{2}\cdot\sqrt{p}\,U+O\left(n^{-1}\right)\right)\right].
\end{aligned}
\]

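The proof is completed by noting that $\sqrt{p}\,U$ converges in distribution to a standard normal $Z$ and that the sequence $\left\{\sqrt{p}\,U\right\}$ is exponentially tight, so that $\mathbb{E}\exp\left(\tau^{2}\sqrt{p}\,U\right)\to\mathbb{E}\exp\left(\tau^{2}Z\right)=\exp\left(\tau^{4}/2\right)$. $\square$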
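The limit in Proposition 5 can be illustrated numerically. The sketch below approximates $\mathbb{E}\exp\left(\tau^{2}\sqrt{p}\,U\right)$ by a midpoint rule, assuming (as in (49)) that $U$ has density proportional to $(1-t^{2})^{(p-3)/2}$ on $[-1,1]$, and prints it next to $\exp(\tau^{4}/2)$. It ignores the $O(n^{-1})$ remainder, and the function name `mgf_sqrt_p_u` and the chosen values of $p$ and $\tau$ are our own illustrative choices.

```python
import math

def mgf_sqrt_p_u(p: int, tau: float, grid: int = 50000) -> float:
    """Midpoint-rule approximation of E exp(tau^2 * sqrt(p) * U), where U has
    density proportional to (1 - t^2)^((p - 3) / 2) on [-1, 1]."""
    log_const = (math.lgamma(p / 2) - math.lgamma((p - 1) / 2)
                 - 0.5 * math.log(math.pi))
    s = tau ** 2 * math.sqrt(p)
    h = 2.0 / grid
    total = 0.0
    for i in range(grid):
        t = -1.0 + (i + 0.5) * h
        # Stay on the log scale so that large p does not cause underflow issues.
        total += math.exp(log_const + s * t
                          + 0.5 * (p - 3) * math.log1p(-t * t))
    return total * h

tau = 1.0  # hypothetical value of tau
for p in (50, 500, 2000):
    print(p, mgf_sqrt_p_u(p, tau), math.exp(tau ** 4 / 2))
```

The printed values should approach $\exp(\tau^{4}/2)$ as $p$ grows, in line with the statement of the proposition.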

The next asymptotic result was used in the proof of Theorem 4. Recall that the multivariate Gamma function Γn(z)\Gamma_{n}(z) is defined as

Γn(z):=πn(n1)/4k=1nΓ(zk12)\displaystyle\Gamma_{n}(z):=\pi^{n(n-1)/4}\prod_{k=1}^{n}\Gamma\left(z-\frac{k-1}{2}\right) (51)

for all complex numbers $z$ such that $\mbox{Re}(z)>(n-1)/2$.
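For numerical work, (51) is most conveniently evaluated on the log scale. The following Python sketch does so for real arguments only (an assumption made for simplicity, since (51) allows complex $z$); the function name `log_multivariate_gamma` is ours.

```python
import math

def log_multivariate_gamma(n: int, z: float) -> float:
    """Return log Gamma_n(z) as defined in (51), for real z > (n - 1) / 2."""
    if z <= (n - 1) / 2:
        raise ValueError("need z > (n - 1) / 2")
    log_pi_term = n * (n - 1) / 4 * math.log(math.pi)
    log_gamma_terms = sum(math.lgamma(z - (k - 1) / 2) for k in range(1, n + 1))
    return log_pi_term + log_gamma_terms

# For n = 1 the product in (51) has a single factor and the power of pi is 0.
print(log_multivariate_gamma(1, 3.5), math.lgamma(3.5))
```

For $n=2$, the Legendre duplication formula gives $\Gamma_{2}(z)=\pi\,2^{2-2z}\,\Gamma(2z-1)$, which provides an independent check of the routine.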

The multivariate Gamma function reduces to the usual Gamma function for n=1n=1. The following lemma is taken from Lemma 5.1 and Proposition 5.1 in [jiang2015likelihood].

Lemma 4.

Let Γ(x)\Gamma(x) be the standard Gamma function and Γn(x)\Gamma_{n}(x) be the multivariate Gamma function as in (51). We have

  • Uniformly for all b[x/2,x/2]b\in[-x/2,x/2],

    log[Γ(x+b)Γ(x)]=(x+b)log(x+b)xlogxbb2x+O(b2+1x2)\log\left[\frac{\Gamma(x+b)}{\Gamma(x)}\right]=(x+b)\log(x+b)-x\log x-b-\frac{b}{2x}+O\left(\frac{b^{2}+1}{x^{2}}\right)

    as xx\to\infty.

  • Uniformly for all t[p/n,p/n]t\in[-p/n,\,p/n],

    log[Γn(p2+t)Γn(p2)]=αn,pt+βn,pt2+γn,p(t)+o(1)\log\left[\frac{\Gamma_{n}\!\left(\frac{p}{2}+t\right)}{\Gamma_{n}\!\left(\frac{p}{2}\right)}\right]=\alpha_{n,p}\,t+\beta_{n,p}\,t^{2}+\gamma_{n,p}(t)+o(1)

    as n,pn,p\to\infty with p/np/n\to\infty, where

    αn,p\displaystyle\alpha_{n,p} :=[2n+(pn12)log(1np)],\displaystyle:=-\left[2n+\left(p-n-\frac{1}{2}\right)\log\!\left(1-\frac{n}{p}\right)\right],
    βn,p\displaystyle\beta_{n,p} :=[np+log(1np)],\displaystyle:=-\left[\frac{n}{p}+\log\!\left(1-\frac{n}{p}\right)\right],
    γn,p(t)\displaystyle\gamma_{n,p}(t) :=n[(p2+t)log(p2+t)p2log(p2)].\displaystyle:=n\left[\left(\frac{p}{2}+t\right)\log\!\left(\frac{p}{2}+t\right)-\frac{p}{2}\log\!\left(\frac{p}{2}\right)\right].

Proof of Lemma 4. The first item follows from Lemma 5.1 in [jiang2015likelihood] and the second item follows from Proposition 5.1 in [jiang2015likelihood]. \square
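The first item of Lemma 4 is easy to probe numerically. The sketch below compares the exact value of $\log\left[\Gamma(x+b)/\Gamma(x)\right]$, computed with `math.lgamma`, against the approximation in the lemma and reports the error next to $(b^{2}+1)/x^{2}$. The helper names and the test values of $x$ and $b$ are illustrative assumptions; the implied constant in the $O(\cdot)$ term is not specified by the lemma.

```python
import math

def log_gamma_ratio_exact(x: float, b: float) -> float:
    """Exact log[Gamma(x + b) / Gamma(x)] via the log-gamma function."""
    return math.lgamma(x + b) - math.lgamma(x)

def log_gamma_ratio_approx(x: float, b: float) -> float:
    """Leading terms from the first item of Lemma 4 (without the O-term)."""
    return (x + b) * math.log(x + b) - x * math.log(x) - b - b / (2 * x)

x = 1.0e4  # hypothetical large argument
for b in (-5000.0, -50.0, 0.0, 50.0, 5000.0):
    err = abs(log_gamma_ratio_exact(x, b) - log_gamma_ratio_approx(x, b))
    print(b, err, (b ** 2 + 1) / x ** 2)
```

In this experiment the error stays below $(b^{2}+1)/x^{2}$ over the whole range $b\in[-x/2,x/2]$ that was tried, which is consistent with the uniformity claimed in the lemma.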

Proposition 6.

Recall FnF_{n} in (38). Suppose p/np/n\to\infty, n(1k/p)0n(1-k/p)\to 0, and n+1kpn+1\leq k\leq p. Then,

Fn(Δ)2Fn(Δ2)0F_{n}(\Delta)-2F_{n}\left(\frac{\Delta}{2}\right)\to 0

where Δ:=kp\Delta:=k-p.

Proof of Proposition 6. Write

Fn(Δ)2Fn(Δ2)\displaystyle F_{n}(\Delta)-2F_{n}\left(\frac{\Delta}{2}\right) =nlog[Γ(p2)Γ(p2+Δ)]+log[Γn(p2+Δ)Γn(p2)]\displaystyle=n\log\left[\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p}{2}+\Delta\right)}\right]+\log\left[\frac{\Gamma_{n}\left(\frac{p}{2}+\Delta\right)}{\Gamma_{n}\left(\frac{p}{2}\right)}\right]
2nlog[Γ(p2)Γ(p+Δ2)]2log[Γn(p+Δ2)Γn(p2)].\displaystyle-2n\log\left[\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p+\Delta}{2}\right)}\right]-2\log\left[\frac{\Gamma_{n}\left(\frac{p+\Delta}{2}\right)}{\Gamma_{n}\left(\frac{p}{2}\right)}\right].

Obviously, |Δ|=O(p/n)=o(p)|\Delta|=O(p/n)=o(p), so Lemma 4 applies and yields

nlog[Γ(p2)Γ(p2+Δ)]2nlog[Γ(p2)Γ(p+Δ2)]\displaystyle n\log\left[\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p}{2}+\Delta\right)}\right]-2n\log\left[\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p+\Delta}{2}\right)}\right]
=\displaystyle= n[(p2+Δ)log(p2+Δ)+p2log(p2)+Δ+Δp+O(Δ2+1p2)]\displaystyle n\left[-\left(\frac{p}{2}+\Delta\right)\log\left(\frac{p}{2}+\Delta\right)+\frac{p}{2}\log\left(\frac{p}{2}\right)+\Delta+\frac{\Delta}{p}+O\left(\frac{\Delta^{2}+1}{p^{2}}\right)\right]
\displaystyle- 2n[(p+Δ2)log(p+Δ2)+p2log(p2)+Δ2+Δ2p+O(Δ2+1p2)]\displaystyle 2n\left[-\left(\frac{p+\Delta}{2}\right)\log\left(\frac{p+\Delta}{2}\right)+\frac{p}{2}\log\left(\frac{p}{2}\right)+\frac{\Delta}{2}+\frac{\Delta}{2p}+O\left(\frac{\Delta^{2}+1}{p^{2}}\right)\right]
=\displaystyle= n[(p2+Δ)log(p2+Δ)2(p+Δ2)log(p+Δ2)]+O(n(Δ2+1)p2)\displaystyle-n\left[\left(\frac{p}{2}+\Delta\right)\log\left(\frac{p}{2}+\Delta\right)-2\left(\frac{p+\Delta}{2}\right)\log\left(\frac{p+\Delta}{2}\right)\right]+O\left(\frac{n(\Delta^{2}+1)}{p^{2}}\right)
\displaystyle- np2log(p2)\displaystyle\frac{np}{2}\log\left(\frac{p}{2}\right)
=\displaystyle= n[(p2+Δ)log(p2+Δ)2(p+Δ2)log(p+Δ2)]np2log(p2)+o(1),\displaystyle-n\left[\left(\frac{p}{2}+\Delta\right)\log\left(\frac{p}{2}+\Delta\right)-2\left(\frac{p+\Delta}{2}\right)\log\left(\frac{p+\Delta}{2}\right)\right]-\frac{np}{2}\log\left(\frac{p}{2}\right)+o(1),

where the last line follows from the fact that $n(\Delta^{2}+1)/p^{2}=O(1/n)+O(n/p^{2})=o(1)$.

Similarly, with αn,p,βn,p,γn,p(t)\alpha_{n,p},\beta_{n,p},\gamma_{n,p}(t) as in Lemma 4, we have

log[Γn(p2+Δ)Γn(p2)]2log[Γn(p+Δ2)Γn(p2)]\displaystyle\log\left[\frac{\Gamma_{n}\left(\frac{p}{2}+\Delta\right)}{\Gamma_{n}\left(\frac{p}{2}\right)}\right]-2\log\left[\frac{\Gamma_{n}\left(\frac{p+\Delta}{2}\right)}{\Gamma_{n}\left(\frac{p}{2}\right)}\right]
=\displaystyle= αn,pΔ+βn,pΔ2+γn,p(Δ)(αn,pΔ+βn,pΔ22+2γn,p(Δ/2))+o(1)\displaystyle\alpha_{n,p}\,\Delta+\beta_{n,p}\,\Delta^{2}+\gamma_{n,p}(\Delta)-\left(\alpha_{n,p}\,\Delta+\beta_{n,p}\,\frac{\Delta^{2}}{2}+2\gamma_{n,p}(\Delta/2)\right)+o(1)
=\displaystyle= βn,pΔ22+γn,p(Δ)2γn,p(Δ/2)+o(1)\displaystyle\beta_{n,p}\,\frac{\Delta^{2}}{2}+\gamma_{n,p}(\Delta)-2\gamma_{n,p}(\Delta/2)+o(1)
=\displaystyle= βn,pΔ22+n[(p2+Δ)log(p2+Δ)p2log(p2)]\displaystyle\beta_{n,p}\,\frac{\Delta^{2}}{2}+n\left[\left(\frac{p}{2}+\Delta\right)\log\!\left(\frac{p}{2}+\Delta\right)-\frac{p}{2}\log\!\left(\frac{p}{2}\right)\right]
\displaystyle- 2n[(p+Δ2)log(p+Δ2)p2log(p2)]+o(1)\displaystyle 2n\left[\left(\frac{p+\Delta}{2}\right)\log\!\left(\frac{p+\Delta}{2}\right)-\frac{p}{2}\log\!\left(\frac{p}{2}\right)\right]+o(1)
=\displaystyle= βn,pΔ22+n[(p2+Δ)log(p2+Δ)2(p+Δ2)log(p+Δ2)]+np2log(p2).\displaystyle\beta_{n,p}\,\frac{\Delta^{2}}{2}+n\left[\left(\frac{p}{2}+\Delta\right)\log\!\left(\frac{p}{2}+\Delta\right)-2\left(\frac{p+\Delta}{2}\right)\log\!\left(\frac{p+\Delta}{2}\right)\right]+\frac{np}{2}\log\left(\frac{p}{2}\right).

Thus,

\[
\begin{aligned}
F_{n}(\Delta)-2F_{n}\left(\frac{\Delta}{2}\right)=\beta_{n,p}\,\frac{\Delta^{2}}{2}+o(1)
&=-\left[\frac{n}{p}+\log\!\left(1-\frac{n}{p}\right)\right]\frac{\Delta^{2}}{2}+o(1)\\
&=\frac{n^{2}\Delta^{2}}{4p^{2}}(1+o(1))+o(1),
\end{aligned}
\]

which tends to 0 since nΔ/p=n(1k/p)0n\Delta/p=-n(1-k/p)\to 0. The proof is completed. \square

A.6 Kolmogorov distance asymptotic results

Let us start with a simple observation.

Lemma 5.

Suppose $\mathbb{P}_{n}$ and $\mathbb{Q}_{n}$ are two sequences of probability measures such that the likelihood ratio $L_{n}:=d\mathbb{Q}_{n}/d\mathbb{P}_{n}$ exists. Let $\left\{X_{n};n\geq 1\right\}$ be a sequence of random variables. If

supn1{𝔼n(Ln2)+𝔼n(Xn4)}\displaystyle\sup_{n\geq 1}\left\{\mathbb{E}_{\mathbb{P}_{n}}\left(L_{n}^{2}\right)+\mathbb{E}_{\mathbb{P}_{n}}\left(X_{n}^{4}\right)\right\} <,\displaystyle<\infty,

then

supn1{𝔼n(Xn2)}<and|𝔼n(Xn)𝔼n(Xn)|𝔼n(Xn2)Varn(Ln).\sup_{n\geq 1}\left\{\mathbb{E}_{\mathbb{Q}_{n}}\left(X_{n}^{2}\right)\right\}<\infty\ \text{and}\ \left|\mathbb{E}_{\mathbb{P}_{n}}\left(X_{n}\right)-\mathbb{E}_{\mathbb{Q}_{n}}\left(X_{n}\right)\right|\leq\sqrt{\mathbb{E}_{\mathbb{P}_{n}}\left(X_{n}^{2}\right)\cdot\mbox{Var}_{\mathbb{P}_{n}}\left(L_{n}\right)}.

Proof of Lemma 5. Observe that

𝔼nXn2\displaystyle\mathbb{E}_{\mathbb{Q}_{n}}X_{n}^{2} =𝔼n(Xn2Ln)𝔼n(Xn4)𝔼n(Ln2)𝔼n(Xn4)+𝔼n(Ln2)2.\displaystyle=\mathbb{E}_{\mathbb{P}_{n}}\left(X_{n}^{2}L_{n}\right)\leq\sqrt{\mathbb{E}_{\mathbb{P}_{n}}\left(X_{n}^{4}\right)\cdot\mathbb{E}_{\mathbb{P}_{n}}\left(L_{n}^{2}\right)}\leq\frac{\mathbb{E}_{\mathbb{P}_{n}}\left(X_{n}^{4}\right)+\mathbb{E}_{\mathbb{P}_{n}}\left(L_{n}^{2}\right)}{2}.

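For the second inequality, note that $\mathbb{E}_{\mathbb{P}_{n}}\left(L_{n}\right)=1$, so that $\mathbb{E}_{\mathbb{Q}_{n}}\left(X_{n}\right)-\mathbb{E}_{\mathbb{P}_{n}}\left(X_{n}\right)=\mathbb{E}_{\mathbb{P}_{n}}\left[X_{n}\left(L_{n}-1\right)\right]$. The Cauchy–Schwarz inequality then gives the claim, since $\mathbb{E}_{\mathbb{P}_{n}}\left(L_{n}-1\right)^{2}=\mbox{Var}_{\mathbb{P}_{n}}\left(L_{n}\right)$. The proof is completed. $\square$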
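The second bound in Lemma 5 can be checked by hand on a finite sample space. The following Python sketch computes both sides of the inequality for discrete measures given as weight lists; the function name `mean_shift_and_bound` and the toy weights are hypothetical and serve only as an illustration.

```python
import math

def mean_shift_and_bound(p_weights, q_weights, x_values):
    """Return (|E_P X - E_Q X|, sqrt(E_P X^2 * Var_P(L))) with L = dQ/dP,
    for measures P and Q on a common finite set (P must charge every point)."""
    ratio = [q / p for p, q in zip(p_weights, q_weights)]  # likelihood ratio L
    mean_p = sum(p * x for p, x in zip(p_weights, x_values))
    mean_q = sum(q * x for q, x in zip(q_weights, x_values))
    second_moment_p = sum(p * x * x for p, x in zip(p_weights, x_values))
    # E_P(L) = 1, hence Var_P(L) = E_P (L - 1)^2
    var_ratio = sum(p * (l - 1.0) ** 2 for p, l in zip(p_weights, ratio))
    return abs(mean_p - mean_q), math.sqrt(second_moment_p * var_ratio)

# Toy example on a three-point space (values chosen arbitrarily).
gap, bound = mean_shift_and_bound([0.2, 0.3, 0.5], [0.25, 0.25, 0.5], [-1.0, 2.0, 0.5])
print(gap, bound)
```

The first returned value never exceeds the second, which is exactly the Cauchy–Schwarz step behind the lemma.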

Based on Lemma 5 and Proposition 5, we get the following asymptotic expansion.

Proposition 7.

Let $\kappa_{n}=\tau p_{n}^{3/4}/\sqrt{n}$ with $\tau\in(0,\infty)$. Suppose $\bm{X},\bm{Y}$ are two i.i.d. random points on $\mathbb{S}^{p-1}$. Then, for any $u\in\mathbb{R}$, we have

n[μn(p𝑿𝒀u)μ0(p𝑿𝒀u)]τ22πexp(u2/2)\displaystyle n\left[\mathbb{P}_{\mu_{n}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)-\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)\right]\to\frac{-\tau^{2}}{\sqrt{2\pi}}\exp\left(-u^{2}/2\right)

as $\min\left\{n,p\right\}\to\infty$, where $\mu_{n}$ is an FvML distribution on $\mathbb{S}^{p_{n}-1}$ with concentration parameter $\kappa_{n}$. Moreover, the convergence above is uniform in $u$, which also means $n\,d\left(\mu_{n},\mu_{0}\right)\to\tau^{2}/\sqrt{2\pi}$, where $d$ is the metric in (13).

We did not specify the location parameter of the FvML distributions in the statement of Proposition 7 since the distribution of the inner product does not depend on the choice of location. The proof of Proposition 7 is based on an application of Le Cam’s third lemma and the high-dimensional LAN result in [Cutting-P-V].

The interesting feature of this approach is that no growth condition on $n$ and $p$ is assumed. We do not know whether direct analyses based on the Edgeworth expansion or spherical harmonics can yield the same result. Proposition 7 holds without any growth condition on $n$ and $p$ because of some special properties of the Bessel functions of the first kind, which were exploited in [Cutting-P-V] and in Proposition 5 above.

Proof of Proposition 7. Fix uu\in\mathbb{R} and consider the sequence of random variables

An(u)=An:=2n(n1)1i<jn[𝟏{p𝑿i𝑿ju}μ0(p𝑿1𝑿2u)].A_{n}(u)=A_{n}:=\sqrt{\frac{2}{n(n-1)}}\sum_{1\leq i<j\leq n}\left[\mathbf{1}_{\left\{\sqrt{p}\bm{X}_{i}^{\top}\bm{X}_{j}\leq u\right\}}-\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}_{1}^{\top}\bm{X}_{2}\leq u\right)\right].

Recall LnL_{n} in (50). It was shown in [Cutting-P-V] that under uniformity, we have the LAN expansion

log(Ln)=τ22Rnτ44+o(1)\displaystyle\log\left(L_{n}\right)=\frac{\tau^{2}}{\sqrt{2}}R_{n}-\frac{\tau^{4}}{4}+o_{\mathbb{P}}(1) (52)

where RnR_{n} is the Rayleigh test in (2).

Recall that $\Phi$ is the CDF of a standard normal. By Proposition 4 and the Cramér–Wold device, we have

(An,Rn)dN((0,0),(Φ(u)(1Φ(u))g(u)g(u)1))\left(A_{n},R_{n}\right)\stackrel{{\scriptstyle d}}{{\to}}N\left(\left(0,0\right)^{\top},\left(\begin{matrix}\Phi(u)\left(1-\Phi(u)\right)&g(u)\\ g(u)&1\end{matrix}\right)\right)

under uniformity, where

g(u):\displaystyle\quad g(u): =limn𝔼μ0[p𝑿1𝑿2𝟏{p𝑿1𝑿2u}]=𝔼μ0(Z𝟏{Zu})=exp(u2/2)2π\displaystyle=\lim_{n\to\infty}\mathbb{E}_{\mu_{0}}\left[\sqrt{p}\bm{X}_{1}^{\top}\bm{X}_{2}\cdot\mathbf{1}_{\left\{\sqrt{p}\bm{X}_{1}^{\top}\bm{X}_{2}\leq u\right\}}\right]=\mathbb{E}_{\mu_{0}}\left(Z\cdot\mathbf{1}_{\left\{Z\leq u\right\}}\right)=\frac{-\exp\left(-u^{2}/2\right)}{\sqrt{2\pi}}

with $Z$ being a standard normal in the expression above. In other words, the function $g$ is the negative of the standard Gaussian density.
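The closed form of $g$ follows from $\int_{-\infty}^{u}z\phi(z)\,dz=-\phi(u)$, and it can be confirmed numerically. The sketch below integrates $z\phi(z)$ by a midpoint rule; the lower truncation point $-12$, the grid size and the function names are assumptions made for the illustration.

```python
import math

def phi(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def truncated_mean(u: float, lower: float = -12.0, grid: int = 100000) -> float:
    """Midpoint-rule approximation of E[Z * 1{Z <= u}] for standard normal Z.
    The mass below `lower` is negligible and is ignored."""
    h = (u - lower) / grid
    total = 0.0
    for i in range(grid):
        z = lower + (i + 0.5) * h
        total += z * phi(z)
    return total * h

for u in (-1.0, 0.0, 1.5):
    print(u, truncated_mean(u), -phi(u))
```

The two printed columns should agree up to the quadrature error.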

The convergence in expectation in the display above follows from the fact that the normalized inner products are asymptotically normal and have uniformly bounded fourth moments. Thus,

\[
\left(A_{n},\log(L_{n})\right)\stackrel{d}{\to}N\left(\left(0,\frac{-\tau^{4}}{4}\right)^{\top},\begin{pmatrix}\Phi(u)\left(1-\Phi(u)\right)&\frac{\tau^{2}}{\sqrt{2}}g(u)\\ \frac{\tau^{2}}{\sqrt{2}}g(u)&\frac{\tau^{4}}{2}\end{pmatrix}\right)
\]

under uniformity. By Le Cam’s third lemma and (52), we obtain that under $\mu_{n}$,

AndN(τ2g(u)2,Φ(u)(1Φ(u))).A_{n}\stackrel{{\scriptstyle d}}{{\to}}N\left(\frac{\tau^{2}g(u)}{\sqrt{2}},\Phi(u)\left(1-\Phi(u)\right)\right).

Since $A_{n}$ is a sum of $\Theta(n^{2})$ pairwise independent, mean-zero, bounded random variables under uniformity, rescaled by $\Theta(n)$, its fourth moment is uniformly bounded by a universal constant. Combining this fact with the expansion (52), Lemma 5 and Proposition 5, we obtain the mean convergence

τ2g(u)2\displaystyle\frac{\tau^{2}g(u)}{\sqrt{2}} =limn𝔼μn(An)\displaystyle=\lim_{n\to\infty}\mathbb{E}_{\mu_{n}}\left(A_{n}\right)
=limn{n(1+o(1))2[μn(p𝑿𝒀u)0(p𝑿𝒀u)]}.\displaystyle=\lim_{n\to\infty}\left\{\frac{n(1+o(1))}{\sqrt{2}}\cdot\left[\mathbb{P}_{\mu_{n}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)-\mathbb{P}_{0}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)\right]\right\}.

This completes the proof of the first claim.

We now prove the uniform convergence. Observe that for a fixed M>0M>0

supu|n(n1)2[μn(p𝑿𝒀u)μ0(p𝑿𝒀u)]+τ22πexp(u2/2)|\displaystyle\sup_{u\in\mathbb{R}}\left|\sqrt{\frac{n(n-1)}{2}}\Big[\mathbb{P}_{\mu_{n}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)-\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)\Big]+\frac{\tau^{2}}{\sqrt{2\pi}}\exp\left(-u^{2}/2\right)\right|
=\displaystyle= supu|𝔼μn(An(u))+τ22πexp(u2/2)|\displaystyle\sup_{u\in\mathbb{R}}\left|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)+\frac{\tau^{2}}{\sqrt{2\pi}}\exp\left(-u^{2}/2\right)\right|
\displaystyle\leq supu[M,M]|𝔼μn(An(u))+τ22πexp(u2/2)|+sup|u|>M|𝔼μn(An(u))+τ22πexp(u2/2)|.\displaystyle\sup_{u\in[-M,M]}\left|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)+\frac{\tau^{2}}{\sqrt{2\pi}}\exp\left(-u^{2}/2\right)\right|+\sup_{|u|>M}\left|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)+\frac{\tau^{2}}{\sqrt{2\pi}}\exp\left(-u^{2}/2\right)\right|.

It is a standard fact that if a sequence of equicontinuous functions converges pointwise on a compact set, then the convergence is uniform (this is a corollary of the Arzelà–Ascoli theorem, see Theorem 4.43 in [folland1999real]). To use this fact, let us check that the functions

u𝔼μn(An(u))+τ22πexp(u2/2)\displaystyle u\mapsto\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)+\frac{\tau^{2}}{\sqrt{2\pi}}\exp\left(-u^{2}/2\right) (53)

are equicontinuous.

The second term is obviously smooth and has bounded derivatives, so we only have to treat the first term. By the second estimate in Lemma 5, for all u<v[M,M]u<v\in[-M,M], we have

|[𝔼μn(An(u))𝔼μn(An(v))]|\displaystyle\left|\Big[\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)-\mathbb{E}_{\mu_{n}}\left(A_{n}(v)\right)\Big]\right|
=\displaystyle= |[𝔼μn(An(u))𝔼μn(An(v))][𝔼μ0(An(u))𝔼μ0(An(v))]=0|\displaystyle\left|\Big[\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)-\mathbb{E}_{\mu_{n}}\left(A_{n}(v)\right)\Big]-\underbrace{\Big[\mathbb{E}_{\mu_{0}}\left(A_{n}(u)\right)-\mathbb{E}_{\mu_{0}}\left(A_{n}(v)\right)\Big]}_{=0}\right|
\displaystyle\leq 𝔼μ0[An(u)An(v)]2Varμ0(Ln)\displaystyle\sqrt{\mathbb{E}_{\mu_{0}}\left[A_{n}(u)-A_{n}(v)\right]^{2}\cdot\mbox{Var}_{\mathbb{P}_{\mu_{0}}}\left(L_{n}\right)}
\displaystyle\lesssim 𝔼μ0[An(u)An(v)]2μ0(up𝑿𝒀v).\displaystyle\sqrt{\mathbb{E}_{\mu_{0}}\left[A_{n}(u)-A_{n}(v)\right]^{2}}\leq\sqrt{\mathbb{P}_{\mu_{0}}\left(u\leq\sqrt{p}\bm{X}^{\top}\bm{Y}\leq v\right)}.

Since the densities of $\sqrt{p}\bm{X}^{\top}\bm{Y}$ under $\mu_{0}$ (given by (55) below) are uniformly bounded for large $p$ (the upper bound can be taken as $1/\sqrt{2\pi}$ for $p\geq 3$), the equicontinuity follows. In fact, the argument above also gives Hölder continuity of the functions in (53) with exponent $1/2$ and with a common Hölder constant.

Now, since the functions in (53) are equicontinuous and converge pointwise to $0$ on $[-M,M]$, the convergence is uniform there. Thus, we have

lim supn{supu|𝔼μn(An(u))+τ22πexp(u2/2)|}\displaystyle\limsup_{n\to\infty}\left\{\sup_{u\in\mathbb{R}}\left|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)+\frac{\tau^{2}}{\sqrt{2\pi}}\exp\left(-u^{2}/2\right)\right|\right\}
\displaystyle\leq lim supn{sup|u|>M|𝔼μn(An(u))+τ22πexp(u2/2)|}\displaystyle\limsup_{n\to\infty}\left\{\sup_{|u|>M}\left|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)+\frac{\tau^{2}}{\sqrt{2\pi}}\exp\left(-u^{2}/2\right)\right|\right\}
\displaystyle\leq lim supn{sup|u|>M|𝔼μn(An(u))|}+τ22πexp(M2/2)\displaystyle\limsup_{n\to\infty}\left\{\sup_{|u|>M}\Big|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)\Big|\right\}+\frac{\tau^{2}}{\sqrt{2\pi}}\exp\left(-M^{2}/2\right)

for all M>0M>0.

It suffices to show that the first term in the last display goes to $0$ as $M\to\infty$. To see this, use the second estimate in Lemma 5 again to get

|𝔼μn(An(u))𝔼μ0(An(u))=0|𝔼μ0(An2(u))Varμ0(Ln).\displaystyle\left|\mathbb{E}_{\mathbb{P}_{\mu_{n}}}\left(A_{n}(u)\right)-\underbrace{\mathbb{E}_{\mathbb{P}_{\mu_{0}}}\left(A_{n}(u)\right)}_{=0}\right|\leq\sqrt{\mathbb{E}_{\mathbb{P}_{\mu_{0}}}\left(A_{n}^{2}(u)\right)\cdot\mbox{Var}_{\mathbb{P}_{\mu_{0}}}\left(L_{n}\right)}.

Consequently,

sup|u|>M|𝔼μn(An(u))|\displaystyle\sup_{|u|>M}\Big|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)\Big| sup|u|>M{𝔼μ0(An2(u))}\displaystyle\lesssim\sup_{|u|>M}\left\{\sqrt{\mathbb{E}_{\mathbb{P}_{\mu_{0}}}\left(A_{n}^{2}(u)\right)}\right\}
sup|u|>M{an,u(1an,u)}\displaystyle\lesssim\sup_{|u|>M}\left\{\sqrt{a_{n,u}\left(1-a_{n,u}\right)}\right\}

where an,u:=μ0(p𝑿𝒀u)a_{n,u}:=\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right). By (54) below, we have

sup|u|>M{an,u(1an,u)}=Φ(M)(1Φ(M))+O(1/p).\sup_{|u|>M}\left\{a_{n,u}\left(1-a_{n,u}\right)\right\}=\Phi(M)\left(1-\Phi(M)\right)+O(1/p).

Therefore,

lim supn{sup|u|>M|𝔼μn(An(u))|}Φ(M)(1Φ(M)).\limsup_{n\to\infty}\left\{\sup_{|u|>M}\Big|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)\Big|\right\}\lesssim\sqrt{\Phi(M)\left(1-\Phi(M)\right)}.

The last term goes to $0$ as $M\to\infty$. The proof is completed. $\square$

The next result gives the expansion in terms of the distance $d$ for the low-rank model (17). Recall that $\mu_{0}:=\mbox{Unif}\left(\mathbb{S}^{p-1}\right)$ and $\mu_{k}:=\mbox{Unif}\left(\mathbb{S}^{k-1}\right)$ are the uniform distributions on the unit spheres $\mathbb{S}^{p-1}$ and $\mathbb{S}^{k-1}$, respectively.

Proposition 8.

Suppose $k\leq p$, $p/n\to\infty$, and $\left(1-k/p\right)n\to\tau\in(0,\infty)$. Let $\bm{X},\bm{Y}$ be i.i.d. samples from either $\mu_{0}$ or $\mu_{k}$. Then, we have

n[μk(p𝑿𝒀u)μ0(p𝑿𝒀u)]τ2uexp(u2/2)2πn\cdot\left[\mathbb{P}_{\mu_{k}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)-\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)\right]\to\frac{-\tau}{2}\cdot\frac{u\cdot\exp\left(-u^{2}/2\right)}{\sqrt{2\pi}}

as nn\to\infty, for all uu\in\mathbb{R}. Moreover, the convergence above is uniform in uu, which also means

\[
n\,d\left(\mu_{k},\mu_{0}\right)\to\frac{\tau}{2}\cdot\sup_{u\in\mathbb{R}}\left|\frac{u\cdot\exp\left(-u^{2}/2\right)}{\sqrt{2\pi}}\right|.
\]

Proof of Proposition 8. We will first show that

supu|μ0(p𝑿𝒀u)Φ(u)|=O(p1)\displaystyle\sup_{u\in\mathbb{R}}\left|\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)-\Phi(u)\right|=O\left(p^{-1}\right) (54)

as pp\to\infty.

The rate p1p^{-1} is sharper than a direct application of the Berry-Esseen bound and requires a more careful analysis. We will prove a stronger result that the L1L_{1}-distance between the densities of p𝑿𝒀\sqrt{p}\bm{X}^{\top}\bm{Y} and a standard normal is of order 1/p1/p. To see this, note that by (5), the density of p𝑿𝒀\sqrt{p}\bm{X}^{\top}\bm{Y} has the form

fp(x):=1πΓ(p2)pΓ(p12)(1x2p)p32𝟏[p,p](x).\displaystyle f_{p}(x):=\frac{1}{\sqrt{\pi}}\frac{\Gamma\left(\frac{p}{2}\right)}{\sqrt{p}\cdot\Gamma\left(\frac{p-1}{2}\right)}\left(1-\frac{x^{2}}{p}\right)^{\frac{p-3}{2}}\mathbf{1}_{\left[-\sqrt{p},\sqrt{p}\right]}(x). (55)

A direct calculation yields

\[
\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p-1}{2}\right)}=\sqrt{\frac{p}{2}}\left(1-\frac{3}{4p}+O(p^{-2})\right).
\]

Also, for x[p/2,p/2]x\in[-\sqrt{p/2},\sqrt{p/2}], we have

\[
\begin{aligned}
\left(1-\frac{x^{2}}{p}\right)^{\frac{p-3}{2}}
&=\exp\left[\frac{p-3}{2}\log\left(1-\frac{x^{2}}{p}\right)\right]\\
&=\exp\left[\frac{p-3}{2}\left(-\frac{x^{2}}{p}-\frac{x^{4}}{2p^{2}}+O\left(\frac{x^{6}}{p^{3}}\right)\right)\right]\\
&=\exp\left[-\frac{x^{2}}{2}+\frac{3x^{2}}{2p}-\frac{x^{4}}{4p}+O\left(\frac{x^{4}+x^{6}}{p^{2}}\right)\right]
\end{aligned}
\]

uniformly.

Thus,

\[
\begin{aligned}
\int_{\mathbb{R}}\left|f_{p}(x)-\phi(x)\right|dx
&\leq\int_{-\sqrt{p/2}}^{\sqrt{p/2}}\left|\frac{\exp(-x^{2}/2)}{\sqrt{2\pi}}\left[1-\exp\left(\frac{3x^{2}}{2p}-\frac{x^{4}}{4p}+O\left(\frac{x^{4}+x^{6}}{p^{2}}\right)\right)\right]\right|dx+O\left(1/p\right)\\
&=O(1/p)\cdot\int_{\mathbb{R}}\frac{(x^{2}+x^{4})\exp\left(-x^{2}/2\right)}{\sqrt{2\pi}}dx+O(1/p)=O(1/p),
\end{aligned}
\]

where the equality in the last line follows from the fact that $(x^{4}+x^{6})/p^{2}=O\left((x^{2}+x^{4})/p\right)$ on the interval $[-\sqrt{p/2},\sqrt{p/2}]$ and that the exponent inside the bracket is bounded above there. Consequently, (54) follows.
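The $1/p$ rate for the $L_{1}$-distance between $f_{p}$ in (55) and the standard normal density can be observed numerically. The Python sketch below evaluates $\int_{\mathbb{R}}|f_{p}-\phi|$ with a midpoint rule on $[-\sqrt{p},\sqrt{p}]$ plus the exact Gaussian mass outside that interval, and prints $p$ times the distance. The function name, the grid size and the values of $p$ are illustrative assumptions.

```python
import math

def l1_distance_to_normal(p: int, grid: int = 40000) -> float:
    """Approximate the L1 distance between the density f_p in (55) and the
    standard normal density (intended for p >= 4)."""
    log_const = (math.lgamma(p / 2) - math.lgamma((p - 1) / 2)
                 - 0.5 * math.log(math.pi * p))
    r = math.sqrt(p)
    h = 2.0 * r / grid
    total = 0.0
    for i in range(grid):
        x = -r + (i + 0.5) * h  # midpoints stay strictly inside (-sqrt(p), sqrt(p))
        f = math.exp(log_const + 0.5 * (p - 3) * math.log1p(-x * x / p))
        normal = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += abs(f - normal)
    # f_p vanishes outside [-sqrt(p), sqrt(p)]; add the normal mass out there.
    return total * h + math.erfc(r / math.sqrt(2.0))

for p in (50, 200, 800):
    print(p, p * l1_distance_to_normal(p))
```

If the distance is of exact order $1/p$, the printed products should remain bounded and stabilize as $p$ grows. Since the Kolmogorov distance is dominated by the $L_{1}$-distance between densities, the same experiment also illustrates (54).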

To finish the proof, notice that

n[μk(p𝑿𝒀u)μ0(p𝑿𝒀u)]\displaystyle n\cdot\left[\mathbb{P}_{\mu_{k}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)-\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)\right] =n[Φ(ukp)Φ(u)]+O(n/p)\displaystyle=n\left[\Phi\left(u\sqrt{\frac{k}{p}}\right)-\Phi\left(u\right)\right]+O\left(n/p\right)
=nu(1k/p)ϕ(u)(1+o(1))\displaystyle=-nu\left(1-\sqrt{k/p}\right)\phi(u)(1+o(1))
+O(n/p)\displaystyle+O(n/p)
τ2uϕ(u)\displaystyle\to-\frac{\tau}{2}u\phi(u)

for all uu\in\mathbb{R}. The convergence above is uniform whenever p/np/n\to\infty. The proof is completed. \square
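The Gaussian leading term in the last display can be checked directly. The sketch below evaluates $n\left[\Phi\left(u\sqrt{k/p}\right)-\Phi(u)\right]$ for large $n$ and $p$ with $k=p(1-\tau/n)$ and compares it with $-\frac{\tau}{2}u\phi(u)$. It does not examine the $O(n/p)$ remainder, and the particular values of $n$, $p$, $\tau$ and the helper names are hypothetical.

```python
import math

def Phi(u: float) -> float:
    """Standard normal CDF."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))

def phi(u: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def scaled_gap(n: int, p: int, k: int, u: float) -> float:
    """n * [Phi(u * sqrt(k / p)) - Phi(u)], the leading term in the display."""
    return n * (Phi(u * math.sqrt(k / p)) - Phi(u))

n, p, tau = 10 ** 4, 10 ** 8, 2.0  # illustrative values with p / n large
k = round(p * (1.0 - tau / n))     # so that n * (1 - k / p) equals tau
for u in (-2.0, -0.5, 1.0, 2.5):
    print(u, scaled_gap(n, p, k, u), -0.5 * tau * u * phi(u))
```

The two columns agree to several digits, reflecting the first-order Taylor expansion of $\Phi$ around $u$ together with $1-\sqrt{k/p}\approx(1-k/p)/2$.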

A.7 Other technical results

Let us start with a concentration inequality for degenerate U-processes, taken from [cattaneo2024uniform]. Let (S,𝒮)(S,\mathcal{S}) be a measurable space and let X1,,XnX_{1},\dots,X_{n} be i.i.d. SS-valued random variables with common law PP. Let \mathcal{F} be a pointwise measurable class of measurable functions f:S×Sf:S\times S\to\mathbb{R}.

Define the (canonical) degenerate UU-process of order two by

Un(f):=2n(n1)i<j{f(Xi,Xj)𝔼[f(Xi,Xj)Xi]𝔼[f(Xi,Xj)Xj]+𝔼[f(Xi,Xj)]}U_{n}(f):=\frac{2}{n(n-1)}\sum_{i<j}\Big\{f(X_{i},X_{j})-\mathbb{E}\big[f(X_{i},X_{j})\mid X_{i}\big]-\mathbb{E}\big[f(X_{i},X_{j})\mid X_{j}\big]+\mathbb{E}\big[f(X_{i},X_{j})\big]\Big\}

for ff\in\mathcal{F}. Assume that

  1. 1.

    Each ff\in\mathcal{F} is symmetric, i.e. f(s1,s2)=f(s2,s1)f(s_{1},s_{2})=f(s_{2},s_{1}) for all s1,s2Ss_{1},s_{2}\in S.

  2. 2.

    There exists a measurable envelope F:S×SF:S\times S\to\mathbb{R} such that |f(s1,s2)|F(s1,s2)|f(s_{1},s_{2})|\leq F(s_{1},s_{2}) for all ff\in\mathcal{F} and all s1,s2Ss_{1},s_{2}\in S.

  3. 3.

    For any probability measure QQ on (S×S,𝒮𝒮)(S\times S,\mathcal{S}\otimes\mathcal{S}) and q1q\geq 1, let

    fQ,q:=(𝔼Q[|f|q])1/q.\|f\|_{Q,q}:=\big(\mathbb{E}_{Q}[|f|^{q}]\big)^{1/q}.

    Suppose that the class $\mathcal{F}$ is VC-type with envelope $F$, in the sense that there exist constants $C_{1}\geq e$ and $C_{2}\geq 1$ such that, for all $\varepsilon\in(0,1]$,

    supQN(,Q,2,εFQ,2)(C1ε)C2\sup_{Q}N\Big(\mathcal{F},\|\cdot\|_{Q,2},\varepsilon\|F\|_{Q,2}\Big)\;\leq\;\Big(\frac{C_{1}}{\varepsilon}\Big)^{C_{2}}

    where the supremum is taken over all finite discrete probability measures QQ on S×SS\times S.

Under the three conditions above, we have

Lemma 6 (Lemma SA37 from [cattaneo2024uniform]).

Let σ>0\sigma>0 be any deterministic quantity satisfying

supffP,2σFP,2,\sup_{f\in\mathcal{F}}\|f\|_{P,2}\;\leq\;\sigma\;\leq\;\|F\|_{P,2},

and define the random variable

M:=max1i,jn|F(Xi,Xj)|.M:=\max_{1\leq i,j\leq n}|F(X_{i},X_{j})|.

Then there exists a universal constant C3>0C_{3}>0 such that

\[
n\,\mathbb{E}\Bigg[\sup_{f\in\mathcal{F}}|U_{n}(f)|\Bigg]\;\leq\;C_{3}\sigma\Bigg(C_{2}\log\frac{C_{1}\|F\|_{P,2}}{\sigma}\Bigg)\;+\;\frac{C_{3}\|M\|_{P,2}}{\sqrt{n}}\Bigg(C_{2}\log\frac{C_{1}\|F\|_{P,2}}{\sigma}\Bigg)^{2}.
\]

Proof of Lemma 6. See [cattaneo2024uniform]. \square

Lemma 7.

Suppose 𝐗,𝐘\bm{X},\bm{Y} are matrices of size p×np\times n such that 𝐗𝐗=𝐘𝐘\bm{X}^{\top}\bm{X}=\bm{Y}^{\top}\bm{Y}. Then, there exists an orthogonal matrix 𝐐\bm{Q} of size p×pp\times p such that 𝐗=𝐐𝐘\bm{X}=\bm{Q}\bm{Y}.

Proof of Lemma 7. We use the notation $\mbox{Col}\left(\bm{A}\right)$ to denote the column space of a matrix $\bm{A}$. Since $\bm{X}$ and $\bm{X}^{\top}\bm{X}$ have the same kernel (and likewise for $\bm{Y}$), the assumption $\bm{X}^{\top}\bm{X}=\bm{Y}^{\top}\bm{Y}$ implies that $\mbox{Ker}\left(\bm{X}\right)=\mbox{Ker}\left(\bm{Y}\right)$. Therefore, the map

\begin{align*}
T:\mbox{Col}\left(\bm{X}\right)\subset\mathbb{R}^{p} &\to\mbox{Col}\left(\bm{Y}\right)\subset\mathbb{R}^{p}\\
\bm{X}\bm{u} &\mapsto\bm{Y}\bm{u},\qquad\bm{u}\in\mathbb{R}^{n},
\end{align*}

is well-defined: if $\bm{X}\bm{u}=\bm{X}\bm{v}$, then $\bm{u}-\bm{v}\in\mbox{Ker}\left(\bm{X}\right)=\mbox{Ker}\left(\bm{Y}\right)$, so $\bm{Y}\bm{u}=\bm{Y}\bm{v}$. Note that $\mbox{dim}\left(\mbox{Col}\left(\bm{X}\right)\right)=\mbox{dim}\left(\mbox{Col}\left(\bm{Y}\right)\right)$ by the rank-nullity theorem, since the two kernels are identical.

Moreover, TT is an isometry because

T(𝑿𝒖),T(𝑿𝒗)=𝒀𝒖,𝒀𝒗=𝒖𝒀𝒀𝒗=𝒖𝑿𝑿𝒗=𝑿𝒖,𝑿𝒗.\langle T\left(\bm{X}\bm{u}\right),T\left(\bm{X}\bm{v}\right)\rangle=\langle\bm{Y}\bm{u},\bm{Y}\bm{v}\rangle=\bm{u}^{\top}\bm{Y}^{\top}\bm{Y}\bm{v}=\bm{u}^{\top}\bm{X}^{\top}\bm{X}\bm{v}=\langle\bm{X}\bm{u},\bm{X}\bm{v}\rangle.

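Since $\mbox{dim}\left(\mbox{Col}\left(\bm{X}\right)\right)=\mbox{dim}\left(\mbox{Col}\left(\bm{Y}\right)\right)$, the orthogonal complements of $\mbox{Col}\left(\bm{X}\right)$ and $\mbox{Col}\left(\bm{Y}\right)$ in $\mathbb{R}^{p}$ also have equal dimension, so $T$ extends to a linear isometry of $\mathbb{R}^{p}$. Since linear isometries of $\mathbb{R}^{p}$ are given by orthogonal matrices, we have

\[
\tilde{\bm{Q}}\bm{X}\bm{u}=\bm{Y}\bm{u}
\]

for some orthogonal matrix $\tilde{\bm{Q}}$ of size $p\times p$ and all $\bm{u}\in\mathbb{R}^{n}$. Hence $\tilde{\bm{Q}}\bm{X}=\bm{Y}$, and the claim follows with $\bm{Q}=\tilde{\bm{Q}}^{\top}$, which is again orthogonal. The proof is completed. \square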
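As an illustration of Lemma 7, the sketch below builds an orthogonal matrix numerically. It is not the construction used in the proof: it swaps in the orthogonal Procrustes solution (the SVD of $\bm{X}\bm{Y}^{\top}$), which attains zero residual whenever the two Gram matrices coincide. The helper name `orthogonal_map` and the test dimensions are hypothetical.

```python
import numpy as np

def orthogonal_map(X, Y):
    """Return an orthogonal Q with X = Q @ Y, assuming X.T @ X equals Y.T @ Y.

    Uses the orthogonal Procrustes solution Q = U V^T, where X Y^T = U S V^T.
    """
    U, _, Vt = np.linalg.svd(X @ Y.T)
    return U @ Vt

rng = np.random.default_rng(0)
p, n = 6, 4  # n < p, so the column spaces are proper subspaces of R^p

Y = rng.standard_normal((p, n))
Q0, _ = np.linalg.qr(rng.standard_normal((p, p)))  # a random orthogonal matrix
X = Q0 @ Y                                         # guarantees X^T X = Y^T Y

Q = orthogonal_map(X, Y)
print("Gram matrices agree:", np.allclose(X.T @ X, Y.T @ Y))
print("Q is orthogonal:   ", np.allclose(Q.T @ Q, np.eye(p)))
print("X equals Q Y:      ", np.allclose(X, Q @ Y))
```

Because $n<p$ here, the recovered `Q` need not equal `Q0`; the extension of the isometry to the orthogonal complement is not unique, which matches the existence-only statement of the lemma.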

Lemma 8.

Let 𝐗\bm{X}, 𝐘\bm{Y} and 𝐙\bm{Z} be i.i.d realizations of the uniform distribution on 𝕊p1\mathbb{S}^{p-1} and f:f:\mathbb{R}\mapsto\mathbb{R} be a bounded measurable function. Then, we have

  • 𝔼(f(𝑿T𝒀)|𝒀)=𝔼f(𝑿T𝒀)\mathbb{E}\left(f(\bm{X}^{T}\bm{Y})\Big|\bm{Y}\right)=\mathbb{E}f(\bm{X}^{T}\bm{Y}) almost surely.

  • 𝑿T𝒀\bm{X}^{T}\bm{Y} and 𝑿T𝒁\bm{X}^{T}\bm{Z} are independent.

Proof of Lemma 8. The first claim is a consequence of rotational invariance. Conditioning on $\bm{Y}$, there exists an orthogonal matrix $O$ (depending on $\bm{Y}$) such that $O^{T}\bm{Y}=\bm{e}_{1}=(1,0,\dots,0)^{T}$. Since $O\bm{X}$ and $\bm{X}$ have the same conditional law given $\bm{Y}$, with probability one we have

𝔼(f(𝑿T𝒀)|𝒀)\displaystyle\mathbb{E}\left(f(\bm{X}^{T}\bm{Y})\Big|\bm{Y}\right) =𝔼(f(𝑿TOT𝒀)|𝒀)\displaystyle=\mathbb{E}\left(f\left(\bm{X}^{T}O^{T}\bm{Y}\right)\Big|\bm{Y}\right)
=𝔼(f(𝑿T𝒆1)|𝒀)\displaystyle=\mathbb{E}\left(f(\bm{X}^{T}\bm{e}_{1})\Big|\bm{Y}\right)
=𝔼(f(𝑿𝑻𝒆1)),\displaystyle=\mathbb{E}\left(f(\bm{X^{T}}\bm{e}_{1})\right),

where we use the fact that 𝑿T𝒆1\bm{X}^{T}\bm{e}_{1} is independent from 𝒀\bm{Y} in the last equality. Similarly, one can also show that 𝔼f(𝑿T𝒀)=𝔼(f(𝑿𝑻𝒆1))\mathbb{E}f(\bm{X}^{T}\bm{Y})=\mathbb{E}\left(f(\bm{X^{T}}\bm{e}_{1})\right). This concludes the proof of the first claim.

For the second claim, take any bounded measurable functions $f$ and $g$. By conditioning on $\bm{X}$ and using that $\bm{Y}$ and $\bm{Z}$ are independent, we get

𝔼(f(𝑿T𝒀)g(𝑿T𝒁))\displaystyle\mathbb{E}\left(f(\bm{X}^{T}\bm{Y})g(\bm{X}^{T}\bm{Z})\right) =𝔼[𝔼(f(𝑿T𝒀)|𝑿)𝔼(g(𝑿T𝒁)|𝑿)]\displaystyle=\mathbb{E}\Big[\mathbb{E}\left(f(\bm{X}^{T}\bm{Y})\Big|\bm{X}\right)\cdot\mathbb{E}\left(g(\bm{X}^{T}\bm{Z})\Big|\bm{X}\right)\Big]
=𝔼f(𝑿T𝒀)𝔼g(𝑿T𝒁),\displaystyle=\mathbb{E}f(\bm{X}^{T}\bm{Y})\cdot\mathbb{E}g(\bm{X}^{T}\bm{Z}),

where we use the conclusion of the first statement in the last equality. This concludes the proof. \hfill\square

A direct consequence of Lemma 8 is that, for any bounded measurable function $h$, the term $Z_{n}$ defined in (41) is a degenerate U-statistic of order $1$ (see, for example, Section 5.3 of the monograph [Dehling] for a comprehensive introduction to U-statistics and their limit theory). This fact was used frequently throughout the proof of Theorem 1. The next lemma is used to check the conditions of the martingale CLT in the proof of Theorem 1. It gives a simpler form of the joint distribution of the angles.

Lemma 9.

Let p2p\geq 2 be an integer. Assume 𝐚𝕊p1\bm{a}\in\mathbb{S}^{p-1} and 𝐛𝕊p1\bm{b}\in\mathbb{S}^{p-1} are fixed vectors. Let 𝐗\bm{X} be a random vector uniformly distributed over 𝕊p1\mathbb{S}^{p-1} and ξ1,ξ2,,ξp\xi_{1},\xi_{2},\dots,\xi_{p} be i.i.d. standard normal. Set 𝛏=(ξ1,ξ2,,ξp)\bm{\xi}=\left(\xi_{1},\xi_{2},\dots,\xi_{p}\right)^{\top}. Then, for any bounded measurable function f(x,y):2f(x,y):\mathbb{R}^{2}\to\mathbb{R}, we have

𝔼f(𝒂T𝑿,𝒃T𝑿)=𝔼f(ξ1𝝃,(𝒂T𝒃)ξ1𝝃+1(𝒂T𝒃)2ξ2𝝃).\mathbb{E}f(\bm{a}^{T}\bm{X},\bm{b}^{T}\bm{X})=\mathbb{E}f\Big(\frac{\xi_{1}}{\|\bm{\xi}\|},(\bm{a}^{T}\bm{b})\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-(\bm{a}^{T}\bm{b})^{2}}\,\frac{\xi_{2}}{\|\bm{\xi}\|}\Big).

Proof of Lemma 9. We can assume $\bm{X}=(\xi_{1},\dots,\xi_{p})^{\top}/\|\bm{\xi}\|$ without loss of generality. We may also assume $|\bm{a}^{\top}\bm{b}|<1$, since otherwise $\bm{b}=\pm\bm{a}$ and the claim is immediate. Suppose $\bm{c}\in\mathbb{S}^{p-1}$ and $\bm{d}\in\mathbb{S}^{p-1}$ are constant vectors with $\bm{c}^{\top}\bm{d}=0$. Let $A$ be an orthogonal matrix whose first two rows are $\bm{c}^{\top}$ and $\bm{d}^{\top}$, respectively. Then, by the rotational invariance of the uniform distribution on the sphere, $A\bm{X}$ and $\bm{X}$ are identically distributed. In particular, the top two entries of $A\bm{X}$, that is, $(\bm{c}^{\top}\bm{X},\bm{d}^{\top}\bm{X})$, have the same distribution as $(\xi_{1},\xi_{2})^{\top}/\|\bm{\xi}\|$. Write

\[
\bm{b}=(\bm{a}^{\top}\bm{b})\bm{a}+\sqrt{1-(\bm{a}^{\top}\bm{b})^{2}}\cdot\frac{\bm{b}-(\bm{a}^{\top}\bm{b})\bm{a}}{\sqrt{1-(\bm{a}^{\top}\bm{b})^{2}}}.
\]

The advantage of doing so is that $\bm{a}$ and $\bm{d}_{0}:=\frac{\bm{b}-(\bm{a}^{\top}\bm{b})\bm{a}}{\sqrt{1-(\bm{a}^{\top}\bm{b})^{2}}}$ are orthogonal unit vectors. Hence the pair $(\bm{a}^{\top}\bm{X},\bm{d}_{0}^{\top}\bm{X})$ has the same law as $(\xi_{1},\xi_{2})^{\top}/\|\bm{\xi}\|$. This implies that $(\bm{a}^{\top}\bm{X},\bm{b}^{\top}\bm{X})$ has the same law as

(ξ1𝝃,(𝒂𝒃)ξ1𝝃+1(𝒂𝒃)2ξ2𝝃).\Big(\frac{\xi_{1}}{\|\bm{\xi}\|},(\bm{a}^{\top}\bm{b})\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-(\bm{a}^{\top}\bm{b})^{2}}\,\frac{\xi_{2}}{\|\bm{\xi}\|}\Big).

As a result,

𝔼f(𝒂T𝑿,𝒃T𝑿)=𝔼f(ξ1𝝃,(𝒂T𝒃)ξ1𝝃+1(𝒂T𝒃)2ξ2𝝃),\mathbb{E}f(\bm{a}^{T}\bm{X},\bm{b}^{T}\bm{X})=\mathbb{E}f\Big(\frac{\xi_{1}}{\|\bm{\xi}\|},(\bm{a}^{T}\bm{b})\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-(\bm{a}^{T}\bm{b})^{2}}\,\frac{\xi_{2}}{\|\bm{\xi}\|}\Big),

where the last expectation is taken over ξ1,,ξp\xi_{1},\dots,\xi_{p}, hence it is a function of 𝒂T𝒃\bm{a}^{T}\bm{b}. \hfill\square
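The identity in Lemma 9 lends itself to a quick Monte Carlo sanity check. The sketch below is only illustrative: the dimension, sample size, seed, and the bounded test function `f` are arbitrary choices, not taken from the paper. It also compares $\mathbb{E}[(\bm{a}^{\top}\bm{X})(\bm{b}^{\top}\bm{X})]$ with its exact value $\bm{a}^{\top}\bm{b}/p$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, N = 5, 200_000

# Two fixed unit vectors a, b and their inner product rho = a^T b.
a = rng.standard_normal(p)
a /= np.linalg.norm(a)
b = rng.standard_normal(p)
b /= np.linalg.norm(b)
rho = a @ b

# Left-hand side: X uniform on the sphere, projected on a and b.
xi = rng.standard_normal((N, p))
X = xi / np.linalg.norm(xi, axis=1, keepdims=True)
lhs_x, lhs_y = X @ a, X @ b

# Right-hand side: the Gaussian representation of Lemma 9 (independent sample).
eta = rng.standard_normal((N, p))
norm = np.linalg.norm(eta, axis=1)
rhs_x = eta[:, 0] / norm
rhs_y = rho * rhs_x + np.sqrt(1 - rho**2) * eta[:, 1] / norm

def f(x, y):
    """An arbitrary bounded test function on [-1, 1]^2."""
    return np.cos(3 * x) * np.tanh(2 * y) + x * y

est_lhs = f(lhs_x, lhs_y).mean()
est_rhs = f(rhs_x, rhs_y).mean()
second_moment = (lhs_x * lhs_y).mean()

print("E f, sphere side:  ", est_lhs)
print("E f, Gaussian side:", est_rhs)
print("E[(a^T X)(b^T X)] vs rho / p:", second_moment, rho / p)
```

The two estimates of $\mathbb{E}f$ should agree up to Monte Carlo error, which is of order $N^{-1/2}$ here.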

Lemma 10.

Let HnH_{n} be defined in (47) for a sequence of measurable functions hn:h_{n}:\mathbb{R}\mapsto\mathbb{R}. Assume additionally that

𝔼hn(𝑿1𝑿2)\displaystyle\mathbb{E}h_{n}(\bm{X}^{\top}_{1}\bm{X}_{2}) =0;\displaystyle=0;
Var(hn(𝑿1𝑿2))\displaystyle\mbox{Var}\left(h_{n}(\bm{X}^{\top}_{1}\bm{X}_{2})\right) C1\displaystyle\leq C_{1}

for some constant C1C_{1} independent of nn. Then, we have

𝔼Hn2(𝑿𝟏,𝑿2)0\mathbb{E}H_{n}^{2}(\bm{X_{1}},\bm{X}_{2})\to 0

as nn\to\infty.

Proof of Lemma 10. It suffices to prove Lemma 10 when hnh_{n} is bounded. Indeed, suppose we have proved 𝔼Hn2(𝑿𝟏,𝑿2)0\mathbb{E}H_{n}^{2}(\bm{X_{1}},\bm{X}_{2})\to 0 for all bounded hnh_{n}, then write

hn=hn𝟏{|hn|L}𝔼(hn𝟏{|hn|L})fn,L+hn𝟏{|hn|>L}𝔼(hn𝟏{|hn|>L})gn,Lh_{n}=\underbrace{h_{n}\cdot\mathbf{1}_{\left\{|h_{n}|\leq L\right\}}-\mathbb{E}\left(h_{n}\cdot\mathbf{1}_{\left\{|h_{n}|\leq L\right\}}\right)}_{f_{n,L}}+\underbrace{h_{n}\cdot\mathbf{1}_{\left\{|h_{n}|>L\right\}}-\mathbb{E}\left(h_{n}\cdot\mathbf{1}_{\left\{|h_{n}|>L\right\}}\right)}_{g_{n,L}}

where the expectation is taken with respect to the law of 𝑿1𝑿2\bm{X}_{1}^{\top}\bm{X}_{2}. Then,

\begin{align*}
\mathbb{E}H_{n}^{2}(\bm{X}_{1},\bm{X}_{2}) &=\mathbb{E}\,\mathbb{E}^{2}\Big[h_{n}\left(\bm{X}_{1}^{T}\bm{Y}\right)\cdot h_{n}\left(\bm{X}_{2}^{T}\bm{Y}\right)\Big|\bm{X}_{1},\bm{X}_{2}\Big]\\
&=\mathbb{E}\left[\mathbb{E}^{2}\left[f_{n,L}\left(\bm{X}_{1}^{T}\bm{Y}\right)\cdot f_{n,L}\left(\bm{X}_{2}^{T}\bm{Y}\right)\Big|\bm{X}_{1},\bm{X}_{2}\right]\right]+O\left(\mbox{Var}\left(g_{n,L}\left(\bm{X}_{1}^{\top}\bm{X}_{2}\right)\right)\right).
\end{align*}

For every fixed $L$, the first term tends to zero as $n\to\infty$. We then deduce the result by letting $L\to\infty$ and noting that

\[
\sup_{n\geq 1}\mbox{Var}\left(g_{n,L}\left(\bm{X}_{1}^{\top}\bm{X}_{2}\right)\right)\leq\frac{C_{1}}{L^{2}}.
\]

Now assume that $h_{n}$ is uniformly bounded, that is, $\sup_{n}\|h_{n}\|_{\infty}<\infty$. Let $\bm{Y}$ be drawn from the uniform distribution on $\mathbb{S}^{p-1}$, independently of $\bm{X}_{1}$ and $\bm{X}_{2}$. Thanks to Lemma 9, we can write

𝔼Hn2(𝑿𝟏,𝑿2)\displaystyle\mathbb{E}H_{n}^{2}(\bm{X_{1}},\bm{X}_{2}) =𝔼𝔼2[hn(𝑿1T𝒀)hn(𝑿2T𝒀)|𝑿1,𝑿2]\displaystyle=\mathbb{E}\mathbb{E}^{2}\Big[h_{n}\left(\bm{X}_{1}^{T}\bm{Y}\right)\cdot h_{n}\left(\bm{X}_{2}^{T}\bm{Y}\right)\Big|\bm{X}_{1},\bm{X}_{2}\Big]
=𝔼𝔼2[hn(ξ1𝝃)hn(𝑿1T𝑿2ξ1𝝃+1(𝑿1T𝑿2)2ξ2𝝃)|𝑿1,𝑿2],\displaystyle=\mathbb{E}\mathbb{E}^{2}\bigg[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(\bm{X}_{1}^{T}\bm{X}_{2}\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-(\bm{X}_{1}^{T}\bm{X}_{2})^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\Big|\bm{X}_{1},\bm{X}_{2}\bigg],

where $\bm{\xi}=(\xi_{1},\xi_{2},\dots,\xi_{p})^{\top}$ is a vector of i.i.d. standard normal random variables. Setting $U=\bm{X}_{1}^{T}\bm{X}_{2}$ and letting $f(u)$ be the density of $U$, we can write

𝔼Hn2(𝑿𝟏,𝑿2)\displaystyle\mathbb{E}H_{n}^{2}(\bm{X_{1}},\bm{X}_{2}) =11f(u)𝔼2[hn(ξ1𝝃)hn(uξ1𝝃+1u2ξ2𝝃)]𝑑u\displaystyle=\int_{-1}^{1}f(u)\cdot\mathbb{E}^{2}\bigg[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\bigg]du
=|u|εf(u)𝔼2[hn(ξ1𝝃)hn(uξ1𝝃+1u2ξ2𝝃)]𝑑u\displaystyle=\int_{|u|\leq\varepsilon}f(u)\cdot\mathbb{E}^{2}\bigg[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\bigg]du
+|u|>εf(u)𝔼2[hn(ξ1𝝃)hn(uξ1𝝃+1u2ξ2𝝃)]𝑑u\displaystyle+\int_{|u|>\varepsilon}f(u)\cdot\mathbb{E}^{2}\bigg[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\bigg]du
|u|εf(u)𝔼2[hn(ξ1𝝃)hn(uξ1𝝃+1u2ξ2𝝃)]𝑑u\displaystyle\leq\int_{|u|\leq\varepsilon}f(u)\cdot\mathbb{E}^{2}\bigg[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\bigg]du
+hn2(|𝑿1T𝑿2|>ε)\displaystyle+\|h_{n}\|_{\infty}^{2}\cdot\mathbb{P}\left(|\bm{X}_{1}^{T}\bm{X}_{2}|>\varepsilon\right)

for any fixed $\varepsilon\in(0,1/2]$. By Proposition 5 in [Jiang13], we get $\mathbb{P}\left(|\bm{X}_{1}^{T}\bm{X}_{2}|>\varepsilon\right)\to 0$ as $p\to\infty$ and hence it suffices to bound the first integral, taken over $|u|\leq\varepsilon$. Thanks to Lemma 11, which applies because $\varepsilon\leq 1/2$, for $|u|\leq\varepsilon$ we get the bound

|𝔼[hn(ξ1𝝃)hn(uξ1𝝃+1u2ξ2𝝃)]𝔼[hn(ξ1p)hn(ξ2p)]|\displaystyle\bigg|\mathbb{E}\Big[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\Big]-\mathbb{E}\Big[h_{n}\left(\frac{\xi_{1}}{\sqrt{p}}\right)\cdot h_{n}\left(\frac{\xi_{2}}{\sqrt{p}}\right)\Big]\bigg|
\displaystyle\leq (const)hn2(p1+ε),\displaystyle(\mbox{const})\cdot\|h_{n}\|_{\infty}^{2}\cdot\left(p^{-1}+\varepsilon\right),

by choosing gn(x,y)=hn(x)hn(y)g_{n}(x,y)=h_{n}(x)h_{n}(y) in Lemma 11. Note that 𝔼hn(ξ1/𝝃)=0\mathbb{E}h_{n}\left(\xi_{1}/\|\bm{\xi}\|\right)=0 since 𝔼hn(𝑿1𝑿2)=0\mathbb{E}h_{n}(\bm{X}^{\top}_{1}\bm{X}_{2})=0, which gives us

|𝔼[hn(ξ1p)hn(ξ2p)]|\displaystyle\bigg|\mathbb{E}\Big[h_{n}\left(\frac{\xi_{1}}{\sqrt{p}}\right)\cdot h_{n}\left(\frac{\xi_{2}}{\sqrt{p}}\right)\Big]\bigg| =|𝔼[hn(ξ1p)]|2\displaystyle=\bigg|\mathbb{E}\Big[h_{n}\left(\frac{\xi_{1}}{\sqrt{p}}\right)\Big]\bigg|^{2}
=|𝔼[hn(ξ1p)]𝔼[hn(ξ1𝝃)]|2\displaystyle=\bigg|\mathbb{E}\Big[h_{n}\left(\frac{\xi_{1}}{\sqrt{p}}\right)\Big]-\mathbb{E}\Big[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\Big]\bigg|^{2}
(const)hn2p,\displaystyle\leq(\mbox{const})\cdot\frac{\|h_{n}\|_{\infty}^{2}}{p},

where we use the bound (56) in the last inequality. This in turn yields

|𝔼[hn(ξ1𝝃)hn(uξ1𝝃+1u2ξ2𝝃)]|(const)(p1+ε),\displaystyle\bigg|\mathbb{E}\Big[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\Big]\bigg|\leq(\mbox{const})\cdot\left(p^{-1}+\varepsilon\right),

since the LL_{\infty}-norm of hnh_{n} is uniformly bounded. Consequently,

|u|εf(u)𝔼2[hn(ξ1𝝃)hn(uξ1𝝃+1u2ξ2𝝃)]𝑑u\displaystyle\int_{|u|\leq\varepsilon}f(u)\cdot\mathbb{E}^{2}\bigg[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\bigg]du
\displaystyle\leq (const)|u|εf(u)(p1+ε)𝑑u.\displaystyle(\mbox{const})\cdot\int_{|u|\leq\varepsilon}f(u)\cdot(p^{-1}+\varepsilon)du.

The proof is completed by taking pp\to\infty and then taking ε0\varepsilon\to 0. \hfill\square

Lemma 11.

Let $\bm{\xi}=(\xi_{1},\xi_{2},\dots,\xi_{p})^{\top}$ be a vector of i.i.d. standard normal random variables. Then, for any bounded measurable function $g:\mathbb{R}^{2}\to\mathbb{R}$ and $u\in[-1/2,1/2]$, we have

|𝔼g(ξ1𝝃,uξ1𝝃+1u2ξ2𝝃)𝔼g(ξ1p,ξ2p)|Cg(1p+|u|)\Big|\mathbb{E}g\left(\frac{\xi_{1}}{\|\bm{\xi}\|},u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)-\mathbb{E}g\left(\frac{\xi_{1}}{\sqrt{p}},\frac{\xi_{2}}{\sqrt{p}}\right)\Big|\leq C\|g\|_{\infty}\left(\frac{1}{p}+|u|\right)

for some universal constant C>0C>0.

Proof of Lemma 11. The conclusion follows from the following two total variation distance bounds

dTV((pξ1𝝃,pξ2𝝃),(ξ1,ξ2))Cp\displaystyle d_{TV}\left(\left(\frac{\sqrt{p}\cdot\xi_{1}}{\|\bm{\xi}\|},\frac{\sqrt{p}\cdot\xi_{2}}{\|\bm{\xi}\|}\right),\left(\xi_{1},\xi_{2}\right)\right)\leq\frac{C}{p} (56)

and

dTV(N((00),(1uu1)),N(𝟎,𝑰2))C|u|\displaystyle d_{TV}\Big(N\left(\begin{pmatrix}0\\ 0\end{pmatrix},\begin{pmatrix}1&u\\ u&1\end{pmatrix}\right),N(\bm{0},\bm{I}_{2})\Big)\leq C|u| (57)

for some universal constant CC and for all pp large enough in the first estimate. We next explain (56) and (57).

The first estimate (56) is a consequence of the Diaconis–Freedman theorem (see Theorem 2.8 in the monograph [Meckes] and also the paper [D-F]). It provides a sharp total variation bound between the joint distribution of the first few entries of a $\mbox{Unif}\left(\mathbb{S}^{p-1}\right)$ random vector, rescaled by $\sqrt{p}$, and the standard multivariate normal distribution of the same dimension. The second estimate (57) is elementary and can be proven directly by estimating the difference between the two corresponding densities. To see this, write

dTV(N((00),(1uu1)),N(𝟎,𝑰2))\displaystyle d_{TV}\Big(N\left(\begin{pmatrix}0\\ 0\end{pmatrix},\begin{pmatrix}1&u\\ u&1\end{pmatrix}\right),N(\bm{0},\bm{I}_{2})\Big)
\displaystyle\leq |12πexp{x2+y22}12π1u2exp{x2+y22uxy2(1u2)}|𝑑x𝑑y\displaystyle\int_{\mathbb{R}}\int_{\mathbb{R}}\bigg|\frac{1}{2\pi}\exp\left\{-\frac{x^{2}+y^{2}}{2}\right\}-\frac{1}{2\pi\sqrt{1-u^{2}}}\exp\left\{-\frac{x^{2}+y^{2}-2uxy}{2(1-u^{2})}\right\}\bigg|dxdy
\displaystyle\leq 12πexp{x2+y22}|111u2exp{(x2+y2)u22(1u2)+uxy1u2}|𝑑x𝑑y\displaystyle\int_{\mathbb{R}}\int_{\mathbb{R}}\frac{1}{2\pi}\exp\left\{-\frac{x^{2}+y^{2}}{2}\right\}\cdot\bigg|1-\frac{1}{\sqrt{1-u^{2}}}\exp\left\{-\frac{(x^{2}+y^{2})u^{2}}{2(1-u^{2})}+\frac{uxy}{1-u^{2}}\right\}\bigg|dxdy
\displaystyle\leq I1+I2,\displaystyle I_{1}+I_{2},

where

\begin{align*}
I_{1} &=\bigg|1-\frac{1}{\sqrt{1-u^{2}}}\bigg|\cdot\int_{\mathbb{R}}\int_{\mathbb{R}}\frac{1}{2\pi}\exp\left\{-\frac{x^{2}+y^{2}}{2}\right\}dxdy,\\
I_{2} &=\frac{1}{2\pi\sqrt{1-u^{2}}}\cdot\int_{\mathbb{R}}\int_{\mathbb{R}}\exp\left\{-\frac{x^{2}+y^{2}}{2}\right\}\cdot\bigg|1-\exp\left\{-\frac{(x^{2}+y^{2})u^{2}}{2(1-u^{2})}+\frac{uxy}{1-u^{2}}\right\}\bigg|dxdy.
\end{align*}

The term $I_{1}$ is of order $O(u^{2})$, hence $O(|u|)$, uniformly in $|u|\leq 1/2$, and thus we only need to bound $I_{2}$. In polar coordinates $x=r\cos\theta$, $y=r\sin\theta$, we have $xy=\frac{1}{2}r^{2}\sin 2\theta$, so $I_{2}$ can be rewritten as

\[
I_{2}=\frac{1}{2\pi\sqrt{1-u^{2}}}\cdot\int_{0}^{\infty}r\cdot\exp\left\{-\frac{r^{2}}{2}\right\}\cdot\int_{0}^{2\pi}\Big|1-\exp\left\{t_{u}(r,\theta)\right\}\Big|d\theta dr,\qquad t_{u}(r,\theta):=\frac{r^{2}u\left(\sin 2\theta-u\right)}{2(1-u^{2})}.
\]

Since $|\sin 2\theta-u|\leq 1+|u|$, we have $|t_{u}(r,\theta)|\leq\frac{r^{2}|u|}{2(1-|u|)}\leq r^{2}|u|$ for $|u|\leq 1/2$. Moreover, $u(\sin 2\theta-u)\leq|u|(1-|u|)$, so that

\[
t_{u}(r,\theta)\leq\frac{r^{2}|u|}{2(1+|u|)}\leq\frac{r^{2}}{6}.
\]

Using the elementary inequality $|1-e^{t}|\leq|t|e^{\max(t,0)}$, valid for all $t\in\mathbb{R}$, we obtain

\[
\Big|1-\exp\left\{t_{u}(r,\theta)\right\}\Big|\leq r^{2}|u|\exp\left\{\frac{r^{2}}{6}\right\}.
\]

Thus, we have

\begin{align*}
I_{2} &\leq\frac{|u|}{\sqrt{1-u^{2}}}\cdot\int_{0}^{\infty}r^{3}\cdot\exp\left\{-\frac{r^{2}}{3}\right\}dr=\frac{9}{2}\cdot\frac{|u|}{\sqrt{1-u^{2}}}\\
&\leq(\mbox{const})|u|.
\end{align*}

This concludes the proof of (57).

We are now ready to prove the main estimate in Lemma 11. Write

\begin{align*}
&\Big|\mathbb{E}g\left(\frac{\xi_{1}}{\|\bm{\xi}\|},u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)-\mathbb{E}g\left(\frac{\xi_{1}}{\sqrt{p}},\frac{\xi_{2}}{\sqrt{p}}\right)\Big|\\
\leq{}&\Big|\mathbb{E}g\left(\frac{\xi_{1}}{\|\bm{\xi}\|},u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)-\mathbb{E}g\left(\frac{\xi_{1}}{\sqrt{p}},u\cdot\frac{\xi_{1}}{\sqrt{p}}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\sqrt{p}}\right)\Big|\\
&+\Big|\mathbb{E}g\left(\frac{\xi_{1}}{\sqrt{p}},u\cdot\frac{\xi_{1}}{\sqrt{p}}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\sqrt{p}}\right)-\mathbb{E}g\left(\frac{\xi_{1}}{\sqrt{p}},\frac{\xi_{2}}{\sqrt{p}}\right)\Big|\\
\leq{}&\frac{(\mbox{const})\cdot\|g\|_{\infty}}{p}+(\mbox{const})\cdot\|g\|_{\infty}|u|,
\end{align*}

where the first term is bounded using (56) and the second using (57).

The proof is completed. \hfill\square
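The linear rate in (57) can be illustrated numerically. The sketch below approximates the total variation distance by a Riemann sum on a truncated grid; the grid size, the truncation at $\pm 8$, and the function name `tv_to_standard` are assumptions made for illustration only.

```python
import numpy as np

def tv_to_standard(u, half_width=8.0, m=801):
    """Grid approximation of d_TV(N(0, [[1, u], [u, 1]]), N(0, I_2))."""
    grid = np.linspace(-half_width, half_width, m)
    h = grid[1] - grid[0]
    x, y = np.meshgrid(grid, grid, indexing="ij")
    standard = np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)
    correlated = np.exp(-(x**2 + y**2 - 2 * u * x * y) / (2 * (1 - u**2)))
    correlated /= 2 * np.pi * np.sqrt(1 - u**2)
    # Total variation distance is half the L1 distance between the densities.
    return 0.5 * np.abs(standard - correlated).sum() * h * h

ratios = {u: tv_to_standard(u) / abs(u) for u in (-0.5, -0.1, 0.05, 0.1, 0.25, 0.5)}
for u, r in ratios.items():
    print(f"u = {u:+.2f}   d_TV / |u| = {r:.4f}")
```

The ratio $d_{TV}/|u|$ stays bounded on $[-1/2,1/2]$, consistent with (57). It is also bounded below by $1/\pi$, because the event $\{xy>0\}$ has probability $1/2+\arcsin(u)/\pi$ under the correlated law, so the linear rate in $|u|$ cannot be improved.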

The next lemma collects some elementary properties of, and bounds for, the normalizing constant $C_{p}(\kappa)$ of the FvML distributions.

Lemma 12.

Recall $C_{p}(\kappa)$ in (49). The following statements hold:

  • If $I_{\nu}(x)$ is the modified Bessel function of the first kind (see [Ley-Verdebout] for details), then

    \[
    \frac{C_{p}^{\prime}(\kappa)}{C_{p}(\kappa)}=-\frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)}.
    \]

  • For all $\nu>0$, we have

    \[
    G_{\nu+\frac{1}{2},\nu+\frac{3}{2}}(\kappa)\leq\frac{I_{\nu+1}(\kappa)}{I_{\nu}(\kappa)}\leq G_{\nu,\nu+2}(\kappa),
    \]

    where

    \[
    G_{\alpha,\beta}(t):=\frac{t}{\alpha+\sqrt{\beta^{2}+t^{2}}}.
    \]

  • For all $p\geq 3$ and $\kappa>0$, we have

    \[
    \left|\log C_{p}(\kappa)+\frac{\kappa^{2}}{2p}\right|\leq\frac{\kappa^{4}}{2p^{3}}.
    \]

Proof of Lemma 12. The first property can be found in [M-Jupp], pages 169 and 170. The second property can be found in Section 3 of [hornik2013amos]. Let us use the first two properties to prove the last one.

We first show that

|Ip/2(κ)Ip/21(κ)κp|2κ3p3.\displaystyle\Big|\frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)}-\frac{\kappa}{p}\Big|\leq\frac{2\kappa^{3}}{p^{3}}. (58)

To see this, we apply the second property with ν=(p2)/2\nu=(p-2)/2 to obtain

κν+1/2+κ2+(ν+3/2)2Ip/2(κ)Ip/21(κ)κν+κ2+(ν+2)2\frac{\kappa}{\nu+1/2+\sqrt{\kappa^{2}+\left(\nu+3/2\right)^{2}}}\leq\frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)}\leq\frac{\kappa}{\nu+\sqrt{\kappa^{2}+\left(\nu+2\right)^{2}}}

From the upper bound, it is clear that the Bessel ratio is always less than $1$. Moreover, since $\sqrt{\kappa^{2}+(\nu+2)^{2}}\geq\nu+2$ and $2\nu+2=p$, the upper bound also gives

\[
\frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)}\leq\frac{\kappa}{p}.
\]

Consider the following two cases.

Case 1: $\kappa\geq p$. In this case, the result is trivial since

\[
\Big|\frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)}-\frac{\kappa}{p}\Big|=\frac{\kappa}{p}-\frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)}\leq\frac{\kappa}{p}\leq\frac{\kappa^{3}}{p^{3}}.
\]

Case 2: $\kappa\leq p$. In this case, put $x:=\kappa/p\leq 1$. Since the Bessel ratio is at most $x$, it suffices to bound the difference of $x$ and the ratio from above. Use the lower bound to get

xIp/2(κ)Ip/21(κ)\displaystyle x-\frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)} xκ(p1)/2+κ2+((p+1)/2)2\displaystyle\leq x-\frac{\kappa}{(p-1)/2+\sqrt{\kappa^{2}+\left((p+1)/2\right)^{2}}}
=x[111+x2+(1/2+1/(2p))21/21/(2p)]\displaystyle=x\left[1-\frac{1}{1+\sqrt{x^{2}+\left(1/2+1/(2p)\right)^{2}}-1/2-1/(2p)}\right]
=xx2+(1/2+1/(2p))2[1/2+1/(2p)]1+x2+(1/2+1/(2p))21/21/(2p)\displaystyle=x\cdot\frac{\sqrt{x^{2}+\left(1/2+1/(2p)\right)^{2}}-[1/2+1/(2p)]}{1+\sqrt{x^{2}+\left(1/2+1/(2p)\right)^{2}}-1/2-1/(2p)}
2x31+x2+(1/2+1/(2p))21/21/(2p)2x3\displaystyle\leq\frac{2x^{3}}{1+\sqrt{x^{2}+\left(1/2+1/(2p)\right)^{2}}-1/2-1/(2p)}\leq 2x^{3}

where the last line follows from the fact that

x2+(1/2+1/(2p))2[1/2+1/(2p)]=x2x2+(1/2+1/(2p))2+[1/2+1/(2p)]2x2.\sqrt{x^{2}+\left(1/2+1/(2p)\right)^{2}}-[1/2+1/(2p)]=\frac{x^{2}}{\sqrt{x^{2}+\left(1/2+1/(2p)\right)^{2}}+[1/2+1/(2p)]}\leq 2x^{2}.

Finally, to deduce the third property, we combine the first property with (58) and integrate to get

\begin{align*}
\left|\log C_{p}(\kappa)+\frac{\kappa^{2}}{2p}\right|=\left|\int_{0}^{\kappa}\left(-\frac{I_{p/2}(t)}{I_{p/2-1}(t)}+\frac{t}{p}\right)dt\right| &\leq\int_{0}^{\kappa}\left|-\frac{I_{p/2}(t)}{I_{p/2-1}(t)}+\frac{t}{p}\right|dt\\
&\leq\int_{0}^{\kappa}\frac{2t^{3}}{p^{3}}dt=\frac{\kappa^{4}}{2p^{3}}.
\end{align*}

This completes the proof. \square
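The two-sided bound on the Bessel ratio and the estimate (58) can be spot-checked numerically. The sketch below is a minimal illustration, not part of the proof: the helper `bessel_ratio` evaluates $I_{\nu+1}(\kappa)/I_{\nu}(\kappa)$ from the power series of the modified Bessel function, and the truncation at 400 terms as well as the test values of $p$ and $\kappa$ are illustrative assumptions.

```python
import math

def bessel_ratio(nu, kappa, terms=400):
    """I_{nu+1}(kappa) / I_nu(kappa) via the power series of I_nu.

    Both series are scaled by Gamma(nu + 1) * (kappa / 2)^(-nu), which cancels
    in the ratio up to the remaining factor kappa / 2.
    """
    q = kappa * kappa / 4.0
    a = 1.0               # m-th scaled term of the series for I_nu
    b = 1.0 / (nu + 1.0)  # m-th scaled term of the series for I_{nu+1}
    sum_a, sum_b = a, b
    for m in range(1, terms):
        a *= q / (m * (nu + m))
        b *= q / (m * (nu + 1 + m))
        sum_a += a
        sum_b += b
    return 0.5 * kappa * sum_b / sum_a

def G(alpha, beta, t):
    """The function G_{alpha, beta}(t) from Lemma 12."""
    return t / (alpha + math.sqrt(beta * beta + t * t))

checks = []
for p in (3, 10, 50):
    nu = p / 2 - 1
    for kappa in (0.5, 5.0, 40.0, 100.0):
        r = bessel_ratio(nu, kappa)
        lower, upper = G(nu + 0.5, nu + 1.5, kappa), G(nu, nu + 2, kappa)
        gap, bound = abs(r - kappa / p), 2 * kappa**3 / p**3
        checks.append((p, kappa, r, lower, upper, gap, bound))
        print(f"p={p:2d} kappa={kappa:6.1f} ratio={r:.6f} "
              f"[{lower:.6f}, {upper:.6f}]  |ratio - kappa/p|={gap:.3e} <= {bound:.3e}")
```

The series has only positive terms, so there is no cancellation and the ratio is computed stably for the moderate values of $\kappa$ used here; much larger $\kappa$ would call for a different evaluation method.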