Detecting non-uniform patterns on high-dimensional hyperspheres

Tiefeng Jiang
School of Data Science, Chinese University of Hong Kong, Shenzhen
[email protected]

Tuan Pham
Department of Statistics and Data Science, University of Texas, Austin
[email protected]

Abstract

We propose a new probabilistic characterization of the uniform distribution on the hypersphere in terms of the distribution of inner products, extending the ideas of [cuesta2009projection; cuesta2007sharp] in a data-driven manner. Using this characterization, we define a new distance that quantifies the deviation of an arbitrary distribution from uniformity.

As an application, we construct a novel nonparametric test for the problem of testing uniformity, namely the task of determining whether a set of $n$ i.i.d. random points on the $p$-dimensional hypersphere is approximately uniformly distributed. Under the null hypothesis, the proposed test statistic converges in distribution to the supremum of a Brownian bridge, and the test can detect any alternative lying outside a ball of radius $1/n$ with respect to the proposed distance, in both high and low-dimensional settings.

We then prove a matching lower bound with respect to this distance and study its behavior when restricted to parametric models. In particular, we show that the minimax detection thresholds with respect to this distance coincide with the usual minimax thresholds in two important families: (i) the class of Fisher–von Mises–Langevin (FvML) alternatives, and (ii) a class of low-rank uniform distributions. Thus, the proposed test is optimal in these models. We also derive the limiting distributions of the test under the corresponding local alternatives.

As a byproduct of our analysis, we determine the detection threshold in the high-dimensional regime for testing the intrinsic dimension of the uniform distribution on $\mathbb{S}^{p-1}$ ; that is, for testing whether the distribution is uniformly supported on $\mathbb{S}^{p-1}$ against the alternative that it is uniformly distributed on

\mathbb{S}^{p-1}\cap H,

for some $k$ -dimensional linear subspace $H\subset\mathbb{R}^{p}$ .

1 Introduction

Testing whether a sample from an unknown distribution is uniformly distributed over a domain is a classical problem in statistical theory. In the discrete case, this problem has been extensively studied by statisticians, computer scientists, and probabilists; see [bhattacharya2024sparse; balakrishnan2018hypothesis] and references therein. For the continuous case, one of the most common and intriguing settings is the unit hypersphere, not only because of its rich mathematical structure but also due to its importance in statistical analysis on non-Euclidean spaces. Below, we briefly formulate the problem and review relevant literature.

Consider the hypersphere $\mathbb{S}^{p-1}=\left\{x\in\mathbb{R}^{p}:\|x\|_{2}=1\right\}$, where $\|\cdot\|_{2}$ is the Euclidean norm. The observed data points are denoted by $\bm{X}_{1},\bm{X}_{2},\dots,\bm{X}_{n}$ with $\bm{X}_{i}\in\mathbb{S}^{p-1}$ for all $i=1,2,\dots,n$. We are mainly interested in the high-dimensional case, where $p=p_{n}$ is a sequence diverging to infinity. Assume that the data points $\bm{X}_{i}$ are drawn independently from an unknown distribution $\mu$ supported on the hypersphere $\mathbb{S}^{p-1}$. The uniform distribution on the hypersphere $\mathbb{S}^{p-1}$ is denoted by $\mbox{Unif}(\mathbb{S}^{p-1})$. The uniformity testing problem can be formulated as

\displaystyle H_{0}:\mu=\mbox{Unif}(\mathbb{S}^{p-1})\ \ \ \mbox{against}\ \ \ H_{1}:\mu\neq\mbox{Unif}(\mathbb{S}^{p-1}).

(1)

In fixed and small dimensions, the uniformity testing problem has been investigated extensively in the last few decades. An incomplete list of early results concerning the case $p=2$ includes the Kuiper test (see, for example, [Kuiper]), the Watson test (see, for example, [watson]) and the Hodges–Ajne test (see, for example, [Ajne]). In arbitrary but fixed dimensions, the class of Sobolev-based tests was introduced in [Gine], and these tests were shown to be universally consistent against any absolutely continuous alternative with $L^{2}$-integrable densities. Notable developments of Sobolev tests include the data-driven procedures proposed by [Bogdan] and [Jupp]. Readers are referred to the survey papers [survey-uni; pewsey2021recent] for recent progress on this problem and for a list of recent testing procedures. The consistency and optimality of various testing procedures in fixed dimensions have been well studied in the literature, and can also be found in [survey-uni]. Recent results in the fixed-dimensional settings include [garcia2021cramer; garcia2023projection; fernandez2023new; boucher2025modified; boucher2025runs].

In the era of big data, there has been an increasing interest in high-dimensional directional statistics, in which the dimension is assumed to diverge to infinity. For example, in shape analysis and nonparametric statistics, a popular approach is to consider sign-based procedures, in which one projects the observations onto the hypersphere and carries out statistical inference based on the projected data. This approach is robust in high dimensions since the concentration of measure phenomenon implies that the majority of information from the data is captured by the directions rather than the magnitudes of the observations. We give a brief overview of the high-dimensional directional statistics literature below.

In [Dryden05], the author investigates the asymptotic properties of high-dimensional spherical distributions and their applications to brain shape modeling. Specifically, the study involved statistical modeling of a sample of $n=74$ MRI images of adult brains. After normalization, each brain image was represented as a unit vector with dimension $p=62,501$. A natural question in this modeling task is whether some simple, well-known distributions (such as the uniform distribution) provide a good fit for the data. Clustering analysis on high-dimensional hyperspheres has been studied in [Banerjee04; Banerjee03]. Potential applications of high-dimensional uniformity tests were illustrated in [Juan2001], in which the authors relate the multivariate outlier detection problem to the uniformity testing problem. Sign-based procedures in high dimensions have been considered in [Zou14] in the context of sphericity testing and in [WPL15], where the authors propose a high-dimensional nonparametric mean test.

From a different perspective than the directional statistical viewpoint discussed above, our primary motivation for studying the high-dimensional analog of (1) arises from deep learning theory. In overparameterized neural networks, regularization is crucial for preventing overfitting and improving generalization. In [xie2017diverse], it was shown that optimizing one-hidden-layer neural networks with approximately uniformly distributed neurons can help avoid spurious local minima. Furthermore, empirical studies in [lin2020regularizing; liu2018learning] have shown that regularization methods promoting uniformity among neurons effectively reduce the generalization error in deep networks. Such methods are fundamentally tied to the question of whether a random set of points on the unit hypersphere is approximately uniformly distributed. The overparameterized nature of deep networks makes it natural to study this question in the high-dimensional settings. Given the complexity of many deep networks, including heavy-tailed or strongly correlated structures (see [mahoney2019traditional] for details), we focus on detecting non-uniformity in a non-parametric manner. This perspective shifts the attention away from traditional parametric modeling goals—such as optimality and asymptotic local power within a parametric class of distributions—towards prioritizing simplicity of implementation and universal consistency.

Despite the vast literature concerning fixed-dimensional tests, much less is known about the uniformity testing problem in the high-dimensional context with diverging dimensions. When the dimension diverges to infinity, many of the existing procedures require highly non-trivial adjustments to work properly. Moreover, there is usually no tractable limiting distribution under uniformity, and the power is typically low due to the curse of dimensionality. To the best of the authors’ knowledge, only three high-dimensional tests have been investigated in the literature. We give a short overview of these tests below.

1. Rayleigh test in [Cutting-P-V] and [Ley-P]. This test can be formulated in terms of a U-statistic of the data points with the inner product kernel, i.e.

\displaystyle R_{n}:=\frac{\sqrt{2p}}{n}\sum_{1\leq i<j\leq n}\bm{X}^{\top}_{i}\bm{X}_{j}.

(2)

2. Bingham test in [Cutting-P-V2; Zou14] and [Ley-P]. This test is also based on a U-statistic of the data points, but with a quadratic inner product kernel, i.e.

\displaystyle B_{n}:=\frac{p}{n}\sum_{1\leq i<j\leq n}\Big[\left(\bm{X}^{\top}_{i}\bm{X}_{j}\right)^{2}-\frac{1}{p}\Big].

(3)

3. Packing test in [Jiang13]. This test is based on the smallest angle, i.e.

\displaystyle P_{n}:=p\cdot\max_{1\leq i<j\leq n}\left(\bm{X}^{\top}_{i}\bm{X}_{j}\right)^{2}-4\log n+\log\log n.

(4)
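To make the three statistics concrete, the following minimal sketch evaluates (2), (3) and (4) directly from their definitions. The helper names (`random_unit_vector`, `pairwise_inner_products`, `rayleigh_bingham_packing`) are hypothetical and introduced only for illustration. The code uses plain Python lists and costs $O(n^{2}p)$ operations, so it is meant for small examples rather than large-scale use.

```python
import math
import random


def random_unit_vector(p, rng):
    """Draw one point uniformly on S^{p-1} by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def pairwise_inner_products(points):
    """Return the list of X_i^T X_j over all pairs i < j."""
    n = len(points)
    return [
        sum(a * b for a, b in zip(points[i], points[j]))
        for i in range(n)
        for j in range(i + 1, n)
    ]


def rayleigh_bingham_packing(points):
    """Compute (R_n, B_n, P_n) exactly as written in (2), (3) and (4).

    Requires n >= 2 points, all of the same dimension p.
    """
    n, p = len(points), len(points[0])
    ips = pairwise_inner_products(points)
    r_n = math.sqrt(2.0 * p) / n * sum(ips)
    b_n = p / n * sum(rho * rho - 1.0 / p for rho in ips)
    p_n = p * max(rho * rho for rho in ips) - 4.0 * math.log(n) + math.log(math.log(n))
    return r_n, b_n, p_n
```

For instance, `rayleigh_bingham_packing([random_unit_vector(50, random.Random(0)) for _ in range(20)])` would evaluate the three statistics on a simulated uniform sample with $n=20$ and $p=50$.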

The asymptotic distributions of these test statistics, as well as their non-null behaviors, have been studied rigorously over the last few years; see, for example, [Cutting-P-V; Cutting-P-V2; Ley-P; Ley-P-2; Ley-P-V]. It is known that the Rayleigh test $R_{n}$ and the Bingham test $B_{n}$ enjoy a doubly robust property: under the null hypothesis and the single assumption $\min\left\{n,p\right\}\to\infty$, both $R_{n}$ and $B_{n}$ converge in distribution to the standard normal distribution. This feature is highly desirable since no restriction on the dependence between $p$ and $n$ is imposed, and neither resampling procedures nor tuning parameters are needed to obtain the critical values of such tests. Regarding the packing test $P_{n}$, it is known that, under the null hypothesis and the mild assumption $p\gg(\log n)^{2}$, $P_{n}$ converges in distribution to the Gumbel distribution with CDF $\exp\left(-(8\pi)^{-1/2}e^{-x/2}\right)$ (see [Jiang13] and also [Jiang12]).
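Since the limiting null laws above are explicit, asymptotic critical values can be obtained in closed form. The sketch below inverts the Gumbel-type CDF $\exp\left(-(8\pi)^{-1/2}e^{-x/2}\right)$ and evaluates an upper quantile of $N(0,1)$. The function names are hypothetical, and rejecting for large values of each statistic is an assumption made here for illustration, not a statement taken from the cited works.

```python
import math
from statistics import NormalDist

GUMBEL_K = (8.0 * math.pi) ** -0.5  # constant in the limiting CDF of P_n


def packing_limit_cdf(x):
    """Limiting null CDF of P_n: exp(-(8*pi)^{-1/2} * exp(-x/2))."""
    return math.exp(-GUMBEL_K * math.exp(-x / 2.0))


def packing_limit_quantile(q):
    """Closed-form inverse of packing_limit_cdf for 0 < q < 1."""
    return -2.0 * math.log(-math.log(q) / GUMBEL_K)


def normal_upper_quantile(alpha):
    """Upper-alpha quantile of N(0,1), the limiting null law of R_n and B_n.

    Assumption: a one-sided test that rejects for large values.
    """
    return NormalDist().inv_cdf(1.0 - alpha)
```

Under these assumptions, `packing_limit_quantile(0.95)` and `normal_upper_quantile(0.05)` would serve as level-$0.05$ asymptotic thresholds for $P_{n}$ and for $R_{n}$ or $B_{n}$, respectively.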

Each of these three tests has its own advantages and disadvantages. However, a common limitation is that they are each optimal only for a specific class of (parametric) alternatives: they perform well against certain models but may be essentially powerless outside those classes. Given the inherently nonparametric nature of the uniformity testing problem, it is therefore natural to prioritize robustness and optimality over a broad range of alternatives when designing testing procedures.

The primary objective of this article is to address this issue by approaching the problem (1) from a probabilistic and geometric perspective, rather than relying on the likelihood-based framework commonly used in statistics. We introduce a novel pseudometric to quantify deviations from uniformity (see (13) below). Unlike most classical distances between probability measures, this distance takes into account geometric deviations from uniformity; see Section 4 for further discussion. We then define a test statistic $T_{n}$ (see (8) below) that is naturally based on this distance and is universally consistent in fixed dimension. A key advantage of our test is its model-free nature: it imposes no structural assumptions on the underlying class of distributions, since it is not derived from likelihood inference. In high-dimensional settings, the proposed test enjoys a “doubly robust” property analogous to that of the Rayleigh and Bingham tests, yet remains intrinsically nonparametric. It admits a simple asymptotic theory in the high-dimensional regime, and comes with a consistency theory that is not restricted to any particular parametric class of alternatives. Our contributions can be summarized as follows.

• We propose a new distance $d$ (see (13) for a precise definition) to quantify deviations from uniformity. This distance does not require the alternatives to be absolutely continuous with respect to the uniform distribution and is therefore well suited for analyzing singular alternatives. A natural test statistic associated with this distance (see $T_{n}$ in (8)) is introduced and shown to converge in distribution to the supremum of a Brownian bridge under the null (Theorem 1).

• We prove a lower bound of order $1/n$ for testing with respect to $d$ (Theorem 3), and we show that the proposed test achieves this lower bound (Theorem 2). Both results are established without imposing any restriction between the sample size $n$ and the dimension $p$.

• We investigate how the distance $d$ behaves when restricted to parametric models, and how it relates to the minimax testing problem in those settings. In particular, we study two concrete models: the Fisher–von Mises–Langevin model and a low-rank uniform distribution model. We derive the local limiting distribution and the local power of $T_{n}$ in these two models (Propositions 2 and 3), and show that $T_{n}$ is asymptotically the supremum of a shifted Brownian bridge under such alternatives, where the shift is a smooth function that vanishes at the end points.

As a direct consequence of the local limiting distributions, we see that the minimax detection threshold with respect to $d$ coincides with the usual minimax detection rates within the corresponding parametric model (Propositions 7 and 8 in Appendix A.6). This means that the threshold at which $d\asymp 1/n$ matches the minimax rate for testing uniformity in the corresponding parametric model. This phenomenon seems to hold for other models as well; see the discussion in Section 4.

• As a byproduct of our analysis, we obtain an information-theoretic lower bound for testing the intrinsic dimension of the uniform distribution. This result is new and is of independent interest. In this low-rank uniform distribution model, the detection thresholds of the four tests $R_{n}$, $B_{n}$, $P_{n}$, and our proposed test can be identified precisely.

We find that in this low-rank model, only the Bingham test and our proposed test attain the optimal detection threshold, and that our test is the only one that achieves the optimal rate simultaneously in both of the parametric models considered above; see Table 1 below.

We would also like to point out that there are other approaches in the literature that are not based on likelihood inference, such as the family of Sobolev tests originally proposed in [Gine] and the projection-based tests introduced in [cuesta2009projection], with further developments in recent works [garcia2021cramer; garcia2023projection; fernandez2023new]. Each of these approaches uses a different characterization of the uniform distribution: the Sobolev tests rely on the eigenfunctions of the Laplacian, while the projection-based tests are based on the one-dimensional distributions obtained by projecting the data onto all possible directions. However, none of these tests extend easily to high-dimensional settings, as they either involve tuning parameters or require resampling methods to implement. In contrast, our proposed test offers a simple and interpretable asymptotic theory in high-dimensional settings, which we discuss below. A detailed comparison between our test and the class of projection-based tests is provided in Section A.1.

The rest of the paper is organized as follows. The proposed pseudometric and the test are presented in Section 2. The lower bounds and local limiting distributions are provided in Section 3. Further discussions on the proposed pseudometric and test can be found in Section 4. Section 5 contains the conclusions and some remarks. The proofs of the main results are presented in Section 6. Some simulations, technical results, proofs and discussions are provided in Appendix A.

2 Measuring uniformity deviation and testing procedure

2.1 Notation and preliminaries

Throughout the paper, we consider the hypersphere $\mathbb{S}^{p-1}=\left\{x\in\mathbb{R}^{p}:\|x\|_{2}=1\right\}$, where $\|\cdot\|_{2}$ is the Euclidean norm. Unless stated otherwise, we assume that the dimension $p=p_{n}$ diverges to infinity. The observed data points are denoted by $\bm{X}_{1},\bm{X}_{2},\dots,\bm{X}_{n}$ with $\bm{X}_{i}\in\mathbb{S}^{p-1}$ for all $i=1,2,\dots,n$. We assume that the $\bm{X}_{i}$ are drawn independently from an unknown distribution $\mu$ supported on the hypersphere $\mathbb{S}^{p-1}$. The uniform distribution on the hypersphere $\mathbb{S}^{p-1}$ is denoted by $\mbox{Unif}(\mathbb{S}^{p-1})$.

For a pair of data points $(\bm{X}_{i},\bm{X}_{j})$, we denote by $\bm{X}^{\top}_{i}\bm{X}_{j}$ their inner product. Under $H_{0}$, the distribution of $\bm{X}_{1}^{\top}\bm{X}_{2}$ is known to have the following explicit form (see, for example, Lemmas 11 and 12 in [Jiang13]):

\displaystyle\mathbb{P}\left(a\leq\bm{X}_{1}^{\top}\bm{X}_{2}\leq b\right)=\frac{1}{\sqrt{\pi}}\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p-1}{2}\right)}\cdot\int_{a}^{b}\left(1-\rho^{2}\right)^{\frac{p-3}{2}}d\rho.

(5)

In the formula (5) above, $a$ and $b$ are taken to be in $(-1,1)$ . The CDF and density of a standard normal distribution $N(0,1)$ will be denoted by $\Phi(t)$ and $\phi(t)$ , respectively.
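For numerical work, the null CDF of the inner product can be evaluated by integrating the density in (5). The sketch below is one possible way to do this with Simpson's rule, using the symmetry of the density about $0$. The names `null_density` and `null_cdf`, the restriction to $p\geq 3$, the number of grid steps, and the truncation of the integration range at $10/\sqrt{p}$ are all choices made here for illustration.

```python
import math


def null_density(rho, p):
    """Density of X_1^T X_2 under H_0 read off from (5); needs p >= 3, |rho| <= 1."""
    log_c = math.lgamma(p / 2.0) - math.lgamma((p - 1) / 2.0) - 0.5 * math.log(math.pi)
    if p == 3:
        return math.exp(log_c)  # exponent (p-3)/2 is zero: uniform on [-1, 1]
    base = 1.0 - rho * rho
    if base <= 0.0:
        return 0.0
    return math.exp(log_c + 0.5 * (p - 3) * math.log(base))


def null_cdf(t, p, steps=2000):
    """P(X_1^T X_2 <= t | H_0) via Simpson's rule on [0, |t|] plus symmetry."""
    t = max(-1.0, min(1.0, t))
    # Beyond 10/sqrt(p) the density is numerically negligible, so the range is
    # capped there to keep the grid fine relative to the 1/sqrt(p) scale.
    upper = min(abs(t), 10.0 / math.sqrt(p))
    if upper == 0.0:
        return 0.5
    h = upper / steps  # steps must be even for Simpson's rule
    total = null_density(0.0, p) + null_density(upper, p)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * null_density(k * h, p)
    half_mass = total * h / 3.0
    return max(0.0, min(1.0, 0.5 + math.copysign(half_mass, t)))
```

For $p=3$ the density in (5) is constant and equal to $1/2$, so `null_cdf(t, 3)` should reproduce $(1+t)/2$, which gives a simple check of the implementation.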

Throughout the paper, we will use $\mu_{0}$ or $\mathbb{P}_{0}$ to denote the uniform distribution. The notation $\mathbb{P}_{0n}$ is used to indicate the $n$ -fold product measures of the uniform distribution.

2.2 A characterization of the uniform distribution

Let us start with an important observation regarding the random inner product of two i.i.d. points on the hypersphere.

Proposition 1.

Let $\nu$ be any Borel probability measure on $\mathbb{S}^{p-1}$ , and let $\mu_{0}$ denote the uniform distribution on $\mathbb{S}^{p-1}$ . Suppose $\bm{X}_{1},\bm{Y}_{1}$ are drawn independently from $\nu$ , and $\bm{X},\bm{Y}$ are drawn independently from $\mu_{0}$ . If

\displaystyle\bm{X}_{1}^{\top}\bm{Y}_{1}\stackrel{{\scriptstyle d}}{{=}}\bm{X}^{\top}\bm{Y},

(6)

then $\nu\equiv\mu_{0}$ .

Proposition 1 establishes the identifiability of the uniform distribution in terms of the inner product, asserting that the distribution of the inner product uniquely characterizes $\mbox{Unif}(\mathbb{S}^{p-1})$. To the best of our knowledge, this characterization is new and may be of independent interest. Notably, we do not impose any regularity assumptions on the measure $\nu$ in the statement of Proposition 1, and the result holds even if $\nu$ is highly singular. For instance, the characterization applies to probability measures supported on sets of lower dimensions. Proposition 1 allows us to construct tests that work against any type of alternative, which distinguishes them from other omnibus tests in the literature that typically require alternatives to have $L^{2}$-integrable densities with respect to $\mbox{Unif}(\mathbb{S}^{p-1})$.

A related but distinct characterization of the uniform distribution in terms of projections was proposed in [cuesta2009projection]; see Section A.1 for further details and comparisons with the class of projection-based tests, which are built upon this characterization. Some recent extensions of this characterization to other types of distributions can be found in [fraiman2023cramer; fraiman2024application; fraiman2023quantitative]. Specifically, the characterization in [cuesta2009projection] is as follows. Let $\bm{X}_{1},\bm{X}_{2}$ be random vectors on $\mathbb{S}^{p-1}$ for some $p\geq 1$ and let $\bm{U}\sim\mbox{Unif}(\mathbb{S}^{p-1})$. Then, under some regularity conditions,

\displaystyle\bm{X}_{1}^{\top}\bm{U}\stackrel{{\scriptstyle d}}{{=}}\bm{X}_{2}^{\top}\bm{U}\Leftrightarrow\bm{X}_{1}\stackrel{{\scriptstyle d}}{{=}}\bm{X}_{2}.

(7)

Broadly speaking, the characterization (7) relies on projections onto independent, uniformly distributed directions. To construct testing procedures using (7), one often needs to sample the direction $\bm{U}$ repeatedly. In contrast, the left-hand side of (6) does not involve any uniformly distributed direction and is completely data-driven. Consequently, our method requires neither integrating over all directions $\bm{U}$, as was done in [escanciano2006consistent; garcia2023projection; fernandez2023new], nor sampling $\bm{U}$ repeatedly, as was done in [cuesta2009projection].

The key challenge in proving Proposition 1 is that the arguments used to establish (7) are no longer applicable. Specifically, the proof of the characterization (7) in [cuesta2007sharp; cuesta2009projection] relies on a sharp version of the Cramér–Wold device, which cannot be applied to (6) because $\bm{Y}_{1}$ and $\bm{Y}$ follow different laws. The proof of Proposition 1, instead, relies on a subtle application of the Lebesgue differentiation theorem, which will be detailed in Section 6.1.

The first motivation for our model-free test comes from Proposition 1 above. Intuitively, instead of carrying out inference directly on the data points, one can construct tests based on some unique features of the random inner product under $H_{0}$. Here we choose the CDF as such a feature: the test estimates the CDF of $\bm{X}^{\top}_{1}\bm{X}_{2}$ from the data and rejects $H_{0}$ if the estimated CDF differs too much from the true CDF $F_{0}$. Under $H_{0}$, we know that $\bm{X}^{\top}_{1}\bm{X}_{2}$ has a beta-type distribution with specific parameters (see equation (9) below). Another advantage of the random inner products is that they are one-dimensional objects, which are cheap to compute. Moreover, they become “asymptotically independent” as the dimension gets larger. The last phenomenon has been observed in a number of works related to Haar matrices; see, for example, [Jiang05; Jiang09; D-F] and the references therein.

Another motivation for our model-free test is based on the following observation regarding the three existing tests $R_{n}$, $B_{n}$ and $P_{n}$ defined in (2), (3) and (4), respectively. It is reasonable to argue that the primary cause of their model dependence is the inability to capture all the information available from the data. In particular, the Rayleigh test $R_{n}$ only takes advantage of the linear kernel, which is powerless for models involving axial data, such as the Watson distributions. On the other hand, the Bingham test $B_{n}$ uses a quadratic kernel and performs poorly on non-axial data. The packing test $P_{n}$ uses extreme values and, as pointed out in [Jiang13], it does not examine whether there is a gap in the data.

To fully utilize all information available from the data, we will look at the empirical distribution generated from all random inner products, instead of using a particular U-statistic. To be more precise, given the data $\bm{X}_{1},\bm{X}_{2},\dots,\bm{X}_{n}$, we define

\displaystyle\hat{\mu}_{n}=\frac{2}{n(n-1)}\sum_{1\leq i<j\leq n}\delta_{\bm{X}^{\top}_{i}\bm{X}_{j}}.

In light of Proposition 1, the empirical measure $\hat{\mu}_{n}$ intuitively captures the characteristics of the underlying unknown distribution. One can then compare $\hat{\mu}_{n}$ with $\bm{X}^{\top}_{1}\bm{X}_{2}\Big|_{H_{0}}$, the distribution of $\bm{X}^{\top}_{1}\bm{X}_{2}$ under $H_{0}$, via the Kolmogorov distance to see how far the data are from uniformity. To define the test formally, we write

\displaystyle T_{n}:=\sup_{t\in[-1,1]}\Big|\frac{2}{n(n-1)}\sum_{1\leq i<j\leq n}\mathbf{1}_{\left\{\bm{X}^{\top}_{i}\bm{X}_{j}\leq t\right\}}-m(t)\Big|,

(8)

where

\displaystyle m(t):=\mathbb{P}\left(\bm{X}^{\top}_{1}\bm{X}_{2}\leq t\Big|H_{0}\right)=\mathbb{P}\left(1-2U\leq t\right),

(9)

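and $U\sim\mbox{Beta}\left(\frac{p-1}{2},\frac{p-1}{2}\right)$. Indeed, substituting $\rho=1-2u$ in (5) gives $1-\rho^{2}=4u(1-u)$, so $U$ has density proportional to $\left(u(1-u)\right)^{\frac{p-3}{2}}$ on $(0,1)$.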
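The supremum in (8) can be computed exactly from the sorted inner products, in the same way as a one-sample Kolmogorov–Smirnov statistic: since the empirical CDF is a step function and $m$ is nondecreasing, it suffices to compare $m$ with the empirical CDF just before and at each jump. The sketch below follows this standard shortcut, which is a computational device assumed here rather than a procedure described in the paper. The function name `t_n_statistic` is hypothetical, and the null CDF $m$ is passed in as a callable so that any implementation of (9) can be used.

```python
def t_n_statistic(points, null_cdf):
    """Evaluate T_n in (8) for points on the sphere, given a callable t -> m(t)."""
    n = len(points)
    # All pairwise inner products X_i^T X_j with i < j, sorted increasingly.
    ips = sorted(
        sum(a * b for a, b in zip(points[i], points[j]))
        for i in range(n)
        for j in range(i + 1, n)
    )
    total = len(ips)
    stat = 0.0
    for k, rho in enumerate(ips, start=1):
        # Clamp to [-1, 1] to guard against floating-point rounding.
        m = null_cdf(max(-1.0, min(1.0, rho)))
        # Compare m with the empirical CDF at the jump (k/total) and just
        # before it ((k-1)/total); the supremum is attained at one of these.
        stat = max(stat, k / total - m, m - (k - 1) / total)
    return stat
```

As a small check, for $p=3$ the density in (5) is constant, so $m(t)=(1+t)/2$. With the three coordinate vectors of $\mathbb{R}^{3}$ as data, all inner products vanish and the function returns $|1-m(0)|=1/2$.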

2.3 Testing procedure

In what follows, we assume that $p$ depends on $n$, and we sometimes write $p=p_{n}$ for clarity. Recall that the Brownian bridge $\left\{B_{t};0\leq t\leq 1\right\}$ has the same distribution as $\left\{W_{t}-tW_{1};0\leq t\leq 1\right\}$, where $\left\{W_{t}\right\}$ is a standard Brownian motion. Our next result establishes the asymptotic distribution of $T_{n}$ under $H_{0}$.

Theorem 1.

Let $\left\{B_{t}\right\}_{0\leq t\leq 1}$ be a standard Brownian bridge on $[0,1]$ . If $\min\left\{n,p\right\}\to\infty$ , then

\sqrt{\frac{n(n-1)}{2}}T_{n}\xrightarrow{d}\max_{t\in[0,1]}|B_{t}|

where $T_{n}$ is defined in (8).

The asymptotic distribution in Theorem 1 is the same as that of the classical nonparametric Kolmogorov–Smirnov test. The distribution of the maximum of the absolute value of a Brownian bridge is known explicitly:

\displaystyle\mathbb{P}\left(\max_{t\in[0,1]}|B_{t}|>x\right)=2\sum_{k=1}^{\infty}(-1)^{k+1}\exp\left(-2k^{2}x^{2}\right);

(10)

see, for example, [brownian-bridge].
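The series (10) converges very quickly for moderate $x$, so the tail probability and the corresponding quantiles are easy to evaluate numerically. The following sketch truncates the series and finds the critical value by bisection. The function names, the truncation at 100 terms, and the bracketing interval $[0.2,5]$ are illustrative choices, not taken from the paper.

```python
import math


def kolmogorov_tail(x, terms=100):
    """P(max_t |B_t| > x) from the alternating series (10), truncated."""
    if x <= 0.0:
        return 1.0
    return 2.0 * sum(
        (-1) ** (k + 1) * math.exp(-2.0 * k * k * x * x)
        for k in range(1, terms + 1)
    )


def critical_value(alpha, lo=0.2, hi=5.0, tol=1e-10):
    """Solve kolmogorov_tail(c) = alpha by bisection (the tail decreases in c)."""
    if not kolmogorov_tail(hi) < alpha < kolmogorov_tail(lo):
        raise ValueError("alpha is outside the range covered by the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kolmogorov_tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With $\alpha=0.05$, this routine should return a value close to $1.36$, in agreement with the standard Kolmogorov–Smirnov tables.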

At first glance, one can see that $T_{n}$ is the supremum of a degenerate U-process. Moreover, the underlying distribution of the U-process is allowed to change with the dimension. To the best of our knowledge, this situation has not been explored in the literature, although limit theory for non-degenerate U-processes is well studied. To establish the null distribution of $T_{n}$, we rely on a special property of the uniform distribution on the sphere: under $H_{0}$, the normalized inner products $\sqrt{p}\bm{X}^{\top}_{i}\bm{X}_{j}$ are pairwise independent and asymptotically normal. Moreover, as discussed above, the dependence among them weakens as the dimension increases. Thus, one should expect the asymptotic distribution of $T_{n}$ to be the same as that of the classical Kolmogorov–Smirnov test in the i.i.d. setting. Remarkably, Theorem 1 presents a stark departure from classical results concerning degenerate U-processes with a fixed distribution. Typically, the asymptotic distributions of such statistics lack closed-form expressions; see (12) below. However, the divergence of $p$ yields a convenient asymptotic distribution, as demonstrated in Theorem 1.

Thanks to Theorem 1 and the expression (10), one can easily calculate the critical value $c_{\alpha}$ for the $\alpha$-level test. The test rejects the null hypothesis if $T_{n}\geq c_{\alpha}\sqrt{2/(n(n-1))}$, where $c_{\alpha}$ is chosen such that

\mathbb{P}\left(\max_{t\in[0,1]}|B_{t}|\leq c_{\alpha}\right)=1-\alpha.

The quantile $c_{\alpha}$ of the Kolmogorov–Smirnov distribution is well understood in the literature and can be calculated precisely via (10); for example, $c_{\alpha}\approx 1.36$ for $\alpha=0.05$. This is also the critical value we use in the simulations. From now on, we will use $\phi_{n}$ to denote the $\alpha$-level test based on $T_{n}$, defined by

\displaystyle q_{n}(\alpha):=\frac{\sqrt{2}c_{\alpha}}{\sqrt{n(n-1)}}~~~~~\text{and}~~~~~\phi_{n}:=\mathbf{1}_{\left\{T_{n}\geq q_{n}(\alpha)\right\}},

(11)

where $T_{n}$ is defined in (8).
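The decision rule (11) amounts to a single comparison once $T_{n}$ and $c_{\alpha}$ are available. A minimal sketch is given below, with hypothetical function names. The default `c_alpha=1.36` is the value quoted in the text for the $0.05$ level, and the value of $T_{n}$ is assumed to have been computed separately.

```python
import math


def rejection_threshold(n, c_alpha):
    """q_n(alpha) = sqrt(2) * c_alpha / sqrt(n (n - 1)) from (11)."""
    return math.sqrt(2.0) * c_alpha / math.sqrt(n * (n - 1))


def phi_n(t_n, n, c_alpha=1.36):
    """Return 1 (reject H_0) when T_n >= q_n(alpha), and 0 otherwise."""
    return 1 if t_n >= rejection_threshold(n, c_alpha) else 0
```

Note that the threshold shrinks at rate $1/n$ rather than $1/\sqrt{n}$, because $T_{n}$ is built from $n(n-1)/2$ pairwise inner products.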

From Theorem 1, one can see that the test $\phi_{n}$ is doubly robust: there is no restriction on the way $p$ diverges to infinity. Among the three known high-dimensional tests, only the Rayleigh test $R_{n}$ in (2) and the Bingham test $B_{n}$ in (3) satisfy this property. The packing test $P_{n}$ requires the mild regularity condition $p/(\log n)^{2}\to\infty$ and thus is not doubly robust. It is worth noting that the test $\phi_{n}$ is also valid in the fixed-$p$ scenario, albeit without the asymptotic distribution provided in Theorem 1. Indeed, when $p$ is fixed, it is known that

\displaystyle\sqrt{\frac{n(n-1)}{2}}T_{n}\xrightarrow{d}\max_{t\in[0,1]}|Q_{t}|

(12)

where $T_{n}$ is defined in (8) and $\left\{Q_{t}\right\}_{0\leq t\leq 1}$ is a stochastic process whose marginal distributions are linear combinations of chi-squared distributions; see Theorem 7 in [nolan1988functional]. The asymptotic distribution in (12) does not have a tractable expression, and Monte Carlo simulation is needed to approximate the test’s critical value.

2.4 Model-free consistency

For any two probability measures $\mu$ and $\nu$ supported on $\mathbb{S}^{p-1}$ , define the pseudometric

\displaystyle d(\mu,\nu):=\sup_{t\in[-1,1]}\Big|\mathbb{P}_{\mu}\left(\bm{X}_{1}^{\top}\bm{Y}_{1}\leq t\right)-\mathbb{P}_{\nu}\left(\bm{X}^{\top}\bm{Y}\leq t\right)\Big|

(13)

where $\bm{X}_{1}$ , $\bm{Y}_{1}$ are drawn independently from $\mu$ and $\bm{X}$ , $\bm{Y}$ are drawn independently from $\nu$ .
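As a quick sanity check on the definition (13), two extreme cases can be worked out by hand. The computation below is an illustrative sketch that assumes $p\geq 2$, so that $m$ in (9) is continuous with $m(-1)=0$ and $m(1)=1$; the point $x$ is an arbitrary fixed element of $\mathbb{S}^{p-1}$ introduced only for this example.

```latex
% Illustrative computation (not from the source); assumes p >= 2 and x fixed in S^{p-1}.
\begin{align*}
\mu=\delta_{x}:\quad
& \bm{X}_{1}^{\top}\bm{Y}_{1}=x^{\top}x=1 \ \text{a.s.},\\
& d(\delta_{x},\mu_{0})=\sup_{t\in[-1,1]}\big|\mathbf{1}_{\{t\geq 1\}}-m(t)\big|=\sup_{t<1}m(t)=1.\\[4pt]
\mu=\tfrac{1}{2}\delta_{x}+\tfrac{1}{2}\delta_{-x}:\quad
& \mathbb{P}\big(\bm{X}_{1}^{\top}\bm{Y}_{1}=1\big)=\mathbb{P}\big(\bm{X}_{1}^{\top}\bm{Y}_{1}=-1\big)=\tfrac{1}{2},\\
& d(\mu,\mu_{0})=\sup_{t\in[-1,1)}\big|\tfrac{1}{2}-m(t)\big|=\tfrac{1}{2}.
\end{align*}
```

Both measures are singular with respect to $\mu_{0}$, yet $d$ is well defined and bounded away from zero in each case.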

The distance $d$ in (13) is only a pseudometric, in the sense that $d(\mu,\nu)=0$ does not imply $\mu\equiv\nu$; for instance, any rotation of $\mu$ has the same inner-product distribution as $\mu$. However, in light of Proposition 1, $d(\mu,\mbox{Unif}\left(\mathbb{S}^{p-1}\right))=0$ implies $\mu\equiv\mbox{Unif}\left(\mathbb{S}^{p-1}\right)$. Thus, the pseudometric $d$ can be used as a quantitative measure of the deviation from the null. Based on this pseudometric, we define a consistency criterion, namely the $1/n$-separation condition. The precise definition can be formulated as follows.

Condition 1 (separation condition).

Let $\left\{(n,p_{n});{n\geq 1}\right\}$ be a given sequence, and let $\mu_{n}$ be a sequence of probability measures on $\mathbb{S}^{p_{n}-1}$. We say that the sequence $\left\{\mu_{n}\right\}_{n\geq 1}$ satisfies the $1/n$-separation condition if

\displaystyle n\cdot d\left(\mu_{n},\mbox{Unif}\left(\mathbb{S}^{p_{n}-1}\right)\right)\to\infty

(14)

where $d$ is the pseudometric defined in (13).

The separation condition (14) measures the departure from the null hypothesis in terms of the pseudometric $d$, which is the Kolmogorov distance between the distributions of the random inner products drawn under $H_{0}$ and $H_{1}$. Interestingly, the rate in (14) is of order $n^{-1}$, which differs from the usual $n^{-1/2}$ rate. This is due to the degenerate nature of $H_{0}$ and the form of $d$. Moreover, condition (14) is nonparametric in nature and requires neither parametric assumptions nor regularity conditions: the sequence of alternatives $\mu_{n}$ may or may not have densities with respect to the uniform measure $\mu_{0}$, and it is not restricted to any parametric class of distributions that contains $\mbox{Unif}(\mathbb{S}^{p-1})$.

We do not require $p_{n}$ to converge to infinity in (14), so the fixed-$p$ setting is also covered. The assumption that $p$ diverges is only required to control the size of the test via the asymptotic result in Theorem 1. In the fixed-$p$ scenario, one can use Monte Carlo simulation to approximate the test’s critical value, as stated after (12). If we keep $p$ fixed and consider a fixed alternative, then by Proposition 1, (14) always holds. Therefore, in the fixed-dimensional case, (14) is the same as the universal consistency property of the Sobolev tests. Furthermore, condition (14) remains valid in high-dimensional settings, making it a natural analogue of the universal consistency property that operates in both fixed and high-dimensional cases. Given the separation condition (14), the test $\phi_{n}$ can be shown to be consistent.

Theorem 2.

Let $\mu_{n}$ be a sequence of probability measures on $\mathbb{S}^{p_{n}-1}$ which satisfies the separation condition (14). Then, $\lim_{n\to\infty}\mathbb{P}_{\mu_{n}}(\phi_{n}=1)=1$ . Here $\phi_{n}=\phi_{n}(\bm{X}_{1},\bm{X}_{2},\dots,\bm{X}_{n})$ is the test in (11).

The rate $1/n$ in condition (14) is sharp: we prove a matching lower bound in Theorem 3 below, which also does not impose any restriction between $p$ and $n$. Interestingly, if one restricts the model to a parametric class, the threshold at which the distance $d$ scales like $1/n$ often coincides with the minimax rate within that model. We do not have a proof or a result of this type for an arbitrary sequence of alternatives. However, we investigate in detail below two examples of this type, and derive the local limiting distributions along a sequence of local alternatives at the minimax thresholds.

3 Lower bound and non-null results

3.1 An information lower bound

Define the set of test functions based on a sample of size $n$ as

\displaystyle\mathcal{T}_{n}:=\left\{\phi=\phi\left(\bm{X}_{1},\bm{X}_{2},\dots,\bm{X}_{n}\right):\left(\mathbb{S}^{p-1}\right)^{n}\to\left\{0,1\right\}\right\}.

(15)

Using Le Cam’s mixture argument, we can show the following.

Theorem 3.

Suppose $\min\left\{p,n\right\}\to\infty$ . For $\varepsilon$ small enough, we have

\displaystyle\liminf_{n\to\infty}\left\{\inf_{\phi\in\mathcal{T}_{n}}\left\{\mathbb{P}_{\mu_{0}}\left(\phi=1\right)+\sup_{d(\nu,\mu_{0})\geq\frac{\varepsilon}{n}}\mathbb{P}_{\nu}\left(\phi=0\right)\right\}\right\}\geq 1/4.

(16)

In fixed-dimensional settings, Theorem 3 is straightforward to prove since one can directly apply Le Cam’s two-point argument to a perturbation of size $\Theta(1/\sqrt{n})$ of the uniform distribution. The non-trivial aspect of Theorem 3 lies in establishing the result in high-dimensional settings, and doing so without imposing any condition on the relative growth of $p$ and $n$ .

The worst-case construction in the proof of Theorem 3 is based on the Fisher–von Mises–Langevin (FvML) distributions. This choice is motivated by simulation results showing that the test exhibits power very close to that of the Rayleigh test, which is the optimal invariant test within this model [Cutting-P-V].

3.2 Local limiting distribution under the FvML alternatives

The FvML distributions are one of the most common types of alternatives for uniformity testing and have been investigated in a recent line of work [Cutting-P-V; Cutting-P-V2]; see also the references therein. To describe the FvML distributions, let us introduce a general class of “monotone” rotationally symmetric densities following [Cutting-P-V; Cutting-P-V2].

Let $f:\mathbb{R}\to\mathbb{R}^{+}$ be a smooth and strictly increasing function. Define the family of densities

\bm{x}\mapsto c_{f,\kappa}\cdot f\left(\kappa\left(\bm{x}^{\top}\bm{\mu}\right)\right)d\mu_{0}\left(\bm{x}\right).

Here $\kappa>0$ is the concentration parameter and $\bm{\mu}\in\mathbb{S}^{p-1}$ is the location parameter. Most of the common distributions in directional statistics belong to this class of distributions. Two common choices are

•

Watson distributions. This corresponds to the case $f(x)=e^{x^{2}}$ [Cutting-P-V2].
•

FvML distributions. This corresponds to the case $f(x)=e^{x}$ [Cutting-P-V].
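For numerical experiments with the FvML family, a simple (not especially efficient) rejection sampler suffices when $\kappa$ is moderate. The sketch below assumes that the FvML density is proportional to $\exp(\kappa\,\bm{x}^{\top}\bm{\mu})$ with respect to $\mu_{0}$ , as in Section 6.2; the function names are hypothetical, and specialized samplers would be preferable for large $\kappa$ .

```python
import math
import random


def uniform_sphere(p, rng):
    """Draw one point uniformly on S^{p-1} by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def sample_fvml(mu, kappa, rng):
    """Rejection sampling: propose from Unif(S^{p-1}) and accept with
    probability exp(kappa * (x^T mu - 1)), which is at most 1."""
    while True:
        x = uniform_sphere(len(mu), rng)
        dot = sum(a * b for a, b in zip(x, mu))
        if rng.random() < math.exp(kappa * (dot - 1.0)):
            return x


rng = random.Random(1)
mu = [1.0, 0.0, 0.0, 0.0, 0.0]
sample = [sample_fvml(mu, kappa=2.0, rng=rng) for _ in range(500)]
# Since mu is the first basis vector, x[0] equals x^T mu; its mean is positive under FvML.
mean_dot = sum(x[0] for x in sample) / len(sample)
```

The acceptance probability decays roughly like $e^{-\kappa}$ , so this approach is only practical for small or moderate concentration, which is the relevant range near the detection threshold.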

In this subsection, we investigate the local power and consistency of the test $T_{n}$ under the class of FvML distributions. It is known that within the class of FvML distributions, the threshold $\kappa\sim p^{3/4}/\sqrt{n}$ is the minimax rate: when $\kappa$ is below this threshold, no rotationally invariant test can be consistent. Moreover, when $\kappa$ is above this threshold, the Rayleigh test is consistent and is also optimal in the sense of Le Cam.

Let $\Phi^{-1}:[0,1]\to\mathbb{R}$ be the quantile function of the standard normal distribution, and let $\phi$ be the standard Gaussian density, $\phi(x)=(1/\sqrt{2\pi})\exp\left(-x^{2}/2\right)$ . Our main result regarding the FvML alternatives is the following.

Proposition 2.

Let $\kappa_{n}=\tau_{n}p^{3/4}/\sqrt{n}$ . If $\tau_{n}\to\tau\in(0,\infty)$ , then

T_{n}\stackrel{{\scriptstyle d}}{{\to}}\sup_{t\in[0,1]}\left|B_{t}-\frac{\tau^{2}}{\sqrt{2}}\phi\left(\Phi^{-1}(t)\right)\right|

under the sequence of FvML alternatives with concentration parameter $\kappa_{n}$ , where $\left\{B_{t};0\leq t\leq 1\right\}$ is the Brownian bridge.

It follows directly from Proposition 2 that the asymptotic power of $T_{n}$ under the class of FvML distributions is given by

\mathbb{P}\left(\sup_{t\in[0,1]}\left|B_{t}-\frac{\tau^{2}}{\sqrt{2}}\phi\left(\Phi^{-1}(t)\right)\right|\geq q_{\alpha}\right).

From the display above, we can see that $T_{n}$ detects FvML alternatives at the contiguity rate $p^{3/4}/\sqrt{n}$ , with power tending to one as $\tau\to\infty$ . By Proposition 7 in Appendix A.6, we get

n\times d\left(\mbox{FvML}\left(\tau_{n}p^{3/4}/\sqrt{n}\right),\mu_{0}\right)\to\frac{\tau^{2}}{\sqrt{2\pi}}.

The display above indicates that the local alternatives at the minimax threshold for $d$ coincide with those of the parametric FvML model. In other words, the distance $d$ captures precisely the minimax rate of testing uniformity in the FvML model.

The asymptotic power above does not have a closed-form expression, but we observe in simulations that it is slightly lower than that of the Rayleigh test, which is expected in view of the LAN expansion in [Cutting-P-V]. Note that in this regime, the Packing test $P_{n}$ and the Bingham test $B_{n}$ are both blind. We further know from [Cutting-P-V2] that the detection threshold for the Bingham test in this model is $p^{3/4}/n^{1/4}$ .
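Although the limiting power has no closed form, it can be approximated by simulating the drifted Brownian bridge of Proposition 2 on a grid. The following sketch is only illustrative: the grid size, the number of replications, and the use of $q_{0.05}\approx 1.358$ (the classical 95% quantile of $\sup_{t}|B_{t}|$ ) are choices made here, the function names are hypothetical, and the discretization slightly underestimates the supremum.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def fvml_drift(t, tau):
    """Drift (tau^2 / sqrt(2)) * phi(Phi^{-1}(t)) from Proposition 2."""
    return tau ** 2 / math.sqrt(2.0) * STD.pdf(STD.inv_cdf(t))


def sup_drifted_bridge(tau, m, rng):
    """One draw of max_j |B_{j/m} - drift(j/m)| for j = 1, ..., m - 1."""
    steps = [rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(m)]
    w_end = sum(steps)
    w, best = 0.0, 0.0
    for j in range(1, m):
        w += steps[j - 1]
        t = j / m
        bridge = w - t * w_end  # B_t = W_t - t * W_1
        best = max(best, abs(bridge - fvml_drift(t, tau)))
    return best


def asymptotic_power(tau, q_alpha=1.358, m=200, reps=400, seed=0):
    """Monte Carlo estimate of P(sup_t |B_t - drift(t)| >= q_alpha)."""
    rng = random.Random(seed)
    hits = sum(sup_drifted_bridge(tau, m, rng) >= q_alpha for _ in range(reps))
    return hits / reps


power_null = asymptotic_power(0.0)  # tau = 0: should be roughly the nominal level
power_alt = asymptotic_power(3.0)   # large tau: the drift dominates the bridge
```

Replacing `fvml_drift` by another drift function gives the analogous approximation for other local alternatives with a limit of the same form.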

3.3 Local limiting distribution under a low-rank model

Consider the set of $k$ -dimensional linear subspaces of $\mathbb{R}^{p}$ . We denote this set by $G(k,p)$ , which is known as the Grassmannian manifold; see [chikuse2003statistics] for a comprehensive overview.

We are interested in testing uniformity against the class of low-rank uniform distributions:

\displaystyle H_{1}:\quad\exists\ (k,\mathcal{H})\ \text{s.t.}\ \mathcal{H}\in G(k,p)\text{ and }\mu\sim\mathrm{Unif}\!\left(\mathcal{H}\cap\mathbb{S}^{p-1}\right)

(17)

for some $k\in\{1,\dots,p-1\}$ .

Obviously, the case $k=p$ corresponds to the null $H_{0}$ . The problem is essentially about detecting whether the uniform distribution has a low-rank structure in it. We are interested in the hard regime where $k$ is close to $p$ , and we will thus assume that $k=k_{n}$ such that $\min\left\{k,p,n\right\}\to\infty$ and $k/p\to 1$ . In this regime, we can show that

Proposition 3.

Suppose $p/n\to\infty$ and $\left(1-k/p\right)n\to\tau\in(0,\infty)$ . Let $\left\{B_{t}\right\}$ be the Brownian bridge. Then,

T_{n}\stackrel{{\scriptstyle d}}{{\to}}\sup_{t\in[0,1]}\left|B_{t}-\frac{\tau}{2\sqrt{2}}\cdot\Phi^{-1}(t)\phi\left(\Phi^{-1}(t)\right)\right|

where $\Phi^{-1}$ and $\phi$ are the quantile function and density function of a standard normal distribution, respectively.

The proof of Proposition 3 follows directly from Proposition 8 in Appendix A.5 and Theorem 1. Proposition 3 shows that the test $T_{n}$ detects alternatives at the threshold $k=(1-\tau/n)p$ , with asymptotic power given by

\mathbb{P}\left(\sup_{t\in[0,1]}\left|B_{t}-\frac{\tau}{2\sqrt{2}}\cdot\Phi^{-1}(t)\phi\left(\Phi^{-1}(t)\right)\right|\geq q_{\alpha}\right).

It is natural to ask whether the rate $k=(1-\Omega(1/n))p$ is optimal. The answer is yes, which is the claim of the theorem below.

Theorem 4.

Suppose $p/n\to\infty$ and put $\delta(k)=(1-k/p)n$ . Then, for some $\varepsilon>0$ sufficiently small, we have

\liminf_{n\to\infty}\left\{\inf_{\phi\in\mathcal{T}_{n}}\left\{\mathbb{P}_{\mu_{0}}\left(\phi=1\right)+\sup_{\mu_{k}\in H_{1}:\delta(k)\geq\varepsilon}\left\{\mathbb{P}_{\mu_{k}}\left(\phi=0\right)\right\}\right\}\right\}\geq 1/4

where $\mathcal{T}_{n}$ is the class of tests based on the data as defined in (15) and the supremum is taken over all alternatives $\mu_{k}$ of the form (17) such that $\delta(k)\geq\varepsilon$ .

Theorem 4 states that in the high-dimensional regime, as long as $(1-k/p)n$ is small enough, no test based on a sample of size $n$ can be consistent. This suggests that the test $T_{n}$ is rate-optimal in this model. To the best of our knowledge, this information lower bound is new. The most technical part of the proof is the analysis of the likelihood ratio against a random distribution over the Grassmannian $G(k,p)$ .

By Proposition 8 in the Appendix A.6, we have

\displaystyle n\times d\left(\mbox{Unif}\left(H_{k}\cap\mathbb{S}^{p-1}\right),\mu_{0}\right)\to\frac{\tau}{2}\cdot\sup_{u\in\mathbb{R}}|u\phi(u)|

where $\phi$ is the standard Gaussian density. Therefore, the local alternatives at the minimax threshold for $d$ coincide with those of the low-rank model. In other words, the distance $d$ captures precisely the minimax rate of testing uniformity in the low-rank model.
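The limit above can be checked numerically. Inner products of independent draws from $\mathrm{Unif}(H_{k}\cap\mathbb{S}^{p-1})$ have the same law as inner products of uniform points on $\mathbb{S}^{k-1}$ , so the distance reduces to a Kolmogorov distance between two explicit one-dimensional laws. The sketch below (hypothetical helper names; the quadrature and grid sizes are arbitrary choices) evaluates it by quadrature and compares it with the first-order prediction $(1-k/p)\cdot\frac{1}{2}\sup_{u}|u\phi(u)|$ , where $\sup_{u}|u\phi(u)|=\phi(1)=1/\sqrt{2\pi e}\approx 0.242$ .

```python
import math


def inner_product_cdf(dim, m=8000):
    """CDF of X^T Y for X, Y i.i.d. uniform on S^{dim-1} (dim >= 2).

    Uses X^T Y = cos(Theta), where Theta has density proportional to
    sin(theta)^(dim-2) on [0, pi], integrated with the midpoint rule.
    """
    h = math.pi / m
    weights = [math.sin((j + 0.5) * h) ** (dim - 2) for j in range(m)]
    total = sum(weights)
    cum = [0.0]
    for w in weights:
        cum.append(cum[-1] + w / total)  # cum[j] approximates P(Theta <= j * h)

    def cdf(t):
        theta = math.acos(max(-1.0, min(1.0, t)))
        pos = theta / h
        j = min(int(pos), m - 1)
        below = cum[j] + (pos - j) * (cum[j + 1] - cum[j])
        return 1.0 - below  # P(cos Theta <= t) = P(Theta >= arccos t)

    return cdf


def low_rank_distance(k, p, grid=801, span=4.0):
    """max over t = u / sqrt(p), u in [-span, span], of |F_k(t) - F_p(t)|."""
    f_k, f_p = inner_product_cdf(k), inner_product_cdf(p)
    best = 0.0
    for i in range(grid):
        u = -span + 2.0 * span * i / (grid - 1)
        t = u / math.sqrt(p)
        best = max(best, abs(f_k(t) - f_p(t)))
    return best


p, k = 400, 360
d_numeric = low_rank_distance(k, p)
d_first_order = (1.0 - k / p) * 0.5 / math.sqrt(2.0 * math.pi * math.e)
```

The two numbers agree only to first order in $1-k/p$ , so a discrepancy of a few percent is expected for the values of $k$ and $p$ used here.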

Let us now compare the local power of the four tests $R_{n},B_{n},P_{n},T_{n}$ . A nice feature of this low-rank model is that the detection thresholds of all four tests can be computed precisely.

•

Recall the Rayleigh test $R_{n}$ from (2). It is easy to check that $\sqrt{p/k}\cdot R_{n}\stackrel{{\scriptstyle d}}{{\to}}N\left(0,1\right)$ as $n\to\infty$ . Thus, $R_{n}$ is not consistent, even when $k/p\to 0$ . Its maximum power will not exceed $1/2$ . However, a two-sided version of $R_{n}$ , which rejects if $|R_{n}|$ is large, is consistent in the regime $k/p\to 0$ .
•

For the Bingham test $B_{n}$ in (3), we have

$\frac{k}{p}\left[B_{n}-\frac{p(n-1)}{2}\left(\frac{1}{k}-\frac{1}{p}\right)\right]\stackrel{{\scriptstyle d}}{{\to}}N\left(0,1\right)$

as $n\to\infty$ , under $H_{1}$ . Therefore, in the regime $n(1-k/p)\to\tau\in(0,\infty)$ , the asymptotic power of the Bingham test is

$1-\Phi\left(z_{\alpha}-\tau/2\right)$

where $z_{\alpha}$ is the $(1-\alpha)$ -quantile of the standard normal distribution. This shows that the Bingham test achieves the optimal rate as suggested by Theorem 4, with local power given by the above.
•

Finally, regarding the Packing test $P_{n}$ in (4), we have

$\frac{p}{k}\cdot P_{n}-\left(1-\frac{k}{p}\right)\left(4\log n-\log\log n\right)\to G$

where $G$ is a standard Gumbel law. Thus, the test $P_{n}$ is consistent iff $(\log n)(1-k/p)\to\infty$ . This detection threshold is strictly sub-optimal, but is still better than the Rayleigh test.

From the above, we can see that only the proposed test $T_{n}$ and the Bingham test $B_{n}$ achieve the optimal rate. Although the power function of $T_{n}$ does not have a closed-form expression, we find in simulation studies that the local power of the Bingham test is greater than that of the proposed test.
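For reference, the closed-form local power of the Bingham test stated above is easy to tabulate; the helper name `bingham_local_power` is ours. The corresponding curve for $T_{n}$ has to be simulated from the limit in Proposition 3 and is not computed here.

```python
from statistics import NormalDist


def bingham_local_power(tau, alpha=0.05):
    """Asymptotic power 1 - Phi(z_alpha - tau / 2) when n(1 - k/p) -> tau."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1.0 - alpha)  # (1 - alpha)-quantile of N(0, 1)
    return 1.0 - std.cdf(z_alpha - tau / 2.0)


# Power at a few values of tau; tau = 0 recovers the nominal level alpha.
powers = {tau: bingham_local_power(tau) for tau in (0.0, 1.0, 2.0, 4.0, 8.0)}
```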

4 When are the distance $d$ and test $T_{n}$ useful?

In this section, we discuss several advantages of the distance $d$ and explain why the proposed test $T_{n}$ is useful. First, the distance $d$ is a measure of “symmetry” and differs from classical metrics between probability measures such as total variation, Hellinger, or chi-squared distances. These distances are not tailored to the orthogonally invariant structure of the problem and, in particular, they do not reflect geometric features such as concentration along lower-dimensional subspaces.

To illustrate the difference with such metrics, consider the class of low-rank uniform distributions introduced in Section 3.3. In terms of total variation distance, we always have

d_{\rm TV}\!\left(\mathrm{Unif}\!\left(\mathbb{S}^{p-1}\right),\,\mathrm{Unif}\!\left(H\cap\mathbb{S}^{p-1}\right)\right)=1

for all subspaces $H$ with dimension less than or equal to $p-1$ .

Thus, density-based distances such as total variation are not well-suited for alternatives that are singular with respect to the uniform distribution. In contrast, the distance $d$ is sensitive to geometric deviations: it detects that low-rank uniform distributions have large “empty regions” compared to the standard uniform distribution, and it also captures the concentration patterns of FvML distributions at the optimal rate. We also note that the Wasserstein distance is another metric that can detect geometric features and is sensitive to changes in intrinsic dimension. However, constructing tests based on the Wasserstein distance is substantially more involved, both analytically and computationally.

Another interesting feature of the distance $d$ , for which we do not yet have a fully general theory, is that, when restricted to many parametric, high-dimensional classes of distributions, the threshold at which $nd\left(\mu_{n},\mu_{0}\right)$ converges to a non-zero limit often coincides with the minimax rate of testing uniformity within that family. One can show this for some other models, such as the Watson distributions or the spiked covariance distributions. An asymptotic expansion of the distance $d$ along a parametric model can often be obtained via an Edgeworth-type expansion (although the computation can be tedious).

Regarding the test $T_{n}$ , we observe that it achieves the optimal detection rates in both the FvML model and the low-rank model, with explicit power functions available in each case, albeit without being the locally most powerful test in either setting. As discussed above, we believe that $T_{n}$ is rate-optimal for a broad range of parametric models; that is, we conjecture that the threshold at which $nd(\mu_{n},\mu_{0})$ converges to a positive limit coincides with the minimax rate for testing uniformity in that parametric model. This has been verified for the two models considered in Sections 3.2 and 3.3. One can also establish this correspondence for the Watson distributions via an Edgeworth-type expansion, although deriving the local limiting distributions would require a LAN expansion similar to that in [Cutting-P-V]. At present, however, we are not aware of a unified framework for computing the local power of $T_{n}$ across different parametric families.

In what follows, we examine yet another class of distributions whose geometric structure is close to that of the uniform distribution, thereby providing further insight into the behavior of $T_{n}$ .

•

The class of $\alpha$ -spherical distributions. This model arises by projecting heavy-tailed random vectors with i.i.d. components onto the unit sphere. It was introduced in [heiny2022limiting; dornemann2025limiting] in the study of a heavy-tailed analog of the Marchenko–Pastur law for sample correlation matrices. Subsequently, [jiang2025asymptotic] showed that both the Rayleigh test (2) and the Bingham test (3) are inconsistent for this model in the proportional regime $p/n\to c\in(0,\infty)$ , while the packing test (4) remains consistent.

Formally, the $\alpha$ -spherical distribution is defined as

\mu_{\alpha,p}=\frac{\mathbf{X}}{\|\mathbf{X}\|},

where $\mathbf{X}$ is a $p$ -dimensional random vector with i.i.d. symmetric components, each regularly varying with index $\alpha\in(0,2)$ .

Although this model does not represent a local alternative to the uniform distribution, the geometric behavior of its samples is remarkably similar: in both cases, the points are nearly orthogonal in high dimensions. The subtle difference between $\mu_{\alpha,p}$ and $\mu_{0}$ is that, under $\mu_{\alpha,p}$ , there are a few points that are either very close to each other or almost aligned along straight lines through the origin. Intuitively, since almost all the points are orthogonal, the tests that are based on a single polynomial of the inner products, like the Rayleigh test and Bingham test, would fail to be consistent.

By [cohen2020heavy] (see the discussion after Theorem 4.1), it follows that

p^{1/\alpha}\,\mathbf{X}^{\top}\mathbf{Y}\;\xrightarrow{d}\;Z_{\alpha}

for some non-degenerate random variable $Z_{\alpha}$ that can be written as the ratio of independent stable random variables. Since $\alpha\in(0,2)$ , we have $1/\alpha-1/2>0$ , and hence $p^{1/\alpha-1/2}/2\to\infty$ as $p\to\infty$ . Therefore,

\begin{aligned}
d\bigl(\mu_{\alpha,p},\mu_{0}\bigr)&=\sup_{t\in[-1,1]}\left|\mathbb{P}_{\mu_{\alpha,p}}\!\left(\mathbf{X}^{\top}\mathbf{Y}\leq t\right)-\mathbb{P}_{\mu_{0}}\!\left(\mathbf{X}^{\top}\mathbf{Y}\leq t\right)\right|\\
&\geq\left|\mathbb{P}_{\mu_{\alpha,p}}\!\left(p^{1/\alpha}\mathbf{X}^{\top}\mathbf{Y}\leq\frac{p^{1/\alpha-1/2}}{2}\right)-\mathbb{P}_{\mu_{0}}\!\left(\sqrt{p}\,\mathbf{X}^{\top}\mathbf{Y}\leq\frac{1}{2}\right)\right|.
\end{aligned}

As $p\to\infty$ , the first probability converges to $1$ , while the second converges to $\Phi(1/2)$ , so for all $p$ large enough,

d\bigl(\mu_{\alpha,p},\mu_{0}\bigr)\;\geq\;\frac{1-\Phi(1/2)}{2}.

Thus, the test $T_{n}$ is also consistent in this model.
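The gap between the two probabilities compared in the lower bound above is visible in small simulations. The sketch below is an illustration under assumptions not made in the text: it uses symmetrized Pareto components (a random sign times $U^{-1/\alpha}$ with $U$ uniform), which is one convenient regularly varying choice, together with $\alpha=0.5$ and $p=100$ ; all function names are hypothetical.

```python
import math
import random


def normalize(v):
    """Project a non-zero vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def alpha_spherical(p, alpha, rng):
    """Normalize a vector with i.i.d. symmetric Pareto(alpha) components."""
    comps = []
    for _ in range(p):
        u = 1.0 - rng.random()  # lies in (0, 1], so the power below is finite
        sign = 1.0 if rng.random() < 0.5 else -1.0
        comps.append(sign * u ** (-1.0 / alpha))
    return normalize(comps)


def uniform_sphere(p, rng):
    """Draw one point uniformly on S^{p-1}."""
    return normalize([rng.gauss(0.0, 1.0) for _ in range(p)])


def frac_below(sampler, p, reps, rng, level=0.5):
    """Estimate P(sqrt(p) * X^T Y <= level) from independent pairs (X, Y)."""
    count = 0
    for _ in range(reps):
        x, y = sampler(rng), sampler(rng)
        count += math.sqrt(p) * sum(a * b for a, b in zip(x, y)) <= level
    return count / reps


rng = random.Random(0)
p = 100
heavy = frac_below(lambda r: alpha_spherical(p, 0.5, r), p, 400, rng)
unif = frac_below(lambda r: uniform_sphere(p, r), p, 400, rng)
```

Under these assumptions `unif` should be close to $\Phi(1/2)\approx 0.69$ , while `heavy` should be noticeably larger, in line with the argument above.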

The behaviour of the four tests is summarized in Table 1. One can see that the proposed test $T_{n}$ is the only one that remains consistent or rate-optimal across all three models.

Table 1: Asymptotic detection boundaries (up to constants) of various tests under FvML, low-rank, and $\alpha$ -spherical alternatives.

Test / model	FvML	Low rank	$\alpha$ -spherical
$R_{n}$ (2)	$\dfrac{p^{3/4}}{\sqrt{n}}$ , optimal [Cutting-P-V]	$k=o(p)$ , sub-optimal	inconsistent [jiang2025asymptotic]
$B_{n}$ (3)	$\dfrac{p^{3/4}}{n^{1/4}}$ , sub-optimal [Cutting-P-V]	$k=\bigl(1-\Omega(1/n)\bigr)p$ , optimal	inconsistent [jiang2025asymptotic]
$P_{n}$ (4)	blind at $\dfrac{p^{3/4}}{\sqrt{n}}$ , sub-optimal [jiang2025asymptotic]	$k=\bigl(1-\Omega(1/\log n)\bigr)p$ , sub-optimal	consistent [jiang2025asymptotic]
$T_{n}$ (8)	$\dfrac{p^{3/4}}{\sqrt{n}}$ , optimal Proposition 2	$k=\bigl(1-\Omega(1/n)\bigr)p$ , optimal Proposition 3	consistent

5 Conclusions and remarks

In this paper, we propose a novel distance to quantify deviations from uniformity, together with a test naturally associated with this distance. We show that the test enjoys very simple asymptotic properties in high dimensions and admits a model-free consistency theory. We establish optimal detection rates with respect to the proposed distance and show that the test attains these rates. Furthermore, we show that, when restricted to parametric models, the proposed distance precisely captures the usual notion of local alternatives; this is verified for the FvML model and for a low-rank uniform distribution model. As a consequence of our analysis, we obtain the detection threshold for testing the intrinsic dimension of the uniform distribution. We now make some remarks.

1.

It is of independent interest to extend Proposition 1 to other types of spherical distributions. We conjecture that the conclusion of Proposition 1 remains valid for any two Borel probability measures on the sphere, up to an orthogonal transformation, under suitable regularity conditions.
2.

We believe that the proposed distance characterizes the minimax rates for other models as well. For example, one can show this for the Watson model and for certain spiked-covariance models, although the analysis in those cases requires specific restrictions on the joint growth of $p$ and $n$ .
3.

One can also investigate a procedure based on an $L^{2}$ -type distance instead, for which similar results are expected to hold. We leave this as a direction for future work.

6 Proofs

6.1 Proof of Proposition 1

Before presenting the proof, we first state a version of Lebesgue’s differentiation theorem for smooth and complete Riemannian manifolds. The version provided below is not the most general result but is sufficient for our purposes.

Lemma 1.

Let $(M,g)$ be a smooth, complete Riemannian manifold with the corresponding geodesic distance $d$ . Suppose $\nu$ is a non-negative, finite Borel measure on the metric space $(M,d)$ and $f$ is a non-negative integrable function with respect to $\nu$ . Then, we have

\displaystyle\lim_{r\to 0}\frac{\int_{B(x,r)}f(y)d\nu(y)}{\nu\left(B(x,r)\right)}=f(x)

(18)

for $\nu$ -almost every $x\in M$ . Here $B(x,r)$ is the open ball with respect to the geodesic distance $d$ .

Proof of Lemma 1. Define

\displaystyle\nu_{1}(A)=\int_{A}f(x)d\nu(x).

It is easy to see that $\nu_{1}\ll\nu$ . By Theorem A.1 in [jost2021probabilistic], there exists a measurable set $S_{0}\subset M$ such that $\nu(S_{0})=0$ and

\displaystyle D(x)=\lim_{r\to 0}\frac{\nu_{1}\left(B(x,r)\right)}{\nu\left(B(x,r)\right)}

(19)

exists and is finite for all $x\in M\setminus S_{0}$ . Also by Theorem A.1 in [jost2021probabilistic], $D(x)$ is the Radon–Nikodym derivative $d\nu_{1}/d\nu$ (up to a null set) whenever $(M,d)$ is complete. Thus, $D(x)=f(x)$ $\nu$ -almost everywhere. The proof is completed. $\square$

In the argument of Lemma 1, the completeness of $(M,d)$ is needed only to deduce that $D(x)$ in (19) is equal to the Radon–Nikodym derivative $f(x)$ $\nu$ -almost everywhere. Results of this type hold for various metric spaces with different structures; see the classical monograph [GMT] for more details.

Proof of Proposition 1. It is easy to see that given the assumptions in Proposition 1, we have

\displaystyle\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}g(\bm{x}^{\top}\bm{y})d\nu(\bm{x})d\nu(\bm{y})=\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}g(\bm{x}^{\top}\bm{y})d\mu_{0}(\bm{x})d\mu_{0}(\bm{y})

(20)

for any bounded, measurable function $g:[-1,1]\to\mathbb{R}$ . We will show that (20) implies $\nu\equiv\mu_{0}$ . To see this, fix $\eta\in(0,2]$ and define

g_{\eta}(t):=\frac{\mathbf{1}_{(1-\eta,1]}(t)}{\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})d\mu_{0}(\bm{x})d\mu_{0}(\bm{y})}.

Let $\mu_{1}=\frac{\nu+\mu_{0}}{2}$ and define the Radon–Nikodym derivatives $\frac{d\nu(\bm{x})}{d\mu_{1}(\bm{x})}=f(\bm{x})$ , $\frac{d\mu_{0}(\bm{x})}{d\mu_{1}(\bm{x})}=h(\bm{x})$ . It follows that $d\mu_{1}(\bm{x})=\frac{f(\bm{x})+h(\bm{x})}{2}d\mu_{1}(\bm{x})$ and thus,

\displaystyle f+h=2

(21)

$\mu_{1}$ -almost surely. Plugging $g_{\eta}$ into (20) gives

\displaystyle\frac{\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})d\nu(\bm{x})d\nu(\bm{y})}{\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})d\mu_{0}(\bm{x})d\mu_{0}(\bm{y})}=\underbrace{\frac{\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})d\mu_{0}(\bm{x})d\mu_{0}(\bm{y})}{\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})d\mu_{0}(\bm{x})d\mu_{0}(\bm{y})}}_{=1}.

Thus,

\displaystyle\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})d\nu(\bm{x})d\nu(\bm{y})=\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})d\mu_{0}(\bm{x})d\mu_{0}(\bm{y}).

(22)

Moreover, by Fubini’s theorem,

		$\displaystyle\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})d\nu(\bm{x})d\nu(\bm{y})$
	$\displaystyle=$	$\displaystyle\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})f(\bm{x})f(\bm{y})d\mu_{1}(\bm{x})d\mu_{1}(\bm{y})$
	$\displaystyle=$	$\displaystyle\int_{\mathbb{S}^{p-1}}f(\bm{x})\left(\int_{1-\eta<\bm{x}^{\top}\bm{y}\leq 1}f(\bm{y})d\mu_{1}(\bm{y})\right)d\mu_{1}(\bm{x}).$

Additionally,

\displaystyle\int_{\mathbb{S}^{p-1}}\int_{\mathbb{S}^{p-1}}\mathbf{1}_{(1-\eta,1]}(\bm{x}^{\top}\bm{y})d\mu_{0}(\bm{x})d\mu_{0}(\bm{y})=\mathbb{P}_{\mu_{0}}\left(1-\eta<\bm{x}^{\top}\bm{y}\leq 1\right).

Therefore, from (22) and the two equalities above, we get

\displaystyle\int_{\mathbb{S}^{p-1}}f(\bm{x})\left(\int_{1-\eta<\bm{x}^{\top}\bm{y}\leq 1}f(\bm{y})d\mu_{1}(\bm{y})\right)d\mu_{1}(\bm{x})=\mathbb{P}_{\mu_{0}}\left(1-\eta<\bm{x}^{\top}\bm{y}\leq 1\right)

(23)

for all $\eta\in(0,2]$ .

Let $d(\bm{x},\bm{y})=\arccos\left(\bm{x}^{\top}\bm{y}\right)$ be the geodesic distance on $\mathbb{S}^{p-1}$ . It is easy to check that $\left(\mathbb{S}^{p-1},d\right)$ is a Polish space and that for all $\bm{x}\in\mathbb{S}^{p-1}$ ,

\left\{\bm{y}:1-\eta<\bm{x}^{\top}\bm{y}\leq 1\right\}=B(\bm{x},f_{\eta})

where $f_{\eta}=\arccos(1-\eta)$ and $B(\bm{x},r)$ is the open ball with center at $\bm{x}$ and radius $r$ with respect to $d$ (the inequality $1-\eta<\bm{x}^{\top}\bm{y}$ is strict and $\arccos$ is strictly decreasing). Hence, for all $\eta\in(0,2]$ , one can rewrite (23) as

\displaystyle\int_{\mathbb{S}^{p-1}}f(\bm{x})\left(\frac{\int_{B(\bm{x},f_{\eta})}f(\bm{y})d\mu_{1}(\bm{y})}{{\int_{B(\bm{x},f_{\eta})}h(\bm{y})d\mu_{1}(\bm{y})}}\right)d\mu_{1}(\bm{x})=1

(24)

where we have used Lemma 8 to write

\mathbb{P}_{\mu_{0}}\left(1-\eta<\bm{x}^{\top}\bm{y}\leq 1\right)=\mu_{0}\left(B(\bm{x},f_{\eta})\right)=\int_{B(\bm{x},f_{\eta})}h(\bm{y})d\mu_{1}(\bm{y})

for all $\bm{x}\in\mathbb{S}^{p-1}$ . Note that (24) holds since the right-hand side in the expression above is constant across $\bm{x}$ .

Since $\left(\mathbb{S}^{p-1},d\right)$ is a smooth, complete Riemannian manifold with respect to the canonical Riemannian metric, Lemma 1 can be applied to $\mu_{1}$ to deduce that

\displaystyle\frac{\int_{B(\bm{x},f_{\eta})}f(\bm{y})d\mu_{1}(\bm{y})}{{\int_{B(\bm{x},f_{\eta})}h(\bm{y})d\mu_{1}(\bm{y})}}=\frac{\int_{B(\bm{x},f_{\eta})}f(\bm{y})d\mu_{1}(\bm{y})}{\mu_{1}\left(B(\bm{x},f_{\eta})\right)}\cdot\left(\frac{\int_{B(\bm{x},f_{\eta})}h(\bm{y})d\mu_{1}(\bm{y})}{\mu_{1}\left(B(\bm{x},f_{\eta})\right)}\right)^{-1}\to\frac{f(\bm{x})}{h(\bm{x})},

as $\eta\to 0$ , for $\mu_{1}$ -almost every $\bm{x}$ , since $f_{\eta}\to 0$ as $\eta\to 0$ . Therefore, the above display together with (21), (24) and Fatou’s lemma yields

\begin{aligned}
\int_{\mathbb{S}^{p-1}}\frac{f(\bm{x})^{2}}{2-f(\bm{x})}d\mu_{1}(\bm{x})&=\int_{\mathbb{S}^{p-1}}\frac{f(\bm{x})^{2}}{h(\bm{x})}d\mu_{1}(\bm{x})\\
&=\int_{\mathbb{S}^{p-1}}\lim_{\eta\to 0}f(\bm{x})\left(\frac{\int_{B(\bm{x},f_{\eta})}f(\bm{y})d\mu_{1}(\bm{y})}{{\int_{B(\bm{x},f_{\eta})}h(\bm{y})d\mu_{1}(\bm{y})}}\right)d\mu_{1}(\bm{x})\\
&\leq\liminf_{\eta\to 0}\int_{\mathbb{S}^{p-1}}f(\bm{x})\left(\frac{\int_{B(\bm{x},f_{\eta})}f(\bm{y})d\mu_{1}(\bm{y})}{{\int_{B(\bm{x},f_{\eta})}h(\bm{y})d\mu_{1}(\bm{y})}}\right)d\mu_{1}(\bm{x})=1.
\end{aligned}

Moreover, since $\int_{\mathbb{S}^{p-1}}\left[2-f(\bm{x})\right]d\mu_{1}(\bm{x})=\int_{\mathbb{S}^{p-1}}h(\bm{x})d\mu_{1}(\bm{x})=1$ by (21), Hölder’s inequality (in its Cauchy–Schwarz form) gives

\begin{aligned}
\int_{\mathbb{S}^{p-1}}\frac{f(\bm{x})^{2}}{2-f(\bm{x})}d\mu_{1}(\bm{x})&=\int_{\mathbb{S}^{p-1}}\frac{f(\bm{x})^{2}}{2-f(\bm{x})}d\mu_{1}(\bm{x})\cdot\int_{\mathbb{S}^{p-1}}\left[2-f(\bm{x})\right]d\mu_{1}(\bm{x})\\
&\geq\left(\int_{\mathbb{S}^{p-1}}f(\bm{x})d\mu_{1}(\bm{x})\right)^{2}=1.
\end{aligned}

The two bounds above imply that the integral $\int_{\mathbb{S}^{p-1}}f(\bm{x})^{2}\cdot\left(2-f(\bm{x})\right)^{-1}d\mu_{1}(\bm{x})$ is exactly $1$ , so that equality holds in the inequality above and hence $f(\bm{x})^{2}=(2-f(\bm{x}))^{2}$ for $\mu_{1}$ -almost every $\bm{x}$ . This in turn yields $f\equiv 1$ almost surely with respect to $\mu_{1}$ . From (21), we also get $h\equiv 1$ and the conclusion follows. $\square$

6.2 Proof of Theorem 3

Fix $\varepsilon>0$ sufficiently small and define

\kappa_{n}:=\frac{\varepsilon p^{3/4}}{\sqrt{n}};\quad\frac{d\mu_{n,\bm{\theta}}}{d\mu_{0}}(\bm{x})\propto\exp\left(\kappa_{n}\left(\bm{x}^{\top}\bm{\theta}\right)\right).

In other words, $\mu_{n,\bm{\theta}}$ is an FvML distribution with location $\bm{\theta}$ and concentration parameter $\kappa_{n}$ . Consider the least favorable distribution

\displaystyle\mu_{n}^{*}:=\mathbb{E}_{\bm{\theta}\sim\mu_{0}}\left[\prod_{i=1}^{n}\frac{d\mu_{n,\bm{\theta}}}{d\mu_{0}}(\bm{X}_{i})\right].

(25)

By Proposition 7, for all $\bm{\theta}\in\mathbb{S}^{p-1}$ , we have

d\left(\mu_{n,\bm{\theta}},\mu_{0}\right)\geq\frac{\varepsilon^{2}}{10n}

whenever $\min\left\{p,n\right\}$ is sufficiently large and $\varepsilon$ is small enough.

Consequently, Le Cam’s mixture argument yields

\begin{aligned}
\liminf_{n\to\infty}\left\{\inf_{\phi\in\mathcal{T}_{n}}\left\{\mathbb{P}_{\mu_{0}}\left(\phi=1\right)+\sup_{d(\nu,\mu_{0})\geq\frac{\varepsilon}{n}}\mathbb{P}_{\nu}\left(\phi=0\right)\right\}\right\}&\geq\liminf_{n\to\infty}\Big\{1-d_{\rm TV}\left(\mu_{0},\mu_{n}^{*}\right)\Big\}\\
&\geq\liminf_{n\to\infty}\Big\{1-\sqrt{\mathbb{E}L_{n}^{2}-1}\Big\}
\end{aligned}

where $L_{n}$ is the likelihood ratio defined in (50).

Thanks to Proposition 5, we know that $\mathbb{E}L_{n}^{2}-1\leq e^{\varepsilon^{2}}-1=O(\varepsilon^{2})$ for small $\varepsilon>0$ . Thus, we get (16) by choosing $\varepsilon$ small enough. The proof is completed. $\square$

6.3 Proof of Theorem 1

Define

\displaystyle S_{n}(t):=\sqrt{\frac{2}{n(n-1)}}\sum_{i<j}\left[\mathbf{1}_{\left\{\sqrt{p}\bm{X}_{i}^{\top}\bm{X}_{j}\leq t\right\}}-\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}_{1}^{\top}\bm{X}_{2}\leq t\right)\right].

(26)

We will show that the process $\left\{S_{n}(t);n\geq 1\right\}$ converges in distribution to $\left\{B_{\Phi(t)}\right\}$ in the Skorohod space $D[a,b]$ , for all $a<b\in\mathbb{R}$ . Some basic properties regarding the topology on this space can be found in [Dehling].

Here $\Phi$ is the CDF of a standard normal distribution and $\left\{B_{u};0\leq u\leq 1\right\}$ is the Brownian bridge. We do not work directly with the space $D\left(\mathbb{R}\right)$ (see [vogel2010weak] for more details) since the supremum functional is not almost surely continuous on this space; see Remark 1 below.

Step 1: Convergence in $D[a,b]$ . Suppose $a<b$ . To show the convergence in $D[a,b]$ , we need to check that

Condition 2 (Finite-dimensional convergence in distribution).

For any grid $a\leq t_{1}<t_{2}<\dots<t_{k}\leq b$ , the vector $(S_{n}(t_{1}),S_{n}(t_{2}),\dots,S_{n}(t_{k}))$ converges in distribution to $\left(B_{\Phi(t_{1})},B_{\Phi(t_{2})},\dots,B_{\Phi(t_{k})}\right)$ .

Condition 3 (Tightness).

For any $\varepsilon>0$ , we have

\displaystyle\lim_{\delta\to 0}\limsup_{n\to\infty}\mathbb{P}\left(\sup_{|t-s|\leq\delta}\Big|S_{n}(t)-S_{n}(s)\Big|>\varepsilon\right)=0.

(27)

To check Condition 2, we will make use of the following result whose proof is given in Section A.3.

Proposition 4.

Let $h_{n}:\mathbb{R}\mapsto\mathbb{R}$ be a sequence of measurable functions such that $\mathbb{E}h_{n}(\bm{X}^{\top}_{1}\bm{X}_{2})=0$ and

\displaystyle\mbox{Var}\left(h_{n}(\bm{X}^{\top}_{1}\bm{X}_{2})\right)\to\sigma^{2}>0;

(28)

\displaystyle\frac{\mathbb{E}\left(h_{n}^{4}(\bm{X}^{\top}_{1}\bm{X}_{2})\right)}{n\left(\mathbb{E}h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\right)^{2}}\to 0.

(29)

Then,

\displaystyle\sqrt{\frac{2}{n(n-1)}}\sum_{1\leq i<j\leq n}h_{n}\left(\bm{X}^{\top}_{i}\bm{X}_{j}\right)\xrightarrow{d}N(0,\sigma^{2}).

(30)

Applying Proposition 4 to centered linear combinations of kernels of the form $h_{n}(x)=\mathbf{1}_{\left\{\sqrt{p}x\leq t\right\}}$ (via the Cramér–Wold device), we get the convergence of finite-dimensional distributions. Note that condition (29) is satisfied because $\sqrt{p}\bm{X}_{1}^{\top}\bm{X}_{2}$ is asymptotically standard normal.
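A quick sanity check of the limiting variance at a single time point: under uniformity the pairwise inner products are pairwise independent (by rotation invariance), so $\mbox{Var}(S_{n}(0))=\Phi(0)(1-\Phi(0))=1/4$ exactly, which matches $\mbox{Var}(B_{\Phi(0)})$ . The simulation below, with hypothetical helper names and arbitrarily chosen $n$ , $p$ and number of replications, illustrates this.

```python
import math
import random


def uniform_sphere(p, rng):
    """Draw one point uniformly on S^{p-1} by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def s_n_at_zero(n, p, rng):
    """S_n(0) from (26): normalized, centered count of non-positive inner products."""
    pts = [uniform_sphere(p, rng) for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(pts[i], pts[j]))
            # Under uniformity, P(X_1^T X_2 <= 0) = 1/2 by symmetry.
            total += (1.0 if dot <= 0.0 else 0.0) - 0.5
    return math.sqrt(2.0 / (n * (n - 1))) * total


rng = random.Random(0)
draws = [s_n_at_zero(n=12, p=10, rng=rng) for _ in range(500)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)  # should be near 1/4
```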

We now check the tightness condition (27). Note that under uniformity,

c_{n,\delta}:=\sup_{|t-s|\leq\delta}\mathbb{P}\left(s<\sqrt{p}\bm{X}_{1}^{\top}\bm{X}_{2}<t\right)=\sup_{|t-s|\leq\delta}\mathbb{P}\left(s\leq Z\leq t\right)+O\left(p^{-1/2}\right)

as $p\to\infty$ , where $Z$ denotes a standard normal random variable.

By applying Lemma 6 to the class of functions $\left\{\mathbf{1}_{\left\{s\leq\sqrt{p}\cdot\bm{x}^{\top}\bm{y}\leq t\right\}}\right\}_{a\leq s<t\leq b}$ (which has VC-dimension $2$ ) and using the degeneracy of the kernels, we obtain

\displaystyle\mathbb{P}\left(\sup_{|t-s|\leq\delta}\left|S_{n}(t)-S_{n}(s)\right|\geq\varepsilon\right)\lesssim c_{n,\delta}\left[1+\log\left(c_{n,\delta}\right)\right]+\frac{\log\left(c_{n,\delta}^{-1}\right)}{\sqrt{n}}.

The proof is completed by first letting $n\to\infty$ and then letting $\delta\to 0$ .

Step 2: Continuous mapping and negligibility of the tail. By the continuous mapping theorem, for all $a>0$ , we have

\sup_{t\in[-a,a]}|S_{n}(t)|\stackrel{{\scriptstyle d}}{{\to}}\sup_{t\in[-a,a]}|B_{\Phi(t)}|.

To deduce the result, it suffices to show that for all $\varepsilon>0$

\displaystyle\lim_{a\to\infty}\mathbb{P}\left(\sup_{|t|>a}|S_{n}(t)|>\varepsilon\right)=0.

(31)

The above is equivalent to showing that

\lim_{a\to\infty}\mathbb{P}\left(\sup_{t>a}|S_{n}(t)|>\varepsilon\right)=0,\ \text{and}\ \lim_{a\to\infty}\mathbb{P}\left(\sup_{t<-a}|S_{n}(t)|>\varepsilon\right)=0.

Since the proofs of these two limits are identical, we will only prove the former. We again apply Lemma 6 to the VC-type class of functions

\left\{\mathbf{1}_{\left\{\sqrt{p}\cdot\bm{x}^{\top}\bm{y}\leq t\right\}}\right\}_{t\in(a,\infty)}

to deduce that

\mathbb{P}\left(\sup_{t>a}|S_{n}(t)|>\varepsilon\right)\lesssim\tau_{a}\left[1+\log\left(\tau_{a}\right)\right]+\frac{1}{\sqrt{n}}

where the variance profile $\tau_{a}$ is defined as

\tau_{a}:=\sup_{t>a}\left\{\mbox{Var}\left(\mathbf{1}_{\left\{\sqrt{p}\bm{X}_{1}^{\top}\bm{X}_{2}\leq t\right\}}\right)\right\}\leq 1-\mathbb{P}\left(\sqrt{p}\cdot\bm{X}_{1}^{\top}\bm{X}_{2}\leq a\right).

It is easy to check that $\tau_{a}$ converges to $0$ as $a\to\infty$ . This finishes the proof. $\square$
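To make the objects in this proof concrete, the following minimal sketch simulates the supremum, over $t$ , of the gap between the pairwise empirical distribution function of $\sqrt{p}\bm{X}_{i}^{\top}\bm{X}_{j}$ and the standard normal distribution function, which is the large- $p$ approximation used in Step 1. The function names `unit_vector` and `sup_pairwise_deviation` are hypothetical, and the normalization of $S_{n}(t)$ in (26) is not reproduced here; this is an illustration under those assumptions rather than the exact statistic.

```python
import math
import random


def unit_vector(p, rng):
    """Draw one point uniformly on S^{p-1} by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sup_pairwise_deviation(sample):
    """sup_t | pairwise ECDF of sqrt(p) * X_i^T X_j  -  Phi(t) |.

    Assumption: Phi replaces the exact null law of sqrt(p) * X_1^T X_2,
    which is only accurate when p is large.
    """
    n, p = len(sample), len(sample[0])
    vals = sorted(
        math.sqrt(p) * sum(a * b for a, b in zip(sample[i], sample[j]))
        for i in range(n)
        for j in range(i + 1, n)
    )
    m = len(vals)
    stat = 0.0
    for r, v in enumerate(vals):
        f = std_normal_cdf(v)
        # The ECDF jumps at v: compare both one-sided limits with Phi(v).
        stat = max(stat, abs((r + 1) / m - f), abs(r / m - f))
    return stat


rng = random.Random(0)
uniform_sample = [unit_vector(50, rng) for _ in range(30)]
point = unit_vector(50, rng)
print(sup_pairwise_deviation(uniform_sample))   # small under uniformity
print(sup_pairwise_deviation([point] * 30))     # close to 1 for a point mass
```

The supremum is evaluated only at the observed inner products because the empirical distribution function is a step function, so the largest gap is attained at one of its jumps.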

Remark 1.

The reason we do not work directly with $D(\mathbb{R})$ is that the functional

	$\displaystyle\mathcal{S}:D(\mathbb{R})$	$\displaystyle\to\mathbb{R}^{+}$
	$\displaystyle f$	$\displaystyle\mapsto\sup_{t\in\mathbb{R}}|f(t)|$

is not almost surely continuous at $\left\{B_{\Phi(u)}\right\}_{u\in\mathbb{R}}$ .

In fact, the topology on $D(\mathbb{R})$ is equivalent to the coarsest topology such that the projection map from $D(\mathbb{R})$ to $D[a,b]$ is continuous for all $a<b$ (see Section 3 in [vogel2010weak] for more details). Since this topology only sees the behavior of the process on bounded intervals, modifying the process on a diverging sequence still yields convergence in $D(\mathbb{R})$ , but the supremum functional can blow up.

6.4 Proof of Theorem 2

Consider a sequence of laws $\mu_{n}$ such that $nd\left(\mu_{n},\mu_{0}\right)\to\infty$ . Put $h_{t}(\bm{x},\bm{y})=\mathbf{1}_{\left\{\langle\bm{x},\bm{y}\rangle\leq t\right\}}$ and define

g_{t,n}(\bm{x}):=\mathbb{E}_{\mu_{n}}\left(h_{t}\left(\bm{x},\bm{y}\right)\mid\bm{x}\right).

For $t\in[-1,1]$ , rewrite $T_{n}(t)$ in terms of its Hoeffding decomposition as

\displaystyle T_{n}(t)=T_{n,1}(t)+T_{n,2}(t)+d_{t}

where

	$\displaystyle T_{n,1}(t):$	$\displaystyle=\frac{2}{n}\sum_{i=1}^{n}\Big[g_{t,n}\left(\bm{X}_{i}\right)-\mathbb{E}g_{t,n}\left(\bm{X}_{i}\right)\Big];$
	$\displaystyle T_{n,2}(t):$	$\displaystyle=\frac{2}{n(n-1)}\sum_{1\leq i<j\leq n}\Big[h_{t}\left(\bm{X}_{i},\bm{X}_{j}\right)-g_{t,n}\left(\bm{X}_{i}\right)-g_{t,n}\left(\bm{X}_{j}\right)+\mathbb{E}h_{t}\left(\bm{X}_{i},\bm{X}_{j}\right)\Big];$
	$\displaystyle d_{t}:$	$\displaystyle=\mathbb{P}_{\mu_{n}}\left(\bm{X}_{1}^{\top}\bm{X}_{2}\leq t\right)-\mathbb{P}_{\mu_{0}}\left(\bm{X}_{1}^{\top}\bm{X}_{2}\leq t\right).$

Define

	$\displaystyle V_{n}:$	$\displaystyle=\max_{t\in[-1,1]}\mbox{Var}\left(g_{t,n}\left(\bm{X}_{1}\right)\right);$
	$\displaystyle t_{n}:$	$\displaystyle=\mbox{argmax}_{t\in[-1,1]}\mbox{Var}\left(g_{t,n}\left(\bm{X}_{1}\right)\right);$
	$\displaystyle\alpha_{n}:$	$\displaystyle=\mbox{argmax}_{t\in[-1,1]}\left|d_{t}\right|;$
	$\displaystyle v:$	$\displaystyle=\limsup_{n\to\infty}(nV_{n}).$

Roughly speaking, $t_{n}$ is the point at which the Hoeffding projection term $T_{n,1}$ contributes the most, and $\alpha_{n}$ is the point at which the deterministic perturbation $d_{t}$ contributes the most.

It suffices to consider the following two cases.

Case 1: $v<\infty$ . In this case, we can estimate

	$\displaystyle n\cdot\sup_{t\in[-1,1]}|T_{n}(t)|$	$\displaystyle\geq n\cdot\left|d_{\alpha_{n}}\right|-n\cdot\left|T_{n,1}\left(\alpha_{n}\right)+T_{n,2}(\alpha_{n})\right|$
		$\displaystyle=n\cdot d\left(\mu_{n},\mu_{0}\right)-n\cdot\left|T_{n,1}\left(\alpha_{n}\right)+T_{n,2}(\alpha_{n})\right|.$

It is easy to check that

\displaystyle\mbox{Var}\left(T_{n,1}\left(\alpha_{n}\right)\right)\leq\frac{4V_{n}}{n}\ \ \text{and}\ \ \left|T_{n,2}(\alpha_{n})\right|\leq\sup_{t\in[-1,1]}\left|T_{n,2}(t)\right|=O_{\mathbb{P}}\left(n^{-1}\right)

where the first bound uses the factor $2/n$ in the definition of $T_{n,1}$ together with the definition of $V_{n}$ , and the second bound follows from Lemma 6.

Consequently,

\displaystyle n\cdot\sup_{t\in[-1,1]}|T_{n}(t)|\geq n\cdot d\left(\mu_{n},\mu_{0}\right)-O_{\mathbb{P}}\left(\sqrt{nV_{n}}\right)-O_{\mathbb{P}}(1).

Since $\limsup_{n\to\infty}\left(nV_{n}\right)=v<\infty$ and $nd\left(\mu_{n},\mu_{0}\right)\to\infty$ , the right-hand side diverges in probability, so the test rejects with probability tending to one.

Case 2: $v=\infty$ . In this case, by a subsequence argument, we can assume that $nV_{n}\to\infty$ . Notice that

	$\displaystyle\mathbb{P}\left(\text{rejecting the null}\right)$	$\displaystyle=\mathbb{P}\left(\sup_{t\in[-1,1]}|T_{n}(t)|\geq\frac{q_{\alpha}}{n}(1+o(1))\right)$
		$\displaystyle\geq\mathbb{P}\left(\left|T_{n,1}(t_{n})+d_{t_{n}}\right|\geq\frac{q_{\alpha}}{n}(1+o(1))+\left|T_{n,2}\left(t_{n}\right)\right|\right)$
		$\displaystyle=\mathbb{P}\left(\left|\sqrt{\frac{n}{V_{n}}}\cdot T_{n,1}(t_{n})+d_{t_{n}}\sqrt{\frac{n}{V_{n}}}\right|\geq T^{*}_{n}\right)$

where

T^{*}_{n}:=\frac{q_{\alpha}(1+o(1))}{\sqrt{nV_{n}}}+\sqrt{\frac{n}{V_{n}}}\left|T_{n,2}\left(t_{n}\right)\right|,

and the second line follows from the fact that

\sup_{t\in[-1,1]}|T_{n}(t)|\geq\left|T_{n,1}(t_{n})+d_{t_{n}}\right|-\left|T_{n,2}\left(t_{n}\right)\right|.

By Lemma 6 and the assumption that $nV_{n}\to\infty$ , we deduce that $T_{n}^{*}=o_{\mathbb{P}}(1)$ . Now, thanks to the Berry–Esseen bound for sums of i.i.d. random variables (see, for example, Theorem 3.7 in [chen2010normal]) and the fact that $0\leq g_{t_{n},n}\leq 1$ , we have

	$\displaystyle\sup_{x\in\mathbb{R}}\left|\mathbb{P}\!\left(\sqrt{\frac{n}{V_{n}}}\,T_{n,1}(t_{n})\leq x\right)-\mathbb{P}\!\left(N(0,4)\leq x\right)\right|$	$\displaystyle\lesssim\frac{\mathbb{E}\left|g_{t_{n},n}\left(\bm{X}_{1}\right)-\mathbb{E}g_{t_{n},n}\left(\bm{X}_{1}\right)\right|^{3}}{\sqrt{n}\cdot V_{n}^{3/2}}$
		$\displaystyle\lesssim\frac{\mathbb{E}\left|g_{t_{n},n}\left(\bm{X}_{1}\right)-\mathbb{E}g_{t_{n},n}\left(\bm{X}_{1}\right)\right|^{2}}{\sqrt{n}\cdot V_{n}^{3/2}}$
		$\displaystyle=\frac{V_{n}}{\sqrt{n}\cdot V_{n}^{3/2}}=\frac{1}{\sqrt{nV_{n}}}\to 0.$

The proof in this case is completed by employing Lemma 2 below with

X_{n}=\sqrt{\frac{n}{V_{n}}}\,T_{n,1}(t_{n});\quad Y_{n}=T_{n}^{*};\quad a_{n}=d_{t_{n}}\sqrt{\frac{n}{V_{n}}}

to get

\lim_{n\to\infty}\mathbb{P}\left(\left|\sqrt{\frac{n}{V_{n}}}\cdot T_{n,1}(t_{n})+d_{t_{n}}\sqrt{\frac{n}{V_{n}}}\right|\geq T^{*}_{n}\right)=1.

$\square$

Lemma 2.

Suppose $\{X_{n}\}$ is a sequence of random variables such that

\sup_{t\in\mathbb{R}}\left|\mathbb{P}\left(X_{n}\leq t\right)-\mathbb{P}\left(N(0,4)\leq t\right)\right|\to 0

where $N(0,4)$ is a normal distribution with variance $4$ .

Let $\{a_{n}\}$ be any sequence of real numbers (not necessarily bounded), and let $\{Y_{n}\}$ be a sequence of random variables such that $Y_{n}\stackrel{{\scriptstyle\mathbb{P}}}{{\to}}0$ . Then

\lim_{n\to\infty}\mathbb{P}\!\left(|X_{n}+a_{n}|\geq|Y_{n}|\right)=1.

Proof of Lemma 2. Fix $\varepsilon>0$ and write

	$\displaystyle\mathbb{P}\!\Big(|X_{n}+a_{n}|\leq|Y_{n}|\Big)$	$\displaystyle\leq\mathbb{P}\!\Big(|X_{n}+a_{n}|\leq|Y_{n}|,|Y_{n}|\leq\varepsilon\Big)+\mathbb{P}\Big(|Y_{n}|>\varepsilon\Big)$
		$\displaystyle\leq\mathbb{P}\left(|X_{n}+a_{n}|\leq\varepsilon\right)+\mathbb{P}\Big(|Y_{n}|>\varepsilon\Big)$
		$\displaystyle=\mathbb{P}\left(-\varepsilon-a_{n}\leq N(0,4)\leq\varepsilon-a_{n}\right)+\mathbb{P}\Big(|Y_{n}|>\varepsilon\Big)+o(1)$
		$\displaystyle\leq 2\varepsilon\cdot\sup_{t\in\mathbb{R}}\left\{\frac{1}{2\sqrt{2\pi}}\exp\left(-t^{2}/8\right)\right\}+\mathbb{P}\Big(|Y_{n}|>\varepsilon\Big)+o(1)$
		$\displaystyle\leq 2\varepsilon+\mathbb{P}\Big(|Y_{n}|>\varepsilon\Big)+o(1).$

The proof is completed by taking $n\to\infty$ and then letting $\varepsilon\to 0$ . $\square$
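The key step in the proof of Lemma 2 is the anti-concentration bound $\mathbb{P}\left(|N(0,4)+a|\leq\varepsilon\right)\leq 2\varepsilon\cdot\sup_{t}f(t)$ , uniformly in the shift $a$ , where $f$ is the $N(0,4)$ density. The sketch below checks this bound numerically for a few shifts; the helper names `interval_prob` and `check_bound` are hypothetical and introduced only for illustration.

```python
import math


def normal_cdf(x, sd=2.0):
    """Distribution function of N(0, sd^2); sd = 2 gives N(0, 4)."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))


def interval_prob(a, eps):
    """P(|N(0,4) + a| <= eps) = P(-eps - a <= N(0,4) <= eps - a)."""
    return normal_cdf(eps - a) - normal_cdf(-eps - a)


# Supremum of the N(0,4) density, attained at zero.
SUP_DENSITY = 1.0 / (2.0 * math.sqrt(2.0 * math.pi))


def check_bound(eps, shifts):
    """True when 2 * eps * sup-density dominates the interval probability for every shift."""
    return all(interval_prob(a, eps) <= 2.0 * eps * SUP_DENSITY + 1e-12 for a in shifts)


shifts = [-50.0, -3.0, -0.5, 0.0, 0.7, 4.0, 1e3]
for eps in (0.5, 0.1, 0.01):
    worst = max(interval_prob(a, eps) for a in shifts)
    print(eps, worst, 2.0 * eps * SUP_DENSITY, check_bound(eps, shifts))
```

The bound does not depend on $a_{n}$ , which is why the lemma allows unbounded shifts: a large shift only makes the interval probability smaller.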

6.5 Proof of Proposition 2

As in the proof of Theorem 1, we need to check three conditions:

•

Convergence in finite-dimensional distributions. Recall $S_{n}(t)$ in (26). We need to show that for all $t_{1}<t_{2}<\dots<t_{k}$ ,

\displaystyle\left(S_{n}(t_{1}),\dots,S_{n}(t_{k})\right)\stackrel{{\scriptstyle d}}{{\to}}\left(B_{\Phi(t_{1})}-\frac{\tau^{2}\exp\left(-t_{1}^{2}/2\right)}{2\sqrt{\pi}},\dots,B_{\Phi(t_{k})}-\frac{\tau^{2}\exp\left(-t_{k}^{2}/2\right)}{2\sqrt{\pi}}\right)

(32)

under the FvML distributions with concentration parameter $\kappa_{n}$ .

•

Tightness. This condition is equivalent to (27), for all spaces $D[a,b]$ with $a<b$ .
•

Negligibility of the tail. This condition is (31).

To show (32), recall that by Proposition 4 and the Cramér–Wold device, we have

\left(S_{n}(t_{1}),\dots,S_{n}(t_{k}),R_{n}\right)\stackrel{{\scriptstyle d}}{{\to}}\left(\bm{B}_{k},Z\right)

under uniformity, where $Z$ is a standard normal random variable, $R_{n}$ is the Rayleigh test statistic in (2), and $\bm{B}_{k}=\left(B_{\Phi(t_{1})},\dots,B_{\Phi(t_{k})}\right)$ is distributed as a Brownian bridge evaluated at $\Phi(t_{1}),\Phi(t_{2}),\dots,\Phi(t_{k})$ . Moreover, the covariance between $Z$ and the coordinates of $\bm{B}_{k}$ is given by (this is also the covariance limit in the proof of Proposition 7)

\mathbb{E}\left(B_{\Phi(t_{i})}Z\right)=\mathbb{E}\left(Z^{*}\cdot\mathbf{1}_{\left\{Z^{*}\leq t_{i}\right\}}\right)=\frac{-\exp\left(-t_{i}^{2}/2\right)}{\sqrt{2\pi}}

with $Z^{*}\sim N(0,1)$ , for all $1\leq i\leq k$ .
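The closed form $\mathbb{E}\left(Z^{*}\mathbf{1}_{\{Z^{*}\leq t\}}\right)=-\exp(-t^{2}/2)/\sqrt{2\pi}$ follows from $\frac{d}{dx}\phi(x)=-x\phi(x)$ , where $\phi$ is the standard normal density. A minimal numerical check is sketched below; the function name `truncated_mean`, the truncation point $-12$ and the midpoint rule are choices made for this illustration only.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def truncated_mean(t, lower=-12.0, steps=100_000):
    """Midpoint-rule approximation of E[Z 1{Z <= t}], i.e. the integral of x * phi(x) over [lower, t].

    The mass below `lower` is negligible for lower = -12.
    """
    h = (t - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        total += x * phi(x)
    return h * total


for t in (-1.0, 0.0, 1.5):
    # The two printed columns should agree to several decimal places.
    print(t, truncated_mean(t), -phi(t))
```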

We then obtain (32) from the convergence above by using the LAN expansion (52) (see [Cutting-P-V] for a proof) and Le Cam’s third lemma.

We now show (27) and (31). Since their proofs are similar, we will only show (27). Assume the contrary. Then there exist $\varepsilon,\varepsilon_{1}>0$ and a sequence $\left\{\delta_{k},n_{k}\right\}_{k\geq 1}$ with $\delta_{k}\to 0$ and $n_{k}\to\infty$ such that

\displaystyle\liminf_{k\to\infty}\left\{\mathbb{P}_{\mu_{n_{k}}}\left(\sup_{|t-s|\leq\delta_{k}}\Big|S_{n_{k}}(t)-S_{n_{k}}(s)\Big|>\varepsilon\right)\right\}\geq\varepsilon_{1}.

(33)

where $\mu_{n_{k}}$ is the corresponding subsequence of FvML alternatives. Put

\mathcal{A}_{k}=\left\{\sup_{|t-s|\leq\delta_{k}}\Big|S_{n_{k}}(t)-S_{n_{k}}(s)\Big|>\varepsilon\right\}.

Recall $L_{n}$ in (50). Thanks to Proposition 5, we have

	$\displaystyle\mathbb{P}_{\mu_{n_{k}}}\left(\mathcal{A}_{k}\right)=\int_{\mathcal{A}_{k}}1\,d\mathbb{P}_{\mu_{n_{k}}}$	$\displaystyle=\int_{\mathcal{A}_{k}}L_{n_{k}}\,d\mathbb{P}_{0}$
		$\displaystyle\leq\sqrt{\mathbb{P}_{0}\left(\mathcal{A}_{k}\right)}\cdot\sqrt{\mathbb{E}_{\mathbb{P}_{0}}\left(L_{n_{k}}^{2}\right)}\to 0$		(34)

since $\mathbb{E}_{\mathbb{P}_{0}}\left(L_{n_{k}}^{2}\right)$ is bounded in $k$ by Proposition 5 and $\mathbb{P}_{0}\left(\mathcal{A}_{k}\right)\to 0$ (because (27) holds under uniformity). The inequality is the Cauchy–Schwarz inequality, and the second equality in the display above follows from the fact that the distribution of $S_{n}(t)$ is invariant under rotations:

S_{n}(t)\left(\bm{X}_{1},\dots,\bm{X}_{n}\right)\stackrel{{\scriptstyle d}}{{=}}S_{n}(t)\left(\bm{O}\bm{X}_{1},\dots,\bm{O}\bm{X}_{n}\right)

for all orthogonal matrices $\bm{O}$ .

Since (33) contradicts (34), (27) must hold. This finishes the proof. $\square$

6.6 Proof of Theorem 4

Let us start with a useful result for calculating likelihood ratios between distributions that are invariant under group actions. For terminology related to group actions and maximal invariants, we refer the reader to Chapters 2 and 3 of the monograph [eaton1989group].

For the reader’s convenience, we briefly recall the relevant concepts. A group $G$ is said to act on a space $\mathcal{X}$ if there exists a mapping $G\times\mathcal{X}\to\mathcal{X}$ that is compatible with the group operation. A measurable mapping $T:\mathcal{X}\to\mathcal{Y}$ is called an invariant if

T(x)=T(gx),\qquad\forall\,g\in G.

An invariant $T$ is called a maximal invariant if, whenever $T(x)=T(y)$ for some $x,y\in\mathcal{X}$ , there exists $g\in G$ such that $x=gy$ .

Lemma 3.

Let $\mathcal{X}$ be a Polish space, and suppose a compact group $G$ acts on $\mathcal{X}$ continuously. Let $\mathbb{P}$ and $\mathbb{Q}$ be two Borel probability measures on $\mathcal{X}$ that are invariant under the action of $G$ . Let $T:\mathcal{X}\to\mathcal{Y}$ be a continuous maximal invariant for some Polish space $\mathcal{Y}$ . Define the induced laws

\mathbb{P}_{T}:=\mathbb{P}\circ T^{-1},\qquad\mathbb{Q}_{T}:=\mathbb{Q}\circ T^{-1}.

Then $\mathbb{P}\ll\mathbb{Q}$ whenever $\mathbb{P}_{T}\ll\mathbb{Q}_{T}$ . Moreover, when this holds and $X\sim\mathbb{Q}$ , we have

\frac{d\mathbb{P}}{d\mathbb{Q}}(X)=\frac{d\mathbb{P}_{T}}{d\mathbb{Q}_{T}}\!\bigl(T(X)\bigr)\quad\mathbb{Q}\text{-almost surely}.

The proof of Lemma 3 can be found in Appendix A.4. We now construct the least favorable alternative. Let $\Pi_{k,p}$ denote the normalized left Haar measure on the Grassmannian $G(k,p)$ (so that it is a probability measure). Define

	$\displaystyle\mathbb{P}_{0n}$	$\displaystyle:=\underbrace{\mathrm{Unif}\!\left(\mathbb{S}^{p-1}\right)\otimes\cdots\otimes\mathrm{Unif}\!\left(\mathbb{S}^{p-1}\right)}_{n\text{-times}},$
	$\displaystyle\mathbb{P}_{1n}$	$\displaystyle:=\int_{G(k,p)}\underbrace{\mathrm{Unif}\!\left(H\cap\mathbb{S}^{p-1}\right)\otimes\cdots\otimes\mathrm{Unif}\!\left(H\cap\mathbb{S}^{p-1}\right)}_{n\text{-times}}\,\Pi_{k,p}(dH).$

Roughly speaking, $\mathbb{P}_{0n}$ is the joint distribution of $\mathbf{X}_{1},\ldots,\mathbf{X}_{n}$ under $H_{0}$ , while $\mathbb{P}_{1n}$ is the law obtained by first sampling a $k$ -dimensional subspace $H\sim\Pi_{k,p}$ and then sampling

(\mathbf{X}_{1},\ldots,\mathbf{X}_{n})\mid H\stackrel{{\scriptstyle\text{i.i.d.}}}{{\sim}}\mathrm{Unif}\!\left(H\cap\mathbb{S}^{p-1}\right).

From now on, we let $\mathbf{X}$ denote the $p\times n$ data matrix whose columns are $\mathbf{X}_{1},\ldots,\mathbf{X}_{n}$ . We will apply Lemma 3 to show that $d\mathbb{P}_{1n}/d\mathbb{P}_{0n}$ exists and to derive its explicit form. Note that, although one can also use the Blaschke–Petkantschin formula to compute this integral (see, for example, Chapter 7 of [schneider2008stochastic]), the computation is quite lengthy. Define

	$\displaystyle\mathcal{X}:$	$\displaystyle=\left(\mathbb{S}^{p-1}\right)^{n};$
	$\displaystyle\mathcal{Y}:$	$\displaystyle=\left\{\mathcal{C}\in\mbox{Sym}_{n}\left(\mathbb{R}\right):\mathcal{C}\succ 0\ \text{and}\ \mathcal{C}_{ii}=1,\ \forall 1\leq i\leq n\right\}$

where $\mbox{Sym}_{n}\left(\mathbb{R}\right)$ is the set of all symmetric matrices of size $n$ .

By Lemma 7, the map

	$\displaystyle T:\mathcal{X}$	$\displaystyle\to\mathcal{Y}$
	$\displaystyle\bm{X}$	$\displaystyle\to\bm{X}^{\top}\bm{X}$

is a maximal invariant under the action of the group of orthogonal matrices.

Note that the matrix $\bm{G}:=\bm{X}^{\top}\bm{X}$ is nothing but the sample correlation matrix without centering by the sample mean. We can then apply Lemma 2.1 in [jiang2019determinant] and Theorem 5.1.3 in [muirhead2009aspects] to get the density of $\bm{G}$ under $\mathbb{P}_{0n}$ as

\displaystyle f\left(\bm{G}\right)\propto\mbox{det}\left(\bm{G}\right)^{(p-n-1)/2}d\bm{G}.

(35)

The exponent in (35) involves $p-1$ rather than $p-2$ as in [muirhead2009aspects] because $\bm{G}$ is not centered by the sample mean; Lemma 2.1 in [jiang2019determinant] shows that dropping the centering amounts to shifting $p$ by one unit.

Here $d\bm{G}$ denotes the Lebesgue measure on the strictly upper-triangular entries of $\bm{G}$ . Equivalently, it can be regarded as a measure on $\mathcal{Y}$ , defined as the pushforward of the Lebesgue measure on an open subset of $\mathbb{R}^{n(n-1)/2}$ to $\mathcal{Y}$ via the natural embedding.

Similarly, under $\mathbb{P}_{1n}$ , the density of $\bm{G}$ is given by

\displaystyle f\left(\bm{G}\right)\propto\mbox{det}\left(\bm{G}\right)^{(k-n-1)/2}d\bm{G}.

(36)

It is easy to see that the two laws in (35) and (36) are mutually absolutely continuous. Also, these two densities are well-defined due to our assumption that $n+1\leq k\leq p$ . Thus, Lemma 3 gives

\displaystyle\mathcal{L}_{n}:=\frac{d\mathbb{P}_{1n}}{d\mathbb{P}_{0n}}\left(\bm{X}_{1},\dots,\bm{X}_{n}\right)=\frac{d\mathbb{P}_{1n}\circ T^{-1}}{d\mathbb{P}_{0n}\circ T^{-1}}\left(\bm{G}\right)=C\left(\frac{k-p}{2}\right)\cdot\mbox{det}\left(\bm{G}\right)^{(k-p)/2}.

(37)

where the normalizing constant $C(\theta)$ satisfies

C\left(\theta\right):=\left[\mathbb{E}_{\mathbb{P}_{0n}}\left(\mbox{det}\left(\bm{G}\right)^{\theta}\right)\right]^{-1}.

Similarly to the proof of Theorem 3, we only need to show that

\mathbb{E}_{\mathbb{P}_{0n}}\left(\mathcal{L}_{n}^{2}\right)=1+o(1)

whenever $n\left(1-k/p\right)\to 0$ , where $\mathcal{L}_{n}$ is defined in (37).

From Lemma 5.2 in [jiang2015likelihood], we find that

\displaystyle\mathbb{E}_{\mathbb{P}_{0n}}\left(\mbox{det}\left(\bm{G}\right)^{\theta}\right)=\left[\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p}{2}+\theta\right)}\right]^{n}\cdot\frac{\Gamma_{n}\left(\frac{p}{2}+\theta\right)}{\Gamma_{n}\left(\frac{p}{2}\right)}

for $\theta>-\max\left\{1,(p-n)/2\right\}$ where $\Gamma_{n}$ is the multivariate Gamma function defined as in (5.1) of [jiang2015likelihood]. The specific form of $\Gamma_{n}$ is not relevant to our proof as we will only need the asymptotic result from Proposition 5.1 of [jiang2015likelihood]. These asymptotic results are collected in Lemma 4 in Appendix A.5 below.

Consequently, with $\Delta:=k-p$ we obtain

	$\displaystyle\mathbb{E}_{\mathbb{P}_{0n}}\left(\mathcal{L}_{n}^{2}\right)=\frac{C\left(\frac{\Delta}{2}\right)^{2}}{C\left(\Delta\right)}$	$\displaystyle=\left[\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p}{2}+\Delta\right)}\right]^{n}\cdot\frac{\Gamma_{n}\left(\frac{p}{2}+\Delta\right)}{\Gamma_{n}\left(\frac{p}{2}\right)}\cdot\left[\frac{\Gamma\left(\frac{p+\Delta}{2}\right)}{\Gamma\left(\frac{p}{2}\right)}\right]^{2n}\cdot\frac{\Gamma^{2}_{n}\left(\frac{p}{2}\right)}{\Gamma^{2}_{n}\left(\frac{p+\Delta}{2}\right)}$
		$\displaystyle=\exp\left\{F_{n}(\Delta)-2F_{n}\left(\frac{\Delta}{2}\right)\right\}$

where

\displaystyle F_{n}(\Delta):=n\log\left[\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p}{2}+\Delta\right)}\right]+\log\left[\frac{\Gamma_{n}\left(\frac{p}{2}+\Delta\right)}{\Gamma_{n}\left(\frac{p}{2}\right)}\right].

(38)

The proof is completed by applying Proposition 6 in Appendix A.5, which states that

F_{n}(\Delta)-2F_{n}\left(\frac{\Delta}{2}\right)\to 0

as $n\to\infty$ such that $n(1-k/p)\to 0$ . $\square$
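The quantity $F_{n}(\Delta)-2F_{n}(\Delta/2)$ in (38) can be evaluated numerically from log-gamma functions, which gives a quick sanity check of the regime $n(1-k/p)\to 0$ . The sketch below assumes that $\Gamma_{n}$ is the standard multivariate gamma function, $\Gamma_{n}(a)=\pi^{n(n-1)/4}\prod_{j=1}^{n}\Gamma\left(a-(j-1)/2\right)$ (the definition in [jiang2015likelihood] is not reproduced in this section, so this is an assumption); the function names are hypothetical.

```python
import math


def log_multigamma_ratio(a, theta, n):
    """log[Gamma_n(a + theta) / Gamma_n(a)] for the standard multivariate gamma.

    The factor pi^{n(n-1)/4} cancels in the ratio.
    """
    return sum(
        math.lgamma(a + theta - 0.5 * (j - 1)) - math.lgamma(a - 0.5 * (j - 1))
        for j in range(1, n + 1)
    )


def F(n, p, delta):
    """F_n(delta) as in (38)."""
    return n * (math.lgamma(p / 2) - math.lgamma(p / 2 + delta)) + log_multigamma_ratio(
        p / 2, delta, n
    )


def log_second_moment(n, p, k):
    """log E[L_n^2] = F_n(Delta) - 2 F_n(Delta / 2) with Delta = k - p."""
    delta = k - p
    if p / 2 + delta <= (n - 1) / 2:
        raise ValueError("gamma arguments must be positive: need k - p/2 > (n-1)/2")
    return F(n, p, delta) - 2.0 * F(n, p, delta / 2)


# n(1 - k/p) = 1: the log second moment is visibly away from zero.
print(log_second_moment(20, 2_000, 1_900))
# n(1 - k/p) = 0.01: the log second moment is close to zero.
print(log_second_moment(20, 200_000, 199_900))
```

Working on the log scale avoids overflow in the gamma functions; the values are differences of large numbers, so only the leading digits should be trusted when $p$ is very large.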

Appendix A Technical results, discussions and other proofs

A.1 Comparison with projection-based tests

At a high level, our newly developed procedure follows the philosophy of projection-based tests, initially developed in [cuesta2009projection]. Their test relies on the characterization (7). This characterization can be shown using a variant of the Cramér-Wold device, although the same argument does not apply to Proposition 1. Based on (7), [cuesta2009projection] proposed a test that rejects for large values of

\displaystyle D_{n,\bm{U}}:=\sup_{x\in[-1,1]}\left|\frac{1}{n}\sum_{i=1}^{n}\mathbb{I}\left\{\bm{X}_{i}^{\top}\bm{U}\leq x\right\}-\mathbb{P}\left(\bm{e}_{1}^{\top}\bm{U}\leq x\right)\right|,

(39)

where $\bm{U}\sim\mbox{Unif}\left(\mathbb{S}^{p-1}\right)$ is drawn independently from the data and $\bm{e}_{1}=(1,0,\dots,0)$ .

The test in [cuesta2009projection] uses the same critical value as the Kolmogorov–Smirnov test. In practice, $\bm{U}$ is drawn multiple times from $\mbox{Unif}\left(\mathbb{S}^{p_{n}-1}\right)$ and one gets a corresponding $p$ -value for every such $\bm{U}$ . The test rejects if the smallest $p$ -value is below a threshold. More specifically, one picks a large number $k$ and draws $\bm{U}_{1},\dots,\bm{U}_{k}$ independently from $\mbox{Unif}\left(\mathbb{S}^{p_{n}-1}\right)$ . The test in [cuesta2009projection] rejects at $\alpha$ -level if

\min_{1\leq i\leq k}\mathbb{P}\left(D_{n,\bm{U}_{i}}>K_{\alpha}\Big|\bm{X}_{1},\dots,\bm{X}_{n}\right)\leq c_{\alpha}

where $K_{\alpha}$ is the critical value of the Kolmogorov–Smirnov test and $c_{\alpha}$ is the $(1-\alpha)$ -quantile of the left-hand side. However, as the asymptotic theory for this test remains unresolved, computationally intensive Monte Carlo methods are often required to approximate $c_{\alpha}$ .
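For illustration, the single-direction statistic $D_{n,\bm{U}}$ in (39) can be sketched as follows. The null distribution function $\mathbb{P}\left(\bm{e}_{1}^{\top}\bm{U}\leq x\right)$ is obtained by integrating the density proportional to $(1-t^{2})^{(p-3)/2}$ on $[-1,1]$ , the same density that appears in (49). The function names, the midpoint rule and the restriction to $p\geq 3$ are assumptions of this sketch, not part of the procedure in [cuesta2009projection].

```python
import math
import random


def unit_vector(p, rng):
    """Draw one point uniformly on S^{p-1} by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def projection_cdf(x, p, steps=2000):
    """P(e_1^T U <= x) for U uniform on S^{p-1}, p >= 3, via the midpoint rule."""
    x = max(-1.0, min(1.0, x))
    # log of Gamma(p/2) / (sqrt(pi) * Gamma((p-1)/2)), the density normalizer.
    log_c = math.lgamma(p / 2) - 0.5 * math.log(math.pi) - math.lgamma((p - 1) / 2)
    h = (x + 1.0) / steps
    total = 0.0
    for i in range(steps):
        t = -1.0 + (i + 0.5) * h
        total += (1.0 - t * t) ** ((p - 3) / 2)
    return min(1.0, math.exp(log_c) * h * total)


def projection_ks(sample, u):
    """Kolmogorov-Smirnov distance between the projections X_i^T u and their null law."""
    n, p = len(sample), len(u)
    proj = sorted(sum(a * b for a, b in zip(x, u)) for x in sample)
    stat = 0.0
    for i, v in enumerate(proj):
        f = projection_cdf(v, p)
        # Compare both one-sided limits of the empirical CDF at the jump v.
        stat = max(stat, abs((i + 1) / n - f), abs(i / n - f))
    return stat


rng = random.Random(1)
u = unit_vector(10, rng)
sample = [unit_vector(10, rng) for _ in range(50)]
print(projection_ks(sample, u))
```

In the multi-direction version described above, this computation is repeated for each of the $k$ random directions, which is the source of the computational cost discussed in this subsection.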

Subsequent works, such as [escanciano2006consistent; garcia2023projection; garcia2021cramer] (see also the references therein), addressed this issue by integrating over all possible directions $\bm{U}$ , resulting in test statistics of the form

\displaystyle\mathbb{E}_{\bm{U}}\left[\int_{-1}^{1}\left(\frac{1}{n}\sum_{i=1}^{n}\mathbb{I}\left\{\bm{X}_{i}^{\top}\bm{U}\leq x\right\}-\mathbb{P}\left(\bm{e}_{1}^{\top}\bm{U}\leq x\right)\right)^{2}w(x)dx\right],

(40)

for some weight function $w(x)\in L^{2}\left([-1,1]\right)$ , where the expectation above is taken with respect to $\bm{U}$ .

Test statistics like (40) exhibit desirable properties, similar to the Anderson–Darling and Cramér–von Mises tests. However, the expectation with respect to $\bm{U}$ in (40) often lacks a closed-form expression, requiring Monte Carlo simulations for approximation. Additionally, their asymptotic distributions frequently involve weighted sums of chi-squared distributions, complicating the computation of tail probabilities. For example, the tests in [garcia2021cramer; garcia2023projection] rely on Imhof’s method to approximate the critical value. Our proposed test offers two key advantages over projection-based tests:

1.

Reduction in computational cost: Unlike projection-based methods, our test avoids sampling random directions or integrating over all possible directions, which often requires complex procedures to approximate the critical values. Theorem 1 demonstrates that when the dimension is large, the tail probabilities of our test statistic are much simpler to approximate, eliminating the need for Monte Carlo simulation.
2.

Flexibility in the high-dimensional settings: While existing projection-based tests are valid only in fixed-dimensional scenarios, our test extends seamlessly to high-dimensional settings, including cases where $p$ is large and $n$ is small. Extending projection-based tests to such settings is highly non-trivial, as it requires understanding how the eigenvalues of the associated Hilbert–Schmidt operator shrink to zero as the dimension increases.

A.2 Simulation studies

In this subsection, we conduct a simulation study to compare the power of $T_{n}$ with that of the three existing tests, $R_{n},B_{n},P_{n}$ in (2), (3) and (4), respectively. We set $n=p=80$ in the experiments. The empirical powers of the four tests are reported in Figure 1 below. The yellow curve corresponds to the proposed test $T_{n}$ . The empirical power agrees well with the theoretical analysis: the proposed test is nearly as powerful as the Rayleigh test in the FvML model and is the only test that remains consistent under both models.

Figure 1: Empirical power under the FvML model and the low-rank model.

A.3 Proof of Proposition 4

To prove (30), we employ a martingale central limit theorem. We will use Corollary 3.1 in [Hall]. Define

$\displaystyle Z_{n}:$	$\displaystyle=\sum_{1\leq i<j\leq n}h_{n}\left(\bm{X}^{\top}_{i}\bm{X}_{j}\right)$	(41)
$\displaystyle s_{n}^{2}:$	$\displaystyle=\mbox{Var}(Z_{n}),$
$\displaystyle Y_{n,i}:$	$\displaystyle=\sum_{j=1}^{i-1}h_{n}(\bm{X}^{\top}_{i}\bm{X}_{j}),$
$\displaystyle Q_{n}:$	$\displaystyle=\sum_{i=2}^{n}\mathbb{E}\left(Y_{n,i}^{2}\Big|\bm{X}_{1},\bm{X}_{2},\dots,\bm{X}_{i-1}\right).$	(42)

Thanks to Lemma 8 in Section A, the sequence $\left\{Y_{n,i};2\leq i\leq n\right\}$ is a martingale difference sequence with respect to the natural sigma-fields $\mathcal{F}_{i}=\sigma\left(\bm{X}_{1},\bm{X}_{2},\dots,\bm{X}_{i}\right)$ . This means that $\mathbb{E}\left(Y_{n,i}|\mathcal{F}_{i-1}\right)=0$ , so that $Z_{n}=\sum_{i=2}^{n}Y_{n,i}$ is a sum of martingale differences. To get (30) via the martingale CLT from Corollary 3.1 in [Hall], we need to verify the following two conditions:

\displaystyle s_{n}^{-2}\sum_{i=2}^{n}\mathbb{E}\Big[Y_{n,i}^{2}\cdot\mathbf{1}_{\left\{|Y_{n,i}|>\varepsilon s_{n}\right\}}\Big]\to 0,

(43)

for every fixed $\varepsilon>0$ , and

\displaystyle s_{n}^{-2}Q_{n}\xrightarrow{\mathbb{P}}1.

(44)

To verify the Lindeberg condition (43), it suffices to show that

\displaystyle s_{n}^{-4}\sum_{i=2}^{n}\mathbb{E}Y_{n,i}^{4}\to 0.

(45)

By the pairwise independence property (see Lemma 8), we get $s_{n}^{2}=n^{2}(1+o(1))\cdot\mathbb{E}h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})$ and thus,

\displaystyle s_{n}^{4}=n^{4}(1+o(1))\cdot\sigma^{4},

(46)

where $\sigma^{2}$ is the limit in (28). To bound the $4$ -th moment terms in (45), we first note that

	$\displaystyle\mathbb{E}\Big[h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\cdot h_{n}(\bm{X}^{\top}_{1}\bm{X}_{3})\cdot h_{n}(\bm{X}^{\top}_{1}\bm{X}_{4})\Big]$	$\displaystyle=\mathbb{E}\mathbb{E}\Big[h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\cdot h_{n}(\bm{X}^{\top}_{1}\bm{X}_{3})\cdot h_{n}(\bm{X}^{\top}_{1}\bm{X}_{4})\Big|\bm{X}_{1}\Big]$
		$\displaystyle=\mathbb{E}\Big[\mathbb{E}\left(h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\Big|\bm{X}_{1}\right)\cdot\mathbb{E}\left(h_{n}(\bm{X}^{\top}_{1}\bm{X}_{3})\Big|\bm{X}_{1}\right)$
		$\displaystyle\cdot\mathbb{E}\left(h_{n}(\bm{X}^{\top}_{1}\bm{X}_{4})\Big|\bm{X}_{1}\right)\Big]$
		$\displaystyle=0,$

due to the assumption $\mathbb{E}h_{n}(\bm{X}_{1}^{\top}\bm{X}_{2})=0$ and Lemma 8. Similarly,

	$\displaystyle\mathbb{E}\Big[h_{n}^{3}(\bm{X}^{\top}_{1}\bm{X}_{2})\cdot h_{n}(\bm{X}^{\top}_{1}\bm{X}_{3})\Big]$	$\displaystyle=\mathbb{E}\mathbb{E}\Big[h_{n}^{3}(\bm{X}^{\top}_{1}\bm{X}_{2})\cdot h_{n}(\bm{X}^{\top}_{1}\bm{X}_{3})\Big|\bm{X}_{1}\Big]$
		$\displaystyle=\mathbb{E}\Big[\mathbb{E}\left(h_{n}^{3}(\bm{X}^{\top}_{1}\bm{X}_{2})\Big|\bm{X}_{1}\right)\cdot\mathbb{E}\left(h_{n}(\bm{X}^{\top}_{1}\bm{X}_{3})\Big|\bm{X}_{1}\right)\Big]$
		$\displaystyle=0.$

Consequently, for some universal constant $C$ , we get

	$\displaystyle\mathbb{E}Y_{n,i}^{4}$	$\displaystyle\leq\sum_{j=1}^{i-1}\mathbb{E}h_{n}^{4}(\bm{X}^{\top}_{i}\bm{X}_{j})+C\cdot\sum_{1\leq r\neq t\leq i-1}\mathbb{E}\Big[h_{n}^{2}(\bm{X}^{\top}_{i}\bm{X}_{r})\cdot h_{n}^{2}(\bm{X}^{\top}_{i}\bm{X}_{t})\Big]$
		$\displaystyle=(i-1)\cdot\mathbb{E}h_{n}^{4}(\bm{X}^{\top}_{1}\bm{X}_{2})+Ci^{2}\cdot\mathbb{E}\Big[h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\cdot h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{3})\Big]$
		$\displaystyle=(i-1)\cdot\mathbb{E}h_{n}^{4}(\bm{X}^{\top}_{1}\bm{X}_{2})+Ci^{2}\cdot\left(\mathbb{E}\Big[h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\Big]\right)^{2}.$

where the last assertion follows from the second statement of Lemma 8. Summing the above display over all $2\leq i\leq n$ , we arrive at

	$\displaystyle\sum_{i=2}^{n}\mathbb{E}Y_{n,i}^{4}$	$\displaystyle\leq C_{1}\cdot\Big(n^{2}\cdot\mathbb{E}h_{n}^{4}(\bm{X}^{\top}_{1}\bm{X}_{2})+n^{3}\cdot\mathbb{E}^{2}\Big[h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\Big]\Big)$
		$\displaystyle\leq(C_{1}+1)\cdot n^{3}\cdot\mathbb{E}h_{n}^{4}(\bm{X}^{\top}_{1}\bm{X}_{2}),$

for some universal constant $C_{1}$ . This in turn yields

s_{n}^{-4}\sum_{i=2}^{n}\mathbb{E}Y_{n,i}^{4}\leq(C_{1}+1)\cdot\frac{\mathbb{E}\Big[h_{n}^{4}(\bm{X}^{\top}_{1}\bm{X}_{2})\Big]}{n\cdot\mathbb{E}^{2}\Big[h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})\Big]}

by the preceding display and (46). The last term in the above display goes to $0$ by assumption (29), which implies (45). This in turn concludes the proof of (43).

We now prove (44). Recall $Q_{n}$ defined in (42), which can be rewritten as

	$\displaystyle Q_{n}$	$\displaystyle=\sum_{i=2}^{n}\Big[\sum_{j=1}^{i-1}\mathbb{E}\left(h_{n}^{2}(\bm{X}^{\top}_{i}\bm{X}_{j})\Big|\bm{X}_{j}\right)+\sum_{1\leq r\neq t\leq i-1}\mathbb{E}\left(h_{n}(\bm{X}^{\top}_{i}\bm{X}_{r})\cdot h_{n}(\bm{X}^{\top}_{i}\bm{X}_{t})\Big|\bm{X}_{r},\bm{X}_{t}\right)\Big]$
		$\displaystyle=\sum_{i=2}^{n}(i-1)\cdot\mathbb{E}h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})+\sum_{i=2}^{n}\sum_{1\leq r\neq t\leq i-1}\mathbb{E}\left(h_{n}(\bm{X}^{\top}_{i}\bm{X}_{r})\cdot h_{n}(\bm{X}^{\top}_{i}\bm{X}_{t})\Big|\bm{X}_{r},\bm{X}_{t}\right).$

Let $\bm{X},\bm{Y},\bm{Z}$ be i.i.d. draws from the uniform distribution on $\mathbb{S}^{p-1}$ . Define

\displaystyle H_{n}(\bm{X},\bm{Y}):=\mathbb{E}\Big[h_{n}\left(\bm{X}^{\top}\bm{Z}\right)\cdot h_{n}\left(\bm{Y}^{\top}\bm{Z}\right)\Big|\bm{X},\bm{Y}\Big].

(47)

It is easy to see that for any $1\leq r\neq t\leq i-1$ , we have

H_{n}(\bm{X}_{r},\bm{X}_{t})=\mathbb{E}\left(h_{n}(\bm{X}^{\top}_{i}\bm{X}_{r})\cdot h_{n}(\bm{X}^{\top}_{i}\bm{X}_{t})\Big|\bm{X}_{r},\bm{X}_{t}\right).

Thus, we have

	$\displaystyle Q_{n}$	$\displaystyle=\sum_{i=2}^{n}(i-1)\cdot\mathbb{E}h_{n}^{2}(\bm{X}^{\top}_{1}\bm{X}_{2})+\sum_{i=2}^{n}\sum_{1\leq r\neq t\leq i-1}H_{n}(\bm{X}_{r},\bm{X}_{t})$
		$\displaystyle=s_{n}^{2}\cdot(1+o(1))+Q_{n}^{*},$

where

\displaystyle Q_{n}^{*}:=\sum_{i=2}^{n}\sum_{1\leq r\neq t\leq i-1}H_{n}(\bm{X}_{r},\bm{X}_{t}).

To prove (44), we only need to show that $\mbox{Var}\left(Q_{n}^{*}/s_{n}^{2}\right)\to 0$ . Let $A=\left\{(r,t):1\leq r\neq t\leq n-1\right\}$ . Note that for any two pairs $(r,t)\in A$ and $(r^{\prime},t^{\prime})\in A$ , we have

\mathbb{E}\Big[H_{n}(\bm{X}_{r},\bm{X}_{t})\cdot H_{n}(\bm{X}_{r^{\prime}},\bm{X}_{t^{\prime}})\Big]=0,

unless $\left\{r,t\right\}=\left\{r^{\prime},t^{\prime}\right\}$ . This is due to the assumption $\mathbb{E}h_{n}(\bm{X}_{1}^{\top}\bm{X}_{2})=0$ and Lemma 8. Consequently,

	$\displaystyle\mbox{Var}\left(Q_{n}^{*}\right)$	$\displaystyle=\mbox{Var}\Big(\sum_{(r,t)\in A}\Big[n-\max\left\{r,t\right\}-1\Big]\cdot H_{n}(\bm{X}_{r},\bm{X}_{t})\Big)$
		$\displaystyle=\sum_{(r,t)\in A}\Big[n-\max\left\{r,t\right\}-1\Big]^{2}\cdot\mathbb{E}H_{n}^{2}(\bm{X}_{r},\bm{X}_{t})$
		$\displaystyle\leq O(n^{4})\cdot\mathbb{E}H_{n}^{2}(\bm{X}_{1},\bm{X}_{2}),$

where we use the fact that $|A|\leq n^{2}$ in the last bound. By (46), $s_{n}^{4}=\sigma^{4}n^{4}(1+o(1))$ so it suffices to verify that $\mathbb{E}H_{n}^{2}(\bm{X}_{1},\bm{X}_{2})\to 0$ , which is the content of Lemma 10 in Section A. This concludes the proof of (44), which together with (43) implies (30). $\square$

A.4 Proof of Lemma 3

Suppose $\mathbb{Q}(A)=0$ for some measurable subset $A\subset\mathcal{X}$ . Let $X\sim\mathbb{P}$ and $Y\sim\mathbb{Q}$ . By the disintegration theorem (see, for example, Appendix F of [pollard2002user]),

\mathbb{Q}(A)=\int_{\mathcal{Y}}\mathbb{Q}\bigl(Y\in A\cap T^{-1}(t)\mid T(Y)=t\bigr)\,\mathbb{Q}_{T}(dt),

and similarly

\mathbb{P}(A)=\int_{\mathcal{Y}}\mathbb{P}\bigl(X\in A\cap T^{-1}(t)\mid T(X)=t\bigr)\,\mathbb{P}_{T}(dt).

For each $t$ , define the $G$ -invariant conditional laws on $T^{-1}(t)$ (such sets are called fibers) by

\mathbb{K}_{P}^{t}(B):=\int_{G}\mathbb{P}(gX\in B\mid T(X)=t)\,\Pi(dg),\qquad\mathbb{K}_{Q}^{t}(B):=\int_{G}\mathbb{Q}(gY\in B\mid T(Y)=t)\,\Pi(dg),

for measurable $B\subset T^{-1}(t)$ , where $\Pi$ is the normalized Haar probability measure on $G$ .

Roughly speaking, we integrate the disintegration kernels over the group $G$ so that the resulting kernels are $G$ -invariant, up to some null sets in $\mathcal{Y}$ . Because $\mathbb{P}$ and $\mathbb{Q}$ are $G$ -invariant, we still have

\mathbb{Q}(A)=\int_{\mathcal{Y}}\mathbb{K}_{Q}^{t}(A\cap T^{-1}(t))\,\mathbb{Q}_{T}(dt),\qquad\mathbb{P}(A)=\int_{\mathcal{Y}}\mathbb{K}_{P}^{t}(A\cap T^{-1}(t))\,\mathbb{P}_{T}(dt).

Each fiber $T^{-1}(t)$ is a single $G$ -orbit, in the sense that it is generated by $\left\{gx_{t},g\in G\right\}$ for some $x_{t}\in\mathcal{X}$ , because $T$ is a maximal invariant. Thus $G$ acts transitively on every such fiber. By Theorem 4.5 of [eaton1989group], a transitive compact group action admits a unique $G$ -invariant probability measure on each orbit. Hence,

\displaystyle\mathbb{K}_{P}^{t}\equiv\mathbb{K}_{Q}^{t}\quad\text{for $\mathbb{Q}_{T}$-almost every $t$}.

(48)

Since $\mathbb{Q}(A)=0$ , we have

\mathbb{K}_{Q}^{t}(A\cap T^{-1}(t))=0\quad\text{for $\mathbb{Q}_{T}$-almost every $t$},

and therefore for $\mathbb{P}_{T}$ -almost every $t$ as well, because $\mathbb{P}_{T}\ll\mathbb{Q}_{T}$ . Using $\mathbb{K}_{P}^{t}=\mathbb{K}_{Q}^{t}$ on this set of $t$ ,

\mathbb{P}(A)=\int_{\mathcal{Y}}\mathbb{K}_{P}^{t}(A\cap T^{-1}(t))\,\mathbb{P}_{T}(dt)=0.

Thus $\mathbb{P}\ll\mathbb{Q}$ .

Now, for the statement regarding the likelihood ratio, let

L(t):=\frac{d\mathbb{P}_{T}}{d\mathbb{Q}_{T}}(t).

Then, by (48),

\mathbb{P}(A)=\int_{\mathcal{Y}}\mathbb{K}_{P}^{t}(A\cap T^{-1}(t))\,\mathbb{P}_{T}(dt)=\int_{\mathcal{Y}}\mathbb{K}_{Q}^{t}(A\cap T^{-1}(t))L(t)\,\mathbb{Q}_{T}(dt).

On the other hand,

\int_{\mathcal{X}}\mathbf{1}_{A}(x)\,L(T(x))\,\mathbb{Q}(dx)=\int_{\mathcal{Y}}\mathbb{K}_{Q}^{t}(A\cap T^{-1}(t))L(t)\,\mathbb{Q}_{T}(dt).

Hence,

\mathbb{P}(A)=\int_{A}L(T(x))\,\mathbb{Q}(dx),

which implies

\frac{d\mathbb{P}}{d\mathbb{Q}}(x)=L(T(x))=\frac{d\mathbb{P}_{T}}{d\mathbb{Q}_{T}}(T(x)),\quad\mathbb{Q}\text{-a.s.}

This completes the proof. $\square$

A.5 Likelihood ratio analysis

This section contains the analysis of the likelihood ratio’s second moment used in Sections 3.2 and 3.3.

We first analyze the second moment of the likelihood ratio between the FvML distributions with randomized locations and the uniform distribution. We parametrize the FvML distributions as

\displaystyle d\mathbb{P}_{\rm Fvml}:=C_{p}(\kappa)\cdot\exp\left(\kappa\langle\bm{\mu},\bm{x}\rangle\right)d\mathbb{P}_{0},

with $\kappa\in(0,\infty)$ and $\bm{\mu}\in\mathbb{S}^{p-1}$ . From this, we find that

\displaystyle C_{p}(\kappa)=\left[\mathbb{E}_{\mathbb{P}_{0}}\exp\left(\kappa\langle\bm{\mu},\bm{x}\rangle\right)\right]^{-1}=\left[\frac{1}{\sqrt{\pi}}\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p-1}{2}\right)}\cdot\int_{-1}^{1}e^{\kappa t}\left(1-t^{2}\right)^{\frac{p-3}{2}}dt\right]^{-1}.

(49)
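
As a numerical sanity check of (49), the following Python sketch evaluates $\log C_{p}(\kappa)$ by quadrature and compares it with $-\kappa^{2}/(2p)$ . This comparison value is our own reading of the leading term (it comes from $\mbox{Var}_{\mathbb{P}_{0}}\langle\bm{\mu},\bm{x}\rangle=1/p$ ) and is not quoted from Lemma 12; the function name, the grid size and the restriction $p\geq 3$ are likewise assumptions made for illustration.

```python
import math
import numpy as np

def log_normalizing_constant(p, kappa, grid=200001):
    """Approximate log C_p(kappa) from (49) by the trapezoidal rule (assumes p >= 3)."""
    t = np.linspace(-1.0, 1.0, grid)
    # log of Gamma(p/2) / (sqrt(pi) * Gamma((p-1)/2))
    log_const = math.lgamma(p / 2) - math.lgamma((p - 1) / 2) - 0.5 * math.log(math.pi)
    # (1 - t^2)^((p-3)/2); clipping guards against tiny negative rounding at the endpoints
    weight = np.clip(1.0 - t**2, 0.0, None) ** ((p - 3) / 2)
    integrand = np.exp(kappa * t) * weight
    h = t[1] - t[0]
    integral = h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    # C_p(kappa) is the reciprocal of the normalized integral
    return -(log_const + math.log(integral))

p, kappa = 500, 5.0
print(log_normalizing_constant(p, kappa), -kappa**2 / (2 * p))
```

With $\kappa=0$ the routine returns a value close to $0$ , which checks the normalization of the inner-product density.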

Some basic properties of the above normalizing constant are collected in Lemma 12. Define the likelihood ratio

\displaystyle L_{n}:=\mathbb{E}_{\bm{\mu}\sim\mathbb{P}_{0}}\left[\frac{d\mathbb{P}_{\rm Fvml}^{\otimes n}}{d\mathbb{P}_{0}^{\otimes n}}\right]=\left(C_{p}(\kappa)\right)^{n}\cdot\mathbb{E}_{\bm{\mu}\sim\mathbb{P}_{0}}\left[\prod_{i=1}^{n}\exp\Big(\kappa\langle\bm{\mu},\bm{X}_{i}\rangle\Big)\right].

(50)

Our first result is an asymptotic formula for the second moment of the likelihood ratio.

Proposition 5.

Let $\kappa=\tau p^{3/4}/\sqrt{n}$ for some $\tau>0$ . If $\min\left\{p,n\right\}\to\infty$ , then

\mathbb{E}\left(L_{n}^{2}\right)=\exp\left(\tau^{4}/2+o(1)\right)

where $L_{n}$ is defined as in (50).

Proof of Proposition 5. Recall the form of $L_{n}$ in (50). To compute the second moment, take two independent draws $\bm{\mu},\bm{\mu}_{1}$ from the uniform distribution and write

	$\displaystyle\mathbb{E}L_{n}^{2}$	$\displaystyle=\left(C_{p}(\kappa)\right)^{2n}\cdot\mathbb{E}_{\bm{X}}\left[\mathbb{E}_{(\bm{\mu},\bm{\mu}_{1})}\left[\prod_{i=1}^{n}\exp\Big(\kappa\langle\bm{\mu}+\bm{\mu}_{1},\bm{X}_{i}\rangle\Big)\right]\right]$
		$\displaystyle=C_{p}(\kappa)^{2n}\cdot\mathbb{E}_{(\bm{\mu},\bm{\mu}_{1})}\left[\mathbb{E}_{\bm{X}}\left[\prod_{i=1}^{n}\exp\left(\kappa\|\bm{\mu}+\bm{\mu}_{1}\|\cdot\Big\langle\frac{\bm{\mu}+\bm{\mu}_{1}}{\|\bm{\mu}+\bm{\mu}_{1}\|},\bm{X}_{i}\Big\rangle\right)\right]\right]$
		$\displaystyle=\mathbb{E}\left[\frac{C_{p}(\kappa)^{2n}}{C_{p}(\kappa\|\bm{\mu}+\bm{\mu}_{1}\|)^{n}}\right].$

Note that $\|\bm{\mu}+\bm{\mu}_{1}\|\stackrel{{\scriptstyle d}}{{=}}\sqrt{2(1+U)}$ , where $U$ has the law as in (5). Thus, we have

\mathbb{E}L_{n}^{2}=\mathbb{E}\left[\frac{C_{p}(\kappa)^{2n}}{C_{p}(\kappa\sqrt{2(1+U)})^{n}}\right]=\mathbb{E}\left[\exp\left(n\left(2\log C_{p}(\kappa)-\log C_{p}\left(\kappa\sqrt{2(1+U)}\right)\right)\right)\right]

where $U$ has a symmetric Beta-type distribution as in (5).

Put

L_{n1}:=2\log C_{p}(\kappa)-\log C_{p}\left(\kappa\sqrt{2(1+U)}\right).

By the third property in Lemma 12, we have

	$\displaystyle\left|L_{n1}+\frac{\kappa^{2}}{p}-\frac{\kappa^{2}(1+U)}{p}\right|$	$\displaystyle\leq 2\left|\log C_{p}(\kappa)+\frac{\kappa^{2}}{2p}\right|+\left|\log C_{p}\left(\kappa\sqrt{2(1+U)}\right)+\frac{\kappa^{2}(1+U)}{p}\right|$
		$\displaystyle=O\left(\frac{\kappa^{4}}{p^{3}}\right)=O\left(\frac{\tau^{4}}{n^{2}}\right).$

Consequently,

	$\displaystyle\mathbb{E}L_{n}^{2}=\mathbb{E}\left[\exp\left(nL_{n1}\right)\right]$	$\displaystyle=\mathbb{E}\left[\exp\left(\frac{\kappa^{2}n}{p}\cdot U+O\left(n^{-1}\right)\right)\right]$
		$\displaystyle=\mathbb{E}\left[\exp\left(\tau^{2}\cdot\sqrt{p}U+O\left(n^{-1}\right)\right)\right].$

The proof is completed by noting that the sequence $\left\{\sqrt{p}U\right\}$ converges in distribution to a standard normal $Z$ and is exponentially tight, so that $\mathbb{E}\exp\left(\tau^{2}\sqrt{p}U\right)\to\mathbb{E}\exp\left(\tau^{2}Z\right)=\exp\left(\tau^{4}/2\right)$ . $\square$
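
The limit in Proposition 5 can be illustrated numerically. The sketch below computes $\mathbb{E}\exp\left(\tau^{2}\sqrt{p}U\right)$ by quadrature and compares it with $\exp(\tau^{4}/2)$ . It assumes that the law of $U$ in (5) has density proportional to $(1-u^{2})^{(p-3)/2}$ on $[-1,1]$ , which is consistent with (55); the function name and grid size are hypothetical choices, and the code is only meant for moderate values of $\tau^{2}\sqrt{p}$ (to avoid floating-point overflow).

```python
import math
import numpy as np

def second_moment_proxy(tau, p, grid=400001):
    """Approximate E exp(tau^2 * sqrt(p) * U), U = <X, Y> for independent uniform X, Y."""
    u = np.linspace(-1.0, 1.0, grid)
    log_const = math.lgamma(p / 2) - math.lgamma((p - 1) / 2) - 0.5 * math.log(math.pi)
    # density of U, assumed proportional to (1 - u^2)^((p-3)/2)
    density = math.exp(log_const) * np.clip(1.0 - u**2, 0.0, None) ** ((p - 3) / 2)
    integrand = np.exp(tau**2 * math.sqrt(p) * u) * density
    h = u[1] - u[0]
    # trapezoidal rule on the uniform grid
    return h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

tau = 1.0
for p in (20, 200, 2000):
    print(p, second_moment_proxy(tau, p), math.exp(tau**4 / 2))
```

The printed values approach $\exp(\tau^{4}/2)$ as $p$ grows, in line with the exponential tightness argument.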

The next asymptotic result was used in the proof of Theorem 4. Recall that the multivariate Gamma function $\Gamma_{n}(z)$ is defined as

\displaystyle\Gamma_{n}(z):=\pi^{n(n-1)/4}\prod_{k=1}^{n}\Gamma\left(z-\frac{k-1}{2}\right)

(51)

for all complex numbers $z$ such that $\mbox{Re}(z)>(n-1)/2$ .

The multivariate Gamma function reduces to the usual Gamma function for $n=1$ . The following lemma is taken from Lemma 5.1 and Proposition 5.1 in [jiang2015likelihood].

Lemma 4.

Let $\Gamma(x)$ be the standard Gamma function and $\Gamma_{n}(x)$ be the multivariate Gamma function as in (51). We have

•

Uniformly for all $b\in[-x/2,x/2]$ ,

\log\left[\frac{\Gamma(x+b)}{\Gamma(x)}\right]=(x+b)\log(x+b)-x\log x-b-\frac{b}{2x}+O\left(\frac{b^{2}+1}{x^{2}}\right)

as $x\to\infty$ .

•

Uniformly for all $t\in[-p/n,\,p/n]$ ,

\log\left[\frac{\Gamma_{n}\!\left(\frac{p}{2}+t\right)}{\Gamma_{n}\!\left(\frac{p}{2}\right)}\right]=\alpha_{n,p}\,t+\beta_{n,p}\,t^{2}+\gamma_{n,p}(t)+o(1)

as $n,p\to\infty$ with $p/n\to\infty$ , where

	$\displaystyle\alpha_{n,p}$	$\displaystyle:=-\left[2n+\left(p-n-\frac{1}{2}\right)\log\!\left(1-\frac{n}{p}\right)\right],$
	$\displaystyle\beta_{n,p}$	$\displaystyle:=-\left[\frac{n}{p}+\log\!\left(1-\frac{n}{p}\right)\right],$
	$\displaystyle\gamma_{n,p}(t)$	$\displaystyle:=n\left[\left(\frac{p}{2}+t\right)\log\!\left(\frac{p}{2}+t\right)-\frac{p}{2}\log\!\left(\frac{p}{2}\right)\right].$

Proof of Lemma 4. The first item follows from Lemma 5.1 in [jiang2015likelihood] and the second item follows from Proposition 5.1 in [jiang2015likelihood]. $\square$
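
The first expansion in Lemma 4 is easy to probe numerically. The following sketch (the function names and the test points are our own choices, not taken from [jiang2015likelihood]) evaluates the remainder of the expansion and rescales it by $x^{2}/(b^{2}+1)$ , which should stay bounded for $b\in[-x/2,x/2]$ .

```python
import math

def gamma_ratio_remainder(x, b):
    """Exact log[Gamma(x+b)/Gamma(x)] minus the explicit terms of the expansion."""
    exact = math.lgamma(x + b) - math.lgamma(x)
    approx = (x + b) * math.log(x + b) - x * math.log(x) - b - b / (2 * x)
    return exact - approx

def scaled_remainder(x, b):
    """Remainder divided by (b^2 + 1) / x^2, the rate claimed in Lemma 4."""
    return abs(gamma_ratio_remainder(x, b)) * x**2 / (b**2 + 1)

for x in (50.0, 200.0, 1000.0):
    print(x, [round(scaled_remainder(x, b), 3) for b in (-x / 2, -1.0, 1.0, x / 2)])
```

Very large $x$ should be avoided in this check, since the remainder then falls below the floating-point accuracy of `math.lgamma`.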

Proposition 6.

Recall $F_{n}$ in (38). Suppose $p/n\to\infty$ , $n(1-k/p)\to 0$ , and $n+1\leq k\leq p$ . Then,

F_{n}(\Delta)-2F_{n}\left(\frac{\Delta}{2}\right)\to 0

where $\Delta:=k-p$ .

Proof of Proposition 6. Write

	$\displaystyle F_{n}(\Delta)-2F_{n}\left(\frac{\Delta}{2}\right)$	$\displaystyle=n\log\left[\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p}{2}+\Delta\right)}\right]+\log\left[\frac{\Gamma_{n}\left(\frac{p}{2}+\Delta\right)}{\Gamma_{n}\left(\frac{p}{2}\right)}\right]$
		$\displaystyle-2n\log\left[\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p+\Delta}{2}\right)}\right]-2\log\left[\frac{\Gamma_{n}\left(\frac{p+\Delta}{2}\right)}{\Gamma_{n}\left(\frac{p}{2}\right)}\right].$

Obviously, $|\Delta|=O(p/n)=o(p)$ , so Lemma 4 applies and yields

		$\displaystyle n\log\left[\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p}{2}+\Delta\right)}\right]-2n\log\left[\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p+\Delta}{2}\right)}\right]$
	$\displaystyle=$	$\displaystyle n\left[-\left(\frac{p}{2}+\Delta\right)\log\left(\frac{p}{2}+\Delta\right)+\frac{p}{2}\log\left(\frac{p}{2}\right)+\Delta+\frac{\Delta}{p}+O\left(\frac{\Delta^{2}+1}{p^{2}}\right)\right]$
	$\displaystyle-$	$\displaystyle 2n\left[-\left(\frac{p+\Delta}{2}\right)\log\left(\frac{p+\Delta}{2}\right)+\frac{p}{2}\log\left(\frac{p}{2}\right)+\frac{\Delta}{2}+\frac{\Delta}{2p}+O\left(\frac{\Delta^{2}+1}{p^{2}}\right)\right]$
	$\displaystyle=$	$\displaystyle-n\left[\left(\frac{p}{2}+\Delta\right)\log\left(\frac{p}{2}+\Delta\right)-2\left(\frac{p+\Delta}{2}\right)\log\left(\frac{p+\Delta}{2}\right)\right]+O\left(\frac{n(\Delta^{2}+1)}{p^{2}}\right)$
	$\displaystyle-$	$\displaystyle\frac{np}{2}\log\left(\frac{p}{2}\right)$
	$\displaystyle=$	$\displaystyle-n\left[\left(\frac{p}{2}+\Delta\right)\log\left(\frac{p}{2}+\Delta\right)-2\left(\frac{p+\Delta}{2}\right)\log\left(\frac{p+\Delta}{2}\right)\right]-\frac{np}{2}\log\left(\frac{p}{2}\right)+o(1),$

where the last line follows from the fact that $n(\Delta^{2}+1)/p^{2}=O(1/n)=o(1)$ .

Similarly, with $\alpha_{n,p},\beta_{n,p},\gamma_{n,p}(t)$ as in Lemma 4, we have

		$\displaystyle\log\left[\frac{\Gamma_{n}\left(\frac{p}{2}+\Delta\right)}{\Gamma_{n}\left(\frac{p}{2}\right)}\right]-2\log\left[\frac{\Gamma_{n}\left(\frac{p+\Delta}{2}\right)}{\Gamma_{n}\left(\frac{p}{2}\right)}\right]$
	$\displaystyle=$	$\displaystyle\alpha_{n,p}\,\Delta+\beta_{n,p}\,\Delta^{2}+\gamma_{n,p}(\Delta)-\left(\alpha_{n,p}\,\Delta+\beta_{n,p}\,\frac{\Delta^{2}}{2}+2\gamma_{n,p}(\Delta/2)\right)+o(1)$
	$\displaystyle=$	$\displaystyle\beta_{n,p}\,\frac{\Delta^{2}}{2}+\gamma_{n,p}(\Delta)-2\gamma_{n,p}(\Delta/2)+o(1)$
	$\displaystyle=$	$\displaystyle\beta_{n,p}\,\frac{\Delta^{2}}{2}+n\left[\left(\frac{p}{2}+\Delta\right)\log\!\left(\frac{p}{2}+\Delta\right)-\frac{p}{2}\log\!\left(\frac{p}{2}\right)\right]$
	$\displaystyle-$	$\displaystyle 2n\left[\left(\frac{p+\Delta}{2}\right)\log\!\left(\frac{p+\Delta}{2}\right)-\frac{p}{2}\log\!\left(\frac{p}{2}\right)\right]+o(1)$
	$\displaystyle=$	$\displaystyle\beta_{n,p}\,\frac{\Delta^{2}}{2}+n\left[\left(\frac{p}{2}+\Delta\right)\log\!\left(\frac{p}{2}+\Delta\right)-2\left(\frac{p+\Delta}{2}\right)\log\!\left(\frac{p+\Delta}{2}\right)\right]+\frac{np}{2}\log\left(\frac{p}{2}\right)+o(1).$

Thus,

	$\displaystyle F_{n}(\Delta)-2F_{n}\left(\frac{\Delta}{2}\right)=\beta_{n,p}\,\frac{\Delta^{2}}{2}+o(1)$	$\displaystyle=-\left[\frac{n}{p}+\log\!\left(1-\frac{n}{p}\right)\right]\frac{\Delta^{2}}{2}+o(1)$
		$\displaystyle=\frac{n^{2}\Delta^{2}}{4p^{2}}(1+o(1))+o(1),$

which tends to $0$ since $n\Delta/p=-n(1-k/p)\to 0$ . The proof is completed. $\square$

A.6 Kolmogorov distance asymptotic results

Let us start with a simple observation.

Lemma 5.

Suppose $\mathbb{P}_{n}$ and $\mathbb{Q}_{n}$ are two sequences of probability measures such that the likelihood ratio $L_{n}:=d\mathbb{Q}_{n}/d\mathbb{P}_{n}$ exists. Let $\left\{X_{n};n\geq 1\right\}$ be a sequence of random variables. If

\displaystyle\sup_{n\geq 1}\left\{\mathbb{E}_{\mathbb{P}_{n}}\left(L_{n}^{2}\right)+\mathbb{E}_{\mathbb{P}_{n}}\left(X_{n}^{4}\right)\right\}<\infty,

then

\sup_{n\geq 1}\left\{\mathbb{E}_{\mathbb{Q}_{n}}\left(X_{n}^{2}\right)\right\}<\infty\ \text{and}\ \left|\mathbb{E}_{\mathbb{P}_{n}}\left(X_{n}\right)-\mathbb{E}_{\mathbb{Q}_{n}}\left(X_{n}\right)\right|\leq\sqrt{\mathbb{E}_{\mathbb{P}_{n}}\left(X_{n}^{2}\right)\cdot\mbox{Var}_{\mathbb{P}_{n}}\left(L_{n}\right)}.

Proof of Lemma 5. Observe that

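\displaystyle\mathbb{E}_{\mathbb{Q}_{n}}X_{n}^{2}=\mathbb{E}_{\mathbb{P}_{n}}\left(X_{n}^{2}L_{n}\right)\leq\sqrt{\mathbb{E}_{\mathbb{P}_{n}}\left(X_{n}^{4}\right)\cdot\mathbb{E}_{\mathbb{P}_{n}}\left(L_{n}^{2}\right)}\leq\frac{\mathbb{E}_{\mathbb{P}_{n}}\left(X_{n}^{4}\right)+\mathbb{E}_{\mathbb{P}_{n}}\left(L_{n}^{2}\right)}{2}.

For the second inequality, note that $\mathbb{E}_{\mathbb{P}_{n}}\left(L_{n}\right)=1$ , so that $\mathbb{E}_{\mathbb{Q}_{n}}\left(X_{n}\right)-\mathbb{E}_{\mathbb{P}_{n}}\left(X_{n}\right)=\mathbb{E}_{\mathbb{P}_{n}}\left[X_{n}\left(L_{n}-1\right)\right]$ , and apply the Cauchy–Schwarz inequality. The proof is completed. $\square$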
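
The second bound in Lemma 5 is a Cauchy–Schwarz estimate, and it can be checked on a toy example. The sketch below uses a finite sample space with randomly generated weights; the measures, the random variable and all names are hypothetical and only serve as an illustration.

```python
import numpy as np

def change_of_measure_gap(p_weights, q_weights, x_values):
    """Return (|E_P X - E_Q X|, sqrt(E_P X^2 * Var_P(L))) on a finite sample space."""
    likelihood_ratio = q_weights / p_weights  # L = dQ/dP, requires p_weights > 0
    mean_p = np.sum(p_weights * x_values)
    mean_q = np.sum(q_weights * x_values)
    second_moment_p = np.sum(p_weights * x_values**2)
    # Var_P(L) = E_P (L - 1)^2 because E_P L = 1
    var_l = np.sum(p_weights * (likelihood_ratio - 1.0) ** 2)
    return abs(mean_p - mean_q), np.sqrt(second_moment_p * var_l)

rng = np.random.default_rng(0)
p_weights = rng.dirichlet(np.ones(20))  # toy measure P
q_weights = rng.dirichlet(np.ones(20))  # toy measure Q
x_values = rng.normal(size=20)          # toy random variable X
gap, bound = change_of_measure_gap(p_weights, q_weights, x_values)
print(gap, bound)
```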

Based on Lemma 5 and Proposition 5, we get the following asymptotic expansion.

Proposition 7.

Let $\kappa_{n}=\tau p_{n}^{3/4}/\sqrt{n}$ with $\tau\in(0,\infty)$ . Suppose $\bm{X},\bm{Y}$ are two i.i.d. random points on $\mathbb{S}^{p-1}$ . Then, for any $u\in\mathbb{R}$ , we have

\displaystyle n\left[\mathbb{P}_{\mu_{n}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)-\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)\right]\to\frac{-\tau^{2}}{\sqrt{2\pi}}\exp\left(-u^{2}/2\right)

as $\min\left\{n,p\right\}\to\infty$ , where $\mu_{n}$ is an FvML distribution on $\mathbb{S}^{p_{n}-1}$ with concentration parameter $\kappa_{n}$ . Moreover, the convergence above is uniform in $u$ , which also means $nd\left(\mu_{n},\mu_{0}\right)\to\tau^{2}/\sqrt{2\pi}$ , where $d$ is the metric in (13).

We did not specify the location parameter of the FvML distributions in the statement of Proposition 7 since the distribution of the inner product does not depend on the choice of location. The proof of Proposition 7 is based on an application of Le Cam’s third lemma and the high-dimensional LAN result in [Cutting-P-V].

An interesting feature of this approach is that no growth condition on $n$ and $p$ is assumed. We do not know whether direct analyses based on the Edgeworth expansion or spherical harmonics can yield the same result. Proposition 7 holds without any growth condition on $n$ and $p$ thanks to some special properties of the Bessel functions of the first kind, which were exploited in [Cutting-P-V] and Proposition 5 above.

Proof of Proposition 7. Fix $u\in\mathbb{R}$ and consider the sequence of random variables

A_{n}(u)=A_{n}:=\sqrt{\frac{2}{n(n-1)}}\sum_{1\leq i<j\leq n}\left[\mathbf{1}_{\left\{\sqrt{p}\bm{X}_{i}^{\top}\bm{X}_{j}\leq u\right\}}-\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}_{1}^{\top}\bm{X}_{2}\leq u\right)\right].

Recall $L_{n}$ in (50). It was shown in [Cutting-P-V] that under uniformity, we have the LAN expansion

\displaystyle\log\left(L_{n}\right)=\frac{\tau^{2}}{\sqrt{2}}R_{n}-\frac{\tau^{4}}{4}+o_{\mathbb{P}}(1)

(52)

where $R_{n}$ is the Rayleigh test in (2).

Recall that $\Phi$ is the CDF of a standard normal. By using Proposition 4 and the Cramér–Wold device, we have

\left(A_{n},R_{n}\right)\stackrel{{\scriptstyle d}}{{\to}}N\left(\left(0,0\right)^{\top},\left(\begin{matrix}\Phi(u)\left(1-\Phi(u)\right)&g(u)\\ g(u)&1\end{matrix}\right)\right)

under uniformity, where

\displaystyle\quad g(u):=\lim_{n\to\infty}\mathbb{E}_{\mu_{0}}\left[\sqrt{p}\bm{X}_{1}^{\top}\bm{X}_{2}\cdot\mathbf{1}_{\left\{\sqrt{p}\bm{X}_{1}^{\top}\bm{X}_{2}\leq u\right\}}\right]=\mathbb{E}\left(Z\cdot\mathbf{1}_{\left\{Z\leq u\right\}}\right)=\frac{-\exp\left(-u^{2}/2\right)}{\sqrt{2\pi}}

with $Z$ being a standard normal in the expression above. In other words, $g$ is the negative of the standard Gaussian density.

The convergence in expectation in the display above follows from the fact that the normalized inner products are asymptotically normal and have uniformly bounded fourth moments. Thus,

\left(A_{n},\log(L_{n})\right)\stackrel{{\scriptstyle d}}{{\to}}N\left(\left(0,\frac{-\tau^{4}}{4}\right)^{\top},\left(\begin{matrix}\Phi(u)\left(1-\Phi(u)\right)&\frac{\tau^{2}}{\sqrt{2}}g(u)\\ \frac{\tau^{2}}{\sqrt{2}}g(u)&\frac{\tau^{4}}{2}\end{matrix}\right)\right)

under uniformity. By Le Cam’s third lemma and (52), we obtain that under $\mu_{n}$ ,

A_{n}\stackrel{{\scriptstyle d}}{{\to}}N\left(\frac{\tau^{2}g(u)}{\sqrt{2}},\Phi(u)\left(1-\Phi(u)\right)\right).

Since $A_{n}$ is a sum of $\Theta(n^{2})$ pairwise independent, mean-zero, bounded random variables under uniformity, rescaled by $\Theta(n)$ , its fourth moment is uniformly bounded by a universal constant. Combining this fact with the expansion (52), Lemma 5 and Proposition 5, we obtain the mean convergence

	$\displaystyle\frac{\tau^{2}g(u)}{\sqrt{2}}$	$\displaystyle=\lim_{n\to\infty}\mathbb{E}_{\mu_{n}}\left(A_{n}\right)$
		$\displaystyle=\lim_{n\to\infty}\left\{\frac{n(1+o(1))}{\sqrt{2}}\cdot\left[\mathbb{P}_{\mu_{n}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)-\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)\right]\right\}.$

This completes the proof of the first claim.

We now prove the uniform convergence. Recall that $\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)\to\tau^{2}g(u)/\sqrt{2}=-\tau^{2}\exp\left(-u^{2}/2\right)/(2\sqrt{\pi})$ pointwise. Observe that, for a fixed $M>0$ ,

		$\displaystyle\sup_{u\in\mathbb{R}}\left|\sqrt{\frac{n(n-1)}{2}}\Big[\mathbb{P}_{\mu_{n}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)-\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)\Big]+\frac{\tau^{2}}{2\sqrt{\pi}}\exp\left(-u^{2}/2\right)\right|$
	$\displaystyle=$	$\displaystyle\sup_{u\in\mathbb{R}}\left|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)+\frac{\tau^{2}}{2\sqrt{\pi}}\exp\left(-u^{2}/2\right)\right|$
	$\displaystyle\leq$	$\displaystyle\sup_{u\in[-M,M]}\left|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)+\frac{\tau^{2}}{2\sqrt{\pi}}\exp\left(-u^{2}/2\right)\right|+\sup_{|u|>M}\left|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)+\frac{\tau^{2}}{2\sqrt{\pi}}\exp\left(-u^{2}/2\right)\right|.$

It is a standard fact that if a sequence of equicontinuous functions converges pointwise on a compact set, then the convergence is uniform (this is a corollary of the Arzelà–Ascoli theorem, see Theorem 4.43 in [folland1999real]). To use this fact, let us check that the functions

\displaystyle u\mapsto\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)+\frac{\tau^{2}}{2\sqrt{\pi}}\exp\left(-u^{2}/2\right)

(53)

are equicontinuous.

The second term is smooth with a bounded derivative, so we only have to treat the first term. By the second estimate in Lemma 5, for all $u<v$ in $[-M,M]$ , we have

		$\displaystyle\left|\Big[\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)-\mathbb{E}_{\mu_{n}}\left(A_{n}(v)\right)\Big]\right|$
	$\displaystyle=$	$\displaystyle\left|\Big[\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)-\mathbb{E}_{\mu_{n}}\left(A_{n}(v)\right)\Big]-\underbrace{\Big[\mathbb{E}_{\mu_{0}}\left(A_{n}(u)\right)-\mathbb{E}_{\mu_{0}}\left(A_{n}(v)\right)\Big]}_{=0}\right|$
	$\displaystyle\leq$	$\displaystyle\sqrt{\mathbb{E}_{\mu_{0}}\left[A_{n}(u)-A_{n}(v)\right]^{2}\cdot\mbox{Var}_{\mathbb{P}_{\mu_{0}}}\left(L_{n}\right)}$
	$\displaystyle\lesssim$	$\displaystyle\sqrt{\mathbb{E}_{\mu_{0}}\left[A_{n}(u)-A_{n}(v)\right]^{2}}\leq\sqrt{\mathbb{P}_{\mu_{0}}\left(u\leq\sqrt{p}\bm{X}^{\top}\bm{Y}\leq v\right)}.$

Since the densities of $\sqrt{p}\bm{X}^{\top}\bm{Y}$ under $\mu_{0}$ (given by (55) below) are uniformly bounded for large $p$ (the upper bound can be taken as $1/\sqrt{2\pi}$ for $p\geq 3$ ), the equicontinuity follows. In fact, the argument above also gives Hölder continuity of the functions in (53) with exponent $1/2$ and with a common Hölder constant.

Since the functions in (53) are equicontinuous and converge pointwise to $0$ on $[-M,M]$ , the convergence is uniform on $[-M,M]$ . Thus, we have

		$\displaystyle\limsup_{n\to\infty}\left\{\sup_{u\in\mathbb{R}}\left|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)+\frac{\tau^{2}}{2\sqrt{\pi}}\exp\left(-u^{2}/2\right)\right|\right\}$
	$\displaystyle\leq$	$\displaystyle\limsup_{n\to\infty}\left\{\sup_{|u|>M}\left|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)+\frac{\tau^{2}}{2\sqrt{\pi}}\exp\left(-u^{2}/2\right)\right|\right\}$
	$\displaystyle\leq$	$\displaystyle\limsup_{n\to\infty}\left\{\sup_{|u|>M}\Big|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)\Big|\right\}+\frac{\tau^{2}}{2\sqrt{\pi}}\exp\left(-M^{2}/2\right)$

for all $M>0$ .

It suffices to show that the first term in the last display goes to $0$ as $M\to\infty$ . To see this, use the second estimate in Lemma 5 again to get

\displaystyle\left|\mathbb{E}_{\mathbb{P}_{\mu_{n}}}\left(A_{n}(u)\right)-\underbrace{\mathbb{E}_{\mathbb{P}_{\mu_{0}}}\left(A_{n}(u)\right)}_{=0}\right|\leq\sqrt{\mathbb{E}_{\mathbb{P}_{\mu_{0}}}\left(A_{n}^{2}(u)\right)\cdot\mbox{Var}_{\mathbb{P}_{\mu_{0}}}\left(L_{n}\right)}.

Consequently,

	$\displaystyle\sup_{|u|>M}\Big|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)\Big|$	$\displaystyle\lesssim\sup_{|u|>M}\left\{\sqrt{\mathbb{E}_{\mathbb{P}_{\mu_{0}}}\left(A_{n}^{2}(u)\right)}\right\}$
		$\displaystyle\lesssim\sup_{|u|>M}\left\{\sqrt{a_{n,u}\left(1-a_{n,u}\right)}\right\}$

where $a_{n,u}:=\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)$ . By (54) below, we have

\sup_{|u|>M}\left\{a_{n,u}\left(1-a_{n,u}\right)\right\}=\Phi(M)\left(1-\Phi(M)\right)+O(1/p).

Therefore,

\limsup_{n\to\infty}\left\{\sup_{|u|>M}\Big|\mathbb{E}_{\mu_{n}}\left(A_{n}(u)\right)\Big|\right\}\lesssim\sqrt{\Phi(M)\left(1-\Phi(M)\right)}.

The last term goes to $0$ as $M\to\infty$ . The proof is completed. $\square$
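
The closed form of $g$ used in the proof, $\mathbb{E}\left(Z\cdot\mathbf{1}_{\left\{Z\leq u\right\}}\right)=-\exp\left(-u^{2}/2\right)/\sqrt{2\pi}$ , can be confirmed by a short quadrature. In the sketch below the truncation point of the integral and the grid size are arbitrary choices of ours.

```python
import math
import numpy as np

def truncated_first_moment(u, lower=-12.0, grid=200001):
    """Approximate E[Z 1{Z <= u}] for a standard normal Z by the trapezoidal rule."""
    z = np.linspace(lower, u, grid)
    integrand = z * np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    h = z[1] - z[0]
    return h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

def negative_gaussian_density(u):
    """Closed form of g(u) stated in the proof of Proposition 7."""
    return -math.exp(-0.5 * u**2) / math.sqrt(2.0 * math.pi)

for u in (-2.0, 0.0, 1.0):
    print(u, truncated_first_moment(u), negative_gaussian_density(u))
```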

The next result gives the expansion in terms of the distance $d$ for the low-rank model (17). Recall that $\mu_{0}:=\mbox{Unif}\left(\mathbb{S}^{p-1}\right)$ and $\mu_{k}:=\mbox{Unif}\left(\mathbb{S}^{k-1}\right)$ are the uniform distributions on $p$ -sphere and $k$ -sphere, respectively.

Proposition 8.

Suppose $k\leq p$ , $p/n\to\infty$ , and $\left(1-k/p\right)n\to\tau\in(0,\infty)$ . Let $\bm{X},\bm{Y}$ be i.i.d. sampled from either $\mu_{0}$ or $\mu_{k}$ . Then, we have

n\cdot\left[\mathbb{P}_{\mu_{k}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)-\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)\right]\to\frac{-\tau}{2}\cdot\frac{u\cdot\exp\left(-u^{2}/2\right)}{\sqrt{2\pi}}

as $n\to\infty$ , for all $u\in\mathbb{R}$ . Moreover, the convergence above is uniform in $u$ , which also means

nd\left(\mu_{k},\mu_{0}\right)\to\frac{\tau}{2}\cdot\sup_{u\in\mathbb{R}}\left|\frac{u\cdot\exp\left(-u^{2}/2\right)}{\sqrt{2\pi}}\right|.

Proof of Proposition 8. We will first show that

\displaystyle\sup_{u\in\mathbb{R}}\left|\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)-\Phi(u)\right|=O\left(p^{-1}\right)

(54)

as $p\to\infty$ .

The rate $p^{-1}$ is sharper than a direct application of the Berry-Esseen bound and requires a more careful analysis. We will prove a stronger result that the $L_{1}$ -distance between the densities of $\sqrt{p}\bm{X}^{\top}\bm{Y}$ and a standard normal is of order $1/p$ . To see this, note that by (5), the density of $\sqrt{p}\bm{X}^{\top}\bm{Y}$ has the form

\displaystyle f_{p}(x):=\frac{1}{\sqrt{\pi}}\frac{\Gamma\left(\frac{p}{2}\right)}{\sqrt{p}\cdot\Gamma\left(\frac{p-1}{2}\right)}\left(1-\frac{x^{2}}{p}\right)^{\frac{p-3}{2}}\mathbf{1}_{\left[-\sqrt{p},\sqrt{p}\right]}(x).

(55)

A direct calculation yields

\displaystyle\frac{\Gamma\left(\frac{p}{2}\right)}{\Gamma\left(\frac{p-1}{2}\right)}=\sqrt{\frac{p}{2}}\left(1-\frac{3}{4p}+O(p^{-2})\right).

Also, for $x\in[-\sqrt{p/2},\sqrt{p/2}]$ , we have

	$\displaystyle\left(1-\frac{x^{2}}{p}\right)^{\frac{p-3}{2}}$	$\displaystyle=\exp\left[\frac{p-3}{2}\log\left(1-\frac{x^{2}}{p}\right)\right]$
		$\displaystyle=\exp\left[\frac{p-3}{2}\left(-\frac{x^{2}}{p}-\frac{x^{4}}{2p^{2}}+O\left(\frac{x^{6}}{p^{3}}\right)\right)\right]$
		$\displaystyle=\exp\left[-\frac{x^{2}}{2}+\frac{3x^{2}}{2p}-\frac{x^{4}}{4p}+O\left(\frac{x^{4}+x^{6}}{p^{2}}\right)\right]$

uniformly.

Thus,

	$\displaystyle\int_{\mathbb{R}}\left|f_{p}(x)-\phi(x)\right|dx$	$\displaystyle\leq\int_{-\sqrt{p/2}}^{\sqrt{p/2}}\frac{\exp(-x^{2}/2)}{\sqrt{2\pi}}\left|1-\left(1+O\left(\frac{1}{p}\right)\right)\exp\left(\frac{3x^{2}}{2p}-\frac{x^{4}}{4p}+O\left(\frac{x^{4}+x^{6}}{p^{2}}\right)\right)\right|dx+O\left(1/p\right)$
		$\displaystyle=O(1/p)\cdot\int_{\mathbb{R}}\frac{(1+x^{2}+x^{4})\exp\left(-x^{2}/2\right)}{\sqrt{2\pi}}dx+O(1/p)=O(1/p)$

where the equality follows from the fact that $(x^{4}+x^{6})/p^{2}=O\left((x^{2}+x^{4})/p\right)$ on the interval $[-\sqrt{p/2},\sqrt{p/2}]$ , so that the exponent inside the absolute value is $O\left((x^{2}+x^{4})/p\right)$ and is bounded above uniformly in $p$ . Consequently, (54) follows.
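
The rate in (54) can also be observed numerically. The sketch below integrates the density (55) on a grid, compares the resulting CDF with $\Phi$ , and prints $p$ times the largest gap, which should remain bounded as $p$ grows. The function name and grid size are our own choices, and the supremum is only taken over grid points in $[-\sqrt{p},\sqrt{p}]$ .

```python
import math
import numpy as np

def kolmogorov_gap(p, grid=200001):
    """Grid approximation of sup_u |P(sqrt(p) X^T Y <= u) - Phi(u)| under uniformity."""
    root_p = math.sqrt(p)
    x = np.linspace(-root_p, root_p, grid)
    # log of the normalizing constant of the density (55)
    log_const = (math.lgamma(p / 2) - math.lgamma((p - 1) / 2)
                 - 0.5 * math.log(math.pi * p))
    density = math.exp(log_const) * np.clip(1.0 - x**2 / p, 0.0, None) ** ((p - 3) / 2)
    h = x[1] - x[0]
    # cumulative trapezoidal rule gives the CDF at the grid points
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * h * (density[1:] + density[:-1]))))
    normal_cdf = 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))
    return float(np.max(np.abs(cdf - normal_cdf)))

for p in (50, 200, 800):
    print(p, p * kolmogorov_gap(p))
```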

To finish the proof, notice that

	$\displaystyle n\cdot\left[\mathbb{P}_{\mu_{k}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)-\mathbb{P}_{\mu_{0}}\left(\sqrt{p}\bm{X}^{\top}\bm{Y}\leq u\right)\right]$	$\displaystyle=n\left[\Phi\left(u\sqrt{\frac{k}{p}}\right)-\Phi\left(u\right)\right]+O\left(n/p\right)$
		$\displaystyle=-nu\left(1-\sqrt{k/p}\right)\phi(u)(1+o(1))+O(n/p)$
		$\displaystyle\to-\frac{\tau}{2}u\phi(u)$

for all $u\in\mathbb{R}$ . The convergence above is uniform whenever $p/n\to\infty$ . The proof is completed. $\square$
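
To illustrate the last step of the proof, the sketch below evaluates the Gaussian main term $n\left[\Phi\left(u\sqrt{k/p}\right)-\Phi(u)\right]$ for one hypothetical triple $(n,k,p)$ with $n(1-k/p)=1$ and compares it with the limit $-\frac{\tau}{2}u\phi(u)$ . It deliberately ignores the $O(n/p)$ error term, so it checks only the Taylor expansion, not the full statement of Proposition 8.

```python
import math

def normal_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def normal_pdf(u):
    return math.exp(-0.5 * u**2) / math.sqrt(2.0 * math.pi)

def scaled_gaussian_shift(u, n, k, p):
    """Main term n * [Phi(u * sqrt(k/p)) - Phi(u)] from the proof of Proposition 8."""
    return n * (normal_cdf(u * math.sqrt(k / p)) - normal_cdf(u))

def limit_value(u, tau):
    """Limit -(tau / 2) * u * phi(u) stated in Proposition 8."""
    return -0.5 * tau * u * normal_pdf(u)

n, p = 1000, 10**6          # illustrative sizes with p / n large
k = p - 1000                # so that tau = n * (1 - k / p) = 1
tau = n * (1 - k / p)
for u in (-1.0, 0.5, 2.0):
    print(u, scaled_gaussian_shift(u, n, k, p), limit_value(u, tau))
```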

A.7 Other technical results

Let us start with a concentration inequality for degenerate U-processes, taken from [cattaneo2024uniform]. Let $(S,\mathcal{S})$ be a measurable space and let $X_{1},\dots,X_{n}$ be i.i.d. $S$ -valued random variables with common law $P$ . Let $\mathcal{F}$ be a pointwise measurable class of measurable functions $f:S\times S\to\mathbb{R}$ .

Define the (canonical) degenerate $U$ -process of order two by

U_{n}(f):=\frac{2}{n(n-1)}\sum_{i<j}\Big\{f(X_{i},X_{j})-\mathbb{E}\big[f(X_{i},X_{j})\mid X_{i}\big]-\mathbb{E}\big[f(X_{i},X_{j})\mid X_{j}\big]+\mathbb{E}\big[f(X_{i},X_{j})\big]\Big\}

for $f\in\mathcal{F}$ . Assume that

1.

Each $f\in\mathcal{F}$ is symmetric, i.e. $f(s_{1},s_{2})=f(s_{2},s_{1})$ for all $s_{1},s_{2}\in S$ .
2.

There exists a measurable envelope $F:S\times S\to\mathbb{R}$ such that $|f(s_{1},s_{2})|\leq F(s_{1},s_{2})$ for all $f\in\mathcal{F}$ and all $s_{1},s_{2}\in S$ .
3.

For any probability measure $Q$ on $(S\times S,\mathcal{S}\otimes\mathcal{S})$ and $q\geq 1$ , let

$\|f\|_{Q,q}:=\big(\mathbb{E}_{Q}[|f|^{q}]\big)^{1/q}.$

Suppose that the class $\mathcal{F}$ is VC-type with envelope $F$ , in the sense that there exist constants $C_{1}\geq e$ and $C_{2}\geq 1$ such that, for all $\varepsilon\in(0,1]$ ,

$\sup_{Q}N\Big(\mathcal{F},\|\cdot\|_{Q,2},\varepsilon\|F\|_{Q,2}\Big)\;\leq\;\Big(\frac{C_{1}}{\varepsilon}\Big)^{C_{2}}$

where the supremum is taken over all finite discrete probability measures $Q$ on $S\times S$ .

Under the three conditions above, we have the following bound.

Lemma 6 (Lemma SA37 from [cattaneo2024uniform]).

Let $\sigma>0$ be any deterministic quantity satisfying

\sup_{f\in\mathcal{F}}\|f\|_{P,2}\;\leq\;\sigma\;\leq\;\|F\|_{P,2},

and define the random variable

M:=\max_{1\leq i,j\leq n}|F(X_{i},X_{j})|.

Then there exists a universal constant $C_{3}>0$ such that

n\,\mathbb{E}\Bigg[\sup_{f\in\mathcal{F}}|U_{n}(f)|\Bigg]\;\leq\;C_{3}\sigma\Bigg(C_{2}\log\frac{C_{1}\|F\|_{P,2}}{\sigma}\Bigg)\;+\;\frac{C_{3}\|M\|_{P,2}}{\sqrt{n}}\Bigg(C_{2}\log\frac{C_{1}\|F\|_{P,2}}{\sigma}\Bigg)^{2}.

Proof of Lemma 6. See [cattaneo2024uniform]. $\square$

Lemma 7.

Suppose $\bm{X},\bm{Y}$ are matrices of size $p\times n$ such that $\bm{X}^{\top}\bm{X}=\bm{Y}^{\top}\bm{Y}$ . Then, there exists an orthogonal matrix $\bm{Q}$ of size $p\times p$ such that $\bm{X}=\bm{Q}\bm{Y}$ .

Proof of Lemma 7. We write $\mbox{Col}\left(\bm{A}\right)$ for the column space of a matrix $\bm{A}$ . The assumption $\bm{X}^{\top}\bm{X}=\bm{Y}^{\top}\bm{Y}$ implies that $\mbox{Ker}\left(\bm{X}\right)=\mbox{Ker}\left(\bm{Y}\right)$ , since $\bm{X}$ and $\bm{X}^{\top}\bm{X}$ have the same kernel. Therefore, the map

	$\displaystyle T:\mbox{Col}\left(\bm{X}\right)\subset\mathbb{R}^{p}$	$\displaystyle\to\mbox{Col}\left(\bm{Y}\right)\subset\mathbb{R}^{p}$
	$\displaystyle\bm{X}\bm{u}$	$\displaystyle\mapsto\bm{Y}\bm{u}$

is well-defined for all $\bm{u}\in\mathbb{R}^{n}$ . Note that $\mbox{dim}\left(\mbox{Col}\left(\bm{X}\right)\right)=\mbox{dim}\left(\mbox{Col}\left(\bm{Y}\right)\right)$ by the rank–nullity theorem, since their kernels are identical.

Moreover, $T$ is an isometry because

\langle T\left(\bm{X}\bm{u}\right),T\left(\bm{X}\bm{v}\right)\rangle=\langle\bm{Y}\bm{u},\bm{Y}\bm{v}\rangle=\bm{u}^{\top}\bm{Y}^{\top}\bm{Y}\bm{v}=\bm{u}^{\top}\bm{X}^{\top}\bm{X}\bm{v}=\langle\bm{X}\bm{u},\bm{X}\bm{v}\rangle.

Since $\mbox{dim}\left(\mbox{Col}\left(\bm{X}\right)\right)=\mbox{dim}\left(\mbox{Col}\left(\bm{Y}\right)\right)$ , $T$ extends to a linear isometry of $\mathbb{R}^{p}$ . Since linear isometries of $\mathbb{R}^{p}$ are given by orthogonal matrices, we have

\bm{Q}\bm{X}\bm{u}=\bm{Y}\bm{u}

for some orthogonal matrix $\bm{Q}$ of size $p\times p$ and all $\bm{u}\in\mathbb{R}^{n}$ . Hence $\bm{X}=\bm{Q}^{\top}\bm{Y}$ with $\bm{Q}^{\top}$ orthogonal, which is the claim. The proof is completed. $\square$
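
Lemma 7 is an existence statement, but a suitable orthogonal matrix can be computed in practice. The sketch below does not follow the extension argument of the proof; instead it uses the orthogonal Procrustes solution based on a singular value decomposition, which returns an orthogonal $\bm{Q}$ minimizing $\|\bm{Q}\bm{X}-\bm{Y}\|_{F}$ and therefore attains $\bm{Q}\bm{X}=\bm{Y}$ whenever the Gram matrices agree. The function name and the random test matrices are hypothetical.

```python
import numpy as np

def orthogonal_map_between(X, Y):
    """Return an orthogonal Q with Q @ X = Y, assuming X.T @ X equals Y.T @ Y."""
    # Orthogonal Procrustes: Q = U V^T maximizes trace(Q^T Y X^T) over orthogonal Q
    U, _, Vt = np.linalg.svd(Y @ X.T)
    return U @ Vt

rng = np.random.default_rng(1)
p, n = 6, 3
X = rng.normal(size=(p, n))
Q0, _ = np.linalg.qr(rng.normal(size=(p, p)))  # a random orthogonal matrix
Y = Q0 @ X                                      # guarantees Y.T @ Y == X.T @ X
Q = orthogonal_map_between(X, Y)
print(np.allclose(Q @ X, Y), np.allclose(Q.T @ Q, np.eye(p)))
```

Note that $\bm{Q}$ need not equal the matrix `Q0` used to generate the data, since the action on the orthogonal complement of $\mbox{Col}\left(\bm{X}\right)$ is not determined by the Gram matrices.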

Lemma 8.

Let $\bm{X}$ , $\bm{Y}$ and $\bm{Z}$ be i.i.d. draws from the uniform distribution on $\mathbb{S}^{p-1}$ and let $f:\mathbb{R}\to\mathbb{R}$ be a bounded measurable function. Then, we have

•

$\mathbb{E}\left(f(\bm{X}^{T}\bm{Y})\Big|\bm{Y}\right)=\mathbb{E}f(\bm{X}^{T}\bm{Y})$ almost surely.
•

$\bm{X}^{T}\bm{Y}$ and $\bm{X}^{T}\bm{Z}$ are independent.

Proof of Lemma 8. The first claim is a consequence of rotational invariance. Conditioning on $\bm{Y}$ , there exists an orthogonal matrix $O$ such that $O^{T}\bm{Y}=\bm{e}_{1}=(1,0,\dots,0)^{T}$ . Since $O\bm{X}$ has the same conditional law as $\bm{X}$ given $\bm{Y}$ , with probability one, we have

	$\displaystyle\mathbb{E}\left(f(\bm{X}^{T}\bm{Y})\Big|\bm{Y}\right)$	$\displaystyle=\mathbb{E}\left(f\left(\bm{X}^{T}O^{T}\bm{Y}\right)\Big|\bm{Y}\right)$
		$\displaystyle=\mathbb{E}\left(f(\bm{X}^{T}\bm{e}_{1})\Big|\bm{Y}\right)$
		$\displaystyle=\mathbb{E}\left(f(\bm{X}^{T}\bm{e}_{1})\right),$

where we use the fact that $\bm{X}^{T}\bm{e}_{1}$ is independent of $\bm{Y}$ in the last equality. Similarly, one can also show that $\mathbb{E}f(\bm{X}^{T}\bm{Y})=\mathbb{E}\left(f(\bm{X}^{T}\bm{e}_{1})\right)$ . This concludes the proof of the first claim.

For the second claim, take any bounded measurable functions $f$ and $g$ . Conditioning on $\bm{X}$ and using the conditional independence of $\bm{Y}$ and $\bm{Z}$ given $\bm{X}$ , we get

	$\displaystyle\mathbb{E}\left(f(\bm{X}^{T}\bm{Y})g(\bm{X}^{T}\bm{Z})\right)$	$\displaystyle=\mathbb{E}\Big[\mathbb{E}\left(f(\bm{X}^{T}\bm{Y})\Big|\bm{X}\right)\cdot\mathbb{E}\left(g(\bm{X}^{T}\bm{Z})\Big|\bm{X}\right)\Big]$
		$\displaystyle=\mathbb{E}f(\bm{X}^{T}\bm{Y})\cdot\mathbb{E}g(\bm{X}^{T}\bm{Z}),$

where we use the conclusion of the first statement in the last equality. This concludes the proof. $\hfill\square$
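
Both claims of Lemma 8 can be illustrated by simulation. The sketch below samples uniform points on $\mathbb{S}^{p-1}$ by normalizing Gaussian vectors, takes $f(t)=t^{2}$ as an illustrative test function, and checks that the conditional mean of $f(\bm{X}^{T}\bm{Y})$ does not depend on a coarse feature of $\bm{Y}$ and that $f(\bm{X}^{T}\bm{Y})$ and $f(\bm{X}^{T}\bm{Z})$ are empirically uncorrelated. The dimension, the sample size and the seed are arbitrary choices, and a vanishing correlation is of course only a necessary condition for independence.

```python
import numpy as np

def sample_sphere(rng, size, p):
    """Draw `size` independent uniform points on S^{p-1} by normalizing Gaussians."""
    g = rng.normal(size=(size, p))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

rng = np.random.default_rng(2)
p, size = 5, 200_000
X = sample_sphere(rng, size, p)
Y = sample_sphere(rng, size, p)
Z = sample_sphere(rng, size, p)
xy = np.sum(X * Y, axis=1)  # inner products X^T Y
xz = np.sum(X * Z, axis=1)  # inner products X^T Z

f_vals = xy**2
# first claim: the mean of f(X^T Y) is the same on {Y_1 > 0} and {Y_1 <= 0}
mean_given_pos = f_vals[Y[:, 0] > 0].mean()
mean_given_neg = f_vals[Y[:, 0] <= 0].mean()
# second claim: f(X^T Y) and f(X^T Z) should be uncorrelated
corr_squares = np.corrcoef(xy**2, xz**2)[0, 1]
print(mean_given_pos, mean_given_neg, 1 / p, corr_squares)
```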

A direct consequence of Lemma 8 is that, for any bounded measurable function $h$ , the term $Z_{n}$ defined in (41) is a degenerate U-statistic of order $1$ (see, for example, Section 5.3 of the monograph [Dehling] for a comprehensive introduction to U-statistics and their limit theory). This fact was used frequently throughout the proof of Theorem 1. The next lemma is used in checking the conditions of the martingale CLT in the proof of Theorem 1. It gives a simpler form of the distribution of joint angles.

Lemma 9.

Let $p\geq 2$ be an integer. Assume $\bm{a}\in\mathbb{S}^{p-1}$ and $\bm{b}\in\mathbb{S}^{p-1}$ are fixed vectors. Let $\bm{X}$ be a random vector uniformly distributed over $\mathbb{S}^{p-1}$ and $\xi_{1},\xi_{2},\dots,\xi_{p}$ be i.i.d. standard normal. Set $\bm{\xi}=\left(\xi_{1},\xi_{2},\dots,\xi_{p}\right)^{\top}$ . Then, for any bounded measurable function $f(x,y):\mathbb{R}^{2}\to\mathbb{R}$ , we have

\mathbb{E}f(\bm{a}^{T}\bm{X},\bm{b}^{T}\bm{X})=\mathbb{E}f\Big(\frac{\xi_{1}}{\|\bm{\xi}\|},(\bm{a}^{T}\bm{b})\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-(\bm{a}^{T}\bm{b})^{2}}\,\frac{\xi_{2}}{\|\bm{\xi}\|}\Big).

Proof of Lemma 9. We can assume $\bm{X}=(\xi_{1},\dots,\xi_{p})^{\top}/\|\bm{\xi}\|$ without loss of generality. Suppose $\bm{c}\in\mathbb{S}^{p-1}$ and $\bm{d}\in\mathbb{S}^{p-1}$ are constant vectors with $\bm{c}^{\top}\bm{d}=0$ . Let $A$ be an orthogonal matrix of which the first two rows are $\bm{c}^{\top}$ and $\bm{d}^{\top}$ , respectively. Then, by the Haar-invariance of the uniform distribution on the sphere, $A\bm{X}$ and $\bm{X}$ are identically distributed. In particular, the top two entries of $A\bm{X}$ , that is, $(\bm{c}^{\top}\bm{X},\bm{d}^{\top}\bm{X})^{\top}$ , have the same distribution as $(\xi_{1},\xi_{2})^{\top}/\|\bm{\xi}\|.$ Assume first that $|\bm{a}^{\top}\bm{b}|<1$ (otherwise $\bm{b}=\pm\bm{a}$ and the claim is immediate) and write

\bm{b}=(\bm{a}^{\top}\bm{b})\bm{a}+\sqrt{1-(\bm{a}^{\top}\bm{b})^{2}}\cdot\frac{\bm{b}-(\bm{a}^{\top}\bm{b})\bm{a}}{\sqrt{1-(\bm{a}^{\top}\bm{b})^{2}}}.

The advantage of doing so is the trivial observation that $\bm{a}$ and $\frac{\bm{b}-(\bm{a}^{\top}\bm{b})\bm{a}}{\sqrt{1-(\bm{a}^{\top}\bm{b})^{2}}}$ are orthogonal unit vectors. Hence the pair formed by $\bm{a}^{\top}\bm{X}$ and $\left(\frac{\bm{b}-(\bm{a}^{\top}\bm{b})\bm{a}}{\sqrt{1-(\bm{a}^{\top}\bm{b})^{2}}}\right)^{\top}\bm{X}$ has the same law as $(\xi_{1},\xi_{2})^{\top}/\|\bm{\xi}\|.$ This implies that $(\bm{a}^{T}\bm{X},\bm{b}^{T}\bm{X})$ has the same law as

\Big(\frac{\xi_{1}}{\|\bm{\xi}\|},(\bm{a}^{\top}\bm{b})\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-(\bm{a}^{\top}\bm{b})^{2}}\,\frac{\xi_{2}}{\|\bm{\xi}\|}\Big).

As a result,

\mathbb{E}f(\bm{a}^{T}\bm{X},\bm{b}^{T}\bm{X})=\mathbb{E}f\Big(\frac{\xi_{1}}{\|\bm{\xi}\|},(\bm{a}^{T}\bm{b})\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-(\bm{a}^{T}\bm{b})^{2}}\,\frac{\xi_{2}}{\|\bm{\xi}\|}\Big),

where the last expectation is taken over $\xi_{1},\dots,\xi_{p}$ , hence it is a function of $\bm{a}^{T}\bm{b}$ . $\hfill\square$
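
The representation in Lemma 9 lends itself to a Monte Carlo check. The sketch below compares both sides for one bounded test function and also checks the product moment $\mathbb{E}\left[(\bm{a}^{T}\bm{X})(\bm{b}^{T}\bm{X})\right]=\bm{a}^{T}\bm{b}/p$ . The vectors $\bm{a},\bm{b}$ , the test function, the dimension and the seed are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
p, size = 4, 200_000
a = np.array([1.0, 2.0, 3.0, 4.0])
a /= np.linalg.norm(a)
b = np.array([-1.0, 0.5, 2.0, 1.0])
b /= np.linalg.norm(b)
rho = float(a @ b)  # inner product a^T b

# left-hand side: (a^T X, b^T X) with X uniform on the sphere
xi = rng.normal(size=(size, p))
X = xi / np.linalg.norm(xi, axis=1, keepdims=True)
lhs_pairs = np.column_stack((X @ a, X @ b))

# right-hand side: the two-coordinate representation of Lemma 9
xi2 = rng.normal(size=(size, p))
norm2 = np.linalg.norm(xi2, axis=1)
first = xi2[:, 0] / norm2
second = rho * first + np.sqrt(1.0 - rho**2) * xi2[:, 1] / norm2
rhs_pairs = np.column_stack((first, second))

def f(pairs):
    """A bounded test function of two arguments (illustrative choice)."""
    return np.cos(pairs[:, 0] + 2.0 * pairs[:, 1])

lhs, rhs = f(lhs_pairs).mean(), f(rhs_pairs).mean()
print(lhs, rhs)
```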

Lemma 10.

Let $H_{n}$ be defined in (47) for a sequence of measurable functions $h_{n}:\mathbb{R}\to\mathbb{R}$ . Assume additionally that

	$\displaystyle\mathbb{E}h_{n}(\bm{X}^{\top}_{1}\bm{X}_{2})$	$\displaystyle=0;$
	$\displaystyle\mbox{Var}\left(h_{n}(\bm{X}^{\top}_{1}\bm{X}_{2})\right)$	$\displaystyle\leq C_{1}$

for some constant $C_{1}$ independent of $n$ . Then, we have

\mathbb{E}H_{n}^{2}(\bm{X}_{1},\bm{X}_{2})\to 0

as $n\to\infty$ .

Proof of Lemma 10. It suffices to prove Lemma 10 when $h_{n}$ is bounded. Indeed, suppose we have proved $\mathbb{E}H_{n}^{2}(\bm{X}_{1},\bm{X}_{2})\to 0$ for all bounded $h_{n}$ . Then write

h_{n}=\underbrace{h_{n}\cdot\mathbf{1}_{\left\{|h_{n}|\leq L\right\}}-\mathbb{E}\left(h_{n}\cdot\mathbf{1}_{\left\{|h_{n}|\leq L\right\}}\right)}_{f_{n,L}}+\underbrace{h_{n}\cdot\mathbf{1}_{\left\{|h_{n}|>L\right\}}-\mathbb{E}\left(h_{n}\cdot\mathbf{1}_{\left\{|h_{n}|>L\right\}}\right)}_{g_{n,L}}

where the expectation is taken with respect to the law of $\bm{X}_{1}^{\top}\bm{X}_{2}$ . Then,

	$\displaystyle\mathbb{E}H_{n}^{2}(\bm{X}_{1},\bm{X}_{2})$	$\displaystyle=\mathbb{E}\mathbb{E}^{2}\Big[h_{n}\left(\bm{X}_{1}^{T}\bm{Y}\right)\cdot h_{n}\left(\bm{X}_{2}^{T}\bm{Y}\right)\Big|\bm{X}_{1},\bm{X}_{2}\Big]$
		$\displaystyle=\mathbb{E}\left[\mathbb{E}^{2}\left[f_{n,L}\left(\bm{X}_{1}^{T}\bm{Y}\right)\cdot f_{n,L}\left(\bm{X}_{2}^{T}\bm{Y}\right)\Big|\bm{X}_{1},\bm{X}_{2}\right]\right]+O\left(\mbox{Var}\left(g_{n,L}\left(\bm{X}_{1}^{T}\bm{X}_{2}\right)\right)\right).$

For every fixed $L$ the first term tends to zero as $n\to\infty$. We then deduce the result by letting $L\to\infty$ and noting that

\sup_{n\geq 1}\mbox{Var}\left(g_{n,L}\left(\bm{X}_{1}^{T}\bm{X}_{2}\right)\right)\leq\frac{C_{1}}{L^{2}}.

Now assume that $h_{n}$ is bounded. Let $\bm{Y}$ be drawn from the uniform distribution on $\mathbb{S}^{p-1}$ independently of $\bm{X}_{1}$ and $\bm{X}_{2}$. Thanks to Lemma 9, we can write

	$\displaystyle\mathbb{E}H_{n}^{2}(\bm{X}_{1},\bm{X}_{2})$	$\displaystyle=\mathbb{E}\mathbb{E}^{2}\Big[h_{n}\left(\bm{X}_{1}^{T}\bm{Y}\right)\cdot h_{n}\left(\bm{X}_{2}^{T}\bm{Y}\right)\Big|\bm{X}_{1},\bm{X}_{2}\Big]$
		$\displaystyle=\mathbb{E}\mathbb{E}^{2}\bigg[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(\bm{X}_{1}^{T}\bm{X}_{2}\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-(\bm{X}_{1}^{T}\bm{X}_{2})^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\bigg|\bm{X}_{1},\bm{X}_{2}\bigg],$

where $\bm{\xi}=(\xi_{1},\xi_{2},\dots,\xi_{p})^{\top}$ is a vector of i.i.d. standard normal random variables. Setting $U=\bm{X}_{1}^{T}\bm{X}_{2}$ and letting $f(u)$ denote the density of $U$, we can write

	$\displaystyle\mathbb{E}H_{n}^{2}(\bm{X}_{1},\bm{X}_{2})$	$\displaystyle=\int_{-1}^{1}f(u)\cdot\mathbb{E}^{2}\bigg[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\bigg]du$
		$\displaystyle=\int_{|u|\leq\varepsilon}f(u)\cdot\mathbb{E}^{2}\bigg[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\bigg]du$
		$\displaystyle+\int_{|u|>\varepsilon}f(u)\cdot\mathbb{E}^{2}\bigg[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\bigg]du$
		$\displaystyle\leq\int_{|u|\leq\varepsilon}f(u)\cdot\mathbb{E}^{2}\bigg[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\bigg]du$
		$\displaystyle+\|h_{n}\|_{\infty}^{4}\cdot\mathbb{P}\left(|\bm{X}_{1}^{T}\bm{X}_{2}|>\varepsilon\right)$

for any fixed $\varepsilon>0$ . By Proposition 5 in [Jiang13], we get $\mathbb{P}\left(|\bm{X}_{1}^{T}\bm{X}_{2}|>\varepsilon\right)\to 0$ as $p\to\infty$ and hence, it suffices to bound the first integrand over $(-\varepsilon,\varepsilon)$ . Thanks to Lemma 11, for $|u|\leq\varepsilon$ we get the bound

		$\displaystyle\bigg|\mathbb{E}\Big[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\Big]-\mathbb{E}\Big[h_{n}\left(\frac{\xi_{1}}{\sqrt{p}}\right)\cdot h_{n}\left(\frac{\xi_{2}}{\sqrt{p}}\right)\Big]\bigg|$
	$\displaystyle\leq$	$\displaystyle(\mbox{const})\cdot\|h_{n}\|_{\infty}^{2}\cdot\left(p^{-1}+\varepsilon\right),$

by choosing $g_{n}(x,y)=h_{n}(x)h_{n}(y)$ in Lemma 11. Note that $\mathbb{E}h_{n}\left(\xi_{1}/\|\bm{\xi}\|\right)=0$ since $\mathbb{E}h_{n}(\bm{X}^{\top}_{1}\bm{X}_{2})=0$ , which gives us

	$\displaystyle\bigg|\mathbb{E}\Big[h_{n}\left(\frac{\xi_{1}}{\sqrt{p}}\right)\cdot h_{n}\left(\frac{\xi_{2}}{\sqrt{p}}\right)\Big]\bigg|$	$\displaystyle=\bigg|\mathbb{E}\Big[h_{n}\left(\frac{\xi_{1}}{\sqrt{p}}\right)\Big]\bigg|^{2}$
		$\displaystyle=\bigg|\mathbb{E}\Big[h_{n}\left(\frac{\xi_{1}}{\sqrt{p}}\right)\Big]-\mathbb{E}\Big[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\Big]\bigg|^{2}$
		$\displaystyle\leq(\mbox{const})\cdot\frac{\|h_{n}\|_{\infty}^{2}}{p},$

where we use the bound (56) in the last inequality. This in turn yields

\displaystyle\bigg|\mathbb{E}\Big[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\Big]\bigg|\leq(\mbox{const})\cdot\left(p^{-1}+\varepsilon\right),

since the $L_{\infty}$ -norm of $h_{n}$ is uniformly bounded. Consequently,

		$\displaystyle\int_{|u|\leq\varepsilon}f(u)\cdot\mathbb{E}^{2}\bigg[h_{n}\left(\frac{\xi_{1}}{\|\bm{\xi}\|}\right)\cdot h_{n}\left(u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)\bigg]du$
	$\displaystyle\leq$	$\displaystyle(\mbox{const})\cdot\int_{|u|\leq\varepsilon}f(u)\cdot(p^{-1}+\varepsilon)du.$

The proof is completed by taking $p\to\infty$ and then taking $\varepsilon\to 0$ . $\hfill\square$
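The concentration step used in this proof, namely $\mathbb{P}\left(|\bm{X}_{1}^{T}\bm{X}_{2}|>\varepsilon\right)\to 0$ as $p\to\infty$ for independent uniform points, can be illustrated by a small simulation. The sketch below is only an empirical illustration under assumed settings (the value $\varepsilon=0.3$, the dimensions, the number of pairs and the function name `inner_product_tail` are all hypothetical choices); it is not a substitute for Proposition 5 in [Jiang13].

```python
import math
import random

def inner_product_tail(p, eps, n_pairs=2000, seed=0):
    """Empirical frequency of |X1'X2| > eps for independent X1, X2 ~ Unif(S^{p-1})."""
    rng = random.Random(seed)

    def draw():
        # Normalizing a standard Gaussian vector gives a uniform point on the sphere.
        xi = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(v * v for v in xi))
        return [v / norm for v in xi]

    hits = 0
    for _ in range(n_pairs):
        x1, x2 = draw(), draw()
        if abs(sum(s * t for s, t in zip(x1, x2))) > eps:
            hits += 1
    return hits / n_pairs

if __name__ == "__main__":
    for p in (5, 50, 200):
        print(p, inner_product_tail(p, eps=0.3))
```

Since $\bm{X}_{1}^{T}\bm{X}_{2}$ has variance $1/p$, the empirical tail frequency should shrink quickly as $p$ grows for any fixed $\varepsilon$.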

Lemma 11.

Let $\bm{\xi}=(\xi_{1},\xi_{2},\dots,\xi_{p})^{\top}$ be a vector of i.i.d. standard normal random variables. Then for any bounded, measurable function $g:\mathbb{R}^{2}\mapsto\mathbb{R}$ and $u\in[-1/2,1/2]$ , we have

\Big|\mathbb{E}g\left(\frac{\xi_{1}}{\|\bm{\xi}\|},u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)-\mathbb{E}g\left(\frac{\xi_{1}}{\sqrt{p}},\frac{\xi_{2}}{\sqrt{p}}\right)\Big|\leq C\|g\|_{\infty}\left(\frac{1}{p}+|u|\right)

for some universal constant $C>0$ .

Proof of Lemma 11. The conclusion follows from the following two total variation distance bounds

\displaystyle d_{TV}\left(\left(\frac{\sqrt{p}\cdot\xi_{1}}{\|\bm{\xi}\|},\frac{\sqrt{p}\cdot\xi_{2}}{\|\bm{\xi}\|}\right),\left(\xi_{1},\xi_{2}\right)\right)\leq\frac{C}{p}

(56)

and

\displaystyle d_{TV}\Big(N\left(\begin{pmatrix}0\\ 0\end{pmatrix},\begin{pmatrix}1&u\\ u&1\end{pmatrix}\right),N(\bm{0},\bm{I}_{2})\Big)\leq C|u|

(57)

for some universal constant $C$ and for all $p$ large enough in the first estimate. We next explain (56) and (57).

The first estimate (56) is a consequence of the Diaconis-Freedman theorem (see Theorem 2.8 in the monograph [Meckes] and also the paper [D-F]). It provides a sharp bound in total variation between the joint distribution of the first few entries of $\mbox{Unif}\left(\mathbb{S}^{p-1}\right)$, rescaled by $\sqrt{p}$, and the standard multivariate normal random vector of the same length. The second estimate (57) is elementary and can be proven directly by estimating the difference between the two corresponding densities. To see this, write

		$\displaystyle d_{TV}\Big(N\left(\begin{pmatrix}0\\ 0\end{pmatrix},\begin{pmatrix}1&u\\ u&1\end{pmatrix}\right),N(\bm{0},\bm{I}_{2})\Big)$
	$\displaystyle\leq$	$\displaystyle\int_{\mathbb{R}}\int_{\mathbb{R}}\bigg|\frac{1}{2\pi}\exp\left\{-\frac{x^{2}+y^{2}}{2}\right\}-\frac{1}{2\pi\sqrt{1-u^{2}}}\exp\left\{-\frac{x^{2}+y^{2}-2uxy}{2(1-u^{2})}\right\}\bigg|dxdy$
	$\displaystyle\leq$	$\displaystyle\int_{\mathbb{R}}\int_{\mathbb{R}}\frac{1}{2\pi}\exp\left\{-\frac{x^{2}+y^{2}}{2}\right\}\cdot\bigg|1-\frac{1}{\sqrt{1-u^{2}}}\exp\left\{-\frac{(x^{2}+y^{2})u^{2}}{2(1-u^{2})}+\frac{uxy}{1-u^{2}}\right\}\bigg|dxdy$
	$\displaystyle\leq$	$\displaystyle I_{1}+I_{2},$

where

	$\displaystyle I_{1}$	$\displaystyle=\bigg|1-\frac{1}{\sqrt{1-u^{2}}}\bigg|\cdot\int_{\mathbb{R}}\int_{\mathbb{R}}\frac{1}{2\pi}\exp\left\{-\frac{x^{2}+y^{2}}{2}\right\}dxdy,$
	$\displaystyle I_{2}$	$\displaystyle=\frac{1}{2\pi\sqrt{1-u^{2}}}\cdot\int_{\mathbb{R}}\int_{\mathbb{R}}\exp\left\{-\frac{x^{2}+y^{2}}{2}\right\}\cdot\bigg|1-\exp\left\{-\frac{(x^{2}+y^{2})u^{2}}{2(1-u^{2})}+\frac{uxy}{1-u^{2}}\right\}\bigg|dxdy.$

The term $I_{1}$ is obviously of order $O(|u|)$ as $u\to 0$ and thus, we only need to bound $I_{2}$ . In polar coordinates, $I_{2}$ can be rewritten as

\displaystyle I_{2}=\frac{1}{2\pi\sqrt{1-u^{2}}}\cdot\int_{0}^{\infty}r\cdot\exp\left\{-\frac{r^{2}}{2}\right\}\cdot\int_{0}^{2\pi}\bigg|1-\exp\left\{-\frac{r^{2}u^{2}-u\sin 2\theta}{2(1-u^{2})}\right\}\bigg|d\theta dr.

Moreover, we have

	$\displaystyle\bigg|1-\exp\left\{-\frac{r^{2}u^{2}-u\sin 2\theta}{2(1-u^{2})}\right\}\bigg|$	$\displaystyle\leq\bigg|1-\exp\left\{\frac{u\sin 2\theta}{2(1-u^{2})}\right\}\bigg|$
		$\displaystyle+\bigg|\exp\left\{\frac{u\sin 2\theta}{2(1-u^{2})}\right\}\bigg|\cdot\bigg|1-\exp\left\{-\frac{r^{2}u^{2}}{2(1-u^{2})}\right\}\bigg|$
		$\displaystyle\leq(\mbox{const})\cdot\bigg[\Big|\frac{u\sin 2\theta}{2(1-u^{2})}\Big|+\frac{r^{2}u^{2}}{2(1-u^{2})}\bigg],$

where we use the elementary inequalities $1-e^{-x}\leq x$ for all $x>0$ and $|1-e^{x}|\leq C_{1}|x|$ for all $|x|\leq C$ , where $C_{1}$ depends only on $C$ . Thus, we have

	$\displaystyle I_{2}$	$\displaystyle\leq(\mbox{const})\cdot\frac{|u|}{4\pi(1-u^{2})^{3/2}}\cdot\int_{0}^{\infty}r\cdot\exp\left\{-\frac{r^{2}}{2}\right\}\cdot\bigg(\int_{0}^{2\pi}\big(|\sin 2\theta|+r^{2}|u|\big)d\theta\bigg)dr$
		$\displaystyle\leq(\mbox{const})\cdot|u|.$

This concludes the proof of (57).

We are now ready to prove the main estimate in Lemma 11. Write

		$\displaystyle\Big|\mathbb{E}g\left(\frac{\xi_{1}}{\|\bm{\xi}\|},u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)-\mathbb{E}g\left(\frac{\xi_{1}}{\sqrt{p}},\frac{\xi_{2}}{\sqrt{p}}\right)\Big|$
	$\displaystyle\leq$	$\displaystyle\Big|\mathbb{E}g\left(\frac{\xi_{1}}{\|\bm{\xi}\|},u\cdot\frac{\xi_{1}}{\|\bm{\xi}\|}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\|\bm{\xi}\|}\right)-\mathbb{E}g\left(\frac{\xi_{1}}{\sqrt{p}},u\cdot\frac{\xi_{1}}{\sqrt{p}}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\sqrt{p}}\right)\Big|$
	$\displaystyle+$	$\displaystyle\Big|\mathbb{E}g\left(\frac{\xi_{1}}{\sqrt{p}},u\cdot\frac{\xi_{1}}{\sqrt{p}}+\sqrt{1-u^{2}}\cdot\frac{\xi_{2}}{\sqrt{p}}\right)-\mathbb{E}g\left(\frac{\xi_{1}}{\sqrt{p}},\frac{\xi_{2}}{\sqrt{p}}\right)\Big|$
	$\displaystyle\leq$	$\displaystyle\frac{(\mbox{const})\cdot\|g\|_{\infty}}{p}+(\mbox{const})\cdot\|g\|_{\infty}|u|.$

The proof is completed. $\hfill\square$
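The linear-in-$|u|$ behaviour in (57) can be probed numerically. The following Python sketch approximates the total variation distance between $N\big(\bm{0},\begin{pmatrix}1&u\\ u&1\end{pmatrix}\big)$ and $N(\bm{0},\bm{I}_{2})$ by a midpoint rule on a truncated square. It assumes the convention $d_{TV}=\frac{1}{2}\int|p-q|$ (the display in the proof bounds the distance by the integral without the factor $1/2$, which only changes the constant); the truncation width, grid size and function name are illustrative assumptions. A first-order expansion of the density in $u$ suggests that the ratio $d_{TV}/|u|$ should be close to $1/\pi\approx 0.318$ for small $|u|$.

```python
import math

def tv_bivariate_normal(u, half_width=6.0, steps=240):
    """Midpoint-rule approximation of (1/2) * integral of |p_u - p_0| over
    [-half_width, half_width]^2, where p_u is the N(0, [[1, u], [u, 1]]) density."""
    h = 2.0 * half_width / steps
    c0 = 1.0 / (2.0 * math.pi)
    cu = 1.0 / (2.0 * math.pi * math.sqrt(1.0 - u * u))
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        for j in range(steps):
            y = -half_width + (j + 0.5) * h
            p0 = c0 * math.exp(-(x * x + y * y) / 2.0)
            pu = cu * math.exp(-(x * x + y * y - 2.0 * u * x * y) / (2.0 * (1.0 - u * u)))
            total += abs(pu - p0)
    return 0.5 * total * h * h

if __name__ == "__main__":
    for u in (0.05, 0.1, 0.25, 0.5):
        tv = tv_bivariate_normal(u)
        print(u, tv, tv / u)
```

The printed ratios staying bounded over $u\in(0,1/2]$ is what (57) asserts, up to the value of the universal constant.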

The next lemma collects some elementary properties of, and bounds for, the normalizing constant $C_{p}(\kappa)$ of the FvML distributions.

Lemma 12.

Recall $C_{p}(\kappa)$ in (49). The following statements hold:

•

If $I_{\nu}(x)$ is the modified Bessel function of the first kind (see [Ley-Verdebout] for details), then

$\frac{C_{p}^{\prime}(\kappa)}{C_{p}(\kappa)}=-\frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)}.$
•

For all $\nu>0$ , we have

$G_{\nu+\frac{1}{2},\nu+\frac{3}{2}}(\kappa)\leq\frac{I_{\nu+1}(\kappa)}{I_{\nu}(\kappa)}\leq G_{\nu,\nu+2}(\kappa)$

where

$G_{\alpha,\beta}(t):=\frac{t}{\alpha+\sqrt{\beta^{2}+t^{2}}}.$
•

For all $p\geq 3$ and $\kappa>0$ , we have

$\left|\log C_{p}(\kappa)+\frac{\kappa^{2}}{2p}\right|\leq\frac{\kappa^{4}}{2p^{3}}.$

Proof of Lemma 12. The first property can be found in [M-Jupp], pages $169$ and $170$. The second property can be found in Section 3 of [hornik2013amos]. We use the first two properties to prove the last one.

We first show that

\displaystyle\Big|\frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)}-\frac{\kappa}{p}\Big|\leq\frac{2\kappa^{3}}{p^{3}}.

(58)

To see this, we apply the second property with $\nu=(p-2)/2$ to obtain

\frac{\kappa}{\nu+1/2+\sqrt{\kappa^{2}+\left(\nu+3/2\right)^{2}}}\leq\frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)}\leq\frac{\kappa}{\nu+\sqrt{\kappa^{2}+\left(\nu+2\right)^{2}}}

From the upper bound, it is clear that the Bessel ratio is always less than $1$ . We consider two cases.

Case 1: $\kappa\geq p$ . In this case, the result is trivial since

\Big|\frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)}-\frac{\kappa}{p}\Big|=\frac{\kappa}{p}-\frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)}\leq\frac{\kappa}{p}\leq\frac{\kappa^{3}}{p^{3}}.

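Case 2: $\kappa\leq p$ . In this case, put $x:=\kappa/p\leq 1$ . Since $\nu+\sqrt{\kappa^{2}+(\nu+2)^{2}}\geq 2\nu+2=p$ , the upper bound gives $I_{p/2}(\kappa)/I_{p/2-1}(\kappa)\leq x$ , so it remains to bound $x-I_{p/2}(\kappa)/I_{p/2-1}(\kappa)$ from above. Using the lower bound, we get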

	$\displaystyle x-\frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)}$	$\displaystyle\leq x-\frac{\kappa}{(p-1)/2+\sqrt{\kappa^{2}+\left((p+1)/2\right)^{2}}}$
		$\displaystyle=x\left[1-\frac{1}{1+\sqrt{x^{2}+\left(1/2+1/(2p)\right)^{2}}-1/2-1/(2p)}\right]$
		$\displaystyle=x\cdot\frac{\sqrt{x^{2}+\left(1/2+1/(2p)\right)^{2}}-[1/2+1/(2p)]}{1+\sqrt{x^{2}+\left(1/2+1/(2p)\right)^{2}}-1/2-1/(2p)}$
		$\displaystyle\leq\frac{2x^{3}}{1+\sqrt{x^{2}+\left(1/2+1/(2p)\right)^{2}}-1/2-1/(2p)}\leq 2x^{3}$

where the last line follows from the fact that

\sqrt{x^{2}+\left(1/2+1/(2p)\right)^{2}}-[1/2+1/(2p)]=\frac{x^{2}}{\sqrt{x^{2}+\left(1/2+1/(2p)\right)^{2}}+[1/2+1/(2p)]}\leq 2x^{2}.

Finally, to deduce the third property, we integrate (58) to get

	$\displaystyle\left|\log C_{p}(\kappa)+\frac{\kappa^{2}}{2p}\right|=\left|\int_{0}^{\kappa}\left(-\frac{I_{p/2}(t)}{I_{p/2-1}(t)}+\frac{t}{p}\right)dt\right|$	$\displaystyle\leq\int_{0}^{\kappa}\left|-\frac{I_{p/2}(t)}{I_{p/2-1}(t)}+\frac{t}{p}\right|dt$
		$\displaystyle\leq\int_{0}^{\kappa}\frac{2t^{3}}{p^{3}}dt=\frac{\kappa^{4}}{2p^{3}}.$

This completes the proof. $\hfill\square$
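The two-sided bound on the Bessel ratio and the estimate (58) can be checked numerically on a few values. The sketch below evaluates $I_{\nu+1}(x)/I_{\nu}(x)$ through the standard three-term recurrence $I_{k}(x)=\frac{2(k+1)}{x}I_{k+1}(x)+I_{k+2}(x)$, run backwards from a large index; the truncation depth `terms`, the tolerance, the grid of $(p,\kappa)$ values and the function names are illustrative assumptions, and the check is a spot test rather than a proof. For $\nu=1/2$ the ratio has the closed form $\coth x-1/x$, which gives an independent reference value.

```python
import math

def bessel_ratio(nu, x, terms=500):
    """Approximate I_{nu+1}(x) / I_nu(x) with the backward recursion
    r_k = 1 / (2 (k + 1) / x + r_{k+1}), started from r = 0 at a large index."""
    r = 0.0
    for m in range(terms, -1, -1):
        r = 1.0 / (2.0 * (nu + m + 1.0) / x + r)
    return r

def G(alpha, beta, t):
    """The function G_{alpha,beta}(t) = t / (alpha + sqrt(beta^2 + t^2))."""
    return t / (alpha + math.sqrt(beta * beta + t * t))

def check_bounds(p, kappa, tol=1e-12):
    """Check the two-sided Bessel-ratio bound and estimate (58) at one point."""
    nu = p / 2.0 - 1.0
    ratio = bessel_ratio(nu, kappa)
    lower = G(nu + 0.5, nu + 1.5, kappa)
    upper = G(nu, nu + 2.0, kappa)
    two_sided_ok = lower - tol <= ratio <= upper + tol
    cubic_ok = abs(ratio - kappa / p) <= 2.0 * kappa ** 3 / p ** 3 + tol
    return two_sided_ok and cubic_ok

if __name__ == "__main__":
    for p in (3, 10, 100):
        for kappa in (0.1, 1.0, 5.0, 50.0):
            print(p, kappa, bessel_ratio(p / 2.0 - 1.0, kappa), check_bounds(p, kappa))
```

The backward direction is used because it is the numerically stable one for this recurrence; the small tolerance only absorbs floating-point rounding where the upper bound is very tight.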
