Department of Statistics, University of South Carolina, Columbia, 29201, SC, United States
Keywords: Angular Gaussian; Hypersphere; Isotropy; Prediction region

Abstract
A comprehensive toolkit is developed for regression analysis of directional data based on a flexible class of angular Gaussian distributions. Informative testing procedures are proposed to assess rotational symmetry around the mean direction and the dependence of model parameters on covariates. Bootstrap-based algorithms are provided to assess the significance of the proposed test statistics. Moreover, a prediction region is constructed that achieves the smallest volume within a class of ellipsoidal prediction regions with the same coverage probability. The efficacy of these inference procedures is demonstrated in simulation experiments. Finally, this new toolkit is used to analyze directional data originating from a hydrology study and a bioinformatics application.
1. Introduction
Directional data naturally arise in many scientific disciplines, for example as flight directions of migrating birds, directions of wind and ocean waves, and geomagnetic field directions. In these examples, the data are directional in their originally observed form and are typically low-dimensional. High-dimensional directional data, by contrast, typically result from preprocessing high-dimensional features collected in genetic studies (Banerjee et al., 2005), computer vision (Ryali et al., 2013), and text analysis (Ennajari et al., 2021), among many other fields. In these instances, the raw data vectors in some 𝑑-dimensional Euclidean space ℝ𝑑 are often normalized to lie on the hypersphere 𝕊𝑑−1 = {𝐲 ∈ ℝ𝑑 ∶ ‖𝐲‖ = 1}, where ‖𝐲‖ denotes the Euclidean norm of 𝐲.
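As a minimal illustration of the normalization step just described, the sketch below maps a nonzero raw feature vector in ℝ𝑑 onto 𝕊𝑑−1. It is plain Python; the function name `to_hypersphere` is hypothetical and not taken from the paper.

```python
import math

def to_hypersphere(x):
    """Map a nonzero vector x in R^d onto the unit hypersphere S^{d-1}."""
    norm = math.sqrt(sum(v * v for v in x))  # Euclidean norm ||x||
    if norm == 0.0:
        # The zero vector has no direction, so it cannot be normalized.
        raise ValueError("the zero vector has no direction")
    return [v / norm for v in x]

y = to_hypersphere([3.0, 4.0, 0.0])  # a point on S^2
```

Any nonzero vector and all of its positive multiples map to the same point, which is exactly the scale information discarded by normalization.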
Regression analysis of directional data is relatively underdeveloped compared to regression analysis of response data in the
linear (Euclidean) space. One of the most notable early developments of regression models for directional data is given by Johnson
and Wehrly (1978), who formulated parametric models for the joint distribution of a circular response (i.e., 𝑑 = 2) and a linear
covariate. Later, Presnell et al. (1998) introduced the spherically projected multivariate linear model based on the projected Gaussian
distribution for the circular response with a mean direction depending on covariates linearly. Mimicking the least squares method in
regression analysis for a linear response, Lund (1999) proposed a least circular-distance method for regression analysis of a circular
response. Scealy and Wood (2019) proposed a transformation of the von Mises-Fisher distribution to study paleomagnetic data,
following which they built regression models using the proposed directional distribution. Paine et al. (2018) proposed the elliptically
symmetric angular Gaussian distribution (ESAG), focusing on directional data on 𝕊2 . In a follow-up work (Paine et al., 2020), the
authors formulated regression models based on ESAG of low dimensions.
The formulation of ESAG results from imposing constraints on the mean 𝝁 and variance-covariance matrix 𝐕 of a multivariate Gaussian distribution 𝒩𝑑(𝝁, 𝐕) to resolve an identifiability issue. Such an issue emerges inevitably when normalizing a multivariate Gaussian vector to yield an angular Gaussian random variable, since two Gaussian vectors, 𝐖 and 𝑐𝐖, are normalized to the same vector when 𝑐 > 0, yet they follow different Gaussian distributions whenever 𝑐 ≠ 1. For most angular Gaussian distributions,
* Corresponding author (X. Huang).
https://doi.org/10.1016/j.csda.2025.108167
Received 6 July 2024; Received in revised form 4 February 2025; Accepted 26 February 2025
A 𝑑-dimensional random variable 𝐘 supported on 𝕊𝑑−1 follows an angular Gaussian distribution, AG(𝝁, 𝐕), if 𝐘 = 𝐖∕‖𝐖‖ with 𝐖 ∼ 𝒩𝑑(𝝁, 𝐕). To guarantee the identifiability of AG(𝝁, 𝐕), constraints on (𝝁, 𝐕) are needed to avoid overparameterization. For example, Presnell et al. (1998) assumed 𝐕 = 𝐈𝑑, which leads to an isotropic directional distribution, where 𝐈𝑑 is the 𝑑-dimensional identity matrix. Less stringent assumptions have also been considered, for example, in Wang and Gelfand (2013), where a sub-block of 𝐕 is assumed known. We adopt the ESAG distribution (Paine et al., 2018), which results from imposing the following constraints, henceforth referred to as the ESAG constraints: 𝐕𝝁 = 𝝁 and det(𝐕) = 1, where det(𝐕) denotes the determinant of 𝐕. These constraints leave more room for flexible modeling of 𝐘 than most previously considered constraints, at the price of a more complex constrained parameter space. The pdf of the ESAG distribution is
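$$
f(\mathbf{Y}\mid\boldsymbol{\mu},\mathbf{V})=\frac{(2\pi)^{-(d-1)/2}}{(\mathbf{Y}^{\top}\mathbf{V}^{-1}\mathbf{Y})^{d/2}}\exp\left[\frac{1}{2}\left\{\frac{(\mathbf{Y}^{\top}\boldsymbol{\mu})^{2}}{\mathbf{Y}^{\top}\mathbf{V}^{-1}\mathbf{Y}}-\boldsymbol{\mu}^{\top}\boldsymbol{\mu}\right\}\right]M_{d-1}\left(\frac{\mathbf{Y}^{\top}\boldsymbol{\mu}}{\sqrt{\mathbf{Y}^{\top}\mathbf{V}^{-1}\mathbf{Y}}}\right),\qquad(1)
$$
where $M_{d-1}(t)=(2\pi)^{-1/2}\int_{0}^{\infty}x^{d-1}\exp\{-(x-t)^{2}/2\}\,dx$.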
We recently reparameterized ESAG by introducing constraint-free parameters 𝜸 ∈ ℝ(𝑑−2)(𝑑+1)∕2 so that a 𝐕 satisfying the ESAG constraints can be determined by (𝝁, 𝜸) via an eigendecomposition (Yu and Huang, 2024). This new parameterization is suitable for directional data on 𝕊𝑑−1 for an arbitrary 𝑑 ≥ 3, the range of dimensions we focus on in this study. Appendix A of the Supplementary Material details this parameterization when 𝑑 = 4. Henceforth, we use 𝐘 ∼ ESAG(𝝁, 𝜸) to refer to 𝐘 ∼ AG(𝝁, 𝐕) with the ESAG constraints imposed on (𝝁, 𝐕).
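Because 𝐘 ∼ AG(𝝁, 𝐕) is defined through 𝐘 = 𝐖∕‖𝐖‖ with 𝐖 ∼ 𝒩𝑑(𝝁, 𝐕), simulation only requires a Gaussian draw and a normalization. The sketch below assumes 𝐕 is given directly; the map from 𝜸 to 𝐕 (Yu and Huang, 2024) is not reproduced here, so the caller must supply a 𝐕 that meets the ESAG constraints. 𝐕 = 𝐈𝑑 always does, giving an isotropic ESAG. The names `cholesky` and `sample_ag` are hypothetical.

```python
import math
import random

def cholesky(V):
    """Lower-triangular L with L L^T = V, for symmetric positive-definite V."""
    d = len(V)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(V[i][i] - s)
            else:
                L[i][j] = (V[i][j] - s) / L[j][j]
    return L

def sample_ag(mu, V, n, rng=None):
    """Draw n directions Y = W / ||W|| with W ~ N_d(mu, V), i.e., Y ~ AG(mu, V)."""
    rng = rng or random.Random()
    L = cholesky(V)
    d = len(mu)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]                                # z ~ N(0, I_d)
        w = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)]  # w = mu + L z
        norm = math.sqrt(sum(v * v for v in w))
        draws.append([v / norm for v in w])                                        # project onto S^{d-1}
    return draws
```

For example, `sample_ag([2.0, -5.0, 3.0, 5.0], I4, 500, random.Random(1))`, with `I4` the 4 × 4 identity, yields 500 draws from an isotropic ESAG on 𝕊3 concentrated around 𝝁∕‖𝝁‖.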
The benefits of modeling ESAG via constraint-free parameters are at least twofold. First, maximum likelihood estimation of model parameters becomes more straightforward than directly estimating (𝝁, 𝐕) subject to nonlinear ESAG constraints such as det(𝐕) = 1. Second, a covariate-dependent ESAG can be easily formulated without introducing link functions that relate covariates to constrained model parameters, as done in earlier regression models for directional responses (Lund, 1999; Scealy and Welsh, 2011, 2017). In this study, we consider an ESAG regression model specified by 𝐘|𝐗 ∼ ESAG(𝝁 = 𝜶0 + 𝐀1𝐗, 𝜸 = 𝜷0 + 𝐁1𝐗), where 𝐗 = (𝑋1, ..., 𝑋𝑞)⊤ is the 𝑞-dimensional covariate vector, 𝜶0 is the intercept for modeling 𝝁, 𝐀1 = [𝜶1 ∣ … ∣ 𝜶𝑞] is the 𝑑 × 𝑞 matrix of regression coefficients representing covariate effects on 𝝁, 𝜷0 is the intercept for modeling 𝜸, and 𝐁1 = [𝜷1 ∣ … ∣ 𝜷𝑞] is the {(𝑑 − 2)(𝑑 + 1)∕2} × 𝑞 matrix of covariate effects on 𝜸, in which 𝜶𝑘 ∈ ℝ𝑑 and 𝜷𝑘 ∈ ℝ(𝑑−2)(𝑑+1)∕2, for 𝑘 = 0, 1, ..., 𝑞. This regression model generalizes the one in Paine et al. (2020, Section 3.2), which focuses on the case 𝑑 = 3.
Suppose the observed data include directional responses {𝐘1, … , 𝐘𝑛} from 𝑛 independent experimental units along with their covariate data {𝐗1, … , 𝐗𝑛}. Similar to the treatment of covariate data in Scealy and Wood (2019), we standardize the covariate data via (𝑋𝑖,𝑘 − 𝑋(1),𝑘)∕(𝑋(𝑛),𝑘 − 𝑋(1),𝑘) + 1, for 𝑖 = 1, … , 𝑛, where 𝑋(1),𝑘 and 𝑋(𝑛),𝑘 are the minimum and maximum order statistics of covariate 𝑋𝑘, for 𝑘 = 1, … , 𝑞. The resulting standardized covariates are more comparable in scale with a response of unit Euclidean norm, which helps stabilize the numerical implementation of maximum likelihood estimation without distorting the underlying association between the response and covariates. With a slight abuse of notation, we use {𝐗1, … , 𝐗𝑛} to refer to the standardized covariate data, which fall in the 𝑞-dimensional unit cube [1, 2]𝑞. If a new experimental unit has a covariate value outside the original data range, then its standardized covariates fall outside this cube, which can be problematic when predicting the unit's outcome. To alleviate this concern, one may consider an alternative standardization, such as the more traditional approach of centering and scaling the covariates to zero mean and unit variance. On the other hand, when predicting a directional response from linear covariates, extrapolation can be even less reliable than when the response is also linear. Hence, extrapolation using our regression models is especially discouraged, and without extrapolation the current standardization poses no complication in prediction.
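The min-max standardization above, together with the extrapolation check it motivates, can be sketched as follows. This is a plain-Python illustration with hypothetical function names: `standardize_covariates` stores the training minima and maxima, and `standardize_new` applies the same map to a new unit and flags any coordinate that leaves [1, 2].

```python
def standardize_covariates(X):
    """Column-wise (x - min) / (max - min) + 1, mapping each covariate into [1, 2].

    X is a list of n rows, each a list of q covariate values.
    Returns the standardized rows plus the column minima and maxima."""
    q = len(X[0])
    lo = [min(row[k] for row in X) for k in range(q)]  # order statistic X_(1),k
    hi = [max(row[k] for row in X) for k in range(q)]  # order statistic X_(n),k
    if any(h == l for h, l in zip(hi, lo)):
        raise ValueError("a constant covariate cannot be min-max standardized")
    Z = [[(row[k] - lo[k]) / (hi[k] - lo[k]) + 1.0 for k in range(q)] for row in X]
    return Z, lo, hi

def standardize_new(x, lo, hi):
    """Apply the training-data map to a new unit; flag extrapolation outside [1, 2]."""
    z = [(v - l) / (h - l) + 1.0 for v, l, h in zip(x, lo, hi)]
    outside = any(zk < 1.0 or zk > 2.0 for zk in z)
    return z, outside
```

For training rows [[0, 10], [5, 20], [10, 30]], a new unit at (20, 40) maps to (3.0, 2.5) and is flagged, signalling the extrapolation the text advises against.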
Z. Yu and X. Huang
Computational Statistics and Data Analysis 208 (2025) 108167
2.2. Maximum likelihood estimation
To parameterize 𝐕 in AG(𝝁, 𝐕) so that it satisfies the ESAG constraints, we introduced longitude and latitude angle parameters that specify the eigenvectors of 𝐕 after 𝝁 is specified (as explained in detail in Appendix A of the Supplementary Material). We showed that 𝜸, or a certain subvector of it, being zero amounts to some latitude angles falling on the boundary of 0 or 𝜋 and some other latitude and longitude angles being non-identifiable (see Appendix B in Yu and Huang, 2024, for details). This suggests violations of the regularity conditions for likelihood-based inference on model parameters, even though the parameter space of ESAG(𝝁, 𝜸) is the entire real space ℝ(𝑑−1)(𝑑+2)∕2. The irregularity carries over to the ESAG regression model. As a result, maximum likelihood estimators (MLEs) of some regression coefficients may converge in distribution to a Gaussian at a rate slower than √𝑛, or may not be asymptotically Gaussian at all, depending on where the true model parameters fall in the parameter space. Regardless, numerical implementation of maximum likelihood estimation is straightforward under the current parameterization of ESAG, as demonstrated in our earlier work (and thus omitted here), and a simple resampling-based bootstrap procedure can be used to quantify the uncertainty of the MLEs.
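The resampling-based bootstrap mentioned above can be sketched generically: resample experimental units with replacement, re-estimate, and take the standard deviation of the replicates. In this sketch `bootstrap_se` and `mean_resultant_length` are hypothetical names, and the latter is only a stand-in estimator (the norm of the average unit vector), not the ESAG MLE; in practice the `estimator` callable would refit the ESAG regression to each resample of (𝐘𝑖, 𝐗𝑖) pairs and return one coefficient.

```python
import math
import random
import statistics

def bootstrap_se(units, estimator, B=200, seed=0):
    """Nonparametric bootstrap standard error of a scalar estimator.

    units     : list of per-unit records, e.g. (Y_i, X_i) pairs
    estimator : callable mapping a list of units to a scalar estimate
    """
    rng = random.Random(seed)
    n = len(units)
    reps = []
    for _ in range(B):
        resample = [units[rng.randrange(n)] for _ in range(n)]  # draw n units with replacement
        reps.append(estimator(resample))
    return statistics.stdev(reps)

def mean_resultant_length(units):
    """Stand-in estimator (NOT the ESAG MLE): norm of the average unit vector."""
    d = len(units[0])
    m = [sum(u[k] for u in units) / len(units) for k in range(d)]
    return math.sqrt(sum(v * v for v in m))
```

Resampling whole units, rather than residuals, keeps each response paired with its covariates, which matters when the model itself may be irregular.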
When it comes to hypothesis testing, the conventional likelihood ratio test is inadequate when regularity conditions are not satisfied, because the asymptotic null distribution of the likelihood ratio (LR) statistic is then no longer 𝜒2 (Chernoff, 1954). Most existing strategies for addressing this complication aim to estimate the exact distribution of LR, or its limiting distribution under the null, using simulation-based methods, such as the method proposed by Drton (2009) and the approach developed in Mitchell et al. (2019). Instead of using LR, we propose different test statistics that exploit unique properties of the ESAG distribution. These are elaborated in the next two sections, one focusing on testing for isotropy and the other on testing for covariate dependence of 𝝁 and 𝜸.
If 𝐘 follows an isotropic distribution around its mean, then 𝐑𝐘 and 𝐘 are identically distributed for any 𝑑 × 𝑑 rotation matrix 𝐑 such that 𝐑𝝁 = 𝝁. By the parameterization of 𝐕 via 𝜸 after 𝝁 is specified, ESAG(𝝁, 𝜸) is isotropic when 𝜸 = 𝟎, which gives 𝐕 = 𝐈𝑑. Hence, testing isotropy is relevant to inferring correlations among the components of 𝐖, the pre-normalization version of 𝐘, and also relates to model selection between the more parsimonious isotropic ESAG and a generic ESAG distribution. Outside the ESAG family, isotropy can be manifested in forms other than 𝐕 = 𝐈𝑑. In what follows, we propose a strategy for testing the null hypothesis 𝐻0(𝐕) ∶ 𝐘 ∼ ESAG(𝝁, 𝜸 = 𝟎), where potential dependence of 𝝁 on covariates 𝐗 is suppressed for notational simplicity. The proposed strategy is motivated by properties of the MLE of the concentration parameter in the presence of model misspecification.
For the distribution ESAG(𝝁, 𝜸), ‖𝝁‖ quantifies the overall concentration of the distribution, while 𝐕 (or 𝜸) controls the variation in different subspaces on the unit sphere. A data cloud generated from an isotropic ESAG on 𝕊𝑑−1 tends to look ball-shaped, whereas a data cloud from an anisotropic ESAG takes the shape of an elliptical disc. Intuitively, when fitting an isotropic ESAG model to data from an anisotropic ESAG, one essentially seeks a ball that best resembles (in some sense) the elliptical disc. To accomplish this, the radius of the ball tends to be some weighted average of the axes of the elliptical disc, leading to a lower concentration of the fitted isotropic ESAG compared to that of the true anisotropic distribution. In the context of model comparison, two ESAG distributions, ESAG(𝝁1, 𝜸1 = 𝟎) and ESAG(𝝁2, 𝜸2 ≠ 𝟎), are more alike when ‖𝝁1‖ < ‖𝝁2‖ than when ‖𝝁1‖ ≥ ‖𝝁2‖. We demonstrate this phenomenon next by exploiting the properties of MLEs in the presence of model misspecification.
Let 𝑃 denote a generic ESAG distribution with pdf 𝑃(𝐘; 𝝁𝑎, 𝜸𝑎), which specifies the true data-generating mechanism with true model parameters 𝝁𝑎 and 𝜸𝑎. Let 𝑄 denote an isotropic ESAG distribution with pdf 𝑄(𝐘; 𝝁). Using the density function in (1), we have 𝑃(𝐘; 𝝁𝑎, 𝜸𝑎) = 𝑓(𝐘|𝝁𝑎, 𝐕(𝜸𝑎)) and 𝑄(𝐘; 𝝁) = 𝑓(𝐘|𝝁, 𝐈𝑑), where 𝐕(𝜸𝑎) highlights the dependence of 𝐕 on 𝜸𝑎 once 𝝁𝑎 is specified in the true pdf. The Kullback–Leibler divergence of 𝑄 from 𝑃 is defined as 𝐷KL(𝑃‖𝑄; 𝝁) = 𝐸𝑃[log{𝑃(𝐘; 𝝁𝑎, 𝜸𝑎)∕𝑄(𝐘; 𝝁)}], where the subscript "𝑃" signifies that the expectation is with respect to 𝑃. Under regularity conditions (White, 1982), if one fits the model 𝑄 to data from 𝑃, then the MLE of 𝝁 converges in probability to 𝝁0 = argmin𝝁 𝐷KL(𝑃‖𝑄; 𝝁) = argmax𝝁 𝐸𝑃{log 𝑄(𝐘; 𝝁)}. We demonstrate next, via simulation, that ‖𝝁0‖ ≤ ‖𝝁𝑎‖; equivalently, in the presence of model misspecification (i.e., 𝑃 ≠ 𝑄), 𝐸𝑃{log 𝑄(𝐘; 𝝁)} is maximized at a 𝝁 for which the ratio of concentrations (RoC) ‖𝝁𝑎‖∕‖𝝁0‖ exceeds 1. To single out the concentration, we write 𝝁𝑎 = 𝑐𝑎𝐑𝑎𝝁∗ and 𝝁0 = 𝑐0𝐑0𝝁∗ for some rotation matrices 𝐑𝑎 and 𝐑0 and some positive constants 𝑐𝑎 and 𝑐0, where 𝝁∗ is a unit vector. In other words, 𝝁𝑎 and 𝝁0 may differ in concentration, quantified by 𝑐𝑎 and 𝑐0, respectively, or differ in orientation when 𝐑𝑎 ≠ 𝐑0. Using this factorization of the mean direction parameter, we have ‖𝝁𝑎‖∕‖𝝁0‖ = 𝑐𝑎∕𝑐0 since ‖𝐑𝑎𝝁∗‖ = ‖𝐑0𝝁∗‖ = 1. Now we re-express the density 𝑃(⋅; 𝝁𝑎, 𝜸𝑎) as 𝑃(⋅; 𝑐𝑎, 𝐑𝑎, 𝜸𝑎), and similarly write the density 𝑄(⋅; 𝝁) as 𝑄(⋅; 𝑐, 𝐑), where the dependence of both distributions on 𝝁∗ is suppressed because they depend on the same 𝝁∗. Viewing 𝑃 as the reference distribution, we let 𝝁∗ = 𝝁𝑎∕‖𝝁𝑎‖, 𝐑𝑎 = 𝐈𝑑, and 𝑐𝑎 = ‖𝝁𝑎‖. Fitting 𝑄 to data from 𝑃 then amounts, in the limit as 𝑛 → ∞, to maximizing 𝐸𝑃{log 𝑄(𝐘; 𝑐, 𝐑)} with respect to (𝑐, 𝐑), which cannot be done analytically but can be approximated using large simulated samples.
To simulate this maximization problem, we generate a random sample of size 𝑛 = 10⁴ from 𝑃(⋅; 𝑐𝑎, 𝐑𝑎, 𝜸𝑎) with 𝑐𝑎 = √63, which results from setting 𝝁𝑎 = (2, −5, 3, 5)⊤ and 𝐑𝑎 = 𝐈4, and with 𝜸𝑎 taking one of three values: 𝜸(1) = 𝟎, 𝜸(2) = (1.5, 0, 1.5, 0, 0)⊤, and 𝜸(3) = 2𝜸(2). The first value creates a scenario where 𝑃 = 𝑄, and the latter two create increasing degrees of anisotropy in 𝑃. We then use the log-likelihood function 𝓁(𝑐, 𝐑) = 𝑛⁻¹ ∑ᵢ₌₁ⁿ log 𝑄(𝐘𝑖; 𝑐, 𝐑) as an empirical version of 𝐸𝑃{log 𝑄(𝐘; 𝑐, 𝐑)} to demonstrate that 𝑐𝑎∕𝑐∗ > 1 when 𝜸𝑎 ≠ 𝟎, where 𝑐∗ = argmax𝑐>0 𝓁(𝑐, 𝐑∗) for some arbitrary rotation matrix 𝐑∗. For concreteness, we consider three values of 𝐑∗: 𝐑(1) = 𝐑𝑎; 𝐑(2), obtained by replacing the upper 2 × 2 block of 𝐈4 with the two-dimensional rotation matrix 𝑅∗(0.1); and 𝐑(3), similarly defined using 𝑅∗(0.3), which deviates from 𝐑𝑎 further than 𝐑(2) does. A two-dimensional rotation matrix 𝑅∗(𝜃) has diagonal entries equal to cos(𝜃), and [1,2] and [2,1] entries equal to −sin(𝜃) and sin(𝜃), respectively.

Fig. 1. The empirical version of 𝐸𝑃{log 𝑄(𝐘; 𝑐, 𝐑)}, 𝓁(𝑐, 𝐑), based on a random sample of size 𝑛 = 10⁴ from an isotropic ESAG (solid lines), an anisotropic ESAG with 𝜸 = 𝜸(2) ≠ 𝟎 (dashed lines), and an anisotropic ESAG with 𝜸 = 𝜸(3) = 2𝜸(2) (dotted lines), versus RoC when 𝐑 is set at 𝐑(1) = 𝐈𝑑 (top panel), 𝐑(2) ≠ 𝐈𝑑 (middle panel), and 𝐑(3), which deviates from 𝐈𝑑 even more (bottom panel). Vertical lines mark the value of RoC where the corresponding function 𝓁(𝑐, 𝐑) is maximized.
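The rotation matrices and the concentration/orientation factorization used in this experiment can be written out directly. The sketch below (hypothetical helper names `embed_rotation`, `matvec`, `candidate_mu`) builds 𝐑(2) and 𝐑(3) by embedding 𝑅∗(𝜃) in the upper 2 × 2 block of 𝐈4, confirms 𝑐𝑎 = ‖(2, −5, 3, 5)⊤‖ = √63, and forms a candidate mean 𝑐𝐑𝝁∗ for the isotropic fit 𝑄(⋅; 𝑐, 𝐑). It does not perform the likelihood maximization itself.

```python
import math

def embed_rotation(theta, d=4):
    """Identity I_d with its upper 2 x 2 block replaced by R*(theta) =
    [[cos(theta), -sin(theta)], [sin(theta), cos(theta)]]."""
    R = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    c, s = math.cos(theta), math.sin(theta)
    R[0][0], R[0][1] = c, -s
    R[1][0], R[1][1] = s, c
    return R

def matvec(R, v):
    """Matrix-vector product R v."""
    return [sum(R[i][j] * v[j] for j in range(len(v))) for i in range(len(R))]

mu_a = [2.0, -5.0, 3.0, 5.0]
c_a = math.sqrt(sum(v * v for v in mu_a))   # sqrt(63), about 7.94
mu_star = [v / c_a for v in mu_a]           # unit vector mu*
R2, R3 = embed_rotation(0.1), embed_rotation(0.3)

def candidate_mu(c, R):
    """Mean vector c * R mu* of a candidate isotropic fit Q(.; c, R)."""
    return [c * v for v in matvec(R, mu_star)]
```

Since rotations preserve norms, ‖`candidate_mu(c, R)`‖ = 𝑐 for every 𝐑, which is why RoC = 𝑐𝑎∕𝑐 isolates concentration from orientation.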
The top panel of Fig. 1 depicts 𝓁(𝑐, 𝐑(1)) as a function of RoC = 𝑐𝑎∕𝑐 when the data-generating mechanism 𝑃 has 𝜸𝑎 set at 𝜸(1) = 𝟎 (isotropy), 𝜸(2) ≠ 𝟎 (mild anisotropy), and 𝜸(3) = 2𝜸(2) (severe anisotropy). With 𝐑(1) = 𝐑𝑎, the mean directions of 𝑃 and 𝑄 have the same orientation. When 𝑃 is isotropic, 𝐸𝑃{log 𝑄(𝐘; 𝑐, 𝐑(1))} is expected to be maximized at 𝑐∗ = 𝑐𝑎, resulting in 𝐷KL(𝑃‖𝑄; 𝝁0) = 0. This is indeed (empirically) supported by the curve of 𝓁(𝑐, 𝐑(1)), which peaks at around RoC = 𝑐𝑎∕𝑐∗ = 1. Once 𝑃 exhibits anisotropy, with 𝜸𝑎 deviating from 𝟎, the likelihood 𝓁(𝑐, 𝐑(1)) drops and is maximized at some RoC exceeding 1, indicating that 𝑐∗ < 𝑐𝑎. The inflation in RoC, i.e., the attenuation in 𝑐∗, becomes more substantial as 𝜸𝑎 deviates further from 𝟎. This implies that misspecifying 𝜸 in the ESAG distribution by assuming isotropy can manifest as an RoC larger than 1. The bottom two panels of Fig. 1 show 𝓁(𝑐, 𝐑(2)) and 𝓁(𝑐, 𝐑(3)) versus RoC; all the phenomena observed for 𝓁(𝑐, 𝐑(1)) remain, except that, even with 𝜸𝑎 set at 𝟎, 𝓁(𝑐, 𝐑∗) is maximized at an RoC larger than 1.
Comparing the three panels of Fig. 1 reveals a clear trend of RoC increasing as model misspecification becomes more severe, either by having 𝜸𝑎 further away from zero or by having the orientation of 𝝁𝑎 mismatch more with the orientation of 𝝁0. The latter observation suggests that RoC can also be used to test assumptions regarding 𝝁, a point we return to in a later section on testing assumptions on the mean direction parameter.
Inspired by the above findings regarding concentration estimation, we propose the following statistic for testing isotropy:
$$
\mathrm{RoC}=\frac{1}{n}\sum_{i=1}^{n}\frac{\|\hat{\boldsymbol{\mu}}_{ai}\|}{\|\hat{\boldsymbol{\mu}}_{0i}\|},\qquad(2)
$$
where 𝝁̂0𝑖 is the restricted MLE of the mean direction of 𝐘𝑖 given 𝐗𝑖 under 𝐻0(𝐕), which assumes 𝜸 = 𝟎 and a covariate-dependent 𝝁, and 𝝁̂𝑎𝑖 is the unrestricted MLE under the alternative hypothesis that allows anisotropy. If the true data-generating mechanism is consistent with 𝐻0(𝐕), then RoC is expected to be close to 1; otherwise, RoC tends to be larger than 1.
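Once the two sets of fitted means are available, statistic (2) is a simple average of norm ratios. The sketch below assumes the per-unit restricted and unrestricted fitted means have already been produced by some fitting routine (not shown here); `roc_statistic` is a hypothetical name.

```python
import math

def _norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def roc_statistic(mu_hat_a, mu_hat_0):
    """RoC = (1/n) * sum_i ||mu_hat_{a,i}|| / ||mu_hat_{0,i}||, as in (2).

    mu_hat_a : unrestricted fitted mean vectors, one per experimental unit
    mu_hat_0 : restricted (null) fitted mean vectors, one per experimental unit
    """
    if len(mu_hat_a) != len(mu_hat_0):
        raise ValueError("need one fitted mean per unit under each model")
    return sum(_norm(a) / _norm(b) for a, b in zip(mu_hat_a, mu_hat_0)) / len(mu_hat_a)
```

For instance, if one unit's unrestricted fit has twice the norm of its restricted fit and another unit's fits coincide, RoC = (2 + 1)∕2 = 1.5.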
Algorithm 1 gives the parametric bootstrap procedure for estimating the 𝑝-value associated with RoC to assess its statistical significance. The goal of the bootstrap procedure is to estimate the null distribution of RoC by simulating realizations of RoC under the null. To this end, we repeatedly compute RoC based on data generated from the isotropic ESAG distribution 𝑄(⋅; 𝝁̂0𝑖) for the 𝑖-th experimental unit, 𝑖 = 1, … , 𝑛. This distribution estimates the isotropic model 𝑄 that is closest to the unknown true model 𝑃 for each experimental unit. An estimated 𝑝-value is then obtained by comparing the RoC computed from the raw data with the simulated RoCs. As seen here, and in the testing procedures proposed later for other purposes, the distribution of a test statistic under any hypothesized ESAG model can be easily approximated via parametric bootstrap, because it is straightforward to simulate data from any ESAG distribution; this is yet another virtue of the ESAG family and the constraint-free parameterization.
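The sketch below is a generic parametric-bootstrap 𝑝-value estimator in the spirit of Algorithm 1, not a transcription of it. It assumes two user-supplied callables: `simulate_null`, which draws one synthetic data set from the fitted null model (e.g., from 𝑄(⋅; 𝝁̂0𝑖) for each unit), and `statistic`, which recomputes the test statistic on that data set, including the restricted and unrestricted refits. The add-one convention (1 + #exceedances)∕(𝐵 + 1) is our choice and may differ from the paper's; `parametric_bootstrap_pvalue` is a hypothetical name.

```python
import random

def parametric_bootstrap_pvalue(t_obs, simulate_null, statistic, B=300, seed=0):
    """Estimate P(T >= t_obs) under the fitted null model.

    simulate_null(rng) -> one synthetic data set drawn from the fitted null model
    statistic(data)    -> the test statistic (e.g., RoC) recomputed on that data set
    Uses the add-one convention (1 + #exceedances) / (B + 1).
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(B):
        t_b = statistic(simulate_null(rng))  # statistic on one null replicate
        if t_b >= t_obs:                     # large values count against the null
            exceed += 1
    return (1 + exceed) / (B + 1)
```

With 𝐵 = 300, as in the simulation study, the smallest attainable 𝑝-value under this convention is 1∕301.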
For a directional response, a practically interesting question is whether its mean direction depends on covariates. For concreteness, consider testing the null 𝐻0(𝝁) ∶ 𝐘|𝐗 ∼ ESAG(𝝁 = 𝜶0, 𝜸 = 𝜷0 + 𝐁1𝐗) versus the alternative 𝐻1 ∶ 𝐘|𝐗 ∼ ESAG(𝝁 = 𝜶0 + 𝐀1𝐗, 𝜸 = 𝜷0 + 𝐁1𝐗). If the alternative is true with 𝐀1 ≠ 𝟎, the fitted 𝝁 under the null is expected to differ from the fitted value that allows covariate dependence of 𝝁. The difference can lie in their directions or in their norms, i.e., the concentrations of the two fitted distributions. This motivates the following test statistic, which captures both sources of discrepancy:
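$$
D=\frac{1}{n}\sum_{i=1}^{n}\left(2-\frac{\hat{\boldsymbol{\mu}}_{0i}^{\top}\hat{\boldsymbol{\mu}}_{ai}}{\|\hat{\boldsymbol{\mu}}_{0i}\|\,\|\hat{\boldsymbol{\mu}}_{ai}\|}\right)\frac{\|\hat{\boldsymbol{\mu}}_{ai}\|}{\|\hat{\boldsymbol{\mu}}_{0i}\|},\qquad(3)
$$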
where, for the 𝑖-th data point, 𝝁̂0𝑖 is the restricted MLE obtained under the null that assumes a covariate-independent 𝝁, and 𝝁̂𝑎𝑖 is the unrestricted MLE obtained under the alternative. In (3), 𝝁̂⊤0𝑖𝝁̂𝑎𝑖∕(‖𝝁̂0𝑖‖‖𝝁̂𝑎𝑖‖) is the cosine similarity between the two vectors 𝝁̂0𝑖 and 𝝁̂𝑎𝑖, which equals 1 if they have the same direction and −1 if their directions are opposite. Hence the first factor of the summand in (3), which ranges from 1 to 3, quantifies the dissimilarity in direction between 𝝁̂0𝑖 and 𝝁̂𝑎𝑖. The second factor of the summand in (3) contrasts the concentrations of the two estimates of 𝝁, as in RoC. By construction, under the null 𝐻0(𝝁), 𝐷 is expected to be close to 1, and a realization of 𝐷 larger than 1 can indicate that the observed data come from a model that violates the null.
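Statistic (3) can be computed from the same per-unit fitted means as RoC, with an extra direction-dissimilarity factor 2 − cos. The sketch below again assumes the fitted means come from some external fitting routine; `d_statistic` is a hypothetical name.

```python
import math

def _dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def d_statistic(mu_hat_a, mu_hat_0):
    """D = (1/n) * sum_i (2 - cos_i) * ||mu_hat_{a,i}|| / ||mu_hat_{0,i}||, as in (3),
    where cos_i is the cosine similarity between mu_hat_{0,i} and mu_hat_{a,i}."""
    total = 0.0
    for a, b in zip(mu_hat_a, mu_hat_0):
        na, nb = math.sqrt(_dot(a, a)), math.sqrt(_dot(b, b))
        cos_sim = _dot(b, a) / (nb * na)       # in [-1, 1]
        total += (2.0 - cos_sim) * na / nb     # direction factor times concentration ratio
    return total / len(mu_hat_a)
```

With equal norms, identical, orthogonal, and opposite fitted directions give 𝐷 = 1, 2, and 3, respectively, so 𝐷 exceeds RoC exactly when the two fits disagree in direction.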
Algorithm 2 provides detailed steps for implementing the test based on the newly proposed test statistic, where we again use a
parametric bootstrap procedure to estimate the 𝑝-value associated with 𝐷.
As indicated in Section 3.1, RoC can also be used to test hypotheses about 𝝁, such as its covariate dependence, by running Algorithm 1 with the restricted MLEs of 𝝁 and 𝜸 obtained under the current null 𝐻0(𝝁). Moreover, because 𝐷 compares the directions of the two fitted values of 𝝁 in addition to the concentrations that RoC focuses on, one can combine the two test statistics to gain more insight into the underlying data-generating mechanism. If 𝐷 is significantly larger than RoC when testing covariate dependence of 𝝁, one may interpret this as evidence that the direction of 𝝁 depends on some covariate. Having 𝐷 close to RoC can imply that the direction of 𝝁 may not depend on covariates, although its norm may. This exemplifies the versatility and additional insight that our proposed test statistics can offer compared to LR.
Unique to our parameterization of ESAG(𝝁, 𝜸), parameters in 𝜸 control the variation of the distribution in different subspaces on the hypersphere, beyond (an)isotropy. It is thus of interest to test whether such distributional features depend on covariates. For instance, one may test the null 𝐻0(𝜸) ∶ 𝐘|𝐗 ∼ ESAG(𝝁 = 𝜶0 + 𝐀1𝐗, 𝜸 = 𝜷0) versus the alternative 𝐻1 ∶ 𝐘|𝐗 ∼ ESAG(𝝁 = 𝜶0 + 𝐀1𝐗, 𝜸 = 𝜷0 + 𝐁1𝐗). Because 𝜸 as a whole relates to (an)isotropy of the distribution, RoC, initially proposed for testing isotropy, has natural appeal for testing hypotheses about 𝜸. When a violation of 𝐻0(𝜸) adversely affects inferences for 𝝁, the test statistic 𝐷, designed for testing assumptions on 𝝁, also has the potential to detect covariate dependence of 𝜸. With the restricted MLEs 𝝁̂0𝑖 and 𝜸̂0𝑖 now reflecting 𝐻0(𝜸) in Algorithm 1 or Algorithm 2, one can carry out the test based on RoC or 𝐷 for testing 𝐻0(𝜸) versus 𝐻1.
Fixing 𝐻1 at the above saturated ESAG model, to test other null hypotheses, say, 𝜶𝑘 = 𝟎 for a given 𝑘 ∈ {1, … , 𝑞} (with all other covariates in the null model), RoC and 𝐷 can be used with the restricted MLEs in Algorithms 1 and 2 revised accordingly, computing 𝝁̂0𝑖 to reflect the specific null hypothesis under consideration. Similarly, if one considers a different alternative ESAG model that allows 𝝁 or 𝜸 to depend on covariates nonlinearly, the tests based on RoC and 𝐷 remain applicable with 𝝁̂𝑎𝑖 obtained under that alternative model. We thus argue that the proposed testing strategies are more versatile than testing covariate dependence of 𝝁 or 𝜸 based on the magnitude of the regression-coefficient matrices 𝐀1 or 𝐁1. Even if one adopts an angular Gaussian distribution that is not ESAG, as long as the mean vector 𝝁 has the same interpretation as in ESAG(𝝁, 𝜸), RoC and 𝐷 remain meaningful statistics for testing assumptions on 𝝁 or other model assumptions to which inferences for 𝝁 are sensitive. One simply needs to revise the bootstrap procedures to adapt to the assumed angular Gaussian distribution.
Lastly, RoC and 𝐷 depend on both the restricted and unrestricted MLEs of model parameters, which adds to the computational burden of Algorithms 1 and 2, where these MLEs are obtained for each bootstrap sample. We thus propose yet another testing strategy, based on second-moment estimation, that only requires computing the restricted MLEs, with the test statistic given by
$$
M=\left\|\frac{1}{n}\sum_{i=1}^{n}\left\{\mathbf{Y}_{i}^{2}-\hat{E}_{0}\left(\mathbf{Y}_{i}^{2}\right)\right\}\right\|,\qquad(4)
$$
where 𝐘𝑖² is the element-wise square of 𝐘𝑖, and Ê0(𝐘𝑖²) is an empirical mean of 𝐘² given 𝐗 = 𝐗𝑖, computed using a random sample simulated from the estimated null model. Unlike RoC and 𝐷, the construction of 𝑀 is not motivated by (and thus does not target) a particular aspect of the model specification; instead, 𝑀 can serve as an overall goodness-of-fit test statistic. In fact, 𝐘² can be viewed as a compositional vector in a simplex, whose (non-negative) entries sum to one because ‖𝐘‖ = 1, and hence 𝑀 can be interpreted as a prediction error of the compositions under a null model. This interpretation also reveals that much of the orientation information in 𝐘, such as the signs of its entries, is lost in 𝑀 (by taking the element-wise square), and thus 𝑀 is insensitive to model misspecification that affects inferences on such information. Regardless, in the absence of model misspecification, 𝑀 is expected to be close to zero, and a larger 𝑀 serves as evidence of a worse fit of the null model to the observed data. As an example, Algorithm 3 below gives the algorithm for using 𝑀 to test the null model that assumes an isotropic ESAG, with an estimated 𝑝-value obtained via parametric bootstrap as the output.
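Statistic (4) needs only the observed responses and, for each unit, a batch of responses simulated from the fitted null model at that unit's covariate value. The sketch below assumes those simulated batches are supplied by the caller (for example, from an AG sampler with the restricted fit plugged in); `m_statistic` is a hypothetical name.

```python
import math

def m_statistic(Y, null_draws):
    """M = || (1/n) * sum_i { Y_i^2 - E0_hat(Y_i^2) } ||, as in (4).

    Y          : list of n observed unit vectors
    null_draws : list of n lists; null_draws[i] holds unit vectors simulated from the
                 fitted null model at X = X_i, used to estimate E0(Y_i^2)
    """
    d = len(Y[0])
    acc = [0.0] * d
    for y, draws in zip(Y, null_draws):
        m = len(draws)
        e0 = [sum(z[k] ** 2 for z in draws) / m for k in range(d)]  # empirical E0(Y_i^2)
        for k in range(d):
            acc[k] += y[k] ** 2 - e0[k]                               # element-wise squares
    n = len(Y)
    return math.sqrt(sum((a / n) ** 2 for a in acc))
```

Because both 𝐘𝑖² and each simulated square sum to one, the averaged difference always sums to zero, consistent with the compositional reading of 𝑀.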
Table 1
Data-generating mechanisms (DGMs) designed for testing the power of tests for each considered null hypothesis regarding ESAG(𝝁, 𝜸), along with values of model parameters in these DGMs.

| Null hypothesis | Data-generating mechanism |
|---|---|
| 𝐻0(𝐕) ∶ 𝝁 = 𝜶0 + 𝜶1𝑋, 𝜸 = 𝟎 | DGM1(𝐕) ∶ 𝝁 = 𝜶∗0 + 𝜶∗1𝑋, 𝜸 = 𝜷∗0𝑟 |
| | DGM2(𝐕) ∶ 𝝁 = 𝜶∗0 + 𝜶∗1𝑋, 𝜸 = 𝜷∗0𝑟 + 𝜷∗1𝑟𝑋 |
| 𝐻0(𝝁) ∶ 𝝁 = 𝜶0, 𝜸 = 𝜷0 + 𝜷1𝑋 | DGM1(𝝁) ∶ 𝝁 = 𝜶∗0 + 𝜶∗1𝑟𝑋, 𝜸 = 𝟎 |
| | DGM2(𝝁) ∶ 𝝁 = 𝜶∗0 + 𝜶∗1𝑟𝑋, 𝜸 = 𝜷∗0 |
| | DGM3(𝝁) ∶ 𝝁 = 𝜶∗0 + 𝜶∗1𝑟𝑋, 𝜸 = 𝜷∗0 + 𝜷∗1𝑋 |
| 𝐻0(𝜸) ∶ 𝝁 = 𝜶0 + 𝜶1𝑋, 𝜸 = 𝜷0 | DGM1(𝜸) ∶ 𝝁 = 𝜶∗0 + 𝜶∗1𝑋, 𝜸 = 𝜷∗1𝑟𝑋 |
| | DGM2(𝜸) ∶ 𝝁 = 𝜶∗0 + 𝜶∗1𝑋, 𝜸 = 𝜷∗0 + 𝜷∗1𝑟𝑋 |
𝜷∗1 = (4, 2, 5, −2, 3)⊤, 𝜷∗1𝑟 = (𝑟∕√5)𝟏5, where 𝟏5 is the five-dimensional vector of ones.
5. Simulation study
We are now in a position to study empirically the operating characteristics of the proposed procedures for testing 𝐻0(𝐕), 𝐻0(𝝁), and 𝐻0(𝜸) versus the alternative 𝐻1 ∶ 𝐘|𝐗 ∼ ESAG(𝝁 = 𝜶0 + 𝐀1𝐗, 𝜸 = 𝜷0 + 𝐁1𝐗). To this end, we design several data-generating mechanisms (DGMs) for each null hypothesis. A random sample of size 𝑛 ∈ {100, 200, 400} is generated according to each DGM, based on which the proposed test statistics and their estimated 𝑝-values are computed following Algorithms 1–3 with 𝐵 = 300. As a benchmark, we also test each null using LR, with the corresponding 𝑝-value estimated via parametric bootstrap rather than by assuming a 𝜒2 null distribution for LR as in Paine et al. (2020). This experiment is repeated 200 times in each simulation setting, specified by the null hypothesis, the DGM, and the level of 𝑛. In all settings, we consider a scalar covariate, with 𝑛 realizations {𝑋′1, … , 𝑋′𝑛} generated from (0, 1), followed by standardization via 𝑋𝑖 = (𝑋′𝑖 − 𝑋′(1))∕(𝑋′(𝑛) − 𝑋′(1)) + 1, for 𝑖 = 1, … , 𝑛. Given the covariate data {𝑋1, … , 𝑋𝑛}, response data {𝐘1, … , 𝐘𝑛} are generated according to each DGM designed for inspecting the power of a test (see Table 1) or for checking the size of a test (see Table B.1 in Appendix B).
For each considered null hypothesis, we include one or more DGMs consistent with the null (see Table B.1 in the Supplementary Material), which allows us to inspect the size of a test. For each null, as shown in Table 1, we also design several DGMs with increasing model complexity relative to the null. The values of some regression coefficients depend on a quantity 𝑟 that we vary in the simulation to control the severity of model misspecification under the null, with a larger 𝑟 leading to a more pronounced deviation of the DGM from the null. This allows us to monitor the power of a test as the true model deviates further from the null model.
The metric we record in the simulation study is the relative frequency of a considered test rejecting the current null across 200
Monte Carlo replicates at a pre-specified significance level. In what follows, we present these rejection rates associated with different
tests for testing each of the three null hypotheses tabulated in Table 1.
The rejection rates of RoC, 𝐷 or 𝑀, and LR versus the nominal significance level, based on data generated from an ESAG regression model consistent with a null hypothesis, are provided in Appendix B of the Supplementary Material. Empirically, the sizes of the proposed tests are mostly well controlled provided that the sample size is moderate or large (e.g., 𝑛 > 100). A slight inflation in test size often occurs when 𝑛 = 100, or when some parameters (e.g., 𝜸) take true values (zeros) that lead to irregular MLEs when fitting the full model. Such inflation is more evident for LR, especially when testing covariate dependence of model parameters. For example, when testing 𝐻0(𝝁) using a random sample of size 100 from an isotropic ESAG distribution with covariate-free 𝝁, the rejection rates are 0.085, 0.080, 0.005, and 0.115 for the tests based on RoC, 𝐷, 𝑀, and LR, respectively, compared with the nominal level of 0.05. This may be a case in which the size of LR fails to approach the nominal level asymptotically even when its 𝑝-value is estimated by the conventional parametric bootstrap, a phenomenon described in Drton and Williams (2011). One should thus interpret the empirical power of LR with caution; for this reason, we do not report the empirical power of LR for testing covariate dependence.
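To gauge how far a rejection rate such as 0.115 is from the nominal 0.05 with only 200 Monte Carlo replicates, one can attach a binomial Monte Carlo standard error. This is an interpretive aid, not part of the reported analysis. The sketch assumes a test rejects when its estimated 𝑝-value is at most the significance level; the function names are hypothetical.

```python
import math

def rejection_rate(pvalues, alpha=0.05):
    """Fraction of Monte Carlo replicates whose estimated p-value is <= alpha."""
    return sum(p <= alpha for p in pvalues) / len(pvalues)

def mc_standard_error(rate, reps):
    """Binomial Monte Carlo standard error sqrt(rate * (1 - rate) / reps)."""
    return math.sqrt(rate * (1.0 - rate) / reps)

se = mc_standard_error(0.05, 200)   # about 0.0154 at the nominal level
z_lr = (0.115 - 0.05) / se          # about 4.2 standard errors above nominal for LR
z_roc = (0.085 - 0.05) / se         # about 2.3 standard errors above nominal for RoC
```

On this rough scale, the LR size inflation at 𝑛 = 100 is well beyond Monte Carlo noise, while the excess for RoC is more modest.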
Table 2 presents the empirical power of the various tests for each of the three null hypotheses at a significance level of 0.05, based on data from the true ESAG models specified in Table 1. When testing isotropy, the tests based on RoC, 𝑀, and LR are comparable in their power to detect anisotropy, with power increasing steadily as 𝑛 grows or as the true value of 𝜸 deviates further from zero (i.e., as 𝑟 increases). Having a covariate-dependent 𝜸 in the true regression model also enhances the power of these tests.
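Table 2
Rejection rates of tests for 𝐻0(𝐕), 𝐻0(𝝁), and 𝐻0(𝜸) at nominal level 0.05, with the highest rejection rate at each level of 𝑟 when 𝑛 = 100 highlighted in bold.

Testing 𝐻0(𝐕) ∶ 𝝁 = 𝜶0 + 𝜶1𝑋, 𝜸 = 𝟎
DGM1(𝐕) ∶ 𝝁 = 𝜶∗0 + 𝜶∗1𝑋, 𝜸 = 𝜷∗0𝑟; DGM2(𝐕) ∶ 𝝁 = 𝜶∗0 + 𝜶∗1𝑋, 𝜸 = 𝜷∗0𝑟 + 𝜷∗1𝑟𝑋

| DGM | 𝑟 | RoC, 𝑛 = 100 | RoC, 𝑛 = 200 | RoC, 𝑛 = 400 | 𝑀, 𝑛 = 100 | 𝑀, 𝑛 = 200 | 𝑀, 𝑛 = 400 | LR, 𝑛 = 100 | LR, 𝑛 = 200 | LR, 𝑛 = 400 |
|---|---|---|---|---|---|---|---|---|---|---|
| DGM1(𝐕) | 0.1 | 0.065 | 0.075 | 0.110 | 0.075 | 0.095 | 0.105 | 0.055 | 0.070 | 0.100 |
| DGM1(𝐕) | 0.2 | 0.085 | 0.145 | 0.245 | 0.070 | 0.145 | 0.280 | 0.095 | 0.145 | 0.305 |
| DGM1(𝐕) | 0.4 | 0.160 | 0.440 | 0.870 | 0.215 | 0.360 | 0.745 | 0.175 | 0.470 | 0.900 |
| DGM2(𝐕) | 0.1 | 0.090 | 0.210 | 0.420 | 0.110 | 0.190 | 0.390 | 0.080 | 0.215 | 0.470 |
| DGM2(𝐕) | 0.2 | 0.325 | 0.675 | 0.990 | 0.290 | 0.525 | 0.890 | 0.345 | 0.710 | 1.000 |
| DGM2(𝐕) | 0.4 | 0.895 | 1.000 | 1.000 | 0.740 | 0.980 | 1.000 | 0.910 | 1.000 | 1.000 |

Testing 𝐻0(𝝁) ∶ 𝝁 = 𝜶0, 𝜸 = 𝜷0 + 𝜷1𝑋
DGM1(𝝁) ∶ 𝝁 = 𝜶∗0 + 𝜶∗1𝑟𝑋, 𝜸 = 𝟎; DGM2(𝝁) ∶ 𝝁 = 𝜶∗0 + 𝜶∗1𝑟𝑋, 𝜸 = 𝜷∗0; DGM3(𝝁) ∶ 𝝁 = 𝜶∗0 + 𝜶∗1𝑟𝑋, 𝜸 = 𝜷∗0 + 𝜷∗1𝑋

| DGM | 𝑟 | RoC, 𝑛 = 100 | RoC, 𝑛 = 200 | RoC, 𝑛 = 400 | 𝐷, 𝑛 = 100 | 𝐷, 𝑛 = 200 | 𝐷, 𝑛 = 400 | 𝑀, 𝑛 = 100 | 𝑀, 𝑛 = 200 | 𝑀, 𝑛 = 400 |
|---|---|---|---|---|---|---|---|---|---|---|
| DGM1(𝝁) | 0.5 | 0.085 | 0.075 | 0.065 | 0.085 | 0.055 | 0.070 | 0.005 | 0.005 | 0.025 |
| DGM1(𝝁) | 1 | 0.160 | 0.200 | 0.265 | 0.165 | 0.195 | 0.240 | 0.005 | 0.005 | 0.025 |
| DGM1(𝝁) | 2 | 0.485 | 0.660 | 0.800 | 0.490 | 0.635 | 0.815 | 0.030 | 0.115 | 0.160 |
| DGM2(𝝁) | 0.5 | 0.245 | 0.440 | 0.760 | 0.235 | 0.350 | 0.635 | 0.070 | 0.075 | 0.105 |
| DGM2(𝝁) | 1 | 0.675 | 0.935 | 0.995 | 0.655 | 0.930 | 0.995 | 0.125 | 0.205 | 0.160 |
| DGM2(𝝁) | 2 | 0.980 | 1.000 | 1.000 | 0.980 | 0.995 | 1.000 | 0.375 | 0.470 | 0.525 |
| DGM3(𝝁) | 0.5 | 0.460 | 0.850 | 0.975 | 0.435 | 0.745 | 0.935 | 0.080 | 0.140 | 0.145 |
| DGM3(𝝁) | 1 | 0.980 | 1.000 | 1.000 | 0.975 | 1.000 | 1.000 | 0.230 | 0.215 | 0.240 |
| DGM3(𝝁) | 2 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.655 | 0.775 | 0.880 |

Testing 𝐻0(𝜸) ∶ 𝝁 = 𝜶0 + 𝜶1𝑋, 𝜸 = 𝜷0
DGM1(𝜸) ∶ 𝝁 = 𝜶∗0 + 𝜶∗1𝑋, 𝜸 = 𝜷∗1𝑟𝑋; DGM2(𝜸) ∶ 𝝁 = 𝜶∗0 + 𝜶∗1𝑋, 𝜸 = 𝜷∗0 + 𝜷∗1𝑟𝑋

| DGM | 𝑟 | RoC, 𝑛 = 100 | RoC, 𝑛 = 200 | RoC, 𝑛 = 400 | 𝐷, 𝑛 = 100 | 𝐷, 𝑛 = 200 | 𝐷, 𝑛 = 400 | 𝑀, 𝑛 = 100 | 𝑀, 𝑛 = 200 | 𝑀, 𝑛 = 400 |
|---|---|---|---|---|---|---|---|---|---|---|
| DGM1(𝜸) | 0.5 | 0.065 | 0.120 | 0.090 | 0.065 | 0.125 | 0.090 | 0.040 | 0.045 | 0.035 |
| DGM1(𝜸) | 1 | 0.105 | 0.135 | 0.130 | 0.105 | 0.135 | 0.125 | 0.020 | 0.030 | 0.070 |
| DGM1(𝜸) | 2 | 0.135 | 0.145 | 0.170 | 0.135 | 0.155 | 0.160 | 0.050 | 0.070 | 0.090 |
| DGM2(𝜸) | 0.5 | 0.070 | 0.045 | 0.055 | 0.065 | 0.050 | 0.050 | 0.025 | 0.070 | 0.075 |
| DGM2(𝜸) | 1 | 0.070 | 0.105 | 0.095 | 0.075 | 0.105 | 0.090 | 0.055 | 0.060 | 0.055 |
| DGM2(𝜸) | 2 | 0.155 | 0.165 | 0.340 | 0.150 | 0.165 | 0.325 | 0.025 | 0.065 | 0.045 |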
According to Table 2, the tests based on RoC and 𝐷 have higher power to detect covariate dependence of 𝝁 when the true model
also has a covariate-dependent 𝜸 (as in $\mathrm{DGM}_3^{(\boldsymbol{\mu})}$) than when it has an intercept-only model for 𝜸 (as in $\mathrm{DGM}_2^{(\boldsymbol{\mu})}$). Noting that
obtaining the unrestricted MLE for 𝜸 using data from $\mathrm{DGM}_2^{(\boldsymbol{\mu})}$ is an irregular maximum likelihood estimation problem, whereas the
same estimation using data from $\mathrm{DGM}_3^{(\boldsymbol{\mu})}$ is a regular case, we believe that irregular MLEs for model parameters can compromise
the power of RoC and 𝐷. When testing $H_0^{(\boldsymbol{\gamma})}$, the power of the proposed tests does not increase as quickly as when testing $H_0^{(\boldsymbol{\mu})}$,
either as 𝑛 increases or as the covariate dependence becomes stronger. We conjecture that, once 𝝁 is allowed to depend on covariates,
inferences for the concentration are less sensitive to the assumption that 𝜸 is covariate-independent, and thus RoC and 𝐷 may lack
power to detect the dependence of 𝜸 on covariates unless the dependence is very strong.
The moment-based test using 𝑀 is much less powerful than the tests based on RoC and 𝐷 for testing covariate dependence of
model parameters. Because 𝑀 focuses solely on the fit for the mean of 𝐘2, its power to detect model misspecification hinges heavily
on the impact of the misspecification on second-moment estimation. The observed phenomenon suggests some level of robustness of
the second-moment estimation to covariate dependence of ESAG model parameters. In an additional simulation study not reported here,
in which we generate covariate data from different distributions, we observe that likelihood-based estimation of 𝐸(𝐘2) is more
sensitive to violation of $H_0^{(\boldsymbol{\mu})}$ or $H_0^{(\boldsymbol{\gamma})}$ when the covariate distribution is skewed, and, consequently, 𝑀 becomes more powerful in
detecting covariate dependence of 𝝁 or 𝜸.
Table 3
Rejection rates of tests for isotropy using RoC, the scatter
test, and the hybrid test at nominal level 0.05, with the
highest rejection rate at each setting highlighted in bold.
A referee brought to our attention the tests for rotational symmetry of a directional distribution about a location proposed in
García-Portugués et al. (2020), which are not limited to a particular distribution family. The location is 𝝁 in our context, and
rotational symmetry about 𝝁 amounts to the isotropy considered in our study. Their proposed tests are based on a special
𝝁-dependent "projection" of 𝐘 that falls in ℝ𝑑−1 and, after normalization to unit norm, is uniformly distributed on 𝕊𝑑−2 if
𝐘 is rotationally symmetric about 𝝁. This normalized projection is known as the multivariate sign, denoted by 𝐮𝝁(𝐘). Test statistics
based on 𝐮𝝁(𝐘) are constructed to assess the discrepancies between the first two moments of 𝐮𝝁(𝐘) suggested by the data and the first
two moments of the uniform distribution on 𝕊𝑑−2. To compare their moment-based strategies (developed in the non-regression setting)
with our RoC-based isotropy test, we use the R package rotasym (García-Portugués et al., 2023) to implement two of their tests: the
scatter test, which is based on the second-moment discrepancy, and the hybrid test, which accounts for discrepancies in both the first
and second moments.
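To make the multivariate sign concrete, the Python sketch below computes 𝐮𝝁(𝐘) under the assumption that it equals Γ⊤𝐘/‖Γ⊤𝐘‖, where the columns of the 𝑑 × (𝑑 − 1) matrix Γ form an orthonormal basis of the orthogonal complement of 𝝁/‖𝝁‖. It then compares the sample second moment of the signs with 𝐈/(𝑑 − 1), the second moment of the uniform distribution on 𝕊𝑑−2. The helper names and the Frobenius-distance summary are illustrative choices of ours, not the scatter test implemented in rotasym; the data are drawn as 𝐙/‖𝐙‖ with 𝐙 ∼ 𝑁𝑑(𝝁, 𝐕), using a hand-picked 𝐕 that satisfies the ESAG constraints 𝐕𝝁 = 𝝁 and |𝐕| = 1.

```python
import numpy as np


def multivariate_sign(y, mu):
    """Multivariate sign u_mu(y) for each row of y (shape (n, d)).

    Assumes u_mu(y) = G^T y / ||G^T y||, where the columns of G are an
    orthonormal basis of the orthogonal complement of mu / ||mu||.
    Returns an array of shape (n, d - 1) with unit-norm rows.
    """
    theta = np.asarray(mu, dtype=float)
    theta = theta / np.linalg.norm(theta)
    # The last d - 1 right-singular vectors of theta^T span its complement.
    _, _, vt = np.linalg.svd(theta.reshape(1, -1))
    g = vt[1:].T
    proj = np.asarray(y, dtype=float) @ g
    return proj / np.linalg.norm(proj, axis=1, keepdims=True)


def second_moment_gap(u):
    """Frobenius distance between the sample E[u u^T] and I / (d - 1)."""
    n, k = u.shape
    return float(np.linalg.norm(u.T @ u / n - np.eye(k) / k))


def sample_angular_gaussian(mu, v, n, rng):
    """Draw Y = Z / ||Z|| with Z ~ N_d(mu, V)."""
    z = rng.multivariate_normal(mu, v, size=n)
    return z / np.linalg.norm(z, axis=1, keepdims=True)


rng = np.random.default_rng(1)
mu = np.array([0.0, 0.0, 0.0, 3.0])
v_aniso = np.diag([4.0, 0.5, 0.5, 1.0])  # V mu = mu and det(V) = 1
y_iso = sample_angular_gaussian(mu, np.eye(4), 20000, rng)
y_aniso = sample_angular_gaussian(mu, v_aniso, 20000, rng)
print(second_moment_gap(multivariate_sign(y_iso, mu)))
print(second_moment_gap(multivariate_sign(y_aniso, mu)))
```

The first gap should be near zero (isotropic case) and the second noticeably larger. Any other orthonormal basis Γ only rotates the signs within ℝ𝑑−1, which leaves the Frobenius distance unchanged.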
In the comparative study, we generate random samples of size 𝑛 ∈ {200, 400} from ESAG(𝝁, 𝜸), where $\boldsymbol{\mu} = (2, -5, 3, 5)^\top$ and
$\boldsymbol{\gamma} = (r/\sqrt{5})\,\mathbf{1}_5$, so that ‖𝜸‖ = 𝑟, for 𝑟 ∈ {0.2, 0.4}; a higher value of 𝑟 leads to a greater degree of anisotropy. Table 3 provides
the rejection rates of the three considered tests across 300 Monte Carlo replicates in each simulation setting. By exploiting concentration
estimation with and without the assumption of isotropy under the ESAG family, the proposed RoC test achieves higher power than
the two competing methods, which do not assume a particular distribution family as the alternative or full model. This may be partly
because the true data-generating mechanism in this experiment is ESAG. The competing methods have the advantage that their test
statistics follow 𝜒2 distributions asymptotically, so the corresponding 𝑝-values can be obtained easily, without the bootstrap procedure
that RoC requires. Systematic theoretical and empirical comparisons between the RoC test and these tests when the true
data-generating mechanism deviates from ESAG are interesting directions to pursue in follow-up research.
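When comparing empirical rejection rates such as those reported from 300 Monte Carlo replicates, it can help to keep their Monte Carlo noise in view. The Python sketch below attaches the usual binomial standard error, √(𝑝̂(1 − 𝑝̂)/𝑅), and an approximate 95% Wald interval to a rate 𝑝̂ computed from 𝑅 replicates. It is a generic aid for reading such tables, not part of the analysis reported in this article; the function names are hypothetical.

```python
import math


def mc_standard_error(rate, replicates):
    """Binomial Monte Carlo standard error of an empirical rejection rate."""
    return math.sqrt(rate * (1.0 - rate) / replicates)


def mc_interval(rate, replicates, z=1.96):
    """Approximate 95% Wald interval for the true rejection probability."""
    half = z * mc_standard_error(rate, replicates)
    return max(0.0, rate - half), min(1.0, rate + half)


# A test of exact size 0.05 evaluated over 300 replicates:
print(round(mc_standard_error(0.05, 300), 4))  # 0.0126
print(mc_interval(0.05, 300))
```

With 𝑅 = 300, a test of exact size 0.05 would therefore produce empirical rejection rates within roughly ±0.025 of 0.05 about 95% of the time.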
6. Prediction regions
Following the estimation of all model parameters in an ESAG regression model, one can predict the outcome of the directional
response 𝐘. If all model parameters are known, then, analogous to the prediction region for a multivariate Gaussian distribution
(Chew, 1966), a sensible 100(1 − 𝑎)% prediction region that reflects the elliptical symmetry of ESAG(𝝁, 𝜸) is an ellipsoidal ball given by
$$\mathrm{PR}_a = \left\{ \mathbf{y} \in \mathbb{S}^{d-1} : \left(\mathbf{y} - \boldsymbol{\mu}/\|\boldsymbol{\mu}\|\right)^\top \mathbf{V}^{-1} \left(\mathbf{y} - \boldsymbol{\mu}/\|\boldsymbol{\mu}\|\right) \le q_a \right\}, \qquad (5)$$
where 𝑞𝑎 is chosen such that 𝑃(𝐘 ∈ PR𝑎) = 1 − 𝑎. We show in Appendix C of the Supplementary Material that PR𝑎 defined in (5)
has the smallest volume in a class of ellipsoidal prediction regions centered at 𝝁∕‖𝝁‖ with nominal coverage probability
1 − 𝑎.
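When 𝝁 and 𝐕 are known, 𝑞𝑎 in (5) can be approximated by simulation. The Python sketch below is an illustration under stated assumptions rather than the authors' code: it draws from the angular Gaussian construction 𝐘 = 𝐙/‖𝐙‖ with 𝐙 ∼ 𝑁𝑑(𝝁, 𝐕), takes 𝐕 directly as an input assumed to satisfy 𝐕𝝁 = 𝝁 and |𝐕| = 1 (instead of building it from 𝜸), approximates 𝑞𝑎 by a Monte Carlo (1 − 𝑎)-quantile, and checks coverage on fresh draws. The function names are hypothetical.

```python
import numpy as np


def sample_angular_gaussian(mu, v, n, rng):
    """Draw Y = Z / ||Z|| with Z ~ N_d(mu, V), the construction behind ESAG."""
    z = rng.multivariate_normal(mu, v, size=n)
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def region_statistic(y, mu, v):
    """(y - mu/||mu||)^T V^{-1} (y - mu/||mu||) for each row of y."""
    diff = np.atleast_2d(y) - mu / np.linalg.norm(mu)
    # Solve V w = diff^T instead of forming V^{-1} explicitly.
    w = np.linalg.solve(v, diff.T).T
    return np.einsum("ij,ij->i", diff, w)


def oracle_q(mu, v, a, m, rng):
    """Monte Carlo approximation of q_a in (5): the (1 - a)-quantile."""
    stats = region_statistic(sample_angular_gaussian(mu, v, m, rng), mu, v)
    return float(np.quantile(stats, 1.0 - a))


def in_region(y, mu, v, q):
    """Boolean membership of each row of y in PR_a."""
    return region_statistic(y, mu, v) <= q


rng = np.random.default_rng(7)
mu = np.array([0.0, 0.0, 2.0])
v = np.diag([2.0, 0.5, 1.0])  # V mu = mu and det(V) = 1
q90 = oracle_q(mu, v, a=0.10, m=100_000, rng=rng)
fresh = sample_angular_gaussian(mu, v, 50_000, rng)
print(q90, in_region(fresh, mu, v, q90).mean())
```

The empirical coverage printed last should be close to the nominal 0.90.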
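When the model parameters are unknown, we evaluate 𝝁 and 𝐕 in (5) at their MLEs, 𝝁̂ and 𝐕̂, and estimate 𝑞𝑎 by 𝑞̂𝑎, which is
obtained using bootstrap samples from the estimated ESAG distribution. This leads to a 100(1 − 𝑎)% prediction region defined as
$$\widehat{\mathrm{PR}}_a = \left\{ \mathbf{y} \in \mathbb{S}^{d-1} : \left(\mathbf{y} - \hat{\boldsymbol{\mu}}/\|\hat{\boldsymbol{\mu}}\|\right)^\top \hat{\mathbf{V}}^{-1} \left(\mathbf{y} - \hat{\boldsymbol{\mu}}/\|\hat{\boldsymbol{\mu}}\|\right) \le \hat{q}_a \right\}. \qquad (6)$$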
Algorithm 4 below provides the detailed computational steps leading to $\widehat{\mathrm{PR}}_a$ when 𝐗 = 𝐱0. Appendix D of the Supplementary
Material presents a simulation study in which we follow Algorithm 4 to compute prediction regions of different nominal coverage
probabilities based on samples of size 𝑛 ∈ {200, 400, 800}. The simulation results suggest that the empirical coverage probabilities of
the resulting prediction regions closely match the nominal levels.
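The two-stage bootstrap of Algorithm 4 can be sketched in Python as follows. The sketch assumes a user-supplied function `fit(y, x, x0)` (hypothetical) that returns (𝝁̂, 𝐕̂) at 𝐗 = 𝐱0 from the ESAG MLEs; fitting ESAG itself is out of scope here. The `crude_fit` stand-in used in the example is not an ESAG MLE: it only exercises the parametric draws (Steps 2 to 5), the pairs resampling (Steps 9 to 12), and the pooling of 𝑚 × (𝐵 + 1) statistics (Step 13).

```python
import numpy as np


def sample_angular_gaussian(mu, v, n, rng):
    """Draw Y = Z / ||Z|| with Z ~ N_d(mu, V)."""
    z = rng.multivariate_normal(mu, v, size=n)
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def region_statistic(y, mu, v):
    """(y - mu/||mu||)^T V^{-1} (y - mu/||mu||) for each row of y."""
    diff = np.atleast_2d(y) - mu / np.linalg.norm(mu)
    return np.einsum("ij,ij->i", diff, np.linalg.solve(v, diff.T).T)


def parametric_stats(y, x, x0, fit, m, rng):
    """Steps 2-5: fit at x0, simulate m draws, return (q_1..q_m, mu_hat, V_hat)."""
    mu_hat, v_hat = fit(y, x, x0)
    draws = sample_angular_gaussian(mu_hat, v_hat, m, rng)
    return region_statistic(draws, mu_hat, v_hat), mu_hat, v_hat


def algorithm4(y, x, x0, fit, a, m, B, rng):
    """Pooled (1 - a)-quantile q_hat plus the center and shape of the region."""
    q, mu_hat, v_hat = parametric_stats(y, x, x0, fit, m, rng)
    pooled = [q]
    n = len(y)
    for _ in range(B):
        idx = rng.integers(0, n, size=n)  # resample (Y_i, X_i) pairs jointly
        q_b, _, _ = parametric_stats(y[idx], x[idx], x0, fit, m, rng)
        pooled.append(q_b)
    # Step 13: quantile of the m * (B + 1) pooled statistics.
    q_hat = float(np.quantile(np.concatenate(pooled), 1.0 - a))
    return q_hat, mu_hat, v_hat


def crude_fit(y, x, x0):
    """Hypothetical stand-in for the ESAG MLE: ignores x, fixes ||mu|| = 2, V = I."""
    ybar = y.mean(axis=0)
    return 2.0 * ybar / np.linalg.norm(ybar), np.eye(y.shape[1])


rng = np.random.default_rng(3)
mu_true = np.array([0.0, 0.0, 2.0])
y = sample_angular_gaussian(mu_true, np.eye(3), 500, rng)
x = np.zeros((500, 1))
q_hat, mu_hat, v_hat = algorithm4(y, x, np.zeros(1), crude_fit,
                                  a=0.1, m=2000, B=5, rng=rng)
print(q_hat, mu_hat)
```

As in Step 14, the returned 𝝁̂ and 𝐕̂ come from the original data, while 𝑞̂𝑎 also reflects the bootstrap variation of the estimates.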
Algorithm 4 Compute the prediction region in (6).
1: procedure Parametric bootstrap accounting for variation of ESAG
2:   Given the observed data $\{(\mathbf{Y}_i, \mathbf{X}_i)\}_{i=1}^n$, compute the MLEs for the regression coefficients, $\hat{\boldsymbol{\alpha}}_0$, $\hat{\mathbf{A}}_1$, $\hat{\boldsymbol{\beta}}_0$, and $\hat{\mathbf{B}}_1$, assuming an ESAG model for $\mathbf{Y}_i$ conditional on $\mathbf{X}_i$.
3:   Compute $\hat{\boldsymbol{\mu}} = \hat{\boldsymbol{\alpha}}_0 + \hat{\mathbf{A}}_1 \mathbf{x}_0$ and $\hat{\boldsymbol{\gamma}} = \hat{\boldsymbol{\beta}}_0 + \hat{\mathbf{B}}_1 \mathbf{x}_0$, and obtain the corresponding $\hat{\mathbf{V}}$.
4:   Set $m$ = the number of bootstrap samples. Generate a random sample, $\{\mathbf{Y}'_j\}_{j=1}^m$, from $\mathrm{ESAG}(\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\gamma}})$.
5:   Compute $q_j = (\mathbf{Y}'_j - \hat{\boldsymbol{\mu}}/\|\hat{\boldsymbol{\mu}}\|)^\top \hat{\mathbf{V}}^{-1} (\mathbf{Y}'_j - \hat{\boldsymbol{\mu}}/\|\hat{\boldsymbol{\mu}}\|)$, for $j = 1, \ldots, m$.
6: end procedure
7: procedure Nonparametric bootstrap accounting for variation of MLEs
8:   Set $B$ = the number of bootstrap samples.
9:   for $b$ in $1, \ldots, B$ do
10:    Generate the $b$-th bootstrap sample $\{(\mathbf{Y}_i^{(b)}, \mathbf{X}_i^{(b)})\}_{i=1}^n$ via sampling with replacement from the raw data.
11:    Repeat Steps 2–5 using data $\{(\mathbf{Y}_i^{(b)}, \mathbf{X}_i^{(b)})\}_{i=1}^n$. Denote the bootstrap version of $q_j$ as $q_j^{(b)}$.
12:  end for
13:  Viewing $\{q_j, q_j^{(1)}, \ldots, q_j^{(B)}\}_{j=1}^m$ as a sample of size $m \times (B + 1)$, find the $(1 - a)$-quantile of this sample. Denote this sample quantile as $\hat{q}_a$.
14:  Output a $100(1 - a)\%$ prediction region when $\mathbf{X} = \mathbf{x}_0$ given by $\{\mathbf{y} \in \mathbb{S}^{d-1} : (\mathbf{y} - \hat{\boldsymbol{\mu}}/\|\hat{\boldsymbol{\mu}}\|)^\top \hat{\mathbf{V}}^{-1} (\mathbf{y} - \hat{\boldsymbol{\mu}}/\|\hat{\boldsymbol{\mu}}\|) \le \hat{q}_a\}$.
15: end procedure
We now put the regression analysis toolkit into action on data examples from two real-life applications.
Fig. 2. Directional data on 𝕊2 corresponding to each triplet of components from tributaries of Anoia (upper panels) and those from the
lower Llobregat course (lower panels).
We analyzed in a recent work (Yu and Huang, 2024) the relative abundance of two major ions, K+ and Na+, and two minor
ions, Ca2+ and Mg2+, in water samples collected from two sets of locations between the summer of 1997 and the spring of 1999:
67 samples from tributaries of Anoia and 43 samples from tributaries of the lower Llobregat course in Spain (Otero et al., 2005).
The complete data are available in the R package compositions (Van den Boogaart and Tolosana-Delgado, 2008). The relative
abundance of (K+, Na+, Ca2+, Mg2+) is an example of compositional data in a 3-dimensional simplex,
$$\mathbb{C}^3 = \left\{ \mathbf{y}^* \in \mathbb{R}^4 : \mathbf{1}_4^\top \mathbf{y}^* = 1 \text{ and } \mathbf{e}_j^\top \mathbf{y}^* \ge 0 \text{ for } j = 1, \ldots, 4 \right\},$$
where $\mathbf{1}_4$ is the 4 × 1 vector of ones, and $\mathbf{e}_j$ is the unit vector with the 𝑗-th entry being 1. We transformed the compositional
data to directional data on 𝕊3 by taking the element-wise square root of 𝐲∗ ∈ ℂ3; because the components of 𝐲∗ sum to one, the
resulting vector has unit norm. Previous analyses of the directional data from each set of locations suggested an adequate fit of an
intercept-only ESAG model, but a poor fit for the combined data of size 𝑛 = 110 from both sets of locations. Fig. 2 presents the
directional data associated with triplets of components from each set of locations on the unit sphere 𝕊2 in ℝ3. Even though the
directional data to be analyzed lie on 𝕊3, a space that is hard to visualize, the partial data plotted on 𝕊2 in Fig. 2 seem to suggest
that data from tributaries of Anoia are more variable than those from tributaries of the lower Llobregat course. The non-circular
shape of many of the data clouds may also suggest anisotropy of the underlying distribution.
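The square-root map from the simplex ℂ3 to 𝕊3 is easy to state in code. The Python sketch below assumes each row holds nonnegative relative abundances; re-closing each row to sum to one is our own guard against rounding in recorded data, not a step described in the article. The toy abundances are hypothetical numbers for illustration only, not values from the Otero et al. (2005) data.

```python
import numpy as np


def composition_to_sphere(comp):
    """Map compositions (nonnegative rows) to unit vectors by element-wise sqrt."""
    comp = np.asarray(comp, dtype=float)
    if np.any(comp < 0):
        raise ValueError("relative abundances must be nonnegative")
    # Re-close each row to sum to one (guards against rounding in the data).
    comp = comp / comp.sum(axis=-1, keepdims=True)
    return np.sqrt(comp)


def sphere_to_composition(y):
    """Inverse map: squaring each coordinate recovers the composition."""
    return np.asarray(y, dtype=float) ** 2


# Hypothetical relative abundances of (K+, Na+, Ca2+, Mg2+), for illustration only.
toy = np.array([[0.05, 0.35, 0.40, 0.20],
                [0.10, 0.30, 0.45, 0.15]])
y = composition_to_sphere(toy)
print(np.linalg.norm(y, axis=1))
```

Because every coordinate is nonnegative, the transformed data occupy only the nonnegative orthant of 𝕊3.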
These earlier findings and the data visualization motivate a location-dependent ESAG model for all data from these locations, in
which we incorporate a covariate 𝑋 indicating location, with 𝑋 = 0 corresponding to tributaries of Anoia (At) and 𝑋 = 1 representing
tributaries of the lower Llobregat course (LLt). Fitting the regression model $\mathbf{Y}_i \mid X_i \sim \mathrm{ESAG}(\boldsymbol{\mu}_i = \boldsymbol{\alpha}_0 + \boldsymbol{\alpha}_1 X_i,\ \boldsymbol{\gamma}_i = \boldsymbol{\beta}_0 + \boldsymbol{\beta}_1 X_i)$ to the
data, we arrive at the following estimates for the ESAG model parameters:
$$\hat{\boldsymbol{\mu}}_i = \begin{bmatrix} 1.99 \\ 5.74 \\ 7.95 \\ 4.59 \end{bmatrix} + \begin{bmatrix} 1.28 \\ 2.83 \\ 1.06 \\ 1.20 \end{bmatrix} X_i, \qquad \hat{\boldsymbol{\gamma}}_i = \begin{bmatrix} -0.67 \\ 0.15 \\ -0.82 \\ 6.12 \\ 0.64 \end{bmatrix} + \begin{bmatrix} 2.43 \\ -0.22 \\ 10.17 \\ -20.19 \\ 0.47 \end{bmatrix} X_i.$$
Hence, for the directional response associated with tributaries of Anoia, the mean direction is estimated to be
$\hat{\boldsymbol{\mu}}_{\mathrm{At}} = (1.99, 5.74, 7.95, 4.59)^\top$, and, for the directional response from tributaries of the lower Llobregat course, the estimated mean
direction is $\hat{\boldsymbol{\mu}}_{\mathrm{LLt}} = (3.27, 8.57, 9.01, 5.79)^\top$. The resulting estimated concentrations, $\|\hat{\boldsymbol{\mu}}_{\mathrm{At}}\|$ and $\|\hat{\boldsymbol{\mu}}_{\mathrm{LLt}}\|$, provide data
evidence that the latter set of locations exhibits a higher concentration than the former. These findings are consistent with our
previous analysis, in which we analyzed data from one set of locations at a time, and also with the visual impression from Fig. 2.
Estimates for 𝜸𝑖 when 𝑋𝑖 = 0 and 1 are also aligned with our earlier analyses (and are omitted here); based on them, estimates of 𝐕
for the two sets of locations, 𝐕̂At and 𝐕̂LLt, can be obtained.
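The concentration comparison can be checked directly from the point estimates displayed above. The short Python computation below uses only those reported numbers; it gives ‖𝝁̂At‖ ≈ 11.01 and ‖𝝁̂LLt‖ ≈ 14.10. It also reports the angle between the two unit mean directions, a summary of orientation that is our own addition rather than a quantity used in the article.

```python
import numpy as np

# Reported estimates: mu_i = alpha_0 + alpha_1 X_i, with X = 0 (At) and X = 1 (LLt).
alpha0 = np.array([1.99, 5.74, 7.95, 4.59])
alpha1 = np.array([1.28, 2.83, 1.06, 1.20])
mu_at, mu_llt = alpha0, alpha0 + alpha1

conc_at = np.linalg.norm(mu_at)    # estimated concentration ||mu_At||
conc_llt = np.linalg.norm(mu_llt)  # estimated concentration ||mu_LLt||
dir_at, dir_llt = mu_at / conc_at, mu_llt / conc_llt
angle_deg = np.degrees(np.arccos(np.clip(dir_at @ dir_llt, -1.0, 1.0)))

print(f"||mu_At|| = {conc_at:.2f}, ||mu_LLt|| = {conc_llt:.2f}")  # 11.01, 14.10
print(f"angle between unit mean directions: {angle_deg:.1f} degrees")
```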
For model diagnostics, we carry out tests for isotropy and for covariate dependence of 𝝁 and 𝜸 based on the three proposed test
statistics. All tests indicate statistically significant evidence that the ESAG distribution of the (transformed) compositions of
(K+, Na+, Ca2+, Mg2+) is anisotropic and has location-dependent model parameters. All estimated 𝑝-values are less than $10^{-3}$, except
for that associated with 𝑀 when testing covariate dependence of 𝜸, which is less than 0.01 (although larger than $10^{-3}$). This is
consistent with findings in the existing literature that the hydrochemical profiles of Anoia and of the lower Llobregat course are
substantially different, because the two sets of tributaries pass through zones that are populated differently, with vastly different
distributions of agricultural and industrial areas (González et al., 2012). Zooming in on the tests for covariate dependence of 𝝁,
we have 𝐷 = 1.062, which is slightly higher than RoC = 1.059. This may be data evidence that not only the norm of the mean direction
depends on 𝑋 (that is, the concentration varies across locations), but also the orientation of the mean direction differs between
locations. When testing covariate dependence of 𝜸, i.e., testing $H_0^{(\boldsymbol{\gamma})}$ versus the full model, the two statistics are nearly equal
(at around 1.090). This suggests that, once we acknowledge a location-dependent 𝝁 in the null model, allowing 𝜸 to depend on 𝑋 in
the alternative model mostly helps to distinguish the variability of data across locations, but may not contribute to capturing the
discrepancy in the orientation of 𝝁 across locations.
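Lastly, applying Algorithm 4 with $x_0 = 0$ and 1, we obtain the prediction regions for the two sets of locations given by
$$\widehat{\mathrm{PR}}_a^{(\mathrm{At})} = \left\{ \mathbf{y} \in \mathbb{S}^3 : \left(\mathbf{y} - \hat{\boldsymbol{\mu}}_{\mathrm{At}}/\|\hat{\boldsymbol{\mu}}_{\mathrm{At}}\|\right)^\top \hat{\mathbf{V}}_{\mathrm{At}}^{-1} \left(\mathbf{y} - \hat{\boldsymbol{\mu}}_{\mathrm{At}}/\|\hat{\boldsymbol{\mu}}_{\mathrm{At}}\|\right) \le \hat{q}_a^{(\mathrm{At})} \right\},$$
$$\widehat{\mathrm{PR}}_a^{(\mathrm{LLt})} = \left\{ \mathbf{y} \in \mathbb{S}^3 : \left(\mathbf{y} - \hat{\boldsymbol{\mu}}_{\mathrm{LLt}}/\|\hat{\boldsymbol{\mu}}_{\mathrm{LLt}}\|\right)^\top \hat{\mathbf{V}}_{\mathrm{LLt}}^{-1} \left(\mathbf{y} - \hat{\boldsymbol{\mu}}_{\mathrm{LLt}}/\|\hat{\boldsymbol{\mu}}_{\mathrm{LLt}}\|\right) \le \hat{q}_a^{(\mathrm{LLt})} \right\},$$
with $\hat{q}_a^{(\mathrm{At})} = 0.029, 0.036, 0.050$ and $\hat{q}_a^{(\mathrm{LLt})} = 0.018, 0.023, 0.031$ for the nominal levels of 70%, 80%, and 90%, respectively. At each
considered nominal level, having $\hat{q}_a^{(\mathrm{LLt})} < \hat{q}_a^{(\mathrm{At})}$ is in line with the finding that the distribution of directional data from the lower
Llobregat course exhibits a higher concentration (i.e., lower variability) than that for Anoia.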
We now turn to a dataset on the gut microbiota of elderly adults. Besides the gut microbiome compositions of 160 elderly adults,
the data also record residence type, age, body mass index (BMI), diet, and gender. A similar dataset was analyzed by Claesson
et al. (2012), who carried out a principal component analysis to study correlations among the relative abundances of various
microorganisms in the gut. Shen et al. (2022) used a Gaussian chain graph model for the data to infer the effects of diet and
residence type on gut microbiome composition. For illustration, we study the potential association between two covariates, age
and BMI, and the directional response on 𝕊3 defined as the square root of the relative abundance of four genera of bacteria found
in the gut: Blautia, Caloramator, Clostridium, and Faecalibacterium.
We first fit the ESAG regression model to the directional response data, for 𝑖 = 1, … , 160,
Fig. 4. Estimates of ‖𝝁‖, 𝜆1 , 𝜆2 , and 𝜆3 versus BMI when one is 70 (solid lines), 80 (dashed lines), and 90 (dotted lines) years of age.
ESAG model. This test applied to the current dataset yields an estimated 𝑝-value of 0.58, suggesting insufficient evidence of
lack of fit of the current model. In addition, the tests for isotropy based on RoC and 𝑀, and the tests for covariate dependence of
ESAG model parameters based on RoC and 𝐷, all produce estimated 𝑝-values less than 0.01. We thus conclude that the covariates
have significant effects on the ESAG model parameters and recommend against opting for a regression model more parsimonious
than (7). To further demonstrate the versatility and informativeness of RoC and 𝐷, we test the significance of age in modeling 𝝁,
given that 𝜸 depends on both covariates as in (7) and that 𝝁 depends on BMI. These tests yield RoC = 1.068 and 𝐷 = 1.070, with
estimated 𝑝-values less than 0.001. These results serve as data evidence of the significance of age in modeling (the BMI-dependent)
𝝁, and suggest that age potentially impacts the direction of 𝝁 (besides its norm) significantly. These findings on the significance
of the covariate effects and overall goodness-of-fit remain unchanged when we use normalized covariate data with mean zero and
variance one.
To further elucidate the effects of age and BMI on ESAG model features, we present in Fig. 4 estimates of the concentration and
the three eigenvalues of 𝐕, (𝜆1, 𝜆2, 𝜆3), versus BMI when one is 70, 80, and 90 years of age. As age increases, we observe in Fig. 4 a
decrease in the estimate of ‖𝝁‖, corresponding to an increase in the estimated overall variation of 𝐘. A highly variable directional
distribution can imply high variability in the composition of the gut microbiota among the elderly. This finding has been reported
in the existing literature, but mostly in comparison with younger (under 65) healthy adults, who are found to have a more stable
composition of intestinal microorganisms (Claesson et al., 2012). Our results suggest that, even among the elderly, the trend of
higher variability in microbiome composition with age persists. Moreover, a higher BMI also leads to a more variable distribution.
Examining the estimated eigenvalues of 𝐕, one can see two change points in BMI: one at a BMI of nearly 25 for an 80-year-old and
the other at a BMI of around 35 for a 90-year-old. The first change point separates healthy weight (BMI ∈ (18.5, 24.9)) from
overweight (BMI ∈ (25.0, 29.9)); the second change point falls in the obese range (https://www.cdc.gov/healthyweight/assessing).
Because 𝜆1 = 𝜆2 = 𝜆3 = 1 (= 𝜆4) implies 𝐕 = 𝐈4, the proximity of the three considered eigenvalues to 1 implies near isotropy of the
directional distribution and also relates to weak correlations among the four genera of bacteria. The aforementioned change points
are where the estimates of these eigenvalues are closest to 1, and thus the distribution of 𝐘 tends to be more isotropic when
𝐗 = (Age = 80, BMI ≈ 25) and (Age = 90, BMI ≈ 35). This may also imply weaker correlations among the relative abundances of the
four considered genera of bacteria at these change points.
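The link between eigenvalues near 1 and near isotropy can be checked numerically. Under the ESAG constraints 𝐕𝝁 = 𝝁 and |𝐕| = 1, the eigenvalue along 𝝁 equals 1, so the informative eigenvalues are those of 𝐕 restricted to the orthogonal complement of 𝝁, and their product is 1. The Python sketch below, with hypothetical helper names, extracts them; the max |log 𝜆𝑘| gauge is an illustrative summary of our own, not a quantity used in the article, and it is zero exactly when 𝐕 = 𝐈.

```python
import numpy as np


def complement_eigenvalues(v, mu):
    """Eigenvalues of V on the orthogonal complement of mu, largest first.

    Assumes the ESAG constraints V mu = mu and det(V) = 1, so these are the
    d - 1 eigenvalues other than the unit eigenvalue along mu.
    """
    theta = np.asarray(mu, dtype=float) / np.linalg.norm(mu)
    _, _, vt = np.linalg.svd(theta.reshape(1, -1))
    g = vt[1:].T  # orthonormal basis of the complement of mu
    return np.sort(np.linalg.eigvalsh(g.T @ v @ g))[::-1]


def log_eigen_spread(lams):
    """Illustrative anisotropy gauge: max |log lambda_k|, zero iff V = I."""
    return float(np.max(np.abs(np.log(lams))))


# Build a valid V for d = 4: eigenvalue 1 along mu and (2, 1, 0.5) elsewhere.
mu = np.array([1.0, 2.0, 2.0, 0.0])
basis, _ = np.linalg.qr(np.column_stack([mu, np.eye(4)[:, 1:]]))
v = basis @ np.diag([1.0, 2.0, 1.0, 0.5]) @ basis.T
lams = complement_eigenvalues(v, mu)
print(lams, np.prod(lams), log_eigen_spread(lams))
```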
8. Discussion
We develop in this study a complete package for regression analysis of a directional response, built upon the ESAG distribution
family indexed by constraint-free parameters. We consider a full range of statistical inference problems, including parameter
estimation, testing hypotheses on model features, and prediction. The uncertainty of parameter estimation can be assessed via
bootstrap. The parametric bootstrap is also heavily involved in all proposed inference procedures; it is straightforward to implement
owing to the formulation and parametrization of ESAG, which allow for easy data generation from an ESAG distribution. Computer
programs for implementing all proposed methods are available at https://github.com/Zehaoyu217/ESAG/blob/main/ESAG_Project2.R.
We also demonstrate the use of this package for analyzing two datasets from different fields of application.
The number of parameters in an ESAG regression model can be large in an application, since the dimension of the parameter space
grows quadratically in the dimension of 𝐘, 𝑑, and linearly in the number of covariates, 𝑞. For example, in microbiome analysis,
𝑑 is the dimension of the compositional response, which is typically much larger than four, and one may wish to consider many
covariates relating to the host’s physiological characteristics. We have started developing penalized likelihood-based methods to
deal with high-dimensional directional data. Besides this ongoing follow-up research, another interesting topic is compositional
data analysis, to which the two case studies in Section 7 relate. The idea of relating compositional data on a simplex to directional
data on a hypersphere has been explored (Scealy and Welsh, 2011, 2017; Li et al., 2023), but many open questions remain to be
addressed. In this particular context, more components tend to have zero or nearly zero relative abundance as 𝑑 increases, a data
pattern that ESAG and most existing named directional distributions tend to fit poorly. Interpretations and implications of the
parameters of a directional distribution that are practically meaningful for the corresponding compositional data also demand
further systematic investigation.
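A rough count illustrates this growth. Assuming, as in the hydrology example, that 𝝁 and 𝜸 are both linear in the 𝑞 covariates with free intercepts, and that 𝜸 has (𝑑 − 2)(𝑑 + 1)/2 entries (five when 𝑑 = 4, matching the dimension of 𝜸̂𝑖 in that example), the Python sketch below counts the free regression coefficients. With 𝑑 = 4 and 𝑞 = 1 it returns 18, the number of coefficients estimated for the hydrology data; the function name is hypothetical.

```python
def esag_regression_param_count(d, q):
    """Free parameters when mu = alpha_0 + A_1 x and gamma = beta_0 + B_1 x.

    Assumes mu has d entries and gamma has (d - 2)(d + 1) / 2 entries for
    ESAG on S^{d-1}, and that each of the q covariates enters both linearly.
    """
    per_block = d + (d - 2) * (d + 1) // 2  # entries of (mu, gamma)
    return (q + 1) * per_block              # intercept plus q slopes


print(esag_regression_param_count(4, 1))    # 18
print(esag_regression_param_count(50, 10))  # 14014
```

The 𝛾 block dominates for large 𝑑, which is the quadratic growth noted above.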
Acknowledgements
We are grateful to the Associate Editor and two anonymous reviewers for their constructive comments and suggestions which
helped to greatly improve our article.
References
Banerjee, A., Dhillon, I.S., Ghosh, J., Sra, S., Ridgeway, G., 2005. Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res. 6.
Chernoff, H., 1954. On the distribution of the likelihood ratio. Ann. Math. Stat. 25, 573–578.
Chew, V., 1966. Confidence, prediction, and tolerance regions for the multivariate normal distribution. J. Am. Stat. Assoc. 61, 605–617.
Claesson, M.J., Jeffery, I.B., Conde, S., Power, S.E., O’Connor, E.M., Cusack, S., Harris, H., Coakley, M., Lakshminarayanan, B., O’Sullivan, O., et al., 2012. Gut microbiota
composition correlates with diet and health in the elderly. Nature 488, 178–184.
Drton, M., 2009. Likelihood ratio tests and singularities. Ann. Stat. 37, 979–1012.
Drton, M., Williams, B., 2011. Quantifying the failure of bootstrap likelihood ratio tests. Biometrika 98, 919–934.
Ennajari, H., Bouguila, N., Bentahar, J., 2021. Combining knowledge graph and word embeddings for spherical topic modeling. IEEE Trans. Neural Netw. Learn.
Syst. 34, 3609–3623.
García-Portugués, E., Paindaveine, D., Verdebout, T., 2020. On optimal tests for rotational symmetry against new classes of hyperspherical distributions. J. Am. Stat.
Assoc. 115, 1873–1887.
García-Portugués, E., Paindaveine, D., Verdebout, T., 2023. rotasym: Tests for Rotational Symmetry on the Hypersphere. R package version 1.1.5.
https://CRAN.R-project.org/package=rotasym.
González, S., López-Roldán, R., Cortina, J.L., 2012. Presence and biological effects of emerging contaminants in Llobregat River basin: a review. Environ. Pollut. 161,
83–92.
Johnson, R.A., Wehrly, T.E., 1978. Some angular-linear distributions and related regression models. J. Am. Stat. Assoc. 73, 602–606.
Jupp, P., 1988. Residuals for directional data. J. Appl. Stat. 15, 137–147.
Li, B., Yoon, C., Ahn, J., 2023. Reproducing kernels and new approaches in compositional data analysis. J. Mach. Learn. Res. 24, 1–34.
Lund, U., 1999. Least circular distance regression for directional data. J. Appl. Stat. 26, 723–733.
Mitchell, J.D., Allman, E.S., Rhodes, J.A., 2019. Hypothesis testing near singularities and boundaries. Electron. J. Stat. 13, 2150.
Otero, N., Tolosana-Delgado, R., Soler, A., Pawlowsky-Glahn, V., Canals, A., 2005. Relative vs. absolute statistical analysis of compositions: a comparative study of
surface waters of a Mediterranean river. Water Res. 39, 1404–1414.
Paine, P.J., Preston, S.P., Tsagris, M., Wood, A.T., 2018. An elliptically symmetric angular Gaussian distribution. Stat. Comput. 28, 689–697.
Paine, P.J., Preston, S., Tsagris, M., Wood, A.T., 2020. Spherical regression models with general covariates and anisotropic errors. Stat. Comput. 30, 153–165.
Presnell, B., Morrison, S.P., Littell, R.C., 1998. Projected multivariate linear models for directional data. J. Am. Stat. Assoc. 93, 1068–1077.
Ryali, S., Chen, T., Supekar, K., Menon, V., 2013. A parcellation scheme based on von Mises-Fisher distributions and Markov random fields for segmenting brain regions
using resting-state fMRI. NeuroImage 65, 83–96.
Scealy, J., Welsh, A., 2011. Regression for compositional data by using distributions defined on the hypersphere. J. R. Stat. Soc., Ser. B, Stat. Methodol. 73, 351–375.
Scealy, J., Welsh, A., 2017. A directional mixed effects model for compositional expenditure data. J. Am. Stat. Assoc. 112, 24–36.
Scealy, J., Wood, A.T., 2019. Scaled von Mises–Fisher distributions and regression models for paleomagnetic directional data. J. Am. Stat. Assoc.
Shen, Y., Solís-Lemus, C., Deshpande, S.K., 2022. Sparse Gaussian chain graphs with the spike-and-slab LASSO: Algorithms and asymptotics. arXiv preprint.
arXiv:2207.07020.
Van den Boogaart, K.G., Tolosana-Delgado, R., 2008. “Compositions”: a unified R package to analyze compositional data. Comput. Geosci. 34, 320–338.
Wang, F., Gelfand, A.E., 2013. Directional data analysis under the general projected normal distribution. Stat. Methodol. 10, 113–127.
White, H., 1982. Maximum likelihood estimation of misspecified models. Econometrica 50, 1–25.
Yu, Z., Huang, X., 2024. A new parameterization for elliptically symmetric angular Gaussian distributions of arbitrary dimension. Electron. J. Stat. 18, 301–334.