
Computational Statistics and Data Analysis: Tonglin Zhang, Ge Lin


Computational Statistics and Data Analysis 95 (2016) 83–94

Contents lists available at ScienceDirect

Computational Statistics and Data Analysis


journal homepage: www.elsevier.com/locate/csda

On Moran’s I coefficient under heterogeneity


Tonglin Zhang a,∗, Ge Lin b
a Department of Statistics, Purdue University, 250 North University Street, West Lafayette, IN 47907-2067, United States
b Department of Environmental and Occupational Health, University of Nevada, Las Vegas, NV 89154, United States

article info

Article history:
Received 25 December 2014
Received in revised form 28 July 2015
Accepted 24 September 2015
Available online 9 October 2015

Keywords:
Clustering or clusters
Heterogeneity
Martingale central limit theorem
Moran’s I coefficient
Permutation test
Spatial autocorrelation

abstract

Moran’s I is the most popular spatial test statistic, but its inability to incorporate heterogeneous populations has been
long recognized. This article provides a limiting distribution of the Moran’s I coefficient which can be applied to
heterogeneous populations. The method provides a unified framework of testing for spatial autocorrelation for both
homogeneous and heterogeneous populations, thereby resolving a long standing issue for Moran’s I. For Poisson count data,
a variance adjustment method is provided that solely depends on populations at risk. Simulation results are shown to be
consistent with theoretical results. The application of Nebraska breast cancer data shows that the variance adjustment
method is simple and effective in reducing type I error rates, which in turn will likely reduce potential misallocation
of limited resources.
© 2015 Elsevier B.V. All rights reserved.

1. Introduction

In the past 20 years, improved GIS and computer technologies have led to a rapid expansion of statistical methods
for the analysis of spatial data. The concept and the usage of computationally intensive methods, such as Markov random
field methods (Green and Richardson, 2002), geostatistical methods (Gneiting, 2002; Kelsall and Wakefield, 2002; Stein,
2005), and Bayesian disease mapping methods (Waller et al., 1997; Wakefield and Morris, 2001) have developed rapidly. A
growing number of social and health scientists have taken up the use of the sophisticated technology and new methodologies
of spatial analysis in their empirical work (Best et al., 2000; Goodchild et al., 2000; Pickle et al., 2005). Moran’s I is the
most widely used test statistic in spatial statistical literature, and it has been included in major commercial geographic
information systems (e.g., ArcGIS, MapInfo, Intergraph, Imagine), spatial analysis packages (e.g., CrimeStat, GeoDa,
TerraSeer), and some statistical packages (e.g., MatLab, R, S+, SAS).
Moran’s I coefficient (Moran, 1948) is defined as
$$I = \frac{\sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}(X_i - \bar{X})(X_j - \bar{X})}{S_{0m}\, b_{2m}}, \qquad (1)$$
where $X_i$ is the variable of interest in region $i$ ($i = 1, \ldots, m$), $\bar{X} = \sum_{i=1}^{m} X_i/m$,
$S_{0m} = \sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}$, $b_{2m} = \sum_{i=1}^{m}(X_i - \bar{X})^2/m$,
and $w_{ij}$ with $w_{ii} = 0$ is an element of the spatial weight matrix. Moran’s I mostly ranges between −1 and 1, but it can be
outside of [−1, 1] in extreme cases (Arbia, 2014, P. 2). The absolute value of I is bounded by the square root of the ratio
between the variance of spatially lagged value and the variance of observed values (Arbia, 1989). It has also been shown

∗ Corresponding author.
E-mail addresses: [email protected] (T. Zhang), [email protected] (G. Lin).

http://dx.doi.org/10.1016/j.csda.2015.09.010
0167-9473/© 2015 Elsevier B.V. All rights reserved.

that the value of Moran’s I varies between the least and the largest eigenvalues of the weight matrix (Griffith, 1988). These
conclusions can provide a way to understand the possible ranges for the values of Moran’s I. In applications, a significant
positive autocorrelation indicates the existence of either high-value or low-value clustering, while a negative autocorrelation
indicates a tendency toward the juxtaposition of high values next to low values. If there is no spatial dependence, the
expected value of I is equal to −1/(m − 1), which is close to 0 if m is large.
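
As a minimal sketch of Eq. (1), the following Python function computes Moran’s I directly from a vector of observations and a weight matrix. The function name `morans_i`, the four-region line graph and the two data vectors are hypothetical illustrations and are not taken from the paper.

```python
import numpy as np

def morans_i(x, w):
    """Moran's I as in Eq. (1): sum_ij w_ij (x_i - xbar)(x_j - xbar) / (S0 * b2)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()          # centered values X_i - X-bar
    s0 = w.sum()              # S_0m, the sum of all weights
    b2 = np.mean(z ** 2)      # b_2m, the second sample central moment
    return float(z @ w @ z) / (s0 * b2)

# Hypothetical toy data: four regions on a line, neighbors share an edge.
w_line = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
x_trend = np.array([1.0, 2.0, 3.0, 4.0])    # smooth trend
x_alt = np.array([1.0, -1.0, 1.0, -1.0])    # alternating high and low values

print(morans_i(x_trend, w_line))   # positive: similar values are adjacent
print(morans_i(x_alt, w_line))     # negative: high values sit next to low values
```

The smooth trend gives a positive coefficient and the alternating pattern gives a negative one, matching the interpretation of positive and negative autocorrelation described above.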
The null distribution of Moran’s I is derived from the assumption that distributions of $X_i$ are homogeneous. The p-value
of Moran’s I is based on its z-score defined by $Z(I) = [I - E(I)]/\sqrt{V(I)}$, where $E(I)$ and $V(I)$ are the mean and variance
respectively. There are two ways to define the null hypothesis. The first assumes that X1 , . . . , Xm are independently and
identically distributed (i.i.d.) and the second assumes that X1 , . . . , Xm are obtained from a random permutation of observed
values. The validity of the asymptotic N (0, 1) for Z (I ) in the i.i.d. case is evident from Sen (1976), but this assumption
may not be valid for variables based on rate when population sizes among area units vary substantially (Besag and Newell,
1991). Although several alternative methods have been proposed (Assuncao and Reis, 1999; Oden, 1995; Waldhor, 1996;
Whittemore et al., 1987), for reasons listed below, many still use Moran’s I and ignore the potential problems associated
with heterogeneous populations. Heterogeneity continues to be the central problem for epidemic models and data (Millison
et al., 1994; Zhang and Lin, 2009).
First, all the alternative methods tend to introduce a new test statistic based on regional counts and populations at
risk, and none of them has received wide acceptance. For instance, the population weighted Moran’s I proposed by Oden
(1995) is essentially a spatial $\chi^2$ statistic, which is not always effective in accounting for heterogeneous populations
(Assuncao and Reis, 1999). Second, some data (e.g., confidential and historical) are only available as rates, for which most
alternative methods cannot be used. Third, it is not clear in which situations the population heterogeneity problem becomes
serious enough that Moran’s I should not be used. Finally, the proliferation of Moran’s I in various software packages makes
it a candidate for potential misuse, as one might simply be unaware of the problem.
In this paper, we attempt to resolve the population heterogeneity issue by providing a unified statistical framework
through the limiting distribution of the Moran’s I coefficient. Since heterogeneous populations could cause variance inflation,
a large sample distribution of Moran’s I under heterogeneity should be able to gauge and adjust variance inflation or deflation
for Z (I ). In the following sections, we first derive a limiting distribution of Moran’s I, and then demonstrate its use with two
numerical examples in Section 3 and a case study in Section 4. Finally, we offer some concluding remarks.

2. Notation and main result

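Notation. Assume a study area partitioned into m regions. Let $X_i$ be the variable of interest from the ith region, and
suppose that $X_1, \ldots, X_m$ are independent. Let $\mu_i = E(X_i)$, $\sigma_i^2 = V(X_i)$, $\kappa_i = E[(X_i - \mu_i)^4]$,
$\bar{\mu} = \sum_{i=1}^{m} \mu_i/m$ and $\bar{\sigma}^2 = \sum_{i=1}^{m} \sigma_i^2/m$. Let $X_{im} = (X_i - \bar{\mu})/\bar{\sigma}$,
$\bar{X} = \sum_{i=1}^{m} X_i/m$, $\bar{X}_{\cdot m} = \sum_{i=1}^{m} X_{im}/m$, $\mu_{im} = E(X_{im}) = (\mu_i - \bar{\mu})/\bar{\sigma}$,
$\sigma_{im}^2 = V(X_{im}) = \sigma_i^2/\bar{\sigma}^2$ and $\kappa_{im} = E[(X_{im} - \mu_{im})^4] = \kappa_i/\bar{\sigma}^4$, $i = 1, \ldots, m$.
Then $\sum_{i=1}^{m} \mu_{im} = 0$ and $\sum_{i=1}^{m} \sigma_{im}^2/m = 1$. For a positive integer k, we write
$$b_{km} = \frac{1}{m}\sum_{i=1}^{m} (X_i - \bar{X})^k, \quad \tilde{b}_{km} = \frac{1}{m}\sum_{i=1}^{m} (X_{im} - \bar{X}_{\cdot m})^k, \quad \eta_{km} = \frac{1}{m}\sum_{i=1}^{m} (\mu_i - \bar{\mu})^k, \quad \tilde{\eta}_{km} = \frac{1}{m}\sum_{i=1}^{m} \mu_{im}^k.$$
Then $\tilde{b}_{km} = b_{km}/\bar{\sigma}^k$ and $\tilde{\eta}_{km} = \eta_{km}/\bar{\sigma}^k$. We write
$$w_{i\cdot} = \frac{1}{m}\sum_{j=1}^{m} w_{ij}, \quad w_{\cdot i} = \frac{1}{m}\sum_{j=1}^{m} w_{ji}, \quad S_{0m} = \sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}, \quad S_{1m} = \sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}^2, \quad S_{2m} = m^2 \sum_{i=1}^{m} w_{i\cdot}^2,$$
$\psi_m = S_{0m}/m$ and $\omega_m^2 = 2S_{1m}/m$. We denote $\tilde{\psi}_m = \sum_{i=1}^{m}\sum_{j=1}^{m} |w_{ij}|/m$,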
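$$\theta_m = \frac{\sqrt{m}\sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}(\mu_i - \bar{\mu})(\mu_j - \bar{\mu})}{\sum_{i=1}^{m} \sigma_i^2} = \frac{1}{\sqrt{m}}\sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}\mu_{im}\mu_{jm} \qquad (2)$$
and
$$\tau_m^2 = \frac{2m\sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}^2\sigma_i^2\sigma_j^2}{\left(\sum_{i=1}^{m}\sigma_i^2\right)^2} + \frac{4m\sum_{k=1}^{m}\sigma_k^2\left[\sum_{j=1}^{m}(w_{kj} + w_{\cdot j})(\mu_j - \bar{\mu})\right]^2}{\left(\sum_{i=1}^{m}\sigma_i^2\right)^2}$$
$$\phantom{\tau_m^2} = \frac{2}{m}\sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}^2\sigma_{im}^2\sigma_{jm}^2 + \frac{4}{m}\sum_{k=1}^{m}\sigma_{km}^2\left[\sum_{j=1}^{m}(w_{kj} + w_{\cdot j})\mu_{jm}\right]^2. \qquad (3)$$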

Throughout the paper, we assume that the fourth moment of $X_i$ exists for all $i \le m$. We write $\xrightarrow{P}$ for convergence in
probability and $\xrightarrow{L}$ for convergence in law (or in distribution).
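
The quantities in Eqs. (2) and (3) can be evaluated numerically once the means, variances and weights are specified. The sketch below follows the two equations as printed; the function name `theta_tau2`, the six-region ring of neighbors and the example means are hypothetical and only serve to illustrate the computation.

```python
import numpy as np

def theta_tau2(mu, sigma2, w):
    """Return (theta_m, tau_m^2) following Eqs. (2) and (3) as printed."""
    mu = np.asarray(mu, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    w = np.asarray(w, dtype=float)
    m = mu.size
    sbar2 = sigma2.mean()                       # sigma-bar squared
    mu_s = (mu - mu.mean()) / np.sqrt(sbar2)    # standardized means mu_im
    s2_s = sigma2 / sbar2                       # standardized variances sigma_im^2
    theta = float(mu_s @ w @ mu_s) / np.sqrt(m)            # Eq. (2)
    w_col = w.sum(axis=0) / m                   # w_{.j} = sum_i w_ij / m
    quad = 2.0 / m * float(s2_s @ (w ** 2) @ s2_s)         # first term of Eq. (3)
    inner = (w + w_col[None, :]) @ mu_s         # sum_j (w_kj + w_.j) mu_jm for each k
    lin = 4.0 / m * float(np.sum(s2_s * inner ** 2))       # second term of Eq. (3)
    return theta, quad + lin

# Hypothetical example: six regions arranged in a ring, each with two neighbors.
m = 6
w_ring = np.zeros((m, m))
for i in range(m):
    w_ring[i, (i + 1) % m] = 1.0
    w_ring[i, (i - 1) % m] = 1.0

# Equal means and variances: theta_m vanishes and only the quadratic term remains.
print(theta_tau2(np.zeros(m), np.ones(m), w_ring))
# A block of three elevated means produces a positive theta_m.
print(theta_tau2([1, 1, 1, 0, 0, 0], np.ones(m), w_ring))
```

When all means are equal, the second term of Eq. (3) is zero, which is the situation exploited later in Corollary 3.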
Main Result. We impose the following regularity conditions for our main result:
(C1) For any i and j, $w_{ii} = 0$.
(C2) For any fixed $i \le m$, there is a constant C such that $\sum_{j=1}^{m} |w_{ij}| \le C$.
(C3) There are constants $c_1$ and $c_2$ such that $c_1 < \tilde{\psi}_m, \tau_m^2, \omega_m^2 < c_2$ for all m.
(C4) There exist constants C and $\alpha < 1/2$ such that $\max_{i \le m} (\mu_i - \bar{\mu})^2/\bar{\sigma}^2 \le Cm^{\alpha}$ (or equivalently
$\max_{i \le m} \mu_{im}^2 \le Cm^{\alpha}$) and $\max_{i \le m} \kappa_i^{1/2}/\bar{\sigma}^2 \le Cm^{\alpha}$ (or equivalently $\max_{i \le m} \kappa_{im}^{1/2} \le Cm^{\alpha}$).
(C5) The limits $\theta = \lim_{m \to \infty} \theta_m$ and $\eta = \lim_{m \to \infty} \tilde{\eta}_{2m}$ exist. The limits $\tau^2 = \lim_{m \to \infty} \tau_m^2$,
$\psi = \lim_{m \to \infty} \psi_m$ and $\omega^2 = \lim_{m \to \infty} \omega_m^2$ exist and are positive.
Conditions (C1) and (C2) are imposed by Sen (1976) when he proved the asymptotic normality of Moran’s I for the i.i.d.
case. Condition (C4) is imposed for the asymptotic normality of $\sqrt{m}\tilde{I}_m$, which is weaker than the i.i.d. case. In particular,
Condition (C4) includes both the case when the expected values of $X_i$ are the same and the case when the variances of $X_i$
are different. If $\mu_1, \ldots, \mu_m$ are distinct but $\sigma_1^2, \ldots, \sigma_m^2$ are the same, then Moran’s I is often used to test the presence of
spatial clustering. If $\sigma_1^2, \ldots, \sigma_m^2$ are distinct, then the type I error probability of Moran’s I for spatial clustering may not be
close to the significance level (Walter, 1992). Therefore, the asymptotic property of Moran’s I under distinct expected values
and distinct variances is important. Although we need higher-order moment conditions in Condition (C4), since in practice
Moran’s I is mostly applied to Gaussian or Poisson data, the issue of heavy-tailed distributions is not critical. Otherwise, we
should revise Condition (C4) to incorporate such a case. Condition (C5) ensures that the asymptotic distribution of $\sqrt{m}\tilde{I}_m$
has an explicit asymptotic mean and variance based on the independent but not i.i.d. sample. Condition (C3) is a weaker
version of Condition (C5).
The main result for the asymptotic normality of the Moran’s I coefficient is obtained by deriving the asymptotic
distribution of its numerator. The key is to derive the asymptotic normality of $\sqrt{m}\tilde{I}_m$ as $m \to \infty$, where
$$\tilde{I}_m = \frac{1}{\sum_{i=1}^{m}\sigma_i^2}\sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}(X_i - \bar{X})(X_j - \bar{X}) = \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}(X_{im} - \bar{X}_{\cdot m})(X_{jm} - \bar{X}_{\cdot m}). \qquad (4)$$

We partition $\sqrt{m}\tilde{I}_m$ into a sum of three terms: a linear–quadratic term ($T_m$) that is asymptotically normally distributed, a
non-random term ($\theta_m$), and an asymptotically negligible term ($R_m$):
$$\sqrt{m}\tilde{I}_m = T_m + \theta_m + R_m, \qquad (5)$$
where $\theta_m$ is given by (2),
$$T_m = \frac{1}{\sqrt{m}}\left[\sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}(X_{im} - \mu_{im})(X_{jm} - \mu_{jm}) + 2\sum_{i=1}^{m}\sum_{j=1}^{m} (w_{ij} + w_{\cdot j})\mu_{jm}(X_{im} - \mu_{im})\right] \qquad (6)$$
and
$$R_m = -2\sqrt{m}\,\bar{X}_{\cdot m}\sum_{i=1}^{m} w_{i\cdot}(X_{im} - \mu_{im}) + \frac{\bar{X}_{\cdot m}^2}{\sqrt{m}}\sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}. \qquad (7)$$
It is evident that the derivation of the limiting distribution of $\sqrt{m}\tilde{I}_m$ completely depends on the derivation of the
linear–quadratic term $T_m$.
The Central Limit Theorem of quadratic forms has been extensively researched (de Jong, 1987; de Wet and Vebter, 1973;
Giraitis and Taqqu, 1998; Mikosch, 1991; Whittle, 1964), and it has been quite useful in spatial statistics. For Moran’s I
in particular, the proof of the Central Limit Theorem of quadratic forms is an application of the Martingale Central Limit
Theorem (Billingsley, 1995, p 475), which has been used by Sen (1976) in the i.i.d. case and Kelejian and Prucha (2001) in
their ARMA(1,1) model for the limiting distributions of Moran’s I. Still based on the Martingale Central Limit Theorem, we
derived the limiting distribution of Moran’s I and associated propositions for the independent case (see the proofs in the
Appendix).

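Lemma 1. If Conditions (C1)–(C3) hold, then $R_m \xrightarrow{P} 0$ as $m \to \infty$.

Theorem 1. If Conditions (C1)–(C4) hold, then $T_m/\tau_m \xrightarrow{L} N(0, 1)$. If Condition (C5) also holds, then $T_m \xrightarrow{L} N(0, \tau^2)$.

Corollary 1. If Conditions (C1)–(C4) hold, then $(\sqrt{m}\tilde{I}_m - \theta_m)/\tau_m \xrightarrow{L} N(0, 1)$. If Condition (C5) also holds, then $\sqrt{m}\tilde{I}_m \xrightarrow{L} N(\theta, \tau^2)$.

Proposition 1. If Conditions (C1)–(C4) hold, then $\tilde{b}_{2m} - \tilde{\eta}_{2m} \xrightarrow{P} 1$, $\tilde{b}_{2m}/\sqrt{m} \xrightarrow{P} 0$, $\tilde{\eta}_{2m}/\sqrt{m} \to 0$ and $\tilde{b}_{4m}/m \xrightarrow{P} 0$.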

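Theorem 2. Under the regularity Conditions (C1)–(C4),
$$\frac{\sqrt{m}I - \theta_m/[\psi_m(1 + \tilde{\eta}_{2m})]}{\tau_m/[\psi_m(1 + \tilde{\eta}_{2m})]} \xrightarrow{L} N(0, 1). \qquad (8)$$
If Condition (C5) also holds, then $\sqrt{m}I \xrightarrow{L} N(\theta/[\psi(1 + \eta)], \tau^2/[\psi^2(1 + \eta)^2])$.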

Lemma 1 states that the limiting distribution of $\sqrt{m}\tilde{I}_m$ depends solely on the limiting distribution of $T_m$ in Theorem 1.
The proof for $T_m$ is an application of the Central Limit Theorem of quadratic forms in Kelejian and Prucha (2001). Corollary 1
provides the limiting distribution of the numerator of Moran’s I, and Proposition 1 describes the limit in probability of
the denominator, which is still a random variable. Condition (C3) or (C5) guarantees that such a limit
is always positive. Theorem 2 is the main result, which provides the limiting distribution of the Moran’s I coefficient in an
independent sample.
As mentioned earlier, it is often assumed units in a study area have spatially homogeneous mean values under the null
hypothesis of Moran’s I, which is often interpreted as that µ1 , . . . , µm are the same. The significance of the test statistic can
be determined by its z-score or the Z (I ) value against the standard normal distribution. In practice, the mean and variance
of Moran’s I under the null hypothesis are usually computed under the permutation scheme (Cliff and Ord, 1981), where
the random samples X1 , . . . , Xm are assumed identically independently drawn from an unknown distribution. A common
way to derive a random sample is to permute observed values spatially (Schmoyer, 1994). When sample size is large (e.g.,
m > 50), Moran’s I is believed to be asymptotically normal (Cliff and Ord, 1981; Walter, 1992). In order to distinguish the
means and the variances of Moran’s I, we denote ER (I ), VR (I ) as the mean and variance under the randomization assumption
and they are

$$E_R(I) = -\frac{1}{m - 1}, \qquad (9)$$
and
$$V_R(I) = \frac{m[(m^2 - 3m + 3)S_{1m} - mS_{2m} + 3S_{0m}^2]}{(m - 1)(m - 2)(m - 3)S_{0m}^2} - \frac{\tilde{b}_{4m}[(m^2 - m)S_{1m} - 2mS_{2m} + 6S_{0m}^2]}{(m - 1)(m - 2)(m - 3)S_{0m}^2\,\tilde{b}_{2m}^2} - \frac{1}{(m - 1)^2}. \qquad (10)$$

We denote the z-score of Moran’s I under the randomization assumption by $Z_R = [I - E_R(I)]/\sqrt{V_R(I)}$. The general conclusion
about the limiting distribution of $Z_R$ is stated below and the proof is also in the Appendix.
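
The randomization moments and the resulting z-score can be sketched in Python as follows. This sketch assumes the standard Cliff and Ord (1981) constants, $S_1 = \frac{1}{2}\sum_i\sum_j (w_{ij} + w_{ji})^2$ and $S_2 = \sum_i (\sum_j w_{ij} + \sum_j w_{ji})^2$, in the formula of the form (10); the function names, the five-region path graph and the data are hypothetical. The last lines check the closed-form moments against a brute-force enumeration of all permutations.

```python
import itertools
import numpy as np

def morans_i(x, w):
    """Moran's I from Eq. (1)."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    return float(z @ w @ z) / (w.sum() * np.mean(z ** 2))

def randomization_moments(x, w):
    """Mean and variance of Moran's I under random permutation of x (m >= 4).

    Assumption of this sketch: S1 and S2 are the standard Cliff-Ord constants.
    """
    x = np.asarray(x, dtype=float)
    m = x.size
    z = x - x.mean()
    s0 = w.sum()
    s1 = 0.5 * np.sum((w + w.T) ** 2)
    s2 = np.sum((w.sum(axis=1) + w.sum(axis=0)) ** 2)
    kurt = np.mean(z ** 4) / np.mean(z ** 2) ** 2          # b_4m / b_2m^2
    a = m * ((m * m - 3 * m + 3) * s1 - m * s2 + 3 * s0 ** 2)
    b = kurt * ((m * m - m) * s1 - 2 * m * s2 + 6 * s0 ** 2)
    e_i = -1.0 / (m - 1)                                    # Eq. (9)
    v_i = (a - b) / ((m - 1) * (m - 2) * (m - 3) * s0 ** 2) - e_i ** 2
    return e_i, float(v_i)

def z_score(x, w):
    """Z_R = [I - E_R(I)] / sqrt(V_R(I))."""
    e_i, v_i = randomization_moments(x, w)
    return (morans_i(x, w) - e_i) / np.sqrt(v_i)

# Hypothetical example: five regions on a path, neighbors share an edge.
w_path = np.zeros((5, 5))
for i in range(4):
    w_path[i, i + 1] = w_path[i + 1, i] = 1.0
x_obs = [1.0, 2.0, 4.0, 7.0, 11.0]

print("Z_R =", z_score(x_obs, w_path))

# Brute-force check: moments of I over all 5! = 120 permutations of x_obs.
perm_values = [morans_i(p, w_path) for p in itertools.permutations(x_obs)]
print(np.mean(perm_values), np.var(perm_values))
print(randomization_moments(x_obs, w_path))
```

The enumeration is only feasible for very small m; it is included to show what the randomization assumption means, not as a practical testing procedure.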

Theorem 3. If Conditions (C1)–(C4) hold, then
$$\frac{Z_R - \theta_{mR}}{\tau_{mR}} \xrightarrow{L} N(0, 1)$$
as $m \to \infty$, where $\theta_{mR} = \theta_m/[\omega_m(1 + \tilde{\eta}_{2m})]$ and $\tau_{mR}^2 = \tau_m^2/[\omega_m^2(1 + \tilde{\eta}_{2m})^2]$. Furthermore, if Condition (C5) also holds, $Z_R$
converges weakly to $N(\theta_R, \tau_R^2)$, where $\theta_R = \theta/[\omega(1 + \eta)]$ and $\tau_R^2 = \tau^2/[\omega^2(1 + \eta)^2]$.

Under the general framework of the permutation test, it is believed that $Z_R$ is asymptotically distributed as $N(0, 1)$ as
$m \to \infty$ under the null hypothesis that the expected values of $X_i$ are not clustered. However, Theorem 3 indicates that
this conclusion may be incorrect if spatial heterogeneity is present. The variance of the asymptotic distribution of Moran’s
I under the permutation testing scheme should be adjusted by $\tau_{mR}^2$, which may not be one under spatial heterogeneity.
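
To illustrate what an asymptotic variance different from one means for the usual test, the following sketch computes the actual rejection rate when $Z_R$ is compared with the N(0, 1) critical value although its limiting distribution is $N(0, \tau_{mR}^2)$. It assumes a one-sided upper-tail test and $\theta_{mR} = 0$; the function name and the example variances are hypothetical.

```python
from statistics import NormalDist

def unadjusted_type1_error(tau2, alpha=0.05):
    """P(Z_R > z_{1-alpha}) when Z_R ~ N(0, tau2) but N(0, 1) is assumed.

    One-sided (upper-tail) testing is an assumption of this sketch.
    """
    z_crit = NormalDist().inv_cdf(1.0 - alpha)        # nominal N(0, 1) critical value
    return 1.0 - NormalDist(0.0, tau2 ** 0.5).cdf(z_crit)

# Hypothetical asymptotic variances: deflated, nominal and inflated.
for tau2 in (0.5, 1.0, 2.0):
    print(tau2, round(unadjusted_type1_error(tau2), 4))
```

A variance above one inflates the type I error rate beyond the nominal level, while a variance below one makes the test conservative.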

Corollary 2. Suppose that Conditions (C1)–(C4) hold. If the random samples X1 , . . . , Xm have the same mean and variance, then
ZR is asymptotically N (0, 1) as m → ∞.

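Corollary 3. Assume Conditions (C1)–(C4) hold. Suppose that $X_i$, $i = 1, \ldots, m$, are independently distributed with mean $\mu$ and
variance $\sigma_i^2$ respectively. Then
$$Z_R/\tau_{mR} \xrightarrow{L} N(0, 1),$$
where
$$\tau_{mR}^2 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}^2\sigma_i^2\sigma_j^2}{\left(\sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}^2\right)\left(\sum_{i=1}^{m}\sigma_i^2/m\right)^2}. \qquad (11)$$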

The limiting distribution of ZR in Corollary 2 is not limited to the i.i.d. case. When populations are heterogeneous,
Corollary 3 states that the variance of ZR can be approximated by (11). This case will be discussed in Example 3 in the next
section. In addition, the numerical results of the asymptotic variance given in (11) can be used to obtain the true asymptotic
distribution of ZR and the true critical value for the test of significance.
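
A minimal sketch of Eq. (11) and of the corresponding adjusted critical value is given below. The function names, the six-region ring and the variance vectors are hypothetical, and the one-sided critical value is an assumption of this sketch rather than a prescription from the paper.

```python
import numpy as np
from statistics import NormalDist

def tau2_mr(sigma2, w):
    """Asymptotic variance of Z_R from Eq. (11), for equal means and variances sigma2."""
    sigma2 = np.asarray(sigma2, dtype=float)
    w2 = np.asarray(w, dtype=float) ** 2
    num = float(sigma2 @ w2 @ sigma2)           # sum_ij w_ij^2 sigma_i^2 sigma_j^2
    den = w2.sum() * sigma2.mean() ** 2         # (sum_ij w_ij^2) (sum_i sigma_i^2 / m)^2
    return num / den

def adjusted_critical_value(sigma2, w, alpha=0.05):
    """One-sided critical value for Z_R under N(0, tau_mR^2) (one-sided is assumed)."""
    return NormalDist().inv_cdf(1.0 - alpha) * np.sqrt(tau2_mr(sigma2, w))

# Hypothetical example: six regions in a ring, each with two neighbors.
m = 6
w_ring = np.zeros((m, m))
for i in range(m):
    w_ring[i, (i + 1) % m] = 1.0
    w_ring[i, (i - 1) % m] = 1.0

clustered = [4, 4, 4, 1, 1, 1]      # high-variance regions next to each other
alternating = [1, 4, 1, 4, 1, 4]    # high- and low-variance regions juxtaposed

print(tau2_mr(clustered, w_ring), adjusted_critical_value(clustered, w_ring))
print(tau2_mr(alternating, w_ring), adjusted_critical_value(alternating, w_ring))
```

In this toy setting the clustered variances give a value above one and the alternating variances give a value below one, so the critical value moves up or down accordingly.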
More importantly, our limiting distribution is likely to provide a way to fully understand the null hypothesis of Moran’s
I under the framework of the permutation test. According to Theorem 3, the expected value of ZR is proportional to θm
provided in Eq. (2). Assume µ1 , . . . , µm are not equal. If the weights wij are selected such that θm = 0, then Moran’s I is
likely to be insignificant even if µ1 , . . . , µm are clustered. In this case, the selected weights wij are not able to
detect the current cluster structure, which means that a different weight structure should be adopted.

3. Examples and numerical studies

Testing for spatial clustering is essential in almost all spatial data analyses. Previous studies have extensively evaluated
the heterogeneity problem via Monte Carlo simulations. Our asymptotic distribution provides a theoretical basis not only for
population heterogeneity, but also for power functions encompassing almost all of the previously simulated scenarios. In this
section, we provided three simulated examples to demonstrate the usage of the limiting distribution. Example 1 assessed
the null hypothesis of Moran’s I based on typical variance patterns in our previous studies (Zhang and Lin, 2007). Example 2
provided power functions of Moran’s I under the alternative hypothesis of a few local spatially autocorrelated patterns. Both
examples were based on a 10×10 regular lattice indexed by a Cartesian coordinate for a lattice point i using Gaussian random
variables. Because of the importance of Poisson data, we also evaluated the performance of our limiting distribution for
Poisson random variables in Example 3. Following Cliff and Ord (1981) and Schmoyer (1994), we defined wij = 1/Ni if two
lattice points (i, j) were adjacent and wij = 0 otherwise, where Ni was the number of neighbors for the ith unit. In Examples 1
and 2, we focused on Gaussian random variables with heterogeneous variances. The findings can reflect similar situations
using Poisson data for rare diseases when heterogeneous populations are present, because the variances of observed disease
rates are inversely proportional to unit population sizes.

Example 1 (Evaluation of Asymptotic Variances). In this simulated example, we assumed $X_1, \ldots, X_m$ were normal random
variables. We used Eq. (11) to provide numerical solutions of variances for what could be simulated when $\mu_i$ were all equal.
Previous studies suggested that variances were small in densely populated areas and large in sparsely populated areas (Besag
and Newell, 1991). Walter (1992) and Waldhor (1996) confirmed that Moran’s I had an unacceptable variance inflation for
several simulated heterogeneous patterns. We synthesized their patterns into four artificial patterns, and we used $\delta^2$ and $\delta^{-2}$
to represent variances, with $\delta < 1$ being heterogeneous and $\delta = 1$ being homogeneous (Fig. 1).
Pattern (1a) represented a half-urban and half-rural spatial pattern. The variances were $\delta^2$ in the first five rows and were
1 in the next five rows. Pattern (1b) represented a predominant urban area with two sparsely populated rural enclaves
represented by two circular areas centered at (3, 3) or (8, 8) with a radius of two units. The variances were $\delta^2$ if the lattice
points were within one of the circles, and were 1 otherwise. Pattern (1c) was the opposite of pattern (1b); the variances
were $\delta^{-2}$ if the lattice points were within one of the circles and were 1 otherwise. Pattern (1d) represented an urban and
rural mixture. The variances were $\delta^2$ if the sums of the corresponding row and column of the lattice points were odd, and
they were 1 otherwise.
Since these patterns satisfied the regularity conditions (C1)–(C4) and $\mu_{im} = 0$ for all i, $\theta_{mR} = 0$ and the asymptotic
variance $\tau_m^2$ could be approximated by (11). Fig. 2 displays asymptotic variance plots from (11) and the Monte Carlo variance
from simulations based on $10^4$ replications. It was apparent that the variances derived from (11) were very close to the
simulated variances in each scenario. Fig. 1(b) exhibited an extremely stable variance, while Fig. 1(c) exhibited extremely
unstable variance. The latter result suggested that the variance of $Z_R$ was inflated if a few regions next to each other had
larger variances. The results from patterns (1b) and (1c) suggested that one should pay close attention to the problem of
variance inflation in sparsely populated areas. Fig. 1(d) showed that the variance could be lower than 1 if the heterogeneous
variances were juxtaposed. These results from the asymptotic theory were not only consistent with our own simulations,
but also confirmed previous simulation studies. The crosscheck pattern (1d) showed that the variance of Moran’s I might be
underestimated under the permutation test scheme, which yielded a much lower power function correspondingly.
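
The asymptotic variances of Example 1 can be sketched numerically for patterns (1a) and (1d). The code below builds the 10×10 lattice with $w_{ij} = 1/N_i$ and evaluates Eq. (11). Rook adjacency (points sharing an edge) is an assumption of this sketch, since the text only says "adjacent", and all function names are hypothetical.

```python
import numpy as np

def lattice_weights(n=10):
    """Row-standardized weights w_ij = 1/N_i on an n x n lattice.

    Rook adjacency (shared edge) is assumed; the text only says 'adjacent'.
    """
    m = n * n
    w = np.zeros((m, m))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n and 0 <= cc < n:
                    w[i, rr * n + cc] = 1.0
    return w / w.sum(axis=1, keepdims=True)     # divide each row by N_i

def tau2_mr(sigma2, w):
    """Asymptotic variance of Z_R from Eq. (11)."""
    w2 = w ** 2
    return float(sigma2 @ w2 @ sigma2) / (w2.sum() * sigma2.mean() ** 2)

def pattern_1a(delta, n=10):
    """Variance delta^2 in the first n/2 rows and 1 in the remaining rows."""
    rows = np.repeat(np.arange(1, n + 1), n)
    return np.where(rows <= n // 2, delta ** 2, 1.0)

def pattern_1d(delta, n=10):
    """Variance delta^2 where row + column is odd and 1 otherwise."""
    rows = np.repeat(np.arange(1, n + 1), n)
    cols = np.tile(np.arange(1, n + 1), n)
    return np.where((rows + cols) % 2 == 1, delta ** 2, 1.0)

w_lat = lattice_weights(10)
for delta in (1.0, 0.5, 0.2):
    print(delta, tau2_mr(pattern_1a(delta), w_lat), tau2_mr(pattern_1d(delta), w_lat))
```

Under rook adjacency every neighbor pair in pattern (1d) combines one variance of $\delta^2$ with one variance of 1, so Eq. (11) reduces to $4\delta^2/(1 + \delta^2)^2$, which never exceeds one. This agrees with the variance shrinkage reported for the crosscheck pattern.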

Example 2 (Evaluation of Power Function). In this simulated example, we assumed $X_1, \ldots, X_m$ were also normal random
variables. We applied Theorem 3 to evaluate their powers under the alternative hypothesis and heterogeneity. We devised
heterogeneous variances according to pattern (1c) in the previous numerical example: $\sigma = 5$ if the lattice points were within
the circles and $\sigma = 1$ otherwise. Results from both Monte Carlo simulations and the asymptotic distributions were provided
to check for their consistency.
In the evaluation of statistical power functions, we used the same circles as in the previous simulations, and generated six
spatially clustered patterns: (2a) the expected value $\mu_i = \epsilon$ if a lattice point was within a cluster centered at (3, 8), and $\mu_i = 0$
otherwise; (2b) $\mu_i = -\epsilon$ if a lattice point was within a low-value cluster centered at (8, 3), and $\mu_i = 0$ otherwise. Pattern
(2c) had a chessboard or negatively autocorrelated circle centered at (3, 8); the mean was $\epsilon$ or $-\epsilon$ if a lattice point was within
the circle. Pattern (2d) combined patterns (2a) and (2b). Pattern (2e) combined a high-value cluster defined in (2a) and a
chessboard circle centered at (8, 3); pattern (2f) combined patterns (2b) and (2c).

Fig. 1. Variance patterns used in Example 1, where σi = 1 at ‘‘◦’’, σi = δ at ‘‘•’’, and σi = 1/δ at ‘‘+’’.

Fig. 2. Variance of ZR as a function of δ for selected patterns.

To compare the power functions of $Z_R$ between simulated and theoretical results, we first computed the power functions
from the asymptotic distribution given by Theorem 3. We then computed the rejection rates via Monte Carlo simulations
with $10^4$ replications by assuming that $Z_R$ was asymptotically $N(0, \tau_m^2)$ under the null hypothesis, where $\tau_m^2$ was given by (11).
To understand the performance of the usual permutation testing method, we also computed the power function of $Z_R$ without
the adjustment of $\tau_m^2$, where we used $N(0, 1)$ to compute the rejection rates of $Z_R$. For each pattern, the statistical power
of Moran’s I as a function of $\epsilon$ was displayed in Fig. 3. The results again showed that the theoretical and simulated curves
were very close, but the unadjusted ones were not. Since the type I error probabilities of Moran’s I were represented by the
case when $\epsilon = 0$, we concluded that the type I error probabilities were inflated in the unadjusted method. As $\epsilon$ increased
in patterns (2a)–(2d), the power functions increased to 1. In other words, when there was a strong high-value cluster, low-
value cluster, or both, Moran’s I signified a clustering tendency. However, when a negatively autocorrelated pattern appeared
together with a high-value or low-value cluster (patterns (2e) and (2f)), Moran’s I had limited power to detect them,
suggesting that negative and positive autocorrelations could cancel out the influence of each other.

Fig. 3. Asymptotic and simulated power functions of ZR for Gaussian data as a function of ϵ when α = 0.05.

Example 3 (Evaluation of the Performance for Poisson Data). In this simulated example, we assumed that observations
$N_i$ independently followed Poisson distributions. In spatial epidemiology, one usually takes $X_i = N_i/\xi_i$, where $N_i$ is the
disease count and $\xi_i$ is the population at risk in region i. If the disease is rare, then $N_i$ is approximately Poisson distributed:
$N_i \sim^{ind} \text{Poisson}(\theta_i\xi_i)$. If the expected rates are all equal, then the variance of $X_i$ is inversely proportional to its population size.
In this case, the mean of $Z_R$ is about 0 and the variance of $Z_R$ is approximated by
$$\tau_{mR}^2 = \frac{m^2\sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}^2\,\xi_i^{-1}\xi_j^{-1}}{\left(\sum_{i=1}^{m}\sum_{j=1}^{m} w_{ij}^2\right)\left(\sum_{i=1}^{m}\xi_i^{-1}\right)^2}. \qquad (12)$$
Since the asymptotic variance $\tau_{mR}^2$ only depended on regional population sizes, we could use regional populations to gauge
potential variance inflation from the Moran’s I test.
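
Because Eq. (12) needs only the weights and the populations at risk, it can be evaluated before any disease counts are examined. The following is a minimal sketch; the function name `tau2_poisson`, the six-region ring and the population figures are hypothetical.

```python
import numpy as np

def tau2_poisson(pop, w):
    """Eq. (12): asymptotic variance of Z_R under equal rates, from populations at risk."""
    inv = 1.0 / np.asarray(pop, dtype=float)    # xi_i^{-1}
    w2 = np.asarray(w, dtype=float) ** 2
    m = inv.size
    return m ** 2 * float(inv @ w2 @ inv) / (w2.sum() * inv.sum() ** 2)

# Hypothetical example: six regions in a ring, each with two neighbors.
m = 6
w_ring = np.zeros((m, m))
for i in range(m):
    w_ring[i, (i + 1) % m] = 1.0
    w_ring[i, (i - 1) % m] = 1.0

pop_equal = [5000] * 6
pop_clustered = [1000, 1000, 1000, 4000, 4000, 4000]   # small populations adjacent

print(tau2_poisson(pop_equal, w_ring))
print(tau2_poisson(pop_clustered, w_ring))
```

The value depends only on the relative sizes of the populations: multiplying every population by the same constant leaves it unchanged, and equal populations give exactly one.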
To evaluate the impact of population heterogeneity, we set $\xi_i = 10^4$ in the first five rows and $\xi_i = 25 \times 10^4$ in the next
five rows. If we chose $X_i = N_i/\xi_i$ (the observed risk) under $H_0: \theta_i = \theta_0$, then we had $E(X_i) = \theta_0$ and $V(X_i) = \theta_0/\xi_i$. This
yielded that the values of $V(X_i)$ for units within the circles were five times as large as those for units outside the circles,
which was similar to the patterns that we had considered in Example 2. In particular, we considered six patterns in our
simulation. According to the basic approach provided by Zhang and Lin (2007, 2009), we devised a high-value cluster in the
first three patterns and a low-value cluster in the last three patterns. In pattern (3a), we assumed $\theta_i = \theta_0(1 + \epsilon)$ if a lattice
point was within a cluster centered at (3, 3) with radius 2 and $\theta_i = \theta_0$ otherwise, where $\theta_0 = 0.0005$. In pattern (3b), we
used a similar way to define $\theta_i$ but we moved the center of the cluster to (5, 5). In pattern (3c), we moved the center of
the cluster to (8, 8). The only difference between patterns (3a) and (3d) was the choice of $\theta_i$ for units within the cluster,
where we chose $\theta_i = \theta_0(1 - \epsilon)$ in (3d). The same method was used in patterns (3e) and (3f), respectively.
We also compared the power functions of $Z_R$ between simulated and theoretical results. For the simulated result, we
computed the rejection rates by assuming $Z_R$ following $N(0, \tau_m^2)$ in the adjusted method and $N(0, 1)$ in the unadjusted
method, respectively (Fig. 4). We found the results from the adjusted method and the theoretical method were close and
both had 0.05 type I error probabilities. However, the type I error probabilities in the unadjusted method were significantly
higher than 0.05, which meant that the type I error probability was inflated in the unadjusted method. The power functions
were also affected by the pattern of population at risk.

Fig. 4. Asymptotic and simulated power functions of ZR for Poisson data as a function of ϵ when α = 0.05.

The above examples show that the asymptotic distribution was general. It provided the formulae not only for the null
distributions under the heterogeneous variances, but also the power functions for various clustered patterns under the
alternative hypothesis. When low-population regions were clustered, they tended to cause substantial variance inflation.
When high-population regions were mixed with low-population regions, they tended to cause substantial variance shrinkage.
Given that the approximation of the asymptotic distributions of Moran’s I test was very precise, it could be used to assess
various scenarios of variance inflation and power functions in place of Monte Carlo simulations.
The most important case is the one illustrated by pattern (1d) in the first example, although the case illustrated by
pattern (1c) is also important. When low-variance and high-variance units are mixed together, the permutation variance of
Moran’s I is significantly lower than one, which means that the power function of Moran’s I permutation test is much lower
than it should be. It is very hard to obtain a significant Moran’s I value even if the cluster is strong. To improve the power of
Moran’s I permutation test, one needs to use Eq. (11) to adjust the critical value, where $\sigma_i^2$ in the equation can be roughly
estimated by a statistical model. For example, we can assume $\sigma_i^2$ only has two values: $\sigma_i^2 = \sigma_{01}^2$ or $\sigma_i^2 = \sigma_{02}^2$. This method
is likely to provide a rough estimate of $\tau_{mR}^2$ by Eq. (11).

4. Application

The asymptotic distribution of $Z_R$ given by Theorem 3 also has practical applications. When data are counts, for example,
we can use it to evaluate potential false alarms caused by spatially varied population sizes. In this section, we first empirically
examined the heterogeneity problem for each state at the county level in the US. We then selected a state that was likely to have
the heterogeneity problem and used Corollary 3 to adjust for population heterogeneity when testing for spatial autocorrelation.

Table 1
Asymptotic variances of ZR for 45 states in the United States under the null hypothesis of the homogeneous disease rate, where m is the number of counties.

State name        m     All sexes   Male     Female
Alabama           67    1.0791      1.0929   1.0668
Arizona           15    0.7413      0.7329   0.7499
Arkansas          75    1.0922      1.0955   1.0886
California        58    1.1997      1.1634   1.2466
Colorado          63    1.5922      1.5820   1.5955
Florida           67    1.5691      1.5486   1.5841
Georgia           159   1.2243      1.2347   1.2170
Idaho             44    0.9321      0.9311   0.9330
Illinois          102   1.0323      1.0296   1.0379
Indiana           92    0.9629      0.9645   0.9611
Iowa              99    1.0943      1.0983   1.0904
Kansas            105   1.1859      1.1814   1.1905
Kentucky          120   1.0510      1.0473   1.0556
Louisiana         64    1.1546      1.1524   1.1564
Maine             16    0.8294      0.8294   0.8293
Maryland          24    1.0825      1.0960   1.0685
Massachusetts     14    0.0735      0.0798   0.0677
Michigan          83    1.0916      1.1058   1.0777
Minnesota         87    0.9835      0.9885   0.9792
Mississippi       82    0.9534      0.9607   0.9453
Missouri          115   1.1644      1.1709   1.1585
Montana           56    1.0314      1.0297   1.0324
Nebraska          93    1.6972      1.7059   1.6894
Nevada            17    0.5127      0.5251   0.4977
New Hampshire     10    0.9207      0.9196   0.9218
New Jersey        21    0.9559      0.9466   0.9651
New Mexico        33    1.2850      1.2907   1.2790
New York          62    1.4557      1.4494   1.4638
North Carolina    100   1.0052      0.9988   1.0149
North Dakota      53    1.1313      1.1221   1.1427
Ohio              88    1.1264      1.1187   1.1361
Oklahoma          77    1.0282      1.0197   1.0366
Oregon            36    1.9112      1.9014   1.9207
Pennsylvania      67    1.1924      1.1942   1.1895
South Carolina    46    0.9760      0.9795   0.9731
South Dakota      66    1.1860      1.1862   1.1855
Tennessee         95    0.9794      0.9863   0.9690
Texas             254   1.1420      1.2011   1.0853
Utah              29    0.8192      0.8440   0.7893
Vermont           14    0.6015      0.6034   0.5996
Virginia          135   0.8335      0.8029   0.8637
Washington        39    1.0786      1.0891   1.0684
West Virginia     55    1.2278      1.2161   1.2386
Wisconsin         72    0.9776      0.9782   0.9767
Wyoming           23    0.9183      0.9211   0.9150

Fig. 5. Nebraska breast cancer incidence rates by county: 1998–2002.

We considered the scenario provided in Example 3 in our application, where $\tau_{mR}^2$ provided in Eq. (12) was used. Table 1
listed numerical values of $\tau_{mR}^2$ for 45 states based on county population from the 2000 Census. States with fewer than 10
counties (Connecticut, Delaware, Rhode Island, Hawaii, Alaska) were not included. The results showed that most $\tau_{mR}^2$ values
were very close to 1. The $\tau_{mR}^2$ values for Colorado, New York and Nebraska were all above 1.45. The low $\tau_{mR}^2$ values appeared
to be in states with fewer than 20 counties. When variances deviated from 1, caution should be exercised when making
inference about spatial clustering of disease or other rate-based data. The limiting distribution of Moran’s I could measure
and adjust both inflated and deflated variances according to Eq. (12). For example, the asymptotic variance for females in
Nebraska was 1.6894, and it could be translated into an upper bound of the p-value 0.0162 under the asymptotic N (0, 1)
assumption. If the unadjusted p-value was greater than 0.0162 for females, then the adjusted p-value of Moran’s I would
not be significant.
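The Nebraska bound quoted above can be approximately reproduced with a short calculation. The sketch below is one reading of that sentence, not the authors' code: it assumes a one-sided test at the 0.05 level and assumes the adjusted z-score is the unadjusted z-score divided by the square root of the asymptotic variance. The function name `unadjusted_p_upper_bound` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def unadjusted_p_upper_bound(tau_sq, alpha=0.05):
    """Largest unadjusted one-sided p-value that is still significant at
    level alpha after the z-score is divided by sqrt(tau_sq).

    Assumption (not stated in the paper): the adjustment is a pure
    rescaling of the z-score by the asymptotic standard deviation.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)  # critical value of the adjusted test
    z_needed = sqrt(tau_sq) * z_alpha          # unadjusted z-score needed to reach it
    return 1.0 - std_normal.cdf(z_needed)


# Asymptotic variance for Nebraska females, taken from Table 1.
bound = unadjusted_p_upper_bound(1.6894)
print(f"upper bound on the unadjusted p-value: {bound:.4f}")
```

Under these assumptions the result is close to the 0.0162 reported in the text; a variance of exactly 1 returns the nominal level 0.05, as expected.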
Given that Nebraska had substantial population variation by county, the following case study illustrates how to use the asymptotic variance to adjust p-values when testing for spatial autocorrelation. Nebraska had a total population of 1,711,263 in 2000: 843,351 males and 867,912 females. There were 93 counties in Nebraska, and their population sizes varied substantially. One urban county (Douglas) had a population of 463,585, while 11 rural counties had fewer than 1000. We obtained the five-year (1998–2002) county-level incidence data for breast cancer from the Nebraska Cancer Registry, and derived a five-year incidence rate by using the 2000 census female population. Across Nebraska, the five-year breast cancer incidence rate was 133.9 per 100,000 (Fig. 5). To determine whether breast cancer exhibited spatial clustering, we computed the value of Moran's I, its permutation mean $E_R(I)$ and variance $V_R(I)$ according to Expressions (1), (9), and (10), respectively.
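For readers who want to see the three quantities side by side, the following is a minimal pure-Python sketch. It assumes that Expressions (1), (9), and (10) coincide with the standard Cliff and Ord randomization formulas for Moran's I; the paper's own normalization of the weight sums may differ. The function name and the four-unit toy data are illustrative only and are not taken from the Nebraska study.

```python
def morans_i_randomization(x, w):
    """Return (I, E_R(I), V_R(I)) for values x and an m-by-m weight matrix w
    (list of lists with zero diagonal), using the standard Cliff-Ord
    randomization moments. Requires m > 3."""
    m = len(x)
    mean = sum(x) / m
    d = [xi - mean for xi in x]                      # deviations from the mean
    ss2 = sum(di ** 2 for di in d)                   # sum of squared deviations
    ss4 = sum(di ** 4 for di in d)                   # sum of fourth powers
    s0 = sum(w[i][j] for i in range(m) for j in range(m))
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(m) for j in range(m))
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(m))) ** 2 for i in range(m))
    cross = sum(w[i][j] * d[i] * d[j] for i in range(m) for j in range(m))

    moran = (m / s0) * cross / ss2                   # Moran's I
    mean_r = -1.0 / (m - 1)                          # permutation mean
    b2 = m * ss4 / ss2 ** 2                          # sample kurtosis
    num = (m * ((m * m - 3 * m + 3) * s1 - m * s2 + 3 * s0 ** 2)
           - b2 * ((m * m - m) * s1 - 2 * m * s2 + 6 * s0 ** 2))
    var_r = num / ((m - 1) * (m - 2) * (m - 3) * s0 ** 2) - mean_r ** 2
    return moran, mean_r, var_r


# Toy example: four units on a line, neighbours share an edge.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
x = [1.0, 2.0, 3.0, 4.0]
moran, mean_r, var_r = morans_i_randomization(x, w)
z_unadjusted = (moran - mean_r) / var_r ** 0.5      # standardized statistic
```

The standardized value `z_unadjusted` is the quantity whose variance the paper's adjustment then corrects for population heterogeneity.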
Without the variance adjustment, the value of Moran's I was 0.0904 with a p-value of 0.046, which suggests spatial clustering. After the variance adjustment by Expression (12), however, the p-value became 0.0990, which fails to reject the null hypothesis of no spatial clustering. The seemingly significant clustering tendency from the original Moran's I was, therefore, due to variance inflation. Similar to pattern (1c) discussed earlier, the observed inconsistency in p-values for breast cancer was attributable to rural population heterogeneity, because about 10 rural counties were clustered, and they accounted for only 0.35% of the female population. If we trusted the result from the unadjusted Moran's I, a next logical step would be to identify local clusters and associated etiological factors, which could lead to wasted public health resources. Our variance adjustment method should be able to reduce false alarms due to heterogeneity.
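The direction and rough size of the adjustment in the case study can be illustrated with a simplified rescaling. This sketch assumes a one-sided p-value and assumes the adjustment only divides the z-score by the square root of the asymptotic variance (1.6894 for Nebraska females in Table 1). It ignores any shift in the mean and works from rounded inputs, so it is not expected to reproduce the reported 0.0990 exactly. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def variance_adjusted_p(p_unadjusted, tau_sq):
    """Rescale a one-sided p-value for a z-score whose true asymptotic
    variance is tau_sq rather than 1 (simplifying assumption: the mean
    under the null is unchanged)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - p_unadjusted)   # z-score implied by the unadjusted p-value
    return 1.0 - std_normal.cdf(z / sqrt(tau_sq))


# Reported unadjusted p-value and the Table 1 variance for Nebraska females.
p_adj = variance_adjusted_p(0.046, 1.6894)
print(f"approximate adjusted p-value: {p_adj:.3f}")
```

Under these assumptions the adjusted value lands just below 0.10, in line with the paper's conclusion that the apparent clustering is no longer significant at the 0.05 level.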

5. Discussion

The limiting distribution provided in this paper has resolved the variance inflation problem in the Moran's I test under heterogeneity. It provides the basis for theoretically assessing Type I and Type II error probabilities of Moran's I under heterogeneity. This method could serve as the default for the Moran's I permutation test. When populations are homogeneous, the method is identical to the permutation test; when populations are heterogeneous, the method adjusts p-values for variance inflation. In this way, potential false alarms due to variance inflation can be avoided.
The result that the variance adjustment factor $\tau_m^2$ in (12) depends on the spatial distribution of population sizes is very useful. For Poisson count data, we can use $\tau_m^2$ to establish an upper sensitivity bound for the unadjusted p-value of the Moran's I test statistic, because populations at risk are often available across spatial units. This method is likely not only to aid meta-analysis of spatial statistical findings, but also to expand the application of Moran's I to historical and confidential data, for which only rates and some kinds of populations at risk are accessible.
The application of the central limit theorem for quadratic forms to Moran's I is just a beginning, and there are several avenues for future research. First, spatial phenomena are often spatially dependent, and the limiting distribution of the Moran's I coefficient should be extended to spatially dependent samples (Ord, 1975; Schmoyer, 1994). Second, the asymptotic distribution is likely to provide a solution to the long-standing problem of choosing the spatial weight $w_{ij}$ in spatial data analysis, because different spatial weighting schemes can all be reflected by the asymptotic mean and variance in the equations of Theorem 3 and its ensuing corollaries. Finally, since most spatial test statistics are quadratic forms, the limiting distributions for many spatial test statistics, such as the G (Getis and Ord, 1992) and the c (Geary, 1954) statistics, can be derived.

Appendix. Proofs of theorems

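Proof of Lemma 1. Note that $E(\bar X_{\cdot m}) = 0$, $V(\bar X_{\cdot m}) = 1/m$, and

\[
\left| \frac{1}{\sqrt{m}} \sum_{i=1}^{m} \sum_{j=1}^{m} w_{ij} \right|
\le \frac{1}{\sqrt{m}} \sum_{i=1}^{m} \sum_{j=1}^{m} |w_{ij}|
\le \sqrt{m}\, C .
\]

The absolute expected value of the second term of (7) is less than or equal to

\[
E\left( \frac{\bar X_{\cdot m}^{2}}{\sqrt{m}} \right) \sum_{i=1}^{m} \sum_{j=1}^{m} |w_{ij}|
= \frac{1}{m^{3/2}} \sum_{i=1}^{m} \sum_{j=1}^{m} |w_{ij}|
\le \frac{C}{\sqrt{m}},
\]

which yields that the second term of (7) goes to 0 in probability as $m \to \infty$. Since $w_{i\cdot} \le C/m$, for the first term in (7), we have $E[\sum_{i=1}^{m} w_{i\cdot}(X_{im} - \mu_{im})] = 0$ and

\[
V\left[ \sum_{i=1}^{m} w_{i\cdot} (X_{im} - \mu_{im}) \right]
= \sum_{i=1}^{m} \frac{w_{i\cdot}^{2} \sigma_{i}^{2}}{\bar\sigma^{2}}
\le \frac{C^{2}}{m} .
\]

We have $\sum_{i=1}^{m} w_{i\cdot}(X_{im} - \mu_{im}) = o_p(1)$. Note that $E(\sqrt{m}\,\bar X_{\cdot m}) = 0$ and $V(\sqrt{m}\,\bar X_{\cdot m}) = 1$. We have $\sqrt{m}\,\bar X_{\cdot m} = O_p(1)$. This is enough to conclude that the first term of (7) goes to 0 in probability as $m \to \infty$. Therefore, $R_m \xrightarrow{P} 0$ as $m \to \infty$. □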

Proof of Theorem 1. We prove Theorem 1 by checking the conditions of the central limit theorem for linear–quadratic forms given by Kelejian and Prucha (2001). Accordingly, we write $T_m$ in the following linear–quadratic form:

\[
T_m = \sum_{i=1}^{m} \sum_{j=1}^{m} w_{ij}
\frac{(X_{im} - \mu_{im})}{m^{1/4}} \frac{(X_{jm} - \mu_{jm})}{m^{1/4}}
+ \sum_{i=1}^{m} \frac{(X_{im} - \mu_{im})}{m^{1/4}}
\, \frac{2}{m^{1/4}} \sum_{j=1}^{m} (w_{ij} + w_{\cdot j}) .
\]

For any $\epsilon > 0$, we have

\[
\frac{1}{m} \sum_{i=1}^{m} \left| \frac{2}{m^{1/4}} \sum_{j=1}^{m} (w_{ij} + w_{\cdot j}) \right|^{2+\epsilon}
= \frac{1}{m} \sum_{i=1}^{m} \left| \frac{2}{m^{1/4}} \sum_{j=1}^{m} (w_{ij} + w_{i\cdot}) \mu_{jm} \right|^{2+\epsilon}
\le \frac{1}{m} \sum_{i=1}^{m} \left| 2 \sum_{j=1}^{m} (w_{ij} + w_{i\cdot}) \right|^{2+\epsilon}
= (4C)^{2+\epsilon},
\]

and we also have

\[
\max_{i \le m} E \left| \frac{X_{im} - \mu_{im}}{m^{1/4}} \right|^{4}
= \max_{i \le m} m^{-1} E (X_{im} - \mu_{im})^{4}
\le C m^{2\alpha - 1} \le C .
\]

Note that $E(T_m) = 0$ and $V(T_m) = \tau_m^2$. By the central limit theorem for linear–quadratic forms of Kelejian and Prucha, we have $T_m/\tau_m \xrightarrow{L} N(0, 1)$ as $m \to \infty$. If Condition (C5) holds, $T_m \xrightarrow{L} N(0, \tau^2)$. □
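Proof of Proposition 1. Since $\tilde\eta_{2m}/\sqrt{m} = m^{-3/2} \sum_{i=1}^{m} \mu_{im}^{2} \le m^{\alpha - 1/2}$, we have $\tilde\eta_{2m}/\sqrt{m} \to 0$ as $m \to \infty$. Since $\bar X_{\cdot m} \xrightarrow{P} 0$ and

\[
E\left[ \frac{1}{m^{3/2}} \sum_{i=1}^{m} X_{im}^{2} \right]
= \frac{1}{m^{3/2}} \sum_{i=1}^{m} (\sigma_{im}^{2} + \mu_{im}^{2})
\le \frac{1 + m^{\alpha}}{\sqrt{m}} \to 0,
\]

we have

\[
\frac{\tilde b_{2m}}{\sqrt{m}}
= \frac{1}{m^{3/2}} \sum_{i=1}^{m} (X_{im} - \bar X_{\cdot m})^{2}
\le \frac{2}{m^{3/2}} \sum_{i=1}^{m} (X_{im}^{2} + \bar X_{\cdot m}^{2})
\]

going to 0 in probability as $m \to \infty$. Since $h(t) = t^4$ is a convex function in $t$, we have

\[
E\left( \frac{1}{m} \tilde b_{4m} \right)
= E\left[ \frac{1}{m^{2}} \sum_{i=1}^{m} (X_{im} - \bar X_{\cdot m})^{4} \right]
\le E\left[ \frac{8}{m^{2}} \sum_{i=1}^{m} (X_{im}^{4} + \bar X_{\cdot m}^{4}) \right]
\le 16 m^{2\alpha - 1},
\]

which implies $\tilde b_{4m}/m \xrightarrow{P} 0$ as $m \to \infty$. To prove $\tilde b_{2m} - \tilde\eta_{2m} \xrightarrow{P} 1$ as $m \to \infty$, we consider the following identity

\[
\tilde b_{2m} - \tilde\eta_{2m}
= \frac{1}{m} \sum_{i=1}^{m} (X_{im} - \mu_{im})^{2}
+ \frac{2}{m} \sum_{i=1}^{m} (X_{im} - \mu_{im}) \mu_{im}
- \bar X_{\cdot m} \frac{2}{m} \sum_{i=1}^{m} (X_{im} - \mu_{im})
+ \bar X_{\cdot m}^{2} .
\]

Using the Chebyshev inequality, we can conclude that the second term above goes to 0 in probability. Note that $E[m^{-1} \sum_{i=1}^{m} (X_{im} - \mu_{im})] = 0$ and $V[m^{-1} \sum_{i=1}^{m} (X_{im} - \mu_{im})] = 1$. We have $m^{-1} \sum_{i=1}^{m} (X_{im} - \mu_{im}) = O_p(1)$. Since $\bar X_{\cdot m} = o_p(1)$, we conclude that the third term as well as the last term above goes to 0 in probability. The mean of the first term on the right is 1 and its variance is

\[
V\left[ \frac{1}{m} \sum_{i=1}^{m} (X_{im} - \mu_{im})^{2} \right]
= \frac{1}{m^{2}} \sum_{i=1}^{m} (\kappa_{im} - \sigma_{im}^{4})
\le C^{2} m^{2\alpha - 1} \to 0
\]

as $m \to \infty$. Thus, the first term goes to 1 in probability, and $\tilde b_{2m} - \tilde\eta_{2m} \xrightarrow{P} 1$ as $m \to \infty$. □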
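The fourth-moment bound in the proof of Proposition 1 rests on an elementary inequality that the proof leaves implicit. As a sketch, take $a = X_{im}$ and $b = \bar X_{\cdot m}$ and apply the convexity of $h(t) = t^4$ at the midpoint of $a$ and $-b$:

```latex
\[
\left( \frac{a - b}{2} \right)^{4}
= h\!\left( \tfrac{1}{2}\, a + \tfrac{1}{2}\, (-b) \right)
\le \tfrac{1}{2}\, h(a) + \tfrac{1}{2}\, h(-b)
= \frac{a^{4} + b^{4}}{2}
\quad \Longrightarrow \quad
(a - b)^{4} \le 8\, (a^{4} + b^{4}) .
\]
```

Summing over $i$ and dividing by $m^2$ gives the factor 8 that appears in the displayed bound.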
Proof of Theorem 2. By Theorem 1 and Proposition 1, the conclusion follows from the identity $\sqrt{m}\, I = \sqrt{m}\, \tilde I_m / (\psi_m \tilde b_{2m})$. □
Proof of Theorem 3. We immediately have $\lim_{m \to \infty} E_R(\sqrt{m}\, I) = 0$ since $E_R(\sqrt{m}\, I) = -\sqrt{m}/(m - 1)$. Next, we look at the following identity

\[
\frac{V_R(\sqrt{m}\, I)}{\omega_m^{2}/\psi_m^{2}}
= \frac{m (m^{2} - 3m + 3)}{(m - 1)(m - 2)(m - 3)}
+ \frac{3 m S_{0m}^{2} - m^{2} S_{2m}}{(m - 1)(m - 2)(m - 3) S_{1m}}
+ \frac{\tilde b_{4m}}{\tilde b_{2m}^{2}} \,
  \frac{(m^{2} - m) S_{1m} - 2 m S_{2m} + 6 S_{0m}^{2}}{(m - 1)(m - 2)(m - 3) S_{1m}}
- \frac{S_{0m}^{2}}{(m - 1)^{2} S_{1m}} .
\]

Conditions (C2) and (C3) imply that the second term and the last term go to 0 as $m \to \infty$. The third term equals

\[
\frac{\tilde b_{4m}/m}{\tilde b_{2m}^{2}} \,
\frac{(m^{2} - m) S_{1m} - 2 m S_{2m} + 6 S_{0m}^{2}}{(m - 1)(m - 2)(m - 3) S_{1m}} .
\]

Proposition 1 implies that $(m^{-1} \tilde b_{4m})/\tilde b_{2m}^{2}$ goes to 0 in probability as $m \to \infty$. Since Conditions (C1) and (C2) also imply

\[
\lim_{m \to \infty}
\frac{(m^{2} - m) S_{1m} - 2 m S_{2m} + 6 S_{0m}^{2}}{(m - 1)(m - 2)(m - 3) S_{1m}} = 0,
\]

the third term goes to 0 in probability. Thus $V_R(\sqrt{m}\, I)/(\omega_m^{2}/\psi_m^{2}) \xrightarrow{P} 1$ as $m \to \infty$. Consequently, we have $(Z_R - \theta_{mR})/\tau_{mR} \xrightarrow{L} N(0, 1)$ as $m \to \infty$. Moreover, if Condition (C5) holds, we have $Z_R \xrightarrow{L} N(\theta/[\omega(1 + \eta)], \tau^{2}/[\omega^{2}(1 + \eta)^{2}])$ as $m \to \infty$. □

References

Arbia, G., 1989. Spatial data configuration in the statistical analysis of regional economic and related problems. In: Advanced Studies in Theoretical and
Applied Economics, Vol. 14. Kluwer Academic Publishers, Boston.
Arbia, G., 2014. A Primer for Spatial Econometrics with Applications in R. Palgrave Macmillan, New York.
Assuncao, R., Reis, E., 1999. A new proposal to adjust Moran’s I for population density. Stat. Med. 18, 2147–2162.
Besag, J., Newell, J., 1991. The detection of clusters in rare diseases. J. Roy. Statist. Soc. Ser. A 154, 143–155.
Best, N., Ickstadt, K., Wolpert, R., 2000. Spatial Poisson regression for health and exposure data measured at disparate resolutions. J. Amer. Statist. Assoc.
95, 1076–1088.
Billingsley, P., 1995. Probability and Measure. Wiley, New York.
Cliff, A.D., Ord, J.K., 1981. Spatial Processes: Models and Applications. Pion, London.
de Jong, P., 1987. A central limit theorem for generalized quadratic forms. Probab. Theory Related Fields 75, 261–277.
de Wet, Y., Venter, J.H., 1973. Asymptotic distribution for quadratic form with application to test of fit. Ann. Statist. 1, 380–387.
Geary, R.C., 1954. The contiguity ratio and statistical mapping. Inc. Statist. 5, 115–145.
Getis, A., Ord, J., 1992. The analysis of spatial association by use of distance statistics. Geogr. Anal. 24, 189–206.
Giraitis, L., Taqqu, M., 1998. Central limit theorems for quadratic forms with time-domain conditions. Ann. Probab. 26, 377–398.
Gneiting, T., 2002. Nonseparable, stationary covariance functions for space–time data. J. Amer. Statist. Assoc. 97, 590–600.
Goodchild, M., Anselin, L., Appelbaum, R., Harthorn, B., 2000. Toward spatially integrated social science. Int. Reg. Sci. Rev. 23, 139–159.
Green, P., Richardson, S., 2002. Hidden Markov models and disease mapping. J. Amer. Statist. Assoc. 96, 1055–1070.
Griffith, D.A., 1988. Advanced Spatial Statistics. In: Advanced Studies in Theoretical and Applied Econometrics, Kluwer Academic Publishers, Boston.
Kelejian, H., Prucha, I., 2001. On the asymptotic distribution of Moran I test statistic with applications. J. Econometrics 104, 219–257.
Kelsall, J., Wakefield, J., 2002. Modeling spatial variation in disease risk: a geostatistical approach. J. Amer. Statist. Assoc. 97, 692–701.
Mikosch, T., 1991. Functional limit theorems for random quadratic forms. Stochastic Process. Appl. 37, 81–98.
Mollison, D., Isham, V., Grenfell, B., 1994. Epidemics: models and data. J. Roy. Statist. Soc. Ser. A 157, 115–149.
Moran, P.A.P., 1948. The interpretation of statistical maps. J. R. Stat. Soc. Ser. B 10, 243–251.
Oden, N., 1995. Adjusting Moran’s I for population density. Stat. Med. 14, 17–26.
Ord, K., 1975. Estimation methods for models of spatial interaction. J. Amer. Statist. Assoc. 70, 120–126.
Pickle, L., Waller, L., Lawson, L., 2005. Current practices in cancer spatial data analysis: a call for guidance. Int. J. Health Geograph. 4, 1–5.
Schmoyer, R.L., 1994. Permutation tests for correlation in regression errors. J. Amer. Statist. Assoc. 89, 1507–1516.
Sen, A., 1976. Large sample-size distribution of statistics used in testing for spatial correlation. Geogr. Anal. 9, 175–184.
Stein, M., 2005. Space–time covariance functions. J. Amer. Statist. Assoc. 100, 310–321.
Wakefield, J., Morris, S., 2001. The Bayesian modeling of disease risk in relation to a point source. J. Amer. Statist. Assoc. 96, 77–91.
Waldhor, T., 1996. The spatial autocorrelation coefficient Moran’s I under heteroscedasticity. Stat. Med. 15, 887–892.
Waller, L., Carlin, B., Xia, H., Gelfand, A., 1997. Hierarchical spatio-temporal mapping of disease rates. J. Amer. Statist. Assoc. 92, 607–617.
Walter, S.D., 1992. The analysis of regional patterns in health data. Am. J. Epidemiol. 136, 730–741.
Whittemore, A., Friend, N., Brown, B., Holly, E., 1987. A test to detect clusters of disease. Biometrika 74, 631–635.
Whittle, P., 1964. Convergence of quadratic forms in independent random variables. Theory Probab. Appl. 9, 103–108.
Zhang, T., Lin, G., 2007. A decomposition of Moran’s I for clustering detection. Comput. Statist. Data Anal. 51, 6123–6137.
Zhang, T., Lin, G., 2009. Cluster detection based on spatial association and iterated residuals in generalized linear mixed models. Biometrics 65, 353–360.
