Abstract
Colocalization analyses assess whether two traits are affected by the same or distinct causal genetic variants in a single gene region. A class of Bayesian colocalization tests (Giambartolomei et al., 2014; Wallace, 2021) is now routinely used in practice; for example, for genetic analyses in drug development pipelines. In this work, we consider an alternative frequentist approach to colocalization testing that examines the proportionality of genetic associations with each trait. The proportional colocalization approach uses markedly different assumptions to Bayesian colocalization tests, and therefore can provide valuable complementary evidence in cases where Bayesian colocalization results are inconclusive or sensitive to priors. We propose a novel conditional test of proportional colocalization, prop-coloc-cond, that aims to account for the uncertainty in variant selection in order to recover accurate type I error control. The test can be implemented straightforwardly, requiring only summary data on genetic associations. Simulation evidence and an empirical investigation into GLP1R gene expression demonstrate how tests of proportional colocalization can offer important insights in conjunction with Bayesian colocalization tests.
1 Introduction
Genome-wide association studies (GWASs) have revealed that a large number of genetic
variants are robustly associated with a wide range of diseases and disease traits. This has
motivated the development of post-GWAS methods that use genetic association data to in-
vestigate causal mechanisms of potentially related traits. Colocalization analyses specifically
aim to identify whether or not two traits are affected by the same or distinct causal variants
in a single genetic region; the two traits are said to colocalize if they share the same causal
variants. The statistical challenge of such analyses is to appropriately account for issues
that could otherwise lead to false positives (e.g. falsely concluding that the traits colocalize
when they do not), such as sampling uncertainty in measured genetic associations, and the
correlation between genetic variants.
∗ Ashish Patel; John Whittaker; Stephen Burgess.
To date, most of the popular methods for colocalization analyses can be placed into one
of two categories: enumeration colocalization and proportional colocalization. Enumeration
colocalization methods (Giambartolomei et al., 2014; Wallace, 2020) evaluate the evidence
for colocalization from a Bayesian perspective, and return posterior probabilities of colocal-
ization and other competing hypotheses. The methods require investigators to specify a prior
probability for colocalization, which is typically set to limit the potential for false positives.
The key features of this approach are that the analyses can be performed using only sum-
mary data that is typically shared in GWASs, and that the methods conclude that there is
colocalization only if there is evidence to refute the prior of no colocalization.
In contrast, proportional colocalization approaches (Plagnol et al., 2009; Wallace, 2013) are
frequentist methods that test a null hypothesis of proportional colocalization, a particular
type of colocalization where the associations of causal variants with one trait are proportional
to their associations with the other trait (Figure 1). Aside from testing for only a specific
type of colocalization, the proportional testing approach also differs from the enumeration
approach in that the null hypothesis is that the two traits colocalize. Therefore, whereas
the enumeration colocalization approach effectively looks for evidence against the prior of
no colocalization, the proportional testing approach looks for evidence against the null of
proportional colocalization. Note that to credibly accept the null hypothesis of proportional
colocalization we require that the proportional colocalization test has high power.
Figure 1. Measured genetic associations in a single gene region in proportional, non-proportional, and no
colocalization settings. The red curve corresponds to trait 1, and the blue curve corresponds to trait 2.
While the enumeration colocalization approach more directly evaluates evidence for the gen-
eral colocalization hypothesis, the proportional colocalization hypothesis is at times of in-
dependent interest. The proportionality of genetic associations with each trait would be
consistent with one trait mediating the other. Moreover, proportional colocalization tests
are closely related to overidentification tests (Hansen, 1982) and conditional F-tests (Sander-
son and Windmeijer, 2016) which are routinely used to indicate sources of possible model
mis-specification in drug target cis-Mendelian randomization analyses (Burgess et al., 2023).
An important decision when performing a proportional colocalization test concerns variant
selection. The test effectively considers the totality of evidence across all included variants,
and therefore aims to be sensitive to even slight departures from proportionality of weak
variant associations. In practice, this results in a tendency to reject the proportional colo-
calization hypothesis when an included variant is only weakly associated with one trait, even
when the two traits share another strongly associated causal variant. Enumeration colocal-
ization tests are less sensitive in this regard since they aim to compare colocalization evidence
of only strongly associated variants.
In order to avoid sensitivity to weak variant associations, Wallace (2013) computes the pro-
portional colocalization test with lead variants (the variants most strongly associated with
each trait), and highlights the heavily inflated type I error rates that can result from ignoring
the uncertainty in variant selection. In this work, we aim to mitigate this type I error problem
through conditional inference techniques. Specifically, we propose a novel test of proportional
colocalization based on lead variants that conditions on the selection event (Fithian et al.,
2017), and hence accounts for the uncertainty in variant selection.
More generally, we discuss how subtle differences in underlying models of genetic variant-trait
effects and definitions of colocalization can result in differing evidence between and within
proportional and enumeration colocalization approaches. Since they work with fundamen-
tally different assumptions, combining the evidence provided by the two approaches offers
the potential to make more robust inferences than if the evidence from only one approach is
considered. We demonstrate this practice by comparing colocalization evidence in an empir-
ical application involving gene expression in different tissues in the GLP1R gene region. R
code to apply our methods is available at github.com/ash-res/prop-coloc/.
2 Methods
Two traits are said to proportionately colocalize if: (i) they share the same causal genetic
variants in a single gene region; and (ii) the trait associations with those causal variants are
proportional to each other. When there is only one causal variant for both traits, genetic
associations with two traits that colocalize are necessarily proportional, and hence there is no
distinction between colocalization and proportional colocalization. But in the general case
of multiple causal variants, proportional colocalization is a specific type of colocalization, as
highlighted in the two leftmost plots in Figure 1.
We are interested in testing the null hypothesis that two traits X1 and X2 proportionately
colocalize. Let Z = (Z1, . . . , ZJ)′ denote a J-vector of genetic variants that are possibly causal
for X1 and/or X2 . The genetic variants may be moderately correlated, but the variance-
covariance matrix of Z, var(Z), is assumed to be invertible.
For each trait k, we consider the following linear model with homoscedastic errors,
Xk = αk + γk′ Z + Vk (1)
where $\Sigma_\gamma = \mathrm{var}(Z)^{-1} \otimes \Sigma_V$, n denotes the sample size used to measure genetic associations
with traits, and ⊗ denotes the Kronecker product. The normality assumption (2) is justified
by standard large sample arguments (Proof of Proposition 1, Supplementary Information).
The strategy of modelling joint effects of multiple genetic variants using a multivariate normal
distribution has previously been explored in fine-mapping investigations; see, for example,
Verzilli et al. (2008); Yang et al. (2012); Newcombe et al. (2016).
2.1.2 Proportional colocalization test statistic
Under some GMM regularity conditions, it can be shown that in large samples $n\hat{Q}(\hat{\eta})$ has a $\chi^2$ distribution with $J - 1$ degrees of freedom (Proposition 2, Supplementary Information). Thus, a ν-level proportional colocalization test rejects the null hypothesis H0 if $\min_\eta n\hat{Q}(\eta) > q_{J-1}(\nu)$, where $q_{J-1}(\nu)$ is the $(1 - \nu)$-th quantile of the $\chi^2$ distribution with $J - 1$ degrees of freedom.
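A minimal Python sketch of this decision rule follows. Because the definitions are not repeated in this section, it assumes that the moment function is $\hat{g}(\eta) = \hat{\gamma}_1 - \eta\hat{\gamma}_2$ with variance $\Omega(\eta) = \Sigma_{\gamma,11} - \eta(\Sigma_{\gamma,12} + \Sigma_{\gamma,12}') + \eta^2\Sigma_{\gamma,22}$. This is consistent with $G_\star = -I_\star\gamma_2$ and with the covariance matrix C used for the conditional test, but it is not copied from the authors' prop-coloc R code. The sketch also minimises over η on a fixed grid in [-5, 5], which is an illustrative choice; a constant outside the grid would need a wider grid or a numerical optimiser. The names `chi2_sf`, `omega` and `prop_coloc_full` are hypothetical. The χ² upper-tail probability is computed from the standard series and continued-fraction expansions of the regularized incomplete gamma function, so only NumPy and the standard library are needed.

```python
import math

import numpy as np


def chi2_sf(x, df, eps=1e-15, max_iter=10_000):
    """Upper-tail probability P(chi^2_df > x) from the regularized incomplete gamma
    function (series when x/2 < df/2 + 1, continued fraction otherwise)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower tail P(a, z): sum_k z^k / (a (a+1) ... (a+k)).
        term = total = 1.0 / a
        for k in range(1, max_iter):
            term *= z / (a + k)
            total += term
            if abs(term) < abs(total) * eps:
                break
        return max(0.0, 1.0 - math.exp(log_prefactor) * total)
    # Modified Lentz continued fraction for the upper tail Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_prefactor) * h


def omega(eta, S11, S12, S22):
    """Assumed variance of sqrt(n) * g_hat(eta), with g_hat(eta) = gamma1_hat - eta * gamma2_hat."""
    return S11 - eta * (S12 + S12.T) + eta ** 2 * S22


def prop_coloc_full(gamma1_hat, gamma2_hat, Sigma_gamma, n, nu=0.05,
                    eta_grid=np.linspace(-5.0, 5.0, 2001)):
    """Grid-search approximation of min_eta n * Q_hat(eta), compared with chi^2_{J-1}."""
    J = len(gamma1_hat)
    S11, S12, S22 = Sigma_gamma[:J, :J], Sigma_gamma[:J, J:], Sigma_gamma[J:, J:]

    def n_Q_hat(eta):
        g = gamma1_hat - eta * gamma2_hat
        return n * g @ np.linalg.solve(omega(eta, S11, S12, S22), g)

    values = np.array([n_Q_hat(eta) for eta in eta_grid])
    i = int(np.argmin(values))
    statistic = float(values[i])
    p_value = chi2_sf(statistic, J - 1)
    return {"eta_hat": float(eta_grid[i]), "statistic": statistic,
            "p_value": p_value, "reject": p_value < nu}
```

When the estimated associations are exactly proportional, the minimised statistic is essentially zero and the test does not reject; the p-value is then compared with ν exactly as in the rule above.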
There are at least two important considerations when choosing which genetic variants should
be included in the proportional colocalization test. First, the proportional colocalization
hypothesis H0 : γ1 = γ2 η0 requires that proportionality of trait associations hold for every
variant included in the test. As such, it considers the totality of evidence across all included
variants. The inclusion of many irrelevant variants that are not causal for either trait is shown
in Section 3.1.1 to harm the finite-sample size properties of the proportional colocalization
test.
At the same time, selecting variants based on their measured associations with either trait
can also lead to inflated type I error rates if the sampling uncertainty in variant selection is
not accounted for. When considering only the top 10 associated variants with each trait, we
find that the type I error rate of the proportional colocalization test can be over 50% for a
5% level test (Supplementary Figures S1 and S2). Although some variant selection is usually
necessary, it is seemingly difficult to make good inferences in finite samples without adjusting
for variant selection.
Second, since the proportional colocalization test statistic is a function of the inverse of the
variance-covariance matrix of genetic variants, the test can be numerically unstable when
very highly correlated variants are included. In general, there are no obvious guidelines on
how strongly correlated the selected variants should be allowed to be. On the one hand,
allowing variant correlations up to only, say, an R² ≤ 0.4 threshold runs a higher risk of
omitting true causal variants from the analysis. On the other hand, allowing highly correlated
variants up to an R² ≤ 0.9 threshold may be problematic, especially if the available genetic correlation
estimates are not representative of the sample used to compute genetic associations with
traits. For example, this may be due to excessive uncertainty in estimating genetic corre-
lations, or if genetic correlation estimates are based on individuals of different ethnicity to
those represented in measured genetic variant–trait associations.
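To make the numerical-stability point concrete, the sketch below computes the condition number of the 2 × 2 correlation matrix of a single pair of variants at the two thresholds mentioned above. For this matrix the condition number has the closed form (1 + r)/(1 − r) with r = √R²; a larger condition number means that small errors in the correlation estimates can be amplified more when the matrix is inverted. This is an illustration only, not part of the proposed method, and the function name is hypothetical.

```python
import numpy as np


def pairwise_condition_number(r2):
    """2-norm condition number of [[1, r], [r, 1]] with r = sqrt(r2)."""
    r = np.sqrt(r2)
    R = np.array([[1.0, r], [r, 1.0]])
    return float(np.linalg.cond(R))


for r2 in (0.4, 0.9):
    r = np.sqrt(r2)
    print(f"R^2 = {r2}: condition number = {pairwise_condition_number(r2):.2f}"
          f" (closed form {(1 + r) / (1 - r):.2f})")
```

Running it gives condition numbers of about 4.44 at R² = 0.4 and about 37.97 at R² = 0.9, roughly an eightfold increase for a single pair of variants.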
We now revisit the strategy of Wallace (2013), which tests the proportionality of trait associa-
tions using only two lead variants. The lead variants are selected based on their multivariable
associations with traits (i.e. based on Xk on Z = (Z1 , . . . , ZJ )′ linear regressions rather than
univariable Xk on Zj linear regressions), and hence the measured association of any given
variant is adjusted for all other variants. To do this, we start by using the normality as-
sumption (2) to construct simple t-statistics to judge the relevance of a given variant for each
trait.
For each trait k, let $D_{\gamma,k}$ be the J × J diagonal matrix whose (j, j)-th element is given by $(\Sigma_{\gamma,kk})_{jj}^{-1/2}$, where $(\Sigma_{\gamma,kk})_{jj}$ is the (j, j)-th element of $\Sigma_{\gamma,kk}$. Next, let $D_\gamma$ be the 2J × 2J diagonal matrix whose top-left J × J block is equal to $D_{\gamma,1}$ and whose bottom-right J × J block is equal to $D_{\gamma,2}$. Then, $\hat{T} = (\hat{T}_1', \hat{T}_2')' = D_\gamma\sqrt{n}\,(\hat{\gamma}_1', \hat{\gamma}_2')'$ denotes a 2J-vector of t-statistics, where $\hat{T}_k$ is the J-vector of t-statistics corresponding to trait k (k = 1, 2).
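The t-statistics can be computed directly from the estimated joint associations and $\Sigma_\gamma$. The sketch below assumes that $\hat{\gamma}$ is stacked as trait 1 followed by trait 2, matching the block structure of $D_\gamma$ described above; the function name `t_statistics` is hypothetical.

```python
import numpy as np


def t_statistics(gamma1_hat, gamma2_hat, Sigma_gamma, n):
    """Return (T1_hat, T2_hat), where T_hat = D_gamma * sqrt(n) * (gamma1_hat', gamma2_hat')'.

    Sigma_gamma is the 2J x 2J asymptotic covariance of sqrt(n) * gamma_hat,
    with the trait-1 coefficients in the first J positions.
    """
    gamma_hat = np.concatenate([gamma1_hat, gamma2_hat])
    D_gamma = np.diag(1.0 / np.sqrt(np.diag(Sigma_gamma)))  # inverse standard deviations
    T_hat = D_gamma @ (np.sqrt(n) * gamma_hat)
    J = len(gamma1_hat)
    return T_hat[:J], T_hat[J:]
```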
We first select the $j^\star$-th variant that is most strongly associated with trait 1, so that $|\hat{T}_{1j^\star}| \ge |\hat{T}_{1j}|$ for all variants j = 1, . . . , J, where, for example, $\hat{T}_{1j}$ is the t-statistic of the association of the j-th variant with trait 1. Then, from the remaining variants (excluding the $j^\star$-th variant), we select the $j^{\star\star}$-th variant that is most strongly associated with trait 2, so that $|\hat{T}_{2j^{\star\star}}| \ge |\hat{T}_{2j}|$ for all $j \in \{1, \dots, J\} \setminus \{j^\star\}$, with $j^{\star\star} \neq j^\star$. We note that this procedure is not symmetric in the two traits (non-commutativity): if both traits share the same strongest variant, the second strongest variant for trait 2 may not be the same as the second strongest variant for trait 1.
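The two-step lead-variant selection can be sketched as follows, using 0-based indices (the function name is hypothetical). Calling it with the two traits swapped illustrates the non-commutativity noted above: with the made-up t-statistics below, the second selected variant changes.

```python
import numpy as np


def select_lead_variants(T1_hat, T2_hat):
    """Return (j_star, j_star2): the lead variant for trait 1, then the lead
    variant for trait 2 among the remaining variants (0-based indices)."""
    j_star = int(np.argmax(np.abs(T1_hat)))
    abs_T2 = np.abs(np.asarray(T2_hat, dtype=float))
    abs_T2[j_star] = -np.inf          # exclude the trait-1 lead variant
    j_star2 = int(np.argmax(abs_T2))
    return j_star, j_star2


T1_hat = np.array([3.0, -5.0, 1.0])
T2_hat = np.array([0.5, 6.0, -2.0])
print(select_lead_variants(T1_hat, T2_hat))   # (1, 2)
print(select_lead_variants(T2_hat, T1_hat))   # (1, 0): trait roles reversed
```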
To compute the proportional colocalization test statistic based on only these two selected variants, let $I_\star$ denote the 2 × J matrix with its $(1, j^\star)$-th and $(2, j^{\star\star})$-th elements equal to 1, and all other elements equal to 0. Then, as in Section 2.1.2, we can define the GMM quantities $\hat{g}_\star(\eta) = I_\star\hat{g}(\eta)$, $\Omega_\star(\eta) = I_\star\Omega(\eta)I_\star'$, and $\hat{Q}_\star(\eta) = \hat{g}_\star(\eta)'\Omega_\star(\eta)^{-1}\hat{g}_\star(\eta)$. An estimator of the proportionality constant is $\hat{\eta}_\star = \arg\min_\eta \hat{Q}_\star(\eta)$, and a naive ν-level proportional colocalization test that ignores the uncertainty in the variant selection step simply compares the statistic $n\hat{Q}_\star(\hat{\eta}_\star)$ against the $(1 - \nu)$-th quantile $q_1(\nu)$ of the $\chi^2$ distribution with 1 degree of freedom. We refer to this test as prop-coloc-naive.
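A sketch of prop-coloc-naive follows. As the moment function and its variance are not restated in this section, it assumes $\hat{g}(\eta) = \hat{\gamma}_1 - \eta\hat{\gamma}_2$ and $\Omega(\eta) = \Sigma_{\gamma,11} - \eta(\Sigma_{\gamma,12} + \Sigma_{\gamma,12}') + \eta^2\Sigma_{\gamma,22}$, and it minimises over an illustrative grid of η values. The critical value uses the fact that a $\chi^2_1$ variable is the square of a standard normal, so $q_1(\nu) = z_{1-\nu/2}^2$. Function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def chi2_1_quantile(nu):
    """(1 - nu)-th quantile of chi^2_1, using chi^2_1 = Z^2 with Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - nu / 2.0) ** 2


def prop_coloc_naive(gamma1_hat, gamma2_hat, Sigma_gamma, n, j_star, j_star2,
                     nu=0.05, eta_grid=np.linspace(-5.0, 5.0, 2001)):
    """Two-variant statistic n * Q_star(eta_hat_star) against q_1(nu), ignoring selection."""
    J = len(gamma1_hat)
    S11, S12, S22 = Sigma_gamma[:J, :J], Sigma_gamma[:J, J:], Sigma_gamma[J:, J:]
    I_star = np.zeros((2, J))
    I_star[0, j_star] = 1.0
    I_star[1, j_star2] = 1.0

    def n_Q_star(eta):
        g_star = I_star @ (gamma1_hat - eta * gamma2_hat)
        Omega = S11 - eta * (S12 + S12.T) + eta ** 2 * S22
        Omega_star = I_star @ Omega @ I_star.T
        return n * g_star @ np.linalg.solve(Omega_star, g_star)

    values = np.array([n_Q_star(eta) for eta in eta_grid])
    i = int(np.argmin(values))
    statistic, critical_value = float(values[i]), chi2_1_quantile(nu)
    return {"eta_hat": float(eta_grid[i]), "statistic": statistic,
            "critical_value": critical_value, "reject": statistic > critical_value}
```

For ν = 0.05 the critical value is about 3.84. Note that the indices j_star and j_star2 are taken as fixed inputs here, which is exactly the simplification that makes this test naive.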
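If there were no uncertainty in variant selection, the limiting $\chi^2$ distribution of the proportional colocalization test statistic $n\hat{Q}_\star(\hat{\eta}_\star)$ would be established by noting that $n\hat{Q}_\star(\hat{\eta}_\star)$ is closely approximated in large samples by
$$\left(\Omega_\star^{-1/2}\sqrt{n}\,\hat{g}_\star(\eta_0)\right)' M_\star \left(\Omega_\star^{-1/2}\sqrt{n}\,\hat{g}_\star(\eta_0)\right),$$
where $\Omega_\star = \Omega_\star(\eta_0)$, $G_\star = -I_\star\gamma_2$, and $M_\star = I_{2\times2} - \Omega_\star^{-1/2}G_\star(G_\star'\Omega_\star^{-1}G_\star)^{-1}G_\star'\Omega_\star^{-1/2}$ is an idempotent matrix of rank 1 (Proof of Proposition 2, Supplementary Information). Under H0 and the normality assumption (2), $\Omega_\star^{-1/2}\sqrt{n}\,\hat{g}_\star(\eta_0)$ is distributed as a $N(0_{2\times1}, I_{2\times2})$ random variable, so that in large samples the statistic $\left(\Omega_\star^{-1/2}\sqrt{n}\,\hat{g}_\star(\eta_0)\right)' M_\star \left(\Omega_\star^{-1/2}\sqrt{n}\,\hat{g}_\star(\eta_0)\right)$ converges to a $\chi^2$ random variable with 1 degree of freedom.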
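The claim that $M_\star$ is idempotent with rank 1, so that $K'M_\star K$ is $\chi^2_1$ when K is standard bivariate normal, can be checked numerically. The values of $\Omega_\star$ and $G_\star$ below are arbitrary illustrative numbers, not estimates from the paper, and the function names are hypothetical.

```python
import numpy as np


def inv_sqrt_pd(A):
    """Symmetric inverse square root A^{-1/2} of a positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(w ** -0.5) @ V.T


def projection_M_star(Omega_star, G_star):
    """M = I - A (A'A)^{-1} A' with A = Omega_star^{-1/2} G_star, so A'A = G' Omega^{-1} G."""
    A = inv_sqrt_pd(Omega_star) @ G_star              # 2 x 1
    return np.eye(2) - A @ np.linalg.solve(A.T @ A, A.T)


Omega_star = np.array([[1.0, 0.3], [0.3, 2.0]])
G_star = -np.array([[0.8], [0.5]])                    # plays the role of -I_star gamma_2
M = projection_M_star(Omega_star, G_star)
print(np.allclose(M @ M, M), round(float(np.trace(M)), 10))   # True 1.0

rng = np.random.default_rng(1)
K = rng.standard_normal((200_000, 2))
quad = np.einsum("ij,jk,ik->i", K, M, K)              # K' M K for each draw
print(quad.mean())                                    # close to 1, the chi^2_1 mean
```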
To consider how the uncertainty in variant selection may affect the distribution of the proportional colocalization test statistic, we model the dependence of the statistic $\Omega_\star^{-1/2}\sqrt{n}\,\hat{g}_\star(\eta_0)$ on the vector of t-statistics $\hat{T}$, which determines variant selection. Specifically, we assume that under $H_0: \gamma_1 = \gamma_2\eta_0$, the normalized statistics $\Omega_\star^{-1/2}\sqrt{n}\,\hat{g}_\star(\eta_0)$ and $\hat{T}$ are jointly normal with covariance $\mathrm{cov}(\Omega_\star^{-1/2}\sqrt{n}\,\hat{g}_\star(\eta_0), \hat{T}) = C_\star$, where $C_\star = \Omega_\star^{-1/2}I_\star C D_\gamma$ and $C = [\Sigma_{\gamma,11} - \Sigma_{\gamma,12}'\eta_0,\ \Sigma_{\gamma,12} - \Sigma_{\gamma,22}\eta_0]$. Since $\Omega_\star^{-1/2}\sqrt{n}\,\hat{g}_\star(\eta_0)$ has mean zero under H0, the joint distribution of $\Omega_\star^{-1/2}\sqrt{n}\,\hat{g}_\star(\eta_0)$ and $\hat{T}$ then depends only on $C_\star$ and the mean of $\hat{T}$, which we denote by T.
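Since T is unknown, we choose to condition on a sufficient statistic L for T, where $L = \hat{T} - C_\star'\Omega_\star^{-1/2}\sqrt{n}\,\hat{g}_\star(\eta_0)$. Note that this statistic is uncorrelated with $\Omega_\star^{-1/2}\sqrt{n}\,\hat{g}_\star(\eta_0)$, and that under H0,
$$\left.\begin{pmatrix} \Omega_\star^{-1/2}\sqrt{n}\,\hat{g}_\star(\eta_0) \\ \hat{T} \end{pmatrix}\right| (L = \ell) \sim \begin{pmatrix} K \\ \ell + C_\star' K \end{pmatrix}, \quad \text{where } K \sim N(0_{2\times1}, I_{2\times2}). \qquad (3)$$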
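The quantities $C_\star$ and L can be computed as in the sketch below. It assumes $\hat{g}(\eta) = \hat{\gamma}_1 - \eta\hat{\gamma}_2$ with $\Omega(\eta) = \Sigma_{\gamma,11} - \eta(\Sigma_{\gamma,12} + \Sigma_{\gamma,12}') + \eta^2\Sigma_{\gamma,22}$, so that C is the covariance of $\sqrt{n}\,\hat{g}(\eta_0)$ with $\sqrt{n}\,\hat{\gamma}$; these forms are not restated in this section, and the function names are hypothetical. The function accepts either one set of estimates or a batch of simulated estimates (one row per draw), which makes it easy to check by Monte Carlo that L is uncorrelated with the normalized moment vector.

```python
import numpy as np


def inv_sqrt_pd(A):
    """Symmetric inverse square root A^{-1/2} of a positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(w ** -0.5) @ V.T


def conditioning_statistic(gamma1_hat, gamma2_hat, Sigma_gamma, n, eta0, j_star, j_star2):
    """Return (C_star, X, L) with X = Omega_star^{-1/2} sqrt(n) g_star(eta0) and
    L = T_hat - C_star' X. Inputs may be 1-D (one dataset) or 2-D (one row per draw)."""
    J = Sigma_gamma.shape[0] // 2
    S11, S12, S22 = Sigma_gamma[:J, :J], Sigma_gamma[:J, J:], Sigma_gamma[J:, J:]
    I_star = np.zeros((2, J))
    I_star[0, j_star], I_star[1, j_star2] = 1.0, 1.0
    Omega = S11 - eta0 * (S12 + S12.T) + eta0 ** 2 * S22
    Om_is = inv_sqrt_pd(I_star @ Omega @ I_star.T)            # Omega_star^{-1/2}
    C = np.hstack([S11 - S12.T * eta0, S12 - S22 * eta0])      # J x 2J
    d = 1.0 / np.sqrt(np.diag(Sigma_gamma))                    # diagonal of D_gamma
    C_star = Om_is @ I_star @ C * d                            # Omega_star^{-1/2} I_star C D_gamma
    g = np.sqrt(n) * (gamma1_hat - eta0 * gamma2_hat)          # sqrt(n) g_hat(eta0)
    X = (g @ I_star.T) @ Om_is                                 # row-wise Omega_star^{-1/2} I_star g
    T_hat = np.sqrt(n) * np.concatenate([gamma1_hat, gamma2_hat], axis=-1) * d
    L = T_hat - X @ C_star                                     # row-wise T_hat - C_star' X
    return C_star, X, L
```

For example, drawing $\hat{\gamma}$ repeatedly from its normal approximation under H0 and computing the sample covariance of (X, L) should give approximately the identity for the X block and approximately zero for the cross-covariance block.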
Equation (3) provides the basis to approximate the conditional distribution of the proportional colocalization test statistic, and hence to adjust inferences for the uncertainty in variant selection. In particular, we condition on the variant selection event (Fithian et al., 2017), which is given by
$$S = \bigcap_{j\in[J]}\{|\hat{T}_{1j^\star}| \ge |\hat{T}_{1j}|\} \cap \bigcap_{j\in[J]\setminus\{j^\star\}}\{|\hat{T}_{2j^{\star\star}}| \ge |\hat{T}_{2j}|\}.$$
Therefore, in contrast with the naive test described in Section 2.2.1, the conditional test does not compare the statistic $n\hat{Q}_\star(\hat{\eta}_\star)$ against $q_1(\nu)$, but rather against a critical value $w_\nu^\star$ such that $P_{H_0}(n\hat{Q}_\star(\hat{\eta}_\star) \le w_\nu^\star \mid S, L = \ell) = 1 - \nu$. If $n\hat{Q}_\star(\hat{\eta}_\star) > w_\nu^\star$, then we reject the null hypothesis of proportional colocalization. We refer to this method as prop-coloc-cond.
In order to implement the prop-coloc-cond test in practice, we can use (3) to approximate the conditional distribution $P_{H_0}(n\hat{Q}_\star(\hat{\eta}_\star) \le w \mid S, L = \ell)$, which can then be used to find the critical value $w_\nu^\star$. Let $\ell_K = \hat{T} \mid (L = \ell)$, and, as with the 2J-vector of t-statistics $\hat{T} = (\hat{T}_1', \hat{T}_2')'$, partition $\ell_K = \ell + C_\star'K$ into $\ell_K = (\ell_{K,1}', \ell_{K,2}')'$, where $\ell_{K,k} = (\ell_{K,k1}, \dots, \ell_{K,kJ})'$ for each trait k = 1, 2. Then, the conditional distribution is approximately
$$\frac{P\left(\{K'M_\star K \le w\} \cap \bigcap_{j\in[J]}\{|\ell_{K,1j^\star}| \ge |\ell_{K,1j}|\} \cap \bigcap_{j\in[J]\setminus\{j^\star\}}\{|\ell_{K,2j^{\star\star}}| \ge |\ell_{K,2j}|\}\right)}{P\left(\bigcap_{j\in[J]}\{|\ell_{K,1j^\star}| \ge |\ell_{K,1j}|\} \cap \bigcap_{j\in[J]\setminus\{j^\star\}}\{|\ell_{K,2j^{\star\star}}| \ge |\ell_{K,2j}|\}\right)}. \qquad (4)$$
We can estimate (4) by taking many draws of $K \sim N(0_{2\times1}, I_{2\times2})$, and in practice we evaluate the conditional distribution at the fitted value $\ell = \hat{T} - C_\star'\Omega_\star^{-1/2}\sqrt{n}\,\hat{g}_\star(\hat{\eta}_\star)$. In a sparse effects
setting where it is obvious which are the lead variants, there should be little uncertainty
in variant selection and hence the conditional critical value wν⋆ should be very close to the
standard critical value q1 (ν) based on the (1 − ν)-th quantile of the χ2 distribution with 1
degree of freedom. However, in more general cases, the conditional critical value will adjust
to accommodate the sampling uncertainty in determining the lead variants.
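Estimating (4) by simulation amounts to rejection sampling: draw K, keep the draws for which $\ell_K$ reproduces the observed selection event, and take the $(1 - \nu)$ quantile of $K'M_\star K$ among the kept draws. A minimal sketch, assuming ℓ, $C_\star$ and $M_\star$ have already been computed (in practice evaluated at $\hat{\eta}_\star$); the function name and the default number of draws are illustrative choices, not values from the paper.

```python
import numpy as np


def conditional_critical_value(ell, C_star, M_star, j_star, j_star2, nu=0.05,
                               n_draws=200_000, rng=None):
    """Monte Carlo estimate of the conditional critical value w*_nu from ratio (4).

    ell: conditioning value (length 2J, trait-1 block first); C_star: 2 x 2J;
    M_star: 2 x 2; j_star, j_star2: 0-based indices of the selected lead variants.
    """
    rng = np.random.default_rng(rng)
    J = len(ell) // 2
    K = rng.standard_normal((n_draws, 2))            # draws of K ~ N(0, I_2)
    ell_K = ell + K @ C_star                          # each row is ell + C_star' K
    a1, a2 = np.abs(ell_K[:, :J]), np.abs(ell_K[:, J:])
    # Selection event: j_star leads for trait 1 among all variants ...
    keep1 = (a1[:, [j_star]] >= a1).all(axis=1)
    # ... and j_star2 leads for trait 2 among all variants except j_star.
    others = np.delete(np.arange(J), j_star)
    keep2 = (a2[:, [j_star2]] >= a2[:, others]).all(axis=1)
    selected = keep1 & keep2
    if not selected.any():
        raise ValueError("no draw reproduced the selection event; increase n_draws")
    quad = np.einsum("ij,jk,ik->i", K, M_star, K)    # K' M_star K for each draw
    return float(np.quantile(quad[selected], 1.0 - nu))
```

If $C_\star = 0$ the selection event does not depend on K, so (provided ℓ itself satisfies the selection event) every draw is kept and the estimate reduces to the unconditional quantile $q_1(\nu) \approx 3.84$ for ν = 0.05, mirroring the sparse-effects case described above.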
and, taking variant selection as given, the statistic converges to a $\chi^2$ random variable with
1 degree of freedom as n → ∞ (Proposition 3, Supplementary Information). Therefore, a
ν-level test of $H_{0,\eta}$ compares the statistic LM with $q_1(\nu)$, the $(1 - \nu)$-th quantile of the $\chi^2$
distribution with 1 degree of freedom. If LM > $q_1(\nu)$, then we reject $H_{0,\eta}$ and conclude that there is
evidence for a non-zero proportionality constant, and hence for the presence of a causal variant
for trait 1. Although this inference procedure does not explicitly account for variant selection,
ignoring selection appears to have a negligible impact on the size performance of the test. Supplementary
Figures S3 and S4 verify that this test accurately controls type I error rates, and has power
to detect a non-zero proportionality constant in small and large samples.
Before presenting a more comprehensive simulation study, we briefly discuss some toy numeri-
cal examples of scenarios where we may expect differences between the results of proportional
and enumeration colocalization approaches.
We note the results of two enumeration colocalization tests. First, the “coloc” (Giambar-
tolomei et al., 2014; Wallace, 2020) test is based on an assumption that there is at most
a single causal variant for each trait, and provides posterior probabilities of colocalization
and other competing hypotheses. Specifically, under “H0” there is no causal variant for either
trait, under “H1” there is a causal variant only for trait 1, under “H2” there is a causal variant
only for trait 2, and under “H3” each trait has a different causal variant. The hypothesis of
interest is “H4”, which is that the two traits have a shared causal variant. Second, we also
consider a recently proposed enumeration colocalization method that combines coloc with a
Sum of Single Effects (“coloc-SuSiE”; Wallace, 2021) regression framework; the method has
built-in variant selection, and can report the evidence for colocalization at multiple variant
loci.
We compute coloc and coloc-SuSiE with default priors using the coloc R package. For coloc,
we take a posterior probability of H4 lower than 0.5 to conclude evidence of no colocalization,
and for coloc-SuSiE we take a posterior probability of H4 lower than 0.5 at all variant loci
to conclude evidence of no colocalization.
These enumeration colocalization approaches are compared with two versions of proportional
colocalization tests: “prop-coloc-full” computes the proportional colocalization test discussed
in Section 2.1.2, comparing the proportionality of all variant associations included in the
test, and “prop-coloc-cond” computes the conditional test of proportional colocalization of
lead variant associations discussed in Section 2.2.2. The nominal size of the proportional
colocalization tests was set at 0.05.
We generated data based on the linear model (1) in Section 2.1.1 for 40 uncorrelated genetic
variants, with their true effects γ1 and γ2 on the two traits plotted in Figure 2. The sample
size was set to n = 1000.
In Model 1 of Figure 2, there is only a single causal variant for trait 2, which is also causal for
trait 1, and all other variants have weak effects only on trait 1. Since the proportionality of
variant–trait associations does not hold over all variants or any two lead variants, prop-coloc-
full and prop-coloc-cond tend to reject the proportional colocalization hypothesis. There is
a shared causal variant that is strongly associated with trait 2; in this case coloc tends to
favour the colocalization hypothesis, whereas coloc-SuSiE does not.
Figure 2. Colocalization test results under three different model designs involving 40 uncorrelated variants.
These are merely illustrative examples to highlight potential scenarios where different colocalization
approaches can yield differing evidence. The nominal size of the proportional colocalization tests was
set to 0.05, and a rejection of the colocalization hypothesis using coloc and coloc-SuSiE was defined as the
posterior probability of H4 being lower than 0.5.
Model 2 is a setting of non-proportional colocalization with two causal variants that are
strongly associated with both traits. In this case, only coloc-SuSiE detects colocalization.
Finally, Model 3 is a setting where two variants are strongly associated with trait 2 and are
weakly associated with trait 1. All other variants have even weaker random effects on only
trait 1. These random effects tend to force prop-coloc-full into rejecting the proportional
colocalization hypothesis, whereas prop-coloc-cond does not since the proportionality of trait
associations holds between the first two lead variants. Since all variant effects on trait 1 are
quite small, both enumeration colocalization tests tend to favour a hypothesis of a causal
variant for trait 2 only.
Both prop-coloc-naive and prop-coloc-cond use the proportional colocalization test statistic
based on the two strongest variant associations with the two traits, but prop-coloc-cond uses
a critical value that attempts to account for the uncertainty in variant selection as described
in Section 2.2.2, whereas prop-coloc-naive uses the standard critical value based on the
χ² distribution as described in Section 2.2.1.
To aid a straightforward comparison, we primarily consider the setting of a single causal
variant for each trait, so that the proportional colocalization hypothesis with a non-zero pro-
portionality constant is exactly the same as the general colocalization hypothesis studied by
enumeration colocalization tests. As in Section 2.3, we compute the “coloc” (Giambartolomei
et al., 2014) and “coloc-SuSiE” (Wallace, 2021) tests with default priors using the coloc R
package, and we take a posterior probability of colocalization lower than 0.5 (at all variant
loci for coloc-SuSiE) to conclude evidence of no colocalization. For the three proportional
colocalization tests, the nominal size of the tests was set at ν = 0.05. The setting of multiple
causal variants is discussed in Section 3.1.3 below.
The goal of the simulation study is to investigate the finite-sample size and power properties
of proportional colocalization tests, with a particular emphasis on verifying the type I error
control of the prop-coloc-cond test.
We generated genetic association data on J = 40 genetic variants such that there is only a
single causal variant for each trait. The variants $Z = (Z_1, \dots, Z_{40})'$ were generated from the joint normal distribution $Z \sim N(0_{40\times1}, \rho)$, where ρ was set to be an invertible correlation matrix whose diagonal elements were set equal to 1 so that all variants have equal variance. The off-diagonal elements were set to be the off-diagonal elements of the matrix $aa'$ for $a = (a_1, \dots, a_{40})'$, where $a_j \sim U[0, \sqrt{\rho_0}]$ for j = 1, . . . , 40, and where $\rho_0 < 1$ is a positive constant. We set $\rho_0 = 0.8$ so that the correlation between any two distinct variants in Z was at most R² = 0.64, with the exception of causal variants. The correlation between
the causal variant for trait 1 and the causal variant for trait 2 was set equal to ξ. When
ξ = 0 the causal variants for each trait are uncorrelated, and ξ = 1 represents the case where
the traits share the same causal variants (i.e. the traits colocalize).
We generated traits from the linear model described in Section 2.1.1, where the intercepts
$(\alpha_1, \alpha_2)$ were set to be zero, and the errors of the linear model were generated as $V = (V_1, V_2) \sim N(0_{2\times1}, \Sigma_V)$, where $\Sigma_V$ is the correlation matrix such that cov(V1, V2) = 0.3. The
variant effects γ1 on trait 1 were set as 0.5 for the causal variant, and 0 for non-causal variants.
The variant effects γ2 on trait 2 were set as 0.5η0 for the causal variant, and 0 for non-causal
variants.
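The simulation design for the genetic variants and traits can be sketched as below. How the correlation ξ between the two causal variants is imposed on ρ is not described in detail here, so the sketch simply overwrites that single entry and rejects the result if it is not positive definite; this is an assumption. Indices are 0-based, and the causal-variant positions and function names are illustrative.

```python
import numpy as np


def simulate_variant_correlation(J=40, rho0=0.8, causal=(0, 1), xi=0.5, rng=None):
    """Unit-diagonal correlation matrix whose off-diagonals come from a a',
    with a_j ~ U[0, sqrt(rho0)], plus correlation xi between the causal variants."""
    rng = np.random.default_rng(rng)
    a = rng.uniform(0.0, np.sqrt(rho0), size=J)
    rho = np.outer(a, a)
    np.fill_diagonal(rho, 1.0)
    i, k = causal
    if i != k:                        # i == k is the shared-causal-variant case (xi = 1)
        rho[i, k] = rho[k, i] = xi    # ASSUMPTION: impose xi by overwriting one entry
    if np.linalg.eigvalsh(rho).min() <= 0:
        raise ValueError("matrix is not positive definite; draw a new vector a")
    return rho


def simulate_traits(rho, n, eta0, causal=(0, 1), rng=None):
    """Draw (Z, X1, X2) from linear model (1) with zero intercepts, cov(V1, V2) = 0.3,
    gamma1 = 0.5 and gamma2 = 0.5 * eta0 at the respective causal variants."""
    rng = np.random.default_rng(rng)
    J = rho.shape[0]
    Z = rng.multivariate_normal(np.zeros(J), rho, size=n)
    V = rng.multivariate_normal(np.zeros(2), [[1.0, 0.3], [0.3, 1.0]], size=n)
    gamma1, gamma2 = np.zeros(J), np.zeros(J)
    gamma1[causal[0]] = 0.5
    gamma2[causal[1]] = 0.5 * eta0
    return Z, Z @ gamma1 + V[:, 0], Z @ gamma2 + V[:, 1]
```

With $\rho_0 = 0.8$ and a large ξ, some draws of a can give a matrix that is not positive definite; the sketch then raises an error so that a new draw can be taken.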
From an n-sized sample on X1 , X2 , and Z, we generated univariable summary data on
genetic associations. For computing the coloc and proportional colocalization test results,
we used the estimated coefficients and corresponding standard errors from Xk on Zj linear
regressions for each k = 1, 2, and j = 1, . . . , 40, as well as the sample correlation matrix of
Z. In addition, proportional colocalization tests used the sample correlation between X1 and
X2 , and knowledge of the sample size n.
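A sketch of this summary-data step is given below: slope and standard error from each univariable $X_k$ on $Z_j$ regression (with an intercept), plus the sample correlations and sample size used by the tests. The function names are hypothetical, and the standard error uses the usual residual-variance formula with n − 2 degrees of freedom.

```python
import numpy as np


def univariable_summary(Z, x):
    """Slope and standard error from the simple regression of x on each column of Z."""
    n = Z.shape[0]
    Zc = Z - Z.mean(axis=0)
    xc = x - x.mean()
    sxx = (Zc ** 2).sum(axis=0)
    beta = Zc.T @ xc / sxx
    resid = xc[:, None] - Zc * beta              # residuals, one column per variant
    sigma2 = (resid ** 2).sum(axis=0) / (n - 2)
    se = np.sqrt(sigma2 / sxx)
    return beta, se


def summary_data(Z, X1, X2):
    """Inputs described in the text for the coloc and proportional colocalization tests."""
    beta1, se1 = univariable_summary(Z, X1)
    beta2, se2 = univariable_summary(Z, X2)
    return {
        "beta1": beta1, "se1": se1, "beta2": beta2, "se2": se2,
        "variant_corr": np.corrcoef(Z, rowvar=False),
        "trait_corr": float(np.corrcoef(X1, X2)[0, 1]),
        "n": Z.shape[0],
    }
```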
We consider a colocalization analysis of gene expression in different tissues in the GLP1R gene
region. Such analyses are potentially of interest when investigating causal mechanisms by
which GLP1R agonists affect disease risk (Daghlas et al., 2021; Patel et al., 2023).
Estimated genetic associations with gene expression based on n = 838 participants of mostly
European ancestry were taken from the Genotype-Tissue Expression (GTEx) project version
8 (GTEx Consortium, 2020). Estimated genetic variant correlations based on 367,703 unre-
lated participants were taken from the UK Biobank (Astle et al., 2016). We considered the
genetic region ± 100kbp of the GLP1R gene (chr6:39,016,557-39,059,079 in GRCh37/hg19)
for which 851 variant associations with GLP1R expression were available.
Our analysis studies GLP1R expression in 10 tissues that were used to fit a multivariable
model for coronary artery disease risk in Patel et al. (2023). The tissues are thyroid, testis,
stomach, pancreas, nerve, lung, left ventricle (heart), atrial appendage (heart), hypothalamus
(brain), and caudate (brain). Our colocalization analyses involve considering as traits GLP1R
expression in any 2 of these 10 tissues in turn.
For computing proportional colocalization tests, we pruned variants up to a correlation
threshold of R2 ≤ 0.6, and after this we considered only the top 10 associated variants
with each trait (measured by marginal p-values). Such selection on p-values may harm the
size properties of the prop-coloc-full test, but appears to have a relatively low impact on
the prop-coloc-cond test (Supplementary Figures S1 and S2). The trait with the strongest
variant association (lowest p-value) was selected as trait 2.
We present the results of “prop-coloc-cond-LM” which combines the results of the prop-coloc-
cond test with the results of a Lagrange multiplier (LM) test for a non-zero proportionality
constant, as discussed in Section 2.2.3. Specifically, the p-value of the prop-coloc-cond test is
presented only for trait comparisons where the LM test finds strong evidence (at the
95% confidence level) for a non-zero proportionality constant. Otherwise, the proportional
colocalization hypothesis is rejected regardless of the p-value of the prop-coloc-cond test
because the LM test suggests that there is insufficient evidence for a causal variant for trait
1. The results of “prop-coloc-full-LM” are defined analogously from combining the prop-
coloc-full and LM test results.
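The combination rule can be written as a small decision function. This sketch assumes that both tests are summarized by p-values and that "strong evidence at the 95% confidence level" corresponds to an LM p-value below 0.05; the function name is hypothetical.

```python
def prop_coloc_cond_lm(p_cond, p_lm, alpha=0.05):
    """Combine prop-coloc-cond with the LM test for a non-zero proportionality constant.

    Returns (reject_colocalization, reported_p_value). The prop-coloc-cond p-value is
    reported only when the LM test rejects eta = 0 at level alpha.
    """
    if p_lm >= alpha:
        # Insufficient evidence of a causal variant for trait 1: reject regardless.
        return True, None
    return p_cond < alpha, p_cond
```

The same function applies to prop-coloc-full-LM by passing the prop-coloc-full p-value as `p_cond`.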
The proportional colocalization tests require an input of trait correlations; the trait corre-
lations were set to 0, with the exception of the two brain tissues and the two heart tissues
which were set at 0.5. The results were not very sensitive to this choice. As in the simulation
study, the default priors were selected for the coloc and coloc-SuSiE tests.
3 Results
Figure 3 presents the type I error and power results of the colocalization tests for the single
causal variant case, varying with the sample size n (from the top row to the bottom row), the
proportionality constant η0 (from the left column to the right column), and the correlation
between the causal variants of trait 1 and trait 2 (on the x-axis of each plot).
As expected, the naive test that does not account for variant selection uncertainty can have
very poor size properties even in large sample settings; for example, the test has a type I error
rate of 0.26 for the case of η0 = 1 and n = 10000. In contrast, the conditional proportional
colocalization test is able to control type I error rates near the nominal 0.05 level, even though
it is slightly over-sized in some cases. The proportional colocalization test using the full set of
40 variants has inflated type I error rates compared with the conditional test in small sample
settings, with its type I error rates consistently over 0.1 when n = 500 for all values of the
proportionality constant η0 considered.
Similar to the simulation evidence in Wallace (2013), the correlation between causal variants
is shown to affect the power properties of proportional colocalization tests, as well as the
type I error rate of coloc tests in finite samples. Intuitively, as two distinct causal variants
become more correlated, it becomes harder to separate the two genetic signals, leading to
lower rejection rates of the proportional colocalization test. For the case of η0 = 0.5 and
n = 1000, the power of prop-coloc-cond falls from around 0.9 when ξ = 0.7 to around 0.4
when ξ = 0.9. Similarly, coloc is more likely to falsely conclude
that there is colocalization for highly correlated distinct causal variants in small samples; for
η0 = 0.5 and n = 500, the coloc type I error rate is less than 0.05 when ξ = 0.4 but around
0.6 when ξ = 0.8.
The results in Figure 3 also suggest specific situations where we could expect differences in the
evidence provided by coloc and proportional colocalization tests. When the proportionality
constant η0 is close to 0, this represents a setting where the genetic signal for trait 1 is
relatively weak. The tendency of coloc in this case is to favour the hypothesis of a causal
variant for only trait 2 in small samples, whereas the proportional colocalization test tends
to retain the null hypothesis of colocalization. This is useful from the perspective that the
proportional testing approach is less likely to erroneously reject the null of colocalization
when the signal for trait 1 is weak. However, since the proportional colocalization hypothesis
H0 : γ1 = γ2 η0 is trivially satisfied when η0 = 0, proportional colocalization tests have no
power to reject colocalization when there is no genetic signal for trait 1. Hence, proportional
colocalization tests should be used only when there is strong evidence for a genetic signal for
both traits in the gene region.
Figure 3. The rejection frequencies of colocalization tests, with a single causal variant for traits 1 and 2.
The traits colocalize only where the causal variants are perfectly correlated (ξ = 1). The nominal size of
proportional colocalization tests was set at 0.05. For coloc and coloc-SuSiE the rejection rates of the
colocalization hypothesis, P (H4 < 0.5), are plotted.
An important problem in practice is that investigators may not always have access to genetic
correlation estimates from the same sample used to compute genetic associations with traits.
In such situations, genetic correlations from a non-overlapping reference sample may be
used, but the performance of the proportional colocalization test with many variants may
be sensitive to errors in genetic correlation estimates. One strategy to ensure the test is not
too vulnerable to mis-measured genetic correlations is to consider only moderately correlated
variants through pruning. Pruning is a stepwise procedure that finds a subset of variants
that are as strongly associated with traits as possible, subject to the constraint that the
correlation between any two variants in that subset does not exceed some user-specified
threshold (Dudbridge and Newcombe, 2016).
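A minimal sketch of greedy pruning, assuming each variant is ranked by a single measure of association strength (for two traits, one could use the larger of the two absolute z-statistics; that choice is an assumption, not taken from Dudbridge and Newcombe, 2016). The helper `ld_prune` and the toy inputs are hypothetical.

```python
import numpy as np


def ld_prune(z, r, r2_threshold):
    """Greedy pruning (hypothetical helper).

    z: association z-statistics used to rank variants,
    r: J x J correlation matrix between variants,
    r2_threshold: maximum allowed squared correlation among kept variants.
    Returns the indices of kept variants, strongest first.
    """
    order = np.argsort(-np.abs(z))  # strongest association first
    kept = []
    for j in order:
        # Keep j only if its R^2 with every already-kept variant is within the threshold
        if all(r[j, k] ** 2 <= r2_threshold for k in kept):
            kept.append(int(j))
    return kept


r = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.1],
              [0.2, 0.1, 1.0]])
z = np.array([5.0, 6.0, 3.0])
print(ld_prune(z, r, r2_threshold=0.1))
```

With `r2_threshold=0.1`, variant 0 is dropped because its squared correlation with the stronger variant 1 is 0.81. If variants 0 and 1 were the two causal variants, this is the kind of loss of a causal variant that stricter pruning thresholds risk.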
To investigate the impact of mis-measured genetic correlations on proportional colocalization
testing, we consider the same simulation design as described in Section 2.4.1, but now use
noisy genetic correlation estimates from a reference sample when computing proportional
colocalization tests. In particular, instead of using the sample genetic correlation matrix, we
use a random Wishart matrix centered at the true genetic correlation matrix with degrees
of freedom λ. We also compute the prop-coloc-full test at different pruning thresholds based
on the R2 correlation between variants.
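The reference-sample perturbation can be sketched as follows, assuming a Wishart draw with λ degrees of freedom and scale matrix R/λ, which has mean R. Whether the simulated matrices were rescaled to unit diagonal is not stated, so `rescale` is an added option; the function name is hypothetical, and this construction needs an integer λ (with λ smaller than the number of variants the draw is singular).

```python
import numpy as np


def noisy_correlation(r_true, lam, rng, rescale=True):
    """Perturb a correlation matrix with Wishart noise centred at r_true.

    Averaging lam outer products of N(0, r_true) draws gives
    W ~ Wishart(lam, r_true / lam), so E[W] = r_true; smaller lam means noisier W.
    Rescaling W to unit diagonal (rescale=True) is an assumption added here.
    """
    x = rng.multivariate_normal(np.zeros(r_true.shape[0]), r_true, size=lam)
    w = x.T @ x / lam
    if rescale:
        d = np.sqrt(np.diag(w))
        w = w / np.outer(d, d)
    return w


rng = np.random.default_rng(1)
r_true = np.array([[1.0, 0.8], [0.8, 1.0]])
for lam in (20, 200, 2000):
    print(lam, np.round(noisy_correlation(r_true, lam, rng)[0, 1], 3))
```

Running the loop shows the off-diagonal estimate scattering more widely around 0.8 as λ decreases, which is the sense in which lower λ corresponds to greater errors in correlation estimates.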
Figure 4. Type I error rates of proportional colocalization tests varying with estimation errors in genetic
correlations. Lower degrees of freedom λ correspond to greater errors in correlation estimates. The nominal
size of the tests was set at 0.05.
Figure 4 plots the type I error rates of proportional colocalization tests for the case where
the sample size is n = 1000 and the proportionality constant is η0 = 0.5. For the case of
no pruning, where the correlation of any two variants may be up to R2 = 0.82 as described
in Section 2.4.1, the type I error rates of prop-coloc-full are considerably higher than those
of prop-coloc-cond. The type I error problem becomes more severe as the degrees of freedom λ
decrease, since lower values of λ correspond to larger errors in genetic correlation estimates.
The performance of the prop-coloc-full test based on different pruning thresholds is shown in
the first 4 panels of Figure 4; the prop-coloc-cond and prop-coloc-naive results are also shown
for comparison. Aggressive pruning seems to alleviate the inflated type I error rates of the
prop-coloc-full test, with a pruning threshold of R2 ≤ 0.1 giving size performance similar to
that of prop-coloc-cond.
Figure 5. Power of proportional colocalization tests varying with the correlation between causal variants,
with no mis-measurement of genetic correlations. The nominal size of the tests was set at 0.05.
Although considering only weakly correlated variants appears to control type I error rates of
the prop-coloc-full test under mis-measured genetic correlations, it can be expected to reduce
the power of the test. This is verified in Figure 5, which plots the power of the prop-coloc-full
test under different pruning thresholds when genetic correlations are measured without error.
Stricter pruning thresholds run the risk of omitting true causal variants from the analysis,
which happens when the squared correlation between causal variants, ξ², is greater than the
R2 pruning threshold; Figure 5 shows that the power of prop-coloc-full starts to dip at that
point.
When there are multiple causal variants for either trait, proportional colocalization approaches
test the null hypothesis that genetic variants proportionately colocalize, which is a
specific type of colocalization as discussed in Section 2.1.1. This section presents the power
performance of colocalization tests varying with the non-proportionality of colocalized genetic
associations. The power of the coloc method, which assumes a single causal variant, is also
shown to be affected by the non-proportionality of colocalized signals.
The simulation design is as described in Section 2.4.1 with the exception of the true variant
effects on traits, γ1 and γ2. Two distinct causal variants (Zj1, Zj2) were selected at random,
and the vector of variant effects γ2 on trait 2 was set to be a vector of zeros apart from its
j1-th element, which was set to 0.5(1 − δ), and its j2-th element, which was set to 0.5(1 + δ).
The vector of variant effects γ1 on trait 1 was set to be a vector of zeros apart from its j1-th
element, which was set to 0.5(1 + δ)η0, and its j2-th element, which was set to 0.5(1 − δ)η0.
The non-proportionality parameter δ was varied between 0 and 1; note that when δ = 0,
γ1 = η0 γ2, so that the proportional colocalization hypothesis holds. Then, as δ moves away
from 0 and towards 1, the non-proportionality of the colocalized signals increases.
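The true effects in this design can be generated directly, which makes it easy to check that δ = 0 gives exactly proportional effects with constant η0 and that δ > 0 does not. The helper names and the 20-variant region are illustrative assumptions; departure from proportionality is measured here as the residual norm after projecting γ1 onto γ2, a choice made only for this sketch.

```python
import numpy as np


def simulate_effects(n_variants, j1, j2, delta, eta0):
    """True variant effects (gamma1, gamma2) in the non-proportionality design."""
    gamma1 = np.zeros(n_variants)
    gamma2 = np.zeros(n_variants)
    gamma2[j1], gamma2[j2] = 0.5 * (1 - delta), 0.5 * (1 + delta)
    gamma1[j1], gamma1[j2] = 0.5 * (1 + delta) * eta0, 0.5 * (1 - delta) * eta0
    return gamma1, gamma2


def departure_from_proportionality(gamma1, gamma2):
    """Norm of gamma1 after removing its least-squares projection onto gamma2."""
    eta = gamma1 @ gamma2 / (gamma2 @ gamma2)
    return float(np.linalg.norm(gamma1 - eta * gamma2))


rng = np.random.default_rng(0)
j1, j2 = rng.choice(20, size=2, replace=False)  # two distinct causal variants
for delta in (0.0, 0.2, 0.6, 1.0):
    g1, g2 = simulate_effects(20, j1, j2, delta, eta0=0.5)
    print(delta, round(departure_from_proportionality(g1, g2), 4))
```

At δ = 1 the two effect vectors are orthogonal: each causal variant affects only one trait, so the signals are maximally non-proportional.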
Figure 6 presents the power of colocalization tests varying with the non-proportionality of
the causal genetic variant associations δ. The results provide further evidence that using
many genetic variants in the proportional colocalization test can inflate type I error rates in
small samples. In contrast, the conditional test was again able to control type I error rates
near the nominal 0.05 level across the range of sample sizes n and proportionality constants
η0 considered. Moreover, the prop-coloc-cond test was generally also more powerful than
prop-coloc-full. For example, the rejection rate of prop-coloc-cond was around 33 percent higher
than the rejection rate of prop-coloc-full when n = 1000, η0 = 0.5, and δ = 0.2.
Non-proportionality of colocalized genetic associations can also be seen to affect the power
of coloc, with the test favouring the hypothesis of distinct causal variants when the non-
proportionality of colocalized signals is high. For the specific case of small samples (n ≤ 1000)
and proportionality constants of η0 = 0.1 and η0 = 0.2, coloc was more powerful than coloc-
SuSiE. But for all other cases, coloc-SuSiE was generally much more powerful in detecting
colocalization, although high non-proportionality of colocalized signals also harmed the power
of the approach.
Figure 6. The rejection frequencies of colocalization tests, varying with the non-proportionality of causal
genetic variant associations. The traits proportionately colocalize only when δ = 0. The nominal size of
proportional colocalization tests was set at 0.05. For coloc and coloc-SuSiE, the rejection rates of the
colocalization hypothesis (posterior probability of H4 below 0.5) are plotted.
We conclude this section by summarizing our findings from the simulation study. First,
using many variants in the proportional colocalization test can result in inflated type I error
rates in small samples, but so can naive application of the test based on the two variants
most strongly associated with traits. In contrast, the conditional proportional colocalization
test, which accounts for the uncertainty in variant selection, has better finite-sample size
properties, and it can also be more powerful at detecting non-proportionality of colocalized
signals than a test that additionally assesses the proportionality of many irrelevant variants.
Further, using many correlated genetic variants in proportional colocalization tests can result
in highly inflated type I error rates under mis-measured genetic correlations. While pruning
to weakly correlated variants can alleviate type I error inflation in this regard, it also harms
the power of the test when the causal variants are more correlated. Finally, under multiple
causal variants, coloc-SuSiE is better placed to detect colocalization under a wide range of
scenarios, although its power to detect non-proportional colocalization is lower in small
samples and when the associations of causal variants with one trait are weak.
Figure 7 presents the results of colocalization tests. The nominal size of the proportional
colocalization tests was set at 0.05; where the p-value of the test exceeds 0.05, we retained the
null hypothesis of proportional colocalization, and those results are marked with a cross if in
addition the p-value of the LM test is less than 0.05 (which indicates evidence for a non-zero
proportionality constant). For coloc, the posterior probability of “H4”, the hypothesis that
the traits colocalize, is indicated. A posterior probability of H4 greater than 0.5 was taken as
evidence for colocalization, and those results are marked with a cross. The same applies for
coloc-SuSiE, except that a posterior probability of H4 greater than 0.5 for at least one locus
was taken as evidence for colocalization.
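The marking rules above can be written as three small predicates. This is a sketch: the function names and the example p-values and posterior probabilities are hypothetical, and only the 0.05 and 0.5 cut-offs come from the text.

```python
def prop_coloc_cross(p_prop, p_lm, alpha=0.05):
    """Cross for a proportional test: the null of proportional colocalization is
    retained (p_prop > alpha) and a zero proportionality constant is rejected
    (p_lm < alpha)."""
    return p_prop > alpha and p_lm < alpha


def coloc_cross(pp_h4, threshold=0.5):
    """Cross for coloc: posterior probability of H4 above the threshold."""
    return pp_h4 > threshold


def coloc_susie_cross(pp_h4_by_locus, threshold=0.5):
    """Cross for coloc-SuSiE: posterior probability of H4 above the threshold for
    at least one locus; an empty list (no detected signals) gives no cross."""
    return any(pp > threshold for pp in pp_h4_by_locus)


# Illustrative inputs only, not results from the analysis
print(prop_coloc_cross(p_prop=0.40, p_lm=0.002))  # retained, non-zero constant
print(prop_coloc_cross(p_prop=0.40, p_lm=0.300))  # retained, constant may be zero
print(coloc_susie_cross([0.12, 0.81]))
```

The second call shows why the LM condition matters: a retained proportionality null with no evidence for a non-zero constant is not marked, because the hypothesis may hold only trivially.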
We first note that prop-coloc-full rejects the proportional colocalization hypothesis in all
but one pairwise trait comparison, which involves the two heart tissues. The prop-coloc-
cond test also retains the proportional colocalization hypothesis for the heart tissues, as well
as retaining the null hypothesis in a further 25 out of the 45 pairwise trait comparisons
considered. Of these 25, only 11 pairs of traits were retained after using the LM test to rule
out the proportional colocalization hypothesis where there is insufficient evidence for a non-
zero proportionality constant. The more conservative performance of the prop-coloc-cond test
suggests that in some pairwise comparisons, the proportionality of variant–trait associations
may hold for the two most relevant variants, but not across other, more weakly associated variants.
Figure 7. The p-values of proportional colocalization tests, and posterior probabilities of colocalization “H4”
of coloc and coloc-SuSiE tests. Cases where the proportional colocalization tests do not reject the null hypothesis
of proportional colocalization (p-value > 0.05) and the LM tests do reject the null of a zero proportionality
constant (p-value < 0.05) are indicated by a cross. Similarly, cases where coloc and coloc-SuSiE detect
colocalization (posterior probability of H4 > 0.5) are indicated by a cross.
Of the 5 pairwise trait comparisons judged to colocalize by the coloc-SuSiE method, 4 are
supported by the results of prop-coloc-cond, 3 are consistent with the results of coloc, and
only the colocalization finding for the two heart tissues is supported by the results
of all other methods. Of the 7 colocalization findings by the coloc method, 3 are supported
by the results of coloc-SuSiE, and 4 are supported by the results of prop-coloc-cond.
The results in Figure 7 show that in 30 out of 45 pairwise trait comparisons, all four methods
that we considered (prop-coloc-cond, prop-coloc-full, coloc, and coloc-SuSiE) reject a colocal-
ization hypothesis. For the remaining pairwise trait comparisons, there is some disagreement
in the colocalization evidence provided. As discussed in Section 2.3, this is primarily because
each of the four methods tests a different hypothesis. We now discuss specific cases
where the methods suggest similar and contrasting evidence.
For each pairwise trait comparison in Figures 8–14, we plot genetic associations with the
two traits (gene expression in two tissues) based on univariable (Xk on Zj) and multivariable
(Xk on Z = (Z1, …, ZJ)′) linear regressions. The two lead variants selected by the
prop-coloc-cond method are circled in green. The slopes in the rightmost plots indicate the
estimated proportionality constant from the prop-coloc-full (red) and prop-coloc-cond (green)
methods. In the two leftmost plots, causal variants detected by coloc-SuSiE are highlighted.
The plots for the remaining pairwise trait comparisons not discussed here are provided in
Supplementary Information.
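For standardised genotypes, univariable and multivariable associations are linked through the variant correlation matrix: the multivariable slopes equal R⁻¹ times the univariable slopes, where R is the in-sample correlation matrix of the variants. The check below uses synthetic data with continuous genotype proxies and is only an illustration; the text does not state how the multivariable associations in the figures were obtained.

```python
import numpy as np

rng = np.random.default_rng(0)
n, J = 2000, 4
# Correlated continuous genotype proxies (illustrative, not allele dosages)
base = rng.normal(size=(n, 1))
Z = 0.7 * base + rng.normal(size=(n, J))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)  # standardise so that z_j'z_j = n
x = Z @ np.array([0.3, 0.0, 0.0, 0.1]) + rng.normal(size=n)

beta_uni = Z.T @ x / n                             # univariable slopes (X_k on Z_j)
beta_multi = np.linalg.lstsq(Z, x, rcond=None)[0]  # multivariable slopes (X_k on Z)
R = Z.T @ Z / n                                    # in-sample correlation matrix

print(np.round(beta_uni, 3))
print(np.round(beta_multi, 3))
```

Because Z′Z = nR and Z′x = n·beta_uni, least squares gives beta_multi = R⁻¹ beta_uni exactly in-sample; with R taken from a reference panel the identity holds only approximately, which is one route by which errors in genetic correlation estimates enter summary-data analyses.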
One unanimous finding by all proportional and enumeration colocalization methods was the
colocalization of gene expression in the two heart tissues (atrial appendage and left ventricle).
Figure 8. Colocalization results for gene expression in two heart tissues. Further details for the plots are
discussed at the start of Section 3.2.2.
Figure 8 suggests evidence of at least two causal variants that are shared by both traits.
Moreover, the genetic associations with each trait appear to be proportional; the proportionality
constant was estimated to be 0.850 (LM p-value < 0.001) in the prop-coloc-full test,
and 0.775 (LM p-value < 0.001) in the prop-coloc-cond test. The posterior probability of H4
(colocalization) was greater than 0.999 using the coloc method. The coloc-SuSiE method
suggested colocalization with posterior probability greater than 0.999 at one variant, and
greater than 0.804 at another variant. In this case, we would conclude that GLP1R gene
expression in the atrial appendage and left ventricle tissues colocalize.
3.2.4 Only coloc and coloc-SuSiE conclude evidence for colocalization
Figure 9. Colocalization results for gene expression in pancreas and heart (left ventricle) tissues. Further
details for the plots are discussed at the start of Section 3.2.2.
Only the coloc and coloc-SuSiE methods suggested colocalization evidence for gene expression
in the pancreas and heart (left ventricle) tissues. In Figure 9, there appears to be one causal
variant shared by gene expression in the pancreas and heart (left ventricle) tissues.
The coloc-SuSiE method indicates there may be only one causal variant for pancreas, and
multiple causal variants for left ventricle. In this case, we would not expect genetic associations
with each trait to be proportional. Indeed, there appears to be considerable heterogeneity
in genetic associations, leading to a rejection of the proportional colocalization hypothesis
using all variants. Further, the two circled lead variants suggest very different estimates of
the proportionality constant, which also leads to a rejection of the proportional colocalization
hypothesis under the prop-coloc-cond method. Overall, we conclude that there is a causal
variant shared by GLP1R gene expression in the pancreas and left ventricle tissues, but the
proportional colocalization hypothesis does not hold.
3.2.5 Only coloc and prop-coloc-cond conclude evidence for colocalization
Figure 10. Colocalization results for gene expression in stomach and pancreas tissues. Further details for
the plots are discussed at the start of Section 3.2.2.
Only the coloc and prop-coloc-cond methods suggested colocalization evidence for gene ex-
pression in the stomach and pancreas tissues. Here, there appears to be a strong causal
variant for pancreas that is less strongly associated with stomach. Given this, coloc con-
cludes colocalization evidence, whereas coloc-SuSiE suggests no causal variants for stomach.
In the two rightmost plots of Figure 10, there is significant heterogeneity in variant–trait
associations when considering all variants, so that prop-coloc-full rejects the null hypothesis
of proportional colocalization. Although the two lead variants are in different quadrants,
they suggest similar estimates of the proportionality constant, and so the prop-coloc-cond
method does not reject the proportional colocalization hypothesis (proportionality constant
estimate: 0.774; LM p-value 0.002). The result of prop-coloc-cond therefore tallies with coloc
in concluding evidence for a shared causal variant for GLP1R gene expression in the stomach
and pancreas tissues.
3.2.6 Only coloc-SuSiE and prop-coloc-cond conclude evidence for colocalization
Only the coloc-SuSiE and prop-coloc-cond methods suggested colocalization evidence for
gene expression in the nerve and heart tissues. From Figure 11, coloc-SuSiE detects multiple
causal variants for left ventricle, but the top signals for left ventricle and nerve appear to
be different variants. Therefore coloc does not conclude evidence for colocalization, whereas
coloc-SuSiE does because the causal variant for nerve is judged to also be causal for left
ventricle even though it is not the strongest signal for left ventricle.
Figure 11. Colocalization results for gene expression in nerve and heart (left ventricle) tissues. Further
details for the plots are discussed at the start of Section 3.2.2.
The variant–trait associations in the two rightmost plots in Figure 11 suggest significant
heterogeneity when considering all variants, and hence prop-coloc-full rejects the proportional
colocalization hypothesis. However, the two lead variants suggest quite similar estimates of a
positive proportionality constant (estimate: 0.324; LM p-value: 0.003), and thus prop-coloc-
cond retains the null of proportional colocalization. A clear conclusion is difficult; the top
signals for the two traits appear to be distinct, but the lead variant for nerve may be causal
for both traits.
3.2.7 Only coloc concludes evidence for colocalization
Figure 12. Colocalization results for gene expression in stomach and heart (left ventricle) tissues. Further
details for the plots are discussed at the start of Section 3.2.2.
Only the coloc method suggested colocalization evidence for gene expression in the stomach
and heart tissues. Similar to Figure 10, a top signal in Figure 12 for one of the traits (in this
case, left ventricle) appears to be located near the top, but weaker, signal for the other trait
(stomach), leading coloc to favour a colocalization hypothesis, whereas coloc-SuSiE concludes
there is no causal variant for stomach.
There appears to be significant heterogeneity in variant–trait associations, leading to a
rejection of the proportional colocalization hypothesis by the prop-coloc-full method. Moreover,
in the rightmost plot in Figure 12, the two lead variants are in different quadrants
and suggest no coherent estimate of the proportionality constant, and the LM test cannot
reject the null hypothesis of a zero proportionality constant. This could represent a setting
similar to Model 1 in Section 2.3, where only coloc tends to detect colocalization, and where
the linear model of proportional colocalization does not hold between the two lead variants.
3.2.8 Only prop-coloc-cond concludes evidence for colocalization
Figure 13. Colocalization results for gene expression in brain (hypothalamus) and stomach tissues. Further
details for the plots are discussed at the start of Section 3.2.2.
Only the prop-coloc-cond method suggested colocalization evidence for GLP1R gene expres-
sion in the brain (hypothalamus) and stomach tissues. In Figure 13, the top signal for brain
(hypothalamus) is quite weak, with coloc favouring a “H2” hypothesis of a causal variant only
for stomach (with posterior probability 0.680). The coloc-SuSiE method does not detect a
causal variant for either trait.
Again, the heterogeneity across all variant–trait associations leads the prop-coloc-full test to
reject the null hypothesis of proportional colocalization, but the two lead variants suggest a
similar value of the proportionality constant (estimate: -1.191; LM p-value: 0.003), so that
the prop-coloc-cond test retains the null hypothesis.
Finally, we note a case where a colocalization hypothesis was not supported by any of the
methods considered. From the leftmost Manhattan plots in Figure 14, there is no clear
indication of a shared causal variant between testis and stomach. In this case, coloc favours
a “H2” hypothesis of a causal variant only for stomach, whereas coloc-SuSiE does not detect
a causal variant for either trait.
Figure 14. Colocalization results for gene expression in testis and stomach tissues. Further details for the
plots are discussed at the start of Section 3.2.2.
The two rightmost plots of variant-trait associations show no clear linear trend; there is
excessive heterogeneity which leads to a rejection of the proportional colocalization hypothesis
under both the prop-coloc-full and prop-coloc-cond methods. Further, the LM test is unable
to provide evidence for a non-zero proportionality constant, which is consistent with there
being no causal variant for testis. Overall, there is unlikely to be a causal variant for testis,
but there may be a causal variant for stomach.
4 Discussion
Enumeration and proportional colocalization approaches differ in both the assumptions they
use, and the null hypotheses or priors that are maintained in the absence of evidence to the
contrary. Therefore we may be more confident in colocalization evidence that is supported
by different methods in the spirit of triangulation (Munafò and Davey Smith, 2018). Where
the methods give different answers, careful interrogation can give insight into the most likely
causal model, as in the examples discussed in Sections 3.2.4–3.2.8.
In this work, we primarily focused on the proportional colocalization approach, and derived a
conditional test that aims to account for uncertainty in variant selection, in order to overcome
the issue of inflated type I error rates highlighted in Wallace (2013). Our simulation evidence
suggests that the conditional test has competitive finite-sample size properties compared with
a proportional colocalization test that compares proportionality of variant–trait associations
for a large number of variants.
Specific cases where we may expect contrasting evidence include: (i) when there is a single
causal variant for one trait that is weakly associated with another trait, and other variants
that are more weakly associated with only one trait – here, only coloc tends to provide evidence
for colocalization (Model 1 of Figure 2 in Section 2.3, and Figure 12 in Section 3.2.7); (ii)
when there is non-proportional colocalization at multiple causal variants – here, only coloc-
SuSiE tends to provide evidence for colocalization (Model 2 of Figure 2 in Section 2.3); (iii)
under small sample sizes, and relatedly, when proportionality of variant–trait associations
holds for weak causal variants, the conditional proportional colocalization test tends not to
reject the proportional colocalization hypothesis.
Reasons for the popularity of the coloc method are clear: it can be implemented straight-
forwardly with minimal data requirements, needing only summarized data on the traits of
interest, and it gives clear output that provides direct evidence on the probability of colocal-
ization. However, the method also has weaknesses: it is often sensitive to specification of the
priors (particularly for p12 , the prior probability of a variant being causal for both traits), it
assumes a single causal variant for both traits, it focuses on the lead signals for both traits,
and it can provide ambiguous conclusions with no strong evidence either for or against colo-
calization (Burgess et al., 2023). The coloc-SuSiE method addresses some of these issues,
allowing multiple causal variants and hence reducing the focus on the lead signals.
The proportional colocalization method operates in a different paradigm, complicating direct
comparison of the methods. Both coloc and coloc-SuSiE operate in a Bayesian paradigm,
whereas proportional colocalization is implemented in a frequentist paradigm. This means
that proportional colocalization can either reject the null hypothesis of colocalization, or not
reject this hypothesis: the latter could represent lack of strong evidence against colocalization
rather than strong evidence to support colocalization. In cases where genetic associations
with one trait are not strong, proportional colocalization could therefore act as a negative
filter, triaging out situations where colocalization is not supported by the data. If coloc/coloc-
SuSiE and proportional colocalization give concordant results, this can be interpreted as
stronger evidence than provided by either method individually. If they give discordant results,
a variety of explanations are possible: this could reflect weak evidence both for and against
colocalization, different definitions of colocalization tested by the approaches, or a focus on
lead variants (in coloc) versus consideration of several variants (particularly in prop-coloc-
full).
Biology is complex and messy. In proposing this test for proportional colocalization, we
provide another method that tests for colocalization that may agree or disagree with the
commonly-used coloc method. Where it agrees, this provides the analyst with stronger
evidence supporting or refuting colocalization; where it disagrees, this provides a caution
that the results we see are not black-and-white, but reflect the complexity in the biological
mechanisms that underlie the association estimates in our statistical models.
Acknowledgements
We thank Chris Wallace and Amy Mason for helpful discussions. This research was funded by
the United Kingdom Research and Innovation Medical Research Council (MC-UU-00002/7
and MC-UU-00002/18), and supported by the National Institute for Health Research Cam-
bridge Biomedical Research Centre (BRC-1215-20014). S.B. is supported by the Wellcome
Trust (225790/Z/22/Z).
Data availability
The summary genetic association data used for the analyses described in this manuscript
were obtained from the Genotype-Tissue Expression (GTEx) Portal (project version 8) at
https://gtexportal.org/home (GTEx Consortium, 2020), and UK Biobank Linkage Disequilibrium
Matrices were obtained from the AWS Open Data Sponsorship Program at
https://registry.opendata.aws/ukbb-ld.
Code availability
Declaration of interests
References
Astle, W. J., H. Elding, T. Jiang, D. Allen, D. Ruklisa, A. L. Mann, D. Mead, H. Bouman,
F. Riveros-Mckay, and M. A. Kostadima (2016). The allelic landscape of human blood cell
trait variation and links to common complex disease. Cell 167 (5), 1415–1429.
Daghlas, I., V. Karhunen, D. Ray, V. Zuber, S. Burgess, P. S. Tsao, and D. Gill et al. (2021).
Genetic evidence for repurposing of GLP1R (Glucagon-Like Peptide-1 Receptor) agonists
to prevent heart failure. Journal of the American Heart Association 10 (13), e020331.
Dudbridge, F. and P. J. Newcombe (2016). Accuracy of gene scores when pruning markers
by linkage disequilibrium. Human Heredity 80 (4), 178–186.
Fithian, W., D. L. Sun, and J. Taylor (2017). Optimal inference after model selection.
arXiv:1410.2597, 1–39.
GTEx Consortium (2020). The GTEx Consortium atlas of genetic regulatory effects across
human tissues. Science 369 (6509), 1318–1330.
Hansen, L. P., J. Heaton, and A. Yaron (1996). Finite-sample properties of some alternative
GMM estimators. Journal of Business and Economic Statistics 14 (3), 262–280.
Munafò, M. R. and G. Davey Smith (2018). Robust research needs many lines of evidence.
Nature 553 (7689), 399–401.
Newcombe, P. J., D. V. Conti, and S. Richardson (2016). JAM: a scalable Bayesian framework
for joint analysis of marginal SNP effects. Genetic Epidemiology 40 (3), 188–201.
Plagnol, V., D. J. Smyth, J. A. Todd, and D. G. Clayton (2009). Statistical independence
of the colocalized association signals for type 1 diabetes and RPS26 gene expression on
chromosome 12q13. Biostatistics 10 (2), 327–334.
Sanderson, E. and F. Windmeijer (2016). A weak instrument F-test in linear IV models with
multiple endogenous variables. Journal of Econometrics 190 (2), 212–221.
Wallace, C. (2013). Statistical testing of shared genetic control for potentially related traits.
Genetic Epidemiology 37 (8), 802–813.
Wallace, C. (2020). Eliciting priors and relaxing the single causal variant assumption in
colocalisation analyses. PLOS Genetics 16 (4), 1–20.
Wallace, C. (2021). A more accurate method for colocalisation analysis allowing for multiple
causal variants. PLOS Genetics 17 (9), e1009440.