
METHODS ARTICLE

published: 12 March 2013


doi: 10.3389/fphys.2013.00044

The statistical analysis of multi-environment data: modeling genotype-by-environment interaction and its genetic basis

Marcos Malosetti 1*, Jean-Marcel Ribaut 2 and Fred A. van Eeuwijk 1
1 Biometris - Applied Statistics, Department of Plant Science, Wageningen University, Wageningen, Netherlands
2 Consultative Group on International Agricultural Research Generation Challenge Programme, México DF, Mexico

Edited by:
Philippe Monneveux, International Potato Center, Peru

Reviewed by:
Jin Chen, Michigan State University, USA
Pawel Krajewski, Institute of Plant Genetics, Poland
John Doonan, Aberystwyth University, UK

*Correspondence:
Marcos Malosetti, Biometris - Applied Statistics, Department of Plant Science, Wageningen University, PO Box 100, 6700 AA, Wageningen, Netherlands.
e-mail: [email protected]

Genotype-by-environment interaction (GEI) is an important phenomenon in plant breeding. This paper presents a series of models for describing, exploring, understanding, and predicting GEI. All models depart from a two-way table of genotype by environment means. First, a series of descriptive and explorative models/approaches are presented: Finlay–Wilkinson model, AMMI model, GGE biplot. All of these approaches have in common that they merely try to group genotypes and environments and do not use other information than the two-way table of means. Next, factorial regression is introduced as an approach to explicitly introduce genotypic and environmental covariates for describing and explaining GEI. Finally, QTL modeling is presented as a natural extension of factorial regression, where marker information is translated into genetic predictors. Tests for regression coefficients corresponding to these genetic predictors are tests for main effect QTL expression and QTL by environment interaction (QEI). QTL models for which QEI depends on environmental covariables form an interesting model class for predicting GEI for new genotypes and new environments. For realistic modeling of genotypic differences across multiple environments, sophisticated mixed models are necessary to allow for heterogeneity of genetic variances and correlations across environments. The use and interpretation of all models is illustrated by an example data set from the CIMMYT maize breeding program, containing environments differing in drought and nitrogen stress. To help readers to carry out the statistical analyses, GenStat® programs, 15th Edition and Discovery® version, are presented as “Appendix.”
Keywords: adaptation, genotype by environment interaction, multi-environment trials, QTL by environment
interaction, QTL mapping methodology, REML

INTRODUCTION: PHENOTYPE, GENOTYPE, AND ENVIRONMENT

The success of a plant breeding program depends on its ability to provide farmers with genotypes with guaranteed superior performance (phenotype) in terms of yield and/or quality across a range of environmental conditions. To achieve this aim, it is necessary to have an understanding of the factors leading to a good phenotype.

Usually the phenotype is the value for a trait at the end of the growing season. The reason is that we are primarily interested in phenotypes like yield or grain weight at maturity and not, or less, in yield or grain weight at earlier stages. The final state of a trait is the cumulative result of a number of causal interactions between the genetic make-up of the plant (the genotype) and the conditions in which that plant developed (the environment). Plants differ in the efficiency and adequacy with which they capture and convert environmental inputs and stimuli into the biomass and organs that constitute a final product. The capture and conversion abilities of a plant are determined by its particular ensemble of genes. Environments differ in the amount and quality of inputs and stimuli that they convey to plants including, e.g., the amount of water, nutrients or incoming radiation. A primary objective in plant breeding is to match genotypes and environments in such a way that improved phenotypes are obtained. For example, a breeder might be interested in selecting genotypes that do well under water stress conditions.

While there can be genotypes that do well across a wide range of conditions (widely adapted genotypes), there are also genotypes that do relatively better than others exclusively under a restricted set of conditions (specifically adapted genotypes). Specific adaptation of genotypes is closely related to the phenomenon of genotype-by-environment interaction (GEI). GEI exists whenever the relative phenotypic performance of genotypes depends on the environment, or in other words, when the difference in reactions of genotypes varies in dependence on the environment.

To illustrate the phenomenon of GEI, we can consider two different genotypes that differ in the genetic machinery involved in tolerance to water-limited conditions, while being equal for all other characteristics. If these two genotypes are exposed to a poorly watered environment, their performance will differ depending on the genetic properties related to tolerance for

www.frontiersin.org March 2013 | Volume 4 | Article 44 | 1


Malosetti et al. Statistical models for genotype and QTL-by-environment interaction

water-limited conditions. However, this genotypic difference will disappear in an environment that provides the right amount of water. So, the difference in performance between the two genotypes depends on the environment, through the amount of water that it provides.

Some scenarios that can occur when comparing the performances of pairs of genotypes across environments are presented in Figure 1. The function describing the phenotypic performance of a genotype in relation to an environmental characterization is called the “norm of reaction” (Griffiths et al., 1996). Figure 1A shows the case where there is no GEI: the genotype and the environment behave additively (this will be developed later) and the reaction norms are parallel. The remaining plots show different situations in which GEI occurs: divergence (Figure 1B), convergence (Figure 1C), and the most critical one, crossover interaction (Figure 1D). Crossover interactions are the most important for breeders as they imply that the choice of the best genotype is determined by the environment.

FIGURE 1 | Genotype-by-environment interaction in terms of changing mean performances across environments: (A) additive model, (B) divergence, (C) convergence, (D) cross-over interaction.

GEI was introduced in terms of the relative difference between genotypic means. GEI can also be regarded in terms of heterogeneity of genetic variance and covariance, or correlation. As a consequence of GEI, the magnitude of the genetic variance as observed within individual environments will change from one environment to the next. Often, the genetic variance tends to be larger in better environments than in poorer environments, although the opposite can be observed as well (Przystalski et al., 2008). Figure 2A illustrates the phenomenon of heterogeneity of genetic variance across environments, showing box plots for a series of maize trials, where the range of variation in the poor environments LN96a and LN96b is smaller than that in the good environments HN96b and NS92a.

GEI also has consequences for the correlations between genotypic performances in different environments. When GEI is large, the observed performance of a set of genotypes in one environment may not be very informative for the performance of the same genotypes in another environment. Environments with similar characteristics will induce corresponding responses in plants and will lead to strong genetic correlations. Figure 2B shows that the correlation between the similar environments IS92a and IS94a is larger than the correlation between the dissimilar environments NS92a and HN96b.

FIGURE 2 | (A) Boxplot for yield of a maize F2 population in eight environments displaying total range, interquartile range (box) and median (line). Environment names are coded as: LN, low nitrogen; HN, high nitrogen; SS, severe water stress; IS, intermediate water stress; NS, no water stress. The two digits indicate the year of the trial, and the letters a and b the cropping season: a, winter; b, summer. (B) Scatter plot matrix for two stress environments (IS92a and IS94a) and two non-stress environments (HN96b and NS92a).

In conclusion, given the complexity of the mechanisms and processes underlying the phenotypic response across diverse and changing environmental conditions (frequently in an unpredictable way), it is necessary to develop analytical tools to help breeders understand GEI. The use of adequate strategies to analyze GEI is a first and important step toward more informed breeding decisions. Good analytical methods are a prerequisite for predicting the performance of genotypes as accurately as

Frontiers in Physiology | Plant Physiology March 2013 | Volume 4 | Article 44 | 2



possible. This paper explores several strategies to model GEI, starting with simple methods that have been historically popular within the plant breeding community. It then moves to more elaborate models in which additional information is used in the form of explicit environmental characterization to model GEI. A final section is devoted to the integration of molecular marker information into GEI models, leading to the detection of quantitative trait loci (QTLs) and more specifically, to the modeling of QTL by environment interaction (QEI). The statistical methodology is illustrated using a maize data set obtained from a series of drought and nitrogen stress trials from the maize breeding program at Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT; the International Maize and Wheat Improvement Center; Ribaut et al., 1996, 1997). To encourage readers to carry out these statistical analyses themselves, GenStat® programs for the 15th Edition (VSN International, 2012) and the Discovery® version of this statistical package (Payne et al., 2007) are presented as “Appendix.”

GENERATING DATA TO STUDY GENOTYPE-BY-ENVIRONMENT INTERACTION

An obvious first step to investigate GEI is to obtain phenotypic observations on a set of genotypes exposed to a range of environmental conditions. The set of genotypes can include advanced lines of a breeding program, cultivars, and segregating offspring from a specific cross such as F2, a backcross, or a recombinant inbred line (RIL) population.

Genotypes can be tested under different management regimes that represent increasing levels of a particular stress, or a combination of stresses. This type of experiment is called a “managed stress trial” and is appropriate when the researcher wishes to focus on a particular type of stress. When performing managed stress trials, it is important to control the system in such a way that all other factors influencing the phenotype are as homogeneous as possible.

Managed stress trials are not a default option in plant breeding, because stress type and level can be difficult to implement and because the relationship between phenotype and stress is complex, with genes and environmental stress(es) interacting throughout the various developmental phases. In those situations, a common way for plant breeders to screen for genotypic reactions to environmental factors is by “multi-environment trials” (METs). In a MET, a number of genotypes are evaluated at a number of geographical locations for a number of years in the hope that the pattern of stresses that the genotypes experience is representative of future growing environments.

A convenient way to summarize data from managed stress trials and METs is in the form of two-way tables of means, with genotypes in the rows and environments in the columns. Each cell of such a table contains an estimate of the performance (adjusted mean) of a particular genotype in a specific environment. To identify genotypes and environments unequivocally, we use indices, the letter i for genotypes (i = 1 ... I), and the letter j for environments (j = 1 ... J).

The models in the following sections will assume as a starting point a genotype-by-environment table of means. These models are used in a so-called two-stage strategy for analyzing MET data. In the first stage, individual trials are analyzed with models including terms for design features and spatial variation. From these individual trial analyses, adjusted means and weights, usually reciprocals of the variances of the means, are carried forward to the second stage, where a model is fitted to the genotype by environment means, using either no weights or weights estimated in the first stage. Various choices can be made for the weights in a two-stage analysis (Mohring and Piepho, 2009; Welham et al., 2010), and a good choice of weights will lead to a two-stage analysis with results very close to those of a so-called single stage analysis, in which plot data are analyzed instead of means. Single stage analyses have certain theoretical advantages over two-stage analyses, but two-stage analyses are logistically and computationally easier to handle. This paper focuses on two-stage analyses, because of the small differences with single stage analyses and the aforementioned larger handling ease. Still, good descriptions of single stage analyses are offered by Cullis et al. (1996a,b), Gilmour et al. (1997), and Smith et al. (2005). In principle, the QTL mapping approach outlined later in this paper could also be embedded in a single stage analysis strategy.

CIMMYT MAIZE DROUGHT STRESS TRIALS: EXAMPLE DATA

The models to be presented here are illustrated using data produced by the maize drought stress breeding program of CIMMYT. A brief description of the data is given here; a more detailed description is available in the original publications (Ribaut et al., 1996, 1997). A maize F2 population was generated by crossing a drought tolerant parent (P1) with a drought susceptible one (P2). Seeds harvested from each of 211 F2 plants formed F3 families, which were stored for further evaluation. The F3 families were evaluated in managed stress trials in 1992, 1994, and 1996. In the winter of 1992, a managed water stress trial was conducted in Mexico, including no stress (NS), intermediate stress (IS), and severe stress (SS). In the winter of 1994, a similar trial was conducted, but it only included the IS and SS treatments. In the summer of 1996, the families were tested in a nitrogen stress trial with two levels: low (LN) and high nitrogen (HN). An extra LN trial was conducted in the winter of the same year. In total, the families were evaluated in eight different environments, each environment characterized by year, stress type and intensity, and management factors. DNA was extracted from each of the 211 F2 plants to produce a total of 132 restriction fragment length polymorphism (RFLP) markers covering the 10 maize chromosomes.

MODELS FOR GENOTYPE-BY-ENVIRONMENT INTERACTION: MODELING THE MEAN

THE ADDITIVE MODEL AS A BENCHMARK

The phenomenon of GEI is of primary interest in plant breeding, and has resulted in a large body of literature on models and strategies for analysis of GEI [see, for example, the reviews in Cooper and Hammer (1996), Kang and Gauch (1996), van Eeuwijk et al. (1996), van Eeuwijk (2006)]. A dominant feature of strategies used to describe and understand GEI is a heavy reliance on parameters that are statistical rather than biological. This is no coincidence, since historically, a large part of quantitative


genetics has relied on simple, yet very useful, statistical models. A notorious example is the well-known model: P = G + E, where P stands for phenotype, G for genotype and E for environment (Falconer and Mackay, 1996; Lynch and Walsh, 1998). A statistical formulation of this model for a two-way table of means can be written as:

$$\mu_{ij} = \mu + G_i + E_j + \underline{\varepsilon}_{ij} \tag{1}$$

From here onwards, in the model formulations, random terms are underlined to emphasize the fact that their effects are assumed to follow a normal distribution. Model 1 describes the response variable, that is, the mean of genotype $i$ in environment $j$, $\mu_{ij}$, as the result of the common fixed intercept term $\mu$, a fixed genotypic main effect corresponding to genotype $i$, $G_i$, plus a fixed environmental main effect corresponding to environment $j$, $E_j$, and finally the random term, $\varepsilon_{ij}$, representing the error term, typically assumed normally distributed, with a mean of zero and constant variance, $\sigma^2$; $\varepsilon_{ij} \sim N(0, \sigma^2)$.

Model 1 predicts that for any genotype the difference in means between any two environments $j$ and $j^*$ will be equal to the difference in the environmental main effects: $E_j - E_{j^*}$. Consequently, the norms of reaction of genotypes will be parallel (Figure 1A). Another important aspect is that, although the parameters in the model suggest that something intrinsically genetic and something intrinsically environmental is determining the trait, the genotypic and environmental effects purely follow from a convenient way of partitioning phenotypic variation from a statistical point of view. In a balanced data set, the genotypic main effects can be estimated from the average performance of the genotypes across environments. Rather than being something inherently genotypic, this is dependent on the set of environments used in the experiment. If a few environments are dropped, the genotypic effects will change. The same argument applies to the environmental main effects, which depend on the set of genotypes used in the experiment.

The results of the fit of an additive model to the maize data set are presented in Table 1. The results show that, according to the F-test, there is a significant environmental and genotypic main effect (the F statistic for environments equals 1466.5, and for genotypes 5.3, both of which are highly significant: P < 0.001). As just mentioned, environments are characterized by the average performance of the genotypes in the particular environment, and the results indicate that the environments differ significantly in their quality. In general, differences between environmental main effects are significant, and from the breeder’s point of view, this is not a major concern. Breeders want to concentrate on differences between genotypes. A significant genotypic main effect indicates that genotypes differ in their average performance across environments, something certainly more interesting to breeders. Finally, it should be mentioned that the residual $\varepsilon$ in Table 1 corresponds to the discrepancy between the predicted genotype-by-environment means from an additive model and the observed means.

Table 1 | ANOVA table for the additive model (model 1), as applied to CIMMYT maize stress trials.

Term | Degrees of freedom | Sum of squares | Mean squares | F | Probability
E | 7 | 5679 | 811.2 | 1466.5 | <0.001
G | 210 | 614 | 2.9 | 5.3 | <0.001
ε | 1470 | 813 | 0.6 | |
Total | 1687 | 7106 | 4.2 | |

There are two reasons for the disagreement between the predicted values from an additive model and the observed means for environment-specific genotypic performances: (1) an effect proper to the particular combination of genotype and environment; and (2) experimental error. Model 1 can be extended with an effect that is specific for genotype-by-environment combinations, GEI, or a double-indexed term $GEI_{ij}$:

$$\mu_{ij} = \mu + G_i + E_j + GEI_{ij} + \underline{\varepsilon}_{ij} \tag{2}$$

When we are working on a two-way table of means, we cannot straightforwardly separate GEI from error. For that, we would need to develop a model based on plot observations. Use of model 2 implies estimation of as many parameters as there are genotype-by-environment combinations, something that is not desirable in the interest of parsimony. Another limitation of the model is that it is not possible to estimate the genotypic performance in environments that are not included in the trial. Accordingly, fitting model 2 could tell us something about the amount of variation due to genotypic main effects in relation to GEI, by comparing sums of squares or mean squares, but it does not bring much progress toward understanding GEI.

THE REGRESSION ON THE MEAN MODEL

A more attractive alternative is to extend the additive model (model 1) by incorporating terms that explain as much as possible of the GEI. A popular strategy in plant breeding is that proposed by Finlay and Wilkinson (1963), which describes GEI as a regression line on the environmental quality. In the absence of explicit environmental information, the biological quality of an environment can be reflected in the average performance of all genotypes in that environment. Good environments will have a high average genotypic performance, and bad environments will have a low average genotypic performance. The GEI part is then described by genotype-specific regression slopes on the environmental quality, and the model can be written in the following equivalent ways:

$$\mu_{ij} = \mu + G_i + E_j + b_i E_j + \underline{\varepsilon}_{ij} \tag{3a}$$

$$\mu_{ij} = G'_i + b'_i E_j + \underline{\varepsilon}_{ij} \tag{3b}$$

Model 3b follows from model 3a by taking $\mu + G_i = G'_i$ and $E_j + b_i E_j = (1 + b_i) E_j = b'_i E_j$. Model 3b is easier to interpret because it looks like a set of regression lines; each genotype has a linear reaction norm with intercept $G'_i$ and slope $b'_i$. The explanatory environmental variable in these reaction norms is simply the environmental main effect $E_j$. Model 3a shows more clearly how GEI is captured by a regression on the environmental main effect, with the hope that as much as possible of the GEI signal will be retained by the term $b_i E_j$.
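The additive fit (model 1) and the regression on the mean (model 3a) can be illustrated numerically. The following is a minimal Python/NumPy sketch, not the GenStat programs of the Appendix. It assumes a complete (balanced) genotype-by-environment table of means, so that main effects can be estimated by simple row and column averages. The function name `finlay_wilkinson`, the dictionary keys, and the 3 x 3 toy table are illustrative assumptions and not part of the original analysis.

```python
import numpy as np


def finlay_wilkinson(means):
    """Fit models 1 and 3a to a complete genotype-by-environment table.

    means: 2-D array, rows = genotypes (i), columns = environments (j).
    Assumes no missing cells and at least two environments that differ.
    """
    y = np.asarray(means, dtype=float)
    mu = y.mean()                          # intercept
    G = y.mean(axis=1) - mu                # genotypic main effects G_i
    E = y.mean(axis=0) - mu                # environmental main effects E_j
    additive = mu + G[:, None] + E[None, :]
    resid_add = y - additive               # GEI + error under model 1
    # Slope b_i of model 3a: regression of the additive residuals on E_j
    b = resid_add @ E / (E @ E)
    fitted = additive + np.outer(b, E)
    ss = {
        "E": y.shape[0] * np.sum(E ** 2),
        "G": y.shape[1] * np.sum(G ** 2),
        "slopes": np.sum(np.outer(b, E) ** 2),
        "residual": np.sum((y - fitted) ** 2),
    }
    # b_prime is the slope b'_i = 1 + b_i of model 3b
    return {"mu": mu, "G": G, "E": E, "b": b, "b_prime": 1.0 + b, "ss": ss}


# Hypothetical 3 x 3 table of means (rows: genotypes, columns: environments)
toy = np.array([[2.0, 4.0, 6.0],
                [3.0, 4.0, 5.0],
                [1.0, 4.0, 7.0]])
fit = finlay_wilkinson(toy)
print(fit["b_prime"])   # genotype-specific sensitivities b'_i
print(fit["ss"])        # sums of squares for E, G, slopes, and residual
```

In a balanced table the estimated slopes $b_i$ sum to zero and the four sums of squares add up to the total sum of squares. With unbalanced data or weights from a first-stage analysis, a least-squares or mixed-model fit would be needed instead of simple averages.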


In the regression on the mean model, GEI is explained in terms of differential sensitivities to the improvement of the environment, with some genotypes (the ones with larger values of $b_i$) benefiting more than others from an increase in environmental quality. Note that in model 3a, $\sum_i b_i = 0$, so that the average slope value is zero, while in model 3b the average value of $b'$ is 1, meaning that $b' > 1$ for genotypes with a higher than average sensitivity, and $b' < 1$ for genotypes that are less sensitive than average.

Table 2 gives the fit of model 3a to the maize example data. The first two rows of the table, corresponding to the genotypic and environmental main effects, are identical to Table 1 in their degrees of freedom, sums of squares, and mean squares (the F statistics differ because the residual mean square has changed). The third row corresponds to the GEI effect in terms of the regression on environmental quality, where quality is represented by the environmental mean. This regression is highly significant, according to the F-tests (F = 2.4, P < 0.001). The residual sum of squares in Table 1 ($SS_{\varepsilon} = 813$) has been divided into a part explained by genotypic sensitivities to environmental quality ($SS_b = 230$), and a residual ($SS_{\varepsilon} = 583$).

Table 2 | ANOVA table for the regression on the mean model (model 3), as applied to CIMMYT maize stress trials.

Term | Degrees of freedom | Sum of squares | Mean squares | F | Probability
E | 7 | 5679 | 811.2 | 1752.3 | <0.001
G | 210 | 614 | 2.9 | 6.3 | <0.001
Heterogeneity of slopes | 210 | 230 | 1.1 | 2.4 | <0.001
ε | 1260 | 583 | 0.5 | |
Total | 1687 | 7106 | 4.2 | |

By way of example, the fitted reaction norms of five genotypes (out of the full set of 211 genotypes) are given in Figure 3, together with the parameters estimated according to the parameterization in model 3b ($G'$ and $b'$). Figure 3 shows that, in the average environment, genotypes G025 and G045 are better than G008, G012, and G016. The estimates for the parameters $G'$ can be read off from the plot as the fitted values at the null value of the x-axis, i.e., the average environment indicated by the dashed vertical line. Although G045 does slightly better than G025 in the average environment, G025 is superior to G045 in the high-quality environments. This is because G025 has a better ability to exploit improved environmental conditions, which is reflected in its higher genotypic sensitivity ($b'_{\mathrm{G025}} = 1.27 > b'_{\mathrm{G045}} = 0.99$). A similar observation can be made for G008 vs. G012 and G016. While G008 does relatively better in low quality environments, it is clearly surpassed by G012 and G016 in the best environments, since it is not capable of profiting from the better environmental conditions ($b'_{\mathrm{G008}} = 0.65$, which is the lowest sensitivity among the five genotypes).

FIGURE 3 | Finlay–Wilkinson regression curves of five maize genotypes. The vertical line indicates the average environment. Next to genotype labels, the corresponding Finlay–Wilkinson regression equation is given.

In summary, the regression on the mean model describes GEI in terms of parameters that can be given some biological meaning. In addition, and in contrast with the full interaction model (model 2), model 3 can be used to predict the performance of genotypes in environments that were not present in the MET, as long as the environment for which predictions are required can reasonably be placed within the range of environments used in the original MET. Nevertheless, the regression on the mean model suffers from the fact that the environmental characterization is based on a single dimension. Environmental quality can be hard to summarize within a single explanatory variable. Therefore, a substantial amount of GEI can remain unexplained. In the next section, the regression on the mean model will be extended by including multidimensional environmental characterizations in the statistical model for the genotype-by-environment data.

THE ADDITIVE MAIN EFFECTS AND MULTIPLICATIVE INTERACTIONS MODEL

The limitation of a single dimension in environmental characterization can be removed by employing a more flexible model, in which more than one environmental quality variable is allowed. A popular model of this type is the additive main effects and multiplicative interaction (AMMI) model (Gollob, 1968; Mandel, 1969; Gabriel, 1978; Gauch, 1988; van Eeuwijk, 1995). To emphasize the similarities with model 3a, we write the AMMI model as:

$$\mu_{ij} = \mu + G_i + E_j + \sum_{k=1}^{K} b_{ik} z_{jk} + \underline{\varepsilon}_{ij} \tag{4}$$

where the GEI is now explained by $K$ multiplicative terms ($k = 1 \ldots K$), each multiplicative term formed by the product of a genotypic sensitivity $b_{ik}$ (genotypic score) and a hypothetical environmental characterization $z_{jk}$ (environmental score). Although genotypic and environmental scores are deemed to represent genetic and environmental qualities, they come from a mathematical procedure, a principal components analysis on the GEI (Gabriel, 1978; Gauch, 1988) that maximizes the variation explained by the products of the genotypic and environmental scores.
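The principal components step of model 4 can be sketched as a singular value decomposition of the residuals from the additive model. The Python/NumPy sketch below is an illustration under stated assumptions, not the GenStat implementation of the Appendix. It assumes a complete table of means. The function name `ammi`, the random toy table, and the choice to split each singular value evenly between genotypic and environmental scores are all assumptions made here, and other scalings are equally valid.

```python
import numpy as np


def ammi(means, n_axes=2):
    """AMMI sketch: additive fit followed by an SVD of the interaction residuals.

    means: 2-D array, rows = genotypes (i), columns = environments (j).
    n_axes: number of multiplicative terms K retained in model 4.
    """
    y = np.asarray(means, dtype=float)
    mu = y.mean()
    G = y.mean(axis=1) - mu
    E = y.mean(axis=0) - mu
    # Doubly centred residuals: GEI + error under the additive model
    resid = y - mu - G[:, None] - E[None, :]
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    k = n_axes
    # Assumed scaling: square root of the singular value on each side
    b = U[:, :k] * np.sqrt(s[:k])          # genotypic scores b_ik
    z = Vt[:k].T * np.sqrt(s[:k])          # environmental scores z_jk
    ss_axes = s[:k] ** 2                   # sum of squares explained per axis
    fitted = mu + G[:, None] + E[None, :] + b @ z.T
    return {"b": b, "z": z, "ss_axes": ss_axes,
            "ss_gei": np.sum(resid ** 2), "fitted": fitted}


# Hypothetical table: 6 genotypes x 4 environments of simulated means
rng = np.random.default_rng(0)
toy = rng.normal(size=(6, 4))
fit2 = ammi(toy, n_axes=2)
print(fit2["ss_axes"], fit2["ss_gei"])
```

The rows of `b` and `z` are the coordinates that would be plotted in a biplot, and the product `b @ z.T` is the rank-K approximation of the GEI that the projections in such a plot visualize. The per-axis sums of squares are non-increasing, which corresponds to the first term explaining the most variation.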


The first product term is the one that explains most of the variation, followed by the second one, and so on. This is reflected in Table 3, which shows the results of fitting the AMMI model to the maize example data. In the AMMI model, GEI is explained by two axes (principal component 1, PCA1, and principal component 2, PCA2) that are highly significant (F = 2.8 and 2.0 respectively, both with an associated P < 0.001). The first axis (PCA1) explains the largest part ($SS_{\mathrm{PCA1}} = 242$), the second one explains a little less ($SS_{\mathrm{PCA2}} = 173$), with a total explained sum of squares for GEI of 242 + 173 = 415, an improvement over the explained sum of squares in the regression on the mean model ($SS_b = 230$).

Table 3 | ANOVA table corresponding to application of AMMI2 model (model 4) to CIMMYT maize stress trials.

Term | Degrees of freedom | Sum of squares | Mean squares | F | Probability
E | 7 | 5679 | 811.2 | 1752.3 | <0.001
G | 210 | 614 | 2.9 | 6.3 | <0.001
PCA1 | 216 | 242 | 1.1 | 2.8 | <0.001
PCA2 | 214 | 173 | 0.8 | 2.0 | <0.001
ε | 1040 | 398 | 0.4 | |
Total | 1687 | 7106 | 4.2 | |

PCA1 and PCA2 are the principal component axes 1 and 2, respectively.

A desirable property of the AMMI model is that the genotypic and environmental scores can be used to construct powerful graphical representations called biplots (Gabriel, 1978) that help to interpret the GEI. Figure 4A presents a biplot for the maize data. A first thing to recognize is that both genotypes and environments are present in the same plot; genotypes are represented by gray circles and environments by filled triangles (red, blue, and black). The environments are typically represented as axes intersecting at their origins. The origins represent the averages for the trait in the corresponding environments. The triangles point in the direction of increasing trait values. By projecting genotypes on environmental axes, GEI for individual genotypes is approximated. To help interpretation, environmental axes can be enriched by including a scale (Graffelman and van Eeuwijk, 2005).

FIGURE 4 | (A) Biplot from the AMMI model used to describe GEI in the maize example data. Gray circles represent genotypes, and filled triangles environments, with triangles pointing in the direction of increasing GEI (at origin GEI = 0). The projection of two genotypes (G041 and G091) on the NS92a axis is shown by a dashed line. (B) GGE biplot for the maize data set, with the same characteristics as the AMMI biplot, except that triangles point in the direction of increasing overall performance (G + GEI), so the origin corresponds to the average performance of all genotypes in the particular environment. Projections for genotypes G041 and G091 are given.

Biplots facilitate the exploration of relationships between genotypes and/or environments. Genotypes that are more similar to each other are closer to each other in the plot than genotypes that are less similar. The same is true for environments. Genotypes/environments that are alike tend to cluster together. The angle between environmental axes is related to the correlation between the environments. An acute angle indicates positive correlation (e.g., between LN96a and LN96b), a right angle indicates no correlation (e.g., between HN96b and NS92a), and an obtuse angle indicates negative correlation (e.g., NS92a and LN96a). The projection of a genotype onto an environmental axis reflects the performance of that genotype in that environment (for GEI). For example, genotype G091 projects on the NS92a axis above the origin, indicating a positive interaction with that environment, i.e., the relative performance (GEI part) of G091 in NS92a is above the


average of all genotypes in NS92a. Conversely, genotype G041 (on the right hand side of the plot) projects below the origin on the same axis, which points to a negative interaction with environment NS92a (i.e., G041 performs worse than average). Following a similar procedure, it is possible to conclude that while genotype G091 showed positive adaptation to environment NS92a, it is not well adapted to environments LN96a and LN96b (the projection of G091 on the LN96a and LN96b axes falls below the origin). Biplots are useful tools to investigate patterns in GEI, because they can help to identify interesting genotypes that are adapted to particular environments, and to classify environments in groups.

Plant breeders are interested in the total genetic variation and not exclusively in the GEI part. For that reason, it is useful to have a modification of model 4 that considers the joint effects of the genotypic main effect and the GEI as a sum of multiplicative terms. Effectively, the two-way table of genotype-by-environment means is subjected to a standard principal components analysis, with genotypes as objects and environments as variables (Yan et al., 2000). For this model, essentially the same estimation and interpretation procedures hold as for model 4. Because genotypic scores now describe genotypic main effects G and GEI together, this type of model is also known as the “Genotype main effects and GEI model,” or “GGE model,” and the biplots are called “GGE biplots” (Yan et al., 2000). The model reads:

$$\mu_{ij} = \mu + E_j + \sum_{k=1}^{K} b_{ik} z_{jk} + \varepsilon_{ij} \tag{5}$$

The results of model 5 fitted to the maize data are presented in the form of a biplot in Figure 4B. GGE biplots approximate overall performance (G + GEI). This is in contrast to AMMI biplots (Figure 4A), which approximate only the GEI part of the phenotype. Figure 4B shows the high yielding genotypes concentrated on the right hand side of the biplot, with their projections on environmental axes covering the above average range (for example, G091 projects above the origin in NS92a, whereas G041 is found below the origin). In contrast, low yielding genotypes (such as G041) are concentrated on the left hand side of the biplot (they project below the origin in most of the environments).

FACTORIAL REGRESSION MODELS
The models discussed so far assumed that we do not have explicit information about the environments. While such models can be useful to explain GEI, the biological interpretation of their results is not always obvious. What do hypothetical environmental variables, as in AMMI, mean in terms of quantifiable environmental characteristics such as temperature, water, nutrients, etc.? A straightforward approach is to correlate environmental scores with environmental covariables. However, if we do have explicit information about the environment, the information can be used directly in the model by including it in the form of explanatory variables. GEI is then described as differential genotypic sensitivity to explicit environmental factors such as temperature, precipitation, water availability, etc. Such models are known as factorial regression models (Denis, 1988; van Eeuwijk et al., 1996). Two examples of factorial regression models are given here. Model 6a includes a single environmental covariable, while model 6b includes multiple environmental covariables:

$$\mu_{ij} = \mu + G_i + E_j + b_i Z_j + \varepsilon_{ij} \tag{6a}$$

$$\mu_{ij} = \mu + G_i + E_j + \sum_{k=1}^{K} b_{ik} Z_{jk} + \varepsilon_{ij} \tag{6b}$$

Models 6a and 6b look very similar to models 3a and 4, but there is a substantial difference between them. In models 6a and 6b, $Z_j$ represents an explicit environmental covariable and not a hypothetical environmental covariable as in models 3a and 4 (note that Z is capitalized to highlight this difference). This distinction is critical since the interpretation of the GEI in models 6a and 6b is automatically placed into a biological context. Instead of describing GEI as differential reactions to hypothetical environmental covariables, factorial regression models help to identify genotypes that are differentially sensitive to changes in identified environmental quality components, for example, in a particular nutrient, or in water availability.

Table 4 shows the results of a factorial regression model fitted to the maize example data, in which GEI is explained by differential genotypic sensitivities to the minimum temperature during flowering (minTF, F = 1.7, P < 0.001) and to the amount of radiation during grain filling (radiationGF, F = 1.2, P ≤ 0.038). In many cases, different combinations of explanatory variables could produce closely similar models in terms of the amount of explained GEI. Therefore, to arrive at biologically meaningful models, it is crucial to combine statistical criteria for model selection with physiological knowledge about the trait that is involved (Voltas et al., 1999a,b, 2002).

Table 4 | ANOVA table corresponding to application of a factorial regression model (model 6) to CIMMYT maize stress trials.

Term           Degrees of freedom  Sum of squares  Mean squares  F       Probability
E              7                   5679            811.2         1752.3  <0.001
G              210                 614             2.9           6.3     <0.001
G.minTF        210                 172             0.8           1.7     <0.001
G.radiationGF  210                 124             0.6           1.2     ≤0.038
ε              1050                517             0.5
Total          1687                7106            4.2

MIXED MODELS FOR GENOTYPE-BY-ENVIRONMENT INTERACTION: MODELING GENETIC VARIANCES AND COVARIANCES
In the introduction, it was mentioned that GEI can be regarded both in terms of differential mean responses across environments and in terms of heterogeneity of genetic variation and covariation between environments. While the models considered so far focus on modeling the mean response, the models in this section focus on the modeling of GEI in terms of heterogeneity of variances and covariances. This section switches to the framework of so-called mixed models. We concentrate on the main characteristics of a few, relatively simple yet powerful, mixed models that
can be used to model GEI in terms of heterogeneity of variance and covariance. A more detailed description of mixed models can be found elsewhere in the literature (Verbeke and Molenberghs, 2000; Galwey, 2006).

The models discussed in the previous sections were all examples of fixed effects models, with all terms except the residual term fixed. However, genotypes can be regarded as a random sample from a larger population (an assumption that is especially easy to make when the number of genotypes is large, say more than 10), in which case genotypes are an extra source of random variation. This situation calls for a mixed model, with genotypes taken as a random term. A review of the use of mixed models to analyse complex data sets in plant breeding can be found in Smith et al. (2005). For the maize example data set, there are 211 genotypes. When the genotypic main effects are taken as random, the following mixed model equivalent of the additive model can be defined:

$$\mu_{ij} = \mu + \underline{G}_i + E_j + \varepsilon_{ij} \tag{7}$$

$$\underline{G}_i \sim N(0, \sigma^2_G), \qquad \varepsilon_{ij} \sim N(0, \sigma^2_\varepsilon)$$

The term $G_i$ is underlined to indicate that it is a random term; its distribution needs to be specified, and usually is taken to be normal, with zero mean and a variance specific to the term. Model 7 contains two variance components: one, $\sigma^2_G$, corresponds to the random genotypic main effects, and a second one, $\sigma^2_\varepsilon$, corresponds to the residual (which includes true GEI and error). An important consequence of including genotypes as random is that genetic covariances and correlations between performances in different environments are automatically imposed. The total variance for individual genotypic observations in a particular environment $j$, $\sigma^2_j$, is the sum of two sources of variation: $\sigma^2_j = \sigma^2_G + \sigma^2_\varepsilon$. The covariance between observations for a particular genotype in environments $j$ and $j^*$, $\sigma_{jj^*}$, following from model 7 is: $\sigma_{jj^*} = \sigma^2_G$. For observations on different genotypes $\sigma_{jj^*} = 0$. In model 7, similarities (or covariation, and therefore correlation) between observations made on the same genotype in different environments are assumed to be positive, but covariation between observations on different genotypes (regardless of whether the observations are made in the same or in different environments) is assumed to be zero. Model 7 is referred to as the compound symmetry model (Verbeke and Molenberghs, 2000).

The general definition for a correlation between two traits, or two environments, $x$ and $y$ is:

$$r(x; y) = \frac{\mathrm{covariance}(x; y)}{\sqrt{\mathrm{var}(x)}\,\sqrt{\mathrm{var}(y)}}$$

Model 7 imposes a constant correlation between environments, with the correlation between any pair of environments $j$ and $j^*$ (for clarity, we write $\mathrm{Env}_j$ and $\mathrm{Env}_{j^*}$ when referring to those environments), being equal to:

$$r(\mathrm{Env}_j; \mathrm{Env}_{j^*}) = \frac{\sigma_{jj^*}}{\sqrt{\sigma^2_j}\,\sqrt{\sigma^2_{j^*}}} = \frac{\sigma^2_G}{\sqrt{\sigma^2_G + \sigma^2_\varepsilon}\,\sqrt{\sigma^2_G + \sigma^2_\varepsilon}} = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_\varepsilon}$$

Although mixed models can be fitted by standard least squares procedures in the case of balanced data, a more general method of inference to fit mixed models is by residual maximum likelihood, or REML (Patterson and Thompson, 1971). Results of analyses based on REML are presented in another way than the familiar ANOVA tables. Table 5 shows the results obtained by fitting mixed models to the maize example data.

Table 5 does not contain sums of squares, nor mean squares. Instead, there is a table with three main sections. For model 7, the compound symmetry model, one section contains the results for testing fixed model terms (header Fixed terms). A second section shows the estimates for the variances of the random terms (header Random terms), and a third section a goodness-of-fit statistic, the deviance, that can be used to compare mixed models with equal fixed terms and differing random terms (header Deviance). For the fixed effects (environments in this case), Table 5 shows a Wald test statistic, the corresponding degrees of freedom (DF), and a P value. The Wald test statistic is used to assess the significance of fixed effects in the REML mixed model framework. Under the null hypothesis of no fixed effects, the Wald test has a distribution that is approximately a Chi-square with DF equal to the number of independent effects for the particular fixed term. In the maize example, the Wald test statistic for environments is 10,265.3 and it has 8 − 1 = 7 degrees of freedom. This Wald statistic has a very low tail probability in the Chi-square distribution under the null hypothesis of no environmental effects (P < 0.001). So, it is concluded that there is a significant difference between environments. Some statistical packages, including GenStat®, can provide an F-distributed approximation to the Wald statistic.

The estimates of the two parameters associated with the random terms in the model, $\sigma^2_G = 0.297$ and $\sigma^2_\varepsilon = 0.553$, are given in the second part of Table 5. The magnitudes of the variance components can be compared to get an impression of the relative importance of genotypic main effects ($\sigma^2_G$) in relation to the sum of GEI and error ($\sigma^2_\varepsilon$). The genetic correlation between any two environments is estimated as:

$$r(\mathrm{Env}_j; \mathrm{Env}_{j^*}) = \frac{0.297}{0.297 + 0.553} = 0.349$$

The last row in Table 5 presents the deviance (equal to −2 times the restricted loglikelihood), which is a measure of how well the model fits the data. The better the model, the lower the deviance is. As will be seen later, the deviance can be used to compare different models to select the best model for the data, provided that the fixed part of the model remains unchanged.

Model 7 assumes a constant genetic variance and correlation between pairs of environments. For METs, the assumption of constant genetic variance and genetic correlation across environments is unrealistic (Figure 2A). In the presence of GEI, a more realistic model would allow the total genetic variance to change from environment to environment, which will, in turn, cause heterogeneous genetic correlations between environments:

$$\mu_{ij} = \mu + \underline{G}_i + E_j + \varepsilon_{ij} \tag{8}$$

$$\underline{G}_i \sim N(0, \sigma^2_G), \qquad \varepsilon_{ij} \sim N(0, \sigma^2_{\varepsilon j})$$

In model 8, there is still a single genetic variance component for genotypes, and therefore, a constant genetic covariance between
Table 5 | REML output of the fit of different mixed models to the CIMMYT maize stress trials.

Model 7                             Model 8                             Model 9

Fixed     Wald (DF)     P           Fixed     Wald (DF)     P           Fixed     Wald (DF)     P
E         10265.3 (7)   <0.001      E         9759.4 (7)    <0.001      E         6268.8 (7)    <0.001

Random    Estimate      SE          Random    Estimate      SE          Random    Estimate      SE
σ²G       0.297         0.036       σ²G       0.125         0.017       σ²C1      0.439         0.053
σ²ε       0.553         0.020       σ²ε1      0.551         0.057       σ²C2      1             –
                                    σ²ε2      0.692         0.071       σ²C3      0.042         0.013
                                    σ²ε3      1.399         0.140       σC1C2     0.551         0.077
                                    σ²ε4      0.672         0.069       σC1C3     0.109         0.019
                                    σ²ε5      0.704         0.072       σC2C3     0.115         0.032
                                    σ²ε6      0.135         0.018       σ²ε1      0.446         0.051
                                    σ²ε7      0.152         0.019       σ²ε2      0.445         0.052
                                    σ²ε8      0.761         0.078       σ²ε3      0.736         0.169
                                                                        σ²ε4      0.428         0.050
                                                                        σ²ε5      0.508         0.057
                                                                        σ²ε6      0.145         0.018
                                                                        σ²ε7      0.138         0.017
                                                                        σ²ε8      0.740         0.080

Deviance (DF)  1077.9 (1678)        Deviance (DF)  838.4 (1671)         Deviance (DF)  619.9 (1667)

Model 7 assumes compound symmetry, model 8 assumes heterogeneity of genetic variance across environments, and model 9 assumes heterogeneity of genetic covariance between groups of environments and heterogeneity of genetic variance across individual environments. Environments are indexed as: 1 = SS92a, 2 = IS92a, 3 = NS92a, 4 = IS94a, 5 = SS94a, 6 = LN96a, 7 = LN96b, 8 = HN96b. Groups of environments are indexed as: C1 = SS92a, IS92a, IS94a, SS94a, HN96b; C2 = NS92a; C3 = LN96a, LN96b.
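
The genetic correlations implied by the three models can be recomputed from the variance components in Table 5. The Python sketch below is only an illustration of that arithmetic: the function and variable names (`correlation`, `r_model8`, `r_model9`, `group_of`, and so on) are hypothetical and not part of any package, and the numbers are copied from Table 5.

```python
import math

def correlation(cov, var_a, var_b):
    """Correlation from a covariance and the two total variances."""
    return cov / (math.sqrt(var_a) * math.sqrt(var_b))

# Model 7 (compound symmetry): one genetic and one residual variance.
var_g7, var_e7 = 0.297, 0.553
r_model7 = correlation(var_g7, var_g7 + var_e7, var_g7 + var_e7)

# Model 8: common genetic variance, environment-specific residual variances.
var_g8 = 0.125
var_e8 = {1: 0.551, 2: 0.692, 3: 1.399, 4: 0.672,
          5: 0.704, 6: 0.135, 7: 0.152, 8: 0.761}

def r_model8(j, k):
    """Correlation between environments j and k under model 8."""
    return correlation(var_g8, var_g8 + var_e8[j], var_g8 + var_e8[k])

# Model 9: group-level (co)variances plus environment-specific variances.
group_of = {1: "C1", 2: "C1", 3: "C2", 4: "C1",
            5: "C1", 6: "C3", 7: "C3", 8: "C1"}
var_c = {"C1": 0.439, "C2": 1.0, "C3": 0.042}  # C2 is fixed at 1 in Table 5
cov_c = {frozenset(("C1", "C2")): 0.551,
         frozenset(("C1", "C3")): 0.109,
         frozenset(("C2", "C3")): 0.115}
var_e9 = {1: 0.446, 2: 0.445, 3: 0.736, 4: 0.428,
          5: 0.508, 6: 0.145, 7: 0.138, 8: 0.740}

def r_model9(j, k):
    """Correlation between environments j and k under model 9."""
    gj, gk = group_of[j], group_of[k]
    # Same group: covariance is the group variance; otherwise the
    # between-group covariance.
    cov = var_c[gj] if gj == gk else cov_c[frozenset((gj, gk))]
    return correlation(cov, var_c[gj] + var_e9[j], var_c[gk] + var_e9[k])

print(round(r_model7, 3))        # 0.349
print(round(r_model8(6, 7), 3))  # 0.466
print(round(r_model8(3, 6), 3))  # 0.199
print(round(r_model9(1, 2), 3))  # 0.496
print(round(r_model9(1, 7), 3))  # 0.273
```

The printed values agree with the correlations quoted in the text for the same pairs of environments.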

environments. However, the variance for the term $\varepsilon_{ij}$, which includes GEI and error, is assumed to depend on the environment (i.e., the variance component $\sigma^2_{\varepsilon j}$ is indexed by $j$). Table 5 presents the results of fitting model 8 to the maize data. Instead of two variance components, there are now nine, one corresponding to the variance component for genotypes ($\sigma^2_G = 0.125$), and eight corresponding to a form of GEI for each of the eight environments (for convenience, we assume constant errors). The heterogeneity of variance for $\varepsilon_{ij}$ reflects that in some environments there is a larger variation (e.g., in environment 3, which is the high-yielding NS92a) than in other environments (e.g., in environments 6 and 7, which are the low-yielding LN96a and LN96b). The heterogeneity of variance leads to heterogeneous genetic correlations between environments. For example, the correlation between environments 6 and 7 is:

$$r(\mathrm{Env}_6; \mathrm{Env}_7) = \frac{0.125}{\sqrt{0.125 + 0.135}\,\sqrt{0.125 + 0.152}} = 0.466$$

and between environments 3 and 6 is:

$$r(\mathrm{Env}_3; \mathrm{Env}_6) = \frac{0.125}{\sqrt{0.125 + 1.399}\,\sqrt{0.125 + 0.135}} = 0.199$$

In conclusion, model 8 accommodates heterogeneity of variance between environments and, with it, allows for heterogeneous correlations between environments, which can be desirable when analyzing environments that strongly differ (e.g., with strong stress and without stress).

The deviance for model 8 is 838.4 with 1671 DF, which is much lower than the one for model 7 (deviance 1077.9 with 1678 DF). The deviance has dropped, but at the expense of having to estimate more parameters (nine instead of two parameters). Is the decrease in deviance large enough to consider model 8 a significant improvement over model 7? Because models 7 and 8 are nested (model 7 is a special case of model 8 when the $\sigma^2_{\varepsilon j}$ are equal for all $j$), a deviance test can be used to answer this question. Under the null hypothesis of no difference in quality of the fits, the difference in deviance between the two models is Chi-square distributed with the number of DF equal to the difference in the number of parameters between the models. In the example, the difference in deviance is 1077.9 − 838.4 = 239.5, and the models differ by seven parameters. The P value associated with 239.5 in a Chi-square distribution with 7 DF is very small (P < 0.001), so it is concluded that model 8 provides a significant improvement over model 7.

In cases where the models are not nested, the comparison can be done by the Akaike Information Criterion (AIC) (Akaike, 1974). For model 7, AIC = 4170, and for model 8 AIC = 3944. The model that has the lowest AIC value is the one that is chosen. Model 8 has the lowest AIC value, which agrees with the conclusion based on the deviance test.
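
The deviance test described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used by GenStat®: the helper names `chi2_sf` and `deviance_test` are hypothetical, and the Chi-square tail probability is computed with the standard library only, using the recurrence Q(a + 1, h) = Q(a, h) + h^a e^(−h) / Γ(a + 1) for the regularized upper incomplete gamma function (valid here because the DF are whole numbers).

```python
import math

def chi2_sf(x, df):
    """Upper tail probability P(X > x) for a Chi-square with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2:                       # odd df: start from Q(1/2, h)
        a, q = 0.5, math.erfc(math.sqrt(h))
    else:                            # even df: start from Q(1, h)
        a, q = 1.0, math.exp(-h)
    while a < df / 2.0:              # step a up to df / 2
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

def deviance_test(deviance_reduced, deviance_full, extra_parameters):
    """Deviance difference between nested models and its P value."""
    difference = deviance_reduced - deviance_full
    return difference, chi2_sf(difference, extra_parameters)

# Model 7 (2 variance parameters) against model 8 (9 variance parameters).
difference, p_value = deviance_test(1077.9, 838.4, 9 - 2)
print(round(difference, 1))  # 239.5
print(p_value < 0.001)       # True
```

The same function applies to any pair of nested models with identical fixed terms, given their deviances and the difference in the number of variance parameters.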

Model 8 assumes heterogeneous variances across environments, in combination with a constant covariance between environments. This latter assumption can be relaxed by also allowing the genetic covariance between environments to be heterogeneous. A possibility is to estimate a covariance parameter for each pair of environments, producing a variance-covariance model that is referred to as the “unstructured model” (Verbeke and Molenberghs, 2000). A somewhat simpler strategy consists of estimating covariances between groups of environments instead of between individual environments: the environments are first grouped in a number of clusters and then the following model is fitted:

$$\mu_{i(c)j} = \mu + \underline{G}_{i(c)} + E_j + \varepsilon_{i(c)j} \tag{9}$$

$$\underline{G}_{i(c)} \sim N(0, \Sigma_C), \qquad \varepsilon_{i(c)j} \sim N(0, \sigma^2_{\varepsilon j})$$

In model 9 a random genetic main effect is fitted that changes between groups of environments and that has a covariance matrix $\Sigma_C$ that consists of group specific genetic variances, with $\sigma^2_{cj}$ for group $j$, on the diagonals, and pairwise-specific genetic covariances, with $\sigma_{cjj^*}$ between groups $j$ and $j^*$, on the off-diagonals. Model 9 retains the residual heterogeneity of model 8, which means that environment specific genotypic effects are added to group specific genotypic effects. To illustrate model 9, using the maize example, and based on Figure 4, the environments were clustered in three groups: group 1 = (SS92a, SS94a, IS92a, IS94a, HN96b), group 2 = (NS92a), and group 3 = (LN96a, LN96b). Therefore, the covariance matrix $\Sigma_C$ will contain on the diagonal the genetic variances for groups 1, 2, and 3 ($\sigma^2_{c1}$, $\sigma^2_{c2}$, and $\sigma^2_{c3}$, respectively), and on the off-diagonals the covariances between the groups ($\sigma_{c1c2}$, $\sigma_{c1c3}$, and $\sigma_{c2c3}$). The full covariance matrix can be written as:

$$\Sigma_C = \begin{pmatrix} \sigma^2_{c1} & & \\ \sigma_{c1c2} & \sigma^2_{c2} & \\ \sigma_{c1c3} & \sigma_{c2c3} & \sigma^2_{c3} \end{pmatrix}$$

The results of fitting model 9 to the maize data are presented in Table 5, where the estimates of the parameters in the covariance matrix $\Sigma_C$ can be found.

The diagonals of $\Sigma_C$ show that, on average, the genetic variation is lower in group 3 (the group of nitrogen stress environments) than in group 1. It should be noted that because group 2 is composed of a single environment, the genetic variation cannot be partitioned into a component due to the group and a residual, so $\sigma^2_{c2}$ is not estimated but arbitrarily fixed to 1 (Table 5). The total variance in each of the environments is equal to the sum of the group’s variance plus the environment-specific variance. For example, the variance in environment 1 is equal to 0.885, which is the sum of the variance of group 1, i.e., $\sigma^2_{c1} = 0.439$, and $\sigma^2_{\varepsilon 1} = 0.446$. Recalling that the covariance between environments within the same group is given by $\sigma^2_{c1}$, $\sigma^2_{c2}$, and $\sigma^2_{c3}$, and the covariance between environments in different groups by $\sigma_{c1c2}$, $\sigma_{c1c3}$, and $\sigma_{c2c3}$, the correlation between any pair of environments can be estimated. For example, the correlation between environments 1 and 2 is:

$$r(\mathrm{Env}_1; \mathrm{Env}_2) = \frac{0.439}{\sqrt{0.439 + 0.446}\,\sqrt{0.439 + 0.445}} = 0.496$$

and between environments 1 and 7 is:

$$r(\mathrm{Env}_1; \mathrm{Env}_7) = \frac{0.109}{\sqrt{0.439 + 0.446}\,\sqrt{0.042 + 0.138}} = 0.273$$

Finally, the deviance can be used to evaluate whether the allowance for heterogeneity of covariance between environments improved the quality of the model or not.

The deviance for model 9 is 619.9 with 1667 DF, and the difference in deviance with model 8 is 218.5, with four extra parameters. The associated P value for 218.5 in a Chi-square distribution with 4 DF is very low (P < 0.001), so it can be concluded that model 9 is a significant improvement over model 8. For model 9 AIC = 3736, which is smaller than for model 8 (AIC = 3944), and confirms this conclusion.

We have presented different mixed model formulations to model GEI in terms of heterogeneity of variance and covariance between environments. The compound symmetry model, which is the commonly used default model when fitting a mixed model to a two-way table of means, forces variances and covariances to be constant across environments. Two alternative models accommodated either heterogeneity of genetic variances across environments, or heterogeneity of genetic variances and covariances across environments. There are other useful variance-covariance models, such as the factor analytic model (Malosetti et al., 2004; Boer et al., 2007), which combines flexibility with parsimony (reduced number of parameters), but their discussion is outside the scope of this paper.

The analysis of a data set is an iterative process consisting of fitting and comparing alternative models to identify a good model for the data under study. That process has been illustrated with a maize data set. The next section goes one step further in the modeling process by including molecular marker information, with the ultimate objective of identifying genomic regions, QTLs, that underlie genetic variation of quantitative traits. Within the context of METs, the use of such models is a powerful tool to identify and understand the genetic basis of GEI, that is, QEI.

QTL MAPPING IN THE CONTEXT OF MULTI-ENVIRONMENT TRIALS: MODELING MAIN EFFECT QTLs AND QTL-BY-ENVIRONMENT INTERACTION
So far, we discussed models that use either implicit or explicit environmental characterizations to understand GEI. We switch in this section to the use of explicit genotypic information in the models describing GEI. Use of such information in statistical models for GEI can help understand the basis of GEI in terms of the action of genome regions, QTLs, in their dependence on the environment, i.e., QEI. Molecular marker systems (RFLP, AFLP, DArT, SSR, SNP) provide information about variation at the DNA level that can be employed in statistical models. For example, within the framework of factorial regression models, markers can serve as explanatory variables, which is at the core of regression-based approaches for
QTL mapping (Haley and Knott, 1992; Martínez and Curnow, 1992).

Elaborating upon factorial regression ideas, the following section presents mixed models that can accommodate explicit genotypic information to describe GEI in terms of QTL and QEI effects (Malosetti et al., 2004; Boer et al., 2007; van Eeuwijk et al., 2007, 2010). The genotypic information stemming from markers is introduced in the statistical models in the form of so-called genetic predictors. Applications of mixed model QTL-by-environment detection, such as the one described here, can be found in wheat (Mathews et al., 2008), sugar cane (Pastina et al., 2012), and sorghum (Sabadin et al., 2012). We should emphasize that although we focus on QTL models applied to standard biparental populations, these models can be adapted rather easily to multi-parental populations (van Eeuwijk et al., 2010; Huang et al., 2011), or association mapping panels (Malosetti et al., 2007; van Eeuwijk et al., 2010).

While this paper focuses on mixed model QTL detection, this is certainly not the only method for multi-environment QTL mapping. A well known and common alternative is to use mixture model approaches (Jiang and Zeng, 1995), for which various user-friendly QTL software packages exist (e.g., QTL Cartographer, Basten et al., 2002). However, such QTL software packages typically provide little or no opportunity to intervene with the statistical model, nor do they allow for applying different model building strategies. For example, in the mixture model context, it is hard to switch between different models for representing the dependencies between environments or to add explicit information on the environments, something that is relatively easy in the mixed model context.

EXPLANATORY VARIABLES FOR DIFFERENCES BETWEEN GENOTYPES: GENETIC PREDICTORS
Most populations in QTL mapping originate from crosses between pairs of inbred lines. A segregating offspring population can be produced from an F1 hybrid after one generation of selfing (F2), after several generations of self-pollination (recombinant inbred lines or RIL), or after crossing the F1 with one of the parental lines (backcross). In addition, by chromosome doubling of F1 gametes, a population of doubled haploid lines can be generated. In all of these cases, two alleles at most will segregate at each locus. For a locus M1, individuals can have the genotypes M1M1, M1m1, or m1m1, with M1 the allele that comes from the paternal line, and m1 the allele that comes from the maternal line. By convention the locus names are given in italics (so, for example, italic M1 refers to locus 1, whereas M1 and m1 refer to the paternal and maternal alleles at locus 1, respectively). The relative frequency of the genotypes in the offspring population depends on the type of population; for example, in an F2 the expected frequencies are ¼, ½, and ¼ for M1M1, M1m1, and m1m1, respectively.

With the help of molecular markers, it can be revealed whether a particular individual is of the M1M1, M1m1, or m1m1 type. To detect QTLs and estimate their effects, it is necessary to translate the marker information into explanatory variables or genetic predictors. A straightforward way of constructing genetic predictors is to create an explanatory variable that contains the number of copies of one of the alleles, for example, the M1 allele. The genetic predictor will then take the value 2 whenever an individual has two paternal alleles (M1M1), the value 1 when the offspring individual is M1m1, and 0 when it is m1m1. Using a simple regression model, the slope for the regression of the genotypic means on a genetic predictor defined by the number of M1 alleles corresponds to the effect of a substitution of an m1 allele by an M1 allele at the given locus (Lynch and Walsh, 1998; Bernardo, 2002). This effect is also known as the additive genetic substitution effect of the QTL allele. By analogy, a dominance genetic predictor can be constructed by creating an explanatory variable with value 0 when the offspring individual is M1M1 or m1m1, and value 1 whenever it is M1m1.

With complete information on the marker genotypes, i.e., codominant markers without missing values, the construction of genetic predictors at marker positions consists of simply counting the number of alleles coming from a particular parent. For genomic positions in between marker loci (putative QTL positions), for dominant markers, and for markers with missing values, the construction of genetic predictors requires more effort. In a general formulation, the value for the additive genetic predictor, $X^{add}$, for an offspring individual can be defined as the expected number of alleles coming from the paternal line, i.e., the number of M1 alleles:

$$X^{add} = \Pr(M_1M_1 \mid \text{all markers}) \times 2 + \Pr(M_1m_1 \mid \text{all markers}) \times 1 + \Pr(m_1m_1 \mid \text{all markers}) \times 0, \tag{10a}$$

with $\Pr(M_1M_1 \mid \text{all markers})$, $\Pr(M_1m_1 \mid \text{all markers})$, and $\Pr(m_1m_1 \mid \text{all markers})$ the conditional probabilities of the individual being of the M1M1, M1m1, or m1m1 type, respectively, given the observed marker information. Note that in the case of complete information, the individual’s genotype is known, so one of $\Pr(M_1M_1 \mid \text{markers})$, $\Pr(M_1m_1 \mid \text{markers})$, and $\Pr(m_1m_1 \mid \text{markers})$ will be equal to 1, while the others will be 0. In the case of incomplete information, although the genotype for a locus of an individual may not be known with certainty, information can be obtained from nearby markers to estimate the probability of the offspring individual being of a particular genotype. This probability is a function of the observed genotypes at neighboring markers and the expected recombination occurring between those marker loci and the locus under evaluation (Lynch and Walsh, 1998). Efficient methods to calculate conditional genetic probabilities for the different types of population commonly used for plants have been proposed in the literature; see Jiang and Zeng (1997) for an exhaustive overview. The calculation of genotypic probabilities conditional on marker information provides the basis for all QTL mapping strategies; QTL mapping packages calculate these probabilities behind the scenes. In GenStat® (see “Appendix”), a very general Hidden Markov Model algorithm has been programmed to calculate those conditional probabilities. Other packages that calculate those probabilities and that are free are Grafgen (Servin et al., 2002) and R/qtl (Broman et al., 2003).

With the estimated conditional probabilities, the genetic predictors at positions where no or partial marker information is available can be calculated by using the conditional probabilities
in expression 10a. An analogous reasoning holds for the estimation of dominance genetic predictors:

$$X^{dom} = \Pr(M_1M_1 \mid \text{all markers}) \times 0 + \Pr(M_1m_1 \mid \text{all markers}) \times 1 + \Pr(m_1m_1 \mid \text{all markers}) \times 0. \tag{10b}$$

Table 6 | Results of the test for fixed effects in a mixed model including a fixed environment-specific additive (αj) and dominance (δj) QTL effect.

Fixed terms              Wald     DF  P
E                        10875.5  7   <0.001
Additive effect (αj)     100.9    8   <0.001
  αQ                     12.8     1   <0.001
  αjQEI                  88.0     7   <0.001
Dominance effect (δj)    13.5     8   ≤0.097

The additive QTL effect is partitioned into a QTL main effect (αQ), and a QEI effect (αjQEI).

MODELING GENOTYPE-BY-ENVIRONMENT INTERACTION IN TERMS OF QTL EFFECTS
The inclusion of genetic predictors in a GEI model allows testing the hypothesis that the DNA at a particular genome position has an effect on a phenotypic trait, and whether that effect is environment dependent or not. A basic GEI phenotypic model, as the one discussed in the previous sections, can be extended to accommodate two new terms, one for the additive genetic effect of a
possible QTL ($X^{add}_i \alpha_j$), and a second for the dominance effect of the same locus ($X^{dom}_i \delta_j$):

$$\mu_{ij} = \mu + E_j + X^{add}_i \alpha_j + X^{dom}_i \delta_j + \underline{G}_i + \varepsilon_{ij}, \tag{11}$$

where $X^{add}_i$ and $X^{dom}_i$ stand for the values of the additive and dominance genetic predictors of individual $i$ at the position at which a QTL is postulated and tested for. The parameters $\alpha_j$ and $\delta_j$ represent the additive and dominance effects of this QTL. In model 11, both types of QTL effects are indexed by $j$, because environment-specific effects are allowed. Residual genetic main effects (i.e., genetic effects not explained by the QTL) contribute to the random genetic effect, $G_i$, and residual GEI (residual QEI) contributes to $\varepsilon_{ij}$. The conclusion about the presence of a QTL at a particular position is based on a Wald test (Verbeke and Molenberghs, 2000) that assesses the null hypothesis of the environment-specific additive and dominance genetic effects being zero across all environments: $H_0: \alpha_j = 0$, and $H_0: \delta_j = 0$, $j = 1 \ldots J$. Note that, by definition, dominance effects are deviations from additivity, so dominance effects should be tested conditional on the additive effects present in the model. In practice, and to ensure that the proper test is used, it is advised to include the term for additive genetic effects in the model before the term for the dominance effects, and to use the sequential Wald test (e.g., in GenStat® output, the test under the heading “Sequentially adding terms to fixed model”).

If required, a similar partitioning of the QTL effects may be carried out for the dominance effects. As a result of the partitioning of the environment-specific QTL effects, there is a Wald test for QTL main effect and a Wald test for QEI (Table 6). The QEI effects should be tested conditional on the main effect being fitted into the model, i.e., the QTL main effect should always precede the term for QEI. In the example, it is observed that the QEI effect is highly significant (Wald = 88.0 on 7 DF, P < 0.001), so it is concluded that QTL effects are dependent on the environment. Since there is significant QEI, no attempt will be made to interpret the QTL main effect. When QEI is not significant, the model can be simplified by omitting the QEI term, as the QTL main effect will suffice to describe the QTL effect.

A QTL MAPPING STRATEGY FOR MULTI-ENVIRONMENT TRIALS BASED ON MIXED MODELS
The preceding section presented a number of models that can be useful in the detection of QTLs for MET data. The present section discusses a strategy for a genome-wide scan for QTLs. QTL mapping can be regarded as a model selection process aiming to identify a model that describes the phenotypic response in terms of QTL effects. Since a priori neither the number of QTLs nor their effects are known, we need a strategy that allows us to explore the vast range of possible models. There is no unique way of performing this search, but an effective strategy is presented here consisting of the following steps: (1) find a good model for the phenotypic data; (2) perform a genome-wide scan for QTLs by simple interval mapping (SIM); (3) perform one or more rounds of composite interval mapping (CIM) starting with cofactors selected from the SIM step; and (4) fit a final multi-QTL model to estimate QTL effects. Each step is illustrated using the maize example data. Example code that performs the different steps in GenStat® (VSN International, 2012) and in GenStat Discovery® (Payne et al., 2007) is given in the “Appendix.”

For the maize data, Table 6 shows an example of the application of model 11 to a particular genomic position. The table indicates that the dominance effect at this genome position was not significant (Wald statistic = 13.5 on 8 DF, P ≤ 0.097), and, therefore, the null hypothesis of no dominance effects is not rejected. However, the Wald statistic for the additive genetic effects was highly significant (Wald = 100.9, on 8 DF, P < 0.001),
indicating the existence of additive QTL effects. It is still necessary
to find out whether they are environment specific, i.e., whether
STEP 1: IDENTIFY THE BEST VARIANCE-COVARIANCE MODEL FOR THE
a QEI term is needed, or whether a model with just main effect
PHENOTYPIC DATA
QTL expression would suffice. To this purpose, the environment–
A number of models can be fitted (for example models 7 to 9
specific QTL effects (αj ) are partitioned into an additive main
plus the unstructured model), and compared based on the AIC
effect (αQ ) and QEI effects (αQEI
j ), leading to the following model: values. The selected mixed model will be the starting point from
which to develop a QTL model. Table 7 gives the AIC for four
add QEI
µij = µ + Ej + Xadd Q
i α + Xi αj + Xdom
i δj + Gi + εij (12) candidate models for the maize example data, and shows that

the unstructured model is the best (lowest AIC) and is, therefore, chosen as the basic phenotypic model.

Table 7 | Comparison of the goodness of fit for four different mixed models (models 7 to 9 and the unstructured model), as fitted to CIMMYT maize stress trials.

Model          Deviance   DF     Δ Deviance   Δ DF   P        AIC
Model 7        1077.9     1678   –            –      –        4170
Model 8         838.4     1671   239.5        7      <0.001   3944
Model 9         619.9     1667   218.5        4      <0.001   3736
Unstructured    548.7     1644    71.2        23     <0.001   3708

The columns “Δ Deviance” and “Δ DF” indicate the differences in deviance and number of degrees of freedom between the current and the preceding model in the list. The associated P values correspond to a Chi-square distribution with Δ DF degrees of freedom.
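The P values attached to the deviance differences in Table 7 can be checked by referring each difference to a Chi-square distribution with the matching difference in degrees of freedom. The following is a minimal sketch in plain Python, not GenStat output: the helper name `chi2_sf` is an illustrative assumption, and it relies on the closed-form tail probability that is available when the degrees of freedom are integers.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction series
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * density * total

# (model, deviance, residual DF) as listed in Table 7
models = [
    ("Model 7", 1077.9, 1678),
    ("Model 8", 838.4, 1671),
    ("Model 9", 619.9, 1667),
    ("Unstructured", 548.7, 1644),
]

results = []
for (_, dev_prev, df_prev), (name, dev_cur, df_cur) in zip(models, models[1:]):
    delta_dev = round(dev_prev - dev_cur, 1)
    delta_df = df_prev - df_cur
    results.append((name, delta_dev, delta_df, chi2_sf(delta_dev, delta_df)))

for name, delta_dev, delta_df, p_value in results:
    print(f"{name}: delta deviance = {delta_dev} on {delta_df} DF, P = {p_value:.2e}")
```

Each of the three comparisons gives a P value far below 0.001, in line with the table. The AIC column is not recomputed here, because the constant that the fitting software includes in the AIC is not stated in the text.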

STEP 2: GENOME-WIDE QTL SCAN, SIMPLE INTERVAL MAPPING

After choosing the phenotypic model, a genome-wide scan is performed by fitting single QTL models across the genome at marker and in between marker positions, i.e., SIM. To perform SIM, we need to estimate genetic predictors that cover the genome. For most population types and population sizes of a few hundred individuals, calculating the genetic predictors every 5–10 cM is sufficient. The genetic predictors are used to test for a QTL effect at the predictor location. The unstructured model was selected for the maize data set, so the SIM scan can be done by fitting the following model at every genetic predictor position (only additive effects are tested as a previous analysis showed little dominance):

$$\mu_{ij} = \mu + E_j + X^{add}_i \alpha_j + G_i + \varepsilon_{ij} \tag{13}$$

The results of a genome-wide SIM scan are plotted in Figure 5. The upper plot displays the P value of the Wald test (on a $-\log_{10}$ scale) for the effect of a QTL along the chromosomes. The horizontal line indicates a threshold value, above which the null hypothesis of no QTL is rejected. The profile shows evidence of QTLs on chromosomes 1, 3, 4, 6, and 10. The two largest QTLs are the ones on chromosome 1 and on chromosome 10. The lower panel shows an indication of the magnitude of the QTL effects in each of the environments at a particular chromosome position. The color points to the parent that contributes the high value allele (blue = maternal line, red = paternal line), and the color intensity to the magnitude of the effect. QEI is reflected in this plot by changes in color at a particular chromosome position (cross-over interaction) or by changes in intensity of the color (convergence-divergence). For example, the large QTL on chromosome 1 not only shows changes in magnitude of the effects between environments (different color intensities), but also shows a change of color: while in HN96b the allele increasing yield comes from the mother (blue), in IS92a, IS94a, NS92a, SS92a, and SS94a the allele increasing yield comes from the father (red). This is an example of cross-over interaction. The large QTL on chromosome 10 shows only differences in magnitude of the QTL effect (from largest in HN96b to no effect in LN96a, LN96b, and SS92a), but always with the allele from the father contributing to higher yield.

FIGURE 5 | Plot produced by a SIM QTL scan in a maize F2 population. The upper panel shows the P value of the Wald test (on a $-\log_{10}$ scale) for the effect of a QTL along the chromosomes (solid line). The horizontal line indicates a threshold value for significance. The lower panel gives an indication of magnitude of QTL effects (higher intensity, larger effect), and parental line contributing the superior allele (blue, maternal line; red, paternal line).

Scanning the results across the full set of chromosomes produces a list of putative QTL positions that can be used as cofactors at the following stage of the QTL mapping.

SIM implies performing multiple tests along the genome, one test at each putative QTL position. For example, for the maize data genetic predictors were calculated at 246 chromosome positions, which means that model 13 was fitted 246 times. When performing multiple tests, the probability of at least one false positive (i.e., falsely rejecting the null hypothesis) increases, for independent tests, according to the expression $1 - (1 - \alpha)^n$, with $\alpha$ the test level for a single test and $n$ the number of tests. A simple correction method is the Bonferroni correction, which uses $\alpha/n$ instead of $\alpha$ to test individual null hypotheses, ensuring that the probability of at least one false rejection among the $n$ tests is at most equal to $\alpha$. For example, to keep the probability of a false rejection in the whole of the experiment (genome-wide) at no more than 5%, one should use a threshold equal to $0.05/n$. A disadvantage of the Bonferroni correction is that it is very conservative, with the risk that some QTLs go undetected, especially when not all tests are independent, as is the case in QTL mapping, where nearby positions are correlated.

Modifications to the Bonferroni correction in the context of QTL mapping have been proposed by Cheverud (2001), with further modifications proposed by Li and Ji (2005). Both approaches essentially compensate for the fact that, in QTL mapping, tests are correlated by using an estimated effective number of tests ($n^*$) instead of the actual number of tests ($n$) to set the significance threshold. For the maize data, the Li and Ji (2005) approach

produced a value of $n^* = 81$, which gives a larger threshold P value than the Bonferroni correction (0.05 is divided by 81, instead of by 246). By default, GenStat estimates $n^*$ and uses it to set the corresponding significance threshold.
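The arithmetic behind these thresholds is easy to reproduce. The sketch below uses only the numbers quoted in the text (246 scanned positions, an effective number of tests of 81, and a genome-wide level of 0.05). The function names are illustrative assumptions, and the effective number of tests is taken as given rather than estimated from marker correlations.

```python
import math

def familywise_error(alpha, n_tests):
    """P(at least one false positive) for n independent tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

def per_test_threshold(alpha, n_tests):
    """Bonferroni-type threshold: genome-wide level divided by the number of tests."""
    return alpha / n_tests

alpha = 0.05
n_positions = 246   # positions at which model 13 was fitted
n_effective = 81    # effective number of tests reported for the maize data

fwer_uncorrected = familywise_error(alpha, n_positions)
bonferroni = per_test_threshold(alpha, n_positions)
effective = per_test_threshold(alpha, n_effective)

print(f"Uncorrected family-wise error over {n_positions} tests: {fwer_uncorrected:.6f}")
for label, threshold in [("Bonferroni", bonferroni), ("Effective number of tests", effective)]:
    print(f"{label}: P < {threshold:.2e} (-log10 P > {-math.log10(threshold):.2f})")
```

On the $-\log_{10}$ scale used in the scan plots, the two thresholds are about 3.69 (Bonferroni) and 3.21 (effective number of tests), so the second is the less stringent of the two.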

STEP 3: COMPOSITE INTERVAL MAPPING

The power of QTL detection can be improved by reducing the background noise caused by QTLs outside the region under test. This is the principle of the CIM approach, simultaneously proposed by Jansen and Stam (1994) and Zeng (1994). The difference between SIM and CIM is that, when performing CIM, the model includes a number of cofactors that correct for the effects of the genetic background:

$$\mu_{ij} = \mu + E_j + \sum_{f} X_{if} c_{jf} + X^{add}_i \alpha_j + G_i + \varepsilon_{ij} \tag{14}$$

In model 14 the term $\sum_{f} X_{if} c_{jf}$ accounts for the effects of QTLs outside the region that is being tested ($X^{add}_i$), reducing the error variation and thereby improving the power for QTL detection. Various strategies exist for the selection of a set of cofactors, but
a pragmatic approach is to use the results from the SIM scan, including the positions indicative of QTLs by SIM as cofactors.

Another issue that needs to be addressed is that when testing in a region close to a cofactor, it is necessary to exclude that cofactor from the model to avoid collinearity with the tested position. A popular solution is to choose a window around an evaluation position such that if a cofactor falls inside that window, then the cofactor is excluded from the model. Window size affects the results of a CIM scan, and there are no clear-cut recommendations about which window size to use. For the present example, all cofactors that are on the chromosome being evaluated are excluded, a strategy known as restricted CIM.

The results of the restricted CIM scan for the maize data are presented in Figure 6. The profiles point to QTLs on chromosomes 1, 2, 3, 4, 6, 9, and 10. In comparison with the results from SIM, the CIM profile reveals the same QTLs (the two major QTLs on chromosomes 1 and 10, and the ones on chromosomes 3, 4, and 6), but in addition it shows indications of QTLs on chromosomes 2 and 9.

FIGURE 6 | Plot produced by a CIM QTL scan in a maize F2 population. The upper panel shows the P value of the Wald test (on a $-\log_{10}$ scale) for the effect of a QTL along the chromosomes (solid line). The horizontal line indicates a threshold value for significance. The lower panel gives an indication of magnitude of QTL effects (higher intensity, larger effect), and parental line contributing the superior allele (blue, maternal line; red, paternal line).

STEP 4: ESTABLISHING A FINAL QTL MODEL

In a subsequent modeling step, the QTLs for all positions that were found significant in the restricted CIM scan are included simultaneously in the mixed model:

$$\mu_{ij} = \mu + E_j + \sum_{q} X^{add}_{iq} \alpha_{jq} + G_i + \varepsilon_{ij} \tag{15}$$

Model 15 is a multi-QTL model constructed by inclusion of the full set of QTLs identified in the previous CIM scan. QTLs with non-significant effects are removed using Wald tests (conditional on all other QTLs) to arrive at a final model. The final model for our example data showed that nine out of the ten QTLs from the CIM scan were significant in the multi-QTL model. Further, by breaking down the QTL effects into QTL main effects ($\alpha^{Q}_{q}$) and QEI effects ($\alpha^{QEI}_{q}$), it was possible to investigate whether QTL effects were consistent across environments or not. All QTLs but the one at the end of chromosome 3 had significant QEI (P < 0.01).

The estimated QTL effects are given in Table 8. The effect of a QTL in a particular environment is declared significant when zero is outside the confidence interval of the estimated effect (CI = estimate ± 2 × s.e., with s.e. the average standard error obtained from the REML analysis). Results for the large QTL on chromosome 1 (QTL1,141) showed that the QTL had a significant effect of 0.469 ton·ha⁻¹ in environment SS92a, which means that for each replacement of the maternal allele by a paternal allele, a yield increase of about half a ton is expected. The effect of the same QTL in environment HN96b had a negative sign (−0.232 ton·ha⁻¹), which means that rather than an increase, a decrease in yield is expected for the same allele substitution. The effects of QTL1,141 are inconsistent across environments not only in terms of the size of the effects, but also in terms of the sign of the effect. Inconsistency in size and sign of QTL effects underlies crossover interactions, the most important case of GEI (recall Figure 1D). From the breeder’s point of view, the crossover QEI means that, while the maternal allele has to be selected when breeding for environment HN96b, the paternal allele will be the choice when selecting for all the other environments. The other large QTL, which is on chromosome 10 (QTL10,67), showed changes in the sizes of the effects but not in their signs, indicating that the favorable allele always came from the paternal line. The size of the QTL effect was largest in HN96b (0.564 ton·ha⁻¹), around 0.300 ton·ha⁻¹ in IS92a, IS94a, NS92a, and SS94a, and not significant in LN96a, LN96b, and SS92a. Despite changes in effect sizes, in this case, selection will always be for the paternal allele. In contrast to these two QTLs, the QTL at 217 cM on chromosome 3 (QTL3,217) showed a consistent effect across all environments (−0.129 ton·ha⁻¹) with the maternal allele as the yield increasing allele. The other QTLs showed different degrees of interaction with the environment, involving crossovers (QTL2,36 and QTL6,125) or only differences in magnitude of effects (QTL1,252, QTL3,38, QTL4,136, and QTL9,97). The QTL effect information is useful when selecting complementary lines that combine, in future crosses, the favorable alleles coming from the maternal and paternal line.

Table 8 | QTL effect estimates (ton·ha⁻¹) for individual environments.

QTL        SS92a     IS92a     NS92a     IS94a     SS94a     LN96a     LN96b     HN96b
QTL1,141    0.469*    0.351*    0.370*    0.370*    0.214*   −0.005    −0.002    −0.232*
QTL1,252   −0.026    −0.078    −0.292*   −0.061     0.182    −0.05     −0.106*    0.093
QTL2,36    −0.123    −0.304*   −0.329*   −0.026    −0.091    −0.003     0.131*    0.106
QTL3,38     0.224*    0.236*    0.035     0.323*    0.241*   −0.007     0.152*    0.480*
QTL3,217   −0.129*   −0.129*   −0.129*   −0.129*   −0.129*   −0.129*   −0.129*   −0.129*
QTL4,136   −0.272*   −0.344*   −0.456*   −0.147    −0.293*   −0.093*   −0.107*   −0.262*
QTL6,125   −0.006     0.015    −0.332*    0.061     0.004    −0.096*    0.116*   −0.155
QTL9,97     0.187*    0.251*    0.386*    0.016     0.023     0.026    −0.018     0.021
QTL10,67    0.056     0.324*    0.258*    0.251*    0.322*    0.072     0.054     0.564*

A positive sign indicates that the superior allele comes from the paternal line, and a negative sign indicates that the superior allele comes from the maternal line. QTL effects significantly different from zero are indicated with an asterisk.
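The reading of Table 8 given above can be mirrored with a short script. This is a hedged sketch rather than part of the published analysis: the rows are transcribed from Table 8, the asterisks are taken as the significance calls, and the three labels and the rule in `classify` are assumptions made for illustration (a QTL is called "crossover" when its significant effects have both signs, "consistent" when it has the same significant effect in every environment, and "magnitude only" otherwise).

```python
ENVIRONMENTS = ["SS92a", "IS92a", "NS92a", "IS94a", "SS94a", "LN96a", "LN96b", "HN96b"]

# Effects in ton/ha, one value per environment; '*' marks a significant effect.
TABLE8 = {
    "QTL1,141": "0.469* 0.351* 0.370* 0.370* 0.214* -0.005 -0.002 -0.232*",
    "QTL1,252": "-0.026 -0.078 -0.292* -0.061 0.182 -0.05 -0.106* 0.093",
    "QTL2,36": "-0.123 -0.304* -0.329* -0.026 -0.091 -0.003 0.131* 0.106",
    "QTL3,38": "0.224* 0.236* 0.035 0.323* 0.241* -0.007 0.152* 0.480*",
    "QTL3,217": "-0.129* -0.129* -0.129* -0.129* -0.129* -0.129* -0.129* -0.129*",
    "QTL4,136": "-0.272* -0.344* -0.456* -0.147 -0.293* -0.093* -0.107* -0.262*",
    "QTL6,125": "-0.006 0.015 -0.332* 0.061 0.004 -0.096* 0.116* -0.155",
    "QTL9,97": "0.187* 0.251* 0.386* 0.016 0.023 0.026 -0.018 0.021",
    "QTL10,67": "0.056 0.324* 0.258* 0.251* 0.322* 0.072 0.054 0.564*",
}

def parse_row(row):
    """Turn a table row into (effect, is_significant) pairs, one per environment."""
    return [(float(token.rstrip("*")), token.endswith("*")) for token in row.split()]

def classify(effects):
    """Label the QEI pattern of one QTL from its significant effects only."""
    significant = [value for value, flagged in effects if flagged]
    if not significant:
        return "no significant effect"
    if min(significant) < 0 < max(significant):
        return "crossover"          # favorable parent changes between environments
    if len(significant) == len(effects) and len(set(significant)) == 1:
        return "consistent"         # same effect everywhere, i.e., no QEI
    return "magnitude only"         # same favorable parent, different effect sizes

classification = {name: classify(parse_row(row)) for name, row in TABLE8.items()}

for name, label in classification.items():
    print(f"{name}: {label}")
```

Under this rule, QTL1,141, QTL2,36, and QTL6,125 come out as crossover cases, QTL3,217 as consistent, and the remaining QTLs as differing in magnitude only, which agrees with the description in the text.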

MODELING QTL EFFECTS IN RELATION TO ENVIRONMENTAL INFORMATION

An interesting possibility with the QTL models presented here is that they allow the inclusion of environmental information to explain QTL effects in terms of sensitivities to environmental factors. Just as environmental information can be integrated in GEI models to describe GEI effects, QEI models can integrate environmental information to describe QEI effects. Expressing QTL effects in terms of sensitivities to a particular environmental factor allows prediction of the effect of the QTL under any condition within the range of the original experiments. In addition, the inclusion of environmental information can help unravel the physiological mechanisms that are behind the action of a particular QTL.

The final QTL model for the maize example data consisted of nine QTLs. It can now be investigated whether the variation in effects of those QTLs is related to changes in one or more external environmental variables (there is a strong analogy with the factorial regression models discussed for GEI, models 6a and 6b). Figure 7 presents a scatter plot of the QTL1,141 effects across environments vs. the minimum temperature during flowering time. The plot shows a negative relationship between the QTL effect and temperature.

FIGURE 7 | Effect on yield (ton ha⁻¹) of the QTL on chromosome 1 at 141 cM in relation to the minimum temperature (°C) during flowering time.

Assuming a simple linear relationship between the effect of a QTL and a given environmental covariable, it is possible to test for that relationship using the following model:

$$\mu_{ij} = \mu + E_j + \sum_{q \neq q^*} X^{add}_{iq} \alpha_{jq} + X^{add}_{iq^*} \left( \alpha_{q^*} + \beta_{q^*} Z_j + a_{jq^*} \right) + G_i + \varepsilon_{ij} \tag{16}$$

For simplicity, in model 16, the regression of environment-specific QTL effects on environmental covariables is developed for one QTL ($q^*$). However, the procedure can be applied equally well to other QTLs with environment-specific effects. In model 16, the effect of the QTL is expressed in relation to an environmental covariable ($Z$), where the effect of the QTL is equal to $\alpha_{jq^*} = \alpha_{q^*} + \beta_{q^*} Z_j + a_{jq^*}$. $Z_j$ represents the value of the covariable $Z$ for environment $j$. When $Z_j$ is centered around zero, the parameters of the QTL effects can be interpreted as follows: $\alpha_{q^*}$ corresponds to the effect of the QTL in the average environment (that is, when $Z = 0$); $\beta_{q^*}$ corresponds to the change of the QTL effect per unit of change of the covariable’s value; and the random term $a_{jq^*}$ corresponds to the residual (unexplained) QTL effect, with $a_{jq^*} \sim N(0, \sigma^2_{a q^*})$. For example, applying model 16 to QTL1,141, with minimum temperature during flowering time as covariable,

showed a significant reaction of QTL1,141 to changes in the minimum temperature during flowering, with the estimate of $\beta$ equal to −0.040 ton ha⁻¹ °C⁻¹. We can interpret this result as follows: the yield effect of replacing the maternal allele by the paternal allele is expected to decrease by 0.040 ton ha⁻¹ for each degree Celsius of increase in the minimum temperature during flowering.

The example assumed a simple linear relationship between the QTL effect and a single environmental covariable, but more complex explanatory models can be constructed. For example, it is possible to include higher order terms to model the response curve (e.g., a quadratic term), to use spline formulations, or to include more than one environmental covariable in the model. It is important to mention that a close interaction with physiologists is crucial to explore and select biologically sound models.

CONCLUSION

We have discussed a suite of statistical models that are useful to plant breeding practitioners who are dealing with GEI. What all models have in common is that they make an attempt to replace the ANOVA $GEI_{ij}$ term by product terms of genotypic parameters/covariates and environmental parameters/covariates, with as examples $b_i z_j$ (FW, AMMI, and GGE), $b_i Z_j$ (factorial regression), and $X_i \alpha_j$ (QTL mapping). For some models no information other than the two-way table of means is required (FW, AMMI, and GGE); others require explicit environmental (factorial regression) and/or genotypic information (QTL models). For exploring patterns of GEI, FW, AMMI, and GGE are very useful. For prediction and understanding, factorial regression and QTL models are more appropriate.

ACKNOWLEDGMENTS

We acknowledge the Generation Challenge Program, Integrated Breeding Platform projects 2.2.1 and 3.2.4, for supporting the work presented here. We also thank three anonymous reviewers for helping us to improve the manuscript.

SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/Plant_Physiology/10.3389/fphys.2013.00044/abstract

REFERENCES

Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. Automat. Contr. 19, 716–723.
Basten, C. J., Weir, B. S., and Zeng, Z.-B. (2002). QTL Cartographer, Version 1.16. Department of Statistics, North Carolina State University, Raleigh, NC.
Bernardo, R. (2002). Breeding for Quantitative Traits in Plants. Woodbury, MN: Stemma Press.
Boer, M. P., Wright, D., Feng, L., Podlich, D. W., Luo, L., Cooper, M., et al. (2007). A mixed-model quantitative trait loci (QTL) analysis for multiple-environment trial data using environmental covariables for QTL-by-environment interactions, with an example in maize. Genetics 177, 1801–1813.
Broman, K. W., Wu, H., Sen, Ś., and Churchill, G. A. (2003). R/qtl: QTL mapping in experimental crosses. Bioinformatics 19, 889–890.
Cheverud, J. M. (2001). A simple correction for multiple comparisons in interval mapping genome scans. Heredity 87, 52–58.
Cooper, M., and Hammer, G. L. (eds.). (1996). Plant Adaptation and Crop Improvement. Wallingford, UK: CAB International.
Cullis, B. R., Thomson, F. M., Fisher, J. A., Gilmour, A. R., and Thompson, R. (1996a). The analysis of the NSW wheat variety database. I. Modelling trial error variance. Theor. Appl. Genet. 92, 21–27.
Cullis, B. R., Thomson, F. M., Fisher, J. A., Gilmour, A. R., and Thompson, R. (1996b). The analysis of the NSW wheat variety database. II. Variance component estimation. Theor. Appl. Genet. 92, 28–39.
Denis, J. B. (1988). Two-way analysis using covariables. Statistics 19, 123–132.
Falconer, D. S., and Mackay, T. F. C. (1996). Introduction to Quantitative Genetics. 4th Edn. Harlow: Longman.
Finlay, K. W., and Wilkinson, G. N. (1963). The analysis of adaptation in a plant-breeding programme. Aust. J. Agric. Res. 14, 742–754.
Gabriel, K. (1978). Least squares approximation of matrices by additive and multiplicative models. J. R. Stat. Soc. B 40, 186–196.
Galwey, N. W. (2006). Introduction to Mixed Modeling: Beyond Regression and Analysis of Variance. Chichester: John Wiley and Sons Ltd.
Gauch, H. G. (1988). Model selection and validation for yield trials with interaction. Biometrics 44, 705–715.
Gilmour, A. R., Cullis, B. R., and Verbyla, A. P. (1997). Accounting for natural and extraneous variation in the analysis of field experiments. J. Agric. Biol. Environ. Stat. 2, 269–293.
Gollob, H. (1968). A statistical model which combines features of factor analysis and analysis of variance techniques. Psychometrika 33, 73–115.
Graffelman, J., and van Eeuwijk, F. A. (2005). Calibration of multivariate scatter plots for exploratory analysis of relations within and between sets of variables in genomic research. Biom. J. 47, 863–879.
Griffiths, A. J. F., Miller, J. H., Suzuki, D. T., Lewontin, R. C., and Gelbart, W. M. (1996). An Introduction to Genetic Analysis. New York, NY: WH Freeman and Company.
Haley, C. S., and Knott, S. A. (1992). A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69, 315–324.
Huang, X., Paulo, M. J., Boer, M., Effgen, S., Keizer, P., Koornneef, M., et al. (2011). Analysis of natural allelic variation in Arabidopsis using a multiparent recombinant inbred line population. Proc. Natl. Acad. Sci. U.S.A. 108, 4488–4493.
Jansen, R. C., and Stam, P. (1994). High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136, 1447–1455.
Jiang, C., and Zeng, Z.-B. (1995). Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics 140, 1111–1127.
Jiang, C. J., and Zeng, Z. B. (1997). Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101, 47–58.
Kang, M. S., and Gauch, H. G. (eds.). (1996). Genotype-by-Environment Interaction. Boca Raton, FL: CRC Press Inc.
Li, J., and Ji, L. (2005). Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity 95, 221–227.
Lynch, M., and Walsh, B. (1998). Genetics and Analysis of Quantitative Traits. Sunderland, MA: Sinauer Associates Inc.
Malosetti, M., van der Linden, C. G., Vosman, B., and van Eeuwijk, F. A. (2007). A mixed-model approach to association mapping using pedigree information with an illustration of resistance to Phytophthora infestans in potato. Genetics 175, 879–889.
Malosetti, M., Voltas, J., Romagosa, I., Ullrich, S. E., and van Eeuwijk, F. A. (2004). Mixed models including environmental covariables for studying QTL by environment interaction. Euphytica 137, 139–145.
Mandel, J. (1969). The partitioning of interaction in analysis of variance. J. Res. Natl. Bur. Stand. Math. Sci. 73B, 309–328.
Martínez, O., and Curnow, R. N. (1992). Estimating the locations and the sizes of the effects of quantitative trait loci using flanking markers. Theor. Appl. Genet. 85, 480–488.
Mathews, K., Malosetti, M., Chapman, S., McIntyre, L., Reynolds, M., Shorter, R., et al. (2008). Multi-environment QTL mixed models for drought stress adaptation in wheat. Theor. Appl. Genet. 117, 1077–1091.
Mohring, J., and Piepho, H. P. (2009). Comparison of weighting in two-stage analysis of plant breeding trials. Crop Sci. 49, 1977–1988.
Pastina, M. M., Malosetti, M., Gazaffi, R., Mollinari, M., Margarido, G. R., Oliveira, K. M., et al. (2012). A mixed model QTL analysis for sugarcane multiple-harvest-location trial data. Theor. Appl. Genet. 124, 835–849.
Patterson, H. D., and Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554.
Payne, R. W., Murray, D. A., Harding, S. A., Baird, D. B., Soutar, D. M., and Lane, P. (2007). GenStat for Windows, 10th Edition Introduction. Hertfordshire, UK: VSN International, 355.
Przystalski, M., Osman, A., Thiemt, E. M., Rolland, B., Ericson, L., Østerga, H., et al. (2008). Comparing the performance of cereal varieties in organic and non-organic cropping systems in different European countries. Euphytica 163, 417–433.
Ribaut, J.-M., Hoisington, D. A., Deutsch, J. A., Jiang, C., and Gonzalez de Leon, D. (1996). Identification of quantitative trait loci under drought conditions in tropical maize. 1. Flowering parameters and the anthesis-silking interval. Theor. Appl. Genet. 92, 905–914.
Ribaut, J.-M., Jiang, C., Gonzalez de Leon, D., Edmeades, G. O., and Hoisington, D. A. (1997). Identification of quantitative trait loci under drought conditions in tropical maize. 2. Yield components and marker-assisted selection strategies. Theor. Appl. Genet. 94, 887–896.
Sabadin, P., Malosetti, M., Boer, M., Tardin, F., Santos, F., Guimarães, C., et al. (2012). Studying the genetic basis of drought tolerance in sorghum by managed stress trials and adjustments for phenological and plant height differences. Theor. Appl. Genet. 124, 1389–1402.
Servin, B., Dillmann, C., Decoux, G., and Hospital, F. (2002). MDM: a program to compute fully informative genotype frequencies in complex breeding schemes. J. Hered. 93, 227–228.
Smith, A. B., Cullis, B. R., and Thompson, R. (2005). The analysis of crop cultivar breeding and evaluation trials: an overview of current mixed model approaches. J. Agric. Sci. 143, 449–462.
van Eeuwijk, F. A. (1995). Linear and bilinear models for the analysis of multi-environment trials: I. An inventory of models. Euphytica 84, 1–7.
van Eeuwijk, F. A. (2006). “Genotype by environment interaction: basics and beyond,” in Plant Breeding: The Arnell Hallauer International Symposium, eds K. Lamkey and M. Lee (Oxford, UK: Blackwell Publishing), 155–170.
van Eeuwijk, F. A., Bink, M. C. A. M., Chenu, K., and Chapman, S. C. (2010). Detection and use of QTL for complex traits in multiple environments. Curr. Opin. Plant Biol. 13, 193–205.
van Eeuwijk, F. A., Denis, J. B., and Kang, M. S. (1996). “Incorporating additional information on genotypes and environments in models for two-way genotype by environment tables,” in Genotype-by-Environment Interaction, eds M. S. Kang and H. G. Gauch (Boca Raton, FL: CRC Press Inc.), 15–50.
van Eeuwijk, F. A., Malosetti, M., and Boer, M. P. (2007). “Modeling the genetic basis of response curves underlying genotype x environment interaction,” in Scale and Complexity in Plant Systems Research. Gene-Plant-Crop Relations, eds J. H. J. Spiertz, P. C. Struik, and H. H. van Laar (Dordrecht: Springer), 115–126.
Verbeke, G., and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. New York, NY: Springer-Verlag.
Voltas, J., van Eeuwijk, F. A., Araus, J. L., and Romagosa, I. (1999a). Integrating statistical and ecophysiological analysis of genotype by environment interaction for grain filling of barley in Mediterranean areas. II. Grain growth. Field Crops Res. 62, 75–84.
Voltas, J., van Eeuwijk, F. A., Sombrero, A., Lafarga, A., Igartua, E., and Romagosa, I. (1999b). Integrating statistical and ecophysiological analysis of genotype by environment interaction for grain filling of barley in Mediterranean areas. I. Individual grain weight. Field Crops Res. 62, 63–74.
Voltas, J., van Eeuwijk, F., Igartua, E., del Moral, L. G., Molina-Cano, J. L., and Romagosa, I. (2002). “Genotype by environment interaction and adaptation in barley breeding: basic concepts and methods of analysis,” in Barley Science: Recent Advances from Molecular Biology to Agronomy of Yield and Quality, ed G. Slafer (Binghamton, NY: Food Products Press), 205–241.
VSN International. (2012). GenStat for Windows, 15th Edn. Hemel Hempstead: VSN International. Web page: GenStat.co.uk
Welham, S. J., Beverley, J. G., Smith, A. B., Thompson, R., and Cullis, B. R. (2010). A comparison of analysis methods for late-stage variety evaluation trials. Aust. N.Z. J. Stat. 52, 125–149.
Yan, W., Hunt, L. A., Sheng, Q., and Szlavnics, Z. (2000). Cultivar evaluation and mega-environment investigation based on the GGE biplot. Crop Sci. 40, 597–605.
Zeng, Z. B. (1994). Precision mapping of quantitative trait loci. Genetics 136, 1457–1468.

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Received: 28 September 2012; accepted: 25 February 2013; published online: 12 March 2013.

Citation: Malosetti M, Ribaut J-M and van Eeuwijk FA (2013) The statistical analysis of multi-environment data: modeling genotype-by-environment interaction and its genetic basis. Front. Physiol. 4:44. doi: 10.3389/fphys.2013.00044

This article was submitted to Frontiers in Plant Physiology, a specialty of Frontiers in Physiology.

Copyright © 2013 Malosetti, Ribaut and van Eeuwijk. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.
