AICcmodavg: Model Selection Guide
Marc J. Mazerolle*
August 24, 2020
Abstract
The AICcmodavg package implements model selection and multimodel inference for
a wide range of model types. This vignette outlines the first steps to use the package
and also presents the main functions. The package also offers utility functions for
diagnostics and enhancements to specific classes of models that estimate demographic
parameters and vital rates in populations of unmarked animals (Fiske and Chandler,
2011).
1 Introduction
The publication of Burnham and Anderson (1998) and an expanded second edition
of the book four years later (Burnham and Anderson, 2002) initiated a shift in ecol-
ogy from traditional null-hypothesis statistical testing to the adoption of information-
theoretic approaches for model selection and inference. This movement also echoed a
broader fundamental change of focus in statistical inference. Whereas many statisti-
cal approaches have traditionally centered on null-hypothesis statistical testing and
P -values, emphasis has moved to estimating parameters and measures of uncertainty
(Goodman, 1999; Nuzzo, 2014; Wasserstein et al., 2019; Calin-Jageman and Cumming,
2019; Anderson, 2019).
The AICcmodavg package implements model selection and multimodel inference
based on different information criteria, including AIC, AICc, QAIC, QAICc, and
BIC (Akaike, 1973; Sugiura, 1978; Burnham and Anderson, 2002; Schwarz, 1978).
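To fix ideas, here is a minimal sketch (in base R, with simulated data rather than any of the package's data sets) of how AIC and its small-sample counterpart AICc are computed from a fitted model, where K counts all estimated parameters, including the residual variance:

```r
## Sketch: computing AIC and AICc by hand for a linear model fit to
## simulated data, checking the AIC value against base R's AIC().
set.seed(1)
n <- 30
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)

K <- length(coef(fit)) + 1                      # +1 for the residual variance
logL <- as.numeric(logLik(fit))
aic <- -2 * logL + 2 * K                        # AIC
aicc <- aic + (2 * K * (K + 1)) / (n - K - 1)   # small-sample correction
all.equal(aic, AIC(fit))                        # TRUE
```

The correction term vanishes as n grows, which is why AICc is recommended when the ratio n/K is small.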
Before starting analyses, I suggest considering the ten following guidelines:
1. Carefully construct your candidate model set. Each model should represent
a specific (interesting) hypothesis to test. Thought needs to be put into the models
that are relevant for the hypotheses and data at hand.
2. Keep your candidate model set short. The number of models should gener-
ally be less than the number of data points (Burnham and Anderson, 2002).
* Département des sciences du bois et de la forêt, Université Laval
3. Check model fit. Use the global model (i.e., the model from which all other
models can be derived) to assess model fit and ensure that model assumptions
are met. If none of your models fit the data well, information criteria will only
indicate the most parsimonious of the poor models.
4. Avoid data dredging. Data dredging or data snooping consists in running
analyses to find effects in your model set and then building the candidate model
set based on this information. This is ill-advised and you should avoid such a
procedure. You should specify the candidate model set based on your hypotheses,
and then do model selection based on this model set.
5. Avoid overfitting models. You should not estimate too many parameters for
the number of observations available in the sample. Fitting a model that is much
too complex for the available data can lead to spurious results.
6. Watch out for missing values. Values that are missing only for certain ex-
planatory variables change the data set and sample size, depending on which
variable is included in any given model. You should deal with missing values
before analysis, either by deleting certain observations or using missing data im-
putation (Gelman and Hill, 2007).
7. Use the same response variable for all models of the candidate model
set. It is inappropriate to run some models with a transformed response variable
and others with the untransformed variable. A workaround is to use a different
link function for some models (McCullagh and Nelder, 1989).
8. When dealing with models with overdispersion, use the same value
of ĉ for all models in the candidate model set. Overdispersion occurs in
certain models that use binomial or Poisson distributions and results from the
variance in the data exceeding that allowed by the distribution. One way to
diagnose the presence of overdispersion is to estimate a variance inflation factor
(ĉ) from the global model. Note that functions c_hat( ), mb.gof.test( ), and
Nmix.gof.test( ) estimate ĉ for specific model types.
9. Avoid mixing the information-theoretic approach and notions of sta-
tistical significance (i.e., P values). Information criteria and P -values do not
mix (Burnham and Anderson, 2002). Instead, you should provide estimates and
a measure of their precision, such as unconditional standard errors or confidence
intervals.
10. Determining the ranking of the models is just the first step. When
the top-ranking model has most of the support (e.g., Akaike weights > 0.9), it
can be appropriate to base inference on this single most parsimonious model.
However, when many models rank highly, one should model-average effect sizes
for the parameters with most support across the entire set of models. This is the
underlying idea behind multimodel inference which consists in making inference
based on the whole set of candidate models.
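As an illustration of guideline 8, a common estimator of ĉ (the same idea as, but not identical to, the package's c_hat( )) is the Pearson chi-square statistic of the global model divided by its residual degrees of freedom. A sketch with simulated overdispersed counts:

```r
## Sketch: estimating a variance inflation factor (c-hat) for a Poisson GLM
## fit to simulated counts with extra-Poisson variation
## (gamma-distributed heterogeneity).
set.seed(42)
x <- rnorm(200)
y <- rpois(200, lambda = exp(0.5 + 0.3 * x) * rgamma(200, shape = 2, rate = 2))
gfit <- glm(y ~ x, family = poisson)

c.hat <- sum(residuals(gfit, type = "pearson")^2) / df.residual(gfit)
c.hat   ## values well above 1 suggest overdispersion
```

A ĉ near 1 indicates that the variance assumed by the distribution is adequate, whereas values substantially above 1 call for quasi-likelihood adjustments (QAIC, QAICc).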
After this preamble, we can start with an example using various functions of the
AICcmodavg package.
2 Getting started
In this section, we will walk through the steps to building the models as well as con-
ducting model selection and multimodel inference with an example data set. Here, we
will use the dry.frog data set from Mazerolle (2006). The data feature mass lost by
green frogs (Lithobates clamitans) after spending two hours on one of three substrates
that are encountered in some landscape types (for additional details, check Mazerolle
and Desrochers 2005). The response variable is the mass lost (Mass_lost) and we
are interested in testing differences among substrate types. To simplify the example, we
will only consider main effects, but note that you should consider interactions whenever
relevant (Mazerolle and Desrochers 2005 include interaction terms in the analysis).
> library(AICcmodavg)
> data(dry.frog)
For this example, we’ll only be using the first seven columns of the data set:
> ##extract only first 7 columns
> frog <- dry.frog[, 1:7]
> ##first lines
> head(frog)
Individual Species Shade SVL Substrate Initial_mass Mass_lost
1 1 Racla 0 7.27 SOIL 38.5 8.3
2 2 Racla 0 7.00 SPHAGNUM 31.0 3.6
3 3 Racla 0 6.83 PEAT 23.6 4.7
4 4 Racla 0 7.26 PEAT 37.4 7.0
5 5 Racla 0 7.43 SOIL 44.4 7.7
6 6 Racla 0 5.75 SPHAGNUM 16.4 1.6
> ##structure of data frame
> str(frog)
'data.frame': 121 obs. of 7 variables:
$ Individual : int 1 2 3 4 5 6 7 8 9 10 ...
$ Species : Factor w/ 1 level "Racla": 1 1 1 1 1 1 1 1 1 1 ...
$ Shade : int 0 0 0 0 0 0 0 0 0 0 ...
$ SVL : num 7.27 7 6.83 7.26 7.43 5.75 7.66 6.42 7.64 6.57 ...
$ Substrate : Factor w/ 3 levels "PEAT","SOIL",..: 2 3 1 1 2 3 1 2 3 2 ...
$ Initial_mass: num 38.5 31 23.6 37.4 44.4 16.4 39.8 25.9 35.6 29 ...
$ Mass_lost : num 8.3 3.6 4.7 7 7.7 1.6 6.4 5.9 2.8 3.4 ...
Note that Substrate is a factor with three levels. Using the default treatment
contrast coding in R, the variable has been recoded automatically with two indicator
(dummy) variables.
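A quick way to see this recoding is to inspect the design matrix for a toy factor with the same levels (hypothetical data, not the frog data set):

```r
## Sketch: default treatment contrasts turn a three-level factor into two
## indicator columns, with the first level (PEAT) as the reference.
Substrate <- factor(c("PEAT", "SOIL", "SPHAGNUM", "PEAT"))
model.matrix(~ Substrate)
## columns: (Intercept), SubstrateSOIL, SubstrateSPHAGNUM
```

The reference level (here PEAT, the first level alphabetically) is absorbed into the intercept, and each remaining level gets its own 0/1 column.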
It’s also a good idea to check for missing values:
> any(is.na(frog))
[1] FALSE
In this case, there are no missing values and we won’t have to worry about some
observations being excluded in certain models.
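Had there been missing values, a sketch of how to identify complete rows before analysis (using a toy data frame, not the frog data):

```r
## Sketch: complete.cases() flags rows with no missing values in any column.
d <- data.frame(x = c(1, NA, 3), y = c(2, 5, NA))
complete.cases(d)        # TRUE FALSE FALSE
d[complete.cases(d), ]   # keeps only the first row
```

Subsetting with complete.cases( ) before fitting ensures that every candidate model is fit to the same observations, which is required for valid model comparisons.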
1. Null model
Biological hypothesis: Mass lost by frogs is constant.
Ŷi = β0
2. Shade model
Biological hypothesis: Mass lost by frogs varies with shade.
Ŷi = β0 + βShade ∗ Shadei
3. Substrate model
Biological hypothesis: Mass lost by frogs varies with substrate type.
Ŷi = β0 + βSubstrateSOIL ∗ SubstrateSOILi + βSubstrateSPHAGNUM ∗ SubstrateSPHAGNUMi
The models including the initial mass of frogs (centered, with a quadratic term) take the following forms:
Mass + shade model:
Ŷi = β0 + βInitial mass ∗ Initial massi + βInitial mass2 ∗ Initial mass2i + βShade ∗ Shadei
Mass + substrate model:
Ŷi = β0 + βInitial mass ∗ Initial massi + βInitial mass2 ∗ Initial mass2i + βSubstrateSOIL ∗ SubstrateSOILi + βSubstrateSPHAGNUM ∗ SubstrateSPHAGNUMi
Global model:
Ŷi = β0 + βInitial mass ∗ Initial massi + βInitial mass2 ∗ Initial mass2i + βShade ∗ Shadei + βSubstrateSOIL ∗ SubstrateSOILi + βSubstrateSPHAGNUM ∗ SubstrateSPHAGNUMi
The assumption of homoscedasticity does not seem to be met with the raw response
variable, as the variance increases with the mean (Fig. 1). To circumvent this issue,
we will apply a log transformation to the response variable:
> frog$logMass_lost <- log(frog$Mass_lost + 1) #adding 1 due to presence of 0's
Figure 1: Assessment of model assumptions using residuals and fitted values from the global
model based on mass lost (g) by green frogs (Lithobates clamitans) exposed to different
conditions.
Figure 2: Assessment of model assumptions using residuals and fitted values from the global
model based on the log of the mass lost (g) by green frogs (Lithobates clamitans) exposed
to different conditions.
The log transformation generally homogenized the variance and most residuals follow a normal distribution, except for a few outliers (Fig. 2). Thus, we will proceed
with the analysis using the log transformation on all candidate models.
> ##center initial mass and compute its square
> frog$InitMass_cent <- frog$Initial_mass - mean(frog$Initial_mass)
> frog$InitMass2 <- frog$InitMass_cent^2
> m.null <- lm(logMass_lost ~ 1,
data = frog)
> m.shade <- lm(logMass_lost ~ Shade,
data = frog)
> m.substrate <- lm(logMass_lost ~ Substrate,
data = frog)
> m.shade.substrate <- lm(logMass_lost ~ Shade + Substrate,
data = frog)
> m.null.mass <- lm(logMass_lost ~ InitMass_cent + InitMass2,
data = frog)
> m.shade.mass <- lm(logMass_lost ~ InitMass_cent + InitMass2 + Shade,
data = frog)
> m.substrate.mass <- lm(logMass_lost ~ InitMass_cent + InitMass2 + Substrate,
data = frog)
> m.global.mass <- lm(logMass_lost ~ InitMass_cent + InitMass2 + Substrate + Shade,
data = frog)
Most functions for model selection and multimodel inference in the AICcmodavg
package require that the output of the candidate models be stored in a single list.
Although the functions will add generic names to each model automatically if none are
supplied, it is good practice to provide meaningful and succinct names for each model.
This will help in the interpretation of the model selection tables. Model names can be
entered as a character string using the modnames argument or directly as a named list.
Here are the model outputs stored in a list with names assigned to each element:
> ##store models in named list
> Cand.models <- list("null" = m.null, "shade" = m.shade,
"substrate" = m.substrate,
"shade + substrate" = m.shade.substrate,
"mass" = m.null.mass, "mass + shade" = m.shade.mass,
"mass + substrate" = m.substrate.mass,
"global" = m.global.mass)
> ##model selection table based on AICc
> selectionTable <- aictab(cand.set = Cand.models)
> selectionTable
We note that the global model has all the support. By default, AICc is used in the
model selection and multimodel inference functions, but AIC can be selected with the
second.ord = FALSE argument:
> aictab(Cand.models, second.ord = FALSE)
Model selection based on AIC:
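Behind these tables, the computation is straightforward. Here is a sketch of the ∆AICc and Akaike weight calculations from a vector of AICc values (hypothetical numbers, not the frog results):

```r
## Sketch: Delta AICc and Akaike weights from a vector of AICc values.
aicc <- c(null = 120.3, shade = 118.9, global = 110.2)   # hypothetical
delta <- aicc - min(aicc)                 # Delta AICc relative to best model
wt <- exp(-delta / 2) / sum(exp(-delta / 2))   # Akaike weights, sum to 1
round(cbind(Delta_AICc = delta, AICcWt = wt), 3)
```

Each weight can be read as the probability that the corresponding model is the best approximating model in the candidate set, given the data.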
For those familiar with LaTeX (Lamport, 1994; Mittelbach and Goossens, 2004),
note that most functions in AICcmodavg can export result tables in LaTeX format using
xtable( ) methods from the xtable package (Dahl, 2014). For example, the following
code will produce Table 1:
> library(xtable)
> print(xtable(selectionTable, caption = "Model selection table on frog mass lost.",
label = "tab:selection"),
include.rownames = FALSE, caption.placement = "top")
Evidence ratios are also useful to quantify the amount of support in favor of a model
relative to a competing model (Burnham and Anderson, 2002). Function evidence( )
takes a model selection table as argument:
> ##evidence ratios
> evidence(aic.table = selectionTable)
Evidence ratio between models 'global' and 'mass + substrate':
87087.77
Here, we see that the global model is 87088 times more parsimonious than the
'mass + substrate' model. It is also possible to compare two arbitrary models by using their
names in the model.high and model.low arguments:
> ##compare "substrate" vs "shade"
> evidence(selectionTable, model.high = "substrate",
model.low = "shade")
Evidence ratio between models 'substrate' and 'shade':
7.04
We conclude that the substrate model is 7 times more parsimonious than the shade
model. Another useful comparison is between the top-ranked model and the null model:
> evidence(selectionTable, model.low = "null")
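Under the hood, an evidence ratio is simply the ratio of two Akaike weights, which reduces to exp(∆AICc/2), where ∆AICc is the AICc difference between the two models. A sketch with hypothetical values:

```r
## Sketch: evidence ratio from the AICc difference between two models.
aicc.high <- 110.2   # better-supported model (hypothetical value)
aicc.low <- 114.1    # competing model (hypothetical value)
exp((aicc.low - aicc.high) / 2)   # about 7 times more support
```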
Because the top-ranked model has all the support, we could interpret the results of
the model using confidence intervals:
> confint(m.global.mass)
2.5 % 97.5 %
(Intercept) 1.0119220187 1.2071260941
InitMass_cent 0.0292360282 0.0369301094
InitMass2 -0.0007653636 -0.0003673069
SubstrateSOIL -0.0032563220 0.2091385826
SubstrateSPHAGNUM -0.3112314181 -0.0994731113
Shade -0.3081777571 -0.1366728765
We conclude that there is a quadratic effect of initial mass on frog mass loss, that
mass loss is lower in the presence of shade, and that mass loss is lower on Sphagnum
moss (living vegetation) than on peat. However, model support will often be shared
by several models (i.e., top-ranked model having < 90% of the support). In such cases,
we should conduct multimodel inference.
2.7.1 Inference on β estimates
Two functions are available to compute model-averaged estimates of β parameters.
Function modavg( ) implements the natural average. This method consists in using
exclusively the models that include the parameter of interest, recalculating the ∆AIC
and Akaike weights, and computing a weighted average of the estimates (Burnham and
Anderson, 2002, p. 152). We can compute the natural average of the effect of shade
(βShade ) on the loss of frog mass:
> modavg(cand.set = Cand.models, parm = "Shade")
Multimodel inference on "Shade" based on AICc
Note that the table only features the models that include shade as an explanatory
variable. We conclude that frogs lose less mass in the shade than out of the shade
(β̄ˆShade = −0.22, 95% CI: [−0.31, −0.14]).
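The arithmetic of the natural average can be sketched as follows: renormalize the Akaike weights of the models containing the parameter so they sum to 1, then take the weighted mean of the estimates (hypothetical numbers, not the actual modavg( ) output):

```r
## Sketch: natural average of a beta estimate across only the models that
## contain the parameter (hypothetical estimates and weights).
est <- c(-0.23, -0.22, -0.21, -0.22)   # beta in the 4 models with the parameter
wt <- c(0.00, 0.00, 0.02, 0.98)        # original Akaike weights of those models
wt.renorm <- wt / sum(wt)              # weights renormalized to sum to 1
sum(wt.renorm * est)                   # natural-averaged estimate
</imports>
```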
Similarly, we can request a model-averaged estimate for factor levels, keeping in
mind that only certain contrasts have been estimated in the model (i.e., there are three
levels, but only two contrasts). Note that the parameter must be specified with the
same label as in the model output.
For instance, to estimate the contrast between SPHAGNUM vs PEAT, we will inspect
the labels of a model that includes substrate type:
> coef(m.global.mass)
(Intercept) InitMass_cent InitMass2
1.1095240564 0.0330830688 -0.0005663352
SubstrateSOIL SubstrateSPHAGNUM Shade
0.1029411303 -0.2053522647 -0.2224253168
> modavg(cand.set = Cand.models, parm = "SubstrateSPHAGNUM")
Multimodel inference on "SubstrateSPHAGNUM" based on AICc
Model-averaged estimate: -0.21
Unconditional SE: 0.05
95% Unconditional confidence interval: -0.31, -0.1
We conclude that mass loss is lower on the Sphagnum substrate than on the peat
substrate (β̄ˆSubstrateSPHAGNUM = −0.21, 95% CI: [−0.31, −0.1]).
The natural average has been under criticism lately, mainly due to the overestima-
tion of the effect under certain conditions (Cade, 2015). Indeed, excluding models that
do not feature the parameter of interest can inflate the model-averaged β, particularly
if the parameter only appears in models with low weight. For this reason, the natural
average is not recommended for assessing the effect of parameters that appear only in
weakly supported models. An alternative estimator, the model-averaging estimator
with shrinkage, is more robust to this issue (Burnham and Anderson, 2002).
In contrast to the natural average, the model-averaging estimator with shrinkage re-
tains all models in the candidate model set, regardless of the presence of the parameter
of interest. Specifically, models without the parameter of interest are assigned a value
of 0 for the β and variance. This results in shrinking the effect towards 0 when mod-
els without the parameter of interest have high support. Function modavgShrink( )
implements this approach in AICcmodavg:
> modavgShrink(cand.set = Cand.models, parm = "SubstrateSPHAGNUM")
Multimodel inference on "SubstrateSPHAGNUM" based on AICc
                 K  AICc Delta_AICc AICcWt Estimate   SE
mass             4 49.44      45.37      0     0.00 0.00
mass + shade     5 32.32      28.25      0     0.00 0.00
mass + substrate 6 26.82      22.75      0    -0.20 0.06
global           7  4.07       0.00      1    -0.21 0.05
Note that all models are included in the tables above and that the estimate and
variance are set to 0 when the parameter does not appear in the model. An additional
consideration is that one should strive to balance the number of models with and
without the parameter of interest when specifying candidate models. In our case, four
models include the effect of shade (vs four without) and four models include the effect
of substrate (vs four without).
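The shrinkage estimator itself is just a weighted average over all models, with zeros substituted where the parameter is absent. Using the estimates and Akaike weights of the four models displayed above as a sketch:

```r
## Sketch: model averaging with shrinkage substitutes an estimate of 0 for
## models without the parameter, keeping the original Akaike weights over
## the full model set (values from the four models displayed above).
est <- c(0, 0, -0.20, -0.21)   # 0 where the substrate effect is absent
wt <- c(0, 0, 0, 1)            # Akaike weights (global model has all support)
sum(wt * est)                  # shrinkage estimate: -0.21
```

When models without the parameter carry substantial weight, the zeros pull the averaged estimate toward 0, which is the shrinkage the name refers to.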
In our example, both methods of model-averaging β estimates lead to the same con-
clusions, because the top-ranked model (global model) has all the support and domi-
nates the results. However, whenever several candidate models share the support, the
two methods of model averaging can lead to different conclusions. Model-averaging
with shrinkage is the recommended approach for β estimates. A similar approach is
also used to model-average predictions.
$se.fit
1 2
0.05177402 0.04973186
$df
[1] 115
$residual.scale
[1] 0.2372509
> ##predictions from null model
> predict(m.null, newdata = predData, se.fit = TRUE)
$fit
1 2
0.8235653 0.8235653
$se.fit
[1] 0.04637383 0.04637383
$df
[1] 120
$residual.scale
[1] 0.5101121
The main idea here is that the prediction from the null model does not depend on
the frog being in the shade or not. Similarly, we can make predictions for the same two
conditions (shade vs no shade) from each model and obtain a model-averaged estimate
of the predictions. In other words, the predictions are weighted by the Akaike weights
of each model. Consequently, a model with larger weight has a greater influence on
the model-averaged prediction than a model with low support. Because modavgPred(
) relies on predict( ) methods for different model types, one must supply a newdata
argument following the same restrictions as for predict( ). Specifically, you must
supply values for each variable appearing at least once in the candidate models. To
assist in this task, the extractX( ) utility function displays every variable appearing
in the model set:
> extractX(cand.set = Cand.models)
Predictors appearing in candidate models:
Shade Substrate InitMass_cent InitMass2
Structure of predictors:
$ Shade : int 0 0 0 0 0 0 0 0 0 0 ...
$ Substrate : Factor w/ 3 levels "PEAT","SOIL",..: 2 3 1 1 2 3 1 2 3 2 ...
$ InitMass_cent: num 20.36 12.86 5.46 19.26 26.26 ...
$ InitMass2 : num 414.6 165.4 29.8 371 689.6 ...
Using the predData data frame above, we proceed with model-averaging predic-
tions:
> modavgPred(cand.set = Cand.models, newdata = predData)
Model-averaged predictions on the response scale
based on entire model set and 95% confidence interval:
> ##data frame holding all variables constant, except Substrate
> predSub <- data.frame(InitMass_cent = c(0, 0, 0),
InitMass2 = c(0, 0, 0),
Substrate = factor(c("PEAT", "SOIL", "SPHAGNUM"),
levels = levels(frog$Substrate)),
Shade = c(1, 1, 1))
> ##model-average predictions
> predsMod <- modavgPred(Cand.models, newdata = predSub)
> predsMod
Model-averaged predictions on the response scale
based on entire model set and 95% confidence interval:
Figure 3: Model-averaged predictions of the log of mass lost by green frogs Lithobates clami-
tans on three different substrate types.
The plot clearly shows that mass lost is lower on Sphagnum than on the other
substrates (Fig. 3).
Besides model-averaging predictions, it is also possible to model-average effect sizes
(differences between groups) using the predictions for two groups from each model.
This approach is implemented in the modavgEffect( ) function. Its use is similar to
modavgPred( ), except that the newdata data frame used for prediction must include
only two rows (i.e., the two groups to compare). Here is an application to the difference
between the peat and Sphagnum substrates:
Again, we conclude that frogs lose mass faster on peat than on Sphagnum sub-
strates. The values reflect differences on the log scale, because we used a log transfor-
mation on the original response variable.
Table 2: Model classes currently supported by AICcmodavg for model selection and multi-
model inference based on AIC, AICc, QAIC, and QAICc. Note that support varies from
basic model selection to model-averaging predictions.
Model type                                   Class                                  Degree of support
beta regression                              betareg                                model averaging β
conditional logistic regression              clogit                                 model averaging β
Cox proportional hazards                     coxph, coxme                           model averaging β
distributions                                fitdist, fitdistr                      model selection
generalized least squares                    gls                                    model averaging β
generalized linear mixed models              glmerMod, glmmTMB                      model averaging predictions
latent variable models                       lavaan                                 model selection
linear and generalized linear models         aov, lm, glm, vglm                     model averaging predictions
linear mixed models                          lme, lmerMod, lmekin, lmerModLmerTest  model averaging predictions
multinomial logistic regression              multinom, vglm                         model averaging β
nonlinear models                             gnls, nls, nlme, nlmerMod              model selection
occupancy and abundance models with          unmarkedFit                            model averaging predictions
  imperfect detectability
ordinal logistic regression                  polr, clm, clmm, vglm                  model averaging β
presence-only models                         maxlikeFit                             model selection
survival regression                          survreg                                model averaging predictions
zero-inflated models                         vglm, zeroinfl                         model averaging β
zero-truncated models                        vglm, hurdle                           model averaging β
The package also offers model selection based on BIC for all model classes in
Table 2 with the bictab( ) function. For Bayesian models of classes bugs, rjags, or
jagsUI, model selection using the DIC (Spiegelhalter et al., 2002) is enabled with the
dictab( ) function. A number of functions offer the possibility of conducting model
selection and multimodel inference by specifying the basic information (log-likelihood,
number of parameters, estimates, standard errors) for models that are not currently
supported. The next section features an example using these functions.
White and Burnham, 1999). The data were collected during three breeding seasons to
investigate the influence of the presence of road-mitigating infrastructures (amphibian
tunnels associated with drift fences) on green frog (Lithobates clamitans) populations
adjacent to roads. We will save the log-likelihoods, number of estimated parameters,
and effective sample size in vectors to conduct model selection based on AICc :
> ##log-likelihoods
> modL <- c(-225.4180, -224.0697, -225.4161)
> ##number of parameters
> modK <- c(2, 3, 3)
> ##model selection
> outTab <- aictabCustom(logL = modL,
K = modK,
modnames = c("null", "phi(SVL)p(.)",
"phi(Road)p(.)"),
nobs = 621)
We note that models including the effect of road-mitigating infrastructures have low
support compared to models of frog size (SVL: snout-vent length) or the null model.
We can also compute the evidence ratio between the top-ranked model vs the model
with the effect of road-mitigating infrastructures:
> evidence(outTab, model.high = "phi(SVL)p(.)",
model.low = "phi(Road)p(.)")
The top-ranked model has 3.8 times more support than the model with road-
mitigating infrastructures. We continue by saving the predicted survival estimates
in the presence of road-mitigating infrastructures and their standard errors:
> ##survival estimates with road mitigation
> modEst <- c(0.1384450, 0.1266030, 0.1378745)
> ##SE's of survival estimates with road mitigation
> modSE <- c(0.03670327, 0.03347475, 0.03862634)
> ##model-averaged survival estimate
> modavgCustom(logL = modL, K = modK,
modnames = c("null", "phi(SVL)p(.)", "phi(Road)p(.)"),
estimate = modEst, se = modSE, nobs = 621)
Model-averaged estimate: 0.13
Unconditional SE: 0.04
95% Unconditional confidence interval: 0.06, 0.2
Unsurprisingly, we note that survival does not vary with the presence of road-
mitigating infrastructures.
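The model-averaged estimate above can be reproduced from the quantities we saved: compute AICc-based Akaike weights from the log-likelihoods, then weight the survival estimates. A sketch of the calculation (not the package function itself):

```r
## Sketch: AICc weights from the saved log-likelihoods, applied to the
## saved survival estimates (effective sample size n = 621, as above).
modL <- c(-225.4180, -224.0697, -225.4161)    # log-likelihoods
modK <- c(2, 3, 3)                            # numbers of parameters
modEst <- c(0.1384450, 0.1266030, 0.1378745)  # survival estimates
n <- 621
aicc <- -2 * modL + 2 * modK + (2 * modK * (modK + 1)) / (n - modK - 1)
wt <- exp(-(aicc - min(aicc)) / 2)
wt <- wt / sum(wt)                            # Akaike weights
round(sum(wt * modEst), 2)                    # 0.13, as reported above
```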
The tools in this section highlighted how to conduct model selection and multimodel
inference for models that are not yet supported by the package. For models built
outside of the R environment in other software and based on maximum likelihood,
these tools can be convenient. If you wish to see additional model classes supported
by AICcmodavg, you can contact the package author directly.
References
Akaike, H. 1973. Information theory as an extension of the maximum likelihood
principle. Pages 267–281 in Second International Symposium on Information Theory.
Akadémiai Kiadó, Budapest, Hungary.
Burnham, K. P. and D. R. Anderson. 2002. Model selection and multimodel inference:
a practical information-theoretic approach, second edition. Springer-Verlag, New
York, USA.
Calin-Jageman, R. J. and G. Cumming. 2019. The new statistics for better science:
ask how much, how uncertain, and what else is known. American Statistician
73(Suppl.):271–280.
Dahl, D. B. 2014. xtable: export tables to LaTeX or HTML. R package version 1.7-3.
https://cran.r-project.org/package=xtable.
Fiske, I. and R. Chandler. 2011. unmarked: an R package for fitting hierarchical models
of wildlife occurrence and abundance. Journal of Statistical Software 43:1–23.
Gelman, A. and J. Hill. 2007. Data analysis using regression and multilevel/hierarchical
models. Cambridge University Press, New York, USA.
Jolly, G. M. 1965. Explicit estimates from capture-recapture data with both death and
immigration: stochastic model. Biometrika 52:225–247.
Nuzzo, R. 2014. Statistical errors: P values, the "gold standard" of statistical validity,
are not as reliable as many scientists assume. Nature 506:150–152.
Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde. 2002. Bayesian
measures of complexity and fit. Journal of the Royal Statistical Society Series B
64:583–639.
Sugiura, N. 1978. Further analysis of the data by Akaike’s information criterion and
the finite corrections. Communications in Statistics: Theory and Methods A7:13–26.