
Methods in Psychology 7 (2022) 100100

Contents lists available at ScienceDirect

Methods in Psychology
journal homepage: www.sciencedirect.com/journal/methods-in-psychology

Multiblock discriminant correspondence analysis: Exploring group differences with structured categorical data
Anjali Krishnan a, *, Ju-Chi Yu b, Rona Miles a, Derek Beaton d, Laura A. Rabin a, Hervé Abdi c
a Brooklyn College of the City University of New York, USA
b Centre for Addiction and Mental Health, Canada
c The University of Texas at Dallas, USA
d St. Michael’s Hospital Unity Health Toronto, Canada

ARTICLE INFO

Keywords: Discriminant correspondence analysis; DICA; MUDICA; Mental health literacy

ABSTRACT

Psychological research often involves complex datasets that cannot easily be analyzed using traditional statistical methods. Multiblock Discriminant Correspondence Analysis (multiblock DICA, also called MUDICA) examines group differences in large, structured categorical datasets and identifies blocks of variables that contribute to these differences. Data for this illustration were obtained from a study on mental health literacy (N = 648) that included 33 questions that were arranged into four blocks: etiology, symptoms, treatment, and general knowledge of psychological disorders. With non-parametric inference tests and results displayed as intuitive maps, MUDICA revealed differences in performance across groups not readily detectable using standard methods.

Psychological research often involves the simultaneous examination of a large number of behavioral, physiological, and demographic variables. These variables might be quantitative or qualitative and could individually or collectively be associated with differences among populations of interest. Often, such variables are analyzed with specific statistical methods whose main goals are to: (1) determine if there are reliable group differences (e.g., clinical versus control groups); (2) predict information for new individuals (e.g., group assignment); and/or (3) examine relationships between different variables (e.g., independence of attributes).

1. Analyzing quantitative and qualitative data

Quantitative data include variables that represent amounts and may be discrete or continuous. Such data are analyzed (usually one variable at a time) via two seemingly different yet statistically equivalent methods: Analysis of Variance (ANOVA) and Regression. However, both ANOVA and regression can only handle datasets with one quantitative dependent variable at a time, irrespective of the number of quantitative or qualitative independent variables or factors. When datasets contain more than one quantitative dependent variable (i.e., multivariate datasets), ANOVA or regression are often performed on each dependent variable separately and are followed by corrections for multiple comparisons, a procedure that in turn can result in low statistical power. Alternatively, multivariate datasets can be analyzed with methods such as multivariate ANOVA (MANOVA) or linear discriminant analysis (LDA). However, such methods can only handle datasets with many more observations than variables and the variables themselves cannot be multicollinear (i.e., the variables cannot be linearly related).

Qualitative data include variables that describe observations (e.g., demographic variables, survey responses) and may be categorical (also called nominal) or ordinal. Such data are analyzed with methods that examine the association between variables (e.g., χ² test of independence) or predict group assignment (e.g., binomial or multinomial logistic regression). However, the χ² test of independence can only examine the association between two categorical variables, while logistic regression can only be applied to datasets with many more observations than (non-collinear) variables. In addition, most methods that analyze qualitative data cannot examine fine-grained relationships among observations and variables (e.g., main effects and interactions).

Almost all the traditional methods mentioned above (that analyze either quantitative or qualitative data) predominantly use parametric

Abbreviations: ANOVA, Analysis of Variance; BADA, Barycentric Discriminant Analysis; CA, Correspondence Analysis; DICA, Discriminant Correspondence
Analysis; MANOVA, Multivariate Analysis of Variance; MCA, Multiple Correspondence Analysis; MUDICA, Multiblock Discriminant Correspondence Analysis; PCA,
Principal Component Analysis.
* Corresponding author. 2900 Bedford Avenue, Brooklyn, NY, 11210, USA.
E-mail address: [email protected] (A. Krishnan).

https://doi.org/10.1016/j.metip.2022.100100
Received 3 April 2022; Received in revised form 21 September 2022; Accepted 21 September 2022
Available online 27 September 2022
2590-2601/© 2022 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
A. Krishnan et al. Methods in Psychology 7 (2022) 100100

inferential tests, which depend on specific assumptions about the data (e.g., normality). Furthermore, results from most traditional methods are often presented in a counter-intuitive manner and can be difficult to interpret (Wasserstein et al., 2019), particularly for very large datasets that require multiple levels of analysis (e.g., hierarchical linear regression).

Though there exist alternative methods that can address the aforementioned limitations of traditional methods and handle large, qualitative (i.e., categorical) datasets, these alternative methods are not widely applied in psychological research. One such method is correspondence analysis [CA; Cordier (1965)], which is a multivariate method specifically developed to handle categorical data. In its simplest form, CA (1) analyzes a contingency table that cross-tabulates the frequency of observations according to two categorical variables, and (2) represents the relationships of these variables by maps where the proximity between the levels of the variables expresses their association (Hair et al., 2009). Since its introduction in the 1960s, the CA family has grown to include other variants such as multiple correspondence analysis [MCA; Lebart et al. (1984); see Guttman (1941) for an earlier version], which simultaneously examines more than two categorical variables, and discriminant correspondence analysis [DICA; Abdi (2007); Saporta and Keita (2006)], which evaluates group differences.

1.1. Current work

In this work, we present advances in the application of a particular variant of discriminant correspondence analysis: multiblock discriminant correspondence analysis (also called MUDICA), a method that can handle large, structured categorical datasets with multicollinear variables that are arranged as blocks (e.g., academic variables, demographic variables). We expand upon the original presentation of multiblock DICA, where the method was first introduced to analyze the relationship between types of social communication patterns among individuals with Dementia of the Alzheimer’s Type (DAT) and their spouses. Specifically, the authors evaluated group differences between patients with varying severity of DAT and examined the contribution of variables (i.e., the number of occurrences of a given communication pattern) that were arranged into two specific blocks (i.e., patient-initiated and spouse-initiated communication patterns). Williams et al. (2010) showed how MUDICA can include hypothesis testing to address clinical research questions involving categorical variables even when the data comprise many more variables than observations (a pattern often called the “N ≪ P” problem).

While MUDICA was first used more than a decade ago (albeit in the field of communication disorders), the method has largely remained on the sidelines and has not been fully applied in psychological research. With the availability of better data visualization tools and customizable statistical software, we present an up-to-date account of MUDICA including how to normalize (i.e., scale) variable blocks, partial out a confounding variable, examine main effects and interactions, and evaluate a posteriori group differences with non-parametric inferential tests.

We use MUDICA to analyze data obtained from a recent study (Miles et al., 2020) to examine age and gender effects on mental health literacy, a concept that refers to knowledge and beliefs that facilitate the identification and management of psychological disorders (Jorm et al., 1997). With MUDICA, we simultaneously assess how different participant groups respond to questions on various topics in mental health literacy such as etiology, symptoms, treatment, and general knowledge of disorders. In doing so, we identify problematic topics for specific groups, which will provide a direction for mental health literacy education.

2. Background

There are various ways in which MUDICA can be used to extract information from a large dataset. In this paper, we provide an illustrative example that incorporates all the steps of MUDICA with an emphasis on how to interpret the numerous maps that are generated by the analysis. The underlying technique for MUDICA, as used in this article, is MCA (Lebart et al., 1984; also see Guttman, 1941, for an earlier version), a method derived from CA (Cordier, 1965) that extends principal component analysis [PCA; Hotelling (1933)] to analyze categorical data. MCA can handle more than two categorical variables at a time and shares the geometric properties that are characteristic of CA (Husson et al., 2017), such as analyzing observation profiles rather than absolute frequencies across variables. However, increasing the number of variables in MCA makes the geometric representation of profiles for MCA less intuitive than CA (for more details on CA geometry see Abdi and Williams, 2022; Husson et al., 2017; Phillips and Phillips, 2009). In fact, the application of MCA for categorical data is similar to how PCA is applied for quantitative data (see Abdi and Williams, 2010 for a detailed illustration of PCA). Specifically, as with PCA, MCA condenses information from a large dataset by combining the originally correlated categorical variables into new, uncorrelated quantitative variables called factors, components, or dimensions. These dimensions reveal how observations differ from each other and which variables contribute to the differences. In addition, the relationship between individual observations and variables can be displayed on a single map, a unique feature of the CA family of methods (Lebart and Saporta, 2014).

DICA, a particular variant of CA and MCA, analyzes differences between categories of observations that are evaluated on multiple variables (Abdi, 2007; Saporta and Keita, 2006), while MUDICA (Williams et al., 2010), a particular variant of DICA, analyzes differences between categories of observations that are evaluated on multiple blocks of variables. MUDICA offers numerous advantages for analyzing large, structured categorical datasets from different perspectives. First, fine-grained analyses that expose complex relationships between observations, variables, groups, and blocks can be simultaneously performed. Second, relationships between the observations, variables, groups, and blocks are presented in the form of intuitive maps that reveal underlying patterns in the data. Third, while early variants of CA were exploratory and required experienced visual interpretation of the maps, now, with superior computing power, relevant non-parametric inferential testing procedures generate objective or confirmatory results that can also be displayed on the same maps. Fourth, because these non-parametric testing procedures do not rely on parametric assumptions of standard statistical models (e.g., normality), traditional hypothesis testing is still possible with MUDICA. Fifth, MUDICA is not affected by multicollinearity because it does not rely on matrix inversion, which is the basis of traditional methods such as linear or logistic regression. In summary, MUDICA is an ideal tool to analyze complex datasets where traditional methods cannot ordinarily be employed.

We used MUDICA for our data to: (1) examine the variability in mental health literacy among participants based on age and gender; (2) identify variability among participant groups due to differences in the types of mental health literacy questions; and (3) display descriptive and inferential results in the form of intuitive maps that are easily interpreted. In addition, we used a new conditioned version of MUDICA to partial out the main effect of gender and separately control for the effect of clinical coursework on mental health literacy.

3. Methods

Data for this illustration were obtained from a study involving 663 undergraduate students who answered 33 multiple-choice questions on various topics related to mental health literacy (Miles et al., 2020; Rabin et al., 2021). Based on previous literature on age differences in mental health literacy (Farrer et al., 2008), participants in this study were divided into two groups: (1) traditional college students (≤ 24 years) and (2) non-traditional college students or adult learners (≥ 25 years). Furthermore, taking into account known gender differences in mental health literacy (Wong, 2016), the two age groups were stratified by male and female gender, a process resulting in a total of four participant


groups. We excluded from the analysis a total of 15 participants with missing or inadequate data (seven who did not report their age, three who did not report their gender, and five who had answered fewer than four questions correctly). The final sample of 648 participants included 303 females and 208 males in the ≤ 24 age group and 94 females and 43 males in the ≥ 25 age group.

The 33 multiple-choice questions were further classified into four blocks: etiology (4 questions), treatment (5 questions), general knowledge (8 questions), symptoms (16 questions). Based on content considerations, questions were further identified by specific domains (e.g., suicide, childhood disorders, medication). Each multiple-choice question had five answer choices and only one possible correct answer (see Rabin et al., 2021, for sample questions). The main variables of interest for this paper were responses to each question (correct or incorrect), age (≤ 24 years or ≥ 25 years) and gender (male or female), while clinical coursework (yes or no) was a supplementary variable of interest.

3.1. Data recoding

Often, datasets represent quantities (e.g., score on a test, number of correct questions) that are, in fact, qualitative or an aggregate of qualitative variables. In our example, a participant could obtain a score between 0 and 33 depending on how many questions were answered correctly. The total score (e.g., 18 out of 33) is a quantitative variable, but the response to each question (i.e., correct versus incorrect) is a qualitative variable. There are two approaches to analyze such datasets. One approach is to examine the differences in participant groups by analyzing the absolute quantity (e.g., an independent sample t-test between males and females with total score as a single quantitative dependent variable). Another approach is to examine the differences between participant groups by analyzing the pattern of responses (e.g., with responses to each question as multiple qualitative dependent variables). In our example, we examined such patterns with MUDICA by representing participant responses with complete disjunctive coding [Nakache (1973); see Data analysis section below for details], where each possible categorical response level (i.e., correct or incorrect) of the qualitative variable was uniquely expressed in the analysis.

In addition, large datasets often have variables that are organized into blocks, where the blocks collectively offer more information than individual variables. Ideally, to examine group differences based on blocks of variables, as in our example, the analysis should give the same importance to a block of four variables (e.g., the etiology block) as to a block of sixteen variables (e.g., the symptoms block). If the size of the block is ignored, then the block with sixteen variables is more likely to influence the results as compared to the block with four variables. So, to prevent the analysis from being driven by the number of variables within a block, we incorporated block normalization with MUDICA such that all blocks were given an equivalent importance (see Data analysis section below for details).

3.2. Data analysis

Below we describe the different steps of MUDICA (see Fig. 1 for a schematic diagram and the Appendix for mathematical details). In addition, the Results section highlights how these different steps were implemented for our example dataset with an emphasis on how to interpret the different maps generated by MUDICA. All statistical analyses were performed in the R programming language (R Core Team, 2020) using the TExPosition (Beaton et al., 2014a, 2014b) and the ggplot2 (Wickham, 2016) packages (for additional R code for CA and MCA, see Husson et al., 2017).

3.2.1. Step 1: Data organization

The original categorical data are appropriately recoded for analysis. The recoded data are arranged with observations (identified by their groups) on the rows and variables (normalized within blocks) on the columns (see Fig. 2 for specific details on data recoding and normalization).

In our example, a question was either answered correctly (i.e., right, R) or incorrectly (i.e., wrong, W), so each question was described as {R, W}, where ‘R’ and ‘W’ indicated a particular response for each question. Numerically, each question was coded with 1s and 0s, where 1 indicated the presence of a particular response and 0 indicated the absence of a particular response. Specifically, if a participant answered a question correctly, then the question was coded as {1, 0}, and if the participant answered a question incorrectly, then the question was coded as {0, 1}. The final dataset contained 648 participants whose responses to 33 questions could either be correct or incorrect, thus creating a table with

Fig. 1. Schematic diagram of the steps for MUDICA [adapted from Williams et al. (2010)]: (1) The original data are recoded for analysis; (2) DICA is performed on a
group × variable contingency table and the resulting dimensions are displayed as maps; (3) Contributions of blocks of variables are quantified in the dimensional
space; (4) Inference tests are conducted to examine group differences, determine reliability of dimensions, and predict group assignment.
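
Steps 2 and 4 of this schematic can be illustrated with a small sketch. The Python code below is not the authors' implementation (their analyses were run in R with TExPosition); it is a minimal sketch under stated assumptions: the observations are already disjunctively coded, the omnibus test statistic is assumed to be the total inertia of the group × variable table (the chi-square statistic divided by the grand total, which equals the sum of the eigenvalues in correspondence analysis), and the function names `group_table`, `total_inertia`, and `permutation_p_value` are hypothetical.

```python
import random


def group_table(rows, groups):
    """Sum the coded rows within each group (group x variable table).

    Rows of the result follow the sorted order of the group labels.
    """
    table = {}
    for row, g in zip(rows, groups):
        acc = table.setdefault(g, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return [table[g] for g in sorted(table)]


def total_inertia(table):
    """Total inertia of a non-negative table: chi-square / grand total."""
    n = sum(sum(r) for r in table)
    row_mass = [sum(r) / n for r in table]
    col_mass = [sum(r[j] for r in table) / n for j in range(len(table[0]))]
    inertia = 0.0
    for i, r in enumerate(table):
        for j, x in enumerate(r):
            expected = row_mass[i] * col_mass[j]
            if expected > 0:  # skip empty columns
                inertia += (x / n - expected) ** 2 / expected
    return inertia


def permutation_p_value(rows, groups, n_perm=999, seed=0):
    """Permutation test: shuffle group labels and recompute the inertia.

    Assumption: total inertia is used as the omnibus statistic.
    """
    rng = random.Random(seed)
    observed = total_inertia(group_table(rows, groups))
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if total_inertia(group_table(rows, labels)) >= observed:
            hits += 1
    # Count the observed arrangement itself so that p is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): one question, two groups with opposite answers.
toy_rows = [[1, 0]] * 10 + [[0, 1]] * 10
toy_groups = ["A"] * 10 + ["B"] * 10
toy_inertia, toy_p = permutation_p_value(toy_rows, toy_groups, n_perm=199)
```

Shuffling the labels breaks any link between group membership and response pattern, so the permuted inertias approximate what would be expected if the groups did not differ. Identifying which individual dimensions are reliable would additionally require the eigenvalues of each permuted table, which this sketch does not compute.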


Fig. 2. Data organization for MUDICA: (1) All variables are represented by their levels of possible responses; (2) Original data are re-coded disjunctively; (3) Variables
are normalized within blocks such that for each row, the responses for all questions in a particular block sum to 1.
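
The three steps shown in Fig. 2 can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' R code: the helper names `disjunctive_code` and `block_normalize` are hypothetical, each variable is assumed to have exactly two levels (right, wrong), and the columns are assumed to be ordered block by block.

```python
def disjunctive_code(answers):
    """Code each answer as an {R, W} pair: 'R' -> 1, 0 and 'W' -> 0, 1."""
    coded = []
    for answer in answers:
        if answer not in ("R", "W"):
            raise ValueError(f"unexpected response level: {answer!r}")
        coded.extend([1, 0] if answer == "R" else [0, 1])
    return coded


def block_normalize(coded, block_sizes):
    """Divide the coded columns of each block by its number of variables."""
    normalized = []
    start = 0
    for size in block_sizes:
        width = 2 * size  # two columns (R, W) per variable
        normalized.extend(v / size for v in coded[start:start + width])
        start += width
    if start != len(coded):
        raise ValueError("block sizes do not match the number of columns")
    return normalized


# One participant, seven variables in blocks B1, B2, B3 of sizes 1, 2, 4
# (hypothetical answers, mirroring the seven-variable layout of Fig. 2).
answers = ["R", "W", "R", "R", "W", "W", "R"]
coded = disjunctive_code(answers)              # 14 columns of 0s and 1s
normalized = block_normalize(coded, [1, 2, 4])

# Before normalization the block sums are 1, 2, and 4;
# after normalization every block sums to 1 for this row.
block_sums = [sum(normalized[0:2]), sum(normalized[2:6]), sum(normalized[6:14])]
```

For the study data, the same two helpers applied with block sizes 4, 5, 8, and 16 would turn the 33 answers of each participant into one row of 66 columns.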

648 rows and 66 columns (i.e., number of questions × levels of response). While this type of complete disjunctive coding ensures that all levels of a variable are represented in the analysis, multicollinearity, by definition, is automatically introduced into the dataset. This is because one level of each variable can be derived directly from the other level of this variable. For example, if a participant answers a question correctly, it automatically implies that the question was not answered incorrectly. However, with complete disjunctive coding both responses (i.e., 1 = presence of correct response and 0 = absence of incorrect response) are represented in the analysis, making each variable (with its levels) a multicollinear set. Fortunately, MUDICA is not affected by multicollinearity because the analysis does not involve a matrix inversion step that is necessary for other methods such as linear or logistic regression and linear discriminant analysis (see Härdle & Simar, 2019).

3.2.1.1. Block normalization. There are different types of block normalization procedures (see Abdi et al., 2012b, for examples) whose goal is to ensure that each variable within a block is given equal importance and that all blocks in the analysis are also given an appropriate importance. For example, consider a dataset with seven variables, each with two possible responses (i.e., whether the question was answered correctly or incorrectly). These seven variables are organized into three blocks, where the first block (B1) contains one variable, the second block (B2) contains two variables, and the third block (B3) contains four variables (see Fig. 2, Step 1). When these variables are disjunctively coded, each variable is represented as {R, W} and numerically coded as {1, 0} for a right answer and {0, 1} for a wrong answer (see Fig. 2, Step 2). With this coding schema, across each row, the sum of values within a block indicates the number of variables in that block. So, for B1, the sum across each row is 1 (i.e., one variable in the block), for B2 the sum across each row is 2 (i.e., two variables in the block), and for B3 the sum across each row is 4 (i.e., four variables in the block). Ideally, despite the differences in the number of variables in each block, B1, B2, and B3 should contribute equally to the analysis. Therefore, to ensure this equal contribution, we normalize (i.e., scale) the blocks by dividing the disjunctively coded variables within a block by the total number of variables in that block. With this approach, the sum across each row for B1, B2, and B3 is 1 (see Fig. 2, Step 3), a configuration indicating that each block as a whole will contribute equally to the analysis irrespective of how many variables are present in that block.

In our example, there were four blocks with a different number of questions per block: etiology (4 questions), treatment (5 questions), general knowledge (8 questions), symptoms (16 questions). After the responses were disjunctively coded, each block was normalized by the number of questions in this block (i.e., the etiology block by 4, the treatment block by 5, the general knowledge block by 8, and the symptoms block by 16). In this way, we ensured that no particular block preferentially influenced the analysis just by its number of questions.

3.2.2. Step 2: Discriminant correspondence analysis

Observations are collapsed into group barycenters (i.e., summed within groups), so there are as many rows as there are groups. The data table itself becomes a contingency table that contains the frequency of occurrence of every level of each variable for each group of observations. This contingency table is the input for correspondence analysis, which in turn transforms the table into two sets of factors (also called dimensions): one set for the groups (and observations) and one set for the variables. The first dimension explains the largest possible variance in the data. The second dimension, which is uncorrelated (i.e., orthogonal) to the first dimension, explains the second largest possible variance in the data. All subsequent dimensions are computed as such, each with a decreasing amount of variance explained. Pairs of dimensions are geometrically represented on a map with each dimension as an axis and group barycenters (and observations) and variables as points on these maps, where points close to each other are similar and points far away from each other are dissimilar [Abdi and Williams (2022); see the Appendix for mathematical details].

3.2.2.1. Conditioned analyses. Conditioned analyses (not displayed in Fig. 1) are used to partial out the effect of a single categorical variable that might contribute to the variability in the data but might not be directly relevant to the overall analysis. For conditioned MCA (Escofier, 1988), such an effect is algebraically removed from the dataset prior to performing MCA, and the resulting dimensions are interpreted in the same way as in a plain MCA (see the Appendix for mathematical details). Conditioned MUDICA extends conditioned MCA: the effect of a single categorical variable is removed before performing the MUDICA. Conditioned analyses can be used for various purposes such as to examine experimental effects in the absence of an interfering or confounding factor or to examine interaction effects in the absence of an overshadowing main effect.

In our example, participants with previous coursework in clinical psychology have been previously shown to have an advantage in a mental health literacy assessment because of their exposure to such topics in their curriculum (Miles et al., 2020). With conditioned MUDICA, the effect of age and gender on performance in the mental health literacy assessment can be examined after partialling out the effect of clinical coursework, which, by itself, is not one of the primary variables of interest.

3.2.2.2. Supplementary data. Supplementary data can be any data that were not included in the original analysis. Supplementary observations are observations described by the same variables as the original dataset (e.g., new participants who take the same mental health literacy assessment) and supplementary variables are variables that are measured on the same observations in the original dataset (e.g., clinical coursework, college major). These supplementary data are simply projected onto the dimensions generated by MUDICA or conditioned MUDICA,


and therefore, these supplementary data do not influence the analysis but can be useful to better interpret the dimensions and perform inferential analyses (described in more detail below).

3.2.3. Step 3: Block contributions

The relationship between the normalized blocks of variables and groups of observations is displayed in the dimension space (see Results section and the Appendix for more details). The effect of each normalized block is separately quantified to reveal how these blocks contribute to group differences on each dimension (Williams et al., 2010).

3.2.4. Step 4: Inferential tests

There are different inferential analysis steps for MUDICA. The first step is to evaluate whether the variance explained in the sample reflects the real variance explained in the population (akin to a null hypothesis test). For this, MUDICA uses permutation tests (Berry et al., 2011) to (1) evaluate whether there is an overall difference between groups, and, if such a difference exists, (2) identify the dimensions responsible for these group differences. The second step is to examine the stability of group differences and identify variables that reliably contribute to these differences, for which MUDICA uses bootstrap tests (Efron and Tibshirani, 1993; Hesterberg, 2011). Specific implementation of these tests for our example data are further elaborated in the Results section.

In addition, MUDICA uses cross-validation analyses such as the leave-one-out (LOO) procedure to evaluate the quality of group assignment. In the LOO procedure, each observation is excluded from the dataset one at a time and the left-out observation is then projected as a supplementary observation onto the dimensions generated by the MUDICA model (which was created with the other observations). Then, the distance of the projected observation from each of the group barycenters is computed and the observation is assigned to the closest group.

4. Results

In this paper, we illustrate how MUDICA can be used to examine group differences in mental health literacy based on age and gender, and we identify blocks of variables that drive these group differences. Below, we present each analysis in detail along with its methodological relevance, with an emphasis on how to interpret the numerous maps generated by MUDICA [for additional interpretation on specific analyses, see Williams et al. (2010); for mathematical details see the Appendix].

4.1. What do dimension-based methods give us?

Dimension-based methods extract from datasets new, uncorrelated variables, also called dimensions. Each dimension explains a specific amount of variance (called an eigenvalue and denoted by λ), and the sum of all the eigenvalues gives the overall variance of the dataset. The proportion of variance explained (denoted by τ) by each dimension is the ratio of the variance explained by this dimension to the total variance. The dimensions are viewed as maps, where the observations,

In order to better understand the total variance explained in our example, we first performed an MCA (see the Appendix for mathematical details) on the overall dataset (i.e., 648 rows by 66 columns). The MCA generated nine dimensions of which the first dimension revealed that differences in total scores explained most of the variance (i.e., τ1 = 99%, λ1 = 0.022). To examine the variance in responses on Dimension 1 (Fig. 3, read from left to right on the horizontal axis), we identified participants based on their total scores (min = 4 and max = 33) and split the distribution into three groups and identified them by colors [e.g., scores between 0 and 11 (red), 12–22 (orange), and 23–33 (green)]. Dimension 1 can be interpreted as differences in total scores, with participants who were likely to answer most questions correctly represented on the left side of Dimension 1 and participants who were likely to answer most questions incorrectly represented on the right side of Dimension 1. It should be noted that the color-coded arrow depicted in Fig. 3 will be used for subsequent figures to indicate the direction of maximum variance from high scores (in green) on the left to low scores (in red) on the right.

4.2. Multiblock discriminant correspondence analysis

For MUDICA, the 648 observations were categorized into four groups stratified by two genders (males and females) and two age groups (≤ 24 and ≥ 25 years), and the 66 variables were organized into four blocks. To prevent any particular block from dominating the analysis, each block was normalized by the number of questions within the block: etiology (4 questions), treatment (5 questions), general knowledge (8 questions), symptoms (16 questions).

When observations are categorized in a priori groups, MUDICA uses this group information to extract dimensions that maximize the variance between groups (and so optimizes group assignment). These dimensions are represented as a map with the group barycenters (or means) as points on this map. The similarity between two groups is interpreted based on the proximity of points on the maps: the closer the points, the more similar the groups and the farther the points, the more dissimilar the groups. Individual observations are also represented as points on these maps, and, to predict group assignment, a boundary is drawn around all the participants from a particular group anchored by the respective group barycenter. The boundary, called a convex hull, connects the outermost participants for this group and is sensitive to outliers (Greenacre and Blasius, 2006). Often, the convex hulls are peeled to only contain a given proportion (e.g., 95%) of the participants within the group and are known as peeled convex hulls. When peeled convex hulls are drawn around participants included in the original analysis (i.e., a fixed effect model), these hulls are called tolerance intervals.

In our example, MUDICA generated a total of three dimensions that, together, depicted the differences in performance of the four participant groups within the dimensional space (Fig. 4a). The most discriminant dimension, Dimension 1, had λ1 = 0.013 and explained τ1 = 81% of the total variance, the second-most discriminant dimension, Dimension 2, had λ2 = 0.002 and explained τ2 = 12% of the total variance, and the
groups, variables, and blocks are plotted as points on this map so that the least discriminant dimension, Dimension 3, had λ3 = 0.001 and
variability in the dataset can be visually inspected and interpreted based explained τ1 = 7% of the total variance (the total variance is given by λ1
on proximity of the points on the map. + λ2 + λ3 = 0.016). As the first two dimensions together accounted for

Fig. 3. Plain MCA: The maximum variance between participants lies in the number of questions that were answered correctly versus incorrectly (out of a total of 33 questions). Participants are coded by color to indicate performance levels based on total score, where red = low score, orange = medium score, and green = high score. NOTE: The red-orange-green arrow displayed here will be displayed in subsequent figures to indicate the direction of maximum variance along the first dimension.


93% of the between-group variance in the dataset, only these two dimensions will be displayed in all of the following results. By further examining the means of the four groups, we found that Dimension 1 (read from left to right on Fig. 4a) identified differences in performance based on gender, and that Dimension 2 (read from top to bottom on Fig. 4a) identified differences in performance based on age. Based on the overlap of the tolerance intervals (Fig. 4b), we found that the within-group variance was so large that it was impossible to reliably assign participants to their respective groups.

Fig. 4. MUDICA: Top panel shows that Dimension 1 represents gender differences in how questions were answered (i.e., correctly versus incorrectly), and Dimension 2 represents age differences in how questions were answered. Bottom panel shows the fixed effect group assignment with tolerance intervals drawn around the group barycenters.

MUDICA also revealed that for Dimension 1, which represented the differences in performance based on gender (horizontal axis), questions based on etiology contributed the most to variability in performance (Fig. 5a, left top panel). While the etiology block had the lowest number of questions, participants who answered etiology questions correctly were more likely to obtain higher scores overall compared to other participants. For Dimension 2, which represented the differences based on age, questions based on general knowledge contributed the most to variability in performance. By contrast, Fig. 5b (right top panel) shows the contribution of each block when block normalization was not performed. Here, the block with the largest number of questions (i.e., symptoms) contributed the most to gender differences on Dimension 1, followed by the blocks on general knowledge, treatment, and etiology, which could lead to the misinterpretation of the dimensions as being based on number of questions (i.e., symptoms > general knowledge > treatment > etiology) rather than pattern of responses.

Fig. 5c (bottom panel) shows how each of the four groups responded to questions from the perspective of each of the four blocks (with block normalization). We had already identified with the earlier MCA that Dimension 1 separated participants based on the number of questions that were correctly versus incorrectly answered. Similarly, for MUDICA, the left side of Dimension 1 represented participants who were more likely to answer more questions correctly and the right side of Dimension 1 represented participants who were more likely to answer more questions incorrectly. The direction of lines in Fig. 5b shows the direction of response patterns for blocks of questions for each group of participants. For example, the line for etiology for ≤ 24 year old females pointed towards the left side of Dimension 1, a pattern that implied that this group, compared to other groups, was more likely to correctly answer questions on etiology. By contrast, the line for etiology for ≥ 25 year old males pointed towards the right side of Dimension 1, a pattern that implied that this group, compared to other groups, was more likely to incorrectly answer questions on etiology.

4.2.1. Identifying importance of specific variables

To quantify the importance of a variable relative to a given dimension, the variable's contribution is computed as the ratio of the variance explained by this variable to the total variance explained by the dimension, and these contributions are often represented as percentages. A variable is considered to be important if its contribution is larger than 1/N, where N is the total number of variables.

In our example, there were 66 variables, so any variable that contributed more than 1/66 or ~0.015 (1.5%) to the total variance would be considered important for a dimension. In Fig. 6a, responding incorrectly to an etiology question on bipolar disorder (~9%) and responding incorrectly to a treatment question on schizophrenia (~10%) contributed most to the difference in performance between females and males. In Fig. 6b, responding correctly to a general question on eating disorders (~22%) and responding incorrectly to a symptom question on dementia (~10%) contributed most to the difference in performance between ≤ 24 year olds and ≥ 25 year olds.

4.2.2. Identifying variables that best explain a particular dimension

To identify the variables that drive the differences between groups on a particular dimension, variable scores are examined across all levels of this variable. For MUDICA, the midpoint (or barycenter) of each variable (across its levels) lies at the center of the multivariate space (i.e., the origin). In our example, each variable is binary (i.e., has two levels, correct versus incorrect response), and a line that joins these two levels always passes through the origin of the multivariate space. The relationship between any two such binary variables is then interpreted based on the angle made by the lines. If two lines (one for each variable) form an angle close to 0°, where the same levels of response (i.e., correct on both variables) are close to each other, then these two variables are highly positively correlated. If the two lines form an angle close to 180°, where opposite levels of response are close to each other (i.e., correct on one variable and incorrect on the other variable), then these two variables are highly negatively correlated. If the two lines form an angle close to 90°, then the two variables are uncorrelated (i.e., orthogonal) with each other.

For example, in Fig. 7a, the angles between the lines for questions on etiology are relatively small with the same levels of the response close to each other, a configuration implying that participants answered all questions on etiology in a similar way. In contrast, in Fig. 7b, the line for a general treatment question is almost perpendicular to all the other lines for treatment questions about schizophrenia, dementia, anxiety, or suicide, a configuration implying that participants answered the general treatment question independently of the other treatment questions.

The importance of a variable is represented by the relative distance to the origin of the two points representing the levels of this variable. In our example, the first dimension separated females from males and the second dimension separated ≤ 24 year olds from ≥ 25 year olds. So, in Fig. 7c, a general question on eating disorders was more likely to be answered correctly by ≤ 24 year old females than by any of the other groups,


Fig. 5. Block normalization: Top panel shows the respective contribution of each block with (left) and without (right) block normalization. Bottom panel shows the
partial effect of the four blocks (with block normalization) for each participant group, where the direction of the lines (i.e., towards the left or right) indicates the
performance on questions within a particular block.

whereas in Fig. 7d, a symptom question on dementia was more likely to be answered correctly by ≥ 25 year old females than by any of the other groups.

4.3. Inference procedures

The overall explained variance from a MUDICA is used to compute an R² statistic, which is the ratio of the between-group variance to the total variance (Beaton et al., 2014a). To determine whether the explained variance reflects a true difference in the population (akin to a null hypothesis test), MUDICA uses a permutation procedure, where the original dataset is reordered so that the inherent relationship between observations and variables is broken. For this reordered (or permuted) dataset, a new R² statistic is computed. This procedure is repeated a large number (e.g., 1000) of times and a distribution of the R² statistic is generated and used to determine the probability of obtaining the original R² under the assumptions of the null hypothesis (separate tests are conducted for the whole dimensional space and for each dimension). If this probability is


Fig. 6. Variable contributions: Top panel shows variable contributions for Dimension 1 and bottom panel shows variable contributions for Dimension 2 (threshold is
set at ~1.5%).
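
The ~1.5% threshold drawn in Fig. 6 is 1/N with N = 66 variables. A minimal sketch of that rule in Python with NumPy follows; the function name and the toy contribution vector are illustrative assumptions (only the two largest values, 9.28% and 9.64%, are taken from Table 2), and this is not the authors' R implementation.

```python
import numpy as np

def important_variables(contributions):
    """Flag variables whose contribution exceeds 1/N.

    `contributions` are proportions for one dimension and should sum to 1.
    Returns the threshold and a boolean mask of "important" variables.
    """
    contributions = np.asarray(contributions, dtype=float)
    threshold = 1.0 / contributions.size
    return threshold, contributions > threshold

# Toy vector for 66 variables (assumption): two large contributions,
# the remainder shared equally among the other 64 variables.
toy = np.full(66, (1.0 - 0.0928 - 0.0964) / 64)
toy[0] = 0.0928  # e.g., etiology / bipolar / incorrect (Table 2)
toy[1] = 0.0964  # e.g., treatment / schizophrenia / incorrect (Table 2)

threshold, mask = important_variables(toy)
print(f"threshold = {threshold:.4f}, important variables = {int(mask.sum())}")
```

With equal sharing, each of the remaining 64 toy variables contributes about 1.27%, which falls below the 1/66 threshold, so only the two large contributions are flagged.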

smaller than .05 or 5%, then the variance explained is considered to reflect a true value different from 0. In our example, the overall R² was .07 (p < .001), a probability value small enough to indicate a statistically significant, albeit small, effect.

Further, to illustrate the quality of group assignment, MUDICA generates prediction intervals, which are the random effect version of the tolerance intervals mentioned earlier. Prediction intervals are computed using the LOO procedure (see Methods section), where each observation is


Fig. 7. Variables by block: Variables are represented by factor scores in the multivariate space and can either be displayed all at once or separately by blocks. Panels identify variables within, respectively, blocks of (a) etiology, (b) treatment, (c) symptoms and general knowledge, and (d) questions that contributed to the differences in scores between participant groups.
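
The angle-based reading of maps such as Fig. 7 can be illustrated numerically. The sketch below (Python with NumPy; the helper name is an assumption) computes the angle between the lines of two binary variables from the Dimension 1 and Dimension 2 factor scores of their "correct" levels. The two etiology coordinates are the values listed in Table 2; the unit vectors are synthetic reference cases.

```python
import numpy as np

def angle_between(u, v):
    """Angle in degrees between two vectors anchored at the origin."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against tiny floating-point overshoots outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))

# "Correct" levels of two etiology questions (Dimension 1, Dimension 2), Table 2.
bipolar_correct = (-0.12, 0.00)
depression_correct = (-0.17, 0.03)
angle_etiology = angle_between(bipolar_correct, depression_correct)

# Synthetic reference cases: orthogonal and opposite directions.
angle_orthogonal = angle_between((1.0, 0.0), (0.0, 1.0))
angle_opposite = angle_between((1.0, 0.0), (-1.0, 0.0))

print(round(angle_etiology, 1), angle_orthogonal, angle_opposite)
```

A small angle between same-level points, as for the two etiology questions here, corresponds to the "answered in a similar way" reading; angles near 90° and 180° correspond to the uncorrelated and negatively correlated cases.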

excluded from the dataset and is projected onto its LOO subspace (i.e., the multivariate space created by a MUDICA using the other observations). Next, this observation is reconstituted from its LOO subspace projections. Then, the reconstituted observation is projected as a supplementary observation onto the multivariate space of the original MUDICA (for specifics, see Equations 19 to 24, page 1391ff in Williams et al., 2010). Finally, a peeled convex hull is drawn around a given proportion (e.g., 95%) of the reconstituted observations (i.e., a random effect model) for a particular group, and this hull is called a prediction interval. In our example, Fig. 8a shows prediction intervals for each group, which almost completely overlap with each other, indicating a low accuracy in group assignment (i.e., we cannot accurately determine age or gender of the participants based on their performance on the assessment).

MUDICA uses a bootstrap procedure to determine the stability of group differences. In the bootstrap procedure, the original dataset is assumed to represent the entire population of interest and is therefore used to recreate samples that are similar to samples drawn from the original population. For the bootstrap procedure, observations from the original dataset are resampled with replacement to generate a new sample called a bootstrap sample, and group means are computed for this bootstrap sample. This process is repeated a large number (e.g., 1000) of times, and a set of group means is generated for each of these bootstrap samples, which are each projected as supplementary observations onto the multivariate space of the original MUDICA. An ellipsoid is then drawn around a given proportion (e.g., 95%) of the bootstrapped group means and represents the confidence interval for each group mean. If the ellipsoids around the group means do not overlap, this indicates that the groups (i.e., the group means) reliably differ in the dimensional space. If the ellipsoids around two group means do overlap, this indicates that the groups do not reliably differ for the given dimensions. In our example (Fig. 8b, results provided in Table 1), the confidence intervals for the ≤ 24 year old males and females do not overlap, a configuration that indicates that the pattern of responses from these two groups reliably differed. However, the confidence intervals for the ≥ 25 year old males


and females do overlap, a configuration that indicates that the patterns of responses for these two groups of participants did not reliably differ on Dimensions 1 and 2.

The bootstrap samples derived earlier are also used to identify variables that reliably contribute towards group differences. For the variables, MUDICA computes bootstrap ratios (McIntosh and Lobaugh, 2004) whose distribution is similar to a t-distribution. The bootstrap ratios are interpreted similarly to a t-value for a given threshold (e.g., bootstrapped t-value > 1.96 corresponds to a p-value of p < .05) and so, variables that have a bootstrap ratio that crosses the threshold are considered to be reliable at this threshold. In our example, Dimension 1 separated males from females, and the bootstrap ratio tests identified the variables that contributed reliably to this gender difference (Fig. 9a; results provided in Table 2). Dimension 2 separated ≤ 24 year olds from the ≥ 25 year olds, and the bootstrap ratio test identified the variables that contributed reliably to this age difference (Fig. 9b; results provided in Table 2).

Fig. 8. Prediction and confidence intervals: Prediction intervals are used to represent the random effect group assignment (top panel), where overlapping prediction intervals imply that a participant may not be accurately classified into their respective group. Bootstrapped confidence intervals represent group differences (bottom panel), where overlapping confidence intervals imply that group means are not statistically different.

4.4. Incorporating supplementary data

MUDICA can incorporate supplementary data without influencing the original analysis. Supplementary data could relate to the observations or variables and can be examined in the context of the original analysis.

In our example, participants responded to questions about their coursework (i.e., whether or not they had taken any classes related to clinical psychology), which, based on previous work (Miles et al., 2020), was a significant factor in determining levels of mental health literacy. MUDICA examined a posteriori the average response patterns for participant groups based on whether they had taken a clinical course or not, and a bootstrap resampling procedure was used to compute confidence intervals for the group means. As Fig. 10a shows, having taken a clinical course was associated with high scores and not having taken such a course was associated with low scores.

Another way to use supplementary data is to examine finer grained group differences. In our example, the variable age was originally divided into 2 groups: ≤ 24 year olds and ≥ 25 year olds, but additional sub-divisions (e.g., 18–20, 21–24, 25–29, 30+) could also be examined a posteriori. Fig. 10b shows that when supplementary age groups are examined with respect to the original space that already maximized between-group variance, Dimension 2 indeed depicts age differences, especially when participants are older.

Finally, supplementary data describing age and gender interactions can also be examined a posteriori. Fig. 10c shows that while age differences are more pronounced for older participants (25–29, 30+), gender differences are more pronounced for younger participants (18–20, 21–24), a configuration indicating an age by gender interaction in overall performance.

4.5. Conditioned multiblock discriminant correspondence analysis

Conditioned MUDICA can be used to partial out certain strong effects that might pull an analysis in a particular direction and mask or overshadow other important but weaker effects (see the Appendix for mathematical details). Conditioned MUDICA can also be used to clean datasets (i.e., remove confounding information that could potentially influence the analysis).

4.5.1. Partialling out a particular main effect

In our example, the strong effect of gender drove differences between groups on Dimension 1. With conditioned MUDICA, the main effect of gender was first removed to examine any underlying interactions between gender and age. It should be noted that removing a main effect of one variable does not remove any interaction effects of that variable with other variables. As shown in Fig. 11a, once gender was removed, Dimension 1 represented age differences with ≤ 24 year olds scoring lower than ≥ 25 year olds. Because the main effect of gender was removed, Dimension 1 had λ1 = 0.002, and explained τ1 = 63% of the total variance, while Dimension 2 had λ2 = 0.001, and explained τ2 = 37% of the total variance; together these two dimensions accounted for 100% of the total variance (for this analysis the total variance was λ1 + λ2 = 0.003).

The total variance before removing the main effect of gender was 0.016 (as mentioned earlier), and the total variance after removing the

Table 1
Normalized multiblock DICA category-level statistics.

Category | N | Dim. 1 Factor Score | Dim. 1 Contribution (%) | Dim. 1 Bootstrap Ratio | Dim. 2 Factor Score | Dim. 2 Contribution (%) | Dim. 2 Bootstrap Ratio
≤24 Females | 303 | −0.08 | 24.10 | −3.87 | 0.03 | 26.11 | 3.35
≤24 Males | 208 | 0.16 | 60.42 | 5.71 | 0.01 | 0.61 | 0.56
≥25 Females | 94 | −0.11 | 13.56 | −2.99 | −0.08 | 49.13 | −4.62
≥25 Males | 43 | 0.06 | 1.91 | 1.01 | −0.08 | 24.16 | −3.40

Note: Bootstrap ratios above/below ± 1.96 are considered reliable at p < .05 and are shown in bold face.
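
The group factor scores in Table 1 are the group barycenters on Dimensions 1 and 2, so the "assign to the closest group" step of the LOO procedure can be sketched as below. This is a simplified illustration in Python with NumPy, under two stated assumptions: distances are plain Euclidean distances on the first two dimensions, and the barycenters are held fixed rather than recomputed without the left-out observation as the full procedure would require. The function name and the test points are hypothetical.

```python
import numpy as np

# Group barycenters (Dimension 1, Dimension 2) taken from Table 1.
BARYCENTERS = {
    "<=24 Females": (-0.08, 0.03),
    "<=24 Males": (0.16, 0.01),
    ">=25 Females": (-0.11, -0.08),
    ">=25 Males": (0.06, -0.08),
}

def assign_to_closest_group(point, barycenters=BARYCENTERS):
    """Return the label of the barycenter nearest to a projected observation."""
    point = np.asarray(point, dtype=float)
    distances = {
        label: float(np.linalg.norm(point - np.asarray(center, dtype=float)))
        for label, center in barycenters.items()
    }
    return min(distances, key=distances.get)

# Hypothetical projected observations.
first_assignment = assign_to_closest_group((0.15, 0.00))
second_assignment = assign_to_closest_group((-0.10, -0.07))
print(first_assignment, "|", second_assignment)
```

Comparing such assignments with the true group labels over all left-out observations gives the kind of classification accuracy that the prediction intervals summarize graphically.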


Fig. 9. Bootstrap ratio tests: Top panel shows the bootstrap ratio values for Dimension 1 and the bottom panel shows the bootstrap ratio values for Dimension 2
(threshold at p < .05, which corresponds to a bootstrap ratio value of ~± 2, akin to a t-test).
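
As a rough illustration of the bootstrap ratio test shown in Fig. 9, the sketch below computes, for each variable, the mean of its bootstrapped factor scores divided by their standard deviation and compares the absolute ratio with 1.96. It is written in Python with NumPy; the mean-over-standard-deviation definition is the commonly used form and is assumed here rather than taken from the authors' code, and the bootstrapped scores are simulated, not the study's data.

```python
import numpy as np

def bootstrap_ratios(boot_scores):
    """Mean over standard deviation of bootstrapped factor scores.

    `boot_scores` has shape (n_bootstrap_samples, n_variables) and holds
    one dimension's factor scores for each bootstrap sample.
    """
    boot_scores = np.asarray(boot_scores, dtype=float)
    return boot_scores.mean(axis=0) / boot_scores.std(axis=0, ddof=1)

# Simulated bootstrap distributions (assumption): 1000 samples, 2 variables.
rng = np.random.default_rng(0)
boot = np.column_stack([
    rng.normal(-0.12, 0.03, size=1000),  # score consistently far from zero
    rng.normal(0.01, 0.03, size=1000),   # score hovering around zero
])

ratios = bootstrap_ratios(boot)
reliable = np.abs(ratios) > 1.96  # threshold corresponding to p < .05
print(np.round(ratios, 2), reliable)
```

The first simulated variable has a ratio of roughly −4 and is flagged as reliable; the second stays well inside ±1.96 and is not.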

main effect of gender was 0.003, a process resulting in an 81.25% reduction in the total variance of the dataset, a reduction which indicates a very large effect of gender on performance. When the main effect of gender was removed, the effect of the interaction between age and gender (previously overshadowed by the main effect of gender) was clearly revealed. The interaction effect showed that differences in performance between males and females in the ≤ 24 year old age group were no longer reliable, an effect implying that performance was less likely to be driven by gender and more likely to be driven by age. In addition, as mentioned earlier, when the main effect of gender was present, females were more likely than males to answer etiology questions correctly (see Fig. 5b), but once the gender effect was removed, knowledge of etiology was no longer important for identifying differences in performance between the groups (Fig. 11b).
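
The 81.25% figure is the relative drop in total variance, (0.016 − 0.003) / 0.016. The short Python check below reproduces it from the rounded totals reported in the text, along with the 37.5% reduction reported for clinical coursework in Section 4.5.2; the function name is an illustrative assumption.

```python
def variance_reduction(total_before, total_after):
    """Proportion of the total variance removed by partialling out an effect."""
    return (total_before - total_after) / total_before

# Rounded total variances reported in the text.
gender_reduction = variance_reduction(0.016, 0.003)
coursework_reduction = variance_reduction(0.016, 0.010)

print(f"gender: {gender_reduction:.2%}, coursework: {coursework_reduction:.2%}")
```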


Table 2
Normalized multiblock DICA variable-level statistics.

Block | Content Domain | Response | N | Dim. 1 Factor Score | Dim. 1 Contribution (%) | Dim. 1 Bootstrap Ratio | Dim. 2 Factor Score | Dim. 2 Contribution (%) | Dim. 2 Bootstrap Ratio
Etiology | Bipolar | R | 428 | −0.12 | 4.77 | −4.28 | 0.00 | 0.00 | 4.77
Etiology | Bipolar | W | 220 | 0.24 | 9.28 | 4.26 | 0.00 | 0.00 | 9.28
Etiology | Childhood Disorders | R | 326 | −0.09 | 1.85 | −2.31 | 0.03 | 1.28 | 1.85
Etiology | Childhood Disorders | W | 322 | 0.09 | 1.88 | 2.31 | −0.03 | 1.30 | 1.88
Etiology | Depression | W | 348 | 0.15 | 5.44 | 4.25 | −0.02 | 0.94 | 5.44
Etiology | Depression | R | 300 | −0.17 | 6.31 | −4.25 | 0.03 | 1.09 | 6.31
Etiology | General | W | 370 | 0.12 | 3.90 | 3.65 | −0.03 | 1.15 | 3.90
Etiology | General | R | 278 | −0.16 | 5.19 | −3.66 | 0.03 | 1.53 | 5.19
Treatment | Anxiety Disorders | R | 336 | −0.12 | 2.80 | −3.05 | −0.04 | 2.61 | 2.80
Treatment | Anxiety Disorders | W | 312 | 0.13 | 3.01 | 3.03 | 0.05 | 2.81 | 3.01
Treatment | Dementia | R | 537 | −0.06 | 1.06 | −3.16 | −0.01 | 0.08 | 1.06
Treatment | Dementia | W | 111 | 0.28 | 5.15 | 3.22 | 0.03 | 0.39 | 5.15
Treatment | General | R | 406 | −0.03 | 0.19 | −0.94 | 0.03 | 1.71 | 0.19
Treatment | General | W | 242 | 0.05 | 0.32 | 0.94 | −0.05 | 2.86 | 0.32
Treatment | Schizophrenia | W | 205 | 0.29 | 9.64 | 4.78 | 0.04 | 1.44 | 9.64
Treatment | Schizophrenia | R | 443 | −0.13 | 4.46 | −4.64 | −0.02 | 0.67 | 4.46
Treatment | Suicide | W | 453 | 0.05 | 0.55 | 1.84 | 0.02 | 0.81 | 0.55
Treatment | Suicide | R | 195 | −0.11 | 1.28 | −1.85 | −0.05 | 1.89 | 1.28
General Knowledge | Anxiety Disorders | W | 305 | 0.10 | 1.14 | 2.53 | 0.01 | 0.10 | 1.14
General Knowledge | Anxiety Disorders | R | 343 | −0.09 | 1.02 | −2.55 | −0.01 | 0.09 | 1.02
General Knowledge | Anxiety Medication (1) | W | 360 | 0.09 | 0.95 | 2.32 | −0.01 | 0.08 | 0.95
General Knowledge | Anxiety Medication (1) | R | 288 | −0.11 | 1.19 | −2.36 | 0.01 | 0.10 | 1.19
General Knowledge | Anxiety Medication (2) | R | 436 | −0.01 | 0.02 | −0.41 | 0.03 | 1.06 | 0.02
General Knowledge | Anxiety Medication (2) | W | 212 | 0.02 | 0.03 | 0.41 | −0.06 | 2.17 | 0.03
General Knowledge | Eating Disorders | R | 145 | −0.18 | 1.69 | −2.76 | 0.25 | 23.23 | 1.69
General Knowledge | Eating Disorders | W | 503 | 0.05 | 0.49 | 2.71 | −0.07 | 6.70 | 0.49
General Knowledge | General (1) | W | 371 | 0.08 | 0.80 | 2.34 | 0.01 | 0.03 | 0.80
General Knowledge | General (1) | R | 277 | −0.10 | 1.07 | −2.33 | −0.01 | 0.04 | 1.07
General Knowledge | General (2) | R | 419 | −0.07 | 0.64 | −2.18 | −0.04 | 1.57 | 0.64
General Knowledge | General (2) | W | 229 | 0.12 | 1.17 | 2.18 | 0.07 | 2.87 | 1.17
General Knowledge | Substance Use Disorder | R | 233 | −0.06 | 0.33 | −1.14 | −0.10 | 5.22 | 0.33
General Knowledge | Substance Use Disorder | W | 415 | 0.04 | 0.19 | 1.14 | 0.05 | 2.93 | 0.19
General Knowledge | Suicide | R | 553 | −0.01 | 0.03 | −0.75 | −0.01 | 0.13 | 0.03
General Knowledge | Suicide | W | 95 | 0.07 | 0.17 | 0.74 | 0.06 | 0.76 | 0.17
Symptoms | Anxiety Disorders | W | 157 | 0.11 | 0.34 | 1.54 | −0.02 | 0.08 | 0.34
Symptoms | Anxiety Disorders | R | 491 | −0.03 | 0.11 | −1.53 | 0.01 | 0.03 | 0.11
Symptoms | Anxiety Disorders | R | 485 | −0.06 | 0.36 | −2.75 | 0.04 | 0.94 | 0.36
Symptoms | Anxiety Disorders | W | 163 | 0.19 | 1.07 | 2.78 | −0.12 | 2.80 | 1.07
Symptoms | Bipolar | R | 238 | −0.14 | 0.81 | −2.92 | 0.00 | 0.00 | 0.81
Symptoms | Bipolar | W | 410 | 0.08 | 0.47 | 2.89 | 0.00 | 0.00 | 0.47
Symptoms | Bipolar | W | 435 | 0.06 | 0.30 | 2.25 | 0.00 | 0.00 | 0.30
Symptoms | Bipolar | R | 213 | −0.13 | 0.62 | −2.27 | −0.01 | 0.01 | 0.62
Symptoms | Childhood Disorders | R | 343 | −0.03 | 0.06 | −0.84 | 0.04 | 0.54 | 0.06
Symptoms | Childhood Disorders | W | 305 | 0.04 | 0.07 | 0.84 | −0.04 | 0.61 | 0.07
Symptoms | Dementia | R | 168 | −0.19 | 1.09 | −3.10 | −0.21 | 9.43 | 1.09
Symptoms | Dementia | W | 480 | 0.07 | 0.38 | 3.05 | 0.07 | 3.30 | 0.38
Symptoms | Depression | R | 365 | −0.16 | 1.67 | −4.57 | 0.04 | 0.66 | 1.67
Symptoms | Depression | W | 283 | 0.21 | 2.16 | 4.61 | −0.05 | 0.85 | 2.16
Symptoms | Eating Disorders | W | 347 | 0.15 | 1.46 | 4.16 | 0.02 | 0.11 | 1.46
Symptoms | Eating Disorders | R | 301 | −0.18 | 1.68 | −4.15 | −0.02 | 0.13 | 1.68
Symptoms | Gender | W | 178 | 0.13 | 0.52 | 2.10 | −0.07 | 0.96 | 0.52
Symptoms | Gender | R | 470 | −0.05 | 0.20 | −2.09 | 0.02 | 0.36 | 0.20
Symptoms | OCD | W | 463 | 0.10 | 0.80 | 3.91 | 0.00 | 0.00 | 0.80
Symptoms | OCD | R | 185 | −0.24 | 1.99 | −4.07 | 0.00 | 0.01 | 1.99
Symptoms | Personality Disorder (1) | R | 297 | −0.08 | 0.32 | −1.82 | 0.05 | 0.90 | 0.32
Symptoms | Personality Disorder (1) | W | 351 | 0.07 | 0.27 | 1.82 | −0.04 | 0.76 | 0.27
Symptoms | Personality Disorder (2) | R | 382 | −0.09 | 0.57 | −2.81 | −0.04 | 0.88 | 0.57
Symptoms | Personality Disorder (2) | W | 266 | 0.13 | 0.82 | 2.83 | 0.06 | 1.26 | 0.82
Symptoms | PTSD | W | 391 | 0.09 | 0.56 | 2.95 | 0.00 | 0.00 | 0.56
Symptoms | PTSD | R | 257 | −0.14 | 0.85 | −2.99 | 0.00 | 0.00 | 0.85
Symptoms | Schizophrenia | R | 174 | −0.08 | 0.18 | −1.20 | −0.10 | 1.98 | 0.18
Symptoms | Schizophrenia | W | 474 | 0.03 | 0.07 | 1.20 | 0.04 | 0.73 | 0.07
Symptoms | Sexual Disorder | R | 438 | −0.08 | 0.45 | −2.70 | −0.04 | 0.92 | 0.45
Symptoms | Sexual Disorder | W | 210 | 0.16 | 0.94 | 2.74 | 0.09 | 1.91 | 0.94
Symptoms | Somatic Symptom Disorder | R | 431 | −0.06 | 0.27 | −2.08 | 0.01 | 0.08 | 0.27
Symptoms | Somatic Symptom Disorder | W | 217 | 0.12 | 0.54 | 2.08 | −0.02 | 0.16 | 0.54

Note: Bootstrap ratios above/below ± 1.96 are considered reliable at p < .05 and are shown in bold face. R = correct (i.e., right) response; W = incorrect (i.e., wrong) response. Questions within each block of variables are identified by content domain.


Fig. 10. Supplementary data: Top panel shows the effect of having taken a clinical course, middle panel shows finer grained age effects with categories not included in the analysis, and bottom panel shows finer grained age and gender categories to illustrate the interaction effect.

Fig. 11. Partialling out the main effect of gender: Dimension 1 represents age differences in performance and Dimension 2 represents the specific difference in performance between ≥ 25 males and females, which was overshadowed by the strong main effect of gender (top panel). Bottom panel shows the contribution of each block, which reveals that in the absence of a gender effect, questions on etiology no longer drive differences between participant scores along Dimension 1.

4.5.2. Partialling out the effect of a particular confound

As noted above, taking clinical coursework could be examined as a supplementary variable, where results showed that there were reliable group differences in performance based on whether or not participants had clinical coursework. Therefore, in order to examine age and gender differences on mental health literacy in the absence of clinical coursework, we used conditioned MUDICA to remove the effect of clinical coursework from the dataset and then examined the effect of age and gender on performance. After accounting for clinical coursework, Dimension 1 had λ1 = 0.007, and τ1 = 69%, Dimension 2 had λ2 = 0.002, and τ2 = 20%, and Dimension 3 had λ3 = 0.001, and τ3 = 11%; together these three dimensions accounted for 100% of the total variance (for this analysis the total variance was λ1 + λ2 + λ3 = 0.010).

The total variance before removing the effect of clinical coursework was 0.016 (as mentioned earlier), and the total variance after removing the effect of clinical coursework was 0.010, resulting in a 37.5% reduction of the total variance in the dataset. When the effect of taking a clinical course was removed (Fig. 12a; supplementary age and gender categories displayed), the response patterns for ≥ 25 year old males were no longer similar to the response patterns for the ≥ 25 year old females. Instead, the response patterns for ≥ 25 year old males now appeared more similar to the response patterns for the ≤ 24 year old males, implying that, in the absence of clinical coursework, males in general were more likely to have lower levels of mental health literacy than females, as indicated by the strong main effect of gender described earlier. Fig. 12b shows the important variable contributions for Dimensions 1 and 2, where (compared to Fig. 6a and b) the absence of a clinical psychology course resulted in lower contributions of questions on specific disorders and higher contributions of general questions on disorders.

5. Discussion

Psychological research involves datasets with multiple data types (e.g., survey measures, physiological measures, behavioral measures). However, most traditional statistical methods such as ANOVA or regression limit the scope of the analysis to one or few questions about the data and, so, often ignore the richness of the datasets. MUDICA has previously been shown to better handle categorical datasets and extract from the data findings that were not readily detectable using traditional statistical methods (Williams et al., 2010). Here, we present new advances in MUDICA including: (1) examining main effects and interactions and predicting group assignment; (2) quantifying the contributions of blocks of


Fig. 12. Partialling out the confounding effect of clinical coursework: Partialling out clinical coursework has a specific effect on a sub-group of older male par­
ticipants (i.e., 30 + years), where the effects are better seen with supplementary group means (top panel). Bottom panel shows the important variables that contribute
to differences in scores between participant groups along Dimensions 1 and 2.

variables and categories of observations; and (3) partialling out the effect of a single confounding variable to reveal other weaker, yet important, underlying effects.

For this paper, we used data from a mental health literacy research study to illustrate the application of MUDICA to examine 33 variables (arranged into four blocks) that were collected on 648 participants who, in turn, were classified into four groups. The goal of the analysis was to examine group differences based on age and gender in mental health literacy and identify variables that contributed to these differences. However, the dataset contained categorical data that were multicollinear, a configuration precluding the use of traditional methods of analyses. MUDICA offered a middle ground, where the elegance of traditional methods such as ANOVA or regression was preserved (i.e., to examine main effects and interaction or predict group assignment), but in a single model and without the need for corrections for multiple comparisons. MUDICA also went a step further by employing non-parametric inference testing methods that do not rely on the assumptions required for ANOVA or regression (e.g., normality), and included additional approaches to examine specific interaction effects, supplementary data, and contributions of blocks of variables.

In our example, MUDICA revealed finer details about age and gender effects on mental health literacy, and identified the individual variables and blocks of variables that contributed to the differences between groups. Specifically, while there was a strong gender effect (with female participants performing better than male participants overall), there was also a clear interaction effect where the differences between males and females were driven by the differences in the ≥ 25 year age-group. This difference between males and females in the ≥ 25 year age-group was amplified when controlling for variance due to clinical coursework, an effect implying that the gender difference in mental health literacy is affected by clinical coursework offered at the college level.

6. Limitations

While MUDICA has many advantages, it also has limitations to be


considered. A first limitation is that, for datasets with a large number of variables, MUDICA could identify dimensions on which almost all variables contribute, making such dimensions difficult to interpret. However, when data are structured into blocks, examining block contributions could facilitate a more effective interpretation of a dimension. There is also ongoing research on how to effectively simplify interpretation of dimensions by forcing variables to either have very high or very low contributions on particular dimensions (i.e., sparse solutions), where such dimensions can be optimized to explain as much variance as possible while retaining a simple dimensional structure (Guillemot et al., 2019; Yu, 2021; Yu et al., 2022, in press).
A second, often reported, limitation of methods such as MCA and MUDICA is that they analyze only categorical variables; quantitative variables therefore have to be appropriately transformed (e.g., binned or categorized) if they are to be examined along with other categorical variables. While methods such as Barycentric Discriminant Analysis (Abdi et al., 2017) and Multiblock Barycentric Discriminant Analysis (Abdi et al., 2012a) exist for quantitative variables, and deliver the same advantages as DICA and MUDICA deliver for categorical variables, research is currently underway to better integrate information from both quantitative and categorical variables within the same dimension-based model (Beaton et al., 2019a, 2019b). However, if the goal of the analysis is to generalize a pattern of responses as opposed to studying individual differences, and the loss of statistical power is minimal, the conversion of some types of quantitative variables into categorical variables (e.g., via binning or using domain-specific cut-offs) is acceptable (Benzécri, 1973).
A third limitation of dimension-based methods relates to missing data. In general, if there are only a few missing responses for any variable, then one approach is to replace the missing values by the profile of this variable (i.e., the average probability of responses after excluding the missing data). Another approach is to predict plausible values for the missing data while taking into account similarities between the observations and the relationship between variables (Josse and Husson, 2016). A third approach is to impute missing responses based on the original disjunctively coded dataset and iteratively reconstruct the data until convergence (see Husson et al., 2017, for more details). When there is a large number of missing responses, a common practice is to include an additional level for any variable where multiple observations have missing values, and this level is included in the analysis (Husson et al., 2017). An examination of such data will reveal whether the responses were missing randomly or systematically. Often, participants with randomly missing responses are removed from the analysis. Other approaches to impute data, such as the regularized iterative imputations that have been developed for MCA (Josse et al., 2012), can also be applied to MUDICA.
A fourth limitation is that, with conditioned MUDICA, only one effect (e.g., a main effect, confounding effect) can be removed at a time, particularly if the effects are not orthogonal (i.e., uncorrelated) with the other effects. This limitation is being addressed in recent work by combining the underlying method for MUDICA with other techniques such as partial least squares regression (Beaton et al., 2019a, 2019b), where layers of the dataset can be systematically removed in order to study other smaller, yet important, effects within the data (Escofier and Pagès, 2016).
Finally, methods such as MUDICA generate numerous maps that are designed to intuitively reveal patterns of information from large datasets. However, accurately interpreting these maps requires substantial practice in reading them, along with an adequate understanding of the dataset and relevant domain expertise (e.g., mental health literacy for this paper).

7. Software

MUDICA is based on correspondence analysis, which is available in most proprietary software programs including SAS [PROC CORRESP; SAS Institute Inc. (2017)], SPSS [module Categories; Meulman and Heiser (2011)], and Matlab (from author HA's homepage at: https://personal.utdallas.edu/herve/). In addition, CA and MCA are also available in various packages from open-source software languages such as Python (MCA 1.0.3, n.d.) and R (e.g., FactoMineR, ade4, ExPosition). In fact, R also incorporates aspects of DICA in some of its packages. The version of CA used in this article was implemented in R using the TExPosition package (Beaton et al., 2014b) and the maps were created with the ggplot2 package (Wickham, 2016). However, none of the above software programs have specific code to perform MUDICA or any of its various steps (e.g., block projections, conditioned analysis), which have to be specially coded (see Williams et al., 2010 for Matlab code for MUDICA). The step-by-step R code (with the example dataset used in this paper) is available from author AK's GitHub webpage at: https://github.com/anjkrishnan/multiblockDICA, and the R code for the specific analyses and creation of the maps is available from author HA's GitHub webpage at: https://github.com/HerveAbdi/PTCA4CATA and https://github.com/HerveAbdi/data4PCCAR.

8. Conclusion

We illustrated the application of MUDICA in a mental health literacy study where the goal was to identify differences between groups of participants (i.e., ≤ 24 males, ≤ 24 females, ≥ 25 males, ≥ 25 females) based on 33 questions from a mental health literacy questionnaire that were arranged in blocks (i.e., etiology, symptoms, treatment, general knowledge). Results from MUDICA were displayed as maps representing different aspects of the data: group differences, group assignment, individual variable contributions, contributions of blocks, and underlying interaction effects in the absence of a main effect or a confound, along with relevant inferential testing procedures. For our example, MUDICA revealed that while a strong gender effect exists in mental health literacy (overall, females have higher mental health literacy than males), this gender effect masks underlying interactions between age and gender. Specifically, both males and females in the ≤ 24 age group are more likely to have low mental health literacy, and males in the ≥ 25 age-group are more likely to have lower mental health literacy in the absence of any college coursework in clinical psychology. This difference in coursework is less likely to affect females across all age-groups.
In conclusion, multiblock discriminant correspondence analysis (MUDICA) is a versatile dimension-based method that is well suited to analyze large, structured categorical datasets. MUDICA generates easy-to-interpret maps that represent the relationship between groups of observations and blocks of variables. The reliability of the maps and stability of the variables are tested through non-parametric inferential procedures such as permutation and bootstrap procedures. In this paper, we introduced conditioned MUDICA where one specific effect from the dataset can be removed so that weaker underlying effects are clearly revealed. Thus, much like how a sketch artist creates a composite picture that represents the likely image of a person based on information from a witness, so too, MUDICA creates a composite picture of the relationship between observations and variables based on information from large and complex datasets.

Credit author statement

The authors made the following contributions. Anjali Krishnan: Conceptualization - Equations and code, Data curation - Analysis, figures, and tables, Writing - original draft preparation, review and editing; Ju-Chi Yu: Conceptualization - Equations and code, Writing - Drafting mathematical appendices, review and editing; Rona Miles: Conceptualization - Example study design, Writing - Drafting interpretation of clinical results, review and editing; Derek Beaton: Conceptualization - Equations and code, Writing - review and editing; Laura A. Rabin: Conceptualization - Example study design, Writing - review and editing; Hervé Abdi: Conceptualization - Equations and code, Writing -
review and editing.

Ethics approval and consent to participate

The example dataset used in this paper was obtained from a study that was approved by the Institutional Review Board (IRB) of Brooklyn College of the City University of New York (IRB reference number: 2016–1018), and the consent procedure was also approved by the IRB. No personal identification data were collected.

Funding

This work was supported by a grant awarded to authors RM and LAR from the John Cleaver Kelly (JCK) Foundation [2016–2021] and the Professional Staff Congress and The City University of New York Grant #63184–00 51 awarded to author RM.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data and code are publicly available (see Software section for details).

Acknowledgements

The authors would like to acknowledge Dr. Amy Boggan, Dr. Soudeh Khoubrouy, and Mr. Brendon Mizener for helpful comments on previous versions of the manuscript.

Appendix

This Appendix describes the main steps for MUDICA including the new features of block normalization and conditioned analyses. For a more formal
presentation of MUDICA, see Appendix C, page 1390ff in Williams et al. (2010).

Notations

Matrices are shown in bold face upper case letters (e.g., X), vectors are shown in bold face lower case letters (e.g., x), and numbers are shown in
italic upper case letters (e.g., I). The diag{X} operator transforms the elements on the diagonal of matrix X into a vector, while the diag{x} operator
transforms the vector x into a diagonal matrix. The transpose of a matrix (e.g., X) is represented with a superscript T as $\mathbf{X}^{\mathsf{T}}$.

Simple and Multiple Correspondence Analysis

For simple correspondence analysis, X is a contingency table with counts for levels of one categorical variable on the rows and another categorical
variable on the columns. This contingency table is then analyzed with a generalized singular value decomposition (GSVD). Multiple correspondence
analysis (MCA) generalizes correspondence analysis to analyze multiple categorical variables that are disjunctively coded (i.e., scores coded as 0s and
1s).
For MCA, matrix X has I observations and $J_k$ levels for each of the K variables (the total number of all levels for all variables is J). Matrix X is coded with 0s and 1s and the sum of all the 1s is N. The first step in MCA is to compute the probability matrix:

$$\mathbf{Z} = N^{-1}\mathbf{X} \tag{1}$$

The next step is to compute two vectors that contain the row totals (r) and the column totals (c) for X. These row and column total vectors are diagonalized ($\mathbf{D}_r = \mathrm{diag}\{\mathbf{r}\}$ and $\mathbf{D}_c = \mathrm{diag}\{\mathbf{c}\}$). Then, the $\chi^2$ (chi-square) distance from the probability matrix Z is computed as:

$$\mathbf{R} = \mathbf{Z} - \mathbf{r}\mathbf{c}^{\mathsf{T}} \tag{2}$$
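
Equations (1) and (2) can be illustrated with a minimal sketch in plain Python. The toy responses, the helper name `disjunctive`, and the column labels are hypothetical and are not taken from the example dataset. The sketch also assumes that the row and column totals are taken on Z (so that the product of r and c is on the same scale as Z); this is one reading of the step described above, not the authors' code.

```python
# Toy data: 4 observations answering 2 categorical variables (hypothetical).
responses = [
    ("yes", "low"),
    ("no", "high"),
    ("yes", "high"),
    ("no", "low"),
]

def disjunctive(rows):
    """Code each variable as 0/1 indicator columns, one column per level."""
    n_vars = len(rows[0])
    levels = [sorted({row[k] for row in rows}) for k in range(n_vars)]
    labels = [f"v{k}:{lev}" for k in range(n_vars) for lev in levels[k]]
    X = [[1 if row[k] == lev else 0
          for k in range(n_vars) for lev in levels[k]]
         for row in rows]
    return X, labels

X, labels = disjunctive(responses)
N = sum(map(sum, X))                        # sum of all the 1s in X
Z = [[x / N for x in row] for row in X]     # Eq. (1): probability matrix
r = [sum(row) for row in Z]                 # row totals (assumed taken on Z)
c = [sum(col) for col in zip(*Z)]           # column totals (assumed taken on Z)
# Eq. (2): deviation of Z from the independence model r c^T
R = [[Z[i][j] - r[i] * c[j] for j in range(len(c))] for i in range(len(r))]
```

With totals taken on Z, every row and every column of R sums to zero, which is a quick sanity check before any decomposition is attempted.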

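Finally, a GSVD is performed on R as:

$$\mathbf{R} = \mathbf{P}\boldsymbol{\Delta}\mathbf{Q}^{\mathsf{T}} \tag{3}$$

with the constraints $\mathbf{P}\mathbf{D}_r^{-\frac{1}{2}}\mathbf{P}^{\mathsf{T}} = \mathbf{Q}\mathbf{D}_c^{-\frac{1}{2}}\mathbf{Q}^{\mathsf{T}} = \mathbf{I}$. Matrix Δ is an L × L diagonal matrix with L singular values as the diagonal elements; P is the matrix of left singular vectors, and Q is the matrix of right singular vectors. The rows and columns of R are multiplied by P and Q, respectively, to generate matrices of factor scores:

$$\mathbf{F} = \mathbf{D}_r^{-\frac{1}{2}}\mathbf{P}\boldsymbol{\Delta} \tag{4}$$

and

$$\mathbf{G} = \mathbf{D}_c^{-\frac{1}{2}}\mathbf{Q}\boldsymbol{\Delta} \tag{5}$$

Each column of F and G represents a dimension and reveals how the observations and variables differ from each other.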

Discriminant Correspondence Analysis

DICA is an extension of CA and MCA that examines group differences by maximizing between-group variance. In DICA, the I rows of X are categorized
into groups, which are represented in an I × O design matrix Y, where O is the number of groups. A contingency table is then computed as $\mathbf{R} = \mathbf{Y}^{\mathsf{T}}\mathbf{X}$, and a GSVD is then performed on this contingency table (following the same steps above). The factor scores are generated in the same way as in CA and MCA, and represent the differences between categories (as opposed to individual observations).

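
The group-by-level contingency table used by DICA can be sketched as follows. Summing the disjunctive rows of X within each group gives the same result as the matrix product of the transposed design matrix Y with X. The function name `group_table`, the toy matrix, and the group labels are hypothetical.

```python
def group_table(X, groups):
    """Sum the disjunctive rows of X within each group (equivalent to Y^T X)."""
    group_names = sorted(set(groups))
    n_cols = len(X[0])
    table = {g: [0] * n_cols for g in group_names}
    for row, g in zip(X, groups):
        for j, value in enumerate(row):
            table[g][j] += value
    return group_names, [table[g] for g in group_names]

# Five observations, two variables with two levels each (hypothetical).
X = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
]
groups = ["A", "B", "A", "B", "A"]   # one group label per observation

names, R_groups = group_table(X, groups)
# R_groups has one row per group and one column per level of each variable.
```

Each row of the resulting table counts how often the members of one group chose each level, so the subsequent decomposition describes groups rather than individual observations.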
Conditioned Multiple Correspondence Analysis

A conditioned MCA (Escofier, 1988) is used to control for a specific effect described by a single categorical variable that was not included in the original analysis. If e is the additional variable and $\mathbf{Y}_e$ is the design matrix for that variable, then $\mathbf{C} = \mathbf{X}^{\mathsf{T}}\mathbf{Y}_e$ represents how e contributes to the variables of X. This matrix C is transformed into probabilities as:

$$\hat{\mathbf{C}} = \mathbf{C}\left(\mathbf{1}\mathbf{Y}_e\right)^{-1}. \tag{6}$$

In order to get the contribution of each observation to the levels of e, the design matrix $\mathbf{Y}_e$ is normalized by the total number of observations (N) as $N^{-1}\mathbf{Y}_e$. This conditioned contribution indicates the proportion of the data predicted by e and is computed as:

$$\mathbf{O}_e = N\left(N^{-1}\mathbf{Y}_e\hat{\mathbf{C}}\right) = \mathbf{Y}_e\hat{\mathbf{C}} \tag{7}$$

To perform a conditioned MCA, a GSVD is conducted on $\mathbf{D}_r^{-\frac{1}{2}}\left(\mathbf{R} - \mathbf{O}_e + \mathbf{r}\mathbf{c}^{\mathsf{T}}\right)\mathbf{D}_c^{-\frac{1}{2}}$, where R is the $\chi^2$ distance from the probability matrix Z (see MCA section above).

Conditioned Discriminant Correspondence Analysis

To perform a conditioned DICA, R is now the group matrix (i.e., $\mathbf{Y}^{\mathsf{T}}\mathbf{X}$), and this matrix R is used to compute $\mathbf{O}_e$. Factor scores of conditioned MCA and conditioned DICA are computed and interpreted in the same way as ordinary MCA and DICA.

Supplementary Data

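Observations or variables that are not included in the original analysis can be examined as supplementary data (i.e., they are not used to generate the multivariate space). For a supplementary row ($\mathbf{i}_{\mathrm{sup}}^{\mathsf{T}}$), the factor scores ($\mathbf{g}_{\mathrm{sup}}$) are obtained as:

$$\mathbf{g}_{\mathrm{sup}} = \left(\mathbf{i}_{\mathrm{sup}}^{\mathsf{T}}\mathbf{1}\right)^{-1}\mathbf{i}_{\mathrm{sup}}^{\mathsf{T}}\mathbf{G}\boldsymbol{\Delta}^{-1} \tag{8}$$

For a supplementary column ($\mathbf{j}_{\mathrm{sup}}$), the factor scores ($\mathbf{f}_{\mathrm{sup}}$) are obtained as:

$$\mathbf{f}_{\mathrm{sup}} = \left(\mathbf{j}_{\mathrm{sup}}^{\mathsf{T}}\mathbf{1}\right)^{-1}\mathbf{j}_{\mathrm{sup}}^{\mathsf{T}}\mathbf{F}\boldsymbol{\Delta}^{-1} \tag{9}$$

where $\left(\mathbf{i}_{\mathrm{sup}}^{\mathsf{T}}\mathbf{1}\right)^{-1}$ and $\left(\mathbf{j}_{\mathrm{sup}}^{\mathsf{T}}\mathbf{1}\right)^{-1}$ are first used to scale $\mathbf{i}_{\mathrm{sup}}^{\mathsf{T}}$ and $\mathbf{j}_{\mathrm{sup}}$ so that the elements of each sum to 1.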
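
The projection in Eq. (8) amounts to three steps: rescale the supplementary row so that it sums to 1, multiply it by the factor scores G, and divide each dimension by its singular value. The sketch below follows those steps in plain Python. The function name `project_supplementary` and the values of `G` and `delta` are made up for illustration and do not come from any fitted analysis.

```python
def project_supplementary(profile, scores, singular_values):
    """Project a supplementary row onto existing dimensions (Eq. 8)."""
    total = sum(profile)
    scaled = [v / total for v in profile]        # elements now sum to 1
    return [
        sum(scaled[j] * scores[j][l] for j in range(len(scaled)))
        / singular_values[l]                     # right-multiply by Delta^{-1}
        for l in range(len(singular_values))
    ]

# Hypothetical factor scores for 4 columns on 2 dimensions.
G = [
    [0.5, 0.25],
    [-0.5, 0.25],
    [0.25, -0.5],
    [-0.25, -0.5],
]
delta = [0.5, 0.25]        # hypothetical singular values
i_sup = [1, 0, 1, 0]       # disjunctively coded supplementary observation

g_sup = project_supplementary(i_sup, G, delta)
```

The same function covers Eq. (9) when it is called with a supplementary column and the factor scores F instead of G.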

Multiblock Discriminant Correspondence Analysis

MUDICA (Williams et al., 2010) is used to examine the contributions of blocks of variables to the overall variance. Here, the data table has J columns as before, but these J columns are now arranged into H a priori blocks (i.e., $\mathbf{X} = [\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_H]$). These blocks are normalized such that each block contributes equally to the analysis (see below for block normalization).
Each of the H blocks can be projected into the DICA multivariate space. First, the GSVD of the group matrix R is rewritten as:

$$\mathbf{R} = \mathbf{P}\boldsymbol{\Delta}\mathbf{Q}^{\mathsf{T}} = \mathbf{P}\boldsymbol{\Delta}\left[\mathbf{Q}_1, \ldots, \mathbf{Q}_h, \ldots, \mathbf{Q}_H\right]^{\mathsf{T}} \tag{10}$$

where $\mathbf{Q}_h$ is the part of Q that corresponds to the hth block of X. Then, the factor scores for the hth block are computed as (with $\mathbf{W}_h$ being a diagonal weight matrix for the hth block):

$$\mathbf{F}_h = H\mathbf{X}_h\mathbf{W}_h\mathbf{Q}_h \tag{11}$$

Block Normalization

When variables are arranged into blocks, the variables within a particular block can be normalized such that, for each observation, the sum of responses to variables within this block is equal to 1. Specifically, if the variables in the hth block (i.e., $\mathbf{X}_h$) are disjunctively coded (i.e., with 1s and 0s) and $\mathbf{r}_h$ represents the vector of row totals for block $\mathbf{X}_h$, block $\mathbf{X}_h$ is normalized as:

$$\tilde{\mathbf{X}}_h = \mathbf{X}_h\,\mathrm{diag}\{\mathbf{r}_h\}^{-1} \tag{12}$$

such that the row totals of $\tilde{\mathbf{X}}_h$ would now be a vector of 1s, and the total sum of $\tilde{\mathbf{X}}_h$ would be equal to the total number of observations N.
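
Block normalization can be sketched as dividing each row of a block by that row's total, which is the row-wise reading of Eq. (12) adopted here as an assumption. The function name `normalize_block` and the toy block (three two-level variables answered by three observations) are hypothetical.

```python
def normalize_block(block):
    """Divide every row of a disjunctively coded block by its row total."""
    normalized = []
    for row in block:
        total = sum(row)
        if total == 0:
            # A row without any response in this block cannot be rescaled.
            raise ValueError("row has no responses in this block")
        normalized.append([value / total for value in row])
    return normalized

# One block holding three variables with two levels each (hypothetical).
X_h = [
    [1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
]
X_h_tilde = normalize_block(X_h)

row_totals = [sum(row) for row in X_h_tilde]   # each close to 1
grand_total = sum(row_totals)                  # close to the number of rows
```

Because every observation then carries the same total weight within every block, a block with many variables does not dominate the analysis simply by having more columns.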


References

Abdi, H., 2007. Discriminant correspondence analysis. In: Salkind, N. (Ed.), Encyclopedia of Measurement and Statistics. Sage Publications. https://doi.org/10.4135/9781412952644.n140.
Abdi, H., Williams, L., 2022. Correspondence analysis. In: Frey, B. (Ed.), The SAGE Encyclopedia of Research Design. Sage Publications, pp. 327–339. https://doi.org/10.4135/9781071812082.n124.
Abdi, H., Williams, L.J., 2010. Principal component analysis. Wiley Interdiscipl. Rev.: Comput. Stat. 2 (4), 433–459. https://doi.org/10.1002/wics.101.
Abdi, H., Williams, L.J., Beaton, D., Posamentier, M.T., Harris, T.S., Krishnan, A., Devous Sr, M.D., 2012a. Analysis of regional cerebral blood flow data to discriminate among Alzheimer's disease, frontotemporal dementia, and elderly controls: a multi-block barycentric discriminant analysis (MUBADA) methodology. J. Alzheim. Dis. 31 (s3), S189–S201. https://doi.org/10.3233/JAD-2012-112111.
Abdi, H., Williams, L.J., Béra, M., 2017. Barycentric discriminant analysis. In: Alhajj, R., Rokne, J. (Eds.), Encyclopedia of Social Network Analysis and Mining. Springer, New York, pp. 1–20. https://doi.org/10.1007/978-1-4939-7131-2_110192.
Abdi, H., Williams, L.J., Valentin, D., Bennani-Dosse, M., 2012b. STATIS and DISTATIS: optimum multitable principal component analysis and three way metric multidimensional scaling. Wiley Interdiscipl. Rev.: Comput. Stat. 4 (2), 124–167. https://doi.org/10.1002/wics.198.
Beaton, D., Abdi, H., Filbey, F.M., 2014a. Unique aspects of impulsive traits in substance use and overeating: specific contributions of common assessments of impulsivity. Am. J. Drug Alcohol Abuse 40 (6), 463–475. https://doi.org/10.3109/00952990.2014.937490.
Beaton, D., Fatt, C.R.C., Abdi, H., 2014b. An ExPosition of multivariate analysis with the singular value decomposition in R. Comput. Stat. Data Anal. 72, 176–189. https://doi.org/10.1016/j.csda.2013.11.006.
Beaton, D., Saporta, G., Abdi, H., 2019a. A Generalization of Partial Least Squares Regression and Correspondence Analysis for Categorical and Mixed Data: an Application with the ADNI Data. bioRxiv, 598888. https://doi.org/10.1101/598888.
Beaton, D., Sunderland, K.M., Levine, B., Mandzia, J., Masellis, M., Swartz, R.H., Troyer, A.K., Binns, M.A., Abdi, H., Strother, S.C., 2019b. Generalization of the minimum covariance determinant algorithm for categorical and mixed data types. bioRxiv, 333005. https://doi.org/10.1101/333005.
Benzécri, J.-P., 1973. L'Analyse des Données 1–2. Dunod.
Berry, K.J., Johnston, J.E., Mielke Jr., P.W., 2011. Permutation methods. Wiley Interdiscipl. Rev.: Comput. Stat. 3 (6), 527–542. https://doi.org/10.1002/wics.177.
Cordier, B., 1965. L'analyse des correspondances [PhD thesis]. University of Rennes.
Efron, B., Tibshirani, R.J., 1993. An introduction to the bootstrap. Monogr. Stat. Appl. Probab. 57, 1–436. https://doi.org/10.1007/978-1-4899-4541-9.
Escofier, B., 1988. Analyse des correspondances multiples conditionnelle. In: Diday, en (Ed.), Data Analysis and Informatics: International Symposium Proceedings: 5th. Amsterdam: North Holland.
Escofier, B., Pagès, J., 2016. Analyses factorielles simples et multiples. Dunod.
Farrer, L., Leach, L., Griffiths, K.M., Christensen, H., Jorm, A.F., 2008. Age differences in mental health literacy. BMC Publ. Health 8 (1), 1–8. https://doi.org/10.1186/1471-2458-8-125.
Greenacre, M., Blasius, J., 2006. Multiple Correspondence Analysis and Related Methods. Chapman and Hall/CRC.
Guillemot, V., Beaton, D., Gloaguen, A., Löfstedt, T., Levine, B., Raymond, N., Tenenhaus, A., Abdi, H., 2019. A constrained singular value decomposition method that integrates sparsity and orthogonality. PLoS One 14 (3), e0211463. https://doi.org/10.1371/journal.pone.0211463.
Guttman, L., 1941. The quantification of a class of attributes: a theory and method of scale construction. In: Horst, P. (Ed.), The Prediction of Personal Adjustment. Social Science Council, pp. 318–348.
Hair, J., Black, W., Babin, J., Anderson, R., Tatham, R., 2009. Analyzing nominal data with correspondence analysis. In: Hair, J., Black, W., Babin, J., Anderson, R., Tatham, R. (Eds.), Multivariate Data Analysis. Prentice-Hall, pp. 595–603.
Härdle, W.K., Simar, L., 2019. Applied Multivariate Statistical Analysis. Springer Nature.
Hesterberg, T., 2011. Bootstrap. Wiley Interdiscipl. Rev.: Comput. Stat. 3 (6), 497–526. https://doi.org/10.1002/wics.182.
Hotelling, H., 1933. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24 (6), 417. https://doi.org/10.1037/h0071325.
Husson, F., Lê, S., Pagès, J., 2017. Exploratory Multivariate Analysis by Example Using R, second ed. CRC Press, Boca Raton.
SAS Institute Inc., 2017. SAS/STAT 14.3 User's Guide: the Corresp Procedure. Cary: SAS Institute Inc.
Jorm, A.F., Korten, A.E., Jacomb, P.A., Christensen, H., Rodgers, B., Pollitt, P., 1997. "Mental health literacy": a survey of the public's ability to recognise mental disorders and their beliefs about the effectiveness of treatment. Med. J. Aust. 166 (4), 182–186. https://doi.org/10.5694/j.1326-5377.1997.tb140071.x.
Josse, J., Chavent, M., Liquet, B., Husson, F., 2012. Handling missing values with regularized iterative multiple correspondence analysis. J. Classif. 29 (1), 91–116. https://doi.org/10.1007/s00357-012-9097-0.
Josse, J., Husson, F., 2016. missMDA: a package for handling missing values in multivariate data analysis. J. Stat. Software 70, 1–31. https://doi.org/10.18637/jss.v070.i01.
Lebart, L., Morineau, A., Warwick, K.M., 1984. Multivariate Descriptive Statistical Analysis; Correspondence Analysis and Related Techniques for Large Matrices. New York (USA) Wiley.
Lebart, L., Saporta, G., 2014. Historical elements of correspondence analysis and multiple correspondence analysis. In: Greenacre, M., Blasius, J. (Eds.), Visualization and Verbalization of Data. CRC Press, Chapman & Hall, pp. 31–44.
MCA 1.0.3. Python software foundation (n.d.) Retrieved September 3, 2022, from https://pypi.org/project/mca/.
McIntosh, A.R., Lobaugh, N.J., 2004. Partial least squares analysis of neuroimaging data: applications and advances. Neuroimage 23, S250–S263. https://doi.org/10.1016/j.neuroimage.2004.07.020.
Meulman, J.J., Heiser, W.J., 2011. IBM SPSS Categories 20. SPSS Inc., USA, p. 313.
Miles, R., Rabin, L., Krishnan, A., Grandoit, E., Kloskowski, K., 2020. Mental health literacy in a diverse sample of undergraduate students: demographic, psychological, and academic correlates. BMC Publ. Health 20 (1), 1–13. https://doi.org/10.1186/s12889-020-09696-0.
Nakache, J.-P., 1973. Influence du codage des données en analyse factorielle des correspondances étude d'un exemple pratique médical. Rev. Stat. Appl. 21 (2), 57–70.
Phillips, D., Phillips, J., 2009. Visualising types: the potential of correspondence analysis. In: Byrne, D., Ragin, C.C. (Eds.), Sage Handbook of Case-Based Methods. Sage Publications, pp. 148–168.
R Core Team, 2020. R: A Language and Environment for Statistical Computing. https://www.R-project.org/.
Rabin, L.A., Miles, R.T., Kamata, A., Krishnan, A., Elbulok-Charcape, M., Stewart, G., Compton, M.T., 2021. Development, item analysis, and initial reliability and validity of three forms of a multiple-choice mental health literacy assessment for college students (MHLA-c). Psychiatr. Res. 300, 113897. https://doi.org/10.1016/j.psychres.2021.113897.
Saporta, G., Keita, N.N., 2006. Correspondence analysis and classification. In: Greenacre, M.J., Blasius, J. (Eds.), Multiple Correspondence Analysis and Related Methods. Chapman and Hall/CRC, pp. 371–392. https://doi.org/10.1201/9781420011319-19.
Statistical inference in the 21st century: a world beyond p < 0.05 [special issue]. In: Wasserstein, R.L., Schirm, A.L., Lazar, N.A. (Eds.), Am. Statistician 73. https://doi.org/10.1080/00031305.2019.1583913.
Wickham, H., 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag.
Williams, L.J., Abdi, H., French, R., Orange, J.B., 2010. A tutorial on multiblock discriminant correspondence analysis (MUDICA): a new method for analyzing discourse data from clinical populations. J. Speech Lang. Hear. Res. 53 (5), 1372–1393. https://doi.org/10.1044/1092-4388(2010/08-0141).
Wong, K., 2016. Gender differences in mental health literacy of university students. West. Undergrad. Psychol. J. 4 (1).
Yu, J.-C., 2021. Sparse Partial Least Square Correspondence Analysis (SPLS-CA): Applications To Genetics and Behavioral Studies [PhD Thesis].
Yu, J.-C., Gómez-Corona, C., Abdi, H., Guillemot, V., 2022. Sparse MFA, sparse STATIS, and sparse DiSTATIS with an application to sensory evaluation. J. Chemometr. https://doi.org/10.1002/cem.3443 (in press).
