Development of the “Scale for the Assessment of Non-Experts’ AI Literacy” – An Exploratory Factor Analysis
Abstract

Artificial Intelligence competencies will become increasingly important in the near future. Therefore, it is essential that the AI literacy of individuals can be assessed in a valid and reliable way. This study presents the development of the “Scale for the assessment of non-experts’ AI literacy” (SNAIL). An existing AI literacy item set was distributed as an online questionnaire to a heterogeneous group of non-experts (i.e., individuals without a formal AI or computer science education). Based on the data collected, an exploratory factor analysis was conducted to investigate the underlying latent factor structure. The results indicated that a three-factor model had the best model fit. The individual factors reflected AI competencies in the areas of “Technical Understanding”, “Critical Appraisal”, and “Practical Application”. In addition, eight items from the original questionnaire were deleted based on high intercorrelations and low communalities to reduce the length of the questionnaire. The final SNAIL-questionnaire consists of 31 items that can be used to assess the AI literacy of individual non-experts or specific groups and is also designed to enable the evaluation of AI literacy courses’ teaching effectiveness.

Keywords: AI literacy; AI competencies; AI literacy scale; AI literacy questionnaire; Assessment; Exploratory factor analysis
1. Introduction

Artificial intelligence (AI) is having an increasing impact on various aspects of daily life. These effects are evident in areas such as education (Zhai et al., 2021), healthcare (Reddy et al., 2019), or politics (König & Wenzelburger, 2020). However, AI is not only used in niche areas that require a high degree of specialization, but is also integrated into everyday applications. Programs like ChatGPT (OpenAI, 2023) provide free and low-threshold access to powerful AI applications for everyone. It is already becoming apparent that the use of these AI applications requires a certain level of AI competence that enables a critical appraisal of the programs’ capabilities and limitations.

1.1. Defining AI literacy

These competencies are often referred to in the literature as AI literacy. There are several definitions of AI literacy, but one of the most commonly used can be found in a paper by Long and Magerko (2020), which lists 16 core AI literacy competencies. They define AI literacy as “a set of competencies that enables individuals to critically evaluate AI technologies; communicate and collaborate effectively with AI; and use AI as a tool online, at home, and in the workplace” (p. 2). Furthermore, Ng et al. (2021a) state in their literature review that “instead of merely knowing how to use AI applications, learners should be inculcated with the underlying AI concepts for their future career, as well as the ethical concerns of AI applications to become a responsible citizen” (p. 507). Despite these and other attempts to define AI literacy, there is still no clear consensus on which specific skills fall under the umbrella term AI literacy. However, researchers seem to agree that AI literacy is aimed at non-experts, i.e., laypeople who have not had specific AI or computer science training. These may be individuals who could be classified as consumers of AI, or individuals who interact with AI in a professional manner (Faruqe et al., 2021). Because of this somewhat ambiguous definitional situation, we propose the following AI literacy working definition: The term AI literacy describes competencies that include basic knowledge and analytical evaluation of AI, as well as critical use of AI applications, by non-experts. It should be emphasized that programming skills are explicitly not included in AI literacy in this definition, since in our view they represent a separate set of competencies and go beyond AI literacy.
1.2. Assessing AI literacy

Although comparatively young, the field of AI literacy and AI education has been the subject of increasing research for several years (Kandlhofer et al., 2016; Cetindamar et al., 2022). In addition, there are many examples in the literature of courses and classes that strive to increase the AI literacy of individuals at different levels of education, e.g., kindergarten (Su & Ng, 2023), high school (Ng et al., 2022), or university (Laupichler et al., 2022). However, few attempts have been made to develop instruments for assessing individuals’ AI literacy, although such instruments would be essential, for example, to evaluate the teaching effectiveness of the courses described above. Another advantage of AI literacy assessment tools would be the ability to compare the AI literacy of different subgroups (e.g., high school or medical students), identify their strengths and weaknesses, and develop learning opportunities based on these findings. In addition, a scale reliably assessing AI literacy could be used to characterize study populations in AI-related research. It is important that such assessment instruments meet psychometric quality criteria. In particular, the reliability and validity of the instruments are vitally important and should be tested extensively (Verma, 2019).

To our knowledge, there are currently three publications dealing with the development of psychometrically validated scales for AI literacy which allow a general and cross-sample assessment of AI literacy. The first published scale, by Wang et al. (2022), found four factors that constitute AI literacy: “awareness”, “usage”, “evaluation”, and “ethics”. This scale was developed primarily to “measure people’s AI literacy for future [Human-AI interaction] research” (p. 5). The authors developed their questionnaire based on digital literacy research and found that digital literacy and AI literacy overlap to some extent. Another study was published by Pinski and Benlian (2023). This study primarily presents the development of a set of content-valid questions and supplements this with a pre-test of the item set with 50 participants. Based on the preliminary sample, structural equation modelling was used to examine whether their notion of a general model of AI capabilities was accurate. While the study is well designed overall, the results of the pre-test, based on only 50 subjects, can only be considered preliminary. It is also interesting to note that the questionnaire is intended to be used to assess general AI literacy, but in the pre-selection of participants, a certain level of programming ability was required. The most recent study in this area was published as a preprint by Carolus et al. (2023) and is still in the peer-review process at this time. The authors generated a set of potential AI literacy items derived from the categories listed in the review by Ng et al. (2021b). Afterwards, the “items were discussed, rephrased, rejected, and finalised by [their] team of researchers” (Carolus et al., 2023, p. 6). They then tested the fit of the items to the theoretical categories using confirmatory factor analysis. Of note, this procedure corresponds to the top-down process of deduction, as the authors derive practical conclusions (i.e., items) from theory.

1.3. Developing the “Scale for the assessment of non-experts’ AI literacy”

The main objective of this paper is to present the development of the “Scale for the assessment of non-experts’ AI literacy” (SNAIL), which aims to expand the AI literacy assessment landscape. It differs from existing AI literacy assessment tools in several essential ways. First, the focus of the scale is clearly on non-experts, i.e., individuals who have not had formal AI training themselves and who interact or collaborate with AI rather than create or develop it (in contrast to Carolus et al. (2023)). Second, we focused exclusively on AI literacy items, as the assessment of AI literacy must be detached from related constructs such as digital literacy (in contrast to Wang et al., 2022). Third, we take an inductive, exploratory, bottom-up research approach by moving from specific items to generalized latent factors. The main reason for this approach is the preliminary theoretical basis for AI literacy (as described in section 1.1). Since this inductive research approach derives theoretical assumptions from practical observations (i.e., participants’ responses to the AI literacy items), we deliberately refrained from formulating hypotheses.

However, three research questions can still be formulated that structure the development of the scale. First, we are interested in whether hidden (or latent) factors influence item responses. These could be subconstructs that map different capabilities in the field of AI literacy. For example, it would be possible that AI literacy consists of the specific subcategories of “awareness,” “usage,” “evaluation,” and “ethics,” as postulated by Wang et al. (2022). As a first step, it would therefore be interesting to determine how many factors there are and which items of the item set can be assigned to each individual factor. Thus, research question (RQ) 1 is:

RQ1. How many factors should be extracted from the available data, and which items of the SNAIL-questionnaire load on which factor?

While RQ1 can be answered mainly with statistical methods (more on this in section 2), RQ2 is more concerned with the meaning of the factors in terms of their content. Often, multiple items loading on a single factor follow a specific content theme. This theme can be identified and named, and the name can be used as the “title” for the respective factor.

RQ2. Can the items loading on a factor be subsumed under a particular theme that can be used as a factor name?

Lastly, in most item sets there are certain items whose added value is rather low. This could be due to the fact that an item is worded ambiguously or measures something other than what it is supposed to measure. Such items should be excluded from the final scale because they can negatively influence the psychometric quality criteria. In addition, a scale is more efficient if it requires fewer items while maintaining the same quality.

RQ3. Do items exist in the original item set that can be excluded to increase the efficiency of the final SNAIL-questionnaire?

2. Material and methods

This study was approved by the local Ethics Committee (application number 194/22), and all participants gave informed consent.

2.1. Variable selection and study design

Laupichler et al. (2023) developed a preliminary item set for assessing individuals’ AI literacy in a Delphi expert study. In this study, 53 experts in the field of AI education were asked to evaluate pre-generated items in terms of their relevance to an AI literacy questionnaire. In addition, the experts were asked to contribute their own item suggestions as well as to improve the wording of the pre-generated items. The relevance and the wording of 47 items were evaluated in three iterative Delphi rounds (for more information on the Delphi process, see Laupichler et al., 2023). This resulted in a preliminary set of 39 content-valid items designed to cover the entire domain of AI literacy. The authors argued that the item set is preliminary because its psychometric properties were not assessed in the study. The items were formulated as “I can…” statements, e.g. “I can tell if the technologies I use are supported by artificial intelligence”.

We used an analytical, observational, cross-sectional study design. All 39 items created by Laupichler et al. (2023) were presented to the participants in an online questionnaire. Participants rated the corresponding competency on a seven-point Likert scale from “strongly disagree” (one) to “strongly agree” (seven), as recommended by Lozano et al. (2008). The items were presented in random order, and the online questionnaire system ensured that the items were presented in a different (randomized) order for each participant. In addition to the actual AI literacy items, some sociodemographic questions were asked about age, gender, country of origin, etc. In addition, two bogus items were used to check the participants’ attention (see next section).
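To make the attention-check step concrete, the following minimal R sketch (R being the software used for all analyses in this study) shows how responses failing the bogus items could be filtered out. All object and column names, the simulated data, and the expected bogus-item answers are illustrative assumptions, not the code actually used in this study.

    # Illustrative only: simulate 300 respondents on 39 seven-point items
    set.seed(1)
    n <- 300
    snail_all <- as.data.frame(matrix(sample(1:7, n * 39, replace = TRUE), n, 39))
    names(snail_all) <- sprintf("V%02d", 1:39)

    # Two assumed bogus items, e.g. "Please select 'strongly agree' (7)"
    snail_all$bogus1 <- ifelse(runif(n) < 0.95, 7, 1)  # most participants attentive
    snail_all$bogus2 <- ifelse(runif(n) < 0.95, 7, 1)

    # Keep only attentive participants and the 39 AI literacy items
    attentive <- snail_all$bogus1 == 7 & snail_all$bogus2 == 7
    snail_items <- snail_all[attentive, sprintf("V%02d", 1:39)]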
2.2. Participants
of a domain” (Widaman, 2018, p. 829), we chose common factor analysis over principal component analysis. However, since we used a relatively high number of variables (39), both techniques would likely produce fairly similar results (Watkins, 2021).

Although different factor extraction methods generally yield similar results (Tabachnik et al., 2019), we compared the results of maximum likelihood extraction and iterated principal axis extraction due to the multivariate non-normality of our data. The differences between the two extraction methods were negligible, so we applied the more commonly used maximum likelihood extraction. We used squared multiple correlations for the initial estimation of communalities. Since our variables were in principle at least ordinal, we based the analysis on the polychoric correlation matrix instead of the more commonly used Pearson correlation matrix. We used parallel analysis by Horn (1965) and the minimum average partial (MAP) method of Velicer (1976) to decide how many factors to retain. A scree plot (Cattell, 1966) was used for visual representation, but not as a decisive method, since it was found to be rather subjective and researcher-dependent (Streiner, 1998). Since we expected the various factors in the model to be at least somewhat correlated, we used an oblique rotation method. We used the promax rotation method as a basis for interpretation, but compared the results with the oblimin rotation method. Norman and Streiner (2014) suggested setting the threshold at which pattern coefficients (factor loadings) are considered meaningful (i.e., salient) to 5.152/√(N − 2) (for p = .01). However, due to the large number of participants in our study, this would imply a relatively low salience threshold of 0.25, which is why we followed the more conservative suggestion made by Comrey and Lee (1992), who considered a minimum loading of 0.32 as salient.
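A compact R sketch of the retention and extraction pipeline just described is given below, using the psych package. Whether the authors used this package is not stated, so the sketch is illustrative only; `snail_items` is assumed to hold the screened item responses, and the number of parallel-analysis iterations is reduced here for brevity.

    library(psych)

    # Factor retention: Horn's parallel analysis and Velicer's MAP,
    # both based on the polychoric correlation matrix
    fa.parallel(snail_items, fm = "ml", fa = "fa", cor = "poly", n.iter = 100)
    retention <- vss(snail_items, n = 8, fm = "ml", rotate = "promax", cor = "poly")
    retention$map  # minimum average partial per candidate number of factors

    # Extraction: maximum likelihood with oblique (promax) rotation
    efa3 <- fa(snail_items, nfactors = 3, fm = "ml", rotate = "promax", cor = "poly")
    print(efa3$loadings, cutoff = 0.32)  # Comrey & Lee's salience threshold

    # Norman & Streiner's sample-size-dependent alternative: 5.152 / sqrt(N - 2)
    5.152 / sqrt(nrow(snail_items) - 2)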
After the EFA was conducted, the SNAIL-questionnaire was shortened to improve questionnaire economy and thereby increase the acceptability of using SNAIL as an assessment tool. As a basis for deciding whether to exclude variables, we looked at salient pattern coefficients on more than one factor on the one hand, and at particularly low communalities on the other.

Data pre-processing was done partially in Microsoft Excel (Microsoft Corporation, 2018) and partially in R (R Core Team, 2021) and RStudio (RStudio Team, 2020). Data analysis and data visualization were conducted entirely in R and RStudio.

3. Results

3.1. Data screening and appropriateness of data for EFA

The univariate distribution of all variables was acceptable, with skewness values ranging from −1.18 to 0.87, which is in the acceptable range of −2.0 to 2.0. Similar results were found for univariate kurtosis, with values ranging from −1.26 to 1.85, which is in the acceptable range of −7.0 to 7.0 (see Supplementary Material 1). Because Mardia’s test of multivariate skew and kurtosis was significant (p < .001), multivariate non-normality had to be assumed. Bentler (2005) found that increased multivariate kurtosis values of ≥5.0 can influence the results of EFA when working with Pearson correlation matrices, which is another reason to base calculations on the polychoric correlation matrix. Using the Mahalanobis distance (D²), some outliers were identified, but these were still within the normal range and showed no signs of systematic error. Data entry errors or other third-party influences are highly unlikely because we used automated questionnaire programs.

Fig. 2. Distribution and number of missing values in absolute and relative terms across all subjects and variables.
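The univariate and multivariate screening reported above could be reproduced along the following lines; this is again an illustrative sketch using the psych package, with `snail_items` as in the earlier sketches.

    library(psych)

    desc <- describe(snail_items)
    range(desc$skew)      # acceptable range: -2.0 to 2.0
    range(desc$kurtosis)  # acceptable range: -7.0 to 7.0

    mardia(snail_items)   # Mardia's multivariate skew and kurtosis tests

    # Mahalanobis distances (D^2) for multivariate outlier inspection
    X <- as.matrix(snail_items)
    d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

    # Percentage of missing values per variable
    round(colMeans(is.na(snail_items)) * 100, 2)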
Thus, we could not find any “demonstrable proof [that] indicates that they are truly aberrant and not representative of any observations in the population” (Hair et al., 2019, p. 91), which is why we did not exclude these cases from the data set. In total, each variable had between 0 and 4 missing values, which corresponds to 0–0.96% of all data. In addition, the data was missing completely at random, as demonstrated in Fig. 2. Therefore, no imputation or deletion methods were applied.

Based on Bartlett’s test of sphericity, the null hypothesis that the correlation matrix was an identity matrix could be rejected (p < .001). The significant result (i.e., p < .05) indicates that there is some redundancy among the variables, which means that they can be reasonably summarized with a smaller number of factors. The overall MSA of the Kaiser-Meyer-Olkin criterion was 0.97, with a range of 0.94–0.98 for each item, which is far above the minimum recommended threshold of 0.5 (Field et al., 2012) or 0.6 (Tabachnik et al., 2019), respectively. A visual inspection of the correlation matrix revealed that a majority of the coefficients were ≥0.30, indicating a sufficiently high magnitude of coefficients in the correlation matrix. Based on these measures, we assumed that the correlation matrix was adequate for performing an EFA (Watkins, 2021; Hair et al., 2019; Tabachnik et al., 2019).

3.2. Number of factors to retain

Horn’s parallel analysis, conducted with 20,000 iterations, found two factors to be the optimal solution, regardless of whether the reduced or unreduced correlation matrix was used. In contrast, Velicer’s minimum average partial reached a minimum of 0.0086 with three factors. A visual inspection of the scree plot supports these results. Depending on subjective preferences, two or three factors could be retained (Fig. 3). Consequently, we analysed models with one, two, three, and four factors for signs of under- or overfactoring, as well as their interpretability and theoretical meaningfulness.

Fig. 3. Screeplot.

3.3. EFA model evaluation

Following RQ1, the next section evaluates and compares different factor models to identify the most fitting number of factors.

3.3.2. Two and three factor models

The difference between the two-factor model and the three-factor model was rather ambiguous, which is consistent with the contrasting results of the parallel analysis and the minimum average partial method, which suggested the extraction of two and three factors, respectively. Both models had somewhat elevated levels of off-diagonal residuals, with 15.1% of residuals exceeding 0.05 and 3% of residuals exceeding 0.10 in the two-factor model, and 11.3% of residuals exceeding 0.05 and 1.08% of residuals exceeding 0.10 in the three-factor model. Although this might indicate underfactoring, it could also be due to the ordinal nature of the data set and the multivariate non-normality. In addition, the RMSR-value of both models (0.04 and 0.03, respectively) lay under the suggested threshold of ≤0.08.

All models had a sufficient number of pattern coefficients that loaded saliently on each factor (i.e., more than three; Fabrigar & Wegener, 2012; Mulaik, 2009). The only exception is the three-factor oblimin model when applying the conservative salience threshold of ≥0.32 described above. Here, no variables would load saliently on the third factor. The promax rotation method, on the other hand, comes to a reasonable distribution of salient pattern coefficients on all three factors. The two- and three-factor model both showed marginally acceptable communalities and no Heywood-cases (Harman, 1976). The mean of the communalities was 0.54 (SD = 0.08) for the two-factor model and 0.57 (SD = 0.08) for the three-factor model.

While the one-factor model was only able to explain 48% of the variance, the two-, three- and four-factor models were able to explain 54%, 57%, and 58% of the variance, respectively.

To analyse the internal consistency reliability, we combined every variable that saliently loaded on a factor into a scale and calculated Cronbach’s alpha with bootstrapped confidence intervals. The internal consistency of both scales in the two-factor model was excellent, with α = 0.95 [CI 0.94, 0.96] for the first scale and α = 0.94 [CI 0.93, 0.95] for the second scale. Albeit having slightly lower alpha-values, the internal consistency of the three scales in the three-factor model was also excellent: α = 0.94 [CI 0.93, 0.95] for the first scale, α = 0.93 [CI 0.91, 0.94] for the second scale, and α = 0.89 [CI 0.87, 0.91] for the third scale.
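Reliability figures of this kind could be computed as sketched below; the psych package’s alpha() returns bootstrapped confidence intervals when n.iter > 1. The item-to-factor assignment shown here is an assumed placeholder, not the published SNAIL solution.

    library(psych)

    scale1_items <- snail_items[, c("V02", "V05", "V09", "V13")]  # hypothetical assignment
    rel1 <- alpha(scale1_items, n.iter = 1000)  # bootstrap for confidence intervals
    rel1$total$raw_alpha  # Cronbach's alpha for the scale
    rel1$boot.ci          # bootstrapped CI bounds (2.5%, 50%, 97.5%)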
Since Cattell (1978) and other researchers conclude that the right number of factors is not a question of a correct absolute number, but rather a question “of not missing any factor of more than trivial size” (p. 61), the three-factor model seems to represent a good compromise between parsimony and avoiding the risk of underextraction (see Fig. 4).

As for RQ2, the findings and assessments based on the data coincide well with the content-related examination of the individual factors. With the two-factor solution, a unifying theme could be identified, but it is rather diffuse and unclear. However, the three-factor solution creates a more plausible classification of the manifest variables to the latent factors in terms of content (see Table 1). Based on the reasons given, the three-factor model was chosen as the best model.

The first factor’s highest pattern coefficients were found in variables centred around the understanding of machine learning, e.g. “I can describe how machine learning models are trained, validated, and tested”. Other rather technical or theoretical AI competencies, such as defining the differences between general and narrow AI or explaining “how sensors are used by computers to collect data that can be used for AI purposes”, load saliently on this factor, too. Thus, we propose the first factor’s name to be “Technical Understanding”. The variables loading saliently on the second factor deal with the recognition of the importance of data privacy and data security in AI, ethical issues related to AI, and risks or weaknesses that may appear when applying AI technologies. Therefore, the second factor is to be called “Critical Appraisal”, as it reflects competencies related to the critical evaluation of AI application results. Lastly, the variables with the highest pattern coefficients that load on the third factor are concerned with “examples of technical applications that are supported by artificial intelligence” or assessing “if a problem in [one’s] field can and should be solved with artificial intelligence methods”. Consequently, the third factor is to be called “Practical Application”. Accordingly, the interaction of the three factors could be called the TUCAPA-model of AI literacy.

3.5. Variable elimination

The last section of the results serves to answer RQ3, which deals with the elimination of items that do not add value and can therefore be excluded. As described above, we excluded variables that loaded saliently on more than one factor and variables with a communality more than 2 SD (i.e., 0.08) below the mean communality (0.57). After item elimination, 31 items remained in the final SNAIL-questionnaire. We did not use item parameters (i.e., item difficulty and item discrimination) as exclusion criteria because they were all within the acceptable range (see Table 2).

Overall, five items were eliminated due to diffuse loading patterns, e.g. “I can name strengths of artificial intelligence” (loading saliently on “Critical Appraisal” and “Practical Application”). Furthermore, two variables were deleted because of low communalities, e.g. “I can explain the differences between human and artificial intelligence”, and one item that did not load saliently on any factor and had a weak communality: “I can explain what an algorithm is” (see Table 1). We repeated the EFA process with the reduced set of variables and found comparable results. One of the main differences was the decrease in interfactor-correlations, which is somewhat trivial, given that we specifically excluded variables that loaded saliently on more than one factor. The internal consistency of the three scales (i.e., three factors) based on the reduced variable set was excellent. The alpha-values were very similar to the values of the unreduced variable set, with α = 0.93 [CI 0.92, 0.94] for the first scale, α = 0.91 [CI 0.89, 0.93] for the second scale, and α = 0.85 [CI 0.81, 0.88] for the third scale.

For the sake of brevity, all other results and diagrams can be found in the supplementary material (Supplementary Material 1). Consequently, the final SNAIL-questionnaire consists of 31 variables loading on three factors.

4. Discussion

4.1. Relation between TUCAPA and other models

One of the most well-known lists of AI literacy components was certainly published by Long and Magerko (2020), who list 16 competencies that constitute AI literacy. These competencies seem to have only minor relevance for the design of AI literacy assessment questionnaires. This could be due to the large number of 16 competencies, some of which are at the level of latent factors (e.g., competency 11 “Data Literacy”) and some at the level of individual manifest variables (e.g., competency 4 “General vs. Narrow [AI]”). Nevertheless, some competencies listed by Long & Magerko (e.g., competency 1 “Recognizing AI”) correspond to variables used in SNAIL (e.g., V01 “I can tell if the technologies I use are supported by artificial intelligence.”).

Many researchers refer to the literature review by Ng et al. (2021b)
Table 1
List of all variables sorted by factors based on the three-factor TUCAPA-model of AI literacy.
Note. The variables are sorted by pattern coefficient, with variables loading the highest on each factor appearing at the top of each column. Note that the table shows the model before elimination of eight items. Eliminated items have a lighter font. The superscript numbers indicate the reason for elimination, with (1) indicating salient loadings on more than one factor, (2) indicating extraordinarily low communalities, and (3) indicating a combination of (1) and (2).
investigated whether SNAIL can be applied equally well in all subject domains, or whether there are practical differences in AI literacy between different domains. For example, it could be possible that individuals with a high level of technical understanding (e.g., individuals from the field of mathematics or mechanical engineering) would rate the questions of the Technical Understanding factor very positively, while people from fields with less technical affinity (e.g., medicine, psychology) may evaluate the same questions rather negatively. Furthermore, it should be examined whether SNAIL is suitable to investigate the teaching effectiveness of courses that aim to increase the AI literacy of their participants. Since SNAIL is freely available as an open access offering, this would also be interesting for platforms such as “Elements of AI” (University of Helsinki & MinnaLearn, 2018) or “AI Campus” (KI Campus, 2023), which offer open educational resources to improve general AI literacy. Last but not least, the SNAIL-questionnaire should be compared with related constructs such as “attitudes toward AI” (Schepman & Rodway, 2020; Sindermann et al., 2021) or “digital literacy” (Gilster, 1997) to investigate the relationship between each construct. For example, it is possible that more pronounced AI literacy reduces anxiety toward AI (Wang & Wang, 2022), leading to more positive attitudes toward AI.
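As a closing illustration of such follow-up analyses, subscale scores could be formed and compared across subject domains roughly as follows; the item-to-factor assignments and the grouping variable are hypothetical placeholders.

    # Hypothetical item assignments; see Table 1 for the published solution
    tu <- rowMeans(snail_items[, c("V02", "V05", "V09")])  # Technical Understanding
    ca <- rowMeans(snail_items[, c("V11", "V14", "V17")])  # Critical Appraisal
    pa <- rowMeans(snail_items[, c("V01", "V20", "V25")])  # Practical Application

    # Given an assumed grouping variable `domain` (e.g., engineering vs. medicine):
    # t.test(tu ~ domain)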
5. Conclusion

We conducted an exploratory factor analysis to develop the “Scale for the assessment of non-experts’ AI literacy” (SNAIL) questionnaire, which is designed to assess AI literacy in non-experts. In doing so, we found that the construct represented by the questionnaire can be divided into three subfactors that influence individuals’ response behaviour on AI literacy items: Technical Understanding, Critical Appraisal, and Practical Application. Therefore, the model can be abbreviated as the TUCAPA model of AI literacy. Our study provides initial evidence that the 31 SNAIL items are able to reliably and validly assess the AI competence of non-experts. However, further research is needed to evaluate whether the results found in our study can be replicated and are representative of the population of non-experts. Finally, we would like to encourage all researchers in the field of AI literacy to use psychometrically validated questionnaires to assess the AI literacy of individuals and groups, as well as to evaluate the learning outcome of course participants.

Funding statement

This work was supported by the Open Access Publication Fund of the University of Bonn.

Ethics approval statement

Data collection for this study took place in February 2023 using the study participant acquisition program Prolific. The subjects received appropriate financial compensation for participating in the study. Participation in the study was voluntary and participants gave their informed consent. The study was approved by the Research Ethics Committee of the University of Bonn (Reference 194/22).

Author contributions

Matthias Carl Laupichler: Conceptualization, Formal Analysis, Writing – Original Draft, Visualization. Alexandra Aster: Writing – Review & Editing, Data Curation. Nicolas Haverkamp: Methodology. Tobias Raupach: Supervision, Resources.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Research data will be published as Supplementary Material (Excel file).

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.chbr.2023.100338.

References

Bartlett, M. S. (1950). Tests of significance in factor analysis. British Journal of Psychology, 3, 77–85.
Benson, J., & Nasser, F. (1998). On the use of factor analysis as a research tool. Journal of Vocational Education Research, 23(1), 13–33.
Bentler, P. M. (2005). EQS structural equations program manual. Multivariate Software.
Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk. Perspectives on Psychological Science, 6(1), 3–5. https://doi.org/10.1177/1745691610393980
Carolus, A., Koch, M., Straka, S., Latoschik, M. E., & Wienrich, C. (2023). MAILS – Meta AI literacy scale: Development and testing of an AI literacy questionnaire based on well-founded competency models and psychological change- and meta-competencies.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245–276. https://doi.org/10.1207/s15327906mbr0102_10
Cattell, R. B. (1978). Use of factor analysis in behavioral and life sciences.
Cetindamar, D., Kitto, K., Wu, M., Zhang, Y., Abedin, B., & Knight, S. (2022). Explicating AI literacy of employees at digital workplaces. IEEE Transactions on Engineering Management. https://doi.org/10.1109/TEM.2021.3138503
Comrey, A., & Lee, H. (1992). Interpretation and application of factor analytic results. In A first course in factor analysis (2nd ed.). Lawrence Erlbaum Associates.
Crump, M. J. C., McDonnell, J. V., & Gureckis, T. M. (2013). Evaluating Amazon’s Mechanical Turk as a tool for experimental behavioral research. PLoS One, 8(3), Article e57410. https://doi.org/10.1371/journal.pone.0057410
Fabrigar, L. R., & Wegener, D. T. (2012). Exploratory factor analysis. Oxford University Press.
Faruqe, F., Watkins, R., & Medsker, L. (2021). Competency model approach to AI literacy: Research-based path from initial framework to model. arXiv preprint.
Ferguson, G. A. (1954). The concept of parsimony in factor analysis. Psychometrika, 19(4), 281–290. https://doi.org/10.1007/BF02289228
Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. SAGE.
Gilster, P. (1997). Digital literacy. John Wiley & Sons.
Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2019). Multivariate data analysis (8th ed.). Cengage Learning.
Harman, H. H. (1976). Modern factor analysis (3rd ed.). University of Chicago Press.
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185. https://doi.org/10.1007/BF02289447
Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika, 39(1), 31–36. https://doi.org/10.1007/BF02291575
Kandlhofer, M., Hirschmugl-Gaisch, S., & Huber, P. (2016). Artificial intelligence and computer science in education: From kindergarten to university. 2016 IEEE Frontiers in Education Conference (FIE), 1–9.
Karaca, O., Çalışkan, S. A., & Demir, K. (2021). Medical artificial intelligence readiness scale for medical students (MAIRS-MS) – development, validity and reliability study. BMC Medical Education, 21(1), 112. https://doi.org/10.1186/s12909-021-02546-6
KI Campus. (2023). AI Campus. https://ki-campus.org/
König, P. D., & Wenzelburger, G. (2020). Opportunity for renewal or disruptive force? How artificial intelligence alters democratic politics. Government Information Quarterly, 37(3), Article 101489. https://doi.org/10.1016/j.giq.2020.101489
Laupichler, M. C., Aster, A., & Raupach, T. (2023). Delphi study for the development and preliminary validation of an item set for the assessment of non-experts’ AI literacy. Computers and Education: Artificial Intelligence, 4, Article 100126. https://doi.org/10.1016/j.caeai.2023.100126
Laupichler, M. C., Aster, A., Schirch, J., & Raupach, T. (2022). Artificial intelligence literacy in higher and adult education: A scoping literature review. Computers and Education: Artificial Intelligence, 3, Article 100101. https://doi.org/10.1016/j.caeai.2022.100101
Long, D., & Magerko, B. (2020). What is AI literacy? Competencies and design considerations. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1–16. https://doi.org/10.1145/3313831.3376727
Lozano, L. M., García-Cueto, E., & Muñiz, J. (2008). Effect of the number of response categories on the reliability and validity of rating scales. Methodology, 4(2), 73–79. https://doi.org/10.1027/1614-2241.4.2.73
Mahalanobis, P. C. (1936). On the generalized distance in statistics. Journal of the Society of Bengal, 2(1), 49–55.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3), 519–530.
Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3), 437–455. https://doi.org/10.1037/a0028085
Microsoft Corporation. (2018). Microsoft Excel. https://office.microsoft.com/excel
Mulaik, S. A. (2009). Foundations of factor analysis. Chapman and Hall/CRC. https://doi.org/10.1201/b15851
Mundfrom, D. J., Shaw, D. G., & Ke, T. L. (2005). Minimum sample size recommendations for conducting factor analyses. International Journal of Testing, 5(2), 159–168. https://doi.org/10.1207/s15327574ijt0502_4
Ng, D. T. K., Leung, J. K. L., Chu, K. W. S., & Qiao, M. S. (2021a). AI literacy: Definition, teaching, evaluation and ethical issues. Proceedings of the Association for Information Science and Technology, 58(1), 504–509. https://doi.org/10.1002/pra2.487
Ng, D. T. K., Leung, J. K. L., Chu, S. K. W., & Qiao, M. S. (2021b). Conceptualizing AI literacy: An exploratory review. Computers and Education: Artificial Intelligence, 2, Article 100041. https://doi.org/10.1016/j.caeai.2021.100041
Ng, D. T. K., Leung, J. K. L., Su, M. J., Yim, I. H. Y., Qiao, M. S., & Chu, S. K. W. (2022). AI literacy in K-16 classrooms. Springer.
Norman, G. R., & Streiner, D. L. (2014). Biostatistics: The bare essentials (4th ed.). People’s Medical Publishing.
Pinski, M., & Benlian, A. (2023). AI literacy – towards measuring human competency in artificial intelligence. Proceedings of the 56th Hawaii International Conference on System Sciences, 165–174.
R Core Team. (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Reddy, S., Fox, J., & Purohit, M. P. (2019). Artificial intelligence-enabled healthcare delivery. Journal of the Royal Society of Medicine, 112(1), 22–28. https://doi.org/10.1177/014107681881551
RStudio Team. (2020). RStudio: Integrated development for R. RStudio, PBC. http://www.rstudio.com/
Schepman, A., & Rodway, P. (2020). Initial validation of the general attitudes towards artificial intelligence scale. Computers in Human Behavior Reports, 1, Article 100014. https://doi.org/10.1016/j.chbr.2020.100014
Sindermann, C., Sha, P., Zhou, M., Wernicke, J., Schmitt, H. S., Li, M., Sariyska, R., Stavrou, M., Becker, B., & Montag, C. (2021). Assessing the attitude towards artificial intelligence: Introduction of a short measure in German, Chinese, and English language. KI - Künstliche Intelligenz, 35(1), 109–118. https://doi.org/10.1007/s13218-020-00689-0
Streiner, D. L. (1998). Factors affecting reliability of interpretations of scree plots. Psychological Reports, 83(2), 687–694. https://doi.org/10.2466/pr0.1998.83.2.687
Su, J., & Ng, D. T. K. (2023). Artificial intelligence (AI) literacy in early childhood education: The challenges and opportunities. Computers and Education: Artificial Intelligence, Article 100124. https://doi.org/10.1016/j.caeai.2023.100124
Tabachnik, B. G., Fidell, L. S., & Ullman, J. B. (2019). Using multivariate statistics (7th ed.). Pearson.
University of Helsinki, & MinnaLearn. (2018). Elements of AI. https://www.elementsofai.com/
Velicer, W. F. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3), 321–327. https://doi.org/10.1007/BF02293557
Verma, J. P. (2019). Statistics and research methods in psychology with Excel. Springer Singapore. https://doi.org/10.1007/978-981-13-3429-0
Wang, B., Rau, P. L. P., & Yuan, T. (2022). Measuring user competence in using artificial intelligence: Validity and reliability of artificial intelligence literacy scale. Behaviour & Information Technology. https://doi.org/10.1080/0144929X.2022.2072768
Wang, Y. Y., & Wang, Y. S. (2022). Development and validation of an artificial intelligence anxiety scale: An initial application in predicting motivated learning behavior. Interactive Learning Environments, 30(4), 619–634. https://doi.org/10.1080/10494820.2019.1674887
Watkins, M. R. (2021). A step-by-step guide to exploratory factor analysis with R and RStudio. Routledge.
Widaman, K. F. (2018). On common factor and principal component representations of data: Implications for theory and for confirmatory replications. Structural Equation Modeling: A Multidisciplinary Journal, 25(6), 829–847. https://doi.org/10.1080/10705511.2018.1478730
Woods, C. M. (2006). Careless responding to reverse-worded items: Implications for confirmatory factor analysis. Journal of Psychopathology and Behavioral Assessment, 28(3), 186–191. https://doi.org/10.1007/s10862-005-9004-7
Zhai, X., Chu, X., Chai, C. S., Jong, M. S. Y., Istenic, A., Spector, M., Liu, J.-B., Yuan, J., & Li, Y. (2021). A review of artificial intelligence (AI) in education from 2010 to 2020. Complexity, 2021, 1–18. https://doi.org/10.1155/2021/8812542