Development of The Paranormal and Supernatural Bel
Abstract
Background: This study describes the construction and validation of a new scale for measuring belief in paranormal phenomena. The work aims to address psychometric and conceptual shortcomings associated with existing measures of paranormal belief. The study also compares the use of classical test theory and modern test theory as methods for scale development.
Method: We combined novel items and amended items taken from existing scales to produce an initial corpus of 29 items. Two hundred and thirty-one adult participants rated their level of agreement with each item using a seven-point Likert scale.
Results: Classical test theory methods (including exploratory factor analysis and principal components analysis) reduced the scale to 14 items and one overarching factor: Supernatural Beliefs. The factor demonstrated high internal reliability, with an excellent test–retest reliability for the total scale. Modern test theory methods (Rasch analysis using a rating scale model) reduced the scale to 13 items with a four-point response format. The Rasch scale was found to be most effective at differentiating between individuals with moderate-high levels of paranormal beliefs, and differential item functioning analysis indicated that the Rasch scale represents a valid measure of belief in paranormal phenomena.
Conclusions: The scale developed using modern test theory is identified as the final scale, as this model allowed for in-depth analyses and refinement of the scale that were not possible using classical test theory. Results support the psychometric reliability of this new scale for assessing belief in paranormal phenomena, particularly when differentiating between individuals with higher levels of belief.
Keywords: Paranormal beliefs, Anomalous beliefs, Supernatural, Scale, Scale development, Factor analysis, Rasch analysis, Rating scale model, Classical test theory, Modern test theory
© The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the
original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or
other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line
to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this
licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/)
applies to the data made available in this article, unless otherwise stated in a credit line to the data.
gender also influence belief in the paranormal, although the extent of these effects has been questioned [15–18]. While much of this work indicates a negative influence of paranormal beliefs on cognition and psychological well-being, many studies have also demonstrated positive and adaptive functions of such beliefs. These adaptive functions include goal setting, emotional clarity, clarity about the self and the wider world, coping with trauma and stress, and the reduction of fear surrounding ambiguous stimuli [19–26]. Similarly, paranormal experiences have been shown to have adaptive outcomes, particularly in the wake of a bereavement [27–30]. These positive experiences may in turn lead to belief in the paranormal. Indeed, several studies have reported positive correlations between paranormal experience and belief [31–33]. This may also relate to the relationship between emotion-based reasoning and an individual’s proneness to paranormal attributions [34, 35]. Regardless of the cause of these beliefs, the breadth of work in this area suggests that belief in the paranormal should not be automatically viewed as a negative or problematic trait. However, some researchers argue that there is a specific type of believer whose beliefs are more likely to be associated with negative biases and dysfunctions. Previous work has suggested that paranormal believers can be divided into two subgroups: informed believers (who have a deeper understanding of paranormal phenomena and their putative causes), and quasi-believers (whose beliefs represent a superficial understanding of paranormal phenomena) [36, 37]. It has been proposed that negative associations seen between paranormal beliefs and cognition are a function of a tendency to hold quasi-beliefs, and that informed believers represent a small subgroup of believers whose beliefs are independent of any cognitive deficits [36, 37]. However, it is still unclear whether paranormal believers can be reliably divided into such subgroups [38].
Despite this large amount of work, researchers have yet to agree on a definition of the term “paranormal”. While a review of existing definitions is beyond the scope of this paper, the present work adopts the widely held view that phenomena can be considered paranormal when they violate the basic limiting principles of current scientific understanding [39], and so includes phenomena such as telepathy, life after death, astrology, and hauntings.
However, widespread agreement exists that research in this area is hampered both by studies employing a diverse range of measures of paranormal belief [4, 40–44], and by the lack of psychometric validity for some scales [45–48]. Much of the discussion has focussed on the three most frequently used scales, namely the Paranormal Belief Scale [49], the Australian Sheep-Goat Scale [50] and the Survey of Scientifically Unaccepted Beliefs [51].

Paranormal Belief Scale
The Paranormal Belief Scale in both original [49] and revised format (RPBS) [52] is the most widely used measure of paranormal belief. The revised format contains 26 items, adopts a broad definition of paranormal phenomena, and contains seven subscales (Traditional Religious Belief, Psi, Witchcraft, Superstition, Spiritualism, Extraordinary Life Forms, and Precognition). Several issues have been raised regarding both the item content and the factor structure of the RPBS [47, 48, 53–61]. Much of this criticism has centred on the Extraordinary Life Forms (ELF) and Traditional Religious Belief (TRB) subscales.
The ELF subscale consists of several cryptozoological items, including those relating to the alleged existence of the Loch Ness monster and the abominable snowman of Tibet. Some have argued that endorsing the existence of such alleged extraordinary life forms is not strongly associated with belief in more ‘mainstream’ paranormal phenomena, such as telepathy and premonitions [47]. These cryptozoological items have also been shown to be problematic in samples with greater cultural diversity, leading some researchers to replace items with more culturally relevant equivalents [62–65]. The ELF subscale also has the lowest internal reliability of the seven RPBS subscales and has frequently failed to reach recommended Cronbach’s alpha thresholds [53, 66–69].
The TRB subscale has raised concerns due to contradictory evidence concerning the relationship between paranormal and religious beliefs. While several studies have noted positive correlations between religiosity and belief in the paranormal [9, 70, 71], others have found those displaying especially strong forms of religious belief to be less likely to endorse the existence of paranormal phenomena [72, 73]. Some suggest that the relationship may be best conceptualised as curvilinear, with paranormal belief increasing alongside religious beliefs, but then decreasing when religious beliefs become particularly strong [74, 75].
A further criticism of the RPBS has focused on the fact that only one item in the scale is negatively worded. This could clearly increase the risk of RPBS scores being affected by respondents endorsing this item without fully considering its content [76].

Australian Sheep-Goat Scale
The Australian Sheep-Goat Scale (ASGS) [50] consists of 18 items and contains three subscales (Belief in Extrasensory Perception, Psychokinesis, and Life After Death). The original response format for the ASGS involved a visual analogue scale, with respondents indicating their level of agreement with each item by marking a horizontal
line. Scoring involved using a ruler to yield a value from 1 to 44, and these scores were then recoded to give a final value of 0, 1 or 2 for each item. Subsequent versions employed a forced-choice format, with participants selecting one of three response options (‘True’, ‘Uncertain’, and ‘False’) that are then recoded as 2, 1 or 0. The visual analogue scale and forced-choice options produce similar overall scores [77]. The ASGS has also been adapted for use with a six-point Likert scale, with some authors arguing that this format is less confusing for respondents and easier to interpret than the original visual analogue [78].
Although the ASGS tends to yield moderate-to-large intercorrelations between the three subscales, the Life After Death subscale exhibits the lowest internal reliability, leading some to suggest that it may undermine overall scale integrity [45]. Also, although the visual analogue ASGS presented both negatively and positively worded items, the more frequently employed forced-choice and Likert formats lack any negatively phrased items. As such, they raise concerns about response bias.
Despite the issues raised with the ASGS and its differences to the RPBS, several studies have noted positive correlations of 0.70 and above between the two scales [79, 80].

Survey of scientifically unaccepted beliefs
The Survey of Scientifically Unaccepted Beliefs (SSUB, also referred to as the “Survey of Popular Beliefs”) [51] is a more recent alternative for measuring belief in the paranormal. The SSUB is made up of 20 items and contains two subscales: New Age Beliefs and Traditional Religious Beliefs. The scale has high levels of internal reliability [51, 81, 82], has a balance of positive and negative items, and has not seen the same level of scrutiny and critique as the ASGS or RPBS. Although many of the phenomena featured in the scale could be considered paranormal (e.g., the existence of genuine haunted houses, psychics and fortune tellers), the inventory also contains items relating to several scientifically unaccepted beliefs that are not commonly associated with the paranormal (e.g., the lack of a rational explanation of crop circles and pixies, which are based upon mystery and elusiveness rather than a strict violation of scientific principles).

Differential item functioning
In addition to the criticisms outlined above, some researchers have questioned whether variations in responses on the existing scales may be partly a function of semantic biases introduced by age or gender, rather than fluctuations in level of belief [56]. This issue is commonly referred to as differential item functioning (DIF). Rasch scaling (a modern test theory model) has been applied to the ASGS [83] as a way of detecting these biases and assessing their effect. Findings indicated weak age and gender biases for some ASGS items, but the effect of these biases was minimal, which suggests that the scale is not significantly affected by DIF. The same scaling has also been applied to the RPBS [56], with significant DIF for gender seen on 18 items, and for age on 15 items. Consequently, using top-down purification (combining factor analysis and Rasch scaling), a two-factor model was suggested to reduce the impact of DIF, which has subsequently been employed in several studies [22, 33, 34, 84–86]. Despite the extensive use of the purified scale, several items of the RPBS failed to load on either of the two new factors, with the authors highlighting that addition of new items to the RPBS may produce additional belief clusters to those identified through their analyses [56]. DIF analysis was also used in the construction of the SSUB to remove three items from the original item pool that were identified for age and gender biases [51]. As such, these items do not feature on the final version of the SSUB.

Classical test theory and modern test theory
Latent traits such as paranormal beliefs are, by definition, unobservable. Therefore, research relies on the use of self-report scales, like those mentioned above, which assume that individuals’ responses to items are influenced by the latent trait of interest [87]. Classical test theory (CTT) and modern test theory (MTT; also referred to as item response theory) are the two primary methods used in psychological scale development. Both CTT and MTT models strive to measure and improve the reliability, validity, and internal consistency of the scale under assessment [88, 89] but do so in different ways. One of the key differences between these approaches is that CTT assumes that measurement precision is equal for all individuals, while MTT takes the view that measurement precision depends on individuals’ levels of the latent trait [90].
CTT models, focused at the test-score level, assume a linear model that links the observable test score (X) to the sum of two unobservable variables: true score (T) and error score (E) [91]. This assumption can be more clearly illustrated with the following formula: X = T + E. In this formula, the observed score (X) represents the observed total score calculated from the scale in use, and the error score represents a random, non-systematic error assumed to be independent of the true score (e.g., poorly functioning test items, or external confounding variables). The true score is often conceptualised as the mean of all scores obtained if an individual responded to the given scale an infinite number of times [92]. Therefore, the observed score of X can be considered to be a combination of both relevant information relating to the
latent variable of interest and the error associated with each item [93]. A factor-analytic strategy (often relying on the use of exploratory factor analysis for item selection) is among the most popular CTT methods for scale development, and has the primary aim of developing an internally consistent scale with a manageable number of differentiable dimensions [94].
CTT models offer certain advantages. For example, many CTT models are based on relatively weak assumptions, and are therefore easily met with real test data [91]. These models are also simple to use and allow for examination at the test-score level of the precision with which the latent trait of interest is measured by a given scale [95]. However, CTT’s standing popularity, despite the emergence of more modern approaches to scale development, could be attributed to the fact that many researchers are familiar with its basic concepts and are likely to have encountered CTT (or to have used scales that were developed through CTT methods) [93]. Therefore, it is important to also consider the limitations of CTT. The central limitation of CTT models is that person and item parameters are sample-dependent, which limits the utility of these statistics in scale development [89, 91]. CTT models also do not allow for rigorous assessment of item characteristics that can be computed under different models, and so scales developed using CTT methods may suffer from differential item functioning (as mentioned above) [93].
In contrast to CTT models, MTT models are nonlinear and focus at the item level, seeking to relate respondents’ performance on individual test items to their estimated level of the latent trait of interest [96]. These models are assumed to be invariant across populations, meaning the item and test parameters can be interpreted independent of specific samples. The type of MTT model used in scale development may differ depending on the type of data collected (dichotomous data such as yes/no responses, or polytomous data collected using Likert response methods), and on the number of dimensions they specify. In general, MTT models can be said to have three main goals: (1) to produce items that provide the most information about respondents’ levels of the latent trait of interest, (2) to present respondents with items tailored to their latent trait levels, and (3) to reduce the number of items needed to determine respondents’ level of the latent trait without loss of reliability [96]. The advantages of MTT models over CTT models are most notable at the item level. Item characteristics, differential functioning and fit to the model can be assessed, as well as individuals’ response styles and the functionality of response scales [97]. However, a limitation of MTT models is their use of sophisticated and in-depth statistical analyses which remain unfamiliar to many researchers and testing professionals [96]. The assumptions of MTT models are also more restrictive compared to those of CTT models (i.e., more difficult to meet with real test data), and sample size requirements are much larger for both items and respondents [97]. For unidimensional MTT models (such as the Rasch model), minimum sample sizes of approximately 200 respondents are required [98]. However, multidimensional MTT models require large sample sizes (≥ 1000 respondents) to identify precise item parameters and decrease error estimation [99].
CTT and MTT models both have their individual strengths relating to scale development and assessment. Therefore, complete and successful psychometric assessment may benefit from the use of both models, which would provide information about individual item functioning as well as how items function as a unit [97].

Present study
Paranormal belief scales suffer from various shortcomings, including sub-scales that are often heavily culture-specific or do not reflect mainstream beliefs commonly associated with the paranormal, a lack of negatively phrased items and the potential for differential item functioning. The present study sought to address these issues by creating a scale that included phenomena that are widely considered to be associated with the paranormal, had fewer culture-bound items, combined both positively and negatively phrased items, and did not contain evidence of differential item functioning. The first aim of this study was to construct a scale for measuring paranormal beliefs, examine the latent structure and refine the scale using both CTT and MTT models. The second aim was to assess the test–retest reliability of the new scale(s). Finally, the study aimed to compare the scales developed through CTT and MTT methods to determine the usefulness of each approach, and to determine which scale provides the most precise measure of belief in paranormal phenomena.

Method and materials
Participants
We recruited an opportunistic sample of the general public (N = 343) through advertisements placed on social media. These advertisements asked for participants over the age of 18 and fluent in English to complete several short questions about their beliefs in paranormal and superstitious phenomena, as well as a few short questions about themselves. Removal of incomplete responses resulted in the final sample (N = 231; 83 males, 144 females, 4 unreported; age 18–80, M = 36.94, SD = 14.60). Most participants were white (51.10% White British, 21.20% other White background, 6.90% White Irish) and held an undergraduate degree or higher
(71.00%). Of the participants with a university education, most had a background in psychology (21.60%).

Materials
An initial collection of 29 statements regarding paranormal and superstitious phenomena was generated using adapted items from the Revised Paranormal Belief Scale (RPBS) [52], the Australian Sheep-Goat Scale (ASGS) [50], and the Survey of Scientifically Unaccepted Beliefs (SSUB) [51], as well as four novel items developed by the authors. These novel items arose from discussion and examination of the RPBS, ASGS and SSUB to identify any phenomena absent from these measures, such as possessions and protection objects. Examples of the phenomena used include luck (lucky charms and bad luck), psi (sixth sense and psychics) and hauntings (Ouija boards and possession). The scale contained both positively (n = 23) and negatively phrased items (n = 6).

Procedure
The scale was administered as an online survey using Qualtrics Survey Software (Qualtrics, Provo, UT; see https://www.qualtrics.com). Participants were informed that the study was concerned with paranormal and superstitious belief within the general population. Respondents who agreed to take part were asked to provide their age, gender (male, female, other), ethnicity (Arabic, Asian/Asian British, Bangladeshi, Black/Black British, Chinese, Indian, Pakistani, White British, White Irish, other Asian background, other White background, mixed background), level of education (doctoral degree, postgraduate degree, undergraduate degree, post-secondary education, secondary education, vocational) and academic discipline if they had indicated a university education (architecture, arts and humanities, business, education, law, medicine, natural sciences, philosophy, psychology, social sciences, theology, technology, other medical, other). Respondents had the option not to provide the above demographic details. Participants then completed the paranormal scale. Responses were recorded using a 7-point Likert scale (Strongly Disagree, Moderately Disagree, Slightly Disagree, Uncertain, Slightly Agree, Moderately Agree, Strongly Agree). The seven response options were numerically coded from 1 to 7 for positively worded items, and reverse coded for the negatively worded items. Following completion of the scale, we asked participants if they would be willing to complete the scale again one week from the date of initial completion.
Informed consent was obtained from all participants and all methods were performed in accordance with relevant guidelines and regulations. Ethical approval for the study was granted by the University of Hertfordshire Health, Science, Engineering and Technology Ethics Committee with Delegated Authority (HSET ECDA).

Data analysis
Analyses will be conducted using two models: a classical test theory (CTT) model and a modern test theory (MTT) model. Therefore, the analysis will use both an exploratory factor analysis (EFA) and a rating scale model (Rasch model). The EFA will allow for the identification of underlying latent constructs underpinning the scale. In other words, EFA will be used to identify emerging subcategories (or factors) across the initial collection of 29 items. Factors emerging through EFA will be interpreted as distinct categories of paranormal belief. EFA will be conducted using a principal components extraction method, selecting only eigenvalues greater than 1, and a direct oblimin rotation. Items with factor loadings < 0.50 will be removed from the scale and the EFA run again until all items have acceptable factor loadings. EFA will also explore group differences and answering patterns to the scale items and factors to further assess the effectiveness of the remaining scale items.
Rasch analysis will be conducted to allow for a comparison between CTT and MTT methods of scale development. Owing to the polytomous nature of the data, a rating scale model (RSM) [100] will be adopted for the Rasch analysis. Analyses will first evaluate item thresholds and item characteristic curves (ICCs) for the initial collection of 29 items to assess the suitability of the 7-point Likert response format. Item fit to the model will then be assessed by examining both infit (weighted) and outfit (unweighted) mean square statistics (MNSQ). Items identified for overfitting (MNSQ < 0.07/t < -2) or underfitting/misfitting (MNSQ > 1.2/t > 2) will be removed from the scale [101]. The person-item map will then be consulted to assess item difficulty, and to determine whether the remaining items meaningfully measure the ability (level of belief) of all persons. Therefore, we will be using the person-item map to determine whether the final scale is suitable for measuring the range of paranormal belief (from low belief/scepticism to high belief). A CTT method of confirmatory factor analysis (CFA) will be used alongside the Rasch analysis to confirm the unidimensional model fit of the RSM. Finally, remaining items will be tested for DIF in relation to age, gender, ethnicity, education, or discipline.
A test–retest reliability analysis will be conducted for both the CTT and MTT scales.

Results: classical test theory
Factor structure of the scale
An exploratory factor analysis (EFA) was conducted to investigate the latent constructs underpinning the
scale. A principal components extraction method was employed and only eigenvalues greater than one were extracted. A direct oblimin rotation was used to account for the non-orthogonality of the items. Bartlett’s Test of Sphericity was significant (χ² = 4975.77, p < 0.001) and the Kaiser–Meyer–Olkin value equalled 0.95, indicating that the data were suitable for further analysis. A four-factor solution was extracted, accounting for 64.32% of the total variance. Cronbach’s Alpha was computed for each factor, with all four showing good internal consistency (α > 0.70). Examination of the pattern matrix revealed seven items with low item loadings (< 0.50), and so a second analysis was undertaken after excluding these items. The second analysis conducted on 22 items indicated a three-factor solution, accounting for 63.94% of the total variance. Inspection of the pattern matrix revealed a further two items with loadings < 0.50, leading to an analysis restricted to 20 of the scale items. The final analysis accounted for 65.67% of the total variance. All emergent factors demonstrated good levels of internal consistency and were conceptually distinct. Of the nine items that were removed during EFA, most were concerned with belief in psychics and those with supernatural abilities (e.g., “psychokinesis, the movement of objects through psychic powers, does exist”, “tarot cards are an accurate way to see a person’s past, present, and future”, “astrology is a way to accurately predict the future”, “mind reading is possible”).
The first factor, eigenvalue 10.07, accounted for 50.34% of the variance and demonstrated excellent internal reliability (α = 0.95). The 14 items contained within Factor 1 concerned phenomena such as spell casting, communicating with the dead, hauntings, possession, the soul, and premonitions. As this factor contained 70% of the total scale items and covered a variety of paranormal phenomena that could be considered supernatural, Factor 1 was subsequently labelled “Supernatural Beliefs”. The second factor had an eigenvalue of 1.87 and accounted for 9.34% of the variance. Factor 2 showed excellent internal reliability (α = 0.88). The factor comprised three items concerned with common superstitions centred around bad luck. Factor 2 was subsequently labelled “Bad Luck”. The final factor, eigenvalue 1.20, accounted for 5.99% of the variance, with low to moderate internal reliability (α = 0.53). Factor 3 comprised three items regarding telepathy, charms, and predicting the future, and was labelled “Psi”.

Response differences between believers and sceptics
We divided participants into groups of ‘believers’ and ‘sceptics’ according to their mean scores (with those scoring below the overall mean of 67.30 identified as ‘sceptics’ and those above as ‘believers’). The total sample comprised 117 (50.60%) sceptics and 114 (49.40%) believers.

Principal component analysis
To provide a visual overview of answering patterns for the two groups, a principal component analysis (PCA) was conducted using the ggfortify [102] package in R version 4.0.2 [103]. The PCA score plot (see Fig. 1) shows responses to all 20 items as a function of respondent group, and highlights the distinct clustering of believers and sceptics, with very little overlap between the two groups. To visually represent the responses to each item on the scale for believers and sceptics, a raincloud plot was created, and the results can be seen in Fig. 2.

Fig. 1 PCA score plot of all responses to the paranormal scale as a function of respondent group. Figure plots participants’ responses to the scale items against the two principal components that represent the largest variability among the two groups, to provide a visual indication of separation (or lack thereof) between the groups

Group answering patterns
Responses for believers and sceptics were tested for each item and factor. Table 1 displays the percentage agreement for each item and subsequent factor across both groups. Responses labelled “strongly disagree”, “moderately disagree” and “slightly disagree” were collapsed to give an overall “disagree” score for a given item or factor. The same was done for responses labelled “strongly agree”, “moderately agree” and “slightly agree” to provide an overall “agree” score. Participants’ percentage of “uncertain” responses are also shown here as a function of respondent group. Percentage agreement was also calculated for participants in the upper and lower quartiles to provide a more accurate reflection of item-based differences for the most sceptical participants and those with the strongest paranormal beliefs (see Table 2).
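The grouping and collapsing steps described in this section can be sketched in a few lines of Python. This is a minimal illustration only: the toy responses, the function names and the handling of ties are assumptions made for the example, not the study's data or analysis code (the reported analyses were run in R).

```python
from statistics import mean

# Hypothetical toy data (NOT the study's data): each row holds one
# respondent's answers to three items, coded 1 (Strongly Disagree)
# to 7 (Strongly Agree).
responses = [
    [7, 6, 5],
    [6, 6, 7],
    [2, 1, 4],
    [1, 3, 2],
    [5, 4, 6],
    [3, 2, 1],
]


def collapse(score):
    """Collapse a 1-7 rating into disagree (1-3), uncertain (4) or agree (5-7)."""
    if score <= 3:
        return "disagree"
    if score == 4:
        return "uncertain"
    return "agree"


def split_by_mean(rows):
    """Split respondents into believers (total above the overall mean)
    and sceptics (total below it). Ties at the mean are left out here;
    how such ties were treated is not stated, so this is an assumption."""
    totals = [sum(row) for row in rows]
    overall = mean(totals)
    believers = [row for row, t in zip(rows, totals) if t > overall]
    sceptics = [row for row, t in zip(rows, totals) if t < overall]
    return believers, sceptics


def percent_by_category(rows, item_index):
    """Percentage of respondents in each collapsed category for one item."""
    counts = {"disagree": 0, "uncertain": 0, "agree": 0}
    for row in rows:
        counts[collapse(row[item_index])] += 1
    return {label: 100 * n / len(rows) for label, n in counts.items()}


believers, sceptics = split_by_mean(responses)
print(percent_by_category(believers, 0))
print(percent_by_category(sceptics, 2))
```

Each printed dictionary corresponds to one group's three cells (Disagree %, Uncertain %, Agree %) for a single item row of a table such as Table 1.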
Table 1 Percentage agreement with factors and items as a function of respondent group
                  Disagree %             Uncertain %            Agree %
                  Believers  Sceptics    Believers  Sceptics    Believers  Sceptics
Factor 1 Total    13         75          19         12          67         13
  Item 1          4          60          16         13          80         27
  Item 3          8          55          18         22          74         23
  Item 5          31         90          25         5           45         5
  Item 7          14         71          22         20          64         9
  Item 8          11         87          16         5           73         8
  Item 9          13         82          9          13          78         5
  Item 10         3          64          11         15          87         21
  Item 12         8          72          28         17          64         11
  Item 13         22         91          11         5           67         4
  Item 15*        9          61          24         18          68         21
  Item 16         13         68          10         7           77         26
  Item 17*        13         80          31         11          56         9
  Item 18         20         86          25         6           54         8
  Item 20         18         91          26         7           56         3
Factor 2 Total    64         94          12         2           23         4
  Item 2          61         92          15         3           25         5
  Item 4          60         95          11         3           30         3
  Item 6          73         95          11         1           16         4
Factor 3 Total    30         69          20         6           49         25
  Item 11*        46         85          14         1           39         15
  Item 14*        23         57          24         10          54         32
  Item 19*        21         64          24         8           55         28
*Reverse-scored items. The table presents the percentage of believers and sceptics who indicated agreement, disagreement, or uncertainty for each item and each factor
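Reverse scoring of negatively worded items, as described in the Procedure, amounts to mirroring a response around the midpoint of the 1 to 7 scale. The sketch below is illustrative only: the item labels, the function names and the choice of which item is reversed are hypothetical, not taken from the study materials.

```python
def reverse_code(score, low=1, high=7):
    """Mirror a rating on a low..high scale (7 -> 1, 4 -> 4, 1 -> 7)."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the {low}-{high} range")
    return low + high - score


def total_score(answers, reversed_items):
    """Sum a respondent's ratings, reverse coding the listed items first."""
    return sum(
        reverse_code(score) if item in reversed_items else score
        for item, score in answers.items()
    )


# Hypothetical respondent: two positively worded items, one negatively worded.
answers = {"item_a": 7, "item_b": 2, "item_c": 1}
reversed_items = {"item_c"}

total = total_score(answers, reversed_items)
print(total)
```

After reverse coding, a higher number means stronger belief on every item, so totals and factor means can be compared directly across positively and negatively phrased items.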
Table 2 Percentage agreement with factors and items for upper and lower quartiles
                  Disagree %                      Uncertain %                     Agree %
                  Upper quartile  Lower quartile  Upper quartile  Lower quartile  Upper quartile  Lower quartile
Factor 1 Total    6               91              11              4               83              4
  Item 1          3               82              3               5               93              13
  Item 3          2               84              7               8               91              8
  Item 5          12              98              19              0               69              2
  Item 7          7               85              16              11              78              3
  Item 8          2               98              3               2               95              0
  Item 9          3               98              0               2               97              0
  Item 10         0               92              3               3               97              5
  Item 12         2               94              21              6               78              0
  Item 13         12              98              12              0               76              2
  Item 15*        3               76              19              13              78              11
  Item 16         7               82              3               5               90              13
  Item 17*        9               94              16              2               76              5
  Item 18         10              97              16              2               74              2
  Item 20         5               100             17              0               78              0
Factor 2 Total    55              99              15              0               30              1
  Item 2          55              98              17              0               28              2
  Item 4          45              100             10              0               45              0
  Item 6          64              100             17              0               19              0
Factor 3 Total    19              74              23              5               58              22
  Item 11*        28              82              17              2               55              16
  Item 14*        12              69              24              8               64              23
  Item 19*        17              69              28              5               55              26
*Reverse-scored items. The table presents the percentage of participants in the upper and lower quartiles who indicated agreement, disagreement, or uncertainty for each item and each factor
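Selecting the upper and lower quartiles of respondents by total score, as used for Table 2, could be done along the following lines. This is a sketch under stated assumptions: the totals are made up, and both the quartile method (Python's default "exclusive" method) and the inclusive treatment of scores falling exactly on a cut point are choices made for the example, since the source does not specify them.

```python
from statistics import quantiles


def quartile_groups(totals):
    """Return (lower, upper): scores at or below the first quartile and
    scores at or above the third quartile. Inclusive boundaries are an
    assumption of this sketch."""
    q1, _median, q3 = quantiles(totals, n=4)
    lower = [t for t in totals if t <= q1]
    upper = [t for t in totals if t >= q3]
    return lower, upper


# Hypothetical total scale scores for eight respondents.
totals = [10, 20, 30, 40, 50, 60, 70, 80]
lower, upper = quartile_groups(totals)
print(lower, upper)
```

The two resulting groups can then be passed through the same collapse-and-count step used for the believer and sceptic groups to obtain the percentages in each response category.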
Table 3 Mean score (standard errors), χ², p values, and Cramer’s V for likelihood ratio tests for groups within each factor
Factor   Mean (SE)                   χ²        p        Cramer’s V
         Believers    Sceptics
1        5.07 (.10)   2.19 (.11)     1330.63   < .001   .45
2        2.82 (.12)   1.41 (.07)     93.24     < .001   .26
3        4.32 (.11)   2.64 (.14)     105.83    < .001   .28

(χ² = 2565.14, p < 0.001) and the Kaiser–Meyer–Olkin value equalled 0.95, indicating that the data were suitable for further analysis. A one-factor solution was extracted, accounting for 62.93% of the total variance. Cronbach’s Alpha was computed for this factor, which retained the excellent internal consistency found in the earlier analysis (α = 0.95). Table 4 presents the final 14 items contained within the single factor alongside the component loadings seen in the (non-rotated) component matrix.

Demographic differences
Owing to the somewhat mixed research suggesting a correlation between paranormal beliefs, academic discipline and aspects of thinking, responses to the paranormal scale were compared for those with and without higher education backgrounds; and between those from science and non-science academic disciplines. Most participants held an undergraduate degree or higher (n = 164), while less than half held post-secondary qualifications or lower (n = 67). Participants with university degrees had lower total paranormal scores (M = 46.20, SD = 22.89) than participants without university degrees (M = 61.34, SD = 22.08). The difference in scores between the two education groups was significant [t(126.78) = −4.68, p < 0.001]. Of the participants with degree qualifications, most were from science-based disciplines including psychology, natural sciences, technology, and other medical backgrounds (n = 83), while the rest included social sciences, education, business, philosophy, theology, art and humanities, law, and architecture (n = 57). As 24 participants did not disclose their discipline, the following
Table 4 Single-factor scale with corresponding Cronbach’s Alpha (α) score and component loadings
Factor α Items (loading scores)
1 Supernatural Beliefs .95 1 The soul continues to exist after a person has died (.76)
2 Your mind or soul can leave your body (.77)
3 It is possible to cast spells on persons using formulas and incantations (.80)
4 It is possible to be reincarnated (.74)
5 Some people with psychic abilities can accurately see the future (.86)
6 It is possible to communicate with the dead (.86)
7 Buildings can be haunted by spirits or other supernatural entities (.87)
8 Some psychics have helped find the bodies of murder victims through paranormal means (.85)
9 A person’s star sign can have a direct influence on their personality (.76)
10* Reports of an apparent sixth sense are generally based on fantasies (.72)
11 Having a dream that comes true is not just a coincidence (.71)
12* Communicating with spirits or other supernatural entities through a Ouija board is not possible (.75)
13 It is possible to become possessed by an evil supernatural entity (.81)
14 It is possible to protect one’s home from spirits using protection objects and herbs (.83)
*Reverse scored items
analyses were conducted on 140 participants. Those from science-based disciplines demonstrated lower paranormal scores (M = 40.02, SD = 21.28) compared to those with art-based degrees (M = 54.77, SD = 22.24), and the difference in scores between the two discipline groups was significant [t(116.99) = 3.92, p < 0.001].
Test–retest reliability
Sample and procedure
A follow-up study was conducted to assess the test–retest reliability of the newly developed scale. Of the original sample of 231 participants, 37 (16% of the original sample) agreed to complete the scale a second time, one week after their initial participation. The retest sample consisted of 21 males (56.80%) and 16 females (43.20%), aged between 18 and 73 (M = 41.51, SD = 16.61). In contrast to the original sample, this sample had a higher percentage of male participants and a higher mean age. The difference in gender between the original participant group and the retest group was significant (χ2 = 5.433, p = 0.020). However, the difference in age between the two groups was not significant [t(262) = −1.77, p = 0.078].
Nineteen respondents were identified as ‘sceptics’ (51.35%) and 18 as ‘believers’ (48.65%), according to their mean scores on the 14-item scale at time one (with those scoring below the overall mean of 50.59 identified as ‘sceptics’ and those above as ‘believers’). The questionnaire completed by participants comprised the original collection of 29 statements and used the same 7-point Likert response format (Strongly Disagree, Moderately Disagree, Slightly Disagree, Uncertain, Slightly Agree, Moderately Agree, Strongly Agree). Responses were numerically coded as before. The scale was administered again as an online survey using Qualtrics Survey Software (Qualtrics, Provo, UT; see https://www.qualtrics.com).
Retest analysis
Retest analyses were conducted on the final 14-item scale. Pearson’s correlations revealed a strong test–retest reliability for the scale [r(35) = 0.98, p < 0.001], as well as for both believers [r(15) = 0.88, p < 0.001] and sceptics [r(18) = 0.90, p < 0.001]. A scatterplot of the scores for believers and sceptics at time one and time two can be found in Fig. 3.
Fig. 3 Test–retest reliability analysis as a function of respondent group. Pearson’s correlations between participants’ individual total scores at time one and time two shown for each group
Results: modern test theory
The MTT analyses presented in the following sections were conducted with a Rasch rating scale model (RSM) using the eRm package [104, 105] in R version 4.0.2 [103].
Response categories
MTT analyses first focused on evaluating the effectiveness of the 7-point Likert rating scale. As it is difficult to be certain of the exact way the sample will use the rating scale, investigation is necessary to verify or improve the functioning of the rating scale categories [106]. To evaluate the response category use of the sample, threshold parameters of each category were examined for each of the original 29 items. These thresholds identify and define the boundaries between each response category and should therefore increase monotonically. Consequently, participants with higher levels of paranormal beliefs should be more likely to endorse higher response categories. For the Rasch analyses, responses are shifted such that the lowest category (strongly disagree) is 0.
Analysis of the 7-point rating scale revealed that threshold parameters failed to increase monotonically, therefore indicating evidence of step disordering. Step disordering, occurring when threshold parameters fail to increase monotonically, indicates that certain response categories have a low probability of being observed [106], meaning that the sample is less likely to use these response categories. The lack of ordered increase occurred at Category 2 (somewhat disagree). Examination of the item characteristic curves (ICCs) indicated that Category 2 had the lowest probability of observance and was therefore never more likely to be observed than any other category. Put more simply, regardless of an individual’s level of belief in paranormal phenomena, the probability of choosing “somewhat disagree” is never the most likely. Similarly, Category 1 (moderately disagree) also had a low probability of observance and at no point was this category most likely to be observed.
To begin to improve the functioning of response categories, responses were recoded such that the “moderately disagree” and “somewhat disagree” categories were collapsed, as were the “moderately agree” and “somewhat agree” categories. This gave a revised 5-point scoring method (0 = strongly disagree, 1 = disagree, 2 = uncertain, 3 = agree, 4 = strongly agree). However, this revised scoring method failed to rectify step disordering. Examination of the ICCs revealed that the boundaries between Categories 1 and 2 (disagree and uncertain) were very narrow and suggested that the sample did not clearly differentiate between these two categories. Therefore, a final recoding took place such that the “disagree” and “uncertain” categories were collapsed, giving a final revised 4-point scoring method (0 = strongly disagree, 1 = disagree, 2 = agree, 3 = strongly agree).
Fig. 4 Item characteristic curve for item 1 using the 4-point scoring method. Curves represent the probability of selecting a category along the latent trait. Category 0 = “strongly disagree”, Category 1 = “disagree”, Category 2 = “agree”, Category 3 = “strongly agree”
Item fit
Mean square statistics (MNSQ) were computed to determine item fit to the model (i.e., how well each item contributes to defining a single unidimensional construct). The MNSQ statistics indicate the amount of distortion of the scale, where high MNSQ values indicate unpredictability and a lack of construct similarity with other scale items (underfitting), and low values indicate item redundancy and less variation in the observed data compared to the variation that was modelled (overfitting) [107]. Two MNSQ statistics were used to assess item fit: infit (weighted) and outfit (unweighted) statistics. Subsequent analyses used an accepted range of fit of 0.7 to 1.2 [101] to identify items with poor model fit. Therefore, items with MNSQ values < 0.7 were identified as overfitting the model, and items with MNSQ values > 1.2 were identified as underfitting the model. When assessing item fit to the model, infit and outfit t-statistics were also examined, where t-values < −2 were identified as overfitting and t-values > 2 were identified as underfitting. However, it has been suggested that infit and outfit MNSQ values are relatively insensitive to sample size variation in polytomous data, while the t-statistics vary considerably with sample size. Therefore, it has been recommended that infit and outfit t-statistics are interpreted with caution when determining item fit to the model for large samples and polytomous data [101]. As such, items would be removed from the scale if they demonstrated both infit and outfit MNSQ values that were overfitting or underfitting the model. In cases where items were only identified on one of the MNSQ values (infit or outfit), t-statistics were consulted to verify item misfit. Based on the MNSQ values of the 29 items, a total of 7 items (4, 10, 12, 13, 15, 28 and 29) were identified for overfitting and a further 8 items (1, 2, 5, 8, 14, 17, 23 and 27) were identified for underfitting. Subsequently, these 15 items were removed from the scale and the analysis was conducted again on the remaining 14 items. A final item (7) was identified for overfitting the model and was removed from the scale. Analysis of the final 13 items revealed infit and outfit statistics within the specified ranges. While item 11 produced an infit t-statistic of −2.2, the infit and outfit MNSQ values were within the specified range (0.81 and 0.83, respectively) as was the outfit t-statistic (−1.76). Considering these other statistics and given that the infit t-statistic of item 11 was very close to −2, it was determined that the item demonstrated reasonable fit to the model and that there was not sufficient evidence to remove the item from the final scale. Table 5 shows the final MNSQ statistics for the remaining items, along with the corresponding item difficulty statistics.
Owing to the substantial change in the number of scale items, thresholds for the 4-point response scale were consulted to verify the functioning of the new rating scale for the remaining 13 items. The analysis demonstrated that the thresholds of the four categories increased monotonically for all remaining items. An example of the ICC for item 3 is shown in Fig. 5, which again shows the desired range of peaks.
Item difficulty
The final RSM analysis, conducted using the eRm package [104, 105], sought to estimate the person trait and item difficulty parameters. In other words, the following analysis aimed to determine whether the difficulty of the remaining items was appropriate for the sample.
Table 5 Parameter values for the remaining 13 items (in order of item difficulty)
Item Outfit MSQ Infit MSQ Difficulty
6 If you break a mirror, you will have bad luck 1.002 1.112 2.547
18 Fairies and similar beings are real 1.001 0.866 2.432
19* Fortune tellers’ predictions are typically based on guesswork 1.046 0.823 2.026
16 A person’s star sign can have a direct influence on their personality 0.971 0.993 1.663
21 Some health conditions can be treated with psychic healing 0.986 0.937 1.587
26 It is possible to become possessed by an evil supernatural entity 1.018 0.991 1.540
25* Communicating with spirits or other supernatural entities through a Ouija board is not possible 1.199 1.170 1.475
11 Mind reading is possible 0.832 0.809 1.401
9 It is possible to be reincarnated 1.049 0.974 1.318
22 In some cultures, shamans or “witch doctors” exercise powers we cannot explain 0.892 0.862 1.199
20* Reports of an apparent sixth sense are generally based on fantasies 0.829 0.833 0.829
3 Your mind or soul can leave your body 1.074 1.058 0.793
24 Having a dream that comes true is not just a coincidence 0.832 0.837 0.766
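The item fit rule described in the text can be illustrated with the values in Table 5. The following Python is a minimal sketch, not the eRm routine used in the study: the function names are hypothetical, and treating any item with both statistics outside the 0.7 to 1.2 range as a removal candidate is one reading of the stated rule.

```python
# Outfit and infit mean squares for the 13 retained items, copied from Table 5.
TABLE_5_MNSQ = {
    6: (1.002, 1.112), 18: (1.001, 0.866), 19: (1.046, 0.823),
    16: (0.971, 0.993), 21: (0.986, 0.937), 26: (1.018, 0.991),
    25: (1.199, 1.170), 11: (0.832, 0.809), 9: (1.049, 0.974),
    22: (0.892, 0.862), 20: (0.829, 0.833), 3: (1.074, 1.058),
    24: (0.832, 0.837),
}

LOWER, UPPER = 0.7, 1.2  # accepted MNSQ range stated in the text

def classify_mnsq(value, lower=LOWER, upper=UPPER):
    """Label a mean square as overfitting, underfitting, or within range."""
    if value < lower:
        return "overfit"
    if value > upper:
        return "underfit"
    return "fit"

def item_decision(outfit, infit):
    """Keep if both statistics fit, remove if neither does,
    otherwise defer to the t-statistics (assumed reading of the rule)."""
    flags = {classify_mnsq(outfit), classify_mnsq(infit)}
    if flags == {"fit"}:
        return "keep"
    if "fit" in flags:
        return "check t-statistics"
    return "remove"

decisions = {item: item_decision(outfit, infit)
             for item, (outfit, infit) in TABLE_5_MNSQ.items()}
```

Applied to Table 5, every item falls inside the accepted range, with item 25 (outfit 1.199) sitting closest to the upper bound.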
Fig. 5 Item characteristic curve for item 3 in the reduced scale using the 4-point scoring method. Curves represent the probability of selecting a
category along the latent trait. Category 0 = “strongly disagree”, Category 1 = “disagree”, Category 2 = “agree”, Category 3 = “strongly agree”
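To make the link between ordered thresholds and the "range of peaks" in the category curves concrete, the sketch below recodes 7-point responses to the final 4-point scoring and evaluates rating scale model category probabilities along the latent trait. It is an illustration only: the threshold values are hypothetical and are not estimates from this study, and the helper names are not part of eRm.

```python
import math

# Collapse the shifted 7-point codes (0 = strongly disagree ... 6 = strongly agree)
# into the final 4-point scoring: "uncertain" (3) joins the "disagree" category.
RECODE_7_TO_4 = {0: 0, 1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}

def recode(responses):
    """Apply the 7-point to 4-point collapse to a list of responses."""
    return [RECODE_7_TO_4[r] for r in responses]

def rsm_category_probs(theta, delta, taus):
    """Rating scale model probabilities for categories 0..len(taus).

    theta: person trait, delta: item difficulty, taus: shared category thresholds.
    """
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]

def thresholds_ordered(taus):
    """True when thresholds increase monotonically (no step disordering)."""
    return all(a < b for a, b in zip(taus, taus[1:]))

def modal_categories(delta, taus, lo=-6.0, hi=6.0, steps=241):
    """Categories that are the most probable response somewhere on a trait grid."""
    modal = set()
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        probs = rsm_category_probs(theta, delta, taus)
        modal.add(probs.index(max(probs)))
    return modal

ordered_taus = [-1.5, 0.0, 1.5]     # hypothetical, monotonically increasing
disordered_taus = [0.5, -0.5, 1.0]  # hypothetical, first two thresholds reversed
```

With the ordered thresholds every category is the most likely response over some stretch of the trait, which corresponds to each curve having its own peak. With the disordered thresholds, category 1 is never the most likely response, the pattern the text describes for the original 7-point format.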
To meaningfully measure the ability (level of paranormal belief) of all persons, items should be located along the length of the latent dimension. The person-item map shown in Fig. 6 displays both the person traits (in the upper panel) and item difficulties (lower panel) along the same latent dimension. As shown, the category thresholds of most of the 13 items cover a low-to-high range of paranormal belief well. However, item difficulty locations (identified in Fig. 6 as solid circles) cluster towards the right side of the latent dimension. Therefore, the items have a higher probability of differentiating between individuals with higher levels of paranormal beliefs. For example, item 6 (“if you break a mirror, you will have bad luck”) shows the highest item difficulty, meaning that participants with higher levels of paranormal beliefs are more likely to agree with this item.
Fig. 6 Person-item map for the 13-item scale. Figure displays the location of person traits and item difficulties along the same latent dimension (paranormal belief). The person traits are located on the scale from left (low belief) to right (high belief). Locations of item difficulties are presented as solid circles, and thresholds of adjacent category locations are presented as open circles. The item parameters are located on the scale from least difficult (left) to most difficult (right)
Differential item functioning
Differential item functioning (DIF) analysis was conducted using rating scale trees within the psychotree package [108, 109] in R version 4.0.2 [103]. Before this analysis was conducted, data for 8 participants who chose not to disclose demographic information were removed. Data were also removed for participants scoring only in either the highest or lowest categories (i.e., participants responding “strongly disagree” to all 13 items, or “strongly agree” to all items) as these responses do not provide information relating to item difficulty and therefore do not contribute to the Rasch model. Consequently, data for 14 participants (all of whom scored in the lowest categories) were removed. In total, 22 participants were removed and the DIF analysis was conducted on a reduced sample of 209 participants. If none of the scale items show evidence of DIF, then the analysis should produce a tree with only a single node, supporting a unidimensional Rasch model for the data [110]. However, if the Rasch tree shows at least one split and identifies more than a single node containing the entire sample, then DIF is present. An advantage of using the Rasch tree method for identifying DIF is that DIF can be detected between groups of participants created by more than one covariate (e.g., females under 34), and these groups do not need to be pre-specified prior to analysis. As such, the Rasch tree method searches for the value corresponding to the strongest parameter change and splits the sample at the value identified [110]. The DIF analysis was conducted for five covariates: age, gender, ethnicity, education, and discipline. Analysis produced a tree with a single node, and therefore no DIF was present in the scale for any of the covariates. The single-node tree can be seen in Fig. 7.
Fig. 7 Single-node Rasch tree. Note: Figure shows differential item functioning analysis conducted on the covariates of age, gender (male or female), ethnicity (white background or BME background), education, and discipline. No differential item functioning was identified. Item number is represented on the x axis in both plots. Item difficulty is represented on the y axis of the top plot (higher values represent higher item difficulty), and item threshold parameters are shown on the y axis of the lower plot (with the lightest shade representing the ‘strongly agree’ response category)
Confirmatory factor analysis
As a final test of the unidimensionality of the scale, a confirmatory factor analysis (CFA) was conducted using the lavaan package [111] in R version 4.0.2 [103]. To determine the strength of model fit, four main fit indices were used: comparative fit index (CFI), Tucker-Lewis index (TLI), root mean square error of approximation (RMSEA), and standardised root mean square residual (SRMR). For both the CFI and TLI, a value of 0.90 or above would indicate acceptable fit and a value of 0.95 or above would indicate very good model fit. For the RMSEA, a value of 0.05 or below would indicate close model fit, with a value of 0.08 indicating acceptable fit. The accompanying p value for the RMSEA statistic should also be greater than the standardised value of 0.05 for close model fit. Finally, an SRMR value of 0.05 or below would indicate a well-fitting model. Overall, the model demonstrated good fit, and supported the use of a unidimensional Rasch model for the data. Complete fit statistics can be seen in Table 5.
Rasch test–retest reliability
The sample for the test–retest reliability analysis was the same as that described in the EFA analysis. While participants were divided into believers and sceptics based on their mean scores for the 13-item Rasch scale at time one (with those scoring below the overall mean of 26.94 identified as ‘sceptics’ and those above as ‘believers’), the analysis retained the original split seen in the EFA analysis of 19 sceptics and 18 believers. Pearson’s correlations revealed a strong test–retest reliability for the scale [r(35) = 0.92, p < 0.001], and for believers [r(16) = 0.75, p < 0.001]. However, the retest correlation was not significant for sceptics [r(17) = 0.45, p = 0.051]. A scatterplot of the scores for believers and sceptics at time one and time two can be found in Fig. 8. Cronbach’s Alpha computed for this final scale indicated an excellent internal reliability (α = 0.91).
Correlations between scales
To compare the performance of the CTT and MTT derived scales, a final correlational analysis was conducted comparing respondents’ total scores on each scale. The analysis only included respondents who were identified as ‘sceptics’ or ‘believers’ by both scales. Therefore, 17 respondents were removed from the analysis owing to the scales placing them in different groups, and the final analysis was conducted on a reduced sample of 214. Of the reduced sample, 102 respondents were identified as ‘sceptics’ (47.66%) and 112 as ‘believers’ (52.34%). Pearson’s correlations revealed a strong correlation between the scales for the total sample [r(212) = 0.96, p < 0.001], as well as for both believers [r(110) = 0.86, p < 0.001] and sceptics [r(100) = 0.82, p < 0.001]. A scatterplot of the scores for believers and sceptics at time one and time two can be found in Fig. 9.
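The correlations above are reported as r(df) with a p value. As a hedged illustration of where such a p value comes from, the sketch below converts a Pearson correlation to a t statistic and integrates the t distribution numerically. It uses only the Python standard library rather than the software used in the study, and it starts from the rounded r values in the text, so its output will only approximate the reported figures.

```python
import math

def t_from_r(r, df):
    """t statistic for a Pearson correlation r with df = n - 2 degrees of freedom."""
    return r * math.sqrt(df / (1.0 - r * r))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1.0 + x * x / df) ** (-(df + 1) / 2.0)

def two_sided_p(t, df, intervals=2000):
    """Two-sided p value via Simpson's rule on the density over [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    step = upper / intervals  # intervals must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_pdf(i * step, df)
    central_area = total * step / 3.0
    return max(0.0, 1.0 - 2.0 * central_area)

# Retest correlations for the Rasch scale, taken from the text.
t_sceptics = t_from_r(0.45, 17)
p_sceptics = two_sided_p(t_sceptics, 17)
t_believers = t_from_r(0.75, 16)
p_believers = two_sided_p(t_believers, 16)
```

For the sceptic group the rounded correlation gives a t statistic of about 2.08 on 17 degrees of freedom and a p value slightly above 0.05, in line with the non-significant result reported. For believers the p value falls below 0.001.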
between the two scales. While the CTT and MTT scales both demonstrated good retest correlations for believers (0.88 and 0.75 respectively, ps < 0.001), the retest correlation for sceptics was not significant in the MTT scale [r(17) = 0.45, p = 0.051] compared to the CTT scale [r(17) = 0.90, p < 0.001]. The difference in these scores can be explained using the person-item map produced during MTT analyses, which suggested that the items within the MTT scale have a lower probability of differentiating between individuals with lower levels of paranormal beliefs. Similar differences were not able to be established through CTT analyses. To the authors’ knowledge this is the first presentation of separate retest scores for believers and sceptics. Comparison of the performance of both scales revealed strong correlations between respondents’ total scores on the CTT and MTT derived scales in the total sample (r = 0.96), and for believers (r = 0.86) and sceptics (r = 0.82) separately. A final similarity between the two scales can be seen in their item content, as both scales shared 7 common items (approximately half of the total scale content).
Despite the strengths of the CTT scale, and its similarities to the MTT scale, the results of the study provide strong evidence to support preference of the MTT derived scale. First, MTT analyses allowed for investigation and refinement of the 7-point Likert scale. The results indicated that respondents did not require so many response options, and supported removal of three categories leading to a final 4-point scale (1 = strongly disagree, 2 = disagree, 3 = agree, 4 = strongly agree). Categories 1 and 2 of the original Likert scale (moderately disagree and somewhat disagree) both had low probabilities of observance and were subsequently collapsed into a single category (as were the moderately agree and somewhat agree categories). The “uncertain” category was also found to be inadequate in representing participants’ responses, with results suggesting that this category may be poorly defined, with respondents not clearly differentiating between this category and the “disagree” category. A 7-point Likert scale was initially selected for the scale as it was thought that the large number of response options would produce a more precise index of respondents’ level of agreement. However, these findings suggest that the response options provided in the original 7-point scale did not represent differentiable levels of belief intensity (which would be indicated by a monotonic increase of category thresholds). Additionally, MTT analyses permitted an assessment of differential item functioning (DIF). Using the Rasch tree method for identifying DIF within the MTT scale, analysis focused on five covariates (age, gender, ethnicity, education, and discipline) to determine whether these, or some combination of these, influenced participants’ responses to the scale. Examination of the tree revealed a single node, with no DIF identified for any of the covariates. Therefore, while the MTT scale can be described as a valid measure of belief in paranormal phenomena, it is difficult to be certain that the CTT derived scale does not suffer from DIF. As mentioned above, MTT analyses also allowed for examination of item difficulty, with results indicating that items had a higher probability of differentiating between respondents with moderate-high levels of paranormal beliefs. This information is particularly useful for future research looking to utilise the scale to examine group differences within paranormal beliefs. The following comparisons focus on the final PSBS developed through MTT analyses.
Several important differences can be noted when comparing the PSBS to the three most frequently employed measures of paranormal belief. The unidimensional structure of the PSBS is far simpler than the 7-factor RPBS, with the content of many RPBS factors (such as those within Witchcraft, Spiritualism and Precognition) appearing in the PSBS. The appropriateness of this solution accords with previous research suggesting that a larger array of factors may not provide the most prudent account of paranormal belief [117], particularly as the RPBS has an insufficient number of items to adequately sample seven distinct dimensions of paranormal belief. Such criticisms may explain why a range of studies have failed to replicate the original factor structure of the RPBS, finding smaller factor structures ranging between one and six to be more suitable [117]. Despite this, most of these replication studies have suggested paranormal belief to be a multidimensional construct, which contradicts the findings from the present work. While the structure of the PSBS is more comparable to that of the ASGS (but still differs in terms of dimensionality of belief), the range of items contained within the PSBS is much broader as its focus is not confined to parapsychological phenomena such as extrasensory perception and psychokinesis, though it does include several psi-related items.
The item content of the PSBS also differs considerably from the existing scales in that the final scale presents three negatively phrased items, and contains few cryptozoological, religious, or culturally-specific items. By reducing the number of potentially problematic items and ensuring a blend of positive and negative items, the PSBS reduces the risk of biases introduced by participant response patterns and cultural differences, which have been highlighted as issues for older measures. While cultural differences are often present in paranormal beliefs [118], and consequently some PSBS items have seen cultural influence, the PSBS has a reduced number of culture-bound items compared to previous scales such as the RPBS. Therefore, the PSBS may be a stronger candidate for a universal measure of paranormal belief. A further strength of the PSBS, seen particularly when compared to the RPBS, is that the scale is not affected by certain subgroup characteristics, including respondents’ age, gender, ethnicity, level of education, or academic discipline. DIF analysis indicated that the PSBS is a reliable unidimensional scale that can be used to explain data from all respondents. The results seen for the DIF analysis are worth comparing to the RPBS, which contains items that are particularly sensitive to age and gen-
alternative to the existing measures of paranormal beliefs currently in use.
Abbreviations
RPBS: Revised Paranormal Belief Scale; ELF: extraordinary lifeforms; TRB: traditional religious beliefs; ASGS: Australian Sheep-Goat Scale; SSUB: survey of scientifically unaccepted beliefs; DIF: differential item functioning; CTT: classical test theory; MTT: modern test theory; EFA: exploratory factor analysis; RSM: rating scale model; ICC: item characteristic curve; MNSQ: mean square; CFA: confirmatory factor analysis; PCA: principal components analysis; PSBS: Paranormal and Supernatural Beliefs Scale.
4. Gianotti LR, Mohr C, Pizzagalli D, Lehmann D, Brugger P. Asso- 33. Dagnall NA, Drinkwater K, Parker A, Clough P. Paranormal experience,
ciative processing and paranormal belief. Psychiatry Clin Neurosci. belief in the paranormal and anomalous beliefs. Paranthropol J Anthro-
2001;55(6):595–603. pol Approach Paranormal. 2016;7(1):4–15.
5. Wolfradt U. Dissociative experiences, trait anxiety and paranormal 34. Irwin HJ, Dagnall N, Drinkwater K. Paranormal belief and biases in
beliefs. Person Individ Differ. 1997;23(1):15–9. reasoning underlying the formation of delusions. Aust J Parapsychol.
6. Thalbourne MA, Delin PS. A common thread underlying belief in the 2012;12(1):7–21.
paranormal, creative personality, mystical experience and psychopa- 35. Irwin HJ, Dagnall N, Drinkwater K. Parapsychological experience
thology. J Parapsychol. 1994;58(1):3–8. as anomalous experience plus paranormal attribution: a question-
7. Diamond MJ, Taft R. The role played by ego permissiveness and naire based on a new approach to measurement. J Parapsychol.
imagery in hypnotic responsivity. Int J Clin Exp Hypn. 1975;23(2):130–8. 2013;77(1):39–53.
8. Andrews RA, Tyson P. The superstitious scholar. J Appl Res Higher Educ. 36. Jinks AL. Paranormal and alternative health beliefs as quasi-beliefs:
2019;11(3):415–27. implications for item content in paranormal belief questionnaires. Aust
9. Lindeman M, Svedholm-Häkkinen AM. Does poor understanding of J Parapsychol. 2012;12(2):127–58.
the physical world predict religious and paranormal beliefs? Appl Cogn 37. Storm L, Drinkwater K, Jinks AL. A question of belief: an analysis
Psychol. 2016;30(5):736–42. of item content in paranormal belief questionnaires. J Sci Explor.
10. Irwin HJ. Thinking style and the making of a paranormal disbelief. J Soc 2017;31(2):187–230.
Psych Res. 2015;79(920):129–39. 38. Houran J, Lange R. Reflections on paranormal beliefs as informed
11. Wain O, Spinella M. Executive functions in morality, religion and para- versus pseudo beliefs: comment on Jinks (2012). Aust J Parapsychol.
normal beliefs. Int J Neurosci. 2007;117(1):135–46. 2012;12(2):159–67.
12. Gimmer MR, White KD. Nonconventional beliefs among Australian sci- 39. Broad CD. The relevance of psychical research to philosophy. J R Inst
ence and nonscience students. J Psychol. 1992;126(5):521–8. Philos. 1949;24(91):291–309.
13. Otis LP, Alcock JE. Factors affecting extraordinary belief. J Soc Psychol. 40. Drinkwater K, Dagnall N, Denovan A, Parker A. The moderating effect
1982;118(1):77–85. of mental toughness: perception of risk and belief in the paranormal.
14. Salter CA, Routledge LM. Supernatural beliefs among graduate stu- Psychol Rep. 2019;122(1):268–87.
dents at the University of Pennsylvania. Nature. 1971;232(5308):278–9. 41. Rogers P, Hattersley M, French CC. Gender role orientation, thinking
15. Vitulli WF, Tipton SM, Rowe JL. Beliefs in the paranormal: age and sex style preference and facets of adult paranormality: a mediation analysis.
differences among elderly persons and undergraduate students. Psychol Rep. 1999;85(3):847–55.
16. Irwin HJ. Age and sex differences in paranormal beliefs: a response to Vitulli, Tipton, and Rowe (1999). Psychol Rep. 2000;86(2):595–6.
17. Vitulli WF. Rejoinder to Irwin’s (2000) “Age and sex differences in paranormal beliefs: a response to Vitulli, Tipton, and Rowe (1999).” Psychol Rep. 2000;87(2):699–700.
18. Lange R, Irwin HJ, Houran J. Objective measurement of paranormal belief: a rebuttal to Vitulli. Psychol Rep. 2001;88(3):641–4.
19. Betsch T, Jäckel P, Hammes M, Brinkmann BJ. On the adaptive value of paranormal beliefs-a qualitative study. Integr Psychol Behav Sci. 2021.
20. Boden MT, Berenbaum H. The potentially adaptive features of peculiar beliefs. Person Individ Differ. 2004;37(4):707–19.
21. Boden MT. Supernatural beliefs: considered adaptive and associated with psychological benefits. Person Individ Differ. 2015;86:227–31.
22. Rogers P, Qualter P, Phelps G, Gardner K. Belief in the paranormal, coping and emotional intelligence. Person Individ Differ. 2006;41(6):1089–105.
23. Irwin HJ. Origins and functions of paranormal belief: the role of childhood trauma and interpersonal control. J Am Soc Psych Res. 1992;86(3):199–208.
24. Berkowski M, MacDonald DA. Childhood trauma and the development of paranormal beliefs. J Nervous Ment Dis. 2014;202(4):305–12.
25. Houran J, Lange R. Redefining delusion based on studies of subjective paranormal ideation. Psychol Rep. 2004;94(2):501–13.
26. Lange R, Houran J. The role of fear in delusions of the paranormal. J Nervous Ment Dis. 1999;187(3):159–66.
27. Berger AS. Quoth the raven: bereavement and the paranormal. OMEGA J Death Dying. 1995;31(1):1–10.
28. Parker JS. Extraordinary experiences of the bereaved and adaptive outcomes of grief. OMEGA J Death Dying. 2005;51(4):257–83.
29. Cooper CE, Roe CA, Mitchell G. Anomalous experiences and the bereavement process. In: Death, dying, and mysticism 2015. Palgrave Macmillan, New York, pp 117–131.
30. Steffen EM, Wilde D, Cooper C. Affirming the positive in anomalous experiences: a challenge to dominant accounts of reality, life and death. In: Brown NJ, Lomas T, Eiroá-Orosa FJ, editors. The Routledge international handbook of critical positive psychology. Routledge: London; 2018. p. 227–44.
31. Glicksohn J. Belief in the paranormal and subjective paranormal experience. Person Individ Differ. 1990;11(7):675–83.
32. Rattet SL, Bursik K. Investigating the personality correlates of paranormal belief and precognitive experience. Person Individ Differ. 2001;31(3):433–44.
Conscious Cognit. 2019;76:102821.
42. Lawrence E, Peters E. Reasoning in believers in the paranormal. J Nervous Ment Dis. 2004;192(11):727–33.
43. Bressan P. The connection between random sequences, everyday coincidences, and belief in the paranormal. Appl Cogn Psychol Off J Soc Appl Res Memory Cognit. 2002;16(1):17–34.
44. Musch J, Ehrenberg K. Probability misjudgment, cognitive ability, and belief in the paranormal. Br J Psychol. 2002;93(2):169–77.
45. Drinkwater K, Denovan A, Dagnall N, Parker A. The Australian sheep-goat scale: an evaluation of factor structure and convergent validity. Front Psychol. 2018;9:1594.
46. Hartman SE. Another view of the paranormal belief scale. J Parapsychol. 1999;63(2):131–41.
47. Lawrence TR. How many factors of paranormal belief are there? A critique of the Paranormal Belief Scale. J Parapsychol. 1995;59(1):3–26.
48. Tobacyk JJ. What is the correct dimensionality of paranormal beliefs? A reply to Lawrence’s critique of the Paranormal Belief Scale. J Parapsychol. 1995;59(1):23–43.
49. Tobacyk J, Milford G. Belief in paranormal phenomena: Assessment instrument development and implications for personality functioning. J Person Soc Psychol. 1983;44(5):1029–37.
50. Thalbourne MA, Delin PS. A new instrument for measuring the sheep-goat variable: its psychometric properties and factor structure. J Soc Psych Res. 1993;59:172–86.
51. Irwin HJ, Marks AD. The Survey of Scientifically Unaccepted Beliefs: a new measure of paranormal and related beliefs. Aust J Parapsychol. 2013;13(2):133–67.
52. Tobacyk JJ. A revised paranormal belief scale. Int J Transp Stud. 2004;23(23):94–8.
53. Drinkwater K, Denovan A, Dagnall N, Parker A. An assessment of the dimensionality and factorial structure of the revised paranormal belief scale. Front Psychol. 2017;8:1693.
54. Pennycook G, Cheyne JA, Seli P, Koehler DJ, Fugelsang JA. Analytic cognitive style predicts religious and paranormal belief. Cognition. 2012;123(3):335–46.
55. Wiseman R, Watt C. Measuring superstitious belief: why lucky charms matter. Person Individ Differ. 2004;37(8):1533–41.
56. Lange R, Irwin HJ, Houran J. Top-down purification of Tobacyk’s revised paranormal belief scale. Person Individ Differ. 2000;29(1):131–56.
57. Lawrence TR, De Cicco P. The factor structure of the paranormal belief scale: more evidence in support of the oblique five. J Parapsychol. 1997;61(3):243–51.
58. Lawrence TR, Roe CA, Williams C. Confirming the factor structure of the paranormal beliefs scale: big orthogonal seven or oblique five? J Parapsychol. 1997;61(1):13–31.
59. Lawrence TR, Roe CA, Williams C. On obliquity and the PBS: Thoughts on Tobacyk and Thomas (1997). J Parapsychol. 1998;62(2):147–51.
60. Lawrence TR. Moving on from the Paranormal Belief Scale: a final reply to Tobacyk. J Parapsychol. 1995;59(2):131–41.
61. Tobacyk JJ. Final thoughts on issues in the measurement of paranormal beliefs. J Parapsychol. 1995;59(2):141–6.
62. Bouvet R, Djeriouat H, Goutaudier N, Py J, Chabrol H. French validation of the revised paranormal belief scale. L’Encephale. 2014;40(4):308–14.
63. Willard AK, Norenzayan A. Cognitive biases explain religious belief, paranormal belief, and belief in life’s purpose. Cognition. 2013;129(2):379–91.
64. Peltzer K. Magical thinking and paranormal beliefs among secondary and university students in South Africa. Person Individ Differ. 2003;35(6):1419–26.
65. Dag I. The relationships among paranormal beliefs, locus of control and psychopathology in a Turkish college sample. Person Individ Differ. 1999;26(4):723–37.
66. Jeswani M, Furnham A. Are modern health worries, environmental concerns, or paranormal beliefs associated with perceptions of the effectiveness of complementary and alternative medicine? Br J Health Psychol. 2010;15(3):599–609.
67. Williams E, Francis L, Lewis CA. Introducing the Modified Paranormal Belief Scale: distinguishing between classic paranormal beliefs, religious paranormal beliefs and conventional religiosity among undergraduates in Northern Ireland and Wales. Arch Psychol Relig. 2009;31(3):345–56.
68. Hergovich A, Schott R, Arendasy M. On the relationship between paranormal belief and schizotypy among adolescents. Person Individ Differ. 2008;45(2):119–25.
69. Aarnio K, Lindeman M. Paranormal beliefs, education, and thinking styles. Person Individ Differ. 2005;39(7):1227–36.
70. Hergovich A, Schott R, Arendasy M. Paranormal belief and religiosity. J Parapsychol. 2005;69(2):293–303.
71. Orenstein A. Religion and paranormal belief. J Sci Study Relig. 2002;41(2):301–11.
72. Beck R, Miller JP. Erosion of belief and disbelief: effects of religiosity and negative affect on beliefs in the paranormal and supernatural. J Soc Psychol. 2001;141(2):277–87.
73. Hillstrom EL, Strachan M. Strong commitment to traditional Protestant religious beliefs is negatively related to beliefs in paranormal phenomena. Psychol Rep. 2000;86(1):183–9.
74. Bader CD, Baker JO, Molle A. Countervailing forces: religiosity and paranormal belief in Italy. J Sci Study Relig. 2012;51(4):705–20.
75. Baker JO, Draper S. Diverse supernatural portfolios: certitude, exclusivity, and the curvilinear relationship between religiosity and paranormal beliefs. J Sci Study Relig. 2010;49(3):413–24.
76. Furr M. Scale construction and psychometrics for social and personality psychology. London: SAGE Publications Ltd; 2011. p. 16–24.
77. Thalbourne MA. Further studies of the measurement and correlates of belief in the paranormal. J Am Soc Psych Res. 1995;89(3):233–47.
78. Roe CA. Belief in the paranormal and attendance at psychic readings. J Am Soc Psych Res. 1998;92(1):25–51.
79. Storm L, Drinkwater K, Jinks AL. A Question of belief: an analysis of item content in paranormal belief questionnaires. J Sci Explor. 2017;31(2):187–230.
80. Dagnall N, Drinkwater K, Parker A, Rowley K. Misperception of chance, conjunction, belief in the paranormal and reality testing: a reappraisal. Appl Cogn Psychol. 2014;28(5):711–9.
81. Irwin HJ, Marks AD, Geiser C. Belief in the paranormal: a state, or a trait? J Parapsychol. 2018;82(1):24–40.
82. Irwin HJ, Dagnall N, Drinkwater K. The role of doublethink and other coping processes in paranormal and related beliefs. J Soc Psych Res. 2015;79(2):80–96.
83. Lange R, Thalbourne MA. Rasch scaling paranormal belief and experience: Structure and semantics of Thalbourne’s Australian Sheep-Goat Scale. Psychol Rep. 2002;91(3):1065–73.
84. Drinkwater K, Dagnall N, Parker A. Reality testing, conspiracy theories, and paranormal beliefs. J Parapsychol. 2012;76(1):57–77.
85. Watt C, Watson S, Wilson L. Cognitive and psychological mediators of anxiety: evidence from a study of paranormal belief and perceived childhood control. Person Individ Differ. 2007;42(2):335–43.
86. Terhune DB, Smith MD. The induction of anomalous experiences in a mirror-gazing facility: suggestion, cognitive perceptual personality traits and phenomenological state effects. J Nervous Ment Dis. 2006;194(6):415–21.
87. Sharkness J, DeAngelo L. Measuring student involvement: a comparison of classical test theory and item response theory in the construction of scales from student surveys. Res Higher Educ. 2011;52(2):480–507.
88. Rusch T, Lowry PB, Mair P, Treiblmaier H. Breaking free from the limitations of classical test theory: developing and measuring information systems scales using item response theory. Inf Manag. 2017;54(2):189–203.
89. Magno C. Demonstrating the difference between classical test theory and item response theory using derived test data. Int J Educ Psychol Assess. 2009;1(1):1–11.
90. Jabrayilov R, Emons WHM, Sijtsma K. Comparison of classical test theory and item response theory in individual change assessment. Appl Psychol Meas. 2016;40(8):559–72.
91. Hambleton RK, Jones RW. Comparison of classical test theory and item response theory and their applications to test development. Educ Meas Issues Pract. 1993;12(3):38–47.
92. Downing SM. Item response theory: applications of modern test theory in medical education. Medical Education. 2003 Aug 4;37(8):739–45 and Urbina S. Essentials of Psychological Testing. Wiley, New York; 2014 Aug 4: 242–44.
93. DeVellis RF. Classical test theory. Med Care. 2006;1:S50–9.
94. Simms LJ. Classical and modern methods of psychological scale construction. Soc Person Psychol Compass. 2008;2(1):414–33.
95. De Champlain AF. A primer on classical test theory and item response theory for assessments in medical education. Med Educ. 2010;44(1):109–17.
96. Urbina S. Essentials of psychological testing. New York: Wiley; 2014.
97. Kline T. Psychological testing: a practical approach to design and evaluation. London: Sage; 2005.
98. Downing SM. Item response theory: applications of modern test theory in medical education. Med Educ. 2003;37(8):739–45.
99. Kose IA, Demirtasli NC. Comparison of unidimensional and multidimensional models based on item response theory in terms of both variables of test length and sample size. Proc Soc Behav Sci. 2012;46:135–40.
100. Andrich D. A rating formulation for ordered response categories. Psychometrika. 1978;43(4):561–73.
101. Smith AB, Rush R, Fallowfield LJ, Velikova G, Sharpe M. Rasch fit statistics and sample size considerations for polytomous data. BMC Med Res Methodol. 2008;8(1):1–11.
102. Tang Y, Horikoshi M, Li W. ggfortify: unified interface to visualize statistical results of popular R packages. R J. 2016;8(2):478–89.
103. R Core Team. R: A language and environment for statistical computing [Internet]. Vienna, Austria; 2020. Available from http://www.R-project.org/.
104. Mair P, Hatzinger R. Extended Rasch modelling: the eRm package for the application of IRT models in R. J Stat Softw. 2007;20(9):1–20.
105. Mair P, Hatzinger R, Maier M. eRm: Extended Rasch modeling [Internet]. R package version 1.0–2; 2021. URL https://CRAN.R-project.org/package=eRm.
106. Linacre JM. Optimising rating scale category effectiveness. J Appl Meas. 2002;3(1):85–106.
107. Linacre JM. What do infit, outfit, mean-square and standardised mean? Rasch Meas Trans. 2002;16(2):878.
108. Komboz B, Zeileis A, Strobl C. Tree-based global model tests for polytomous Rasch models. Educ Psychol Meas. 2018;78(1):128–66.
109. Strobl C, Kopf J, Zeileis A. Rasch trees: a new method for detecting differential item functioning in the Rasch model. Psychometrika. 2015;80(2):289–316.
110. Strobl C, Kopf J, Zeileis A. Using the raschtree function for detecting differential item functioning in the Rasch model [Internet]. Available from https://cran.r-project.org/web/packages/psychotree/vignettes/raschtree.pdf.
111. Rosseel Y. Lavaan: An R package for structural equation modeling and more version 0.5–12 (BETA). J Stat Softw. 2012;48(2):1–36.
112. Taber KS. The use of Cronbach’s alpha when developing and reporting research instruments in science education. Res Sci Educ. 2018;48(6):1273–96.
113. Roszkowski MJ, Soven M. Shifting gears: consequences of including two negatively worded items in the middle of a positively worded questionnaire. Assess Eval Higher Educ. 2010;35(1):117–34.
114. Schriesheim CA, Eisenbach RJ, Hill KD. The effect of negation and polar opposite item reversals on questionnaire reliability and validity: an experimental investigation. Educ Psychol Meas. 1991;51(1):67–78.
115. French CC, Stone A. Anomalistic psychology: exploring paranormal belief and experience. Macmillan International Higher Education; 2013. pp 6–9.
116. Irwin HJ. The psychology of paranormal belief: a researcher’s handbook. Hatfield: University of Hertfordshire Press; 2009. p. 3–5.
117. French CC, Stone A. Anomalistic psychology: exploring paranormal belief and experience. Macmillan International Higher Education; 2013, pp 13–14.
118. Maraldi ED, Farias M. Assessing implicit spirituality in a non-WEIRD population: development and validation of an implicit measure of new age and paranormal beliefs. Int J Psychol Relig. 2020;30(2):101–11.

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.