
Preprint · September 2024 · DOI: 10.31235/[Link]/83whe


Generative Artificial Intelligence (GenAI) in the research
process – a survey of researchers’ practices and
perceptions

Jens Peter Andersena, * [Link]


Lise Degna [Link]
Rachel Fishberga [Link]
Ebbe K. Graversena [Link]
Serge P.J.M. Horbachb [Link]
Evanthia Kalpazidou Schmidta [Link]
Jesper W. Schneidera [Link]
Mads P. Sørensena [Link]

a: Danish Centre for Studies in Research and Research Policy, Department of Political Sciences,
Aarhus University, Bartholins Allé 7, 8000 Aarhus C, Denmark.
b: Institute for Science in Society, Faculty of Science, Radboud University, Heyendaalseweg 135,
6525 AJ Nijmegen, Netherlands.

Acknowledgements

The authors would like to thank the 2,534 respondents who completed the survey and
the 533 respondents who completed parts of it. We would also like to thank student
assistant Simon Nielsen from the Danish Centre for Studies in Research and Research
Policy, Aarhus University, for his help with parts of the qualitative analysis, and PhD
student Emil Dolmer Alnor for help with extracting the participant information from
university web pages.

Author contributions

All authors contributed equally to the work described in this manuscript. Authors are
listed alphabetically by last name in the article byline.

Declaration of interest

The authors declare no competing interests.

Ethics declaration

The study obtained ethical approval from the Ethical Review Board of Aarhus BSS, Aarhus
University, under document number BSS-2023-132.

Corresponding author

Jens Peter Andersen


Danish Centre for Studies in Research and Research Policy
Aarhus University
Bartholins Allé 7, 8000 Aarhus C, Denmark
jpa@[Link]
Abstract

This study explores how researchers, including PhD students, at Danish universities use
generative AI (GenAI) and how they assess the research integrity of specific use cases.
Conducted through a survey sent to all researchers at Danish universities from January
to February 2024, the study received 2,534 complete responses and evaluated 32 GenAI
use cases across five research phases: idea generation, research design, data collection,
data analysis, and writing/reporting. Respondents reported on their own and colleagues'
GenAI usage and assessed whether the practices in the use cases constituted good
research practice. Through an explorative factor analysis, we identified three clusters of
perception: "GenAI as a work horse", "GenAI as a language assistant only", and "GenAI
as a research accelerator".

The findings further show varied opinions on GenAI's research integrity implications.
Language editing and data analysis were generally viewed positively, whereas
experiment design and peer review tasks faced more criticism. Controversial areas
included image creation/modification and synthetic data, with comments highlighting
the need for critical and reflexive use of GenAI. Usage differed by main research area,
with technical and quantitative sciences reporting slightly higher usage and more
positive assessments. Junior researchers used GenAI more than senior colleagues,
while no significant gender differences were observed.

Keywords

Generative Artificial Intelligence (GenAI), research process, research practice, use cases, research integrity
1. Introduction
Research employing Generative Artificial Intelligence (GenAI) is rapidly expanding across
fields and is anticipated to accelerate and transform the production of scientific
knowledge. As in many other parts of society, the integration of GenAI into academic
research is characterised by a wide range of attitudes, perceptions, and yet-to-be-developed
practices. Owing to the inherent uncertainties that accompany new technologies, fierce
debates have emerged over the potential benefits, risks and challenges of using GenAI for
research purposes (e.g. Al-Zahrani, 2023; Peres et al., 2023). As the technology continues
to reshape academic landscapes, understanding the adoption and impacts of GenAI in
research practices becomes increasingly important. The currently evolving discourse
surrounding GenAI within academia reflects a spectrum of engagement, ranging from
enthusiastic adoption (Korinek, 2023; Xie & Warshel, 2023) to cautious evaluation (Gao et
al., 2023; Larosa et al., 2023) and scepticism (Birhane et al., 2023; Else, 2023; Messeri &
Crockett, 2024). Following an avalanche of opinion pieces and conceptual contributions,
empirical studies offering valuable insights into GenAI's adoption, perceptions, and
anticipated impacts across different scholarly activities are now emerging. These studies
acknowledge GenAI's potential for efficiency gains and enhanced research processes on
the one hand, while revealing researchers' concerns about transparency, misinformation,
biases, and largely unknown implications on the other (Van Noorden & Perkel, 2023).

While existing research has provided a foundation for understanding the complexities of
GenAI adoption, limited empirical work has systematically explored its diverse use and
perceptions across academic contexts and research fields. This includes variations
between disciplines or ways of conducting research, as well as potential disparities
across career stages, gender and other demographics. Furthermore, questions remain
about not only whether, but how academics might use GenAI in research and how they
assess its research integrity implications across various use cases. Indeed, research
practices as diverse as planning experiments, writing project proposals, generating and
collecting data, analysing it, reporting it or transforming it into lay-accessible content can
all potentially be assisted or conducted by GenAI tools. However, whether researchers
actually do this, and how they assess research integrity aspects of using GenAI for these
various cases, remains largely unknown.

While several actors are preparing policy interventions to steer the usage of GenAI in
research (e.g., publishers, funders or research institutions aiming to govern particular
research practices), systematic understandings of researchers' own practices become
increasingly important. Indeed, such policy initiatives tend to remain paper tigers if not
properly aligned with the practices of those they intend to govern (Hepkema et al., 2021).
This paper aims to deepen our understanding of GenAI's role in academia based on the
results of a nationwide survey of researchers across Danish universities. By exploring
how researchers from diverse backgrounds, fields and scholarly traditions use and
assess the application of GenAI for a wide range of research tasks, this paper seeks to
contribute to a more comprehensive understanding of GenAI's evolving role within
academia. Ultimately, this is intended to inform the preparation of tailored guidelines
and models of accountability.

1.1. Overview of other relevant studies


In examining the current landscape of GenAI within academia, several studies have
offered insights into its adoption, perceptions, and anticipated impacts across various
scholarly activities. While most studies have focused on the use of GenAI in educational
and teaching contexts, several have also examined research contexts. Notably, a survey
at a large U.S. research university revealed mixed attitudes among faculty and students
towards GenAI, with a general openness to training despite low current usage and
comfort levels, particularly in research contexts (Petricini et al., 2023). Contrastingly, a
survey among higher education faculty reported a modest but growing integration of
GenAI in research activities, marking a 13-percentage point increase between spring and
fall 2023 (9% vs. 22%) (Shaw et al., 2023).

A survey among users of ResearchGate and [Link] found that researchers' reasons
for using GenAI in their work are mainly related to timesaving, self-efficacy,
self-esteem and reduction of stress. Conversely, concerns over academic integrity and
negative peer evaluations of GenAI usage limit researchers' inclination to use GenAI
tools for their work (Bin-Nashwan et al., 2023).

Additionally, several surveys were conducted in the context of the Nature portfolio of
journals. A survey among authors of Nature articles and readers of Nature Briefing (Van
Noorden & Perkel, 2023) found that while a large minority of researchers engaged with AI
frequently, many expressed concerns about misinformation and biases, yet recognized
benefits such as efficiency gains and improved accessibility for non-native English
speakers. Another survey conducted by Nature among 3,838 postdocs indicated a similar
level of engagement with GenAI, particularly chatbots, with 31% of respondents
reporting using chatbots. Interestingly, a majority (67%) felt AI had not significantly
altered their day-to-day work or career plans (Nordling, 2023). Similarly, a survey among
readers of the Nature journal suggested a diversely engaged but cautious approach
towards GenAI in academia (Owens, 2023). Of the 40% who used GenAI at least
occasionally, most reported using it for coding tasks or to help write manuscripts, prepare
presentations or conduct literature reviews.

Furthering this discourse, findings from a UK-based survey highlighted that over half of
the academics utilized GenAI for efficiency, expecting its role to expand significantly
(Watermeyer et al., 2023). This sentiment mirrors the European Research Council's
(2023) survey results, which anticipated AI (not necessarily generative AI) fostering faster
academic processes and enhancing human-AI collaborations, albeit with concerns
about potential ethical and transparency issues.

Apart from this host of survey studies on researchers' use and perceptions of GenAI, two
other studies are worth mentioning here. A systematic review of the literature on
guidelines and standards for using GenAI and Large Language Models (LLMs) in
academic medicine (Kim et al., 2023) arrived at five recommendations for which the
authors consider there to be sufficient consensus in the research community, including
that chatbots must not be listed as authors of scientific manuscripts, that humans must
be held accountable for the use of ChatGPT/LLMs, and that content created by
ChatGPT/LLMs should be meticulously verified by humans. The study highlights the
necessity for robust guidelines to govern GenAI's academic use, advocating for
accountability in AI-generated content. This need for clarity and oversight is crucial, as
evidenced by Gray's (2024) quantitative analysis, which discovered a noticeable uptick in
LLM-assisted publications within engineering and natural sciences – a trend that
highlights the differential adoption rates across disciplines. Gray estimates that up to
85,000 LLM-assisted articles were published in 2023, indicating significant adoption of
GenAI in academic publishing, for which other papers have provided (more anecdotal)
evidence too.

1.2. Research objective and questions


The above-mentioned studies collectively indicate an evolving uptake of GenAI
technologies in the academic community. The insights reflect a spectrum of
engagement, from enthusiastic adoption to fierce scepticism, which is still in flux. This
evolution is evident in the significantly different results observed in subsequent waves of
similar surveys. However, up to this point, studies have suggested some variations but
have failed to systematically examine the diversity of GenAI use and perceptions across
academic contexts. This includes variations between disciplines or ways of doing
research, as well as potential variations across career stages and other demographics.
In addition, relatively little is known about how academics use GenAI in their work and
how they assess the research integrity of different GenAI practices, i.e. whether they
think of them as good or bad research practices.

In this study, we aim to address these knowledge gaps by conducting a nation-wide
survey of researchers at Danish universities. We examine how researchers from various
backgrounds and scholarly traditions use and assess the use of GenAI for a wide range
of research tasks. In the remainder of this article, we report the results of this survey
and reflect on the implications for regulating GenAI use for research purposes. Our
article is guided by two research questions. First, for what purposes and to what
extent is GenAI applied for research at Danish universities by researchers across
different disciplines/research fields, career stages and demographics? Second, what are
the overall research integrity assessments of researchers towards the integration of
GenAI in academic research, and how do attitudes towards GenAI vary across different
scientific fields, career stages, and demographics (e.g., gender, seniority)?

2. Methods
The survey, the sampling process and the analysis plan were described prior to launching
the survey and uploaded to OSF (Schneider, Sørensen, et al., 2024).
2.1. Study participants
The survey was fielded as a census of all researchers at Danish universities. Previous
studies have indicated that research practices and ways of producing knowledge, as well
as researchers' assessments of good and bad research practices, differ strongly between
researchers from different research backgrounds, demographics and institutional
contexts (Ravn & Sørensen, 2021). Therefore, we aimed to reach researchers from all
main areas of research. As the accessibility of, and legal frameworks governing, novel
technologies in general and generative AI in particular differ across national contexts
(Hutson, 2023), we decided to field our survey in a single country in order to minimize
variation in this respect. Together, these considerations resulted in our choice to
include all researchers, including PhD students, at Danish universities as our
participants. Participants were sampled by collecting contact details from the
researchers' institutional personal webpages. A script was written to automatically
collect the email addresses of all research staff at Danish universities. This resulted in
50,652 people with contact details and job titles. As we were only interested in staff
members with research tasks, we selected all job titles that occurred at least 50 times
(118 job titles) and, of these, kept those suggesting an academic position that might
involve research tasks (leaving 88 different job titles). This resulted in 30,590 people.
After removing duplicates (e.g. researchers working at multiple departments, hence
having multiple email addresses) and inactive email addresses, the survey was sent out
to 29,498 researchers, including PhD students, at Danish universities on January 22nd,
2024. Two reminders were sent to researchers who had not fully completed the survey
and had not opted out of receiving further communication about it. These waves of
invitations were sent to 27,978 and 26,670 researchers respectively. Before starting the
survey, participants were asked to confirm that they were "an active researcher at a
Danish university holding a PhD degree (or equivalent)" or "a PhD student".
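To illustrate the job-title filtering described above, the following R sketch shows one way such a selection could be implemented; the data frame staff, its columns, and the vector research_titles are hypothetical names, not the authors' actual script:

```r
# Hypothetical sketch of the sampling-frame construction.
library(dplyr)

frequent_titles <- staff %>%
  count(job_title) %>%   # frequency of each job title
  filter(n >= 50) %>%    # keep the titles occurring at least 50 times
  pull(job_title)

# The titles suggesting research tasks were then selected manually;
# 'research_titles' stands in for that manual selection.
sample_frame <- staff %>%
  filter(job_title %in% intersect(frequent_titles, research_titles)) %>%
  distinct(email, .keep_all = TRUE)  # drop duplicate email addresses
```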

Prior to sending out the full survey, 200 researchers were randomly selected from our
sample to conduct a pilot survey. They received the full survey, with the additional
request to flag any mistakes or phrases that were unclear. This led to minor adjustments
of the survey instrument.

After the survey concluded, the identifying information that had been used to contact
respondents was removed, and no links to other data sources were retained. All analyses
were done on this data set. The qualitative responses were further reviewed to remove
potentially identifying information from the published data set.

2.2. Survey instrument and respondents


The full survey instrument can be found on the project's OSF page (Schneider, Sørensen,
et al., 2024). The survey consisted of two phases. In the first phase, we collected
demographic and other background variables on the participants, including gender,
academic age, native language, academic field and knowledge production ways (e.g.
quantitative or qualitative social sciences, theoretical or experimental natural science,
etc.), participants' exposure to institutional regulations and conversations about AI, and
the extent to which they use GenAI either professionally or personally. In the second
phase, participants were presented with 32 potential use cases of AI, divided into five
research phases (see Table 1). For each use case, they were asked to consider whether
they had recently used AI for this purpose, whether they were aware of colleagues with
whom they had collaborated over the last year who had done so, and whether they
considered the use case a good or problematic research practice. The survey concluded
with two open questions, asking respondents whether they had one or multiple specific
GenAI tools in mind when completing the survey, and giving them the opportunity to
leave any additional comments they wanted to share.

Out of the 29,498 invitations, we received 2,534 complete responses (8.6%), with
another 533 respondents answering part of the questions (1.8%). Supplementary
tables 1-4 present an overview of the survey respondents and their self-reported
demographic and disciplinary backgrounds, and compare respondents'
characteristics with the full study population in terms of gender and disciplinary
background. The gender of non-respondents was inferred from first names, and their
disciplinary background from departmental affiliation.

2.3. Description of quantitative analyses


All quantitative analyses were performed using R version 4.3.2. Multiple imputation was
done using the ‘mice’ package (Buuren & Groothuis-Oudshoorn, 2011).

The categorical academic age variable was constructed from a binary response on
whether the respondent was a current PhD student, and the year in which the PhD
degree was awarded. We recoded these years to roughly correspond to the categories of
European Research Council grant levels, so that respondents with a PhD from after 2016
are "starting", those with a PhD before 2017 and after 2010 are "consolidators", and
those with a PhD from before 2011 are "advanced".
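As a minimal sketch of this recoding, assuming a data frame resp with hypothetical columns is_phd_student (logical) and phd_year (numeric):

```r
# Sketch of the academic-age recoding; column names are hypothetical.
library(dplyr)

resp <- resp %>%
  mutate(academic_age = case_when(
    is_phd_student   ~ "PhD student",
    phd_year > 2016  ~ "starting",      # PhD awarded after 2016
    phd_year > 2010  ~ "consolidator",  # PhD awarded 2011-2016
    phd_year <= 2010 ~ "advanced",      # PhD awarded before 2011
    TRUE             ~ NA_character_    # missing year
  ))
```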

For the purpose of calculating aggregated scores and imputing missing values, we also
created recoded numerical versions of the responses on personal use and the use of
others, so that the responses "No" and "Not relevant for me" were recoded to 0,
"Yes" to 1, and "Don't know" to a missing value. Research integrity assessments were
recoded for readability purposes only, as the value 1 corresponded to "excellent" and 7
to "very problematic". We reversed this scale and converted the value 8 ("Unable to
answer") to missing values.
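A minimal sketch of these recodings (function names hypothetical):

```r
# Recode a use response: "No"/"Not relevant for me" -> 0, "Yes" -> 1,
# "Don't know" -> missing (function names hypothetical).
recode_use <- function(x) {
  dplyr::case_when(
    x %in% c("No", "Not relevant for me") ~ 0,
    x == "Yes"                            ~ 1,
    TRUE                                  ~ NA_real_
  )
}

# Reverse the 1-7 assessment scale so that 7 = "excellent";
# 8 ("Unable to answer") becomes a missing value.
recode_assessment <- function(x) ifelse(x == 8, NA_real_, 8 - x)
```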

The multiple imputation was done in two batches: one for the two groups of usage
variables, and one for the research integrity assessment variables. As the usage
variables are binary, we used logistic regression, while we used predictive mean
matching for the assessments. Both batches ran through 20 iterations.
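A sketch of how this could look with the 'mice' package; the data frames use_data and assessment_data are hypothetical names for the two batches of recoded responses:

```r
# Two-batch multiple imputation (data frame names hypothetical).
library(mice)

# Batch 1: binary use variables; the "logreg" method requires binary factors.
use_data[] <- lapply(use_data, factor)
imp_use <- mice(use_data, method = "logreg", maxit = 20, printFlag = FALSE)

# Batch 2: 1-7 integrity assessments, imputed with predictive mean matching.
imp_assess <- mice(assessment_data, method = "pmm", maxit = 20, printFlag = FALSE)

# Extract completed data sets (here, the first of the default m = 5 imputations).
use_imputed    <- complete(imp_use)
assess_imputed <- complete(imp_assess)
```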

The imputed data are used for both the reported aggregate scores and the factor
analysis. Aggregated use scores are the mean use of an individual respondent and
correspond to the proportion of use cases for which the respondent reports using
GenAI. The aggregated research integrity assessment is the mean value of these
assessments, ranging from 1 (all use cases rated very problematic) to 7 (all use cases
rated excellent).
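As a sketch, assuming the imputed responses are held as numeric matrices use_mat (0/1, one column per use case) and assess_mat (the reversed 1-7 scale):

```r
# Aggregated scores per respondent (matrix names hypothetical).
agg_use        <- rowMeans(use_mat)     # proportion of use cases answered "yes"
agg_assessment <- rowMeans(assess_mat)  # 1 = all very problematic, 7 = all excellent
```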

Factor analysis was done using the 'psych' package in R (Revelle, 2024). We used parallel
analysis to identify three factors with an eigenvalue above 1. We used a maximum
likelihood factoring method with varimax rotation to select factors with distinct peak
loadings. The resulting loadings are high, with several peaks above .6, and 49.1% of the
variance explained. While the explained variance is not exceptional, we still consider it
reasonable. Adding two additional factors would only explain 5.7% more of the variance,
which would not be justifiable and would introduce noisy factors.
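A sketch of the corresponding 'psych' calls, assuming the imputed assessment responses are held in a numeric matrix assess_mat (a hypothetical name):

```r
# Exploratory factor analysis with the 'psych' package (input name hypothetical).
library(psych)

fa.parallel(assess_mat, fa = "fa")  # parallel analysis to choose the number of factors

fit <- fa(assess_mat, nfactors = 3, fm = "ml", rotate = "varimax")
print(fit$loadings, cutoff = 0.3)   # inspect peak loadings per use case
fit$Vaccounted                      # proportion of variance explained
```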

Using the individual factor scores per observation, we clustered observations with k-means
clustering with three centres, equivalent to the number of factors. Hierarchical clustering
visually supports this number of clusters. Note that these clusters group respondents,
while the factor loadings group the variables.
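A sketch of the clustering step, reusing the factor scores from the model above:

```r
# Cluster respondents on their individual factor scores (sketch).
set.seed(1)                            # k-means depends on random initialisation
km <- kmeans(fit$scores, centers = 3)  # three centres, matching the factor count
table(km$cluster)                      # cluster sizes

# Hierarchical clustering as a visual check on the number of clusters.
hc <- hclust(dist(fit$scores))
plot(hc, labels = FALSE)
```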

2.4. Description of qualitative analyses


The qualitative data we utilized was gathered from two open-text field questions included
in the survey. The first question asked respondents about the types of GenAI tools they
used, and the second, broader question sought their comments or insights related to
using GenAI for research purposes.

The responses to the first question were compiled and visualized in bar graphs,
segmented by gender, PhD age, research field, and whether the researchers were mono-
disciplinary or multidisciplinary (Figure 2 - figure supplements 1-5).

As for the second question, we received a total of 543 comments (excluding responses
that indicated ‘no comment’). To analyse the comments, we categorized the responses
into emerging thematic groups: 'No comment', 'Understanding of the survey or
elaboration of answers', 'Assessment of good or bad practice', 'Description of GenAI as
a tool', 'Thoughts or issues related to policy, training, or infrastructure for GenAI',
'Examples of use', and 'General opinions or emotions about GenAI'. The comments and
coding results can be found on the project’s OSF site (Schneider, Sørensen, et al., 2024).
Here, we have removed any identifying information from the open text fields, including
names, university details, department or specific research fields, and any particular
activities mentioned by respondents that could lead to identification. To help us analyse
the three clusters identified in this paper, we specifically focused on the 182 comments
coded under 'Assessment of good or bad practice'.
3. Results
3.1. Descriptive overview of main results
The survey was launched on January 22, 2024, and remained open until February 26,
2024, with one invitation and two reminders being sent to all researchers (incl. PhD
students) of all eight Danish universities. Out of the 29,498 invitations, we received 2,534
complete responses (8.6%), with another 533 respondents answering part of the
questions (1.8%). In the analyses below, we only use the complete responses. The survey
consisted of two main parts, one with questions regarding general GenAI experience and
demographic background variables, the other presenting 32 use cases across five
phases of research work (Table 1). For each use case, respondents were asked about
their own use, the perceived use of others, and an assessment of the use case in terms
of research integrity on a 7-point Likert scale from excellent research practice to very
problematic research practice. Further details about the survey can be found in the
methods section.

Respondents were generally well spread across disciplinary backgrounds and
demographics. We refer to Supplementary tables 1-4 for a descriptive overview of
respondents' backgrounds and demographics relative to the study population.

Table 1. Overview of use cases per research phase. The ID column shows the codes used in the analyses, corresponding
to the use cases.

Idea Generation
  idea1   help identify gaps in current research
  idea2   help identify relevant literature
  idea3   help summarize or analyse existing literature
  idea4   help identify potential collaborators
  idea5   help propose new hypotheses

Research Design
  rd1   suggest a structure for research proposals
  rd2   help draft parts of a research proposal
  rd3   refine or edit language of research proposals
  rd4   refine or edit content of research proposals
  rd5   help design research methodology
  rd6   help develop theoretical models or conceptual frameworks
  rd7   help design experiments

Data Collection
  dc1   suggest experimental parameters
  dc2   help formulate questions for surveys or interviews
  dc3   generate synthetic data sets
  dc4   transcribe recordings of research material (e.g. interviews, workshops or focus groups)
  dc5   identify ethical issues in research (either your own or someone else’s)

Data Analysis
  da1   create or edit software code for data analysis
  da2   create or edit simulation software code
  da3   support statistical data analysis
  da4   help pattern recognition in data
  da5   create or modify scientific figures or images

Writing and Reporting
  pub1    suggest a structure for a research article
  pub2    help draft parts of a research article
  pub3    propose a title, abstract or keywords for your article
  pub4    edit a research article to improve readability and/or language
  pub5    format references
  pub6    identify strengths and weaknesses in a manuscript during the peer review process
  pub7    help write review reports during the peer review process
  pub8    translate one of your research papers into a different language
  pub9    help create (parts of) a slide deck for a conference talk or similar academic event
  pub10   help create lay summaries or similar non-academic writing for public engagement, based on your own texts

In this section, we present descriptive statistics on the use and research integrity
assessment of the 32 cases of GenAI use in the research process. Figure 1 plots the
research integrity assessment and average use (both own use and the perceived use of
colleagues) of each of the individual use cases. It shows a rather wide distribution of
research integrity assessments for most use cases, indicating diverse opinions about
whether using GenAI tools for these purposes constitutes problematic or good research
practice. In general, respondents assessed using GenAI for language editing use cases
(e.g. in proposal writing, editing of research articles, formatting references) and those
related to data analysis (e.g. creating code for analysis or simulation, pattern
recognition, transcription of research recordings) as rather good research practice. In
contrast, using GenAI for arguably more fundamental tasks related to designing research
experiments or theoretical frameworks, and for critical assessment of others' work
during peer review, was considered more problematic.

Two use cases that were particularly contentious were those related to the creation or
modification of images and figures, and the creation of synthetic data. Both these cases
might have different connotations in diverse research fields. An important observation in
relation to the research integrity assessment is that many respondents elaborate in the
open text field of the survey (Schneider, Sørensen, et al., 2024), that their integrity
assessment of the use of GenAI depends on it being used critically and reflexively. As one
respondent puts it:

“Although I have answered in many cases that using AI is excellent
practice, this does not mean that it should be used uncritically or
without checking references etc. I just consider AI as giving an excellent
head start on all of these tasks” (ID19826).
The qualitative comments, which allowed respondents to contextualise their responses,
contain several descriptions indicating a lack of trust in GenAI. The main problems
mentioned are hallucination (the chatbot "making up" information), violation of
privacy rights and copyrights (not knowing what one is allowed to feed into GenAI tools),
potential biases, and the 'black boxing' of the generative process.
Generally, we observe a moderate positive correlation between the research integrity
assessment of use cases and their admitted use by respondents (Kendall's τ = .44) or
their colleagues (τ = .50). Some exceptions are the use of GenAI to identify potential
collaborators and to create synthetic datasets (relatively low use), and to propose a title,
keywords or abstract, or even to help draft parts of the body of an article (relatively high
admitted use). For all use cases, the reported GenAI use of direct colleagues is higher
than that of respondents themselves, with particularly large relative differences for the
two use cases related to the peer review process (identifying strengths and weaknesses
of manuscripts under review and writing review reports). This is in line with what is found
in other surveys, e.g. those focusing on questionable research practices and malpractice
(Schneider, Allum, et al., 2024).
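As a sketch, assuming a use-case-level table uc with one row per use case, holding the mean integrity assessment and the shares of respondents reporting own and colleagues' use (table and column names hypothetical):

```r
# Kendall rank correlations at the use-case level (names hypothetical).
cor(uc$mean_assessment, uc$own_use_share,       method = "kendall")  # reported tau = .44
cor(uc$mean_assessment, uc$colleague_use_share, method = "kendall")  # reported tau = .50
```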

Last, we note that for almost all use cases, the share of respondents judging a use
case to be a good, very good or excellent practice is higher than the share of respondents
reporting having used GenAI for this purpose. This suggests that, while reported use of
GenAI is still fairly low, the reason for not engaging in more GenAI use cases is probably
not primarily related to research integrity considerations.

This is further underlined by the aggregated assessment and usage scores (see
Supplementary figure 1), which illustrate higher assessment scores, and lower variance,
as usage grows. Respondents who had not used GenAI, or had only used it for very few
use cases, disagreed much more on the assessment of the use cases on average.
Figure 1. Research integrity assessment scores and share of participants using GenAI for specific use cases. Results are shown by research phase. Brown bars
show the shares of respondents judging the use case a problematic practice, while green bars show positive assessments. Light gray bars show the share of neutral
responses. Blue dots in the right panel show the share of respondents who report ever having used AI for the specific use case, while yellow dots show the
share of respondents who believe their colleagues use AI for this use case. Horizontal lines in the right panel serve as visual guides only.
Figure 2 presents the aggregated use and research integrity assessment of all 32 use
cases, broken down by research field (top panel), knowledge production ways (second
panel), gender (third panel) and academic age (bottom panel). It shows relative
consistency in responses across main research areas, with most users in all disciplines
reporting GenAI use for only a few use cases. However, a somewhat larger
proportion of respondents from the technical sciences, especially those in the
experimental technical sciences, report using GenAI for a higher number of different
use cases, some even for more than half of all use cases mentioned in our survey.
Simultaneously, respondents from the technical sciences have the highest aggregated
research integrity assessment of our 32 use cases. In contrast, scholars from the
humanities report using GenAI tools least frequently, and they also have the least
positive research integrity assessment of the use of GenAI for research purposes. Some
of the respondents from this main area of research report a strongly negative
research integrity assessment of the usage of such tools for many use cases. A
substantial share of respondents from the humanities (19.3%) gives an overall
assessment of 3 or less on a 7-point scale (i.e. degrees of ‘problematic research
practice’).

Looking at differences within research fields, we note that quantitative social scientists
report using GenAI tools for substantially more use cases than their colleagues from
the qualitative social sciences. Again, we notice the inverse pattern in terms of research
integrity assessment, i.e. the qualitative social scientists give slightly lower scores to
the acceptability of using GenAI tools for various purposes.

In terms of gender, no differences in either use or research integrity assessment were
observed between men and women. Some variations are seen in the other two categories
(non-binary and ‘do not wish to disclose’), but these contain only few observations.

In terms of academic age, we observe a clear pattern in the usage of GenAI tools, with
more junior scholars using GenAI for more different purposes than their senior
colleagues. In terms of research integrity assessment, no substantial differences between
respondents of different academic ages were observed. This means that junior scholars
have been quicker to adopt GenAI tools for various use cases than their senior
colleagues, even with similar assessments of the appropriateness of such usage.
Figure 2. Distribution of aggregated use and research integrity assessment scores. Each panel shows the
aggregated research integrity assessment (left column) or aggregated use (right column) across research field,
knowledge production ways, gender and academic age (top to bottom). Colours in knowledge production ways
distributions correspond to colours in research field.
If we look at the GenAI tools that respondents had in mind when answering the survey,
most respondents indicated that they were thinking of ChatGPT (n=1,550), while 894
respondents answered ‘no’ or did not answer the question about whether they had any
specific tools in mind (Figure 2 – figure supplement 1). Copilot (both Microsoft and
GitHub) was the second most mentioned tool (n=176). Other tools mentioned include
Grammarly (n=101), Google's Bard (n=76), Dall-E (n=74) and DeepL (n=69).

3.2. Factor analysis


We used exploratory factor analysis to identify patterns in the variance of research
integrity assessments. We identified three clusters, supported by the eigenvalues in the
parallel analysis (Supplementary figure 7). We also checked the correspondence
between observed and multiply imputed responses (Supplementary figure 8) and
consider the correspondence sufficient to incorporate imputed responses for a more
complete data material. Factor loadings underlying the cluster analysis are available in
Supplementary figure 9.
Figure 3. Research integrity assessment responses and own use across all use cases, split by factor loading clusters.
The factor analysis revealed three clusters of research integrity assessments of GenAI use
cases among respondents (Figure 3), based on a k-means clustering of individual factor
scores. The clusters differentiate from each other by highlighting different types of
integrity assessments of GenAI use in research. Cluster 1 could be labelled “GenAI as a
work horse”, with 893 respondents (35.2%). In this cluster we find researchers who
consider using GenAI to create and edit software code for analysis and simulation (da1-
2), to support statistical analysis (da3) and to help recognize patterns in data (da4) as
good research practices. On the other hand, researchers in this cluster are more
sceptical towards using GenAI in the peer review process (cf. pub6 and pub7) than
researchers in the two other clusters. They also score using GenAI in the ‘Idea generation’
phase (idea1-5) lower than researchers in the other two clusters. If we look at the
comments from researchers in this cluster, made in the open text field of the survey
(Schneider, Sørensen, et al., 2024), some researchers point out that GenAI “is good
when used for tedious tasks like formatting, editing, generating a code for idea that you
have in mind, generating drafts, etc, and terrible for creative tasks” (ID2561), that it is
“problematic to be using generative AI in creating articles or other written materials”
(ID13760), but that GenAI can be good for “checking language and reviewing code”
(ID12608). This cluster is thereby mainly characterised by using GenAI as a tool to speed
up processes or to help researchers with technical issues in relation to their research – i.e.
as a “work horse”.

In cluster 2 – tentatively called “GenAI as a language assistant only” – we find the most
sceptical respondents (n=609, 24.0%). They generally assess the use of GenAI more
negatively than the other clusters, but it is also in this cluster that we find the most
“neutral” responses. Positive assessments are mainly found for use cases related to
language editing, e.g. refining and editing the language of research proposals (rd3),
transcribing recordings of research material (dc4), and proposing titles, keywords or
abstracts, editing research articles for readability, and formatting references (pub3-5).
In particular, cluster 2 researchers find it more problematic to use GenAI for data
analysis (da1-5) than researchers in the other two clusters. In the open text field
comments from cluster 2, researchers provide some clarification of this pattern.
Respondents referred to GenAI as “a glorified spell checker” (ID23440), and mentioned
that it is potentially useful “[n]ot so much in actual research, but for various kinds of
help-services, especially in connections with language polishing/translation and editing”
(ID20622). Overall, it seems that researchers in this cluster are generally sceptical
towards using GenAI for research purposes, with the possible sole exception of using
GenAI as an assistant in the more “language related” aspects of the research process.
The positive assessments in this cluster are few and weak.

Finally, in cluster 3, which could be labelled “GenAI as a research accelerator”, we
find 1,032 researchers (40.7%) who are generally very positive in their assessment of
GenAI. They are positive about using GenAI in almost all use cases, particularly in relation
to data analysis and research design. There are only a few use cases with a slightly more
varied or negative assessment, e.g. the creation of synthetic datasets (dc3) and identifying
ethical issues in research (dc5). Again, the comments do not directly explain why the
researchers in this cluster score these use cases as they do. The comments deal with
many different issues, but some researchers mention that using GenAI tools helps them
become more productive:

“I do believe that AI is excellent practice for increasing productivity
especially in the form of content/outline SUGGESTION (not copy-pasting
it for the final version of a paper as content might be faulty and
is often too general), language improvement (here AI is excellent and I
don't see any ethical/moral problems with it as long as input data is not
confidential), and generating first drafts of sections based on my own
(unstructured) content/thoughts (again, I do not see any problem with
that as the content still comes from me).” (ID20164).
In terms of reported use, all three clusters follow a similar pattern regarding the use
cases for which higher or lower use of GenAI is self-reported. The clusters differ only in
the extent of reported use, with respondents in cluster 3 consistently reporting the
highest use for every use case. Hence, while we observe relatively strong differences in
research integrity assessment, these are only weakly reflected in respondents'
self-reported use of GenAI across the clusters.

If we look at the demographic distribution of respondents over the three identified
clusters (Figure 4), we do not find any differences in how men and women assess the
research integrity of different GenAI use cases. Similarly, there are only minor differences
in how different seniority groups (PhD students, and starting, consolidated and advanced
researchers) assess what constitutes good use of GenAI. Only in relation to main areas of
research and knowledge production ways do we find more pronounced differences.

In cluster 1 – “GenAI as a work horse” – we find researchers from all types of epistemic
backgrounds. However, researchers from the humanities, who work on data produced
by themselves (26.4%), clinical medical researchers (30.5%), and theoretical natural
scientists (31.7%) are less well represented in this cluster compared to researchers
working with other ways of producing knowledge, who have a representation of between
34.1% and 40% in cluster 1.

In cluster 2 – “GenAI as a language assistant only” – we find a bigger proportion of
humanities scholars (36.6%) compared to the other four main areas of research (from
20.6% to 23.4%). If we look at differences between researchers using different knowledge
production ways, we similarly see that a greater proportion of the humanities scholars
who work on data produced by themselves are found in cluster 2 (41.5%),
compared to the other nine knowledge production ways. However, many theoretical
natural scientists (32.1%), humanities scholars working on existing data (30.4%), and
qualitative social scientists (29.9%) can also be found in this cluster of GenAI sceptics.
In comparison, far fewer quantitative social scientists (13.8%), experimental natural
scientists (17.8%), basic medical scientists (19%) and experimental technical scientists
(19.3%) are represented in this cluster.

In cluster 3 – “GenAI as a research accelerator” – we find a smaller proportion of
researchers from the humanities (32.9%) compared to the other main areas of research.
For example, in the technical sciences, 44.1% of researchers belong to this cluster, and
in medicine it is 42.9%. If we look at the 10 different knowledge production ways, the
humanities scholars are joined by the qualitative social scientists (33.3%) and the
theoretical natural scientists (36.2%) in being proportionally less represented in this
cluster compared to other knowledge production ways.

Figure 4. Demographic distribution of respondents over factor clusters. Each heatmap shows the distribution of
research age, gender, knowledge production way and research field for the three clusters of observations identified
in the factor analysis.
4. Discussion
In this study, we set out to explore the use and assessment of GenAI in various research
practices across gender, seniority, main areas of research, and knowledge production
ways. The results show that there were no or only very minor differences in the use and
assessment in relation to gender. Some variations were found related to seniority and
bigger differences were found related to the different main areas of research. As Figure 2
shows, both the patterns of use and assessment are fairly similar within and across main
areas of research.

An interesting finding of the study, which nuances these minor differences, is that the
largest disagreement about how to assess the research integrity of particular use cases
– how good a practice they are perceived to be – is found among non-users and
infrequent users. This means that agreement in assessment increases with use, and it
indicates that familiarity with GenAI leads to similar, usually more positive, evaluations.
This points to a need for more training for researchers within all fields; a need which is
also echoed in a number of the qualitative comments to the survey.

This difference in use – that some use GenAI more than others – is further highlighted in
the factor analysis presented above, where three main clusters of GenAI users in Danish
academia are identified: “GenAI as a work horse”, “GenAI as a language assistant only”,
and “GenAI as a research accelerator”. Interestingly, the use patterns across the
clusters are remarkably similar, but the degree of use differs. This means that the
researchers in the three clusters use (and do not use) GenAI for the same things, but to
varying degrees. We also observe a moderately positive correlation between research
integrity assessment of use cases and reported use of the same cases. Our data do not
allow identification of the direction of causality (i.e. whether more use creates a more
positive view of GenAI or the other way around). Nevertheless, this correlation is only
moderate, and many respondents report positive assessments of use cases but no actual
use of them. This suggests that non-use is often reported not because of research
integrity concerns but for other reasons, such as a lack of awareness that GenAI can
be used for a given purpose, insufficient skills in using GenAI, or a lack of confidence
that peers will approve of GenAI usage, along with concerns about potential negative
consequences.

However, the factor analysis also reveals some interesting differences between
disciplines and different ways of producing knowledge across the three clusters. These
differences are most pronounced in “GenAI as a research accelerator” (Cluster 3) and in
Cluster 2, “GenAI as a language assistant only”. In Cluster 3, the most GenAI-positive
group, we find mostly researchers from the technical and medical sciences, as well as
quantitative social scientists and experimental natural scientists. In Cluster 2, on the
other hand, we have more researchers from the humanities, qualitative social science,
and theoretical natural science compared to other knowledge production ways. This
pattern might reflect important differences in the way in which knowledge is produced;
in the methods used and the overall approach to doing research, including the normative
frameworks associated with these diverse approaches to knowledge production.

This difference can be described as a difference between nomothetic and idiographic
research areas (Windelband, 2016 [1894]) – or perhaps more precisely, between more
positivist ways of doing research on the one hand, and interpretative approaches on the
other. It seems clear that the more interpretivist researchers are more sceptical
towards GenAI, and that they also use the “neutral” option more than the other clusters.
This may be because some use cases, e.g. generating hypotheses or suggesting
experimental parameters, are seen as irrelevant to their research approach.

Our study also demonstrates the complex interplay between regulations and community
norms in shaping responsible GenAI use. While top-down regulations can provide a clear
framework for good research practices, their effectiveness is contingent on their
alignment with the values and practices of the research community. The case of peer
review exemplifies this dynamic. It is among the few use cases surveyed in our study for
which clear guidelines exist, prohibiting the use of GenAI for this purpose (Nogueira &
Rein, 2024). Simultaneously, we observe that this use case is among those attracting the
strongest moral objections, perhaps influenced by the guidelines themselves. However,
it is equally plausible that the guidelines were formulated in response to perceived pre-existing
community concerns. This interplay highlights the need for a balanced approach
to regulating GenAI in research. While top-down frameworks may help shape standards,
rigid frameworks that disregard community norms risk being ineffective or even
counterproductive (Horbach et al., 2023). Conversely, a purely bottom-up approach may
lead to inconsistent practices and tensions between diverse research areas. In addition,
our respondents indicate a clear desire for more support from their institutions (e.g.
training and access to relevant infrastructure) to allow for well-considered and
responsible use of GenAI for research purposes.

4.1. Limitations
While our study provides valuable insights into the use and assessment of research
integrity of GenAI in the research process across various research fields in the Danish
university context, it is important to acknowledge a number of limitations that may
influence the interpretation of the findings and their generalizability.

First, there might be different interpretations among survey participants of what
constitutes a GenAI tool. While many researchers thought of tools like ChatGPT
when filling in the survey, others had more general tools in mind, like Grammarly. Other
researchers note that they had highly specialized tools in mind, developed for particular
research tasks. This variation can influence the reported use and research integrity
assessments of the tools. Similar considerations might have affected the interpretation
of use cases, which might have different connotations within different knowledge
production ways and epistemic cultures (e.g. the creation of ‘synthetic data’).

Second, the study is based on responses from a specific subset of Danish university
researchers. The sample may not be fully representative of the entire Danish academic
community, considering potential biases in who chose to participate in the survey. The
low number of respondents in certain categories, e.g. the non-binary and “do not wish to
disclose gender” options, also constrains the ability to draw conclusions for these
specific groups. Moreover, respondents might underreport or overreport their use of
GenAI tools and their research integrity assessments due to personal beliefs or
perceived expectations. Research integrity assessments are subjective and vary based
on individual values, backgrounds, and disciplinary/field norms (Gray, 2024). This
diversity may lead to a wide range of integrity assessments for similar use cases. The
distribution of research integrity assessments indicates that there are many different
opinions on GenAI use, which might be influenced by individual skills and experiences,
field and disciplinary standards, and personal ethics. Social desirability might, for
example, have played a role in how researchers answered.

Third, the field of GenAI is evolving fast, and tools and their applications can change
drastically over short periods. New tools emerge and existing ones are updated,
potentially changing use patterns and research integrity perceptions. Therefore, our
findings should be considered a snapshot of the state of affairs at a specific time and
context, not necessarily generalizable to other settings.

Fourth, our study does not extensively explore the influence of cultural and institutional
factors on the use of GenAI tools and the research integrity assessment of use cases.
Universities may have different policies and support structures that impact how
researchers engage with the tools.

Finally, while we studied a broad array of GenAI use cases, there are obviously other
potential applications of GenAI in research that were not covered in our study. Future
studies could expand the range of GenAI tools and use cases to provide a more
comprehensive picture.

5. References
Al-Zahrani, A. M. (2023). The impact of generative AI tools on researchers and research:
Implications for academia in higher education. Innovations in Education and
Teaching International, 1-15. [Link]
Bin-Nashwan, S. A., Sadallah, M., & Bouteraa, M. (2023). Use of ChatGPT in academia:
Academic integrity hangs in the balance. Technology in Society, 75.
[Link]
Birhane, A., Kasirzadeh, A., Leslie, D., & Wachter, S. (2023). Science in the age of large
language models. Nature Reviews Physics, 5(5), 277-280.
[Link]
Buuren, S. v., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by
Chained Equations in R. Journal of Statistical Software, 45(3).
[Link]
Else, H. (2023). Abstracts written by ChatGPT fool scientists. Nature, 613(7944), 423-
423. [Link]
European Commission, & European Research Council Executive Agency. (2023). Use
and impact of artificial intelligence in the scientific process – Foresight.
Publications Office of the European Union.
Gao, C. A., Howard, F. M., Markov, N. S., Dyer, E. C., Ramesh, S., Luo, Y., & Pearson, A.
T. (2023). Comparing scientific abstracts generated by ChatGPT to real abstracts
with detectors and blinded human reviewers. npj Digital Medicine, 6(1).
[Link]
Gray, A. (2024). ChatGPT "contamination": estimating the prevalence of LLMs in the
scholarly literature. arXiv. [Link]
Hepkema, W. M., Horbach, S. P. J. M., Hoek, J. M., & Halffman, W. (2021). Misidentified
biomedical resources: Journal guidelines are not a quick fix. International Journal
of Cancer, 150(8), 1233-1243. [Link]
Horbach, S. P. J. M., Sørensen, M. P., Allum, N., & Reid, A.-K. (2023). Disentangling the
local context—imagined communities and researchers’ sense of belonging.
Science and Public Policy, 50(4), 695-706.
[Link]
Hutson, M. (2023). Rules to keep AI in check: nations carve different paths for tech
regulation. Nature, 620(7973), 260-263. [Link]
Kim, J. K., Chua, M., Rickard, M., & Lorenzo, A. (2023). ChatGPT and large language model
(LLM) chatbots: The current state of acceptability and a proposal for guidelines on
utilization in academic medicine. Journal of Pediatric Urology, 19(5), 598-604.
[Link]
Korinek, A. (2023). Generative AI for Economic Research: Use Cases and Implications for
Economists. Journal of Economic Literature, 61(4), 1281-1317.
[Link]
Larosa, F., Hoyas, S., García-Martínez, J., Conejero, J. A., Fuso Nerini, F., & Vinuesa, R.
(2023). Halting generative AI advancements may slow down progress in climate
research. Nature Climate Change, 13(6), 497-499.
[Link]
Messeri, L., & Crockett, M. J. (2024). Artificial intelligence and illusions of understanding
in scientific research. Nature, 627(8002), 49-58. [Link]
Nogueira, L. A., & Rein, J. O. (2024). Chatbots: To Cite Or Not To Cite? (Part 1). The
Scholarly Kitchen. [Link]
Nordling, L. (2023). How ChatGPT is transforming the postdoc experience. Nature,
622(7983), 655-657. [Link]
Owens, B. (2023). How Nature readers are using ChatGPT. Nature, 615(7950), 20-20.
[Link]
Peres, R., Schreier, M., Schweidel, D., & Sorescu, A. (2023). On ChatGPT and beyond:
How generative artificial intelligence may affect research, teaching, and practice.
International Journal of Research in Marketing, 40(2), 269-275.
[Link]
Petricini, T., Wu, C., & Zipf, S. T. (2023). Perceptions About Generative AI and ChatGPT
Use by Faculty and College Students. Open Science Framework.
[Link]
Ravn, T., & Sørensen, M. P. (2021). Exploring the Gray Area: Similarities and Differences
in Questionable Research Practices (QRPs) Across Main Areas of Research.
Science and Engineering Ethics, 27(4). [Link]
Revelle, W. (2024). psych: Procedures for Psychological, Psychometric, and Personality
Research (R package). CRAN. [Link]
Schneider, J. W., Allum, N., Andersen, J. P., Petersen, M. B., Madsen, E. B., Mejlgaard, N.,
& Zachariae, R. (2024). Is something rotten in the state of Denmark? Cross-
national evidence for widespread involvement but not systematic use of
questionable research practices across all fields of research. PLoS ONE, 19(8).
[Link]
Schneider, J. W., Sørensen, M. P., Horbach, S. P. J. M., Andersen, J. P., Fishberg, R.,
Graversen, E. K., Degn, L., & Schmidt, E. K. (2024). Research Integrity AI Survey.
Open Science Framework. [Link]
Shaw, C., Yuan, L., Brennan, D., Martin, S., Janson, N., Fox, K., & Bryant, G. (2023). GenAI
in Higher Education – Fall 2023 Update. Turnitin. [Link]
Van Noorden, R., & Perkel, J. M. (2023). AI and science: what 1,600 researchers think.
Nature, 621(7980), 672-675. [Link]
Watermeyer, R., Phipps, L., Lanclos, D., & Knight, C. (2023). Generative AI and the
Automating of Academia. Postdigital Science and Education, 6(2), 446-466.
[Link]
Windelband, W. (2016 [1894]). History and Natural Science. Theory & Psychology, 8(1),
5-22. [Link]
Xie, W. J., & Warshel, A. (2023). Harnessing generative AI to decode enzyme catalysis and
evolution for enhanced engineering. National Science Review, 10(12).
[Link]
Supplementary table 1. Gender and research field of respondents and study population.

Research field | Population N, men | Population N, women | Sample N, men | Sample N, women | Population %, men | Population %, women | Sample %, men | Sample %, women | Ratio, men | Ratio, women
Technical & natural sciences | 4047 | 1574 | 815 | 342 | 35.8% | 13.9% | 33.1% | 13.9% | 0.92 | 1.00
Health sciences | 1282 | 951 | 245 | 279 | 17.5% | 24.0% | 16.6% | 28.2% | 0.95 | 1.18
Social sciences | 1308 | 861 | 283 | 215 | 17.8% | 21.7% | 19.2% | 21.7% | 1.08 | 1.00
Humanities | 707 | 580 | 132 | 153 | 9.6% | 14.6% | 8.9% | 15.5% | 0.93 | 1.06
Total | 7344 | 3966 | 1475 | 989

Note: Numbers on the study population are taken from the “Talent barometer” of Danish universities: [Link] [Link]. This publication combines the technical and natural sciences, so the combined category is also used for the sample.
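The Ratio columns follow directly from the percentage columns: each ratio is the sample share divided by the corresponding population share. For men in the technical and natural sciences, for example:

\[
\text{Ratio} = \frac{\text{sample share}}{\text{population share}} = \frac{33.1\%}{35.8\%} \approx 0.92
\]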

Supplementary table 2. Gender and career stage of respondents and study population.

Career step (population) | Population N, men | Population N, women | Career age (sample) | Sample N, men | Sample N, women | Population %, men | Population %, women | Sample %, men | Sample %, women | Ratio, men | Ratio, women
Postdoc and assistant professor | 2583 | 1852 | Starting | 378 | 281 | 23% | 16% | 22% | 17% | 0.98 | 1.01
Associate professor | 2882 | 1506 | Consolidator | 202 | 132 | 25% | 13% | 12% | 8% | 0.47 | 0.59
Full professor | 1874 | 608 | Advanced | 505 | 194 | 17% | 5% | 30% | 11% | 1.80 | 2.13
Total | 7339 | 3966 | | 1085 | 607

Note: Numbers on the study population are taken from the “Talent barometer” of Danish universities: [Link] [Link]. This publication uses career steps, which are not identical to the career age categories used in this study.
Supplementary table 3. Years since PhD of respondents.

Years since PhD N %


1 176 6.9%
2 126 5.0%
3 99 3.9%
4 87 3.4%
5 63 2.5%
6 62 2.4%
7 63 2.5%
8 59 2.3%
9 61 2.4%
10 61 2.4%
11 56 2.2%
12 50 2.0%
13 52 2.1%
14 60 2.4%
15 56 2.2%
16 39 1.5%
17 36 1.4%
18 25 1.0%
19 36 1.4%
20 40 1.6%
21 30 1.2%
22 35 1.4%
23 25 1.0%
24 30 1.2%
25 23 0.9%
26 24 0.9%
27 20 0.8%
28 24 0.9%
29 17 0.7%
30 22 0.9%
31 18 0.7%
32 28 1.1%
33 17 0.7%
34 13 0.5%
35 14 0.6%
36 11 0.4%
37 6 0.2%
38 11 0.4%
39 3 0.1%
40+ 57 2.2%
Current PhD student 796 31.4%
Missing 3 0.1%
Total 2534
Supplementary table 4. Summary statistics of all use cases.

Use case | Mean | SD | Median | Research integrity assessment, own use | Research integrity assessment, other’s use
help identify gaps in current research [idea1] | 3.97 | 1.69 | 4 | 0.15 | 0.37
help identify relevant literature [idea2] | 4.39 | 1.82 | 5 | 0.28 | 0.56
help summarize or analyse existing literature [idea3] | 4.48 | 1.74 | 5 | 0.35 | 0.70
help identify potential collaborators [idea4] | 4.08 | 1.56 | 4 | 0.03 | 0.14
help propose new hypotheses [idea5] | 3.50 | 1.74 | 4 | 0.11 | 0.33
suggest a structure for research proposals [rd1] | 4.62 | 1.52 | 5 | 0.18 | 0.58
help draft parts of a research proposal [rd2] | 4.24 | 1.68 | 4 | 0.21 | 0.69
refine or edit language of research proposals [rd3] | 5.58 | 1.37 | 6 | 0.47 | 0.85
refine or edit content of research proposals [rd4] | 4.33 | 1.74 | 4 | 0.22 | 0.69
help design research methodology [rd5] | 3.44 | 1.65 | 4 | 0.08 | 0.30
help develop theoretical models or conceptual frameworks [rd6] | 3.26 | 1.74 | 3 | 0.07 | 0.29
help design experiments [rd7] | 3.49 | 1.70 | 4 | 0.06 | 0.27
suggest experimental parameters [dc1] | 4.00 | 1.55 | 4 | 0.06 | 0.29
help formulate questions for surveys or interviews [dc2] | 4.51 | 1.55 | 5 | 0.10 | 0.43
generate synthetic data sets [dc3] | 4.05 | 1.90 | 4 | 0.07 | 0.36
transcribe recordings of research material (e.g. interviews, workshops or focus groups) [dc4] | 5.11 | 1.65 | 5 | 0.13 | 0.54
identify ethical issues in research (either your own or someone else’s) [dc5] | 3.61 | 1.76 | 4 | 0.04 | 0.17
create or edit software code for data analysis [da1] | 5.40 | 1.50 | 6 | 0.38 | 0.80
create or edit simulation software code [da2] | 5.28 | 1.51 | 6 | 0.19 | 0.71
support statistical data analysis [da3] | 5.06 | 1.55 | 5 | 0.25 | 0.69
help pattern recognition in data [da4] | 5.18 | 1.50 | 5 | 0.15 | 0.62
create or modify scientific figures or images [da5] | 4.51 | 1.87 | 5 | 0.16 | 0.62
suggest a structure for a research article [pub1] | 4.64 | 1.48 | 5 | 0.16 | 0.68
help draft parts of a research article [pub2] | 4.04 | 1.79 | 4 | 0.25 | 0.75
propose a title, abstract or keywords for your article [pub3] | 4.93 | 1.54 | 5 | 0.35 | 0.80
edit a research article to improve readability and/or language [pub4] | 5.47 | 1.45 | 6 | 0.48 | 0.87
format references [pub5] | 5.13 | 1.63 | 5 | 0.09 | 0.54
identify strengths and weaknesses in a manuscript during the peer review process [pub6] | 3.67 | 1.78 | 4 | 0.07 | 0.44
help write review reports during the peer review process [pub7] | 3.38 | 1.78 | 3 | 0.07 | 0.46
translate one of your research papers into a different language [pub8] | 4.90 | 1.63 | 5 | 0.14 | 0.58
help create (parts of) a slide deck for a conference talk or similar academic event [pub9] | 4.62 | 1.52 | 5 | 0.10 | 0.58
help create lay summaries or similar non-academic writing for public engagement, based on your own texts [pub10] | 5.01 | 1.56 | 5 | 0.22 | 0.67
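As a minimal sketch of how the Mean, SD and Median columns above can be computed, the following R code operates on a simulated stand-in data frame called `ratings` (hypothetical 1-7 scores for 32 items; this is not the authors' analysis script or data).

# Illustrative stand-in for the item-level survey data: 200 respondents
# rating 32 use cases on a 1-7 scale (hypothetical values only).
set.seed(1)
ratings <- as.data.frame(matrix(sample(1:7, 200 * 32, replace = TRUE),
                                ncol = 32))
names(ratings) <- paste0("item", seq_len(32))

# Per-item mean, standard deviation and median, as in the table above.
summaries <- data.frame(
  mean   = round(sapply(ratings, mean), 2),
  sd     = round(sapply(ratings, sd),   2),
  median = sapply(ratings, median)
)
head(summaries)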
Supplementary figure 1. Scatter plot of aggregated assessment and use.
Supplementary figure 2. Number of scientists using different AI tools.
Supplementary figure 3. Percent of men and women who use different AI tools.
Supplementary figure 4. Percent of AI tool users for each research age.
Supplementary figure 5. Percent of AI tool users for each research field.
Supplementary figure 6. Percent of AI tool users for inter- and multidisciplinary work.
Supplementary figure 7. Parallel analysis scree plot.
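A scree plot of this kind can be generated, for instance, with the psych package for R. The sketch below reuses the illustrative `ratings` data frame defined after Supplementary table 4; it shows the mechanics only, not the authors' actual analysis.

library(psych)

# Parallel analysis: compare observed eigenvalues with those from
# resampled data to suggest how many factors to retain.
fa.parallel(ratings, fm = "minres", fa = "fa")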
Supplementary figure 8. Correspondence between observed and multiply imputed research integrity assessment scores.
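The paper does not state which software produced the multiple imputations; as a purely illustrative sketch, the following uses the mice package for R on a hypothetical data frame `assessments` containing missing values.

library(mice)

# Hypothetical assessment items with missing values (not the study's data).
set.seed(1)
assessments <- as.data.frame(matrix(sample(c(1:7, NA), 200 * 5,
                                           replace = TRUE), ncol = 5))
names(assessments) <- paste0("ria", seq_len(5))

# Five imputations via predictive mean matching; extract the first
# completed data set for comparison with the observed scores.
imp <- mice(assessments, m = 5, method = "pmm", seed = 1, printFlag = FALSE)
completed <- complete(imp, 1)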
Supplementary figure 9. Factor loadings across all use cases.
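Loadings of the kind summarised in this figure correspond to output from psych's fa(). The sketch below again reuses the illustrative `ratings` data; the choice of three factors is a hypothetical placeholder (in practice it would follow from the parallel analysis), not the study's result.

library(psych)

# Extract and rotate a factor solution, then print the loadings,
# hiding small values for readability.
fit <- fa(ratings, nfactors = 3, rotate = "varimax", fm = "minres")
print(fit$loadings, cutoff = 0.3)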
