Davitti, E., Korybski, T., & Braun, S. (Eds.) (2025). The Routledge Handbook of Interpreting, Technology and AI. Routledge.

The Routledge Handbook of Interpreting, Technology and AI offers a comprehensive exploration of the intersection between interpreting and technology, focusing on the impact of AI and digital tools on the field. It includes contributions from global experts and covers various aspects such as technology-enabled interpreting, interpreter training, and ethical implications. This handbook serves as an essential resource for students, researchers, and professional interpreters seeking to understand the evolving landscape of interpreting practice.

“Amid a plethora of handbooks, this volume is particularly timely as a much-needed stock-taking of technological developments that have been and will be shaping the way interpreting is practiced and future technology-using professionals are educated to enable communication in a variety of settings.”
Franz Pöchhacker, University of Vienna, Austria
THE ROUTLEDGE HANDBOOK OF
INTERPRETING, TECHNOLOGY AND AI

This handbook provides a comprehensive overview of the history, development, use, and
study of the evolving relationship between interpreting and technology, addressing the
challenges and opportunities brought by advances in AI and digital tools.
Encompassing a variety of methods, systems, and devices applied to interpreting as a
field of practice as well as a study discipline, this volume presents a synthesis of current
thinking on the topic and an understanding of how technology alters, shapes, and enables
the interpreting task. The handbook examines how interpreting has evolved through the
integration of both purpose-built and adapted technologies that support, automate, or
even replace (human) interpreting tasks and offers insights into their ethical, practical, and
socio-economic implications. Addressing both signed and spoken language interpreting
technologies, as well as technologies for language access and media accessibility, the book
draws together expertise from varied areas of study and illustrates overlapping aspects of
research.
Authored by a range of practising interpreters and academics from across five continents,
this is the essential guide to interpreting and technology for both advanced students and
researchers of interpreting and professional interpreters.

Elena Davitti is Associate Professor of Translation Studies at the Centre for Translation
Studies at the University of Surrey, Co-Director of the Leverhulme Doctoral Network
‘AI-Enabled Digital Accessibility’ (ADA), and Co-Editor of the journal Translation,
Cognition & Behavior.

Tomasz Korybski is Assistant Professor at the Institute of Applied Linguistics at the University of Warsaw, Visiting Researcher at the Centre for Translation Studies at the University of Surrey, and a conference interpreter/translator with over 20 years’ experience.

Sabine Braun is Professor of Translation Studies and the Director of the Centre for
Translation Studies at the University of Surrey, Co-Director of the Surrey Institute for
People-Centred AI, and Director of the Leverhulme Doctoral Network ‘AI-Enabled Digital
Accessibility’ (ADA).
ROUTLEDGE HANDBOOKS IN TRANSLATION
AND INTERPRETING STUDIES

Routledge Handbooks in Translation and Interpreting Studies provide comprehensive overviews of the key topics in translation and interpreting studies. All entries for the
handbooks are specially commissioned and written by leading scholars in the field. Clear,
accessible and carefully edited, Routledge Handbooks in Translation and Interpreting
Studies are the ideal resource for both advanced undergraduates and postgraduate students.

THE ROUTLEDGE HANDBOOK OF TRANSLATION AND SEXUALITY
Edited by Brian Baer and Serena Bassi

THE ROUTLEDGE HANDBOOK OF CORPUS TRANSLATION STUDIES
Edited by Defeng Li and John Corbett

THE ROUTLEDGE HANDBOOK OF INTERPRETING AND COGNITION
Edited by Christopher D. Mellinger

THE ROUTLEDGE HANDBOOK OF TRANSLATION AND SOCIOLOGY
Edited by Sergey Tyulenev and Wenyan Luo

THE ROUTLEDGE HANDBOOK OF CHINESE INTERPRETING
Edited by Riccardo Moratto and Cheng Zhan

THE ROUTLEDGE HANDBOOK OF TRANSLATION AND CENSORSHIP
Edited by Denise Merkle and Brian Baer

THE ROUTLEDGE HANDBOOK OF TRANSLATION AND YOUNG AUDIENCES
Edited by Michał Borodo and Jorge Díaz-Cintas

THE ROUTLEDGE HANDBOOK OF INTERPRETING, TECHNOLOGY AND AI
Edited by Elena Davitti, Tomasz Korybski and Sabine Braun

For a full list of titles in this series, please visit www.routledge.com/Routledge-Handbooks-in-Translation-and-Interpreting-Studies/book-series/RHTI
THE ROUTLEDGE
HANDBOOK OF
INTERPRETING,
TECHNOLOGY AND AI

Edited by Elena Davitti, Tomasz Korybski and Sabine Braun
Designed cover image: Getty Images
First published 2025
by Routledge
4 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
605 Third Avenue, New York, NY 10158
Routledge is an imprint of the Taylor & Francis Group, an informa
business
© 2025 selection and editorial matter, Elena Davitti, Tomasz Korybski
and Sabine Braun; individual chapters, the contributors
The right of Elena Davitti, Tomasz Korybski, and Sabine Braun to be
identified as the authors of the editorial material, and of the authors for
their individual chapters, has been asserted in accordance with sections
77 and 78 of the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced
or utilised in any form or by any electronic, mechanical, or other means,
now known or hereafter invented, including photocopying and recording,
or in any information storage or retrieval system, without permission in
writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or
registered trademarks, and are used only for identification and explanation
without intent to infringe.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN: 978-0-367-51300-9 (hbk)
ISBN: 978-0-367-51301-6 (pbk)
ISBN: 978-1-003-05324-8 (ebk)
DOI: 10.4324/9781003053248
Typeset in Sabon
by Apex CoVantage, LLC
CONTENTS

List of contributors x
Acknowledgments xiv

Introduction 1
Elena Davitti, Tomasz Korybski and Sabine Braun

PART I
Technology-enabled interpreting 9

1 Telephone interpreting 11
Raquel Lázaro Gutiérrez

2 Video-mediated interpreting 30
Sabine Braun

3 Remote simultaneous interpreting 51
Agnieszka Chmiel and Nicoletta Spinolo

4 Video relay service 67
Camilla Warnicke

5 Portable interpreting equipment 79
Tomasz Korybski

6 Technology-enabled consecutive interpreting 91
Cihan Ünlü

7 Tablet interpreting 108
Francesco Saina

PART II
Technology and interpreter training 121

8 Computer-assisted interpreting (CAI) tools and CAI tools training 123
Bianca Prandi

9 Digital pens for interpreter training 145
Marc Orlando

10 Technology for training in conference interpreting 156
Amalia Amato, Mariachiara Russo, Gabriele Carioli and Nicoletta Spinolo

PART III
Technology for (semi-)automating interpreting workflows 179

11 Technology for hybrid modalities 181
Elena Davitti

12 Machine interpreting 209
Claudio Fantinuoli

PART IV
Technology in professional interpreting settings 227

13 Conference settings 229
Kilian G. Seeber

14 Healthcare settings 247
Esther de Boe

15 Legal settings 265
Jérôme Devaux

16 Immigration, asylum, and refugee settings 282
Diana Singureanu and Sabine Braun

PART V
Current issues and debates 303

17 Quality-related aspects 305
Elena Davitti, Tomasz Korybski, Constantin Orăsan and Sabine Braun

18 Ethical aspects 327
Deborah Giustini

19 Cognitive aspects 348
Christopher D. Mellinger

20 International and professional standards 364
Verónica Pérez Guarnieri and Haris N. Ghinos

21 Workflows and working models 388
Anja Rütten

22 Ergonomics and accessibility 407
Wojciech Figiel

Index 423

CONTRIBUTORS

Amalia Amato is an associate professor at the Department of Interpreting and Translation of Bologna University. She has participated in five EU-funded research projects on legal
interpreting, remote interpreting, and interpreting for unaccompanied migrant children.
Her research interests also include interpreter education/training and media interpreting.
She serves on inTRAlinea’s editorial board.
Sabine Braun is a professor of translation studies, the director of the Centre for Trans-
lation Studies at the University of Surrey, the co-director of the Surrey Institute for
People-Centred AI, and the director of the Leverhulme Doctoral Network ‘AI-Enabled
Digital Accessibility’ (ADA). Her research explores human–machine interaction and
integration in interpreting and media accessibility.
Gabriele Carioli is an IT specialist in the Department of Interpreting and Translation of
Bologna University. He develops web tools, particularly for interpreting, using PHP and
NodeJS. He has advanced expertise in Linux systems and in databases and also develops
applications in C/C++, Golang, and ObjectPascal. He is currently developing ReBooth
2.0 and WhisperGUI.
Agnieszka Chmiel is an associate professor at the Department of Translation Studies at
Adam Mickiewicz University in Poznań, Poland, and an associate editor of Target. She
conducts experimental and corpus-based research on cognitive load in simultaneous
interpreting, multimodality, and the use of technology in remote interpreting.
Elena Davitti is an associate professor of translation studies at the Centre for Transla-
tion Studies, University of Surrey; co-director of the Leverhulme Doctoral Network
‘AI-Enabled Digital Accessibility’ (ADA); and co-editor of the journal Translation, Cog-
nition & Behavior. Her main research interests lie in hybrid human–AI practices for
multilingual, multimodal, and accessible communication in real time.
Esther de Boe is a tenure-track professor at the University of Antwerp. Her research focuses on technology-mediated interpreting (e.g. Interactional Dynamics in Remote Interpreting: Micro-analytical Approaches [Routledge, 2024]). She is a board member of the European Network of Public Service Interpreting and Translation and previously worked as a sworn interpreter.
Jérôme Devaux is a senior lecturer in French and translation studies at the Open University
(UK). His research interest lies at the intersection of interpreting studies, technology,
and social justice. He has published papers on various topics, including the interpreter’s
role(s) and technologies, legal interpreting, ethics, and interpreting training.
Claudio Fantinuoli is a researcher at the University of Mainz, CTO at KUDO Inc., and
consultant for international organisations. He works in the area of natural language
processing applied to human and machine interpreting (computer-assisted interpreting,
speech recognition, speech translation).
Wojciech Figiel is an assistant professor at the Institute of Applied Linguistics, University
of Warsaw. He has authored numerous publications on accessibility, the sociology of
translation, and disability studies. In particular, he has researched the accessibility of
translational professions for blind and low-sighted persons.
Haris Ghinos is the CEO of ELIT Language Services, vice president of the International
Association of Conference Interpreters (AIIC), and project leader of ISO 23155:2022
on conference interpreting services. Haris is also a consultant interpreter with Calliope
Interpreters and a former staff interpreter with SCIC, the European Commission. He
holds degrees in physics, war studies, international politics, and finance.
Deborah Giustini is an assistant professor in intercultural communication at HBKU and
a research fellow in interpreting studies at KU Leuven. Her research explores the digi-
talisation of multilingual work. She serves on the IATIS executive council and the edito-
rial boards of Interpreting & Society, Sociology, The British Journal of Sociology, and
Sociological Research Online.
Tomasz Korybski is an assistant professor at the Institute of Applied Linguistics, University
of Warsaw; a visiting researcher at the Centre for Translation Studies, University of Sur-
rey; and a conference interpreter/translator with over 20 years’ experience. His research
interests include the evaluation of interpreting quality and the applicability of AI-based
solutions in the provision of interpreting services.
Raquel Lázaro Gutiérrez is an associate professor and the director of the Department of
Modern Philology at the University of Alcalá. She is a member of the FITISPos-UAH
Research Group and the vice president of ENPSIT. She has been PI of several projects, such
as ‘Corpus pragmatics and telephone interpreting’ (2023–2026) and MHEALTH4ALL
(2022–2025).
Christopher Mellinger is an associate professor at the University of North Carolina at Char-
lotte. He is the co-author of Quantitative Research Methods in Translation and Interpret-
ing Studies and the editor of The Routledge Handbook of Interpreting and Cognition. He
also serves as a co-editor of the journal Translation and Interpreting Studies.
Constantin Orăsan is a professor of language and translation technologies at the Centre of
Translation Studies, University of Surrey. He has over 25 years of experience in natural
language processing and artificial intelligence. His current research focuses on the use of
large language models and automatic speech recognition in translation and interpreting.
Marc Orlando is a full professor of translation and interpreting studies and the programme
director in the Department of Linguistics at Macquarie University, Sydney. He sits on
the editorial boards of The Interpreters’ Newsletter and Interpreting and Society. His
research investigates synergies between T&I practice, research, and training, with a
focus on digital technologies.
Verónica Pérez Guarnieri, PhD, is a lead auditor and standards expert with over a decade of
experience. She authored ISO 18841 and convenes ISO TC37/SC5/WG2. As a consult-
ant interpreter and translator, she has lectured globally on standardisation and certifica-
tion and helped establish mirror committees and language industry standards in several
countries.
Bianca Prandi is a postdoctoral assistant at the University of Innsbruck. She researches
human–machine interaction in interpreting and has published on computer-assisted
interpreting and cognition. She has cooperated with the Universities of Vienna, Trieste, and
Bologna. She is a member of the EST and of the scientific board of TransActions.
Mariachiara Russo is a professor of Spanish language and interpretation at the Department
of Interpreting and Translation of the University of Bologna–Forlì Campus. She is also the
coordinator of the European Parliament Interpreting Corpus (EPIC), a co-coordinator of
the EU-funded project SHIFT in Orality (https://site.unibo.it/shiftinorality/en), and the
co-creator of UNIC (Unified Interpreting Corpus; http://unic.dipintra.it).
Anja Rütten holds a professorship at the University of Applied Sciences in Cologne, Ger-
many, focusing on knowledge management, terminology, and the use of technologies in
conference interpreting. She has been working for the institutional and private market
for over 20 years and is a member of AIIC’s AI workstream.
Francesco Saina is an Italian linguist, translator, and interpreter with English, French, and
Spanish. A university lecturer, he collaborates on academic and industrial research on
translation and interpreting technology and natural language processing. His work on
the applications of digital technology to the language professions has been published and
presented at international conferences.
Kilian Seeber is a professor of interpreting at the University of Geneva’s Faculty of Transla-
tion and Interpreting, where he serves as the vice dean and as the programme director
for the master of advanced studies in interpreter training programme. He is a principal
investigator at LaborInt and at InTTech.
Diana Singureanu is a researcher at the Centre for Translation Studies, University of Sur-
rey. She has investigated video-mediated interpreting (VMI) in court settings and helped
develop VMI standards and interpreter training through the EU-WEBPSI and EmpASR
projects. Recently, she was awarded a Leverhulme fellowship to explore machine inter-
preting in legal contexts.
Nicoletta Spinolo is an assistant professor in the Department of Interpreting and Transla-
tion, Bologna University. Her research interests include interpreter education, Italian–
Spanish interpreting, and interpreting technologies. With Agnieszka Chmiel, she was
a co-PI in the AIIC-funded ‘Inside the Virtual Booth’ project on the impact of remote
interpreting environments on interpreters’ experience and performance.
Cihan Ünlü is a researcher at İstanbul Yeni Yüzyıl University, Türkiye. His work focuses
on computer-assisted translation, interpreting technologies, machine translation, and
human–computer interaction. He is also pursuing a doctoral degree in translation stud-
ies at Boğaziçi University, Istanbul, where he specialises in interpreting technologies.
Camilla Warnicke is an associate professor, a certified interpreter of Swedish and Swedish
Sign Language, and a deaf-blind interpreter working at Stockholm University’s Institute
for Interpreting and Translation Studies and the Sign Language and Deafblind Inter-
preter Program at Fellingsbro Folkhighschool, Örebro. She is affiliated with Örebro Uni-
versity’s School of Behavioural, Social, and Legal Sciences.

ACKNOWLEDGMENTS

The editors extend their heartfelt thanks to the many contributors from around the world
who generously shared their invaluable insights, expertise, and perspectives. Their contribu-
tions have enriched this volume, ensuring its depth, broad relevance, and forward-looking
angle. Capturing a rapidly evolving topic is no easy task, yet our authors have engaged with
this challenge thoughtfully, helping to create a resource that reflects the dynamic nature of
the field. We are deeply grateful for their dedication and collaboration throughout this pro-
cess. We would also like to express our deep gratitude to the reviewers for their thorough
feedback, which has significantly shaped the quality of this volume and sparked an inspiring
exchange of ideas, even before its publication. A special thank-you goes to Megan Stock-
well, Education and Language Solutions Specialist at Megan Stockwell Language Solu-
tions, for her meticulous editing and proofreading, which ensured clarity and consistency
throughout the chapters. All remaining errors and inconsistencies are entirely our own.

INTRODUCTION
Elena Davitti, Tomasz Korybski and Sabine Braun
DOI: 10.4324/9781003053248-1

This book sets the ambitious goal of providing a comprehensive overview of the evolution,
application, and study of interpreting and technology in the AI era, covering various tech-
nologies regularly encountered in the field, the ways in which they have been integrated into
a variety of settings and workflows, as well as the issues arising as a result of this integra-
tion. Bringing together contributions from authors in 15 countries across five continents,
the volume addresses a wide range of methods, systems, and devices used both in interpret-
ing practice and research, while also engaging with emerging critical issues and debates. At
present, there is no comprehensive synthesis of interpreting and technology that addresses
all these areas concurrently.
In our increasingly technologised world, exploring the intersection between interpreting
practice, research, and technology is vital not only to capture the ways in which an increas-
ing range of digital technologies – from information and communication technologies and
platforms to data-driven language technologies, including AI – have altered real-time multi-
lingual and accessible communication and the interpreters’ tasks across different modes and
settings but also to understand the evolving trends that will continue to shape the interpret-
ing profession. While technology has played a role in interpreting since the 1920s, when it
paved the way for simultaneous interpreting, it has more recently started to truly permeate
the field, in many different forms and with more diversified uses and applications emerging
at a much faster pace than in the past. The rise of AI-powered language technology has
significantly accelerated this process, creating new possibilities for delivering interpreting
services across different settings and modalities via enhanced human–machine interaction,
and even going as far as supposedly replacing the human at the core of these practices.
In this respect, there has been a notable shift from technology developed specifically for
interpreting purposes (e.g. simultaneous interpreting consoles) to technology developed for
other purposes and subsequently adapted for interpreting. These include telephone and
videoconferencing systems, tablets, and portable equipment. Additionally, experimen-
tal efforts now aim to integrate AI-driven technology, such as automatic speech recogni-
tion and machine translation, within interpreting workflows not only to provide support
but also to introduce new hybrid practices of real-time speech-to-text/speech which were
previously unavailable. New research has begun to explore the possibilities afforded by technology from different perspectives, thus developing new lines of enquiry and illustrating the expanding role of technology in professional contexts.
This technological upheaval has intensified the need to categorise and distinguish
between different types of technology based on their functions when applied to interpreting:
technology opening up new ways of delivering interpreting services, such as distance inter-
preting; technology performing an assistive role to the interpreter’s task, with ramifications
for the ways in which interpreters prepare for work and the quality of service rendered;
technology semi-automating interpreting workflows, enabling new hybrid modalities that
cross the boundaries between interpreting and other translation-related practices; but also
technology designed to replace human interpreters, which is gaining momentum despite
being at different development stages.
New platforms and ‘solutions’ are continuously being developed and refined to address
the diverse needs for multilingual support in our globalised era. The rapid pace of these
developments challenges practitioners and industry stakeholders to keep up and adapt to an
ever-evolving landscape. Despite increased mutual collaboration, research is also struggling
to keep up with the pace of the industry, and perceptions related to the implementation,
use, and adaptation required by these solutions vary widely among different stakeholders.
Professional and international organisations are now developing guidelines and standards
to account for the shifting paradigm occasioned by the inclusion of technology in the work
of interpreters. Moreover, the close intersection between technology and interpreting is
leading to a shift in pedagogy, not only in terms of how training is delivered, but also in
the skills required of interpreters hoping to enter the profession. This dynamic environment
requires continuous learning, upskilling, flexibility, and engagement with new tools and
methodologies to ensure that interpreting practices remain relevant and effective amidst the
technological advances transforming the field.
At this juncture, it is crucial to reflect upon and reassess how technology is applied to and
integrated into professional interpreting practice and training as a contribution to securing
the profession’s long-term viability. Technological advancements have undoubtedly opened
up numerous possibilities, but they have also introduced significant challenges that must
be addressed. While halting technological progress is not feasible and outright opposition
would be outdated and shortsighted, the risks associated with the unmonitored adoption of
new technologies must be carefully evaluated. A balanced approach is required – one that
neither glorifies nor vilifies technology but instead carefully considers its affordances and
constraints. This nuanced perspective must weigh the pros and cons of technology within
specific contexts, ensuring that adoption is thoughtful, responsible and tailored to the needs
of various circumstances, avoiding blanket judgments.
A new approach is thus essential – one that integrates technology as part of a broader
solution for inclusive multilingual communication and that harnesses the benefits of tech-
nology while minimising its risks through ethical design. This is particularly true in the
context of (generative) language AI, where the focus must be on developing it safely to cre-
ate content that serves users with diverse linguistic, cultural, sensory, and cognitive needs.
The key ethical principles guiding this approach must include human-centric development,
inclusivity, fairness, accountability, sustainability, and transparency.
Building on these premises and considering the broad scope of enquiry into interpreting
technology and AI and their impact on the profession, there is now a pressing need for a
clear mapping of the current state of the art to gain a comprehensive overview of this rapidly evolving field. This handbook draws on literature in the field of interpreting and related disciplines to synthesise current thinking and examine how technology alters, shapes, and
enables interpreting practice. The volume covers both spoken and signed language inter-
preting technologies, as well as technologies for language access and media accessibility,
highlighting overlapping aspects of research on these topics. The inclusion of authors from
various relevant backgrounds and specialisations, with many being both practising inter-
preters and academics, allows for in-depth insight into these technologies and surrounding
debates. The volume is organised in five sections that give space to both industry and academic stakeholders, so as to cover the main arguments around the complex intersection between interpreting and technology.
Part I, ‘Technology-enabled interpreting’, is dedicated to a range of interpreting modali-
ties that, over time, have facilitated the delivery of interpreting services, including the
modalities now known as distance interpreting. Each of the chapters in this section presents
an overview of the design and development of the underlying technology, the contexts of its
use, and its applications, along with critical issues and emerging trends. These chapters aim
to summarise current interdisciplinary research on each topic while identifying potential
areas for further enquiry. Chapter 1, by Lázaro Gutiérrez, explores telephone interpreting
as both a professional practice and a research area. It examines the evolution of the service,
highlighting its benefits and challenges for interpreters and users, and addresses some key
research issues in this field as well as future prospects driven by technological advances.
In Chapter 2, on video-mediated interpreting, Braun traces the evolution of this distance
interpreting modality and its current applications across different settings, exploring key
research topics in VMI, such as interpreting quality and interactional aspects, interpreter
and user perceptions, human factors, and working conditions. The chapter also discusses
opportunities from integrating AI-powered tools into VMI platforms, the role of audio-
visual communication technology in interpreter education, and training interpreters specifi-
cally for VMI. Chmiel and Spinolo address remote simultaneous interpreting in Chapter 3,
highlighting the shift to platform-based interpreting and related interface design issues;
discuss key issues, including sound quality, cognitive load, stress, teamwork, and multimo-
dality; and conclude by examining future possibilities through the lens of recent AI-related
developments. In Chapter 4, Warnicke explores video relay service, a bimodal interpreting
modality that enables interpreting between a person who uses a signed language via video
link and a speaking participant via telephone. This modality relies on technological devices
such as videophone and telephone to enable and shape the interaction. The chapter provides an overview of the service and its regulatory provisions, and discusses the main implications related to its usage.
have been used and/or adapted to enable the delivery of interpreting services in different
ways, such as in tour guide systems, digital pens and tablets for SimConsec, and speech
recognition technology for SightConsec. In Chapter 5, Korybski explores the evolution and
application of portable interpreting equipment over the past century, tracing the techno-
logical advancements that have shaped this technology and highlighting significant mile-
stones and innovations. Building on existing research in the area, the chapter then examines
the primary contexts and modalities in which portable interpreting equipment is utilised,
while also addressing the inherent limitations, including technical, acoustic, and ethical
constraints, and other user accessibility issues. Additionally, the chapter explores the role
of portable interpreting equipment in the training of novice interpreters and offers a look
forward into future application contexts in a highly technologised environment. Chapter 6,
contributed by Ünlü, focuses on technology-enabled consecutive interpreting driven by advancements in speech technologies, computing power and hardware, and generative AI.
The chapter addresses the technologisation of consecutive interpreting by exploring the
development and implementation of computer-assisted interpreting tools tailored for this
mode and the functionalities and impact of hybrid modalities, digital pen–supported tools,
and automatic speech recognition–assisted solutions. Concluding this part of the volume,
Chapter 7 is devoted to tablet interpreting, a relatively new modality that has come to the
fore with the increasingly widespread adoption of mobile technological devices and the dig-
italisation of most interpreting-related processes and workflows. Saina explores the use of
tablets as a substitute for personal computers and laptops, both in interpreter preparation
and during interpreting assignments, pointing out the unstructured deployment and diver-
sified usage by practitioners. Building on the limited work on new and hybrid interpreting
modalities enabled by using tablets, he reports on early experiences of tablet interpreting
in professional practice and interpreter training and outlines possible future directions in
tablet interpreting research.
Part II, ‘Technology and interpreter training’, addresses technologies designed to sup-
port some aspects of the interpreting task with a view to ensuring quality. In Chapter 8,
Prandi provides an overview of computer-assisted interpreting (CAI) tools and CAI tool
training, focusing on their evolution, application, and impact
on interpreter performance and training. To orient future investigations, the chapter scruti-
nises the existing body of research, spotlighting key enquiries and empirical approaches and
examining the impact of tool use on interpreters’ performance and cognitive processes, as
well as questions of system performance and usability in the context of the recent advances
in AI. Chapter 9 shifts the focus to the use of digital pens for interpreter training. Orlando
discusses how this technology, originally investigated, trialled, and recommended for use
in various fields of education since the early 2000s, made its appearance in interpreting
training only from 2010, particularly in the area of note-taking for consecutive interpret-
ing. The chapter reviews training initiatives undertaken on digital pens in interpreter edu-
cation and discusses the relevance of this technology in relation to more recent tools and
systems. In Chapter 10, Amato, Russo, Carioli, and Spinolo address technology for train-
ing in conference interpreting, providing an overview of the development and applications
of computer-assisted interpreter training (CAIT) tools from the late 1990s to today. The
authors highlight how such tools are used by trainers and how they assist trainees when
practising and honing their skills, as well as their perceived user-friendliness. In light of
the impact that information and communication technologies and AI have on interpreting
trainees, this chapter concludes by emphasising the need to include training in CAI and
CAIT tools, alongside soft skills training, in interpreting curricula.
Part III of the volume, ‘Technology for (semi-)automating interpreting workflows’,
focuses on increasingly automated solutions to support multilingual communication in real
time. This part specifically examines the intersection of interpreting with AI, natural lan-
guage processing, and (neural) machine translation. It first covers hybrid workflows for
real-time speech-to-text communication that rely on varying levels of human–machine inter-
action and explores fully automated machine interpreting, often termed ‘speech translation’
in other fields. In Chapter 11, on technology for hybrid modalities, Davitti examines the
transformative impact of AI-driven technology on interpreting-related workflows, focusing
on the emergence of practices combining speech recognition and machine translation. The
chapter highlights the high demand for real-time speech-to-text interlingual services and the
need to reconceptualise traditional interpreting practices accordingly. It then provides an

Introduction

overview of five key workflows representing new forms of human–AI interaction (HAII),
exploring the collaborative dynamics at play, the need for new skill sets, and the chal-
lenges of ensuring accuracy and reliability, particularly in high-stakes scenarios. Despite
the scarcity of comparative studies on these workflows, the chapter identifies and critically
reviews current research themes and challenges, including opportunities for upskilling lan-
guage professionals to expand their service offerings. In Chapter 12, Fantinuoli focuses on
machine interpreting as a form of automatic speech translation that has the potential to
overcome language barriers in real time but presents a number of challenges, including the
risks associated with providing real-time language mediation without human experts in the
loop, and in terms of responsible and ethical use. After exploring its evolution, challenges,
and potential future applications, the chapter discusses key issues and explores relevant
technological approaches, while also addressing the ethical questions that arise from the
development of artificial interpreting systems.
After grouping specific technologies for interpreting according to their main functions
and addressing each of them individually (Parts I, II, and III), the last two parts of the
volume adopt a different approach. Part IV, ‘Technology in professional interpreting set-
tings’, takes as a point of departure specific contexts in which interpreters regularly work
and the role that technologies play in these settings. Each of these settings has been an
area of enquiry in interpreting studies, but the structure and approach to these chapters
emphasise the ways in which technology has altered the work of the interpreter, and the
outcomes of its implementation in the respective setting. Interpreting in medical or legal
settings, for instance, regularly relies on several of the technologies presented in Parts I and
II and (more recently) III of the volume. Chapters in Part IV allow authors to synthesise
scholarship relative to interpreting technologies in the specific domain or setting to pre-
sent a comprehensive overview of their use, contexts of application, current practices, and
issues specific to these domains. These chapters help situate technology squarely within the
interpreter’s work and integrate the discussion of technology and interpreting rather than
treating each in isolation. Part IV starts with Chapter 13 on conference settings, where
Seeber argues that conference interpreting, given its status and wide recognition as a highly
professionalised practice, has also been the test bed for many new technological develop-
ments. In particular, the high expectations in terms of accuracy and completeness, as well
as the seemingly ever-increasing speed and density of statements delivered at international
conferences, have fostered the integration of technology in this setting, with a view to com-
pensating for the limitations of the human processor. This chapter considers how different
technologies have been developed, introduced, and used in conference interpreting settings
and what part of the process they are likely to have impacted. Chapter 14 shifts the focus
to healthcare settings, which are increasingly being influenced by technology, ranging from
distance interpreting via video link and telephone to machine translation. Yet whereas dis-
tance interpreting in this setting has already been explored to some extent, the impact of
the most recent technological advancements on the quality and organisation of healthcare
remains largely unexplored. In this chapter, De Boe presents examples of various types of
technology-based healthcare communication, discussing current practices and issues associ-
ated with their use, as identified through empirical research, and concluding with a brief
outlook for near-future developments and directions for research, calling for a greater focus
on users and heightened attention to ethical issues raised regarding the use of technol-
ogy in healthcare communication. In Chapter 15, on legal settings, Devaux discusses how
audio- and video-mediated interpreting has altered the way multilingual legal proceedings


are conducted. More specifically, it examines the effect technology has on the legal inter-
preter’s working environment, the interpreting process, and the interpreter’s training. To
this end, it goes beyond distance interpreting by discussing other emerging technologies,
from computer-assisted interpreting tools to machine interpreting. By shedding light on
the transformative role of technology in legal interpreting, this chapter provides a founda-
tion for understanding the current state of technology-driven changes and raises considerations
for the future of the legal interpreting profession. Finally for this part, Chapter 16
explores the multifaceted world of immigration, asylum, and refugee settings. Singureanu
and Braun provide a comprehensive overview of current practice and research relating to
the use of distance interpreting modalities in asylum interviews, refugee health assessments,
and reception centres, as well as of the existing yet currently limited and fragmented
guidelines and training for these practices. In addition, the chapter outlines the emerging
uses of other technologies, such as crowdsourcing platforms and automated services, in
such sensitive and high-stakes contexts.
Part V, ‘Current issues and debates’, concludes the volume by covering a comprehen-
sive range of topics that have been investigated in relation to interpreting and technology.
Each chapter focuses on one of these topics, discussing it in relation to one or more of the
technologies presented in previous parts, based on existing debates and studies. Chapters
in this part are split between those exploring theoretical constructs at the core of debates
around interpreting and technology (Chapters 17–19) and those addressing issues related
to the profession (Chapters 20–22). Each contribution reviews relevant
theoretical backgrounds in addition to the current literature on the topic in interpreting
studies, addressing how technology has enabled and/or altered our understanding of and
approach to these topics in the practice and theory of interpreting, followed by a discus-
sion of critical issues and emerging trends. To start with, Chapter 17, on quality-related
aspects, reviews this central concept within interpreting studies and its crucial relevance
for building a deeper understanding of the influence of different technology-related modali-
ties of interpreting on current and future interpreting practice. In addition, it considers the
uses of technology (automated measures) in the process of interpreting quality assessment
itself. Davitti, Korybski, Orăsan, and Braun synthesise how quality can be evaluated and
measured and consider the added complexity brought by technology. The chapter demon-
strates how technology integration in different interpreting practices makes the process of
assessing quality even more intricate, yet increasingly needed, while also outlining the ben-
efits and drawbacks of technology-driven quality assessment. Chapter 18 explores ethical
aspects in relation to the use and impact of interpreting technologies. Giustini focuses on
this interrelation between the use and impact of technologies, as a matter of ‘technoethics’,
highlighting how the technological tools available in the interpreting profession and the
industry are connected to moral and socio-economic issues, such as employment, working
conditions, and potential automation and substitution of human labour; confidentiality,
training, and corporate ownership of data; bias and linguistic diversity; and technology
use in crisis-prone interpreting settings. It concludes by arguing that while the usefulness
and viability of interpreting technologies should not be negated, there is a necessity for
increased awareness, inclusive guidelines, regulation, and stakeholder collaboration to pro-
mote their fair and ethical deployment. In Chapter 19, Mellinger presents an overview
of current scholarship on cognitive aspects conducted at the intersection of technology and
interpreting, highlighting studies that have explored various aspects of interpreter cognition
as they relate to technology use in the practice and process of interpreting.


The chapter also reviews several key topics associated with technology use, including cog-
nitive load and cognitive effort, cognitive ergonomics, and human–computer interaction,
as well as more situated and contextualised approaches to interpreter cognition. The chap-
ter concludes with a brief discussion of open questions in the field related to interpreting
technologies and cognition – namely, big data and interpreting ethics. Chapter 20 shifts
the focus onto international and professional standards, discussing what has been done in
relation to different modalities to regulate the growing intersection between interpreting
and technology, and what still needs to be addressed. In this chapter, Pérez Guarnieri and
Ghinos address the question of why standards are essential in the field of interpreting,
exploring the evolution and significance of interpreting and related technology standards
in the context of cross-cultural communication. The chapter outlines the historical roots of
these standards, their adaptability across various specialisations, and the numerous benefits
they bring to the interpreting profession. It highlights the crucial role these benchmarks
play in enhancing service quality, protecting interpreters and users, fostering professional
development and trust, and offering valuable insights for individuals and organisations
involved in interpreting communicative events. Additionally, it provides insights into the
development of specific ISO standards, including how standards can contribute to quality
in distance interpreting. Chapter 21 is devoted to workflows and working models. Rütten
explores the impact of technology on the workflow and working models of interpreting, with
a particular focus on conference interpreting. She argues that technological advances have
improved efficiency but increased interpreters’ cognitive load through ‘simultanification’ –
performing multiple tasks simultaneously. While technology simplifies data access and
knowledge acquisition, it also risks overreliance and information overload. Moreover,
while it enhances access to semiotic information, it may distance interpreters from the
communicative contexts in which they work. Additionally, technology blurs traditional
task boundaries, allowing interpreters to handle pre- and post-task activities during the
assignment itself. From a business perspective, it improves client matching and introduces
micropayments and more technical subjects. The volume concludes with Chapter 22, where
Figiel discusses the intersection between ergonomics and accessibility in an evolving context
of technologies for (conference) interpreting. The chapter defines the notions of ergonom-
ics, user experience, usability, and accessibility and applies these to the discussion of both
historical and current issues relating to interpreter workstations and workflow. The chap-
ter places special emphasis on the accessibility of the solutions and practices discussed,
examining the impact that the rise of distance interpreting has on ergonomics, focusing on
areas such as cognitive load, acoustic challenges, and issues with the interfaces of simul-
taneous interpreting delivery platforms. It also covers aspects of ergonomics related to
speech-to-text interpreting and interpreter training.
The volume addresses several types of readership. The first comprises students and researchers
interested in interpreting and, more broadly, in technologies for multilingual communication
access. There are indeed several university courses that include an interpreting and
technology module within their offer, which is testament to the growing importance of
this intersection within the broader field of interpreting studies. This handbook can serve
as a complement to existing handbooks and encyclopaedias on interpreting studies and as
the go-to reference on interpreting technology. The grouping of chapters into five parts,
with symmetrical chapter structures within each part as far as possible, makes the volume
approachable and readable, and the two-pronged approach – that is, presenting both tech-
nology as the specific object of study and the ways in which it has enabled and shaped the


interpreting task, process, and workflow, with its socio-economic implications – makes it
a useful resource for students and researchers of interpreting alike. While advanced under-
graduate and graduate students can use the volume to become familiar with the scholarship
on the topic, researchers can also use it as a starting point to examine interpreting technology
and its impact on a range of topics. A second, yet equally important, readership consists of
professional interpreters. A growing number of professional interpreters are interested
in the use and development of technologies for interpreting. For instance, there are working
groups dedicated specifically to the development of international standards on interpreting
technology in a variety of settings, interest groups, and divisions in professional organisa-
tions, as well as a number of podcasts, social media groups, and feeds (e.g. #terptech) dedi-
cated to the topic. As such, professionals working in the field and the agencies who engage
with them will likely find the volume to be of interest. In some respects, this volume could
help bridge the professional/academic divide that often limits professional engagement in
academic discourse.
We hope this comprehensive overview of the intersection between interpreting and tech-
nology in the AI era serves as a valuable resource for anyone interested in this evolving field,
to either deepen their knowledge or approach it for the first time. While it inevitably offers
only a snapshot of the current landscape, which is in a constant state of flux, it highlights
key developments, issues, and ongoing debates. By presenting a range of perspectives, we
encourage readers to form their own informed opinions on different aspects of the complex
relationship between interpreting and technology. Despite concerns about AI’s potential
to replace interpreting, this volume highlights the complex nature of their relationship,
emphasising the need for varied and nuanced solutions based on context and the careful
consideration of different factors. It underscores both the opportunities and limitations
of technology in interpreting. Ultimately, we hope readers find the volume insightful and
enjoyable, while recognising that despite the changes brought about by technology, inter-
preting plays and will continue to play a crucial role in facilitating real-time multilingual
communication across all areas of life.

PART I

Technology-enabled interpreting
1
TELEPHONE INTERPRETING
Raquel Lázaro Gutiérrez

1.1 Introduction
Telephone interpreting (TI) refers to the interpretation, over the phone, of spontaneous speech
produced by two or more speakers who use two different languages. It is a modality of remote or
distance interpreting that takes place in consecutive or dialogue mode (Ruiz Mezcua, 2018)
and is popular in public and private service provision (Lázaro Gutiérrez and Nevado Llo-
pis, 2022). In relation to Braun’s (2015, 2019) taxonomy of modes of interaction between
technology and interpreting, TI belongs to technology-mediated interpreting, which refers
to technologies used to deliver remote or distance interpreting services.
According to Fantinuoli (2018, 4), remote interpreting is ‘a broad concept which is com-
monly used to refer to forms of interpreter-mediated communication delivered by means
of information and communication technology’. Similar definitions have been provided by
Braun (2019, 2024), using the more recent term ‘distance interpreting’. Telephone inter-
preting, video-mediated interpreting (see Braun, this volume), and remote simultaneous
interpreting (see Chmiel and Spinolo, this volume) all fall under this umbrella term. Inter-
preting delivered over video link is gaining momentum as technology continues to progress
(Lázaro Gutiérrez and Nevado Llopis, 2022). These advances are particularly evident in
conference settings with remote simultaneous interpreting. However, telephone interpret-
ing remains the most popular modality of distance interpreting in service provision settings,
with significant investment made in this area (Hickey and Hynes, 2023) particularly after
the COVID-19 pandemic.
In terms of participant configuration (or ‘constellation’, following Pöchhacker, 2020),
telephone interpreting occurs when any of the participants involved in the interaction
(including the interpreter) connects through audio link. The most common situations
include a remote interpreter working with an on-site service provider and an end user, or a
three-way call in which all the participants connect via audio link (Rosenberg, 2007). The
latter configuration has also been termed ‘teleconference interpreting’ (Braun, 2019, 272).
A less-typical constellation is described by Spinolo et al. (2018), where the service provider
is in the same location as the interpreter and contacts the end user via audio link. To these,
a configuration that has received little attention in research to date ought to be added: when


DOI: 10.4324/9781003053248-3

end users are in the same location as the interpreter and connect to service providers via
audio link. Despite being under-researched and potentially uncommon in formal settings
(W. Zhang et al., 2024), this configuration is more frequent in informal interpreting con-
texts. This is typically the case when end users bring their own interpreter, usually a relative
or a friend. Such informal interpreter-mediated encounters usually occur in the context of
service provision over the phone, for instance, when an emergency
service is called or when appointments for different public services (such as a medical con-
sultation) are arranged telephonically.
This chapter starts with a conceptualisation of TI. Section 1.2 provides definitions and
refers to the evolution of TI services in market. Section 1.3 is devoted to the current practice
of and research into TI, including its characteristics, advantages, and challenges for profes-
sionals (interpreters) and users. Section 1.4 deals with future avenues for TI. This includes
aspects related to ergonomics, cognition, and working conditions, as well as the peculiari-
ties of human–machine interaction and the move towards automation.

1.2 Evolution
TI is a modality of distance interpreting often used in bilateral or dialogue interpreter–
mediated interaction. This has been explored in research on public service interpreting and
business interpreting settings (Lázaro Gutiérrez and Nevado Llopis, 2022). Another name
for telephone interpreting is ‘over-the-phone interpreting’, or OPI. Typically, a telephone
interpreter–mediated interaction in a service provision setting will include three partici-
pants: the interpreter, a service provider, and an end user. Even though telephone interpret-
ing could simply be defined as ‘bilateral interpretation over the phone’ (Andres and Falk,
2009), it has been widely acknowledged that it also possesses characteristics that go beyond
bilateral interpreting (González Rodríguez, 2017, 2020).
The shift towards remote delivery is a disruptive change in interpreting, and human
beings are often resistant to change. However, this is not the first time the interpreting pro-
fession has experienced such technological disruption. One of the most significant changes
was the introduction of simultaneous interpreting and the use of electro-acoustic technol-
ogy (Baigorri Jalón, 2014). This new mode of interpreting was initially called ‘telephonic’,
as it created distance between interpreters and end users, relying solely on oral communi-
cation and limiting visual cues. The technological shift provoked by the introduction of
electro-acoustic technology might also be at the core of ongoing debates about the visibility
of interpreters (Pöchhacker, 2020).
In fact, the etymology of ‘telephone’ refers to the transmission of sounds at a distance.
In recent years, there have been attempts to complement or even substitute TI with
video-mediated interpreting (see, for example, Lázaro Gutiérrez and Nevado Llopis (2022)
for a discussion of this trend), which can be seen as a natural evolution from TI. For
instance, despite the differences between the two modalities, they share many elements,
including the transmission of sound across a distance. Incidentally, this is reflected in the
more recent umbrella term ‘distance’ interpreting. Spinolo (2022) describes distance inter-
preting (including TI and video-mediated modalities) as a phenomenon that was accel-
erated by the pandemic. Even now that pandemic social restrictions have been relaxed,
demand for distance interpreting is still growing, although at a slower pace. Technological
improvements, such as the availability of higher bandwidth in public services, have made it
easier not only to implement TI more widely but also to complement it with video-mediated


interpreting when needed (Spinolo, 2022). This is also due to broader disruptive changes in
work practices, which favour telecommuting across all sectors, and to changes in social
behaviour, with clients and users preferring to obtain services over the phone instead of via
on-site visits and interactions. Many telephone interpreters
started their careers as on-site interpreters. Accepting remote assignments was something
many telephone interpreters felt obliged to do because of changing market conditions. The
COVID-19 pandemic accelerated this process and transformed not only the interpreting
industry but also the way in which service provision occurs. Nowadays, some interpreters
have greater expertise in telephone assignments than in on-site assignments.
For sign language interpreting, telephones alone are not sufficient to meet all users’ needs,
and video-mediated interpreting has brought about real change. However, in the early days
of remote sign language interpreting, Deaf individuals wishing to communicate with hearing
people at a distance used a teletypewriter (TTY) or a TDD (telecommunications device
for the deaf), also known as ‘text telephones’. These devices functioned like a telegraph,
transmitting signals from one device to another using the phone line. TDDs could display
text on a screen or print out written messages. Public services (mostly healthcare systems)
offered a human relay for those who did not own a TDD. In this system, the message was
communicated orally over the phone to an operator, who would codify (write) the message
on the TDD before sending it to the Deaf user.
More recently, market analysts (Hickey and Hynes, 2023) have highlighted two differ-
ent phenomena that influence the interpreting industry. On the one hand, their data points
to interpreters’ perception of having suffered poor working conditions in public services
settings for decades. The move to TI could be considered the final nail in the coffin for these
interpreters and may even lead to their abandoning the profession altogether. On the
other hand, this situation also opens the way for younger interpreters, who ‘flourished during the
pandemic and now prefer remote assignments over on-site ones’ (Hickey and Hynes, 2023).
This explains the recurrent lack of talent in on-site public service interpreting, which has
intensified during this decade.

1.3 Practice and research


Research on TI began with a focus on quality issues and comparisons with on-site inter-
preting, and literature is prolific in showing both its advantages and disadvantages. Braun
(2015, 353) cited Paneth (1957) as the first reference in literature to TI, initially presented
as a very promising modality. The first telephone interpreting service is said to have been
set up in 1973 in Australia by Translating and Interpreting Service (TIS National), which
now belongs to the Australian government.1 A few years later, it spread to the United States
(1980s), Japan (1980s), the UK (1990s), and France (1990s), as reported by Phelan (2001).
TI has been described as ‘fast’ (Ruiz Mezcua, 2018) and ‘cost-effective’ (Ko, 2006;
Phelan, 2001; Rosenberg, 2007), as it renders immediate access to interpreters possible, in a
wide variety of languages, at any time, any day of the week (Phillips, 2013; Gracia-García,
2002; Mintz, 1998; Fors, 1999; Hewitt, 1995; Jones and Gill, 1998; Wadensjö, 1999).
In comparison with on-site interpreting, Hornberger et al. (1996) and Saint-Louis et al.
(2003) found that remote modalities of healthcare interpreting resulted in increased rapport
between healthcare providers and patients. For interpreters, TI assignments appear easier
to combine with family life or other commitments, as interpreters can choose their schedule
and reduce their working hours if necessary (Lázaro Gutiérrez, 2021).


However, disadvantages have also been described. These include increased difficulties
in coordinating talk (Wadensjö, 1999; Hsieh, 2006; Oviatt and Cohen, 1992), technical
issues related to sound quality and the connectivity of the line (Kelly, 2008; Lee, 2007), an
inadequate use of the technology by main speakers (Lázaro Gutiérrez, 2021), lack of brief-
ing (Lee, 2007), and scarce specialisation (Heh and Qian, 1997; Gracia-García, 2002). This
section focuses on two of the main, most often cited challenges of telephone interpreting:
the lack of visual context (multimodal input) and the coordination of discourse, as well as
on research around quality.
These two challenges are highly intertwined. Coordinating discourse, already complex
in on-site bilateral interpreting, becomes even more difficult over the phone. In this modal-
ity, the lack of visual cues further constrains turn-taking, adding to the complexity. Turn
exchange is normally performed unconsciously by primary speakers in monolingual inter-
actions, and its timing and orchestration rely on non-verbal information, such as
intonation or body language (Drew and Heritage, 2006). These non-verbal elements are
inaccessible during telephone-interpreted interactions. Training in discourse coordination
abilities has been acknowledged as being essential (Fernández Pérez, 2015). In fact, this has
been the focus of research, although, at times, simulated conversations are used instead of
authentic data (De Boe, 2023).

1.3.1 Multimodal input


Due to the constraints of telephone technology, TI is fully based on audio input, while
human communication is multimodal. The lack of visual context is recurrently mentioned
in literature and is usually referred to as being ‘a disadvantage’ (Gentile et al., 1996; Roy,
2000; Wadensjö, 1998, 1999; Oviatt and Cohen, 1992; Fors, 1999; Kurz, 1999; Mack,
2001; Vidal, 1998; Lee, 2007; Lázaro Gutiérrez and Cabrera Méndez, 2019; Mintz, 1998).
However, the lack of visual context and the remoteness inherent in telephone interpreting have
also been portrayed positively. Several authors stress how the lack of visual context and
remoteness may assist confidentiality and privacy (Hewitt, 1995; Kelly, 2008; Wadensjö,
1999; Rosenberg, 2007; Phillips, 2013; Ko, 2006; Mikkelson, 2003), as well as aid ‘per-
sonal detachment’ and neutrality (Lee, 2007).
Interpreters themselves have referred to the lack of visual cues in TI as requiring
additional effort. To illustrate, David Mintz, former president of NAJIT (National Asso-
ciation of Judicial Interpreters and Translators, USA), together with other experienced
interpreters, sent a letter to the journal Proteus (Mintz, 1998) denouncing this additional
struggle, along with the feeling of insecurity that the lack of visual information in TI brings.
For example, telephone interpreters miss out on gestures, postures, and facial expressions
(Gentile et al., 1996; Roy, 2000; Wadensjö, 1998, 1999; Oviatt and Cohen, 1992; Fors,
1999; Kurz, 1999; Mack, 2001; Vidal, 1998; Lee, 2007). However, in complex contexts,
such as emergency settings, TI remains popular because of its accessibility. Nevertheless, it
is typical of these encounters that several people are involved in the communication, and telephone interpreters are unable to see how many of them are present or which roles they perform in
the interaction (Wadensjö, 1998; Lázaro Gutiérrez and Cabrera Méndez, 2019). That being
said, TI is discouraged in contexts where visual information is especially important, such
as in patient education or teaching scenarios (Kelly, 2008), or when end users present any
kind of condition that complicates communication over the phone, such as in mental health
settings (Lázaro Gutiérrez and Cabrera Méndez, 2019; Lázaro Gutiérrez, 2021).

Telephone Interpreting

The lack of visual context is not always perceived as a burden that impairs interpreters' performance or communication. For example, if the sound conditions are appropriate, interpreters can deploy strategies to obtain information from paralanguage (tone of voice, breathing patterns, inflection, pitch, and volume) (Ko, 2006; Kelly, 2008; Crezee, 2013; Cheng, 2015). Since contextualisation at the beginning of the interaction is crucial, telephone interpreters have been reported to interact with the main speakers in order to obtain key information that can help them better understand the scenario. This can be considered as 'leaving aside the traditional invisible role' of the interpreter (Lázaro Gutiérrez and Cabrera Méndez, 2019).
Other advantages of TI include the avoidance of visual distractors, such as non-verbal language (Mikkelson, 2003; Lee, 2007) or unpleasant input (the sight of blood or injuries). Distance also keeps interpreters away from unpleasant smells or possible risks
or harm (e.g. contagion in cases of medical interpreting, or aggression in conflict situations)
(Lázaro Gutiérrez, 2021). Some interpreters have reported an increased feeling of neutral-
ity when performing over the phone (Lee, 2007) and reduced interference with patients’
privacy (Lázaro Gutiérrez, 2021).
In order to overcome the most important constraints brought about by the lack of visual
context without losing the advantages of telephone interpreting, video-mediated interpret-
ing is seen as the most logical, convenient solution. For example, Spinolo (2022: 7) points
to the appearance of ‘video-mediated dialogue interpreting’ as an alternative to (or comple-
ment for) telephone interpreting in service provision settings. However, although technol-
ogy advances rapidly and users progressively develop technological competences, many
organisations are still not equipped to incorporate video-mediated interpreting into their
procedures. This could be due to fixed institutional practices, financial reasons, or the fact
that telephones are, undoubtedly, more accessible and portable than other devices.
One could argue that videoconferences can also be carried out via smartphone. However,
if video-mediated interpreting were to be used instead of TI, with the aim of having access
to visual information, it should be kept in mind that smartphones do not offer the required
technical characteristics, in terms of screen size or video quality. In fact, Moser-Mercer
(2005) stated that it was essential for interpreters’ visual needs to be addressed in order
to achieve proper working conditions that would, in turn, lead to quality interpreting per-
formance. Additionally, Sultanic (2022) mentioned the reduced size of screens and poor lighting as challenges for remote interpreters in smartphone-based videoconferences. These challenges render non-verbal cues barely perceptible, which, in turn, can lead to misunderstanding.
In any case, according to Hickey and Hynes (2023), and leaving out pandemic figures,
which reflect a dramatic increase in telephone and video-mediated interpreting relative to on-site interpreting, predictions for the years to come indicate a decrease of between 2 and 3% for TI,
in favour of video-mediated and remote simultaneous interpreting. As technology advances
and society at large gets more used to it, remote modalities of interpreting will include more
and more input for interpreters. This, in turn, implies offering a multimedia and multimodal
set-up, and video is therefore the next logical step to complement audio input.

1.3.2 Coordination of discourse


As Spinolo (2022: 5) suggests, remote interpreting implies ‘a different management of com-
munication with the primary participants in the interaction'. This 'different management'
is motivated by the distance between the interpreters and the main participants in the inter-
action (who may also be in separate locations from one another). In the case of telephone
interpreting, communication management is even more challenging due to the lack of visual
information, as outlined previously, which prevents interpreters from picking up interactional cues from body language (Peng et al., 2023).
When referring to instances of remote communication, such as remote interpreting, Spi-
nolo (2022) also points to a dual socio-cognitive element: the feelings of presence and
alienation. Traditional views of the interpreter's role emphasise invisibility, complete neutrality and impartiality, and the translation of everything that is said. However, in reality,
interpreters interact with the parties they interpret for and build rapport (De Boe, 2020)
throughout the conversation. Rapport is necessary in human interaction and facilitates
communication. However, several authors have pointed to difficulties in building rapport
for remote interpreters (including telephone interpreters). This can lead to an increased feel-
ing of alienation, in comparison to on-site interpreters (Moser-Mercer, 2005; Mouzourakis,
2006; Price et al., 2012).
Comparative studies on the characteristics of interpreter-mediated discourse have
revealed that the interpreter’s turn-taking and the coordination work are more complex
during telephone interpreting compared to on-site interpreting (Sultanic, 2022). For
instance, Wadensjö (1999) noticed the increased presence of overlaps and interruptions
over the phone. Furthermore, Braun (2015) stated that overlapping, which is more fre-
quent in remote modalities, may lead to incomplete renditions. Braun and Davitti (2020)
also studied on-site, telephone, and video-interpreted interactions to create categories that
allow us to analyse remote modalities. With a pedagogical aim, they focused on turn-taking
(managing conversation openings, turn shifts, closings) among other challenging aspects
for interpreters.
Whatever the modality of dialogue interpreting, turn-taking is always challenging. For
instance, participants may feel impatient and start talking before the interpreter finishes
their rendition. Similar challenges occur when personal or cultural turn-taking patterns
or practices differ between participants in the interaction. Likewise, the main speaker(s)
may not always allow interpreters to provide their interpreted rendition because they have
already understood the message themselves. Other challenges occur in remote modalities,
both in video and TI. Of particular interest are three-way calls, especially when end users
are located in a noisy or uncomfortable environment. As seen in Section 1.3.1, video inter-
preting seems to be a suitable solution to minimise the impact of the lack of visual context,
although this does not entirely resolve this issue. The same turn-taking issues also occur
with video interpreting and are particularly challenging when the interpreter has to manage
both visual context and turn-taking. Sultanic (2022) surveyed remote (video) interpreters
about the causes of difficulties in turn-taking and compared their remarks to TI. In her
study, ‘[b]oth providers and interpreters reported that the ability to clearly see the par-
ticipants on video made it easier to engage in dialogue and anticipate each speaker’s turn’
(p. 93). However, video-mediated interpreting cannot solve all the turn-taking issues that
appear in TI. It can also add challenges, such as technical issues related to lags or screens
freezing.
The use of the third person instead of the first is also reported as occurring more fre-
quently in TI. This has the effect of making interpreters more visible and active in the
coordination of turns. Several authors have also identified more frequent intervention by
interpreters to avoid miscommunication and overlap (Lee, 2007; Oviatt and Cohen, 1992). In fact, Oviatt and Cohen (1992) and Kelly (2008) suggest that this shift in the use of personal pronouns may be due to the lack of visual cues. Similarly, in a survey-based
study, Wang (2021) found that, despite a tendency to use the first person, telephone inter-
preters feel they have to change to the third person to meet clients' needs and compensate for clients' lack of experience in communicating through an interpreter. Clients were also reported
to have shifted between second- and third-person pronouns when addressing their inter-
locutors, thus further complicating the task of telephone interpreters.
The coordination role of dialogue interpreters can at times be considered as a step away
from the classical role of an ‘invisible’ interpreter. In remote assignments (as highlighted in
Section 1.3.1 and the beginning of Section 1.3.2), this coordination role goes even further.
Sultanic (2022) refers to instances in which interpreters take on the role of ‘technicians’
and give instructions to the primary speakers about how to use technology in remote com-
munication. These results were due to the fact that the interpreters in Sultanic’s study had
more experience in the use of the technology and therefore found themselves in the position
of advice-provider to the main speakers (particularly to patients) about how to use it effec-
tively. Interestingly enough, in this same study, Sultanic also found that some interpreters
felt the need to provide more cultural explanations when interpreting remotely than they would on-site.

1.3.3 Interpreting quality and satisfaction with TI


As mentioned at the start of Section 1.3, previous studies in TI have tended to measure
the quality of interpreters’ performance over the phone in comparison with on-site assign-
ments (Martínez-Gómez, 2008; Jaime Pérez, 2015). However, when it comes to analysing
the quality of interpreters’ renditions during telephone assignments, particularly for studies
carried out more than ten years ago, one must remember that this form of assignment was
not as frequent as on-site jobs. As a result, at the time, interpreters tended to perform worse in TI and to experience fatigue earlier. Evidence to support this claim comes in the
form of a positive correlation between ‘expertise in TI’ (measured in terms of the percent-
age of time spent dedicated to this modality and the number of years of interpreters’ work
experience) and interpreters’ perceptions about and approach towards ‘TI challenges,’ as
discovered by Iglesias Fernández and Ouellet (2018).
Earlier studies also researched interpreter satisfaction and discovered that most preferred
on-site assignments. However, this is probably because the interpreters were most used to
this form of assignment (Azarmina and Wallace, 2005). Client or user satisfaction has also
been researched. Findings suggest that levels of client or user acceptance of TI are equal
to those of on-site interpreting (Azarmina and Wallace, 2005; Jaime Pérez, 2015). Recent
studies regarding quality in interpreting include comprehensive approaches that go beyond
quantitative measures of satisfaction. To illustrate, Lázaro Gutiérrez and Cabrera Méndez
(2023, 2024) apply the ‘critical incident’ methodology to analyse TI quality. This method-
ology implies regular group debriefing meetings, in which interpreters are able to share any
challenges or difficulties they encountered. The most salient challenges or difficulties are isolated and described, to allow for examination by quality managers, who listen to the interpreters as they perform their assignments. This approach allows for a full and
contextualised understanding of a problem. This, in turn, facilitates the subsequent pro-
cesses of establishing guidelines and providing training to help interpreters deal with this
particular challenge and improve their future performance.


1.4 Future avenues

1.4.1 New technological workflows


When considering technology-mediated interpreting, one could argue that TI is a ‘highly
technological’ activity, and that it is a very popular choice for bilateral and consecutive
modalities (Kerremans et al., 2018, 2019). In domains that are closely linked to technology
and innovative settings, evolution is usually fast, and practice and research can advance at
a slower pace. However, research can also drive innovation in the market, especially when
results of experiments can be transferred to the development of tools or draft guidelines. On
the other hand, new industry practice can also become the object of research. Furthermore,
as the paradigm of the ‘augmented interpreter’ (Jiménez Serrano, 2020; Fantinuoli, 2018;
Fantinuoli and Dastyar, 2022) and concerns about human–machine interaction evolve, it is
becoming apparent that the use of technology is indeed a multifaceted object of research.
Issues related to technology acceptance (Corpas Pastor and Gaber, 2020; Stengers et al.,
2023) and ergonomics (Dong and Turner, 2016; Braun, 2019) also come to the fore, as does
the question of ethics, which continues to be of interest to researchers. Besides the interac-
tion between different professional approaches to ethics (Lázaro Gutiérrez, 2020; Valero
Garcés et al., 2015), one can also note the clash with market demands as an emerging
theme. Likewise, research has also focused on the consequences of market pressures, as technology becomes more widespread and independent of human participation. The pre-
sent reality of TI appears to reflect this phenomenon. As a result, most common TI practices
have had to adapt accordingly, and specific protocols have had to be tailored to individual
clients (Lázaro Gutiérrez et al., 2021). However, interpreters themselves are projecting their
anxieties towards the future, focusing on how they can adapt to the consequences of full
automation and loss of control due to the growing presence of generative artificial intel-
ligence in their field. In fact, due to its conversational nature, TI settings could prove easy
to automate using current technology, based on large language models. However, an extra
challenge could be posed by more complex aspects of cross-cultural communication which
deal with cultural variations in semantics and pragmatics.
Another aspect that needs to be further considered in the future is interpreter reluctance
towards the use of technology (Corpas Pastor and Fern, 2016; Corpas Pastor, 2018). This
phenomenon is particularly evident in TI and other remote interpreting technologies, especially in public service settings (Stengers et al., 2023). There are a variety of potential
reasons behind this technological reluctance. For example, the existence of remote modali-
ties of interpreting implies not only a change in necessary interpreting skills but also a shift
in working conditions (i.e. in different ways of organising and performing assignments, and
a different system for dealing with fees and payments). This relates to ergonomics (Dong
and Turner, 2016). In fact, those known as ‘telephone interpreters’ do not only use a phone
for assignments; they also have to connect to other forms of technology, such as platforms
from TI companies, in order to complete numerous administrative tasks. These include creating a profile, indicating availability, receiving training, accessing glossaries and protocols, receiving assignments, being evaluated by clients and supervisors, and viewing sta-
tistics about their interpreting performance and earnings. Telephone interpreters also use
a wide variety of the computer-assisted translation and interpreting tools available on the
market as a complement to the tools provided by the companies for whom they work. This
is particularly true during an interpreter's preparation phase, but also during assignments (Stengers et al., 2023). This range of technology can cause stress and anxiety to individual
interpreters and deeply concern professional associations (Lázaro Gutiérrez, 2021).
Nevertheless, several authors have attested to a slower introduction of technology within
interpreting compared with translation and mainly refer to ‘computer-assisted interpreting’
(CAI) tools for interpreting (Tripepi-Winteringham, 2010; Drechsel, 2015; Moser-Mercer,
2015; Fantinuoli, 2017 – see also Prandi, this volume). That being said, X. Zhang et al.
(2023) state that the COVID-19 pandemic accelerated the digitalisation of interpreting and
note that ‘radical changes’ occurred in less than one year. They highlight that this amount
of change would have most ‘probably taken five to six years, under normal conditions’.
Besides this crisis-provoked evolution in interpreting services, the development of technol-
ogy which supports interpreting has also advanced thanks to academic research. As Fantinuoli (2023) states, this could be due to the limited size of the interpreting industry, which is oriented towards a relatively small number of users and therefore attracts less commercial investment. For example, while TI has not been the
main focus of CAI development, current projects, such as PRAGMACOR,2 include ‘design-
ing CAI tools adapted to TI needs’ within their objectives. Although these tools remain in
development, telephone interpreters can already use the CAI tools developed for conference
interpreting to prepare their own assignments. This is because this process is similar to what
conference interpreters typically undergo (Stengers et al., 2023).

1.4.2 Training
An array of training resources has been developed for telephone interpreters over recent decades. Examples include numerous research projects, such as the SHIFT in Orality project (Amato et al., 2018), which placed remote interpreting at the core of its research objectives.
As a result, training in telephone interpreting has become more common and more stand-
ardised. This has allowed telephone interpreters to increase their skill set and move towards
video remote interpreting or other remote simultaneous modalities.
Training is acknowledged as being the ‘main pillar’ for quality in TI services (Kelly,
2008). Nevertheless, early research on TI in Spain denounced the scarcity of TI-specific training and guidelines, referring to the lack of courses and educational resources for interpreters, as well as of clear protocols and guidelines for service providers (García Luque, 2009; Murgu and Jiménez, 2011; Prieto, 2008; Martínez-Gómez,
2008). Outside of Spain, Verrept, in Belgium (2011), and Ozolins, in Australia (2011), also
identified a need for interpreters to have access to supplementary training in order to ‘make
adequate use of equipment’ and ‘solve technological issues’ and ‘improve performance’.
Taking a different approach, Hlavac (2013) also alludes to this, suggesting including TI in
training programmes for interpreters.
Despite researchers widely acknowledging the lack of guidelines, Kelly’s (2008) guide for
telephone interpreters represents a remarkable example of training materials for telephone
interpreters. It includes workplace guidelines, as well as a full chapter on ethics, and scenar-
ios for practice. Similarly, in Spain, Fernández Pérez also designed and published training
activities. These were based on both role-plays (2015) and the particular skills telephone
interpreters need to acquire to fulfil their main tasks: discourse coordination and translation (2012). The TI characteristics that Fernández Pérez identified, on the basis of a study she carried out to classify the features of the modality, include the lack of visual information, rapid access to interpreters for a larger number of users from different backgrounds, and the use of technical equipment.
In general, training in TI is provided by TI companies, often in cooperation with universities. TI companies tend to visit universities and provide training in the form of seminars and
workshops or as part of initial and ongoing training for their workers (Lázaro Gutiérrez,
2021). Training tends to focus on the use of specific protocols, basic information about the
field in which work will be conducted (e.g. healthcare, road assistance, social welfare), ethi-
cal issues, and technology.

1.4.3 Ergonomics and cognition


Ergonomics and cognitive load are aspects that concern all modalities of interpreting. As a result, several studies have focused on professional matters in relation to ergonomics, many of them also in relation to cognitive load. For instance,
Moser-Mercer (2005) compared the performance and perceptions of a group of inter-
preters who were working, first, on-site and, then, remotely. Here, interpreters perceived
working on-site as less stressful, and performance indicators showed it to be less tiring
for them. As a result, this group produced higher-quality renditions. This result could be
explained by the fact that this group of interpreters were unfamiliar with remote work.
Consequently, working remotely would have caused them physiological and psychological
strain and greater use of cognitive resources. The author, therefore, recommends shorter
shifts for remote interpreters to combat this. Furthermore, Moser-Mercer's suggestions regarding the time of fatigue onset have since been applied to both remote simultaneous and telephone interpreting (Lázaro Gutiérrez and Rossi, under review). Additional sugges-
tions by Moser-Mercer include looking more closely at interpreters’ visual and social needs.
This suggestion has since been supported by authors studying distance interpreting in all its
modalities, including telephone interpreting (De Boe, 2020; Spinolo, 2022; X. Zhang et al.,
2023; W. Zhang et al., 2024).
Other authors have addressed interpreters’ working environments in the context of
remote simultaneous interpreting. For example, Roziner and Shlesinger (2010) focused
on both physiological and perceived levels of stress and fatigue, as well as quality of
interpreter performance. They found a divergence between the interpreters' subjective perceptions of their performances and the objectively measured quality. Interpreters in this study reported feeling 'more tired and stressed' and assessed their remote performance as being of 'lower quality' than the objective measurements indicated. Nevertheless, the objective measurements did show that quality decreased more quickly in remote assignments. It is therefore possible that the more negative self-perceptions reflect interpreters' stress and anxiety related to having to use remote interpreting technology.
Gutiérrez, 2022) have also identified the presence of anxiety amongst interpreters when
having to use technology.
Roziner and Shlesinger (2010) also highlight the question of financial compensation and draw attention to the need for adequate remuneration in interpreting. This last point is also brought to the fore by studies focusing on TI. Studies by Lázaro Gutiérrez and Nevado Llopis (2022) and Lázaro Gutiérrez (2014, 2022) also point to financial matters regarding TI, although these are not often the main research focus. The issue here is a change in how fees are charged, rather than under-remuneration: telephone interpreters are paid per minute worked, rather than for completed assignments or working days (although some TI companies do establish a minimum pay for an interpreted telephone call).
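The per-minute scheme with a minimum fee can be sketched as a simple calculation. The rates below are hypothetical illustrations, not figures reported in the literature; working in integer cents avoids floating-point rounding in the fee.

```python
def call_fee_cents(minutes_worked: int, rate_cents_per_min: int, minimum_cents: int) -> int:
    """Fee in cents for one interpreted call: per-minute billing with a guaranteed floor."""
    return max(minimum_cents, minutes_worked * rate_cents_per_min)

# Hypothetical rates: 80 cents per minute with a 500-cent minimum per call.
print(call_fee_cents(3, 80, 500))   # 500: a 3-minute call still pays the minimum
print(call_fee_cents(20, 80, 500))  # 1600: a 20-minute call is billed purely per minute
```

The minimum fee matters precisely because, unlike a per-assignment or per-day scheme, very short calls would otherwise yield negligible pay.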
The question of ergonomics has recently been investigated by scholars in relation to distance interpreting. Although these studies mainly focus on simultaneous conference interpreting, their results are also applicable to TI because they refer to aspects that concern remoteness and telework. For instance, Ziegler and Gigliobianco (2018) and Spinolo (2022) suggest
that an interpreter’s workstation should mimic on-site conditions and be in a quiet location.
They say that interpreters should use good-quality headphones and microphones and work
at a desk to allow for comfortable note-taking and consultation of sources. Scholarly work
has also highlighted the importance of including interpreters in the design of user-centred
technologies (Mellinger, 2023). Current studies aiming at the development of CAI tools
for telephone interpreters, such as those conducted by the group FITISPos-UAH, use two
separate cohorts of interpreters in the research (students and professionals) and include
ergonomic testing and acceptance analysis.
Human–machine interaction and cognitive ergonomics (O’Brien, 2012) also contrib-
ute to the literature regarding remote interpreters’ working conditions. After acknowledg-
ing that distance interpreting increases cognitive load and leads to screen fatigue, stress,
isolation, and alienation, Liu (2022) reflects on the need for further training for remote
interpreters. They highlight the need for interpreters to know how to use remote interpret-
ing platforms, how to set up their home office, how to communicate efficiently with rel-
evant stakeholders, and how to fight for good working conditions. Remote interpreters are
expected to perform effectively, despite exposure to sudden, loud noises (which can cause
acoustic shocks), despite difficulties in collaborating with colleague interpreters, and with
limited contextual information about an assignment. Training is therefore deemed essential,
but many of the trainers themselves have limited experience of working remotely. Fur-
thermore, many have negative views of distance interpreting and see it to be a provisional
contingency in times of crisis only. This may be one of the reasons that TI companies tend
to provide TI training themselves, either via university workshops as extra-curricular activi-
ties, as training to new interpreter colleagues, or as part of the selection and recruitment
process (Lázaro Gutiérrez and Cabrera Méndez, 2021b).
Besides aspects related to working conditions and training, the human–machine inter-
play in interpreting has also been examined in order to account for the relationship that the
different actors in interpreter-mediated interactions establish with technology (Mellinger
and Pokorn, 2018). The focus of studies has been on the way interpreters ‘use’ technology.
This includes examining interpreters’ approaches and attitudes towards the present and
potential future use of technology (Mellinger and Hanson, 2018; Stengers et al., 2023).
Mellinger (2019) focuses on how the use of technology can influence the cognitive processes
that underpin interpreting. They also use new research methods, such as close discourse analysis, (automated) corpus analysis of interpreters' performance, and eye tracking to examine interpreters' behaviour (Mellinger, 2023). As Mellinger (2023: 203) states, 'each
technological configuration situates and embeds interpreters in a specific context, ulti-
mately shaping their experience and cognitive interaction with the environment'. Whereas the use of technology might imply an increased cognitive load for interpreters, interpreters have also been observed to externalise part of their cognitive effort to technology (Mellinger, 2023). This suggests that technology can play an important role in the preparation phase
for telephone interpreters. Similarly, in the future, one can imagine that a user-centred platform will be designed to assist interpreters during assignments. Current research by the
FITISPos-UAH group also makes use of eye trackers and learning analytics to examine the
cognitive workload and telephone interpreters’ focus of attention while using CAI during
assignments. This research is part of a technological development project that is in progress.
It is hoped that it offer finalised, tested products by the end of 2028.3

1.4.4 Towards automation


Different forms of automation, such as neural machine translation, automatic speech recog-
nition, term extraction, and conversion, among others, have gained popularity in the interpreting industry. These tools are used not only during the preparation of assignments but
also during their delivery, since interpreters are able to access them remotely from their own
desktops, rather than purely on-site. Nonetheless, task automation remains an interesting issue to explore in relation to TI. Since many telephone interpreters are required to perform without being in front of a computer, the possibility of using (ergonomic) CAI tools is, at the very least, worth exploring.
Artificial boothmates have been found to be ‘particularly useful for remote interpreters’
(Rodríguez et al., 2021; Fantinuoli et al., 2022). This tool is at the core of human–machine
interaction and contributes to the paradigm of the 'augmented interpreter'. In short, artificial
boothmates capture elements of discourse which are particularly challenging for interpret-
ers (numbers, terminology, and other named entities, including proper nouns) and offer
suggestions to the interpreters, in real time. Although primarily designed for remote simul-
taneous interpreting, artificial boothmates could also be useful during telephone interpret-
ing assignments, via a dedicated interface or work console. Similarly, artificial notepads
can offer assistance to dialogue (or consecutive) interpreters. These combine note-taking
applications with automatic speech recognition and provide the possibility of integrating
additional tools, such as the management of numbers, terminology, and other named enti-
ties. One can say that automation is likely to impact TI in a positive manner, since some of
its characteristics can be processed more easily by machine. For example, because genera-
tive AI has been trained to cope with the peculiarities of human interaction, automation
could assist an interpreter who is working with a source text of chunked discourse, such as
a conversation made of turns. Well-known examples of generative AI include artificial chat
systems, such as ChatGPT.
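The capture step that artificial boothmates and notepads perform can be illustrated with a minimal sketch: pulling numbers and capitalised words (a crude stand-in for named entities) out of one transcribed conversational turn. This is a deliberate simplification, not a description of any actual tool; production systems rely on streaming speech recognition and trained named-entity models, and the function name and example sentence below are hypothetical.

```python
import re

def extract_cues(transcribed_turn: str) -> dict:
    """Pick out numbers and likely proper nouns from one transcribed turn."""
    # Numbers, including decimals and thousands separators (e.g. "1,200", "3.5")
    numbers = re.findall(r"\d+(?:[.,]\d+)*", transcribed_turn)
    entities = []
    for sentence in re.split(r"[.!?]+\s*", transcribed_turn):
        words = sentence.split()
        # Skip the first word of each sentence: its capital is not evidence of a name
        for word in words[1:]:
            cleaned = word.strip(",;:()'\"")
            if cleaned[:1].isupper() and cleaned[1:].islower():
                entities.append(cleaned)
    return {"numbers": numbers, "named_entities": entities}

turn = "Dr Alvarez prescribed 250 milligrams, to be collected in Guadalajara."
print(extract_cues(turn))
# {'numbers': ['250'], 'named_entities': ['Alvarez', 'Guadalajara']}
```

A real boothmate would run such extraction continuously on the ASR stream and display the cues to the interpreter in real time; the heuristic above also misses entities at sentence starts, which is one reason trained models are used instead.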
Downie (2023) focuses heavily on economic forces when discussing the introduction or
presence of technologies in interpreting, stating how ‘[b]udgets, policies and the perceived
future of interpreting will all have as much of an effect on the actual future of interpret-
ing as any technological discovery’ (p. 284). Similarly, Liu (2022) points to the need for
students to be made aware of these changes in the market. They write that interpreters will
likely have to compete with technology for the same assignments, particularly those ‘for
which accuracy is not as important as immediacy’. There is also a possibility that interpret-
ers will be hired to perform more complex assignments for which added value is sought,
for example, in ‘complex’ TI assignments which require the coordination of discourse and
the inference of meaning from incomplete information exchanges, together with aspects of
cross-cultural communication (Kelly, 2008; Fernández Pérez, 2012; Lázaro Gutiérrez and
Cabrera Méndez, 2021a). The vulnerability of many of the end users of TI and the purpose
of many interactions related to service provision (medical consultations, emergency con-
texts, child welfare supervision, etc.) also raise concerns about liability (Lázaro Gutiérrez et al., 2021). In these contexts, although technologically possible, the use of machine interpreting should be discouraged (see Fantinuoli, this volume).
End users, who are also the clients or buyers of interpreting services, play an important
role in the transformation of the market. Downie (2023: 288) claims that it is important
for society to stop viewing human and machine interpreting as ‘rivals, bidding for the same
clients and the same work’. Both interpreting solutions can be complementary in a wider
panorama of language access and communication services. Machine interpreting can be use-
ful without the presence of a human interpreter, but it can equally act to augment a human
interpreter’s capabilities. Recent literature describes examples of conversation settings where
TI is available. However, in spite of this, interactants prefer not to use it, instead preferring
communication through a friend, relative, or colleague for ‘translanguaging’ (‘the mixed
use of elements of all the languages known by the speakers to convey meaning and build
communication’; Vogel and García, 2017) (Lázaro Gutiérrez and Tejero González, 2017). In
addition, primary speakers sometimes use technology, such as machine translation, to assist
them, as reported in a study by Lázaro Gutiérrez and Tejero González, 2022. If an applica-
tion is developed for automated bilateral interpreting, this could coexist with TI.
However, wider awareness about the consequences of the use of machine interpreting
without an interpreter in the loop is required. End users need to be made aware of the
characteristics and implications of the speech events and situations which require linguistic
mediation, and when it is appropriate to use machine interpreting, TI, or on-site interpreting. In the medical field, where TI is popular, guidelines have recently been published by the Interpreting SAFE AI Task Force, established under the auspices of the NCIHC (National Council on Interpreting in Health Care, based in the USA and a worldwide reference for healthcare interpreting), to guide the responsible use of AI-based technology in interpreting and multilingual communication. Although initial conversations, debates, and working groups focused on bilateral interpreting, including on-site interpreting and TI, the task force's activities soon expanded to cover 'conference, medical, legal, educational, business and other settings' (https://safeaitf.org/mission/). Outside of conferences, all other interpreting settings tend to prefer bilateral interpreting, particularly TI (Hickey and Hynes, 2023).
In any case, the machine will not completely replace the human interpreter, as humans will always establish a utilitarian relationship with it, as demonstrated earlier. Ideally, interpreters and machines will 'conform a sort of partnership' (Downie, 2023) to shape the most
accepted solution for solving multilingual communication problems in our modern societies
(Monzó Nebot, 2009).

1.5 Conclusion
TI has been used for many decades. It constitutes a fast and simple alternative to on-site bilat-
eral interpreting. In times when multilingual communication is frequent and the acknowl-
edgement of linguistic and cultural differences is spreading, interpretation is increasingly
used to communicate in bilateral encounters. Public and private service provision sits at
the core of this phenomenon, together with population movements and telecommunication advancements. A shortage of trained, professional interpreters, in both widely spoken languages and languages of lesser diffusion, drives the development and enhancement of a globalised market in interpreting services. Such a market allows interpreters around the world to access assignments in remote locations, thereby making the most of their abilities and language knowledge.


Working remotely and embracing TI pose a disruptive challenge for on-site bilateral interpreters, many of whom have not been trained in the most technologised modalities of interpreting (i.e. simultaneous conference interpreting, traditionally performed from a booth). Reluctance to use technologies in all phases of the interpreting assignment goes a long way towards explaining the avoidance of TI. However, the demand for TI is increasing, even as other modalities of remote interpreting appear and improve. Supplying interpreting services in many different languages and covering a wide schedule are easier with TI than with on-site bilateral interpreting. TI represents a simpler form of interpreting in all respects and can therefore be seen as more accessible. Telephone technology is reliable and cheap, and for these reasons it can be found all around the world. Most people grow familiar with the telephone from an early age. As a result, communicating via a telephone interpreter is usually no more complicated for end users than doing so via an on-site interpreter. The advantages that TI offers end users (such as immediacy and reduced costs) go hand in hand with its benefits for interpreters (access to more assignments, more possibilities for work–life balance, increased privacy, and reduced exposure to risks), though the challenges it also implies must be acknowledged.
However, in general, new work patterns also present risks for workers. Fierce competi-
tion, alienation, and the threat of substitution by a machine affect not only interpreters but
also workers in many other sectors. In fact, technological advances are always two-sided.
While humans are eager to experiment with the benefits provided by technological aids,
we also fear the changes that they bring to our professional practices in the long run. TI will soon be pushed towards video remote technology. In addition, remote interpreters may find themselves having to use CAI tools during the preparation and performance phases of assignments, and even to make themselves available to clients. Although interpreters frequently embrace technology (e.g. Stengers et al., 2023), doing so demands extra effort. Exhibiting open-mindedness, proactivity, and creativity will
allow current telephone interpreters to not only remain in the market but also access new
settings and scenarios, as the complexity of multilingual communication is acknowledged
worldwide.
Telephone interpreters should seek training opportunities to support constant adaptation to a changing market, but it is also important that up-to-date training is made available to them. In the same vein, for CAI tools to be attractive to telephone interpreters, technology developers should offer more ergonomic designs. TI will undoubtedly remain and coexist with other remote modalities of interpreting, since it represents simplicity for end users. The work environment of telephone interpreters will continue to evolve, and interaction with technology will remain a key issue in this domain, though it will hopefully also assist them.

Notes
1 www.anao.gov.au/work/performance-audit/management-interpreting-services (accessed 22.8.2024).
2 https://pragmacor.web.uah.es/. Ref. PID2021-127196NA-I00. Corpus pragmatics and telephone interpreting: analysis of face-threatening acts. Funded by the Spanish National Research Agency (accessed 22.8.2024).
3 INNOVATRAD-CM, Ref. PHS-2024/PH-HUM-52, Artificial intelligence and human-machine interaction: Research and innovation in real-time discourse generation, interpretation and translation. Funded by Comunidad de Madrid.


References
Amato, A., Spinolo, N., González Rodríguez, M.J., 2018. Handbook of Remote Interpreting – SHIFT
in Orality. University of Bologna, Bologna.
Andres, D., Falk, S., 2009. Remote and Telephone Interpreting. In Andres, D., Pöllabauer, S., eds.
Spürst Du, wie der Bauch rauf-runter? Fachdolmetschen im Gesundheitsbereich/Is Everything All
Topsy Turvy in Your Tummy? Martin Meidenbauer, Munich, 9–27.
Azarmina, P., Wallace, P., 2005. Remote Interpretation in Medical Encounters: A Systematic Review.
Journal of Telemedicine and Telecare 11, 140–145.
Baigorri Jalón, J., 2014. Interpreters at the Edges of the Cold War. In Fernández Ocampo, A.,
Wolf, M., eds. Framing the Interpreter: Towards a Visual Perspective. Routledge, London,
163–171.
Braun, S., 2015. Remote Interpreting. In Mikkelson, H., Jourdenais, R., eds. The Routledge Hand-
book of Interpreting. Routledge, New York, 352–367.
Braun, S., 2019. Technology and Interpreting. In O’Hagan, M., ed. The Routledge Handbook of
Translation and Technology. Routledge, New York, 271–288.
Braun, S., 2024. Distance Interpreting as a Professional Profile. In Massey, G., Ehrensberger-Dow, M.,
Angelone, E., eds. Handbook of the Language Industry. Mouton de Gruyter, Berlin, 449–472.
Braun, S., Davitti, E., 2020. A Multidisciplinary Methodological Framework. In Iglesias, E.F., Rod-
ríguez, M.J.G., Russo, M., eds. Telephone Interpreting/L’interpretazione telefonica. Bononia Uni-
versity Press (BUP), Bologna, 30–38.
Cheng, Q., 2015. Examining the Challenges for Telephone Interpreters in New Zealand (PhD thesis). URL https://openrepository.aut.ac.nz/bitstream/handle/10292/9250/ChengQ.pdf?sequence=3&isAllowed=y (accessed 12.9.2024).
Corpas Pastor, G., 2018. Tools for Interpreters: The Challenges That Lie Ahead. Current Trends in
Translation Teaching and Learning, 157–182.
Corpas Pastor, G., Fern, L., 2016. A Survey of Interpreters’ Needs and Practices Related to Language
Technology. Universidad de Málaga, Málaga.
Corpas Pastor, G., Gaber, M., 2020. Remote Interpreting in Public Service Settings: Technology, Per-
ceptions and Practice. SKASE Journal of Translation and Interpretation 13(2), 58–68.
Crezee, I., 2013. Introduction to Healthcare for Interpreters and Translators. John Benjamins,
Amsterdam.
De Boe, E., 2020. Remote Interpreting in Dialogic Settings. In Salaets, H., Brône, G., eds. Linking
Up with Video: Perspectives on Interpreting Practice and Research. John Benjamins Publishing
Company, Amsterdam, 79–106.
De Boe, E., 2023. Remote Interpreting in Healthcare Settings. Peter Lang, Bern.
Dong, J., Turner, G., 2016. The Ergonomic Impact of Agencies in the Dynamic System of Interpreting
Provision: An Ethnographic Study of Backstage Influences on Interpreter Performance. Translation
Spaces 5(1), 97–123. URL https://doi.org/10.1075/ts.5.1.06don
Downie, J., 2023. Where Is It All Going? Technology, Economic Pressures and the Future of Interpreting. In Corpas Pastor, G., Defrancq, B., eds. Interpreting Technologies – Current and Future Trends. John Benjamins Publishing Company, Amsterdam, 277–301.
Drechsel, A., 2015. The Tablet Interpreter. Lean Publishing, Canada.
Drew, P., Heritage, J., eds., 2006. Conversation Analysis, vol. I. Sage Publications Ltd., London.
Fantinuoli, C., 2017. Computer-Assisted Preparation in Conference Interpreting. Translation and
Interpreting 9(2), 24–37.
Fantinuoli, C., 2018. Computer-Assisted Interpreting: Challenges and Future Perspectives. In Durán,
I., Pastor, G.C., eds. Trends, E-Tools and Resources for Translators and Interpreters. Koninklijke
Brill NV, Leiden, 153–174.
Fantinuoli, C., 2023. Towards AI-Enhanced Computer-Assisted Interpreting. In Corpas Pastor, G.,
Defrancq, B., eds. Interpreting Technologies – Current and Future Trends. John Benjamins Publishing Company, Amsterdam, 46–71.
Fantinuoli, C., Dastyar, V., 2022. Interpreting and the Emerging Augmented Paradigm. Interpreting
and Society 2(2), 185–194.
Fantinuoli, C., Marchesini, G., Landan, D., Horak, D., 2022. KUDO Interpreter Assist: Automated
Real-Time Support for Remote Interpretation. ArXiv, abs/2201.01800.


Fernández Pérez, M.M., 2012. Identificación de Las Destrezas de La Interpretación Telefónica. Uni-
versidad de La Laguna, La Laguna.
Fernández Pérez, M.M., 2015. Designing Role-Play Models for Telephone Interpreting Training.
MonTI. Monographs in Translation and Interpreting, 259–279.
Fors, J., 1999. Perspectives on Remote Public Service Interpreting. In Álvarez Lugrís, A., Fernández
Ocampo, A., eds. Quality Issues in Remote Interpreting. Servizo de Publicacións da Universidade
de Vigo, Vigo, 114–116.
García Luque, F., 2009. La interpretación telefónica en el ámbito sanitario. Realidad social y reto
pedagógico. Redit 3, 18–30.
Gentile, A., Ozolins, U., Vasilakakos, M., 1996. Liaison Interpreting: A Handbook. Melbourne Uni-
versity Press, Melbourne.
González Rodríguez, M.J., 2017. La conversación telefónica monolingüe, su futuro inmediato y su
representación en ámbito judicial-policial. In San Vicente, F., Capanaga, P., Bazzocchi, G., eds.
ORALITER: Formas de comunicación presencial y a distancia. BUP, Bologna, 197–222.
González Rodríguez, M.J., 2020. La interpretación a distancia y su formación: La experiencia de la
Shift Summer School y cómo crear la ‘virtualidad necesaria’ en el aula. In Spinolo, N., Amato, A., eds. inTRAlinea Special Issue: Technology in Interpreter Education and Practice, 1–8.
Gracia-García, R.A., 2002. Telephone Interpreting: A Review of Pros and Cons. In Brennan, S., ed. Pro-
ceedings of the 43rd Annual Conference. American Translators Association, Alexandria, VA, 195–216.
Heh, Y., Qian, H., 1997. Over-the-Phone Interpretation: A New Way of Communication Between
Speech Communities. In Jérôme-O’Keeffe, M., ed. Proceedings of the 38th Annual Conference.
American Translators Association, Alexandria, VA, 51–62.
Hewitt, W.E., 1995. Court Interpretation: Model Guides for Policy and Practice in the State Courts.
National Center for State Courts.
Hickey, S., Hynes, R., 2023. The 2023 Nimdzi Interpreting Index: The Ranking of the Top 34 Largest Interpreting Service Providers. URL https://www.nimdzi.com/nimdzi-100-top-lsp/#thenimdzi-100-ranking (accessed 14.9.2024).
Hlavac, J., 2013. A Cross-National Overview of Translator and Interpreter Certification Procedures.
Translation and Interpreting 5(1), 32–65.
Hornberger, J., Gibson Jr, C. D., Wood, W., Dequeldre, C., Corso, I., Palla, B., Bloch, D.A., 1996.
Eliminating Language Barriers for Non-English-Speaking Patients. Medical Care 34(8), 845–856.
Hsieh, H., 2006. Understanding Medical Interpreters: Reconceptualizing Bilingual Health Communi-
cation. Health Communication 20(2), 177–186.
Iglesias Fernández, E., Ouellet, M., 2018. From the Phone to the Classroom: Categories of Problems
for Telephone Interpreting Training. The Interpreters’ Newsletter 23, 19–44.
Jaime Pérez, A., 2015. Remote Interpreting in Public Services: Developing a 3G Phone Interpreting
Application. In Lázaro Gutiérrez, R., Sánchez Ramos, M.M., Vigier Moreno, F.J., eds. Investi-
gación Emergente En Traducción e Interpretación. Comares, Granada, 73–82.
Jones, D., Gill, P., 1998. Breaking Down Language Barriers: The NHS Needs to Provide Accessible
Interpreting Services for All. BMJ (Clinical Research Ed.) 316(7143), 1476.
Kelly, N., 2008. Telephone Interpreting: A Comprehensive Guide to the Profession. Trafford, Victo-
ria, BC.
Kerremans, K., Cox, A., Stengers, H., Lázaro Gutiérrez, R., Rillof, P., 2018. On the Use of Technologies in Public Service Interpreting and Translation Settings. In Read, T., Montaner, S., Sedano,
B., eds. Technological Innovation for Specialized Linguistic Domains. Éditions universitaires euro-
péennes, Mauritius, 57–68.
Kerremans, K., Lázaro Gutiérrez, R., Stengers, H., Cox, A., Rillof, P., 2019. Technology Use by Pub-
lic Service Interpreters and Translators: The Link Between Frequency of Use and Forms of Prior
Training. FITISPos International Journal 6(1), 107–122.
Ko, L., 2006. The Need for Long-Term Empirical Studies in Remote Interpreting Research: A Case Study
of Telephone Interpreting. Linguistica Antverpiensia, New Series – Themes in Translation Studies 5.
Kurz, I., 1999. Remote Conference Interpreting: Assessing the Technology. Anovar/anosar estudios de
traducción e interpretación I, 114–116.
Lázaro Gutiérrez, R., 2014. Use and Abuse of an Interpreter. In Valero-Garcés, C., ed. (RE)Visit-
ing Ideology and Ethics in Situations of Conflict. Servicio de Publicaciones de la Universidad de
Alcalá, Alcalá de Henares.


Lázaro Gutiérrez, R., 2020. Fidelidad, confidencialidad y empatía en consultas médicas con víctimas
de violencia de género mediadas por un intérprete. In Pinazo, E.P., ed. Interpreting in a Changing
World: New Scenarios, Technologies, Training Challenges and Vulnerable Groups. Peter Lang,
Berlin, 49–64.
Lázaro Gutiérrez, R., 2021. Remote (Telephone) Interpreting in Healthcare Settings. In Susam-Saraeva,
Ş., Spišiaková, E., eds. The Routledge Handbook of Translation and Health. Routledge, London,
216–231.
Lázaro Gutiérrez, R., 2022. Self-Care for Interpreters. In Porlán Moreno, R., Arnedo Villaescusa, C.,
eds. Interpreting in the Classroom: Tools for Teaching. UCO Press, Córdoba, 137–154.
Lázaro Gutiérrez, R., Cabrera Méndez, G., 2019. Chapter 2. Context and Pragmatic Meaning in
Telephone Interpreting. In Garcés-Conejos Blitvich, P., Fernández-Amaya, L., Hernández-López,
M.O., eds. Technology Mediated Service Encounters. John Benjamins Publishing Company,
Amsterdam, 45–68.
Lázaro Gutiérrez, R., Cabrera Méndez, G., 2021a. Development of Technological Competences:
Remote Simultaneous Interpreting Explored. Translating and the Computer 43.
Lázaro Gutiérrez, R., Cabrera Méndez, G., 2021b. How COVID-19 Changed Telephone Interpreting
in Spain. International Journal of Translation and Localization 8(2), 137–155.
Lázaro Gutiérrez, R., Cabrera Méndez, G., 2023. Hazard Communication Through Telephone
Interpreters During the Pandemic in Spain: The Case of COVID-19 Tracer Calls. The Translator
28(3), 1–15.
Lázaro Gutiérrez, R., Cabrera Méndez, G., 2024. Widening the Scope of Interpreting in Conflict Set-
tings: A Description of the Provision of Interpreting During the 2021 Afghan Evacuation to Spain.
In Declercq, C., Kerremans, K., eds. The Routledge Handbook of Translation, Interpreting and
Crisis. Routledge, London and New York, 172–186.
Lázaro Gutiérrez, R., Iglesias-Fernández, E., Cabrera-Méndez, G., 2021. Ethical Aspects of Telephone Interpreting Protocols. Verba Hispanica 29(1), 137–156. URL https://doi.org/10.4312/vh.29.1.137-156
Lázaro Gutiérrez, R., Nevado Llopis, A., 2022. Remote Interpreting in Spain After the Irruption of
COVID-19: A Mapping Exercise. Hikma 21(2), 211–230.
Lázaro Gutiérrez, R., Ross, C., under review. The Telephone Interpreter and the Machine: An Explor-
atory Study into the Potential of Adapting CAI-Tools to Dialogue Interpreting. Target.
Lázaro Gutiérrez, R., Tejero González, J.M., 2017. Interculturalidad y Mediación Cultural en el
Ámbito Sanitario. Descripción de la implementación de un programa de mediación intercultural
en el Servicio de Salud de Castilla-La Mancha. Panace@ XVIII(46), Segundo semestre, 97–107.
Lázaro Gutiérrez, R., Tejero González, J.M., 2022. Challenging Ideologies and Fostering Intercultural
Competence: The Discourses of Healthcare Staff About Linguistic and Cultural Barriers, Interpret-
ers, and Mediators. In Määttä, S.K., Hall, M.K., eds. Mapping Ideology in Discourse Studies. De
Gruyter Mouton, Berlin and Boston, 223–246.
Lee, J., 2007. Telephone Interpreting Seen from the Interpreters’ Perspective. Interpreting 9(2),
231–252.
Liu, J., 2022. The Impact of Technology on Interpreting: An Interpreter and Trainer’s Perspective.
International Journal of Chinese and English Translation and Interpreting (IJCETI) 1(8).
Mack, G., 2001. Conference Interpreters on the Air: Live Simultaneous Interpreting on Italian Tel-
evision. In Gambier, Y., Gottlieb, H., eds. (Multi) Media Translation: Concepts, Practices, and
Research. John Benjamins Publishing Company, Amsterdam and Philadelphia, 125–132.
Martínez-Gómez, A., 2008. La interpretación telefónica en los servicios de atención al inmigrante de
Castilla – La Mancha. In Valero Garcés, C., Pena Díaz, C., Lázaro Gutiérrez, R., eds. Investigación
y práctica en traducción e interpretación en los servicios públicos: Desafíos y alianzas. Servicio de
Publicaciones de la Universidad de Alcalá, Alcalá de Henares, 338–353.
Mellinger, C.D., 2019. Computer-Assisted Interpreting Technologies and Interpreter Cognition:
A Product and Process-Oriented Perspective. Tradumàtica 17, 33–44.
Mellinger, C.D., 2023. Embedding, Extending, and Distributing Interpreter Cognition with Technology. In Corpas Pastor, G., Defrancq, B., eds. Interpreting Technologies – Current and Future Trends. John Benjamins Publishing Company, Amsterdam, 195–216.
Mellinger, C.D., Hanson, T.A., 2018. Interpreter Traits and the Relationship with Technology and
Visibility. Translation and Interpreting Studies 13(3), 366–394.


Mellinger, C.D., Pokorn, N.K., 2018. Community Interpreting, Translation and Technology. Transla-
tion and Interpreting Studies 13(3), 337–341.
Mikkelson, H., 2003. Telephone Interpreting: Boon or Bane? In González, L.P., ed. Speaking in
Tongues: Language Across Contexts and Users. Universitat de València, Valencia, 251–269.
Mintz, D., 1998. Hold the Phone: Telephone Interpreting Scrutinized. Proteus 7(1), 1–5.
Monzó Nebot, E., 2009. Legal and Translational Occupations in Spain: Regulation and Specialization
in Jurisdictional Struggles. In Sela-Sheffy, R., Shlesinger, M., eds. Profession, Identity and Status,
Special Issue of Translation and Interpreting Studies 4(2), 134–154.
Moser-Mercer, B., 2005. Remote Interpreting: The Crucial Role of Presence. Bulletin VALS-ASLA
81, 73–97.
Moser-Mercer, B., 2015. Pedagogy. In Pöchhacker, F., Grbic, N., Mead, P., Setton, R., eds. Routledge
Encyclopedia of Interpreting Studies. Routledge, London, 303–306.
Mouzourakis, P., 2006. Remote Interpreting: A Technical Perspective on Recent Experiments. Inter-
preting 8(1), 45–66.
Murgu, D., Jiménez, S., 2011. La formación de un intérprete telefónico. In Valero Garcés, C., ed. Tra-
ducción e interpretación en los servicios públicos en un mundo INTERcoNEcTado. Universidad
de Alcalá Servicio de Publicaciones, Alcalá de Henares, 214–219.
O’Brien, S., 2012. Translation as Human-Computer Interaction. Translation Spaces 1(1), 101–122.
Oviatt, S., Cohen, P., 1992. Spoken Language in Interpreted Telephone Dialogues. Computer Speech
and Language 6, 277–302.
Ozolins, U., 2011. Telephone Interpreting: Understanding Practice and Identifying Research Needs.
Translation and Interpreting 3(1), 33–47.
Paneth, E., [1957] 2002. An Investigation into Conference Interpreting. In Pöchhacker, F., Shlesinger,
M., eds. The Interpreting Studies Reader. Routledge, London, 31–41.
Peng, K., Mo, A., Liu, M., 2023. Interacting Modalities in the Teletherapeutic Triad and Interpreter’s
Coping Tactics. In Corpas Pastor, G., Hidalgo Ternero, C.M., eds. Proceedings of the International
Workshop on Interpreting Technologies SAY-IT 2023, 5–7 June | Malaga. Incoma, Varna, 61–67.
Phelan, M., 2001. The Interpreter’s Resource. Multilingual Matters, Manchester.
Phillips, C., 2013. Remote Telephone Interpretation in Medical Consultations with Refugees:
Meta-Communications About Care, Survival and Selfhood. Journal of Refugee Studies 26(4),
505–523.
Pöchhacker, F., 2020. “Going Video”: Mediality and Multimodality in Interpreting. In Salaets, H., Brône, G., eds. Linking Up with Video. John Benjamins Publishing Company, Amsterdam, 13–45.
Price, E.L., Pérez-Stable, E.J., Nickleach, D., López, M., Karliner, L.S., 2012. Interpreter Perspectives
of In-Person, Telephonic, and Videoconferencing Medical Interpretation in Clinical Encounters.
Patient Education and Counseling 87(2), 226–232.
Prieto, M.N., 2008. La interpretación telefónica en los servicios sanitarios públicos. Estudio del caso:
el servicio de “conversación a tres” del Hospital Carlos Haya de Málaga. In Valero Garcés, C.,
Pena Díaz, C., Lázaro Gutiérrez, R., eds. Investigación y práctica en traducción e interpretación
en los servicios públicos: desafíos y alianzas. Servicio de Publicaciones de la Universidad de Alcalá,
Alcalá de Henares, 369–384.
Rodriguez, S., Gretter, R., Matassoni, M., Alonso, A., Corcho, O., Rico, M., Daniele, F., 2021. SmarT-
erp: A CAI System to Support Simultaneous Interpreters in Real-Time. In Mitkov, R., Sosoni, V.,
Giguère, J.C., Murgolo, E., Deysel, E., eds. Proceedings of the Translation and Interpreting Tech-
nology Online Conference. INCOMA Ltd., 102–109.
Rosenberg, B.A., 2007. A Data Driven Analysis of Telephone Interpreting. In Wadensjö, C., Dim-
itrova, B.E., Nilsson, A., eds. The Critical Link 4: Professionalisation of Interpreting in the Com-
munity. John Benjamins Publishing Company, Amsterdam, 65–77.
Roy, C., 2000. Interpreting as a Discourse Process. Oxford University Press, Oxford.
Roziner, I., Shlesinger, M., 2010. Much Ado About Something Remote: Stress and Performance in
Remote Interpreting. Interpreting 12(2), 214–247.
Ruiz Mezcua, A., 2018. General Overview of Telephone Interpretation (TI): A State of the Art. In
Ruiz Mezcua, A., ed. Approaches to Telephone Interpretation: Research, Innovation, Teaching
and Transference. Peter Lang, Bern, 9–17.
Saint-Louis, L., Friedman, E., Chiasson, E., Quessa, A., Novaes, F., 2003. Testing New Technologies
in Medical Interpreting. Cambridge Health Alliance, Somerville, MA.


Serrano, O.J., 2020. Foto fija de la interpretación simultánea remota al inicio del 2020. Revista Tradumàtica. Tecnologies de la Traducció 17, 59–80.
Spinolo, N., 2022. Remote Interpreting. In Franco Aixelá, J., Muñoz Martín, R., eds. ENTI (Ency-
clopaedia of Translation and Interpreting). AIETI (Asociación Ibérica de Estudios de Traducción
e Interpretación).
Spinolo, N., Bertozzi, M., Russo, M., 2018. Basic Tenets and Features Characterising Telephone- and Video-Based Remote Communication in Dialogue Interpreting. In Amato, A., Spinolo, N., González Rodríguez, M.J., eds. Handbook of Remote Interpreting – SHIFT in Orality. AMS Acta, Bologna, 12–25.
Stengers, H., Lázaro Gutiérrez, R., Kerremans, K., 2023. Public Service Interpreters’ Perceptions and
Acceptance of Remote Interpreting Technologies in Times of a Pandemic. In Corpas Pastor, G.,
Defrancq, B., eds. Interpreting Technologies – Current and Future Trends. John Benjamins Pub-
lishing Company, Amsterdam, 109–141.
Sultanic, I., 2022. Interpreting in Pediatric Therapy Settings During the COVID-19 Pandemic: Ben-
efits and Limitations of Remote Communication Technologies and Their Effect on Turn-Taking
and Role Boundary. FITISPos International Journal 9(1), 78–101.
Tripepi-Winteringham, S., 2010. The Usefulness of ICTs in Interpreting Practice. The Interpreters’
Newsletter 15, 87–99.
Valero Garcés, C., Lázaro Gutiérrez, R., Del Pozo Triviño, M., 2015. Interpretar en casos de violencia
de género en el ámbito médico. In Toledano Buendía, C., del Pozo Triviño, M., eds. Interpretación
en Contextos de Violencia de Género. Tirant Lo Blanch, Valencia.
Verrept, H., 2011. Intercultural Mediation Through the Internet in Belgian Hospitals. Proceedings of
the 4th International Conference on Public Service Interpreting and Translation.
Vidal, M., 1998. Telephone Interpreting: Technological Advance or Due Process Impediment? Proteus
7(3), 1–6.
Vogel, S., García, O., 2017. Translanguaging. In Noblit, G., Moll, L., eds. Oxford Research Encyclo-
pedia of Education. Oxford University Press, Oxford.
Wadensjö, C., 1998. Interpreting as Interaction. Addison Wesley Longman, London and New York.
Wadensjö, C., 1999. Telephone Interpreting & the Synchronization of Talk in Social Interaction. The
Translator 5(2), 247–264.
Wang, J., 2021. “I Only Interpret the Content and Ask Practical Questions When Necessary.” Inter-
preters’ Perceptions of Their Explicit Coordination and Personal Pronoun Choice in Telephone
Interpreting. Perspectives: Studies in Translation Theory and Practice 29(4), 625–642.
Zhang, W., Davitti, E., Braun, S., 2024. Charting the Landscape of Remote Medical Interpreting: An
International Survey of Interpreters Working in Remote Modalities in Healthcare Services. Per-
spectives: Studies in Translation Theory and Practice, 1–26.
Zhang, X., Corpas Pastor, G., Zhang, J., 2023. Videoconference Interpreting Goes Multimodal. In
Corpas Pastor, G., Defrancq, B., eds. Interpreting Technologies – Current and Future Trends. John
Benjamins Publishing Company, Amsterdam, 169–194.
Ziegler, K., Gigliobianco, S., 2018. Present? Remote? Remotely Present! New Technological
Approaches to Remote Simultaneous Conference Interpreting. In Fantinuoli, C., ed. Interpreting
and Technology. Language Science Press, Berlin, 119–139.

2
VIDEO-MEDIATED
INTERPRETING
Sabine Braun

2.1 Introduction
Video-mediated interpreting (VMI), a term coined in the AVIDICUS projects (Braun, 2016),
refers to all modalities of interpreting that use audiovisual telecommunications technology
to enable the delivery of interpreting services where the interpreter(s) and at least one of the
primary participants are in separate locations. Alongside telephone-mediated interpreting
(see Lázaro Gutiérrez, this volume), VMI represents a key modality of distance interpreting
(DI; Braun, 2024).
The concept of DI likely originated in Germany in the 1950s, championed by Fredo
Nestler, a German interpreter and inventor. Nestler envisioned a centralised telephone inter-
preting service in Europe, where interpreters would be connected to international telephone
calls to provide simultaneous interpreting (Nestler, 1957). Although his system was never
implemented as a telephone service, the German ViKiS project later tested the feasibility
of a VMI service that incorporated a simultaneous interpreter into two-way video calls
between business clients, using an adapted videoconferencing system with additional audio
channels (Braun, 2001, 2004).
The configurations that are now more commonly associated with the term VMI are
those designed for consecutive/dialogue interpreting settings, using standard videoconfer-
encing systems. Such VMI configurations emerged from the 1990s onwards in the context
of public service interpreting, for example, in bilingual court proceedings, and later in the
context of healthcare communication. In courts, the adoption of videoconferencing tech-
nology to connect remote defendants or witnesses to a courtroom has created a need to
integrate interpreters into these video links when the defendant or witness did not speak
the language of the court. In healthcare settings, the driver for the use of videoconferencing
technology to deliver interpreting services has been the need to optimise access to interpret-
ers in increasingly multilingual societies.
In the context of multilingual conference interpreting, various configurations of VMI
in simultaneous mode were tested by UNESCO in the 1970s, laying the foundations for
remote simultaneous interpreting (RSI) (see Chmiel and Spinolo, this volume). Unlike stand-
ard VMI configurations, which are based on mutual visibility of all participants, including

the interpreter, RSI is an asymmetric configuration with additional audio channels where
the primary participants are visible to the interpreter, but not vice versa, in line with the
convention for simultaneous (conference) interpreting. While early RSI solutions involved
interpreters working in traditional interpreting booths, technological advances, acceler-
ated by the virtualisation of interpreting services during the COVID-19 pandemic, led to
the development of platform-based RSI solutions using simultaneous interpreting delivery
platforms (SIDPs). Another configuration of VMI, known as video relay service (VRS),
emerged to mediate communication between a deaf person and a hearing person who are
not co-located. Both connect to a sign language interpreter: the deaf person by video link,
and the hearing person by telephone (see Warnicke, this volume).
The shift towards online work during the COVID-19 pandemic contributed to the
expansion of VMI from a hitherto relatively marginal practice of interpreting remotely
for individual clients or specific events to a much more widespread modality of inter-
preting in fully virtual and/or complex hybrid event configurations. While interpreters
have often been critical of VMI, the COVID-19 pandemic has highlighted its benefits
in meeting linguistic demand, and the rapidly increasing exposure to VMI (or DI more
broadly) has enabled interpreters to adapt and develop new competencies, beginning to
change perceptions of VMI. At the same time, the many new and untested configura-
tions of working that emerged during the pandemic have also created new and poten-
tially more challenging working conditions, such as fully virtual meetings with many
participant sites. These new conditions have led to an increased need for interpreters
to work online for extended periods, use cloud-based or software-based interpreting
platforms, combine multiple modes of communication (audio-only/video), and/or use
multiple devices to connect with clients and fellow interpreters. The implications of these conditions, along with other factors, for the quality and effectiveness of communication in VMI have yet to be fully addressed.
This chapter provides an overview of VMI as a growing professional practice and area
of research within interpreting studies. Section 2.2 defines key concepts and outlines the
primary configurations of VMI. Section 2.3 briefly traces the historical development of
VMI, while Section 2.4 examines its practices and representation in research across vari-
ous interpreting fields. Section 2.5 reviews specific research topics related to VMI. Finally,
Section 2.6 concludes with a brief outlook on potential future trends and developments.

2.2 Key characteristics of VMI


As mentioned previously, the term VMI refers to the modality of distance interpreting (DI)
in which an interpreter is connected to at least one primary communication participant
through audiovisual communication technology. A defining aspect of VMI is the spatial
separation of the primary participants and interpreters relative to one another. However,
VMI today involves a variety of configurations. One primary distinction can be made between the configuration in which all primary participants are co-located and an interpreter or team of interpreters works from one or more remote locations to provide interpreting services (video remote interpreting; VRI), and the configuration in which the primary participants themselves are distributed across multiple locations, that is, hybrid or virtual events, with the interpreter (team) either joining from one of these locations or from an additional site (videoconference interpreting; VCI) (Braun, 2024).

Differentiating between VRI and VCI helps clarify the distinct contexts in which
interpreters work with audiovisual communication technology. Conceptually, this dis-
tinction brings out the reasons why interpreters encounter different configurations. From
a practical point of view, the distinction reflects the different types of events interpreters
are engaged in. The VRI configuration applies to in-person events, where the technology
is used to connect interpreters to the physical venue in which the primary participants
are located. By contrast, VCI configurations arose from the need to integrate interpret-
ers into hybrid or fully virtual events, such as conferences with remote speakers, court
hearings with remote witnesses, or consultations between a lawyer in their law firm and
a client in prison. Initially, the VCI configuration saw interpreter(s) commonly being
co-located with the primary participants at one of the primary participant sites, but
as hybrid and virtual event formats became more frequent and grew more complex
and diverse, particularly because of the virtualisation of human interaction during the
COVID-19 pandemic, interpreters increasingly joined these events from separate sites,
such as an interpreting hub or their home office. This development may have blurred
the boundaries between VRI and VCI. In addition, the different configurations share
many features, such as spatial separation of at least some of the participants. These
factors might suggest that the distinctions between different configurations are less rel-
evant today. However, they remain important not only for conceptual clarity but also
because, as we will see later in this chapter, different configurations can impact the
interpreter-mediated communication in different ways.
A further conceptual distinction should be drawn based on the medium used for ser-
vice delivery – audio-only (e.g. telephone or audio connections via web conferencing plat-
forms) versus audio-video – highlighting the difference between telephone-mediated and
video-mediated interpreting. However, interpreters also work in hybrid, mixed-media sce-
narios, such as when a video link is unavailable for certain participants in a virtual event
(Zhang et al., 2024) or when participants choose to disable their video feed during an event.
These practices have further diversified the configurations of DI.
In addition, some configurations of VMI are inherently asymmetrical in terms of media
use. For instance, DI in conference interpreting settings – normally referred to as remote
simultaneous interpreting (RSI) regardless of the distribution of primary participants and
interpreters relative to one another – typically involves video feeds from participants to
interpreters, while the audience receives only an audio feed, reflecting traditional confer-
ence interpreting norms (see also Chmiel and Spinolo, this volume). Similarly, video relay
services (VRS), which are used to facilitate communication between deaf and hearing indi-
viduals, connect an interpreter by video link to the deaf person (to provide sign language
interpreting) and by telephone to the hearing person (Warnicke and Plejert, 2012; Napier
et al., 2018b; Warnicke, this volume).
This initial overview highlights the need to consider additional parameters in devel-
oping a comprehensive taxonomy of VMI. Firstly, the different configurations of VMI
cannot be fully described without drawing on the parameters that traditionally apply to
interpreter-mediated communicative events, such as the setting (e.g. conference, business,
legal, health- and social care, humanitarian), the type of event (monologic/dialogic, bi/mul-
tilingual), and the mode of interpreting used. The standard application of VMI typically
involves dialogic, bilingual interactions supported by consecutive or dialogue interpreting.
However, the asymmetrical modalities outlined earlier, that is, RSI and VRS, also exhibit
characteristics of VMI.

32
Video-mediated interpreting

Further relevant parameters for a comprehensive description of VMI, highlighting the many nuances that characterise this modality of interpreting, include the type of location
from which the interpreter works, whether an interpreting hub with professional interpreter workstations or booths for simultaneous interpreting, or an interpreter's home or
mobile office; the system or platform that is used to connect the participants and deliver the
interpreting service, such as a generic videoconferencing platform or a bespoke interpreting
delivery platform with a specific software interface for the interpreter with or without addi-
tional audio channels enabling simultaneous interpretation; the type of connection, that is,
broadband internet or a mobile network; and the technical set-up, for instance, in terms of
number, size, and/or type of screens, cameras, microphone, and headset, and the presence
or absence of a hardware console with interpreting functions. All these parameters impact
the quality of the interpreter’s work environment and working conditions, and – as we will
see – many of them have been shown to affect interpreters’ and service users’ perceptions
of VMI as well as different dimensions of the interpreting task. The interpreter’s location
(home or hub) also has implications for liability and data security, raising questions about
who is responsible for technical breakdowns or data breaches (Ziegler and Gigliobianco,
2018).

2.3 Historical evolution of VMI


VMI, in the broader sense outlined in the preceding text, initially emerged in the fields
of conference and legal interpreting, but it took different avenues in each of these fields,
reflecting the different requirements of conference interpreting for international organisa-
tions (including international courts and tribunals) or commercial markets on the one hand,
and legal interpreting for police, courts, and tribunals, as a form of public service interpret-
ing, on the other.
International organisations, international courts, and the commercial/business sector
require interpreting in simultaneous mode, often between multiple language pairs, leading
to RSI, as pointed out earlier. When UNESCO launched its Symphonie satellite in the early
1970s and explored applications for the technology, RSI emerged as a candidate, given
the potential of telecommunications technology to reduce the need to move large inter-
preter teams around the world (UNESCO, 1976). Different variants were tested, including
the delivery of interpreting services via audio-only and audiovisual connections. From the
1980s through to the 2000s, several further tests were conducted by supranational organ-
isations, including the United Nations’ International Telecommunication Union and the
European Union, to evaluate RSI in the context of conference interpreting, this time using
experimental high-quality video links as well as ISDN-based solutions (for overviews, see
Braun, 2015; Mouzourakis, 1996, 2006; see also Section 2.4.1).
A watershed moment for RSI was the 2005 G20 Summit at Hampton Court Palace in
London, which saw the first large-scale use of RSI in practice. As in the earlier tests, inter-
preters worked from traditional interpreting booths with hardware consoles. At the G20
Summit, the booths were housed in a nearby temporary RSI hub to avoid the need
to install interpreting booths in the historic Hampton Court Palace building. A further
turning point came in the 2010s, when several companies began to develop software-based
simultaneous interpreting delivery platforms (SIDPs), taking advantage of advances in web
conferencing technology, to facilitate RSI without the need for interpreting booths and
traditional conference interpreting technology (Ziegler and Gigliobianco, 2018). While the first commercial RSI solutions were entirely audio-based, today's RSI solutions normally
provide one or more video feeds from the venue and/or from individual participants to the
interpreter.
Meanwhile, in the justice sector, the increasing use of videoconferencing technology in
courts led to a steady increase in VCI in video links between courts and remote witnesses
and/or defendants (Braun and Taylor, 2012b). An early implementation of VRI was in the
9th Judicial Circuit in Florida, which introduced this service in 2007. The videoconferenc-
ing platform allows interpreters to switch between consecutive and simultaneous interpret-
ing to mimic the conventions of traditional court interpreting (i.e. consecutive into the
official language of the court, and simultaneous out of that language). The Metropolitan
Police Service in London introduced VRI in 2011, with interpreters working in consecu-
tive mode from central remote interpreting hubs linked to London police stations (Braun
and Taylor, 2012b). In the healthcare sector, VRI began to replace telephone-mediated
interpreting in the 2010s (Marshall et al., 2019), but VRI still appears to be less common
than telephone interpreting in many healthcare settings. Many of the developments in VMI
outlined here took place in relation to both spoken language interpreting and sign language
interpreting (Napier et al., 2018a).
Various technologies have been used for VMI over the years. The early UNESCO tri-
als used satellite technology, but transmission was too slow. The digital telephone net-
work (ISDN) used from the 1990s onwards did not have sufficient bandwidth to provide
adequate audio and video quality for interpreting. The introduction of broadband internet,
along with improved technical standards for audio and video transmission, combined with
high-quality equipment, for example, in dedicated hubs for interpreters, made VMI a more
technically feasible option. However, the recent shift towards home/mobile working, using
cloud-based video platforms, has resulted in less control over the technical environment,
leading to variations in sound and image quality and potentially worsening working condi-
tions for interpreters (Buján and Collard, 2022).
Overall, regardless of the set-up or technological basis employed, VMI has been associ-
ated with a range of technical challenges, many of which persist to some extent. Insufficient
audio quality for interpreting purposes is perhaps the most frequently reported issue. Developers of videoconferencing systems have typically prioritised video quality over audio, deeming the latter adequate for monolingual communication even though it is not always sufficient for interpreters' specific needs regarding source speech comprehension. Additional
challenges include latency in audio and video transmission, lack of synchronicity between
audio and video feeds, and issues with system and network stability.
Perhaps unsurprisingly, given these challenges, interpreters have often been reluctant to
use VMI (Braun and Taylor, 2012b; Mouzourakis, 2006). In recent years, however, there
appears to have been a shift in attitudes, with an increased focus on the benefits of VMI
(e.g. Seeber et al., 2019; Corpas Pastor and Gaber, 2020). One possible explanation is the
growing exposure of interpreters to all modalities of DI, which has allowed interpreters to
gain first-hand experience, develop strategies to cope with different DI modalities, and iden-
tify benefits, particularly during the COVID-19 pandemic. Nimdzi (2023) highlighted in its
Interpreting Index that DI now accounts for nearly 50% of the interpreting market in the
post-pandemic era, with VMI (in the narrower sense, that is, bilingual consecutive/dialogue
interpreting) at 20%, RSI at 16%, and telephone-mediated interpreting at 13%. This marks
a significant increase from pre-pandemic market shares of 10% for telephone-mediated
interpreting, 7% for VMI, and 3% for RSI.


2.4 From practice to research: VMI in different settings


As noted earlier, different VMI configurations arose from distinct needs. The introduction
of VRI has been associated with optimising access to interpreters in response to growing
linguistic demand and financial pressures on the cost of interpreting. In the public sector, shortages of qualified interpreters for some language pairs, often a result of the trend towards the outsourcing of interpreting services and lower pay of interpreters, have further pushed the implementation of VRI. VCI, by contrast, has emerged in response to the increasing virtualisation of events and collaboration and the ensuing increase in hybrid or virtual meetings, into which interpreters need to be integrated where language barriers exist. While
the different configurations share many features, each also has unique characteristics influ-
encing VMI practice. This section provides an overview of VMI applications and research
across a range of settings.

2.4.1 VMI in conference interpreting settings


As mentioned earlier, the first practical test of VMI in conference interpreting settings,
that is, RSI, took place in 1976 and was conducted by UNESCO via satellite link. The test
included three different modalities: remote interpreting by audio-only link and VRI from
Paris for UN delegates in Nairobi, and VCI between UN delegates in Paris and Nairobi,
with the interpreters located in Paris. The outcomes were documented in a practice report,
which noted that any form of remote interpreting was more challenging than videocon-
ference interpreting and that the video-mediated modalities were more effective than the
audio-only modalities (UNESCO, 1976).
Further VMI implementations in international organisations during the 1980s experi-
mented with both VRI and VCI across geographical distances and explored VRI solutions
aimed at removing interpreters from the conference venue and interpreting from a video
feed in an off-room instead (Jumpelt, 1985; for overviews, see Braun, 2015; Mouzoura-
kis, 1996, 2006). The research studies that followed in the 1990s and 2000s focused on
VRI rather than on VCI. They all included video streams to the interpreter rather than
audio-only connections, as the video modality had been found to be superior to the
audio-only modality (Mouzourakis, 1996). The first empirical study of VRI in conference
settings, using simultaneous interpreting, was carried out at the Heinrich Hertz Institute
(Böcker and Anderson, 1993), simulating VRI and using an ISDN connection. The study
concluded that VRI in this setting led to interpreter fatigue and low job satisfaction, with
ISDN bandwidth only marginally adequate for the task.
Later, two comparative studies of on-site interpreting and RSI were conducted: one by
the International Telecommunication Union (ITU) in 1999, in collaboration with the Uni-
versity of Geneva, involving 12 interpreters working in two language pairs (Moser-Mercer,
2003), and another by the European Parliament in 2004, involving 36 interpreters working
in multiple language pairs (reported in Roziner and Shlesinger, 2010). Conducted under
improved technical conditions, these studies found little objective difference in interpret-
ing quality or main human factors (e.g. stress levels). However, interpreters reported lower
satisfaction, higher perceived stress, and a sense that their RSI performance was inferior to
on-site interpreting. The ITU/Geneva study also indicated an earlier onset of fatigue during
RSI compared to on-site interpreting (Moser-Mercer, 2003).
RSI tests employed varying technical conditions. Although audio quality frequently
failed to meet the ISO 2603 standard for simultaneous interpreting booths, Mouzourakis (2006) argued that recurring physiological and psychological issues observed across studies were more attributable to the overarching condition of remoteness than to any specific
technical factor. Professional conference interpreters, particularly within the International
Association of Conference Interpreters (AIIC), strongly opposed RSI in the 2000s, deeming
remote interpreting unacceptable (AIIC, 2000).
The shortcomings of RSI identified in the early studies, such as the absence of a direct
view of the speaker and the resulting feelings of remoteness or alienation, prompted efforts
to optimise the technical set-up of RSI, including the use of multiple large screens to provide
interpreters with detailed views of delegates (Ziegler and Gigliobianco, 2018). Addition-
ally, new ISO standards were introduced, defining minimum requirements for audio quality
and related parameters (e.g. ISO 20108:2017 for the quality and transmission of sound and
image input in simultaneous interpreting, and ISO 20109:2016 for equipment; see Pérez
Guarnieri and Ghinos, this volume).
The idea of implementing physical RSI hubs equipped with advanced conferencing tech-
nology and soundproof interpreting booths became popular from the mid-2010s, particu-
larly in major cities. These hubs aimed to reduce interpreter travel while offering organisers
and interpreters greater control over technical conditions (Ziegler and Gigliobianco, 2018).
A study on RSI at the 2015 World Cup in Brazil highlighted benefits, such as dedicated work-
spaces, opportunities for team collaboration, and improved interpreter well-being, resulting
in more favourable attitudes towards RSI compared to earlier findings (Seeber et al., 2019).
However, as outlined in Section 2.2.2, RSI has undergone a further significant evolu-
tion since the late 2010s. In addition to the rise of physical hubs offering booth-based RSI,
software-based virtual RSI platforms also emerged. An early evaluation by the European
Commission’s Directorate-General for Interpreting found these platforms generally suit-
able for conference interpreting (DG SCIC, 2019). Around the same time, the International
Association of Conference Interpreters (AIIC) revised its stance on RSI, publishing guide-
lines in 2018 that recognised RSI as a new reality and established minimum requirements
for its use (AIIC, 2018).
It is unclear whether physical RSI hubs equipped with high-end technology and easily
accessible to interpreters would have achieved widespread adoption without the COVID-19
pandemic or whether they would have coexisted with software-based RSI platforms. The
pandemic significantly accelerated the adoption of software-based RSI, though a 2020
survey of over 800 conference interpreters found a majority believed their performance
and working conditions to be poorer in platform-based RSI compared to on-site inter-
preting (Buján and Collard, 2022). Despite these challenges, over 40 RSI platform pro-
viders emerged or expanded during the pandemic, introducing features such as improved
interpreter collaboration tools, AI-assisted transcription, term extraction (Fantinuoli et al.,
2022; Rodríguez González et al., 2023), and audio quality enhancements.
Nevertheless, generic conferencing platforms such as Zoom continue to dominate RSI
assignments due to organiser preferences, offering only basic interpreting functionality
(Buján and Collard, 2022; Saeed et al., 2023). A recent development to address this gap is
the integration of RSI platforms as plug-in solutions with generic conferencing platforms,
enhancing interpreter functionality. Post-pandemic, RSI platform providers are also shift-
ing their business models, transforming platforms into marketplaces for interpreters and
clients, alongside offering automated interpreting options.
In conclusion, RSI has evolved substantially in recent years, with the pandemic acting
as a catalyst for the development of software-based platforms. Although concerns about performance and working conditions persist, innovative features and new business models
suggest the RSI market remains dynamic and continues to adapt.

2.4.2 VMI in legal settings


Prior to the use of VMI in legal settings, telephone-mediated interpreting was used in some
countries. The most notable example is perhaps the Telephone Interpreting Project (TIP),
which was implemented in US federal courts in 1989 to address the growing need for lan-
guage services and to assist federal courts with access to interpreters for rare languages.
In the 1990s, law enforcement and justice sector institutions in many Anglophone coun-
tries and the European Union began implementing ISDN-based videoconferencing systems
into legal proceedings, especially criminal and immigration proceedings. This trend contin-
ued into the 2000s, transitioning to broadband-based videoconferencing technology. These
systems enabled video links between courtrooms, prisons, prosecution offices, and police
stations for pre-trial hearings and remote witness testimony (Braun and Taylor, 2012b;
Braun et al., 2018; Devaux, this volume). This development initially created demand for
VCI rather than VRI, as interpreters needed to be integrated in the various types of video
links. The motivations for the use of videoconferencing in the justice sector included cost
saving, reducing security risks linked to transporting detained persons to court for their hearing, and
enhancing cross-border judicial collaboration in Europe (Braun and Taylor, 2012c). VCI
has since been used in cross-border legal proceedings under European legislation on mutual
legal assistance (Braun and Balogh, 2015), as well as in national courts, particularly for
pre-trial hearings, with defendants appearing from custody via video link, with interpreters
present either in court or in custody. Other uses include witness hearings and lawyer–client
conferences (Braun, 2018; Braun et al., 2018; Devaux, 2018; Fowler, 2018; Singureanu
et al., 2023).
Early pilot studies of video link use in criminal proceedings offered limited insight into VCI due to the small number of cases involving interpreters and the lack of a detailed analysis
of interpreter involvement (Terry et al., 2010). A more comprehensive feasibility study of
VCI in the legal setting was conducted in relation to immigration hearings in Canada in the
early 2000s (Ellis, 2004). Involving over 70 participants (immigration lawyers, judges,
refugee protection officers, and interpreters), the interview-based study revealed mixed
opinions on the effectiveness of VCI in this setting. The separation of the interpreter from
the refugee, which prevented simultaneous (whispered) interpreting, was perceived as one
of the main problems. Interpreters had to use consecutive interpreting, which judges found
disruptive, adding to interpreters’ challenges. Participants also found video hearings more
tiring than in-person hearings and perceived the interpreting quality to be lower (see also
Singureanu and Braun, this volume). An ethnographic study of pre-trial hearings further
highlighted issues with interpreters’ positioning, microphone access, and video visibility,
noting that these problems, along with the lack of a protocol, caused disruptions and mis-
understandings (Fowler, 2018).
While the evolution of VMI in legal proceedings was initially driven by the need to
integrate interpreters into video link hearings, VRI also gained traction in the 2000s and
was adopted by Florida courts in the United States in 2007 (see Section 2.3). In the UK, the
London Metropolitan Police Service launched a hub-based consecutive video interpreting
service in 2011. By the 2010s, VMI, in both the VCI and VRI configurations, had expanded
significantly worldwide.

In response to these developments, the European AVIDICUS projects (2008–2016;
Braun and Taylor, 2012a) conducted comprehensive studies on the viability and quality of
video-mediated interpreting (VMI) in legal proceedings. AVIDICUS 1 (2008–2011) included
over 40 experimental studies across three countries and multiple language pairs, compar-
ing on-site and video-mediated interpreting (Balogh and Hertog, 2012; Braun and Taylor,
2012d; Miler-Cassino and Rybińska, 2012). Results showed lower interpreting quality,
faster performance decline over time, interpreter fatigue, and more interactional problems
in VMI compared to on-site interpreting. The project recommended improvements through
training and better technical equipment and developed guidelines for good practice and
training for interpreters and legal practitioners (Braun, 2012; Braun et al., 2012). Napier
(2012) conducted a parallel study on sign language interpreting with similar findings.
AVIDICUS 2 (2011–2013), which replicated some of the AVIDICUS 1 studies after pro-
viding short-term training to interpreters and using better equipment, showed no significant
performance improvement, suggesting the need for a more comprehensive approach to the
implementation of VMI (Braun, 2014, 2017). Observations of real-life video hearings also
highlighted differences in communication dynamics between traditional and video-mediated
settings (Licoppe and Verdier, 2013; Licoppe et al., 2018).
AVIDICUS 3 (2014–2016) assessed videoconferencing facilities in European legal sys-
tems and found limited provision for VMI-specific requirements, such as high-quality
audio, and insufficient planning for integrating interpreters into complex cases involving
video links, despite the increasing use of such links across all jurisdictions in Europe (Braun
et al., 2018).
A more recent study focusing on the use of VCI in pre-trial hearings where the inter-
preter was located in court highlights the specific challenges of this participant distribution,
where the separation of the interpreter from the defendant prevents the interpreter from
interpreting simultaneously by chuchotage, leading to overlapping speech and loss of information, but also prompting interpreters to adapt their strategies (Singureanu et al., 2023).
Although the virtualisation of justice was already advancing through digital justice
programmes in many countries before the COVID-19 pandemic, the rapid shift to online
hearings during the pandemic introduced new complexities. This not only created fresh
challenges but also highlighted and exacerbated long-standing issues with implementing
effective VMI solutions in legal proceedings. Legal interpreters working in hybrid or online
hearings often had to resort to ad hoc strategies, such as using personal mobile devices to
deliver services. While court video platforms have improved since the early pandemic, they
have been slow to integrate interpreter-specific functionalities.
Overall, the use of VMI in legal proceedings has expanded significantly since its intro-
duction in the 1990s, evolving into the current post-pandemic digital justice era. However,
this growth has not consistently produced effective solutions for integrating VMI in legal
settings. Research has identified numerous challenges, many of which have been magnified
by the broader application of VMI since COVID-19 but remain inadequately addressed and
require further investigation.

2.4.3 VMI in healthcare settings


In healthcare settings, it was shortages of qualified interpreters that initially drove the adop-
tion of remote modalities, beginning with telephone-mediated interpreting in the 1970s
(Ozolins, 2011; Rosenberg, 2007). While the use of VRI in healthcare settings has expanded over recent years (Braun et al., 2023), telephone interpreting remains a common modality
in healthcare settings (De Boe, this volume).
When using VRI in healthcare settings, interpreters often work from home or from hubs
located in large hospitals or operated by remote interpreting service providers (Zhang et al.,
2024). The expansion of healthcare consultations by video link during the pandemic fur-
ther led to an increase in the demand for VCI, especially for configurations in which inter-
preters work from their own location, leading to three-way communication links between
healthcare providers, patients, and interpreters, sometimes combining audio-only and video
connections (Zhang et al., 2024). An earlier pilot of this configuration revealed high satis-
faction rates among patients (Schulz et al., 2015).
The effects of these relatively recent VCI configurations on interpreter-mediated interac-
tions remain underexplored and warrant further investigation. By contrast, VRI has been
studied extensively in health sciences, especially to assess its feasibility within healthcare
workflows. This research has primarily focused on the effectiveness of, and satisfaction
with, VRI compared to on-site interpreting, often sidestepping questions of interpreting quality, particularly accuracy, and its impact on patient outcomes. An early systematic review
concluded that remote interpreting by telephone and video link is as acceptable as on-site
interpreting for patients and doctors – and, to a slightly lesser extent, for interpreters – with
similar accuracy levels reported across the two modalities (Azarmina and Wallace, 2005).
However, no formal interpreting quality assessment had been undertaken in the reviewed
studies.
Studies examining interpreting quality in healthcare settings have typically relied on
self-reported perceptions from patients and providers, despite their limited ability to evaluate
quality accurately (see Section 2.5.2). More attention has been given to interactional issues,
with recent research highlighting differences in communication dynamics and complexity
between on-site interpreting and VRI (Hansen, 2020; Klammer and Pöchhacker, 2021).
Interestingly, while perception-based studies suggest that VRI is comparable to on-site inter-
preting, interactional studies reveal it can be more complex and challenging to navigate (see
Section 2.5.3). The links between interactional complexity, interpreting quality, and patient
outcomes remain largely unexplored and require further systematic investigation.

2.5 Selected research topics in VMI


This section examines the most explored research themes in VMI, including interpreters’
and users’ perceptions, attitudes, and satisfaction levels; interpreting quality; human fac-
tors, such as stress and fatigue; interactional dynamics; working conditions; strategies; and
the potential for adaptation to VMI. By mapping the exploration of these topics across vari-
ous interpreting domains, the section identifies areas of thorough investigation as well as
those that have received comparatively little attention. As previously, the emphasis remains
on VMI in its narrower sense, though findings from studies on RSI are incorporated where
relevant.

2.5.1 Perceptions, preferences, and attitudes


Perceptions, preferences, and attitudes towards VMI have been investigated across all inter-
preting settings. As explained in Section 2.4.3, research on technology-mediated healthcare
interpreting often compares on-site interpreting with telephone-mediated interpreting and
VMI, revealing a general preference for on-site interpreting among interpreters (Azarmina
and Wallace, 2005; Locatis et al., 2010; Price et al., 2012). Yabe (2020) found that both
healthcare providers and deaf and hard-of-hearing patients preferred on-site interpret-
ing for critical care to ensure effective and accurate communication. Price et al. (2012)
emphasised the influence of healthcare communication genres on interpreters' perceptions
of different interpreting modalities, with remote modalities perceived as less satisfactory
for patient assessment scenarios than on-site interpreting due to challenges in building rap-
port with remote participants. Healthcare interpreters surveyed by Zhang et al. (2024)
highlighted frequent technical and logistical challenges in telephone- and video-mediated
interpreting. Issues such as poor sound quality, limited visual cues, lack of briefing,
and restricted non-verbal communication negatively affected interpreters’ effectiveness.
Telephone-mediated interpreting was perceived to be particularly difficult in complex medi-
cal settings involving multiple speakers or emotionally charged communication, such as
delivering bad news. While VMI was also seen as impacting interaction and communica-
tion negatively, it was perceived as more effective than telephone-mediated interpreting for
handling complex healthcare interactions.
In the field of legal interpreting, an early survey of 150 legal interpreters in Europe
revealed generally negative attitudes towards, and low acceptance of, VMI (Braun and Tay-
lor, 2012c). However, further analysis by country highlighted significant differences, with
particularly negative attitudes in the UK compared to more positive views in continental
Europe (Braun, 2018). These differences were attributed to several factors. First, the tech-
nology used for VMI played a role. The UK, an early adopter of videoconferencing in legal
proceedings, relied on ISDN-based legacy equipment implemented from the 1990s onwards
(see Section 2.4.2) until well into the 2000s. This equipment provided poor sound quality
and only offered limited options to interact with the technology. In contrast, many continental
European countries, as later adopters, implemented more modern, internet-based
videoconferencing systems that interpreters found more acceptable. Second, the socio-economic
context of interpreting in the UK is likely to have contributed to the negative perceptions.
The expansion of VMI coincided with debates about cost-cutting in public service
interpreting and an emerging trend towards outsourcing interpreting services to a private
contractor, a development that lowered remuneration, worsened working conditions for
interpreters, and nurtured a perception of undervaluation within the UK justice system.
An interview-based study of 17 legal interpreters in the UK revealed mixed feelings about
VMI, with the interpreters recognising benefits such as improved safety and cost savings
but noting drawbacks such as altered communicative dynamics and reliance on technology
(Devaux, 2018, this volume).
A more recent survey of public service interpreters working in different sectors, covering
their views of both telephone-mediated and video-mediated interpreting, yielded more
positive results: respondents showed some appreciation for the benefits of these modalities,
such as the comfort of working from home, despite concerns about stress and quality (Corpas Pastor
and Gaber, 2020). Some of these results were echoed in an international survey of sign
language interpreters, which found that the interpreters’ experience of both VRI and VRS
is overall satisfactory, but that interactional and technological issues make VMI more chal-
lenging than working on-site (Napier et al., 2017). The interpreters in Napier et al.’s survey
highlighted benefits for both themselves and for the deaf community as the service users.
This is corroborated by Singureanu et al. (2023) and Zhang et al. (2024), who similarly
found that interpreters often evaluate VMI from the perspective of whether it benefits
the service users.
In conference settings, two early studies comparing on-site interpreting with VRI (RSI)
reported high stress levels and low acceptance among interpreters (Moser-Mercer, 2003;
Roziner and Shlesinger, 2010). However, more recent findings suggest a shift in attitudes,
with the 22 participating interpreters working at an RSI hub during the World Cup in Bra-
zil reporting that they were satisfied with their performance and that they felt an increase
in psychological well-being after gaining experience with RSI (Seeber et al., 2019). Con-
versely, a COVID-19 pandemic survey of over 800 conference interpreters showed largely
negative attitudes and low satisfaction with RSI, likely influenced by the limited time to
adjust to working largely online at the time of the survey and by a fragmented client base
resulting from this (Buján and Collard, 2022). These perceptions were corroborated in a
number of local surveys during the pandemic (see Chmiel and Spinolo, this volume).

2.5.2 Interpreter comprehension, interpreting quality, and human factors


While the interpreter’s source speech comprehension and output quality are critical aspects
of VMI, they have received less consistent attention across interpreting settings than
other factors. In healthcare settings, research into interpreting quality in VMI in
particular remains limited. However, an observational study by De Boe (2020)
comparing interpreting quality in telephone, video remote, and on-site interpreting found
evidence of a link between modality and interpreting quality, showing that the remote
modalities have a negative impact on output quality. In a survey, Zhang et al. (2024)
elicited healthcare interpreters’ perceptions of their source speech comprehension and
output quality, finding that telephone-mediated interpreting is perceived to have a more
negative impact on both aspects than VMI. However, the extent of the perceived
difference between the two modalities varied across situations. VMI was perceived
as being superior to telephone-mediated interpreting when the source speech was highly
emotional or technical, and when speakers were difficult to understand. Additionally, two
studies indirectly assessed quality by evaluating patient comprehension of diagnoses. These
studies concluded that on-site and video interpreting facilitated better comprehension than
telephone interpreting (Lion et al., 2015; Anttila et al., 2017).
In conference settings, the quasi-experimental ITU/Geneva study (Moser-Mercer, 2003)
and the European Parliament study (Roziner and Shlesinger, 2010) comparing on-site inter-
preting and RSI yielded partly inconsistent results. While interpreters in the European Par-
liament study rated their performance in VRI (RSI) as poorer than in on-site interpreting,
statistical analyses in both studies found few differences in interpreting quality, except for
an earlier onset of fatigue in RSI in the ITU/Geneva study.
In legal settings, Braun and colleagues’ research in the European AVIDICUS projects
found that VMI often exacerbated known challenges in legal interpreting (Braun and Tay-
lor, 2012a). For example, two of the AVIDICUS studies specifically focusing on the use
of VRI in police interviews identified a significantly higher number of interpreting prob-
lems and a faster onset of fatigue in VRI compared to on-site interpreting (Braun, 2013,
2014). Qualitative analyses highlighted a range of comprehension and lexical retrieval
problems, mishearings, and misunderstandings by the interpreters (Braun, 2013, 2017).
However, a study conducting a three-way comparison of telephone, video remote, and
on-site interpreting in police interviews concluded that telephone interpreting was inferior
to the other two modalities, while VRI and on-site interpreting showed minimal differences
(Hale et al., 2022).
This overview shows that research on VMI has yielded mixed results. On the one hand,
RSI studies have observed a striking gap between objective measures of interpreting quality
and interpreters’ subjective perceptions. While objective evaluations found minimal differ-
ences between on-site interpreting and RSI, interpreters’ subjective quality perceptions were
substantially lower for RSI. Roziner and Shlesinger (2010) suggested that interpreters’ per-
ceptions might have been influenced by their dissatisfaction with RSI. On the other hand,
studies in conference and legal settings have produced differing conclusions about VMI,
which may be attributed to variations in study design, methods of quality assessment (see
Davitti et al., this volume), and the nature of the interpreting contexts (e.g. monologic ver-
sus dialogic interactions). These differences complicate efforts to draw broad conclusions
about the quality of VMI compared to on-site interpreting.
While quality is a critical factor in evaluating the viability of VMI, other considerations,
such as ergonomic, psychological, and physiological factors, also play a significant role in
shaping interpreters’ experiences and well-being. Studies on RSI indicate that interpret-
ers face greater stress and discomfort in remote settings compared to on-site interpreting
(Buján and Collard, 2022; Moser-Mercer, 2003; Roziner and Shlesinger, 2010), potentially
due to the reduced sense of presence associated with technology-mediated communication.
Research in human–computer interaction (Luff et al., 2003) suggests that diminished pres-
ence can negatively impact user experience. Similarly, Moser-Mercer (2005) posited that
distance interpreting might hinder interpreters’ ability to process information and construct
mental representations of situations, contributing to stress and fatigue. This has sparked
discussions on the role of cognitive load in VMI (Zhu and Aryadoust, 2022; see also Chmiel
and Spinolo, this volume).
To enhance the VMI experience, further research is needed to address these challenges.
For instance, while audio quality is widely recognised as vital to interpreting quality and
user experience, the influence of other factors, such as the visual interface design of vide-
oconferencing and RSI platforms, remains less understood. Initial studies on RSI platform
interfaces suggest that interpreters tend to favour information-rich, feature-packed inter-
faces offered by bespoke platforms. However, much of their work is currently conducted on
generic platforms like Zoom, which employ minimalist interfaces that may negatively affect
both interpreting quality and user experience (Saeed et al., 2023).

2.5.3 Spatial arrangement, visual ecology, and interaction


The challenges of VMI, stemming from the spatial separation of participants and from the
technical issues associated with the transmission of sound and images, including transmission
delays and constraints on mutual visibility, have also been shown to affect coordination
and interaction. Several studies have provided insights into how different configurations
of participant distribution and technical working environments shape the interactional
dynamics and their management in VMI.
As discussed in Section 2.5.1, studies comparing interpreter and user perceptions of dif-
ferent modalities have suggested a ‘ranking’ order, with video-mediated interpreting (VMI)
perceived as being closer to on-site interpreting than telephone interpreting. This aligns
with the classic work by Short et al. (1976) on the social dynamics of technology-enabled
communication. Short and his colleagues considered the capability of different communi-
cation technologies to create and maintain a sense of presence, that is, a feeling of being
physically present with others. They concluded that technologies offering fewer non-verbal
and visual cues diminish this sense of presence, making interactions feel more impersonal
and potentially affecting communication quality. Conversely, media that transmit more
visual and auditory cues create a stronger sense of presence and are seen as supporting more
effective communication.
However, studies exploring the visual ecology of VMI, its spatial arrangements, and
the use of visual and embodied resources suggest that VMI is visually more complex than
on-site interpreting and can affect communication. Although video links provide visual
cues, as captured by cameras, these cues offer only a partial view of the remote partici-
pants’ communicative context and are often less effective than the cues available in on-site
interpreting (Braun, 2004, 2012, 2017; Davitti and Braun, 2020). For instance, the way the
camera frames the remote site can obscure important visual details, such as hand gestures,
or introduce distractions, such as someone entering the room or off-camera activities in
the remote space. If the interpreter’s hands are not visible, participants may fail to realise
that the interpreter is taking or consulting notes and may misinterpret the resulting pauses
as uncertainty, undermining their trust in the interpreter. The distance between remote
participants and the camera, along with the size of the screen on which they are viewed, can
also lead to important cues being missed. For example, if a participant is far away from
the camera or viewed on a small screen, facial expressions and small hand gestures may go
unnoticed. This issue is particularly relevant when interpreting for older individuals whose
facial expressions may be harder to read (Gilbert et al., 2022). Equally important, the view-
ing distance and angle of the screen showing remote participants can influence how they are
perceived (Benforado, 2010; Singureanu et al., 2023).
Further evidence for the lower effectiveness of visual cues in VMI compared to on-site
interpreting comes from recent research on interaction in VMI. Hansen (2024) examined
interpreter-initiated repair sequences in VMI during healthcare encounters, highlighting
that while non-verbal repair initiators are less intrusive for the participants than verbal
utterances, the visual constraints of VMI may prevent non-verbal repairs from being per-
ceived by the remote participants. Exploring how gaze works for turn-taking in VMI,
Vranjes (2024) found that while gaze patterns in VMI are similar to those in face-to-face
interactions, their effectiveness in coordinating turns is reduced, suggesting that interpret-
ers may need to adopt more explicit turn-taking strategies, such as prolonged gaze at the
screen. De Boe (2024) compared interactional aspects in VRI, telephone-mediated, and
on-site interpreting, arguing that the remote modalities require more grounding to ensure
mutual understanding between participants than on-site interpreting and that VRI leads to
turn-taking issues and ineffective gestures.
Specifically in situations of VRI, achieving the triangular positioning typical of on-site
dialogue interpreting has been shown to be more challenging, making it harder for the inter-
preter to perceive embodied cues from the co-present interlocutors and to build rapport with
them, and vice versa (Davitti and Braun, 2020; Gilbert et al., 2022; Hansen, 2020; Klammer
and Pöchhacker, 2021). Participants’ awareness that the video camera only captures part of
their environment may prompt adjustments in their positioning relative to the videoconfer-
encing equipment, such as sitting closer together than in in-person interactions (Braun et al.,
2018; Klammer and Pöchhacker, 2021). However, in legal proceedings, such adjustments risk
undermining perceptions of the interpreter’s impartiality (Licoppe et al., 2018).

Furthermore, different configurations of VMI have been shown to alter turn-taking pat-
terns and fragment communication. For example, the use of VRI in police interviews was
found to affect the interpreter’s understanding of the communicative intentions behind some
of the police officers’ statements, leading to misrepresentation of those intentions (Braun,
2017). Similarly, in video-mediated court proceedings where the minority-language speaker
participates remotely while the interpreter is co-located with other participants in court,
research has highlighted instances where the sequential order of turns was altered. This
occurred because participants present in court failed to notice requests from the remote
minority-language speaker to continue speaking after the first part of their utterance had
been interpreted (Licoppe et al., 2018).
Equally important, this configuration has also been shown to affect the mode of inter-
preting, which, in turn, influences the dynamics of the proceedings (Singureanu et al.,
2023). When the minority-language speaker attends court remotely and the interpreter
is in court, simultaneous interpreting is not feasible unless the videoconferencing system
provides additional audio channels. In such situations, interpreters have been observed to
adopt alternative strategies, notably alternating between simultaneous interpreting, delivered
by briefly speaking over the participants in court, and consecutive interpreting, delivered
during brief pauses in the speakers’ turns. However, these coping strategies can have an impact on the
court proceedings, potentially leading to errors or loss of information, and contribute to
increased stress for interpreters (Singureanu et al., 2023). Ethnographic research in legal
settings also highlights the negative impact of imbalanced participant distributions, such as
when the defendant is the only remote participant, on their ability to follow court proceed-
ings and intervene (Ellis, 2004; Fowler, 2018).
Research on the interactional aspects of VCI configurations with multiple sites remains
limited. Rosenberg (2007) suggested that interaction in three-way telephone links, where the
primary interlocutors and the interpreter are all located separately, may be less problematic
than remote interpreting via telephone, where the primary interlocutors are co-located, as
the three-way link can place participants on a more equal footing. However, Braun (2004,
2007), in her analysis of interaction in VCI configurations where the primary participants
were in different locations and interpreters worked from a third site, providing simultane-
ous interpreting, concluded that this set-up creates its own interactional challenges, particu-
larly requiring greater coordination efforts from the interpreter. Despite identifying these
challenges, interaction-focused research on VMI has also underscored the potential for
adaptation to VMI, which will be explored in the next section.

2.5.4 Interpreting strategies and adaptation


Research has shown that interpreters have developed various strategies to manage the chal-
lenges of VMI. For instance, the ViKiS project in Germany during the late 1990s investi-
gated adaptation strategies through simulations using a prototype VCI system, enabling
interpreters to join a videoconference call between two clients from a third location and
interpret simultaneously (Braun, 2004, 2007). Initially, interpreters employed explicit coor-
dination strategies but, finding the ‘moderator’ role uncomfortable, shifted from (retro-
spective) problem-solving strategies to strategies that would help them avoid or prevent
problems (Braun, 2004, 2007). Similar explicitation strategies have been observed in VRI,
where interpreters working in police interviews were observed expanding their renditions by
repeating information or making what appeared to be unnecessary additions as a strategy
to mitigate the perceived lack of presence (Braun, 2017). The finding by Cavents and De Wilde
(2024) that the face work carried out by both the primary participants and the interpreter
to maintain mutual rapport was directed primarily at protecting the other’s face rather
than their own may also be read as evidence of strategic adaptation to bridge the perceived
gap in presence in VMI.
However, one setting in particular has required wide-ranging adaptation strategies from
interpreters: VCI in court, where defendants attend by video link while interpreters
are located in the courtroom. As explained in Section 2.4.2, this configuration makes it impossible
to provide whispered simultaneous interpreting to the defendant, which leads to problems
when court participants do not pause to let the interpreter deliver consecutive interpretation.
Interpreters have been observed adopting strategies such as delivering simultaneous inter-
preting in a normal voice – despite the potential disruption – or using brief pauses in speech
to deliver short target-language segments. They also alternated between the two methods
to align with the pace of proceedings (Singureanu et al., 2023). Emotional intelligence may
influence these strategic choices and adaptation abilities (Singureanu, 2023).
In relation to VMI in healthcare settings and conference interpreting, respectively, recent
surveys indicate adaptations to the physical separation from the participants and/or booth-
mates, such as using additional devices for text communication or muted video chats with
interpreter colleagues (Zhang et al., 2024, for healthcare settings; Buján and Collard, 2022;
Chmiel and Spinolo, 2022, for conference settings). However, the effectiveness of these
strategies warrants further investigation. Moser-Mercer (2005) suggests that experienced
interpreters might struggle to adapt to remote interpreting due to reliance on automated
processes, whereas novice interpreters, especially those trained in new modalities, may
adapt more readily.

2.6 Conclusions and future directions


Video-mediated interpreting (VMI) continues to develop as a professional practice, with
many interpreters now combining on-site and remote assignments. The COVID-19 pan-
demic accelerated the adoption of VMI but also highlighted technical issues, such as per-
sistent audio quality problems. Despite the prevailing challenges of VMI, some interpreters
report positive attitudes towards the modality, highlighting several benefits, especially when
working conditions can be optimised.
The pandemic has also accelerated the diversification of VMI configurations, particu-
larly towards VCI in fully online and hybrid meetings, with interpreters now often working
from a separate location. For example, the increase in telehealth consultations during the
pandemic also accelerated the use of VCI. While some believe that these VCI configura-
tions are less demanding than VRI because they put the primary participants and the inter-
preter on an equal footing, others have pointed out that this configuration presents its own
challenges, such as increased coordination efforts on the part of the interpreter. However,
research into the impact of three-way video links and mixed media on interpreter-mediated
interactions remains limited. Furthermore, the quality of VMI and its potential impact on
linguistic minority populations in healthcare and legal settings, that is, in terms of patient
outcomes and fairness of justice, are not well established, as most studies have focused pri-
marily on user satisfaction and interactional aspects.
Future development of VMI should focus on sustainable integration into professional
practice, balancing client needs and interpreter well-being. Addressing market vulnerabilities

45
The Routledge Handbook of Interpreting, Technology and AI

requires detailed research into the variables that shape interpreters’ experiences and atti-
tudes towards VMI. In addition, emerging AI-powered language technologies may alleviate
challenges such as high cognitive load and fatigue and contribute to sustaining interpreting
quality in VMI. Initial research on integrating AI-powered automatic speech recognition
into RSI and into VMI workflows in healthcare and legal interpreting has shown prom-
ising results (e.g. Rodríguez González et al., 2023; Tan et al., 2024; Tang et al., 2024,
respectively).
Another aspect highlighted by the growth of VMI is that technology-mediated environ-
ments require adaptation on the part of both interpreters and users of interpreting ser-
vices who interact with these environments (Davitti and Braun, 2020). This will need to be
addressed through training and upskilling of both interpreters and service users in continu-
ing professional development programmes as well as being reflected in institutional VMI
policies. As a complementary development, further effort is required in terms of standardi-
sation, which has so far mainly addressed RSI in the context of conference interpreting (see
Pérez Guarnieri and Ghinos, this volume). Exceptions are the German DIN8578 standard,
which covers requirements and recommendations for consecutive distance interpreting, and
the minimum standards for VMI developed in the European EU WEBPSI project (see Sin-
gureanu and Braun, this volume).

References
AIIC, 2000. Code for the Use of New Technologies in Conference Interpreting. AIIC Technical and
Health Committee. URL https://web.archive.org/web/20020429100556/www.aiic.net/ViewPage.
cfm?page_id=120 (accessed 10.10.2024).
AIIC, 2018. AIIC Position on Distance Interpreting. AIIC Executive Committee. URL https://aiic.org/
document/4837/AIIC_position_on_TFDI_05.03.18.pdf (accessed 10.10.2024).
Anttila, A., Rappaport, D.I., Tijerino, J., Zaman, N., Sharif, I., 2017. Interpretation Modalities Used
on Family-Centered Rounds: Perspectives of Spanish-Speaking Families. Hospital Pediatrics 7(8),
492–498. URL https://doi.org/10.1542/hpeds.2016-0209
Azarmina, P., Wallace, P., 2005. Remote Interpretation in Medical Encounters: A Systematic Review.
Journal of Telemedicine and Telecare 11(3), 140–145. URL https://doi.org/10.1258/135763305
3688679
Balogh, K., Hertog, E., 2012. AVIDICUS Comparative Studies – Part II: Traditional, Videoconference
and Remote Interpreting in Police Interviews. In Braun, S., Taylor, J., eds., Videoconference and
Remote Interpreting in Legal Proceedings. Intersentia, Antwerp, 119–136.
Benforado, A., 2010. Frames of Injustice: The Bias We Overlook. Indiana Law Journal 85(4), 1333.
URL www.repository.law.indiana.edu/ilj/vol85/iss4/8
Böcker, M., Anderson, D., 1993. Remote Conference Interpreting Using ISDN Videotelephony: A
Requirements Analysis and Feasibility Study. Proceedings of the Human Factors and Ergonomics
Society 1(3), 235–239. URL https://doi.org/10.1177/154193129303700305
Braun, S., 2001. ViKiS – Videokonferenz mit integriertem Simultandolmetschen für kleinere und
mittlere Unternehmen. In Beck, U., Sommer, W., eds. Proceedings of LearnTec 2001: European
Congress and Trade Fair for Educational and Information Technology, 9th, Karlsruhe, Germany.
Karlsruhe, 263–273.
Braun, S., 2004. Kommunikation unter widrigen Umständen? Einsprachige und gedolmetschte Kom-
munikation in der Videokonferenz. Gunter Narr, Tübingen.
Braun, S., 2007. Interpreting in Small-Group Bilingual Videoconferences. Interpreting 9(1), 21–46.
URL https://doi.org/10.1075/intp.9.1.03bra
Braun, S., 2012. Recommendations for the Use of Video-Mediated Interpreting in Criminal Proceed-
ings. In Braun, S., Taylor, J., eds. Videoconference and Remote Interpreting in Criminal Proceed-
ings. Intersentia, Antwerp, 301–328.

46
Video-mediated interpreting

Braun, S., 2013. Keep Your Distance? Remote Interpreting in Legal Proceedings. Interpreting 15(2),
200–228. URL https://doi.org/10.1075/intp.15.2.03bra
Braun, S., 2014. Comparing Traditional and Remote Interpreting in Police Settings: Quality and
Impact Factors. In Viezzi, M., Falbo, C., eds., Traduzione e interpretazione per la società e le
istituzioni. Edizioni Università, Trieste, 1–12.
Braun, S., 2015. Remote Interpreting. In Jourdenais, R., Mikkelson, H., eds. The Routledge Hand-
book of Interpreting. Routledge, London, 352–367.
Braun, S., 2016. The European AVIDICUS Projects: Collaborating to Assess the Viability of
Video-Mediated Interpreting in Legal Proceedings. European Journal of Applied Linguistics 4(1),
173–180. URL https://doi.org/10.1515/eujal-2016-0002
Braun, S., 2017. What a Micro-Analytical Investigation of Additions and Expansions in Remote
Interpreting Can Tell Us About Interpreters’ Participation in a Shared Virtual Space. Journal of
Pragmatics 107, 165–177. URL https://doi.org/10.1016/j.pragma.2016.09.011
Braun, S., 2018. Video-Mediated Interpreting in Legal Settings in England. Translation and Interpret-
ing Studies 13(3), 393–420. URL https://doi.org/10.1075/tis.00022.bra
Braun, S., 2024. Distance Interpreting as a Professional Profile. In Massey, G., Ehrensberger-Dow,
M., Angelone, E., eds. Handbook of the Language Industry: Contexts, Resources and Profiles. De
Gruyter, Berlin, 449–472. URL https://doi.org/10.1515/9783110716047-020
Braun, S., Al Sharou, K., Temizöz, Ö., 2023. Technology Use in Language-Discordant Interpersonal
Healthcare Communication. In The Routledge Handbook of Public Service Interpreting. Rout-
ledge, London, 89–105. URL https://doi.org/10.4324/9780429298202-8
Braun, S., Balogh, K., 2015. Bilingual Videoconferencing in Legal Proceedings: Findings from the
AVIDICUS Projects. In Proceedings of the Conference: Electronic Protocol – a Chance for Trans-
parent and Fast Trial. Polish Ministry of Justice, Warsaw, 21–34.
Braun, S., Davitti, E., Dicerto, S., 2018. Video-Mediated Interpreting in Legal Settings: Assessing
the Implementation. In Napier, J., Skinner, R., Braun, S., eds. Here or There: Research on Inter-
preting via Video Link. Gallaudet University Press, Washington, DC, 144–180. URL https://doi.
org/10.2307/j.ctv2rh2bs3.9
Braun, S., Taylor, J., eds., 2012a. Videoconference and Remote Interpreting in Criminal Proceedings.
Intersentia, Antwerp.
Braun, S., Taylor, J., 2012b. Video-Mediated Interpreting: An Overview of Practice and Research.
In Braun, S., Taylor, J., eds. Videoconference and Remote Interpreting in Criminal Proceedings.
Intersentia, Antwerp, 33–68.
Braun, S., Taylor, J., 2012c. Video-Mediated Interpreting in Criminal Proceedings: Two European
Surveys. In Braun, S., Taylor, J., eds. Videoconference and Remote Interpreting in Criminal Pro-
ceedings. Intersentia, Antwerp, 69–98.
Braun, S., Taylor, J., 2012d. AVIDICUS Comparative Studies – Part I: Traditional Interpreting and
Remote Interpreting in Police Interviews. In Braun, S., Taylor, J., eds. Videoconference and
Remote Interpreting in Criminal Proceedings. Intersentia, Antwerp, 99–117.
Braun, S., Taylor, J., Miler-Cassino, J., Rybińska, Z., Balogh, K., Hertog, E., Vanden Bosch, Y., Rombouts,
D., 2012. Training in Video-Mediated Interpreting in Criminal Proceedings. In Braun, S., Taylor, J., eds.,
Videoconference and Remote Interpreting in Criminal Proceedings. Intersentia, Antwerp, 233–288.
Buján, M., Collard, C., 2022. Remote Simultaneous Interpreting and COVID-19: Conference
Interpreters’ Perspective. In Liu, K., Cheung, A., eds. Translation and Interpreting in the Age of
COVID-19. Springer, Singapore, 133–150. URL https://doi.org/10.1007/978-981-19-6680-4_7
Cavents, D., De Wilde, J., 2024. Face-Work in Video Remote Interpreting: A Multimodal
Micro-Analysis. In de Boe, E., Vranjes, J., Salaets, H., eds. Interactional Dynamics in Remote
Interpreting: Micro-Analytical Approaches. Routledge, New York, 155–175. URL https://doi.org/10.4324/9781003267867-8
Chmiel, A., Spinolo, N., 2022. Testing the Impact of Remote Interpreting Settings on Interpreter
Experience and Performance: Methodological Challenges Inside the Virtual Booth. Transla-
tion, Cognition and Behavior 5(2), 250–274. URL https://doi.org/10.1075/tcb.00068.chm
Corpas Pastor, G., Gaber, M., 2020. Remote Interpreting in Public Service Settings: Technology, Per-
ceptions and Practice. SKASE Journal of Translation and Interpretation 13(2), 58–78.

47
The Routledge Handbook of Interpreting, Technology and AI

Davitti, E., Braun, S., 2020. Analysing Interactional Phenomena in Video Remote Interpreting in Col-
laborative Settings: Implications for Interpreter Education. The Interpreter and Translator Trainer
14(3), 279–302. URL https://doi.org/10.1080/1750399X.2020.1800364
De Boe, E., 2020. Remote Interpreting in Healthcare Settings: A Comparative Study on the Influence
of Telephone and Video Link Use on the Quality of Interpreter-Mediated Communication. Uni-
versity of Antwerp, Antwerp.
De Boe, E., 2024. Synchronization of Interaction in Healthcare Interpreting by Video Link and Tel-
ephone. In de Boe, E., Vranjes, J., Salaets, H., eds. Interactional Dynamics in Remote Interpreting:
Micro-Analytical Approaches. Routledge, London, 22–41.
Devaux, J., 2018. Technologies and Role-Space: How Videoconference Interpreting Affects the Court
Interpreter’s Perception of Her Role. In Fantinuoli, C., ed. Interpreting and Technology. Language
Science Press, Berlin, 91–117.
DG SCIC, 2019. Interpreting Platforms. Consolidated Test Results and Analysis. European Commission
Directorate General for Interpretation (DG SCIC). URL https://knowledge-centre-interpretation.
education.ec.europa.eu/sites/default/files/interpreting_platforms_-_consolidated_test_results_
and_analysis_-_def.pdf (accessed 23.5.2023).
Ellis, S.R., 2004. Videoconferencing in Refugee Hearings. Ellis Report to the Immigration and Ref-
ugee Board Audit and Evaluation Committee. URL www.irb-cisr.gc.ca/Eng/transp/ReviewEval/
Pages/Video.aspx#analysis (accessed 3.10.2022).
Fantinuoli, C., Marchesini, G., Landan, D., Horak, L., 2022. KUDO Interpreter Assist: Automated
Real-Time Support for Remote Interpretation. arXiv:2201.01800 [cs.CL]. URL https://doi.
org/10.48550/arxiv.2201.01800
Fowler, Y., 2018. Interpreted Prison Video Link: The Prisoner’s Eye View. In Napier, J., Skinner, R.,
Braun, S., eds. Here or There: Research on Interpreting via Video Link. Gallaudet University Press,
Washington, DC, 183–209. URL https://doi.org/10.2307/j.ctv2rh2bs3.10
Gilbert, A.S., Croy, S., Hwang, K., LoGiudice, D., Haralambous, B., 2022. Video Remote Interpreting for Home-Based Cognitive Assessments. Interpreting 24(1), 84–110. URL https://doi.org/10.1075/intp.00065.gil
Hale, S., Goodman-Delahunty, J., Martschuk, N., Lim, J., 2022. Does Interpreter Location Make a
Difference? Interpreting 24(2), 221–253. URL https://doi.org/10.1075/intp.00077.hal
Hansen, J.P.B., 2020. Invisible Participants in a Visual Ecology: Visual Space as a Resource for Organ-
ising Video-Mediated Interpreting in Hospital Encounters. Social Interaction. Video-Based Studies
of Human Sociality 3(3). URL https://doi.org/10.7146/si.v3i3.122609
Hansen, J.P.B., 2024. Interpreters’ Repair Initiators in Video-Mediated Environments. In de Boe,
E., Vranjes, J., Salaets, H., eds. Interactional Dynamics in Remote Interpreting: Micro-Analytical
Approaches. Routledge, London, 91–112.
Jumpelt, R.W., 1985. The Conference Interpreter's Working Environment Under the New ISO and IEC
Standards. Meta 30(1), 82–90. URL https://doi.org/10.7202/003278AR
Klammer, M., Pöchhacker, F., 2021. Video Remote Interpreting in Clinical Communication: A
Multimodal Analysis. Patient Education and Counseling 104(12), 2867–2876. URL https://doi.
org/10.1016/j.pec.2021.08.024
Licoppe, C., Verdier, M., 2013. Interpreting, Video Communication and the Sequential Reshaping of
Institutional Talk in the Bilingual and Distributed Courtroom. International Journal of Speech,
Language and the Law 20(2), 247–275. URL https://doi.org/10.1558/ijsll.v20i2.247
Licoppe, C., Verdier, M., Veyrier, C.A., 2018. Voice, Power and Turn-Taking in Multilingual, Con-
secutively Interpreted Courtroom Proceedings with Video Links. In Napier, J., Skinner, R., Braun,
S., eds. Here or There: Research on Interpreting via Video Link. Gallaudet University Press, Wash-
ington, DC, 299–322. URL https://doi.org/10.2307/j.ctv2rh2bs3.14
Lion, K.C., Brown, J.C., Ebel, B.E., Klein, E.J., Strelitz, B., Gutman, C.K., Hencz, P., Fernandez, J.,
Mangione-Smith, R., 2015. Effect of Telephone vs Video Interpretation on Parent Comprehension,
Communication, and Utilization in the Pediatric Emergency Department a Randomized Clinical Trial.
JAMA Pediatrics 169(12), 1117–1125. URL https://doi.org/10.1001/jamapediatrics.2015.2630
Locatis, C., Williamson, D., Gould-Kabler, C., Zone-Smith, L., Detzler, I., Roberson, J., Maisiak, R.,
Ackerman, M., 2010. Comparing In-Person, Video, and Telephonic Medical Interpretation. Journal
of General Internal Medicine 25(4), 345–350. URL https://doi.org/10.1007/s11606-009-1236-x
Luff, P., Heath, C., Kuzuoka, H., Hindmarsh, J., Yamazaki, K., Oyama, S., 2003. Fractured Ecologies:
Creating Environments for Collaboration. Human – Computer Interaction 18(1–2), 51–84. URL
https://doi.org/10.1207/S15327051HCI1812_3
Marshall, L.C., Zaki, A., Duarte, M., Nicolas, A., Roan, J., Colby, A.F., Noyes, A.L., Flores, G.,
2019. Promoting Effective Communication with Limited English Proficient Families: Implementa-
tion of Video Remote Interpreting as Part of a Comprehensive Language Services Program in a
Children’s Hospital. Joint Commission Journal on Quality and Patient Safety 45(7), 509–516.
URL https://doi.org/10.1016/j.jcjq.2019.04.001
Miler-Cassino, J., Rybińska, Z., 2012. AVIDICUS Comparative Studies – Part III: Traditional Inter-
preting and Videoconference Interpreting in Prosecution Interviews. In Braun, S., Taylor, J., eds.
Videoconference and Remote Interpreting in Criminal Proceedings. Intersentia, Antwerp, 117–136.
Moser-Mercer, B., 2003. Remote Interpreting: Assessment of Human Factors and Performance
Parameters. Communicate! AIIC (Summer 2003-Being There), 1–25.
Moser-Mercer, B., 2005. Remote Interpreting: Issues of Multi-Sensory Integration in a Multilingual
Task. Meta 50(2), 727–738. URL https://doi.org/10.7202/011014ar
Mouzourakis, P., 1996. Videoconferencing. Interpreting 1(1), 21–38. URL https://doi.org/10.1075/
intp.1.1.03mou
Mouzourakis, P., 2006. Remote Interpreting: A Technical Perspective on Recent Experiments. Inter-
preting 8(1), 45–66. URL https://doi.org/10.1075/intp.8.1.04mou
Napier, J., 2012. Here or There? An Assessment of Video Remote Signed Language Interpreter-Mediated
Interaction in Court. In Braun, S., Taylor, J., eds. Videoconference and Remote Interpreting in
Criminal Proceedings. Intersentia, Antwerp, 145–185.
Napier, J., Skinner, R., Braun, S., eds., 2018a. Here or There: Research on Interpreting via Video
Link. Gallaudet University Press, Washington, DC. URL https://doi.org/10.2307/j.ctv2rh2bs3
Napier, J., Skinner, R., Braun, S., 2018b. Interpreting via Video Link: Mapping of the Field. In
Napier, J., Skinner, R., Braun, S., eds. Here or There: Research on Interpreting via Video Link.
Gallaudet University Press, Washington, DC, 11–35. URL https://doi.org/10.2307/j.ctv2rh2bs3.4
Napier, J., Skinner, R., Turner, G.H., 2017. “It’s Good for Them but Not so for Me”: Inside the
Sign Language Interpreting Call Centre. Translation and Interpreting 9(2), 1–23. URL https://doi.
org/10.12807/ti.109202.2017.a01
Nestler, F., 1957. Tel-Interpret: Begründung und Grundlagen eines deutschen Telefon-Dolmetschdienstes.
Lebende Sprachen 2(1), 21–23.
Nimdzi, 2023. Nimdzi Interpreting Index: Ranking of Top Interpreting Service Providers. URL www.
nimdzi.com/interpreting-index-top-interpreting-companies/ (accessed 10.10.2024).
Ozolins, U., 2011. Telephone Interpreting: Understanding Practice and Identifying Research Needs.
Translation & Interpreting 3(2), 33–47.
Price, E.L., Pérez-Stable, E.J., Nickleach, D., López, M., Karliner, L.S., 2012. Interpreter Per-
spectives of In-Person, Telephonic, and Videoconferencing Medical Interpretation in Clinical
Encounters. Patient Education and Counseling 87(2), 226–232. URL https://doi.org/10.1016/j.
pec.2011.08.006
Rodríguez González, E., Ahmed Saeed, M., Korybski, T., Davitti, E., Braun, S., 2023. Reimagining
the Remote Simultaneous Interpreting Interface to Improve Support for Interpreters. In Ferreiro
Vázquez, Ó., Moutinho Pereira, A.T.V., Gonçalves Araújo, S.L., eds. Technological Innovation for
Language Learning, Translation and Interpreting. Peter Lang, Berlin, 227–246.
Rosenberg, B.A., 2007. A Data Driven Analysis of Telephone Interpreting. In Wadensjö, C., Dim-
itrova, B.E., Nilsson, A.-L., eds. The Critical Link 4: Professionalisation of Interpreting in the Community. Selected Papers from the 4th International Conference on Interpreting in Legal, Health
and Social Service Settings, Stockholm, Sweden, 20–23 May 2004. John Benjamins, Amsterdam,
65–76. URL https://doi.org/10.1075/btl.70.09ros
Roziner, I., Shlesinger, M., 2010. Much Ado About Something Remote. Interpreting 12(2), 214–247.
URL https://doi.org/10.1075/intp.12.2.05roz
Saeed, M.A., Rodríguez González, E., Korybski, T., Davitti, E., Braun, S., 2023. Comparing Interface
Designs to Improve RSI Platforms: Insights from an Experimental Study. In Orǎsan, C., Mitkov,
R., Corpas Pastor, G., Moni, J., eds. International Conference on Human-Informed Translation
and Interpreting Technology (HiT-IT 2023), Naples, Italy, 7–9.7.2023, 147–156. URL https://
hit-it-conference.org/wp-content/uploads/2023/07/HiT-IT-2023-proceedings.pdf
Schulz, T.R., Leder, K., Akinci, I., Ann Biggs, B., 2015. Improvements in Patient Care: Videoconfer-
encing to Improve Access to Interpreters During Clinical Consultations for Refugee and Immigrant
Patients. Australian Health Review 39(4), 395–399. URL https://doi.org/10.1071/AH14124
Seeber, K.G., Keller, L., Amos, R., Hengl, S., 2019. Expectations vs. Experience. Interpreting 21(2),
270–304. URL https://doi.org/10.1075/intp.00030.see
Short, J., Williams, E., Christie, B., 1976. The Social Psychology of Telecommunications. John Wiley,
New York.
Singureanu, D., 2023. Managing the Demands of Video-Mediated Court Interpreting: Strategies and
the Role of Emotional Intelligence (dissertation), University of Surrey, Surrey. URL https://doi.
org/10.15126/thesis.900664
Singureanu, D., Hieke, G., Gough, J., Braun, S., 2023. 'I am His Extension in the Courtroom': How
Court Interpreters Cope with the Demands of Video-Mediated Interpreting in Hearings with
Remote Defendants. In Corpas Pastor, G., Defrancq, B., eds. Interpreting Technologies – Cur-
rent and Future Trends. John Benjamins, Amsterdam, 72–108. URL https://doi.org/10.1075/
ivitra.37.04sin
Tan, S., Orăsan, C., Braun, S., 2024. Integrating Automatic Speech Recognition into Remote Health-
care Interpreting: A Pilot Study of its Impact on Interpreting Quality. Proceedings of Translating and
the Computer 2024 (TC46), 175–191. URL https://asling.org/tc46/wp-content/uploads/2025/03/
TC46-proceedings.pdf
Tang, W., Singureanu, D., Wang, F., Orăsan, C., Braun, S., 2024. Integrating Automatic Speech Rec-
ognition in Remote Interpreting Platforms: An Initial Assessment. Presentation at the CIOL Inter-
preters Day, 16.3.2024, London.
Terry, M., Johnson, S., Thompson, P., 2010. Virtual Court Pilot Outcome Evaluation. Min-
istry of Justice (UK) Research Series 21/10. URL https://assets.publishing.service.gov.uk/
media/5a7b66ff40f0b6425d592eaf/virtual-courts-pilot-outcome-evaluation.pdf
UNESCO, 1976. A Teleconference Experiment: A Report on the Experimental Use of the Sympho-
nie Satellite to Link UNESCO Headquarters in Paris with the Conference Centre in Nairobi.
UNESCO, Paris.
Vranjes, J., 2024. Where to Look? On the Role of Gaze in Regulating Turn-Taking in Video Remote
Interpreting. In de Boe, E., Vranjes, J., Salaets, H., eds. Interactional Dynamics in Remote Inter-
preting: Micro-Analytical Approaches. Routledge, London, 113–134.
Warnicke, C., Plejert, C., 2012. Turn-Organisation in Mediated Phone Interaction Using Video
Relay Service (VRS). Journal of Pragmatics 44(10), 1313–1334. URL https://doi.org/10.1016/j.
pragma.2012.06.004
Yabe, M., 2020. Healthcare Providers’ and Deaf Patients’ Interpreting Preferences for Critical Care
and Non-Critical Care: Video Remote Interpreting. Disability and Health Journal 13(2), 100870.
URL https://doi.org/10.1016/j.dhjo.2019.100870
Zhang, W., Davitti, E., Braun, S., 2024. Charting the Landscape of Remote Medical Interpreting: An
International Survey of Interpreters Working in Remote Modalities in Healthcare Services. Per-
spectives 1–26. URL https://doi.org/10.1080/0907676X.2024.2382488
Zhu, X., Aryadoust, V., 2022. A Synthetic Review of Cognitive Load in Distance Interpreting:
Toward an Explanatory Model. Frontiers in Psychology 13, 3535. URL https://doi.org/10.3389/
fpsyg.2022.899718
Ziegler, K., Gigliobianco, S., 2018. Present? Remote? Remotely Present! New Technological
Approaches to Remote Simultaneous Conference Interpreting. In Fantinuoli, C., ed. Interpreting
and Technology. Language Science Press, Berlin, 119–139.

3
REMOTE SIMULTANEOUS INTERPRETING
Agnieszka Chmiel and Nicoletta Spinolo

3.1 Introduction
According to one of the earliest definitions, remote simultaneous interpreting (RSI) is under-
stood as ‘any form of simultaneous interpreting where the interpreter works away from the
meeting room either through a video-conferencing set-up or through a cabled arrangement
close to the meeting facilities, either in the same building or at a neighboring location’
(Moser-Mercer, 2003, 1). RSI is a modality of interpreting that belongs to distance interpreting, a term that may be considered a hypernym for all forms of technology-mediated interpreting (Braun, 2019). Looking at the phenomenon from the point of view of the
participants’ location, Braun (2019, 272) identifies remote interpreting as ‘the situation in
which the interpreter is in a physical space, different from that of the conference venue’,
and teleconference interpreting as ‘the situation in which the whole event is virtual and
therefore participants are connected at a distance, with the interpreter either at one client
location or working from a different venue’. Following this principle, Braun makes a fur-
ther differentiation based on the source of input for the interpreter. This can be audio-only
(audioconference interpreting, when the interpreter is located with one or some of the par-
ticipants; audio remote interpreting, when the interpreter is located remotely) or audio
and video (videoconference interpreting, when the interpreter is located with one or some
of the participants; video remote interpreting, when the interpreter is located remotely).
In a follow-up classification, based on participant location and aimed at considering
post-pandemic modifications in distance interpreting configurations, Braun (2024, 452)
employs the label ‘remote interpreting’ for situations in which clients are all located in the
same venue, and ‘virtual interpreting’ for fully virtual events.
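Braun's (2019) input/location taxonomy can be sketched as a simple lookup. The following is a minimal illustrative encoding: the dictionary structure, keys, and function name are ours, not part of the classification itself, and Braun's (2024) post-pandemic labels are not captured here.

```python
# Illustrative sketch (not from the source): Braun's (2019) taxonomy of
# distance interpreting, keyed by the interpreter's input modality
# (audio-only vs. audio + video) and the interpreter's location
# (co-located with some participants vs. fully remote).
DISTANCE_INTERPRETING = {
    ("audio", "co-located"): "audioconference interpreting",
    ("audio", "remote"): "audio remote interpreting",
    ("audio+video", "co-located"): "videoconference interpreting",
    ("audio+video", "remote"): "video remote interpreting",
}


def classify(input_modality: str, interpreter_location: str) -> str:
    """Return the label for a given configuration of distance interpreting."""
    return DISTANCE_INTERPRETING[(input_modality, interpreter_location)]
```

For example, `classify("audio+video", "remote")` yields 'video remote interpreting', the configuration to which RSI belongs; a fuller model would add a further dimension for participant location to distinguish 'remote' from 'virtual' events (Braun, 2024).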
Remote simultaneous interpreting is not a recent development: its first use dates back to 1976, in an experiment organised by UNESCO. Similar experiments were subsequently organised by the United Nations in the 1970s and 1980s (Braun, 2015, 346). These initial
instances of remote interpreting in the simultaneous mode, alongside those that followed in
subsequent years, employed traditional booths and interpreting consoles placed away from
the conference venue (Mouzourakis, 2006).


DOI: 10.4324/9781003053248-5

Seminal research on remote interpreting at that time (Moser-Mercer, 2005b; Roziner and
Shlesinger, 2010) also focused on situations where interpreters, although located away from
the conference venue, worked from interpreting booths, sharing the same physical space
with their boothmates, interpreting team, and technical support and using their habitual
interpreting console and equipment. This modality of interpreting was then already con-
sidered ‘a seemingly inevitable shift . . . in Brussels and beyond’ (Roziner and Shlesinger,
2010, 215).
In recent years, boosted considerably by the COVID-19 pandemic, the shift towards web-based communication has led to a more settled understanding of RSI as cloud-based simultaneous interpreting (Braun, 2019; Saeed et al., 2022).
In this chapter, we will discuss RSI by presenting key terms and concepts (Section 3.2),
an overview of the communicative contexts in which it can be used (Section 3.3), and of
the development of RSI-related technology (Section 3.4). We will then move on to present-
ing feedback from practitioners (Section 3.5) and discussing critical issues (Section 3.6),
such as sound quality, cognitive load and performance, teamwork and boothmate presence,
stress, and multimodality. Finally, we will present concluding remarks and discuss emerg-
ing trends (Section 3.7).

3.2 Key terms and concepts


In this chapter, we will refer to RSI as a specific modality of video-mediated interpreting
(Braun, 2019) in which interpreters are not located in the same venue as the conference and
provide simultaneous interpreting services through cloud-based platforms. These are also
termed simultaneous interpreting delivery platforms (SIDPs) (ISO 24019:2022; see also
Perez Guarnieri and Ghinos, this volume).
In this changing landscape, key terms related to RSI are being defined and standardised.
To illustrate, ISO standard 20539:2019, on ‘Translation, Interpreting and Related Technology’, has been updated to ISO standard 20539:2023 to cover new tools and modalities.
As far as RSI is concerned, the new standard defines terms such as SIDP, hard console
and soft console, and webcasting (‘transmitting video and audio data across a network to
an audience’ [ISO 20539:2023, 13]). In a similar vein, the ISO standard defining SIDPs
(ISO 24019:2022) not only provides a definition for an SIDP (‘virtual environment used
in simultaneous interpreting . . . for managing the processing of signals during the trans-
mission of information from speakers . . . or signers . . . to distant interpreters . . . and the
interpreters’ rendition to a distant audience’; ISO 24019:2022, 3) but also sets a standard
for other related terms, such as interpreter interface (‘equipment containing controls used
by the interpreter . . . to facilitate simultaneous interpreting’; ISO 24019:2022, 3) and soft
console (‘interpreter interface . . . which runs on a computer or portable IT device and has
on-screen controls’; ISO 24019:2022, 4).
A variety of SIDPs are currently available on the market, each with different interface designs and features. Despite this variety, a general twofold distinction can be made. On the one hand, there are web conferencing platforms that do not have RSI as their core business: these were not initially developed for interpreting but have added RSI features to make them more usable. On the other hand, there are platforms that
have been designed specifically to integrate and provide RSI services. At the time of writing
(a necessary specification, given the speed at which these technologies change), the main dif-
ferences between these two can be summarised as follows (Saeed et al., 2022; Saeed, 2023):
non-RSI-specific platforms usually include RSI as a further option on the platform. These
have a lean, basic interpreter interface, and there is no actual soft console containing multi-
ple controls for the interpreter (the most popular example, at the time of writing, is Zoom).
In contrast, RSI-specific platforms (or ‘RSI bespoke platforms’, as termed by Saeed, 2023,
12) tend to have slightly more complex interpreter interfaces and soft consoles. These make
multiple options and configurations available for interpreters. For example, besides provid-
ing the platform service, these can also provide technical and logistic support before, dur-
ing, and after the event (examples, at the time of writing, are KUDO, Interprefy, Interactio).
However, it is not only the equipment and technology that are undergoing a terminological update. The extensive diffusion of RSI has implied radical changes in what Pöchhacker (2004, 13) termed the ‘constellation’ of interaction in the event. While further details will be
discussed in greater depth in Section 3.3, it is worth noting here that this has led to new
terminology being used to describe interaction with boothmates and the rest of the inter-
preting and technical team, who are either co-located or not (AIIC TFDI, 2020).
Although they do not pertain exclusively to the realm of RSI, the concepts of (social)
presence, immersion, and flow are tightly linked to remote communication in general and,
as a consequence, are often mentioned with respect to RSI. Lee (2004, 45) defines social
presence as ‘a psychological state in which virtual (para-authentic or artificial) social actors
are experienced as actual social actors in either sensory or nonsensory ways’ and states that
‘[s]ocial presence occurs when technology users do not notice the para-authenticity of medi-
ated humans and/or the artificiality of simulated nonhuman social actors’; a lack of sense
of presence in a remote interaction leads to a feeling of alienation in participants (Mouzourakis, 2006). Saeed et al. (2022) identify two concepts linked to presence: ‘immersion’ and ‘flow’. They note that ‘immersion’ appears to overlap with ‘presence’, since it refers to ‘the feeling of “existing” within a virtual world’ (Saeed et al., 2022, 218). ‘Flow’ refers, more broadly, to a mental state of full immersion in and concentration on a task (Saeed et al., 2022, 220).

3.3 Contexts of use and application


As specified earlier, the remote location of the interpreter in relation to the conference venue is a constant feature of RSI. However, there are multiple contexts of application and use for
RSI, as well as multiple possible constellations of all stakeholders involved in the event.
In terms of event type, RSI can be used either for fully virtual events, such as webinars
or online conferences; for hybrid events, where some of the speakers and audience are
co-located and some (including the interpreters) are non-co-located; or for fully on-site
events, where only interpreters are located remotely (see also Spinolo, 2022). Each pos-
sible configuration implies a different experience and different access to information for
interpreters.
In the case of fully virtual events, all participants (interpreters, speakers, moderators, audience, technicians) share the same visual access to the event. Consequently, interpreters will
find themselves on an equal footing with other participants in terms of potential issues,
such as technical problems, sound quality, or access to visuals. Interestingly, in a fully
online event, the challenges faced by remote interpreters might become more visible, as
these would be experienced by all participants.
On the other hand, in a fully on-site event where only interpreters are located remotely,
there will be no equal ground. In this constellation, interpreters will be the only ones with
different access to event information. This includes access to audio and video input and
view of the speakers, slides, and feedback from the audience. As opposed to a fully virtual
constellation, this unequal footing may make the challenges faced by interpreters less vis-
ible, since these are not shared by all participants. Such challenges may include connection
issues, misuse of equipment (microphones, cameras, etc.), poor and incomplete visuals, and
in most cases, no view of the audience to gain feedback.
A third possible event type is hybrid. Here, certain participants (speakers, moderators,
members of the audience) are on-site, while others are online. Since a hybrid event is partially on-site and partially online, it may present the challenges of both of the aforementioned configurations. In a hybrid event, the view of the speakers and slides will vary depending on whether they are presenting from home or from the venue. In addition, technical issues may arise either from online presenters or from the on-site venue.
The implications of the different kinds of events for interpreters are multiple. The type
of the event (virtual, hybrid, or on-site with only the interpreters online) and the interpreter location directly influence the interpreters’ working conditions, role, and sense of social presence (Lee, 2004; see Section 3.2) in the interaction.
Working locations can also vary and have important ramifications on the interpreters’
work and practice in RSI. While it is evident that, in RSI, interpreters can work from virtu-
ally anywhere, a broad distinction can be made as to whether interpreters work from home
or their private office or from an interpreting hub (a facility ‘equipped with interpretation
booths and remote interpretation equipment. The interpreter is co-located with at least
some other interpreters from the team’ [Buján and Collard, 2022, 139]). Working ‘from
a hub’ therefore implies being co-located with a boothmate, the interpreting team as a
whole, and technical support, while ‘working from home or a private office’ implies being
responsible not only for interpreting but also for technical equipment, and for interaction
with boothmates, colleagues, and technicians, from a distance. Finally, a third possibility is
a mixed remote constellation, where an interpreter (or part of an interpreting team) works
from a hub while others work remotely from home. Interpreting hubs are being set up by
large interpreting companies and agencies, as well as some institutions. One such example
is the European Parliament, where, due to travel restrictions imposed by COVID-19, del-
egates often took the floor remotely, and interpreters would occupy one booth per person
and communicate with their boothmates through the booth window (Jayes, 2023).
Chaves (2020) and Cheung (2022, 115) summarise the main features that differentiate
‘working from home’ from ‘working from a hub’. The overall picture emerging from these
observations is that hubs offer on-site technical support, co-location with a boothmate and easier handover, connection safety and stability, use of hard consoles (although this might
not always be the case), and soundproof settings (although, again, this may vary from hub
to hub). A home setting, on the other hand, makes the interpreter responsible for techni-
cal set-up and management and may present less network stability, increased difficulty in
communication and handover with boothmates, and greater difficulty in accessing remote
technical support (when such support is provided).
However, survey research suggests that, currently, while interpreting hubs do exist, most
interpreters work either from home or from their private office (Buján and Collard, 2022;
Chmiel and Spinolo, 2022). This may be due to multiple reasons, the most likely being that
there is no hub option provided, or that hubs are far from the interpreters’ locations. Also,
some interpreters might prefer to work from their own, personalised workstation and may
see the advantage of less travelling, a reduced carbon footprint, an easier work–life balance,
and even the possibility of accepting more assignments, without travel days in between
events (Mahyub Rayaa and Martin, 2022).
Whatever the interpreter’s choice of location (home or hub), it has an impact not only on their mode of interaction with colleagues and other stakeholders but also on their workstation set-up and their ability to customise it to their individual needs. There
are, as a matter of fact, multiple possible configurations of an interpreter’s workstation.
Interpreters can use a single or a double screen, one or two computers, or other devices for
documentation and interaction (Spinolo and Chmiel, 2023).
The question of ‘interpreter location’ is closely linked with ‘boothmate location’, since
‘hub-based’ interpreting usually implies a co-located boothmate, while ‘home-based’ inter-
preting usually means a non-co-located boothmate. In the former case, boothmate support
and interaction can occur in the ‘traditional’ way, by means of prompts written on paper,
gestures, or short oral communications with muted microphones. When working with a
non-co-located boothmate, there are multiple options for interaction.
According to survey research (Chmiel and Spinolo, 2022), most interpreters seem to prefer
communicating with boothmates via an external chat on a separate device, outside the
platform on which they are performing RSI. Others prefer connecting with boothmates
via video call on a separate device. Alternatively, others use the chat embedded in their RSI
platform, or an external chat on the same device. Less frequently, interpreters may also
communicate via audio calls or muted video calls on
a separate device. While various modes of interaction with boothmates have already been
observed (see, for instance, Chmiel and Spinolo, 2022), research on the impact of booth-
mate location and mode of interaction is still in its infancy (see Section 3.6.4), and results
are only just beginning to emerge.
Research on booth-based remote interpreting has reported an increased sense of alienation
among interpreters even when they work with traditional equipment and are co-located with
their team (Moser-Mercer, 2005b; Roziner and Shlesinger, 2010; Seeber et al., 2019). In a
home-based setting, without a co-located boothmate, this feeling of alienation is bound to
persist or even increase. This sense of alienation could be offset by providing appropriate visual
input (Seeber et al., 2019) or by designing interfaces that improve immersion and general
interpreter experience. However, as explained earlier, due to the fast evolution of technol-
ogy, coupled with the acceleration triggered by the COVID-19 pandemic, the situation is
changing quickly, and further research is needed to generate observations relevant
to the current situation.
Interpreters’ access to visual information from the event is an obvious and important
consequence of the distribution of event participants across the physical or virtual space.
In the case of an on-site event, where only the interpreters are located remotely, interpret-
ers may have different visual access to the event, in terms of speaker view, slide view, or
view of the room and audience. As Seeber (2022) notes, the human field of vision extends
about 120 degrees both horizontally and vertically, although the eye focuses best on elements
at the centre of the visual field. With this in mind, Seeber (2022) explains, it is easy to
understand that even a state-of-the-art technological set-up can hardly offer interpreters
the same visuals as an on-site conference setting. This is particularly relevant because,
according to research on the distribution of visual attention during RSI, the speaker frame
occupies a large share of professionals' visual attention, even when slides are being presented
and receive their own share of attention (Chmiel et al., under review).

55
The Routledge Handbook of Interpreting, Technology and AI

Interface design and user experience in RSI are also crucial when it comes to providing
an enhanced sense of presence. Saeed et al. (2022) used the focus group methodology to
elicit interpreters’ preferences in terms of interface design. This was followed up (Saeed,
2023) by an experimental study using a validated usability questionnaire, the UEQ (User
Experience Questionnaire; Laugwitz et al., 2008), post-task questionnaires, and follow-up
interviews. Frittella (2023) used post-task questionnaires and interviews to assess the usa-
bility of an RSI platform (SmarTerp) providing AI-based CAI support. Chmiel et al. (under
review) used the UEQ (Laugwitz et al., 2008) to assess user experience in different RSI
configurations (see Section 3.4).

3.4 Design and development of technology


RSI technology has undergone a remarkable evolution over time. A general overview of the
technology employed for interpreting remotely in the conference setting can be based on
three main aspects: the technology used to convey sound and video (where relevant), the
interface for event participants (i.e. the way participants access the interpreting service),
and the interpreter’s interface. In this section, we will briefly discuss each aspect separately.
Regarding the technology used to transmit the input and output signal to and from the
interpreters, the introduction of the internet as a way to convey interpreting can be seen
as the turning point towards RSI as we currently know it (Saeed et al., 2022). In the first
instances of remote interpreting, the technology that was used to convey the signal from
the conference venue to the interpreters, and from the interpreters back to the venue, was
radio satellite technology; this was used to transmit both telephone connections (audio
only) and video links (Mouzourakis, 1996). From the 1990s, international organisations
focused their interest on video remote interpreting with ISDN technology and high-quality
video links, mainly in order to tackle the issue of insufficient space in venues for the instal-
lation of interpreting booths (Braun, 2019). However, in the 21st century, the technol-
ogy used for this purpose has shifted towards internet-based connections. This shift was
gradual at first but accelerated greatly with the COVID-19 pandemic in 2020, which forced
the virtualisation of events during lockdowns of various durations in most countries
(Saeed et al., 2022).
The shift towards cloud-based interpreting (Braun, 2019) has led to a radical change in
how conference participants interact with interpreting technology and access interpreting
services. In simultaneous interpreting with traditional equipment, conference participants
are only able to listen to interpreters by means of receivers connected to the venue’s sound
system. The sound system transmits the sound from the conference room to the receivers via
radio or infrared technology. Cloud-based interpreting has changed the way listeners access
interpreting. When located remotely from the event, participants can access the interpreted
rendition of the event from their own location by logging onto the conference platform
and choosing the appropriate language channel on their own devices (computers or mobile
devices). In the case of an on-site event, the signal sent by the RSI platform can still be trans-
mitted through an infrared system. Alternatively, participants can use their own smart-
phones and headsets to connect to the platform and follow the interpreted version of the
event while being physically located in the conference room, in a so-called BYOD (‘bring
your own device’) mode (Disterer and Kleiner, 2013; see also Spinolo, 2022).
As far as the interpreter’s interface is concerned, the arrival of cloud-based RSI has
brought about a significant change for interpreters in terms of human–machine interaction.
To illustrate, besides not having the venue in direct sight, interpreters working in RSI
often use a so-called ‘soft console’ (see Section 3.2 and ISO 20539:2023). This means all
their booth controls are on-screen, rather than on a physical console with buttons and
switches. However, hard consoles can at times be integrated and used within RSI platforms
(Fan, 2022). The difference between using a 'hard' and a 'soft' console does not lie only in
the fact that soft-console controls are on-screen together with all other visual input from
the conference (video stream, slides, event chat, booth chat, etc.); it also implies variation
in the location, number, functioning, and visualisation of controls. These aspects often vary
noticeably from one platform to another and thus require a certain degree of adjustment
from professionals when dealing with a new platform or switching regularly between
platforms. This may be particularly true for RSI-specific platforms, which offer a wider
variety of options for interpreters and therefore have more functions and controls than
non-RSI-specific platforms.
An important element of the interpreter interface is the part dedicated to boothmate
communication. This, too, varies significantly from one platform to another. It can range
from platforms without a specific communication channel for boothmates (as in many
non-RSI-specific platforms) to platforms that are developing virtual booths, either as
part of the SIDP itself or as a separate backchannel injecting booth sound into non-RSI-specific
platforms. In such virtual booths (such as GTBooth), interpreters are not only able
to use a chat feature to communicate but can also see and talk to their boothmates without
being heard by conference participants. This, therefore, allows them to use both gestures
and voice as means of communication.
Interface-related issues are currently being investigated by researchers, and results are
starting to be published. Saeed et al. (2022) elicited opinions and preferences on RSI inter-
faces from two focus groups. The first included three professionals, and the second four
trainee interpreters. Results on visual preferences pointed to a desire for further
standardisation of interfaces to reduce excessive variation (in line with results from Spinolo
and Chmiel, 2023). Results also indicated a feeling of information overload and potential
fatigue relating to visual information that is not regularly used during the interpreting pro-
cess (e.g. controls and buttons). Additionally, results showed a desire for easier communi-
cation with boothmates, through a video feed and a shared notepad and chat. In general,
the authors detected a preference for what they defined as a 'minimal' interface. Based
on the focus groups, Saeed (2023) proceeded to carry out an experimental study with 29
participants to explore how different visual elements of interfaces can support practition-
ers. The author manipulated the following elements: interface (minimal vs maximal, using
mock-up interfaces with no prior usability testing), speaker view (close-up vs gesture view),
and CAI support (ASR vs no ASR). The study aimed to explore the impact of different
configurations on user experience and to identify the most effective way of displaying visual
information to interpreters. Surprisingly, and somewhat in contrast with results from the
focus groups, UEQ scores (Laugwitz et al., 2008) hint at a better overall user experience
with the maximal interface, rather than the minimal. In addition, analysis of post-task
questionnaires completed by participants highlights high individual variation in preference
and, consequently, a desire for interface customisability (in line with results from Spinolo
and Chmiel, 2023).
A more recent development in RSI technology is the integration of AI-powered interpreter
support within the platform (see, for example, Rodríguez et al., 2021, and Fantinuoli
et al., 2022). This can take the form of either a running automatic transcript of the
speech or a targeted selection of known problem triggers (Gile, 1995/2009), such
as named entities, numbers, and specific terminology (see Section 3.7).

3.5 Feedback from practitioners


With the rapid uptake of RSI triggered by the COVID-19 pandemic, researchers first focused
on feedback from practitioners obtained via survey studies. This type of research was con-
sidered to be quicker and more feasible to conduct during initial lockdown periods than
experimental studies. Surveys served as a source of instant information about the sweeping
changes that the pandemic forced on the sector, and state-of-industry reports showed how
workloads and working patterns had changed, illustrating the main challenges linked
to the rapid adoption of technologies and the dramatic shift from on-site to online events.
The most extensive feedback from interpreters was obtained through the ESIT survey
(Buján and Collard, 2022). This study included data from more than 900 respondents and
focused on conference interpreters’ work practices during the first year of the pandemic. Sur-
vey questions elicited information about respondents’ work during the pandemic, in contrast
to their work prior to the pandemic. Results showed that RSI was generally associated with
less work and fewer clients, shorter assignments, worse performance and working conditions,
hampered teamwork, and greater overall difficulty. At the same time, however, respondents
also expressed willingness to continue working on RSI assignments. This willingness
aligned with the advantages attributed to RSI, including new business opportunities and
improved work–life balance.
Many smaller-scale national surveys were also conducted in Italy (Ferri, 2021), Poland
(Przepiórkowska, 2021), Taiwan (Fan, 2022), Spain (Mahyub Rayaa and Martin, 2022),
Canada (Canada Regional Bureau of AIIC, 2021), Turkey (Şengel, 2022), and Japan (Mat-
sushita, 2022). In total, these studies collected responses from almost 600 interpreters and
generally showed trends similar to those in the ESIT survey. RSI was viewed as a major
change that reshaped the market following the pandemic. As a result of this, interpreters
need to acquire new competences (including managing multiple devices and information
channels and coordinating teamwork) as their role is redefined. RSI offers greater flexibility
and business opportunities and is seen as more environmentally friendly thanks to the
smaller carbon footprint associated with less travel. In contrast, major challenges
include health issues related to poor sound quality and more stressful working conditions,
fewer opportunities to network, and increased alienation. There are also risks related to
lower rates of pay and poor awareness of interpreters’ copyright (with instances of inter-
preters being forced to agree to being recorded so that they can complete the interpreting
assignment).
Further feedback from practitioners was provided by Spinolo and Chmiel (2023), who
surveyed almost 400 interpreters with RSI experience and looked for links between the
technical set-up (i.e. how often interpreters used various devices and additional screens
for RSI), work settings (home office or hub), and self-reported cognitive load. They found
that more complex technical set-ups were used in a home office setting compared to those
in a hub. More complex technical set-ups were also associated with prior training in RSI,
frequency of RSI practice, and age (interpreters in their early 40s used the simplest techni-
cal set-ups). Interestingly, according to the interpreters’ self-reports, their cognitive load did
not increase with more complex technical set-ups or with managing multiple channels of
information (such as the speech, the chat between delegates, the written and auditory Q&A, and
the written, visual, and/or auditory communication with the boothmate).
The picture stemming from interpreter feedback shows that RSI is viewed as a challenge
but is also embraced as an opportunity. Practitioners have shown great flexibility and
adaptability in response to the changing nature of the interpreting market.

3.6 Critical issues


This section presents an overview of findings from studies on RSI, categorised accord-
ing to critical issues. The first critical issue to be discussed is sound quality. This will be
reviewed based on input from practitioners, associations, international organisations, and
results from a small number of experimental studies. Next, we will focus on cognitive
load and interpreting performance, stress in RSI, teamwork and boothmate presence, and
multimodality.

3.6.1 Sound quality


Sound quality is unquestionably the most critical issue currently related to RSI. Interpreting
platforms frequently provide sound with a limited frequency range, compared to the
125–15,000 Hz range considered to be optimal for remote interpreting (Buján and Collard,
2022; Ziegler and Gigliobianco, 2018). Sound is also further manipulated by various algo-
rithms, for purposes such as noise suppression and feedback cancelling (Caniato, 2020a).
These technology-induced factors that reduce sound quality are compounded by
human-induced ones, such as the poor equipment used by speakers during remotely
interpreted events. Together, these issues strain the interpreters' hearing, and prolonged
exposure to poor-quality sound can result in hearing damage. Examples of hearing issues
include tinnitus (ringing or a similar sound experienced in the ear without any external
source), hyperacusis (increased sensitivity to sound and low tolerance for noise), or even
loss of hearing (Caniato, 2020a).
Interpreters have repeatedly voiced their concerns about toxic sound and related health
risks and performance deterioration. Sound quality usually ranks as the most pressing
RSI-related issue in interpreter surveys (Buján and Collard, 2022; Fan, 2022; Mahyub
Rayaa and Martin, 2022). To illustrate, the most important factors for RSI, as identified by the respondents
to the survey study by Spinolo and Chmiel (2023), were ‘good sound quality’ (4.98 points
on a 5-point Likert scale), ‘good network connection’ (4.92), and ‘good equipment’ (4.90)
used by conference participants. All these factors contribute to the reduction of health risks
triggered by toxic sound.
Such concerns have spurred developments in institutional contexts. The AIIC (Inter-
national Association of Conference Interpreters) has been very active in promoting ‘good
sound’ in RSI in various ways, such as commissioning technical studies, and issuing recom-
mendations and resolutions. To illustrate, the AIIC Technical and Health Committee has
published analyses showing limited compliance of sound provided by SIDPs with ISO stand-
ards (AIIC THC, 2019), while the AIIC EU Negotiating Delegation has issued a resolution
on sound quality (AIIC EU ND, 2022) calling on the EU bodies to 'make the recommended
equipment and technical set-up a mandatory requirement for remote meeting participation’
(AIIC EU ND, 2022, 4). Similar steps were taken in the UN during the pandemic-induced
transition to RSI. Here, a task force was created to test and troubleshoot technical solutions
and raise awareness in relation to sound quality (Diur and Ruiz Rosendo, 2022).
Improvements have been made in the area of technological factors that influence
sound quality. In September 2020, Zoom introduced a High-Fidelity Music Mode feature
to replace narrow-band audio with higher-clarity broadband audio in its VoIP protocol
(voice-over-internet protocol, a technology standard allowing for online transmission of
voice) (Zoom Blog, 2022). This development has been seen as both ‘promising’ (Caniato,
2020b) and ‘challenging’ because it might backfire if clients do not use this feature with
‘good equipment’ (Flerov, 2021).
Experimental studies on the impact of sound quality on interpreters have been scarce.
However, one such large-scale study focused on more than 80 interpreters and examined
the effect of sound quality on cognitive load and performance in RSI (Seeber, 2022; Seeber
and Pan, 2022). The study manipulated sound by using two frequency ranges: 125–15,000
Hz for the high-quality condition, and 300–3,400 Hz for the low-quality condition. Results
indicated the detrimental effect of low sound quality: interpreters reported increased
cognitive load and frustration in this condition, and their interpreting performances were
judged to be worse by independent evaluators.

3.6.2 Cognitive load and performance


Cognitive load, understood as the processing demand required by a particular task, has
long been the focus of processing-oriented research on interpreting (see, for example, Chen,
2017; Seeber, 2011). The specific characteristics of RSI impose additional processing demands on the
interpreter. These include the mode of presentation (audio and/or video, visual access to the
speaker, the interface, etc.), turn duration, and participant interactivity (presence of on-site
or platform audience) (Zhu and Aryadoust, 2022). Experimental studies on RSI have so far
incorporated various manipulations and have shown that cognitive load is perceived to be
lower when interpreting takes place in a hub, as opposed to a home office (Cheung, 2022).
Cognitive load is also perceived to be higher when the input sound is considered to be ‘bad
quality’ (Seeber, 2022). This does not differ when RSI is performed with a co-located or a
non-co-located boothmate (Chmiel et al., under review).
These results can be viewed in contrast to objective and self-reported performance indi-
cators. In a study by Roziner and Shlesinger (2010), objective interpreting quality was simi-
lar in both on-site and remote interpreting, with RSI producing slightly lower-quality scores
(only in absolute numbers, with no statistically significant effect). However, subjective qual-
ity judgments showed decreased satisfaction with performance, as voiced by interpreters
after carrying out interpreting assignments remotely. These findings regarding objective
quality are at variance with the results of Moser-Mercer
(2003), who reported a faster decline in quality for remote interpreting compared to on-site
performance.
With regard to specific factors that influence RSI performance, the accuracy of
interpreted numbers has been found to be similar for home-based and hub-based RSI
(Cheung, 2022). The presence of a boothmate, however, does seem to matter: a co-located
boothmate, or a boothmate in a virtual booth, has been shown to increase interpreting
accuracy in comparison with a non-co-located boothmate communicating via chat only
(Chmiel et al., under review).
Additionally, having access during an RSI assignment to a live transcript of the speech,
generated through automatic speech recognition (ASR), was found to improve interpreting
quality for lexically dense and fast speeches with a delivery rate exceeding 140 words per
minute (Rodríguez González et al., 2023b).

3.6.3 Stress
Stress is another critical issue that has been examined in the context of RSI. Two semi-
nal studies comparing on-site and remote simultaneous interpreting (Moser-Mercer, 2003;
Roziner and Shlesinger, 2010) in a within-subject design (i.e. comparing the same interpret-
ers performing both tasks) collected both objective and subjective measurements of stress.
The former included salivary cortisol measurements, while the latter included responses to
validated tests. Both studies showed no effect of task on objectively measured stress levels.
Despite numerical differences, interpreters in the study by Moser-Mercer (2003) did not
self-report higher stress levels when working in RSI. However, a larger-sample study by
Roziner and Shlesinger (2010) did show that RSI was indeed perceived as more stressful.
Stressors that were identified as being more significant in RSI included difficulty of delivery
and text, visibility, and lack of feedback from the audience. On the other hand, length of
turn, booth conditions, and technical equipment were judged as being equally stressful in
both remote and on-site interpreting. Interpreters also reported experiencing more exhaus-
tion, cognitive fatigue, and burnout after working in RSI. Both of these studies were con-
ducted in a hub, and it is not known whether these results can be generalised to RSI in a
home office scenario. In line with the findings of the two studies mentioned, Chmiel et al.
(under review) found no differences in self-reported anxiety levels depending on whether
the boothmate was co-located or non-co-located. However, the study did not include an
on-site condition for comparison. Interestingly, a survey- and interview-based study on RSI
by Seeber et al. (2019) found that interpreters' impressions varied: half of their sample
judged RSI to be more stressful than on-site interpreting, while the other half perceived it
as less stressful.

3.6.4 Teamwork and boothmate presence


RSI has completely different teamwork dynamics depending on the location of the active
interpreter’s boothmate (see Section 3.3). If co-located with their boothmate, interpreters
may adhere to long-established or ad hoc, mutually agreed-upon procedures for providing
assistance, such as prompting with numbers or difficult terms and coordinating handover
with gestures and other direct interaction (Chmiel,
2008; Duflou, 2016). When the boothmate is not co-located, it becomes more difficult to
interact and coordinate handover. Chmiel et al. (under review) manipulated boothmate
presence and investigated interpreting accuracy and self-reported difficulty when interpreters
performed RSI with a co-located boothmate and with a non-co-located one, who
either provided prompts via chat only or was present in a virtual booth (in which both
interpreter and boothmate can see each other, talk over a separate audio channel, and use
a chat; see Section 3.3) and provided prompts in the virtual booth chat. Interpreting
accuracy was similar in the virtual booth setting and the co-located setting but statistically
significantly lower in the chat-only, non-co-located setting. The subjective evaluation of
performance was in line with these data and was at its lowest in the chat-only,
non-co-located setting. Nevertheless, the type of boothmate
presence was not found to influence mental demand (understood as the amount of thinking
involved in a task) or temporal demand (time pressure).
One of the most common forms of collaboration in the booth is handover (Seresi and
Láncos, 2022), i.e. re-assigning the role of the active interpreter from one booth partner to
another. Handover may be achieved through eye contact or gestures. These procedures can
fail when performing RSI with a non-co-located boothmate (unless the interpreters are in
a virtual booth and can see each other). Thus, interpreters have turned to using particular
messages or icons in chat applications or agreeing on a predefined time for handover (Seresi
and Láncos, 2022). So far, a single experimental study on RSI has focused on the problem
of handover, showing difficulties in this respect in RSI (Matsushita et al., 2022).

3.6.5 Multimodality
Human–computer interaction is at the core of RSI and increases the multimodal nature of
this type of interpreting. This is paradoxical because, at first glance, remote interpreting
may seem to provide the interpreter with fewer sources of input to process compared to
on-site interpreting. In RSI, input comes predominantly from a screen (or multiple screens)
and a headset, while on-site interpreting is rife with various input channels (visual infor-
mation from the conference room, conference slides, interpreters’ own computer screens,
documentation in the booth, written prompts from the boothmate, and auditory input
both from the floor and the boothmate). However, although reduced to a two-dimensional
screen, information input in RSI is far more complex, as interpreters must juggle screens,
devices, applications, platforms, and systems. This constitutes a challenge and creates a
need for ‘multisensory integration to construct meaning’ (Moser-Mercer, 2005a). How-
ever, when faced with such a challenge, interpreters have been found to show flexibility
and adaptability. When asked how challenging the use of multiple input channels is,
interpreters rate attending to multiple channels of information (the speaker's audio, a chat
between delegates, communication with the boothmate, written questions from the audience,
presentation visuals) as moderately problematic (scoring 5.33 out of 7) and using
multiple programmes, applications, systems, and devices as slightly less problematic (scoring 5.0
out of 7) (Spinolo and Chmiel, 2023). Interpreters seem to prefer lean interfaces with a
clearer view of the speaker and their body language (Rodríguez González et al., 2023a).
However, a more fragmented distribution of visual attention over various chat boxes
and panels has not been found to lead to poorer performance or a higher self-reported
cognitive load (Chmiel et al., under review). Recent studies (Frittella, 2023; Saeed et al.,
2022) might help optimise the multimodal working environment in RSI in the future (see
Section 3.3).

3.7 Conclusion
This chapter has presented the main concepts related to RSI, including contexts of use and
technologies used in RSI. It also focused on feedback from practitioners and critical issues
such as sound quality, cognitive load, stress, teamwork, and multimodality. As mentioned,
the interpreting profession is now experiencing disruptive, extensive changes as a result
of technological developments. These are bound to continue developing and changing the
practice of RSI. It is only natural that RSI, a modality of interpreting that is so dependent
on cloud-based infrastructure, absorbs new solutions and benefits from new features.
Although not exclusive to RSI, a trend towards interpreter augmentation, very often
applied to RSI tools, is emerging. AI-powered, computer-assisted interpreting tools are
increasingly being included in RSI platforms in the form of interpreter support (see also
Prandi, this volume). This is particularly noticeable in relation to known problem-triggering
items such as numbers and named entities, and in prompting interpreters with specialised
terminology via ASR (Defrancq and Fantinuoli, 2021; Frittella, 2023; Prandi, 2023; Saeed
et al., 2022).
Zhang et al. (2023) sketch future scenarios and propose various tools that could be
used to ensure better user experience and support interpreters working remotely. These
include video summarisation (providing a short update to boothmates resuming work
after a break), keyword spotting (identification of keywords in the source text rather than
displaying verbatim transcripts), face and gesture detection and recognition, and interac-
tion management to avoid overlapping speech (for instance, in Q&A sessions). It is also
possible to envisage meetings of the future being interpreted, in some language combina-
tions, via machine interpreting (see Fantinuoli, this volume) or using machine interpreting
post-editing (MIPE), i.e. using automatically generated, machine-translated subtitles as a
source of the interpreted text, to be read out following online editing. A further and more
recent evolution of AI support for interpreters is the use of augmented reality tools to offer
ASR support (Gieshoff and Schuler, 2022). This practice is likely to grow stronger in the
near future and can be applied to both RSI and other simultaneous interpreting contexts.
In this scenario, interpreters wear virtual reality glasses and see AI-generated prompts as
part of their 3D visual field.
All these emerging trends constitute potential further areas of enquiry. Additionally, due
to the unprecedented recent developments in generative artificial intelligence, the disruptive
impact of this technology on interpreting will surely be researched in the months and years
to come.

References
AIIC EU ND – EU Negotiating Delegation, 2022. Resolution on Sound Quality. AIIC Website.
URL https://aiic.org/document/10590/Resolution%20on%20Auditory%20Health%20and%20
Sound%20Quality%20v2.pdf (accessed 9.3.2024).
AIIC TFDI – Taskforce on Distance Interpreting, 2020. AIIC COVID-19 Distance Interpreting Rec-
ommendations for Institutions and DI Hubs. AIIC Website. URL https://aiic.org/document/4839/
AIIC%20Recommendations%20for%20Institutions_27.03.2020.pdf (accessed 9.3.2024).
AIIC THC – Technical and Health Committee, 2019. Technical Study on Transmission of Sound and
Image Through Cloud-Based Systems for Remote Interpreting in Simultaneous Mode (Remote
Simultaneous Interpreting – RSI). AIIC Website. URL https://aiic.org/document/4862/Report_
technical_study_RSI_Systems_2019.pdf (accessed 9.3.2024).
Braun, S., 2015. Remote Interpreting. In Pöchhacker, F., ed. The Routledge Encyclopedia of Interpret-
ing Studies, 1st ed. Routledge, Abingdon, 346–348. URL https://doi.org/10.4324/9781315678467
Braun, S., 2019. Technology and Interpreting. In O’Hagan, M., ed. The Routledge Handbook
of Translation and Technology, 1st ed. Routledge, Abingdon, 271–288. URL https://doi.
org/10.4324/9781315311258-19.
Braun, S., 2024. Distance Interpreting as a Professional Profile. In Massey, G., Ehrensberger-Dow,
M., Angelone, E., eds. Handbook of the Language Industry: Contexts, Resources and Profiles. de
Gruyter, Berlin, 449–472. URL https://doi.org/10.1515/9783110716047-020.
Buján, M., Collard, C., 2022. Remote Simultaneous Interpreting and COVID-19: Conference Interpreters' Perspective. In Liu, K., Cheung, A.K.F., eds. Translation and Interpreting in the Age of COVID-19. Springer, Singapore, 133–150. URL https://doi.org/10.1007/978-981-19-6680-4_7.

The Routledge Handbook of Interpreting, Technology and AI

Canada Regional Bureau of the International Association of Conference Interpreters, 2021. Distance Interpreting During the Pandemic. AIIC Website. URL https://aiic.org/uploaded/web/Interpreter%20survey%20report%20FINAL.pdf (accessed 9.3.2024).
Caniato, A., 2020a. The Proposed Pathodynamics of the Junk Sound Syndrome: Why RSI Sound
Is Bad for the Interpreter’s Ears. LinkedIn. www.linkedin.com/pulse/proposed-pathodynamics-
junk-sound-syndrome-why-rsi-bad-andrea-caniato/?trackingId=zWBSvkpsyR60l6%2Fq5FS%2FJQ
%3D%3D (accessed 9.3.2024).
Caniato, A., 2020b. Zoom Goes Hi-Fi: Music to the Interpreters’ Ears. LinkedIn. https://www.
linkedin.com/pulse/zoom-goes-hi-fi-music-interpreters-ears-andrea-caniato?trk=article-s
sr-frontend-pulse_more-articles_related-content-card (accessed 9.3.2024).
Chaves, S.G., 2020. Remote Simultaneous Interpreting Hubs or Platforms: What's the Best Option? The ATA Chronicle. www.atanet.org/tools-and-technology/remote-simultaneous-interpreting-hubs-or-platforms-whats-the-best-option/ (accessed 9.3.2024).
Chen, S., 2017. The Construct of Cognitive Load in Interpreting and Its Measurement. Perspectives
25(4), 640–657. URL https://doi.org/10.1080/0907676x.2016.1278026
Cheung, A.K.F., 2022. Remote Simultaneous Interpreting from Home or Hub: Accuracy of Numbers from English into Mandarin Chinese. In Liu, K., Cheung, A.K.F., eds. Translation and Interpreting in the Age of COVID-19. Springer, Singapore, 113–132. URL https://doi.org/10.1007/978-981-19-6680-4_6
Chmiel, A., 2008. Boothmates Forever? On Teamwork in a Simultaneous Interpreting Booth. Across
Languages and Cultures 9(2), 261–276. URL https://doi.org/10.1556/Acr.9.2008.2.6
Chmiel, A., Spinolo, N., 2022. Testing the Impact of Remote Interpreting Settings on
Interpreter Experience and Performance: Methodological Challenges Inside the Virtual
Booth. Translation, Cognition & Behavior 5(2), 250–274. URL https://doi.org/10.1075/tcb.
00068.chm
Chmiel, A., Spinolo, N., Korpal, P., Olalla-Soler, Ch., Rozkrut, P., Kajzer-Wietrzny, M., Ghiselli, S.
Under review. Inside the Virtual Booth: The Impact of Remote Interpreting Settings on Interpreter
Experience and Performance. Translation and Interpreting Studies.
Defrancq, B., Fantinuoli, C., 2021. Automatic Speech Recognition in the Booth: Assessment of Sys-
tem Performance, Interpreters’ Performances and Interactions in the Context of Numbers. Tar-
get. International Journal of Translation Studies 33(1), 73–102. URL https://doi.org/10.1075/
target.19166.def.
Disterer, G., Kleiner, C., 2013. BYOD Bring Your Own Device. Procedia Technology 9, 43–53. URL
https://doi.org/10.1016/j.protcy.2013.12.005
Diur, M., Ruiz Rosendo, L., 2022. Reconceptualising Interpreting at the United Nations. In Liu, K., Cheung, A.K.F., eds. Translation and Interpreting in the Age of COVID-19. Springer, Singapore, 151–164. URL https://doi.org/10.1007/978-981-19-6680-4_8.
Duflou, V., 2016. Be(com)ing a Conference Interpreter – an Ethnography of EU Interpreters as a
Professional Community. Benjamins, Amsterdam. URL https://doi.org/10.1075/btl.124.
Fan, D., 2022. Remote Simultaneous Interpreting: Exploring Experiences and Opinions of Con-
ference Interpreters in Taiwan. Compilation and Translation Review 15(2), 159–198. URL
https://doi.org/10.29912/CTR.202209_15(2).0005
Fantinuoli, C., Marchesini, G., Landan, D., Horak, L., 2022. KUDO Interpreter Assist: Auto-
mated Real-Time Support for Remote Interpretation. arXiv:2201.01800v1. URL https://doi.
org/10.48550/arXiv.2201.01800
Ferri, E., 2021. Gli effetti della pandemia da COVID-19 sull’interpretazione simultanea: Come cam-
bia il panorama. Una indagine tra interpreti e organizzatori di eventi in Italia (Unpublished MA
dissertation). University of Bologna – Forlì.
Flerov, C., 2021. Is Zoom "Hi-Fi Mode" an Answer to Interpreters' Woes? Not at All! LinkedIn. www.linkedin.com/pulse/zoom-hi-fi-mode-answer-interpreters-woes-cyril-flerov (accessed 9.3.2024).
Frittella, F.M., 2023. Usability Research for Interpreter-Centred Technology: The Case Study of SmarTerp. Language Science Press, Berlin. URL https://zenodo.org/record/7376351
Gieshoff, A.C., Schuler, M., 2022. The Augmented Interpreter: A Pilot Study on the Use of Augmented Reality in Interpreting. A Conference Presentation at the Third HKBU International Conference on Interpreting, Hong Kong, 7–9.12.2022. URL https://digitalcollection.zhaw.ch/handle/11475/26724 (accessed 9.3.2024).

Remote simultaneous interpreting

Gile, D., 1995/2009. Basic Concepts and Models for Interpreter and Translator Training. John Benjamins, Amsterdam. URL https://doi.org/10.1075/btl.8
ISO, 2022. ISO 24019:2022 Simultaneous Interpreting Delivery Platforms. Requirements and Recommendations. URL www.iso.org/standard/80761.html (accessed 9.3.2024).
ISO, 2023. ISO 20539:2023 Translation, Interpreting and Related Technology: Vocabulary. www.
iso.org/standard/81379.html (accessed 9.3.2024).
Jayes, T., 2023. Conference Interpreting and Technology: An Institutional Perspective. In Corpas Pas-
tor, G., Defrancq, B., eds. Interpreting Technologies: Current and Future Trends. John Benjamins,
Amsterdam, 217–240. URL https://doi.org/10.1075/ivitra.37.09jay.
Laugwitz, B., Held, T., Schrepp, M., 2008. Construction and Evaluation of a User Experience Ques-
tionnaire. USAB 5298, 63–76. URL https://doi.org/10.1007/978-3-540-89350-9_6
Lee, K.M., 2004. Presence, Explicated. Communication Theory 14(1), 27–50. URL https://doi.
org/10.1111/j.1468-2885.2004.tb00302.x
Matsushita, K., 2022. How Remote Interpreting Changed the Japanese Interpreting Industry: Find-
ings from an Online Survey Conducted During the COVID-19 Pandemic. INContext: Studies in
Translation and Interculturalism 2(2), 167–185. URL https://doi.org/10.54754/incontext.v2i2.22
Matsushita, K., Yamada, M., Ishizuka, H., 2022. How Multiple Visual Input Affects Interpreting Per-
formance in Remote Simultaneous Interpreting (RSI): An Experimental Study. A Conference Pres-
entation at the Third HKBU International Conference on Interpreting, Hong Kong, 7–9.12.2022.
Mahyub Rayaa, B., Martin, A., 2022. Remote Simultaneous Interpreting: Perceptions, Practices
and Developments. The Interpreters’ Newsletter 27, 21–42. URL https://doi.org/10.13137/
2421-714X/34390
Moser-Mercer, B., 2003. Remote Interpreting: Assessment of Human Factors and Performance
Parameters. AIIC Webzine, Summer, 1–17.
Moser-Mercer, B., 2005a. Remote Interpreting: Issues of Multi-Sensory Integration in a Multilingual
Task. Meta 50(2), 727–738. URL https://doi.org/10.7202/011014ar
Moser-Mercer, B., 2005b. Remote Interpreting: The Crucial Role of Presence. Bulletin vals-asla 81,
73–97.
Mouzourakis, P., 1996. Videoconferencing: Techniques and Challenges. Interpreting: International
Journal of Research and Practice in Interpreting 1(1), 21–38.
Mouzourakis, P., 2006. Remote Interpreting: A Technical Perspective on Recent Experiments. Inter-
preting: International Journal of Research and Practice in Interpreting 8(1), 45–66. URL https://
doi.org/10.1075/intp.8.1.04mou
Pöchhacker, F., 2004. Introducing Interpreting Studies. Routledge, Abingdon.
Prandi, B., 2023. Computer-Assisted Simultaneous Interpreting: A Cognitive-Experimental Study on
Terminology. Language Science Press, Berlin. URL https://zenodo.org/record/7143056
Przepiórkowska, D., 2021. Adapt or Perish: How Forced Transition to Remote Simultaneous Inter-
preting During the COVID-19 Pandemic Affected Interpreters’ Professional Practices. Między
Oryginałem a Przekładem 27(4(54)), 137–159. URL https://doi.org/10.12797/MOaP.27.2021.54.08
Rodríguez González, E., Saeed, M.A., Korybski, T., Davitti, E., Braun, S., 2023a. Reimagin-
ing the Remote Simultaneous Interpreting Interface to Improve Support for Interpreters. In
Ferreiro-Vázquez, Ó., Correia, A., Araújo, S., eds. Technological Innovation Put to the Service
of Language Learning, Translation and Interpreting: Insights from Academic and Professional
Contexts. Peter Lang, Lausanne, 227–246.
Rodríguez, S., Gretter, R., Matassoni, M., Alonso, A., Corcho, O., Rico, M., Falavigna, D., 2021.
SmarTerp: A CAI System to Support Simultaneous Interpreters in Real-Time. In Mitkov, R.,
Sosoni, V., Giguère, J.C., Murgolo, E., Deysel, E., eds. Proceedings of the Translation and Inter-
preting Technology Online Conference. INCOMA Ltd., 102–109.
Rodríguez González, E., Saeed, M.A., Korybski, T., Davitti, E., Braun, S., 2023b. Assessing the Impact
of Automatic Speech Recognition on Remote Simultaneous Interpreting Performance Using the
NTR Model. Say It Again. International Workshop on Interpreting Technologies, Malaga, Spain.
Roziner, I., Shlesinger, M., 2010. Much Ado About Something Remote: Stress and Performance in
Remote Interpreting. Interpreting: International Journal of Research and Practice in Interpreting,
12(2), 214–247. URL https://doi.org/10.1075/intp.12.2.05roz
Saeed, M.A., 2023. Exploring the Visual Interface in Remote Simultaneous Interpreting (PhD thesis).
University of Surrey. URL https://doi.org/10.15126/thesis.901059


Saeed, M.A., González, E.R., Korybski, T., Davitti, E., Braun, S., 2022. Connected Yet Distant: An
Experimental Study into the Visual Needs of the Interpreter in Remote Simultaneous Interpreting.
In Kurosu, M., ed. Human-Computer Interaction: User Experience and Behavior, Vol. 13304.
Springer, Berlin, 214–232. URL https://doi.org/10.1007/978-3-031-05412-9_16
Seeber, K., 2011. Cognitive Load in Simultaneous Interpreting: Existing Theories – New Models.
Interpreting 13(2), 176–204. URL https://doi.org/10.1075/intp.13.2.02see
Seeber, K., 2022. When Less Is Not More: Sound Quality in Remote Interpreting. UN Today web-
site. URL https://untoday.org/when-less-is-not-more-sound-quality-in-remote-interpreting/
(accessed 9.3.2024).
Seeber, K., Pan, D., 2022. Audio Quality in Remote Interpreting. A Conference Presentation at the
Third HKBU International Conference on Interpreting, Hong Kong, 7–9.12.2022.
Seeber, K.G., Keller, L., Amos, R., Hengl, S., 2019. Expectations vs. Experience: Attitudes Towards
Video Remote Conference Interpreting. Interpreting: International Journal of Research and Prac-
tice in Interpreting 21(2), 270–304. URL https://doi.org/10.1075/intp.00030.see
Şengel, Z., 2022. Zooming in: Interpreters' Perspective Towards Remote Simultaneous Interpreting (RSI) Ergonomics. Çeviribilim ve Uygulamaları Dergisi. Journal of Translation Studies 33, 169–190.
Seresi, M., Láncos, P.L., 2022. Teamwork in the Virtual Booth – Conference Interpreters' Experiences with RSI Platforms. In Liu, K., Cheung, A.K.F., eds. Translation and Interpreting in the Age of COVID-19. Springer, Singapore, 181–196.
Spinolo, N., 2022. Remote Interpreting. In Franco Aixelá, J., Muñoz Martín, R., eds. ENTI (Enci-
clopedia de traducción e interpretación). AIETI, Asociación Ibérica de Estudios de Traducción e
Interpretación. URL https://zenodo.org/records/6370665
Spinolo, N., Chmiel, A., 2023. Final Report – AIIC Research Grant 2020 Inside the Virtual Booth:
The Impact of Remote Interpreting Settings on Interpreter Experience and Performance. Unpub-
lished report.
Zhang, X., Corpas Pastor, G., Zhang, J., 2023. Videoconference Interpreting Goes Multimodal. Some Insights and a Tentative Proposal. In Corpas Pastor, G., Defrancq, B., eds. Interpreting Technologies: Current and Future Trends. John Benjamins, Amsterdam, 169–194.
Zhu, X., Aryadoust, V., 2022. A Synthetic Review of Cognitive Load in Distance Interpreting:
Toward an Explanatory Model. Frontiers in Psychology 13, 899718. URL https://doi.org/10.3389/
fpsyg.2022.899718
Ziegler, K., Gigliobianco, S., 2018. Present? Remote? Remotely Present! New Technological
Approaches to Remote Simultaneous Conference Interpreting. In Fantinuoli, C., ed. Interpret-
ing and Technology. Language Science Press, Berlin, 119–139. URL https://doi.org/10.5281/
ZENODO.1493299
Zoom Blog, 2022. High-Fidelity, Professional-Grade Audio on Zoom. Zoom blog. URL https://blog.
zoom.us/high-fidelity-music-mode-professional-audio-on-zoom/ (accessed 9.3.2024).

4
VIDEO RELAY SERVICE
Camilla Warnicke

4.1 Introduction
Technology has advanced significantly in recent decades; the ability to make real-time video calls, for instance, is now a widespread phenomenon. One can contrast this with 1968, when Stanley Kubrick's groundbreaking film 2001: A Space Odyssey astonished audiences by depicting the then-futuristic concept of communication through sound and video using long-distance videophones. Today, what used to be considered 'sci-fi-like interaction' has become commonplace for many
people in the world. The possibility for users to see each other while communicating from a
distance has brought about tremendous changes for several societal groups. One such group
is deaf signers, who can now rely on a service that allows them to contact speaking parties via
interpreter mediation by the use of ‘video relay service’ (VRS). VRS provides calls between
two primary participants: a person using a signed language via videophone, interacting with
a person speaking on a ‘regular’ telephone/smartphone. The interaction between the primary
participants is mediated by an interpreter. From their position, often in a call centre, the interpreter observes the signing party on a computer screen and listens to the speaking party via
a headset. In contrast, during this interaction, the two primary participants cannot see or hear
each other (see Figure 4.1).
Before the innovation of videophones, deaf signers had to meet face-to-face in order to
sign with each other. In fact, it was not until the 1960s that the signing community received
accessible telephone technology. The first example of distance interaction with deaf signers
in real time took place via the exchange of texts produced by a teletypewriter (TTY), also known as a telecommunications device for the deaf (TDD). TTYs and TDDs produce typed versions of a spoken language (Brunson, 2011) and bridge the remote communication gap between (voice)
telephone users and those with hearing or speech disabilities. Later evolutions allowed for
text messages to be sent between mobile phones, via a short message service (SMS). It is
worth noting that signing interaction via videophone was launched in the 1990s, whereas
organised interpreting services via videophone for deaf people (VRS) have only been an
accessible option since 1996 (Haualand, 2012).


DOI: 10.4324/9781003053248-6

Figure 4.1 VRS, an overview.

To date, VRS has been providing regular service in several countries around the world.
The first regular VRS was launched in Europe, in Sweden, in 1996 (Warnicke, 2017). VRS
came into use in other countries as time went by, being launched in, for example, the United
States in 2000 (Chang and Russell, 2022), and in Thailand in 2011 (Thailand Communica-
tions Relay Service Center, 2021). However, most countries in Africa and South America
do not yet have a regular VRS.
VRS contributes to greater access and equality for deaf people through technical developments and the provision of interpreting. As a result, VRS has made deaf
signers’ access to societal services more comparable to that of hearing individuals in society
(Haualand, 2014). The technical set-up of a VRS is, however, not a ‘quick fix’ to improve
inclusion for deaf people in society (cf. De Meulder and Haualand, 2021).
After this brief overview of how VRS has emerged, Section 4.2 will provide a criti-
cal analysis of practices involved in VRS. It will focus on modalities and contrast spoken
and signed language. Subsequently, it will make a link to cognate practices and describe
telephone and video-mediated interactions. Lastly, Section 4.2 will present the procedures
required to make interpreter-mediated phone calls via a VRS. In Section 4.3, discussion
will focus on regulatory considerations relating to interpreters’ billable time and emergency
calls, in order to represent differences between VRS in different nations. Section 4.4 will
conclude the chapter by discussing potential future developments in this domain.

4.2 Critical analysis of practices


Some characteristics of VRS are similar across nations (Warnicke and Granberg, 2022).
For example, in countries where VRS has been established, the service involves two different languages and different modalities, several technical media, and mediation by an


interpreter. The next sections will individually address and analyse modalities, media and
institutional interaction relating to VRS. The different aspects will then be brought together
to discuss interpreter-mediated calls conducted via a VRS.

4.2.1 Modalities
VRS involves both a signed and a spoken language. It is therefore a bimodal and intercul-
tural practice. The two languages involved differ in modality, in the way they are produced
and perceived. Spoken languages rely on oral speech and auditory perception. Signed lan-
guages, on the other hand, rely on gestural-visual resources, including signs, gestures, and
facial expressions.
Signed languages are structured using conventional linguistic units, indicating that they
are naturally occurring human languages (Stokoe, 2005), distinct from spoken languages
(Brennan, 1990). However, not all signed languages are the same across the globe. As with
spoken languages, there is a wide variety of signed languages across different countries and
regions. In some cases, more than one official signed language can be used within a country
(see, for example, Brentari, 2010). Although signed languages have different signs, some
cultural expressions and elements of interaction will be the same across signed languages.
One example includes how to name or label a person. While names can be finger-spelled,
letter by letter, in signed languages, it is also common in the deaf community to use a sign
for each person’s name (Börstell, 2017; Supalla, 1990). Another common denominator
across all signed languages is that they have no written form. This means that written text
(often the national spoken official language) constitutes a second language for deaf signers.
An understanding of the grammatical structure of signed languages is needed in order to appreciate the complexities of using a signed language over videophone, via a VRS.
It is possible to categorise signs according to where on the body a sign is produced, the
handshape used, or movement of signs. Phonological segments are attributed to each sign,
in relation to its movement and hold (Liddell, 1984). These different aspects give the sign
its meaning – any change in one of these aspects can change the meaning of the sign.
Signed languages encompass manual signs made by the hands and non-manual signs (Lid-
dell, 1977). Non-manual signs include facial expressions and lip movements that may cor-
respond to spoken words. In addition, signed languages rely on various resources to convey
various linguistic elements. Examples include sentence or clause type (e.g. interrogative
clauses, relative clauses; cf. Sandler and Lillo-Martin, 2006). Although it could be difficult
to identify and decode some aspects of signs via visual media, such as videophones, both
the VRS interpreter and the signing party need to be able to perceive these aspects for com-
munication with the speaking party to take place.

4.2.2 Media and institutional interaction


Calls are shaped by the communication media in which they occur and the way the inter-
locutors interact. No interaction occurs in a social vacuum (Linell, 1998, 273), and this
holds true for telephone and videophone communication. Interactions in specific settings
often become institutional (cf. Drew and Heritage, 1992). This means that patterns of how
to interact in a specific context have developed (and continue to develop) over time. Thus,
institutional patterns and conventions have emerged for the interaction in telephone and
videophone communication. In addition, institutionalised interaction is shaped by existing


guidelines and norms and by the circumstances and functions of the media used for it (cf.
Heritage and Clayman, 2010). VRS involves interaction both over telephone and video-
phone. Section 4.2.2.1 describes spoken interaction over the telephone, and Section 4.2.2.2
provides specificities relating to videophone interaction among deaf signers.

4.2.2.1 Telephone conversation


Spoken-language communication over the telephone began with Alexander Graham Bell's invention of the telephone in the United States. Interestingly, Bell was working at the time as a speech teacher for deaf people, using a repressive method called oralism, which forced deaf people to speak rather than use signed language. Ironically, Bell's efforts to make sound visible for deaf people led him to invent the telephone, a device primarily for hearing people. Bell patented the telephone in 1876, and the innovation became widespread during the 20th century. The telephone later became mobile, and smartphones became widely used during the 2010s.
Communicating (with the voice) over a distance has thus been possible for nearly
150 years and is now a well-established practice. However, to some extent, the telephone
constrains the way hearing people interact. Those who speak and listen on the telephone are
unable to see each other, so no visual cues can be used in communication. This limitation is
reflected in the fact that telephone conversations begin differently from face-to-face encoun-
ters (cf. Hopper, 1992). To illustrate, speech via a telephone needs to be more explicit to
compensate for its constraints. Therefore, opening with a greeting and self-introduction is a
common way to start a telephone conversation. The absence of visual cues in this medium
also shapes turn organisation; gaps are shorter, and overlapping speech occurs more fre-
quently. In addition, the mobility offered by smartphones means that people can call from
anywhere. This also impacts the way people interact on the telephone.

4.2.2.2 Videophone conversation between deaf signing parties


Signers can interact across a distance using videophones. Studies of video calls between deaf
signers show that the interaction is transformed by the media used, that is, the videophones
with webcam. However, research into signers’ interactions via videophones is scarce.
Nevertheless, one key finding regarding these interactions is that the signing is adapted to
the setting and constrained by the webcam. This implies that the signing, which is naturally
three-dimensional, has to be adjusted to fit a two-dimensional space. According to Keating
et al. (2008) and Keating and Mirus (2003), signers reduce the size and dimensionality of
the signs when using a two-dimensional space. In addition, due to the reduced access to
facial expressions in this type of video-mediated interaction, signing has to be more distinct.
This often involves redundancy, including occasional repetition of the same message, with
different explanations. To ensure signs are visible, they are repositioned; for example, they
are adjusted vertically or horizontally (cf. Liddell, 1977, 2003) or by positioning the hands
closer to the screen or webcam. Furthermore, signing speed may be reduced; slower signing
can serve as an adaptive strategy to facilitate decoding via a videophone.
Another example of an adaptive strategy that is used when communicating via vide-
ophone is the way questions are posed. Instead of indicating a question using facial expres-
sions and body posture, that is, with non-manual signs, a question can be indicated via
manual means, for example, by using the sign for question mark (Keating et al., 2008,


1074). This is a rather uncommon way for signers to pose a question during face-to-face
interaction, but it is seen as a feasible and functional way to do so during video-based inter-
action. Moreover, for a signer to direct a question to one of the two users of a videophone, they may also have to compensate by using the name sign of their intended addressee. This is
an issue that is normally handled by gaze or by pointing towards a reference in face-to-face
interaction (cf. Keating et al., 2008, 1175–1176).
It is important to remember that participants do not share the same physical space dur-
ing video-mediated conversation. To some extent, using a video link fuses proximity and
distance between them. A shared space appears, which can be considered a ‘novel space’ (cf.
Warnicke and Broth, 2023). This novel space challenges boundaries between what is real
and what is virtual. Consider a scenario where two signing people interact via videophone,
sharing what can be called a 'front stage' (Goffman, 2016). This front stage encompasses everything that is captured by the webcams and thus visible to the other person. However, not everything that is taking place in the surrounding (real) environment is captured by the webcam. The view of this shared, virtual space is thus distinct from the 'real' space.
Furthermore, in the real space, the signer can communicate with another individual in the
same physical location, out of view of the webcam. As a result, interaction can take place in
both the virtual space (as captured by the webcam) and the real space (in the near physical
environment, outside of webcam view). However, in video-mediated interaction, it is com-
mon to describe who can see the interaction from the real space and who is following what
is being signed in the video call (Keating et al., 2008; Keating and Mirus, 2003).

4.2.3 Interpreter-mediated phone calls in VRS


The interaction in VRS is formed by a combination of the different media involved in the
service (telephone or videophone), the interpreters’ platform, screen, webcam, and headset.
The interaction involves at least three parties (the interpreter and the two primary participants: the 'caller' and the 'called'). Not all parties share the same modality,
media, or space. As a result, the interpreted event constitutes a new setting, where interac-
tion amongst the parties involved unfolds like a dance of three (Warnicke, 2018), resem-
bling Wadensjö’s (2014) communicative ‘pas de trois’. The notion of the interpreter as an
‘ideal neutral transmitter’ (cf. Roy, 2002) is therefore challenged in VRS, as the interpreter
becomes an active party, with a novel and unique position to facilitate the communication
(Warnicke and Broth, 2023).
The interpreter is the only participant with access to both the visual and the auditory connection. The interpreter has contact with the signing party via a webcam and screen. From
the interpreter’s computer, it is also possible to exchange text with the signing party, via the
platform (Warnicke and Plejert, 2021). The interpreter and speaking party are connected
via headset and telephone, respectively (Warnicke and Plejert, 2018). The interpreter’s
headset consists of one or two earphones and a microphone. As the interpreter is signing, it
is convenient to have the headset microphone placed on the opposite side to the most active
signing hand. This means right-handed interpreters place the headset with the microphone
on their left-hand side (and vice versa), to avoid getting entangled in the device. However,
the headset also functions as an interactional resource between the interpreter and the sign-
ing party (Warnicke and Plejert, 2018). Here, the interpreter may point towards the headset
to make a reference to the speaking party, for the benefit of the signing party. Moreover,
when an interpreter touches the headset, the signing party tends to stop signing. Therefore,


while the use of the headset is key for the interpreter to hear the speaking party, evidence
also shows that its presence has implications in the interaction between the interpreter and
the signing party too.
A common procedure in VRS begins when the interpreter answers an incoming call from either a signer's videophone or a speaking party's telephone. A traditional way of answering a call is with a greeting (Warnicke, 2021). In this initial phase of the call, the interpreter receives a telephone number or a videophone email address with which to contact the called party on behalf of the caller. The interpreter then dials the number or enters the email address. When the called party answers, the interpreter may have to explain the VRS. Following this, the two primary participants can begin their interaction, mediated by the interpreter (Warnicke, 2021). These
phases of the call vary between countries, as do the regulations. These differences are elabo-
rated in Section 4.3.1 (‘billable time’).
Interpreting in VRS is carried out in the simultaneous mode. This requires the inter-
preter to interpret into the target language what is signed or said in the source language,
while processing the incoming source language (cf. Leeson, 2005; Riccardi, 2005; Russell,
2005). As explained earlier, the VRS setting involves a range of modalities that
are produced and perceived via different means of communication, that is, a visual/gestural
signed language and a verbally spoken language. The different modalities present in VRS
facilitate simultaneous interpreting, as the languages used do not interfere with each other.
In contrast to many simultaneous interpreters working in spoken language communication,
VRS interpreters interpret bidirectionally, that is, to and from both signed language and
spoken language.
Signed language can present language-related challenges for interpreters in a VRS. As a
nationwide service, VRS spans a variety of regions and may therefore include regional and
cultural variants of signs that may be unfamiliar to interpreters (Palmer et al., 2012). These
variants may be difficult to render simultaneously without asking for clarification. Further-
more, the use of personal signs for names, which is a cultural and common way for signers
to refer to a name or a person (see Section 4.2.1), could also present an issue for interpreters
(cf. Börstell, 2017; Supalla, 1990). In addition, in trilingual calls (e.g. involving a signed
language, English, and Spanish), challenges include how to render finger-spelled names for
the signing party on the videophone and how to determine the correct pronunciation, which
can differ between English and Spanish (Treviño and Quinto-Pozos, 2018).
The interpreter is also responsible for turn organisation, making decisions about who is
interpreted and when (Warnicke and Plejert, 2012). Turn organisation is managed exclu-
sively by the interpreter, as the signing and speaking participants cannot see or hear each
other. The interpreter can manage the organisation of turns using strategies such as antici-
pating an upcoming utterance or by providing expanded or reduced renditions (Warnicke
and Plejert, 2012). Other forms of coordination performed by the interpreter include using
embodied resources, such as hand and body movements, and gaze signals to communicate
with the signing party, and audible signals such as humming to the hearing participant
(Marks, 2018; Warnicke and Plejert, 2012). Due to the division of spaces in VRS, and for
effective communication to take place, it is important that the interpreter inform the pri-
mary participants about what is happening in the novel space between the ‘caller’ or the
‘called’ party and the interpreter. This presents an issue for the interpreter as s/he needs to
define the situation by explaining what is going on. For example, the interpreter will need
to mention if the signing party moves away from the videophone at any point, or if there

Video relay service

is disturbing background noise at the speaker’s end, or if a technical problem arises. In
VRS, visual, auditive, and contextual cues are limited due to the devices used. Furthermore,
neither primary participant is able to understand the other without the presence of the
interpreter. Thus, VRS interpreters need to maintain heightened awareness and adapt
reflexively and dynamically to the evolving interaction on a moment-by-moment basis (Warnicke
and Plejert, 2016).
In VRS, the conversation is also affected by the relationship between primary partici-
pants. Interpreters may also work within a variety of different contexts, often in rapid suc-
cession. For instance, interpreters could first be involved in a call between family members,
and immediately afterwards, a call between a private individual and an authority (cf. Peter-
son, 2011). The relationship between primary participants can significantly impact how
the call proceeds and how the parties talk to each other (via the interpreter). In addition,
individuals’ ways of expressing themselves also vary. This is another factor which shapes
the conversation and impacts both the interpreter and the efficiency of the communication.
When a call arrives at a VRS, the interpreter does not know who is calling or the caller’s
intended objective. VRS calls are typically unpredictable. This requires the interpreter to
navigate between various emotions from one call to the next. The requirement to constantly
calibrate themselves with each assignment, within a minimal time frame between calls,
makes the interpreter’s situation in VRS unlike other traditional assignments. In traditional
face-to-face assignments, interpreters physically travel between locations and may take
breaks to recover or adjust between interpreting events. The experience of persistent emo-
tional extremes therefore becomes a risk factor for VRS interpreters (Wessling and Shaw,
2014) and could lead to stress and burnout for employees (Bower, 2015). While education
and training designed exclusively for interpreters working in VRS settings is lacking (Warnicke
and Granberg, 2022), it remains necessary to provide VRS interpreters with training for
handling emotional and interactional stress, as well as other challenges commonly experienced
in this setting (Alley, 2014; Napier et al., 2017; Roman and Samar, 2015; Skinner, 2023)
for this form of accessible communication to continue effectively.
As with other demanding work situations (see, for example, Bakker et al., 2010), it is
essential that interpreters manage and control each VRS situation. However, it has been
reported that interpreters often experience a ‘lack of control’ and ‘uncertainty’ about regula-
tions and company rules. For example, interpreters have reported feeling like ‘non-persons’
(Alley, 2014) in calls, whereas other research on VRS interaction (e.g. Warnicke, 2018,
2021; Warnicke and Broth, 2023) shows interpreters to be ‘active participants’ during
interaction, with a specific position and a novel task to carry out. An additional challenge
in VRS is that the interpreters usually work alone in the studio. In several countries, team
(or tandem) interpreting is implemented in face-to-face assignments that last longer than
30 minutes (Hoza, 2022). This is not the case in VRS: calls may have no time limit, and
interpreters are required to remain present and active during each call – even long ones – on
their own.

4.3 Regulatory considerations


Numerous guidelines and regulations shape the setting and landscape of VRS, with signifi-
cant variations being observed between national services. Although several national services
are labelled ‘VRS’, the services are not equivalent across all providing countries, and despite
some surface similarities, differences persist (Haualand, 2012). One difference that affects
the organisation of VRS is the underlying reason that each nation provides the service. For
instance, in Sweden, it is the need for increased accessibility that drives the provision of
VRS. However, in the United States, civil rights considerations led to the service. Finally,
in Norway, the service is organised as an extension of the regular sign language interpret-
ing service (Haualand, 2012). Each nation’s motive derives from its respective country’s
political, financial, and social structures that are embedded in various networks of actors.
These underlying structures shape each nation’s VRS in terms of organisational structure
and practical performance, that is, the level of interaction a user can have with the service.
As a result, the platforms and technical devices used to provide the service do not follow
identical standards all over the world.
Each country’s national VRS can be mandated by governmental authorities and/or be
managed by independent VRS companies. For instance, in Sweden, only one VRS company
is nominated to provide the service, following state procurement. However, the United
States boasts multiple VRS providers. In Norway, VRS is provided by local county govern-
ments. Despite some similarities between national services, differences remain substantial.
Regulatory considerations directly impact interpreters’ work and have been identified as
being a considerable stress factor (Bower, 2015; Chang and Russell, 2022). In addition,
regulations have implications within the calls themselves. Examples include differences
between company billing practices (see Section 4.3.1) and the provision of emergency calls
(see Section 4.3.2).

4.3.1 Billable time


Billable time for interpreting services, that is, the duration for which the interpreting com-
pany is paid in relation to the call, varies across national borders. In Sweden, the interpret-
ers’ billable time begins once they answer an incoming VRS call. In contrast, the United
States considers billable time to start when the interpreter connects the caller to the called
party (Brunson, 2011; Peterson, 2011).
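The practical difference between the two billing regimes can be illustrated with a small calculation. This is a hypothetical sketch only: the function, policy labels, and timestamps are illustrative and not drawn from any provider’s actual billing rules.

```python
from datetime import datetime, timedelta

def billable_minutes(answered_at, connected_at, ended_at, policy):
    """Billable duration of a VRS call under two illustrative policies.

    'from_answer'     -- billing starts when the interpreter answers (Swedish model)
    'from_connection' -- billing starts when the called party is connected (US model)
    """
    start = answered_at if policy == "from_answer" else connected_at
    return (ended_at - start).total_seconds() / 60

# Hypothetical call: 3 minutes of preparatory contact before the called
# party is reached, followed by a 10-minute interpreted conversation.
answered = datetime(2025, 1, 1, 9, 0)
connected = answered + timedelta(minutes=3)
ended = connected + timedelta(minutes=10)

print(billable_minutes(answered, connected, ended, "from_answer"))      # 13.0
print(billable_minutes(answered, connected, ended, "from_connection"))  # 10.0
```

Under the connection-based model, the three minutes of preparatory contact in this example generate no billable time, which helps explain the commercial pressure to connect calls quickly.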
Billable time and established connection may seem a relatively small detail, but it can
have major consequences for the interpreter in the initial phase of the call. It is during this
initial phase, before connecting the primary parties to each other, that interpreters establish
contact with the caller. The interpreter may communicate with the caller before starting to
interpret the actual call. Establishing this contact before the interpreting phase begins has
been shown to facilitate understanding among the parties during the call (Warnicke and
Plejert, 2016, 2018). Furthermore, if the interpreter possesses knowledge about the call in
advance, this also facilitates turn organisation. Obtaining knowledge in advance regard-
ing the type of call – for example, whether it is to a company or an individual – provides
the interpreter with an indication of the topics that may be discussed. The small talk with
the caller before the interpreting phase starts could reveal additional information about the
call. To further prevent potential problems in translation, there is a need to establish
cooperation and trust between the primary participants and the interpreter. A trusting
relationship established during the preparation phase can have a positive impact on the
entire call.
Lack of knowledge about the conversation prior to a call can be stressful to the inter-
preter (Brunson, 2011). A further source of pressure on the interpreter can originate from
VRS supervisors and administrators, who, in some services, can monitor the interpreter
during the calls, via the interpreter’s computer (Brunson, 2011, 55). A scenario from the
United States described interpreters being alerted by a flashing light if the connection
between primary participants was not established within 30 seconds. The monitoring can also
affect the interpreters’ salaries, as the VRS company can control how quickly they con-
nect calls, how much billable time they charge, and by extension, how much money they
generate for the company (cf. Brunson, 2011). Such regulations may lead to calls being set up
quickly, with little preparatory interaction. In VRSs where interpreters are not monitored
and where they are able to interact with callers before interpretation begins, the situation is
less stressful (cf. Warnicke, 2017). A less-stressful situation may provide interpreters with
a greater opportunity to offer higher-quality service. To sum up, enabling interpreters to
prepare for a call before the primary participants are connected could ensure better quality
interpretation. Although billable time seems to be a small detail initially, it has great practi-
cal consequences for both the interpreter and the interaction.

4.3.2 Emergency calls


VRS providers can offer auxiliary services for non-emergency calls (for example, 101VRS
to the police in the UK [Skinner, 2020]), but some services provide emergency calls via
the regular VRS (Warnicke, 2019). In some of the regular services that provide emergency
calls, an incoming call has the same priority as regular calls to the VRS; no fast track is
provided. However, in Sweden and the United States, for example, a call can be made as a
‘VRS emergency call’ and be given priority (cf. Warnicke, 2019). A prioritised emergency
call is assigned to the first available interpreter and jumps the queue. In some services (for
example, in Sweden), the emergency call is announced to the interpreter before the caller
is connected. This provides the interpreter with an opportunity to mentally prepare them-
selves, which might be helpful.
Signing people’s access to emergency calls via VRS is dependent on the regulations of
specific countries and organisations. For example, in the United States, it is possible to
register a location (address) in the profile associated with the videophone. This provi-
sion makes it possible to automatically direct and connect emergency calls to the near-
est emergency assistant centre. If the caller is located at an address other than the one
registered to the videophone, the caller will be promptly connected to the nearest active
location. In some countries, it is neither possible nor legal to track a caller’s geolocation.
As the location needs to be established rapidly, emergency calls can present language-related
challenges; for example, the signing party may be required to finger-spell an address to the
interpreter so that an ambulance can be sent to the correct location. With the additional stress of
an emergency, this could present a considerable challenge. This requires a very high level
of precision and concentration on the part of both parties to ensure the correct information
is passed on.
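The address-based routing described above can be sketched in miniature. All identifiers, coordinates, and centre names below are hypothetical, and real dispatch systems rely on far richer data than straight-line distance; the sketch only illustrates the fallback from a reported location to the registered profile address.

```python
import math

# Hypothetical data: videophone profiles with registered coordinates, and
# emergency assistance centres. Real systems use addresses and dispatch zones.
REGISTERED_ADDRESSES = {"user42": (59.33, 18.07)}  # profile id -> (lat, lon)
EMERGENCY_CENTRES = {
    "Centre North": (59.35, 18.05),
    "Centre South": (59.20, 18.00),
}

def nearest_centre(location):
    """Pick the centre closest to the given coordinates (straight-line distance)."""
    return min(EMERGENCY_CENTRES,
               key=lambda name: math.dist(location, EMERGENCY_CENTRES[name]))

def route_emergency(user_id, reported_location=None):
    """Route the call: prefer a location reported during the call,
    otherwise fall back to the address registered in the caller's profile."""
    location = reported_location or REGISTERED_ADDRESSES.get(user_id)
    if location is None:
        # Where geolocation is unavailable (or illegal), the address must be
        # finger-spelled and relayed by the interpreter instead.
        raise ValueError("no location available; address must be established in the call")
    return nearest_centre(location)

print(route_emergency("user42"))                                    # registered address used
print(route_emergency("user42", reported_location=(59.21, 18.01)))  # reported location overrides
```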

4.4 Conclusion
The situation regarding VRS around the world is still evolving (cf. Warnicke and Granberg,
2022). Western countries, such as Sweden and the United States, have been providing a
VRS for a quarter of a century, although opening hours, organisation, and devices differ
between the two countries. VRS provision was forced to expand rapidly due to COVID-19
(De Meulder et al., 2021; Warnicke and Matérne, 2024a, 2024b).


However, some countries have no organised sign language interpreting service at all. Never-
theless, VRS could present a means of solving issues relating to sign language interpreting,
particularly the limited availability of interpreters. Factors such as large geographical areas
with low interpreter coverage, lack of interpreting training, and the need for interpreters to
travel long distances for assignments often contribute to this scarcity. In poor and troubled
areas, where movement is difficult, such as in countries with poorly developed infrastruc-
ture and in war zones, VRS could be a way to provide an accessible interpreting service.
Internet access and smartphones are commonly available in developing countries. However,
signers around the world require an option for making calls via a VRS in their country.
VRS could place signing people in developing countries on more equal terms with others, in
accordance with global sustainability goals (United Nations, 2015) and the United Nations
Convention on the Rights of Persons with Disabilities (CRPD; United Nations, 2006).
For services that are already in place, technological advances could enable a more devel-
oped service in the future. One opportunity that is already a reality on some VRS plat-
forms is an option for both the primary participants and the interpreter to share visual,
auditive, and text input during the call; this is referred to as total conversation (EENA,
2023). Although some VRSs have the technical capability to provide total conversation
calls, both primary participants also need a visual link for this to work. To the author’s
knowledge, this is not yet the case in emergency assistance centres. Total conversation
could be thought of as one way to optimise emergency calls via VRS. For the caller in a
precarious situation, it could be difficult to clearly convey what is needed (e.g. whether
they need an ambulance or a fire brigade). For the interpreter, producing a correct rendition
in such a stressful situation is highly demanding yet crucial, and all the more difficult in a
two-dimensional space (Skinner et al., 2021). A VRS that enables
total conversation could give cues that facilitate decisions about how to handle an alarm,
which can potentially save lives. A total conversation mode could be a good comple-
ment as a default in VRS calls in general and in emergency calls in particular. However,
although there have been positive developments in the design of emergency calls in VRS,
no country yet offers direct calls using total conversation as the default – a call in which
the interpreter, the signing help-seeker, and the switchboard operator at the emer-
gency centre can all see and hear each other as well as exchange text. An essential point
to keep in mind as VRS is evolving is, however, that even small details, as has been shown,
can have big consequences. The future of interaction within VRS depends on ongoing
development and further advancements.

References
Alley, E., 2014. Who Makes the Rules Anyway? Reality and Perception of Guidelines in Video Relay
Service Interpreting. Interpreters Newsletter 19, 13–26.
Bakker, A.B., Van Veldhoven, M., Xanthopoulou, D., 2010. Beyond the Demand-Control Model.
Journal of Personnel Psychology 9(1), 3–16.
Börstell, C., 2017. Types and Trends of Name Signs in the Swedish Sign Language Community. SKY
Journal of Linguistics 30.
Bower, K., 2015. Stress and Burnout in Video Relay Service (VRS) Interpreting. Journal of
Interpretation 24(1).
Brennan, M., 1990. Word Formation in BSL (PhD thesis). Stockholm University, Sweden.
Brentari, D., 2010. Sign Languages. Cambridge University Press, Cambridge.
Brunson, J.L., 2011. Video Relay Service Interpreters: Intricacies of Sign Language Access. Gallaudet
University Press.


Chang, S., Russell, D., 2022. Coming Apart at the Screens: Canadian Video Relay Interpreters and
Stress. Journal of Interpretation 30(1), 1–18.
De Meulder, M., Haualand, H., 2021. Sign Language Interpreting Services: A Quick Fix for Inclusion?
Translation and Interpreting Studies 16(1), 19–40.
De Meulder, M., Pouliot, O., Gebruers, K., 2021. Remote Sign Language Interpreting in Times of
COVID-19. University of Applied Sciences, Utrecht.
Drew, P., Heritage, J., 1992. Analyzing Talk at Work: An Introduction. In Drew, P., Heritage, J., eds.
Talk at Work: Interaction in Institutional Settings. Cambridge University Press, Cambridge, 3–65.
EENA, 2023. Implementation of RTT and Total Conversation in Europe. URL https://eena.org/
knowledge-hub/documents/rtt-and-tc-implementation-in-europe/ (accessed 25.1.2024).
Goffman, E., 2016. The Presentation of Self in Everyday Life. Routledge.
Haualand, H., 2012. Interpreting Ideals and Relaying Rights. University of Oslo, Oslo.
Haualand, H., 2014. Video Interpreting Services: Calls for Inclusion or Redialling Exclusion? Ethnos
79(2), 287–305.
Heritage, J., Clayman, S., 2010. Talk in Action: Interactions, Identities, and Institutions. Wiley-
Blackwell, Chichester.
Hopper, R., 1992. Telephone Conversation, Vol. 724. Indiana University Press.
Hoza, J., 2022. Team Interpreting. In Stone, C., Adam, R., Müller de Quadros, R., Rathmann, C., eds.
The Routledge Handbook of Sign Language Translation and Interpreting. Routledge, 162–178.
Keating, E., Edwards, T., Mirus, G., 2008. Cybersign and New Proximities: Impacts of New Com-
munication Technologies on Space and Language. Journal of Pragmatics 40(6), 1067–1081.
Keating, E., Mirus, G., 2003. American Sign Language in Virtual Space: Interactions Between Deaf
Users of Computer-Mediated Video Communication and the Impact of Technology on Language
Practices. Language in Society 32(5), 693–714.
Kubrick, S., Producer, Director, 1968. 2001: A Space Odyssey. Stanley Kubrick Productions.
Leeson, L., 2005. Making the Effort in Simultaneous Interpreting: Some Considerations for Signed
Language Interpreters. In Janzen, T., ed. Topics in Signed Language Interpreting: Theory and Prac-
tice. John Benjamins Publishing Company, Amsterdam, 51–68.
Liddell, S.K., 1977. An Investigation into the Syntactic Structure of American Sign Language. Univer-
sity of California, San Diego.
Liddell, S.K., 1984. Think and Believe: Sequentially in American Sign Language. Language 60(2),
372–399.
Liddell, S.K., 2003. Sources of Meaning in ASL Classifier Predicates. Psychology Press.
Linell, P., 1998. Approaching Dialogue: Talk, Interaction and Contexts in Dialogical Perspectives.
John Benjamins Publishing.
Marks, A., 2018. “Hold the Phone!” Turn Management Strategies and Techniques in Video Relay
Service Interpreted Interaction. Translation & Interpreting Studies: The Journal of the American
Translation & Interpreting Studies Association 13(1), 87–109.
Napier, J., Skinner, R., Turner, G.H., 2017. “It’s Good for Them but Not so for Me”: Inside the Sign
Language Interpreting Call Centre. Translation & Interpreting 9(2), 1–23.
Palmer, J., Wanette Reynolds, L., Minor, R., 2012. “You Want What on Your Pizza!?” Videophone
and Video-Relay Service as Potential Influences on the Lexical Standardization of American Sign
Language. Sign Language Studies 12(3), 371–397.
Peterson, R., 2011. Profession in Pentimento. In Nicodemus, B., Swabey, L., eds. Advances in Inter-
preting Research: Inquiry and Action. John Benjamins Publishing, Amsterdam, 199–223.
Riccardi, A., 2005. On the Evolution of Interpreting Strategies in Simultaneous Interpreting. Meta
50(2), 753–767.
Roman, G.A., Samar, G., 2015. Workstation Ergonomics Improves Posture and Reduces Musculo-
skeletal Pain in Video Interpreters. Journal of Interpretation 24(1), 1–19.
Roy, C.B., 2002. The Problem with Definitions, Descriptions, and the Role Metaphors of Interpreters.
In Pöchhacker, F., Shlesinger, M., eds. The Interpreting Studies Reader. Routledge, London and
New York, 344–353.
Russell, D., 2005. Consecutive and Simultaneous Interpreting. In Janzen, T., ed. Topics in Signed
Language Interpreting. John Benjamins Publishing, 136–164.
Sandler, W., Lillo-Martin, D., 2006. Sign Language and Linguistic Universals. Cambridge University
Press, Cambridge.


Skinner, R., 2023. Would You Like Some Background? Establishing Shared Rights and Duties in
Video Relay Service Calls to the Police. Interpreting and Society 3(1), 46–74.
Skinner, R.A., 2020. Approximately There – Positioning Video-Mediated Interpreting in Frontline
Police Services. International Journal of Interpreter Education 82.
Skinner, R.A., Napier, J., Fyfe, N.R., 2021. The Social Construction of 101 Non-Emergency Video Relay
Services for Deaf Signers. International Journal of Police Science & Management 23(2), 145–156.
Stokoe, W.C., 2005. Sign Language Structure: An Outline of the Visual Communication Systems of
the American Deaf. Journal of Deaf Studies and Deaf Education 10(1), 3–37.
Supalla, S.J., 1990. The Arbitrary Name Sign System in American Sign Language. Sign Language
Studies 67(1), 99–126.
Thailand Communication Relay Service Center, 2021. TTRS Center. URL www.ttrs.or.th (accessed
25.1.2024).
Treviño, R., Quinto-Pozos, D., 2018. Name Pronunciation Strategies of ASL-Spanish-English Trilin-
gual Interpreters During Mock Video Relay Service Calls. Translation and Interpreting Studies
13(1), 71–86.
United Nations, 2006. The United Nations Convention on the Rights of Persons with Disabilities.
(CRPD) (A/RES/61/106). United Nations, New York.
United Nations, 2015. The Sustainable Development Goals. URL www.un.org/sustainabledevelopment/
sustainable-development-goals/ (accessed 25.1.2024).
Wadensjö, C., 2014. Interpreting as Interaction. Routledge.
Warnicke, C., 2017. Tolkning vid förmedlade samtal via Bildtelefoni.net – Interaktion och gemensamt
meningsskapande [The Interpreting of Relayed Calls Through the Service Bildtelefoni.net – Inter-
action and the Joint Construction of Meaning] (PhD thesis). URL https://oru.diva-portal.org/
smash/get/diva2:1089956/FULLTEXT01.pdf.
Warnicke, C., 2018. Co-Creation of Communicative Projects Within the Swedish Video Relay Inter-
preting Service. In Napier, J., Skinner, R., Braun, S., eds. Here or There: Research on Interpreting
via Video Link. Gallaudet University Press, Washington, DC, 210–229.
Warnicke, C., 2019. Equal Access to Make Emergency Calls: A Case for Equal Rights for Deaf Citi-
zens in Norway and Sweden. Social Inclusion 7(1), 173–179.
Warnicke, C., 2021. Signed and Spoken Interaction at a Distance: Interpreter Practices to Strive for
Progressivity in the Beginning of Calls via the Swedish Video Relay Service. Interpreting 23(2),
296–320.
Warnicke, C., Broth, M., 2023. Embodying Dual Actions as Interpreting Practice: How Interpreters
Address Different Parties Simultaneously in the Swedish Video Relay Service. Translation and
Interpreting Studies 18(2), 191–212.
Warnicke, C., Granberg, S., 2022. Interpreter Mediated Interaction Between People Using a Signed
Respective Spoken Language on a Distance in Real Time – a Scoping Review. BMC Health Services
Research 22(387).
Warnicke, C., Matérne, M., 2024a. Sign Language Interpreters’ Experiences of Remote Interpreting
in Light of COVID-19 in Sweden. Interpreting and Society. URL https://journals.sagepub.com/
doi/10.1177/27523810241239779
Warnicke, C., Matérne, M., 2024b. Regulation, Modification, and Evolution of Remote Sign Lan-
guage Interpreting in Sweden – a Service in Progress. BMC Health Services Research 24, 1431.
URL https://doi.org/10.1186/s12913-024-11907-y
Warnicke, C., Plejert, C., 2012. Turn-Organisation in Mediated Phone Interaction Using Video Relay
Service (VRS). Journal of Pragmatics 44(10), 1313–1334.
Warnicke, C., Plejert, C., 2016. The Positioning and Bimodal Mediation of the Interpreter in a Video
Relay Interpreting (VRI) Service Setting. Interpreting: International Journal of Research & Prac-
tice in Interpreting 18(2), 198–230.
Warnicke, C., Plejert, C., 2018. The Headset as an Interactional Resource in a Video Relay Interpret-
ing (VRI) Setting. Interpreting: International Journal of Research & Practice in Interpreting 20(2),
285–308.
Warnicke, C., Plejert, C., 2021. The Use of the Text-Function in Video Relay Service Calls. Text and
Talk 41(3), 391–416.
Wessling, D.M., Shaw, S., 2014. Persistent Emotional Extremes and Video Relay Service Interpreters.
Journal of Interpretation 23(1), 1–21.

5 PORTABLE INTERPRETING EQUIPMENT
Tomasz Korybski

5.1 Introduction
A significant part of this volume inevitably focuses on the most recent technological devel-
opments that have either already disrupted the way interpreting is delivered (such as remote
simultaneous interpreting delivery platforms; see Chmiel and Spinolo, this volume) or are
bound to impact interpreting, such as machine interpreting or automatic speech recogni-
tion. While these developments are undoubtedly of huge importance, a volume devoted to
interpreting and technology must strive to offer as comprehensive a picture as possible of the
technological solutions used by interpreters. From today’s standpoint, some of these may
appear low-tech. However, they still have a place in the interpreter’s toolkit and continue
to successfully serve professionals in circumstances where cloud-based solutions and even
conventional conference equipment simply cannot work. This chapter focuses on one such
solution: portable interpreting equipment, that is, small pocket-sized kits that typically func-
tion based on radio frequencies. This equipment is composed of a transmitter (or multiple
transmitters) and multiple receivers and allows users to communicate at a (relatively short)
distance, either within the same language or across languages. Alternative names include
the widely used ‘tour guide system’ (as the technology has been used beyond interpreting,
by guides communicating with groups during guided tours), the French word bidule (‘the
thingie’), the German word Flüsterkoffer (‘the whispering briefcase’, as the portable sets
are usually carried in a briefcase which doubles as a charger, and the predominant modal-
ity of interpretation via such systems resembles whispered interpreting), the term ‘infoport
systems’, and even ‘boothless interpreting systems’. This chapter will use the generic term
‘portable interpreting equipment’ interchangeably with ‘tour guide systems/sets’ and ‘bid-
ule’, as these three terms appear to be the most widely applied in literature. Tour guide
systems are used primarily in guided tours, site visits, museums, conferences, and other
events where spoken information needs to be conveyed live, to one person or a group of
people, and where portability as well as mobility are vital. During assignments, the system’s
transmitter device is worn by the speaker (a tour guide, an interpreter, a host, etc.), and
receiver devices equipped with corded or cordless earphones are used by the recipient/s. The
transmitter emits audio signals through a selected channel. These signals convey spoken


DOI: 10.4324/9781003053248-7

information or commentary, which is then wirelessly transmitted to the receivers worn by
the participants, typically via radio waves. This allows everyone in the group to hear the
guide’s commentary clearly, even in noisy environments or large crowds, while also main-
taining mobility. In an interpretation setting, the interpreter is both the recipient and the
speaker. Consequently, s/he can wear both the receiver device and the transmitter device or
work with a modern bidirectional device. Alternatively, in a less-desirable configuration,
the interpreter can wear only the transmitter device, relying solely on their ears to pick up
the original sound.
Tour guide systems facilitate mobility and independence from complex technical set-ups
and internet access. This feature makes them particularly well suited to certain interpreta-
tion settings, such as small-scale events with few recipients, field trip interpretation, or
humanitarian settings, among others. However, there are also significant limitations and
disadvantages associated with their use, and this chapter aims to present a balanced
view of the use of portable interpreting equipment. The chapter starts with a historical
overview focusing on the development of the technology (Section 5.2), followed by a refer-
ence to existing research on the subject (Section 5.3) and a discussion of possible and varied
configurations and applications of tour guide systems for interpreting, as well as associated
constraints (Section 5.4). Section 5.5 addresses concerns from professional organisations
and practitioners/researchers regarding the use of tour guide systems as a replacement for
regular conference set-ups with standard booths and features a list of best practices in the
application of portable kits for interpreting. Section 5.6 looks at when and how tour guide
systems can be applied in interpreter training. The chapter concludes with Section 5.7,
about the future of tour guide systems in interpreting and the existing research gap.

5.2 Historical background and development of the technology


Tour guide systems, or similar portable technologies for group communication, have
developed over several decades. The exact date of invention of these systems is difficult to
pinpoint. However, following Keiser (2004), and as reported in the anniversary volume on the history of the profession published by AIIC (the International Association of Conference Interpreters) in 2019, it seems safe to indicate the late 1940s and the 1950s as the period when the application of
radio frequency technology for interpreting purposes began. Conceptually, there is a shared
source of early conference equipment for the facilitation of simultaneous interpretation
and what later became known as ‘le bidule’. The Filene-Finlay ‘Hush-a-Phone’ translator
(Baigorri-Jalón et al., 2014), developed in the 1920s and implemented by IBM in 1927, was a telephone technology solution involving cabling and a switchboard. The facilitation of simultaneous interpretation it enabled proved so effective that IBM and its long-time president, Thomas J. Watson, decided to build on the Hush-a-Phone concept and develop a series of
prototypes of a radio frequency conference interpretation system: the ‘IBM Wireless Trans-
lation System’ (also: IBM Simultaneous Interpretation System). This solution was provided
to many international organisations between 1947 and 1954 to facilitate meetings, confer-
ences, and discussions. The IBM Wireless Translation System made it possible to handle
up to seven languages at a time during conferences, with interpretation transmitted to indi-
vidual receivers in the audience via a radio transmitter installed in the interpreters’ booth.
Although the receiver was slightly larger and heavier than the tour guide receivers currently
in use (the latter can be the size of a credit card and weigh as little as 45–60 g), it remained
perfectly portable. In addition, an innovative solution was to build the necessary antenna

Portable interpreting equipment

within a lanyard, thus allowing the receiver to be worn around the neck, leaving the hands
free to operate the device. The transmitters, however, were larger – despite the technology
already being available to vastly reduce their size, thanks to the Motorola corporation.
As is often the case with rapid technological development, this company’s links with the
military fuelled the creation of portable radio communication systems before and during
World War II. What would later become known as the popular walkie-talkie was originally a military-grade mobile radio frequency communication system, the Motorola SCR (Signal Corps Radio), developed by engineer Henryk Magnuski and his team.1
The SCR-300 represented a significant advancement in military communication as the first
portable FM backpack radio, which allowed for reliable, secure communication over a
range of distances and was critical for military operations. In post-war years, as industries
adopted military technology for civilian use, the concept of portable radio frequency com-
munication was embraced and further developed by early producers of portable tour guide
systems. After the war, as the global economy recovered and a plethora of political events aimed at restoring world peace took place, there was a burgeoning interest in enhancing visitor experiences at trade fairs, museums, and historical sites. This sparked the development of tour
guide technology, offering new ways to engage and inform visitors.
Initial systems were rudimentary, relying on portable radios with fixed-frequency trans-
mitters and receivers. While innovative for their time, these early devices were plagued with
interference issues – linked to analogue sound transmission – and had limited operational
range. These factors restricted the widespread adoption and effectiveness of these early
devices, yet they still offered both live communication and communication across languages.
The AIIC’s archives also provide evidence of portable equipment being used after the war,
with the system used by Thadé Pilley and Frank Barker as a prime example (Keiser, 2004).
We know that Pilley and Barker used some form of portable equipment to offer simultane-
ous interpretation, predominantly in Africa,2 but there is no certainty as to the exact tech-
nology used. It has been hypothesised that they used a set of radio frequency transmitter
and receivers, but it is more likely that the equipment was a portable, reduced-size version
of the Filene-Finlay Hush-a-Phone, as suggested by accounts of cabling being carried by interpreters (AIIC, 2019). Nevertheless, what is important is that the interpreting community
embraced technology early on, and that the concept of ‘mobile interpreting’ had already
found its initial adopters in the decade following World War II.
Returning to tour guide systems and fast-forwarding several decades, important strides
were made in battery technology towards the end of the 20th century. This facilitated the
creation of more portable, user-friendly systems, which enabled longer usage periods and
greater flexibility. It is worth noting that, currently (as of 2024), modern lightweight Li-Ion
batteries can power a digital tour guide transmitter or receiver for more than 20 hours. This
renders all-day assignments more comfortable and removes the need to remind users to save
battery life or charge their receivers during breaks. Aside from extended battery life, further
positive changes were brought about thanks to the transition from analogue to digital tech-
nology in the 1990s: the new digital devices offered superior sound quality and enhanced
ease of use. At the beginning of the 21st century, new advancements in portable tour guide
systems were implemented: a widespread uptake of wireless technology, including UHF and
VHF frequencies, allowed guides or interpreters to communicate effortlessly with multiple
participants via microphones and wireless receivers.
This analogue-to-digital shift was important for the user experience. While analogue and
digital radio transmitters both send audio signals through the air, they do this in different

The Routledge Handbook of Interpreting, Technology and AI

ways. Analogue radio transmits the sound as a continuously varying radio signal. This can result in sound
becoming ‘fuzzy’ or containing static. This is especially the case if the receiver is far from
the transmitter (the further from the transmitter, the worse the sound quality becomes), or
where there is some sort of interference (e.g. physical obstacle, other waves) along its path.
In contrast, digital radio technology converts sound into digital data, much like a computer
file, which it then sends through the air. This method ensures clearer, more consistent sound,
with far less static and interference. The sound remains clear until the signal becomes too
weak, at which point it stops rather than gradually degrading. This technological advance
has proven important and particularly advantageous for modern tour guide systems. The
sound received through such digital equipment is clear, even in the presence of background
noise or when a group of recipients is spread out during interpreted field visits or on-site
training, in noisy environments. The quality of digital radio is also far more consistent than
that of older analogue devices, and analogue crackling noises or sudden sound drops are
eliminated through digital processing. Additionally, digital systems can handle multiple
channels easily. This allows different groups with different language pairs to communicate in the same location without any interference.
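The contrast between analogue fade-out and the digital 'cliff effect' described above can be sketched in a toy model. This is purely illustrative: the quality scale and the threshold value are hypothetical, not taken from any equipment specification.

```python
# Toy model of received audio quality as a function of signal strength,
# contrasting gradual analogue degradation with the digital "cliff effect".
# All numbers are hypothetical and purely illustrative.

def analogue_quality(signal: float) -> float:
    """Analogue audio degrades gradually: quality tracks signal strength,
    with static creeping in as the receiver moves away from the transmitter."""
    return max(0.0, min(1.0, signal))

DIGITAL_THRESHOLD = 0.2  # assumed minimum signal level the decoder can handle

def digital_quality(signal: float) -> float:
    """Digital audio stays clear until the signal is too weak to decode,
    then cuts out entirely rather than degrading gradually."""
    return 1.0 if signal >= DIGITAL_THRESHOLD else 0.0

for signal in (1.0, 0.6, 0.3, 0.1):
    print(f"signal {signal:.1f}: analogue {analogue_quality(signal):.1f}, "
          f"digital {digital_quality(signal):.1f}")
```

In this sketch the analogue listener hears progressively noisier audio as the signal weakens, while the digital listener hears clean audio until the signal drops below the assumed threshold, at which point it cuts out entirely.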
When considering typical specifications of a mid-range digital UHF (ultra-high frequency) tour guide transmitter, one might think of a compact, lightweight device resembling a wallet or a pack of credit cards, with a selection of the following features: it may weigh around 50–70 g, have a working range of around 150 m in the line of sight, possess over 30 channels, and have a battery life which exceeds 15 hours of continuous operation after 2–4 hours of charging. It may also feature Bluetooth and Wi-Fi technology for
easier integration with smartphones and other personal devices. Nowadays, standard
specifications also include digital transmission modulation with noise reduction technol-
ogy, interference-free channel search function, and remote channel change in receivers.
Additional accessories are also typically available. These include convenient bulk charg-
ers (which resemble a small briefcase), tailored bags, matching handheld wireless micro-
phones, etc. Therefore, from today’s perspective, it seems fair to say that what began as a
simple concept, based on analogue radio technology with hardware limitations, has, over
a century, morphed into a relatively advanced, highly portable device based on digital signal processing (DSP) technology. Today's devices can be deemed 'reliable' in terms of
sound quality and battery life and ‘flexible enough’ to work in configurations with other
modern devices, including smartphones, wireless equipment, and digital recorders. This
translates into devices that are more convenient to use when providing live interpreting
in a range of multilingual settings, such as seminars, lectures, international conferences,
site visits, facility tours, cultural tours, and business meetings. Still, the impressive tech-
nological progress described earlier has not eliminated all shortcomings that are inherent
to interpreting via tour guide systems, the most important of which are discussed later in
this chapter (Section 5.4).

5.3 Existing research


It is perhaps more fitting to say that interpreting with portable sets has been ‘described’
and ‘commented on’, rather than studied in depth, and often in conjunction with whis-
pered interpreting and its associated challenges. To illustrate, Kalina and Ziegler (2015)
refer to bidule interpreting as being a more challenging version of simultaneous interpret-
ing, whereby added difficulties are caused by the lack of a booth and production-related


challenges, such as the strain on vocal cords during continued whispered assignments
(although simultaneous interpreting with portable systems need not always require the
interpreter to whisper).
More recent and comprehensive presentations of this modality are offered by Baxter
(2015) and Porlán Moreno (2019). The former refers to the apparent marginalisation of
bidule interpreting and positions it within volunteer and activist interpretation. Meanwhile,
the latter presents a comprehensive overview of the applications of tour guide interpreting.
Porlán Moreno (2019, 56) suggests that portable systems may work well for small groups,
visits, and environments where setting up traditional simultaneous interpretation booths
is impractical. However, the author also emphasises the apparent misuse of such systems.
This, in turn, creates challenges for interpreters and increases the need for interpreter train-
ing programmes and associations to clearly define appropriate environments within which
the bidule can be used, in addition to defining its limitations.
Some descriptions of bidules and their use, with comments, come directly from interpret-
ing service providers (e.g. Magalhães, 2016; Sekikawa, 2008). These provide a balanced
presentation of the 'pros and cons' of this modality of service delivery. Stahl (2010) remarks that tour guide interpreting is an 'orphan' of research in interpreting studies. Indeed, well-structured research, based on empirical data, is limited to MA-level
dissertation projects, such as Bidone (2017) and Panna (2017). In addition to offering a
broader evidence-based analysis of the challenges associated with tour guide interpreting
and its positioning in interpreter training, both MA theses report on challenges encountered
by a particular group of trainee interpreters while performing an actual field assignment.
There is, therefore, ample opportunity to fill the existing research gap. This will be dis-
cussed in more detail in Section 5.7 of this chapter.

5.4 Application contexts, advantages, and constraints


Tour guide systems represent a relatively easy-to-apply alternative to conventional interpretation when the latter is deemed neither practical nor affordable. Furthermore, interpretation using tour guide systems can successfully replace whispered interpretation (chuchotage). Similarly, these systems can also replace a combination of chuchotage and consecutive interpreting, as in bidirectional interpretation involving very few recipients in each language, such as during negotiations. This aspect of cross-modal application has the potential to make interpreting with portable equipment difficult to categorise
as either simultaneous or consecutive interpretation (SI and CI, respectively). As Baxter
(2015) notes, although many scenarios will be limited to SI, the event type and specifica-
tions of the equipment used may require a mix of both SI and CI. In SI settings, aside from
the obvious advantage of serving more than one or two recipients (as in the case of
chuchotage), there are hygienic considerations that speak in favour of portable sets. With
such equipment, the distance between the speaker, the interpreter/s, and the recipients can
be increased with no detriment to the sound volume. In fact, the volume will be higher, and
thanks to sound-shielding equipment, the interpreter’s voice will not need to be suppressed.
As a result, the risk of miscomprehension and vocal cord strain for the interpreter/s will be
reduced. Additionally, the increased distance (compared to whispered interpreting) is a par-
ticularly relevant advantage in the post-pandemic reality, where risk of upper respiratory
tract infections and requirements to ensure appropriate social distancing remain important
concerns for event organisers.


A further benefit offered by portable interpreting equipment systems is their flexibility and mobility. As a typical portable interpreting set can fit into a small, lightweight briefcase that houses several transmitters and a few dozen receivers, as well as headsets or
earphones, interpreter/s can become the system’s self-reliant operator/s. This means that
small- to medium-scale events requiring interpretation for several dozen audience members
can be conveniently served by a tour guide interpretation system. Briefcases that double as
charging stations can be carried to any location, including outdoor locations, and continue
to function while their batteries last. This mobility aspect can be decisive during study vis-
its, field trips, or facility tours, when it may be necessary for people to move freely within
the tour or event space while still being able to hear the interpretation. Even when multiple
groups with different target languages require interpretation, the interpreter/s providing
the service can cater for all, by setting multiple channels on their devices. However, in this
particular scenario, a sufficient number of transmitters must be ensured, ideally two per
language pair. In such situations, offering interpretation via bidule, rather than regular
consecutive interpretation, may enhance the audience experience and streamline the event
in terms of duration and accessibility, not to mention hygiene considerations.
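The planning arithmetic behind such multi-group set-ups can be sketched as follows. The two-transmitters-per-language rule of thumb comes from the paragraph above; the language names, listener counts, and channel numbering are invented for the example.

```python
# Hypothetical planning helper for a multi-group tour guide assignment:
# one channel per target language, and (ideally) two transmitters per
# language pair. Numbers are illustrative, not manufacturer guidance.

def plan_equipment(groups: dict, transmitters_per_language: int = 2):
    """groups maps target language -> number of listeners."""
    plan = {
        lang: {
            "channel": ch,
            "transmitters": transmitters_per_language,
            "receivers": listeners,
        }
        for ch, (lang, listeners) in enumerate(sorted(groups.items()), start=1)
    }
    totals = {
        "channels": len(plan),
        "transmitters": sum(p["transmitters"] for p in plan.values()),
        "receivers": sum(p["receivers"] for p in plan.values()),
    }
    return plan, totals

plan, totals = plan_equipment({"French": 18, "Spanish": 25, "Polish": 9})
print(totals)  # {'channels': 3, 'transmitters': 6, 'receivers': 52}
```

Even this simple tally makes visible why a single briefcase-sized kit with a few dozen receivers covers small- to medium-scale events, and why transmitter counts, rather than receiver counts, become the limiting factor as languages are added.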
However, interpretation via tour guide systems can also be provided in less-dynamic set-
tings. Examples of such settings include a typical conference interpretation setting, where
the devices can be used to replace conventional consoles and the audio system. During
conferences or lectures, interpreters can work via tour guide systems either from a typi-
cal soundproof booth or without a booth (from a designated location within the confer-
ence room or in an adjacent room). In the absence of the preferred option, that is, an
ISO-standard booth, interpreters can also work from a makeshift booth substitute using
these systems. Examples of makeshift booths include tabletop interpreter booths, micro-
phone isolation shields, and acoustic filters that aim to prevent the interpreter’s voice from
penetrating the room and interfering with the original sound. It is important to note that
although such alternative configurations do improve the working conditions for interpret-
ers, these are still considered ‘substandard’ alternatives and should not be presented as
being equal to a typical conference set-up with associated technical support.
Importantly, when a client opts for portable equipment rather than standard simulta-
neous equipment and booths, the interpreter needs to consider the additional risks and
workload associated with this decision. Firstly, if the client's choice is driven by cost reduction, the client is likely to expect the entire technical responsibility for setting up and operating the tour guide equipment to rest with the interpreter. This
represents a significant additional effort, on top of preparation for the assignment and
its delivery. As a result, this should always be carefully considered by the interpreter/the
interpreting tandem. They should ask themselves whether they are prepared to take on the
additional workload, and under what conditions. Furthermore, such conditions should be
contractually specified prior to the assignment.
The increase in workload in relation to management and self-management that accom-
panies technological development has been present in the profession for decades. How-
ever, with the arrival of cloud-based distance interpreting/remote simultaneous interpreting
(RSI), this question is becoming a particularly significant topic. RSI brings with it a set of
technical, technological, and organisational tasks similar to those observed when interpreting via portable sets. The balance between the tasks for which interpreters are held responsible and accountable on the one hand, and their remuneration, effort, and the organisational costs incurred by the client on the other, must therefore be renegotiated to match these new circumstances.


Another significant constraint linked to the use of tour guide systems, as reported by
Diriker (2015), is the way the interpreter uses their voice. When working with a bidule without a soundproof booth or even a sound shield, the interpreter must consider their immediate environment and lower their voice accordingly. This may prove overly taxing
and lead to disruptions in delivery, with the interpreter’s control over their own product
suffering due to having to speak at a lower volume. In turn, this could impact the overall
perception of interpretation quality in such scenarios.
Furthermore, effective planning is required in relation to the interpreter/s’ physical posi-
tioning while providing their services via portable sets. Planning is particularly important
in the case of indoor events. When booths are unavailable, interpreters tend to work at
a distance from both the speaker and the audience in order to reduce potential acoustic
interference. However, the interpreters still need to be able to hear the speech from the floor
clearly (if no separate audio feed is available). They also need to be able to see the room
dynamics, since visual aids (presentation slides, etc.) and cues provide useful indications to
meaning during interpretation. This element requires appropriate attention when training
interpreters. Similarly, in real-life settings, it is a compulsory part of preparation for any
assignment and contributes to the interpreter’s overall workload.
A further aspect that is crucial to consider prior to any assignment requiring tour guide sets in indoor venues is incoming sound management. Aside from managing the aforementioned issue of incoming and outgoing sound quality, the interpreter must ascertain before an assignment whether floor sound will be conveyed directly to their earphones or headset, or whether sound will need to be picked up via the venue's loudspeakers. The former solution can be provided by the venue's technicians if they can
provide a spare outgoing audio channel for the interpreter. Alternatively, a combination of
two transmitters, operating simultaneously, can be used: one transmitter for the speaker,
another for the interpreter (who will also have a receiver set to the speaker's channel, different from the channel used by the audience). Recently, certain manufacturers have offered more
advanced, bidirectional devices, for this purpose,3 thus resolving the challenge created by
dynamic exchanges involving two languages, as is the case of Q&A sessions, for example.
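The two-transmitter configuration described above can be sketched schematically. The channel numbers and labels here are hypothetical; real devices use manufacturer-specific channel schemes.

```python
# Schematic sketch of the two-transmitter tour guide configuration:
# the speaker transmits floor sound on one channel, which only the
# interpreter receives; the interpreter transmits on a second channel,
# which the audience receives. Channel numbers are hypothetical.

SPEAKER_CHANNEL = 1    # floor sound
AUDIENCE_CHANNEL = 2   # interpretation

devices = [
    ("speaker",     "transmitter", SPEAKER_CHANNEL),
    ("interpreter", "receiver",    SPEAKER_CHANNEL),   # hears the floor directly
    ("interpreter", "transmitter", AUDIENCE_CHANNEL),  # delivers interpretation
    ("audience",    "receiver",    AUDIENCE_CHANNEL),  # hears the interpreter only
]

def audible_to(listener):
    """Return who a given listener hears: all transmitters sharing a channel
    with any receiver that listener wears."""
    rx_channels = {ch for who, role, ch in devices
                   if who == listener and role == "receiver"}
    return {who for who, role, ch in devices
            if role == "transmitter" and ch in rx_channels}

print(audible_to("interpreter"))  # {'speaker'}
print(audible_to("audience"))     # {'interpreter'}
```

The sketch makes the division of labour explicit: the audience never receives the raw floor channel, while the interpreter monitors it continuously. A bidirectional transceiver collapses the interpreter's two devices into one, which is what resolves the Q&A scenario mentioned above.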
Whatever the hardware used, one can observe that the alternative scenario further increases the interpreter's workload: they will need to work alongside the venue's technicians to equip the speaker with the transmitter/transceiver, brief the speaker on how to use
it, mount the transmitter’s microphone, and ensure the equipment is passed over to the next
speaker if needed. Therefore, it is no wonder that the interpreting community, including
researchers, has reservations about tour guide interpreting. However, despite these extra steps, it may still be worth the effort, as interpreting with portable sets is still regarded as a less-strenuous alternative to whispered interpreting (Pöchhacker, 2009). Nevertheless, reservations are reflected in existing standards, guidelines, and comments in both
research and professional publications and will be discussed in more detail in the following
section.

5.5 Standards and ethical considerations


Following on from the discussion of constraints and practical limitations of tour guide
interpreting, it is important at this point to outline the approach of the professional associations to this mode of interpreting. AIIC (2014) recommends that interpreters avoid working
with tour guide sets and treat these types of assignments as an exception, which is subject


to a number of requirements, including the characteristics of the event (e.g. a factory visit
where mobility is paramount), duration (short durations of up to two hours preferred), a
small number of participants, availability of two-way equipment (to ensure clear source
audio for the interpreter), and compliance of such equipment with the relevant standards,
such as the IEC 914 standard (AIIC, 2002).
Setton and Dawrant (2016, 19) also refer to the use of portable sets as a ‘problematic’
phenomenon due to ‘inadequate sound quality and acoustic isolation’. They distinguish
interpreting in this mode from what they call ‘real SI’. Certainly, both the AIIC guidelines
and the terminology used by the aforementioned authors are dictated by a concern for the
comfort and convenience of the interpreter and a desire to uphold the standards of the
interpreting profession. As Magalhães (2016) notes, the interpreter’s agreement to use port-
able interpreting equipment should never be at the expense of the standards that the inter-
preting community has worked hard to establish for decades. Portable units ought to serve
as a streamlined alternative (predominantly to consecutive interpreting) in settings where
regular booth equipment cannot be installed. In fact, there is already anecdotal evidence of
how tour guide systems can be used in settings that employed consecutive interpretation in
the past, with positive results in high-stakes environments (Shermet, 2019).
Another issue relating to standards centres on the question of the number of interpreters
required to deliver the service using portable units. In reality, as client decisions are often
cost-driven, there may be an expectation to cut costs further, employing just one interpreter
to work with portable units. Baxter (2015) describes ‘boothless simultaneous interpreting’
with portable equipment, and whispered interpreting, as alternatives that are preferred by
activist and non-governmental event organisers with limited budgets. In addition, Baxter
refers to interpreters delivering such services in both the singular and the plural. The expec-
tation for an interpreter to single-handedly deliver an assignment through a tour guide
system may materialise when portable units are used as a ‘faster’ alternative to consecutive
interpretation, since there is a general expectation that such assignments be performed by a single consecutive interpreter in the given setting. It is therefore essential that working standards are upheld if the commissioned simultaneous shifts exceed 20 minutes in duration. In fact,
during assignments that use tour guide sets, interpreters’ shifts may require more frequent
handovers, due to the potentially more demanding acoustic environment. This is because
interpreter fatigue resulting from lack of soundproofing and exposure to background noise
from the audience may lead to compromised quality. Consequently, appropriate rest peri-
ods are mandatory. It is crucial to make this requirement known to both clients and event
organisers wherever possible, as well as address it during training. This is an aspect that the
next section will discuss in more detail.

5.6 Portable interpreting equipment in interpreter training


As previously mentioned, interpreting with portable equipment has been largely overlooked in research, and data on how often it is actually used is limited. Furthermore,
conducting a comprehensive study on its usage across a wide range of settings would be
challenging to implement. As a result, researchers in this domain have to rely on fragmen-
tary data and make educated guesses as to the extent of commercial (and voluntary) use
of tour guide sets for interpretation. Bidone (2017) quotes AIIC data from 2010, accord-
ing to which nearly 4% of all days worked by the surveyed AIIC members were bidule
assignments. Although, in terms of the raw percentage, this does not sound like much, this statistic still shows that roughly 1 in every 25 working days will involve an assignment using a tour guide system. This seems like reason enough to place tour guide interpreting in curricula – especially since the same source also quotes a growing trend in the uptake
of tour guide systems for interpretation, as confirmed by nearly 50% of respondents. In
addition, researchers must bear in mind that these figures date from well over a decade ago and involve an elite group of interpreters. One can reasonably assume that among
non-associated freelancers who often combine interpreting with other linguistic professions
and work for a variety of clients, including non-institutional clients, NGOs, or corporate
clients with limited budgets, these figures could be higher. In addition to the need to pro-
vide all-around training which covers the widest possible spectrum of interpreting modes
and modalities, this point serves as justification for including bidule training in interpreter
training curricula. The following sections offer practical elements that could be considered
during tour guide interpreter training.

5.6.1 The right time to introduce training with portable equipment


Due to the level of difficulty compounded by potential acoustic and organisational chal-
lenges in the delivery of tour guide interpreting, training with tour guide sets should
not be introduced at the beginning of the interpreter training process. The logic behind
the sequence of exercises in interpreter training that has been tried and tested for years
still holds true: the process of gradual introduction of tasks to improve comprehension,
short-term memory, reformulation, and note-taking (e.g. Gile, 2009; Jones, 2002) should
not be disturbed by a premature introduction of modalities, where practical and acoustic
challenges add to the already-formidable cognitive challenge of simultaneous interpreting.
The basic argument is that the simultaneous mode itself is highly challenging, and it is more
appropriate for the student to experience it first in the 'greenhouse' conditions of a booth set up by a technician/trainer, with optimised soundproofing and high-quality incoming sound. When simultaneous listening and production skills have been developed and practised to a satisfactory degree, there is time and space to introduce alternative equipment and
associated best practices. Depending on the intensity of the training, this could be several
weeks or even months after the start of training in booths. The advantage of following such
a sequence is that it enables students to understand and experience the differences between
the conditions offered by a typical ISO-compliant booth on the one hand, and portable sys-
tems on the other. Trainees will naturally draw their own conclusions about the impact of
equipment choice on the quality of interpretation. Consequently, newly trained interpreters
will then be able to pass this information on to their prospective clients.

5.6.2 Core skills and ethical considerations in training


The set of skills, abilities, and attitudes that makes up the target competence of an interpreter carrying out assignments using tour guide kits needs to include a crucial technical component. The list of purely technical tasks in standard on-site simultaneous interpreting (delivered from the booth and including technical support and a tandem
partner) may be limited to ensuring that at least one interpreter in the tandem knows how
to use the console at hand and tests the channels prior to starting an assignment. In the case
of tour guide sets, the usual expectation is that the interpreters working with the system will
make efficient use of their equipment. This includes managing its set-up as well as briefing


the speakers/audience on how to use the equipment, and even ensuring uninterrupted use
and troubleshooting the equipment during the assignment. This difference highlights why, in order to prepare trainees for tour guide set assignments, training should include technical and organisational tasks that are rarely required of an interpreter in a typical, technically supported booth conference setting.
Another skill for working with tour guide sets is the ability to interpret simultaneously
despite a lack of source audio fed directly to the interpreter’s headphones. This is a conten-
tious issue. For example, following the relevant AIIC (2002) guidelines, interpreters should
reject any jobs where audio needs to be picked up from the floor without a direct audio feed
to the interpreter’s ears. There are strong reasons for this standard to be upheld. However,
many anecdotal accounts prove that it is not realistic to always expect ideal source audio
during an assignment. As a result, the responsibility rests with the interpreter to choose
whether to accept the job, with all its challenges, or to decline it. Consequently, due to the
range of different interpreting scenarios and settings where tour guide systems can be used,
it is challenging for interpreter trainers to draw a line of acceptability in the ill-defined area
of ‘satisfactory vs non-satisfactory’ sound quality. What remains certain, however, is that
by the end of their course, students should be aware of the impact that acoustic limitations
can have on the quality of their product, and the additional burden on the interpreter that
this modality of delivery causes.
Interpreter training curricula should place a strong emphasis on both the ethical and
workload considerations that surround the delivery of interpreting services with portable
sets. A well-trained interpreter should be able to transparently present the advantages and
disadvantages of bidule interpreting vis-à-vis regular interpreting to the client. They should
also be capable of highlighting issues like feasibility, impact on quality, increased workload,
shift work, and technical and organisational tasks linked to the delivery. Such an approach
will facilitate a comprehensive analysis of requirements that are unique to each assign-
ment. Consequently, this approach would also reinforce a standardised methodology for a
non-standard delivery of interpreting services (however paradoxical that may sound).

5.7 Conclusion
As interpreting undergoes a dynamic transformation, caused predominantly by the arrival
of remote interpreting, automatic speech recognition, and AI/MI technologies (see contri-
butions by Prandi, Chmiel and Spinolo, Fantinuoli, and Ünlü, this volume), the question of
whether bidule interpreting is ‘future-proof’ is highly relevant. To answer this, it is necessary
to first consider the widest possible spectrum of interpreting assignments currently available
to service providers. In some settings, including remote interpreting of conferences, webi-
nars, town hall meetings, lectures, and training courses, the impact of new technology and
the potential for (semi-)automation is already clearly visible. However, many other settings
remain, where the physical presence of a human interpreter will continue to be the preferred
choice or simply a necessity. Field trips in remote areas, facility tours where mobility is
crucial, interpreting in crisis settings where internet access is limited or non-existent, clas-
sified negotiations at risk of cyberattacks, small- to medium-size group events for clients in
small premises or with limited budgets, or contingency-type interpreting when technology
has failed are just a few examples of situations where modern digital radio tour guide sets
can serve as a valuable companion for interpreters at work. If their inherent limitations and
impact on the interpreter’s output are both thoroughly understood and considered on a firm
ethical groundwork, these devices are, and will most probably remain, a useful tool in the
flexible interpreter’s toolbox.
Furthermore, more concrete justification is needed for the presentation of
this modality of interpreting as ‘marginal’. Interpreting with portable equipment remains
an understudied area. Firstly, there is a considerable research gap concerning the actual
share of bidule assignments, their types, and their characteristics. Secondly, there is a lack
of knowledge about the impact that this non-standard equipment (in varying configura-
tions) may have on the quality of interpreting. Experimental research in this area is there-
fore necessary. Thirdly, interpreter fatigue and workload in this modality have only been
described vaguely, and researchers lack concrete data to substantiate the claims made in the
literature. Fourthly, as much criticism of tour guide systems is founded on claims relating
to inferior sound quality and impractical channel management, the impact of the recent
technological strides on the user experience has yet to be assessed. In addition, the key func-
tionalities to be investigated in this regard include the bidirectionality of equipment and its
noise-cancellation features. In a similar vein, more technically oriented acoustic research is
required, as is further investigation into, and comparisons between, the quality of incoming
sound across different interpreting modalities. These are just the main research strands
linked to bidule interpreting; they can certainly be narrowed down to more specific
research projects that will further our understanding of how this modality of interpreting
can be used.
Finally, thanks to recent strides in technology and the arrival of AI, a further poten-
tial avenue for application and development of portable systems for interpreting is their
combination with automatic speech recognition (ASR) and natural language processing
(NLP) technology. For instance, combining ASR with noise cancellation could allow port-
able equipment to be used to form alternative workflows for interpreters. To illustrate,
interpreters who are working in acoustically substandard environments could be provided
with live speech-to-text transcripts, which would aid interpretation. The size and interfaces
of existing tour guide kits facilitate this application. Similarly, further solutions may prove
useful for simultaneous interpreting in general (including RSI); for example, automatic and
low-latency summarisation applications could also help interpreters partially overcome the
contextual challenges that are present in tour guide interpreting.
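As a rough sketch of such a workflow, the pipeline below strings together placeholder denoising and transcription steps and appends each partial transcript to the interpreter’s display. The function names and chunked byte-string ‘audio’ are illustrative assumptions only, not any vendor’s API:

```python
# Hypothetical sketch of the workflow described above: audio chunks from a
# portable receiver pass through noise cancellation and ASR before partial
# transcripts are streamed to the interpreter's display. Both processing
# functions are placeholders, not a real engine's interface.

def denoise(chunk: bytes) -> bytes:
    """Placeholder for a noise-cancellation pass over one audio chunk."""
    return chunk

def asr_transcribe(chunk: bytes) -> str:
    """Stand-in for an ASR call; a real model would decode audio here."""
    return chunk.decode("utf-8")

def live_transcript(chunks, display):
    """Feed incoming chunks through denoising and ASR, appending each
    partial transcript to the interpreter's display as it arrives."""
    for chunk in chunks:
        text = asr_transcribe(denoise(chunk))
        if text:
            display.append(text)
    return " ".join(display)

# Simulated feed from a tour guide receiver:
display = []
live_transcript([b"good morning", b"welcome to the facility"], display)
```

In a deployed system, `asr_transcribe` would be replaced by a streaming speech-to-text engine and `denoise` by the kit’s noise-cancellation stage; the point of the sketch is only the ordering of the stages, which keeps the interpreter’s display in step with the incoming audio.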
Future research will reveal the extent to which a concept born nearly 100 years ago and
now in its most modern guise can work alongside the most recent developments in AI and
NLP if managed and controlled by well-trained human professionals.

Notes
1 Scr300.org (accessed 8.7.2024).
2 https://bootheando.com/2013/03/08/la-historia-de-la-interpretacion-simultanea-de-la-mano-de-ted-pilley/ (accessed 10.7.2024).
3 Examples include products from Williams AV and Okayo (respectively: https://williamsav.com/product/dlt-400/, www.okayo.com/en/product-281287/Full-duplex-Communication-System-WaveTEAMS.html, accessed 7.9.2024).

References
AIIC, 2002. Text on Bidule. AIIC.net. URL https://aiic.net/page/633/text-on-bidule/lang/1
AIIC, 2014. Basic Texts. Code of Professional Ethics. AIIC.net. URL http://aiic.net/p/6724
AIIC, 2019. Birth of a Profession. The First Sixty-Five Years of the AIIC.
Baigorri-Jalón, J., Mikkelson, H., Olsen, B.S., eds., 2014. From Paris to Nuremberg: The Birth of Conference Interpreting. John Benjamins, Amsterdam and Philadelphia.
Baxter, R.N., 2015. A Discussion of Chuchotage and Boothless Simultaneous as Marginal and Unorthodox Interpreting Modes. The Translator 22, 59–71. URL https://doi.org/10.1080/13556509.2015.1072614
Bidone, A., 2017. Dolmetschen mit Flüsterkoffer: Eine Feldstudie [Interpreting with a bidule kit: a field study] (MA dissertation). University of Vienna. Available online at phaidra.univie.ac (accessed 5.8.2024).
Diriker, E., 2015. Simultaneous Interpreting. In Pöchhacker, F., ed. Routledge Encyclopedia of Interpreting Studies. Routledge, London and New York.
Gile, D., 2009. Basic Concepts and Models for Interpreter and Translator Training, revised ed. Benjamins Translation Library, John Benjamins.
Jones, R., 2002. Conference Interpreting Explained. St. Jerome Publishing.
Kalina, S., Ziegler, K., 2015. Technology. In Pöchhacker, F., ed. Routledge Encyclopedia of Interpreting Studies. Routledge, London, 410–412.
Keiser, W., 2004. L’interprétation de conférence en tant que profession et les précurseurs de l’Association internationale des interprètes de conférence (AIIC) 1918–1953 [Conference interpreting as a profession and the precursors of AIIC, 1918–1953]. Meta 49(3), 579–608. URL https://doi.org/10.7202/009380ar
Magalhães, E., 2016. Portable Interpreting Equipment. What to Get and Why. URL https://ewandro.com/portable/
Panna, S., 2017. Die Praxis des Flüsterdolmetschens: Eine qualitative und quantitative Studie [The practice of whispered interpreting: a qualitative and quantitative study] (MA dissertation). University of Vienna. Available at: https://phaidra.univie.ac.at/ (accessed 5.8.2024).
Pöchhacker, F., 2009. Conference Interpreting: Surveying the Profession. The Journal of the American Translation and Interpreting Studies Association 4(2), 172–186. URL https://doi.org/10.1075/tis.4.2.02poc
Porlán Moreno, R., 2019. The Use of Portable Interpreting Devices: An Overview. Revista Tradumàtica. Tecnologies de la Traducció 17, 45–58. URL https://doi.org/10.5565/rev/tradumatica.233
Sekikawa, F., 2008. Über Sinn und Unsinn einer Personenführungsanlage [On the sense and nonsense of a tour guide system]. URL www.sekikawa.de/pdf-files/pfa_d.PDF (accessed 10.8.2024).
Setton, R., Dawrant, A., 2016. Conference Interpreting: A Complete Course. Benjamins Translation Library, John Benjamins. URL https://doi.org/10.1075/btl.120
Shermet, S., 2019. A Quieter Revolution in Diplomatic Interpretation. ATA Website. URL www.ata-divisions.org/ID/a-quieter-revolution-in-diplomatic-interpretation/ (accessed 13.7.2024).
Stahl, J., 2010. Flüsterdolmetschen – ein Waisenkind der Forschung [Whispered interpreting: an orphan of research]. In Stahl, J., ed. Translatologické reflexie. Iris, Bratislava, 55–67.

6 TECHNOLOGY-ENABLED CONSECUTIVE INTERPRETING
Cihan Ünlü

6.1 Introduction
Over the past two decades, various tailor-made technological provisions have been created
for interpreters. This development can be attributed to the broader accessibility of technology
for professional use, marked gains in the robustness and quality of speech technologies,
the rapid growth of computing power, major advances in deep learning, and the spread of
generative artificial intelligence. These provisions aim to offer both tools and
solutions to improve interpreters’ work efficiency. As a broader concept, technology in
interpreting has become a major topic and subject of enquiry in both academia and indus-
try. The intersection of technology and interpreting has thus also become an attractive,
albeit young, field of study and has given rise to a proliferation of human–machine interac-
tion and product- and process-oriented studies in the field of translation studies. The tech-
nologisation of interpreting is discussed by many, particularly in a period when the number
of remote interpreting (RI) platforms has multiplied, when machine interpreting (MI) has
become prevalent in the language industry, and when natural language–based AI
technologies have achieved a level of maturity sufficient to help interpreters complete certain
subtasks before, during, and after assignments.
Recent progress in speech technologies (automatic speech recognition, speech transla-
tion, speech synthesis) alongside state-of-the-art deep learning models in natural language
processing have paved the way for the utilisation of and research on ‘process-oriented’
interpreting technologies (Fantinuoli, 2018). Accordingly, computer-assisted interpret-
ing (CAI) tools have emerged. These are dedicated to ‘assist[ing] professional interpret-
ers in at least one of the several sub-processes of interpreting’. Examples of sub-processes
include ‘knowledge acquisition and management, lexicographical memorization, real-time
­terminology access, and so on’ (Fantinuoli, 2023, 46). In the past decade, these tools and
technologies have been implemented in various settings and discussed academically in terms
of usability (Frittella, 2023), terminology (Gacek, 2015; Biagini, 2015), cognitive processes
(Prandi, 2023, this volume), and socio-cognitive (4EA) approaches (Mellinger, 2023),
among ­others. Interpreting is not only technology-enabled but also technology-mediated.
The sharp increase in demand for and provision of distance interpreting has made remote
simultaneous interpreting (RSI) more prevalent, and technologisation more relevant for
practitioners. In addition, the platformisation of RSI has also led more interpreters with
technology literacy to work as independent contractors. Such independent contractors have
been found to have an increased reliance on digital platforms, which mediate their jobs and
terms (Giustini, 2024; Giustini, this volume). Furthermore, the shift to a digital workspace
has also made it necessary to integrate technology into the interpreter’s workflow. This
technological change calls for additional empirical studies to assess how best to mitigate
risks and establish which benefits these technologies can offer in terms of streamlining the
entire process. Accordingly, there has been an observable increase in the design and deployment of
‘setting-oriented’ (Fantinuoli, 2018) technologies as commercial RSI products with CAI
functions (Fantinuoli et al., 2022; Frittella and Rodríguez, 2022) and as products designed
for interpreter training (Arriaga-Prieto et al., 2023; Baselli, 2023).
Recently, the intersection of automatic speech recognition (ASR) and AI-based information
retrieval models has emerged as a game changer for process-oriented CAI tools,
particularly in SI. Several studies have explored the possibilities of using ASR technology
as an automated querying system (Hansen-Schirra, 2012; Fantinuoli, 2017). Other studies
have explored the possibilities of using ASR technology to enhance CAI tools in the context
of problem triggers during practice (Defrancq and Fantinuoli, 2021; Rodríguez et al., 2021;
Pisani and Fantinuoli, 2021), assisting in the preparatory needs of interpreters (Gaber et al.,
2020), and supporting interpreters by transcribing source speeches (Cheung and Tianyun,
2018; Wang and Wang, 2019; Rodríguez González et al., 2023).
In academia, there has been a clear interest and abundance of empirical research on CAI
in the last decade. However, the focus has predominantly been on simultaneous rather than
consecutive interpreting. A closer look at the literature shows that the majority of the
functionalities and designs of existing tools, as well as future tool proposals, target
simultaneous interpreting, which is why most of the aforementioned studies focus on SI
performance as their outcome measure. There remains a need for more robust empirical
studies that explore human–computer interaction within CAI across various language pairs
and modalities. Along with market demand, the preference for SI over CI may be influenced
by the inherent characteristics of CI and the diverse environments in which it is applied.
In contrast to simultaneous conference interpreting, CI often lacks a stable set-up, which
may create the perception that technological aids for CI are less feasible and potentially riskier.
Consequently, when considering the implementation of technology in CI, the focus shifts
to the specific technological tools and their functions that are pertinent to various sub-
tasks within the interpreting process. Although overshadowed by SI-related CAI research,
computer-assisted consecutive interpreting is considered to be a new and vibrant field, with
many experimental studies yielding promising results and insights for process-oriented
interpreting studies.
At the same time, the development of text processing and language understanding on the
software side, and the introduction of new ergonomic and user-friendly tools on the hardware
side, have spurred the creation of potential technological support in CI. Although
traditionally considered a non-technical field, CI can integrate a variety of tools and
technologies that aid in the different stages of the interpreting process.
This chapter will focus on the literature surrounding technology aids in CI, with a particular
lens on hybrid modalities, types of equipment, and ASR-assisted solutions. The chapter will
revisit studies conducted so far and analyse the goals, methodologies, and conclusions of
empirical studies that deal with the use of technology in consecutive interpreting in detail.
Section 6.2 provides a general overview of the interplay between CI and technology and
draws connections between various innovations that have been designed and conceptual-
ised for the CI process. Subsequent subsections outline these innovations in the following
broad categories: hybrid modalities, digital pen–supported modalities, and ASR-assisted
modalities. These subsections also provide a detailed review of the literature with empiri-
cal studies conducted so far. Finally, Section 6.3 draws a conclusion based on the current
status, limitations, and future studies.

6.2 Technology in consecutive interpreting


Roughly defined, consecutive interpreting (CI) is one of the two main modes of interpreting,
in which the interpreter listens to the speaker’s message in one language (sign or spoken),
with or without electronic equipment, and delivers the speech segments in the target lan-
guage after the original speech segment has concluded. Speech segments can vary in length,
from a few utterances to longer turns, and may require note-taking to aid the interpreter
in recalling the content from memory. Note-taking is an essential phase of long consecu-
tive and requires different techniques and strategies, such as mind mapping, condensing
sentences, and using symbols, abbreviations, bullet points, and keywords to help recall the
content of a speech (Gillies, 2019; Rozan, 1956). A common element across all definitions
of CI is the delineation of two principal stages: the listening stage and the production/deliv-
ery stage. Broadly speaking, in the listening stage, the interpreter engages in active listening
with the aim of fully understanding the source speech, storing information in their mem-
ory and taking notes to facilitate delivery in the subsequent stage. The interpreter initially
undertakes to understand the source speech through a process of constructing coherence.
They attempt to preserve the outcomes of their analysis of the source speech by storing them
partially in memory and partially recording them on a notepad. Albl-Mikasa describes this
process as engaging in ‘two simultaneous processes: source text comprehension and nota-
tion text (NT) production and then NT comprehension and target text production’ (2017,
92). During the production/delivery stage, the interpreter reconstructs their interpretation
of the original message in the target language. This reconstruction mainly integrates general
knowledge, information retained from the source speech, and notes taken during the lis-
tening stage. Further elaborating on the cognitive management during this process, Daniel
Gile’s effort models (1995, 2001, 2009) identify three specific efforts that are required in
the interpreter’s reformulation phase. These are the ‘note reading’ effort, which involves
deciphering notes; the ‘long-term memory’ effort, requiring the retrieval and reconstruction
of speech content from memory; and production effort, the act of generating speech in the
target language. In brief, both theoretical and practical discussions on the comprehension
and text processing processes in CI allude to a multilayered process. The subtasks involved
in listening and concurrent effort of note-taking (in long consecutive), as well as deliver-
ing output in the target language, require different coordination and process, and thereby
strategies, where technological support can be pivotal.
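Drawing on Gile (2009), the effort balance of the two phases is often rendered schematically. In a simplified gloss (the exact notation varies across editions: L listening and analysis, N note-taking, M short-term memory operations, C coordination; Rem remembering, Read note-reading, P production):

```latex
% Gile's (2009) effort model for consecutive interpreting (simplified gloss)
\begin{align*}
\text{Phase one (listening):}\quad & \text{Interpreting} = L + N + M + C \\
\text{Phase two (reformulation):}\quad & \text{Interpreting} = \mathit{Rem} + \mathit{Read} + P
\end{align*}
```

The additive form captures the model’s core claim that the efforts compete for a limited processing capacity, which is precisely where technological support for one effort can free capacity for another.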
Unlike the array of technological support available for simultaneous interpreting in con-
ference settings, CI traditionally relies on simpler aids, like pen and paper. This means
technology-enabled consecutive interpreting can focus on providing a versatile solution for
different subtasks of the whole interpreting process. Accordingly, technological enhance-
ments in this area typically focus on supporting either the note-taking or note-consulting/
delivery stages, which correspond to the ‘listening and comprehension’ and ‘reformulation’
phases of effort models (Gile, 2009), respectively. Despite the distinct operational dif-
ferences between simultaneous and consecutive modes, technological solutions in both
modes aim to facilitate the core work of interpreters by reducing their cognitive load
during information retrieval and enhancing performance quality. In the literature, there
appears to be only a handful of studies tackling the question of whether state-of-the-art
technology can enhance CI. Currently, four themes under the umbrella of ‘technology
use in CI’ lend themselves to a broad review. These include hybrid interpreting
modalities, digital pens, tablet interpreting, and the new ASR-integrated approaches (see
also contributions by Davitti and Orlando, this volume). Rather than viewing these as
isolated technological applications for CI, these can be understood as interconnected
technological innovations that share functionalities across either the comprehension and
note-taking phase or the production and delivery phase. Hybrid modalities include the
practice of simultaneous-consecutive, which hypothetically allows interpreters to manage
cognitive loads better and potentially avoid traditional note-taking (Ferrari, 2002, 2007;
Pöchhacker, 2007; Hamidi, 2006; Hamidi and Pöchhacker, 2007). Smartpens (Orlando,
2014, 2016, and this volume) are high-tech electronic devices equipped with a micro-
phone, speaker, and infrared camera. Another innovation is tablet interpreting (Rosado,
2013; Drechsel and Goldsmith, 2016; Goldsmith and Holley, 2015, 2018; Altieri, 2020).
This provides improved handwriting recognition and portability for the note-taking
process. Lastly, recent studies on computer-assisted CI examine ASR (Ünlü, 2023) and
respeaking (Chen and Kruger, 2023), exploring the feasibility of certain speech technologies
for performance enhancement. The following subsections will elaborate on
these solutions under the following categories: hybrid modalities, digital pen–supported
modalities, and lastly, ASR-assisted modalities, with a detailed review of stand-alone com-
mercialised and non-commercialised tools available.

6.2.1 Hybrid modalities: simultaneous-consecutive


At the end of the 1990s, a hybrid approach, ‘simultaneous-consecutive’, was proposed
and first introduced by an SCIC interpreter, Michele Ferrari (Ferrari, 2001, 2002; Gomes,
2002). The ‘simultaneous-consecutive’ mode of interpreting includes a digital device that
records the original speech. The recorded speech is subsequently played back to the inter-
preter via earphones; they then interpret the input simultaneously, but in CI mode, with
or without notes. In 1999, Ferrari, credited as the pioneer of this mode, recorded a com-
missioner’s speech, played it back on a digital device, and interpreted it simultaneously,
merging CI with SI modality for the first time (Gomes, 2002). Numerous scholars have
since approached the hybrid mode of interpreting, giving it various names, such as ‘digitally
remastered consecutive’ (Ferrari, 2002), ‘digital recorder–assisted consecutive’ (Lombardi,
2003), ‘consec-simul with notes’ (Orlando, 2014), ‘simultaneous-consecutive’ (Hiebl, 2011;
Orlando and Hlavac, 2020), and ‘SimConsec’ (Pöchhacker, 2015).
As described earlier, the initial application of simultaneous-consecutive in a professional
context was pioneered by Michele Ferrari in 1999 (Gomes, 2002). Ferrari interpreted a
press conference in Rome by the then vice president of the European Commission, Neil
Kinnock (Hamidi and Pöchhacker, 2007, 277). This event marked the inaugural real-world
application of what he termed ‘digitally mastered consecutive’. The positive reception of
the result led Ferrari to describe the experiment as a successful demonstration of the poten-
tial of this new mode of interpreting (Gomes, 2002). Following this initial trial, Ferrari
conducted further tests within what became DG Interpretation, so as to refine the technique
(Ferrari, 2001, 2002). Subsequent to Ferrari’s initial findings, John Lombardi (2003) and
Erik Camayd-Freixas (2005), both US court interpreters, highlighted the potential ben-
efits of using digital voice recorders in their reviews and evaluations. Lombardi’s informal
evaluation of the ‘digital recorder–assisted consecutive’ (DRAC) method detailed the func-
tionality, advantages, and limitations of this approach (2003). Camayd-Freixas conducted
a more structured experiment at Florida International University involving 24 advanced
interpreting students and early-career professionals (Camayd-Freixas, 2005; Hamidi and
Pöchhacker, 2007). The experiment, in which both groups served as their own controls,
centred on the accuracy of interpreting. The unit of analysis was defined as ‘the percentage
of words missed in each statement’. This provided a quantitative measure of interpreter
performance (Camayd-Freixas, 2005, 43). Accuracy was tested using digital recorders
versus traditional note-taking. Results indicated superior accuracy and completeness when
using the recorder, particularly as speech length increased. Overall, these studies concluded
that such technology not only enhances the quality of interpreting in terms of accuracy but also retains the
original’s intonation and ‘liveliness’ (Camayd-Freixas, 2005, 42) more faithfully. Avoiding
note-taking allowed interpreters to focus more intensively on listening and comprehending
the source material.
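Camayd-Freixas’s unit of analysis lends itself to a simple formalisation. The sketch below computes a per-statement percentage of missed words by multiset comparison against a same-language reference rendition; the original study’s exact counting procedure is not documented here, so this operationalisation is an assumption:

```python
from collections import Counter

def percent_words_missed(reference: str, rendition: str) -> float:
    """Percentage of reference words with no counterpart in the rendition
    (word-level multiset comparison; an illustrative operationalisation of
    'the percentage of words missed in each statement')."""
    ref = Counter(reference.lower().split())
    out = Counter(rendition.lower().split())
    missed = sum((ref - out).values())  # reference words absent from rendition
    total = sum(ref.values())
    return 100.0 * missed / total if total else 0.0

# A statement with one of four words dropped scores 25% missed:
percent_words_missed("the minister arrived late", "the minister arrived")  # 25.0
```

Accuracy in the study’s sense would then be the complement of this figure, aggregated over statements of increasing length.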
A couple of years later, drawing on Hamidi’s thesis (2006), Hamidi and Pöchhacker
(2007) tested the simultaneous-consecutive on three experienced professional interpret-
ers and assessed their performances using both methods through transcript analysis,
self-assessment, and audience response. The authors, who named it ‘SimConsec’, reported
that the participants showed more fluent delivery, had closer source–target correspondence,
and had fewer prosodic deviations (2007, 14). In later years, more comparative studies pro-
vided mixed methodologies. Similar to Hamidi and Pöchhacker’s findings, Hawel (2010)
and Orlando (2014) also note significant improvements in the quality of expression, includ-
ing closer correspondence between source and target texts, fewer prosodic deviations, and
enhanced fluency of delivery. Chitrakar (2016), Mielcarek (2017), and Svoboda (2020)
observe fewer linguistic and conceptual errors and a higher level of detail retention. Experi-
menting with the Chinese–English language pair, Ma (2022) reported that the new mode
enhanced interpreting quality by improving accuracy, information completeness, and logi-
cal clarity, while reducing the memory load and stress on participants. Pöchhacker (2015)
acknowledges the intent to eliminate traditional note-taking with this method. However, he
also notes that interpreters may still utilise note-taking as a memory aid, particularly with
the advent of digital pen technology that integrates recording and note-taking in one device
(e.g. smartpens). This will be discussed later on in this chapter.
However, certain drawbacks and negative anecdotal evidence concerning the feasibility
of the mode are also apparent: because interpreters are typically focused on the audio
playback, they risk impairing proper eye contact and other forms of non-verbal
communication with the audience (Orlando, 2014; Ma, 2022). For Orlando (2014), this could be
mitigated if interpreters are made aware of this drawback and trained to proactively engage
with the audience. The other possible disadvantage is the extended duration of delivery,
which may make it difficult to adhere to the norm that consecutive interpretations should
not be longer than the duration of the original speech. Studies also report scepticism among
professional interpreters regarding the practical applicability of this mode. For example,
Hiebl (2011) reported that some participants had concerns about its use in professional
practice. However, it is possible to say that the views ‘usually range from ambivalent to
positive’ (Orlando and Hlavac, 2020, 11), both in the eyes of the audience (Svoboda, 2020)
and the practitioners (Orlando, 2014).
Further empirical research is indeed required to find out whether this method effectively
enhances the accuracy and fluency of the rendition, and to determine the best types of equip-
ment as well as the most effective contexts for its application. Admittedly, despite the positive
conclusions of the studies conducted over the past two decades, simultaneous-consecutive
is not widely adopted in professional settings, nor has it been comprehensively addressed
in training programmes. A systematic integration of simultaneous-consecutive into training
has been a much-discussed topic in academia (Orlando, 2016), but there is clear need for
further research on when to begin training in the simultaneous-consecutive mode within
interpreting courses (Orlando and Hlavac, 2020). Questions of its popularity among
practitioners aside, the interpreter training literature offers limited examples and insufficient
data on how best to implement this mode. Such shortcomings remain to be resolved.

6.2.2 Digital pen–supported modalities


A digital pen is a portable electronic device that captures handwriting and converts it into
digital data. These devices, also known as smartpens, often come with advanced process-
ing capabilities, memory for data storage, and additional tools, such as audio recorders or
text scanners (Orlando, 2023, 7). Digital pens have also evolved to include applications
for managing voice recordings and handwritten notes. Such developments have given rise to a
research strand investigating the efficiency of these devices in enhancing note-taking and
reducing cognitive load. The use of such technologies in interpreting studies has expanded
to also include tablets and styluses (Goldsmith, 2018; Altieri, 2020). The term ‘digital pen’
thus refers both to the styluses used for digital note-taking and to smartpens. Smartpens
have additional features, such as integrated audio recorders or text scanners, offering more
advanced capabilities. They may also provide audio and visual feedback, possess higher
processing power, and support various applications that enhance their utility beyond basic
handwriting digitisation (Orlando, 2023, 8). One of the first advanced smartpens, Live-
scribe Pulse, was launched in 2008. Livescribe pens are advanced smartpens designed for
note-taking and audio recording. These pens are equipped with microphones, speakers, and
an infrared camera. They use special dot paper to synchronise written notes with audio.
Users can tap on their notes to hear the corresponding audio. These features aid interpreters
in reviewing and understanding the context of their notes. Inspired by these features, Marc
Orlando proposed and integrated smartpens into CI training (Orlando, 2010, 2015, 2016,
this volume). The idea behind this decision was to make it easier for instructors to assess
and provide feedback on both the content and the process of students’ note-taking, rather
than just the final product. As such, it is possible to observe both notes and speech at the
same time and even assess temporal aspects, types of notes, and the final product (notes
taken or the rendition) (Pöchhacker, 2016, 199).
As mentioned earlier, smartpens allow for the capture and instant playback of notes on
various devices. They facilitate detailed classroom discussions on the qualities and short-
comings of students’ notes in relation to both the source speech and their interpretation. As
a result, discussions can focus on areas such as identification and understanding of issues,
misunderstandings or omissions in note-taking, and the lag between hearing and noting
down information (Orlando, 2023, 11). In general, digital pen technology and smartpens,
in particular, have been highlighted in experimental research as being beneficial in this
educational context. The earliest initiatives are Orlando’s studies (2010, 2015, 2016),
which introduced smartpens in classroom settings to facilitate the self-evaluation of stu-
dents’ note-taking practices. A limited number of studies report different pedagogical activi-
ties using the smartpen (Kellet Bidoli, 2016; Kellet Bidoli and Vardè, 2016; Romano, 2018),
with a range of conclusions drawn. For example, the smartpen is seen as being beneficial
for allowing students to revisit their notes repeatedly. This enhances their understanding of
their own note-taking habits and helps improve accuracy in interpreting (Romano, 2018).
As a result, a collaborative learning environment is created (Orlando, 2015; Kellet Bidoli,
2016), which allows a tangible trace of mental processes, ones typically difficult to
observe, to be captured and evaluated (Kellet Bidoli and Vardè, 2016). Nevertheless, more
widespread research is needed to validate the effectiveness of smartpens in training.
Aside from the benefits that smartpens and digital pens provide in terms of the cognitive
aspects of the process, as well as enriching process-oriented research in CI, these devices
were also used and tested in the hybrid mode of simultaneous-consecutive for the recording
phase. Several studies employed exploratory research methodologies to analyse the
practicality of smartpens in simultaneous-consecutive. For example, Hiebl (2011) aimed to com-
pare CI with simultaneous-consecutive. Four interpreting students and three professional
interpreters interpreted three speeches from Italian to German using either mode. The results
showed that difficult texts were often chosen for simultaneous-consecutive interpreting,
but participants generally preferred traditional CI. Orlando (2014) tested four professional
interpreters who performed CI in both traditional consecutive and the hybrid mode. Their
level of accuracy was measured based on the units of meaning being correctly conveyed.
This was shown to be higher for participants who performed simultaneous-consecutive
using a smartpen. Orlando also reported that there were fewer disfluencies or hesitations
in the hybrid mode compared to the traditional mode. This may be due to having access to
the speech content for a second time through the playback feature of the digital pen (2014,
48). Similarly, Mielcarek (2017) tested the two different modes with four participants. The
study found better interpreting performance in simultaneous-consecutive with smartpens
compared to both traditional CI and simultaneous-consecutive with ordinary digital voice
recorders. In another comparative study by Svoboda (2020), seven interpreters performed
CI both in traditional consecutive mode and in simultaneous-consecutive with a smartpen.
The interpretations were assessed by an independent audience and through video-based
analysis for ‘source–target correspondence’ (p. 47). Findings indicated that despite the tech-
nological support provided by the smartpen, the audience generally preferred traditional
consecutive interpreting over simultaneous-consecutive with a smartpen. However, the data
analysis showed that simultaneous-consecutive with a smartpen improved the accuracy of
source–target correspondence compared to traditional consecutive interpreting. Özkan’s
study (2020) found a positive trend in favour of simultaneous-consecutive with a smartpen,
but interpreters preferred tablets and styluses over smartpens.

6.2.3 ASR-assisted modalities: sight-consecutive and respeaking


The motivation behind ASR-assisted CI is to provide the transcription of the source speech
automatically, to be used as a reference text for memory jogging. The transcribed text can
either facilitate sight translation or serve as a supplementary aid to the interpreter’s own
notes. Where the full ASR output is available, the interpreter may forgo manual notes
altogether and engage in sight translation instead. In that case, during the rendition phase, interpreters using

97
The Routledge Handbook of Interpreting, Technology and AI

ASR technology rely on the generated transcript and essentially sight-translate the text. The
rationale behind this is that an ASR model with precision and accuracy can enhance per-
formance by presenting the automatic transcription and/or machine-translated target text
for a smooth rendition. This new modality can be referred to as ‘consecutive interpreting
with text’ or ‘sight-consecutive’ (Ünlü, 2023, 23). Relying fully on a real-time transcript
generated by an ASR tool can be problematic for several reasons, particularly
in a real setting. The verbatim output is one such reason: having access to
the entire ASR-generated source text, combined with machine translation, might compel
the interpreter to be exhaustive, spending additional time to
deliver more thorough renditions (Ünlü, 2023, 96). Another challenge is related to the inherent
process of text generation through an ASR pipeline. Recent progress in deep learning has
made it possible, albeit in favour of high-resource languages, to run multilingual speech rec-
ognition models with high accuracy and low word-error rate. However, ASR is still bound
to certain technical issues and inherent pitfalls. The technical limitations of ASR-assisted
interpreting, whether SI or CI, include, among many others, microphone malfunctions,
software glitches, connectivity issues, and confidentiality concerns. Transcribing speech accu-
rately can also be challenging due to various factors. These include the type of speech used
(casual or formal), variations in the speaker’s voice, and ambiguity caused by homonyms
(Fantinuoli, 2017). Additionally, misrecognition of word boundaries can also contribute
to errors in the output of ASR. Therefore, it is crucial for the user to have an ergonomic
experience with the tool(s) and to know the weaknesses, strengths, and risks of having such
assistance.
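The word error rate mentioned above is the standard accuracy metric for ASR output. As an illustration only (not tied to any particular ASR system), it can be computed as the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the length of the reference:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("right" for "write") in a four-word reference.
print(wer("please write this down", "please right this down"))
```

A WER of 0.25 thus means that one in four reference words was affected by a substitution, deletion, or insertion; the homonym confusions mentioned above typically surface as substitutions.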
As for the tools and technologies used in empirical research, the literature shows that
the researchers who conducted empirical studies on adopting ASR in CI mostly used
stand-alone ASR products that are not designed as CAI tools. However, two bespoke CAI
tools, designed specifically for interpreting scenarios, are now available on the market.
These publicly available products are Sight-Terp and Cymo Note. With their customised
interfaces, both aim to enhance the capabilities of standard note-taking applications by
incorporating AI-based functionalities like named entity recognition, segmentation, anno-
tation, and summarisation. The following subsections will focus on the studies that have
been conducted so far on the use of speech technologies (ASR and speech translation) in CI,
both in terms of non-bespoke ASR solutions and stand-alone CAI tools.

6.2.3.1 Review of the studies on ASR-assisted consecutive interpreting


Several studies have examined the impact of ASR in CI, yielding both positive and nega-
tive findings. A common methodological trend in experimental research is to compare and
analyse the interpreting performances of participants with and without technological assis-
tance. The developer of Sight-Terp, Ünlü (2023), conducted an empirical study investigating
the impact of Sight-Terp on the consecutive interpreting performance of 12 trainee inter-
preters in CI tasks from English to Turkish. Sight-Terp is a web-based, non-commercialised
CAI tool developed and designed for CI scenarios. The tool mainly features ASR and speech
translation (ST) with inner functions, such as automatic named entity recognition, enumer-
ated segmentation, and digital note-taking (Ünlü, 2023, 66). It transcribes spoken input
and translates speech segments simultaneously, providing two adjacent reference texts on
the interface. As the input speech is transcribed into chunks, the sentences are vertically
displayed with an enumerated view. In the meantime, named entities in the text (numbers;
organisation names; person names; dates; numerical data, for example, percentage, ordinal
numbers, temperature; location names; and currency data) are synchronously highlighted
to ease the ‘reading from notes’ effort (Ünlü, 2023, 73). The validity of this has not yet
been tested. Moreover, Sight-Terp incorporates an optional digital note-taking application,
supporting features like scratch-out erasing and drawing lines with a stylus like the Apple
Pencil or Samsung S Pen. This feature allows the user to combine ASR support with a
tablet interpreting (Goldsmith, 2018) experience. However, the use of a digital notepad
with ASR or ST was not tested in that study.
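The inline highlighting of named entities and numerical data described above can be approximated, in a much-simplified form, with regular expressions. The patterns, labels, and bracket markup below are hypothetical illustrations, not Sight-Terp’s actual rules:

```python
import re

# Hypothetical pattern covering plain numbers, percentages, and four-digit years.
ENTITY = re.compile(r"\b\d+(?:\.\d+)?%?")

def label(token: str) -> str:
    """Classify a matched token into an illustrative entity type."""
    if token.endswith("%"):
        return "PCT"
    if re.fullmatch(r"(?:19|20)\d{2}", token):
        return "YEAR"
    return "NUM"

def highlight(transcript: str) -> str:
    """Wrap each numerical entity in [TYPE: ...] markers in one pass over the text."""
    return ENTITY.sub(lambda m: f"[{label(m.group())}: {m.group()}]", transcript)

print(highlight("In 2023 inflation reached 4.5% across 27 member states"))
```

A production tool would rely on a trained named entity recogniser rather than hand-written patterns, but the principle of marking problem triggers (numbers, dates, names) in the transcript is the same.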
Ünlü’s study used a within-subjects, repeated-measures design to compare interpreting
performance with and without the tool, focusing particularly on accuracy and fluency
variables: the twelve novice interpreters completed pre-tests (without technological aid)
and post-tests (with Sight-Terp). The procedure involved initial pre-tests using traditional
note-taking methods, followed by training on the Sight-Terp tool, and subsequent post-tests
using the tool. It is important to note for this experiment that the Sight-Terp interface also
displays an instant MT output of the source transcription. Thus, the participants were free to
make use of either the output of ASR (source language transcription) or the output of ST
during the assignment.
Results indicated that interpreters demonstrated increased accuracy when using
Sight-Terp compared to when they did not use technological aids. While Sight-Terp
improved accuracy, its use also led to an increase in disfluencies, such as pauses, hesitations,
repetitions, stuttering, and false starts. Using Sight-Terp also resulted in longer durations
of delivery. This was attributed either to interpreters taking additional time to process the
information provided by the tool or to their being tempted to take a more meticulous
approach, rendering the ASR/ST output fully in the target language. Participants generally found the
Sight-Terp interface to be user-friendly, and the tool itself useful for their interpreting work.
However, some challenges were noted, such as minor ASR errors and segmentation issues.
Another CAI tool offering ASR-based functionalities is Cymo Note, a software designed
as an interpretation assistant by Cymo Inc., a Beijing-based company. Cymo Note incorpo-
rates a third-party ASR with MT developed to augment both consecutive and simultaneous
interpreting performance. Cymo Note offers two display options: showing the transcrip-
tion on a full screen, from which the interpreter works directly, or dividing the
screen into two halves, one for the transcript and the other for digital note-taking. Gener-
ally speaking, Cymo Note has similar functionalities to Sight-Terp, including automatic
transcription of speech in multiple languages, inline highlighting of important terms or
numbers, and on-demand machine translation of selected texts. On the other hand, Cymo
Note also makes digital note-taking possible during the speech recognition session. The tool
also allows for customisation via a proprietary algorithm, annotation of notes directly
on transcripts, and local data storage for confidentiality (Goldsmith, 2023). Cymo Note is
tailored for various interpreting scenarios, including remote, on-site, and hybrid settings. It
does not save data to the cloud, ensuring user privacy (Zhu, 2023). Currently, there are no
studies that empirically test Cymo Note in CI.
Following Ünlü’s call to conduct empirical studies for other language pairs, Dellantonio
(2023) experimented with Sight-Terp with the participation of six Italian-speaking female
interpreting students from the University of Innsbruck. Participants interpreted two texts
from German to Italian, one with Sight-Terp’s assistance and one without, in consecu-
tive mode. Two speeches on climate change, each made up of 15 syntactically complex
passages, were used as the stimuli. Performances were recorded for accuracy analysis. After
the experiment, participants completed a questionnaire where the researcher explored their
perceptions, opinions, and reflections on using Sight-Terp (2023, 99). A particularly nota-
ble aspect of Dellantonio’s study is that the study analysed how the tool aids interpreters in
tackling syntactically complex passages. The ST feature was discarded, and only the ASR
output was used. This means that the machine-generated translation of the source speech
input was not used for reference during the experiment. Unlike Ünlü (2023), who analysed
the renditions based on ‘units of meaning’ (Seleskovitch and Lederer, 1989), the perfor-
mances of the participants were analysed for accuracy and completeness using a rating
scale based on Tiselius (2009). The results indicate that, for the participants, the tool was
particularly helpful for handling long or complex sentences. However, syntactic differences
between the source and the target language in particular were perceived as a difficulty and
led the participants to rely on Sight-Terp’s transcription (Dellantonio, 2023).
Another study on the use of Sight-Terp in CI is Michele Restuccia’s MA thesis (2023), which
similarly employs an experimental set-up to explore how interpreters interact with the tool,
and its impact on their performance. Six students nearing completion of their master’s degree
in conference interpreting participated in the study. Participants interpreted two speeches from
German to Italian, one in traditional consecutive interpreting mode (pen-and-paper notes),
and the other using Sight-Terp, with the order randomised. To prepare the participants and
reduce their unfamiliarity with technology-assisted CI, Restuccia provided them
with an 11-hour seminar on AI-assisted interpreting preparation using NotionAI and consecu-
tive interpreting with Sight-Terp. The data collection methods included the screen recording
(of Sight-Terp-generated transcription), audio recordings of the renditions, a questionnaire
(31 questions), and lastly, a focus group discussion to explore the participants’ strategies,
challenges, and opinions on Sight-Terp and ASR-assisted CI. Contrary to initial assumptions,
the availability of ASR transcription did not positively enhance listening comprehension. The
trainee interpreters who relied solely on the transcript reported feeling distracted and found it
challenging to analyse the speech in detail (Restuccia, 2023). In line with the results of
Ünlü’s study (2023), consecutive interpreting using Sight-Terp was found to take generally
longer than the performances using traditional techniques (pen and paper). This is likely due
to the added cognitive effort of monitoring both the transcript and their notes, or the reliance
on sight translation of a raw transcription, leading to a longer rendition. Furthermore, par-
ticipants developed various strategies for interacting with Sight-Terp: Some used it primarily
for confirming notes, some relied solely on the transcript for sight translation, while others
used it as a supplementary resource during rendition (Restuccia, 2023).
The existing body of literature in Western academic publications suggests a scarcity of
empirical studies that focus on the use of ASR/CAI within CI. However, a quick search of
other databases, including China National Knowledge Infrastructure (CNKI), the Wanfang
database (www.wf.pub), and Korean Studies Information Service System (KISS), reveals
a growing body of research conducted at both master’s and PhD levels on the application
of ASR in CI. It is also worth noting that the English-language empirical studies on CAI,
for both SI and CI, focus primarily on the adoption of InterpretBank and Dragon
NaturallySpeaking software. In contrast, Chinese-language studies encompass a wide range of
software, including iFlytek ASR, iFlynote, or iFlytek Interpreting Assistant (Guo et al.,
2023, 93). The studies on ASR usage in CI in CNKI and other related non-European data-
bases provide empirical evidence for different language pairs. Lee’s experimental study
(2021, 2022) investigated ASR assistance in CI using Microsoft Azure’s speech-to-text.
Twenty-two professional interpreters (19 females, 3 males) of various ages and experience
levels were asked to interpret the first half of the material presented to them, without using
ASR support. The second half of the material was interpreted with ASR-generated tran-
scripts displayed on-screen. The experiment involved three rounds of varying text length
and content complexity (a lecture, a presentation, a conference call). Data analysis was
conducted through a questionnaire administered to all participants. This post-experiment
questionnaire revealed that interpreters generally expressed a positive view of using ASR
(2022, 950); in particular, ASR support for shorter texts containing numerous numerical
items was received more favourably. The study also reported that interpreters changed their approach to using
the ASR output as they became more familiar with it over the three experimental rounds.
This means that repeated exposure appeared to lead to adjustments in their need for ASR
reference. Examples of adjustments include checking the ASR output to confirm their
understanding or fill in gaps in their notes, and using this for specific types of information
(like numbers or names). Some interpreters shifted their note-taking to focus on captur-
ing only the most essential ideas, knowing ASR could provide more detailed information
(2022, 948). Li (2016) and Zhang (2020) focused on the overall accuracy of ASR software,
like Dragon NaturallySpeaking, and its impact on (consecutive) interpreting performance.
While Li (2016) found that ASR assists in improving interpreting performance, issues with
strong accents and rapid speech were notable. Similarly, Zhang (2020) reported a modest
improvement in fidelity but observed a decrease in fluency and time management when using
ASR technology (Li, H. Y., 2016; Zhang, 2020). Xin’s (2023) study specifically examined
the translation of proper nouns in Japanese–Chinese CI, suggesting that ASR technology
particularly aids with longer or more complex nouns. However, it also introduces chal-
lenges such as psychological dependency and potential errors from low recognition accu-
racy (Xin, 2023). Qin (2021) adopted a similar methodology to assess performance across
different text lengths and language directions. The findings suggested that high-accuracy
ASR technology consistently improves interpreting quality, with more pronounced benefits for
longer texts. Bu (2021) tested 20 graduate students interpreting speeches by Jack Ma on
education and AI, with and without iFlyNote speech recognition software. The findings
suggest that while the software can reduce repetition and self-correction in formal materi-
als, it generally lowers accuracy, efficiency, and fluency in interpreting, especially with col-
loquial materials. Li (2023) explored the efficiency of interpreters using iFlyRec software to
replace traditional note-taking tasks. The results showed significant improvements in inter-
preting scores with high speech recognition accuracy, underlining the importance of mate-
rial difficulty and individual differences in the effectiveness of ASR technology (Li, 2023).
Overall, the studies reviewed above indicate a positive trend toward integrating
ASR technology in CI. However, they also highlight the need for improvements in software
accuracy and adaptation to specific interpreting challenges. At the same time, studies in
this area share some methodological limitations. One such limitation is
that the sample sizes are not large enough to allow for the generalisability of the findings.
Another limitation is that, apart from Lee (2022), studies usually recruit trainees rather
than professional interpreters. The methodologies of the studies also show that data
analysis of rendition accuracy and quality is approached from very different frameworks
and viewpoints, including linguistic micro-analysis of source–target correspondence,
perceived quality through surveys, and interrater agreement. This further underlines the
importance of developing robust quality-assessment frameworks for empirical data
analysis in ASR-assisted CI.

6.2.3.2 Respeaking-assisted hybrid consecutive interpreting


Building on the modus operandi of the previously mentioned simultaneous-consecutive
method, Chen and Kruger (2023) proposed a new ASR- or ST-supported hybrid modality
called the ‘computer-assisted consecutive interpreting’ (CACI) model. Their approach
integrates speech recogni-
tion, machine translation (MT), and respeaking. Here, respeaking refers to a technique in
which a professional listens to the original audio of a live programme or event and repeats
or reformulates it in the same language. This reformulated speech is input into speech rec-
ognition software, which converts it into subtitles displayed on the screen with minimal
delay (Romero-Fresco, 2011). In general terms, the proposed CACI model restructures
the two phases of consecutive interpreting mode by assisting with certain cognitive tasks.
Inspired by Gile (2009), the authors refer to the aforementioned phases as ‘Phase I’ and
‘Phase II’. Phase I includes sub-processes of listening and analysis, note-taking, short-term
memory operations, and coordination (Gile, 2009). In Phase I of CACI, the interpreter lis-
tens to the source speech and simultaneously respeaks it intralingually into an ASR system.
Subsequently, the ASR output text is fed into an MT system, which automatically translates it
into the target language. Moving to Phase II of this adapted model, the interpreter utilises
both the diluted ASR output (the intralingual respoken source text) and MT texts to pro-
duce the final target speech. The authors hypothesise that this technique potentially reduces
the cognitive load associated with the conventional steps of note-taking and memory recall.
However, the fact that the interpreter simultaneously needs to speak to the ASR while the
source speech continues raises questions about the feasibility of such a method. For use
cases, the authors offer two scenarios for both on-site and remote interpreting:

As to the application of CACI in the real world, the scenarios would be different in
remote and on-site interpreting. In remote CI (referring to cases in which the inter-
preter is located remotely from the other parties), the interpreter simply needs to mute
herself on the conference software in Phase I and unmute in Phase II. In on-site CI, the
interpreter would need a stenomask – a microphone built into a padded, sound-proof
enclosure that fits over the speaker’s mouth – a device used by court reporters in
creating proceedings via SR. On the one hand, the steno-mask ensures SR quality in
noisy environments; on the other hand, it silences the user’s voice so that it does not
interfere with the surrounding environment, and in our case, the conference where
the interpreting takes place.
(Chen and Kruger, 2023, 8)
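Abstracting away from specific products, the two-phase CACI workflow can be sketched as a small pipeline in which the ASR and MT components are stand-in stubs. None of the names below refer to the actual iFlytek or Baidu Translate APIs; this is only a structural sketch of the model’s two phases:

```python
from dataclasses import dataclass

@dataclass
class PhaseOneOutput:
    """What the interpreter carries into Phase II: respoken transcript plus MT draft."""
    respoken_transcript: str   # intralingual respeaking, recognised by ASR
    mt_draft: str              # machine translation of that transcript

# --- Stub components standing in for real ASR / MT engines ----------------
def asr(respoken_audio: str) -> str:
    # A real system would decode audio; here the "audio" is already text.
    return respoken_audio.strip()

def mt(source_text: str) -> str:
    # Placeholder "translation": tags the text instead of translating it.
    return f"<target-language draft of: {source_text}>"

def phase_one(source_speech: str, respeak) -> PhaseOneOutput:
    """Phase I: listen, respeak intralingually into ASR, feed the result to MT."""
    transcript = asr(respeak(source_speech))
    return PhaseOneOutput(transcript, mt(transcript))

def phase_two(p1: PhaseOneOutput, render) -> str:
    """Phase II: the interpreter produces the target speech from both texts."""
    return render(p1.respoken_transcript, p1.mt_draft)

# Usage: a respeaker that condenses the source, and a renderer that works from the MT draft.
p1 = phase_one("Good morning everyone, and, uh, welcome to the briefing",
               respeak=lambda s: "Good morning, welcome to the briefing")
target = phase_two(p1, render=lambda src, draft: draft)
print(target)
```

The sketch makes the hypothesised division of labour visible: note-taking and memory recall are replaced by the respoken transcript and the MT draft, while the interpreter’s judgement remains in the `respeak` and `render` steps.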

The study involved six undergraduate English majors with experience in conventional CI.
The participants underwent a ten-week online and offline training programme on CACI;
however, the authors did not provide specific details about this programme, leaving it
unclear whether it included training in respeaking. iFlytek was used for ASR, while Baidu
Translate was used for MT. The results showed that the participants delivered more fluently
with CACI than in conventional CI, leading the authors to consider the method effective
when proper CACI training is provided (2023). It is important to acknowledge the limitations of this method
in terms of certain practical aspects. For example, wearing a stenomask for extended peri-
ods can be uncomfortable and cause fatigue, especially for interpreters working on lengthy
assignments. Using a stenomask for CACI, as well as the multi-step nature of the method,
results in a trade-off between improved speech recognition quality and potential drawbacks
relating to communication, comfort, technical dependency, and psychological impact.

6.3 Conclusion
This chapter takes stock of the designs and implementations of various technologies applied
in CI, with a broad review of academic research in practice and training. Although CI has
traditionally been viewed as a non-technical field, it does have the potential to incorporate
various tools and technologies and enhance different phases of the interpreting process. It
is evident that the interplay between technology and consecutive interpreting occurs within
different methods and approaches. Although the number of studies remains very limited,
the application of technology in CI through different approaches has been experimentally
tested in various geographies, with different language pairs and directionalities. These
applications and experiments have involved technologies such as digital pens/smartpens,
ASR models, tablets, and hybrid modalities (simultaneous-consecutive and CACI). Recent
studies also show that advancements in speech recognition have increased interest
among scholars and students in CAI and ASR-integrated technologies applied in interpreting.
Despite these efforts, challenges and limitations in research persist and require further
discussion. A common limitation appears to be the subjects involved
in the experiments, which are mostly trainee interpreters or new graduates. Studies involv-
ing professionals in ecologically valid experimental set-ups should eventually bridge this
gap. More studies based on practical experiments and empirical evidence are needed, par-
ticularly for ASR-assisted CI. Moreover, studies that examine the cognitive loads involved
in ASR-assisted CI are still very limited in number.
A general overview of the field shows that technology in CI has not been widely adopted
in training or professional settings. This reflects its modest popularity and the lack of sys-
tematic integration into interpreter training curricula. Despite the recognised advantages
and the growing priority given to digital literacy and ICT skills in higher education, the
integration of these technologies in interpreting classrooms remains limited and should be
addressed. Nevertheless, the growing momentum in research will surely open new areas for
broader generalisability. The availability of various technologies and of multimodal func-
tions presents potential paths for future academic studies on CAI applied in CI.
On the technical side, recent advancements in speech recognition technology have led to
significant progress (Baevski et al., 2020; Barrault et al., 2023a), making it applicable across
a wide range of uses. State-of-the-art foundation models, such as Whisper (Radford et al.,
2023), Google USM (Zhang et al., 2023), w2v-BERT 2.0 v1 (Barrault et al., 2023a), and
w2v-BERT 2.0 v2 (Barrault et al., 2023b), have gained widespread popularity and are used
extensively in both academia and industry. Their robust performance, reflected in lower
word error rates, is promising for addressing the issues and challenges present in
ASR-assisted CI. Thus, related CAI tools might soon benefit from these advanced
foundational models in a fast cloud environment. Furthermore, the emergence of large
language models (LLMs), with their remarkable ability to understand, analyse, and generate
text on the basis of vast amounts of pre-trained knowledge, has revolutionised both
academic natural language processing (NLP) research and industrial products. Since LLMs
capture complex linguistic patterns, semantic relationships, and contextual cues, they can
produce high-quality summaries that rival those crafted by humans. This capability is particularly
advantageous in ASR-assisted CI, where the transcription of source speech can be dense and
complex. That is to say, LLMs can effectively distil the main arguments, points, and ideas
from the transcription and provide concise information. As of now, direct integration of an
LLM or LLM-in-the-loop approach in an ASR pipeline for CI remains unexplored.
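As a rough illustration of the distillation step described above, the frequency-based extractive summariser below condenses a transcript to its most salient sentences. It is a deliberately simplistic stand-in: a real LLM-in-the-loop set-up would replace the scoring heuristic with a model call, but the input/output shape of the summarisation step would be the same.

```python
import re
from collections import Counter

def extractive_summary(transcript: str, n_sentences: int = 1) -> str:
    """Score sentences by summed word frequency; keep the top n in original order."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", transcript) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", transcript.lower()))
    # Rank sentence indices by total frequency of their words (highest first).
    ranked = sorted(range(len(sentences)), key=lambda i: -sum(
        freq[w] for w in re.findall(r"[a-z']+", sentences[i].lower())))
    keep = sorted(ranked[:n_sentences])  # restore original order
    return " ".join(sentences[i] for i in keep)

transcript = ("Interpreting research is growing. "
              "Interpreting research on interpreting technology is growing fast. "
              "The weather was fine.")
print(extractive_summary(transcript, 1))
```

Even this crude heuristic discards the off-topic sentence; an LLM would additionally abstract and rephrase, which is what makes it attractive for dense ASR transcripts in CI.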

References
Albl-Mikasa, M., 2017. Notation Language and Notation Text: A Cognitive-Linguistic Model of
Consecutive Interpreting. In Someya, Y., ed. Consecutive Notetaking and Interpreter Training.
Routledge, London, 71–117.
Altieri, M., 2020. Tablet Interpreting: Étude expérimentale de l’interprétation consécutive sur tab-
lette. The Interpreters’ Newsletter 25, 19–35. URL https://doi.org/10.13137/2421-714X/31235
Arriaga-Prieto, C., Villamayor, I., Serrano Leiva, A., Cascallana Rodriguez, A., Rodríguez, S., Pozo
Huertas, A., Alonso González, A., 2023. Smarterp Educational: A Virtual Laboratory to Train
Simultaneous Interpreting. In Proceedings of the 15th International Conference on Education and
New Learning Technologies (EDULEARN23), IATED Academy, Palma, Spain, 3257–3264. URL
https://doi.org/10.21125/edulearn.2023.0900
Baevski, A., Zhou, Y., Mohamed, A., Auli, M., 2020. wav2vec 2.0: A Framework for Self-Supervised
Learning of Speech Representations. In Proceedings of the 34th Conference on Neural Informa-
tion Processing Systems, 12449–12460.
Barrault, L., Chung, Y.-A., Cora Meglioli, M., Dale, D., Dong, N., Duquenne, P.-A., Elsahar, H.,
Gong, H., Heffernan, K., Hoffman, J., Klaiber, C., Li, P., Licht, D., Maillard, J., Rakotoarison,
A., Ram Sadagopan, K., Wenzek, G., Ye, E., Akula, B., Chen, P.-J., El Hachem, N., Ellis, B.,
Mejia Gonzalez, G., Haaheim, J., Hansanti, P., Howes, R., Huang, B., Hwang, M.-J., Inaguma,
H., Jain, S., Kalbassi, E., Kallet, A., Kulikov, I., Lam, J., Li, D., Ma, X., Mavlyutov, R., Peloquin,
B., Ramadan, M., Ramakrishnan, A., Sun, A., Tran, K., Tran, T., Tufanov, I., Vogeti, V., Wood,
C., Yang, Y., Yu, B., Andrews, P., Balioglu, C., R. Costa-jussà, M., Celebi, O., Elbayad, M., Gao,
C., Guzmán, F., Kao, J., Lee, A., Mourachko, A., Pino, J., Popuri, S., Ropers, C., Saleem, S.,
Schwenk, H., Tomasello, P., Wang, C., Wang, J., Wang, S., 2023a. Seamless M4T-Massively Multi-
lingual & Multimodal Machine Translation. arXiv preprint arXiv:2308.11596. URL https://arxiv.
org/abs/2308.11596
Barrault, L., Chung, Y.-A., Coria Meglioli, M., Dale, D., Dong, N., Duppenthaler, M., Duquenne,
P.-A., Ellis, B., Elsahar, H., Haaheim, J., Hoffman, J., Hwang, M.-J., Inaguma, H., Klaiber, C.,
Kulikov, I., Li, P., Licht, D., Maillard, J., Mavlyutov, R., Rakotoarison, A., Ram Sadagopan, K.,
Ramakrishnan, A., Tran, T., Wenzek, G., Yang, Y., Ye, E., Evtimov, I., Fernandez, P., Gao, C.,
Hansanti, P., Kalbassi, E., Kallet, A., Kozhevnikov, A., Mejia Gonzalez, G., San Roman, R., Touret,
C., Wong, C., Wood, C., Yu, B., Andrews, P., Balioglu, C., Chen, P.-J., R. Costa-jussà, M., Elbayad,
M., Gong, H., Guzmán, F., Heffernan, K., Jain, S., Kao, J., Lee, A., Ma, X., Mourachko, A., Pelo-
quin, B., Pino, J., Popuri, S., Ropers, C., Saleem, S., Schwenk, H., Sun, A., Tomasello, P., Wang,
C., Wang, J., Wang, S., Williamson, M., 2023b. Seamless: Multilingual Expressive and Streaming
Speech Translation. arXiv preprint arXiv:2312.05187. URL https://arxiv.org/abs/2312.05187
Baselli, V., 2023. Developing a New CAI-Tool for RSI Interpreters’ Training: A Pilot Study. In Orăsan,
C., Mitkov, R., Corpas Pastor, G., Monti, J., eds. Proceedings of the International Conference
on Human-Informed Translation and Interpreting Technology (HiT-IT 2023), Naples, Italy.
INCOMA Ltd., Shoumen, Bulgaria, 157–166.
Biagini, G., 2015. Glossario cartaceo e glossario elettronico durante l’interpretazione simultanea: uno
studio comparativo (Unpublished MA dissertation). Università di Trieste.
Bu, X., 2021. A Report of Automatic Speech Recognition’s Impacts on Chinese-Japanese Simultane-
ous Interpreting of Numbers: A Case Study of iFlyrec (Unpublished MA dissertation). Dalian
University of Foreign Languages.
Camayd-Freixas, E., 2005. A Revolution in Consecutive Interpretation: Digital Voice-Recorder-Assisted
CI. The ATA Chronicle 3, 40–46.
Chen, S., Kruger, J.-L., 2023. The Effectiveness of Computer-Assisted Interpreting: A Preliminary
Study Based on English-Chinese Consecutive Interpreting. Translation and Interpreting Studies
18(3). URL https://doi.org/10.1075/tis.21036.che

Technology-enabled consecutive interpreting

Cheung, A., Tianyun, L., 2018. Automatic Speech Recognition in Simultaneous Interpreting: A New
Approach to Computer-Aided Interpreting. In Proceedings of Ewha Research Institute for Transla-
tion Studies International Conference. Ewha Womans University.
Chitrakar, R., 2016. Tehnološko podprto konsekutivno tolmačenje [Technologically Supported Con-
secutive Interpreting] (Unpublished PhD thesis). University of Ljubljana.
Defrancq, B., Fantinuoli, C., 2021. Automatic Speech Recognition in the Booth: Assessment of Sys-
tem Performance, Interpreters’ Performances and Interactions in the Context of Numbers. Target
33(1), 73–101.
Dellantonio, E., 2023. Utilizzo del CAI Tool Sight-Terp in interpretazione consecutiva: Impiego del
CAI tool per la risoluzione di passaggi sintatticamente complessi nella combinazione linguistica
tedesco-italiano (Unpublished MA dissertation). Leopold-Franzens-Universität Innsbruck.
Drechsel, A., Goldsmith, J., 2016. Tablet Interpreting: The Evolution and Uses of Mobile Devices in
Interpreting. In Proceedings of CUITI Forum 2016.
Fantinuoli, C., 2017. Speech Recognition in the Interpreter Workstation. In Proceedings of the Trans-
lating and the Computer 39. Editions Tradulex, Geneva, 25–34.
Fantinuoli, C., 2018. Computer-Assisted Interpreting: Challenges and Future Perspectives. In
Durán, I., Corpas, G., eds. Trends in E-Tools and Resources for Translators and Interpreters. Brill,
153–174.
Fantinuoli, C., 2023. Towards AI-Enhanced Computer-Assisted Interpreting. In Corpas Pastor, G.,
Defrancq, B., eds. Interpreting Technologies – Current and Future Trends. John Benjamins Pub-
lishing Company, 47–72. URL https://doi.org/10.1075/ivitra.37.01orl
Fantinuoli, C., Marchesini, G., Landan, D., 2022. Interpreter Assist: Fully-Automated Real-Time
Support for Remote Interpretation. In Proceedings of Translator and Computer 53 Conference.
Ferrari, M., 2001. Consecutive Simultaneous? SCIC News 26, 2–4.
Ferrari, M., 2002. Traditional vs. Simultaneous consecutive. SCIC News 29, 6–7.
Ferrari, M., 2007. Simultaneous Consecutive Revisited. SCIC News 124. URL https://iacovoni.files.wordpress.com/2009/01/simultaneousconsecutive-2.pdf (accessed 9.3.2024).
Frittella, F.M., 2023. Usability Research for Interpreter-Centred Technology: The Case Study of
SmarTerp. Language Science Press.
Frittella, F.M., Rodríguez, S., 2022. Putting SmartTerp to Test: A Tool for the Challenges of Remote
Interpreting. inContext 2(2), 137–166. URL https://doi.org/10.54754/incontext.v2i2.21
Gaber, M., Corpas Pastor, G., Omer, A., 2020. Speech-to-Text Technology as a Documentation Tool
for Interpreters: A New Approach to Compiling an Ad Hoc Corpus and Extracting Terminology
from Video-Recorded Speeches. TRANS. Revista De Traductología 24, 263–281. URL https://doi.org/10.24310/TRANS.2020.v0i24.7876
Gacek, M., 2015. Softwarelösungen für DolmetscherInnen (Unpublished MA dissertation). Univer-
sity of Vienna.
Gile, D., 1995. Basic Concepts and Models for Interpreter and Translator Training. John Benjamins,
Amsterdam.
Gile, D., 2001. The Role of Consecutive in Interpreter Training: A Cognitive View. Communicate 14.
URL http://aiic.net/p/377
Gile, D., 2009. Basic Concepts and Models for Interpreter and Translator Training, revised ed. John
Benjamins, Amsterdam. URL https://doi.org/10.1075/btl.8
Gillies, A., 2019. Consecutive Interpreting: A Short Course. Routledge, London.
Giustini, D., 2024. “You Can Book an Interpreter the Same Way You Order Your Uber”: (Re)Inter-
preting Work and Digital Labour Platforms. Perspectives 1–19. URL https://doi.org/10.1080/0907676X.2023.2298910
Goldsmith, J., 2018. Tablet Interpreting: Consecutive Interpreting 2.0. Translation and Interpreting
Studies 13(3), 342–365. URL https://doi.org/10.1075/tis.00020.gol
Goldsmith, J., 2023. Cymo Note: Speech Recognition Meets Automated Note-Taking. The Tool Box
Journal 23(2), 34. URL www.internationalwriters.com/toolkit/current.html
Goldsmith, J., Holley, J., 2015. Consecutive Interpreting 2.0: The Tablet Interpreting Experience
(Unpublished MA dissertation). University of Geneva.
Gomes, M., 2002. Digitally Mastered Consecutive: An Interview with Michele Ferrari. Lingua Franca:
Le Bulletin de l’interpretation au Parlement Européen 5/6, 6–10.

The Routledge Handbook of Interpreting, Technology and AI

Guo, M., Han, L., Anacleto, M.T., 2023. Computer-Assisted Interpreting Tools: Status Quo and
Future Trends. Theory and Practice in Language Studies, 13(1), 89–99.
Hamidi, M., 2006. Simultanes Konsekutivdolmetschen. Ein experimenteller Vergleich im Sprachen-
paar Französisch-Deutsch (Unpublished MA dissertation). University of Vienna.
Hamidi, M., Pöchhacker, F., 2007. Simultaneous Consecutive Interpreting: A New Technique Put to
the Test. Meta 52(2), 276–289. URL https://doi.org/10.7202/016070ar
Hansen-Schirra, S., 2012. Nutzbarkeit von Sprachtechnologien für die Translation. trans-kom 5(2),
211–226.
Hawel, K., 2010. Simultanes versus klassisches Konsekutivdolmetschen: Eine vergleichende textuelle
Analyse (Unpublished MA dissertation). University of Vienna.
Hiebl, B., 2011. Simultanes Konsekutivdolmetschen mit dem Livescribe Echo Smartpen (Unpublished
MA dissertation). University of Vienna.
Kellet Bidoli, C.J., 2016. Traditional and Technological Approaches to Learning LSP in Italian to
English Consecutive Interpreter Training. In Garzone, G., Heaney, D., Riboni, G., eds. Focus on
LSP Teaching: Developments and Issues. LED, 103–126.
Kellet Bidoli, C.J., Vardè, S., 2016. Digital Pen Technology and Consecutive Note-Taking in the Class-
room and Beyond. In Zehnalová, J., Molnár, O., Kubánek, M., eds. Interchange Between Lan-
guages and Cultures: The Quest for Quality. Palacký University, 131–148.
Lee, J.-R.-A., 2021. Preliminary Research on the Application of Automatic Speech Recognition in
Interpretation. The Journal of Humanities and Social Science 21 12(5), 2407–2422. URL https://doi.org/10.22143/HSS21.12.5.170
Lee, J.-R.-A., 2022. A Case Study on the Usability of Automatic Speech Recognition as an Auxil-
iary Tool for Consecutive Interpreting. The Journal of Humanities and Social Science 21 13(4),
937–952. URL https://doi.org/10.22143/HSS21.13.4.66
Li, F.Z., 2023. The Impact of Speech Recognition on the Efficiency of Interpreters in Consecu-
tive Interpreting (Unpublished MA dissertation). Inner Mongolia University. URL https://doi.org/10.27224/d.cnki.gnmdu.2023.001088
Li, H.Y., 2016. The Auxiliary Role of Speech Recognition Technology in Consecutive Interpreting
(Unpublished MA dissertation). Sichuan Foreign Languages University. URL https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CMFD201701&filename=1016072373.nh
Lombardi, J., 2003. DRAC Interpreting: Coming Soon to a Courthouse Near You? Proteus 12(2), 7–9.
Ma, Z., 2022. A Comparative Study of Student Interpreters’ Performance in EC Simultaneous-
Consecutive and Consecutive Interpreting Modes (Unpublished MA dissertation). Beijing Foreign
Studies University. URL https://doi.org/10.26962/d.cnki.gbjwu.2022.000760
Mellinger, C.D., 2023. Embedding, Extending, and Distributing Interpreter Cognition with Tech-
nology. In Corpas Pastor, G., Defrancq, B., eds. Interpreting Technologies – Current and Future
Trends. John Benjamins, 195–216. URL https://doi.org/10.1075/ivitra.37.08mel
Mielcarek, M., 2017. Das simultane Konsekutivdolmetschen (Unpublished MA dissertation). Univer-
sity of Vienna.
Orlando, M., 2010. Digital Pen Technology and Consecutive Interpreting: Another Dimension in
Note-Taking Training and Assessment. The Interpreters’ Newsletter 15, 71–86.
Orlando, M., 2014. A Study on the Amenability of Digital Pen Technology in a Hybrid Mode of
Interpreting: Consec-Simul with Notes. Translation and Interpreting 6(2), 39–54.
Orlando, M., 2015. Implementing Digital Pen Technology in the Consecutive Interpreting Classroom.
In Andres, D., Behr, M., eds. To Know How to Suggest . . . Approaches to Teaching Conference
Interpreting. Frank & Timme, 171–199.
Orlando, M., 2016. Training 21st Century Translators and Interpreters: At the Crossroads of Prac-
tice, Research and Pedagogy. Frank & Timme.
Orlando, M., 2023. Using Smartpens and Digital Pens in Interpreter Training and Interpreting
Research. In Corpas Pastor, G., Defrancq, B., eds. Interpreting Technologies – Current and Future
Trends. John Benjamins Publishing Company, 6–26. URL https://doi.org/10.1075/ivitra.37.01orl
Orlando, M., Hlavac, J., 2020. Simultaneous-Consecutive in Interpreter Training and Interpreting
Practice: Use and Perceptions of a Hybrid Mode. The Interpreters’ Newsletter 25, 1–17.
Özkan, C.E., 2020. To Use or Not to Use a Smartpen: That Is the Question. An Empirical Study on
the Role of Smartpen in the Viability of Simultaneous-Consecutive Interpreting (Unpublished MA
dissertation). Ghent University.


Pisani, E., Fantinuoli, C., 2021. Measuring the Impact of Automatic Speech Recognition on Number
Rendition in Simultaneous Interpreting. In Wang, C., Binghan Zheng, B., eds. Empirical Studies of
Translation and Interpreting, 1st ed. Routledge, London.
Pöchhacker, F., 2007. Going Simul? Technology-Assisted Consecutive Interpreting. Forum 5(2),
101–124.
Pöchhacker, F., 2015. Simultaneous Consecutive. In Pöchhacker, F., ed. Encyclopedia of Interpreting
Studies. Routledge, London, 381–382.
Pöchhacker, F., 2016. Introducing Interpreting Studies, 2nd ed. Routledge, London.
Prandi, B., 2023. Computer-Assisted Simultaneous Interpreting: A Cognitive-Experimental Study on
Terminology. Language Science Press.
Qin, X., 2021. Experimental Report on the Impact of Speech Recognition on Consecutive Interpret-
ing (Unpublished MA dissertation). Southwest University of Finance and Economics. URL https://wf.pub/thesis/article:D02418773
Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C., Sutskever, I., 2023. Robust Speech Rec-
ognition via Large-Scale Weak Supervision. In Proceedings of the 40th International Conference
on Machine Learning, 28492–28518.
Restuccia, M., 2023. Interpretazione di conferenza e IA: Studio sperimentale su Sight-Terp e la con-
secutiva assistita (Unpublished MA dissertation). Università degli Studi di Trieste.
Rodríguez, S., Gretter, R., Matassoni, M., Falavigna, D., Alonso, A., Corcho, O., Rico, M., 2021.
SmarTerp: A CAI System to Support Simultaneous Interpreters in Real-Time. In Proceedings of
Triton 2021, 102–109.
Rodríguez González, E., Saeed, M.A., Korybski, T., Davitti, E., Braun, S., 2023. Assessing the Impact
of Automatic Speech Recognition on Remote Simultaneous Interpreting Performance Using the
NTR Model. In Corpas Pastor, G., Hidalgo-Ternero, C.M., eds. International Workshop on Inter-
preting Technologies – SAY IT AGAIN 2023. Málaga, Spain, 2–3.11.2023.
Romano, E., 2018. Teaching Note-Taking to Beginners Using a Digital Pen. Między Oryginałem a
Przekładem 24(42), 9–16.
Romero-Fresco, P., 2011. Subtitling Through Speech Recognition: Respeaking. Routledge, Manchester.
Rosado, T., 2013. Note-Taking with iPad: Making Our Life Easier. The Professional Interpreter
(Blog). URL http://rpstranslations.wordpress.com/2013/05/28/note-taking-with-ipad-making-our-life-easier-2/ (accessed 6.3.2024).
Rozan, J., 1956. La prise de notes en interprétation consécutive. Libraire de l’Université Georg, Geneva.
Seleskovitch, D., Lederer, M., 1989. Pédagogie raisonnée de l’interprétation. OPOCE/Didier Erudition.
Svoboda, S., 2020. SimConsec: The Technology of a Smartpen in Interpreting (Unpublished MA dis-
sertation). Palacký University Olomouc.
Tiselius, E., 2009. Revisiting Carroll’s Scales. In Angelelli, C.V., Jacobson, H.E., eds. American Trans-
lators Association Scholarly Monograph Series, Vol. XIV. John Benjamins Publishing Company,
95–121.
Ünlü, C., 2023. Automatic Speech Recognition in Consecutive Interpreter Workstation: Computer-Aided
Interpreting Tool ‘Sight-Terp’ (Unpublished MA dissertation). Hacettepe University.
Wang, X., Wang, C., 2019. Can Computer-Assisted Interpreting Tools Assist Interpreting? Translet-
ters. International Journal of Translation and Interpreting 3, 109–139.
Xin, X.Y., 2023. The Impact of Speech Recognition Software on Japanese-Chinese Consecutive Inter-
preting with a Focus on Proper Nouns (Unpublished MA dissertation). Dalian Foreign Studies
University. URL https://wf.pub/thesis/article:D03183164
Zhang, P., 2020. Empirical Study on the Impact of Speech Recognition Software on the Performance
of English-Chinese Consecutive Interpreters (Unpublished MA dissertation). Inner Mongolia Uni-
versity. URL https://wf.pub/thesis/article:D02007520
Zhang, Y., Han, W., Qin, J., Wang, Y., Bapna, A., Chen, Z., Chen, N., Li, B., Axelrod, V., Wang, G.,
Meng, Z., Hu, K., Rosenberg, A., Prabhavalkar, R., Park, D.S., Haghani, P., Riesa, J., Perng, G.,
Soltau, H., Strohman, T., Ramabhadran, B., Sainath, T., Moreno, P., Chiu, C.-C., Schalkwyk, J.,
Beaufays, F., Wu, Y., 2023. Google USM: Scaling Automatic Speech Recognition Beyond 100 Lan-
guages. arXiv preprint arXiv:2303.01037. URL https://arxiv.org/abs/2303.01037
Zhu, M., 2023. Workflow for the Tech-Augmented Interpreter with Cymo Note. In 2023 Innovation
in Interpreting Summit. URL https://learn.techforword.com/courses/2012632/lectures/45432025
(accessed 9.3.2024).

7 Tablet Interpreting
Francesco Saina

7.1 Introduction
Since modern digital tablets began to spread in the early 2010s, and within the broader digitalisation of most interpreting-related processes (Saina, 2021a), interpreters have been exploring ways to incorporate these devices into their work (Drechsel et al., 2018). To date, this exploration has been driven more by practitioners than by academic researchers.
Digital technology is increasingly embraced in daily life for managing tasks and for externalising memory and cognitive effort (Grinschgl and Neubauer, 2022). Against this backdrop, tablet interpreting can be defined as 'the use of tablets to support one or more aspects of interpreting and its associated activities' (Goldsmith, 2023). It can be framed within the broader context (and growing domain of interest and study) of computer-assisted interpreting (CAI) or, more widely, machine- or digitally assisted interpreting, given the distinctive interfaces and interactive nature of tablets.
Despite the hesitations about adopting digital technology discussed in the interpreting studies literature (Tripepi Winteringham, 2010), both practice and recent research have begun to show how such tools can aid interpreting in various ways, for instance through dedicated terminology management or real-time AI-enabled assistance software (see Prandi, this volume).
Despite growing scholarly attention to interpreting technology and its implications for interpreting modalities, techniques, processes, and cognition, publications on tablet interpreting still consist primarily of grey literature (i.e. informal texts, such as blog or social media posts by practitioners reflecting on their experience with tablets). Indeed, fewer than a dozen academic publications currently exist on the topic (Goldsmith, 2023). Existing academic research on tablet interpreting is based on small participant samples, which are frequently unrepresentative of the profession and subject to significant methodological constraints, as will be detailed in the following sections. Moreover, most work on tablet interpreting is product-oriented, comparing device models and operating systems, considering tablet accessories (styluses, keyboards), or assessing the usefulness and usability of specific applications.

DOI: 10.4324/9781003053248-9 
Tablet interpreting

The development and embedding into tablets of systems based on artificial intelligence (AI) and, in particular, natural language processing (NLP), such as speech recognition and speech-to-text technology, machine translation (MT), and multilingual support, appear to be opening the way, primarily in professional practice, to new and hybrid interpreting modalities. These topics are discussed in the following sections, as well as in other chapters of the present volume (see contributions to this volume by Davitti, Fantinuoli, Prandi, and Ünlü). While such practical uses are already in place in the real world, albeit unevenly, according to the (mostly informal) reports of practising interpreters, tablet use is only slowly making its way into academic environments and formal training institutions.
This contribution provides an overview of the first years of tablet interpreting practice (Section 7.2) and research (Section 7.3). It describes the background to and current research gaps in the field and reports on the (still limited) adoption of tablet interpreting in professional practice across different interpreting modalities (Subsections 7.3.1 to 7.3.3), as documented by existing publications. In addition, the chapter highlights implementations of tablet interpreting in interpreter training (Section 7.4) and finally outlines directions for future investigation (Section 7.5).

7.2 Tablet interpreting in professional practice


Given their superior portability compared to desktop and laptop computers, tablets are increasingly being adopted in professional practice. They can be integrated into several interpreting-associated tasks, detailed hereafter, ranging from job preparation to support during the performance, actual service delivery, and even broader accessibility of the profession.
Digital systems and applications can support interpreters in the preparation phase of their workflow. This phase is an essential component of the activity, as it allows interpreters to become acquainted with the specialised terminology and domain knowledge required to perform their tasks adequately (Fantinuoli, 2017). At the same time, documentation, research, and preparation appear to be increasingly time-constrained due to the evolving needs and characteristics of the interpreting labour market, and the vast array of content available on the internet calls for assistance in managing and discerning relevant information. When handling preparatory documents for a meeting to be interpreted, for instance, computers, laptops, and tablets can replicate (and sometimes enhance) the 'conventional' paper-based experience: users can search and highlight words within a document (or across several documents simultaneously), annotate texts (with digital styluses, in the case of tablets), instantly share files with colleagues or others, conduct online research on a topic, and even use dedicated interpreting applications.
Thanks to their wide range of potential uses, interpreters can employ tablets, as with other forms of digital and portable equipment, to manage an array of interpreting-related activities: communication with end clients and team partners through email and instant messaging applications, on-device or online cloud document storage, and consultation of dictionaries and language apps. Interpreters can use them in preparation for both consecutive and simultaneous interpreting, and also during actual delivery. This provides interpreters with fast, effective ways to consult

documents and glossaries, or even to access the internet. Tablets can also be used to design and deliver educational activities in interpreter training settings. While most digital devices could serve the same purposes, tablets undoubtedly represent a practical and manageable companion for interpreters on the move (Drechsel and Goldsmith, 2016).
Tablets can even be used to perform and deliver distance (consecutive and simultaneous) interpreting services. Indeed, most videoconferencing and remote interpreting platforms or systems can also be accessed on tablets, both via web pages and through dedicated applications. Furthermore, the latest, most advanced tablets now offer the stable connectivity, robust performance, and equipment quality (in terms of CPU, RAM, and network capabilities) recommended or required for distance interpreting services. While tablets do not yet appear to be used to deliver interpreting itself, the capabilities of portable and mobile devices are expected to keep advancing, eventually making this option neither impractical nor uncommon.
Finally, tablets benefit from built-in accessibility features, such as speech synthesis and customisable text size to aid screen reading. These have the potential to enable interpreters with impairments to access and participate in the profession more easily than in the past. However, even though tablets have to some extent revolutionised the market of augmentative and alternative communication (AAC) tools by making portable, powerful devices accessible to wider audiences, advances in gesture recognition and other sign language technologies remain meagre. As a result, tablets have not yet gained a significant place in sign language interpreting research and practice.

7.3 Tablet interpreting research


As technology becomes increasingly prominent in interpreting practice, training, and continuing professional development, a larger body of research is needed. This is especially the case in a field such as tablet interpreting, where publications are significantly limited in number and relevance, owing to constraints of methodology, sample size, or scope, and to the fact that some studies focus exclusively on specific product aspects.
The involvement of a growing number of participants in these studies may indicate increased interest in tablet interpreting, but it does not yet allow any generalisation of findings. Additionally, much of the current work consists of interviews and surveys, with little experimental or empirical investigation. Interpreting students are often the participants in such trials, as they are more readily available and accessible than full-time professionals; moreover, the trials themselves are frequently conducted by students for an undergraduate or master's thesis.
Furthermore, existing studies tend to examine the use of tablets in conference interpreting, with no observation of their use in community or public service interpreting or in other non-conference settings. The following sections therefore offer an overview of the available literature on the adoption of tablets across various conference interpreting modalities, namely consecutive, simultaneous, and the emerging 'hybrid' approaches.


7.3.1 Tablets in consecutive interpreting


As mentioned earlier, research on tablet interpreting is still limited in both number and scope. Drawing on earlier survey research conducted with Josephine Holley for an unpublished MA thesis, Goldsmith (2018) reports interviews with six professional interpreters who used tablets for consecutive interpreting. The study underlines the effectiveness and benefits of this practice and posits that consecutive interpreting on tablets can 'equal and even outstrip pen and paper in most contexts', for instance by facilitating internet connectivity and access to relevant resources. Still, the very small participant sample should be borne in mind when weighing these conclusions.
Previous work on the use of tablets for consecutive interpreting (Goldsmith, 2017) focused only on a functional comparison of commercial note-taking applications, while also enumerating further possible advantages of taking notes on a digital device, such as unlimited 'pages' and 'ink', stylus stroke customisation, zoom-in/out functionalities, and integrated voice recording.
Both of the aforementioned works also identify and discuss further benefits and shortcomings of tablet interpreting. These range from access to reference materials or glossaries during the actual assignment to concerns regarding device battery life, technical glitches or crashes (and the resulting additional stress for the interpreter), inadvertent actions on the device screen, client misperceptions, and lack of tailored training.
More recent experimental research (Altieri, 2020) compares 'traditional' paper-based and tablet-based consecutive interpreting in a study involving 12 interpreting students, a sample that is again limited in number and includes no professional interpreters. The study found no considerable difference between the two conditions on the following evaluation parameters: duration of the interpreted delivery, length of speaking pauses, eye contact with the recipients of the interpreted communication, and accuracy and completeness of the rendered information. A reflective questionnaire administered to participants after the experiment revealed that the students generally perceived their performance to be better with 'conventional' paper support, although this perception may be susceptible to an 'experience bias'.
A common trait emerging from these early studies is that short training sessions, usually provided prior to the experiment so that participants can get acquainted with the technology, are not deemed sufficient for interpreters to feel confident and comfortable with a new device. If this technique is to be introduced into future professional practice, and if more balanced research is to be conducted, a more structured and prolonged training and learning approach is required. Also worthy of consideration is the extent to which tablets can assist in non-conference and community consecutive interpreting situations, or when they cannot provide access to the internet.

7.3.2 Tablets in simultaneous interpreting


Studies and considerations regarding the use of tablets in simultaneous interpreting are even scarcer than those on consecutive interpreting. While tablets are clearly regarded as a less cumbersome alternative to laptops for document searches and terminology retrieval in the booth (including through dedicated applications for interpreters), informal exchanges within the interpreter community and on social media highlight another use (Bertozzi and Cecchi, 2023): tablets can provide a backchannel for communication between boothmates in distance interpreting settings.
An unpublished MA thesis on mobile devices for simultaneous interpreting (Paone, 2016) appears to be the only available work focusing on the use of tablets in this modality. In this survey of 21 Austrian conference interpreters, participants reported using tablets to prepare for their simultaneous assignments, to look up terminology and take notes in the booth, and to record and listen back to their own renditions for personal reference.
Thus, in simultaneous interpreting the gap in academic research on the use of tablets, which is mostly documented in grey literature by practitioners reflecting on their experience with such devices, is even more evident.

7.3.3 Tablets in hybrid interpreting modalities


The integration of tablets in interpreting practice and the advancement of digital and language technology have opened new paths for experimenting with 'hybrid' modalities, expanding the framework of 'conventional' simultaneous and consecutive interpreting through the use of mobile devices like tablets.
One of these emerging modalities is 'simultaneous-consecutive interpreting', also referred to as 'SimConsec' (more thoroughly outlined and discussed by Ünlü, this volume). It consists of recording a speech (with a dedicated device or a tablet application) that would normally be rendered in consecutive mode, with or without note-taking. The interpreter then plays the recording back (possibly at a different speed, or pausing on certain segments, depending on the functionalities of the resource employed) as a backup during their consecutive delivery, or in order to produce a delayed simultaneous rendition of it. A small number of anecdotal reports and publications describe the use of SimConsec in international organisations, institutions, and courts (referenced in Goldsmith, 2023). Hamidi and Pöchhacker (2007) conducted a trial involving three experienced conference interpreters and concluded that this technology-assisted modality may enhance interpreting performance, with a more fluent and less hesitant delivery, closer source–target correspondence, and fewer prosodic deviations. Nevertheless, the very modest scale and methodological constraints of this experiment must not be disregarded when considering its reported findings.
In a series of publications, Marc Orlando (see Orlando, 2023, for an overview, and Orlando, this volume) has extensively explored the use of digital pens with audio- and video-capturing capabilities, devices that link the notes taken on 'special' paper to the words uttered while writing. In this case, however, research accounts seem to outnumber actual professional practice. Moreover, the specific use of tablets for this modality, as a substitute for smartpens and their dedicated notepads, has yet to be empirically explored.
Compared to other solutions, devices, or workarounds, tablets would certainly have a practical edge in SimConsec contexts. Given the nature of this hybrid modality, they would be less intrusive than laptops or notepads with external recording devices that interpreters would have to carry around, and a more flexible option when the interpreter needs to stand or change location during an assignment.


Another novel hybrid modality, which combines consecutive interpreting with sight
translation based on an automatically generated real-time speech transcription, is referred
to as ‘SightConsec’ (see also Orlando and Ünlü, this volume). In this case, too, the inter-
preter can take complete or reduced notes (as compared to ‘traditional’ consecutive) before
rendering the message by reading from, or referring to, the transcription. In this instance,
as emerging research is starting to report, a subsequent raw machine translation step could be added as a reference for the interpreter.
Generating running transcripts and draft translations is also becoming increasingly easy
on tablets, thanks to the integration of automatic speech recognition (ASR), MT, and multilingual support into digital and mobile devices. While no academic experimental or exploratory research has yet been conducted into this mode (either with or without tablets), dedicated commercial products and applications, specifically designed for interpreters,
are already available on the market. Initial tests of this modality could, for instance, com-
pare sight translation (based on a text transcription) with the simultaneous-oriented variant
(based on an audio recording). Researchers could then assess the potential impact of the
different backup channels on the interpreter’s consecutive performance. Another avenue of
research could also be to assess whether a combination of the two backup channels could
be a viable and effective option for interpreters.
However, despite having been explored in professional settings, both these hybrid modalities are still far from consolidated, regular use in interpreting practice, in spite of the potential usefulness and support they provide to interpreters working from spoken into signed languages.

7.4 Tablets in interpreter training


In parallel with its adoption in professional practice and exploration in research, technol-
ogy has also made its way into interpreter training. This integration aims to expand and
enhance the teaching and learning of interpreting (Sandrelli, 2015). In addition to multime-
dia content, virtual learning environments, and various types of digital resources, technol-
ogy can equally be utilised as a tool and a medium through which training is performed and
delivered. As with any other technology requiring particular equipment (Saina, 2021b), the
question of the availability of devices arises for institutions that wish to introduce tablet
interpreting into their programmes.
The growing capabilities of mobile devices, such as tablets or the previously mentioned
digital pens and smartpens (see Ünlü and Orlando, this volume), have supplemented the cen-
tral role of computers that were used for this purpose. This suggests that other technologies
and devices can also be effectively utilised in the training of interpreters (Frittella, 2021). The
potential to capture audio (and even video) while taking notes on a tablet and review it later
allows trainers and trainees alike to analyse interpreting performances more thoroughly. As a
result, they can elaborate on the process (thus working on the metacognition of interpreting)
and assess the entire performance rather than just the final product, that is, the interpreted
rendition (Saina, 2022). This approach to interpreter training has been thoroughly addressed
and studied by Orlando (2016), mostly in relation to digital pens for capturing and listening
back to recordings. Taking notes and practising consecutive interpreting on tablets can addi-
tionally ease the process of keeping records and creating (personal and shared) databases of
symbols, terminology, exercises, or content of speeches and lessons.

113
The Routledge Handbook of Interpreting, Technology and AI

Indeed, a commercial manual on tablet interpreting by Alexander Drechsel and Joshua Goldsmith (2020) hints at how tablets could be used in the classroom, for example, to share
the screen of the device and demonstrate note-taking in real time or to play back audio
and notes simultaneously so as to analyse the process afterwards. However, Moser-Mercer
(2015) warned against a premature introduction of tablets in interpreter training and sug-
gested their use only be phased in once learners had familiarised themselves with the ‘core
interpreting skills’. Nevertheless, the latest generations of interpreting trainees are extremely
conversant with, and sometimes even more comfortable using, digital devices as opposed
to pen and paper. Consequently, the appropriateness and convenience of imposing pen and
paper on students in the early stages of their training may be called into question today.
Yet, with the exception of a few sporadic (but significantly constrained) experiments, the use of tablets in interpreter training – both as a device to deliver training and as a tool to perform interpreting – remains largely unexplored. Such experiments include
Arumí and Sánchez-Gijón (2019), whose study focused on convertible laptops rather than
modern tablets. Conversely, Wang et al. (2023) compared the perceptions of interpreting
students and professional interpreters regarding the use of tablets by contrasting newly
conducted interviews with 28 beginners with those of the 6 practitioners reported by Goldsmith (2018). In this instance, fewer than a third of the former group expressed a preference for tablets over pen and paper.

7.5 Future research directions


Previous sections of the present contribution have covered the introduction of tablets in pro-
fessional interpreting practice (Section 7.2), research (Section 7.3), and training (Section 7.4).
These have reported on early adoption experiences (and their impact on interpreting tech-
niques) and outlined the current state and limitations of existing investigation endeavours.
The use of tablets in interpreting remains low. If research efforts on the subject are to be pursued, a broader range of studies should be devised. These could involve larger samples of participants and be grounded in more rigorous and robust methodologies, with a wider research purpose. Besides the research questions already raised earlier in the chapter, research on tablet interpreting should also span all interpreting modalities (including sign language interpreting) and uses in different interpreting settings. Research in this
area should also include greater involvement from experienced practitioners as opposed
to students and trainees. Avenues for future research on tablet interpreting could range
from exploring the general framework of technology which supports interpreting prepa-
ration and performance (CAI) to addressing the need, creation, and acquisition of new
or revamped skills, the effect on interpreters’ cognitive processes as users, and the role of
tablets in the evolution and future perception of the profession itself.

7.5.1 Tablets and interpreting technology


Future research regarding tablet use in the larger realm of technology-assisted interpreting
could focus on comparing device types and sources of aid input. Research could also focus
on optimising tool interfaces and functionalities, delving into more technical applications of
NLP and AI resources, and assessing the impact of interpreter–tablet interaction on delivery
techniques and modalities.


Digital mobile devices, including tablets, now hold the processing power and capacities
to support interpreting preparation, performance, and quality control (and even provide a
platform for the actual delivery of interpreting). Consequently, their applications and the
differences between these and other devices (mainly laptops and computers) need to be ade-
quately explored and studied. In interpreting technology research in general, there is scant
evidence of how interpreting work could be optimised using multiple inputs and external
prompts, be this via a laptop or a tablet. This perspective could also enable comparisons between device categories and their ergonomics (workstation set-up and space saving), and could establish whether there are significant differences in the benefits associated with one device over others, or across settings (for instance, in-person versus remote). This would help better define the appropriate scope and the best-suited scenarios for applications of these devices.
Following the trajectory of other studies on CAI tools (Frittella, 2023), further research
could also investigate how user interfaces of dedicated tablet applications and systems are
designed in order to create optimal interpreter–device interaction. In particular, an application designed and developed specifically for consecutive interpreting could feature dedicated note-taking functionalities, such as customisable shortcuts for frequently used symbols, signs, abbreviations, and terminology, or background pop-up windows containing knowledge that interpreters may need to repeatedly recall during their rendition (such as names, terms, or other relevant information).
More technical trials could also study how NLP-based systems, such as ASR, could be used by interpreters to assess and improve (both in real time and retrospectively) the quality of their renditions in professional settings. Similarly, technology such as face detection and eye tracking could potentially compensate for detected shortcomings of consecutive and tablet interpreting: to illustrate, interpreters could be prompted to look up, thereby improving eye contact, if they are found to be gazing at the screen too much.
Also worth further investigation is interpreting technique, specifically with regard to
consecutive interpreting. Studies could focus on potential variations in note-taking method
and style across tablet-enabled modalities (consecutive on a tablet, SimConsec, and SightConsec). Other studies could explore latency. Across the range of interpreting modes, some research has already investigated the maximum acceptable latency of the automated output of tools (Fantinuoli and Montecchio, 2022) for these to be cognitively adequate and effectively useful for interpreters. However, more can be done. Similarly, research focusing on interpreters' overreliance on these systems (Defrancq and Fantinuoli, 2021) is also minimal and can be further explored.

7.5.2 Tablet interpreting and skills acquisition


Tablets could also be a resource in interpreter training as both the medium and the object
of new skills acquisition. However, the adoption of tablets by educational settings largely
depends on (currently lacking) research evidence and use by professionals. Many professional interpreters remain hesitant to incorporate tablets into their work in a regular and extensive manner. The reasons for this warrant further, more comprehensive investigation. Surveys into interpreters' adoption of technology (Corpas Pastor and Fern, 2016)
and discussions within the interpreting community reveal multiple concerns. These include
the distrust of technology, device reliability vis-à-vis conventional ‘hard’ tools, reliance on


long-established work processes and habits, a shortage of evidence- and research-based benefits, and a lack of extended, specific training or upskilling programmes.
In the area of interpreter training (and continued professional development), research
could focus more precisely on the acquisition of new skills and competencies that are required
for interpreters to integrate and benefit from the use of tablets in interpreting. Research in this
area could also explore the learning curve related to the use of this technology and whether
this has had an impact on its limited adoption thus far compared to other interpreter tools.
In this vein, thanks to a more comprehensive body of investigation into interpreter–tablet interaction, new interpreting-specific skills could also emerge. Potential skills include the competence to best leverage different devices, or even to adapt interpreting styles and processes to the resources being used (e.g. by adjusting note-taking or décalage to the latency of a tool), and the expertise and critical thinking required to decide which tasks can be optimised through digital technology (and which scenarios are best suited to it). An education-oriented research path such as this could also guide instructional design proposals and assist in providing effective training on interpreting with these devices.

7.5.3 Tablet interpreting and cognition


As previously noted, in the broader domain of interpreting studies, research into the use
of tablets still requires further exploration. Areas of particular interest include investigat-
ing the cognitive impact on interpreters, whether the utilisation of tablets in consecutive
interpreting affects note-taking choices and processes, and also end user perception. This
final avenue is of particular interest, given the increasing concerns regarding data security
and confidentiality that are currently being voiced in relation to employing digital resources
and AI technology to process information. In addition to investigations that are specifi-
cally related to interpreting techniques and technologies, other process-oriented trials could
examine any cognitive impact or information overload that results from the use of tablets
in the interpreting process.
Indeed, cognitive studies traditionally depicted cognitive processes as mental operations
on representations inside the brain, with perception and action seen as application pro-
cesses. However, more recent approaches to situated cognition acknowledge the distribu-
tion and offloading of these processes to external elements, including technology, devices,
and media (Clark and Chalmers, 1998; Grinschgl and Neubauer, 2022; Sprevak, 2019). As
situated cognitive science has increasingly gained a place within translation and interpreting process research (Risku and Rogl, 2020), one can observe the additional cognitive coordination effort required by the use of digital technology. This is particularly evident in relation to the use of tablets (Moser-Mercer, 2015). However, this area of research still remains to be explored more extensively.
Research in other related areas, such as information comprehension and knowledge
retention across media (especially when using digital devices), has evidenced less ‘deep
reading’ and understanding when working with digital resources (Liu, 2005). Media and
communication studies (McLuhan, 1962, 1964; Ong, 1982) have already discussed how
different media resources and communication-mediating technologies can impact language
transfer and message reception processes, as well as human consciousness and knowledge
structures. Therefore, further investigation in this area could explore whether the behaviour
of digital readers – especially ‘aim-driven’ readers, like interpreters who are preparing for


an assignment by studying documents and texts on a tablet – may change over time. For
example, interpreters could be studied as they become increasingly better acquainted with the use of a digital device in their daily lives. Alternatively, studies could focus on the evolution
of the technological medium and how it influences comprehension and knowledge retention
over time.

7.6 Conclusion
While interest in digital technology in interpreting is increasing gradually, the use of tablets
still appears to be a niche area in both research and professional practice. There are currently
limited academic publications on the subject (in addition to a few anecdotal accounts),
and only restricted circles of practitioners resort to these devices in their regular work
routines. This chapter has aimed to provide a comprehensive overview of the early years
of tablet interpreting, from its exploratory uses in professional practice (Section 7.2) as a
portable and convenient device to support interpreter preparation and performance, to
research (Section 7.3) on its application in consecutive, simultaneous, and hybrid interpret-
ing modalities (with all its constraints and limitations) and its usability in aid of interpreter
training settings (Section 7.4).
Amidst the growing relevance of AI and the interest in its potential impact on interpreting, any AI-based application aimed at interpreters (from real-time, ASR-generated transcriptions to instant access to terminological databases) can easily be made available on tablets. In addition, the portability and multifunctionality that tablets possess make them ideal tools for the dynamic environments in which
interpreters often operate. However, potential associated hurdles (such as information overload or excessive reliance on system outputs) require further investigation. Additionally, the potential impact of tablet use on an interpreter's cognitive load remains relatively unexplored. It could be hypothesised that the combination of these components exacerbates interpreters' mental strain when they are already focusing on conveying accurate messages across languages.
Human-specific aspects of interpreting and cross-language communication should not be
neglected either. Nuances such as cultural peculiarities, contextual subtleties, and elements
beyond verbal language are essential for interpreting, multilingual communication, and
exchanges among people. These may be at risk if technology-driven efficiency is prioritised
over higher-level human connection. Therefore, while AI-equipped tablets in interpreting
can indeed offer promising benefits in terms of support and accuracy, they also present
critical challenges. These need to be carefully considered in order to ensure proper and
favourable integration of tablets into interpreting practices. Nevertheless, these assump-
tions, common to digitally assisted interpreting on any device, can only be assessed and
scrutinised through empirical, evidence-based research.
In conclusion, despite its potential, tablet interpreting does not yet constitute an established space in the interpreting field, given the limited amount of investigation conducted in this area and the sporadic reports of recent years, which nonetheless attest to a burgeoning interest in a technology not originally developed for interpreting purposes. Tablets nevertheless possess the potential to be widely adopted by the profession and may prove beneficial for interpreting in the future.


References
Altieri, M., 2020. Tablet Interpreting: Étude expérimentale de l’interprétation consécutive sur tab-
lette. The Interpreters’ Newsletter 25, 19–35.
Arumí, M., Sánchez-Gijón, P., 2019. La presa de notes amb ordinadors convertibles en l’ensenyament-
aprenentatge de la interpretació consecutiva. Resultats d’un estudi pilot en una formació de màster.
Revista Tradumàtica. Tecnologies de la Traducció 17, 128–152.
Bertozzi, M., Cecchi, F., 2023. Simultaneous Interpretation (SI) Facing the Zoom Challenge:
Technology-Driven Changes in SI Training and Professional Practice. In Corpas Pastor, G.,
Hidalgo-Ternero, C.M., eds. Proceedings of the International Workshop on Interpreting Tech-
nologies JUST SAY-IT 2023. Shoumen, Incoma, 32–40.
Clark, A., Chalmers, D., 1998. The Extended Mind. Analysis 58(1), 7–19.
Corpas Pastor, G., Fern, L.M., 2016. A Survey of Interpreters’ Needs and Practices Related to Language
Technology. Universidad de Málaga, Malaga. URL www.lexytrad.es/assets/Corpas-Fern-2016.pdf
(accessed 29.9.2024).
Defrancq, B., Fantinuoli, C., 2021. Automatic Speech Recognition in the Booth: Assessment of Sys-
tem Performance, Interpreters’ Performances and Interactions in the Context of Numbers. Target
33(1), 73–102.
Drechsel, A., Bouchard, M., Feder, M., 2018. Inter-Institutional Training Cooperation on the Use of
Tablets in Interpreting. CLINA 4(1), 105–114.
Drechsel, A., Goldsmith, J., 2016. Tablet Interpreting: The Use of Mobile Devices in Interpreting. In
Forstner, M., Lee-Jahnke, H., eds. CIUTI-Forum 2016: Equitable Education Through Intercul-
tural Communication: Role and Responsibility for Non-State Actors. Frankfurt, Peter Lang.
Drechsel, A., Goldsmith, J., 2020. The Tablet Interpreting Manual: A Beginner’s Guide. URL www.
techforword.com/ (accessed 29.9.2024).
Fantinuoli, C., 2017. Computer-Assisted Preparation in Conference Interpreting. Translation & Inter-
preting 9(2), 24–37.
Fantinuoli, C., Montecchio, M., 2022. Defining Maximum Acceptable Latency of AI-Enhanced CAI Tools. URL https://doi.org/10.48550/arXiv.2201.02792
Frittella, F.M., 2021. Computer-Assisted Conference Interpreter Training: Limitations and Future
Directions. Journal of Translation Studies 1(2), 103–142.
Frittella, F.M., 2023. Usability Research for Interpreter-Centred Technology: The Case Study of
SmarTerp. Language Science Press, Berlin.
Goldsmith, J., 2017. A Comparative User Evaluation of Tablets and Tools for Consecutive Interpret-
ers. In Esteves-Ferreira, J., Macan, J.M., Mitkov, R., Stefanov, O.M., eds. Proceedings of the 39th
Conference Translating and the Computer. Tradulex, Geneva, 40–50.
Goldsmith, J., 2018. Tablet Interpreting: Consecutive Interpreting 2.0. Translation and Interpreting
Studies 13(3), 342–365.
Goldsmith, J., 2023. Tablet Interpreting: A Decade of Research and Practice. In Corpas Pastor,
G., Defrancq, B., eds. Interpreting Technologies – Current and Future Trends. John Benjamins,
27–45.
Grinschgl, S., Neubauer, A.C., 2022. Supporting Cognition with Modern Technology: Distributed Cognition Today and in an AI-Enhanced Future. Frontiers in Artificial Intelligence 5. URL https://doi.org/10.3389/frai.2022.908261
Hamidi, M., Pöchhacker, F., 2007. Simultaneous Consecutive Interpreting: A New Technique Put to
the Test. Meta 52(2), 276–289.
Liu, Z., 2005. Reading Behavior in the Digital Environment: Changes in Reading Behavior Over the
Past Ten Years. Journal of Documentation 61(6), 700–712.
McLuhan, M., 1962. The Gutenberg Galaxy: The Making of Typographic Man. University of Toronto
Press, Toronto.
McLuhan, M., 1964. Understanding Media: The Extensions of Man. McGraw-Hill, New York.
Moser-Mercer, B., 2015. Technology and Interpreting: New Opportunities Raise New Questions.
URL https://oeb.global/oeb-insights/interpreting-technology/ (accessed 29.9.2024).
Ong, W.J., 1982. Orality and Literacy: The Technologizing of the Word. Methuen, London.
Orlando, M., 2016. Training 21st Century Translators and Interpreters: At the Crossroads of Prac-
tice, Research and Pedagogy. Frank & Timme GmbH.


Orlando, M., 2023. Using Smartpens and Digital Pens in Interpreter Training and Interpreting
Research: Taking Stock and Looking Ahead. In Corpas Pastor, G., Defrancq, B., eds. Interpreting
Technologies – Current and Future Trends. John Benjamins, 6–26.
Paone, M.D., 2016. Mobile Geräte beim Simultandolmetschen mit besonderem Bezug auf Tablets.
University of Vienna, Vienna. Unpublished.
Risku, H., Rogl, R., 2020. Translation and Situated, Embodied, Distributed, Embedded and Extended
Cognition. In Alves, F., Jakobsen, A., eds. The Routledge Handbook of Translation and Cognition.
Routledge, London, 478–499.
Saina, F., 2021a. Technology-Augmented Multilingual Communication Models: New Interaction
Paradigms, Shifts in the Language Services Industry, and Implications for Training Programs. In
Turchi, M., Fantinuoli, C., eds. Proceedings of the 1st Workshop on Automatic Spoken Language
Translation in Real-World Settings (ASLTRW). Association for Machine Translation in the Ameri-
cas, 49–59. URL https://aclanthology.org/2021.mtsummit-asltrw.5 (accessed 29.9.2024).
Saina, F., 2021b. Remote Interpreting: Platform Testing in a University Setting. In Mitkov, R., Sosoni, V., Giguère, J.C., Murgolo, E., Deysel, E., eds. Proceedings of the Translation and Interpreting Technology Online (TRITON) Conference, 57–67. URL https://doi.org/10.26615/978-954-452-071-7_007
Saina, F., 2022. Scenari didattici della mediazione linguistica nell’era digitale. In Petrocchi, V., ed.
Spunti e riflessioni per una didattica della traduzione e dell’interpretariato nelle SSML. Configni
(Rieti), CompoMat, 107–119.
Sandrelli, A., 2015. Computer-Assisted Interpreter Training. In Pöchhacker, F., ed. Routledge Ency-
clopedia of Interpreting Studies. Routledge, London, 75–77.
Sprevak, M., 2019. Extended Cognition. In Crane, T., ed. The Routledge Encyclopedia of Philosophy.
Routledge, London. URL https://dx.doi.org/10.4324/9780415249126-V049-1
Tripepi Winteringham, S., 2010. The Usefulness of ICTs in Interpreting Practice. The Interpreters’
Newsletter 15, 87–99.
Wang, Y., Tian, Y., Jiang, Y., Yu, Z., 2023. The Acceptance of Tablet for Note-Taking in Consecutive
Interpreting in a Classroom Context: The Students’ Perspectives. Forum for Linguistic Studies
5(2). URL https://doi.org/10.59400/fls.v5i2.1862

PART II

Technology and interpreter training


8
COMPUTER-ASSISTED
INTERPRETING (CAI) TOOLS
AND CAI TOOLS TRAINING
Bianca Prandi

8.1 Introduction
Computer-assisted interpreting (CAI) can be defined as the use of technological applications,
such as terminology management and automatic speech recognition (ASR), on hardware,
such as laptops or tablets, to provide support to interpreters for one or more sub-processes
in their workflows. The use of technologies as support (Braun, 2019) is but one of the
potential applications of technology to interpreting: Other areas are technology-enabled
interpreting (see Part I of this volume) and technology (semi-)automating interpreting
workflows (see Part III of this volume). These additional applications are strictly linked
with CAI: For instance, support is often achieved through automation. This is increasingly
true when it comes to artificial intelligence (AI), as the same technologies can be used both
to help interpreters and to automate interpretation, as is the case for ASR and machine
translation (MT). Another example of the growing interplay between CAI and other tech-
nological applications in interpreting is the inclusion of CAI functionalities, either for the
automation of subtasks in the preparation phase or for live support during the assignment,
in remote simultaneous interpreting (RSI) platforms (e.g. Rodríguez et al., 2021; Fantinuoli
et al., 2022).
The definition of CAI is therefore broad and can essentially be taken to mean
‘technology-supported interpreting’. The term CAI tool, however, has been used both in a
broad and in a narrow sense in scholarship. In some publications (e.g. Costa et al., 2014), it
indicates any piece of software interpreters can use as support in their workflows. Based on
this definition, even unit converters and terminology management programs for translators
could be considered CAI tools. Other scholars, particularly those carrying out empirical
research on this subset of interpreting technologies, define CAI tools more narrowly as

all sorts of computer programs and mobile applications specifically designed and
developed to assist interpreters in at least one of the different sub-processes of inter-
pretation, for example, knowledge acquisition and management, lexicographic mem-
orisation, terminology access, and so forth.
(Fantinuoli, 2021, 512)


DOI: 10.4324/9781003053248-11

This definition stresses the bespoke nature of these tools, often created by interpreters
for interpreters. An attempt to bridge these discrepant definitions following a functional
approach is offered by Guo et al. (2023, 91), who describe CAI tools as

pieces of computer software, mobile phone applications, or digital devices that can
be used during the interpreting process to reduce the cognitive stress that interpreters
face and to enhance overall processing capacity. They are an integral part of the inter-
preting process. They are also directly linked to and might positively affect the cog-
nitive processes that underlie the task of interpreting by reducing working-memory
stress, eliminating production difficulties, and such.

This definition is in line with Will’s (2020) classification of CAI tools according to whether
they provide support for the in-process phase (primary CAI tools) or the pre-process phase
of interpreting (secondary CAI tools) or for both (integrated CAI tools). Guo et al.'s definition can cover both primary and integrated CAI tools, but not secondary CAI tools, which leaves out a considerable number of applications developed for interpreters' advance preparation.
To avoid any terminological confusion and clearly identify the type of applications which will be discussed in this chapter, CAI tools will henceforth be defined as those applications capable of providing support pre-, in-, peri-, and/or post-process (see Kalina, 2005) and specifically developed with the goal of optimising interpreters' workflows and cognitive processes, increasing their productivity and ultimately improving the quality of their interpretation.
This chapter first discusses CAI tools from the point of view of their historical develop-
ment (Section 8.2), tracing their evolution, providing an overview of the current landscape,
and discussing their uptake and reception among professionals. With the goal of orienting future investigations, Section 8.3 spotlights the main areas of enquiry in empirical research
on CAI tools, that is, interpreter-centric research (Section 8.3.1) and research focusing on
the evaluation of CAI systems (Section 8.3.2). This central section offers a critical overview
of where we stand in terms of our understanding of interpreter–tool interaction, of the
implications of CAI tool use for interpreters’ cognitive processes and performance, of the
affordances and limitations of bespoke technologies for interpreters, and of the method-
ologies we adopt to investigate such topics. Training on and with CAI tools is discussed
in Section 8.4, which highlights current issues and open questions. The chapter concludes
by exploring potential future advancements in CAI, considering the recent progress in AI
technology.

8.2 Historical development of CAI tools


CAI tools are generally considered as a recent development in the field of interpreting tech-
nologies. However, the first CAI tools were made available at the beginning of the 2000s. This
section presents the evolution of CAI tools, illustrates the current landscape, and discusses
the current state of diffusion and the related perceptions and attitudes among practitioners.

8.2.1 Theoretical foundations and initial conceptualisations


The first conceptualisations of bespoke supporting tools for interpreters date back to the
early 2000s, when scholars started envisaging the use of technology to help interpreters


optimise their preparation workflows, rationalise their processes, and in turn, alleviate some
of the effort resulting from a highly demanding cognitive activity, such as interpreting. Stoll
(2009) suggested that CAI tools should aim to ‘move cognition ahead’ of the interpreting
task, by allowing interpreters to pre-process their preparation material, anticipating some
of the challenges they may encounter during the interpreting phase. When receiving the
text of a speech in advance, for instance, they could annotate the document with difficult
terminology, highlight names and numbers, and do so in a dedicated digital environment
that would facilitate this kind of preparatory work.
Interpreters have specific needs which partly differ from those of translators. They work
under high time pressure and can conduct terminological work mainly before the assign-
ment. They require highly personalised tools, offering speed of consultation and intuitive
navigation, allowing terminology lookup and updating in the booth (Rodríguez and Sch-
nell, 2009). Rütten (2004, 2007) sketched the ideal architecture of a tool for interpreters
which would support working with documents, extracting terminology, creating and stor-
ing glossaries, efficiently accessing terminological resources, and training. Starting from a
central homepage, interpreters would be able to access the different modules. This proposed
structure highlights how CAI tools were initially meant primarily for preparation, specifi-
cally for terminology work and glossary creation and management. The first tools were
therefore developed with these aims in mind and for conference interpreters working in the
simultaneous mode. At that time, simultaneous interpreting already saw the use of laptops
in the booth, whereas the use of tablets for consecutive interpreting (see, for example, Gold-
smith, 2018) was first envisaged only several years later. Additionally, the reduced visibility
of interpreters working in the simultaneous mode may foster the use of technology as sup-
port more than in consecutive interpreting.

8.2.2 Current landscape


With the rapid technological advances of the past few years, CAI tools have evolved
to become increasingly sophisticated supporting applications for interpreters. Unlike
general-purpose supporting technologies, still widely used by interpreters (Jiang, 2013;
Deysel, 2023), CAI tools present interpreter-specific advantages. They organise glossaries in
central databases, allow for quick retrieval of previously generated terminological resources
and for easy exchange of such resources in digital format. The growing integration of AI
and, especially, deep learning into CAI tools is progressively expanding the opportunities
offered to interpreters to leverage technology as support.
Up to four generations of tools (see Gaido et al., 2021; Ghent University, 2021) can be
identified based on their level of sophistication and types of functions offered, a distinction
which only partially reflects their historical development. Table 8.1 sums up the main fea-
tures of each generation of tools, providing examples for each.
The first generation of CAI tools includes applications mainly facilitating interpreters’
terminology management. These tools offer a simple structure for terminological entries
while allowing for the inclusion of additional information, such as grammatical gender,
source, or example sentences. No advanced algorithm is offered to speed up and refine
in-process queries, although it is possible to search the entire database. Some examples are
Interplex,1 Flashterm Interpreter,2 and Glossarmanager.3
Though still heavily oriented towards preparation, second-generation CAI tools offer a
richer suite of functionalities. They allow interpreters to process documents more efficiently,

The Routledge Handbook of Interpreting, Technology and AI

Table 8.1 Overview of CAI tools by generation

1st-generation CAI tools
Main focus: terminology work
Features: simple term entry structure; search function; glossary management
Example tools: Interplex, Flashterm Interpreter, Glossarmanager

2nd-generation CAI tools
Main focus: terminology extraction and knowledge management
Features: document management; manual terminology extraction; advanced search algorithm
Example tools: Boothmate, Intragloss

3rd-generation CAI tools
Main focus: automation
Features: automatic glossary creation; text summarisation; speech recognition; machine translation
Example tools: InterpretBank, Sight-Terp

4th-generation CAI tools
Main focus: integration into RSI platforms
Features: speech recognition; machine translation
Example tools: KUDO Interpreter Assist, SmarTerp&Me

thanks to dedicated interfaces for terminology extraction (e.g. Boothmate),4 and present an
advanced search algorithm for manual queries in the booth. When using these tools, inter-
preters need only type a few letters to start a query, without having to press the Enter key
and worry about typing mistakes.
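The "search-as-you-type" behaviour described above can be sketched as follows. This is an illustrative mock-up, not the implementation of any actual tool: the toy glossary, the two-letter threshold, and the fuzzy-matching cutoff are all invented assumptions.

```python
import difflib

# Invented toy glossary (English -> Italian), for illustration only
GLOSSARY = {
    "anticoagulant": "anticoagulante",
    "atrial fibrillation": "fibrillazione atriale",
    "stent": "stent",
}

def incremental_lookup(typed: str, max_results: int = 3):
    """Return glossary entries matching the partial query `typed`.

    Called on every keystroke, so no Enter key is needed; fuzzy
    matching absorbs small typing mistakes.
    """
    typed = typed.lower().strip()
    if len(typed) < 2:  # wait until at least two letters are typed
        return []
    # exact prefix matches first
    hits = [t for t in GLOSSARY if t.startswith(typed)]
    # then fuzzy matches, to tolerate typos
    for t in difflib.get_close_matches(typed, list(GLOSSARY),
                                       n=max_results, cutoff=0.6):
        if t not in hits:
            hits.append(t)
    return [(t, GLOSSARY[t]) for t in hits[:max_results]]
```

In a real tool the query would run against an indexed terminology database rather than a small dictionary, but the interaction pattern (a query on every keystroke, with tolerant matching) is the one described above.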
The true innovation in supporting technology for interpreters comes from a third gen-
eration of CAI tools which rely heavily on support through automation, often including AI
applications. This support covers all phases of the workflow: for instance, interpreters can
automatically extract terminology from documents within a few seconds, generate terminology
lists by providing the tool with a URL or domain-related keywords, and machine-translate
such lists or automatically summarise texts. Particularly prominent in third-generation tools
is the use of ASR (Fantinuoli, 2017b) and named entity recognition (NER) to automatically
prompt interpreters for common problem triggers, such as numbers, specialised terminol-
ogy, and named entities (Gile, 2009). InterpretBank5 is currently the only stand-alone CAI
tool in this category offering all these functions. Sight-Terp6 can also be regarded as a
third-generation CAI tool but does not offer preparation-related functions and only focuses
on the in-process phase.
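The in-process prompting idea just described can be illustrated with a minimal sketch that scans an ASR transcript segment for numbers and known terms. The regular expression, glossary, and prompt format below are invented for illustration and greatly simplify what tools like InterpretBank actually do:

```python
import re

# Invented toy glossary; a real tool would query its terminology database
GLOSSARY = {"export value": "valore delle esportazioni"}

# Naive pattern for numerals, optionally followed by a magnitude word
NUMBER_RE = re.compile(r"\b\d[\d.,]*\s*(?:billion|million|%)?", re.IGNORECASE)

def extract_prompts(asr_segment: str):
    """Return prompts (numbers, known terms) for one ASR transcript segment."""
    prompts = [m.group(0).strip() for m in NUMBER_RE.finditer(asr_segment)]
    lowered = asr_segment.lower()
    prompts += [f"{t} = {tr}" for t, tr in GLOSSary.items()] if False else \
               [f"{t} = {tr}" for t, tr in GLOSSARY.items() if t in lowered]
    return prompts

extract_prompts("Export value decreased by 3 billion US dollars in 2019")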
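The in-process prompting idea just described can be illustrated with a minimal sketch that scans an ASR transcript segment for numbers and known terms. The regular expression, glossary, and prompt format below are invented for illustration and greatly simplify what tools like InterpretBank actually do:

```python
import re

# Invented toy glossary; a real tool would query its terminology database
GLOSSARY = {"export value": "valore delle esportazioni"}

# Naive pattern for numerals, optionally followed by a magnitude word
NUMBER_RE = re.compile(r"\b\d[\d.,]*\s*(?:billion|million|%)?", re.IGNORECASE)

def extract_prompts(asr_segment: str):
    """Return prompts (numbers, known terms) for one ASR transcript segment."""
    prompts = [m.group(0).strip() for m in NUMBER_RE.finditer(asr_segment)]
    lowered = asr_segment.lower()
    prompts += [f"{t} = {tr}" for t, tr in GLOSSARY.items() if t in lowered]
    return prompts
```

Named entity recognition, speaker diarisation, and latency handling are deliberately omitted; the point is only the trigger-then-prompt loop that the text describes.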
Some authors suggest adding a fourth generation of tools, distinguishing cloud-based
applications from desktop-based ones. The addition of a fourth generation may
indeed be warranted by the fact that interpreters working for RSI platforms can now use
integrated CAI tools, as is the case for SmarTerp&Me (Rodríguez et al., 2021) and KUDO
Interpreter Assist (Fantinuoli et al., 2022). Figure 8.1 sums up the available CAI tools, the
phases they support, and the integrated technologies.

8.2.3 CAI tool use and reception


Despite CAI tools’ potential to support interpreters’ workflows and relieve some of their
cognitive effort, their uptake among practitioners still seems to be quite modest. This is
at least the picture which emerges from the available surveys dedicated to interpreters’

use of technology, which sometimes include CAI tools among the types of applications
surveyed.

Figure 8.1 Currently available CAI tools and key characteristics.

At present, however, we can mostly formulate hypotheses as to interpreters' actual
use of CAI tools and their attitudes towards these applications and, at best, draw tentative
conclusions. Obtaining a clear picture of CAI tool use is complicated by the current dearth
of data. The picture of an interpreter only modestly using supporting technologies, and in
particular, CAI tools, emerges from data collected several years ago, before the COVID-19
pandemic, which shifted much of interpreting online, with an increasing integration of
technology into interpreters’ workflows. For this reason, the surveys often cited to paint the
said picture may reflect outdated views and facts about interpreting technologies in general
and CAI tools in particular. Even before the pandemic, the usefulness of the data collected
for inferring interpreters’ adoption of CAI tools was limited by a certain lack of clarity as to
what was considered a CAI tool or which technologies were under investigation.
The seemingly limited uptake of CAI tools by interpreters is often attributed to an
attitude of general scepticism (see, for example, Tripepi Winteringham, 2010). However,
interpreters' motivations for rejecting technologies are rarely surveyed; such attributions
often stem from scholars' anecdotal observations or suppositions. Exceptions include the
surveys conducted by Mellinger and Hanson (2018), Deysel (2023) and Fan (2024). The first
pulled together several validated instruments to investigate potential relationships between
interpreters’ communication apprehension, visibility, and personal technology adoption
propensity. The survey highlighted how additional factors, such as the interpreting setting,
interpreters’ self-perception of their role, and even the availability of technologies devel-
oped to support interpreting, may explain interpreters’ attitudes towards the uptake of
technologies beyond their personal inclinations. The survey conducted by Fan (2024) also
highlights similar mediating factors, such as interpreters’ concern about the reliability of
such tools. In her survey, Deysel (2023) specifically explored interpreters' concerns about
technology. While her investigation did not only pertain to CAI tools, it is useful as it
highlighted how interpreters worry about tools potentially interfering with their cognitive
processes, proving distracting, and requiring excessive attention or processing capacity.
Additional reasons for the limited acceptance of CAI tools may include a lack of satisfaction
with currently available functionalities, as the survey by Corpas and Fern (2016) on
terminology management tools would suggest. Qualitative data from empirical studies may
also help get a clearer picture of the reasons behind interpreters’ limited uptake of CAI
tools. Several study participants report being sometimes distracted by the tools (e.g. Prandi,
2015b, 2023; Desmet et al., 2018) and underline the importance of getting accustomed to
the tool to make the best use of it (e.g. Defrancq and Fantinuoli, 2020; Pisani and
Fantinuoli, 2021). Most importantly, they observe that for a tool to be considered useful and
satisfactory, it must be dependable, help them achieve better performance than they would
without it, and ideally offer features tailored to their own needs (e.g. Frittella, 2023; Ünlü, 2023a).
Despite these initial insights, we have only scratched the surface of the complex rela-
tionship between interpreters and CAI tools. While we may hypothesise that interpreters’
views of technology have changed and will continue to evolve in the coming years, further
dedicated investigation is needed to understand the prevalence and perception of tools spe-
cifically created for interpreters and to be able to draw conclusions which may inform CAI
tool development.

8.3 Empirical research on CAI tools


Empirical research on CAI tools is relatively new in interpreting studies and is gaining
momentum thanks to the exponential growth in the possibilities offered by AI for the opti-
misation of interpreters’ workflows and potential quality gains. This section offers a critical
overview of our current understanding of interpreter–tool interaction, the cognitive and
performance impacts of CAI tools on interpreters, the benefits and limitations of such tech-
nologies, and the methodologies used to investigate these issues.
The multiple research foci around CAI can be subsumed under two broad areas: studies
focused on evaluating CAI tools' effects on interpreters, and studies focused on the
technological applications. In the following sections, interpreter-centric and system-centric
research is discussed and reviewed, knowledge gaps are defined, and open questions are
identified, with the goal to orient future enquiries.

8.3.1 Interpreter-centric research on CAI tools


In discussing the empirical studies focusing on interpreters, this review will be organised
into process- and product-oriented research. This classification reflects the need, highlighted
by Mellinger (2019), to conduct research on both the process and product of CAI. This
dual perspective is necessary because it relates to the very objectives that led to the crea-
tion of CAI tools, that is, using technology to ease interpreters’ cognitive effort and to
help them achieve higher interpretation quality. Since product-oriented research inherently
offers insight into the process, albeit indirectly, and process-oriented research often involves
the collection of performance-related, product-oriented data, this division is necessarily of
a pragmatic nature.


8.3.1.1 Product-oriented research


Product-oriented empirical research on CAI tools centres on two main research objects:
interpreters’ computer-assisted preparation and CAI tools’ effect on individual aspects of
quality, such as numbers and specialised terminology.

CAI TOOLS AND INTERPRETERS’ PREPARATION

Although the first generation of CAI tools was primarily built to provide support for the
preparation phase of an interpreting assignment, research on computer-assisted interpret-
ers’ preparation is still rather scarce.
The idea of using technology, primarily from the area of corpus linguistics and natural
language processing, to rationalise interpreters’ glossary work can be summed up in Fan-
tinuoli’s (2017a) corpus-driven interpreters’ preparation (CDIP). Fantinuoli observed that
interpreters prepare for their assignments under high time pressure, often under suboptimal
conditions, as they lack relevant preparation materials. Furthermore, they are called upon
to facilitate communication on specialised subjects, in which they are often not experts
themselves, and to acquire knowledge and specialised terminology on a variety of topics. To
overcome these limitations, interpreters can use dedicated tools to create ad hoc corpora of
domain-relevant documents and explore such collections of texts to reconstruct the under-
lying conceptual systems and extract terminology.
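The core of the terminology-extraction step in this workflow can be illustrated with a simple relative-frequency ("keyness") ranking over an ad hoc corpus. The smoothing constant and scoring formula below are illustrative choices, not those of any specific tool:

```python
from collections import Counter

def candidate_terms(domain_tokens, reference_tokens, top_n=5):
    """Rank words by how much more frequent they are in the ad hoc domain
    corpus than in a general reference corpus (a crude "keyness" ratio)."""
    dom, ref = Counter(domain_tokens), Counter(reference_tokens)
    n_dom, n_ref = len(domain_tokens), len(reference_tokens)
    scores = {
        w: (dom[w] / n_dom) / ((ref[w] + 1) / n_ref)  # +1 smooths unseen words
        for w in dom
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Words that are frequent in the domain corpus but rare in general language float to the top of the list, which is the intuition behind extracting candidate terms from ad hoc corpora during preparation.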
Empirical work by Xu and Sharoff (2014) and Xu (2018) explored the question of
whether corpus-based preparation provides advantages to interpreters as compared to
manual preparation. In a preliminary study contrasting three term extraction tools for
English and Chinese, Xu and Sharoff (2014) found that trainees in the experimental
group saved, on average, half the time compared to the students who extracted terms
manually. Students in the experimental group further observed that the term list ori-
ented them as to the terms and concepts worth prioritising when preparing. In a subse-
quent study involving 22 Chinese students divided into a control and a test group, Xu
(2018) found that the test group, working with a corpus creation tool, an automatically
extracted term list, and a concordancer, achieved significantly better results with regard
to terminological accuracy in SI, number of omissions, preparation time required, and
post-task recall of terms. This study has the merit of linking the preparation process to
the interpreting task, showing that the potential benefits of CDIP go beyond time savings
for interpreters.
It should be noted, however, that the manual and the automatic terminological prepa-
ration approaches are not entirely comparable, as they may pursue different goals. While
reading preparation documents and manually extracting terminology helps interpreters
gain domain knowledge and engage with the subject matter, automatic approaches allow
them to process large amounts of textual data and inevitably entail a focus on the special-
ised language used. The two approaches, then, can be integrated to achieve a thorough and
extensive level of preparation.
Future research may choose to investigate other aspects of computer-assisted prepara-
tion beyond corpus creation and terminology extraction, for instance, addressing the use of
speech recognition in the documentation phase (e.g. Gaber et al., 2020). CAI tools offer a
plethora of options to speed up glossary creation (see Section 8.2.2 for a detailed descrip-
tion). Aspects such as the impact of automatic glossary creation on interpreters’ learning

129
The Routledge Handbook of Interpreting, Technology and AI

processes in the preparation phase and the subsequent effects on their interpretation, as well
as interpreters’ perception of automatically generated terminological resources, are yet to
be widely explored empirically but may yield useful insights for CAI tool development and
training.

IN-PROCESS SUPPORT FOR NUMBERS

Much of the concern expressed by interpreters about CAI tools’ ability to support them
revolves around the limited cognitive resources they have available for any additional
activity during interpreting, particularly in the simultaneous mode. It is only natural that
empirical research on CAI tools has focused mainly on the in-process phase and on the
simultaneous mode, which imposes the additional constraint that the interpreter’s pace is
largely determined by the speaker. Research on number interpreting with CAI tool support
gained momentum after Fantinuoli (2017b) proposed the integration of ASR into CAI tools
to prompt interpreters for common problem triggers (but see also Hansen-Schirra, 2012,
for a first theorisation of CAI-ASR integration). With staggering improvements in ASR
performance (see Section 8.3.2.1), using this technology to aid interpreters when faced with
numbers, the ‘problem trigger par excellence’ (Frittella, 2019), seems feasible and has been
the object of several studies.
When examining empirical research on ASR support for numbers, an evident method-
ological issue emerges: ‘Number interpreting’ is conceptualised differently in the studies
conducted, as observed by Frittella (2022b), limiting the alignment of research methods,
the comparability of studies, and the interpretation of results. Most studies conducted
on number interpreting with CAI tool support focus on the interpretation of the number
word, without expanding the analysis to other elements of discourse (e.g. only check-
ing whether the number ‘150’ is interpreted correctly instead of ‘150 km’). This narrow
focus allows for a direct assessment of whether interpreters are able to pick up the sug-
gestions offered by the tool but does not tell us how the input provided is integrated into
the interpreter’s rendition, or how such integration may affect the overall process. One
example is the study conducted by Desmet et al. (2018). In an experiment involving ten
Dutch students, the authors mocked up a system automatically prompting interpreters
for numbers of different magnitude and complexity, using PowerPoint slides time-aligned
with the speech. Evaluating the percentage of numbers correctly interpreted by the study
participants as compared to the condition without support, they identified statistically
significant accuracy gains of around 30%, especially for complex numbers and decimals,
and almost 90% fewer approximations. Defrancq and Fantinuoli (2020) and Pisani and
Fantinuoli (2021) also focused on the number word, using, however, a real application.
Both studies supported the initial findings by Desmet et al. (2018), identifying accu-
racy gains of 11.5–44.2% and 25%, respectively, for the CAI tool condition, although
Defrancq and Fantinuoli (2020) observed that differences were statistically significant
only for two out of six study participants.
Overcoming the narrow focus of these initial studies, Frittella (2022a, 2022b, 2023)
argued in favour of a more nuanced and holistic approach to research on computer-assisted
number interpreting. Her analysis expanded the scope by examining the entire numerical
information unit. For instance, the sentence ‘Chinese export value decreased by 3 billion US
dollars in 2019' can be analysed by considering the rendition of the numeral (3 billion), the
referent (export value), the unit of measurement (US dollars), the relative value (decreased),
the time reference (in 2019), and the geographical location (in China) (Frittella, 2022a,
91–92). This approach allowed her to uncover issues which do not emerge when exclusively
looking at the number word, such as severe semantic errors.
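One way to operationalise this unit-based analysis is a per-component scoring sheet. The component labels below follow the example sentence above, while the 0/1 scoring scheme itself is a hypothetical illustration, not Frittella's actual instrument:

```python
# Components of a numerical information unit, following the example above
UNIT_COMPONENTS = ("numeral", "referent", "unit_of_measurement",
                   "relative_value", "time_reference", "location")

def score_unit(assessment: dict) -> float:
    """Proportion of components rendered correctly (1.0 = fully accurate)."""
    return sum(assessment.get(c, 0) for c in UNIT_COMPONENTS) / len(UNIT_COMPONENTS)
```

Scoring a rendition in which only the referent was wrong yields 5/6: a semantic error that a numeral-only accuracy count would miss entirely.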
Also recognising that local accuracy gains should not come at the expense of overall
interpreting quality, Defrancq et al. (2024) reanalysed the data from a previous study
(Defrancq and Fantinuoli, 2020). Evaluating participants’ renditions for accuracy and
acceptability beyond individual items (numbers), they found no significant effect of
ASR on overall quality; since accuracy on numbers remained higher with tool support, they
concluded that ASR had, on balance, a positive effect on performance.
If the initial findings foreground the potential of CAI tools to improve interpreters’ accu-
racy in the rendition of numbers, the studies reviewed also highlight issues which deserve
further exploration. A prominent negative trend identified in several studies is a certain
overreliance on the tool for support, with interpreters’ performance dropping in case of tool
failure (Defrancq and Fantinuoli, 2020; Frittella, 2023). At the same time, Defrancq and
Fantinuoli (2020) postulate a potentially beneficial psychological effect (see also Van Cau-
wenberghe, 2020) due to the mere presence of the ASR-CAI tool, which may be perceived
as a safety net by interpreters, lowering the stress related to the interpreting task.

IN-PROCESS SUPPORT FOR SPECIALISED TERMINOLOGY

Despite CAI tools’ rapid evolution, many of the functions they offer still revolve around
terminology work. This concerns not only glossary preparation ahead of the interpreting
assignment but also the possibility to retrieve terminology quickly and effectively from
terminological databases while interpreting. This can be done by integrating ASR, as in the
case of numbers, but also by using the advanced search algorithm provided by some appli-
cations to facilitate manual queries.
Initial studies looked at in-process terminology support by examining participants'
manual queries (Biagini, 2015; Prandi, 2015a, 2015b). Contrasting CAI tool queries with
paper glossaries, Biagini (2015) found that participants achieved greater accuracy in the
CAI condition. Broadening the scope of the analysis to the sentences including the target
terms, he also found fewer non-strategic omissions when the tool (InterpretBank) was
used. Prandi (2015a, 2015b) contrasted two groups of students with differing levels of
exposure to the tool, finding overall high levels of accuracy, particularly for those who
had practiced more often and started developing their own strategies for an effective
interaction with the tool.
While in these initial studies the experimental materials were speeches characterised by
high terminological density, without further control of variables such as the distribution
of terms in the speech or the frequency of the terms selected as stimuli, the studies con-
ducted by Prandi (2017, 2018, 2023) pursued higher experimental control allowing for
an analysis focused on the specialised terms selected as stimuli. Adopting methods from
sentence processing research (Seeber and Kerzel, 2012; see also Keating and Jegerski,
2015), she prepared three ad hoc speeches which she controlled for the terms’ frequency,
their level of morphological complexity (unigrams, bigrams, and trigrams equally distrib-
uted in the speech), and their position in the sentence. Additionally, the target sentences
containing the specialised terms were preceded and followed by generic sentences without
syntactical or terminological complexities. Comparing a digital PDF glossary, a CAI tool
with manual query (InterpretBank), and a simulated ASR-CAI tool, Prandi found higher
terminological accuracy and fewer omissions and severe errors when the
CAI tool was used, especially with ASR integration.
accuracy achieved particularly for short unknown terms were also found by Van Cau-
wenberghe (2020).
Ünlü (2023a) piloted Sight-Terp, a system for computer-assisted consecutive
interpreting, and compared consecutive interpretation tasks with and without the support
of the tool. Computer-assisted consecutive interpreting (CACI; see Chen and Kruger, 2023)
is a hybrid mode combining an ASR transcript, either created automatically or by the
interpreter respeaking the source speech, and the machine-translated transcript, which
is then live post-edited by the interpreter. Ünlü’s study involved students working in the
Turkish–English pair, finding statistically significant higher accuracy in the Sight-Terp
condition, probably due to the availability of the automatically generated ASR tran-
script and the additional MT reference, which supported information retention and
production. However, longer time on task and an increase in disfluencies were found in
the CACI condition (see Ünlü, this volume). These findings contrast with those by Chen
and Kruger (2023, 2024a), who found increased fluency compared to conventional
consecutive interpreting, although the effect was only significant in the L1 (Chinese)
to L2 (English) direction. This discrepancy may be influenced by the fact that Ünlü’s
(2023a) study did not adopt respeaking, potentially leading to a lower-quality source
text for MT.
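The CACI workflow just described can be summarised as a two-stage pipeline: a transcript (automatic or respoken) feeds an MT engine, and the interpreter live post-edits the draft. The function names and toy stand-ins below are purely illustrative:

```python
def caci_pipeline(audio_segment, asr_transcribe, machine_translate):
    """Return the draft materials presented to the interpreter for live post-editing."""
    transcript = asr_transcribe(audio_segment)   # automatic, or via respeaking
    draft = machine_translate(transcript)        # raw MT of the transcript
    return {"source_transcript": transcript, "mt_draft": draft}

# Toy stand-ins for real ASR/MT services, for illustration only
result = caci_pipeline(
    "audio...",
    asr_transcribe=lambda a: "export value decreased",
    machine_translate=lambda t: "il valore delle esportazioni è diminuito",
)
```

The quality of the final rendition depends on each stage: as noted above, skipping the respeaking step can degrade the transcript and, in turn, the MT draft the interpreter must post-edit.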
Despite the encouraging findings on the potential of CAI tools to provide support also
for specialised terminology, particularly when ASR is used, anecdotal observations and
qualitative data reveal issues in the interaction that align with the findings of studies con-
ducted on numbers and point to the complexity of using CAI tools while interpreting, even
those that do not require manual querying of the database.
As for the effects on the product, study participants tend to display a higher likelihood
of errors and omissions at tool failure (e.g. Frittella, 2023) or to unnecessarily self-correct
and not to notice that they are importing ASR errors into their renditions (e.g. Van Cau-
wenberghe, 2020). Interacting with an ASR-CAI tool may still result in accuracy issues and
omissions both locally and also at a higher level, such as in the case of semantic errors iden-
tified by Frittella (2023) particularly in complex and dense speech passages. Methodologi-
cally, such findings further stress the need to extend the analysis beyond the local problem
triggers to fully comprehend the implications of CAI tool support.
Across studies, both in the simultaneous and the consecutive mode, participants perceive
the tools as distracting. They struggle to allocate attention effectively because of the
multimodal input and have difficulty quickly finding the right suggestion in long lists of
terms (e.g. Van Cauwenberghe, 2020; Prandi, 2023).
Finally, while study participants tend to appreciate the ASR function, they point out that
it does not always provide them with the suggestions they need (see Prandi, 2023). Along
the same lines, the students participating in Ünlü’s (2023a) study doubted the reliability
of ASR and MT results, which impacted their experience. Such comments foreground the
importance of triangulating quantitative findings with qualitative data collected through
surveys, interviews, or focus groups, which allows for a deeper understanding of the user
experience. The issues reviewed entail important implications not only for training but also
for the user interface (UI) design of CAI tools.


8.3.1.2 Process-oriented research


Research on the CAI process, and more specifically on the impact of CAI tool use on inter-
preters’ cognitive processing, is relatively recent and still scarce. As observed by Chen and
Kruger (2023, 400):

[I]n spite of the promise that many of these technologies hold and the media hype
around them, very little empirical evidence exists on the effectiveness of the technolo-
gies in assisting the workflow and sub-processes of interpreting.

Therefore, it appears essential to experimentally probe some of the assumptions which ini-
tially prompted the development of CAI tools, that is, that technology support may alleviate
interpreters’ cognitive effort in rendering problem triggers such as specialised terminology
and numbers. At the same time, the cognitive load imposed by CAI may be higher due to
the additional input provided through ASR, for instance, or to the additional resources
necessary for manually querying the tool. The postulated additional cognitive load may
also arise due to the effortful allocation of attention to multiple, multimodal informa-
tion sources, as interpreters must split their attention between the auditory and the visual
input.
CAI is a complex object of study. Therefore, one concern of researchers is establish-
ing a suitable methodology for the exploration of questions linked with interpreters’
cognitive processes, in addition to the already more established research foci discussed
in the previous sections. Prandi’s (2023) PhD project aimed to establish a methodology
for studying computer-assisted simultaneous interpreting (CASI). She combined multiple
methods, adopting performance, subjective, and behavioural measures. Her goal was
to explore whether CAI tools, especially those with integrated ASR, could reduce inter-
preters’ cognitive effort while interpreting specialised terminology and help them focus
on the speaker, the primary source of information. Her findings pointed to statistically
significant lower cognitive effort in the ASR condition and suggested that it was easier for
participants to allocate attention to the speaker when a CAI tool was used. This would
indicate an advantage provided by bespoke tools for interpreters, while non-bespoke
tools resulted in equal time spent on tool and speaker, with frequent attention switching.
The study made it possible to explore the benefits and shortcomings of methods such as
accuracy ratings, fixation-based measures, and qualitative questionnaires. Other meas-
ures could be explored in future studies, such as disfluencies (see, for example, Gieshoff,
2021a, 2021b) or vocal correlates of arousal (e.g. Scherer, 1989). A first step in this direc-
tion was taken by Defrancq et al. (2024), who explored mean fundamental frequency (F0)
as a cognitive load indicator in interpreters for the first time. The authors’ assumption
that ASR may lead to additional load was, however, not substantiated by F0 data. This
warrants further research on other aspects of F0, such as standard deviation, peaks, and
ranges (Defrancq et al., 2024, 54).
Chen and Kruger (2023) reported on a study on CACI involving respeaking and live
post-editing of the machine-translated transcript. CACI was compared with conventional
consecutive interpreting in an experiment involving six students working in the English–
Chinese pair who had been trained in the new mode. As this is one of the few studies
openly addressing both the product and the process of CACI, it is presented here, while
also reporting the findings on the effects on accuracy and fluency. To measure the impact
on the study participants' cognitive load, the authors used the NASA Task Load Index
(NASA-TLX; Hart and Staveland, 1988). Quality was measured both in general terms
with a rubric-based rating scale by five raters and more in-depth with propositional rat-
ing. For propositional rating, the speeches were divided into units and scored either 0
or 1, depending on whether the target text matched the source. Fluency was assessed by
automatically calculating unfilled pauses and manually counting filled pauses. Chen and
Kruger expected higher accuracy and lower cognitive load in the CACI condition due to
the reduced pressure on interpreters’ working memory. For cognitive load, their hypoth-
esis was confirmed, but only in the L1–L2 direction (Chinese–English), while higher
accuracy and fewer unfilled pauses were found in the CACI condition. A replication of
the study with a larger sample of 13 students (Chen and Kruger, 2024a) yielded similar
results for cognitive load, but also for fluency and target language quality, which were,
however, significantly better only in the L1–L2 direction. Overall quality was also
significantly higher for the CACI condition, although more markedly in the L1–L2 direction. These
findings suggest that cognitive load in CACI is modulated by directionality and overall
provide further evidence in favour of CAI also for the consecutive mode.
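The automatic unfilled-pause count used as a fluency measure in these studies can be sketched as follows; the 0.25-second threshold and the timestamp format are illustrative assumptions, not the parameters of the studies themselves:

```python
def count_unfilled_pauses(word_times, threshold=0.25):
    """Count gaps between consecutive words longer than `threshold` seconds.

    `word_times` is a list of (start, end) timestamps per word, such as
    those produced by a forced aligner or an ASR system.
    """
    return sum(
        1
        for (_, prev_end), (next_start, _) in zip(word_times, word_times[1:])
        if next_start - prev_end > threshold
    )
```

Filled pauses ("uh", "um"), by contrast, were counted manually in the study described above, since detecting them reliably requires listening to the audio.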
Process research may also help better understand how interpreters interact with support-
ing tools, which can have important repercussions for training. For instance, Chen and Kru-
ger (2024b) conducted an eye tracking study to investigate how trainees allocate attention
during CACI. Participants’ dwell time, fixation durations, and saccade lengths suggested
that they focused more on listening and respeaking during phase 1. However, monitoring
the source text led to better respeaking quality. Greater reliance on the MT text was found
for phase 2, which led to better quality, but only in the L2 (English) to L1 (Chinese) direc-
tion, requiring further investigation.
To sum up, while process-oriented research on CAI is still in its infancy, it has the poten-
tial to offer valuable insights into CAI, integrating product-oriented findings and deepen-
ing our understanding of technology’s impact on interpreters’ cognition. The distraction,
high effort in coordinating attention, and effortful visual search reported offer additional
research questions worth investigating, making process-oriented research on CAI a fruitful
field of enquiry. Additionally, studying CAI may add to our knowledge of the interpreting
process, while possibly attracting the interest of cognitive psychologists, especially as con-
cerns the processing of multimodal input.

8.3.2 System-centric research on CAI tools


While training is essential to foster an effective use of CAI tools throughout the phases of
an interpreting assignment, it is also fundamental that the applications be dependable. Oth-
erwise, interpreters may prefer not to use the tools, despite their potential and the observed
benefits, because they may perceive their use as too risky.
Nonetheless, evaluations of system performance tend to come second when inter-
preter–CAI tool interaction is studied, while the focus lies on interpreters’ skills and
strategic approach. Knowing how well tools currently perform is essential for interpreters
to use them effectively and make informed choices about their inclusion in workflows.
At the same time, usability research can help identify shortcomings in the tools’ design,
which may impact the interaction and which have long gone unnoticed because, as Frit-
tella (2023) observes, development has been mostly intuitive and not research-based or

Computer-assisted interpreting tools and tools training

validated. The following sections report on research about system performance and the
tools’ usability.

8.3.2.1 System performance


System performance is examined in dedicated studies, often conducted by CAI tool develop-
ers (e.g. Rodríguez et al., 2021; Fantinuoli et al., 2022), but some aspects of system perfor-
mance are also mentioned incidentally in research centred on interpreter–CAI tool interaction.
Research on system performance started in the area of CAI with the integration of ASR into
CAI tools (see Fantinuoli, 2017b) and focuses on two essential aspects of ASR-enhanced
applications, namely, system latency and precision. Although ASR still faces challenges typi-
cal of spoken language, such as speaker variability, ambiguity, speech continuity, and speed,
it is now mature enough to be considered a potentially useful addition to the interpreter’s toolkit.

LATENCY

System latency refers to the delay with which the output of the ASR process is presented to
interpreters on the screen. Keeping latency low is of major importance for interpreters, as
high latency may pose an excessive strain on working memory and exacerbate issues related
to the coordination of auditory and visual-verbal input. The main assumption is that if sys-
tem latency can fit within interpreters’ ear–voice span (EVS), it may be perceived as accept-
able and allow for the integration of the tools’ suggestions in the interpretation. However,
interpreters adjust their EVS constantly, for instance, by shortening it to a minimum in the
case of question-and-answer sessions or heated debates, or for the interpretation of num-
bers, and so even very low system latency may be insufficient to cater to such specific needs.
Being aware of what can be reasonably expected of the tool is essential for interpreters to
make strategic choices in terms of supporting technologies.
Current research findings suggest that CAI tools’ latency may be sufficiently low to fit
within interpreters’ EVS. In their study on ASR support for the interpretation of numbers,
Defrancq and Fantinuoli (2020) found sufficiently low system latency, below the crucial
threshold of 2.5 to 3 sec reported in the literature. They conclude that ‘provided interpret-
ers maintain an average EVS, the number is readable in its final version before interpret-
ers reach the point at which they would deliver it’ (Defrancq and Fantinuoli, 2020, 89).
Sufficiently low latency was also found by Van Cauwenberghe (2020) when using Inter-
pretBank as support for the interpretation of specialised terminology, although very high
latencies (up to about 11 sec) were occasionally observed for the terms at the beginning
of the speech, suggesting that the CAI tool needed some time to warm up, offering better
performance downstream. Satisfactory latency was also found by Fantinuoli et al. (2022),
with an average of 1.6 sec (range: 1.1–2.3 sec). A study by Fantinuoli and Montecchio
(2023) aimed to define the maximum acceptable latency for an ASR-CAI tool. To explore
this question, they compared increasingly high latencies, from 1 to 5 sec, and analysed the
effects on interpreters’ accuracy and fluency of rendition. Their study suggested that inter-
preters may be able to cope with latencies of up to 3 sec, corroborating the findings from
previous studies. While these results are encouraging, it should be noted that they represent
the system performance under controlled laboratory conditions, and further tests in real-life
settings may reveal additional shortcomings.
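The logic of such latency analyses can be sketched in a few lines: per-token display latency is the gap between when a word is uttered and when the ASR output renders it on screen, and each value can be checked against the roughly 3 sec acceptability ceiling discussed above. The function names and all timestamps below are invented for illustration.

```python
# Hypothetical sketch of a latency check, not any tool's actual code:
# per-token latency = on-screen time minus utterance time, compared
# against an assumed ~3 s acceptability ceiling.

def display_latencies(uttered, displayed):
    """Per-token latency in seconds; both lists are aligned by token."""
    return [d - u for u, d in zip(uttered, displayed)]

def within_evs(latencies, ceiling=3.0):
    """Share of tokens whose latency fits within the acceptability ceiling."""
    return sum(l <= ceiling for l in latencies) / len(latencies)

uttered   = [0.0, 1.0, 2.5, 4.0]    # invented token onsets (s)
displayed = [1.5, 2.75, 4.5, 7.5]   # invented on-screen times (s)
lats = display_latencies(uttered, displayed)
print(lats)               # [1.5, 1.75, 2.0, 3.5]
print(within_evs(lats))   # 0.75
```

A real evaluation would of course also have to handle partial hypotheses that the ASR system revises before finalising a token.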


PRECISION

The precision achieved by the ASR module of a third-generation CAI tool has been an
important concern ever since this technology was first proposed as a form of interpreter support.
High precision and recall are essential for an effective integration of ASR suggestions into
the interpreters’ rendition. Precision is defined as ‘the fraction of relevant instances among
the retrieved instances’, while recall is ‘the fraction of relevant instances that have been
retrieved over the total amount of relevant instances present in the speech’ (Fantinuoli,
2017b, 30). Precision should be prioritised over recall to produce relevant results.
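Applying these definitions to a tool's term suggestions is straightforward; the sketch below uses invented term lists purely for illustration.

```python
# Minimal illustration of the metrics defined above, applied to term
# suggestion: precision and recall over sets of terms, combined into F1.
# The medical term lists are invented examples, not study data.

def precision_recall_f1(suggested, relevant):
    """`suggested`: terms the tool displayed; `relevant`: terms in the speech."""
    suggested, relevant = set(suggested), set(relevant)
    true_pos = len(suggested & relevant)
    precision = true_pos / len(suggested)
    recall = true_pos / len(relevant)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

suggested = ["stent", "angioplasty", "catheter", "thrombosis"]
relevant  = ["stent", "angioplasty", "catheter", "ischaemia", "thrombosis"]
p, r, f1 = precision_recall_f1(suggested, relevant)
print(p, r, round(f1, 2))   # 1.0 0.8 0.89
```

The example also shows why precision is prioritised: every suggestion displayed here is relevant (precision 1.0), even though one relevant term is missed (recall 0.8).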
A series of benchmark tests on CAI tools with integrated ASR reveals satisfactory results.
For instance, testing an ASR prototype, Fantinuoli (2017b) found a word error rate (WER)
of 5.04%, down from 10.92%, after the system was trained on a specialised glossary. The system reached an F1 value, which combines precision and recall, of 0.97 for
terms and of 1 for numbers. Encouraging results were also obtained for SmarTerp, both for
ASR and for NER, especially after an adaptation stage. Performance was found to be mod-
ulated by language (Rodríguez et al., 2021). The live support function of KUDO Interpreter
Assist was also recently tested (Fantinuoli et al., 2022), with encouraging results, such as an
F1 score of around 98% and good performance both with and without fine-tuning. NER
reached peaks of 100% for precision, recall, and F1 score, especially with fine-tuning.
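The word error rate cited in these benchmarks is the word-level edit distance between the ASR output and a reference transcript (substitutions + deletions + insertions), divided by the number of reference words. The sketch below, with an invented sentence pair, is illustrative rather than the benchmark code.

```python
# Word error rate (WER) via a standard dynamic-programming edit distance
# computed over words rather than characters. Example sentences are invented.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the patient received a coronary stent",
          "the patient received coronary stents"))  # 2 errors over 6 words
```

Benchmark implementations additionally normalise casing, punctuation, and number formatting before scoring, which can shift WER noticeably.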
The results reported in these studies suggest that ASR is mature enough to help interpret-
ers, although systems do not perform equally well on all problem triggers: While performance is very good for numbers, NER is still a challenging task for machines (see, for
example, Gaido et al., 2021). Further testing is needed, including under real-life conditions and
on a wider variety of languages, especially low-resource ones. Failing to account for these
modulating factors may yield a non-representative picture of the current level of devel-
opment of ASR technology and perpetuate disparities between interpreters working with
major languages and those providing services for minority languages. Furthermore, sys-
tem evaluation should not be limited to in-process support but should also cover the performance of
systems automatically generating glossaries to help interpreters in the preparation phase.
While results in terms of the quality and relevance of the glossary items retrieved and
the quality of MT are satisfactory (e.g. Fantinuoli et al., 2022), further investigations are
needed on this topic.

8.3.2.2 Usability
Research on the usability of CAI tools is arguably at the intersection of interpreter-centric
and system-centric research, as it concerns both human factors and system-related issues.
Usability studies foreground the non-negligible role of design in the complex interpreter–
tool equation. Indeed, design represents a defining factor, because ‘details matter in the
design of any user interface, where seemingly small features can significantly impact users’
performance’ (Frittella, 2023, 151).
Currently, usability testing of stand-alone CAI tools is lacking, with design features
being discussed only marginally in empirical research on CAI. Only two research projects
have adopted a usability perspective so far, exploring which design features may facilitate
or hinder interpreters’ interaction and satisfaction with the tools: Saeed et al. (2022), who
studied how the integration of ASR into RSI platforms may support source text comprehen-
sion, and Frittella (2023), who addressed the impact on delivery accuracy.


Research on the usability of CAI tools adopts methods from user experience research
and human–computer interaction, representing a much-needed novelty in research on inter-
preters’ supporting tools. CAI tools are usually considered intuitive and user-friendly, but
this assumption has not yet been fully explored empirically. Saeed et al. (2022) found that
reduced visual information in RSI interfaces promotes a state of flow, facilitating the inter-
preting task. Their mocked-up ASR feature presented interpreters with the live-generated full
transcript of the source speech. Using a convergent mixed-method design gathering per-
formance and subjective (perception) data through SI tests, post-task questionnaires, and
semi-structured interviews, Frittella’s (2023) study probed the soundness of the SmarTerp
RSI-CAI tool design. Her findings revealed the importance of customisation to tailor the
tool’s appearance to the user’s needs, as excessive or unnecessary input can be disruptive.
Usability studies are useful not only for identifying general principles of effective tool design
but also for providing tangible suggestions. For instance, they reveal that interpreters prefer see-
ing the ASR suggestions at the bottom of the screen, where they are used to reading subtitles
(Saeed et al., 2022). They also bring up a number of open questions, for instance, about the
most effective way of suggesting problem triggers to interpreters, or about the language in
which terms and acronyms should be presented (Frittella, 2023).

8.3.3 Methodological limitations and future research desiderata


As emerges from the review of studies presented in the previous sections, research on CAI is
an emerging field in interpreting studies, one that is still quite fragmented but that is gaining
increasing traction, and which has the potential to yield valuable insights for the improve-
ment of tools and their use by interpreters. If research on CAI is to achieve the objectives it
sets for itself, several limitations will have to be addressed in the future.
Many studies still have an exploratory nature and are conducted on small samples.
While a small N is not necessarily bad (Smith and Little, 2018), the small sample sizes limit
the generalisability of results, which can, in many cases, be considered tentative at best. The
extrapolation of findings to the larger population of interpreters is further hampered by the
inclusion of trainees in empirical studies, rather than of practising experienced interpreters
(with some exceptions; see, for example, Frittella, 2022b, 2023). While this choice is often
dictated by reasons of convenience and may have been motivated by the limited uptake
of tools when empirical research on CAI began, it is now conceivable to include a larger
number of professionals in such studies. As Chen and Kruger (2023, 2024b) point out,
the improvements observed in trainees may be larger or smaller in professional interpreters.
The scope of research on CAI should also be expanded to include more language combi-
nations beyond major European languages, such as English, German, Spanish, and Italian,
or Chinese; this narrow coverage currently limits the generalisability of results. After probing the tools
under laboratory conditions, researchers may want to further test interpreter–machine
interaction in real-life settings, where additional factors may influence both system and
interpreter performance and user behaviour. For such field studies, however, it may be more
difficult to obtain permission to observe CAI tool use in situ.
Finally, as pointed out by Bowker (2022), participant-oriented research on CAI could involve
other players, such as the recipients of materials created with CAI tools (e.g. glossaries shared
among colleagues) or, most importantly, the listeners of computer-assisted interpretation. The
communicative impact of CAI tool use remains a widely under-researched topic.


8.4 CAI tool training


The complexity of the interaction between interpreters and CAI tools suggests that training
on these technologies should become an integral part of interpreting curricula. As stressed
in a recent panel on interpreter training (Rodriguez et al., 2022), training on and with CAI
tools is recognised as a priority and a necessity by higher education institutions, interna-
tional organisations, professional associations, and research representatives. As much of
interpreting has shifted to online settings, thanks to RSI proving a feasible alternative to in
situ interpreting, interpreters will be increasingly exposed not only to technologies enabling
the provision of interpreting services but also to supporting technologies which RSI plat-
form developers are starting to incorporate (Defrancq, 2023). CAI tools may prove particu-
larly beneficial in online settings, acting as artificial boothmates to make up for suboptimal
in-booth communication, but also beyond RSI assignments, to help interpreters cope with
increasingly shorter preparation times (Rodriguez et al., 2022).
While CAI tools are becoming ever more relevant for interpreters, their novel character, cou-
pled with several additional challenges, has so far limited their integration into interpreting cur-
ricula. Quite revealing in this respect is, for instance, the lack of a dedicated chapter in a recent
volume on training and technology for interpreters and translators (Rodríguez Melchor et al.,
2020). The few surveys conducted on the use of technologies in practice and training reveal a
certain difficulty in introducing CAI tools into university curricula. While the number of univer-
sities including interpreting technologies in their interpreting degrees has grown significantly
since Berber-Irabien’s (2010) study, as found by Prandi (2020) in her 2017 survey of CIUTI
universities, CAI tools are yet to become a staple in interpreter training. This gap is somewhat
offset by the abundance of CPD offerings available (Defrancq, 2023), which, however, tend to
focus on providing knowledge about the individual applications rather than on the underlying
technologies and on relevant skill sets. Outside of the European context, universities seem to be
lagging even further behind: Recent surveys by Wan and Yuan (2022) and Wang and Li (2022)
revealed that Chinese interpreters are interested in technologies, including CAI tools, but they
lament a lack of appropriate training and, as a consequence, skills and knowledge.
Due to the relatively recent introduction of CAI tools, direct experience of working with
them is lacking. Trainers cannot resort to established best practices and to their expertise
when introducing CAI tools to students. Fantinuoli and Prandi (2018) provided a series
of suggestions for training based on their expert knowledge, but their proposal has not
been tested empirically. A certain confusion about the technology itself, identified, for
instance, by Prandi (2020), coupled with a potential reticence among trainers (Rodriguez
et al., 2022), may further limit the uptake of CAI tools in training. The rapid technological
developments, boosted by staggering advances in AI, complicate the provision of relevant
knowledge, as universities are called upon to ensure the longevity of their programmes in
the face of an ever-evolving technological landscape. Despite these issues, scholarly discus-
sion on what should be taught, how, and when is currently limited. Furthermore, the dearth
of empirical research and of educational research aimed at identifying educational needs
currently only allows for the formulation of tentative proposals. There are also practical
limitations, as the price point of interpreting technologies, CAI tools included, can be rather
steep (Prandi, 2020; see also Defrancq, 2023).
The available reflections on the integration of technologies into interpreter training seem
to agree on the need to provide the future generations of interpreters with both theoreti-
cal knowledge and relevant skills. Ideally, both knowledge and skills, identified as key


elements of technological competence (Wang and Li, 2022), should be derived from empiri-
cal research (Rodriguez et al., 2022). Providing trainees with transferable knowledge on
the technologies seems essential in the face of rapid technological advances (Fantinuoli and
Prandi, 2018; Defrancq, 2023). Specifically, trainees should be aware of how technolo-
gies work and develop critical judgment to be able to assess the tools’ performance, the
impact of CAI tools on their own performance, and the implications of technology use,
for instance, when ASR is involved. As Defrancq (2023, 305) observes, training on ethics
becomes even more important when technology is involved.
Empirical process-oriented research can provide support in developing students’ critical
thinking but also help scholars and trainers identify the skills needed for an effective and
efficient use of CAI tools. This would include not only mere operational skills, as most CAI
tools have intuitive interfaces, but especially strategic knowledge of how to integrate the
interaction with CAI tools into the already-complex interpreting process. This goes beyond
individual, in-process cognitive processes and extends to all phases of interpreting, and to
the constellation of interpreter teams and supporting applications. At the time of writing,
only one empirical study had been conducted with a focus on the inclusion of CAI tools into
interpreting curricula (Prandi, 2015a, 2015b). Despite its exploratory nature, it revealed the
multifaceted nature of the interaction with CAI tools: Issues such as overreliance, distribution
of attention, the practical arrangement of the tool and other support materials in the booth,
and in-team coordination were also identified in subsequent empirical research, not only
on students, but also on practising interpreters. As described in Rodriguez et al. (2022, 85),
which references a PhD study by Frittella (2024), educational research methods may contrib-
ute to helping define how, what, and when to teach, as this approach involves research-based
interventions consisting of ‘(1) the identification of an educational need based on research, (2)
the design of the intervention, (3) its development and (4) evaluation to improve the solution,
on the one hand, and contribute to theoretical understanding, on the other’.
Not only are we still far from defining what the content of CAI tool training should be,
but teaching methods are also still under discussion. As mentioned earlier, experience-based
training is rarely possible in this area of interpreting technologies. Fantinuoli and Prandi
(2018) suggest that training on interpreting technologies should follow the constructivist
approach proposed by Kiraly (2000). While trainees should acquire theoretical knowledge
on the technologies, practical exposure to an increasingly complex use of CAI tools should
promote the development of relevant skills. The authors offer a series of practical sugges-
tions to help guide students in acquiring said competences. Defrancq (2023) argues that
training on technologies, and therefore on CAI tools, should not be encased in stand-alone
modules, as seems to be the case for some training institutions (see Prandi, 2020), but
should rather be routinely integrated horizontally into interpreter training. This is no
small feat, and it is possible that both dedicated modules aimed at providing the necessary
knowledge and the integration of technologies into regular interpreting classes may prove
beneficial to prepare the future generation of interpreters for the evolving interpreting mar-
kets. Further educational research on CAI tools may help universities substantiate their
training.

8.5 Conclusion
Despite current shortcomings and our still limited understanding of its implications,
CAI holds exciting prospects for improving interpreters’ work. Thanks to the staggering


progress made by AI, it is safe to assume that CAI tools are poised for significant advance-
ments in the coming years. Broadening the scope of research, improving tool design, and
devising impactful training approaches are goals worth pursuing, as is a reflection on a
potential broader application of AI in the area of supporting technologies for interpret-
ers. For instance, research on CAI may explore the possibility for interpreters to leverage
AI beyond traditional applications, to generate speeches for training, or to automatically
evaluate performances (Ünlü, 2023b), and investigate what this might mean for training.
The considerable investments in machine interpreting propelled by RSI platform provid-
ers may bring about benefits for interpreters, as the underlying technologies, such as ASR,
are also key components of next-generation CAI tools. This represents a shift compara-
ble to the industry’s investment in CAT tools but also brings about similar risks, such as
the exclusion of interpreters from the development process. Research on CAI tools should
therefore also elucidate such risks, expanding its scope to the broad implications of the
‘technological turn’ (Fantinuoli, 2019) for the profession.
In the future, the current shortcomings of CAI tools may be addressed by further testing
predictive approaches (see, for example, Vogler et al., 2019) for the automatic assessment
of elements likely to be left untranslated by interpreters, thus providing more targeted sup-
port and alleviating the additional cognitive load imposed on interpreters by the tool’s
presence. Working towards interpreter augmentation (Fantinuoli and Dastyar, 2022), one
avenue currently being explored by researchers is the use of augmented reality (AR) to
alleviate negative split attention effects due to the interpreter having to attend to multiple
sources of information for support (Gieshoff et al., 2024). However, AR is not automati-
cally synonymous with augmentation. The very concept of augmented interpretation is still
undefined, and what exactly counts as augmentation remains fluid.
In interpreting, augmentation could take various forms. One promising approach is offered
by research conducted on augmented cognition systems, as recently discussed by O’Brien
(2023), for translation. Her reflections on the implications of cognitive augmentation for
translators may be extrapolated to interpreting, urging future research to scrutinise how far
interpreter support may be pushed, whether the move towards augmented interpretation may
result in more interpreter-centric applications, and what this might entail for our conceptuali-
sation of cognition in interpreting and for the future of the profession. Being able to ask bold
questions will be essential to help the field navigate this ever-evolving landscape.

Notes
1 http://fourwillows.com/interplex.html (accessed 7.11.2024).
2 https://www.flashterm.eu/index.html (accessed 17.2.2025).
3 www.glossarmanager.de/ (accessed 7.11.2024).
4 https://interpretershelp.com/ (accessed 7.11.2024).
5 https://interpretbank.com/site/ (accessed 7.11.2024).
6 www.sightterp.net/ (accessed 10.7.2024).

References
Berber-Irabien, D.-C., 2010. Information and Communication Technologies in Conference Interpreting (PhD thesis). Universitat Rovira i Virgili. URL http://hdl.handle.net/10803/8775 (accessed 9.10.2024).
Biagini, G., 2015. Glossario cartaceo e glossario elettronico durante l’interpretazione (MA thesis). Università di Trieste.
Bowker, L., 2022. Computer-Assisted Translation and Interpreting Tools. In Zanettin, F., Rundle, C., eds. The Routledge Handbook of Translation and Methodology. Routledge, Oxon and New York, 392–409. URL https://www.taylorfrancis.com/chapters/edit/10.4324/9781315158945-28/computer-assisted-translation-interpreting-tools-lynne-bowker
Braun, S., 2019. Technology and Interpreting. In O’Hagan, M., ed. The Routledge Handbook of Translation and Technology. Routledge, London, 271–288. URL https://www.taylorfrancis.com/chapters/edit/10.4324/9781315311258-19/technology-interpreting-sabine-braun?context=ubx
Chen, S., Kruger, J.-L., 2023. The Effectiveness of Computer-Assisted Interpreting: A Preliminary Study Based on English-Chinese Consecutive Interpreting. Translation and Interpreting Studies 18(3), 399–420. URL https://doi.org/10.1075/tis.21036.che
Chen, S., Kruger, J.-L., 2024a. A Computer-Assisted Consecutive Interpreting Workflow: Training and Evaluation. The Interpreter and Translator Trainer 18(3), 380–399. URL https://doi.org/10/gt3736
Chen, S., Kruger, J.-L., 2024b. Visual Processing During Computer-Assisted Consecutive Interpreting: Evidence from Eye Movements. Interpreting 26(2), 231–252. URL https://doi.org/10/gt3738
Corpas Pastor, G., Fern, L.M., 2016. A Survey of Interpreters’ Needs and Practices Related to Language Technology. Universidad de Málaga.
Costa, H., Corpas Pastor, G., Durán Muñoz, I., 2014. A Comparative User Evaluation of Terminology Management Tools for Interpreters. In Drouin, P., Grabar, N., Hamon, T., Kageura, K., eds. Proceedings of the 4th International Workshop on Computational Terminology (Computerm). Association for Computational Linguistics and Dublin City University, Dublin, Ireland, 68–76. URL https://doi.org/10.3115/v1/W14-4809
Defrancq, B., 2023. Technology in Interpreter Education and Training: A Structured Set of Proposals. In Corpas Pastor, G., Defrancq, B., eds. Interpreting Technologies – Current and Future Trends. John Benjamins Publishing Company, Amsterdam (IVITRA Research in Linguistics and Literature, 37), 302–319. URL https://doi.org/10.1075/ivitra.37.12def
Defrancq, B., Fantinuoli, C., 2020. Automatic Speech Recognition in the Booth: Assessment of System Performance, Interpreters’ Performances and Interactions in the Context of Numbers. Target 33(1), 73–102. URL https://doi.org/10.1075/target.19166.def
Defrancq, B., Snoeck, H., Fantinuoli, C., 2024. Interpreters’ Performances and Cognitive Load in the Context of a CAI Tool. In Deane-Cox, S., Böser, U., Winters, M., eds. Translation, Interpreting and Technological Change: Innovations in Research, Practice and Training. Bloomsbury Academic, London (Bloomsbury Advances in Translation), 38–58.
Desmet, B., Vandierendonck, M., Defrancq, B., 2018. Simultaneous Interpretation of Numbers and the Impact of Technological Support. In Fantinuoli, C., ed. Interpreting and Technology. Language Science Press, Berlin (Translation and Multilingual Natural Language Processing, 11), 13–27. URL https://doi.org/10.5281/zenodo.1493291
Deysel, E., 2023. Investigating the Use of Technology in the Interpreting Profession: A Comparison of the Global South and Global North. In Corpas Pastor, G., Defrancq, B., eds. Interpreting Technologies – Current and Future Trends. John Benjamins Publishing Company, Amsterdam (IVITRA Research in Linguistics and Literature, 37), 142–168. URL https://doi.org/10.1075/ivitra.37.06dey
Fan, D.C., 2024. Conference Interpreters’ Technology Readiness and Perception of Digital Technologies. Interpreting 26(2), 178–200. URL https://doi.org/10.1075/intp.00110.fan
Fantinuoli, C., 2017a. Computer-Assisted Preparation in Conference Interpreting. Translation and Interpreting 9(2), 24–37. URL https://doi.org/10.12807/ti.109202.2017.a02
Fantinuoli, C., 2017b. Speech Recognition in the Interpreter Workstation. In Esteves-Ferreira, J., Macan, J., Mitkov, R., Stefanov, O.-M., eds. Proceedings of the 39th Conference Translating and the Computer. Editions Tradulex, London, 25–34. URL https://www.asling.org/tc39/wp-content/uploads/TC39-proceedings-final-1Nov-4.20pm.pdf
Fantinuoli, C., 2019. The Technological Turn in Interpreting: The Challenges That Lie Ahead. In Baur, W., Mayer, F., eds. Proceedings of the Conference Übersetzen und Dolmetschen 4.0 – Neue Wege im digitalen Zeitalter. BDÜ Fachverlag, Bonn, 334–354.
Fantinuoli, C., 2021. Conference Interpreting and New Technologies. In Albl-Mikasa, M., Tiselius, E., eds. The Routledge Handbook of Conference Interpreting. Routledge, London, 508–522. URL https://doi.org/10.4324/9780429297878-44
Fantinuoli, C., Dastyar, V., 2022. Interpreting and the Emerging Augmented Paradigm. Interpreting and Society 2(2), 185–194. URL https://doi.org/10.1177/27523810221111631
Fantinuoli, C., Montecchio, M., 2023. Defining Maximum Acceptable Latency of AI-Enhanced CAI Tools. In Ferreiro Vázquez, Ó., Correia, A., Araújo, S., eds. Technological Innovation Put to the Service of Language Learning, Translation and Interpreting: Insights from Academic and Professional Contexts. Peter Lang, Berlin (Lengua, Literatura, Traducción, 2), 213–225.
Fantinuoli, C., Prandi, B., 2018. Teaching Information and Communication Technologies: A Proposal for the Interpreting Classroom. Trans-kom: Journal of Translation and Technical Communication Research 11(2), 162–182. URL https://www.trans-kom.eu/bd11nr02/trans-kom_11_02_02_Fantinouli_Prandi_Teaching.20181220.pdf
Fantinuoli, C., Marchesini, G., Landan, D., Horak, L., 2022. KUDO Interpreter Assist: Automated Real-Time Support for Remote Interpretation. In Esteves-Ferreira, J., Mitkov, R., Recort Ruiz, M., Stefanov, O.-M., Chambers, D., Macan, J., Sosoni, V., eds. Proceedings of the 43rd Conference Translating and the Computer. Editions Tradulex, Geneva, 68–77. URL https://www.tradulex.com/varia/TC43-OnTheWeb2021.pdf
Frittella, F.M., 2019. “70.6 Billion World Citizens”: Investigating the Difficulty of Interpreting Numbers. The International Journal of Translation and Interpreting Research 11(1), 79–99. URL https://doi.org/10.12807/ti.111201.2019.a05
Frittella, F.M., 2022a. ASR-CAI Tool-Supported SI of Numbers: Sit Back, Relax and Enjoy Interpreting? In Esteves-Ferreira, J., Mitkov, R., Recort Ruiz, M., Stefanov, O.-M., Chambers, D., Macan, J., Sosoni, V., eds. Proceedings of the 43rd Conference Translating and the Computer. Editions Tradulex, Geneva, 88–102. URL https://www.tradulex.com/varia/TC43-OnTheWeb2021.pdf
Frittella, F.M., 2022b. CAI Tool-Supported SI of Numbers: A Theoretical and Methodological Contribution. International Journal of Interpreter Education 14(1), 32–56. URL https://doi.org/10.34068/ijie.14.01.05
Frittella, F.M., 2023. Usability Research for Interpreter-Centred Technology: The Case Study of SmarTerp. Language Science Press, Berlin (Translation and Multilingual Natural Language Processing, 21). URL https://doi.org/10.5281/zenodo.7376351
Frittella, F.M., 2024. Computer-Assisted Interpreting: Cognitive Task Analysis and Evidence-Informed Instructional Design Recommendations (PhD thesis). University of Surrey. URL https://doi.org/10.15126/thesis.901410
Gaber, M., Corpas Pastor, G., Omer, A., 2020. Speech-to-Text Technology as a Documentation Tool for Interpreters: A New Approach to Compiling an Ad Hoc Corpus and Extracting Terminology from Video-Recorded Speeches. TRANS. Revista de Traductología 24, 263–281. URL https://doi.org/10.24310/TRANS.2020.v0i24.7876
Gaido, M., Rodríguez, S., Negri, M., Bentivogli, L., Turchi, M., 2021. Is “Moby Dick” a Whale or a Bird? Named Entities and Terminology in Speech Translation. In Moens, M.-F., Huang, X., Specia, L., Yih, S.W.-T., eds. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 1707–1716. URL https://doi.org/10.18653/v1/2021.emnlp-main.128
Ghent University, 2021. Ergonomics for the Artificial Booth Mate (EABM). URL www.eabm.ugent.
be/survey/ (accessed 7.10.2024).
Gieshoff, A.C., 2021a. Does It Help to See the Speaker’s Lip Movements? An Investigation of Cog-
nitive Load and Mental Effort in Simultaneous Interpreting. Translation, Cognition & Behavior
4(1), 1–25. URL https://doi.org/10.1075/tcb.00049.gie
Gieshoff, A.C., 2021b. The Impact of Visible Lip Movements on Silent Pauses in Simultaneous Inter-
preting. Interpreting 23(2), 168–191. URL https://doi.org/10.1075/intp.00061.gie
Gieshoff, A.C., Schuler, M., Jahany, Z., 2024. The Augmented Interpreter: An Exploratory Study of
the Usability of Augmented Reality Technology in Interpreting. Interpreting 26(2), 282–315. URL
https://doi.org/10.1075/intp.00108.gie
Gile, D., 2009. Basic Concepts and Models for Interpreter and Translator Training. John Benja-
mins Publishing Company, Amsterdam (Benjamins Translation Library, 8). URL https://doi.
org/10.1075/btl.8
Goldsmith, J., 2018. Tablet Interpreting: Consecutive Interpreting 2.0. Translation and Interpreting
Studies 13(3), 342–365. URL https://doi.org/10.1075/tis.00020.gol
Guo, M., Han, L., Anacleto, M.T., 2023. Computer-Assisted Interpreting Tools: Status Quo and Future Trends.
Theory and Practice in Language Studies 13(1), 89–99. URL https://doi.org/10.17507/tpls.1301.11

Computer-assisted interpreting tools and tools training

Hansen-Schirra, S., 2012. Nutzbarkeit von Sprachtechnologien für die Translation. Trans-kom: Journal
of Translation and Technical Communication Research 5(2), 211–226. URL https://www.trans-kom.
eu/bd05nr02/trans-kom_05_02_02_Hansen-Schirra_Sprachtechnologien.20121219.pdf
Hart, S.G., Staveland, L.E., 1988. Development of NASA-TLX (Task Load Index): Results of Empir-
ical and Theoretical Research. In Hancock, P.A., Meshkati, N., eds. Advances in Psychology.
North-Holland, Amsterdam, 139–183. URL https://doi.org/10.1016/S0166-4115(08)62386-9
Jiang, H., 2013. The Interpreter’s Glossary in Simultaneous Interpreting: A Survey. Interpreting 15(1),
74–93. URL https://doi.org/10.1075/intp.15.1.04jia
Kalina, S., 2005. Quality Assurance for Interpreting Processes. Meta: Translators’ Journal 50(2),
768–784. URL https://doi.org/10.7202/011017ar
Keating, G.D., Jegerski, J., 2015. Experimental Designs in Sentence Processing Research: A Meth-
odological Review and User’s Guide. Studies in Second Language Acquisition 37(1), 1–32. URL
https://doi.org/10.1017/S0272263114000187
Kiraly, D., 2000. A Social Constructivist Approach to Translator Education: Empowerment from
Theory to Practice. St. Jerome Publishing, Manchester/Northampton.
Mellinger, C.D., 2019. Computer-Assisted Interpreting Technologies and Interpreter Cognition:
A Product and Process-Oriented Perspective. Revista Tradumàtica. Tecnologies de la Traducció
17, 33–44. URL https://doi.org/10.5565/rev/tradumatica.228
Mellinger, C.D., Hanson, T.A., 2018. Interpreter Traits and the Relationship with Technology and Visibil-
ity. Translation and Interpreting Studies 13(3), 366–392. URL https://doi.org/10.1075/tis.00021.mel
O’Brien, S., 2023. Human-Centered Augmented Translation: Against Antagonistic Dualisms. Per-
spectives 32(3), 391–406. URL https://doi.org/10.1080/0907676X.2023.2247423
Pisani, E., Fantinuoli, C., 2021. Measuring the Impact of Automatic Speech Recognition on Number Rendition
in Simultaneous Interpreting. In Wang, C., Zheng, B., eds. Empirical Studies of Translation and Interpreting:
The Post-Structuralist Approach. Routledge, New York, 181–197. URL https://www.taylorfrancis.com/
chapters/edit/10.4324/9781003017400-14/measuring-impact-automatic-speech-recognition-number-
rendition-simultaneous-interpreting-elisabetta-pisani-claudio-fantinuoli?context=ubx
Prandi, B., 2015a. L’uso di InterpretBank nella didattica dell’interpretazione: Uno studio esplorativo (MA
thesis). Università di Bologna. URL https://amslaurea.unibo.it/id/eprint/8206 (accessed 7.10.2024).
Prandi, B., 2015b. The Use of CAI Tools in Interpreters’ Training: A Pilot Study. In Esteves-Ferreira,
J., Macan, J., Mitkov, R., Stefanov, O.-M., eds. Proceedings of the 37th Conference Translating
and the Computer. AsLing, London, 48–57. URL https://aclanthology.org/2015.tc-1.8
Prandi, B., 2017. Designing a Multimethod Study on the Use of CAI Tools During Simultaneous Inter-
preting. In Esteves-Ferreira, J., Macan, J., Mitkov, R., Stefanov, O.-M., eds. Proceedings of the 39th
Conference Translating and the Computer. AsLing, Geneva, 76–88.
URL www.asling.org/tc39/wp-content/uploads/TC39-proceedings-final-1Nov-4.20pm.pdf
Prandi, B., 2018. An Exploratory Study on CAI Tools in Simultaneous Interpreting: Theoretical
Framework and Stimulus Validation. In Fantinuoli, C., ed. Interpreting and Technology. Language
Science Press, Berlin (Translation and Multilingual Natural Language Processing, 11), 29–59. URL
https://doi.org/10.5281/zenodo.1493293
Prandi, B., 2020. The Use of CAI Tools in Interpreter Training: Where Are We Now and Where Do
We Go from Here? inTRAlinea, 1–10. URL https://www.intralinea.org/specials/article/2512
Prandi, B., 2023. Computer-Assisted Simultaneous Interpreting: A Cognitive-Experimental Study on
Terminology. Language Science Press, Berlin (Translation and Multilingual Natural Language Pro-
cessing, 22). URL https://doi.org/10.5281/zenodo.7143056
Rodríguez, N., Schnell, B., 2009. A Look at Terminology Adapted to the Requirements of Inter-
pretation. Language Update 6(1), 21–25. URL https://www.noslangues-ourlanguages.gc.ca/fr/
favourite-articles/terminology-adapted-requirements-interpretation
Rodríguez, S., Frittella, F.M., Okoniewska, A.M., 2022. A Paper on the Conference Panel “In-Booth
CAI Tool Support in Conference Interpreter Training and Education”. In Esteves-Ferreira, J., Mit-
kov, R., Recort Ruiz, M., Stefanov, O.-M., Chambers, D., Macan, J., Sosoni, V., eds. Proceedings
of the 43rd Conference Translating and the Computer. Editions Tradulex, Geneva, 78–87. URL
www.tradulex.com/varia/TC43-OnTheWeb2021.pdf (accessed 7.10.2024).
Rodríguez, S., Gretter, R., Matassoni, M., Falavigna, D., Alonso, Á., Corcho, O., Rico, M., 2021.
SmarTerp: A CAI System to Support Simultaneous Interpreters in Real-Time. In Mitkov, R.,
Sosoni, V., Giguère, J. C., Murgolo, E., Deysel, E., eds. Proceedings of the Translation and Inter-
preting Technology Online Conference. Online: INCOMA Ltd., 102–109. URL https://doi.org/10
.26615/978-954-452-071-7_012
Rodríguez Melchor, M.D., Horváth, I., Ferguson, K., eds., 2020. The Role of Technology in Confer-
ence Interpreter Training. Peter Lang, New York.
Rütten, A., 2004. Why and in What Sense Do Conference Interpreters Need Special Software? Lin-
guistica Antverpiensia, New Series – Themes in Translation Studies 3, 167–178. URL https://doi.
org/10.52034/lanstts.v3i.110
Rütten, A., 2007. Informations- und Wissensmanagement im Konferenzdolmetschen. Peter Lang,
Berlin (Sabest. Saarbrücker Beiträge zur Sprach- und Translationswissenschaft, 15).
Saeed, M. A., Rodríguez González, E., Korybski, T., Davitti, E., Braun, S., 2022. Connected Yet
Distant: An Experimental Study into the Visual Needs of the Interpreter in Remote Simultaneous
Interpreting. In Kurosu, M., ed. Human-Computer Interaction: User Experience and Behavior.
Springer International Publishing, Cham (Lecture Notes in Computer Science, 13304), 214–232.
URL https://doi.org/10.1007/978-3-031-05412-9_16
Scherer, K.R., 1989. Vocal Correlates of Emotional Arousal and Affective Disturbance. In Wagner, H.,
Manstead, A., eds. Handbook of Social Psychophysiology. John Wiley & Sons, Chichester, 165–197.
Seeber, K.G., Kerzel, D., 2012. Cognitive Load in Simultaneous Interpreting: Model Meets Data. Inter-
national Journal of Bilingualism 16(2), 228–242. URL https://doi.org/10.1177/1367006911402982
Smith, P.L., Little, D.R., 2018. Small Is Beautiful: In Defense of the Small-N Design. Psychonomic
Bulletin & Review 25(6), 2083–2101. URL https://doi.org/10.3758/s13423-018-1451-8
Stoll, C., 2009. Jenseits simultanfähiger Terminologiesysteme: Methoden der Vorverlagerung und
Fixierung von Kognition im Arbeitsablauf professioneller Konferenzdolmetscher. WVT, Wissen-
schaftlicher Verlag Trier, Trier (Heidelberger Studien zur Übersetzungswissenschaft, 13).
Tripepi Winteringham, S., 2010. The Usefulness of ICTs in Interpreting Practice. The Interpreters’
Newsletter 15, 87–99. URL http://hdl.handle.net/10077/4751
Ünlü, C., 2023a. Automatic Speech Recognition in Consecutive Interpreter Workstation: Computer-
Aided Interpreting Tool ‘Sight-Terp’/Otomatik konuşma tanıma sistemlerinin ardıl çeviride
kullanılması: Sight-Terp (MA thesis). Hacettepe Üniversitesi.
Ünlü, C., 2023b. InterpreTutor: Using Large Language Models for Interpreter Assessment. In Orăsan,
C., Mitkov, R., Corpas Pastor, G., Monti, J., eds. International Conference on Human-Informed
Translation and Interpreting Technology (HiT-IT 2023). INCOMA Ltd., Naples, 78–96. URL
https://doi.org/10.26615/issn.2683-0078.2023_007
Van Cauwenberghe, G., 2020. Étude expérimentale de l’impact d’un soutien visuel automatisé sur la
restitution de terminologie spécialisée (MA thesis). Universiteit Ghent. https://lib.ugent.be/catalog/
rug01:002862551
Vogler, N., Stewart, C., Neubig, G., 2019. Lost in Interpretation: Predicting Untranslated Termi-
nology in Simultaneous Interpretation. In Burstein, J., Doran, C., Solorio, T., eds. Proceedings
of the 2019 Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for
Computational Linguistics, Minneapolis, 109–118. URL https://doi.org/10.18653/v1/N19-1010
Wan, H., Yuan, X., 2022. Perceptions of Computer-assisted Interpreting Tools in Interpreter Educa-
tion in Chinese Mainland: Preliminary Findings of a Survey. International Journal of Chinese and
English Translation & Interpreting 1, 1–28. URL https://doi.org/10.56395/ijceti.v1i1.8
Wang, H., Li, Z., 2022. Constructing a Competence Framework for Interpreting Technologies, and
Related Educational Insights: An Empirical Study. The Interpreter and Translator Trainer 16(3),
367–390. URL https://doi.org/10.1080/1750399X.2022.2101850
Will, M., 2020. Computer Aided Interpreting (CAI) for Conference Interpreters. Concepts, Con-
tent and Prospects. ESSACHESS-Journal for Communication Studies 13(25), 37–71. URL https://
www.essachess.com/index.php/jcs/article/view/480
Xu, R., 2018. Corpus-Based Terminological Preparation for Simultaneous Interpreting. Interpreting
20(1), 29–58. URL https://doi.org/10.1075/intp.00002.xu
Xu, R., Sharoff, S., 2014. Evaluating Term Extraction Methods for Interpreters. In Drouin, P., Grabar,
N., Hamon, T., Kageura, K., eds. Proceedings of the 4th International Workshop on Computa-
tional Terminology (Computerm). Association for Computational Linguistics and Dublin City
University, Dublin, 86–93. URL https://doi.org/10.3115/v1/w14-4811

9
DIGITAL PENS FOR
INTERPRETER TRAINING
Marc Orlando

9.1 Introduction
The term ‘digital pen technology’ was first used in the context of interpreter training in
2010 by the present author and referred solely to a smartpen used with a notepad (also
called pen-and-paper technology). Over the last decade, ‘digital pen’ has also been used
to refer to a stylus used with a tablet computer/a touchscreen device. The information
presented and discussed in this chapter will deal exclusively with digital pens/smartpens as
mobile devices combined with a notepad. It is worth noting, however, that several studies
dedicated to the use of styluses with tablet computers or touchscreen devices have been
carried out to establish their relevance to note-taking for consecutive interpreting (Altieri,
2020; Arumí and Sánchez-Gijón, 2019; Drechsel and Goldsmith, 2016; Goldsmith, 2023).
The chapter will also focus only on the use of such technology in interpreter training (for
detailed information on the use of digital pen technology – including tablet computers – in
interpreting training and research, see Orlando, 2023).
Smartpens belong to the category of mobile computing platforms and are input devices
which capture the handwriting of a user and convert analogue information into digi-
tal data. Depending on the model, they can have additional features, like an integrated
camera/text scanner and/or a microphone/audio recorder. They offer advanced process-
ing power, memory for handwriting capture and for audio or visual feedback, as well as
additional applications. Smartpens have been investigated, trialled, and recommended for
use in various fields of education since the early 2000s, such as design education,
engineering, health, and allied health (Boyle, 2012; Dawson et al., 2010; Grieve and
McGee-Lennon, 2010; Maldonado et al., 2006). It is only from 2010 that they appeared in
interpreter training and, in particular, in the area of note-taking for consecutive interpret-
ing. The technology has been used either to assess trainees’ interpreting performances and
provide interactive and dynamic feedback through peer or self-assessments, or to develop
process-oriented pedagogical activities that underpin the acquisition of metacognitive skills
and competence. Their innovative features have been praised by interpreting students and
instructors as they offer new opportunities to delve into the intricacies of note-taking for
consecutive interpreting. This chapter aims to review and present training initiatives
undertaken with smartpens, as well as to make pedagogical recommendations about
innovative and transformative classroom activities that would benefit interpreting students
and educators.

DOI: 10.4324/9781003053248-12
The evolution and the different development phases of digital pen technology will be
discussed in Section 9.2, followed by an overview of its use in interpreter training in specific
T&I programmes, as well as by recommendations about its possible classroom implemen-
tation, in Section 9.3. Section 9.4 will present the digital pens and related equipment cur-
rently available and also discuss the possibilities offered by emerging technologies allowing
automated note-taking. Finally, after summarising the chapter’s content, the conclusion
presented in Section 9.5 will focus on possible reasons for the underutilisation of smartpens
in interpreter training and question their future relevance in light of the emergence of more
web-based and paperless solutions.

9.2 Digital pen technology and its evolution


The concept of pen computing is not a recent one, as it ‘dates to the 1950s, when research-
ers first began to design and test digital computing systems that used handwriting recogni-
tion rather than a keyboard’ (Goldsmith, 2018, 342). The first stylus (called ‘Styalator’)
was actually presented by Tom Dimond at a computer conference in 1957 (Dimond, 1957).
It is over the following decades that digital pens for ‘paper-based computing’ were further
developed (Pogue, 2008), and several examples of the use of digital pens offering the pos-
sibility to write on paper and upload notes to be digitised were reported in the early 2000s
in areas such as teacher education (Nguyen, 2006) or design education (Maldonado et al.,
2006). A comprehensive review of the development and use of the technology during this
time can be found in Nguyen’s doctoral thesis (2006).
It is worth noting that Anoto was the first company in this field to develop a technology
that enabled the digitising of handwritten notes using digital pen and augmented paper as
well as ‘a digital camera/optical sensor and built-in flash memory’ (Nguyen, 2006, 41),
allowing the merging of the digital and the paper world. However, Anoto only provided the
pattern licence and a software kit for interested licensed partners to buy and develop their
own digital pen models. Partner companies included Logitech (which developed several
io models), Nokia (Nokia Digital Pen), Maxell (Penit), and soon after, Livescribe, whose
smartpens became well known in the interpreting world some years later.
The systems developed by Nokia and Logitech were praised by Maldonado et al., who
used them for fieldwork with their students in design, because the digital pens could be
used as normal ballpoint pens, allowing users to take normal notes with a pen and a note-
pad (Maldonado et al., 2006). This is an interesting comment when one considers similar
feedback from interpreters surveyed about their perception of the use of digital pens for
note-taking tasks (Orlando, 2015a). Another important development occurring in this area
in the late 1990s and early 2000s was that of audio-augmented note-taking systems, which
‘enable both speech record and note-taking to facilitate information capture during lec-
tures and meetings’ (Nguyen, 2006, 9). Such systems would consist of a digital pen, paper,
and a digitising pad with audio recording and playing elements, which would enable users
to easily link a pen stroke with an audio part. Products developed from such systems,
like Dynomite in 1997 or the Audio Notebook in 2001, paved the way for the Livescribe
technology.
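The core idea behind these audio-augmented systems — every pen stroke is time-stamped against the same clock as the recording, so a stroke can be used to seek the audio and an audio span can be used to replay the notes — can be sketched as a simple data model. The class and field names below are purely illustrative assumptions for this chapter, not an actual Anoto or Livescribe API:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Stroke:
    """One pen stroke, time-stamped against the session's recording clock."""
    start_ms: int                  # pen-down time
    end_ms: int                    # pen-up time
    points: List[Tuple[int, int]]  # digitised (x, y) coordinates

@dataclass
class Session:
    """A note-taking session: one audio track plus time-stamped strokes."""
    audio_length_ms: int
    strokes: List[Stroke] = field(default_factory=list)

    def audio_position_for(self, stroke: Stroke) -> int:
        """Tapping a stroke seeks the audio to the moment it was written."""
        return stroke.start_ms

    def strokes_during(self, t0_ms: int, t1_ms: int) -> List[Stroke]:
        """Strokes written while a given audio span was playing, so the
        notes can be replayed 'live' alongside the source speech."""
        return [s for s in self.strokes if s.start_ms < t1_ms and s.end_ms > t0_ms]

# Usage: two strokes taken against a one-minute recording
session = Session(audio_length_ms=60_000)
session.strokes.append(Stroke(4_200, 4_900, [(10, 12), (14, 15)]))
session.strokes.append(Stroke(9_100, 9_600, [(22, 30)]))
print(session.audio_position_for(session.strokes[0]))   # 4200
print(len(session.strokes_during(4_000, 5_000)))        # 1
```

The design choice is the shared time base: once strokes and audio sit on one clock, both directions of lookup (stroke → audio position, audio span → strokes) become trivial queries.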


Livescribe released its first smartpen (Pulse) in 2008, a more advanced model than any
of its predecessors as it combined an infrared digital camera, augmented paper, and audio
recording capability. Originally conceived to assist students or secretaries in their retrieving
of notes taken during lectures or meetings, it was subsequently used for research activi-
ties in various fields, such as education, engineering, health and allied health, or science
(Orlando, 2016). With advanced processing power, audio and visual feedback, memory for
handwriting capture, audio recording, and a few other additional applications, the Live-
scribe smartpen soon appeared as the ideal digital tool to explore the process of note-taking
in consecutive interpreting in an innovative manner.

9.3 Digital pens to enhance interpreter training

9.3.1 The use of smartpens in the field of interpreting studies


As Pöchhacker notes (2016, 199), ‘few systematic studies on the pedagogy of consecutive
interpreting have been carried out’, and ‘descriptive data on note-taking techniques are
scarce, and little is known about the practical application of the approaches put forward by
various authors’. Before smartpens were introduced in interpreting classrooms, because of
technological limitations, most research in the teaching of note-taking had focused on notes
as a product and rarely on the process of taking notes (Orlando, 2010). Things changed
with the advent of digital pen technology, as the unique features offered by smartpens are
ideal to record ‘live’ note-taking, allowing

simultaneous source-speech recording and note-image capture . . . and subsequent
synchronized replay and visualization on a computer screen. This permits posterior
analysis of the note-taking process ‘in real time’, with respect to both the type of notes
taken and temporal aspects of their production.
(Pöchhacker, 2016, 199)

As the audio and video data captured by the pen are synchronised and can be uploaded to
any computer or played back instantly on digital devices, such as tablets or smartphones,
instructors and trainees can view the ‘live’ notes taken by an interpreter and pinpoint their
qualities or defects in direct relation to the source speech and the interpretation (for a com-
prehensive technical overview, see Orlando, 2023).
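Because the captured strokes and the source-speech recording share one time base, temporal measures such as the ear–pen span (the delay between hearing a chunk and noting it) can be derived directly from the data. The sketch below is a deliberate simplification for illustration only — it pairs each segment onset with the first stroke at or after it, which is not how any commercial tool actually aligns notes:

```python
from typing import List, Optional

def ear_pen_lags_ms(segment_onsets: List[int],
                    stroke_onsets: List[int]) -> List[Optional[int]]:
    """For each source-speech segment onset, report the delay (in ms) until
    the first pen stroke written at or after it; None if no stroke follows,
    which may flag a missed chunk worth reviewing with the student."""
    lags: List[Optional[int]] = []
    for onset in segment_onsets:
        later = [s for s in stroke_onsets if s >= onset]
        lags.append(min(later) - onset if later else None)
    return lags

# Usage: three segments; notes were taken for the first two only
segments = [0, 5_000, 12_000]   # when each segment began in the recording
strokes = [1_800, 7_200]        # pen-down times of the corresponding notes
print(ear_pen_lags_ms(segments, strokes))   # [1800, 2200, None]
```

A real analysis would need a human (or a much smarter alignment) to decide which stroke answers which segment; the point is only that the synchronised timestamps make such process-oriented measures computable at all.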
These features have appealed to researchers interested in collecting process-oriented data
for research purposes (Mellinger, 2022) with higher ecological validity and have led to
research projects focusing primarily on cognitive processes at play in note-taking (Chen,
2017, 2020; Kellett Bidoli and Vardè, 2016) or on hybrid modes of interpreting (Hiebl,
2011; Mielcarek, 2017; Orlando, 2014; Özkan, 2020; Svoboda, 2020). As proposed by
Mellinger (2022), more could be investigated, since digital pen technology allows data col-
lection that can be interpreted in line with cognitive research on interpreting. Researchers
with an interest in enhancing interpreter training in note-taking have also recommended the
use of digital pen technology (Chen, 2017; Kellett Bidoli and Vardè, 2016; Orlando, 2010,
2015b). As experiments and trials of the tool carried out so far have shown, the system can
‘certainly be a very valuable aid to training in consecutive with notes’ (Setton and Dawrant,
2016, 198), and ‘an invaluable tool for teachers and students’ (Gillies, 2019, 225).


9.3.2 Reported pedagogical activities in the consecutive interpreting classroom
Despite this promising potential and the recommendations from researchers and trainers,
it appears that only a few trainers and educators have attempted to implement the use of
smartpens in the interpreting classroom. At the time of writing (2023), the only reported
implemented activities known to the present author are those by Kellett Bidoli (2016),
Kellett Bidoli and Vardè (2016), and Romano (2018), as well as those he has written about
(Orlando, 2010, 2015b).
The first article on the use of the tool in training (Orlando, 2010) presented the features
of the Livescribe pen, as well as specific pedagogical activities designed to assess the
qualities and defects of trainees’ notes and their impact on the interpreting performance.
The author drew on general research on note-taking, which suggests that text-to-speech/
speech-to-text technology and effective note-taking activities, coupled with review, can aid
learning and understanding, thereby enhancing comprehension, fluency, accuracy, speed,
endurance, and concentration. On this basis, he posited that if taking notes places too
heavy a demand on a student’s working memory to permit generative processing/analytical
listening in real time and, in the case of interpreting students, leads to poor performance,
the required analysis of the content can still occur during the follow-up review of the notes.
Given the difficulties many interpreting students face when reading their own notes, the
author argued that the synchronous juxtaposition of text and audio provided by smartpens
should induce greater learning when students read, review, and evaluate their own notes
during assessment activities.
The class activity he developed subsequently at his university in Australia followed a
five-step pedagogical sequence, during which students were asked to listen to a source
speech and take notes with a smartpen and to provide their interpretation while being
video-recorded. The video of the performance was then projected to the group for a collec-
tive quality analysis in terms of accuracy and presentation skills. The data recorded by the
pen (synchronised audio of the speech and filmed notes) were then played back on-screen
via a computer, and possible reasons and explanations for the deficiencies in the perfor-
mance were identified. Finally, comments, ideas, and suggestions on the choice of notes
(e.g. symbols, abbreviations), on misunderstood or missed chunks in the source speech, or
on ear–pen span were discussed collectively, and remediation strategies were shared. The
activity could be repeated several times for the whole class or within smaller groups, where
guided peer evaluations were provided. The recorded data was also provided to individual
students to be reviewed and analysed as part of a self-assessment activity.
The author also oversaw the implementation of the technology in the interpreting course
of the University of Mainz/Germersheim, using the pedagogical sequence mentioned earlier
and providing training for trainers. Through surveys carried out with the students and the
instructors from his university and from Mainz/Germersheim involved in using smartpens
for note-taking training for several weeks, he collected data on the general usability of the
tool and its features, its effects on note-taking analyses, and its overall benefit to the devel-
opment of quality note-taking conventions, including cross-student feedback. Responses
from students and instructors from both universities were extremely positive. Both groups
pointed out several important benefits, ranging from the observation of the ear–pen lag
in real time and how it relates to comprehension, to the identification of missed chunks of
information and of issues with note layout, the ability to track the sequencing of notes
throughout a speech, the possibility to view and discuss playbacks collectively and establish
cross-fertilisation processes in peer assessment and group work, and the improvement of
personal note-taking conventions over time (Orlando, 2015b, 184–192).
Kellett Bidoli (University of Trieste) also reported using Livescribe smartpens when teaching
consecutive interpreting, noting that the technology has opened new horizons by providing
trainers with ‘new innovative tools to teach with and evaluate consecutive in class’ (Kellett
Bidoli, 2016, 116). She praised the fact that the synchronised audio and filmed notes can
be uploaded and shared with the group immediately after performing, allowing trainees
to follow her observations and comments on a single student’s notes and add their own
suggestions. The exercise makes it ‘possible to observe, trace and count features of the
interpreted text which, during normal oral critique sessions in class, are inaccessible to
such a high degree of accuracy’. Observations focus on a student’s notes in relation to the
correct use of terminology in the interpretation, as well as ‘other features like ear-voice
span, additions, corrections, hesitations, false starts, repetitions, figures of speech, names,
facts and figures’ (Kellett Bidoli, 2016, 118). Such cross-fertilisation of ideas allows students
to ‘quickly pick up new solutions and dispel any doubts’ (Kellett Bidoli, 2016, 117). In
a different project, carried out with Vardè, the two authors pointed out that the data captured by
the pen make it possible ‘to return to any section of a speech and see the notes and/or listen to the
SL [source language] over and over to unravel the process, highlight mistakes and good or
bad choices which otherwise would go unnoticed’ (Kellett Bidoli and Vardè, 2016, 144).
The synchronised audio and video provided unique insights into lag, false starts, hesitations,
additions, and corrections. Finally, the authors also noted the interactional benefits gained, as
‘the job of collecting notes is fast, they can be observed in all their dynamicity and, together
with synchronization of the SL, much can be gleaned in the classroom making students
active participants’ (Kellett Bidoli and Vardè, 2016, 144).
What prompted Romano to implement the use of smartpens in consecutive interpreting
training at Innsbruck University was a sense of dissatisfaction when teaching note-taking,
in particular, the difficulty of identifying ‘what went wrong during the note-taking phase’ of a
substandard interpretation (Romano, 2018, 9). Asking students ‘to copy their notes from
their notebooks to the blackboard’ as a sharing activity was an impractical solution and ‘a
waste of precious time’, while with digital pens, ‘one gains valuable time that can be used
for in-depth analysis of the notes’ (Romano, 2018, 11). Working with a cohort of 52 stu-
dents, she decided to use the technology following the pedagogical scenarios recommended
by Orlando (2010) and concluded that it was particularly useful and helpful in one area
that ‘tends to be neglected in consecutive interpreting while being of paramount importance
in simultaneous interpreting, namely decalage’ (Romano, 2018, 13). She praised the storage
capacity of pens allowing students ‘to go back to previous notes and monitor how their
note-taking technique has evolved over time’ (Romano, 2018, 12), as well as the ease of use
and the potential for more interactivity in the classroom.
Whether digital pens are used at the very start of training (Romano) or at a later advanced
stage to assess progress and efficiency (Orlando), they can benefit trainees throughout the
duration of their course and later in their professional practice.
All the examples reported here concur on the many pedagogical and metacognitive benefits
such technology brings: access to ‘live’ notes makes it possible to identify at once, and
better understand, which parts of the source speech have been misunderstood, not
memorised, or missed, and how long the lag/décalage is. The possibility for students to
visualise the process of note-taking and identify better their own qualities or deficiencies,
to share ideas and get inspiration from other students and trainers, to understand bet-
ter and analyse what can go wrong if they take excessive or disorganised notes, is also
an incentive to use such technology. Various pedagogical activities and sequences can
be developed and implemented to allow students and trainers to identify issues in the
note-taking technique of a trainee (e.g. self- or peer evaluation), but also to develop per-
sonalised and effective remediation strategies through cross-fertilisation (Orlando, 2016,
117–121). Despite these indisputable advantages, one can regret that only so few initia-
tives have been reported so far. It is possible, though, that other trainers have been using
the technology in their interpreting classroom. In their survey of 60 interpreter trainers
teaching within the EMCI (European Masters in Conference Interpreting) consortium,
Riccardi et al. (2020, 31) noted that 9 of them (15%) use smartpens in their classes.
Unfortunately, it appears that none of them have reported on their activities and findings in publications.
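The décalage measurement made possible by synchronised notes can be illustrated with a minimal sketch. All names and data below are hypothetical (no actual pencast export format is assumed): given speech segments and pen strokes time-stamped against the same audio, the ear–pen lag is simply the gap between a segment being uttered and the first stroke written after it.

```python
from dataclasses import dataclass

@dataclass
class SpeechSegment:
    text: str
    start: float  # seconds into the source speech

@dataclass
class PenStroke:
    note: str     # the symbol or word jotted down
    start: float  # seconds into the synchronised audio

def decalage(segments, strokes):
    """Map each speech segment to the lag (in seconds) before the first
    pen stroke made at or after the moment the segment was uttered."""
    lags = {}
    for seg in segments:
        later = [s for s in strokes if s.start >= seg.start]
        if later:
            lags[seg.text] = min(s.start for s in later) - seg.start
    return lags

segments = [SpeechSegment("opening greeting", 0.0),
            SpeechSegment("first key figure", 6.5)]
strokes = [PenStroke("HELLO", 2.0), PenStroke("42%", 9.0)]
print(decalage(segments, strokes))  # {'opening greeting': 2.0, 'first key figure': 2.5}
```

A trainer could chart these per-segment lags over successive exercises, mirroring the monitoring over time that Romano describes.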

9.3.3 Recommendations on the use of smartpens in interpreter education


As Sandrelli (2015) recalls, technology has always played an important role in professional interpreting, so it is no surprise that interpreter trainers are expected to introduce useful new technologies into their classrooms. Digital equipment and resources for interpreter training are increasingly accessible today, and 'essential routine teaching tasks now seem inconceivable without the use of technology' (Frittella, 2021, 104). However, even though digital literacy and 'digital citizenship skills'
(Darden, 2019) are today a priority for many universities and information and commu-
nications technologies (ICTs) in interpreter education seem to appeal to more and more
interpreting trainers and researchers, the use of technological tools in interpreting classrooms and their integration into curricula appear to remain marginal (Darden, 2019; Prandi, 2020; Riccardi et al., 2020). Moreover, data from Prandi's 2017 survey of CIUTI (Conférence Internationale permanente d'Instituts Universitaires de Traducteurs et Interprètes)
institutions on the use, diffusion, and inclusion of CAI (computer-assisted interpreting)
tools in conference interpreter training revealed some confusion about what technologi-
cal tools are available and how they can be implemented in training (Prandi, 2020, 3), an
observation corroborated by Riccardi et al. (2020) and by Frittella (2021), who proposes a redefinition and classification of computer-assisted interpreter training (CAIT).
For Prandi and Riccardi et al., the marginal in-class use of technologies in interpreter education may be due to trainers' lack of awareness of relevant training tools, but also to a lack of pedagogical exploration of less-conventional teaching methods. Recommendations for flexible ways to change institutional policies
and practices, as well as for specialised training-for-trainers seminars, are echoed in
Darden’s study of sign language interpreting educators and trainers (2019), where the
author found that educators 'cared about the digital citizenship of their students'
but ‘expressed a desire for clear policies at the institutional and program level to guide
their pedagogy’ (Darden, 2019, 157). This is a sentiment shared by other authors who
have noted that tailored training of trainers and educators in didactics, classroom tech-
niques, and curriculum/syllabus design should be considered to maximise the teaching
and learning benefits of using technologies on a more systematic basis (Frittella, 2021;
Orlando, 2019).
Similar to what Frittella recommends (2021) regarding computer-assisted interpreter
training (CAIT) and the use of technology in interpreter education, the use of digital pen
technology and smartpens in the consecutive interpreting classroom should be envisaged
and implemented on the basis of its intended pedagogical purpose. As discussed by Ahrens
and Orlando (2021), a favoured approach in interpreter education aims at putting the student at the centre of the teaching/learning act. Any such constructivist intention
relies on various pedagogical elements that will assist trainees in their learning process:
metacognitive strategies, process-oriented and product-oriented evaluations, or feedback
mechanisms, among others. In the teaching of note-taking in interpreting, the use of smart-
pens has been a crucial technological advance to achieve such objectives and should be
advocated for on a broader scale.

9.4 Available technology and equipment

9.4.1 Current smartpen technology


The aforementioned initiatives on the use of smartpens in interpreter training were all conducted with Livescribe models. In 2010, the company released its second model, the Echo, a pen with features similar to those of the Pulse, though lighter and less bulky. Both pens have built-in audio
recording and can be used with 3D recording headsets in noisy environments. Uploading
and managing recorded files is done via the Livescribe desktop software, to be installed on a
computer by the user of the pen. The most recent generations of the Livescribe smartpens do
not have built-in microphones anymore and can record and synchronise audio through any
digital device paired with the pen via Bluetooth and the Livescribe app.
For training purposes, Kellet Bidoli and Vardè (2016) and Romano (2018) used Echo.
The present author used both Pulse and Echo models (2010–2015), as well as Livescribe
3 more recently.
Digital pen technology is an evolving technology, and new developments and advances
are frequently reported. The description of the devices listed hereafter should therefore be
considered valid only at the time of writing (2024). When looking for a digital pen model
to be chosen for training purposes, one must ensure that audio recording features are available. Some models allow the capture and digitisation of handwritten notes and their conversion to text to be visualised 'live', but do not offer the possibility of recording the audio of the speech to be synched with the notes. Smartpens with the proper features known to
the author are listed hereafter in alphabetical order:

• The Livescribe range, requiring microdotted augmented paper: Pulse and Echo, with
built-in microphones and syncing possible with desktop via USB; Sky WiFi, Livescribe
3, Aegir, Symphony, and Echo 2, with audio syncing possible via a digital device paired
with Bluetooth and the Livescribe application. Audio files are stored separately and recorded as sharable pencasts synched with the notes. It is worth noting that, at the time of writing, all models except the Symphony and Echo 2 appear to have been discontinued,
though some may still be available for purchase.
• The Neo models Smartpen M1+ and Smartpen N2 are used with the versatile Neo Studio
app and augmented paper too.
• Moleskine Pen+ Ellipse, which offers an audio recording option on a paired phone, tab-
let, or computer via its application and works with specific Moleskine+ notebooks.
• The SyncPen NEWYES second-generation smartpen works both on paper and on a dedicated tablet and can be paired with any digital device via its app.
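Whatever the pairing mechanism, the note–audio synchronisation these pens share rests on one simple idea: each stroke is stored with the audio offset at which it was written, so tapping a note replays the recording from that moment. The sketch below is a hypothetical illustration of that principle, not any vendor's actual data format:

```python
# Each stroke is a (note, audio_offset_in_seconds) pair, as captured while
# the pen records. Tapping a note looks up its offset and rewinds slightly
# so the user hears the context leading up to the stroke.
def replay_offset(strokes, tapped_note, rewind=2.0):
    for note, offset in strokes:
        if note == tapped_note:
            return max(0.0, offset - rewind)
    raise KeyError(tapped_note)

strokes = [("INTRO", 1.5), ("GDP↑", 14.0), ("2025?", 31.2)]
print(replay_offset(strokes, "GDP↑"))  # 12.0
```

This look-up is what lets a student jump straight from a problematic note to the corresponding passage of the source speech.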

9.4.2 Towards automated note-taking?


In the last few years, the advent of automatic speech recognition (ASR) tools and
speech-to-text technology has affected the world of interpreters, and as noted by Pöch-
hacker (2016, 188), ‘ASR has a considerable potential for changing the way interpreting
is practised’. As reported in online blogs,1 fora, and webinars,2 more and more web-based
speech recognition tools are available for practitioners to generate real-time transcription
during simultaneous interpreting assignments. SmarTerp and InterpretBank, for example, are tools that allow the extraction of key information from a generated transcript of a speech, such as numbers, names, and key terms from a paired glossary. Researchers like Defrancq
and Fantinuoli (2020) have shown the advantages of such tools with error reduction in
the rendition of numbers during simultaneous interpreting, and other experiments are cur-
rently being carried out to test ASR’s efficiency in enhancing the work of interpreters (e.g.
the Ergonomics for the Artificial Booth Mate at Ghent University). Though more research
is needed, it is acknowledged that ASR tools could be useful in areas of interpreting, such
as preparation tasks and terminology organisation, term extraction, information retrieval
and corpus building, or running transcription, and can overall enhance interpreting quality (Chen and Kruger, 2023; Cheung and Li, 2022). An interesting question to investigate would be whether such tools have the potential to change the way interpreter training is carried out in both simultaneous and consecutive interpreting, and in note-taking in particular.
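The key-information extraction such tools perform can be caricatured in a few lines. The sketch below is purely illustrative (the glossary entries and the regular expression are the author's assumptions, not the algorithm of SmarTerp, InterpretBank, or any other product): it scans an ASR transcript for figures and for terms present in a paired glossary.

```python
import re

# Hypothetical paired glossary: source term -> target-language rendering.
GLOSSARY = {"net zero": "neutralité carbone", "tariff": "droit de douane"}

def extract_triggers(transcript):
    """Return the classic 'problem triggers': figures, plus glossary terms
    found in the transcript, with their prepared translations."""
    triggers = [("number", n) for n in re.findall(r"\d+(?:\.\d+)?%?", transcript)]
    low = transcript.lower()
    triggers += [("term", f"{t} -> {tr}") for t, tr in GLOSSARY.items() if t in low]
    return triggers

text = "The tariff rose by 12.5% as net zero targets were set for 2050."
print(extract_triggers(text))
```

In a training context, running such a function over a practice speech's transcript would yield a list of triggers that could later be checked against the student's notes.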
One initiative to mention in this area is the use of Cymo Note for consecutive inter-
preting and automated note-taking. Cymo Note is a ‘professional multilingual note-taking
software for interpreters’ presented as a ‘smart interpretation assistant’ (Cymo Note, 2023)
that performs speech recognition in real time, dictates the processed output in multiple languages, and can be customised and linked to individual glossaries. The user can
also annotate directly on the provided text. One interesting feature relevant to the present
chapter is the fact that the tool ‘combines a running transcription (highlighting of key terms
and figures) with a virtual notepad’ (Techforword, 2023). This allows the user to divide the
screen in two, a layout often favoured by interpreters, which provides space to take notes
next to the generated transcript. In a consecutive interpreting context, the source speech
would be captured and transcribed in real time, with specific key information extracted and highlighted, and would appear on the left-hand part of the divided screen/page; the interpreter would not only take notes on the right-hand part of the page but would also be able to annotate the transcript with information useful during the reformulation phase. In
real-life assignments, confidentiality and data privacy could be an issue, and authorisation
to use such a web-based application would have to be obtained from the client. In training
contexts, however, one could easily see the benefits in trialling the tool and implementing
pedagogical activities in the consecutive interpreting classroom. As mentioned earlier, this
would require open-mindedness to explore less-conventional teaching methods on the part
of instructors, but also institutional flexibility and tailored support on the part of interpret-
ing programmes and universities.
Could such a tool be a substitute for traditional pen and paper or for smartpens? It might
become the ‘next-level’ tool for those trainees/practitioners who already use touchscreen
devices or tablets. For those who still prefer relying on a pen and a notepad and are not
comfortable using a stylus, this might be a step too far. Time will tell.

9.5 Conclusion
The use of smartpens in interpreter training over the last ten years has been reviewed
in this chapter. Smartpens with video and audio recording features used with notepads,
also known as pen-and-paper technology, which still allows note-taking with pen and paper, offer undeniable advantages. Initiatives implemented in the consecutive interpreting classroom, albeit still too few, have demonstrated how access to 'live' notes offers invaluable and unmatched pedagogical opportunities to dissect trainees' note-taking process and to provide personalised remediation for the issues they may encounter. Trainers and trainees who have trialled smartpens praise in particular the synchronisation of handwritten
notes with the audio recording of the source speech, as well as the change of dynamics and
the increased engagement of students during and outside class time thanks to the potential
for more exchange of ideas.
As was noted, digital pen technology and smartpens are still underutilised in interpreter
training, even though a lot more could still be done to teach interpreting students to develop
appropriate note-taking systems that would ultimately improve the quality of their interpreting performances. The reasons for such underutilisation are likely multiple and include the budget needed to acquire pens and to replace the microdotted paper (though it can now be printed out from the Livescribe desktop software at no cost), the perceived complexity of setting up and using the tool, the lack of technological support available to trainers at their universities, and the reluctance of some professionals to use such technology (Orlando,
2023). As new digital tools and systems emerge, the future and the relevance of smart-
pens may also be questioned. Though the fact that they are used as ‘normal’ pens with
a notepad seemed to have been a positive characteristic at first (Maldonado et al., 2006;
Orlando, 2010), the paperless digital devices and web-based technologies currently avail-
able for note-taking and audio recording may be more convenient and appealing to current
and future generations of educators and students, who may no longer use pens and paper.
In any case, for traditional teaching methods to evolve, and for the various technologies that aim to enhance the practice of interpreting (be they digital pens and smartpens, automatic speech recognition tools, or any future relevant technology) to be implemented in interpreter education and to become more widespread and more democratised than they are today, a shift would need to occur in interpreting programmes and in universities.
To make sure future graduates are trained to respond to the contemporary demands of
our discipline and our profession, and to fulfil their digital literacy obligations by exposing
students to a variety of technological tools, interpreting departments would need to build
capacity and ensure that trainers and researchers in education are made aware of the exist-
ence of such tools and systems, have access to them, and are trained in interpreting didactics
to gain the knowledge to develop purposeful pedagogical activities with them. As philoso-
pher and educational reformer John Dewey once put it, ‘if we teach today as we taught
yesterday, we rob our children of tomorrow’ (Dewey, 1897). If we want to see interpreting
education and practice remain relevant, a thorough focus on ongoing technological devel-
opments and changes, and on their impact on the interpreting world, is essential.


Notes
1 Live prompting CAI tools – a market snapshot – Dolmetscher-wissen-alles.de (sprachmanagement.net)
(accessed 1.7.2024).
2 www.techforword.com/resources (accessed 12.9.2024).

References
Ahrens, B., Orlando, M., 2021. Note-Taking for Consecutive Conference Interpreting. In Albl-Mikasa,
M., Tiselius, E., eds. The Routledge Handbook of Conference Interpreting. Routledge, London
and New York, 34–48.
Altieri, M., 2020. Tablet Interpreting: Étude expérimentale de l’interprétation consécutive sur tab-
lette. The Interpreters’ Newsletter 25, 19–35.
Arumí, M., Sánchez-Gijón, P., 2019. La toma de notas con ordenadores convertibles en la
enseñanza-aprendizaje de la interpretación consecutiva. Resultados de un estudio piloto en una
formación de master. Tradumàtica 17, 128–152. URL https://doi.org/10.5565/rev/tradumatica.234
Boyle, JR., 2012. Note-Taking and Secondary Students with Learning Disabilities: Challenges and
Solutions. Learning Disabilities Research & Practice 27(2), 90–104.
Chen, S., 2017. Note-Taking in Consecutive Interpreting: New Data from Pen Recording. The Inter-
national Journal for Translation and Interpreting Research 9(1), 4–23.
Chen, S., 2020. The Process of Note-Taking in Consecutive Interpreting: A Digital Pen Recording
Approach. Interpreting 22(1), 117–139.
Chen, S., Kruger, J.L., 2023. The Effectiveness of Computer-Assisted Interpreting: A Preliminary
Study Based on English-Chinese Consecutive Interpreting. Translation and Interpreting Studies
18(3), 399–420. URL https://doi.org/10.1075/tis.21036.che
Cheung, A.K.F., Li, T., 2022. Machine Aided Interpreting: An Experiment of Automatic Speech Rec-
ognition in Simultaneous Interpreting. Translation Quarterly 104, 1–20.
Cymo Note, 2023. URL www.cymo.io/en/documentation/note/index.html
Darden, V., 2019. Educator Perspectives on Incorporating Digital Citizenship Skills in Interpreter
Education (PhD thesis). Walden University.
Dawson, L., Plummer, V., Weeding, S., Harlem, T., Ribbons, B., Waterhouse, D., 2010. Build-
ing a System for Managing Clinical Pathways Using Digital Pens. URL https://ro.uow.edu.au/
infopapers/1471
Defrancq, B., Fantinuoli, C., 2020. Automatic Speech Recognition in the Booth. Target 33(1), 1–30.
URL https://doi.org/10.1075/target.19166.def
Dewey, J., 1897. My Pedagogic Creed. E. L. Kellog & Co., New York.
Dimond, T., 1957. Devices for Reading Handwritten Characters. Proceedings from the Eastern Joint
Computer Conference 232–237. URL https://doi.org/10.1145/1457720.1457765
Drechsel, A., Goldsmith, J., 2016. Tablet Interpreting: The Evolution and Uses of Mobile Devices in
Interpreting. URL http://independent.academia.edu/adrechsel
Frittella, F.M., 2021. Computer-Assisted Conference Interpreter Training: Limitations and Future
Directions. Journal of Translation Studies 2, 103–142.
Gillies, A., 2019. Consecutive Interpreting: A Short Course. Routledge, London and New York.
Goldsmith, J., 2018. Tablet Interpreting: Consecutive 2.0. Translation and Interpreting Studies 13(3),
342–365.
Goldsmith, J., 2023. Tablet Interpreting: A Decade of Research and Practice. In Corpas Pastor, G.,
Defrancq, B., eds. Interpreting Technologies – Current and Future Trends. IVITRA Research in
Linguistics and Literature, 37. John Benjamins, Amsterdam and Philadelphia, 27–45. URL https://
doi.org/10.1075/ivitra.37.02gol
Grieve, C.R., McGee-Lennon, M., 2010. Digitally Augmented Reminders at Home. URL www.cs.stir.
ac.uk/~kjt/research/match/resources/documents/grieve-reminders.pdf (accessed 12.9.2024).
Hiebl, B., 2011. Simultanes Konsekutivdolmetschen mit dem LivescribeTM EchoTM Smartpen
[Simultaneous Consecutive Interpreting with the LivescribeTM EchoTM Smartpen] (MA disserta-
tion). University of Vienna.
Kellet Bidoli, C.J., 2016. Traditional and Technological Approaches to Learning LSP in Italian to
English Consecutive Interpreter Training. In Garzone, G., Heaney, D., Riboni, G., eds., Focus on
LSP Teaching: Developments and Issues. LED, Milan, 103–126.
Kellet Bidoli, C.J., Vardè, S., 2016. Digital Pen Technology and Consecutive Note-Taking in the Class-
room and Beyond. In Zehnalová, J., Molnár, O., Kubánek, M., eds. Interchange Between Lan-
guages and Cultures: The Quest for Quality. Palacký University, Olomouc, 131–148.
Maldonado, H., Lee, B., Klemmer, S., 2006. Technology for Design Education: A Case Study. In
Proceedings of the Conference on Computer Human Interaction: CHI 2006. Montreal, Canada,
1067–1072.
Mellinger, C.D., 2022. Cognitive Behaviour During Consecutive Interpreting: Describing the
Note-Taking Process. The International Journal of Translation and Interpreting Research 14(2),
103–119.
Mielcarek, M., 2017. Das simultane Konsekutivdolmetschen [Simultaneous Consecutive Interpreting]
(MA dissertation). University of Vienna.
Nguyen, N.P.H., 2006. Note Taking and Sharing with Digital Pen and Paper: Designing for
Practice-Based Teacher Education (MA dissertation). Trondheim University of Science and
Technology.
Orlando, M., 2010. Digital Pen Technology and Consecutive Interpreting: Another Dimension in
Note-Taking Training and Assessment. The Interpreters’ Newsletter 15, 71–86.
Orlando, M., 2014. A Study on the Amenability of Digital Pen Technology in a Hybrid Mode of
Interpreting: Consec-Simul with Notes. The International Journal of Translation and Interpreting
Research 6(2), 39–54.
Orlando, M., 2015a. Digital Pen Technology and Interpreter Training, Practice and Research: Status
and Trends. In Ehrlich, S., Napier, J., eds. Interpreter Education in the Digital Age. Gallaudet
University Press, Washington, DC, 125–152.
Orlando, M., 2015b. Implementing Digital Pen Technology in the Consecutive Interpreting Class-
room. In Andres, D., Behr, M., eds. To Know How to Suggest . . . Approaches to Teaching Confer-
ence Interpreting. Frank & Timme, Berlin, 171–199.
Orlando, M., 2016. Training 21st Century Translators and Interpreters: At the Crossroads of Prac-
tice, Research and Pedagogy. Frank & Timme, Berlin.
Orlando, M., 2019. Training and Educating Interpreter and Translator Trainers as
Practitioners-Researchers-Teachers. The Interpreter and Translator Trainer 13(3), 216–232. URL
https://doi.org/10.1080/1750399X.2019.1656407
Orlando, M., 2023. Using Smartpens and Digital Pens in Interpreter Training and Interpreting
Research: Taking Stock and Looking Ahead. In Corpas Pastor, G., Defrancq, B., eds. Interpreting
Technologies – Current and Future Trends. IVITRA Research in Linguistics and Literature, 37.
John Benjamins, Amsterdam and Philadelphia, 6–26. URL https://doi.org/10.1075/ivitra.37.01orl
Özkan, C.E., 2020. To Use or Not to Use a Smartpen: That Is the Question. An Empirical Study on
the Role of Smartpen in the Viability of Simultaneous-Consecutive Interpreting (MA dissertation). Ghent University.
Pöchhacker, F., 2016. Introducing Interpreting Studies, 2nd ed. Routledge, London and New York.
Pogue, D., 2008. Gadget Fanatics, Take Note. The New York Times, 8.5.2008. URL www.nytimes.
com/2008/05/08/technology/personaltech/08pogue.html (accessed 12.9.2024).
Prandi, B., 2020. The Use of CAI Tools in Interpreter Training: Where We Are Now and Where Do
We Go from Here. inTRAlinea Special issue: Technology in Interpreter Education and Practice.
URL www.intralinea.org/specials/article/the_use_of_cai_tools_in_interpreter_training
Riccardi, A., Ceňková, I., Tryuk, M., Maček, A., Pelea, A., 2020. Survey of the Use of New Technolo-
gies in Conference Interpreting Courses. In Rodriguez Melchor, M.D., Horváth, I., Ferguson, K.,
eds. The Role of Technology in Conference Interpreter Training. Peter Lang, Oxford, 7–42.
Romano, E., 2018. Teaching Note-Taking to Beginners Using a Digital Pen. Między Oryginałem a
Przekładem 24(42), 9–16.
Sandrelli, A., 2015. Becoming an Interpreter: The Role of Computer Technology. MonTI Special Issue
2, 111–138.
Setton, R., Dawrant, A., 2016. Conference Interpreting: A Trainer’s Guide. John Benjamins, Amster-
dam and Philadelphia.
Svoboda, S., 2020. SimConsec: The Technology of a Smartpen in Interpreting (MA dissertation).
Palacký University Olomouc.
Techforword, 2023. Cymo Note: Speech Recognition Meets Automated Note-Taking. URL www.tech-
forword.com/blog/cymo-note-speech-recognition-meets-automated-note-taking (accessed 12.9.2024).

10
TECHNOLOGY FOR
TRAINING IN CONFERENCE
INTERPRETING1
Amalia Amato, Mariachiara Russo, Gabriele Carioli
and Nicoletta Spinolo

10.1 Introduction
The turn of the 20th century marked a watershed in interpreter-mediated communication.
The advent of technologies applied to conference interpreting has profoundly affected the
way in which the speaker’s message could be put across in another language. Firstly, an
IBM system enabled simultaneous interpreting. This mode was officially launched during
the Nuremberg Trials (Baigorri-Jalón, 2000) and remains the standard interpreting mode
in multilingual conferences and international organisations today. Secondly, video-based
conference interpreting systems have enabled remotely interpreted communication to take
place on an unprecedented scale.
Technology affordances have had a remarkable impact not only on conference interpreting but also on interpreter education. In this field, technological tools have been developed to meet training demands that have evolved over the years. These tools are
known as computer-assisted interpreter training tools, or CAIT tools (Fantinuoli, 2023;
Prandi, 2020).
This chapter provides an overview of the fast-paced evolution of CAIT tools for educa-
tional purposes (Section 10.2). It then discusses two examples of specifically developed tools
for online training and self-training (Sections 10.3 and 10.4 and related subsections). In light
of technological advances in interpreting, Section 10.5 deals with the need to include soft
skills training in interpreting curricula and presents an exploratory study on the use of an
AI-enhanced CAI tool in an educational setting. Section 10.6 offers some concluding remarks.

10.2 CAIT evolution in conference interpreting educational settings


The first wave of CAIT solutions was conceived in the 1990s to meet the demand for
suitable interpreter training materials by providing speech collections that are classified
according to levels of difficulty (beginner, intermediate, advanced) and interpreting mode
(consecutive or simultaneous). Seminal speech repositories were created by the University
of Granada (MARIUS, Boéri and de Manuel Jerez, 2011), and additional prototypes were
developed by the University of Trieste (Iris, Gran et al., 2002).

DOI: 10.4324/9781003053248-13 

These first multilingual tools, the use of which was confined to their institutions of
origin, set the stage for the abundant online resources of the present day. Examples of cur-
rently available resources include the Interpreter Training Resources2 website, developed
by Andrew Gillies, or specific sections of the Knowledge Centre on Interpretation.3 Exam-
ples of accessible multilingual speech repositories include Speechpool, conceived by Sophie
Llewellyn Smith;4 the EU-funded ORCIT;5 and the multilayered Speech Repository, created
by the DG SCIC of the EU Commission.
Another training demand which emerged during this time concerned the need to effi-
ciently organise self-study activities for interpreter trainees. Sandrelli (2005) and Merlini
(1996) created two original pieces of software: Blackbox and InterprIT, respectively. These
examples of software innovation resulted from the collaboration between these two inter-
preting scholars and IT engineers. Blackbox included structured materials and exercises,
along with the possibility for users to record student performance and receive trainer feed-
back. From an educational prototype, Blackbox became a fully-fledged, somewhat popular
commercial product (Sandrelli, 2005). Although designed specifically for consecutive inter-
preting practice (Merlini, 1996; Gran et al., 2002), InterprIT never reached the commercial
production stage, unfortunately.
The gap filled by these initial innovative training tools in interpreter education is fur-
ther bridged today by the rise of virtual learning environments. Here, multiple training
resources can be stored, accessed, and exchanged. Examples include the general-purpose
platform Moodle, which can be used profitably to train interpreters (Kajzer-Wietrzny and
Tymczynska, 2014; Russo and Spinolo, 2022). Inspired by the concept of Moodle, Bertozzi
(2024), at the Department of Interpreting and Translation of the University of Bologna at
Forlì, has designed a self-training platform specifically for interpreter education. The plat-
form consists of several modules which both recap theoretical basic concepts of interpreter
training and provide a wide-ranging supply of principled training materials and evaluation
tools.
More recent developments in technology for interpreter education also appear in the field
of dialogue interpreting. These include computer-generated 3D virtual environments that
simulate credible business and community contexts, without exposing trainees to the stress-
ful conditions of real professional settings. These tools are the result of two EU-funded pro-
jects led by the University of Surrey: IVY (Interpreting in Virtual Reality; Braun and Slater,
2014; Ritsos et al., 2013) and EVIVA (Evaluating the Education of Interpreters and Their
Clients through Virtual Learning Activities) (Braun et al., 2014, this volume). In particular,
IVY was implemented in Second Life (SL) and includes virtual locations in which both
trainee interpreters and interpreting clients can practice, individually or collaboratively, via
virtual representations of themselves (avatars).
Another need arising in interpreter education concerns terminology retrieval and man-
agement, glossary compilation and ‘on-the-job’ consultation, term memorisation, and doc-
umentation. Meeting this demand has led to the second wave of technologies for interpreter
education. Whereas some proposed technological solutions have remained at master-thesis
level without any commercial development thus far (e.g. Pollice, 2016), others have enjoyed
wider diffusion, such as InterpretBank, developed by Fantinuoli (2023), which has become
a widely used CAIT tool. Examples of other commercial CAIT tools with similar features
include Interpreter’s Help or Interplex (see Fantinuoli, 2023).
A wider overview of the main CAIT tools developed during the second wave and, more
importantly, how and when to use them to promote guided student autonomy and more


efficient learning environments is provided by Kajzer-Wietrzny and Tymczynska (2014).


A more recent, in-depth report on the array of IT applications, including CAIT tools, in
interpreter education was promoted by the European Masters in Conference Interpreting
and published by Rodríguez Melchor et al. (2020).
The combination of automatic speech recognition (ASR) software and artificial intel-
ligence (AI) has spurred the third wave of CAIT, which was closely linked to a new gen-
eration of CAI tools. These CAI tools aim to assist the simultaneous interpreter when
dealing with well-known problem triggers (named entities, figures, acronyms, and terms)
by transcribing them and providing their target language translation on a computer screen
as they are uttered by the speaker. Given their valuable support during an assignment, these
systems have become known as ‘digital boothmate[s]’ (Fantinuoli, 2023), and conference
interpreters are expected to increasingly avail themselves of such tools during professional
assignments. Consequently, some interpreter education institutions are testing them with
their trainees in order to familiarise them with their usage, while also collecting student
feedback and assessing the cognitive and physiological impact of such systems on users.
Defrancq and Fantinuoli (2020) assessed InterpretBank,6 and Olalla-Soler et al. (2023) and
Russo et al. (forthcoming) assessed the beta educational version of SmarTerp.7 Olalla-Soler
et al. focused on a physiological perspective, while Russo et al. focused on the students’
performance and their perception of usability of the tool. SmarTerp was also assessed by
Frittella (2023) in an in-depth, insightful study which addressed several aspects of human–
computer interaction and interpreter-centred technologies: cognition, usability design, and
evidence-based instructional design. The use of in-booth CAI tools while training conference interpreters has also been the object of debate among different stakeholders. Recent discussions have focused on when and how to introduce in-booth CAI tools in
interpreter education and training, and on key research issues, including how to identify
educational needs and develop technologies accordingly (Rodríguez et al., 2022).
As demonstrated thus far, some CAIT tools have been upgraded to provide features that could turn them into efficient ‘digital boothmates’ for computer-assisted interpreting (e.g. InterpretBank’s conference modality). Since these tools were originally conceived as educational devices, they differ from the professional (remote) conference interpreting platforms that now incorporate CAI tools (e.g. KUDO Interpreter Assist and SmarTerp; Rodríguez et al., 2022) and that may have been integrated into interpreter training. A clear distinction therefore has to be made between the inclusion of such ‘upgraded’ conference platforms and the actual CAIT tools for interpreter training and education covered in this chapter.
In general, today’s interpreter education institutions are aware of the inevitability of
introducing CAI and CAIT tools into their dedicated theoretical courses, as well as into
their courses that combine in-booth practice and workshops. To illustrate, Prandi’s (2020)
survey found that half of the 25 respondent institutions reported already including CAIT
tools in interpreter training, while others expressed a growing interest in expanding their
curricula to include them in the future.
One of the universities surveyed by Prandi was the Department of Interpreting and
Translation (DIT) of the University of Bologna at Forlì. DIT has already incorporated CAIT
training into its curriculum, both within its theoretical classes as well as its practical ses-
sions. Indeed, DIT has recognised the need to provide interpreter trainees with a variety of
computer-assisted training tools for individual or collective practice of the most required
interpreting skills, namely, sight translation, consecutive, and simultaneous interpreting.

Technology for training in conference interpreting

As a result, DIT has also developed two innovative open-source CAIT tools: InTrain and
ReBooth. Their development was based on fruitful collaboration between an IT specialist
and an interpreting trainer/researcher (Carioli and Spinolo, 2019, 2020). Mastering these
skills requires a long, self-paced learning curve in a student-centred, collaborative environment; the availability of peers; trainer feedback; and adequate physical spaces in which to practise and sit exams. These had always been considered prerequisites in interpreter education institutions until the outbreak of the COVID-19 pandemic, which turned all of these tenets upside down. Online training suddenly became a necessity, and InTrain, which had been developed to facilitate students’ self-study practice when no physical booths were available (e.g. if the booth-equipped lab was already booked), and ReBooth, which was developed during the pandemic itself, proved indispensable in supporting interpreter education.
The following sections will describe both CAIT tools: InTrain for online peer-to-peer
interpreting practice (Section 10.3), and ReBooth for online interpreter training and evalua-
tion (Section 10.4). Section 10.5 will review training-related features and the user-friendliness
of SmarTerp – an AI-enhanced CAI tool in its first release – in order to highlight how CAIT
tools assist and are used by interpreter trainers and trainees to practise and hone the afore-
mentioned interpreting skills.

10.3 Online peer-to-peer(s) interpreting practice: InTrain


There are now software programmes, platforms, and websites that offer interpreter training
resources both for theoretical knowledge and practical skills acquisition. These can be used
fruitfully during on-site, blended, or distance learning, either with a trainer or in self-study
mode. To illustrate, see Braun et al. (2014, 2015) for further information about the afore-
mentioned IVY project, and Motta (2013, 2016) for further information on the ETI Virtual
Institute. Another potential application of online training tools is peer-to-peer practice.
This particular training/learning mode fosters cooperation and exchange among students
and reduces the solipsistic, and at times frustrating, practice of self-study for interpreting
students. Interpreting is a form of communication which requires at least one interlocutor
to work with the student interpreter, be they a mere listener or a more active participant.
This role could be embodied by a real assessor or an acting classmate. The example of peer-to-peer CAIT presented in the next section aims to meet precisely this need for human interaction and feedback during interpreter training.

10.3.1 Overview
InTrain (which stands for INterpreter TRAINing) is an open-source HTML5 WebRTC
web-based application for interpreter (self-)training. It was conceived in 2019 at the Uni-
versity of Bologna’s Department of Interpreting and Translation (DIT) by Carioli and Spi-
nolo and designed and developed by Carioli. This online tool allows students to practise
(either alone or with a tutor) and improve their simultaneous and consecutive interpreting
skills, as well as reflect on their practice in terms of interpreting process and product. The
rationale behind InTrain was to provide a minimal yet flexible application that would allow
groups of three students to work together. When working as a trio, one student acts as a
speaker, another as assessor and session administrator (who can also share a video or audio
file to interpret) and the third acts as an interpreter. As a result, this tool allows students to
practise their skills independently. However, tutors could also easily use InTrain to work
with two students, without requiring technical supervision or intervention.
When a trainer uses InTrain with a small group of students in simultaneous mode, there
are two possible constellations:

1. The trainer acts as a ‘pure’ listener and/or assessor, while one student acts as the speaker
and a second one interprets (three people involved).
2. The trainer can share a video to be interpreted and act as listener and/or assessor while
one student interprets (two people involved).

When working in InTrain alongside a tutor in consecutive mode, two students can listen
to the speech and take notes. One student will then deliver his/her interpretation, while
the other checks his/her notes or assesses the first student’s delivery. The objective behind
using InTrain in this way is to focus on practising interpreting techniques alongside self- or
other-initiated reflections and assessment. InTrain also aims to allow students to work on
a specific skill or ability related to interpreting. Consequently, the tool includes features
that enable users to organise activities in relation to a specific purpose. Examples include
working on a specific task or problem trigger in order to complement activities covered in
the classroom.8

10.3.2 Tool development


A key design requirement for this tool was privacy. Since InTrain is a tool that students
should be able to use independently, outside of the University of Bologna, it was decided
that only information deemed ‘strictly necessary for the functioning of the application’
(to support communication between peers: Signalling, STUN,9 and TURN server)10 should
pass through the server. The designer wished for all communication to be peer-to-peer, with
no private data being stored on, or processed by, the server.
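This privacy split maps onto a standard WebRTC configuration: the application server is involved only in signalling and in NAT traversal (STUN/TURN), while audio and video flow directly between peers. The sketch below illustrates that split; the server URLs and credentials are placeholders, not InTrain’s actual infrastructure.

```javascript
// Minimal sketch of the privacy split described above: the app server is
// involved only in signalling and NAT traversal (STUN/TURN); media flows
// directly between peers. URLs and credentials are placeholders, not
// InTrain's real infrastructure.
const iceConfig = {
  iceServers: [
    { urls: 'stun:stun.example.org:3478' },   // address discovery only
    {
      urls: 'turn:turn.example.org:3478',     // relay used as a last resort
      username: 'trainee',
      credential: 'secret',
    },
  ],
};

// In the browser this configuration would be passed to the RTCPeerConnection
// constructor; injecting the constructor keeps the sketch testable elsewhere.
function createPeerConnection(RTCPeerConnectionImpl) {
  return new RTCPeerConnectionImpl(iceConfig);
}
```

Media streams negotiated over such a connection never touch the application server unless the TURN relay is needed as a fallback.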
A range of JavaScript libraries and frameworks were evaluated for what were deemed to
be two of the most critical aspects: Connectivity (based on WebRTC) and audio processing.
For the former, the choice fell on PeerJS, thanks to the consistency of its programming interface, the level of support it provides for media streams and data channels, and its focus on decentralised, fully interconnected small groups. InTrain’s peer-to-peer architecture aligns well with the desired small-group set-up and offers low-latency communication without the need for complex server infrastructure. To address audio considerations, server-side processing
and conversion were ruled out in order to uphold participants’ data privacy within their
respective browsers. While it offers advanced audio routing and processing capabilities,
HTML5 is unable to directly export multimedia in a readily usable format, such as WAV
or MP3. To address this limitation, while keeping the application as lightweight as possi-
ble, another open-source library was chosen, RecorderJS.11 This choice allows audio to be
exported in (stereo) WAV format.
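Conceptually, a RecorderJS-style export interleaves the two captured Float32 sample buffers and prepends a 44-byte RIFF/WAVE header. The helper below is a simplified sketch of that idea for 16-bit PCM; the function name and structure are ours, not RecorderJS’s actual API.

```javascript
// Simplified sketch of a RecorderJS-style exporter: interleave two Float32
// channels and wrap them in a 44-byte RIFF/WAVE header (16-bit PCM stereo).
// Names are illustrative, not RecorderJS's actual API.
function encodeStereoWav(left, right, sampleRate) {
  const frames = Math.min(left.length, right.length);
  const buffer = new ArrayBuffer(44 + frames * 4);   // 2 channels × 2 bytes
  const view = new DataView(buffer);

  const writeString = (offset, s) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + frames * 4, true);          // remaining chunk size
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);                      // fmt chunk length
  view.setUint16(20, 1, true);                       // PCM format
  view.setUint16(22, 2, true);                       // stereo
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 4, true);          // byte rate
  view.setUint16(32, 4, true);                       // block align
  view.setUint16(34, 16, true);                      // bits per sample
  writeString(36, 'data');
  view.setUint32(40, frames * 4, true);

  let offset = 44;
  for (let i = 0; i < frames; i++) {                 // interleave L/R samples
    for (const sample of [left[i], right[i]]) {
      const clamped = Math.max(-1, Math.min(1, sample));
      view.setInt16(offset, clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff, true);
      offset += 2;
    }
  }
  return view;
}
```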
The result of this entire design process was InTrain, a dedicated tool for the online
synchronous practice of consecutive and simultaneous interpreting between three users
connected from different locations. Currently, the platform is accessible free of charge,12
and its source code is available under the GNU Affero General Public License v3.13,14
The requirements for use are as follows: Users need an electronic device with an internet
connection (PC, tablet, etc.), headphones with a built-in microphone, and a webcam (which
may also be built into the device).
On first access, the screen resembles the one shown in Figure 10.1.

Figure 10.1 Screenshot of InTrain landing page.

Before beginning a training session, each user/student must choose a role to enable one
participant to run the session, while the remaining two participants act as interpreter or
speaker, respectively (see Section 10.3.1). The objective behind this is for the session to
unfold smoothly and prepare all student participants for their role in training, with or with-
out the presence of a trainer. Before starting an InTrain training session, the three student
participants agree on the interpreting mode(s) and/or the interpreting aspect they wish to
work on. Accordingly, the supervisor will have chosen the speech to use in the session in advance and briefed the interpreter on the mode of delivery (consecutive or simultaneous) and on the terminological aspects they should prepare.
In terms of the operational steps of a session, the supervisor logs on first, choosing any
username that is not currently in use by another user. They must then allow the site to
access their webcam and headset and communicate their username to the other participants
via a separate means of communication. There is no built-in mechanism for sending email
invitations through the server for security reasons, since it is a public access tool. The inter-
preter and the speaker will subsequently connect to the platform, choose their respective
roles, and enter their supervisor’s username to join the session. Once all three participants
have connected to the session, they will be able to see and communicate with each other, via
their own interface. This working mode is called briefing mode. All microphones and audio
channels are open. This enables the participants to agree on the activity they wish to carry
out, and for the speaker to brief the interpreter.
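The join flow above can be modelled as a small registry: the supervisor’s username must be unique, and the other two participants join the session by supplying it. This is an illustrative toy model, not InTrain’s real signalling code.

```javascript
// Toy model of InTrain's session join flow as described above: the supervisor
// picks a username not currently in use, shares it out of band, and the
// interpreter and speaker join by entering that username.
// Illustrative only - not InTrain's actual signalling code.
class SessionRegistry {
  constructor() {
    this.sessions = new Map();   // supervisor username -> set of filled roles
  }
  openSession(supervisorName) {
    if (this.sessions.has(supervisorName)) {
      throw new Error('username already in use');
    }
    this.sessions.set(supervisorName, new Set(['supervisor']));
  }
  join(supervisorName, role) {
    const roles = this.sessions.get(supervisorName);
    if (!roles) throw new Error('no such session');
    if (roles.has(role)) throw new Error('role already taken');
    roles.add(role);
    return roles.size === 3;     // briefing can start once all three are in
  }
}
```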
The three interfaces are arranged in a similar way for each user, with a larger user display
in the centre of the screen, two smaller displays stacked on the left, and a chat window on
the right. The displays were sized in relation to what was considered to be most relevant for
each participant to see. For instance, the interpreter sees the speaker on the larger display.


Figure 10.2 Screenshot of the interpreter’s interface.

Below this are the respective toolbars. By default, the supervisor sees the speaker on the
main user display, and the interpreter in the box at the top on the left. On the other hand,
the speaker sees the supervisor at the centre of the screen and the interpreter at the top on
the left. Lastly, the interpreter sees the speaker at the centre and the supervisor in the top left corner. The display at the bottom left shows the user’s own video stream. The streams at the centre and in the top left corner can be switched.
Although InTrain was not designed to simulate a remote interpreting platform, it does
include a feature to allow the exchange of text messages and files between participants, via
a chat panel (Figure 10.2). To start an exercise, the supervisor first enables training mode on
the interface. This automatically mutes the supervisor’s microphone and allows them to listen
to both the speaker’s and the interpreter’s delivery at the same time or choose between one of
the two audio inputs. The supervisor can then record and save the training session by creat-
ing a double-track file that contains both the speaker’s speech and the interpreter’s rendition.
Finally, the platform offers the possibility of conducting a session without a speaker. This
feature can be used by providing a link to a video from YouTube, Vimeo, or Facebook that
can be played in the speaker’s box on the interface, or by sharing the supervisor’s own screen.
The interpreter’s role only allows the user to share documents, mute and unmute their
microphone, write messages in the chat, and take part in the briefing. Once the supervisor
has switched the session to training mode, the interpreter can still hear the speaker but can
no longer hear the supervisor, whose microphone is muted. The speaker is unable to hear
the interpreter’s rendition by default but can select the interpreter’s audio channel on the
interface and listen to the rendition.
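Taken together, the who-hears-whom rules in briefing and training mode can be summarised as a small routing function. This is our own schematic reading of the behaviour described above, not InTrain’s actual code.

```javascript
// Schematic sketch of InTrain's audio routing as described in the text.
// Returns the participants a given role can hear; our own abstraction,
// not the application's actual implementation.
function audibleTo(role, mode, opts = {}) {
  if (mode === 'briefing') {
    // All microphones and audio channels are open during the briefing.
    return ['supervisor', 'speaker', 'interpreter'].filter((r) => r !== role);
  }
  // Training mode: the supervisor's microphone is muted.
  switch (role) {
    case 'supervisor':
      // Hears both deliveries at once, or one selected channel.
      return opts.selected ? [opts.selected] : ['speaker', 'interpreter'];
    case 'interpreter':
      return ['speaker'];                     // no longer hears the supervisor
    case 'speaker':
      // Hears nothing by default, unless the interpreter channel is selected.
      return opts.monitorInterpreter ? ['interpreter'] : [];
    default:
      return [];
  }
}
```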

10.3.3 Usability and suitability


Defrancq observed (2023, 305):

[U]niversities obviously have an important role to play in research into usability and
suitability of potential tools for interpreter support. They have a duty to help identify
technological gaps and advance the conceptual development of tools that are both
relevant and suited for interpreters.

Consequently, testing the usability and suitability of InTrain became the object of an MA
thesis conducted at the Department of Interpreting and Translation of the University of
Bologna (Santoro, 2020). The study tested the tool in both consecutive and simultane-
ous modes. The participants were four second-year MA students from the aforementioned
Department at the Forlì Campus, and four from the Interpreting Department of the Uni-
versity of Trieste. The aim was to test the online platform with students who had already
trained and self-trained together in person, and with students from another university who
had neither trained nor interacted in person before with the students from Forlì.
The study took place in different stages, carried out at all times by the same MA stu-
dent who had prepared the texts and speeches, taken field notes, and run the simultaneous
interpreting sessions under the supervision of an academic trainer. First, four students from
the Forlì Department who had trained together worked in two pairs on consecutive inter-
preting from Spanish into Italian using InTrain. Each student took on the roles of speaker and interpreter in turn, so that every participant played both roles and interpreted consecutively once. The participants were also asked to take
part in a simultaneous interpreting session, run by the supervisor, to provide further evalu-
ation of the function of this tool. During the second stage of the study, the four students
from Forlì worked with four students from Trieste, in mixed pairs, and performed the same
activities as in the first stage. In short, the two stages aimed to allow students to test the
tool in a more ‘familiar’ situation with their course mates, and in a less familiar situation
with unknown peers.
All sessions followed the same structure: Briefing, listening to a speech, note-taking,
consecutive delivery, comments, and feedback. After the InTrain interpreting sessions had
been completed, the eight participants were asked to fill in a questionnaire. The question-
naire was composed of statements to be rated with a Likert scale and questions with yes/
no answers, followed by a space to provide explanations for the answer. The questionnaire
consisted of six sections: The first requested information relating to the participants’ demo-
graphics. The second asked about previous experience using platforms or other tools for
online interpreting and their names and also contained three questions about InTrain – if the
student had received an induction before using InTrain, if it had been useful, and if it was
judged necessary. The third section asked respondents to rate the functional efficiency and user-friendliness of the tool’s user interface on the basis of statements rated on Likert scales ranging from 1 to 5, where 1 was ‘completely disagree’, 2 ‘partially disagree’, 3 ‘neither agree nor disagree’, 4 ‘partially agree’, and 5 ‘completely agree’. It focused on technical aspects, namely, whether exercises had run smoothly or there had been technical problems, whether the platform was intuitive, and whether it had been necessary to use an additional device in case of technical problems. The fourth section investigated the
usability of the platform in the two interpreting modes. Students were asked to say whether
the tool was better suited to consecutive or simultaneous interpreting; if, in the role of
interpreter, the sound quality had been good enough for the practice, and if not, why; and if
the sound quality when in the role of speaker and supervisor had been good enough for the
purpose of the session, and if not, why. The fifth section collected the users’ perceptions of
pragmatic and paralinguistic aspects of communication via the platform and of the training
experience. Respondents were asked to compare consecutive and simultaneous interpreting
practice with InTrain and in the lab, to state whether the online mode had influenced their
performance; whether non-verbal aspects had been influenced by the online mode (posture,
gaze), and if yes, how; how they rated the interaction compared to an on-site in-person
session (better or worse), and why; and if the experience with InTrain had been positive or
negative. Finally, the sixth section was devoted to users’ comments, suggestions, and opin-
ions. For space reasons, only the most relevant findings of this study will be reported briefly
here, namely, those pertaining to the platform’s user-friendliness, efficiency of functions,
and the need for another device to communicate with peers before and during the activity.
When asked to rate the statement regarding InTrain’s user-friendliness, seven respond-
ents out of eight answered ‘partially agree’, and one participant chose ‘neither agree nor
disagree’. As for the efficiency of functions, five students out of eight remarked that they
‘completely agreed’ that InTrain allowed users to perform interpreting activities smoothly,
one ‘partially agreed’, and two ‘partially disagreed’. In the final section of the questionnaire,
where participants could write comments, these two respondents complained about con-
nectivity problems during the two practice sessions. When asked whether the tool was suit-
able both for simultaneous and consecutive exercises, seven students out of eight answered
positively. When asked whether the tool’s online interpreting practice was comparable to
on-site practice, only one participant out of eight stated that both simultaneous and con-
secutive exercises online are ‘not comparable’ to interpreting in the classroom. When asked
how they found their experiences with the platform, all participants answered positively
but expressed different opinions regarding its use: Some stated that InTrain was a valu-
able tool as an alternative to on-site practice, when face-to-face practice was not possible.
Others saw further potential for its use and said they would consider using InTrain as a complement to the interpreting lab in student training.
While the study was limited in size, its findings suggest that this tool can help students
in peer-to-peer (as well as trainer–student) interpreting practice, even when they are not
co-located. The tool provides students with a free, open-access virtual space designed for
practising consecutive and simultaneous modes. It fosters students’ abilities to work col-
laboratively, despite geographical distance, and promotes familiarity with remote inter-
preting and RSI platforms. These additional skills, often defined as ‘interpreter’s soft skills’
(Albl-Mikasa, 2013; Galán-Mañas et al., 2020), are becoming crucial, if not core, compe-
tencies in the interpreting profession. This trend is likely to increase as time goes by (see
Section 10.5).
The next section will describe another technological platform specifically designed and
developed for teaching interpreting online: ReBooth.

10.4 Online interpreter training and assessment: ReBooth


This section presents another tool for training interpreters online. ReBooth was developed
during the COVID-19 pandemic, which forced all educational institutions in Italy to move
their teaching and testing activities online. Prior to the outbreak of the pandemic, to the
best of the authors’ knowledge, no specific platform existed for interpreter training. Most
pre-existing platforms that were used for remote simultaneous interpreting practice had drawbacks when used in a training context, especially in terms of conducting exams.
These aforementioned platforms did not allow more than one student to be tested at a time,
nor did they allow examiners to record more than one booth at a time. As a result, using
these platforms meant students were unable to sit interpreting exams at the same time, nor
could they interpret the same speech. These issues prevented consistency in terms of ensur-
ing equal levels of difficulty and test conditions for each student. ReBooth is specifically
designed to test groups of students interpreting the same speech at the same time, monitor-
ing their performance and collecting their renditions, which can then be evaluated later.
As for lab activities, the ReBooth platform allows trainers to see students in their virtual
booths, via the interface. Trainers are also able to communicate with their students, provid-
ing them with a briefing or feedback while all participants remain in their virtual booths.

10.4.1 Overview
Before the pandemic, other remote training systems (both synchronous and asynchro-
nous) had been developed and tested by different interpreter education institutions. At that
time, one of the most difficult aspects for distance education to reproduce was found to be
interaction between students and trainers. Ko (2006, 2008) experimented with teaching
liaison interpreting and SI via video and teleconferencing. He highlighted as a main drawback of teleconferencing the fact that teachers and students were unable to see each other, while remarking that interpreting in real-life situations involves both verbal and non-verbal interaction between the interpreter and the primary participants. Later, Ko and
Chen (2011) would test online interactive interpreting teaching in virtual classrooms using
the internet and a collaborative cyber community (3C) learning platform. Their system
allowed students to connect from anywhere and to practise interpreting in four different
types of virtual spaces: Interacting with both their teacher and their classmates, in a syn-
chronous mode for in-class activities, and in an asynchronous mode for after-class group
practice. Within the InZone15 project on humanitarian interpreting education, a blended approach combined a learning platform – created by Geneva University’s Interpreting Department Virtual Institute – and open educational resources to prepare interpreters to work in the field. In 2008, the Universidade Federal de Santa Catarina, together with other Brazilian universities, developed a sign language interpreting (SLI) e-learning programme for deaf and hearing students, which was supplemented with a face-to-face option to support teaching based on visual learning (Müller De Quadros and Rossi Stumpf, 2015). In addition, the Middlebury
Institute of International Studies at Monterey redesigned and adapted a module of tradi-
tional face-to-face instruction for an online context to bridge the gap between the need for
professional development in community interpreting and the constraints experienced by
working professionals (Mikkelson et al., 2019).
However, the potential of ICT was far from being fully exploited in the field of inter-
preter training at the time of these aforementioned online or blended experiences. The
COVID-19 pandemic significantly accelerated both the development and adoption of online
remote training in interpreting. In Italy, which was severely hit by the pandemic, higher
education institutions had to move from classroom to distance teaching overnight due to
long, repeated lockdowns. Since interpreter training involves more than lecturing – which
can be moved online with videoconferencing relatively easily – the DIT of the University
of Bologna, like many others around the world (see, for example, Ho and Zou, 2023, for
experiments with Gather, a proximity-based platform), had to find solutions to moving
interpreting classes online. Instructors searched for platforms that would allow students to practise both consecutive and simultaneous interpreting remotely.
As a result, there was great need for a web application that would allow both simultane-
ous interpreting classes and exams to be conducted remotely in a similar way to in-person.
Students still needed to experience entering booths in interpreting classrooms and delivering simultaneous renditions of the same speech at the same time, renditions which could be recorded and collected by their teacher for training, final exams, or individual feedback and reflections on the task, as during classes in the lab.
ReBooth therefore sought to replicate this on-site situation as much as possible, providing a virtual classroom environment with remote virtual booths. The teacher would be
able to simultaneously deliver the same speech to all connected booths, monitor student
performance, and automatically collect recordings of the students’ renditions. The main
requirements of such a system were to be able to reliably reproduce the source speech in
all booths at the same time, without lags or interruptions due to streaming issues, and to
securely record and save students’ renditions.
A further requirement was to facilitate trainer-to-student communication as much as possible. To this end, additional features were designed. These included a
chat box, which could be used by, and be visible to, both the trainer and their students; a
‘class call’ mode to allow the trainer to talk to students for briefing or feedback purposes;
and visual signals to enable students to flag an issue to their trainer (for instance, a raised
hand to signal a student’s need/wish to speak). Although students are unable to see or talk to
each other in this tool, they are able to communicate via the chat and speak directly to their
trainer. In contrast to InTrain, which was specifically designed to foster student-to-student
interaction, the rationale behind this particular CAIT tool was to reproduce simultaneous
interpreting training sessions in the lab, with students in separate booths, listening to either
a speaker for whom they are interpreting or to their trainer.

10.4.2 Tool development


ReBooth (which stands for Remote Booth) was the solution created to cater for these afore-
mentioned requirements. It is an HTML5 WebRTC-based platform for conference interpreter
training conceived for the Department of Interpreting and Translation at the University of
Bologna, Forlì Campus, by Carioli and Spinolo in March 2020,16 during the COVID-19 pan-
demic lockdowns, and designed and developed over subsequent years by Carioli.17
ReBooth connects a trainer with up to 10–12 students, depending on the trainer’s bandwidth (especially upstream) and hardware quality. The tool has inherited parts of the code
that was developed for InTrain (see Section 10.3) and also uses PeerJS as the basic infrastruc-
ture for connections. However, ReBooth was created to meet different needs from InTrain’s and was therefore redesigned from scratch. The WebRTC connection scheme remains peer-to-peer, but ReBooth needs to manage more connections at a time. As a result, the server infrastructure is also more complex and demanding: The trainer must first authenticate to log in, which requires an authentication infrastructure; in addition, invitations are sent by email, and media files are uploaded to and downloaded from the server, where they need to be stored. The workload
for the server is not high, but the bandwidth use and amount of storage space required mean
it is impossible for the department to offer public access to the platform.18
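The practical ceiling of ten to twelve booths follows from the star-shaped connection scheme: the trainer uploads a separate audio/video stream to every booth. The back-of-the-envelope estimate below uses illustrative per-stream bitrates, not measured ReBooth figures.

```javascript
// Rough estimate of trainer-side upstream bandwidth in a star topology:
// one outgoing audio/video stream per connected booth. The per-stream
// bitrates are illustrative assumptions, not measured ReBooth values.
function trainerUpstreamKbps(booths, { videoKbps = 500, audioKbps = 64 } = {}) {
  return booths * (videoKbps + audioKbps);
}

// Inverse view: how many booths a given uplink can sustain at a per-stream rate.
function maxBooths(upstreamKbps, perStreamKbps = 564) {
  return Math.floor(upstreamKbps / perStreamKbps);
}
```

With these assumed bitrates, a 6 Mbps uplink supports roughly ten booths, which is consistent with the ten-to-twelve-student ceiling mentioned above.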
The trainer authenticates, logs in, and creates ‘a class’ by entering their students’ email
addresses (Figure 10.3). An invitation link is sent to the students, enabling them to connect
to the correct class on the platform. This step thus avoids the need for student authentica-
tion. Classes can also be saved in a file and reused when required.
Once the participants list is complete, the teacher sends the invitations and enters the
class themselves. The trainer’s user interface is then loaded (Figure 10.4).


Figure 10.3 ReBooth class set-up mode.

Figure 10.4 ReBooth trainer’s interface.


At this stage, the session has not yet started, and students cannot yet enter their booths.
However, the trainer can prepare their class, uploading media files to the server, verifying previously uploaded files, checking their headphones and webcam, and so on. Once all preparations are complete, the trainer ‘starts the class’, and the students can enter their booths by
opening the invitation link and clicking the ‘join’ button on their interface.
If a student has not received their invitation, the tutor can retrieve their connection link
by clicking the button in the upper left corner of their booth display and sending the link to
the student by other means. At this stage, the trainer can also create additional ‘booths’ as
needed and communicate links to the relevant students.
As shown in Figure 10.4, students are displayed on the trainer’s interface inside their
own ‘booths’ as soon as they log on, but their microphones are muted. In the student’s
interface, the trainer appears. The trainer is able to speak to particular students by clicking
the ‘talk to booth’ button on their display or communicate with the whole class using the
‘class call’ button. In ‘class call’ mode, the trainer can give the floor to one booth at a time.
As a result, the chosen student’s voice can be heard by all other booths, which allows only
one student to speak at a time. ReBooth connections are, in fact, ‘peer-to-peer’ rather than
‘peer-to-server’ connections. Consequently, there is a single audio/video WebRTC connec-
tion between the instructor and each student (the connection topology is star-shaped), but
there are no direct connections between booths. Students can only communicate with each
other via chat, using text messages that are transparently routed via the trainer’s application
using the WebRTC data channel. Students also can send visual signals (flags) to the trainer
like ‘raise hand’, ‘agree/ok/yes’, and ‘do not agree/no’.
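Because booths never connect to each other directly, all chat traffic is relayed through the trainer's application. The fragment below sketches that relay logic in illustrative JavaScript; it is not ReBooth's actual source, and the function name `createChatRelay` and the channel interface are assumptions.

```javascript
// Sketch of the star-topology chat relay described above. The trainer's app
// holds one data channel per booth; a chat message from one booth is
// forwarded to every other booth. `channels` maps booth IDs to objects with
// a send() method, as a WebRTC RTCDataChannel provides.
function createChatRelay(channels) {
  return function relay(fromBoothId, text) {
    const payload = JSON.stringify({ from: fromBoothId, text });
    for (const [boothId, channel] of Object.entries(channels)) {
      if (boothId !== fromBoothId) channel.send(payload); // skip the sender
    }
    return payload;
  };
}
```

In a real implementation, this handler would be attached to each data channel's incoming-message event on the trainer's side.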
The teacher’s interface features a media player panel to manage and play media files, a
recorder panel to activate the recording in manual mode, two buttons for the automated
interpretation session mode (simultaneous/consecutive), and buttons to manage other fea-
tures of the practice session (including a session status monitor, the capacity to dismiss flags
and make a ‘class call’).
To reduce the risk of technical problems jeopardising students’ activity, ReBooth includes the following features:

1. ReBooth does not use streaming. Instead, it sends the entire media file to the student’s browser before the trainer starts a class or an exam. This allows students to listen to the media file in its original quality, unaffected by lags, drops, or even disconnections.
2. ReBooth records the student’s audio using two separate methods alongside a potential
backup procedure:
• ReBooth saves the audio stream that the trainer receives on their computer.
• ReBooth records the student’s audio locally on their computer and sends it to the
server where ReBooth is hosted. This ensures that the audio retains the best possible
quality, since it has been acquired directly from the student’s device.
• The aforementioned recording is also available to the student, who can also save and
send it to the trainer by other means (e.g. email).
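The backup scheme in point 2 amounts to a simple preference order, sketched below in illustrative JavaScript. The helper name and its inputs are hypothetical, not ReBooth's API.

```javascript
// Illustrative selection logic for the backup scheme described above: the
// recording captured locally on the student's device is preferred, since it
// is unaffected by network quality; the copy streamed to the trainer is the
// fallback; failing both, the student can still send their own copy by email.
function pickBestRecording({ localUpload = null, trainerStream = null }) {
  if (localUpload) return { source: 'local', audio: localUpload };
  if (trainerStream) return { source: 'stream', audio: trainerStream };
  return { source: 'none', audio: null };
}
```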
In the ‘simultaneous’ mode, booth recording automatically starts when the speech begins,
and stops when the speech ends. The trainer can decide to allocate additional recording
time for students to complete their rendition (since there is often a time lag in simultaneous
interpreting).

168
Technology for training in conference interpreting

In the ‘consecutive’ mode, playback starts immediately, and recording begins as soon as
the speech ends. By default, recording lasts as long as the speech, but again the trainer can
decide to allow more (or less) time for the rendition. In both cases, an audible signal alerts
the student when the recording is starting, and a timer indicates time remaining.
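The two recording schemes can be summarised as a small timing function. This is a sketch of the behaviour described above, not ReBooth's code; times are in seconds from the start of playback, and `extraTime` stands for the trainer's positive or negative adjustment.

```javascript
// Recording windows for the two modes described above.
function recordingWindow(mode, speechDuration, extraTime = 0) {
  if (mode === 'simultaneous') {
    // record while the speech plays, plus a tail for the interpreter's lag
    return { start: 0, stop: speechDuration + extraTime };
  }
  if (mode === 'consecutive') {
    // record only after the speech ends; by default for as long as the speech
    return { start: speechDuration, stop: 2 * speechDuration + extraTime };
  }
  throw new Error(`unknown mode: ${mode}`);
}
```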
During recording, the trainer can monitor the status of each booth via the appropriate
button. They can also listen to the students’ output using the discreet listening function.
This is activated by simply clicking on the relevant booth’s display.
At the end of an interpreting activity, the automatic collection of the renditions begins,
as previously described. Recordings captured from the audio streams can be downloaded together as a zip file by clicking the appropriate button in the recording panel during the session. All recordings collected on the server are kept and can be downloaded immediately or later, either all together or individually from each booth.
The student’s interface is simple, since all functions and activities are launched by the
trainer (Figure 10.5). The student can adjust the volume of the floor and stream. Students
can also send the trainer visual cues (hand up, thumbs up, thumbs down) and communicate
with the class via the chat.
The student interface has three display panels: One features the trainer, one shows the student’s own webcam feed, and the final panel (usually hidden) is activated to display the media file that the trainer has chosen for the class to work from.
ReBooth has been used by the DIT at Forlì Campus in their conference interpreting MA
programme since April 2020 for remote classes and exams. It was also used for the first time
to conduct MA admission tests with 190 candidates in 2020. The candidates were divided into groups of 10 to 12 per trainer’s PC. They successfully sat the test remotely, at the same time, without technical glitches from ReBooth. Since 2020, this process has
been used every year for admission tests to the interpreting MA programme, with over 150
candidates sitting the test in person, in the university’s language labs, using ReBooth. The
platform has also been used since 2022 by the master’s course in conference interpreting at V. N. Karazin Kharkiv National University in Ukraine, where no on-site education has been possible since the Russian invasion, which was still ongoing at the time of writing this chapter.

Figure 10.5 ReBooth student’s interface.


The tool’s next release, ReBooth 2.0, is currently in its design stage. It will contain addi-
tional features, namely, a shared virtual space for trainers and students to communicate
before and after a practice session. This will allow for briefing and debriefing/feedback
or other purposes. ReBooth 2.0 will also include virtual booths able to host two interpreters, who will be able to cooperate (e.g. prompt each other) and manage handovers as they would in person. It will also be possible to interpret using relay. These additional features will
make ReBooth 2.0 an invaluable tool for online teaching and learning, as it will replicate
the functions and conditions of RSI platforms and provide training which is as close as pos-
sible to in-person, on-site training.

10.4.3 Usability
ReBooth was conceived as a multiple-booth, virtual classroom to train students remotely.
As a result, the trainers’ experiences and opinions regarding its usability and suitability are
of paramount importance. This was tested by two of this chapter’s authors (Spinolo and
Carioli) using a convergent, mixed-method research design (Creswell, 1999). Eleven inter-
preter trainers from Italian and Spanish universities who had never used ReBooth before
were recruited. Participants received the user manual a week before the test, along with a
set of 15 tasks to complete. These tasks represented a trainer’s typical role during an inter-
preter training session. In addition, the participants were sent a short audio file to play dur-
ing the test. The researchers collected the data from three different sources: (1) participants’
screen recordings during the session; (2) responses to the User Experience Questionnaire
(UEQ, Laugwitz et al., 2008), which was completed straight after the session; and (3) two
focus groups (with five participants each),¹⁹ held one week after the test.
The participants’ screen recordings²⁰ were analysed as a ‘performance measure’ (Rubin
and Chisnell, 2008, 166). The number of attempts made by participants to complete each
task was rated on a 6-point scale ranging from 0 (‘not problematic’, one attempt) to 5 (‘very
problematic’, more than 5 attempts, or not completed). Results show that most tasks (11 out
of 15) proved ‘not’ or ‘slightly’ problematic, with a mean score ≤1. Of the remaining four, the
two ‘most problematic’ tasks had a mean score of 2 (SD = 2.3) (‘listen to students providing
their consecutive renditions using the discreet listening function’) and 2.3 (SD = 1.4) (‘make
sure that all students have received the file’), while the remaining two tasks scored 1.2 (SD =
1.6) (‘start a consecutive interpreting session with the media file sent’) and 1.6 (SD = 1.4)
(‘make sure you can hear all students and they can hear you through the intercom function’).
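The scoring rule can be restated in code. The sketch below is illustrative: the one-point-per-extra-attempt mapping and the encoding of uncompleted tasks as `null` are assumptions consistent with the scale's stated endpoints, and the standard deviation is computed here as a population SD.

```javascript
// Scoring rule described above: one attempt scores 0 ('not problematic');
// each extra attempt adds a point; more than five attempts, or a task left
// incomplete (encoded as null), scores 5 ('very problematic').
function difficultyScore(attempts) {
  if (attempts === null || attempts > 5) return 5;
  return attempts - 1;
}

// Mean and (population) standard deviation over a set of task scores.
function meanAndSD(scores) {
  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
  const variance =
    scores.reduce((a, b) => a + (b - mean) ** 2, 0) / scores.length;
  return { mean, sd: Math.sqrt(variance) };
}
```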
The full UEQ used in this study includes the following six scales, as described by its devel-
opers (Schrepp, 2023, 2): Attractiveness (whether or not users like the product), perspicuity
(whether it is easy to learn how to use the product), efficiency (whether tasks can be solved
without unnecessary effort), dependability (whether users felt in control of their interaction
with the tool), stimulation (whether it is exciting to use the tool), and novelty (whether the
tool is innovative and creative). The UEQ has been previously used in translation and inter-
preting studies to assess tool usability (see, for example, Braun et al., 2020). In this case, the
UEQ results were positive on all six scales. Results were obtained using the UEQ data analysis
tools (Schrepp, 2023), which also compare the results to benchmarks derived from more than
450 usability studies. The results of the UEQ are reported in Table 10.1.
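The UEQ's underlying scoring step, as described in its handbook (Schrepp, 2023), rescales each 7-point item answer to the −3 to +3 range (flipping the sign for items whose positive pole comes first) and averages the items belonging to a scale. The following sketch illustrates that step; it is not the official UEQ analysis tool.

```javascript
// Rescale a 7-point UEQ item answer (1..7) to the -3..+3 range; for items
// with reversed polarity, the sign is flipped so +3 is always positive.
function ueqItemScore(answer, reversed = false) {
  const centred = answer - 4; // 1..7 -> -3..+3
  return reversed ? -centred : centred;
}

// A UEQ scale score is the mean of its (rescaled) item scores.
function scaleMean(itemScores) {
  return itemScores.reduce((a, b) => a + b, 0) / itemScores.length;
}
```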
Table 10.1 ReBooth UEQ results

Scale           Mean   Variance  Comparison to benchmark
Attractiveness  2.348  0.25      Excellent
Perspicuity     1.750  0.43      Above average
Efficiency      1.818  0.54      Good
Dependability   1.886  0.24      Excellent
Stimulation     2.182  0.43      Excellent
Novelty         1.886  0.42      Excellent
Note: Min −3, max +3.²¹

The thematic analysis conducted on the recordings of the two focus groups allowed for both positive and problematic aspects of the tool to be identified. This activity proved useful in obtaining suggestions for the current version of ReBooth, as well as contributing
to identifying desired features for its future 2.0 release. In general, participants found the
interface pleasant and easy to use; they appreciated the ease with which they could switch
between one booth and another, the possibility to communicate with one booth or with
the entire class, and the ability to download all recordings and send the audio or video file
before starting the session. Participants also highlighted the ability to streamline consecutive
and simultaneous sessions and obtain dual-track recordings as a further positive feature.
The most reported issues were connected to the class set-up interface, which participants
found less intuitive. They also reported issues with the format of the recordings (WebM) and, as observed in the screen recordings, experienced problems using the intercom and found the file-sending functions less intuitive. Participants provided noteworthy
suggestions on how icon colours and sizes could be made more intuitive and created their
wish list of features for the 2.0 release. The most requested features included allowing more
students to connect at a time, allowing students to work cooperatively in virtual booths,
having access to more participant profiles (trainer, student, speaker, audience), having a
relay function, and the possibility to share screen and audio.
This usability study was conducted with a small sample of participants, and its lim-
ited scale does not allow for generalisable conclusions. However, based on the findings,
ReBooth appears to enable trainers to perform all the necessary tasks for conducting effi-
cient online interpreting classes and exam sessions.

10.5 Teaching soft skills to accommodate technological advances


This section discusses the need to include ICT literacy in education. Nowadays, ICT lit-
eracy is considered to be one of the essential soft skills required by the job market (Trilling
and Fadel, 2009). This remark also holds true for interpreter training and education, since
technology is pervading the field of written and oral translation. AI-based tools are being
developed to assist interpreters during the interpreting process. This section will focus on
SmarTerp, a tool developed to help interpreters deal with ‘problem triggers’, such as num-
bers (Gile, 1995/2009) or named entities (Frittella, 2023). SmarTerp was tested by students
in its early stages, during a research project at the DIT in Forlì held in October 2021. Before
that, the next section provides a brief overview of the evolution in the soft skills currently
required on the labour market, which also includes interpreting services.


10.5.1 Overview of soft skills’ development needs


Trilling and Fadel (2009) identified the ‘core soft skills for the 21st century’ as those related to life and career, learning and innovating, and information, media, and technology. Regarding the latter, students are nowadays expected to be ‘info-savvy’, ‘media fluent’, and ‘tech tuned’ (Trilling and Fadel, 2009, 60).
Information literacy, media literacy, and ICT literacy are considered the main soft skills necessary for students in general (interpreters are no exception to this trend), as Trilling and Fadel stated 15 years ago:

With today and tomorrow’s digital tools our next generation students will have
unprecedented power to amplify their ability to think, learn, communicate, collabo-
rate and create. Along with all that power comes the need to learn the appropriate
skills to handle massive amounts of information, media and technology.
(Trilling and Fadel, 2009, 65)

The current abundance of media resources, including videos, podcasts, websites, and speech and terminology banks, requires the ability to navigate them. Therefore, interpreting students need to understand how to use media resources and CAI tools for their learning purposes and for their future professional development. In addition, since technology evolves very quickly, two additional skills deemed necessary for students are ‘flexibility’ and ‘readiness to explore and test innovations without being overwhelmed’.
Before the COVID-19 pandemic, which boosted the use of technology in interpret-
ing, Fantinuoli and Prandi (2018) had already identified three crucial areas of learning
for future interpreters: Remote interpreting, computer-assisted interpreting (CAI), and
automatic speech translation (AST). CAI tools are meant to relieve the interpreter’s cog-
nitive load during simultaneous interpreting and improve the interpreter’s performance.
Initial studies on the subject show that interpreters’ performances improved with these
tools (Prandi, 2017, 2018, this volume). CAI tools can have an impact on all three stages of the interpreting process identified in field-specific literature (Kalina, 2007; Gile, 1995/2009): before, during, and after the interpreting event. These tools can help interpreters retrieve domain-specific terminology and knowledge that are relevant to their specific
conference. CAI tools assist the interpreter in organising and memorising terminology and
knowledge before the event, and in looking up terminology during the time-constrained
interpreting process (Fantinuoli and Prandi, 2018, 167), without having to allocate addi-
tional cognitive resources to manually perform the search themselves (Fantinuoli, 2023).
CAI tools also assist the interpreter after completion of an assignment by integrating and
updating glossaries or other relevant documents. However, as anticipated, CAI tools have
since moved to a new paradigm and are starting to integrate AI. This evolution requires
further skills to be included in interpreter training programmes if trainers want their stu-
dents to be ‘tech-savvy’.
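The in-process terminology lookup described above can be illustrated with a deliberately simple matcher. Real CAI tools perform incremental, fuzzier matching over the live ASR stream; the names below are invented for illustration only.

```javascript
// Minimal sketch of in-process terminology lookup: the ASR transcript is
// matched against a prepared glossary so that relevant entries can be shown
// to the interpreter without a manual search.
function lookupTerms(transcript, glossary) {
  const text = transcript.toLowerCase();
  return Object.entries(glossary)
    .filter(([term]) => text.includes(term.toLowerCase()))
    .map(([term, translation]) => ({ term, translation }));
}
```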

10.5.2 Testing an ASR and AI-based CAI tool: SmarTerp


In order to further introduce CAI and CAIT tools in interpreter education and stimulate stu-
dents’ soft skills, the EIT Digital–funded Project SmarTerp (SMARTER INTERPRETING:
Seamless Management and Automation of Resources and Tools for an Efficient Remote Simultaneous Interpreting) developed SmarTerp, an ASR- and AI-based CAI tool. Its initial
release was tested in an interpreter education setting – the DIT of the University of Bolo-
gna – to explore the impact of its use on simultaneous interpreting performance and collect
trainees’ perceptions of its usefulness. SmarTerp combines CAI features with speech recog-
nition and translation by displaying well-known problem-trigger items (terms, named enti-
ties, numbers, acronyms), alongside their translations, on the interpreter’s PC screen. Russo
et al. (forthcoming) conducted the study with 24 second-year students of the Master in Interpreting, who volunteered to test the tool. All participants (21
females, 3 males) were Italian L1 speakers and were divided according to the following
language combinations, each comprising six students: Italian > Spanish, Spanish > Italian,
Italian > English, English > Italian. These participants had never experienced this particu-
lar feature of a CAI tool before. However, they had all received training in the tool before
undertaking the tests. Twelve students participated on-site, in the Department’s interpreting labs, while the remaining 12 took part online, via Zoom. A total of 15 source video texts were administered to participants across three experimental sessions: Five English speeches, along with five corresponding Spanish and five corresponding Italian speeches. The speeches contained
the problem triggers that Frittella (2021) had identified at an earlier stage of testing with
professional interpreters. The problem triggers identified include named entities, acronyms,
numbers, and technical terms. All speeches were controlled and harmonised in relation to
duration and text features. The six analytical categories suggested by Frittella (2022) plus
one suggested by the research team were subsequently applied to the transcribed renditions
in order to analyse them: Correct rendition, partial rendition, minor error/missing detail,
generalisation, omission and semantic error, plus self-corrections. After transcribing and
analysing all renditions, the sessions that had been conducted using SmarTerp were com-
pared to those without the CAI tool. A similar pattern was observed in all four directions
(i.e. IT<>EN and IT<>ES): More successful handling of triggers (correct renditions, par-
tial renditions, minor errors) and fewer instances of unsuccessful management (omissions,
semantic errors) were recorded when using the CAI tool. Directionality therefore does not seem to impact the students’ renditions. Furthermore, all 24 students’ performances improved
from Session 1 to Session 3, thus indicating the positive effect of their familiarisation with
the CAI tool.
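The comparison of renditions can be sketched as a tally over annotated trigger renditions. Category labels follow Frittella (2022) as listed above; the successful/unsuccessful grouping follows the text, while ‘generalisation’ and ‘self-correction’ are left ungrouped here as an assumption.

```javascript
// Successful vs unsuccessful handling of problem triggers, per the grouping
// described in the text; remaining categories are counted as 'other'.
const SUCCESSFUL = new Set(['correct', 'partial', 'minor error']);
const UNSUCCESSFUL = new Set(['omission', 'semantic error']);

function tallyRenditions(annotations) {
  const tally = { successful: 0, unsuccessful: 0, other: 0 };
  for (const category of annotations) {
    if (SUCCESSFUL.has(category)) tally.successful++;
    else if (UNSUCCESSFUL.has(category)) tally.unsuccessful++;
    else tally.other++;
  }
  return tally;
}
```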
In terms of usability, at the end of the sessions that used SmarTerp, the participants were
asked to complete a short online questionnaire where they answered questions and rated
statements using a 7-point Likert scale, with 1 = strongly dissatisfied/disagree/very unlikely
and 7 = strongly satisfied/agree/very likely. The main results are as follows: In response to
the question ‘Overall, how satisfied are you with the support of the CAI tool SmarTerp dur-
ing testing?’, the participants’ responses were overwhelmingly positive, with most partici-
pants indicating that they were either ‘satisfied’ or ‘very satisfied’. The majority of trainees
also found the CAI tool to be ‘user-friendly’, strongly agreeing with the statement ‘The CAI
tool was easy to use’. However, participants also highlighted the need for specific training
to be able to use SmarTerp effectively.

10.6 Conclusions
This chapter has explored the impact of technologies on interpreter education and profes-
sional life. In particular, CAI and CAIT tools are met with mixed opinions by all involved
parties: Higher interpreter education institutions, trainers, trainees, and professional interpreters. The COVID-19 pandemic proved to be a real catalyst in developing CAI and
CAIT solutions that ensured efficient interpreter-mediated communication and interpreter
training remotely. Two such CAIT solutions developed at the DIT of the University of Bolo-
gna at Forlì, InTrain and ReBooth, were presented, and their evidence-based usability was
discussed. Furthermore, a section of the chapter focused on the results that emerged from
the initial tests of SmarTerp, a CAI tool developed by a European partnership and tested in
an academic setting by 24 interpreting trainees at DIT.
As for InTrain, students overall expressed positive opinions about its user-friendliness,
efficiency of functions, and suitability. However, in terms of its use, there were divided
opinions: Some students considered InTrain to be a ‘good alternative’ to on-site training
when the latter is not possible, while others stated they would only use the tool as an addi-
tion to in-person classes in the lab. These results reflect two concepts that are inherent in
interpreting: Situatedness and embodiment (Risku, 2002; Davitti and Pasquandrea, 2017;
Pöchhacker, 2024). Being in a virtual environment means lacking shared physical space
and context, as well as elements of non-verbal communication that interpreters use during
comprehension and production tasks.
The usability of ReBooth was rated positively by participants during the study, and it is
currently used successfully at the DIT of Bologna University, as well as at Karazin Kharkiv
National University. The next release of the tool is currently under development and will
include additional interactive features that aim to reproduce, as closely as possible, a well-equipped in-person classroom working environment with real booths. New features
will therefore allow students to share a virtual booth and be able to listen to and commu-
nicate with their booth partner as they would in a physical booth.
There has also been clear appreciation for SmarTerp, although participants expressed the need for specific training in order to use it efficiently. Since current trends
leaning towards working online and introducing AI into interpreting are likely to remain
and even accelerate, there is a call to include CAI and CAIT tools in interpreter education
in order to best equip students for the profession and allow them to develop the necessary
soft skills that are becoming increasingly crucial today.

Notes
1 This chapter was jointly conceived by the four authors. In the final version, Mariachiara Russo
authored Sections 10.1, 10.2, 10.5.2; Amalia Amato authored Sections 10.3, 10.3.1, 10.3.3, 10.4,
10.4.1, 10.5, 10.5.1, 10.6; Gabriele Carioli authored Sections 10.3.2, 10.4.2; and Nicoletta Spi-
nolo authored Section 10.4.3.
2 https://interpretertrainingresources.eu/ (accessed 31.3.2025).
3 https://knowledge-centre-interpretation.education.ec.europa.eu
4 https://nationalnetworkforinterpreting.ac.uk/interactive-resources/speechpool/
5 https://orcit.eu/
6 www.interpretbank.com/site/ (accessed 31.3.2025).
7 www.eitdigital.eu/fileadmin/2021/innovation-factory/new/digital-tech/EIT-Digital-Factsheet-
Smarterp.pdf (accessed 31.3.2025).
8 The use of InTrain is also discussed in the context of humanitarian interpreting training (Russo
and Spinolo, 2022).
9 The STUN (Session Traversal Utilities for NAT) protocol is used to assist devices behind a NAT
(network address translation) or firewall in establishing communication with external devices.
It works by allowing devices to discover their public IP address and the type of NAT they are
behind, enabling them to communicate with other devices even when located behind NAT or
firewalls.


10 The TURN (Traversal Using Relays around NAT) protocol is a relay protocol used when direct
peer-to-peer communication is not possible due to network address translation (NAT) or firewall
restrictions. TURN servers act as intermediaries, relaying data between communicating peers. This
enables devices to establish communication even when direct connections are not feasible, ensur-
ing connectivity in diverse network environments.
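In a WebRTC application like those described in this chapter, STUN and TURN servers are supplied through the ICE configuration of a peer connection. The fragment below is a generic sketch; the server URLs and credentials are placeholders, not ReBooth's infrastructure.

```javascript
// Hypothetical ICE configuration for a WebRTC peer connection; the server
// URLs and credentials are placeholders.
const iceConfig = {
  iceServers: [
    { urls: 'stun:stun.example.org:3478' }, // NAT discovery only
    {
      urls: 'turn:turn.example.org:3478',   // relay fallback when direct
      username: 'demo',                     // peer-to-peer paths fail
      credential: 'secret',
    },
  ],
};
// In a browser: new RTCPeerConnection(iceConfig)
```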
11 https://github.com/mattdiamond/Recorderjs (accessed 31.3.2025).
12 https://intrain.ditlab.it/ (accessed 31.3.2025).
13 www.gnu.org/licenses/agpl-3.0.html (accessed 31.3.2025).
14 https://github.com/bilo1967/intrain (accessed 31.3.2025).
15 www.unige.ch/inzone/ (accessed 31.3.2025).
16 https://rebooth.ditlab.it/ (accessed 31.3.2025).
17 Source code is available under the GNU Affero General Public v3 License from https://github.com/bilo1967/rebooth.
18 A temporary evaluation account can be obtained free of charge (see https://rebooth.ditlab.it/ for the request address).
19 One of the recruited participants was not able to participate in the focus groups.
20 Screen recordings were analysed for ten participants, as one screen recording had to be discarded
due to technical issues.
21 UEQ items are ‘scaled from −3 to +3. Thus, −3 represents the most negative answer, 0 a neutral
answer, and +3 the most positive answer’ (Schrepp, 2023, 2). In the same way, the range of scales
goes from −3 (worst performance) to + 3 (best performance) (Schrepp, 2023, 5).

References
Albl-Mikasa, M., 2013. Developing and Cultivating Interpreter Expert Competence. The Interpreters’
Newsletter 18, 17–34.
Baigorri-Jalón, J., 2000. La interpretación de conferencias: el nacimiento de una profesión. De París
a Nuremberg. Editorial Comares, Granada.
Bertozzi, M., 2024. Continuous self-learning for conference interpreting trainees: the case of the Uni-
versity of Bologna. The Interpreters’ Newsletter 29, 19–39.
Boéri, J., De Manuel Jerez, J., 2011. From Training Skilled Conference Interpreters to Educating
Reflective Citizens: A Case Study of the Marius Action Research Project. The Interpreter and
Translator Trainer 5(1). Special Issue: Ethics and the Curriculum: Critical Perspectives, 41–64.
Braun, S., Davitti, E., Dicerto, S., Slater, C., Tymczyńska, M., Kajzer-Wietrzny, M., Floros, G., Kritsis, K., Hoffstaedter, P., Kohn, K., Roberts, J.C., Ritsos, P.D., Gittins, R., 2014. EVIVA Evaluation Studies Report. ResearchGate. URL www.researchgate.net/publication/309736446_EVIVA_Project_Evaluating_the_Education_of_Interpreters_and_their_Clients_through_Virtual_Learning_Activities_-_Evaluation_Studies_Report (accessed 9.6.2024).
Braun, S., Davitti, E., Slater, C., 2020. ‘It’s Like Being in Bubbles’: Affordances and Challenges of Vir-
tual Learning Environments for Collaborative Learning in Interpreter Education. The Interpreter
and Translator Trainer 14(3), 259–278. URL https://doi.org/10.1080/1750399X.2020.1800362
Braun, S., Slater, C., 2014. Populating a 3D Virtual Learning Environment for Interpreting Students
with Bilingual Dialogues to Support Situated Learning in an Institutional Context. The Interpreter
and Translator Trainer 8(3), 469–485. URL https://doi.org/10.1080/1750399X.2014.971484
Braun, S., Slater, C., Botfield, N., 2015. Evaluating the Pedagogical Affordances of a Bespoke 3D Vir-
tual Learning Environment for Training Interpreters and Their Clients. In Erlich, S., Napier, J., eds.
Interpreter Education in the Digital Age: Innovation, Access, and Change. Gallaudet University
Press, Washington, DC, 39–67.
Carioli, G., Spinolo, N., 2019. InTrain [software]. URL https://intrain.ditlab.it/credits (accessed 9.6.2024).
Carioli, G., Spinolo, N., 2020. ReBooth [software]. URL https://rebooth.ditlab.it/credits (accessed 9.6.2024).
Creswell, J.D., 1999. Mixed-Method Research: Introduction and Application. In Cizek, G.J., ed.
Handbook of Educational Policy. Academic Press, Cambridge, 455–472.


Davitti, E., Pasquandrea, S., 2017. Embodied Participation: What Multimodal Analysis Can Tell
Us About Interpreter-Mediated Encounters in Pedagogical Settings. Journal of Pragmatics 107,
105–128.
Defrancq, B., 2023. Technology in Interpreter Education and Training. In Corpas Pastor, G., Defrancq, B.,
eds. Interpreting Technologies – Current and Future Trends. John Benjamins, Amsterdam, 302–319.
Defrancq, B., Fantinuoli, C., 2020. Automatic Speech Recognition in the Booth: Assessment of System Performance, Interpreters’ Performances and Interactions in the Context of Numbers. Target – International Journal of Translation Studies 33(1), 73–102.
Fantinuoli, C., 2023. Towards AI-Enhanced Computer-Assisted Interpreting. In Corpas Pastor,
G., Defrancq, B., eds. Interpreting Technologies – Current and Future Trends. John Benjamins,
Amsterdam, 46–71.
Fantinuoli, C., Prandi, B., 2018. Teaching Information and Communication Technologies – a Pro-
posal for the Interpreting Classroom. trans-kom 11(2), 162–182.
Frittella, F.M., 2021. Early Testing Report. Usability Test of the ASR- and AI-Powered CAI tool
‘SmarTerp’. Unpublished report of the SmarTerp project.
Frittella, F. M., 2022. The ASR-CAI Tool Supported SI of Numbers: Sit Back, Relax and Enjoy
Interpreting? Paper presented at the Conference Translating and the Computer (TC) 43. ResearchGate. URL www.researchgate.net/publication/363835256_The_ASR-CAI_tool_supported_SI_of_
numbers_Sit_back_relax_and_enjoy_interpreting/link/63304eab86b22d3db4e07c1d/download
(accessed 9.6.2024).
Frittella, F.M., 2023. Usability Research for Interpreter-Centred Technology: The Case Study of
SmarTerp. Language Science Press, Berlin. URL https://doi.org/10.5281/zenodo.7376351
Galán-Mañas, A., Kuznik, A., Olalla-Soler, C., 2020. Entrepreneurship in Translator and Interpreter
Training. HERMES – Journal of Language and Communication in Business 60, 7–11.
Gile, D., 1995/2009. Basic Concepts and Models for Interpreter and Translator Training: Revised
Edition, 2nd ed. John Benjamins, Amsterdam.
Gran, L., Carabelli, A., Merlini, R., 2002. Computer-Assisted Interpreter Training. In Garzone, G.,
Viezzi, M., eds. Interpreting in the 21st Century: Challenges and Opportunities. John Benjamins,
Amsterdam, 277–294.
Ho, C.-E., Zou, Y., 2023. Teaching Interpreting in the Time of COVID: Exploring the Feasibil-
ity of Using Gather. In Liu, K., Cheung, A.K.F., eds. Translation and Interpreting in the Age of
COVID-19. Corpora and Intercultural Studies, Vol. 9, Springer, Singapore, 311–330. URL https://doi.org/10.1007/978-981-19-6680-4_16
Kajzer-Wietrzny, M., Tymczynska, M., 2014. Integrating Technology into Interpreter Training
Courses: A Blended Learning Approach. inTRAlinea Special Issue: Challenges in Translation Peda-
gogy. URL www.intralinea.org/specials/article/2101
Kalina, S., 2007. ‘Microphone Off’ – Application of the Process Model of Interpreting to the Class-
room. Kalbotyra 57(3), 111–121.
Ko, L., 2006. Teaching Interpreting by Distance Mode: Possibilities and Constraints. Interpreting
8(1), 67–96.
Ko, L., 2008. Teaching Interpreting by Distance Mode: An Empirical Study. Meta 53(4), 814–840.
Ko, L., Chen, N.S., 2011. Online-Interpreting in Synchronous Cyber Classrooms. Babel 57(2),
123–143.
Laugwitz, B., Held, T., Schrepp, M., 2008. Construction and Evaluation of a User Experience Questionnaire. In Holzinger, A., ed. HCI and Usability for Education and Work: 4th Symposium of the Workgroup Human-Computer Interaction and Usability Engineering of the Austrian Computer Society (USAB 2008). Springer, Berlin, 63–76. URL https://doi.org/10.1007/978-3-540-89350-9_6
Merlini, R., 1996. InterpretIT. Consecutive Interpretation Module. The Interpreters’ Newsletter 7,
31–41.
Mikkelson, H., Slay, A., Szasz, P., Cole, B., 2019. Innovations in Online Interpreter Education:
A Graduate Certificate Program in Community Interpreting. In Sawyer, D., Austermühl, F.,
Enríquez Raído, V., eds. The Evolving Curriculum in Interpreter and Translator Education. John
Benjamins, Amsterdam, 167–184.
Motta, M., 2013. Evaluating a Blended Tutoring Program for the Acquisition of Interpreting Skills:
Implementing the Theory of Deliberate Practice (PhD thesis). University of Geneva, Switzerland.
Motta, M., 2016. A Blended Learning Environment Based on the Principles of Deliberate Practice
for the Acquisition of Interpreting Skills. The Interpreter and Translator Trainer 10(1), 133–149.


Müller de Quadros, R., Rossi Stumpf, M., 2015. Sign Language Interpretation and Translation in Bra-
zil: Innovative Formal Education. In Ehrlich, S., Napier, J., eds. Interpreter Education in the Digi-
tal Age: Innovation, Access, and Change. Gallaudet University Press, Washington, DC, 243–265.
Olalla-Soler, C., Spinolo, N., Muñoz Martín, R., 2023. Under Pressure? A Study of Heart Rate and
Heart-Rate Variability Using Smarterp. HERMES 63, 119–142.
Pöchhacker, F., 2024. Is Machine Interpreting Interpreting? Translation Spaces, online first. URL
www.jbe-platform.com/content/journals/10.1075/ts.23028.poc
Pollice, A., 2016. Portare la tecnologia in cabina: le nuove tecnologie a servizio dell’interprete e il
caso della simultanea con testo (Unpublished MA dissertation). Department of Interpreting and
Translation, University of Bologna.
Prandi, B., 2017. Designing a Multimethod Study on the Use of CAI Tools During Simultaneous Inter-
preting. In Esteves-Ferreira, J., Macan, J., Mitkov, R., Stefanov, O.M., eds. Proceedings of the 39th
Conference Translating and the Computer. Tradulex, Geneva, 76–88. https://www.asling.org/tc39/
wp-content/uploads/TC39-proceedings-final-1Nov-4.20pm.pdf (accessed 9.6.2024).
Prandi, B., 2018. An Exploratory Study on CAI Tools in Simultaneous Interpreting: Theoretical
Framework and Stimulus Validation. In Fantinuoli, C., ed. Interpreting and Technology. Language
Science Press, Berlin, 29–59.
Prandi, B., 2020. The Use of CAI Tools in Interpreter Training: Where Are We Now and Where Do
We Go from Here? inTRAlinea Special Issue: Technology in Interpreter Education and Practice.
URL www.intralinea.org/specials/article/2512
Risku, H., 2002. Situatedness in Translation Studies. Cognitive Systems Research 3(3), 523–533.
Ritsos, P.D., Gittins, R., Braun, S., Slater, C., Roberts, J.C., 2013. Training Interpreters Using Virtual
Worlds. In Gavrilova, M.L., Tan, K., Kuijper, A., eds. Transactions on Computational Science
XVIII. Lecture Notes in Computer Science, Vol. 7848. Springer, Berlin, Heidelberg, 21–40. URL
https://doi.org/10.1007/978-3-642-38803-3_2
Rodríguez, S., Frittella, F.M., Okoniewska, A., 2022. A Paper on the Conference Panel “In-Booth CAI
Tool Support in Conference Interpreter Training and Education”. In Esteves-Ferreira, J., Mitkov,
R., Recort Ruiz, M., Stefanov, O.M., eds. Translating & the Computer (TC) 43, AsLing Language Technology, 16–18 November 2021 on the Web, Conference Proceedings. Tradulex, Geneva,
78–87. URL www.tradulex.com/varia/TC43-OnTheWeb2021.pdf
Rodríguez Melchor, M.D., Horvath, I., Ferguson, K., 2020. The Role of Technology in Conference
Interpreting Training. Peter Lang, Lausanne.
Rubin, J., Chisnell, D., 2008. Handbook of Usability Testing: How to Plan, Design, and Conduct
Effective Tests. Wiley, Hoboken, NJ.
Russo, M., Amato, A., Torresi, I., under review. The Digital Boothmate in an Educational Setting:
Evaluation of SmarTerp Performance. The Interpreters’ Newsletter.
Russo, M., Spinolo, N., 2022. Technology Affordances in Training Interpreters for Asylum Seek-
ers and Refugees. In Ruiz Rosendo, L., Todorova, M., eds. Interpreter Training in Conflict and
Post-Conflict Scenarios. Routledge, London, 165–179.
Sandrelli, A., 2005. Designing CAIT Tools: Blackbox. MuTra 2005 – Challenges of Multidimensional
Translation: Conference Proceedings, 191–209.
Santoro, B., 2020. Tecnologia e didattica dell’interpretazione. Esperienza di interpretazione consecu-
tiva e simultanea con InTrain (MA dissertation). University of Bologna, Department of Interpret-
ing and Translation.
Schrepp, M., 2023. User Experience Questionnaire Handbook. All You Need to Know to Apply the
UEQ Successfully in Your Projects. User Experience Questionnaire. URL www.ueq-online.org/
Material/Handbook.pdf (accessed 9.6.2024).
Trilling, B., Fadel, C., 2009. 21st Century Skills: Learning for Life in Our Times. Wiley, Hoboken, NJ.

PART III
Technology for (semi-)automating interpreting workflows

11
TECHNOLOGY FOR HYBRID MODALITIES
Elena Davitti

11.1 Introduction
Live communication occurs in a myriad of settings. These include live events, television
broadcasts, conferences, museums, theatres, public services, and schools. The rapid growth
of multimedia and multilingual content, coupled with the pandemic-induced surge in online
and streaming content (Nimdzi, 2022a), has called for the development of effective and
varied solutions to make this content accessible to the broadest possible audience (Nimdzi,
2023). Ensuring access to information, culture, education, and entertainment both within
and across linguistic, sensory, and cognitive boundaries, and in real time, is essential for
fostering an inclusive society, as outlined in the UN Sustainable Development Goal 17.8.1.1
A growing legal framework supports the necessity for inclusive services to guarantee
access to information and education as fundamental human rights, as established by the
EU Charter of Fundamental Human Rights.2 The UN Convention on the Rights of Persons
with Disabilities (2006)3 also emphasises the need for equitable access to information and
entertainment for individuals with disabilities. This concept, according to WHO,4 refers
not only to ‘body or mind impairment’ but also to ‘activity limitations and participation
restrictions that can affect any of us’. This expanded view of disability challenges tradi-
tional notions by emphasising the importance of accessibility, particularly communicative
access, as a critical criterion beyond mere physical needs. A holistic approach ensures that
everyone, regardless of physical or cognitive limitations, can fully participate in society.
Although not traditionally framed as an access service, interpreting has already catered
for such communicative needs by providing access both intramodally (spoken-to-spoken)
and interlingually (from one language to another). However, defining interpreting strictly
as an oral form of language use is overly restrictive. This perspective fails to account for
institutionalised forms of interpreting that cross modality boundaries, such as signed
language interpreting (from oral to signed) and sight translation (from written to oral).
Boundary-crossing is thus not new in interpreting. Following the ideas of Kade (1968) and
Pöchhacker (2022), this chapter views immediacy as a common denominator across these
practices, characterised by meeting communicative needs in real time with minimal oppor-
tunities for editing the output.

DOI: 10.4324/9781003053248-15

Recent technological advances to support real-time content delivery – from information and communication technologies (e.g. videoconferencing and distance communication platforms) to AI-driven technologies (e.g. speech recognition [SR]
and machine translation [MT]) – have further shifted the boundaries of traditional inter-
preting. This has resulted in new, hybrid practices that necessitate a revised conceptualisa-
tion (Pöchhacker, 2023).
This chapter explores technology-enabled, hybrid modalities of live communication
across languages, specifically focusing on real-time speech-to-text (STT) interlingual ser-
vices. These emerging workflows aim to provide instant access to live content in written
format (as captions, subtitles, or text), catering to audiences with different needs. When
produced intralingually, that is, in the same language, this output can be helpful for D/
deaf or Hard-of-Hearing people, alongside other potentially interested groups, such as
children, elderly people, and those with cognitive impairments, as well as to anyone in a
sound-sensitive environment, such as trains, offices, or libraries. Statistics show that 80%
of people who use subtitles are not actually deaf, and 85% of videos on social media are
watched without sound. This indicates that live subtitling serves a much wider audience
beyond the traditionally targeted groups (Kubánková, 2021). When produced interlin-
gually, subtitles can also benefit people who do not speak that language, thus deepening
the integration of translation, in its widest sense, into the notion of access. Beyond making
live content accessible in real time, subtitles may also be used as a basis for the production
of written reports or synchronised with a video of an event to be published online (e.g.
web-based TV catch-up services). The addition of a language transfer dimension is thus
where interpreting and speech-to-text start to merge and overlap.
With the global live subtitling market worth nearly USD 1.8bn and projected to
experience double-digit annual growth by 2028 (TheExpressWire, 2023), and with the
real-time language translation device market projected to reach USD 1.88bn by 2030
(Verified Market Reports, 2024), these services have become essential for communica-
tion in broadcast, on-site, and online events, and their demand is expected to soar (Nim-
dzi, 2022b). However, service providers are currently implementing different workflows,
mostly ad hoc, to meet contingent needs, without a comprehensive, research-informed
understanding of their affordances and constraints (Slator, 2023). In light of the skyrock-
eting demands, this chapter provides an overview of hybrid STT workflows for real-time
interlingual communication and reflects on how the increasing interaction between
human language professionals and the AI-related technologies required by these complex
practices is transforming the professional landscape. This transformation is examined
from both practical and conceptual perspectives, focusing particularly on our traditional
understanding of interpreting.
The real-time nature of the practices considered here involves human agents and
technological systems ‘co-creating’ the final product, each handling different tasks.
This collaboration drives the evolution of hybrid modalities and requires the blend-
ing of traditional skills with new competences. It also calls for the examination of
factors such as the impact on professionals (e.g. in terms of effort, cognitive load,
ergonomics), as well as the evaluation of the fitness for purpose of the output, and
the role of humans in these processes. Traditional quality concerns (see Davitti et al., this
volume) now encompass additional dimensions, such as latency, readability, and intel-
ligibility, from a user-oriented perspective. Furthermore, this technological shift creates
opportunities for retraining and upskilling, offering potential for expanded professional capabilities and services.
tial to appreciate the evolving landscape of translation and interpreting practices in the
digital age.
Following this introduction, Section 11.2 discusses the concept of hybridity in this land-
scape. Section 11.3 explores live STT practices and their integration within translational
activities. Section 11.4 details five workflows, presented along a continuum of human
involvement. Some advantages and disadvantages of each workflow in delivering live inter-
lingual content are discussed, with reference to examples of real-life use cases and appli-
cations, where available. Section 11.5 reviews some key research themes on these hybrid
practices. Finally, Section 11.6 reflects on the implications of such practices at conceptual,
professional, and pedagogical levels.

11.2 Hybridity in technology-enabled modalities for real-time communication


The concept of hybridity in translation studies (TS) has evolved significantly over time.
Initially, any translational activity was viewed as a straightforward process of converting
text from one language to another. However, as scholars delved deeper into the complexi-
ties of language and culture, and as the two evolved in concert with technology, the idea
of hybridity emerged. This concept acknowledges that translation, as a multilingual com-
munication practice, is not merely about linguistic exchange but inherently involves the
crossing and mixing of different elements. Translation also involves the blending of cul-
tures, genres, techniques, semiotic elements, and methods, resulting in the creation of new
meaning. Translated texts are all, to some extent, inherently hybrid (Schäffner and Adab,
1995). Hybridity has thus sparked debates on the (in)visibility of the human at the core
of these practices, their influence on the target culture, and the adaptation processes that
remain central to discussion in TS.
This concept emerged from cultural and post-colonial studies as an effort to counteract
West-centrism (Bhabha, 1994). Simon (2011) notes that hybridity now carries a largely
positive connotation, despite its historical association with negativity, as seen in the 18th
and 19th centuries, when it was linked to the ‘abnormal, the monstrous, or the grotesque’
(p. 49). Furthermore, hybridity has now broadened beyond language and culture fol-
lowing, on the one hand, the rise of multimodal texts that combine aural, written, and
visual elements and, on the other, the increasing integration of AI-driven technology in
translation-related workflows that facilitate crossing diamesic boundaries, that is, changes
in language medium, such as speech-to-text (via SR) or text-to-speech (via speech synthe-
sis); modes of communication, such as image to speech (through computer vision tech-
niques); and linguistic boundaries (via MT). The notion now encompasses new practices
that blur the lines between traditional translation and interpreting. This has led to debates
around the characterisation of language professionals as cyborgs (Robinson, 2022), with
ensuing fears around the utopian or dystopian implications of such hybridity (Eszenyi et al.,
2023) and, most importantly, around how this integration between humans and machines
should look.
In real-time interlingual communication, hybridity already applies to well-established
practices. Simultaneous interpreting is a prime example of the interplay between human
linguistic and cultural capabilities and technology, all converging to deliver the service. In
the wake of the recent multimodal (Davitti, 2018) and technological (Fantinuoli, 2018; Braun, 2019; Jiménez-Crespo, 2020) turns identified in TS, new modalities have emerged
which broaden the concept of hybridity across different dimensions.
Transfer. Although not always considered as translational activities in their own right,
given their traditionally intralingual nature, inter- or transmodal practices that cross tra-
ditional boundaries between spoken and written media are increasingly emerging. These
‘enrich the continuum of diamesic variations traditionally polarised between spoken and
written language’ (Eugeni and Gambier, 2023, 70, my translation). In this context, real-time
STT practices are a case in point, as they can cross both diamesic and linguistic boundaries.
Method. Hybridity requires historical disciplinary silos (e.g. translation vs subtitling vs
interpreting) to be overcome, and practices that defy rigid categorisation to be recognised as full-blown translational practices. For example, the use of digital pens or tablets
for SimConsec (Pöchhacker, 2007) and SightConsec (see Saina and Ünlü, this volume) has
transformed traditional workflows and, consequently, relaxed the immediacy criterion
associated with traditional interpreting (Pöchhacker, 2023, 281). Similarly, speech technol-
ogies have opened up different methods for real-time STT output, which will be explored
further in this chapter.
Competences. Emerging practices also reshape the skills that humans require to oper-
ate within technologised environments. Doing so involves significant human–AI interac-
tion and dealing with highly multimodal texts. This requires traditional competences to be
updated or adjusted to new practices and workflows and increased awareness of different
user needs. Examples range from learning to interact effectively with technologies that sup-
port or enhance traditional interpreting performance (e.g. CAI tools, see Prandi, this vol-
ume; tablets, see Saina, this volume; digital pens, see Orlando, this volume) to technologies
that share tasks with human professionals to ensure real-time multilingual communication,
as is the case for live STT practices.
Set-ups. In live scenarios, hybridity is also used to reflect the mixed nature of live multi-
lingual communication set-ups, which often combine online/distance and on-site elements.
The COVID-19 pandemic has accelerated the emergence of new hybrid forms of real-time
communication. This evolution compels us to rethink our interactions within traditional
spatial constraints and address the complexities of real-time interlingual communication
in these contexts. Unlike traditional face-to-face interpreting, live STT practices have been
integrated from the start within remote and hybrid scenarios, such as TV broadcasts, and,
more recently, within various types of live events, which can be delivered in different for-
mats (Eichmeyer, 2018).
This chapter embraces the notion of hybridity between humans and machines, in line
with the emerging field of human–AI hybrids (Fabri et al., 2023). It challenges the conven-
tional view of AI as a substitute for human tasks, which leads to ‘two unfortunate conse-
quences: (1) a disproportionately large focus on automation and (2) a tendency to neglect
the powerful interworking that occurs when humans and AI augment each other’. Instead,
this chapter highlights research that promotes a decisively more positive conceptualisation
of such human–AI collaboration as ‘dynamic combinations of individual competencies of
human and AI-enabled systems’ (Fabri et al., 2023 [no pagination]).
This chapter will next discuss the role of these technology-enabled hybrid modalities in
live interlingual communication, within the broader context of translation and interpreting.
Furthermore, it will highlight the evolving conceptual nexus shaped by the integration of
new technologies.


11.3 Situating real-time speech-to-text practices


As explained in the previous section, hybrid practices cross traditional disciplinary bounda-
ries. The place occupied by real-time STT activities, which span both intra-/interlingual and inter-/intramodal boundaries, within the landscape of traditional translational practices is still an object of debate.
Building on audiovisual translation and semiotic frameworks, Eugeni (2020) and Eugeni
and Gambier (2023) recently situated these practices within the concept of ‘diamesic trans-
lation’ as a domain of both intra- and interlingual, as well as intersemiotic, translation. This
concept promotes a broad view of translation stemming from the notion of multimodality
and including shifts at verbal, paraverbal, and non-verbal levels. Echoing Gottlieb (2005),
diamesic translation is defined as ‘any process, or product hereof, in which a combination
of spoken and non-verbal signs, carrying communicative intention, is replaced by a written
combination reflecting, or inspired by, the original entity’ (Eugeni, 2020, 21).
Building on biosemiotic theories (Marais, 2019) and a universalist conception of accessibility (Greco, 2018), Pöchhacker recently sought to reconceptualise various forms of trans-
lational activities across intralingual, interlingual, and intermodal dimensions. Firstly, he
placed interpreting within this continuum, distinguishing it from translation activities by
its temporal nature (immediacy) rather than genre or setting. Secondly, he highlighted the
‘need for rethinking the concept of translation . . . [which] arises from the emergence of new
social practices that are regarded or designated by some as “interpreting” but deviate from
some of the concept’s defining characteristics’ (Pöchhacker, 2023, 277–278). These include
real-time STT practices, as they share features of interpreting such as live performance and
simultaneous processing. However, they differ from traditional interpreting due to their
shift from spoken to written language. This is not typically covered under traditional con-
ceptualisation of ‘interpreting-related’ activities but, rather, under audiovisual translation
(AVT) practices. Furthermore, many of these practices are traditionally performed intra-
lingually, catering to specific groups within the same signed language or spoken/written
language. As such, they have mostly been associated with media accessibility (MA), initially
considered a subfield of AVT. However, Greco’s (2018) shift from a particularist to a uni-
versalist view of accessibility has broadened the scope of MA to include ‘access to media
and non-media objects, services and environments through media solutions, for any person
who cannot or would not be able to, either partially or completely, access them in their
original form’ (Greco, 2018, 211).
Practices included under the ‘live STT’ umbrella term, implemented via different hybrid
methods, are the focus of this chapter. They retain key features of interpreting, such as
real-time performance and simultaneous processing, yet extend beyond traditional concep-
tual boundaries of interpreting. This justifies their inclusion in a handbook of interpret-
ing, technology, and AI, which embraces a broader conceptualisation of interpreting as
one which encompasses any linguistic and transmodal semiotic processes. This perspective
aligns with the debate about whether the future of simultaneous interpreting will increas-
ingly involve written forms (Eugeni and Caro, 2021) as a result of the hybridisation of trans-
lation and interpreting. The hybrid nature of these practices, particularly their dependence
on varying degrees of human–AI interaction (HAII), requires a thorough examination of how tasks are distributed
between humans and machines. The next section will describe the mechanics underlying
some of the workflows, with reference to current applications and use cases.


11.4 Technology-enabled hybrid modalities for real-time interlingual speech-to-text: an overview
To recap, in this chapter, the term technology-enabled hybrid modalities is used as a
macro-level expression encompassing forms of real-time (or live) STT practices, which can
be both intralingual and interlingual. Based on the discussion in Section 11.3, these are
conceptualised here as different modalities of translation, in its broadest sense, at the cross-
roads of interpreting, AVT, and MA.
As with other technology-mediated practices, such as distance interpreting, the termi-
nology for these practices varies widely. Such variation depends on factors like geographic
location, type of target text produced, context of use, production system (pre-recorded,
live, semi-live), technique employed (e.g. keyboard-based methods, like stenotyping and
velotyping, or speech recognition–based ones), level of editing required, and the targeted
end users (for an overview, see Eugeni and Caro, 2019). A common denominator, however,
is the immediacy of the service they provide. This requires specification via terms such as
live or real-time. These adjectives, used almost interchangeably, are not required when dis-
cussing interpreting, which is, by default, a real-time practice.
The terms subtitles or subtitling are commonly used and intuitively understood. The for-
mer emphasises the final product displayed on-screen, while the latter focuses on the pro-
cess of creating the subtitles. However, both terms fall short as overarching labels because
subtitles are typically associated with interlingual STT transfer, although redundant expres-
sions like multilingual subtitling for live events are also found. Moreover, these terms seem-
ingly exclude intralingual practices. In some countries, the latter are referred to as captions
or, more broadly, as transcriptions. However, these differ in relation to display mode and
scope, with transcriptions generally covering more broadly the process of converting speech
or audio into text. Nevertheless, a quick search of different service providers’ websites
reveals a range of variants being used in the industry, including live translated captions
(as opposed to live same-language captioning) or live multilingual captioning. Addition-
ally, while subtitles and captions are typically positioned at the bottom of the screen, live
STT practices may use different display modes, such as surtitles or scrolling live text. To
address these issues, Pöchhacker and Remael (2019) coined the term titling, although its use
remains mostly within academic circles.
Another term that explicitly captures the intersection between these new modalities
and interpreting is speech-to-text interpreting (STTI). This term refers to the simultaneous
transfer from spoken to written language. It can be performed both intra- and interlingually
and includes additional meta-information, such as speaker identification, to facilitate access
to the source content. STTI is commonly used in the context of live events and educational
settings (but cf. live/real-time STT reporting in relation to contexts such as parliamentary
debates). In contrast, live subtitling is more commonly associated with broadcasts. As an
umbrella term, it encompasses various production methods. Despite its clear emergence as
a key form of interpreting that is accessible to both hearing people and those with hearing
loss, STTI has yet to be formally recognised in ISO international standards as a form of
interpreting in its own right (Pöchhacker, 2023).
In this chapter, live or real-time interlingual STT is used as a superordinate term to
describe different practices and workflows. This term relays the broader concept of conver-
sion (building on Wagner, 2005) or transfer (Davitti and Sandrelli, 2020, 104) from speech
to text and encompasses a wide range of options for both process and product. Although speech-to-text is typically associated with full automation, it is used here as a hypernym to include various hybrid methods, all aiming to produce the same product: live text (subtitles
or transcripts) in another language.
The rapid development of AI-driven technology, like automated (speaker-independent)
speech recognition and MT, has expanded options for live interlingual STT, leading to dif-
ferent workflows where the diamesic and linguistic shifts are allocated to different compo-
nents. Before delving into the specifics of each workflow, it is crucial to first address some
of the key challenges related to the content fed into these processes. These challenges have a
significant impact on both the benefits and the limitations of the methods applied, influenc-
ing the outcomes and efficiency of the workflows.
For instance, the number of speakers: multiparty interactions are complex to handle.
In live STT, this involves additional tasks, such as speaker identification and managing
overlapping voices and interruptions. This complexity remains a challenge that automation systems have yet to fully resolve. Other challenges include specific features of the source (spoken) input, such as topic (general vs highly specialised), event type (e.g. the nature and purpose of the event, its register, and the end goal to bear in mind), and availability of the script ahead of the event or broadcast, as well as the speaker's
characteristics (e.g. native vs non-native accents, speaking style, vocabulary, and degree of
planning, such as impromptu vs planned speech).
Another significant variable is original speech rate: a comfortable speech rate is normally
estimated at 100–120 wpm. However, real-life rates are typically higher, with sports programmes
averaging 160 wpm, news programmes 180 wpm, and interviews and weather reports
230 wpm. Importantly, adult viewers’ reading speed is generally slower than the average
speaking rate, with an estimated reduction factor of approximately one-third to accommo-
date audience processing speed (in the absence of other conditions requiring more adjust-
ments). Moreover, this varies based on different contexts, language combinations, levels of
redundancy of information, and differences in audiences.
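The rate gap described above lends itself to a quick back-of-the-envelope calculation. The sketch below applies the one-third reduction factor mentioned above to the cited programme speech rates; the function name and the rounding choice are this sketch's own, not a standard from the literature.

```python
# Back-of-the-envelope sketch of the rate gap discussed above: written output
# must be condensed because viewers read more slowly than speakers talk.
# The one-third reduction factor and the wpm figures come from the chapter;
# the function itself is illustrative.

def written_budget(source_wpm: float, reduction: float = 1 / 3) -> int:
    """Approximate words-per-minute budget for the written output."""
    return round(source_wpm * (1 - reduction))

for label, wpm in [("sports", 160), ("news", 180), ("interviews/weather", 230)]:
    print(f"{label}: {wpm} wpm spoken -> ~{written_budget(wpm)} wpm written")
```

At 230 wpm, for instance, roughly a third of the spoken words have to be dropped or condensed before the output fits a comfortable reading speed, which is precisely where human editorial judgement (or automated compression) comes into play.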
The presence of visual aids (e.g. slides, graphs) may place an additional burden on monitoring, as care must be taken to ensure that the subtitles do not obscure key visual elements
or duplicate information already presented on slides. Furthermore, as live STT practices
vary across countries, differences may arise in terms of subtitle format (ranging from tradi-
tional subtitles, for example, on TV or during parliamentary debates, to real-time text, for
example, at conferences or lectures), display mode (scrolling or block text), and sound and
music labels. Another variable is mode of delivery, that is, whether a specific live STT prac-
tice is conducted face-to-face or remotely. This affects technology use and may introduce
additional sound issues and delay. Finally, the range of language pairs covered by interlin-
gual practices remains a challenge, as no current solution ensures comprehensive coverage
of all language needs, including instances where languages are alternated in the source
content.
These variables are common across translation-related assignments yet require specific
consideration within the multimodal context of STT. The technologised nature of these
practices increasingly shifts decision-making towards machines and raises questions regard-
ing the extent to which the latter can handle these processes, in addition to the implications
for output quality. In these hybrid forms of HAII, humans are responsible for overseeing
AI output and performing live editing under severe time constraints, which can increase
cognitive load and fatigue. The shift towards full automation, where human involvement is
primarily for live editing (Pöchhacker, 2023), has raised questions about the role of humans in these processes and has outpaced our understanding of the effectiveness and impact of these different practices. With these considerations in mind, we will now examine the workflows in more detail. Based on research (e.g. Davitti and Sandrelli, 2020; Korybski et al., 2022; Wallinheimo et al., 2023), different workflows for real-time interlingual STT have been placed on a continuum from more human-centric to semi- and fully automated (Figure 11.1; see also the SMART 2023 video clip5).

Figure 11.1 Continuum of hybrid workflows for real-time interlingual STT.
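One way to make the continuum in Figure 11.1 concrete is to model each workflow as an ordered set of components, each performing either the diamesic (speech-to-text) or the linguistic (language transfer) shift. The `Component` structure, the component names, and the two end-point workflows below are illustrative simplifications, not a taxonomy proposed in this chapter.

```python
# Illustrative sketch: each workflow on the continuum as an ordered list of
# components, each handling a diamesic (speech-to-text) or linguistic
# (language transfer) shift. Names and compositions are simplified examples.
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    name: str
    agent: str   # "human" or "AI"
    shift: str   # "diamesic", "linguistic", or "both"

# Human-centric end: a professional performs the language transfer while
# dictating to speech recognition, which performs the diamesic shift.
human_centric = [
    Component("professional (listens, translates, dictates)", "human", "linguistic"),
    Component("speaker-dependent speech recognition", "AI", "diamesic"),
]

# Fully automated end: ASR handles the diamesic shift, MT the linguistic one.
fully_automated = [
    Component("automated speech recognition", "AI", "diamesic"),
    Component("machine translation", "AI", "linguistic"),
]

def human_involvement(workflow: list[Component]) -> float:
    """Fraction of components carried out by a human agent."""
    return sum(c.agent == "human" for c in workflow) / len(workflow)

print(human_involvement(human_centric))
print(human_involvement(fully_automated))
```

Semi-automated workflows would sit between these two end points, with intermediate human-involvement values, which is the logic behind reading Figure 11.1 as a continuum rather than a binary human/machine divide.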

11.4.1 Human-centric workflows


The first set of HAII workflows presented is human-centric, with the balance of interaction
leaning towards humans actively driving the linguistic and diamesic processes involved in
these practices.

11.4.1.1 Interlingual respeaking


At one end of this cline is a technique called interlingual respeaking. In this human-centred
form of HAII, a specialised professional (human) listens to live spoken input in a source
language and simultaneously reformulates it (in the same language) or translates it (into a
target language) to speech recognition (AI). This, in turn, transforms it into written out-
put, displayed on-screen, with minimum delay (definition adapted from Romero-Fresco,
2011, 275).
Interlingual respeaking is thus a form of intermodal, AI-enabled simultaneous interpret-
ing. However, interlingual respeaking extends the challenges of traditional practices. The
professional has to not only deliver the source message into a different target language
but also obey the requirements of subtitling. Requirements include orally adding punctua-
tion and other features, such as speaker identification and sound effects, articulating and controlling prosody to optimise recognition, and performing any adjustments or reformulations necessary to convert oral input into written output and on-the-go editing of the writ-
ten output for comprehensibility and readability.
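To illustrate what 'orally adding punctuation' involves, the sketch below mimics how dictated command words can be rendered as punctuation rather than transcribed verbatim. The command vocabulary and the function are hypothetical; real speech recognition software such as Dragon NaturallySpeaking uses its own, configurable command set.

```python
# Hypothetical sketch of how dictated commands interleave with content in
# respeaking: command words (assumed names, not Dragon's actual vocabulary)
# are converted to punctuation instead of being transcribed verbatim.
COMMANDS = {
    "comma": ",",
    "full stop": ".",
    "question mark": "?",
}

def render_dictation(tokens):
    """Join dictated tokens, replacing command words with punctuation."""
    out = []
    for tok in tokens:
        if tok in COMMANDS:
            if out:
                out[-1] = out[-1] + COMMANDS[tok]  # attach to previous word
            else:
                out.append(COMMANDS[tok])
        else:
            out.append(tok)
    return " ".join(out)

print(render_dictation(["hello", "comma", "how", "are", "you", "question mark"]))
# -> hello, how are you?
```

Even this toy version hints at the extra cognitive layer respeakers manage: they must interleave such commands with the translated content in real time, while also monitoring the recognised output for errors.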
This method has proven highly efficient, especially for major languages. In its intralin-
gual variant (i.e. in the same language), it resembles a form of intermodal shadowing that
can be carried out more or less verbatim or sensatim, depending on the specific challenges
of the source content and other variables (for a complete overview of the practice, its use,
and its application, see Romero-Fresco, 2018; Romero-Fresco and Eugeni, 2020). Inter-
lingual respeaking adds a language transfer component, converting live spoken content
into written text in a different language. In terms of technology, it relies on interaction
between human and speaker-dependent SR, with the industry standard being Dragon Natu-
rallySpeaking, for languages supported by the software. The software uses deep learning
technology to achieve high accuracy in front-end live STT, is highly customisable (e.g. with
commands, word lists, etc.), and is designed to recognise a single speaker’s voice. It thus
differs from ASR solutions currently being implemented in other workflows.
The terminology surrounding this practice varies depending on its geographical loca-
tion and whether it is used in academic or professional environments. The term respeaking
generally refers to the method of producing live subtitles via SR, distinguishing it from
keyboard-based methods, like stenotyping. By default, respeaking refers to the intralingual
form of the practice. In Europe, this term is commonly used, while in the United States, the
same method is often referred to as real-time voice writing. Live subtitling is another term
frequently used to describe this practice, but since it can encompass different methods, it
requires specification (e.g. live subtitling via SR). When focusing on the product, live cap-
tioning is also used to define the output, which remains in the same language. As seen in
Section 11.4, STTI refers to a broader practice, with respeaking being a specific subset that
utilises speech recognition technology. Transpeaking is a newer term, coined in academic
circles (Pöchhacker and Remael, 2019) to highlight the translation aspect of this prac-
tice. However, this term has not been widely adopted in professional environments, where
respeaking or live subtitling are more commonly used.
Intralingual respeaking was introduced around the turn of the 21st century and is now
widely used globally to subtitle live broadcasts (e.g. news, weather forecasts, live sports,
chat shows, reality shows, parliamentary debates) or live events (e.g. conferences, talks,
award ceremonies, film festivals, university lectures, school classes, business, and town hall
meetings), mainly as an alternative to stenotyping, to make content accessible for people
with hearing loss. This is usually performed with no script or only a partial script available.
Compared to its predecessor, interlingual respeaking is still in its early stages and faces
significant challenges. Despite its being defined as ‘a thriving trend and a promising new
opportunity with sufficient potential for widespread use as long as good training is pro-
vided’ (Alonso-Bacigalupe and Romero-Fresco, 2024, 536–537), there remains a shortage
of qualified professionals offering this service across various language pairs. Furthermore,
the hype around (semi-)automation has raised questions about whether investing in inter-
lingual respeaking is a worthwhile choice. Additionally, there is a general lack of awareness
about the service among both clients and service providers, who may not realise that differ-
ent methods to produce live subtitles are available.
Currently, interlingual respeaking is not provided on the same large scale as intralingual
respeaking. Nevertheless, the demand for the service is growing rapidly. It is performed
on TV in countries including Wales, Belgium, and the Netherlands, for breaking news and

189
The Routledge Handbook of Interpreting, Technology and AI

other major live events, such as cultural events, festivals, and award ceremonies. It is also
being explored in settings such as conferences, business meetings, trade shows, and polit-
ical, educational, and legal settings, among other social environments. With the rise of
online streaming content and the need to make this accessible, there is potential for new
markets to open up. This includes settings where simultaneous interpreting or subtitling
is not normally available. Examples may cover live subtitling of online radio broadcasts,
remote subtitling of museum tours, and MOOC classes.
This workflow relies on efficient interaction between language professionals and
AI-driven SR. It requires complex, specialised skills, and an ability to adjust to technology
and see beyond traditional practices to understand the need for skill adaptation and acqui-
sition. Required skills include concurrent listening and translating while adjusting one’s
way of speaking to the software for enhanced recognition. The respeaker must also perform
audiovisual monitoring: checking their own spoken output (as is normal during interpret-
ing) in addition to monitoring the appearance of the subtitles on-screen and applying edits
or corrections in real time. This multitasking activity, which involves coordinating and
controlling all these steps simultaneously, makes interlingual respeaking akin to ‘simultane-
ous interpreting 2.0’.
This process is neither flawless nor effortless. At its core lies the ability to strategically
reformulate content in the target language. Humans possess the unique real-time flexibility
to adjust to different accents, speeds, and other source text characteristics, including multiple
speakers, lexical density, changing registers, or even languages. They can also extract mean-
ing from chaotic, impromptu speeches full of hesitations and self-repairs, make inferences,
and accurately interpret colloquialisms, sensitive language, idioms, cultural references, and
implicit meanings based on a thorough understanding of the context. This sets humans apart
from automated transcription and machine-generated subtitles, particularly in high-risk,
high-stakes contexts, and allows them to shape and apply their live editing skills according
to the needs at hand. However, these skills require adaptation to the specific practice.
As highlighted in Section 11.4, unlike interpreters, respeakers must remember that they
produce more words than the original, as they also verbalise punctuation and voice labels
to identify sounds or speakers. This results in a delay in following the speaker and adds
to the time lag required to process the incoming information. While broadcasters often
address time lag by delaying the signal to reduce or eliminate the perceived gap, condensa-
tion strategies become crucial in live scenarios, where managing the delay like broadcasters
is not possible. However, the question still remains regarding how best to implement these
strategies without losing meaning. This is especially the case with intralingual respeaking,
where a verbatim approach is often expected by viewers. Given the lack of one-to-one cor-
respondence between a source and a target language, the argument for a sensatim approach
becomes more pronounced in interlingual respeaking. When language transfer is involved,
the limitations of a word-for-word approach have been well-documented by translation
studies research, where the importance of transferring ideas rather than words to ensure
meaning retention is well-established.
Spontaneous speech is often grammatically unpolished and redundant, making it unsuit-
able for direct use in written form. Respeakers can streamline and polish the content to
ensure high quality and accuracy while also considering the specific needs of the target audi-
ence. This makes live subtitles more engaging, readable, and accessible. Editing skills tra-
ditionally refer to those applied post hoc in respeaking, with different correction strategies
(e.g. based on voice commands, keyboards, mouse), performed by the respeaker themselves

or by a corrector/editor, involving decisions on what to correct. However, when editing live,
respeakers must also ensure their output can be effectively processed through SR software
and fit the constraints of subtitles. Any delays in comprehension or while finding an appro-
priate equivalent can add to the latency with which the subtitles are displayed. Effective live
editing thus can be broadened from a reactive ability to identify and correct subtitle errors
as soon as they are processed and displayed on-screen to include ‘proactive strategic behav-
iour to ensure the fitness-for-purpose of the target output’ (Korybski and Davitti, 2024, 4).
Finally, preparation as a pre-process stage extends beyond content-related preparation and
includes SR software optimisation to maximise the chances of correct recognition of names,
acronyms, or foreign words when fed in advance. This acts as pre-editing and is essential to
optimise human–machine efficiency.
In relation to HAII, dictation is another key procedural skill, focused on optimising
interaction with SR. Dictation to SR requires clear enunciation and software-adapted deliv-
ery (SAD, a term coined within the SMART project). The respeaker must adjust their natu-
ral prosody and articulate words clearly with a neutral and even delivery. For interpreters,
this adjustment feels counterintuitive as prosody is a key vector of meaning. However, it is
necessary in respeaking when addressing a machine. Dictation to SR also requires chunk-
ing the input, pausing strategically, and finding the right rhythm to enable the software to
process and display the dictated words without too much delay.
The highly human-centric nature of this workflow is evident. The human operator is
at the core of this activity. They provide the initial input to the machine and determine
how the output is delivered, including punctuation and other details the machine processes.
As a result, the accuracy of the subtitles can be very high, but the cost and subtitle delay
are likely to increase.

11.4.1.2 Simultaneous interpreting + intralingual respeaking


SR technology has not been equally developed for all languages, and at the time of writing,
interlingual respeakers are not widely available, with limited general awareness of this tech-
nique. Therefore, alternative methods are often used to achieve the same goal of providing
real-time STT services across languages.
One alternative configuration involves two professionals: a simultaneous interpreter,
who renders a spoken message from the source into the target language, and an intralingual
respeaker, who then respeaks the interpreter’s output into SR. The respeaker voices punc-
tuation, adds content labels where needed, and employs SAD and strategic reformulation
techniques intralingually to create subtitles that are succinct, grammatically correct, and fit
for purpose. In this workflow, the respeaker does not need to work across two languages,
as the simultaneous interpreter acts as a relay, handling the language transfer.
Industry evidence shows that this workflow has been used by international organisations
and at large-scale events. It is highly human-centric, involving professionals with different
expertise, and can produce very accurate subtitles, albeit with increased cost and poten-
tial delays. The delay can be mitigated in hybrid meetings by restreaming the event for
the online audience, synchronising the audiovisual content with subtitles. However, this
multi-step process carries risks, including the increased potential for technology failures
and information loss, depending on how each step is implemented.
Creating consolidated teams of respeakers and interpreters who work together regularly
could help mitigate some of these risks. However, this is not always feasible, especially

due to the corporate model of many companies that rely on a largely freelance workforce.
Additionally, increased awareness and education are needed, as this workflow puts signifi-
cant pressure on interpreters, who are responsible for conveying meaning across languages
under visible scrutiny.

11.4.2 Semi-automated workflows


Semi-automated workflows vary in how they use HAII to generate real-time subtitles in
another language. These workflows outsource some of the tasks, such as language or
diamesic transfer, to AI. The extent to which transcription (via ASR) and translation
(via MT) can be automated determines how responsibilities are distributed between human
and machine, with the attendant affordances and constraints.

11.4.2.1 Intralingual respeaking + machine translation


This workflow involves an intralingual respeaker listening to live speech in a source language
and producing subtitles in the same language by skilfully interacting with SR. These subtitles
are then machine-translated into one or more target languages. Respeakers are responsible for
punctuating, chunking, and streamlining the subtitles, adopting strategies to create succinct,
grammatically correct, fit-for-purpose subtitles in the same language. They may omit repeti-
tions, hesitations, and redundancies or close unfinished sentences, thus optimising the source
text for MT. Language transfer is entirely carried out by MT, but the human respeaker opti-
mises the interaction with SR software to facilitate the process. This level of human–machine
interaction can result in highly accurate subtitles while containing costs and delays.
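The streamlining described above — dropping hesitations and redundancies and closing sentences so that MT receives clean, well-punctuated input — can be illustrated with a toy normalisation function. The filler list, function name, and rules are illustrative only; real respeakers perform this editing live, by voice, with far richer judgement:

```python
# Toy illustration of the optimisation a respeaker performs before
# the text reaches MT: fillers and immediate self-repetitions are
# dropped, and the sentence is capitalised and closed.
FILLERS = {"uh", "um", "erm"}  # illustrative, not an exhaustive list

def clean_for_mt(utterance: str) -> str:
    cleaned = []
    for word in utterance.lower().split():
        if word in FILLERS:
            continue                      # drop hesitations
        if cleaned and cleaned[-1] == word:
            continue                      # drop immediate repetitions
        cleaned.append(word)
    if not cleaned:
        return ""
    sentence = " ".join(cleaned)
    sentence = sentence[0].upper() + sentence[1:]
    return sentence if sentence.endswith((".", "?", "!")) else sentence + "."

print(clean_for_mt("um the the budget erm will be discussed tomorrow"))
# prints "The budget will be discussed tomorrow."
```

The design point is the one made in the text: the human absorbs the messiness of spontaneous speech so that the automated stage downstream receives input it can handle reliably.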
Anecdotal evidence shows this workflow being used for subtitles in different languages
(including rare languages, where SR technology and professionals are scarce) at interna-
tional events, such as the World Economic Forum in Davos, and at large-scale corporate
events, product launches, and information briefings. However, as a practice, it does not yet
seem to have been widely adopted in the industry (Romero-Fresco and Alonso-Bacigalupe,
2022; Alonso-Bacigalupe and Romero-Fresco, 2024).

11.4.2.2 Simultaneous interpreting + automatic speech recognition


This semi-automated workflow involves a simultaneous interpreter initially rendering con-
tent orally from a source into a target language. Their spoken output is then fed directly
into ASR, which converts the interpreter’s words into same-language text and automati-
cally adds punctuation. Without pre-editing the text fed into ASR, the software may mis-
recognise certain words, causing errors in the subtitles. Although recognition technology is
advancing rapidly, simultaneous interpreters are not specifically trained to interact with SR
to produce written subtitles and may not speak in a way that the software can easily process.
Additionally, the automatic segmentation and punctuation of the written subtitles may be inconsistent.
As a result, human post-editing might be required, and interpreters would need specific
training to work with ASR, including using SAD and limiting self-repairs and hesitations.
Here, the human is responsible for the language transfer, but the conversion of spoken words
to written subtitles is outsourced to the machine. Although this workflow may produce low-cost
subtitles that appear on-screen quickly, the accuracy of these subtitles is likely to vary.


11.4.3 Full automation


Recent advances in language technology have introduced transformative possibilities,
promising to create a ‘world-readiness’ (Joscelyne, 2019) where subtitles are available to
anyone, anywhere, anytime. This is particularly relevant in a world where technology has
led to new types and formats of audiovisual products, often with very short lifespans, yet
created at an unprecedented rate, necessitating near real-time availability. Full automation
with no human intervention is thus generating hype, proving highly attractive for many
stakeholders.

11.4.3.1 Automatic speech recognition + machine translation


At the other end of the spectrum (Figure 11.1) lies a fully automated workflow with poten-
tial human intervention only in an editing role. In this workflow, the speaker’s original
words in the source language are processed through ASR to produce text in the same lan-
guage. This is then machine-translated to generate subtitles in one or more target languages.
Recent industry research (e.g. Wordly, 2024) highlights the growing demand for this service
due to its promise to increase inclusivity and accessibility while saving time, reducing costs,
and simplifying logistics.
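The cascaded structure of this workflow — speech recognised into same-language text, then machine-translated, with no human in the loop before display — can be sketched as follows. Both engine calls are illustrative stand-ins, not any specific vendor's API:

```python
def asr_transcribe(audio_chunk):
    # Stand-in for a real ASR engine: in practice this is where
    # recognition, punctuation and segmentation errors originate.
    return audio_chunk

def mt_translate(text, target_lang):
    # Stand-in for a real MT engine: any upstream ASR error
    # propagates through this step into the displayed subtitles.
    return f"[{target_lang}] {text}"

def live_subtitles(audio_stream, target_lang="es"):
    """Yield one machine-translated subtitle per incoming chunk;
    no human intervenes between recognition and display."""
    for chunk in audio_stream:
        transcript = asr_transcribe(chunk)           # diamesic transfer
        yield mt_translate(transcript, target_lang)  # language transfer

for line in live_subtitles(["good morning everyone", "let us begin"]):
    print(line)
```

The sketch makes the cascade's weakness visible: because the stages are chained without intermediate checking, the quality ceiling of the displayed subtitles is set by the weakest stage.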
Different stakeholders, including access service providers, broadcasters, RSI/multilingual
meeting platforms, language service providers, and technology providers, are converging
towards exploring how AI-enabled live captioning/subtitling can add value to their content
and bridge language gaps at live events, such as conferences, multilingual meetings (includ-
ing fully remote and hybrid ones), corporate events, employee meetings/training, customer
meetings/training, local government meetings, industry conferences (including keynotes
and breakouts/panels), and religious services. Providers have started integrating these capa-
bilities with meeting platforms or are developing proprietary solutions with live-translated
subtitles features.
However, significant challenges remain. Firstly, these technologies are not available
or equally effective in all languages. The output quality of live subtitles produced by SR
and MT is thus often unreliable, especially in the presence of background noise, multiple
speakers, unfinished sentences, incoherent talk, accents, and named entities (see Davitti
et al., this volume). These issues can cause errors at the level of recognition, punctuation,
and/or segmentation to propagate through MT, resulting in increasingly inaccurate and
hard-to-follow subtitles. Despite advancements in end-to-end STT models, live content still
has a long way to go (see Fantinuoli, this volume).
The European Parliament’s (EP) live STT and MT tool is one of the most notable
attempts to implement this workflow. Designed to automatically transcribe and translate
multilingual parliamentary debates in real time across all 24 official EU languages, this tool
aims to enhance accessibility and inclusivity for ‘unconstrained speech’ and is currently
undergoing quality assessment. It remains to be seen whether the EP, given its access to
highly trained human interpreters, will consider alternative, hybrid workflows for other
use cases. This could involve upskilling its current workforce, who appear best placed for
such an adaptation.
To conclude this section, it is safe to state that each workflow presents advantages
and drawbacks. Output quality varies, and high-risk, high-stakes content requires human
intervention to ensure real-time subtitle accuracy. The key question is, at what stage

should this intervention occur, and what skills are required for effective on-the-go edit-
ing? Human-driven services are considered premium and increasingly involve a hybrid
approach. From an industry perspective, despite the hype around cutting human profes-
sionals from workflows, there is increasing awareness that different methods may be needed
in different circumstances. For example, in a recent Slator interview,6 Tony Abraham from
Ai-Media noted:

[Y]ou would put a respeaker on something that, A, is really important content, but,
B, where you do not have that situation where the AI can deliver those results. So, for
example, where you have multiple speakers, mixed quality audio, background noise,
singing, multiple languages . . . [which] tends to be the most important content for
our customers.

Similar to when MT came to the fore in translation, many providers are now offer-
ing a tiered approach to product accuracy, with premium captioning services providing
the highest accuracy and more automated solutions being offered with disclaimers about
their limitations. Consistent with academic discourse, and cutting through the marketing
hype, there is a growing consensus that a ‘one-size-fits-all’ solution cannot exist. However,
there is also an increasing need to validate these workflows empirically and identify the
conditions under which they perform best, understand their affordances and constraints in
different contexts, and explore how humans operate within these environments, as will be
discussed in the next section.

11.5 Research on hybrid workflows


Research on real-time STT hybrid workflows is expanding. Broadly speaking, it can be
divided into academic and industry-driven streams. This section focuses on academic
research, which began with a focus on human-centric practices and is now expanding
towards comparative analyses of different workflows. In contrast, industry-driven research
appears more centred on (semi-)automated forms of intra- and interlingual STT conversion
for resource efficiency gains.
One key area of enquiry is how different workflows perform across various scenarios
and languages, with the aim of establishing quality benchmarks for real-time interlingual
STT. This will be reviewed in Section 11.5.1. However, one drawback of this body of
research is the near-total lack of comparability. This makes it difficult to derive any con-
clusive results. Important study design parameters are therefore reviewed in Section 11.5.2
to assess the current state of research in this area. As with new forms of HAII, these work-
flows require adaptation to new tasks, the acquisition of new skills, and the adjustment of
existing ones to ensure optimal combination of machine efficiency and human effectiveness.
This will be reviewed in Section 11.5.3. Lastly, given the novelty and hybridity of these
workflows, there is a need for training and upskilling, since awareness of these practices
is still relatively low. Some key initiatives in this area will be reviewed in Section 11.5.4.
In relation to the last two areas, namely, skills and competences, and training and upskill-
ing, most available research has focused on human-centric practices, such as interlingual
respeaking. This research can serve as a solid baseline for exploring similar questions in
relation to other workflows.


11.5.1 Efficiency
Although still relatively small, the body of research on different workflows is growing and
has yielded interesting findings. These studies have carried out either small-scale compara-
tive analyses of different methods or in-depth analyses of single workflows. They share a
common focus on assessing performance under various conditions. The term ‘quality’ is
often used; however, since ‘quality’ is a complex, multifaceted construct, these studies do
not capture it fully (see Davitti et al., this volume). ‘Accuracy’, another frequently employed
term, covers one important aspect of quality, but not its entirety. ‘Efficiency’ has emerged as a more
encompassing measure. It incorporates accuracy along with other critical factors, including
speed, latency – important for user experience – and cost, which are crucial for driving the
market demand for this service. This section will not delve into detailed reporting, which
can be accessed through the provided references. However, it will showcase the research
approaches in this domain, highlighting their strengths and weaknesses, with a focus on
interlingual real-time STT workflows.
Starting with comparative studies, Romero-Fresco and Alonso-Bacigalupe (2022) exam-
ined the five workflows discussed in this chapter, namely:

1. Interlingual respeaking
2. Simultaneous interpreting + intralingual respeaking
3. Simultaneous interpreting + ASR
4. Intralingual respeaking + MT
5. ASR + MT

The study focused on one language combination (EN > SP) and involved two participants
per workflow where needed, except for (2), requiring two pairs, and (5), requiring no
human intervention. Participants in these workflows were described as ‘professionals with
experience ranging from five to twenty years in the private market’. However, no detailed
information is provided in relation to what these years of experience represent in terms of
actual assignments performed, or specific details on their prior training. Yet both could be
key variables affecting performance.
monologic speeches, 11 to 15 min long, which, as stated by the authors, only differed in
length and speed of delivery. However, it is unclear how other variables, including topic,
technicality, lexical density, and syntactic complexity, were handled. Additionally, since the
study was conducted online, greater explanation of how specific variables were controlled
for would have been beneficial.
Efficiency was analysed in terms of accuracy, delay, and cost. Accuracy was calculated
via the NTR model (Romero-Fresco and Pöchhacker, 2017): three workflows (1, 2, 4)
exceeded the acceptability threshold of 98%, a benchmark validated only for intralingual
respeaking (using the NER model – Romero-Fresco and Martínez, 2015). This threshold
still requires validation in various real-life settings to determine its suitability as an accu-
racy benchmark in interlingual contexts. This distinction is important, as literature often
extends this threshold to interlingual modalities, without questioning the extent to which it
actually applies when language transfer is involved.
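As a rough illustration, the NER/NTR family of models computes an accuracy rate by deducting severity-weighted error scores from the target word count. The sketch below assumes the weighting convention reported for the NER model (0.25 for minor, 0.5 for standard, 1 for serious errors); it is a simplified illustration of the arithmetic, not an implementation of either published model:

```python
def accuracy_rate(word_count, errors):
    """NER/NTR-style accuracy: (N - weighted errors) / N * 100,
    where each error is weighted by its severity."""
    weights = {"minor": 0.25, "standard": 0.5, "serious": 1.0}
    penalty = sum(weights[severity] for severity in errors)
    return (word_count - penalty) / word_count * 100

# e.g. 500 target words with 3 minor, 2 standard and 1 serious error:
rate = accuracy_rate(500, ["minor"] * 3 + ["standard"] * 2 + ["serious"])
print(f"{rate:.2f}%")  # prints "99.45%", above the 98% threshold
```

As noted above, the 98% acceptability threshold was validated for intralingual respeaking via the NER model, so a score produced this way for interlingual output should be read with that caveat in mind.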
As the authors state, ‘the analysis of delay and cost yielded a much more nuanced sce-
nario that may limit the potential usefulness and acceptability of some of the workflows’
(Alonso-Bacigalupe and Romero-Fresco, 2024, 535). The fully automated workflow

(5) ranked first in terms of delay, despite being the worst in terms of accuracy, while
workflow (2) ranked last in this variable, despite being one of the best in terms of accu-
racy. The authors also highlight an inversely proportional relationship between workflow
automation and cost, where the latter is calculated ‘speculatively’ in terms of resources
needed.
Despite its relevant contribution, this study has several design limitations.
The authors themselves call for ‘further studies [that] could test larger samples, including
different genres (for instance spontaneous, unscripted interactions involving several speak-
ers, such as TV talk shows, political or social debates, online meetings etc.) and other
language combinations’ (Romero-Fresco and Alonso-Bacigalupe, 2022, 13), which can
yield different results. In addition, more details regarding participants’ profiles, training
undertaken, and technology used (ASR, MT) would be needed for methodological rigour.
While these preliminary findings suggest certain potential patterns, it would be premature
to consider them conclusive, despite how they are reported at times.
One attempt to systematise research findings in relation to this small body of compara-
tive research can be found in Alonso-Bacigalupe and Romero-Fresco (2024). The authors
directly compare studies, including Eugeni (2020), Eichmeyer-Hell (2021), Dawson (2020),
and Pagano (2022). Despite acknowledging that ‘this is not a systematised battery of exper-
iments designed and carried out in parallel . . . but a compilation of the results obtained by
a number of researchers’ (Alonso-Bacigalupe and Romero-Fresco, 2024, 536), they carry
out direct comparisons and discuss ‘emerging trends’ and ‘consistency’ in some key find-
ings. For example, simultaneous interpreting + intralingual respeaking (2) is identified as
the most efficient workflow, and fully automated ASR + MT (5) as the least efficient. Inter-
lingual respeaking (1) is reported as performing well in some studies but not in others,
with comparisons drawn to ‘interlingual velotyping’ (used in Eugeni, 2020), despite the
latter being a completely different technique.
as yielding good results, while simultaneous interpreting combined with ASR (3) ‘performs
poorly’. Also, they extend their finding that ‘the more automated the workflow, the lower
the delay, the cost and accuracy; the more human the method, the higher the delay, cost
and accuracy’ across other studies, and conclude that semi-automated workflows repre-
sent a ‘happy medium in terms of overall quality and efficiency’ (Alonso-Bacigalupe and
Romero-Fresco, 2024, 538).
Despite the intriguing, pioneering nature of the findings, a straight comparative analy-
sis must be handled with caution as it can lead to misleading conclusions if taken out
of context by the industry. For example, the comparison included studies that focused
on very different workflow techniques. Eugeni (2020) combined keyboard-based and
SR-based methods (including stenotyping and velotyping) in several language pairs (Ital-
ian into German, German into English, English into French), and Eichmeyer-Hell (2021)
compared stenotyping and respeaking, only intralingually (in German). While both
studies provide very insightful observations due to their naturalistic settings, that is,
real-life conferences, they do not offer the controlled conditions typical of experimental
research, which makes them unsuitable for the kind of direct comparison performed by
Alonso-Bacigalupe and Romero-Fresco (2024). Dawson (2021) replicated Romero-Fresco
and Alonso-Bacigalupe’s experiment, using the same workflows and language pair but
opposite directionality. However, findings only seem to be reported in the latter study,
making it difficult to scrutinise the details and the comparability of the approach adopted.


Pagano (2022) carried out experiments comparing three workflows (1, 2, 3) from EN >
ITA and all five workflows from SP > ITA using a sample of 20 participants in total, all
postgraduate students. The potential implications of recruiting such diverse samples will
be discussed in Section 11.5.2.
Accuracy as a key element of efficiency is also evaluated by other studies, which focus
on a more in-depth analysis of specific workflows. SMART,7 for instance, focused on an
extensive experimental analysis of interlingual respeaking, performed by 51 language pro-
fessionals across six language pairs (IT, FR, SP into and out of EN; 17 participants per lan-
guage pair) from different, relevant professional backgrounds (including interpreting and/
or pre-recorded/live subtitling, translation). MATRIC8 carried out a small-scale study of
intralingual respeaking + MT compared against simultaneous interpreting. This study also
relied on the participation of professional respeakers working full-time for media outlets
and EU-accredited interpreters working for the European Parliament and four language
pairs (from EN into IT, SP, FR, PO).
In line with Romero-Fresco and Alonso-Bacigalupe (2022) and Dawson (2020), SMART
and MATRIC adopted the NTR model to evaluate accuracy based on a quantitative and
qualitative assessment of positive and negative shifts between source input and target out-
put. These were categorised as ‘effective editions’ and ‘errors’, respectively. Despite the var-
ying levels of detail provided across studies regarding the implementation of such models
(e.g. number of assessors, procedures for first and second marking, and whether the whole
performance or a sample was evaluated), there is an attempt at establishing coherence
across the studies, which bodes well for comparison. However, the issue of benchmark-
ing for interlingual practices remains, and the only attempt at addressing it can be found
in SMART. Here, NTR results, which point at informativeness, were triangulated with
results from the application of an intelligibility scale (based on Tiselius, 2009) to determine
whether and, if so, which performances below 98% could be included in the high
performers’ sample.
As is the case for a growing number of hybrid practices being evaluated via NTR-like
approaches (e.g. Rodríguez González et al., 2023; Radić, 2023 – see Davitti et al., this vol-
ume), MATRIC adjusted the NTR model to suit the workflows analysed. Here, recognition
errors only apply to the intralingual respeaking process in the semi-automated workflow,
that is, an interim stage in the process. However, they do not apply to the simultaneous
interpreting (benchmark) workflow, where this category of errors was not considered for
the final output.
Among non-experimental studies, other types of accuracy assessment models were used,
namely, the IRA model (idea–unit rendition assessment; Eugeni, 2017) in Eugeni (2020),
and the WIRA model (weighted idea rendition assessment; Eichmeyer-Hell, forthcoming) in
Eichmeyer-Hell (2021). Another naturalistic small-scale study, Sandrelli (2020), focused on
interlingual respeaking (compared against simultaneous interpreting) from EN > ITA pro-
vided at a real-life symposium during the same conference. Sandrelli compared the semantic
content conveyed by the two modalities using a more qualitative, purpose-made taxonomy,
based on three macro-categories in terms of information made available to the audience
(transmission, reduction, distortion) and related subcategories.
This quick overview of the main methods used to evaluate accuracy in existing studies on real-time interlingual STT practices shows rather clearly that the variety of approaches used makes comparison across different workflows rather arbitrary,

The Routledge Handbook of Interpreting, Technology and AI

particularly if the experiment parameters used in each study are not fully specified. Due
to the lack of a standardised approach, similar considerations arise in relation to meas-
urements of delay and cost as key elements of efficiency. The next section will address
other important methodological parameters to facilitate meaningful comparisons and offer
insights for future research.

11.5.2 Study design parameters


This section considers key variables often underreported in existing comparative studies
but vital to shed light on the validity of findings, such as language pairs, directionality, and
assessment methods (partly discussed earlier in Section 11.5.1); participants (profile, train-
ing, and number); type of training undergone in the specific practice; and materials used
(type and duration).
First, any findings obtained will inevitably be influenced by who the participants are, their professional background, and how they have been trained to take part in the studies.
This is particularly evident when accuracy is such a key factor. However, little information
is generally shared, and eligibility criteria used in recruitment are not always fully clari-
fied. Leaving aside naturalistic studies, which do not have an experimental/comparative
purpose, controlling participant recruitment in experimental studies remains vital for ensur-
ing a representative sample, reducing bias, and enhancing the reliability of findings.
One common denominator is the need to recruit professionals to study these complex
workflows. However, the lack of a single definition for what constitutes a ‘professional’
has led to diverse recruitment methods and sample characteristics. Dawson grouped her
participants into ‘interpreters’, ‘subtitlers’ and ‘other’, with combined experience in both
fields (interpreters with subtitling experience and subtitlers with interpreting experience).
Although her cohort of 44 had substantial training and academic qualifications, they cannot be classified as professionals, as most were students in advanced stages of education, either completing or pursuing master's degrees related to their field. Romero-Fresco and
Alonso-Bacigalupe (2022) included a very small cohort of professionals in their comparative study but provided minimal profiling details. Their participants comprised two professional interlingual respeakers, one with five years' prior experience as an intralingual
respeaker, the other with over 15 years as a simultaneous interpreter; two simultaneous
interpreters with over 20 years of experience in the professional market; and four intralin-
gual respeakers (different languages), two recently graduated, and two with over five years’
experience in the professional market. From this information, it is evident that participants
start from very different levels. This makes it challenging to control the impact of prior
experience on their performance. The authors do not offer a definition of what constitutes a
professional. While difficult, establishing such a baseline would help identify the minimum
level of expertise required. Furthermore, ‘years of experience’ as a parameter alone reveals
very little, as individuals may not have practiced regularly during certain periods.
Despite its focus on one specific workflow, namely, interlingual respeaking, the SMART
project attempted to tackle this question while collecting data from a substantial cohort of
participants (N = 51). The project builds on a set of assumptions: first, this hybrid practice
can attract language professionals from various backgrounds, such as written translation,
interpreting (with a distinction between consecutive/dialogue, simultaneous/whispered, and
sign language9 modes), and pre-recorded or live subtitling. Second, language professionals,
particularly freelancers, who may be interested in diversifying their portfolio of skills are

Technology for hybrid modalities

likely to have a mixed background, combining more than one clear-cut profile. Therefore,
in the recruitment stage, the project designed an eligibility questionnaire to profile par-
ticipants. A definition of ‘professional’ for the purpose of the study was provided, that
is, ‘someone with a minimum of 2,000 hr of professional (paid or pro bono) experience
in at least one language profession’. This baseline requirement – equivalent to working
an average of 4 hr every weekday for two years – was used to streamline the recruitment
process and establish common grounds across participants. Practice hours were catego-
rised into brackets (e.g. 2,000–3,900 hr, 4,000–9,999 hr, and more than 10,000 hr) since
different professions track work in various ways (e.g. by days, number of words, or sub-
titles translated). The findings revealed that the majority of participants had a composite
background, with most (26 out of 51) combining three different professions. Furthermore,
all participants indicated written translation as part of their skill set, forming a common
denominator, despite their varying levels of expertise. This reflects the reality of the lan-
guage industry, where professionals often offer multiple services as part of their portfolio.
It also suggests that a more granular approach – focusing on specific skills rather than entire
professions – might prove more useful for identifying what can be transferred, what can be
adjusted, or what needs to be acquired from scratch when performing hybrid practices in
general (see Davitti and Wallinheimo, 2025).
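The eligibility rule described above lends itself to a simple encoding. The sketch below is illustrative only: the bracket labels follow the chapter's wording, while the handling of boundary values between brackets is our assumption.

```python
# Toy encoding of SMART's eligibility baseline: at least 2,000 hr of
# professional (paid or pro bono) experience in one language profession,
# with practice hours then binned into the reported brackets. Boundary
# handling between brackets is an assumption for this sketch.
def eligibility_bracket(hours):
    if hours < 2000:
        return None  # below the baseline requirement
    if hours < 4000:
        return "2,000-3,900 hr"
    if hours < 10000:
        return "4,000-9,999 hr"
    return "more than 10,000 hr"

print(eligibility_bracket(1500))   # None
print(eligibility_bracket(2500))   # 2,000-3,900 hr
print(eligibility_bracket(12000))  # more than 10,000 hr
```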
Given the scarcity of professionals trained in the array of hybrid practices analysed,
planning a study requires careful consideration of the type of training participants receive
before testing each specific workflow. This includes the structure of the training and its
duration – both crucial when comparing and commenting on performance and accuracy.
Once again, to date, most available information comes from studies that focus on inter-
lingual respeaking, where the pre-testing training provided varies significantly in format
and duration. For example, the SMART project employed a 25 hr ‘training for testing’
prototype upskilling course over five weeks. Dawson (2020) used a ‘train, practice, and
test’ approach with a four-week course, consisting of three weekly sessions and 2 hr weekly
exercises. Pagano (2022) implemented a 70 hr course over three months, including 50 hr
of synchronous lessons and additional time for practical exercises and individual training.
In contrast, studies comparing hybrid practices do not report specific pre-testing training; instead, participants were asked to perform techniques in which they had previously been trained (see Romero-Fresco and Alonso-Bacigalupe, 2022; Korybski et al., 2022).
The type and duration of materials used as a basis for performance analysis also vary.
Dawson (2020) used 5 min videos, at speeds between 107 wpm and 159 wpm, on topics
including gardening and feminism. Pagano (2022) used 11 min clips, at 125 wpm, on cli-
mate change and speeches of varying lengths (1 to 4.2 min) and speeds (98, 139, and 126
wpm) on diverse topics, ranging from Pope Francis on climate change to the European
Economic and Social Committee. Romero-Fresco and Alonso-Bacigalupe (2022) employed
the same climate change clip used by Dawson and a 15 min TEDx talk at 165 wpm. These
materials all differ in genre, technicality, lexical density, and syntactical complexity. How-
ever, detailed information on the actual characteristics of each speech remains vague, only
specifying that a speech may include ‘some or no specialised terminology’ or be qualified as
‘not particularly dense’ or having a ‘low level of technicality’. This represents a weakness of
current research, which labels itself as experimental, although ‘using unedited speeches and
conducting the analysis at the text level would have likely introduced an excessive amount
of potentially confounding variables’ (Prandi, 2023, 135).


To address these issues, the SMART project designed its materials differently, focus-
ing on specific scenarios rather than genres. The experiments included two monologic
speeches – one at a controlled fast pace of around 140 wpm (identified as a stress factor
in real-time modes), and another alternating between pre-prepared and impromptu speech, with the former normally associated with higher processing difficulty. The third test was a dialogic speech to test the multiple speakers' condition,
which is challenging in live STT due to voice alternation, quick exchanges, and partial
overlaps. The topic was controlled across all speeches to ensure no participant had an
advantage in terms of prior knowledge. Participants received advance briefs and terminology for personal and software preparation. Focusing on conditions rather than genres
enabled control over specific variables of interest. This also enabled the average accuracy
across all conditions to be calculated based on the assumption that, in real-life speeches, all
variables can be present at once. Overall, the speeches were longer than those used in other
studies (approximately 15 min each), as one of the study’s goals was to explore the extent
to which participants could sustain such a complex practice. Comparability was ensured by
rendering and adapting the same English script for flow, idiomaticity, and terminological consistency across the six languages and directions tested. Delivery during testing was randomised
to prevent practice effects. This approach aimed to promote the creation of comparable
speeches, ensuring the same structure across languages, which is crucial for comparative
analysis.
In light of this, while interesting accounts and findings on different workflows exist,
cross-study comparisons are challenging due to the numerous variables characterising each
study. Also, accuracy scores compared without specifying the conditions in which they were
obtained are not particularly informative. For instance, considered in isolation, SMART’s
95.37% average interlingual respeaking accuracy score across all participants, language
pairs, and testing scenarios might seem unimpressive. However, this figure alone does not
convey the full picture of the workflow’s capabilities. When contextualised, this result reflects
the performance of 51 participants, each undertaking three interlingual respeaking tests
under different conditions (totalling 153 performances) in 15 min long speeches, after only
25 hr training. This context provides a more nuanced understanding of the findings, which
appear very impressive under this light and have been validated by industry stakeholders
involved in the project. Furthermore, the two-pronged analysis focusing on informativeness
and intelligibility (as explained in Section 11.5.1) revealed that 27 ‘high performers’ met
both criteria, achieving an average accuracy score of 97% and above (up to 98.87%) across
conditions and languages after a short training. Once again, this outcome is very promising and much more in line with findings from other studies. Consequently, there is a clear need for more rigorous analyses for reliable comparisons and a more nuanced approach to reporting research findings, which is a responsibility of the academic community.
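The two-pronged screening recapped here can be sketched roughly as below. The data, the intelligibility cutoff, and the 97% floor for borderline cases are invented for illustration; SMART's actual intelligibility scale follows Tiselius (2009), and only the 98% NTR benchmark is taken from the chapter.

```python
# Hypothetical screening combining informativeness (NTR score) with an
# intelligibility rating: performances at or above the 98% NTR benchmark
# qualify directly, while those somewhat below may still qualify if rated
# sufficiently intelligible. Cutoff values and data are illustrative.
def is_high_performer(ntr, intelligibility, ntr_benchmark=98.0, intell_cutoff=4.0):
    return ntr >= ntr_benchmark or (ntr >= 97.0 and intelligibility >= intell_cutoff)

performances = [
    ("P01", 98.4, 3.8),  # qualifies on NTR alone
    ("P02", 97.3, 4.5),  # below 98% but highly intelligible
    ("P03", 96.1, 4.9),  # too far below the benchmark
]
selected = [p for p, ntr, intell in performances if is_high_performer(ntr, intell)]
print(selected)  # ['P01', 'P02']
```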

11.5.3 Skills and competences


The skills and competences required to perform optimally in each workflow, the transfer-
ability of skills from acquired professional backgrounds, and the adjustments necessary for
these live STT hybrid practices are critical areas of academic enquiry with a very applied
aim: supporting language professionals in their acquisition of expertise. In this regard, the
few available studies that have specifically addressed these questions are primarily related
to the human-centric hybrid workflow of interlingual respeaking.


Pöchhacker and Remael (2019) developed a theoretical model mapped against a tripar-
tite structure of the interlingual respeaking process: pre-, peri-, and post-process stages.
This model is based on a comprehensive understanding of the practice mechanics but lacks
empirical grounding. They identified integrated technical-methodological competences (i.e.
procedural, related to the interlingual respeaking task), linguistic and cultural competences,
world and subject matter knowledge, and interpersonal and professional skills. In her PhD
thesis, Dawson (2020) set out to explore empirically task-specific skills needed in this prac-
tice. These were identified as multitasking, live translation, dictation, command of source
and target languages, and comprehension. Based on the SMART project, Davitti and Wall-
inheimo (2025) framed interlingual respeaking as a form of HAII and broadened the scope
to empirically explore not only procedural but also cognitive and interpersonal skills that
underlie the process and the challenges that arise during performance. Despite the explora-
tory nature of the project, which does not aim to list a comprehensive set of skills, SMART
provides a multifactorial competence framework that can be further expanded in future
studies.
Another question, addressed by several studies, relates to defining the most suitable pro-
file for the job. This has often been approached using traditional, clear-cut labels, mostly
subtitling or interpreting. Szarkowska et al. (2018) conducted the first empirical work to
address this question by asking ‘Are interpreters better respeakers?’. They experimented
with 57 participants, grouped into 22 interpreters, 23 translators, and 12 controls, with dif-
ferent levels of experience in their fields and tested different parameters (speech rate and the
number of speakers) in intra- and interlingual respeaking. While they found that interpret-
ers consistently achieved the highest scores, the differences in interlingual respeaking were
not as pronounced as expected. This indicates a certain transfer of skills from interpreting
to new hybrid practices, but with other domains to explore. Dawson’s (2020) study showed
that, generally, the interpreters in her sample obtained better results than subtitlers in both
intra- and interlingual respeaking. Despite this, Dawson’s quantitative results suggest that
there may not be a particular professional profile best suited to interlingual respeaking.
In other words, being an interpreter does not appear to guarantee successful interlingual
respeaking performance.
SMART took a different approach, starting from the premise that no profile is clear-cut,
as described in Section 11.5.2. Different professional backgrounds among those chosen for
profiling were first examined independently to identify whether one could predict output
accuracy. Based on a sample of N = 51 language professionals, the data revealed that a
background in live (monolingual) subtitling emerged as a predictor of ‘good’ performance
(β = 0.32, p = 0.02). This appears logical, as live intralingual subtitling via respeaking is
the closest practice to interlingual respeaking, sharing all procedural skills except language
transfer. This also suggests that adding language transfer (and the strategic behaviour asso-
ciated with it) is more efficient once all other skills involved in respeaking (e.g. interacting
with SR, adding punctuation, using SAD, etc.) are mastered. Of the 11 participants with
live intralingual respeaking in their background, 8 were high performers.
Given the composite background of most participants, SMART also took a different
approach to grouping their backgrounds. This approach considered the fundamentals of
interlingual respeaking, which involves a diamesic shift from spoken to written, transcend-
ing traditional boundaries of literacy or orality. One might argue that traditional subtitling
already does this, but the real-time nature and time constraints add a layer of complex-
ity. Conversely, the ‘real-time’ factor forms part of spoken interpreting, but traditionally,


the diamesic transfer does not (as is argued at the beginning of this chapter). Therefore,
participants were grouped into clusters of professional backgrounds according to whether
they are more accustomed to practices that lead to spoken or written output, or both. This
approach helped move beyond the usual groupings according to profession. Based on this,
the project formed three balanced groups: spoken-to-spoken (n = 17), spoken-to-written
(n = 16), and mixed (n = 16). The first included individuals with consecutive/dialogue
and/or simultaneous/whispered interpreting in their background. The second included indi-
viduals with pre-recorded and/or live subtitling as part of their skills cluster. The last group
included individuals combining both interpreting and subtitling skills in their background
(e.g. consecutive/dialogue and pre-recorded subtitling; simultaneous and pre-recorded sub-
titling; consecutive/dialogue, simultaneous/whispered, and pre-recorded subtitling; con-
secutive/dialogue, simultaneous/whispered, pre-recorded, and live subtitling). Two outliers
reported having only written translation as their professional background. Interestingly, no
statistical difference was found between the three groups (p > 0.05), suggesting that no spe-
cific set of skills necessarily provides an advantage in this hybrid practice. It is thus a matter
of identifying the relevant procedural skills one already has, what is directly transferable,
and what may need adaptation or acquisition. This is particularly relevant for approaches
to training and upskilling, the final area of research, reviewed in the next section.
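As a rough illustration, the grouping logic described above could be encoded like this. The cluster labels follow the chapter; the composition of each practice set is our simplification of its description.

```python
# Illustrative clustering of composite professional backgrounds by output
# mode: practices yielding spoken output (interpreting modes) versus
# written output (subtitling modes). Written translation alone falls
# outside the three clusters, as with the two outliers reported.
SPOKEN_OUTPUT = {"consecutive/dialogue", "simultaneous/whispered"}
WRITTEN_OUTPUT = {"pre-recorded subtitling", "live subtitling"}

def cluster(background):
    spoken = bool(background & SPOKEN_OUTPUT)
    written = bool(background & WRITTEN_OUTPUT)
    if spoken and written:
        return "mixed"
    if spoken:
        return "spoken-to-spoken"
    if written:
        return "spoken-to-written"
    return "outlier"

print(cluster({"simultaneous/whispered", "pre-recorded subtitling"}))  # mixed
print(cluster({"written translation"}))                                # outlier
```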
Before proceeding, it is important to note that other hybrid workflows have not been
scrutinised from the perspective of ‘required skills and competences’. Attention thus far has
focused mostly on establishing which workflow ‘performs best’, albeit not without limita-
tions, as discussed. Secondly, there seems to be a general assumption that, in workflows
combining different professional expertise (e.g. simultaneous interpreting + intralingual
respeaking), each actor would perform their tasks as normal, without much adaptation.
However, this would require further exploration and experimentation, since awareness of
working as part of these hybrid workflows, or in combination with machines (e.g. simulta-
neous interpreting + ASR), may need adjustment at procedural, cognitive, and interpersonal
levels.

11.5.4 Training and upskilling


Formal training is not available for all the hybrid practices explored in this chapter, as
many of these are not yet widely used. However, different studies focus on specific work-
flows, with interlingual respeaking once again attracting attention. Particularly in its intra-
lingual modality, respeaking has been taught at higher education institutions (HEIs) since
2007 via different models and approaches. Training ranges from HEI courses, mostly at
the postgraduate level within translation, interpreting, and subtitling programmes, to
company-organised in-house training, with varying set-ups and assessment procedures (e.g.
Robert et al., 2019; Dawson, 2020).
Beyond industry demand, many language professionals see training in these hybrid prac-
tices as an opportunity to expand their skill portfolios. This was confirmed by a survey
conducted in 2021 within the SMART project. When asked about their main motivation
for joining the course, language professionals expressed a desire to acquire new skills, rec-
ognising the need for adaptability in the evolving language industry and viewing SMART
as an opportunity for career development.
Focusing on interlingual respeaking, this section briefly reviews three key initiatives
for research-informed training in this practice. These were developed by Dawson (2020),


the ILSA,10 and the SMART projects (with the latter including the ongoing SMART-UP
follow-up).11 ILSA developed an online, self-paced course focusing on three real-life scenar-
ios: TV, live, and educational settings. ILSA’s course structure includes foundational compo-
nents, such as media accessibility, pre-recorded subtitling, and simultaneous interpreting, as
prerequisites to the core components of intra- and interlingual respeaking. Dawson focused
on developing a framework aiming to ‘offer trainers a structured approach to planning
training and to prepare trainers to organise and present course materials for interlingual
respeaking' (p. 223). This includes three blocks of activities: block 1 introduces trainees to media access, dictation, and software management, run in parallel; block 2 runs simultaneous interpreting and intralingual respeaking in parallel; and block 3 consists of one module on interlingual respeaking. The framework also
includes quality assessment points and discussions on professional practices and technological advancements relevant to respeaking. Suggestions for different delivery modes (online, on-site, or blended) are provided.
SMART complemented these efforts by developing an upskilling course aimed at lan-
guage professionals from diverse, yet relevant, backgrounds, each bringing unique skills to
this emerging field. The course was piloted as a prototype to empirically explore different
types of competences (procedural, cognitive, and interpersonal) required for this new form
of HAII. Delivered remotely (due to the pandemic) over five weeks in 2021, the bespoke 25
hr course aimed to place participants on a level playing field by teaching all core procedural
skills needed for intra- and interlingual respeaking. The course also collected data on the
key competences (not only procedural but also cognitive and interpersonal) underlying its
process and the level of accuracy in terms of products that could be achieved after complet-
ing the training. As previously mentioned, the course attracted 51 participants from over
250 applicants, demonstrating significant interest in this CPD opportunity. A face-to-face
version of the course was later implemented at a summer school in 2022 with 12 language
professionals, including staff from project partners Sky and Sub-Ti Ltd. Findings high-
lighted both shared and unique challenges in skills acquisition, depending on participants’
professional backgrounds, and established an initial baseline for skill attainment after 25
hr of training (see Davitti et al., under review). This offers a promising starting point for
individuals wishing to progress from functional to expert levels.
Building on these insights, the follow-up SMART-UP project is currently refining the
SMART prototype into a flexible CPD model, characterised by a modular structure that
ensures adaptability and customisability. Some elements are shared with the approaches reviewed earlier, such as the focus on the use of SR and on skills from
subtitling (e.g. familiarity with conventions, awareness of viewers’ needs, etc.) and inter-
preting (e.g. ability to listen and speak at the same time, strategic behaviour, multitasking, etc.). However, traditionally, interlingual respeaking training follows a sequence, with
simultaneous interpreting and pre-recorded subtitling taught in full and separately, before
teaching intralingual respeaking practice first, and interlingual respeaking practice after.
In contrast, SMART’s approach breaks down the respeaking process into single pro-
cedural skills that can be taught progressively and in an incremental manner. Similar to
Dawson, within each teaching block devoted to procedural skills, there is a tech strand.
Here, participants focus on learning skills to interact with SR software. Unlike in Dawson's framework, however,
respeaking is broken down into core components, which are practised first intralingually
and then interlingually throughout the training, in constant alternation, with intralingual
practice serving as a scaffold for interlingual skill development. For instance, instead of

dedicating a full module to simultaneous interpreting, the course focuses on developing simultaneous listening and speaking skills intralingually (verbatim and cognitive shadowing) first, then interlingually. This is followed by developing software-adapted delivery,
initially adding punctuation, and related strategies for managing speed, chunking, software
optimisation, and error correction separately, before bringing strategies together in an incre-
mental, gradual manner. This design, applied to the prototype and refined in SMART-UP,
is informed by the four-component instructional design model (4C/ID). This model is specific to the development and acquisition of complex cognitive skills, understood as those
skills ‘requiring an integration of knowledge, skills and attitudes and the coordination of
sometimes many different constituent skills’ (Van Merriënboer and Kirschner, 2018, 13).
This approach aims to support learning while preventing cognitive overload and allowing
participants to build competence in individual components before attempting the full task.
At this stage, the CPD model builds on the SMART competence model (see Davitti and
Wallinheimo, 2025) and includes six modules: technology-enabled hybrid modalities for
live speech-to-text; media, digital, and live content accessibility; professional and industry
knowledge; cognitive abilities and interpersonal traits; quality assessment; and respeak-
ing hands-on. The latter is further subdivided into five blocks where procedural skills are
taught and developed gradually. This modular structure allows for customisation based
on user needs. It facilitates efficient, time-effective upskilling for language professionals
from different backgrounds by helping them identify existing skill sets and areas requiring
adjustment and new learning, rather than having to learn all skills from scratch. Last but not least, the modular structure enables future adaptations to alternative live subtitling
workflows and the development of specific skills in relation to other hybrid practices.
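The skill-gap idea behind this modular structure can be sketched as follows. The skill names here are illustrative stand-ins, not SMART's actual competence taxonomy.

```python
# Hypothetical skill-gap check for modular upskilling: compare a
# professional's existing skills against a set assumed here to be
# required for interlingual respeaking, to see what transfers directly
# and what must be learned. Skill names are illustrative assumptions.
REQUIRED = {
    "simultaneous listening and speaking",
    "language transfer",
    "software-adapted delivery",
    "live punctuation",
    "error correction",
}

def skill_gap(existing):
    return {
        "transferable": sorted(REQUIRED & existing),
        "to_acquire": sorted(REQUIRED - existing),
    }

# e.g. a simultaneous interpreter new to respeaking
gap = skill_gap({"simultaneous listening and speaking", "language transfer"})
print(gap["to_acquire"])  # ['error correction', 'live punctuation', 'software-adapted delivery']
```

A customisable course could then assign only the modules covering the `to_acquire` set, which is the time-effective upskilling the chapter describes.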

11.6 Conclusion
In this chapter, live interlingual STT has been conceptualised as a unique form of interpret-
ing accessible to hearing individuals, speakers of other languages, and those with hearing
loss, making it a service that is essential for some and beneficial for all. This (re)conceptu-
alisation frames interpreting as a diamesic activity that transcends spoken, sign, and written
language boundaries. It is also where the fields of translation, interpreting, subtitling, and
media accessibility converge into an increasingly fluid, hybrid space. Traditional boundaries
between these disciplines are thus dissolving, leading to a growing recognition of their inter-
related nature and the need for a more integrated approach.
Given rapid technological advances, the same service can be delivered in different
ways. This chapter has offered, first, a descriptive overview of some salient workflows,
with reference to how they are currently used in the industry. Viewing these practices as
technology-enabled forms of interpreting is crucial for their broader adoption beyond audi-
ovisual translation and media accessibility. Understanding the dynamics of these complex
workflows is fundamental for appreciating the evolving landscape of language-related prac-
tices in the digital age. Raising awareness of their application and potential becomes essential
for effective participation in various contexts, as well as for effective information dissemina-
tion. The integration of human and AI efforts is central to this evolution, u ­ nderscoring the
need for new skills and regulatory frameworks to ensure high-quality, reliable services.
While technology is a critical component of these emerging practices, it is not the sole
factor determining their success or failure. Several aspects must be considered when explor-
ing such dynamics. These include procedural aspects (how to efficiently interface with AI


and optimise human–AI synergy), cognitive aspects (how to ensure appropriate effort levels
to prevent cognitive overload), interpersonal traits (what traits and attitudes can best sup-
port humans working in these environments), interactional dynamics (how to work with
others in new workflows), ergonomic considerations (how to integrate technology in a
manner conducive to well-being), and declarative knowledge (understanding how these
practices work and their best applications). Addressing these challenges requires rigorous
research methods and approaches to study design that enable replicability and comparabil-
ity. Research in this domain is still in its infancy, and it is hoped that the brief review of
existing literature, carried out in the second half of this chapter, has critically shown key
findings, but also areas for further improvement.
Despite varying greatly in their degree of human input and involvement, and despite the
hype around fully automated services, one common view across both academia and the
industry is that there is no one-size-fits-all solution. The ‘right’ workflow must match
the specific requirements and needs of each situation. These hybrid practices are thus not
intended to replace traditional ones, like simultaneous interpreting or subtitling; instead,
they are meant to complement existing services, offering opportunities for diversifying
service portfolios and expanding professional capabilities while adapting to the evolving
demands of the digital age. The rapid pace of technological advancement often outstrips
our understanding of its optimal applications and implications. However, despite fears of
replacement by technology, live STT across languages is far from a ‘solved problem’, due to
the many untested variables that impact quality and accessibility.
Now is the time to think critically about how operating in these HAII environments is
reshaping the language professionals’ identity at its core. While there cannot be an expec-
tation for them to suddenly become AI experts, their understanding of the processes and
goals of these practices positions them best to advise on appropriate workflows and optimal
points of human integration for specific situations. As highlighted in a recent industry-led
report (Slator, 2023), the level and type of human intervention are also evolving, creating roles such as live subtitle post-editors and consultants for developing, training, and evaluating STT technologies and workflows, among others.
The fear that automation may threaten established job profiles is tangible. However, turning
away from human-centric practices that could benefit from expertise already available in the
language industry is only likely to ‘encourage the industry to adopt fully automatic methods that
are still not ready to provide high quality translations’ (Alonso-Bacigalupe and Romero-Fresco,
2024, 541). In conclusion, technology-enabled hybrid modalities are not a threat but an oppor-
tunity for the language industry and require a shift in the way research is conducted. By embrac-
ing these new practices and continuing to develop expertise, industry and academia can ensure
high-quality, accessible real-time interlingual communication for all.

Notes
1 www.un.org/sustainabledevelopment/globalpartnerships/ (accessed 02.04.2025).
2 eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:12012P/TXT (accessed 02.04.2025).
3 www.ohchr.org/en/instruments-mechanisms/instruments/convention-rights-persons-disabilities
(accessed 02.04.2025).
4 World Health Organization, 2001. International Classification of Functioning, Disability, and
Health (ICF). Geneva: WHO.
5 See SMART (2023) video at www.youtube.com/watch?v=rxIRKLR2_7o (accessed 02.04.2025).

The Routledge Handbook of Interpreting, Technology and AI

6 ‘The future of live multilingual captioning’, Slator interview with Ai-Media CEO Tony Abrahams, 28.4.2023. URL https://slator.com/future-live-multilingual-captioning-ai-media-ceo-tony-abrahams/ (accessed 02.04.2025).
7 SMART project – Shaping Multilingual Access through Respeaking Technology, ES/T002530/1,
Economic and Social Research Council UK, 2020–2023. URL https://smartproject.surrey.ac.uk/
(accessed 02.04.2025).
8 MATRIC project – Machine Translation and Respeaking in Interlingual Communication, Expand-
ing Excellence in England, Research England, 2020–2024.
9 Unfortunately, no participants who signed up to take part in the study reported having a background in sign language interpreting, although including such participants in future studies would be encouraged, given the core diamesic transfer embedded in this practice.
10 ILSA project – Interlingual Live Subtitling for Access, Erasmus+ 2017-1-ES01-KA203-037948,
2017–2020.
11 SMART-UP project – Shaping Multilingual Access through Respeaking Technology – Upskilling,
Economic and Social Research Council Impact Acceleration Account, 2023–2025. URL https://
www.surrey.ac.uk/research-projects/smart-shaping-multilingual-access-through-respeaking-
technology-upskilling (accessed 02.04.2025).

References
Alonso-Bacigalupe, L., Romero-Fresco, P., 2024. Interlingual Live Subtitling: The Crossroads
Between Translation, Interpreting and Accessibility. Universal Access in the Information Society
23, 533–543. URL https://doi.org/10.1007/s10209-023-01032-8
Bhabha, H.K., 1994. The Location of Culture. Routledge, London.
Braun, S., 2019. Technology and Interpreting. In O’Hagan, M., ed. Routledge Handbook of Transla-
tion and Technology. Routledge, London, 271–288.
Davitti, E., Sandrelli, A., Zou, Y., Korybski, T., Wallinheimo, A.-S., under review. Designing a
Research-Informed Upskilling Course in Interlingual Respeaking for Language Professionals. The
Interpreter and Translator Trainer.
Davitti, E., 2018. Methodological Explorations of Interpreter-Mediated Interaction: Novel Insights
from Multimodal Analysis. Qualitative Research 19(1), 7–29. Sage Publications. https://doi.
org/10.1177/1468794118761492
Davitti, E., Sandrelli, A., 2020. Embracing the Complexity: A Pilot Study on Interlingual Respeaking.
Journal of Audiovisual Translation 3(2), 103–139. https://doi.org/10.47476/jat.v3i2.2020.135
Davitti, E., Wallinheimo, A.-S., 2025. Investigating Cognitive and Interpersonal Factors in Hybrid Human-AI Practices: An Empirical Exploration of Interlingual Respeaking. Target 37, Special Issue: Mapping Synergies within Cognitive Research on Multilectal Mediated Communication.
Dawson, H., 2020. Interlingual Live Subtitling: A Research-Informed Training Model for Interlingual
Respeakers to Improve Access for a Wide Audience (PhD thesis). University of Roehampton.
Dawson, H., 2021. Exploring the Quality of Different Live Subtitling Methods: A Spanish to English
Follow-Up Case Study. Paper presented at the 7th IATIS Conference, 17.9.2021, Pompeu Fabra
University, Barcelona, Spain.
Eichmeyer-Hell, D., 2018. Speech-to-Text Interpreting: Barrier-free Access to Universities for the
Hearing Impaired. Barrier-Free Communication: Methods and Products: Proceedings of the 1st
Swiss Conference on Barrier-Free Communication, ZHAW Digitalcollection, 6–15. URL https://
doi.org/10.21256/zhaw-3000
Eichmeyer-Hell, D., 2021. Speech Recognition (Respeaking) vs. the Conventional Method (Keyboard):
A Quality-Oriented Comparison of Speech-to-Text Interpreting Techniques and Addressee Prefer-
ences. In Jekat, S.J., Puhl, S., Carrer, L., Lintner, A., eds. Proceedings of the 3rd Swiss Conference on
Barrier-Free Communication (BfC 2020). Winterthur (online), 29.6.2020–4.7.2020. ZHAW Zurich
University of Applied Sciences, Winterthur. URL https://doi.org/10.21256/zhaw-3001
Eichmeyer-Hell, D., forthcoming. Schriftdolmetschen – Realisierungsformen im qualitätsorientierten Vergleich (PhD thesis).
Eszenyi, R., Bednárová-Gibová, K., Robin, E., 2023. Artificial Intelligence, Machine Translation &
Cyborg Translators: A Clash of Utopian and Dystopian Visions. Ezikov Svyat 21, 102–113. URL
https://doi.org/10.37708/ezs.swu.bg.v21i2.13

Technology for hybrid modalities

Eugeni, C., 2017. La sottotitolazione intralinguistica automatica – Valutare la qualità con IRA. CoMe
2(1), 102–113.
Eugeni, C., 2020. Human-Computer Interaction in Diamesic Translation: Multilingual Live Subtitling. In
Dejica, D., Eugeni, C., Dejica-Cartise, A., eds. Translation Studies and Information Technology – New
Pathways for Researchers, Teachers and Professionals. Editura Politehnica, Timişoara, 19–31.
Eugeni, C., Caro, R.B., 2019. The LTA Project: Bridging the Gap Between Training and the Profession
in Real-Time Intralingual Subtitling. Linguistica Antverpiensia, New Series – Themes in Transla-
tion Studies 18. URL https://doi.org/10.52034/lanstts.v18i0.512
Eugeni, C., Caro, R.B., 2021. Written Interpretation: When Simultaneous Interpreting Meets
Real-Time Subtitling. In Seeber, K., ed. 100 Years of Conference Interpreting: A Legacy. Cam-
bridge Scholars Publishing, London, 93–109.
Eugeni, C., Gambier, Y., 2023. La traduction intralinguistique: les défis de la diamésie. Editura
Politehnica, Timişoara.
TheExpressWire, 2023. Live Captioning Market 2023: Growth, Trend, Share, and Forecast Till 2030.
URL www.digitaljournal.com/pr/news/theexpresswire/live-captioning-market-2023-growth-trend-
share-and-forecast-till-2030-126-pages-report (accessed 12.08.2024).
Fabri, L., Häckel, B., Oberländer, A.M., Rieg, M., Stohr, A., 2023. Disentangling Human-AI Hybrids. Busi-
ness & Information Systems Engineering 65, 623–641. URL https://doi.org/10.1007/s12599-023-00810-1
Fantinuoli, C., 2018. Interpreting and Technology: The Upcoming Technological Turn. In Fantinuoli,
C., ed. Interpreting and Technology. Language Science Press, Berlin, 1–12.
Gottlieb, H., 2005. Multidimensional Translation: Semantics Turned Semiotics. In MuTra: Challenges
of Multidimensional Translation. URL www.euroconferences.info/proceedings/2005_Proceedings/
2005_proceedings.html
Greco, G.M., 2018. The Nature of Accessibility Studies. Journal of Audiovisual Translation 1(1),
205–232. URL https://doi.org/10.47476/jat.v1i1.51
Jiménez-Crespo, M., 2020. The “Technological Turn” in Translation Studies. Are We There Yet?
A Transversal Cross-Disciplinary Approach. Translation Spaces 9(2), 314–341. URL https://doi.
org/10.1075/ts.19012.jim
Joscelyne, A., 2019. World-Readiness: Towards a NewTranslate Benchmark. TAUS, The Language Data
Network. URL www.taus.net/resources/blog/world-readiness-towards-a-new-translate-benchmark
(accessed 12.09.2024).
Kade, O., 1968. Zufall und Gesetzmäßigkeit in der Übersetzung. Verlag Enzyklopädie, Leipzig.
Korybski, T., Davitti, E., 2024. Human Agency in Live Subtitling Through Respeaking: Towards a
Taxonomy of Effective Editing. Journal of Audiovisual Translation. Special issue: Human Agency
in the Age of Technology, 1–22. URL https://doi.org/10.47476/jat.v7i2.2024.302
Korybski, T., Davitti, E., Orăsan, C., Braun, S., 2022. A Semi-Automated Live Interlingual Commu-
nication Workflow Featuring Intralingual Respeaking: Evaluation and Benchmarking. In Proceed-
ings of the 13th Conference on Language Resources and Evaluation (LREC 2022), Marseille,
France, 4405–4413. ELRA.
Kubánková, E., 2021. Captions Increase Viewership, Accessibility and Reach. URL www.newtontech.
net/en/blog/23083-captions-increase-viewership-accessibility-and-reach (accessed 15.07.2024).
Marais, K., 2019. A (Bio)Semiotic Theory of Translation: The Emergence of Social-Cultural Reality.
Routledge, Abingdon. URL https://doi.org/10.4324/9781315142319
Nimdzi, 2022a. Languages & the Media: The Latest Trends in Media Localization. URL www.nim
dzi.com/media-localization-trends-languages-the-media/ (accessed 15.07.2024).
Nimdzi, 2022b. Consolidation and Growth in Media Localization. URL https://www.nimdzi.com/
transperfect-buys-hiventy/ (accessed 15.07.2024).
Nimdzi, 2023. The Nimdzi Business Confidence Study: Q1 and Q2 2023 Edition. URL www.nimdzi.
com/the-nimdzi-business-confidence-study-q1-and-q2-2023/ (accessed 15.07.2024).
Pagano, A., 2022. Testing Quality in Interlingual Respeaking and Other Methods of Interlingual Live
Subtitling (PhD thesis). Università di Genova.
Pöchhacker, F., 2007. Simultaneous Consecutive Interpreting: A New Technique Put to the Test. Meta
Journal des Traducteurs 52(2), 276–289. URL https://doi.org/10.7202/016070ar
Pöchhacker, F., 2022. Interpreters and Interpreting: Shifting the Balance? The Translator 28(2),
148–161. URL https://doi.org/10.1080/13556509.2022.2133393


Pöchhacker, F., 2023. Re-Interpreting Interpreting. Translation Studies 16(2), 277–296. URL https://
doi.org/10.1080/14781700.2023.2207567
Pöchhacker, F., Remael, A., 2019. New Efforts? A Competence-Oriented Task Analysis of Interlingual
Live Subtitling. Linguistica Antverpiensia, New Series – Themes in Translation Studies. Antwerp,
Belgium, 18. URL https://doi.org/10.52034/lanstts.v18i0.515
Prandi, B., 2023. Computer-Assisted Simultaneous Interpreting: A Cognitive-Experimental Study on
Terminology. Translation and Multilingual Natural Language Processing 22.
Radić, A., Braun, S., Davitti, E., 2023. Introducing Speech Recognition in Non-Live Subtitling to
Enhance the Subtitler Experience. Proceedings of the International Conference HiT-IT 2023
167–176. URL https://doi.org/10.26615/issn.2683-0078.2023_015
Robert, I.S., Schrijver, I., Diels, E., 2019. Live Subtitlers: Who Are They? A Survey Study. Linguistica
Antverpiensia, New Series – Themes in Translation Studies 18. URL https://doi.org/10.52034/
lanstts.v18i0.544
Robinson, D., 2022. Cyborg Translation. Originally published in Petrilli, S. (Ed.), La traduzione.
Special issue of Athanor: Semiotica, Filosofia, Arte, Letteratura 10(2), 219–233.
Rodríguez González, E., Saeed, M.A., Korybski, T., Davitti, E., Braun, S., 2023. Assessing the Impact of Auto-
matic Speech Recognition on Remote Simultaneous Interpreting Performance Using the NTR Model. In
Proceedings of the International Workshop on Interpreting Technologies SAY IT AGAIN 2023.
Romero-Fresco, P., 2011. Subtitling Through Speech Recognition: Respeaking. St Jerome, Manchester.
Romero-Fresco, P., 2018. Respeaking: Subtitling Through Speech Recognition. In Pérez-González, L.,
ed. The Routledge Handbook of Audiovisual Translation. Routledge, London, 96–113.
Romero-Fresco, P., Alonso-Bacigalupe, L., 2022. An Empirical Analysis on the Efficiency of Five Interlin-
gual Live Subtitling Workflows. XLinguae 15, 3–13. URL https://doi.org/10.18355/XL.2022.15.02.01
Romero-Fresco, P., Eugeni, C., 2020. Live Subtitling Through Respeaking. In Bogucki, Ł., Deckert,
M., eds. The Palgrave Handbook of Audiovisual Translation and Media Accessibility. Palgrave,
London, 269–295. URL https://doi.org/10.1007/978-3-030-42105-2_14
Romero-Fresco, P., Martínez, J., 2015. Accuracy Rate in Live Subtitling: The NER Model. In
Díaz-Cintas, J., Baños, R., eds. Audiovisual Translation in a Global Context: Mapping an
Ever-Changing Landscape. Palgrave Macmillan, London, 28–50.
Romero-Fresco, P., Pöchhacker, F., 2017. Quality Assessment in Interlingual Live Subtitling: The NTR
Model. Linguistica Antverpiensia, New Series: Themes in Translation Studies 16, 149–167.
Sandrelli, A., 2020. Interlingual Respeaking and Simultaneous Interpreting in a Conference Setting: A
Comparison. inTRAlinea Special Issue: Technology in Interpreter Education and Practice.
Schäffner, C., Adab, B., 1995. Translation as Intercultural Communication: Contact as Conflict. In Translation as Intercultural Communication: Selected Papers from the EST Congress, Prague. John Benjamins Publishing Co., Philadelphia.
Simon, S., 2011. Hybridity and Translation. In Gambier, Y., van Doorslaer, L., eds. Handbook of
Translation Studies, Vol. 2. John Benjamins, 49–53. URL https://doi.org/10.1075/hts.2.hyb1
Slator, 2023. Slator Pro Guide: Subtitling and Captioning. URL https://slator.com/slator-pro-
guide-subtitling-and-captioning/ (accessed 12.08.2024).
Szarkowska, A., Krejtz, K., Dutka, Ł., Pilipczuk, O., 2018. Are Interpreters Better Respeakers? The
Interpreter and Translator Trainer 12(2), 207–226.
Tiselius, E., 2009. Revisiting Carroll’s Scales. In Angelelli, C.V., Jacobson, H.E., eds. Testing and Assess-
ment in Translation and Interpreting Studies. A Call for Dialogue Between Research and Practice,
American Translators Association Scholarly Monograph Series 14. John Benjamins, 95–121.
van Merriënboer, J.J.G., Kirschner, P.A., 2018. Ten Steps to Complex Learning: A Systematic
Approach to Four-Component Instructional Design, 3rd ed. Routledge, New York.
Verified Market Reports, 2024. Real-Time Language Translation Device Market Insights. URL
www.verifiedmarketreports.com/product/real-time-language-translation-device-market/ (accessed
12.08.2024).
Wagner, S., 2005. Intralingual Speech-to-Text Conversion in Real-Time: Challenges and Opportuni-
ties. In MuTra: Challenges of Multidimensional Translation Conference Proceedings.
Wallinheimo, A.-S., Evans, S.L., Davitti, E., 2023. Training in New Forms of Human-AI Interaction
Improves Complex Working Memory and Switching Skills of Language Professionals. Frontiers in
Artificial Intelligence 6. URL https://doi.org/10.3389/frai.2023.1253940
Wordly, 2024. State of Live AI Translation. URL www.wordly.ai/resources/wordly-ai-translation-
research-2024 (accessed 15.07.2024).

12
MACHINE INTERPRETING
Claudio Fantinuoli

12.1 Introduction
Language barriers pose a significant obstacle for numerous people. They not only restrict access to content in everyday leisure and entertainment contexts and in the circulation of news but also create difficulties in critical situations, such as accessing public services during humanitarian operations or managing crisis scenarios, to name just a few.
few. Previous research has extensively shown, for example, that limited proficiency in the
languages of host societies significantly contributes to anxiety and stress among migrants
(Ding and Hargraves, 2009); the absence of mutual understanding because of language
barriers creates dangerous situations, hazards, exclusion, and other cascading effects during
crises (O’Brien and Federici, 2022); and the lack of language accessibility in the informa-
tion selection process impacts the framing and agenda-setting factors in newsrooms (Van
Doorslaer, 2009). These challenges are not limited to the physical world; they are also
becoming increasingly significant in online communication, where new challenges arise for
achieving barrier-free interaction among users with diverse linguistic and accessibility needs
(Kožuh and Debevc, 2018; Abarca et al., 2020).
For centuries, overcoming language barriers involved relying on human interpreters,
both professional and amateur, adopting a common lingua franca (Cogo, 2015) or even
creating artificial languages, like Esperanto (Fettes, 1997). While each approach presents
its own advantages, it is important to acknowledge that these also come with recognised
limitations. For example, human interpreters are not universally accessible or cost-effective,
and their services are often reserved for specific situations deemed essential by stakehold-
ers or mandated by law (Jaeger et al., 2019). The use of a lingua franca, such as English, is
accessible only to a subset of individuals, and often, their proficiency in it is limited (Hart-
shorne et al., 2018). Most situations involving people from different language backgrounds
remain unaddressed.
Several solutions are conceivable as a way to overcome language barriers. These include
training more interpreters, training bilingual staff in multilingual workforces to act as inter-
preters, making interpreter provision more effective through the use of remote interpreting,
etc. Among these opportunities, many have proposed speech translation technology as a


DOI: 10.4324/9781003053248-16

tool to enhance accessibility and bridge linguistic and cultural divides, offering a more
inclusive and connected world (Waibel et al., 1991; Salesky et al., 2023; Anastasopou-
los et al., 2022). By overcoming the exclusivity of professionals in delivering this service,
machines promise to make accessibility across languages ubiquitous and more affordable
(Susskind and Susskind, 2017), contributing, at least to some extent, to mitigating language
barriers.
The quest for automatic speech translation is long, but it is only recently that this tech-
nology has gained momentum in research. The International Workshop on Spoken Lan-
guage Translation (IWSLT),1 the most important annual scientific conference dedicated to
this topic, was founded in 2004. The first workshop described the vision of this new branch
of computational science as an ‘attempt to cross the language barriers between people with
different native languages who each want to engage in conversation by using their mother
tongue’.2
More recently, riding the wave of advancements in speech recognition and machine
translation driven by neural networks and, lately, the introduction of large language mod-
els, MI has transitioned from research labs and conference venues to real-life applications.
Despite the progress made, the pursuit of high-quality automatic speech translation is
fraught with a wide range of challenges that go beyond technical and linguistic hurdles.
Achieving a high standard of translation quality is just the beginning; it is also important
to navigate the ethical considerations of MI, assess its suitability in various settings, and
engage in deeper discussions about the implications of society having unlimited access to
spoken language information from different languages and cultures.
This chapter seeks to enhance the understanding of the application of this technology
beyond its mere technical aspects, targeting a broader audience rather than the experts
of the field. Section 12.2 introduces the concept of MI and the terminological conven-
tions used in different disciplines. Section 12.3 gives an overview of the development of
the field. Section 12.4 delves into the technological approaches that are the base for MI.
Section 12.5 presents the major challenges for machine interpretation. Section 12.6 outlines
some technological perspectives, and Section 12.7 the challenges in evaluating MI applica-
tions. Section 12.8 reflects on possible ethical implications of MI in society at large. Finally,
Section 12.9 concludes the chapter.

12.2 Definition and terminology


The automated process of converting speech from one language into another, particularly in live communication settings, is known by various terms across different disciplines. The two most common are speech-to-speech translation and machine interpreting.3
The term speech-to-speech translation, primarily used in computer science, refers to any
technology that converts spoken words in one language into spoken words in another. The
translation process can be performed offline in the case of pre-recorded audios and videos, or
in a streaming format in the case of live and real-time translation (Seligman, 1997; Kidawara
et al., 2020). Speech-to-speech is a specific category within speech translation, an even broader
term that includes all forms of translation from an oral source, whether into a written form,
for example, for the creation of subtitles (Xu et al., 2022; Papi et al., 2023), or another oral
form, for example, for dubbing or voiceover (Saboo and Baumann, 2019; Hu et al., 2022).
The term machine interpreting or machine interpretation (MI) is predominantly used in
the field of translation studies and refers to a specific subset of speech-to-speech translation


characterised by its focus on immediacy, that is, the fact that the message in translation is
available only once and cannot be edited (Fantinuoli, 2023). This is similar to the distinctive
features of human interpreting (Kade, 1968; Pöchhacker, 2022). The focus of MI is there-
fore on real-time and live scenarios, setting it apart from other forms of speech-to-speech
translation. MI can be either consecutive, where the machine processes and translates an
entire oral segment after it is spoken (such as a sentence or a longer passage), or simul-
taneous, where the translation is generated incrementally as the original speech is being
delivered, that is, based on partial input. Notably, MI may involve nuanced intervention in
the translation process, which may include adaptations, omissions, voice speed changes, or
other modifications to tailor communication effectively for specific contexts.
While the term speech-to-speech translation may refer to both the underlying technol-
ogy and its application, MI is primarily used only to denote the application rather than the
technology itself.

12.3 History
Modern speech translation applications have only caught the world’s attention in the last
few years thanks to impressive improvements in the underlying technologies, but research-
ers have been working to develop and understand its potential for much longer than that.
The history of speech-to-speech translation has always been tightly related to the progress
of a series of different technologies, especially speech recognition and machine translation
and, more recently, of large language and multimodal models.
While research and development in machine translation date back to the 1940s, it was
only during the 1983 World Telecommunication Exhibition that multilingual speech trans-
lation appeared on the stage (Kato, 1995). This technology, integrating speech recognition
and speech synthesis with machine translation, captured widespread attention at that event
and initiated a series of projects and conferences in the domain. Following that, the SpeechTrans system (Tomita et al., 1990), developed in 1988, marked another significant step in speech translation, consolidating the field within computer science. In 1992, the
German Federal Ministry of Education and Research launched a project called Verbmobil,
which developed a prototype portable translation system for the language pair German–
English. During the implementation of Verbmobil, a great number of scientific publications
and several spin-off products were generated. Start-up companies were established by the
participating universities and research centres (Wahlster, 2000). The project was character-
ised by its pioneering nature, and its legacy continues to resonate today.
Over the next two decades, especially after the Consortium for Speech Translation
Advanced Research was founded in 1991, various speech translation systems were cre-
ated, ranging from restricted domain and vocabulary to open-domain translation systems
(Waibel et al., 1991; Fügen et al., 2006). To further advance speech translation systems,
the International Workshop on Spoken Language Translation (IWSLT) was founded in
2004. In July 2010, the National Institute of Information and Communications Technol-
ogy (NICT) in Japan launched the world’s inaugural field experiment of a network-based
multilingual speech-to-speech translation system using a smartphone application.
In 2019, the European Parliament, a significant institution utilising a multilingual regime
with 24 official languages, initiated an innovation project in the space of speech translation.4
While this project was not related to MI but rather to speech-to-text translation (the goal
was to improve the accessibility of plenary meetings, enabling deaf and hard-of-hearing


individuals to follow debates in real time), the initiative represents a crucial milestone in the
large-scale implementation of speech translation technologies.
More recently, MI has begun to be used in real-life scenarios, such as conferences, work-
shops, town halls, etc.5 While these systems were initially designed only for consecutive
interpretation, recent developments have also introduced systems capable of near simulta-
neous interpretation.
Speech-to-speech translation is a challenging research problem. From a technological per-
spective, the evolution of speech technology has followed a pattern similar to that of machine
translation, moving from simple, rule-based and statistical approaches to more complex,
deep learning paradigms.6 To simplify the task and make it somewhat more feasible, the first
research projects addressing this topic in the 1990s worked on restricted domains, such as
scheduling appointments or making orders, and moved only later to free and spontaneous
speech translation. Initially, speech translation systems were based on rule-based or statistical approaches and therefore had all the limitations that came with them, including the inability to translate idiomatic expressions, to take context into account, or to produce fluent translations. After Google announced the development
of the Google Neural Machine Translation (Wu et al., 2016) in September 2016, a signifi-
cant paradigm shift from statistical approaches to deep learning and neural machine transla-
tion has been taking place, paving the way for new speech translation systems with increased
accuracy. The increased availability of special datasets, such as Europarl-ST (Iranzo-Sànchez
et al., 2020) or TEDx (Salesky et al., 2021), has further increased the efforts of the commu-
nity to create dedicated models for speech translation tasks.
At the moment of writing, commercial systems are designed based on the cascad-
ing approach, that is, combining several components, such as speech recognition and
machine translation (see Section 12.4.1). However, significant strides have been made
in developing and embracing end-to-end systems, that is, systems capable of translating
from one spoken language to another directly, without relying on an intermediate text
representation (Jia et al., 2019b). In the future, the end-to-end paradigm is expected to simplify architectures and bring about systems that further improve the translation experience, allowing, for example, greater expressiveness and the retention of features of the original speech (Barrault et al., 2023).
Recently, much interest has also been devoted to reducing the latency of cascading and
direct systems for a range of reasons, including supporting real-time and near simultaneous
translation (Sudoh et al., 2020), extending coverage of dialect and low-resource languages,
and adapting register (Agarwal et al., 2023).

12.4 Current approaches


As previously explained, there are two main technical approaches used in MI: the cascading approach and the end-to-end approach. These can be seen as the two ends of a continuum encompassing many possible implementations and variations.

12.4.1 Cascading approach


The cascading approach is characterised by a pipeline of variable components concate-
nated one after the other. Cascading approaches are based on composite systems. In such


Figure 12.1 Simple cascading system for MI.

systems, the overall goal of the application is divided into multiple tasks, with each task
assigned to a specialised module. The simplest cascading configuration involves automatic
speech recognition (ASR), to convert speech into written text; neural machine translation
(NMT), to translate the written text from one language to another; and voice synthesis or
text-to-speech (TTS), to convert the written translation into an oral form (Figure 12.1 – see
also Davitti, this volume).
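By way of illustration, the flow through such a pipeline can be sketched in a few lines of Python. The three component functions below are simplified stand-ins for real ASR, NMT, and TTS engines, and all names and values are invented for the example:

```python
# Minimal sketch of a cascading MI pipeline (ASR -> NMT -> TTS).
# Each function is a toy stand-in: a real system would call dedicated
# speech recognition, translation, and synthesis engines at each stage.

def asr(audio: bytes) -> str:
    """Stand-in for automatic speech recognition (speech -> source text)."""
    return "hello world"

def nmt(text: str) -> str:
    """Stand-in for neural machine translation (source -> target text)."""
    toy_lexicon = {"hello": "hallo", "world": "Welt"}
    return " ".join(toy_lexicon.get(word, word) for word in text.split())

def tts(text: str) -> bytes:
    """Stand-in for text-to-speech synthesis (target text -> speech)."""
    return text.encode("utf-8")

def cascade(audio: bytes) -> bytes:
    """Concatenate the three modules: the output of each feeds the next."""
    return tts(nmt(asr(audio)))

print(cascade(b"<audio frames>"))  # b'hallo Welt'
```

The sketch makes the structural point of the cascading approach visible: each module sees only the output of the previous one, which is also why errors introduced early in the pipeline propagate downstream.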
Configurations of cascading systems can vary considerably, depending on technologi-
cal advancements, design requirements, and complexity of the system. Recently, more
sophisticated applications have emerged. These may have, for example, dedicated com-
ponents for achieving low latency and continuous translation, or components to adapt
translation strategies to the translation goals, for example, speed adaptation, speech cor-
rection or normalisation, and language simplification (for translation into plain or simpli-
fied language), etc.
Generally speaking, cascading systems have the advantage of being able to leverage
robust technologies with years of development behind them. The robustness of the single
components is also granted by the availability of large training sets (Sperber and Paulik,
2020). However, cascading systems face two challenges. First, training and maintaining composite pipelines is extremely complex. Second, such systems tend to suffer from error propagation from one component to the next: errors made at one stage of the process are carried forward and potentially amplified in subsequent stages. Even though some errors, such as transcription issues, can be corrected (Martucci et al., 2021; Macháček et al., 2023b), each stage can introduce new errors, and since each subsequent stage relies on the output of the previous one, errors can accumulate and worsen as the process continues.
Simultaneous machine interpreting is the most intricate form of speech translation, as
it adds a new significant temporal constraint. Systems need to translate an ongoing stream
of speech incrementally, without interruptions and without complete knowledge of what
the speaker will say moments later. To accomplish this, speech must be segmented into
meaningful chunks in real time (Waibel et al., 2012). The goal of a segmentation module is
to achieve a system that balances translation accuracy and latency. Latency can be broadly
defined as the time (in seconds) between the speaker uttering a word and the moment
the translation engine delivers the same word. Achieving lower latency, which implies less
context for the engine, can pose challenges in producing accurate translations. Segmenta-
tion methods range from detecting pauses in the speaker’s flow and employing fixed word
lengths to utilising dynamic approaches based on real-time syntactic and semantic analysis
of incoming speech.
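The first of these strategies, pause-based segmentation, can be sketched as follows (a minimal illustration: it assumes word-level timestamps are available from an ASR component, and the 0.3-second pause threshold is an arbitrary value chosen for the example):

```python
# Minimal pause-based segmentation sketch: split a stream of timestamped
# words into chunks whenever the silence between two consecutive words
# exceeds a threshold. Times are in seconds; the threshold is illustrative.

def segment_on_pauses(words, pause_threshold=0.3):
    """words: list of (token, start_time, end_time) tuples."""
    chunks, current = [], []
    prev_end = None
    for token, start, end in words:
        if prev_end is not None and start - prev_end > pause_threshold:
            chunks.append(current)   # pause detected: close the chunk
            current = []
        current.append(token)
        prev_end = end
    if current:
        chunks.append(current)
    return chunks

stream = [("good", 0.0, 0.3), ("morning", 0.35, 0.8),     # short gaps
          ("welcome", 1.6, 2.1), ("everyone", 2.15, 2.7)]  # 0.8 s pause before
print(segment_on_pauses(stream))
# -> [['good', 'morning'], ['welcome', 'everyone']]
```

Dynamic syntactic or semantic segmenters replace the fixed threshold test with a model-based decision, but the chunking loop has the same overall shape.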
In Figure 12.2, a simultaneous model is added to the pipeline. This model allows the
pipeline to achieve near simultaneous speech translation. Generally speaking, this model
aims to segment the incoming stream of text into chunks, for example based on units of meaning. These units of meaning can then be passed to the MT engine, and thereafter to the TTS model (Kumar et al., 2013; Wang et al., 1999; Ma et al., 2019), so that the translation, once spoken, is meaningful and coherent.

Figure 12.2 Cascading system for simultaneous translation.

Figure 12.3 Cascading system with direct speech-to-text translation.
Pipelines in translation processes are not inherently restricted to a unidirectional flow.
Depending on the chosen architecture, feedback loops can be integrated to enhance con-
trol over the translation process. Contrary to the unidirectional approach illustrated in
Figure 12.3 (which necessitates segmenting the input stream into chunks that resemble
semantically coherent sentences and thus incurs a latency at least as long as the length of
these sentences), the incorporation of a feedback loop can reduce latency. By reintroduc-
ing the already-translated portion of a sentence back into the system, the MT engine is
prompted to complete the translation of the new segment while still considering the context
of the previously translated part. This strategy not only achieves shorter latency but also
improves coherence and cohesion in the translated output.
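Schematically, such a loop can be sketched as follows (a toy illustration: `translate_with_prefix` stands in for a real MT engine that supports prefix-constrained decoding, and the glossary-based 'translation' is purely illustrative):

```python
# Schematic feedback loop for incremental translation: the target prefix
# produced so far is fed back to the engine together with each new source
# chunk, so the engine extends the translation in context instead of
# starting from scratch. The engine below is a toy stand-in that maps
# words through a small glossary.

GLOSSARY = {"the": "der", "minister": "Minister",
            "spoke": "sprach", "today": "heute"}

def translate_with_prefix(target_prefix, new_source_chunk):
    """Toy stand-in for prefix-constrained decoding: extend the existing
    target prefix with a translation of the new chunk."""
    return target_prefix + [GLOSSARY.get(w, w) for w in new_source_chunk]

def incremental_translate(source_chunks):
    prefix = []
    for chunk in source_chunks:              # chunks arrive over time
        prefix = translate_with_prefix(prefix, chunk)
        print("partial:", " ".join(prefix))  # emitted incrementally
    return " ".join(prefix)

result = incremental_translate([["the", "minister"], ["spoke", "today"]])
# final result: "der Minister sprach heute"
```

The key design point is that each call sees both the new chunk and the already-committed target prefix, which is what allows a real engine to keep the output coherent across chunk boundaries.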
As previously mentioned, tasks and components are not rigidly defined. For example, the
tasks of ASR and MT can be unified in a single component performing direct speech-to-text
translation (Berard et al., 2016; Papi et al., 2023; Sethiya and Maurya, 2023), that is, a model that directly converts speech signals in one language into text in another. The translation can then be passed directly to a TTS system for audio generation.
Several additional components can be integrated into a cascading pipeline, such as text
normalisation (Fügen, 2008), language identification (Singh et al., 2021), suppression of
disfluencies (Fitzgerald et al., 2009), prosody transfer (Kano et al., 2018), speaker diarisa-
tion (Yang et al., 2023), and so forth.
Recently, translation has been performed not only by neural machine translation engines
but also by large language models, such as GPT.7 This enables a more comprehensive
approach to the translation task, leveraging the capabilities of generative AI.
For instance, LLMs can incorporate contextual understanding for translation through
methods such as in-context learning (Moslem et al., 2023) or even integrate automatic
quality estimation to enhance real-time translation (cf. Kocmi and Federmann, 2023; Wang
and Fantinuoli, 2024).

12.4.2 End-to-end approach


The emerging alternative to cascading systems is the end-to-end approach. This approach
uses a single component to translate input audio in one language directly into output audio


in another language without generating intermediate text representations (Lee et al., 2022).
Prominent projects are Translatotron8 by Google and SeamlessM4T9 by Meta.
While cascading systems are unable to preserve the para- and non-linguistic speech characteristics that are central to human communication, such as prosody, tone of voice, pauses, and emphasis, the end-to-end approach promises to maintain such traits in translation as well. End-to-end models learn to map features of language in a holistic way: they map spoken language, comprising all the aforementioned features, to translations, and are thus able to reproduce these features, at least partially (Barrault et al., 2023).
Initially, end-to-end models primarily used supervised machine learning techniques
that rely on bilingual speech datasets (Jia et al., 2019a, 2019b, 2021). This approach has
two limitations. On the one hand, collecting bilingual datasets, especially but not only
for low-resource languages, is difficult. On the other hand, the lack of bilingual data with
corresponding non-linguistic characteristics in both source and target languages makes it
impossible to transfer such features to the translated speech. More recently, attempts have
been made to overcome this limitation, using unsupervised machine learning with monolin-
gual datasets (Nachmani et al., 2023).
While the end-to-end approach was initially applied only to consecutive translation,
the first systems have also been proposed for the simultaneous modality (Barrault et al., 2023).

12.5 Challenges
MI encounters various challenges, drawing from the complexities inherent in speech trans-
lation (Macháček et al., 2023b) and adding unique difficulties due to its need for imme-
diacy. These challenges broadly fall into the following categories: linguistic, cultural and
communicative, and technological.

12.5.1 Linguistic challenges


The linguistic challenges in MI relate to all aspects connected with language, especially the distinctive features of spoken language. These challenges include, but are not limited to, the following:

• Disfluencies. Natural speech often contains filler words and sounds, such as ‘um’, ‘uh’,
‘you know’, and ‘like’.
• Poor enunciation. Mumbled or slurred words are difficult to interpret accurately.
• Features of spoken grammar. People often speak in sentences that would not be gram-
matically or syntactically correct in written language: changing direction mid-sentence,
not completing sentences, or using non-standard grammar.
• Proper nouns. Identifying and transcribing names (in cascading systems) or maintaining the correct pronunciation of names (in direct systems) is a major challenge for machines.
• Simultaneity. The need to process incoming speech while it is unfolding is a further
challenge.

The difficulty in tackling these challenges is exacerbated in the case of low-resource lan-
guages, particularly Indigenous and minority languages. The same issue applies to many
accent variations that are not covered in the training data. Therefore, providing compre-
hensive language coverage continues to be a significant challenge. Commercial MI systems,


such as those offered by Interprefy10 or KUDO,11 offer support for 80 and 38 languages,
respectively, at the time of writing. Variations in quality might be significant, not only
among vendors, but also between languages. Addressing these challenges requires ongoing
effort, resources, sophisticated algorithms, and extensive linguistic knowledge. It should
be recognised that the number of languages supported in machine translation is constantly
increasing, and this trend is expected to be reflected in commercial systems for MI as well.12

12.5.2 Cultural and communicative challenges


The cultural and communicative challenges of MI are many. Languages and cultures are deeply interconnected. One of the most nuanced challenges in speech translation is handling cultural references, idioms, and colloquialisms that are specific to a particular language or culture (Yao et al., 2023). Machine translation performs poorly on
culturally specific data, mostly due to the gap between the cultural norms associated with
the languages (Liebling et al., 2022). The same applies to irony and sarcasm, both of which
are highly dependent on cultural context.
Culturally bound information often cannot be translated directly, as its meaning is
deeply rooted in the culture and history of the people who use it. A speech translation
system should recognise such information and strategically translate it into an equivalent
expression in the target language. To be effective, the speech translation system would
be required to consider not only the semantic correspondence but also its communicative
dimension, which is limited to that specific moment and communicative act.
A fully functional speech translation system would require profound cultural and con-
textual understanding to precisely interpret and translate cultural references. Moreover, it
should possess the capability to make informed decisions regarding translation strategies,
based on the language/culture pairs and the communicative context. While initial research in this direction is under way (cf. Yao, 2023), such advanced abilities are currently beyond the scope of existing systems.
MI also faces challenges in processing background information about a communicative
event. Achieving accurate translation hinges significantly on comprehending the context
of a speech or conversation. In the absence of, or with limited access to, this contextual
understanding, a translation system may misinterpret the meaning or intention behind an
utterance. This could lead to potentially misleading translations.
Similarly, machine translation, especially in the context of real-time speech communication, is not grounded in the communicative context (cf. Lala et al., 2019). While in some simple contexts speech translation might work even with limited or no access to the broader context, in more complex and nuanced settings systems would need a comprehensive grasp of the broader communicative setting to provide accurate
translations. This involves understanding the relationships between speakers, their tone
of voice, and the numerous non-verbal cues that are key to constructing meaning (See-
ber, 2012). While progress has been made in aspects such as multimodality (Sulubacak
et al., 2019), the limitations in grounding the translation in MI remain numerous.

12.5.3 Technical challenges


The technological challenges related to the design and deployment of speech translation
systems at scale are also manifold and are common to many machine learning applications
(Anastasopoulos et al., 2022; Agarwal et al., 2023).

A critical constraint is latency, particularly in real-time translation scenarios, such as live broadcasting or interpersonal communication (Sarkar, 2016). Latency is the delay between
the original spoken words and the availability of the translation. Reducing the delay between
the spoken word and its translated output is essential to prevent conversation disruption,
misunderstandings, or a poor user experience. In a situation where a speaker is illustrating a
chart, for example, a delayed translation would not be conducive to achieving a functional
understanding. In an ideal scenario, the translation should therefore be voiced quickly, match-
ing the natural rhythm of speech, including its flow, pauses, and moments without speech.
Latency is determined by two elements: on the one hand, technical constraints, that is, the time needed to run inference on the translation model(s) as well as the delay caused by data transmission and processing; on the other, linguistic constraints. Technical constraints are common to all forms of MI, both consecutive and simultaneous, and can be addressed with more performant hardware, improved language models that reduce inference time, or fewer components in the pipeline. Linguistic constraints apply only to simultaneous translation, for which achieving low latency
while maintaining accuracy and reliability in translation presents a substantial challenge
(Chang et al., 2022). Because of the progressive nature of the translation process while the
original speech is unfolding, the system needs to wait for some context before processing
and outputting the translation. The challenge lies in finding the right compromise: Reduc-
ing the latency means the system has less context for the translation, which can negatively
impact accuracy. Conversely, allowing more latency will enhance the translation quality
but will degrade the translation experience. From a linguistic perspective, latency is tied
to the speech segmentation method chosen. These can range from detecting pauses in the
speaker’s delivery and using fixed word counts to dynamic strategies that analyse the syntax
and semantics of the incoming speech in real time.
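This accuracy–latency trade-off is commonly quantified with metrics such as Average Lagging (Ma et al., 2019), which measures how many source words the system lags behind an ideal translator that keeps pace with the speaker. A simplified sketch, assuming we record how many source words had been read when each target word was emitted:

```python
# Simplified Average Lagging (AL), after Ma et al. (2019): for each
# target word i, read_counts[i] is the number of source words read
# before emitting it; the metric averages how far this runs behind an
# "ideal" system that emits target words at the target/source length
# ratio (gamma).

def average_lagging(read_counts, src_len):
    """read_counts[i]: source words read when target word i+1 was emitted."""
    tgt_len = len(read_counts)
    gamma = tgt_len / src_len
    # tau: index of the first target word emitted after the full source
    tau = next((i + 1 for i, g in enumerate(read_counts) if g == src_len),
               tgt_len)
    return sum(read_counts[i] - i / gamma for i in range(tau)) / tau

# A system that waits for 2 source words, then alternates reading and
# writing, on a 4-word source translated into 4 target words:
print(average_lagging([2, 3, 4, 4], src_len=4))  # -> 2.0
```

Lower values indicate a system that commits to output earlier, with correspondingly less source context available for each decision.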
Another key concern is the reliable and scalable deployment of computationally intensive
systems. Speech translation tools must consistently deliver translations under a variety of
conditions, accommodate a large number of users simultaneously, and do so in real time. To
ensure this level of performance, robust infrastructure and efficient algorithms are necessary,
each posing its own set of complex challenges. The complexity is further compounded for
cascading systems due to their architectural intricacy and the multitude of language models
requiring maintenance and deployment. Currently, deploying such complex systems is feasi-
ble only in the cloud and in robust data centres. Offline deployment on the consumers’ devices
remains beyond reach for now, although it is foreseeable that advancements in both software
and hardware will eventually support this deployment method (Wang et al., 2022).

12.6 Evaluation
Evaluating the quality of MI is an important activity. Evaluations yield insights that are
crucial for various stakeholders, including developers, users, buyers, certifying agencies,
and more (Han, 2022).
Quality in interpreting is a nuanced and variable concept heavily influenced by the spe-
cific needs and idiosyncrasies of end users or service purchasers (see Davitti, Korybski,
Orăsan and Braun, this volume). It becomes paramount in high-stakes scenarios, where the
accuracy and subtlety of translation can have significant implications. The notion of qual-
ity, therefore, is not uniformly defined and varies according to the context of use.
The task of assessing the quality of translated content is multifaceted and complex. Measuring quality is inherently challenging due to the somewhat intangible nature of spoken language translation (Garcia Becerra, 2016a, 7). Quality perceptions can differ significantly
among users, adding a layer of subjectivity to what is considered a correct and high-quality
translation. Quality standards vary depending on the interpreting context. For example, in
human conference interpreting, the focus is on the interpreter’s output, including content,
language, and delivery. While these features continue to remain important in public service
settings, such as social and healthcare interpreting, other skills, such as interactional skills
and discourse management, become relevant (Kalina, 2012). There seems to be a consensus
that there is poor agreement on what constitutes an acceptable translation (Zhang, 2016).
Translation quality can be evaluated manually or automatically. Human evaluations pro-
vide a comprehensive view of quality by considering various aspects of communication, offer-
ing a deep understanding of the interpreting performance, as indicated by interpreting scholars
(Pöchhacker, 2002; Garcia Becerra, 2016b). However, manual evaluations are labour-intensive,
time-consuming, and expensive (Wu, 2011). In automatic speech translation, manual evalu-
ation has been used to assess both accuracy and fluency (cf. Fantinuoli and Dastyar, 2022).
Automated or semi-automated metrics have been proposed as an alternative to manual eval-
uation in order to simplify and speed up this process. Very few studies have applied automated
evaluation to human interpreting, as in the case of semantic similarity proposed by Zhang
(2016). A higher number of studies have applied automated evaluation to speech-to-text trans-
lation. These use statistical metrics, such as BLEU (Papineni et al., 2002), BERTScore (Zhang
et al., 2020), and chrF2 (Popović, 2017), both to monitor quality evolution from a developer
perspective and to compare several systems, for example, during international evaluation cam-
paigns (Agarwal et al., 2023). Systems are often evaluated in terms of their ability to produce
translations that are similar to the target language references. More recently, reference-free
evaluations have been proposed, leveraging multilingual sentence embeddings or other
techniques that attempt to capture meaning (cf. Wang and Fantinuoli, 2024).
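To illustrate how such surface-overlap metrics work, the idea behind chrF (Popović, 2017) can be sketched in simplified form (a toy version: real chrF averages character n-grams for n = 1..6 with β = 2 and handles tokenisation details omitted here):

```python
# Simplified character-n-gram F-score in the spirit of chrF: compare a
# hypothesis and a reference by their overlapping character n-grams.
# This toy version shows the core idea only, not the full metric.

from collections import Counter

def char_ngrams(text, n):
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def simple_chrf(hypothesis, reference, max_n=4, beta=2.0):
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        overlap = sum((hyp & ref).values())  # multiset intersection
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        if prec + rec == 0:
            scores.append(0.0)
            continue
        # F-score weighted towards recall, as in chrF
        scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return sum(scores) / len(scores) if scores else 0.0

print(simple_chrf("the cat sat", "the cat sat"))  # identical -> 1.0
print(simple_chrf("the cat sat", "a dog ran"))    # low overlap, near 0
```

Reference-based metrics of this kind reward surface similarity to the reference, which is precisely why reference-free and embedding-based alternatives are being explored for spoken-language output.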
When applying automatic metrics, a key consideration for understanding how reliable they are is how well they correlate with human judgment. Some scholars have concluded that, given the current quality levels of the systems, simple automatic metrics, such as COMET, can be used as proxies for quality estimation (Macháček et al., 2023a). Semantic vectors and large language models, which can directly compare the original with the translation, have also shown promise in inferring quality that correlates better with human judgments (Kocmi and Federmann, 2023; Han and Lu, 2021).

12.7 Ethical aspects


Machine interpreting, like any emerging technology, presents a range of ethical challenges that
require careful consideration and governance (Cath, 2018; Floridi, 2021). As a tool designed
to enhance communication and understanding across language barriers, it has the potential
to significantly impact various areas of human life. For this reason, its application must be
managed responsibly to ensure ethical integrity. At a very general level, we can categorise the
ethical challenges of any AI solution, hence also of MI, into three primary scenarios:13

• Overuse. This occurs when MI systems are deployed without a genuine need, leading to
unnecessary resource expenditure. The economic and environmental costs are significant,
considering the substantial energy and computational resources required to run these sys-
tems. Overuse not only leads to wasteful expenditure but also potentially contributes to
environmental degradation due to the high energy demands of current ML technologies.


• Misuse. This scenario arises when the technology is employed in situations where it may
cause harm. The accuracy and appropriateness of MI systems vary based on several fac-
tors, including language pairs, context of the communication, cultural nuances, techni-
cal limitations, and expectations. Using MI in sensitive or high-stake environments, such
as legal proceedings or medical consultations, without adequate safeguards can lead to
misinterpretations with serious consequences. Such misuse is not only unethical but also
potentially harmful and should be regulated to prevent adverse outcomes.
• Underuse. This refers to situations where MI is not utilised despite its potential to sig-
nificantly enhance communication and accessibility. Underuse is considered unethi-
cal as it denies the benefits of reduced language barriers to those who could otherwise
stand to gain from them. It is also economically imprudent, as the technology offers a
cost-effective solution for increasing accessibility. The underuse of MI can stem from a
lack of awareness, technological limitations, or resistance to change. Addressing these
barriers is crucial for maximising the technology’s positive impact.

The preceding categorisation serves as a general framework to guide the responsible adop-
tion of this technology. However, many factors remain to be defined in practical terms.
Such factors include deciding who is in charge of defining ‘critical use’, or how to define
metrics of acceptable translation performances. From a technical and legal standpoint, MI
systems present several critical aspects that necessitate responsible management and regula-
tion. Key areas of concern include:

• Confidentiality. MI applications are typically cloud-based and face data breach risks. Ensuring confidentiality requires robust encryption, secure storage, and strict access controls. Providers should adopt certifications such as ISO 27001 and SOC 2.
• Data ownership and privacy. MI systems process sensitive data, which raises ownership
and privacy concerns. Clear policies and compliance with regulations like GDPR are
crucial for safeguarding user data.
• Appropriate use. MI system effectiveness varies by language pair, situation complexity,
and cultural nuances. Users need guidelines to ensure they avoid misuse in critical con-
texts. Certifications can ensure reliability in regulated areas, like courts or healthcare.
• Liability. Accountability for translation errors requires clarity. Balanced regulations are
needed to ensure quality and innovation without eroding trust or stifling development.
• Ethical AI and bias mitigation. MI systems must address biases to prevent stereotypes or
discrimination. Regular audits and updates are key to ensuring ethical AI practices.

In addressing these challenges, it is essential to develop a balanced approach that maximises the benefits while minimising its risks (Floridi et al., 2018). This involves continuous
evaluation of the technology’s impact, updating ethical guidelines, and ensuring that its
deployment is aligned with societal values and needs. Stakeholders, including developers,
users, and policymakers, must collaborate to establish standards and regulations that guide
the responsible use of MI.
MI boldly promises to provide unrestricted access to spoken content across languages.
While this ambition appears commendable and worth pursuing, it carries subtle risks that
should not be overlooked. The interconnectedness of individuals, facilitated by the internet
and social media, while fostering increased information exchange and knowledge accessi-
bility, has simultaneously triggered societal polarisation and various negative consequences


(Becker et al., 2019). In a similar vein, unrestricted access to information through machine
interpretation could yield both advantages and disadvantages.
On the positive side, MI offers the potential for enhanced and autonomous dissemination
of information and knowledge. While the exclusive provision of services by professionals offers numerous advantages, including the assurance of expertise and the high quality standards that professionals can deliver, only machines can make such accessibility available to everyone (Susskind and Susskind, 2017).14
However, the ubiquity of spoken language translation also carries the risk of further rad-
icalisation and polarisation of ideological positions. AI creates the illusion that all content
can and should be made accessible in every language and culture. However, not all content
created by individuals can be meaningfully translated without proper contextualisation
of cultural, historical, and sociological aspects. Some content is deeply rooted in specific
cultures or subcultures and only holds significance within that specific context. This makes
translation – if not culturally mediated or contextualised – futile or even counterproductive.
In such contexts, MI is bound to fall short or be executed poorly, potentially exacerbating
misunderstandings and polarisation.

12.8 Future developments


The future trajectory of MI is difficult to predict, as it hinges significantly on advancements in
various facets of language technology. While existing cascading systems are poised for qual-
ity enhancements, due to improvements in their underlying components, further evolution
will be contingent on the integration of novel technologies and methodologies (Sperber and
Paulik, 2020; Gaido et al., 2022), for example, end-to-end approaches (see Section 12.4.2).
Two primary developmental trajectories merit attention. The first developmental avenue
involves augmenting the traditional cascading system pipeline with new components. These
additions aim to address and improve the limitations inherent in current systems. Potential
areas for exploration include the integration of large language models, the enhancement
of information extraction from acoustic models, and the incorporation of visual elements.
LLMs and the capabilities of generative AI could serve various functions (Wu et al.,
2023). Unlike traditional neural approaches to language and translation tasks, LLMs intro-
duce the capacity for nuanced language manipulation and a certain level of understanding
(Chang et al., 2023). While the extent of this ‘understanding’ remains a topic of debate
among scholars (Bender and Koller, 2020) and hinges on the interpretation of the term
itself, it is widely acknowledged that LLMs exhibit an enhanced proficiency in contex-
tual text interpretation. This capability can be directly applied to contextual translation
(Karpinska and Iyyer, 2023) or used to pre-process source information for translation, akin
to the semi-automated approach described by Korybski et al. (2022).
In the area of acoustic models, integration could focus on extracting information from a
speaker’s prosody (Barrault et al., 2023; Ko et al., 2023). Current cascading systems use acous-
tic models in their automatic speech recognition segment to transcribe spoken words into text
for subsequent processing. However, valuable information – conveyed through speech ele-
ments such as intonation, prosody, and pauses – is often lost. This information is crucial, as
communication extends beyond mere words to how those words are articulated. Such nuances
might be implicitly captured in direct speech-to-text translation models (ten Bosch, 2003).
The second trajectory pertains to the advent of robust end-to-end systems. These sys-
tems stand to benefit from increased computational power, the availability of appropriate


data, and innovative machine learning frameworks capable of yielding quality results with
reduced training data requirements (Sperber and Paulik, 2020). The future of speech trans-
lation seems to be rapidly shifting from cascading systems to end-to-end systems. Recent
developments show that end-to-end systems can produce translations while preserving
the various features of speech. End-to-end models focus on often-overlooked elements of
prosody, such as speech rate and pauses. They also retain the emotion and style of the
original speech (Barrault et al., 2023; Jia et al., 2021; Nachmani et al., 2023). Not only can
end-to-end models be a useful solution for dealing with languages that do not have a formal
writing system (Duong et al., 2016), but they should also enable more effective streaming
techniques and better source language segmentation approaches (see Section 12.4.1) that
mimic the behaviour of human interpreters. Therefore, use of these models could lead to
both a potential reduction in latency and a simplification of deployment and maintenance.
Visual analysis could further ground the translation process in the communicative context.
Live communication typically encompasses both verbal and non-verbal elements, tailored
to the situational demands and communicative goals of the participants (Lala et al., 2019;
Sulubacak et al., 2019). This aspect is equally vital in multilingual communication and MI.
Emerging vision systems, when combined with LLMs, demonstrate remarkable proficiency
in image analysis, with live video analysis on the horizon. These systems are now capable
of converting visual data into what might be termed ‘situational meta-information’ – essen-
tially, information about what is happening in the communicative context: who is involved,
what the features of the setting are, etc. Leveraging this data may significantly enrich the
translation process, leading to enhanced accuracy and nuance.
In applications outside the simultaneous modality, virtual avatars might influence the
perception of artificial interpretation further, for example, in dialogic contexts, where the
embodiment of an interpreter seems to play a central role (Li et al., 2023).

12.9 Conclusions
This chapter has provided a thorough examination of MI, highlighting its significant evolu-
tion from early experimental stages to its current applications. Despite remarkable techno-
logical advancements, this technology continues to face complex challenges, particularly in
accurately capturing the nuances of spoken language and contextual subtleties. There are
also methodological challenges to tackle, such as the need to define shared quality criteria
that should help diverse stakeholders evaluate the fit of the technology for their use cases.
The chapter also underscores the importance of addressing ethical considerations in the
deployment of live speech translation technologies, emphasising the need for responsible
use to maximise benefits while minimising potential risks.
The future of MI looks promising, with potential advancements in computational power
and machine learning techniques, which could enhance its efficiency and accuracy. Overall,
MI stands at a pivotal point. It has the capacity to significantly impact global communica-
tion, provided its development and application are managed with careful consideration of
its technological, evaluative, and ethical dimensions.

Notes
1 https://iwslt.org (accessed 10.9.2024).
2 https://www2.nict.go.jp/astrec-att/workshop/IWSLT2004/archives/000196.html (accessed 10.9.2024).
3 For an in-depth analysis of this terminology, please refer to Pöchhacker (2024).


4 See ‘call for tender’ at: https://etendering.ted.europa.eu/cft/cft-display.html?cftId=5249 (accessed 12.9.2024).
5 See, for example, https://slator.com/live-speech-to-speech-ai-translation-goes-commercial (accessed
12.9.2024).
6 See Poibeau (2017) for a detailed history of machine translation.
7 https://openai.com. (accessed 12.9.2024)
8 https://blog.research.google/2019/05/introducing-translatotron-end-to-end.html. (accessed 12.9.2024)
9 https://ai.meta.com/blog/seamless-m4t. (accessed 12.9.2024)
10 www.interprefy.com. (accessed 12.9.2024)
11 www.kudo.ai. (accessed 12.9.2024)
12 See, for example, https://blog.google/products/translate/google-translate-new-languages-2024/ and
https://ai.meta.com/research/no-language-left-behind/. (accessed 12.9.2024)
13 See Floridi et al. (2018) for the general theoretical framework used here.
14 Susskind and Susskind (2017, 33) note that ‘[m]ost individuals and organizations find it challeng-
ing to afford the services of top-tier professionals’, and that the use of AI might extend accessibil-
ity, where now only a limited number of people can actually avail themselves of these services.

References
Abarca, V.M.G., Palos-Sanchez, P.R., Rus-Arias, E., 2020. Working in Virtual Teams: A Systematic
Literature Review and a Bibliometric Analysis. IEEE Access 8, 168923–168940.
Agarwal, M., Agrawal, S., Anastasopoulos, A., Bentivogli, L., Bojar, O., Borg, C., Carpuat, M., Cattoni,
R., Cettolo, M., Chen, M., Chen, W., Choukri, K., Chronopoulou, A., Currey, A., Declerck, T., Dong,
Q., Duh, K., Estève, Y., Federico, M., Gahbiche, S., Haddow, B., Hsu, B., Mon Htut, P., Inaguma, H.,
Javorský, D., Judge, J., Kano, Y., Ko, T., Kumar, R., Li, P., Ma, X., Mathur, P., Matusov, E., McNamee,
P., McCrae, J.P., Murray, K., Nadejde, M., Nakamura, S., Negri, M., Nguyen, H., Niehues, J., Niu, X.,
Ojha, A.Kr., Ortega, J.E., Pal, P., Pino, J., van der Plas, L., Polák, P., Rippeth, E., Salesky, E., Shi, J.,
Sperber, M., Stüker, S., Sudoh, K., Tang, Y., Thompson, B., Tran, K., Turchi, M., Waibel, A., Wang, M.,
Watanabe, S., Zevallos, R., 2023. Findings of the IWSLT 2023 Evaluation Campaign. In Salesky, E.,
Federico, M., Carpuat, M., eds. Proceedings of the 20th International Conference on Spoken Language
Translation (IWSLT 2023), Association for Computational Linguistics, pp. 1–61.
Anastasopoulos, A., Barrault, L., Bentivogli, L., Zanon Boito, M., Bojar, O., Cattoni, R., Currey, A., Dinu,
G., Duh, K., Elbayad, M., Emmanuel, C., Estève, Y., Federico, M., Federmann, C., Gahbiche, S., Gong,
H., Grundkiewicz, R., Haddow, B., Hsu, B., Javorský, D., Kloudová, V., Lakew, S., Ma, X., Mathur,
P., McNamee, P., Murray, K., Nǎdejde, M., Nakamura, S., Negri, M., Niehues, J., Niu, X., Ortega, J.,
Pino, J., Salesky, E., Shi, J., Sperber, M., Stüker, S., Sudoh, K., Turchi, M., Virkar, Y., Waibel, A., Wang,
C., Watanabe, S., 2022. Findings of the IWSLT 2022 Evaluation Campaign. In Proceedings of the 19th
International Conference on Spoken Language Translation (IWSLT 2022), 98–157.
Barrault, L., Chung, Y.-A., Meglioli, M.C., Dale, D., Dong, N., Duquenne, P.-A., Elsahar, H., Gong,
H., Heffernan, K., Hoffman, J., Klaiber, C., Li, P., Licht, D., Maillard, J., Rakotoarison, A., Ram
Sadagopan, K., Wenzek, G., Ye, E., Akula, B., Chen, P.-J., El Hachem, N., Ellis, B., Mejia Gon-
zalez, G., Haaheim, J., Hansanti, P., Howes, R., Huang, B., Hwang, M.-J., Inaguma, H., Jain,
S., Kalbassi, E., Kallet, A., Kulikov, I., Lam, J., Li, D., Ma, X., Mavlyutov, R., Peloquin, M.,
Ramadan, M., Ramakrishnan, A., Sun, A., Tran, K., Tran, T., Tufanov, I., Vogeti, V., Wood, C.,
Yang, Y., Yu, B., Andrews, P., Balioglu, C., Costa-jussà, M.R., Celebi, O., Elbayad, M., Gao, C.,
Guzmán, F., Kao, J., Lee, A., Mourachko, A., Pino, J., Popuri, S., Ropers, C., Saleem, S., Schwenk,
H., Tomasello, P., Wang, C., Wang, J., Wang, S., 2023. SeamlessM4T: Massively Multilingual &
Multimodal Machine Translation. URL https://arxiv.org/abs/2308.11596
Becker, J., Porter, E., Centola, D., 2019. The Wisdom of Partisan Crowds. Proceedings of the National
Academy of Sciences 116(22), 10717–10722.
Bender, E.M., Koller, A., 2020. Climbing Towards NLU: On Meaning, Form, and Understanding in
the Age of Data. In Proceedings of the 58th Annual Meeting of the Association for Computational
Linguistics. Association for Computational Linguistics, 5185–5198.
Berard, A., Pietquin, O., Servan, C., Besacier, L., 2016. Listen and Translate: A Proof of Concept for
End-to-End Speech-to-Text Translation. NIPS Workshop on End-to-End Learning for Speech and
Audio Processing, December, Barcelona, Spain.

Machine interpreting

Cath, C., 2018. Governing Artificial Intelligence: Ethical, Legal and Technical Opportunities and
Challenges. Philosophical Transactions of the Royal Society A: Mathematical, Physical and
Engineering Sciences 376(2133).
Chang, C.-C., Chuang, S.-P., Lee, H.-Y., 2022. Anticipation-Free Training for Simultaneous Machine
Translation. In Proceedings of the 19th International Conference on Spoken Language Translation
(IWSLT 2022). Association for Computational Linguistics, Dublin, Ireland, 43–61.
Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K., Chen, H., Yi, X., Wang, C., Wang, Y.,
Ye, W., Zhang, Y., Chang, Y., Yu, P.S., Yang, Q., Xie, X., 2023. A Survey on Evaluation of Large
Language Models. ACM Transactions on Intelligent Systems and Technology 15(3), 1–4.
Cogo, A., 2015. English as a Lingua Franca: Descriptions, Domains and Applications. In Bowles,
H., Cogo, A., eds. International Perspectives on English as a Lingua Franca: Pedagogical
Insights, International Perspectives on English Language Teaching. Palgrave Macmillan UK,
London, 1–12.
Ding, H., Hargraves, L., 2009. Stress-Associated Poor Health Among Adult Immigrants with a Lan-
guage Barrier in the United States. Journal of Immigrant and Minority Health 11(6), 446–452.
Duong, L., Anastasopoulos, A., Chiang, D., Bird, S., Cohn, T., 2016. An Attentional Model for
Speech Translation Without Transcription. In Proceedings of the 2016 Conference of the North
American Chapter of the Association for Computational Linguistics: Human Language Technolo-
gies, Association for Computational Linguistics, San Diego, CA, 949–959.
Fantinuoli, C., 2023. The Emergence of Machine Interpreting. European Society for Translation Stud-
ies 62, 10.
Fantinuoli, C., Dastyar, V., 2022. Interpreting and the Emerging Augmented Paradigm. Interpreting
and Society 2(2), 185–194.
Fettes, M., 1997. Esperanto and Language Awareness. In Van Lier, L., Corson, D., eds. Encyclopedia
of Language and Education: Knowledge About Language, Encyclopedia of Language and Educa-
tion. Springer Netherlands, Dordrecht, The Netherlands, 151–159.
Fitzgerald, E., Hall, K., Jelinek, F., 2009. Reconstructing False Start Errors in Spontaneous Speech
Text. In Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009).
Association for Computational Linguistics, 255–263.
Floridi, L., 2021. The End of an Era: From Self-Regulation to Hard Law for the Digital Industry.
Springer, Rochester, NY.
Floridi, L., Cowls, J., Beltrametti, M., Chatila, R., Chazerand, P., Dignum, V., Luetge, C., Madelin, R., Pagallo,
U., Rossi, F., Schafer, B., Valcke, P., Vayena, E., 2018. AI4People – an Ethical Framework for a Good AI
Society: Opportunities, Risks, Principles, and Recommendations. Minds and Machines 28(4), 689–707.
Fügen, C., 2008. A System for Simultaneous Translation of Lectures and Speeches (Unpublished PhD
thesis). University of Karlsruhe.
Fügen, C., Kolss, M., Bernreuther, D., Paulik, M., Stuker, S., Vogel, S., Waibel, A., 2006. Open
Domain Speech Recognition & Translation: Lectures and Speeches. In 2006 IEEE International
Conference on Acoustics Speech and Signal Processing Proceedings, vol. 1, I–I.
Gaido, M., Papi, S., Fucci, D., Fiameni, G., Negri, M., Turchi, M., 2022. Efficient Yet Competitive
Speech Translation: FBK@IWSLT2022. In Proceedings of the 19th International Conference on
Spoken Language Translation (IWSLT 2022). Association for Computational Linguistics, Dublin,
Ireland, 177–189 (in-person and online), May.
Garcia Becerra, O., 2016a. Do First Impressions Matter? The Effect of First Impressions on the Assess-
ment of the Quality of Simultaneous Interpreting. Across Languages and Cultures 17(1), 77–98.
Garcia Becerra, O., 2016b. Survey Research on Quality Expectations in Interpreting: The Effect of
Method of Administration on Subjects’ Response Rate. Meta 60.
Han, C., 2022. Interpreting Testing and Assessment: A State-of-the-Art Review. Language Testing
39(1), 30–55.
Han, C., Lu, X., 2021. Interpreting Quality Assessment Re-Imagined: The Synergy Between Human
and Machine Scoring. Interpreting and Society 1(1), 70–90.
Hartshorne, J.K., Tenenbaum, J.B., Pinker, S., 2018. A Critical Period for Second Language Acquisi-
tion: Evidence from 2/3 Million English Speakers. Cognition 177, 263–277.
Hu, C., Tian, Q., Li, T., Wang, Y., Wang, Y., Zhao, H., 2021. Neural Dubber: Dubbing for Videos
According to Scripts. Proceedings of the 35th Conference on Neural Information Processing Sys-
tems (NeurIPS 2021).

The Routledge Handbook of Interpreting, Technology and AI

Iranzo-Sànchez, J., Silvestre-Cerdà, J.A., Rosello, N., Sanchis, A., Civera, J., Juan, A. 2020.
Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates. Proceed-
ings of the ICASSP2020 Conference.
Jaeger, F.N., Pellaud, N., Laville, B., Klauser, P., 2019. Barriers to and Solutions for Addressing
Insufficient Professional Interpreter Use in Primary Healthcare. BMC Health Services Research
19(1), 753.
Jia, Y., Johnson, M., Macherey, W., Weiss, R.J., Cao, Y., Chiu, C.-C., Biadsy, F., Chen, Z., Wu,
Y., 2019a. Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation.
URL https://doi.org/10.48550/arXiv.1811.02050
Jia, Y., Ramanovich, M. T., Remez, T., Pomerantz, R., 2021. Translatotron 2: Robust Direct
Speech-to-Speech Translation. URL https://arxiv.org/abs/2107.08661
Jia, Y., Weiss, R.J., Biadsy, F., Macherey, W., Johnson, M., Chen, Z., Wu, Y., 2019b. Direct
Speech-to-Speech Translation with a Sequence-to-Sequence Model. URL https://doi.org/10.48550/arXiv.1904.06037
Kade, O., 1968. Zufall und Gesetzmäßigkeit in der Übersetzung. Verlag Enzyklopädie Edition,
Leipzig.
Kalina, S., 2012. Quality in Interpreting. In Handbook of Translation Studies Online, vol. 3. John
Benjamins Publishing Company, 134–140.
Kano, T., Takamichi, S., Sakti, S., Neubig, G., Toda, T., Nakamura, S., 2018. An End-to-End Model for
Crosslingual Transformation of Paralinguistic Information. Machine Translation 32(4), 353–368.
Karpinska, M., Iyyer, M., 2023. Large Language Models Effectively Leverage Document-Level Context
for Literary Translation, but Critical Errors Persist. URL https://doi.org/10.48550/arXiv.2304.03245
Kato, Y., 1995. The Future of Voice-Processing Technology in the World of Computers and Commu-
nications. Proceedings of the National Academy of Sciences 92(22), 10060–10063.
Kidawara, Y., Sumita, E., Kawai, H., eds., 2020. Speech-to-Speech Translation. SpringerBriefs in
Computer Science. Springer, Singapore.
Ko, Y., Fukuda, R., Nishikawa, Y., Kano, Y., Sudoh, K., Nakamura, S., 2023. Tagged End-to-End
Simultaneous Speech Translation Training Using Simultaneous Interpretation Data. In Proceedings
of the 20th International Conference on Spoken Language Translation (IWSLT 2023). Association
for Computational Linguistics, Toronto, Canada, 363–375.
Kocmi, T., Federmann, C., 2023. Large Language Models Are State-of-the-Art Evaluators of Transla-
tion Quality. URL https://doi.org/10.48550/arXiv.2302.14520
Korybski, T., Davitti, E., Orăsan, C., Braun, S., 2022. A Semi-Automated Live Interlingual Com-
munication Workflow Featuring Intralingual Respeaking: Evaluation and Benchmarking. In Pro-
ceedings of the Thirteenth Language Resources and Evaluation Conference. European Language
Resources Association, Marseille, France, 4405–4413.
Kozuh, I., Debevc, M., 2018. Challenges in Social Media Use Among Deaf and Hard of Hearing People.
In Dey, N., Babo, R., Ashour, A.S., Bhatnagar, V., Salim Bouhlel, M., eds. Social Networks Science:
Design, Implementation, Security, and Challenges. Springer International Publishing, Cham, 151–171.
Kumar, V., Sridhar, R., Chen, J., Bangalore, S., Ljolje, A., Chengalvarayan, R., 2013. Segmentation
Strategies for Streaming Speech Translation. In Proceedings of the 2013 Conference of the North
American Chapter of the Association for Computational Linguistics: Human Language Technolo-
gies. Association for Computational Linguistics, 230–238.
Lala, C., Madhyastha, P., Specia, L., 2019. Grounded Word Sense Translation. In Bernardi, R., Fer-
nandez, R., Gella, S., Kafle, K., Kanan, C., Lee, S., Nabi, M., eds. Proceedings of the Second
Workshop on Shortcomings in Vision and Language. Association for Computational Linguistics,
Minneapolis, MN, 78–85.
Lee, A., Gong, H., Duquenne, P.-A., Schwenk, H., Chen, P.-J., Wang, C., Popuri, S., Adi, Y., Pino, J.,
Gu, J., Hsu, W.-N., 2022. Textless Speech-to-Speech Translation on Real Data. URL https://doi.org/10.48550/arXiv.2112.08352
Li, R., Liu, K., Cheung, A.K.F., 2023. Interpreter Visibility in Press Conferences: A Multimodal Con-
versation Analysis of Speaker – Interpreter Interactions. Humanities and Social Sciences Commu-
nications 10(1), 1–12.
Liebling, D., Heller, K., Robertson, S., Deng, W., 2022. Opportunities for Human-Centered Evalua-
tion of Machine Translation Systems. In Carpuat, M., de Marneffe, M.-C., Meza Ruiz, I.V., eds.
Findings of the Association for Computational Linguistics: NAACL 2022. Association for Com-
putational Linguistics, 229–240.


Ma, M., Huang, L., Xiong, H., Zheng, R., Liu, K., Zheng, B., Popuri, S., Adi, Y., Pino, J., Gu, J., Wang,
H., 2019. STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency
Using Prefix-to-Prefix Framework. In Proceedings of the 57th Annual Meeting of the Association
for Computational Linguistics. Association for Computational Linguistics, 3025–3036.
Macháček, D., Bojar, O., Dabre, R., 2023a. MT Metrics Correlate with Human Ratings of Simultane-
ous Speech Translation. In Proceedings of the 24th Annual Conference of the European Associa-
tion for Machine Translation.
Macháček, D., Polák, P., Bojar, O., Dabre, R., 2023b. Robustness of Multi-Source MT to Tran-
scription Errors. In Proceedings of the 24th Annual Conference of the European Association for
Machine Translation.
Martucci, G., Cettolo, M., Negri, M., Turchi, M., 2021. Lexical Modeling of ASR Errors for Robust
Speech Translation. In Interspeech 2021. ISCA, 2282–2286.
Moslem, Y., Haque, R., Kelleher, J.D., Way, A., 2023. Adaptive Machine Translation with Large
Language Models. In Proceedings of the 24th Annual Conference of the European Association for
Machine Translation, 227–237.
Nachmani, E., Levkovitch, A., Ding, Y., Asawaroengchai, C., Zen, H., Ramanovich, M.T., 2023.
Translatotron 3: Speech to Speech Translation with Monolingual Data. URL https://doi.org/10.48550/arXiv.2305.17547
O’Brien, S., Federici, F.M., eds., 2022. Translating Crises. Bloomsbury Academic.
Papi, S., Gaido, M., Negri, M., 2023. Direct Models for Simultaneous Translation and Automatic Sub-
titling: FBK@IWSLT2023. In Proceedings of the 20th International Conference on Spoken Language
Translation (IWSLT 2023). Association for Computational Linguistics, Toronto, Canada, 159–168.
Papineni, K., Roukos, S., Ward, T., Zhu, W.-J., 2002. BLEU: A Method for Automatic Evaluation of
Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Compu-
tational Linguistics (ACL), 311–318.
Pöchhacker, F., 2002. Quality Assessment in Conference and Community Interpreting. Meta 46(2), 410–425.
Pöchhacker, F., 2022. Interpreters and Interpreting: Shifting the Balance? The Translator 28(2),
148–161, Routledge.
Pöchhacker, F., 2024. Is Machine Interpreting Interpreting? Translation Spaces. John Benjamins
Publishing Company.
Poibeau, T., 2017. Machine Translation. The MIT Press Essential Knowledge Series. The MIT Press.
Popović, M., 2017. chrF++: Words Helping Character N-Grams. In Bojar, O., Buck, C., Chatterjee,
R., Federmann, C., Graham, Y., Haddow, B., Huck, M., Jimeno Yepes, A., Koehn, P., Kreutzer,
J., eds. Proceedings of the Second Conference on Machine Translation. Association for Computa-
tional Linguistics, Copenhagen, Denmark, 612–618.
Saboo, A., Baumann, T., 2019. Integration of Dubbing Constraints into Machine Translation. In Pro-
ceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers). Associa-
tion for Computational Linguistics, Florence, Italy, 94–101.
Salesky, E., Darwish, K., Al-Badrashiny, M., Diab, M., Niehues, J., 2023. Evaluating Multilingual
Speech Translation Under Realistic Conditions with Resegmentation and Terminology. In Pro-
ceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
(In-Person and Online). Association for Computational Linguistics, Toronto, Canada, 62–78.
Salesky, E., Wiesner, M., Bremerman, J., Cattoni, R., Negri, M., Turchi, M., Oard, D.W., Post, M.,
2021. The Multilingual TEDx Corpus for Speech Recognition and Translation. Proceeding of the
Interspeech 2021 Conference.
Sarkar, A., 2016. The Challenge of Simultaneous Speech Translation. In Proceedings of the 30th
Pacific Asia Conference on Language, Information and Computation: Keynote Speeches and
Invited Talks. Seoul, South Korea, 7.
Seeber, K.G., 2012. Multimodal Input in Simultaneous Interpreting: An Eyetracking Experiment. In Zyba-
tov, L., Petrova, A., Ustaszewski, M., eds. Proceedings of the 1st International Conference TRANS-
LATA, Translation & Interpreting Research: Yesterday – Today – Tomorrow. Peter Lang, 341–347.
Seligman, M., 1997. Interactive Real-Time Translation via the Internet. In AAAI.
Sethiya, N., Maurya, C.K., 2023. End-to-End Speech-to-Text Translation: A Survey. URL https://doi.org/10.48550/arXiv.2312.01053
Singh, G., Sharma, S., Kumar, V., Kaur, M., Baz, M., Masud, M., 2021. Spoken Language Iden-
tification Using Deep Learning. Computational Intelligence and Neuroscience. URL https://doi.org/10.1155/2021/5123671


Sperber, M., Paulik, M., 2020. Speech Translation and the End-to-End Promise: Taking Stock of
Where We Are. In Proceedings of the 58th Annual Meeting of the Association for Computational
Linguistics.
Sudoh, K., Kano, T., Novitasari, S., Yanagita, T., Sakti, S., Nakamura, S., 2020. Simultaneous
Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS. URL https://
arxiv.org/abs/2011.04845
Sulubacak, U., Caglayan, O., Grönroos, S.-A., Rouhe, A., Elliott, D., Specia, L., Tiedemann, J., 2019.
Multimodal Machine Translation Through Visuals and Speech, November 2019. URL https://
arxiv.org/abs/1911.12798
Susskind, R., Susskind, D., 2017. The Future of the Professions: How Technology Will Transform the
Work of Human Experts. Oxford University Press.
ten Bosch, L., 2003. Emotions, Speech and the ASR Framework. Speech Communication 40(1), 213–225.
Tomita, M., Tomabechi, H., Saito, H., 1990. Speech Trans: An Experimental Real-Time
Speech-to-Speech. Language Research 24(4), 663–672.
Van Doorslaer, L., 2009. How Language and (Non-)Translation Impact on Media Newsrooms: The
Case of Newspapers in Belgium. Perspectives 17(2), 83–92.
Wahlster, W., ed., 2000. Verbmobil: Foundations of Speech-to-Speech Translation. Springer, Berlin.
Waibel, A., Cho, E., Niehues, J., 2012. Segmentation and Punctuation Prediction in Speech Language
Translation Using a Monolingual Translation System. In IWSLT.
Waibel, A., Jain, A.N., McNair, A.E., Saito, H., Hauptmann, A.G., Tebelskis, J., 1991. JANUS: A
Speech-to-Speech Translation System Using Connectionist and Symbolic Processing Strategies.
URL https://doi.org/10.1109/ICASSP.1991.150456
Wang, H., Gao, W., Li, S., 1999. Utterance Segmentation of Spoken Chinese. Chinese Journal of
Computers 22, 1009–1013.
Wang, X., Fantinuoli, C., 2024. Exploring the Correlation Between Human and Machine Evaluation
of Simultaneous Speech Translation. Proceedings of the 25th Annual Conference of the European
Association for Machine Translation 1, 325–334.
Wang, Y., Wang, J., Zhang, W., Zhan, Y., Guo, S., Zheng, Q., Wang, X., 2022. A Survey on Deploying
Mobile Deep Learning Applications: A Systemic and Technical Perspective. Digital Communica-
tions and Networks 8, 1–17. URL https://doi.org/10.1016/j.dcan.2021.06.001
Wu, S.-C., 2011. Assessing Simultaneous Interpreting: A Study on Test Reliability and Examiners’
Assessment Behavior (PhD thesis). Newcastle University.
Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao,
Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, L., Gouws, S., Kato, Y.,
Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa,
J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., Dean, J., 2016. Google’s Neural Machine
Translation System: Bridging the Gap Between Human and Machine Translation. URL https://doi.org/10.48550/arXiv.1609.08144
Wu, Z., Li, Z., Wei, D., Shang, H., Guo, J., Chen, X., Rao, Z., Yu, X., Yang, J., Li, S., Xie, Y., Wei,
B., Zheng, J., Zhu, M., Lei, L., Yang, H., Jiang, Y., 2023. Improving Neural Machine Translation
Formality Control with Domain Adaptation and Reranking-Based Transductive Learning. In Pro-
ceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023).
Association for Computational Linguistics, Toronto, Canada, 180–186 (in-person and online).
URL https://doi.org/10.18653/v1/2023.iwslt-1.13
Xu, J., Buet, F., Crego, J., Bertin-Lemée, E., Yvon, F., 2022. Joint Generation of Captions and Subtitles
with Dual Decoding. In Proceedings of the 19th International Conference on Spoken Language
Translation (IWSLT 2022). Association for Computational Linguistics, Dublin, Ireland, 74–82
(in-person and online). URL https://doi.org/10.18653/v1/2022.iwslt-1.7
Yang, M., Kanda, N., Wang, X., Chen, J., Wang, P., Xue, J., Li, J., Yoshioka, T., 2023. DiariST: Stream-
ing Speech Translation with Speaker Diarization. URL https://doi.org/10.48550/arXiv.2309.08007
Yao, B., Jiang, M., Yang, D., Hu, J., 2023. Empowering LLM-Based Machine Translation with Cul-
tural Awareness. URL https://ar5iv.labs.arxiv.org/html/2305.14328
Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., Artzi, Y., 2020. BERTScore: Evaluating Text Gen-
eration with BERT. URL https://doi.org/10.48550/arXiv.1904.09675
Zhang, X., 2016. Semi-Automatic Simultaneous Interpreting Quality Evaluation. International Jour-
nal on Natural Language Computing 5, 1–12. URL https://doi.org/10.5121/ijnlc.2016.5501

PART IV

Technology in professional
interpreting settings
13
CONFERENCE SETTINGS
Kilian G. Seeber

13.1 Introduction
When we hear the word technology, many of us will spontaneously associate it with elec-
tronics, computers, algorithms, and probably some form of artificial intelligence. For many
scholars, however, technology has always been a much broader concept (see Mesthene,
1970; Galbraith, 1967; Ferré, 1988; Friedel, 2007), comprising ‘the application of organ-
ised knowledge to practical tasks by ordered systems of people and machines’ (Barbour,
1993, 3). According to this definition, technology can include practical experience, inven-
tions, and scientific theories aimed at producing goods, but also providing services. Cru-
cially, technology can relate to machines as much as it can relate to human beings. It is in
that sense that I will use the concept.
This chapter sets out to describe the way in which technology has been implemented in
conference interpreting settings. Crucially, whilst seemingly self-evident, what constitutes
conference interpreting is not unambiguously clear either and has, at times, been defined
rather loosely, both within and outside the professional community. Scholars have concep-
tualised and categorised interpreting by setting (the environment in which the interpreta-
tion takes place), by mode (the temporal dynamics underlying the interpreting technique),
and by modality (the channels involved in the production and reception of the interpreta-
tion) at different levels of granularity. This is how, over time, the relatively simple and
probably too simplistic dichotomous distinctions of interpreting settings (e.g. conference
vs community), modes (e.g. consecutive vs simultaneous), and modalities (e.g. on-site vs
distance) have given way to more fine-grained, partially overlapping taxonomies. So while
conference interpreting is undoubtedly the most highly professionalised (Baigorri-Jalón
et al., 2021) and perhaps the most readily recognised (Diriker, 2015) interpreting setting,
its definition according to the ‘socio-spatial contexts of interaction’ (Grbić, 2015, 371) in
which it takes place might be insufficient, particularly without further consideration of the
modes and modalities involved.
For instance, professionals regularly conflate simultaneous interpreting and conference
interpreting – either because of the prestige associated with that label (Kurz, 1991) or
because most simultaneous interpreter training happens within the confines of conference


DOI: 10.4324/9781003053248-18

interpreting training programmes (Sawyer and Roy, 2015). Yet even in scholarly circles,
the distinction between setting and mode is not straightforward. Baker and Diriker (2019),
for instance, appear to merge mode and setting when arguing that the terms conference
interpreting and simultaneous interpreting are closely connected because the history of the
former begins with the introduction of the latter. This is in stark contrast to Baigorri-Jalón
(2015), who points to the first Pan-American conference in 1889–1890, where interpreters
were used to ensure mutual understanding across different languages, as an early exam-
ple of conference interpreting, before providing evidence of first attempts at implementing
(technology-enabled) simultaneous interpreting at the International Labour Organization
in Geneva in the mid-1920s. Others, like Chernov (2004) and Svejcer (1999), argue that
(technology-enabled) simultaneous interpreting was first used at the Comintern Congress
in Moscow in 1928, while Gaiba (1998) and Lederer (1984) pinpoint the Nuremberg trials
of 1945 as the birth of this interpreting mode.
While a comprehensive discussion of different interpreting taxonomies and a foren-
sic analysis of historical evidence documenting the origin of different interpreting modes
exceed the scope of this chapter, in this contribution I will focus on the most common inter-
preting modes and modalities in multilingual multilateral conferences. In doing so, I will
deliberately limit the scope of the conference interpreting setting, which has been argued to
include many different types of conference-like events, including high-level bilateral diplo-
matic exchanges, press conferences, workshops, etc. (Diriker, 2015), concentrating instead
on what Pöchhacker calls ‘international conference interpreting’ (2022, 16).
In this contribution, I will focus on the technological developments that impact the confer-
ence interpreting task as it unfolds, rather than those used during the preparation stage for
an interpreting assignment. This is not to suggest that the preparation of a conference cannot
be considered an integral part of a professional conference interpreter’s workflow (see Jiang,
2013). However, the tools used for conference preparation, many of them summarised under
the heading computer-assisted interpreting (CAI), are not unique to the conference interpret-
ing setting and are already comprehensively covered by Prandi (this volume).
This chapter opens with a short introduction to the beginnings of international confer-
ence interpreting in Section 13.2, before addressing some of the most salient technological
developments by mode, that is, consecutive and simultaneous conference interpreting.
To that end, Section 13.3 briefly describes the place of consecutive interpreting in multi-
lingual multilateral conferences, whereas Section 13.4 introduces an operative task model
for this modality. The use of note-taking, digital voice recorders, digital pens, tablets, auto-
matic speech recognition (ASR), and machine translation (MT), as well as distance inter-
preting (DI), in consecutive conference interpreting is then discussed in six subsections. In
Section 13.5, the chapter pivots to simultaneous interpreting, once more outlining how and
when it was introduced in international conference settings. Section 13.6 introduces the
notion of technology-enhanced simultaneous interpreting, feeding into the respective dis-
cussion of booths, microphones and headsets, electronic glossaries, DI, as well as ASR and
MT in simultaneous conference interpreting in four subsections. The final section provides
a short conclusion and outlook.

13.2 The beginnings of international conference interpreting


Until the beginning of the 20th century, French was widely used as a lingua franca at inter-
national conferences (Wright, 2006), making them largely monolingual affairs and the use


of interpreters unnecessary. And yet select accounts corroborate the reliance on interpreters
(at that time referred to as oral translators) to overcome the language barrier among differ-
ent conference participants as early as 1889:

Two other gentlemen, Mr. Starr-Hunt and Mr. Romero, are also entitled to a mention
for the able and conscientious manner in which they fulfilled their difficult task as
oral translators. As the deliberations of the Congress were in the English and Spanish
languages, it was necessary for the benefit of the United States delegates, that all the
remarks in Spanish be immediately translated into English. . . . This applies as well
to the observations and speeches made at various times by M. Léger, delegate from
Haiti, whose remarks in French were translated into English.
(Noel, 1902, 68)

International conferences, however, were not regularly multilingual before the founding of
the International Labour Organization (ILO) in 1919 and the League of Nations (LoN) in
1920, which initially worked primarily in French and English (Baigorri-Jalón, 2005), or
the Communist International in 1919, which chiefly relied on German and French and, to
a lesser extent, English and Russian (Riddell, 2015; but see Chernov, 2016a).1 The statu-
tory, and thus systematic, use of multiple languages in international conferences, therefore,
eventually led to the professionalisation of what had already existed as a practice several
decades prior (Baigorri-Jalón et al., 2021).

13.3 Consecutive interpreting at international conferences


The principal interpreting mode used in newly established international organisations,
such as the ILO, the LoN, and the Comintern, was consecutive, although whispered
interpretation into languages other than the official ones is likely to have been pro-
vided for the benefit of delegations who otherwise could not have followed proceedings
(Baigorri-Jalón, 2005). Consecutive interpreting entails the comprehension of an utter-
ance in the source language and its subsequent oral translation in the target language
(Seeber and Amos, 2023). Today, this interpreting mode is much more often associated
with settings outside of the realm of international conferences (including community,
legal, diplomatic, educational, healthcare, police interpreting, etc.; see Pöchhacker, 2022)
and only constitutes a small fraction of professional conference interpreters’ work. The
most recent worldwide statistics (AIIC, 2024) show that freelance conference interpreters
worked an average of 97 days per year in simultaneous, compared to only 14 in consecutive.
For staff interpreters, this ratio is even more slanted, with only 2 days of consecutive com-
pared to 152 days of simultaneous interpreting per year. In 1919, however, consecutive
was the principal interpreting mode used for the official languages spoken on the con-
ference floor. Importantly, to this day, consecutive conference interpreting, which Dam
(2010) and Pöchhacker (2011) call ‘classic’ consecutive, while Viezzi (2013) refers to it
as ‘long consecutive’, is characterised by monologic speeches or speech segments lasting
several minutes. This is also reflected in the European Masters in Conference Interpreting
(EMCI) core curriculum,2 requiring 5–7 min speeches for consecutive interpreting final
exams, as well as in the interinstitutional accreditation tests3 for freelance interpreters
at the European Union, where candidates are expected to interpret speeches of 6 min.
From a processing perspective, therefore, the principal cognitive demand of consecutive


conference interpreting is on the interpreter’s memory (Bajo and Padilla, 2015; Yenki-
maleki and Van Heuven, 2017), making the reliance on written notes to support their
memory one of its hallmarks.

13.4 Technology in consecutive conference interpreting


As Barbour’s (1993) broad definition of technology comprises the execution of practical
tasks, we begin by outlining the cognitive structures and processes associated with the per-
formance of that task (Cooke, 1992). This so-called cognitive task analysis (Clark et al.,
2008) will serve as the starting point of our discussion of the impact different technologies
have had on consecutive conference interpreting. Seleskovitch (1968) conceptualises the
interpreting process as comprising three fundamental component tasks – comprehension,
deverbalisation, and production. Gile (1995/2009), on the other hand, conceives of con-
secutive interpreting as consisting of two phases, the first one comprising listening and
analysis, note-taking, short-term memory operations, and coordination, and the second one
of remembering, note-reading, and production, for a grand total of seven component tasks.
Based on observational, process-tracing, and conceptual techniques (Cooke, 1994) applied
during the development and implementation of the MASIT,4 I suggest a highly stylised
five-component task model for the consecutive conference interpreting process, including
listening, analysing, memorising, retrieving, and producing. In the discussion of the tech-
nological developments pertaining to consecutive conference interpreting, I will attempt to
highlight the component task it mainly affected.

13.4.1 Note-taking in consecutive conference interpreting


There are (primarily anecdotal) accounts of exceptional consecutive conference interpret-
ers, including George and André Kaminker, who were allegedly able to render speeches
of over an hour without taking a single note (Baigorri-Jalón, 2004). Yet many scholars
(Kirchhoff, 1979; Moser-Mercer et al., 1997; Ahrens and Orlando, 2021; Seeber and Amos,
2023) have argued that the cognitively most demanding part of the task is memorising and
retrieving information, which in the case of consecutive conference interpreting is intrinsi-
cally linked with the taking and reading of notes. It is probably no coincidence, then, that
the first technological advancement in consecutive conference interpreting, considered by
some a non-technological mode of interpreting (Ahrens and Orlando, 2021), was the devel-
opment of note-taking systems, implemented with the rudimentary tools of that time, pen
and paper.
From the pioneering work of early practitioners like Herbert (1952) and Rozan (1956)
to contemporary professionals like Gillies (2017), the fundamental tenets of note-taking
techniques (e.g. focusing on ideas, and thus semantics, over words, and thus lexicon) do
not seem to have changed, as their principal objective is one of reducing the interpreter’s
memory load (required for the storage of information) between the listening and the pro-
duction task. But note-taking does not only interact with the storage and retrieval compo-
nent of memory, allowing off-loading and cuing (Seeber and Amos, 2023); it also interacts
with decision-making and thus the analysis task (Piolat et al., 2005). Consequently, notes
are also used to reflect the structure in which the ideas were presented and the way in which
they relate to each other.

232
Conference settings

Different technological advancements in information and communication technology (ICT) of the early 21st century have been adopted or repurposed by conference interpreters and applied to an interpreting mode that, from a process perspective, had not undergone many substantive changes for decades.

13.4.2 Digital voice recorders in consecutive conference interpreting


The first, and arguably most drastic, use of modern ICT in consecutive conference interpret-
ing was the use of digital voice recorders (Ferrari, 2001, 2002).5 In self-experiments and
subsequent small-scale user tests, Ferrari replaced the note-taking process by using a per-
sonal digital assistant to record the original speech and subsequently providing a simulta-
neous interpretation thereof (see Section 13.5) while replaying it through earphones. This
once more illustrates how interpreters attempt to address one of the major challenges of
consecutive conference interpreting through technology. A solution that bypasses the note-taking and note-reading tasks altogether has the potential to significantly reduce the
storage load experienced by interpreters during the long speech segments encountered in con-
ference settings. This gave rise to hybrid interpreting modes, referred to as ‘digitally
remastered consecutive’ and ‘technology-assisted consecutive’ (Ferrari, 2002), ‘DRAC – digi-
tal recorder assisted consecutive’ (Lombardi, 2003), ‘digital voice recorder–assisted consecu-
tive interpreting’ (Camayd-Freixas, 2005), ‘SimConsec’ (Hamidi and Pöchhacker, 2007), and
‘simultaneous-consecutive’ (Setton and Dawrant, 2016; Orlando and Hlavac, 2020). Despite
the different nomenclature used, it seems that both practitioners and scholars intuitively con-
ceived of this new hybrid mode as a type of consecutive interpretation, foregrounding the time
lag between the original and the interpretation. From a processing perspective, however, an
equally valid case could be made for it to be considered a type of simultaneous interpreting,
seeing that the interpretation is performed in real time, albeit based on a recorded original.
Beyond the terminological issue, what remains is a hybrid interpreting mode that,
although reducing, if not altogether eliminating, the cognitive load associated with the
storage of information, arguably replaces it with an increase in the processing load asso-
ciated with simultaneous interpreting (see Section 13.5). And yet this technology lets the
interpreter listen to the entirety of a speech (or segment thereof) prior to interpreting it
simultaneously, potentially allowing for more predictive processing (Amos et al., 2022)
and the anticipation of speaker- and speech-related challenges (Mankauskienė, 2016). The
ability to additionally take notes might then allow the interpreter to pre-empt and minimise
downstream challenges. Practical experimentation continued, mainly in court interpreting
settings (see Lombardi, 2003; Camayd-Freixas, 2005), where norms for completeness and
accuracy might differ from those applicable to consecutive conference interpreting. Eventu-
ally, a series of small-scale experiments was carried out in Vienna, where Hamidi and Pöch-
hacker (2007), Sienkiewicz (2010), and Hawel (2010) contrasted consecutive interpreting
of monologic speech segments lasting several minutes using traditional pen-and-paper notes
with consecutive simultaneous interpretation based on digital voice recordings. Although
these studies involved very few participants and had several methodological caveats (Svo-
boda, 2020), they suggest that fluency, prosody, expression, and completeness all slightly
improve in consecutive simultaneous. The increased eye contact with the audience, on the
other hand, was deemed responsible for the more favourable overall evaluation of tradi-
tional pen-and-paper-based consecutive interpreting.


13.4.3 Digital pens in consecutive conference interpreting


Digital pens are devices able to capture analogue handwriting by means of cameras, track-
ball sensors, accelerometers, etc. and convert it into a digital format (see Orlando, 2015).
The release of the dot paper–based digital pen by Livescribe in 20076 was soon followed
by a small-scale experiment by Hiebl (2011), who compared professional interpretations
of different speeches based on traditional pen-and-paper notes to simultaneous-consecutive
interpretations using the audio recording functionality of the digital pen. Live audience
evaluations showed that interpreters scored higher when working with traditional pen and
paper across all evaluation categories (including clarity, fluency, intonation, and contact
with the audience) than when using the digital pen to provide a simultaneous-consecutive
interpretation. The interpreters themselves considered this modality a potential training
tool for consecutive interpreting, something already explored by Orlando (2010).
Experimental studies comparing the accuracy and completeness of traditional pen-and-
paper-based consecutive interpretation to digital pen–based simultaneous-consecutive inter-
preting, on the other hand, indicate that the number of meaning units transposed increases
with this new technology (Orlando, 2014). What is more, overall accuracy has tentatively been
shown to be higher when using digital pen–based simultaneous-consecutive than when using
a separate digital voice recording device (Mielcarek, 2017). This last finding is interesting and
needs to be considered against the experiment’s methodology. To allow a three-way compari-
son, participants either took notes on pen and paper for their consecutive interpretation or
used the digital pen to take notes whilst, at the same time, recording the original and eventu-
ally working in simultaneous-consecutive, or they recorded the original with a digital voice
recorder and, without taking any notes, eventually performed a simultaneous-consecutive
interpretation. This means, then, that the comparison was one of a consecutive performance
based on notes only to two different types of simultaneous-consecutive performances: one
whereby the interpreter had taken notes and one without. Having analysed the entire speech for
the purpose of producing consecutive notes before rendering it in simultaneous-consecutive,
therefore, seems to have benefitted overall accuracy. It seems warranted to conclude that,
unlike the use of digital voice recorders, the use of digital pens will interact with the analysis
task as much as traditional pen-and-paper note-taking.
The question about whether interpreters could combine their traditional notes (taken on
dot paper using a digital pen) with the recorded original during the production task, thus
allowing them to exploit redundancy and complementarity effects (Seeber, 2017a), remains
unanswered. In terms of the component tasks identified earlier, it is important to keep
in mind that, much like digital voice recorders, the use of this technology fundamentally
changes the task, making it much more akin to that of simultaneous interpreting.

13.4.4 Tablets in consecutive conference interpreting


Although the first commercial tablet computers were developed at the end of the 20th
century, a genuine tablet revolution was not ushered in until Apple’s release of the first
iPad in 2010. Tablet computers with tactile interfaces quickly became ubiquitous, and
first attempts were made to repurpose them as a replacement for the seemingly old-fashioned
pen-and-paper tools used by consecutive interpreters until then. While it did not take long
for individual practitioners to experiment with this technology and share their views on


blogs and websites, the first systematic small-scale user survey on the use of tablet comput-
ers in consecutive interpreting (although not exclusively in conference settings) was carried
out by Goldsmith and Holley (2015). It concludes with a contrastive analysis of technical,
visual, physical, as well as client relationship parameters. Of the long list of parameters que-
ried in the study, however, many relate to the physical features of the tool (stylus and tablet
vs pen and paper), with very few having a strong link to the component tasks identified
earlier. Among the latter is the ability to write emulating different styles (e.g. pen, pencil,
marker) in different colours and to erase notes. These perceived advantages of tablets have
the potential to impact the memorising task.
Similarly, the ability to increase the size of notes, to display multiple pages at once, and
to scroll vertically through notes without physically turning pages all potentially impact
the retrieval task. This means that, although at first glance the use of tablets in consecutive
conference interpreting might look revolutionary, this might be deceiving. And yet Altieri’s
(2020) contrastive empirical study on interpreting students’ note-taking on notepads and
tablets reveals some interesting tendencies. On the one hand, the number of short pauses
and looks at the audience are slightly lower when interpreting with a tablet. On the other
hand, when assessing their own performance, (student) participants rated their expressive-
ness, accuracy, faithfulness, coherence, and communicative effect substantially lower when
interpreting with a tablet. The extent to which lack of familiarity (despite five training ses-
sions) or, indeed, evaluator bias influenced these results remains unclear.
One advantage of tablet interpreting, other than the reduction in paper waste (although
the CO2 footprint generated by tablet computers has been estimated at 100 kg/year; see
Lövehagen et al., 2023), seems to be related to the note-reading more than the note-writing
task: scrolling rather than flipping through notepads and the possibility of showing all
pages at once are small but perhaps not-altogether-inconsequential improvements. The
underlying principles of the consecutive conference interpreting task, however, seem largely
unaffected by this technological development.

13.4.5 Speech recognition and machine translation in consecutive conference interpreting

Recent advances in large language models (LLM) and artificial intelligence have paved the
way for automatic speech recognition (ASR) and machine translation (MT) to be integrated
in the consecutive interpreting process. Wang and Wang (2019) propose the combined use
of ASR and MT to complement interpreters’ pen-and-paper notes using off-the-shelf technology (e.g. Dragon and Google Translate) in what they label SR-CAI (speech recognition computer-assisted interpreting). Their experiment reveals a tendency towards better accuracy when
interpreters have access to a machine translation of the automatically recognised original
along with their notes than when relying exclusively on the latter. Conversely, the impact
of SR-CAI on fluency is inconclusive. In a similar vein, Chen and Kruger (2023) suggest the
application of ASR and MT in what they refer to as computer-assisted consecutive inter-
preting (CACI), whereby note-taking is entirely replaced. They argue that this augmented
form of consecutive interpreting is particularly suitable for conference settings, owing to the
high demands placed on accuracy and completeness.
Unlike SR-CAI, the fundamental idea behind CACI is to replace the note-taking task
with a respeaking task, increasing the accuracy of automatic speech recognition, and the


note-reading task with a reading task of either the automatically recognised original or its
machine translation (Chen and Kruger, 2024). This significantly alters the traditional consecutive interpreting task: the comprehension stage now shares similarities with phrase shadowing (Norman, 1976), while the production stage can range from a simple reading task (of a machine-translated text) to a paraphrase (McCarthy et al., 2009) or a sight translation (Chmiel and Mazur, 2013) of the automatically recognised original
text. While accounts of real-life applications of this technology do not seem to have been
documented, first experiments suggest that overall quality gains can be significant, espe-
cially for interpreters working into a foreign language, thus providing a so-called ‘retour’
(Loiseau and Delgado Luchner, 2021). More specifically, Chen and Kruger’s (2024) study
suggests improved fluency and delivery when providing a consecutive retour interpretation,
which in a UN context is chiefly limited to interpreters working into Chinese and Arabic,
while in an EU context is mainly applicable to interpreters working in Eastern European
languages.7 It is interesting to note that subjective cognitive load was perceived to be lower
during CACI when working into, but not from, the foreign language.
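The ASR–MT cascade underlying SR-CAI and CACI can be illustrated schematically. The sketch below is purely a conceptual stand-in, not the tooling used in the studies cited above: the `transcribe` and `translate` functions are hypothetical placeholders for an ASR engine and an MT engine, and the glossary-based “translation” merely makes the pipeline runnable.

```python
# Conceptual sketch of the ASR + MT cascade behind SR-CAI/CACI workflows.
# transcribe() and translate() are illustrative stand-ins for real ASR/MT
# engines; the glossary-based lookup exists only to make the sketch runnable.

def transcribe(audio_segment: str) -> str:
    """Stand-in ASR: pretend the 'audio' has already been recognised as text."""
    return audio_segment.strip()

def translate(text: str, glossary: dict[str, str]) -> str:
    """Stand-in MT: word-by-word glossary lookup, keeping unknown words as-is."""
    return " ".join(glossary.get(word, word) for word in text.split())

def caci_pipeline(audio_segment: str, glossary: dict[str, str]) -> str:
    """CACI-style support: recognised speech is machine-translated, producing
    a text the interpreter can read or sight-translate during production."""
    transcript = transcribe(audio_segment)
    return translate(transcript, glossary)

if __name__ == "__main__":
    gloss = {"hello": "bonjour", "world": "monde"}
    print(caci_pipeline("hello world", gloss))  # bonjour monde
```

In an actual CACI set-up, the respeaking step described above would precede `transcribe`, precisely to raise ASR accuracy before the machine translation stage.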

13.4.6 Distance consecutive conference interpreting


Distance interpreting (DI) is a blanket term comprising all forms of information and communication technology (ICT)–enabled interpreting where the speaker is in a different location from the
interpreter (ISO, 2020, 2022) regardless of mode or setting. Technically speaking, it can be
divided into four principal categories: audio- and videoconference interpreting, and audio and video remote interpreting (for a detailed discussion, see Seeber and Fox, 2021). Reliable
post-COVID-19 data are scarce, but an AIIC-wide survey from 2018 indicates that the
most prevalent DI modality used for consecutive interpreting was ARI, or audio remote
interpreting (Seeber, 2020). It is important to underscore that the implementation of DI
technologies, from simple telephone connections to complex audio- and videoconferencing
hardware or software, seems to impact, first and foremost, the listening component task,
owing to the all-too-often-insufficient quality of the audio signal provided (AIIC, 2019).

13.5 Simultaneous interpreting at international conferences


As an interpreting mode defined by its temporal characteristics, during which the oral com-
prehension of the source language and the oral production of the output in the target lan-
guage temporally overlap (Seeber and Amos, 2023), simultaneous interpreting has existed
in the form of whispered interpretation, or ‘chuchotage’ (Diriker, 2015), long before the
birth of modern multilingual multilateral diplomacy, which is usually associated with the
creation of the League of Nations in 1919. As both the International Labour Organization
(ILO) and the United Nations (UN) operated in more than one language, however, consecu-
tive conference interpreting soon proved to be too time-consuming (Ryder, 2021). Early
tests based on repurposed telephone technology (Gordon-Finlay, 1927) led to the devel-
opment of two distinct new interpreting modes: simultaneous interpretation and instan-
taneous interpretation (Baigorri-Jalón, 2021). The former consisted in delivering several
consecutive interpretations (thus, once the speaker had finished) into different languages
at the same time, ensuring that all interpretations were provided simultaneously with each
other and not sequentially. The latter, on the other hand, consisted in providing the inter-
pretation at the same time as the original speech unfolds, making it simultaneous with the


original. This latter mode, which was eventually renamed and has since been known as
simultaneous interpreting, will be the object of the discussion that follows.

13.6 Technology in simultaneous conference interpreting


In its contemporary form, simultaneous interpreting has been inextricably linked to audio-
visual technology, such as headphones, microphones, consoles, and soundproof booths,
which were crucial in its implementation at multilingual international conferences. The
first attempts at repurposing existing telephone technology to provide multilingual inter-
pretation in real time can be traced back to the mid-1920s both in Switzerland and in
the Soviet Union. Technology-enabled simultaneous interpreting was first offered at the
ILO in Geneva and, only a few days later, at the Communist International (Comintern)
in Moscow, in 1928 (Chernov, 2016a; Baigorri-Jalón, 2004). Albeit with small technical
differences, the two systems were conceptually comparable, as both placed interpreters in a separate room or booth, where they received the original sound over headphones and delivered the interpretation into microphones in real time. This allowed delegates in the meeting
room to listen to the interpretation of their choice. It was not until the highly mediatised Nuremberg trials of 1945, however, that simultaneous interpreting became known
to the public at large and, soon afterwards, the interpreting mode of choice at the United
Nations, its specialised agencies, and the European institutions. While the design and
underlying architecture of consoles, microphones, and headphones evolved over time, the
simultaneous interpreter’s workplace only underwent relatively minor changes for three
quarters of a century (Seeber, 2015). The way in which different technologies impacted
simultaneous conference interpreting will once more be discussed based on a component
task model. For the sake of simplicity, the same five-step model comprising listening, ana-
lysing, memorising, retrieving, and producing will be used, with the understanding that
while the tasks might be the same, the way they are executed and the skills they draw upon
may differ considerably, owing to the different temporal characteristics of consecutive and
simultaneous interpreting.

13.6.1 Booths, microphones, and headsets in simultaneous conference interpreting

One of the first technical innovations concomitant with the new simultaneous inter-
preting mode was soundproofing, to avoid interpreters talking over each other and the
delegates. This was initially achieved by using the so-called ‘Hush-a-Phone’, a mechani-
cal sound-muffling device developed in 1921 as an after-market attachment for candle-
stick telephones to improve privacy. Eventually, neck-worn collar microphones or fixed
funnel-shaped microphones replaced the Hush-a-Phone. Headsets were initially only pro-
vided to delegates in the conference room, but not interpreters, while interpreting booths
only slowly developed from semi-enclosed cubicles or partitions (e.g. at the Comintern
Congress in 1935 or the Nuremberg trials in 1945) to the fully enclosed, sound-insulated
structures in use today (see Baigorri-Jalón, 2021; Gaiba, 1998; Chernov, 2016a, 2016b,
2018). Modern industry standards for permanent booths (ISO, 2016a) and mobile booths
(ISO, 2016b) specify the minimum sound insulation required between interpreters’ booths
and the conference room, as well as from one booth to another. Those for interpretation


equipment (ISO, 2016c, 2019) define minimum specifications for headphones, micro-
phones, integrated headsets combining earphones and microphones, along with consoles
and booth furniture. Some of the parameters most fundamental to simultaneous interpreters, namely the quality of the sound transmitted to their headsets (including frequency response, total harmonic distortion, and signal-to-noise ratio), are also enshrined in these standards. These early technological advances were crucial for the successful implementation of simultaneous conference interpreting, as they provided the necessary environment to carry out the unnatural task of speaking in one language while listening to another.
As compared to whispered simultaneous interpreting, these advances principally impacted
the listening and producing component tasks, as the acoustic separation of input and out-
put facilitated their simultaneous execution.

13.6.2 Electronic glossaries in simultaneous conference interpreting


The first generation of electronic glossaries to find their way into simultaneous confer-
ence interpreters’ workplace in the mid-1980s were static documents based on word
processors or spreadsheet applications whose main function was to visually and logically
organise terminology (Fantinuoli, 2018). Around the turn of the 21st century, tailored
electronic terminology management systems were developed, introducing a second gener-
ation of CAI tools8 (Guo et al., 2023; Fantinuoli, 2018). Given their dual purpose, which
goes beyond streamlining the preparation process and comprises the improvement of the
final product through their ability of allowing efficiently querying terms in real time, their
development has been said to be firmly rooted in simultaneous conference interpreting
settings (Will, 2020). An AIIC-wide survey from a decade ago (Jiang, 2013) indicates that
almost three-quarters of all conference interpreters prepare glossaries for most of their
meetings, with up to two-thirds using them in real time when interpreting. The same sur-
vey suggests that over three-quarters of interpreters rely on text-processing applications
or spreadsheets for the management of their glossaries, and thus static first-generation
tools (Guo et al., 2023), while only one-sixth rely on specifically purposed dynamic glos-
sary software, in other words, second-generation tools (Guo et al., 2023). The time constraints associated with simultaneous conference interpreting, which allow for a lag of no more than a few seconds (Timarová et al., 2011), have over time led to the development of software applications designed with cognitive ergonomics in mind (see Seeber and Arbona, 2020), making them much simpler and, crucially, faster to use (Stoll, 2009; Will, 2009; Ruetten, 2003; Fantinuoli, 2016). This was done with a view to reducing the task interference and cognitive load associated with adding a manual look-up task to simultaneous interpreting (Prandi, 2023), which in itself already constitutes a complex combination of comprehension and production (Seeber, 2007, 2017b). In her study on interpreting students, Prandi (2023) finds evidence for
slightly higher terminological accuracy, fewer errors and omissions, and a shorter ear–
voice span for interpreting with second-generation dynamic CAI tools than for interpret-
ing with first-generation static electronic glossaries. She concludes that dynamic glossary
management tools not only have less distraction potential than static glossaries but also
generate less cognitive load and can thus be integrated into the simultaneous conference
interpreting task at less cost. From a processing perspective, therefore, this technologi-
cal development mainly affects and arguably facilitates the retrieving component task of
simultaneous interpreting.


13.6.3 Distance simultaneous conference interpreting


Unlike consecutive interpreting, simultaneous interpreting, especially simultaneous confer-
ence interpreting, has been affected by the rapid spread of various distance interpreting (DI)
modalities following the COVID-19 pandemic. Yet attempts at providing DI solutions in
conference settings can be traced back to as early as the mid-1970s, when audio and video
remote interpreting (ARI and VRI) set-ups, today often referred to as remote simultaneous
interpreting (RSI), were tested during UNESCO’s General Conference (UNESCO, 1976),
as well as at the UN (Chernov, 2004). Systematic experiments followed several years later,
at the International Telecommunication Union in 1999 (Moser-Mercer, 2003) and at the European Parliament in 2001 (Mouzourakis, 2006) and 2004 (Roziner and Shlesinger,
2010). The first systematic use of DI in international conference settings started at the
European Commission in 2005, when, based on the so-called Hampton Court agreement,
the use of VRI during working lunches of the Council was implemented, eventually leading
to an interinstitutional agreement on the use of remote interpreting in 2007 (Seeber and
Fox, 2021).
The use of DI in international conference interpreting increased drastically in 2020,
when DI was seen as the only solution to ensure business continuity of institutions and
organisations paralysed by pandemic-related travel restrictions and sanitary measures,
accelerating the development of so-called simultaneous interpreting delivery platforms
(SIDPs). Most international organisations and supranational institutions have since inte-
grated these new modalities into their modality portfolio, allowing for different forms
of active or passive online participation in their meetings. Some, however, reverted to
providing mainly in-person meetings with on-site interpreters. One noteworthy excep-
tion is the European Patent Office, which opted to keep the entirety of their opposition
proceedings online even after the COVID-19-related restrictions were lifted. In fact, after
piloting the new videoconference modality between 2021 and 2022, with some 2,400
oral proceedings including interpretation (EPO, 2022a), online proceedings with remote
interpretation were made the default modality at the beginning of 2023 (EPO, 2022b).
The recurring issues of insufficient sound quality leading to increased cognitive load and
fatigue (Seeber, 2022), as well as its potentially detrimental effects on interpreters’ auditory health (Brady and Pickles, 2022), including through acoustic shocks (AIIC, n.d.),
were countered by adjusting working conditions and enforcing strict technical guidelines
as well as systematic test calls (EPO, n.d.). Although 85% of all proceedings via vide-
oconference have been reported to have ‘high-quality sound’ (EPO, 2022b), in processing
terms, the listening task component might remain the most impacted by this technological
advancement.

13.6.4 Speech recognition and machine translation in simultaneous conference interpreting

More so than for the purpose of consecutive conference interpreting, automatic speech rec-
ognition (ASR) and machine translation (MT) have recently been combined into real-time
speech translation solutions to support simultaneous interpreters in the booth. This tech-
nological development, known variously as ‘electronic’, ‘virtual’, ‘digital’, or ‘artificial’
boothmate (Fantinuoli, 2017, 2023; Prandi, 2023), allows for the automatic recognition,
transcription, translation, and visualisation of entire speech segments or certain speech


constituents known to be particularly prone to depleting interpreters’ cognitive resources, such as numbers, abbreviations, proper names, and technical terms (Gile, 1995/2009; Seeber, 2015; Mankauskienė, 2016). A first series of experiments comparing simultaneous
conference interpreting with and without this artificial boothmate suggests that it holds the
potential of increasing the accuracy of interpreted numbers by anywhere from 20% to 40%
(Desmet et al., 2018; Defrancq and Fantinuoli, 2021; Pisani and Fantinuoli, 2021).
Additionally, Prandi (2023) finds first evidence for a decrease in terminological errors
and an increase in fluency associated with the use of an artificial boothmate compared
to the use of static or dynamic electronic glossaries and makes a theoretical case for
a decrease in cognitive load when using the former as compared to the latter. These
potential gains are contingent on the latency with which the system is able to recognise,
produce, and visualise target language tokens, as the accuracy in the rendition of both
numbers and referents drops once the latency goes beyond 3 sec (Fantinuoli and Montec-
chio, 2023). Given that simultaneous conference interpreters usually operate with a lag
of between 2 and 5 sec, and that accuracy declines with increasing lag (Timarová et al.,
2011), this is not surprising. Potential additional limitations of the system include the
accuracy of its ASR component when faced with accents (see DiChristofano et al., 2023;
Feng et al., 2024) and its MT component when faced with non-standard forms of English
(Anastasopoulos et al., 2019).
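The latency constraint discussed above can be made concrete with a small sketch. It is a hypothetical display policy, not a description of any existing boothmate system: only the 3-second threshold comes from the findings cited in the text, while the function name, data shape, and token examples are invented for illustration.

```python
# Hedged sketch of a latency-gated display policy for an artificial boothmate.
# Only the ~3-second threshold comes from the cited findings; the function
# name, data shape, and example tokens are hypothetical.

LATENCY_THRESHOLD_S = 3.0  # beyond this, displayed tokens stop aiding accuracy

def useful_tokens(tokens: list[dict]) -> list[str]:
    """Keep only recognised tokens (numbers, terms, names) whose end-to-end
    latency is low enough to still fall within the interpreter's typical lag."""
    return [t["text"] for t in tokens if t["latency_s"] <= LATENCY_THRESHOLD_S]

if __name__ == "__main__":
    stream = [
        {"text": "4.7 billion", "latency_s": 1.8},  # arrives in time
        {"text": "total harmonic distortion", "latency_s": 2.9},  # in time
        {"text": "Comintern", "latency_s": 3.4},  # too late to help
    ]
    print(useful_tokens(stream))
```

Such a gate reflects the finding that support arriving after the interpreter has already rendered the passage offers no benefit and may simply add visual load.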
Both phenomena have been identified as potential problem triggers and have become a hallmark of international conference interpreting (Albl-Mikasa, 2021). Finally,
two questions remain largely unanswered and will undoubtedly constitute the focus of
future research. On the one hand, while a gain in accuracy (and, in some cases, fluency)
has been found, we are still at the very beginning of understanding the underlying cogni-
tive ergonomic implications of this technology. In other words, we do not yet know how
the additional information provided by the artificial boothmate affects the overall cognitive
load experienced by the interpreter. On the other hand, while the accuracy of single tokens
like numbers or technical terms visibly benefitted from this technological augmentation,
the question about its impact on the overall quality of the output has yet to be answered.

13.7 Conclusion
Many of the technological developments discussed in this chapter, from booths, headsets,
and consoles to electronic glossaries and distance interpreting solutions, have been success-
fully integrated in international conference interpreting. While some of them were imme-
diately welcomed with excitement, others were initially met with resistance. Some, like
digital voice recorders or digital pens, were hailed as potentially revolutionary but never
really made much of an appearance in the conference room. And yet, even though it is fair
to say that they have not become part of the professional toolkit, they seem to have found
their way into the classroom as a training tool. This shows how, regardless of the hype
generated around a technological development, the factors leading to its acceptance and
adoption are complex. Most technological advances only address a small number of facets
of the interpreting task, meaning that, while they can be argued to provide added value,
this value comes at a cost – often a cognitive cost. All other things being equal, therefore, it
seems that conference interpreters might need to be convinced of a technology’s net benefit
to overcome what might be an initial reluctance to fix something that ain’t broke.

Conference settings

Notes
1 According to Chernov (2016a), the first Congress of the Comintern had adopted German and
Russian as working languages, with occasional interpretation provided from French, English, and
Chinese.
2 EMCI – European Masters in Conference Interpreting. URL www.emcinterpreting.org/core-
curriculum/ (accessed 15.7.2024).
3 Interinstitutional accreditation tests. URL https://europa.eu/interpretation/freelance_en.html
(accessed 31.3.2025).
4 MASIT – Master of Advanced Studies in Interpreter Training. URL www.unige.ch/formcont/en/
courses/masit (accessed 31.3.2025).
5 Original blog posts no longer accessible; first referenced in Hamidi and Pöchhacker (2007).
6 Livescribe, the maker of the first paper-based smartpen, was acquired by Anoto in 2015.
7 It should be noted that, when on field missions, UN interpreters are regularly called upon to pro-
vide a ‘retour’, although often in simultaneous mode (Ruiz Rosendo et al., 2021). As for the EU,
interpreters with the necessary qualifications – regardless of their A language – may be called upon
to provide a retour when on mission or when providing interpretation ad personam (Tanzella and
Alvar Rozas, 2011).
8 For a conceptual and terminological discussion, see Guo et al. (2023).

References
Ahrens, B., Orlando, M., 2021. Note-Taking in Consecutive Interpreting. In Albl-Mikasa, M., Tise-
lius, E., eds. The Routledge Handbook of Conference Interpreting. Routledge, Abingdon, 34–48.
URL https://doi.org/10.4324/9780429297878
AIIC, 2019. Guidelines for Distance Interpreting. URL https://aiic.ch/wp-content/uploads/2020/04/
aiic-guidelines-for-distance-interpreting-version-10.pdf (accessed 1.7.2024).
AIIC, 2024. AIIC Statistical Report: 2023 Data. Unpublished internal report.
AIIC, n.d. Acoustic Shocks Research Project: Final Report. URL https://aiic.org/uploaded/web/
Acoustic%20Shocks%20Research%20Project.pdf (accessed 1.7.2024).
Albl-Mikasa, M., 2021. Conference Interpreting and English as a Lingua Franca. In Albl-Mikasa,
M., Tiselius, E., eds. The Routledge Handbook of Conference Interpreting. Routledge, Abingdon,
546–563. URL https://doi.org/10.4324/9780429297878-47
Altieri, M., 2020. Tablet Interpreting: Étude expérimentale de l’interprétation consécutive sur tab-
lette. The Interpreters’ Newsletter 25, 19–35.
Amos, R.M., Seeber, K.G., Pickering, M.J., 2022. Prediction During Simultaneous Interpreting: Evi-
dence from the Visual-World Paradigm. Cognition 220, 104987.
Anastasopoulos, A., Lui, A., Nguyen, T.Q., Chiang, D., 2019. Neural Machine Translation of Text
from Non-Native Speakers. In Proceedings of the 2019 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies Vol-
ume 1 (Long and Short Papers). Association for Computational Linguistics, 3070–3080. URL
https://doi.org/10.18653/v1/N19-1310
Baigorri-Jalón, J., 2004. Interpreters at the United Nations: A History. Ediciones Universidad de
Salamanca, Salamanca.
Baigorri-Jalón, J., 2005. Conference Interpreting in the First International Labor Conference (Wash-
ington, D.C., 1919). Meta 50(3), 987–996.
Baigorri-Jalón, J., 2015. The History of the Profession. In Mikkelson, H., Jourdenais, R., eds. The
Routledge Handbook of Interpreting. Routledge, Abingdon, 11–28.
Baigorri-Jalón, J., 2021. Once Upon a Time at the ILO: The Infancy of Simultaneous Interpreting. In
Seeber, K.G., ed. 100 Years of Conference Interpreting: A Legacy. Cambridge Scholars, Newcastle
upon Tyne, 1–24.
Baigorri-Jalón, J., Fernández-Sánchez, M.-M., Payàs, A., 2021. Distance Conference Interpreting. In
Albl-Mikasa, M., Tiselius, E., eds. Routledge Handbook of Conference Interpreting. Routledge,
Abingdon, 9–18. URL https://doi.org/10.4324/9780429297878-43

The Routledge Handbook of Interpreting, Technology and AI

Bajo, M.T., Padilla, P., 2015. Memory. In Pöchhacker, F., ed. The Routledge Encyclopedia of Inter-
preting Studies. Routledge, Abingdon, 252–254.
Baker, M., Diriker, E., 2019. Conference and Simultaneous Interpreting. In Baker, M., Diriker, E.,
eds. Routledge Encyclopedia of Translation Studies, 3rd ed. Routledge, Abingdon, 95–101. URL
https://doi.org/10.4324/9781315683751
Barbour, I., 1993. Ethics in an Age of Technology. Harper, New York.
Brady, A., Pickles, M., 2022. Why Remote Interpreting Doesn’t Work for Interpreters. UNtoday,
June, 5, 12–14.
Camayd-Freixas, E., 2005. A Revolution in Consecutive Interpretation: Digital Voice-Recorder-Assisted
CI. The ATA Chronicle 34, 40–46.
Chen, S., Kruger, J.L., 2023. The Effectiveness of Computer-Assisted Interpreting: A Preliminary
Study Based on English-Chinese Consecutive Interpreting. Translation and Interpreting Studies
18(3), 399–420. URL https://doi.org/10.1075/tis.21036.che
Chen, S., Kruger, J.L., 2024. A Computer-Assisted Consecutive Interpreting Workflow: Training and
Evaluation. The Interpreter and Translator Trainer, 1–20. URL https://doi.org/10.1080/1750399X.2024.2373553
Chernov, G.V., 2004. Inference and Anticipation in Simultaneous Interpreting. Benjamins, Amsterdam.
Chernov, S., 2016a. At the Dawn of Simultaneous Interpreting in the USSR – Filling Some Gaps in
History. In Takeda, K., Baigorri-Jalón, J., eds. New Insights in the History of Interpreting. Benja-
mins, Amsterdam, 135–165. URL https://doi.org/10.1075/btl.122.06che
Chernov, S., 2016b. U istokov sinhronnogo perevoda v SSSR (The Origins of Simultaneous Interpret-
ing in the USSR). Mosty 2(50), 52–68.
Chernov, S., 2018. At the Dawn of Simultaneous Interpreting in the USSR: Filling Some Gaps in
History. In Takeda, K., Baigorri-Jalón, J., eds. New Insights in the History of Interpreting.
Benjamins, Amsterdam, 135–165.
Chmiel, A., Mazur, I., 2013. Eye Tracking Sight Translation Performed by Trainee Interpreters. In
Way, C., Vandepitte, S., Meylaerts, R., Bartłomiejczyk, M., eds. Tracks and Treks in Translation
Studies: Selected Papers from the EST Congress Leuven 2010. Benjamins, Amsterdam, 189–205.
Clark, R., Feldon, D., Van Merrienboer, J.J.G., Yates, K., Early, S., 2008. Cognitive Task Analysis. In
Spector, J.M., Merrill, M.D., van Merrienboer, J.J.G., Driscoll, M.P., eds. Handbook of Research
on Educational Communications and Technology. Macmillan/Gale, New York, 577–593.
Cooke, N.J., 1992. The Implications of Cognitive Task Analyses for the Revision of the Dictionary
of Occupational Titles. In Camara, W.J., ed. Implications of Cognitive Psychology and Cognitive
Task Analysis for the Revision of the Dictionary of Occupational Titles. American Psychological
Association, Washington, DC, 1–25.
Cooke, N.J., 1994. Varieties of Knowledge Elicitation Techniques. International Journal of
Human-Computer Studies 41, 801–849.
Dam, H.V., 2010. Consecutive Interpreting. In Gambier, Y., Van Doorslaer, L., eds. Handbook of
Translation Studies Online. Benjamins, Amsterdam, 75–79. URL https://doi.org/10.1075/hts.1
Defrancq, B., Fantinuoli, C., 2021. Automatic Speech Recognition in the Booth: Assessment of Sys-
tem Performance, Interpreters’ Performances and Interactions in the Context of Numbers. Target
33(1), 73–102.
Desmet, B., Vandierendonck, M., Defrancq, B., 2018. Simultaneous Interpretation of Numbers and
the Impact of Technological Support. In Fantinuoli, C., ed. Interpreting and Technology. Language
Science Press, Berlin, 13–27.
DiChristofano, A., Shuster, H., Chandra, S., Patwari, N., 2023. Performance Disparities Between
Accents in Automatic Speech Recognition (Student Abstract). Proceedings of the AAAI Conference
on Artificial Intelligence 37(13), 16200–16201. URL https://doi.org/10.1609/aaai.v37i13.26960
Diriker, E., 2015. De-/Re-Contextualizing Conference Interpreting: Interpreters in the Ivory Tower?
Benjamins, Amsterdam.
EPO, n.d. Technical Guidelines. URL www.epo.org/en/applying/european/oral-proceedings/
proceedings/technical-guidelines (accessed 1.7.2024).
EPO, 2022a. Oral Proceedings in Opposition by Videoconference: Pilot Project Final Report. URL
https://link.epo.org/web/oral_proceedings_in_opposition_by_videoconference-pilot_project_
final_report_november_2022_en.pdf (accessed 1.7.2024).


EPO, 2022b. President Decides Future Format of Oral Proceedings in Opposition. News & Events.
URL www.epo.org/en/news-events/news/president-decides-future-format-oral-proceedings-oppo
sition (accessed 1.7.2024).
Fantinuoli, C., 2016. InterpretBank: Redefining Computer-Assisted Interpreting Tools. In
Esteves-Ferreira, J., Macan, J., Mitkov, R., Stefanov, O.-M., eds. Proceedings of the 38th Confer-
ence Translating and the Computer. AsLing, 42–52.
Fantinuoli, C., 2017. Speech Recognition in the Interpreter Workstation. In Esteves-Ferreira, J.,
Macan, J., Mitkov, R., Stefanov, O.-M., eds. Proceedings of the 39th Conference Translating and
the Computer. AsLing, 25–34.
Fantinuoli, C., 2018. Computer-Assisted Interpreting: Challenges and Future Perspectives. In Corpas
Pastor, G., Durán-Muñoz, I., eds. Trends in E-Tools and Resources for Translators and Interpret-
ers. Brill, Leiden, 153–174. URL https://doi.org/10.1163/9789004351790_009
Fantinuoli, C., 2023. Towards AI-Enhanced Computer-Assisted Interpreting. In Corpas Pastor, G.,
Defrancq, B., eds. Interpreting Technologies – Current and Future Trends. Benjamins, Amsterdam,
46–71.
Fantinuoli, C., Montecchio, M., 2023. Defining Maximum Acceptable Latency of AI-Enhanced
CAI Tools. In Ferreiro-Vázquez, Ó., Varajão Moutinho Pereira, A., Gonçalves Araújo, S., eds.
Technological Innovation Put to the Service of Language Learning Translation and Interpreting:
Insights from Academic and Professional Contexts. Peter Lang, Berlin, 213–226. URL
https://doi.org/10.3726/b20168
Feng, S., Halpern, B.M., Kudina, O., Scharenborg, O., 2024. Towards Inclusive Automatic Speech
Recognition. Computer Speech & Language 84, 101567. URL https://doi.org/10.1016/j.csl.2023.101567
Ferrari, M., 2001. Consecutive Simultaneous? SCIC News 26, 2–4.
Ferrari, M., 2002. Traditional vs. ‘Simultaneous’ Consecutive. SCIC News 29, 6–7.
Ferré, F., 1988. Philosophy of Technology. Prentice Hall, Englewood Cliffs, NJ.
Friedel, R., 2007. A Culture of Improvement: Technology and the Western Millennium. MIT Press,
Cambridge, MA. URL https://doi.org/10.7551/mitpress/9780262062626.001.0001
Gaiba, F., 1998. The Origins of Simultaneous Interpretation: The Nuremberg Trial. University of
Ottawa Press, Ottawa. URL www.jstor.org/stable/j.ctt1cn6rsh
Galbraith, J.K., 1967. The New Industrial State. Houghton Mifflin, Boston, MA.
Gile, D., 1995/2009. Basic Concepts and Models for Interpreter and Translator Training. Benjamins,
Amsterdam.
Gillies, A., 2017. Note-Taking for Consecutive Interpreting: A Short Course. Routledge, Abingdon.
Goldsmith, J., Holley, J.C., 2015. Consecutive Interpreting 2.0: The Tablet Interpreting Experience
(MA dissertation). University of Geneva.
Gordon-Finlay, A., 1927. Telephonic Interpretation Experiments; Technical Arrangements. Dossier
Filene Experiment. Results obtained 1926–1927 Sessions of the Conference (30H/4/8910). ILO
Archives, Geneva.
Grbić, N., 2015. Settings. In Pöchhacker, F., ed. The Routledge Encyclopedia of Interpreting Studies.
Routledge, Abingdon, 370–371.
Guo, M., Han, L., Teixeira Anacleto, A., 2023. Computer-Assisted Interpreting Tools: Status Quo
and Future Trend. Theory and Practice in Language Studies 13(1), 89–99. URL
https://doi.org/10.17507/tpls.1301.11
Hamidi, M., Pöchhacker, F., 2007. Simultaneous Consecutive Interpreting: A New Technique Put to
the Test. Meta 52(2), 276–289.
Hawel, K., 2010. Simultanes versus klassisches Konsekutivdolmetschen: Eine vergleichende textuelle
Analyse (MA dissertation). University of Vienna.
Herbert, J., 1952. The Interpreter's Handbook: How to Become a Conference Interpreter. Librairie
de l'Université, Geneva.
Hiebl, B., 2011. Simultanes Konsekutivdolmetschen mit dem Livescribe™ Echo™ Smartpen. Ein
Experiment im Sprachenpaar Italienisch-Deutsch mit Fokus auf Zuhörerbewertung (MA disserta-
tion). University of Vienna.
ISO, 2016a. ISO 2603:2016 Simultaneous Interpreting – Permanent Booths – Requirements. URL
www.iso.org/standard/74006.html (accessed 1.7.2024).


ISO, 2016b. ISO 4043:2016 Simultaneous Interpreting – Mobile Booths – Requirements. URL www.
iso.org/standard/70804.html (accessed 1.7.2024).
ISO, 2016c. ISO 20109:2016 Simultaneous Interpreting – Equipment – Requirements. URL www.
iso.org/standard/63033.html (accessed 1.7.2024).
ISO, 2019. ISO 22259:2019 Conference Interpreting – Equipment – Requirements. URL www.iso.
org/standard/72001.html (accessed 1.7.2024).
ISO, 2020. ISO/PAS 24019:2020 Simultaneous Interpreting Delivery Platforms – Requirements and
Recommendations. URL https://www.iso.org/standard/80761.html (accessed 31.3.2025).
ISO, 2022. ISO 23155:2022 Interpreting Services – Conference Interpreting – Requirements and
Recommendations. URL www.iso.org/standard/74749.html (accessed 1.7.2024).
Jiang, H., 2013. The Interpreter’s Glossary in Simultaneous Interpreting: A Survey. Interpreting 15(1),
74–93. URL https://doi.org/10.1075/intp.15.1.04jia
Kirchhoff, H., 1979. Die Notationssprache als Hilfsmittel des Konferenzdolmetschers im Konsekutiv-
vorgang. In Mair, W., Sallager, E., eds. Sprachtheorie und Sprachenpraxis. Festschrift für Henri
Vernay zu seinem 60. Geburtstag. Gunter Narr, Tübingen, 121–133.
Kurz, I., 1991. Conference Interpreting: Job Satisfaction, Occupational Prestige and Desirability. In
Jovanović, M., ed. XIIth World Congress of FIT – Belgrade 1990. Proceedings. Prevodilac, Bel-
grade, 363–367.
Lederer, M., 1984. La traduction simultanée. In Seleskovitch, D., Lederer, M., eds. Interpréter pour
traduire. Didier, Paris, 136–162.
Loiseau, N., Delgado Luchner, C., 2021. A, B and C Decoded: Understanding Interpreters’ Language
Combinations in Terms of Language Proficiency. The Interpreter and Translator Trainer 15(4),
468–489. URL https://doi.org/10.1080/1750399X.2021.1911193
Lombardi, J., 2003. DRAC Interpreting: Coming Soon to a Courthouse Near You? Proteus 12(2), 7–9.
Lövehagen, N., Malmodin, J., Bergmark, P., Matinfar, S., 2023. Assessing Embodied Carbon Emis-
sions of Communication User Devices by Combining Approaches. Renewable and Sustainable
Energy Reviews 183, 113422. URL https://doi.org/10.1016/j.rser.2023.113422
Mankauskienė, D., 2016. Problem Trigger Classification and Its Applications for Empirical Research.
Procedia – Social and Behavioral Sciences 231, 143–148.
McCarthy, P.M., Guess, R.H., McNamara, D.S., 2009. The Components of Paraphrase Evaluations.
Behavior Research Methods 41(3), 682–690. URL https://doi.org/10.3758/BRM.41.3.682
Mesthene, E.G., 1970. Technological Change: Its Impact on Man and Society. Mentor, New York.
Mielcarek, M.M., 2017. Das simultane Konsekutivdolmetschen: Ein Experiment im Sprachenpaar
Spanisch-Deutsch (MA dissertation). University of Vienna.
Moser-Mercer, B., 2003. Remote Interpreting: Assessment of Human Factors and Performance
Parameters. Joint project, International Telecommunication Union (ITU) and Ecole de traduction
et interprétation, University of Geneva (ETI).
Moser-Mercer, B., Lambert, S., Darò, V., Williams, D., 1997. Skill Components in Simultaneous
Interpreting. In Gambier, Y., Gile, D., Taylor, C., eds. Conference Interpreting: Current Trends in
Research. Benjamins, Amsterdam, 133–148.
Mouzourakis, P., 2006. Remote Interpreting: A Technical Perspective on Recent Experiments. Inter-
preting 8(1), 45–66.
Noel, J.V., 1902. History of the Second Pan American Congress. Guggenheimer Weil and Co.,
New York.
Norman, D.A., 1976. Memory and Attention: An Introduction to Human Information Processing,
2nd ed. Wiley, New York.
Orlando, M., 2010. Digital Pen Technology and Consecutive Interpreting: Another Dimension in
Note-Taking Training and Assessment. The Interpreters’ Newsletter 15, 71–86.
Orlando, M., 2014. A Study on the Amenability of Digital Pen Technology in a Hybrid Mode of
Interpreting: Consec-Simul with Notes. The International Journal for Translation and Interpreting
Research 6(2), 39–54. URL https://doi.org/10.12807/ti.106202.2014.a03
Orlando, M., 2015. Implementing Digital Pen Technology in the Consecutive Interpreting Classroom.
In Andres, D., Behr, M., eds. To Know How to Suggest . . .: Approaches to Teaching Conference
Interpreting. Frank & Timme, Berlin, 171–200.


Orlando, M., Hlavac, J., 2020. Simultaneous-Consecutive in Interpreter Training and Interpreting
Practice: Use and Perceptions of a Hybrid Mode. The Interpreters’ Newsletter, 25, 1–17.
Piolat, A., Olive, T., Kellogg, R.T., 2005. Cognitive Effort During Note Taking. Applied Cognitive
Psychology 19(3), 291–312. URL https://doi.org/10.1002/acp.1086
Pisani, E., Fantinuoli, C., 2021. Measuring the Impact of Automatic Speech Recognition on Num-
ber Rendition in Simultaneous Interpreting. In Wang, C., Zheng, B., eds. Empirical Studies of
Translation and Interpreting. Routledge, Abingdon, 181–197. URL https://doi.org/10.4324/9781003017400-14
Pöchhacker, F., 2011. Consecutive Interpreting. In Malmkjær, K., Windle, K., eds. The Oxford
Handbook of Translation Studies. Oxford University Press, Oxford, 294–306. URL
https://doi.org/10.1093/oxfordhb/9780199239306.013.0021
Pöchhacker, F., ed., 2015. The Routledge Encyclopedia of Interpreting Studies. Routledge, Abingdon.
URL https://doi.org/10.4324/9781315720728
Pöchhacker, F., 2022. Introducing Interpreting Studies. Routledge, Abingdon. URL
https://doi.org/10.4324/9781003109020
Prandi, B., 2023. Computer-Assisted Simultaneous Interpreting: A Cognitive-Experimental Study on
Terminology. Language Science Press, Berlin. URL https://doi.org/10.5281/zenodo.7143055
Riddell, J., ed., 2015. To the Masses. Proceedings of the Third Congress of the Communist Interna-
tional 1921. Brill, Leiden. URL https://doi.org/10.1163/9789004266177
Rozan, J.F., 1956. Note-Taking in Consecutive Interpreting. Georg, Geneva.
Roziner, I., Shlesinger, M., 2010. Much Ado About Something Remote: Stress and Performance in
Remote Interpreting. Interpreting 12(2), 214–247.
Ruetten, A., 2003. Computer-Based Information Management for Conference Interpreters or How Will
I Make My Computer Act Like an Infallible Information Butler? In Esteves-Ferreira, J., Macan, J.,
Mitkov, R., Stefanov, O.-M., eds. Proceedings of Translating and the Computer 25. Aslib, 20–21.
Ruiz Rosendo, L., Barghout, A., Martin, C.H., 2021. Interpreting on UN Field Missions: A Training
Programme. The Interpreter and Translator Trainer 15(4), 450–467. URL https://doi.org/10.1080/1750399X.2021.1903736
Ryder, G., 2021. Prologue. In Seeber, K.G., ed. 100 Years of Conference Interpreting: A Legacy. Cam-
bridge Scholars, Newcastle upon Tyne, xviii–xxii.
Sawyer, R.K., Roy, C.B., 2015. Encyclopedia of the Social and Cultural Foundations of Education.
Sage Publications, Thousand Oaks, CA. URL https://doi.org/10.4135/9781412963992
Seeber, K.G., 2007. Thinking Outside the Cube: Modeling Language Processing Tasks in a Multiple
Resource Paradigm. Proceedings of Interspeech 2007, 1382–1385. URL https://doi.org/10.21437/Interspeech.2007-21
Seeber, K.G., 2015. Simultaneous Interpreting. In Mikkelson, H., Jourdenais, R., eds. The Routledge
Handbook of Interpreting. Routledge, Abingdon, 79–95.
Seeber, K.G., 2017a. Multimodal Processing in Simultaneous Interpreting. In Schwieter, J.W., Ferreira,
A., eds. The Handbook of Translation and Cognition. Wiley Blackwell, Hoboken, NJ, 461–475.
Seeber, K.G., 2017b. Simultaneous Interpreting into a B Language: Considerations for Trainers and
Trainees. In Zybatow, L.N., Stauder, A., Ustaszewski, M., eds. Translation Studies and Translation
Practice: Proceedings of the 2nd International Translata Conference. Peter Lang, Berlin, 321–328.
URL https://doi.org/10.3726/b10842
Seeber, K.G., 2020. Distance Interpreting: Mapping the Landscape. In Ahrens, B., Beaton-Thome, M.,
Krein-Kühle, M., Krüger, R., Link, L., Wienen, U., eds. Interdependence and Innovation in Trans-
lation Interpreting and Specialised Communication. Frank & Timme, Berlin, 123–172.
Seeber, K.G., 2022. Project Report: Load and Fatigue in ARI and VRI. Department of Public Works
and Government Services Canada – Conference Interpretation, Ottawa.
Seeber, K.G., Amos, R.M., 2023. Capacity, Load and Effort in Translation, Interpreting and Bilingual-
ism. In Ferreira, A., Schwieter, J.W., eds. The Routledge Handbook of Translation, Interpreting and
Bilingualism, 1st ed. Routledge, Abingdon, 260–279. URL https://doi.org/10.4324/9781003109020
Seeber, K.G., Arbona, E., 2020. What’s Load Got to Do with It? A Cognitive-Ergonomic Training
Model of Simultaneous Interpreting. The Interpreter and Translator Trainer 14(4), 369–385. URL
https://doi.org/10.1080/1750399X.2020.1839996


Seeber, K.G., Fox, B., 2021. Distance Conference Interpreting. In Albl-Mikasa, M., Tiselius, E., eds.
Routledge Handbook of Conference Interpreting. Routledge, Abingdon, 491–507. URL https://doi.org/10.4324/9780429297878-43
Seleskovitch, D., 1968. L’interprète dans les conférences internationales. Minard Lettres Modernes,
Paris.
Setton, R., Dawrant, A., 2016. Conference Interpreting: A Complete Course. Benjamins, Amsterdam.
Sienkiewicz, B., 2010. Das Konsekutivdolmetschen der Zukunft: Mit Notizblock oder Aufnah-
megerät? Ein Experiment zum Vergleich von klassischem und simultanem Konsekutivdolmetschen
(MA dissertation). University of Vienna.
Stoll, C., 2009. Jenseits simultanfähiger Terminologiesysteme. WVT Wissenschaftlicher Verlag Trier, Trier.
Svejcer, A.S., 1999. At the Dawn of Simultaneous Interpretation in Russia. Interpreting 4, 23–28.
Svoboda, S., 2020. SimConsec: The Technology of a Smartpen in Interpreting (MA dissertation).
Palacký University.
Tanzella, D., Alvar Rozas, P., 2011. Interpreting at the European Institutions: Interpretation ad Per-
sonam (MA dissertation). University of Geneva.
Timarová, S., Dragsted, B., Gorm Hansen, I., 2011. Time Lag in Translation and Interpreting:
A Methodological Exploration. In Alvstad, C., Hild, A., Tiselius, E., eds. Methods and Strate-
gies of Process Research: Integrative Approaches in Translation Studies. Benjamins, Amsterdam,
121–146.
UNESCO, 1976. A Teleconference Experiment: A Report on the Experimental Use of the Sympho-
nie Satellite to Link UNESCO Headquarters in Paris with the Conference Centre in Nairobi.
UNESCO, Paris.
Viezzi, M., 2013. Simultaneous and Consecutive Interpreting (Non-Conference Settings). In Mil-
lán, C., Bartrina, F., eds. The Routledge Handbook of Translation Studies. Routledge, Abingdon,
377–388. URL https://doi.org/10.4324/9780203102893
Wang, X., Wang, C., 2019. Can Computer-Assisted Interpreting Tools Assist Interpreting? Translet-
ters. International Journal of Translation and Interpreting, 3, 109–139.
Will, M., 2009. Dolmetschorientierte Terminologiearbeit: Modell und Methode. Gunter Narr Verlag,
Tübingen.
Will, M., 2020. Computer Aided Interpreting (CAI) for Conference Interpreters: Concepts, Content
and Prospects. Journal for Communication Studies 13(1), 37–71.
Wright, S., 2006. French as a Lingua Franca. Annual Review of Applied Linguistics 26, 35–60. URL
https://doi.org/10.1017/S0267190506000031
Yenkimaleki, M., van Heuven, V.J., 2017. The Effect of Memory Training on Consecutive Interpreting
Performance by Interpreter Trainees: An Experimental Study. Forum 15(1), 157–172. URL https://doi.org/10.1075/forum.15.1.09yen

14
HEALTHCARE SETTINGS
Esther de Boe

14.1 Introduction
Healthcare interpreting (HI) refers to the interaction that occurs between three parties: ‘a
speaker of a non-societal language (for example, a patient seeking healthcare)’, ‘a speaker
of the societal language (generally the service provider)’, and an interpreter, ‘who medi-
ates between the two parties in a simultaneous or consecutive mode, either face-to-face
or remotely’ (Angelelli, 2014, 574). HI is one of the most prevalent fields of professional
interpreting practice and research within dialogue interpreting studies (Hale, 2007; Pöch-
hacker and Shlesinger, 2005). Similar to other contexts of public services, healthcare pro-
viders are increasingly challenged by linguistic and cultural barriers. As societies diversify
due to various migration patterns, healthcare providers encounter an increasing number of
patients with whom they share neither a common language nor culture (Boujon et al., 2018,
50). Furthermore, the transition towards a patient-centred approach (Schinkel et al., 2019)
presents additional challenges for healthcare providers who are dealing with a more diverse
patient population (Fiedler et al., 2022).
Finding solid ways to bridge linguistic and cultural boundaries is essential in healthcare,
since success or failure of the communication may become a matter of life or death (Ng
and Crezee, 2020). As Anazawa et al. (2012, 1) point out, accuracy of interpretation is ‘the
most critical component of safe and effective communication’ between healthcare providers
and patients. This is why researchers contend that the potential risk of using unqualified
interpreters to work in such high-stakes settings cannot be measured (Hedding and Kauf-
man, 2012) and call for high-quality interpreting standards. This necessity is further under-
scored by the existence of diverse specialised fields within healthcare, each with distinct
requirements and terminology, such as speech pathology, psychotherapy, gynaecology, and
mental health (Dean and Pollard, 2011; Ng and Crezee, 2020).
Hence, it comes as no surprise that academic studies have confirmed the effectiveness
of employing professional interpreters in enhancing the health and satisfaction of
patients with no or low proficiency in the societal language (Ribera et al., 2008). Employing
professional interpreters has been linked to an increased reception of preventive advice and
prescriptions, coupled with a reduced reliance on emergency consultations (Jacobs et al.,


DOI: 10.4324/9781003053248-19

2004). Moreover, this practice may also prevent ethical and legal issues, such as those
concerning informed consent (Bot, 2013). However, it must be noted that, as Schouten et al. (2020) argue, the
use of professional interpreters may also pose challenges, in terms of a simplistic vision of
their role or a lack of awareness about the complexity of healthcare encounters. Next to
that, the understanding of who is considered to be a ‘professional’ healthcare interpreter
also varies from one country to another (Bischoff and Grossmann, 2006).
Although empirical evidence supports the use of professional interpreters, various societal
factors, such as a decrease in government subsidies and a lack of regulations (De Boe, 2015;
Phelan, 2012), pose obstacles to their widespread employment. Moreover, the supply of qual-
ified interpreters is insufficient to meet the increasingly varied linguistic demand (González
Rodríguez and Spinolo, 2017, 242). While Deaf individuals have identified healthcare settings
as the most crucial context requiring interpreter services (De Meulder et al., 2021; Haualand,
2010), these settings also pose the greatest challenge in terms of securing the use of quali-
fied interpreters (NIEC, 2008, in Swabey et al., 2017). In addition, healthcare providers are
not always ready to employ professional interpreters (whether remote or on-site). They may
instead choose to ‘get by’ without an interpreter, even when professional interpreting
services are readily available (Fiedler et al., 2022; Gutman et al., 2020).
Such circumstances often drive healthcare providers to seek alternative options for
surmounting language barriers. This trend is increasingly reinforced by technological
advancements. In light of the ongoing digitisation of our societies, healthcare services have
largely embraced technology in their communication with patients over the last decades.
This so-called ‘telemedicine’, also referred to as ‘e-health’ or ‘telehealth’
(WHO-ITU, 2022),1 encompasses a broad spectrum of technological support. This ranges
from traditional technologies like telephone and email to more recent innovations, such
as online health portals, through which patients have electronic access to their healthcare
information, and video-mediated consultations. When such resources are accessed through
mobile communication devices, this has been referred to as ‘m-Health’, or ‘mHealth’
(Sweileh et al., 2017). Nowadays, (monolingual) video-mediated consultations are being
considered as ‘at least a partial solution to the complex challenges of delivering healthcare
to an aging and increasingly diverse population’ (Greenhalgh et al., 2016, 1). Moreover,
teleconsultations are often encouraged by authorities and decision-makers to improve
access to specialised services for isolated patients (Esterle and Mathieu-Fritz, 2013).
In the same way, technology-mediated HI in the form of telephone-based or video-based
interpreting has become widespread (Farag and Meyer, 2024, 2). Braun and Taylor (2012)
referred to these modalities of interpreting as a case of ‘dual mediation’, indicating that the
interaction between individuals is mediated by an interpreter, while the interaction itself
is, in turn, mediated by technology. Aside from using technology to support HI, relatively
recently, a growing use of technology-supported tools has been observed in healthcare com-
munication with non-native patients (Sweileh et al., 2017; Thonon et al., 2021). These
tools – aimed at replacing human interpreters – include generic machine translation (MT)
tools, such as Google Translate (Patil and Davies, 2014; Vieira et al., 2021). There are also
dedicated care applications that generate pre-established medical translations, some of which
can be activated by automatic speech recognition (ASR) (Van Straaten et al., 2023). Such
technology-driven tools exist next to more traditional solutions, such as multilingual guides
and brochures (Pokorn et al., 2024; Thonon et al., 2021; Van Straaten et al., 2023). Com-
pared with professional interpreters, whether on-site or remote, technology-supported tools
raise issues regarding quality and ethics (Braun et al., 2023). Yet whereas video-mediated
interpreting and telephone interpreting in healthcare have been scientifically explored to
some extent, research on the most recent developments in technology-supported healthcare
communication is still in its infancy. As a result, the impact of such tools on patients and
public health still needs to be established (Thonon et al., 2021).
This chapter presents different types of technology-based HI, discussing current common
practices and issues associated with their use. First, Section 14.2 centres on configurations of
distance HI, that is, technology-mediated communication involving human interpreters.
Subsequently, Section 14.3 discusses research themes in empirical research in this field. This
is followed by Section 14.4, which examines technological tools (i.e. MT tools and dedicated
care apps) replacing human interpreters in healthcare communication. Finally, Section 14.5
draws conclusions from the most significant findings in the preceding sections and provides
an outlook on future developments. It must be acknowledged that, due to space constraints,
this chapter does not aim to provide a comprehensive overview of technology-based HI but,
instead, sketches prevalent trends, supported by illustrative examples.

14.2 Distance interpreting in healthcare


Distance interpreting, or technology-mediated interpreting (Braun, 2019, 271), is an
umbrella term referring to ‘technologies used to deliver interpreting services and enhance
their reach’. When distance interpreting takes place via telephone, this is called telephone
interpreting (TI); when it is conducted by means of video link, the overarching term is
video-mediated interpreting (VMI). The specific configurations will be discussed in greater
detail in Section 14.2.1. In a recent survey on distance interpreting in public service
interpreting (PSI) settings, healthcare was mentioned as the subdomain with the highest
frequency of distance interpreting services (Corpas Pastor and Gaber, 2020). Trends in
practices in technology-mediated HI, therefore, seem largely to mirror those observed in
the larger domain of PSI. Nevertheless, some specific arguments for using distance inter-
preting in healthcare have been formulated (Kletecka-Pulker and Parrag, 2015, 12). These
can be summarised by four categories of arguments: medical, legal, ethical, and economic.
The first two categories do not necessarily exclusively pertain to distance interpreting but,
rather, to the use of professional interpreters in general (see also Section 14.1). From a
medical point of view, communication is an essential part of the treatment process. It plays
a central role in anamnesis and diagnosis, and from a legal perspective, informed patient
consent can only be achieved through clear communication. The ethical argument, that distance interpreting contributes to safeguarding equal access to healthcare, and the economic argument, that it can reduce costs and time (Kletecka-Pulker and Parrag, 2015, 12), by contrast, apply specifically to distance interpreting. As Skinner et al. (2018,
13) conclude, interpreting services via video link have come to be considered essential to
facilitate communication in the private and public sphere, including medical services.
An element that needs to be considered in the discussion of distance interpreting in
healthcare settings is the use of non-professional distance interpreters. To bridge the lan-
guage gap, healthcare providers may resort to bilingual healthcare staff or other personnel
(Crossman et al., 2010) and informal (i.e. non-professional) interpreters, such as patients’
family members or acquaintances (Flores et al., 2012; Schouten et al., 2020). Anecdotal
evidence indicates that non-professional RI is quite frequent. This is illustrated by an exam-
ple from the Netherlands2 showing that some hospitals organise non-professional TI using
bilingual personnel and trainees. This results in lists of almost 100 potential informal
interpreters. However, the organisation of these voluntary services seems to be rather ad
hoc, and little information on their functioning is available. Braun et al. (2023) also men-
tion real-time crowdsourcing of informal interpreters via digital platforms, web 2.0, and
social media, used during humanitarian crises and situations of urgent medical interven-
tions. The following sections (14.2.1 and 14.2.2), which discuss different types of distance
interpreting, including examples of current practices (Sections 14.2.1 and 14.2.2), will
focus specifically on distance interpreting conducted by trained professional interpreters.

14.2.1 Types of technology-mediated healthcare interpreting


The first type of distance interpreting introduced in healthcare settings, in particular in medical emergencies in many Western countries, was TI (see Farag and Meyer, 2024; also Lázaro Gutiérrez, this volume). With the increase in immigration and the reduction in telecommunication costs (Ozolins, 2011), along with the implementation of legal accessibility provisions (Gracia-García, 2002; Skinner et al., 2018), TI has been practised since the 1960s and has grown significantly since the 1970s (Ozolins, 2011), in healthcare as well
as other domains of PSI. In Europe and Australia, the initial organisation of TI was man-
aged by a single major agency in the public sector. In contrast, in the United States, TI ser-
vices were initially provided by the community-based organisation Language Line, which
later became a private company (Kelly, 2008). This also occurred in European countries
such as the Netherlands, where TI services were first organised by a government-subsidised
agency but, over the years, were taken over by private companies. Currently, Global Talk
is a major player in this sector, as the largest supplier of TI services in healthcare (De
Groot et al., 2022). Similar to the situation in the United States, Global Talk (active in the
Netherlands and Belgium) offers several communication solutions specifically for health-
care settings. These include video-mediated interpreting and a digital care application (see
also Sections 14.2.2 and 14.2.3), next to traditional on-site interpreting.
A second type of distance interpreting is video-mediated interpreting (VMI). VMI is an
overarching term referring to all configurations that combine types of videoconferencing and
interpreting (Braun and Taylor, 2012). Within VMI, Braun and Taylor (2012) make a distinc-
tion between videoconference interpreting (VCI), where the interpreter is co-located with one
of the primary participants while the other participant is located elsewhere, and video remote
interpreting (VRI), where the interpreter is the only remote participant. More recently, the
VMI configuration in which all participants (i.e. both primary participants and the interpreter)
are in different locations has been referred to as three-point VMI (Verhaegen, 2023). In VMI
contexts involving deaf participants, video relay service (VRS) is the term used to indicate a
configuration in which each of the participants (the deaf person, the hearing person, and the
interpreter) is in a separate location (Skinner et al., 2018; see Warnicke, this volume). This differs from VRI in so far as both the interpreter and sign language user have visual access to each
other, whereas the interpreter and spoken language user communicate via audio (Brunson,
2018, 40). However, with the increase in use of easily accessible audiovisual telecommunica-
tion devices, VRS can also refer to three-point VMI, where all participants are connected via
video. In this chapter, the term VMI is used, unless a specific configuration is involved.
VMI has long been used by large hospitals in the United States, which rely, for example, on in-house remote interpreters or interpreters provided by call centres. Yet its implementation on a global scale remained exceedingly limited until the onset of the COVID-19
pandemic, which precipitated significant changes in this situation. As many researchers
(e.g. Braun et al., 2023; De Cotret et al., 2020; De Boe et al., 2024) note, the outbreak of
this worldwide health crisis contributed extensively to shifting the balance from on-site to
RI in general, as governments were obliged to comply with safety protocols.
Although the adoption of VMI greatly accelerated during the COVID-19 pandemic, it is
worth noting that some countries’ governments had actively promoted VMI even before
the health crisis. Particular examples are Denmark (De Cotret et al., 2020) and Norway
(Hansen, 2021). However, VMI in particular seems to have benefited from the worldwide
crisis. For example, in Australia, a record use of VRI was observed in 2020, at the expense
of TI (Bachelier and Orlando, 2024, 82). Other examples are Belgium and the Nether-
lands, where the pandemic also clearly accelerated the market share of VMI3 (Stengers
et al., 2023; Van Straaten et al., 2023). Additionally, there were alterations in the methods
through which VMI was delivered. As a result of social distancing, an increasing number
of VMI services needed to be carried out via a three-point configuration, with each of the
participants located in separate places. Up to that time, the most common type of VMI had
been the configuration in which the primary participants were together in one location and
the interpreter in a separate one (Braun et al., 2023; Verhaegen, 2023).
Besides this, during the pandemic, telehealth platforms had to be adapted remarkably quickly
to integrate VRI (Bachelier and Orlando, 2024). This was the case across Australia, Europe,
and the Middle East (Almahasees et al., 2024). Taking the example of Australia, Bachelier
and Orlando (2024) show that the government-funded telehealth platforms Telehealth and
Healthdirect competed with the generic platforms Zoom, Microsoft Teams, and Cisco Webex
to offer VRI services. In the United States, the inclusion of VRI in telehealth platforms had
already become increasingly common over the last decades. This trend is illustrated by a
growing number of private companies partnering up to offer broader e-health services, such
as InTouchHealth, a telehealth platform, and InDemand Interpreting, a technology-enabled
medical interpreting company. Together, they provide ‘virtual care networks’, offering ‘solu-
tions and services to support access and delivery of high-quality clinical care to any patient at
any time while reducing the overall cost of care’ (Teladoc Health, 2018). According to Corpas
Pastor and Sánchez-Rodas (2021, 10), these e-health systems are ‘novel, speedy, low-cost solu-
tions to communications needs in hospitals and healthcare centres’ and rapidly replace ‘tra-
ditional forms of interpreting’. These systems also increasingly integrate interpreter-replacing
communication applications (see Section 14.4) and usually specialise in either spoken lan-
guage or signed language distance interpreting services.

14.2.2 VMI replacing TI in healthcare?


As put forward by Skinner et al., VMI has come to be considered ‘a more effective way of
providing spoken language interpreting services than telephone interpreting’, for the fol-
lowing reasons:

(1) It is widely accepted that spoken language interaction includes important nonver-
bal elements of communication (e.g., eye gaze, gestures, etc.), and (2) the evolution of
technology means it has become much easier to interact via video.
(Skinner et al., 2018, 13)

However, despite these plausible advantages, as well as the boost by the pandemic and
some countries’ government encouragement, for the moment, it remains unclear to what
extent the claim that VMI has begun to take over TI (e.g. Braun et al., 2023) can actually
be confirmed by current practices in healthcare. Since the organisation of PSI services is
not centralised and, in many cases, provided by commercial suppliers, it is extremely hard
to obtain a comparative overview of numbers indicating the actual share of RI services
in PSI. The same applies to the distribution of VMI and TI within RI, or their use in the
healthcare sector as part of the broader PSI domain. Hence, our understanding is limited to
fragmented insights gleaned from individual countries or even regions, with numbers often
applying to the broader field of PSI. As Bachelier and Orlando (2024, 82) argue regarding
the situation in Australia, although VMI has great potential in our audiovisually oriented
digital era, ‘performing through VRI still remains quite a novel and difficult exercise for
interpreters and medical staff, and the use of this modality is not as widespread as one
could imagine’. This is also illustrated by surveys among healthcare providers from other
countries. For example, a report on technology-based solutions in healthcare settings in
the Netherlands indicated that TI is far better known and used than VRI (Van Straaten
et al., 2023). Another example comes from Norway, where TI clearly gained ground at the
expense of on-site interpreting in recent years. However, the share of VRI remained extremely limited in 2021 and 2022 (IMDi, 2021, 2022) across all types of interpreting services.
Nevertheless, not all examples follow this trend. The numbers provided for the city of
Brussels4 show that, since its introduction in 2020, the number of VMI services in PSI has
been growing throughout 2021 and 2022, while the number of TI services remained more
or less stable. At the same time, the number of on-site interpreting services clearly dropped.
This indicates that VMI has gained ground, at the expense of on-site interpreting services,
rather than of TI (Sociaal Vertaalbureau, 2022). It must be mentioned, however, that these
numbers are not limited to healthcare settings but represent all PSI services.
The decision to opt for TI, VMI, or on-site interpreting is also closely linked to financial
matters, market-related developments, and users’ willingness to work with the different types
of interpreting. On the one hand, as Lion et al. (2015) observed in their clinical study com-
paring TI and VRI in a paediatric hospital, charges for VRI services were double those of TI.
On the other hand, as Yabe (2020) points out, VRI is, in turn, cheaper than on-site interpret-
ing. As Ozolins (2011, 34) explains, the rise of TI from the mid-1990s onwards was very
much linked to a steep drop in the cost of telephony. Similarly, the ongoing advancements
in videotelephony are set to drive cost-efficiencies, potentially influencing the frequency of its
adoption. Verrept et al. (2018, 59), who investigated the implementation of video-mediated
intercultural mediation in Belgian hospitals from a health-sociological perspective, also noted
a limited willingness of hospital managers to implement VRI, due to increased costs and to logistical issues that were seen as barriers to efficient structural implementation. In the same
vein, recent medical research comparing TI and VRI demonstrated that VRI generated greater
user satisfaction. However, a low willingness by healthcare providers to work with it (Fiedler
et al., 2022) has also been observed. Besides this, due to the lack of standardised (VR)I plat-
forms in many countries, interpreters need to be trained on various systems (Bachelier and
Orlando, 2024, 93). This may also be an impediment to larger-scale adoption of VRI.

14.3 Research themes within technology-mediated healthcare interpreting


Having identified practices of remote healthcare interpreting that give at least a partial
idea of the current situation in remote healthcare, this section delves deeper into recurring
research themes. RI within healthcare settings has been extensively investigated by scholars
from various disciplines. These include medicine, sociology, and interpreting studies, based
on a wide array of research designs. Methodologies span from large-scale randomised tri-
als and clinical surveys to more focused case studies. These draw on authentic materials or
simulations, often analysed from a conversation analytical and/or interactional perspective.
Most of the research, although not all, takes a comparative approach, with one or more
distance interpreting methods being contrasted with on-site interpreting. In some medical
studies, these comparisons also investigate additional options for crossing language barri-
ers, such as bilingual healthcare providers (e.g. Crossman et al., 2010) or informal inter-
preters (e.g. Flores et al., 2012).
Despite the diversity in approaches, the consensus among most studies is that dis-
tance interpreting presents numerous challenges, situated across various themes associ-
ated with remote healthcare interpreting. Issues that have been addressed by research into
technology-mediated HI from the various disciplines coincide with the larger categories
identified by Li (2022) for VMI across contexts and disciplines. These include (1) cost of
time, financial cost, and benefits; (2) physical and psychological costs; (3) users’ acceptance
and satisfaction; and (4) communication quality. Of these themes, user acceptance and, in
particular, satisfaction with the interpreting method stand out across research from both
medical studies and interpreting studies as a common denominator. However, exceptions
aside (e.g. Greenhalgh et al., 2016; Saint-Louis et al., 2003), as Pöchhacker (2006) claims,
medical studies tend to focus on satisfaction with quality of care (e.g. Paras et al., 2002),
next to cost-efficiency (e.g. Masland et al., 2010), whereas research from interpreting stud-
ies focuses more on user satisfaction with communication and interpreting processes. Sys-
tematic overviews of findings from medical studies on spoken language RI in healthcare
can be found in Azarmina and Wallace (2005) and Joseph et al. (2017), whereas a detailed
synthesis is provided in De Boe (2023, 24–32) and Braun et al. (2023, 91–94). An overview
of medical studies on signed language distance interpreting in healthcare was conducted by
Rivas Velarde et al. (2022). Their study is highly critical of the state of the art in this area, emphasising that the Global South is underrepresented in the research. It stresses that, although VRI has the potential to overcome communication barriers, it is not a ‘quick fix to overcome accessibility issues’, and that the views and needs of the Deaf and Hard of Hearing community should be taken very seriously when developing this technology (Rivas Velarde et al., 2022).
Divergences in satisfaction results between medical studies and interpreting studies may
also be related to research design and focus. While the former tends to yield predomi-
nantly positive outcomes in support of distance interpreting, satisfaction levels in the latter
are highly varied and appear to be influenced significantly by individual preferences and
configurations. This makes it difficult to provide a concise summary. Interpreting studies
research typically adopts a narrower focus compared to medical studies, often scrutinising
communicative events at a micro-level. Consequently, its outcomes are less suitable for broad generalisations.5
In interpreting studies research on remote healthcare interpreting, the link between
communicative challenges and technical conditions is logically a common thread, since technical conditions can be extremely unfavourable in PSI settings due to issues with connectivity, hardware and software, inferior sound quality, and the like. In surveys conducted among
public sector interpreters, poor sound quality and other technical issues continue to be a
source of inconvenience and frustration (Corpas Pastor and Gaber, 2020). In conference
settings, standards of practice for distance interpreting were developed as early as 1998.
These aimed to safeguard interpreters from working in unacceptable technical conditions
and were updated during the health crisis.6 However, PSI still lacks any such standardisa-
tion (De Boe, 2023; Defrancq and Corpas Pastor, 2023). Although agencies and companies
generally provide guidelines for using distance interpreting (e.g. Dualia, in Amato, 2020),
even requiring users to enrol on a dedicated crash course introducing them to the basic
functioning of VMI (e.g. Verrept & Coune, 2016), low technological standards continue to
prevail in technology-mediated HI. Many studies touch upon several communicative issues
that are directly or indirectly related to technology. These include coordination (Amato,
2018, 2020; Hansen, 2021; Hansen and Svennevig, 2021; Korak, 2010), accuracy (De Boe,
2023), interpreters’ working conditions and overall well-being (Alley, 2012; Corpas Pastor
and Gaber, 2020; De Meulder et al., 2021; Havelka, 2018; Sultanic, 2022), and viability/
feasibility of VMI (Korak, 2010; Koller and Pöchhacker, 2018; Ozolins, 2011; Pöchhacker,
2014), to name only the most recurrent themes and a selection of the researchers involved.
Since it is beyond the scope of this chapter to provide a detailed overview of all these
categories of issues, I will highlight some examples of technology-mediated HI research that
focus on the intricate relationships between several of the aforementioned themes, specifi-
cally studies dealing with the interactional dimension of coordination. Such studies increas-
ingly emphasise the role of multimodality in technology-mediated interaction, following
a more general trend in dialogue interpreting studies (Davitti, 2019a). This perspective is
particularly relevant to research on technology-mediated HI, since multimodality is closely
linked to the constraints and affordances of distance interpreting technologies.
Concerning VMI in healthcare, Hansen (2020, 2024) and Hansen and Svennevig (2021)
explore how the use of embodied resources for turn-taking is constrained
by the specific configuration of VRI. In these instances, the interpreter has only limited
access to the primary participants’ multimodal resources. The researchers illustrate this
observation with examples from authentic VRI consultations and show how participants’
pre-beginning signals (e.g. an in-breath indicating willingness to take the turn) are not
picked up by the parties in the other location (Hansen and Svennevig, 2021). A lack of
awareness of these restrictions may lead to turn-taking issues or even to communication
breakdown in VRI (De Boe, 2023, 2024). Besides their impact on coordination of the inter-
action, the constraints of VRI have also been linked to the interpreter’s professional performance. This is illustrated by examples from authentic data, as analysed by Pöchhacker and Klammer (2024). They demonstrate how the interaction is shaped by the institutional,
spatial, and technical environment and, most importantly, that interaction management
depends less on the interpreter than on the healthcare provider. According to the researchers, the service provider carries a large part of the responsibility for ensuring effective communication and mutual understanding (Pöchhacker and Klammer, 2024, 67).
In TI research within healthcare settings, it has been found that turn transfers are generally achieved smoothly, with participants seemingly being more cautious and leaving
silences between their turns to avoid overlap (De Boe, 2021, 2023). Yet since interpreters
have limited possibilities to claim their turn due to the ineffectiveness of using embodied
resources for turn-taking, turns in TI are generally longer (De Boe, 2023). These results
have been confirmed for TI in settings outside the healthcare domain, for example, by Farag and Meyer (2024). Their analysis of turn-taking issues shows how restricted audibility impacts interpreters’ opportunities to successfully claim their turns. However, they also observed that the majority of turn transitions
occurred free of issues. This research also corroborates the findings of Amato (2018), who
describes difficulties for the interpreter to take control of turn duration in a three-point TI
constellation in service calls, a large number of which were carried out in a healthcare context.
By contrast, Amato also reports on explicit efforts by the interpreter to coordinate the
conversation (Amato, 2018, 86). However, as Farag and Meyer (2024, 2) observe, despite
these challenges and a growing use of RI, the linguistic and communicative requirements of
remote dialogue interpreting remain underexplored.

14.4 Technological tools replacing interpreters


As mentioned in Section 14.1, for various socio-economic reasons, technological tools
replacing interpreters are increasingly used in healthcare settings. This section provides
details on the use of and issues with using technological tools to replace interpreters in
language-discordant healthcare communication. The few existing studies refer to general translation tools, dedicated care applications, and so-called phrasebooks
containing pre-translated domain-specific phrases.
Thonon et al. (2021) carried out a systematic overview of the technological tools sup-
porting healthcare providers in their communication with migrant patients. They explored
apps aimed at establishing communication as well as promoting health among non-native-speaking patients, assessing the apps’ features, acceptability, and efficacy. Positive outcomes for translation apps included reducing the need to call an interpreter, especially in emergency situations, as well as reduced consultation time and patient anxiety. Negative results included limitations in the dialogue between healthcare providers and
patients, as well as concerns about the apps hindering therapeutic relationships (Thonon
et al., 2021, 8). The apps that users (healthcare providers and migrant patients) deemed
most acceptable were those that integrated features beyond simple translation. Such fea-
tures include scheduling appointments or entering basic information to prepare a visit.
Additionally, users highly valued the inclusion of audio and video features in both transla-
tion and health promotion apps. The researchers therefore recommended integrating such
features in the development of new applications (Thonon et al., 2021, 11).
Other studies on the use of technological tools mention similar advantages and draw-
backs. In the Netherlands, Van Straaten et al. (2023) carried out a large-scale survey. This
was mostly aimed at healthcare providers, involving patients to a lesser degree, in Dutch
hospitals and had the objective of compiling an inventory of ‘digital support tools’ and
investigating their use. Tools were defined very broadly and included remote interpret-
ing solutions (TI and VMI), real-time translation apps, phrasebook apps, mobile transla-
tion devices, language-supported software, and multilingual information websites. Results
showed that healthcare providers often considered the aforementioned digital support
tools to be accessible alternatives, given the lack of availability of face-to-face interpreters.
Moreover, providers also valued these tools because they are often free of charge and offer
numerous options, such as translating spoken language, written language, or both, and
using these tools saved time compared with using a human interpreter. The authors con-
clude that Google Translate stands out as the best-known and most frequently used digital
communication aid in interactions with non-native patients, ahead of TI. Yet healthcare providers indicated that they are not always satisfied with the digital tools they currently use.
They perceive these as challenging, mostly due to their inadequate quality. For example, the
translations provided by digital tools are not always deemed to be accurate, leading health-
care providers to not trust them fully. Some participants also mentioned data security as a
disadvantage of using MT. TI and the voice translation app ‘SayHi’ were rated higher than
Google Translate, although the participants indicated that experiences with various digital
tools varied greatly and the quality of the translation often depended on the language pairs
used (Van Straaten et al., 2023, 5).
Despite the exponential growth in use of digital tools as well as interest from the research
community in various aspects of MT (e.g. Kenny, 2022), so far, few studies have examined
the impact of the use of MT in healthcare communication. One of the few studies that
exist is an early small-scale experiment by Patil and Davies (2014), who translated ten
medical phrases into 26 languages. They found that Google Translate achieved only a 57.7%
accuracy rate for translating medical phrases and should therefore not be relied upon for
crucial medical communication. Nevertheless, as the study lacks clarity on methodological
procedures, and since MT has evolved considerably in the meantime, Patil and Davies’s (2014)
results have little relevance today.
A more recent overview study of the use of MT in medicine and law by Vieira et al.
(2021, 1515) confirms that MT errors may indeed pose serious dangers in high-risk envi-
ronments. The authors also note that little is known about the nature of the risks involved,
or of the broader effects of ‘uninformed’ use of MT. In addition, Vieira et al. (2021) show
that research often fails to consider the complexities of language and translation. They also
draw attention to the risk that the use of MT reinforces social inequalities and jeopardises
certain communities. In relation to this, Boujon et al. (2018) point out that MT apps do not always cover languages that are highly relevant in healthcare, notable examples being Tigrinya and sign languages, and are not easily adapted to include additional languages.
Several researchers also put forward ethical concerns surrounding the assurance of data
protection that is lacking in such tools (Boujon et al., 2018; Braun et al., 2023).
One way to remedy these objections is to use so-called phrasebook apps (Braun et al.,
2023), ‘phraselators’ (Boujon et al., 2018), or ‘multilingual phrasebooks’ (Pokorn et al.,
2024). These tools are tailored by medical professionals for medical diagnostic scenarios
and comprise a collection of pre-translated, domain-specific standard sentences, including
questions and instructions. As Boujon et al. (2018, 51) argue, unlike MT, phrasebook apps
offer the benefit of providing reliable translations and are more straightforward to adapt to
new languages or domains. Nevertheless, due to the limited combination of sentences and
translations offered by these tools, users must navigate through menus or keywords to find
a precise sentence. This proves impractical. To tackle this, Boujon’s team from Geneva Uni-
versity Hospitals developed a more sophisticated tool known as ‘BabelDr’. This ASR-based
tool was compared against a traditional phraselator without ASR (MediBabble). The find-
ings indicate that the new tool enabled participants to gather information more swiftly and
effortlessly. Furthermore, it enhanced interaction by enabling users to engage in more natu-
ral conversations compared to using a traditional phrasebook (Boujon et al., 2018, 63).
Similar tools are provided by many commercial companies, for example, Global Talk
Care.7 These are being introduced to the market at such an accelerated pace that it would
be impossible to provide an exhaustive overview of the industry. Nevertheless, all these
apps share common features. They may integrate several tasks, are often managed by large
American companies, and are predominantly the subject of usability studies (see review by
Thonon et al., 2021). While studies offer comprehensive analyses of usability and function-
alities, a gap remains in our understanding of how users interact with these diverse tools and
their impact on language mediation during healthcare interactions (Braun et al., 2023, 101).

256
Healthcare settings

14.5 Conclusion
This chapter set out to discuss current common practices of technology-based solutions
for bridging language gaps in healthcare communication. These include remote healthcare
interpreting by humans, which has seen a considerable increase in the last decades, and
technological tools replacing interpreters, which is a recent phenomenon. Meanwhile, TI
has become an established practice in healthcare. However, the use of VMI is only slowly
catching up in healthcare settings, despite its steep increase during the COVID-19 pan-
demic. Nevertheless, its strong potential is being embraced in our audiovisual era, and users
generally prefer VMI over TI. It must be acknowledged that obtaining precise figures
on the use of distance interpreting in healthcare, including respective shares of TI and VMI,
remains a complex task.
Although TI has frequently been designated as an ‘inferior type of interpreting’ (Ozolins,
2011) and considered ‘unsatisfactory’ (Lion et al., 2015), in many countries, TI services in
healthcare now outnumber on-site interpreting services. While both medical and interpret-
ing studies consistently indicate that TI is generally the least-preferred option compared to
on-site or VMI solutions, it seems to be valued more highly than technological
tools that replace human interpreters. This highlights the significance of human touch, trust,
and rapport in healthcare and other PSI interactions, in which ‘interaction, non-verbal com-
munication and language paralinguistic information . . . are of paramount importance, and
ethics and confidentiality issues are at stake’ (Corpas Pastor and Gaber, 2020, 58).
Similarly, the medical, legal, ethical, and economic arguments for using distance inter-
preting in healthcare (Kletecka-Pulker and Parrag, 2015) constitute excellent reasons for
arguing against technology-generated, non-human interpreting tools in such critical settings.
Given the current low levels of accuracy in MT for some language pairs that are relevant
for healthcare communication, the risk of jeopardising patients’ health should not
be underestimated. As far as dedicated care apps are concerned, although they seem to pose
a lower risk in terms of accuracy, the impact on interactional dynamics and therapeutic
relationships between healthcare providers and patients remains unclear for the moment.
Although such issues have been addressed to a larger extent in technology-mediated
HI, they are also still in need of further mapping. To illustrate, the more recently emerged
three-point VRI configurations, which became more common during the global healthcare
crisis, have not yet been thoroughly investigated. Although the different RI configurations
share some characteristics in terms of their technical constraints, each configuration also
poses its own specific challenges (Skinner et al., 2018, 19). The effect of these emerging
configurations on the coordination of the interaction and doctor–patient rapport has not
yet been fully explored (Braun et al., 2023; Verhaegen, 2023).
In addition, while publications describing interpreters’ working conditions since the pan-
demic are slowly emerging, we currently have little knowledge about the structural impact
of technology support on HI quality, working conditions, or cognitive processes involved in
HI. Moreover, a lack of awareness of this impact is being reported, for example, in the form
of healthcare providers’ reduced willingness and readiness to adopt professional distance
interpreting (Fiedler et al., 2022; Gutman et al., 2020). This reflects a more generalised lack
of awareness of the importance of using professional language mediation, for example, for
Deaf persons in healthcare (Middleton et al., 2010; Iezzoni et al., 2004).
Further issues that need to be addressed urgently pertain to ethical aspects of technol-
ogy use in healthcare communication. So far, little attention has been paid to ethical or

257
The Routledge Handbook of Interpreting, Technology and AI

ecological implications of the use of technology. Data security and an increasing ‘digital
divide’ (Valero-Garcés, 2018) are two such examples. Although these issues are discussed
in some more general research (e.g. Lázaro-Gutiérrez et al., 2021), ethical aspects do not
seem to be the main focus of any research into technology-based communication in health-
care settings for the moment. However, with the current advancement of AI-supported tools
(Braun et al., 2023), ethical matters are likely to be of growing importance, and clearer poli-
cies need to be developed. Currently, situations vary from one country to another, and even
between institutions within the same country. For example, in the UK, official guidance
does not endorse the use of MT in primary care, and medical advisers warn against its
use in everyday clinical practice (Braun et al., 2023). In contrast, in other countries, such as
the Netherlands, regulations concerning the use of MT are being investigated but currently
remain unclear (Van Straaten et al., 2023). In the United States, some hospitals prohibit
healthcare clinicians from communicating with patients in a non-native language
unless their proficiency in that language has been validated (Lion et al., 2015).
Following Vieira et al. (2021) in their approach to the use of MT in critical settings, we
must conclude that more extensive interdisciplinary research is crucial for understanding
the complexities of technology-based cross-linguistic communication in healthcare. How-
ever, integration between medical and interpreting studies remains limited. To date, each of
these domains has investigated technology-based solutions for bridging the language
gap in healthcare settings on its own. Whereas interpreting studies tend to pay atten-
tion to results generated by medical studies, this knowledge flow seems to work only in
one direction. Except for a few studies (e.g. Greenhalgh et al., 2016), medical studies tend
to miss out on the opportunity of including a more linguistically and communicatively ori-
ented perspective, which can be provided by interpreting studies (Davitti, 2019b). Interpret-
ing studies, in turn, could greatly benefit from closer cooperation with medical studies by
gaining access to larger samples of users and solid research environments. A higher level of
interdisciplinarity is therefore urgently needed to further explore the dynamics of human–
machine interaction, user experience, and the impact of technology on cognitive processes
in healthcare settings. Examples include investigations into how the different groups
involved in healthcare experience technology support (Braun et al.,
2023). Looking ahead, there is an anticipated increase in the use of technology to support
cross-linguistic communication in healthcare (Kerremans et al., 2018, 766), with advance-
ments in artificial intelligence driving the development of new health apps and tools. This
evolving landscape of technology in healthcare presents dynamic opportunities for further
exploration of the field, ultimately shaping the future of interpreting and other types of language
mediation in healthcare settings.

Notes
1 The terms ‘e-health’, ‘telehealth’, and ‘telemedicine’ are often used interchangeably in the literature
and correspond to what the World Health Organization defines as telemedicine, that is, ‘delivery of
health care services, where patients and providers are separated by distance’ (WHO-ITU, 2022).
2 Personal communication of the author with hospital employees, March 2024.
3 When publishing figures, agencies often do not distinguish between different configurations
of VMI. When it is unclear whether figures apply to VRI or other types of VMI, the overarching
term VMI is used.
4 www.sociaalvertaalbureau.be/wp-content/uploads/2023/05/JAARVERSLAG-NL-2022.docx-1.pdf
(accessed 4.4.2025).


5 Another important difference between the medical studies and interpreting studies, pointed out by
Bischoff and Grossmann (2006, 34), is that the former are generally carried out in the United States,
where non-native-speaking patients are predominantly Hispanophone, whereas IS research outside
the United States focuses on migrants, representing a multitude of languages and cultures.
6 AIIC Reference Guide for Remote Simultaneous Interpreting. https://aiic.ch/wp-content/uploads/2020/05/aiic-ch-reference-guide-to-rsi.pdf (accessed 4.4.2025).
7 www.globaltalk.be/care-app/ (accessed 4.4.2025).

References
Alley, E., 2012. Exploring Remote Interpreting. International Journal of Interpreter Education 4(1),
111–119.
Almahasees, Z., Al-Natour, M., Mahmoud, S., Aminzadeh, S., 2024. Bridging Communication Gaps
in Crisis: A Case Study of Remote Interpreting in the Middle East During the COVID-19 Pan-
demic. World Journal of English Language 14(2), 462–470. URL https://doi.org/10.5430/wjel.
v14n2p462
Amato, A., 2018. Challenges and Solutions: Some Paradigmatic Examples. In Amato, A., Spinolo, N.,
González Rodríguez, M.J., eds. Handbook of Remote Interpreting: Research Report Shift in Oral-
ity Erasmus + Project: Shaping the Interpreters of the Future and of Today, 79–98. URL www.
shiftinorality.eu/es/resources/2018/05/11/shift-handbook-remote-interpreting
Amato, A., 2020. Interpreting on the Phone: Interpreter’s Participation in Healthcare and Medical
Emergency Service Calls. inTRAlinea Special Issue: Technology in Interpreter Education and Prac-
tice. URL www.intralinea.org/specials/article/2519
Anazawa, R., Ishikawa, H., Kiuchi, T., 2012. The Accuracy of Medical Interpretations: A Pilot Study
of Errors in Japanese-English Interpreters During a Simulated Medical Scenario. Translation &
Interpreting 4(1), 1–20.
Angelelli, C.V., 2014. Interpreting in the Healthcare Setting: Access in Cross-Linguistic Communica-
tion. In Hamilton, H., Chou, S., eds. The Routledge Handbook of Language and Health Com-
munication. Routledge, London, 573–585.
Azarmina, P., Wallace, P., 2005. Remote Interpretation in Medical Encounters: A Systematic Review. Jour-
nal of Telemedicine and Telecare 11, 140–145. URL https://doi.org/10.1258/1357633053688679
Bachelier, K., Orlando, M., 2024. Building Capacity of Interpreting Services in Australian Health-
care Settings: The Use of Video Remote During the COVID-19 Pandemic. Media and Intercul-
tural Communication: A Multidisciplinary Journal 2(1), 80–96. URL https://doi.org/10.22034/
MIC.2024.446261.1015
Bischoff, A., Grossmann, F., 2006. Telefondolmetschen im Spital. Universität Basel, Institut für
Pflegewissenschaft, Basel.
Bot, H., 2013. Taalbarrières in de zorg. Van Gorcum, Utrecht.
Boujon, V., Bouillon, P., Spechbach, H., Gerlach, J., Strasly, I., 2018. Can Speech-Enabled Phrasela-
tors Improve Healthcare Accessibility? A Case Study Comparing Babeldr with Medibabble for
Anamnesis in Emergency Settings. In Proceedings of the 1st Swiss Conference on Barrier-free Com-
munication, Winterthur, 50–65. URL https://doi.org/10.21256/zhaw-3000
Braun, S., 2019. Technology in Interpreting. In O’Hagan, M., ed. Routledge Encyclopedia of Transla-
tion Studies. Routledge, London, 271–288. URL https://doi.org/10.4324/9781315311258-19
Braun, S., Al Sharou, K., Temizöz, Ö., 2023. Technology Use in Language-Discordant Interpersonal
Healthcare Communication. In Wadensjö, C., Gavioli, L., eds. The Routledge Handbook of Public
Service Interpreting. Routledge, London, 89–105. URL https://doi.org/10.4324/9780429298202
Braun, S., Taylor, J.L., 2012. AVIDICUS Comparative Studies – Part I: Traditional Interpreting and
Remote Interpreting in Police Interviews. In Braun, S., Taylor, J.L., eds. Videoconference and
Remote Interpreting in Criminal Proceedings. Intersentia, Antwerp, 99–117.
Brunson, J.L., 2018. The Irrational Component in the Rational System: Interpreter Talk About Their
Motivation to Work in Relay Services. In Napier, J., Skinner, R., Braun, S., eds. Here or There:
Research on Interpreting via Video Link. Gallaudet University Press, Washington, DC, 39–60.
URL https://doi.org/10.2307/j.ctv2rh2bs3.5


Corpas Pastor, G., Gaber, M., 2020. Remote Interpreting in Public Service Settings: Technology, Per-
ceptions and Practice. SKASE Journal of Translation and Interpretation 13(2), 58–78. URL http://
hdl.handle.net/2436/624259
Corpas Pastor, G., Sánchez Rodas, F., 2021. Now What? A Fresh Look at Language Technolo-
gies and Resources for Translators and Interpreters. In Lavid-López, J., Maíz-Arévalo, C.,
Zamorano-Mansilla, J., eds. Corpora in Translation and Contrastive Research in the Digital Age.
John Benjamins, Amsterdam, 23–48. URL https://doi.org/10.1075/btl.158.01cor
Crossman, K.L., Wiener, E., Roosevelt, G., Bajaj, L., Hampers, L.C., 2010. Interpreters: Telephonic,
In-Person Interpretation and Bilingual Providers. Pediatrics 125(3), 631–638. URL https://doi.
org/10.1542/peds.2009-0769
Davitti, E., 2019a. Methodological Explorations of Interpreter-Mediated Interaction: Novel
Insights from Multimodal Analysis. Qualitative Research 19(1), 7–29. URL https://doi.
org/10.1177/1468794118761492
Davitti, E., 2019b. Healthcare Interpreting. In Baker, M., Saldanha, G., eds. Routledge Encyclopedia
of Translation Studies. Routledge, London.
Dean, R.K., Pollard, R.Q. Jr., 2011. Context-Based Ethical Reasoning in Interpreting: A Demand
Control Schema Perspective. Interpreter and Translator Trainer 5(1), 155–182. URL https://
doi.org/10.1080/13556509.2011.10798816
De Boe, E., 2015. The Influence of Governmental Policy on Public Service Interpreting in the Neth-
erlands. The International Journal for Translation & Interpreting Research 7(3), 166–184. URL
https://doi.org/10.12807/ti.107203.2015.a12
De Boe, E., 2021. Management of Overlapping Speech in Remote Healthcare Interpreting. The Inter-
preter’s Newsletter 26, 137–155. URL https://doi.org/10.13137/2421-714X/33268; www.openstarts.units.it/dspace/handle/10077/2119
De Boe, E., 2023. Remote Interpreting in Healthcare Settings. Peter Lang, London. URL https://
doi.org/10.3726/b18200
De Boe, E., 2024. Synchronization of Interaction in Healthcare Interpreting by Video Link and Telephone. In
De Boe, E., Vranjes, J., Salaets, H., eds. Interactional Dynamics in Remote Interpreting: Micro-Analytical
Approaches. Routledge, London, 22–41. URL https://doi.org/10.4324/9781003267867.
De Boe, E., Vranjes, J., Salaets, H., 2024. About the Need for Micro-Analytical Investigations in
Remote Dialogue Interpreting. In De Boe, E., Vranjes, J., Salaets, H., eds. Interactional Dynamics
in Remote Interpreting: Micro-Analytical Approaches. Routledge, London, 1–21. URL https://doi.
org/10.4324/9781003267867
De Cotret, F.R., Beaudoin-Julien, A.-A., Leanza, Y., 2020. Implementing and Managing Remote Pub-
lic Service Interpreting in Response to COVID-19 and Other Challenges of Globalization. Meta
65(3), 618–642. URL https://doi.org/10.7202/1077406ar
Defrancq, B., Corpas Pastor, G., 2023. Introduction. In Corpas Pastor, G., Defrancq, B., eds. Inter-
preting Technologies: Current and Future Trends. John Benjamins, Amsterdam, 1–6. URL https://
doi.org/10.1075/ivitra.37.intro
De Groot, E., Fransen, L., Van Dam, F., Pinckaers, E., Berkhout, B., 2022. Tolken in de zorg: Een
overzicht van huidige inzet, financiering en knelpunten [Research Rapport]. Berenschot, Utrecht.
De Meulder, M., Pouliot, O., Gebruers, K., 2021. Remote Sign Language in Times of COVID-19
[Research Report]. Kenniscentrum Gezond en Duurzaam Leven, Hogeschool Utrecht, Utrecht.
www.hu.nl/onderzoek/publicaties/remote-sign-language-interpreting-in-times-of-covid-19
(accessed 1.11.2024).
Esterle, L., Mathieu-Fritz, A., 2013. Teleconsultation in Geriatrics: Impact on Professional Practice.
International Journal of Medical Informatics 82(8), 684–695. URL http://dx.doi.org/10.1016/j.
ijmedinf.2013.04.006
Farag, F., Meyer, B., 2024. Coordination in Telephone-Based Remote Interpreting. Interpreting 26(1),
80–130. URL https://doi.org/10.1075/intp.00097.far
Fiedler, J., Pruskil, S., Wiessner, C., Zimmermann, T., Scherer, M., 2022. Remote Interpreting in Pri-
mary Care Settings: A Feasibility Trial in Germany. BMC Health Services Research 22(1). URL
https://doi.org/10.1186/s12913-021-07372-6
Flores, G., Abreu, M., Barone, C.P., Bachur, R., Lin, H., 2012. Errors of Medical Interpretation
and Their Potential Clinical Consequences: A Comparison of Professional Versus Ad Hoc Versus
No Interpreters. Annals of Emergency Medicine 60(5), 545–553. URL https://doi.org/10.1016/j.
annemergmed.2012.01.025


González Rodríguez, M.J., Spinolo, N., 2017. Telephonic Dialogue Interpreting. In Niemants, N.,
Cirillo, L., eds. Teaching Dialogue Interpreting. John Benjamins, Amsterdam, 242–257. URL
https://doi.org/10.1075/btl.138.12gon
Gracia-García, R.A., 2002. Telephone Interpreting: A Review of Pros and Cons. In Brennan, S., ed.
Proceedings of the 43rd Annual Conference. American Translators Association, Alexandria, VA,
195–216.
Greenhalgh, T., Vijayaraghavan, S., Wherton, J., Shaw, S., Byrne, E., Campbell-Richards, D., Bhat-
tacharya, S., Hanson, P., Ramoutar, S., Gutteridge, C., Hodkinson, I., Collard, A., Morris, J.,
2016. Virtual Online Consultations: Advantages and Limitations (VOCAL) Study. BMJ Open 6,
e009388. URL https://doi.org/10.1136/bmjopen-2015-009388
Gutman, C.K., Klein, E.J., Follmer, K., Brown, J.C., Ebel, B.E., Lion, K.C., 2020. Deficiencies in
Provider-Reported Interpreter Use in a Clinical Trial Comparing Telephonic and Video Interpre-
tation in a Pediatric Emergency Department. Joint Commission Journal on Quality and Patient
Safety 46(10), 573–580. URL https://doi.org/10.1016/j.jcjq.2020.08.001
Hale, S., 2007. Community Interpreting. Palgrave Macmillan, Hampshire/Basingstoke.
Hansen, J.P.B., 2020. Invisible Participants in a Visual Ecology: Visual Space as a Resource for Organ-
izing Video-Mediated Interpreting in Hospital Encounters. Social Interaction. Video-Based Studies
of Human Sociality 3(3), 1–25. URL https://doi.org/10.7146/si.v3i3.122609
Hansen, J.P.B., 2021. Video-Mediated Interpreting: The Interactional Accomplishment of Interpret-
ing in Video-Mediated Environments (Unpublished doctoral thesis). University of Oslo, Oslo.
Hansen, J.P.B., 2024. Interpreters’ Repair Initiators in Video-Mediated Environments. In De Boe,
E., Vranjes, J., Salaets, H., eds. Interactional Dynamics in Remote Interpreting: Micro-Analytical
Approaches. Routledge, London, 91–112. URL https://doi.org/10.4324/9781003267867
Hansen, J.P.B., Svennevig, J., 2021. Creating Space for Interpreting Within Extended Turns at Talk.
Journal of Pragmatics 182, 144–162. URL https://doi.org/10.1016/j.pragma.2021.06.009
Haualand, H., 2010. Provision of Videophones and Video Interpreting for the Deaf and Hard
of Hearing: A Comparative Study of Video Interpreting (IV) Systems in the US, Norway and
Sweden. The Swedish Institute of Assisted Technology. www.independentliving.org/files/
haualand20100924video-interpreting-systems.pdf (accessed 23.2.2024).
Havelka, I., 2018. Videodolmetschen im Gesundheitswesen: Dolmetschwissenschaftliche Untersu-
chung eines österreichisches Pilotprojektes. Frank & Timme, Berlin.
Hedding, T., Kaufman, G., 2012. Health Literacy and Deafness: Implications for Interpreter
Education. In Swabey, L., Malcolm, K., eds. In Our Hands: Educating Healthcare Interpret-
ers. Gallaudet University Press, Washington, DC, 164–189. URL https://doi.org/10.2307/j.
ctv2rcnmkt.12
Iezzoni, L.I., O’Day, B.L., Killeen, M., Harker, H., 2004. Communicating About Health Care: Obser-
vations from Persons Who Are Deaf or Hard of Hearing. Annals of Internal Medicine 140(5),
356–362. URL https://doi.org/10.7326/0003-4819-140-5-200403020-00011
Imdi, 2021. Offentlige organers behov for tolking 2021. Faktaark 2021. URL www.imdi.no/contentassets/c669ebfc896d4fcc847b29b9ea14ae90/faktaark-2021.pdf (accessed 20.2.2024).
Imdi, 2022. Offentlige organers behov for tolking 2022. Faktaark 2022. URL www.imdi.no/
contentassets/261ab2f7b670401797f53d72fd574621/faktaark-2022.pdf (accessed 20.2.2024).
Jacobs, E.A., Shepard, D.S., Suaya, J.A., Stone, E., 2004. Overcoming Language Barriers in Health
Care: Costs and Benefits of Interpreter Services. American Journal of Public Health 94(5), 866–869.
URL https://doi.org/10.2105/ajph.94.5.866
Joseph, C., Garruba, M., Melder, A., 2017. Patient Satisfaction of Telephone or Video Interpreter
Services Compared with In-Person Services: A Systematic Review. Australian Health Review 42(2),
168–177. URL https://doi.org/10.1071/AH16195
Kelly, N., 2008. Telephone Interpreting: A Comprehensive Guide to the Profession. Trafford Publish-
ing, Bloomington, IN.
Kenny, D., ed., 2022. Machine Translation for Everyone: Empowering Users in the Age of Artificial
Intelligence. Language Science Press, Berlin. URL http://doi.org/10.5281/zenodo.6653406
Kerremans, K., De Ryck, L., De Tobel, V., Janssens, R., Rillof, P., Scheppers, M., 2018. Bridging the
Communication Gap in Multilingual Service Encounters: A Brussels Case Study. The European
Legacy 23(7–8), 757–772. URL https://doi.org/10.1080/10848770.2018.1492811
Kletecka-Pulker, M., Parrag, S., 2015. Pilotprojekt Qualitätssicherung in der Versorgung
nicht-deutschsprachiger PatientInnen: Videodolmetschen im Gesundheitswesen (research report).


Platform Patientensicherheit. URL https://ierm.univie.ac.at/fileadmin/user_upload/i_ierm/Projekte/Endbericht_QVC-_Qualitaetssicherung_von_nicht-deutschsprachigen_PatientInnen.pdf (accessed 12.2.2024).
Koller, M., Pöchhacker, F., 2018. The Work and Skills . . . a Profile of First-Generation Video Remote
Interpreters. In Napier, J., Skinner, R., Braun, S., eds. Here or There: Research on Interpreting via
Video Link. Gallaudet University Press, Washington DC, 89–110.
Korak, C., 2010. Remote Interpreting via Skype. Anwendungsmöglichkeiten von VoIP-Software im
Bereich Community Interpreting – Communicate Everywhere? Frank & Timme, Berlin.
Lázaro-Gutiérrez, R., Iglesias-Fernández, E., Cabrera-Méndez, G., 2021. Ethical Aspects of Tel-
ephone Interpreting Protocols. Verba Hispanica 29(1), 137–156. URL https://doi.org/10.4312/
vh.29.1.137-156
Li, Y., 2022. Video-Mediated Interpreting: A Systematic Critical Review and Pedagogical Implications
for Interpreter Education. International Journal of Emerging Technologies in Learning 17(19),
191–206. URL https://doi.org/10.3991/ijet.v17i19.31653
Lion, K.C., Brown, J.C., Ebel, B.E., Klein, E.J., Strelitz, B., Gutman, C.K., Hencz, P., Fernandez,
J., Mangione-Smith, R., 2015. Effect of Telephone vs Video Interpretation on Parent Compre-
hension, Communication, and Utilization in the Pediatric Emergency Department: A Rand-
omized Clinical Trial. JAMA Pediatrics 169(12), 1117–1125. URL https://doi.org/10.1001/
jamapediatrics.2015.2630
Masland, M.C., Lou, C., Snowden, L., 2010. Use of Communication Technologies to Cost-Effectively
Increase the Availability of Interpretation Services in Healthcare Settings. Telemedicine and
e-Health 16(6), 739–745. URL https://doi.org/10.1089/tmj.2009.0186
Middleton, A., Turner, G.H., Bitner-Glindzicz, M., Lewis, P., Richards, M., Clarke, A., Stephens,
D., 2010. Preferences for Communication in Clinic from Deaf People: A Cross-Sectional Study.
Journal of Evaluation in Clinical Practice 16(4), 811–817. URL https://doi.org/10.1111/j.1365-
2753.2009.01207.x
National Interpreter Education Center, 2008. Phase One Deaf Consumer Needs Assessment: Final
Report (research report). URL http://www.interpretereducation.org/wp-content/uploads/2011/06/
FinalComparisonAnalysis.pdf (accessed 2.1.2024).
Ng, E.N.S., Crezee, I., 2020. Interpreting in Legal and Healthcare Settings: Perspectives on Research
and Training. John Benjamins, Amsterdam. URL https://doi.org/10.1075/btl.151
Ozolins, U., 2011. Telephone Interpreting: Understanding Practice and Identifying Research Needs.
Translation & Interpreting 3(1), 33–47.
Paras, M., Leyva, O., Berthold, T., Otake, R., 2002. Videoconferencing Medical Interpretation: The
Results of Clinical Trials. Health Access Foundation, Oakland, CA.
Patil, S., Davies, P., 2014. Use of Google Translate in Medical Communication: Evaluation of Accu-
racy. British Medical Journal 349. URL https://doi.org/10.1136/bmj.g7392
Phelan, M., 2012. Medical Interpreting and the Law in the European Union. European Journal of
Health Law 19, 333–335. URL https://doi.org/10.1163/157180912X650681
Pöchhacker, F., 2006. Research and Methodology in Healthcare Interpreting. In Hertog, E., Van der
Veer, B., eds. Taking Stock: Research and Methodology in Community Interpreting. Linguistica
Antverpiensia 5, 135–159.
Pöchhacker, F., 2014. Remote Possibilities: Training Simultaneous Video Interpreting for Austrian Hos-
pitals. In Nicodemus, B., Metzger, M., eds. Investigations in Healthcare Interpreting. Gallaudet University Press, Washington, DC, 302–326. URL https://doi.org/10.2307/j.ctv2rh2bq2.13
Pöchhacker, F., Klammer, M., 2024. Ensuring Understanding in Video Remote-Interpreted Doc-
tor – Patient Communication. In De Boe, E., Vranjes, J., Salaets, H., eds. Interactional Dynamics
in Remote Interpreting: Micro-Analytical Approaches. Routledge, London, 66–90. URL https://
doi.org/10.4324/9781003267867-4
Pöchhacker, F., Schlesinger, M., 2005. Introduction: Discourse Based Research on Healthcare Inter-
preting. Interpreting 7(2), 157–165. URL https://doi.org/10.1075/intp.7.2.01poc
Pokorn, N.K., Esih, M., Zelko, E., Hirci, N., Milavec Kapun, M., Mikolič Južnič, T., 2024. Simu-
lated Role Plays or Field Observation? Usability Testing of a Healthcare Phrasebook. Perspectives.
Online, 1–19. URL https://doi.org/10.1080/0907676X.2024.2326965
Ribera, J.M., Hausmann-Muela, S., Grietens, K.P., Toomer, E., 2008. Is the Use of Interpreters in Med-
ical Consultations Justified? A Critical Review of the Literature. PASS International v.z.w. www.


semanticscholar.org/paper/PASS-International-Is-the-use-of-interpreters-in-Ribera-Hausmann-
Muela/46e06f27509729bb1711780e65c00c7a0c3e87e1 (accessed 3.3.2024).
Rivas Velarde, M., Jagoe, C., Cuculick, J., 2022. Video Relay Interpretation and Overcoming Barri-
ers in Health Care for Deaf Users: Scoping Review. Journal of Medical Internet Research 24(6),
e32439. URL https://doi.org/10.2196/32439
Saint-Louis, L., Friedman, E., Chiasson, E., Quessa, A., Novaes, F., 2003. Testing New Technologies
in Medical Interpreting. Cambridge Health Alliance, Somerville, MA. URL https://icommunity-
health.org/publications/testing-new-technologies-in-medical-interpreting/
Schinkel, S., Schouten, B.C., Kerpiclik, F., Van Den Putte, B., Van Weert, J.C.M., 2019. Perceptions of
Barriers to Patient Participation: Are They Due to Language, Culture, or Discrimination? Health
Communication 34(12), 1469–1481. URL https://doi.org/10.1080/10410236.2018.1500431
Schouten, B.C., Cox, A., Duran, G., Kerremans, K., Banning, L.K., Lahdidioui, A., van den Muijsen-
bergh, M., Schinkel, S., Sungur, H., Suurmond, J., Zendedel, R., 2020. Mitigating Language and
Cultural Barriers in Healthcare Communication: Toward a Holistic Approach. Patient Education
and Counseling 103(12), 2604–2608. URL https://doi.org/10.1016/j.pec.2020.05.001
Skinner, R., Napier, J., Braun, S., 2018. Interpreting via Video Link: Mapping the Field. In Napier,
J., Skinner, R., Braun, S., eds. Here or There: Research on Interpreting via Video Link. Gallaudet
University Press, Washington, DC, 11–39. URL https://doi.org/10.2307/j.ctv2rh2bs3.4
Sociaal Vertaalbureau, 2022. Jaarverslag (Annual Report). URL https://www.sociaalvertaalbureau.be/
wp-content/uploads/2023/06/Jaaroverzicht-2022-NL.png (accessed 1.3.2024).
Stengers, H., Lázaro-Gutiérrez, R., Kerremans, K., 2023. Public Service Interpreters’ Perceptions and
Acceptance of Remote Interpreting Technologies in Times of a Pandemic. In Corpas Pastor, G.,
Defrancq, B., eds. Interpreting Technologies: Current and Future Trends. John Benjamins, Amster-
dam. https://doi.org/10.1075/ivitra.37.05ste
Sultanic, I., 2022. Interpreting in Pediatric Therapy Settings During the COVID-19 Pandemic: Ben-
efits and Limitations of Remote Communication Technologies and Their Effect on Turn-Taking
and Role Boundary. FITISPos International Journal 9(1), 78–101. URL https://doi.org/10.37536/
FITISPos-IJ.2023.1.9.313
Swabey, L., Laurion, R., Patrie, C., Ramirez, R., 2017. Using a Career Lattice to Chart a Path to Compe-
tency in Healthcare Interpreting. Conference of Interpreter Trainers – Out of the Gate, Towards the
Triple Crown: Research, Learn & Collaborate. URL https://citsl.org/using-a-career-lattice-to-chart-
a-path-to-competency-in-healthcare-interpreting/ (accessed 14.2.2024).
Sweileh, W.M., Al-Jabi, S.W., AbuTaha, A.S., Zyoud, S.H., Anayah, F.M.A., Sawalha, A.F., 2017. Bib-
liometric Analysis of Worldwide Scientific Literature in Mobile-Health: 2006–2016. BMC Medical
Informatics and Decision Making 17(1), 72. URL https://doi.org/10.1186/s12911-017-0476-7
Teladoc Health, 2018. InTouch Health & InDemand Interpreting Partnership Enables Clinicians
to Connect with Interpreters. URL https://business.teladochealth.com/newsroom/intouch-health/
indemand-interpreting-partners-with-intouch-health/. (accessed 1.14.2025).
Thonon, F., Perrot, S., Yergolkar, A.V., Rousset-Torrente, O., Griffith, J.W., Chassany, O., Duracin-
sky, M., 2021. Electronic Tools to Bridge the Language Gap in Health Care for People Who Have
Migrated: Systematic Review. Journal of Medical Internet Research 23(5), e25131. URL https://
doi.org/10.2196/25131
Valero-Garcés, C., 2018. Introduction. PSIT and Technology: Challenges in the Digital Age. FITISPos
International Journal 5(1), 1–6. URL https://doi.org/10.37536/FITISPos-IJ.2018.5.1.185
Van Straaten, W., Bloem, W., Gilhuis, N., Schipper, E., Boonen, L., 2023. Digitale hulpmiddelen
voor het overkomen van taalbarrières (Digital Tools for Overcoming Language Barriers). Equalis
Strategy & Modeling, Utrecht. URL https://open.overheid.nl/documenten/5da7a55d-461b-4277-8c0f-3831baba1390/file (accessed 1.14.2025).
Verhaegen, M., 2023. Exploring Turn-Taking in Video-Mediated Interpreting: A Research Methodol-
ogy Using Eye Tracking. Interpreter’s Newsletter 28, 151–169. URL https://www.openstarts.units.
it/handle/10077/35555
Verrept, H., Coune, I., 2016. Guide for Intercultural Mediation in Healthcare. Federale over-
heidsdienst Volksgezondheid, veiligheid van de voedselketen en leefmilieu - Cel intercul-
turele bemiddeling & beleidsondersteuning, Brussels. URL https://www.health.belgium.be/nl/
gids-voor-de-interculturele-bemiddeling-de-gezondheidszorg (accessed 3.12.2024).
Verrept, H., Coune, I., Van de Velde, J., Baatout, S., 2018. Evaluatie projecten interculturele bemid-
deling via videoconferentie (Evaluation of Intercultural Mediation via Videoconferencing Projects).
Research report, Federale overheidsdienst Volksgezondheid, veiligheid van de voedselketen en
leefmilieu, Cel interculturele bemiddeling & beleidsondersteuning. URL https://www.health.belgium.be/nl/
evaluatierapport-2018-interculturele-bemiddeling-op-afstand-videoconferentie (accessed 3.12.2024).
Vieira, L.N., O’Hagan, M., O’Sullivan, C., 2021. Understanding the Societal Impacts of Machine Trans-
lation: A Critical Review of the Literature on Medical and Legal Use Cases. Information, Com-
munication & Society 24(11), 1515–1532. URL https://doi.org/10.1080/1369118X.2020.1776370
World Health Organization & International Telecommunication Union, 2022. WHO-ITU Global
Standard for Accessibility of Telehealth Services. URL https://iris.who.int/bitstream/handle/
10665/356160/9789240050464-eng.pdf?sequence=1 (accessed 14.1.2025).
Yabe, M., 2020. Healthcare Providers’ and Deaf Patients’ Interpreting Preferences for Critical Care
and Non-Critical Care: Video Remote Interpreting. Disability and Health Journal 13, 100870.
URL https://doi.org/10.1016/j.dhjo.2019.100870

15
LEGAL SETTINGS
Jérôme Devaux

DOI: 10.4324/9781003053248-20

15.1 Introduction
Within the field of public service interpreting, legal interpreting is an umbrella term that
refers to interpreting during criminal and civil proceedings, when national and cross-national
cases are investigated and heard. It encompasses various settings, including police stations,
criminal and civil courts, asylum and immigration tribunals,1 correctional facilities, and
probation services (Hertog, 2015; Monteoliva-García, 2018).
The advent of technology has brought significant changes to the practice of legal inter-
preting. While interpreters have been a part of legal proceedings for centuries (Morris,
1999), the inclusion of technology during the Nuremberg trials marked a new era in legal
interpreting, as the use of headsets and microphones allowed interpreters to simultaneously
interpret the trial proceedings.
More recently, technology has advanced rapidly, reshaping the legal interpreter’s working
environment. Technological systems such as telephone and videoconferencing have enabled
court users to participate in the legal process remotely. These systems have introduced new
modalities for interpreting beyond face-to-face interactions. Under the overarching term
of distance interpreting, telephone/audio-mediated interpreting and video-mediated inter-
preting2 have enabled participants to take part in multilingual legal proceedings remotely.
For example, a court hearing can now take place with the defendant in prison, attending
their remand hearing through a video link, while the interpreter is physically present in the
courtroom. Similarly, an interview between a police officer and a minority-language suspect
can be conducted at a police station with the assistance of an interpreter located in a remote
interpreting hub, many miles away.
The use of technology in legal proceedings and the impact of conducting legal proceed-
ings through technology have been researched since the 1980s, primarily in monolingual
legal settings. Scholarly investigations have explored various areas, including the concepts
of fairness and the rule of law (Johnson and Wiggins, 2006; Radburn-Remfry, 1994; Thax-
ton, 1993), the influence of technology on participants (Fullwood et al., 2008; McKay,
2016; Radburn-Remfry, 1994; Roth, 2000), and the challenges associated with technological
implementation (Haas, 2006; Plotnikoff and Woolfson, 1999, 2000). It is through the
groundbreaking work of the AVIDICUS3 (2008–2016) and SHIFT4 projects that research
began to scrutinise the use of technology in interpreter-mediated legal interactions in a
more systematic fashion. These innovative projects initiated a new trajectory of research,
fostering a more comprehensive understanding of the interplay between technology and
legal interpreting.
Reviewing the current practice and existing body of research, this chapter examines
how technology has transformed the work of legal interpreters. The first section provides
a contextual background on the use of technology in legal interpreting. The second section
reviews the effects of technology on the legal interpreter’s working environment. The third
section analyses the impact of technology on the interpreting process itself. Finally, the last
section explores the influence of technology on legal interpreters’ coping mechanisms and
training.

15.2 Contextualising the use of technologies in legal interpreting


The application of technology in legal interpreting is primarily centred on facilitating
remote participation of one or more participants in the legal process. For this reason, this
section predominantly focuses on the use and research of telephone and videoconferencing
systems, which emerge as the prevailing technologies in legal interpreting. Nonetheless, this
section also explores other forms of technology that may support or affect the work of the
legal interpreter in the future.

15.2.1 Telephone systems


In the latter half of the 20th century, telephone systems revolutionised multilingual commu-
nication methods. For instance, telephone lines have allowed Deaf individuals to access ser-
vices through teletypewriters, or telecommunication devices for the Deaf, via the assistance
of relay operators since the 1960s (Lee, 2020). This technology is still in use within prison
settings in some countries, even though its outdated nature and the challenges it presents
for Deaf inmates have raised concerns (Thompson, 2017).
Regarding spoken languages, telephone interpreting (TI) emerged as an emergency
response in Australia in 1973 to address migrants’ access to public services. It subsequently
found adoption in the United States in 1981 as a means for police forces to access interpret-
ers. Then, throughout the 1980s and 1990s, TI started to be adopted in numerous coun-
tries, including Western Europe (Kelly, 2008). Nowadays, TI is widely used, with private
companies offering 24/7 TI services to law enforcement institutions. However, despite its
prolific use, TI has received less extensive research attention than other modalities,
particularly in a legal setting (Ozolins, 2011; Xu et al., 2020; see also Lázaro Gutiérrez,
this volume).

15.2.2 Videoconferencing technology


Building upon the benefits offered by telephone systems, videoconferencing (VC) technol-
ogy provides an audio and video feed to participants in a legal interaction, thus allowing
them to see and hear each other.
Videoconference interpreting (VCI) in a legal context was first experimented with in
Cook County Circuit Court in the United States in 1972. It then gained popularity in
anglophone countries, particularly within court settings (Braun and Taylor, 2012b; Dumou-
lin and Licoppe, 2010). Nowadays, this technology forms an integral part of day-to-day
court operations in countries such as England and Wales, as evidenced by the annual report
of Her Majesty’s Court and Tribunal Services (2020).
Examining its use in practice, Braun (2018) reviews how videoconferencing technol-
ogy is employed during criminal procedures to establish a connection between the court
and a defendant in prison. This technology also allows witnesses to provide evidence from
geographically remote locations and enables vulnerable witnesses to testify from separate
rooms within the court premises. Furthermore, it enables lawyers to communicate with
defendants in prison from their offices or courtrooms. This technology is particularly used
for short pre-trial hearings, lasting between 30 and 45 minutes (Braun et al., 2016a, 2018).
However, it is also noted that VC sessions may last much longer when a witness is giving
evidence via a video link, for instance (Braun et al., 2018). Although there is relatively less
research available, especially when an interpreter is present, videoconferencing technology
is also used in civil proceedings (Her Majesty’s Court and Tribunal Services, 2020; Zahrast-
nik and Baghrizabehi, 2022).
Considering the different locations a legal interpreter may be interpreting from, studies
(e.g. Braun and Taylor, 2012b; Devaux, 2017a, 2018; Singureanu et al., 2023a) distinguish
between videoconference interpreting A (VCI A), where the interpreter is co-located with
the participants in the courtroom while the minority-language speaker is in a remote
location, and videoconference interpreting B (VCI B), where the legal interpreter is
co-located with the minority-language speaker in a remote location.
Another case scenario that has emerged is remote interpreting (RI), which was first piloted
by international institutions in the 1990s and was later on adopted by courts and police
forces with spoken language interpreters (Braun, 2019). It led to the creation of interpreting
hubs, such as for the Ninth Circuit Court in Florida in 2007 (Braun and Taylor, 2012b).
In this context, the legal interpreter is not co-located with any of the other participants.
Similarly, the Metropolitan Police Service in London created seven hubs across the city to
provide RI services to the police in 2011. The aim was to reduce interpreters’ travel costs,
which were estimated at 33% of the total interpreting budget at that time (Braun, 2018).
One specific modality of RI in the context of sign language interpreting is the video
relay service (VRS; see Warnicke, this volume). This hybrid modality has been used since
the 1990s to establish a video connection between a deaf person and an interpreter while
the interpreter and the other participant are connected via an audio feed. Conversely, in
video remote interpreting (VRI) in North America, the interpreter is located in a hub and
is linked via an audio and video feed to the deaf service user and the other participant,
who are located in the same room. This differs from European practice, where VRI refers
to the sign language interpreter working with audio-video technology (Lee, 2020; Skinner
et al., 2018). VRS and VRI are used in a variety of legal contexts, including non-emergency
calls, courtrooms, and police stations (Napier and Leneham, 2011; Skinner, 2020; Skinner
et al., 2021).

15.2.3 Benefits of using technologies in legal settings


Despite the drawbacks discussed in Sections 15.3 and 15.4 below, the increasing use and
popularity of technologies, particularly videoconferencing systems, in legal settings can be
attributed to several benefits. They include saving time, reducing financial costs, enhancing
safety measures, protecting vulnerable witnesses, avoiding further traumatisation, and even
allowing witnesses to provide evidence from hospital beds (Adamska-Gallant, 2016; Ali
and Al-Junaid, 2019).
Additionally, the use of technology shows potential benefits for legal interpreters.
Indeed, it is argued that TI, VCI, RI, and VRS have expanded the pool of available legal
interpreters, especially for rare language combinations or for multilingual proceedings tak-
ing place in remote locations. Technology has also been advocated as a means to reduce the
interpreter’s travel time and costs, and to enhance their safety (Braun, 2013c; Braun and
Davitti, 2018; Devaux, 2016; Hale et al., 2022). Furthermore, legal interpreters have iden-
tified technology as a way to maintain neutrality, preserve anonymity, avoid interruptions
from the minority-language speaker, exert more control over breaks, and mitigate the risk
of malicious complaints (Hale et al., 2022).

15.2.4 Frameworks and codes regulating technology and the legal interpreter
Article 2(6) of the Directive 2010/64/EU on the right to interpretation in criminal proceed-
ings5 in Europe states that technologies, such as telephone and videoconferencing systems,
can be used during interpreter-mediated interactions. To ensure the suitability of the afore-
mentioned equipment for its intended purposes, the General Secretariat of the Council of
the European Union (2013) highlights the importance of adhering to various standards.
Within this context, the International Telecommunication Union (2024) puts forward an
extended list of standards, covering numerous (visual) telephoning and videoconferencing
aspects, such as video coding, picture-in-picture functionality, and real-time control pro-
tocols. These standards hold significance so that legal interpreters may be provided with
optimal audio and video quality.
At a more local level, some government organisations, interpreting bodies, and inter-
preting agencies have amended their guidelines and codes of conduct and practice to reflect
the fact that legal interpreters may be called to work via telephone and videoconferencing
systems (see, for instance, American Translators Association, 2021; College of Policing,
2020; Home Office, 2021; New York State Unified Court System, 2020). These amend-
ments cover a range of provisions, starting from a basic recognition that technology can
be used in legal interpreting to a more comprehensive set of guidelines outlining acceptable
practices for legal interpreters. However, it could be argued that these may not be sufficient,
as guidelines and codes of ethics are not consistently followed. Xu et al. (2020) analyse
interviews between lawyers and their clients, mediated via TI, and find that interpreters
do not always adhere to the guidance offered in their code of ethics. Similarly, Devaux
(2017b) interviews legal interpreters working in VCI in courts and concludes that codes of
ethics may not always offer the most suitable framework for addressing ethical dilemmas
encountered by court interpreters in VCI. He argues that other ethical approaches, such as
consequentialism, moral sentiments, and virtue ethics, may provide more suitable guidance
in the context of technology-mediated interpreting.

15.2.5 Other types of technologies


Technology has changed how legal professionals, including lawyers and paralegals, work
by partially automating certain tasks, from contract drafting to conducting legal research,
and even potentially predicting case outcomes (Choo, 2023). In a similar vein, technology
has shown potential to reshape translation and interpreting practice (Braun, 2019, 2020;
Carl and Braun, 2018; Pöchhacker, 2022), a phenomenon described as the ‘technological
turn’ (Fantinuoli, 2018).
Technology, from widely accessible cloud-based communication platforms to specialised
tools designed to assist interpreters, has been integrated into the interpreting workflow.
These technologies serve multiple purposes, from supporting interpreters in their prepara-
tory work to potentially replacing human interpreters with AI-powered software. While
research has demonstrated the application of various technologies in other translation and
interpreting contexts, there is limited evidence of their widespread adoption in legal
interpreting. Nevertheless, this section presents technologies that could potentially redefine
legal interpreting provision. However, further research is a prerequisite for their
implementation, in order to assess their potential use and the impact and benefits they may
have within legal interpreting.
As discussed by Fantinuoli (2017, 2021), computer-assisted interpreting (CAI) tools offer
numerous benefits in supporting the interpreter’s workflow (see Prandi, this volume). These
tools have been developed to streamline the interpreter’s preparatory work by assisting in
the creation of glossaries, for example. They can also aid the interpreter whilst interpreting
by generating transcriptions and displaying numbers and proper names, which are notoriously
challenging for interpreters. Another useful feature is that the output of CAI tools can be
edited post-assignment for quality assurance purposes, thereby supporting interpreters in
their subsequent assignments.
Although the body of research is limited, Goldsmith (2018) discusses how tablet inter-
preting could enhance interpreters’ workflow both before and during assignments (see
also Saina, this volume). An important advantage worth highlighting is the possibility for
interpreters to transition to a paperless approach. By relying on tablets, interpreters can
eliminate the need for physical documents and instead rely on digital resources, which can
streamline their work processes and enhance overall efficiency and accessibility.
Digital pens may also bring potential benefits for interpreters, especially when working
in consecutive mode (see Ünlü, this volume). They allow the interpreter's notes and the
speech being delivered to be recorded simultaneously, thereby assisting interpreters when
delivering the speech in another language. They are also a useful tool in interpreter
training (Orlando, 2010). However, as they are equipped with audio and
video recording software, the use of digital pens within a legal context may raise significant
­ethical concerns.
Finally, automated translation and interpreting have opened up new horizons. For
instance, Pöchhacker (2022) discusses speech-to-text technology and how it enhances
accessibility, in educational settings and live events, by providing access to spoken mes-
sages to deaf or hard-of-hearing individuals (see also Davitti, this volume). Automated
translation is also widely used by the general public, as evidenced by Google Translate
reaching 1 billion installs in 2021 (Pitman, 2021). However, studies focusing on automated
translation within legal systems indicate that the use of such tools could be inappropriate
(Nunes Vieira et al., 2021; Trzaskawka, 2020).
Within the realm of public service interpreting, Monteoliva-García's (2020) study
reveals that police forces acknowledge the benefits of automated translation tools,
including Google Translate, in their day-to-day policing work. The study reports that these
applications are used in
informal situations (such as giving directions, answering some questions, or establishing
initial circumstances while waiting for an interpreter). The police officers participating in
this study acknowledge that such applications have a limited use. Therefore, it is unsurpris-
ing that at the time of writing, machine translation and interpreting find limited application
within the context of legal proceedings (see Fantinuoli, this volume).

15.3 Outcome of technology implementation on the legal interpreter's working environment
Installing and using technology within legal systems present numerous challenges and
radically change the working environment for all legal stakeholders, including the legal
interpreter. As participants are geographically re-distributed, the legal interpreter’s working
environment is being reconfigured.

15.3.1 Technology and equipment quality in legal settings


The importance of sound quality for the interpreter's ability to interpret cannot be over-
stated. Reporting on the AVIDICUS 3 study, which surveyed 12 European nations, Braun
et al. (2018) find that the majority of jurisdictions use hardware systems as opposed to
cloud-based solutions. Interestingly, the study notes the continued widespread use of Inte-
grated Services Digital Network (ISDN) solutions, particularly for cross-border video links,
despite the new affordances offered by Internet Protocol (IP)–based communication sys-
tems. This presents potential challenges, as the sound and visual quality yielded by ISDN
systems may not be on par with those provided by IP-based solutions.
As identified by Braun et al. (2018), other factors may contribute to poor sound quality
in courts: the use of different microphones with varying sound quality outputs, brief sound
dropouts that lead to the distortion or omission of words or syllables, and issues with lip
synchronisation, which can be particularly problematic when the interpreter is watching the
speaker on a screen. Additional hurdles, such as microphone sharing between the defence
lawyer and the interpreter in VCI, or nearby noise like paper rustling, can further degrade
sound quality (Fowler, 2018).
Similarly, the issue of sound quality is also raised as a significant concern by legal inter-
preters in TI. This is particularly striking if participants are sharing a single handset, a
practice known as phone-passing, or when the equipment being used is inadequate. This
concern becomes even more pronounced when using mobile phones, as the sound quality
may suffer from poor network signal (Wang, 2018; Xu et al., 2020).
Similarly important to the legal interpreter, visual quality can also be impeded in legal
proceedings. Braun et al. (2018) provide an account of the Dutch judicial system, which is
equipped with multiple camera set-ups in courtrooms and remote locations. This arrange-
ment allows for both wide-angled shots of the courtroom and close-ups of participants, all
of which can be displayed on split screens for enhanced visibility. Given the static nature of
these cameras, there is no need for a member of staff to manually operate them. However,
the visual output of this particular set-up appears to be an exception. More frequently,
courtrooms are equipped with only one or two cameras featuring pre-set positions, which
are operated by court clerks, thus reducing visual output and cues for interpreters.
To explain disparities in equipment, Braun et al. (2018) make the distinction between
‘early adopters’ and ‘late adopters’. The former category, including countries like England,
Wales, Italy, and France, adopted videoconferencing systems in the 1990s and consequently
has a diverse range of equipment, products, and providers. In contrast, ‘late adopters’ tend
to pursue a more centralised approach, equipping courtrooms uniformly. The fragmentation
in procurement and maintenance among the ‘early adopters’ explains, at least to some
extent, inconsistencies in sound and video quality. Within this context, Braun et al. (2018)
highlight that seasoned interpreters expressed their willingness to halt proceedings if the
sound and video quality were inadequate. However, there is a concern that less-experienced
interpreters may be hesitant to take similar action, a finding also identified by Devaux
(2017a).

15.3.2 The re-distributed legal system


Empirical research focusing primarily on England and Wales indicates a prevailing pat-
tern concerning the positioning of interpreters during court–prison and court–police
video links. In the former scenario, interpreters are typically positioned in court when
interpreting for a defendant. When interpreting for a witness, legal interpreters tend to be
co-located with them at a remote site. In the latter scenario, they are commonly stationed
with the defendant at the police station (Braun and Taylor, 2012b; Devaux, 2017a). How-
ever, there are exceptions, and studies also report on legal interpreters being co-located
with the defendant during court–prison video links. Reporting back on proceedings of
the International Criminal Tribunal for the former Yugoslavia, Adamska-Gallant (2016)
explains that judges have discretion over the interpreter’s location, and Braun et al. (2018)
argue that the lack of rules regarding the interpreter’s location contributes to disparities
in practice.
The location of the legal interpreter is not without consequences, as it may significantly
impact some aspects of the technologically mediated legal interactions.
In VCI, Devaux’s (2017a) study highlights varying concerns tied to the legal interpret-
er’s location, such as the defendant not seeing the interpreter being sworn in in VCI A.
Safety risks are also discussed, as interpreters are left alone in VCI B whilst the video link
connection is made. In RI, Hale et al. (2022) discuss potential security breaches during
police interviews when interpreters work remotely. Furthermore, their research unveils the
effect of environmental distractions in the remote work environment, such as an interpreter
attending to their children instead of interpreting. Linked to the interpreter's geographical
re-distribution, another concern, raised by the 465 telephone interpreters who participated
in Wang's (2018) survey, is a potential reduction in earnings.

15.3.3 Adapting to a new working space


Installing technological equipment into pre-existing, often spatially constrained environ-
ments presents several challenges for legal interpreters. As observed by Braun et al. (2018),
such installation is often dictated by the space available in a courtroom or at a police
station, rather than by optimising the technology's potential to provide a more user-friendly
experience. For example, screens can be mounted high on walls, imposing undue strain on
participants, and in some instances, the screens are simply disregarded. Fowler’s (2018)
study also finds that counsel briefings and debriefings with remote clients often occur in
cramped rooms outside of the courtroom, inadequate to accommodate both the lawyer
and the interpreter. This often results in doors being left open, potentially compromising
confidentiality. Similarly, prison and police interrogation rooms, typically compact spaces,
may have equipment affixed to the floor or walls.
Given these spatial limitations, the interpreter’s position within courts and police inter-
view rooms must be adapted. Scholarly work shows that in face-to-face hearings, the inter-
preter typically sits or stands next to the defendant in the dock or just outside the dock.
However, in the context of a VCI hearing, the interpreter must sit or stand in a differ-
ent place. Braun et al.’s (2018) study reveals that the judge decides where the interpreter
stands, on an ad hoc basis, taking into account technological constraints, such as camera
or microphone locations. As a result, the interpreter might be standing outside the camera’s
field of view, making them invisible to the minority-language speaker. Singureanu et al.
(2023b) further investigate, amongst other themes, the visual ecology created in VCI A.
Their study reveals that this working environment presents additional challenges both for
the interpreter and the defendant. They report that the interpreter is more visible and may
find the setting more intimidating, and may also fail to notice the defendant's attempts to
interact. It may likewise be more difficult for the defendant to identify who is speaking,
which results in the interpreter making greater use of reported speech to keep the
proceedings accessible to the defendant. In an observational study conducted by Licoppe (2021), the
interpreter is positioned alongside the judge in VCI A. Proximity to the judge might suggest
collusion, thereby undermining the interpreter’s efforts to maintain impartiality.
In VCI B, Balogh and Hertog (2012) recommend that the interpreter sit behind the
minority-language speaker for them to be fully visible. Nonetheless, space is often restricted,
and Braun et al. (2018) note instances where defendants and interpreters are seated in a
line. This configuration has raised concerns among some participants in Devaux’s (2017a)
study, as court interpreters felt they were no longer perceived as impartial agents by partici-
pants located in the courtroom.

15.3.4 Change in the interpreting mode


Using technology necessitates a shift in the conventional mode of interpreting during court-
room proceedings. Interpreting for minority-language speakers now predominantly occurs
in consecutive mode, a deviation from the typical chuchotage, or whispered simultane-
ous interpreting, used in face-to-face settings. Research shows that the use of consecu-
tive interpretation can prolong court proceedings, potentially disrupting the court’s flow
(Fowler, 2007, 2013). An exception to this trend is when the interpreter is co-located with
the minority-language speaker in VCI B. In this scenario, whispered interpreting is often
used. However, whispered interpreting in VCI B carries a significant disadvantage, as Braun
et al.’s (2018) study indicates. In this mode, sound can still be relayed to the courtroom,
generating background noise. To mitigate this disturbance, the interpreter’s microphone
might be muted. This temporary fix may present several drawbacks, as muted microphones
could hamper the legal interpreter’s effort to manage the interaction and may amplify
remote participants’ sense of detachment from the courtroom proceedings.

15.4 Outcome of technology implementation on the interpreting process


Further to altering the legal interpreter’s working environment, technology is also reshaping
the interpreting process. Research demonstrates that audio- and video-mediated technology
can lower the legal interpreter’s output quality. Challenges related to interaction manage-
ment and rapport building are magnified, and concerns on physiological impact and the
interpreter’s role(s) have also emerged.

15.4.1 Quality
Empirical studies examining the influence of technology on legal interpreters’ performance
and quality of legal interpreting have yielded varied results. In a simulated police interview
context, Braun and Taylor (2012a) conduct a comparative study involving eight experi-
enced legal interpreters, gauging the difference in quality between face-to-face and remote
video interpreting. Their findings highlight a significant increase in the number of additions
(+290%) and distortions (+200%) in RI as compared to traditional face-to-face interac-
tions (see also Braun, 2013c).
A comparable study conducted by Miler-Cassino and Rybińska (2012) focuses on VCI
during prosecution questioning of witnesses in Polish courts. In this context, they designed
a simulation consisting of three case scenarios, employing three different interpreters. The
resulting assessment of the interpreters’ performances yielded more mixed results. For
instance, one interpreter demonstrated superior performance in VCI A compared to VCI B
or face-to-face, while another interpreter performed better in VCI B.
Broadening the pool of participants, Hale et al. (2022) also examine interpreter performance and quality, comparing face-to-face, video remote, and audio remote interpreting. Their study identified no significant disparity between face-to-face and video interpreting; performance was, however, notably affected in audio remote interpreting. Hale et al. (2022, 18) suggest that a video
feed may contribute to improving quality, as it helps interpreters ‘render the original man-
ner and style accurately, maintain the verbal rapport markers, use correct legal discourse
and terminology, use the recommended interpreting protocols, and demonstrate adequate
management and coordination skills’.
Various factors potentially impacting the quality and performance of legal interpreting
have been postulated. Miler-Cassino and Rybińska (2012) hypothesise that differing levels
of linguistic and interpreting skills, or knowledge of the subject matter, could influence
outcomes. Other factors more intrinsically linked to the use of technology have been put
forward, including audio and video quality, interaction management, and familiarity with
the equipment (Braun, 2013a). Further investigation may be warranted to ascertain the
underlying factors that impact interpreting quality in a legal setting.

15.4.2 Turn-taking and interaction management


The effect of technology on turn-taking and interaction management appears to be consist-
ent across various studies, indicating that technology does have a tangible impact on the
legal interpreter’s ability to manage the interaction (see contributions by Lázaro Gutiérrez
and Braun, this volume).
In her seminal study, Wadensjö (1999) examines the dynamics of face-to-face versus
TI. Her findings suggest that the exchange of turns and the coordination of dialogue were
more fluid in face-to-face simulations. Similarly, Licoppe and Verdier (2013) reveal that
even when a video link is used, interactional management remains considerably more com-
plex, and according to Braun et al. (2018), it is particularly more difficult for the inter-
preter to intervene in VCI B. Echoing these findings, Balogh and Hertog (2012) shed more
light by scrutinising interaction in VCI and remote interpreting. Their results demonstrate
an increased need for interaction management and turn-taking, pointing to a propensity for overlapping speech and artificial pauses. This narrative is substantiated by Braun and Taylor’s (2012a) research, which shows a significant increase in synchronisation issues in remote video interpreting compared to face-to-face interpreting (+324%). Based on these
studies, it is evident that the implementation of VC systems gives rise to interaction man-
agement challenges, which also hold true for sign language–interpreted court interactions
via technology, as demonstrated in Napier’s (2012) study.
Interestingly, as posited in Licoppe and Verdier’s (2021) study, some interpreters may
assume a more proactive role in managing interaction. However, their more assertive
approach does not guarantee success. Licoppe and Verdier’s research highlights the poten-
tial risk of the interpreter inadvertently influencing the narrative. These findings underline
the complex balancing act a legal interpreter may have to perform in order to manage the
interaction effectively, particularly when interpreting is required.

15.4.3 Rapport building


Drawing on his professional experience as a police officer, Rombouts (2012) stresses the critical need for an interviewer to establish rapport with an interviewee. He asserts that this is challenging when interviews are conducted via VC systems, as the parties are not co-located. In the AVIDICUS 3 project, Braun et al. (2016a) support the contention
that establishing rapport with participants on the opposing side of the screen is inherently
difficult. This obstacle in building rapport also impinges upon the interpreter’s ability to
manage communication in both VCI A and VCI B contexts. They argue that interpret-
ers often need to exercise greater assertiveness than in face-to-face hearings to attract the
court’s attention. Braun’s (2013a, 45) research further emphasises that rapport building
could inherently be impacted by technology, as she asserts that ‘the technology, even when
very well designed, may not be able to erase reduction in the quality of the intersubjective
relations between the participants’. These insights strongly suggest that the conventional
rapport-building methods associated with face-to-face interaction may require reconsidera-
tion and adaptation when interpreting is carried out via technology.

15.4.4 Physiological impact


Pioneering research in the domain of conference interpreting has explored various physi-
ological responses of interpreters when engaging with technology, as documented by
Moser-Mercer (2003, 2005) and Mouzourakis (2003). To a lesser extent, some atten-
tion has been given to the physiological effects of technology on the legal interpreter. For
instance, Miler-Cassino and Rybińska’s (2012) study reports that interpreters found the
experimental conditions more stressful, isolating, and exhausting, suggesting an increased
cognitive load. Moreover, they observe that interpreters appear to require heightened focus
when operating in a VCI environment. This narrative is further supported by Braun and
Taylor’s (2012a) research, wherein the participants associate video-mediated interpreting
with amplified levels of stress and performance fatigue compared to traditional face-to-face
interpreting. This burgeoning body of evidence underscores the necessity of further research
into the physiological implications of technology use for legal interpreters.

15.4.5 The interpreter’s role


The role of the interpreter, particularly in the context of legal interpreting, has been a prominent focus in the field of public service interpreting research, predominantly in face-to-face interactions. Within the paradigm of legal interaction mediated via technologies, it emerges
that certain aspects of the interpreter’s role are redefined.
Skinner (2023) shows the multifaceted roles performed by legal interpreters when inter-
preting via video relay service in a police setting, including their auxiliary role as first-call
receivers. This notion of role multiplicity is echoed in Devaux’s (2017a, 2018) studies. Situated within the context of court VCI, he scrutinises the perceptions of 18 interpreters
concerning their role(s). Using the concept of role space proposed by Llewellyn-Jones and
Lee (2014), the interpreters report on a variety of factors influencing their presentation of
self, participant alignment, and interaction management. Examples of these factors affect-
ing their role perception include the inability to introduce themselves as the interpreter,
seating arrangements, scarcity of feedback and backchanneling, a propensity to align more
with participants situated in the courtroom, and a sense of not being able to interrupt
the interaction for clarifications. Noteworthy is the fact that the interpreters’ perceptions
of their role were not homogenous, leading to the creation of diverse role space models.
Although these models vary, it is interesting to note that some participants report that
VCI forces them to adopt different roles, within the same court assignment, depending on
whether they are interacting with the courtroom or the defendant.

15.5 Adapting to a new technological reality


Unquestionably, technological advancements exert significant influence on not only the
interpreter’s working environment but also the interpreting process itself. The necessity to
adapt to new technology-led modalities has compelled legal interpreters to innovate and
devise new strategies. Furthermore, considering the prevalence of technology, guidelines
and pedagogical resources have been developed, which will be useful to budding interpret-
ers, interpreter trainers, practitioners, and legal professionals at large.

15.5.1 Strategies
Miler-Cassino and Rybińska’s (2012) research suggests that the interpreters involved in their case study developed, over the three-day duration of the experiment, various coping mechanisms to manage stress, appearing increasingly relaxed as the experiment progressed. Subsequent research delves
further into interpreting strategies in distance interpreting, corroborating that legal inter-
preters have crafted techniques to offset the hurdles inherent to the use of technologies.
Braun (2013a), for instance, lists strategies such as request for repetition, alert to problem, comprehension check, direct request for clarification, repetition plus interrogative, approximation, and physical resolution, which can be combined. Adopting actor–network theory
(ANT) as a framework, Devaux (2017a) pinpoints additional strategies (or ‘interessement
devices’ in ANT terminology) that interpreters employ, either overtly or covertly, in VCI
A and VCI B. In his study, interpreters report using many strategies similar to those used
in face-to-face interactions, including referring participants to their professional code of
conduct or defining boundaries to refuse certain tasks. However, the use of technology gives rise to new strategies. For instance, when interpreting in VCI B, with the defendant, the lawyer, and the interpreter co-located in prison, some interpreters would use
their proximity with the lawyer to seek clarification or ask for repetitions. In VCI A, some
interpreters would use distance as a tool to preserve their impartiality, as they cannot com-
municate directly with the defendant in this configuration.


Braun (2013a) argues that some strategies are used more successfully than others; for
instance, a repetition request was less efficient than a comprehension check, which was used
more frequently. These findings underline the need for additional research to assess the
range and, more specifically, the effectiveness of strategies called upon by legal interpreters
in the context of distance interpreting.

15.5.2 Guidelines and training resources


The outcomes of the AVIDICUS and SHIFT projects (Amato, 2018; Braun, 2013a, 2013b;
Braun et al., 2016a, 2016b; Davitti and Braun, 2018a) offer comprehensive sets of recom-
mendations and guidelines targeted at national institutions, judicial authorities, legal stake-
holders (including judges, lawyers, and police officers), technicians, and legal interpreters.
For instance, a step-by-step guide (Braun, 2013b) for all parties to follow before, during,
and after a video-mediated interaction is provided. Specific advice for interpreters includes
familiarising themselves with the equipment, voicing their preference between VCI A and
VCI B, agreeing on procedures to follow with the other court actors, and monitoring their
output and body language. Interpreters are further advised to document their experiences
in a diary, noting any challenges and solutions encountered during a VC hearing. Arguably, encouraging interpreters to become reflective practitioners in this way may better equip them to develop further adaptive strategies in the future.
In addition to proposing guidelines, these projects provide resources to support training.
As part of the AVIDICUS 1 project, Braun et al. (2012) provide suggestions for designing
modules to train interpreting students, practising legal interpreters, and legal practitioners.
They include a syllabus for course units and proposed course materials. Within the remit
of the SHIFT project, González Rodríguez (2018) and Davitti and Braun (2018b) suggest
exercises and role-play scenarios to support interpreting training. These scenarios include
role-play simulations in many different public service settings, including criminal and civil
contexts.
These resources and practical insights may currently serve as invaluable aids to legal
interpreter trainers, practitioners, and interpreting students and may be used as a first port
of call when designing a legal interpreting course.

15.6 Conclusion
This chapter has discussed the extent to which technology is redefining interpreting provisions
within the legal field. The introduction of distance interpreting has altered the legal land-
scape by allowing participants to attend legal proceedings remotely. Although this tech-
nology has brought numerous benefits, it also presents challenges for the legal interpreter.
Moreover, this chapter has explored the potential of other emerging technologies, which
may further transform the legal interpreting profession.
As the legal field continues to embrace technological advancements, it is imperative that
legal interpreters are involved in decision-making processes and technological development
and implementation. This chapter also underscored the need to continue investigating the
effect of technology on the interpreter’s working environment and interpreting process.
Taking interdisciplinary and mixed-methods approaches can potentially yield complementary findings, thereby enhancing our understanding of how technology influences interpreter-mediated legal interaction.


Notes
1 The use of technology in Asylum and Immigration Tribunals will not be discussed in this chapter,
as it is covered in Singureanu and Braun, this volume.
2 For more information on the different modalities, see Braun (2019, 2020).
3 AVIDICUS stands for Assessment of Videoconference Interpreting in the Criminal Justice Services.
The projects are available from: http://wp.videoconference-interpreting.net/ (accessed 4.4.2025).
4 More information on the Shaping the Interpreters of the Future and of Today (SHIFT) project is available
from: https://site.unibo.it/shiftinorality/en (accessed 4.4.2025).
5 See https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32010L0064 (accessed
4.4.2025)

References
Adamska-Gallant, A., 2016. Video Conferencing in the Practice of the International Criminal Tribunals. In Elektroniczny protokół – szansą na transparentny i szybki proces [Electronic Protocol – a Chance for Transparent and Fast Trial]. Polish Ministry of Justice. URL www.academia.
edu/19730690/Video_Conferencing_in_Practice_of_Criminal_Courts (accessed 2.8.2024).
Ali, F., Al-Junaid, H., 2019. Literature Review for Videoconferencing in Court “E-Justice-Kingdom
of Bahrain”. 2nd Smart Cities Symposium, University of Bahrain, 24–26.3.2019. URL https://
ieeexplore.ieee.org/document/9124937 (accessed 2.8.2024).
Amato, A., 2018. Challenges and Solutions: Some Paradigmatic Examples. In Amato, A., Spinolo, N.,
González Rodríguez, M.J., eds. Handbook of Remote Interpreting – SHIFT in Orality. University
of Bologna, 79–102.
American Translators Association, 2021. ATA Position Paper on Remote Interpreting. URL www.
atanet.org/advocacy-outreach/ata-position-paper-on-remote-interpreting/ (accessed 2.8.2024).
Balogh, K., Hertog, E., 2012. AVIDICUS Comparative Studies – Part II: Traditional, Videoconference
and Remote Interpreting in Police Interviews. In Braun, S., Taylor, J., eds. Videoconference and
Remote Interpreting in Criminal Proceedings. Intersentia Publishing, Mortsel, 101–116.
Braun, S., 2013a. Assessment of Video-Mediated Interpreting in the Criminal Justice System: AVIDI-
CUS 2 – Action 2 Research Report. URL http://wp.videoconference-interpreting.net/wp-content/
uploads/2014/01/AVIDICUS2-Research-report.pdf (accessed 2.8.2024).
Braun, S., 2013b. Assessment of Video-Mediated Interpreting in the Criminal Justice System: AVIDI-
CUS 2 – Action 3: Guide to Video-Mediated Interpreting in Bilingual Proceedings. URL http://
wp.videoconference-interpreting.net/wp-content/uploads/2014/01/AVIDICUS2-Recommendations-
and-Guidelines.pdf (accessed 2.8.2024).
Braun, S., 2013c. Keep Your Distance? Remote Interpreting in Legal Proceedings: A Critical Assess-
ment of a Growing Practice. Interpreting 15(2), 200–228. https://doi.org/10.1075/intp.15.2.03bra
Braun, S., 2018. Video-Mediated Interpreting in Legal Settings in England: Interpreters’ Perceptions
in Their Sociopolitical Context. Translation and Interpreting Studies 13(3), 393–420. URL https://
doi.org/10.1075/tis.00022.bra
Braun, S., 2019. Technology, Interpreting. In Baker, M., Saldanha, G., eds. Routledge Ency-
clopedia of Translation Studies, 3rd ed. Routledge, Oxon and New York. URL https://doi.
org/10.4324/9781315678627
Braun, S., 2020. Technology and Interpreting. In O’Hagan, M., ed. Routledge Handbook of Transla-
tion and Technology. Routledge, Oxon and New York, 271–288.
Braun, S., Davitti, E., 2018. Face‐to‐Face vs. Video‐Mediated Communication – Monolingual. In
Amato, A., Spinolo, N., González Rodríguez, M.J., eds. Handbook of Remote Interpreting – SHIFT
in Orality Erasmus+ Project: Shaping the Interpreters of the Future and of Today. University of
Bologna, Bologna. URL http://amsacta.unibo.it/id/eprint/5955/
Braun, S., Davitti, E., Dicerto, S., 2016a. The Use of Videoconferencing in Proceedings Conducted
with the Assistance of an Interpreter. URL www.videoconference-interpreting.net/wp-content/
uploads/2016/11/AVIDICUS3_Research_Report.pdf (accessed 2.8.2024).
Braun, S., Davitti, E., Dicerto, S., 2016b. Handbook of Bilingual Videoconferencing: The Use of
Videoconferencing in Proceedings Conducted with the Assistance of an Interpreter. AVIDICUS
3 Project. URL www.videoconference-interpreting.net/wp-content/uploads/2016/08/AVIDICUS3_
Handbook_Bilingual_Videoconferencing.pdf


Braun, S., Davitti, E., Dicerto, S., 2018. Video-Mediated Interpreting in Legal Settings: Assessing the
Implementation. In Napier, J., Skinner, R., Braun, S., eds. Here or There: Research in Interpreting
via Video Link. Gallaudet University Press, Washington, DC, 144–179.
Braun, S., Taylor, J., 2012a. AVIDICUS Comparative Studies – Part 1: Traditional Interpreting and
Remote Interpreting in Police Interviews. In Braun, S., Taylor, J., eds. Videoconference and Remote
Interpreting in Criminal Proceedings. Intersentia Publishing, Mortsel, 85–100.
Braun, S., Taylor, J., 2012b. Video-Mediated Interpreting: An Overview of Current Practice and
Research. In Braun, S., Taylor, J., eds. Videoconference and Remote Interpreting in Criminal Pro-
ceedings. Intersentia Publishing, Mortsel, 27–58.
Braun, S., Taylor, J., Miler-Cassino, J., Rybińska, Z., Balogh, K., Hertog, E., Vanden Bosch, Y., Rombouts,
D., 2012. Training in Video-Mediated Interpreting in Legal Proceedings: Modules for Interpreting
Students, Legal Interpreters and Legal Practitioners. In Braun, S., Taylor, J., eds. Videoconference and
Remote Interpreting in Criminal Proceedings. Intersentia Publishing, Mortsel, 233–288.
Carl, M., Braun, S., 2018. Translation, Interpreting and New Technologies. In Malmkjaer, K., ed. The
Routledge Handbook of Translation Studies and Linguistics. Routledge, Oxon and New York,
374–389.
Choo, Y.K., 2023. How Is Technology Impacting the Legal Profession? URL www.allaboutlaw.co.uk/
law-careers/legal-technology/how-is-technology-impacting-the-legal-profession (accessed 29.6.2023).
College of Policing, 2020. Briefing Note: Using Language Services. URL https://library.college.police.
uk/docs/college-of-policing/Language-Services-v1.0.pdf (accessed 2.8.2024).
Davitti, E., Braun, S., 2018a. Challenges and Solutions. In Amato, A., Spinolo, N., González Rod-
ríguez, M.J., eds. Handbook of Remote Interpreting – SHIFT in Orality Erasmus+ Project: Shap-
ing the Interpreters of the Future and of Today. University of Bologna, Bologna. URL http://
amsacta.unibo.it/id/eprint/5955/
Davitti, E., Braun, S., 2018b. Role-Play Simulations. In Amato, A., Spinolo, N., González Rodríguez,
M.J., eds. Handbook of Remote Interpreting – SHIFT in Orality Erasmus+ Project: Shaping the
Interpreters of the Future and of Today. University of Bologna, Bologna. URL http://amsacta.
unibo.it/id/eprint/5955/
Devaux, J., 2016. When the Role of the Court Interpreter Intersects and Interacts with New Technolo-
gies. In Intersect, Innovate, Interact. CTIS Occasional Papers, 7. URL https://hummedia.manchester.ac.uk/schools/salc/centres/ctis/publications/occasional-papers/Devaux.pdf
Devaux, J., 2017a. Technologies in Interpreter-Mediated Criminal Court Hearings: An Actor-Network
Theory Account of the Interpreter’s Perception of Her Role-Space (PhD thesis). The University of
Salford. URL https://oro.open.ac.uk/54390/.
Devaux, J., 2017b. Virtual Presence, Ethics and Videoconferencing Interpreting: Insights from Court
Settings. In Valero-Garcés, C., Tipton, R., eds. Ideology, Ethics and Policy Development in Public
Service Interpreting and Translation. Multilingual Matters, Bristol, 131–150.
Devaux, J., 2018. Technologies and Role-Space: How Videoconference Interpreting Affects the Court
Interpreter’s Perception of Her Role. In Fantinuoli, C., ed. Interpreting and Technologies. Lan-
guage Science Press, Berlin, 91–119.
Dumoulin, L., Licoppe, C., 2010. Policy Transfer ou Innovation? L’activité juridictionnelle à distance
en France. Critique Internationale 48, 117–133.
Fantinuoli, C., 2017. Computer-Assisted Interpreting: Challenges and Future Perspectives. In Corpas
Pastor, G., Durán-Muñoz, I., eds. Trends in E-Tools and Resources for Translators and Interpret-
ers. Brill, Leiden, 153–174.
Fantinuoli, C., 2018. Interpreting and Technology: The Upcoming Technological Turn. In Fantinuoli,
C., ed. Interpreting and Technology. Language Science Press, Berlin, 1–12.
Fantinuoli, C., 2021. Conference Interpreting: New Technologies. In Albl-Mikasa, M., Tiselius, E.,
eds. The Routledge Handbook of Conference Interpreting. Routledge, Oxon and New York,
508–522.
Fowler, Y., 2007. Interpreting into the Ether: Interpreting for Prison/Court Video Link Hearings. Pro-
ceedings of the Critical Link 5 Conference, Sydney, 11–15.4.2007.
Fowler, Y., 2013. Non-English Speaking Defendants in the Magistrates Court: A Comparative Study
of Face to Face and Prison Video Link Interpreter Mediated Hearings in England (PhD thesis).
Aston University.


Fowler, Y., 2018. Interpreted Prison Video Link: The Prisoner’s Eye View. In Napier, J., Skinner, R.,
Braun, S., eds. Here or There: Research on Interpreting via Video Link. Gallaudet University Press,
Washington, DC, 183–209.
Fullwood, C., Judd, A.M., Finn, M., 2008. The Effect of Initial Meeting Context and Video-Mediation
on Jury Perceptions of an Eyewitness. Internet Journal of Criminology, Online. URL https://www.
academia.edu/1796147/THE_EFFECT_OF_INITIAL_MEETING_CONTEXT_AND_VIDEO_
MEDIATION_ON_JURY_PERCEPTIONS_OF_AN_EYEWITNESS
General Secretariat of the Council of the European Union, 2013. Guide on Videoconferencing in
Cross-Border Proceedings. URL http://bookshop.europa.eu/en/guide-on-videoconferencing-in-
cross-border-proceedings-pbQC3012963/ (accessed 11.6.2015).
Goldsmith, J., 2018. Tablet Interpreting: Consecutive Interpreting 2.0. Translation and Interpreting
Studies. The Journal of the American Translation and Interpreting Association 13(3), 342–365.
URL https://doi.org/10.1075/tis.00020.gol
González Rodríguez, M.J., 2018. Preparatory Exercises. In Amato, A., Spinolo, N., González Rod-
ríguez, M.J., eds. Handbook of Remote Interpreting – SHIFT in Orality. University of Bologna,
Bologna, 144–149.
Haas, A., 2006. Videoconferencing in Immigration Proceedings. Pierce Law Review 5(1), 59–90.
Hale, S.B., Goodman-Delahunty, J., Martschuk, N., Lim, J., 2022. Does Interpreter Location Make a
Difference? A Study of Remote vs Face-to-Face Interpreting in Simulated Police Interviews. Inter-
preting 24(2), 221–253. URL https://doi.org/10.1075/intp.00077.hal
Her Majesty’s Court and Tribunal Services, 2020. Annual Report and Accounts 2019–20. URL www.
gov.uk/official-documents (accessed 2.8.2024).
Hertog, E., 2015. Legal Interpreting. In Pöchhacker, F., ed. Routledge Encyclopedia of Interpreting
Studies. Routledge, Oxon and New York, 230–236.
Home Office, 2021. Interpreters Code of Conduct. URL https://assets.publishing.service.gov.uk/gov-
ernment/uploads/system/uploads/attachment_data/file/1085040/Code_of_conduct_for_UK_visas_
and_immigration_registered_interpreters_v4.pdf (accessed 2.8.2024).
International Telecommunication Union, 2024. ITU-T Recommendations by Series. URL www.itu.
int/ITU-T/recommendations/index.aspx?ser=H (accessed 2.8.2024).
Johnson, M., Wiggins, E., 2006. Videoconferencing in Criminal Proceedings: Legal and Empirical
Issues and Directions for Research. Law & Policy 28(2), 211–227.
Kelly, N., 2008. Telephone Interpreting: A Comprehensive Guide to the Profession. Trafford Publish-
ing, Victoria, BC.
Lee, R., 2020. Role-Space in VRS and VRI. In Salaets, H., Brône, G., eds. Linking Up with Video:
Perspectives on Interpreting Practice and Research. John Benjamins Publishing Company, Oxon
and New York, 107–125. URL https://doi.org/10.1075/btl.149.05lee
Licoppe, C., 2021. The Politics of Visuality and Talk in French Courtroom Proceedings with Video
Links and Remote Participants. Journal of Pragmatics 178, 363–377.
Licoppe, C., Verdier, M., 2013. Interpreting, Video Communication and the Sequential Reshaping of
Institutional Talk in the Bilingual and Distributed Courtroom. International Journal of Speech,
Language and the Law 20(2), 247–275. URL https://doi.org/10.1558/ijsll.v20i2.247
Licoppe, C., Verdier, M., 2021. L’interprète au centre du prétoire ? Voix, pouvoir et tours de parole
dans les débats multilingues avec interprétation consécutive et liaisons vidéo. Droit et société 107,
31–50.
Llewellyn-Jones, P., Lee, R.G., 2014. Redefining the Role of the Community Interpreter: The Concept
of Role-Space. SLI Press, Carlton-le-Moorland.
McKay, C., 2016. Video Links from Prison: Permeability and the Carceral World. International
Journal for Crime, Justice and Social Democracy 5(1), 21–37. URL https://doi.org/10.5204/ijcjsd.
v5i1.283
Miler-Cassino, J., Rybińska, Z., 2012. AVIDICUS Comparative Studies – Part III: Traditional Inter-
preting and Videoconference Interpreting in Prosecution Interviews. In Braun, S., Taylor, J., eds.
Videoconference and Remote Interpreting in Criminal Proceedings. Intersentia Publishing, Mort-
sel, 117–136.
Monteoliva-García, E., 2018. The Last Ten Years of Legal Interpreting Research (2008–2017):
A Review of Research in the Field of Legal Interpreting. Language and Law/Linguagem e Direito
5(1), 36–80.


Monteoliva-García, E., 2020. Interpreting or Other Forms of Language Support? Experiences and
Decision-Making Among Response and Community Police Officers in Scotland. The International
Journal for Translation and Interpreting Research 12(1), 37–54.
Morris, R., 1999. The Face of Justice: Historical Aspects of Court Interpreting. Interpreting 4(1),
97–123.
Moser-Mercer, B., 2003. Remote Interpreting: Assessment of Human Factors and Performance
Parameters. The AIIC Webzine 23. URL http://aiic.net/page/1125/remote-interpreting-assessment-
of-human-factors-and-performance-parameters/lang/1 (accessed 27.11.2016).
Moser-Mercer, B., 2005. Remote Interpreting: The Crucial Role of Presence. Bulletin VALS-ASLA
81, 73–97.
Mouzourakis, P., 2003. That Feeling of Being There: Vision and Presence in Remote Interpreting. The
AIIC Webzine 23. URL http://aiic.net/issues/207/2003/summer-2003-23 (accessed 24.11.2016).
Napier, J., 2012. Here or There? An Assessment of Video Remote Signed Language Interpreter-Mediated
Interaction in Court. In Braun, S., Taylor, J., eds. Videoconference and Remote Interpreting in
Criminal Proceedings. Intersentia Publishing, Mortsel, 145–185.
Napier, J., Leneham, M., 2011. “It Was Difficult to Manage the Communication”: Testing the Fea-
sibility of Video Remote Signed Language Interpreting in Court. Journal of Interpretation 21(1).
URL https://digitalcommons.unf.edu/joi/vol21/iss1/5/
New York State Unified Court System, 2020. Court Interpreter: Manual and Code of Ethics. URL
https://ww2.nycourts.gov/sites/default/files/document/files/2020-10/20_Code_of_Ethics_0.pdf
(accessed 2.8.2024).
Nunes Vieira, L., O’Hagan, M., O’Sullivan, C., 2021. Understanding the Societal Impacts of Machine
Translation: A Critical Review of the Literature on Medical and Legal Use Cases. Information, Communication & Society 24(11), 1515–1532.
Orlando, M., 2010. Digital Pen Technology and Consecutive Interpreting: Another Dimension in
Note-Taking Training and Assessment. The Interpreters’ Newsletter 15, 71–86.
Ozolins, U., 2011. Telephone Interpreting: Understanding Practice and Identifying Research Needs.
Translation and Interpreting 3(2), 33–47.
Pitman, J., 2021. Google Translate: One Billion Installs, One Billion Stories. URL https://blog.google/
products/translate/one-billion-installs/ (accessed 6.7.2023).
Plotnikoff, J., Woolfson, R., 1999. Preliminary Hearings: Video Links Evaluation of Pilot Projects.
URL http://lexiconlimited.co.uk/wp-content/uploads/2013/01/Videolink-magistrates.pdf (accessed
3.3.2015).
Plotnikoff, J., Woolfson, R., 2000. Evaluation of Video Link Pilot Project at Manchester Crown Court:
Final Report. URL http://lexiconlimited.co.uk/wp-content/uploads/2013/01/Videolink-Crown.pdf
(accessed 3.3.2015).
Pöchhacker, F., 2022. Interpreters and Interpreting: Shifting the Balance? The Translator 28(2),
148–161. URL https://doi.org/10.1080/13556509.2022.2133393
Radburn-Remfry, P., 1994. Due Process Concerns in Video Production of Defendants. Stetson Law
Review 23, 805–838.
Rombouts, D., 2012. The Police Interview Using Videoconferencing with a Legal Interpreter: A Criti-
cal View from the Perspective of Interview Techniques. In Braun, S., Taylor, J., eds. Videoconfer-
ence and Remote Interpreting in Criminal Proceedings. Intersentia Publishing, Mortsel, 137–144.
Roth, M.D., 2000. Laissez Faire Videoconferencing: Remote Witness Testimony and Adversarial
Truth. UCLA Law Review 48(1), 185–220.
Singureanu, D., Braun, S., Davitti, E., González Figueroa, L.A., Poellabauer, S., Mazzanti, E., De
Wilde, J., Maryns, K., Guaus, A., Buysse, L., 2023a. Research Report. EU-WEBPSI: Baseline
Study and Needs Analysis for PSI, VMI and LLDI. URL https://ucrisportal.univie.ac.at/de/pub-
lications/research-report-eu-webpsi-baseline-study-and-needs-analysis-for-p (accessed 2.8.2024).
Singureanu, D., Hieke, G., Gough, J., Braun, S., 2023b. “I am His Extension in the Courtroom”.
How Court Interpreters Cope with the Demands of Video-Mediated Interpreting in Hearings with
Remote Defendants. In Corpas Pastor, G., Defrancq, B., eds. Current and Future Trends. John Ben-
jamins Publishing Company, Amsterdam and Philadelphia, 72–108. URL https://doi.org/10.1075/
ivitra.37
Skinner, R., 2020. Approximately There – Positioning Video-Mediated Interpreting in Frontline
Police Services (PhD thesis). Heriot-Watt University.

Skinner, R., 2023. Would You Like Some Background? Establishing Shared Rights and Duties in
Video Relay Service Calls to the Police. Interpreting and Society: An Interdisciplinary Journal 3(1).
URL https://doi.org/10.1177/27523810221151107
Skinner, R., Napier, J., Braun, S., 2018. Interpreting via Video Link: Mapping of the Field. In Napier,
J., Skinner, R., Braun, S., eds. Here or There: Research on Interpreting via Video Link. Gallaudet
University Press, Washington, DC, 11–35.
Skinner, R., Napier, J., Fyfe, N.R., 2021. The Social Construction of 101 Non-Emergency Video Relay
Services for Deaf Signers. International Journal of Police Science & Management 23(2), 145–156.
Thaxton, R., 1993. Injustice Telecast: The Illegal Use of Closed-Circuit Television Arraignments and
Bail Bond Hearings in Federal Court. Iowa Law Review 79(1), 175–202.
Thompson, C., 2017. Why Many Deaf Prisoners Can’t Call Home. The Marshall Project: Nonprofit
Journalism about Criminal Justice. URL www.themarshallproject.org/2017/09/19/why-many-
deaf-prisoners-can-t-call-home#:~:text=The%20technology%20provided%20to%20deaf%
20people%20in%20most,which%20allows%20users%20to%20speak%20in%20sign%
20language (accessed 19.9.2017).
Trzaskawka, P., 2020. Selected Clauses of a Copyright Contract in Polish and English in Translation
by Google Translate: A Tentative Assessment of Quality. International Journal for the Semiotics of
Law – Revue internationale de Sémiotique juridique 33, 689–705.
Wadensjö, C., 1999. Telephone Interpreting and the Synchronisation of Talk. The Translator 5(2),
247–264.
Wang, J., 2018. “Telephone Interpreting Should Be Used Only as A Last Resort.” Interpreters’ Percep-
tions of the Suitability, Remuneration and Quality of Telephone Interpreting. Perspectives 26(1),
100–116.
Xu, H., Hale, S.B., Stern, L., 2020. Telephone Interpreting in Lawyer-Client Interviews: An Observa-
tional Study. Translation and Interpreting 12(1), 18–36.
Zahrastnik, K., Baghrizabehi, D., 2022. Videoconferencing in Times of the Pandemic and Beyond:
Addressing Open Issues of Videoconferencing in Cross-Border Civil Proceedings in the EU. Balkan
Social Science Review 19. URL https://doi.org/10.46763/BSSR2219047z

16
IMMIGRATION, ASYLUM, AND
REFUGEE SETTINGS
Diana Singureanu and Sabine Braun
DOI: 10.4324/9781003053248-21

16.1 Introduction
Asylum and immigration procedures include formal contexts, such as asylum interviews
(formal procedures to assess an asylum claim), court proceedings (e.g. asylum appeals and
judicial reviews of return decisions), and health assessments, as well as a wide range of
informal encounters, for example, in reception centres. The 1951 United
Nations Geneva Convention (Refugee Convention), ratified by over 150 countries, guar-
antees the right to an interpreter for refugee-status applicants, including asylum seekers.
However, research has revealed malpractices, such as poor interpreting provision and the
soliciting of interpreters’ opinions. This potentially puts refugees and asylum seekers at risk
(Jiménez-Ivars and León-Pinilla, 2018, 31). Research on interpreting in asylum contexts
has largely focused on interactions with the government authorities responsible for granting
rights and providing resources such as counselling and medical support (Maryns, 2015),
while other settings where interpreters operate (schools, NGOs, social services, banks,
shelters, landlord negotiations, and job interviews; Jiménez-Ivars and León-Pinilla, 2018,
31) remain underexplored. Interpreters working in this variety of settings often face
challenges related to conveying cultural nuances and the traumatic experiences of refugees
while also navigating confusion about their role, as they are caught between authorities’
and refugees’ conflicting expectations (Jiménez-Ivars and León-Pinilla, 2018; Inghilleri and
Maryns, 2019). Pöllabauer (2023) highlights how interpreters are not always able to fully
or correctly render specific narratives, cultural nuances, and details, nor are they able to
fully address communication breakdowns. This can significantly affect asylum outcomes.
Unsurprisingly, due to these communication gaps, refugees and asylum seekers often feel
misunderstood and overlooked, as shown in a comprehensive literature review of primary
healthcare settings (Patel et al., 2021). Thus, the need for enhanced training and ethical
guidelines for interpreters working in these diverse and complex contexts is a prominent
recommendation in the literature (see also Giustini, this volume).
The use of technology in these settings is not new. For instance, video links have been
used in immigration contexts since the 1990s (Federman, 2006). Communication technolo-
gies such as video and telephone remain the primary tools for providing language support,
and they form the main focus of this chapter. This contribution builds on primary research
and practice-based evidence, primarily focused on formal settings, such as immigration
proceedings, tribunals, asylum interviews, and studies of asylum seekers in detention, as
well as research on healthcare interactions with refugees. It also highlights the European
EU-WEBPSI project, a recent initiative aimed at improving access to basic services for
migrants and refugees through video-mediated interpreting (VMI).
Section 16.2 provides an overview of the main uses of interpreting by telephone and
video link (distance interpreting) in immigration, asylum, and refugee settings. It begins
by outlining common practices, key research areas, and main findings (Section 16.2.1). To
promote best practices, it then reviews key guidelines for VMI in public service interpret-
ing (PSI) relevant to refugee contexts (Section 16.2.2). While there are no specific VMI
guidelines for asylum and refugee settings, the section introduces a promising develop-
ment: minimum standards for VMI in these environments. The final part (Section 16.2.3)
focuses on training initiatives available for community interpreters in immigration, asy-
lum, and refugee contexts, highlighting the shortage of interprofessional training and recent
efforts to address this gap through programmes for both VMI and telephone interpret-
ing. Section 16.3 explores additional technologies, such as crowdsourcing platforms and
AI-driven translation tools, which have helped maintain language support services, espe-
cially in crisis situations. Section 16.4 concludes the chapter, emphasising the importance of
improving technology-enabled interpreting services, such as VMI, for refugees by investing
in reliable technology, interpreter training, and structured feedback systems.

16.2 Distance interpreting in immigration, asylum, and refugee settings

16.2.1 From practice to research


Over the past two decades, studies on video links involving interpreters in immigra-
tion proceedings have consistently identified distinct differences between traditional and
video-mediated settings. This is particularly due to the physical separation between the
interpreter and key participants (e.g. the applicant, lawyer, and judge). Technological bar-
riers, identified as a significant issue in video-mediated hearings, are further exacerbated
by the inadequate integration of interpreters and the prevalent use of unqualified interpret-
ers (BID, 2008; Shaw, 2016, 2018; Rowe et al., 2019). When the refugee or applicant is
separated from the interpreter, who is co-located with the judge or legal professionals, their
emotions and body language do not come across. This makes it difficult for interpreters to
read and express these as part of the interpreted message (Ellis, 2004; BID, 2008; Licoppe
and Veyrier, 2017, 2020). Another recurring issue is overlapping speech and coordination
difficulties, which complicate consecutive interpretation (Ellis, 2004; Rowe et al., 2019;
Licoppe and Veyrier, 2020). Whispered interpreting, the only form of simultaneous inter-
pretation available due to the lack of specialised equipment, becomes impractical when
the interpreter is separated from the applicant. As a result, interpreters resort to chunking
strategies to handle longer speaking turns. However, these strategies are often ineffective
and problematic (Ellis, 2004; Rowe et al., 2019; Licoppe and Veyrier, 2020). Applicants
also experience difficulties in following the interpretation as a result of technical issues, spa-
tial arrangements, or differences in accent or dialect (BID, 2008, 2010; Eagly, 2015; Rowe
et al., 2019). Technical challenges and issues related to the integration of the interpreter
disrupt the flow of hearings, making them longer, more exhausting, and less effective in
delivering justice (Ellis, 2004).
A Canadian feasibility study (Ellis, 2004) examined immigration hearings through 10 hr
of observations, interviews with counsels, and an online survey involving 17 interpreters,
14 counsels, 25 adjudicators, and 16 refugee protection officers. During these hearings, the
immigration judge, refugee protection officer, and interpreter were located in one office,
while the refugee and lawyer were in a different city. The study raises concerns about
whether videoconferencing strikes the right balance between fairness and efficiency, given
its observed effect on how credible claimants appear. This is especially important in refugee
hearings, where direct evidence is often limited. Body language and emotions are poorly
conveyed through VMI, and the impersonal nature of videoconferencing makes it difficult
for applicants to express their emotions effectively. Counsel generally held the most nega-
tive views of VMI, possibly due to their awareness of the impact on decision-making. The
separation of the interpreter from the claimant introduced additional difficulties, including
challenges with rapport, signalling turns, and the impracticality of whispered interpreta-
tion. Common technical and administrative issues, such as the absence of an authority
figure to manage claimants and maintain order, further complicate the process.
A study of immigration bail hearings via video link by two British charities, Bail for
Immigration Detainees (BID) and the British Refugee Council (BID, 2008), reached simi-
lar conclusions. Several applicants, who were separated from the interpreter and other
participants in court, reported difficulty in following the courtroom proceedings. In some
cases, only direct questions to the applicant were translated, leaving them as mere bystand-
ers for the rest of the proceedings, unable to fully participate. Building on the 2008 study,
BID re-examined the fairness of bail hearings and noted significant barriers related to the
integration of interpreters when proceedings were conducted via video link (BID, 2010). In
some instances, judges did not verify whether the applicant and interpreter understood each
other, and significant portions of dialogue were not translated. In some cases, interpreters
were asked to offer opinions on the applicant’s nationality. This violates the Guidance Note1
of the First-tier Tribunal (Immigration and Asylum Chamber) (FTTIAC) on pre-hearing
introductions, which states that interpreters should not be
used as experts or asked to give advice. Access to interpreters during pre-hearing consulta-
tions was also limited when hearings were conducted via video link, with technical issues
compounding communication barriers. One of the study’s key recommendations is that it
should be the judge’s responsibility to ensure complete interpretation, as well as to confirm
that the interpreter and claimant understand each other.
In a similar vein, a 2005 study by the Legal Assistance Foundation of Metropolitan Chi-
cago and the Chicago Appleseed Fund for Justice, which examined immigration removal
cases at Chicago’s immigration court and included interviews with immigration practition-
ers, found that non-English speakers faced challenges due to the quality and the integration
of the interpretation. Judges remained in the courtroom while detainees appeared remotely
via video, using a speakerphone at the court to communicate with the interpreter. Lengthier
exchanges between the judge and the attorneys were not translated. Nearly 30% of immi-
grants with interpreters misunderstood parts of the proceedings due to inadequate inter-
pretation, and 70% of non-English speakers faced issues related to videoconferencing. The
removal rate was significantly higher for non-English-speaking Latinos (76%) compared to
their English-speaking counterparts (46%).
A recent, more comprehensive study of removal hearings in the United States (Eagly,
2015) examining 153,835 hearings involving litigants held in detention centres, with
approximately a quarter conducted via video link, also identified technological, interac-
tional, and interpretation difficulties. Overall, detainees in video link hearings faced difficul-
ties following the proceedings due to poor video quality and technical issues. Furthermore,
lawyers reported transmission delays and screen blackouts. Difficulties also arose in under-
standing interpreters who were connected through speakerphone while the detainee par-
ticipated via video link. Echoing findings from earlier studies (Ellis, 2004), video link cases
showed reduced engagement compared to in-person hearings, with fewer detainees apply-
ing for removal relief or seeking to delay/stop the process.
A landmark report (Shaw, 2016) on the situation of immigrants in UK removal centres,
along with its follow-up report in 2018, raised significant concerns about the availability
and quality of interpreting services. These reports cited poor practices, including instances
where detainees were relied upon to assist with interpretation. While the follow-up review
noted some improvements, particularly thanks to an increased use of telephone interpret-
ing, quality issues persisted. These issues included interpreters’ reluctance to address sensi-
tive topics, such as sexual orientation, due to cultural or religious biases, and instances of
poor literacy among interpreters. Healthcare staff also reported instances of poor conduct,
such as interpreters abruptly disconnecting during mental health sessions and background
noise indicating a lack of privacy. Although on-site interpretation is preferable for critical
or sensitive situations, this may not always be practical. The report stressed the urgent need
for an independent review to improve the quality of interpreting services available in the
removal centres.
A study in France analysing over 300 immigration appeal court hearings via video link
explored the communicative dynamics between participants in the main court and claim-
ants. Interpreters were either co-located with the applicant or present in court. A significant
recurring issue was the poor positioning of cameras and screens. This often prevented par-
ticipants from having a clear view of each other, resulting in an incongruent visual set-up.
This made it difficult for claimants to identify whose speech the interpreter was conveying,
leading to comprehension problems (Licoppe and Veyrier, 2017). The physical separation
between asylum seeker and interpreter also affected the interpreters’ handling of communi-
cation flow. As a result, interpreters resorted to overt or explicit turn-taking techniques dur-
ing long exchanges. A prominent finding was that these interpreting strategies negatively
impacted perceptions of the asylum seeker’s credibility and willingness to cooperate during
the questioning process (Licoppe and Veyrier, 2020).
To address interpreter shortages and logistical challenges, the use of distance interpreting
in asylum interviews has evolved across Europe with varying degrees
of success and adaptation based on individual countries’ needs. The General Directors’
Immigration Services Conference (GDISC) took an early step in integrating videoconfer-
encing technology to mitigate interpreter shortages across Europe. Launched in 2007, the
‘Interpreters’ Pool Project’ used relay interpreting, with one interpreter co-located with
the caseworker and applicant while a second, fluent in the applicant’s language, partici-
pated from another country. This project demonstrated how distance communication tech-
nologies could bridge gaps in interpreter availability within Europe, particularly for rare
languages.
More recently, the COVID-19 pandemic accelerated the use of distance interpreting
methods, as highlighted in EASO/EUAA’s 2021 Asylum Report. Countries adopted tel-
ephone (see also Lázaro Gutiérrez, this volume) and video-mediated interpreting (see also
Braun, this volume) to ensure safe and continued asylum interviews while adhering to health
protocols. Some countries, like Norway and Ireland, already had remote interviewing prac-
tices in place, while others, such as Sweden and France, introduced this modality for the
first time. Norway has used Skype for asylum interviews since 2017, particularly in remote
centres, and Ireland adopted similar practices in 2019. Sweden uses remote interviews
to reduce travel for applicants, while France conducts remote interviews for vulnerable
individuals and overseas departments. Other countries, including Armenia, Belgium, and
Germany, employ remote interviews in specific cases, such as for detained asylum seekers
or when interpreters are working from a remote location. In the UK, remote interviews
are conducted with the asylum seeker, interpreter, legal representative, and interviewing
officer in separate locations, with a designated point of contact for safeguarding concerns
(UNHCR, 2020).
Interpreting for refugees has been explored in other contexts, with healthcare settings
being particularly well-documented. In these contexts, distance interpreting is commonly
used and has been the focus of studies which compare its effectiveness with in-person
interpreting. For example, Dubus (2016) explored interpreters’ experiences with both
in-person and telephone interpreting for refugees in the United States in healthcare set-
tings and revealed that telephone interpreting often led to emotional detachment and
communication breakdowns. Similarly, Kunin et al. (2022) assessed telephone interpret-
ing in refugee health clinics in Australia. They found that while practical, telephone
interpreting lacked the emotional connection and nuance of face-to-face consultations,
with many refugees feeling more comfortable and trusting during in-person interac-
tions. Another comparative study of the two modalities (Phillips, 2013) examined
the ‘meta-communication’ between doctors and refugees – conversations about care,
survival, and identity that occur beyond literal translation. Phillips found that while
telephone interpretation can provide a functional solution in certain situations, it often
fails to capture the full depth of the refugee experience. Consequently, Phillips argues
that in-person interpreting is superior in building trust and understanding between refu-
gees and healthcare professionals.
Distance interpreting has also become a useful tool in health screenings for asylum seek-
ers, which play an important role in the legal process of claiming asylum. For instance, a
pilot study conducted prior to the COVID-19 pandemic (Mishori et al., 2021) examined the
implementation of audio- and videoconferencing in remote asylum evaluations in migrant
camps in Mexico to assess the effectiveness and challenges of conducting these evalua-
tions via telehealth. The study found that while remote evaluations were considered better
than no evaluations at all, they posed significant challenges, including technological issues,
concerns about confidentiality, and difficulties in making visual observations and building
rapport – factors that are particularly important in mental health assessments. Despite
these challenges, the study suggests that remote evaluations could be a viable option for
hard-to-reach communities if improvements are made in technology and protocols. A later
study by Pogue et al. (2021), carried out in the United States during the COVID-19 pan-
demic, also explored the experiences of clinicians conducting remote medical evaluations
for asylum seekers and reached similar conclusions. While remote evaluations were neces-
sary due to pandemic restrictions, they were often fraught with challenges, such as lim-
ited technology access and difficulties in assessing the physical and mental state of clients.
Nonetheless, the clinicians viewed this as a vital alternative to the absence of evaluations
and recommended developing better technology and communication protocols to improve
future assessments.

Furthermore, practical recommendations regarding the registration of asylum seekers
issued by EASO/EUAA (2020) note that reception centres play an increasingly important
role in the registration process for asylum seekers. These centres assist with completing
forms and collecting documents, often with the support of interpreters or cultural media-
tors. The report observes that many countries expanded the role of reception centres dur-
ing the COVID-19 pandemic, and that closer cooperation with asylum authorities may
lead to more streamlined registration procedures. One key recommendation is to conduct
remote registration interviews within these centres, which can offer dedicated, confidential
rooms equipped with videoconferencing technology that meets essential quality standards.
Reception staff can verify the applicant’s identity, manage participation, and provide any
necessary support (EASO, 2020, 17). If these practices expand, it will likely increase the
demand for video-mediated interpreting in reception centres, where interpreters may either
be present on-site with the applicant or connected remotely from a different location.
An ad hoc query on interpreting in reception facilities by the European Migration Net-
work (2022) examined practices in reception facilities across 23 EU member states. Most
countries use a combination of on-site, telephone, and video-mediated interpreting, with
on-site interpreting being preferred for critical situations. Distance interpreting, especially
via video, has increased due to the pandemic. Countries like Belgium and Bulgaria now
predominantly use video-based solutions, while the Czech Republic and Sweden rely more
on telephone interpreting. Distance interpreting modalities address interpreter shortages,
especially for rare languages, and are more cost-effective since they reduce travel expenses.
In countries with dispersed reception facilities, such as Belgium and Finland, distance
interpreting helps overcome logistical challenges. The pandemic further accelerated the
adoption of distance communication technologies, maintaining asylum procedures while
ensuring health protocols. Distance interpreting also offers flexibility and quicker access to
interpreting services, particularly useful in urgent situations, such as late-night screenings
or emergencies.
The European EU-WEBPSI project2 (2022–2025) focuses specifically on video-mediated
interpreting to expand interpreting capacity for languages of limited diffusion (LLDs),
ensuring improved language support and access to services for non-EU nationals in refu-
gee and asylum settings. In a series of case studies involving reception centres in Belgium,
Greece, and France, the project research reveals daily challenges and practices for interpret-
ing provision. Findings indicate fluctuating demands for interpreting in various language
combinations, especially for LLDs, as well as practical issues, such as the remoteness of
some reception centres, time constraints, and budget limitations (Singureanu et al., 2023a).
In reception settings, interpreters primarily assist during intake procedures at the arrival
centre, which involve registration and basic social and medical assessments. Due to the
daily volume of assignments, these tend to be short procedures, usually carried out with the
help of on-site interpreters. More-advanced intakes, that is, detailed assessments, occur at
reception facilities, where VMI is more likely to be used. This is due to the wide distribu-
tion of facilities across a large geographical area and the unpredictable, often short-notice
demand for interpreters. These intakes include more varied interactions, such as provid-
ing information on voluntary return, schooling, employment, behaviour in the centre, the
handling of incidents that arise, preparing for asylum interviews, medical consultations,
identifying vulnerabilities, and transfer requests (De Wilde and Guaus, 2022). The imple-
mentation of VMI varies widely, depending not only on the stage of the intake process
but also on the country where the process is taking place. Some centres use VMI daily,
employing resident interpreters who can be booked directly by staff. In other places, tel-
ephone interpreting remains the only modality of distance interpreting available, because
the transition to VMI has not yet been completed; speakers and webcams are not always
available in every office, and staff members may require training to use the equipment. In
addition, access to VMI is also dependent on the agencies with whom the centres work and
whether these agencies offer VMI.
To meet the needs of the centres, the most frequently used configuration is video
remote interpreting (VRI), where all primary participants are co-located and the inter-
preter is remote/off-site, connected via video (Braun, 2019). Due to the time pressure
staff face in finding and booking interpreters at short notice, and the frequent demand for
LLDs
in remote locations, this configuration appears to be a suitable arrangement that meets
the centres’ daily needs, offering professional interpreters who are easy to book and to
cancel. This configuration typically features three parties: the beneficiary or centre resi-
dent and the authority staff member (e.g. the asylum service staff member or a social
worker) are co-located, and the interpreter is remote. This set-up occurs when a different
centre or refugee camp from the one where the resident interpreters are based requires
interpretation. Consequently, the interpreter is connected to the requesting centre via an
online platform. Of note, centres dedicated to minors are more likely to have an increased
need for interpreting services. A second use of this VMI configuration arises when centre
residents leave the centre (e.g. for medical appointments) and the interpreter remains in
the residential facility. Interestingly, VMI has been used to communicate with minors in
order to maintain the impartiality of resident interpreters, who might otherwise become
too familiar with the minors and risk compromising their neutrality. At times, a fourth
party, such as a psychologist, lawyer, or doctor, may join remotely, creating multi-point
video links. Although not meant to participate actively, such a fourth party may add
information at the interview’s conclusion.
A combination of staff interpreters, qualified freelancers, and agency interpreters is hired
for these assignments, while volunteers are also used to meet demand and typically assist
with informal or straightforward communications. When interpreting services are unavail-
able, staff often use tools like Google Translate or rely on multilingual colleagues who have
been strategically recruited. On-site interpreting is considered a better option for extended
briefings, sensitive topics, risk-to-life medical scenarios, and mental health matters, with
exceptions made for LLDs, especially when immediate assistance is required. The diversity
of settings in which interpreters work is covered, to varying degrees, by their training.
Training includes brief overviews and presentations on the asylum service, the medical field,
interpreting for individuals who have experienced smuggling or trafficking, and interpret-
ing for the LGBTQI community (Singureanu et al., 2023a). Furthermore, whether VMI
is offered directly to end users or outsourced to private agencies affects interpreter work-
ing arrangements, video platform choice, quality control, and technical support. In some
countries, interpreter bookings are managed centrally, and interpreters are deployed either
to VMI or on-site, depending on demand. This central management also allows for more
structured interpreter management, scheduled breaks, and debriefing sessions for challeng-
ing assignments. In other cases, in-house interpreters manage their own schedules and are
directly booked by staff using an internal online platform. When VMI or telephone inter-
preting is booked through contracted agencies, this requires advance requests and subse-
quent confirmation. The perception among service users appears to be that working via
agencies offers a quality guarantee; however, there is little control or transparency regarding
the qualifications or expertise of remote interpreters provided by agencies (Singureanu
et al., 2023a).
However, staff do not always have access to essential VMI equipment, such as laptops
with integrated webcams, headsets, and a good internet connection, and they may not
have sufficient office space dedicated to VMI to ensure privacy. In some cases, this issue is
addressed by advising interpreters to highlight problems with sound quality and to inter-
vene for clarification where required. In addition, local coordinators provide further
support as needed. Furthermore, volunteers providing distance interpreting who cannot
invest in VMI equipment are more likely to offer telephone interpreting only. There is no
formal feedback system for interpreting quality or VMI usage. However, informal feedback
is occasionally gathered from staff and beneficiaries, along with site visits by senior manag-
ers to monitor VMI challenges. Future plans include prioritising VMI over on-site interpret-
ing, developing guidelines for VMI, creating and delivering practical training for staff on
the use of interpreters, and improving technological infrastructure to support VMI uptake.
Thus, the fragmented use of in-house interpreters, interpreting agencies, and volunteers for
video and telephone remote interpreting makes it difficult to monitor quality and practices.
This results in uneven interpreting practices, with VMI not being fully established and
evolving sporadically rather than systematically. That said, a one-size-fits-all
approach would serve neither local needs nor demand.
To systematise VMI practices in the asylum context, it would be beneficial to leverage
existing empirical research and literature from other PSI settings and apply these findings
to refugee contexts. Logistical issues, such as limited preparation time and the use of sub-
standard equipment – common in the asylum context – have been shown in the literature to
impair interpreter performance. Furthermore, interactional problems, prevalent across all
VMI settings, can significantly affect both interpreters and end users in asylum cases, which
makes effective communication even more challenging.
VMI practices in refugee settings reveal that limited infrastructure constrains pre- and
post-assignment interaction between interpreters and end users, which can negatively
impact interpreter performance and well-being. The absence of suitable briefing can leave
interpreters inadequately prepared for assignments, ultimately affecting interpreting qual-
ity, as evidenced in legal interpreting literature (Balogh and Salaets, 2019; Rowe et al.,
2019; Braun, 2020; Corpas Pastor and Gaber, 2020; Clark, 2021). To some extent, this
is due to end users’ lack of awareness of how interpreters work and what they require
to perform well. This can be addressed through interprofessional training (Fowler, 2013;
Clark, 2021; Singureanu et al., 2023b). Post-session debriefing for interpreters is also a key
practical and ethical consideration for effective VMI implementation in asylum settings.
Psychological debriefing after emotionally challenging VMI assignments has been shown
to provide valuable support for interpreters. This is particularly relevant in refugee con-
texts, where such assignments are common (Barea Muñoz, 2021; Butow et al., 2012). Fur-
thermore, post-session feedback-oriented debriefings can also help identify potential VMI
issues, which can be systematically documented and integrated into practices and training
(Braun et al., 2013).
Technical issues, such as substandard equipment, are a key factor influencing the effi-
ciency and effectiveness of VMI. The literature on VMI in asylum settings, reviewed ear-
lier, highlights key challenges, including inadequate or outdated equipment and unreliable
internet connections. Research on VMI has established that technical issues, including
poor sound or image quality, can cause comprehension difficulties (Davis et al., 2015;

The Routledge Handbook of Interpreting, Technology and AI

Eagly, 2015; Clark, 2021) and negatively impact interpreter performance (Alley, 2012;
Braun, 2018; Corpas Pastor and Gaber, 2020; Hale et al., 2022). These issues can lead
to disruptions, distract users, or even necessitate adjournments and re-scheduling (Davis
et al., 2015; Clark, 2021). To improve the quality and uptake of VMI, there is a need for
investment in reliable, user-friendly equipment that is easily accessible to all participants
(Hale et al., 2022). PSI literature offers clear guidelines for VMI-ready workstations. These
should include a laptop, external camera (ideally positioned to enhance eye contact), pro-
fessional headsets, a microphone, a quiet space, and a high-speed, wired internet connec-
tion (Koller and Pöchhacker, 2018; Klammer and Pöchhacker, 2021). As with other VMI
settings, institutional stakeholders and professionals may be unaware of these technological
requirements (Braun et al., 2018). Furthermore, the input of interpreters working in refugee
settings is often missing from the grey literature. Thus, regular technical checks and sup-
port, along with increased technological literacy for professionals and interpreters, are also
essential (Ellis, 2004; Braun, 2013; Eagly, 2015; Fowler, 2018; Sultanić, 2020; Tam et al.,
2020; Ji et al., 2021).
Additionally, difficulties in interacting with/via the technology contribute to inter-
actional challenges between participants and the interpreter, an aspect of VMI
well-documented within PSI literature. Research shows that participants often struggle
to pick up on the remote interpreter’s non-verbal cues for speech segmentation due to
challenges with eye contact and screen monitoring (Klammer and Pöchhacker, 2021; Sin-
gureanu et al., 2023b). Overlapping speech is also more common in VMI. This causes
problems for the interpreter, who may mishear or miss information (Korak, 2012; Braun,
2018). These additional complexities tax the interpreter’s cognitive load in VMI, lead-
ing to more interpreting issues, such as omissions and additions, compared to on-site
interpreting (Braun, 2018). Communication and interactional problems also stem from
the incongruent visual environment due to the positioning of the equipment, that is, the
camera and screens, at both sites (Licoppe and Veyrier, 2017; Braun, 2018). Interactional
problems were found to be more common in VMI encounters with multiple participants
at one end, both in medical (Price et al., 2012) and legal settings (Licoppe and Veyrier,
2017; Fowler, 2018; Braun, 2020). This is particularly relevant in the refugee context,
where medical and age assessments conducted remotely can include multiple participants,
such as minors, or vulnerable patients who require the presence of an independent guard-
ian or family member.
VMI interactional issues for other-language speakers include difficulties in building rap-
port with the remote interpreter (Ellis, 2004), psychological distancing (Rowe et al., 2019),
and trust issues (Lindström and Pozo, 2020; Martínez et al., 2021). Without the personal
dynamics of in-person interpreting, remote interactions can hinder patient engagement with
interpreters and healthcare providers (Martínez et al., 2021). This is especially critical in
refugee contexts, where the power imbalance between cultures and languages makes rap-
port particularly important (Kleinert et al., 2019; Blasco Mayor, 2020). The human barriers
associated with VMI raise concerns about its suitability for certain encounters, since remote
interpreters struggle to navigate emotionally charged accounts and culture-specific nuances
(Ellis, 2004). Additionally, interpreters often struggle to read facial cues from elderly or
impaired individuals (Gilbert et al., 2022), which could impact the accuracy of remote
health assessments for migrants, discussed earlier in this section. While on-site interpreting
is prioritised for adversarial asylum interviews, VMI sometimes remains the only option, for instance for certain LLDs or in emergencies.


16.2.2 Guidelines
This section showcases examples of good practices in VMI for asylum and immigration
settings. It draws on both the limited guidance specific to these areas and valuable insights
from VMI guidelines that have been developed for healthcare and legal contexts. Integrat-
ing these broader best practices can help close the gap in standardised guidelines for VMI
in asylum and refugee contexts. This ensures that interpreters working in this modality are
better prepared to deliver high-quality services, ultimately enhancing communication and
support for refugees.
More specific recommendations for VMI in refugee settings, particularly in fully virtual
encounters, where all parties are spatially separated, emerged during the COVID-19 pan-
demic. UNHCR (2020) recommendations for remote interviewing underscore interpret-
ers’ pivotal role in ensuring the fairness and effectiveness of remote asylum interviews.
They emphasise the importance of ensuring equal participation for all parties, including
interpreters. Thus, interpreters should ideally participate via the same method as other
participants, preferably via videoconferencing rather than just telephone, to ensure clear
communication and rapport. Additionally, training interpreters in VMI and remote inter-
viewing techniques helps minimise the challenges posed by physical distance and ensures
the effective participation of all parties throughout the interview process. These serve as
crucial procedural safeguards to uphold fairness during the remote interviewing process.
There is still a lack of harmonisation and standardisation of VMI guidelines, similar to
the situation in PSI. While some countries have developed detailed VMI guidelines driven
by institutional needs, there remains a significant gap in guidance for interpreters working
in asylum or immigration settings, especially for LLDs. To capture current VMI practices
across several European countries, guidelines from German, Finnish, Norwegian, Czech,
Dutch, Italian, and Polish sources were reviewed, with translations into English facilitated
by DeepL. It should be noted that this review of VMI guidelines is not exhaustive. Addi-
tionally, guidelines from healthcare and legal settings were examined, as they share com-
mon elements relevant to VMI in refugee contexts. These offer valuable insights that can
inform the development of guidelines for asylum and immigration settings.
VMI guidelines in legal settings clarify the prerequisites for VMI implementation and
address users’ lack of familiarity with integrating interpreters into distance communica-
tion (Council of the European Union, 2013). The guidelines also highlight factors that can
limit the use of VMI. This is of particular relevance to refugee settings, where one may
encounter inadequate technical equipment, a lack of technical knowledge, physical room
layout limitations, or long and complex interactions (Braun et al., 2013; OROLSI-JCS and
UNITAR, 2020; Federal Ministry of Justice, 2022). It is generally considered good prac-
tice to avoid an imbalance of participants, for example, set-ups where most participants
are co-located and only one primary participant or the interpreter is off-site. End users
can also be affected by problems with technological access and literacy (Federal Ministry
of Justice, 2022; OROLSI-JCS and UNITAR, 2020). These are crucial considerations in
refugee settings, as noted in Section 16.2.1. Continuous evaluation through a structured
feedback process is also essential for successful VMI implementation in local public services
(GDISC, 2008; HSE Social Inclusion Unit and the Health Promoting Hospitals Network,
2009; Braun et al., 2013).
Few guidelines currently address the various spatial arrangements and visual requirements regarding VMI configurations where one or more participants are remote (Council of the European Union, 2013; CIOL and ATC, 2021; IMDi, 2023a). For effective communication, participants must be able to clearly see who is speaking at all times (van Rotterdam and van den Hoogen, 2012). Recreating the traditional triangular positioning of
participants, as seen in on-site interpreting, helps facilitate this (Office of the Commissioner
General for Refugees and Stateless Persons, 2022). Replicating this traditional arrangement
allows the interpreter to view both the speaker and the audience without participants need-
ing to turn away from the screen (Council of the European Union, 2013). Additionally,
participants should describe gestures or objects that are not visible to the interpreter (IMDi,
2023a). Suitable lighting is also essential to ensure that participants’ facial expressions are
clearly visible (Braun et al., 2013; CIOL, 2021).
A collaborative approach to communication management is recommended due to the
increased coordination required (Council of the European Union, 2013; CIOL, 2021; CIOL
and ATC, 2021). This extends to the preparation stage for interpreters, which requires
additional effort and coordination in VMI, particularly when booked via an agency, with
briefings provided either well in advance or just before the assignment (CIOL and ATC,
2021; IMDi, 2023b). For sensitive encounters, such as medical or adversarial settings, the
service provider or initiating party must determine which participants should be part of the
interpreter’s briefing (IMDi, 2024). Similarly, introductions at the beginning of the VMI
session should be facilitated by the coordinator, with the interpreter introducing themselves
if necessary (Braun et al., 2013; CIOL, 2021). Verbal and non-verbal signals for pacing and
turn-taking should also be established early (Council of the European Union, 2013; TEPIS,
2019; IMDi, 2024). As part of a collaborative approach, and in adherence to the princi-
ples of integrity outlined in the interpreters’ code of conduct, interpreters are encouraged
to intervene when necessary to address communication problems and technological issues
(Braun et al., 2013; TEPIS, 2019). Effective communication management involves pausing
the participants’ speech regularly to allow the interpreter to render manageable speech
segments, ensuring high-quality consecutive interpreting (Braun et al., 2013; CIOL, 2021;
Federal Ministry of Justice, 2022).
Technological recommendations highlight the importance of sound quality at the inter-
preter’s location (Braun et al., 2013). Ideally, all participants should use individual profes-
sional microphones with echo and background-noise cancellation (van Rotterdam and van
den Hoogen, 2012; Braun et al., 2013), though this may not be practical in refugee settings.
To address this, guidelines suggest conducting technical checks and establishing clear pro-
cedures for resolving acoustic issues at the beginning of the encounter, while continuously
monitoring sound quality throughout the session (Braun et al., 2013; OROLSI-JCS and
UNITAR, 2020; CIOL and ATC, 2021; Federal Ministry of Justice, 2022; IMDi, 2023b).
Essential technical specifications for web-based VMI platforms include supporting various
views (e.g. speaker, gallery, self-image). The initiating party must manage the platform as
well as features including data security, webcasting, and recording (Federal Ministry of
Justice, 2022). All parties, including the interpreter, must have adequate technical equip-
ment and are responsible for speed and connectivity at their location (OROLSI-JCS and
UNITAR, 2020; Federal Ministry of Justice, 2022; IMDi, 2023b). Freelance interpreters,
in particular, must handle technical issues at their location. This requires IT skills and VMI
experience or training (Braun et al., 2013; IMDi, 2023b).
Current VMI recommendations also highlight the importance of training for inter-
preters, professionals, and end users. Given the complexities of VMI, a well-coordinated
approach is essential to ensure effective use (Vastaanottava Pohjois-Savo [The Welcome to Northern Savo Project], 2011; Braun et al., 2013; Council of the European Union,
2013; Finnish Translators and Interpreters’ Association, 2013; IMDi, 2023a). Ethical
aspects, such as interpreters only accepting remote assignments for which they have the
necessary skills and equipment, should also be included in VMI training (Finnish Trans-
lators and Interpreters’ Association, 2013). Interprofessional training should cover the
limitations of VMI and equip professionals with the ability to make informed decisions
about VMI’s suitability for specific assignments, as well as the ability to make adjust-
ments when required. Raising awareness among end users and stakeholders about the
importance of VMI-trained interpreters is key to maintaining a strong accreditation
system (Braun et al., 2013). Prioritising trained interpreters as standard practice could
further enhance the quality of VMI (HSE Social Inclusion Unit and Health Promoting
Hospitals Network, 2009).
A core set of minimum VMI requirements for interpreters and professionals in refugee
and asylum settings is essential to establishing a mandatory standard rather than mere
recommendations. At the same time, these requirements should provide NGOs and refugee
organisations with the flexibility to adapt them to their specific needs. One such step in this
direction is the EU-WEBPSI guidelines model, which aims to establish minimum standards
and essential requirements for VMI practice in the asylum and refugee settings (Guaus
et al., 2023). These guidelines highlight the importance of distinguishing between VMI
requirements for all involved parties and specific needs for interpreters and end users, such
as clients and service providers. Certain aspects, such as visual and acoustic requirements,
communication management, breaks, VMI equipment, and the use of consecutive mode,
are common points relevant to all parties involved. Thus, both end users and interpreters
must understand that effective communication in video-mediated encounters depends on
mutual visibility, optimal spatial arrangement, and suitable lighting. An increased level
of communication management is necessary for efficient VMI encounters, which require
structured collaborative approaches to introductions, pacing, and turn-taking during the
encounter. The guidelines stress the importance of interpreters possessing strong linguistic,
technical, and interactional skills, as well as familiarity with troubleshooting and the VMI
platform. The guidelines also address ethical considerations. Interpreters are expected to
act with integrity by accepting only VMI assignments that they can competently manage
and withdrawing if conditions, such as inadequate equipment or unfamiliarity with the
platform, are insufficient. Additionally, interpreters must address impartiality concerns by
ensuring that any overlooked input from the other-language speaker is brought to the atten-
tion of all participants. An important ethical consideration for meeting organisers is that
their duty of care must also extend to off-site interpreters, particularly during emotionally
sensitive assignments. This includes providing interpreters with free access to mental health
support resources.

16.2.3 Training
The literature on asylum and refugee settings often highlights the lack of adequate training
for interpreters and the need for ongoing professional development (Nuč and Pöllabauer,
2021; Cassim et al., 2022). Training courses are typically short, with minimal feedback
from a bilingual instructor (Gany et al., 2017; Blasco Mayor, 2020; Pöllabauer, 2022).
This makes it unlikely that these interpreting trainees will reach advanced levels of interpreting skills (Hale and Ozolins, 2014). Of note, the training and qualifications offered to
interpreters working in this context vary considerably, and there is a notable lack of stand-
ardisation and quality control mechanisms (Tipton and Furmanek, 2016).
Prominent examples of comprehensive training initiatives in this field, namely Quada, EASO, and REMILAS, stand out for their involvement of multidisciplinary
teams from interpreting, linguistics, and cultural studies (Bergunde and Pöllabauer, 2019;
Ticca et al., 2023). Bergunde and Pöllabauer (2019) provide an in-depth exploration of
both the development and implementation of a training course for interpreters in asylum
contexts, focusing on linguistic and cultural competency. Their findings show that compre-
hensive training, including ethics and trauma-informed approaches, significantly improves
interpreter performance in complex asylum interviews, and the authors advocate for con-
tinuous professional development to better prepare interpreters for these demanding roles.
The VMI training initiative implemented in Germany, Austria, and Switzerland to address
the urgent need for interpreters of languages of limited diffusion (Albl-Mikasa and Eingrie-
ber, 2018) is also worth noting. This study highlights the integration of video interpreting
into healthcare services in response to the 2015 refugee crisis, especially for languages with
a limited pool of interpreters. Conducted over several days, the training focused on both
lay and qualified community interpreters working in social services, healthcare, and asylum
contexts. The course encompassed interpreting techniques, role-plays, professional ethics,
and VRI skills, including managing distance communication and technical issues. It also
featured VRI practice sessions, followed by feedback from clients and peers. Simulations, discussions, and reflections on the differences between VMI and in-person interpreting have also been used in other studies (Skaaden, 2018) as curriculum components
designed to help trainee interpreters adapt to the complexities of working in VMI. Skaaden
(2018) examines interpreting trainees’ experiences using Skype to interpret remotely dur-
ing simulated VRI interactions in social services and asylum settings. The 45 participating
interpreters reported challenges, such as turn-taking difficulties, an overload of visual infor-
mation, and the complexity of managing screen monitoring, alongside their interpreting
tasks. The study also emphasises the importance of online etiquette and the need for all
participants to be briefed on best practices for distance communication.
Other comprehensive training initiatives for distance interpreting in PSI settings also
offer relevant insights and training methodologies. The EU project SHIFT3 (Amato et al.,
2018) focused on training for telephone and video interpreting across various community
and social interpreting scenarios. This is particularly relevant for informal communication
in asylum and refugee settings. The project’s primary aim was to identify effective prac-
tices and develop educational materials based on these identified insights. A key outcome
of the SHIFT project was the creation of a taxonomy to analyse VRI interactions in col-
laborative settings, including business, community, and emergency encounters. This tax-
onomy addresses crucial interactional aspects, such as managing the opening and closing of
encounters, spatial organisation, and turn-taking. The categories highlight both successful
and problematic interpreting techniques while outlining coping and adaptation strategies
for VRI (Davitti and Braun, 2020). Some of the main challenges identified include handling
long speaking turns, referencing documents or objects, and achieving clear communication.
The use of adaptive strategies, such as micro-pauses and slower speech rates, can improve
communication management, while verbal cues should replace non-verbal signals for better
coordination. This analytical framework serves as an effective tool for interpreter training,
particularly in refugee settings.


Joint training programmes, such as the European-funded AVIDICUS 1–3 projects,4 have
been instrumental in legal PSI settings, helping professionals using VMI services understand
and implement best practices. This underscores the value of collaboration and shared train-
ing in achieving successful outcomes for VMI implementation. However, such in-person
collaborative training initiatives can be restrictive due to the time and cost investments
required from all parties. Training VMI end users through online self-directed courses or
short training sessions provides the flexibility professionals need (Ramos et al., 2014; Pavill,
2019). As current efforts suggest (EU-WEBPSI, 2022–2025; IMDi, 2023a, 2023b), this
approach may be more efficient and practical for those working in refugee settings, given
typical budget and time constraints.
The most recent training initiative is the EU-WEBPSI project, which aims to address the
growing demand for interpreting services in refugee and reception centre contexts. Here,
interpreters are often required for a wide variety of informal conversations in diverse ref-
ugee contexts, from medical to quasi-legal, educational, and social services. The project
primarily aims at developing a framework to train refugees as interpreters, particularly
for VMI, by creating specialised training materials for both interpreters and interpreter
trainers. Additionally, to ensure that interpreters can effectively work via video link in these
settings, the project is developing a dedicated platform for delivering video distance inter-
preting services. This comprehensive approach is designed to meet the unique communica-
tion challenges present in refugee contexts, while also empowering refugees to contribute
as trained interpreters.

16.3 Other technologies

16.3.1 Crowdsourcing platforms


Advances in communication technology, including Web 2.0, social media, and global plat-
forms, have enabled real-time crowdsourcing translation. This has also expanded access to
human interpreters. Crowdsourcing platforms connect networks of volunteers with varying
levels of language proficiency. These volunteers address urgent or unmet
communication needs, either on a paid or unpaid basis (O’Brien, 2011; Jiménez-Crespo,
2017). Described as the ‘human version of machine translation’ due to its ability to process
large volumes quickly, this approach has come to play an important role in crisis situations,
including natural disasters, political upheavals, and health emergencies, where immediate
communication is essential (Anastasiou and Gupta, 2011).
One example is the Tarjimly platform, launched in 2017 in response to the Syrian refu-
gee crisis. Tarjimly supports over 50 languages and operates through platforms like Face-
book Messenger. It connects over 9,000 volunteers to help refugees and aid workers with
medical emergencies, asylum interviews, and other critical needs. While platforms like Tar-
jimly help bridge language gaps in urgent situations, they explicitly state that they are not
a replacement for professional interpreting services in medical or legal settings when such
services are available. Thus far, the platform has been used to support access to healthcare
for refugees in Greece, assist in asylum processes in the USA, and aid organisations like the
UNHCR and the Syrian American Medical Society.
While fulfilling an important function in crisis communication support, crowdsourced
translation is not without its challenges. It relies on volunteers, often amateurs, who may
lack familiarity with the professional ethics that guide certified interpreters. To address
these issues, providing training to volunteers is crucial. Another way to improve crowd-
sourced translation is to optimise the matching of bilingual volunteers with refugees or aid
workers needing support. This involves selecting the most suitable volunteer based on their
skills and the specific request. The Tarjimly platform uses artificial intelligence and machine
learning to do this on a large scale (Agarwal et al., 2020).
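Tarjimly's production matching is machine-learning-based (Agarwal et al., 2020); purely to illustrate the underlying idea of skill-aware volunteer selection, the sketch below ranks volunteers by language match and domain familiarity using simple hand-set rules. All names, fields, and weights here are invented for illustration and do not reflect the platform's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Volunteer:
    name: str
    languages: set        # languages the volunteer can interpret
    domains: set          # subject areas, e.g. {"medical", "legal", "asylum"}
    available: bool = True

@dataclass
class Request:
    language: str         # language needed by the refugee or aid worker
    domain: str           # subject area of the request

def score(v: Volunteer, r: Request) -> int:
    """Rule-of-thumb compatibility score; 0 means 'not a viable match'."""
    if not v.available or r.language not in v.languages:
        return 0
    # Language match is the baseline; domain familiarity weighs extra.
    return 1 + (2 if r.domain in v.domains else 0)

def best_match(volunteers, request):
    """Return the highest-scoring viable volunteer, or None if none qualifies."""
    viable = [v for v in volunteers if score(v, request) > 0]
    return max(viable, key=lambda v: score(v, request)) if viable else None
```

A learned system replaces the hand-set weights with a model trained on outcomes of past pairings and adds signals such as response time, user ratings, and request urgency, which is what allows matching to work at the scale the platform reports.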

16.3.2 Automated services


The EASO/EUAA noted in its 2021 Asylum Report that the COVID-19 pandemic forced
EU+ countries to adapt their language services to ensure continuity while maintaining
health protocols. Member states implemented telephone and video-mediated interpreting
services to ensure safe interviews. However, some countries, such as Lithuania, adopted
Travis Touch Go AI smart pocket translation devices in reception centres, while oth-
ers used automated tools like Google Translate, despite acknowledging their limita-
tions. Translators without Borders (2020) and KoBo Inc. developed speech recognition
technology to improve data collection from marginalised speakers, initially modelling it
on languages like French and Congolese Swahili in the Democratic Republic of Congo.
Integrated into KoBoToolbox, the technology enabled humanitarians to engage more
effectively with vulnerable communities during the COVID-19 crisis. Countries including
Germany and Sweden introduced digital solutions for language recognition and analysis.
Additionally, several member states developed self-registration systems at local adminis-
tration premises, allowing some asylum applicants to register without requiring an inter-
preter’s presence. AI-based chatbots are also being explored to guide applicants through
the process.

16.4 Conclusion
The provision of VMI services in refugee settings is fragmented due to reception authorities
and national governments having to address various local needs. Some of the more typical
challenges related to the asylum and refugee settings consist of a complex target audience
made up of families, elderly people, and unaccompanied minors, together with a lack of
training in the VMI modality. VMI implementation varies by country and the particular
stage of the asylum process. Some centres use VMI daily, whilst others have not yet transi-
tioned to VMI and still use telephone interpreting to meet the demand. On-site interpreting
is predominantly used for initial intake procedures at arrival centres due to high volumes,
while VMI is more likely to be used for advanced intake procedures at dispersed facilities,
to manage unpredictable demand and the need for interpreters of LLDs. Technological
problems significantly impact the effective use of VMI. In remote locations, unreliable inter-
net connections are a major issue, while more generally, staff and interpreters often lack
essential VMI equipment and private spaces.
The integration of VMI in refugee settings presents both logistical and interactional
challenges, as highlighted by research over the past two decades. Studies consistently point
to the difficulties posed by the physical separation of interpreters and participants, sub-
standard equipment, and inadequate preparation time, all of which impair the quality of
interpretation. Non-verbal cues, essential for effective communication, are often lost in
VMI, exacerbating issues of rapport, trust, and accuracy. These challenges are particularly
significant in refugee contexts, where emotionally charged accounts and complex cultural
nuances demand accurate and culturally nuanced interpreting.
There is currently a lack of harmonisation and standardisation in VMI guidelines.
However, certain best practices, such as visual and acoustic requirements, technological
considerations, and communication management strategies, are key to establishing VMI
minimum standards in asylum and refugee settings. It is both essential and feasible to create
a shared core of minimum standards to ensure high-quality VMI provision and technologi-
cal infrastructure. Reception authorities can then adapt these standards into local codes
of practice based on their specific needs. Such a model can also serve as a benchmark for
training and interpreting needs, with continuous evaluation and feedback from stakehold-
ers being crucial.
Both specialised programmes for interpreters and interprofessional training have proved
successful in the adoption of best practices and in raising awareness about VMI limita-
tions in legal and medical settings. Whilst such training may be time- and cost-prohibitive,
practical alternatives, such as online, self-directed courses, would
be effective in the asylum context. VMI interprofessional training can also help end users,
stakeholders, and interpreters understand the knock-on effect that logistical and technical
problems can have on interpreters’ performances in VMI.
Despite its limitations, VMI remains an important tool in managing the logistical and
practical demands of interpreting in refugee settings. However, improvements are neces-
sary to ensure its effectiveness and fairness. To guarantee a certain level of quality within
VMI services in the refugee context, future effort must prioritise investments in reliable
technology, provide training for interpreters and users, and establish structured feedback
mechanisms.

Notes
1 FTTIAC, Adjudicator Guidance Note on Unrepresented Appellants, April 2003, para on the
interpreter.
2 The EU-WEBPSI project is funded by the Asylum and Migration Integration Fund (AMIF), grant
number 101038590; project coordinator: University of Ghent. Website: https://www.webpsi.eu.
3 SHIFT in Orality – Shaping the Interpreters of the Future and of Today. European Commission,
Erasmus+, Key Action 2: Strategic Partnership in Higher Education. 2015–1-IT02-KA203–014786.
Website: www.shiftinorality.eu.
4 AVIDICUS 1, 2, and 3 (Assessment of Video-Mediated Interpreting in the Criminal Justice Systems)
were projects funded by the European Commission Directorate-General for Justice. For further
details, see www.videoconference-interpreting.net and Braun (2016).

References
Agarwal, D., Baba, Y., Sachdeva, P., Tandon, T., Vetterli, T., Alghunaim, A., 2020. Accurate and
Scalable Matching of Translators to Displaced Persons for Overcoming Language Barriers. arXiv
preprint arXiv:2012.02595.
Albl-Mikasa, M., Eingrieber, M., 2018. Training Video Interpreters for Refugee Languages in the
German-Speaking DACH Countries. FITISPos International Journal 5, 33–44. URL https://doi.
org/10.37536/fitispos-ij.2018.5.1.163
Alley, E., 2012. Exploring Remote Interpreting. International Journal of Interpreter Education 4(1),
111–119.
Amato, A., Bertozzi, M., Braun, S., Capiozzo, E., Danese, L., Davitti, E., Fernandes, E.I., Lopez,
J.M., Méndez, G.C., Rodríguez, M.J.G., Russo, M., Spinolo, N., 2018. Handbook of Remote
Interpreting – Shift in Orality – Shaping the Interpreters of the Future and of Today. University of
Bologna, 1–320. URL https://doi.org/10.6092/unibo/amsacta/5955
Anastasiou, D., Gupta, R., 2011. Crowdsourcing as Human-Machine Translation (HMT). Journal of
Information Science 20(10), 1–15.
Balogh, K., Salaets, H., 2019. The Role of Non-Verbal Elements in Legal Interpreting: A Study of a
Cross-Border Interpreter-Mediated Videoconference Witness Hearing. In Tipton, R., Desilla, L.,
eds. The Routledge Handbook of Translation and Pragmatics. Routledge, London, 394–429.
Barea Muñoz, M., 2021. Psychological Aspects of Interpreting Violence: A Narrative from the
Israeli-Palestinian Conflict. In Todorova, M., Ruiz Rosendo, L., eds. Interpreting Conflict. Pal-
grave Macmillan, Cham, 159–176. URL https://doi.org/10.1007/978-3-030-66909-6_10
Bergunde, A., Pöllabauer, S., 2019. Curricular Design and Implementation of a Training Course for
Interpreters in an Asylum Context. Translation and Interpreting 11(1), 1–21. URL https://doi.
org/10.12807/ti.111201.2019.a01
BID (Bail for Immigration Detainees), 2008. Immigration Bail Hearings by Video Link: A Monitoring
Exercise by Bail for Immigration Detainees and the Refugee Council. URL http://refugeecouncil.
org.uk/policy/position/2008/bail_hearings (accessed 21.5.2018).
BID (Bail for Immigration Detainees), 2010. A Nice Judge on a Good Day: Immigration Bail and the
Right to Liberty. URL https://bailobs.org/wp-content/uploads/2023/03/a_nice_judge_on_a_good_
day.pdf (accessed 16.1.2025).
Blasco Mayor, M.J., 2020. Legal Translator and Interpreter Training in Languages of Lesser Diffusion
in Spain. In Ng, E.N., Crezee, I.H., eds. Interpreting in Legal and Healthcare Settings: Perspectives
on Research and Training. John Benjamins, Amsterdam, 133–163. URL https://doi.org/10.1075/
btl.151.06bla
Braun, S., 2013. Keep Your Distance? Remote Interpreting in Legal Proceedings: A Critical Assess-
ment of a Growing Practice. Interpreting 15(2), 200–228. https://doi.org/10.1075/intp.15.2.03bra
Braun, S., 2016. The European AVIDICUS Projects: Collaborating to Assess the Viability of
Video-Mediated Interpreting in Legal Proceedings. European Journal of Applied Linguistics 4(1),
173–180. URL https://doi.org/10.1515/eujal-2016-0002
Braun, S., 2018. Video-Mediated Interpreting in Legal Settings in England: Interpreters’ Perceptions
in Their Sociopolitical Context. Translation and Interpreting Studies 13, 393–420. URL https://
doi.org/10.1075/tis.00022.bra
Braun, S., 2019. Technology and Interpreting. In The Routledge Handbook of Translation and Tech-
nology. Centre for Translation Studies, University of Surrey, Taylor and Francis, 271–288. URL
https://doi.org/10.4324/9781315311258-19
Braun, S., 2020. “You Are Just a Disembodied Voice Really”: Perceptions of Video Remote Inter-
preting by Legal Interpreters and Police Officers. In Salaets, H., Brône, G., eds. Linking Up with
Video: Perspectives on Interpreting Practice and Research. John Benjamins, 47–78. URL https://
doi.org/10.1075/btl.149.03bra
Braun, S., Balogh, K., Hertog, E., Licoppe, A., Miler-Cassino, J., Rombouts, D., Rybińska, Z., Taylor,
J., van den Bosch, Y., Verdier, M., 2013. Assessment of Video-Mediated Interpreting in the Crimi-
nal Justice System: AVIDICUS 2 – Guide to Video-Mediated Interpreting in Bilingual Proceedings.
AVIDICUS 2, EU Criminal Justice Programme, Project JUST/2010/JPEN/AG/1558, 2011–2013. URL
https://wp.videoconference-interpreting.net/wp-content/uploads/2014/01/AVIDICUS2-Research-
report.pdf (accessed 16.1.2025).
Braun, S., Davitti, E., Dicerto, S., 2018. Video-Mediated Interpreting in Legal Settings: Assessing the
Implementation. In Napier, J., Skinner, R., Braun, S., eds. Here or There: Research on Interpreting
via Video-Link. Gallaudet University Press, Washington, DC, 144–179.
Butow, P.N., Aldridge, L., Eisenbruch, M., Girgis, A., Goldstein, D., Jefford, M., King, M., Lobb, E.,
Schofield, P., Sze, M., 2012. A Bridge Between Cultures: Interpreters’ Perspectives of Consulta-
tions with Migrant Oncology Patients. Supportive Care in Cancer 20, 235–244. URL https://doi.
org/10.1007/s00520-010-1046-z
Cassim, S., Kidd, J., Ali, M., Abdul, H.N., Jamil, D., Keenan, R., Begum, F., Lawrenson, R., 2022.
“Look, Wait, I’ll Translate”: Refugee Women’s Experiences with Interpreters in Healthcare in
Aotearoa New Zealand. Australian Journal of Primary Health 28, 296–302. URL https://doi.
org/10.1071/PY21256

Immigration, asylum, and refugee settings

CIOL, 2021. CIOL Guide to Working with Interpreters Remotely for Judges. Based on AVIDICUS
Recommendations. www.ciol.org.uk/ciol-guide-working-interpreters-remotely-judges (accessed
1.7.2023).
CIOL and ATC, 2021. Remote Interpreting Best Practice Checklists. URL www.ciol.org.uk/sites/
default/files/Interpreting%20Checklist-FNL.pdf (accessed 20.7.2023).
Clark, J., 2021. Evaluation of Remote Hearings During the COVID-19 Pandemic. Research Report.
H.M. Courts & Tribunals Service.
Corpas Pastor, G., Gaber, M., 2020. Remote Interpreting in Public Service Settings: Technology, Per-
ceptions and Practice. SKASE Journal of Translation and Interpretation [Preprint]. The Slovak
Association for the Study of English 13(2), 58–78.
Council of the European Union: General Secretariat of the Council, 2013. Guide on Videoconfer-
encing in Cross-Border Proceedings. European e-Justice, Publications Office. URL https://data.
europa.eu/doi/10.2860/76243
Davis, R., Barton, A., Debus-Sherill, S., Matelevich-Hoang, J.B., Niedzwiecki, E., 2015. Research on
Videoconferencing at Post-Arraignment Release Hearings: Phase I Final Report. U.S. Department
of Justice, Fairfax, VA.
Davitti, E., Braun, S., 2020. Analysing Interactional Phenomena in Video Remote Interpreting in Col-
laborative Settings: Implications for Interpreter Education. The Interpreter and Translator Trainer
14(3), 279–302. URL https://doi.org/10.1080/1750399X.2020.1800364
De Wilde, J., Guaus, A., 2022. 1st International Conference on the Right to Languages. In Linguistic
Policies and Translation and Interpreting in Public Services and Institutions. Facultat de Dret – Uni-
versitat de València, 63–64. URL https://blogs.uji.es/cidl/files/2022/07/BoA.pdf (accessed 15.7.2024).
Dubus, N., 2016. Interpreters’ Subjective Experiences of Interpreting for Refugees in Person and via
Telephone in Health and Behavioural Health Settings in the United States. Health & Social Care
in the Community 24(5), 649–656.
Eagly, I.V., 2015. Remote Adjudication in Immigration. Northwestern University Law Review 109,
933–1020.
EASO, 2020. Practical Recommendations on Conducting Remote/Online Registration. EASO Prac-
tical Guide Series. Publications Office of the European Union, Luxembourg. URL https://doi.
org/10.2847/964488
EASO, 2021. EASO Asylum Report 2021: Annual Report on the Situation of Asylum in the European
Union. European Asylum Support Office. URL https://bit.ly/3NjoEKi (accessed 16.1.2025).
Ellis, S.R., 2004. Videoconferencing in Refugee Hearings. Ellis Report to the Immigration and Refu-
gee Board Audit and Evaluation Committee. Immigration and Refugee Board of Canada.
European Migration Network, 2022. Ad-Hoc Query on 2022.63 Interpreting in Reception Facili-
ties. EMN. URL https://emnbelgium.be/sites/default/files/publications/Compilation%20AHQ%20
2022.63_interpreting_in_reception_facilities%20FINAL%20VERSION.pdf (accessed 16.1.2025).
Federal Ministry of Justice, 2022. Draft Law to Promote the Use of Video Conferencing Technology
in Civil Jurisdiction and Specialist Jurisdictions. www.bundestag.de/dokumente/textarchiv/2023/
kw38-de-videokonferenztechnik-965066 (accessed 22.7.2024).
Federman, M., 2006. On the Media Effects of Immigration and Refugee Board Hearings via Video-
conference. Journal of Refugee Studies 19(4), 433–452. URL https://doi.org/10.1093/refuge/fel018
Finnish Translators and Interpreters’ Association, 2013. Asioimistulkin eettiset ohjeet [Ethical
Guidelines for Community Interpreters]. URL www.youpret.com/fi/tulkkien-eettiset-ohjeet/ (accessed
16.1.2025).
Fowler, Y., 2013. Non-English-Speaking Defendants in the Magistrates Court: A Comparative Study
of Face-to-Face and Prison Video Link Interpreter-Mediated Hearings in England (PhD thesis).
Aston University.
Fowler, Y., 2018. Interpreted Prison Video Link: The Prisoner’s Eye View. In Napier, J., Skinner, R.,
Braun, S., eds. Here or There: Research on Interpreting via Video Link. Gallaudet University Press,
Washington, DC, 183–209.
Gany, F., González, C.J., Pelto, D.J., Schutzman, E.Z., 2017. Engaging the Community to Develop
Solutions for Languages of Lesser Diffusion. In Jacobs, E.A., Diamond, L.C., eds. Providing Health
Care in the Context of Language Barriers: International Perspectives. Multilingual Matters, Clev-
edon, 149–169. URL https://doi.org/10.21832/9781783097777-011

GDISC, 2008. GDISC Interpreters Pool – Final Evaluation Meeting, Sofia, 20–21 November 2008:
Summary Conclusions. GDISC Secretariat, The Hague.
Gilbert, A.S., Croy, S., Haralambous, B., Hwang, K., LoGiudice, D., 2022. Video Remote Inter-
preting for Home-Based Cognitive Assessments: Stakeholders’ Perspectives. Interpreting 24(1),
84–110. URL https://doi.org/10.1075/intp.00065.gil
Guaus, A., Braun, S., Buysse, L., Davitti, E., De Wilde, J., González Figueroa, L.A., Mazzanti, E., Mar-
yns, K., Pöllabauer, S., Singureanu, D., 2023. EU-WEBPSI Model: Harmonised Minimal Standards
for Webcam Public Service Interpreting. www.webpsi.eu/deliverables/wp3-minimum-standards/
(accessed 17.1.2025).
Hale, S., Goodman-Delahunty, J., Lim, J., Martschuk, N., 2022. Does Interpreter Location Make a
Difference? A Study of Remote vs Face-to-Face Interpreting in Simulated Police Interviews. Inter-
preting 24(2), 221–253. URL https://doi.org/10.1075/intp.00077.hal
Hale, S., Ozolins, U., 2014. Monolingual Short Courses for Language-Specific Accreditation: Can
They Work? A Sydney Experience. Interpreter and Translator Trainer 8(2), 217–239. URL https://
doi.org/10.1080/1750399X.2014.929371
HSE Social Inclusion Unit and the Health Promoting Hospitals Network, 2009. On Speaking Terms:
Good Practice Guidelines for HSE Staff in the Provision of Interpreting Services. URL www.hse.
ie/eng/services/publications/socialinclusion/emaspeaking.pdf (accessed 16.1.2025).
IMDi [The Directorate for Integration and Diversity], 2023a. E-læringskurs i skjermtolking [E-Learning
Course in Screen Interpretation]. IMDi. www.imdi.no/tolk/skjerm/ (accessed 16.7.2023).
IMDi [The Directorate for Integration and Diversity], 2023b. Roller og ansvar ved tolking over
skjerm [Roles and Responsibilities When Interpreting Over a Screen]. www.imdi.no/tolk/
roller-og-ansvar-ved-tolking-over-skjerm/ (accessed 16.7.2023).
IMDi [The Directorate for Integration and Diversity], 2024. Gjennomføring av tolkede samtaler
og møter [Implementation of Interpreted Conversations and Meetings]. www.imdi.no/tolk/
hvordan-lage-retningslinjer-for-bruk-av-tolk-i-dinvirksomhet/#title_11 (accessed 16.7.2023).
Inghilleri, M., Maryns, K., 2019. Asylum. In Baker, M., ed. Routledge Encyclopaedia of Translation
Studies. Routledge, 22–27.
Ji, X., Abdelhamid, K., Bergeron, A., Chow, E., Lebouché, B., Mate, K.K.V., Naumova, D., 2021.
Utility of Mobile Technology in Medical Interpretation: A Literature Review of Current Prac-
tices. Patient Education and Counseling 104(9), 2137–2145. URL https://doi.org/10.1016/j.
pec.2021.02.019
Jiménez-Crespo, M.A., 2017. Translation Crowdsourcing: Research Trends and Perspectives.
In Cordingley, A., Frigau Manning, C., eds. Collaborative Translation: From the Renais-
sance to the Digital Age. Bloomsbury Academic, London, 192–211. URL http://dx.doi.
org/10.5040/9781350006034.0016
Jiménez-Ivars, A., León-Pinilla, R., 2018. Interpreting in Refugee Contexts: A Descriptive and Quali-
tative Study. Language & Communication 60, 28–43.
Klammer, M., Pöchhacker, F., 2021. Video Remote Interpreting in Clinical Communication: A
Multimodal Analysis. Patient Education and Counseling 104, 2867–2876. URL https://doi.
org/10.1016/j.pec.2021.08.024
Kleinert, C.V., Núñez-Borja, C., Stallaert, C., 2019. Buscando espacios para la formación de
intérpretes para la justicia en lenguas indígenas en América Latina [Seeking Spaces for the Training
of Interpreters for Justice in Indigenous Languages in Latin America]. Mutatis Mutandis 12(1),
78–99. URL https://doi.org/10.17533/udea.mut.v12n1a03
Koller, M., Pöchhacker, F., 2018. The Work and Skills: A Profile of First-Generation Video Remote
Interpreters. In Napier, J., Skinner, R., Braun, S., eds. Here or There: Research on Interpreting via
Video Link. Gallaudet University Press, 89–110.
Korak, C.A., 2012. Remote Interpreting via Skype – a Viable Alternative to In Situ Interpreting?
Interpreters Newsletter 17, 83–102. URL http://hdl.handle.net/10077/8614 (accessed 15.1.2025).
Kunin, M., Ali, R., Yugusuk, C., Davis, A., McBride, J., 2022. Providing Care by Telephone to Refu-
gees and Asylum Seekers: An Evaluation of Telephone Mode-of-Care in Monash Health Refugee
Health and Wellbeing Clinic in Victoria, Australia. Health Services Insights 15, 1–10. URL https://
doi.org/10.1177/11786329221134349
The Legal Assistance Foundation of Metropolitan Chicago and The Chicago Appleseed Fund for Jus-
tice, 2005. Videoconferencing in Removal Proceedings: A Case Study of the Chicago Immigration
Court. URL http://chicagoappleseed.org/wp-content/uploads/2012/08/videoconfreport_080205.pdf
(accessed 16.1.2025).
Licoppe, C., Veyrier, C.A., 2017. How to Show the Interpreter on Screen? The Normative Organiza-
tion of Visual Ecologies in Multilingual Courtrooms with Video Links. Journal of Pragmatics 107,
147–164. URL https://doi.org/10.1016/j.pragma.2016.09.012
Licoppe, C., Veyrier, C.A., 2020. The Interpreter as a Sequential Coordinator in Courtroom Inter-
action: “Chunking” and the Management of Turn Shifts in Extended Answers in Consecutively
Interpreted Asylum Hearings with Remote Participants. Interpreting 22(1), 56–86. URL https://
doi.org/10.1075/intp.00034.lic
Lindström, N.B., Pozo, R.R., 2020. Perspectives of Nurses and Doulas on the Use of Information and
Communication Technology in Intercultural Pediatric Care: Qualitative Pilot Study. JMIR Pediat-
rics and Parenting 3(1), e16545. URL https://doi.org/10.2196/16545
Martínez, G.A., Hardin, K.J., Dejbord-Sawan, P., Magaña, D., Showstack, R.E., 2021. Pursuing Tes-
timonial Justice: Language Access Through Patient-Centered Outcomes Research with Spanish
Speakers. Applied Linguistics 42(6), 1110–1124. URL https://doi.org/10.1093/applin/amab060
Maryns, K., 2015. Asylum Settings. In Pöchhacker, F., Grbic, N., Mead, P., Setton, R., eds. Routledge
Encyclopedia of Interpreting Studies. Routledge, London, 22–26.
Mishori, R., Hampton, K., Habbach, H., Raker, E., Niyogi, A., Murphey, D., 2021. “Better Than
Having No Evaluation Done”: A Pilot Project to Conduct Remote Asylum Evaluations for Clients
in a Migrant Encampment in Mexico. BMC Health Services Research 21(1), 508.
Nuč, A., Pöllabauer, S., 2021. In the Limelight? Interpreters’ Visibility in Transborder Interpreting.
ELOPE: English Language Overseas Perspectives and Enquiries 18(1), 37–54. URL https://doi.
org/10.4312/elope.18.1.37-54
O’Brien, S., 2011. Harnessing Collective Intelligence for Translation: An Assessment of Crowdsourc-
ing as a Means of Bridging the Canadian Linguistic Digital Divide. University of Ottawa, Canada.
URL https://ruor.uottawa.ca/server/api/core/bitstreams/255bd8bc-ef09-4b90-bcb8-eab1c7cb9c57/
content (accessed 16.1.2025).
Office of the Commissioner General for Refugees and Stateless Persons, 2022. Code of Conduct for
Translators and Interpreters. URL www.cgrs.be/sites/default/files/brochures/2011-02-11_brochure_
deontology-for-translations_eng_0.pdf (accessed 16.1.2025).
OROLSI-JCS and UNITAR, 2020. Remote Hearing Toolkit. URL https://peacekeeping.un.org/sites/
default/files/unitar-orolsi_remote_hearing_toolkit._2020.pdf (accessed 16.7.2024).
Patel, P., Bernays, S., Dolan, H., Muscat, D.M., Trevena, L., 2021. Communication Experiences in
Primary Healthcare with Refugees and Asylum Seekers: A Literature Review and Narrative Syn-
thesis. International Journal of Environmental Research and Public Health 18(4), 1469. URL
https://doi.org/10.3390/ijerph18041469
Pavill, B., 2019. Practice Makes Perfect: Integrating Technology to Teach Language Barrier Solu-
tions. Nursing Education Perspectives 40(3), 189–191. URL https://doi.org/10.1097/01.
NEP.0000000000000324
Phillips, C., 2013. Remote Telephone Interpretation in Medical Consultations with Refugees:
Meta-Communications About Care, Survival and Selfhood. Journal of Refugee Studies 26(4),
505–523. URL https://doi.org/10.1093/jrs/fet005
Pogue, M., Raker, E., Hampton, K., Saint Laurent, M.L., Mishori, R., 2021. Conducting Remote
Medical Asylum Evaluations in the United States During COVID-19: Clinicians’ Perspectives on
Acceptability, Challenges and Opportunities. Journal of Forensic and Legal Medicine 84, 102255.
URL https://doi.org/10.1016/j.jflm.2021.102255
Pöllabauer, S., 2022. Interpreting in an Asylum Context: Interpreter Training as the Linchpin for
Improving Procedural Quality. In Ruiz Rosendo, L., Todorova, M., eds. Interpreter Train-
ing in Conflict and Post-Conflict Scenarios. Routledge, London, 129–145. URL https://doi.
org/10.4324/9781003230359-13
Pöllabauer, S., 2023. Research on Interpreter-Mediated Asylum Interviews. In Gavioli, L., Wadensjö,
C., eds. The Routledge Handbook of Public Service Interpreting. Routledge, Taylor & Francis,
London and New York, 140–154.
Price, E.L., Karliner, L.S., López, M., Nickleach, D., Pérez-Stable, E.J., 2012. Interpreter Perspectives
of In-Person, Telephonic, and Videoconferencing Medical Interpretation in Clinical Encounters.
Patient Education and Counseling 87(2), 226–232. URL https://doi.org/10.1016/j.pec.2011.08.006

Ramos, R., Antolino, P., Davis, J.L., Grant, C.G., Green, B.L., Sanz, M., 2014. Language and
Communication Services: A Cancer Centre Perspective. Diversity and Equality in Health and Care 11(1),
71–80. URL www.primescholars.com/articles/language-and-communication-services-a-cancer-
centre-perspective-94515.html (accessed 16.7.2024).
Rowe, A., Twose, A., Makower, C., Mitchell, E., Benton, G., Munro Kerr, M., Singh, N., Meyer, N.,
Rath, N., Krishna, N., Thompson, T., 2019. Systematic Failure: Immigration Bail Hearings. Bail
Observation Project. URL https://bailobs.org/wp-content/uploads/2019/10/systematic-failure-1.
pdf (accessed 16.7.2024).
Shaw, S., 2016. Review into the Welfare in Detention of Vulnerable Persons: A Report to the Home
Office by Stephen Shaw. URL https://assets.publishing.service.gov.uk/government/uploads/system/
uploads/attachment_data/file/490782/52532_Shaw_Review_Accessible.pdf
Shaw, S., 2018. Assessment of Government Progress in Implementing the Report on the Welfare in
Detention of Vulnerable Persons: A Follow-Up Report to the Home Office by Stephen Shaw. URL
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/
file/728376/Shaw_report_2018_Final_web_accessible.pdf
Singureanu, D., Braun, S., Buysse, L., Davitti, E., De Wilde, J., González Figueroa, L.A., Guaus, A.,
Maryns, K., Mazzanti, E., Pöllabauer, S., 2023a. EU-WEBPSI: Baseline Study and Needs Analysis
for Public Service Interpreting, Video-Mediated Interpreting, and Languages of Lesser Diffusion
Interpreting. www.webpsi.eu/wp-content/uploads/2023/03/Comprehensive-research-report.pdf
(accessed 16.7.2024).
Singureanu, D., Gough, J., Hieke, G., Braun, S., 2023b. “I am His Extension in the Courtroom”:
How Court Interpreters Cope with the Demands of Video-Mediated Interpreting in Hearings with
Remote Defendants. In Corpas Pastor, G., Defrancq, B., eds. Interpreting Technologies – Cur-
rent and Future Trends. John Benjamins, Amsterdam, 72–108. URL https://doi.org/10.1075/
ivitra.37.04sin
Skaaden, H., 2018. Remote Interpreting: Potential Solutions to Communication Needs in the Refugee
Crisis and Beyond. European Legacy 23(7–8), 837–856. URL https://doi.org/10.1080/10848770.
2018.1499474
Sultanić, I., 2020. Medical Interpreter Education and Training. In Ji, M., Laviosa, S., eds. The
Oxford Handbook of Translation and Social Practices. Oxford University Press, Oxford, 356–377.
URL https://doi.org/10.1093/oxfordhb/9780190067205.013.23
Tam, I., Fisher, E., Huang, M.Z., Patel, A., Rhee, K.E., 2020. Spanish Interpreter Services for the
Hospitalised Pediatric Patient: Provider and Interpreter Perceptions. Academic Pediatrics 20(2),
216–224. URL https://doi.org/10.1016/j.acap.2019.08.012
TEPIS [The Polish Society of Sworn and Specialised Translators], 2019. Kodeks zawodowy tłumacza
przysięgłego [The Professional Code of Sworn Translators]. URL https://tepis.org.pl/wp-content/
uploads/2020/02/Kodeks_zawodowy_t%C5%82umacza_przysi%C4%99g%C5%82ego_2019.
pdf (accessed 16.1.2025).
Ticca, A.C., Jouin, E., Traverso, V., 2023. Training Interpreters in Asylum Settings: The REMILAS
Project. In Gavioli, L., Wadensjö, C., eds. The Routledge Handbook of Public Service Interpreting.
Routledge, 362–382.
Tipton, R., Furmanek, O., 2016. Dialogue Interpreting: A Guide to Interpreting in Public Services
and the Community. Routledge, London. URL https://doi.org/10.4324/9781315644578
Translators without Borders, 2020. TWB and Kobo Inc Develop Speech Recognition Technology to
Capture Voices of Speakers of Marginalized Languages. URL https://translatorswithoutborders.org/
twb-and-kobo-inc-develop-speech-recognition-technology-to-capture-voices-of-speakers-of-marginalized-
languages/ (accessed 15.10.2024).
UNHCR, 2020. Remote Interviewing: Practical Considerations for States in Europe During
COVID-19. UNHCR Operational Data Portal (ODP). URL https://data2.unhcr.org/en/docu-
ments/details/77134 (accessed 4.10.2024).
van Rotterdam, P., van den Hoogen, R., 2012. True-to-Life Requirements for Using Videoconferenc-
ing in Legal Proceedings. In Braun, S., Taylor, J., eds. Videoconference and Remote Interpreting in
Criminal Proceedings. Intersentia, Antwerp, 187–198.
Vastaanottava Pohjois-Savo [Welcoming Northern Savo], 2011. VÄLTÄ VÄÄRINKÄSITYKSIÄ –
KÄYTÄ TULKKIA [Avoid Misunderstandings – Use an Interpreter]. URL https://docplayer.fi/
1815584-Valta-vaarinkasityksia-kaytatulkkia.html (accessed 16.1.2025).

PART V

Current issues and debates


17
QUALITY-RELATED ASPECTS
Elena Davitti, Tomasz Korybski, Constantin Orăsan
and Sabine Braun

DOI: 10.4324/9781003053248-23

17.1 Introduction
Quality in interpreting is a fundamental yet elusive and relative construct which has tra-
ditionally been shaped by various perspectives and has been difficult to define or assess
through a unified approach. As a multidimensional concept, quality encompasses elements
such as textual accuracy, source–target correspondence, communicative effect, and the
interpreter’s role performance (Pöchhacker, 2001) and thus requires diverse evaluation
methods. While quality has always been a core concern of the interpreting profession, it
only became a focus of research in the 1980s. The question of quality evaluation has long
been debated, with different approaches offered. Traditionally, conference interpreting has
centred on product-oriented analyses, whereas community interpreting has emphasised
competence-oriented evaluations in its quest for establishing professional standards. Still,
more than 20 years into the 21st century, the elusiveness of the concept of quality and its
dependence on a plethora of contextual factors remain major research challenges.
The significant and rapid technological advancements made in recent decades have
complicated quality definition and assessment further by introducing both technological
variables and a new set of unknowns which relate to the efficiency of human–machine
collaboration in the delivery of interpreting assignments. The interaction between tech-
nology and interpreting has spawned various modalities of distance interpreting. These
include video-mediated interpreting (VMI; see Braun, this volume) and telephone interpret-
ing (TI; see Lázaro Gutiérrez, this volume), which in themselves comprise a wide variety of
sub-forms, depending on the social and technological contexts of application. For example,
both VMI and TI can be used for simultaneous or consecutive interpreting, or a mixture of the two, with
varied data throughputs and varied service delivery platforms (see, for example, Chmiel
and Spinolo, and Warnicke and Davitti, this volume). Each of these elements can affect
interpreter performance and, consequently, (aspects of) service quality.
Furthermore, the integration of assistive, AI-driven tools, such as automatic speech
recognition (ASR), adds another layer of complexity. These technologies have not only
expanded the possibilities of interpreting in different contexts but also enabled new
assessment methods, which range from semi-automated to fully automated metrics, and raise
concerns about the reliability of such approaches. This chapter explores the multiple angles
from which quality evaluation in interpreting has been approached. It begins by revisiting
foundational concepts of quality in interpreting (Section 17.2) and examines key challenges
posed by the intersection of technology and interpreting (Section 17.3). Section 17.4 pro-
vides a comprehensive review of methods and perspectives used to assess quality in these
technologised workflows. Future developments are discussed in Section 17.5, followed by
conclusions in Section 17.6.

17.2 Quality in interpreting: some basic concepts


In her seminal work on interpreting quality, Grbić (2008) emphasised the need for further
clarification of this concept. García Becerra and Collados Aís (2019, 455) echoed this, not-
ing that ‘there is as yet no consensus on what [interpreting quality] is, what criteria should
be used to define it and how these criteria might be objectively evaluated’. This challenge is
further illustrated in the dichotomy between Kurz (2001), who argued that quality should
align with audience needs and perceptions, and Shlesinger (1997), who characterised qual-
ity as ‘elusive’ and questioned whether users of interpreting services are the best judges of
quality. To support this, Shlesinger used the example of fluent delivery, observing that it
can create a false impression of high-quality interpreting, a point corroborated by Collados
Aís (1998).
However, despite diverging views, over time, a partial consensus has emerged, and
research on interpreting quality has evolved significantly. This evolution has contributed
to shifts in focus and methodology. These, in turn, have led to a more sophisticated under-
standing of how quality is conceptualised, measured, and influenced by both cognitive and
contextual factors in the interpreting process. The introduction of technological tools adds
new dimensions and complexities to the interpreting process. It is therefore essential to
adjust traditional methods to account for additional variables. These complexities require a
more nuanced approach to ensure that both traditional and technology-driven interpreting
workflows maintain high-quality standards. A brief review of key strands in the approach
to quality evaluation and assessment in interpreting is thus necessary before delving into
how these strands have been adapted to allow for technological integration.
Quality evaluation in interpreting can be traced back to the 1950s, when
practitioner-driven insights, such as Herbert’s (1952) conceptualisation of quality, were
based on personal experience, particularly in consecutive interpreting. In the 1960s–1970s,
cognitive research
began to explore the impact of different variables, such as information density (Treisman,
1965) and speech speed (Gerver, 1969/2002, 1974), on the interpreter’s output in simul-
taneous interpreting. Barik (1971) examined interpreter output through error analysis. However,
his approach was criticised for using neither authentic source content nor professional
interpreters.
In the 1980s, attention shifted to focus on understanding users’ expectations of interpret-
ing quality. Bühler’s (1986) pivotal study broke interpreting quality down into key compo-
nents, such as sense consistency, fluency, terminology, accent, and delivery, and asked AIIC
interpreters to rate the importance of each component. Two further surveys of conference
interpreters highlighted variations in quality assessment based on meeting type and role
perceptions (Chiaro and Nocella, 2004; Pöchhacker and Zwischenberger, 2010). Further
studies investigating user expectations of quality have focused on different user groups
and settings. Kurz (1993), Kopczyński (1994), and Moser (1996) were among the first to

306
Quality-related aspects

explore the expectations of a range of users of interpreting services with regards to inter-
preting quality in the context of simultaneous interpreting. They built on and expanded
Bühler’s criteria, incorporating additional factors, such as interpreter’s voice, accent, and
intonation; setting; and subject type. These studies have highlighted that users generally
prioritise content over form when evaluating interpreting quality. However, they also
show that user expectations can vary significantly with the users' background. While
many users emphasised the importance of accuracy, logical cohesion, and completeness
of information, users from diplomatic backgrounds, for example, placed greater emphasis
on fluency and consistency, whereas those from technical fields prioritised accuracy
and terminology. Furthermore, Moser (1996) observed that more experienced conference
participants tended to have higher expectations of interpreting quality than less-frequent
service users. Nevertheless, interpreters were found to have higher expectations than users,
placing greater importance on consistency, completeness, and logical cohesion as key fac-
tors in quality (Bühler, 1986; Chiaro and Nocella, 2004). User expectations have also been
explored in relation to different settings, including healthcare (Mesa, 1997) and court inter-
preting (Kadric, 2001). Here, findings revealed that different professional backgrounds,
gender, age, and experience shaped quality priorities.
Parallel to user-centred studies, research into interpreter output has also evolved further.
Research in this area has focused on comparing interpreted versions (interpreters’ out-
put) with source speeches to assess and measure semantic equivalence and communicative
effectiveness, identify errors, or more broadly, ‘interpreting problems’. At the core of this
research lie evaluations by human raters, with approaches that can be broadly divided into
two categories. On the one hand, 'bottom-up' approaches assess quality through error/
item-based classifications and weighted scores according to the severity of an error or
problem. These approaches are increasingly used to assess the quality of interpreting-related
workflows (see Section 17.4.2). On the other hand, ‘top-down’ approaches assess the inter-
preter’s output against an explicit rubric/set of criteria covering a range of dimensions of
interpreting quality. These approaches use (weighted) scales to rate the interpreter’s perfor-
mance level for each criterion and have been used widely for both consecutive and simul-
taneous interpreting, often in the context of interpreter training (see Section 17.4.1). There
is ongoing debate regarding the effectiveness of different assessment types in interpreting,
particularly for pedagogical and research purposes (e.g. Dawrant and Han, 2021; Han and
Lu, 2021b; Liu, 2013; Tiselius, 2009). Error/item-based assessments provide a fine-grained,
nuanced analysis of issues in an interpreter’s output but can be more labour-intensive.
Criterion-referenced approaches tend to be less time-consuming and offer holistic evalu-
ations; however, subjectivity remains a concern. To address this, detailed descriptors tend
to be developed to guide assessors, and multiple evaluators are often used. Assessor training
and calibration are essential in both error/item-based and criterion-referenced evaluation
methods, as they help mitigate subjectivity and ensure consistency in quality assessments.
These methods also form the foundation for the
development of machine-based evaluation approaches, reviewed in Section 17.4.3. This
is because reliable human assessments make it possible to explore correlations with auto-
mated metrics, thereby enhancing the robustness of interpreting quality evaluations.
Despite variations, models of output quality have focused primarily on linguistic aspects
of interpreting performance. In an attempt to move beyond output quality and broaden the
scope of the concept of quality, Shlesinger (1997) contended that the quality of the inter-
preter's output should be assessed at three levels: intertextual level (comparison between
source speech and the interpreter's output), intratextual level (interpreter's output in its
own right), and instrumental level (usefulness and comprehensibility of the interpreter’s
output for a given audience and purpose). Together with findings from cognitive research,
which identify speech rate, interpreter preparation, and cognitive (over)load as factors that
influence interpreting quality (Gile, 1988, 1995/2009), Shlesinger’s model underpins the
case for developing more comprehensive models of quality that reach beyond the focus on
the interpreter’s output.
Examples of comprehensive models of quality assessment and assurance include those
developed by Pöchhacker (2001) and Kalina (2002, 2005). Pöchhacker (2001) argues for
quality to be broken down into different levels of assessment but states that the overarching
criterion should be successful communicative interaction.

Finally, the focus of quality assessment may be neither on the source text nor on lis-
teners’ comprehension or speakers’ intentions but on the process of communicative
interaction as such. From this perspective, which foregrounds the ‘(inter)activity’ of
interpreting rather than its nature as a ‘text-processing task’, quality essentially means
‘successful communication’ . . . among the interacting parties in a particular context
of interaction, as judged from the various (subjective) perspectives in and on the com-
municative event . . . and/or as analyzed more intersubjectively from the position of
an observer.
(Pöchhacker, 2001, 413)

Kalina (2002, 2005) emphasised the need for a model that considers the communication
situation, actors’ intentions and knowledge, and conditions impacting the event (before,
during, after). She also recognises that ‘total objectivity’ is not possible; ‘[w]hat is needed
is a model encompassing the communication situation, the intentions and knowledge
bases of different actors (including the interpreter), and any conditions liable to affect
the interpreted event’ (Kalina, 2002, 133). Ultimately, the goal is to achieve a complete,
accurate interpretation while being mindful of extralinguistic factors and situational con-
straints. By emphasising the importance of the communicative situation in which inter-
preting is embedded, Kalina shifts the focus towards the relationship between the output
quality and the interpreting process. She also highlights the role clients play in achiev-
ing quality, for instance, by providing preparation material, speaking clearly, and with
appropriate pace.
While both the concept of interpreting quality and the criteria used to assess it remain
the subject of ongoing debate, there is consensus that interpreting quality is a complex, mul-
tifaceted concept, shaped by various parameters. It is also widely recognised that (interpret-
ing) quality is a social construct (Grbić, 2008), influenced by individual perspectives and
the situational context in which interpreting occurs. Different actors – such as interpreters,
clients and users of interpreting services, trainers, and examiners – play a role in defining
quality, as do the contextual factors of an interpreting assignment. Ozolins and Hale (2009)
highlight this by describing interpreting quality as a shared responsibility. Importantly, the
situational context for interpreting has been influenced or altered by the introduction of
technology. It therefore appears logical that quality assessment frameworks and models
should account for the unique challenges and opportunities that the application of technol-
ogy in interpreting presents. Nevertheless, open questions remain about how the use of
technology in interpreting should influence/shape evaluation methods and criteria.

308
Quality-related aspects

For example, distance interpreting (further discussed later) has introduced challenges,
which include limited access to non-verbal cues, potential disruptions due to network prob-
lems, increased cognitive load, and fatigue. These challenges have been shown to affect the
interpreter’s performance. Nevertheless, there is currently limited understanding of how
these factors influence the quality perceptions of users of interpreting services. Similarly,
the use of language technologies, such as automatic speech recognition, during the inter-
preting task may enhance an interpreter’s output quality. However, this also increases the
interpreter’s need to multitask. The combined impact of these factors on both interpreting
quality and how it should be evaluated raises new questions for both research and practice.
The next section will review current challenges at the intersection of technology and inter-
preting, with a view to shaping the questions for quality assessment in technology-based
interpreting workflows. Section 17.4 will then explore the various approaches to quality
assessment that have already been applied to interpreting and technology research.

17.3 Current challenges: the intersection of technology and interpreting


Although technological advances, including technologies for distance interpreting and live
speech-to-text solutions, had emerged before the COVID-19 pandemic, the change caused
by the global shift to remote work has brought to the forefront the question of how to
assess interpreting quality within different modalities and different levels of technologisa-
tion – each presenting distinct challenges for quality assessment. Furthermore, in the cur-
rent technological context, the interpreter’s output is potentially dependent on more than
just the interpreter's performance. To illustrate, the quality of the product is affected by
the set-up, the data throughput, the interpreter's technological savviness (or lack thereof),
and the extent to which CAI tools (see Prandi, this volume) are (skilfully) applied by the
interpreter.
For example, from a quality assessment perspective, RSI platforms and generic video-
conferencing platforms provide interpreters with both auditory and visual inputs. However,
the way each platform provides these inputs differs from the conditions offered by regular
(on-site) interpretation (see Chmiel and Spinolo, this volume). Therefore, assessing quality
in a heavily technologised and/or remote set-up requires addressing certain issues, such as
the variability of the set-ups themselves, internet bandwidth, and audio and video clarity,
all of which can impact the interpreter's ability to access necessary visual cues and thus
affect the quality of the interpretation (Rennert, 2008; Seeber, 2017, this volume).
Heavily technologised set-ups also present unique challenges related to cognitive load.
The need to process both visual and auditory information simultaneously can increase cog-
nitive demand on interpreters, potentially leading to fatigue and a decline in performance
quality over time (Napier et al., 2018). With the advent of powerful ASR technology, the
question of increased cognitive demand resulting from visual information (text display)
has become more pressing, encouraging recent research projects to investigate this area
(e.g. Fantinuoli, 2017; Frittella, 2019; Rodríguez González, 2023). Consequently,
standard criteria for evaluating interpreting quality in RSI/VMI/highly technologised set-
tings may require metrics specific to these factors to ensure comprehensive assessment.
As previously mentioned, speech-to-text technologies represent a specific challenge for
interpreting quality and assessment. On the one hand, highly precise, low-latency ASR
technology can present information that is useful for the interpreter or even fill gaps in the
interpreter’s comprehension. On the other hand, the visual textual channel which delivers
this information competes cognitively with the 'regular' auditory and non-textual visual
channels. This creates a high processing load for the professional. A current research
challenge is to understand how interpreters balance the benefits and drawbacks of ASR,
and to investigate the extent to which the reliability of the product is affected by factors
such as the duration of training with the technology or the interpreter's associated
skills – for example, sight translation.
The consideration of speech-to-text technologies in interpreting also inevitably leads to
questions concerning the hybrid practices in live communication (these include respeaking
and related workflows; see Davitti, this volume). Rather than receiving live audio interpre-
tation, (some) audiences in certain settings may opt to receive live text instead (also referred
to as ‘live captions’ or ‘live subtitles’). There are several ways to produce such text. Davitti
(this volume) places these on a continuum, from fully automated to largely human. Fur-
thermore, recent research has looked at a pertinent related question: how do we compare
(aspects of) quality across audio and text, as diverse users tend to take advantage of both?
For example, the MATRIC project1 (Korybski et al., 2022) investigated live subtitles pro-
duced in a hybrid (semi-automated) workflow with human respeakers and compared their
accuracy to that offered by top-level human interpreters. Despite some caveats concerning
the scale of the experiment and the genres and languages investigated, the study found that
the semi-automated workflow, combining intralingual respeaking and machine translation,
was capable of generating outputs that were similar in terms of accuracy and completeness
to those produced by human interpreters. In the emerging practice of interlingual respeak-
ing investigated by the SMART project2 (e.g. Davitti and Wallinheimo, 2025; Korybski
and Davitti, 2024, Davitti, this volume), the human is responsible for rendering live audio
directly into the target language through speech recognition, thus producing live subtitles
in a process that adds challenges to ‘regular’ interpreting. Here, emerging technologies like
speech recognition and machine translation have not only established new practices but
also raised the need to reshape the criteria for evaluating quality in such complex scenarios.
The emerging question, therefore, is how to measure aspects of quality in ways that
transcend the audio–text input divide.
Aside from the research initiatives mentioned, there is a research gap in this respect,
especially with regard to the reception of content via live audio and live text. This
further complicates the challenge of carrying out comprehensive
quality assessment in a highly technologised interpreting (or hybrid) set-up and adds an
array of human end user variables. Therefore, one may claim that studying quality meas-
urement in interpreting requires prior acceptance of its interdisciplinary nature. Research
may cross boundaries with (neuro)cognitive studies, HCI studies, psychology, anthropol-
ogy, and culture studies, for example. However, earlier research in interpreting studies also
provides a solid foundation on which to build, as the next section will present.

17.4 Quality assessment taxonomies and methods


The field of quality assessment in interpreting, particularly in relation to technology, is
fluid and rapidly evolving. As new technologies, such as distance and AI-related ones, are
introduced and applied within interpreting, they prompt a reconsideration of the appro-
priateness of existing assessment methods. As Section 17.2 highlights, quality assessment
methods have historically employed various taxonomies, including traditional top-down
approaches, which typically involve structured systems and frameworks to assess perfor-
mance through a set of pre-defined metrics. Over time, however, there has been a noticeable
shift towards bottom-up approaches. These incorporate elements such as category weighting
for errors to allow for a more nuanced understanding of performance. These approaches
have marked a move towards flexible and context-sensitive assessment techniques. Further-
more, this progression sets the stage for the development of (semi-)automated assessment
tools. These could offer efficiency and precision when evaluating interpreting performance
across various contexts and thus enable comparative analyses, which are currently less
common, due to the time-consuming nature of quality assessment. This section will provide
an overview of existing assessment methods, grouping them into ‘top-down’, ‘bottom-up’,
and automated approaches. Discussion will focus on the evolution, strengths, and limita-
tions of these methods, in addition to the role played by emerging technologies in shaping
and refining these methods within the field of interpreting.

17.4.1 Top-down methods


‘Top-down’ or ‘criterion-referenced’ methods for assessing interpreting performance take
a holistic approach to quality assessment. These rate interpreting performances based on
predefined criteria, which constitute different dimensions of quality. Unlike ‘bottom-up’
methods, top-down methods do not require specific sections of the interpreter’s output to
be annotated. Assessor calibration is therefore crucial to ensure consistency in evaluation.
These methods are often applied in student assessments (e.g. Hartley et al., 2003; Lee,
2008; Liu, 2013; Riccardi, 1998), where the focus is on broader performance rather than
detailed deviations from the source text. However, these methods have also been used in
research that focuses on interpreting quality (e.g. Choi, 2013; Hale et al., 2018).
In relation to assessing the impact of technology on interpreting quality, a
criterion-referenced assessment method was used in a major study which compared remote
simultaneous
interpreting (RSI) with on-site simultaneous interpreting to evaluate the feasibility of RSI.
The study, conducted at the European Parliament in 2004 and reported by Roziner and
Shlesinger (2010), collected performance data from 36 interpreters over five working days
and covered both on-site and distance interpreting sessions. Four excerpts per day were
selected for analysis, with source speeches in English, French, German, Italian, and Span-
ish interpreted into 12 target languages. Independent judges – experienced interpreters in
the relevant language pairs – rated the excerpts. The judges had access to recordings and
transcripts but were unaware of the modality they were assessing. The judges rated per-
formance against six quality criteria, including ‘error rate’ and ‘word choice and phras-
ing’, as well as providing a general evaluation of the performance. The seven evaluations
were then averaged into a single score. Roziner and Shlesinger highlight several practical
problems arising from this evaluation method. Due to the large number of excerpts (over
1,000) and language pairs involved, many judges were needed (n = 45). This potentially
affected consistency. In addition, considerable variations among the judges occurred due to
differing cultural backgrounds, language combinations, age, professional experience, etc.
To mitigate the potential lack of consistency, the judges were given detailed guidelines
and examples for each criterion. In addition, over 85% of the excerpts were evaluated by
two independent judges, although agreement between both evaluators was not particularly
high. Despite these challenges, Roziner and Shlesinger concluded that these factors did not
substantially affect the findings, which showed no significant difference between the slightly
lower mean score for RSI quality and that of on-site interpreting. Additional analysis of
the combined effect of several variables (interpreters, judges, languages, etc.) found a small
yet statistically significant difference of 0.09 units on the 5-point rating scale. This result
slightly favoured on-site interpreting over RSI.
In legal settings, Hale et al. (2022) used a similar approach to compare interpreting
quality in on-site, telephone, and video-mediated interpreting during a police interview.
The criteria developed by Hale et al. (2018) to evaluate interpreting performance in police
interviews included ‘accuracy of propositional content’, ‘accuracy of manner and style’,
‘maintenance of verbal rapport markers’, ‘use of correct interpreting protocols’, ‘accuracy
of legal discourse and terminology’, ‘management and coordination skills’, and ‘bilin-
gual competence’. Each criterion was assigned a detailed descriptor, and the assessors
were asked to provide a score between 1 and 10 to each criterion. The scores were then
weighted. For example, the criterion ‘accuracy of propositional content’ had a weight of
30%, while other criteria were weighted between 10% and 15%. Two independent asses-
sors rated each interpreting performance based on a transcript, without knowing the test
conditions, and the mean score from both assessors was used in the quantitative analysis.
Hale and her colleagues found that the interpreters in the study performed better in on-site
and video-mediated interpreting than in interpreting via audio link.
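The weighted, criterion-referenced scoring described above can be sketched in a few lines of code. Note that only the 30% weighting for 'accuracy of propositional content' is stated in the study; the remaining weights below are hypothetical values chosen within the reported 10–15% range so that all weights sum to 100%, and the criterion keys are shorthand labels, not the study's own identifiers.

```python
# Sketch of a weighted criterion-referenced score in the style of Hale et al. (2018).
# Only the 0.30 weight for propositional content is stated in the text; the other
# weights are hypothetical values in the reported 0.10-0.15 range, summing to 1.0.
WEIGHTS = {
    "propositional_content": 0.30,        # stated in the study
    "manner_and_style": 0.15,             # hypothetical
    "verbal_rapport_markers": 0.10,       # hypothetical
    "interpreting_protocols": 0.10,       # hypothetical
    "legal_discourse_terminology": 0.15,  # hypothetical
    "management_coordination": 0.10,      # hypothetical
    "bilingual_competence": 0.10,         # hypothetical
}

def weighted_score(ratings: dict[str, float]) -> float:
    """Combine per-criterion ratings (1-10) into one weighted score."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must sum to 100%
    return sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS)

def final_score(assessor_a: dict[str, float], assessor_b: dict[str, float]) -> float:
    """Mean of two independent assessors' weighted scores, as in the study design."""
    return (weighted_score(assessor_a) + weighted_score(assessor_b)) / 2
```

In this sketch, a performance rated 8 on every criterion by one assessor and 6 by the other would receive a final score of 7, reflecting both the weighting scheme and the averaging of the two independent assessments.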
The two studies selected here demonstrate both the potential and limitations of using
top-down assessment methods to compare interpreting quality across different conditions,
which is a common aim in the study of technology-enabled interpreting. While these meth-
ods provide a structured framework and the assessment criteria can be tailored to specific
research questions and interpreting settings, these methods also impose limitations on the
size of the data corpus and number of language pairs that can be assessed without consist-
ency being compromised. The variability in human judgment, even with clear guidelines,
and the lack of data annotation (which makes it difficult to trace how judges arrived at their
scores) suggest that this approach is better suited to smaller datasets, where fewer judges
are needed. Therefore, one of the overall challenges of this method is scalability.

17.4.2 Bottom-up methods


Initially, ‘bottom-up’ methods were designed to identify and evaluate deviations from an
original message in conference interpreting performance, particularly in the simultaneous
mode. These methods primarily addressed the dimension of quality known as ‘accuracy’,
intended to mean ‘the interpreter’s ability to deliver a faithful correspondence between the
source and target texts’.
Early approaches, such as Gerver’s (1969/2002, 1974) and Barik’s (1969, 1975) cod-
ing schemes, concentrated on identifying deviations at both word level and across phrases
and larger segments. This was in recognition of the fact that interpreters do not necessarily
convey every word in a source speech, and that a word-for-word rendition is not always
ideal. Gerver introduced the term ‘discontinuities’ to classify omissions, substitutions, and
corrections in the target output, while Barik categorised different forms of ‘departures’
between source and target message as ‘omissions’, ‘additions’, and ‘substitutions’. These
early frameworks provide a structured way to classify errors in interpreter performance,
emphasising completeness and fidelity to the original message. However, they have been
critiqued for being overly simplistic, focusing too narrowly on the linguistic product, over-
looking other contextual factors of interpreting, and failing to consider the broader view
of interpreting as a professional service (Pöchhacker, 2001). Despite these limitations,
these foundational studies have paved the way for later research into interpreting quality,
particularly in relation to error classification. In addition, later research has continued
to adopt bottom-up approaches, which provide a basis for quantifying interpreter performance
and have thus laid the foundations for the error classification systems later developed for
technology-mediated workflows.
One early study by Hornberger et al. (1996) applied Barik’s system to evaluate accuracy
in physician-mother-language-discordant consultations. The study compared two interpret-
ing modalities: remote simultaneous interpreting (experimental) and on-site consecutive
interpreting (control). Findings showed that remote simultaneous interpreting led to more
utterances per visit and a 13% lower rate of inaccurately interpreted mother utterances
(primarily omissions) compared to consecutive interpreting. Similar trends were observed in
physician utterances. This further demonstrates the utility of using this error classification
system to provide quantifiable evidence and identify trends, particularly when comparing
different modalities. More recently, in a study on the cognitive challenges of real-time auto-
mated captioning in interpreting during online meetings, Yuan and Wang (2023) adopted
Barik’s error classification to assess interpreting performance in two conditions, namely,
with and without live captioning. According to the authors, ‘the purpose of utilizing an
error-based analysis is to compromise the subjective caused by different makers recruited
for interpreting performance assessment’ (Yuan and Wang, 2023, 4).
Moser-Mercer (2003) conducted a comparative analysis between on-site interpreting and
booth-based RSI to assess the impact of booth-based RSI on interpreting quality and human
factors, such as interpreter stress and fatigue. In this case, bottom-up analysis included
transcribing and assessing interpreters’ performance outputs using an error rating scale
developed by Moser-Mercer et al. (1998) and adapted from similar earlier research (Barik,
1971; Gerver, 1974). This scale included four categories for meaning-related errors, namely,
‘“contre-sens” – saying exactly the opposite of what the speaker said; “faux-sens” – say-
ing something different from what the speaker said; “nonsense” – not making any sense
at all; “imprecision” – not capturing all of the original meaning (leaving out nuances)’
(Moser-Mercer et al., 1998, 54). This section of their assessment framework (which also
included methods to explore other human factors, such as baseline stress measurements)
captured a range of interpreter errors, such as omissions, additions, hesitations, corrections,
grammatical mistakes, and lexical errors. These were then graded based on their severity.
Errors that led to a contre-sens were considered 'the most serious', and lexical mistakes
were deemed 'the least serious'.
The AVIDICUS projects (Braun and Taylor, 2012a; Braun et al., 2013) extended
bottom-up approaches to the analysis of interpreting quality in a different mode and con-
text, namely, (simulated) police and prosecution interviews. Each scenario presented an
instance of two-way consecutive interpreting between a police officer or prosecutor and
a suspect or witness, in three different language combinations. Whilst in AVIDICUS 1 one
session was conducted using on-site interpreting and the other sessions used video-mediated
interpreting, the sessions in AVIDICUS 2 all involved video-mediated interpreting, using
a variety of equipment and set-ups and providing training to the interpreters prior to their
session. These comparative studies adapted Kalina’s (2002) comprehensive criteria, which
had been (primarily) used to assess conference interpreting performances, and combined
them with the quality analysis requirements for interpreting in legal settings. In addition,
AVIDICUS 2 also incorporated more nuanced language-based categories, including omis-
sions, additions, inaccuracies, lexical/terminological issues, and interactional problems (e.g.
turn-taking), and non-verbal and visual elements (such as gaze direction and being out of
shot). Analysis involved coding by multiple raters, which yielded quantitative findings;
qualitative analysis complemented this, in order to assess the scale of emerging problem
areas and identify critical instances. In addition, the interviews were divided into relevant
genre moves (e.g. introduction, caution, suspect's version, etc., in the police interviews)
to analyse the occurrence of interpreting problems both within genre moves and over time
(Braun, 2013).
While the researchers recognised that these categories only represent 'one step of the
way to a comprehensive assessment of the viability of video-mediated criminal proceed-
ings that involve an interpreter' (Braun and Taylor, 2012b, 115), the findings indicated that
video-mediated interpreting amplified known challenges in legal interpreting, particularly in
relation to omissions, additions, and inaccuracies, which were found to be more prevalent
in the technology-mediated settings. Inaccuracies were broken down further into subcatego-
ries, such as ‘distortions’, and given differential weighting, based on severity. Moreover, the
adopted approach revealed that certain problem types tended to co-occur (e.g. turn-taking
problems with omissions, a pattern that was stronger in video-mediated interpreting).
Consequently, these categories were analysed together. The research also supported earlier
findings
(e.g. from Moser-Mercer, 2003), which suggested that fatigue sets in more quickly dur-
ing technology-enabled sessions than in face-to-face interpreting, as evidenced by a greater
increase in interpreting problems. However, the study’s findings, based on simulations and a
small sample size, call for further replication to validate their significance.
Bottom-up error-based assessment methods that have been developed within interpret-
ing studies have seen a recent revival in relation to hybrid practices (e.g. live speech-to-text)
involving interaction between human professionals and speech recognition technology (aka
respeaking; see Davitti, this volume, for contextualisation of this practice as a unique form
of interpreting, which is accessible to hearing, non-hearing, and other-language-speaker
individuals). The NTR model was developed by Romero-Fresco and Pöchhacker (2017,
159; Figure 17.1). Based on the NER model for intralingual live subtitling (Romero-Fresco
and Martínez, 2015), it focuses on quantifying aspects such as accuracy and achieves this
via error classification and weighting.

Figure 17.1 NTR model.

The model distinguishes between 'recognition errors'
and 'translation errors', the latter including both 'content-related' errors, that is, omissions,
additions, and substitutions, and ‘form-related’ errors, that is, grammatical correctness and
style errors. In this model, errors are attributed a score depending on their severity: ‘Minor’
errors (penalised with a −0.25-point deduction) do not hamper comprehension, ‘major’
errors (−0.50) cause confusion or loss of information, and ‘critical’ errors (−1) introduce
false or misleading information to the output.
However, the NTR model differs from earlier approaches based on error classifications.
One major difference includes the introduction of the category of ‘effective editions’ (EEs)
to account for editions that do not cause loss of information and may even improve the
text (Korybski and Davitti, 2024). While EEs are not assigned numerical scores, they do
play a role in the analysis, as they highlight the strengths of human editing. Furthermore,
space for qualitative evaluations is also accounted for. The accuracy rate is calculated using
the formula shown in Figure 17.1. In live intralingual subtitling, for subtitle accuracy to be
considered ‘acceptable’, the minimum accuracy threshold is set at 98% (Romero-Fresco,
2011). Several studies have validated the NER model in professional settings (e.g. Ofcom,
2015a, 2015b). In training, the model is also considered a useful diagnostic tool to
identify recurrent errors in the performance of respeakers. However, no established quality
benchmark has yet been validated for interlingual live subtitling.
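By way of illustration, the NTR scoring logic can be sketched as follows. The severity penalties (minor −0.25, major −0.5, critical −1) and the 98% threshold are taken from the text; the accuracy formula, Accuracy = (N − T − R) / N × 100, where N is the word count and T and R are the summed translation- and recognition-error penalties, follows the NER-model convention and is assumed here rather than reproduced from Figure 17.1. Effective editions carry no numerical penalty in this sketch.

```python
# Sketch of an NTR-style accuracy calculation. Severity penalties and the 98%
# threshold are stated in the text; the (N - T - R) / N * 100 formula follows
# the NER-model convention and is an assumption of this sketch.
PENALTY = {"minor": 0.25, "major": 0.50, "critical": 1.00}

def ntr_accuracy(n_words: int,
                 translation_errors: list[str],
                 recognition_errors: list[str]) -> float:
    """Return the accuracy rate (%) for one live-subtitled passage.

    Each error is given as its severity label; effective editions are not
    counted, since they carry no penalty in the NTR model.
    """
    t = sum(PENALTY[sev] for sev in translation_errors)  # translation errors
    r = sum(PENALTY[sev] for sev in recognition_errors)  # recognition errors
    return (n_words - t - r) / n_words * 100

# A 1,000-word passage with two minor and one major translation error and one
# critical recognition error: T = 1.0, R = 1.0, so accuracy is roughly 99.8%,
# above the 98% threshold used for intralingual live subtitling.
rate = ntr_accuracy(1000, ["minor", "minor", "major"], ["critical"])
```

A hypothetical acceptability check under the intralingual threshold would then simply be `rate >= 98.0`; as the chapter notes, no comparable benchmark has been validated for the interlingual case.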
In effect, both the NER and NTR models assess interpreting accuracy from the perspective of evaluators rather than the end users. That being said, in principle, error scoring is guided by the impact that a given error may have on viewers' comprehension. Even so, this leaves room for borderline cases where distinguishing between an error and a positive strategy is more challenging. Moreover, the subjectivity involved in error rating can affect the consistency of assessments. This highlights the importance of a second-marking process when applying the model, to ensure reliability and reduce individual bias in evaluation.
Adaptations of the NTR model have started to be used to evaluate interpreting perfor-
mance, with a view to producing findings that are comparable across studies. For instance,
Korybski et al.’s (2022) study compared a semi-automated workflow for live interlingual
subtitling, which involved a human (intralingual) respeaker paired with machine transla-
tion, against the output of simultaneous interpreting across several language pairs (Spanish,
Italian, French, Polish). The study used source speeches from the European Parliament for
both workflows. Importantly, recognition errors were not captured since they represented
an interim stage, specific to the semi-automated workflow’s intralingual respeaking process,
and did not apply to the simultaneous interpreting (the benchmark workflow).
Rodríguez González (2024; see also Rodríguez González et al., 2023) also adapted the
NTR model to evaluate the impact of ASR on interpreters’ performance in platform-based
RSI. To this end, recognition errors were excluded, as they were deemed irrelevant to the
interpreting workflow. A new category of disfluency was introduced to capture important
aspects of the interpreting delivery, namely, interjections, false starts, unfinished sentences,
truncated words, self-repairs, repetitions, silent and filled pauses. Three or more disfluen-
cies within an idea unit were penalised as a ‘minor style-related’ error. These adjustments
‘provided the research team with a pragmatic and systematic tool that enabled a rigorous
comparative analysis, based on a quantitative assessment, that captured the differences that
are present in the interpretations, both from intra- and extralinguistic perspectives’ (Rod-
ríguez González, 2024, 75).

315
The Routledge Handbook of Interpreting, Technology and AI

Two further studies are currently adapting the NTR model to explore how different
methods of integrating ASR into the interpreting workflow affect interpreting quality in
consecutive/dialogue interpreting in healthcare settings (Tan et al., 2024) and legal settings
(Tang et al., 2024).
Overall, approaches that aim to quantify aspects of interpreting performance offer the potential to reduce the subjectivity that has traditionally been a concern in quality assessment and to achieve greater consistency and replicability across studies. However,
these methods remain somewhat experimental, given their labour-intensive nature, which
limits their application to large datasets. Semi-automation of quality assessment has been
tested in the context of intralingual speech-to-text practices (e.g. the NER Buddy; Alonso
Bacigalupe and Romero-Fresco, 2023). While this approach currently seems to perform
better with verbatim renditions, its use for the evaluation of interlingual practices is
worth exploring.
In relation to interlingual respeaking, the SMART project (see endnote 2) adopted an
NTR-driven assessment method involving two independent evaluators per performance
over a total of 153 performances and six language directionalities (see Davitti, this vol-
ume, and Davitti and Wallinheimo, 2025, for further information about the study design).
Through this bottom-up approach based on errors, different error types (as discussed ear-
lier in this section) as well as effective editions (EEs, that is, moves having a positive impact
on the final output) were manually identified. The results were used to calculate final accu-
racy scores, but also in multiple regressions, to identify predictors of accuracy (or lack
thereof) across all participants and scenarios tested. Findings showed that omissions were
the strongest negative predictor of accuracy (β = −1.1, p < .001), followed by substitutions
(β = −.19, p < .001), and recognition errors (β = −.31, p < .001). EEs emerged as positive
predictors of accuracy across all scenarios (β = .31, p = .03).
These findings highlight the potential for semi-automation of error detection, especially
as omissions are stronger predictors of inaccuracy than other error types. A system that
automatically identifies errors that have a strong impact on accuracy would provide a
quick assessment of the level of accuracy achieved by a specific output while reducing the
time required for analysis. More detailed, manual analysis could then be directed to those
errors that require a more nuanced source–target comparison or could be applied on a sam-
pling basis. However, the small-scale attempt at semi-automating this process presented in
Alonso Bacigalupe and Romero-Fresco (2023) shows that current systems struggle to differ-
entiate between omissions as strategic edits (e.g. for condensation purposes) and omissions
as actual errors in intralingual respeaking, which is likely to be exacerbated when language
transfer is involved.
While improvements in prompts and ASR technology will continue to enhance per-
formance, at the time of writing, human judgment remains irreplaceable for a compre-
hensive evaluation of quality, particularly in interlingual live content, where a verbatim
approach is rarely effective. This also highlights both the potential and the limitations of
automated systems, which, though efficient, lack the flexibility and adaptability of human
decision-making. Having to reassess post hoc which instances identified by the system as
‘errors’ actually have a positive or negative impact on the output would undermine any
efficiency gains. These considerations also point to the need to tailor automated assessment
systems to specific contexts and scenarios. Further discussion on full automation for quality
evaluation purposes, which would improve consistency and allow for processing of larger
datasets, will be offered in the following section.


17.4.3 Automated metrics/methods


After reviewing human-based quality assessment methods, this section shifts focus to auto-
mated methods used for measuring interpreting quality. These concentrate on evaluating
how well the interpretation reflects the speaker’s original message, specifically assessing the
quality dimension in relation to informational accuracy.
Whilst methods relying on human assessors are the most appropriate for evaluating inter-
preting quality, these are difficult to apply, are time-consuming, and cannot be employed on
a large scale. As a result, automatic methods have become preferable in certain situations,
as they provide instant feedback to trainees, practitioners, and language service provid-
ers by indicating mistakes made during the process. This mirrors the scenario in machine
translation (MT) evaluation and is significant, as many of the automatic quality assessment
approaches in interpreting have been adapted from MT evaluation methods to provide
more efficient, large-scale assessment solutions.
After a short explanation of the most used MT evaluation methods in Section 17.4.3.1,
Section 17.4.3.2 will discuss how these methods have been employed to assess interpreting
quality. Section 17.4.3.3 will describe other methods used to assess the interpreting quality,
while Section 17.4.3.4 will focus on methods used in the evaluation of speech-to-speech
translation, which can also be applied to human interpretation.

17.4.3.1 Evaluation metrics used in machine translation


MT automatic evaluation metrics focus primarily on measuring the extent to which the
information from a source is present in its translation. This is usually achieved by compar-
ing the automatic translation with one or several reference translations produced by profes-
sional translators. The most used evaluation method is BLEU (Papineni et al., 2002), which
compares whether the ngrams3 from the translation appear in the reference translations. If
a match is found, the score of the translation is increased. This means that if the transla-
tion uses different words or a different word order to the reference translation, its score is
penalised, even if the meaning is the same.
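The n-gram comparison at the heart of BLEU can be illustrated with a toy sketch. This shows only BLEU's modified ("clipped") n-gram precision, not the full metric, which combines several n-gram orders and a brevity penalty:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, reference, n):
    """Modified n-gram precision as used inside BLEU: each candidate
    n-gram is credited at most as often as it occurs in the reference."""
    cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
    overlap = sum(min(count, ref[g]) for g, count in cand.items())
    return overlap / max(1, sum(cand.values()))

ref = "the interpreter conveyed the message accurately".split()
hyp = "the interpreter rendered the message faithfully".split()
print(clipped_precision(hyp, ref, 1))  # 4 of 6 unigrams match
print(clipped_precision(hyp, ref, 2))  # different word choices cut bigram overlap
```

The second call illustrates the penalisation discussed above: the two sentences arguably convey the same meaning, yet the different word choices sharply reduce the bigram overlap and, with it, the score.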
METEOR (Banerjee and Lavie, 2005) is another evaluation metric used in MT which
relies on ngrams. In contrast to BLEU, the METEOR score integrates a dictionary of
synonyms to identify when different words convey the same meaning. However, because
METEOR also relies on ngrams, it is still affected by the order of words. BERTScore
(Zhang et al., 2020) and BLEURT (Sellam et al., 2020) are two recent evaluation metrics
which use neural networks to determine the quality of a translation. Rather than comparing
words or groups of words between the translation and the reference, these methods deter-
mine a numerical representation that encodes their meaning. These methods then use these
numerical representations to calculate how similar the two texts are. This, in turn, is used
to calculate the quality of the translation.
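The similarity computation underlying metrics such as BERTScore and BLEURT can be sketched as a cosine similarity between vector representations. In the real metrics these representations come from large neural encoders; the hand-made vectors below are purely illustrative:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical sentence embeddings, invented for this example only:
translation_vec = [0.8, 0.1, 0.55]
reference_vec = [0.75, 0.15, 0.6]
paraphrase_score = cosine_similarity(translation_vec, reference_vec)
unrelated_score = cosine_similarity(translation_vec, [0.0, 0.9, 0.1])
print(paraphrase_score > unrelated_score)  # True: closer meaning, higher score
```

Because similarity is computed over meaning-encoding vectors rather than surface n-grams, a paraphrase can score highly even when it shares few words with the reference.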
The methods described in the preceding text rely on the availability of a reference trans-
lation. The field of machine translation quality estimation (MTQE) develops automatic
methods that can predict the quality of a translation without the need for a reference trans-
lation (Specia et al., 2010). These methods are ‘data-driven’; this means they learn how to
estimate the quality of unseen translations from a large dataset where humans have already
annotated the quality of translation. In general, the main bottleneck in the development of
MTQE methods for a specific language pair is the availability of such data.


17.4.3.2 MT evaluation metrics for measuring interpreting quality


Researchers have explored using MT evaluation metrics to assess interpreting quality.
However, their uptake has been rather limited for several reasons. As explained earlier,
these methods depend on the availability of one or several reference translations. In order
to apply these same methods to interpreting, it would be necessary to have ‘reference inter-
pretations’, or ‘ideal renditions’. These are challenging to create and, consequently, seldom
available. Furthermore, interpreters commonly employ techniques such as ‘summarisation’
to convey an original message effectively. However, most MT evaluation metrics rely on
ngram overlapping. This means that any differences between the words used in the refer-
ence interpretation and those in the human interpretation can result in penalisation. Addi-
tionally, to ensure that translations are of similar length to their reference translations, the
BLEU score includes a brevity penalty, which penalises translations that are shorter than
the reference translations. Since interpreters sometimes summarise source speeches, penal-
ties will be applied when scores are calculated. This may not be a problem if the reference
interpretation also summarises the source speech.
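The brevity penalty mentioned above has a simple closed form in the original BLEU paper (Papineni et al., 2002), which makes its effect on condensed interpretations easy to see:

```python
import math

def brevity_penalty(candidate_len, reference_len):
    """BLEU's brevity penalty: 1 for candidates at least as long as the
    reference, exp(1 - r/c) otherwise (Papineni et al., 2002)."""
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1 - reference_len / candidate_len)

# A condensed rendition (60 words against a 100-word reference) is
# penalised even if every word it contains matches the reference:
print(round(brevity_penalty(60, 100), 3))  # 0.513
print(brevity_penalty(100, 100))           # 1.0
```

A summarised interpretation that is 40% shorter than its reference thus loses roughly half its score before any n-gram matching is even considered — which is precisely why the metric sits uneasily with interpreters' legitimate condensation strategies.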
Despite the limitations discussed earlier, Han and Lu (2021a, b) argue that MT evalu-
ation metrics are indeed useful in assessing interpretations. Lu and Han (2023) show that
MT evaluation metrics like BLEU and METEOR can be used to assess interpreting qual-
ity for bidirectional consecutive English–Chinese interpretations. For their study, Lu and
Han (2023) asked four interpreters/interpreter trainers to produce reference interpreta-
tions to test them against reference-based MT metrics. Before applying the evaluation met-
rics, the reference interpretations and the students’ interpretations were transcribed and
pre-processed. As a result, the MT metrics were applied in the same way as they would be
for machine translation. In addition, human assessors, using analytic rubric scoring, holis-
tic rubric scoring, and comparative judgement, were recruited to assess the students’ inter-
pretations. Lu and Han (2023) show that there is moderate to strong correlation between
the automatic metrics and human scores, with better results for the English-to-Chinese
interpreting direction. Interestingly, they report that the strongest correlation is obtained by BLEU rather than BERTScore, even though the latter would be expected to better capture the similarity between the students' interpretations and the reference interpretations.
Wang and Fantinuoli (2024) also investigate the correlation between automatic metrics
and human assessments. Their study used 12 English speeches that were interpreted into
Spanish, both by three professional Spanish interpreters, and automatically by KUDO AI
Speech Translator. The interpretations were rated by human assessors on a Likert scale
for accuracy and intelligibility. Notably, the assessors were unaware of whether they were
reviewing human or machine output. The automatic evaluation was carried out using a
method similar to BERTScore. However, this evaluation was based on different neural net-
work models to derive the sentences’ numerical representations. To provide an alternative
evaluation method, GPT3.5 was prompted to give a score between 1 and 5 to indicate,
sentence by sentence, the quality of interpretation. Results show moderate correlations
between the informativeness score given by human assessors and the automatic methods,
with the most promising results given by GPT3.5. That said, the GPT3.5 model used in the investigation can be deemed rather weak by today's state of the art, and the prompt used was relatively simple. Therefore, it is possible that
stronger correlations could be obtained by using a more recent model and more sophisti-
cated prompts.


Stewart et al. (2018) attempted to use MTQE to evaluate interpretations. They extended
QuEst++ (Specia et al., 2015) to account for interpreting specific features, like the ratio
of pauses/hesitations/incomplete words, the ratio of non-specific words, and the ratio of
‘quasi-cognates’. Their evaluation shows that the proposed method improved the correla-
tion for the predicted METEOR score over a baseline without the need for a reference
interpretation. Despite Stewart et al.’s (2018) promising results, it is currently difficult to
use MTQE methods in interpreting, due to lack of training data.
Although metrics like BLEU and METEOR were shown to correlate with human scores,
the scores provided by these metrics can only be used to rank interpretations. Alone, these
scores have limited utility, as they do not correspond to any predefined levels of quality.
The same criticism applies to methods derived from machine translation quality estimation,
such as that which Stewart et al. (2018) propose. This limitation is addressed by the method
presented in the next section.

17.4.3.3 Other methods for measuring interpreting quality


Li et al. (2022) proposed a method based on neural networks, which takes into considera-
tion both the transcription of the interpretation and pronunciation features extracted from
the speech. Based on these features, the method generates scores for keywords, content,
grammar, and fluency. These are then fused into one quality score. The method is assessed
on exam data of a Chinese–English interpreting task containing 2,734 spoken responses,
each up to 20 sec in length. Two human assessors were asked to score each response on a
predefined scale between 0 and 2. The final score for each response was the average of the two assessors' scores. Li et al. (2022) show that their method obtains a high correlation with manually assigned scores.
Ünlü (2023) investigated the use of GPT3.5 and GPT4 to assess interpreting quality.
They prompted LLMs to rate the automatic transcription of the interpretation on a scale
from 0 to 10. Given the small size of the experiment, correlation scores were not calcu-
lated. However, the study shows that chatbots can be used to provide feedback on differ-
ent aspects of interpretation, including accuracy, fidelity and completeness, coherence and
cohesion, terminology, and disfluency markers. While small-scale, this research shows the
potential of using LLMs in interpreting quality assessment.
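As a rough illustration of how such an LLM-based assessment might be set up, the sketch below assembles a rating prompt covering the aspects listed above. The rubric wording and the overall structure are invented for this example, and no particular chatbot API is implied; sending the prompt and parsing the response are left out.

```python
# Hypothetical prompt-construction sketch for LLM-based quality rating,
# loosely modelled on the setup described for Ünlü (2023).
RUBRIC = (
    "You are assessing an interpretation.\n"
    "Rate the interpretation transcript against the source transcript on a "
    "scale from 0 to 10, then comment briefly on accuracy, fidelity and "
    "completeness, coherence and cohesion, terminology, and disfluency markers."
)

def build_assessment_prompt(source_transcript, interpretation_transcript):
    """Assemble one prompt combining rubric, source, and interpretation."""
    return (
        f"{RUBRIC}\n\n"
        f"SOURCE:\n{source_transcript}\n\n"
        f"INTERPRETATION:\n{interpretation_transcript}\n\n"
        "Respond with 'Score: <0-10>' on the first line."
    )

prompt = build_assessment_prompt(
    "Good morning, the committee will now vote on the amendment.",
    "Buenos días, la comisión votará ahora la enmienda.",
)
print(prompt.startswith("You are assessing"))  # True
```

Constraining the response format (the `Score:` line) is what makes the numerical part of the feedback machine-readable for later correlation analysis.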
Developments in AI have also improved processing of the audio signal, which, in turn,
can be used to assess interpretation. Yu and van Heuven (2017) automatically evaluate
speech fluency, showing how it correlates with humans’ perceived fluency ratings in the
context of consecutive interpreting. They also determine that certain variables (number
of syllables, number of filled pauses, mean length of pause) are important for determining
judged fluency. Similar findings are also reported in Han et al. (2020).
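The temporal variables identified in these studies are straightforward to compute once speech and pause intervals have been annotated. The sketch below is illustrative only; the function name and input format are invented for this example:

```python
def fluency_measures(syllable_count, total_time_s, silent_pauses_s, filled_pause_count):
    """Compute simple temporal fluency variables from an annotated recording.

    syllable_count     -- number of syllables produced
    total_time_s       -- total duration of the delivery in seconds
    silent_pauses_s    -- list of silent-pause durations in seconds
    filled_pause_count -- number of filled pauses (e.g. 'uh', 'um')
    """
    mean_pause = (
        sum(silent_pauses_s) / len(silent_pauses_s) if silent_pauses_s else 0.0
    )
    return {
        "speech_rate_syll_per_s": syllable_count / total_time_s,
        "mean_pause_length_s": mean_pause,
        "filled_pauses": filled_pause_count,
    }

# A 60-second consecutive rendition with 180 syllables, three silent
# pauses, and four filled pauses:
measures = fluency_measures(180, 60.0, [0.5, 1.0, 1.5], 4)
print(measures["speech_rate_syll_per_s"])  # 3.0
print(measures["mean_pause_length_s"])     # 1.0
```

In practice, the pause intervals themselves would come from automatic processing of the audio signal, which is precisely where the AI developments mentioned above come in.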

17.4.3.4 Measures used in the evaluation of machine interpreting


The International Conference on Spoken Language Translation (IWSLT)4 is dedicated to
various aspects of speech translation. The conference invites participants to tackle tasks
relating to automatic speech-to-speech translation. For systems that produce both text and
speech as output, evaluation metrics from MT, like BLEU and BLEURT, are used to cal-
culate the difference between the produced transcript and the human reference. Although


these metrics are meant to measure the quality of machine interpreting, they can also be applied to human interpreting. The speech output of these systems is evaluated using BLASER (Seamless Communication et al., 2023), which compares the translated speech with the reference speech. These measures could likewise be used to assess the quality of human interpreting performance although, as previously mentioned, they would only be useful for comparing two interpretations.

17.5 Future developments


The current coexistence of human-driven and automated methods for quality evaluation
at the intersection between interpreting and technology makes it difficult to ensure com-
parability and replicability of current studies, particularly when evaluating human versus
machine performance.
Evaluation of interpreting quality appears to be moving towards automation. This shift is driven by the need to handle larger datasets and improve scalability, and it addresses a limitation in current interpreting research. While these methods have made significant strides in
measuring accuracy, they risk oversimplifying evaluation criteria and overlooking impor-
tant qualitative nuances that are critical to interpreting quality. Human oversight and quali-
tative methods thus remain critical.
To address these challenges, a hybrid approach which combines scalability with deeper,
more nuanced insights is key when dealing with the complex, nuanced interplay between
interpreting and technology. Automation may have scalability potential, but working
with smaller datasets to perform rich qualitative analyses remains invaluable for explor-
ing context-specific factors more deeply. Doing so allows evaluators to identify nuances
that could be missed by automated systems and thus contributes to a more comprehen-
sive understanding of performance. This is particularly important, as technology adds a
layer of complexity to the already-intricate interpreting process. Smaller qualitative studies
can provide valuable insights that inform larger quantitative studies by introducing more
variables for exploration and offering a richer, more nuanced understanding of interpreting
quality. The interplay between newer, automated methods and traditional, human methods
that tend to rely on qualitative approaches is essential to ensuring a balanced perspective
on quality.
Another important consideration for future evaluation methods is the usability of the
approach. Some automated methods rely on a narrow set of criteria that does not seem to
pay due attention to aspects beyond ‘accuracy’. This risks oversimplifying the evaluation
process. On the other hand, some (particularly top-down) approaches rely on a wide range
of parameters which are difficult to apply in practice. Balancing ‘usability’ with ‘compre-
hensiveness’ is key: This is where a hybrid approach can offer more manageable, scalable
solutions without sacrificing depth. In a hybrid approach, machine power can be used to
process larger amounts of data and recognise patterns in this data that may escape human
scrutiny. Human evaluators can, in turn, focus on the nuanced aspects of interpreting qual-
ity that machines cannot fully capture. In the various modalities of ‘technology-enabled’
or ‘assisted’ interpreting, which often amount to complex, multimodal forms of human–
machine interaction, machine-assisted data analyses may prove particularly useful for iden-
tifying relationships between different modalities. Using machine-assisted data analyses
could enable evaluators to detect how certain types of interaction with visual information
on-screen may influence an interpreter's output in a more systematic way. This, in turn, may improve our understanding of what makes an interface of an interpreting platform more or less effective.
Involving both interpreters and users of interpreting services in quality assessment is
another critical area for future exploration. On the one hand, interpreters are crucial informants for understanding how their interaction with technology impacts the interpreting process, given the complex working conditions, environments, and interfaces they encounter in their work with technology.
to the involvement of users of interpreting services, reception studies that evaluate the quality
of output across different interpreting workflows and modalities from an end user perspec-
tive remain scarce. This brings back the enduring question: Should users be actively involved
in judging quality? This issue, as discussed in Section 17.2, continues to shape the debate on
interpreting quality assessment when technology is involved, especially when considering the
balance between objective evaluations and subjective user experiences.
Additionally, the future of interpreting quality assessment, for interpreting modalities
involving the use of technology, may lie in refining both bottom-up approaches (which
emphasise systematicity and objectivity) and top-down, criteria-based methods. The meth-
odological question of which evaluation methods will best isolate and identify the specific
impacts of technology on interpreting quality in different settings remains an ongoing chal-
lenge. This issue requires further exploration alongside the development of more robust
evaluation criteria and categories, in addition to continued research into the quality of
interpreting performance itself.

17.6 Conclusion
This chapter has undertaken the challenging task of exploring the broad, multifaceted con-
cept of quality within the context of interpreting and technology. It began by reviewing
key concepts related to interpreting quality and identifying specific challenges that arise
when technology is involved. Furthermore, it provided an overview of various evalua-
tion approaches adopted to explore such intersection, categorising them into top-down,
bottom-up, and automated methods.
A significant issue that was highlighted in relation to the latter group is the use of written
translation assessment methods to assess interpreting, which may not always be appropri-
ate. This is the case, for instance, in consecutive interpreting, where aspects of delivery (e.g.
intonation) and condensation play crucial roles. To adequately address these complexities,
it is essential to refine assessment methods to reflect the nuances of spoken language and
account for changes introduced by technology.
Looking ahead, a hybrid approach to quality assessment and evaluation appears promis-
ing. This approach would incorporate elements of automation for scalability while preserv-
ing the qualitative insights critical for assessing interpreting performance – especially as it
relates to technology and AI.
Further research endeavours based on larger datasets can also inform quality assessment
practices employed by different stakeholders. With the growing scale of live multilingual
(and multimodal) communication, one can predict increased interest in research-informed
semi-automated methods for interpreting quality evaluation. Additionally, the impact of
the ongoing academic debate on quality in interpreting is likely to extend beyond interpret-
ing studies, as long as sufficient, transparent, iterative communication and collaboration
between industry and academia remain.


Furthermore, using AI-driven technologies in evaluation can provide a deeper understanding of how the use of other AI-driven technologies during interpretation delivery could impact both the process and the product of interpretation. In addition, AI-driven
technology will be instrumental not only in assessing the quality of output delivered by
professional interpreters working in different workflows (at different levels of automation)
but also in shaping assessment in interpreter training contexts in the near future.
Dynamic, highly responsive research in the area of interpreting quality is indispensable to
ensure recurrent verification of the affordances of new technologies. With research-informed
insights, it will be possible to mitigate the current dominance of ‘tech hype’ over reason,
especially regarding the role of humans in providing high-quality, real-time communication
across languages. Ongoing research into interpreting quality and the best methods to eval-
uate technology-driven interpreting workflows may help counter some of the premature
claims about the universal feasibility of automated interpreting, often made by advocates
of major AI companies. If these claims remain unchallenged, they could discourage future
interpreters, potentially leading to a shortage, or even the disappearance, of a socially vital
profession.

Notes
1 MATRIC project – Machine Translation and Respeaking in Interlingual Communication, Expand-
ing Excellence in England, Research England, 2020–2024.
2 SMART project – Shaping Multilingual Access through Respeaking Technology, ES/T002530/1,
Economic and Social Research Council UK, 2020–2023. URL https://smartproject.surrey.ac.uk/
3 An ngram is a group of n consecutive words in a text. The most common way of applying BLEU uses n = 1 to 4, that is, it compares groups of one word (unigrams), two words (bigrams), three words (trigrams), and four words (4-grams).
4 https://iwslt.org/ (accessed 4.4.2025).

References
Alonso Bacigalupe, L., Romero-Fresco, P., 2023. The Application of Artificial Intelligence-Based
Tools in Intralingual Respeaking: The NER Buddy. In Corpas Pastor, G., Hidalgo-Ternero, C., eds.
Proceedings of the International Workshop on Interpreting Technologies SAY IT AGAIN 2023,
9–15. URL https://lexytrad.es/SAYITAGAIN2023/
Banerjee, S., Lavie, A., 2005. METEOR: An Automatic Metric for MT Evaluation with High Levels
of Correlation with Human Judgments. In Proceedings of the ACL Workshop on Intrinsic and
Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, MI,
65–72.
Barik, H.C., 1969. A Study of Simultaneous Interpretation (PhD thesis).
Barik, H.C., 1971. A Description of Various Types of Omissions, Additions and Errors of Translation Encountered in Simultaneous Interpretation. Meta 16(4), 199–210. URL https://doi.org/10.7202/001972ar
Barik, H.C., 1975. Simultaneous Interpretation: Qualitative and Linguistic Data. Language and
Speech 18(3), 272–297.
Seamless Communication, Barrault, L., Chung, Y.-A., Meglioli, M.C., Dale, D., Dong, N.,
Duquenne, P.-A., Elsahar, H., Gong, H., Heffernan, K., Hoffman, J., Klaiber, C., Li, P., Licht,
D., Maillard, J., Rakotoarison, A., Sadagopan, K.R., Wenzek, G., Ye, E., Akula, B., Chen, P.-J.,
Hachem, N.E., Ellis, B., Gonzalez, G.M., Haaheim, J., Hansanti, P., Howes, R., Huang, B., Hwang,
M.-J., Inaguma, H., Jain, S., Kalbassi, E., Kallet, A., Kulikov, I., Lam, J., Li, D., Ma, X., Mavlyutov,
R., Peloquin, B., Ramadan, M., Ramakrishnan, A., Sun, A., Tran, K., Tran, T., Tufanov, I., Vogeti,
V., Wood, C., Yang, Y., Yu, B., Andrews, P., Balioglu, C., Costa-jussà, M.R., Celebi, O., Elbayad,
M., Gao, C., Guzmán, F., Kao, J., Lee, A., Mourachko, A., Pino, J., Popuri, S., Ropers, C., Saleem,


S., Schwenk, H., Tomasello, P., Wang, C., Wang, J., Wang, S., 2023. SeamlessM4T: Massively Multilingual & Multimodal Machine Translation. URL https://arxiv.org/abs/2308.11596
Braun, S., 2013. Keep Your Distance? Remote Interpreting in Legal Proceedings. Interpreting 15(2),
200–228. URL https://doi.org/10.1075/intp.15.2.03bra
Braun, S., Taylor, J., eds., 2012a. Videoconference and Remote Interpreting in Criminal Proceedings.
Intersentia, Antwerp.
Braun, S., Taylor, J., 2012b. AVIDICUS Comparative Studies – Part I: Traditional Interpreting and
Remote Interpreting in Police Interviews. In Braun, S., Taylor, J., eds. Videoconference and Remote
Interpreting in Criminal Proceedings. Intersentia, Antwerp, 99–117.
Braun, S., Taylor, J., Miler-Cassino, J., Rybinska, Z., Balogh, K., Hertog, E., Vanden Bosch, Y., Rombouts,
D., Licoppe, C., Verdier, M., 2013. Assessment of Video-Mediated Interpreting in the Criminal Justice System: AVIDICUS 2 – Action 2 Research Report. URL http://wp.videoconference-interpreting.net/wp-content/uploads/2014/01/AVIDICUS2-Research-report.pdf
Bühler, H., 1986. Linguistic (Semantic) and Extra-Linguistic (Pragmatic) Criteria for the Evaluation
of Conference Interpretation and Interpreters. Multilingua 5(4), 231–235.
Chiaro, D., Nocella, G., 2004. Interpreters’ Perception of Linguistic and Non-Linguistic Factors
Affecting Quality: A Survey Through the World Wide Web. Meta 49(2), 278–293. URL https://doi.org/10.7202/009351ar
Choi, J.Y., 2013. Assessing the Impact of Text Length on Consecutive Interpreting. In Tsagari, D.,
van Deemter, R., eds. Assessment Issues in Language Translation and Interpreting. Peter Lang,
Frankfurt am Main, 85–96.
Collados Aís, Á., 1998. La evaluación de la calidad en interpretación simultánea. La importancia de
la comunicación no verbal. Editorial Comares, Granada.
Davitti, E., Wallinheimo, A.-S., 2025. Investigating Cognitive and Interpersonal Factors in Hybrid
Human-AI Practices: An Empirical Exploration of Interlingual Respeaking. Target 37, Special
Issue: Mapping Synergies within Cognitive Research on Multilectal Mediated Communication.
Dawrant, A., Han, C., 2021. Testing for Professional Qualification in Conference Interpreting. In
Albl-Mikasa, M., Tiselius, E., eds. Routledge Handbook of Conference Interpreting. Routledge,
London, 258–274.
Fantinuoli, C., 2017. Speech Recognition in the Interpreter Workstation. In Esteves-Ferreira, J.,
Macan, J., Mitkov, R., Stefanov, O.-M., eds. Translating and the Computer 39: Proceedings. Edi-
tions Tradulex, Geneva, 25–34.
Frittella, F.M., 2019. 70.6 Billion World Citizens: Investigating the Difficulty of Interpreting Numbers.
Translation and Interpreting 11(1), 79–99. URL https://doi.org/10.12807/ti.111201.2019.a05
García Becerra, O., Collados Aís, Á., 2019. Quality, Interpreting. In Baker, M., ed. Encyclopedia of
Translation Studies. Routledge, London, 454–458.
Gerver, D., 1969/2002. The Effects of Source Language Presentation Rate on the Performance of
Simultaneous Conference Interpreters. In Pöchhacker, F., Shlesinger, M., eds. The Interpreting
Studies Reader. Routledge, London, 53–66.
Gerver, D., 1974. The Effects of Noise on the Performance of Simultaneous Interpreters: Accuracy of
Performance. Acta Psychologica 38, 159–167.
Gile, D., 1988. Le partage de l’attention et le ‘modèle d’effort’ en interprétation simultanée. The
Interpreters’ Newsletter 1, 4–22.
Gile, D., 1995/2009. Basic Concepts and Models for Interpreter and Translator Training. John Ben-
jamins, Amsterdam.
Grbić, N., 2008. Constructing Interpreting Quality. Interpreting 10(2), 232–257.
Hale, S., Goodman-Delahunty, J., Martschuk, N., 2018. Interpreter Performance in Police Interviews:
Differences Between Trained Interpreters and Untrained Bilinguals. The Interpreter and Translator
Trainer 13(2), 107–131. URL https://doi.org/10.1080/1750399X.2018.1541649
Hale, S., Goodman-Delahunty, J., Martschuk, N., Lim, J., 2022. Does Interpreter Location Make a
Difference? A Study of Remote vs Face-to-Face Interpreting in Simulated Police Interviews. Inter-
preting 24(2), 221–253. URL https://doi.org/10.1075/INTP.00077.HAL
Han, C., Chen, S., Fu, R., Fan, Q., 2020. Modeling the Relationship Between Utterance Flu-
ency and Raters’ Perceived Fluency of Consecutive Interpreting. Interpreting: International
Journal of Research and Practice in Interpreting 22, 211–237. URL https://doi.org/10.1075/
intp.00040.han

The Routledge Handbook of Interpreting, Technology and AI
Han, C., Lu, X., 2021a. Can Automated Machine Translation Evaluation Metrics Be Used to Assess
Students’ Interpretation in the Language Learning Classroom? Computer Assisted Language
Learning 36, 1064–1087. URL https://doi.org/10.1080/09588221.2021.1968915
Han, C., Lu, X., 2021b. Interpreting Quality Assessment Re-Imagined: The Synergy Between
Human and Machine Scoring. Interpreting and Society 1(1), 70–90. URL https://doi.org/10.
1177/27523810211033670
Hartley, A., Mason, I., Peng, G., Perez, I., 2003. Peer- and Self-Assessment in Conference Interpreter
Training. Centre for Languages, Linguistics and Area Studies, Heriot-Watt University.
Herbert, J., 1952. The Interpreter’s Handbook: How to Become a Conference Interpreter. Georg,
Geneva.
Hornberger, J., Gibson, C., Wood, W., Degueldre, C., Corso, I., Palla, B., Bloch, D., 1996. Eliminating
Language Barriers for Non-English-Speaking Patients. Medical Care 34(8), 845–856.
Kadric, M., 2001. Dolmetschen bei Gericht. Erwartungen, Anforderungen, Kompetenzen. WUV Uni-
versitätsverlag, Wien.
Kalina, S., 2002. Quality in Interpreting and Its Prerequisites – a Framework for a Comprehensive
View. In Garzone, G., Viezzi, M., eds. Interpreting in the 21st Century. John Benjamins, Amster-
dam, 121–130.
Kalina, S., 2005. Quality Assurance for Interpreting Processes. Meta 50(2), 769–784. URL https://doi.
org/10.7202/011017ar
Kopczyński, A., 1994. Quality in Conference Interpreting: Some Pragmatic Problems. In Lambert,
S., Moser-Mercer, B., eds. Bridging the Gap: Empirical Research on Simultaneous Interpretation.
John Benjamins, Amsterdam, 87–99.
Korybski, T., Davitti, E., 2024. Human Agency in Live Subtitling Through Respeaking: Towards a
Taxonomy of Effective Editing. Journal of Audiovisual Translation. Special issue: Human Agency
in the Age of Technology 7(2), 1–22. URL https://doi.org/10.47476/jat.v7i2.2024.302
Korybski, T., Davitti, E., Orăsan, C., Braun, S., 2022. A Semi-Automated Live Interlingual Commu-
nication Workflow Featuring Intralingual Respeaking: Evaluation and Benchmarking. In Proceed-
ings of the 13th Conference on Language Resources and Evaluation (LREC 2022). Marseille,
France, 4405–4413. ELRA. URL https://aclanthology.org/2022.lrec-1.468/
Kurz, I., 1993. Conference Interpretation: Expectations of Different User Groups. The Interpreters’
Newsletter 5, 13–21.
Kurz, I., 2001. Conference Interpreting: Quality in the Ears of the User. Meta 46(2), 394–409.
Lee, J., 2008. Rating Scales for Interpreting Performance Assessment. The Interpreter and Translator
Trainer 2(2), 165–184. URL https://doi.org/10.1080/1750399X.2008.10798772
Li, X., Li, X., Chen, S., Ma, S., 2022. Neural-Based Automatic Scoring Model for Chinese-English
Interpretation with a Multi-Indicator Assessment. Connection Science 34(1), 1638–1653.
Liu, M., 2013. Design and Analysis of Taiwan’s Interpretation Certification Examination. In Tsagari,
D., van Deemter, R., eds. Assessment Issues in Language Translation and Interpreting. Peter Lang,
Frankfurt am Main, 163–178.
Lu, X., Han, C., 2023. Automatic Assessment of Spoken-Language Interpreting Based on
Machine-Translation Evaluation Metrics: A Multi-Scenario Exploratory Study. Interpreting:
International Journal of Research and Practice in Interpreting 25, 109–143. URL https://doi.
org/10.1075/intp.00076.lu
Mesa, A.-M., 1997. L’interprète culturel: Un professionnel apprécié. Étude sur les services
d’interprétation: le point de vue des clients, des intervenants et des interprètes. Régie régionale de
la santé et des services sociaux de Montréal-Centre, Montréal.
Moser, P., 1996. Expectations of Users of Conference Interpretation. Interpreting 1(2), 145–178.
Moser-Mercer, B., 2003. Remote Interpreting: Assessment of Human Factors and Performance
Parameters. Communicate! Summer, 3.
Moser-Mercer, B., Künzli, A., Korac, M., 1998. Prolonged Turns in Interpreting: Effects on Quality,
Physiological and Psychological Stress (Pilot Study). Interpreting 3(1), 47–64. URL https://doi.
org/10.1075/intp.3.1.03mos
Napier, J., Skinner, R., Braun, S., 2018. Here or There: Research on Interpreting via Video Link. Gal-
laudet University Press, Washington, DC, 11–35.
Ofcom, 2015a. Ofcom’s Code on Television Access Services. URL https://www.ofcom.org.uk/__data/
assets/pdf_file/0016/40273/tv-access-services-2015.pdf (accessed 06.09.2024).

Quality-related aspects
Ofcom, 2015b. Measuring Live Subtitling Quality. Results from the Fourth Sampling Exercise.
URL https://www.scribd.com/document/552210915/REPORT-2015-Measuring-Live-Subtitling-
Quality-OFCOM (accessed 06.09.2024).
Ozolins, U., Hale, S., 2009. Introduction. Quality in Interpreting: A Shared Responsibility. In Hale,
S., Ozolins, U., Stern, L., eds. The Critical Link 5. Quality in Interpreting – a Shared Responsibil-
ity. John Benjamins, Amsterdam, 1–10.
Papineni, K., Roukos, S., Ward, T., Zhu, W., 2002. BLEU: A Method for Automatic Evaluation of Machine
Translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Annual Meeting (ACL), Philadelphia, PA, 311–318. URL https://doi.org/10.3115/1073083.1073135
Pöchhacker, F., 2001. Quality Assessment in Conference and Community Interpreting. Meta 46(2),
410–425.
Pöchhacker, F., Zwischenberger, C., 2010. Survey on Quality and Role: Conference Interpreters’
Expectations and Self-Perceptions. Communicate! – A Webzine for Conference Interpreters and
the Conference Industry 53.
Rennert, S., 2008. Visual Input in Simultaneous Interpreting. Meta 52(1), 204–217.
Riccardi, A., 1998. Evaluation in Interpretation: Macrocriteria and Microcriteria. In Hung, E., ed.
Teaching Translation and Interpreting 4: Building Bridges. John Benjamins, Amsterdam, 115–127.
URL https://doi.org/10.1075/btl.42.14ric
Rodríguez González, E., 2024. The Use of Automatic Speech Recognition in Cloud-Based Remote
Simultaneous Interpreting (PhD thesis).
Rodríguez González, E., Saeed, A., Davitti, E., Korybski, T., Braun, S., 2023. Assessing the Impact of
Automatic Speech Recognition on Remote Simultaneous Interpreting Performance Using the NTR
Model. In Corpas Pastor, G., Hidalgo-Ternero, C., eds. Proceedings of the International Workshop
on Interpreting Technologies SAY IT AGAIN 2023, 1–8. URL https://acl-bg.org, https://lexytrad.
es/SAYITAGAIN2023/
Romero-Fresco, P., 2011. Subtitling Through Speech Recognition: Respeaking. St Jerome, Manchester.
Romero-Fresco, P., Martínez, J., 2015. Accuracy Rate in Live Subtitling – the NER Model. In
Díaz-Cintas, J., Baños Piñero, R., eds. Audiovisual Translation in a Global Context. Mapping an
Ever-Changing Landscape. Palgrave, London, 28–50.
Romero-Fresco, P., Pöchhacker, F., 2017. Quality Assessment in Interlingual Live Subtitling: The NTR
Model. Linguistica Antverpiensia, New Series: Themes in Translation Studies 16, 149–167.
Roziner, I., Shlesinger, M., 2010. Much Ado About Something Remote: Stress and Performance in
Remote Interpreting. Interpreting 12(2), 214–247.
Seeber, K., 2017. Multimodal Processing in Simultaneous Interpreting. In Schwieter, J.W., Ferreira, A.,
eds. The Handbook of Translation and Cognition. John Wiley & Sons, Inc.
Sellam, T., Das, D., Parikh, A.P., 2020. BLEURT: Learning Robust Metrics for Text Generation. In
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. URL
https://aclanthology.org/2020.acl-main.704.pdf
Shlesinger, M., 1997. Quality in Simultaneous Interpreting. In Gambier, Y., Gile, D., Taylor, C., eds.
Conference Interpreting: Current Trends in Research. John Benjamins, Amsterdam, 123–131.
Specia, L., Paetzold, G., Scarton, C., 2015. Multi-Level Translation Quality Prediction with QuEst++.
In Proceedings of ACL-IJCNLP 2015 System Demonstrations. Presented at the Proceedings of
ACL-IJCNLP 2015 System Demonstrations, Association for Computational Linguistics and The
Asian Federation of Natural Language Processing, Beijing, China, 115–120. URL https://doi.
org/10.3115/v1/P15-4020
Specia, L., Raj, D., Turchi, M., 2010. Machine Translation Evaluation Versus Quality Estimation.
Machine Translation 24, 39–50. URL https://doi.org/10.1007/s10590-010-9077-2
Stewart, C., Vogler, N., Hu, J., Boyd-Graber, J., Neubig, G., 2018. Automatic Estimation of Simultaneous
Interpreter Performance. In Proceedings of the 56th Annual Meeting of the Association for Computa-
tional Linguistics (Volume 2: Short Papers). Presented at the Proceedings of the 56th Annual Meeting
of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computa-
tional Linguistics, Melbourne, Australia, 662–666. URL https://doi.org/10.18653/v1/P18-2105
Tan, S., Orăsan, C., Braun, S., 2024. Integrating Automatic Speech Recognition into Remote
Healthcare Interpreting: A Pilot Study on Its Impact on Interpreting Quality. Proceedings of
Translating and the Computer 2024 (TC46), 175–191. URL https://asling.org/tc46/wp-content/
uploads/2025/03/TC46-proceedings.pdf

Tang, W., Singureanu, D., Wang, F., Orăsan, C., Braun, S., 2024. Integrating Automatic Speech Rec-
ognition in Remote Interpreting Platforms: An Initial Assessment. In CIOL Interpreters Day, Lon-
don, 16.3.2024.
Tiselius, E., 2009. Revisiting Carroll’s Scales. In Angelelli, C., Jacobson, H., eds. Testing and Assess-
ment in Translation and Interpreting Studies. John Benjamins, Amsterdam, 95–121. URL https://
doi.org/10.1075/ata.xiv.07tis
Treisman, A.M., 1965. The Effects of Redundancy and Familiarity on Translating and Repeating Back
a Foreign and a Native Language. British Journal of Psychology 56, 369–379.
Ünlü, C., 2023. InterpreTutor: Using Large Language Models for Interpreter Assessment. In Proceed-
ings of the International Conference on Human-Informed Translation and Interpreting Technology
2023. Presented at the International Conference on Human-informed Translation and Interpreting
Technology 2023, Naples, Italy, 78–96. URL https://doi.org/10.26615/issn.2683-0078.2023_007
Wang, X., Fantinuoli, C., 2024. Exploring the Correlation Between Human and Machine Evalua-
tion of Simultaneous Speech Translation. In Proceedings of the 25th Annual Conference of the
European Association for Machine Translation (Volume 1), Sheffield, UK, 327–336. URL https://
aclanthology.org/2024.eamt-1.28/
Yu, W., Van Heuven, V.J., 2017. Predicting Judged Fluency of Consecutive Interpreting from Acoustic
Measures: Potential for Automatic Assessment and Pedagogic Implications. Interpreting: Interna-
tional Journal of Research and Practice in Interpreting 19, 47–68. URL https://doi.org/10.1075/
intp.19.1.03yu
Yuan, L., Wang, B., 2023. Cognitive Processing of the Extra Visual Layer of Live Captioning in
Simultaneous Interpreting. Triangulation of Eye-Tracking and Performance Data. Ampersand 11,
100131. URL https://doi.org/10.1016/j.amper.2023.100131
Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q., Artzi, Y., 2020. BERTscore: Evaluating Text Gen-
eration with BERT. In Proceedings of the 8th International Conference on Learning Representa-
tions (ICLR 2020). URL https://iclr.cc/virtual_2020/poster_SkeHuCVFDr.html

18
ETHICAL ASPECTS
Deborah Giustini

18.1 Introduction
In recent decades, technological tools have become increasingly integrated into the interpreting profession and industry, thanks to growing computing power and the market availability of advanced audiovisual communication tools (Braun, 2019). Technological solutions now
include infrastructures that facilitate distance communication options for interpreting, as
well as online labour environments to manage the supply and demand of interpreting ser-
vices. As momentum continues to grow, stakeholders have increasing access to AI software
that is able to integrate a range of functionalities into the interpreting process, from auto-
mated terminology search to computer-assisted interpreting tools. In addition, technology
developers are driving innovation in machine interpreting through neural networks, leading
to its use in an increasing range of communication settings.
As debates on interpreting technologies come to the fore in the sector, research and
industry initiatives are aiming to assess their impact on both practices and settings. There
has been a relatively recent increase in scholarly literature that directly addresses the inter-
section between technology and interpreting, as well as the interrelation between these two
factors and a wide spectrum of cultural, social, economic, and professional issues (Drugan,
2019). In addition, interpreting researchers have themselves started developing a substan-
tial body of work relating to the ethics of interpreting technologies. This chapter reviews
the current body of literature that addresses these concerns and explores the ethical dimen-
sions of technological systems, or ‘technoethics’ (Bunge, 1975), of available interpreting
technologies. The chapter concludes by looking critically at the emerging technoethical issues that have yet to be covered in the field and, hence, require further investigation in the future.

18.2 Relevant concepts


In recent decades, scholarly literature has embraced the subject of ‘ethics’ in order to theo-
rise the changing roles and responsibilities of interpreters across various domains of pro-
fessional practice. In interpreting studies, Drugan (2019) remarks how ethics is a system


DOI: 10.4324/9781003053248-24
of customs and values, belonging to specific groups of individuals, that involves moral principles, that is, notions of what is right and wrong. More specifically, the discussion on
ethics in relation to interpreting ranges from universalist statements about general moral
principles, for example, Chesterman’s ‘Hieronymic oath’ (2001), to the recognition that
attitudes may vary according to the interpreting setting, situation, or participants’ needs.
In terms of approaches, scholarship and discourse on ethics in interpreting can be catego-
rised as follows: prescriptive (‘what interpreters ought to do’, or deontological), descriptive
(‘documenting what practitioners actually do’), and metaethical (moral reasoning in the
profession). When applied to practice, however, each category plays an interrelated role
(Dean and Pollard, 2022).
Regardless of the chosen approach, Horváth and Tryuk (2021) indicate that the main
ethical issues recognised in scholarly and professional discourse on interpreting focus on
accuracy, commitment, competence, confidentiality, impartiality, integrity, invisibility, and
transparency. Indeed, scholars highlight how ethical aspects relate to effective professional
performance in terms of an interpreter’s role, behaviour, and norms. This is illustrated in
the following quote by Drugan, taken from their work with ethicist Chris Megone:

[T]ranslation often involves impacts, direct or indirect, on oneself and others. Thus,
the question arises whether, in these impacts, one is manifesting virtues or vices (or
respecting obligations, or producing good or bad consequences), and this . . . requires
ethical reflection. In sum . . . the point of studying ethics for translators is not that
they become philosophers but that they develop good judgement.
(Drugan and Megone, 2011, 189)

This view also relates to demands for social recognition and fair working conditions and
draws attention to the fact that interpreters should operate within a framework that sup-
ports their professional contributions. To illustrate, in 2007, Hale insisted upon the necessity of having both normative obligations and ethically acceptable working conditions. Their
survey of 16 professional codes of ethics found that the most prominent ethical principles
(confidentiality, accuracy, and impartiality) were accompanied by considerations regarding
definitions of an interpreter’s role, professional solidarity, and organisation of work (Hale,
2007).
However, technological developments are profoundly affecting the contexts and ways in
which the interpreting sector operates. More widely, technological developments have been
seen to impact the roles, norms, and practices involved at all stages of interpreting service
provision. Rapid advances in digital and AI tools have triggered changes in market demand.
In turn, these have impacted the tasks and skills present in the industry (examples include
distance interpreting, machine learning, speech technology, and most recently, generative
AI). It is unclear at this stage whether these changes represent an evolution, a revolution, an
innovation, or a disruption to the industry. Similarly, one cannot yet be sure of the scale of
the impact that these advances will have on the sector (Schmitt, 2019). Thus, as interpreting
technologies continue to be present on the mainstream market and to be used by society at
large, scholars are encouraged to systematically address the ethical dimensions underlying
their usage. Examples of ethical dimensions that require systematic consideration include
questions surrounding ownership, confidentiality, accessibility, workflow integration, con-
texts, and modalities of application (Massey et al., 2024).

To this end, this discussion is well served by technoethics. As an interdisciplinary research area, technoethics leverages theories from multiple knowledge domains, including
communication, social sciences, information studies, technology studies, applied ethics, and
philosophy. Technoethics acts as a compass to help researchers navigate the ethical dimen-
sions of technological systems and to recognise that technology design and use must entail
accompanying moral and social responsibility (Bunge, 1975). Technoethics, in fact, consid-
ers both ethics and technology to be socially embedded enterprises, according to the settings
and domains under investigation. Consequently, the application of technoethics should
guide relevant areas of focus, including professions, economics, politics, globalisation, and
the regulation of these same areas. When applied to interpreting, a level of technoethical
‘sensitivity’ could help direct reflections relating to the opportunities and challenges that
advanced, emerging technologies in the field present. This requires researchers to critically
attune themselves to the problems that technologies embody. Doing so calls for research to
consider the ways that technologies impact the organisation of interpreting work, as well as
the communicative situations and settings in which technological systems are entangled. In
this spirit, this chapter will examine a range of concrete ethical questions that have arisen,
in relation to the adoption and use of a variety of interpreting technologies. In addition,
questions will be raised with regards to the underlying assumptions and values that are
driving these changes.

18.3 Types of technology and core ethical issues


Before discussing core ethical issues, this chapter will briefly introduce the concept of interpreting technologies to aid a better understanding of their features and related concerns. With
regard to organisation, the chapter leverages Braun’s (2019, 271) three-fold distinction
of interpreting technologies, as follows: technologies that support and enhance the inter-
preter’s preparation, performance, and workflow (technology-supported interpreting, that
is, computer-assisted interpreting, or CAI); technologies used to deliver and expand inter-
preting services (distance interpreting, or DI); and finally, technologies that automate the
interpreting process (technology-generated, or machine interpreting, or MI). Due to the
vast array of technologies available, the scope of this chapter is limited to introducing the
key applications that have generated some of the most significant ethical concerns relevant
to the domain.

18.3.1 Computer-aided interpreting


Computer-aided interpreting (CAI) refers to support tools, more specifically computer software that assists professional human interpretation prior to, during, and after the assignment phase (Braun, 2019). CAI tools are therefore considered 'process-oriented' technologies which, independently of modality, serve the overall goal of increasing interpreting quality and productivity (Fantinuoli, 2018, 155; see also Prandi, this volume).
Concerning the preparation (‘prior’) phase, there is a range of tools that support glos-
sary creation and terminology management – examples include InterpretBank, Interplex,
and Intragloss. As Fantinuoli (2018) remarks, the common denominator of these tools is
their ability to compile corpora through text mining – based on representative terms for a
given domain – which can be anticipated and entered into the tool by the interpreter. The
resultant corpora can then be used for terminology extraction or to elicit suggested transla-
tions for specific terms. Regarding the assignment (‘during’) phase, complex workstations
exist which address interpreting workflow. The most commonplace functionality here is
a computer-assisted search for specialised terminology and other units of interest during
an interpreting job, carried out while interpreting or assisting a boothmate (Fantinuoli,
2023). More advanced versions of this functionality include AI-enhanced CAI tools, which
attempt to automate certain components of the interpreter’s workflow (Fantinuoli, 2023).
Examples of such products include automatic glossary creation. Building a glossary automatically involves steps similar to the manual process, namely corpus creation, term extraction, term translation, and glossary evaluation, and requires a building engine whose output the interpreter can edit accordingly. Another relevant tool is the Artificial Boothmate (ABM).
ABM is an application that automatically suggests problem triggers, such as numbers and
proper nouns, in real time (Fantinuoli, 2017). The architecture of such tools is based on
numerous elements, including automatic speech recognition (ASR; transcribing speech),
large language models (LLMs; retrieving units of interest from the transcription and match-
ing them with glossaries), natural language processing (NLP), machine translation (MT),
and a user interface (displaying information to the interpreter). Finally, CAI can also assist
in the ‘after’ assignment phase, aiding in creating glossaries and improving future interpret-
ing assignments by providing the interpreters with feedback and performance insights.
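To make the architecture described above more tangible, the following minimal Python sketch mimics one step of an ABM-style tool: scanning an ASR transcript segment for problem triggers (numbers and, via a crude capitalisation heuristic, likely proper nouns) and matching it against a small glossary. The glossary entries and heuristics here are purely hypothetical; real CAI tools rely on full ASR, NLP, and language-model components rather than the simple pattern matching shown.

```python
import re

# Hypothetical interpreter-built glossary (EN -> FR), for illustration only.
GLOSSARY = {
    "carbon footprint": "empreinte carbone",
    "price dumping": "dumping tarifaire",
}

def find_triggers(segment: str) -> dict:
    """Return numbers, likely proper nouns, and glossary hits in a segment."""
    # Numbers (including decimals and digit groupings) as problem triggers.
    numbers = re.findall(r"\b\d[\d,.]*\b", segment)
    # Crude proper-noun heuristic: capitalised tokens after the first word.
    tokens = segment.split()
    proper = [t.strip(".,") for t in tokens[1:] if t[:1].isupper()]
    # Glossary lookup against the lower-cased segment.
    hits = {src: tgt for src, tgt in GLOSSARY.items() if src in segment.lower()}
    return {"numbers": numbers, "proper_nouns": proper, "glossary": hits}

cues = find_triggers("The EU reduced its carbon footprint by 4.5 percent in 2023.")
print(cues["numbers"])    # ['4.5', '2023']
print(cues["glossary"])   # {'carbon footprint': 'empreinte carbone'}
```

In a real workstation, the output of such a function would be rendered in the user interface in real time, alongside the running transcript, for the interpreter or boothmate to consult.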

18.3.2 Distance interpreting


Technological evolution has sparked opportunities to develop distance modalities of com-
munication in real time. Braun (2024, 451) defines distance interpreting as ‘any situation
in which an interpreter is connected to at least one primary communication participant
through audio and/or audiovisual communication technology’. The term is also found in
ISO standards, where it is used alongside the expression ‘remote interpreting’ (ISO/DIS
17651–3, 3.5). This chapter adopts the former term, since it more comprehensively cov-
ers all distance communication options for interpreting across configurations. Within DI,
remote interpreting (RI) indicates the modality where all primary participants are co-located
in the same venue, while the interpreter(s) work from a remote location. In RI configura-
tions, technology is thus used to connect interpreters to in-person events. Accordingly, RI
can be performed over the phone or via audio or audio-video links, depending on the event
(Braun, 2024, 452–453). Virtual interpreting (Braun, 2024; also known as teleconference
interpreting) is another modality that can be classified under DI. In virtual interpreting, pri-
mary participants are distributed across two or more locations, with the interpreters either
in one of these locations or in one or more separate locations. Virtual interpreting allows
for more complex configurations to take place, for instance, interpreters providing services
for hybrid events.
While the United Nations and UNESCO have experimented with DI since the 1970s,
with further developments made at European institutions in the early 2000s (Mouzourakis,
2006), large-scale communication via DI is relatively recent. In fact, DI only fully developed
during the COVID-19 pandemic, during which time the interpreting industry adapted to
working remotely in order to overcome the challenges of social distancing. This adaptation
was accomplished by investing in IT resources and cloud-based technological infrastruc-
tures, such as online interfaces. However, in the post-pandemic period, DI is continuing to
gain ground. It now accounts for approximately half of the market for interpreting services,
including hybrid arrangements, such as virtual interpreting (Nimdzi, 2023). DI also goes
hand in hand with the ‘platformisation’ of the industry (Giustini, 2024), where digital
labour platforms are a rapidly expanding business structure, used to direct the supply and
demand of interpreting services. Using online marketplaces such as these, clients can con-
nect to a network of registered professionals directly. These platforms are often affiliated with the software-as-a-service companies that run DI solutions through patented technologies. In
this way, interpreting can be outsourced digitally, 24/7, on demand.

18.3.3 Machine interpreting


Technological solutions designed to automate the delivery of interpreting appeared in the
1990s. However, they have had limited traction since then (Braun, 2019). Nevertheless, the
situation has changed recently, due to attempts at offering a fully automated service in lim-
ited settings, notably through machine interpreting (MI), or speech-to-speech translation.
MI refers to tools that use an automated language translation process which converts spo-
ken input into spoken output content for immediate consumption (Fantinuoli, 2023, this
volume). The most common approach to MI is the ‘cascading method’. This method com-
bines speech recognition, sequential machine translation, and voice synthesis to produce
output audio. It is worth noting that the development of MI using a direct speech-to-speech
model is on the horizon for both R&D and commercial uses, spearheaded by the Seamless
umbrella products (from Meta) and AudioPaLM (from Google).
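The cascading method can be illustrated with a minimal sketch in which three stand-in stage functions are chained in sequence. The stages here are hypothetical toys (the 'recognition' and 'synthesis' steps merely pass text through, and the 'translation' step is a word-for-word lookup), not real engines; the point is solely the composition of independent stages.

```python
# Illustrative sketch of the 'cascading' MI pipeline: speech recognition,
# sequential machine translation, and voice synthesis chained in order.

def speech_recognition(audio: bytes) -> str:
    # Stand-in: a real ASR engine would return the recognised transcript.
    return audio.decode("utf-8")  # pretend the 'audio' is already text

def machine_translation(text: str, src: str, tgt: str) -> str:
    # Stand-in: toy word-for-word lookup instead of a neural MT model.
    toy_lexicon = {("en", "es"): {"hello": "hola", "world": "mundo"}}
    table = toy_lexicon[(src, tgt)]
    return " ".join(table.get(w, w) for w in text.lower().split())

def speech_synthesis(text: str) -> bytes:
    # Stand-in: a real TTS engine would return synthesised audio.
    return text.encode("utf-8")

def cascade(audio: bytes, src: str, tgt: str) -> bytes:
    """ASR -> MT -> TTS, each stage consuming the previous stage's output."""
    transcript = speech_recognition(audio)
    translation = machine_translation(transcript, src, tgt)
    return speech_synthesis(translation)

out = cascade(b"hello world", "en", "es")
print(out)  # b'hola mundo'
```

One design consequence of this architecture is that errors propagate: a misrecognition in the first stage is translated and synthesised as-is, with no later stage able to recover the speaker's original meaning.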
While MI solutions have doubtlessly progressed, their application across multilingual
communicative events remains a ‘non-trivial’ challenge (Fantinuoli, 2023). MI exhibits lim-
ited flexibility. It performs poorly when spoken content lacks structure, fluency, and full
language encoding (Anastasopoulos et al., 2022). MI also lacks context awareness, includ-
ing that of speakers’ and interlocutors’ intentions, reactions, and non-verbal behaviours.
Finally, MI performance is also limited in terms of text coherence and lexical ambiguity
(Castilho et al., 2023). However, the rapid evolution of generative language models (e.g.
ChatGPT), which ‘learn’ complex language patterns and semantics through extensive data-
set training, may lead to significant progress in the near future and may also be incorporated into MI.

18.3.4 Related ethical issues


Generally, ethical concerns in the interpreting studies community relate to the growing role of technological developments that are now central to how interpreters work and to how the sector functions as a whole (Fantinuoli, 2018; Braun, 2019; Zetzsche, 2020; Okoniewska, 2022; Giustini, 2024). Specifically, interpreting
technologies have spurred discussions about ethical questions in four macro-areas: employ-
ment and working conditions, paired with potential automation and substitution of human
labour; data confidentiality, privacy, and ownership; accessibility and bias; and use of AI in
crisis-prone settings. For ease of discussion and clarity, this chapter will separate these areas
into subsections. However, it must be noted that many of these areas overlap, and that the
same ethical issue can be approached from intertwined viewpoints. Indeed, Drugan notes:

Many of these questions about ethical aspects of new technologies are difficult to sep-
arate from broader sociocultural issues. Technological developments have occurred
alongside, and played a part in, major ongoing shifts in social structures, migration
patterns, trade, information and employment.
(2019, 250)

Furthermore, since it would be impossible to provide a truly comprehensive overview of any technoethical issue, this chapter covers only the most pertinent questions to avoid repetition, aiming to foster a discussion of key matters. It also attempts to offer readers resources for deeper reflection on this rapidly developing area.

18.3.4.1 Working conditions, employability, and market developments


The ways in which technology affects the interpreting sector and its future have been the
object of much scholarly work of late. Due to advances in CAI, DI, and MI, research has
paid particular attention to the following issues: the ‘friend or foe’ relationship between
interpreters and the use of technology (Corpas Pastor, 2021, 23), the reduced status of
human interpreters and employment insecurity (Mellinger and Hanson, 2018), and the
dehumanisation of interpreting work that allegedly accompanies technological develop-
ments (Buján and Collard, 2023; Giustini, 2024).
One strand of research deals with technology that is deemed to pose significant ethi-
cal challenges to professional status and expertise. The results of a questionnaire study
conducted by Gentile and Albl-Mikasa (2017) suggest that new technologies impact an
interpreter’s perception of the stability of their professional status. The study reports a
widespread sense that technologies are changing communication requirements and prac-
tices. This results in fears of reduced fees and social esteem, with interpreting becoming a
‘simple commodity’ in the eyes of clients. Similarly, commenting on DI and technological
developments more generally, Fantinuoli (2018) observed that while these tools can still
successfully support professionals, the interpreting industry is faced with developments that
outsource work to unqualified individuals and justify price dumping. Due to the expansion
of global and digital information in labour markets, the ‘neoliberal drive’ behind these
practices is significantly affecting the profession, ‘from the way it is perceived by the general
public to the status and working conditions of interpreters’ (Fantinuoli, 2018, 3).
Given the spread of technology use in this domain, ethical challenges are particularly
apparent in DI, as far as working conditions and the market are concerned. Industry data
shows that DI has increased work opportunities across market segments (Nimdzi, 2023).
This has led to a so-called ‘productivity effect’, that is, a growth in labour demand due
to technological progress (Fantinuoli, 2018). Advantages of this include increased flex-
ibility and a greater work–life balance, a widening global market, enhanced sustainability
thanks to a reduced carbon footprint, and the commissioning of interpreting services by
users who would otherwise not have had the financial or logistic capabilities to hire special-
ists (Tejada Delgado, 2019; Mahyub Rayaa and Martin, 2022). However, DI has also led
to a deterioration of working conditions, as several scholars attest. In a survey exploring
professionals’ perceptions of DI in remote simultaneous mode, Buján and Collard (2023)
found that although interpreters have shown adaptation to the practice, they still evaluate
DI negatively in terms of working conditions. Specifically, the work-from-home modality entails more cumbersome access to technical support and teamwork, which results in the perception that DI jobs are more challenging than on-site ones. Another factor that
is raised regularly in the study deals with the ‘on-demand’ structure of DI. Interpreters

Ethical aspects

fear being seen as an interchangeable pool of workers who can simply be ‘contracted on and off’ at short notice. This raises ethical implications for interpreters’
visibility by users and risks increasing the ‘depersonification’ effect of the service provider.
As a result, interpreters fear that users and clients may end up considering interpreters as
‘a plug-and-play feature of RSI [remote simultaneous interpreting] platforms’ (Buján and
Collard, 2023, 146, author’s addition).
A cognate dimension relates to the outsourcing of DI through online marketplaces and
applications, especially in the private sector. This phenomenon falls into the gig economy,
where digital labour platforms mediate supply and demand for interpreting services (Gius-
tini, 2024). In this environment, work is rigidly monitored and commissioned through algorithmic management, that is, algorithms that allocate and intermediate interpreting work on the basis of big data. In turn, platforms unilaterally impose restrictions on interpreters’ working conditions in areas such as completion time, labour price,
and expectations of constant availability. For example, jobs are often offered to the first
or lowest bidder. In turn, the platforms take the lion’s share of profits, with detrimen-
tal consequences to interpreters in terms of underpayment and unfair trade competition.
Baumgarten and Bourgadel (2024) comment on this phenomenon by linking the lack of
ethics present within the neoliberal system of digital capitalism in the language industry to
the pressures of competing in a globally interconnected sector that is destined to generate
profits and shareholder value. As the authors remark, the digital economy of language ser-
vices favours production values, such as efficiency, speed, quantification, and cost saving,
in its pursuit of commercial profit. However, the authors remain doubtful about whether
the industry’s relentless drive towards ever-enhanced optimisation through platforms is the
right path towards ethically sustainable work.
Scholarship has also reported on interpreters’ fears about their labour substitution that
have been brought about by the arrival of neural networks and AI. Interpreters are said to
have ‘automation anxiety’ (Vieira, 2020, 9). However, a distinction must be made between
MI and CAI in this context. CAI denotes a group of supporting technological solutions. Corpas
Pastor (2018) and Prandi (2023) show that attitudes to CAI tools are not entirely negative.
For instance, real-time terminology support can lead to increased efficiency and improved
performance. Generally speaking, interpreters feel they retain control over their workflow while using CAI, despite the need to upskill when integrating such tools into their set of competencies. The application of these technologies mainly raises ethical questions
with regard to the interpreter’s cognitive load. Interpreters lament the tools’ time-consuming nature and potential for distraction, as well as the perceived lack of sophistication, intuitiveness, and user-friendliness of currently available software. As a result, ethical issues remain largely confined to perceptions of software quality and the multitasking demands deriving from its use.
In contrast, the impact of MI on the market and employment prospects is potentially
more marked than that of CAI. Fantinuoli envisages a near future in which MI will enter
the low-end segment of the market, that is, ‘areas which are less prestigious, critical or
sensitive’ (2018, 12). This strategy would ostensibly offer ‘acceptable performance’ and
capture customers in exchange for economic savings and larger service availability. Profes-
sional human interpreting would survive at the higher end of the market, ‘at least until the
advent of real human-like MI’ (p. 345). This is a common argument revolving around the
tech-driven polarisation of skills (Acemoglu and Autor, 2011). When automation is intro-
duced in a sector, labour tends to split into jobs at the bottom, requiring lower skill level,


and jobs at the top, requiring greater skill level. This polarisation is not free from ethical
implications, which include potential job displacement in certain market sectors and reten-
tion in others. This raises concerns about employment security and inequalities among
interpreters and interpreting settings, according to perceptions of rank, specialisation, and
importance.
Lastly, advances in MI have ignited the debate from another angle: whether automated technologies will replace interpreters along the lines of conduit versus situational interpreting expertise. MI relies on algorithms and data processing to simulate human cognitive functions. This, in turn, allows it to generate automated renditions in real time, incrementally, on the basis of only partial input. Some scholars claim that this process is incapable of
replacing human interpreters (Ortiz and Cavalli, 2018; Corpas Pastor, 2018, 2021) because,
despite such affordances, MI cannot yet capture linguistic variation, non-verbal communi-
cation, or emotions. In other words, at present, MI lacks problem-solving capabilities and
situational reactions. However, Fantinuoli and Dastyar (2021) suggest that relying on these
dimensions as selling points for the superiority of human interpreting results in an uncon-
vincing argument. They state that organisations and end users require clearer explanations
of the higher value of professional human interpreting to convince them to prefer this
service over MI. Giustini and Dastyar (2024) add that another pressing ethical question in
this domain relates to whether customers and businesses are likely to prioritise perceptions
of human expertise, emotional nuances, and service quality over availability and price. The
signalling of expertise, quality, and professionalism is, in fact, afforded not only by inter-
preters but also by consumers. To this extent, users may favour MI if they view automated
tools as being capable of providing at least an ‘acceptable’ level of service delivery. It is
critical to recognise that the acceptability of MI is also highly dependent on the interpreting
context. Different settings – such as medical, legal, diplomatic, or business – present varying
levels of complexity and ethical implications. In high-stakes environments, such as medical
or legal settings, accepting lower quality or the potential for error resulting from limita-
tions of MI could have serious consequences and involve significant ethical trade-offs. This
raises additional questions that service providers should transparently address: To what
extent are customers, especially those in critical settings, informed about the risks of using
automated technology? How is the potential for lower quality or miscommunication result-
ing from MI being communicated to them? Moreover, should service providers be held
responsible for clarifying when MI is suitable and when it is not? Taking responsibility in
this area includes ensuring that users understand how MI may suffice for low-risk, routine
tasks (e.g. booking an appointment at a hospital’s reception desk), but disclosing how it has
limitations in more nuanced or high-pressure situations (e.g. medical consultations). These
actions must be taken to avoid context-specific risks and misleading consumers. In turn, in lower-risk settings, such as routine corporate webinars or product demonstrations, MI
may be deemed more acceptable, despite inaccuracies or lack of nuance. However, even
in these cases, it remains crucial for service providers to communicate the potential for
reduced quality clearly, so that clients can give informed consent. Otherwise, providers and users across settings may inadvertently normalise risks and unclear standards as
‘acceptable’ for certain types of communication. This would raise broader ethical questions
about the long-term impact of MI on the groups involved.
Finally, it is worth noting that speech translation enhancements are developing rapidly,
introducing more features to appeal to a wider and more varied range of potential clients. For
instance, in May 2024, Microsoft Azure AI Speech launched a language detection tool


which switches between supported languages in the same audio stream. This eliminates the need for developers to specify particular input languages and integrates custom translations tailored to a client’s domain-specific vocabulary. With these capabilities, companies can attract broader audiences and improve user experience. The process may still sidestep considerations of quality, but the extent to which these ethical concerns will affect the future of interpreting remains uncertain at this stage, because the full potential of MI is still unfolding.

18.3.4.2 Confidentiality, privacy, and ownership


Drugan and Babych’s (2010) research opens another discussion of ethics and technologies. The authors write about a practical issue arising from translation memories: the question of database ownership. In this context, ethical issues arise when companies perform data scraping or reuse translation output to train AI systems for commercial
purposes, without ascertaining the intentions of their original owner. This also means that
owners, from translators to linguistic communities, lose the ability to control how their data
is reused and receive relevant compensation. Furthermore, Lewis and Moorkens (2020)
observe that, in many cases, it is difficult to ensure the legal rights and copyright owner-
ship of target texts and aligned databases. This is because corporate practices often make
it challenging for owners, such as interpreters, to assert their rights. Additionally, Forcada
notes that although owners should receive compensation for unintended use of their works,
currently, ‘recoverability is virtually impossible in NMT [neural machine translation] out-
put’. This is because training data is ‘broken down to the level of words, subwords or even
characters’ (2023, 63, author’s addition). Thus, the input of an individual or a community
is virtually unidentifiable, and their overall contribution largely undetectable.
On the same subject, Horváth (2022) highlights questions relating to copyright. Copyright
is a complex issue, linked to legal considerations. However, legal systems vary worldwide.
This makes it difficult to harmonise the transnational nature of interpreting technologies.
Van de Meer (2021, cited in Horváth, 2022) notes that language data copyright in the
United States and the EU usually applies to complete or partial works (documents, sections,
products), more so than to individual segments, unless these have a distinctive creative
value, such as a line from a poem. However, the International Association of Conference
Interpreters (AIIC) details that ‘no interpreter may be recorded without his/her knowledge
and without his/her consent’ since ‘the performance of the conference interpreter becomes a
translation within the meaning of the Berne Convention [for the Protection of Artistic and
Literary Works]’ (AIIC 1999, 1–3, author’s addition). In other words, the interpretation is
considered an adaptation of a speaker’s original linguistic work, and thus the interpreter’s
own intellectual creation. Therefore, any recording must be secured by third parties with
the contractual consent of interpreters and a corresponding right of use (licence). It remains
uncertain whether wealthy technology companies hold permission rights or pay royalties for data deriving from interpretations for commercial use.
Another problem relates to confidentiality and privacy of information. According to
Horváth (2022), interpreting technologies may entail privacy risks, of which users may
be unaware. For example, the data processed by an MI service may be kept and reused
by companies as training data. As a result, when sensitive information is input, it can be
inadvertently exposed to unauthorised parties. This leads to potential breaches of confiden-
tiality. Since the availability and capability of technological tools are expanding rapidly,
so, too, is the amount of inadvertently disclosed sensitive data. This increases the need for


safeguarding personal or proprietary data, particularly in sensitive interpreting settings, such as business, healthcare, and law. Despite existing regulations, such as the European
Union’s General Data Protection Regulation (GDPR 2016/679), the aggregation of ‘anony-
mous’ data circulating in interpreting technologies adds to the argument that data subjects
have limited rights and privacy over its use (Horváth, 2022, 9).
At this juncture, prevailing social discourses praise the ostensibly universal availability
of free speech translation services, portraying it as a solution which benefits the public
(Vieira, 2020). However, emerging reflections point towards the concentration of such
resources in the hands of corporations. While large tech companies contribute to field
advancements in natural language processing (NLP) through funding, sponsorship, and
R&D, they also own and control the necessary computing infrastructure (such as data
centres) to deploy large-scale AI systems into consumer markets (Abdalla et al., 2023).
Presently, all state-of-the-art speech translation tools are proprietary products of either a
small number of large tech companies or of language industry corporate leaders that have
resources or receive private equity investment (Nimdzi, 2024). Relying on a limited group
of corporate actors creates systemic risks for the independence of AI research initiatives.
This can lead to a concentration of expertise among a few dominant players and skew
industry research to align with corporate agendas rather than with broader ethical consid-
erations (Abdalla et al., 2023; Kak et al., 2023). This concentration also raises questions
about corporate ability to shape and catalyse the development of AI, de facto defining the
technology’s future.
Some meaningful alternatives do exist. These include open-source AI (Van Dis et al.,
2023), which can offer transparency, reusability, and extensibility. However, it does not
address the problem of concentrated market and research power. Regulation can help miti-
gate these matters and protect the interests of the public. However, as Kak and colleagues
(2023) note, government policy often entrenches big tech power, as it leverages access to
political clout. Therefore, both transparency mandates and liability regimes should be put
into place to ensure that AI infrastructures in corporate models and applications remain
ethical. This will naturally require broad approaches that study the political-economic
forces at play, in combination with interpreting technology.

18.3.4.3 Accessibility, bias, and inclusion


A related point emerging from the literature is that automated interpreting has the potential
to simultaneously increase and decrease language access. As Drugan and Tipton (2017)
state, the proliferation of technological agents is ‘a double-edged sword’ for the profession.
On the one hand, technologies can enable participatory cultures – providing a platform for
individuals and organisations to engage more effectively in multilingual environments. In
this instance, technologies can empower ‘citizen translators’ (Drugan and Tipton, 2017,
121) to handle linguistic uncertainty. On the other hand, this is not without risk, espe-
cially when technologies are deployed to solve interlingual crises in settings such as courts
and hospitals. If institutional interactions favour expedient, imperfect tech solutions over
human input, the public needs to be informed about how to manage interlingual media-
tion ethically. These considerations were recently echoed by the Stakeholders Advocat-
ing for Fair and Ethical AI in Interpreting (SAFE AI) Task Force’s survey on automated
interpreting (Pielmeier et al., 2024, Ch. 8), where 73% of end user respondents noted
that the highest-ranked advantage of automated interpreting was its on-demand access and


round-the-clock availability. The next most cited benefits (58%) were ‘no need to schedule
an interpreter’ and ‘lower costs’. Nonetheless, respondents also noted that these benefits
were not necessarily assured in real deployments, due to AI’s limited language availability
and the potential high costs of errors offsetting savings.
In turn, Bowker and Buitrago Ciro (2019, 9) describe how the availability of data in
dominant languages, coupled with the ease of access to online tools, means that an increas-
ing amount of material is being translated, both into and out of these languages. However,
the dominance of English and European languages in AI training data excludes almost
1.7 billion people and only minimally represents over 2.3 billion more. Just ten languages comprise over
85% of training data (De Palma and Lommel, 2023). Since AI systems tend to be developed
for commercial purposes, lesser-known languages that cannot generate adequate profit are
excluded. AI systems also rely on extensive written text availability. This means that Indig-
enous languages, which exhibit greater oral traditions and lack extensive written corpora,
are left behind. Thus, developers tend to prioritise the application of LLMs for dominant
languages, as they exhibit more consistent corpora and algorithmic performance. However,
this results in the undertraining of languages of lesser diffusion, which, in turn, contributes
to low algorithmic performance in AI systems and further exacerbates the underrepresenta-
tion of social groups affiliated with these languages.
Along with underrepresentation, AI systems perpetuate bias in their training corpora.
Bias refers to systematic and disproportionate prejudice against a group and often results in
unfair judgment and treatment. To illustrate, research has identified gender bias (defaulting
to masculine forms and stereotypical associations) (Vanmassenhove et al., 2021), racial bias
(Wolfe and Caliskan, 2021), and political bias (Rozado, 2023). Since AI systems require
large amounts of data for quality outputs, they are programmed to identify and prioritise
patterns in texts. Societal biases embedded in texts, combined with the underrepresentation
of more advanced discourses in training corpora, lead to their perpetuation in interpret-
ing technologies (Monzó-Nebot and Tasa-Fuster, 2024, 9). These studies have been com-
pounded by further empirical evidence. The SAFE AI survey (Pielmeier et al., 2024, Ch. 3,
9) indicates that 81% of respondents (industry stakeholders, service providers, and service
recipients) worry about the detrimental effects of AI on the quality of language access.
In particular, respondents fear that AI could introduce biases and lead to discrimination,
which could degrade interpreting quality in critical areas, such as healthcare. Meanwhile,
Ren and Yin (2020) raise an adjacent ethical consideration: accountability. Accountability
presents a significant challenge because, unlike providers, users, or organisations, AI cannot
assume legal liability, demonstrate professional responsibility, or face consequences for its
interpretations. However, interpretation errors caused by AI can lead to harmful outcomes
or dangerous decisions and create human, legal, or professional risks. The question as to
how ethically desirable relationships and responsibilities can be developed in order to pre-
vent potential harm within the limits of compliance, reporting, and enforcement is yet to
be clarified.
Another less widely discussed matter relates to the uneasy balance between accessibility and the lack of inclusion of minority and Indigenous communities in technology
training. This problem extends to the methods that are used to harvest data for AI sys-
tems, particularly in NLP, which rely on datasets of previously translated sentences to
train models between different languages. For low-resource languages to be adequately
trained, automated web crawling is employed to gather larger datasets. However, domi-
nant languages are used to pivot engines for low-resource languages. Not only does this


limit output quality, but it also raises concerns over
the exclusion of native speakers in development and validation processes. In this context,
speaker involvement refers to participation in contributing data, review of output, and
provision of culturally and linguistically relevant feedback. This perspective on inclusion
is championed by a variety of global and academic actors. For example, according to the
IBM Artificial Intelligence Pillars, ‘inclusion’ means working with diverse development
teams and seeking out the perspectives of minority-serving organisations and impacted
communities in AI systems design (IBM 2023). UNESCO’s ethics of AI (2021) indicate
that states should work to ensure the ‘respect, protection and promotion of diversity and
inclusiveness . . . throughout the life cycle of AI systems . . . by promoting active partici-
pation of all individuals or groups’ (p. 6), including in AI development (p. 8). UNESCO
also states that in an AI life cycle, ‘measures should be adopted to allow for meaningful
participation by marginalized groups, communities and individuals and, where relevant,
in the case of Indigenous Peoples, respect for the self-governance of their data’ (p. 10).
Scholars such as Mager et al. (2023), García González (2024), and Ghosh and Chat-
terjee (2024) recommend adopting a human-centred approach toward MT and NLP to
avoid data extractivism and digital colonialism that may otherwise affect communities.
This involves the upliftment of low-resource languages, which can be achieved by encouraging the active participation of community members with relevant lived experience in the various contexts in which the language is used. As a result, native speakers can inform both
data collection and AI training. Therefore, speakers’ inclusion can ensure that both the
input (training data) and output (translations) align with their ways of communicating,
values, and beliefs and avoid misrepresentation. While speakers’ direct involvement is not
always feasible, it remains essential that relevant communities are consulted, represented,
and empowered to ensure that AI systems uphold their linguistic and cultural integrity.
A potential way forward includes collaborations with community members, representa-
tives, linguists, and experts who are able to advocate for their needs, and who can avoid
perpetuating the very power imbalances that technology should be minimising.
In this regard, Mager et al. (2023) discuss ethical considerations for MT for Indigenous languages. They argue that NLP risks being weaponised as a political and ideological instrument of power, influencing the culture of minorities as a means of control
and expropriation. In the same vein, Tymoczko highlights the issue of foreignisation, the
flooding of vulnerable communities and ‘subaltern’ cultures with ‘foreign materials and
foreign language impositions’ (2006, 454). Borrowing Tymoczko’s argument, Mager et al.
(2023) suggest that the ethical implication in this context relates to the encoding of colonial
domination in MT. To avoid such risks, in the same study, the authors involved members
of the Aymara, Chatino, Maya, Mazatec, Mixe, Nahua, Otomí, Quechua, Tenek, Tep-
ehuano, Kichwa of Otavalo, and Zapotec communities while researching ways to inform
technological advances. Their study revealed that while participants expressed interest in
having MT systems for their own languages, they were also concerned about the commer-
cial usage and knowledge ownership of the output. These concerns centred on the misuse
of cultural, religious, health, and mercantile matters, and that distortions, appropriations,
and attempts at standardisation would result, which could be used by corporations wishing
to profit from technological sovereignty and data ownership. Respondents called instead
for quality checks to be carried out by community members, for the right to control the
knowledge shared, and for licensing of the final datasets to be central instruments for ethi-
cal decision-making.


Overall, this corpus of studies suggests that the ethics of interpreting technologies are rife
with epistemological implications. In this regard, Monzó-Nebot and Tasa-Fuster (2024)
draw attention to the bigger picture of the linguistic hegemony that is reproduced by auto-
mated interpreting technologies, such as MI, and their impact on systems of knowledge.
As the authors point out, certain concepts and representations may not be readily avail-
able in English and other dominant languages. The original text/speech may also be struc-
tured according to encoded beliefs and normative expectations that are unfamiliar to those
operating in dominant languages. In other words, the source text may be embedded in an
epistemological paradigm that cannot be decoded effectively in an English equivalent. As a
result, this process may tame knowledge structures into those that are more aligned with a
Western-centric, hegemonic worldview, in a process that Bennett (2013, 171) dubbed ‘epis-
temicide’. Similarly, Měchura (2015) observed that, while technology is often seen as a tool
to ‘overcome’ the challenges posed by linguistic diversity, in minority-language settings the
goal is often to do the opposite: to preserve and reinforce diversity. Měchura also warns
that lack of attention to cultural-linguistic nuances in training AI systems results in the
original content, authored in the minority language, being allowed to ‘escape’ as it is assimi-
lated into the knowledge frameworks of dominant languages. To safeguard against this, it
is crucial to protect and diversify the epistemological frameworks embedded in AI systems.

18.3.4.4 Technology use in crisis-prone settings


Technology has gained a foothold in various public service settings, including courts,
healthcare, and asylum-seeking centres. Its use in such settings aims to reduce language
barriers between minority language speakers and institutional providers. Yet while these
trends point to positive phenomena, such as democratisation of access, they also highlight
a long-standing pattern of ethical issues (Stengers et al., 2023).
Many technoethical investigations have been channelled into healthcare interpreting.
Among such studies, Klammer and Pöchhacker (2021) examined the use of video remote
interpreting (VRI) to mediate clinical communication. They stressed that the VRI set-up
was the result of deliberate optimisation efforts made in this institutional context. These
optimisation efforts include guaranteeing on-demand audiovisual access to interpreters
and communicational support during language-discordant doctor–patient interactions. Yet
despite these affordances, ethical limitations remain. One key example is the competence
gap among healthcare professionals in knowing how to effectively work with interpreters.
This is exacerbated by VRI’s perceptual constraints, including restricted camera range, body
orientations, and blocked views. These factors hinder the interpreter’s ability to perceive
non-verbal cues, which complicates communication and affects interpreting accuracy. Simi-
larly, Gilbert and associates (2021) found that, while VRI offers practical and economic
benefits, including reducing the cost of the interpreter’s travel, it also raises ethical concerns,
particularly for patients experiencing cognitive impairment or dementia. The lack of body
language, eye contact, and emotional expression impedes patients’ abilities to respond to
verbal information and may affect empathic communication and diagnosis. Meanwhile, as
the COVID-19 pandemic brought about the rise of telehealth, it has become increasingly
common to see various forms of DI integrated into this process. Boéri and Giustini (2022,
2024) present the findings of an ethnographic case study conducted with Qatar-based inter-
preters during the first wave of the pandemic. The implementation of VRI and RI (audio,
audio link, and over-the-phone solutions) in medical and mental healthcare institutions


forced interpreters to navigate new ethical challenges to maintain patient interaction and
confidentiality while working from home. These ethical challenges drove interpreters to
devise alternative uses of technologies in order to uphold interpreting quality and empathy.
These include comforting patients via audio link, rearranging household spaces to guaran-
tee privacy, and assisting medical staff in the case of contingent technical issues.
In a similar vein, studies on technologies in legal settings also provide important
insights into ethical matters. The most comprehensive insights to date which relate to
VMI (video-mediated interpreting, used as a cover term for all modalities of interpreting
involving video links) in criminal proceedings were conducted by the European AVIDICUS
1 (2008–2011), 2 (2011–2013), and 3 (2014–2016) projects (Braun and Taylor, 2012;
Braun, 2013; Braun et al., 2018). Through surveys of 200 legal interpreters and 30
institutions (AVIDICUS 1, 2) and ethnographic research (AVIDICUS 3), the projects iden-
tified numerous quality and ethical issues in court and police settings. The findings
highlight a common issue: while VMI is seen as a cost-effective way to improve access to
interpreter-supported justice services, its use remains controversial. To illustrate, there is
a discrepancy between objective measures (such as interpreters' performance) and percep-
tions of the technology (see also Braun and Singureanu, this volume). In practice, VMI often
leads to reduced quality in participant interactions and greater fragmentation of discourse. In
turn, some authors (Barak, 2021; Mellinger, 2022; Russo and Spinolo, 2022) highlight
that uninformed uses of language technology can severely impact decision-making in
asylum settings. Relevant arguments generally indicate that technical issues, translation
accuracy, and interaction management directly affect credibility assessments and the
immigrant's testimony. This potentially undermines the process and could impact the outcome
of deportation hearings or the granting of refugee status. Evidence from the ground
and from language associations points to a growing number of cases in the UK, the United
States, and EU member states in which immigration systems rely on AI-powered translation
in lieu of human interpreter mediation during asylum proceedings. AI tools, when used
unsupervised, have already been seen to result in asylum rejections and the weaponisation
of small linguistic technicalities to justify deportations (The Guardian, 2023a, 2023b; The
New Humanitarian, 2020). This is particularly the case when marginalised languages are
concerned. Indeed, lack of scrutiny when embedding technologies into institutional power
dynamics and interpreting processes can compromise vulnerable populations’ human
rights in judicial settings.
Finally, Federici et al. (2023) caution against deploying automated technologies for inter-
preting in crisis contexts, such as natural disasters, armed conflicts, displacement, and health
emergencies. These situations require effective communication for humanitarian operators
to coordinate efforts and provide information in local languages. The authors emphasise
that automated technologies can expedite outreach but still require ethical scrutiny. First,
insufficient crisis planning leads to improvised responses that depend heavily on automated
tools during emergencies. These tools are often used reactively instead of being integrated
into comprehensive crisis management strategies. Such oversight can hinder the overall
effectiveness of crisis communication efforts. Second, the authors highlight technological
constraints in handling languages with limited resources and the potential for insensitivity
to cultural nuances, which further diminish the tools’ efficacy in crisis situations. Therefore,
while automation can indeed enhance crisis response, it still requires human oversight and
adequate resources to ensure its utility and mitigate risks.

Ethical aspects

18.4 Emerging issues


Although technological progress has created opportunities, it has also brought challenges
which are now catching the attention of the interpreting community and industry. The
current situation requires the integration of technologies to be renegotiated to ensure the
overall sustainability of interpreting practices. The following section directs the reader’s
attention to the technoethical issues present in areas that have been scarcely covered by
research. These warrant further investigation, as they relate to the future of the field: train-
ing, social responsibility, and the development of AI literacy.

18.4.1 Teaching of ethics and technologies in interpreter training


Given the spread of technologies, scholars and trainers are increasingly calling for
technology-related ethics to be addressed more systematically across curricula and in inter-
preting teaching research. In consideration of the rising role of specific technologies, such
as CAI, and their use in the interpreter’s preparation and workflow, Drugan and Megone
(2011) and Kenny (2019) argue for integrating ethics education across training programmes.
Recently, Ramírez-Polo and Vargas-Sierra (2023) reviewed the existing literature on ethics
and, building on this, conducted a corpus analysis of translation technology syllabi and
translation technology models to explore whether ethical competence was addressed in the classroom.
The study highlighted a lack of pedagogical intervention to assist trainees in developing eth-
ical sub-competencies related to technology use. The authors urge trainers to address this
issue by developing a series of new models and tools. Depending on curricula and course
learning objectives, these could include increased discussion of ethics in core technology
courses, as well as targeted ethical interventions related to tools used for coursework.
Relatedly, Horváth (2022) and Defrancq (2024) call for interpreter trainers to adapt
their teaching modes to the new reality of AI, at the level of instrumental skills and proce-
dural knowledge. Both authors stress the need to link training to interpreting technologies
using existing ethical norms. They argue that this will allow students to achieve the following
two goals: assess the performance of the technologies they are using, and assess the ways in
which these technologies affect their own performance, interpersonal relationships, and data
sharing. First, they suggest that students need proper training to effectively use the tools’ sup-
port while maintaining the quality of human communication. This includes learning how to
manage attention and filter out inaccurate information. Second, CAI and AI tools build on
processing capabilities and data that are owned by a small number of big tech players and
may raise confidentiality issues. Trainees should hence be informed about copyright
ownership and lawful data use (Defrancq, 2024; Horváth, 2022). Shifting attention from trainees
to trainers highlights the trainers’ role in providing ethical education for students who will
become future industry stakeholders, namely, interpreters, technology experts, and project
managers. These stakeholders of tomorrow will be devising and implementing interpreting
workflows, digitalisation, and data-based practices that will continue to shape the dynamics
of the sector, thus highlighting the need for education in ethics while they are still in training.

18.4.2 Social responsibility and ethical infrastructure


An important emerging theme that deserves the attention of various interpreting stakehold-
ers centres on social responsibility. This holds the potential to improve the current ethical
landscape. As Drugan and Tipton observe, the question arises as to how technologies can
‘bring into relief the competing tensions . . . of what constitute socially responsible working
practices . . . as an ethical goal’ (2017, 121). ‘Responsibility’, in this context, encompasses a
dynamic commitment to sustaining decision-making and value judgements. This could lead to
the enhancement of ethics, rooted in social consensus on technological development and use.
Such commitment is reflected in policymakers’ attempts at promoting AI governance mecha-
nisms, which aim to manage the ethical and social challenges automated systems pose, while
maintaining incentives for technological innovation. For instance, the European Parliament
(2023) reached agreement on the Artificial Intelligence Act (AI Act), which will enter into
application from 2026 and is the first regulation of its kind introduced by a major regulator.
The European Commission's Directorate-General for Translation has also highlighted the need
to exercise caution and judgement, urging ongoing ethical reflection on the sustainable use of
AI for language services in connection with EU initiatives such as the AI Act (Ellinides, 2023).
Alongside such attempts, sector associations are also devising ethical infrastructures. For
example, the SAFE-AI Task Force (2024), in collaboration with bodies such as the Ameri-
can Translators Association, proposes industry-wide guidelines to facilitate dialogue about
best practices for the responsible adoption of AI. These focus on questions relating to user
autonomy, safety and well-being, quality transparency, and accountability for errors. In
order to foster a reflective conversation on AI’s implications in interpreting, the Task Force
opened its guidance document for public comment, so as to receive feedback on the pro-
posed foundational ethical principles.1 By emphasising social responsibility as dynamic and
widely distributed, sector associations, groups, and organisations can create a discursive
space on the ethical implications of interpreting technologies on industry progress, linguis-
tic production, and the socio-economic order of the profession.

18.4.3 Critical AI literacy


AI is rapidly weaving itself into the fabric of competencies of the interpreting profession.
In this regard, attention should also be drawn to how human and machine expertise can
coexist at the level of tasks and skills to develop ethically desirable relationships. To the
author’s knowledge, discussions at the interface of human and AI expertise in interpret-
ing studies are only now appearing in scholarly literature. As Giustini and Dastyar (2024)
argue, the rise of AI in the interpreting industry should account for reflections on criti-
cal AI literacy (CAIL). CAIL encompasses technical proficiency and an understanding of
AI’s societal and ethical implications, especially in terms of the ability to comprehend the
(in)compatibility of automated systems with their particular application contexts. Rather
than framing AI as a threat to human expertise in terms of a simplistic 'humans vs machines'
dichotomy, CAIL recognises the potential for synergistic collaboration. This collaboration
can be achieved through carefully developing training and pedagogical initiatives for
upskilling, regulating AI through policy- and lawmaking, collaborating with industry stake-
holders, and implementing user guidelines and best practices. By confronting the presence
of AI and working towards its fair management and deployment, CAIL never loses sight
of the ethical issues at play, along with the potential biases inherent in AI systems. This is
accomplished by combining research and stakeholders’ ‘ability to critically engage with AI
technologies, question their outputs, and use them responsibly’ (p. 15). Hence, CAIL can be
considered as a crucial research theme to explore in order to make well-informed, ethically
grounded decisions about AI-enhanced human interpreting.


18.5 Conclusion and future directions


Questions at the crossroads of interpreting technologies and ethical concerns
are rapidly pushing to the forefront of academia and industry. They now require
both practical and scholarly attention and can no longer be dismissed.
This chapter denies neither the usefulness nor the viability of technological tools, and
it recognises the need for the interpreting profession to keep up with industry develop-
ments. Nevertheless, it is important to raise awareness of the associated ethical issues,
as well as the ambivalence that accompanies such tools, so as to promote their fair use.
These concerns include professional dilemmas about how technology may impact the
organisation of interpreting work. Additional concerns focus on associated questions
relating to interpreter working conditions, employability, productivity, and pay, all of
which have ethical implications, especially for interpreters as a professional community.
Alongside these concerns are anxieties about the appropriate use and limits of technolo-
gies such as CAI, DI, and MI, and related ethical worries regarding the substitution of
human labour, specifically in the case of automated forms of interpreting. Furthermore,
ethical investigations observe the ways in which corporate industry players possess power
and influence over the design and deployment of a variety of technologies. These, in turn,
raise questions of confidentiality, privacy, and ownership of data, social issues concern-
ing how LLMs can limit linguistic diversity, and how best to deal with the use of AI in
sensitive interpreting settings.
Given the rapid development of new interpreting tools and software, reinvigorated
academic attention can provide guidance on how best to assist the sector in engaging with
broad ethical questions, both practically and intellectually. Further avenues of research
are expected to open up as the field evolves. A natural expansion of the knowledge needed
in this area may come from future studies which explore the ethical implications of AI
decision-making, including establishing international standards and governance of AI
in interpreting. In addition, studies on market impact (professional identity, workflows,
quality of service, working conditions, and job tasks) warrant deeper attention so as to
provide understanding about the socio-economic and labour factors that derive from
technological implementation. To this end, the investigation of human–AI collaboration
models may offer promising avenues for assessing how technologies can enhance
and support human skills rather than replace them. Lastly, in order to achieve granular
knowledge on how AI systems influence access to interpreting services, future research
should prioritise questions relating to user trust, involvement, and acceptance, as well
as accessibility and inclusion. The potential of codes of ethics and the role of associa-
tions in directing the present and the future of digitalisation in the sector is also an
under-researched domain worth exploring.
In broadening research in this area, it is essential to integrate technoethical sensitiv-
ity to further address the moral and social responsibilities that accompany technological
advancements in interpreting. In this respect, cross-fertilisation between research and
practice remains crucial. It would be a mistake to overlook the fact that interpreting
technologies are influenced by the interests of various stakeholders. As a result, these tech-
nologies are likely to continue to evolve accordingly. It is vital that stakeholders (users and
providers of interpreting services, interpreting associations, industry and client representa-
tives, software developers, companies, and researchers) collaborate in investigating and
mitigating the ethical risks arising from misuse of technology.


Note
1 The finalised document ‘Interpreting SAFE AI Task Force Guidance on AI and Interpreting Services’
is available for consultation on the Task Force website URL https://safeaitf.org/guidance/ (accessed
22.6.2024).

References
Abdalla, M., Wahle, J.P., Ruas, T., Névéol, A., Ducel, F., Mohammad, S.M., Fort, K., 2023. The Ele-
phant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research.
ArXiv preprint arXiv:2305.02797.
Acemoglu, D., Autor, D., 2011. Skills, Tasks and Technologies: Implications for Employment and
Earnings. In Ashenfelter, O., Card, D., eds. Handbook of Labor Economics. Elsevier, Amster-
dam, 1043–1171.
AIIC, 1999. Memorandum Concerning the Use of Recordings of Interpretation at Conferences
[online]. aiic.net. URL https://aiic.net/p/58 (accessed 22.6.2024).
Anastasopoulos, A., Barrault, L., Bentivogli, L., Zanon Boito, M., Bojar, O., Cattoni, R., Currey, A.,
Dinu, G., Duh, K., Elbayad, M., Emmanuel, C., Estève, Y., Federico, M., Federmann, C., Gah-
biche, S., Gong, H., Grundkiewicz, R., Haddow, B., Hsu, B., Javorský, D., Kloudová, V., Lakew,
S., Ma, X., Mathur, P., McNamee, P., Murray, K., Nădejde, M., Nakamura, S., Negri, M., Nie-
hues, J., Niu, X., Ortega, J., Pino, J., Salesky, E., Shi, J., Sperber, M., Stüker, S., Sudoh, K., Turchi,
M., Virkar, Y., Waibel, A., Wang, C., Watanabe, S., 2022. Findings of the IWSLT 2022 Evaluation
Campaign. In Proceedings of the 19th International Conference on Spoken Language Translation
(IWSLT 2022). Association for Computational Linguistics, Dublin, 98–157.
Barak, M.P., 2021. Can You Hear Me Now? Attorney Perceptions of Interpretation, Technology, and
Power in Immigration Court. Journal on Migration and Human Security 9(4), 207–223.
Baumgarten, S., Bourgadel, C., 2024. Digitalisation, Neo-Taylorism and Translation in the 2020s.
Perspectives 32(3), 508–523.
Bennett, K., 2013. English as a Lingua Franca in Academia. Combating Epistemicide through Transla-
tor Training. The Interpreter and Translator Trainer 7(2), 169–193.
Boéri, J., Giustini, D., 2022. Localizing the COVID-19 Pandemic in Qatar: Interpreters’ Narratives
of Cultural, Temporal and Spatial Reconfiguration of Practice. The Journal of Internationalization
and Localization 9(2), 139–161.
Boéri, J., Giustini, D., 2024. Qualitative Research in Crisis: A Narrative-Practice Methodology to
Delve into the Discourse and Action of the Unheard in the COVID-19 Pandemic. Qualitative
Research 24(2), 412–432.
Bowker, L., Buitrago Ciro, J., 2019. Machine Translation and Global Research: Towards Improved
Machine Translation Literacy in the Scholarly Community. Emerald.
Braun, S., 2013. Keep Your Distance? Remote Interpreting in Legal Proceedings: A Critical Assess-
ment of a Growing Practice. Interpreting 15(2), 200–228. https://doi.org/10.1075/intp.15.2.03bra
Braun, S., 2019. Technology and Interpreting. In O’Hagan, M., ed. Routledge Handbook of Transla-
tion and Technology. Routledge, London, 271–288.
Braun, S., 2024. Distance Interpreting as a Professional Profile. In Massey, G., Ehrensberger-Dow,
M., Angelone, E., eds. Handbook of the Language Industry: Contexts, Resources and Profiles.
De Gruyter, Berlin, 449–472.
Braun, S., Davitti, E., Dicerto, S., 2018. Video-Mediated Interpreting in Legal Settings: Assessing the
Implementation. In Napier, J., Skinner, R., Braun, S., eds. Here or There: Research on Interpreting
via Video Link. Gallaudet, Washington, DC, 144–179.
Braun, S., Taylor, J., 2012. Videoconference and Remote Interpreting in Legal Proceedings. Intersentia.
Buján, M., Collard, C., 2023. Remote Simultaneous Interpreting and COVID-19: Conference
Interpreters’ Perspective. In Liu, K., Cheung, A., eds. Translation and Interpreting in the Age of
COVID-19. Springer Nature, Singapore, 133–150.
Bunge, M., 1975. Towards a Technoethics. Philosophic Exchange 6(1), 69–79.
Castilho, S., Mallon, C., Meister, R., Yue, S., 2023. Do Online Machine Translation Systems
Care for Context? What About a GPT Model? In Nurminen, M., Brenner, J., Koponen, M.,
Latomaa, S., Mikhailov, M., Schierl, F., Ranasinghe, T., Vanmassenhove, E., Alvarez Vidal, S.,
Aranberri, N., Nunziatini, M., Parra Escartín, C., Forcada, M., Popovic, M., Scarton, C., Moniz,
H., eds. 24th Annual Conference of the European Association for Machine Translation (EAMT
2023). EAMT, Tampere, 393–417.
Chesterman, A., 2001. Proposal for a Hieronymic Oath. The Translator 7(2), 139–154.
Corpas Pastor, G., 2018. Tools for Interpreters: The Challenges That Lie Ahead. Trends in Translation
Teaching and Learning E 5, 157–182.
Corpas Pastor, G., 2021. Interpreting and Technology: Is the Sky Really the Limit? Proceedings of the
Translation and Interpreting Technology Online Conference, 15–24. URL https://doi.org/10.26615/978-954-452-071-7_003
Dean, R.K., Pollard, R.Q., 2022. Improving Interpreters’ Normative Ethics Discourse by Imparting
Principled-Reasoning Through Case Analysis. Interpreting and Society 2(1), 55–72.
Defrancq, B., 2024. Conference Interpreting in AI Settings: New Skills and Ethical Challenges. In
Massey, G., Ehrensberger-Dow, M., Angelone, E., eds. Handbook of the Language Industry: Con-
texts, Resources and Profiles. De Gruyter, Berlin, 473–488.
De Palma, D., Lommel, A., 2023. The Evolution of Language Services and Technology [online]. CSA
Research. URL https://insights.csa-research.com/reportaction/305013598/Marketing (accessed
28.6.2024).
Drugan, J., 2019. Police Communication Across Languages in Crisis Situations: Human Trafficking
Investigations in the UK. In Federici, F., O’Brien, S., eds. Translation in Cascading Crises. Rout-
ledge, London, 46–66.
Drugan, J., Babych, B., 2010. Shared Resources, Shared Values? Ethical Implications of Sharing
Translation Resources. In Proceedings of the Second Joint EM+/CNGL Workshop: Bringing MT
to the User: Research on Integrating MT in the Translation Industry, 3–10.
Drugan, J., Megone, C., 2011. Bringing Ethics into Translator Training: An Integrated, Inter-Disciplinary
Approach. The Interpreter and Translator Trainer 5(1), 183–211.
Drugan, J., Tipton, R., 2017. Translation, Ethics and Social Responsibility. The Translator 23(2),
119–125.
Ellinides, C., 2023. EUATC Keynote: Ethical, Sustainable Business from EU Perspective [online].
ATC-EUATC Ethical Business Summit. URL https://atc.org.uk/people-and-purpose-driven-progress-
not-perfection/ (accessed 28.6.2024).
Fantinuoli, C., 2017. Speech Recognition in the Interpreter Workstation. In Esteves-Ferreira, J.,
Macan, J., Mitkov, R., Stefanov, O., eds. Proceedings of the Translating and the Computer.
AsLing, London, 25–34.
Fantinuoli, C., 2018. Interpreting and Technology. Language Science Press, Berlin.
Fantinuoli, C., 2023. The Emergence of Machine Interpreting. European Society for Translation Stud-
ies 62, 10.
Fantinuoli, C., Dastyar, V., 2021. Interpreting and the Emerging Augmented Paradigm. Interpreting
and Society 2(2), 185–194.
Federici, F.M., Declercq, C., Díaz Cintas, J., Baños Piñero, R., 2023. Ethics, Automated Processes,
Machine Translation, and Crises. In Moniz, H., Parra Escartín, E., eds. Towards Responsible Machine
Translation: Ethical and Legal Considerations in Machine Translation. Springer, Cham, 135–156.
Forcada, M.L., 2023. Licensing and Usage Rights of Language Data in Machine Translation. In
Moniz, H., Escartín, C.P., eds. Towards Responsible Machine Translation: Ethical and Legal Con-
siderations in Machine Translation. Springer, Cham, 49–69.
García González, M., 2024. The Role of Human Translators in the Human-Machine Era: Assessing
Gender Neutrality in Galician Machine and Human Translation. In Monzó-Nebot, E., Tasa-Fuster,
V., eds. Gendered Technology in Translation and Interpreting: Centering Rights in the Develop-
ment of Language Technology. Routledge, London, 173–201.
Gentile, P., Albl-Mikasa, M., 2017. “Everybody Speaks English Nowadays”. Conference Interpreters’ Per-
ception of the Impact of English as a Lingua Franca on a Changing Profession. Cultus 10(1), 53–66.
Ghosh, S., Chatterjee, S., 2024. Misgendering and Assuming Gender in Machine Translation When
Working with Low-Resource Languages. In Monzó-Nebot, E., Tasa-Fuster, V., eds. Gendered
Technology in Translation and Interpreting: Centering Rights in the Development of Language
Technology. Routledge, London, 274–290.
Gilbert, A.S., Croy, S., Hwang, K., LoGiudice, D., Haralambous, B., 2021. Video Remote Interpreting
for Home-Based Cognitive Assessments: Stakeholders’ Perspectives. Interpreting 24(1), 84–110.
Giustini, D., 2024. “You Can Book an Interpreter the Same Way You Order Your Uber”: (Re) Inter-
preting Work and Digital Labour Platforms. Perspectives 32(3), 441–459.
Giustini, D., Dastyar, V., 2024. Critical AI Literacy for Interpreting in the Age of AI. Interpreting and
Society. URL https://doi.org/10.1177/27523810241247259
The Guardian, 2023a. Lost in AI Translation: Growing Reliance on Language Apps Jeopardizes
Some Asylum Applications [online]. URL www.theguardian.com/us-news/2023/sep/07/asylum-
seekers-ai-translation-apps (accessed 25.6.2024).
The Guardian, 2023b. Home Office to Tell Refugees to Complete Questionnaire in English or Risk
Refusal [online]. URL www.theguardian.com/uk-news/2023/feb/22/home-office-plans-to-use-
questionnaires-to-clear-asylum-backlog (accessed 25.6.2024).
Hale, S., 2007. Community Interpreting. Palgrave Macmillan, Basingstoke.
Horváth, I., 2022. AI in Interpreting: Ethical Considerations. Across Languages and Cultures
23(1), 1–13.
Horváth, I., Tryuk, M., 2021. Ethics and Codes of Ethics in Conference Interpreting. In Mikkelson,
H., Jourdenais, R., eds. The Routledge Handbook of Conference Interpreting. Routledge, London,
290–304.
IBM, 2023. IBM Artificial Intelligence Pillars. URL www.ibm.com/policy/ibm-artificial-intelligence-pillars/
(accessed 15.9.2024).
International Organization for Standardization (ISO). ISO/DIS 17651-3 – Simultaneous Interpreting.
URL www.iso.org/obp/ui#iso:std:iso:17651:-3:dis:ed-1:v1:en:term:3.5 (accessed 15.9.2024).
Kak, A., Myers West, A., Whittaker, M., 2023. Make No Mistake – AI Is Owned by Big Tech
[online]. MIT Technology Review. URL www.technologyreview.com/2023/12/05/1084393/
make-no-mistake-ai-is-owned-by-big-tech/ (accessed 23.6.2024).
Kenny, D., 2019. Technology and Translator Training. In O’Hagan, M., ed. Routledge Handbook of
Translation and Technology. Routledge, London, 498–515.
Klammer, M., Pöchhacker, F., 2021. Video Remote Interpreting in Clinical Communication: A Multi-
modal Analysis. Patient Education and Counseling 104(12), 2867–2876.
Lewis, D., Moorkens, J., 2020. A Rights-Based Approach to Trustworthy AI in Social Media. Social
Media + Society 6(3), 2056305120954672.
Mager, M., Mager, E., Kann, K., Thang, V.N., 2023. Ethical Considerations for Machine Translation
of Indigenous Languages: Giving a Voice to the Speakers. In Proceedings of the 61st Annual Meet-
ing of the Association for Computational Linguistics (Volume 1: Long Papers). Association for
Computational Linguistics, Toronto, 4871–4897.
Mahyub Rayaa, B., Martin, A., 2022. Remote Simultaneous Interpreting: Perceptions, Practices and
Developments. The Interpreters’ Newsletter 27, 21–42.
Massey, G., Ehrensberger-Dow, M., Angelone, E., eds., 2024. Handbook of the Language Industry:
Contexts, Resources and Profiles. De Gruyter, Berlin.
Měchura, M., 2015. Do Minority Languages Need the Same Language Technology as Majority Lan-
guages? [online]. URL www.lexiconista.com/minority-languages-machine-translation/ (accessed
22.6.2024).
Mellinger, C.D., Hanson, T.A., 2018. Interpreter Traits and the Relationship with Technology and
Visibility. Translation and Interpreting Studies 13(3), 366–392.
Mellinger, H., 2022. Interpretation at the Asylum Office. Law & Policy 44(3), 230–254.
Monzó-Nebot, E., Tasa-Fuster, V., eds., 2024. Gendered Technology in Translation and Interpreting:
Centering Rights in the Development of Language Technology. Routledge, London.
Mouzourakis, P., 2006. Remote Interpreting: A Technical Perspective on Recent Experiments. Inter-
preting 8(1), 45–66.
The New Humanitarian, 2020. “Translation Machines”: Interpretation Gaps Plague French Asylum
Process [online]. URL www.thenewhumanitarian.org/news-feature/2020/10/27/france-migration-
asylum-translation (accessed 25.6.2024).
Nimdzi, 2023. Remote vs Onsite Interpreting: The Post-Pandemic Equilibrium [online]. URL www.
nimdzi.com/remote-vs-onsite-interpreting-t-post-pandemic-equilibrium/ (accessed 20.6.2024).
Nimdzi, 2024. The 2024 Nimdzi 100 [online]. URL www.nimdzi.com/nimdzi-100-top-lsp/ (accessed
20.6.2024).
Okoniewska, A.M., 2022. Interpreters’ Roles in a Changing Environment. The Translator 28(2),
139–147.
Ortiz, L.E., Cavalli, P., 2018. Computer-Assisted Interpreting Tools (CAI) and Options for Automa-
tion with Automatic Speech Recognition. TradTerm 32, 9–31.
Pielmeier, H., Lommel, A., Toon, A., 2024. Perceptions on Automated Interpreting. Results of a
Large-Scale Study of End-Users, Requestors, and Providers of Interpreting Services and Technol-
ogy [online]. CSA Research. URL https://insights.csa-research.com/reports/305013618/Chapter7
Perceptionsa#r::305013618:Chapter7Perceptionsa:TheEthicsofReplacing (accessed 10.6.2024).
Prandi, B., 2023. Computer-Assisted Simultaneous Interpreting: A Cognitive-Experimental Study on
Terminology. Language Science Press, Berlin.
Ramírez-Polo, L., Vargas-Sierra, C., 2023. Translation Technology and Ethical Competence: An Anal-
ysis and Proposal for Translators’ Training. Languages 8(2), 1–22.
Ren, W., Yin, M., 2020. Conference Interpreter Ethics. In Koskinen, K., Pokorn, N.K., eds. The Rout-
ledge Handbook of Translation and Ethics. Routledge, London, 195–210.
Rozado, D., 2023. The Political Biases of ChatGPT. Social Sciences 12(3), art. 148.
Russo, M., Spinolo, N., 2022. Technology Affordances in Training Interpreters for Asylum Seek-
ers and Refugees. In Ruiz Rosendo, L., Todorova, M., eds. Interpreter Training in Conflict and
Post-Conflict Scenarios. Routledge, London, 165–180.
Schmitt, P.A., 2019. Translation 4.0–Evolution, Revolution, Innovation or Disruption? Lebende
Sprachen 64(2), 193–229.
Stakeholders Advocating for Fair and Ethical AI in Interpreting (SAFE-AI), 2024. Interpreting SAFE
AI Task Force Guidance (Ethical Principles) AI and Interpreting Services [online]. URL https://
safeaitf.org/guidance/ (accessed 25.6.2024).
Stengers, H., Lázaro Gutiérrez, R., Kerremans, K., 2023. Public Service Interpreters’ Perceptions
and Acceptance of Remote Interpreting Technologies in Times of a Pandemic. In Corpas Pastor,
G., Defrancq, B., eds. Interpreting Technologies – Current and Future Trends. John Benjamins,
Amsterdam, 109–141.
Tejada Delgado, A., 2019. Is the Public Sector Interpreting Market Ready for Digital Transformation?
Revista Tradumática 17, 88–93.
Tymoczko, M., 2006. Translation: Ethics, Ideology, Action. The Massachusetts Review 47(3),
442–461.
UNESCO, 2021. Recommendation on the Ethics of Artificial Intelligence. URL https://unesdoc.une-
sco.org/ark:/48223/pf0000380455 (accessed 15.9.2024).
Van der Meer, J., 2021. Translation Economics of the 2020s [online]. Multilingual. URL https://
multilingual.com/issues/july-august-2021/translation-economics-of-the-2020s/ (accessed 25.6.2024).
Van Dis, E., Bollen, J., Zuidema, W., Van Rooij, R., Bockting, C.L., 2023. ChatGPT: Five Priorities
for Research. Nature 614(7947), 224–226.
Vanmassenhove, E., Emmery, C., Shterionov, D., 2021. Neutral Rewriter: A Rule-Based and
Neural Approach to Automatic Rewriting into Gender-Neutral Alternatives. arXiv preprint
arXiv:2109.06105.
Vieira, N.L., 2020. Automation Anxiety and Translators. Translation Studies 13(1), 1–21.
Wolfe, R., Caliskan, A., 2021. Low Frequency Names Exhibit Bias and Overfitting in Contextualizing
Language Models. arXiv preprint arXiv:2110.00672.
Zetzsche, J., 2020. Freelance Translator’s Perspectives. In O’Hagan, M., ed. Routledge Handbook of
Translation and Technology. Routledge, London, 166–182.

19
COGNITIVE ASPECTS
Christopher D. Mellinger

19.1 Introduction
Researchers have been interested in cognitive aspects of interpreting since the inception of
the discipline. In part, the emphasis on interpreter cognition stems from the work of psycholinguists,
who viewed interpreting as a form of extreme language processing that could provide insights
into how the bilingual brain functions with multiple languages at the same time. These stud-
ies were largely eschewed by practising interpreters at that time, particularly since the exper-
imental control required to understand cognition failed to account for the situated nature of
interpreting, which enables cross-language communication among multiple parties. Conse-
quently, practising interpreters-turned-researchers – that is, practisearchers – began to com-
plement these studies (Gile, 2000), seeking to balance laboratory-conducted experimental
studies with more professionally informed, context-dependent examinations of interpreting
(for an overview, see Mellinger, 2024; Pöchhacker and Shlesinger, 2002). Edited collections
on interpreting technologies now bring together research from both traditions to examine
interpreting as a product, a process, and a service (e.g. Fantinuoli, 2018; Jiménez Serrano,
2019; Mellinger and Pokorn, 2018; Napier et al., 2018).
Implicit within this disciplinary backdrop is the development and use of interpreting
technologies which have been present since some of the earliest studies on interpreting. For
instance, simultaneous interpreting as a professional practice for spoken language interpret-
ing arose from technological advances that allowed audio equipment to facilitate source
language listening at the same time as target language production (Diriker, 2010; see also
Seeber, this volume). If we consider a broader definition of technologies to encompass any
tools or equipment that enables specific professional practices, then note-taking involving
paper and pen, tablet computers, or digital pen technologies is a commonplace feature of
interpreting practices in both community and conference settings (Ahrens and Orlando,
2022; Goldsmith, 2018). Still, more recent technological advances that facilitate distance
interpreting, be it via telephone (Braun, 2019, 2024; see also Lázaro Gutiérrez, this vol-
ume) or videoconference technologies (Braun, 2019, this volume), are part and parcel of
professional interpreting practices. The same holds true for technologies that disseminate
interpreting services via television (e.g. Wehrmeyer, 2015), video streaming (e.g. Picchio, 2023), or speech-to-text broadcasting (e.g. Romero-Fresco, 2023; see also Davitti, this volume), such that technology remains an ever-present aspect associated with interpreting as
a practice, as a process, and as a service. Moreover, AI-driven technologies are increasingly
integrated into multilingual communication workflows, blurring the interface between
interpreters and the technologies used to support their work (Fantinuoli, 2023, this vol-
ume; Horváth, 2022).
Given the ubiquitous nature of interpreting technologies, there is an increasing need
to understand how technologies influence interpreter cognition. Interpreting is a cogni-
tively demanding task without the use of technology; however, technology-mediated or
technology-supported communication adds another dimension for which we must account
in order to understand interpreter behaviour and cognitive processing during the interpreting task. As such, the primary focus of this chapter is on the intersection of technology and interpreter cognition, highlighting studies that have explored various cognitive aspects of interpreting as they relate to technology use in the practice and process of interpreting (Section 19.2). These studies are categorised based on how technologies either enable interpreting (Section 19.2.1) or support the interpreting process (Section 19.2.2). Mention is also made of the extent to which technology can provide cognitive benefits during interpreter training (Section 19.2.3). Then, several key topics associated with technology use are reviewed, including cognitive load and cognitive effort (Section 19.3.1), cognitive ergonomics and human–computer interaction (Section 19.3.2), as well as more
situated and contextualised approaches to interpreter cognition, including 4EA cognition
and augmented cognition (Section 19.3.3). The chapter concludes with a brief discussion of
open questions in the field related to interpreting technologies and cognition, namely, big
data and interpreting ethics (Section 19.4).

19.2 Interpreting studies, technology, and cognition


Research that incorporates both cognitive aspects of interpreting and technology can be
done at various levels, ranging from research on individual differences of interpreters to
broader questions that seek to understand cognitive processes common to all interpreters
working with technology. At the level of individuals, researchers can investigate cognitive
aspects of interpreting associated with beliefs, attitudes, or perceptions about interpreting
technologies, as well as individual-level cognitive characteristics or traits that may be influ-
enced by working with technology or in digital environments (Korpal and Mellinger, 2025).
Studies that focus explicitly on individual-level differences are limited in the field, with most
studies seeking to understand these variables more broadly across groups of interpreters
working in various domains, settings, or modes of interpreting. These broader questions
interrogate technology and its relationship to the interpreting process (Mellinger, 2019),
while also providing a panoramic view of general trends of cognitive research in relation to
interpreting. In the following subsections, individual-level and group-level differences are
distinguished when possible to provide a framework when reviewing cognitively oriented
studies on interpreting and technology.
Cognitive studies on interpreting and technology can be classified based on the types of
technologies that are employed. Fantinuoli (2018) and Chen and Doherty (2025) divide
these technologies into setting-oriented and process-oriented technologies in order to emphasise how technologised settings may alter interpreter cognition or whether the technologies modulate cognitive processes of interpreting. This chapter makes a distinction in line with
Braun (2019), categorising cognitive studies on interpreting based on whether the tech-
nologies enable a new type of interpreting (e.g. remote or distance interpreting) or sup-
port the interpreter during the interpreting task (e.g. automatic speech recognition [ASR],
real-time support technologies, digital pens). Cognitive studies on technology-enabled and
technology-supported interpreting can focus on the use of technologies as the specific vari-
able of interest in comparison to non-mediated or non-supported interpreting. These stud-
ies can also approach technology and interpreting as the baseline configuration – that is,
technology is part of each condition – focusing instead on other moderating variables that
occur in these technologised environments that impact the parties involved. Illustrative
examples of both types of studies are included to account for the breadth of scholarship on
interpreting, technology, and cognition.

19.2.1 Technology-enabled interpreting


Technology-enabled interpreting is perhaps most readily associated with remote and other
modalities of distance interpreting, in which specific technologies enable interpreter-mediated
encounters across a distance wherein some or all of the parties are not co-located with others involved in the communicative event (Braun, 2019, 2024). One of the earliest studies that addressed cognitive dimensions in remote interpreting was conducted by Roziner
and Shlesinger (2010), which addressed both affective dimensions of the interpreting task
and psychophysiological changes resulting from interpreting at a distance. This research
focused on interpreters working in the European Parliament and investigated the extent
to which technology use may influence interpreting quality alongside measures associated
with interpreter health and well-being. While the product-oriented measures seemed to
suggest relatively comparable interpreting quality in remote configurations and minimal
changes in interpreter stress, the study revealed how remote interpreting did impact the
affective dimensions of interpreter cognition that were associated with feelings of alienation
and isolation. These results run counter to Moser-Mercer’s (2005) study on simultaneous
interpreting and Braun’s (2013) study on consecutive interpreting and the role that virtual
presence may have on the interpreting task, with a deterioration in performance being asso-
ciated with the onset of fatigue as a potential result of the allocation of additional cognitive
resources when working in a remote setting. Yet Moser-Mercer’s study confirms an affec-
tive change in the remote condition, with interpreters unable to develop a sense of presence
when working in this configuration. Later qualitative studies corroborate the potential link
between the affective dimensions of interpreting and the (in)ability to develop a sense of
presence (e.g. Braun, 2020).
Given the somewhat contradictory findings of earlier studies, researchers have sought to understand whether these configurations lead to greater cognitive demands (Ziegler and Gigliobianco, 2018). Whether such demands result from working explicitly with technologies and the technical challenges that arise (Bertozzi and Cecchi, 2023; Salaets and Brône, 2023) or from working in virtual spaces (Braun, 2017; Donovan, 2023) remains unclear. However, researchers have consistently documented how technology modulates the interpreter's task and performance. As such, previous scholarship
nology modulates the interpreter’s task and performance. As such, previous scholarship
that has demonstrated a relationship between working memory capacity and interpreter
performance (Mellinger and Hanson, 2019; Wen and Dong, 2019) and that has posited a
similar relationship in dialogue interpreting (Tiselius and Englund Dimitrova, 2023) should be considered, with technology serving as a potential moderating variable. Hale et al. (2022) illustrate how this type of influence may manifest in relation to technical challenges
and self-perceived difficulty, particularly in consecutive interpreting encounters.

19.2.2 Technology-supported interpreting


A growing body of scholarship addresses how technology supports the interpreting process,
with increased attention being placed on process-oriented studies that investigate the extent
to which cognitive aspects of interpreting are altered as a result of these technologies being
used during interpreting. Mellinger (2019) describes how product- and process-oriented
studies can be used to research computer-assisted interpreting and the associated cogni-
tive processes. While not directly observable, interpreter cognition can be indirectly meas-
ured via proxy variables during both simultaneous and consecutive interpreting (Shlesinger,
2000). Such variables include physiological responses that are observed using a range of
data collection methods, including pupillometrics or galvanic skin response indicators
(Korpal and Rojo López, 2023) and measures of latent constructs documented via survey
instruments (Mellinger and Hanson, 2020).
Scholars interested in technologies and cognitive aspects of simultaneous interpreting
approach these questions from the perspective of the technological environments in which
interpreters can work as well as in relation to real-time and multiple input streams. Chmiel
and Spinolo (2022) recognise the challenges of trying to conduct this type of scholarship
in relation to virtual booths, particularly since the experimental control required to isolate
specific cognitive processes is difficult even in non-virtual spaces. Multiple data streams and
multimodal processing are an important consideration (Seeber, 2017), particularly since
there may be a point at which too much information can lead to a decrease in interpreter
performance (e.g. Saeed et al., 2023), whether as the result of cognitive dissonance, cogni-
tive saturation, or ergonomic factors associated with tool design.
More focused studies have identified potential benefits of working with specific tools,
such as SmarTerp, that seek to support interpreters in their renditions despite not illustrat-
ing a physiological decrease in stress (e.g. Olalla-Soler et al., 2023). Similarly, ASR has
been shown to have an impact on quality, although there are trade-offs that must be made
on account of the change in visual access that interpreters have during their work (Pisani
and Fantinuoli, 2021). Differences in attention allocation also influence interpreter perfor-
mance and interpreter cognition (Prandi, 2023a), demonstrating the endogenous relation-
ship between cognition and technology.
When considering consecutive interpreting, several studies have investigated individual-
level differences related to attitudes and behaviours of interpreters. These studies seek to
understand how latent constructs (e.g. attitudes toward technology adoption, use, and
implementation) might influence interpreter behaviour. For instance, Mellinger and Han-
son (2018) seek to understand how these latent constructs relate to attitudes toward inter-
preter role across medical and legal settings, identifying instances in which perspectives on
technology usage may intersect with an interpreter’s self-perception of their role during
the interpreted event. Stengers et al. (2023) address questions of technology adoption in
relation to the use of technology during the COVID-19 pandemic, along with the interpret-
ers’ acceptance of these tools. In both cases, these affective dimensions of attitude relate
to individual differences across interpreters, which, in turn, can be related to interpreter
performance. Affective dimensions related to consecutive interpreting can also arise when considering the technologies themselves, such as Goldsmith's (2018) study that focuses on
the integration of tablet computers into the practice of interpreting, or Liu’s (2018) over-
view of interpreting technologies more generally and the call for a more critical evaluation
and integration in interpreting workflows. Such calls point to the potential construct of technology literacy for interpreters (Drechsel, 2019) which, if sufficiently operationalised and established as a measurable construct, could serve as yet another individual-level difference discernible in the literature. Reactions to technology form part of cognitive explorations of interpreting, and these individual-level differences can provide a more nuanced
understanding of how interpreters interact with technologies.
Paralleling questions associated with providing visual support to interpreters working in the simultaneous mode, Chen and Kruger (2023) examine ASR that allows machine translation systems to support interpreters working in the consecutive mode. This study posits a reduction in cognitive load as a result of visual access to a potential target language rendition and also suggests that directionality plays a role when considering
technology-supported interpreting. Doherty et al. (2022) approach the question of visual attention with respect to note-taking during video remote interpreting, finding that note-taking as a concurrent activity increases shifts in attention, which in turn decreases visual attention to the speaker. Note-taking behaviours have also been examined at the level of individual differences, particularly in relation to experience, although Kuang and Zheng (2023) do not find that this variable mitigates the perception of task difficulty. Still others have sought to examine how note-taking may be indicative of metacognitive behaviour using digital pens, evidenced by omissions, hesitations, stray marks, and symbols that are produced during the
interpreting task (Mellinger, 2022b). These studies are still tentative in their conclusions,
and additional work needs to be conducted to better understand the mechanisms by which
technology alters cognitive resource allocation and interpreter behaviour.

19.2.3 Training and education


A third area associated with interpreting technologies and cognition is the training and
education of novice interpreters. Scholarship beyond translation and interpreting studies
has argued that cognitive and educational sciences can be leveraged as a means to integrate
information and communication technologies into educational settings (Gardenfors and
Johansson, 2005). Similar positions have been forwarded more generally within translation
and interpreting scholarship – for instance, Díaz-Galaz and Winston (2025) provide an
overview of how cognitive studies can serve as the foundation for research-based training
on interpreting. Adding technology into this discussion is more recent, with scholars such
as Frittella (2021) tracing an overview of computer-assisted interpreter training and how
instructional support can be provided for conference interpreting studies. Moreover, she
identifies potential gaps in the literature that could be augmented by more explicit study to
provide a research-based pedagogy associated with interpreting technologies. Some studies
have addressed certain technologies in relation to interpreter training, including specific
course configurations and technical requirements (e.g. Darden and Maroney, 2018), and on
computer simulations (e.g. Viljanmaa, 2018), but calls for additional work on interpreting
technologies remain relevant to understand how each might influence or support training.
Previous studies take up this call for a more research-based pedagogy, particularly in
relation to digital pens (Ahrens and Orlando, 2022; Orlando, 2023) and video remote interpreting platforms (Bertozzi and Cecchi, 2023; Davitti and Braun, 2020). These studies
can be augmented by research associated with metacognition and self-regulation (Aguirre
Fernández Bravo, 2019; Cañada and Arumí, 2012; Herring, 2019), particularly since these
cognitive variables have not been sufficiently addressed when working with tools. Additionally, findings from studies that focus on challenges associated with different modalities of interpreting, such as telephone interpreting (e.g. Iglesias Fernández and Ouellet, 2018; Ozolins, 2011), could be tested further beyond the affective dimensions to determine the extent to which these tools can cognitively support interpreting students or how these tools can be leveraged to enhance learning and skill-building.

19.3 Key topics and current issues


Although cognitive enquiry into interpreting and technologies is increasingly broad,
a number of topics regularly figure into contemporary studies. These include questions
associated with cognitive load and cognitive effort, as well as cognitive ergonomics and
human–computer interaction. The same could be said for greater attention being placed
on 4EA (embodied, embedded, enactive, extended, and affective) cognition and augmented
cognition (Englund Dimitrova and Tiselius, 2016; Muñoz Martín, 2016), particularly in
recognition of the situated nature of interpreting and the advent of new tools that purport
to enhance interpreters’ ability to do their work. While this is not an exhaustive list, many
of these appear to be mainstay topics that will continue to be included in these discussions,
particularly since both the professional and academic communities regularly comment on
these topics in relation to technology use in their work.

19.3.1 Cognitive load and cognitive effort


Cognitive load and cognitive effort are two distinct but related concepts that often appear
as self-reported or measured variables in studies associated with interpreter cognition and
technology. At times there are terminological and conceptual challenges associated with
the use of each of these terms in translation and interpreting studies (Gieshoff and Heeb,
2023), with cognitive effort and cognitive load being conflated or theoretically misaligned
with adopted research methods. Whereas cognitive effort is often associated with the effort expended during a given task, cognitive load is typically associated with the demands placed on a person's cognition (Gile and Lei, 2020). How these constructs are measured is of interest across a range of disciplines (e.g. Zheng, 2018), which ultimately raises questions as to how researchers working in translation and interpreting studies approach their measurement,
be it through self-report data or other physiological measures (Korpal and Rojo López,
2023; Seeber, 2013). These data collection methods are occasionally at odds with each
other, such that self-report data may not always coincide with findings derived from other
product- or process-oriented measures (Gieshoff and Heeb, 2023). A full discussion of each
of these terms lies beyond the scope of the current chapter; however, a number of excellent
overviews exist that theoretically ground this work and situate each within translation and
interpreting studies (e.g. Gile and Lei, 2020; Seeber and Amos, 2023).
Notwithstanding the epistemological debates on the definitions of these constructs, the
general concepts provide researchers with a means to recognise the complexity of the inter-
preting task, the demands placed on an interpreter’s cognition, and the effort expended to
resolve challenges encountered during task performance. The utility of these concepts can be illustrated by way of example. In the case of distance interpreting, the cognitive demands regularly encountered by interpreters may be modulated by technology as an additional variable involved during the process. The technologised environment in which interpreters work may likewise give rise to changes in the interpreting task and, as such, require interpreter cognition to adapt to account for these differences (Chmiel and Spinolo, 2022; Mellinger, 2019).
Typically, cognitive models of interpreting seek to describe the interpreting task as a pro-
cess (e.g. Gile 1991, 2009; Seeber, 2013; Seeber and Kerzel, 2012; for an overview of early
cognitive models, see Ahrens, 2025). These models provide information about various loci
of cognitive load, enabling researchers to focus on specific stages of the interpreting task to
investigate cognitive load or effort. To account for task-related characteristics that poten-
tially affect interpreter performance and cognitive load, Chen (2017) moves away from
these process-oriented models to posit a componential model of interpreter cognition. In doing so, Chen's (2017) work seeks to address technological aspects as well as physiological and affective dimensions associated with interpreter cognition, alongside questions associated with note-taking practices that arise during consecutive interpreting.
Zhu and Aryadoust (2022) take a similar approach when examining distance interpreting,
placing particular emphasis on remote interpreting and the task-related dimensions, envi-
ronment, and individual differences that may influence cognitive load. Performance indica-
tors and cognitive load have also been examined in relation to computer-aided interpreting
tools (Defrancq et al., 2024).

19.3.2 Cognitive ergonomics and human–computer interaction


Whereas cognitive load and cognitive effort are two constructs that many researchers find
of interest as part of research on interpreting, technology, and cognition, questions associ-
ated with cognitive ergonomics and human–computer interaction are more implicit in the
academic literature on interpreting technologies. Broadly speaking, cognitive ergonomics
focuses on how work occurs in the mind and how the mind can influence work (Hollnagel,
1997), all of which takes into account an understanding of the situation in which this work
is being done. In this case, the situation is broadly understood not only as a particular setting but also as the goals and constraints associated with this work, as well as the relational nature of the agents or people in a given system (Hollnagel, 1997). In translation studies,
O’Brien (2012) has argued that translation can be viewed as human–computer interaction
(HCI), which enables reflection on many of these points. Much of the argument in O’Brien
(2012) focuses on written translation; however, O’Brien does suggest that the interaction
of interpreters with computers could also be considered a form of human–computer inter-
action. This early recognition of HCI as a framework within which to study technologies
and their use in translation, and by extension, interpreting, enables researchers to examine
interpreting technologies more broadly within the context of HCI and cognitive ergonom-
ics, particularly in relation to the influence that these technologies have on the interpreting
task itself (cf. Norman and Kirakowski, 2018).
Researchers in interpreting studies have emphasised that computer-assisted interpreting
tools ought to be developed in such a manner that would be ergonomically advantageous,
rather than increasing the overall cognitive load of users (Defrancq et al., 2024). Frittella
(2023) approaches this from the perspective of usability testing, arguing that tools such as
SmarTerp have been developed using an interpreter-centred approach akin to those seen in human factors scholarship. This type of reflection is reminiscent of scholarship on information technologies in other domains, including healthcare, that seeks to understand the impact that these tools can have on performance and human factors design (e.g. Lawler
et al., 2011). Researchers working with cognitive ergonomics can leverage this framework
as a means to examine the impact that technologies have not only on individuals but also
on the systems in which they are embedded and the broader set of actors involved (Barcel-
lini et al., 2016).
The potential of ergonomics to address translator and interpreter training has been taken
up more recently, bridging both physical and cognitive ergonomic perspectives (e.g. van
Egdom et al., 2020). Seeber and Arbona (2020) explore the utility of cognitive ergonomics
in interpreting studies more explicitly, focusing on how training for simultaneous interpret-
ing can be adapted using these principles. As the authors note, their emphasis is to make
training efficient and effective, leveraging technologies to implement their training model
based on cognitively informed research. By relying on a cognitive ergonomics framework,
the work of interpreters can be further situated while allowing cognitive dimensions of the
interpreting task to be explored.

19.3.3 4EA cognition and augmented cognition


The contextualised nature of cognitive ergonomics and human–computer interaction is a
natural complement to research that adopts what is commonly referred to as 4EA cogni-
tion. While these theoretical paradigms can be used in conjunction, their initial discussions
in the field appear to have emerged independently rather than as a natural outgrowth from
either disciplinary tradition. The 4EA constellation of theoretical frameworks that address embodied, embedded, enactive, extended, and affective cognition has been examined as a potential means to study interpreting and technology (Halverson, 2021; Mellinger, 2023;
Sannholm and Risku, 2024). Much in the same way that cognitive ergonomics recognises
the task environment and the various actors involved in a specific system, embedded cogni-
tion recognises the situated nature of interpreting and the potential influence that a tech-
nologised environment may have on the task of interpreting. Mellinger (2023) also suggests
that the use of technologies can extend interpreter cognition beyond a computational-
ist view of cognition that occurs exclusively within the mind to encompass externalised
resources, such as glossaries and note-taking. In doing so, interpreting technologies can
function as an extension of an interpreter’s cognition. Externalisation of an interpreter’s
cognising activity can, in turn, be shared with other interpreters, distributing this cognition
across teams of interpreters. Interpreting technologies enable a distributed view of cognition as cognitive resources are shared among various constituencies, be they the result of the work of other interpreters (e.g. glossaries and termbases) or of automated systems, such as machine translation systems that interface with ASR (Martín de León and Fernández Santana, 2021; Mellinger, 2023).
Interpreting cognition can also be viewed from an embodied or affective dimension as
well. Milošević and Risku (2025) articulate how situational aspects of interpreting, such
as interactions with technologies and the environment, shape and intersect with interpreter
cognition. These interactions can be viewed at the macro-level, such as the organisation or positioning of specific equipment or technologies in the interpreter booth, as well as at the micro-level, taking into account sensorimotor aspects of the task and brain–body–environment interactions. Embodied cognition is particularly well suited to cognitive enquiry into interpreting and technology, as this theoretical orientation addresses human
factors and the interface that interpreters have with technologies and their environment.
Scholarship that employs physiological measures to understand cognitive aspects of inter-
preting may also fit within this broader umbrella, particularly related to reactions to stress
and affective dimensions of the interpreting task (e.g. Gieshoff et al., 2021).
More recently, augmented cognition has garnered attention from the translation and interpreting studies research community. In translation studies, researchers have argued that
translation has been an augmented activity as a result of technology usage for some time
rather than being a recent development in line with widespread usage of large language
models (O’Brien, 2024). Yet as O’Brien (2024) notes, augmented cognition has varying
conceptualisations, depending on the disciplinary traditions and theoretical frameworks
employed. For instance, augmented cognition, in a more traditional sense, understands
interactions between humans and technological systems as extending beyond an enhance-
ment provided by technology-supported tasks; rather, it relies on a ‘tight coupling between
user and computer . . . achieved via physiological and neurophysiological sensing of a user’s
cognitive state’ (Stanney et al., 2009). Rather than replacing human ability, O’Brien (2024)
suggests that human-centred artificial intelligence that can amplify intelligence is a potential
means to move forward.
Moving this discussion into interpreting studies, Prandi (2023b) has discussed how
human cognition can move beyond 4EA cognitive frameworks of distributed and extended
cognition to augment human ability using technologies. Much in the same way that O’Brien
(2024) identifies multiple approaches to understand augmentation, so too can interpreting
technologies be viewed in myriad ways. Gieshoff (2023) describes how augmented reality
is one means by which interpreters working remotely may be able to leverage technologies
to support interpreter cognition. Scholarship involving augmented cognition in interpret-
ing is nascent, yet these studies show potential means forward as a way to understand how
technology can support interpreting as a task and as a service.

19.4 Open questions for research


While interpreting studies scholarship has examined a number of technology-related phe-
nomena in relation to cognition, there are new avenues that remain to be explored that are
particularly salient in light of the rapid development and expansion of technologies in the
field. In addition to the various areas discussed in previous sections, cognitively oriented ques-
tions associated with big data and ethics are relevant topics that merit attention. While not
an exhaustive account of potential areas of exploration, the intersections of technology with
big data and ethics are important points of reflection for the research community to address.
Advances in computing and data analytics now afford researchers the ability to query very
large datasets to detect potential patterns and relationships among variables that could not
otherwise be analysed or perceived without technological intervention (Mellinger, 2022a).
Increasingly prevalent large language models illustrate how big data can be leveraged to
emulate human communication. While big data can be a useful tool for developing
machine interpreting systems and has been described as a harbinger of an augmented
paradigm of interpreting (Fantinuoli and Dastyar, 2022), the data that underlie these systems
represent distributed cognitive resources drawn from interpreters who have worked previ-
ously in specific contexts. Using big data in this manner raises ethical questions, particularly
in the area of artificial intelligence (see Giustini, this volume). As Horváth (2022) describes,

Cognitive aspects

ethical questions arise in relation to data protection and confidentiality, particularly since
leveraging data that would serve as the foundation of AI systems and potential augmented
cognitive systems may run counter to codes of ethics. Moreover, there is the potential for
the biases inherent in the data to be propagated in their subsequent use. These biases have
been recognised in large language models (Navigli et al., 2023), and as such, tool develop-
ers and researchers must develop what Giustini and Dastyar (2024) refer to as critical AI
literacy to inform interpreting practices involving these technologies. These questions seem
particularly pressing as researchers continue working at the interface of artificial intelli-
gence, augmented cognition, and large language models (O’Leary, 2022).
In addition, ethical questions loom large when considering the intersection of inter-
preting, technology, and cognition. While there are any number of noteworthy benefits
associated with the incorporation of technology into the interpreting process that can
potentially enhance interpreter performance, how these practices are implemented requires
ethical reflection.
At present, many professional standards of conduct or codes of ethics do not mention
technology. In a similar vein, there is a relative dearth of scholarship on ethical aspects of
technological integration into interpreting practices, such that greater attention ought to
be paid to these practices. Scholars beyond interpreting studies have begun to reflect on
the ethics of cognitive enhancement (e.g. Hofmann, 2017; Jotterand and Dubljević, 2016),
the scope of which continues to expand as technological advances continue. Interpreting
studies scholars may find the ethical frameworks and questions raised in relation to other
technologies useful as a starting point to contemplate ethical dimensions.

19.5 Conclusion
Several tentative conclusions can be drawn based on this review of the extant scholarship
that lies at the intersection of interpreting, technology, and cognition. First, the practice and
research of interpreting require more careful consideration of how technology influences
interpreter behaviour, attitudes, and cognition. While previous reflections may have con-
ceptualised technology as an add-on to the practice of interpreting, 4EA cognitive frame-
works and HCI research paradigms highlight the integrated nature of technology within
the interpreting process. Therefore, technology cannot as easily be treated as a stand-alone
variable that researchers ask whether its specific use or not alters interpreter cognition.
Instead, technology may need to be treated as a moderating variable that alters how cogni-
tive resources are managed or allocated, not only when interpreters are actively leveraging
technological resources to support their practice, but also when interpreters are working
without tools that are typically at their disposal. Interpreters' use of technology in regular
practice likely needs to figure among the demographic variables that researchers probe to
understand whether technology has a bearing on study findings.
Second, technology as an extension of interpreter cognition raises important questions
associated with ethics both in terms of research and practice. In addition to the considera-
tions posed in the section on open questions for research, the ethical dimensions of a tech-
nologised workplace challenge researchers to reflect on how technological advances alter
interpreting as a cognitive task. In many respects, augmented cognition remains largely
underexplored in interpreting studies, thereby necessitating more vigorous engagement
with these areas. This type of work will likely need to be collaborative in nature, bringing
together experts on interpreting technologies with those focused on cognitive interpreting
studies in an effort to address increasingly complex research questions.

The Routledge Handbook of Interpreting, Technology and AI

To conclude, it should be recognised that the study of interpreting technology and its
influence provides an opportunity to revisit our understanding of interpreter cognition
more broadly. Many studies focus on specific cognitive variables that may change when
working with technologies, but these constructs are embedded in broader cognitive mod-
els of interpreting. These models – be they computational, componential, interactional,
neurobiological, or from any of the 4EA research paradigms – seek to describe interpreter
cognition more generally, yet the technologised nature of these practices in specific settings
may precipitate the need for revision. In some cases, these models may already account
for interpreting technologies; however, the increasingly technologised workspace of inter-
preters highlights the complex interplay at the human–computer interface. In sum, more
explicit synthesis of research in these areas is likely needed to understand the relationship
between technology and interpreter cognition.

References
Aguirre Fernández Bravo, E., 2019. Metacognitive Self-Perception in Interpreting. Translation, Cog-
nition & Behavior 2(2), 147–164. URL https://doi.org/10.1075/tcb.00025.fer
Ahrens, B., 2025. Cognitive Models of Interpreting. In Mellinger, C.D., ed. The Routledge
Handbook of Interpreting and Cognition. Routledge, New York, 52–69. URL https://doi.
org/10.4324/9780429297533-5
Ahrens, B., Orlando, M., 2022. Note-Taking for Consecutive Conference Interpreting. In Albl-Mikasa,
M., Tiselius, E., eds. The Routledge Handbook of Conference Interpreting. Routledge, New York,
34–48. URL https://doi.org/10.4324/9780429297878-5
Barcellini, F., De Greef, T., Détienne, F., 2016. Editorial for Special Issue on Cognitive Ergonomics for
Work, Education and Everyday Life. Cognition, Technology & Work 18, 233–235. URL https://
doi.org/10.1007/s10111-016-0371-5
Bertozzi, M., Cecchi, F., 2023. Simultaneous Interpretation (SI) Facing the Zoom Challenge:
Technology-Driven Changes in SI Training and Professional Practice. In Proceedings of the Inter-
national Workshop on Interpreting Technologies SAY-IT 2023, Incoma, 32–40.
Braun, S., 2013. Keep Your Distance? Remote Interpreting in Legal Proceedings: A Critical Assessment
of a Growing Practice. Interpreting 15(2), 200–228. URL https://doi.org/10.1075/intp.15.2.03bra
Braun, S., 2017. What a Micro-Analytical Investigation of Additions and Expansions in Remote
Interpreting Can Tell Us About Interpreters’ Participation in a Shared Virtual Space. Journal of
Pragmatics 107, 165–177. URL https://doi.org/10.1016/j.pragma.2016.09.011
Braun, S., 2019. Technology and Interpreting. In O’Hagan, M., ed. The Routledge Handbook of Translation
and Technology. Routledge, New York, 271–288. URL https://doi.org/10.4324/9781315311258-16
Braun, S., 2020. “You Are Just a Disembodied Voice Really”: Perceptions of Video Remote Interpret-
ing by Legal Interpreters and Police Officers. In Salaets, H., Brône, G., eds. Linking Up with Video:
Perspectives on Interpreting Practice and Research. John Benjamins, Amsterdam, 47–78. URL
https://doi.org/10.1075/btl.149.03bra
Braun, S., 2024. Distance Interpreting as a Professional Profile. In Handbook of the Language Indus-
try. De Gruyter Mouton, 449–472.
Cañada, M.D., Arumí, M., 2012. Self-Regulating Activity: Use of Metacognitive Guides in the Inter-
preting Classroom. Educational Research and Evaluation 18(3), 245–264. URL https://doi.org/
10.1080/13803611.2012.661934
Chen, S., 2017. The Construct of Cognitive Load in Interpreting and Its Measurement. Perspectives
25(4), 640–657. URL https://doi.org/10.1080/0907676X.2016.1278026
Chen, S., Doherty, S., 2025. Interpreting and Technologies. In Mellinger, C.D., ed. The Routledge
Handbook of Interpreting and Cognition. Routledge, New York, 403–416. URL https://doi.
org/10.4324/9780429297533-29
Chen, S., Kruger, J.L., 2023. The Effectiveness of Computer-Assisted Interpreting: A Preliminary
Study Based on English-Chinese Consecutive Interpreting. Translation and Interpreting Studies
18(3), 399–420. URL https://doi.org/10.1075/tis.21036.che

Chmiel, A., Spinolo, N., 2022. Testing the Impact of Remote Interpreting Settings on Interpreter
Experience and Performance: Methodological Challenges Inside the Virtual Booth. Translation,
Cognition & Behavior 5(2), 250–274. URL https://doi.org/10.1075/tcb.00068.chm
Darden, V., Maroney, E.M., 2018. “Craving to Hear from You . . .”: An Exploration of m-Learning in
Global Interpreter Education. Translation and Interpreting Studies 13(3), 442–464. URL https://
doi.org/10.1075/tis.00024.dar
Davitti, E., Braun, S., 2020. Analysing Interactional Phenomena in Video Remote Interpreting in Col-
laborative Settings: Implications for Interpreter Education. The Interpreter and Translator Trainer
14(3), 279–302. URL https://doi.org/10.1080/1750399X.2020.1800364
Defrancq, B., Snoeck, H., Fantinuoli, C., 2024. Interpreters’ Performances and Cognitive Load in the
Context of a CAI Tool. In Winters, M., Deane-Cox, S., Böser, U., eds. Translation, Interpreting
and Technological Change: Innovations in Research, Practice and Training. Bloomsbury, London,
37–58. URL https://doi.org/10.5040/9781350212978.0009
Díaz-Galaz, S., Winston, E.A., 2025. Interpreting, Training, and Education. In Mellinger, C.D., ed.
The Routledge Handbook of Interpreting and Cognition. Routledge, New York, 417–437. URL
https://doi.org/10.4324/9780429297533-30
Diriker, E., 2010. Simultaneous Conference Interpreting and Technology. In Gambier, Y., van
Doorslaer, L., eds. Handbook of Translation Studies. John Benjamins, Amsterdam, 329–332. URL
https://doi.org/10.1075/hts.1.sim1
Doherty, S., Martschuk, N., Goodman-Delahunty, J., Hale, S., 2022. An Eye-Movement Anal-
ysis of Overt Visual Attention During Consecutive and Simultaneous Interpreting Modes in a
Remotely Interpreted Investigative Interview. Frontiers in Psychology 13, 764460. URL https://
doi.org/10.3389/fpsyg.2022.764460
Donovan, C., 2023. The Consequences of Fully Remote Interpretation on Interpreter Interaction and
Cooperation: A Threat to Professional Cohesion? INContext: Studies in Translation and Intercul-
turalism 3(1), 24–48. URL https://doi.org/10.54754/incontext.v3i1.59
Drechsel, A., 2019. Technology Literacy for the Interpreter. In Sawyer, D.B., Austermühl, F., Enríquez
Raído, V., eds. The Evolving Curriculum in Interpreter and Translator Education: Stakeholder
Perspectives and Voices. John Benjamins, Amsterdam, 259–268. URL https://doi.org/10.1075/ata.
xix.12dre
Englund Dimitrova, B., Tiselius, E., 2016. Cognitive Aspects of Community Interpreting. Toward a
Process Model. In Muñoz Martín, R., ed. Reembedding Translation Process Research. John Ben-
jamins, Amsterdam, 195–214. URL https://doi.org/10.1075/btl.128.10eng
Fantinuoli, C., ed., 2018. Interpreting and Technology. Language Science Press, Berlin.
Fantinuoli, C., 2023. Towards AI-Enhanced Computer-Assisted Interpreting. In Corpas Pastor,
G., Defrancq, B., eds. Interpreting Technologies – Current and Future Trends. John Benjamins,
Amsterdam, 46–71. URL https://doi.org/10.1075/ivitra.37.03fan
Fantinuoli, C., Dastyar, V., 2022. Interpreting and the Emerging Augmented Paradigm. Interpreting and Soci-
ety: An Interdisciplinary Journal 2(2), 185–194. URL https://doi.org/10.1177/27523810221111631
Frittella, F.M., 2021. Computer-Assisted Conference Interpreter Training: Limitations and Future Direc-
tions. Journal of Translation Studies 1(2), 103–142. URL https://doi.org/10.3726/JTS022021.6
Frittella, F.M., 2023. Usability Research for Interpreter-Centred Technology: The Case Study of
SmarTerp. Language Science Press, Berlin.
Gärdenfors, P., Johansson, P., eds., 2005. Cognition, Education, and Communication Technology.
Routledge, New York.
Gieshoff, A.C., 2023. The Use of Augmented Reality in Interpreting: Methodological Challenges.
Paper presented at the Fourth International Conference on Translation, Interpreting, and Cogni-
tion, Santiago, Chile.
Gieshoff, A.C., Heeb, A.H., 2023. Cognitive Load and Cognitive Effort: Probing the Psychological
Reality of a Conceptual Difference. Translation, Cognition & Behavior 6(1), 3–28. URL https://
doi.org/10.1075/tcb.00073.gie
Gieshoff, A.C., Lehr, C., Heeb, A.H., 2021. Stress, Cognitive, Emotional and Ergonomic Demands in
Interpreting and Translation: A Review of Physiological Studies. Cognitive Linguistic Studies 8(2),
404–439. URL https://doi.org/10.1075/cogls.00084.gie
Gile, D., 1991. The Processing Capacity Issue in Conference Interpretation. Babel 37(1), 15–27. URL
https://doi.org/10.1075/babel.37.1.04gil

Gile, D., 2000. Issues in Interdisciplinary Research into Conference Interpreting. In Englund Dim-
itrova, B., Hyltenstam, K., eds. Language Processing and Simultaneous Interpreting: Interdiscipli-
nary Perspectives. John Benjamins, Amsterdam, 89–106.
Gile, D., 2009. Basic Concepts and Models for Interpreter and Translator Training, revised ed. John
Benjamins, Amsterdam. URL https://doi.org/10.1075/btl.8
Gile, D., Lei, V., 2020. Translation, Effort and Cognition. In Alves, F., Jakobson, A.L., eds. The Rout-
ledge Handbook of Translation and Cognition. Routledge, New York, 263–278. URL https://doi.
org/10.4324/9781315178127-18
Giustini, D., Dastyar, V., 2024. Critical AI Literacy for Interpreting in the Age of AI. Interpreting and
Society: An Interdisciplinary Journal. URL https://doi.org/10.1177/27523810241247259
Goldsmith, J., 2018. Tablet Interpreting: Consecutive Interpreting 2.0. Translation and Interpreting
Studies 13(3), 342–365. URL https://doi.org/10.1075/tis.00020.gol
Hale, S., Goodman-Delahunty, J., Martschuk, N., Lim, J., 2022. Does Interpreter Location Make a
Difference? A Study of Remote vs Face-to-Face Interpreting in Simulated Police Interviews. Inter-
preting 24(2), 221–253. URL https://doi.org/10.1075/intp.00077.hal
Halverson, S.L., 2021. Translation, Linguistic Commitment and Cognition. In Alves, F., Jakobsen,
A.L., eds. The Routledge Handbook of Translation and Cognition. Routledge, New York, 37–51.
Herring, R., 2019. “A Lot to Think About”: Online Monitoring in Dialogue Interpreting. Transla-
tion, Cognition & Behavior 2(2), 283–304. URL https://doi.org/10.1075/tcb.00030.her
Hofmann, B., 2017. Toward a Method for Exposing and Elucidating Ethical Issues with Human
Cognitive Enhancement Technologies. Science and Engineering Ethics 23, 413–429. URL https://
doi.org/10.1007/s11948-016-9791-0
Hollnagel, E., 1997. Cognitive Ergonomics: It’s All in the Mind. Ergonomics 40(10), 1170–1182.
URL https://doi.org/10.1080/001401397187685
Horváth, I., 2022. AI in Interpreting: Ethical Considerations. Across Languages and Cultures 23(1),
1–13. URL https://doi.org/10.1556/084.2022.00108
Iglesias Fernández, E., Ouellet, M., 2018. From the Phone to the Classroom: Categories of Problems
for Telephone Interpreting Training. The Interpreters’ Newsletter 23, 19–44.
Jiménez Serrano, O., 2019. Interpreting Technologies: Introduction. Revista tradumàtica: traduc-
ció i tecnologies de la informació i la comunicació 17, 20–32. URL https://doi.org/10.5565/rev/
tradumatica.240
Jotterand, F., Dubljević, V., eds., 2016. Cognitive Enhancement: Ethical and Policy Implications in
International Perspectives. Oxford University Press, Oxford.
Korpal, P., Mellinger, C.D., 2025. Interpreting and Individual Differences. In Mellinger, C.D., ed.
The Routledge Handbook of Interpreting and Cognition. Routledge, New York, 357–372. URL
https://doi.org/10.4324/9780429297533-26
Korpal, P., Rojo López, A.M., 2023. Physiological Measurement in Translation and Interpreting.
In Schwieter, J.W., Ferreira, A., eds. The Routledge Handbook of Translation, Interpreting and
Bilingualism. Routledge, New York, 97–110. URL https://doi.org/10.4324/9781003109020-10
Kuang, H., Zheng, B., 2023. Note-Taking Effort in Video Remote Interpreting: Effects of Source
Speech Difficulty and Interpreter Work Experience. Perspectives 31(4), 724–744. URL https://doi.
org/10.1080/0907676X.2022.2053730
Lawler, E.K., Hedge, A., Pavlovic-Veselinovic, S., 2011. Cognitive Ergonomics, Socio-Technical Sys-
tems, and the Impact of Healthcare Information Technologies. International Journal of Industrial
Ergonomics 41(4), 336–344. URL https://doi.org/10.1016/j.ergon.2011.02.006
Liu, H., 2018. Help or Hinder? The Impact of Technology on the Role of Interpreters. FITISPos Inter-
national Journal 5(1), 13–32. URL https://doi.org/10.37536/FITISPos-IJ.2018.5.1.162
Martín de León, C., Fernández Santana, A., 2021. Embodied Cognition in the Booth: Referential and
Pragmatic Gestures in Simultaneous Interpreting. Cognitive Linguistic Studies 14, 363–387. URL
https://doi.org/10.1075/cogls.00079.mar
Mellinger, C.D., 2019. Computer-Assisted Interpreting Technologies and Interpreter Cognition: A
Product and Process-Oriented Perspective. Revista tradumàtica: traducció i tecnologies de la infor-
mació i la comunicació 17, 33–44. URL https://doi.org/10.5565/rev/tradumatica.228
Mellinger, C.D., 2022a. Quantitative Questions on Big Data in Translation Studies. Meta 67(1),
217–231. URL https://doi.org/10.7202/1092197ar

Mellinger, C.D., 2022b. Cognitive Behavior During Consecutive Interpreting: Describing the Note-
taking Process. Translation & Interpreting 14(2), 103–119. URL https://doi.org/10.12807/
ti.114202.2022.a07
Mellinger, C.D., 2023. Embedding, Extending, and Distributing Interpreter Cognition with Tech-
nology. In Corpas-Pastor, G., Defrancq, B., eds. Interpreting Technologies – Current and Future
Trends. John Benjamins, Amsterdam, 195–216. URL https://doi.org/10.1075/ivitra.37.08mel
Mellinger, C.D., 2024. Translation and Interpreting Process Research. In Lange, A., Monticelli, D.,
Rundle, C., eds. The Routledge Handbook of the History of Translation Studies. Routledge,
New York, 450–465. URL https://doi.org/10.4324/9781032690056-31
Mellinger, C.D., ed., 2025. The Routledge Handbook of Interpreting and Cognition. Routledge,
New York. URL https://doi.org/10.4324/9780429297533
Mellinger, C.D., Hanson, T.A., 2018. Interpreter Traits and the Relationship with Technology and Visibil-
ity. Translation and Interpreting Studies 13(3), 366–392. URL https://doi.org/10.1075/tis.00021.mel
Mellinger, C.D., Hanson, T.A., 2019. Meta-Analyses of Simultaneous Interpreting and Working
Memory. Interpreting 21(2), 165–195. URL https://doi.org/10.1075/intp.00026.mel
Mellinger, C.D., Hanson, T.A., 2020. Methodological Considerations for Survey Research: Validity,
Reliability, and Quantitative Analysis. Linguistica Antverpiensia, New Series – Themes in Transla-
tion Studies 19, 172–190. URL https://doi.org/10.52034/lanstts.v19i0.549
Mellinger, C.D., Pokorn, N.K., 2018. Community Interpreting, Translation, and Technology. Trans-
lation and Interpreting Studies 13(3), 337–341. URL https://doi.org/10.1075/tis.00019.int
Milošević, J., Risku, H., 2025. Embodied Cognition. In Mellinger, C.D., ed. The Routledge
Handbook of Interpreting and Cognition. Routledge, New York, 324–340. URL https://doi.
org/10.4324/9780429297533-24
Moser-Mercer, B., 2005. Remote Interpreting: Issues of Multi-Sensory Integration in a Multilingual
Task. Meta 50(2), 727–738. URL https://doi.org/10.7202/011014ar
Muñoz Martín, R., 2016. Of Minds and Men – Computers and Translators. Poznań Studies in Con-
temporary Linguistics 52(2), 351–381. URL https://doi.org/10.1515/psicl-2016-0013
Napier, J., Skinner, R., Braun, S., eds., 2018. Here or There: Research on Interpreting via Video Link.
Gallaudet University Press, Washington, DC. URL https://doi.org/10.2307/j.ctv2rh2bs3
Navigli, R., Conia, S., Ross, B., 2023. Biases in Large Language Models: Origins, Inventory,
and Discussion. Journal of Data and Information Quality 15(2), art. 10. URL https://doi.
org/10.1145/3597307
Norman, K.K., Kirakowski, J., eds., 2018. The Wiley Handbook of Human Computer Interaction,
2nd ed. Wiley, Malden, MA. URL https://doi.org/10.1002/9781118976005
O’Brien, S., 2012. Translation as Human-Computer Interaction. Translation Spaces 1(1), 101–122.
URL https://doi.org/10.1075/ts.1.05obr
O’Brien, S., 2024. Human-Centered Augmented Translation: Against Antagonistic Dualisms. Per-
spectives 32(3), 391–406. URL https://doi.org/10.1080/0907676X.2023.2247423
Olalla-Soler, C., Spinolo, N., Muñoz Martín, R., 2023. Under Pressure? A Study of Heart Rate and
Heart-Rate Variability Using SmarTerp. Hermes 63, 119–142. URL https://doi.org/10.7146/hjlcb.
vi63.134292
O’Leary, D.E., 2022. Massive Data Language Models and Conversational Artificial Intelligence:
Emerging Issues. Intelligent Systems in Accounting, Finance and Management 29(3), 182–198.
URL https://doi.org/10.1002/isaf.1522
Orlando, M., 2023. Using Smartpens and Digital Pens in Interpreter Training and Interpreting
Research. In Corpas Pastor, G., Defrancq, B., eds. Interpreting Technologies – Current and Future
Trends. John Benjamins, Amsterdam, 6–26. URL https://doi.org/10.1075/ivitra.37.01orl
Ozolins, U., 2011. Telephone Interpreting: Understanding Practice and Identifying Research Needs.
Translation & Interpreting 3(2), 33–47.
Picchio, L., 2023. Distance vs. Onsite (Non-) Streamed Interpreting Performances: A Focus on the
Renditions of Film Scenes. The Interpreters’ Newsletter 28, 171–188.
Pisani, E., Fantinuoli, C., 2021. Measuring the Impact of Automatic Speech Recognition on Number
Rendition in Simultaneous Interpreting. In Wang, C., Zheng, B., eds. Empirical Studies of Trans-
lation and Interpreting: The Post-Structuralist Approach. Routledge, New York, 181–197. URL
https://doi.org/10.4324/9781003017400-14

Pöchhacker, F., Shlesinger, M., eds., 2002. The Interpreting Studies Reader. Routledge, New York.
Prandi, B., 2023a. Computer-Assisted Simultaneous Interpreting: A Cognitive-Experimental Study on
Terminology. Language Science Press, Berlin.
Prandi, B., 2023b. Exploring Augmented Cognition for Real-Time Interpreter Support. Paper pre-
sented at the Second Bertinoro Translation Society Conference, Cabo de Palos, Spain.
Romero-Fresco, P., 2023. Interpreting for Access: The Long Road to Recognition. In Zwischenberger,
C., Reithofer, K., Rennert, S., eds. Introducing New Hypertexts on Interpreting Studies: A Trib-
ute to Franz Pöchhacker. John Benjamins, Amsterdam, 236–253. URL https://doi.org/10.1075/
btl.160.12rom
Roziner, I., Shlesinger, M., 2010. Much Ado About Something Remote: Stress and Performance in
Remote Interpreting. Interpreting 12(2), 214–247. URL https://doi.org/10.1075/intp.12.2.05roz
Saeed, M., Rodríguez González, E., Korybski, T., Davitti, E., Braun, S., 2023. Comparing Inter-
face Designs to Improve RSI Platforms: Insights from an Experimental Study. Proceed-
ings of the International Conference HiT-IT 2023, 147–156. URL https://doi.org/10.26615/
issn.2683-0078.2023_013
Salaets, H., Brône, G., 2023. “Working at a Distance from Everybody”: Challenges (and Some
Advantages) in Working with Video-Based Interpreting Platforms. The Interpreters’ Newsletter
28, 189–209.
Sannholm, R., Risku, H., 2024. Situated Minds and Distributed Systems in Translation: Exploring
the Conceptual and Empirical Implications. Target 36(2), 159–183. URL https://doi.org/10.1075/
target.22172.san
Seeber, K.G., 2013. Cognitive Load in Simultaneous Interpreting: Measures and Methods. Target
25(1), 18–32. URL https://doi.org/10.1075/target.25.1.03see
Seeber, K.G., 2017. Multimodal Processing in Simultaneous Interpreting. In Schwieter, J.W., Ferreira,
A., eds. The Handbook of Translation and Cognition. Wiley, Malden, MA, 461–475. URL https://
doi.org/10.1002/9781119241485.ch25
Seeber, K.G., Amos, R.M., 2023. Capacity, Load, and Effort in Translation, Interpreting, and Bilingual-
ism. In Schwieter, J.W., Ferreira, A., eds. The Routledge Handbook of Translation, Interpreting and
Bilingualism. Routledge, New York, 260–279. URL https://doi.org/10.4324/9781003109020-22
Seeber, K.G., Arbona, E., 2020. What’s Load Got to Do with It? A Cognitive-Ergonomic Training
Model of Simultaneous Interpreting. The Interpreter and Translator Trainer 14(4), 369–385. URL
https://doi.org/10.1080/1750399X.2020.1839996
Seeber, K.G., Kerzel, D., 2012. Cognitive Load in Simultaneous Interpreting: Model Meets Data. Inter-
national Journal of Bilingualism 16(2), 228–242. URL https://doi.org/10.1177/1367006911402982
Shlesinger, M., 2000. Interpreting as a Cognitive Process: How Can We Know What Really Happens?
In Tirkkonen-Condit, S., Jääskeläinen, R., eds. Tapping and Mapping the Processes of Transla-
tion and Interpreting: Outlooks on Empirical Research. John Benjamins, Amsterdam, 3–16. URL
https://doi.org/10.1075/btl.37.03shl
Stanney, K.M., Schmorrow, D.D., Johnston, M., Fuchs, S., Jones, D., Hale, K.S., Ahmad, A., Young,
P., 2009. Augmented Cognition: An Overview. Reviews of Human Factors and Ergonomics 5(1),
195–224. URL https://doi.org/10.1518/155723409X448062
Stengers, H., Lázaro Gutiérrez, R., Kerremans, K., 2023. Public Service Interpreters’ Perceptions
and Acceptance of Remote Interpreting Technologies in Times of a Pandemic. In Corpas-Pastor,
G., Defrancq, B., eds. Interpreting Technologies – Current and Future Trends. John Benjamins,
Amsterdam, 109–141. URL https://doi.org/10.1075/ivitra.37.05ste
Tiselius, E., Englund Dimitrova, B., 2023. Testing the Working Memory Capacity of Dialogue Interpret-
ers. Across Languages and Cultures 24(2), 163–180. URL https://doi.org/10.1556/084.2023.00439
Van Egdom, G.-W., Cadwell, P., Kockaert, H., Segers, W., 2020. A Turn to Ergonomics in Translator
and Interpreter Training. The Interpreter and Translator Trainer 14(4), 363–368. URL https://doi.
org/10.1080/1750399X.2020.1846930
Viljanmaa, A., 2018. Students’ Views on the Use of Film-Based LangPerform Computer Simulations
for Dialogue Interpreting. Translation and Interpreting Studies 13(3), 465–485. URL https://doi.
org/10.1075/tis.00025.vil
Wehrmeyer, E., 2015. Comprehension of Television News Signed Language Interpreters: A South
African Perspective. Interpreting 17(2), 195–225. URL https://doi.org/10.1075/intp.17.2.03weh

Wen, H., Dong, Y., 2019. How Does Interpreting Experience Enhance Working Memory and
Short-Term Memory: A Meta-Analysis. Journal of Cognitive Psychology 31(8), 769–784. URL
https://doi.org/10.1080/20445911.2019.1674857
Zheng, R.Z., ed., 2018. Cognitive Load Measurement and Application. Routledge, New York.
Zhu, X., Aryadoust, V., 2022. A Synthetic Review of Cognitive Load in Distance Interpret-
ing: Toward an Explanatory Model. Frontiers in Psychology 13, 899718. URL https://doi.
org/10.3389/fpsyg.2022.899718
Ziegler, K., Gigliobianco, S., 2018. Present? Remote? Remotely Present! New Technological
Approaches to Remote Simultaneous Conference Interpreting. In Fantinuoli, C., ed. Interpreting
and Technology. Language Science Press, Berlin, 119–139.

20
INTERNATIONAL AND
PROFESSIONAL STANDARDS
Verónica Pérez Guarnieri and Haris N. Ghinos

20.1 Unravelling ISO standards: development and history


Before delving into the specifics of interpreting standards, it is crucial to comprehend the
organisational structure of the International Organization for Standardization (ISO),1
where these global interpreting standards take shape. In this section, we will succinctly
examine the various procedural facets involved in the development and publication of
international standards.

20.1.1 Standard development stages


As clearly described in ISO/IEC Directives (ISO 2023b), the standardisation process begins
within technical committees (TCs) with a defined scope. When that scope becomes too
large, it may be divided into smaller scopes, which are then assigned to subcommittees
(SCs). Subcommittees operate in the same way as a technical committee, each with its own
chair and secretary. SCs, in turn, have working groups (WGs) composed of experts. The
coordinator of the WG (the convenor) drives the consensus process amongst these experts,
who focus on the technical content.
Interpreting, translation, and related technology standards are developed under TC 37,
‘Language and Terminology’, SC5, ‘Translation, Interpreting, and Related Technology’.
At the time of publication, SC5 has published 19 standards, with 8 standards currently
under development, 35 participating members, and 15 observing members. The committee
contributes 9 standards to 8 Sustainable Development Goals, among them Sustainable
Development Goal 10: Reduced Inequalities.
Regarding stakeholders, there are two categories of ISO members: countries represented
by their national standardisation bodies (e.g. IRAM, DIN, UNE, etc.) and organisations
in liaison (e.g. AIIC) that meet ISO’s admission criteria. In turn, these two categories have
subcategories. Member bodies can be participating members (P-members) or observing
members (O-members). P-members are full members who actively participate, vote, and
nominate experts to contribute to the development of any given project and ensure their
position is heard. O-members follow the development of the standards without providing
resources.

DOI: 10.4324/9781003053248-26 

Regarding organisations in liaison, there are three categories: A, B, and C. Category A liaison organisations play a valuable role in contributing to the activities of a technical committee. This involvement includes proposing new work items for inclusion in
the committee’s work programme. A-liaison organisations can also nominate experts to
working groups (WGs) and may take on convenor or project leader responsibilities within
these WGs. Although liaison organisations play a crucial role, they do not have voting
rights. In other words, liaison organisations can actively participate by providing com-
ments and suggesting modifications in the voting process, and their comments should be
given equal consideration to those submitted by P-members. Category B-liaisons are specifi-
cally for inter-governmental organisations seeking updates on a committee’s work. Liaison
­representatives in this category can attend committee meetings as observers and have the
right to submit comments. Moving to Category C, which is relevant at the working group
(WG) level, C-liaison representatives actively contribute to the technical work and fully
engage in a WG’s activities. However, they are not eligible for project leader or convenor
roles. Attendance at committee plenaries as observers is only permissible for C-liaisons
upon explicit invitation by the committee, and they do not possess the right to propose new
work items (ISO 2010, ISO 2011).
Standardisation is an exercise in consensus. The whole process begins when a member
body or a liaison organisation identifies a market need within a specific area and proposes
to develop a standard to address that need. The member body or liaison organisation will
then submit a new work item proposal (NP) to the relevant committee or subcommittee for
a vote and nominate a project leader, thus launching the proposal stage. For example, in
2017 the International Association of Conference Interpreters (AIIC) identified the need to
develop a standard for conference interpreting (ISO 23155:2022, see ut infra), submitted
an NP, and nominated a project leader.
Next, subcommittee members decide whether that specific area is worth developing a
standard for. They do this via a committee ballot, which is usually a 12-week approval pro-
cess. The project’s approval requires a two-thirds majority of the P-members voting affirma-
tively. If the requisite majority is reached, the project is registered, and the preparatory
stage is started. The project leader initiates a first working draft (WD), which is reviewed
by all nominated experts in a working group consultation. Successive WDs may be circu-
lated until an optimal text is developed. Once mature, the project moves to the committee
stage to become a committee draft (CD), a critical consensus-building stage where national
bodies provide technical and editorial comments. If enough consensus is reached, the draft
proceeds to the enquiry stage (draft international standard). Approval at this stage requires a two-thirds majority of P-member votes in favour and fewer than one quarter of all votes cast being negative; negative votes must be justified by technical reasons. If not approved, the leadership can revise and re-ballot the draft.
The enquiry stage concludes with the registration of the text for circulation as FDIS
(final draft international standard) or for publication. Only editorial changes can be made
at this late stage, and no technical comments will be considered. Approval criteria include
a two-thirds majority of voting members, with no more than 25% of all votes cast being
negative. Technical comments will be kept for consideration in the next revision of the doc-
ument. If the FDIS draft is approved, it proceeds to the publication stage. If not approved,
the committee can choose to resubmit the draft as a committee draft, a draft international
standard, or another final draft international standard.
All ISO standards are reviewed at least every five years to ensure they remain current.
During the systematic review process, national standards bodies assess the

The Routledge Handbook of Interpreting, Technology and AI

document and its applicability within their country, consulting with stakeholders to deter-
mine its continued validity, the need for updates, or the possibility of withdrawal (ISO 2019).

20.1.2 Standardisation deliverables


While the paramount output at ISO remains the international standard, it is imperative to
acknowledge the latitude granted to technical committees for the potential publication of
alternative deliverables (ISO 2020).
In the parlance of ISO, the term ‘shall’ denotes a requirement, ‘should’ signifies a recom-
mendation, and ‘may’ confers permission. The technical report stands as a deliverable that
does not include requirements. Consequently, a technical report abstains from any inclu-
sion of ‘shall’. Typically, it encapsulates data derived from collaborative group efforts and
discussions. The process of publishing a technical report is relatively straightforward and
does not need the same degree of consensus as mandated for an international standard.
The publicly available specification assumes the role of an intermediary specification,
capable of accommodating requirements, a departure from the technical report. ‘Shall’ may
be incorporated into this document, and the requisite consensus level for its publication is
notably more lenient than that mandated for an international standard.
The technical specification is positioned as a preliminary iteration of an international
standard. It possesses the capacity to integrate requirements, recommendations, and per-
missions but demands a lower level of consensus than its international counterpart. Com-
mittees may opt to disseminate a technical specification in scenarios where challenges in
consensus-building emerge within the working group or technical committee during interna-
tional standard development. In such instances, the release of a technical specification serves
the purpose of soliciting market feedback. If deemed appropriate by the technical committee,
there exists the possibility of subsequent conversion into an international standard.
The language used to draft standards will very much depend on the type of deliverable.
This will be described in the next subsection.

20.1.3 Drafting standards: ‘should’ vs ‘shall’ and other peculiarities


This section addresses pivotal elements within a standard, delving into the primary clauses,
the utilisation of ISO language, the significance of employing plain language in standard
drafting, specific terminology, phrases, and verbal forms, along with their nuanced mean-
ings within the ISO lexicon.
In the realm of standards, a delineation exists between ‘requirement’ standards, which
comprise clauses mandating strict compliance, and ‘recommendation’ standards, which
advocate optimal practices. However, it is noteworthy that standards can encompass both
facets and feature clauses, which necessitate obligatory adherence or offer recommenda-
tions for best practices. This duality is encapsulated in the vocabulary of the standard itself,
as exemplified by titles such as ‘ISO 18841:2018 Interpreting Services – General requirements and recommendations’.2 This title signals the incorporation of both mandatory compliance elements and advisory recommendations within the standard. Furthermore, this
duality should also be reflected in the use of certain verbal forms.
A standard consists of several sections; some are mandatory, and some are optional. The
Introduction, although optional, remains a highly recommended section of the standard
because it can be used to add information or comments about the content of the standard.


In the introduction of interpreting standards, we read about the needs of the language
industry that led to their development, the clarification of some terms, and why some con-
cepts have been excluded. The most important section is the Scope, which should always be concise, explaining to potential users what the standard covers and defining its subject precisely and succinctly. Like the Introduction, the Scope should be written as a series of statements of fact and should not contain requirements or recommendations. The Scope will include verbs such as ‘specifies’ and ‘establishes’.
Subsequently, the Normative References section provides a list of standards that are
referenced for understanding and implementing the content outlined in the document. The
Terms and Definitions section then covers how a typical terminological entry is drafted, the
importance of definitions seamlessly replacing terms in context, and specific rules for terms
and definitions. These rules include avoiding articles in definitions and refraining from
using equations, figures, and tables in definitions. Additionally, as definitions are explana-
tory, they should not contain requirements, recommendations, or permissions.
There are three possibilities for an introductory text in the ‘Terms and Definitions’ clause,
depending on whether new terms are defined, terms are referenced from another document,
or no terms and definitions are present. In the three definitions that follow, notice the ref-
erencing and derivation among them, the source in the case of definitions that are not new,
and the justification for the change of the cited definition.

3.2.63

interpreting

interpretation
rendering spoken or signed information from a source language (3.1.4) to a target
language (3.1.5) in oral or signed form, conveying both the meaning and language
register (3.1.10) of the source language content (3.1.15)

[SOURCE: ISO 20539:2019, 3.1.10, modified – The order of the wording ‘both the lan-
guage register and meaning’ has been changed to ‘both the meaning and language register’.]

3.2.104

conference interpreting
interpreting (3.2.6) used for multilingual communication at technical, political, scien-
tific and other formal meetings

[SOURCE: ISO 20539:2019, 3.4.18]

3.2.135

consultant interpreter
conference interpreter (3.2.9) who provides consultancy services in addition to work-
ing as a conference interpreter


Supplementary information can be incorporated using Notes to Entry, ensuring numbering consistency. Definitions are grouped in clause 3 of any standard for systematic organisation, facilitating integration into online platforms.
Furthermore, the recommendation is to add subheadings in clause 3 to categorise terms
by different topics and organise them hierarchically for conceptual clarity. The necessity to
avoid providing definitions for self-explanatory terms is highlighted, along with the essen-
tial rule that all defined terms must be mentioned within the document.
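For illustration only, the anatomy of such a terminological entry could be modelled as a simple record. The field names below are our own invention, not ISO nomenclature, and the cross-reference numbers inside definitions are omitted; the sample data is the entry for ‘interpreting’ quoted above.

```python
from dataclasses import dataclass, field

# Hypothetical record type mirroring the structure of the ISO
# terminological entries quoted above (clause 3 of a standard).
@dataclass
class TermEntry:
    number: str                     # e.g. "3.2.6"
    term: str
    definition: str
    admitted_terms: list = field(default_factory=list)
    source: str = ""                # cited entry, if the definition is not new
    modification: str = ""          # justification for changing a cited definition
    notes_to_entry: list = field(default_factory=list)

interpreting = TermEntry(
    number="3.2.6",
    term="interpreting",
    admitted_terms=["interpretation"],
    definition=("rendering spoken or signed information from a source language "
                "to a target language in oral or signed form, conveying both the "
                "meaning and language register of the source language content"),
    source="ISO 20539:2019, 3.1.10",
    modification=("The order of the wording 'both the language register and "
                  "meaning' has been changed to 'both the meaning and language "
                  "register'."),
)
```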
Moreover, the inclusion of standard terms in an authoritative database, such as the Inter-
national Organization for Standardization’s Online Browsing Platform (www.iso.org/obp),
is instrumental. As these terms become more widely accessible, they gain recognition. This
fosters a shared understanding among all stakeholders.
Access to the Introduction, Scope, Normative References, and Terms and Definitions
of any standard is free online (unless it is a vocabulary standard, in which case the Terms
and Definitions constitute the body of the standard), thereby enabling informed decisions
regarding the acquisition of the standard.
The remaining sections of the standard will encompass the stipulated requirements or rec-
ommendations, contingent upon the chosen deliverable, as explained in the previous section.
Standards may contain annexes. These must be referenced somewhere in the body of the
document, excluding the table of contents. This is crucial since it enables standard users
to understand the relevance of each annex. Determining whether an annex is normative or informative requires more than recognising its importance: an annex is normative only if it is cited in a normative manner, or from a normative clause, within the document.
For example, in ISO 18841:2018, Annex B, ‘Parties involved in interpreting, the client’s
responsibilities for the interpreter, and the interpreter’s own responsibilities’, is normative.
It is cited in normative clause 5.1, a clause containing ‘shall’. There is a requirement that
can only be followed by using Annex B concerning the responsibilities of the interpreter and
other parties involved in the interpreting communicative event.
In contrast, within the same standard, Annex A, ‘Non-exhaustive list of settings and spe-
cializations’, is considered informative. This is because there is no requirement involved. If
a user wants to know about interpreting specialisations, Annex A provides a list. Therefore,
it is informative rather than normative.
The final sections of any standard will be the Bibliography (optional) and the Index. The
former may include related standards that are not normative references, documents cited for informational purposes, and other relevant information resources. The latter is
a list of the terms under the Terms and Definitions section, but arranged in alphabetical order.
When it comes to the language employed in formulating any standard, the primary guide
is ISO/IEC Directives, Part 2 (ISO/IEC 2021). Complementing this, the ISO House Style
page6 provides comprehensive assistance for standard developers. Some key stylistic recom-
mendations that distinguish the drafting of international standards from other forms of
documents are:

• Plain English should be used for enhanced document clarity. This is deemed crucial for international readers with English as a non-native language and aims at minimising translation errors. Latin words should be avoided; where one is used, the English plural should be employed, if it exists.
• When legislation and regulations are referenced, the word ‘compliance’ is used; for stand-
ards requirements, ‘conformity’ or the phrase ‘in accordance with’ should be utilised.


• ISO documents follow Oxford British spelling, with the suffix ‑ize (rather than ‑ise) used for approximately 200 verbs, for example, organize and standardize.
• The present tense should be used by default, and an impersonal tone should be main-
tained. ‘Shall’ should be specified for document requirements, and ‘must’ for external
constraints or obligations. The use of ‘need(s) to’ should be avoided to prevent confu-
sion. ‘Should’ denotes a ‘strong recommendation’.
• ‘May’ should be used to express permission, and ‘can’ should be used for possibilities or
capabilities. The substitution with ‘might’ or ‘could’ should be avoided to prevent confu-
sion during translation.
• When referring to an individual, ‘they’, ‘them’, and ‘their’ can serve as gender-neutral
pronouns.
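As a purely illustrative aside, the mapping of ISO verbal forms to their normative weight, as summarised in the bullet points above, could be expressed as a small lookup and scanner. The function and the sample clause are hypothetical, written only to show the classification at work.

```python
import re

# Normative weight of ISO verbal forms, as summarised above
# (following ISO/IEC Directives, Part 2). The scanner is a sketch.
VERBAL_FORMS = {
    "shall": "requirement",
    "should": "recommendation",
    "may": "permission",
    "can": "possibility/capability",
    "must": "external constraint (not a document requirement)",
}

def classify_verbal_forms(text: str) -> dict:
    """Count occurrences of each ISO verbal form in a draft clause."""
    counts = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in VERBAL_FORMS:
            counts[word] = counts.get(word, 0) + 1
    return counts

clause = ("The interpreter shall prepare for the assignment. "
          "The client should provide documentation in advance and "
          "may request a consultation.")
print(classify_verbal_forms(clause))
# prints: {'shall': 1, 'should': 1, 'may': 1}
```

A drafting team could use such a scan to spot, for instance, a ‘shall’ that has crept into a technical report, which must not contain requirements.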

20.1.4 Standardisation vs Certification vs Accreditation


These three terms are frequently conflated. Definitions are a good place to start to distin-
guish between them:

• Standardisation. ‘Activity of establishing, with regard to actual or potential problems, provisions for common and repeated use, aimed at the achievement of the optimum degree of order in a given context’.7
• Certification. ‘The provision by an independent body of written assurance (a certificate)
that the product, service or system in question meets specific requirements’.8
• Accreditation. ‘The formal recognition by an independent body, generally known as
an accreditation body, that a certification body operates according to international
standards’.9

In summary, ‘standardisation’ focuses on establishing order in various contexts, ‘certification’ provides written assurance of meeting specific requirements, and ‘accreditation’ formally recognizes the adherence of a certification body to international standards. Together, these concepts form a framework for ensuring quality, consistency, and reliability in various domains and contribute to the overall effectiveness and trustworthiness of products, services, or systems.

20.2 The evolution of interpreting standards: a historical perspective


The initial milestone in the realm of interpreting standards was ‘ISO 13611:2014 Interpreting – Guidelines for community interpreting’.10 Its genesis can be traced back to the year
2010, when experts at the ISO level embarked on this initiative. Their focus was on com-
munity interpreting due to a perceived urgent necessity to standardize this particular
specialisation within the field. Some countries, where large parts of the population do not speak the official language but need to access public services (health, education, etc.), detected the need to regulate community interpreting to
enhance its professional status, describe appropriate working conditions for community
interpreters, and benefit users with the provision of quality community interpreting. The
impetus for this undertaking arose from the rapid growth of demand for community inter-
preting services, coupled with the unregulated nature of this domain. Thus, the inception


of ISO 13611 marked a pivotal stride toward establishing a standardized framework in the
dynamic landscape of interpreting. This standard, whose revision has just been published, is
now called ‘ISO 13611:2024 Interpreting services – Community interpreting-Requirements
and recommendations’. The new version includes requirements and recommendations for
the provision of community interpreting services, establishing the relevant practices neces-
sary to ensure quality community interpreting services for all language communities (spo-
ken and/or signed) and for all stakeholders.
The only reference to technology in the 2014 version of this standard concerned working remotely with the help of video- or teleconferencing technology and, when interpreting simultaneously, working with equipment or doing chuchotage. In the newly pub-
lished version of the standard, for tasks involving technology, community interpreters are
expected to proficiently operate interpreting equipment, including microphones and audio-/
videoconferencing technology. ‘Proficiency’ in using the necessary equipment and platforms
for remote interpreting services is now also required.
The next standard to be approved in the interpreting standardisation pathway was
‘ISO 18841:2018 Interpreting Services – General requirements and recommendations’.
This standard is the umbrella standard from which all specialist standards derive. It was
intended to be the first of the series, but there was a pressing need to regulate the commu-
nity interpreting field back in 2010. ISO 18841 – under review at the moment of writing
this chapter – covers the basic requirements and recommendations necessary for the provi-
sion of interpreting services. It also offers recommendations of good practice for users of
interpreting services. In this standard, ‘distance interpreting’ was mentioned for the first
time in an interpreting standard, under Section 20.5.2.2, ‘Working Conditions’. Interpret-
ing equipment is mentioned in general without providing any specifications. This standard
is being revised at the time of publication. For the first time, the term “human beings” is
introduced in the scope of an international interpreting standard as subjects performing the
interpreting task.
The next standard in the series was ‘ISO 20228:2019 Interpreting Services – Legal
Interpreting-Requirements’. Its text covers the principles governing the provision of legal
interpreting services, outlining the required skills of legal interpreters. The recommenda-
tions it contains apply to all parties involved in the legal communicative event: interpret-
ers (oral, signed), legal practitioners/legal service providers, lay users, recipients of legal
services, and institutions. Informative Annex C of this standard, ‘Recommendations for
interpreting mode’, mentions that distance interpreting is used by courts and the police to
facilitate interpreting when the parties are at different locations and that interpreters should
be provided with the right equipment. However, no specifications are given.11
In chronological order, the next standard to be approved was ‘ISO 21998:2020 Inter-
preting services – Healthcare interpreting – Requirements and recommendations’. This
text outlines the criteria and suggestions for spoken and signed communication in health-
care interpreting services. It is relevant to any scenario necessitating healthcare interpreta-
tion, wherein individuals must communicate using spoken or signed language to address
health-related matters. The target audience includes providers of interpreting services and
healthcare interpreters, as well as healthcare providers and users of healthcare services (i.e.
laypersons – patients, carers, etc). For the first time in an interpreting standard, this standard
includes a subsection on the technical competences and skills of healthcare interpreters. It states that they shall use interpreting technology and underlines the responsibilities


of interpreting service providers to offer a suitable working environment for remote interpreting, mitigating noise and visual disruptions, ensuring optimal technology quality, and providing adequate ventilation.12
The latest interpreting standard to be published is ‘ISO 23155:2022 Interpreting ser-
vices – Conference interpreting – Requirements and recommendations’. This standard, dis-
cussed in more depth in the next section, regulates the provision of conference interpreting
services and offers good practice recommendations. It mentions interpreting technology,
equipment, and for the first time in an interpreting standard, cognitive load.
In simultaneous interpreting, compliance with ‘ISO 24019:2022 Simultaneous interpreting delivery platforms – Requirements and recommendations’13 is mandatory, and ‘ISO 20109:2016 Simultaneous interpreting – Equipment – Requirements’14 governs the use of audio/video equipment, microphones, and headphones (see Section 20.4.8 for further details).

20.3 The case of ISO 23155:2022: interpreting services – conference interpreting – requirements and recommendations
ISO 23155:202215 deals with conference interpreting. It is the most recent interpreting ISO
standard and has benefited from the lessons learned during the drafting of several past
interpreting standards. ISO 23155 thus offers a propitious opportunity to discuss how ISO standards are conceived, drafted, and implemented.

20.3.1 Overview
ISO 23155 started as a new work item proposal (NWIP) in August 2017 and was published
52 months later, on 2 January 2022. It specifies requirements and recommendations for the
provision of conference interpreting services. It is primarily addressed to conference inter-
preters and conference interpreting service providers (CISP), but it also serves as a reference
for users of and parties involved in conference interpreting.
Conference interpreting is needed at conferences, that is, specialised, structured, for-
mal, multilingual communicative events (see definition 3.3.1). Conference interpreting is
a well-established profession. Every year, conference interpreters enable hundreds of thou-
sands of multilingual conferences and meetings to take place.
ISO 23155 can be qualified as innovative since it considers the provision of conference
interpreting as an integrated project. While ‘conference interpreting’ refers to the mental
processes taking place in the brain of a conference interpreter, the ‘conference interpreting
service’ includes the working conditions that enable conference interpreters to perform, as
well as the logistics (booths, conference equipment, cabling, documentation, travel arrange-
ments) required to deliver conference interpreting to an audience. Accordingly, the term
‘conference interpreting service provider’ (CISP) denotes the professionals, individuals or
organisations, that provide conference interpreting services.
Aside from the informative sections (Foreword, Introduction, Scope, Normative Ref-
erences, Terms and Definitions, Annexes, Bibliography, Index), the key structure of ISO
23155 contains the following clauses:

4 General provisions about conference interpreting
5 Competences and qualifications of conference interpreters


6 Requirements and recommendations applicable to conference interpreters in connection with conference interpreting assignments
7 Requirements concerning the conference interpreting service provider (CISP)

20.3.2 The three-layer model


The conference interpreting service, as stated earlier, is presented in ISO 23155 as an inte-
grated project which consists of three ‘layers’ (see Figure 20.1). Clauses 5, 6, and 7, pre-
sented earlier, reflect this three-layer model: At the core (the first layer) of the conference
interpreting service, one can find interpreting per se, as performed by professional, trained
conference interpreters. The second layer comprises the additional ad hoc tasks conference
interpreters must execute as part of each assignment to ensure quality interpreting. The
outer layer comprises the wide range of support services provided by conference organis-
ers, conference equipment providers, technicians, travel agents, and translators, on whom
conference interpreters depend to perform at their best.
The first layer places the conference interpreter at the core of the conference interpreting service, and clause 5 accordingly presents the required competences. In addition
to highlighting the qualifications required for a conference interpreter to comply with the
Standard, clause 5 includes, among others, intellectual, linguistic, intercultural, interper-
sonal, stress management, and research competences.
The second layer (clause 6) includes the requirements applicable to conference interpret-
ers in connection with an assignment. This refers essentially to their meticulous preparation,
without which a quality result could not be guaranteed during a specialised, structured,

Figure 20.1 The three-layer model underlying ISO 23155:2022.


Source: Created by Eleni Nikolaou, ELIT Language Services.


formal, multilingual communicative event. Ensuring confidentiality is hereby also established as a key obligation of conference interpreters before, during, and after a conference.
The third layer (clause 7) reflects the realisation that the success of a conference inter-
preting project depends on a vast amount of ‘logistics’ (the responsibility of the CISP; see
following text). Namely, this includes all processes that must be implemented in support of
conference interpreters, to enable them to perform at their best. These include critical work-
ing conditions and compliance with interpreting-related technical ISO standards (which
will be discussed later).
In short, the first set of requirements is internal to conference interpreters. The second
connects interpreters to their immediate environment (the conference), and the third layer
describes the external factors that allow conference interpreters, who have been selected
for the specific assignment (layer 1), and who are adequately prepared (layer 2), to provide
interpreting in accordance with the Standard.

20.3.3 Some old and some new ideas


ISO 23155 corroborates certain key facts about conference interpreting, distilled from dec-
ades of experience in what is now a mature industry. The Standard offers an augmented
definition of ‘conference’ to help users of interpreting services draw as clear a dividing
line as possible between conference interpreting and other types of interpreting. It recog-
nizes that conference interpreting is an intellectually demanding activity, and acknowledges
stress as an inherent element of conference interpreting. This is logically linked to a crucial
requirement that conference interpreting be performed by professional, trained conference
interpreters who meet stringent competences and qualifications-related criteria (clause 5).
The Standard subsequently stresses the importance of teamwork (conference interpreters
work in teams of at least two) and the need for conference interpreters to share the same
workspace so as to ensure direct visual and oral communication. On-site teamwork is the
optimal way to retain cognitive load within acceptable limits. This will be explained later
on in the chapter.
Finally, the Standard endorses the ABC language classification system and ‘directional-
ity’. This requires conference interpreters to only interpret into their A and B languages.
This is a key requirement of the Standard.
ISO 23155 is innovative in several instances (all references are clauses of 23155). For
example, it includes sign language interpreting in the Scope of the Standard (4.1, 4.2, 5.2.1,
7.3.1). This is in the spirit of the AIIC’s groundbreaking decision in 2012 to open the Asso-
ciation to also include sign language interpreters among its members.
On top of recognising cognitive load as a key concept in conference interpreting, the
Standard also acknowledges how the cognitive load increases when conference interpreting
is delivered remotely (3.2.25, 4.1, 4.4). It stresses that distance interpreting (DI) is ‘more
difficult’ and requires mitigation measures to be taken in DI settings in order to safeguard
quality (4.1, 4.3, 4.4, and Annex E).
As far as the organisational aspects of the conference interpreting service are concerned, it
introduces the concept of the ‘conference interpreting service provider (CISP)’ and describes
in detail all processes under their responsibility (clause 7). To illustrate, it requires the CISP
and its subcontractors to have thorough knowledge of conference interpreting (7.1).
In the same vein, the Standard introduces the concept of risk management and makes it
one of the tasks for the CISP (7.1 and 7.3.1). It states that assuring the health and safety of


interpreters is the responsibility of the CISP (7.2). Accordingly, it also underlines the need
to ensure conformity with ISO technical standards (see later text).
The concept of ‘consultant interpreter’ is also defined (3.2.13 and 7.1). The Standard likewise sheds light on the concept of confidentiality through a requirement for an ‘augmented level of confidentiality’ (6.1, 7.2, 7.3.1, 7.3.2, Annex B, Annex D).
None of these concepts is totally new, of course, but 23155 groups them together in a
consistent manner under an ISO standard. For decades, the International Association of
Conference Interpreters (AIIC, www.aiic.org), researchers, and academia have been con-
tributing to a vast knowledge base. ISO experts have recently transformed this into a pro-
gressive international standard.
This concludes the presentation of ISO standards on interpreting, those which describe
‘how interpreting should be done’. However, the interpreting industry also relies on another
series of standards. These lay down requirements and recommendations concerning the
technical means used during the majority of interpreted communicative events. Indeed,
there are very few multilingual events or meetings with interpretation that do not require
a minimum amount of technical equipment. This could range from a simple public address
system to simultaneous interpreting booths, interpreter interfaces (hard and soft consoles),
microphones and headsets, screens, cabling, and ancillary equipment.
Interestingly, although ISO Technical Committee 37 on Language and Terminol-
ogy became active as early as 1952, technical interpreting ISO standards predate the
non-technical interpreting standards. This could be due to the technical origins of ISO as
an organisation. To illustrate, ISO 4043 on mobile (simultaneous interpreting) booths dates
back to 1981, while the non-technical interpreting ISO standards previously mentioned
only started emerging well into the 21st century.
This chapter will now briefly visit the technical ISO standards related to interpreting,
with the aim of explaining their relevance to the interpreting service.

20.4 Technical ISO standards related to interpreting


This section will briefly discuss past, recent, and present technical ISO standards related
to interpreting. It aims to, first, provide a timeline of these technical ISO standards’ logi-
cal development. Secondly, it aims to link older standards with the new, so as to provide
perspective for stakeholders who naturally follow developments with a certain lag and who
may not yet have had the opportunity to familiarise themselves with the existence of the
new standards.
Certain standards are being revised in the context of the usual five-year ISO cycle. Other
standards have been, or are being, revised to integrate technical progress. Others were
created to address novel concepts, such as ‘simultaneous interpreting delivery platforms’
(SIDP).
Importantly, an ‘architectural’ change is currently taking shape. The key standards gov-
erning permanent booths and mobile booths are being integrated in a single standard: ISO
17651:2024, Simultaneous interpreting – Interpreters’ working environment. The Stand-
ard includes three parts: 17651–1 on permanent booths, 17651–2 on mobile booths, and
17651–3 on interpreting hubs.
To provide an insight into the approach adopted by the experts who drafted each stand-
ard, their key structure (excluding the informative sections) is mentioned in what follows.

374
International and professional standards

20.4.1 ISO 2603:2016: simultaneous interpreting – permanent booths – requirements (replaced)
ISO 2603 is one of the oldest standards in conference interpreting and has recently been
replaced by ISO 17651–1:2024 (see later). It concerns permanent booths for simultaneous
interpreting, which have a direct view of the room in which the communicative event is
taking place.
The first version, ISO 2603:1983, was followed by ISO 2603:1998. This provides
requirements and recommendations for building and renovating permanent booths in new
and existing buildings.
Its main structure (as compared with 17651–1, which follows) includes the following
elements:

4 Location of booths
5 Building standards for booths
6 Booth interior
7 Facilities for interpreters

As one would expect, the Standard addresses minimum dimensions, windows, and vis-
ibility; soundproofing and acoustics; air quality; lighting; and working surface. It also
addresses exposure to electromagnetic radiation.
The Standard recognises that ‘as interpreting is an activity that requires high concentra-
tion, stress factors have to be avoided, and the working environment accordingly has to
meet the highest ergonomic standards and provide an environment that enables interpreters
to carry out their work properly’ (Introduction).
This document also emphasises the need for ‘good visual communication between the
interpreters and the participants in the event’ (Introduction). As discussed later, this consid-
eration becomes even more important, if not critical, in DI settings.

20.4.2 ISO 17651–1:2024: simultaneous interpreting – interpreters’ working environment – part 1: requirements and recommendations for permanent booths
ISO 17651–1 cancels and replaces the fourth edition of ISO 2603:2016, which has been
technically revised. The main changes (see Foreword) cater for technological developments,
respond to the need for requirements to be formulated in a technology-neutral way, and
include a reference to booth partitioning that can be temporarily added, for health reasons.
The Standard also includes additional requirements for sign language interpreting.
The minimum booth dimensions and the work surface minimum dimensions (depth 60–66
cm) remain the same, despite the proliferation of electronic devices used by interpreters.
The structure of the revised Standard has changed and now aligns with the other parts of the ISO 17651 series (compare ISO 2603 earlier):

4 Location
5 Design
6 Booth interior
7 Facilities for interpreters

ISO 17651–1 also specifies that ‘booths are places used for work and are occupied through-
out the day’ (5.6.1). This affects the definition of requirements for air quality, temperature,
humidity, etc.

20.4.3 ISO 4043:2016: simultaneous interpreting – mobile booths – requirements (withdrawn)
Similarly, ISO 4043 can also be considered as a relatively old standard. The initial version,
ISO 4043:1981, was followed by ISO 4043:1998 and ISO 4043:2016. Its latest version was
replaced by ISO 17651–2:2024.

[ISO 4043:2016] provides requirements and recommendations for the manufacturing of mobile simultaneous interpreting booths. The primary features of mobile booths that distinguish them from permanent simultaneous interpreting booths are that they can be dismantled, moved and set up in a conference room not equipped with permanent booths.

Its main structure (compared with ISO 17651–2:2024, shown later) includes the
following elements:

4 General requirements
5 Size, weight and handling
6 Doors
7 Cable passages
8 Windows
9 Acoustics
10 Ventilation
11 Working surface
12 Lighting
13 Electricity supply
14 Language panels

20.4.4 ISO 17651–2:2024: simultaneous interpreting – interpreters’ working environment – part 2: requirements and recommendations for mobile booths
This Standard cancels and replaces ISO 4043:2016 which has been technically revised.
It applies to mobile booths for simultaneous interpreting and adds that ‘mobile booths
are intended for non-permanent use’ (Introduction). It also reminds the reader that
‘table-mounted hoods and single-person booths do not conform with ISO 17651–2’ (5.2.1,
NOTE). This is of importance to settings used during distance interpreting.
ISO 17651–2 departs from the structure of ISO 4043 shown earlier and adopts the main structure of ISO 17651–1 for permanent booths, as shown in the following:

4 Location
5 Design
6 Booth interior
7 Facilities for interpreters

Notable additions to protect interpreters’ health include requirements for the following:

1. Ventilation which ‘renews the air at least eight times an hour’
2. A CO2 detector to ‘ensure that CO2 in the booths shall not exceed 1,000 parts per million’

Part 1 and Part 2 of ISO 17651 will be complemented with ISO 17651–3 (Part 3: Require-
ments and recommendations for interpreting hubs). This is currently still in progress but
will apply to booths which do not have a direct view of the room in which the communica-
tive event is taking place.

20.4.5 ISO 20108:2017: simultaneous interpreting – quality and transmission of sound and image input – requirements (withdrawn)
ISO 20108:2017 was withdrawn in August 2023. This key Standard applied to ‘the quality
and transmission of sound and image input to interpreters and specified the characteristics
of the audio and video signals’. Its content is now covered by other Standards, mainly ISO 24019.
Its main structure included the following elements:

4 Sound input to interpreters
5 Image input to interpreters
6 Lip sync
7 Transmission of audio and video from a distant site

Under ‘7.1 Quality’ lies an interesting reference to the ‘effects of both packet level and
signal-related impairments caused by coding processes’. These currently (as of 2024) form
the centre of the debate concerning interpreters’ auditory health.

20.4.6 ISO 20109:2016: simultaneous interpreting – equipment – requirements (under review)
ISO 20109:2016 specifies requirements for equipment used for simultaneous interpreting.
‘[It] specifies the components of typical interpreting equipment, which together with either
permanent (ISO 2603) or mobile (ISO 4043) booths, form the interpreter’s working envi-
ronment’ (Introduction).
Its main structure includes the following elements:

4 Overall interpreting system
5 Interpreter console
6 Microphones
7 Interpreters’ headphones/headset
8 Portable interpreting system

ISO 20109 also introduces hearing protection that must be provided by the interpreting
equipment (clause 4.5).
It must be noted that ISO 20109 is currently under review and will apply to a variety
of different settings. These include interpreters working in booths in the same space as all

other participants, in booths adjacent to the meeting room, or interpreters working from
interpreting hubs, etc.

20.4.7 ISO 22259:2019: conference systems – equipment – requirements


Although not all events require interpreting, ISO 22259 has won its place here since it refers
to how ‘a conference system (the technical equipment used to conduct an event) can be
extended with an interpreting system and a language distribution system’. ISO 22259 speci-
fies requirements for conference systems, including microphones, headphones, and sound
reinforcement equipment. Perhaps importantly, it is also known to ‘contain the technical
backbone of ISO 20108 and ISO 20109’.

20.4.8 ISO 24019:2022: simultaneous interpreting delivery platforms – requirements and recommendations
Simultaneous interpreting delivery platforms (SIDP) connect the following:

1. Speakers and signers with interpreters
2. Interpreters with their audience, transmitting spoken and visual information

ISO 24019 is a non-certifiable standard and reflects the rapidly changing landscape in the
industry. It cancels and replaces ISO/PAS 24019:2020 (PAS is explained in Section 20.1
of this chapter). Some of the changes of note include the additional requirements for sign
language interpreting, a reference to communication between interpreters with sound
and image, and requirements referring to the working environments of both speakers and
signers.
Also of note is the Standard’s approach to and description of distance interpreting as
‘settings where the interpreters are not at the same venue as participants, speakers and
signers or each other’ (Introduction). Here, the Standard indirectly acknowledges that
distance also from participants (not only speakers and signers) makes interpreting at
least ‘different’ for interpreters. The Standard also discusses sound and image quality,
synchronisation of sound and image, hearing protection, latency, existence of technical
support, etc.
Its main structure includes the following elements:

4 Purpose and characteristics of a simultaneous interpreting delivery platform
5 Overall performance
6 Technical support personnel
7 Requirements related to the simultaneous interpreting delivery platform
8 Requirements relating to the speaker and signer
9 Requirements relating to the interpreter

ISO 24019 also introduces a ‘handover procedure and control’ (7.7.11), a procedure to
allow an interpreter to hand over command of their outgoing channel to a channel partner
who is not located by their side. Here, the Standard implicitly refers to a situation where
an interpreter would be obliged to work alone, in uncontrolled (perhaps private) premises,
without technical support. The so-called ‘home alone’ model remains highly controversial,

especially for the purpose of conference interpreting. This shall be discussed in greater
detail later.
Furthermore, the Standard recommends (note the use of ‘should’ vs ‘shall’) that com-
munication between interpreters, and between interpreters and technicians, moderators,
speakers, or signers and the conference organizer, ‘should necessitate minimal additional
intellectual effort on the part of the interpreters’ (7.8.2).

20.4.9 ISO 20539:2019: translation, interpreting, and related technology – vocabulary, or the odyssey of terms
ISO 20539 is not a technical standard but deserves mention as it serves as a compendium
of vocabulary used in translation, interpreting, and other related technology standards. It is
compiled dynamically, with a view to coordinating usage of terminology. Concepts in inter-
preting are not necessarily considered the same way across all settings; ‘however, it is likely
that in the long term, consistency in terms and definitions across the related International
Standards will have a standardizing effect on the terms and definitions used in practice in
the long term’ (Introduction). As a result, ISO 20539 can serve linguists with a particular
interest in terminology.

20.5 Coming to terms with definitions


As mentioned in Section 20.1.3 earlier, definitions form a crucial part of the experts’ work
at ISO. Use of suitable definitions, appropriate for the purpose of a specific standard,
improves the readability of ISO standards. It is for this reason that terms are defined in dif-
ferent ways, even across standards within the same domain; a cursory browsing of the ISO
Online Browsing Platform (OBP, at www.iso.org/obp/ui) illustrates this.
For example, the concept of ‘floor’ as used in interpreting demonstrates how a term can
correspond to a range of different concepts and, therefore, definitions.
In 2019, ISO 22259:2019 Conference systems – Equipment – Requirements treated this
concept in purely technical terms. It defined floor as ‘audio output of discussion system
conveying microphone input and auxiliary input’. However, the working group (ISO TC37/
SC5/WG2) for ISO 23155, published in January 2022, considered the definition to not be
fit for purpose in the context of conference interpreting. It deemed that, for conference
interpreters, technicians, and CISPs, the floor is the ‘original’ content, which comprises
all information coming from the conference room other than the delivery of conference
interpreters. This definition may be easily understandable to the conference interpreting community but not necessarily to others.
At the same time, another concern was for the definition of ‘floor’ to include a signer
signing at a conference, to put sign languages on an equal footing with spoken languages.
To illustrate, were a deaf member of the European Parliament to sign in the plenary room
in Strasbourg, for conference interpreters and technicians this would also be defined as the
‘floor’. PA announcements are also deemed part of the ‘floor’ and must therefore be interpreted by interpreters. As a result, a new definition had to be found for ISO 23155:

electric circuit serving as a path for information spoken, signed or otherwise pre-
sented in the course of the proceedings of a conference by participants other than
conference interpreters.

Subsequently, ISO 24019:2022, on simultaneous interpreting delivery platforms, was published nine months later, in September 2022. ISO 24019 draws a definition from the 2019 version of ISO 20539 (the ‘vocabulary standard’), albeit with a few changes:

audio output of conference system or simultaneous interpreting delivery platform conveying auxiliary input and input from microphones, excluding input originating from interpreters interpreting from a spoken language.
[SOURCE: ISO 20539:2019, 3.5.2.34, modified – ‘or simultaneous interpreting deliv-
ery platform’ has been added, ‘auxiliary input’ has been placed first, ‘microphone
input’ has been changed to ‘input from microphones’ and ‘excluding input originating
from interpreters interpreting from a spoken language’ has been added.]

Experts felt that, to add clarity and bring the definition closer to the way it is used at confer-
ences, some reference should be made to ‘content other than that produced by interpreters’,
hence the addition of ‘excluding input originating from interpreters interpreting from a
spoken language’.
In the meantime, ISO 20539:2019, the 2019 version of the ‘vocabulary standard’, was withdrawn, revised, and succeeded by ISO 20539:2023. It now carries the following
definition of ‘floor’, derived directly from the ‘platform standard’, ISO 24019:2022. Note
that the source reference between brackets in the preceding quote has also disappeared, as
the 2019 version of 20539 has been withdrawn. The current ‘clean’ definition of ‘floor’ in
ISO 20539:2023 is as follows:

audio output of conference system or simultaneous interpreting delivery platform conveying auxiliary input and input from microphones, excluding input originating from interpreters interpreting from a spoken language.

However, due to the term ‘audio output’, this definition still restricts ‘floor’ to audio content
and excludes a signer presenting at a conference. This is an issue which is likely to be dis-
cussed during future ISO meetings. From this brief representation of the varying definitions
of the concept ‘floor’, it is hoped that the reader has been provided with insight into why
definitions in ISO standards must always be considered a ‘work in progress’.

20.6 Technical standards are not just technical


Technical standards tend to be perceived as the standards which lay down specifications
concerning sound, as used by interpreters in interpreting booths. However, careful read-
ing of technical standards related to interpreting reveals clauses which constitute valuable
guidelines about how interpreting should be carried out. These include references to good
interpreting practices, operational arrangements, quality concerns, or health and safety
concerns that have been embedded in technical standards. To illustrate, the NOTE under
clause 8.1 of ISO 20109 reads:

Portable interpreting equipment is meant to be used in specific environments and for specific specialisations and is not meant to replace mobile booth installations, where these are deemed appropriate.

This explanation is highly relevant to the issue of quality in an indiscriminately ‘cost-slashing’ economy. In an attempt to reduce costs, some users may push for a temporary interpreting system, based on mobile booths, to be replaced by a portable one. This clause clarifies things beyond any doubt and sets a clear rule about how things should be done.
In a similar vein, ISO 20109, Annex C (normative) System operation, paragraph C.2
refers to the presence of a conference technician:

At least one qualified conference technician shall be present throughout the event/
conference, in order to monitor the correct functioning of the equipment. The techni-
cian may either be physically present or located in a centralised control booth or room.

This is particularly relevant to DI settings where, in extremis, interpreters could be called to work from private premises without technical support. The absence of technical support exacerbates cognitive load for conference interpreters to unacceptable levels, at the expense of effective multilingual communication.
Likewise, under clause 5.2, on image quality, ISO 20108 refers to this:

In case of excessive blurring, freezing or artefacts, interpreting may have to be temporarily suspended until the image stabilises.

No conference interpreting can be provided, on-site or remotely, without visual cues. There-
fore, at least one EU institution instructs interpreters to stop interpreting when a speaker
does not use a camera or if it is switched off. This and other relevant provisions facilitate
negotiations between users of conference interpreting and CISPs.
Similarly, clause 5.1, ISO 17651–1, on permanent booths, reads:

Each booth shall accommodate interpreters comfortably seated side by side... Perma-
nent booths providing space for no more than one interpreter do not conform to this
document.

This is an essential reminder of the team effort required by interpreters in conference interpreting.
In view of the analysis in Section 20.1, these examples further corroborate the invaluable role played by standards in promoting quality in interpreting and serving as benchmarks to stakeholders. In addition, standards facilitate understanding between providers and buyers of interpreting services and provide guidance to tendering and contracting authorities.

20.7 Distance interpreting in ISO standards


Distance interpreting (DI) is a relatively new means of delivery of interpreting that comple-
ments on-site interpreting. While it is often celebrated as a ‘technological revolution’, DI
essentially consists of the integration of sound and video transmission over the internet, a
technology that has been commercially available for some time. However, this relatively
modest progress in technology has ushered in a disproportionately sized, almost-disruptive
change in the interpreting services market. Before discussing this further, DI must be defined
by ISO standards.

20.7.1 Defining distance interpreting


DI is a generic term that encompasses all forms of interpreting provided remotely, from
basic telephone interpreting (no visual communication) to remote simultaneous interpret-
ing (RSI) provided over sophisticated simultaneous interpreting delivery platforms (SIDPs).
ISO 20108, published as early as 2017, discusses DI in a progressive manner:

ISO 20108:2017 specifies requirements for different varieties of distance interpreting situations in which the interpreters are not at the same location as one or more of the conference participants.
(Scope)

Interestingly, this description – an accurate and comprehensive, albeit indirect, definition of distance interpreting – provides a wider and potentially more appropriate definition compared to those found in more recent ISO standards.
ISO 20109:2016 (3.10) defines DI as ‘simultaneous interpreting of a speaker in a dif-
ferent location from that of the interpreter, enabled by information and communications
technology’. However, published one year later, ISO 20108:2017 removes the word ‘simul-
taneous’ from the definition.
In 2022, the definition of DI (‘remote interpreting’ being an admitted term) becomes
more inclusive in ISO 23155:2022, including a provision for sign languages (3.3.21):

[I]nterpreting of a speaker or signer in a different location from that of the interpreter, enabled by information and communications technology.

However, neither definition encompasses the range of situations where interpreters are
physically separated from either one or more speakers or signers, from part or all of the
audience, or a combination of both (see Braun, Warnicke, Chmiel, and Spinolo, this vol-
ume). Nevertheless, while not providing a fully satisfactory definition, ISO 24019 does
come closer to this approach, mentioning ‘settings where the interpreters are not at the
same venue as participants, speakers and signers or each other’ (Introduction).
One can therefore argue that an improved definition of DI should encompass all settings
and arrangements where physical separation between interpreters and participants results
in reduced sensory input for the interpreters.

20.7.2 The ‘four degrees of separation’ model and cognitive load


A more comprehensive definition of DI is extremely important for interpreters. The effects
that DI has on cognitive load do not result exclusively from the interpreter’s separation
from speakers or signers. Physical separation from the audience is also an aggravating fac-
tor, for example, and is described by interpreters as being akin to a loss of control:

This social information about participants’ feelings, emotions, and attitudes to the
other delegates and to what is being discussed is of vital importance to the interpreter
as it constitutes the general framework that defines a communication event.
(Diriker, 2004, in Moser Mercer, 2005, 730)

Much effort has gone into producing complex topologies that describe the respective posi-
tioning of speakers, moderators, interpreters, technicians, and listeners. However, the only
practicable way of discussing the effects of DI on interpreters is to put interpreters at the
centre and discuss their separation from various categories of participants:

1. Separation from speakers and signers
2. Separation from the audience
3. Separation from interpreters sitting in other booths (other languages)
4. Separation from technical support
5. Separation from interpreters of the same booth/language (‘home alone’)

Technical issues aside, what is common to this range of settings is that progressively – with
each degree of separation – interpreters receive less and less information (sensory input)
from the communicative event. In addition, interpreters are also loaded with additional
tasks, such as those which are normally carried out by specialised technicians. This ‘lethal’
combination of reduced sensory input and increased cognitive load is discussed by AIIC
member Andrew Constable, who links it to the question of interpreting quality. Constable
makes a distinction between intrinsic and extraneous cognitive load in DI settings:

The intrinsic cognitive load will be related to the difficulty of the source speech (e.g.,
speed of delivery, density, vocabulary, level of specialism) related to the capacity of
the interpreter to perform the task (experience, subject knowledge, level of prepara-
tion, etc.).
(Constable, 2021, 4)

Subsequently, Constable explains the concept of extraneous cognitive load in interpreting: ‘Extraneous cognitive load is additional cognitive load caused by factors external to
the intrinsic difficulty of the speech or capacity of interpreter’ (Constable, 2021, 4). It is
worth noting that, in DI, the visual inputs that are available to the interpreter are limited,
and the quality of audio signal is poorer. As a practising conference interpreter, Constable
(along with many other practitioners) feels that this factor increases a task’s difficulty and
thus requires additional mental effort from the interpreter. Therefore, it is more likely that
the interpreter will suffer from cognitive overload, resulting in interpreting error, mental
fatigue, or both (Constable, 2015).
In addition, increased cognitive load can also result from exotic tasks (used in the origi-
nal sense of the term: tasks originating in a foreign, non-native country, ‘from the outside’).
These tasks are neither directly nor indirectly related to interpreting. Interpreters not work-
ing from conference rooms or interpreting hubs16 are called upon to execute additional
roles. Such roles may include acting as technicians and, occasionally, as moderators. Inter-
preters may also have to negotiate in real time with speakers regarding their equipment or
communicate with a remote boothmate via a variety of devices. They may also find them-
selves troubleshooting internet connection problems. Interpreting outside of a conference
centre or an interpreting hub, without technical support, therefore deprives interpreters of
a purpose-made protective, controlled environment. As a result, doing so entails an (addi-
tional) exotic workload, which may result in a corresponding addition to their cognitive
load. The debate on exotic tasks and their impact on interpreting quality is ongoing.

Similarly, research on the implications of physical separation in DI is also ongoing. However, a seasoned interpreter, overheard commenting on the range of sound and video
provided via multiple screens, appears to have aptly summed up the situation experienced
by most interpreters in this setting, albeit in simplified terms: ‘Interpreters experience the
strange feeling of being there without really being there.’ One could argue that this is exactly
the problem resulting from reduced sensory input: Interpreters are never really there.

20.8 Conclusion
Reflecting on the core topic discussed in this chapter, it is essential to understand the need for interpreting standards. ISO standards incorporate a vast amount
of knowledge, experience, and good practices from various domains. Standards are shaped,
to a large extent, by industry pioneers and show others the way forward. Since the effort towards standardising interpreting commenced 14 years ago, significant progress has
been made to establish a valuable tool that raises awareness and makes an impactful entry
into the realm of international standards.
In the realm of standardisation, the incorporation of terms and definitions plays a pivotal
role in promoting awareness and establishing a shared language across diverse fields. Pro-
fessionals and practitioners can draw upon these standardised terms and foster heightened
consistency in documentation. This ensures more effective communication and ultimately
facilitates the successful application of standards in a wide range of varied contexts. Stake-
holders utilising interpreting services are empowered to make informed decisions when
seeking interpreting services. The interpreting profession is poised for more organised and
systematic growth, with the standard serving as a critical reference point. In addition, those
embarking on a career in interpreting can gain a clearer understanding of the expectations
for delivering a proficient performance, with reference to current standards.
Furthermore, interpreting standards play a crucial role in advancing the profession towards
‘Sustainable Development Goal 10: Reduced Inequalities’ through the promotion of inclusive
communication. These standards serve to guarantee fair access to information and services.
They aim to dismantle language barriers and foster understanding among diverse communities.
Offering a consistent and universally applicable framework for language services, interpreting
standards actively contribute to the creation of a more inclusive and accessible global landscape.
In relation to the complex intersection between interpreting and technology, standards
also provide constant reflection in relation to technological progress. The two current
major changes in the industry, distance interpreting and artificial intelligence, are examples
of such complexities faced by both providers and users of interpreting today. AI tools are already used extensively during preparation ahead of an assignment, mainly to extract terminology from conference-related material (see Prandi, this volume).
interpreters during an assignment. The success of such tools will depend on whether their
application causes additional cognitive load (distractions) during interpreting. Finally, a
third emerging category of AI tools involves that of machine interpreting, with the ambition
of replacing humans in suitable interpreting assignments in the future.
AI could indeed bring about disruptive changes to interpreting in the future, likely in a
more evident way than machine translation has changed translation. However, until AI starts
seriously interfering with human interpreting, DI remains the most recent key development in
the interpreting industry, despite the conventional technology it uses (combined transmission

of sound and image over the internet). DI can be deemed ‘conventional’ because it does not
affect the core of interpreting; it does not impinge on or emulate the mental processes in the
interpreter’s brain, as AI may do in the future. However, although DI merely represents a new
means of delivery, it is still proving deeply disruptive in several ways.
To illustrate, the launch of distance interpreting, accelerated by the COVID-19 pan-
demic, helped ensure the continuation of business during a global health crisis. However, DI
contributed much more than that: simultaneous interpreting delivery platforms (SIDPs) enabled novel business models. Large LSPs, which traditionally focused on translation (where technology already played a role), realised that
interpreting could also become a profit centre, and thousands of linguists entered the inter-
preting market without credentials. DI blurred the dividing line between conference inter-
preting and other types of interpreting, and quality temporarily became a reduced priority.
As this first cycle of advancement reaches its conclusion, the language services indus-
try must review its impact. A series of technical issues and operational challenges casts a shadow over DI. Interpreters feel that the sound quality transmitted to their ears can
present a health hazard. The question of whether interpreters receive adequate sensory
input to perform efficiently remains unanswered. No conclusive research exists concerning
interpreters’ new working conditions. For example, research confirms increased difficultly
within DI, indeed, but has not reached a conclusion about the corresponding necessary
reduction of working hours; or about whether working from home is conducive to qual-
ity; the forms of interpreting that could be covered, even in a rudimentary way, in ‘home
alone’ mode; how to help interpreters cope with increased stress in DI settings; or whether
it is acceptable to work in a windowless booth. It is hoped that further research into ISO
standards will help improve quality interpreting in DI settings.
DI technology currently removes, or at least calls into question, fundamental premises that interpreters have taken for granted for decades: physical proximity and unlimited access to speakers and listeners, full immersion in the proceedings of the conference, and genuine situational awareness of the meeting room and its surroundings. As a result, DI affects the interpreter's working environment and working conditions by reducing sensory input and adding workload (including from exotic tasks). This leads to excessive cognitive load and an accelerated onset of fatigue.
Accepting the hypothesis that the human brain has not evolved over the past five years (a split second on the evolutionary timescale), one can reasonably assume that in DI settings the interpreter's brain is pushed beyond the boundaries of its known 'envelope'. Even in traditional (on-site) settings, the interpreter's brain often operates at the edge of this 'envelope'. In times of difficulty, when mental resources are overwhelmed by the total effort required or, as Daniel Gile writes, 'when total available processing capacity is insufficient', the interpreter's brain is oversaturated. As a result, interpreting output starts to fail ('errors, omissions and infelicities can occur') (Gile, 2021). In DI, on top of the known interpreting challenges, this same brain undergoes a novel, debilitating deficit in sensory input, combined with an increased cognitive load. This mix predictably leads to more frequent or more serious interpreting failures.
This phenomenon summarises the issues that interpreters face in DI settings. However, it is not limited to interpreters but is, in fact, critical for the interpreting service as a whole. If interpreting fails, the entire conference interpreting service collapses, and in this sense interpreters can be considered a 'single point of failure' in the process (see layer 1 in ISO 23155). To sum up, effective multilingual communication cannot be guaranteed in DI when interpreters perform under serious handicaps.


The question remains as to whether these handicaps can be remedied at all. What can be done to restore the status quo ante (i.e. make interpreters feel like they are 'really there' despite physical separation from the various conference stakeholders)? How can interpreters be relieved of exotic tasks? How can the excess cognitive load be mitigated to allow interpreters to operate within the 'envelope' again?
Answers to these questions may lie in what has made interpreting, in particular, con-
ference interpreting, such a successful practice and the foundation of spoken interlingual
communication over decades: implementing best practices combined with a high level of
education and training at reputable interpreting schools worldwide.
While an inordinate amount of time was spent lamenting DI-related issues during its initial years (internet connection failures or unsuitable microphones), it is time for the industry to acknowledge the extensive expertise and good practices that have enabled millions of hours of conferences to take place without issue since Nuremberg, and to invest in perpetuating this stellar record. This can be achieved by adopting and implementing the practices now universally enshrined in acclaimed ISO standards.

Notes
1 The name does not match the acronym because ISO is derived from the Greek word ἴσος (ísos),
which means ‘equal’. This signifies that, irrespective of the country or language, we operate on an
equal footing, emphasising a universal and equitable approach in our work.
2 ISO 18841:2018 Interpreting Services – General Requirements and Recommendations. URL
www.iso.org/standard/63544.html (accessed 20.2.2024).
3 ISO 23155:2022 Interpreting – Conference Interpreting – Requirements and Recommendations.
URL www.iso.org/obp/ui/en/#iso:std:iso:23155:ed-1:v1:en (accessed 20.2.2024).
4 ISO 23155:2022 Interpreting – Conference Interpreting – Requirements and Recommendations.
URL www.iso.org/obp/ui/en/#iso:std:iso:23155:ed-1:v1:en (accessed 20.2.2024).
5 ISO 23155:2022 Interpreting – Conference Interpreting – Requirements and Recommendations.
URL www.iso.org/obp/ui/en/#iso:std:iso:23155:ed-1:v1:en (accessed 20.2.2024).
6 ISO, ISO House Style. URL www.iso.org/ISO-house-style.html#iso-hs-s-text-r-plain (accessed
24.2.2024).
7 ISO, Consumers and Standards: Partnership for a Better World. URL https://www.iso.org/sites/
ConsumersStandards/5_glossary.html (accessed 2.3.2024).
8 ISO, Certification. (7.2.2023). URL www.iso.org/certification.html (accessed 2.3.2024).
9 ISO, Certification. (7.2.2023). URL www.iso.org/certification.html (accessed 2.3.2024).
10 Replaced by ISO 13611:2024 Interpreting Services – Community Interpreting – Requirements
and Recommendations. URL www.iso.org/standard/82387.html (accessed 2.3.2024).
11 ISO 20228:2019 Interpreting Services – Legal Interpreting – Requirements. URL https://www.iso.
org/standard/67327.html (accessed 4.3.2024).
12 ISO 21998:2020 Interpreting Services – Healthcare Interpreting – Requirements and Recommen-
dations. URL www.iso.org/standard/72344.html (accessed 4.3.2024).
13 ISO 24019:2022 Simultaneous Interpreting Delivery Platforms – Requirements and Recommen-
dations. URL www.iso.org/standard/80761.html (accessed 4.3.2024).
14 ISO 20109:2016 Simultaneous Interpreting – Equipment – Requirements. URL https://www.iso.
org/standard/67063.html (accessed 4.3.2024).
15 ISO 23155:2022 Interpreting – Conference Interpreting – Requirements and Recommendations.
URL www.iso.org/standard/74749.html (accessed 20.2.2024).
16 The definition of 'interpreting hub' in the current form of ISO 17651-3 is 'facility managed by specialized staff, with interpreting workspaces and fully equipped for the provision of distance interpreting'.


References
Constable, A., 2015. Distance Interpreting: A Nuremberg Moment for Our Time? AIIC 2015 Assem-
bly Day 3: Debate on Remote. 18 January. URL https://aiic.ch/wp-content/uploads/2020/05/
di-a-nuremberg-moment-for-our-time-andrew-constable-01182015.pdf (accessed 16.7.2024).
Constable, A., 2021. Extraneous Cognitive Load in Distance Interpreting, November. URL www.
researchgate.net/publication/380519089 (accessed 16.7.2024).
Gile, D., 2021. The Effort Models of Interpreting as a Didactic Construct. In Muñoz Martín, R.; Sun,
S. S., Li, D. (eds). 2021. Advances in Cognitive Translation Studies. Singapore: Springer Nature,
139–160.
ISO, 2010. Ed. 1, Guidance for ISO Liaison Organizations. URL www.iso.org/publication/
PUB100270.html (accessed 10.2.2024).
ISO, 2011. Ed. 1, Guidance for ISO Liaison Organizations – Engaging Stakeholders. URL http://www.iso.org/iso/guidance_liaison-organizations.pdf (accessed 10.2.2024).
ISO, 2019. Ed. 2, Guidance on the Systematic Review Process in ISO. URL https://www.iso.org/files/
live/sites/isoorg/files/store/en/PUB100413.pdf (accessed 16.2.2024).
ISO, 2020. My ISO Job – What Delegates and Experts Need to Know. URL www.iso.org/publication/
PUB100037.html (accessed 16.2.2024).
ISO, 2023a. Ed.7, Getting Started Toolkit for ISO Committee Chairs. URL www.iso.org/publication/
PUB100417.html (accessed 12.2.2024).
ISO, 2023b. ISO/IEC Directives, Part 1, Consolidated ISO Supplement. URL https://www.iso.org/sites/
directives/current/consolidated/index.html (accessed 15.2.2024).
ISO, 2024. Ed. 7, My ISO Job – What Delegates and Experts Need to Know. URL www.iso.org/publication/PUB100037.html (accessed 10.7.2024).
ISO/IEC, 2021. ISO/IEC Directives, Part 2 Principles and Rules for the Structure and Drafting of
ISO and IEC Documents, 9th ed. URL www.iso.org/sites/directives/current/part2/index.xhtml
(accessed 24.2.2024).
Moser-Mercer, B. 2005. Remote Interpreting: Issues of Multi-Sensory Integration in a Multilingual
Task. Meta, 50(2), 727–738. URL https://doi.org/10.7202/011014ar (accessed 10.7.2024).

21
WORKFLOWS AND
WORKING MODELS
Anja Rütten

21.1 Introduction
This chapter will cover the different ways in which an interpreter's workflow is affected by, and has changed under the influence of, technology. While the focus is on conference interpreting, many aspects also apply to other forms of interpreting. The chapter will first look at the evolution over time, in chronological order. It will then analyse the topic from the perspectives of phases and efforts, the semiotics of interpreting, and information and knowledge. Finally, it will look at the impact of technology from a business point of view.

21.2 Technologisation over time


Technology, broadly defined, can be considered 'the application of conceptual knowledge for achieving practical goals, especially in a reproducible way' (Skolnikoff, 1993, 13). Over time, technology was first used mainly to mediate interpreting, while later on it also became a means to support interpreters in their work (Braun, 2019). Overall, all aspects of technologisation have led to a gradual intensification of the process of interpreting. Although there does not seem to be a common definition of work intensification in the scientific literature, in this particular context it can be defined as 'performing more tasks within the same (or less) time', with time clearly being one determining factor (Heery and Noon, 2008; van Herpen, 2017, 8ff).

21.2.1 Pen and paper


Technology most probably first made its way into (consecutive) interpreting in the form
of pen and paper. Used as a means of cognitive support for the interpreter’s memory, pen
and paper have given rise to a note-taking technique (Rozan, 1956; Matyssek, 1989) that
is more meaning-based than stenography, for example. To illustrate, this note-taking technique made consecutive interpreting possible during the negotiations of the Treaty of Versailles, at the League of Nations, and at the International Labour Office (AIIC, 2019a). Note-taking increases convenience for the parties to the communication, since longer passages can be interpreted. This, in turn, reduces the number of interruptions to a speech.
For interpreters, notes support their memory, on the one hand, but on the other, the act of
both note-taking and listening to the speaker simultaneously, over longer stretches of time,
increases their cognitive load (Chen, 2018, 91).

21.2.2 Simultaneous interpreting


The one form of technology which revolutionised the profession like no other appeared
post-war. Simultaneous interpreting was first prominently used in the form we know today
after WWII, during the Nuremberg trials, although some experimental forms existed in the
1920s (Gaiba, 1998; AIIC, 2019a). This technology, just like note-taking, created a mas-
sive boost in convenience and efficiency for speakers and listeners alike. For the first time,
multilingual communication could take place without a significant loss of time.
On the other hand, the introduction of simultaneous interpreting, or ‘simultanification’,
clearly represented the greatest intensification of the interpreting process in the history
of interpreting. The sub-processes of listening and memorising, which used to precede
the sub-process of speaking, became superimposed. Interpreters now had to process two
streams of information in parallel, literally doubling their mental workload per time unit.
At the same time, simultaneous interpreting also isolates interpreters from the communica-
tion setting by placing them into sound-insulated booths.
As increasing numbers of international organisations were created, the two typical modes
of conference interpreting (simultaneous and consecutive), and with them the concept of
conference interpreting itself, were here to stay. As a result, working standards started
to be discussed. The International Organisation of Conference Interpreters was founded
in 1953. With the intensification of the interpreting process and corresponding cognitive
load, working in teams of at least two interpreters taking turns became a business standard
(AIIC, 2019b).

21.2.3 The ‘bidule’ and ‘SimConsec’


In the decades that followed, no major impactful technological changes occurred. How-
ever, one hardware technology worth mentioning is the increased use of tour guide sys-
tems (French bidule) as a replacement for simultaneous interpreting equipment. Their primary use was for whispered simultaneous interpreting (French chuchotage) in settings where participants move around (e.g. factory visits or guided tours) or where, for practical reasons, no full simultaneous interpreting equipment could be installed (AIIC,
2019c). Again, this may have increased convenience and efficiency for meeting partici-
pants, but this also increased intensity and cognitive load in the interpreting process. This
is because these systems lack sound insulation. As a result, in addition to the usual simul-
taneous interpreting process, the interpreter also has to filter out the sound of the source
language from ambient noise and their own voice. This renders interpreting more tiring and
more prone to errors (see also Korybski, this volume).
Yet another rather hardware-driven technological novelty that appeared around the turn
of the millennium is so-called ‘SimConsec’. This was first introduced by EU staff interpreter
Ferrari in 1999. A small digital voice recorder is used to record the original speech while
the interpreter listens to it and decides whether to take additional notes. After that, the interpreter plays back the recording into earphones and renders it in simultaneous mode
(Hamidi & Pöchhacker, 2007; Ferrari, 2002). Digital pens with built-in cameras can now-
adays also film notes on special microdot paper. These microdots enable the camera to
record the position of each element of the notes and match it with the recording. In this
way, the interpreter, while interpreting, can also play back the sound of the recording that
corresponds to a certain note element (Orlando, 2015, 144, this volume). While the tech-
nique of SimConsec clearly represents a cognitive relief to interpreters and – as research
suggests – increases the accuracy of the interpreting performance, ‘classical’ consecutive
seems to be rated higher by the audience (Svoboda, 2020, 74).
Both whispered interpreting and SimConsec are suitable for similar settings – situations
in which a simultaneous booth cannot be installed due to lack of space or movement of
the participants (e.g. factory tours or a change of location). Although the author does not have statistics on the use of SimConsec at her disposal, long-standing experience exists in the admission committee of the German Association of Conference Interpreters, where all senior members must submit a list of 200 working days as proof of proficiency. From this perspective, it is fair to say that SimConsec has no relevance, at least in the German conference interpreting market, whereas the use of tour guide systems is quite widespread. On a continuum of maximum and minimum cognitive load for the interpreters, the
two techniques can be seen as opposite extremes, with correspondingly opposite levels of
comfort for the communication participants (maximum vs minimum time delay, freedom
of movement for listeners thanks to headphones in the case of tour guide systems). Client
convenience seems to be a more dominant factor than interpreter convenience or interpreta-
tion accuracy.

21.2.4 Digitalisation of secondary information and knowledge work


All technological novelties in interpreting discussed so far are hardware-driven. As Braun
(2019) states, these are forms of technology-mediated and technology-supported interpret-
ing (as opposed to technology-generated interpreting) that mainly affect the ‘external’,
practical workflow. However, another dimension of technology-supported interpreting is
the support of the mental process of interpreting as such, that is, an ‘internal’ and cognitive
support.
Digitalisation and the emergence of new hardware and software (portable computers, handheld devices, widely available generic user software, and, in the 1990s and early 2000s, the World Wide Web and email) provided completely new possibilities for interpreters to manage and
exchange information with clients and colleagues. In addition to generic software solutions
like word processing, spreadsheet, and database programmes, interpreter-specific tools also
emerged, with the aim of assisting terminology management during simultaneous interpret-
ing. Although their uptake among conference interpreters is not the norm (Jiang, 2015; Wagener, 2012, 53; Corpas Pastor and Fern, 2016, 34, 53), examples of terminology management tools include Interplex, Lookup, and Interpreters' Help (Rütten, 2017, 98ff). In any
case, for the first time, it became possible to edit, copy, paste, and reorganise texts and
glossaries back and forth effortlessly, and documents and glossaries could be shared. Inter-
preters’ (and everyone’s) information work and exchange became faster and much more
efficient as a result. Technical glossaries around the most obscure subjects, in any language
imaginable, are said to still be jealously guarded in the basements of more experienced col-
leagues. Once a scarce commodity for many decades, information on highly specific topics

390
Workflows and working models

in different languages has gradually become commonplace (with Linguee and Wikipedia
being the icing on the cake). Information overload has become a highly discussed topic, not
only in the world of interpreting. The proliferation of texts on all sorts of subjects, in many
languages, has been fuelled by the onset of search engine optimisation, not to mention
AI-generated texts in recent years. And still, new glossaries are being created from scratch
every day. This underlines the highly specialised and contextual nature of interpreters' work and their need for highly targeted information.

21.2.5 Digitalisation of administrative and organisational tasks


As far as work organisation for freelance interpreters is concerned, at the beginning of the millennium it was still common practice to use a public phone to check one's answering machine at home to see if a client had called, which was considered quite progressive at the time. Since then,
mobile devices and mobile internet have changed the way interpreters organise their opera-
tive work. Even managing other clients and jobs while in the booth, or preparing the next
day’s assignment, has become commonplace, even more so with meetings being scheduled
and preparatory documents arriving at ever shorter notice. In addition, expected response
times to email and voice messages are not the same as they were in the 1980s either. All
in all, the mere possibility of carrying out tasks not directly related to the job at hand has
added yet another – albeit optional and manageable – layer of cognitive load to the inter-
preter’s work, especially in the booth. This represents a further increase in efficiency, or
another step towards intensification and ‘simultanification’.

21.2.6 Cloud-based collaboration


Moving further into the new millennium, cloud-based online collaboration emerged.
Google Docs and Sheets were launched in 2006 and had a major rebuild in 2010, which
made cloud-based online collaboration more convenient for teamwork (McHugh-Johnson,
2021). It was around this time that team glossary building became popular among confer-
ence interpreters. For the first time, real-time workload-sharing was possible. This brought
another increase in efficiency. It is worth noting that team glossaries can add complexity
and additional coordination effort, that is, increased cognitive load, to glossary creation,
as more people work on it, potentially simultaneously, and in some cases, additional lan-
guages are added to a file. On the other hand, the workload required to screen documents,
extract terminology, and find equivalents can be shared among team members. Similarly,
the consistency of the language used by the team is improved, as everyone has the same level
of information. Team spirit can also be strengthened, which is potentially a useful counter-
balance to remote interpreting settings. For the first time, technology brings the potential of actually reducing workload, as well as of bringing closer together teams of interpreters who are not co-located, as in videoconference settings.

21.2.7 Videoconferencing
The next and most recent form of technology that has had a huge impact on conference
interpreting – the term ‘technology’ is again used here to mean mediation and not support
(Braun, 2019) – is videoconferencing. While tests using audiovisual connections for remote
interpreting were conducted long before the turn of the century (Ziegler and Gigliobianco, 2018, 123) and over-the-phone interpreting has long been a reality, it was not until the outbreak of COVID-19 that videoconference interpreting, in particular remote simultaneous interpreting (RSI), became a 'new normal' setting for conference interpreters. The ISO standard 20108:2017 defines distance interpreting or remote interpreting as 'interpreting of a speaker in a different location from that of the interpreter, enabled by information and communications technology (ICT)'. While this can mean that all meeting participants are in the same location, with only the interpreters connected remotely, nowadays videoconferences in which both the meeting participants and the interpreters are distributed across different locations around the world are common practice in business and political settings, as are hybrid meetings, in which some participants and interpreters are physically co-located while others are connected remotely, from their own (home) offices or commercial interpreting hubs (AIIC, 2019d; Braun, this volume).
Depending on the remote setting, additional tasks may need to be fulfilled by the inter-
preter, particularly in remote simultaneous interpreting (RSI) settings:

• If not working from a hub: creating the technical set-up before the meeting (computer,
ethernet connection, camera, microphone, videoconferencing software)
• Managing the videoconferencing software during the meeting
• Maintaining contact with the client/conference participants via chat/email, etc.
• If not co-located with their booth partner: communicating via chat/email/video or voice
backchannels, finding a way of supporting each other remotely, managing microphone
handover swiftly

Aside from these additional tasks and the alienation from the setting, limited access to and contact with the communicative setting and participants, potentially poor sound quality, and split attention in hybrid meetings add another layer of cognitive load. In institutions like the European Commission, the European Parliament, and the European Patent Office, this has led to an increase in team strength or a restriction of working hours in settings involving videoconference participants (Mahyub Rayaa and Marti, 2022).

21.2.8 Computer-aided interpreting and artificial intelligence


Finally, in recent years, computer-aided interpreting (CAI) tools have become a
much-discussed topic in conference interpreting. Fantinuoli defines CAI tools as ‘all sorts
of computer programs specifically designed and developed to assist interpreters in at least
one of the different sub-processes of interpreting’ (Fantinuoli, 2018, 155). However, many
tools can be considered tools that provide computer aid for interpreting, even if they have
not been designed for this purpose (Rütten, 2017, 99). Everything can be considered as
computer-aided interpreting tools, from the initial glossaries in text – or multiterm – format
and electronic dictionaries in the early days up to automatic term extraction, to abstracting
and machine translation (nowadays often based on large language models). As for the con-
cept of ‘computer-aided or computer-assisted interpreting’, in a broader sense, this could
therefore be considered to be any act of interpreting that is performed with the support of
computer technology in any of its phases, of which tailor-made systems for interpreters and,
in particular, those providing live prompting are one part.
The most recent technological leap in computer-aided interpreting is artificial intelligence in the form of large language models trained on enormous datasets (Leffer, 2023). These represent impressive progress in machine translation and speech recognition, among other fields. Thus, AI has made its way into interpreters' workflows, where it could have an even more sizeable impact. Aside from tasks such as abstracting, terminology extraction, and glossary
creation, AI support is also possible during simultaneous interpreting. Examples include
SmarTerp, InterpretBank’s Automatic Booth Assistant, and Cymo Note, which already
offer live support when it comes to displaying terminology, numbers, and named entities.
In addition, the University of Ghent’s EABM (Ergonomics for the Artificial Booth Mate)
project also aims to provide a similar tool. In terms of cognitive load, it remains to be seen whether real-time, automated computer support adds another layer of cognitive load, causes split attention, and increases stress and/or deteriorates performance, or whether it relieves the complex process of simultaneous interpreting. Research to date shows a mixed picture (cf. Chapter 2.3). In terms of listening support, generic transcription tools like otter.ai or notta.ai, or live captions and translations like, for example, Speechmatics, may become an alternative to bespoke CAI tools in the future. These tools are, of course, far from intuitive to 'read' while interpreting, but their user interfaces might eventually be optimised in a way that also meets interpreters' needs.

21.2.9 Screen multiplication


In simultaneous interpreting especially, technological innovation introduced since the beginning of the millennium has led to a proliferation of the sources of information to be handled. The multitude of screens potentially present in a simultaneous interpreting setting (nowadays not necessarily a 'booth' but a desk in an interpreter's office) is symbolic of this proliferation of information sources. This phenomenon is illustrated in Figure 21.1, where each source of information is shown as a separate screen. In practice, not all of these necessarily have their own physical screen; some can be found as windows on one or several computers.

Figure 21.1 Multi-resource booth environment.

Working in such a (metaphorical) multi-screen or multi-resource environment requires a higher degree of attentional coordination. The interpreter is expected to handle their cognitive capacities wisely and capitalise on the wealth of resources provided.

21.3 Workflow dimensions: phases and efforts, semiotics, and information and knowledge
As outlined earlier, 'technologisation' has impacted the profession of conference interpreting in several ways. Technology-based intensification through 'simultanification' and more information to be processed improves efficiency but, at the same time, increases complexity and cognitive load, as well as alienation from the interpreting setting. All these developments have a variety of effects on different dimensions of the workflow, that is, the way an interpreter's work is organised (Cambridge Dictionary, 2024).
When looking at the approaches found in interpreting studies, there are several ways of
structuring an interpreter’s workflow. These can be:

• The levels of processing, from data via information to knowledge
• The semiotic levels of the information being treated
• The timeline and subtasks or efforts of interpreting

21.3.1 Data, information, knowledge, and management


Interpreters are knowledge workers who constantly move back and forth between settings,
languages, and topics. To do this, they rely heavily on their own knowledge. This is com-
plemented by external information sources explored during preparation, ‘on the job’, and
afterwards. This process can be considered ‘secondary knowledge work’ – its purpose being
to ensure that the actual, ‘primary’ knowledge work, that is, the activity of interpreting, is
carried out properly (Rütten, 2007, 102f).

Figure 21.2 Primary and secondary knowledge work.


21.3.1.1 Conceptual framework


The interplay between data, information, and knowledge is characteristic of the different, interwoven processing levels, in particular in meeting preparation. In the field of conference interpreting, information can be defined as 'knowledge that has been activated and referenced and that is needed for a given decision or a given context of action (relevance, context and purpose)'. Accordingly, it has a value in the sense of being useful or having a particular effect (pragmatics). It also has a specific meaning (semantics), which is dependent on the purpose or context, and thus on the given circumstances in which it is used (the individual state of mind of the subject using the information, the situation). It is knowledge that the current actor does not possess personally or cannot access directly (novelty). This knowledge can only be received as a representation/encoding of knowledge and is not necessarily true. Data, in contrast, is unstructured, isolated, and context-independent. It consists of signs structured by syntactic rules and only becomes interpretable information in context.
Knowledge is the 'totality of knowledge and skills that have been acquired at a given point in time', whether acquired in a specific context, due to a current necessity, or specifically sought out; it can be interpreted and applied by comparison with existing knowledge. It is available for future use in the long term, regardless of the specific context of use. It has both a semantic and a pragmatic dimension, as it is used in specific situations and, therefore, also includes situational knowledge. Knowledge is always bound to a person and, unlike information, is not coded. It includes declarative knowledge ('knowing what', facts) and procedural knowledge ('knowing how', performing tasks) (Rütten, 2007, 20f; Kuhlen et al., 2004).
The different levels of enrichment corresponding to data, information, and knowledge
apply throughout the different phases of secondary information and knowledge work.
Obviously, they are not strictly sequential but are rather interwoven.
Level I: Data work. Rather mechanical retrieval and compilation of data, searching for relevant information (agendas, minutes of previous meetings, presentations, manuscripts, websites of the parties involved, and glossaries). This may also include physically obtaining data.
Level II: Information work. Processing and analysing the information contained in the data, that is, extracting from this 'raw material' the elements which are potentially relevant for the assignment (terms, meanings, context), and looking up missing information, such as semantic background and terminological gaps. Data is turned into information inasmuch as it is (at least potentially) relevant, new, and useful in a particular context. Terms that are considered relevant are entered into a paper or computer glossary or a more sophisticated term database, where they are assigned subject areas, clients, priority, date, etc. Short lists are created in digital format or on a sheet of paper/Post-its so that the most critical information is visible at all times in the booth.
Level III: Knowledge work. Making use of the processed information and checking it against one's personal knowledge; deciding which of the relevant pieces of information are most probably retrievable from memory, even under cognitive load, and memorising the most relevant previously unknown items before the conference. This turns the information into active knowledge. Otherwise, the processed information is organised so as to make it visible/searchable when it is required to fill knowledge gaps (Rütten, 2007, 18ff; Probst et al., 1999, 23).

The Routledge Handbook of Interpreting, Technology and AI

What turns 'information and knowledge work' into 'information and knowledge management' is the deliberate orientation of work towards a defined goal (successful communication) and the corresponding measurement, or at least evaluation, of the level of achievement, as well as the deliberate and selective handling of information and knowledge beyond one single assignment and isolated tasks (Rütten, 2007, 153ff). This connects the end of one working cycle (assignment) to the beginning of the next, which helps optimise the workflow and, at the same time, provides a somewhat overarching perspective.

21.3.1.2 Impact of technology


Relevant information for a specific conference in a specific language combination used to be a scarce commodity in the pre-internet era. Getting hold of information on a very specific subject in the languages required (i.e. retrieving data) used to be the primary challenge in the preparation workflow. Nowadays, meeting documents are easily distributed via email or cloud folders. Detailed dictionaries are accessible online and no longer need to be bought in a bookshop or carried to conferences in trolleys. The logistics of moving physical data carriers back and forth are no longer an issue.
Given the great amounts of potentially relevant information available, a certain level of information literacy is helpful. Selection/prioritisation, classification, automatisation, systematisation, and extraction (Rütten, 2007, 226f) are important information management strategies.
Fortunately, technology not only provides us with vast amounts of information but also gives us the means to capitalise on it. Tools are available for extracting both 'meaning' (abstracting) and terminology. This facilitates the grasp of meaning and terminology in large amounts of text, especially when no thorough reading is possible or the interpreter has to take on an assignment at short notice. Existing automatic terminology extraction tools do not always meet interpreters' needs (Goldsmith, 2020, 299). However, there are also solutions that support manual extraction, making this process less tedious. Once terminology has been gathered in one language, machine translation can help create a draft translation in the other language(s) automatically, which can then be accepted or corrected. Automatisation in extraction and translation tasks can therefore be a huge time saver if used wisely, that is, combined with human knowledge and judgement to check the correctness and relevance of the information compiled.
Once all the terminology has been collected, it needs to be checked against one's own knowledge to see which terms are still difficult to recall from memory, especially under high cognitive load. In times of quick searches in the booth (especially when using booth-friendly, mouse-free searching tools, such as Interplex, InterpretBank, or Interpreters' Help/BoothMate, and even more so with the support of live prompting CAI tools), it may be tempting to skip the step of learning the most important terminology by heart. There are, however, benefits to memorising information, that is, converting it into more easily retrievable knowledge before the meeting, a kind of cognitive automatisation. Firstly, this reduces cognitive load in the booth (Stoll, 2009). Secondly, it is more sustainable in cognitive terms, as information converted into knowledge is 'reusable' and, ideally, does not need to be looked up again.
For an efficient use of cognitive resources, it makes sense not to memorise all terminology randomly but to classify and sort it by priority/relevance and/or by subject area. Sorting functions in electronic terminology databases provide vast improvements over paper.
Flashcard apps and text-to-speech functions are examples of how technology can make 'learning vocab' more efficient and convenient, as this can also be done 'on the go'. Another technology-enabled practice in meeting preparation is online collaboration. Shared online glossaries, edited by several interpreters in a team, are not only efficient when it comes to sharing the workload but also make exchange among team members easier. This helps synchronise and consolidate knowledge.
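As a minimal illustration of the classify-and-sort strategy described above, a glossary held as structured records can be ordered by priority and subject area in a few lines of code. The field names, example entries, and the 1 (high) to 3 (low) priority scale below are purely hypothetical, not taken from any particular terminology tool:

```python
# Sketch: sorting a term glossary by priority and subject area,
# as an interpreter might before memorising the most relevant entries.
# Field names and the 1 (high) to 3 (low) priority scale are hypothetical.

glossary = [
    {"term": "Aufsichtsrat", "subject": "corporate governance", "priority": 1},
    {"term": "Saugdüse", "subject": "vacuum cleaners", "priority": 2},
    {"term": "Vorstand", "subject": "corporate governance", "priority": 1},
    {"term": "Staubsaugerdüse", "subject": "vacuum cleaners", "priority": 3},
]

# Sort by priority first (most relevant on top), then group by subject area.
for entry in sorted(glossary, key=lambda e: (e["priority"], e["subject"])):
    print(f'{entry["priority"]}  {entry["subject"]:22}  {entry["term"]}')
```

Because Python's sort is stable, entries sharing the same priority and subject area keep the order in which they were entered.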
In summary, technology has made interpreters' work less cumbersome at the level of data retrieval. It has also provided greater amounts of information and more sophisticated means of processing and managing it. Besides the risk, or temptation, of overreliance on technology in the booth, technology brings a number of convenient options for transforming processed information into knowledge.

21.3.2 Semiotics of interpreting


Information and knowledge (Section 21.3.1), as well as the process of interpreting, include different dimensions that can be mapped onto the three tips of the semiotic triangle. The semiotics of interpreting thus provide another valuable structural framework for analysing the workflow of interpreting.

21.3.2.1 Conceptual framework


The semiotic triangle, which can be applied to any sign used by humans in communication, includes the following dimensions: codification or form (the linguistic form information takes, that is, any morpheme: words, technical terms, expressions, morpho-syntactical structures); semantics (the underlying meaning, i.e. the knowledge segment represented via the linguistic form); and pragmatics (situational relevance and purpose in the respective situation) (Rütten, 2007, 27; Zimmermann, 2004, 707f; Arntz et al., 2002, 38; Lewandowski, 1990, 1134).
While the semantic and pragmatic dimensions are ideally the same for all communicating parties, the 'form' dimension is twofold, as there is both a source and a target language form.

Figure 21.3 Semiotics of interpreting.


Unlike large language models, which transcode the superficial form of one language into
another, interpreters, just like translators, work on the basis of the underlying semantics.
Unlike translators or terminologists, interpreters work in the moment, for a unique setting,
with a more or less defined group of participants. Accordingly, the circumstances of the
situation play a greater role and also allow for more freedom. A ‘correct’ interpretation
is what participants accept or prefer, even if no one outside the group would understand.
Expressions that are documented to be correct, be it in a dictionary or other references, will
not necessarily be understood. This could be due to a lack of background knowledge or dif-
ferent origins. Expressions that would clearly not be equivalents in a strict conceptual sense
(e.g. suction cleaner nozzle and suction nozzle) can be perfect equivalents in the context of
a particular meeting, because everyone knows they are referring to the same object.
While it is of course important to know the correct terminology (i.e. form) for, say, the German Aktiengesellschaft, Vorstand, Aufsichtsrat, and Verwaltungsrat in the other language, in case of emergency such isolated knowledge gaps can be filled by looking the terms up or (often more efficiently) asking a boothmate. If, however, the interpreter is unfamiliar with the intricacies of company management structures in the different countries (the semantic dimension), the risk of getting a message wrong is higher and more complicated to sort out ad hoc. Errors in interpreting resulting from such semantic, or pragmatic, knowledge gaps often create more severe miscommunication than 'just the wrong word' (Rütten, 2007, 196f).

21.3.2.2 Impact of technology in terms of semiotics


Considering technology-assisted interpreting from a semiotics point of view, there are effects on all three levels. At form level, computer assistance in finding just the right word at the right moment, through a quick search or live prompt, is a real benefit. At semantic level, many digital sources (explanatory videos, pictures, articles on highly specific subjects on Wikipedia, scientific articles, company websites, private blogs, etc.) provide insight into the exact question at hand and can help the interpreter grasp a new concept quickly, even peri-process in the booth.
The same applies at pragmatic level. During preparation, it has become much easier to find out what a keynote speaker may have in mind by getting to know their (online) personality, often even seeing them in videos. In the booth, filling knowledge gaps ad hoc, for example, looking up a missing term or checking background information when needed, has become a normality. Our brains remember information and particular words better when we elaborate on them, for example by discussing them with a boothmate (Bartsch and Oberauer, 2021), or when we say them aloud (MacLeod, 2011). This kind of active knowledge work is more sustainable and motivating than checking a context-free list of missing information in the follow-up phase after an assignment.
As for interpreting settings, videoconferences can pose an additional challenge to the interpreter's understanding of, or immersion into, the communicative context. Visual cues such as body language, the positions of people in the room, etc. are difficult to perceive, which hampers the understanding of the relations and interaction between people, moods, and other contextual information. Sometimes, with high numbers of unknown participants, or when a videoconference is publicly streamed, there is little way of knowing about the participants' expectations, previous knowledge, or regional or professional background.


On the other hand, the close-up camera view of each speaker often far surpasses the view interpreters have of speakers from their booths in physical meetings. This can give a very detailed impression of the speakers' facial expressions, an effect that could be even more striking if image transmission in virtual reality format became common (Gigliobianco & Ziegler, 2018).
Overall, digital information technologies have given interpreters better access to relevant information across all semiotic levels, making their knowledge work potentially easier and more meaningful. At pragmatic level, audiovisual transmission technologies may alienate the interpreter (as well as other participants) from the communicative situation but can also offer additional situational insight.

21.3.3 Phases and subtasks


The levels of data–information–knowledge and semiotic dimensions previously discussed
span all tasks involved in an interpreting assignment. These usually follow one another in
a certain order in time.

21.3.3.1 Conceptual framework


According to Kalina, an interpreting assignment can be divided into the phases of pre-, peri-, in-, and post-process (Kalina and Ziegler, 2015). The pre-process phase of preparation, before the actual assignment, sometimes blurs into the peri-process phase, which involves completing preparation while already in the interpreting situation (i.e. at the venue or during a videoconference). Post-process work, that is, follow-up after the assignment, involves completing and correcting information (e.g. terminology) compiled during preparation. In the sense of a cyclical process of quality assurance, this can be connected to the pre-process phase and also serves as preparation for the next assignment, especially in the case of regular clients (Kalina, 2005, 778ff).
Beyond knowledge work related to a particular assignment, there is also a more generic dimension, intended to broaden, deepen, and consolidate the interpreter's knowledge base for potential future jobs and unforeseeable topics that may occur at any time. Management-like considerations regarding the need to optimise information and knowledge work or improve interpreting performance also constitute a link that closes the circle of the workflow (Rütten, 2007, 148). What clearly differs from the work done pre-, peri-, and post-process is the actual act of interpreting. Interpreting in-process is much more focused in nature, and due to situational constraints and the high cognitive load (Gile, 1997, 198ff), opportunities for secondary information work are mostly limited to the occasional search for a specific term (Stoll, 2009, 198f).
The actual process of interpreting – in-process – involves several sub-processes. This is especially true of simultaneous interpreting, but it also applies to consecutive interpreting. Both modes require a high level of cognitive capacity and coordination. In his effort model, Gile considers the interpreting process to consist of various efforts which compete for the limited available cognitive capacity. In simultaneous interpreting, these efforts are listening and analysis, production, memory, and coordination. In consecutive interpreting, they are divided into two phases: Phase 1 (listening) comprises listening and analysis, memory, note-taking, and coordination; Phase 2 (reformulation) comprises memory recall, note-reading, and production (Gile, 1997, 198ff).


For simultaneous interpreting, Seeber further differentiates the subtasks in his model for measuring cognitive load:

S storage in working memory,
P perceptual auditory verbal processing of input and output,
C cognitive-verbal processing of input and output,
R verbal-response processing of output,
I interference due to overlapping tasks
(Seeber, 2011, 187ff)

21.3.3.2 Impact of technology in different phases and subtasks


Booth-friendly terminology management software with mouse-free search functions, along with multi-source online search tools, has made it possible to run ad hoc queries on extensive glossaries and other online and offline resources with much less cognitive effort, especially when users can touch-type. Thus, these tasks, formerly part of the pre-process phase, can now, at least occasionally, be performed peri- and in-process.
With CAI tools providing live prompting in-process, based on automatic speech recognition (ASR), the boundaries between pre-, peri-, and in-process knowledge work may become even more blurred (see Prandi, this volume). The question of their (perceived) usefulness and acceptance is already a subject of great interest in interpreting studies. Research suggests that live prompting tools can improve accuracy in interpreting numbers by two-thirds, with complex numbers and decimals being the most error-prone types of numbers (Desmet et al., 2018, 24f). In addition, increased terminological accuracy can be achieved compared to looking terms up in a PDF glossary, although this advantage is less apparent in comparison to looking terms up manually in a booth-friendly terminology management tool (Prandi, 2023, 208ff). A further advantage of using ASR is that the user does not need to know the spelling of an unknown term (Prandi, 2023, 244). On the other hand, there is evidence that live prompting can cause distraction: interpreters may feel they need to check the correctness of the results displayed, or live prompting may help them interpret numbers correctly but lead to errors in interpreting the overall meaning of the message (Frittella, 2023, 140ff).
Overreliance can also cause problems when systems fail to provide support or offer erroneous information (Defrancq and Fantinuoli, 2020, 26). Overall, the usefulness and acceptance of CAI live prompting, and the question of cognitive support vs cognitive burden, may depend on numerous aspects. On the users' side, interpreters' attitudes and their ability to develop a certain routine in the use, or deliberate non-use, of live prompting may influence its usefulness and acceptance. On the software side, aspects like the intuitiveness of the user interface, the speed of the prompting, the quality of the ASR, the correctness and relevance of the terms, numbers, or named entities displayed, and potentially the option to customise and/or edit the information displayed are all factors to consider. Being able to rely on this type of system presents a new challenge for interpreters' meeting preparation: it is a question of striking a balance between receiving only relevant on-the-spot terminology, avoiding too much 'noise', and not missing important information. To make such a tool display customised terminology for a particular meeting, this terminology needs to be 'fed' into the system beforehand, pre-process. Creating glossaries for live prompting requires a considerably different and highly disciplined approach compared to what is currently common practice for many interpreters. If the interpreter wants only exact matches to be recognised by the system, with no fuzzy matches, then any term interpreters want to see on their screens in the booth needs to be taught to the system in advance (SmarTerp, 2024). For example, one could not simply write:

suction cleaner nozzle, suction nozzle  Staubsaugerdüse, Saugdüse

but would instead have to spell out each possible pair of equivalents:

suction cleaner nozzle  Staubsaugerdüse
suction cleaner nozzle  Saugdüse
suction nozzle  Staubsaugerdüse
suction nozzle  Saugdüse
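This expansion of comma-separated synonym lists into explicit one-to-one entries is mechanical and easy to automate. The sketch below is a hypothetical pre-processing step, not a feature of any particular CAI tool:

```python
from itertools import product

def expand_pairs(source_field: str, target_field: str) -> list[tuple[str, str]]:
    """Expand comma-separated synonym lists into every source/target pair,
    as required by live prompting systems that match exact terms only."""
    sources = [s.strip() for s in source_field.split(",")]
    targets = [t.strip() for t in target_field.split(",")]
    # Cartesian product: each source synonym paired with each target synonym.
    return list(product(sources, targets))

pairs = expand_pairs("suction cleaner nozzle, suction nozzle",
                     "Staubsaugerdüse, Saugdüse")
for en, de in pairs:
    print(f"{en}\t{de}")
# Produces all four explicit source/target pairs.
```

Run over a whole glossary, such a script would turn a conventional synonym-rich entry into the disciplined, exact-match format that live prompting requires.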

Ideally, glossary creation would be carried out by CAI tools based on specially trained AI. This would considerably reduce the pre-process preparation work and leave the interpreter with more capacity for important content- and context-related preparation. In any case, it may reduce the effort of memorising terminology pre-process, as well as of looking up terminology in-process.
As for the primary interpreting activity as such, a subtask where technology can be of great use is the perceptual auditory verbal processing of the source language. For languages of high diffusion, transcription tools already provide reasonable (although not flawless) support. This can help with following very fast or unclear speakers or, in the case of English, unfamiliar accents.
In consecutive interpreting settings, interpreters are usually directly involved in the communicative setting (standing in front of an audience or sitting between the parties). Consequently, relying on digital or paper glossaries or live prompting is far more difficult. It was only with tablets for note-taking that consecutive interpreters gained a real option of looking up terms ad hoc or having any cognitive support beyond a short 'cheat sheet'. In addition, the interpreter can take notes on an electronic device, which offers several convenient functions, like copying, pasting, or erasing notes, using different line thicknesses and colours, browsing the electronic notepad, zooming in and out, etc. (Goldsmith, 2017, 43ff). This facilitates the clearer organisation of information and the filling of knowledge gaps ad hoc.
As with pre-process tasks, work that once had to be completed post-process (once an interpreter was back at their desk) can now be completed 'on the job'. The use of portable computers in the booth enables interpreters to update their terminology or even carry out some background research to fill semantic knowledge gaps peri-process. Other post-process tasks, like journal writing or self-assessment, may also be supported by technology, for example, by drawing on the log files of CAI tools.
Overall, secondary information and knowledge work that formerly had to be completed pre- or post-process can now, in theory, be carried out in- or peri-process, on top of the primary task of interpreting. On the other hand, CAI tools can help reduce the burden of filling knowledge gaps ad hoc and possibly also spare the interpreter at least some of the pre-process memorisation of terminology, although this requires a different approach to preparation.
Furthermore, mobile computing enables interpreters to prepare for one meeting while still attending another. They can also attend to clients' requests and take care of logistical issues that used to be handled before or after the meeting in the era before mobile computers and smartphones.

Figure 21.4 Overlapping phases of an interpreting assignment.

21.4 Business impact


On the internet, one can find any niche product or service far more easily than in the pre-internet era. Job platforms, online directories, and freelancers' own websites also make it easier for clients to track down interpreters who exactly match their requirements in terms of languages, expertise, experience, and location. It is, however, also easier for clients to compare prices and request as many quotations as they like from service providers, so competition and price pressure increase. On the other hand, it is also easier for clients and service providers to find each other without intermediaries, which increases the chances of long-term, in-depth cooperation. Clients and service providers can be matched much more efficiently and precisely, and recruiting interpreters on the basis of their profile could considerably reduce their preparation time and deepen their background knowledge (Rütten, 2016, 137).
With the widespread use of videoconferences during the COVID-19 pandemic, a new setting has emerged for both clients and interpreters. Videoconference meetings tend to be shorter and more intense. Without the need for travel, it is easier to accommodate more assignment days, or hours, in a week, due to less overlap between meetings. With a wider variety of assignments, a one-size-fits-all standard daily interpreting rate does not fit every purpose. However, with smaller 'micropayment units' (e.g. minutes or hours), there is a risk of the focus being reduced to the mere primary task of interpreting, without factoring in secondary knowledge- and non-knowledge-related work. Financial literacy can ensure that more flexible fee structures both suit clients' needs and allow for profitable income structures for interpreters (Rütten, 2016, 136f).
The most extreme scenario, in terms of technological impact on interpreting, might be the partial or complete replacement of human interpreters and their workflow in some areas (e.g. when certain languages of a conference are interpreted by humans and others by machines; see also Fantinuoli, this volume). Simple, non-technical assignments, where the content and outcome are not of critical importance, could be the first to be performed by machines. The practical, logistical, and financial advantages could outweigh a lack of quality, reliability, or accuracy: no bulky booths to be set up; no travel arrangements, hotel rooms, or food to organise; no waiting for a free slot in the interpreters' agendas. Human interpreters, however, will most probably still be needed for assignments that require technical, political, diplomatic, emotional, or contextual understanding; tact; human creativity to translate a joke, understand irony, or convey newly created terms and ideas in another language; or a plausibility check. Considering that low-intensity assignments are more likely to disappear for human interpreters, since they are more easily replaced by machines or by the use of a common (foreign) language like English, this is another factor contributing to the intensification of conference interpreting.

21.5 Conclusion
Technology, in very general terms, has contributed to the intensification and simultanification of interpreters' workflow over time, with simultaneous interpreting and videoconferencing probably being the biggest contributors. These major developments have increased interpreters' alienation from the communicative setting. From a client's perspective, acceptance of technology in interpreting seems to be influenced not only by accuracy but also – or, to a certain extent, even more so – by convenience. Interpreters have long been valued for what they are: human minds listening to what is being said, extracting and processing underlying meaning, and communicating it to another party in an understandable way (Seleskovitch, 1968, cited in Seleskovitch, 1992, 41).
As mobile devices such as laptops and tablets have entered the booth, many more tasks can be performed during an assignment. This potentially adds further to the technology-induced intensification of work in the booth, and the lines between the different phases have become blurred. At the same time, new developments have also brought technology-enabled benefits for interpreters. The large increase in available data and information has gone hand in hand with more efficient exchange and software solutions – as if interpreters have been given a pile of earth and a shovel. Cloud-based online collaboration has, for the first time, allowed for real teamwork and workload-sharing in preparation. This, in turn, has the potential to further increase complexity and simultaneity, as well as efficiency and quality.
Information literacy, including strategies such as selection/prioritisation, classification, automatisation/memorisation, systematisation, and extraction, can help capitalise on the wealth of information available, just as financial literacy can help capitalise on the new business opportunities created by the internet. However, excessive use of information technologies may lead to overreliance on them in the booth, with interpreters no longer memorising terminology. On the other hand, flashcard functions and other tools can help convert processed information into knowledge. Furthermore, live prompting CAI tools, whether existing or currently under development, can help reduce the burden of filling knowledge gaps while interpreting, although they require a different approach to preparation.
It remains to be seen whether these tools make their way to the market. The uptake of specific CAI tools by interpreters has not been extensive in the past. Similarly, an increase in the accuracy of interpreting performance is not necessarily a convincing argument to clients either. Nonetheless, at least bespoke CAI tools offering real-time prompting are in line with the current trend in the use of AI, which is to combine the strengths of both

humans and machines. Wilson et al. found that firms achieve the most significant performance improvements through collaborative intelligence, where humans and AI complement their respective strengths:

The leadership, teamwork, creativity, and social skills of the former, and the speed,
scalability, and quantitative capabilities of the latter. What comes naturally to people
(making a joke, for example) can be tricky for machines, and what’s straightforward
for machines (analysing gigabytes of data) remains virtually impossible for humans.
Business requires both kinds of capabilities.
(Wilson & Daugherty, 2018)

It will be interesting to see which form of technological support the future brings for
conference interpreters in order to take advantage of human–machine synergies. Could it
be intuitive facial gesture commands, document navigation via speech recognition, teaching
AI to predict problematic elements, correction of pronunciation, or something else entirely?
The rapid technological developments in AI and human–machine interaction at least allow
for speculation that intuitive support for the highly specific needs of conference interpreters
might one day become a reality.

References
AIIC, 2019a. History of the Profession. URL https://aiic.org/site/world/about/history/profession
(accessed 21.2.2024).
AIIC, 2019b. History of AIIC. URL https://aiic.org/site/world/about/history (accessed 21.2.2024).
AIIC, 2019c. Glossary. URL https://aiic.org/site/world/conference/glossary (accessed 22.2.2024).
AIIC, 2019d. Leitlinien der AIIC für das Ferndolmetschen (Distance Interpreting). Version 1.0.
URL https://aiic.de/wp-content/uploads/2019/08/aiic-leitlinien-ferndolmetschen-20190802-2.pdf
(accessed 15.3.2024).
Arntz, R., Picht, H., Mayer, F., 2002. Einführung in die Terminologiearbeit. Georg Olms Verlag,
Hildesheim, Zürich, New York.
Bartsch, L.M., Oberauer, K., 2021. The Effects of Elaboration on Working Memory and Long-Term
Memory Across Age. Journal of Memory and Language 118. URL https://doi.org/10.1016/j.
jml.2020.104215
Braun, S., 2019. Technology and Interpreting. In O’Hagan, M., ed. The Routledge Handbook of
Translation and Technology. Routledge, London, 271–288.
Cambridge Dictionary, 2024. Cambridge University Press & Assessment. URL https://dictionary.
cambridge.org/de/worterbuch/englisch/workflow (accessed 9.7.2024).
Chen, S., 2018. Exploring the Process of Note-Taking and Consecutive Interpreting: A Pen-Eye-Voice
Approach Towards Cognitive Load (PhD thesis). Department of Linguistics, Faculty of Human
Sciences, Macquarie University, Sydney, Australia.
Corpas Pastor, G., Fern, M.L., 2016. A Survey of Interpreters’ Needs and Their Practices Related to
Language Technology. Technical Report. Universidad de Málaga, Málaga.
Defrancq, B., Fantinuoli, C., 2020. Automatic Speech Recognition in the Booth: Assessment of Sys-
tem Performance, Interpreters’ Performances and Interactions in the Context of Numbers. Target
33(1), 73–102. URL https://doi.org/10.1075/target.19166.def
Desmet, B., Vandierendock, M., Defrancq, B., 2018. Simultaneous Interpretation of Numbers and the
Impact of Technological Support. In Fantinuoli, C., ed. Interpreting and Technology: Translation
and Multilingual Natural Language Processing 11. Language Science Press, Berlin.
Fantinuoli, C., 2018. Computer-Assisted Interpreting: Challenges and Future Perspectives. In Corpas
Pastor, G., Durán-Muñoz, I., eds. Trends in E-Tools and Resources for Translators and Interpret-
ers. Brill, Leiden, 153–174. URL https://doi.org/10.1163/9789004351790_009
Ferrari, M., 2002. Traditional vs. “Simultaneous Consecutive”. SCIC News 29, 6–7.

Workflows and working models

Frittella, F.M., 2023. Usability Research for Interpreter-Centred Technology: The Case Study of
SmarTerp. Translation and Multilingual Natural Language Processing 21, Language Science Press,
Berlin.
Gaiba, F., 1998. The Origins of Simultaneous Interpretation: The Nuremberg Trial. University of
Ottawa Press, Ottawa.
Gigliobianco, S., Ziegler, K., 2018. Present? Remote? Remotely Present! New Technological
Approaches to Remote Simultaneous Conference Interpreting. In Fantinuoli, C., ed. Interpreting
and Technology. Language Science Press, Berlin, 119–139.
Gile, D., 1997. Conference Interpreting as a Cognitive Management Problem. In Danks, J., Shreve,
G.M., Fountain, S.B., McBeath, M.K., eds. Cognitive Processes in Translation and Interpretation.
Sage Publications, Thousand Oaks, 196–214.
Goldsmith, J., 2017. A Comparative User Evaluation of Tablets and Tools for Consecutive Interpret-
ers. In Translating and the Computer 39. Proceedings. AsLing, London, 40–50.
Goldsmith, J., 2020. Terminology Extraction Tools for Interpreters. In Ahrens, B., ed. Interdepend-
ence and Innovation in Translation, Interpreting and Specialized Communication. Frank &
Timme, Berlin, 279–302.
Hamidi, M., Pöchhacker, F., 2007. Simultaneous Consecutive Interpreting: A New Technique Put to
the Test. Meta 52, 276–289. URL https://doi.org/10.7202/016070ar
Heery, E., Noon, M., 2008. A Dictionary of Human Resource Management, 2nd ed. Oxford Univer-
sity Press. URL https://doi.org/10.1093/acref/9780199298761.001.0001
ISO, 2017. ISO 20108:2017 Simultaneous Interpreting – Quality and Transmission of Sound and
Image Input – Requirements.
Jiang, H., 2015. A Survey of Glossary Practice of Conference Interpreters. aiic.net. URL https://api.semanticscholar.org/CorpusID:60436853 (accessed 5.3.2024).
Kalina, S., 2005. Quality Assurance for Interpreting Processes. Meta 50(2), 769–784.
Kalina, S., Ziegler, K., 2015. Technology. In Pöchhacker, F., ed. Routledge Encyclopedia of Interpret-
ing Studies. Routledge, London, 410–411.
Kuhlen, R., Seeger, T., Strauch, D., 2004. Grundlagen der Praktischen Information und Dokumenta-
tion, begründet von Klaus Laisiepen, Ernst Lutterbeck und Karl-Heinrich Meyer-Uhlenried. 5.,
völlig neu gefasste Ausgabe. Band 1: Handbuch zur Einführung in die Informationswissenschaft
und -praxis. K. G. Saur, München.
Leffer, L., 2023. When It Comes to AI Models, Bigger Isn’t Always Better. Scientific American.
URL www.scientificamerican.com/article/when-it-comes-to-ai-models-bigger-isnt-always-better/
(accessed 15.6.2024).
Lewandowski, T., 1990. Linguistisches Wörterbuch. 3 Bände. 5. Auflage. UTB, Heidelberg.
MacLeod, C.M., 2011. I Said, You Said: The Production Effect Gets Personal. Psychonomic Bulletin & Review 18, 1197–1202. URL https://doi.org/10.3758/s13423-011-0168-8
Mahyub Rayaa, B., Martin, A., 2022. Remote Simultaneous Interpreting: Perceptions, Practices
and Developments. The Interpreters’ Newsletter 27, 21–42. URL https://doi.org/10.13137/2421-714X/34390
Matyssek, H., 1989. Handbuch der Notizentechnik für Dolmetscher: Ein Weg zur sprachunabhängi-
gen Notation. Groos Verlag, Heidelberg.
McHugh-Johnson, M., 2021. 15 Milestones, Moments and More for Google Docs’ 15th. In The Keyword.
Google, Mountain View, CA. URL https://blog.google/products/docs/happy-15-years-google-docs/ (accessed 29.2.2024).
Orlando, M., 2015. Digital Pen Technology and Interpreter Training, Practice, and Research: Status
and Trends. In Ehrlich, S., Napier, J., eds. Interpreter Education in the Digital Age: Innovation,
Access, and Change. Gallaudet University Press, Washington, DC, 125–152.
Prandi, B., 2023. Computer-Assisted Simultaneous Interpreting: A Cognitive-Experimental Study on
Terminology. Translation and Multilingual Natural Language Processing 22, Language Science
Press, Berlin.
Probst, G., Raub, S., Romhardt, K., 1999. Wissen managen. Wie Unternehmen ihre wertvollste Res-
source optimal nutzen. Gabler, Wiesbaden.
Rozan, J.F., 1956. La prise de notes en interprétation consécutive. Georg, Geneva.
Rütten, A., 2007. Informations- und Wissensmanagement im Konferenzdolmetschen. Sabest. Saar-
brücker Beiträge zur Sprach- und Translationswissenschaft. Peter Lang, Frankfurt a. M.

The Routledge Handbook of Interpreting, Technology and AI

Rütten, A., 2016. Interpreters’ Workflows and Fees in the Digital Era. In Translating and the Com-
puter 38. Proceedings. AsLing, London, 133 ff.
Rütten, A., 2017. Terminology Management Tools for Conference Interpreters – Current Tools and
How They Address the Specific Needs of Interpreters. In Translating and the Computer 39. Pro-
ceedings. AsLing, London, 98 ff.
Seeber, K., 2011. Cognitive Load in Simultaneous Interpreting: Existing Theories – New Models.
Interpreting 13(2), 176–204. URL https://doi.org/10.1075/intp.13.2.02see
Seleskovitch, D., 1992. De la pratique à la théorie – Von der Praxis zur Theorie. In Salevsky, H.,
ed. Wissenschaftliche Grundlagen der Sprachmittlung. Berliner Beiträge zur Übersetzungswissen-
schaft. Peter Lang, Frankfurt a. M., 38–55.
Skolnikoff, E.B., 1993. The Elusive Transformation: Science, Technology, and the Evolution of Inter-
national Politics. Princeton University Press.
Stoll, C., 2009. Jenseits simultanfähiger Terminologiesysteme: Methoden der Vorverlagerung und Fix-
ierung von Kognition im Arbeitsablauf professioneller Konferenzdolmetscher. Wissenschaftlicher
Verlag Trier, Trier.
Svoboda, Š., 2020. SimConsec: The Technology of a Smartpen in Interpreting (MA dissertation).
Faculty of Philosophy, Faculty of Arts, University of Olomouc, Prague.
Van Herpen, D., 2017. Work Intensification: A Clarification and Exploration into Causes, Conse-
quences and Conditions. A Literary Review. School of Social and Behavioral Sciences, Tilburg
University.
Wagener, L., 2012. Vorbereitende Terminologiearbeit im Konferenzdolmetschen unter besonderer
Berücksichtigung der Zusammenarbeit im Dolmetschteam (MA dissertation). Faculty of Informa-
tion and Communication Sciences, Institute for Translation and Multilingual Communication,
University of Applied Sciences Cologne.
Wilson, H.J., Daugherty, P., 2018. Collaborative Intelligence: Humans and AI Are Joining Forces.
Harvard Business Review, July–August 2018 issue, 114–123.
Zimmermann, H.H., 2004. Information in der Sprachwissenschaft. In Kuhlen, R., Seeger, T., Strauch,
D., eds. Grundlagen der praktischen Information und Dokumentation, 5th ed., vol. 1. K. G. Saur,
Munich, 704–709.

Software
Airgram.io, 2024. URL https://www.notta.ai/en/welcome-airgram (accessed 20.3.2024).
Boothmate, 2024. URL https://boothmate.app (accessed 20.3.2024).
Cymo Note, 2024. URL www.cymo.io/en/note.html (accessed 20.3.2024).
Ergonomics for the Artificial Booth Mate (EABM), 2024. URL www.eabm.ugent.be/eabm/ (accessed 20.3.2024).
Interplex, 2024. URL www.fourwillows.com/interplex.html (accessed 20.3.2024).
InterpretBank ASR, 2024. URL https://www.interpretbank.com/site/docs/v4/asr.html (accessed
20.3.2024).
Otter.ai, 2024. URL https://otter.ai/ (accessed 20.3.2024).
SmarTerp, 2024. URL https://smarterp.me/ (accessed 20.3.2024).
Speechmatics, 2024. URL www.speechmatics.com/ (accessed 20.3.2024).

22
ERGONOMICS AND
ACCESSIBILITY
Wojciech Figiel

22.1 Introduction
This chapter discusses the notions of ergonomics and accessibility as applied to simultane-
ous interpreting, both in its on-site and remote modalities. While the author is aware that
interpreting exists in many modalities besides simultaneous (e.g. consecutive, dialogue, whis-
pered interpreting, etc.), this chapter will briefly mention these but will focus on the simul-
taneous mode, as it is here that the greatest technological developments can be observed.
As the complexity of interpreting technologies continues to grow, one can observe an
increased awareness of the links between ergonomics and interpreting studies and a result-
ant need for more extensive research into the interface between these two fields (see van
Egdom et al., 2020). Nevertheless, to date, little attention has been devoted to ergonomics
in the field of interpreting studies, and even less to the intersection between accessibility
and ergonomics in interpreting. This chapter will argue that there is a necessity to consider
accessibility, as well as ergonomics, in the discussion, in order to be inclusive to all inter-
preting professionals, including those with disabilities.
The author of this chapter is not neutral in this respect. As a visually impaired person
(VIP) who has been an active conference interpreter for over 15 years, the author has
first-hand experience of striving to find the best possible solutions for both themselves and
their blind students, in an effort to make the interpreting profession as accessible as
possible. As a result, this chapter will include personal perspectives relating to accessibility,
with particular attention being paid to experiences from the region of Central and Eastern
Europe (CEE), as the author is based in Poland. However, this does not make the chapter
less relevant for other locations, as most challenges encountered in the field of ergonomics
are universal in their nature.
The structure of the chapter is as follows: Section 22.2 will provide a discussion of the
most relevant terms related to this chapter. These include ergonomics, usability, user experi-
ence, and accessibility. Section 22.3 will sketch out a historical outline of the development
of technologies in the field of conference interpreting and analyse their impact on both ergo-
nomics and accessibility. Section 22.4 will review some of the current workflows of confer-
ence interpreters. Section 22.5 discusses computer-assisted interpreting (CAI) tools. The


DOI: 10.4324/9781003053248-28

sixth section is devoted to distance interpreting (DI). This section presents the opportunities
and challenges involved in DI, with particular attention paid to accessibility for people with
visual impairments. The chapter concludes by providing information relating to ergonom-
ics in speech-to-text interpreting (Section 22.7) and in pedagogical contexts (Section 22.8).

22.2 Definitions
At first glance, the terms relevant to this chapter – ‘ergonomics’, ‘user experience’, and
‘accessibility’ – appear straightforward, intuitive, and easy to define. However, as with
nearly all terms, this is far from true. ‘Ergonomics’ comes from the Greek words for
‘work’ – ‘ergon’ – and ‘laws’ – ‘nomos’. The International Ergonomics Association defines
ergonomics as the ‘scientific discipline concerned with the understanding of interactions
among humans and other elements of a system, and the profession that applies theory,
principles, data, and methods to design in order to optimize human well-being and overall
system performance’ (International Ergonomics Association, n.d.). Other definitions of the
term (see Kiran, 2020, 221) place emphasis on the study of the relationships between human
beings and machines, as well as well-being and improvement in efficiency. In simple words,
the goal of ergonomics is ‘to improve the performance of systems by improving human
machine interaction’ (Bridger, 2003, 1). Interestingly, although the term was employed in
Ancient Greece, its first modern usage dates back to the 1857 treatise ‘The Outline of
Ergonomics, i.e. Science of Work, Based on the Truths Taken from the Natural Science’,
authored by a Polish scientist, Wojciech Bogumił Jastrzębowski (Kiran, 2020, 220).
In light of this, it can be postulated that ergonomics is highly relevant when it comes to
discussing both working conditions and interpreter training. In addition, one can also view
the human–machine interaction that takes place between interpreters and technology from
the perspective of user experience and usability. As Law et al. (2008) state, the notion of
user experience is both ‘elusive’ and ‘hard to define’. Norman and Nielsen (1998) note how
‘“user experience” encompasses all aspects of the end-user’s interaction with the company,
its services, and its products’. Meanwhile, Alben (1996, 12) stresses that experience can be
understood to be

all the aspects of how people use an interactive product: the way it feels in their
hands, how well they understand how it works, how they feel about it while they’re
using it, how well it serves their purposes, and how well it fits into the entire context
in which they are using it.

Usability, in turn, can be defined as ‘a quality attribute that assesses how easy user interfaces
are to use’ (Nielsen, 2012). Furthermore, the International Organization for Standardization
(ISO, 2018) defines usability as ‘the extent to which a product can be used by specified users
to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context
of use’. Moreover, Nielsen (2012) distinguishes five quality components of usability:

1. Learnability: How easy is it for users to accomplish basic tasks the first time they encoun-
ter the design?
2. Efficiency: Once users have learned the design, how quickly can they perform tasks?
3. Memorability: When users return to the design after a period of not using it, how easily
can they reestablish proficiency?


4. Errors: How many errors do users make, how severe are these errors, and how easily can
they recover from the errors?
5. Satisfaction: How pleasant is it to use the design?

These components overlap with the principles of ‘universal design’ (UD), a term which,
in itself, possesses multiple definitions (see Dolph, 2021). The classical definition of UD
postulates that it is ‘the design of products and environments to be usable by all people, to
the greatest extent possible, without the need for adaptation or specialized design’ (Mace,
1985). The aforementioned principles include ‘equitable use’, ‘flexibility in use’, ‘simple
and intuitive use’, ‘perceptible information’, ‘tolerance for error’, ‘low physical effort’, and
‘appropriate size and space for approach and use’ (Connell et al., 1997).
Lastly, the concept of ‘access’ refers to ‘efforts . . . to reform architecture and technology
to address diverse human abilities’ (Williamson, 2015, 14). Accessibility, in turn, can be
defined as a situation in which ‘[a]ll people, particularly disabled and older people, can use
websites in a range of contexts of use, including mainstream and assistive technologies; to
achieve this, websites need to be designed and developed to support usability across these
contexts’ (Petrie et al., 2015, 2). Although this definition refers to web design, it can easily
be extended to other domains (Choi and Seo, 2024).

22.3 A journey into the past


Since its inception almost 100 years ago, conference interpreting as a profession has been
inextricably bound to technology and technological development (cf. Baigorri-Jalón, 2014;
Chernov, 2016). As demonstrated by the first experiences of simultaneous interpreting
in the 1920s, both in the West and the Soviet Union, working conditions of interpreters
depended not only on technical limitations but also on political and other external consid-
erations (see, for example, Chernov, 2016). The ergonomic challenges observed in those
early trials were largely related to lack of technical and professional experience with simul-
taneous interpreting. To illustrate, the equipment used by the interpreters at the Nuremberg
trials was old, interpreters did not have access to a mute button, and ventilation inside the
booths was extremely poor (Baigorri-Jalón, 2014, 234).
However, growing pains such as these were gradually overcome as the profession of
simultaneous interpreting matured. The sound quality of the source speech received by
interpreters improved, for example, and booths were designed with better soundproofing
and fresh-air systems.
Decades of professional experience, research, and teaching, as well as technologi-
cal development and guidance from local and international professional organisations,
have led to further improvement and standardisation of interpreters’ working conditions
(see Perez Guarnieri and Ghinos, this volume). The latter can be illustrated by particular
ISO standards, such as ISO 20109:2016 on interpreter equipment (ISO, 2016), ISO
17651-1:2024 on permanent booths (ISO, 2024a), ISO 17651-2:2024 on mobile booths
(ISO, 2024b), and ISO 20108:2017 on the quality and transmission of sound and image input (ISO, 2017). However, many
challenges remain. As Baxter remarks:

[I]n all my 20 years’ experience as a freelance interpreter on the local market, I have
yet to encounter a booth which complies with most (if any) of these hypothetical
requirements, having had to sit on broken dining room chairs, in booths without


working lighting, in cramped conditions with no ventilation (or the option of a noisy
ventilator) at the height of summer, etc.
(2015, 66)

22.4 Current issues relating to ergonomics of the interpreter’s workstation


A major breakthrough occurred with the advent of digital technologies. These modernised
the interpreter workstation, making it more customisable.
First, consoles have increased in sophistication. They now allow interpreters to use, and
almost instantly configure, multiple input and output channels, granting interpreters more
control over how they manage their work. In addition, the
advent of brighter, clearer displays has allowed console manufacturers to present important
information in a more concise way (e.g. the timer feature, thanks to which an interpreter
knows when to hand over to their booth partner). This progress has occurred alongside
increased miniaturisation and therefore leaves more space on the desk for other devices,
such as laptops, which help the interpreter perform their job. In addition, in line with
ISO norms (particularly ISO 20109:2016, which covers interpreter equipment; ISO, 2016),
there is growing emphasis placed on accessibility. As a result, console users can activate
acoustic signals for certain features, including changing microphone status or overlaying
incoming signals from other booths. Modern consoles also offer a high-contrast mode and
other aids for persons with low vision. In addition, in line with the ISO norm, buttons
must be labelled in Braille. This enables a blind interpreter to start working immediately,
without first being trained to operate a specific console. Braille labelling also relieves
blind interpreters of the need to memorise the functions assigned to specific buttons.
Secondly, interpreters have gained access to relatively inexpensive enhancements that
have allowed them to tailor a workstation to their specific needs. To illustrate, it is now pos-
sible to choose from a plethora of convenient, ergonomic headsets with an affordable price
tag. The chosen headset can be easily attached to the console thanks to virtually uniform,
standard 3.5 mm jack sockets. In addition, many popular consoles offer further outputs,
such as 6.35 mm jack sockets. As a result, many interpreters can now
choose to bring their own headphones to the booth (Hynes, 2021). In addition, the presence
of standard sockets and the availability of laptop computers allow the signal to be routed
through a computer and for an automatic speech recognition (ASR) system to generate a
live source text transcript (Defrancq and Fantinuoli, 2021). However, it must be noted that
this means of creating a live transcript is subject to confidentiality restrictions, since many
ASR tools process (and store) data in the cloud.
Thirdly, interpreters have gone digital. As early as the start of the 1990s,
Moser-Mercer (1992) observed that over 60% of the conference interpreters she surveyed
were using computers. More recent studies confirm that laptops have become
widespread in the booth (Corpas Pastor and Fern, 2016). Interpreters were also early adop-
ters of tablets, using them to prepare assignments and/or during consecutive and simul-
taneous interpreting tasks (Drechsel and Goldsmith, 2016; Goldsmith, 2017). Several
interpreters have also been known to use voice recorders (Hamidi and Pöchhacker, 2007)
to assist with note-taking during consecutive interpreting assignments. Furthermore, digital
pens have been adopted in interpreting for this same purpose (Orlando, 2010), as well as
in a teaching context. Of these digital solutions, the use of both tablets and voice recorders


for consecutive interpreting has been identified, in the aforementioned papers, as also being
beneficial for blind interpreters.
However, these developments also present a host of challenges relating to ergonomics
and accessibility. For example, while it should be applauded that interpreters now have
increased opportunities to customise their own workspaces, event organisers and techni-
cians should have been consistently guaranteeing quality baseline working conditions that
do not require equipment modification on the part of interpreters. In other words, headsets,
booths, and other factors, including connectivity, should comply with ISO norms, and
interpreters should not be forced to take action to guarantee these equipment standards
themselves. That being said, if they choose to do so, interpreters should be made aware of
the range of options available for improving their work.
Nevertheless, there are modes and methods of interpreting where challenges have
remained largely unresolved for decades. One example is chuchotage, where interpretation
is whispered to a small audience. Issues pertaining to the ergonomics of such assignments
result both from the interpreter’s body position and from the act of whispering itself (see
Baxter, 2015). In addition, there is also the option of using portable interpreting equipment
(PIEs) (Porlán Moreno, 2019; Korybski, this volume), also known as ‘infoport systems’ or
tour guide sets. These devices usually transmit voice over short distances via FM radio
frequencies and are employed for simultaneous interpreting without a console, or, in
Baxter’s (2015) words, in ‘boothless simultaneous’ contexts. However, very often, inter-
preters’ working conditions in such settings are deemed ‘poor’ due to the lack of space and
the absence of the soundproof environment typically found in a booth. It is for this reason
that, in such circumstances, it is even more important to observe guidelines, such as those
offered by the European institutions (EC, 2021).

22.5 Usability and accessibility of CAI tools


In addition to the aforementioned applications, digital technologies can also provide assis-
tance to interpreters before, during, and after an assignment, via so-called ‘computer-assisted
interpretation’ (CAI) tools. These can be defined as being ‘means to support the human
interpreter in the generation and use of terminological knowledge in the different phases
of an interpreting assignment and the respective resulting workflow’ (Will, 2020, 47). It
should be noted that virtually all CAI tools have been developed by active interpreters
(Fantinuoli, 2019). However, this has not guaranteed a wide-scale uptake of CAI tools by
interpreters (see Will, 2020, 45–47).
Fantinuoli (2019) explains that CAI tools can be divided into two generations.
First-generation CAI tools (see Prandi, this volume) consist of graphical inter-
faces to allow for glossary management before and during an assignment. In turn,
second-generation CAI tools provide a ‘more holistic approach to the interpreting task’
(Fantinuoli, 2019, 44) by offering information retrieval and organisational features, as
well as advanced search features within glossaries. In the past few years, there have been
promising attempts to include the use of automatic speech recognition (ASR) within
second-generation CAI tools. Such inclusions would support interpreters in looking up
terminology or when rendering numbers and statistics with accuracy during assignments
(Fantinuoli, 2017; Defrancq and Fantinuoli, 2021). According to Corpas Pastor (2017),
additional uses of CAI tools include note-taking assistance software in consecutive inter-
preting, speech-to-text converters, and computer-assisted interpreter training tools (for a


comprehensive presentation and digest of the most modern forms of CAI technology, see
also Prandi, this volume).
However, these tools have graphical interfaces, which suggests potential accessibility
problems. While the accessibility of CAT tools is already the subject of academic interest
(see Figiel, 2018; Rodríguez Vázquez and Mileto, 2016), accessibility of CAI solutions is
an issue that has not been addressed until now. What follows is inspired by the author’s
personal professional experience with regards to the accessibility of CAI tools.
In general, CAI tools are not accessible. For example, InterpretBank has been deemed
‘inaccessible on all levels’ for screen reader users (Hof, 2021). This view is confirmed by
Polish accessibility expert and well-known blind technology podcaster Piotr Machacz, who
notes that the InterpretBank software is not accessible on all platforms because its developers
built it on the cross-platform Tk widget toolkit, which is notorious within the blind
community for its inaccessibility. The web-based AI module for InterpretBank does not appear usable
for blind people either. However, although far from achieving perfect levels of accessibility,
screen reader users are able to access the main features of another CAI tool that continues
to be updated, Interpreters’ Help (see Hof, 2021). To the best of this author’s knowledge, no
tests for compliance with accessibility standards in relation to CAI tools have been reported in
academic publications thus far (besides the SIDP tests referred to in subsequent sections
in this chapter). Therefore, as with CAT tools a decade ago, it is imperative that the question
of accessibility be studied in greater depth with regards to CAI tools.
With this in mind, as attested by the author’s correspondence on ‘The Round Table mail-
ing list for blind translators and interpreters’ (The Round Table, n.d.), there has been no
uptake of CAI tools among blind interpreters. As blind interpreters use several channels
for accessing information (e.g. tactile channels via Braille display, and auditory via speech
synthesis), the cognitive challenges related to CAI tools are only aggravated for blind inter-
preters. However, it can be posited that blind interpreters make ideal testers for how acces-
sible any potential solutions could be. This is because blind interpreters are more likely to
be able to identify any barriers to access that have been created by the dominance of visual
access to information. Even if these barriers could be mitigated by using accessible software
(which is not currently the case), they still present themselves in an acute form to this group.
Consequently, blind interpreters may be able to provide an important contribution to user
interface research, helping developers generate user interfaces that can be characterised
by high usability. Regardless of the existence (or lack thereof) of potential advantages of
working alongside blind interpreters on improving user interfaces, the decision to provide
accessible solutions should not be motivated primarily by profit. One
should consider instead the fact that individuals with disabilities should be able to enjoy
equal access to modern technologies, in order for their work to be competitive. This human
rights–based approach is described in the UN Convention on the Rights of Persons with
Disabilities (United Nations, 2006) and forms the core of EU’s Web Accessibility Directive
(EU, 2016), as well as the European Accessibility Act (EU, 2019). It has to be stressed that
this growing body of legislation is gradually starting to be taken on board not only by the
public sector but also by the private sector.

22.6 Ergonomics in remote simultaneous interpreting


Remote simultaneous interpreting (RSI) is a modality of distance interpreting (DI) which
can be defined as ‘situations in which interpreters are no longer present in the meeting room


but work from a screen and earphones without a direct view of the meeting room or the
speaker’ (Mouzourakis, 2006, 46). Other scholars postulate that DI ‘is a specific method of
(conference) interpreting and covers a variety of scenarios of a speaker at a different loca-
tion from that of the interpreter, enabled by information and communication technology’
(Ziegler and Gigliobianco, 2018, 128).
Experiments with RSI have been carried out since the 1970s (see Moser-Mercer, 2005;
Mouzourakis, 2006; Ziegler and Gigliobianco, 2018). However, it was only at the turn of
the century that technology allowed for (relatively) seamless audio and video streaming.
This resulted in a series of studies regarding the feasibility of RSI. These early tests showed
numerous complaints from interpreters about the physiological (sore eyes, back and neck
pain, headaches, nausea) and psychological (loss of concentration and motivation, feeling
of alienation) aspects of RSI (Mouzourakis, 2006, 52–53). Researchers at that time sug-
gested that these adverse consequences of RSI were related to interpreters experiencing a
lack of sense of presence (Moser-Mercer, 2005; Mouzourakis, 2006). However, it is worth
noting that, at that time, eyesight was treated as one of the most important variables
in these studies. To illustrate, one such study, conducted by the Directorate General for
Interpretation and Conferences of the European Parliament, suggested a statistically sig-
nificant difference in terms of such parameters as fatigue, physical discomfort, motivation,
and feeling of participation in the meeting for people wearing eyeglasses. The authors con-
cluded that ‘it seems that people wearing glasses had to strain their eyesight more than their
counterparts to extract the information they needed to carry out their job. . . . [T]heir visual
defect put them on an unequal footing with their colleagues’ (EPID, 2001, 30).
As technology developed further and bandwidth increased, quality of sound and video
in RSI steadily improved (Ziegler and Gigliobianco, 2018). Seeber et al. (2019, 300) even
go as far as to argue that ‘with the progress achieved in technology, [RSI] is no longer
perceived as more stressful than in-situ interpreting or as being detrimental to the quality
interpreters are able to provide'. And yet, as the authors observe elsewhere, 'conference
interpreters' acceptance of some of these new professional paradigms has been considerably
slower than the development of the technology making them possible' (2019, 271).
However, it seems that, just as with previous major technological breakthroughs, there is a
growing acceptance of RSI (see, for example, Saeed et al., 2022; Mahyub Rayaa and Mar-
tin, 2022; or Salaets and Brône, 2023).
It goes without saying that the COVID-19 pandemic played a crucial role in the grow-
ing acceptance of RSI (see Chmiel and Spinolo, this volume). In a very short period of
time, almost all interpreters were forced to switch to working online. While the pandemic
caused disruption, it also provided opportunities that were hitherto inaccessible for many
interpreters, who had previously worked in a face-to-face context. With the evolution of
RSI, interpreters now had the potential to work abroad without travelling, improve their
work–life balance, or work on more assignments per day (Mahyub Rayaa and Martin,
2022; Salaets and Brône, 2023).
However, one could argue that the growth of DI has made issues related to ergonomics
even more acute. We understand ergonomics as concerning human–machine interaction,
yet elements of interpreting, such as handover or collaboration between interpreters, were
in most cases previously instances of human-to-human interaction. With the increasing
move towards RSI, professional freelancers now find themselves able to work from separate
locations (Salaets and Brône, 2023). As a result, these forms of interaction have become
machine-mediated activities. What is more,


since the pandemic and the rise of RSI, many tasks that were previously handled by techni-
cians are now dealt with by the interpreters themselves.
Seresi and Láncos (2023) identified three major challenges for interpreters working
remotely during the initial phase of the COVID-19 pandemic. These include listening
to their boothmate, handover, and assisting a virtual booth partner. Mahyub Rayaa and
Martin (2022), in turn, include coordination with a booth partner and auditory and cog-
nitive fatigue in this context. Solving these challenges has required ingenious solutions
and a range of new competencies and skills on the part of interpreters. This is because,
in most cases, interpreters had to use tools and software that had not been designed
with professional conference interpreters in mind. To illustrate, most simultaneous
interpreting delivery platforms (SIDPs), as well as mainstream videoconferencing
platforms, did not offer effective or efficient ways of providing interpreter-to-interpreter
communication. Consequently, many interpreters resorted to using a second device to
communicate with each other (Przepiórkowska, 2021; Mahyub Rayaa and Martin, 2022;
Seresi and Láncos, 2023). Some even used three devices at a time (Salaets and Brône,
2023). Solving these challenges has had a direct impact on the ergonomics of an interpret-
er’s workstation. This further complicates workflows and requires interpreters to invest
in their own equipment.
In terms of accessibility, handover represents a major challenge for interpreters. For
blind interpreters in particular, handover has always required a more direct form of con-
tact, such as touching a colleague’s arm. However, in online settings, this is not feasible.
Blind interpreters cannot easily communicate handover via the standard chat features, as
doing so would require additional cognitive effort: listening out for a colleague's spoken
reply via speech synthesis, or reading messages on a Braille display.
As a result, blind interpreters working remotely tend to set up an audio conversation on
a separate device and use spoken instructions to communicate handover. And yet it has to
be stressed that RSI, if provided in an accessible way, can represent a huge opportunity for
blind people, who may have previously experienced problems with travelling to a venue or
moving around independently (Figiel, 2017). Likewise, preparatory materials delivered on
paper could be challenging for them (Dold, 2016). Although by no means insurmountable
(see Dold, 2016), these challenges are eliminated by RSI. In addition, online tools can
facilitate speaker identification (Zoom has a keyboard shortcut that reads aloud the
active speaker's name) or grant increased control over microphone status (Interactio uses
ISO-compliant sounds to signal the muting/unmuting of a microphone).
Overall, one could argue that, through their own agency and the work of professional
organisations, working conditions and online working standards for conference interpret-
ers during the pandemic were maintained (Mahyub Rayaa and Martin, 2022). One excep-
tion to this pattern is the reduced sound quality that occurred as use of online platforms
increased (Hynes, 2021).1 Not only does poor sound quality present a health hazard for
interpreters, but it can also increase simultaneous interpreters’ cognitive load (Seeber and
Pan, 2022). This caused numerous interpreters to express concerns (Mahyub Rayaa and
Martin, 2022) and led to industrial action among the interpreters working for the European
Parliament (Sheftalovich, 2022).
The topic of ‘acoustic shock’ among interpreters had been receiving attention even before
the pandemic. Authors of a 2020 study commissioned by the International Association of
Conference Interpreters (AIIC) concluded that up to 67% of respondents were affected
by acoustic shock (AIIC, 2020). Other studies also confirm that this is a major issue. For


example, Hynes (2021) reports that 35% of respondents from her study complained about
experiencing hearing problems. As a result, the AIIC (2020) report presented a host of
recommendations for event organisers, conference interpreters, and other stakeholders on
how to prevent acoustic shock. These recommendations concentrate on raising awareness,
on the necessity of providing adequate equipment (for both interpreters and conference
participants), and on interpreters undergoing annual hearing tests so that potential
symptoms of acoustic shock can be detected and acted on quickly. The commitment of the
AIIC in this field was reiterated in its 2022 resolution, which called for specific action to be
taken to protect interpreters’ health (AIIC, 2022).
The aforementioned challenges have led some researchers to try to develop SIDPs
themselves, including SIDPs that offer specific features dedicated to teaching interpreting.
One such example is the SmarTerp project (Rodríguez et al., 2021; Fossati, 2021;
Fritella, 2023). This project aims to build a comprehensive RSI system that includes an RSI
platform, a CAI tool based on ASR, a CAI tool for terminology preparation, and a peda-
gogical module (see SmarTerp, n.d.). It should also be noted that SmarTerp is fully commit-
ted to providing inclusive, accessible solutions for professionals and users.
Similarly, the VIP project (Corpas Pastor, 2021) aims to provide interpreters with tools
that can be used before, during, and after an assignment, as well as supporting lifelong
learning. Prompted by repeated complaints about cumbersome SIDP interfaces, other
researchers (Rodríguez González et al., 2023; Saeed, 2024; Saeed et al., 2022) developed
a new, ergonomic interface. After a series of focus groups and pilot tests, interpreter
users were able to sample the resulting prototype of an 'ergonomic SIDP'. This led the
researchers to conclude that interpreters are indeed interested
in using a range of different interfaces, including more minimalist versions than those of
existing SIDPs.
However, studies conducted thus far rarely address the non-visual parameters of SIDPs,
such as the ease of keyboard operation, the availability of keyboard shortcuts, or sounds
indicating the status of certain parameters (such as the microphone being on or off).
A notable exception is Jawad's (2022) MA dissertation, which was written within
the context of the SmarTerp project. Jawad carried out a series of usability tests with a
limited number of visually impaired interpreters and sign language interpreters. The results
of his intervention suggest that SmarTerp has achieved basic levels of accessibility,
both for visually impaired interpreters and for sign language interpreters.
However, there is little information regarding the accessibility of other SIDPs for visually
impaired interpreters. The blind community widely acknowledges that Zoom is an
'accessible platform' (Roussey, 2023; Hof, 2021). The author's personal experience as a
professional interpreter confirms this but also suggests that Zoom is
unable to effectively inform a blind user about the current channel into which they are
interpreting. There is also individual testimony from a blind interpreter who managed to
work with KUDO and QuaQua (Hof, 2021). Although she successfully navigated the web
interface, the same interpreter observed that she was able to work faster and more
efficiently after connecting a hardware console to KUDO.
In addition, the author's personal experience suggests that many unmet needs remain
when it comes to making SIDPs accessible.
However, the author would also like to note certain cases where accessibility was consid-
ered and provided reasonable results. For example, Interactio, which the author personally
tested in spring 2024, appears to provide an environment within which blind interpreters


can work. This environment includes keyboard shortcuts, and sound and voice feedback
regarding the status change of various parameters. Positive examples such as these
emphasise how important and impactful it is to continue increasing accessibility within
SIDPs, and to do so alongside a community of end users who can test improvements
first-hand.
It must be noted that interpreting researchers have only recently become interested in
usability as a research topic (e.g. Fritella and Rodriguez, 2022; Jawad, 2022; Fritella, 2023;
Rodríguez González et al., 2023; Saeed, 2024; Saeed et al., 2022). Given the growing
complexity of the interpreter's workstation and workflow, it is only natural to expect
more usability and user experience studies in this field in the future.

22.7 Ergonomics in emerging hybrid practices


This section is devoted to the discussion of ergonomics in speech-to-text interpreting (STTI).
STTI usually relies on respeakers, who dictate source text into speech recognition software
in order to provide live subtitles or transcripts (see Alonso-Bacigalupe and Romero-Fresco,
2024, 534). STTI can be carried out both on-site and online, and all members of the team
can work from separate locations. The number of people involved in an STTI event can
range from a single person to teams of up to four – two respeakers and two correctors
(Szczygielska et al., 2020). Originally developed for the provision of intralingual
subtitles in TV broadcasting, this practice has evolved to include interlingual STTI, which
involves interpretation between languages. There are several ways in which interlingual
STTI can be provided, as the process involves a combination of simultaneous interpreting,
machine translation, ASR, and respeaking (see Davitti, this volume, for a comprehensive
account of hybrid workflows). For example, a simultaneous interpretation can be respo-
ken intralingually. Alternatively, interlingual respeaking can be provided, or the result of
intralingual respeaking can be machine-translated (Alonso Bacigalupe and Romero-Fresco,
2024; Korybski et al., 2022).
The workflow for both intralingual and interlingual STTI tends to be complex, involving
a range of equipment and software, and requires special attention in terms of ergonomics
(Guevara, 2021). The same challenges discussed in relation to RSI are also present within
STTI. However, one noticeable difference is that teams are larger in this instance and, there-
fore, require close coordination for their work to be efficient. This is particularly the case
for some languages, such as Polish, where speech recognition software for respeaking is
less reliable than for Germanic or Romance languages. Here, one feasible solution
appears to be a backchannel through which voice instructions for handover can be
exchanged between team members.
As yet, the question of accessibility in relation to the software used in STTI has not
been the subject of systematic or independent research. Developers of the Dragon speech
recognition software claim that the software is indeed accessible (Tips for vision-impaired
users, n.d.). This claim is corroborated by the author's personal experience (notably, Dragon
allows for configuration of keyboard shortcuts). The author has also tested a piece of
dictation software, Newton 5, but found that it does not support screen readers. As for applications
for providing text input, these tend to be web-based and are either already accessible or can
be made to be so following the Web Content Accessibility Guidelines (WCAG) standard
(see Hof, 2021).


22.8 Ergonomics in teaching conference interpreting


As the use of technology in the interpreter's workplace grows in complexity, the same
can be observed in the teacher's workstation in interpreter training. A potential threat is
that the set-up for teaching (or studying) interpreting becomes too cognitively demanding
and/or inaccessible (for instance, owing to an inaccessible graphical user interface). As
with RSI, the COVID-19 pandemic had a profound impact on teaching practices in this
field (Zhao, 2023). Recent scholarly texts document experiences of 'emergency online
teaching' in this subject area (García Oya, 2021; Mirek, 2021; Figiel, 2024). These studies
cover attempts to provide emergency online interpreter training that were deemed
'successful', reporting the use of Zoom, alone or in combination with other, non-specialised
tools. Furthermore, the current scenario is becoming increasingly complex as a result of
hybrid teaching solutions, broader technology use (Figiel, 2024), and computer-assisted
interpreter training (CAIT) tools that are specifically tailored to the needs of SI trainers
(Zhao, 2023).
Without a concerted effort on the part of developers, including detailed testing with
users, it is likely that booth management software will remain inaccessible. An analogous
barrier can be observed in the case of touchscreens with inaccessible graphical interfaces.
Although accessible in all mainstream smartphones, touchscreens are almost universally
inaccessible in special-purpose solutions, both for teachers and students with visual impair-
ments. This is because these solutions tend to be manufactured and developed by compa-
nies that have no interest in accessibility and are produced on a small scale, sometimes even
tailored to the needs of individual customers. As with software, there is a need for more
user-centred research and the use of standardised operating systems with built-in screen
readers. In light of the author’s experience, there are three potential solutions to this acces-
sibility barrier for visually impaired teachers: (1) If the conference system is managed via a
web application (such as TeleVic’s Plixus), this would increase the likelihood that visually
impaired users may deem it ‘usable’ (though not fully accessible). (2) If the system is con-
nected via audio interface, it can be operated by a computer using accessible software (such
as Audio Hijack for macOS and/or Reaper for Windows and macOS). (3) A final option,
which should be considered as a workaround rather than a solution, is that of using classic
receiver headsets. Where needed, these headsets can be connected, via a jack-to-jack cable,
to the computer, allowing for audio recording. The most promising solution of the three is
the use of an audio interface whose output can be rerouted to videoconferencing software
(such as Zoom) or an SIDP (such as Interactio). This proposed solution would, in addition,
offer excellent sound quality for hybrid conferences.
With this in mind, if teaching simultaneous interpreting is to continue being accessible
for visually impaired interpreters and students, manufacturers and developers need to
make a conscious effort to guarantee accessibility in their products.

22.9 Conclusions
This chapter has summarised the challenges related to ergonomics in conference
interpreting. It has pointed to the significant progress in ergonomics achieved over the
past decades. Despite this, conference interpreters continue to face ergonomic challenges
to this day. Rather than reducing them, the move


to online and hybrid modalities of interpreting has, in fact, increased these challenges. As
a result, awareness must be raised among all stakeholders. The same applies to the ques-
tion of accessibility, which has received attention from scholars and practitioners only
recently. If the interpreting profession is to continue on a sustainable, inclusive pathway,
manufacturers and developers need to fully acknowledge the challenges related to both
ergonomics and accessibility and work alongside practitioners to resolve them. Trainers
of conference interpreters, on the other hand, must provide new generations with the
knowledge about working conditions, ergonomics, and accessibility that will allow them
to make conscious decisions to guarantee their service as conference interpreters for dec-
ades to come.

Note
1 Very few users are aware that Zoom offers a feature that allows speakers to disable noise
suppression and other types of audio processing that contribute to acoustic shock. The speaker
needs to turn on the so-called "original sound for musicians". To do that, go to the audio set-
tings and select "Original Sound for Musicians" under the "Audio Profile" section. Make sure that
the "High fidelity mode" and "Echo Cancellation" checkboxes are checked. Then, once the meeting
starts, turn on the "Original Sound for Musicians" button in the top-right corner of the screen.

References
AIIC, 2020. Acoustic Shocks Research Project: Final Report. URL https://aiic.org/uploaded/web/
Acoustic%20Shocks%20Research%20Project.pdf (accessed 27.7.2024).
AIIC, 2022. Global Threats to Interpreters’ Auditory Health: Symptoms of Damage to the Audi-
tory System of Conference Interpreters Reported by Interpreters Worldwide Since the Advent of
Widespread Recourse to Remote Simultaneous Interpretation Platforms and Videoconferencing
Systems. URL https://aiic.org/document/10559/CdP_Resolution_Auditory%20Health_FINAL_
Sept22.pdf (accessed 27.7.2024).
Alben, L., 1996. Quality of Experience: Defining the Criteria for Effective Interaction Design. Interac-
tions 3(3), 11–15.
Alonso-Bacigalupe, L., Romero-Fresco, P., 2024. Interlingual Live Subtitling: The Crossroads
Between Translation, Interpreting and Accessibility. Universal Access in the Information Society
23, 533–543. URL https://doi.org/10.1007/s10209-023-01032-8
Baigorri-Jalón, J., 2014. From Paris to Nuremberg: The Birth of Conference Interpreting. John Ben-
jamins, Amsterdam and Philadelphia.
Baxter, R.N., 2015. A Discussion of Chuchotage and Boothless Simultaneous as Marginal and Unor-
thodox Interpreting Modes. The Translator 22(1), 59–71. URL https://doi.org/10.1080/1355650
9.2015.1072614
Bridger, R.S., 2003. Introduction to Ergonomics. Routledge, London and New York.
Chernov, S., 2016. At the Dawn of Simultaneous Interpreting in the USSR: Filling Some Gaps in
History. In Takeda, K., Baigorri-Jalón, J., eds. New Insights in the History of Interpreting. John
Benjamins, Amsterdam and Philadelphia, 136–166.
Choi, G., Seo, J., 2024. Accessibility, Usability, and Universal Design for Learning: Discussion of Three
Key LX/UX Elements for Inclusive Learning Design. TechTrends. URL https://doi.org/10.1007/
s11528-024-00987-6
Connell, B.R., Jones, M.L., Mace, R.L., Mueller, J.L., Mullick, A., Ostroff, E., Sanford, J., Steinfeld,
E., Story, M., Vanderheiden, G., 1997. The Principles of Universal Design, Version 2.0. Center
for Universal Design, North Carolina State University, Raleigh, NC. URL https://design.ncsu.edu/
wp-content/uploads/2022/11/principles-of-universal-design.pdf (accessed 27.7.2024).
Corpas Pastor, G., 2017. VIP: Voice-Text Integrated System for Interpreters. In Esteves-Ferreira, J.,
Macan, J., Mitkov, R., Stefanov, O., eds. Proceedings of the 39th Conference Translating and the
Computer. Editions Tradulex, Geneva, 7–10.


Corpas Pastor, G., 2021. Language Technology for Interpreters: The Vip Project. In Chambers, D.,
Esteves-Ferreira, J., Macan, J.M., Mitkov, R., Stefanov, O., eds. Translating and the Computer 42.
Tradulex, Geneva, 36–49.
Corpas Pastor, G., Fern, F.M., 2016. A Survey of Interpreters’ Needs and Practices Related to Language
Technology. Technical Paper. URL www.researchgate.net/publication/303685153_A_survey_of_
interpreters%5C%27_needs_and_practices_related_to_language_technology (accessed 27.7.2024).
Defrancq, B., Fantinuoli, C., 2021. Automatic Speech Recognition in the Booth: Assessment of Sys-
tem Performance, Interpreters’ Performances and Interactions in the Context of Numbers. Target
33(1), 73–102.
Dold, D., 2016. Technology Tip for Blind Interpreters: The KNFB Reader App. URL https://
theblindtranslator.wordpress.com/2016/03/29/technology-tip-for-blind-interpreters-th
e-knfb-reader-app/ (accessed 27.7.2024).
Dolph, E., 2021. The Developing Definition of Universal Design. Journal of Accessibility and Design
for All 11(2), 178–194. URL https://doi.org/10.17411/jacces.v11i2.263
Drechsel, A., Goldsmith, J., 2016. Tablet Interpreting: The Evolution and Uses of Mobile Devices in
Interpreting. In Lee-Jahnke, H., Forstner, M., eds. Proceedings of the 2016 CIUTI Forum. URL
www.academia.edu/36017504/Tablet_Interpreting_The_evolution_and_uses_of_mobile_devices_
in_interpreting (accessed 27.7.2024).
EC, 2021. PIE (Portable Interpreting Equipment) Technical Specifications. URL https://commission.
europa.eu/system/files/2022-01/technical-specifications-for-portable-interpreting-equipment_
2021_en.pdf (accessed 27.7.2024).
EPID, 2001. Report on Remote Interpretation Test: 22–25 January 2001 Brussels. URL www.euro-
parl.europa.eu/interp/remote_interpreting/ep_report1.pdf (accessed 27.7.2024).
EU, 2016. Directive (EU) 2016/2102 of the European Parliament and of the Council of 26
October 2016 on the Accessibility of the Websites and Mobile Applications of Public Sector Bod-
ies. URL https://eur-lex.europa.eu/eli/dir/2016/2102/oj (accessed 10.9.2024).
EU, 2019. Directive (EU) 2019/882 of the European Parliament and of the Council of 17 April 2019
on the Accessibility Requirements for Products and Services. URL https://eur-lex.europa.eu/eli/
dir/2019/882/oj (accessed 10.9.2024).
Fantinuoli, C., 2017. Speech Recognition in the Interpreter Workstation. In Esteves-Ferreira, J.,
Macan, J., Mitkov, R., Stefanov, O., eds. Proceedings of the 39th Conference Translating and the
Computer. Editions Tradulex, Geneva, 25–40.
Fantinuoli, C., 2019. The Technological Turn in Interpreting: Challenges That Lie Ahead. BDÜ Con-
ference Translating and Interpreting 4.0, Bonn.
Figiel, W., 2017. Tożsamość I status tłumaczy z dysfunkcją wzroku (Unpublished PhD thesis).
University of Warsaw.
Figiel, W., 2018. Levelling the Playing Field with (In)Accessible Technologies? How Technological
Revolution Has Changed the Working Conditions of Blind Translators. Między Oryginałem a
Przekładem 24(3/41), 75–88. URL https://doi.org/10.12797/MOaP.24.2018.41.04
Figiel, W., 2024. Teaching Simultaneous Interpreting During the COVID-19 Pandemic: Technology,
Society, Access. In Biernacka, A., Figiel, W., eds. New Insights into Interpreting Studies: Technol-
ogy, Society and Access. Peter Lang, Berlin, 289–302.
Fossati, G., 2021. SmarTerp: Applying the User-Centred Design Process in a Computer-Assisted
Interpreting (CAI) Tool (Unpublished dissertation thesis). Universidad Politécnica de Madrid.
Fritella, F., 2023. Usability Research for Interpreter-Centred Technology: The Case Study of SmarT-
erp. Language Science Press, Berlin.
Fritella, F., Rodriguez, S., 2022. Putting SmartTerp to Test: A Tool for the Challenges of Remote
Interpreting. INContext 2(2), 137–166.
García Oya, E., 2021. De las cabinas al entorno virtual: Didáctica de la interpretación simultánea en
línea sobrevenida. Estudios de Traducción 11, 147–155.
Goldsmith, J., 2017. A Comparative User Evaluation of Tablets and Tools for Consecutive Interpret-
ers. In Esteves-Ferreira, J., Macan, J., Mitkov, R., Stefanov, O., eds. Proceedings of the 39th Con-
ference Translating and the Computer. Editions Tradulex, Geneva, 41–50.
Guevara, N., 2021. Speech-to-Text Interpreting, Part 2: Some Practical Insights. Touch 29(3), 16–17.
Hamidi, M., Pöchhacker, F., 2007. Simultaneous Consecutive Interpreting: A New Technique Put to
the Test. Meta: Journal des traducteurs/Meta: Translators’ Journal 52(2), 276–289.


Hof, M., 2021. Learning and Working Online as a Visually Impaired Interpreter (Part 2). URL
https://aibarcelona.blogspot.com/2021/05/learning-and-working-online-as-visually_26.html?m=1
(accessed 27.7.2024).
Hynes, R., 2021. Assessing the Risk Factors for Hearing Problems Among Simultaneous Conference
Interpreters. Translation Ireland 21(1), 163–188.
International Ergonomics Association, n.d. What Is Ergonomics (HFE)? URL https://iea.cc/about/
what-is-ergonomics/ (accessed 27.7.2024).
ISO, 2016. ISO 20109:2016: Simultaneous Interpreting – Equipment – Requirements. International
Organization for Standardization, Geneva.
ISO, 2017. ISO/FDIS 20108: Simultaneous Interpreting – Quality and Transmission of Sound and
Image Input – Requirements (Under Development). International Organization for Standardiza-
tion, Geneva.
ISO, 2018. Ergonomics of Human-System Interaction – Part 11: Usability: Definitions and Concepts.
International Organization for Standardization, Geneva.
ISO, 2024a. ISO 17651–1:2024 – Simultaneous Interpreting – Interpreters’ Working Environment.
Part 1: Requirements and Recommendations for Permanent Booths. International Organization
for Standardization, Geneva.
ISO, 2024b. ISO 17651–2:2024 – Simultaneous Interpreting – Interpreters’ Working Environment.
Part 2: Requirements and Recommendations for Mobile Booths. International Organization for
Standardization, Geneva.
Jawad, R., 2022. SmarTerp: Applying the User Centred Design Process for Visually Impaired Inter-
preters and Sign Language Interpreters (Unpublished dissertation thesis). Universidad Politécnica de
Madrid.
Kiran, D.R., 2020. Work Organization and Methods Engineering for Productivity. Butterworth-
Heinemann, Oxford.
Korybski, T., Davitti, E., Orasan, C., Braun, S., 2022. A Semi-Automated Live Interlingual Com-
munication. Workflow Featuring Intralingual Respeaking: Evaluation and Benchmarking. LREC
2022: Thirteenth International Conference on Language Resources and Evaluation, 4405–4413.
URL https://aclanthology.org/2022.lrec-1.468/
Law, L.-Ch., Roto, V., Vermeeren, A., Kort, J., Hassenzahl, M., 2008. Towards a Shared Definition
of User Experience. Extended Abstracts Proceedings of the 2008 Conference on Human Factors
in Computing Systems, CHI 2008, Florence, Italy, 5–10.4.2008, 2395–2398. URL https://doi.
org/10.1145/1358628.1358693
Mace, R., 1985. Universal Design, Barrier Free Environments for Everyone. Designers West, November.
Mahyub Rayaa, B., Martin, A., 2022. Remote Simultaneous Interpreting: Perceptions, Practices
and Developments. The Interpreters’ Newsletter 27, 21–42. URL https://doi.org/10.13137/
2421-714X/34390
Mirek, J., 2021. Teaching Simultaneous Interpreting During the COVID-19 Pandemic: A Case Study.
New Voices in Translation Studies 23, 94–103.
Moser-Mercer, B., 1992. Banking on Terminology: Conference Interpreters in the Electronic Age.
Meta: Journal des Traducteurs/Meta: Translators’ Journal 37(3), 507–522. URL https://doi.
org/10.7202/003634ar
Moser-Mercer, B., 2005. Remote Interpreting: Issues of Multi-Sensory Integration in a Multilingual
Task. Meta 50(2), 727–738.
Mouzourakis, P., 2006. Remote Interpreting: A Technical Perspective on Recent Experiments. Inter-
preting 8(1), 45–66.
Nielsen, J., 2012. Usability 101: Introduction to Usability. URL www.nngroup.com/articles/
usability-101-introduction-to-usability/ (accessed 27.7.2024).
Norman, D., Nielsen, J., 1998. The Definition of User Experience (UX). URL www.nngroup.com/
articles/definition-user-experience/ (accessed 27.7.2024).
Orlando, M., 2010. Digital Pen Technology and Consecutive Interpreting: Another Dimension in
Note-Taking Training and Assessment. The Interpreters’ Newsletter 15, 71–86.
Petrie, H., Savva, A., Power, C., 2015. Towards a Unified Definition of Web Accessibility. Proceed-
ings of the 12th Web for All Conference on – W4A ’15, 1–13. URL https://doi.org/10.1145/
2745555.2746653


Porlán Moreno, R., 2019. The Use of Portable Interpreting Devices: An Overview. Revista Trad-
umàtica. Tecnologies de la Traducció 17, 45–58.
Przepiórkowska, D., 2021. Adapt or Perish: How Forced Transition to Remote Simultaneous
Interpreting During the COVID-19 Pandemic Affected Interpreters' Professional Practices.
Między Oryginałem a Przekładem 27(4/54), 137–159. URL https://doi.org/10.12797/
MOaP.27.2021.54.08
Rodríguez, S., Gretter, S., Matassoni, M., Falavigna, D., Alonso, Á., Corcho, O., Rico, M., 2021.
SmarTerp: A CAI System to Support Simultaneous Interpreters in Real-Time. In Proceedings of
the Translation and Interpreting Technology Online Conference TRITON 2021. INCOMA Ltd.,
Shoumen, 102–109. URL https://doi.org/10.26615/978-954-452-071-7_012
Rodríguez González, E., Saeed, M., Korybski, T., Davitti, E., Braun, S., 2023. Reimagining the Remote
Simultaneous Interpreting Interface to Improve Support for Interpreters. In Ferreiro Vázquez, Ó.,
Correia, A., Araújo, S., eds. Technological Innovation Put to the Service of Language Learning,
Translation and Interpreting: Insights from Academic and Professional Contexts. Peter Lang, Ber-
lin, 227–246.
Rodríguez Vázquez, S., Mileto, F., 2016. On the Lookout for Accessible Translation Aids: Current Scenario and New Horizons for Blind Translation Students and Professionals. Journal of Translator Education and Translation Studies (TETS) 1, 115–135.
The Round Table, n.d. URL http://lists.screenreview.org/listinfo.cgi/theroundtable-screenreview.org (accessed 27.7.2024).
Roussey, B., 2023. Accessibility Features of Zoom and How to Make Zoom Meetings More Accessible. URL www.accessibility.com/blog/accessibility-features-of-zoom-and-how-to-make-zoom-meetings-more-accessible (accessed 27.7.2024).
Saeed, M.A., 2024. Exploring the Visual Interface in Remote Simultaneous Interpreting (PhD thesis). University of Surrey [online]. URL https://openresearch.surrey.ac.uk/esploro/outputs/doctoral/Exploring-the-visual-interface-in-Remote/99870866202346#file-0
Saeed, M.A., González, E.R., Korybski, T., Davitti, E., Braun, S., 2022. Connected Yet Distant: An Experimental Study into the Visual Needs of the Interpreter in Remote Simultaneous Interpreting. In Kurosu, M., ed. Human-Computer Interaction. User Experience and Behavior. HCII 2022. Lecture Notes in Computer Science, vol. 13304. Springer, Cham. URL https://doi.org/10.1007/978-3-031-05412-9_16
Salaets, H., Brône, G., 2023. “Working at a Distance from Everybody”: Challenges (and Some Advantages) in Working with Video-Based Interpreting Platforms. The Interpreters’ Newsletter 28, 189–209. URL https://doi.org/10.13137/2421-714X/35556
Seeber, K., Pan, D., 2022. The Effect of Sound Quality on Attention and Load in Language Tasks. ExLing 2022 Paris: Proceedings of 13th International Conference of Experimental Linguistics, 17–19.10.2022. Paris, France. URL https://doi.org/10.36505/ExLing-2022/13/0040/000582
Seeber, K.G., Keller, L., Amos, R., Hengl, S., 2019. Expectations vs. Experience: Attitudes Towards
Video Remote Conference Interpreting. Interpreting 21(2), 270–304.
Seresi, M., Láncos, P., 2023. Teamwork in the Virtual Booth – Conference Interpreters’ Experiences. In Liu, K., Cheung, A., eds. Translation and Interpreting in the Age of COVID-19. Springer, Singapore, 181–196.
Sheftalovich, Z., 2022. European Parliament Interpreters Call Off Strike. URL www.politico.eu/article/european-parliament-interpreters-call-off-strike/ (accessed 27.7.2024).
SmarTerp, n.d. Tutorials. URL https://smarterp.me/tutorials/ (accessed 27.7.2024).
Szczygielska, M., Dutka, Ł., Szarkowska, A., Romero-Fresco, P., Pöchhacker, F., Tampir, M., Figiel, W., Moores, Z., Robert, I., Schrijver, I., Haverhals, V., 2020. How to Implement Speech-to-Text Interpreting (Live Subtitling) in Live Events: Guidelines on Making Live Events Accessible. ILSA Project. URL https://repository.uantwerpen.be/docman/irua/1618b4/how_to_implement_speech_to_text_interpreting_in_live_events_1.pdf (accessed 27.7.2024).
Tips for Vision-Impaired Users, n.d. URL www.nuance.com/products/help/dragon15/dragon-for-pc/
enx/dpg-cp/Content/Help/tips_for_vision_impaired_users.htm (accessed 27.7.2024).
United Nations, 2006. Convention on the Rights of Persons with Disabilities and Optional
Protocol. URL www.un.org/disabilities/documents/convention/convoptprot-e.pdf (accessed
10.9.2024).

421
The Routledge Handbook of Interpreting, Technology and AI

van Egdom, G.W., Cadwell, P., Kockaert, H., Segers, W., 2020. A Turn to Ergonomics in Translator and Interpreter Training. The Interpreter and Translator Trainer 14(4), 363–368. URL https://doi.org/10.1080/1750399X.2020.1846930
Will, M., 2020. Computer Aided Interpreting (CAI) for Conference Interpreters. Concepts, Content
and Prospects. ESSACHESS-Journal for Communication Studies 13(25), 37–71.
Williamson, B., 2015. Access. In Adams, R., Reiss, B., Serlin, D., eds. Keywords for Disability Studies.
New York University Press, New York and London, 14–16.
Zhao, N., 2023. Use of Computer-Assisted Interpreting Tools in Conference Interpreting Training and
Practice During COVID-19. In Liu, K., Cheung, A., eds. Translation and Interpreting in the Age of
COVID-19. Springer, Singapore, 331–347.
Ziegler, K., Gigliobianco, S., 2018. Present? Remote? Remotely Present! New Technological Approaches to Remote Simultaneous Conference Interpreting. In Fantinuoli, C., ed. Interpreting and Technology. Language Science Press, Berlin, 119–139. URL https://doi.org/10.5281/zenodo.1493299

422
INDEX

Note: Page numbers in italics indicate figures, bold indicate tables in the text, and references following
“n” refer to notes.

4C/ID model 204
4EA cognition 91, 353, 355–356
Abraham, T. 194
accessibility 269, 328, 343, 407; artificial intelligence in 337; among CAI tools 411–412; challenges relating to 410–411; defined 409; ethics 336–339; knowledge 220; language 23, 209–211, 219, 336–337; legal provisions 250; media 185–186, 203–204; remote simultaneous interpreting and 413–414; speech-to-text interpreting and 416; teaching conference interpreting and 416
actor-network theory (ANT) 275
Adamska-Gallant, A. 271
Ahrens, B. 151
Alben, L. 408
Albl-Mikasa, M. 93, 332
Alonso-Bacigalupe, L. 195, 196, 197, 198, 199, 316
Altieri, M. 235
Anacleto, M.T. 124
Anazawa, R. 247
Arbona, E. 355
artificial intelligence (AI) 88–89, 92, 100–101, 123–126, 258, 296, 384–385, 403–404; in accessibility 337; automatic speech recognition and 158; bias 337; CAI tools and 56, 156, 159, 172–173, 330, 392–393, 401; confidentiality and privacy 336; in crisis-prone settings 331, 340; critical AI literacy 342, 357; equipped tablets 109, 116–117; ethical challenges 219, 342; generative 22, 214, 220, 328; human-AI interaction 184–185, 187–188, 191–192, 194, 201, 203–205, 343; inclusion of 338; Microsoft Azure AI Speech 334–335; open-source 336; remote simultaneous interpreting and 57–58; SAFE-AI Task Force 23, 336–337, 342; social challenges 342; training 337–339, 341
Arumí, M. 114
Aryadoust, V. 354
asylum settings 282–283, 292, 297, 339; automated services in 296; COVID-19 pandemic 285–287, 291, 296; crowdsourcing platforms in 295–296; distance interpreting in 283–290; guidelines 291–293; interviews 282–283, 285–287, 290–291, 294, 295; rejections 340; sound quality in 289, 292; telephone interpreting in 283, 285, 286–289, 296; training 293–295; videoconference interpreting in 284–287, 291; video-mediated interpreting in 283, 285, 287–297
audioconference interpreting 51
audio remote interpreting (ARI) 51, 236, 239, 273
audiovisual translation (AVT) 185–186, 204
augmentation 63, 140, 240, 356
augmented cognition 355–356, 357
augmented interpreter 18, 22
automatic speech recognition (ASR) 57, 60–61, 63, 113, 117, 126, 152, 192, 213, 230, 309–310, 315–316, 330, 351–352,


410, 416; advantages of 400; AI-based information retrieval model and 92; artificial intelligence and 158; -assisted consecutive interpreting 98–104, 235–236; -assisted modalities 97–98; CAI tools -based 99–101, 103, 130–136, 140, 172–173, 411, 415; computer-assisted consecutive interpreting and 94; machine translation and 99, 113, 123, 187, 193–194, 210–212, 214, 235–236, 239–240; and natural language processing 89, 115; remote simultaneous interpreting and 136; within second-generation CAI tools 411; in simultaneous conference interpreting 98, 192–193, 196, 202, 239–240
automatic speech translation (AST) 172, 210, 218
automation 196, 205, 321, 331, 333, 340; anxiety 333; automated metrics/methods 218, 317–320; CAI tools 126; forms of 22 (see also specific forms); full 18, 187, 193–194, 316; future 320; semi (see semi-automated workflows); telephone interpreting 22–23
AVIDICUS projects, Europe 30, 38, 41, 276, 277n3, 295, 297n4, 313, 340
Azarmina, P. 253
BabelDr 256
Bachelier, K. 251, 252
Baidu Translate 102
Baigorri-Jalón, J. 230
Bail for Immigration Detainees (BID) 284
Baker, M. 230
Balogh, K. 272, 273
Barbour, I. 232
Barik, H.C. 306, 312, 313
Baumgarten, S. 333
Baxter, R.N. 83, 86, 409–410, 411
Bell, Alexander Graham 70
Bennett 339
Berber-Irabien, D.-C. 138
Bergunde, A. 294
BERTScore 218, 317–318
Biagini, G. 131
Bidone, A. 83, 86
bidule interpreting 79–80, 82–89, 389–390
bilateral interpreting 12; automated 23; on-site 14, 23–24
billable time 74–75
Blackbox 157
BLEU 218, 317–319
BLEURT 317, 319
blind interpreters 411; Braille code for 410; CAI tools among 412; as conference interpreting teachers 417; of SIDPs 415; working remotely 414; work with KUDO and QuaQua 415–416
Boéri, J. 339
bottom-up quality assessment methods 311–316, 321
Boujon, V. 256
Bourgadel, C. 333
Bowker, L. 137, 337
Braille code 410, 412, 414
Braun, S. 11, 13, 16, 41, 44, 51–52, 159, 248, 250, 253, 267, 270–276, 329–330, 350, 390
British Refugee Council 284
Bu, X. 101
Bühler, H. 306
Buitrago Ciro, J. 337
Buján, M. 332
Cabrera Mendez, G. 17
Camayd-Freixas, E. 95
Carioli, G. 159, 166
cascading approach 212–214, 213
Cavents, D. 45
ChatGPT 22
Chatterjee, S. 338
Chaves, S.G. 54
Chen, N.S. 165
Chen, S. 102, 132, 133, 134, 137, 235, 236, 349, 352, 354
Chernov, G.V. 230
Chesterman, A. 328
Cheung, A.K.F. 54
Chitrakar, R. 95
Chmiel, A. 59, 61, 351
cloud-based interpreting 56
cognition 158, 348–349; 4EA 91, 353, 355–356; augmented 355–356, 357; cognitive effort 100, 116, 126, 128, 133, 349, 353–354; cognitive ergonomics 20–22, 354–355; on interpreting studies 349–353; tablet interpreting and 116–117; on technology 349–353; technology-enabled interpreting 350–351; technology-supported interpreting 351–352; telephone interpreting and 20–22; training and education 352–353
cognitive effort 100, 116, 126, 128, 133, 349, 353–354
cognitive load 94, 96, 102–103, 133, 233, 238–240, 309, 333, 349, 352, 371, 373, 389, 414; and cognitive effort 353–354; in computer-assisted consecutive interpreting 134; in distance interpreting 382–383; ergonomics and 20–21; extraneous 383; intrinsic 383; remote simultaneous interpreting and 60–61; self-reported 58, 62; subjective 236; subtasks for measuring 400; VMI 42, 46, 290


Cohen, P. 17
Collados Ais, A. 306
Collard, C. 332
Communist International (Comintern) 231, 241n1
community interpreting 165, 305, 369–370
computer-assisted consecutive interpreting (CACI) 92, 102–103, 132–134, 235–236
computer-assisted interpreter training (CAIT) tools 150, 151, 166, 172, 173–174, 417; in conference interpreting educational settings 156–159; open-source 159; peer-to-peer 159
computer-assisted interpreting (CAI) tools 19, 91–92, 108, 150, 174, 230, 329–330, 343, 392–393, 403; accessibility of 411–412; advantages of 125; artificial intelligence and 56, 156, 159, 172–173, 330, 392–393, 401; ASR and 99–101, 103, 130–136, 140, 172–173, 411, 415; automation 126; benefits for 269; among blind interpreters 412; characteristics 127; COVID-19 pandemic 127; defined 123, 392, 411; digital boothmates for 158; ethical issues in 332–333, 341, 343; first-generation 125, 126, 411; fourth generation 126, 126; future research 137; glossary creation by 125, 129, 131–132, 136, 401, 411; historical development of 124–128; in-booth 158; interpreter-centric research on 128–134; and interpreters’ preparation 129–130; latency 135; limitations 137; number interpreting with 130–131; precision 135; process-oriented research 133–134; product-oriented research 129–132; remote simultaneous interpreting 57, 137–138, 140; second-generation 125–126, 126, 238, 411; for specialised terminology 131–132; speech recognition 235; system-centric research on 134–137; system performance, examination of 135–136; for tablet interpreting 115; telephone interpreting 21–22, 24; theory and concepts 124–125; third generation 126, 126; training 138–139; usability 136–137, 411–412; use and reception 126–128; see also computer-assisted interpreter training (CAIT) tools
computer-assisted simultaneous interpreting (CASI) 133
conference interpreting 229, 381; acceptance 413; cognitive load in 373; defined 367, 371; educational settings, CAIT evolution in 156–159; ergonomics in teaching 147; information in 395; international 230–231, 239–240; requirements and recommendations 371–374; three-layer model 372–373, 372; video-mediated interpreting in 35–37; see also specific modes
conference interpreting service providers (CISP) 371–374, 379, 381
conference systems 378, 379–380, 417
consecutive interpreting (CI) 92; ASR-assisted 98–104, 235–236; asylum and immigration settings 292; cognition 350–352; computer-assisted (see computer-assisted consecutive interpreting); conventional 133; Cymo Note 99 132, 152; defined 93; digital pen–supported 96–97, 148–150, 234; digital voice recorders 233–234; distance interpreting 236; at international conferences 231–232; machine translation 235–236; note-taking 93–94, 145, 147–150, 232–233, 354, 388, 410–412; pedagogical activities 147, 148–150; phases and subtasks 399–400; portable interpreting equipment 83; proximate 313; quality evaluation 306, 313, 319, 321; respeaking-assisted hybrid 102–103; Sight-Terp 98–100; simultaneous-consecutive interpreting 94–96, 112–113; smartpens 149, 151; speech recognition 235–236; tablets 109–111, 113, 115–116, 125, 234–235; workflows and working models 388, 401
Consortium for Speech Translation Advanced Research 211
Constable, A. 383
consultant interpreter, defined 367, 374
Corpas Pastor, G. 128, 251, 333, 411
corpus-driven interpreters’ preparation (CDIP) 129
COVID-19 pandemic 309, 351, 413–414, 417; computer-assisted interpreting tools 127; digitalisation and 19; distance interpreting 239, 285, 330, 339, 385; immigration, asylum, and refugee settings 285–287, 291, 296; remote simultaneous interpreting 52, 54–56, 58; technology for hybrid modalities during 184; technology for training during 159, 164–166, 172, 174; telephone interpreting 13, 19; videoconference interpreting 392, 402; video-mediated interpreting 31–32, 34, 36, 38, 41, 45, 250–251; video relay service 75–76
CPD model 203–204
crisis-prone settings: artificial intelligence in 331, 340; technology use in 339–340
criterion-referenced method see top-down quality assessment methods
critical AI literacy 342, 357
crowdsourcing platforms 250, 283, 295–296
Cymo Note 99, 152, 393


Dastyar, V. 334, 342
Davies, P. 256
Davitti, E. 16, 201, 276
Dawrant, A. 86
Dawson, H. 196, 197, 199, 201, 203
De Boe, E. 41, 43, 253
Defrancq, B. 130, 131, 133, 135, 139, 152, 158, 162–163, 341
Dellantonio, E. 99–100
Department of Interpreting and Translation (DIT) 157–159, 163, 165–166, 169, 171, 173–174
Desmet, B. 130
Devaux, J. 268, 271, 272, 275
Dewey, J. 153
De Wilde, J. 45
Deysel, E. 127
dialogue interpreting 157, 247, 254, 316, 350–351; modality 16; remote 255; video-mediated 15, 30, 32, 34, 43
diamesic translation 185
Diaz-Galaz, S. 352
digital boothmates 158
digital pens 95, 145, 151–152, 184, 230, 348, 352, 410; with audio and video capturing capabilities 112–113; automated note-taking and 152–153; benefits of 269; with built-in cameras 390; dot paper-based 234; enhancing interpreter training 147; evolution 146–147; in interpreter education 150–151; note-taking by 96–99, 146–150, 233, 348; pedagogical activities 148–150; supported modalities 93–94, 96–97
digital recorder-assisted consecutive (DRAC) 94, 95, 233
digital signal processing technology (DSP) 82
digital voice recorder 95, 97, 230, 233–234, 240, 389
Dimond, T. 146
Diriker, E. 85, 230
distance interpreting (DI) 2, 11–12, 21, 26, 186, 230, 236, 239, 309, 330–331, 354, 392, 412–413; cognitive dimensions in 350; cognitive load in 382–383; COVID-19 pandemic 239, 285, 330, 339, 385; ethical issues 332–333, 339, 343; four degrees of separation model in 382–383; in healthcare 249–254, 257; immigration, asylum, and refugee settings in 283–290; in ISO standards 370, 373, 376, 378, 381–385; in public service interpreting settings 249, 283, 289–291, 294; sign language 251, 253; strategies in 275; technological advances facilitating 348–349; see also remote simultaneous interpreting; telephone interpreting; video-mediated interpreting
Doherty, S. 349, 352
Downie, J. 22, 23
Dragon speech recognition 189, 416
Drechsel, A. 114
Drugan, J. 327–328, 331, 335, 336, 341, 342
Dubus, N. 286
ear-voice span (EVS) 135, 149, 238
Echo model 151
effort models 93–94, 399
e-health 248, 251, 258n1
Eichmeyer-Hell, D. 196, 197
emergency calls 75–76
emergency online teaching 417
end-to-end approach 215
ergonomics 407; challenges relating to 410–411; cognitive 20–22, 354–355; defined 408–409; emerging hybrid practices 415; goal of 408; of interpreter’s workstation 410–411; interpreter training and 408; remote simultaneous interpreting 412–415; speech-to-text interpreting 416; teaching conference interpreting 416; telephone interpreting and 20–22; working conditions and 408
ergonomic SIDP 415
ethics 327–328, 356–357; and accessibility 336–337; bias 337; computer-assisted interpreting tools in 332–333, 341, 343; confidentiality 335–336; crisis-prone settings, technology use in 339–340; critical AI literacy 342; distance interpreting 332–333, 339, 343; employability 332; healthcare 249, 257–258; inclusion 337–338; legal 268; machine interpreting 218–220, 332–335, 343; machine translation 338; market developments 332–334; ownership 335–336; portable interpreting equipment 85–88; privacy 335–336; social responsibility and infrastructure 341–342; teaching of 341; technoethics 329; video-mediated 293–294; working conditions 332–333, 343
Eugeni, C. 185, 196, 197
European Accessibility Act 412
European EU WEBPSI project 287
European Migration Network 287
EU’s Web Accessibility Directive 412
EU-WEBPSI project 283, 293, 295, 297n2
Fadel, C. 172
Fantinuoli, C. 11, 19, 129–131, 135, 138–139, 152, 157, 158, 172, 269, 318, 329, 332–333, 334, 392, 411
Farag, F. 254–255
Federici, F.M. 340
Fern, L.M. 128


Fernandez Perez, M. M. 19
Ferrari, M. 94–95
Filene-Finlay ‘Hush-a-Phone’ translator 80–81, 237
final draft international standard (FDIS) 365
four degrees of separation model 382–383
Fowler, Y. 271
Frittella, F.M. 56, 130, 132, 134, 136–137, 139, 150, 151, 158, 173, 254, 352
Gaiba, F. 230
Gambier, Y. 185
Garcia Becerra, O. 306
García González, M. 338
General Directors Immigration Services Conference (GDISC) 285
General Secretariat of the Council of the European Union 268
generative AI 22, 214, 220, 328
Gentile, P. 332
German Association of Conference Interpreters 390
Gerver, D. 312
Ghosh, S. 338
Gieshoff, A.C. 356
Gigliobianco, S. 21
Gilbert, A.S. 339
Gile, D. 93, 102, 232
Gillies, A. 232
Giustini, D. 334, 339, 342, 357
Global Talk 250, 256
glossary creation 152, 329, 391, 393; automatic 330; by CAI tools 125, 129, 131–132, 136, 401, 411; digital 132; dynamic software 238
Goldsmith, J. 111, 114, 235, 269, 352
Gonzalez Rodriguez, M.J. 276
Google Neural Machine Translation 212
Gottlieb, H. 185
GPT 214
GPT3.5 318–319
GPT4 319
Grbić, N. 306
Greco, G.M. 185
Guo, M. 124
Hale, S.B. 271, 273, 308, 312, 351
Hamidi, M. 95, 112
Han, C. 318, 319
Han, L. 124
Hansen, J.P.B. 43, 254
Hanson, T.A. 127, 351
Hawel, K. 95, 233
healthcare interpreting (HI) 247–249, 282–283, 291, 307, 316, 339; distance interpreting in 249–254, 257; interpreting for refugees with 286; machine translation in 256, 258; requirements and recommendations 370; sound quality 253; technology-mediated 250–258; telephone interpreting in 34, 249–252, 254–257; video-mediated interpreting in 38–41, 43, 45–46, 249–255, 257
Herbert, J. 232, 306
Hertog, E. 272, 273
Hickey, S. 15
Hiebl, B. 95, 97, 234
Hlavac, J. 19
Holley, J.C. 111, 235
home alone model 378–379, 383, 385
Hornberger, J. 13, 313
Horváth, I. 328, 335, 341, 356
human-AI interaction (HAII) 184–185, 187–188, 191–192, 194, 201, 203–205, 343
human-centric workflows 194, 205; interlingual respeaking 188–191; simultaneous interpreting and intralingual respeaking 191–192
human-machine interaction 1, 12, 18, 21–22, 56, 91, 192, 258, 320, 404, 408, 413
Hush-a-Phone concept 80–81, 237
hybridity, concept of 183–184, 194
hybrid modalities 32, 182, 205, 233, 267, 310, 320; efficiency 195–198; ergonomics in 416; hybridity, concept of 183–184; quality 95, 195; for real-time interlingual speech-to-text 186–194, 188; for real-time speech-to-text practices 185; respeaking-assisted hybrid consecutive interpreting 102–103; simultaneous-consecutive 94–97, 102, 112–113, 233–234; skills and competences for 200–202; study design parameters 198–200; tablets in 112–113; teaching 417; training for 202–204; see also automation; human-centric workflows; semi-automated workflows
Hynes, R. 15, 415
IBM Wireless Translation System 80
iFlynote 100
iFlytek ASR 100, 102
iFlytek Interpreting Assistant 100
Iglesias Fernandez, E. 17
ILSA project 203, 206n10
image quality 34, 289, 378, 381
immigration 37, 282–283, 340; automated services in 296; bail hearings 284–285; COVID-19 pandemic and 285–287, 291, 296; crowdsourcing platforms in 295–296; distance interpreting in 283–290; guidelines 291–293; sound quality in 289, 292; telephone interpreting in 283, 285, 286–289, 296; training 293–295;


videoconference interpreting in 284–287, 291; video-mediated interpreting in 283, 285, 287–297
infoport systems see portable interpreting equipment
in-process knowledge work 399–401, 402
Integrated Services Digital Network (ISDN) 33–35, 37, 40, 56, 270
Interactio 415–417
interlingual respeaking 187, 188–190, 194, 196–197, 416; ILSA and 203; SMART project and 197–204, 206n7, 310, 316, 322n2; structure of 201, 203
International Association of Conference Interpreters (AIIC) 59, 85–86, 88, 236, 238, 306, 335, 365, 374, 414–415
international conference interpreting 230, 230–231, 239–240
International Ergonomics Association 408
International Labour Organization (ILO) 231, 236
International Organisation of Conference Interpreters 389
International Organization for Standardization (ISO) 408; accreditation, defined 369; certification, defined 369; development stages 364–366; distance interpreting 370, 373, 376, 378, 381–385; drafting standards 366–369; interpreting standards 369–371; ISO 13611: 2014 369–370; ISO 17651-1:2024 374, 375–376, 381, 409; ISO 17651-2:2024 374, 376–377, 409; ISO 18841:2018 366, 368, 370; ISO 20108:2017 377–378, 381–382; ISO 20109:2016 36, 371, 377–378, 380–382, 409, 410; ISO 20228:2019 370; ISO 20539:2019 379, 380; ISO 21998:2020 370; ISO 22259:2019 378, 379–380, 417; ISO 23155:2022 (see ISO 23155:2022); ISO 24019:2022 52, 371, 378–380, 382; ISO 2603:2016 375, 377; ISO 4043:2016 374, 376, 377; standardisation, defined 369; stylistic recommendations 368–369; technical standards 374–381
International Telecommunication Union (ITU) 33, 35, 41, 239, 279
International Workshop on Spoken Language Translation (IWSLT) 210–211, 319
Interplex 125, 157, 329, 390, 396
InterpretBank 100, 126, 131–132, 135, 152, 157–158, 329, 393, 396, 412
interpreter-mediated phone calls 71–73
Interpreters’ Help 390, 396, 412
Interpreters’ Pool Project 285
InterprIT 157
Intragloss 329
InTrain 159, 161, 166, 174, 174n9; development 160–162; suitability 162–164; usability 162–164
intralingual respeaking 189–190, 195, 201, 316, 416; machine translation and 192, 196–197, 310; semi-automated 315; simultaneous interpreting and 191–192, 196, 202–203
InZone project 165
ISO 13611: 2014 369–370
ISO 17651-1:2024 374, 375–376, 381, 409
ISO 17651-2:2024 374, 376–377, 409
ISO 18841:2018 366, 368, 370
ISO 20108:2017 377–378, 381–382
ISO 20109:2016 36, 371, 377–378, 380–382, 409, 410
ISO 20228:2019 370
ISO 20539:2019 379, 380
ISO 21998:2020 370
ISO 22259:2019 378, 379–380, 417
ISO 23155:2022 371, 373, 374, 379, 382; clauses of 371–372; old and new ideas 373–374; three-layer model underlying 372–373, 372
ISO 24019:2022 52, 371, 378–380, 382
ISO 2603:2016 375, 377
ISO 4043:2016 374, 376, 377
ISO/IEC Directives 364, 368
IVY project 157
Jastrzębowski, W. B. 408
Jawad, R. 415
Joseph, C. 253
Kade, O. 181
Kajzer-Wietrzny, M. 158
Kak, A. 336
Kalina, S. 82, 308, 313
Keating, E. 70
Keiser, W. 80
Kellet Bidoli, C.J. 148–149, 151
Kelly, N. 17, 19
Kenny, D. 341
Kiraly, D. 139
Klammer, M. 254, 339
knowledge work 394–396, 394, 398, 401; digitalisation of 390–391; in-process 399–401, 402; peri-process 399–401, 402; post-process 399–401, 402; pre-process 399–401, 402; primary 394, 394; secondary 394, 394
Ko, L. 165
KoBo Inc. 296
Kopczyński, A. 306
Korybski, T. 315
Kruger, J.-L. 102, 132, 133, 134, 137, 235, 236, 352


KUDO AI Speech Translator 126, 136, 216, 318, 415–416
Kunin, M. 286
Kurz, I. 306
Lancos, P. 414
large language models (LLMs) 18, 104, 210, 214, 218, 220–221, 235, 319, 330, 337, 343, 356–357, 392
latency 115, 135, 213–214, 217
Law, L.-Ch. 408
Lazaro Gutierrez, R. 17, 20–21, 23
League of Nations (LoN) 231, 236, 388
Lederer, M. 230
Lee, K.M. 53, 100, 101
Lee, R.G. 275
legal interpreting 33, 265, 290–291, 312–313, 340, 351; benefits of using technologies in 267–268; ethnographic research in 44; frameworks and codes regulating technology and 268; guidelines and training resources 276; interpreter’s role 274–275; interpreting mode, change in 272; new working space adaptation 271–272; physiological impact 274; quality in 273; rapport building 272, 274; re-distributed legal system 271; remote simultaneous interpreting and 268, 271, 273; sound quality 270–271; standard 370; strategies 275–276; technology and equipment quality in 270–271, 273; telephone 265–266, 268, 270, 273; turn-taking and interaction management in 273–274; videoconferencing technology in 37–39, 266–267, 268, 270–276; video-mediated 37–38, 40–42, 265, 274; video relay service 267–268, 275; video remote interpreting and 268
Lewis, D. 335
Li, H.Y. 101
Li, X. 319
Li, Y. 253
Li, Z. 138
liaison organisation 364–365
Licoppe, C. 272, 273, 274
Lion, K.C. 252
Liu, H. 352
Liu, J. 21, 22
Livescribe 96, 146–149, 151, 234, 241n6
Llewellyn-Jones, P. 275
Logitech 146
Lombardi, J. 95
Lookup 390
Lu, X. 318
Ma, Z. 95
machine interpreting (MI) 23, 79, 91, 140, 210–211, 329, 331, 339, 356, 384; cascading approach 212–214, 213; cultural and communicative challenges in 216; end-to-end approach 215; ethical issues in 218–220, 332–335, 343; future trajectory of 220–221; history 211–212; linguistic challenges in 215–216; measures in evaluation of 319–320; misuse 219; overuse 219; post-editing 63; quality 217–218; technical challenges in 217; underuse 219
machine translation (MT) 23, 98, 102, 109, 132, 182, 216, 230, 315, 330–331, 392–393, 416; automatic speech recognition and 99, 113, 123, 187, 193–194, 210–212, 214, 235–236, 239–240; in consecutive conference interpreting 235–236; ethical considerations 338; evaluation metrics in 317–319; generic tools 248; in healthcare 256, 258; human version of 295; intralingual respeaking and 192, 196–197, 310; on-demand 99; in simultaneous conference interpreting 239–240
machine translation quality estimation (MTQE) 317, 319
Magalhães, E. 86
Mager, M. 338
Magnuski, E. H. 81
Mahyub Rayaa, B. 414
Martin, A. 414
MASIT 232, 241n4
MATRIC project 197, 206n8, 310, 322n1
Maxell 146
Měchura, M. 339
media accessibility (MA) 185–186, 203–204
MediBabble 256
Megone, C. 328, 341
Mellinger, C. 21, 127, 128, 147, 351, 355
Merlini, R. 157
METEOR 317–319
Meyer, B. 254–255
Microsoft Azure AI Speech 101, 334–335
Mielcarek, M. 95, 97
Miler-Cassino, J. 273, 274, 275
Milošević, J. 355
Mintz, D. 14
Mirus, G. 70
Moleskine 152
Monteolivia-Garcia, E. 269
Monzo-Nebot, E. 339
Moodle 157
Moorkens, J. 335
Moser-Mercer, B. 15, 20, 42, 45, 60, 61, 114, 274, 313, 350, 410
Moser, P. 306, 307
Machacz, P. 412 Moser, P. 306, 307

429
Index

Motta, M. 159
Mouzourakis, P. 35–36
Mouzourakis, T. 274
multilingual communication 23–24, 117, 183–184, 221, 258, 266, 349, 381, 385, 389
named entity recognition (NER) 98, 126, 136, 195, 314–315
Napier, J. 38, 274
National Institute of Information and Communications Technology (NICT) 211
natural language processing (NLP) 89, 91, 104, 109, 114–115, 129, 330, 336–338
Neo models 151
neural machine translation (NMT) 22, 212–214, 335
Nevado Llopis, A. 20
new work item proposal (NWIP) 365, 371
Nguyen, N.P.H. 146
Nielsen, J. 408
Nimdzi 34
Nokia 146
Norman, D. 408
note reading task 93, 232, 233, 235
note-taking 21–22, 87, 93–94, 102, 145, 163, 230, 355, 401, 410–412; audio-augmented 146; automated 152–153; in consecutive conference interpreting 232–233; by digital pens 96–99, 146–150, 233, 348; live, recording 147; on notepads 235; pen and paper for 153, 234, 348, 388–389; tablets for 111, 114–115, 235, 348, 401; traditional 95, 101; during video remote interpreting 352
NotionAI 100
NTR model 195, 197, 314–316, 314
number interpreting 130
O’Brien, S. 140, 354, 356
Olalla-Soler, C. 158
online interpreter training 164, 417, 159; see also ReBooth
on-site interpreting 102, 287, 381; remote simultaneous interpreting and 35–36, 41–42, 61–62, 311–313; telephone interpreting and 13, 15–17, 23, 41–42, 252; video-mediated interpreting and 15, 35, 38–43, 252, 289–290
oral translators 231
Orlando, M. 95–97, 112, 113, 149, 151, 251, 252
Ouellet, M. 17
overlapping speech 38, 63, 70, 283, 290
over-the-phone interpreting (OPI) see telephone interpreting
Oviatt, S. 17
Özkan, C.E. 97
Ozolins, U. 252, 308
Pagano, A. 196, 197, 199
Paneth, E. 13
Panna, S. 83
Patil, S. 256
peer-to-peer connection 168, 175n11, 159; see also InTrain
pen-and-paper technology 153, 234, 348, 388–389
pen computing concept 146
peri-process knowledge work 399–401, 402
Pöchhacker, F. 53, 95, 112, 147, 152, 181, 186, 201, 230, 231, 233, 253, 254, 269, 313, 339
Pogue, M. 286
Pöllabauer, S. 282, 294, 308
Porlán-Moreno, R. 83
portable interpreting equipment (PIEs) 79–80, 89, 411; advantages 83–84; application 83; challenges 82–83; constraints 84–85; defined 380; development of 80–82; ethical considerations 85–88; historical background 80–82; in interpreter training 86–88; skill for working with tour guide sets 87–88; sound quality 81–82, 85–86, 88–89; standards 85–86; time and space to introduce alternative equipment 87
post-process knowledge work 399–401, 402
PRAGMACOR project 19
Prandi, B. 131–133, 138–139, 150, 158, 172, 240
precision, defined 136
pre-process knowledge work 399–401, 402
Price, E.L. 40
primary knowledge work 394, 394
public service interpreting (PSI) 249–250, 252–254, 257, 283, 289–291, 294–295
Pulse model 151
Qin, X. 101
quality 305–306, 322; assessment and assurance 306–308; automated metrics/methods 317–320; bottom-up methods 311–316, 321; challenges 309–310; future of 320–321; in healthcare settings 39; in hybrid modalities 95, 195; in legal interpreting 270–271, 273; in machine interpreting 217–218; in remote simultaneous interpreting 59–60; standards 377; in telephone interpreting 17; top-down methods for 310–312, 320–321; in video-mediated interpreting 34–36, 41–42; see also sound quality
QuaQua 415


Ramirez-Polo, L. 341
rapport building 272, 274
ReBooth 159, 164–166, 174; class set-up mode 167; development 166–170; features 168; ReBooth 2.0 170; student’s interface 169; trainer’s interface 167; UEQ results 171; usability 170–171
refugee settings 37, 340; 2015 crisis 294; automated services in 296; COVID-19 pandemic 285–287, 291, 296; crowdsourcing platforms in 295–296; distance interpreting in 283–290; guidelines 291–293; sound quality 289, 292; telephone interpreting in 283, 285, 286–289, 296; training 293–295; videoconference interpreting in 284–287, 291; video-mediated interpreting in 283, 285, 287–297
Remael, A. 186, 201
remote interpreting 267, 330; see also distance interpreting
remote simultaneous delivery platforms (RSDPs) 415
remote simultaneous interpreting (RSI) 11, 15, 20, 22, 30–34, 39, 46, 84, 123, 126, 164, 170, 193, 239, 309, 382; accessibility of 413–414; artificial intelligence and 57–58; automatic speech recognition and 136; booth-based 313; boothmate presence in 61–62; -CAI tools 57, 137–138, 140; challenges for interpreters in 414–415; cognitive load and 60–61; in conference settings 239; COVID-19 pandemic 52, 54–56, 58; critical issues 59–62; defined 51; design and development of 56–58; ergonomics in 412–415; interpreting performance 60–61; key terms and concepts of 52–53; legal interpreting and 268, 271, 273; multimodality in 62; in online interpreter training 164, 173; on-site interpreting and 35–36, 41–42, 61–62, 311–313; physical hubs 36; platformisation of 92; plug-and-play feature of 333; practitioners’ feedback about 58–59; shortcomings of 36; sound and video in 413; sound quality in 52–53, 58–60, 62; -specific platforms 53; stress in 61; tasks to be fulfilled in 392; in teaching 417; teamwork dynamics 61–62; use and application of 53–56
Ren, W. 337
respeaking 94, 132, 204, 235; -assisted consecutive interpreting 102–103; CACI involving 133–134; defined 189; interlingual (see interlingual respeaking); intralingual (see intralingual respeaking)
Restuccia, M. 100
Riccardi, A. 150, 151
Risku, H. 355
Rivas Velarde, M. 253
Rodríguez González, E. 315
Rodriguez Melchor, M.D. 158
Rodriguez, S. 139
Romano, E. 148, 151
Rombouts, D. 274
Romero-Fresco, P. 195, 196, 197, 198, 199, 314, 316
Rosenberg, B.A. 44
Rozan, J.F. 232
Roziner, I. 20, 42, 60, 61, 311, 350
Russo, M. 158, 173
Rybińska, Z. 273, 274, 275

Saeed, M.A. 53, 54, 57, 136–137
Saint-Louis, L. 13
Sánchez-Gijón, P. 114
Sánchez Rodas, F. 251
Sandrelli, A. 150, 157, 197
Schouten, B.C. 248
screen multiplication 393–394
SeamlessM4T 215
secondary information 390–391, 395, 399, 401
secondary knowledge work 394, 394
Seeber, K. 55, 61, 355, 400, 413
Seleskovitch, D. 232
semi-automated workflows 88, 189, 196–197, 310, 315–316, 321; intralingual respeaking and machine translation 192; semi-automated metrics 218; simultaneous interpreting and automatic speech recognition 192–193
semiotics of interpreting 397–398
sensatim 189–190
separation model 382–383
Seresi, M. 414
Setton, R. 86
Sharoff, S. 129
SHIFT project 19, 276, 294, 297n3
Shlesinger, M. 20, 42, 60, 61, 306, 307, 311, 350
Short, J. 42–43
Sienkiewicz, B. 233
Sight-Terp 98–100, 126, 132
sign language 13, 40, 72, 76, 110, 181, 370, 375; distance interpreting 251, 253; educators and trainers 150; remote interpreting 267; SIDPs and 415; standard 373, 375, 378–379, 382; structuring 69; telephones and 13; by videophone 67; see also video relay service
SimConsec 94–95, 112, 184, 233, 389–390
Simon, S. 183


simultaneous-consecutive interpreting 94–97, 102, 112–113, 233–234
simultaneous interpreting (SI) 1, 34, 44, 87, 152, 160, 229, 272, 348, 350, 389–391, 403, 409–411; artificial intelligence and 188; ASR assisted 98, 192–193, 196, 202, 239–240; boothless 86; booths, microphones, and headsets in 237–238; CAI tools and 92; cognition and 350–351; distance interpreting in 239; electronic glossaries in 238; equipment for 377–378; IBM system enabled 156; at international conferences 236–237; interpreters’ working environment for 375–377; and intralingual respeaking 191–192, 196, 202–203; machine translation in 239–240; mobile booths for 376; on-site 87, 311; permanent booths for 375; phases and subtasks of 399–400; portable interpreting equipment in 83–84; real SI 86; simultaneous interpreting 2.0 190; sound and image input, quality and transmission of 377; speech recognition in 239–240; tablet in 109–112; teaching of 417; see also computer-assisted simultaneous interpreting; remote simultaneous interpreting; simultaneous interpreting delivery platforms
simultaneous interpreting delivery platforms (SIDPs) 52, 414, 416, 417; accessibility 414; blind interpreters and 415; ergonomic 415; ISO standards 371, 374, 378–379, 380, 382, 385; sign language interpreting and 415
Singureanu, D. 40, 272
Skaaden, H. 294
Skinner, R. 249, 251, 275
SLI e-learning programme 165
SmarTerp&Me 126
SmarTerp 136–137, 152, 158–159, 171–174, 351, 354, 393, 415
smartpens 94, 112, 145, 149, 153; benefits 97; features 96, 147, 151–152; in interpreter education 150–151; Livescribe 96, 146–149, 151, 234, 241n6; Moleskine 152; Neo models 151; note-taking 148; at present 151; simultaneous-consecutive with 97; SyncPen 152
SMART project 197–204, 206n7, 310, 316, 322n2
SMART-UP project 203–204, 206n11
Smith, L. 157
social presence 53, 54
social responsibility 329, 341–342
soft skills 156, 164, 174; development needs 172; teaching 171
software-adapted delivery (SAD) 191–192, 204
sound quality 163, 239, 378, 392, 409, 414; healthcare interpreting 253; immigration, asylum, and refugee settings 289, 292; legal settings 270–271; portable interpreting equipment 81–82, 85–86, 88–89; remote simultaneous interpreting 52–53, 58–60, 62; telephone interpreting 14; video-mediated interpreting 34, 40, 289
speaker involvement 338
Speechmatics 393
speech recognition (SR) 189, 191–193, 196, 201, 203; CAI tools 235; in consecutive conference interpreting 235–236; in simultaneous conference interpreting 239–240; see also automatic speech recognition
speech synthesis 110, 183, 211, 414
speech-to-speech translation 210–212, 319, 331
speech-to-text (STT) 109, 148, 152, 183, 204, 269, 309–310; broadcasting 349; intralingual 194, 316; live interlingual 187, 204; live STT 184–187, 189, 193, 200, 205; Microsoft Azure 101; real-time 182, 185, 191, 194; real-time interlingual 182, 184, 186–187, 188, 188, 194–195, 197; transcripts 89; translation 211, 214, 214, 218, 221
speech-to-text interpreting (STTI) 186, 189, 416
speech translation (ST) 91, 98–100, 102, 209–210, 221; automatic 172, 210, 218; challenges 215–217; free services 336; live 222; multilingual 211; real-time 239; simultaneous 213; speech-to-speech 210–212, 319, 331; see also machine interpreting
Spinolo, N. 11–13, 15–16, 21, 58, 59, 159, 166, 351
Stahl, J. 83
Standardisation see International Organization for Standardization
Stengers, H. 351
Stewart, C. 319
Stoll, C. 125
stress 95, 132, 274, 313, 350–351, 385; cognitive 124; management 275, 372; remote simultaneous interpreting 61; video relay service 73–74; working-memory 124
STUN protocol 160, 174n10
Sultanic, I. 15, 16, 17
Svejcer, A.S. 230
Svennevig, J. 254
Svoboda, S. 95, 97
SyncPen 152
Szarkowska, A. 201


tablet interpreting 84, 94, 99, 145, 147, 152–153, 230, 269, 352, 410; artificial intelligence and 109, 116–117; CAI tools and 115; and cognition 116–117; consecutive interpreting 109–111, 113, 115–116, 125, 234–235; defined 108; digital pens and 111, 114–115, 235, 348, 401; future directions 114–117; in hybrid interpreting modalities 112–113; in professional practice 109–110; research 110–113; simultaneous 109–112; and skills acquisition 115–116; training 113–114
Tasa-Fuster, V. 339
Taylor, J. 273, 274
Taylor, J.L. 248, 250
teamwork 58, 61, 332, 373, 391, 403–404
technologies 348–349; cognition on 349–353; in data, information, knowledge, and management 396–397; defined 388; -enabled consecutive interpreting (see specific entries); -enabled interpreting 350–351; in phases and subtasks 400–402; in semiotics 398–399; -supported interpreting 351–352; training and education 352–353; see also specific technologies
technology-mediated interpreting see distance interpreting
telecommunications device for the deaf (TDD) 13, 67
teleconference interpreting 11, 51, 330
telehealth 45, 248, 251, 258n1, 286, 339
telemedicine 248, 258n1
telephone interpreting (TI) 39, 41–42, 305, 312, 353, 382; advantages 13, 15; automation 22–23; CAI tools 21–22, 24; cognition and 20–22; coordination of discourse in 14, 15–17; defined 11, 12; disadvantages 14–15; ergonomics and 20–22; future avenues 18–24; in healthcare settings 34, 249–252, 254–257; in immigration, asylum, and refugee settings 283, 285, 286–289, 296; lack of visual context in 14–16; in legal settings 265–266, 268, 270, 273; multimodal input in 14–15; on-site interpreting and 13, 15–17, 23, 41–42, 252; quality and satisfaction 17; sound quality 14; technological workflows 18–19; training 19–20; VMI and 251–252
Telephone Interpreting Project (TIP) 37
teletypewriter (TTY) 13, 67
text telephones 13
text-to-speech (TTS) 148, 183, 213, 396
Thonon, F. 255
Tipton, R. 336, 342
Tiselius, E. 100
top-down quality assessment methods 310–312, 320–321
Translators without Borders 296
Translatotron 215
Trilling, B. 172
Tryuk, M. 328
TURN protocol 160, 175n11
turn-taking 14, 16, 43–44, 254, 273–274, 284, 292–294
Tymczyńska, M. 158
Tymoczko, M. 338

UN Convention on the Rights of Persons with Disabilities 181, 412
universal design (UD) 409
Unlu, C. 98, 99, 100, 132, 319

Van Cauwenberghe, G. 135
Van de Meer, J. 335
van Heuven, V.J. 319
Van Straaten, W. 255
Varde, S. 148–149, 151
Vargas-Sierra, C. 341
verbatim approach 98, 189–190, 204, 316
Verbmobil 211
Verdier, M. 273, 274
Verrept, H. 252
videoconference interpreting (VCI) 30, 44–45, 51, 236, 250, 370, 391–392, 398, 403; business and 402; COVID-19 pandemic 392, 402; generic 309; in immigration, asylum, and refugee settings 284–287, 291; legal 37–39, 266–267, 268, 270–276; video remote interpreting vs. 32–35, 37; see also Zoom
video-mediated dialogue interpreting 15, 30, 32, 34, 43
video-mediated interpreting (VMI) 11, 12–13, 15–16, 46, 258n3, 305, 312, 314, 340; adaptation strategies 44–45; AVIDICUS projects 30, 38, 41, 276, 277n3, 295, 297n4, 313, 340; challenges of 42–44; characteristics of 31–33; cognitive load in 42, 46, 290; in conference interpreting settings 35–37; COVID-19 pandemic 31–32, 34, 36, 38, 41, 45, 250–251; critical aspects of 41–42; defined 30, 31; in healthcare settings 38–41, 43, 45–46, 249–255, 257; historical evolution of 33–34; human factors in 41–42; in immigration, asylum, and refugee settings 283, 285, 287–297; interaction in 42–44; interpreter comprehension in 41–42; interpreting quality in 34–36, 41–42; in legal settings 37–38, 40–42, 265, 274; on-site interpreting and 15, 35, 38, 40–43, 252, 289–290; perceptions, preferences, and attitudes towards 39–41; RSI as (see remote simultaneous interpreting); sound quality 34, 40, 289; spatial arrangement in 42–44; strategies to manage challenges of 44–45; telephone interpreting and 251–252; visual ecology in 42–44
video relay service (VRS) 31, 32, 40, 67, 68, 76, 250; billable time for 74–75; characteristics of 68–69; COVID-19 pandemic 75–76; guidelines and regulations for 73–75; interpreter-mediated phone calls in 71–73; in legal settings 267–268, 275; media and institutional interaction in 69–71; modalities 69; providing emergency calls 75–76; stress 73–74; telephone conversation and 70; videophone conversation between deaf signing parties 70–71
video remote interpreting (VRI) 31, 40–43, 239, 267, 288, 294, 339; in healthcare 34, 38–39, 250–254, 257; legal interpreting and 268; on-site interpreting and 39, 41–43, 252; in police interviews 41, 44; videoconference interpreting vs. 32–35, 37
Vieira, L.N. 256, 258
Viezzi, M. 231
ViKiS project, Germany 30, 44
VIP project 415
virtual interpreting 51, 330–331
visual distractors 15
visually impaired person (VIP) 407; see also blind interpreters
visual quality 270
Vranjes, J. 43

Wadensjö, C. 16, 71, 273
Wallace, P. 253
Wan, H. 138
Wang, B. 313
Wang, C. 235
Wang, H. 138
Wang, J. 17, 271
Wang, X. 235
Wang, Y. 114
Web Content Accessibility Guidelines (WCAG) 416
Will, M. 124
Wilson, H.J. 404
Winston, E.A. 352
work intensification 388
work-life balance 24, 54, 58, 332, 413

Xin, X.Y. 101
Xu, H. 268
Xu, R. 129

Yabe, M. 40, 252
Yenkimaleki, M. 319
Yin, M. 337
Yuan, L. 313
Yuan, X. 138

Zhang, P. 101
Zhang, W. 40–41
Zhang, X. 19, 63, 218
Zhu, X. 354
Ziegler, K. 21, 82
Zoom 36, 53, 60, 173, 251, 415, 417, 418n1
