Towards Responsible Machine Translation - Ethical and Legal Considerations in Machine Translation

This document provides an introduction to the fourth volume of the book series "Machine Translation: Technologies and Applications". The volume is titled "Towards Responsible Machine Translation: Ethical and Legal Considerations in Machine Translation" and is edited by Helena Moniz and Carla Parra Escartín. It contains 11 chapters divided into three parts covering the ethical, legal and societal impacts of machine translation. The foreword commends the editors for addressing the important and timely topic of responsible artificial intelligence and machine translation. The acknowledgements section thanks the contributors, editors, and support networks involved in producing the volume.

Machine Translation: Technologies and Applications

Series Editor: Andy Way

Helena Moniz
Carla Parra Escartín Editors

Towards
Responsible
Machine
Translation
Ethical and Legal Considerations in
Machine Translation
Machine Translation: Technologies and
Applications

Volume 4

Editor-in-Chief
Andy Way, ADAPT Centre, Dublin City University, Dublin, Ireland

Editorial Board Member


Sivaji Bandyopadhyay, National Institute Of Technology Silchar, Silchar, Assam,
India
This book series tackles prominent issues in MT at a depth which will allow these
books to reflect the current state-of-the-art, while simultaneously standing the test of
time. Each book topic will be introduced so it can be understood by a wide audience,
yet will also include the cutting-edge research and development being conducted
today.
Machine Translation (MT) is being deployed for a range of use-cases by millions
of people on a daily basis. Google Translate and Facebook provide billions of
translations daily across many languages. Almost 1 billion users see these translations
each month. With MT being embedded in platforms like this which are available to
everyone with an internet connection, one no longer has to explain what MT is on a
general level. However, the number of people who really understand its inner
workings is much smaller.
The series includes investigations of different MT paradigms (Syntax-based
Statistical MT, Neural MT, Rule-Based MT), Quality issues (MT evaluation, Quality
Estimation), modalities (Spoken Language MT) and MT in Practice (Post-Editing
and Controlled Language in MT), topics which cut right across the spectrum of MT
as it is used today in real translation workflows.
For inquiries and submission of proposals please contact the Series Editor,
Andy Way.
Helena Moniz • Carla Parra Escartín
Editors

Towards Responsible
Machine Translation
Ethical and Legal Considerations in Machine
Translation
Editors
Helena Moniz
School of Arts and Humanities
University of Lisbon
Lisbon, Portugal

Carla Parra Escartín
Research & Development
RWS Language Weaver
Dublin, Ireland

ISSN 2522-8021 ISSN 2522-803X (electronic)


Machine Translation: Technologies and Applications
ISBN 978-3-031-14688-6 ISBN 978-3-031-14689-3 (eBook)
https://doi.org/10.1007/978-3-031-14689-3

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by
similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Foreword

As with the previous volume edited by Michael Carl, I am delighted to write a few
words to be included at the beginning of the fourth book in our series on Machine
Translation: Technologies and Applications. So far, we have seen how solutions to
problems for MT might be found by looking at how the brain works, how MT
quality can be evaluated, and how MT is used in practice. Now that MT is in
widespread use, it is time to look—as is happening in AI more generally—at how
it can be used in a responsible manner.
Volume 4 in the series is entitled Towards Responsible Machine Translation:
Ethical and Legal Considerations in Machine Translation, a collection of 11 chap-
ters edited by two of the nicest people in the field of MT and Translation, Helena
Moniz and Carla Parra Escartín. After an introduction by the editors, the main body
of the book consists of three separate but related parts: (1) Responsible Machine
Translation: Ethical, Philosophical and Legal Aspects; (2) Responsible Machine
Translation from the End-User Perspective; and (3) Responsible Machine Transla-
tion: Societal Impact. Altogether, there are 23 contributors, all of whom are experts
in their respective areas, so there will be plenty of interest in this volume for sure,
whether the reader is interested in looking at the topic from an ethical point of view,
licensing issues, and ownership rights; or from a user perspective, topics surrounding
post-editing of MT output, the impact of MT on the reader, and other issues of
automation; or issues of bias, the impact of large AI models on the planet we live on,
and the very topical subject of privacy.
Each reader of this volume will be more or less at home in different sections of the
book, but none, I contend, will argue against the inclusion of all these topics in a
single coherent volume. Perhaps you have never thought about these topics before,
and will be inspired to follow up the work presented here; at the very least, when you
are writing about issues on responsible AI (and especially MT) in the future—a topic
that can no longer be ignored—you will know where to come for a perspective on all
these issues as well as a thorough set of references.
As such, this volume fills an important gap in the field, providing an encapsulation
of the state of the art, and pointers to future work in the area, especially for
newcomers to the topic who are well versed in ethics and legal issues from other
application areas, but also to longstanding researchers in the discipline, like me, who
have perhaps buried their head in the sand for too long. As such, I am delighted that
Helena and Carla agreed to take up this challenge in the first place, and the book they
have delivered is, I think, essential reading for us all.

Andy Way
ADAPT Centre, Dublin City University,
Dublin, Ireland
Acknowledgements

Many stakeholders have been involved in the process of preparing this book for
publication, and we are grateful to all of them for their patience with us during the
whole process and for their constant support.
First of all we would like to thank Sharon O’Brien for inspiring us to work on
research related to Ethics and MT during the INTERACT project and Andy Way
for encouraging us to go beyond writing a book chapter and embarking on this book
project. Without his encouragement and support in the very early stages of this
project, this book would not have been possible.
We are very grateful to all the authors who contributed to this volume. Editing a
book in the middle of a pandemic has brought about additional challenges and your
efforts and contributions to the book have been paramount in making this book a
reality. We also sympathise deeply with those authors that had to decline our
invitation or had to withdraw their contribution due to the harsh conditions of the
pandemic.
We are also very grateful to our editors at Springer, Sowmya Thodur and Shina
Harshavardhan, for all the support along this journey, the numerous emails helping
us navigate through the process, and their patience.
Last but certainly not least, we would like to thank our families and close friends
and colleagues for their continuous support during this project. You all understood
the importance of this topic and have given us the much-needed energy to complete
it. Without your support it would have been possible, but certainly much harder.
Helena Moniz and Carla Parra Escartín

Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Helena Moniz and Carla Parra Escartín

Part I Responsible Machine Translation: Ethical, Philosophical and Legal Aspects
2 Prolegomenon to Contemporary Ethics of Machine Translation . . . 11
Wessel Reijers and Quinn Dupont
3 The Ethics of Machine Translation . . . . . . . . . . . . . . . . . . . . . . . . . 29
Alexandros Nousias
4 Licensing and Usage Rights of Language Data in Machine
Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Mikel L. Forcada
5 Authorship and Rights Ownership in the Machine Translation
Era . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Miguel L. Lacruz Mantecón

Part II Responsible Machine Translation from the End-User Perspective
6 The Ethics of Machine Translation Post-editing in the Translation
Ecosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Celia Rico and María del Mar Sánchez Ramos
7 Ethics and Machine Translation: The End User Perspective . . . . . . 113
Ana Guerberof-Arenas and Joss Moorkens


8 Ethics, Automated Processes, Machine Translation, and Crises . . . 135
Federico M. Federici, Christophe Declercq, Jorge Díaz Cintas,
and Rocío Baños Piñero

Part III Responsible Machine Translation: Societal Impact
9 Gender and Age Bias in Commercial Machine Translation . . . . . . . 159
Federico Bianchi, Tommaso Fornaciari, Dirk Hovy,
and Debora Nozza
10 The Ecological Footprint of Neural Machine Translation
Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Dimitar Shterionov and Eva Vanmassenhove
11 Treating Speech as Personally Identifiable Information
and Its Impact in Machine Translation . . . . . . . . . . . . . . . . . . . . . . 215
Isabel Trancoso, Francisco Teixeira, Catarina Botelho,
and Alberto Abad
Editors and Contributors

About the Editors

Helena Moniz is the President of the European Association for Machine Transla-
tion (EAMT) and Vice President of the International Association for Machine
Translation (IAMT). She is also the Vice-coordinator of the Human Language
Technologies Lab at INESC-ID. Helena is an Assistant Professor at the School of
Arts and Humanities at the University of Lisbon, where she teaches Computational
Linguistics, Computer-Assisted Translation, and Machine Translation Systems and
Post-editing. She graduated in Modern Languages and Literature at the School of Arts
and Humanities, University of Lisbon (FLUL), in 1998. She received a Master’s
degree in Linguistics at FLUL, in 2007, and a PhD in Linguistics at FLUL in
cooperation with the Technical University of Lisbon (IST), in 2013. She has been
working at INESC-ID/CLUL since 2000, in several national and international pro-
jects involving multidisciplinary teams of linguists and speech processing engineers.
Within these fruitful collaborations, she participated in 19 national and international
projects. Since 2015, she has also been the PI of a bilateral project with INESC-ID/Unbabel,
a translation company combining AI + post-editing, working on scalable Linguistic
Quality Assurance processes for crowdsourcing. She was responsible for the creation
of the Linguistic Quality Assurance processes developed at Unbabel for Linguistic
Annotation and Editors’ Evaluation. She is now working mostly on research
projects involving Linguistics, Translation, and AI.
School of Arts and Humanities, University of Lisbon, Lisbon, Portugal

Carla Parra Escartín is Research Manager within the R&D department of RWS
Language Weaver. She holds an MA in English Philology from the University of
Zaragoza (Spain), an MA in Translation and Interpreting and an MA in Applied
Linguistics, both from the Pompeu Fabra University (Barcelona, Spain), and a PhD
in Computational Linguistics from the University of Bergen (Norway). She has over
10 years of research experience in linguistic infrastructures, human factors in
machine translation, and multiword expressions (MWEs). Throughout her career
she has worked in various EU-funded projects and actions (LIRICS, CLARIN,
FLaReNet, CLARA, PARSEME, DASISH, EXPERT, EDGE, INTERACT), as well
as nationally funded projects in Spain (TACOC, CLARIN-CAT) and Norway
(CLARINO). During her research career she has produced over 60 scientific publi-
cations including book chapters, journal articles, conference papers, and deliver-
ables. She has been awarded three Marie Skłodowska-Curie fellowships (one
pre-doctoral and two post-doctoral) and has served as a reviewer for the most
prestigious venues in Machine Translation and Computational Linguistics, including
ACL, EMNLP, WMT, EAMT, COLING, and MT Summit. Between 2018 and 2020
she was a member of the Standing Committee of the SIGLEX-MWE, a special
interest group focusing on research in MWEs. She is also a member of the Editorial
Board of the Phraseology and Multiword Expressions Series (LangSci Press).
Research & Development, RWS Language Weaver, Dublin, Ireland

Contributors

Alberto Abad University of Lisbon, INESC-ID, Lisbon, Portugal
Federico Bianchi Bocconi University, Milan, Italy
Catarina Botelho University of Lisbon, INESC-ID, Lisbon, Portugal
Jorge Díaz Cintas University College London (UCL), London, UK
Christophe Declercq Utrecht University, Utrecht, Netherlands
University College London (UCL), London, UK
Quinn Dupont School of Business, University College Dublin (UCD), Dublin,
Ireland
Federico M. Federici University College London (UCL), London, UK
Mikel L. Forcada Dept. de Llenguatges i Sistemes Informàtics, Universitat
d’Alacant, Alicante, Spain
Prompsit Language Engineering, Alicante, Spain
Tommaso Fornaciari Bocconi University, Milan, Italy
Ana Guerberof-Arenas Guildford, UK
University of Groningen, Groningen, Netherlands
Dirk Hovy Bocconi University, Milan, Italy
Miguel L. Lacruz Mantecón School of Law, University of Zaragoza, Zaragoza,
Spain
María del Mar Sánchez Ramos University of Alcalá, Madrid, Spain
Joss Moorkens Dublin City University (DCU), Dublin, Ireland

Alexandros Nousias National Centre for Scientific Research “Demokritos”,
Athens, Greece
Debora Nozza Bocconi University, Milan, Italy
Rocío Baños Piñero University College London (UCL), London, UK
Wessel Reijers Robert Schuman Centre, European University Institute (EUI),
San Domenico di Fiesole, Italy
Celia Rico Universidad Complutense de Madrid, Madrid, Spain
Dimitar Shterionov Department of Cognitive Science and Artificial Intelligence,
Tilburg University, Tilburg, The Netherlands
Francisco Teixeira University of Lisbon, INESC-ID, Lisbon, Portugal
Isabel Trancoso University of Lisbon, INESC-ID, Lisbon, Portugal
Eva Vanmassenhove Department of Cognitive Science and Artificial Intelligence,
Tilburg University, Tilburg, The Netherlands
Abbreviations

AAE African American English
AD Alzheimer’s Disease
ADD Addition Operation
AI Artificial Intelligence
ASR Automatic Speech Recognition
ASV Automatic Speaker Verification
AVT Audiovisual Translation
BERT Bidirectional Encoder Representations from Transformers
BLEU Bilingual Evaluation Understudy
BPE Byte-Pair Encoding
CAT Computer-Assisted Translation
CBMT Corpus-Based Machine Translation
CC-BY Creative Commons Attribution, one of the open licences promoted
by the Creative Commons initiative
CDPA British Copyright, Designs and Patents Act 1988
CISAC International Confederation of Societies of Authors and Composers
CO2 Carbon Dioxide
CPU Central Processing Unit
DL Deep Learning
DNN Deep Neural Network
EER Equal Error Rate
EIOS Epidemic Intelligence from Open Sources
ELMo Embeddings from Language Models
ELP Endangered Language Project
EMEA European Medicines Evaluation Agency, since 2004 “European
Medicines Agency”
EMNLP Empirical Methods in Natural Language Processing
EN Language code for the English language
ES Language code for the Spanish language
EU European Union
FAHQMT Fully Automatic High Quality Machine Translation
FB Frame Buffer
FP32 32-bit floating point number
FPO Floating Point Operations
FR Language code for the French language
GB Gigabytes
GC Garbled Circuit
GDACS Global Disaster Alerting Coordination System
GDPR General Data Protection Regulation
GloVe Global Vector
GMM Gaussian Mixture Model
GNMT Google Neural Machine Translation
GNU GNU’s Not Unix! [recursive abbreviation], a free software initiative
and software collection
GPHIN Global Public Health Intelligence Network
GPT Generative Pre-trained Transformer
GPU Graphics Processing Unit
HE Homomorphic Encryption
HLT Human Language Technology
HMM Hidden Markov Model
HT Human Translation
IBM International Business Machines
IE Ireland
ILSP Institute for Language and Speech Processing (Athens, Greece)
INT16 Integer number stored with 16 bit
INT8 Integer number stored with 8 bit
IPR Intellectual Property Rights
ISCA International Speech Communication Association
KL Kullback-Leibler
LoAs Levels of Abstraction
LR Language Resource
LS-BERT Language-Specific BERT
LSP Language Service Provider
LSTM Long Short-Term Memory
mBERT Multilingual BERT
MCO Mars Climate Orbiter
MD5 Message Digest version 5, a function assigning a 128-bit number to
a piece of data
ML Machine Learning
MLaaS Machine Learning as a Service
MOS Mean Opinion Score
MS Word Microsoft Word
MT Machine Translation
MTPE Machine Translation Post-Editing
MUL Multiplication operation
NGO Non-governmental Organisation
NL The Netherlands
NLP Natural Language Processing
NMT Neural Machine Translation
NN Neural Network
OCHA Office for the Coordination of Humanitarian Affairs
OSA Obstructive Sleep Apnoea
PD Parkinson’s Disease
PDF Portable Document Format
PII Personally Identifiable Information
Prêt-à-LLOD Prêt-à-Linguistic Linked Open Data
ProMED Program for Monitoring Emerging Diseases
PUE Power Usage Effectiveness
QA Quality Assurance
QE Quality Estimation
RAPL Running Average Power Limit
RBMT Rule-Based Machine Translation
RNN Recurrent Neural Network
RTA Retrospective Think Aloud
S2SMT Speech-to-Speech Machine Translation
SBE Secure Binary Embeddings
SM Streaming Multiprocessor
SMC Secure Multiparty Computation
SMH Secure Modular Hashing
SMT Statistical Machine Translation
SPSC Security and Privacy in Speech Communication
T&I Translation and Interpreting
TER Translation Error Rate
TF-IDF Term Frequency–Inverse Document Frequency
TM Translation Memory
TMX Translation Memory Exchange (a standard format for translation
memories)
TRANS Transformer
TTS Text-to-Speech
UK United Kingdom
UN United Nations
VAST Values Across Space and Time
VC Voice Conversion
WER Word Error Rate
WHO World Health Organization
Chapter 1
Introduction

Helena Moniz and Carla Parra Escartín

1.1 Responsible Artificial Intelligence

With the prevalence of Artificial Intelligence (AI) technologies in our everyday
lives, new lines of research have emerged in which researchers question these
technologies from an ethical and ecological point of view. Along with this trend,
citizens have also started to question how these systems can be trusted if there is little
or no control over the decisions they make. Should one trust an AI algorithm to be
the main driver of investments in the stock market, for instance? How much should
citizens know about these algorithms, and who is responsible for potential damages
caused? What is the ecological impact of training these systems in an era
where we are all concerned about climate change? Are the systems able to detect and
categorise speakers’ attitudes, beliefs, or even biases beyond the semantic meaning
and alert the user on such patterns? All these questions are addressed in this
emerging field of research, and along with it, different terms have been coined to
refer to it: Ethical AI, Fair AI, Responsible AI, and even Green AI.
It is well known that the so-called neural systems are not able to extract cause-
effect relations as human beings do (e.g. Pearl and Mackenzie 2018) and rather work as a
“black box”. In the words of Yoshua Bengio, one of the pioneers of such systems,
“It’s a big thing to integrate [causality] into AI”.1 Several metaphors have been used
to express such concerns (or even fears) that technology is no longer controlled by
the human being, but rather performs in such “magical” ways that it can lead to

1 https://www.wired.com/story/ai-pioneer-algorithms-understand-why/.

H. Moniz (✉) · C. P. Escartín
School of Arts and Humanities, University of Lisbon, Lisbon, Portugal
Research & Development, RWS Language Weaver, Dublin, Ireland
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Moniz, C. Parra Escartín (eds.), Towards Responsible Machine Translation,
Machine Translation: Technologies and Applications 4,
https://doi.org/10.1007/978-3-031-14689-3_1

unforeseen scenarios.2 Responsible AI is thus the ethical use of AI technologies
based on neural systems, explicitly tackling several principles, such as the ones
highlighted in the IEEE Ethically Aligned Design (IEEE 2016): Human Benefit,
Responsibility, Transparency, and Education and Awareness.
In the field of Natural Language Processing (NLP), questions around ethics have
also rightly started to emerge, with one of the seminal papers being the one by Hovy
and Spruit (2016), who questioned the social impact of NLP. Two workshops on
ethics and NLP have been organised3 (Hovy et al. 2017; Alfano et al. 2018), and a
new track in the main conferences in NLP has been established for Ethics and NLP.4
Since 2021, reviewers to major NLP conferences are also asked to assess ethical
aspects of the papers they are reviewing,5 and ethics is now also embedded in the
NLP curriculum in some universities (Bender et al. 2020).
In parallel, the Speech Processing field is also promoting broad awareness on AI
technologies based on the fact that speech, as a primary means of communication, is
an idiosyncratic biomarker through which we express ourselves, our attitudes, and
our emotions. As such, it also raises several ethical concerns and these have been
covered in various papers, from the seminal work of Cowie (2015) to a survey on
Computational Paralinguistics by Batliner et al. (2022). Speech as a biomarker and
the need to treat it as Personally Identifiable Information (PII) is also the topic of
several conferences on security and privacy (e.g. ISCA Symposium on Security and
Privacy on Speech Communication 2021).6
Along with papers related to new advances in our field, now it is also common to
find papers analysing bias in NLP applications (Larson 2017), questioning how to
ensure reproducibility (Belz et al. 2021) and explainability (Wiegreffe and Pinter
2019), or even discussing the ethical issues around competitiveness in research
(Parra Escartín and Liu 2017; Ethayarajh and Jurafsky 2020). We not only
agree with this new trend towards questioning the ethical dimension of the research
being done, but also fully support it and believe it has to be further explored in the
specific field of Machine Translation (MT). Some steps in this direction have already
been made, and publications exploring issues such as gender bias in MT
(Vanmassenhove et al. 2018), questioning the reproducibility and comparability of
MT systems (Marie et al. 2021), and fair MT (Kenny et al. 2020) have emerged in
recent years. However, we are still in the early days of this trend, and a joint effort of
all the communities at stake is needed to make Responsible MT a reality.

2 See Kenny (2019) for an overview on the topic.
3 https://ethicsinnlp.org/ethnlp-2017 and https://ethicsinnlp.org/.
4 See: https://acl2020.org/committees/program, https://2021.aclweb.org/organization/program/.
5 https://2021.aclweb.org/ethics/Ethics-review-questions/.
6 https://spsc-symposium2021.de/.

1.2 Towards Responsible Machine Translation

We understand Responsible MT as a combination of all factors that need to be
considered when developing and deploying MT systems to ensure that such systems
are ethically designed, including, but not limited to data bias, data licences and
rights, ecological footprint, and intended end-users. In what follows, we offer a brief
overview of some of these key factors.
If we first focus on the data used to train MT systems, different factors play
important roles in the process. The data will inevitably contain an intrinsic bias. Such
embedded bias is of crucial importance in NLP in general, and in MT in particular
(Stanovsky et al. 2019; Wang and Sennrich 2020). Translation enables humans to
communicate in languages that they do not necessarily speak, and by extension, this
is also what MT allows: human communication. The training data used to develop
MT systems is therefore a key element in how a message is conveyed. When
developing responsible MT systems, the implicit and explicit biases (e.g. age,
gender, social status, ethnicity, regional dialect, ideology, personality traits, etc.)
present in the training datasets need to be addressed to ensure that the output of the
MT systems is not biased and hence cannot potentially harm a particular societal
group (e.g. an MT system that produces translations that are sexist, racist, etc.).
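One way to make such biases visible, in the spirit of Stanovsky et al. (2019), is to translate sentences whose English subjects are gender-neutral into a language with grammatical gender and inspect which gender the system picks. The sketch below is purely illustrative: the `translate` stub returns invented, stereotyped Spanish outputs standing in for a real English-to-Spanish MT system, and the occupation list and helper names are our own assumptions.

```python
# Illustrative sketch of a gender-bias probe for an MT system.
# STUB_MT simulates a biased English-to-Spanish engine; in a real probe,
# translate() would call an actual MT system instead.

STUB_MT = {
    "The doctor finished the report.": "El médico terminó el informe.",
    "The nurse finished the report.": "La enfermera terminó el informe.",
    "The engineer finished the report.": "El ingeniero terminó el informe.",
    "The cleaner finished the report.": "La limpiadora terminó el informe.",
}

def translate(sentence: str) -> str:
    """Stand-in for a real English-to-Spanish MT system (invented outputs)."""
    return STUB_MT[sentence]

def probed_gender(occupation: str) -> str:
    """Infer which grammatical gender the MT system assigned to the subject."""
    source = "The {} finished the report.".format(occupation)
    article = translate(source).split()[0].lower()  # "el" or "la"
    return {"el": "masculine", "la": "feminine"}[article]

for occ in ("doctor", "nurse", "engineer", "cleaner"):
    print(occ, "->", probed_gender(occ))
```

Even though every English source sentence is gender-neutral, the stubbed system defaults to stereotyped genders; aggregating such probes over large occupation lists is one way the bias of a deployed system can be quantified.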
Moreover, the requirements for good quality (responsible) MT go beyond this: a
good MT system should not produce output that is repetitive or omits parts of the
message. It should provide accurate translations and avoid what is usually called in
the MT community “hallucinations”: completely wrong translations that have
nothing to do with the original source text. When used in professional translation
workflows, MT should also become a support tool for the translator who post-edits
its output by providing gains in performance that justify the discounts applied by the
use of MT. Does MT increase or decrease the cognitive load of a professional
translator? How is this taken into account and applied in real-world settings? How
is quality being assessed? What is the best metric to assess the improvements done
on an engine updated over time? What does the result of a particular metric really
imply? Are evaluation metrics properly explained to the end-customers so that they
understand their advantages, as well as their caveats? These questions need to be
addressed so that MT is used in a fair way in real-world settings. Responsible AI
systems act for the benefit of human beings; transposed to MT, responsible MT
should assist in the act of communication.
If we now focus on Green AI (Strubell et al. 2019), we should also consider the
carbon emissions produced when developing MT systems, and when using them in
production. Green AI is becoming a concern in the NLP research community, but
there is still a lot to be done. Where is the limit that determines that we should stop
running experiments trying to attain a better quality? What is the environmental
impact of such experiments? Questions such as what is the best trade-off between
training and running MT engines of different levels of quality and their carbon
emissions need to be posed and researched.
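A back-of-the-envelope version of the accounting behind such questions, following Strubell et al. (2019), multiplies hardware power draw by training time, scales by the data centre's Power Usage Effectiveness (PUE), and converts energy to CO2 via a grid carbon-intensity factor. All the default numbers below are illustrative assumptions in the range those authors cite, not measurements of any particular system.

```python
# Rough estimate of the CO2 emitted by a training run, following the
# energy accounting of Strubell et al. (2019). Defaults are illustrative:
# PUE 1.58 (an industry-average figure) and 0.433 kg CO2 per kWh
# (an average US grid intensity); both vary widely in practice.

def training_co2_kg(gpu_power_watts: float, num_gpus: int, hours: float,
                    pue: float = 1.58, kg_co2_per_kwh: float = 0.433) -> float:
    """Estimated kilograms of CO2 emitted by a training run."""
    energy_kwh = (gpu_power_watts * num_gpus / 1000.0) * hours * pue
    return energy_kwh * kg_co2_per_kwh

# e.g. a hypothetical run on 8 GPUs at 250 W each for 120 hours:
print(round(training_co2_kg(250, 8, 120), 1))
```

Even this crude arithmetic makes trade-offs discussable: halving training time, or moving to a grid with lower carbon intensity, changes the estimate proportionally, which is exactly the kind of comparison the questions above call for.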

While no standard has yet been proposed, Responsible Machine Translation is
also addressing data privacy issues and curating the training data used for training
the systems. It is the anonymization of Personally Identifiable Information (PII), the
extraction of embedded biases in the datasets, or at least the analysis and detection of
such biases. It is coping with the environmental impact of the systems and showing
awareness of its implications.

1.3 Structure of the Book

In this book, we attempted to give our contribution to the research community
towards thinking and reflecting on what Responsible Machine Translation really
means. MT is present in our everyday lives. It is used in a myriad of scenarios and
allows people to communicate. In the last 20 years we have also experienced its rapid
development from an emergent technology to a technology that is sometimes taken
for granted and that achieves quality levels that were unthinkable just 10 years ago.
That is why it is about time we question the ethical and legal issues around it. We
have a moral obligation to think and reflect on issues that impact the development
and/or the use of MT. And as MT goes beyond the research field of MT per se, we
invited authors from various disciplines to contribute with reflections from their own
field. In what follows we present how we conceived this book as an open dialogue
across disciplines, from philosophy to law, with the ultimate goal of providing a
wide spectrum of topics to reflect on. This does not mean that everything is covered,
but we think it is a starting point for further exploring what responsible MT should
entail.
The book is divided into three parts: (1) Responsible Machine Translation:
Ethical, Philosophical and Legal Aspects; (2) Responsible Machine Translation
from the End-User Perspective; and (3) Responsible Machine Translation: Societal
Impact.
Part I discusses the ethical and legal issues around MT. This serves as a preamble to frame responsible MT across disciplines. The chapters in this first part address issues related to translators, ownership rights, intellectual property, and data protection and sharing. Part II focuses specifically on responsible MT from the end-user perspective, treating it as an ecosystem of its own, from the viewpoints of translators and end-users, and also in crisis scenarios, a very timely topic considering the COVID-19 pandemic. Finally, Part III covers the societal impact of MT. First, gender and age bias in MT systems are explored. Second, the ecological footprint of neural MT systems is presented, and third, the importance of speech as Personally Identifiable Information (PII) and its impact on MT systems is addressed.
The next few paragraphs aim at providing an overview of the key messages of each chapter and how they align with responsible MT, mostly focusing on the questions they trigger and possible avenues for thought.
Part I encompasses four chapters with a main focus on ethical and legal considerations for responsible MT. In Chap. 2, “Prolegomenon to Contemporary Ethics of Machine Translation”, Wessel Reijers and Quinn Dupont provide an articulated view of the philosophy and ethics of technology, especially of MT systems. The authors
stimulate our thoughts with core questions on meaning, on the one hand—“what is
the nature of meaning?” and “is meaning lost in translation?”—and on the philos-
ophy of technologies, urging us to reflect on crucial topics on ethics and
Responsible AI: “do machines think?” and “what is artificial speech?” Continuing this reflection on ethics and language, in Chap. 3, “The Ethics of Machine Translation”, Alexandros Nousias explores ways to responsibly analyse the patterns in language and provides recommendations for ethical design optimization. The author frames his work as a contribution to the discussion of how to reflect on data patterns and on the social and semantic meanings of the data. In Chap. 4, “Licensing and Usage
Rights of Language Data in Machine Translation”, Mikel Forcada highlights the
historically heavy dependency of MT systems on data, and how, with the advent of neural systems, this dependency is blurring the frontiers of ownership. The chapter describes the uses of data in the distinct paradigms of MT, from rule-based to neural systems, and raises intriguing questions about licensing, new uses
of data, and copyright. Finally, in Chap. 5, “Authorship and Rights Ownership in the
Machine Translation Era”, Miguel Lacruz Mantecón presents the legal framework of
translation (co)authorship, legal uses of data, and intellectual property rights. The
author discusses the legal implications of a translation as “a derivative work” and
whether machines can be considered authors.
Part II is devoted exclusively to Responsible Machine Translation from the End-User Perspective. Three main axes are explored in this part, all related to human factors in MT: the post-editors of MT, the end-users of MT, and MT as a means of communication for society at large. In Chap. 6, “The Ethics of Machine Translation
Post-editing in the Translation Ecosystem”, Celia Rico and María del Mar Sánchez
Ramos analyse how MT fits in the translation ecosystem from the perspective of
those who correct the MT output: the post-editors. They discuss the key players of
this ecosystem and reflect on three dilemmas: (1) the post-editor’s status; (2) the
post-editor’s commitment to quality; and (3) digital ethics and the post-editor’s
responsibility. In Chap. 7 the use of MT as an end-product is analysed. Aligned
with the importance of centring the post-editing process on end-users, Ana
Guerberof-Arenas and Joss Moorkens describe usability experiments researching
the difference between the inclusion of raw and post-edited MT in multilingual
products and creative texts with an emphasis on users’ feedback. In this chapter,
entitled “Ethics and Machine Translation: The End User Perspective”, they also offer
their reflections on the complex ecosystem that MT brings about and the ethical
implications of technologies. As the final chapter of this part, we look into the use of
MT to benefit society as a whole. In Chap. 8, “Ethics, Automated Processes,
Machine Translation, and Crises”, Federico Federici, Christophe Declerq, Jorge
Díaz Cintas, and Rocío Baños Piñero magnify the complex post-editing ecosystem
in a crisis scenario. The authors provide several ethical recommendations and a
strong appeal to commit to the UN’s motto of “leave no one behind”. They advocate
for a strong focus on preparedness and investment in global risk reduction platforms, on the one hand; and communication settings based on trust, credibility,
and social equality, on the other.
Finally, Part III covers the societal impact of MT, tackling gender and age bias in
MT systems, ecological implications of Neural MT systems, and the role of speech
as key Personally Identifiable Information. There are multiple topics that could be
encompassed in this last part of the book, as MT is nowadays used to enable
communication in a myriad of possible scenarios and hence multiple perspectives
could be explored. With the ultimate goal of highlighting how Responsible MT
spreads across domains and even has societal implications in our daily lives, the
three chapters included here aim to be a first approximation towards reflecting
further on the societal impact MT has. In Chap. 9, “Gender and Age Bias in
Commercial Machine Translation”, Federico Bianchi, Tommaso Fornaciari, Dirk
Hovy, and Debora Nozza tackle style issues in MT and how they reflect gender and
age. The outputs of three commercial MT systems (Bing, DeepL, Google) are analysed and
evidence on demographic bias is provided. The authors further explore whether the
bias found can be used as a feature, by correcting skewed initial samples, and
compute fairness scores for the different demographics.
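As a minimal illustration of what such a fairness score might look like (our own sketch, not the authors’ actual metric), one can compare per-group translation accuracy and report the worst-to-best ratio across demographic groups:

```python
# Illustrative sketch only: a simple "fairness score" as the ratio of the
# worst group accuracy to the best, so 1.0 means equal performance across
# demographic groups. Group labels and values below are invented.

def group_accuracy(outputs, references):
    """Fraction of MT outputs judged correct against references."""
    correct = sum(o == r for o, r in zip(outputs, references))
    return correct / len(references)

def fairness_score(per_group_accuracy):
    """Min/max ratio over demographic groups (e.g. gender x age buckets)."""
    scores = list(per_group_accuracy.values())
    return min(scores) / max(scores)

acc = {
    "female_under_35": 0.80,
    "female_over_35": 0.90,
    "male_under_35": 0.85,
    "male_over_35": 0.90,
}
print(fairness_score(acc))  # worst/best = 0.80/0.90, roughly 0.89
```

A score of 1.0 would indicate equal performance across groups; in practice one would substitute a proper MT quality measure for the toy exact-match accuracy used here.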
GPU performance and NMT models have been gaining traction within the topic of Green AI and NLP, which discusses the implications of higher performance and inference costs in NMT systems and balances these against ecological concerns. Training on big datasets for little or no significant improvement, often measured with biased metrics such as BLEU (Kocmi et al. 2021), has ecological implications. Chapter 10, “The Ecological Footprint of Neural Machine Translation Systems”, written by Dimitar Shterionov and Eva Vanmassenhove, reports on several
experiments aimed at measuring the carbon footprint of MT. The authors train
models using distinct NMT architectures (Long Short-Term Memory (LSTM) and
Transformer) on different types of GPU (NVidia GTX 1080Ti and NVidia Tesla
P100) and collect power readings from the corresponding GPUs in order to compute
their respective ecological footprint.
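The chapter itself details the measurement setup; purely as an illustration of the arithmetic behind such estimates (our own sketch with an assumed grid carbon intensity, not the authors’ code), periodic GPU power readings can be integrated into energy and converted to CO2-equivalents:

```python
# Rough sketch (not the authors' code): estimate the carbon footprint of a
# training run from periodic GPU power readings, e.g. as reported by
# `nvidia-smi --query-gpu=power.draw --format=csv`.

def energy_kwh(power_samples_watts, interval_seconds):
    """Integrate power samples (watts) taken every `interval_seconds`
    into energy in kilowatt-hours."""
    joules = sum(power_samples_watts) * interval_seconds  # W * s = J
    return joules / 3.6e6  # 1 kWh = 3.6 million joules

def co2_grams(kwh, grams_per_kwh=475.0):
    """Convert energy to CO2-equivalent grams; 475 g/kWh is an assumed
    grid carbon intensity, which varies strongly by country."""
    return kwh * grams_per_kwh

# Example: a GPU averaging ~250 W, sampled once per minute for 24 hours.
samples = [250.0] * (24 * 60)
kwh = energy_kwh(samples, 60)  # -> 6.0 kWh
print(f"{kwh:.1f} kWh, ~{co2_grams(kwh) / 1000:.2f} kg CO2e")
```

The carbon-intensity factor is the main source of uncertainty in such estimates, which is why per-country figures (or measured data-centre values) are normally preferred over a single default.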
Our book ends with the final chapter, entitled “Treating Speech as Personally Identifiable Information—Impact in Machine Translation”. In it, Isabel Trancoso
provides a broad view of the widespread use of speech technologies and the ethical
implications of using such an idiosyncratic biomarker. She illustrates how it is
possible to extract metadata on personality traits, emotions, health status, gender,
age, accent, etc. from very small samples of speech. Sending speech to remote
servers should, therefore, be a very cautious and informed action. Unfortunately,
most citizens are not aware of the potential misuses of their voice. Speech gives more
idiosyncratic information than a fingerprint and consequently it is highly sensitive in
terms of security and privacy. Isabel’s chapter allows us to end the book in a circular
way, as she answers Wessel and Quinn’s question “what is artificial speech?” and its
ethical implications.

1.4 Avenues for Future Work

The topic of Responsible AI is a very complex one. This book provides what we consider a first exploration of the many avenues of work that should be pursued to accomplish responsible MT systems. Each chapter provides an introduction to what could be a book of its own. Along this journey, we had a constant feeling that much more could be covered. We felt as if we were merely covering nanoparticles in an unexplored multiverse, and reading the contributions of all the authors only confirmed this. Responsible MT plays a key role in every citizen’s life. This book
aims at bringing up this topic for discussion and providing the basis for a much
broader field of research: one that encompasses cross-disciplinary collaborations to
ensure that responsible MT becomes a reality in the (near) future.
AI technologies are here to stay, in health applications, in e-learning, in embodied
agents, in generating writing suggestions, in bias detectors, etc. And along with them
comes a societal impact that needs to be addressed. NLP is surpassing the frontier of
“meaning” and embracing metadata on our attitudes, our mentality, our ways of
thinking, and even our ways of expressing ourselves. We leave a fingerprint of our
inner selves in our speech, in the texts we write, and in the gestures we use when we
communicate. AI models are more than ever able to capture and code all those
nuances. They are coding who we are. And that is why we think we are at a crucial turning point, a disruptive one, we should say, in which the ethical and legal implications of such systems need to be raised for discussion and informed decisions need to be made, not only on the way we develop such technologies, but
also on the way we use them. It is clear that AI systems—and by extension MT
systems—are having a major impact on our daily lives, but it is still unclear how we
will manage to balance the progress of such technologies and the uniqueness of each
citizen’s voice/text/visual information. Ultimately, as end-users of the systems we
should be part of this discussion. This book is intended to be a first step in that
direction.

References

Alfano M, Hovy D, Mitchell M, Strube M (2018) Proceedings of the Second ACL Workshop on Ethics in Natural Language Processing, EthNLP@NAACL-HLT 2018. New Orleans, Louisiana, USA, June 5, 2018
Batliner A, Hantke S, Schuller BW (2022) Ethics and good practice in computational paralinguis-
tics. IEEE Trans Affect Comput 13:1236. https://doi.org/10.1109/TAFFC.2020.3021015
Belz A, Agarwal S, Shimorina A, Reiter E (2021) A systematic review of reproducibility research in
natural language processing. In: Proceedings of the 16th Conference of the European Chapter of
the Association for Computational Linguistics: Main Volume. EACL 2021
Bender EM, Hovy D, Schofield A (2020) Integrating ethics into the NLP curriculum. In: Pro-
ceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial
Abstracts. ACL 2020

Cowie R (2015) Ethical issues in affective computing. In: Calvo R, D’Mello S, Gratch J, Kappas A (eds) The Oxford handbook of affective computing. Oxford University Press, Oxford. https://doi.org/10.1093/oxfordhb/9780199942237.013.006
Ethayarajh K, Jurafsky D (2020) Utility is in the eye of the user: a critique of NLP leaderboard
design. ArXiv:abs/2009.13888
Hovy D, Spruit SL (2016) The social impact of natural language processing. In: Proceedings of
the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short
Papers). ACL 2016
Hovy D, Spruit SL, Mitchell M, Bender EM, Strube M, Wallach HM (2017) Proceedings of the
First ACL Workshop on Ethics in Natural Language Processing, EthNLP@EACL, Valencia,
Spain, April 4, 2017. EthNLP@EACL
IEEE (2016) Ethically aligned design. A vision for prioritizing human well-being with autonomous
and intelligent systems. IEEE, Washington, DC
Kenny D (2019) Machine translation. In: Rawling JP, Wilson P (eds) The Routledge handbook of
translation studies and linguistics. Routledge, London, pp 428–445
Kenny D, Moorkens J, do Carmo F (2020) Fair MT: Towards ethical, sustainable machine
translation. Transl Space 9:1. https://doi.org/10.1075/ts.00018.int
Kocmi T, Federmann C, Grundkiewicz R, Junczys-Dowmunt M, Matsushita H, Menezes A (2021)
To ship or not to ship: an extensive evaluation of automatic metrics for machine translation.
ArXiv:abs/2107.10821
Larson BN (2017) Gender as a variable in natural-language processing: ethical considerations. In:
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing,
EthNLP@EACL, Valencia, Spain, April 4, 2017. EthNLP@EACL
Marie B, Fujita A, Rubino R (2021) Scientific credibility of machine translation research: a meta-
evaluation of 769 papers. ArXiv:abs/2106.15195
Parra Escartín C, Reijers W, Lynn T, Moorkens J, Way A, Liu C-H (2017) Ethical considerations in
NLP shared tasks. Proceedings of the First ACL Workshop on Ethics in Natural Language
Processing, EthNLP@EACL, Valencia, Spain, April 4, 2017. EthNLP@EACL
Pearl J, Mackenzie D (2018) The book of why: the new science of cause and effect. Penguin Books, London
Stanovsky G, Smith N, Zettlemoyer L (2019) Evaluating gender bias in machine translation. In:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.
ACL 2019
Strubell E, Ganesh A, McCallum A (2019) Energy and policy considerations for deep learning in
NLP. In: Proceedings of the 57th Annual Meeting of the Association for Computational
Linguistics. ACL 2019
Vanmassenhove E, Hardmeier C, Way A (2018) Getting gender right in neural machine translation.
ArXiv:abs/1909.05088
Wang C, Sennrich R (2020) On exposure bias, hallucinations and domain shift in neural machine
translation. In: Proceedings of the 58th Annual Meeting of the Association for Computational
Linguistics. ACL 2020
Wiegreffe S, Pinter Y (2019) Attention is not not explanation. In: Proceedings of the 2019 Conference
on Empirical Methods in Natural Language Processing and the 9th International Joint Confer-
ence on Natural Language Processing (EMNLP-IJCNLP). ACL
Part I
Responsible Machine Translation: Ethical,
Philosophical and Legal Aspects
Chapter 2
Prolegomenon to Contemporary Ethics
of Machine Translation

Wessel Reijers and Quinn Dupont

Abstract Globalisation has triggered a proliferation of translation practices, many of which are mediated by machines. This development raises fundamental philosophical questions about language, writing, meaning, reference, and representation.
This chapter builds a bridge between the ethics of machine translation and philos-
ophy of technology. It starts by considering the activity of translation as such and
argues that this is an inherently ethical activity because it involves sacrifice, estab-
lishes commonality between foreign elements, and invokes certain professional
virtues. Consequently, the chapter asks what machines ‘do’ to translation practises,
arguing that they fundamentally transform the activity of translation into the tran-
scription of notations. This raises the philosophical questions of logocentrism, the
extent to which machines translate the ‘presence’ of lived experience, and
phonocentrism, the extent to which machines transcribe the spoken word. Based
on this analysis, the chapter turns to three ethical questions that pertain to machine
translation. The first is about responsibility: while machines rely on retrospective
responsibility, can they deal with prospective responsibility in translation? The
second is about hospitality: can machines adapt to foreign worlds without having
the lived experience attached to these worlds? And the third is about virtue: can the
exchangeability inherent to machine translation cohere with the incommensurability
of the work of translation?

Keywords Ethics of translation · Philosophy of technology · Sacrifice · Linguistic


hospitality · Virtue · Logocentrism · Phonocentrism

W. Reijers (✉)
Robert Schuman Centre, European University Institute, San Domenico di Fiesole, Italy
e-mail: [email protected]; [email protected]
Q. Dupont
School of Business, University College Dublin, Dublin, Ireland
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023


H. Moniz, C. Parra Escartín (eds.), Towards Responsible Machine Translation,
Machine Translation: Technologies and Applications 4,
https://doi.org/10.1007/978-3-031-14689-3_2

2.1 Introduction

We live in the age of globalisation, powered by technological progress and a burgeoning capitalist economy. Globalisation challenges society in many ways: it affects the ways we communicate, how cultures interact, dissolve, and reinforce themselves, and how economic and technological processes intervene in everyday habits and practices. Efforts towards Machine Translation (MT) are central to this
development.
Consider the international traveller. To communicate across a polyglot Europe in
turbulent times, the European Commission developed a website called Re-open EU (https://reopen.europa.eu/en).
Once citizens visit this portal, they receive a pop-up notification:
Due to the frequency of the updates, aimed at offering relevant information at all times, this
tool partly relies on automatic machine translations. We apologise for any inconvenience
and we strive to continuously improve the user experience.

What does it mean that a machine translates this information? Does it mean I have to
expect bad sentence structures and weird spelling? Might it mean that I receive
factually wrong information? Or perhaps I will miss out on some important nuances?
MT generates many such practical questions, but it also raises more fundamental
questions about language, writing, meaning, reference, and representation. What
does a word, phrase or sentence mean and hence, when is a given translation
‘correct’ or ‘good’? Such quandaries also provoke questions concerning responsi-
bility. If an MT is ‘wrong’, who is to be blamed? What is the ethical choice made by
the translator—in this case a machine—in picking one word rather than another? Is
something ‘lost’ in translation, and why does this matter?
Let us start with some straightforward ethical problems facing MT (Vieira et al.
2020). Consider again the EU portal, which might offer machine-translated recom-
mendations that are rendered ambiguous or incorrect. This might lead to people
being denied visitation or immigration rights. Or, medical information could be
displayed incorrectly. These issues of facticity can cause direct harm. More intricate
issues of interpretation emerge too. For instance, a text translated by Google Translate (https://translate.google.com/) might be disregarded in a legal court case because of its origins, even
when the facts are correct. Due to algorithmic and training data set bias based on
English text corpora, MT might affect the way we use language at a global scale, for
instance by giving it a more ‘English’ ring (Raley 2003). This might also affect
natural linguistic diversity.
The ethics of MT is still nascent and largely motivated by the MT community
(Kenny 2011), which focuses on fairness in MT (Kenny et al. 2020) and conse-
quentialist issues that emerge when MT is used in crisis situations (Parra Escartín
and Moniz 2019; O’Mathúna and Hunt 2019). Scholars studying the philosophy and
ethics of technology have, despite MT’s obvious societal importance, remained largely silent on the matter. We wonder if, to echo Feenberg (2010), philosophers of
technology have failed to see the paradox at the heart of MT. Is the most obvious the
most hidden? In this prolegomenon, we seek to build a bridge between the ethics of
MT and philosophy of technology.
Philosophy of technology brings at least three significant insights to the fore.
First, technologies are not mere instruments; they are not neutral with regard to
human actions and values but instead mediate reality (Verbeek 2005). For example,
eyeglasses do not let the wearer see reality more ‘correctly’. Rather, eyeglasses
mediate vision by amplifying some aspects and reducing or skewing others. Second,
technologies can promote or prohibit human action and delegate responsibility. As
such, technologies have ontological significance, a conceit strikingly captured by
Bruno Latour under the term “translation” (Latour 1994, 32). For Latour, translation
simpliciter is part of technology because it implies an uncertainty of goals. One
technology can be used to attain a multitude of aims. Third, philosophy of technol-
ogy teaches us that while our social and political norms are the basis of technological
design, technologies in return ‘refigure’ these values (Winner 1980). As such,
technology can have particular values that evolve, leaving a lasting effect on values themselves: the insight behind value sensitive design (Friedman et al. 2002).
In this chapter, we start building the bridge between the ethics of MT and
philosophy and ethics of technology. First, we consider translation as such and
argue that it should be seen as an inherently ethical activity. Its technological
mediation should therefore be of paramount interest for philosophers of technology.
Second, we take the role of a philosopher of technology and ask what machines ‘do’
to translation: how do machines mediate the human activity of translation and what
happens to the values associated with this activity? We draw on insights from a range
of thinkers to show that MT is in many ways an apotheosis of over 2000 years of
philosophical thought, but going forward must address three enduring ethical chal-
lenges. Third, we take these philosophical insights and discuss normative implica-
tions, highlighting three questions that could prompt further work on the ethics
of MT.

2.2 Translation and Ethics

While the ethics of MT is in its infancy, it leans heavily on the ethics of translation
(Pym 2001), which is as old as philosophy itself, at least insofar as philosophy
considers problems that are treated under the heading of interpretation. Indeed, the
early names for translators were hermeneus in Greek and interpres in Latin (Kearney
2007). In modern philosophy, translation becomes a major theme with the advent of
hermeneutics in the Romantic age. Schleiermacher thought translation was the
object of textual interpretation. With some novelty, he opposed the common view
that translation should accommodate the reader’s understanding. Rather, taking the
ideal of a translation as if it is the original, Schleiermacher observed how translation
happens ‘in between’ (Venuti 1991, 127). That is, the translator finds themselves in
between the author and the reader and can move along an interpretive continuum.
For example, ethical choices in translation have been recognised by religions for
millennia. Scriptural religions, like those of the Abrahamic faith, have had to wrestle
with the perseverance of meaning in translation. Given that scripture contains the
words of God, or at least prophets and followers, religious thinkers have long
worried about how translations mediate ‘original’ meaning and translation
(famously, there are no translations of the Qurʼān in Islam, only interpretations). By
working in between, the translator accommodates the Author and the believer, the
recipient of His words.
The hermeneutic significance of translation offers three clues to its ethical
challenges. We find a first clue in the original meaning of translation, as carried
(from the Latin latus) over (from the Latin trans). It implies a process of moving
something from one place and carrying it over to another place, like transporting
grain from a farm to feed a city. The thing that is carried over is not entirely
possessed by or integral to the agent that carries it but remains, nonetheless, external
or already committed. While in transport, the farmer might spill some grain. Simi-
larly, a translator does not possess the meaning of a text but rather carries it over
from the author to the reader, risking a loss of meaning along the way. Translation
thus shows affinities with deduction (reasoning from general rules to conclusions
about particulars) and induction (reasoning from particulars to conclusions about
general rules). Similarly, Renaissance scholars discussed translation in terms of
transduction (Kearney 2007). Crucially, moving in between, translation involves
sacrifice: an inevitable loss or lack.
We find a second clue in how the concept of translation is used pragmatically. In
the vernacular, we often speak about translation outside of language. We ‘translate’
thoughts into actions, a recipe into a dish, and a football strategy into play on the
field. Similarly, scholars use the verb ‘to translate’ in a variety of ways and contexts:
investigating whether biological indicators translate cancer cells, whether clinical
trial outcomes translate into benefits for patients, and whether organisations translate
climate change into business as usual. Moreover, translation links symptoms to
diseases, descriptions into prescriptions, theory into practice, and use into abuse
(the latter already indicates a move towards norms). What these uses have in
common is that they take two or more different elements and connect them through
a commonality. Like metaphors, translation bridges the gap of ‘foreignness’ between
heterogeneous elements.
We find a third clue when considering translation as a human activity. Often
translation is performed as a professional activity, for example, real-time interpreters
in the European Parliament enable multilingual communication between politicians,
paralegal translators create authorised translations of documents, and technical
writers prepare manuals for products in different languages. What characterises
these efforts is the activity of translation as work, as a profession. As with other
kinds of work, translation comes with standards of excellence—best practices—and
an understanding of professional responsibility. Of course, translation work ought to
be done well, serving the idea of a ‘good’ translation.

These clues give us an initial sense of the ethical character of translation: to translate means making a sacrifice, establishing commonality amongst plurality, and
respecting professional norms, standards, and virtues. Let us briefly survey each of
these aspects from a philosophical point of view.
First, sacrifice in translation requires irredeemable loss. Sacrifice challenges the
common sense view of translation as a relation between an original and a copy
(Benjamin 2012, 156). While it is intuitive to think that a translation of the English
phrase “the cat is on the mat” into the German one, “Die Katze ist auf der Matte”,
simply involves copying the meaning from one language into another, Benjamin
(2012, 158) notes:
In the original, content and language constitute a certain unity, like that between a fruit and
its skin, whereas a translation surrounds its content as if with the broad folds of a royal
mantle. For translation indicates a higher language than its own, and thereby remains
inappropriate, violent, and alien with respect to its content.

Benjamin stipulates that the activity of translation does not copy the same thing, as
though a translation is just another configuration of characters and words, but rather
that translation produces commonality by relating to a ‘meta-language’ that exists
invisibly in between languages (a controversial point we will return to). According to
Benjamin, translation invokes the possibility of an ideal or pure language that existed
before the construction of the Biblical Tower of Babel, which has remained forever
unreachable. Such an exercise has been attempted many times before, with little
success. From the sixteenth century, Renaissance and Enlightenment scholars
(including intellectual giants like Descartes, Bacon, Leibniz, and Kircher) have
tried to develop or uncover ‘artificial’, ‘perfect’, and ‘philosophical’ languages,
often modelled on Hebrew or Chinese.
Translation cannot exist without sacrifice and therefore involves (semantic)
violence. Paul Ricoeur recognises this essential element of translation as a form of
suffering (Kearney 2007, 150). According to Ricoeur, the translator is compelled to
reduce the ‘otherness’ of the text while translating, and therefore cannot help but
inflict violence on the original. The translator cannot help but ‘betray’ the original
meaning of a text when working with it. Translation is hence about dealing respon-
sibly with the inevitability of betrayal.
Second, translation establishes commonality between foreign and heterogeneous
elements. Sigmund Freud, for instance, saw translation as a process that bridges the
foreignness between our subconscious and conscious states, a process that happens
within ourselves. In this same internal sense, even when speaking in a mother tongue
(cf. Kittler 1990), we are always already translating: between public and private
settings, formal and informal settings, work and play, and so on. We use language to
cross over from the familiar to the unfamiliar. As Kearney argues, in translating “we
are called to make our language put on the stranger’s clothes at the same time as we
invite the stranger to step into the fabric of our own speech” (Kearney 2007, 151).
Because of this, translation between languages also involves a translation between
different visions of the world. A text is not only ‘German’ because of its grammar
and syntax, but because it embodies and circumscribes a culture and inhabits a
world. To accommodate this foreignness, translation needs to welcome the other as oneself, which Ricoeur captures with the notion of linguistic hospitality (Ricoeur
2006). Linguistic hospitality offers a way to better understand oneself, but also to
better understand the limits of language: to discover what is untranslatable. Linguis-
tic hospitality permits discourse between strangers and therein helps humans avoid
violence. Thus, linguistic hospitality, according to Ricoeur, is connected with the
exercise of justice (Bottone 2013). To translate means to facilitate peaceful
co-existence; we cease to fight because we can talk.
Third, the profession of translation revolves around certain standards of excel-
lence or virtues. Benjamin argues that translation extends the life of a (literary) work
because it has a “special, high purposefulness” (Benjamin 2012, 154). Likewise,
Arendt (1958) argues that work is a distinct mode of human activity aimed at
durability: the creation of a common, lasting world of things. As such, it links up
with a purpose and aim (telos). Following MacIntyre (2007), the internal purpose or
good of translation presupposes the existence of certain standards of excellence or
virtues. However, translation is not just a mode of work amongst many (such as
playing the piano, farming, or building). Rather, because translation has an essen-
tially ethical character, the link between translation and justice is found in political
action, such as acting and speaking in the public sphere. Indeed, the virtues of the
profession of translation are not the virtues of just any profession, but one that
pushes the boundaries of work towards a mode of action (Reijers 2020).
It is perhaps no surprise that the virtue most frequently discussed in the ethics of
translation is that of honesty or fidelity (Pym 2001, 130), which is seen as an antidote
to the inevitable betrayal essential to translation. Virtues like fidelity support truthful
political discourse, which is exemplified by the translation work that happens in
political contexts, like the polyglot European Parliament. Again, we note that
truthfulness does not equate with correctness, but rather urges the translator to
represent both the speaker and the hearer through ethical commitments, especially
an openness to others and to their cultures (Kearney 2007, 154).
Before we turn to the problematic case of MT—the technological mediation of
translation—let us briefly review this first section. We have conceptualised transla-
tion as an activity that mediates between heterogeneous elements: not just between
foreign languages but within a language, perhaps even within an individual. We
argued that translation is an inherently ethical activity because, as work, it is essential
to the exercise of justice; it offers communication as an alternative to violence,
bridging the gap between the familiar and the foreign, oneself and another. And yet,
translation entails semantic violence: it inevitably betrays the intentions of the author
and the reader, the speaker and hearer. Translation is a form of interpretation, which
is never neutral and is often bound up with ideology (Cáceres Würsig 2017). To
remedy these inclinations, certain virtues are required for responsible translation—
most notably the virtue of fidelity—which ensures that translation is always aimed at
supporting political action, at acting and speaking in concert.
2 Prolegomenon to Contemporary Ethics of Machine Translation 17

2.3 What Machines Do to Translation

Having surveyed the ethics of translation, we now consider how the activity of
translation is mediated by technology, or—in the words of Verbeek (2005)—we ask
what technology ‘does’ to translation. In short, machines transcribe notation, which
is a specific form of inscription or writing. However, notational writing is distinctive:
neither a form of language nor writing simpliciter. In this section we briefly discuss
what ‘translation’ machines are, how they have developed historically, and why they
use notation. We then discuss two enduring pitfalls of our common-sense understanding of MT: that translation would capture the ‘presence’ of lived experience (logocentrism), and that it would ‘translate’ the spoken word (phonocentrism).
The history of MT is illustrative of its scientific specificity. According to Hutchins (2006), the history of MT began exactly in 1933 (setting aside the much longer
pre-history of perfect, universal, and philosophical languages; see, e.g., Eco 1995;
Markley 1993; or Slaughter 1982). Two patents for MT were issued simultaneously
in France and Russia to Georges Artsrouni and Petr Trojanskij, respectively.
Artsrouni’s patent described a general-purpose text machine that could also function
as a mechanical multilingual dictionary. Trojanskij’s patent, also basically a
mechanical dictionary, included coding and grammatical functions using ‘universal’
symbols. Importantly, like many of the earlier designs for ‘mechanical brains’
(Gardner 1958), Artsrouni and Trojanskij’s machines permuted alphabetic text.
But, unaware of these earlier designs, in 1946 the British crystallographer
Andrew Booth met Warren Weaver (director of the Rockefeller Foundation), to
discuss how their experiences of codebreaking during the war might apply to
MT. Over the next few years, the two men came to believe that MT would yield
similar successes if treated as a problem of code breaking, or cryptanalysis. In 1949,
Weaver distributed (and later published) a memorandum that introduced the idea of
MT to the scientific community. In this memorandum, Weaver included his corre-
spondence with Norbert Wiener, describing how one might apply the “powerful new
mechanised methods in cryptography” to MT and concluding that, when faced with
an unknown language, one might respond by presupposing that the text is “really
written in English, but it has been coded in some strange symbols” (Weaver 1955).
Thereafter, the task of MT was tantamount to code breaking.
We need not valorise or belabour the subsequent successes of this technoscientific
program. After several early attempts at ‘rationalist’ MT using grammatical rules,
Weaver’s codebreaking ethos returned. Formulated as a problem of code breaking,
MT thereafter was understood as a process by which computers calculate statistical
inferences across linguistic corpora (DuPont 2018). Unlike rationalist efforts of
translation that try to reconstruct meaning like a human translator, codebreaking
discovers hidden patterns of use through computational permutations of symbols.
The issue facing the rationalist is not whether computers ‘think’ as well as humans.
In fact, in many cases computers ‘think’ much better than humans, producing deep, uncanny insights. Rather, machines ‘think’ differently from humans
because they process notational symbols.
18 W. Reijers and Q. Dupont

The most familiar notation for computing is the digital bit, first theorised and
named by John Tukey. The bit is usually represented as the numerals 1 and 0 or ‘on’
and ‘off’ electrical states (and following Russell and Whitehead’s program of
logicism in the early 1900s, ‘true’ and ‘false’). Collections (or sets) of bits create
deterministic routines, or algorithms. According to Knuth (1997), algorithms are
essentially proof by mathematical induction.
While the bit is now our most familiar notation, it is joined by a long history of
notational inscriptions used in other domains of knowledge, such as Western
classical music’s staff notation, mathematical notation, and dance notation (the
most sophisticated of which is Labanotation). The purpose of these special marks
is to create meaning. To be sure, all forms of inscription, including painting, diagrams, and natural language, create meaning. In The Order of Things, Foucault
describes the rich ‘semantic web’ of resemblance underlying inscription up to the
end of the sixteenth century, which “played a constructive role in the knowledge of
Western culture” (this includes scientia) (2002/1966, 20). The four ‘essential’
modalities of meaning in the human sciences are, according to Foucault,
convenientia, aemulatio, analogy, and sympathy (2002/1966, 20–26).
Thereafter, representing, which includes ordering, mathesis, and taxinomia,
comes to dominate the French ‘classical’ epistemé. The historical accuracy and
precision of Foucault’s account need not concern us here. Rather, we reflect on the
changing epistemé, from resemblance to representation: on how “the written word
and things no longer resemble one another”, and how “words have swallowed up
their own nature as signs” (2002/1966, 53–54). Foucault locates this fissure of the
human sciences in the work of Francis Bacon and René Descartes (two scholars
keenly involved in the longue durée of MT), finding a critique of resemblance “that
concerns, not the relations of order and equality between things, but the types of
mind and the forms of illusions to which they might be subject” (2002/1966, 57). To
put the point bluntly, MT is only possible in a representational epistemé.
Unlike human translation, which arguably works through chains of metaphoric
connotation (Eco 1994, 29), MT requires a much more rigid and stable semiotics. To
do ‘work,’ machines must transcribe notational inscriptions according to narrowly
prescribed rules. But in the infinite semiotic chain of translation, machines build
hidden layers of semiosis using a constitutive set of characters that are disjoint and
finitely differentiated. For each layer of meaning, these sets of characters are
correlated to a field of reference that is unambiguous, semantically disjoint, and
semantically finitely differentiated (see Goodman 1976). Thus, despite
accomplishing the task of translation, machines do not translate ‘words’—however
and variously inscribed in image, sound, video, or viva voce—but rather transcribe
symbolic ‘notations.’
The transformation implied in the activity of translation raises at least two
fundamental philosophical issues: (1) to what extent does MT transcribe the ‘pres-
ence’ of lived experience (logocentrism), and (2) to what extent does it transcribe the
spoken word associated with this lived experience (phonocentrism)? That is, do
words represent some deeper, more essential, reality? Peirce did not think so:

The meaning of a representation can be nothing but a representation. In fact it is nothing but
the representation itself conceived as stripped of irrelevant clothing. But this clothing never
can be completely stripped off: it is only changed for something more diaphanous.
(CP 1.339)

But is a naked word—not a representation—possible? Perhaps there exists some
kind of primordial language of thought or ‘mentalese’ (Fodor 1975)? Perhaps we can
have an ‘intuition of the Essence’ (Husserl 1969, 52)? Whatever the answer to this
long-standing philosophical quandary might be, we are concerned only with
machine translation and therefore can focus on notation—a constitutive writing, or
text, or inscription.
Since at least Plato (Republic III and Cratylus), philosophical thought has tended
to confine writing to a secondary and instrumental function. Typically, writing has
been thought to be a “translator of a full speech that was fully present (present to
itself, to its signified, to the other, the very condition of the theme of presence in
general)” (Derrida 1998, 8). Derrida named this epochal mistake ‘logocentrism,’ the
subjugation of writing to presence, essence, or logos.
Derrida attempted to move beyond logocentrism by deconstructing the metaphys-
ics of presence, positing the possibility of an arche-writing (Derrida 1998, 57).
Arche-writing emerges out of the arbitrariness of the sign (Peirce’s infinite semiosis)
and cannot be reduced to presence. It is the unnameable movement of difference-
itself (Derrida 1998, 93) and “the pure trace” (62), the “movement of différance”
(that which constantly differs and defers) (60). Trace and différance have many
complexities but share in a non-simple origin and ‘stand in’ as a simulacrum of
presence that makes the “opposition of presence and absence possible” (Derrida
1998, 143). This opposition motivates the phenomenology of writing—and the very
possibility of a grammatology—because it can never expose the trace; only the
difference emerges, in “spacing (pause, blank, punctuation, interval in general,
etc.)”. A grammatology of MT, therefore, must not commit to presence but rather
to spacing.
Derrida gestures to the human translator’s practice through arche-text. Naturally,
the human translator reads a text and makes an interpretation. But finding no simple
origin in the arche-text, Derrida invented the method of deconstruction to open
readings, a process that is unbounded by presence and seems to offer an unlimited
semiosis, with no limit to interpretation. Should the human translator take Derrida’s
method too seriously, however, there would be linguistic anarchy.
Against Derrida’s freewheeling approach, Umberto Eco re-reads Peirce’s Col-
lected Papers and concludes that the limits of interpretation lie with the author’s
original intention, which must be motivated by some extra-semiotic ‘Dynamic
Object’ (Eco 1994, 39). For Eco, the dynamic object opens the reading while habit
closes it. Habit, according to Eco, is “a disposition to act upon the world” in a
community who offer “an intersubjective guarantee of a nonintuitive, non-naively
realistic, but rather conjectural, notion of truth” (Eco 1994, 39). Thus, MT, with no
limits of interpretation and no community to guarantee a text’s closure, cannot rely
on logos, or presence. This lack of foundation is also why, according to Dewey and
Tufts (1909), habit is such an important element of ethics, and a fundamental
limitation for any ethics of MT.
The gap between writing and speech shrank dramatically when the Greeks
invented the vowelised alphabet. Unlike ideographs (such as Chinese characters), the alphabet
is a strict notation because it has a formal compliance to the semantic field (which
can be tested for accuracy by appeal to rules) (Goodman 1976). However, being
composed of just consonants, the original alphabet could not fully express speech.
The invention of the vowelised alphabet lifted this limitation and created—for the
first time—a system of meaning capable of representing all language, Greek and
barbarian. It is not an accident that philosophy blossomed in this environment, but
the perceived universality of a vowelised alphabet created a rift between speech and
writing. Because of the instantaneous comprehension of speech, the voice mirrors
things by natural resemblance (see Aristotle in De interpretatione), a further rela-
tionship of translation or natural signification. Derrida called this the “absolute
proximity of voice and being” (Derrida 1998, 12). MT, however, does not speak.
Instead, a machine translates a work through a double performance, which always
produces a supplement. This process is structurally the same for all notational
technologies, as when Bach’s Brandenburg Concertos (a musical work) is played
by Glenn Gould (a performance), or when a plaintext message (an algorithmic work)
is enciphered (a performance), that is then reversed through decryption (a double
performance). Unlike in human translation, the unity of a work is found in the correctness of spelling (Goodman 1976)—a reversible process—were it not for the supplement.
For example, two performances of the Brandenburg Concerto—say, Gould’s and Le
Concert des Nations’—are alike insofar as all the ‘correct’ notes have been played,
but Gould’s performance is more: more than the sum of the notes played in the
correct order (a supplement). The performance of voice also has a supplement
(known in linguistics as ‘suprasegmentals’).
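The cipher example can be made concrete in a few lines of code (a minimal sketch of ours, not the authors’): applying the same XOR operation twice restores the plaintext character for character, illustrating that notational transcription, unlike translation, is exactly reversible, up to the supplement.

```python
def xor_cipher(text: str, key: int) -> str:
    """XOR each character code with a one-byte key. Because XOR is an
    involution, applying the same operation twice returns the original
    text unchanged."""
    return "".join(chr(ord(c) ^ key) for c in text)

work = "a plaintext message"                          # an algorithmic work
performance = xor_cipher(work, key=42)                # encipherment (a performance)
double_performance = xor_cipher(performance, key=42)  # decryption (a double performance)

# The 'spelling' of the work is fully recovered: no supplement survives
# the notational round trip.
assert double_performance == work
```

Nothing of the notation is lost or added in the round trip; whatever Gould adds to the Brandenburg Concertos, or a voice adds to a text, lies outside what such a transcription can carry.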
It is telling that artificial and ‘perfect’ languages often developed as pasigraphies (i.e., writing systems in which each symbol represents a concept). Artificial and perfect
languages were a Babelian dream, both invented (like John Wilkins’ Real Character)
and discovered (Hebrew and Chinese), much like the universal MT software used
today. Characteristically, pasigraphies are silent, an ‘all writing’ that is traditionally
thought to reveal the inner power of writing, long associated with the occult, statehood,
and the elite. Moreover, silence and ineffability are often put to strategic use, playing
off the illiteracy of commoners and barbarians alike (see Zielinski 2006). But today,
MT reverses the presumed (logocentric) order of evolution, which holds that speech
is primary and writing is derivative. Notably, notational writing machines
co-evolved into what we today call a computer. The implications of this debate are
significant; if logocentrism and phonocentrism are true, then computers cannot
think, because they do not speak. Even speech synthesis, after all, is first a type of
writing. Perhaps contemporary MT proves Derrida’s theory of arche-text right? At
Babel, Man was confused by many tongues, but MT today appears to offer an
apotheosis for the long legacy of writing.

2.4 Ethical Questions of MT

Over the last decade, advances in state-of-the-art MT have demonstrated that
efficient and reliable MT is possible. In this exceedingly short time span, the
dream of a Babelian utopia was realised, which has settled most of the philosophical
debates about whether translation is uniquely human. Meanwhile, the automatic
production of text, speech synthesis, artificial summarisation, and semantic analysis
across many communication modalities have also been largely developed, to the point that real applications are now in common use. Most commentators have embraced these
advances and now concern themselves with niggling questions of transparency and
bias, forgetting that the ethics of MT involves more than these practical concerns.
Ethical questions in MT go deep and are uniquely tied to human obligations and
responsibilities. In much the same way as the weave of a printed page communicates
heritage, a twisted face expresses sorrow, and a trompe l’oeil painting alights with
feints of sympathy and depth, machine translation is more than a sum of its parts.
Resonating with but departing from the ethics of human translation, a genuine ethics of
MT concerns telos, truth, and ways of worldmaking.
In this final section, we outline an agenda for the ethics of MT. We do not propose
definite answers but rather open up fundamental questions about ethics and respon-
sibility. Responsibility (i.e., the ability to respond to a call), we admit, has a
philosophical vestige that is tied up with both writing and history itself. So, in
posing these questions we neither condemn nor celebrate MT. Instead, we embrace
what Stiegler (2011) calls a pharmacological perspective, a perspective that con-
siders technology to be essential to human existence and as such is irreducible to
good or bad; rather, MT can be both a cure and a poison. In more practical terms, this
means MT can participate in the good. Increasingly, utopian claims that MT may
bring together people from different nations, cultures, and persuasions are justified
and illustrate genuine contributions to fostering a global understanding. Yet, at the
same time, MT comes with a dark side and anticipates the ‘ultimate danger’ of our
technological worldview, as evocatively expressed by Heidegger (1977). These
questions motivate our prolegomenon to thinking through the pharmacological
character of MT.
First, we ask: what place does responsibility hold in MT? Here, the ethical
principle of responsibility resonates with what we earlier encountered as fidelity.
However, responsibility goes beyond fidelity because it is not just about being
respectful of truth but also about the possibility of harm. In the inevitable
sacrifice or betrayal that has to be made in the activity of translation, there is a
growing risk of harm. This is especially evident in disaster situations and multimodal translation, like speech synthesis. When a rapid response is required, a
sacrifice in meaning can seriously affect decision-making (O’Mathúna et al. 2020,
185). Human translators are expected, or at least should be expected, to take
responsibility for this sacrifice. In other words, when harm is done or might be
done, a human translator can answer to the call of responsibility: both retroactively,
for instance, by stating why a more generic translation was preferred because of time
pressure, and prospectively, by stating which ethical considerations will guide
translation decisions. These values affect professional translation but also so-called
‘citizen translators,’ who are non-professionals providing translation aid in crisis
situations and yet are also expected to adhere to certain virtues and a sense of
responsibility (O’Mathúna et al. 2020).
However, can machines answer the call of responsibility in the same way as
human translators? Under the rubric of ‘responsibility,’ often understood as risk management, machines can in fact answer this call. As Ricoeur observes in The Just,
responsibility in the modern world has been largely divorced from the agent and is
now subsumed under a calculus of probability (Ricoeur 2006, 26). It is no longer a
matter of who is responsible but of where responsibility lies because risk is invoked
as a liability concern. However, responsibility in MT is backwards-looking; it relies
on a decision that involves calculated risk-taking, based on a certain programmed
past. This history is not strictly speaking historical, as in having to do with events in
time, because the semantic surface has been rendered a-historical through abstraction
by way of notational traces, most notably digital traces. For instance, a Wikipedia
corpus is historical in the sense that it consists of textual entries that have been added
to it in time, but a-historical insofar as it is a collection of digital traces that have been
separated from the very historical acts and experiences that they refer to.
This gap between notational representation and experience and action (the gap
between presence, trace, and surplus) marks a normative gap between retrospective
and prospective responsibility. Prospective responsibility, at least for Ricoeur, is
about an agent, a ‘who’ to whom an action can be imputed. But responsibility is
essentially tied up with ‘responsiveness,’ which requires communication in the
broad sense of the term. Insofar as communication has a primordial grasp on action
and experience, this grasp is not rendered in notation. Rather, this primordial grasp is a
dialogue between the Self and the Other (Gorgoni and Gianni 2021, 179), a
translation. It is in such a dialogue that we designate ourselves as the ‘I’ who speaks,
as the ‘I’ who can respond to a call of responsibility. Writing, including notational
writing, problematises responsibility because it extends prospective responsibility
across the local ‘I’ and ‘You’ and towards the unknown other.
Writing and MT, in other words, embed responsibility in an institutional context.
When a human translator takes prospective responsibility, the individual has to
mediate between an institutional context (e.g., professional norms), an utterance,
and lived personal experience and action. As Arendt (1958) concedes, taking
prospective responsibility involves acting and speaking in concert, before responsi-
ble actions can be taken up in (notational) writing. Responsible MT, therefore, is
limited. To the extent that retrospective responsibility suffices, arguably the case in
domains such as insurance, finance, and so forth, MT might well be just as capable of
‘responsibility’ as human translation. However, when prospective responsibility is
called for, which requires a ‘who’ that acts and speaks, notational machines seem
incapable of taking the necessary responsibility.
Second, we ask: to what extent can MT accommodate linguistic hospitality?
As discussed above, the norms of hospitality are at stake because translation
necessarily bridges the gap between foreign, heterogeneous elements without
reducing one to the other. Heterogeneous elements are different visions of the world.
Arguably, different languages represent different visions of the world, but such visions also differ within languages (O’Mathúna and Hunt 2019). Hospitality suggests that even
when communicating in one language, it is often necessary to ‘translate’ particular
social contexts. Translation offers an opportunity to move between different visions
of collective meaning making. Linguistic hospitality requires such a capacity, but
can machines possess such a thing?
To motivate an answer to this question, we recall Goodman’s concept of ‘ways of
world making’ (1978). Goodman argued that we understand reality through a
pluralist lens of different versions of the world that are irreducible to one another.
These versions of the world are constructed through procedures of ‘world making,’
such as composition, ordering, and deformation (see also Herman 2009, 77). Which
ways of world making are possible depends on the ‘affordances’ and ‘constraints’ of
the medium through which they are conducted. When the medium is a (notational)
translation machine, the procedures of world making are necessarily limited. Most
crucially, a machine, unlike a human translator, has no access to the experiential
context of a particular version of the world. Machines lack ‘experience’ of the world
within (and across) linguistic realms. A machine may transcribe notations from
source to target languages but it lacks the procedures necessary to embed the
constructed meaning in a context of lived experience.
Does this mean that MT is incapable of producing linguistic hospitality? Not
necessarily so. Insofar as the procedural limits of world making allow, machines can
‘translate’ between different worlds by composition and ordering procedures. However, machines will not have access to the full range of practices that linguistic hospitality entails; moreover, such translations always evince semantic violence.
For instance, a machine may not be able to produce a ‘diplomatic’ tone because
diplomacy requires access to lived experience: tacit understanding of cultural cues
and preferences, body language, time perception, and all the sundry actions of
embodiment. A notationally correct and semantically serviceable translation will, in
such cases, fail to cross the gap between the familiar and the foreign.
Third, we ask: to what extent does MT preserve the activity of translation as
virtuous work? Prima facie this seems impossible, for how could machines have
virtues that translators would and should possess? They certainly lack the ‘person-
ality’ usually associated with virtues. However, following MacIntyre (2007) we
argue that virtues lie primarily in practices (i.e., performances of a work). Returning
to the example explored above, a piano does not possess any virtues outside of its
context of use, but it is a necessary part of a performance and hence co-constitutes
relevant virtues, like Gould’s mastery. When practices—or performances—are
hybrids of humans and machines, they are symbiotic, as when, for instance,
human translators prepare texts (pre-editing) or correct translated results (post-
editing). To be sure, such forms of human-machine collaboration still carry many
risks, such as diminishing human creativity and introducing bias. Yet, at least in
principle, such practices could support professional virtues associated with translation. When humans are ‘out of the loop’ entirely, the risks increase further. Consider,
for instance, the challenges of preserving the virtues of translation in fully automated
political contexts, like the European Commission. What would the virtue of fidelity
mean in such a context, beyond the narrow notion of correctness?
It is difficult to respect professional virtues in MT because the practice is
transmedial. By passing through a notational medium, we argue, MT risks subjecting
the nuance of translation activity to a logic of commodification. As with any other
mode of production, works of translation have ‘commodity potential’ (Appadurai
2003, 82). Linguistic commodification occurs when meaning is made exchangeable,
which, in turn, renders things (cars, apples, office buildings, but also translations)
universally substitutable. The layers of standardisation entailed in MT commodify
language in the service of global capitalism. As we discussed above, notational
writing is one of the oldest, most profound, and most impactful forms of
standardisation and universalisation. Notational writing offers a transmedial foun-
dation for the political economy of commodification par excellence.
However, translation work that generates a common understanding may even
resist commodification. Even though notational writing is subject to standardisation
and universalisation, language also affords the possibility of an infinitude of
instantiations with finite means. This gives rise to the possibility that, like a work
of art, translation bears the mark of its maker and is in that sense a particular object
that resists universal exchangeability. Consider, for instance, how translations of the
Iliad are sometimes incommensurable. This incommensurability does not lie in the
product itself (the translation) but in the practice of translation as work. In this
practice, the translator is disposed to make certain choices and thereby enact virtues,
most notably fidelity. In literary works, translators often choose to translate proper
names in such a way that they lie closer to the target language than to the source
language, which requires a considerable level of creativity. No two works of
translation are the same, because translators are unique. Rather, each translator
brings a particular character and worldview, which imbues the work with a style.
Characteristic styles may also resist commodification: a translation of poetry is
distinct from real-time political translation, which is more standardised and
rudimentary. Yet, in both cases professional virtues—perhaps unique to the
style—are required.
With echoes of Arendt (1958), we find that MT signals the substitution of
mechanical labour—in terms of cyclical process—for work, in terms of durability.
MT does not extend the life of a work but continuously consumes and produces
‘meaning.’ Even when MT is deemed highly accurate, it dissolves the identity of a
work of art through its process—practice and production—that ends up being both
universalised and standardised and therefore eminently exchangeable. Through
this process, MT reifies the practice of translation, as a ‘given’ thing with ‘phantom
objectivity’ (Pitkin 1987, 265). In so doing, MT risks obscuring and covering up the
social relations that its practice gathers and mobilises. This problematic underlies the
political economy of MT and has led to an alienation of human translators and the
erosion of the profession (cf. Moorkens 2017).

2.5 Conclusion

This chapter offers a critical introduction to the ethics of MT. First, we discussed the
ethics of translation by exploring three major themes: sacrifice, linguistic hospitality,
and the virtues of translation work as a professional activity. Second, in the spirit of
philosophy of technology, we asked what machines do to the activity of translation.
We argued that machines do not translate words, as human translators (arguably) do,
but rather transcribe notations. We then presented the enduring philosophical chal-
lenges of logocentrism and phonocentrism and identified performance as central to
MT. Third, we discussed how the limitations of MT became ethical questions. We
explicitly asked how MT can accommodate responsibility, to what extent it can
create linguistic hospitality, and how it affects the virtues of the work of translation.
Our approach opened the ethics of MT across different levels of granularity: at the
micro level of individual sensemaking, the meso level of mediation between oneself
and another, and the macro level of the political economy of MT.

Acknowledgments One author (Reijers) was funded by the European Research Council (ERC)
under the European Union’s Horizon 2020 Research and Innovation Programme (Grant Agreement
No. 865856).

References

Appadurai A (2003) Commodities and the politics of value. In: Pearce S (ed) Interpreting objects
and collections. Routledge, London
Arendt H (1958) The human condition, vol 24. University of Chicago Press, Chicago, IL. https://doi.org/10.2307/2089589
Benjamin W (2012) The translator’s task. TTR: Trad, Terminol, Réd 10(2):151. https://doi.org/10.7202/037302ar
Bottone A (2013) Translation and justice in Paul Ricoeur. In: Foran L (ed) Translation and
philosophy. Peter Lang, Oxford
Cáceres Würsig I (2017) Interpreters in history: a reflection on the question of loyalty. In: Valero-
Garcés C, Tipton R (eds) Ideology, ethics and policy development in public service interpreting
and translation, vol 1. Multilingual Matters, Bristol; Blue Ridge Summit, pp 3–20
Derrida J (1998) Of grammatology. Johns Hopkins University Press, Baltimore, MD; London. (G. C. Spivak, Trans.; Corrected)
Dewey J, Tufts J (1909) Ethics. Henry Holt and Company, New York, NY
DuPont Q (2018) The cryptological origins of machine translation, from al-Kindi to Weaver.
Amodern 8. https://amodern.net/article/cryptological-origins-machine-translation/
Eco U (1994) The limits of interpretation. Indiana University Press, Bloomington, IN
Eco U (1995) The search for the perfect language. Blackwell, Cambridge, MA. Translated by James
Fentress
Feenberg A (2010) Ten paradoxes of technology. Techne 14(1):3–15
Fodor JA (1975) The language of thought. Thomas Y. Crowell, New York, NY
Foucault M (2002/1966) Order of things. Routledge, New York, NY
Friedman B, Kahn P, Borning A (2002) Value sensitive design: theory and methods. Univ
Washington Tech Rep 2(12):1–8
Gardner M (1958) Logic machines and diagrams. McGraw-Hill, New York, NY
26 W. Reijers and Q. Dupont

Goodman N (1976) Languages of art. Hackett, Indianapolis, IN


Goodman N (1978) Ways of worldmaking. Hackett, Indianapolis, IN
Gorgoni G, Gianni R (2021) Responsibility, technology, and innovation. In: Reijers W, Romele A,
Coeckelbergh M (eds) Interpreting technology. Rowman and Littlefield, London
Heidegger M (1977) The question concerning technology and other essays. Garland Publishing,
Inc, New York, NY
Herman D (2009) Narrative ways of worldmaking. In: Heinen S, Sommer R (eds) Narratology in
the age of cross-disciplinary narrative research. De Gruyter, Berlin. https://doi.org/10.1515/
9783110222432.1
Husserl E (1969) Ideas. Collier-Macmillan, Ltd, London. (Boyce Gibson, trans.)
Hutchins JW (2006) Machine translation: history. In: Asher R (ed) Encyclopedia of language and
linguistics. Elsevier Science & Technology Books, San Diego, CA
Kearney R (2007) Paul Ricoeur and the hermeneutics of translation. Res Phenomenol
37(2):147–159
Kenny D (2011) The ethics of machine translation. In: New Zealand Society of Translators and
Interpreters Annual Conference 2011, 4–5 June 2011, Auckland, New Zealand. ISBN 978-0-
473-21372-5
Kenny D, Moorkens J, do Carmo F (2020) Fair MT: towards ethical, sustainable machine transla-
tion. Transl Space 9(1):1–11
Kittler F (1990) Discourse networks 1800/1900. Stanford University Press, Stanford,
CA. Translated by Michael Metteer and Chris Cullens
Knuth D (1997) The art of computer programming. Addison-Wesley, Reading, MA
Latour B (1994) On technical mediation. Common Knowl 3(2):29–64
MacIntyre A (2007) After virtue: a study in moral theory, 3rd edn. University of Notre Dame Press,
Indiana, IN. https://doi.org/10.1017/CBO9781107415324.004
Markley R (1993) Fallen languages: crises of representation in Newtonian England, 1660-1740.
Cornell University Press, Ithaca, NY
Moorkens J (2017) Under pressure: translation in times of austerity. Perspect Stud Translatol
25(3):464–477. https://doi.org/10.1080/0907676X.2017.1285331
O’Mathúna DP, Hunt MR (2019) Ethics and crisis translation: insights from the work of Paul
Ricoeur. Disast Prevent Manag 29(2):175–186. https://doi.org/10.1108/DPM-01-2019-0006
O’Mathúna DP, Parra Escartín C, Roche P, Marlowe J (2020) Engaging citizen translators in
disasters. Transl Interpret Stud 15(1):57–79
Parra Escartín C, Moniz H (2019) Ethical considerations on the use of machine translation and
crowdsourcing in cascading crises. In: Federici FM, O’Brien S (eds) Translation in cascading
crises. Routledge, New York, NY, pp 132–151
Pitkin HF (1987) Rethinking reification. Theory Soc 16(2):263–293
Pym A (2001) Introduction: the return to ethics in translation studies. Translator 7(2):129–138.
https://doi.org/10.1080/13556509.2001.10799096
Raley R (2003) Machine translation and global English. Yale J Critic 16(2):291–313. https://doi.
org/10.1353/yale.2003.0022
Reijers W (2020) Responsible innovation between virtue and governance: revisiting Arendt’s
Notion of work as action. J Respons Innov 7(3):471–489. https://doi.org/10.1080/23299460.
2020.1806524
Ricoeur P (2006) On translation. Routledge, Abingdon
Slaughter MM (1982) Universal languages and scientific taxonomy in the seventeenth century.
Cambridge University Press, Cambridge
Stiegler B (2011) Pharmacology of desire: drive-based capitalism and libidinal dis-economy. New
Form 72(72):150–161. https://doi.org/10.3898/newf.72.12.2011
Venuti L (1991) Genealogies of translation theory: Jerome. Traduire La Théorie 4(2):5–28. https://
doi.org/10.1215/01903659-2010-014
Verbeek P-P (2005) What things do; philosophical reflections on technology, agency, and design.
Pennsylvania University Press, Pennsylvania, PA
2 Prolegomenon to Contemporary Ethics of Machine Translation 27

Vieira LN, O’Hagan M, O’Sullivan C (2020) Understanding the societal impacts of machine
translation: a critical review of the literature on medical and legal use cases. Inf Commun Soc
24(11):1515–1532. https://doi.org/10.1080/1369118x.2020.1776370
Weaver W (1955) Translation. In: Machine translation of languages. Technology Press of Massa-
chusetts Institute of Technology, Boston, MA
Winner L (1980) Do artifacts have politics? Daedalus 109(1):121–136. https://doi.org/10.2307/
20024652
Zielinski S (2006) Deep time of the media: toward an archeology of hearing and seeing by technical
means. MIT Press, Boston, MA
Chapter 3
The Ethics of Machine Translation

Alexandros Nousias

Abstract Language technologies are gradually turning into key modalities of our
algorithmic present and future. Real world texts embed patterns and patterns hack
natural meaning, following the language structure, word ordering, the underlying
values and perceptions of a given semantic agent. Understanding the logic behind
computational linguistics and meaning formulation provides the necessary theoret-
ical substrate for ethical screening and reasoning. The present paper aims to map the
components and functionalities of a random linguistic system, tracking all stages of
semantic rendering throughout the semantic cycle and providing recommendations
for ethical design optimisation. We attempt to articulate the information space as a
transition process of natural and/or non-natural phenomena into subjective informa-
tion models, driven by subjective human experience and worldview: To realise the
information space as a dynamic space, full of arbitrary features, where the informa-
tion maker and the information receiver both affect/influence and are affected/
influenced by the system. This article is not a practical toolkit towards the application
of ethics and further social considerations on Natural Language Processing at large
and/or more specifically Machine Translation ones; it is rather a contribution in
shaping the appropriate framework within which the Language Technology
(LT) stakeholders may identify, analyse, understand and appropriately manage
issues of ethical concern. It is a plea for multimodal computational linguistic
management, via interdisciplinary groups of professionals, in order to optimise the
research around the impact of computational linguistics on human rights, language
diversity, self-identity and social behaviour. Language technologies need to articu-
late complex meanings and subtle semantic divergences. New external metrics need
to be added, based on legal, ethical, social, anthropological grounds. This is a
necessary puzzle to solve, in order not to discount future structural rewards.

Keywords Ethics · Machine translation · Language · Contextualisation

A. Nousias (✉)
Present Address: National Centre for Scientific Research – Demokritos, Athens, Greece
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023


H. Moniz, C. Parra Escartín (eds.), Towards Responsible Machine Translation,
Machine Translation: Technologies and Applications 4,
https://doi.org/10.1007/978-3-031-14689-3_3

3.1 Introduction

“M[achine] L[earning] models being trained today might still be in production in 50 years,
and that’s terrifying” (Narayanan 2018, https://twitter.com/random_walker/status/993866661852864512).
Apparently, this is the case for
Natural Language Processing (NLP) as well. The way NLP models reflect language
and meaning, regardless of the degree of their pervasiveness, depends on the system
parameters, as selected and configured in their initial design. As NLP becomes
increasingly widespread and uses more social data, the situation has changed: the
outcome of NLP experiments and applications can now have a direct effect on
individual users’ lives (Hovy and Spruit 2016). Intuitively, this raises the question of
which parameters designers should choose for their models in the first place. In other
words, how can a designer guarantee that the model is what it ought to be? This is where
ethics comes in, still with no guarantees whatsoever. As
Zunger, former principal engineer at Google and Chief Architect for Social con-
tends, “people sometimes forget the degree to which engineering is inextricable from
human society, human norms and human values” (Zunger 2017). In regard to
Language Technology (LT), we claim that understanding the logic of language
formulation and the critical factors in meaning production will articulate the ethical
issues at play and allow NLP to develop in an ethical way. Such an ethical exercise is
an indispensable requirement for the language systems we are about to shape for
possibly the next 50 years.
In the emerging algorithmic reality, an era of extended modularity (Kostakis
2019), language and the words used are tagged as digital language resources
(whether text, speech, or other) and essentially exploited and understood as data.
As makers put semantic context into the language resources, these contingently
qualify as digital semantic capital, that is, “any content that can enhance someone’s
power to semanticise something” (Floridi 2018). The point where language
resources operate either as data or semantic capital provides the critical borderline
between humans and machines. Humans are by design and by default about seman-
tics and consequently pragmatics. Machines by design lack semantics. Machines at
first instance were designed to handle solely syntax, gradually turning to agents of
meaning generation. As such, the crucial point for the development of machine-
learned models for automated NLP and relevant Machine Translation
(MT) applications goes beyond simple lexicography, language data linking and
annotation. The endgame here is to provide accurate meaning in the appropriate
context, aiming to minimise data and/or model and/or annotation bias, social or
semantic misrepresentation, trolling, hate speech, toxicity, computational propa-
ganda, etc. Providing the appropriate semantic context in the use of digital language
resources constitutes the central ethical destination and, admittedly, a hard ethical
puzzle. The success of relevant applications like MT, language identification and text
classification lies in such contextual appropriation. The common ethical denominator of
relevant applications and technologies is choice: choice in language resource

creation, collection, classification, annotation and linking as well as findability,
accessibility, interoperability and, finally, use and reuse. It is a choice that will
define language data linking at the level of meaning and the overall semantic formulation
henceforth. It is also a choice with certain legal, ethical and societal repercussions,
with massive impact on the human self, human identity and
human perception of society, nature, life and reality. However, such choice is not the
output of a binary concept of good vs. bad, but the output of a sophisticated
navigation within complex systems, comprised of continuous tensions between
opposing legal, ethical, technical and conceptual interests that need to be secured
in an appropriate fashion. Our starting point is the identification of the inherent
philosophical problems of language, based on the in-depth analysis of Avgelis
(2014) as included in Andriopoulos (2021). Training a language model to predict
a word or providing an MT output may at first instance be quite straightforward. But
does this computational language processing reach parity with the underlying
meaning? Do model makers have a common understanding based on a common
semantic starting point in regard to the processed language theme? Are word
embeddings neutral or do word embeddings capture the linguistic/semantic nuances?
This chapter discusses the language/meaning dichotomy and assesses on philosoph-
ical terms the semantic shift potential when language is subject to modelling and the
underlying complexity.
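The question whether word embeddings are neutral can be made concrete with a small probe. The sketch below is illustrative only: the vectors are hand-made stand-ins (a real probe would use trained embeddings such as word2vec or GloVe), and the "he"/"she" difference vector is one common heuristic for a gender direction, not a method proposed in this chapter.

```python
# Toy probe for gender association in word embeddings.
# The vectors are synthetic, hand-picked for illustration only.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# 3-d stand-ins for trained vectors (assumption: dimension 0 loosely
# tracks a "gender" direction in this toy space).
vectors = {
    "he":     ( 1.0, 0.2, 0.1),
    "she":    (-1.0, 0.2, 0.1),
    "doctor": ( 0.4, 0.9, 0.3),
    "nurse":  (-0.5, 0.8, 0.3),
}

# Gender direction: the difference between "he" and "she".
gender = tuple(a - b for a, b in zip(vectors["he"], vectors["she"]))

for word in ("doctor", "nurse"):
    score = cosine(vectors[word], gender)
    print(f"{word}: projection on he/she axis = {score:+.2f}")
```

In this toy space "doctor" projects towards "he" and "nurse" towards "she"; an analogous, non-zero projection measured on trained embeddings is exactly what is meant by embeddings capturing, rather than neutralising, linguistic and social nuances.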

3.2 Language and the Problem of Meaning

Matter exists in the physical world independent of language and communication,
whether speech or text, that is logos (in Greek: λóγoς). Linguistic rendering of matter
comes at a later stage, adding to language and words an autonomous semantic
feature. Aristotle identifies the core philosophical problem of language as a problem
of meaning derived from the matter/language dualism and the fluid contextual
semantics thereof (Aristotle, De Interpretatione 16a3-8). ‘Logical Empiricism’
introduces an additional rendering layer, extending language from its syntactic and
semantic relationship towards matter or facts, to its dependency with the language
subject, that is the human agent that appropriates language into their personal time
and space, empirical background, beliefs and overall value system. Riding the wave
of philosophical interpretation of language, Clark and Schober (1992) state that “The
common misconception is that language has to do with words and what they mean. It
doesn’t. It has to do with people and what they mean”. People are considered agents
of their semantic capital, namely their personal capacity to provide meaning through
their personal course in time and space. Such meaning provision is a subjective
cultural output with given implications to people and to society at large.
Converting the philosophical problem of language to a problem of meaning,
philosophy points out the very complex interconnection between the two core
underlying modalities of language that are semiotics and semantics. Semiotics refers
to the signs and symbols, while semantics refers to the overall meaning, the context.

Language functionality derives from symbols. Each term, word etc. is a symbol fit
for purpose and language functionality is the output of a random symbol juxtaposi-
tion so as to produce meaningful information. Symbols “denote” things, namely all
natural and non-natural phenomena in language through coherent semantics. In the
theory of meaning, semiotics discerns the signifier from the signified. Aristotle
identifies the problematic of meaning in language as functionality and consistency
(Avgelis 2014 as included in Andriopoulos 2021). Symbols do not cite natural or
non-natural phenomena directly, but through meaning derived from consciousness,
experience, culture, information and the knowledge base of a given epistemic agent.
In other words, symbols, as expressed in different spoken languages, dialects at
different spaces and at different times, do not signify the phenomena per se, rather
views, perspectives, norms and values across these random parameters. They are
semantic nuances that need to be annotated and modelled accordingly so as to
appropriate and contextualise symbols as expressed across languages, groups,
space and time. Values Across Space and Time (https://www.vast-project.eu/vision/) is a
very interesting European Union Horizon 2020 research and innovation programme that
envisions capturing different semantic states of that sort, focusing on values through
advanced modelling, methods, techniques and digital tools, enabling the collaborative
study, annotation and continuous capture of experience. Just as with value appropriation,
we believe that a corresponding semantic appropriation of symbols is imperative.
Back to the task of symbol appropriation, we identify a semantic cluster of three
interconnected nodes: (a) the symbols per se; (b) the expressions of the soul as
shaped by consciousness, experience, culture, information and the knowledge base
which in turn shape how phenomena are expressed; and (c) the phenomena them-
selves which exist prior to or independently from the language. This makes ques-
tionable the adequacy or consistency of the symbol (or word)/thing ratio of
language. Aristotle (in his De Interpretatione) seems to be navigating within such
polarity:
1. Symbols represent things or a type of data, words reflect realities or a type of
information, and sentences provide the state of these realities, or ground
knowledge;
2. Linguistic expressions are meanings specific within an intersubjective
framework;
Evidently, both theses provide an ontological character, as they represent a universe
of things, their relations and an implied or expressed set of rules thereof thus
providing room for further inferences. The relation between a word and a thing ceases
its linear course, as things are replaced by thought, transforming it (the relationship)
into a complex and dynamic system in a state of informational modelling, a mental,
creative and intersubjective process that extends the is to the could or
should, flirting with the idea of naturalism. It is not by coincidence that Plato (in the
Cratylus) understands ‘logos’ (λóγoς) as a mix of language and intellect with the
‘nous’. Therefore, logos in Greek stands for both intellect and language. Language
turns out to be the reflection of the relations between ideas. Language, reality and
ideas are absolutely discernible components of meaning generation. The essence of
language lies in depicting the dialectic relation between the ideas.
A very straightforward question that arises in this complex system of meaning is
whether language mirrors the world or generates meaning via an ‘intersubjectivity’
reached by ‘convention’. When using the term ‘intersubjectivity by convention’,
Aristotle refers to the contingent ability of varying interlocutors or semantic agents,
to understand the meaning a hypothetical semantic agent A is conveying. The
essence of meaning lies not in the image of a thing, a fact or a phenomenon, but
rather in the intersubjective function of language and its underlying semantics.
Linguistics and the communication thereof are about delivering/receiving meaning
or providing a statement about something. It is not about naming a thing. Such
intersubjectivity requires conventional convergence, a consensus on the basics. At
this stage such consensus lies in the variant prospects of language and logos. When
communicating, we aim at sending/receiving meaning, via a given set of linguistic
symbols, the syntax and grammar thereof. Meaning does not stand for a transcending
entity referenced by language; rather it is the output of a constructive process.
Meaning is conveyed through language from the moment we reach consensus in
regard to the infinite informatic capabilities of logos.
A clear distinction between naturalism and convention may prove helpful at this
point. There might be a conflict between language as a reality and language as an
informatic convention. The semantic function of language (as logos) cannot be
articulated under a symbol/thing ratio, as the latter allows variable interpretations,
depending on the approach, the interests, and the perceptions of the observer. How
we choose the set-up of logos is a constructive process that shapes the language used,
which in turn delivers the information communicated, which in turn conveys a set
meaning accordingly. Remixing Hume’s views in regard to natural theory (Wacks
2006) with the problematic of meaning analysed herein, the meaning conveyed by
the selection of a specific set of language, in a set syntax and grammar is neither
equal to what it is nor to what it ought to be. It is neither a reflection of reality, a
mirroring or portraying nor an output of semantic moral reasoning. As Floridi (2019)
nicely says, “the transformation of the world into information is more like cooking:
the dish does not represent the ingredients, it uses them to make something else out
of them, yet the reality of the dish and its properties hugely depend on the reality and
the properties of the ingredients”. I would add that the transformation of the world
into information is carried out via the logos and generation of meaning. In other
words, it is a shift from representation to interpretation through perception. Alter-
natively, in an analogy to Bentham’s critique of natural law (Commentaries 1765 as
referenced in Wacks 2006), meaning generation through language formulation is or
could be a ‘private opinion in disguise’, something somewhat ‘censorial’ (Bentham
Manuscripts, University College, London Library). After all, the actual rule of law is
language with reason in an already established ethical context.
This process of linguistic creativity qualifies for an ethical assessment and
analysis. The implications of such conveyed meaning to the recipient and society
at large claim our attention and ethical processing and reasoning as well, as they urge
the semantic agent (the information sender) to step backwards in order to reassess the
constructive process of the former meaning generation. Add multilingualism and the
embedded hindrances, constraints and compromises into the mix and what you get is
a set of subsequent interfaces comprised of variant points of interest through which
humans access reality. Each interface carries its own requirements, design, logic,
ethics and implications thereof. It develops in its own contextual framework serving
the very purpose of its user, the semantic agent, as formulated by their perception.

3.3 From Meaning Construction to Meaning Consumption

In the previous section we identified language formulation and usage as a constructive
process, filtered by the cultural, mental, biological, socioeconomic, religious
(and the list goes on) affordances and constraints of a semantic agent, a meaning
architect. Language is constructed in a specific context for a specific purpose,
modelled in a specific interface, analysed as a set of many semantic nodes or what
Floridi (2013, 2019) calls Levels of Abstraction (LoAs). The core of the ethics
problematic lies in the sender/receiver relation. This brings us to a twofold question.
What are the constructive elements and processes to produce a LT output and what
are the implications of such output to the end user? In a nutshell, what are the ethics
of the language-as-logos lifecycle from construction to consumption? We will
attempt to answer these questions with a high-level application of the constructive/
interpretative process of information design, as introduced by Floridi (2013, 2019).
We will do so by applying the concepts developed therein in the domain of LT and
exploring any tensions that may crop up between the sender/producer of logos and
the receiver/consumer of the semantics beyond language.
For the purposes of the present work, the sender/producer is the linguist, the
annotator, the language technologist, the person who creates, edits, accesses and
shares language resources for the construction of Language Technology (LT) applications
(such as MT) and of the technologies used in their development (such as Language
Modelling), which are eventually released on the market as language services, tools and
products. The starting point of such a constructive
process of meaning generation is the viewpoint of the semantic agent. According
to Floridi (2013, 2019), this viewpoint will define their (the semantic agent’s)
‘conceptual interface’ or LoA. An LoA consists of a collection of observables,
each with a well-defined possible set of values and outcomes (Floridi 2013, 2019).
For example, if agent A looks for an address in a given city (e.g. Emerald City), the
address is the observable and Emerald City is the applicable LoA. The values of the
observable address are (among others) a street name, a number and a postal code.
Switching from a limited city map to a country map (e.g. Land of Oz), you get
multiplied observables (addresses in our example) as the query is extended in scale
via a different LoA. Switching to a Google search, you get a much different and
extended LoA with more observables, like for example results with similarities to the
initial query. In our example the returned results would include films instead of cities,
namely the Wizard of Oz, directing the semantic output to something totally different and
absolutely more (or less) relevant.
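Floridi's notion of an LoA as a collection of observables, each with a well-defined set of possible values, lends itself to a small data-structure sketch. Everything below (the class names, the Oz-flavoured value sets) is an illustrative assumption in the spirit of the Emerald City example, not Floridi's own formalism.

```python
# Minimal sketch of Levels of Abstraction (LoAs) as data structures,
# assuming a reading in which an LoA is a named set of typed observables.

class Observable:
    def __init__(self, name, values):
        self.name = name
        self.values = set(values)   # well-defined set of possible values

    def admits(self, value):
        return value in self.values

class LoA:
    def __init__(self, name, observables):
        self.name = name
        self.observables = {o.name: o for o in observables}

    def describe(self, entity):
        # Project an entity onto this LoA: keep only the features its
        # observables can express; everything else is invisible at this level.
        return {k: v for k, v in entity.items()
                if k in self.observables and self.observables[k].admits(v)}

street = Observable("street", {"Yellow Brick Road", "Poppy Lane"})
city   = Observable("city",   {"Emerald City"})
region = Observable("region", {"Land of Oz", "Kansas"})

city_map    = LoA("CityMap",    [street, city])
country_map = LoA("CountryMap", [street, city, region])

address = {"street": "Yellow Brick Road", "city": "Emerald City",
           "region": "Land of Oz", "film": "The Wizard of Oz"}

print(city_map.describe(address))     # region and film are invisible here
print(country_map.describe(address))  # region now observable, film still not
```

The point the sketch makes is structural: the same entity yields different models under different LoAs, because each LoA decides which observables exist at all.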
The LoA qualifies the level at which a system is considered by the selection of its
observables. An interface is a set of LoAs under which a given system is analysed.
Models are the outcome of the analysis of a system developed at some LoAs. The
way a semantic agent models a system is subject to the selection of its observables
and/or the selection of its LoAs, that include different types of observables. Such
selection affects the language model and the semantic output. To put it in context,
let’s reflect on the nurse/doctor example, inspired by Stanovsky et al. (2019) and
assume that agent A is analysing texts related to health services at the LoA of
HUMAN CAPITAL; in these texts, the observables for NURSE consist of values
related to females (e.g. female names) while the corresponding values for DOCTOR
are associated with male persons. Apparently, the idea of LoAs commits one to the
existence of some specific types of observables (nurse: female, doctor: male),
qualifying the system (that is health services) accordingly. Add observables regard-
ing e.g. OUTER LOOKS and you get the ‘pretty’ and ‘handsome’ values directly
(and undesirably) linked with the profession parameters in the model we examine.
This could be highly problematic for an annotated corpus and the MT model where,
by design, the system will exhibit distortions that ‘corrupt’ the receiver, the end user,
who consumes the semantic output. Google Translate has traditionally provided only one
translation for a query, even if the translation could have either a feminine or a
masculine form (Kuczmarski 2018). Google started tackling this malfunction by
offering the end user gender-specific translations for some gender-neutral words like
the English “nurse”, which can be translated into French both as “infirmier” (mas-
culine) and as “infirmière” (feminine). Alternatively, the user is allowed at times to
choose between gendered translations. However, malfunctions of that sort are distributed
across the NLP domain, and information ethics in this context may provide insights into
information modelling via a given set of LoAs.
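The gender-specific behaviour described above can be sketched as a toy lookup that surfaces every gendered alternative instead of silently choosing one. The dictionary and function below are hand-made stand-ins for illustration, not the Google Translate API or its actual logic.

```python
# Hedged sketch of gender-aware translation output, loosely modelled on the
# behaviour described for Google Translate (Kuczmarski 2018).

# Source word -> list of (translation, gender tag) pairs; "n" = neutral.
EN_FR = {
    "nurse":  [("infirmier", "masculine"), ("infirmière", "feminine")],
    "doctor": [("médecin", "masculine"), ("médecin", "feminine")],
    "bread":  [("pain", "n")],
}

def translate(word):
    """Return every gendered alternative instead of silently picking one."""
    entries = EN_FR.get(word)
    if entries is None:
        raise KeyError(f"no entry for {word!r}")
    if len({t for t, _ in entries}) == 1:
        return [entries[0][0]]            # no surface ambiguity to expose
    return [f"{t} ({g})" for t, g in entries]

print(translate("nurse"))   # both forms surfaced, the user chooses
print(translate("bread"))   # single form, returned as-is
```

The design choice is the ethically relevant part: where the target language forces a gender the source does not specify, the system exposes the choice to the receiver rather than encoding a stereotype as the single default.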
Similarly, in a different context, the training of Machine Learning (ML) models
on annotated language data, in which dialectical differences have not been consid-
ered, and their subsequent deployment in automatic hate speech detection systems
without again putting into perspective the appropriate LoAs, may result in wrongly
associated dialectical expressions with hate markers, thus potentially amplifying
harm against minority populations. Sap et al. (2019) uncover unexpected correla-
tions between surface markers of African American English (AAE) and ratings of
toxicity in several widely used hate speech datasets. Then, they show that models
trained on these corpora acquire and propagate these biases, such that AAE tweets
and tweets by self-identified African Americans are up to two times more likely to be
labelled as offensive compared to others. The researchers provide a two-step meth-
odology. They first empirically characterise the racial bias present in several widely
used Twitter corpora annotated for toxic content and quantify the propagation of this
bias through models trained on them. They establish strong associations between
AAE markers (e.g., “n*ggas”, “ass”) and toxicity annotations, and show that models
acquire and replicate this bias: in other corpora, tweets inferred to be in AAE and
tweets from self-identifying African American users are more likely to be classified
as offensive. They then conduct an annotation study, where they introduce a way of
mitigating annotator bias through dialect and race priming. We focus on dialect, as
explained below.
In this example we could assume that annotators engage in an analysis on a
supposed LoA ‘OFFENSIVE LANGUAGE’, consisting of observables for SALUTATIONS
with values like “wussup n*gga!” and “what’s up bro!”, REFERENCES
with values like “I saw him yesterday” and “I saw his ass yesterday” and ETHNIC-
ITY with values like black male/female, white male/female. The annotators annotate
the given digital interactions, on the LoA OFFENSIVE LANGUAGE. This LoA
marks a target dialect in the context of hate speech as highly toxic, thus corrupting
the receivers of the said cognitive output. Switching from LoA OFFENSIVE
LANGUAGE to LoA DIALECT, you get an enhanced understanding of the sub-
jectivity of offensive language, as you extend the analysis from a different standpoint
of variable properties and values (please refer to the Emerald City example mentioned
earlier). What you get, as the study shows, is that annotators primed with dialect
information are significantly less likely to label AAE tweets as offensive to anyone.
What matters here in the analysis of
hate speech is the occurrence and flow of information, that is the Twitter corpora at a
different LoA, namely DIALECT, where the model returns more relevant results.
Different types of analysis lead to different models that may produce more accurate
and more ethically aligned outcomes. The key is the information qualities at hand.
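The disparity Sap et al. (2019) report can be expressed as a simple per-group label-rate check. The mini-corpus below is fabricated for illustration, and "SAE" is a stand-in tag for non-AAE tweets; the actual study infers dialect probabilistically over large annotated Twitter corpora.

```python
# Sketch of the kind of disparity check behind Sap et al. (2019): compare the
# rate of "offensive" labels across inferred dialect groups in a corpus.
from collections import defaultdict

corpus = [
    # (inferred dialect, annotated label) -- fabricated toy data
    ("AAE", "offensive"), ("AAE", "offensive"), ("AAE", "not_offensive"),
    ("AAE", "offensive"), ("SAE", "not_offensive"), ("SAE", "offensive"),
    ("SAE", "not_offensive"), ("SAE", "not_offensive"),
]

def offensive_rate_by_group(rows):
    """Fraction of items labelled 'offensive', per dialect group."""
    counts = defaultdict(lambda: [0, 0])        # group -> [offensive, total]
    for group, label in rows:
        counts[group][1] += 1
        if label == "offensive":
            counts[group][0] += 1
    return {g: off / tot for g, (off, tot) in counts.items()}

rates = offensive_rate_by_group(corpus)
ratio = rates["AAE"] / rates["SAE"]
print(rates)                                    # per-group offensive-label rate
print(f"AAE items labelled offensive {ratio:.1f}x as often")
```

A disparity of this kind in the annotations is exactly what a model trained on the corpus will acquire and replicate, which is why the check belongs before training, at the level of the dataset.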
LoAs allow us to navigate among these systemic bugs on a better-oriented course, as they
provide insight into the system’s conceptual interface, as well as the system maker’s
perception, like for example the annotator in the race bias use case. The model of
Google Translate, on the other hand, replicated at first instance a function of the
available observables, based on modelling specifications like consistency
with the data, background informativeness, social conception of given stereotypes
etc. Information Ethics or (for the economics of the present work) a branch of ethics
in LTs could make use of the LoA in order to: (a) identify whether a system is
assessed at the appropriate level; and (b) explore possible complex connections,
clusters or overlaps among various LoAs in a given context. This is clearly illus-
trated by the example provided in Floridi (2013, 2019) which additionally highlights
the importance of LoAs in the information ethics modelling process: “In November 1999,
NASA lost the $125m Mars Climate Orbiter (MCO) because the Lockheed Martin engineering
team used English (also known as Imperial) units of measurement, while the agency’s team
used the metric system. As a result, the MCO crashed into Mars.” LoAs are not equal to
context. LoAs are applied within contexts to
articulate semantic implications for a purpose, in our case ethical screening, in a
more consistent and straightforward fashion. Ultimately, LoAs may disclose latent
tensions between the maker and the user, the sender and the receiver, between
information construction and information consumption.

3.4 Semantic Mismatches in Meaning Formulation

A complex and dynamic space, like the information space, requires a very delicate
set of skills and well-designed screening tools that may allow its inhabitants to truly
understand its functional design, the logic that lies behind and the ethics thereof.
Having these properties activated, the system interpreter may tackle or at least mitigate
contingent malfunctions like bias, propaganda, hate speech, semantic misrepresentation,
trolling or, at a more esoteric level, confusion, overload and
misconception, issues that we touch upon in the case studies below. But still we, the
end (semantic) users, fail to perceive our presence in the digital space and the
occurring meaning formulation system, as mediated by technology. The same
analogy can be used in the language/meaning ratio, where a message receiver
perceives subjective language as objective reality. In regard to MT applications,
we perceive MT outcomes as lacking “understanding of language nuances” or
“understanding of meaning hidden in linguistic utterances” in the background.
This means that when we form the impression that a doctor is by default male and a nurse by
default female, or that African American English inclines towards hate speech
regardless of the dialect’s specific semantic features, we fail to realise that this
impression is model-mediated: the model was trained under a not-so-accurate
annotation schema, on a training dataset with unequal language or social group
representations, and thus requires semantic filtering and model repackaging. Below we examine
three complementary case studies, denoting diverse hidden implications of different
MT-related systems, articulating the influence of the input data and parameters set by
the information makers and the implications of each given choice to the information
receivers.

3.4.1 Digital Language Misrepresentation

Semantic mismatch is related to the diversity and context of the input data as offered
in different languages. Human language technology (HLT) refers to the production of technologies that seek
to understand and reproduce human language (Sveinsdottir et al. 2020). All these
technologies produce tools that are used in a range of fields, e.g., communication,
health and education, and can significantly improve people’s quality of life (Meta-
Net). LTs have been mostly developed for high resource languages, meaning
languages captured in a large number of digital resources (e.g. datasets, lexica,
ontologies, terminologies, etc.). Such languages are English, German, Spanish,
Arabic and Mandarin Chinese.3 Setting aside the roughly 30 languages with a relatively
satisfactory number of resources, the remaining language capital lies at the margins
of the digital language resource landscape; in other words, it lacks digital resources

3 For Europe’s Languages in the Digital Age, please consult the Meta-Net White Paper series available at: http://www.meta-net.eu/whitepapers/overview.
38 A. Nousias

in quantity and, very likely, also in quality, if it exists at all. Moreover, the language
resource ecosystem looks even poorer when we add dialects to the picture, for which
digital resources are almost non-existent. The features of low resource languages are:
• Lack of linguistic expertise;
• Lack of a unique writing system or stable orthography;
• Limited presence on the web; and
• Lack of electronic resources for speech and language processing, such as monolingual corpora, bilingual electronic dictionaries, transcribed speech data, pronunciation dictionaries, and vocabulary lists (Krauwer 2003).
The shortage of language training data, whether text, audio or other media corpora, calls into
question the overall NLP application development framework, as it impedes the
digital presence and well-being of specific groups, with very negative social
implications.
In Sect. 3.1, we identified that meaning derives from a semantic cluster of three
interconnected nodes: (a) the symbols per se, (b) the expressions of the soul as
shaped by consciousness, experience, culture, information and the knowledge base
which in turn shapes how phenomena are expressed, and (c) the phenomena
themselves which exist prior to or independently from the language. In the context
of HLT, point (c) refers among others to data representing human attitudes and
behaviours. Indeed, we concluded that meaning formulation derives from a con-
structive process of combining, interpreting and repurposing phenomena-as-data and
not simply mirroring or portraying them (the phenomena). This is an invariably
necessary component for the evolving information models including those devel-
oped for LTs. What happens, however, when human attitudes, behaviours and
languages are not equally represented in the massive ‘artificialisation’ currently in
the making? Take social media data, for example, which provides access to
behavioural data at an unprecedented scale and granularity. However, using these
data to understand phenomena in a broader population is rather difficult due to their
unrepresentativeness and the bias of statistical inference tools towards dominant
languages and groups (Wang et al. 2019). While demographic attribute inference
could be used to mitigate such bias, current techniques are almost entirely monolin-
gual and fail to work in a global environment (Wang et al. 2019). This provokes
information reduction, altering the state of phenomena in both normative and
semantic sense, with further implications for the receiver and the society at large.
From an ethics perspective, what is important is to understand that such reduction
and consequent normative and semantic alteration is a non-natural construct that
needs to be addressed and assessed as such. In practice this requires (a) creating a
dataset representative of the types of diversity within languages; and (b) explicitly
modelling multilingual and code-switched communication for arbitrary language
pairs (Jurgens et al. 2017). This is not solely a data science task but a broader
interdisciplinary one, in which the establishment and further development of an interoperable
ecosystem of AI and LT platforms (Rehm et al. 2020) that allows appropriate
modelling is imperative. A model design needs to be aware of, and subject to, the
applicable system parameters that do or may affect language, such as general social
and economic factors, speaker background (birthplace, place of residence, educational
level, cultural beliefs), the situation in which an utterance is produced (formal or
informal setting, other participants), etc. It is the context that matters, not the words;
and context is a complex system.
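Requirement (a) above, a dataset representative of the types of diversity within languages, can at least be audited mechanically. The sketch below compares each language's observed share in a toy corpus against a target population share; every figure, language code and threshold here is invented purely for illustration.

```python
from collections import Counter

def representation_report(samples, population_shares):
    """Compare each language's observed share in a corpus with its
    (hypothetical) target share in the population of interest."""
    counts = Counter(lang for lang, _text in samples)
    total = sum(counts.values())
    report = {}
    for lang, target in population_shares.items():
        observed = counts.get(lang, 0) / total
        report[lang] = {
            "observed": round(observed, 3),
            "target": target,
            "under_represented": observed < target,
        }
    return report

# Toy corpus of (language, text) pairs -- purely illustrative.
corpus = [("en", "great product"), ("en", "works fine"),
          ("en", "fast delivery"), ("de", "sehr gut")]
shares = {"en": 0.5, "de": 0.3, "pt": 0.2}  # assumed population shares
print(representation_report(corpus, shares))
```

A real audit would of course need principled population estimates and dialect-level categories, which is precisely where the interdisciplinary work lies.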

3.4.2 Digital Language Pluralism and Reviews

A second problem with the semantic mismatches is the way in which a semantic
output is perceived. Digital rating illustrates this dimension at large. The number and
quality of user reviews greatly affects consumer purchasing decisions (Hale and
Eleta 2017). While reviews in all languages are increasing, it is still often the case
(especially for non-English speakers) that there are only a few reviews in a person’s
first language. Using an online experiment, Hale (2016) and Hale and Eleta (2017)
examined the value that potential purchasers receive when reading reviews written in
languages other than their native (first) language. For instance, native English
speakers may read reviews of the same product written in French or
German by users who are native speakers of those languages; but this does not
always seem to be the case. The fundamental question Hale poses is whether
reviews in different languages are analytically similar to each other. It appears that
speakers of different languages focus on different aspects, evaluate products differ-
ently, and/or have consistently different experiences (e.g., different
internationalisation/localisation choices for software or different information avail-
able for in-person activities), thus the reviews from one language may have less
relevance to individuals primarily speaking a different language (Hale 2016; Hale
and Eleta 2017). Hale concluded, for instance, that reviews in German, Norwegian,
and French are more strongly correlated with reviews in other languages than are
reviews in Japanese, Portuguese, or Russian, without however, explicitly defining
the specific language correlations thereof. Thus, the usefulness that users have from
reviews in other languages likely varies with the languages of the same language
family. The correlations between pairs of languages suggest that reviews from some
languages will be closer to the experience of a person speaking a correlated (but still
not explicitly defined) language, than reviews from other languages. This may be
due to underlying elements of culture that are captured by the language(s) of a
person. This, we may claim, is where the language/meaning dichotomy strikes again.
Hale’s (2016) and Hale and Eleta’s (2017) findings reveal that consumers (i.e.,
information receivers) value reviews in their first language the most. If so, the practice
of creating an average rating from reviews in multiple languages could be unhelpful
or even misleading, Hale concludes.
One way of tackling such outcomes would be “calculating the correlations
between languages and countries”, and thus deploying an MT design towards more
appropriate and adoptable reviews based on both linguistic and conceptual
components. For example, in a multilingual product/service review setting, native English
speakers, as the semantic agents with the lowest level of bilingualism, seem to be
less tolerant of foreign comments and reviews and closer to specific Anglo-American
ones, while native speakers of smaller languages, who are more frequently
bilingual, derive more value from foreign-language reviews. Following this
finding, if we treat information (the multilingual review in our example)
as a resource, it stands to reason that such a resource may be able to shape
behaviour, since behaviour is informed by information. As a matter of fact, such
information-as-a-resource covers a multitude of ethical issues like liability, testimony,
advertising, misinformation, deception and censorship, thus “literally
re-ontologizing our world” (Floridi 2013). In our example, reviews in languages
that correlate poorly with reviews in other languages may drive native English
speakers to click the translation button less often, resulting in less market
research, lower trust in the target product/service and possibly (at a highly
hypothetical level) fewer purchases. At such a macro-ethical level, ethics
screening of paradigms of this sort becomes imperative.
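The correlations Hale computes between review languages can be illustrated with a plain Pearson coefficient over per-item mean ratings, one vector per language. The hotels and scores below are entirely invented and do not reproduce Hale's data; they merely show how such a cross-language comparison could be set up.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length rating vectors."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# ratings[language][item] = mean star rating in that language (invented)
ratings = {
    "en": {"hotel_a": 4.5, "hotel_b": 3.0, "hotel_c": 4.0},
    "de": {"hotel_a": 4.4, "hotel_b": 3.2, "hotel_c": 3.9},
    "ja": {"hotel_a": 3.1, "hotel_b": 4.2, "hotel_c": 3.0},
}
items = sorted(ratings["en"])
for a, b in [("en", "de"), ("en", "ja")]:
    r = pearson([ratings[a][i] for i in items],
                [ratings[b][i] for i in items])
    print(f"{a}-{b}: r = {r:+.2f}")
```

In this toy setup the English and German vectors correlate strongly while the Japanese vector runs in the opposite direction, which is exactly the situation in which a single averaged rating would mislead.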

3.4.3 Multilingual Cross-Sectoral Data Access and Sustainability

Let us now switch to the medical sector and assume we are seeking evidence for the
safety and effectiveness of a drug or vaccine in order to provide a proof of concept
for the population at large. Due to conspiracy theories, lack of trust, distress and
inadequate data, we look for evidence outside the controlled settings of the clinical
trials and the sponsor’s semantic primes. Sounds disturbingly familiar, doesn’t it?4
Such real-world evidence refers to textual resources, randomly spread in socially
related domains, like social media or patient/doctor forums, where subjective assess-
ments of the target subjects (namely doctors or patients) on the efficiency and/or
added value of the drug or vaccine are harvested, contextualised, further analysed
(on certain LoAs) and probably reshaped or repurposed. Such content is typically
available as unstructured natural language text in multiple languages. Access to
better solutions for extracting evidence from multilingual content would have an
important impact in this case, as it would allow information receivers to derive
crucial insights, or to deliver a message that is either fact-checked or semantically derived out
of thin air. In practice, a combination of ML and NLP applications, such as MT, needs to
be applied in order to convert this raw pharma-centric text into meaningful
multilingual information. In other words, a range of models generated at a range of LoAs
will be required for ethical screening in context. Such a semantic design requires an
innovative data Extraction-Transformation-Loading process. We believe that this
process should be based on linked data principles and methods given that they can
take advantage of the links between data and annotation schemas, developed out of
the right mix of domains and parameters.
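A minimal sketch of such a provenance-preserving Extract-Transform-Load pass is given below; the example.org URIs and the efficacy schema are hypothetical placeholders, and the "linking" is reduced to plain dictionary fields rather than a full linked-data stack.

```python
import json

def extract(raw_posts):
    # Extract: raw forum/social-media text together with its provenance.
    return [{"text": text, "source": source} for text, source in raw_posts]

def transform(records, schema_uri):
    # Transform: normalise the text and link each record to the
    # annotation schema it will be analysed under.
    return [{"text": r["text"].strip().lower(),
             "source": r["source"],
             "schema": schema_uri} for r in records]

def load(records):
    # Load: serialise; every record stays traceable to source and schema.
    return json.dumps(records, ensure_ascii=False)

raw = [("The vaccine worked well for me ", "https://example.org/forum/1")]
loaded = load(transform(extract(raw), "https://example.org/schema/efficacy#v1"))
print(loaded)
```

Keeping the source and the annotation schema attached to every record is what later makes the semantic transitions traceable for ethical screening at all.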

4 In the context of this paper, from now on we refer to the COVID-19 era.

A pilot experiment showcasing the above has been conducted in the context of the
Prêt-à-LLOD project, funded by the European Union’s Horizon 2020 research and
innovation programme.5 The Prêt-à-LLOD project focuses on Linguistic Linked
Open Data and caters for the development of multilingual resources intended to
support language transfer in various types of NLP systems and models. The pilot
experiment seeks novel pharmaceutical applications based on real-world
evidence. The pilot operates in a selection of contexts that include ENTITY RECOGNITION,
RELATION EXTRACTION, SENTIMENT ANALYSIS and EMOTION
ANALYSIS. Each of these is subsequently analysed and modelled at a predefined set
of LoAs for a predefined purpose. From an ethical perspective, the interest lies in
three main topics: (1) the qualities of real-world evidence (e.g., accuracy,
representativeness, completeness); (2) the identification and tracking of the semantic
transitions of real-world evidence throughout the semantic cycle (translation phase
included); and (3) the semantic output. Let us focus on the second topic of interest.
Such a use case typically involves one or more of the following steps:
• The manual annotation of a corpus;
• The development of ontologies, lexica and terminologies to be used in this annotation;
• The formulation of lexico-syntactic rules for the processing of the corpus; and
• The training of ML models based on the annotated data.
Annotators and pharma-agnostic language engineers would collaborate in this
scenario.
The pilot, however, aims to formulate a framework of configurable language
transfer pipelines enabled by the capabilities to discover, transform and compose
language resources developed within the Prêt-à-LLOD project, reducing the need for
bespoke engineering of the processing pipeline. The ethical rigour lies ahead, as the
semantic output makes its way through the communication channel to the
multilingual receiver and runs risks of overgeneralisation, confirmation bias, implicit bias
and topic under- and overexposure. As such, the inclusion of social science, law and
ethics in the process is warranted for a number of reasons. Indicatively, when applications
need more data, random human experiences, opinions, beliefs and preferences are
extracted, harvested and rendered into behavioural data in pursuit of further ‘emotion
scanning’, as word embeddings, for example, offer fertile ground for sociological analysis,
reaching at times the level of individuals.
Until recently, NLP mostly involved anonymous corpora, with the goal of
enriching linguistic analysis, and was therefore unlikely to raise ethical concerns.
Adda et al. (2014) touch upon these issues under “traceability” (i.e., whether
individuals can be identified): this is undesirable for experimental subjects, but
might be useful in the case of annotators in the event the annotation features are
ontologically poor and further documentation is required. The public outcry over the
“emotional contagion” experiment on Facebook (Kramer et al. 2014; Selinger and
Hartzog 2016) further suggests that data sciences now affect human subjects in real
time, and the LT domain needs to seriously reconsider the application of ethical
considerations to the research involved (Puschmann and Bozdag 2014). This brings
us to the need to assess the maker’s view in meaning formulation and linking.
Section 3.5 elaborates on a couple of case studies illustrative of the subjectivism,
the loopholes and the ethics gap at the design stage.

5 https://pret-a-llod.github.io/.

3.5 The Ethics of Semantic Design

In principle, our perception of the world is either direct, coming from our ‘first
hand testimony of the senses’ (Floridi 2019), or indirect, that is, ‘second hand
perception by proxy’: information properly interpreted and assigned a meaning
by a third party, a testimony. Semantic agents transfer information as perception to
each other, blurring the boundaries between ‘empirically knowing’ and ‘merely
being informed’. This brings us to a fundamental distinction of meaning as natural
meaning vs. non-natural or conventional one, a distinction that may allow us to
better understand and ethically assess use cases like ratings and real-world evidence
and how MT and further NLP applications are involved. Add into the mix the
various collaborative design processes of a given system, like Linguistic Linked
Data, and semantics turns into an absolutely non-natural, complex and dynamic
phenomenon in which, for proper ethical assessment, different LoAs not visible at first sight
become interpretation modules of added value. What do ratings tell us about product
x, service y, destination z, etc.? How did these ratings occur and what semantics did
they convey when initially processed and further translated? Can we identify,
evaluate and validate all data processing activities from input to output? These
kinds of questions lead to a basic conclusion: the artificial nature of design
intervention. Such intervention manifests as a series of tech-driven processing
activities over perception- and testimony-mediated data, which in turn lead to data and,
further, semantic hacking, served to end users (the message receivers) as reality.
Below, we examine some use cases and analyse the ethics of design in information
formulation from the maker’s perspective, as both designer and messenger.

3.5.1 Linking Lexical Knowledge at the Level of Meaning

The use cases around reviews and the articulated perception of real-world evidence
and testimony, as background modalities of information formulation, draw on
two basic information sources: (a) corpora, that is, collections of language data
formulated in a certain context, at a defined LoA, for a set purpose; and (b) lexicons
that focus on the general meaning of words and the structure of semantic concepts in
a certain context, at a defined LoA, for a set purpose. An overall system infrastructure
is complemented by metadata and typological databases that describe features of
individual languages and/or dialects (Cimiano et al. 2020). Typically, however,
corpora and lexicons in given contexts, at defined LoAs and for set purposes,
remain unconnected. Linking dictionaries and corpora at the level of
meaning is a crucial component in order to be able to derive new content (and
meaning as well), either within the same language or across multiple languages, or to
enrich currently available data. Pilot II of the project focuses on devising and
developing technology for interlinking lexical knowledge to other lexical resources
or encyclopaedic resources, such as WordNet or Wikidata (McCrae and Cillessen
2021), in order to facilitate rapid integration of lexicographic resources and allow for
wider application of these types of data for companies in the LT area, such as
multilingual search, cross-lingual document retrieval, domain adaptation, and lexical
translation. In particular, the goal of this pilot is to explore and use state-of-the-art
methods and techniques in computational semantics, data mining and ML for linking
language data at the level of meaning via available multilingual lexical content
interlinking, the creation of new lexical content from current datasets, the enrichment
of corpora etc. (McCrae et al. 2019). Relations between lexical elements can be
found in three levels:
• Lexical relations relate the surface forms of a word, e.g. to represent etymology
and derivation.
• Sense relations relate the meanings of two words, e.g. to express that two senses
are translations, synonyms or antonyms of each other.
• Conceptual relations relate concepts regardless of their lexicalisation. Examples
of such conceptual relations are the hypernymy or meronymy relations.
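The three levels can be rendered as a small typed structure; the words and links below are toy entries, not taken from WordNet, Wikidata or any actual lexical resource.

```python
from dataclasses import dataclass
from enum import Enum

class Level(Enum):
    LEXICAL = "lexical"        # surface forms: etymology, derivation
    SENSE = "sense"            # meanings: translation, synonymy, antonymy
    CONCEPTUAL = "conceptual"  # concepts: hypernymy, meronymy

@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    kind: str
    level: Level

links = [
    Relation("cat", "catty", "derivation", Level.LEXICAL),
    Relation("cat (feline)", "Katze (feline)", "translation", Level.SENSE),
    Relation("cat", "animal", "hypernym", Level.CONCEPTUAL),
]
by_level = {lvl: [r for r in links if r.level is lvl] for lvl in Level}
print({lvl.value: len(rs) for lvl, rs in by_level.items()})
```

Typing the level explicitly on every link is one simple way to keep the ethical screening aware of which kind of relation, and thus which kind of meaning claim, a given edge asserts.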
The task consists of: (a) the rendering of language data to lexical definitions and
further extension of these definitions to appropriate translations; and (b) the linking
of the corpus text to the dictionary content. The ethical rigour lies in the selected
requirements. In both cases we have a semantic transition from state A to state B. To
follow the semantic flow of every given semantic state, we need to screen (a) the
intrinsic information qualities, like accuracy of the resources, in order to control
potential data errors and consequent informational errors; (b) the contextual infor-
mation qualities like relevancy, timeliness and completeness; and (c) the represen-
tational information qualities like interpretability, ease of understanding and
consistent representations6 while setting the appropriate parameters (context,
LoAs, purpose). The main objective here is to understand whether the input
information is fit for reuse for a new purpose, to what extent, and under what
requirements. In other words, it is about proper domain adaptation. Where
domain adaptation is required, the maker needs to take into account that the
information quality properties weigh differently in each state of given parameters,
as we move from semantic state A to semantic state B, and so on. This means that

6 Floridi (2019) introduces the concept of Bi-Categorical Information Quality; its main properties are accuracy, objectivity, accessibility, security, relevancy, timeliness, interpretability and understandability.

transferring an MT system initially based on texts from a given domain (e.g. the
financial domain) to another domain (e.g. legal, media, news) requires adaptation
to new lexical and terminological data as well as the handling of different linguistic
structures, since each domain has its own vocabulary, grammar and inherent
properties.
Constantly reassessing the parameters is of high value as there are many ways of
expressing a state A or a state B and many different dimensions to look at. The next
logical question is how. Indicatively, such reassessment could take the form of a set
of questions that require a single YES/NO answer throughout the semantic cycle.
These questions will be ethically curated and appropriated in a fixed language within
specified context, LoAs and purpose parameters, thus allowing the computational
identification of possible semantic alterations of a given linguistic system as it
transits from semantic state A to semantic state B.7 The good news is that the
semantic distance between state A and state B is traceable and measurable. This is
similar to the way word embeddings operate in neural networks; MT architectures
such as OpenNMT, an open-source toolkit for neural MT (Klein et al. 2017), make
use of word embeddings. Neural networks measure similarities among words and
capture a great amount of real-world information, perhaps supererogatory at times,
thus underscoring the great value of ethical parameters. For example, Preotiuc-Pietro
et al. (2016) identified in their study isolated stylistic differences by using paraphrase
pairs and clusters from social media text. Paraphrases represent alternative ways to
convey the same information (Barzilay and McKeown 2001), using single words or
phrases (e.g. giggle/laugh or brutal/fierce) linked to user attributes (e.g. male/female
or of high/low occupational status). By studying occurrences within these paraphrase
pairs and clusters, Preotiuc-Pietro et al. (2016) directly presented the difference of
stylistic lexical choice between different user groups, while minimising the confla-
tion of topical differences. These stylistic differences entail predictive power in user
profiling and conformity with human perception. Translating paraphrases leads to a
marked improvement in coverage and translation quality, especially in the case of
unknown words, as paraphrases introduce some amount of ‘generalisation’ into
statistical MT (Callison-Burch et al. 2006). In other words, general knowledge
external to the translation model is exploitable, while meaning is also captured in
translation. An appropriate set of questions within a fixed set of parameters (namely
context, LoAs, purpose) may illustrate the varying attributes per author and reveal,
in the lexical linking in question, semantic dissimilarities, correctable errors and
information quality shortages. In other words, it may identify a mismatch and
shed the light necessary for an application aimed at semantic mismatch prevention. The
sceptical reader may object, however, that very few of the problems of the world are
binary.

7 Based on Floridi (2019) and the concepts of Borel numbers (β) and Hamming distance (hd).
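A set of YES/NO questions, answered once at semantic state A and again at state B, yields two binary vectors; the Hamming distance between them then counts how many screening answers changed in the transition. The questions below are hypothetical placeholders, not a validated checklist.

```python
def hamming(a, b):
    """Number of positions at which two equal-length answer vectors differ."""
    if len(a) != len(b):
        raise ValueError("answer vectors must cover the same questions")
    return sum(x != y for x, y in zip(a, b))

QUESTIONS = [  # illustrative screening questions, asked in a fixed order
    "Is the source of every record identified?",
    "Is the annotation schema documented?",
    "Are all represented groups reviewed for balance?",
    "Is the intended purpose unchanged?",
]
state_a = [True, True, True, True]    # answers before domain adaptation
state_b = [True, False, True, False]  # answers after domain adaptation
print(hamming(state_a, state_b))  # → 2 answers changed in the transition
```

A non-zero distance does not by itself signal an ethical failure; it flags where in the semantic cycle a closer look is warranted.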

3.5.2 Discovery Search and Display

The quality of the output of multilingual applications, including MT and multi-/cross-lingual
information retrieval, can benefit from the use of high quality open term
bases, providing accurate translation equivalents for lexicalized concepts in
domain-specific corpora. From the quality improvement of extracted terms to word sense
induction and disambiguation, the maker needs to define
methods for term extraction, corpus analysis, sense detection, annotation and
everything related to the selected system requirements. Data sources, methods and tools
constitute the components of the discovery search design. Our focus here lies on
the maker’s knowledge and the ethical assessment of the design approach, process
and tools, towards a semantic output that claims to be truthful, accurate and fair. In
this process, knowledge graphs, which can be and frequently are multilingual (Gracia
et al. 2012),8 play a key role, as they create, map and classify relations between
different and, possibly, at first sight entirely unrelated instances. They are
blueprints of knowledge to be formulated by the maker and are thus full of unexplored
implicit biases and inferences, partly expressions of the principle of information
closure within the language models, the word embeddings and the training data.
Therefore, the ethical assessment and further design of MT engines should involve
an analysis as deployed below:
• To what extent do perception and testimony shape the MT input and output?
• In a constructive process of knowledge creation, where different agents claim
their part, who is the maker in the first place and what is the degree of their
contribution?
• Is the provenance of each new term or further semantic modification identified?
• Are there preferred sources? On what grounds?
• To what extent, and how, is the maker capable of measuring the semantic proximity
between two linked systems, articulating pitfalls and proceeding to appropriate
edit operations like data substitution, insertion, deletion and transposition?
• What is the maker’s context, LoAs and purpose and how do these parameters
affect the information classification, linkage and overall structure towards new
knowledge?
It seems that the initial input of the involved semantic agents matters. Thus, a further
screening on that ground is of great ethical value as the original input encloses extra
information bits, processed differently by each agent’s variant semantic parameters.
Enhancing the maker’s input with claim-relevance discovery, by leveraging various
information retrieval and ML techniques, would provide some extra documentation
for the maker’s knowledge and the overall semantic design.

8 For more information about the Principle of Information Closure (PIC), see Floridi (2019: p. 149).
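The edit operations listed above map directly onto classic string edit distances: substitution, insertion and deletion are counted by the Levenshtein distance, and adding transposition gives the Damerau-Levenshtein variant. The sketch below implements the former only, offered as one plausible way of quantifying proximity between two linked states rather than as the chapter's prescribed method.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimal number of substitutions,
    insertions and deletions turning sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(edit_distance("colour", "color"))   # → 1 (one deletion)
print(edit_distance("kitten", "sitting")) # → 3 (the classic example)
```

The same dynamic-programming scheme works over any sequences, so it applies equally to token sequences or to vectors of screening answers, not just characters.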

3.6 Conclusion

As we automate in scale and in scope, a clear tension is evident among (a) the
opportunities and tools provided by NLP applications, like MT; and (b) the impli-
cations in social behaviour, behaviour engineering and the societal properties
thereof. It becomes clear that applied ethics needs to take the lead in the overall
LT design. Ethics on the field, if we may say so, provided the prior framing of its
conceptual grounds and the development of its theoretical reasoning. The objective
is to allow for smooth navigation among the emerging polarities within LTs, rather
than focusing on what Salganik (2017) calls ‘abstract social theory or fancy
machine learning’. Following the target system observance, the input data and
appropriate description (modelling) of the observed system, we need to aim at a
diffuse, distributed, decentralised design process towards a relevant blueprint of
interpretation and engagement with our target system through interdisciplinary
lenses. For language in particular, the core problems of societal behavioural
distortion lie in data exclusion or demographic misrepresentation; modelling
overgeneralisation, such as automatic inference of user attributes; topic overexposure,
leading to the psychological effect known as the availability heuristic
(Tversky and Kahneman 1973);9 and dual uses (Hovy and Spruit 2016). The
objective of a consistent blueprint is subject to general moral imperatives like
societal wellness, fairness and beneficence, plus more specific professional
responsibilities transcending strict legal compliance. This is not a tech problem;
rather, it is the free space where ethical elaboration lies: elaboration on ‘what’
we design, ‘how’ we design it and what the ‘impact’ of such design is.

References

Adda G, Besacier L, Couillault A, Fort K, Mariani J, de Mazancourt H (2014) Where are the data
coming from? Ethics, crowdsourcing and traceability for Big Data in Human Language
Technology. In: Crowdsourcing and human computation multidisciplinary workshop, CNRS,
September 2014, Paris, France
Andriopoulos DZ (2021) Aristotle: fifty-three concentric studies, 7th edn (in Greek). Private
Edition
Barzilay R, McKeown KR (2001) Extracting paraphrases from a parallel corpus. In: Proceedings of
the 39th Annual Meeting of the Association for Computational Linguistics, ACL, Toulouse.
https://doi.org/10.3115/1073012.1073020
Callison-Burch C, Koehn P, Osborne M (2006) Improved statistical machine translation using
paraphrases. In: Proceedings of the Human Language Technology Conference of the NAACL,
Main Conference, New York, Association for Computational Linguistics

9 If people can recall a certain event, or have knowledge about specific things, they infer it must be more important. If research repeatedly found that the language of a certain demographic group was harder to process, it could create a situation where this group was perceived to be difficult, or abnormal, especially in the presence of existing biases.

48 A. Nousias

Chapter 4
Licensing and Usage Rights of Language
Data in Machine Translation

Mikel L. Forcada

Abstract Machine translation (MT) is special in that it heavily relies on data. In
rule-based MT, an engine performs the translation task by using language resources
such as dictionaries and grammar rules, usually written by experts, but sometimes
learned from monolingual or bilingual text. Corpus-based (statistical and, more
recently, neural) MT leverages large amounts of monolingual and sentence-aligned
bilingual text. Clearly, MT programs using these data are works of creation that may
be copyright-protected, but this chapter focuses on data. Human labour, and
therefore, creative authorship of works, is present in all forms of MT data: monolingual
text has been authored, parallel text has been translated and aligned, and rules and
dictionaries have been written by experts. Since its conception centuries ago,
copyright has protected the livelihoods of authors by regulating how copies of these
data can be used and how works derived from them are used and published, using
instruments such as licences. While the case of dictionaries and grammars as used in
rule-based MT is reasonably clear, as they are purposely written for one or another
language-processing application, monolingual and parallel text, as used in MT, were
not created with MT in mind, and this has led some authors to ask whether authors
and translators should get additional compensation for this unintended use of their
work to generate new value downstream. This chapter gives an overview of the
different sources of data used in MT, discussing authorship along the steps of
creating, curating and transforming those data for use with MT, determining the
kinds of implicit and explicit licensing schemes that apply to them and how they
work. It also describes the controversy surrounding the use of published works to
generate new, initially unintended, value through translation technologies and the
various ways in which copyright issues are addressed.

M. L. Forcada (✉)
Dept. de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Sant Vicent del Raspeig,
Spain
Prompsit Language Engineering, Elx, Spain
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 49


H. Moniz, C. Parra Escartín (eds.), Towards Responsible Machine Translation,
Machine Translation: Technologies and Applications 4,
https://doi.org/10.1007/978-3-031-14689-3_4
Keywords Machine translation · Corpora · Usage rights · Licensing · Copyright · Professional translations · Repurposing

4.1 Introduction

Among other software, Machine Translation (MT) is special in that it heavily relies
on data. On the one hand, rule-based or knowledge-based MT is usually
implemented in such a way that a roughly language-independent engine performs
the translation task by using language resources such as dictionaries and grammar
rules, which may be (a) manually written from scratch, (b) obtained by converting
existing, manually written dictionaries and rules, (c) learned from monolingual text
or from sentence-aligned bilingual text, or (d) a mixture of some or all three. On the
other hand, corpus-based MT (CBMT) such as statistical MT (SMT) and, more
recently, neural MT (NMT) rely mainly on monolingual and sentence-aligned
bilingual text ((c) above), but they may also use rules and dictionaries to
pre-process text for easier learning. Section 4.2 briefly reviews the kinds of data
used in MT. This chapter will not deal with the fact that computer programs using
these data are also works of creation that should also be protected, but will rather
concentrate on the data itself. Human labour, and therefore, creative authorship of
works, is present in all forms of MT data: monolingual text has been authored,
parallel text has been translated and aligned, and rules and dictionaries have been
written by experts. Since it was conceived more than three centuries ago, copyright
(explicit or implicit as recognized by the Berne Convention) establishes authorship
and regulates (usually by restricting them for a limited time) the usage rights of copies or
data derived from these works using legal instruments such as licences, to make it
possible for authors to make a living from creative work. Section 4.3 gives an
overview of the different sources of data used in MT, defining authorship along
the steps of creating, curating and transforming those data for use with MT,
determining the kinds of implicit and explicit licensing schemes that apply to them
and how they work. On the one hand, the case of dictionaries and grammars as used
in rule-based MT (RBMT) (Sect. 4.3.1) is reasonably clear, as they are purposely
written for one or another language-processing application; a brief mention is made
of the benefits of free/open-source licensing for RBMT data. On the other hand,
however, monolingual and parallel text (discussed in detail in Sect. 4.3.2), as used in
MT, were not created with MT in mind, and this may lead, for example, to the
question whether authors and translators should get additional compensation for this
unintended use of their works (particularly those published on the Internet and
harvested using crawling techniques) to generate new, initially unintended, value,
not only through MT but also through other translation technologies such as
computer-aided translation (CAT); this is discussed in Sect. 4.3.3. The chapter
ends with concluding remarks—but they are concluding more in the sense that
they end the chapter rather than in the sense of settling a rather complicated matter;
instead, they try to summarise the various ways in which copyright issues are
addressed in the real world and give an indication of the open questions ahead.

4.2 Machine Translation Relies on Data

MT systems may be described as comprising three key components: an engine (the
program that carries out the translation), translation data, and tools to manage and
maintain the data and make them usable by the engine. MT, therefore, heavily relies
on translation data. The nature of this data will vary depending on the type of MT, as
described below, but one can already distinguish between two types of data:
linguistic resources and corpora. There are also two main types of translation
technology: rule-based MT and corpus-based MT. They are discussed in the rest of this
section.

4.2.1 Rule-Based Machine Translation

In the decades from the first attempts at machine translation to the 1990s, the main
approach was RBMT. Typically, RBMT starts with the translations of words and
ideally builds from them a translation for the whole sentence.
To build an RBMT system, translation experts must create machine-readable
monolingual and bilingual dictionaries for the languages involved and rules to
analyse the source text and perform other actions, such as converting the source
grammatical structure to the equivalent structure of the target language. Keep in
mind that the translator’s intuitive and unformalized knowledge needs to be turned
into rules and dictionaries and coded in a computationally efficient way. This may
sometimes require simplifications, which may nevertheless work well in most cases if
chosen wisely. Computer experts, of course, have to write engines that use the
dictionaries and apply rules to the source text in the specified order, to produce a
raw translation, and any tools needed to manage and convert data produced by
experts to the format required by the engine.
In the case of RBMT, the translation data, dictionaries and rule sets, are an
example of linguistic resources.
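The dictionary-plus-rules pipeline described above can be illustrated with a deliberately tiny Python sketch; the dictionary entries, the word categories and the single transfer rule are invented for the example and stand in for the much richer resources of a real RBMT system:

```python
# Toy rule-based MT: bilingual dictionary lookup plus one structural transfer
# rule (adjective-noun reordering, as when translating English into Spanish).
# All entries are invented illustrations, not data from a real system.

BILINGUAL_DICT = {"the": "la", "white": "blanca", "house": "casa"}
ADJECTIVES = {"white"}
NOUNS = {"house"}

def transfer(words):
    """Apply one transfer rule: adjective-noun becomes noun-adjective."""
    out = list(words)
    for i in range(len(out) - 1):
        if out[i] in ADJECTIVES and out[i + 1] in NOUNS:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

def translate(sentence):
    """Analyse (tokenise), transfer, then generate via dictionary lookup."""
    words = sentence.lower().split()
    # Unknown words are marked with asterisks, as many RBMT engines do.
    return " ".join(BILINGUAL_DICT.get(w, f"*{w}*") for w in transfer(words))

print(translate("the white house"))  # la casa blanca
```

Real systems add morphological analysis, disambiguation and generation steps, but the division of labour is the same: linguists author the data (dictionaries and rules), while the engine merely applies them.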

4.2.2 Corpus-Based Machine Translation

The beginning of the 1990s saw the inception of corpus-based machine translation,
which is currently the main paradigm, at least for major language pairs.
In this case, the MT program is automatically trained, that is, it learns to translate
from a huge corpus of examples, each of which contains a source-language sentence
paired with its translation in the target language, and sometimes also from an
additional corpus of monolingual target-language sentences. These large parallel
corpora, more precisely sentence-aligned parallel corpora (Sect. 4.3.2), may indeed
be seen as large translation memories (TMs) such as the ones used in CAT (Bowker
2002): hundreds of thousands or even millions of sentence pairs are usual in MT
training. While the role of computer experts is clear here, as they write the programs
that learn from corpora and translate new source texts, the role of translation experts
may not be clear until one considers the translation effort present in the corpora used
to train the system. CBMT uses corpora as translation data.
CBMT comes in two flavours: statistical machine translation and neural machine
translation.
• Statistical machine translation (SMT, Koehn 2009) was developed in the late
1980s and has been commercially available since 2003. To translate, it uses a
probabilistic model learned by counting the number of occurrences of specific
events observed in a sentence-aligned parallel corpus, such as how many times a
word co-occurs with another word in the target sentence, or how many times a
specific word is used in the target sentence when another specific word is used in
the source sentence. The resulting probabilistic model is used to estimate the
probability (mathematical likelihood) of each of a reasonably large set of
translation hypotheses for the source sentence and select the most probable
(mathematically most likely) one.
• Neural machine translation (NMT), which first appeared on the market in 2016
(Sennrich et al. 2016), is relatively new in this area. It utilises artificial neural
networks, vaguely inspired by how the brain learns from examples and
generalises beyond them. In NMT, learning and generalisation are based on observations
of the sentence-aligned bilingual corpus (see e.g. Forcada 2017). In fact, the
major online MT systems, such as those offered by Google (Google Translate1)
and Microsoft (Microsoft Translator2), originally statistical, are now based on
NMT; in addition, new “born neural” systems have appeared, such as Linguee’s
DeepL.
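The counting at the heart of SMT can be illustrated with a toy sketch; the three-sentence corpus and the simple relative-frequency estimate are invented simplifications (real systems use far more sophisticated alignment and language models):

```python
# Estimate p(target_word | source_word) by counting co-occurrences in a
# tiny invented sentence-aligned parallel corpus.
from collections import Counter
from itertools import product

corpus = [
    ("the house", "la casa"),
    ("the book", "el libro"),
    ("the white house", "la casa blanca"),
]

cooc = Counter()       # (source word, target word) co-occurrence counts
src_total = Counter()  # total co-occurrence events per source word

for src, tgt in corpus:
    for s, t in product(src.split(), tgt.split()):
        cooc[(s, t)] += 1
        src_total[s] += 1

def p(t, s):
    """Relative-frequency estimate of p(t | s)."""
    return cooc[(s, t)] / src_total[s]

# "casa" co-occurs with "house" in both sentences containing "house":
print(p("casa", "house"))  # 0.4
```

An SMT decoder combines many such estimated probabilities to score competing translation hypotheses and pick the most likely one.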
Note also that while modern MT tends to be corpus-based, this is generally only
possible when large sentence-aligned parallel corpora are available.3 For many of the
world’s languages the only possible choice is still traditional RBMT, as they lack
these corpora. Note that even for larger languages, data for some genres or domains
may be too scarce to properly train a CBMT system.4

1. https://translate.google.com/.
2. https://translator.microsoft.com/.
3. It has been shown that it is possible to train CBMT systems in an unsupervised way, that is, using only monolingual text in both languages and very small amounts of parallel text, or even no parallel text at all (Artetxe et al. 2018; Lample et al. 2018), with more modest results.
4. Sometimes the CBMT system is trained on a large general stock corpus and then refined or tuned in some way using the available data for the specific domain (see e.g. Pecina et al. 2012).

4.2.3 Hybrid Systems

Of course, there are also hybrid systems that integrate the two strategies. For
example, one may (a) use morphological rules to analyse the text before translating
it using a system that has also been trained on a corpus of morphologically analysed
texts,5 or (b) use a syntactic parser to pre-reorder the source text so that its syntax is
closer to that of the target text,6 or (c) use a rule-based system to translate a large
target-language monolingual corpus into the source language to generate synthetic
parallel training material for a CBMT system.7 The discussion below about usage
rights and licensing also applies to these hybrid systems.
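Strategy (c) can be sketched as follows; `rbmt_translate_to_source` is a stand-in for a real rule-based engine, and the toy word-for-word dictionary is invented for the example:

```python
# Generate synthetic parallel training data by machine-translating a
# target-language monolingual corpus back into the source language.

def rbmt_translate_to_source(target_sentence):
    # Placeholder for a real rule-based engine: here just a word-for-word
    # toy dictionary, with no reordering rules.
    toy_dict = {"la": "the", "casa": "house", "blanca": "white"}
    return " ".join(toy_dict.get(w, w) for w in target_sentence.split())

monolingual_target = ["la casa", "la casa blanca"]

# Each synthetic pair: (machine-translated source, authentic target).
# The target side is human text; only the source side is synthetic.
synthetic_parallel = [(rbmt_translate_to_source(t), t) for t in monolingual_target]

for src, tgt in synthetic_parallel:
    print(src, "|||", tgt)
```

Note that the second pair comes out as "the house white ||| la casa blanca": the toy engine has no reordering rule, which illustrates how the quality of the rule-based system bounds the quality of the synthetic data.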

4.3 Translation Data, Usage Rights, and Licensing

The previous section has identified two main kinds of data that may be used when
building MT systems: linguistic resources, mainly used in RBMT, and corpora,
mainly used in CBMT.

4.3.1 Linguistic Resources for Machine Translation

For RBMT to work, one must provide the engine with specific computer-readable
linguistic resources, such as dictionaries describing the morphology of the source
and target languages, rules to disambiguate homographs and polysemic words,
bilingual dictionaries, and rules transforming the structure of the source sentence
into a valid target-language structure. As mentioned earlier, building these resources
requires linguists and translation experts who are familiar with the formats required
by the MT system. These experts have to create their resources from scratch or by
transforming results which are already available.
Note that linguistic resources are not only useful in RBMT; they can also be used
to automatically transform, annotate, and prepare corpora to make them useful to
train CBMT systems, as described in Sect. 4.2.3.

5. See, for instance, Lee (2004).
6. Both in statistical (Cai et al. 2014) and neural (Du and Way 2017) MT.
7. This would be a kind of back-translation, similar to that described by Sennrich et al. (2016) for NMT.

4.3.1.1 Licensing of Linguistic Resources

The creation of linguistic resources, even if it may be partially automated, involves a
lot of expertise and creative work that authors may choose to protect under the
copyright legislation available in their jurisdiction, using suitable licences.
Linguistic data such as rules and morphologies may be seen as computer programs and one
can easily repurpose software licences to protect copyright and regulate their usage,8
but they can also be considered textual creative works and be protected using the
kind of licences used for text such as literary works.9 On the one hand, distributors of
language resources and corpora such as the European Language Resources
Association offer authors a licence wizard that lets them choose,10 but do not offer
software licences as an option; on the other hand, the Apertium RBMT platform11
publishes all of its linguistic resources under the same free/open-source software
licence it uses for its engines,12 to ensure the widest possible use, share, and
collaborative development (Forcada 2020).

4.3.2 Sentence-Aligned Parallel Corpora

Isabelle et al. (1993)—but also Simard et al. (1993)—are famously quoted for saying
that “Existing translations contain more solutions to more translation problems than
any other currently available resource”. CBMT does indeed spring from this idea. As
mentioned above, to work, CBMT requires large numbers (hundreds of thousands to
millions) of sentence pairs made up of an original sentence and its translation.
Creating such a corpus requires a great deal of effort. To start, one needs to have
enough translated text, ideally professionally translated.13 Then, before training the
system, the translations must be aligned sentence by sentence (if they were not
translated in a computer-aided environment and therefore already segmented and

8. Examples are the End User License Agreements used with various commercial software (https://en.wikipedia.org/wiki/End-user_license_agreement) or the Free Software Foundation’s GNU General Public License (https://en.wikipedia.org/wiki/GNU_General_Public_License).
9. “All rights reserved” licences with wordings in the style of “Copyright © 2020 by Author Name. All rights reserved. This book or any portion thereof may not be reproduced or used in any manner whatsoever without the express written permission of the publisher except for the use of brief quotations in a book review or certain other non-commercial uses permitted by copyright law.” or open licences such as those in the Creative Commons set of copyright licences (https://creativecommons.org/).
10. http://wizard.elra.info.
11. http://www.apertium.org.
12. The GNU General Public Licence, https://www.gnu.org/licenses/gpl-3.0.en.html.
13. When this is not possible, it is not uncommon to make do with corpora where translations are used as the source text, or where documents are translations of documents in a third language (for example, a German–Finnish parallel text may be the result of Finnish → English translation that was subsequently translated into German).
aligned); though alignment can be automatically performed, it may need translators’
revision or supervision. However, as it is impossible to check every sentence pair, it
is not uncommon for these corpora to contain noise,14 that is, sentence pairs which
are not mutual translations, which would have to be detected and deleted.
Some of these corpora have been expressly created and published as such by
various organisations, who have usually chosen clear licences regulating their usage;
they are discussed in Sect. 4.3.3.1. But many corpora have been created by third
parties from content which is publicly available on the Internet (Sect. 4.3.3.2). A
large part of these corpora has been curated and collected in OPUS,15 the largest
repository of such data, which even publishes ready-made MT models trained
from them.

4.3.2.1 Sentence-Aligned Parallel Corpora Published by Their Owners

Institutions, organisations and companies that produce (or purchase) and publish
translations may decide to publish also the corresponding TMs or sentence-aligned
parallel corpora.

Corpora Published by Public Administrations

Some public agencies publish part of their TMs, particularly those related to
documents that they will publish in various languages. For example, since 2007,
the Directorate-General for Translation of the European Commission has published
large, curated TMs16 corresponding to the so-called Acquis Communautaire, that is,
the body of common rights and obligations that are binding for all European Union
countries. This kind of TMs may be very useful in CAT of future documents
produced by the same administration, but also in CAT or to train MT systems for
texts of this genre to be published by related administrations. Another initiative by
the government of the Basque autonomous community, called Open Data Euskadi,
publishes TMs for the Basque language, generated by the government or related
institutions in the territory.17
Many administrations that generate translations for public service information
such as websites have, however, not internally reached the level of translation
workflow maturity (Bel et al. 2016) that would allow them to curate and, if so
desired, publish their translations as TMs.18 Without the TMs, they may end up

14. For instance, due to errors in sentence alignment.
15. http://opus.nlpl.eu.
16. https://ec.europa.eu/jrc/en/language-technologies/dgt-translation-memory.
17. A search of “TMX” in https://opendata.euskadi.eus/inicio/ reveals many such TMs.
18. In addition to project Paracrawl, which will be discussed in Sect. 4.3.2.2 and which ends up crawling this kind of translations to build corpora, other Connecting Europe Facility projects collect parallel data

paying for the translation of material that had already been translated, at an
unnecessary expense of taxpayers’ money. Sometimes, members of the institutions have
later taken it upon themselves and worked in collaboration with third parties to create
sentence-aligned parallel corpora from their public documents, as in the case of the
six-language United Nations corpus (Rafalovitch and Dale 2009; Ziemski et al.
2016).

Corpora Expressly Created for Open Software or Other Projects

A large part of publicly-available sentence-aligned parallel corpora consists of rather
specialised content created by free/open-source software projects, and has been
expressly generated and published in the form of segment-by-segment translation
data as a result of the localization of free/open-source software, usually in
collaborative or community settings. Examples include the GNOME desktop manager,19
the Ubuntu GNU/Linux operating system,20 and the OpenOffice.org word
processor.21 These sentence- (or rather segment-) aligned corpora usually bear free/open-
source licences identical or similar to those of the associated software.

Other Sentence-Aligned Corpora

Tatoeba22 manages a crowd-sourced (collaboratively created) collection of about
nine million sentences, each one with a unique identifier, in slightly less than
400 languages (not all sentences are available in all languages). The resulting
sentence sets may be downloaded from the site and converted into sentence-aligned
parallel corpora using their identifiers. Ready-made sentence-aligned parallel
corpora can also be found on the OPUS site.23
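The identifier-based conversion can be sketched as below; the in-memory layout (a sentence table keyed by ID plus a list of ID links) is an assumption standing in for Tatoeba’s actual download format, and the sentences are invented:

```python
# Build a sentence-aligned parallel corpus from Tatoeba-style data:
# sentences carry unique IDs, and a separate links table records which
# IDs are translations of each other.

sentences = {  # id -> (language code, text)
    1: ("eng", "Hello."),
    2: ("spa", "Hola."),
    3: ("eng", "Thank you."),
    4: ("spa", "Gracias."),
}
links = [(1, 2), (3, 4)]  # pairs of mutually translated sentence IDs

def build_pairs(src_lang, tgt_lang):
    """Join sentences on their IDs to produce (source, target) pairs."""
    pairs = []
    for a, b in links:
        (lang_a, text_a), (lang_b, text_b) = sentences[a], sentences[b]
        if (lang_a, lang_b) == (src_lang, tgt_lang):
            pairs.append((text_a, text_b))
        elif (lang_b, lang_a) == (src_lang, tgt_lang):
            pairs.append((text_b, text_a))
    return pairs

print(build_pairs("eng", "spa"))
# [('Hello.', 'Hola.'), ('Thank you.', 'Gracias.')]
```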

from institutions. For instance, the project Principle (https://principleproject.eu/) collects data in Icelandic, Norwegian, Croatian and Irish from early adopter institutions and companies, as does the MT4All project (http://ixa2.si.ehu.eus/mt4all/project), which focuses on unsupervised MT (Artetxe et al. 2018; Lample et al. 2018) and on the creation of resources for EU and non-EU languages including Kazakh. There are also initiatives like ELRC-Share (https://www.elrc-share.eu/), which collects all sorts of parallel corpora, many based on institutional translations.
19. https://www.gnome.org/.
20. https://translations.launchpad.net/.
21. https://translate.apache.org/projects/aoo40/.
22. http://tatoeba.org.
23. http://opus.nlpl.eu/Tatoeba.php.

4.3.2.2 Web-Crawled Sentence-Aligned Parallel Corpora

A very large, perhaps the largest, source of sentence-aligned translations used to train
MT systems comes from publicly-accessible documents published on the Internet,
obtained either by manual scraping and alignment or by automated crawling and
alignment.
Recent advances have made it possible to automatically crawl multilingual
websites to obtain parallel corpora. This is very likely one of the methods used by
commercial systems like Google, Microsoft and DeepL. To do this, documents in the
two languages of interest are downloaded from selected candidate websites. Once
the language of downloaded documents is automatically identified, source-language
and target-language documents are matched by examining, for instance, their length
and the internal structure of the text and their content, using available bilingual
resources such as dictionaries and MT to guide the matching. Then, the source-
language and the target-language texts are split into sentences, and statistical
methods, again assisted by bilingual resources if available, are used to produce as
many sentence pairs (translation units) as possible. Finally, additional statistical
techniques are used to discard translation units which are not likely to be useful in
any application, for instance, when the source and the target have very disparate
lengths, or when they contain more numbers or punctuation than text, or when an
automatic language identifier detects them as not being in the expected language.24
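The final filtering step can be approximated with simple heuristics of the kind just listed; the thresholds below are illustrative, and a real pipeline would also run a language identifier over both sides of each unit:

```python
# Discard crawled translation units with very disparate lengths or with
# more numbers/punctuation than text. Thresholds are illustrative only.

def alpha_ratio(text):
    """Fraction of non-space characters that are alphabetic."""
    chars = [c for c in text if not c.isspace()]
    return sum(c.isalpha() for c in chars) / max(len(chars), 1)

def keep(src, tgt, max_len_ratio=2.0, min_alpha=0.5):
    ls, lt = len(src.split()), len(tgt.split())
    if min(ls, lt) == 0:
        return False
    if max(ls, lt) / min(ls, lt) > max_len_ratio:
        return False  # source and target have very disparate lengths
    if alpha_ratio(src) < min_alpha or alpha_ratio(tgt) < min_alpha:
        return False  # mostly numbers or punctuation, not text
    return True

units = [
    ("The house is white.", "La casa es blanca."),
    ("Page 3 of 12", "1-800-555-0199 !!!"),
    ("One two three four five six seven eight", "Uno"),
]
print([keep(s, t) for s, t in units])  # [True, False, False]
```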
For example, the project Paracrawl25 performs this sort of crawling to obtain,
publish, and provide the European Commission with sentence-aligned parallel
corpora for languages that are official in Europe and develops software to perform
the task, but many other projects crawl the web for parallel text.

Corpora from Non-governmental Organisations

Non-governmental organisations (NGOs) produce large amounts of multilingual
text. For example, Global Voices26 is an NGO that acts like a multilingual newsroom
that “report[s] on people whose voices and experiences are rarely seen in mainstream
media” with the help of journalists and translators worldwide. The EU-funded
project CASMACAT27 processed the content of their website and produced
sentence-aligned parallel corpora which are available through OPUS.

24. This may be due to the existence of segments in a language different from the rest of the document.
25. http://paracrawl.eu.
26. https://globalvoices.org/.
27. http://casmacat.eu/corpus/global-voices.html.

Corpora from Religious Sites

Sentence-aligned parallel corpora are frequently collected from religious websites; I
will name some examples that may be found on OPUS. JW300 (Agić and Vulić
2020) is a corpus produced from texts on jw.org,28 the website of Jehovah’s
Witnesses. The Tanzil project collects translations and recitations of the Quran; its
collection has been turned into a sentence-aligned parallel corpus. Christodouloupoulos and
Steedman (2015) aligned online translations of the Bible in 100 languages. The
usefulness of these corpora in the translation of text in other genres may vary; the
JW300 corpus contains contemporary English prose but also “Bible English”
citations full of archaic forms (thou [singular ‘you’], canst [singular ‘you can’], doth
[‘does’], spake [‘spoke’], brethren [‘brothers’] etc.).

Corpora from Public Administrations

Many public administrations publish public service multilingual data, but they do
not publish the TMs or sentence-aligned corpora that they could probably have
produced as a side-product of their use of CAT tools.
For example, the European Medicines Agency publishes all sorts of texts related
to the evaluation of medicines for human and veterinary use in the form of PDF files.
The OPUS project29 has extracted text from these PDF files and has aligned them to
produce corpora for 22 official languages of Europe.
Another example: the Catalan government has to publish its legal gazette in Catalan and Spanish, as both are official languages in Catalonia. Although CAT tools are used to produce it, the sentence-aligned parallel corpus has been made available not by the Catalan government but by an independent Catalan researcher, Antoni Oliver, and published in OPUS.30

4.3.3 Licensing and Usage Rights of Sentence-Aligned Parallel Corpora

As discussed in Sect. 4.3.2 above, sentence-aligned parallel corpora may basically be produced in two ways: as a side-product of mature, well-organised CAT workflows,
in the form of TMs, or as a later derivative obtained by processing existing published
translations. In the former case, the organisations that have commissioned the
translations determine the licence and the usage terms of the sentence-aligned
parallel corpora in case they decide to publish them; in the latter case, the resulting

28
https://www.jw.org/en/.
29
http://opus.nlpl.eu/EMEA.php.
30
https://opus.nlpl.eu/DOGC.php.
4 Licensing and Usage Rights of Language Data in Machine Translation 59

corpora are clearly a derivative of the published translations and are therefore directly affected by the terms associated with them at the time of publication, which may sometimes preclude the publication of any derivatives.
It is also important to consider that the compilation of the corpus, as will be
discussed below in Sect. 4.3.3.2, adds new value to the original translations, and
compilers may want to use a licence that protects their work.
In case the terms of the original translations allow for the derived corpora to be
published, the licence attached to the corpora has to be compatible with that of the
original material.
This section discusses both scenarios, that is, corpora published by their owners (Sect. 4.3.3.1) and corpora crawled from the Internet (Sect. 4.3.3.2), with emphasis on the latter, and it discusses aspects such as automatic copyright and copyright as
protection, considers a non-literal approach to copyright, the implementation of
exceptions to copyright for the purpose of text mining, a workaround called deferred
crawling, the recognition of translators as authors having the right to copyright their
work, the nature of value added through the compilation of sentence-aligned parallel
corpora and the training of CBMT, and the claims of translators for compensation for
the subsequent unintended use of their work.

4.3.3.1 Corpora Published by Their Owners

Organisations that publish sentence-aligned parallel corpora as a result of their translation activity do so under a variety of licences. For instance, the TMs of the
Directorate General for Translation mentioned in Sect. 4.3.2.1 are published under the European Union Public Licence,31 a free licence (allowing any kind of usage, copying and modification) with copyleft (requiring that all derivatives be distributed under the same terms).
Tatoeba (see also Sect. 4.3.2.1) uses the Creative Commons Attribution (“CC-BY”) licence and a Public Domain licence, both allowing any use and any kind of derivatives, for their collaboratively-built corpora, which are released in different formats; they are turned into sentence-aligned parallel corpora by the OPUS project, which also releases them under the “CC-BY” licence.
As regards sentence-aligned parallel corpora created by in-house developers after
translations are published, licences are expected to be tied to the original licence of
the text they are derived from. In the case of the six-language United Nations corpus,
first, Rafalovitch and Dale (2009) did not specify a licence for their original corpus;
later, Ziemski et al. (2016) chose a “very liberal licence” which was drafted “with the
advice of the General Legal Division, Office of Legal Affairs, United Nations” that
basically only requires the attribution of its origin and indemnifies authors of any
damage that may result.

31
https://joinup.ec.europa.eu/sites/default/files/custom-page/attachment/eupl_v1.2_en.pdf.

4.3.3.2 Web-Crawled Corpora

Sentence-aligned parallel corpora obtained by aligning publicly-available multilingual content downloaded (“crawled”) from the Internet are clearly derivatives of that content. The publication and use of this derivative material is therefore regulated, and may be completely forbidden by the licence associated with the original material by its copyright owners.
There are websites that carefully associate an explicit licence to the content they
publish. Just to name two radically different examples, Wikipedia uses the “Creative
Commons Attribution–ShareAlike” licence32 which allows any use and the publi-
cation of derivatives as long as the source is credited and any derivative content is
shared with the same licence, while The Washington Post reserves all rights and
basically restricts any further usage.33
Many sites do not provide an explicit licence for their content. However, this does
not mean that there are no usage or redistribution restrictions as will be seen in what
follows.

Automatic Copyright

Under the Berne Convention for the Protection of Literary and Artistic Works,34
Berne Convention for short, which has been signed by 197 countries,35 it is under-
stood that the author reserves all rights to reproduction when the work does not
specify the conditions of reuse (“Protection must not be conditional upon compli-
ance with any formality (principle of “automatic” protection)”). As a result, the
content of websites not bearing any licence cannot be reproduced in any way without
the explicit permission of the author.
Regular sentence-aligned parallel corpora contain thousands or millions of
sentences coming from hundreds or thousands of documents. Owners may implicitly
or explicitly reserve all rights: if one followed copyright rules strictly, one would have to contact them and explicitly ask for clearance, which would often start a discussion on the licensing of the corpus.
For most corpora, this is a very hard task, as it would involve contacting the
owners of many sources (for a description in a real scenario, see De Clercq and
Montero Pérez 2010, cited in Macken et al. 2011). Moreover, where owners have

32
https://creativecommons.org/licenses/by-sa/3.0/.
33
“[. . .] unless expressly authorised by The Washington Post in writing, you are prohibited from
publishing, reproducing, distributing, publishing, entering into a database, displaying, performing,
modifying, creating derivative works, transmitting, or in any way exploiting any part of the
Services, except that you may make use of the content for your own personal use as follows: you
may make one machine-readable copy and/or print copy that is limited to occasional articles of
personal interest only.”
34
https://www.wipo.int/treaties/en/ip/berne/.
35
https://www.wipo.int/treaties/en/ShowResults.jsp?treaty_id=15.

explicitly stated a licence, as said above, this may restrict the choice of licences for
the corpus. In fact, it may be the case that the corpus would have to be partitioned into sub-corpora, each one with a licence compatible with that of the original
sources. Does this mean that sentence-aligned parallel corpora created from these
pages cannot be distributed at all?
Legal exceptions to the automatic all-rights-reserved rule are discussed below, under “Legal Exceptions to Allow for Text Mining”.

Copyright as Protection

Let us take a closer look at copyright. Copyright was originally conceived as a way
to protect the right of authors to obtain proper compensation for their work by
guaranteeing that, during a certain period of time, no one else would be able to make
a profit from derivatives. As a result, society as a whole would benefit as authors
would be encouraged to create works useful for it. One of the earliest legal formu-
lations of this principle of copyright for the common good appears in the United
States Constitution (article I, section 8, clause 8) as one of the duties of Congress:
“To promote the Progress of Science and useful Arts, by securing for limited Times
to Authors and Inventors the exclusive Right to their respective Writings and
Discoveries.” This period of time, initially short, was subsequently extended so
that not only authors but also their heirs could benefit after their death.

A Nonliteral Approach to Copyright

Now, let us leave aside for a moment the literal interpretation of “all rights reserved”
copyright clauses and think about how the publication of web-crawled sentence-
aligned parallel corpora may threaten the livelihood of authors (and their heirs) or the
viability of enterprises producing web content.
Let us clarify that we are concerned here with textual content which is freely
accessible on the Internet, that is, not behind any paywall or authentication chal-
lenge, so that anyone with a browser can read it, as this is basically what web
crawlers, that is, headless browsers, can obtain if allowed.36 But one may ask, what
is the point of protecting against copies of publicly-available content that anyone
can read?
There are a number of reasons. When authors or copyright holders of this textual
content want to make a profit with it, they can, for example, add advertisement37 to

36
Some websites avoid access by headless browsers by challenging them to show human behav-
iour, but this is quite unusual with textual content.
37
And even secure future advertisement by leaving small files in the local computer of the person
browsing the site, called cookies, which may later be picked up by other websites to reinsert new
advertisement.

it, paid for by third parties. If someone else publishes a copy of substantial parts of
that content somewhere else without the advertisement, they may compete with the
original content and reduce the advertisement revenue associated with the original
content. The fact that publishers may derive value from human reading is the reason
why some sites expect headless browsers such as text crawlers to respect the
restrictions expressed in suitable files.38 However, when a sentence-aligned parallel
corpus is generated from multilingual web content, it often happens that:
• Textual content is divided into small units such as sentences;
• Formatting, which may actually be used to guide segmentation and alignment, is thrown away;
• Sentence order may not be preserved;
• Sentences that have not been successfully aligned with a translation or were
already found in another document are discarded;
• Material from different documents ends up mixed (for example, as the result of
sorting and de-duplication).
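The combined effect of these steps can be sketched in a few lines of Python (toy sentence pairs, purely illustrative; real pipelines use tools such as Bitextor):

```python
# Toy sketch of corpus compilation: de-duplication and sorting destroy
# document identity and sentence order (hypothetical sentence pairs).

docs = {
    "doc1": [("Hello.", "Bonjour."), ("How are you?", "Comment ça va ?")],
    "doc2": [("Hello.", "Bonjour."), ("Goodbye.", "Au revoir.")],
}

pairs = set()                        # de-duplication: repeated pairs collapse
for doc_id, sentence_pairs in docs.items():
    pairs.update(sentence_pairs)     # document identity is discarded here

corpus = sorted(pairs)               # sorting destroys original sentence order

for src, tgt in corpus:
    print(src, "\t", tgt)
```

The three resulting pairs can no longer be attributed to either document, nor re-ordered into the original texts.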
It would be a bit as if a number of documents had been processed by a document
shredder that made a paper strip out of each line, so that strips coming from different
documents had been mixed and shuffled into a mess from which some strips might
have been removed.
As a result, it is impossible, or at least very difficult, to reconstruct substantial
parts of the original document from the usual sentence-aligned parallel corpora, and
therefore, to compete with the original content by publishing reconstructed copies of
substantial parts of it elsewhere.39 In fact, Tsiavos et al. (2014) suggest that “[E]nsur
[ing] the original material [without any copyright notice] cannot be reconstructed
from the LR” is one way to reduce legal risk. This is because, while the sentence-
aligned corpus could be seen to be literally in violation of the copyright of many
pages of content, there would be very little incentive on the part of their authors to
sue the compilers of the corpora, as it would be very difficult to prove the existence
of damage and even more difficult to put a value on it.
A common legal risk mitigation strategy used by some compilers of sentence-
aligned parallel corpora is to establish a notice-and-take-down procedure.40 To give
a specific example, the wording in the paracrawl.eu website invites whoever con-
siders that their data “contains material that is owned by you and should therefore not
be reproduced here” to contact the project, clearly identify themselves, “clearly
identify the copyrighted work claimed to be infringed” and “the material that is
claimed to be infringing” together with “information reasonably sufficient to allow
us to locate the material”, and commits to “comply to legitimate requests by

38
Websites can have a robots.txt file that regulates access of headless browsers (“robots” or
crawlers) to specific parts of the site, and should therefore be respected.
39
But not completely impossible. Carlini (2020) shows that, using carefully-crafted queries, neural
models can reproduce substantial parts of the data that was used to train them.
40
https://en.wikipedia.org/wiki/Notice_and_take_down.
4 Licensing and Usage Rights of Language Data in Machine Translation 63

removing the affected sources from the next release of the corpus”. Other corpus distribution sites, such as that of the WaCky corpus,41 simply state: “If you want your webpage to be removed from our corpora, please contact us.” In fact, “notice and
take-down” is one of the strategies suggested by European projects QTLaunchpad
(Tsiavos et al. 2014) and Panacea (Arranz et al. 2013) to reduce legal risk.42
One of the main applications of sentence-aligned parallel corpora is training MT systems, which may then reasonably be considered derivative works. Therefore, when creating a commercial MT system by reusing the original text and its translations, it would be wise to ask to what extent the copyright of the original text is respected. Is CBMT a real threat to the rights of the authors and translators of the
texts used to train it? SMT output may indeed contain short subsegments (sequences
of a few or several target-language words) from the translations it was trained on, but
these are spliced together in a wholly new order; as a result, the original works are
even harder to recover than they were from sentence-aligned parallel corpora.
Recoverability is virtually impossible in NMT output, where each word (or even
every sub-word unit43 or every character (Lee et al. 2017)) is produced separately.
Therefore, the claim that translations produced by systems trained with multilingual works constitute the public reproduction of substantial parts of these works would seem rather outlandish. In fact, it would be impossible to
trace back copyright to translators, as “in [NMT] training, data is broken down to the
level of words, subwords, or even characters [. . .] so that the input of any individual
translator is unrecognisable and their contribution to a system trained with very large
amounts of data is untraceable.” (Moorkens and Lewis 2019, citing Lee et al. 2017).
In fact, in many other industries, you can buy products where traceability is lost. If
you buy packaged beef sirloin, in some parts of Europe you have traceability
numbers explaining where the animal was raised and slaughtered, and sometimes
the ear tag of the actual individual, but if you buy a can of meatballs, meat from different provenances, different individuals (and even different species, such as pork) may be mixed with no possible traceability. However, this does not necessarily imply that
farmers do not get proper compensation (they may not, but that is beside the point
now). Corpus-based MT would be a bit like the untraceable canned meatballs of
translation.
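The dissolution of text into sub-word units can be made concrete with a toy byte-pair-encoding learner in the spirit of Sennrich et al. (2016) (a simplified illustration only, not the segmenter used in any production NMT system):

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every occurrence of `pair` in `symbols` by a merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def bpe_merges(words, num_merges):
    """Learn BPE merges: repeatedly fuse the most frequent adjacent pair."""
    vocab = {tuple(w) + ("</w>",): c for w, c in Counter(words).items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_pair(sym, best): c for sym, c in vocab.items()}
    return merges

# Words taken from two hypothetical translators' outputs dissolve into one
# shared inventory of sub-word units; the merges carry no authorship.
words = "lower lower lowest newer newer newest".split()
print(bpe_merges(words, 3))
```

Once the learned merges are applied to new text, the resulting units come from the whole training corpus at once, which is precisely why individual contributions become untraceable.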
In fact, Tsiavos et al. (2014) suggest that “risk mitigation strategies” to avoid
“expos[ing] the [language resource] processor to legal risks” may include the choice
to “[p]rovide the service” (for instance, the MT system) “rather than the data” (the
sentence-aligned parallel corpora). Arranz et al. (2013) also suggest that language
technology providers “[do] not offer any content or derivative content as such but

41
https://wacky.sslmit.unibo.it.
42
Tsiavos et al. (2014) describe in particular detail a number of scenarios in which language
resources derived from web text are published.
43
Inputs and outputs in NMT are usually sub-word units produced by a trained segmenter based on
byte-pair encoding (Sennrich et al. 2016) or the newer SentencePiece approach (Kudo and
Richardson 2018).

only services that do not replicate the material collected but only produce a service of
its processing [sic]”. In fact, while web-based commercial MT systems such as
Google Translate or Microsoft Translator are clearly based on text crawled from
the Internet, they have been basically free of legal challenge, even considering the
colossal commercial value they derive from MT.

Legal Exceptions to Allow for Text Mining

The interests of translation technology companies and researchers have influenced legal doctrines and even the adoption of legislation in different jurisdictions to establish exceptions to copyright for “text mining”. I will briefly illustrate this with the examples of the USA and the EU.
In the USA, activities such as the ones discussed above could be framed as fair
use, that is, the kind of legitimate, limited use of copyrighted material that can occur
without getting explicit copyright clearance from owners, which may be a rather
cumbersome process. Fair use used to be legal doctrine, until it was partly consol-
idated by the Copyright Act in 1976. To simplify, a use is fair when there is no harm
to the author’s economic interests which motivated the use of copyright. Judges’
rulings, which interpret and augment the doctrine, and the Copyright Act, set a rather
fluid scenario which has allowed the commercial exploitation of text mining by large
companies offering translation services.44
Similar provisions, known there as fair dealing, are found in the United Kingdom and the Republic of Ireland; in most of Europe, however, copyright has historically had a more explicit
and restrictive regulation, both in the legislation of states, and in that of the European
Union. With the advancement of text mining, a rather intricate system of legal
exceptions has been set up, mainly through European Union Directive 2019/790.
The text and data mining exceptions are mainly established in Article 3 for scientific
research purposes, while Article 4 deals with more general, commercial uses. Article
4 limits the applicability of the exception to the cases where “the use of works [. . .]
has not been expressly reserved by their rights holders in an appropriate manner,
such as machine-readable means in the case of content made publicly available
online”, which radically limits the range of licit activities. However, as discussed
above, and similarly to the USA, initiatives performing text mining and publishing

44
As one of the anonymous reviewers of this chapter points out, and as an illustration of fair use in
the United States, when the Authors Guild sued Google over their service Google Books in 2005, in
an attempt to exercise their copyright, the judge (in 2013, after many attempts to reach a settlement)
ruled against the authors, as it was considered that the crawling done by Google was a fair use and
could help in the advancement of other language technologies (for details, see https://www.
authorsguild.org/where-we-stand/authors-guild-v-google/ and https://www.jipitec.eu/issues/jipitec-
5-1-2014/3908). Rulings like these fueled lobbying in Europe in favour of EU-level legislation
authorising researchers to crawl data, and not letting individual countries decide on it (at that time,
only English-speaking countries in the EU allowed their researchers to freely crawl data under the
fair use provision). Current EU legislation (Directive 2019/790) partially reflects these requests.

or exploiting derivative work that do not fulfil the conditions of the European Union
Directive do occur without appreciable legal challenge, as it would be very difficult
to successfully substantiate the nature of the economic damage sustained by authors.
The recommendations set out by Arranz et al. (2013) and Tsiavos et al. (2014) to minimise legal risk, discussed above, predate the Directive but remain relevant.

A Partial Workaround: ‘Deferred Crawling’

Recently, Forcada et al. (2016) proposed what they claimed to be a legally-safer workaround to distribute the value added to publicly available textual content when
creating sentence-aligned parallel corpora. The workaround, called deferred
crawling, avoids distributing pairs of sentences, that is, stretches of text copied
from the public textual content. Instead, what is distributed is (1) an indication of
the precise location (URL of the document, position inside the document); (2) the
length of those text stretches; and (3) a signature, digest, or checksum,45 that is, an
integer number computed from it to detect content that may have changed or moved
since it was originally crawled (for example, due to later edits of the webpage). One
would not re-distribute segments of the textual content, which could literally violate
copyright in some cases, but rather a description;46 one which is precise enough to
allow recovery if content has not changed. The deferred sentence-aligned parallel
corpus may then be swiftly processed with a recrawler to create, chez the user, a
temporary copy which may then be used as a TM or to train a MT system.47
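The idea can be sketched as follows (an illustration only, not the actual Bitextor implementation; the segmentation rule and all names are my own assumptions):

```python
import hashlib

def segment(text):
    """Assumed standard segmentation: one segment per sentence."""
    return [s.strip() + "." for s in text.split(".") if s.strip()]

def defer(url, text):
    """Deferred-crawling record: no text is stored, only location,
    length and an MD5 digest of each segment."""
    return [(url, len(s), hashlib.md5(s.encode()).hexdigest())
            for s in segment(text)]

def recover(records, redownloaded_text):
    """Recrawl step: re-segment the page and keep only the segments
    whose digest still matches (unchanged since the original crawl)."""
    by_digest = {hashlib.md5(s.encode()).hexdigest(): s
                 for s in segment(redownloaded_text)}
    return [by_digest[d] for (_url, _n, d) in records if d in by_digest]

page = "The cat sat. The dog ran."
records = defer("http://example.com/page", page)

# Later the page is edited: one segment changes, one survives.
edited = "The cat sat. The dog barked."
print(recover(records, edited))   # only the unchanged segment is recovered
```

Since only digests are redistributed, no copyrighted text changes hands until the user recrawls the source themselves.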

Translators as Authors

The fact that translators are authors, and should be considered as such, is hard to
dispute. They produce a new text which extends the reach of an otherwise inaccessible text to a new readership, and this draws on a variety of cognitive processes: interpreting the source text and then getting into the shoes of readers to decide on the best target-language rendering of it. The value they add to the original
text is the reason for the existence of translation as a profession. But it is also hard to
dispute that the depth or the intensity of authorship varies widely across text genres,

45
Such as an MD5 code, https://en.wikipedia.org/wiki/MD5.
46
Effectively a stand-off annotation of the World-Wide Web.
47
Project Paracrawl (http://paracrawl.eu) is developing and will release in 2021 a complete deferred
crawling software suite as part of Bitextor (http://github.com/bitextor); however, the new software
will not specify any indication of position inside the document, following a suggestion by Heafield
(2020), but simply the URL of the document and digest or checksum of the text segment, assuming a
standard process to split the document in segments: after segmentation, the segment having the
same digest will be recovered (it is possible, but utterly unlikely, for two different segments to have
the same digest).

with artistic and literary texts towards one end and repetitive formulaic texts towards
the other end.
According to Article 2 of the Berne Convention, “Translations, adaptations,
arrangements of music and other alterations of a literary or artistic work shall be
protected as original works without prejudice to the copyright in the original work.”
In the same article, literary and artistic works are said to “include every production
in the literary, scientific and artistic domain, whatever may be the mode or form of its
expression.” So, clearly, the Berne Convention recognizes translators as authors and
establishes that they can also benefit from copyright.48

Value Added and Changes in the Profession

The production of sentence-aligned parallel corpora from texts found on the Internet
is usually an automated process. For example, one may use software such as
Bitextor49 or the ILSP Focused Crawler.50 In addition to the research and develop-
ment effort put in creating this software, most of which has been already paid for
using European taxpayers’ money, the effort to set up a crawling job (identifying
websites, installing and configuring the software, etc.) and the price of the compu-
tational resources involved is clearly non-negligible. Subsequent cleaning also
requires software that has to be installed,51 configured and run. Finally, the storage
and, optionally, the publication of the resulting sentence-aligned parallel corpora
requires sizable computational resources again.
But value is also crucially added when each sentence is aligned with its likely translation in another document. The resulting sentence-
aligned parallel corpora may be used as TMs in CAT environments, with substantial
savings in translation costs; as a result, the use of TMs has radically changed the way
in which professional translators work (limited access to context or TMs, work with
disordered pre-translated segments) and get paid (fuzzy-match discounts); see
Garcia (2007, 2009). Note that this commoditization of how TMs are used has led
companies such as TAUS to derive profit from setting up a “data marketplace”52
where people buy and sell TMs.
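The fuzzy-match discounts mentioned above rest on a similarity score between a new source sentence and the sentences stored in the TM; the idea can be sketched with Python's standard difflib (an illustration only; commercial CAT tools use their own metrics and discount thresholds, and the example sentences are invented):

```python
import difflib

# A tiny hypothetical English-French translation memory.
tm = {
    "The engine must be serviced annually.":
        "Le moteur doit être révisé chaque année.",
    "The filter must be replaced annually.":
        "Le filtre doit être remplacé chaque année.",
}

def best_fuzzy_match(source, memory):
    """Return the most similar TM source sentence and a 0-100 score."""
    best, score = None, 0.0
    for src in memory:
        r = difflib.SequenceMatcher(None, source, src).ratio()
        if r > score:
            best, score = src, r
    return best, round(100 * score)

match, score = best_fuzzy_match("The engine must be serviced monthly.", tm)
print(score, "% match:", match, "->", tm[match])
```

A translator offered such a match is typically paid a discounted rate for adapting the stored translation rather than translating from scratch.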
With further investment in research, development, installation, configuration and
computational resources, the sentence-aligned parallel corpora may be used to create
CBMT systems which may further cut translation costs through MT post-editing
workflows, and sometimes even drive translators off areas in which machine-
translated text is good enough to be used ‘as is’ (probably in new applications
where professional translation had little or no penetration such as customer reviews
or other user-generated textual content).

48
https://www.ceatl.eu/translators-rights/legal-status.
49
https://github.com/bitextor.
50
http://nlp.ilsp.gr/redmine/projects/ilsp-fc.
51
Such as Bicleaner (https://github.com/bitextor/bicleaner).
52
https://datamarketplace.taus.net/.

Claims for Compensation for Subsequent Use

The collection, curation and training efforts discussed above do add value to data
which was published after translators were compensated as agreed, ideally fairly.
When the customer hiring the translator clearly states in the contract that the
translation will be published, it may not be possible to say that the transaction
with the translator was not completed, but even when this was not explicitly stated,
a fair compensation for the translator’s work should probably be enough.53 While
Moorkens and Lewis (2019) echo the view that this may leave “many [translators]
disempowered with regard to working conditions and repurposing of translated
work”, it is also true that many workers in the world are paid to produce goods or
services that are repurposed downstream to generate other goods or services of
additional value. Historically, workers have managed to organise in unions, guilds,
etc. protecting their interests and have used instruments such as industrial action
(e.g. a strike) to force negotiation of fair compensation for their work, and, to some
extent, to empower themselves. It is true that the nature of repurposing in the case of
translations is very elaborate and, in the case of MT, mediated by machine learning,
but, at the end of the day, translators are workers that could organise (and do
organise to some extent, usually in translators’ associations but less so in traditional
unions) to attain fair compensation for their work, particularly if one considers that
their work will ultimately be used to create translation technologies that change their
profession in ways that may not be acceptable to them. It is worth noting here that
Moorkens and Lewis (2019) advocate for a more collaborative approach and pro-
pose “a community-owned and managed digital commons [which] would ultimately
benefit the public and translators by making the industry more sustainable than at
present” and state that “there are several reasons for changing the current copyright
and data ownership conditions”.

4.4 Concluding Remarks

This chapter has discussed the licensing and usage rights of the language data used to
train MT systems. There are two main kinds of language data: linguistic resources,
on the one side, used mainly (but not exclusively) in RBMT, and corpora (mainly
sentence-aligned parallel corpora), used to train CBMT systems. It is argued that,
while the case of licensing and usage rights for linguistic resources such as dictio-
naries or rule sets may be considered settled, the case for sentence-aligned parallel
corpora needs closer examination; in particular, the case is most controversial when
these corpora are derived from text published on the Internet. Sentence-aligned

53
In fact, as one of the anonymous reviewers points out, leaving aside the case of literary works that
become best sellers, or, in some jurisdictions, audiovisual content and videogames, most of the
work done by translators is not subject to royalties.

corpora are crawled, usually in an undetectable way, compiled, sometimes published, and finally used to fuel powerful online commercial MT systems, in a
way that makes it virtually impossible to substantially reconstruct original content
from either the corpora or the output of the systems, whose availability is changing
the nature of professional translation. After re-examining copyright as protection for
authors and translators, its automaticity as dictated by the Berne Convention, the
“text mining” exceptions recently promoted by legislation, the motivations behind
copyright, and the value added when corpora are produced, the chapter, rather than
reaching any conclusion, hopefully gives some clues for stakeholders to contribute
to this relevant, ongoing debate.

Acknowledgments I thank the anonymous reviewers for very useful suggestions.

References

Agić Ž, Vulić I (2019) JW300: a wide-coverage parallel corpus for low-resource languages. In:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL
2019), pp 3204–3210. https://doi.org/10.17863/CAM.44029
Arranz V, Choukri K, Hamon O, Bel N, Tsiavos P (2013) PANACEA project deliverable 2.4, annex
1: Issues related to data crawling and licensing [project deliverable]. http://cordis.europa.eu/
docs/projects/cnect/4/248064/080/deliverables/001-PANACEAD24annex1.pdf
Artetxe M, Labaka G, Aguirre E, Cho K (2018) Unsupervised neural machine translation. In:
Proceedings of ICLR 2018, the International Conference on Learning Representations. https://
openreview.net/forum?id=Sy2ogebAW
Bel N, Forcada ML, Gómez-Pérez A (2016) A maturity model for public administration as open
translation data providers. ArXiv:1607.01990. https://arxiv.org/pdf/1607.01990.pdf
Bowker L (2002) Computer-aided translation technology: a practical introduction. University of
Ottawa Press, Ottawa, ON
Cai J, Utiyama M, Sumita E, Zhang Y (2014) Dependency-based pre-ordering for Chinese-English
machine translation. In: Proceedings of the 52nd Annual Meeting of the Association for
Computational Linguistics (Volume 2: Short Papers), pp 155–160. https://www.aclweb.org/
anthology/P14-2026.pdf
Carlini N (2020) Privacy considerations in large language models. Blog post. https://ai.googleblog.
com/2020/12/privacy-considerations-in-large.html
Christodouloupoulos C, Steedman M (2015) A massively parallel corpus: the Bible in 100 lan-
guages. Lang Resour Eval 49(2):375–395. https://link.springer.com/article/10.1007/s10579-
014-9287-y
De Clercq O, Montero Pérez M (2010) Data collection and IPR in multilingual parallel corpora. In:
Proceedings of the Seventh Language Resources and Evaluation Conference (LREC 2010), pp
19–21. http://www.lrec-conf.org/proceedings/lrec2010/pdf/204_Paper.pdf
Du J, Way A (2017) Pre-reordering for neural machine translation: helpful or harmful? Prague Bull
Math Linguist 108:171–182. https://ufal.mff.cuni.cz/pbml/108/art-du-way.pdf
Forcada ML (2017) Making sense of neural machine translation. Transl Space 6(2):291–309
Forcada ML (2020) Building machine translation systems for minor languages: challenges and
effects. Revista de Llengua i Dret 73. http://revistes.eapc.gencat.cat/index.php/rld/article/
download/10.2436-rld.i73.2020.3404/n73-forcada-en.pdf
Forcada ML, Esplà-Gomis M, Pérez-Ortiz JA (2016) Stand-off annotation of web content as a
legally safer alternative to crawling for distribution. Baltic J Mod Comput 4(2):152–164.
4 Licensing and Usage Rights of Language Data in Machine Translation 69

(proceedings of the 19th Annual Conference of the European Association for Machine Trans-
lation, Riga, Latvia, May 30–June 1, 2016). https://www.bjmc.lu.lv/fileadmin/user_upload/lu_
portal/projekti/bjmc/Contents/4_2_8_Forcada.pdf
Garcia I (2007) Power shifts in web-based translation memory. Mach Transl 21:55–68. https://link.
springer.com/content/pdf/10.1007/s10590-008-9033-6.pdf
Garcia I (2009) Beyond translation memory: computers and the professional translator. J Special
Transl 12:199–214. https://www.jostrans.org/issue12/art_garcia.pdf
Heafield K (2020) Personal communication
Isabelle P, Dymetman M, Foster G, Jutras J-M, Macklovitch E, Perrault F, Ren X, Simard M (1993)
Translation analysis and translation automation. In: Proceedings of the 1993 conference of the
Centre for Advanced Studies on Collaborative research: distributed computing, vol 2, pp
1133–1147. http://www.iro.umontreal.ca/~foster/papers/trans-tmi93.pdf
Koehn P (2009) Statistical machine translation. Cambridge University Press, Cambridge, MA
Kudo T, Richardson J (2018) SentencePiece: a simple and language independent subword tokenizer
and detokenizer for Neural Text Processing. In: Proceedings of the 2018 Conference on
Empirical Methods in Natural Language Processing: System Demonstrations, pp 66–71.
https://www.aclweb.org/anthology/D18-2012.pdf
Lample G, Conneau A, Denoyer L, Ranzato MA (2018) Unsupervised machine translation using
monolingual corpora only. In: Proceedings of ICLR 2018, the International Conference on
Learning Representations. https://openreview.net/forum?id=rkYTTf-AZ
Lee Y-S (2004) Morphological Analysis for Statistical Machine Translation. In Proceedings of
HLT-NAACL 2004: Short Papers, pp 57–60. https://www.aclweb.org/anthology/N04-4015
Lee J, Cho K, Hoffmann T (2017) Fully character-level neural machine translation without explicit
segmentation. Trans Assoc Comput Linguist 5:365–378. https://www.aclweb.org/anthology/Q1
7-1026.pdf
Macken L, De Clercq O, Paulussen H (2011) Dutch parallel corpus: a balanced copyright-cleared
parallel corpus. Meta 56(2):374–390. https://doi.org/10.7202/1006182ar
Moorkens J, Lewis D (2019) Research questions and a proposal for the future governance of
translation data. J Special Transl 32:2–25. https://www.jostrans.org/issue32/art_moorkens.pdf
Pecina P, Toral A, Van Genabith J (2012) Simple and effective parameter tuning for domain
adaptation of statistical machine translation. In: Proceedings of COLING 2012, pp 2209–2224.
https://www.aclweb.org/anthology/C12-1135.pdf
Rafalovitch A, Dale R (2009) United Nations General Assembly resolutions: a six-language parallel
corpus. In: Proceedings of the MT Summit, vol 12, pp 292–299. http://clt.mq.edu.au/~rdale/
publications/papers/2009/MTS-2009-Rafalovitch.pdf
Sennrich R, Haddow B, Birch A (2016) Neural machine translation of rare words with subword
units. In: Proceedings of the 54th Annual Meeting of the Association for Computational
Linguistics, pp 1715–1725. https://www.aclweb.org/anthology/P16-1162.pdf
Simard M, Foster GF, Perrault F (1993) Transsearch: a bilingual concordance tool. Technical
report. Centre d’Innovation en Technologies de l’Information, Laval, QC. http://rali.iro.
umontreal.ca/rali/sites/default/files/publis/sfpTS93e.pdf
Tsiavos P, Piperidis S, Gavrilidou M, Labropoulou P, Patrikakos T (2014) QTLaunchpad public
deliverable D4.5.1: Legal framework. http://www.qt21.eu/launchpad/system/files/deliverables/
QTLP-Deliverable-4_5_1_0.pdf
Ziemski M, Junczys-Dowmunt M, Pouliquen B (2016) The united nations parallel corpus. In:
Proceedings of the Language Resources and Evaluation Conference (LREC’16), Portorož,
Slovenia, May 2016. https://conferences.unite.un.org/UNCorpus/Content/Doc/un.pdf
Chapter 5
Authorship and Rights Ownership
in the Machine Translation Era

Miguel L. Lacruz Mantecón

Abstract A translation of a text of any kind is a derivative intellectual work. It involves a transformation of the original text, and therefore it is a right of the holder
of that text to authorize (or not) its translation. This is covered in Article 8 of the
Berne Convention for the Protection of Literary and Artistic Works of September
9, 1886, and the same Convention tells us in Article 2.3 that “Translations, (. . .) and
other alterations of a literary or artistic work shall be protected as original works
without prejudice to the copyright in the original work (. . .)”. The rights ownership
of the translator implies that the translation cannot be used without authorization
from the translation copyright owner. These premises are jeopardized by the use of
AI systems and the introduction of machine translation. The key issues raised by the
new technologies are the ownership of rights over machine translations, and the
possibility of using the results associated with translation as data for the improve-
ment of Machine Translation algorithms.

Keywords Intellectual property · Copyright · Derivative works · Translations · Machine translation

5.1 Translation as an Intellectual Work

The translation of literary or scientific texts of any kind is a clear example of a derivative intellectual work. Legal doctrine considers the translation of a text an instance of transformation of the work, which requires the authorization of the work's author. This is the sense in which it appears in the Berne Convention for the Protection of Literary and Artistic Works of September 9, 1886, in its Article 2.3: “Translations, adaptations, arrangements of music and other alterations of a literary or artistic work shall be protected as original works without prejudice to the copyright in the original work (. . .)”.

M. L. Lacruz Mantecón (✉)
Law Faculty, University of Zaragoza, Zaragoza, Spain
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Moniz, C. Parra Escartín (eds.), Towards Responsible Machine Translation,
Machine Translation: Technologies and Applications 4,
https://doi.org/10.1007/978-3-031-14689-3_5
It is, therefore, obvious that as translation involves such a transformation of the
original text, it is a right of the holder of that text to authorise (or not) its translation.
This is also covered in Article 8 of the same Convention: “Article 8. Right of
Translation. Authors of literary and artistic works protected by this Convention
shall enjoy the exclusive right of making and of authorizing the translation of their
works throughout the term of protection of their rights in the original works.”
National regulations also contain this right to authorise the translation of a literary
work, as well as the consideration of the translation itself as an autonomous work. In
France, translation is regulated by Article L112-3 of the Code de la propriété intellectuelle, which provides that the authors of translations, adaptations, transformations and modifications of protected works enjoy the protection afforded by this Code without prejudice to the rights of the author of the original work. The British Copyright, Designs and Patents Act 1988 (CDPA) states in its Section 21(3) that translations are among the adaptations subject to the author's authorization: “In this Part “adaptation”—(a) in relation to a literary work, other than a computer program or a database, or in relation to a dramatic work, means—(i) a translation of the work”. Similarly, U.S. copyright law (17 U.S. Code § 101) considers translations to be derivative works.
The Spanish Intellectual Property Law, Royal Legislative Decree 1/1996, of
12 April (Ley de Propiedad intelectual, Real Decreto Legislativo 1/1996, de 12 de
abril), tells us in its Article 11 that translations are derivative works: “Sin perjuicio
de los derechos de autor sobre la obra original, también son objeto de propiedad
intelectual: 1.° Las traducciones y adaptaciones (. . .)”.1 In the aforementioned Article, translations, as well as revisions, updates and annotations of works, compendiums, abstracts, extracts and musical arrangements, are considered by law as transformations of a literary or scientific work. Another feature is that such transformations may be remunerated not only by means of a percentage, but also by a lump sum.
We see, therefore, how translation is considered a genuine “work”, although one of a dependent or secondary nature, subject to obtaining the corresponding authorization from the rights holder of the text to be translated. This need to request permission for translation does not arise when the use of the original text is permitted by intellectual property laws, for example when dealing with literary works in the public domain, or with legislative texts. But the translation rights remain, and the translation remains a work of intellectual property.
The translation author, the translator, holds the rights over their work. As stated
by Venuti (1995), “a translator can be said to author a translation because translating
originates a new medium of expression, a form for the foreign text in a different
language and literature”. Lee (2020) points out that, on the basis of the earlier rulings in Byrne v. Statist (1914) and Walter v. Lane (1900), it was concluded that the translator is “(. . .) an author as a matter of law in respect of the translated text – not a joint author, but an author in his/her own right.”

1
Without prejudice to the copyright on the original work, the following are also the object of intellectual property: 1. Translations and adaptations (. . .) (translation of my own).
This ownership by the translator arises despite the inevitable lack of originality of the derivative work, since it expresses (translates) an existing work. Obviously, originality and creativity in translation must be referred to the operation of transforming the source text into another language, not to the form or content, which by definition must be reproduced from the original. The translator's work consists of a kind of reinterpretation of the original work, although Lee (2020) believes that this is an exaggeration of the role of the translator, who merely transforms the original language into another.
The rights ownership of the translator stands even against the rights holder of the original work and, as Moorkens and Lewis (2020) state, the translation cannot be used “without authorization from the translation copyright owner (. . .) the original author may not use the translation as the basis for a further translation without permission, and that the translation copyright owner retains rights even over translation of a text that is out of copyright”. Despite this, Venuti points out that in most cases the translation rights are transferred to the holder of the rights to the original work, either by means of an employment contract, whereby the translator is an employee, or by commissioning the translation work.

5.2 Translation by Means of Machines

Translation is certainly an intelligent activity, and Machine Translation (MT), using Artificial Intelligence (AI) systems, is today a reality that effectively meets the requirements of a Turing Test, in which the machine replaces the human translator. In fact, as Wahler (2019) points out, the first approaches to MT predate the foundations of AI laid by Turing in Mind (1950), or by McCarthy at the Dartmouth Summer Research Project on AI. Thus, as early as 1949 Warren Weaver, Director of the Natural Sciences Division at the Rockefeller Foundation, published his memorandum “Translation”, laying the theoretical groundwork for automatic translation based on cryptographic models. And in 1951, Yehoshua Bar-Hillel became the first full-time MT researcher at the Massachusetts Institute of Technology.
This reality of MT is imposed today through the use of neural networks and machine learning, as explained by Moorkens and Lewis (2020): “The application of neural networks to translation means that Machine Translation (MT) is now considered an application of machine learning”, which they define as “an automated process that extracts patterns from data”; the application of machine learning to translation has led to a leap forward in MT quality and to claims that MT is approaching human parity. The use of MT technology is common among individuals and in the translation industry, as explained by Parra Escartín and Moniz (2019): “Nowadays, machine translation is used in professional translation workflows and personal communication alike”. They also emphasise the importance of this technology in
disaster or emergency situations, to “unlock the potentially huge bottleneck that could be caused by a lack of professional translators during crises”.
Two key issues raised by the new translation technologies are: (1) the ownership of rights over machine translations, and, connected with the former, (2) the possibility of using the results associated with translation as data for the improvement of MT algorithms.
Moorkens and Lewis (2020) refer to this issue by saying: “(. . .) ownership and
sharing of translation data became a point of discussion among translators and
translation buyers. For example, Topping (2000) discussed whether T[ranslation]
M[emorie]s should belong to the end client, the L[anguage]S[ervice]P[rovider], or
the translator (as owner of the TM tool); whether data sharing was ethical, and
whether bespoke TMs created for one purpose would be useful for another”. They are naturally referring to the data generated by the act of translating, which are useful for the training of intelligent translation systems, especially those equipped with neural networks, such as neural MT (NMT).

5.3 Translation Authorship with Current Systems

The universal principle in the field of intellectual property is that the original ownership of rights to a literary or artistic work belongs to its author: authorship of the work carries with it the ownership of rights. The same applies in principle to the ownership of a translation as a derivative work, in which the rights correspond to the author-translator. However, this original attribution of rights is a legally articulated mechanism to allow the subsequent transmission of rights over the created work. Once there is a right of ownership to the work, a firm starting point is created for the commissioning and distribution of the work in the artistic, literary or audiovisual market. In the case of translation, authorship allows the transfer of rights to the beneficiary of the translation, who will exploit the translation for its intended application. Copyright regulation is therefore based on the recognition of an initial point of rights ownership, the author, for an openly commercial purpose. In the field of translation, this initial point will be the translator.
With the use of intelligent systems for the production of musical, artistic or literary works, the current problem is that the machine takes on such a leading role that the authorship of the human being is blurred until it almost disappears.
The solution chosen in such cases is to recognize authorship in the person who has contributed to the use of the creative intelligent system by means of a significant activity (or co-authorship if several people have been involved). This idea seeks to preserve the ownership of rights in the human being most directly related to the creative result. Such a solution is justified in legal doctrine especially on the basis of Section 9(3) of the UK Copyright, Designs and Patents Act (1988): “Authorship of work. In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.”
This solution is easily applicable where the use of the machine is purely instru-
mental. In such cases, human intervention is relevant, recognizing the authorship of
the human who has used the system, thus protecting their interests. As Mezei (2020)
explains, in these instrumental or partially autonomous systems, the success of the
algorithms depends largely on human participation in the creation process. To
illustrate this, he refers to the project “The Next Rembrandt”,2 where the intervention
of the human team was decisive.
In MT, the solution has to be the same: if the system is used as a support and/or instrumental tool, the ownership of the rights to the translation must be granted to the person who uses the machine (be it a translator or an end-user). However, this instrumental use is a misleading concept: we say that there is instrumental use when the system does not provide us with perfect translations, and it is necessary to review and post-edit the results. In these cases, the work of the translator is twofold, as they have to prepare the source text and send it as input to the machine, and then check and improve the resulting translation. As Forcada (2023) highlights in a different chapter in this volume, “(. . .) MT is usually implemented in such a way that a roughly language-independent engine performs the translation task by using language resources such as dictionaries and grammar rules, which may be (a) manually written from scratch, (b) obtained by converting existing, manually written dictionaries and rules, (c) learned from monolingual text or from sentence-aligned bilingual text, or (d) a mixture of some or all three”. In short, Forcada (2023) notes: “Human labour, and therefore, creative authorship of works, is present in all forms of MT data”.
The use of rule- or knowledge-based MT systems implies a substantial leap forward, but even today the translations are defective. As Way (2013, 2018) and Parra Escartín and Moniz (2019) remind us, the translation output needs to be checked and post-edited, and the degree of change will depend on the expected level of quality. There could be workflows in which Quality Estimation (QE) algorithms are used to skip post-editing for translations deemed acceptable (predicted as high-quality translations by the QE system), and different post-editing guidelines may be given to translators. These may range from instructing them to aim for complete precision to simply making sure that intelligibility is achieved. This work of reviewing and improving MT output (post-editing) secures the translator's ownership of rights, as the translator's work is openly creative.
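A QE-gated workflow of the kind described above can be sketched as follows. This is only an illustrative outline, not any system discussed in this chapter: the `route_segments` function, the stub MT and QE functions, and the 0.9 threshold are all hypothetical placeholders.

```python
def route_segments(segments, machine_translate, estimate_quality, threshold=0.9):
    """Split MT output into auto-approved segments and segments sent to post-editing."""
    auto_approved, needs_post_editing = [], []
    for source in segments:
        target = machine_translate(source)
        # A QE model scores the (source, target) pair, e.g. from 0.0 (poor) to 1.0 (good).
        score = estimate_quality(source, target)
        if score >= threshold:
            auto_approved.append((source, target))      # skip human post-editing
        else:
            needs_post_editing.append((source, target)) # route to a translator
    return auto_approved, needs_post_editing

# Stub functions standing in for a real MT engine and a real QE model:
mt_stub = lambda s: s.upper()
qe_stub = lambda s, t: 0.95 if len(s) < 20 else 0.5

approved, to_post_edit = route_segments(
    ["short sentence", "a much longer sentence that needs careful review"],
    mt_stub, qe_stub)
```

Only the segments in the second list would reach a human translator, whose post-editing guidelines (full precision vs. mere intelligibility) can then vary per workflow.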

2
The Next Rembrandt project was led by the agency J. Walter Thompson for the ING bank. It had Microsoft and the Delft University of Technology as technology partners, as well as the Mauritshuis Museum and the Rembrandt House Museum. Using the analysis of Rembrandt's work as a starting point, in 2017 it was possible to create a new Rembrandt, the portrait of a man that replicates (not copies) the style of the painter.

5.4 The Blurring of Authorship in Advanced Systems

Today, the use of systems based on neural networks and corpus-based MT (CBMT) yields better translation output. As Wahler (2019) says, since 2016 the Google Neural Machine Translation (GNMT) system “(. . .) uses deep machine learning to mimic the function of a human brain, and Google claims the tool is sixty percent more accurate for English, Spanish, and Chinese than Google Translate, which is phrase-based . . . the GNMT system no longer translates word-by-word but instead translates entire phrases as units, a feature known as ‘soft alignment’”. In spite of this, and in order to avoid complaints, Wahler considers that the intervention of the human translator in post-editing the results remains essential.
But progress in this field leads us to think that in the near future the quality of MT may reach (if it has not already done so) that of the human translator. This forces us to rethink the problem of the authorship of these “automatic translations”, because the supervisory work carried out today by the human translator will not even take place. In fact, this is already an existing scenario, and there are companies that already choose to use MT without a human in the loop for specific purposes. The same problem has arisen in the field of artistic and literary creation, in which genuine robotic authorship is denied by many jurists, on the understanding that genuine creative activity is always human and is not replicable by machines. Miernicki and Ng (2020) point out that the interpreters of the Berne Convention, which lacks a clear definition of the concept of “authorship”, have repeatedly suggested that the Convention refers only to human creators, and that the minimum standard established by this Agreement applies only to works carried out by human beings, excluding AI productions. They add that the US Copyright Act provides protection for “original works” (17 U.S.C. § 102(a)), with the understanding that this refers only to creations made by human beings.
In this sense, Ginsburg and Budiardjo (2019) insist on rejecting even the authorship of AI systems that use neural networks and are trained by means of deep learning. They do so because the use of more sophisticated “learning” models does not change their initial conclusion that machines are not genuinely “creative”. Lanteri (2020) also notes that the American Bar Association endorses the position of Ginsburg just referred to, and that the International Confederation of Societies of Authors and Composers (CISAC) likewise maintains that protected works are those produced using AI merely as additional support to human creativity, and hence that they can be managed within the current copyright framework. For his part, Calo (2016) reduces cybernetic creativity to the ability of a robot to be considered an “interpreter”, which could place it within the scope of copyright protection over derivative creative contributions, such as those of musicians or actors.
In short, genuine authorship of artistic works is denied when there is no significant human intervention in their creation. It is a question of demanding that sufficient human activity has taken place to allow the human actor, and not the machine, to be qualified as the author. In other words, authorship can be attributed to the person who, as Section 9(3) of the British CDPA tells us, is “the person by whom the arrangements necessary for the creation of the work are undertaken”.
However, the fact is that in the current field of art, and predictably also in the future of translation, intelligent systems allow for the creation of works without any human activity other than pressing the “on” button. As Dornis (2021) says, in these cases it must be recognized that the system occupies “the driver's seat” as far as creativity and technical innovation are concerned, and the results are “works without an author” (a human author, one might add). This view is shared by Bridy (2012), who believes that we must stop looking for a subterfuge and admit that the author of the work is, without doubt, the AI system that creates it. Galanter (2020) makes a similar point from the perspective of generative art, and suggests an approach that treats the assignment of authorship not as a moral act, but simply as a descriptive one. We would be obliged to acknowledge the authorship of the machine not for the machine's own sake, but for the sake of human beings, because otherwise our social life in the world of art would be based on a lie. The same applies to automatic translations.
We are therefore faced with a clash between factual reality and the legal requirement that the author of any work be a human being. However, I believe that the fact that all or most of the creative work is carried out by a machine is one thing, and that such a machine may be the rights holder over the creation is quite another. Moreover, as discussed above, the legal recognition of authorship is only a device to establish a firm basis for the recognition of rights, so that works can enter the art market and its distribution channels.
This is why in Spain Navas Navarro (2018) distinguishes between the “legal author”, the natural or legal person who commissioned the work or used the system, and the “material author”, which would be the “robot machine”. Duque Lizarralde (2020) points out that in works generated by intelligent systems the creator is in fact the system, which is not an entity but an object, and under the aforementioned normative frameworks cannot be considered an author. However, it is also evident from the regulations that authorship corresponds only to the effective creator of the work.3 Ramalho (2017) acknowledges AI as the author in factual terms, but questions whether it should be the author in legal terms.4 Clearly, it should not: the only way to proceed is to transfer ownership to the human being who produces the machine output. The solution is to assign the rights ownership to the human being by means of a “legal authorship” (or fictitious authorship), and not to the machine.

3
“El creador de hecho es el sistema, que no es un ente sino un objeto, y bajo los marcos normativos citados no puede considerarse autor. Sin embargo, también se deduce de la normativa que la autoría corresponde únicamente al creador de hecho de la obra” (The effective creator is the system, which is not an entity but an object, and under the cited legal frameworks it cannot be considered an author. Nevertheless, it can also be inferred from the legislation that authorship corresponds solely to the factual creator of the work—translation of my own).
4
“En otras palabras, la IA es el autor en términos fácticos, pero ¿debería ser el autor en términos
legales?” (In other words, AI is the author in factual terms, but should it be the author in legal
terms?—translation of my own).

5.5 The Purported “Electronic Personality” of an Intelligent System

This proposal was echoed a few years ago. It consists of granting a kind of legal personality to advanced intelligent systems, similar to that of companies and other “moral persons”. The idea gained particular publicity following the European Parliament resolution of 16 February 2017, which included recommendations to the Commission on civil law rules on robotics, and which proposed to manage the liability of intelligent systems by attributing to them an electronic personality (primarily with self-driving vehicles in mind). In this sense, authorship could be assigned to the MT system itself.
But accepting that a machine has a personality is very difficult, owing to its lack of consciousness and authentic subjectivity, as I explain in a previous work (Lacruz Mantecón 2020). Moreover, the ownership of rights would necessarily have to be managed by humans, which makes this solution useless. Precisely for these reasons, such an electronic personality has been rejected by most legal specialists. Furthermore, the most recent European texts abandon this idea, which only sought to solve the problem of liability. The European Parliament resolution of 20 October 2020 on intellectual property rights for the development of artificial intelligence technologies (2020/2015(INI)) expressly states in its Recital 13 that, although the process of automatic generation of artistic content raises problems related to the ownership of rights, the European Parliament considers “(. . .) that it would not be appropriate to seek to impart legal personality to AI technologies and points out the negative impact of such a possibility on incentives for human creators”.

5.6 Fully Automated Translation

5.6.1 Approach

The recent European Parliament resolution of 20 October 2020 on intellectual property rights for the development of AI technologies (2020/2015(INI)) welcomes the differentiation between two modes of operation of AI systems. Thus, Recital 14 points out the difference between “AI-assisted human creations and AI-generated creations, with the latter creating new regulatory challenges for IPR protection, such as questions of ownership, inventorship and appropriate remuneration, as well as issues related to potential market concentration (. . .)”.
With regard to the topic of this chapter, the relevant cases are translations generated by the computer completely autonomously, because this is where the problems of authorship and ownership of rights can arise. Although the presence of the translator is still necessary today, in the future the solution will be to separate authorship from ownership of rights, so that the rights necessarily correspond to a human being, call it a legal author or simply a rights holder.
Taking as a starting point Section 9(3) of the British CDPA, this human being will be the one who has made the necessary arrangements to obtain the output of the machine, even when such arrangements are often reduced to commissioning the MT system and sending it the texts for translation. In short, we are trying to find the “human behind the machine”, the human who makes the machine work or who simply takes advantage of its output. Who could be the person with the closest relationship to the machine, and hence in a position to claim rights over its output? Fernández Carballo-Calero (2021) points out that in the specific field of computer-generated works the potential candidates are four: (1) the author of the program; (2) the user of the program; (3) the program; and (4) none.5 We have already seen that the program (the machine) has no personality and therefore no capacity to own rights. Let us now look at the various human candidates.

5.6.2 The System Programmer

To attribute rights on the system results (output) to the author of the program is the
classic solution. Rogel Vide (1984) notes that, as a result of a meeting held in
Geneva in 1979 between WIPO and UNESCO, a Working Group was established that
drafted a Report which, inter alia, dealt with “copyright problems arising from
the use of computers for the creation of works”. One of these problems was the
paternity of computer-generated works, a matter in which the principle was
affirmed that the copyright owner of such works could not be the computer
itself, but the person who triggered the creation. This solution is still
maintained today by
many authors, such as Ginsburg and Budiardjo (2019), who believe that in fully
generative systems the ownership of the created works corresponds to the
“programmer-designer” of the machine.
Likewise, Bridy (2012) follows Judge Holmes’s assertion of authorship based on
the inherent uniqueness of human personality. She points out that the law
cannot confer copyright ownership of an artificially generated work of art on
its immediate author, because that author is, in fact, a generative software
program and has no legal personality. Therefore, the programmer of the
generative software is the logical
rights holder of the works generated by its software: It would be “the author of the
author of the works”. As Ramalho (2017) points out, this was also the position taken
by the English courts in the Nova Productions Ltd v Mazooma Games Ltd and
Others (2007), in which the court attributes ownership to the programmer and
designer of a video game.

5
“En el ámbito específico de las computer-generated works se ha señalado que los "sospechosos
habituales" son cuatro: (1) el autor del programa; (2) el usuario del programa; (3) el programa y
(4) ninguno”.
80 M. L. Lacruz Mantecón

This range of potential candidates would include all participants in the design of
the MT system, from programmers to data selectors and “trainers” of the intelligent
system by means of deep learning. The norm will be plural authorship of the
systems and, consequently, co-authorship. Of course, if the designers act for a
company, the rights owner would be the company they work for.
However, despite its popularity, this attribution of rights to the programmer or
designer is now being rejected, because as Carrasco Perera and del Estal Sastre
(2017) explain, it implies an overprotection since program creators already have an
exclusive right to their creations and obtain their remuneration by licensing their
programs. These authors state that the owner (author) of intellectual property
rights in a computer program neither requires nor deserves additional
protection as simultaneous author of the work resulting from the use of the
program.6 In our
case, such resulting work would be the output translation. The same idea is found in
American doctrine. Yu (2017) claims that allowing the programmer to protect by
copyright not only their software but also any result thereof would be an
excessive reward for their efforts and would invite the accumulation of
copyright.

5.6.3 The System User

Holder et al. (2019) point out that, in the future, when the machine acquires an
autonomy that allows it to generate creations of its own, the only human intervention
will be the initiation of the process and the establishment of the requirements for the
work, something that the authors describe as “interaction” with the machine. As they
point out, in these cases the robot's manufacturer-programmer has no intervention
whatsoever in the creative result. In such cases, the user and owner of the machine
would be the only one who holds the rights of the work, because they are the only
one who intervenes in its creation.
Yu (2017) also believes that the assignment of rights should be made to the end
user of the machine, because it makes more sense both from a social policy point of
view as well as from an economic point of view. And this is because the end user
ultimately determines whether a machine-created work is produced, and so it is this
end user who must connect with the interests of the general public to introduce the
works into the market. If, through copyright, we want to encourage the production of
more creative works, it seems better to grant copyright protection to the end user of
the system. In addition, this attribution of user rights would be effective in
encouraging the acquisition of licences and the development of better MT
systems. Yu
(2017) illustrates this by saying that, in the analogue world, this would be
the equivalent of asking whether copyright should be attributed to the pen
manufacturer

6
“El titular (autor) de derechos de propiedad intelectual sobre un programa de ordenador no
requiere ni merece una protección suplementaria, como autor al mismo tiempo del opus resultante
de la aplicación del programa.”

or to the writer. He then questions why this ambiguity should prove problematic
in the digital world and uses Microsoft Word as another example: Microsoft
created the Word software, but obviously does not own all the work done with
that software.7
In the event that the machine produces several alternative results such as multiple
parallel translations, or that a post-editing of the resulting text is required, a new
subject must be added: The person who makes the choice between the various output
results or the person who performs the post-editing of the text. It will normally be the
user themselves who makes such a choice or performs the post-editing, and hence
there will be no discussion as to their rights ownership. But if we are faced with
different people, as will be the case if these tasks are outsourced to a third party, this
ownership will have to be reconsidered. For these cases, Ginsburg and Budiardjo
(2019) revive a theory that was applied to acknowledge the photographer as
author, as opposed to the mere camera operator employed by the photographer who
just followed instructions. This is the so-called “theory of adoption”, and it
prescribes that in these cases part of the creative work of the author consists of the
choice of the results, that is, of the photographs that they consider worthy of their
genius, which results in authorship by adoption. Through the “adoption theory”,
authorship is attributed to the photographer-planner when random forces intervene in
the results, because it is the photographer who “adopts” (or rejects) these results.
This idea is perfectly applicable to the translator who chooses from the various
possibilities presented to them by the machine, a specialised human intervention that
also grants them authorship and rights over the final translation.
The ownership of user rights could be articulated through the creation of a sui
generis8 right, as Sanjuán Rodríguez (2020) proposes in Spain. She considers that
the best thing would be the construction of a sui generis right or a related right in line
with the one referred to in article 129.2 of the Spanish Intellectual Property Law.
Seeking investment protection, Ramalho (2017) also advocates for the granting of a
related or sui generis right similar to that of database manufacturers. This would be a
regime similar to the law of the publisher of unpublished works as prescribed by the
Directive 2006/116/EC of the European Parliament and of the Council of
12 December 2006 on the term of protection of copyright and certain related rights,
whose Article 4 grants publishers 25 years of protection.

7
“En el mundo analógico, esto es como preguntarse si el derecho de autor debería atribuirse al
fabricante de una pluma o al escritor. Entonces, ¿por qué podría tendría que ser problemática esta
ambigüedad en el mundo digital? Tomemos el caso de Microsoft Word. Microsoft creó el programa
informático Word, pero evidentemente no es titular de todos los trabajos realizados con ese
software.”
8
Sui generis is a Latin expression meaning “of its own kind”. In Intellectual
Property Law it is used to refer to the rights of those not covered by
authorship of an intellectual or artistic work, such as a cinema producer or
the businessman behind the creation of a database.

5.6.4 Legal Solutions Outside the Copyright Field

If the minimum user activity of operating the machine and collecting the results is
not considered sufficient to grant the user the rights to the generated translation, the
generated work would fall, as a consequence, in the public domain. This would be
due to a lack of human creative activity, which determines that the results of MT are
not considered works protected by copyright. Ginsburg and Budiardjo (2019)
believe that, as authorship and subsequent rights ownership are exclusively
referable to human beings, a work falling into the public domain is the
consequence of a lack of sufficient human intervention in the creative process.
The public domain would be what Fernández Carballo-Calero (2021) denotes as
the “none” owner. As Ríos Ruiz (2001) points out, this is a classic solution, already
advocated for by WIPO professor Daniel Gervais in the early 1990s. In the case
of autonomously generated authorial and artistic productions, Gervais holds
that the works thus generated fall into the public domain because they are not
intellectual creations made by human beings, and hence International Copyright
does not protect them.
Mezei (2020) also advocates for the public domain, and notes that although this
solution will lead to the loss of incentives for some, he deems it more appropriate to
follow the proposal by Victor Palace, who asserted that the AI industry is likely to
continue to flourish regardless of copyright, as it has done so far, due to the inherent
incentives in the AI industry itself. He proposes the creation of a special category of
authorless works, which are already “born” in the public domain.
However, this does not mean that the results of the system, the translations,
are left without any protection: alternative means such as unfair competition
rules can be used. Mezei (2020) refers to a Japanese legislative proposal to
introduce
an intellectual property regime for works not created by humans and which would
therefore be applied to the results generated by AI. According to this proposal,
instead of expanding the copyright system, the regulatory body will analyse a
framework that handles AI-created works in a manner similar to trademarks,
protecting them from unauthorised use through legislation that prohibits unfair
competition. The proposal goes on to explain that in light of the ability of
AI-based systems to create a huge amount of work in a short time, the plan is to
provide protection only to those works that reach a certain degree of popularity or
otherwise maintain market value. In particular, and as Scheuerer (2021) explains,
unfair competition legislation can intervene effectively in the field of intellectual
property in three dimensions: First, establishing a general regulatory paradigm for an
approach to the protection of intangible assets that is adapted to the data economy.
Second, in a de lege ferenda9 dimension, proposing it as an alternative to the
introduction of new intellectual property rights in cases of uncertainty.10
And third, in a de lege lata11 dimension, fulfilling the function of providing
supplementary protection to existing IP rights.

9
In legal texts, de lege ferenda refers to matters that need to be regulated by laws in the future.

5.7 Use of Translation-Associated Results

5.7.1 Personal Data Associated with Translation

The intelligent system, when translating, returns the translated text as the first result.
But there are other associated results that are less visible.
For starters, when loading the source text, it is possible to extract the personal data
relating to the subject referred to in the text, or even the biometric data of the
individual who introduces his or her spoken language into a translation system,
and even data on attitudes or emotional states extracted through paralinguistic
communication. As Trancoso (2022) says in another chapter in this volume, when
sending a biometric signal such as speech to a remote server for processing, “. . . this
input signal reveals, in addition to the meaning of words, much information about
the user, including his/her preferences, personality traits, mood, health, political
opinions, among other data such as gender, age range, height, accent, etc. Moreover,
the input signal can be also used to extract relevant information about the user
environment, namely background sounds”. That is, through the MT mode called
speech-to-speech MT (S2SMT), this whole series of personal data can be obtained.
However, the processing and use of personal data must be authorised by the
person from whom they have been extracted, and it is this consent that
determines the legitimacy of the use of such data. See in this sense
Regulation EU 2016/679 of the European Parliament and of the Council of 27 April
2016 on the protection of natural persons with regard to the processing of personal
data, whose Article 6 states: “Lawfulness of processing. 1. Processing shall be
lawful only if and to the extent that at least one of the following applies: (a) the
data subject has given consent to the processing of his or her personal data for one or
more specific purposes. . .”. However, Trancoso (2022) proposes a simpler solution:
Protecting the user by encrypting and anonymizing their communication by means
of spoofing algorithms.
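One practical precaution consistent with these consent requirements, sketched below as my own illustration rather than a technique proposed in the chapter, is to pseudonymise obviously identifying tokens before a text ever leaves the user's machine, restoring them locally after translation. The sketch assumes the MT system passes placeholders through unchanged, and its simplistic regular expressions stand in for the named-entity recognition a production system would use:

```python
# Illustrative pseudonymisation of personal data before sending text to a
# remote MT service, and local restoration afterwards. The regexes are
# deliberately simplistic; real systems use proper named-entity recognition.

import re

PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[- ]\d{3}[- ]\d{4}\b"),
}

def pseudonymise(text: str) -> tuple[str, dict[str, str]]:
    # Replace each match with a numbered placeholder, keeping a local mapping.
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(text)):
            placeholder = f"__{label}{i}__"
            mapping[placeholder] = match
            text = text.replace(match, placeholder, 1)
    return text, mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    # Put the original personal data back after the (remote) translation step.
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

masked, mapping = pseudonymise("Contact ana@example.com or 555-123-4567.")
print(masked)                    # personal data replaced by placeholders
print(restore(masked, mapping))  # original text recovered locally
```

The mapping never leaves the user's machine, so the remote service only ever sees the masked text.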
On the other hand, the ethical guidelines that are being imposed in the vast field of
AI require all participants to use personal data appropriately. As Parra Escartín and
Moniz (2019) warn, in the case of translation: “Issues such as who has access to the
data, who is the data curator and manager, how is the data processed and where and
how it is stored are key prior to establishing any translation workflow to ensure that
all parties are protected from potential data and privacy breaches, or even potential

10
Alternative to the introduction of new intellectual property rights in cases of uncertainty
(my translation).
11
In legal texts, lege data refers to what is already enforced by law.

threats like cyberattacks”. As I pointed out in previous work (Lacruz Mantecón
2021), “ethics” in AI can be translated as “systems security”.

5.7.2 Technical Data Suitable for Training Neural Networks

Translating by means of a machine has associated with it a whole series of
additional uses of a statistical and predictive character, which can include
databases used to generate probabilistic models that optimise the translation
task in terms of precision and time. These are the so-called translation memory
(TM) tools, mainly databases of previous translations, already used by the
individual who commissioned the translation, but which remain useful as data
for training corpus-based MT systems, which can be “statistical” or “neural”.
Forcada (2023) explains that corpus-based systems are automatically trained to
learn how to translate from vast collections of examples, “. . .each of which contains
a source-language sentence paired with its translation in the target language, and
sometimes also from an additional corpus of monolingual target-language
sentences”. To train the systems, a segmentation technique is used, which consists
of dividing the text to be translated into fragments. When TMs are used in translation
workflows, a search engine checks whether those segments already have translations
in the database (concordancing tools). In the case of MT, those segments help the
system to learn what is the most probable translation of a given chunk, as well as
what is the most probable sequence of words in the target language. Moorkens and
Lewis (2020) warn that the adoption of NMT means that the old translations “. . .are
reused not only at the segment level (as happens in a TM system) or phrase-level
(as in SMT), but at the word-, subword-, or even character-level in MT output.
NMT systems encode and output words, one by one, followed by sentence-ending
punctuation”. They also point out that the change from statistical to neural MT
“. . .has
increased data requirements further, and the associated improvements in MT fluency
have concomitantly boosted the use of MT not only for assimilation (i.e. gisting) but
also for dissemination”.
Along with these corpora of sample translations, other linguistic resources can
also be used, such as dictionaries or morphological rules.
These resources are mainly used in rule-based systems, but as Forcada says, they are
also useful for transforming and preparing sample corpora for training corpus-based
MT systems.

5.7.3 Rights Over Linguistic Resources

The creation and adaptation of linguistic resources (they must be computer
readable) is expert work, protected by copyright. In particular, by structuring
these resources in the form of computer programs directly executable on the
machine, their use rights are protected by software licences.
As for the bilingual corpora for training systems, Forcada (2023) explains how
they demand a lot of data, from hundreds of thousands to millions of sentence pairs
made up of an original sentence and its translation: “Creating such a corpus requires
a great deal of effort. To start, you need to have enough translated text, ideally
professionally translated. Then, before training the system, the translations must be
aligned sentence by sentence . . .though alignment can be automatically performed, it
may need translators’ revision or supervision”. In short, these are translation data-
bases where the translation of a given sentence is stored along with its original
source text.
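The sentence-aligned databases Forcada describes can be represented very simply. The sketch below (illustrative data of my own) pairs pre-segmented source and target sentences one-to-one; real aligners must also handle merged or split sentences, which is why translators' revision or supervision may be needed:

```python
# Minimal sketch of building a sentence-aligned bilingual corpus.
# Assumes a one-to-one correspondence between source and target sentences;
# real aligners (length- or embedding-based) also handle 1-to-2 and 2-to-1
# correspondences, where human revision becomes valuable.

from dataclasses import dataclass

@dataclass(frozen=True)
class SentencePair:
    source: str   # original-language sentence
    target: str   # its translation

def align_one_to_one(src_sents: list[str], tgt_sents: list[str]) -> list[SentencePair]:
    if len(src_sents) != len(tgt_sents):
        raise ValueError("1:1 alignment assumed; sentence counts differ, revision needed")
    return [SentencePair(s, t) for s, t in zip(src_sents, tgt_sents)]

corpus = align_one_to_one(
    ["Good morning.", "The court adjourned."],
    ["Buenos días.", "El tribunal levantó la sesión."],
)
print(len(corpus), "pairs;", corpus[0].source, "<->", corpus[0].target)
```

Training corpora are, in essence, millions of such `SentencePair` records, each carrying a fragment of a source text and of a translator's work.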
This leads us to one consideration: the object of rights in question is no
different from the rights of the original work’s owner (the source-text owner)
and the translator’s rights in their translation, which will have been
delivered to the person who commissioned the work. What changes is the use of
these texts, since
there is no longer an editorial or commercial use purpose, but a purpose of improving
MT systems by processing large amounts of translation data. This object additionally
presents another unique feature: we are not talking about individual translations, but
about databases made up of a large number of original texts and their translation into
the respective foreign languages.
The WTO Agreement on Trade-Related Aspects of Intellectual Property Rights
(TRIPS) of 1995 stipulated that software and compilations of data should be
protected as literary works under the Berne Convention if they can be
classified as
intellectual creations. However, Moorkens and Lewis (2020) concur with Gow’s
(2007) opinion “that a Translation Memory (TM) file would not qualify as a
sufficiently original creation in the US or Canada”. Nor does it appear that they
can be protected with a sui generis right such as that of databases, since there has not
been a substantial investment in the production of these data separately from the
creation of the translation engine or auxiliary tools.12
These authors explain how TM files are normally handed over to the end
customer, and that unless a contractual agreement allows otherwise, the rights over
these translations (as well as over the original source text) will be in the hands of the
person who commissioned the translation. However, Forcada (2023) believes that
both the source-text authors and the translators “should get an additional
compensation for this unused use of their works (particularly those published
on the Internet
and made using crawling techniques) to generate new, initially unused, value, not
only through MT but also through other translation technologies such as computer-
based translation”.

12
Note that this is what Gow (2007) advocates for. In his view, the creation and feeding of a TM
over time would not be considered an investment. However, this is a field still
to be regulated, and different parties may view this matter from different
angles.

The misunderstandings around the topic of rights are evident. In my opinion,
various stakeholders can be distinguished in the process of compiling these TM
files, glossaries and other tools:
• The final clients of the translator, who are rights holders of the original work and,
by contract with the translator, holders of the exploitation rights of its translation.
These rights are protected independently of those of the author of the work. In
fact, the author’s rights may already have entered the public domain if enough
time has elapsed (in Spain, for example, 70 years after the author’s death).
• The translators, who are still authors of the translation of each original work
stored in the database. Their authorship would allow them to demand, at the very
least, recognition for their work. However, there is an intrinsic problem: the
tremendous segmentation that both the original source text and the translation
may undergo. This makes it difficult to acknowledge additional rights or com-
pensations for the translator.
• The TM database compilers, who compile source-target sentence pairs that can
subsequently be used, in the case of MT, to train systems.
We are dealing with databases and therefore creations protected by copyright,
without prejudice to the rights associated with their contents, as can be seen in
Article 10 of the WTO Agreement on Trade-Related Aspects of Intellectual
Property Rights: “1. Computer programs, whether in source or object code,
shall be protected as literary works under the Berne Convention (1971). 2. Com-
pilations of data . . .which by reason of the selection or arrangement of their
contents constitute intellectual creations shall be protected as such. . . without
prejudice to any copyright subsisting in the data or material itself”.13 Hence,
those compiling TM databases would have ownership rights over the compiled
corpus, provided that they have obtained it in a legitimate manner. It should also
be noted that the compilation and processing of texts for direct use in a translation
or for training MT systems implies providing an important added value to the
raw data.
• In addition to the above, we have subjects of a public or governmental nature that
consider it necessary to have open access databases with translations of regulatory
texts common to various societies, as well as other subjects who argue for a sort
of international brotherhood and maintain that in this field of translation it is
necessary to enable the development of all kinds of translation engines, and hence
all data and tools should be provided to the community free of charge. As an
example of the former, Forcada (2023) refers to the Directorate General for
Translation of the European Commission, “that have published the body of
common rights and obligations binding for all European Union countries, the
so-called ‘Acquis Communautaire’”. As an example of the latter, we can refer to
the Tatoeba platform,14 an open and collaborative database that has a corpus of

13
Article 12 of the Spanish Intellectual property law establishes the same.
14
https://tatoeba.org/en/ (Accessed 28 October 2021).

10,048,454 sentences in 409 languages, obtained in many cases by means of
crowdsourcing, and that publishes its data under a Creative Commons licence. To
these possibilities we need to add, as Forcada (2023) also explains, the parallel
corpora crawled from free-access sites on the Internet, a method used by
commercial systems like Google, Microsoft and DeepL: “A very large, perhaps the
largest source of sentence-aligned translations used to train machine translation
systems comes from publicly-accessible documents published on the Internet
either by manually scraping and aligning it, or either by automated crawling
and alignment”.
In my opinion, the following are the main actors in corpus compilation efforts:
• Database compilers, who compile these corpora from pairs of texts obtained both
from translators and from the end customers of those translators. The
corresponding rights may be obtained either from the translation contract that
initially links a translator and their end customer, or from a subsequent contract
drafted for the purposes of including the translations in a corpus being compiled.
They may also add to the database sentence pairs considered to be of public
ownership, such as those that governmental institutions decide to make available
to any interested party, as well as any data crawled from websites owned by
private entities that decide to make their data freely accessible under a Creative
Commons licence or similar. Both end-customers and translators can become
database compilers and create their own database, by means of any of the above
contractual covenants or by extracting data that is publicly permitted to be
extracted. In such cases, the database is legitimately owned by its holder, and
its use may be granted by obtaining a licence upon payment of the corresponding
rights.
• Public entities, which make available certain types of texts (usually legislative or
administrative) in various languages, all in a free access mode or via copyleft
licences.
• Private entities that make their integrated databases available to the public
through their own funds, crowdsourcing, or crawling freely accessible sites.
These databases are made available to the public for the extraction of their
translations, allowing free access, and, in many cases, re-use through open
licences such as Creative Commons.

5.8 Intellectual Property Regulation and Corpora Extracted from Translation
Memories

As I have said, the compilation of corpora as databases requires the compiler
to obtain authorization from the rights holders, both those of the original
text and those
of the translation. Any content available on any media under copyright or with a
copyright warning is protected by copyright and is not freely accessible.
Contents under copyleft or Creative Commons licences, on the other hand, would
be accessible, but such licences generally prohibit subsequent non-free uses of
the data.
Nevertheless, the text processing applied to both source and target texts, partic-
ularly their segmentation, makes it virtually impossible to make any claims over the
non-consented use of protected data. As Forcada (2023) warns, segmentation is
equivalent to inserting the text into a document shredder, which transforms it
into a series of paper strips that mix with other strips from different
shredded documents. As the author points out, the result is that it is
extremely difficult to rebuild substantial parts of the original document,
which makes it almost impossible to demonstrate that a non-consented
exploitation of the text has occurred, or that compensation is owed. He also
adds two possible exceptions to the
need to obtain authorization from the right holders: The first, that the use of text
fragments could be covered by the “fair use” exception in Anglo-Saxon law. The
second is that the processing of texts can be understood as “data mining”, an
operation permitted by the Directive (EU) 2019/790 of the European Parliament
and of the Council of 17 April 2019 on copyright and related rights in the Digital
Single Market: “Article 3. Text and data mining for the purposes of scientific
research. 1. Member States shall provide for an exception to the rights provided
for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/
29/EC, and Article 15(1) of this Directive for reproductions and extractions made by
research organisations and cultural heritage institutions in order to carry out, for
the purposes of scientific research, text and data mining of works or other subject
matter to which they have lawful access. . .”.
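Forcada's shredder analogy can be made concrete: once segments from many documents are pooled and shuffled, nothing in the data itself records which document a segment came from. The toy illustration below (my own, with hypothetical texts) shows provenance surviving only if it is deliberately recorded, which training pipelines rarely do:

```python
# Toy illustration of why segmentation frustrates rights claims: sentence
# "strips" from different documents are pooled and shuffled, and source
# attribution survives only if it was deliberately recorded alongside them.

import random
import re

documents = {  # hypothetical protected texts
    "doc_A": "The licence expires in May. Renewal requires written notice.",
    "doc_B": "The witness arrived late. Proceedings were suspended.",
}

pool = []
for text in documents.values():  # provenance deliberately NOT recorded
    pool.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip())

random.shuffle(pool)
print(pool)  # four sentence strips; origin no longer recoverable from the data
```

At the scale of real training corpora, millions of such strips from thousands of works, reconstructing "substantial parts" of any one document from the pool alone becomes practically impossible, which is precisely the evidentiary problem described above.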
To sum up, the reservation of rights to original texts and their translation can only
be effectively protected if those texts have not left the possession of their holders.
When such texts become accessible via web pages or repositories, they become
susceptible to tracing, localization and extraction for segmentation by web crawlers.
And once the text has been segmented, any rights complaint becomes extremely
difficult. We can also conclude that professional translation work today cannot
survive without the use of MT instruments, for reasons of competitiveness. As
early as 2009, Garcia (2009) said that “as soon as 2010, translation will be pushed
into simple MT post-editing, (..). Translators will still be needed, but their working
conditions into the next decade will be quite dissimilar to those of the
nineties.” And, to conclude, the evolution and progress of MT systems will lead
to a
reduced need for human translators. This was the case with the arrival of the railway:
the number of stagecoach drivers decreased, but the number of train drivers
increased.

5.9 Conclusions and Main Takeaways

The introduction of intelligent systems in the field of translation has brought
a revolution in the work of translation professionals. At first glance, MT has
become
a supporting tool, as the translator is aided by a tool that is increasingly achieving a

better degree of quality in its results. But this increase in quality means, on
the one hand, that the translator is in some cases replaced by the tool and, on
the other, that competition is increasingly fierce: productivity has risen,
allowing more services to be offered for the same remuneration, which sometimes
places translators in difficult situations.
Today the profession is still in this tense situation, but translators are aware that
systems are becoming increasingly better precisely because they themselves are
helping to increase the quality of MT systems. The image that comes to my mind
is that of a person riding on the back of a tiger: They cannot fall from their saddle or
descend from it, because the tiger will devour them. This is nowadays the translator's
dilemma with MT technology.
The solution is, of course, to continue riding the tiger. I know virtually nothing
about neural networks, but I do know that they need to be “trained,” and that they
need continuous readjustments: This may be a new niche for the translator’s work. I
also know that the large text-generating companies will continue to need profes-
sional translation services, and that to get the exact translation of the ambiguities of
any language a brain that understands these ambiguities is required, because only
such a brain will be able to discern the context in which they are generated and the
nuances associated to them.
This is precisely one of the weaknesses of any AI system. They are usually
developed to act in a very narrow context, and hence have problems understanding
double senses, contradictions or absurd claims. We are faced with one of the
limitations of intelligent systems already noted by the AI pioneers, who soon
realised that self-referential claims (such as those referred to in Gödel’s
theorem) could not be processed by an intelligent system. Here the human being
has an enormous advantage: contradictions pose no problem, as they are accepted
as a fairly frequent human trait. Transposed to the field of translation, the
human translator therefore has this enormous advantage over the machine.
Finally, with regard to the unauthorised use of translations which is carried out
through the fragmentation of texts, and the lack of remuneration that translators
suffer from such exploitation, the solution may come from copyright. In fact, in view
of the unremunerated use of protected works that occurred through copying literary
works, musical performances or films by means of photocopiers, tape recorders or
video devices and later also CD recorders or computer memories, the law reacted by
imposing an equitable compensation for the authors, editors, producers or inter-
preters who saw the works in which they had been involved being copied.
This is the system of “equitable compensation” by private copy, which the
Spanish Intellectual Property Law regulates in its article 25: “1. La reproducción
de obras divulgadas en forma de libros o publicaciones que a estos efectos se
asimilen mediante real decreto, así como de fonogramas, videogramas o de otros
soportes sonoros, visuales o audiovisuales, realizada mediante aparatos o
instrumentos técnicos no tipográficos, exclusivamente para uso privado, no
profesional ni empresarial, sin fines directa ni indirectamente comerciales, de
conformidad con el artículo 31, apartados 2 y 3, originará una compensación
equitativa. . .”.15
90 M. L. Lacruz Mantecón
The reader will already have anticipated the idea that I now set out: the aim would
be to achieve equitable compensation for translations or translation fragments used
in the development of MT systems. This remuneration would be distributed among
the translators, rewarding the use of their works without asking them for
authorization to do so. The debtors of this compensation would be those who use
translations for the purpose of applying them to their intelligent systems, espe-
cially the developers of MT systems. The amount to be paid should be fixed
according to the estimated amounts of text used (possibly with the intervention of
a public body). The creditors would be the translators whose texts are used for this
purpose. However, as with the equitable compensation of authors and interpreters,
the amounts would not be collected by the translators themselves, but by their
“management entities”: professional intellectual property rights management
organisations with numerous affiliates. Perhaps translators would have to set up
something similar at the national level, although it would be more practical to
entrust management to an already established entity (in Spain, CEDRO,16 for
example, as it is an organisation that takes care of this for literary authors). This
idea remains to be explored in future work.

Acknowledgements The present work has been carried out under the project “Derecho e
inteligencia artificial: nuevos horizontes jurídicos de la personalidad y la responsabilidad
robóticas”, PI: Margarita Castilla Barea (PID2019-108669RB-100/AEI/10.13039/501100011033).

References

Berne Convention for the Protection of Literary and Artistic Works of September 9, 1886
Bridy A (2012) Coding creativity: copyright and the artificially intelligent author. Stanford Technol
Law Rev 2012:5. http://stlr.stanford.edu/pdf/bridy-coding-creativity.pdf
Calo R (2016) Robots in American Law (February 24, 2016). University of Washington School of
Law Research Paper No. 2016-04. Available https://ssrn.com/abstract=2737598
Carrasco Perera Á, del Estal Sastre R (2017) Art. 5. In: Bercovitz R (ed) Comentarios a la Ley de
Propiedad Intelectual, 4th edn. Tecnos, Madrid
Directive 2006/116/EC of the European Parliament and of the Council of 12 December 2006 on the
term of protection of copyright and certain related rights

15. 1. The reproduction of works disseminated in the form of books or publications that are
assimilated to these purposes by royal decree, as well as phonograms, videograms or other sound,
visual or audiovisual media, carried out by means of non-typographical technical apparatus or
instruments, exclusively for private, non-professional or business use, without direct or indirect
commercial purposes, in accordance with article 31(2) and (3), shall result in an equitable
compensation . . . (my translation).
16. https://www.cedro.org/.
5 Authorship and Rights Ownership in the Machine Translation Era 91

Dornis TW (2021) Of ‘Authorless Works’ and ‘Inventions without Inventor’ – the muddy waters
of ‘AI autonomy’ in intellectual property doctrine. Eur Intellect Prop Rev (EIPR) 2021
Duque Lizarralde M (2020) Las obras creadas por Inteligencia Artificial, un nuevo reto para la
propiedad intelectual. In Pe. i.: Revista de propiedad intelectual, N° 64
European Parliament resolution of 16 February 2017 with recommendations to the Commission on
Civil Law Rules on Robotics (2015/2103(INL))
European Parliament resolution of 20 October 2020 on intellectual property rights for the devel-
opment of artificial intelligence technologies (2020/2015(INI))
Fernández Carballo-Calero P (2021) La propiedad intelectual de las obras creadas por inteligencia
artificial. Aranzadi Thomson Reuters, Cizur Menor
Forcada ML (2023) Licensing and usage rights of language data in machine translation. In:
Moniz H, Escartín CP (eds) Towards responsible machine translation. Ethical and legal con-
siderations in machine translation. Springer International Publishing, Heidelberg
Galanter P (2020) Towards ethical relationships with machines that make art. In: West B (ed) AI,
arts & design: questioning learning machines. Artnodes, no. 26, 2020. UOC. https://doi.org/10.7238/a.v0i26.3371
Garcia I (2009) Beyond translation memory: computers and the professional translator. J Spec
Transl 12:199–214
Ginsburg JC, Budiardjo LA (2019) Authors and machines. (August 5, 2018). Columbia public law
research paper No. 14-597. Berkeley Technol Law J 34(2):61–62. https://doi.org/10.2139/ssrn.3233885
Gow F (2007) You must remember this: the copyright conundrum of “translation memory”
databases. Can J Law Technol 6(3):175–192
Holder C, Khurana V, Hook J, Bacon G, Day R (2019) Robotics and law: key legal and regulatory
implications of the robotics age (Part II of II). Comp Law Secur Rev 32:2016
Lacruz Mantecón ML (2020) Robots y personas. Una aproximación jurídica a la personalidad
cibernética. Editorial Reus, Madrid
Lacruz Mantecón M (2021) La ética de los agentes cibernéticos (una ética de plástico para seres de
plástico). Paper presented at the XXVII Congreso Internacional Derecho y Genoma Humano,
Bilbao
Lanteri P (2020) La problemática de la Inteligencia Artificial y el Derecho de autor llama a la
puerta de la OMPI. Cuadernos jurídicos: Instituto de Derecho de Autor, 15° aniversario / Díez
Alfonso (dir.), p 19
Lee TK (2020) Translation and copyright: towards a distributed view of originality and authorship.
Translator. https://doi.org/10.1080/13556509.2020.1836770
Ley de Propiedad Intelectual, Real Decreto Legislativo 1/1996, de 12 de abril.
Mezei P (2020) From Leonardo to The Next Rembrandt – the need for AI-pessimism in the age of
algorithms (July 24, 2020). Arch Med Medienwissensc 2:390–429. https://doi.org/10.5771/2568-9185-2020-2-390
Miernicki M, Ng I (2020) Artificial intelligence and moral rights. AI Soc. https://doi.org/10.1007/s00146-020-01027-6
Moorkens J, Lewis D (2020) Copyright and the reuse of translation as data. In: O’Hagan M (ed) The
Routledge handbook of translation and technology. Routledge, London, pp 469–481
Navas Navarro S (2018) Obras generadas por algoritmos. En torno a su posible protección jurídica.
Rev Derecho Civil 5:273–291
Nova Productions Ltd v Mazooma Games Ltd & Others (2007) EWCA Civ 219, Case No:
A3/2006/0205
Parra Escartín C, Moniz H (2019) Chapter 7. Ethical considerations on the use of machine
translation and crowdsourcing in cascading crises. In: O’Brien S, Federici FM (eds) Translation
in cascading crises. Routledge, London
Ramalho A (2017) Will robots rule the (artistic) world? A proposed model for the legal status of
creations by AI systems. SSRN Pap 2017:2987757. https://doi.org/10.2139/ssrn.2987757

Regulation EU 2016/679 of the European Parliament and of the Council of 27 April 2016 on the
protection of natural persons with regard to the processing of personal data
Ríos Ruiz WR (2001) Los sistemas de inteligencia artificial y la propiedad intelectual de las obras
creadas, producidas o generadas mediante ordenador. Rev Propiedad Mater 3:5–13
Rogel Vide C (1984) Autores, coautores y propiedad intelectual. Tecnos, Madrid
Sanjuán Rodríguez N (2020) La inteligencia artificial y la creación intelectual: ¿está la propiedad
intelectual preparada para este nuevo reto? La Ley mercantil, N°. 72 (septiembre)
Scheuerer S (2021) Artificial intelligence and unfair competition – unveiling an underestimated
building block of the AI regulation landscape. GRUR Int 2021:8–10. https://doi.org/10.1093/grurint/ikab021
Trancoso I (2022) Treating speech as personally identifiable information—impact in machine
translation. In: Moniz H, Parra Escartín C (eds) Towards responsible machine translation.
Ethical and legal considerations in machine translation. Springer International Publishing,
Heidelberg
Topping S (2000) Sharing translation database information: considerations for developing an
ethical and viable exchange of data. Multiling Comput Technol 11(5):59–61. Available online:
https://multilingual.com/all-articles/?art_id=1105. Accessed 12 Nov 2018
US Copyright Act (17 U.S.C.) (n.d.)
Venuti L (1995) Translation, authorship, copyright. Translator 1:1–24. https://doi.org/10.1080/13556509.1995.10798947
Wahler ME (2019) A word is worth a thousand words: legal implications of relying on machine
translation technology. Stetson Law Rev 48:109
Way A (2013) Traditional and emerging use-cases for machine translation. Paper presented at
translating and the computer 35, London
Way A (2018) Quality expectations of machine translation. In: Moorkens J, Castilho S, Gaspari F,
Doherty S (eds) Translation quality assessment: from principles to practice. Springer Interna-
tional Publishing, Heidelberg, pp 159–178. https://doi.org/10.1007/978-3-319-91241-7_8
WTO Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS) 1995
Yu R (2017) The machine author: what level of copyright protection is appropriate for fully
independent computer-generated works? Univ Pa Law Rev 1245:1260. Available https://scholarship.law.upenn.edu/penn_law_review/vol165/iss5/5
Part II
Responsible Machine Translation from
the End-User Perspective
Chapter 6
The Ethics of Machine Translation
Post-editing in the Translation Ecosystem

Celia Rico and María del Mar Sánchez Ramos

Abstract The metaphor of the translation ecosystem originates from situational
models of translation that conceptualise the translation process as a complex
system. This includes not only the translator, but also other people—cooperation
partners such as clients, project managers, proof-readers or co-translators—their
specific social and physical environments as well as their cultural artefacts
(Risku, Translationsmanagement. Interkulturelle Fachkommunikation im
Kommunikationszeitalter. Narr, Tübingen, p 19, 2004). These artefacts, understood
as objects made or used by humans for a particular purpose, have a high relevance
for the translation process and for the translator’s cognition. The artefact group of
translation technology includes, among others, tools for terminology and project
management, translation memory (TM) systems, alignment software and machine
translation (MT) systems (Krüger, Lebende Sprachen 61(2):297–332, 2016a). From
the perspective of ecosystemic theories of translation, we are able to include
situational factors which are external and internal to the translator and provide a
holistic means for the analysis of translation performance. In this respect, the ethics
of machine translation post-editing (MTPE) poses a question of central importance, a
question that can be addressed from the stance of the ecosystem metaphor.
MTPE as an object of study is directly linked to the different developments in MT
over time. During the first years of MT, research was largely empirical and focused on MT
usability and comprehensibility, with a view to further developing the technology.
Eventually, when MT reached maturity, research interests concentrated on the
practicalities of MTPE, with case studies and best practice examples (Garcia, Anglo
Saxonica 3(3):291–308, 2012). With the latest developments in neural MT, MTPE is
in a “state of terminological flux” (Vieira, The Routledge handbook of translation
and technology. Routledge, London, p 320, 2019), comprising different, yet com-
plementary, tasks and procedures: as a separate service with its own international
standard, a dynamic activity that goes beyond the static cleaning of MT outputs, and
a task associated by default with lower quality expectations. The instability of MTPE
as a concept leads to the discussion of human agency in the MTPE process, and the
exploration of the extent to which translators are able to intervene in the use of MT in
MTPE. Furthermore, the analysis of the different degrees of human control triggers
diverse issues in the ethics of MTPE. This chapter explores such issues in the light of
the translation ecosystem, analysing three specific ethical dilemmas: (a) Dilemma
#1: the post-editor’s status; (b) Dilemma #2: the post-editor’s commitment to
quality; and (c) Dilemma #3: digital ethics and the post-editor’s responsibility.
Rather than offering a set of closed conclusions, the chapter should be read as an
invitation to the reader to think about key ethical elements and the way MTPE is
affecting the translator’s work.

Keywords Machine translation · Post-editing · Situated translation · Ethics in
machine translation · Translation ecosystem

C. Rico (✉)
Universidad Complutense de Madrid, Madrid, Spain
e-mail: [email protected]
M. del Mar Sánchez Ramos
University of Alcalá, Madrid, Spain
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Moniz, C. Parra Escartín (eds.), Towards Responsible Machine Translation,
Machine Translation: Technologies and Applications 4,
https://doi.org/10.1007/978-3-031-14689-3_6

6.1 Introduction

The latest advances in machine translation (MT) testify to the rapid emergence of
automated translation tools. While each advance in MT (e.g., neural MT) represents
progress, each step forward substantially transforms the translator’s tasks. Machine
translation post-editing (MTPE), for instance, is the activity of making changes
to translations produced by an MT system in order to meet previously established quality
standards (Allen 2003). Although its relevance is sometimes underestimated, MTPE
has always been part of MT: “While mostly a matter of language, it is clear from
early records that post-editors—and pre-editors—have a peripheral role in the
MT-based translation process” (Vieira et al. 2019, p. 3). However, similar to MT,
MTPE has evolved from being a minor task to becoming a translation necessity. It was
around the second decade of the twenty-first century that MTPE emerged as a
discipline of its own in professional workflows, with specific discussions in profes-
sional forums, as part of specialised training courses or mentioned in academic
journals. Accordingly, during this period, translation companies began to offer it
as a value-added service, and the first job offers began to appear (Sánchez Ramos
and Rico Pérez 2020). In any case, it is important to highlight that MTPE has been a
field of research for a longer period of time, with pioneers such as Krings
(2001), O’Brien (2002, 2011) and Guerberof (2012) focusing on how MT was
impacting translators.
The evolution of MTPE is also linked to the constant technology-induced
changes in the daily work of the translator and the translation process. MTPE has
evolved from being a static task (Vieira 2019) and being regarded as an activity
performed once the machine has produced its output, to becoming an activity
wherein translators interact with adaptive MT systems1 as the final version of the
text is being generated. However, although MTPE tasks have gained some recogni-
tion within the translation sector, many studies indicate that translators have a
negative perception of MT for various reasons, ranging from the low quality of
MT systems (Vieira et al. 2019) to the effects of MT on the translation market (Vieira
2020). MTPE has also garnered significance following its emergence as an academic
field of inquiry and inclusion in different university syllabi, as further shown by
research studies on the profile of post-editors and their competencies (Rico and
Torrejón 2012; Sánchez Gijón 2016; Guerberof and Moorkens 2019; Konttinen et al.
2021) or on differences between MTPE, revision, and editing (do Carmo and
Moorkens 2021).
Other studies have focused on MTPE efforts (Krings 2001) and the quality of
MTPE, two key concepts, since the possible benefits of MTPE in professional
translation settings depend on them (Vieira 2019). For example, a heavily debated
topic is whether MTPE should be considered a monolingual or bilingual activity,
which in turn depends on whether the original text is available together with the MT
output. Although the consensus is that MTPE is a bilingual task, various studies
abound on the benefits of monolingual MTPE, as long as the product of an MT
system is post-edited by a subject-matter expert (Schwartz 2014). However, other
studies (Mitchell et al. 2013; Nitzke 2016) only partly corroborate these findings,
demonstrating that monolingual post-editing improves fluency and not adequacy.
Another key concept related to MTPE is productivity (Plitt and Masselot 2010),
which is also linked to quality (Guerberof 2014).2 This may even be the main reason
why MTPE tasks are now in such high demand. Although MT followed by post-
editing increases the productivity levels of a company and of an individual trans-
lator, it may adversely affect quality. Furthermore, whereas neural MT minimises
major grammatical errors, the resulting text may not read like natural language.
Several studies have been conducted to identify the differences between MT
followed by post-editing and human translations. Most recently, Sánchez Gijón
(2016) conducted a study to determine the factors to be considered in addition to
those expected by the industry, such as productivity, when evaluating MTPE aimed
at providing results with quality matching that of human translation. The author
concludes that using MT followed by MTPE jeopardises the naturalness of the target
text, despite the very low error rate. In her words, “reaching human quality means
going beyond the grammatical correction that machine translation systems aspire to
achieve, which is currently provided through MTPE” (Sánchez-Gijón 2020, p. 98).
She concludes that “the overestimation of machine translation is probably its most
denounced aspect among professionals” (Sánchez-Gijón 2020, p. 98).

1. Adaptive MT allows an MT system to learn from corrections on the fly, as the post-editor
makes them.
2. As we will see later when discussing dilemma #2, the question of which quality is to be delivered
in MTPE raises some important concerns in ethical terms.

This brief account of MTPE3 highlights some of the many ways in which this
field of activity is transforming translation, bringing changes which, in turn,
give rise to some ethical considerations. In the present chapter we approach these
from the perspective of the translation ecosystem as described by Krüger (2016a, b).
With a view to adapting Krüger’s model for the purposes of MTPE we will first
review each of the components that make up the translation ecosystem—cooperation
partners, social factors, artefacts and psychological factors—and show how they
relate to the task of MTPE. This will reveal different aspects that intervene in the
construction of MTPE ethics, providing the framework upon which we later build
the argument around three key ethical dilemmas. The second part of the
chapter will concentrate on the discussion of these three key ethical dilemmas:
(a) Dilemma #1: the post-editor’s status; (b) Dilemma #2: the post-editor’s commit-
ment to quality; and (c) Dilemma #3: digital ethics and the post-editor’s
responsibility.

6.2 MTPE in the Translation Ecosystem

The ecosystem metaphor in cognitive studies is based on the central assumption that
the cognitive ecosystem consists of humans and everything that surrounds them
(Strohner 1995). This ecosystem has clear implications for translation studies and has
given rise to situated translation, one of the most relevant theories in translation
studies (Risku 2004, 2010). From a cognitive perspective, translators are situated at
the centre of the process, surrounded by two other significant elements:
artefacts and people (cooperation partners). Artefacts are the objects used by the
translator for specific tasks (e.g., translation memories or MT systems). On the other
hand, cooperation partners comprise language service providers, project managers,
and reviewers, in addition to other internal factors such as the physical and psycho-
logical conditions of the translator. In this model, the translator acquires the status of
the situated agent, who is the creator of textual material in a specific cultural context
(Risku 2004, p. 75).
Inspired by Risku’s (2004) initial proposal, Krüger (2016a, b) creates a situational
model applicable to specialised translation, known as the Cologne model of the
situated LSP translator.4 Krüger’s (2015, 2016a, b) proposal for a translation
ecosystem is based on the models by Serrano Piqueras (2011), Risku (2004,
2010), Holz-Mänttäri (1984), and Schubert (2007). In its representation according
to the Cologne model, Krüger (2016a, b) formulates the translational ecosystem,
covering the entire translation process, which is divided into the creation, transfer,
and organisation phases. Creation is the phase in which the source texts of the LSP

3. Our account is necessarily brief, as we understand that the reader is already familiar with MTPE.
For a thorough review of the matter see, for example, Koponen et al. (2021).
4. In the Cologne model, LSP stands for language for specific purposes (Krüger 2016b, p. 118).

Fig. 6.1 MTPE in the translation ecosystem (adapted from Krüger 2016a)

translation process are written; in the second phase, transfer, content is processed
from one language to another; the last phase, organisation, includes all operational
flows (Krüger 2016a, p. 312). The translation process specifically takes place in the
transfer phase, which is subdivided into various work phases, ranging from project
preparation to quality control (Krüger 2016a, p. 311). This model is premised on the
following components: (1) the translator as the central agent; (2) the cooperation
partners or users; (3) the social factors that comprise the professional world of the
translator; (4) different types of artefacts (or resources) that facilitate the translation
process such as computer-assisted translation (CAT); and (5) psychological and
social factors.
For the purposes of analysing the ethics of MTPE we have adapted Krüger’s
model, addressing those features which we understand are most related to MTPE
(Fig. 6.1). Although the author does not specifically refer to MTPE, we use this
figure as the basis to examine how translation ecosystem components contribute to
the construction of the ethics of MTPE. All these elements are located along the
transfer phase, which explains why we have given less prominence to the other two
phases in our adaptation of the model.5
We will first review each component, highlighting the main ethical issues that
spring from them and, later, concentrate on the discussion of these issues along the
three main ethical dilemmas as indicated above.

5. The creation phase can be related to the concept of pre-editing. In this connection see, for
instance, Guerberof (2019).

6.2.1 Cooperation Partners in MTPE

In line with Risku’s (2010) proposal, Krüger points to cooperation
partners as an essential part of the translational ecosystem since the translator
interacts with each of them at different phases of the translation process (Krüger
2016a, p. 317). Cooperation partners refer to the following figures: the initiator, the
commissioner, the source text producer, the target text user, the target text receiver,
the co-translator, the proof-reader, and the project manager. In our adaptation of the
model, we identify two main groups of cooperation partners in MTPE. On the one
hand, those partners the post-editor interacts with in the exchange of expertise or for
extra input (co-translators, post-editing team and project manager). On the other, the
group of partners represented by the client and the target text user (TT user), who
determine the final use of the text and, therefore, the way it is to be post-edited. In the
interaction with cooperation partners, we can see some aspects that need to be
explored in the light of an ethical dimension. The first relates to the way the
MTPE project is conceived, the role assigned to post-editors, their status and the
negative attitudes towards the task that may result (Plaza-Lara 2020, p. 164). As
we will see in the next section, these constitute what we call dilemma #1, related to
the post-editor’s status. A second aspect (dilemma #2) is concerned with the quality
assigned to the final text in view of the requirements from the client and the TT user.
A final consideration (dilemma #3) is the post-editor’s responsibility towards data
governance and ownership (for instance, the exposure of private data when using
online MT).

6.2.2 Social Factors in MTPE

Krüger (2016a) uses Bourdieusian terminology (Bourdieu 1984) to classify the social
factors that constitute the professional world of the translator into field, capital, and
habitus. The first term refers to the translation field and location of the client who
requests the translation service; the key players here are the actors who provide these
translation services (including language service providers or freelance translators).
The capital determines the economic, cultural, and social relationships between the
translator and the cooperation partners. For example, membership of professional
associations, subject-matter expertise or field-specific training will enable specific
levels of communication with the cooperation partners. Together, these factors will
define the translational habitus, that is, the behaviour of the translator in the field of
action, or the (translation and location) field.
As we will see below, these three features of Bourdieu’s framework are key in
defining the post-editor’s status in relation to that of the translator, and will serve as
the basis for discussing dilemma #1.

6.2.3 Artefacts in MTPE

In the translation ecosystem there are two types of artefacts: translation technology
tools and steering instruments. Although Krüger’s (2016a) model does not assign
much prominence to MTPE, we can easily relate it to the artefact group of translation
technology tools or, in the author’s words, ‘translation technology in a narrow
sense’. In a non-exhaustive list, the author assigns the following tools to this
group: translation memory systems (TM systems), terminology management, MT
systems and project management tools (PM tools). MT systems are seen as affecting
the translation process “even more drastically than TM systems since, in this case,
the translator’s task is reduced to pre- and post-editing while the actual translation is
performed by a machine” (Krüger 2016a, p. 321).
The interaction of MT and TM is further explained in the elaboration of the model
(Krüger 2016b), where CAT tools are contextualised in dynamic processes of
interaction between the translator and the working environment. MT systems are
part of the translation process, as a component of TM systems, providing an
alternative translation to exact or fuzzy matches (Krüger 2016b, p. 125), when the
translator has to decide whether or not to accept the MT output. Again, this opens the
debate of whether the post-editor’s work should be reduced to the mere acceptance of a
segment translated by an MT system integrated into a CAT tool. In fact, Krüger
(2016a) anticipates here what we currently know as the CAT-e(nvironment): a CAT
tool is much more than a translation memory; it is the environment that surrounds the
translator and the artefacts that interact with the rest of the ecosystem. However, as
recent studies state, the translator’s task extends further than pre- and post-editing
and MT can be seen as part of the CAT-e (Vieira 2019).
As we will discuss in the following section, the interaction of the post-editor with
the artefact group of translation technology tools presents two main ethical issues:
(a) the human role—and status—as related to the machine (dilemma #1); (b) the
personal commitment to quality of the final text (dilemma #2).
Together with the artefact group of translation technology tools, MTPE is also
connected to the group of steering instruments, which highlight the importance of
client instructions, style guides, glossaries, databases, and terminological standards,
all influencing the production of the target text by the translator. It is in this group
that we need to include MTPE guidelines and standards, as they are steering
instruments that allow the post-editor to decide when and how a particular segment
has to be post-edited. In this respect, it is interesting to note that using these is not as
straightforward as it may seem. On the one hand, the apparently better quality
delivered in the output of neural MT systems is blurring the division originally
devised by Allen (2001) between light and full post-editing.6 On the other, the
conceptualization of MTPE in the ISO standard (ISO (International Organization
for Standardization) 2017) implies that this task is easier than translation, as it
rests on the assumption that it is the machine that has done the translation effort
(do Carmo 2020). As we see, both guidelines and standards have, then, a relevant
role in defining the ethics of MTPE and the post-editor’s commitment to quality
(dilemma #2).

6. The aim of light post-editing is to make the text comprehensible by making as few changes as
possible, while full post-editing is performed on texts that require higher quality (Allen 2001).

6.2.4 Psychological Factors in MTPE

The last factors Krüger (2016a) mentions in his translation ecosystem are psycho-
logical factors, which include external aspects (e.g. time pressure) and internal
aspects (e.g. motivation). These factors have also been pointed out in recent studies
as intrinsically related to translation: “Translation is currently described as a profes-
sion under pressure from automation, falling prices and globalization” (Vieira 2020).
Without a doubt, these psychological factors of the translator also have a direct
influence on our adaptation of Krüger’s model. For instance, time pressure is linked
to the concept of productivity. As Vieira and Alonso (2020) state, the automation of
translation has made clients expect large volumes of texts translated in a short period
of time. However, this ‘speed’ in the translation process can lead to low-quality
target translations (dilemma #2) if there is a lack of communication between the
client and the post-editor. This, in turn, can have a negative effect on the post-editor’s
motivation and status (dilemma #1). The client, as a cooperation partner and part of
the post-editor’s working environment, should place the post-editor in a relevant
position within the MT network; that is, in Krüger’s (2016a, p. 326) words, a proper
amount of symbolic capital (or degree of expert status) should be assigned to the
post-editor. As we shall see, these psychological factors of the translation ecosystem
also have a bearing on defining the ethics of MTPE.

6.3 Ethical Dilemmas in MTPE


6.3.1 Ethical Dilemma #1: The Post-Editor’s Status

In the translation ecosystem as depicted by Krüger (2016a), we see that the figure of
the translator is central, mastering the use of artefacts, interacting with cooperation
partners, and exhibiting a distinctive professional status. What is not so straightfor-
ward in this model is whether the status of the post-editor remains the same as that of
the translator. As we have seen in our adaptation of Krüger’s model, the post-editor
status is a recurrent issue, present in all interactions with the components of the
ecosystem: (a) the relationship with cooperation partners and the different ways the
MTPE project can be conceived call into question the post-editor’s role in the
translation process; (b) social factors determine the position that post-editors occupy in the
translation ecosystem; (c) artefacts (i.e. translation technology tools and steering
6 The Ethics of Machine Translation Post-editing in the Translation Ecosystem 103

instruments) have an effect on the way the MTPE task is performed and the actual
value assigned to it; and (d) psychological factors undoubtedly have an effect on the
post-editor’s motivation.
In order to deal with this first ethical dilemma about the post-editor’s status, we
will explore the nature of both translators and post-editors from a sociological point
of view, and analyse whether they share the same attributes or, alternatively, whether
they can be considered two different actors in the translation ecosystem. By com-
paring one with the other, we will examine whether there is some loss of power/
influence, or even some marginalisation in the process of becoming a post-editor
(Vieira and Alonso 2020).
From a sociological perspective, Sakamoto (2019) uses Bourdieu’s (1984) social
framework to conceptualise the relationship between the post-editor and the
translator (the main candidate to become a future post-editor). In her model, post-editors
are considered a new category of workers, while translators remain resistant to the
expectations of MTPE, as they feel that the incorporation of this task leaves aside
their professional skills and identities (Sakamoto 2019, p. 201). The positions of
translators and post-editors can then be seen as complementary: while the most
experienced translators, who work in a traditional environment (translation-editing-
revision), are at the top in terms of cultural capital, the composition of the post-editors'
capital is described as high in economic capital and low in cultural capital.7 This is
because the cost-saving property of MTPE is highly valued by end clients who request an
MTPE service and by language service providers that are keen to train post-editors.
The intellectual property of MTPE, on the other hand, is assigned a lower cultural
value (Sakamoto 2019, p. 210). This disjunction may give rise to feelings of
restlessness, anxiety and, sometimes, resentment among translators. However,
these positions are not static and may change as a result of factors such as techno-
logical developments and the evolution of the traditional work environment towards
an MT-based model.
In a subsequent study, Sakamoto and Yamada (2020) explore further the nego-
tiations that take place between project managers and translators in their daily
interactions in the translation ecosystem. The authors conducted four focus groups
involving 22 project managers from 19 language service providers, with a view to
eliciting how the translation community has been shaping the practice of MTPE. The
project managers’ accounts of translators’ work revealed three positions towards the
task (Sakamoto and Yamada 2020, p. 87–88). They identified a first group of “proud
professionals who love to write texts and to create texts from scratch in the way they
like”. This first group tends to reject MTPE work. A second group included those
who are willing to accept MTPE and “prefer to correct existing translations as this

7
“Capital is a resource that social agents invest in and exchange to locate themselves in the social
spaces and hierarchies. In addition to economic capital [. . .], social agents possess other forms of
capital, i.e. cultural capital (this includes upbringing and educational background), social capital
(e.g. personal connections with persons of certain social standings) and symbolic capital (which
confers legitimacy and prestige to the person in the form of, for example, professional titles)”
(Sakamoto 2019, p. 202).

involves less manual and cognitive effort, making the work easy”. The third group is
described by project managers as “fast and cheap translators”, situated “low down in
the translators’ hierarchy but needed to cater to different needs in the market”. A
similar division of perceptions is mentioned in Torres Hostench et al. (2016), who
also identify three attitudes towards MTPE among translators: those who willingly accept
it, those who reject it, and those who accept it with some reluctance. Among the
negative aspects of MTPE, the participants indicate the following: “we do not trust
MT”, “results are not good”, “we use our heads when we translate”, “we do trans-
lations the old artisanal work” (Torres Hostench et al. 2016, p. 20).
On the other hand, and according to Guinovart Cid (2020), the different roles
assigned to translators and post-editors might be a matter of perspective. For some,
MTPE is a new trade or service while, for others, it is an existing activity (MT-aided
translation). The author groups both profiles under a single one, “linguist”, on the
grounds that “future job positions in the translation industry will be more pluri- and
transdisciplinary”, affecting “the very activity of MTPE and the profile of the
professional, who is found in constant synergy of (now fully mixed) boundaries”
(Guinovart Cid 2020, p. 172). This notion of perspective is better seen in the light of
the interaction between the translator/post-editor and the computer, as discussed by
Vieira (2019). He explores agency in MT, involving different degrees of human
control, from MT-centred automatic MTPE to human-centred interactive/adaptive
MT. Agency, then, depends not only on the nature of the task but also on other
aspects such as client requirements, the nature of the commission and the translation
company. From a situated approach, we can adopt a holistic
view and conceive of MTPE not only as an additional service but also as an activity
that goes beyond the simple static cleaning of MT output. In Koponen’s (2016,
p. 133) words, the discussion arises “when the output of MT is considered to be a
first version that overrules that of the translator”. It is at that point that we see
translators/post-editors at risk of being marginalised and disempowered in the
translation workflow.
A complementary aspect that contributes to the uncertain status of the post-editor
is the “terminological instability” of MT (Vieira et al. 2019, p. 4). From a taxonomic
point of view, the integration of MT (and, therefore, of MTPE) into the translation
process has blurred the lines between what is proper to the machine and what
corresponds to the translator. This is the case, for
example, when MTPE takes place in environments where translation memories, MT
and human translation interact. From a conceptual point of view, do Carmo (2020)
challenges the narratives that frame MTPE as the revision of pre-translated content
and analyses how these, in combination with the assumption that MTPE increases
productivity, result in downgrading the value of the service. The very definition of
MTPE, according to ISO 18587, implies that editing and correcting MT output is
easier and takes less time than translation, as the machine has already done the
translation effort (do Carmo 2020, p. 37). This view is reinforced by the usual MTPE
guidelines in the industry, which recommend performing as few edits as possible, with a
focus on time efficiency and productivity. The devaluation of the post-editor is
a natural consequence of linking MTPE tasks to time, productivity and money.

Perhaps it would be worth acknowledging, with do Carmo (2020, p. 41), that the
cognitive load and complexity of performing MTPE tasks are comparable to those of
translation tasks and that the industry tendency to reduce per-word fees does
not really take this effort into account.
In our view, the divide between the status of translation and that of MTPE
stems from the conception of these activities as separate or even antagonistic. As
Mitchell-Schuitevoerder (2020, p. 107) aptly remarks, the acts of translating
and MTPE overlap since “the post-editor is not only post-editing but also translat-
ing”. In order to modify a target sentence, post-editors first need to generate a
translation in their minds, a tentative model, so to speak. In this respect, we agree
with Mitchell-Schuitevoerder (2020, p. 107) when she points out that “the post-
editor’s cognitive effort is undervalued (also financially) if we consider the many
thought patterns needed while MTPE”. This is in line with Melby and Hague’s
(2019) argument that the numerous advances in technology call for a different
view of the translator’s role, one that places them at the core of the process and turns
them into language advisors, dominating the translation ecosystem and deciding
which tools to use, when and how.

6.3.2 Ethical Dilemma #2: The Post-Editor’s Commitment to Quality

There are a number of principles common to all ethical codes for translation. These
concern translation competence, impartiality, integrity and accuracy, potential con-
flicts of interest, confidentiality and continuous professional development. However,
the same codes fall short of providing adequate support to translators in their daily
practice, as they fail to cover the infinite range of potential situations they may face
(Lambert 2018). This situation gets more complex when introducing MTPE in the
translation ecosystem since, as we pointed out before, the quality of the final text
depends on factors such as the quality requirements imposed by cooperation partners
(mainly the client and the final user), the position assigned to the machine as an
artefact in relation to the human post-editor, or the conceptualization of MTPE in
guides and standards as steering instruments. We could assume, a priori, that the
end-product of the MTPE process should be a text similar in quality to the one a
translator might deliver. However, the question is precisely what is meant by
quality when we deal with a translation created by a machine and subsequently
revised by a person. In other words, to what extent can the two results be compared?
What’s more, should they even be compared?
When the post-editor is instructed to “use as much of the raw MT output as
possible” (TAUS 2010), quality might be undermined. In this sense, the debate
arises as to whether quality, in the translation ecosystem, depends on the context in
which the work is carried out, the multiple cooperation partners involved (translator,
client, language service provider) as well as the different interests of each of them,
which can lead to divergence. The artefact group of explicit steering instruments,
such as MTPE guidelines and instructions, is of utmost relevance in this context. As
Moorkens et al. (2018) point out, the variability in translation quality requirements is
expressed “using vague, relatively undefined terms”, in the form of prescriptive
guidelines for light or medium MTPE, error typologies or penalties specifically
designed for a translation client. In the absence of clear rules that can be applied
universally, post-editors may find themselves in a complex situation affecting their
commitment to quality.
In translation studies, the concept of quality presupposes a theory of translation:
from the early intuitive conceptualizations of quality as the “natural flow of the
translated text”, dependent on the translator’s artistic competence, to formal models
based on the concept of text equivalence and the requirements of a translation’s end
user. When quality is explored in the light of what MTPE entails, we must also add
MT as a complementary factor, as the developments of the latter over time have
significantly affected the way the former is perceived. At the time of the first
experiments, back in the 1950s, when the aim was to use computers as a replacement
for the translator, the success of MT was measured by the quality obtained in
comparison with human translation. Those were the times of the so-called FAHQMT
(Fully Automatic High Quality Machine Translation), a wish that had researchers
struggling with computers and texts for a long time. As Bowker (2019, p. 453) points
out, quality, in this context, was understood as “the excellence of machine translation
in relation to a translation made by a professional human translator”. The ultimate
test of quality would be for MT to achieve human parity, a concept that is still under
question (Toral 2020). In fact, the development of translation technology over time
has also brought an evolving concept of translation quality, variably associated with
human time, human judgements of linguistic quality, and productivity measured in
terms of terminological consistency and usability (Pym 2019, p. 437).
With the new developments in MT technology and its incorporation in industrial
processes, MTPE started to gain ground and quality came to be discussed in terms
of which segments should be post-edited, how much time should be allocated to the
task, the type of corrections to be made and, most importantly, what level of quality is
expected. This led Allen (2001) to define two types of MTPE according to the final
purpose of the text: light and full post-editing. These two concepts
are currently being superseded with the emergence of neural MT, which calls into
question certain aspects of the way in which MTPE is carried out. For Vieira (2019,
p. 326, 328), the fact that neural systems produce more fluent texts may make it more
difficult to detect (and correct) translation errors. From this point of view, the notion
of post-editing levels may lose relevance and give way to a different concept of MT
revision in which the post-editor focuses on checking the correct use of terminology
and giving approval to the translated content. This type of MTPE is in line with a
more flexible way of understanding quality, the so-called “fit for purpose” (Way
2013). This concept introduces the idea that the final quality of the text can be
negotiated on the basis of a series of variable criteria and taking into account the
technology to be used. From a practical point of view, the product of MTPE no
longer aspires to be comparable to a human translation, but rather to be adequate for
the end use that will be given to the text. MT evaluation metrics should be sensitive
to the intended use of the system and determine whether a system permits an adequate response to
given needs and constraints (Bowker 2019, p. 454–455).
This diversity of the concept of quality in relation to MTPE is best described in
the framework of a situated model where translation is more than a linguistic
exercise and the translator’s choices depend on a holistic description of “the relevant
factors influencing his/her cognition in real world translation environments” (Krüger
2016a, p. 310). Following Krüger’s model, we see the translator’s cognitive
performance as closely related to the artefact group of “technology in the narrow sense”,
where MT is placed together with TM systems, terminology management, alignment
tools and project management tools. This artefact group is essential to the translation
process (Krüger 2016a, p. 320) and, as such, is an important part of the translator’s
cognition. The post-editor’s commitment to quality is then linked to the specific use
of this artefact group and allows for broader, adaptable quality requirements that
include such factors as human time, human judgements of linguistic quality,
and productivity measured in terms of terminological consistency and usability.

6.3.3 Ethical Dilemma #3: Digital Ethics and the Post-Editor’s Responsibility

There are many aspects that need to be considered when exploring the post-editor’s
responsibility in terms of digital ethics in the use of MT, and they mostly refer to the
interaction with cooperation partners in the translation ecosystem. Following
Canfora and Ottmann (2020), we identify three key issues: (a) translation errors in
critical domains that pose a risk to the end user of the translation, (b) liability for
clients and post-editors when working with MT, and (c) the potential exposure of
sensitive data in free online MT. The last of these, in turn, introduces a series of
important issues related to intellectual property rights and ownership, confidentiality
and non-disclosure agreements, data sharing and data protection in collaborative
environments and MT databases (Mitchell-Schuitevoerder 2020, p. 113–127). Data
governance is also a major concern for authors such as Moorkens and Lewis (2019),
who consider “translation as a shared knowledge resource” and call for “a move to a
community-owned and managed digital commons” (Moorkens and Lewis 2019,
p. 17). All the different issues at stake when using MT certainly need careful
consideration, not only from a legal and normative point of view but also from a
technical perspective, integrating secure translation workflows
in “close-circuit MT engines” (Mitchell-Schuitevoerder 2020, p. 121).
What calls our attention regarding the post-editor’s digital responsibility towards
MT is the acknowledgement of a certain lack of knowledge about what this
technology entails. In this respect, the study conducted by Sakamoto et al. (2017) is
significant. They explored opinions and perceptions of technology use in
the UK language service industry, particularly those of project managers, the “key
people who have strong influences on all aspects of translation practice in the
industry”. Among their findings, we note the following: (a) project managers are
not clearly informed about actual MT use by translators in their teams; (b) they do
not discuss the use of MT openly; (c) if the quality of a delivered translation is low,
they suspect it may be post-edited MT output; and (d) there is no industry-wide
consensus about how much and in what way translators should use MT, and not many
language service providers implement an official policy on it. In this scenario, where MT
use (and MTPE) goes almost unnoticed (deliberately or not) it is critical to delve into
the post-editor’s accountability towards translation as a digital product. From the
perspective of a situated model of translation, the post-editor’s responsibility is
found in the interaction with the different cooperation partners and artefacts, and
along the different phases of the task. That leads us to consider who initiates and
commissions the project (language service provider, project manager, final client),
which technology is to be used (MT system, terminology management tools,
translation memory), which other instruments are used (client instructions, MTPE
guides, previous translations), and which professional status is assumed by post-
editors or even assigned to them. When project managers acknowledge, for instance,
a “willingness and ignorance to know whether translators use MT” (Sakamoto et al.
2017), the MTPE process is deprived of transparency. Similarly, when translators
use free online MT without informing their clients, even when its use has been
excluded specifically in contractual clauses (Canfora and Ottmann 2020, p. 64),
transparency in the process is at risk.
A possible cause for this behaviour might be found in a lack of MT literacy in the
translation community. As Bowker (2019) indicates, “just because machine transla-
tion is easily accessible [. . .] this doesn’t mean that we instinctively know how to
optimise it or even to use it wisely in a given context”. It may be that the
absence of formal training in MT and MTPE hinders the capacity to become an
informed and critical user of MT tools. When exploring Bowker and Buitrago
Ciro’s (2019) findings in the framework of scholarly communication, we can easily
relate them to the context of translators and post-editors, advocating for their ability
to go beyond mere technical (and procedural) competence and become critical
and informed users. Ideally, this involves comprehending the basics of how MT
systems process texts, understanding the implications associated with the use of MT,
and evaluating the possibilities of this technology. It is our contention that a
thorough knowledge of what MT is and what it entails in a situated model of
translation provides an adequate background for the post-editor with regard to
digital ethics.

6.4 Concluding Remarks

This chapter has presented MTPE in the light of the translation ecosystem,
conceptualising the translation process in a situated model. By adapting Krüger’s
(2016a) original framework, we have been able to analyse how the different
components intervening in the process have an effect on the ethical construction of
MTPE. In this respect, we have identified three ethical dilemmas in which post-editors
are confronted with their status as compared to that of translators (dilemma #1), their
commitment to quality is challenged (dilemma #2), and their digital responsibility
towards data governance is questioned (dilemma #3).
In our argumentation, some issues have been left open, since we believe that the
definitive effect of technology on translation (and MTPE) is yet to be seen. In this
respect, we hope this chapter leads the reader to explore further the multiple facets of
the post-editor’s role. As a final consideration, we must take into account that post-
editors actually move between two complementary ethical systems: “utilitarian
business ethics” and “deontological ethics” (Abdallah 2011). The first allows
them to deal with the client and work quickly in order to get paid and make a
profit. The second enables them to accept “translators’ ethical codes” that point to
accuracy, fidelity, and neutrality. In the translation ecosystem, the post-editor is able
to retain agency throughout the process and conform to professional ethical principles.

References

Abdallah K (2011) Towards empowerment: students’ ethical reflections on translating in production networks. Interpreter Transl Trainer 5(1):129–154. https://doi.org/10.1080/13556509.2011.10798815
Allen J (2001) Post-editing or not post-editing? Int J Lang Document 2001:41–42
Allen J (2003) Post-editing. In: Somers H (ed) Computers and translation: a translator’s guide. John
Benjamins Publishing, Amsterdam, pp 297–318. https://doi.org/10.1075/btl.35.19all
Bourdieu P (1984) Distinction: a social critique of the judgement of taste. Routledge, London
Bowker L (2019) Fit-for-purpose translation. In: O’Hagan M (ed) The Routledge handbook of translation and technology. Routledge, London, pp 453–468. https://doi.org/10.4324/9781315311258-27
Bowker L, Buitrago Ciro J (2019) Machine translation and global research: towards improved
machine translation literacy in the scholarly community. Emerald Publishing Limited, Bingley.
https://doi.org/10.1108/9781787567214
Canfora C, Ottmann A (2020) Risks in neural machine translation. Transl Spaces 9(1):58–77.
https://doi.org/10.1075/ts.00021.can
Do Carmo F (2020) “Time is money” and the value of translation. Transl Spaces 9(1):35–57. https://doi.org/10.1075/ts.00020.car
Do Carmo F, Moorkens J (2021) Differentiating editing, post-editing, and revision. In: Koponen M, Mossop B, Robert IS, Scocchera G (eds) Translation revision and post-editing. Industry practices and cognitive processes. Routledge, London, pp 35–50. https://doi.org/10.4324/9781003096962-4
Guerberof A (2012) Productivity and quality in the post-editing of outputs from translation
memories and machine translation. PhD thesis. Universitat Rovira i Virgili
Guerberof A (2014) Correlations between productivity and quality when post-editing in a profes-
sional context. Mach Transl 28(3-4):165–186. https://doi.org/10.1007/s10590-014-9155-y
Guerberof A (2019) Pre-editing and post-editing. In: Angelone E, Ehrensberger-Dow M, Massey G
(eds) The Bloomsbury companion to language industry studies. Bloomsbury, London, pp
333–360

Guerberof A, Moorkens J (2019) Machine translation and post-editing training as part of a master’s
programme. JosTrans 31:217–238. https://www.jostrans.org/issue31/art_guerberof.pdf
Guinovart Cid C (2020) The professional profile of a post-editor according to LSCs and linguists: a
survey-based research. Hermes 60:171–190. https://doi.org/10.7146/hjlcb.v60i0.121318
Holz-Mänttäri J (1984) Translatorisches handeln. Theorie und methode. Suomalainen
Tiedeakatemia, Helsinki
ISO (International Organization for Standardization) (2017) Translation services – post-editing of
machine translation output – requirements. ISO 18587:2017. International Organization for
Standardization, Geneva. https://www.iso.org/standard/62970.html. Accessed 14 March 2020
Konttinen K, Salmi L, Koponen M (2021) Revision and post-editing competences in translator education. In: Koponen M, Mossop B, Robert IS, Scocchera G (eds) Translation revision and post-editing. Industry practices and cognitive processes. Routledge, London, pp 185–201. https://doi.org/10.4324/9781003096962-15
Koponen M (2016) Is machine translation post-editing worth the effort? A survey of research into post-editing and effort. JosTrans 25:131–148. https://www.jostrans.org/issue25/art_koponen.pdf
Koponen M, Mossop B, Robert IS, Scocchera G (2021) Translation revision and post-editing. Industry practices and cognitive processes. Routledge, London. https://doi.org/10.4324/9781003096962
Krings HP (2001) Repairing texts: empirical investigations of machine translation post-editing
processes. Kent State University Press, Kent
Krüger R (2015) The Interface between scientific and technical translation studies and cognitive
linguistics. With particular emphasis on explicitation and implicitation as indicators of transla-
tional text-context interaction. Frank & Timme, Berlin
Krüger R (2016a) Situated LSP translation from a cognitive translational perspective. Lebende
Sprachen 61(2):297–332. https://doi.org/10.1515/les-2016-0014
Krüger R (2016b) Contextualising computer-assisted translation tools and modelling their usability. Trans-kom 9(1):114–148. http://www.trans-kom.eu/bd09nr01/trans-kom_09_01_08_Krueger_CAT.20160705.pdf. Accessed 15 Dec 2020
Lambert J (2018) How ethical are codes of ethics? Using illusions of neutrality to sell translations.
JosTrans 30:269–290. https://www.jostrans.org/issue30/art_lambert.pdf
Melby AK, Hague D (2019) A singular(ity) preoccupation. Helping translation students become
language-services advisors in the age of machine translation. In: Sawyer DB, Austermühl F,
Enríquez Raído V (eds) The evolving curriculum in interpreter and translator education: stakeholder perspectives and voices. John Benjamins, Amsterdam, pp 205–228. https://doi.org/10.1075/ata.xix.10mel
Mitchell L, Roturier J, O’Brien S (2013) Community-based post-editing of machine-translated
content: monolingual vs. bilingual. In: O’Brien S, Simard M, Specia L (eds) Proceedings of the
MT summit XIV workshop on post-editing technology and practice. http://doras.dcu.ie/20030/.
Accessed 15 Dec 2020
Mitchell-Schuitevoerder R (2020) A project-based approach to translation technology. Routledge,
London. https://doi.org/10.4324/9780367138851
Moorkens J, Lewis D (2019) Research questions and a proposal for the future governance of
translation data. JosTrans 32:2–25. https://jostrans.org/issue32/art_moorkens.pdf
Moorkens J, Castilho S, Gaspari F, Doherty S (2018) Translation quality assessment: from
principles to practice. Springer, Cham. https://doi.org/10.1007/978-3-319-91241-7
Nitzke J (2016) Monolingual post-editing: an exploratory study on research behaviour and target
text quality. In: Hansen-Schirra S, Grucza S (eds) Eye-tracking and applied linguistics. Lan-
guage Science Press, Berlin, pp 83–109
O’Brien S (2002) Teaching post-editing: a proposal for course content. In: Proceedings of the 6th EAMT workshop: teaching machine translation. https://www.aclweb.org/anthology/2002.eamt-1.11.pdf. Accessed 15 May 2021
O’Brien S (2011) Towards predicting post-editing productivity. Mach Transl 25(3):197–215. https://doi.org/10.1007/s10590-011-9096-7
Plaza-Lara C (2020) How does machine translation and post-editing affect project management? An
interdisciplinary approach. Hikma 19(2):163–182. https://doi.org/10.21071/hikma.v19i2.12516
Plitt M, Masselot F (2010) A productivity test of statistical machine translation post-editing in a typical localisation context. Prague Bull Math Linguist 93:7–16. https://doi.org/10.2478/v10108-010-0010-x
Pym A (2019) Quality. In: O’Hagan M (ed) The Routledge handbook of translation and technology.
Routledge, London, pp 437–452. https://doi.org/10.4324/9781315311258-26
Rico C, Torrejón E (2012) Skills and profile of the new role of the translator as MT post-editor.
Tradumàtica 10:166–178. https://doi.org/10.5565/rev/tradumatica.18
Risku H (2004) Translationsmanagement. Interkulturelle fachkommunikation im
kommunikationszeitalter. Narr, Tübingen
Risku H (2010) A cognitive scientific view on technical communication and translation. Do embodiment and situatedness really make a difference? Target 22(1):94–111. https://doi.org/10.1075/target.22.1.06ris
Sakamoto A (2019) Why do many translators resist post-editing? A sociological analysis using Bourdieu’s concepts. JosTrans 31:201–216. https://www.jostrans.org/issue31/art_sakamoto.php. Accessed 15 Dec 2020
Sakamoto A, Yamada M (2020) Social groups in machine translation post-editing. Transl Spaces
9(1):78–97. https://doi.org/10.1075/ts.00022.sak
Sakamoto A, Rodríguez de Céspedes B, Berthaud S, Evans J (2017) When translation meets
technologies: language service providers (LSP) in the digital age. University of Porstmouth.
https://www.iti.org.uk/resource/when-translation-meets-technologies-language-service-pro
viders-in-the-digital-age.html. Accessed 15 Dec 2020
Sánchez Gijón P (2016) La posedición: hacia una definición competencial del perfil y una
descripción multidimensional del fenómeno. Sendebar 27:151–162. https://revistaseug.ugr.es/
index.php/sendebar/article/view/4016/5057. Accessed 15 Dec 2020
Sánchez Ramos MM, Rico Pérez C (2020) Traducción automática. Conceptos clave, procesos de
evaluación y técnicas de posedición. Comares, Granada
Sánchez-Gijón P (2020) La posedición bajo el microscopio. In: Álvarez Álvarez S, Ortego Antón
MT (eds) Perfiles estratégicos de traductores e intérpretes. Comares, Granada, pp 81–104
Schubert K (2007) Wissen, Sprache, Medium, Arbeit. Ein integratives Modell der ein- und
mehrsprachigen Fachkommunikation. Narr, Tübingen
Schwartz L (2014) Monolingual post-editing by a domain expert is highly effective for translation
triage. In: O’Brien S, Simard M, Specia L (eds) Proceedings of the third workshop on post-
editing technology and practice. Vancouver, Association for Machine Translation in the
Americas, pp 34–44
Serrano Piqueras J (2011) Überlegungen zur Untersuchung des Einflusses von Translation-
Memory-Systemen auf die Übersetzungskompetenz. MA Thesis at the Institute of Translation
and Multilingual Communication, Cologne University of Applied Sciences
Strohner H (1995) Kognitive Systeme. Eine Einführung in die Kognitionswissenschaft. Opladen.
Westdeutscher Verlag, Leverkusen. https://doi.org/10.1007/978-3-322-94240-1
TAUS (2010) MT post-editing guidelines. https://www.taus.net/academy/best-practices/postedit-
best-practices/machine-translation-post-editing-guidelines. Accessed 15 Dec 2020
Toral A (2020) Reassessing claims of human parity and super-human performance. In: Proceedings
of the 22nd annual conference of the european association for machine translation, Lisbon,
Portugal, pp 185–194. http://eamt2020.inesc-id.pt/proceedings-eamt2020.pdf. Accessed
15 Dec 2020
Torres Hostench O, Cid-Leal P, Presas M, Piqué Huerta R, et al (2016) El uso de traducción
automática y posedición en las empresas de servicios lingüísticos españolas: Informe de
investigación ProjecTA 2015. Bellaterra. https://ddd.uab.cat/record/148361. Accessed
15 Dec 2020
112 C. Rico and M. del Mar Sánchez Ramos

Vieira LN (2019) Post-editing of machine translation. In: O’Hagan M (ed) The Routledge hand-
book of translation and technology. Routledge, London, pp 319–337. https://doi.org/10.4324/
9781315311258-19
Vieira LN (2020) Automation anxiety and translators. Transl Stud 13(1):1–21. https://doi.org/10.
1080/14781700.2018.1543613
Vieira LN, Alonso E (2020) Translating perceptions and managing expectations: an analysis of
management and production perspectives on machine translation. Perspectives 28(2):163–184.
https://doi.org/10.1080/0907676X.2019.1646776
Vieira LN, Alonso E, Bywood L (2019) Introduction: post-editing in practice – Process, product
and networks. JosTrans 31:2–13. https://jostrans.org/issue31/art_introduction.php. Accessed
15 Dec 2020
Way A (2013) Traditional and emerging use-cases for machine translation. In: Proceedings of
translating and the computer. ASLIB, London. https://www.computing.dcu.ie/~away/
PUBS/2013/Way_ASLIB_2013.pdf. Accessed 15 Dec 2020
Chapter 7
Ethics and Machine Translation: The End User Perspective

Ana Guerberof-Arenas and Joss Moorkens

Abstract This chapter analyses existing research on the ethical implications of
using MT in translation and communication, and it describes results from usability
experiments that focus on the inclusion of raw and post-edited MT in multilingual
products and creative texts with an emphasis on users’ feedback. It also offers
suggestions on how MT content should be presented to users, readers, and con-
sumers in general. It finally considers the ethical responsibility of all stakeholders in
this new digital reality. If the ethical dimension is an ecosystem, users also have the
responsibility to support products that protect language, translators, and future
generations.

Keywords Ethics · Machine translation · Usability · User reception · Translation reception · Ethical responsibility · Sustainability

7.1 Introduction

In 2016, ten years after it was launched, the world’s biggest machine translation
(MT) producer, Google Translate, announced that it generated over 143 billion
words per day (Pichai 2016). We can safely assume that this output has subsequently
increased, including many text types translated for a wide variety of users. Why is it
that MT use has become so widespread? There are two primary positions on this:
(a) the technological determinist view that the time has come for this technology,
i.e. that it emerged from the natural evolution of the field, and (b) the social
determinist view that circumstances (societal, technological, economic) were such
that huge efforts were put into MT development. The former implies that MT is a
small part of inevitable technological progress, and it follows that MT should be put
into use wherever possible without consideration of its sociocultural context. The
sociotechnical counterargument is that the pros, cons, and repercussions of each new
technology should be carefully considered by society before its implementation.

A. Guerberof-Arenas (✉)
Guildford, UK
University of Groningen, Groningen, Netherlands
e-mail: [email protected]
J. Moorkens
Dublin City University, Dublin, Ireland
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Moniz, C. Parra Escartín (eds.), Towards Responsible Machine Translation,
Machine Translation: Technologies and Applications 4,
https://doi.org/10.1007/978-3-031-14689-3_7
Kranzberg (1986) wrote that technology is “neither good nor bad; nor is it
neutral” (p. 545). The view among ethicists and researchers in science and technol-
ogy studies or STS (as summarised in Olohan 2017) is that science is not linear and
deterministic, but rather that development is rooted in a worldview from which the
decision as to what to develop, its intended audience, and its implementation are
indivisible. This set of factors, in turn, influence the effects of technologies in use, as
they reshape activities and their meaning, engendering new worlds of their own
(Winner 1983). For example, as Larsonneur (2021) noted, the major MT providers
are now big tech companies due to their access to resources and ubiquitous online
offering. The university research groups that at one stage topped the leaderboards in
competitive MT shared task events, particularly those for well-supported languages,
have gradually been replaced by big tech research groups. This means that the perspective and motivation of big tech companies now drive much of MT development.1 In other words, large corporations rather than all players in society are
determining the use and suitability of MT for assimilation, where MT is served
directly to the end user.
In his (1949) memo proposing MT, for example, Weaver wants to enable communication and encourage peace between nations. He also sees translation as a problem and foreign languages as encrypted versions of English or an as-yet-undiscovered universal language (Raley 2003), an idea that Kenny et al. (2020, p. 1) call one of the 'most reductive ... in translation history'. This is on the basis that
translation as a communicative act is a much more complex process than coding and
decoding language at the superficial level of the written word as opposed to the
world of ideas and of communication between cultures. Nonetheless, this superficial
view has often prevailed: Kenny et al. (2020) note that the notion of ‘foreign as
English’ survived in MT literature well into the 2000s. We can see the evidence of
superficiality and neutralisation in MT output, as discussed in recent literature about
normalisation in MT (Čulo and Nitzke 2016; Toral 2019) and reduced lexical
diversity (Vanmassenhove et al. 2019). And yet, MT also carries the utopian
communicative intent of Weaver’s memo, enabling effective communication for
many people in many scenarios.
In this chapter, we attempt to systematically analyse the ethics of MT as an
end-product, and to examine the world engendered by widely-available MT, a world
that did not really exist before the advent of free, networked, and ubiquitous MT. If
we were to take a stakeholder approach (see Fig. 7.1) in analysing the effects of MT
on groups of people with different levels of involvement with MT (e.g., translators,
company shareholders, engineers, academics, end users) and to take a utilitarian
position as to whether MT helped them materially or to consider whether it helped
them to flourish as human beings, the results would probably be different for each
user group. Here, however, we focus on end users, although we are aware that this
group is multilingual and not homogeneous.

Fig. 7.1 MT stakeholder ecosystem

1. See also Paullada (2020) on MT and power dynamics.
On the one hand, one could argue that the MT end user receives a lot of attention
from big tech companies. Where translators are often given mixed-quality MT and
expected to ‘work miracles’ (Thicke 2013, p. 42), end users are given up-to-date
interfaces and seamless integration within internet browsers. On the other hand,
there is little published user testing of these translation interfaces, and the only
opportunity for feedback, if such an opportunity is offered at all, is a binary yes or
no response to mark satisfaction or dissatisfaction with the translation provided,
giving the impression that the text is just about worth translating, but not necessarily
worth the effort of translating well. For MT providers, is no news good news?
Looking at a variety of user types, Liebling et al. (2020) find that many users’ needs
are not catered to by contemporary mobile MT applications.
Way (2018) suggests that the level of automation for translation should relate to
the perishability of the source text, but notes that new use cases are constantly being
found for raw (and post-edited) MT. Canfora and Ottmann (2018) introduce a
second continuum of risk, whereby low-risk texts afford translation using MT but
high-risk texts (where mistranslation may cause injury or death) require careful
human revision. In Sect. 7.2 of this chapter, we consider ethics and MT as an
end-product for different types of texts (see Fig. 7.2), from those that have a short
shelf-life and are low risk to those that have a long shelf-life and are high risk. Of
course, modelling or presuming a user perspective is not ideal, so we also hear the
voices of real users testing MT in a novel long shelf-life use case in Sect. 7.3. In Sect.
7.4, we discuss the implications of our analyses and some further issues prior to
concluding.

Fig. 7.2 Modelling MT use cases (A: low risk, short term; B: low risk, long term; C: high risk, long term; D: high risk, short term)
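The two-axis model of risk and shelf life can be sketched as a small classification helper. This is an illustrative sketch only: the quadrant labels follow Fig. 7.2, but the `classify` function and the example readings of the chapter's use cases are our own additions, not the authors' instrument.

```python
from enum import Enum

class Quadrant(Enum):
    """Quadrants of the risk x shelf-life grid in Fig. 7.2."""
    A = "low risk, short term"
    B = "low risk, long term"
    C = "high risk, long term"
    D = "high risk, short term"

def classify(high_risk: bool, long_term: bool) -> Quadrant:
    """Place a translation use case on the grid."""
    if high_risk:
        return Quadrant.C if long_term else Quadrant.D
    return Quadrant.B if long_term else Quadrant.A

# Hypothetical readings of the chapter's examples:
print(classify(high_risk=False, long_term=False))  # online travel reviews -> Quadrant.A
print(classify(high_risk=True, long_term=True))    # medicine information -> Quadrant.C
```

As the chapter notes for dynamically translated social media posts, a use case can migrate between quadrants over time, so any such classification is provisional.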

7.2 Modelling MT Use Cases

Translation use cases with a short shelf life that present low risk (A in Fig. 7.2) are
ostensibly ideal for reception of raw, ‘low stakes’ MT. For example, it makes little
sense to hire a professional translator to translate most user-generated content such
as online travel reviews or forum postings, as the reviews are likely to be superseded
by newer ones within hours or days and few readers tend to read beyond the most
recent postings. Similarly, online auctions are time-limited and cease to be useful as
soon as the auction closes. We can only assume that most instances of low-stakes
MT are positive and useful, with comprehension facilitated by MT. End users will
probably get the gist of the review or auction posting, and any mistranslation will
have few, if any, risks or repercussions.
The taxonomy proposed by Canfora and Ottmann (2018), categorising risks
along a continuum of increasing severity, is a useful tool for evaluating the level
of translation risk. These risks range from communication being impaired or impos-
sible, loss of reputation, and financial or legal consequences, to damage to property,
physical injury, and death. Following this taxonomy, there may be a possibility of
communication being impaired and a loss of reputation on the part of the provider of
a low-risk translation, i.e. the hosting site for reviews or sales. An example of this
was the launch of an Amazon portal for Sweden, featuring mistranslations and
vulgar MT errors (Hern 2020). Reputational risk is greater with mistranslation of
social media posts. An embarrassing and offensive mistranslation in 2020 led
Facebook to deactivate English-Thai MT alongside a public apology (Marking
2020). Because most interactions with the average Twitter or Facebook post occur
within the first 2 h or less (Rey 2014), translation needs to be timely. However,
translation that maintains the original tone, content, and indexing (using the appro-
priate hashtags) can be difficult (Desjardins 2017), and since these posts are indef-
initely searchable, they cannot be considered short shelf-life content—or low risk. If
the posting has been machine translated prior to posting, the translation will be static,
but if not, it will be dynamic and not controllable by the original poster, with MT
applied to the post whenever it is viewed by someone using the platform in an
MT-supported language (and thus may be considered part of Category B or even C
in Fig. 7.2). Seemingly innocuous public social media posts may form the basis for major decisions about users that impact on their future opportunities and wellbeing. The United States (US) Citizenship and Immigration Services, for example, is reported
to instruct their officers to ‘screen refugees’ social media posts using commonly
available translation tools such as Google Translate’ (Doc Society v. Pompeo 2019),
despite the tendency of such large language models to encode societal bias (Bender
et al. 2021). The likelihood of mistranslation is greater for low-resource language
pairs due to data sparsity and, according to Wang et al. (2021), possible malicious
attacks on neural MT (NMT) systems to alter the output.
End users also interact with free online MT platforms to access quick, low-risk
translation (A in Fig. 7.2) (Nurminen 2018). This may be to comprehend a restaurant
menu, to read signs and instructions in an unfamiliar language, or for social
interaction, such as fan conversations or bar and restaurant orders in the example
of the use of Google Translate at the 2018 Russia World Cup (Smith 2018). In this
situation, the risks are a little higher, especially for users with little MT literacy, and
providers usually exclude liability (Canfora and Ottmann 2020). Bowker and Ciro
(2019) introduced the concept of MT literacy to include a basic understanding of how MT works, the ability to craft appropriate input text, and the ability to edit MT output for accuracy and readability. The user with little MT literacy may assume that any
translation produced is clumsy but accurate, with possible risky results if they eat an
ingredient to which they are allergic or follow a mistranslated instruction (quickly
joining Category D in Fig. 7.2). With regular use of free online MT, a user might
improve their MT literacy, becoming familiar with the quality to be expected of
output and understanding the types of texts that are likely to be well translated. For
those users, MT may provide a gateway for civic participation and for access to
information (Nurminen and Koponen 2020).
This appears to be the case for the UK-based Syrian asylum seekers in Vollmer’s
(2020) research, who use a smartphone with MT as one of their digital literacy tools
for communication and, in one case, to practice for a driving theory test. Ciribuco
(2020) reports similar pragmatic use of smartphones and MT among asylum seekers
in Italy. However, in the latter case most research participants appear to use MT
uncritically, other than one participant who says that he will not use it again after
realising that it had provided him with an inaccurate translation. The use cases
presented by Ciribuco (2020), including use of MT when watching TV, conversing
with teachers, and working in a garage, probably carry little risk (A and B in
Fig. 7.2), but there is a danger that uncritical use of MT will mean insufficient
discrimination between low- and high-risk uses. Ciribuco (2020) writes of the ‘need’
for translation for ‘survival’, and there may be instances when the availability of
online or mobile MT is crucial as a translator or informal mediator is just not
available, in which case MT fulfils a communication function. This raises a risk of
overreliance on MT, however. The non-English speaking immigrants to the US
interviewed by Liebling et al. (2020) experienced mistranslations that were inappro-
priate or dangerously inaccurate, and report having lost work and struggled to build
relationships due to their use of free online MT on smartphones for almost all
interactions.
Translation is often necessary for survival in high-risk, short shelf-life situations
(D in Fig. 7.2) as addressed in work on crisis translation and on the use of free online
MT engines for health and legal settings. Cadwell et al. (2019) provide examples of
MT being used in response to crises, with common problems of underdeveloped
technology for the language pairs in use, including insufficient data, and available
data coming from an inappropriate domain. These problems exacerbate existing
quality issues with MT, and Federici and O’Brien (2020) suggest preparedness and
the intervention of professional translators and interpreters where possible to miti-
gate risk. In a crisis scenario, not all translation will be public-facing, and a digital
divide may affect access to human or machine translation, whereby socio-economic
factors (as highlighted by Cadwell et al. 2019) or gender inequality (as highlighted
by Vollmer 2020) might limit access to technology generally. Therefore, the use of
MT in these scenarios should be implemented with caution and ideally under
supervision of translators or others with high MT literacy (Parra Escartín and
Moniz 2019).
Example use cases for combined speech recognition and MT tools are often in
medical settings (see Sumita 2017, for example), which appears to be a high-risk
setting, despite the short MT shelf life. A mistranslation could have dire conse-
quences for an individual. In high risk, long-life use cases (C in Fig. 7.2), such as
translation of food ingredients, medicines and their accompanying information, or
instructions for machinery, mistranslation could expose individuals to risk at the
high end of Canfora and Ottmann’s (2018) continuum, such as injury or death. While
the argument for use of MT for assimilation in crisis scenarios might be a utilitarian
effort to minimise harm, there can be little argument that the use of MT without
expert human intervention in high-risk scenarios is neither wise nor ethical (Parra
Escartín and Moniz 2019; O’Mathúna et al. 2020).
Aside from the use cases discussed in this section, we may have low risk, long
shelf life (B in Fig. 7.2) use cases for MT, such as in the use of MT for literature or
for user interface translation. The risks will come at the lower end of Canfora and
Ottmann’s (2018) continuum, with mistranslation risking communication being
impaired or a loss of reputation. For literary translation, one could argue that there
is a long-term risk to language, and reduced readability presents a possible risk to
engagement with the other, to empathy. As noted by Bender et al. (2021), societal
views or biases as represented in MT systems are set in aspic from the moment the
training data is harvested, whereas in society these will change over time. If literature
models the way in which society thinks, feels and behaves, the consequences of poor
engagement with the text could be high-risk in ways that are currently unforeseeable. The argument inherent in the copyright waiver for developing coun-
tries in the 1971 update to the Berne Convention (see Moorkens and Lewis 2019)
suggests that availability is currently considered to outweigh these risks.
In the following section, we look in detail at user interaction with raw and post-
edited machine translation to bring in the direct voice of users and their perception of
the issues faced.

7.3 The Voices of Users

This section gives voice to users of raw MT in technical and creative environments who are not language professionals but the ultimate users or readers of translated texts.
Since 2017, part of our research has explored how using MT engines (both highly customised statistical MT, SMT, and neural MT, NMT) impacts the user or reader experience under different translation modalities. We define translation modality as a descriptor of the process by which the translation is generated. For example, if the translation of a product or a story is generated by professional translators without the aid of MT, this is considered one translation modality that we could call "human translation" (HT); if the translation is generated by MT and then post-edited, it is considered another modality, which we could call "MT post-editing" (MTPE); and, finally, if raw MT output is used, this is labelled MT.

7.3.1 First Experiment: Technical Environment

The first experiment involved 84 participants who were native Japanese, English, German and Spanish speakers using an eye-tracker (Guerberof-Arenas et al. 2021). The participants were frequent users of word-processing applications but differed in their Microsoft Word (MS Word) literacy (i.e. their experience using MS Word).
We set up an intra-subject experiment where the users did six tasks, three of them using the published version of MS Word as localised for their native language (HT/MTPE, as part of the content is post-edited) and, after a brief pause, the remaining three using a machine-translated version of MS Word (MT). Half of the participants in each language group followed the reverse order of translation modalities to counterbalance the order effect. The engine used for this "experimental translation" was a customised SMT engine used in production by Microsoft in the company's localization process at that time (Quirk et al. 2005), and therefore deemed
of acceptable quality for post-editing for all the languages tested (Schmidtke and
Groves 2019). Based on previous experiments (Doherty and O’Brien 2014; Castilho
2016), and in order to analyse the usability of the different translation modalities, we
looked at effectiveness, i.e. the number of completed tasks versus the total number of
tasks; efficiency, i.e. effectiveness in relation to time; satisfaction, i.e. the level of
satisfaction in completing tasks, the time, the instructions given and the language
used in MS Word; and, finally, cognitive effort, i.e. the mental effort employed in
completing (or not) the tasks. For a detailed methodology of this experiment, refer to
Guerberof-Arenas et al. (2019, 2021).
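As a rough illustration of how the first two usability measures relate, the following sketch computes effectiveness and efficiency for an invented participant. The function names and figures are ours, not the study's operationalisation; for that, see Guerberof-Arenas et al. (2019, 2021).

```python
def effectiveness(completed: int, total: int) -> float:
    """Proportion of tasks completed out of the total number of tasks."""
    return completed / total

def efficiency(completed: int, total: int, minutes: float) -> float:
    """Effectiveness in relation to time on task (here, per minute)."""
    return effectiveness(completed, total) / minutes

# Hypothetical participant: 2 of 3 tasks completed in 12 minutes.
print(round(effectiveness(2, 3), 2))     # 0.67
print(round(efficiency(2, 3, 12.0), 3))  # 0.056
```

Satisfaction and cognitive effort, by contrast, come from questionnaires and eye-tracking measures and do not reduce to a single formula of this kind.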
The results show that the type of task, and hence the participants' experience and ability (MS Word literacy), was a factor in their effectiveness, but the translation modality was not a statistically significant factor. However, when it came to the combination of completed tasks and time (efficiency) and to users' reported satisfaction, the translation modality was an influencing factor, and MT scored significantly lower than the HT modality in both efficiency and satisfaction. With regard to cognitive load, the results show that the English participants exerted lower cognitive effort when reading the instructions and completing the tasks in comparison with the other language groups, but there was no difference between the other languages or between the modalities.
After the users had completed the experiment, each participant recorded a semi-structured interview while viewing their own eye-tracking data in a Retrospective Think Aloud (RTA) protocol. The interviews ranged from 10 to 20 min and were conducted in English. Let us examine some of the relevant questions guiding the interviews and the participants' responses to see the effect that MT had on their user experience.

7.3.1.1 Did You Notice a Difference in the Language in MS Word When You Came Back from the Pause?

Strikingly, only three participants said that they had noticed that the application had changed after the pause, and only one participant referred to MT: "I just thought like this is a, this is something that was processed by a machine and you cannot rely on whatever you see, you have to search for it".
However, the majority did not notice the change. There were several reasons
given for this: users with previous experience on that task did not look at the
language in detail, they just focused on the action, because they knew the location
of the option; others concentrated so deeply on reading the instructions and com-
pleting the tasks that they did not pay attention to the language in the application;
others assumed it would be the same application, they looked at other cues in the
application, or they were used to working with another version of MS Word. Having said this, the users did fixate on the words in the application, perhaps only looking for keywords or anchors without necessarily reflecting on the quality.
Most participants were not aware that they were using a different translation
modality after the pause. This came as a surprise to the research group, as we were
expecting that the change would be obvious because we were using a “fake” setup
with raw MT output—no post-editing at all was performed. This could have been
especially problematic for the German and Japanese participants because tradition-
ally these languages are difficult for MT, and because there were obvious errors in
the text displayed within the application in all languages (Guerberof-Arenas et al.
2021).

7.3.1.2 How Did You Find the Quality of the German/Spanish/Japanese Here?

During the RTA, the participants were asked about the general quality of the language they were working on in each modality. The responses were rather mixed, even from the same participants. However, of the 56 participants who commented on the quality of the MT modality, 23 mentioned, surprisingly, that the quality of the language was "Correct", "Fine", "Good" or "Very good", but at the same time they were puzzled by some of the translations: "The Japanese language, I didn't recognise unnatural things in this task, but I did recognise some unnatural translation, like too informal language in some other tasks".

7.3.1.3 How Did You Find the Language in this Menu, Dialog Box, Option?

When the participants were asked about the language in certain menus, dialog boxes
or options, 44 participants reported errors in the MT modality while only 5 reported
errors in the HT modality (which were not actual linguistic errors). These are some
examples of the errors found in MT.
It says, ‘links’ so you can actually adjust the left side, but it didn’t say anything for the right
side, it just said ‘Richting’2 and I was unsure about what it exactly means. (P04DE)
I don’t know if there is the original Spanish application or if there is a translation from the
English. Because in the next task I was looking for the right for the column space in the right.
But it was Correcto3 which is the direct translation from Spanish, so I don’t know if the
Spanish version, I don’t think it has written Correcto instead of right. And it was confusing
for me but the rest I think it is fine. (P07ES)

The users resorted to strategies such as using the context to understand the options, or they back-translated an option to make sense of it.
Therefore, even if at the beginning we were puzzled that the change of modality
was not obvious to the participants after the pause, we did realise that some of the
MT options were confusing and that the language played an important role in finding
an option, especially for users with little experience, and, hence, in completing the
tasks.

2. Richting was the MT alternative instead of Rechts (right in German).
3. Correcto was the MT alternative instead of Derecha, because Right can have several translations, one meaning right-hand side and another meaning correct.

7.3.1.4 How Did You Feel During this Task?

The participants expressed several feelings during the RTA such as confusion,
confidence, concentration, disappointment, frustration, nervousness, unhappiness,
and/or happiness. However, the two most frequent words (5 characters or longer)
when describing their feelings were “confused” and “nervousness”. On some occa-
sions the participants were confused because of the tasks, and on other occasions
they were confused because of the instructions and the difficulty of finding the options, the functionality of the application, or indeed the language: "I didn't see this. No Izquierdo correcto4".
In summary, in this first experiment, most users were not overtly aware when they
were working with the MT modality. This could lead us to believe that using raw MT
technology as part of the translation process, in the context of technical texts or even
software applications as in this case, does not compromise the user experience since
these texts would be considered (according to our models) low risk and long term
(B in Fig. 7.2). However, the participants did experience difficulties with certain
words, which led to inefficiency and lower satisfaction when using MT despite being unaware that they were using MT, and this was more problematic for less-experienced users. Of course, there are other aspects to consider: the type of task,
the experience, the MT quality for a specific language, and the MS Word setup that
might also influence the user experience, but the results show that the translation
modality was indeed a factor.
We see here that the user experience when dealing with the MT translation
modality is not only difficult to gauge because of multiple confounding variables
that make isolating language difficult, such as the task itself, the user’s experience,
the language combination, but also because the user has a pre-set notion of the
quality of the application that is related to the status of that application and even the
historical relationship of the user with this application. Users might be confused
when they look for keywords or when they look at brief messages, and this contributes to a poor experience without necessarily knowing why. There are insufficient studies focusing on testing the user experience when MT is involved as a translation modality, and we believe that asking users whether they found the information translated with MT useful will not reveal the real experience, because this is a much more complex and nuanced phenomenon.

4 The participant is referring here to the indentation, where the MT proposal meant correct in Spanish instead of right. He understands left (Izquierdo) but then sees correct (Correcto) instead of right (Derecho/a).
7 Ethics and Machine Translation: The End User Perspective 123

7.3.2 Second Experiment: Creative Environment

The second experiment explores the relationship between creativity in translation and different translation modalities, and the impact this could have on readers of translated fiction. The study involves 88 participants reading a short story in Catalan followed by a questionnaire in the Qualtrics online survey tool to assess three aspects of the user experience: narrative engagement, enjoyment and translation reception (Guerberof-Arenas and Toral 2020).
The experimental design this time involved three translation modalities: HT,
MTPE and MT. To avoid the effect that one translator could have on the impact of
the story (i.e. to avoid the translator style effect), two professional literary translators
worked on the HT and on the MTPE modalities. The MT modality was provided by a
NMT engine customised for the domain (literature) and language combination
(English to Catalan) (Toral and Way 2018). The three modalities were analysed by
a professional literary translator to identify creativity indexes (considering errors and
creative shifts). Then, in a reception study, participants were presented randomly
with a text translated in one of the modalities (Qualtrics automatically presents a
balanced number per modality). After reading the text, they had to fill in a question-
naire that included a narrative engagement questionnaire (Busselle and Bilandzic
2009), enjoyment questions (Dixon et al. 1993; Hakemulder 2004) and a translation
reception questionnaire designed for this experiment.
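The study relies on Qualtrics to present a balanced number of participants per modality. As a minimal illustrative sketch only (not Qualtrics’ actual mechanism; the function name and seed are our own), balanced random assignment can be done in shuffled blocks of the three modalities:

```python
import random

MODALITIES = ["HT", "MTPE", "MT"]

def assign_modalities(n_participants, modalities=MODALITIES, seed=42):
    """Assign one modality per participant, keeping group sizes balanced.

    Assignment proceeds in shuffled blocks, so counts never differ by
    more than one across modalities.
    """
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_participants:
        block = list(modalities)
        rng.shuffle(block)  # random order within each block of three
        assignments.extend(block)
    return assignments[:n_participants]

groups = assign_modalities(88)
counts = {m: groups.count(m) for m in MODALITIES}
print(counts)  # group sizes differ by at most one
```

With 88 participants and three conditions, such block randomisation yields groups of 30, 29 and 29, which is what “presenting a balanced number per modality” amounts to.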
The results show not only that creativity is highest in HT, lower in MTPE and lowest in MT, but also that the reading experience differs depending on the modality. HT scores higher in narrative engagement and translation reception and is marginally lower than MTPE in enjoyment. However, there are no statistically significant differences between HT and MTPE for any of these variables. This means that once a professional literary translator intervenes in the process, the user experience for HT and MTPE appears to be comparable, even though we see a trend towards higher scores in HT. MT, unsurprisingly, has the lowest engagement, enjoyment and translation reception scores, and these results are statistically significantly lower than for HT and for MTPE. It is noteworthy, though, that the categories in the narrative engagement scale related to attentional focus, emotional engagement and narrative presence do not show statistically significant differences across the modalities.
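The significance results above come from the authors’ analysis of the questionnaire data. As a generic illustration only, with hypothetical scores rather than the study’s data, one simple nonparametric way to test whether two modality groups differ is a permutation test on the difference of group means:

```python
import random
from statistics import mean

def perm_test(a, b, n_iter=10_000, seed=42):
    """Two-sided permutation test on the difference of group means.

    Returns an approximate p-value: the share of random label shuffles
    whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_iter

# Hypothetical 1-7 engagement scores; NOT the study's actual data.
ht_scores = [6, 5, 6, 7, 5, 6, 6, 5, 7, 6]
mt_scores = [4, 3, 5, 4, 3, 4, 5, 3, 4, 4]
print(perm_test(ht_scores, mt_scores))  # small p-value: difference unlikely by chance
```

A test of this kind makes no distributional assumptions about Likert-style scores, which is why nonparametric approaches are common for this sort of reception data; the sketch only illustrates the general mechanics, not the chapter’s actual procedure.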
These results seem to hint at the possibility that contemporary MT might be able to fulfil a communicative function for some genres of literary text, even if the reading experience is not as good as with the other modalities, where professional literary translators intervene (for more information, see Guerberof-Arenas and Toral (2020)).

Apart from the results from the questionnaires, what did the users have to say about the translation modalities?
124 A. Guerberof-Arenas and J. Moorkens

7.3.2.1 If You Realised That It Was a Translation, Can You Describe How You Concluded This?

The participants were debriefed about the translation of the text and then asked whether they had realised they were reading a translation; 66% of the participants responded “Yes” to this question. The follow-up question asked how they had realised it was a translation, and here there was a striking difference between the modalities. In the HT modality, 75% of the participants realised that the text was a translation because it was set in the United States, so the names of the characters and places referred to this country; the remaining 25% referred to literal or unnatural expressions.
In the MTPE modality, 60% of the participants referred to the USA setting, with the remaining 40% referring to unnatural words, phrases, spelling or word order (according to their own preferences).
In the MT modality, only 18% of the participants referred to the USA setting, with the remaining 82% referring to nonsensical words, literal translations, strange syntagmatic expressions, grammatical errors, wrong use of articles, lack of coherence, and incorrect word order.
These comments help us understand that this estrangement factor (these odd words and syntactic expressions) prevents the reader from engaging with or enjoying the text. If the reader is disrupted by these stylistic elements because they encounter errors, they become less focused on the text as a narration and more focused on the unusual words, and hence enjoy the experience less, even if they are unaware of the translation modality.

7.3.2.2 Were There Any Paragraphs or Sentences That Were Difficult to Understand? Can You Tell Us Which Ones?

In the HT modality, 39% of the participants found that there were paragraphs or sentences that were difficult to understand because they contained too much information (for example, proper names) or because the narrative voice changed from third to first person. The difficulties were therefore related to the structure of the ST and the narrative decisions made by the author. In the MTPE modality, 34% of the participants found paragraphs or sentences that were difficult to understand because of the change of narrative voice and because the syntax was confusing. They gave examples of sentences that were long and difficult to follow. In the MT modality, 64% of the participants responded “Yes” to this question because they found odd or nonsensical words, confusion in the gender of articles, wrong syntactical constructions, incorrect proper-noun gender (la Joan instead of el Joan, because in Catalan Joan is a man’s name), or simply thought that certain sentences were not properly translated.

In this second question we can see that in the HT and MTPE modalities the main issues are related to the original ST, while in the case of MT the issues are directly related to the translation.

7.3.2.3 Was There a Sentence or a Paragraph That You Especially Liked? Can You Tell Us Which One?

In the HT modality, most participants, 61%, found paragraphs or sentences that they
liked. They referred especially to the paragraphs of the story that described the way
one of the main characters dies and the description of his gaze.
In the MTPE modality, 43% of the participants found paragraphs or sentences
that they liked. They also referred to the paragraphs in the story where the author
describes how clinical death comes about and the description of the dying man’s
gaze. However, most of the participants (57%) did not say that they liked a paragraph
or sentences in this modality, in sharp contrast with the experience of those in the HT
modality.
In the MT modality, 36% of the participants found paragraphs or sentences that
they liked, mainly the first paragraph where the author explains what clinical death
is, even though there were translation errors in this paragraph.
In summary, when the users were asked specifically about parts of the text they liked, a majority liked parts of the text in the HT modality, and in all three modalities participants referred to parts of the text that already had powerful imagery in the ST.

7.3.2.4 Do You Want to Make Any Other Comment?

There were other comments from the participants at the end of the survey. One participant who read the HT modality said:
I have done the exercise in half an hour, I have read the text very quickly. The writing has
had an impact on me, I could see the images. For this reason, I wouldn’t read something
similar. The text is well written, and it transmits emotions. But I never read this type of
horrible thing. It is not my genre. But yes, the translation is brilliant, it transmits everything.
(P37)5

For this modality, the main issues reported are the genre, the topic and the narrative
style of the ST. Participant P37 even refers to the translation as “brilliant”.

MTPE

For this modality, again the main issues reported are aspects of the ST. Participant
P68 even wants to read the whole book and P85 also praises the translation.

5 The translations from Catalan into English are provided by the authors.

I would like to read the whole book. (P68)


I haven’t read the original to correctly assess the translation. The Catalan used is very good,
but I doubt that the English version would make descriptions using less colloquial expres-
sions. (P85)

MT

In this modality, users did notice errors, even though they did not say whether they realised they were reading MT, and this seemed to influence their reading experience.
There are words in the text that do not make any sense in the context, such as “jihad” or
“thone”. (P03)
It is difficult to know the quality of the translation without reading the original text. (P06)
As I advanced in my reading of the text the translation deficiencies have become less
problematic. (P23)
I've been in lock down for about fifty days and maybe this has influenced the fact that I had a
hard time concentrating. (P49)

We see here that once the readers are immersed in the narrative, they might compensate for the lack of coherence, lexical accuracy and cohesion, so the context and the narrative help to decipher a low-quality text. We wonder about the additional cognitive effort of doing this, especially for a longer story. Recent research confirms that cognitive effort is higher when reading MT in literary texts (Colman et al. 2021). Finally, something that appears obvious but, given the current world situation, seems even more relevant: the personal circumstances of readers influence their perception of language and their engagement. If participants already have difficulties reading because of their personal circumstances (such as a pandemic), shouldn’t a translation facilitate engagement with and enjoyment of the text instead of making it more cognitively demanding?
In summary, we see that in a creative environment the MT modality has a strong effect on the reader experience. Readers show significantly lower engagement, enjoyment and reception scores than those who read a version in which a professional translator intervened. We also see a pattern of higher values in the HT modality relative to MTPE. We are aware that at present MT is not used in the publishing sector, or at least its use is not publicised. However, MT is becoming an intrinsic part of the translation workflow in the audiovisual sector through platforms that might simplify structures to obtain better MT output (Mehta et al. 2020). Are viewers then exposed to the best possible version of their language? There are studies that look into the productivity gains and quality of subtitles and conclude that this is a viable solution (Bywood et al. 2017; Matusov et al. 2019; Koponen et al. 2020), but we feel that analysing productivity and final quality in a more “traditional” way leaves out an important aspect: the impact that MT has on viewers. Based on this research, it is important that streaming platforms make viewers and readers aware of the use of MT, but also that translation reception studies are prioritised if the technology is to be used in any creative domain. The implications of these results for society are varied: on the one hand, they show that MT diminishes the reader’s experience and that using MT as a tool constrains translators’ creativity; on the other hand, the long-term effects of using the technology might be worrying: loss of lexical richness, style simplification, loss of reputation for authors, and, thus, the minimisation of the transformative effect that literature and fiction have on society.

7.4 Discussion on Ethical Implications

In Sect. 7.1 we saw that risk for low-stakes MT is minimised (but not negligible),
rising along with the shelf life of the text. In Sect. 7.2 we looked in detail at two
experimental use cases for raw MT—so-called MT for assimilation—where we can
see that, although the risk is not high, there are nuanced implications for translation
modalities that affect the end user or reader experience. Participants in the studies
described in Sect. 7.2 were not aware of the translation modality chosen to produce
the text that they engaged with, as their preconceptions may have altered their
behaviour or responses.6
It is commonly assumed that MT should help users to communicate and that this means of communication is improving as the technology improves, e.g. the perception that NMT is “harmless” in short-term and low-risk scenarios, as opposed to high-risk, long-term scenarios that mainly involve health, crisis and/or legal settings (Vieira et al. 2020). This stems from the logical perspective that when users are only gisting for content, skimming a website, a document, or a message, misleading or even inaccurate translations are not as “important” as when users are trying to understand or carry out an action that involves their health, their legal status, or indeed their survival. As users of public and private MT technology, and as researchers, we are aware that MT does indeed help users to communicate in a language other than their own, especially if they have not mastered that second language.
By looking at the data from our research, we see several common patterns when
examining the user experience in the context of raw MT output in several models
considered low-risk: (a) users do not necessarily recognise that they are exposed to
MT, as indeed the technology improves, if they are not explicitly informed;
(b) nevertheless, users might be confused, frustrated or (in the case of biased output)
misled by the information found and are likely to encounter errors, awkward style,
and unintelligible words that will result in lower efficiency, satisfaction, enjoyment
or engagement scores; and (c) end users are affected by the translation modality,
especially if their experience of the translated application or knowledge of the source language is low, and they either fail to achieve what they set out to do or compensate for errors by looking at the context, back-translating, or adjusting at later stages of their interaction with the “product”. We conclude that the user experience is not a binary issue resolved by asking whether the “information was helpful or unhelpful in your language”, by counting the “number of translation errors in the target language”, or by calculating a similarity score against a gold reference translation (as happens in automatic MT quality evaluation). User experience research that considers MT should look at a broader picture, treating experience not as a static and isolated event but as part of a communicative process in the short and long term. See Table 7.1 for a summary according to level of risk and length of time.

6 Our ethics committees agreed that these were low-risk settings for the use of MT, but in a high-risk setting this should change. At what level of risk does a study using MT without informing participants become research involving deception?

Table 7.1 Summary of risks discussed

Short term, high risk:
• Overreliance on MT may cause miscommunication
• MT might facilitate timely and cheap sharing of information
• Health hazards
• Legal implications
• Loss of work opportunities

Short term, low risk:
• Overreliance on MT may cause misunderstanding
• Facilitates timely and cheap communication
• Might cause confusion and frustration
• Loss of efficiency
• Loss of engagement

Long term, high risk:
• Financial consequences
• Legal actions
• Loss of health
• Loss of legal status
• Loss of privacy
• Loss of working opportunities

Long term, low risk:
• Higher cognitive effort
• Loss of satisfaction
• Impoverishment of language
• Loss of reputation (brands or authors)
All of this indicates that there are implications and risks inherent in the use of MT output that has not been reviewed, suggesting that readers or users should be made aware
when text has been machine translated via a note or, keeping in mind the legal
implications of mistranslation, a disclaimer. Usability, privacy, and (cyber-)security
do not always integrate well, as evidenced by the ‘accept cookies’ popups that are
mandated by GDPR to appear when visiting many websites from within the
European Union, but a mandatory label or disclaimer for raw MT should not be
distracting or user-unfriendly. We should also clearly indicate what standard of
pre-publication review is necessary to avoid such a disclaimer.
The motivation for low-stakes MT, as described in Sect. 7.1, is for fast or
immediate translation of highly perishable low-risk text. As the shelf-life and risk
level of texts increase, producers or content providers may want to use MT in
translation workflows or even use raw MT to increase productivity and reduce
costs. There is a decision to make in this case, balancing potential risk against
speed and savings. Sometimes, as in Massidda (2015) and many multimedia trans-
lation production networks, consumer demand for simultaneous releases in all
locales—what used to be called simship—pushes turnaround speed. But is it really

the case that consumers cannot wait for the new game, software, or product, even if
this means a compromise on quality and an introduction of risk? Clearly labelling
MT published without review would make this clearer.

7.5 Conclusions on Ethical Implications

Therefore, in the MT ecosystem, the stakeholders involved have certain ethical responsibilities for the communication to be optimal and fair, if indeed this is the objective of MT. A well-informed translation ecosystem could better self-regulate, although it is naïve to assume that this would end unethical practices.
Firstly, the stakeholders in charge of improving or creating MT engines have a responsibility to create algorithms and treat data in ways that are not discriminatory or harmful, that ensure the “owners” of the data are acknowledged and reap the benefits of their work, and that preserve the richness of the user experience in the final product. This might mean curating and documenting the data at deeper levels (lexical, syntactic and stylistic), removing elements of gender, racial or national bias, acknowledging the ownership of that data, and highlighting where issues might arise in it (racial, gender and other biases, certainly, but also impoverishment of the language). This can be done by engaging professionals in the translation field, rather than simply “annotators” found via online mechanical turks, and by understanding translation processes and user experience instead of treating data only as an engineering and numerical exercise. As Bender et al. (2021) suggest, this requires greater transparency and accountability on the part of MT systems and services.
Secondly, companies or government agencies that make use of raw MT without human intervention, for reasons such as “improving productivity” or “disseminating information”, need to make this fact visible to users so that they know the content at hand has not been reviewed and understand the consequences mistranslation might have. Profit might be the primary motivation for a business, and information dissemination for government agencies, but the ethical responsibility towards users and society should be equally considered. In the case of MT, businesses and governments could consider the immediate effect of its usage (can the user understand/enjoy the product/text?) and also the long-term effect (such as the impact of MT on language style/richness in the target culture). This means that the responsibility lies not only with the user who decides to use public MT engines, or the translator using MT integrated in CAT tools, but also with businesses and governments to investigate this impact and make it public, so that users and society can choose whether to use a text with enough accurate information at hand. There needs to be a legal framework (Yanisky-Ravid and Martens 2019)7 that enforces the advertising of MT usage as part of the translation process so that users are aware and proceed with caution. This warning cannot be in small print hidden somewhere within the legal documentation of the product or the text; it must be visible in the application, the text, or the leaflet, and spell out the existing issues with the technology and the possible risks users might encounter while using it.
Thirdly, translators and post-editors are also stakeholders in this ecosystem, as they are obviously involved in the MT workflow as creators of training data when post-editing, translating, and even when using MT as an additional tool. They have a responsibility to acquire enough knowledge about the technology to be aware of the types of errors and biases found in the raw output and how best to fix them. More importantly, they need to be aware of copyright infringements and of possible interferences in producing a high-quality final product. We wonder how long a translator can be exposed to a simplified version of a language without being influenced by it. We understand that translators are not always well compensated for their work, but there is a belief in some parts of the translation community that, because they are dealing with MT post-editing, the effort required and the responsibility towards the content, the user and reader, and the final quality are not as high as without MT, and that knowledge of a language is something immutable that will not be influenced by processing poor translations and reading low-quality references. Parasuraman et al. (2000) suggest that reasonably (but not entirely) reliable automation can lead to skill degradation and to complacency in trusting the automated output. Just as good journalism does not come from cutting and pasting news from different sources, good translation practice does not involve just cutting and pasting references from different technologies; that process could result in a serious impoverishment of the language and the profession.
Fourthly, academics need to bring to light the aspects that we have mentioned in this paper: mainly the use and propagation of curated data, responsible use of that data, MT literacy for translators and users, and analysis of user and reader reception of translated work, including different translation modalities in different languages and different genres. Academia has been better at analysing the use and effect of MT than that of other tools (such as translation memories), possibly because the impact of MT has also been greater. However, there is a need to engage more often and more deeply with the final users of MT and across different branches of knowledge. Interdisciplinary research cannot be a mere box to be ticked in grant applications; it must be a real endeavour in which academics value all aspects of a field. The more mathematical or engineering sides of research cannot take precedence over the more humanistic side of technology: the effect on the people using the technology. For research on user reception and user experience, interdisciplinary research needs to happen at all levels of the research system, and not as an afterthought regarded as a lesser science.

7 The authors offer a detailed description of copyright laws, translation, and how AI might be infringing these international laws that protect authors and creators, and they suggest including work generated by AI systems in such a framework.
And finally, we should also consider the ethical responsibility of users and readers. If the ethical dimension is an ecosystem, users also have a responsibility to buy products that protect language, translators, and those in the text supply chain in the source and target cultures. In the same way that some consumers might not buy certain brands, or in certain shops, web portals or shopping malls, because of their operational practices or because in doing so they would be destroying the local, social and commercial fabric of their city, region or country, users should have enough information about how a text is produced to choose not to buy/see/hear a given product or, at least, to know the effects this will have on their user experience, their language, and their culture in the long term. This can only happen if readers and users are informed by the whole ecosystem and if the ecosystem promotes transparent information sharing.

References

Bender EM, Gebru T, McMillan-Major A, Shmitchell S (2021) On the dangers of stochastic parrots: can language models be too big? In: FAccT ’21, March 3–10, 2021, Virtual Event, Canada
Bowker L, Ciro JB (2019) Machine translation and global research: towards improved machine
translation literacy in the scholarly community, 1st edn. Emerald Publishing, Bingley
Busselle R, Bilandzic H (2009) Measuring narrative engagement. Media Psychol 12:321–347.
https://doi.org/10.1080/15213260903287259
Bywood L, Georgakopoulou P, Etchegoyhen T (2017) Embracing the threat: machine translation as a solution for subtitling. Perspectives 25:492–508. https://doi.org/10.1080/0907676X.2017.1291695
Cadwell P, O’Brien S, DeLuca E (2019) More than tweets: a critical reflection on developing and
testing crisis machine translation technology. Transl Spaces 8:300–333
Canfora C, Ottmann A (2018) Of ostriches, pyramids, and Swiss cheese: risks in safety-critical
translations. Transl Spaces 7:167–201
Canfora C, Ottmann A (2020) Risks in neural machine translation. Transl Spaces 9:58–77
Castilho S (2016) Acceptability of machine translated enterprise content. Ph.D. Thesis, Dublin City
University
Ciribuco A (2020) Translating the village: translation as part of the everyday lives of asylum seekers
in Italy. Transl Spaces 9:179–201
Colman T, Fonteyne M, Daems J, Macken L (2021) It is all in the eyes: an eye-tracking experiment
to assess the readability of machine translated literature. In: The 31st meeting of computational
linguistics in The Netherlands, Ghent
Čulo O, Nitzke J (2016) Patterns of terminological variation in post-editing and of cognate use in
machine translation in contrast to human translation. In: Proceedings of the 19th annual
conference of the European association for machine translation, pp 106–114
Desjardins R (2017) Translation and social media. In: Theory, in training and in professional
practice, 1st edn. Palgrave Macmillan, London
Dixon P, Bortolussi M, Twilley LC, Leung A (1993) Literary processing and interpretation: towards empirical foundations. Poetics 22:5–33. https://doi.org/10.1016/0304-422X(93)90018-C

Doc Society v. Pompeo (2019) Doc Society v. Pompeo: a lawsuit challenging the State Department’s social media registration requirement. In: Knight First Amendment Institute at Columbia University. https://knightcolumbia.org/cases/doc-society-v-pompeo
Doherty S, O’Brien S (2014) Assessing the usability of raw machine translated output: a user-centered study using eye tracking. Int J Hum Comput Interact 30:40–51. https://doi.org/10.1080/10447318.2013.802199
Federici FM, O’Brien S (2020) Cascading crises: translation as risk reduction. In: Federici FM,
O’Brien S (eds) Translation in cascading crises. Routledge, Abingdon, pp 1–22
Guerberof-Arenas A, Toral A (2020) The impact of post-editing and machine translation on
creativity and reading experience. Transl Spaces 9:255–282
Guerberof-Arenas A, Moorkens J, O’Brien S (2019) What is the impact of raw MT on Japanese
users of Word: preliminary results of a usability study using eye-tracking. In: Proceedings of
XVII machine translation summit. European Association for Machine Translation (EAMT),
Dublin, pp 67–77
Guerberof-Arenas A, Moorkens J, O’Brien S (2021) The impact of translation modality on user experience: an eye-tracking study of the Microsoft Word user interface. Mach Transl. https://doi.org/10.1007/s10590-021-09267-z
Hakemulder J (2004) Foregrounding and its effect on readers’ perception. Discourse Process 38:193–218. https://doi.org/10.1207/s15326950dp3802_3
Hern A (2020) Amazon hits trouble with Sweden launch over lewd translation. The Guardian
Kenny D, Moorkens J, de Carmo F (2020) Fair MT: towards ethical, sustainable machine transla-
tion. Transl Spaces 9:1–11
Koponen M, Sulubacak U, Vitikainen K, Tiedemann J (2020) MT for subtitling: user evaluation of
post-editing productivity. In: Proceedings of the 22nd annual conference of the European
association for machine translation. European Association for Machine Translation, Lisboa,
pp 115–124
Kranzberg M (1986) Technology and history: “Kranzberg’s Laws”. Technol Cult 27:544–560
Larsonneur C (2021) Neural machine translation: from commodity to commons? In: Desjardins R,
Larsonneur C, Lacour P (eds) When translation goes digital: case studies and critical reflections.
Springer, Cham, pp 257–280
Liebling DJ, Lahav M, Evans A et al (2020) Unmet needs and opportunities for mobile translation
AI. In: Proceedings of the 2020 CHI conference on human factors in computing systems. ACM,
Honolulu, pp 1–13
Marking (2020) Thai mistranslation shows risk of auto-translating social media content. Slator
Massidda S (2015) Audiovisual translation in the digital age: The Italian fansubbing phenomenon,
1st edn. Palgrave Macmillan, London
Matusov E, Wilken P, Georgakopoulou Y (2019) Customizing neural machine translation for
subtitling. In: Proceedings of the fourth conference on machine translation, vol 1. Association
for Computational Linguistics, Florence, pp 82–93
Mehta S, Azarnoush B, Chen B, et al (2020) Simplify-then-translate: automatic preprocessing for black-box machine translation. arXiv:2005.11197 [cs]
Moorkens J, Lewis D (2019) Research questions and a proposal for governance of translation data,
p 24
Nurminen M (2018) Machine translation in everyday life: what makes FAUT MT workable? In: TAUS eLearning blogs. https://blog.taus.net/elearning/machine-translation-in-everyday-life-what-makes-faut-mt-workable. Accessed 25 Aug 2020
Nurminen M, Koponen M (2020) Machine translation and fair access to information. Transl Spaces
9:150–169
O’Mathúna DP, Escartín CP, Roche P, Marlowe J (2020) Engaging citizen translators in disasters: virtue ethics in response to ethical challenges. TIS 15:57–79. https://doi.org/10.1075/tis.20003.oma
Olohan M (2017) Intercultural faultlines: research models in translation studies: v. 1: textual and
cognitive aspects. Routledge, London

Parasuraman R, Sheridan TB, Wickens CD (2000) A model for types and levels of human interaction with automation. IEEE Trans Syst Man Cybern 30:286–297. https://doi.org/10.1109/3468.844354
Parra Escartín C, Moniz H (2019) Ethical considerations on the use of machine translation and
crowdsourcing in cascading crises. In: Translation in cascading crises, 1st edn. Routledge,
London
Paullada A (2020) How does machine translation shift power? In: Resistance AI workshop at
NeurIPS 2020, Virtual Event, Canada
Pichai S (2016) Google I/O 2016 - keynote
Quirk C, Menezes A, Cherry C (2005) Dependency treelet translation: syntactically informed
phrasal SMT. In: Proceedings of the 43rd annual meeting of the association for computational
linguistics (ACL’05). Association for Computational Linguistics, Ann Arbor, pp 271–279
Raley R (2003) Machine translation and global English. Yale J Crit 16:291–313
Rey B (2014) Your tweet half-life is 1 billion times shorter than Carbon-14’s. In: Wiselytics. https://
www.wiselytics.com/blog/tweet-isbillion-time-shorter-than-carbon14/. Accessed 3 May 2021
Schmidtke D, Groves D (2019) Automatic translation for software with safe velocity. In: Pro-
ceedings of machine translation summit XVII volume 2: translator, project and user tracks.
European Association for Machine Translation, Dublin, pp 159–166
Smith R (2018) The Google Translate world cup. The New York Times
Sumita E (2017) Social innovation based on speech-to-speech translation technology targeting the
2020 Tokyo Olympic/Paralympic Games Presentation at MT Summit XVI, Nagoya, Japan
Thicke L (2013) Post-editor shortage and MT. Multilingual Magaz 2013:42–44
Toral A (2019) Post-editese: an exacerbated translationese. arXiv:1907.00900 [cs]
Toral A, Way A (2018) What level of quality can neural machine translation attain on literary text?
arXiv:1801.04962 [cs]
Vanmassenhove E, Shterionov DS, Way A (2019) Lost in translation: loss and decay of linguistic
richness in machine translation. In: Proceedings of machine translation summit XVII volume 1:
research track. European Association for Machine Translation, Dublin, pp 222–232
Vieira LN, O’Hagan M, O’Sullivan C (2020) Understanding the societal impacts of machine
translation: a critical review of the literature on medical and legal use cases. Inf Commun Soc
1:1–18. https://doi.org/10.1080/1369118X.2020.1776370
Vollmer SM (2020) The digital literacy practices of newly arrived Syrian refugees: a spatio-visual
linguistic ethnography. PhD Thesis, University of Leeds
Wang J, Xu C, Guzman F, et al (2021) Putting words into the system’s mouth: a targeted attack on
neural machine translation using monolingual data poisoning. arXiv:2107.05243 [cs]
Way A (2018) Quality expectations of machine translation. In: Moorkens J, Castilho S, Gaspari F,
Doherty S (eds) Translation quality assessment: from principles to practice. Springer, Berlin, pp
159–178
Weaver W (1949) Translation. UNESCO memo. Rockefeller Foundation
Winner L (1983) Technologies as forms of life. In: Cohen RS, Wartofsky MW (eds) Epistemology,
methodology and the social sciences. Reidel, Dordrecht, pp 249–263
Yanisky-Ravid S, Martens C (2019) From the myth of Babel to Google Translate: confronting
malicious use of artificial intelligence – copyright and algorithmic biases in online translation
systems. SSRN J. https://doi.org/10.2139/ssrn.3345716
Chapter 8
Ethics, Automated Processes, Machine Translation, and Crises

Federico M. Federici, Christophe Declercq, Jorge Díaz Cintas, and Rocío Baños Piñero

Abstract Deploying technologies in support of translation/interpreting during crises in multilingual settings poses serious deontological and ethical challenges. Most arise from ethical concerns around the adoption of technologies that can be only partially controlled. We discuss automated translation processes in relation to four dimensions: preparedness, crowdsourcing and data mining, local vs global crises, and the multimodal demands of communication. We start by considering automation processes in which MT is embedded as a tool to support crisis communication, and we consider ethical risks pertaining to reliance on and understanding of MT's potential. We then focus on the ethical complexity of multimodal processes of communication that hinge on crowdsourcing practices, that collate users' data, and that complement other automation processes. We move on to correlate these explicit ethical dimensions with successful applications of MT engines in response to local and global crises, and reflect on the ethical need to enshrine these in other practices that turn translation into a risk reduction tool. We then zoom in on areas that have yet to exploit fully current automation processes, yet have already encountered ethical dilemmas when delivering information in multimodal format. We look at ways in which current automation processes can be successfully exploited, while we also warn against practices in which too much is expected of MT and automation processes, thus heightening rather than reducing risks when communicating in multilingual crises. We conclude the chapter by connecting our ethical considerations on the role of MT and automation to debates around linguistic equality and social justice.

F. M. Federici (✉) · J. D. Cintas · R. B. Piñero
University College London, London, UK
e-mail: [email protected]
C. Declercq
University College London, London, UK
Utrecht University, Utrecht, Netherlands

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Moniz, C. Parra Escartín (eds.), Towards Responsible Machine Translation,
Machine Translation: Technologies and Applications 4,
https://doi.org/10.1007/978-3-031-14689-3_8

Keywords Multimodal crisis communication · Human-machine integration · Translation as risk reduction · Crisis translation

8.1 Introduction

The United Nations Sustainable Development Goals (UN 2015) are underpinned by
the motto “leave no one behind”, which reflects the notion of inclusive development.
At the core of the UN's 2030 Agenda for Sustainable Development, these goals have drawn attention to multidirectional communication as a way of increasing social cohesion and equity. Timely, efficient, and trustworthy communication is paramount to increasing the resilience of multilingual communities to cascading crises. The world is richly multilingual, and delivering urgent messages across languages quickly and accurately, when it matters most, makes such communication an almost superhuman task. Technologies are therefore invaluable resources to support translation and interpreting in intercultural crisis communication settings.
Automation resources that are crucial to speeding up communication come in the form of standalone applications, cloud-based platforms, and apps; they can deal with both spoken and written texts.1 While spoken texts have benefited from technologies such as automated speech recognition (ASR) and machine interpreting, written texts are served by machine translation (MT), computer-aided translation (CAT) tools, and hybrid MT-CAT solutions. Regardless of their nature, it is important to avoid enthusiastically portraying them as self-sufficient solutions to pursue the UN's ambitious "leave no one behind" motto, even though they clearly play a substantial role and will play an ever bigger one in the future. The questions we ask in this chapter are not intended to challenge technological innovations that are often desperately needed; rather, they are posed to raise some of the ethical issues that must be considered, and some of the challenges that need to be tackled, in order to apply technology in an appropriate and efficient manner.
Deploying technologies in support of translation/interpreting during crises in
multilingual settings poses serious deontological and ethical challenges. Most arise
from ethical concerns around the adoption of technologies that can be only partially
controlled. Often, MT output is taken at face value and MT engines can be part of
automation processes where no quality assurance is available, but where no other
options to communicate exist. Unchecked uses of MT, even if justified by situational
constraints and urgency, raise concerns. Technology-driven automation rests on an
ethically complex spectrum in which the human-machine interaction ought to be
central to inform the final decision-making process for providing translation in
cascading crises. Heralding the benefits of technology to specialists and professional
linguists is different to showcasing them to international relief collaborations in
which some lingua francas have assumed a dominant position.

1 See Nimdzi's language technology atlas (www.nimdzi.com/language-technology-atlas-2019) for a detailed overview. Accessed 24 February 2022.

This chapter presumes readers' familiarity with MT concepts and workflows.
Our contribution stems from our engagement in this subject as trainers of
technology-savvy translators who are going to support multilingual communication
in crises and who frequently use CAT tools, MT, post-editing, cloud-based platforms
and operate in partially automated translation workflows. This chapter therefore
engages with perceptions of how artificial intelligence (AI) and automation pro-
cesses, especially those reliant on MT and machine learning (ML), permeate differ-
ent translation workflows. It seeks to highlight the ethical issues raised by such
perceptions in commissioners and recipients of translated texts, who rely on such
translations to take actions to reduce risks. The focus is, therefore, on how translation
actors interact with the distinct technologies available to them in relation to crisis
settings and not on any potential concerns about how the technologies have been
designed and built. Ethical concerns emerge when using raw MT output and when
making use of poorly understood functionalities in the workflow of multimodal
translations (subtitles, speech-to-text transcriptions, etc.). All these aspects deserve
attention and critical consideration since ethics is crucial to engage with translation
as a risk reduction tool (Federici and O’Brien 2020).
The chapter is subdivided into sections that enable us to discuss four dimensions
in which ethics play an important role, as regards the implementation of automated
processes in crisis settings. Firstly, the frame of automation in which MT is embed-
ded as a tool to support communication in crises must be explained. Secondly, the
complex dimension of ethics in multimodal processes of automation will be
scrutinised, including its co-appearance with practices such as crowdsourcing and
mapping. Thirdly, the specificity of hybrid multimodal processes and platforms used
to enhance translation capacity in crisis settings is discussed. Fourthly, the perspec-
tive of a technology-centred approach as the option that solves the problem of
multilingualism is briefly discussed by juxtaposing the correlation between forms
of automation and training of critical users of the tools. The concluding remarks will
bring forward ethical considerations on the role of MT and automation processes
within existing debates around linguistic equality and social justice.

8.2 Automation Processes and Crisis Preparedness

By considering recent work on ethics and citizenship in crisis translation (O’Mathúna and Hunt 2019; O’Mathúna et al. 2020), on MT in providing fair access
to information (Nurminen and Koponen 2020), and on the spectrum of translation
technologies used in disasters (O’Brien 2019), it is clear that while automation
processes facilitate communication in crises, they also raise concerns, especially
when technologies are used uncritically.
A discussion of ethical concerns in relation to crisis communication must be
grounded in an overview of what crisis and emergency risk communication entail, as
it explains the rationale behind the concerns we intend to raise. The term ‘crisis’ here
is used to describe disruptive events such as disasters, emergencies, armed conflicts,

cyber- and terrorist attacks (Alexander and Pescaroli 2019; O’Brien and Federici
2019). Often used interchangeably, particularly the first two, these terms indicate
different triggers, from natural hazards via teleological choices (e.g. conflict,
cyberattacks, and terrorism), to technological failures (e.g. failing nuclear reactors,
collapsed hydroelectric power plants, etc.), to which correspond varying scales of
impact on populations, properties, infrastructures, and societies. When crises happen
in multilingual environments, they disrupt the way of life of entire communities
within one country or across a region; they often require resources beyond those
locally available to the most affected communities, and, increasingly, they have
cross-national cascading effects (Alexander and Pescaroli 2019). International crises create communication demands: humanitarian operators, rescuers, and disaster managers need to coordinate their efforts. They also need to provide
information to and gather information from affected communities, who should be
able to seek information using their own languages. In short, communication must be
a multidirectional exchange of information and must be accessible and inclusive if it
is to reduce risks for all members of the crisis-affected communities (Greenwood
et al. 2017; O’Brien et al. 2018).
In the 21st century, local triggers and hazards easily generate transboundary
cascading crises. On the one hand, the 2017 Sierra Leone mudslide did not have repercussions in Europe or Asia, even though the country was severely affected in terms of trade, farming, and transportation around the capital, Freetown. On the other hand,
certain hazards set off international crises, as the COVID-19 pandemic has demon-
strated. Although it can be argued that the hazard posed by the mutation of SARS-CoV-2 from animal to human transmission was not predictable, such unpredictability does not justify a lack of preparedness in terms of running multilingual information campaigns from the onset. Before the pandemic, WHO (2014) had published and revised information for medical personnel and crisis communicators to explain how to prevent widespread contagion from severe acute respiratory syndromes (SARS).2 At the time when WHO published these measures in English,
they could have been translated into multiple languages, especially focusing on those
used in deprived and multilingual areas with under-resourced healthcare systems
(see Crouse Quinn 2008). Pioneering projects such as the Canadian Global Public
Health Intelligence Network (GPHIN), launched in 1997, have been using MT
technology to translate important measures for public health in multiple languages
(Mawudeku et al. 2013). Risk reduction measures such as GPHIN have not seen worldwide adoption, nor secured funding to achieve full coverage of the languages of under-resourced and vulnerable regions, though the UN SDGs agenda may increase momentum around these topics. However, GPHIN has influenced the development of the Epidemic Intelligence from Open Sources (EIOS) system, see

2 The authors refuse to use the semantically misleading "social distancing" as it is evidence of a kneejerk reaction rather than the implementation of bespoke emergency plans: physical and spatial distance, face masks, and PPE were necessary preventive measures to avoid contagion; endless nation-wide lockdowns generating "social distancing" became necessary as a result of failures to act suitably in distancing people and reducing the spread of SARS-CoV-2.

discussion in Sect. 8.4 of this chapter, which aims to scale up the GPHIN monitoring approach by increasing reliance on machine learning, web crawling, and integrated forms of automation for data collection and sharing (Abdelmalik et al. 2018, p. 268). Together with the Global Disaster Alerting Coordination System (GDACS), the fruit of a collaboration between the UN Office for the Coordination of Humanitarian Affairs (OCHA) and the European Commission that provides live updates and warnings on natural hazards, these systems should be the beacons of global preparedness. GPHIN and EIOS embed MT engines, but GDACS does not do so yet, even though it intends to support "disaster managers worldwide."3
Risk preparedness is often a question of momentum and visibility, since, as
highlighted by Kelman (2020, p. 1), “[d]isasters are a socially and politically
structured phenomenon, arising from a combination of hazard and vulnerability,
with the associated risks reflected in public policies, infrastructure decisions and
considerations of inclusion”. To be inclusive, communication must “leave no one
behind”, so it needs to be in a language and format that are accessible. Given that
crisis lifecycles go beyond the response phase, MT resources can be used for
different purposes at various stages and operate in conjunction with other technol-
ogies such as CAT tools and ASR systems. Guidance on MT post-editing must be part of compulsory baseline training to maintain communication in myriad languages and avoid disastrous translations. The debate on whether MT solutions for information mining heighten rather than reduce risks is open (Cadwell et al.
2019). When it comes to fully or partially automated technological solutions, the
multiple functions performed through automation processes embedded in translation
workflows require assessment from an ethical perspective (Kenny 2011; Parra
Escartín and Moniz 2020), acknowledging recent work on ethics and citizenship in
crisis translation (O’Mathúna and Hunt 2019; O’Mathúna et al. 2020). Our first
ethical stance is that automation processes of translation have a central role to play,
alongside human interaction, when it comes to increasing preparedness.
In the case of language challenges, circulation of information from the urban to
the rural areas of Sierra Leone was an issue in 2017, as it had been during the 2014
Western Africa Ebola outbreak, due to the multiple languages in which such
information had to reach affected people and overcome significant cultural and
social barriers. Communication issues have had global consequences during the
recent pandemic. One common denominator remains: people are interconnected, but planning for efficient crisis communication across multilingual communities continues to be missing, as do opportunities for embedding automation processes in human-monitored translation workflows to enhance crisis communication. In both
examples of markedly diverse crises in terms of sizes of affected communities and
geographical impact, circulation of information had a vital role to play, and com-
munication strategies have become “a key plank of responses” to crises (Quinn
2018, p. 1). In this context, technologies that support translation automation, so that cascading information in affected areas can reach affected communities faster,

3 See https://gdacs.org. Accessed 24 February 2022.

are promising solutions. Provision of and access to information should cut not only across language boundaries, but also across social strata and sensory abilities. One
priority, underpinning most ethical issues, would be to identify who can be held
responsible for achieving multilingual communication since very rarely is a single
individual in charge of this objective, which is rather a distributed responsibility
within the international humanitarian sector (Federici et al. 2019a).
All members of affected populations and all language communities ought to be
able to find information relevant to them in a language and format they understand
(WHO 2017), including in this definition members of society affected by sensory
impairments such as deafness and blindness. To date, these communicative
exchanges fail to be symmetrical, often relying on a global or regional lingua franca,
which means that the risk of a top-down distribution of information to affected
populations remains high in international contexts (IFRC 2018). When interlingual translation is considered part of the equation during crises, the dissemination of information among affected communities features prominently, and an apparent lack of planning tends to be observed when it comes to employing translators and interpreters in the response phase. This issue leads to two further
problems: (1) translation capacity relies on both planning, which cannot just happen
when the crisis emerges, and availability of resources, which varies hugely for each
affected locality; (2) disregard of translation and interpreting (T&I) issues means that
the focus of the relief effort continues to be on the ad-hoc response phase of any
crisis rather than on the planning one.
These considerations dictate our second ethical stance: T&I must be acknowl-
edged more broadly in relation to the lifecycle of crises. For disaster risk reduction
experts and government officials, the best solutions are those that have been planned,
for which personnel have been trained, and of which the population has been
informed. To enhance social resilience, crises are better considered in relation to
preparedness (emergency plans, budgeting, assessing hazards, creating and evaluat-
ing resources in relation to them), which can encompass language needs. In this
respect, selecting, funding, and developing automation processes to support T&I to
meet such needs will contribute to designing emergency plans that mitigate the risks
in a multilingual crisis.
Ethical engagement with affected communities implies rectifying this asymmet-
rical communication that pivots around the main language of one country, or the
lingua franca in international settings, by enabling multi-directional communication
through policies and emergency plans (Federici et al. 2019b). Translation can be an
ethically sustainable risk reduction tool (Federici and O’Brien 2020) once, and if,
multilingual communication is planned and part of the equation. Awareness of the
cultural and linguistic needs of the affected communities is a sine qua non when engaging multilingual communities and, to be successful, such engagement needs to be supported by appropriate translation technologies and workflows. Translated texts, including audiovisual ones, are more likely to lead recipients to consider risk reduction actions if they are trusted. Full automation is a risk: MT specialists and T&I
professionals know the strengths, weaknesses, and differences of usage of current
systems, but non-discerning users may not, and could thus deploy MT engines in

ways that end up eroding trust (see discussion of Wuhan’s crisis response in the next
section). To be ready and able to deploy all available solutions, one needs to know
the vulnerabilities of the affected (members of) society. This ethical principle of engagement with communities for the promotion of symmetrical channels of communication is pivotal when considering the role of T&I and automation in crisis settings.
From an operational perspective, the use of MT solutions ultimately seeks to
provide language support that is critical to the response phase of disaster, crisis, and
emergency management (Hunt et al. 2019). Yet, it can be argued that the technology-centred approach is not enough to establish trust in the message, as any MT output will be used, read, adopted, and/or manipulated by humans at some point in the communicative workflow. Although an important solution, it could lead to misconceptions and faulty reception down the line, as translated materials have an important role to play throughout a crisis lifecycle.
A lot of emphasis has been placed on creating MT engines that use, as one of their
main linguistic components, the international response lingua franca, i.e., English, in
an attempt to support efficient communication to aid affected populations and to
support responders. Successful operations such as Mission 4636 (discussed in Sect.
8.3) justify this initial approach. This perspective marries many of the issues
surrounding automation (e.g. MT, automated workflows, pivoting techniques, ren-
dering of multimodal documentation) with notions such as urgency and response. As
MT is perceived to "dramatically increase the speed by which relief can be provided" (Lewis et al. 2011, p. 501), it is obvious why it should be considered first for the response phase of a crisis. Automation processes could achieve more by supporting under-
resourced languages and enabling communities to translate preparedness materials
and create their own bilingual resources (translation memories and domain-specific
engines) to enhance their resilience over time to known hazards. From working with
community translators (O’Mathúna et al. 2020), who may use MT output in uncrit-
ical manners, to time-pressed end-users who need translated information, ethical
concerns abound. Early and regular collaboration among technology experts and
low-resource language users could help alleviate some of these concerns. In the response phase, emergency personnel can make do with "a sufficient picture of a situation that responders will be able to plan and execute human and material resource deployment activities" (Christianson et al. 2018, p. 8). This is, however, a one-way communication solution; the question of how affected communities would be able to direct their questions to those providing aid remains a critical issue.
In particular, by focusing on high-resource languages, natural language
processing (NLP) researchers have prioritised methods that work well only when
large amounts of labelled and unlabelled data are available (Ruder 2020). This very
reality clashes with the unpredictability of crisis settings, in which the languages
spoken by the affected populations may, or may not, have any amount of labelled/
unlabelled data. MT technologies are one of the various resources available to
responders and, to attain any ethical value, they need to be based on ML approaches in which the automated processes do not reflect a world bias in favour of English but provide the support tools necessary to the Global South. Also, low-resourced language communities must be made aware that MT engines and the ML models on which they are based have been able to extract only limited information, creating a resource with limited potential for direct use.
Our third ethical stance focuses on the need for MT and automation processes to be presented clearly in terms of both their potential and their limitations, which must be critically explained to users.

8.3 Crowdsourcing Data, Mapping, and Translation Automation

Since 2011, the use of MT in crisis settings has been explored (Lewis et al. 2011;
Munro 2010, 2013; Christianson et al. 2018), together with time-saving automation
processes and crowdsourcing (Sutherlin 2013; Mulder et al. 2016), as well as
technological dependencies of the Third Sector (Rico Pérez 2013, 2020). Cadwell
et al. (2019) highlight the problems of data mining social-media resources using MT
engines. As far as the use of social media in crisis scenarios is concerned, the
technological framework in which relief responders operated in the 2010 Haiti
earthquake stands out as a paradigm that continues to influence the deployment of
MT in twenty-first century crises. The approach took into consideration logistical
connectivity, including the crowdsourcing of the translation of text messages and
messages from social media, between the local language, Haitian Creole, and that of
the responders. Mission 4636 (mission4636.org) launched soon after the scale of
destruction, mortality, and morbidity brought about by the earthquake became clear.
A partnership among 50 countries, Mission 4636 provided an online translation and
information processing service, connecting the Haitian people with the international
aid efforts. Devised to support local emergency response, this was a technology silo
in which local crowdsourcing efforts merged with translation—human translation
first, supported by MT—for the purpose of information assimilation (i.e., gaining a
superficial understanding of the meaning of what is being conveyed through the
automated translation output) and interchange.4 Haitians could mainly be reached by
phone and text messaging. By setting up the free emergency phone number ‘4636’ for people to send text messages, relief organisations had a direct channel to collect and
share information. With cell towers restored almost immediately after the disaster and with 83% of men and 67% of women owning a mobile phone, Haiti remained connected. Survivors looking for relatives and friends could find out through text messages and social media about their location and physical situation, and they did so mainly in Haitian Creole, a language that the international rescuers arriving in Haiti did not speak or understand (Munro 2010; Lewis et al. 2011).
Cellular connectivity remained present and wide coverage was quickly restored
during the Haiti crisis; in its resilience, the mobile communication infrastructure

4 See Hutchins (1995, 2005) on the use of MT for the purpose of assimilation and interchange.

allowed responders to translate messages sent to the 4636 number. The messages located individuals' or groups' requests for help on maps so that emergency logistics could be deployed where needed; the messages were translated by about 2000 volunteers who spoke Creole and/or French.
processed, producing 45,000 unique reports that were transmitted to emergency
rescue teams on the ground. Increasingly over the first few weeks, this task was
then undertaken by people who were paid to provide translations. In the same period,
MT engines were developed (by Microsoft first, Google next) using the processed
messages, thus creating additional support for the relief response, which complemented the communication chain organised around the text messages (Munro 2010; Lewis et al. 2011). The crowdsourcing and translation approach in place allowed responders to gain an idea of what the crowdsourced-translated texts conveyed and to act quickly upon the information received. Through its ability
to empower disaster-affected communities and help them define the way in which
they could obtain help, the crowdsourced mapping approach of Mission 4636 has
quite simply revolutionised the way in which crisis response is nowadays perceived
and articulated (Harvard Humanitarian Initiative 2011).
With the combined efforts of crowdsourcing, mapping, and translating (with the
support of MT), Mission 4636 shaped emergency responses to other high-profile
crises such as the 2011 Great East Japan Earthquake, the 2013 Typhoon Haiyan/
Yolanda (Southeast Asia, but it mainly affected the Philippines), the 2014 Western Africa Ebola outbreak, the 2015 Cyclone Pam (South Pacific, but it mainly affected
Vanuatu), and the 2015 Nepal earthquake (IFRC 2013; Meier 2015; Ramalingam
and Sanderson 2015). In the two days following Typhoon Haiyan in the Philippines
in 2013, for instance, nearly 230,000 tweets were collected and processed. Only
800 (0.35%) were relevant but provided critical information about the most affected
regions which emergency response had to reach, thus saving lives (Moore and Verity
2014). In the evacuation phase, as recalled by Field (2017, p. 341):
a common recollection in interviews of one of the causes of low evacuation rates in the days
preceding the landfall of typhoon Yolanda was the fact that the projected tidal impact on the
exposed coastal regions was referred to as a ‘storm surge’ rather than a tsunami or a
destructive wave. While the two are scientifically different phenomena, it was acknowledged
that had the threat of the storm surge been likened to that of a tsunami (for a coastal
population hit by a wave, the impact would be similar), the coastal regions would have
seen higher evacuation rates.

Pitting human competence in revising scientific terminology against mere reliance
on technology would be simplistic, which is why MT and automation processes
must be scrutinised holistically. Despite increasing awareness and promotion of the
benefits of crowdsourced crisis mapping, cross-organisation collaboration between
“those supplying crowdsourced mapping data, and their intended ‘clients’, Volunteer
and Technical Communities and formal responders” still leaves considerable room for
improvement (Hunt and Specht 2019, p. 4). Including linguists and developing
their competence in MT quality assurance processes could be sufficient to
minimise errors when deploying MT solutions alongside other communication
strategies in such contexts. For this to happen, the focus has to shift to preparing
144 F. M. Federici et al.

domain-specific resources (from bilingual corpora to MT engines) that will power up
automation processes. Examples that make available such language resources can be
seen in Yourterm’s (2021) online portal of COVID-19 terminology resources in
numerous languages, in the provision of the TAUS (2021) Corona Crisis Corpora,
and in Systran’s (2021) Corona Crisis translation models for MT in various language
pairs.
For good quality data as well as domain-specific resources to be available, a shift
toward inclusive preparedness is necessary. As foregrounded by Iglesias et al.
(2020), the full potential of combining automated big data processing for specific
purposes (here, digital humanitarianism) has not been reached yet. The tension
between potential and effective solutions in crisis settings opens the doors to
additional technology that cuts across the various dimensions of cross-organisational
collaboration. Yet, it also calls for an ethical assessment of the apparent lack of
quality checks of domain-specific data sets available for MT engine training and of
the risks of harnessing crowdsourced resources without quality check mechanisms
or training support for those suddenly plunged into the use of automated or
crowdsourced translation outputs.
Typically, crowdsourcing in settings of crises and their ensuing humanitarian
responses refers to a method of creating datasets from available local data, through
filtering and processing, for specific purposes (Mulder et al. 2016). Regardless of the
cross-organisation collaboration behind the crowdsourcing technology, the
processing and interpreting of the data obtained require local knowledge. In the
case of information flows in post-earthquake Nepal in 2015, knowledge presented in
the datafication process mutated on two levels: first, the explicitly available infor-
mation was verbalised in local languages—spoken or written—and subsequently
became available in translation into English. As discussed by Mulder et al. (2016),
the move towards the use of a single language of emergency support, English as a
pivot, risks excluding data processors and analysts with local knowledge and—more
importantly—can leave behind affected people who do not have knowledge of the
language used by emergency responders, without beginning to consider endemic
issues with digital inequalities and the digital divide affecting marginalised, or
partially integrated, communities. Additionally, resorting to pivot languages, to
then translate into other minoritised languages, risks omitting important cultural
and linguistic nuances or including mistranslations that can negatively affect the
effectiveness of communication during a crisis. While it can be argued that the
advantages of having information translated via a pivot language (usually English)
may outweigh any potential ensuing problems, the jury is still out on the ethical
issues resulting from these scenarios.
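In schematic terms, the pivot workflow just described chains two engines through English. The following toy sketch, with a purely hypothetical, dictionary-based stand-in for an MT engine and invented lexicon entries chosen only for illustration, shows the mechanics of the chain and why any nuance lost in the first hop cannot be recovered in the second:

```python
# Illustrative sketch only: a toy, dictionary-based stand-in for an MT engine.
# Real pivot MT chains two trained engines (source -> English, English -> target);
# whatever the first hop drops, the second hop never sees.

TOY_LEXICON = {
    ("ht", "en"): {"lanmè": "sea", "move": "rough"},   # Haitian Creole -> English
    ("en", "fr"): {"sea": "mer", "rough": "agitée"},   # English -> French
}

def translate(words, src, tgt):
    """Word-by-word 'translation'; unknown words pass through untranslated."""
    table = TOY_LEXICON.get((src, tgt), {})
    return [table.get(w, w) for w in words]

def pivot_translate(words, src, tgt, pivot="en"):
    """Translate src -> pivot -> tgt, the chain discussed in the text."""
    return translate(translate(words, src, pivot), pivot, tgt)

print(pivot_translate(["lanmè", "move"], "ht", "fr"))  # ['mer', 'agitée']
```

A word the first table cannot resolve passes through both hops unchanged, which is the benign case; the riskier case in practice is a plausible but nuance-flattening English rendering, silently propagated into every target language.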
For Iglesias et al. (2020), translations done using crowdsourcing or automatic
methods are core tasks that belong to an elaborate framework for data activity geared
towards improved access to information. Other scholars, such as Greenwood et al.
(2017), O’Brien et al. (2018), and Nurminen and Koponen (2020), also agree that
access to information is a human right and that MT can help ensure accessibility of
multilingual information for previously underserved groups.
8 Ethics, Automated Processes, Machine Translation, and Crises 145

In the aftermath of the Haiti earthquake relief response and in light of the role
played by machine translation, Lewis et al. (2011, p. 510) proposed a crisis cook-
book, on the grounds that MT, which had proven to “dramatically increase the speed
by which relief can be provided”, should become “a standard tool in the arsenal of
tools needed in crisis events”. The argument is clear: if MT facilitates communica-
tion for assimilation and interchange purposes, provided the translated information is
accurate enough to be used, then it cannot but be the centrepiece of making content
available to a local population affected by a crisis in a language that they can
understand.

8.4 From Local Cascading Crises to Global Events

Instead of being considered as a “problem” (Harvard Humanitarian Initiative 2011),
multilingualism should be at the centre of any emergency plans to communicate
risks to everybody, be supported with extensive translation activity, and be backed
up by automation processes. To fight the propagation of the new virus SARS-CoV-2,
the Office of Foreign Affairs, in the Municipal Government of Wuhan, set in motion
a crisis response in December 2019 to thwart what would later become the first
pandemic of the twenty-first century. On 23 January 2020, Wuhan, a city of
11 million inhabitants, went into lockdown in an attempt to avert the rapidly
spreading pandemic. The next day, 18 hospitals in Wuhan resorted to social
media, pleading for help and asking for donations and medical supplies. The race
to provide information on preventative measures to avoid contagion was to become
“history’s biggest translation challenge” (McCulloch 2020, online), supported by
multiple modalities of language transfer, from live multilingual interpreting and sign
language interpreting to text translation of infographics, via radio broadcasts and
audiovisual translations. Web crawling systems using MT enabled global alert
systems to warn the WHO to initiate its pandemic emergency plans. On December
30, 2019, “A machine translation of the Finance Sina report [on a pneumonia
outbreak in Wuhan] was published on the website of the Program for Monitoring
Emerging Diseases (ProMED). This report was picked up by the Epidemic Intelli-
gence from Open Sources (EIOS) system and alerted WHO Headquarters to the
outbreak” (Independent Panel 2021, p. 22). This type of automation is indeed a
powerful risk reduction tool and does not raise ethical concerns.
In a matter of weeks, over 80 countries and many international organisations had
contributed emergency supplies, but the international donations hit several linguistic
and cultural barriers, which led to dramatically increased translation needs (Zhang
and Wu 2020). Wuhan’s circumstances brought to light some often overlooked
aspects of intercultural crisis communication: not only is cross-organisation collab-
oration among international partners important, but also information transfer
between the languages of donor countries and the language(s) of the recipient
(be it a country, region or relief consortium). Translation for the procurement of
international donations and equipment happened through a WeChat group,
consisting of “over 250 college teachers and students, frontline responders, medical
staff, procurement agents, overseas donors, and foundation officers in Wuhan and
across the world” (ibid., p. 520). Volunteers translated the messages posted in the
WeChat group from Chinese into English, French, Japanese, Korean, Portuguese,
Russian, Spanish, Thai, and Vietnamese (ibid.).
Automation, central in these early efforts to disseminate information on the
pandemic, was given priority over CAT tools because, as Wang (2019, p. 98) states,
“it usually takes time to train a translator to use translation memory tools, [so] all
volunteer translators were instead encouraged to use two neural machine translation
tools (Google Translate and Youdao Translator) in this time-constrained context”.
Yet, awareness of potential MT shortcomings meant that the crowdsourcing efforts
continued in terms of volunteers’ post-editing role: “No matter whether they were
students, professionals or other bilinguals, all volunteers were asked to carry out
post-editing of machine-translated output since this situation was too risky to rely on
raw machine translation” (ibid., p. 99). We need to delve deeper into this type of
automation.
Developing ever more accurate automated translation outputs has intensified
existing ethical dilemmas. A first issue relates to the perceived readiness of the
translated information. As MT output is increasingly used for assimilation purposes,
the perception of its quality has posed, and continues to pose, substantial issues,
especially when the output is no longer treated as a gist of the content but is instead
considered a complete, translated text. The ethical implications of the output being
seen as operative are significant, especially in crises. A second issue, more widely
discussed, relates to the uneasy balance between accessibility and privacy.
Although accessibility might very well be the motor for social
inclusion and for promoting intercultural dialogue (Matamala and Ortiz-Boix 2016),
the underlying risk is that such inclusion and dialogue rest on big tech companies
gaining access to and processing of personal data. If the advantage of increased
access to translated information is a given when it comes to furthering cross-
organisation collaboration and multilingual communication, then the downside
raises strong reservations in terms of infrastructural access to tools (connectivity
and storage), data privacy (third-party access and usage), financial interests (eco-
nomic exploitation of crowdsourced/free data), and environmental impact (Knight
2020; Joscelyne 2021). Even when initiatives are launched as part of a charity effort,
political and financial interests seem to always be at stake.
With the statistical model of automated translation introduced in the late 1980s,
the management of data took a central role. Neural machine translation (NMT)
equally relies on deep learning processes that use datasets of previously translated
sentences to train a model capable of translating between any two languages
(Lanners 2019). Tests on using a lingua franca to improve MT engines for
low-resource languages have shown promise but also limitations, as pivot MT
engines produce what can ethically only be considered barely acceptable output
quality (Silva et al. 2018; Liu et al. 2019). In order for a low-resource language to
become more resourced, so that a larger dataset is available, automated web crawling
is used, typically for in-domain knowledge. Although the internet was conceived as a
virtual space with great democratic potential, the reality is that the presence of certain
languages on the web does not allow for large-data processing in low-resource
languages (cf. data on languages used on websites, as collated by Statista 2020).5
When it comes to out-of-domain knowledge platforms, machine learning is still in its
infancy for the over 2000 African languages. Projects such as Lanfrica (Emezue and
Dossou 2020) and Masakhane (masakhane.io) aim to strengthen and spur NLP
research in African languages and to raise their visibility and standing among MT
researchers. Against this backdrop, the African experience could be capitalised on as
a model for the operationalisation and refinement of multilingual MT solutions in
crises. Experts on NLP and machine learning for translation purposes, from all over
the world, could also benefit from quality improvements to systems, approaches, and
workflows implemented when working with low-resource languages.
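The in-domain selection step mentioned above, harvesting crawled text to grow a training corpus, can be illustrated with a minimal sketch. The seed terms, threshold, and function names below are hypothetical, and real pipelines rely on far richer similarity measures than simple keyword overlap:

```python
# Illustrative sketch: selecting in-domain sentences from crawled text to build
# a corpus for MT engine training. Seed terms and threshold are hypothetical;
# production systems use language models or embedding similarity instead.

SEED_TERMS = {"vaccine", "symptom", "quarantine", "outbreak", "mask"}

def in_domain(sentence, seeds=SEED_TERMS, min_hits=1):
    """Keep a sentence if it contains at least `min_hits` seed terms."""
    tokens = {t.strip(".,;:!?").lower() for t in sentence.split()}
    return len(tokens & seeds) >= min_hits

crawled = [
    "Wear a mask and watch for any new symptom.",
    "The football season resumes next week.",
    "Quarantine rules apply to all arrivals.",
]
corpus = [s for s in crawled if in_domain(s)]
print(corpus)  # the first and third sentences survive the filter
```

Even this crude filter makes the ethical point of the section concrete: what ends up in the training corpus is decided by the seed list, so whoever writes the seeds, and whether local-language expertise informs them, shapes the engine's coverage.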

8.5 From Monomodal to Multimedia Communication

Taking 2010 Haiti and 2020 Wuhan as comparator points, these two crises have one
astonishing element in common: how little audiovisual translation options have been
studied (and only in part exploited) for their crucial role in crisis communication. Rogl (2017,
p. 244) points out how “[a]nother important mission for translators” during the early
phase of recovery in Haiti included “a series of subtitling assignments for webcasts
and documentaries reporting on the earthquake”. In a technology-driven multimedia
society like the present one, the value of moving images, accompanied with sound
and text, is crucial when it comes to engaging in communication. It would be natural
to expect that they would be ubiquitously used with translations in crises, as
audiovisual formats have been hailed as the quintessential means of communication
in the twenty-first century. Indeed, audiovisual productions are omnipresent in our
day and age, and their seductive power and influence are here to stay, living on via
our screens. Some of the statistics compiled by Stancheva (2021, online) regarding
our consumption of video are mind-boggling, such as the fact that “[v]ideo is the
number 1 source of information for 66% of people” and “92% of people who watch
videos on their mobile phone go on to share the content with other users”. Not only
are audiovisual productions given priority by citizens when it comes to finding
information, but they also seem to be pivotal in the reiteration and dissemination
of such knowledge. And yet, despite their communicative virtues and the increasing
numbers of individuals, companies, institutions, and international organisations that
resort to audiovisual and multimodal material as their preferred mode of communication,
systematic uses of accessible videos, and particularly translated ones, in
crises are disappointingly limited. This underusage is astonishing, especially when
compared to the output generated in the form of written documentation such as
factsheets, posters, handbooks, and the like.

5 According to aggregated data published on Statista.com, the most commonly used languages on
websites are English 25.9%, Chinese 19.4%, Spanish 7.9%, Arabic 5.2%, Indonesian/Malaysian
4.3%, Portuguese 3.7%, French 3.3%, Japanese 2.6%, Russian 2.5%, German 2%, and all other
languages 23.1%. www.statista.com/statistics/262946/share-of-the-most-common-languages-on-the-internet. Accessed 24 February 2022.
The scarcity of studies in audiovisual translation concerned with crisis commu-
nication is our final ethical concern. Under the slogan “Hubei fights against novel
coronavirus epidemic”, a wide range of informative documentation that “varied
from handbooks and pamphlets to posters (flyers) and videos” (Wang 2019, p. 93),
was produced, translated and hosted on the institutional website for people to
consult, especially for foreign nationals anxious to find out details about the new
illness.6 From a quantitative perspective, of the 78 items listed in their “Guide and
services”, and dating from 21 January until 5 April 2020, only three are videos, with
varying levels of quality in terms of legibility and readability. What is more, and
despite the explicit acknowledgement by the project manager of this initiative that
“[t]he translation efforts gave priority on containment measures which had been
translated from Chinese to English (EN), Japanese (JP), Korean (KR), Russian (RU),
Italian (IT), Spanish (SP) and French (FR)” (ibid.), only some of the written
documents, delivered as pdf, display this gamut of languages. When it comes to
the videos, only English and Chinese have been used as the vehicular languages, to
the detriment of the rest: one of the videos is monolingual, with both the soundtrack
and the subtitles in English, in an apparent effort “to ensure information was
accessible to the deaf or hard-of-hearing audiences” (ibid.), while the other two
have been shot in Chinese, with the inclusion of interlingual subtitles in English and
onscreen text in Chinese that provides a literal account of what is also being said. As
admitted by the main project manager, “[i]deally, there should have been more
language versions of the videos” (ibid.). As the global response evolved, many
multimodal combinations appeared, as attested in the repository provided by the
Endangered Language Project (ELP) on its website,7 which gives access to a large
database, including multiple audiovisual (or audio-only) resources.
Notwithstanding the role played by TV news programmes in crises, normally in
the form of audiovisual broadcasts, the reasons behind this state of affairs are
multifarious. On the one hand, and despite the fact that technological advances in
the digital video editing arena have made the production of videos much easier than
before, the reality is that generating a written text and its translation continues to be
simpler and, crucially, faster than creating and translating a video, which is vital in
situations where the prompt delivery of information is of the essence. On the other
hand, it can be argued that, out of tradition, a myopic strategy, when there is one,
is still applied in these contexts, in which written information continues to
prevail over multimedia messages, despite the potentially wider impact and immediacy
of the latter (thereby missing out poorer, low-literacy groups, for instance). In fact, it
has been argued that the generation of trust in an emergency is intrinsically linked
with the source of information (Arlikatti et al. 2007; Cadwell 2019) and, in this

6 See http://en.hubei.gov.cn/special/coronavirus_2019/index.shtml. Accessed 24 February 2022.
7 See https://endangeredlanguagesproject.github.io/COVID-19. Accessed 24 February 2022.

sense, audiovisual messages can help build that trust, as the speakers can be seen on
screen, thus reaching the intended audience in a more impactful manner. However,
these advantages can be somewhat curtailed because the translation of multimedia
products can be perceived as being more complicated than that of written texts,
particularly from a technological perspective. Indeed, the added logistical complexity
of having to translate audiovisual productions into a plurality of languages—very
frequently minority and/or minoritised ones—adds to the apparent lack of appetite
for exploiting the production and multilingual dissemination of informative and
instructive videos.
From an academic perspective, the situation is not much better and little research
has so far focused on the role, challenges, and potential of audiovisual communica-
tion and translation in crises. As an emerging field within the wider discipline of
translation studies, the remit of crisis translation should in future include ways of
contextualising multimodal translation (subtitling, dubbing, audio description,
respeaking, etc.) for the significant role it already plays in the crisis lifecycle to
educate on preparedness measures, to inform of risk mitigation measures, and to
reach low-literacy and sensory-impaired audiences. Audiovisual translation studies
lag behind in this respect and have remained somewhat restricted in scope. Yet, as previously mentioned, the fact
remains that TV news broadcasts become crucially important in periods of crisis
as key tools to disseminate and receive information, as happened after the 2011
Great East Japan Earthquake (Sato et al. 2009; Tsuruta 2011; Kaigo 2012), and as
highlighted by Wang (2019) in Wuhan’s response. In these cases, translation might
not be immediately apparent on screen as different stations air their programmes in
their own languages. However, depending on the broadcasters’ awareness and
priorities, as well as on the potential existence of legislation on the topic, the figure
of the sign language interpreter can become a regular actor in these settings so as to
convey the information to the D/deaf community. On occasion, and to enhance
audiovisual accessibility and social inclusiveness, the screen is also shared with
subtitles that are created for D/deaf and hard of hearing people. Less prominent
seems to be the presence of audio description in these environments as a means to
communicate with other societal groups like the blind and the partially sighted and
further research ought to be conducted on the topic to gain a more comprehensive
picture of the current state of affairs.
Having said all this, it is true that audiovisual materials are being increasingly
exploited in crises, particularly during the response phase, when instructional and
informative videos are created to help those affected, as in the example about Wuhan
previously discussed. One of the defining characteristics of audiovisual translation is
its fundamental enmeshment with technology and technical advancements (Díaz-
Cintas 2014, 2018; Baños 2018; Doherty and Kruger 2018; Díaz-Cintas and
Massidda 2019), which opens up the gates to exploration into the potential that
linguistic and technical automation can have when dealing with the translation of
audiovisual materials in crises. In such contexts, not only practices like MT, whose
implementation in the subtitling industry has increased substantially over the past
few years, but also technologies such as (automatic) speech recognition (ASR) and
speech synthesis (Ciobanu and Secară 2019) could be efficiently implemented to
help translate audiovisual content. In this respect, speech recognition has been
used to transcribe the dialogue uttered in videos into (intralingual) subtitles, both
automatically, with the help of ASR, and with the participation of a respeaker, in
which case, “the process is not fully automated as a professional is needed to dictate
or respeak the original speech” (Baños 2018, p. 31). YouTube automatic captioning
offers another instance of the use of ASR in subtitling, whereby Google’s speech
recognition technology is used to automatically convert the speech from a clip into
text, which is then synchronously presented as subtitles on screen. Likewise, text in
the form of existing subtitles or lists of dialogue can be converted to speech through
the use of speech synthesis. The resulting voice track can then be embedded in any
given audiovisual material to make it more accessible, for instance to people who are
blind.
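Once a recogniser has returned timed text segments, the ASR-to-subtitles step described above is largely a formatting task. The sketch below assumes a generic (start, end, text) tuple format, not any specific ASR tool's actual output structure, and renders it as standard SRT:

```python
# Illustrative sketch: turning timed ASR segments into SRT subtitles.
# The (start, end, text) tuple format is an assumption made for this example;
# real ASR tools expose their own result structures.

def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def to_srt(segments):
    """Render (start, end, text) segments as numbered SRT blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

segments = [(0.0, 2.5, "Move to higher ground now."),
            (2.5, 5.0, "Follow the marked evacuation route.")]
print(to_srt(segments))
```

The formatting itself is trivial; the quality-critical decisions, segmenting speech into readable lines, timing them to reading speed, and correcting recognition errors, remain exactly the human tasks (respeaking, post-editing) the paragraph above describes.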
Although the potential of MT in settings of crisis translation should certainly be
investigated as a tool to increase productivity and speed up the transfer process of
audiovisual productions, it is our contention that the most important issues should be
prioritised in this emerging field. Indeed, the designing, testing and implementation
of Audiovisual Translation (AVT) workflows that could be operational during a
crisis eventuality should be given utmost precedence, as well as the exploration of
the users’ likes and dislikes when it comes to accessing this type of material in a
revoiced (e.g., dubbed, voiced-over) or subtitled version. Of course, this would be of
significant consequence in the case of translating languages with little written
tradition. Another area that merits urgent attention is the potential unleashed by
the technical migration to the cloud and the popularity of online collaborative
platforms designed exclusively for dubbing and subtitling purposes. In these new
virtual spaces, activities such as cybersubtitling and cyberdubbing (Díaz-Cintas
2018) present both opportunities and challenges as far as productivity and the
ethics of volunteering are concerned. Our fourth and final ethical stance is the
reiteration that we urgently need more systematic research on the usage of audiovi-
sual translation in crises, as audiovisual materials support most citizens and, thus,
contribute to the inclusivity agenda behind the UN Sustainable Development Goals.

8.6 Conclusions

The realisation of the UN’s “leave no one behind” SDGs is very uncertain and
definitely projected into a distant future. Inequities remain in access to information
and to the infrastructures that allow data management and storage. The COVID-
19 pandemic has reminded us that circulation of information—even when commu-
nicating risks in the interest of the whole society—is an extremely complex task.
Lending trustworthiness to the various voices and speakers depends on many factors
but language is certainly essential to creating successful communication channels.
Urgency, gaps, and “language indifference” (Polezzi forthcoming) from monolingual,
power-holding groups risk justifying rushed and incomplete forms of full
automation through technologies. Such an approach, underestimating the nuanced
landscapes that the computer scientists who design such technologies envisage for them,
diminishes the very impact of MT engines in supporting humans to deal with crises.
The translation of information, regardless of the mode used, its role in addressing
social inequality, and the adoption of MT-driven or automated translation workflows
as if they were standalone, self-sufficient solutions risk obscuring the real potential
behind translation automation. Automation should be fully implemented but, once
again, people should also be trained to work with it and enhance it, with the ultimate goal of
strengthening social resilience against adversity. Global early warning systems such
as GDACS could scale up the approach that was being used by GPHIN, combining
automated information mining, MT, and domain-specific language experts (Tanguay
2019). The Epidemic Intelligence from Open Sources seems to be a successful move
in this direction as it aims “to create a unified, all-hazards, One Health approach by
using open-source information for early detection, verification and assessment of
public health risks and threats” (Abdelmalik et al. 2018, p. 268). Its design allows the
system to aggregate data streams from other systems, with a capacity to scale up the
data-harnessing features of GPHIN by relying on the WHO global network of
country-level offices, as well as all the open sources on the internet, and local
scientists’ reports and warnings. As mentioned earlier, EIOS picked up the first
mention of the epidemic in Wuhan. It is unclear how expert linguists are expected to
interact with the system. Of course, the design and development of new technology
and automated platforms bring about a wide range of ethical considerations that we
have not touched upon in this contribution, as our main interest lies in the use that is
made of such technology in the interactions with translation actors. The implications
from a technical point of view have been broached by authors like Parra Escartín and
Moniz (2020).
There are numerous multimodal contexts that currently embed MT engines and
raise ethical issues. In many areas, volunteerism and technology lull users into a false
sense of security about the translation quality achieved through automated processes. Projects
such as Masakhane are an example of how the 1500 languages in use in African
countries have seen only scattered and unsystematic work to create MT engines and
minable resources. They show promise but also raise concerns about access, in terms of
what is available, which has only recently started being optimised, and what is
infrastructurally possible. For rare and low-resource languages, the risk is that any
solution for automating translation processes might be applied, and often can only be
adopted, by untrained users and deployed as a final translation product with limited
diagnostic Quality Assurance (QA), which takes into consideration the responses of
members of the user group.
Our first ethical consideration has been that applications and uses of automation
processes of translation have a central role to play, together with human interaction,
in increasing preparedness, but more focus is needed on the human-computer
interaction and the role of humans in quality assurance processes. Our second ethical
consideration is that T&I automation processes must be part of the lifecycle of crises
and not just of the response phase. Our third consideration is that to reduce ethical
concerns about the application of MT and automation processes to crisis communi-
cation, these must be embedded in global platforms that aim to reduce risks, but their
limitations must be critically explained to users. Our final ethical consideration
calls on scholars in T&I, audiovisual software developers, and MT researchers to
enlarge and systematise research on the usage of AVT in crises, as multimodal
transfer of information is becoming ever more central to people’s lives and it reduces
accessibility and economic barriers, while increasingly adopting multi-layered forms
of automation in its translation workflows.
One ethical recommendation would be to invest in global risk reduction platforms
such as GDACS, GPHIN, and EIOS to make sure that they are regularly
updated with the latest MT resources, to gradually include as many low-resource
languages as possible, and to ensure that they integrate with AVT workflows to
support automation processes. At that stage, discussions about better computer-
human interaction with automation processes and MT can be entertained, as well
as about the creation of bespoke and efficient training on these systems and the
enhancement of QA practices vis-à-vis MT outputs. We firmly believe that many
automation processes currently support translation in vital ways and are aware that
the need for near-immediate translation in crises is only going to grow. It is also our
contention that automation is part of a process and that, however advantageous and
complex they might be, MT options currently qualify as tools in these
processes, and not as standalone solutions. This position rests on the ethical
dimension at the core of this chapter: communication in crisis settings pivots around
notions of trust, credibility, and social equality (Piller 2016, 2020) that pervade
social interactions and remain the domain of human activity. Interlingual
communication in times of crisis and its relation to automation processes—usage,
application but also design and development—needs to be built on inclusive preparedness;
multilingual communication supported by technology is an essential activity to
fulfil this ambition. The ability to communicate messages that reduce risks to the
wider population is crucial, but a barely understandable message is not intrinsically
sufficient.

References

Abdelmalik P, Peron E, Schnitzler J, Fontaine J, Elfenkamper E, Barboza P (2018) The epidemic
intelligence from open sources initiative: a collaboration to harmonize and standardize early
detection and epidemic intelligence among public health organizations. Wkly Epidemiol Rec
93(20):267
Alexander DE, Pescaroli G (2019) The role of translators and interpreters in cascading crises and
disasters. Disaster Prev Manag 29(2):144–156. https://doi.org/10.1108/DPM-12-2018-038
Arlikatti S, Lindell MK, Prater CS (2007) Perceived stakeholder role relationships and adoption of
seismic hazard adjustments. Int J Mass Emerg Disasters 25(3):218–256
Baños R (2018) Technology and audiovisual translation. In: Chan S-W (ed) An encyclopedia of
practical translation and interpreting. The Chinese University of Hong Kong Press, Sha Tin, pp
3–30. https://doi.org/10.2307/j.ctvbtzp7q.4
Cadwell P (2019) Trust, distrust, and translation in a disaster. Disaster Prev Manag 29(2):157–174
Cadwell P, O’Brien S, DeLuca E (2019) More than tweets: a critical reflection on developing and
testing crisis machine translation technology. Transl Spaces 8(2):300–333
8 Ethics, Automated Processes, Machine Translation, and Crises 153

Christianson C, Duncan J, Onyshkevych B (2018) Overview of the DARPA LORELEI program.
Mach Transl 32(1):3–9. https://doi.org/10.1007/s10590-017-9212-4
Ciobanu D, Secară A (2019) Speech recognition and synthesis technologies in the translation
workflow. In: O’Hagan M (ed) The Routledge handbook of translation and technology.
Routledge, London, pp 91–106
Crouse Quinn S (2008) Crisis and emergency risk communication in a pandemic: a model for
building capacity and resilience of minority communities. Health Promot Pract 9(4):18S–25S
Díaz-Cintas J (2014) Technological strides in subtitling. In: Chan S-W (ed) Routledge encyclopedia
of translation technology. Routledge, London, pp 632–643
Díaz-Cintas J (2018) ‘Subtitling’s a carnival’: new practices in cyberspace. J Special Transl 30:127–
149
Díaz-Cintas J, Massidda S (2019) Technological advances in audiovisual translation. In: O’Hagan
M (ed) The Routledge handbook of translation and technology. Routledge, London, pp 255–270
Doherty S, Kruger J-L (2018) Assessing quality in human-and machine-generated subtitles and
captions. In: Moorkens J, Castilho S, Gaspari F, Doherty S (eds) Translation quality assessment.
Springer, Cham, pp 179–197
Emezue CC, Dossou BFP (2020) Lanfrica: a participatory approach to documenting machine
translation research on African languages. arXiv preprint arXiv:2008.07302
Federici FM, O’Brien S (2020) Cascading crises: translation as risk reduction. In: Federici FM,
O’Brien S (eds) Translation in cascading crises. Routledge, London, pp 1–22
Federici FM, Gerber BJ, O’Brien S, Cadwell P (2019a) The international humanitarian sector and
language translation in crisis situations. INTERACT Network, Assessment of current practices
and future needs. https://doi.org/10.53241/INTERACT/001
Federici FM, O’Brien S, Cadwell P, Marlowe J, Gerber B, Davis O (2019b) INTERACT recom-
mendations on policies. http://doras.dcu.ie/23880. Accessed 21 July 2021
Field J (2017) What is appropriate and relevant assistance after a disaster? Accounting for culture
(s) in the response to Typhoon Haiyan/Yolanda. Int J Disaster Risk Reduct 22:335–344. https://
doi.org/10.1016/j.ijdrr.2017.02.010
Greenwood F, Howarth C, Poole DE, Raymond NR, Scarnecchia DP (2017) The signal code: a
human rights approach to information during crisis. Harvard Humanitarian Initiative. http://hhi.
harvard.edu/sites/default/files/publications/signalcode_final.pdf. Accessed 21 July 2021
Harvard Humanitarian Initiative (2011) Disaster relief 2.0: the future of information sharing in
humanitarian emergencies. https://hhi.harvard.edu/files/humanitarianinitiative/files/disaster-
relief-2.0.pdf?m=1612814759. Accessed 21 July 2021
Hunt A, Specht D (2019) Crowdsourced mapping in crisis zones: collaboration, organisation and
impact. J Int Humanitarian Act 4(1):1–11
Hunt M, O’Brien S, Cadwell P, O’Mathúna DP (2019) Ethics at the intersection of crisis translation
and humanitarian innovation. J Humanitarian Affairs 1(3):23–32
Hutchins JW (1995) Machine translation: a brief history. In: Koerner EFK, Asher RE (eds) Concise
history of the language sciences. Pergamon Press, Oxford, pp 431–445
Hutchins JW (2005) Current commercial machine translation systems and computer-based transla-
tion tools: System types and their uses. Int J Transl 17(1-2):5–38. http://www.hutchinsweb.me.
uk/IJT-2005.pdf
IFRC (2013) World disasters report 2013. Focus on technology and the future of humanitarian
action. International Federation of Red Cross and Red Crescent Societies. https://reliefweb.int/
report/world/world-disasters-report-2013-focus-technology-and-future-humanitarian-action-
enar. Accessed 21 July 2021
IFRC (2018) World disasters report 2018. Leaving no one behind. https://media.ifrc.org/ifrc/wp-
content/uploads/sites/5/2018/10/B-WDR-2018-EN-LR.pdf. Accessed 21 July 2021
Iglesias CA, Favenza A, Carrera Á (2020) A big data reference architecture for emergency
management. Inform 11(12):569. https://doi.org/10.3390/info11120569
Independent Panel (2021) COVID-19: make it the last pandemic. https://theindependentpanel.org/
mainreport/. Accessed 21 July 2021
154 F. M. Federici et al.

Joscelyne A (2021) How does AI ethics impact translation? TAUS Blog. https://blog.taus.net/
multilingual-morals-how-does-ai-ethics-impact-translation. Accessed 21 July 2021
Kaigo M (2012) Social media usage during disasters and social capital: Twitter and the Great East
Japan Earthquake. Keio Commun Rev 24:19–35
Kelman I (2020) COVID-19: what is the disaster? Soc Anthropol 28(2):296–297. https://doi.org/10.
1111/1469-8676.12890
Kenny D (2011) The ethics of machine translation. In: Ferner S (ed) Reflections on language and
technology: the driving forces in the modern world of translation and interpreting. NZSTI,
Auckland
Knight W (2020) AI can do great things—if it doesn’t burn the planet. Wired. https://tinyurl.com/
W20HBTC. Accessed 21 July 2021
Lanners Q (2019) Neural machine translation. Towards data science. https://towardsdatascience.
com/neural-machine-translation-15ecf6b0b. Accessed 21 July 2021
Lewis WD, Munro R, Vogel S (2011) Crisis MT: developing a cookbook for MT in crisis situations.
Proceedings of the sixth workshop on statistical machine translation, Edinburgh, 30-31 July
Liu C-H, Way A, Silva C, Martins AF (2019) Pivot machine translation in INTERACT project.
Proceedings of machine translation summit XVII volume 2: translator, project and user tracks,
Dublin
Matamala A, Ortiz-Boix C (2016) Accesibilidad y multilingüismo: un estudio exploratorio sobre la
traducción automática de descripciones de audio. Trans 20:11–24. https://doi.org/10.24310/
TRANS.2016.v0i20.2059
Mawudeku A, Blench M, Boily L, John RS, Andraghetti R, Ruben M (2013) The global public
health intelligence network. In: M’ikanatha NM, Lynfield R, Van Beneden CA, de Valk H (eds)
Infectious disease surveillance. Wiley, Hoboken, pp 457–469
McCulloch G (2020) Covid-19 is history’s biggest translation challenge. Wired. https://www.wired.
com/story/covid-language-translation-problem. Accessed 21 July 2021
Meier P (2015) How digital Jedis are springing to action in response to Cyclone Pam. iRevolutions.
https://irevolutions.org/2015/04/07/digital-jedis-cyclone-pam. Accessed 21 July 2021
Moore R, Verity A (2014) Hashtag standards for emergencies: OCHA policy and studies brief.
United Nations Office for the Coordination of Humanitarian Affairs (OCHA). https://www.
unocha.org/publication/policy-briefs-studies/hashtag-standards-emergencies. Accessed
21 July 2021
Mulder F, Ferguson J, Groenewegen P, Boersma K, Wolbers J (2016) Questioning big data:
crowdsourcing crisis data towards an inclusive humanitarian response. Big Data Soc 3(2):
1–13. https://doi.org/10.1177/2053951716662054
Munro R (2010) Crowdsourced translation for emergency response in Haiti: the global collabora-
tion of local knowledge. AMTA workshop on collaborative crowdsourcing for translation,
31 October, Denver, CO
Munro R (2013) Crowdsourcing and the crisis-affected community: lessons learned and looking
forward from Mission 4636. J Inf Retr 16(2):210–266
Nurminen M, Koponen M (2020) Machine translation and fair access to information. Transl Spaces
9(1):150–169
O’Brien S (2019) Translation technology and disaster management. In: O’Hagan M (ed) The
Routledge handbook of translation technologies. Routledge, London, pp 304–318
O’Brien S, Federici FM (2019) Crisis translation: considering language needs in multilingual
disaster settings. Disaster Prev Manag 29(2):129–143
O’Brien S, Federici FM, Cadwell P, Marlowe J, Gerber B (2018) Language translation during
disaster: a comparative analysis of five national approaches. Int J Disaster Risk Reduct 31:627–
636. https://doi.org/10.1016/j.ijdrr.2018.07.006
O’Mathúna DP, Hunt MR (2019) Ethics and crisis translation: insights from the work of Paul
Ricoeur. Disaster Prev Manag 29(2):175–186

O’Mathúna DP, Parra Escartín C, Roche P, Marlowe J (2020) Engaging citizen translators in
disasters: virtue ethics in response to ethical challenges. Transl Interpret Stud 15(1):57–79.
https://doi.org/10.1075/tis.20003.oma
Parra Escartín C, Moniz H (2020) Ethical considerations on the use of machine translation and
crowdsourcing in cascading crises. In: Federici FM, O’Brien S (eds) Translation in cascading
crises. Routledge, London, pp 132–151
Piller I (2016) Linguistic diversity and social justice: an introduction to applied sociolinguistics.
Oxford University Press, Oxford
Piller I (2020) COVID-19 forces us to take linguistic diversity seriously. Perspectives on the
pandemic: international social science thought leaders reflect on COVID-19, 12, 13–17.
https://www.degruyter.com/fileasset/craft/media/doc/DG_12perspectives_socialsciences.pdf
Polezzi L (forthcoming) Language indifference. In: Forsdick C, Kamali L (eds) Translating
cultures: a glossary. Liverpool University Press, Liverpool
Quinn P (2018) Crisis communication in public health emergencies: the limits of ‘legal control’ and
the risks for harmful outcomes in a digital age. Life Sci Soc Policy 14(1):1–40. https://doi.org/
10.1186/s40504-018-0067-0
Ramalingam B, Sanderson D (2015) Nepal earthquake response: lessons for operational agencies.
ALNAP/ODI. https://reliefweb.int/sites/reliefweb.int/files/resources/nepal-earthquake-
response-lessonspaper.pdf. Accessed 21 July 2021
Rico Pérez C (2013) From hacker spirit to collaborative terminology: resourcefulness in humani-
tarian work. Transl Spaces 2(1):19–36
Rico Pérez C (2020) Mapping translation technology and the multilingual needs of NGOs along the
aid chain. In: Federici FM, O’Brien S (eds) Translation in cascading crises. Routledge, London,
pp 112–131
Rogl R (2017) Language-related disaster relief in Haiti: volunteer translator networks and language
technologies in disaster aid. In: Antonini R, Cirillo L, Rossato L, Torresi I (eds)
Non-professional interpreting and translation. State of the art and future of an emerging field
of research. John Benjamins, Amsterdam, pp 231–255. https://doi.org/10.1075/btl.129.12rog
Ruder S (2020) Why you should do NLP beyond English. http://ruder.io/nlp-beyond-english.
Accessed 21 July 2021
Sato K, Okamoto K, Miyao M (2009) Japan, moving towards becoming a multi-cultural society,
and the way of disseminating multilingual disaster information to non-Japanese speakers.
Proceedings of the 2009 International Workshop on Intercultural Collaboration
Silva CC, Liu C-H, Poncelas A, Way A (2018) Extracting in-domain training corpora for neural
machine translation using data selection methods. Proceedings of the Third Conference on
Machine Translation
Stancheva T (2021) 24 noteworthy video consumption statistics. TechJury. https://techjury.net/
blog/video-consumption-statistics. Accessed 21 July 2021
Sutherlin G (2013) A voice in the crowd: Broader implications for crowdsourcing translation during
crisis. J Inf Sci 39(3):397–409
Systran (2021) 12 translation models specialized with corona crisis data. https://www.systransoft.
com/systran/news-and-events/specialized-corona-crisis-corpus-models/#try. Accessed
21 July 2021
Tanguay F (2019) GPHIN. Presentation. World Health Organization. https://www.who.int/docs/
default-source/eios-gtm-2019-presentations/tanguay-phac–-eios-gtm-2019.pdf?sfvrsn=
8c758734_2. Accessed 21 July 2021
TAUS (2021) TAUS corona crisis corpora. https://md.taus.net/corona. Accessed 21 July 2021
Tsuruta C (2011) Broadcast interpreters in Japan: Bringing news to and from the world. Inter-
preters’ Newsl 16:157–173
UN (2015) Sustainable development: the 17 goals. United Nations. https://sdgs.un.org/goals.
Accessed 21 July 2021
Wang P (2019) Translation in the COVID-19 health emergency in Wuhan: a crisis manager’s
perspective. J Int Local 6(2):86–107

WHO (2014) Infection prevention and control of epidemic and pandemic prone acute respiratory
infections in health care. World Health Organization. https://www.who.int/publications/i/item/
infection-prevention-and-control-of-epidemic-and-pandemic-prone-acute-respiratory-infec
tions-in-health-care. Accessed 21 July 2021
WHO (2017) Communicating risk in public health emergencies. A WHO guideline for emergency
risk communication (ERC) policy and practice. World Health Organization
Yourterm (2021) COVID-19 terminology resource centre. https://yourterm.org/covid-19. Accessed
21 July 2021
Zhang J, Wu Y (2020) Providing multilingual logistics communication in COVID-19 disaster relief.
Multilingua 39(5):517–528. https://doi.org/10.1515/multi-2020-0110
Part III
Responsible Machine Translation: Societal Impact
Chapter 9
Gender and Age Bias in Commercial Machine Translation

Federico Bianchi, Tommaso Fornaciari, Dirk Hovy, and Debora Nozza

Abstract The main goal of Machine Translation (MT) has been to correctly convey
the content in the source text to the target language. Stylistic considerations have
been at best secondary. However, style carries information about the author’s
identity. Mostly overlooking this aspect, the output of three commercial MT systems
(Bing, DeepL, Google) makes demographically diverse samples from five languages
“sound” older and more male than the original texts. Our findings suggest that
translation models reflect demographic bias in the training data. This bias can
cause misunderstandings about unspoken assumptions and communication goals,
which normally differ for different demographic categories. These results open up
interesting new research avenues in MT to take stylistic considerations into account.
We explore whether this bias can be used as a feature, by correcting skewed initial
samples, and compute fairness scores for the different demographics.

Keywords Bias · Ethics · Natural language processing · Machine translation

All authors contributed equally and are listed alphabetically.

F. Bianchi
Computer Science Department, Stanford University, Stanford, USA
e-mail: [email protected]
T. Fornaciari
Italian National Police, Rome, Italy
e-mail: [email protected]

D. Hovy (✉) · D. Nozza
Computing Sciences Department, Bocconi University, Milan, Italy
e-mail: [email protected]; [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
H. Moniz, C. Parra Escartín (eds.), Towards Responsible Machine Translation,
Machine Translation: Technologies and Applications 4,
https://doi.org/10.1007/978-3-031-14689-3_9

9.1 Introduction

Good communication is fundamental in a highly connected world. Nowadays, we need to
swiftly cooperate with people who speak different languages: getting supplies from
another country requires multilingual interaction, with emails exchanged between
different partners.
These situations are where Machine Translation (MT) comes to shine. MT is a
broad set of techniques introduced to translate a sentence from any one language into
another. The advantages of this technology in terms of time and cost efficiency are
readily apparent.
However, like many other applications, MT requires a lot of care when used in real-
world settings, as we are not completely aware of the possible side effects of
these techniques in production systems (Bianchi and Hovy (2021) refer to
wrong thing in the wrong context can have dramatic consequences. Japanese, for
example, has many honorific expressions that need to be respected for coherent and
efficient conversation. Using the wrong honorific can prompt an unexpected reaction
on the other side of the conversation—or an end to it. Similarly, mistranslating an
expression by using the wrong synonym can result in humorous sentences (see
reams of internet memes) or very serious outcomes (as when an MT engine trans-
lated “good morning” as “attack them”).1
These are gross and apparent mistakes, and MT has made great strides to address
these technical issues. However, beyond the grammatical and semantic correctness,
language has a social component, which is equally as important for conveying
meaning (Flek 2020; Hovy and Yang 2021). This component includes slight but
consistent linguistic differences based on the speaker’s age, gender, and other socio-
demographic aspects. This linguistic feature is sometimes explicitly marked via
morphemes. Many languages, for example Japanese, use explicitly gendered lan-
guage, i.e., language carries information about the speaker’s gender. Even when
languages do not explicitly require us to mark author gender, the use of certain
lexemes, constructions, or formulations might still encode a wealth of socio-
demographic information about the speaker (Johannsen et al. 2015).
But MT systems (and human translations) act as intermediaries, i.e., they speak
on behalf of the person translated. Just as we do not want to put wrong words into a
speaker’s mouth, so do we not want to misrepresent their demographic identity. This
faithful representation is a question of user autonomy (see Prabhumoye et al. 2021).
So if we accept that language carries idiosyncratic socio-demographic information
that contributes to an utterance as much as the content does, we seek to preserve that
information in a good translation.
Our research investigates whether MT systems faithfully translate socio-
demographic author profiles. We look at age and gender identity here, as those are

1
https://www.theguardian.com/technology/2017/oct/24/facebook-palestine-israel-translates-good-
morning-attack-them-arrest.

salient and well-studied aspects, but our findings hold more broadly for a wide range
of socio-demographic speaker-attributes.
Simply put, we would like our gender identity and age to be respected when we
communicate, via translation, with someone. Gender identity and age are thus called
Language Invariant Properties (Bianchi et al. 2021b): properties that should not
change when we translate text. However, currently, this does not happen, as we
demonstrate empirically in Sect. 9.4. We test what happens to perceived author
demographics when we translate texts with three different services. We use classi-
fiers based on standard TF-IDF (c.f. Box 9.1) with logistic regression (c.f. Box 9.2)
and based on the recent BERT model (Devlin et al. 2019) (c.f. Box 9.3) to measure
the degree of misrepresentation.

Box 9.1 TF-IDF
The Term Frequency—Inverse Document Frequency (TF-IDF) is a statistical
measure for words in texts.
It represents a trade-off between the overall word frequencies and their
distribution across documents. Essentially, it weighs the overall frequency of a
term by the number of documents it occurs in. It gives high values to terms that
are frequent, but at the same time appear in a limited number of documents.
Words with high TF-IDF values typically make good features for classifiers such as
Logistic Regression (see Box 9.2).
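As a minimal pure-Python sketch of this weighting (using the common smoothed-IDF variant; the box does not fix an exact formula, so the details here are illustrative):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Per-document TF-IDF weights: term frequency times smoothed IDF."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()  # number of documents each term occurs in
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return weights

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
w = tf_idf(docs)
# "sat" occurs in two documents, "mat" in only one; at equal
# within-document frequency, the rarer "mat" gets the higher weight.
```

In a classifier such as the one described in Box 9.2, these per-term weights would be collected into a fixed-vocabulary feature vector for each text.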

Box 9.2 Logistic Regression
Logistic Regression is a statistical model used to estimate the probability that an
observation belongs to a certain class.
In our application, the observations are texts, represented as a vector of (n-
grams of) word frequencies present in the texts and selected according to their
TF-IDF values. This kind of feature set is commonly called Bag-of-Words
(BoW). The predicted classes are the demographic variables: gender and age
of the text’s author.
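A self-contained sketch of such a classifier: a plain gradient-descent Logistic Regression over Bag-of-Words counts, trained on an invented toy corpus (labels 1 = “young”, 0 = “older” are made up for illustration; the chapter’s actual classifiers are trained on demographically representative multilingual data):

```python
import math
from collections import Counter

def featurize(text, vocab):
    """Bag-of-Words vector: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit logistic regression weights by stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            w = [wj - lr * (p - yi) * xj for wj, xj in zip(w, xi)]
            b -= lr * (p - yi)
    return w, b

def predict(text, vocab, w, b):
    z = sum(wj * xj for wj, xj in zip(w, featurize(text, vocab))) + b
    return 1 if z > 0 else 0

# Invented toy corpus: class 1 = "young" author, class 0 = "older" author.
texts = ["omg sooo cute lol", "lol cant wait for the weekend",
         "the quarterly figures look solid", "please review the attached report"]
labels = [1, 1, 0, 0]
vocab = sorted({word for t in texts for word in t.lower().split()})
X = [featurize(t, vocab) for t in texts]
w, b = train_logreg(X, labels)
```

Words that co-occur with one class receive weights pushing predictions toward that class, which is exactly the mechanism through which demographic style markers become predictive features.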

Box 9.3 BERT
The Bidirectional Encoder Representations from Transformers (BERT) is a
state-of-the-art language model in Natural Language Processing (NLP). It
represents documents as a vector of numbers, where similar sentences receive
similar vectors.
This kind of word and sentence representation has proven effective for most
NLP tasks.
As its name says, BERT’s building block is the Transformer neural network
model (Vaswani et al. 2017). BERT has been trained on huge corpora covering
several languages, and requires extensive computation time. For this reason,
people use pre-trained versions that are freely available.
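The “similar sentences receive similar vectors” property is typically checked with cosine similarity. A toy sketch, with hand-made 4-dimensional vectors standing in for real BERT embeddings (which have hundreds of dimensions and come from a pre-trained model):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hand-made stand-ins for sentence embeddings: two paraphrases
# and one unrelated sentence.
emb_paraphrase_a = [0.9, 0.8, 0.1, 0.0]
emb_paraphrase_b = [0.8, 0.9, 0.0, 0.1]
emb_unrelated = [0.0, 0.1, 0.9, 0.8]

# The paraphrase pair scores higher with each other than with the
# unrelated sentence.
```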

We use demographically-representative author samples from five languages
(Dutch, English, French, German, Italian) and translate them with three commercially
available MT systems, namely Google Translate, Bing Translator, and DeepL.
We compare the actual demographics with each translation’s predicted demo-
graphics (and a control predictor trained on the same language). Without making
any judgment on the translation quality of the content, we find two outcomes that can
drastically impact how messages could be perceived after translation:
1. There are substantial discrepancies in the perceived demographics;
2. Translations tend to make the writers appear older and considerably more male
than they are.
This issue is not only relevant to MT research but strongly impacts the public. Our
most crucial conclusion, corroborated by other experiments in the literature, is that
MT bias is a problem that we cannot overlook.
This book chapter is based on our previous work (Hovy et al. 2020), which we
extend with more details and a more general outlook. In particular, we consider the
use of age and gender classifiers based on BERT (Devlin et al. 2019) for all the
different languages, thus making the results much more robust, since they are now
built on the most recent literature. Finally, we also give some hints about the
effect of back-translating a text into its original language.
We organize this chapter as follows. We begin with a brief recap of MT in Sect.
9.2, where we also discuss the aspects of commercial MT tools in more detail. In
Sect. 9.3 we discuss recent literature on bias in MT, describing which problems
affect this task. Finally, we describe our experiments to demonstrate age and
gender bias in commercial MT tools in Sect. 9.4. We end the chapter with a
discussion on the limits of these technologies and a tentative path forward, highlight-
ing both limitations and opportunities.

9.2 Machine Translation

MT is one of the most well-studied NLP tasks. Translation models—based initially
on lookup tables and n-grams2—have become much more reliable with the advent of
deep learning. We refer the reader to Dabre et al. (2020) for an in-depth analysis of
these models. Here, we give only an introduction. Nowadays, most models are based
entirely on deep neural networks (Zhao et al. 2019; Wu et al. 2019; Liu et al. 2020).
Deep models are often based on Recurrent Neural Networks (RNN) that process
the input sentence sequentially, maintaining a state from which they can generate the
output sentence’s words.

2
n-grams are sequences of n consecutive words in a text. In NLP, they are frequently used linguistic
features, and the most common n-grams are bi-grams and tri-grams.
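For illustration, n-grams as described in the footnote can be extracted in a few lines (a generic sketch, not tied to any particular MT system):

```python
def ngrams(text, n):
    """Return all n-grams (tuples of n consecutive tokens) in a text."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "machine translation is widely used"
bigrams = ngrams(sentence, 2)   # [("machine", "translation"), ("translation", "is"), ...]
trigrams = ngrams(sentence, 3)
```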

The most recent and prominent neural architectures to date are the Transformer
models, introduced for MT (Vaswani et al. 2017), which have subsequently shaped
algorithms in NLP (and progressively replaced RNNs).
MT is the NLP problem with the most comprehensive adoption in industry, as
companies need to translate their documents. This global need for translation
underscores the importance of MT. In the following sections, we will consider
three popular translation systems:
• Google Translate: a service provided by Google.3
• Bing Translator: a service provided by Microsoft.4
• DeepL: a service provided by the company of the same name.5 To date, DeepL is
often favoured by translators due to the quality of its translations.6
Given the essential business aspect of machine-translated texts, all three
translation services come with both free and premium versions, where the main
differences are the maximum number of translations a user can run and the
availability of other features for professional use, such as integration into
spreadsheets.

9.3 Bias in Machine Translation

There is now a growing literature on various types of bias present in the NLP field.
There is indeed a lot of interest in exploring the harm that modern systems can
perpetuate when used in real settings (Stafanovičs et al. 2020; Gehman et al. 2020;
Nozza et al. 2021; Nozza 2021). Blodgett et al. (2020) review 146 papers (about
written text only, excluding spoken language) focusing on “bias” in NLP systems.
However, they argue that such growth is not directed, as the studies’ motivations are
“often vague, inconsistent, and lacking in normative reasoning” (page 5455). In
particular, they point out the frequent lack of a clear definition of “bias” and
engagement with the relevant literature outside of NLP. They provide some recom-
mendations on improving the problem of bias definition and, therefore, the proposed
methods’ effectiveness.
Shah et al. (2020) offer another review of the literature with that goal in mind.
They identify four origins of biases in creating NLP systems, formalize them
mathematically, and show countermeasures for each type. With respect to MT, the
most frequent sources of bias are selection biases, semantic biases and
overamplification (the fourth bias they found, label bias, is not applicable in MT).
These biases play a role in selecting the training corpus, in the use of word

3
https://translate.google.com/.
4
https://www.bing.com/translator.
5
https://www.deepl.com/en/translator.
6
https://www.deepl.com/quality.html.

representations, and the models’ optimization, respectively. The authors show how
training corpora and word representations, i.e., word embeddings, can incorporate a
distorted society image, possibly harmful for some socio-demographic categories,
especially those characterized by the frequent presence of biased and stereotyped
attributes in training corpora. The authors also note that NLP models themselves can
reproduce probability distributions that do not reflect the training data.
Work by Mirkin et al. (2015) and Mirkin and Meunier (2015) has set the stage for
considering the impact of demographic variation (Hovy et al. 2015) and its integra-
tion in MT. Rescigno et al. (2020) report statistics about the performance of
machine translation tools in translating gender-related words. More recent research has suggested
that MT systems reflect cultural and societal biases (Stanovsky et al. 2019; Escudé
Font and Costa-jussà 2019), though primarily focusing on data selection and embed-
dings as sources. Vanmassenhove et al. (2021) found effects of bias amplification in
MT, with “loss of lexical and morphological richness” (page 2203). Bentivogli et al.
(2020) address the gender de-biasing problem in MT, using multi-modal data that
includes the speakers’ voice.
Zhao et al. (2018) show that downstream tasks inherit gender biases from the
contextualized word embeddings used. Manzini et al. (2019) propose methods for
multi-class debiasing of word embeddings. Escudé Font and Costa-jussà (2019)
propose one of the first approaches to debiasing MT. The authors focus on the use of
debiased word embeddings (Bolukbasi et al. 2016) and gender-neutral word embed-
dings (Zhao et al. 2017) to provide better support for neural MT pipelines. However,
subsequent research by Gonen and Goldberg (2019) has called into question whether
these techniques actually address the underlying bias, or rather “put lipstick on a
pig”: they mask some of the symptoms, which then resurface when the embeddings
are used in downstream applications.
The work by Zmigrod et al. (2019) and Zhao et al. (2018) addresses the problem
of gender biases by training models with balanced data sets. In contrast, given the
difficulty in building such data sets, Saunders and Byrne (2020) reduce the biases in
Neural Machine Translation (NMT) systems by fine-tuning rather than re-training,
treating the debiasing procedure as a domain adaptation task. Similarly, Michel and
Neubig (2018) propose to take into consideration speakers’ attributes in NMT.
Vanmassenhove et al. (2018) propose to reduce gender biases in NMT systems by
integrating gender information as an additional feature. Saunders et al. (2020),
however, point out how such an approach needs improvements. For example, it
does not account for multiple referents in the same sentence, overgeneralizing the
predicted gender to all of them.
Niu et al. (2018) address the broad problem of translating texts from one language
to another, preserving not only the content, but also their stylistic features.
Stafanovičs et al. (2020) underline that one of the problems is that there is not
enough contextual information to provide a translation. The translation system can
rely on the most frequent case, which is often the most stereotypical one. The authors
explain this with the example The secretary asked for details, where there is not
enough information to translate the term secretary with the correct gender. The

authors also propose to use word-level annotations related to the subject-gender to


provide better support to the translation system.
Prates et al. (2019) have shown that Google Translate suffers from a strong bias
towards male writers when translating. The authors collected a diverse set of
sentences relating to job positions, of a general form (He/She is an engineer), in
languages that do not have a gender system. Once these sentences were translated
into English, the authors counted the statistical patterns related to this male bias.
The loss of linguistic richness in the translation process, expressed as reduced
linguistic variety in the translated texts, is also pointed out by Vanmassenhove
et al. (2019).
Rabinovich et al. (2017) investigated the effect of translation on gender using
research translation systems. They show that translation weakens the predictive
power but do not investigate the direction of false predictions. In Sect. 9.4 we
show that there is a definitive bias. Besides, we extend the analysis to include age.
We also use various commercially available MT tools rather than research systems.

9.4 Gender and Age Bias in Commercial Machine Translation

In this section, we describe our experiments to demonstrate gender and age bias in
commercial MT. Owing to data availability, we only cover a binary gender distinction
here. This should not be read as a normative comment, but as a limitation of the data.

9.4.1 Method

Suppose we have access to a prediction model for the authors’ demographic aspects
(i.e., gender or age) for texts in all the languages and their respective translations. To
assess the demographic profile of a text, we train two separate aspect classifiers for
each language, namely age and gender. These classifiers allow us to compare the
predicted profiles in the original language with the predicted profiles of the transla-
tion, and compare both to the actual demographics of the test data.
Indeed, we can compare the predicted distribution, P, of an aspect with the actual
distribution, Q (following Shah et al. 2020). To compare the distributions, we use
the Kullback-Leibler (KL) divergence, a divergence measure between probability
distributions.7

7 Note that the KL is a divergence and not a distance measure, because it is not symmetric:
KL(P|Q) ≠ KL(Q|P). This difference is not important for our objective, but it is important to
remember. Moreover, the KL divergence does not have an upper bound, while the lower bound
is 0 (i.e., equal distributions).
166 F. Bianchi et al.

Fig. 9.1 Two discrete distributions that show high KL divergence. Classifier one: classifier for
language A, classifying texts in the original language. Classifier two: classifier for language B,
classifying the translated text

$$\mathrm{KL}(P \mid Q) = \sum_{i} P_i \log_2 \frac{P_i}{Q_i}$$
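To make the measure concrete, the KL divergence between a predicted and a gold gender split can be computed in a few lines. This is an illustrative sketch, not the chapter's code; the 35:65 split is a hypothetical example of a male-skewed prediction:

```python
import math

def kl_divergence(p, q):
    """KL(P | Q) = sum_i P_i * log2(P_i / Q_i), in bits; 0 for equal distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

gold = [0.50, 0.50]        # balanced F:M test split
predicted = [0.35, 0.65]   # a hypothetical male-skewed prediction

print(round(kl_divergence(predicted, gold), 3))  # → 0.066
print(kl_divergence(gold, gold))                 # → 0.0
```

Note that the argument order matters: KL(P|Q) ≠ KL(Q|P), as discussed in the footnote above.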

Therefore, we define the notion of bias for this paper as the divergence between
the real and predicted demographic variables, comparing both original and translated
texts. In particular, given the predictions over demographic aspects on the text in the
original language and the text in the translated language, we describe two events.
1. If there is a classifier bias, both the predictions based on the original language and
the predictions based on the translations should be skewed in the same direction,
e.g., both predicting a higher rate of male authors. Note that we are not interested
in the actual predictive performance of the classifiers, but just in how similar the
prediction rates are between the original and the translated text. As explained
above, we can measure the classifier bias by computing the KL divergence of the
predicted distribution from the true sample distribution.
2. If instead there is a translation bias, then the translated predictions should exhibit
a stronger skew than the predictions based on the original language. E.g., the
gender distribution in the original language (which we control) should be closer
to uniform than the gender ratio in the translation. Translation bias is the main
target of investigation of this paper.
To give a high-level view, Fig. 9.1 shows how a translation bias would behave: the
KL is high, because the two distributions are very different. In the case of a classifier
bias, instead, the predicted distributions of the two classifiers should look like the
ones in Fig. 9.2, where the KL is low, because the two predicted distributions are
similar. Note that in practice, both types of bias are likely to be present in our results.
By using both translations from and into English, we can further tease apart the
direction of this effect. Figure 9.3 summarizes the idea behind our experiment, here
considering gender as the demographic aspect.

Fig. 9.2 Two discrete distributions that show low KL divergence. Classifier one: classifier for
language A, classifying texts in the original language. Classifier two: classifier for language B,
classifying the translated text


Fig. 9.3 The experimental methodology we adopt in this chapter. For example, we have a French
and an English gender classifier, each trained on monolingual corpora balanced for gender in the
respective language, i.e., French and English. We also have two possible translation pairs: French
original texts paired with their English translations, and English original texts paired with their
French translations. Our two classifiers are not perfect, but, if there is no translation bias, we expect
both predicted gender distributions to be similar to each other. On the other hand, if there is a
translation bias, the predicted distributions of the two classifiers should be different (e.g., the
English classifier predicting more females than males). The divergence between the two distribu-
tions gives a measure of the translation bias, i.e., the difficulty of the MT systems to preserve the
authors’ gender identity from one language to the other

Moreover, to see whether the predictions differ from the original in a statistically
significant way, we use a χ² contingency test and report significance at p ≤ 0.05 and
p ≤ 0.01.
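As a sketch of such a test, SciPy's `chi2_contingency` can compare predicted and gold gender counts; the counts below are purely illustrative, not the chapter's data:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: gold vs. predicted F/M counts
# for a balanced test set of 400 reviews (illustrative numbers only).
gold_counts = [200, 200]       # F, M
predicted_counts = [148, 252]  # a hypothetical male-skewed prediction

chi2, p_value, dof, expected = chi2_contingency([gold_counts, predicted_counts])
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
print("significant at p <= 0.01:", p_value <= 0.01)
```

With a skew of this size on 400 instances, the test comfortably rejects the hypothesis of equal splits.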

9.4.2 Data

Our starting point is the Trustpilot data set by Hovy et al. (2015). Trustpilot is a
review website, founded in Denmark in 2007, that provides users a platform to review
companies and services. For example, a satisfied customer can express a positive
opinion on their favorite online shoe seller. Users can also voluntarily report their
age and their (binary) gender identity, which makes this data particularly
interesting for our use case. We use this self-reported socio-demographic informa-
tion as ground truth.
The data set contains reviews in different languages, but here we focus on
English, German, Italian, French, and Dutch. This selection follows two requirements
for our experiment. First, to provide a fair comparison, we need languages that are
covered by all translation systems. Second, we need to be able to collect reviews that
are demographically representative samples of the language and country (the reviews
are from Germany, Italy, France, and the Netherlands, respectively). For the English
data, we use US reviews, rather than UK reviews, based on the general prevalence of
this variety in translation engines.

9.4.2.1 Translation Data

For each language, we restrict ourselves to reviews written in the respective lan-
guage (according to langid8 Lui and Baldwin 2012) that have both age and gender
information. We use the CIA factbook9 data on age pyramids to sample 200 instances
each for male and female users. We use the four age groups given in the factbook,
i.e., 15–24, 25–54, 55–64, and 65+, and sample instances proportionally to the age
pyramid from each group. Based on data sparsity in the Trustpilot data, we do not
include the under-15 age group. This sampling procedure results in five test sets (one
for each language) of about 400 instances each (the exact numbers vary slightly due
to rounding and the exact proportions in the CIA factbook data), balanced for binary
gender. The exception is Italian, where the original data is so heavily skewed
towards male-written reviews that we only achieve a 48:52 gender ratio even with
downsampling.
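The sampling procedure can be sketched as follows; the age-pyramid shares are illustrative placeholders, not the actual CIA factbook values, and the helper name is ours:

```python
import random

# Illustrative shares for the four factbook age groups; real values differ per country.
AGE_SHARES = {"15-24": 0.15, "25-54": 0.50, "55-64": 0.15, "65+": 0.20}

def sample_test_set(reviews, n_per_gender=200, seed=0):
    """Sample reviews proportionally to the age pyramid, balanced by binary gender.

    Each review is a dict with "gender" ("F" or "M") and "age_group" keys.
    """
    rng = random.Random(seed)
    sampled = []
    for gender in ("F", "M"):
        for group, share in AGE_SHARES.items():
            pool = [r for r in reviews
                    if r["gender"] == gender and r["age_group"] == group]
            # Round to the nearest integer; cap at the pool size when data is sparse.
            k = min(round(n_per_gender * share), len(pool))
            sampled.extend(rng.sample(pool, k))
    return sampled

# A toy corpus with ample reviews in every cell yields a balanced ~400-instance test set.
corpus = [{"gender": g, "age_group": a, "text": f"review {i}"}
          for g in ("F", "M") for a in AGE_SHARES for i in range(500)]
test_set = sample_test_set(corpus)
print(len(test_set))  # → 400
```

The `min(..., len(pool))` cap mirrors why the Italian test set ends up slightly imbalanced: when one gender-age cell is too small, the sample falls short of its target share.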
We then translate all non-English test sets into English. We will refer to this set of
data as the Into English data. We also translate the English test set into all other
languages, which we will refer to as From English. We use three commercially
available MT tools: Bing, DeepL, and Google Translate.

8 https://github.com/saffsd/langid.py.
9 https://www.cia.gov/library/publications/the-world-factbook/.

9.4.2.2 Profile Prediction Data

We use all review instances that are not part of any test set to create training data for
the respective age and gender classifiers (see Sect. 9.4.3). Since we want to compare
fairly across languages, the training data sets need to be of comparable size. We are
therefore bounded by the size of the smallest available subset (Italian). We sample
about 2500 instances per gender in each language, again according to the respective
age distributions. This sampling results in about 5000 instances per language (again,
the exact number varies slightly based on the availability of samples for each group
and rounding). We again subsample to approximate the actual age and gender
distribution, since, according to Hovy et al. (2015), the data skews strongly male
while otherwise closely matching the official age distributions.

9.4.3 Classifiers

We use simple logistic regression models (Pedregosa et al. 2011) with L2 regular-
ization over TF-IDF-weighted character 2–6-grams, with the regularization strength
optimized via tenfold cross-validation.
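A minimal sketch of this setup with scikit-learn follows; the exact analyzer settings and hyperparameter grid in the chapter may differ, and the toy training data are ours:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline

def make_profile_classifier():
    """TF-IDF over character 2-6-grams feeding an L2 logistic regression whose
    regularization strength is chosen by tenfold cross-validation."""
    return make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(2, 6)),
        LogisticRegressionCV(cv=10, penalty="l2", max_iter=1000),
    )

# Toy usage; the real training data are the ~5000 sampled reviews per language.
clf = make_profile_classifier()
texts = ["great shoes, fast delivery!"] * 10 + ["awful support, never again."] * 10
labels = ["F"] * 10 + ["M"] * 10
clf.fit(texts, labels)
print(clf.predict(["fast delivery, great shoes!"])[0])
```

Character n-grams rather than word tokens keep the feature space comparable across the five languages, since they do not depend on language-specific tokenization.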
Furthermore, we also use the recent BERT-based neural language model archi-
tecture (Devlin et al. 2019; Liu et al. 2019). BERT has obtained convincing results
on many different NLP tasks and across different languages (Nozza et al. 2020).
Essentially, BERT is a transformer-based neural network (Vaswani et al. 2017)
trained on a large amount of text. Its network can be used as a starting point to
create new classifiers through fine-tuning (i.e., adapting the weights of BERT to a
new task). The same authors also released a multilingual version of BERT
(mBERT). mBERT10 supports over 100 languages, including Arabic, Dutch, and
Portuguese. BERT and mBERT have obtained increasingly good results in many
applications involving monolingual and multilingual tasks (Yang et al. 2019;
Zhu et al. 2020; Bianchi et al. 2021c; Lamprinidis et al. 2021).
Given those initial results on English and multi-lingual data, researchers have
retrained BERT for specific languages, with similar (or better) performance (Nguyen
and Tuan Nguyen 2020; Antoun et al. 2020; Bianchi et al. 2021a; Martin et al. 2020;
de Vries et al. 2019, inter alia). In our experimental configuration we use the
following language-specific BERT (LS-BERT11) models:
• French: CamemBERT (Martin et al. 2020);
• German: GermanBERT;12

10 https://github.com/google-research/bert/blob/master/multilingual.md.
11 We will use LS-BERT to refer to the corresponding language-specific BERT applicable to the
language being discussed.
12 https://github.com/dbmdz/berts.

Table 9.1 Macro-F1 for the age and gender classifiers on German (de), English (en), French (fr),
Italian (it), and Dutch (nl). The best result for each language and demographic factor is in bold

          de     en     fr     it     nl
Gender
LogReg    0.65   0.62   0.64   0.62   0.66
LS-BERT   0.66   0.64   0.69   0.61   0.62
mBERT     0.66   0.63   0.69   0.61   0.62
Age
LogReg    0.52   0.53   0.45   0.52   0.49
LS-BERT   0.49   0.53   0.48   0.49   0.46
mBERT     0.49   0.52   0.48   0.49   0.46

• Italian: GilBERTo;13
• Dutch: BERTje (de Vries et al. 2019);
• English: RoBERTa (Liu et al. 2019).
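As an illustration of how such language-specific models can be loaded for fine-tuning, a hedged sketch with the Hugging Face transformers library follows; the checkpoint identifiers are our assumptions and should be checked against the cited model releases:

```python
# Assumed Hugging Face checkpoint IDs for the language-specific models above;
# these identifiers are our best guesses, not taken from the chapter.
LS_BERT_MODELS = {
    "fr": "camembert-base",                           # CamemBERT
    "de": "bert-base-german-cased",                   # GermanBERT
    "it": "idb-ita/gilberto-uncased-from-camembert",  # GilBERTo
    "nl": "GroNLP/bert-base-dutch-cased",             # BERTje
    "en": "roberta-base",                             # RoBERTa
}

def load_classifier(language, num_labels=2):
    """Return tokenizer and sequence-classification head for one language
    (downloads the checkpoint on first use)."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    name = LS_BERT_MODELS[language]
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(
        name, num_labels=num_labels)
    return tokenizer, model
```

Fine-tuning then proceeds as usual: the pretrained encoder is kept, a binary (gender) or four-way (age group) classification head is attached, and all weights are updated on the Trustpilot training data.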
Our classifiers cover a range of training domains with different levels of
specificity: the logistic regression is trained and tested on our corpus. The language-
specific BERT models were pretrained on language-specific general corpora and
then finetuned on our corpus. Finally, we finetuned a multilingual BERT model
(pretrained on a large multilingual Wikipedia corpus) on our corpus.
The numbers in Table 9.1 indicate that both age and gender can be inferred
reasonably well across all of the languages with all the classifiers. We use these
classifiers in the following analyses. LogReg, LS-BERT, and mBERT all achieve
comparable results, indicating a performance ceiling. Since our corpus and task are
novel, there is no previous literature to compare our results against; however, logistic
regression is a fairly strong baseline in most classification tasks.
For each non-English sample, we predict the age and gender of the author in both
the original language and in each of the three English translations (Google, Bing, and
DeepL). That is, we use the respective language's classifier described above (e.g., a
classifier trained on German to predict German test data). We use the English
classifiers described above for the translations from other languages into
English, e.g., the age and gender classifier trained on English data to predict on
the English translations of the German test set.
For the English data, we first translate the texts into each of the other languages,
using each of the three translation systems. Then we again predict the author
demographics in the original English test set (using the classifier trained on English),
as well as in each of the translated versions (using the classifier trained on the
respective language). E.g., we create German, French, Italian, and Dutch transla-
tions with each of Google, Bing, and DeepL, and classify both the original English
and the translation.
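The per-language comparison then reduces to computing predicted label ratios on both sides; a small sketch with hypothetical predictions (none of these numbers come from the chapter):

```python
from collections import Counter

def predicted_split(labels):
    """Return the predicted F/M proportions from a list of gender labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {g: counts.get(g, 0) / total for g in ("F", "M")}

# Hypothetical outputs for one language pair and one MT system:
orig_preds = ["F"] * 96 + ["M"] * 104   # source-language classifier on originals
trans_preds = ["F"] * 74 + ["M"] * 126  # English classifier on the translations

print(predicted_split(orig_preds))   # close to the balanced gold split
print(predicted_split(trans_preds))  # skewed further towards "M"
```

A large gap between the two splits, as here, is the signature of translation bias; if both splits were skewed by the same amount, the skew would instead point to classifier bias.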

13 https://github.com/idb-ita/GilBERTo.

Table 9.2 Gender split (%) and KL divergence from gold for each language when translated into
English and classified with LogReg

From   Gold        Org. lang        Google           Bing             DeepL
       F:M split   F:M split  KL    F:M split  KL    F:M split  KL    F:M split  KL
de     50:50       48:52   0.001    37:63   0.034    35:65   0.045    35:65   0.045
fr     50:50       47:53   0.002    49:51   0.000    48:52   0.001    49:51   0.000
it     48:52       47:53   0.000    37:63   0.026    43:57   0.006    36:64   0.033
nl     50:50       49:51   0.000    47:53   0.001    47:53   0.002    44:56   0.007
avg                        0.000            0.015            0.013            0.021

** Split differs significantly from gold split at p ≤ 0.01

Table 9.3 Gender split (%) and KL divergence from gold for each language when translated into
English and classified with LS-BERT

From   Gold        Org. lang        Google           Bing             DeepL
       F:M split   F:M split  KL    F:M split  KL    F:M split  KL    F:M split  KL
de     50:50       51:49   0.000    36:64   0.042    39:61   0.022    39:61   0.023
fr     50:50       51:49   0.000    40:60   0.020    44:56   0.007    47:53   0.002
it     48:52       31:69   0.065    35:65   0.035    37:63   0.027    38:62   0.020
nl     50:50       48:52   0.000    46:54   0.003    41:59   0.015    43:57   0.008
avg                        0.016            0.025            0.018            0.013

* Split differs significantly from gold split at p ≤ 0.05; ** significant difference at p ≤ 0.01

Table 9.4 Gender split (%) and KL divergence from gold for each language when translated into
English and classified with mBERT

From   Gold        Org. lang        Google           Bing             DeepL
       F:M split   F:M split  KL    F:M split  KL    F:M split  KL    F:M split  KL
de     50:50       51:49   0.000    41:59   0.015    40:60   0.021    41:59   0.016
fr     50:50       51:49   0.000    45:55   0.005    47:53   0.002    47:53   0.002
it     48:52       31:69   0.065    40:60   0.014    44:56   0.005    44:56   0.004
nl     50:50       48:52   0.000    38:62   0.028    39:61   0.023    42:58   0.014
avg                        0.016            0.016            0.013            0.009

* Split differs significantly from gold split at p ≤ 0.05; ** significant difference at p ≤ 0.01

9.4.4 Gender Bias

Tables 9.2, 9.3, and 9.4 show the results when translating into English. For each
language, the tables show the test gender ratio, the predicted ratio from classifiers
trained in the same language, as well as their KL divergence from the ratio in the test
set, and the ratio predictions and KL divergences of an English classifier on the
translations from the three MT systems.

9.4.4.1 Translating into English

Using Logistic Regression as a classifier (Table 9.2), we can observe that, for most
languages, there already exists a classifier bias in the gender predictions of that
language, skewed towards male. However, the translated English versions create an
even stronger skew. The notable exception is French, which most translation engines
render in a demographically accurate manner. Dutch is slightly worse, followed by
Italian (note, though, that the Italian data was so heavily imbalanced that we could
not sample an even distribution for the test data in the first place). The gender skew is
most substantial for German, swinging by as much as 15 percentage points.

Using BERT: Language-Specific Mono-lingual Models

The language-specific BERT classifiers (Table 9.3) show very similar results
across languages and translation engines. The only notable difference is the gender
prediction distribution for Italian, which is more skewed towards the male class
compared to Table 9.2. In particular, this imbalanced gender prediction is present
both in the translated English version and in the Italian source.

Using BERT: Multi-lingual Models

To better understand whether this phenomenon is due to translation bias or to an
intrinsic male bias present in the pretrained GilBERTo model, we perform the same
analysis with mBERT (Table 9.4). We use a BERT model trained on multiple
languages in this setting, removing (or at least strongly reducing) the possible bias
introduced by individual language-specific training and architectures. Ultimately,
we observe that

Table 9.5 Gender split (%) and KL divergence from gold for each language when translated from
English and classified with LogReg

Gold F:M split: 50:50; prediction on the English original: 49:51 (KL 0.000)

To     Google           Bing             DeepL
       F:M split  KL    F:M split  KL    F:M split  KL
de     59:41   0.015    58:42   0.013    58:42   0.011
fr     49:51   0.000    52:48   0.001    54:46   0.003
it     45:55   0.004    44:56   0.007    41:59   0.016
nl     40:60   0.020    43:57   0.010    40:60   0.019
avg            0.010            0.008            0.012

* Split differs significantly from gold split at p ≤ 0.05; ** significant difference at p ≤ 0.01

Table 9.6 Gender split (%) and KL divergence from gold for each language when translated from
English and classified with LS-BERT

Gold F:M split: 50:50; prediction on the English original: 49:51 (KL 0.000)

To     Google           Bing             DeepL
       F:M split  KL    F:M split  KL    F:M split  KL
de     64:36   0.040    56:44   0.007    62:38   0.031
fr     45:55   0.005    40:60   0.021    48:52   0.001
it     35:65   0.050    34:66   0.057    39:61   0.026
nl     49:51   0.000    40:60   0.022    46:54   0.003
avg            0.024            0.027            0.015

* Split differs significantly from gold split at p ≤ 0.05; ** significant difference at p ≤ 0.01

Table 9.7 Gender split (%) and KL divergence from gold for each language when translated from
English and classified with mBERT

Gold F:M split: 50:50; prediction on the English original: 49:51 (KL 0.000)

To     Google           Bing             DeepL
       F:M split  KL    F:M split  KL    F:M split  KL
de     47:53   0.001    48:52   0.001    49:51   0.000
fr     42:58   0.014    39:61   0.023    46:54   0.003
it     21:79   0.201    20:80   0.218    21:79   0.214
nl     35:65   0.045    33:67   0.059    37:63   0.033
avg            0.065            0.075            0.062

* Split differs significantly from gold split at p ≤ 0.05; ** significant difference at p ≤ 0.01

the discrepancies between the ratio predictions on English translations have
decreased, while the gender prediction on Italian is still strongly skewed. This
finding leads us to conclude that the exceptionally high bias observed with BERT
models on Italian is mainly due to an intrinsic male bias present in the pretrained
Italian BERT models (GilBERTo and mBERT).

9.4.4.2 Translating from English

Tables 9.5, 9.6, and 9.7 show the results when translating from English into the
various languages. The format is the same as for Tables 9.2, 9.3, and 9.4.
Again we see large swings, usually exacerbating the imbalance towards men.
However, translating into German with all systems produces estimates that are
perceived as a lot more female than the original data. This result could be the inverse
of the effect we observed above.
We observe little change for French in both tables, though the gender prediction
with logistic regression demonstrates some female bias for two MT systems. Similar

Fig. 9.4 Density distribution and KL for age prediction in various languages and different systems
in original and when translated into English classified with LogReg. Solid yellow line = true
distribution.  = predicted distribution differs significantly from gold distribution at p ≤ 0.05.  =
significant difference at p ≤ 0.01

to previous results, when using the BERT models pretrained on Italian (GilBERTo
and mBERT), the gender prediction is significantly skewed towards the male class.

Fig. 9.5 Density distribution and KL for age prediction in various languages and different systems
in original and when translated into English classified with LS-BERT. Solid yellow line = true
distribution.  = predicted distribution differs significantly from gold distribution at p ≤ 0.05.  =
significant difference at p ≤ 0.01

9.4.5 Age Bias

Figures 9.4, 9.5, and 9.6 show the kernel density plots for the four age groups in each
language (rows), for same-language prediction and for the English translation. The
distributions are reasonably close, but the predictions overestimate the most
prevalent class in all cases. This effect is clearest when predicting the age

Fig. 9.6 Density distribution and KL for age prediction in various languages and different systems
in original and when translated into English classified with mBERT. Solid yellow line = true
distribution.  = predicted distribution differs significantly from gold distribution at p ≤ 0.05.  =
significant difference at p ≤ 0.01

distribution with the multilingual BERT model on English translations, demonstrat-
ing that a less specific model will tend to converge to the most prevalent classes even
more. A notable case is the age prediction with the Italian BERT model, which has a
significantly skewed output distribution when tested on Italian.
To delve a bit deeper into this age mismatch, we also split up the sample by
decade, rather than age group (i.e., seven classes: 10s, 20s, etc., up to 70s+).
Figures 9.7 and 9.8 show the results. The caveat here is that the overall predictive
performance is lower due to the higher number of classes. We also cannot guarantee

Fig. 9.7 Density distribution and KL for decade prediction in various languages and different
systems in original and when translated into English classified with LogReg. Solid yellow line =
true distribution.  = predicted distribution differs significantly from gold distribution at p ≤ 0.05. 
= significant difference at p ≤ 0.01

that the distribution still follows the true demographic distribution since we are
subsampling within the larger classes given by the CIA factbook.
However, the results still strongly suggest that the observed mismatch is driven
predominantly by the overprediction of the 50s decade for logistic regression and
70s decade for language-specific and multilingual BERT models. Because these

Fig. 9.8 Density distribution and KL for decade prediction in various languages and different
systems in original and when translated into English classified with LS-BERT. Solid yellow line =
true distribution.  = predicted distribution differs significantly from gold distribution at p ≤ 0.05. 
= significant difference at p ≤ 0.01

decades often contributed strongly to the most frequent age categories (25–54 and
65+), predictions did not differ as much from gold in the previous test.
In essence, English translations of all these languages, irrespective of the MT
system, seem to be produced by authors much older than they actually are (Fig. 9.9).

Fig. 9.9 Density distribution and KL for decade prediction in various languages and different
systems in original and when translated into English classified with mBERT. Solid yellow line =
true distribution.  = predicted distribution differs significantly from gold distribution at p ≤ 0.05. 
= significant difference at p ≤ 0.01

9.4.6 Discrepancies Between MT Systems

All three tested commercial MT systems are close together in terms of results in our
experiments. However, they also seem to show the same systematic translation biases.
The most likely reason is the use of biased training data. The fact that translations into
English are perceived as older and more male than translations into other languages
could indicate a more extensive collection of unevenly selected data for English than
for other languages.

9.5 Discussion

In this chapter, we have demonstrated the existence of gender and age bias in
MT. We have shown that translations with commercial systems make the translated
text seem produced by subjects who are more male and older than the actual authors.
While similar findings in the literature corroborate these results, we are the first to
provide a quantitative analysis on three different commercial MT tools. We expect
more consideration for this kind of problem in the future, since it may affect the
quality of our communication. Ultimately, our findings contribute to a growing body
of research indicating that language is about more than just information content, but
includes important social aspects as well. By giving those aspects more consider-
ation, we can push the frontier of MT and move into stylistic aspects. On the other
hand, by ignoring these aspects, we perpetuate the status quo, resulting in uneven
user experiences and translations that capture only half of what they should.

References

Antoun W, Baly F, Hajj H (2020) AraBERT: Transformer-based model for Arabic language
understanding. In: Proceedings of the 4th workshop on open-source arabic corpora and
processing tools, with a shared task on offensive language detection, Marseille, France.
European Language Resource Association, pp 9–15
Bentivogli L, Savoldi B, Negri M, Di Gangi MA, Cattoni R, Turchi M (2020) Gender in danger?
evaluating speech translation technology on the MuST-SHE corpus. In: Proceedings of the 58th
annual meeting of the association for computational linguistics, Online. Association for Com-
putational Linguistics, pp 6923–6933
Bianchi F, Hovy D (2021) On the gap between adoption and understanding in NLP. In: Findings of
the Association for Computational Linguistics: ACL-IJCNLP 2021, Online. Association for
Computational Linguistics, pp 3895–3901
Bianchi F, Nozza D, Hovy D (2021a) FEEL-IT: Emotion and sentiment classification for the Italian
language. In: Proceedings of the eleventh workshop on computational approaches to subjectiv-
ity, sentiment and social media analysis, Online. Association for Computational Linguistics, pp
76–83
Bianchi F, Nozza D, Hovy D (2021b) Language invariant properties in natural language processing.
Preprint. arXiv:2109.13037
Bianchi F, Terragni S, Hovy D, Nozza D, Fersini E (2021c) Cross-lingual contextualized topic
models with zero-shot learning. In: Proceedings of the 16th conference of the European chapter
of the Association for Computational Linguistics: main volume, Online. Association for Com-
putational Linguistics, pp 1676–1683
Blodgett SL, Barocas S, Daumé III H, Wallach H (2020) Language (technology) is power: A critical
survey of “bias” in NLP. In: Proceedings of the 58th annual meeting of the Association for
Computational Linguistics, Online. Association for Computational Linguistics, pp 5454–5476

Bolukbasi T, Chang K, Zou JY, Saligrama V, Kalai AT (2016) Man is to computer programmer as
woman is to homemaker? debiasing word embeddings. In Lee DD, Sugiyama M, von
Luxburg U, Guyon I, Garnett R (eds) Advances in neural information processing systems 29:
annual conference on neural information processing systems 2016, December 5–10, 2016,
Barcelona, Spain, pp 4349–4357
Dabre R, Chu C, Kunchukuttan A (2020) A survey of multilingual neural machine translation.
ACM Comput Surv 53(5):1
de Vries W, van Cranenburgh A, Bisazza A, Caselli T, van Noord G, Nissim M (2019) BERTje: A
Dutch BERT model. Preprint. arXiv:1912.09582
Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional
transformers for language understanding. In: Proceedings of the 2019 conference of the North
American chapter of the Association for Computational Linguistics: human language technol-
ogies, volume 1 (Long and Short Papers), Minneapolis, Minnesota. Association for Computa-
tional Linguistics, pp 4171–4186
Escudé Font J, Costa-jussà MR (2019) Equalizing gender bias in neural machine translation with
word embeddings techniques. In: Proceedings of the first workshop on gender bias in natural
language processing, Florence, Italy. Association for Computational Linguistics, pp 147–154
Flek L (2020) Returning the N to NLP: Towards contextually personalized classification models. In:
Proceedings of the 58th annual meeting of the Association for Computational Linguistics,
Online. Association for Computational Linguistics, pp 7828–7838
Gehman S, Gururangan S, Sap M, Choi Y, Smith NA (2020) RealToxicityPrompts: Evaluating
neural toxic degeneration in language models. In: Findings of the Association for Computa-
tional Linguistics: EMNLP 2020, Online. Association for Computational Linguistics, pp
3356–3369
Gonen H, Goldberg Y (2019) Lipstick on a pig: Debiasing methods cover up systematic gender
biases in word embeddings but do not remove them. In: Proceedings of the 2019 conference of
the North American chapter of the Association for Computational Linguistics: human language
technologies, volume 1 (Long and Short Papers), Minneapolis, Minnesota. Association for
Computational Linguistics, pp 609–614
Hovy D, Yang D (2021) The importance of modeling social factors of language: Theory and
practice. In: Proceedings of the 2021 conference of the North American chapter of the Associ-
ation for Computational Linguistics: human language technologies, Online. Association for
Computational Linguistics, pp 588–602
Hovy D, Johannsen A, Søgaard A (2015) User review sites as a resource for large-scale sociolin-
guistic studies. In: Gangemi A, Leonardi S, Panconesi A (eds) Proceedings of the 24th
international conference on world wide web, WWW 2015, Florence, Italy, May 18–22, 2015.
ACM, pp 452–461
Hovy D, Bianchi F, Fornaciari T (2020) “you sound just like your father” commercial machine
translation systems include stylistic biases. In: Proceedings of the 58th annual meeting of the
Association for Computational Linguistics, Online. Association for Computational Linguistics,
pp 1686–1690
Johannsen A, Hovy D, Søgaard A (2015) Cross-lingual syntactic variation over age and gender. In:
Proceedings of the nineteenth conference on computational natural language learning, Beijing,
China. Association for Computational Linguistics, pp 103–112
Lamprinidis S, Bianchi F, Hardt D, Hovy D (2021) Universal joy a data set and results for
classifying emotions across languages. In: Proceedings of the eleventh workshop on computa-
tional approaches to subjectivity, sentiment and social media analysis, Online. Association for
Computational Linguistics, pp 62–75
Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V
(2019) Roberta: A robustly optimized bert pretraining approach. Preprint. arXiv:1907.11692
Liu X, Duh K, Liu L, Gao J (2020) Very deep transformers for neural machine translation. Preprint.
arXiv:2008.07772

Lui M, Baldwin T (2012) langid.py: An off-the-shelf language identification tool. In: Proceedings
of the ACL 2012 system demonstrations, Jeju Island, Korea. Association for Computational
Linguistics, pp 25–30
Manzini T, Yao Chong L, Black AW, Tsvetkov Y (2019) Black is to criminal as caucasian is to
police: Detecting and removing multiclass bias in word embeddings. In: Proceedings of the
2019 conference of the North American chapter of the Association for Computational Linguis-
tics: human language technologies, volume 1 (Long and Short Papers), Minneapolis, Minnesota.
Association for Computational Linguistics, pp 615–621
Martin L, Muller B, Ortiz Suárez P J., Dupont Y, Romary L, de la Clergerie É, Seddah D, Sagot B
(2020) CamemBERT: a tasty French language model. In: Proceedings of the 58th annual
meeting of the Association for Computational Linguistics, Online. Association for Computa-
tional Linguistics, pp 7203–7219
Michel P, Neubig G (2018) Extreme adaptation for personalized neural machine translation. In:
Proceedings of the 56th annual meeting of the Association for Computational Linguistics
(Volume 2: Short Papers), Melbourne, Australia. Association for Computational Linguistics,
pp 312–318
Mirkin S, Meunier JL (2015) Personalized machine translation: Predicting translational
preferences. In: Proceedings of the 2015 conference on empirical methods in natural language
processing, Lisbon, Portugal. Association for Computational Linguistics, pp 2019–2025
Mirkin S, Nowson S, Brun C, Perez J (2015) Motivating personality-aware machine translation. In:
Proceedings of the 2015 conference on empirical methods in natural language processing,
Lisbon, Portugal. Association for Computational Linguistics, pp 1102–1108
Nguyen DQ, Tuan Nguyen A (2020) PhoBERT: Pre-trained language models for Vietnamese. In:
Findings of the Association for Computational Linguistics: EMNLP 2020, Online. Association
for Computational Linguistics, pp 1037–1042
Niu X, Rao S, Carpuat M (2018) Multi-task neural models for translating between styles within and
across languages. In: Proceedings of the 27th international conference on computational lin-
guistics, Santa Fe, New Mexico, USA. Association for Computational Linguistics, pp
1008–1021
Nozza D (2021) Exposing the limits of zero-shot cross-lingual hate speech detection. In: Pro-
ceedings of the 59th annual meeting of the Association for Computational Linguistics and the
11th international joint conference on natural language processing (Volume 2: Short Papers),
Online. Association for Computational Linguistics, pp 907–914
Nozza D, Bianchi F, Hovy D (2020) What the [mask]? making sense of language-specific bert
models. Preprint. arXiv:2003.02912
Nozza D, Bianchi F, Hovy D (2021) HONEST: Measuring hurtful sentence completion in language
models. In: Proceedings of the 2021 conference of the North American chapter of the Associ-
ation for Computational Linguistics: human language technologies, Online. Association for
Computational Linguistics, pp 2398–2406
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel, O, Blondel M, Prettenhofer P,
Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay
E (2011) Scikit-learn: Machine learning in Python. J Mach Learn Res 12:2825–2830
Prabhumoye S, Boldt B, Salakhutdinov R, Black AW (2021) Case study: Deontological ethics in
NLP. In: Proceedings of the 2021 conference of the North American chapter of the Association
for Computational Linguistics: human language technologies, Online. Association for Compu-
tational Linguistics, pp 3784–3798
Prates MO, Avelar PH, Lamb LC (2019) Assessing gender bias in machine translation: a case study
with Google translate. Neural Comput Appl:1–19
Rabinovich E, Patel RN, Mirkin S, Specia L, Wintner S (2017) Personalized machine translation:
Preserving original author traits. In: Proceedings of the 15th conference of the European chapter
of the Association for Computational Linguistics: Volume 1, Long Papers, Valencia, Spain.
Association for Computational Linguistics, pp 1074–1084
9 Gender and Age Bias in Commercial Machine Translation 183

Rescigno AA, Monti J, Way A, Vanmassenhove E (2020) A case study of natural gender
phenomena in translation: A comparison of Google Translate, Bing Microsoft Translator and
DeepL for English to Italian, French and Spanish. In: Workshop on the impact of machine
translation (iMpacT 2020), Virtual. Association for Machine Translation in the Americas, pp
62–90
Saunders D, Byrne B (2020) Reducing gender bias in neural machine translation as a domain
adaptation problem. In: Proceedings of the 58th annual meeting of the Association for Compu-
tational Linguistics, Online. Association for Computational Linguistics, pp 7724–7736
Saunders D, Sallis R, Byrne B (2020) Neural machine translation doesn’t translate gender
coreference right unless you make it. In: Proceedings of the second workshop on gender bias
in natural language processing, Barcelona, Spain (Online). Association for Computational
Linguistics, pp 35–43
Shah DS, Schwartz HA, Hovy D (2020) Predictive biases in natural language processing models: A
conceptual framework and overview. In: Proceedings of the 58th annual meeting of the
Association for Computational Linguistics, Online. Association for Computational Linguistics,
pp 5248–5264
Stafanovičs A, Bergmanis T, Pinnis M (2020) Mitigating gender bias in machine translation with
target gender annotations. In: Proceedings of the fifth conference on machine translation,
Online. Association for Computational Linguistics, pp 629–638
Stanovsky G, Smith NA, Zettlemoyer L (2019) Evaluating gender bias in machine translation. In:
Proceedings of the 57th annual meeting of the Association for Computational Linguistics,
Florence, Italy. Association for Computational Linguistics, pp 1679–1684
Vanmassenhove E, Hardmeier C, Way A (2018) Getting gender right in neural machine
translation. In: Proceedings of the 2018 conference on empirical methods in natural language
processing, Brussels, Belgium. Association for Computational Linguistics, pp 3003–3008
Vanmassenhove E, Shterionov D, Way A (2019) Lost in translation: Loss and decay of linguistic
richness in machine translation. In: Proceedings of machine translation summit XVII volume 1:
research track, Dublin, Ireland. European Association for Machine Translation, pp 222–232
Vanmassenhove E, Shterionov D, Gwilliam M (2021) Machine translationese: Effects of algorith-
mic bias on linguistic complexity in machine translation. In: Proceedings of the 16th conference
of the European chapter of the Association for Computational Linguistics: main volume, Online.
Association for Computational Linguistics, pp 2203–2213
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017)
Attention is all you need. In Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R,
Vishwanathan SVN, Garnett R (eds) Advances in neural information processing systems 30:
annual conference on neural information processing systems 2017, December 4–9, 2017, Long
Beach, CA, USA, pp 5998–6008
Wu F, Fan A, Baevski A, Dauphin YN, Auli M (2019) Pay less attention with lightweight and
dynamic convolutions. In: 7th International conference on learning representations, ICLR 2019,
New Orleans, LA, USA, May 6–9, 2019. OpenReview.net
Yang W, Xie Y, Lin A, Li X, Tan L, Xiong K, Li M, Lin J (2019) End-to-end open-domain question
answering with BERTserini. In: Proceedings of the 2019 conference of the North American
chapter of the Association for Computational Linguistics (Demonstrations), Minneapolis, Min-
nesota. Association for Computational Linguistics, pp 72–77
Zhao J, Wang T, Yatskar M, Ordonez V, Chang KW (2017) Men also like shopping: Reducing
gender bias amplification using corpus-level constraints. In: Proceedings of the 2017 conference
on empirical methods in natural language processing, Copenhagen, Denmark. Association for
Computational Linguistics, pp 2979–2989
Zhao J, Wang T, Yatskar M, Ordonez V, Chang KW (2018) Gender bias in coreference resolution:
Evaluation and debiasing methods. In: Proceedings of the 2018 conference of the North
American chapter of the Association for Computational Linguistics: human language technol-
ogies, volume 2 (Short Papers), New Orleans, Louisiana. Association for Computational
Linguistics, pp 15–20
Zhao G, Sun X, Xu J, Zhang Z, Luo L (2019) Muse: Parallel multi-scale attention for sequence to
sequence learning. Preprint. arXiv:1911.09483
Zhu J, Xia Y, Wu L, He D, Qin T, Zhou W, Li H, Liu T (2020) Incorporating BERT into neural
machine translation. In: 8th International conference on learning representations, ICLR 2020,
Addis Ababa, Ethiopia, April 26–30, 2020. OpenReview.net
Zmigrod R, Mielke SJ, Wallach H, Cotterell R (2019) Counterfactual data augmentation for
mitigating gender stereotypes in languages with rich morphology. In: Proceedings of the 57th
annual meeting of the Association for Computational Linguistics, Florence, Italy. Association
for Computational Linguistics, pp 1651–1661
Chapter 10
The Ecological Footprint of Neural Machine
Translation Systems

Dimitar Shterionov and Eva Vanmassenhove

Abstract Over the past decade, deep learning (DL) has led to significant advancements in various fields of artificial intelligence, including machine translation (MT). These advancements would not have been possible without the ever-growing volumes of data and the hardware that allows large DL models to be trained efficiently. Due to their large number of computing cores as well as their dedicated memory, graphics processing units (GPUs) are a more effective hardware solution for training and inference with DL models than central processing units (CPUs). However, GPUs are very power demanding, and their electrical power consumption has economic as well as ecological implications.
This chapter focuses on the ecological footprint of neural MT systems. It starts from the power drain during the training of, and the inference with, neural MT models and moves towards the environmental impact, in terms of carbon dioxide emissions. Different architectures (RNN and Transformer) and different GPUs (consumer-grade NVidia 1080Ti and workstation-grade NVidia P100) are compared. Then, the overall CO2 emissions are estimated for Ireland and the Netherlands. The NMT models and their ecological impact are compared to common household appliances to draw a clearer picture.
The last part of this chapter analyses quantization, a technique for reducing the
size and complexity of models, as a way to reduce power consumption. As quantized
models can run on CPUs, they present a power-efficient inference solution without
depending on a GPU.

Keywords Neural machine translation · Power consumption · Carbon dioxide emissions · GPU comparison · Quantization · LSTM · Transformer · Europarl

D. Shterionov (✉) · E. Vanmassenhove
Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, The Netherlands
e-mail: [email protected]; [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 185
H. Moniz, C. Parra Escartín (eds.), Towards Responsible Machine Translation,
Machine Translation: Technologies and Applications 4,
https://doi.org/10.1007/978-3-031-14689-3_10
186 D. Shterionov and E. Vanmassenhove

10.1 Introduction

Over the past decade, Deep Learning (DL) techniques took the world by storm and
their application led to state-of-the-art results in various fields. The same holds for
the field of Machine Translation (MT), where the latest AI boom paved the way for the emergence of a new paradigm: Neural Machine Translation (NMT). Since most of the foundational techniques used in current applications of AI were developed before the turn of the century, the main triggers of the boom were innovations in general-purpose GPU computing as well as hardware advancements that facilitate much more efficient training and inference. In spite of the improvements in terms of
efficiency, graphics processing units (GPUs) are more power demanding than their
central processing unit (CPU) predecessors and as such, they have a considerably
higher environmental impact.
In this chapter, a brief overview of the most recent paradigms is presented along with a more in-depth introduction to the core processing technology involved in
NMT (GPUs as opposed to CPUs). We elaborate on the exponential growth of
models, the current trends in ‘big data’ and their relation to model performance. The
related work discusses pioneering and recent papers on Green AI for Natural
Language Processing (NLP) as well as tools to quantify the environmental impact
of AI.
As a case study and to outline the realistic dimensions of power consumption and
environmental footprint of NMT, we train 16 NMT models and use them for
translation, while collecting power readings from the corresponding GPUs. Using
the collected measurements we compare different NMT architectures (Long Short-Term Memory (LSTM) (Sutskever et al. 2014; Cho et al. 2014; Bahdanau et al. 2015) and Transformer (Vaswani et al. 2017)) and different GPUs (NVidia GTX 1080Ti and NVidia Tesla P100), and we compare the environmental footprint of training and translating with these models to that of other commonly used devices.
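Power readings like those used in this case study can be collected by periodically polling the GPU driver. The sketch below is our own illustration rather than the chapter's actual logging setup: it queries `nvidia-smi` (whose `--query-gpu=power.draw` option reports instantaneous draw in watts) and integrates equally spaced samples into an energy estimate.

```python
# Illustrative sketch: poll GPU power draw and turn the samples into an
# energy estimate. The 1-second sampling interval and the rectangle-rule
# integration are assumptions for illustration.
import subprocess

def read_power_draw_watts(gpu_index=0):
    """Query the instantaneous power draw (in watts) of one GPU."""
    out = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

def energy_kwh(samples_watts, interval_s=1.0):
    """Approximate energy (kWh) from equally spaced power samples."""
    joules = sum(samples_watts) * interval_s  # power x time, rectangle rule
    return joules / 3_600_000                 # 1 kWh = 3.6e6 J

# e.g. a steady 250 W drain sampled once per second for 2 hours
samples = [250.0] * 7200
print(energy_kwh(samples))  # 0.5 (kWh)
```

During an actual training run, `read_power_draw_watts` would be called in a background loop while the NMT model trains.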
Together with the research and analytical contributions, this chapter also aims to
motivate researchers to devote time, effort and investment in developing more
ecological solutions for MT.

10.2 The Technological Shift(s) in MT

Machine translation (MT), the task of automatically translating text in one language into text in another using a computer system, has become an indispensable tool for professional translators (to assist them in the translation workflow), for commercial users, e.g. e-commerce companies (to make their content quickly available in multiple languages), and for everyday users (to access information unrestricted by the language in which it is produced). Since its inception in the late 1950s, MT has undergone many shifts, the latest of which, NMT, imposes a requirement for hardware that can facilitate efficient training and inference with
10 The Ecological Footprint of Neural Machine Translation Systems 187

NMT models. The most commonly used hardware is GPUs, which support embarrassingly parallel computations due to their large number of processing cores and dedicated memory.1

10.2.1 From Rule-Based to Neural MT

In the early days of machine translation (MT), rule-based MT (RBMT) systems were
built around dictionaries and human-crafted rules to convert a source sentence into
its equivalent in the target language. Such systems were heavily dependent on the
efforts and skills of linguists. Developing a rule-based system for a new domain or a
new language pair was (and still is) a cumbersome, time-consuming task that
requires extensive linguistic expertise (Arnold et al. 1994). However, from a com-
putational point of view, it is an inexpensive task—using an RBMT system does not
require substantial computational resources.
In the 1980s, researchers attempted to overcome some of the shortcomings of RBMT when dealing with languages that differ substantially in structure (e.g. English vs Japanese). Focusing specifically on collocations, they showed that examples could be used for transfer when rules and trees failed (Nagao 1984). These hybrid
approaches evolved by the end of the decade into a more example-centred approach
(example-based MT (EBMT), where patterns would be retrieved from existing
corpora and adapted using hand-written rules).2 The idea of using patterns extracted
from corpora culminated when, in the late 1980s and early 1990s, a group of
researchers at IBM created an MT system relying solely on statistical information
extracted from corpora (Brown et al. 1988, 1990).3
This generation of corpus-based MT systems relies on data and statistical models to derive word-, phrase- or segment-level translations, eliminating the need for complex
from human expertise in linguistics to machine learning techniques. This shift
entailed other important changes for MT related to the development time and the
computational resources required for training.4 Furthermore, this group of MT
paradigms that learn automatic translation models from large amounts of parallel
and monolingual data reached—for in-domain translations and given enough

1 There is other hardware and software specifically developed for AI-accelerated computing, e.g. tensor processing units (https://cloud.google.com/tpu/docs/tpus). However, as the most commonly used and easily accessible such devices are GPUs, our work focuses on the power considerations and environmental footprint of GPUs.
2 For an overview of EBMT we refer the interested reader to Carl and Way (2003).
3 An overview of the pre-neural evolution of MT can be found in Hutchins (2005a).
4 As a matter of fact, the technical advances in terms of computational resources facilitated developments in the field of corpus-based MT. The paradigm shift furthermore coincided with the late 1990s and early 2000s growth in terms of direct applications for MT and localisation.
available training data—a better overall translation quality than that achieved by
earlier RBMT systems.
Up until about 2016, Phrase-Based Statistical MT (PB-SMT) was the dominant
corpus-based paradigm (especially after Google Translate made the switch from
RBMT to SMT) (Bentivogli et al. 2016). Currently, most state-of-the-art results for
MT are achieved using neural approaches, i.e. models based on artificial neural
networks and most prominently recurrent neural networks (RNNs) (Sutskever et al.
2014; Cho et al. 2014; Bahdanau et al. 2015) and Transformer architectures
(Vaswani et al. 2017). RNNs feed their output back as input, along with the new input.
This enables RNNs to compress sequences (of tokens, e.g. words) of arbitrary length
into a fixed-size representation. This representation can then be used to initiate a
decoding process where one token is generated at a time (again using a recurrent
network) conditioned on the previously generated tokens and the encoded represen-
tation of the input until a certain condition is met. In the context of MT, this
generation process typically continues until the end-of-sentence token is generated.
To mitigate issues related to long-distance dependencies, LSTM units (Zaremba
et al. 2014) or gated recurrent units (Cho et al. 2014) are typically used instead of
simple RNNs. In addition, to further improve the relation between encoder and
decoder, an attention mechanism is added (Luong et al. 2015). The attention
mechanism learns to associate different weights based on the importance of individ-
ual input tokens. It has been shown that attention-based models significantly
outperform those that do not employ attention. In contrast to NMT using RNNs, Transformer does not employ recurrence; it uses self-attention, i.e. an attention mechanism that indicates the importance of a token with respect to the other input tokens. Within a self-attention mechanism the positional information is lost. As such, Transformer employs a positional encoding that captures the positional information. These operations can be performed in parallel for different tokens, which allows
the training process of a Transformer model to be parallelised, unlike RNNs, where
operations are performed sequentially, making Transformer a more efficient
architecture.
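The scaled dot-product self-attention described above can be sketched in a few lines of plain Python. This is a simplified illustration: the toy vectors are arbitrary, and the learned query/key/value projections and positional encodings of a real Transformer are omitted.

```python
import math

def softmax(xs):
    m = max(xs)                    # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """X: list of token vectors; returns one attended vector per token."""
    d = len(X[0])
    out = []
    for q in X:
        # dot-product scores of this token against every input token
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in X]
        w = softmax(scores)        # importance weights over all tokens
        # weighted mix of the (here unprojected) value vectors
        out.append([sum(wi * v[j] for wi, v in zip(w, X)) for j in range(d)])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 toy tokens, 2 dimensions
print(self_attention(X))
```

Because the loop over query tokens has no sequential dependency, the output rows can be computed in parallel, which is exactly what makes Transformer training GPU-friendly compared to the step-by-step recurrence of an RNN.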
For a more complete overview and description of the history of Machine Trans-
lation, we refer to the work of Hutchins (2005b), Kenny (2005), Poibeau (2017) and
Koehn (2020).

10.2.2 Why GPUs?

NMT evolved very quickly, replacing PB-SMT in both academia and industry.
Aside from the paradigm shift, NMT imposed another big change within the field of MT, that of the core processing technology: in particular, the use of GPUs instead of CPUs. Training neural networks (NNs) revolves around the manipulation of large matrices. A general-purpose, high-performance CPU typically contains 16–32 high-frequency cores that can operate in parallel. CPUs are designed for general-purpose, sequential operations. However, the sizes of matrices involved in training
Fig. 10.1 Visual comparison between a CPU and a GPU in terms of cache, control and processing units. Source: https://fabiensanglard.net/cuda/

NNs are far beyond the processing and memory capacities of CPUs; processes that can (theoretically) be parallelised need to be serialised and conducted in sequence, leading to rather high processing times. To perform all matrix operations efficiently, a parallel-processing framework is more suitable. GPUs—as the name clearly indicates—are designed to render and update graphics. They encapsulate thousands of cores with tens of thousands of threads. With their large dedicated memory, GPUs can host both an NN model and training examples, reducing the required memory transfers. The large number of processing cores that can operate in parallel and the dedicated memory make GPUs a much more effective option for training NN models (Raina et al. 2009), including NMT models. See Fig. 10.1 for a visual
comparison between a CPU and a GPU in terms of cache, control and processing
units.
In GPUs, however, many more transistors are dedicated to data processing, rather
than to caching and control flow as in CPUs (Raina et al. 2009). GPUs are in fact
much more power demanding than CPUs, leading to two types of considerations: (1) the physical power requirements of a data centre or a workstation dedicated to training NN models, and (2) the ecological concern related to the production and consumption of electricity to power up and sustain the training of NN (including NMT) models. In this chapter we focus on the second consideration. We
will present actual power and thermal indicators measured during the training of
different NMT models and align them with ecological as well as economical markers
in Sect. 10.5. Then, in Sect. 10.6, we will discuss two approaches to reduce the GPU
power consumption: distribution and parallelisation, and quantization.
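As a preview of the quantization analysis in Sect. 10.6, the sketch below shows generic post-training affine quantization of weights to 8-bit integers. It is a simplified illustration under our own assumptions (a single per-tensor scale, no calibrated activation ranges), not the exact scheme evaluated later.

```python
# Hedged sketch: affine (asymmetric) quantization of float weights to the
# int8 range [0, 255]; real toolkits typically add per-channel scales.

def quantize(weights, bits=8):
    lo, hi = min(weights), max(weights)
    qmax = 2 ** bits - 1
    scale = (hi - lo) / qmax if hi != lo else 1.0
    q = [round((w - lo) / scale) for w in weights]  # integers in [0, qmax]
    return q, scale, lo

def dequantize(q, scale, lo):
    return [qi * scale + lo for qi in q]

w = [-0.51, 0.0, 0.27, 0.98]
q, scale, zero = quantize(w)
w_hat = dequantize(q, scale, zero)
# every weight is recovered to within half a quantization step
assert all(abs(a - b) <= scale / 2 + 1e-12 for a, b in zip(w, w_hat))
```

Integer arithmetic of this kind is what allows quantized models to run efficiently on CPUs, trading a small, bounded rounding error for lower memory traffic and power draw.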

10.2.3 The More, the Better?

Since statistical models (both traditional and deep learning) took over in the field of
NLP, datasets and models grew bigger. Especially since 2018, when BERT (Devlin et al. 2019) and its successors (e.g. GPT-2 (Radford et al. 2019), GPT-3 (Brown et al. 2020) and Turing NLG (Microsoft 2020)) appeared, the size of language models and
the number of parameters grew exponentially. Since the relation between a model’s
performance and its complexity is at best logarithmic (Schwartz et al. 2020),
exponentially larger models are being trained for often small gains in performance.
This exponential growth is illustrated well by the Switch-C, the current largest
language model, introduced in 2021 with a capacity of 1.6 trillion parameters. For comparison, one of its recent predecessors, GPT-3, currently the third5 largest model, introduced in June 2020, had a capacity of ‘only’ 175 billion parameters.6 Similarly, a blogpost by OpenAI (Amodei et al. 2018) demonstrated how compute grew by more than 300,000×. This corresponds to a doubling every 3.4 months.7
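These two figures can be cross-checked with a short computation: a growth of more than 300,000× corresponds to roughly 18 doublings which, at one doubling every 3.4 months, span a period on the order of 5 years (consistent with the 2012–2018 window covered by the blogpost).

```python
import math

doublings = math.log2(300_000)   # ~18.2 doublings for a 300,000x increase
months = doublings * 3.4         # ~61.9 months, i.e. roughly 5 years
print(round(doublings, 1), round(months, 1))  # 18.2 61.9
```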
Many studies have investigated the performance of NMT in terms of adequacy, fluency, errors, data requirements, impact on the translation workflow and language service providers, and bias. The efficiency and energy impact of developing new NMT systems, however, have not yet received the necessary attention. In Sect. 10.3 we present related work in order to properly position our research in the current literature.

10.3 Related Work

The related work is divided into three subsections. In Sect. 10.3.1, we cover the related research papers. In Sect. 10.3.2, we cover the main practical tools that have been proposed to measure the environmental and financial costs of AI models. Finally, in Sect. 10.3.3, we mention some recent initiatives related to sustainable NLP.

10.3.1 Research

Recent work (Strubell et al. 2019; Schwartz et al. 2020) brought to the attention of
the NLP community the environmental (carbon footprint) and financial (hardware
and electricity or cloud compute time) cost of training ‘deep’ NLP models. In
Strubell et al. (2019) the energy consumption (in kilowatts) of different state-of-
the-art NLP models is estimated. With this information, the carbon emissions and
electricity costs of the models can be approximated. Their experiments regarding the cost of training show that training BERT (Devlin et al. 2019) is comparable to a trans-American flight in terms of carbon emissions. They also
quantified the cost of development by studying logs of a multi-task NLP model

5 The second largest model is GShard (Lepikhin et al. 2020), introduced in September 2020, which had a capacity of 600 billion parameters.
6 https://analyticsindiamag.com/open-ai-gpt-3-language-model/.
7 For comparison, Moore’s Law forecasted a doubling every 2 years for the number of transistors in a dense integrated circuit (Amodei et al. 2018).
(Strubell et al. 2018) that received the Best Long Paper award at EMNLP 2018.
The estimated development costs revealed that the most problematic aspects in terms of cost are the tuning process and the full development cycle (due to hyperparameter grid searches),8,9 not the training process of a single model.
They conclude their work with three recommendations for NLP research which
stress the importance of: (1) reporting the time required for (re)training and the
hyperparameters’ sensitivity, (2) the need for equitable access to computational
resources in academia, and (3) the development of efficient hardware and models.
Similar to Strubell et al. (2019), the work by Schwartz et al. (2020) advocates for
‘Green AI’, which is defined as “AI research that is more environmentally friendly
and inclusive” (Schwartz et al. 2020) and is directly opposed to environmentally
unfriendly, expensive and thus exclusive ‘Red AI’. Although the ‘Red AI’ trend has
led to significant improvements for a variety of AI tasks, Schwartz et al. (2020) stress
that there should also be room for other types of contributions that are greener, less
expensive and that allow young researchers and undergraduates to experiment,
research and have the ability to publish high-quality work at top conferences. The
trend of so-called ‘Red AI’, where massive models are trained using huge amounts of resources, can almost be seen as a way of ‘buying’ stronger results, especially given that the relation between the complexity of a model and its performance is at best logarithmic, implying that exponentially larger models are required for linear gains.10
Nevertheless, their analysis of the trends in AI based on papers from top conferences
such as ACL11 and NeurIPS12 reveals that there is a strong tendency within the field
to focus merely on the accuracy (or performance) of the proposed models with very
few papers even mentioning other measures such as speed, model size or efficiency.
They propose making the efficiency of models a key criterion alongside (or integrated with) commonly used metrics. There are multiple ways to measure efficiency,13
floating point operations (FPO) being the one advocated for by Schwartz et al.
(2020). FPO is a metric that has occasionally been used to determine the energy
footprint of models (Molchanov et al. 2016; Vaswani et al. 2017; Gordon et al. 2018;
Veniat and Denoyer 2018) as it estimates “the work performed by a computation
process” (Schwartz et al. 2020) based on two operations ‘ADD’ and ‘MUL’. They
furthermore advocate for reporting a baseline that promotes data-efficient
approaches by plotting accuracy as a function of “computational cost and of training
set size” (Schwartz et al. 2020). Aside from the environmental impact, both recent
papers also stress the importance of making research more inclusive and accessible

8 During tuning, the model is trained from an already existing checkpoint, typically using new data.
9 In the development cycle, different versions of the model are trained or tuned and evaluated. Each of those differs in terms of hyperparameter values.
10 Such models are most commonly developed by large multinational companies that possess the necessary resources.
11 https://acl2018.org.
12 https://nips.cc/Conferences/2018.
13 For a more detailed overview of measures we refer to the paper itself.
and thus advise reviewers of journals and conferences to value certain types of contributions (e.g. based on efficiency) even when they do not beat state-of-the-art results.
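To make the FPO metric concrete, the ADD and MUL operations of a single dense layer y = Wx + b can be counted directly. The layer sizes below are arbitrary illustrations; a full estimate sums such counts over every operation in a model.

```python
def dense_layer_fpo(n_in, n_out, bias=True):
    """Count the floating point operations (MULs + ADDs) of y = Wx (+ b)."""
    muls = n_in * n_out            # one MUL per weight
    adds = (n_in - 1) * n_out      # summing each n_in-term dot product
    if bias:
        adds += n_out              # one ADD per bias term
    return muls + adds

# e.g. a 512 -> 512 projection, a size typical of Transformer sub-layers
print(dense_layer_fpo(512, 512))  # 524288, i.e. ~2 * 512 * 512
```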
While in the context of a different topic, that of NLP leaderboards, and in a critique of how the community’s focus has shifted towards accuracy rather than applicability, the work of Ethayarajh and Jurafsky (2020) points out the importance of efficiency, fairness, robustness and other desiderata for the applicability of NLP models in practice. In fact, in a commercial setting where revenue is the main criterion of
success, efficiency (i.e. of training/updating models, or of inference) is often more
important than accuracy. See, for example, the work of Shterionov et al. (2019)
which compares several systems for quality estimation of MT based on performance
metrics, business metrics and efficiency (training and inference times). This publi-
cation stems from an academic-industry collaboration, based on commercial data
and use-cases.
Another recent, yet heavily debated paper, by Bender et al. (2021),14 focuses on a broader range of problems related to current trends in AI and the risks associated with these trends. Their paper goes beyond merely discussing the environmental impact of large models, as it covers various concerns related specifically to the unfathomable nature of large data sets and their potential consequences (related to e.g. bias, derogatory associations, stereotyping), including for downstream tasks.
The section that focuses specifically on the environmental and financial cost mainly
summarizes findings of previous papers (e.g. Strubell et al. 2019; Schwartz et al.
2020) as well as some recent tools and benchmarks. They also provide a table with
an overview of the 12 most recent, largest language models along with their number
of parameters and the data set sizes, which illustrates well how these massive
language models have grown exponentially over the last 2 years. The paper further-
more points out that the majority of these technologies are constructed for the
already privileged part of society while marginalized populations are the ones
more likely to experience environmental racism (Barsh 1990; Pulido 2016; Bender
et al. 2021).
Both Lacoste et al. (2019) and Henderson et al. (2020) provide tools (see Sect.
10.3.2 for more details) to help researchers report carbon emissions and energy
consumption. Aside from the tools themselves, both papers also provide an expla-
nation of certain factors (such as server location, energy grid, hardware and the
run-time of training procedures) and their impact on the emissions as well as more
practical mitigation guidelines for researchers. The recommendations and best
practices listed in Lacoste et al. (2019) centre around transparency, the choice of
cloud providers, the location of the data centre, a reduction of wasted resources and
the choice of hardware. Henderson et al. (2020) provide additional mitigation and
reporting strategies highlighting how both industry and academia can improve their

14 Among other reasons, due to the fact that it has been considered the centrepiece that led to Google firing leading AI ethics researcher Dr. Timnit Gebru, and subsequently Dr. Margaret Mitchell, two of the authors of the paper.
environmental impact in terms of carbon and energy efficiency. Their mitigation strategies are an extension of those provided in Lacoste et al. (2019) and include:
ensuring the reproducibility of experiments (and thus avoiding replication difficul-
ties), focusing more on energy-efficiency and energy-performance trade-offs, reduc-
ing overheads for utilizing efficient algorithms and selecting efficient test
environments.

10.3.2 Tools

Quantifying the environmental impact of AI technology is a difficult task, which might have prevented researchers from reporting energy consumption and/or carbon emissions. To overcome this problem and to make this information easier to report and
calculate, Lacoste et al. (2019) released an online tool.15 Given the run-time,
hardware and cloud provider, the emission calculator estimates the raw carbon
emissions and the approximate offset carbon emissions of your research. The
cloud provider is of importance since the emissions incurred during training depend
on the energy grid and the location of the training server. The main goal of the tool is to push for transparency within the field of Machine Learning by encouraging researchers to publish the ‘Machine Learning Emissions Calculator’ results in their publications.
More recently, and similar to Lacoste et al. (2019), Henderson et al. (2020)
hypothesized that the complexity of collecting and estimating energy and carbon
metrics has been one of the main bottlenecks for researchers. To encourage
researchers to include such metrics in their work, they present the ‘experiment-impact-tracker’.16 Their framework aims to facilitate “consistent, easy and more accurate reporting of energy, compute and carbon impacts of ML systems”
(Henderson et al. 2020). The authors claim that the assumptions made by previous
estimation methods (Lacoste et al. 2019; Schwartz et al. 2020) lead to significant
inaccuracies, particularly for experiments relying heavily on both GPUs and CPUs.

10.3.3 Workshop

SustaiNLP:17 The first SustaiNLP workshop was held in November 2020, (virtually) co-located with EMNLP2020.18 It specifically focused on efficiency, by encouraging researchers to design solutions that are simpler yet competitive

15 https://mlco2.github.io/impact.
16 https://github.com/Breakend/experiment-impact-tracker.
17 https://sites.google.com/view/sustainlp2020.
18 https://2020.emnlp.org/.
194 D. Shterionov and E. Vanmassenhove

with the state-of-the-art, and on justifiability, by stimulating researchers to provide justifications for their models, thereby encouraging more novel and creative designs.

10.4 Case Study: Empirical Evaluation of MT Systems

In the following three sections we aim to give a realistic picture of the power consumption and environmental footprint, in terms of carbon emissions, related to the usage of GPUs for training and translating with NMT models. To do that, we
train multiple NMT models, using both LSTM and Transformer architectures on
different GPUs. We record the power consumption for each GPU during training as
well as during translation. In Sect. 10.5 we analyse the results and in Sect. 10.6 we
present one possible strategy to reduce power consumption at inference time,
i.e. quantization—the process of approximating a neural network’s parameters by
reducing their precision, thus reducing the size of a model.

10.4.1 Hardware Setup

To assess the energy consumption during the training and translation processes of NMT we trained different NMT models from scratch, i.e. without any pretraining nor relying on additional models (e.g. BERT), on two different workstations equipped with different GPUs. The GPUs we had at our disposal are four NVidia GeForce 1080Ti with 11GB of vRAM and three NVidia Tesla P100 with 16GB of vRAM. These units differ not only in terms of technical specifications, but also in their purpose of use: the 1080Ti is a user-class GPU, developed for desktop machines with active cooling; the P100 is designed for workstations that operate continuously and does not support active cooling.
In Table 10.1, we compare the two GPU types in terms of their specifications.
The rest of the configurations are as follows: (1) for the 1080Ti desktop
workstation—an Intel(R) Core(TM) i7-7820X CPU @ 3.60GHz, RAM: 64GB
(512MB block) (2) for the P100 workstation—CPU: Intel(R) Xeon(R) Gold 6128
CPU @ 3.40GHz, RAM: 196 GB (1GB block).
In this work we focus on assessing the power consumption related to the
utilization of GPUs. As such, in this chapter we will not provide metrics related to
the CPU utilization and the corresponding power consumption. This decision is
motivated by the fact that GPUs are the main processing unit for NMT models.

Table 10.1 An overview of the GPU specifications of 1080Ti and P100


1080Ti P100
CUDA cores 3584 3584
vRAM 11 GB 16 GB
Core clock speed 1481 MHz 1190 MHz
Boost clock 1600 MHz 1329 MHz
Transistor count 11,800 million 15,300 million
Manufacturing process technology 16 nm 16 nm
Power consumption (TDP) 250 W 250 W
Maximum GPU temperature 91 °C 85 °C
Floating-point performance 11,340 gflops 10,609 gflops
Type Desktop Workstation

Table 10.2 Number of parallel sentences for the training, testing and development sets
Lang. pair Train Test Dev
EN-FR/FR-EN 1,467,489 499,487 7723
EN-ES/ES-EN 1,472,203 459,633 5734

10.4.2 Machine Translation Systems

We experimented with the current state-of-the-art data-driven paradigms, RNN (LSTM) and Transformer. We used data from the Europarl corpus (Koehn 2005) for two language pairs, English–French and English–Spanish, in both directions
(EN→FR, FR→EN, EN→ES and ES→EN). Our data is summarised in
Table 10.2. For reproducibility reasons, the model specifications, data, scripts and
our results are made publicly available for comparison on https://github.com/
dimitarsh1/NMT-EcoFootprint.git.
We ought to note that our test set is atypically large. One of the main reasons for this is that a large test set provides measurements over a longer period of time, enabling us to draw more stable conclusions.
For the RNN and Transformer systems we used OpenNMT-py.19 The systems were trained for a maximum of 150K steps, saving an intermediate model every 5000 steps, until convergence according to an early stopping criterion of no improvement in perplexity (scored on the development set) for 5 consecutive intermediate models. The options we used for the neural systems are:
• RNN: size: 512, RNN type: bidirectional LSTM, number of layers of the encoder
and of the decoder: 4, attention type: MLP, dropout: 0.2, batch size: 128, learning
optimizer: Adam (Kingma and Ba 2014) and learning rate: 0.0001.
• Transformer: number of layers: 6, size: 512, transformer_ff: 2048, number of
heads: 8, dropout: 0.1, batch size: 4096, batch type: tokens, learning optimizer
Adam with beta2 = 0.998, learning rate: 2.
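The early stopping rule described above (checkpoints every 5000 steps; stop after 5 consecutive checkpoints without a perplexity improvement) can be sketched as follows; the perplexity values are illustrative, not taken from our experiments:

```python
def should_stop(dev_perplexities, patience=5):
    """Early stopping: stop once the best development-set perplexity
    has not improved for `patience` consecutive checkpoints."""
    best = float("inf")
    since_best = 0
    for ppl in dev_perplexities:
        if ppl < best:
            best, since_best = ppl, 0
        else:
            since_best += 1
            if since_best >= patience:
                return True
    return False

# Checkpoint perplexities recorded every 5000 steps (toy numbers):
# the model plateaus after the fourth checkpoint, so training stops.
print(should_stop([30.1, 24.5, 21.0, 20.8, 20.9, 21.1, 20.85, 20.95, 21.2]))
```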

19 https://opennmt.net/OpenNMT-py/.

Table 10.3 Vocabulary sizes. For completeness we also present the vocabulary size without BPE,
i.e. the number of unique words in the corpora
Lang. pair No BPE With BPE
EN FR/ES EN FR/ES
EN-FR/FR-EN 113,132 131,104 47,628 48,459
EN-ES/ES-EN 113,692 168,195 47,639 49,283

All NMT systems have the learning rate decay enabled and their training is
distributed over 4 nVidia 1080Ti GPUs. The selected settings for the RNN systems
are optimal according to Britz et al. (2017); for the Transformer we use the settings
suggested by the OpenNMT community20 as the optimal ones that lead to quality on
par with the original Transformer work (Vaswani et al. 2017).
We used the same data for training, testing and validation across all systems. To build the vocabularies for the NMT systems we used sub-word units, which allow the models to compose words that do not appear as whole tokens in the training data. Using sub-word units also mitigates, to a certain extent, the out-of-vocabulary problem. To compute the sub-word units we used BPE with 50,000 merge operations for all our data sets. Separate subword vocabularies were used for every language. In Table 10.3 we present the vocabulary sizes of the data used to train our NMT systems.
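The BPE procedure can be illustrated in miniature: repeatedly count adjacent symbol pairs across the vocabulary and merge the most frequent one. The toy corpus and merge count below are illustrative; the actual systems learn 50,000 merges per language:

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn BPE merges: repeatedly merge the most frequent adjacent
    symbol pair in the (word -> frequency) vocabulary."""
    vocab = {tuple(w): f for w, f in Counter(words).items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == (a, b):
                    out.append(a + b)  # apply the learned merge
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges

# Toy corpus: 'low' appears twice, so ('l','o') and then ('lo','w')
# are the most frequent pairs and become the first two merges.
print(learn_bpe(["low", "low", "lower", "newest"], 2))
```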
The quality of our MT systems is evaluated on the test set using standard evaluation metrics: BLEU (Papineni et al. 2002) (as implemented in SacreBLEU (Post 2018)) and TER (Snover et al. 2006) (as implemented in MultEval (Clark et al. 2011)). Our evaluation scores are presented in Table 10.4.
We computed pairwise statistical significance using bootstrap resampling (Koehn
2004) and a 95% confidence interval. The results shown in Table 10.4 are all
statistically significant based on 1000 iterations and samples of 100 sentences. Both metrics show the same performance trend for all language pairs: the Transformer (TRANS) systems outperform the LSTM systems.
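The bootstrap resampling test can be sketched as follows: repeatedly draw samples of sentences with replacement and count how often one system outscores the other. Sentence-level proxy scores are used here for brevity; the real test resamples the test set and recomputes corpus-level BLEU on each sample:

```python
import random

def paired_bootstrap(scores_a, scores_b, iterations=1000, sample_size=100, seed=0):
    """Paired bootstrap resampling (Koehn 2004): repeatedly draw samples
    of sentence indices with replacement and count how often system A's
    sample mean beats system B's."""
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(iterations):
        idx = [rng.randrange(n) for _ in range(sample_size)]
        mean_a = sum(scores_a[i] for i in idx) / sample_size
        mean_b = sum(scores_b[i] for i in idx) / sample_size
        if mean_a > mean_b:
            wins += 1
    return wins / iterations

# Toy sentence-level scores where system A is uniformly 0.05 better;
# its advantage is then significant at the 95% level.
a = [0.40 + 0.01 * (i % 7) for i in range(500)]
b = [0.35 + 0.01 * (i % 7) for i in range(500)]
print(paired_bootstrap(a, b) >= 0.95)
```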

10.4.3 GPU Power Consumption

To measure the consumption of power, memory and core utilisation, as well as the
heat generated during the training and use of an NMT model, several tools can be
exploited, e.g. experiment-impact-tracker21 (Henderson et al. 2020),
“Weights and Biases”,22 as well as NVidia’s device monitoring: nvidia-
smi dmon. We opted for the mainstream NVIDIA System Management Interface program (nvidia-smi),23 which does not require additional installation

20 http://opennmt.net/OpenNMT-py/FAQ.html.
21 https://github.com/Breakend/experiment-impact-tracker.
22 https://wandb.ai.
23 https://developer.download.nvidia.com/compute/DCGM/docs/nvidia-smi-367.38.pdf.
Table 10.4 Quality evaluation scores for our MT systems. TRANS denotes Transformer systems
System English as source English as target
EN→FR EN→ES FR→EN ES→EN
BLEU↑ TER↓ BLEU↑ TER↓ BLEU↑ TER↓ BLEU↑ TER↓
1080Ti LSTM 34.2 50.9 38.2 45.3 34.6 48.2 38.1 44.7
TRANS 37.2 48.7 40.9 43.4 37.0 46.4 41.3 41.4
P100 LSTM 34.1 50.7 37.3 47.0 34.9 48.0 38.5 44.4
TRANS 37.4 48.4 40.9 43.3 37.3 47.0 41.6 42.5

or setup. It is a tool by NVidia for the monitoring and management of the major lines of their GPUs. nvidia-smi is cross-platform and supports all standard NVidia driver-supported Linux distributions as well as 64-bit versions of the Windows
operating system. In this work we used the nvidia-smi dmon command to
monitor all GPUs during the training and inference processes. This command
displays one line of monitoring data per monitoring cycle. The default range of
metrics includes power usage (or power draw—the last measured power draw for the
entire board reported in watts), temperature, SM clocks, memory clocks and utili-
zation values for SM, memory, encoder and decoder. By default we monitor all
GPUs during the training process as training is distributed over all four 1080Ti or
three P100 GPUs. However, at inference time we use only one GPU. As such, we
only monitor and report values for that specific GPU during inference.
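A minimal sketch of extracting the per-cycle power draw from nvidia-smi dmon output; the sample text below mimics the default column layout (gpu, pwr, temperatures, utilization) and should be checked against your driver version, since column sets vary:

```python
def parse_dmon_power(dmon_output):
    """Extract per-cycle power draw (watts) from `nvidia-smi dmon` text.
    Locates the 'pwr' column via the '# gpu pwr ...' header line, so it
    tolerates driver versions with different column sets."""
    power = []
    pwr_col = None
    for line in dmon_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "#" and "pwr" in fields:
            pwr_col = fields.index("pwr") - 1  # drop the leading '#'
            continue
        if fields[0] == "#" or pwr_col is None:
            continue  # skip the units header and anything before it
        power.append(float(fields[pwr_col]))
    return power

# Example dmon-style output (two monitoring cycles, one GPU):
sample = """\
# gpu   pwr gtemp mtemp    sm   mem   enc   dec
# Idx     W     C     C     %     %     %     %
    0   142    78     -    95    60     0     0
    0   145    79     -    96    61     0     0
"""
print(parse_dmon_power(sample))
```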
We ought to note that the experiment-impact-tracker, which internally invokes nvidia-smi, is a much more elaborate tool, designed to ease the collection and analysis of data. However, it relies on Intel’s RAPL interface.24 As part of the Linux kernel, RAPL is read/write protected for normal users and the information it generates is readable by super users only. As such, the experiment-impact-tracker, which utilizes RAPL, would not collect any metrics from the CPU. We found no work-around other than disabling this functionality. That, in turn, resulted in only collecting metrics from nvidia-smi and therefore we did not employ the tool. Nonetheless, along with the resource monitoring, this tool can compute the CO2 emissions based on the compute time and the energy consumed.
The CO2 emissions generated by each experiment are computed based on Eq. 10.1 (Strubell et al. 2019; Henderson et al. 2020):

CO2 emissions = (PUE × kWh × I_CO2) / 1000    (10.1)

where PUE, which stands for Power Usage Effectiveness, defines how efficiently
data centres use energy, i.e. it accounts for the additional energy required to support
the compute infrastructure; kWh are the total kilowatt hours consumed; I CO2 is the
CO2 intensity. The PUE and I CO2 values vary greatly and depend on a large set of
factors. In our computations, similar to Henderson et al. (2020) we use averages
reported on a global or national level. In particular, we use the global average PUE
value reported by Ascierto and Lawrence (2020) of 1.59.25
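As a sketch, Eq. 10.1 can be computed directly from the logged per-second power readings (each one-second reading contributes one watt-second; 3,600,000 watt-seconds make one kWh). The readings below are illustrative:

```python
def kwh_from_readings(watt_readings):
    """Total energy in kWh from per-second power readings (in watts):
    each reading is one watt-second; 3,600,000 Ws = 1 kWh."""
    return sum(watt_readings) / 3_600_000

def co2_kg(kwh, carbon_intensity_g_per_kwh, pue=1.59):
    """Eq. 10.1: CO2 (kg) = PUE * kWh * I_CO2 / 1000,
    with the carbon intensity I_CO2 in grams of CO2 per kWh."""
    return pue * kwh * carbon_intensity_g_per_kwh / 1000

# Illustrative: one GPU drawing a constant 150 W for one hour,
# on the Dutch grid average of ~399.37 gCO2/kWh.
kwh = kwh_from_readings([150] * 3600)
print(round(kwh, 3), round(co2_kg(kwh, 399.3685), 4))
```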
The kWh value is computed as the sum of the power (in watts) drawn per second per GPU, divided by 3,600,000 (the number of watt-seconds in one kilowatt hour). We ought to note that during training on

24 Intel’s Running Average Power Limit or RAPL (Intel 2009) interface exposes power meters and power limits. It also allows power limits on the CPU and the DRAM to be set.
25 It is worth noting that the average PUE value follows a descending trend until 2018, when the average PUE is 1.58 (as reported in the Uptime Institute global data centre survey for 2018) and used in the experiment-impact-tracker tool. However, in 2019 the global average PUE is 1.67, surpassing 2013; in 2020, the value is still high: 1.59 (see p. 10 of the 2020 survey (Ascierto and Lawrence 2020)).

Table 10.5 Train time in hours, number of steps and average train time for one step
System 1080Ti P100
Elapsed time (h) # steps (×1000) time/step Elapsed time (h) # steps (×1000) time/step
LSTM EN-FR 25.08 160 0.16 18.83 145 0.13
EN-ES 28.41 180 0.16 16.66 130 0.13
FR-EN 23.51 145 0.16 13.95 105 0.13
ES-EN 24.38 145 0.17 19.21 145 0.13
TRANS EN-FR 5.22 14.5 0.36 5.06 11 0.46
EN-ES 6.6 19.5 0.34 6.06 13 0.47
FR-EN 6.15 17.5 0.35 4.85 11 0.44
ES-EN 6.36 19 0.33 6.2 13 0.48

the P100 workstation and translation (both quantized and not quantized) on both
workstations, we collected the readings of nvidia-smi every second; for the
training on the 1080Ti workstation, the readings were collected every 5 s. In order to make the comparison more realistic, in the latter case we interpolated the missing values instead of simply averaging over every 5-s interval.26
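The interpolation step can be sketched in pure Python (the chapter used SciPy's interpolate; here a simple linear version with illustrative readings, to stay self-contained):

```python
def interpolate_to_seconds(timestamps, readings):
    """Linearly interpolate sparse (timestamp, watts) readings onto a
    1-second grid, matching the per-second logging used elsewhere."""
    out = []
    for t0, t1, w0, w1 in zip(timestamps, timestamps[1:], readings, readings[1:]):
        for t in range(t0, t1):
            out.append(w0 + (w1 - w0) * (t - t0) / (t1 - t0))
    out.append(readings[-1])
    return out

# Illustrative readings taken every 5 s, expanded to one value per second.
watts = interpolate_to_seconds([0, 5, 10], [100.0, 150.0, 130.0])
print(len(watts), watts[1], watts[7])
```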
The carbon intensity is a measure of how much CO2 emissions are produced per
kilowatt hour of electricity consumed.27 It is a constantly-changing value and as such
we consider an average collected over the first half of 2020. The electricityMap
project28 (Tranberg et al. 2019) collects and distributes elaborate information related
to electricity production and consumption, including carbon intensity per country
and per specific timestamp. Since our research is conducted for academic purposes, ‘electricityMap’ gave us access to historical data from which we calculated the mean CO2 intensity and standard deviation for both Ireland (IE) and the Netherlands (NL), where our workstations are located (the 1080Ti workstation is located in Ireland, the P100 workstation in the Netherlands). The values (in gCO2/kWh) are as follows: IE: 229.8718 ± 77.4026; NL: 399.3685 ± 31.9251.

10.5 Power Consumption and CO2 Footprint

10.5.1 Train and Translation Times

We first present the run times for each experiment and the training step at which the
training was terminated (using early stopping). The run times are shown in
Table 10.5. Although we are using the exact same data, software, hyperparameter

26 We used SciPy’s interpolate: https://docs.scipy.org/doc/scipy/reference/interpolate.html.
27 https://carbonintensity.org.uk/.
28 https://www.electricitymap.org/.

Table 10.6 Translation time in hours
System 1080Ti P100
LSTM EN-FR 1.52 1.84
EN-ES 1.38 1.69
FR-EN 1.48 1.79
ES-EN 1.34 1.62
TRANS EN-FR 2.63 3.01
EN-ES 2.48 2.80
FR-EN 2.47 3.18
ES-EN 2.45 2.69

values29 and random seeds, the training processes on the different hardware deviate
due to differences in the hardware, number of GPUs and the NVidia driver.30
The values in Table 10.5 indicate a longer training time for the LSTM models compared to the TRANS models, on both the 1080Ti and the more performant P100. We furthermore observe an overall longer training time on the 1080Ti. However,
we ought to note that the average time per step for the TRANS models is larger for
the P100. We associate these differences with the number of GPUs on which we
trained the models: 4 for the 1080Ti and 3 for the P100.
In Table 10.6 we present the elapsed time during translation of our test set. Recall
that the test set is intentionally large. The translation is conducted on a single GPU
and no file is translated by two models at the same time in order to avoid any I/O
delays.
Contrary to what might be expected, the experiments show that the translation times on the 1080Ti machine are consistently lower than those on the P100.

10.5.2 Power Consumption and CO2 Emissions

Below we summarise the measurements collected through the nvidia-smi tool and analyse the ecological footprint in terms of CO2 emissions. In Tables 10.7 and 10.8 we present the power consumption and carbon dioxide emissions at train and at translation time, respectively.
The average power draw, as shown in Tables 10.7 and 10.8, is computed as the sum of all readings (in watts) for all GPUs divided by the number of readings. That is, it is a value that does not take the number of GPUs into account. Comparing the
average power draw of LSTM and Transformer (TRANS) models, we observe that
for both workstations (both types of GPUs), LSTMs consume less power (at a time)
than TRANS models, during both training and translation. However, due to the

29 These are not dependent on the number of GPUs.
30 https://pytorch.org/docs/stable/notes/randomness.html.

Table 10.7 Run-time, power draw and CO2 emissions (kg) at train time
System 1080Ti P100
Elapsed time (h) Avg. power (W) kWh CO2 (kg) Elapsed time (h) Avg. power (W) kWh CO2 (kg)
LSTM EN-FR 25.08 142.05 14.07 5.14 ± 1.73 18.83 115.09 6.33 4.02 ± 0.32
EN-ES 28.41 140.88 15.79 5.77 ± 1.94 16.66 113.99 5.54 3.52 ± 0.28
FR-EN 23.51 141.85 13.15 4.81 ± 1.62 13.95 113.48 4.63 2.94 ± 0.24
ES-EN 24.38 139.90 13.44 4.91 ± 1.65 19.21 113.91 6.37 4.04 ± 0.32
TRANS EN-FR 5.22 176.70 3.64 1.33 ± 0.45 5.06 153.47 2.27 1.44 ± 0.12
EN-ES 6.60 176.54 4.60 1.68 ± 0.56 6.06 152.08 2.69 1.71 ± 0.14
FR-EN 6.15 176.64 4.29 1.56 ± 0.53 4.85 151.43 2.15 1.37 ± 0.11
ES-EN 6.36 179.48 4.50 1.64 ± 0.55 6.20 151.59 2.74 1.74 ± 0.14

Table 10.8 Run-time, average power draw and CO2 emissions (kg) at translation time
System 1080Ti P100
Elapsed time (h) Average power (W) kWh CO2 (kg) Elapsed time (h) Average power (W) kWh CO2 (kg)
LSTM EN-FR 1.52 157.80 0.22 0.08 ± 0.03 1.84 90.50 0.16 0.10 ± 0.01
EN-ES 1.38 158.51 0.20 0.07 ± 0.02 1.69 89.06 0.15 0.10 ± 0.01
FR-EN 1.34 153.43 0.19 0.07 ± 0.02 1.79 93.14 0.16 0.10 ± 0.01
ES-EN 1.48 154.98 0.21 0.08 ± 0.03 1.62 89.35 0.14 0.09 ± 0.01
TRANS EN-FR 2.63 188.75 0.45 0.16 ± 0.06 3.01 104.52 0.31 0.20 ± 0.02
EN-ES 2.48 170.02 0.38 0.14 ± 0.05 2.80 102.71 0.28 0.18 ± 0.01
FR-EN 2.47 193.34 0.47 0.17 ± 0.06 3.18 100.93 0.31 0.20 ± 0.02
ES-EN 2.45 175.60 0.42 0.15 ± 0.05 2.69 104.35 0.28 0.18 ± 0.01

considerably longer training time of the LSTM models, their overall power consumption (reported as kWh) is much larger than that of the TRANS models. As such, in our experiments they lead to larger CO2 emissions, as computed by Eq. 10.1 for all 4 1080Ti or 3 P100 GPUs: in the case of the 1080Ti, between 2.99 times (4.91 vs. 1.64 kg) for the ES-EN language pair and 3.86 times (5.14 vs. 1.33 kg) for EN-FR; in the case of the P100, between 2.06 times (3.52 vs. 1.71 kg) for EN-ES and 2.79 times (4.02 vs. 1.44 kg) for EN-FR. These results indicate that for the same data and optimal hyperparameters,31 LSTMs have a larger ecological footprint at train time. However, at translation time the observations are reversed: as suggested by the lower average power consumption of the LSTM models and the longer translation time of the TRANS models, the CO2 emissions associated with the LSTM models are about two times lower than those of the TRANS models (consistently over all language pairs and GPU types). This

31
As recommended in the literature.

would imply that after a certain time of usage, our LSTM models are “greener” than the TRANS models. In particular, for the 1080Ti GPUs, our LSTMs would become “greener” after 10–40 days and, for the P100, after 9–12 days.32
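The break-even point can be computed by dividing the training-energy gap by the difference in power draw at translation time. The numbers below are the EN-FR figures for the 1080Ti taken from Tables 10.7 and 10.8:

```python
def break_even_days(train_kwh_a, translate_watts_a, train_kwh_b, translate_watts_b):
    """Days of continuous translation after which system A (expensive to
    train, cheap to run) has consumed no more total energy than
    system B (cheap to train, expensive to run)."""
    hours = (train_kwh_a - train_kwh_b) * 1000 / (translate_watts_b - translate_watts_a)
    return hours / 24

# EN-FR on the 1080Ti: LSTM trains on 14.07 kWh and translates at
# ~157.80 W; TRANS trains on 3.64 kWh and translates at ~188.75 W.
print(round(break_even_days(14.07, 157.80, 3.64, 188.75), 1))
```

The result, roughly 14 days, falls inside the 10–40 day range reported above for the 1080Ti.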
When we compare the two types of GPUs, we notice that the P100 is much less power-demanding than the 1080Ti, with an average power draw for the LSTM models between 113.48 W (FR-EN) and 115.09 W (EN-FR) versus between 139.90 W (ES-EN) and 142.05 W (EN-FR), and for the TRANS models between 151.43 W (FR-EN) and 153.47 W (EN-FR) versus between 176.54 W (EN-ES) and 179.48 W (ES-EN).33 This is also reflected in the overall power consumption, to which a large contributing factor is the training time, which is much smaller on the P100 workstation.
That is, at train time, the power consumption for the P100 GPU workstation is almost
three times smaller than that of the 1080Ti machine for LSTM models and around
two times smaller for TRANS models. At translation time, while still leading to a
lower power consumption, mainly because of the lower average power draw, the
differences are much smaller.
Special attention needs to be paid to the impact of the national carbon intensity.
For Ireland it is much lower than that of the Netherlands. These differences have a
substantial impact on the carbon emissions: the TRANS models trained on the P100
workstation in the Netherlands have a larger footprint than in the 1080Ti case in
three out of the four cases (EN-FR, EN-ES and ES-EN); at translation time for all
models running on the P100 machine, the footprint is larger than in the case of the
1080Ti.
Analysing the data from Tranberg et al. (2019), we see that the power for Ireland
originates primarily from renewable sources while that is not the case for the
Netherlands. Figure 10.2 illustrates this for the period between 01/01/2020 and
01/06/2020 on a monthly basis.

10.5.3 The Impact

Comparing different types of workstations, different NMT architectures and language pairs, as well as the regional factors related to carbon emissions, gives us an understanding of the conditions under which a certain amount of electricity is consumed and of the factors that we should consider to optimize our NMT-related ecological footprint. To put matters into perspective, we compare our models and the related measurements to common everyday devices. We assess (1) the power draw and (2) the carbon emissions. For the former, we collect the power consumption as

32 This is the time that would be required for the two types of architectures to consume the same amount of power, including training (on the same data) and translation. Since LSTM consumes less power at translation time per time unit (e.g. second), any further operation of both LSTM and TRANS models would lead to an overall lower energy consumption by the LSTM models.
33 At train time.

Fig. 10.2 Power origin distribution (fossil vs. renewable origin) for Ireland and the Netherlands, per month, January–June 2020

indicated by Caruna,34 a Finnish electricity distribution company; for the latter, we gather data from Carbon Footprint,35 a company that specialises in carbon emission assessments, LifeCycle Analysis, environmental strategy/planning, etc., and which indicates estimated usage times (in hours or numbers of uses) for common devices.
Figure 10.3 compares the average power draw of the workstations during training and translation for the different models (marked in light blue)36 to common devices (marked in light orange). The common devices are typical household appliances; we have excluded larger, more power-demanding devices such as an electric shower (8750 W) since our range, for practical reasons, spans to a maximum of 1200 W. From Fig. 10.3, it is clear that the GPUs are less power-demanding than some common household appliances. That is, at a given moment in time, a GPU board draws fewer watts of electricity than some common household appliances (e.g. a microwave, an electric mower, etc.).
However, the usage (or utilisation) time of a GPU is much higher than that of a
microwave and as such the power consumption (computed in kWh) as well as the
carbon emissions are significantly different. When computing the carbon emissions
of a device for a certain period of time, one must take into account the utilisation rate of that device for the given time period, i.e. the ratio of the number of hours (or times) the device is used within the considered time interval to the time it is idle. According to
Nadjaran Toosi et al. (2017) and Doyle and Bashroush (2020), in data centres, the
average utilization rate ranges between 20% and 40%. However, this is not an ideal
scenario for a GPU workstation as the return on investment would be low due to the
high costs of GPUs.37 As an ideal and rather extreme scenario, we assume that in an
industry environment a GPU workstation is utilised 100%. This, in turn, will indicate

34 https://www.caruna.fi.
35 https://www.carbonfootprint.com.
36 These are the same values indicated in Tables 10.7 and 10.8.
37 A modern workstation with 3 x Nvidia Tesla V100 costs approximately €30,000; a workstation with 4 x Nvidia RTX3060 costs approximately €7000.

Fig. 10.3 Average power draw (in watts) of the GPU workstations during training and translation for each model, compared to common household appliances (toasters, microwaves, fridges, laptops, routers, etc.)



the maximum amount of CO2 emissions if GPU workstations of the types we considered are used to train and translate with our models. To compute the impact in terms of CO2 emissions, we first estimate the carbon emission per unit of time as the product of power draw (in watts), utilisation rate38 and carbon intensity (for both Ireland and the Netherlands), and second we extrapolate the CO2 emissions over a year based on the recorded values for the given time period. Our estimates are shown in Fig. 10.4.
Here, however, we ought to note that while at train time each workstation utilizes
all of its GPUs, that is not the case for the translation phase. As such, we advise the
reader to consider these as two different types of measurements, one where all GPUs
are utilized (train time) and one where only one GPU is used (translation time).
From Fig. 10.4 we observe that in 1 year, at 100% utilization, a GPU workstation
that is used to train simple models, such as ours, could produce up to 2500 kg of
CO2.39 This is approximately equal to the CO2 emissions from the electricity
consumption of two small households in the UK.40

10.6 Optimizing at Inference Time Through Model Quantization

In an academic environment, the main use of MT is as an experimentation ground for research. Thus, it is often the case that many NMT models are trained and evaluated on a small test set. However, within an industry environment, the main purpose of an NMT model is to be used to translate as much new content as possible before updating or rebuilding it. From the evaluation summarised in Sect. 10.5 we see that at translation time the P100 workstation uses less energy than a 1080Ti workstation (see Table 10.8); furthermore, at translation time the P100 workstation uses less energy than both workstations at train time (see Table 10.7). And while this is an indication that investing in a higher-end GPU workstation is better (in terms of electricity consumption and carbon emissions), we ought to investigate other means to optimize both power consumption and inference time. In that regard, in this section we investigate how quantizing the model can improve energy consumption and inference time, as well as what the impact is on translation quality.

38 Values are based on data from Carbon Footprint: https://www.carbonfootprint.com.
39 One can also multiply the CO2 emissions at translation time by 4 for the 1080Ti or by 3 for the P100 GPUs to get an indication of how much CO2 would be generated if a workstation (with 4 or 3 GPUs, as in our case) is utilized 100% at translation time. That is, either all GPUs translate using the same model in parallel, or different models are used at the same time for different translation jobs.
40 According to data from https://www.carbonindependent.org.

Fig. 10.4 Estimated yearly CO2 emission (kg) at 100% utilization for each GPU/model configuration (training and translation), in Ireland and the Netherlands, compared to the yearly emissions of common household appliances (fridge-freezers, tumble dryers, ovens, kettles, etc.)



10.6.1 Quantization

DNNs contain a huge amount of parameters (biases, weights) that are adapted during training to reduce the loss. These are typically stored as 32-bit floating point numbers. In every forward pass all these parameters are involved in computing the output of the network. This high precision requires more memory and processing power than, e.g., integer numbers. Quantization is the process of approximating a neural network’s parameters by reducing their precision. A quantized model executes some or all of the operations on tensors with integers rather than floating point values.41 Quantization is a term that encapsulates a broad range of approaches to the aforementioned process: binary quantization (Courbariaux and Bengio 2016), ternary (Lin et al. 2016; Li and Liu 2016), uniform (Jacob et al. 2018) and learned (Zhang et al. 2018), to mention a few. The benefits of quantization are a reduced model size and the option to use highly performant vectorized operations, as well as the efficient use of other hardware platforms.42 However, more efficient quantized models may suffer from worse inference quality. In the field of NMT,
Bhandare et al. (2019) quantize trained Transformer models to a lower precision
(8-bit integers) for inference on Intel® CPUs. They investigate three approaches to
quantize the weights of a Transformer model and achieve a drop in performance of
only 0.35–0.421 BLEU points (from 27.68 originally). Prato et al. (2020) investigate
a uniform quantization for Transformer, quantizing matrix multiplications and
divisions (if both the numerator and denominator are second or higher rank tensors),
i.e. all operations that could improve the inference speed. Their 6-bit quantized
EN-DE Transformer base model is more than 5 times smaller than the baseline and
achieves higher BLEU scores. For EN-FR, the 8-bit quantized model which achieves
the highest performance is almost 4 times smaller than the baseline. With the
exception of the 4-bit fully quantized models and the naive approach, all the rest
show a significant reduction in model size with almost no loss in translation quality
(in terms of BLEU).
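Conceptually, quantization maps each float parameter to a small integer plus a shared scale factor. The following is a toy sketch of symmetric, per-tensor 8-bit quantization, not the actual CTranslate2 or PyTorch implementation:

```python
def quantize_int8(weights):
    """Map float weights to int8 codes in [-127, 127] plus one scale factor."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(codes, scale):
    """Recover approximate float weights from the int8 representation."""
    return [c * scale for c in codes]

weights = [0.31, -1.27, 0.002, 0.85]
codes, scale = quantize_int8(weights)
approx = dequantize(codes, scale)

# Each weight now occupies 8 instead of 32 bits, at the cost of a rounding
# error bounded by half the quantization step (scale / 2).
assert max(abs(w - a) for w, a in zip(weights, approx)) <= scale / 2
```

Real toolkits additionally quantize activations, fold the scales into integer matrix multiplications, and may learn the quantization grid, which is where the speed gains on INT8-capable hardware come from.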
The promising results from the aforementioned works motivated us to investigate
the power consumption of quantized versions of our models running on a GPU. We
used the CTranslate2 tool of OpenNMT.43 As noted in the repository,
“CTranslate2 is a fast inference engine for OpenNMT-py and OpenNMT-tf models
supporting both CPU and GPU execution. The goal is to provide comprehensive
inference features and be the most efficient and cost-effective solution to deploy
standard neural machine translation systems such as Transformer models.”
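For reference, converting a trained OpenNMT-py checkpoint into a quantized CTranslate2 model is a single command. The sketch below assumes a recent CTranslate2 release; the checkpoint and output directory names are illustrative:

```shell
# Convert an OpenNMT-py checkpoint to a CTranslate2 model whose weights
# are stored as 8-bit integers (use int16 for the 16-bit variant).
ct2-opennmt-py-converter \
    --model_path averaged_model.pt \
    --output_dir enfr_ct2_int8 \
    --quantization int8
```

The resulting directory can then be loaded with `ctranslate2.Translator` for batch translation on CPU or GPU.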

41 https://pytorch.org/docs/stable/quantization.html.
42 For example, the second generation Intel® Xeon® Scalable processors incorporate INT8 data type acceleration (Intel® DL Boost Vector Neural Network Instructions (VNNI) (Evarist 2018)), specifically designed to accelerate neural network-related computations (Rodriguez et al. 2018).
43 https://github.com/OpenNMT/CTranslate2.
208 D. Shterionov and E. Vanmassenhove

Table 10.9 Evaluation metrics for quantized and baseline Transformer models. FP32 stands for
32-bit floating point; INT16 and INT8 stand for 16-bit and 8-bit integer. FP32 is the default,
non-quantized model (see Table 10.4)
GPU Prec. EN-FR EN-ES FR-EN ES-EN
BLEU↑ TER↓ BLEU↑ TER↓ BLEU↑ TER↓ BLEU↑ TER↓
1080Ti FP32 37.2 48.7 40.9 43.4 37.0 46.4 41.3 41.4
INT16 36.4 49.5 40.6 44.1 36.6 47.3 40.9 44.0
INT8 36.3 49.5 40.5 44.1 36.6 47.3 40.8 44.1
P100 FP32 37.4 48.4 40.9 43.3 37.3 47.0 41.6 42.5
INT16 36.4 49.1 40.7 44.0 36.5 50.6 40.5 43.8
INT8 36.4 49.2 40.7 44.0 36.5 50.4 40.5 43.8

10.6.2 Quality of Quantized Transformer Models

We quantized our Transformer models to INT8 and INT16 and translated our test
sets on the 1080Ti and P100 workstations. On each workstation, we quantized the
models that were originally trained on that machine. After translation, we computed
BLEU and TER scores in the same way as for our original models. The quality results are
summarised in Table 10.9.

10.6.3 Energy Considerations for Quantized Transformer


Models

We measured the power draw during the translation process with our quantized
models and computed the consumed kWh as well as CO2 emissions per model and
per region. Our results are summarised in Table 10.10.
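The conversion behind these figures is simple: average power draw multiplied by runtime gives the consumed energy, which multiplied by the grid's carbon intensity gives the emissions. A minimal sketch, where the 0.5 kg CO2/kWh intensity is an illustrative constant rather than the per-region electricityMap values used for our tables:

```python
def energy_kwh(avg_power_watts, hours):
    """Average power draw (W) over a runtime (h) -> consumed energy in kWh."""
    return avg_power_watts * hours / 1000

def co2_kg(kwh, intensity_kg_per_kwh):
    """Energy consumed times the grid's carbon intensity -> CO2 in kg."""
    return kwh * intensity_kg_per_kwh

# Illustrative values: a GPU drawing 150 W on average for 2 h of translation,
# on a grid emitting 0.5 kg CO2 per kWh (intensity varies per country and hour).
kwh = energy_kwh(150, 2)      # 0.3 kWh
emissions = co2_kg(kwh, 0.5)  # 0.15 kg CO2
```

In practice the power draw is sampled repeatedly during the run (e.g. via the GPU's monitoring interface) and averaged, and the intensity itself fluctuates with the energy mix, hence the uncertainty intervals reported in Table 10.10.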
Comparing these results to those in Table 10.8 (TRANS), we first notice the
increased translation time for the quantized models running on the 1080Ti machine:
from 2.45 to 4.03 (INT8) and 4.45 (INT16) hours for the fastest ES-EN and from
2.63 to 5.17 (INT16) and 4.66 (INT8) hours for the slowest EN-FR. However, due to the
lower power draw, the overall energy consumption as well as the CO2 emissions at
translation time with these models (on the 1080Ti workstation) are still lower than for
the non-quantized models (on the same workstation).
When comparing the performance of these models on the P100 workstation, we
notice a much lower translation time across all models, even if the difference between
the EN-FR/INT16 model and the non-quantized EN-FR Transformer model is not so
drastic. At the same time, the power draw for all models is lower than for their
non-quantized versions, leading to very low electricity consumption and low carbon
emissions.
We ought to note that the time for quantization, i.e. the process of converting a
non-quantized Transformer model into a quantized one, is very low (between 6 and
12 s on the P100 workstation). Furthermore, quantization is very well suited to

Table 10.10 Run-time, average power draw and CO2 emissions (kg) at translation time for
quantized models
System 1080Ti P100
Elapsed time (h) Avg. power (W) kWh CO2 (kg) Elapsed time (h) Avg. power (W) kWh CO2 (kg)
INT16 EN-FR 5.17 130.61 0.13 0.05 ± 0.02 0.79 81.54 0.01 0.01 ± 0.00
INT8 EN-FR 4.66 115.99 0.11 0.04 ± 0.01 2.16 49.06 0.02 0.01 ± 0.00
INT16 EN-ES 4.45 158.96 0.14 0.05 ± 0.02 0.99 65.40 0.01 0.01 ± 0.00
INT8 EN-ES 4.15 124.40 0.10 0.04 ± 0.01 1.00 68.01 0.01 0.01 ± 0.00
INT16 FR-EN 4.57 139.38 0.13 0.05 ± 0.02 1.28 67.57 0.02 0.01 ± 0.00
INT8 FR-EN 4.39 107.87 0.09 0.03 ± 0.01 1.29 68.39 0.02 0.01 ± 0.00
INT16 ES-EN 4.45 131.66 0.12 0.04 ± 0.01 1.02 68.33 0.01 0.01 ± 0.00
INT8 ES-EN 4.03 117.48 0.09 0.03 ± 0.01 1.04 67.25 0.01 0.01 ± 0.00

inference on CPU. Based on the results in Tables 10.9 and 10.10, the low quantization
time, and the fact that quantized models can easily be run on CPU, we would
recommend quantized models at translation time for large-scale translation projects
in the pursuit of greener MT.

10.7 Conclusions and Future Work

In the era of deep learning, neural models are continuously pushing the boundaries in
NLP, MT included. The ever-growing volumes of data and the advanced, larger
models keep delivering new state-of-the-art results. A facilitator for these results is
the innovation in general-purpose GPU computing, as well as in the hardware itself,
i.e. GPUs. The embarrassingly parallel processing required for deep learning models
is easily distributed over the thousands of processing cores of a GPU, making
training and inference with such models much more efficient than on CPUs. However,
GPUs are much more power-demanding and as such have a higher environmental
impact. In this chapter we discussed considerations related to the power
consumption and ecological footprint, in terms of carbon emissions, associated with
training and inference with MT models.
After briefly presenting the evolution of MT and the shift to GPUs as the core
processing technology of (N)MT, we discussed the related work addressing the
issues of power consumption and environmental footprint of computational models.
We acknowledge that the work of colleague researchers and practitioners, such as
Strubell et al. (2019) and Schwartz et al. (2020), among others, raises awareness
about the environmental footprint that deep learning models have, and we would like
to join them in their appeal towards “greener” AI. This could be achieved through

optimizing models through quantization, as discussed in Sect. 10.6, but also, through
reusability, smarter data selection, knowledge distillation and other techniques.
To outline the realistic dimensions of power consumption and environmental
footprint of NMT, we analysed a number of NMT models, running (training or
inference) on two types of GPUs: a consumer GPU card (NVidia GTX 1080Ti),
designed to work on a desktop machine and a workstation GPU (NVidia Tesla P100)
developed for heavy loads of graphics or neural computing. We reported results for
training both LSTM and Transformer models for English-French and English-
Spanish (and vice-versa) language pairs on data from the Europarl corpus. These
models were trained on approximately 1.5M parallel sentences and used to translate
large test sets that include approximately 500,000 sentences each. Our results and
analysis show that, while a Transformer model is much faster, and as such much more
power-efficient, than an LSTM at training time, at translation time Transformer models
lag behind LSTM models in terms of power consumption, speed and
carbon emissions. We also note that using the more expensive P100 is preferable in
almost every case. An exception is the slightly higher translation time, which,
however, comes with the benefit of a largely reduced power consumption.
Additionally, we note the impact of electricity sources on carbon
emissions by investigating two different countries, each with a different
distribution of fossil and renewable energy sources: Ireland, with a larger
portion of renewable energy, and the Netherlands, with a larger portion of fossil
sources.
Together with the aforementioned contributions, we also aim to motivate
researchers to devote time, effort and investment in developing more ecological
solutions. We are already looking into model reusability, data selection and filtering,
multi-objective optimization of hyperparameters and other approaches that reduce
the environmental footprint of NMT.
Carbon Impact Statement
This work contributed 50.77 ± 11.37 kg of CO2eq to the atmosphere and used
111.55 kWh of electricity.

Acknowledgements We would like to thank electricityMap.org for their responsiveness to our queries and for providing us with valuable information.

References

Amodei D, Hernandez D (2018) AI and compute. OpenAI Blog
Arnold D, Balkan L, Meijer S, Humphreys RL, Sadler L (1994) Machine translation: an introductory guide. NCC Blackwell, London
Ascierto R, Lawrence A (2020) Uptime institute global data center survey 2020 (report)
Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and
translate. In: Proceedings of the 3rd international conference on learning representations (ICLR
2015), San Diego, CA, USA, 15 pp
Barsh R (1990) Indigenous peoples, racism and the environment. Meanjin 49(4):723–731

Bender EM, Gebru T, McMillan-Major A, Shmitchell S (2021) On the dangers of stochastic parrots:
can language models be too big? In: Proceedings of the 2021 ACM conference on fairness,
accountability, and transparency. Association for Computing Machinery, New York
Bentivogli L, Bisazza A, Cettolo M, Federico M (2016) Neural versus phrase-based machine
translation quality: a case study. In: Proceedings of the 2016 conference on empirical methods
in natural language processing, Austin, Texas, pp 257–267
Bhandare A, Sripathi V, Karkada D, Menon V, Choi S, Datta K, Saletore V (2019) Efficient 8-bit
quantization of transformer neural machine language translation model. CoRR, abs/1906.00532
Britz D, Goldie A, Luong MT, Le Q (2017) Massive exploration of neural machine translation
architectures. In: Proceedings of the association for computational linguistics (ACL), Vancouver,
Canada, pp 1442–1451
Brown PF, Cocke J, Della Pietra S, Della Pietra VJ, Jelinek F, Mercer RL, Roossin PS (1988) A
statistical approach to language translation. In: Proceedings of the 12th international conference
on computational linguistics, COLING ’88, Budapest, Hungary, August 22–27, 1988. John von
Neumann Society for Computing Sciences, Budapest, pp 71–76
Brown PF, Cocke J, Della Pietra SA, Della Pietra VJ, Jelinek F, Lafferty J, Mercer RL, Roossin PS
(1990) A statistical approach to machine translation. Comput Linguist 16(2):79–85
Brown TB, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P,
Sastry G, Askell A, et al (2020) Language models are few-shot learners. Preprint.
arXiv:2005.14165
Carl M, Way A (2003) Recent advances in example-based machine translation. Springer,
Cambridge, MA
Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014)
Learning phrase representations using RNN encoder–decoder for statistical machine
translation. In: Proceedings of the 2014 conference on empirical methods in natural language
processing, Doha, Qatar, pp 1724–1734
Clark JH, Dyer C, Lavie A, Smith NA (2011) Better hypothesis testing for statistical machine
translation: Controlling for optimizer instability. In: Proceedings of the 49th annual meeting of
the Association for Computational Linguistics: human language technologies, Portland, Ore-
gon, USA. Association for Computational Linguistics, pp 176–181
Courbariaux M, Bengio Y (2016) Binarynet: Training deep neural networks with weights and
activations constrained to +1 or -1. CoRR, abs/1602.02830
Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional
transformers for language understanding. In: Proceedings of the 2019 conference of the North
American chapter of the Association for Computational Linguistics: human language technol-
ogies, (NAACL-HLT 2019), volume 1 (Long and Short Papers), Minneapolis, Minnesota, USA,
pp 4171–4186
Doyle J, Bashroush R (2020) Case studies for achieving a return on investment with a hardware
refresh in organizations with small data centers. IEEE Trans Sustain Comput:1
Ethayarajh K, Jurafsky D (2020) Utility is in the eye of the user: A critique of NLP leaderboards. In:
Proceedings of the 2020 conference on empirical methods in natural language processing,
EMNLP 2020, Online, November 16–20, 2020. Association for Computational Linguistics,
pp 4846–4853
Evarist F (2018) Efficient 8-bit quantization of transformer neural machine language translation
model
Gordon A, Eban E, Nachum O, Chen B, Wu H, Yang TJ, Choi E (2018) Morphnet: Fast & simple
resource-constrained structure learning of deep networks. In: Proceedings of the IEEE confer-
ence on computer vision and pattern recognition, pp 1586–1595
Henderson P, Hu J, Romoff J, Brunskill E, Jurafsky D, Pineau J (2020) Towards the systematic
reporting of the energy and carbon footprints of machine learning. J Mach Learn Res 21(248):
1–43
Hutchins J (2005a) The history of machine translation in a nutshell. Retrieved 20 December 2009

Hutchins J (2005b) Towards a definition of example-based machine translation. In: Machine
translation summit X, second workshop on example-based machine translation. Citeseer, pp
63–70
Intel IA (2009) Intel architecture software developer’s manual volume 3: System programming
guide
Jacob B, Kligys S, Chen B, Zhu M, Tang M, Howard AG, Adam H, Kalenichenko D (2018)
Quantization and training of neural networks for efficient integer-arithmetic-only inference. In:
2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, Salt Lake City,
UT, USA, June 18–22, 2018. IEEE Computer Society, pp 2704–2713
Kenny D (2005) Parallel corpora and translation studies: old questions, new perspectives?
Reporting that in GEPCOLT: a case study. In: Meaningful texts: the extraction of semantic
information from monolingual and multilingual corpora, pp 154–165
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. CoRR, abs/1412.6980
Koehn P (2004) Statistical significance tests for machine translation evaluation. In: Proceedings of
the 2004 conference on empirical methods in natural language processing (EMNLP2004),
Barcelona, Spain, pp 388–395
Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. In: Proceedings of the
tenth machine translation summit (MT Summit 2005), Phuket, Thailand, pp 79–86
Koehn P (2020) Neural machine translation. Cambridge University Press
Lacoste A, Luccioni A, Schmidt V, Dandres T (2019) Quantifying the carbon emissions of machine
learning. Preprint. arXiv:1910.09700
Lepikhin D, Lee H, Xu Y, Chen D, Firat O, Huang Y, Krikun M, Shazeer N, Chen Z (2020) Gshard:
Scaling giant models with conditional computation and automatic sharding. Preprint.
arXiv:2006.16668
Li F, Liu B (2016) Ternary weight networks. CoRR, abs/1605.04711
Lin Z, Courbariaux M, Memisevic R, Bengio Y (2016) Neural networks with few
multiplications. In: 4th International conference on learning representations, ICLR 2016, San
Juan, Puerto Rico, May 2–4, 2016, Conference Track Proceedings
Luong T, Pham H, Manning CD (2015) Effective approaches to attention-based neural machine
translation. In: Proceedings of the 2015 conference on empirical methods in natural language
processing, EMNLP 2015, Lisbon, Portugal, September 17–21, 2015. The Association for
Computational Linguistics, pp 1412–1421
Microsoft (2020) Turing-NLG: a 17 billion parameter language model by Microsoft. https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/
Molchanov P, Tyree S, Karras T, Aila T, Kautz J (2016) Pruning convolutional neural networks for
resource efficient inference. Preprint. arXiv:1611.06440
Nadjaran Toosi A, Qu C, de Assunção MD, Buyya R (2017) Renewable-aware geographical load
balancing of web applications for sustainable data centers. J Netw Comput Appl 83:155–168
Nagao M (1984) A framework of a mechanical translation between Japanese and English by
analogy principle. In: Proc. of the international NATO symposium on artificial and human
intelligence, USA. Elsevier North-Holland, pp 173–180
Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: A method for automatic evaluation of
machine translation. In Proceedings of the 40th annual meeting on Association for Computa-
tional Linguistics (ACL 2002), Philadelphia, Pennsylvania, USA. Association for Computa-
tional Linguistics, pp 311–318
Poibeau T (2017) Machine translation. MIT Press
Post M (2018) A call for clarity in reporting BLEU scores. In: Proceedings of the third conference
on machine translation: research papers, Belgium, Brussels. Association for Computational
Linguistics, pp 186–191
Prato G, Charlaix E, Rezagholizadeh M (2020) Fully quantized transformer for machine
translation. In: Findings of the Association for Computational Linguistics: EMNLP 2020,
Online. Association for Computational Linguistics, pp 1–14

Pulido L (2016) Flint, environmental racism, and racial capitalism. Capitalism Nat Socialism 27(3):
1–16
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are
unsupervised multitask learners. OpenAI Blog 1(8):9
Raina R, Madhavan A, Ng AY (2009) Large-scale deep unsupervised learning using graphics
processors. In: Proceedings of the 26th annual international conference on machine learning,
ICML ’09, pp 873–880
Rodriguez A, Segal E, Meiri E, Fomenko E, Kim YJ, Shen H, Ziv B (2018) Lower numerical
precision deep learning inference and training. Intel White Paper 3:1–19
Schwartz R, Dodge J, Smith NA, Etzioni O (2020) Green AI. Commun ACM 63:54–63
Shterionov D, Do Carmo F, Moorkens J, Paquin E, Schmidtke D, Groves D, Way A (2019) When
less is more in neural quality estimation of machine translation. An industry case study. In:
Proceedings of machine translation summit XVII volume 2: translator, project and user tracks,
Dublin, Ireland. European Association for Machine Translation, pp 228–235
Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with
targeted human annotation. In Proceedings of the 7th conference of the Association for Machine
Translation of the Americas (AMTA 2006). Visions for the Future of Machine Translation,
Cambridge, Massachusetts, USA, pp 223–231
Strubell E, Verga P, Andor D, Weiss D, McCallum A (2018) Linguistically-informed self-attention
for semantic role labeling. In: Proceedings of the 2018 conference on empirical methods in
natural language processing, Brussels, Belgium. Association for Computational Linguistics, pp
5027–5038
Strubell E, Ganesh A, McCallum A (2019) Energy and policy considerations for deep learning in
NLP. In: Proceedings of the 57th annual meeting of the Association for Computational
Linguistics, Florence, Italy. Association for Computational Linguistics, pp 3645–3650
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In:
Proceedings of advances in neural information processing systems 27: annual conference on
neural information processing systems, Montreal, Quebec, Canada, pp 3104–3112
Tranberg B, Corradi O, Lajoie B, Gibon T, Staffell I, Andresen GB (2019) Real-time carbon
accounting method for the European electricity markets. Energy Strategy Rev
26:100367
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017)
Attention is all you need. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R,
Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, vol 30.
Curran Associates, Red Hook, pp 5998–6008
Veniat T, Denoyer L (2018) Learning time/memory-efficient deep architectures with budgeted
super networks. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp 3492–3500
Zaremba W, Sutskever I, Vinyals O (2014) Recurrent neural network regularization. CoRR,
abs/1409.2329
Zhang D, Yang J, Ye D, Hua G (2018) Lq-nets: Learned quantization for highly accurate and
compact deep neural networks. In: Computer vision - ECCV 2018 - 15th European conference,
Munich, Germany, September 8–14, 2018, Proceedings, Part VIII, volume 11212 of Lecture
Notes in Computer Science. Springer, pp 373–390
Chapter 11
Treating Speech as Personally Identifiable
Information and Its Impact in Machine
Translation
Isabel Trancoso, Francisco Teixeira, Catarina Botelho, and Alberto Abad

Abstract Speech is the most natural and immediate form of communication. It is
ubiquitous. The tremendous progress in language technologies that we have
witnessed in the past few years has led to the use of speech as input/output modality
in a panoply of applications which have been mostly reserved for text until recently.
Machine translation is one of the technologies which traditionally has dealt with text
input and output. However, speech-to-speech translation is no longer a research-only
topic, and one can only anticipate its growing use in our multilingual world. Many of
these applications run on cloud-based platforms that provide remote access to
powerful models, enabling the automation of time-consuming tasks such as docu-
ment translation, or transcribing speech, and helping users to perform everyday tasks
(e.g. voice-based virtual assistants). When a biometric signal such as speech is sent
to a remote server for processing, however, this input signal can be used to determine
information about the user, including his/her preferences, personality traits, mood,
health, political opinions, among other data such as gender, age range, height,
accent, etc. Moreover, information can also be extracted about the recording
environment. Although there is growing societal awareness of user data protection
(the GDPR in Europe is an example), most users of such remote servers are unaware
of the amount of information that can be extracted from a handful of their sentences.
In fact, most users are unaware of the potential for misuse allowed by this new
generation of speech technology systems. For instance, most users do not know how
many sentences in their own voice are necessary for cloning it, nor have they heard
about spoofing speaker recognition systems, or understand to what extent their
recordings can be anonymised. Moreover, most users do not realize that adversarial
techniques now enable the effective injection of hidden commands in spoken
messages without being audible. The recent progress in speech and language
technologies is also reflected in speech-to-speech translation systems, where the
traditional cascade of “speech recognition—machine translation—speech synthesis”

I. Trancoso (✉) · F. Teixeira · C. Botelho · A. Abad


INESC-ID/Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 215
H. Moniz, C. Parra Escartín (eds.), Towards Responsible Machine Translation,
Machine Translation: Technologies and Applications 4,
https://doi.org/10.1007/978-3-031-14689-3_11

is being replaced by end-to-end systems that allow the sentences in the target
language to sound like the voice of the speaker in the source language, opening a
world of possibilities. All these privacy and security issues are becoming more and
more pressing in an era where speech must be legally regarded as PII (Personally
Identifiable Information).

Keywords Speech · Machine translation · Privacy · Anonymisation · Cryptography

11.1 Introduction

The discussion about ethics in AI is an extremely relevant topic that deserves the
attention of all the AI communities, but it is also extremely broad. Within AI, there
are communities in which this discussion is particularly pertinent. Natural Language
Processing (or NLP) is one of them. The term NLP has been mostly used in the past
to cover techniques dealing with text. Nowadays, however, it is increasingly used to
cover the human language in all its forms: written, spoken, gestural. This chapter
focuses on a single modality: speech. Speech is the most natural and immediate form
of communication. It is ubiquitous. The recent progress in speech technologies, due
mostly to advances in deep learning, huge amounts of training data and growing
computing power, has led to the use of speech as an input/output modality in a
panoply of applications which have been mostly reserved for text until recently.
Despite the fact that progress is very much dependent on the amount of training data,
making it very dependent on factors such as age and language/accent, and motivat-
ing huge efforts on training for less resourced languages, spoken language technol-
ogies are widely accepted nowadays. In fact, spoken language technologies, in
particular through their use in voice assistants, are transforming our society in
terms of digital inclusion, allowing an increasingly seamless use of the digital
tools that surround us: our telephone, our television set, or our household appliances.
Because of their complexity, many language applications run on cloud-based
platforms that provide remote access to powerful models in what is commonly
known as Machine Learning as a Service (MLaaS), enabling the automation of
time-consuming tasks such as document translation, or transcribing speech, and
helping users to perform everyday tasks (e.g. voice-based virtual assistants). When
a biometric signal such as speech is sent to a remote server for processing, however,
this input signal reveals, in addition to the meaning of words, much information
about the user, including his/her preferences, personality traits, mood, health, and
political opinions, among other data such as gender, age range, height, accent, etc.
Moreover, the input signal can be also used to extract relevant information about the
user environment, namely background sounds.
In fact, most users are unaware of the potential for misuse allowed by this new
generation of speech technology systems. For instance, most users do not know how
many sentences in their own voice are necessary for cloning it, nor have they heard
about spoofing speaker recognition systems. Moreover, most users do not realize

that adversarial techniques now enable the effective injection of hidden commands
in spoken messages without being audible.
The almost impossible task of reviewing the state of the art in core spoken
language technologies is the topic of the second section of this chapter. In the
third one, we try to raise awareness of their potential misuse.
The fourth section briefly describes efforts towards voice privacy, covering
anonymisation and encryption techniques.
The fifth section focuses on the ethical impact of using speech as an input/output
modality for MT, one of the NLP technologies which traditionally has dealt with text
input and output. However, speech-to-speech machine translation (S2SMT) is no
longer a research-only topic, and one can only anticipate its growing use in our
multilingual world.
The sixth section wraps up, arguing that there is a need for a growing awareness
of the potential for misuse of speech technologies, and simultaneously a need for a
common taxonomy that allows experts in speech technology, cryptography, and law
to clearly define the boundaries of ethical speech processing.
Among several other survey papers that may complement the many topics raised
in this chapter, we strongly recommend the excellent overview of ethics and good
practice in computational paralinguistics in Batliner et al. (2020). For a survey of
approaches to privacy-preserving speech processing, Nautsch et al. (2019b) is
another excellent start. For an in-depth study of profiling humans from their voice,
we also recommend Singh (2019).

11.2 Recent Progress in Speech Technologies

A colleague once complained about reading too many papers including the words
“with the advent of deep learning”, but it is in fact the most adequate start for this
section, which tries to summarize in a couple of pages the recent progress in speech
technologies.

11.2.1 Computational Paralinguistics

Paralinguistics is defined by Laver (1994) as communicative behaviour that is
non-linguistic and non-verbal, but nevertheless coded. Paralinguistic communication
informs the receiver about the speaker’s feelings, attitude or emotional state,
among many other characteristics. The authors in Batliner et al. (2020) adopt a much
broader definition, encompassing other input modalities as well. They show several
use cases of how computational paralinguistic technologies can have good or bad
uses. For instance, health screening can be used for the early detection or monitoring
of speech affecting diseases, or can be misused by insurance companies.

Yet, most users of remote speech technology servers are unaware of the amount
of information that can be mined from their sentences, in particular, about their
health status. In fact, the potential of speech as a biomarker for health has been
realized for diseases affecting respiratory organs, such as the common cold,
Obstructive Sleep Apnea (OSA), or COVID-19, for mood disorders such as Depres-
sion, Anxiety, and Bipolar Disease, and neurodegenerative diseases such as
Parkinson’s (PD), Alzheimer’s (AD), and Huntington’s disease.
For instance, the most common speech disturbances in Parkinson’s Disease are
excess of tremor, reduced loudness, monotonicity, hoarseness, and imprecise artic-
ulation. A second example is OSA. Here, most patients show articulatory anomalies,
phonation anomalies, and abnormal coupling of the vocal tract with the nasal cavity,
which is present even in non-nasal sounds. A third example is depression, where
speech is characterized as dull, monotone, monoloud, lifeless and metallic.
Some of these symptoms are visible in features that may be automatically
extracted from the acoustic signal: prosodic features, voice quality features, and
spectral features (e.g., pitch, energy, resonance frequencies, jitter, harmonicity,
speech rate, pause duration, etc.). Other symptoms may also be visible in the analysis
of the text that is automatically produced by a speech recognition system. For
instance, the analysis of speech of patients with AD may show a decline in content
and fluency, and a higher prevalence of pauses and filler words.
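Two of the simpler acoustic correlates mentioned above, short-time energy (a loudness cue) and zero-crossing rate (a crude voicing/spectral cue), can be sketched with the standard library alone. The decaying synthetic tone below merely stands in for real speech; actual systems use dedicated toolkits and far richer feature sets:

```python
import math

def frames(signal, size, hop):
    """Split a list of samples into overlapping analysis frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def energy(frame):
    """Short-time energy: a correlate of loudness."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)

# Synthetic "speech": a 100 Hz tone sampled at 8 kHz whose amplitude decays,
# loosely mimicking the reduced loudness described for Parkinson's disease.
sr = 8000
signal = [math.exp(-t / sr) * math.sin(2 * math.pi * 100 * t / sr)
          for t in range(sr)]
e = [energy(f) for f in frames(signal, 256, 128)]
zcr = [zero_crossing_rate(f) for f in frames(signal, 256, 128)]
assert e[0] > e[-1]  # energy decays over the utterance
```

Prosodic, voice quality and spectral descriptors used in practice (pitch, jitter, harmonicity, formants, etc.) follow the same frame-by-frame pattern, only with more elaborate signal processing per frame.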
Building classifiers that detect such diseases entails the collection of datasets of
speech from patients and healthy controls, which until very recently was mostly
done in clinical facilities. The experience with COVID-19 revealed how important
remote diagnosis can be, motivating research with data collected in-the-wild. Other
examples of potentially very useful remote diagnosis and monitoring are machine-
assisted depression and suicide risk assessment (Cummins et al. 2015).
In paralinguistic tasks involving speech affecting diseases, data collection may
involve different types of acoustic signals: read and spontaneous speech, sustained
vowels, consonant-vowel syllables, coughs, simulated snoring, etc. The typically small size and limited demographic coverage of datasets for paralinguistic research have been a problem for many years, independently of the physical or psychological trait being detected. In fact, they have limited the use of neural machine learning approaches in many cases. Hence, results for the same task vary significantly among different datasets. For instance, Vasquez et al. (2017) report results ranging from 70.3% to 88.5% (unweighted average recall) using the same method for the task of detecting Parkinson’s disease on datasets of read sentences in three different languages. These limitations have been the main motivation for establishing joint Computational Paralinguistics challenges such as ComParE,1 which has taken place yearly at Interspeech conferences since 2009, covering many different tasks and motivating many teams worldwide to work on common datasets.
These limitations also show that the enormous potential of profiling humans from their voices is still far from being fully explored. This

1 http://www.compare.openaudio.eu/.
11 Treating Speech as Personally Identifiable Information and Its Impact. . . 219

potential is particularly high for clinical applications, going much beyond the typical
speech and language disorders such as stuttering or sigmatism (defective pronunci-
ation of sibilant sounds). Our vision is that collecting speech samples will one day
become as common as a blood test, so that doctors and therapists will be able to
automatically retrieve the results of a detailed analysis of speech features, as well as
global indicators of the presence of speech affecting diseases, that may be used as a
“second opinion”.
Health is an application domain where MT may play a very relevant role. One can
envisage a panoply of applications of MT and in particular speech-to-speech MT in
clinical facilities, to be used by physicians, caretakers and patients. Transposing
speech analysis models trained with speech in one language to another language
could hamper diagnosis based on speech features, although several paralinguistic
features, namely the ones related to voice quality (e.g. jitter), could be considered
language independent.
On the other hand, some prosodic features (such as pitch contours, for instance)
may be strongly language dependent. Their relevance for the extraction of paralin-
guistic traits such as emotion cannot be overemphasized. Hence, transposing such
traits to the synthetic output of speech-to-speech MT is a very challenging research
topic.

11.2.2 Speaker Representation

Much of the recent progress in automatic speaker recognition may be attributed to representation learning, the so-called ‘speaker embeddings’. In many NLP tasks, MT
included, word embeddings (such as Word2vec (Mikolov et al. 2013), GloVe
(Pennington et al. 2014), ELMo (Peters et al. 2018) and BERT (Devlin et al.
2019)) play a major role. The term is used for a representation in which related words tend to cluster near each other in a space of real-valued vectors. Likewise,
speaker embeddings encode the speaker characteristics of an utterance into a fixed-
length vector. The most popular technique for achieving this compact representation,
independent of the utterance length, is currently the x-vector approach (Snyder et al.
2016), a name that was motivated by its precursor, the i-vector approach (Dehak
et al. 2011). The x-vector embeddings are extracted from the hidden layers of deep neural networks trained to distinguish among thousands of speakers. In
fact, this approach has been applied to Voxceleb,2 a multimodal corpus of YouTube
clips that includes over 7000 speakers of multiple ethnicities, accents, occupations
and age groups. A very popular open-domain toolkit for speech processing, Kaldi,3
includes a recipe for training x-vectors using this corpus, reaching impressive equal

2 https://www.robots.ox.ac.uk/~vgg/data/voxceleb/.
3 https://kaldi-asr.org/.

error rates (EER) close to 3%. This metric takes its name from the decision threshold at which the false positive and false negative error rates are equal.
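As a rough illustration, the EER can be estimated from lists of genuine (same-speaker) and impostor trial scores by sweeping the threshold; the function and scores below are hypothetical, not taken from any particular system:

```python
def equal_error_rate(genuine, impostor):
    """Sweep the decision threshold over all observed scores and return
    the rate at the point where false-rejection (FRR) and
    false-acceptance (FAR) are closest. Higher score = same speaker."""
    best_gap, eer = float("inf"), None
    for t in sorted(genuine + impostor):
        frr = sum(s < t for s in genuine) / len(genuine)
        far = sum(s >= t for s in impostor) / len(impostor)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer

# Invented trial scores for illustration only.
genuine = [0.9, 0.8, 0.75, 0.6, 0.55]
impostor = [0.58, 0.5, 0.45, 0.4, 0.3]
print(equal_error_rate(genuine, impostor))  # 0.2 for these toy scores
```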
The weights of such pre-trained networks can be used to initialize models for new datasets and new tasks. In fact, these pre-trained speaker representations have many applications beyond voice
biometrics. They have been successfully applied to numerous speech processing
tasks, including linguistic (e.g. speech recognition and speech synthesis) and para-
linguistic tasks.
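In verification, for instance, a trial can be scored by comparing the enrollment and test embeddings, typically with cosine similarity; the minimal sketch below uses made-up 4-dimensional vectors in place of real x-vectors, and the acceptance threshold is an illustrative assumption:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two fixed-length embedding vectors."""
    def norm(x):
        return math.sqrt(sum(a * a for a in x))
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (norm(u) * norm(v))

# Made-up 4-dimensional "x-vectors"; real ones are typically 512-dimensional.
enroll = [0.2, 0.9, -0.4, 0.1]
trial = [0.25, 0.85, -0.35, 0.05]
accept = cosine_similarity(enroll, trial) > 0.7  # threshold is system-specific
print(accept)  # True
```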
The enormous progress of speaker verification systems has allowed their worldwide deployment in biometric applications, with great impact on fraud prevention in areas such as banking.4

11.2.3 Speech Recognition

The state of the art in automatic speech recognition (ASR) since the 1980s was
predominantly based on the GMM-HMM paradigm (Gaussian Mixture Models—
Hidden Markov Models). By combining these acoustic models, which were fed with
perceptually meaningful features, with additional knowledge sources provided by
n-gram language models and lexical (or pronunciation) models, one could achieve
word error rates (WER) that made ASR systems usable for certain tasks, namely
those involving read speech in clean recording conditions. However, progress was
slow for more than three decades, and robustness remained a major issue. A giant leap
in error rate reduction was achieved during the last decade with the so-called ‘hybrid
paradigm’, that pairs a deep neural network with HMMs. The Kaldi toolkit includes
a recipe for training a DNN-HMM based system on a corpus of read audiobooks
(Librispeech (Panayotov et al. 2015)) with close to 960 h of speech. This recipe
achieves a WER of 3.8%, an unthinkable result a decade ago. Conversational speech is more challenging, with error rates almost triple that result; so are tasks involving, for instance, non-native accents or distant microphones in a meeting room. Nowadays, fully end-to-end architectures are proposed to perform the entire ASR pipeline (Karita et al. 2019), with the exception of feature extraction, but their performance is significantly worse when training data is scarce. Many machine
learning approaches have been proposed recently for improving the ASR perfor-
mance in challenging tasks, such as audio augmentation (Park et al. 2019; Ko et al.
2015), transfer learning (Abad et al. 2020), multi-task learning (Pironkov et al.
2016), etc.
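As a flavour of audio augmentation in the spirit of SpecAugment (Park et al. 2019), random blocks of time frames and frequency channels can be masked out of the input spectrogram; the toy sketch below reproduces that idea on a spectrogram stored as plain Python lists (parameter names and sizes are our own invention):

```python
import random

def mask_spectrogram(spec, max_t=2, max_f=2, seed=None):
    """SpecAugment-style augmentation: zero out one random block of
    consecutive time frames and one block of frequency channels.
    `spec` is a list of frames, each a list of channel energies."""
    rng = random.Random(seed)
    out = [frame[:] for frame in spec]        # work on a copy
    n_frames, n_chans = len(out), len(out[0])
    t0, t_w = rng.randrange(n_frames), rng.randint(1, max_t)
    f0, f_w = rng.randrange(n_chans), rng.randint(1, max_f)
    for t in range(t0, min(t0 + t_w, n_frames)):   # time mask
        out[t] = [0.0] * n_chans
    for frame in out:                              # frequency mask
        for f in range(f0, min(f0 + f_w, n_chans)):
            frame[f] = 0.0
    return out

spec = [[1.0] * 4 for _ in range(6)]   # 6 frames x 4 channels, all ones
augmented = mask_spectrogram(spec, seed=0)
```

Training on such corrupted copies makes the acoustic model less sensitive to missing or noisy regions of the input.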
Of particular interest are the recent unsupervised approaches that leverage speech
representations to segment unlabeled audio and learn a mapping from these repre-
sentations to phonemes via adversarial training. These approaches reach competitive

4 https://www.finextra.com/newsarticle/37989/hsbcs-voice-id-prevents-249-million-of-attempted-fraud.

word error rates on the Librispeech benchmark (5.9%), rivaling systems trained on
labeled data (Baevski et al. 2021).
Due to their high complexity, ASR systems typically run in the cloud. In personal
voice assistants, the task of spotting the wake-up keyword in “always listening” mode is done on the device using much less complex approaches.
Despite recent breakthroughs in ASR, spontaneous speech recognition is still a
very challenging task, representing one of the major sources of errors that together
with MT errors hinder the widespread use of speech-to-speech MT systems.
Although for specific domains enough training data can be collected to mitigate
such errors, users should be made aware of the limitations of current systems.

11.2.4 Speech Synthesis

Up until about 10 years ago, text-to-speech (TTS) systems depended on sophisticated chains of linguistic modules which took text as input and produced a
string of phonemes together with the prosodic information that specified the derived
intonation. These linguistic codes were then fed into a waveform generation module
that produced speech. Most of the commercially available systems from that gener-
ation used a concatenative synthesis module that selected the best segments to join
together from a huge corpus of sentences by a single speaker. Quality was high
enough for many commercial applications, but expressiveness was a major issue,
and the costs of building new synthetic voices were often prohibitive. Statistical parametric speech synthesis (Black et al. 2007) partially overcame these problems,
by predicting the parameters that could control expressiveness (at that time, typically
via regression trees), but that control came at the cost of using a vocoder5 model that,
despite its complexity, could never produce speech that sounded natural enough.
Hybrid approaches (Qian et al. 2013) combining statistical parametric and
concatenative approaches were the state of the art until the late 2010s.
The first breakthrough was the replacement of the traditional vocoder by a neural
vocoder that took as input time-frequency spectrogram representations (van den
Oord et al. 2016). Since then, the whole paradigm has shifted to encoder-decoder
architectures, with attention mechanisms mapping the linguistic time scale to the
acoustic time scale (see for instance Shen et al. 2018, among many other systems).
Moreover, multi-speaker TTS systems are no longer fiction, given the possibility of
leveraging speaker embeddings. In fact, zero-shot multi-speaker TTS can build synthetic voices from only a few seconds of speech, using, for instance, flow-based models (Kim et al. 2020; Casanova et al. 2021).

5 A vocoder (short for voice encoder) is a synthesis system which was initially developed to reproduce human speech.

Blizzard challenges6 have been organized every year since 2005 to jointly
evaluate speech synthesizers built on the same datasets. The synthetic speech quality
is now very close to human speech, reaching values above 4 on a scale of 1–5, the
so-called ‘Mean Opinion Score’ (MOS) scale.
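A MOS is simply the mean of listeners’ 1–5 ratings for a stimulus, usually reported with a confidence interval; a minimal sketch, with invented ratings and a normal approximation (real listening tests use far more raters):

```python
import statistics

def mean_opinion_score(ratings):
    """Mean of 1-5 listener ratings, with an approximate 95% confidence
    interval under a normal approximation (a toy; real tests use many
    more raters and often bootstrap intervals)."""
    m = statistics.mean(ratings)
    half = 1.96 * statistics.stdev(ratings) / len(ratings) ** 0.5
    return m, (m - half, m + half)

score, ci = mean_opinion_score([5, 4, 4, 5, 3, 4, 5, 4])  # invented ratings
print(score)  # 4.25
```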
Due to their high complexity, neural TTS systems typically run in the cloud, but
on-device implementation is a current target for some big industrial players, leverag-
ing the internal deep learning framework of the latest generation of mobile devices.

11.2.5 Voice Conversion

Voice conversion (VC) belongs to the general technical field of speech synthesis, but
instead of converting text to speech it encompasses changing the properties of
speech, for example, voice identity, emotion, and accents. An excellent survey on
VC can be found in Sisman et al. (2021). The impact of deep learning on VC has been so significant that a number of applications of VC that were considered only potential until very recently are now much closer to deployment: personalized
speech synthesis, namely for the speech-impaired community, speaker
de-identification, voice mimicry and disguise, computer-assisted pronunciation
training for second language students, and voice dubbing for movies. Unlike the above-mentioned applications, which are essentially monolingual, voice dubbing involves the much harder task of crosslingual VC.
Traditional VC approaches encompassed three stages: analysis and feature extraction, which decomposes the speech signal of a source speaker into features representing supra-segmental and segmental information; mapping, which changes them towards the target speaker; and reconstruction, which re-synthesizes time-domain speech signals. For many years, one of the hardest problems with these
traditional VC approaches was the overall muffling effect, most probably linked to
the statistical averaging that characterizes the adopted optimization criteria, and to
the low-resolution features that were used for the mapping.
Recent deep learning approaches avoid this pipeline. Reconstruction is done with neural vocoders, which can be trained jointly with the mapping module, and even with the analysis module, to yield an end-to-end solution
(Sisman et al. 2021). The possibility of disentangling speaker information and
linguistic contents is crucial. This has been done notably by using variational auto-
encoder schemes, in which the content encoder learns a latent code from the source
speaker speech, and the speaker encoder learns the speaker embedding from the
target speaker speech. At run-time, the latent code and the speaker embedding are
combined to generate speech. This type of approach was used as one of the baselines
for the most recent Voice Conversion Challenge (Yi et al. 2020). Another baseline
was a cascade of ASR and TTS systems.

6 https://www.synsig.org/index.php/Blizzard_Challenge.

The success of this disentanglement of speaker and linguistic contents has led
researchers to new approaches that try to factor in prosody embeddings or style
embeddings as well. This disentanglement can be particularly relevant in the context
of speech-to-speech translation.
VC systems may be evaluated using several subjective and objective metrics.
Among the subjective tests, MOS (mean opinion score) is one of the most commonly used to measure speech naturalness. Among the objective tests, one
can use speaker recognition and speech recognition tests.

11.3 Privacy and Security Breaches in Speech Technologies

The recent progress in spoken language technologies has motivated a growing number of new applications and, simultaneously, new possibilities for their misuse.
Mining the speech signals captured by always-listening devices is one of the most obvious ones, and yet it is mostly ignored by the growing community of users of personal assistants that operate in this listening mode by default. As in many other AI
applications, users may sense that the system learns from past interactions, but may
not be fully aware of what this learning may imply in terms of privacy. Most users
are not even aware that this “always listening” default mode can be turned off, or that
they have the possibility of issuing commands such as “Delete everything I said
today”, for increased privacy.
Another pressing concern is the use of synthesis and voice conversion technology for the purposes of incrimination, defamation or misinformation, and also for
attacking (or spoofing) speaker verification systems. This type of attack, along with
replay attacks, has raised the need for anti-spoofing countermeasures, the central
theme of the ASVspoof Challenges held since 2015 (Nautsch et al. 2021). As the quality of voice conversion with very little spoken material from a target speaker increases, the need for more sophisticated anti-spoofing countermeasures grows concurrently.
The possibility of hidden voice commands injected into the input signal is another threat. In the past, such commands explicitly took advantage of hardware non-linearities in the analogue signal processing path, exploiting the fact that the
human hearing mechanism could not detect certain signals (e.g. frequencies above a
given threshold). Other approaches obfuscated target signals in what was perceived
as noise by humans.
Nowadays, a major threat in terms of hidden voice commands comes from adversarial attacks. In fact, the recent advances provided by deep learning techniques come with a price: their vulnerability to this type of attack.
Basically, these attacks consist of the perturbation of a classifier’s input at test time
such that the classifier outputs a wrong prediction.
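A minimal sketch of the idea, using a toy linear classifier whose gradient with respect to the input is available in closed form (real attacks backpropagate through a deep network; weights and inputs here are invented):

```python
import math

# Toy linear "classifier": p(target speaker) = sigmoid(w . x).
w = [1.5, -2.0, 0.5]

def predict(x):
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1 / (1 + math.exp(-z))

def fgsm(x, eps=0.3):
    """Fast-gradient-sign perturbation that lowers the score: for a
    linear model the gradient of the score w.r.t. the input is just w,
    so each feature is nudged by eps against sign(w)."""
    return [xi - eps * math.copysign(1.0, wi) for xi, wi in zip(x, w)]

x = [1.0, 0.1, 1.0]                        # invented input features
print(predict(x), predict(fgsm(x)))        # score drops from ~0.86 to ~0.65
```

In speech, the analogous perturbation is added to the waveform or spectrogram, and the state of the art makes it nearly imperceptible to listeners.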
These attacks can affect different speech-based classifiers. They are particularly
relevant when they are aimed at fooling speaker verification systems, pushing the
system to misidentify a speaker or to identify a specific speaker chosen by the
attacker. They can also aim at fooling speech recognition systems which are misled

into outputting a target command. In the past, this type of perturbation was in most
cases perceptible, but the current state of the art in adversarial attacks, namely by
using multi-objective loss functions, showed that one can generate highly impercep-
tible perturbations that are extremely effective in misleading either speaker or speech
recognition systems.
The possibilities for misuse are in fact endless, and this brief review has barely scratched the surface, ignoring several other types of attack that may target speech-based apps.

11.3.1 Growing Awareness

Although most users of speech technologies running on remote servers are not fully aware of the privacy concerns and potential security attacks these technologies entail, privacy awareness in remote speech processing is gradually coming up in the media. The growing awareness that audio sensing can happen anywhere, anytime is illustrated by headlines such as:
• Microsoft workers listen to some translated Skype calls7
• Apple halts practice of contractors listening in to users on Siri8
• Google ordered to halt human review of voice AI recordings over privacy risks9
• Amazon’s Alexa recorded private conversation and sent it to random contact10
• Amazon Echo Dot ad cleared over cat food order11
• LaLiga fined for soccer app’s privacy-violating spy mode12
• An Amazon Echo may be the key to solving a murder case13
• Siri and Alexa could become witnesses against you in court some day14
Privacy concerns not only what a subject says, but also the way the subject says it, the physical and psychological traits of the subject, and the environment in which the utterance was produced. On the other hand, the possibility of using synthetic voices
for fraudulent purposes, or the recreation of the voices of celebrities also generates
great interest, as in:

7 https://www.bbc.com/news/technology-49263260.
8 https://www.theguardian.com/technology/2019/aug/02/apple-halts-practice-of-contractors-listening-in-to-users-on-siri.
9 https://techcrunch.com/2019/08/02/google-ordered-to-halt-human-review-of-voice-ai-recordings-over-privacy-risks/.
10 https://www.theguardian.com/technology/2018/may/24/amazon-alexa-recorded-conversation.
11 https://www.bbc.com/news/business-43044693.
12 https://techcrunch.com/2019/06/12/laliga-fined-280k-for-soccer-apps-privacy-violating-spy-mode.
13 https://techcrunch.com/2016/12/27/an-amazon-echo-may-be-the-key-to-solving-a-murder-case/.
14 https://sdlgbtn.com/news/2016/12/29/siri-and-alexa-could-become-witnesses-against-you-court-some-day.

• Beware: Phone scammers are using this new sci-fi tool to fleece victims15
• An artificial-intelligence first: Voice-mimicking software reportedly used in a
major theft16
• New AI Tech Can Mimic Any Voice17
• The haunting afterlife of Anthony Bourdain18
In fact, social networks went wild over the recent use of TTS techniques to generate the voice of a dead celebrity reading his own emails in a biopic documentary. Terms such as ‘AI lapse’ show that there is a growing awareness that such speech deepfakes may raise ethical issues.

11.4 Voice Privacy

Much progress on voice privacy has been spurred by joint evaluation challenges such as the Voice Privacy challenge held in 2020 (Tomashenko et al. 2020). There are many different approaches targeting voice privacy. This section briefly covers two classes: anonymisation and encryption.
For the sake of space, we leave out deletion methods which target ambient sound
analysis, deleting or obfuscating any overlapping speech, so that no information
about it can be recovered. For recent work on this topic, see for instance Cohen-
Hadria et al. (2019) and Gontier et al. (2020).
We also leave out federated (also called ‘decentralized’ or ‘distributed’) learning
methods, which aim to learn models from distributed data without accessing it
directly (Leroy et al. 2019). This type of method is often combined with differential
privacy.

11.4.1 Anonymisation

Anonymisation approaches try to suppress personally identifiable attributes of the speech signal, leaving all other attributes intact. Anonymised utterances should
sound as if they had been uttered by another speaker. In the above-mentioned
Voice Privacy challenge (Tomashenko et al. 2020), the primary baseline system
was based on anonymising the extracted x-vector of the utterance and resynthesizing
it with a neural source-filter model. The anonymisation was achieved by averaging a

15 https://fortune.com/2021/05/04/voice-cloning-fraud-ai-deepfakes-phone-scams/.
16 https://www.washingtonpost.com/technology/2019/09/04/an-artificial-intelligence-first-voice-mimicking-software-reportedly-used-major-theft/.
17 https://www.scientificamerican.com/article/new-ai-tech-can-mimic-any-voice/.
18 https://www.newyorker.com/culture/annals-of-gastronomy/the-haunting-afterlife-of-anthony-bourdain.

set of vectors among the farthest ones in the x-vector space. This means that the
artificial voice may not correspond to any real speaker.
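A minimal sketch of this x-vector averaging strategy, with tiny 2-dimensional vectors standing in for real embeddings (the pool, dimensionality, and distance choice are illustrative assumptions):

```python
import math

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1 - dot / norm

def anonymise(source, pool, n=2):
    """Replace a speaker's x-vector by the average of the n pool vectors
    farthest from it, yielding a pseudo-speaker identity that should
    match no real speaker in the pool."""
    farthest = sorted(pool, key=lambda v: cosine_distance(source, v))[-n:]
    return [sum(dims) / n for dims in zip(*farthest)]

source = [1.0, 0.0]                       # the vector to anonymise
pool = [[0.9, 0.1], [-1.0, 0.0], [0.0, 1.0], [-0.5, -0.5]]
print(anonymise(source, pool))  # [-0.75, -0.25]
```

The resulting pseudo-vector then conditions a neural source-filter synthesizer so that the resynthesized utterance no longer carries the original speaker identity.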
The challenge used a number of objective metrics such as equal error rate (EER)
of speaker verification and word error rate of speech recognition, as well as subjec-
tive measures. The primary baseline system achieved good anonymisation results in
terms of EER (comparable to chance level), at the cost of a relative degradation in terms of WER, which in some datasets surpassed 60%. However, EER results degraded significantly when the verification system was trained on anonymised data.
Several teams participating in the challenge proposed more elaborate x-vector
modification schemes, showing again the pervasiveness of speaker embedding
approaches.
Most anonymisation solutions aim at suppressing information related to the
identity of the speaker, whilst preserving the information related to the contents of
the spoken message. But anonymisation could also aim at selectively suppressing
certain attributes (e.g. gender/age), a disentanglement problem that may enable a
greater control of privacy levels.
To the best of our knowledge, the topic of anonymisation in the context of speech-to-speech MT is still relatively unexplored, but it certainly deserves much attention.

11.4.2 Encryption

Privacy-preserving speech processing using encryption techniques is an emergent cross-disciplinary topic, as is privacy-preserving machine learning in general. The
main alternatives focus on cryptographic methods that allow two or more parties to
jointly compute a function so that only the final result is disclosed to one or more
parties, while the function’s inputs and intermediate results remain hidden. These
methods include, with varying degrees of privacy: Secure Multiparty Computation
(SMC) techniques, such as Garbled Circuits (GC) (Yao 1986) and Secret Sharing
protocols (Goldreich 1999; Ben-Or et al. 1988); Homomorphic Encryption
(HE) systems (Paillier 1999; Elgamal 1985; Fan and Vercauteren 2012); distance-
preserving hashing techniques such as Secure Binary Embeddings (SBE)
(Boufounos and Rane 2011) and Secure Modular Hashing (SMH) (Jiménez et al.
2015); and hardware-based solutions (e.g. using Intel’s SGX (Brasser et al. 2018)).
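As a flavour of how such protocols keep inputs hidden, the sketch below implements additive secret sharing: a value is split into random shares that individually reveal nothing, yet parties can add their shares locally so that only the final sum is ever reconstructed (a toy, single-process illustration; the modulus is our own choice):

```python
import random

P = 2**61 - 1  # public prime modulus

def share(secret, n=3):
    """Split `secret` into n additive shares summing to it mod P;
    any n-1 shares are uniformly random and reveal nothing."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

a, b = 42, 1000
sa, sb = share(a), share(b)
# Each party adds the shares it holds locally; only the sum is revealed.
sum_shares = [(x + y) % P for x, y in zip(sa, sb)]
print(reconstruct(sum_shares))  # 1042
```

Real SMC protocols distribute the shares across mutually distrusting parties and support multiplication as well, at the cost of extra rounds of interaction.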
One of the main problems in applying HE to speech processing tasks is noise
growth. In this context, ‘noise’ means what is added to a plaintext during encryption,
and removed during decryption. Noise grows with each operation performed on
ciphertexts, particularly multiplications. Care must be taken in order to avoid passing
a threshold above which the ciphertext cannot be decrypted correctly anymore.
Typical approaches involve sending the data back to the client, where it is decrypted (thereby removing all noise), re-encrypted, and sent back to the server. The literature
shows that for complex classifiers hybrid approaches achieve the best results. These
approaches use the most appropriate method for each system component, trading off

computational cost and model privacy against server-client interactions. HE approaches are claimed to be particularly suitable for comparing speaker embeddings (Nautsch et al. 2018).
Distance-preserving hashing techniques have also received considerable attention
in speech, as they are much faster than SMC techniques, and machine learning
algorithms such as Support Vector Machines can be easily adapted to work with
them. The security of these methods, however, depends heavily on how hash keys
are distributed between parties.
The literature shows that cryptographic techniques have been applied to several
tasks in privacy-preserving speech processing. Speaker verification (Pathak et al.
2012; Portêlo et al. 2013; Portêlo et al. 2014; Nautsch et al. 2018) is one of the most
crucial tasks, given the importance of allowing clients and remote servers to perform
authentication (e.g. for banking applications) without clients having access to the server models, and without servers having access to raw audio data, features derived from it, or the corresponding predictions. Another very relevant use of
cryptographic techniques concerns the extraction of paralinguistic information such
as emotions and speech-affecting diseases (Dias et al. 2018; Teixeira et al. 2018;
Teixeira et al. 2019). Other speech tasks include, for instance, query-by-example speech search (Portêlo et al. 2015).
Methods involving the binarisation of templates, models and features are also
worth mentioning, in particular for speaker verification (Mtibaa et al. 2018). Another
interesting approach, targeting ASR, is the one described in Zhang et al. (2019). The
authors propose a deep polynomial network that can be applied to the encrypted
speech as an acoustic model. The remote server does not have access to the raw
audio, only encrypted features, and predictions are also returned in encrypted form.
The authors claim acceptable degradation in WER and latency.
As mentioned above, cryptographic approaches typically involve overheads in computation, communication, and rounds of interaction. This has prevented their deployment in scenarios such as personal voice assistants, showing the many challenges that lie ahead. These challenges will become even more relevant in the
post-quantum computing era.

11.5 Speech-to-Speech Machine Translation

Until very recently, speech-to-speech MT (S2SMT) systems were based on a cascade of ASR, MT and TTS
modules. Depending crucially on the state of the art for each of these modules, their
usability was mostly demonstrated in narrow domains, which was an obstacle to
their commercial deployment. End-to-end approaches have attracted much attention
recently, due to the potential for avoiding the error propagation that is inherent to
cascade approaches. They may also be advantageous in terms of computational
complexity and latency. Such advantages were also realised for speech-to-text
translation systems where end-to-end approaches have had a profound impact
(Bahar et al. 2019; Gangi et al. 2019). In addition, direct models for S2SMT are

naturally capable of retaining paralinguistic and non-linguistic information during translation, e.g. maintaining the source speaker’s voice, emotion, and prosody in the
synthesized translated speech. Moreover, directly conditioning on the input speech
makes it easy to learn to generate fluent pronunciations of words which do not need
to be translated, such as names.
One of the earlier approaches, Translatotron (Jia et al. 2019), is an attention-based
sequence-to-sequence neural network which can directly translate speech from one
language into speech in another language, without an intermediate text representa-
tion. The network is trained end-to-end, learning to map speech spectrograms into
target spectrograms in another language, in the voice of the source speaker. Voice conversion is done by leveraging a speaker encoder separately
trained in a speaker verification task. This approach is particularly worth mentioning,
because its successor, Translatotron 2 (Jia et al. 2021), features a new approach for
retaining the source speaker’s voice in the translated speech. The trained model is
restricted to retain the source speaker’s voice, and unlike its predecessor, is not able
to generate speech in a different speaker’s voice, making the model more robust for
production deployment, by mitigating potential misuse for creating spoofing audio
artifacts. This in fact shows the industry’s awareness of the potential for misuse of speech technologies.
S2SMT systems pose enormous challenges in terms of training because of data
scarcity. However, they also raise significant challenges in terms of evaluation, due
to the many aspects that must be taken into account. In order to use typical measures
of text translation quality such as BLEU scores, pre-trained ASR systems may be
applied to recognize the translated speech. Given the potential ASR errors, the resulting scores may be considered a lower bound on translation quality (Jia et al. 2021). This is typically complemented with MOS listening tests to measure
subjective speech naturalness. When crosslingual voice conversion is adopted,
assessing speaker recognition scores also becomes relevant.
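The sketch below shows a simplified sentence-level BLEU computed on a hypothetical ASR transcript of the translated speech; production evaluations use corpus-level BLEU with standard tooling, and all token sequences here are invented:

```python
import math
from collections import Counter

def sentence_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions times a brevity penalty, with add-epsilon
    smoothing for empty n-gram overlaps."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum((cand & ref).values())           # clipped n-gram matches
        precisions.append(max(overlap, 1e-9) / max(sum(cand.values()), 1))
    if len(candidate) >= len(reference):
        bp = 1.0                                       # brevity penalty
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

reference = "the patient reports severe headaches".split()
asr_hypothesis = "the patient reports headaches".split()   # one ASR deletion
print(sentence_bleu(reference, reference))   # 1.0 for a perfect transcript
print(sentence_bleu(asr_hypothesis, reference) < 1.0)
```

Any recognition error lowers the score even when the translation itself was correct, which is precisely why such scores underestimate the true translation quality.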
The volume of words translated daily using publicly available MT engines reaches several trillion.19 Translation requests come from both individual users and, in an increasing proportion, companies, often to satisfy urgent needs. We anticipate that in the not too distant future, the same sort of statements may cover not only written words but also spoken words. Speech can be
particularly relevant for unwritten languages or dialects that have no standardized
orthography. But for all the other languages, one can also envisage a panoply of
commercial translation applications that use speech as input/output modality instead
of text. When their popularity reaches the level of other current spoken language
technologies, privacy/security breaches must be anticipated as well.
The human-in-the-loop MT paradigm has proved very successful in text-based
systems, namely for solving multilingual communication issues in customer service.
In this type of system, anonymisation strategies are typically applied to the input
text prior to submission to human editors, in order to obfuscate confidential

19 https://blog.sdl.com/blog/The-Issue-of-Data-Security-and-Machine%20Translation.html.

information. The possibility of using the same human-in-the-loop paradigm for speech input/output remains unexplored, although proof-of-concept systems have already been demonstrated (Bernardo et al. 2019). This possibility raises numerous challenges, not only from a technical point of view, but also from the point of view of privacy. A human-in-the-loop paradigm for speech-to-speech MT would address the problem that users cannot check for misrecognition or mistranslation errors in synthetic output that may be spoken in their own voice.

11.6 Conclusions

This chapter has summarized several privacy and security issues potentially raised
by the use of speech technologies when these are accessed on remote servers. Users
of speech-to-speech MT systems must be made aware of these issues at different
levels. On the one hand, their input speech data in L1 may reveal a great deal of
paralinguistic/extra-linguistic information, besides the linguistic content of the
spoken message, which may contain references to entities that one might prefer to
anonymise. This would allow a malicious server to profile the user for different
purposes, such as recommending products and services. On the other hand, the input
utterances themselves can be used to build text-to-speech synthesizers in the
speaker’s voice, which may be misused for impersonation/spoofing attacks. One may
argue that, ideally, the spoken utterance in L2 should preserve all this
information, but this may come at the cost of trusting the remote server.
Achieving a balance between privacy and utility in speech technologies deployed on
remote servers is a difficult goal, and it becomes much harder when such
technologies are combined in complex speech-to-speech MT systems.
Privacy engineering is an interdisciplinary field which is slowly emerging within
speech technologies, making developers conscious that performance indicators
alone are no longer sufficient, and that privacy by design is crucial.
The discussion of all these issues requires joining the forces of different
communities: the speech research community, the cryptography research community,
and the legal community. This is one of the objectives of the recently formed
Special Interest Group (SIG) on “Security and Privacy in Speech Communication”
(SPSC) within the International Speech Communication Association (ISCA). Intended
as an interdisciplinary platform, the SIG fosters exchange between leading
industrial and academic players, with the goal of reaching standards and
procedures that protect the privacy of the individual in speech communication
while providing industry with sufficient means and incentives for future
innovative services. The SIG was very active in the recent discussion promoted by
the European Data Protection Board on Virtual Voice Assistants.20 In particular,
the SIG criticized the use of the

20 https://edpb.europa.eu/our-work-tools/documents/public-consultations/2021/guidelines-022021-virtual-voice-assistants_en.
230 I. Trancoso et al.

term unique identifiability with regard to speech signals, as it does not reflect
the probabilistic nature of speaker identification methods. This probabilistic
nature also raises the need to address uncertainty in decision outcomes and to
limit its impact on decision making.
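This probabilistic nature can be shown with a minimal sketch, under the assumption that speaker verification compares embeddings by cosine similarity against a tunable threshold (random vectors stand in here for real x-vector-style embeddings such as those of Snyder et al. 2016): the system outputs a score and a thresholded decision, never a categorical “unique” identification.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def verify(enrolled, probe, threshold=0.7):
    """A thresholded score, not a certain identification: the threshold
    trades false acceptances against false rejections."""
    score = cosine(enrolled, probe)
    return score, score >= threshold

random.seed(0)  # toy, deterministic "embeddings"
enrolled = [random.gauss(0, 1) for _ in range(64)]
# A probe from the same speaker differs slightly (channel, session effects):
same_speaker = [x + random.gauss(0, 0.1) for x in enrolled]
other_speaker = [random.gauss(0, 1) for _ in range(64)]

s_same, accept_same = verify(enrolled, same_speaker)
s_other, accept_other = verify(enrolled, other_speaker)
```

Because scores for the same speaker and for impostors form overlapping distributions, any threshold leaves residual uncertainty, which is precisely why the decision outcome must be treated probabilistically.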
The differing taxonomies of these communities are probably the first obstacle to
overcome in clearly defining the boundaries of ethical speech processing
(Nautsch et al. 2019a). The GDPR contains few norms that are directly applicable
to inferred data, requiring an extensive interpretation of many of its norms,
with adaptations, to guarantee the effective protection of people’s rights in an
era in which speech must be legally regarded as PII (Personally Identifiable
Information).

Acknowledgements This work was supported by national funds through Fundação para a Ciência
e a Tecnologia (FCT) with references UIDB/50021/2020 and CMU/TIC/0069/2019, and by the
P2020 project MAIA (contract 045909). We would like to thank several colleagues for many
interesting discussions on this topic, namely Bhiksha Raj, Helena Moniz, Filipa Calvão, and
Andreas Nautsch.

References

Abad A, Bell P, Carmantini A, Renals S (2020) Cross-lingual transfer learning for zero-resource
domain adaptation. In: IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP), pp 6909–6913
Baevski A, Hsu WN, Conneau A, Auli M (2021) Unsupervised speech recognition. ArXiv preprint,
2105.11084
Bahar P, Bieschke T, Ney H (2019) A comparative study on end-to-end speech to text translation.
In: IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp 792–
799
Batliner A, Hantke S, Schuller BW (2020) Ethics and good practice in computational paralinguis-
tics. IEEE Trans Affect Comput. Manuscript. Preliminary Version
Ben-Or M, Goldwasser S, Wigderson A (1988) Completeness theorems for non-cryptographic
fault-tolerant distributed computation. In: 20th Annual ACM Symposium on Theory of Com-
puting, pp 1–10
Bernardo L, Giquel M, Quintas S, Dimas P, Moniz H, Trancoso I (2019) Unbabel Talk - human
verified translations for voice instant messaging. In: Interspeech, pp 3691–3692
Black AW, Zen H, Tokuda K (2007) Statistical parametric speech synthesis. In: IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), vol 4, pp IV-1229–IV-1232
Boufounos P, Rane S (2011) Secure binary embeddings for privacy preserving nearest neighbors.
In: IEEE Workshop on Information Forensics and Security (WIFS), pp 1–6
Brasser F, Frassetto T, Riedhammer K, Sadeghi A-R., Schneider T, Weinert C (2018) VoiceGuard:
secure and private speech processing. In: Interspeech, pp 1303–1307
Casanova E, Shulby C, Gölge E, Müller NM, de Oliveira FS, Candido Jr A, da Silva Soares A,
Aluisio SM, Ponti MA (2021) SC-GlowTTS: an efficient zero-shot multi-speaker text-to-speech
model. In: Interspeech, pp 3645–3649
Cohen-Hadria A, Cartwright M, McFee B, Bello JP (2019) Voice anonymization in urban sound
recordings. In: IEEE International Workshop on Machine Learning for Signal Processing
(MLSP), pp 1–6
Cummins N, Scherer S, Krajewski J, Schnieder S, Epps J, Quatieri TF (2015) A review of
depression and suicide risk assessment using speech analysis. Speech Commun 71:10–49
Dehak N, Kenny PJ, Dehak R, Dumouchel P, Ouellet P (2011) Front-end factor analysis for speaker
verification. IEEE Trans Audio Speech Lang Process 19(4):788–798
Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional
transformers for language understanding. ArXiv preprint, 1810.04805
Dias M, Abad A, Trancoso I (2018) Exploring hashing and Cryptonet based approaches for privacy-
preserving speech emotion recognition. In: IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), pp 2057–2061
Elgamal T (1985) A public key cryptosystem and a signature scheme based on discrete logarithms.
IEEE Trans Inf Theory 31(4):469–472
Fan J, Vercauteren F (2012) Somewhat practical fully homomorphic encryption. IACR Cryptology
ePrint Archive, 2012:144. Informal publication
Gangi MAD, Negri M, Turchi M (2019) Adapting transformer to end-to-end spoken language
translation. In: Interspeech, pp 1133–1137
Goldreich O (1999) Secure multi-party computation. Manuscript. Preliminary Version
Gontier F, Lagrange M, Lavandier C, Petiot JF (2020) Privacy aware acoustic scene synthesis using
deep spectral feature inversion. In: IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP), pp 886–890
Jia Y, Weiss RJ, Biadsy F, Macherey W, Johnson M, Chen Z, Wu Y (2019) Direct speech-to-speech
translation with a sequence-to-sequence model. In: Interspeech, pp 1123–1127
Jia Y, Ramanovich MT, Remez T, Pomerantz R (2021) TRANSLATOTRON 2: Robust direct
speech-to-speech translation. ArXiv preprint, 2107.08661
Jiménez A, Raj B, Portêlo J, Trancoso I (2015) Secure modular hashing. In: IEEE International
Workshop on Information Forensics and Security (WIFS), pp 1–6
Karita S, Wang X, Watanabe S, Yoshimura T, Zhang W, Chen N, Hayashi T, Hori T, Inaguma H,
Jiang Z, Someki M, Yalta Soplin NE, Yamamoto R (2019) A comparative study on
transformer vs RNN in speech applications. In: IEEE Workshop on Automatic Speech
Recognition and Understanding (ASRU), pp 449–456
Kim J, Kim S, Kong J, Yoon S (2020) Glow-TTS: A generative flow for text-to-speech via
monotonic alignment search. ArXiv preprint, 2005.11129
Ko T, Peddinti V, Povey D, Khudanpur S (2015) Audio augmentation for speech recognition. In:
Interspeech, pp 3586–3589
Laver J (1994) Principles of phonetics. Cambridge University Press
Leroy D, Coucke A, Lavril T, Gisselbrecht T, Dureau J (2019) Federated learning for keyword
spotting. In: IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP), pp 6341–6345
Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words
and phrases and their compositionality. ArXiv preprint, 1310.4546
Mtibaa A, Petrovska-Delacretaz D, Hamida AB (2018) Cancelable speaker verification system
based on binary Gaussian mixtures. In: 4th International Conference on Advanced Technologies
for Signal and Image Processing (ATSIP), pp 1–6
Nautsch A, Isadskiy S, Kolberg J, Gomez-Barrero M, Busch C (2018) Homomorphic encryption for
speaker recognition: protection of biometric templates and vendor model parameters. In:
Speaker and Language Recognition Workshop (Odyssey), pp 16–23
Nautsch A, Jasserand C, Kindt E, Todisco M, Trancoso I, Evans N (2019a) The GDPR &
speech data: reflections of legal and technology communities, first steps towards a common
understanding. In: Interspeech, pp 3695–3699
Nautsch A, Jiménez A, Treiber A, Kolberg J, Jasserand C, Kindt E, Delgado H, Todisco M, Hmani
MA, Mtibaa A, et al (2019b) Preserving privacy in speaker and speech characterisation. Comput
Speech Lang 58:441–480
Nautsch A, Wang X, Evans N, Kinnunen TH, Vestman V, Todisco M, Delgado H, Sahidullah M,
Yamagishi J, Lee KA (2021) ASVspoof 2019: Spoofing countermeasures for the detection of
synthesized, converted and replayed speech. IEEE Trans Biometr Behav Identity Sci 3(2):
252–265
Paillier P (1999) Public-key cryptosystems based on composite degree residuosity classes. In:
Advances in cryptology, volume 1592 of Lecture Notes in Computer Science, pp 223–238
Panayotov V, Chen G, Povey D, Khudanpur S (2015) Librispeech: an ASR corpus based on public
domain audio books. In: IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP), pp 5206–5210
Park DS, Chan W, Zhang Y, Chiu CC, Zoph B, Cubuk ED, Le QV (2019) SpecAugment: a simple
data augmentation method for automatic speech recognition. In: Interspeech, pp 2613–2617
Pathak M, Portelo J, Raj B, Trancoso I (2012) Privacy-preserving speaker authentication. In:
International Conference on Information Security. Springer, pp 1–22
Pennington J, Socher R, Manning CD (2014) GloVe: Global vectors for word representation. In:
Empirical Methods in Natural Language Processing (EMNLP), pp 1532–1543
Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep
contextualized word representations. ArXiv preprint, 1802.05365
Pironkov G, Dupont S, Dutoit T (2016) Multi-task learning for speech recognition: an overview. In:
ESANN – European Symposium on Artificial Neural Networks, Computational Intelligence and
Machine Learning (ESANN), pp 189–194
Portêlo J, Abad A, Raj B, Trancoso I (2013) Secure binary embeddings of front-end factor analysis
for privacy preserving speaker verification. In: Interspeech, pp 2494–2498
Portêlo J, Raj B, Abad A, Trancoso I (2014) Privacy-preserving speaker verification using garbled
GMMs. In: EUSIPCO, pp 2070–2074
Portêlo J, Abad A, Raj B, Trancoso I (2015) Privacy-preserving query-by-example speech search.
In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp
1797–1801
Qian Y, Soong FK, Yan ZJ (2013) A unified trajectory tiling approach to high quality speech
rendering. IEEE Trans Audio Speech Lang Process 21(2):280–290
Shen J, Pang R, Weiss RJ, Schuster M, Jaitly N, Yang Z, Chen Z, Zhang Y, Wang Y, Skerry-Ryan
R, Saurous RA, Agiomvrgiannakis Y, Wu Y (2018) Natural TTS synthesis by conditioning
WaveNet on mel spectrogram predictions. In: IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), pp 4779–4783
Singh R (2019) Profiling humans from their voice. Springer
Sisman B, Yamagishi J, King S, Li H (2021) An overview of voice conversion and its challenges:
from statistical modeling to deep learning. IEEE/ACM Trans Audio Speech Lang Process 29:
132–157
Snyder D, Ghahremani P, Povey D, Garcia-Romero D, Carmiel Y, Khudanpur S (2016) Deep
neural network-based speaker embeddings for end-to-end speaker verification. In: IEEE Spoken
Language Technology Workshop (SLT), pp 165–170
Teixeira F, Abad A, Trancoso I (2018) Patient privacy in paralinguistic tasks. In: Interspeech, pp
3428–3432
Teixeira F, Abad A, Trancoso I (2019) Privacy-preserving paralinguistic tasks. In: International
Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 6575–6579
Tomashenko N, Srivastava BML, Wang X, Vincent E, Nautsch A, Yamagishi J, Evans N, Patino J,
Bonastre JF, Noé PG, Todisco M (2020) Introducing the VoicePrivacy initiative. In:
Interspeech, pp 1693–1697
van den Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior
AW, Kavukcuoglu K (2016) WaveNet: A generative model for raw audio. CoRR,
abs/1609.03499
Vasquez J, Orozco JR, Noeth E (2017) Convolutional neural network to model articulation
impairments in patients with Parkinson’s disease. In: Interspeech, pp 314–318
Yao AC (1986) How to generate and exchange secrets. In: 27th Annual Symposium on Foundations
of Computer Science (SFCS), pp 162–167
Yi Z, Huang WC, Tian X, Yamagishi J, Das RK, Kinnunen T, Ling Z, Toda T (2020) Voice
conversion challenge 2020—intralingual semi-parallel and cross-lingual voice conversion. In:
Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge, pp 909–910
Zhang SX, Gong Y, Yu D (2019) Encrypted speech recognition using deep polynomial networks.
In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp
5691–5695
