YouTube’s Automated Subtitling from English into Arabic: A Case Study of Harry Potter and the Prisoner of Azkaban
ARTICLE
ABSTRACT
Recent advances in speech-to-text technology, together with machine translation, have made it possible to translate video captions into other languages simultaneously. YouTube, a video-sharing platform, offers multilingual subtitles using this feature. The current automated caption system captures audio data during video uploads and generates a subtitle file in text format. The current study examines whether YouTube machine translation from English into Arabic reliably renders the intended meaning in subtitling, based on the FAR model (functional equivalence, acceptability, and readability). The data consisted of 30 examples taken from the YouTube platform and their Arabic versions produced by YouTube’s machine translation. The study is both descriptive and comparative. The results indicate that YouTube machine translation produces varying levels of inadequate translation, attributable to its system and database, and reveals many deficiencies. The total approval rate is 68.5%, which gives the impression that the translation is very poor. Therefore, the system needs further development, and its databases, specifically the Arabic ones, need enrichment.
Keywords: Machine Translation; Subtitling; FAR Model; YouTube; Harry Potter
*CORRESPONDING AUTHOR:
Ahmad Mohammad Al-Harahsheh, Translation Department, Faculty of Arts, Yarmouk University, Irbid, Jordan; Email: [email protected]
ARTICLE INFO
Received: 1 December 2024 | Revised: 10 January 2025 | Accepted: 13 January 2025 | Published Online: 18 February 2025
DOI: https://doi.org/10.30564/fls.v7i2.8163
CITATION
Al-Harahsheh, A.M., Rababah, R.H., 2025. YouTube’s Automated Subtitling from English into Arabic: A Case Study of Harry Potter and the
Prisoner of Azkaban. Forum for Linguistic Studies. 7(2): 583–602. DOI: https://doi.org/10.30564/fls.v7i2.8163
COPYRIGHT
Copyright © 2025 by the author(s). Published by Bilingual Publishing Group. This is an open access article under the Creative Commons
Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License (https://creativecommons.org/licenses/by-nc/4.0/).
Forum for Linguistic Studies | Volume 07 | Issue 02 | February 2025
present study may have the potential to assist researchers engaged in the use of artificial intelligence within the domain of translation.

2. Literature Review

The translation industry is influenced by the integration of Artificial Intelligence (AI), which has led to the development of various software applications, databases, corpora, and machine translation systems. However, the quality of the translations or subtitles provided by machine translation and AI is still doubtful because they contain linguistic and cultural mistakes. Yao [3] investigates the quality of the automated subtitles created by the NetEaseSight platform. The study used a selection of the top 20 TED speeches posted on the NetEaseSight platform to develop machine-translated subtitles. He investigates the accuracy of voice recognition and cut scores, using the FAR approach to assess the efficacy of machine-generated Chinese subtitles. The findings indicate that there is a need to enhance the accuracy of machine translation engines. During the training phase of the machine translation engine, low-frequency and uncommon phrases are removed in order to reduce the complexity of the module and save storage space. There is also a need to enhance the accuracy of word translations. Certain terms occasionally do not appear in standard dictionaries but are used to name things such as people, locations, establishments, and registered brands, as well as to denote temporal references, numerical values, and new words. In addition, although the translation exhibits a certain level of coherence, the overall readability remains a significant concern [3].

Besides, there is a need to enhance the accuracy of speech recognition systems and to optimize the segmentation process. Despite significant advancements in the technological maturity of speech recognition, one hundred percent accuracy remains unachievable. Speech interaction is impacted by several factors, including background noise and speech pace, leading to significant variations in recognition rates across different scenarios. One limitation of speech recognition is its inability to modify text based on contextual cues. Moreover, inadequate semantic understanding is a major obstacle in this field. Yao [3] suggests that in order to address this issue, it is essential to enhance the algorithm and acquire a substantial volume of dependable data for algorithmic training. These measures are necessary to facilitate the algorithm’s progress toward a certain degree of complexity. Nevertheless, it is important to recognize that machine-translated subtitles, although not yet meeting the necessary standards for direct market use, do exhibit a certain level of accuracy, devoid of grammatical errors or omissions. Furthermore, the automated generation of timetables not only saves time for subtitlers but also improves overall efficiency.

In their research, Hagström and Pedersen [4] undertook a diachronic analysis of subtitles, examining the changes that occurred before and after the integration of machine translation into the translation process. They conducted a comparative analysis of a corpus of Swedish subtitles derived from Anglophone TV programs made after the implementation of machine translation and a corpus of subtitles from the pre-machine translation era. The study aimed to examine whether there were differences in the quality of subtitles generated in the 2020s compared to those produced in the 2010s. They adapted the FAR approach, which encompasses an analysis of three distinct dimensions of quality from the viewers’ perspectives, namely functional equivalence, acceptability, and readability. The findings indicated that the post-edited subtitles generated in the 2020s had certain characteristics when evaluated based on established standards and the FAR model. Specifically, these subtitles were observed to be faster, less cohesive, more oral, and less complete, with less meticulous punctuation and line breaks, compared to the subtitles created in the 2010s. The items examined exhibited notably lower quality across all assessed areas.

In the same line, Karakanta [5] concentrates on automated and PE-based assessments of automatic subtitling. Initially, she evaluated automatic subtitling in terms of technological advancements, assessment methods, and empirical studies. Secondly, she emphasized existing shortcomings and aspects that require more attention to fully understand and enhance automation in subtitling through the application
of effective approaches utilizing advancements in both MT and AVT. She analyzed publications that provided at least one form of experimental design, using automated and/or human evaluation. She also noted that, while there had been studies undertaking experimental research on automated processes for interlingual subtitling, the transition from source to target language in interlingual subtitling introduces a further level of complexity to the approach and evaluation. Therefore, she only examined works utilizing MT/ST in interlingual subtitling, including both automated and human assessments. Furthermore, she addressed the emerging paradigm of automatic subtitling, which presents additional obstacles manifested in a variable number of segments, necessitating auto-spotting and segmentation, as well as the disentanglement of variables.

Karakanta proposes a set of suggestions intended to keep experimental designs for studying automatic subtitling from running into problems. The main recommendations were that research in automatic subtitling should encompass all aspects of subtitling, enhanced interfaces, adherence to reporting standards, and provision of test data and benchmarks, and that assessment should be independent of generation. In conclusion, Karakanta [5] asserts that her selection does not diminish the necessity for perception studies, which will enhance the understanding of experimental research and gain significance as technological quality advances. Consequently, standardization and harmonization are deemed essential for the prosperous future of the AVT and MT industries.

In Varga’s [6] study, the primary focus was on the fundamental framework of automatic subtitle systems. The study examines nine online subtitling platforms, with a particular emphasis on the analysis of their features. Of the nine services, only five provide free automated subtitles, each varying in quality. The same video clip was used to evaluate these platforms, and their results were examined using both quantitative and qualitative analyses to emphasize the main characteristics of each site. To comprehensively evaluate their competencies, the researcher chose the opening sequence of Quentin Tarantino’s film Reservoir Dogs as the video clip.

The empirical data highlight many types of errors, including missing text, text coherence, speaker recognition, text layout, spelling problems, and punctuation issues. These categories of errors provide a thorough understanding of the current capabilities of automated transcription technology. The study suggests that these systems lack autonomy and rely on expert intervention to achieve optimal transcription quality. The findings indicate that the online apps for MT need adequate training and calibration. The text segmenters face challenges related to spatial and temporal constraints that are unique to the field of subtitling.

Matusov et al. [7] provide a comprehensive description of the process by which a state-of-the-art Neural Machine Translation (NMT) system may be successfully tailored for the purpose of subtitling. They put forward a straightforward approach to include inter-sentence context in the translation of brief utterances and dialog turns. They also modified the NMT system to accommodate linguistic diversity, namely Latin American Spanish, as well as subtitling style and domain. They present a unique approach for the segmentation of subtitles that integrates a recurrent neural network model with both hard and soft restrictions on subtitle length and duration inside a beam search framework. A comprehensive assessment, both automated and human-based, was conducted to assess the quality of the modified machine translation output when segmented into subtitles using the suggested method. The results of this evaluation show significant improvements compared to the baseline MT system output, which used line breaks based on heuristics. This quality enhancement resulted in significant improvements in productivity and time efficiency when the modified machine translation output was post-edited by impartial professional translators. These improvements were seen in comparison to the processes of translating from scratch and post-editing the translations generated by the original MT system.

Song et al. [8] put forward an innovative method for sentence segmentation that involves the use of deep neural networks to automatically create period marks. The primary objective of this technique is to enhance the precision of the automated translation of YouTube subtitles. The study introduces a new method for phrase segmentation that utilizes neural networks and YouTube scripts, and is less dependent on word order and sentence structure. The performance of this strategy was measured. They constructed the input in a manner that closely resembles YouTube scripts and tried to identify punctuation marks based only on textual characteristics. For this investigation, they used a total of 27,826 subtitles extracted from the online courses offered by Stanford University. They used Long Short-Term Memory (LSTM), a type of Recurrent Neural Network (RNN) and a very effective technique in the field of natural language processing, to construct a model using available data. This model is then utilized to make predictions about the placement of punctuation marks. The LSTM model has shown promise for its applicability in the restoration of punctuation in voice transcripts. This approach involves the integration of textual elements and the length of pauses. Although RNNs have shown commendable performance over a range of input durations, they have compromised some of these advantages by aligning the data length to that of YouTube subtitles. An attempt was made to forecast the occurrence of periods between consecutive words. The experiment included measuring the accuracy of the approach, which was found to be 70.84%.

In their study on automatic translations, Gupta et al. [9] identify and provide explanations for the challenges encountered. The researchers categorize each difficulty into three distinct categories. First, “the problems directly related to textual translation”. Secondly, “problems related to subtitle creation guidelines”. Lastly, “problems due to adaptability of MT engines” [9]. The researchers determine the frequency of occurrence of 16 significant issues in the automatic translation of subtitles from English to six specific target languages, namely German, Chinese (simplified), French, Castilian Spanish, Arabic, and Brazilian Portuguese. The experiment was conducted using a dataset consisting of 56 movie subtitle files, with a cumulative count of 17,977 subtitle blocks. The English subtitles were produced by humans, while the target subtitles were created using a machine translation system that was trained using a specific methodology.

The findings indicate that certain difficulties are present across a majority of languages. The primary issue in all languages, with the exception of Chinese (simplified), is the high level of paraphrasing errors. However, there are some issues that are peculiar to certain languages and hence need specialist solutions. In comparison to other languages, German translation has a greater prevalence of issues concerning structure errors and word order errors. Non-text character translation is often seen in Chinese and Arabic translations. Word structure errors ranked as the second most significant challenge in French. Lexical translation posed a considerable challenge for German, Spanish, and Arabic.

Hiraoka [10] conducted a study on effective pre-editing rules for subtitling TED Talks using neural machine translation. The study seeks to formulate and evaluate a set of straightforward, efficient pre-editing rules for audiovisual materials, including TED Talk subtitling, to translate Japanese source text into English, utilizing an NMT engine created by the National Institute of Information and Communications Technology (NICT) in Japan.

Pre-editing is classified into two methods: bilingual pre-editing and monolingual pre-editing. Bilingual pre-editing enables the pre-editor to modify the source text while referencing the MT outcome, in contrast to monolingual pre-editing, which does not permit this. Thus, monolingual pre-editing needs no proficiency in the target language. This study concentrates on monolingual pre-editing, since Hiraoka aims to empower content producers or those with limited proficiency in the target language to pre-edit source texts in their native language for content dissemination. The efficacy of the pre-editing rules was assessed based on the enhancement of MT output quality, considering the 21-character-per-second (CPS) constraint. Given that the translation aim is TED subtitling, it is essential to consider character limitations.

The assessment results indicated that, in comparison to the MT output of the raw source text, the MT output of the pre-edited source text showed a quality enhancement in the average scores of both human evaluation and BLEU. The overall percentage of subtitle segments that contributed to a score gain is 41%. Despite the observed score declines in the pre-edited MT, the majority of parts remained over the ‘Acceptable’ level on the human evaluation scale. Besides translation quality, the study also investigated the character limitations of subtitling and confirmed that the instances of segments in both raw MT and pre-edited MT outputs that violate the 21-character-per-second guideline established by TED were almost nonexistent. Therefore, it is determined that pre-editing according to the prior rules does not hinder
compliance with the 21-CPS requirement.

Athanasiadi [11] investigates the potential of MT and other linguistic assistive technologies in subtitling. This study looks at the lack of commercial subtitling software that includes linguistic assistive tools. The goals are to find out what programs are already on the market, what their limitations are, and whether customers want these tools to be added. Quantitative research was done through an online questionnaire using Google Forms, incorporating both structured (multiple choice) and unstructured (open-ended) questions to get robust results.

The study developed a model of a fully automated MT engine for subtitling, intending to illustrate the optimal functioning of such a system. This concept is predicated on an SMT engine rather than a rule-based or hybrid engine. The model segments the engine’s processing into three phases. The initial phase entails preparing the corpus and integrating it into the system to build the engine. The second phase, ST editing, has a voice recognition component together with text condensation and segmentation elements. The third phase, referred to as ST refinement, entails the automatic modification of the script to reduce post-editing effort. The voice recognition system automatically initiates the translation of previously detected and transcribed subtitles upon completion of all steps. The translated screenplay incorporates the timecodes from the transcribed subtitles, providing the subtitler with two .srt files and one .txt file. One .srt file has timecoded SL subtitles, whereas the second .srt file contains timecoded TL subtitles. The .txt file contains the SL script devoid of timecodes, serving as a reference for post-editing.

The previous MT model was developed as a fully automated machine translation system to emphasize the advantages that a machine translation engine for subtitling offers to subtitlers, particularly regarding time efficiency. Nonetheless, the efficacy of such an engine can only be assessed through deployment. The questionnaire findings indicated a strong preference for TM components in subtitling software over all other alternatives. This indicates that TM tools are desired by subtitlers, and their absence may be seen as a significant oversight in the evolution of subtitling software. The questionnaire’s findings indicated that TBs are the respondents’ second preference for integration into a subtitling system with a TM component. The primary conclusion is that traditional subtitling software is gradually evolving into online, accessible, and adaptable applications, ushering in a new era of subtitling.

3. Methods and Procedures

3.1. Data Collection

The study focuses on the comparison of the source text and the machine translation output for subtitling on the YouTube platform. The data of the study were selected from Harry Potter and the Prisoner of Azkaban; they consisted of 30 examples that were taken from the YouTube platform and their translated versions into Arabic using YouTube’s machine translation. These examples were purposefully chosen because they contained certain errors that affect the fluency of subtitling and, therefore, the process of understanding. The Harry Potter series comprises a collection of seven fantasy books written by the renowned British novelist J. K. Rowling. The literary works document the experiences of a young wizard named Harry Potter, together with his friends Hermione Granger and Ron Weasley, who are enrolled as students at the esteemed school known as Hogwarts School of Witchcraft and Wizardry. The primary narrative trajectory centers on the protagonist Harry’s confrontation with Lord Voldemort, a malevolent wizard who seeks immortality, aims to topple the ruling institution of wizards called the Ministry of Magic, and seeks to dominate both wizards and Muggles (people without magical abilities).

3.2. Data Analysis

The data were analyzed based on the FAR model (Functional Equivalence, Acceptability, and Readability) suggested by Pedersen [12]. Moreover, the data were analyzed to explore whether YouTube machine translation is reliable in rendering the intended meaning in film subtitling. The concept of quality in translation is a multifaceted issue, and it becomes much more complex when applied to subtitling. Subtitling quality is often evaluated based on internal criteria [12]. The proposed model serves as a comprehensive framework for evaluating the overall quality of pre-existing interlingual subtitles. Its applicability has been seen in the assessment of quality in both fansubs and professional subtitles. Furthermore, it has been seamlessly incorporated into the quality assessment process of the Trados subtitling unit.
The FAR model incorporates the correlation between interlingual subtitles and the ultimate consumer (the viewer). The fundamental unit of evaluation under the FAR model is the subtitle itself, rather than, for instance, the word, phrase, or minute of airtime. This is due to the fact that subtitling entails the linguistic compression of information, and the level of conversation intensity may range significantly across various shows. The FAR approach examines three categories of quality as perceived by viewers: functional equivalence, acceptability, and readability. The concept of functional equivalence centers on the communication or significance conveyed in the original text and the extent to which it has been accurately conveyed in the target language. The concept of acceptability involves evaluating whether the linguistic standards of the specific language being targeted have been followed. Finally, readability refers to the capability of the audience to effectively read the subtitles and comprehend the conveyed material.

A suggested penalty point system is included, along with methods for identifying faults and categorizing their severity as intersubjectively as feasible for each of the FAR categories. This gives the users the ability to evaluate each subtitled text from these three angles. The penalty point system facilitates the identification of problematic areas in a subtitle’s text. Consequently, it may be employed to offer subtitlers constructive feedback, which could be beneficial in an educational setting. The mistake classifications and scores are imported from the NER model, which are “minor,” “standard,” and “serious”.

The rationale behind using the FAR model is that it is functional and easy to apply. In addition, it allows the assessor of subtitling to easily recognize the mistakes in translation, such as providing the functional or equivalent term of the SL in the TL, and to decide whether this term is readable and acceptable for the target readers or not. Therefore, it is easy to spot and evaluate the error on screen and to suggest an alternative translation.

4. Findings and Discussion

4.1. Quantitative Analysis

Thirty examples were analyzed to investigate the machine subtitling translation in three categories: functional equivalence, acceptability, and readability. We should mention that the researcher analyzed the scenes of the movie from the videos that are available on the YouTube platform, which are estimated to be an hour and a half from the movie.

Table 1 shows that the highest approval rate was in functional equivalence with 111.8%, which means that the machine has a serious problem at the semantic level; the score of serious errors is considered very high, so the viewer certainly won’t understand the scenes, and the standard errors are also high, which confuses the viewer’s comprehension. Untranslated utterances are estimated at 34.3% of standard errors. Acceptability follows with a 34% approval rate, which seems like a bad percentage and means that the YMT has an issue with Arabic norms and made the text foreign to Arab viewers. After that, readability had a 30.9% approval rate. The total approval rate is 68.5%, which gives the impression that the translation is very poor. The study used the FAR model, and the researcher assessed the quality of the examples by assigning penalty points: minor: 0.25 points, standard: 0.5 points, and serious: 1 point for all categories except semantic errors, which are: minor: 0.5 points, standard: 1 point, and serious: 2 points, according to Pedersen [12]. The findings reveal that the YMT was not successful in most examples.

Table 1. Number of Errors, Error Scores, and Approval Rates of the data.

Category                     Number of Errors   Error Score
Functional equivalence       201                224.75
  Semantics errors           196                222.5
    - Serious errors         40                 80
    - Standard errors        129                129
    - Minor errors           27                 13.5
  Stylistic errors           5                  2.25
    - Standard errors        4                  2
    - Minor errors           1                  0.25
  Approval rate: 111.8%
Acceptability                72                 24.5
  Grammar errors             67                 22.75
    - Standard errors        24                 12
    - Minor errors           43                 10.75
  Spelling errors            3                  0.75
    - Minor errors           3                  0.75
  Idiomaticity errors        2                  1
    - Standard errors        2                  1
  Approval rate: 34%
Readability                  166                51.25
  Segmentation errors        29                 12.75
    - Serious errors         4                  4
    - Standard errors        10                 5
    - Minor errors           15                 3.75
Table 1. Cont.

Category                     Number of Errors   Error Score
  Punctuation & graphics     137                38.5
    - Serious errors         1                  1
    - Standard errors        14                 7
    - Minor errors           122                30.5
  Approval rate: 30.9%
Total                        439                300.5
Total approval rate: 68.5%

4.2. Qualitative Analysis

The study explores the subtitling techniques of Harry Potter and the Prisoner of Azkaban used by the YouTube machine translation. These examples were analyzed according to the FAR model (functional equivalence, acceptability, and readability). The examples were classified and analyzed depending on the type of error.

4.2.1. Functional Equivalence

According to Pedersen [12], functional equivalence, also known as dynamic equivalence or meaning-based translation, is a translation technique whereby the translator attempts to convey the intended meaning and thought of the reader in the source language, rather than focusing only on the literal words and structures used. Ideally, a subtitle should effectively communicate both the explicit content and the underlying intention. If there is a failure to accurately convey both the literal content and the intended meaning, it would result in a clear mistake. If the intended meaning is accurately delivered without any additional information, this should not be considered a mistake. Instead, it is a common technique in subtitling and may be preferable to providing a word-for-word translation. If just the literal words said or written are considered without taking into account the intended meaning behind them, this would also be considered a mistake since it might lead to misinterpretation or confusion. However, since YMT is a speech-to-text engine, we will focus on whether it conveys the literal meaning in a way that the viewers will get the idea. There are two types of equivalence errors, namely semantic and stylistic.

(1) Semantic Errors

In consideration of the significance of semantic equivalence in interlingual subtitling and the users’ presumed reduced error tolerance, the penalty points for semantic equivalence are as follows: minor: 0.5, standard: 1, and serious: 2.

Serious Errors

A serious semantic equivalence error refers to a subtitle that contains such substantial inaccuracies that it makes the viewers’ comprehension of the subtitle completely ineffective. This error not only hinders the viewers’ understanding of subsequent subtitles but also has the potential to cause misunderstandings in the plot or disturb the overall illusionary experience for more than a single subtitle.
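The penalty weights stated above and the counts in Table 1 can be cross-checked with a short script. The sketch below is illustrative only: the names TABLE_1, error_score, and summarize are hypothetical, and the rate formula (error score divided by number of errors, expressed as a percentage) is an assumption inferred from the figures in Table 1 rather than a definition given in the text. Under that assumption, the computed values agree with the reported rates of 111.8%, 34%, 30.9%, and 68.5%.

```python
# Penalty weights per severity, as stated in the text (after Pedersen [12]).
SEMANTIC_WEIGHTS = {"minor": 0.5, "standard": 1.0, "serious": 2.0}
DEFAULT_WEIGHTS = {"minor": 0.25, "standard": 0.5, "serious": 1.0}

# Error counts copied from Table 1: area -> error type -> severity -> count.
TABLE_1 = {
    "Functional equivalence": {
        "Semantics": {"serious": 40, "standard": 129, "minor": 27},
        "Stylistic": {"standard": 4, "minor": 1},
    },
    "Acceptability": {
        "Grammar": {"standard": 24, "minor": 43},
        "Spelling": {"minor": 3},
        "Idiomaticity": {"standard": 2},
    },
    "Readability": {
        "Segmentation": {"serious": 4, "standard": 10, "minor": 15},
        "Punctuation & graphics": {"serious": 1, "standard": 14, "minor": 122},
    },
}


def error_score(error_type, counts):
    """Weighted penalty score for one error type (semantic errors weigh double)."""
    weights = SEMANTIC_WEIGHTS if error_type == "Semantics" else DEFAULT_WEIGHTS
    return sum(weights[severity] * n for severity, n in counts.items())


def summarize(table):
    """Return {area: (n_errors, score, rate_percent)} plus a 'Total' entry.

    ASSUMPTION: rate = 100 * score / n_errors, inferred from Table 1.
    """
    summary = {}
    total_n, total_score = 0, 0.0
    for area, types in table.items():
        n = sum(sum(counts.values()) for counts in types.values())
        score = sum(error_score(t, counts) for t, counts in types.items())
        summary[area] = (n, score, round(100 * score / n, 1))
        total_n += n
        total_score += score
    summary["Total"] = (total_n, total_score, round(100 * total_score / total_n, 1))
    return summary


if __name__ == "__main__":
    for area, (n, score, rate) in summarize(TABLE_1).items():
        print(f"{area}: {n} errors, score {score}, rate {rate}%")
```

Running the script prints one line per FAR area and one for the total, which makes it easy to see how the doubled semantic weights push the functional equivalence figure above 100%.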
Example (1)
ST: Give me the cup. Oh, my dear boy. My dear...you have the Grim.
TT: أوه أوه أنا هل تجرؤ على الحصول على خبز كريمي،
(Back translation: Oh, Oh, I, do you dare to get a creamy bread,)
In example (1), the dialogue is between Professor Trelawney and Harry Potter in the Divination class. The students are learning tasseomancy, the art of reading tea leaves, to get a glimpse of the future. Therefore, the professor takes Harry’s cup to tell him what she sees in it. She is shocked by what he has, and with great sadness, she tells him that he has the “Grim,” which takes the form of a giant spectral dog. It is among the darkest omens in the world, and it is an omen of death. YMT translates the word “Grim” as “الخبز الكريمي” (creamy bread), which is a far cry from the intended meaning in the ST (see Figure 1). Thus, the absence of a connection in the translation causes an inadequate understanding of the meaning by the viewers. Furthermore, the machine omitted the first sentence, in which the professor asks Harry to give her the cup. Also, it adds the question particle “هل” (do), and translates the words “My dear boy. My dear” into “تجرؤ” (dare), which is a mistake that conveys very different information. This error is serious since it hampers comprehension of both the individual sentence and the broader context. The occurrence of this error might perhaps be attributed to the fact that the machine did not comprehend the British accent.
Suggested translation: أعطني الكأس. أوه! يا طفلي العزيز! عزيزي، لديك الغريم. (BT: Give me the cup. Oh! My dear child! My dear, you have a Grim.).
Standard Errors
A standard semantic equivalence error may be defined as a subtitle that includes errors but remains relevant to the intended meaning and does not significantly hamper the comprehension of the viewers beyond that particular subtitle.
Example (2)
ST: Who is that? Who is? That is Sirius Black, that is. Don’t tell me you’ve never been hearing of Sirius
Black. He’s a murderer. Got himself locked up in Azkaban for it.
TT: ذلك الرجل الذي هو ذلك الأسود الجاد الذي لا تخبرني أنك لم تكن هنا أبدًا بظهر أسود خطير، إنه قاتل حصل على نفسه تم حبسه في أسكابان بسبب ذلك،
(BT: This man who is the serious black that you didn’t tell me that you were here at all in a black dangerous
back, he is a killer got himself, he was imprisoned in Askaban because of this)
In example (2), Harry sees a picture of a man in the newspaper, so he asks Stan Shunpike, who is holding the
newspaper, about him. Stan, surprised that Harry does not know him, tells him that the man is Sirius Black, a
murderer who was locked up in Azkaban prison. In this scene, YMT uses word-for-word translation. It renders
“Sirius Black” in two ways, “الأسود الجاد” (serious black) and “أسود خطير” (dangerous black); both are clearly
wrong and may cause misunderstanding. Moreover, the machine mistranslated the word “hearing” as “بظهر” (back),
which gave the sentence a different meaning from that in the ST (See Figure 2). Furthermore, the translation does
not use punctuation correctly, especially question marks, which would improve the quality of the translation and
make it more acceptable. This may be considered a standard error since it changes the sentence’s semantic meaning
without changing the viewer’s comprehension of the preceding information.
Suggested translation: من هذا؟ من هو؟ هذا سيريوس بلاك. لا تخبرني أنك لم تسمع عن سيريوس بلاك! أنه قاتل، تم حبسه في أزكابان بسبب ذلك.
(BT: Who is this? Who is he? This is Sirius Black. Don’t tell me you haven’t heard of Sirius Black! He’s a
murderer, he was imprisoned in Azkaban because of it.)
(2) Untranslated
According to Pedersen [12], when important statements within the narrative are left uninterpreted, the examples
will be classified as standard semantic errors.
Example (3)
ST: A cat? Is that what they told you? Looks like a pig with hair.
TT: أخبرتك حقًا أنك تبدو مثل خنزير بشعر إذا سألتني،
(BT: I told you that you look like a hairy pig if you ask me.)
In example (3), Ron is talking with Hermione about their pets. He hates her cat because it is always chasing his
rat. YMT did not translate “a cat?”, the key phrase in the dialogue, and translated the remaining words
incorrectly, so viewers will conclude that Ron is directing all of this at Hermione; the whole translation is
therefore inaccurate (See Figure 3). Apart from that, the machine did not use the correct punctuation, and the
untranslated word caused incorrect segmentation. The standard error of the untranslated word also prompted a
serious error in translating the rest of the sentence.
Suggested translation: قطة؟ هل هذا ما أخبروك به؟ تبدو مثل خنزير مغطى بالشعر. (BT: A cat? Is this what they told
you? It looks like a pig covered in hair.)
Minor Errors
Minor functional equivalence errors refer to lexical flaws that primarily involve terminology and do not have a
significant impact on the overall narrative of the film.
Example (4)
ST: But where is it? I saw the beast, just now. Not a moment ago!
TT: فأين هي، لقد رأيت البنجر الآن وليس للحظة قبل ذلك،
(BT: Where is it, I saw the beet now and not before a moment.)
In example (4), after the Ministry sentenced Buckbeak, a kind of bird that belonged to Hagrid, a friend of Harry,
to death, the minister went to Hagrid’s house to carry out the ruling. At that moment, while the minister was busy
talking with Professor Dumbledore, Harry and Hermione succeeded in smuggling the bird away. YMT translates the
word “beast” as “البنجر,” which means “beets” in Arabic, and that may cause a slight misunderstanding for the
viewers (See Figure 4). Moreover, the machine uses the wrong punctuation in this segment; it changes the question
mark into a comma and deletes the full stop, which affects the segment and the translation. The error can be
considered minor because it is a lexical error that has no bearing on the broader plot of the film.
Suggested translation: ولكن أين هو؟ لقد رأيت الوحش للتو، قبل لحظة! (BT: But where is he? I saw the beast just
now, not a moment ago.)
(3) Stylistic Errors
Stylistic errors are comparatively less consequential than semantic errors since they mostly result in
inconveniences rather than misunderstandings. Examples include using the wrong terms of address, speaking in the
wrong register (either too high or too low), or using language in a way that does not follow the rules set by the
original context, such as using modern language in historical films [12].
Example (6)
ST: Ernie! They’re right on top of us! Mind your head.
TT: إيه الحق فوقنا، اهتم برأسك
(BT: Yeah, the right is above us, take care of your head)
In example (6), Harry is on a wizard bus that is taking him to the Leaky Cauldron in London. On the way, they
face the double-decker bus, so the driver’s assistant alerts the driver that it is directly in front of them and
says, “Ernie, they’re right on top of us!” YMT translates this word for word as “إيه الحق فوقنا,” which conveys
neither what was said nor the intended meaning (See Figure 6). Also, the machine did not catch the driver’s name
clearly; it renders it as “إيه,” a word in colloquial Arabic, especially Egyptian, that expresses wonder or a
question, so viewers may read the subtitle that way. The machine shifted from standard Arabic into colloquial
Arabic, which caused a stylistic misunderstanding and changed the whole meaning of the sentence. The error is
classified as a standard error because it affects the sentence’s meaning, not the whole scene.
Suggested translation: آيرني! أنهن أمامنا مباشرة! انتبه لرأسك. (BT: Ernie! They are right in front of us! Watch
your head!)
4.2.2. Acceptability
Acceptability refers to the degree to which the target text adheres to the norms of the target language. The
mistakes in this category are those that make the subtitles seem foreign or otherwise odd. These errors also break
the contract of illusion as they direct attention to the subtitles. These mistakes are of three kinds: grammar
errors, spelling errors, and errors of idiomaticity [12].
(4) Grammar Errors
These cases are simply instances of grammatical errors in the target language, appearing in different forms.
Nevertheless, it is important to note that the grammar used in subtitling is specifically tailored to the target
language. Pedersen [12] points out that subtitling might be seen as a hybrid manifestation of both spoken and
written language, suggesting that a rigid adherence to the grammatical norms of written language may be
inappropriate. The presence of a serious grammatical mistake in the subtitle hinders its readability and/or
comprehension. Minor errors, such as the misuse of ‘whom’ in the English language, are considered the pet peeves
that cause discomfort to purists. Standard errors are located in between.
Example (7)
ST: Before I fainted, I heard something. A woman screaming.
TT: قبل أن أغمي عليه، سمعت شيئًا امرأة تصرخ
(BT: Before he fainted, I heard something, a woman screaming)
In example (7), Harry is talking with Professor Lupin, describing how he felt before he fainted because of the
Dementors. YMT translates it well and conveys the intended meaning. However, the machine translates “I fainted”
as “أغمي عليه,” a grammatical mistake that changes the agent, as if it were not Harry who fainted but someone else
(See Figure 7). The error is classified as a standard one.
Suggested translation: قبل أن يُغمى عليّ، سمعت شيئًا ما. امرأة تصرخ. (BT: Before I fainted, I heard something, a
woman screaming.)
Example (8)
ST: Why don’t you run along and play with your chemistry set!
TT: لماذا لا تركضان وتلعبان في مجموعة الكيمياء الخاصة بك،
(BT: Why don’t you (dual/two people) run and play in your chemistry set!)
In example (8), Sirius tells Professor Snape to leave the place. YMT conveys the intended meaning correctly;
however, it translates the sentence as if he were talking to two people. It is a minor error that does not affect
the overall meaning (See Figure 8). The error may be considered standard.
Suggested translation: لماذا لا تركض وتلعب في مجموعة الكيمياء الخاصة بك! (BT: Why don’t you run and play in your
chemistry set!)
(5) Spelling Errors
Spelling refers to the accurate arrangement of letters in a given word. Spelling errors may be classified by
severity as follows: minor errors include all spelling mistakes, standard errors alter the intended meaning of a
word, and serious errors render a word illegible.
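The three severity levels for spelling errors can be read as a simple decision rule. The Python sketch below is only an illustration of that rule, not a procedure used in this study: the function name `classify_spelling_error` and its two boolean parameters are hypothetical, and it assumes that a human rater has already judged whether the misspelling alters the meaning of the word or makes it illegible.

```python
def classify_spelling_error(alters_meaning: bool, illegible: bool) -> str:
    """Return a severity label for one spelling error.

    Both flags are judgments made by a human rater (an assumption of this
    sketch); the function only encodes the order in which they are checked.
    """
    if illegible:
        # The word cannot be read at all.
        return "serious"
    if alters_meaning:
        # The word is readable but now means something else.
        return "standard"
    # Any other spelling mistake.
    return "minor"


# A misspelled word that viewers can still read and understand.
print(classify_spelling_error(alters_meaning=False, illegible=False))
```

Checking illegibility first reflects the idea that an unreadable word is the most severe case regardless of what it was meant to say.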
Example (9)
ST: This term, we’ll focus on Tasseomancy, the art of reading tea leaves.
TT: هذا الفصل الدراسي الذي سنركز فيه على tezo MEMC وهو فن قراءة أوراق الشاي
(BT: In this semester, we’ll focus on tezo MEMC, the art of reading tea leaves.)
In example (9), the word “tasseomancy” is the subject of the course that Professor Trelawney has started teaching
her students. YMT did not translate it but rendered it as a loan word with incorrect spelling. Errors of this kind
do not affect the meaning of the scene, especially when the word is the name of a thing, a place, etc., although
they may occasionally bother the viewers (See Figure 9). Thus, the error is considered a minor error.
Suggested translation: هذا الفصل، سنركز على التاسيومانسي؛ وهو فن قراءة أوراق الشاي. (BT: In this term, we’ll
focus on Tasseomancy, the art of reading tea leaves.)
(6) Idiomaticity Errors
Within this model, the concept of idiomaticity encompasses more than just the use of idioms. It also entails the
natural use of language, specifically referring to expressions and phrases that would be considered natural by a
native speaker of that particular language. Problems falling under this category pertain not to grammatical flaws
but rather to errors that result in unnatural-sounding language in the target language since they give rise to
regressions, hinder comprehension, and thus impact reading speed.
Example (10)
ST: I’m sure Madam Pomfrey will fix it in a heartbeat.
TT: أنا متأكد من أن مدام بومبس ستصلح نبضات قلبهم،
(BT: I am sure that Madam Bombos will fix their heartbeat).
In example (10), the dialogue is between Hermione and Ron about Ron’s leg, which was almost torn off when the dog
bit him. Hermione therefore reassures him that Madam Pomfrey, a skilled healer at Hogwarts, will undoubtedly cure
him. YMT uses “نبضات القلب” to translate the word “heartbeat,” which is a literal translation in the wrong
context. “In a heartbeat” is an idiom meaning that Madam Pomfrey will fix it in no time. Therefore, the subtitles
do not convey the intended meaning correctly, which may confuse the viewers. Also, the machine transliterates the
name of Madam Pomfrey erroneously as “مدام بومبس” (See Figure 10). This may be seen as a standard error since it
disrupts the natural-sounding language and hampers viewers’ understanding, but it does not affect the whole scene.
Suggested translation: أنا متأكدة أن مدام بومفري ستعالجها في طرفة عين. (BT: I am sure that Madam Pomfrey will fix
it in a twinkle of an eye.)
Example (11)
ST: Very well. Kill him. But wait one more minute. Harry has the right to know why.
TT: القتل جيدًا ولكن انتظر دقيقة واحدة لاري لديه الحق في معرفة
(Back translation: Good killing, but wait for one minute Larry has the right to know why).
ST: I know why. You betrayed my parents. You’re the reason they’re dead!
TT: سبب خيانتك لوالداي، فأنت السبب في موتهما،
(BT: The reason for your betrayal to my parents, you are the reason for their death.)
In example (11), Professor Lupin is talking with Sirius Black and reminding him that Harry has to know the truth
before Peter Pettigrew is killed. Meanwhile, Harry interrupts them by saying that he knows why he did that:
because he betrayed his parents. YMT makes two mistakes at the segmentation level. First, it translates the words
“Very well. Kill him.” as “القتل جيدًا,” which merges the two sentences into one. It also omits the pronoun “him,”
which clearly affects the intended meaning. Second, the machine omits the word “why” at the end of Lupin’s
utterance and the first sentence of Harry’s utterance, “I know why.”, and combines the two speakers’ lines so that
they are translated as a single sentence. There is also a spelling mistake in rendering the name “Harry” as “لاري”
(See Figure 11). This error may be considered a standard error that causes misunderstanding and inconvenience to
the viewers.
Suggested translation:
- حسنًا إذا! اقتله، لكن انتظر دقيقة واحدة، هاري لديه الحق في معرفة السبب.
- أعلم لماذا. لقد خنت والديّ، أنت السبب في موتهما.
BT: - Alright! Kill him, but wait a minute, Harry has the right to know the reason.
- I know why, you betrayed my parents, you caused their death
(8) Punctuation and Graphics
“Punctuation in subtitling is more important than in other texts” [12]. The use of italics to indicate ‘irrealis’
is a noteworthy illustration. Italics serve as a typographic convention in several countries to indicate the
presence of a voice or text that is deemed to be absent. This encompasses various contexts, such as telephone
conversations, television broadcasts, public address systems, dreams, internal thoughts, and recollections, among
others. The widespread use of this practice in many contexts has established it as a customary norm, integrating
it into the structure of the contract of illusion. Consequently, any incorrect utilization of this practice should
be seen as a standard mistake. Similarly, the use of dashes follows the same principle. There is a considerable
degree of variety in the use of dashes. Speaker indication is one of the primary functions of dashes, facilitating
the identification of the speaker. Additionally, they are used to maintain continuity between utterances. In more
infrequent cases, dashes may also be utilized to signify the speaker’s engagement with a different person [12]. It
is important to acknowledge that none of the aforementioned elements have been used in YMT. We have focused on the
punctuation in-text.
Example (12)
ST: Dementors force us to relive our very worst memories. Our pain becomes their power.
TT: يجبر الديمنتورز على استعادة أسوأ ذكرياتنا، ويصبح ألمنا همهم. القوة،
(BT: Dementors force us to relive our very worst memories. Our pain becomes their concern, the power).
In example (12), Professor Lupin explains to Harry how the Dementors feed on people’s good memories and leave them
only the worst ones to live with. YMT translates the sentence well, almost conveying the meaning correctly, but the
machine places a full stop between the words “their” and “power,” which affects the translation and disturbs the
viewers (See Figure 12). The error is classified as standard.
Suggested translation: يجبرنا الديمنتورز على إحياء أسوأ ذكرياتنا، وتصبح آلامنا قوتهم. (BT: Dementors force us to
relive our worst memories, and our pains become their power.)
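The severity labels assigned in the examples discussed in this section can be tallied mechanically. The Python sketch below is offered only as an illustration and is not part of the study's method: the tuple layout, the short error-type names, and the function `tally_by_severity` are hypothetical, and the list covers only the examples whose classification is stated in the discussion above, not the full set of 30.

```python
from collections import Counter

# (example number, error type, severity) as stated in the discussion above.
# The error-type names are shorthand chosen for this sketch.
CLASSIFIED_EXAMPLES = [
    (1, "semantic", "serious"),
    (2, "semantic", "standard"),
    (3, "untranslated", "standard"),
    (4, "semantic", "minor"),
    (6, "stylistic", "standard"),
    (7, "grammar", "standard"),
    (8, "grammar", "standard"),  # the discussion also describes this error as minor
    (9, "spelling", "minor"),
    (10, "idiomaticity", "standard"),
    (11, "segmentation", "standard"),
    (12, "punctuation", "standard"),
]


def tally_by_severity(examples):
    """Count how many examples fall under each severity label."""
    return Counter(severity for _, _, severity in examples)


severity_counts = tally_by_severity(CLASSIFIED_EXAMPLES)
for label in ("minor", "standard", "serious"):
    print(label, severity_counts[label])
```

A tally of this kind only summarizes the labels; it does not reproduce the approval rate reported in the abstract, which depends on scoring details not shown here.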
factors, including background noise and speech pace. The main drawback of speech recognition is its inability to
modify text based on contextual cues. Additionally, inadequate semantic understanding is a major obstacle in this
field. In order to address this issue, it is imperative to enhance the algorithm and acquire a substantial volume
of dependable data for algorithmic training. Nevertheless, it is important to acknowledge that machine-translated
subtitles, although not yet meeting the necessary standards for direct market use, exhibit accuracy in certain
instances, devoid of grammatical errors or omissions. The errors that occurred can be attributed to the complex
diversity of structures, polysemy, and vocabulary present in the Arabic language, as well as the limitations of
Arabic machine translation and its database. However, the automated generation of timetables serves to save time
for subtitlers and improve overall efficiency. Further studies should be undertaken to assess other forms of
machine translation that employ distinct operational systems on different platforms.

Author Contributions
Conceptualization, A.M.A.-H. and R.H.R.; methodology, A.M.A.-H.; software, A.M.A.-H.; validation, A.M.A.-H. and
R.H.R.; formal analysis, A.M.A.-H.; investigation, A.M.A.-H.; resources, R.H.R.; data curation, R.H.R.; writing –
original draft preparation, A.M.A.-H.; writing – review and editing, R.H.R.; visualization, A.M.A.-H. and R.H.R.;
supervision, A.M.A.-H.; project administration, A.M.A.-H.; funding acquisition, A.M.A.-H. All authors have read
and agreed to the published version of the manuscript.

Funding
This work received no external funding.

Institutional Review Board Statement
Not applicable.

Informed Consent Statement
Not applicable.

Data Availability Statement
No new data were created.

Conflicts of Interest
The authors declare no conflict of interest.

References
[1] Maučec, M.S., Donaj, G., 2019. Machine translation and the evaluation of its quality. Recent Trends In Computational Intelligence. 143, 1–20.
[2] Matkivska, N., 2014. Audiovisual translation: Conception, types, characters’ speech and translation strategies applied. Studies About Languages. 25, 38–44.
[3] Yao, G., 2022. Evaluation of machine translation in English-Chinese automatic subtitling of ted talks. Modern Languages, Literatures, and Linguistics. 1(01), 12–22.
[4] Hagström, H., Pedersen, J., 2022. Subtitles in the 2020s: The influence of machine translation. Journal of Audiovisual Translation. 5(1), 207–225.
[5] Karakanta, A., 2022. Experimental research in automatic subtitling: At the crossroads between machine translation and audiovisual translation. Translation Spaces. 11(1), 89–112.
[6] Varga, C., 2021. Online Automatic Subtitling Platforms and Machine Translation. An Analysis of Quality in AVT. Scientific Bulletin of the Politehnica University of Timisoara Transactions on Modern Languages. 20(1), 37–49.
[7] Matusov, E., Wilken, P., Georgakopoulou, Y., et al., 2019. Customizing neural machine translation for subtitling. Proceedings of the Fourth Conference on Machine Translation; Florence, Italy, 19–23 August 2019. pp. 82–93.
[8] Song, H.J., Kim, H.K., Kim, J.D., 2019. Inter-sentence segmentation of YouTube subtitles using long-short term memory. Applied Sciences. 9(7), 1504.
[9] Gupta, P., Sharma, M., Pitale, K., et al., 2019. Problems with automating translation of movie/tv show. Available from: https://arxiv.org/abs/1909.05362 (cited 4 September 2019).
[10] Hiraoka, Y., Yamada, M., et al., 2019. Pre-editing plus neural machine translation for subtitling: effective pre-editing rules for subtitling of TED Talks. Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks; Dublin, Ireland, 19–23 August 2019. pp. 64–72.
[11] Athanasiadi, R., 2019. Exploring the potential of machine translation and other language assistive tools in subtitling: A new era? In: Deckert, M. (ed.). Audiovisual Translation: Research and Use. Peter Lang: Berlin, Germany. pp. 29–49.
[12] Pedersen, J., 2017. The FAR model: Assessing quality in interlingual subtitling. The Journal of Specialized Translation. 28, 210–229.
[13] d’Ydewalle, G., Warlop, L., Van Rensbergen, J., 1989. Television and attention: Differences between young and older adults in the division of attention over different sources of TV information. Medienpsychologie: Zeitschrift für Individual- und Massenkommunikation. 1, 42–57.