Usability Analysis of the Concordia Tool
Applying Novel Concordance Searching
Rafał Jaworski¹, Ivan Dunđer², and Sanja Seljan²
1 Adam Mickiewicz University in Poznań, Faculty of Mathematics and Computer Science
2 University of Zagreb, Faculty of Humanities and Social Sciences
Abstract. This paper describes a novel tool for concordance searching, named
Concordia. It combines the capabilities of standard concordance searchers with
the usability of a translation memory. The tool is described in detail with regard
to the main methods applied and the differences compared to existing CAT tools.
Concordia uses three data structures, i.e. a hashed index, a markers array and a
suffix array, which are loaded into memory to enable fast lookups of fragments
that cover a search pattern. In this new concordancing
system, sentences are stored in the index and marked with additional informa-
tion, such as unique ids, which are then retrieved by the Concordia search algo-
rithm. The usability of the new tool is analysed in an experiment involving two
English-Croatian human translation tasks. The paper presents a detailed scheme
and methodology of the conducted experiment. Furthermore, an analysis of the
experiment results is presented, with special emphasis on the users’ attitudes to-
wards the usefulness and functionalities of Concordia.
Keywords: concordance searching, computer-assisted translation, approximate
searching, suffix array, human evaluation
1 Introduction
In order to bridge the gap between industry and research, various studies have
been conducted regarding the usability of computer-assisted translation (CAT) tools in
the translation process. CAT tools can be stand-alone systems, or tools
that are integrated with electronic dictionaries, machine translation (MT) engines,
concordancers, terminology managers, full-text search tools etc. CAT tools and MT
systems, along with integrated plug-ins and resources, can offer quick gisting translation,
but still lack quality. Numerous experiments have been conducted in order to assess
the usability of CAT and/or MT systems. While CAT technology is used to find match-
ing sentences from sentence-aligned translation memories (TM), translators often need
translations of sub-sentence units, e.g. phrases, expressions etc.
One of the key requirements is to have high-quality aligned parallel corpora, see
[18], [9]. Bilingual concordancers are still often used by translators. For a given query,
the system retrieves a source-target translation unit pair containing the queried se-
quences of characters. Bilingual concordancers represent an extension of dictionaries,
allowing for searching of multi-word units, collocations or idiomatic expressions (e.g.
“look forward to”), phrases or even entire sentences.
CAT systems typically use sets of previously translated sentences, called translation
memories. For a given sentence, a CAT system searches for a similar sentence in the
translation memory. If such a sentence is found, its translation is used to produce the
output sentence. This output sentence is then used as a suggestion for translation, while
a human translator carries out the post-editing.
This technique is applied in many leading CAT platforms, such as SDL Trados [16]
or Kilgray memoQ [15]. Its main advantage is the fast detection of situations in which
a translator is presented with a sentence identical or nearly identical to one previously
translated. In this case the old translation can be reused with minimal post-editing.
However, the main drawback of translation memory searching lies in the fact
that such situations occur relatively rarely.
Another technique offered by these CAT platforms is concordance searching – looking
up single words or multi-word units from the translated sentence in the translation
memory. Occurrences of these words are then presented
to the translator with the appropriate contexts.
It is crucial to know which of these techniques can prove valuable in the translation
process. Therefore, evaluation of the translation productivity is conducted in order to
obtain or maintain a suitable translation quality and/or reduce work time and costs.
Translation productivity, analysed through post-editing of CAT/MT-translated text, is
often performed in combination with a survey of the users’ skills, cognitive efforts and
the quality of the translated text.
Human evaluation of the usefulness of a CAT tool mostly takes into account the impact
on post-editing speed and effort, usability of the interface, ease of translation spotting,
autocompleting of translations etc. On the other hand, automatic evaluation is com-
monly analysed with the help of human-targeted translation edit rate (HTER) as shown
in [20], the BLEU metric and, in more recent works, fuzzy matching measures: [5], [4].
This paper presents a new CAT tool, i.e. a novel concordance searcher named Con-
cordia, and evaluation results regarding its usability. Concordia uses a combination of
well established algorithms and data structures to facilitate fast queries and combines
the advantages of standard concordancers with the capabilities of a translation memory.
Subsequent sections describe related work in the field, the details about the Concordia
search algorithm, followed by the description of the experimental evaluation with the
corresponding results, whereas conclusions are given in the final section.
2 Related work
Usability and productivity studies of various CAT tools have recently emerged due to
the interest of industry leaders, software engineers, computer and information scientists,
translators, localisers and data scientists. Numerous assessments taking into consider-
ation different aspects, ranging from human evaluations up to automatic metrics, have
been conducted.
The paper [5] assessed the user productivity of a commercial CAT tool with the pub-
licly available MyMemory plug-in and an integrated commercial machine translation
engine. Twelve translators participated in a real translation project. The productivity
was measured by human and automatic evaluations. The machine translation engine was
analysed in terms of the rate of words per hour, fuzzy matches, productivity gain, and
BLEU and TER scores. The results showed that post-editing effort significantly decreased when
using a combination of translation memories and machine translation. The post-editing
speed implied significant differences across translators, languages, and domains. In
another study (see [17]), machine translation post-editing productivity was measured
with regard to speed and required effort; the results were obtained with the help of an
eye-tracking system.
An interesting experiment is described in [7]. It involved eight professional translators
who were given a task to translate approximately 800 source words from scratch, using
a glossary, a translation memory with mainly 80–90% fuzzy matches and a commercial
statistical machine translation (SMT) engine trained on the translation memory content.
The productivity was measured in terms of speed and quality of the translated texts.
Relative translation speed improved by 4% to 52%, with an average of 27%. Short (1–10
words), medium (11–20 words) and long (>20 words) segments were used, and the
highest quality increase of machine translations was observed on medium to long segments.
The article [6] showed that the use of MT in a post-editing task improved the speed
and quality, and suggested new approaches to translation interface design, after per-
forming a visual analysis of the translation process and a statistical analysis using
ANOVA.
Recent research assessed the post-editing effort for four different SMT systems [12].
Post-editing was carried out by fluent bilingual native speakers without previous
experience in professional translation. Post-editing speed was strongly influenced
by the post-editors’ skills and effort. Also, the impact of SMT system quality on
the post-editing effort was assessed using human-mediated TER, which takes as the
reference the post-edited version of the machine-translated sentence, thereby minimising
the number of edit operations. The authors indicated that the differences among
post-editors are larger than among MT systems.
The paper [3] described a searchable translation memory relying on statistical machine
translation, using word alignment and phrase-based SMT with the possibility to
search for all possible substrings, i.e. unseen phrases. The authors recommended the
Linear B system, available for Arabic, Chinese and seven European languages. The
evaluation was done with regard to precision and recall.
A bilingual concordancing system which displayed occurrences of a specific word
or an expression is presented in [19]. The tool can also be accessed over the internet
and performs thousands of user queries per day. It searches through a large database
of bitexts (sentence-aligned texts), namely Hansard and Court Decisions; user queries
mostly contain bigram expressions, followed by 3-grams, 4-grams and unigrams. The
authors proposed a word-processor add-on which would allow users to submit queries
to the TransSearch system directly from a word processor.
Another bilingual concordancer is presented in [2]. Through a number of improvements,
it was transformed into a translation search engine. The authors indicated that during a
6-year period 87% of searches contained at least two words.
The DOMCAT project [1] is a web-based bilingual concordancer for domain-specific
computer-assisted translation. The system retrieves, for a given multi-word expression,
aligned sentence pairs. The authors stated that translation spotting was the most chal-
lenging part.
The paper [22] described a web-based English-Chinese concordance system named
TotalRecall, which was developed for computer-assisted language learning of idiomatic
expressions.
Also, the paper [11] presented a system for term extraction which extracts contexts
and combines word alignment and concordancing. The aim was to develop a
Terminology Management System (TMS) of legal phraseology and terminology for
French, Dutch and German. In an experimental tool called FragmALex, links between
the source and target texts were created using lexical resources (lemmas and their
translations) borrowed from dictionaries, terminology bases, documents and cognates,
i.e. words with a common etymological origin which are similar in the source and
target language.
3 Concordia: a Concordance Search Algorithm
This section presents a novel solution for concordance searching. It differs significantly
from standard concordance searchers: most importantly, Concordia searches for
sequences of words in the translation memory instead of single word occurrences.
In order to carry out the search procedures efficiently, an offline index based on the
suffix array ([14], [13]) and other auxiliary data structures is used.
3.1 Operations on Index
The main operations performed on the index are described below.
– void addToIndex(string sentence, int id) – this method is used to add a sentence to
the index along with its unique id. The id is treated as additional information about
the sentence and it is then retrieved from the index by the search algorithm. This
is useful in a standard scenario, when sentences are stored in a database or a text
file, where the id is the line number. Within the addToIndex method the sentence is
tokenised and from this point forward treated as a word sequence.
– void generateIndex() – after adding all the sentences to the index, the generateIndex
method should be called in order to compute the suffix array for the fast lookup
index. This operation may take some time depending on the number of sentences in
the index; nevertheless, it rarely exceeds one minute (during experiments with 2 million
sentences the index generation took 6-7 seconds).
– concordiaSearch(string pattern) – the main concordance search method returns the
longest fragments from the index that cover the search pattern.
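As a rough illustration, this interface can be sketched in Python (a hypothetical mirror of the described method signatures; the class name and the naive suffix-array construction are our assumptions, not the actual Concordia implementation):

```python
# Hypothetical sketch of the index interface described above.
# Concordia itself is not written in Python; names are illustrative.

class ConcordiaIndex:
    def __init__(self):
        self.sentences = {}   # sentence id -> token list
        self.suffixes = None  # suffix array, built by generate_index()

    def add_to_index(self, sentence: str, sid: int) -> None:
        # The sentence is tokenised and stored under its unique id;
        # the id later identifies the source line in a database or file.
        self.sentences[sid] = sentence.lower().split()

    def generate_index(self) -> None:
        # Build a suffix array over all (sentence id, offset) positions,
        # ordered by the token sequence starting at each position.
        positions = [(sid, off)
                     for sid, toks in self.sentences.items()
                     for off in range(len(toks))]
        self.suffixes = sorted(positions,
                               key=lambda p: self.sentences[p[0]][p[1]:])

    def concordia_search(self, pattern: str):
        # Placeholder: the full algorithm returns the longest matched
        # pattern fragments covering the pattern (see Section 3.3).
        raise NotImplementedError
```

With the suffix array in place, all positions sharing a token-sequence prefix are adjacent, so fragment lookups reduce to binary search over `suffixes`.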
3.2 Index Construction
The index incorporates the idea of a suffix array and is aided by two auxiliary data
structures – the hashed index and markers array. The first serves as the “text” (in terms
of approximate string search algorithms) and the second facilitates the process of re-
trieving matches from the memory.
During the operation of the system, i.e. when the searches are performed, all three
structures (hashed index, markers array and suffix array) are loaded into RAM. For
performance reasons, the hashed index and the markers array are backed up on the hard disk.
When a new sentence is added to the index via the aforementioned addToIndex method,
the following operations are performed:
1. tokenisation of the sentence
2. stemming of each token
3. conversion of each token to a numeric value according to a dynamically created map
(called a dictionary)
The coded stems are stored in the index. Stemming each word and replacing it with
a code results in a situation where even large text corpora require relatively few codes.
For example, a study of this phenomenon showed that a corpus of 3,593,227 tokens
from a narrow domain (the JRC-Acquis corpus) required only 17,001 codes (see
[10]). In this situation each word can be stored in just 2 bytes, which significantly
reduces space complexity.
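A minimal sketch of this tokenise-stem-code pipeline follows. The trivial suffix-stripping stemmer and all names are stand-ins chosen for illustration only; Concordia uses a real stemmer.

```python
# Illustrative sketch: each token is stemmed (trivial stand-in stemmer)
# and mapped to a numeric code via a dynamically grown dictionary.
# With ~17k distinct codes, each word fits in 2 bytes.

import struct

def trivial_stem(token: str) -> str:
    # Stand-in stemmer (assumption): strips a few common English endings.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def encode_sentence(sentence: str, dictionary: dict) -> bytes:
    codes = []
    for token in sentence.lower().split():      # 1. tokenisation
        stem = trivial_stem(token)              # 2. stemming
        if stem not in dictionary:              # 3. dynamic code assignment
            dictionary[stem] = len(dictionary)
        codes.append(dictionary[stem])
    # Pack each code as an unsigned 16-bit integer (2 bytes per word).
    return struct.pack(f"<{len(codes)}H", *codes)

dictionary = {}
encoded = encode_sentence("Testing tested tests", dictionary)
# All three tokens share the stem "test", so only one code is assigned
# and the sentence occupies 3 x 2 = 6 bytes.
```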
3.3 Concordia Searching
The Concordia search aims at finding the longest matches from the index that cover
the search pattern. Such a match is called a “matched pattern fragment”. Then, out of all
matched pattern fragments, the best pattern overlay is computed.
The pattern overlay is a set of matched pattern fragments which do not intersect
with each other. The best pattern overlay is the overlay that covers the largest portion of
the pattern with the fewest fragments.
Additionally, the score for this best overlay is computed. The score is a real number
between 0 and 1, where 0 indicates that the pattern is not covered at all (i.e. not a single
word from the pattern is found in the index). The score 1 represents a perfect match:
the pattern is covered completely by a single fragment, which means that the pattern is
found in the index as one of the examples. Formula (1) is used to compute the best
overlay score:
score = Σ_{fragment ∈ overlay} ( len(fragment) / len(pattern) ) · ( log(len(fragment) + 1) / log(len(pattern) + 1) )    (1)
According to this formula, each fragment covering the pattern is assigned a base
score equal to the ratio of its length to the length of the whole pattern. This concept
is taken from the standard Jaccard index [8]. However, the base score is modified by
a second factor, which assumes the value 1 when the fragment covers the pattern
completely, but decreases significantly when the fragment is shorter. For that reason,
if the whole pattern is covered by two contiguous fragments, such an overlay is not
given the score 1.
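Formula (1) can be spelled out directly in code. A small sketch (the function name is ours), which reproduces the score of the worked example in this section:

```python
import math

def overlay_score(fragment_lengths, pattern_length):
    # Formula (1): each fragment contributes its Jaccard-style length
    # ratio, damped by a log factor that equals 1 only for a full cover.
    return sum(
        (flen / pattern_length)
        * (math.log(flen + 1) / math.log(pattern_length + 1))
        for flen in fragment_lengths
    )

# A 10-word pattern covered by two fragments of length 4
# (the [1, 5] and [5, 9] fragments of the example in this section):
score = overlay_score([4, 4], 10)   # ≈ 0.53695, not 0.8, due to the log damping
```

Note that a single fragment covering the whole pattern yields exactly 1, while two fragments jointly covering it score well below their combined length ratio, as the text explains.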
An example illustrating the Concordia search procedure is given hereafter. Let the
index contain the sentences from Table 1.
Table 2 presents the results of searching for the pattern: “Our new test product has
nothing to do with computers”.
Table 1. Example sentences for Concordia searching.

Sentence | id
Alice has a cat | 56
Alice has a dog | 23
New test product has a mistake | 321
This is just testing and it has nothing to do with the above | 14
Table 2. Concordia search results.

Pattern interval | Example id | Example offset
[4, 9] | 14 | 6
[1, 5] | 321 | 0
[5, 9] | 14 | 7
[2, 5] | 321 | 1
[6, 9] | 14 | 8
[3, 5] | 321 | 2
[7, 9] | 14 | 9
[8, 9] | 14 | 10
best overlay: [1, 5] [5, 9], score = 0.53695
These results list all the longest matched pattern fragments. The longest is [4, 9]
(length 5, as the end index is exclusive), which corresponds to the pattern fragment “has
nothing to do with”, found in sentence 14 at offset 6. However, this longest fragment
was not chosen for the best overlay. The best overlay consists of two fragments of length 4: [1, 5]
“new test product has” and [5, 9] “nothing to do with”. It should also be noted that if
the fragment [4, 9] had been chosen for the overlay, it would have eliminated the [1, 5] fragment.
The score of this overlay is 0.53695, which can be considered satisfactory enough to
serve as an aid for a translator.
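The best-overlay selection above can be sketched as an exhaustive search over non-intersecting fragment subsets. This brute force is purely illustrative; the actual Concordia algorithm is presumably more efficient.

```python
import math
from itertools import combinations

def score(fragments, plen):
    # Formula (1) applied to [start, end) intervals over a plen-word pattern.
    return sum((e - s) / plen * math.log(e - s + 1) / math.log(plen + 1)
               for s, e in fragments)

def disjoint(fragments):
    # Intervals use an exclusive end, so [1, 5] and [5, 9] do not intersect.
    frags = sorted(fragments)
    return all(a[1] <= b[0] for a, b in zip(frags, frags[1:]))

def best_overlay(fragments, plen):
    # Illustrative brute force: try every non-intersecting subset.
    best, best_s = [], 0.0
    for r in range(1, len(fragments) + 1):
        for combo in combinations(fragments, r):
            if disjoint(combo) and score(combo, plen) > best_s:
                best, best_s = sorted(combo), score(combo, plen)
    return best, best_s

# The matched fragments from Table 2, over the 10-word pattern:
frags = [(4, 9), (1, 5), (5, 9), (2, 5), (6, 9), (3, 5), (7, 9), (8, 9)]
overlay, s = best_overlay(frags, 10)   # -> [(1, 5), (5, 9)], score ≈ 0.53695
```

The search confirms the choice discussed above: taking the longest fragment [4, 9] would forbid [1, 5] and yield a lower score than the two length-4 fragments together.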
4 Experiment and Evaluation
The aim of the proposed experiment involving a human translation task is to gain
insight into the users’ perspective on the usefulness and functionalities of the Concordia
system and the possibilities of integrating it into the translation workflow.
4.1 Methodology
At first, the Concordia tool was fed with the SETimes2 corpus, consisting of approximately
200k sentences from the news domain (for a corpus description see [21]). Then,
the evaluation of Concordia was performed for the English-Croatian language pair. Each
of the 14 evaluators, who were divided into two groups (A and B), was given
20 sentences, also from the news domain (but not present in the SETimes2 corpus).
Those 20 sentences were divided into two test sets. In the first task, evaluators from
group A translated the sentences without Concordia, but could use other internet resources,
while group B translated the same test set with Concordia and with the possibility to use
other internet resources. In the second task, group A translated with Concordia
(with the possibility to use other internet resources) and group B without Concordia (but with
the possibility to arbitrarily use resources on the internet).
Evaluators were graduate students at the Faculty of Humanities and Social Sciences,
University of Zagreb, fluent in English. Before starting with the translation tasks, a pre-
translation survey was carried out in order to acquire more information on the evalua-
tors’ background (study group and familiarity with translation tools).
Prior to starting the translation tasks, the evaluators were shown three examples
of how to use Concordia. Each evaluator was then asked to record the total time needed
for translating a sentence.
The translators, i.e. evaluators, were not interrupted during their work in any way
and could take as much time as they needed. Apart from the total translation time, the best
Concordia (i.e. overlay) score for each of the sentences from the test sets was recorded.
After translating the given sentences with and without the help of Concordia, a post-
task questionnaire was given to the evaluators. The questionnaire contained various
questions regarding the usefulness of the tool, existence of necessary functionalities for
effective translation, purpose of using Concordia during translation (single words in a
dictionary-like style, multi-word units or entire phrases), intuitiveness of design etc.
The list of questions of the questionnaire is presented in Table 3.
Table 3. Post-translation questionnaire.

Question | Type
Did the system help you in the translation task? | yes/no
Please rate the intuitiveness of the system. | score 1-5
How many times did you look up hits suggested by the system? | number
After looking up a hit suggested by the system, did you find its translation easily? | always / sometimes / never easy
What did you look up most often in the system? | single words / multi-word units / entire phrases
Please list suggestions for improvement. | short comment
The translation times, automatic best overlay scores and the survey results were then
analysed, as valuable user feedback with regard to usability and user-friendliness can
be utilised to upgrade the new CAT tool.
4.2 Results and Discussion
In total, 14 evaluators (9 were male, 5 female) participated in this experimental study.
They were randomly selected and split into two groups (7 per group). 10 evaluators
were studying informatics and 4 were students of translation study groups for various
languages. Figure 1 shows the students’ familiarity with digital language resources.
Fig. 1. Familiarity with digital language resources.
Both groups were allowed to use any other preferred language resource when translating
with or without Concordia. Among the 14 evaluators, 2 used only Google Translate,
whereas 2 used both Google Translate and Bing Translator. Each test set
consisted of 10 sentences from the same news domain, regarding traffic accidents in the
region. The minimum sentence length was 5 words and the maximum was 38 words.
The average sentence length was 21.8 words for the first test set and 18.2 for the second,
i.e. 218 and 182 words in total, respectively.
Table 4 shows the total time needed for the translation tasks. The times were very
similar, but the main advantage of Concordia was the consistent translation of a
specific abbreviation. However, due to the relatively small test sets, the differences in
time and quality are not conclusive. The reason for the slightly longer time with
Concordia was that the students were not instructed to translate quickly; on the contrary,
they took more time to investigate all possible doubts. Test set 1 contained a specific
abbreviation and a few more specific terms, which increased the time needed for the
translation task. The total number of lookups was 69 for test set 1 and 35 for test set 2.
However, the time needed to translate longer sentences of 23-38 words decreased by 17-25%.
Table 5 shows various Pearson correlation results.
The most indicative correlations are the following:
– the correlation between the average number of fragments and the average time for
completing the translation tasks (0.86 and 0.84), indicating that more fragments require
more verification and a longer time;
Table 4. Total time needed for the translation tasks.

Test set | without Concordia + add. resources | with Concordia + add. resources | No. of words | No. of look-ups
Test set 1 | Group A: 67m 10s | Group B: 66m 42s | 218 | 69
Test set 2 | Group B: 49m 31s | Group A: 50m 38s | 182 | 35
Table 5. Pearson correlation results.

Correlation | Test set 1 | Test set 2
Concordia score and the number of fragments | -0.78 | -0.69
Concordia score and the average number of lookups | -0.17 | -0.49
Number of fragments and the average number of lookups | 0.59 | 0.72
Concordia score and the average time needed | -0.73 | -0.50
Average number of fragments and the average time needed | 0.86 | 0.84
Average number of lookups and the average time needed | 0.52 | 0.96
– the correlation between the Concordia score and the number of fragments (-0.78 and
-0.69): higher Concordia scores are obtained with a smaller number of fragments;
– the high correlation (0.96) in the second translation task between the average number
of lookups and the average time needed.
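The values in Table 5 are ordinary Pearson correlation coefficients. As a brief sketch, the coefficient is computed here on hypothetical per-sentence data (the per-sentence measurements behind Table 5 are not listed in the paper):

```python
# Pearson correlation coefficient as used in Table 5, on hypothetical data.

def pearson(xs, ys):
    # r = cov(x, y) / (std(x) * std(y)), computed from raw deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical example (assumed numbers): sentences with more matched
# fragments tend to take longer to translate, giving a strongly
# positive r, as in the 0.86 and 0.84 entries of Table 5.
fragments = [1, 2, 2, 3, 4, 5]
times_sec = [40, 55, 50, 70, 85, 100]
r = pearson(fragments, times_sec)   # strongly positive, close to 1
```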
Other correlations, such as “Concordia score and average number of lookups”,
“number of fragments and average number of lookups” and “Concordia score and average
time needed”, were relatively low. Still, a more extensive human evaluation on a larger
test set is planned.
Evaluators were asked whether the system helped them in their assigned translation
tasks. 10 students confirmed that the system was helpful during translation, whereas
4 did not find it useful. The average intuitiveness score given by the evaluators was 3.5.
Most of the students (57%) answered that after looking up a hit suggested by the system
it was “sometimes easy” to find its translation, whereas 29% stated that it was “always
easy” to find the corresponding translation.
When asked what they looked up most often in the system, 7 students answered
“multi-word units” and 4 stated “single words”.
When asked to list suggestions for improvements or remarks, the evaluators suggested
that the system should be fed with much larger corpora and that more interactivity
with the end-users would be desirable. Furthermore, design improvements were proposed
with regard to displaying long sentences that stretch across the screen, as the current
display of longer sentences is not user-friendly.
5 Conclusions
Concordia combines the capabilities of standard concordance searchers with the
usability of a translation memory. In the pre-translation survey the familiarity with
language resources was evaluated, indicating that on average 85% of the students,
mainly from non-translation study groups, were acquainted with various language resources.
In the translation task, two experiments were conducted without and with the Concordia
tool, using two test sets from the news domain. 71% of the users declared the Concordia
system useful in the translation task, with an average intuitiveness score of 3.5.
The system was mostly used to find multi-word expressions, followed by single
words. 57% of the users indicated that it was “sometimes easy” to find a translation for
a suggested hit, whereas 29% stated that it was “always easy”. However, there are no
firm conclusions regarding translation speed, due to the relatively small test sets and the
specific terminology and abbreviation, which increased the time needed for the translation
task. Nevertheless, the Concordia system was more useful with longer and more complex
sentences, where translation time decreased by 17-25%.
Future research will focus on improvements of the interface design, better
interactivity, corpus enlargement and more extensive experiments.
References
1. Bai, M.H., Hsieh, Y.M., Chen, K.J., Chang, J.S.: DOMCAT: A Bilingual Concordancer for
Domain-Specific Computer Assisted Translation. In: ACL (System Demonstrations). pp. 55–
60. The Association for Computer Linguistics (2012), http://dblp.uni-trier.de/db/
conf/acl/acl2012d.html#BaiHCC12
2. Bourdaillet, J., Huet, S., Langlais, P., Lapalme, G.: TransSearch: from a bilingual concor-
dancer to a translation finder. Machine Translation 24(3), 241–271 (2011), http://dx.doi.
org/10.1007/s10590-011-9089-6
3. Callison-Burch, C., Bannard, C., Schroeder, J.: Searchable translation memories. In: Proceedings
of ASLIB Translation and the Computer 26 (2004)
4. Escartín, C.P., Arcedillo, M.: A fuzzier approach to machine translation evaluation: A pilot
study on post-editing productivity and automated metrics in commercial settings. In: Pro-
ceedings of the ACL 2015 Fourth Workshop on Hybrid Approaches to Translation (HyTra).
pp. 40–45 (2015)
5. Federico, M., Cattelan, A., Trombetti, M.: Measuring User Productivity in Machine Trans-
lation Enhanced Computer Assisted Translation. In: Proceedings of the Tenth Confer-
ence of the Association for Machine Translation in the Americas (AMTA) (2012), http:
//www.mt-archive.info/AMTA-2012-Federico.pdf
6. Green, S., Heer, J., Manning, C.D.: The efficacy of human post-editing for language transla-
tion. In: Mackay, W.E., Brewster, S.A., Bødker, S. (eds.) 2013 ACM SIGCHI Conference on
Human Factors in Computing Systems, CHI ’13, Paris, France, April 27 - May 2, 2013. pp.
439–448. ACM (2013), http://doi.acm.org/10.1145/2470654.2470718
7. Guerberof, A.: Productivity and quality in MT post-editing. In: Proceedings of the 12th Ma-
chine Translation Summit (MT Summit XII) Workshop: Beyond Translation Memories -
New Tools for Translators. p. 9 (2009)
8. Jaccard, P.: Étude comparative de la distribution florale dans une portion des Alpes et des
Jura. Bulletin de la Société Vaudoise des Sciences Naturelles 37, pp. 547-579 (1901)
9. Jaworski, R., Jassem, K.: Building High Quality Translation Memories Acquired from
Monolingual Corpora. Proceedings of the Intelligent Information Systems Conference pp.
157–168 (2010)
10. Jaworski, R.: Anubis – speeding up Computer-Aided Translation. Computational Linguistics
– Applications, Studies in Computational Intelligence vol. 458, Springer-Verlag (2013)
11. Kockaert, H.J., Vanallemeersch, T., Steurs, F.: Term-based context extraction in legal termi-
nology : a case study in Belgium. In: Fóris, A., Pusztay, J. (eds.) Current Trends in Terminol-
ogy: Proceedings of the International Conference on Terminology (Terminologia et Corpora
Supplementum 4). pp. 153–162 (2008)
12. Koehn, P., Germann, U.: The Impact of Machine Translation Quality on Human Post-Editing.
In: Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Trans-
lation. pp. 38–46. Association for Computational Linguistics, Gothenburg, Sweden (April
2014), http://www.aclweb.org/anthology/W14-0307
13. Mäkinen, V., Navarro, G.: Compressed compact suffix arrays. In: Proc. 15th Annual Symposium
on Combinatorial Pattern Matching (CPM), LNCS vol. 3109, pp. 420-433 (2004)
14. Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. First Annual
ACM-SIAM Symposium on Discrete Algorithms. pp. 319–327 (1990)
15. Kilgray Translation Technologies: memoQ Translator Pro. http://kilgray.com/products/memoq/
16. SDL: SDL Trados translation solution. http://www.sdl.com/en/sites/sdl-trados-solutions
17. O’Brien, S.: Towards predicting post-editing productivity. Machine Translation 25(3), 197–
215 (2011), http://dx.doi.org/10.1007/s10590-011-9096-7
18. Seljan, S., Gašpar, A., Pavuna, D.: Sentence Alignment as the Basis For Translation Mem-
ory Database. INFuture2007–The Future of Information Sciences: Digital Information and
Heritage. Zagreb: Odsjek za informacijske znanosti, Filozofski fakultet (2007)
19. Simard, M., Macklovitch, E.: Studying the Human Translation Process through the
TransSearch Log-Files. In: Knowledge Collection from Volunteer Contributors, Papers
from the 2005 AAAI Spring Symposium, Technical Report SS-05-03, Stanford, Califor-
nia, USA, March 21-23, 2005. pp. 70–77. AAAI (2005), http://www.aaai.org/Library/
Symposia/Spring/2005/ss05-03-011.php
20. Specia, L., Farzindar, A.: Estimating Machine Translation Post-Editing Effort with HTER.
In: AMTA Workshop Bringing MT to the User: MT Research and the Translation Industry.
Denver, Colorado (2010), http://www.mt-archive.info/JEC-2010-Specia.pdf
21. Tiedemann, J.: Parallel Data, Tools and Interfaces in OPUS. In: Calzolari, N., Choukri, K.,
Declerck, T., Doğan, M.U., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S.
(eds.) Proceedings of the Eight International Conference on Language Resources and Eval-
uation (LREC’12). European Language Resources Association (ELRA), Istanbul, Turkey
(May 2012)
22. Wu, J.C., Yeh, K.C., Chuang, T.C., Shei, W.C., Chang, J.S.: TotalRecall: A Bilingual Con-
cordance for Computer Assisted Translation and Language Learning. In: Funakoshi, K.,
Kübler, S., Otterbacher, J. (eds.) ACL (Companion). pp. 201–204. The Association for Com-
puter Linguistics (2003), http://dblp.uni-trier.de/db/conf/acl/acl2003c.html#
WuYCSC03