Usability Analysis of the Concordia Tool
Applying Novel Concordance Searching
Rafał Jaworski¹, Ivan Dunđer², and Sanja Seljan²
1 Adam Mickiewicz University in Poznań, Faculty of Mathematics and Computer Science
2 University of Zagreb, Faculty of Humanities and Social Sciences
Abstract. This paper describes a novel tool for concordance searching, named
Concordia. It combines the capabilities of standard concordance searchers with
the usability of a translation memory. The tool is described in detail with regard
to the main methods applied and the differences compared to existing CAT tools.
Concordia uses three data structures, i.e. a hashed index, a markers array and a
suffix array, which are loaded into memory to enable fast lookups of fragments
that cover a search pattern. In this new concordancing
system, sentences are stored in the index and marked with additional informa-
tion, such as unique ids, which are then retrieved by the Concordia search algo-
rithm. The usability of the new tool is analysed in an experiment involving two
English-Croatian human translation tasks. The paper presents a detailed scheme
and methodology of the conducted experiment. Furthermore, an analysis of the
experiment results is presented, with special emphasis on the users’ attitudes to-
wards the usefulness and functionalities of Concordia.
Keywords: concordance searching, computer-assisted translation, approximate
searching, suffix array, human evaluation
1 Introduction
In order to bridge the gap between industry and research, various studies have
been conducted regarding the usability of computer-assisted translation (CAT) tools in
the translation process. CAT tools can be stand-alone systems, or tools
that are integrated with electronic dictionaries, machine translation (MT) engines,
concordancers, terminology managers, full-text search tools etc. CAT tools and MT
systems, along with integrated plug-ins and resources, can offer quick gisting translation,
but still lack quality. Numerous experiments have been conducted in order to assess
the usability of CAT and/or MT systems. While CAT technology is used to find match-
ing sentences from sentence-aligned translation memories (TM), translators often need
translations of sub-sentence units, e.g. phrases, expressions etc.
One of the key requirements is to have high-quality aligned parallel corpora, see
[18], [9]. Bilingual concordancers are still often used by translators. For a given query,
the system retrieves a source-target translation unit pair containing the queried se-
quences of characters. Bilingual concordancers represent an extension of dictionaries,
allowing for searching of multi-word units, collocations or idiomatic expressions (e.g.
“look forward to”), phrases or even entire sentences.
CAT systems typically use sets of previously translated sentences, called translation
memories. For a given sentence, a CAT system searches for a similar sentence in the
translation memory. If such a sentence is found, its translation is used to produce the
output sentence. This output sentence is then used as a suggestion for translation, while
a human translator carries out the post-editing.
This technique is applied in many leading CAT platforms, such as SDL Trados [16]
or Kilgray memoQ [15]. Its main advantage is the fast detection of situations in which
a translator is presented with a sentence identical or nearly identical to one previously
translated. In this case the old translation can be reused with minimal post-editing.
However, the main drawback of translation memory searching lies in the fact
that such situations occur relatively rarely.
Another technique offered by these CAT platforms is concordance searching – looking
up single words or multi-word units from the translated sentence in the translation
memory. Occurrences of these words are then presented
to the translator with the appropriate contexts.
It is crucial to know which of these techniques can prove valuable in the translation
process. Therefore, evaluation of the translation productivity is conducted in order to
obtain or maintain a suitable translation quality and/or reduce work time and costs.
Translation productivity, analysed through post-editing of CAT/MT-translated text, is
often performed in combination with a survey of the users’ skills, cognitive efforts and
the quality of the translated text.
Human evaluation of the usefulness of a CAT tool mostly takes into account the impact
on post-editing speed and effort, usability of the interface, ease of translation spotting,
autocompleting of translations etc. On the other hand, automatic evaluation is com-
monly analysed with the help of human-targeted translation edit rate (HTER) as shown
in [20], the BLEU metric and, in more recent works, fuzzy matching measures: [5], [4].
This paper presents a new CAT tool, i.e. a novel concordance searcher named Con-
cordia, and evaluation results regarding its usability. Concordia uses a combination of
well established algorithms and data structures to facilitate fast queries and combines
the advantages of standard concordancers with the capabilities of a translation memory.
Subsequent sections describe related work in the field, the details about the Concordia
search algorithm, followed by the description of the experimental evaluation with the
corresponding results, whereas conclusions are given in the final section.
2 Related work
Usability and productivity studies of various CAT tools have recently emerged due to
the interest of industry leaders, software engineers, computer and information scientists,
translators, localisers and data scientists. Numerous assessments taking into consider-
ation different aspects, ranging from human evaluations up to automatic metrics, have
been conducted.
The paper [5] assessed the user productivity of a commercial CAT tool with the pub-
licly available MyMemory plug-in and an integrated commercial machine translation
engine. Twelve translators participated in a real translation project. The productivity
was measured by human and automatic evaluations. The machine translation engine was
analysed in terms of the rate of words per hour, fuzzy matches, productivity gain, and
BLEU and TER scores. The results showed that post-editing effort significantly decreased when
using a combination of translation memories and machine translation. The post-editing
speed implied significant differences across translators, languages, and domains. In
another study (see [17]), machine translation post-editing productivity was measured
with regard to speed and required effort; the results were obtained with the help of an
eye-tracking system.
An interesting experiment is described in [7]. It involved eight professional translators
who were given a task to translate approximately 800 source words from scratch, using
a glossary, a translation memory with mainly 80–90% fuzzy matches and a commercial
statistical machine translation (SMT) engine trained on the translation memory content.
The productivity was measured in terms of speed and quality of the translated texts.
Relative translation speed improved by 4% to 52%, with an average of 27%. Short (1–10
words), medium (11–20 words) and long (>20 words) segments were used, and the
highest quality increase of machine translations was observed on medium to long segments.
The article [6] showed that the use of MT in a post-editing task improved the speed
and quality, and suggested new approaches to translation interface design, after per-
forming a visual analysis of the translation process and a statistical analysis using
ANOVA.
Recent research assessed the post-editing effort for four different SMT systems [12].
Post-editing was carried out by fluent bilingual native speakers without previous
experience in professional translation. Post-editing speed was strongly influenced
by the post-editors’ skills and effort. Also, the impact of SMT system quality on
the post-editing effort was assessed using human-mediated TER, which takes as the
reference the post-edited version of the machine-translated sentence, thereby minimising
the number of edit operations. The authors indicated that the differences among
post-editors are larger than among MT systems.
The paper [3] described a searchable translation memory relying on statistical machine
translation, using word alignment and phrase-based SMT with the possibility to
search for all possible substrings, i.e. unseen phrases. The authors recommended the
Linear B system, available for Arabic, Chinese and seven European languages. The
evaluation was done with regard to precision and recall.
A bilingual concordancing system which displayed occurrences of a specific word
or an expression is presented in [19]. The tool can also be accessed over the internet
and performs thousands of user queries per day. It searches through a large database
of bitexts (sentence-aligned texts), namely Hansard and Court Decisions; user queries
mostly contain bigram expressions, followed by 3-grams, 4-grams and unigrams. The
authors proposed a word-processor add-on which would allow users to submit queries
to the TransSearch system directly from a word processor.
Another bilingual concordancer is presented in [2]. Through a number of improvements,
it was transformed into a translation search engine. The authors indicated that during a
6-year period 87% of searches contained at least two words.
The DOMCAT project [1] is a web-based bilingual concordancer for domain-specific
computer-assisted translation. The system retrieves, for a given multi-word expression,
aligned sentence pairs. The authors stated that translation spotting was the most chal-
lenging part.
The paper [22] described a web-based English-Chinese concordance system named
TotalRecall, which was developed for computer-assisted language learning of idiomatic
expressions.
Also, the paper [11] presented a system for term extraction which extracts contexts
and combines word alignment and concordancing. The aim was to develop a
Terminology Management System (TMS) of legal phraseology and terminology for
French, Dutch and German. In an experimental tool called FragmALex, links between
the source and target texts were created using lexical resources (lemmas and their
translations) borrowed from dictionaries, terminology bases, documents and cognates,
i.e. words with a common etymological origin which are similar in the source and
target language.
3 Concordia: a Concordance Search Algorithm
This section presents a novel solution for concordance searching. It differs significantly
from standard concordance searchers: most importantly, Concordia searches for
sequences of words in the translation memory instead of single word occurrences.
In order to carry out the search procedures efficiently, an offline index based on the
suffix array ([14], [13]) and other auxiliary data structures is used.
3.1 Operations on Index
The main operations performed on the index are described below.
– void addToIndex(string sentence, int id) – this method is used to add a sentence to
the index along with its unique id. The id is treated as additional information about
the sentence and it is then retrieved from the index by the search algorithm. This
is useful in a standard scenario, when sentences are stored in a database or a text
file, where the id is the line number. Within the addToIndex method the sentence is
tokenised and from this point forward treated as a word sequence.
– void generateIndex() – after adding all the sentences to the index, the generateIndex
method should be called in order to compute the suffix array for the fast lookup
index. This operation may take some time depending on the number of sentences in
the index; nevertheless, it rarely exceeds one minute (during experiments with 2 million
sentences the index generation took 6-7 seconds).
– concordiaSearch(string pattern) – the main concordance search method returns the
longest fragments from the index that cover the search pattern.
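As a rough illustration, this interface can be sketched in Python (a hypothetical mirror of the described method signatures; the class name and the naive suffix-array construction are our assumptions, not the actual Concordia implementation):

```python
# Hypothetical sketch of the index interface described above.
# Concordia itself is not written in Python; names are illustrative.

class ConcordiaIndex:
    def __init__(self):
        self.sentences = {}   # sentence id -> token list
        self.suffixes = None  # suffix array, built by generate_index()

    def add_to_index(self, sentence: str, sid: int) -> None:
        # The sentence is tokenised and stored under its unique id;
        # the id later identifies the source line in a database or file.
        self.sentences[sid] = sentence.lower().split()

    def generate_index(self) -> None:
        # Build a suffix array over all (sentence id, offset) positions,
        # ordered by the token sequence starting at each position.
        positions = [(sid, off)
                     for sid, toks in self.sentences.items()
                     for off in range(len(toks))]
        self.suffixes = sorted(positions,
                               key=lambda p: self.sentences[p[0]][p[1]:])

    def concordia_search(self, pattern: str):
        # Placeholder: the full algorithm returns the longest matched
        # pattern fragments covering the pattern (see Section 3.3).
        raise NotImplementedError
```

With the suffix array in place, all positions sharing a token-sequence prefix are adjacent, so fragment lookups reduce to binary search over `suffixes`.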
3.2 Index Construction
The index incorporates the idea of a suffix array and is aided by two auxiliary data
structures – the hashed index and markers array. The first serves as the “text” (in terms
of approximate string search algorithms) and the second facilitates the process of re-
trieving matches from the memory.
During the operation of the system, i.e. when the searches are performed, all three
structures (hashed index, markers array and suffix array) are loaded into RAM. For
performance reasons, the hashed index and the markers array are backed up on the hard disk.
When a new sentence is added to the index via the aforementioned addToIndex method,
the following operations are performed:
1. tokenisation of the sentence
2. stemming of each token
3. conversion of each token to a numeric value according to a dynamically created map
(called a dictionary)
The coded stems are stored in the index. Stemming each word and replacing it with
a code results in a situation where even large text corpora require relatively few codes.
For example, a study of this phenomenon showed that a corpus of 3,593,227 tokens
from a narrow domain (the JRC-Acquis corpus) required only 17,001 codes (see
[10]). In this situation each word can be stored in just 2 bytes, which significantly
reduces space complexity.
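A minimal sketch of this tokenise-stem-code pipeline follows. The trivial suffix-stripping stemmer and all names are stand-ins chosen for illustration only; Concordia uses a real stemmer.

```python
# Illustrative sketch: each token is stemmed (trivial stand-in stemmer)
# and mapped to a numeric code via a dynamically grown dictionary.
# With ~17k distinct codes, each word fits in 2 bytes.

import struct

def trivial_stem(token: str) -> str:
    # Stand-in stemmer (assumption): strips a few common English endings.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def encode_sentence(sentence: str, dictionary: dict) -> bytes:
    codes = []
    for token in sentence.lower().split():      # 1. tokenisation
        stem = trivial_stem(token)              # 2. stemming
        if stem not in dictionary:              # 3. dynamic code assignment
            dictionary[stem] = len(dictionary)
        codes.append(dictionary[stem])
    # Pack each code as an unsigned 16-bit integer (2 bytes per word).
    return struct.pack(f"<{len(codes)}H", *codes)

dictionary = {}
encoded = encode_sentence("Testing tested tests", dictionary)
# All three tokens share the stem "test", so only one code is assigned
# and the sentence occupies 3 x 2 = 6 bytes.
```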
3.3 Concordia Searching
The Concordia search aims at finding the longest matches from the index that cover
the search pattern. Such a match is called a “matched pattern fragment”. Then, out of all
matched pattern fragments, the best pattern overlay is computed.
The pattern overlay is a set of matched pattern fragments which do not intersect
with each other. The best pattern overlay is the overlay that covers the largest portion of
the pattern with the fewest fragments.
Additionally, the score for this best overlay is computed. The score is a real number
between 0 and 1, where 0 indicates that the pattern is not covered at all (i.e. not a single
word from the pattern is found in the index). The score 1 represents a perfect match:
the pattern is covered completely by a single fragment, which means that the pattern is
found in the index as one of the examples. Formula (1) is used to compute the best
overlay score:
score = Σ_{fragment ∈ overlay} ( len(fragment) / len(pattern) ) · ( log(len(fragment) + 1) / log(len(pattern) + 1) )    (1)
According to this formula, each fragment covering the pattern is assigned a base
score equal to the ratio of its length to the length of the whole pattern. This concept
is taken from the standard Jaccard index [8]. However, the base score is modified by
a second factor, which assumes the value 1 when the fragment covers the pattern
completely, but decreases significantly when the fragment is shorter. For that reason,
if the whole pattern is covered by two contiguous fragments, such an overlay is not
given the score 1.
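Formula (1) can be spelled out directly in code. A small sketch (the function name is ours), which reproduces the score of the worked example in this section:

```python
import math

def overlay_score(fragment_lengths, pattern_length):
    # Formula (1): each fragment contributes its Jaccard-style length
    # ratio, damped by a log factor that equals 1 only for a full cover.
    return sum(
        (flen / pattern_length)
        * (math.log(flen + 1) / math.log(pattern_length + 1))
        for flen in fragment_lengths
    )

# A 10-word pattern covered by two fragments of length 4
# (the [1, 5] and [5, 9] fragments of the example in this section):
score = overlay_score([4, 4], 10)   # ≈ 0.53695, not 0.8, due to the log damping
```

Note that a single fragment covering the whole pattern yields exactly 1, while two fragments jointly covering it score well below their combined length ratio, as the text explains.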
An example illustrating the Concordia search procedure is given hereafter. Let the
index contain the sentences from Table 1.
Table 2 presents the results of searching for the pattern: “Our new test product has
nothing to do with computers”.
Table 1. Example sentences for Concordia searching.

Sentence | id
Alice has a cat | 56
Alice has a dog | 23
New test product has a mistake | 321
This is just testing and it has nothing to do with the above | 14
Table 2. Concordia search results.

Pattern interval | Example id | Example offset
[4, 9] | 14 | 6
[1, 5] | 321 | 0
[5, 9] | 14 | 7
[2, 5] | 321 | 1
[6, 9] | 14 | 8
[3, 5] | 321 | 2
[7, 9] | 14 | 9
[8, 9] | 14 | 10
best overlay: [1, 5] [5, 9], score = 0.53695
These results list all the longest matched pattern fragments. The longest is [4, 9]
(length 5, as the end index is exclusive), which corresponds to the pattern fragment “has
nothing to do with”, found in sentence 14 at offset 6. However, this longest fragment
was not chosen for the best overlay. The best overlay consists of two fragments of length 4: [1, 5]
“new test product has” and [5, 9] “nothing to do with”. It should also be noted that if
the fragment [4, 9] had been chosen for the overlay, it would have eliminated the [1, 5] fragment.
The score of this overlay is 0.53695, which can be considered satisfactory enough to
serve as an aid for a translator.
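The best-overlay selection above can be sketched as an exhaustive search over non-intersecting fragment subsets. This brute force is purely illustrative; the actual Concordia algorithm is presumably more efficient.

```python
import math
from itertools import combinations

def score(fragments, plen):
    # Formula (1) applied to [start, end) intervals over a plen-word pattern.
    return sum((e - s) / plen * math.log(e - s + 1) / math.log(plen + 1)
               for s, e in fragments)

def disjoint(fragments):
    # Intervals use an exclusive end, so [1, 5] and [5, 9] do not intersect.
    frags = sorted(fragments)
    return all(a[1] <= b[0] for a, b in zip(frags, frags[1:]))

def best_overlay(fragments, plen):
    # Illustrative brute force: try every non-intersecting subset.
    best, best_s = [], 0.0
    for r in range(1, len(fragments) + 1):
        for combo in combinations(fragments, r):
            if disjoint(combo) and score(combo, plen) > best_s:
                best, best_s = sorted(combo), score(combo, plen)
    return best, best_s

# The matched fragments from Table 2, over the 10-word pattern:
frags = [(4, 9), (1, 5), (5, 9), (2, 5), (6, 9), (3, 5), (7, 9), (8, 9)]
overlay, s = best_overlay(frags, 10)   # -> [(1, 5), (5, 9)], score ≈ 0.53695
```

The search confirms the choice discussed above: taking the longest fragment [4, 9] would forbid [1, 5] and yield a lower score than the two length-4 fragments together.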
4 Experiment and Evaluation
The aim of the proposed experiment involving a human translation task is to gain
insight into the users’ perspective on the usefulness and functionalities of the Concordia
system and the possibilities of integrating it into the translation workflow.
4.1 Methodology
At first, the Concordia tool was fed with the SETimes2 corpus, consisting of approximately
200k sentences from the news domain (for a corpus description see [21]). Then,
the evaluation of Concordia was performed for the English-Croatian language pair. Each
of the 14 evaluators, who were divided into two groups (A and B), was given
20 sentences, also from the news domain (but not present in the SETimes2 corpus).
Those 20 sentences were divided into two test sets. In the first task, evaluators from
group A translated the sentences without Concordia, but could use other internet resources,
while group B translated the same test set with Concordia and with the possibility to use
other internet resources. In the second task, group A translated with Concordia
(with the possibility to use other internet resources) and group B without Concordia (but with
the possibility to arbitrarily use resources on the internet).
Evaluators were graduate students at the Faculty of Humanities and Social Sciences,
University of Zagreb, fluent in English. Before starting with the translation tasks, a pre-
translation survey was carried out in order to acquire more information on the evalua-
tors’ background (study group and familiarity with translation tools).
Prior to starting the translation tasks, the evaluators were shown three examples
of how to use Concordia. Each evaluator was then asked to record the total time needed
for translating a sentence.
The translators, i.e. evaluators, were not interrupted during their work in any way
and could take as much time as they needed. Apart from the total translation time, the best
Concordia (i.e. overlay) score for each of the sentences from the test sets was recorded.
After translating the given sentences with and without the help of Concordia, a post-
task questionnaire was given to the evaluators. The questionnaire contained various
questions regarding the usefulness of the tool, existence of necessary functionalities for
effective translation, purpose of using Concordia during translation (single words in a
dictionary-like style, multi-word units or entire phrases), intuitiveness of design etc.
The list of questions of the questionnaire is presented in Table 3.
Table 3. Post-translation questionnaire.

Question | Type
Did the system help you in the translation task? | yes/no
Please rate the intuitiveness of the system. | score 1-5
How many times did you look up hits suggested by the system? | number
After looking up a hit suggested by the system, did you find its translation easily? | always / sometimes / never easy
What did you look up most often in the system? | single words / multi-word units / entire phrases
Please list suggestions for improvement. | short comment
The translation times, automatic best overlay scores and the survey results were then
analysed, as valuable user feedback with regard to usability and user-friendliness can
be utilised to upgrade the new CAT tool.
4.2 Results and Discussion
In total, 14 evaluators (9 were male, 5 female) participated in this experimental study.
They were randomly selected and split into two groups (7 per group). 10 evaluators
were studying informatics and 4 were students of translation study groups for various
languages. Figure 1 shows the students’ familiarity with digital language resources.
Fig. 1. Familiarity with digital language resources.
Both groups were allowed to use any other preferred language resource when translating
with or without Concordia. Among the 14 evaluators, 2 used only Google Translate,
whereas 2 used both Google Translate and Bing Translator. Each test set
consisted of 10 sentences from the same news domain, regarding traffic accidents in the
region. The minimum sentence length was 5 words and the maximum was 38 words.
The average sentence length was 21.8 words for the first test set and 18.2 for the second,
i.e. 218 and 182 words in total, respectively.
Table 4 shows the total time needed for the translation tasks. The times were very
similar, but the main advantage of Concordia was the consistent translation of a
specific abbreviation. However, due to the relatively small test sets, the differences in
time and quality are not conclusive. The reason for the slightly longer time with
Concordia was that the students were not instructed to translate quickly; on the contrary,
they took more time to investigate all possible doubts. Test set 1 contained a specific
abbreviation and a few more specific terms, which increased the time needed for the
translation task. The total number of lookups was 69 for test set 1 and 35 for test set 2.
However, the time needed to translate longer sentences of 23-38 words decreased by 17-25%.
Table 5 shows various Pearson correlation results.
The most indicative correlations are the following:
– the correlation between the average number of fragments and the average time for
completing the translation tasks (0.86 and 0.84), indicating that more fragments require
more verification and a longer time;
Table 4. Total time needed for the translation tasks.

Test set | without Concordia + add. resources | with Concordia + add. resources | No. of words | No. of look-ups
Test set 1 | Group A: 67m 10s | Group B: 66m 42s | 218 | 69
Test set 2 | Group B: 49m 31s | Group A: 50m 38s | 182 | 35
Table 5. Pearson correlation results.

Correlation | Test set 1 | Test set 2
Concordia score and the number of fragments | -0.78 | -0.69
Concordia score and the average number of lookups | -0.17 | -0.49
Number of fragments and the average number of lookups | 0.59 | 0.72
Concordia score and the average time needed | -0.73 | -0.50
Average number of fragments and the average time needed | 0.86 | 0.84
Average number of lookups and the average time needed | 0.52 | 0.96
– the correlation between the Concordia score and the number of fragments (-0.78 and
-0.69): higher Concordia scores are obtained with a smaller number of fragments;
– the high correlation (0.96) in the second translation task between the average number
of lookups and the average time needed.
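The values in Table 5 are ordinary Pearson correlation coefficients. As a brief sketch, the coefficient is computed here on hypothetical per-sentence data (the per-sentence measurements behind Table 5 are not listed in the paper):

```python
# Pearson correlation coefficient as used in Table 5, on hypothetical data.

def pearson(xs, ys):
    # r = cov(x, y) / (std(x) * std(y)), computed from raw deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical example (assumed numbers): sentences with more matched
# fragments tend to take longer to translate, giving a strongly
# positive r, as in the 0.86 and 0.84 entries of Table 5.
fragments = [1, 2, 2, 3, 4, 5]
times_sec = [40, 55, 50, 70, 85, 100]
r = pearson(fragments, times_sec)   # strongly positive, close to 1
```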
Other correlations, such as “Concordia score and average number of lookups”,
“number of fragments and average number of lookups” and “Concordia score and average
time needed”, were relatively low. Still, a more extensive human evaluation on a larger
test set is planned.
Evaluators were asked whether the system helped them in their assigned translation
tasks. 10 students confirmed that the system was helpful during translation, whereas
4 did not find it useful. The average intuitiveness score given by the evaluators was 3.5.
Most of the students (57%) answered that after looking up a hit suggested by the system
it was “sometimes easy” to find its translation, whereas 29% stated that it was “always
easy” to find the corresponding translation.
When asked what they looked up most often in the system, 7 students answered
“multi-word units” and 4 stated “single words”.
When asked to list suggestions for improvements or remarks, the evaluators suggested
that the system should be fed with much larger corpora and that more interactivity
with the end-users would be desirable. Furthermore, design improvements were proposed
with regard to displaying long sentences that stretch across the screen, as the current
display of longer sentences is not user-friendly.
5 Conclusions
Concordia combines the capabilities of standard concordance searchers with the
usability of a translation memory. In the pre-translation survey the familiarity with
language resources was evaluated, indicating that on average 85% of the students,
mainly from non-translation study groups, were acquainted with various language resources.
In the translation task, two experiments were conducted without and with the Concordia
tool, using two test sets from the news domain. 71% of the users declared the Concordia
system useful in the translation task, with an average intuitiveness score of 3.5.
The system was mostly used to find multi-word expressions, followed by single
words. 57% of the users indicated that it was “sometimes easy” to find a translation for
a suggested hit, whereas 29% stated that it was “always easy”. However, there are no
firm conclusions regarding translation speed, due to the relatively small test sets and the
specific terminology and abbreviation, which increased the time needed for the translation
task. Nevertheless, the Concordia system was more useful with longer and more complex
sentences, where translation time decreased by 17-25%.
Future research will focus on improvements of the interface design, better
interactivity, corpus enlargement and more extensive experiments.
References
1. Bai, M.H., Hsieh, Y.M., Chen, K.J., Chang, J.S.: DOMCAT: A Bilingual Concordancer for
Domain-Specific Computer Assisted Translation. In: ACL (System Demonstrations). pp. 55–
60. The Association for Computer Linguistics (2012), http://dblp.uni-trier.de/db/
conf/acl/acl2012d.html#BaiHCC12
2. Bourdaillet, J., Huet, S., Langlais, P., Lapalme, G.: TransSearch: from a bilingual concor-
dancer to a translation finder. Machine Translation 24(3), 241–271 (2011), http://dx.doi.
org/10.1007/s10590-011-9089-6
3. Callison-Burch, C., Bannard, C., Schroeder, J.: Searchable translation memories. In: Proceedings
of ASLIB Translation and the Computer 26 (2004)
4. Escartín, C.P., Arcedillo, M.: A fuzzier approach to machine translation evaluation: A pilot
study on post-editing productivity and automated metrics in commercial settings. In: Pro-
ceedings of the ACL 2015 Fourth Workshop on Hybrid Approaches to Translation (HyTra).
pp. 40–45 (2015)
5. Federico, M., Cattelan, A., Trombetti, M.: Measuring User Productivity in Machine Trans-
lation Enhanced Computer Assisted Translation. In: Proceedings of the Tenth Confer-
ence of the Association for Machine Translation in the Americas (AMTA) (2012), http:
//www.mt-archive.info/AMTA-2012-Federico.pdf
6. Green, S., Heer, J., Manning, C.D.: The efficacy of human post-editing for language transla-
tion. In: Mackay, W.E., Brewster, S.A., Bødker, S. (eds.) 2013 ACM SIGCHI Conference on
Human Factors in Computing Systems, CHI ’13, Paris, France, April 27 - May 2, 2013. pp.
439–448. ACM (2013), http://doi.acm.org/10.1145/2470654.2470718
7. Guerberof, A.: Productivity and quality in MT post-editing. In: Proceedings of the 12th Ma-
chine Translation Summit (MT Summit XII) Workshop: Beyond Translation Memories -
New Tools for Translators. p. 9 (2009)
8. Jaccard, P.: Étude comparative de la distribution florale dans une portion des Alpes et des
Jura. Bulletin de la Société Vaudoise des Sciences Naturelles 37, pp. 547-579 (1901)
9. Jaworski, R., Jassem, K.: Building High Quality Translation Memories Acquired from
Monolingual Corpora. Proceedings of the Intelligent Information Systems Conference pp.
157–168 (2010)
10. Jaworski, R.: Anubis – speeding up Computer-Aided Translation. Computational Linguistics
– Applications, Studies in Computational Intelligence vol. 458, Springer-Verlag (2013)
11. Kockaert, H.J., Vanallemeersch, T., Steurs, F.: Term-based context extraction in legal termi-
nology : a case study in Belgium. In: Fóris, A., Pusztay, J. (eds.) Current Trends in Terminol-
ogy: Proceedings of the International Conference on Terminology (Terminologia et Corpora
Supplementum 4). pp. 153–162 (2008)
12. Koehn, P., Germann, U.: The Impact of Machine Translation Quality on Human Post-Editing.
In: Proceedings of the EACL 2014 Workshop on Humans and Computer-assisted Trans-
lation. pp. 38–46. Association for Computational Linguistics, Gothenburg, Sweden (April
2014), http://www.aclweb.org/anthology/W14-0307
13. Mäkinen, V., Navarro, G.: Compressed compact suffix arrays. In: Proc. 15th Annual Symposium
on Combinatorial Pattern Matching (CPM), LNCS vol. 3109, pp. 420-433 (2004)
14. Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. First Annual
ACM-SIAM Symposium on Discrete Algorithms. pp. 319–327 (1990)
15. Kilgray Translation Technologies: memoQ Translator Pro. http://kilgray.com/products/memoq/
16. SDL: SDL Trados translation solution. http://www.sdl.com/en/sites/sdl-trados-solutions
17. O’Brien, S.: Towards predicting post-editing productivity. Machine Translation 25(3), 197–
215 (2011), http://dx.doi.org/10.1007/s10590-011-9096-7
18. Seljan, S., Gašpar, A., Pavuna, D.: Sentence Alignment as the Basis For Translation Mem-
ory Database. INFuture2007–The Future of Information Sciences: Digital Information and
Heritage. Zagreb: Odsjek za informacijske znanosti, Filozofski fakultet (2007)
19. Simard, M., Macklovitch, E.: Studying the Human Translation Process through the
TransSearch Log-Files. In: Knowledge Collection from Volunteer Contributors, Papers
from the 2005 AAAI Spring Symposium, Technical Report SS-05-03, Stanford, Califor-
nia, USA, March 21-23, 2005. pp. 70–77. AAAI (2005), http://www.aaai.org/Library/
Symposia/Spring/2005/ss05-03-011.php
20. Specia, L., Farzindar, A.: Estimating Machine Translation Post-Editing Effort with HTER.
In: AMTA Workshop Bringing MT to the User: MT Research and the Translation Industry.
Denver, Colorado (2010), http://www.mt-archive.info/JEC-2010-Specia.pdf
21. Tiedemann, J.: Parallel Data, Tools and Interfaces in OPUS. In: Calzolari, N., Choukri, K.,
Declerck, T., Doğan, M.U., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S.
(eds.) Proceedings of the Eight International Conference on Language Resources and Eval-
uation (LREC’12). European Language Resources Association (ELRA), Istanbul, Turkey
(May 2012)
22. Wu, J.C., Yeh, K.C., Chuang, T.C., Shei, W.C., Chang, J.S.: TotalRecall: A Bilingual Con-
cordance for Computer Assisted Translation and Language Learning. In: Funakoshi, K.,
Kübler, S., Otterbacher, J. (eds.) ACL (Companion). pp. 201–204. The Association for Com-
puter Linguistics (2003), http://dblp.uni-trier.de/db/conf/acl/acl2003c.html#
WuYCSC03