Christian Buck
2022
Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation
Jannis Bulian | Christian Buck | Wojciech Gajewski | Benjamin Börschinger | Tal Schuster
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
The predictions of question answering (QA) systems are typically evaluated against manually annotated finite sets of one or more answers. This leads to a coverage limitation that results in underestimating the true performance of systems, and is typically addressed by extending over exact match (EM) with predefined rules or with the token-level F1 measure. In this paper, we present the first systematic conceptual and data-driven analysis to examine the shortcomings of token-level equivalence measures. To this end, we define the asymmetric notion of answer equivalence (AE), accepting answers that are equivalent to or improve over the reference, and publish over 23k human judgements for candidates produced by multiple QA systems on SQuAD. Through a careful analysis of this data, we reveal and quantify several concrete limitations of the F1 measure, such as a false impression of graduality, or missing dependence on the question. Since collecting AE annotations for each evaluated model is expensive, we learn a BERT matching (BEM) measure to approximate this task. Being a simpler task than QA, we find BEM to provide significantly better AE approximations than F1, and to more accurately reflect the performance of systems. Finally, we demonstrate the practical utility of AE and BEM on the concrete application of minimal accurate prediction sets, reducing the number of required answers by up to ×2.6.
Decoding a Neural Retriever’s Latent Space for Query Suggestion
Leonard Adolphs | Michelle Chen Huebscher | Christian Buck | Sertan Girgin | Olivier Bachem | Massimiliano Ciaramita | Thomas Hofmann
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Neural retrieval models have superseded classic bag-of-words methods such as BM25 as the retrieval framework of choice. However, neural systems lack the interpretability of bag-of-words models; it is not trivial to connect a query change to a change in the latent space that ultimately determines the retrieval results. To shed light on this embedding space, we learn a “query decoder” that, given a latent representation of a neural search engine, generates the corresponding query. We show that it is possible to decode a meaningful query from its latent representation and, when moving in the right direction in latent space, to decode a query that retrieves the relevant paragraph. In particular, the query decoder can be useful to understand “what should have been asked” to retrieve a particular paragraph from the collection. We employ the query decoder to generate a large synthetic dataset of query reformulations for MSMarco, leading to improved retrieval performance. On this data, we train a pseudo-relevance feedback (PRF) T5 model for the application of query suggestion that outperforms both query reformulation and PRF information retrieval baselines.
2017
Proceedings of the Second Conference on Machine Translation
Ondřej Bojar | Christian Buck | Rajen Chatterjee | Christian Federmann | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Julia Kreutzer
Proceedings of the Second Conference on Machine Translation
2016
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers
Ondřej Bojar | Christian Buck | Rajen Chatterjee | Christian Federmann | Liane Guillou | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Aurélie Névéol | Mariana Neves | Pavel Pecina | Martin Popel | Philipp Koehn | Christof Monz | Matteo Negri | Matt Post | Lucia Specia | Karin Verspoor | Jörg Tiedemann | Marco Turchi
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
Ondřej Bojar | Christian Buck | Rajen Chatterjee | Christian Federmann | Liane Guillou | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Aurélie Névéol | Mariana Neves | Pavel Pecina | Martin Popel | Philipp Koehn | Christof Monz | Matteo Negri | Matt Post | Lucia Specia | Karin Verspoor | Jörg Tiedemann | Marco Turchi
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
Findings of the WMT 2016 Bilingual Document Alignment Shared Task
Christian Buck | Philipp Koehn
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
Quick and Reliable Document Alignment via TF/IDF-weighted Cosine Distance
Christian Buck | Philipp Koehn
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers
2014
The MateCat Tool
Marcello Federico | Nicola Bertoldi | Mauro Cettolo | Matteo Negri | Marco Turchi | Marco Trombetti | Alessandro Cattelan | Antonio Farina | Domenico Lupinetti | Andrea Martines | Alberto Massidda | Holger Schwenk | Loïc Barrault | Frederic Blain | Philipp Koehn | Christian Buck | Ulrich Germann
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: System Demonstrations
CASMACAT: A Computer-assisted Translation Workbench
Vicent Alabau | Christian Buck | Michael Carl | Francisco Casacuberta | Mercedes García-Martínez | Ulrich Germann | Jesús González-Rubio | Robin Hill | Philipp Koehn | Luis Leiva | Bartolomé Mesa-Lao | Daniel Ortiz-Martínez | Herve Saint-Amand | Germán Sanchis Trilles | Chara Tsoukala
Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics
N-gram Counts and Language Models from the Common Crawl
Christian Buck | Kenneth Heafield | Bas van Ooyen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
We contribute 5-gram counts and language models trained on the Common Crawl corpus, a collection of over 9 billion web pages. This release improves upon the Google n-gram counts in two key ways: the inclusion of low-count entries and deduplication to reduce boilerplate. By preserving singletons, we were able to use Kneser-Ney smoothing to build large language models. This paper describes how the corpus was processed with emphasis on the problems that arise in working with data at this scale. Our unpruned Kneser-Ney English 5-gram language model, built on 975 billion deduplicated tokens, contains over 500 billion unique n-grams. We show gains of 0.5-1.4 BLEU by using large language models to translate into various languages.
Proceedings of the Ninth Workshop on Statistical Machine Translation
Ondřej Bojar | Christian Buck | Christian Federmann | Barry Haddow | Philipp Koehn | Christof Monz | Matt Post | Lucia Specia
Proceedings of the Ninth Workshop on Statistical Machine Translation
Findings of the 2014 Workshop on Statistical Machine Translation
Ondřej Bojar | Christian Buck | Christian Federmann | Barry Haddow | Philipp Koehn | Johannes Leveling | Christof Monz | Pavel Pecina | Matt Post | Herve Saint-Amand | Radu Soricut | Lucia Specia | Aleš Tamchyna
Proceedings of the Ninth Workshop on Statistical Machine Translation
FBK-UPV-UEdin participation in the WMT14 Quality Estimation shared-task
José Guilherme Camargo de Souza | Jesús González-Rubio | Christian Buck | Marco Turchi | Matteo Negri
Proceedings of the Ninth Workshop on Statistical Machine Translation
2013
Advanced computer aided translation with a web-based workbench
Vicent Alabau | Ragnar Bonk | Christian Buck | Michael Carl | Francisco Casacuberta | Mercedes García-Martínez | Jesús González | Philipp Koehn | Luis Leiva | Bartolomé Mesa-Lao | Daniel Oriz | Hervé Saint-Amand | Germán Sanchis | Chara Tsiukala
Proceedings of the 2nd Workshop on Post-editing Technology and Practice
Proceedings of the Eighth Workshop on Statistical Machine Translation
Ondrej Bojar | Christian Buck | Chris Callison-Burch | Barry Haddow | Philipp Koehn | Christof Monz | Matt Post | Herve Saint-Amand | Radu Soricut | Lucia Specia
Proceedings of the Eighth Workshop on Statistical Machine Translation
Findings of the 2013 Workshop on Statistical Machine Translation
Ondřej Bojar | Christian Buck | Chris Callison-Burch | Christian Federmann | Barry Haddow | Philipp Koehn | Christof Monz | Matt Post | Radu Soricut | Lucia Specia
Proceedings of the Eighth Workshop on Statistical Machine Translation
The Feasibility of HMEANT as a Human MT Evaluation Metric
Alexandra Birch | Barry Haddow | Ulrich Germann | Maria Nadejde | Christian Buck | Philipp Koehn
Proceedings of the Eighth Workshop on Statistical Machine Translation
FBK-UEdin Participation to the WMT13 Quality Estimation Shared Task
José Guilherme Camargo de Souza | Christian Buck | Marco Turchi | Matteo Negri
Proceedings of the Eighth Workshop on Statistical Machine Translation
2012
Co-authors
- Philipp Koehn 13
- Barry Haddow 8
- Ondřej Bojar 7
- Christian Federmann 6
- Christof Monz 6
- Matt Post 6
- Lucia Specia 6
- Matteo Negri 5
- Marco Turchi 5
- Herve Saint-Amand 4
- Rajen Chatterjee 3
- Ulrich Germann 3
- Matthias Huck 3
- Antonio Jimeno Yepes 3
- Pavel Pecina 3
- Radu Soricut 3
- Vicent Alabau 2
- Nicola Bertoldi 2
- José G. C. de Souza 2
- Chris Callison-Burch 2
- Michael Carl 2
- Francisco Casacuberta 2
- Mauro Cettolo 2
- Marcello Federico 2
- Mercedes García-Martínez 2
- Jesús González-Rubio 2
- Liane Guillou 2
- Luis A. Leiva 2
- Bartolomé Mesa-Lao 2
- Aurelie Neveol 2
- Mariana Neves 2
- Martin Popel 2
- Germán Sanchis-Trilles 2
- Jörg Tiedemann 2
- Karin Verspoor 2
- Leonard Adolphs 1
- Olivier Bachem 1
- Loic Barrault 1
- Alexandra Birch 1
- Frédéric Blain 1
- Ragnar Bonk 1
- Jannis Bulian 1
- Benjamin Börschinger 1
- Alessandro Cattelan 1
- Michelle Chen Huebscher 1
- Massimiliano Ciaramita 1
- Antonio Farina 1
- Wojciech Gajewski 1
- Sertan Girgin 1
- Jesús González 1
- Yvette Graham 1
- Kenneth Heafield 1
- Robin L. Hill 1
- Thomas Hofmann 1
- Julia Kreutzer 1
- Johannes Leveling 1
- Domenico Lupinetti 1
- Andrea Martines 1
- Alberto Massidda 1
- Maria Nadejde 1
- Daniel Oriz 1
- Daniel Ortiz-Martínez 1
- Tal Schuster 1
- Holger Schwenk 1
- Aleš Tamchyna 1
- Marco Trombetti 1
- Chara Tsiukala 1
- Chara Tsoukala 1
- Bas van Ooyen 1