Code to reproduce the experiments in "A Simple Transfer Learning Baseline for Ellipsis Resolution".
Requires Python >= 3.5.0
Recommended: Create a conda environment with conda create -n myenv python=3.7
The repository contains conversion scripts for converting different datasets into the SQuAD 1.1 format.
- vpe2squad.py: Convert VP ellipsis dataset into SQuAD format
- conll2squad.py: Convert coreference data from CoNLL-2012 to SQuAD format
  - First convert .conll files to .jsonlines using this
  - Set ONTONOTES_DIR (ontonotes folder path) and set2fmt (filename to convert to SQuAD format)
  - Run script
- sluice2squad.py: Convert sluice ellipsis dataset into SQuAD format
- wikicoref2conll.py: Convert WikiCoref dataset into CoNLL-2012 format
- squad2conll.py: Convert the prediction files produced by bert/run_squad.py into CoNLL format for evaluation
- annotate_qwords.py: Adds <ref> and </ref> tags to interrogative words in SQuAD files
- evaluate-v1.1.py: Standard SQuAD v1.1 evaluation script (for evaluating ellipsis)
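
For orientation, the SQuAD 1.1 layout that the conversion scripts target can be sketched as follows. This is a minimal illustration, not the repository's conversion code: the helper make_squad_example, the example sentence, the question, and the id are all hypothetical. Only the nesting of keys (data, paragraphs, context, qas, answers, answer_start) follows the SQuAD 1.1 format.

```python
import json


def make_squad_example(context, question, answer_text, qid, title="example"):
    """Wrap one (context, question, answer) triple in SQuAD 1.1 structure.

    Hypothetical helper for illustration only; the repository's scripts
    may build their output differently.
    """
    # SQuAD 1.1 stores each answer with its character offset in the context.
    answer_start = context.find(answer_text)
    if answer_start < 0:
        raise ValueError("answer text must occur verbatim in the context")
    return {
        "version": "1.1",
        "data": [
            {
                "title": title,
                "paragraphs": [
                    {
                        "context": context,
                        "qas": [
                            {
                                "id": qid,
                                "question": question,
                                "answers": [
                                    {
                                        "text": answer_text,
                                        "answer_start": answer_start,
                                    }
                                ],
                            }
                        ],
                    }
                ],
            }
        ],
    }


# Made-up sluice-style item: the antecedent span plays the role of the answer.
example = make_squad_example(
    context="She called someone, but I don't know who.",
    question="but I don't know who",
    answer_text="She called someone",
    qid="demo-0",
)
print(json.dumps(example, indent=2))
```

Because the answer is identified by a character offset, a converted file is only valid if each answer text occurs verbatim in its context, which is why the sketch raises an error otherwise.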
For coreference resolution, use the standard CoNLL-2012 script after converting the predictions into the CoNLL-2012 format using squad2conll.py.
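
For the ellipsis datasets, the SQuAD v1.1 metrics are exact match and token-level F1 over normalized answer strings. The sketch below illustrates that computation for one prediction against one gold answer. It is a simplified re-implementation with function names chosen here for illustration; reported numbers should come from evaluate-v1.1.py itself, which additionally takes the maximum score over all gold answers for a question.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """True if the two strings are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(gold)


def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up prediction that recovers two of the three gold tokens.
print(exact_match("called someone", "She called someone"))
print(token_f1("called someone", "She called someone"))
```

A partially correct span scores zero on exact match but still earns F1 credit, so the two numbers together show how close the predicted antecedent boundaries are to the gold ones.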
Each model folder contains pre-processing, configuration, training and evaluation scripts for Sluice Ellipsis. To run on other datasets, just replace the data paths appropriately.
- Code based on Facebook's DrQA
- Scripts for preprocessing, training and prediction
- Code based on AllenNLP
- AllenNLP configuration file
- Scripts for training and prediction
- Uses Huggingface's Transformers
- Scripts for training and evaluation