train-eval-hlv

Code for the paper "Training and Evaluating with Human Label Variation: An Empirical Study"

Requirements

  1. Linux with CUDA 12.1
  2. Python 3.10+
  3. pip-tools

Install dependencies with pip-sync requirements.txt.

If you have a non-Linux system or a different version of CUDA/Python, you may need to generate the appropriate requirements.txt yourself; see the example command below for reference.
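For instance, a minimal pip-tools sketch, assuming the dependency specification lives in a requirements.in file (the actual file name in this repo may differ):

pip-compile --output-file=requirements.txt requirements.in
pip-sync requirements.txt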

Tests

Ensure that everything works by running the tests with pytest. By default, slow tests (e.g., training models) are skipped. Run them by passing the -m slow argument: pytest -m slow.

Also, tests in module test_flair are skipped by default. You may want to run them to verify the FlairNLP version by passing the -k test_flair argument: pytest -k test_flair. Combine with -m slow to run the slow tests too.
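For example, to run the slow tests in test_flair:

pytest -k test_flair -m slow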

Data

All Python scripts in this section can be invoked with the -h or --help option to show their usage.

  1. Download the following datasets:
  2. Convert the datasets into JSONL format using convert_dataset.py
  3. Create 10-fold train-test splits for ChaosNLI and MFRC using create_kfold_splits.py
  4. Create a dev set for each fold using create_random_split.py
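Schematically, the pipeline looks as follows, with the script arguments elided (consult each script's --help for the actual interface):

./convert_dataset.py ...       # step 2: raw datasets -> JSONL
./create_kfold_splits.py ...   # step 3: 10-fold train-test splits (ChaosNLI, MFRC)
./create_random_split.py ...   # step 4: dev set for each fold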

Training

./run_training.py train with chaosnli data_dir=data artifacts_dir=artifacts method=ReL

The above invocation trains a base RoBERTa model on the ChaosNLI data under directory data using the repeated labelling (ReL) method and saves the training artifacts (incl. the trained model parameters) under directory artifacts. The arguments train and chaosnli are a command and a named configuration respectively, while key-value pairs such as data_dir=data are (unnamed) configurations. The train command can be omitted as it is the default command. The full list of commands can be viewed by executing the help command without any arguments.

The script also accepts extra configurations as a JSON file, which is useful for hyperparameter tuning with random search (see below).
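For illustration, such a file might look as follows; the key names here are hypothetical and must match the script's actual configuration names (cf. generate_configs.py below):

{"batch_size": 16, "lr": 3e-5}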

Random search

./generate_configs.py output_dir -n 20

The above invocation generates 20 JSON files under directory output_dir, each containing a randomly sampled batch size and learning rate configuration. Invoke with -h or --help for detailed usage.
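A generated file can then be passed to run_training.py as an extra configuration via Sacred's with mechanism (the file name below is illustrative):

./run_training.py train with chaosnli output_dir/0.json data_dir=data artifacts_dir=artifacts method=ReL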

Evaluation

The run_training.py script already performs evaluation after training completes. This section is mainly for evaluation with K-fold cross-validation (ChaosNLI and MFRC).

./run_eval.py with artifacts_dir=artifacts

The above invocation evaluates the training artifacts under directory artifacts. For K-fold CV, the per-fold artifacts hlv.npy and label.dict must be concatenated across folds beforehand.
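A minimal sketch of this concatenation for hlv.npy, assuming the artifacts of fold k live under artifacts/fold-k (the actual directory layout may differ):

import numpy as np

# Stack the per-fold HLV arrays along the first axis and save the result.
hlv = np.concatenate([np.load(f"artifacts/fold-{k}/hlv.npy") for k in range(10)])
np.save("artifacts/hlv.npy", hlv)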

Annotation

NOTE: In this section, the term "HLV" refers to "human judgement distribution".

All Python scripts in this section can be invoked with the -h or --help option to show their usage.

Preprocessing

  1. Compute the mean HLV across runs for all models using compute_method2hlv.py
  2. Create the annotation input data using create_hlv_annotation_input.py --vs method
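Schematically, in sequence and with the remaining arguments elided (consult each script's --help):

./compute_method2hlv.py ...
./create_hlv_annotation_input.py --vs method ...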

Annotators are expected to annotate the annotation input data. The expected format of the annotation output data is JSON, whose schema can be viewed under directory schemas. The following instructions assume that the annotation output data exists.

Postprocessing

Compute the number of pairwise wins using compute_pairwise_wins2.py. The output directory will contain two files:

  1. counts.npz, which contains two NumPy arrays, each of shape 2 x M x M, where the first dimension indexes the pretrained model (0=RoBERTa, 1=LLaMA) and M is the number of HLV training methods, under the following keys:
    1. comps, a symmetric matrix with an all-zeros diagonal storing the number of comparisons between each pair of methods; and
    2. wins, which stores the number of comparisons where the column method wins over the row method.
  2. method.dict, a serialised FlairNLP Dictionary containing the method names and indices corresponding to the rows and columns of the matrices in counts.npz.

The scores of all methods are then obtained by running the rank centrality algorithm on these counts.
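The following is a minimal NumPy sketch of that step, not the repo's actual implementation, following the standard rank centrality construction: a Markov chain over methods whose stationary distribution gives the scores.

import numpy as np

def rank_centrality(wins, comps):
    # Build a Markov chain over methods: the probability of moving from
    # method i to method j is proportional to the fraction of their
    # comparisons that j (the column method) won against i (the row method).
    M = wins.shape[0]
    d_max = (comps > 0).sum(axis=1).max()  # largest number of opponents
    P = np.zeros((M, M))
    for i in range(M):
        for j in range(M):
            if i != j and comps[i, j] > 0:
                P[i, j] = wins[i, j] / comps[i, j] / d_max
        P[i, i] = 1.0 - P[i].sum()  # self-loop so each row sums to one
    # The stationary distribution of the chain is the score vector
    # (the left eigenvector of P for eigenvalue 1).
    eigvals, eigvecs = np.linalg.eig(P.T)
    pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return pi / pi.sum()

counts = np.load("counts.npz")
roberta_scores = rank_centrality(counts["wins"][0], counts["comps"][0])
llama_scores = rank_centrality(counts["wins"][1], counts["comps"][1])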

MongoDB integration with Sacred

Both the run_training.py and run_eval.py scripts use Sacred and have its MongoObserver activated by default. Set the SACRED_MONGO_URL (and optionally SACRED_DB_NAME) environment variable(s) to write experiment runs to a MongoDB instance. For example, set SACRED_MONGO_URL=mongodb://localhost:27017 if the MongoDB instance is listening on port 27017 on the local machine.
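In a shell, this could look as follows (the database name is illustrative):

export SACRED_MONGO_URL=mongodb://localhost:27017
export SACRED_DB_NAME=train_eval_hlv
./run_training.py train with chaosnli data_dir=data artifacts_dir=artifacts method=ReL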

License

Apache License, Version 2.0

Citation

@article{kurniawan2025,
  title = {Training and {{Evaluating}} with {{Human Label Variation}}: {{An Empirical Study}}},
  author = {Kurniawan, Kemal and Mistica, Meladel and Baldwin, Timothy and Lau, Jey Han},
  year = 2025,
  journal = {Computational Linguistics},
  pages = {1--27},
  issn = {1530-9312},
  doi = {10.1162/COLI.a.578},
}
