CNIMA (Chinese Non-Native Interactivity Measurement and Automation) is a fully annotated spoken dialogue dataset for the automated evaluation of dialogue quality and automated dialogue annotation in Chinese as a Second Language (CSL) conversations.
It was created for the CNIMA project:
For more details, please read:
Dialogues labelled for three levels (from the above paper) can be found in CNIMA data.
Features extracted from the annotated datasets can be found in feature_label.csv.
The Python script data/explore_data.py provides an example of interfacing with the data.
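As a rough sketch of a first look at the extracted features, the helper below lists the columns and counts the rows of a CSV file. It assumes feature_label.csv is a comma-separated file with a header row; the column names are not documented here, so none are assumed. The function name `summarize_features` is hypothetical and is not taken from data/explore_data.py.

```python
import csv
from pathlib import Path


def summarize_features(csv_path):
    """Return (column_names, row_count) for a CSV file with a header row.

    Hypothetical helper: the layout of feature_label.csv is assumed, not documented.
    """
    # utf-8-sig reads plain UTF-8 and also strips a leading BOM if one is present.
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        columns = list(reader.fieldnames or [])
    return columns, row_count


if __name__ == "__main__":
    # The path is an assumption; adjust it to where feature_label.csv lives.
    path = Path("feature_label.csv")
    if path.exists():
        columns, row_count = summarize_features(path)
        print(f"{row_count} rows, columns: {columns}")
    else:
        print(f"{path} not found; download the dataset first.")
```

Only the standard library is used here, so the sketch runs without installing anything; for actual analysis, data/explore_data.py remains the reference example.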
- dataset/CNIMA - Sample dataset. The full dataset can be downloaded via: https://drive.google.com/drive/folders/1uOzINQoNLxdina5tJ5bnumn4i1M5lnAf?usp=drive_link
- Dataset Viewing
  - To run the notebooks for examining the datasets, follow the steps below:
    - Download the dataset from the shared folder link.
    - Put the data into dataset/CNIMA and extract sample.zip.
    - To view the data, use preprocessing.ipynb to browse the examples in the example folder.
- notebooks/a.ipynb - Notebook for preprocessing
- notebooks/b.ipynb - Notebook for the main experiments in micro-level prediction
- notebooks/c.ipynb - Notebook for additional experiments in score prediction
- folder/d. - Notebook for prompting LLMs in annotation and score prediction
- notebooks/.js - Code for the data annotation website and platform
- CSL.html - Interface for the data annotation website and platform
- figures/ - Contains all figures used for this project
- utils/ - Contains all utility functions for this project
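
The extraction step in the Dataset Viewing procedure can also be done from Python. This is a minimal sketch, assuming sample.zip has already been downloaded into dataset/CNIMA; the helper name `extract_sample` is hypothetical and is not part of utils/.

```python
import zipfile
from pathlib import Path


def extract_sample(zip_path, dest_dir):
    """Extract a zip archive into dest_dir and return the archive's member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    # Paths follow this README; the location of sample.zip is an assumption.
    archive_path = Path("dataset/CNIMA/sample.zip")
    if archive_path.exists():
        names = extract_sample(archive_path, archive_path.parent)
        print(f"Extracted {len(names)} entries into {archive_path.parent}")
    else:
        print(f"{archive_path} not found; download it from the shared folder first.")
```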