RenaGao/CSL2024

CSL2024

CNIMA (Chinese Non-Native Interactivity Measurement and Automation) is a fully annotated spoken dialogue dataset for automatically evaluating dialogue quality and dialogue annotations in Chinese as a Second Language (CSL) conversations.

It was created for the CNIMA project:

For more details, please read:

Dialogues labelled for three levels (from the above paper) can be found in CNIMA data.

Features extracted from the annotated datasets can be found in feature_label.csv.

The Python script data/explore_data.py provides an example of interfacing with the data.
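For a quick look at feature_label.csv without opening the notebooks, a minimal sketch along the following lines may help. It is not taken from data/explore_data.py. The file location (repository root) is an assumption, the helper names `load_feature_table` and `summarize_column` are hypothetical, and no column names are assumed because the README does not list them.

```python
import csv
from collections import Counter
from pathlib import Path


def load_feature_table(path):
    """Read a CSV file into a list of dicts, one per row, keyed by header name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def summarize_column(rows, column):
    """Count how often each value occurs in one column."""
    return Counter(row[column] for row in rows)


if __name__ == "__main__":
    # Assumed location: the repository root. Adjust to your checkout.
    csv_path = Path("feature_label.csv")
    if csv_path.exists():
        rows = load_feature_table(csv_path)
        print(f"{len(rows)} rows")
        if rows:
            # Print the header names so you can pick a column to summarize.
            print("columns:", list(rows[0].keys()))
    else:
        print(f"{csv_path} not found; adjust the path to your checkout")
```

Once the column names are known, `summarize_column(rows, "<column name>")` gives a quick view of how the values in a label column are distributed.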

Dataset

Notebooks

  • notebooks/a.ipynb - Notebook for preprocessing
  • notebooks/b.ipynb - Notebook for main experiments in micro-level prediction
  • notebooks/c.ipynb - Notebook for additional experiments in score prediction
  • folder/d. - Notebook for prompting LLMs for annotation and score prediction
  • notebooks/.js - Code for the data annotation website and platform
  • CSL.html - Interface for data annotation website and platform

Figures

  • figures/ - Contains all figures used for this project

Utils

  • utils/ - Contains all utility functions for this project

Reports

About

This repository hosts the CSL CNIMA datasets and related code.
