The dataset is associated with the ACL 2025 paper titled "Towards Geo-Culturally Grounded LLM Generations" by Piyawat Lertvittayakumjorn, David Kinney, Vinodkumar Prabhakaran, Donald Martin Jr., and Sunipa Dev.
This dataset was collected during the human evaluation (Section 4) of our paper. It consists of pairs of prompts and AI-generated text for 10 countries (China, Ethiopia, Greece, Indonesia, Iran, Mexico, South Korea, Spain, the United Kingdom, and the United States), accompanied by human annotations assessing the cultural familiarity of each generated text on a scale from 0 to 4. The text was generated by the Gemini 1.5 Flash model (with different augmentation or grounding methods), and each text was annotated by annotators from the country named in its prompt. More details can be found in the data card.
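As a rough illustration of how annotations of this kind might be summarized, the sketch below averages familiarity ratings per country. The field names (`country`, `rating`) and the sample rows are placeholders invented for this example; they are not the dataset's actual schema or contents, so check the data card for the real column names.

```python
from collections import defaultdict

# Hypothetical record layout; the real column names are given in the data card.
# These rows are made-up placeholders, not actual annotations from the dataset.
SAMPLE_ROWS = [
    {"country": "Greece", "rating": 3},
    {"country": "Greece", "rating": 4},
    {"country": "Mexico", "rating": 2},
]


def mean_rating_by_country(rows):
    """Average the 0-4 familiarity ratings for each country."""
    totals = defaultdict(lambda: [0, 0])  # country -> [sum of ratings, count]
    for row in rows:
        rating = row["rating"]
        # The scale described above runs from 0 to 4 inclusive.
        if not 0 <= rating <= 4:
            raise ValueError(f"rating outside the 0-4 scale: {rating}")
        totals[row["country"]][0] += rating
        totals[row["country"]][1] += 1
    return {country: s / n for country, (s, n) in totals.items()}


print(mean_rating_by_country(SAMPLE_ROWS))
```

The range check is there only to catch rows that do not follow the 0 to 4 scale; how the paper itself aggregates ratings is described in its Section 4, not here.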
The following table is necessary for this dataset to be indexed by search engines such as Google Dataset Search.
property | value |
---|---|
name | Cultural Familiarity Annotations |
description | The dataset consists of pairs of prompts and AI-generated text for 10 countries (China, Ethiopia, Greece, Indonesia, Iran, Mexico, South Korea, Spain, the United Kingdom, and the United States), accompanied by human annotations assessing the cultural familiarity of each generated text on a scale from 0 to 4. The text was generated by the Gemini 1.5 Flash model (with different augmentation or grounding methods), and each text was annotated by annotators from the country named in its prompt. |
sameAs | https://github.com/google-research-datasets/cultural_familiarity_annotations |
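Dataset search engines generally read properties like these as schema.org `Dataset` metadata. As a minimal sketch, the same three properties could be expressed as JSON-LD as shown below. The helper name `dataset_jsonld` is hypothetical, the description is abbreviated for readability, and this is not necessarily how the repository itself publishes its metadata.

```python
import json


def dataset_jsonld(name, description, same_as):
    """Build a schema.org Dataset description as a JSON-LD string."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "sameAs": same_as,
    }
    return json.dumps(record, indent=2, ensure_ascii=False)


markup = dataset_jsonld(
    name="Cultural Familiarity Annotations",
    # Abbreviated here; the table above carries the full description.
    description=(
        "Pairs of prompts and AI-generated text for 10 countries, with human "
        "annotations of cultural familiarity on a scale from 0 to 4."
    ),
    same_as="https://github.com/google-research-datasets/cultural_familiarity_annotations",
)
print(markup)
```

Markup of this kind is typically embedded in a page inside a `<script type="application/ld+json">` element.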
If you use or refer to this cultural familiarity annotation dataset, please cite the following paper.
@inproceedings{lertvittayakumjorn-etal-2025-towards,
title = "Towards Geo-Culturally Grounded {LLM} Generations",
author = "Lertvittayakumjorn, Piyawat and
Kinney, David and
Prabhakaran, Vinodkumar and
Martin, Jr., Donald and
Dev, Sunipa",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.acl-short.26/",
doi = "10.18653/v1/2025.acl-short.26",
pages = "313--330",
ISBN = "979-8-89176-252-7"
}
Contact: Piyawat Lertvittayakumjorn (firstname [at] google [dot] com)