GenQA: A Method for Generating and Validating Question/Answer Pairs from Journalistic Data Material
Abstract
Data visualizations are now commonly used in online press articles, where they often support engaging data-driven stories. However, because of its visual nature, this type of content inherently lacks accessibility (e.g., when one wants to consume those visualizations through conversational agents, in audio formats, or with a screen reader). Writing alternative text is the recommended standard for providing a text description associated with an image. However, newsrooms rarely produce such descriptions for data visualizations, and when they do, the descriptions are overly simplistic. Several intertwined limitations explain this situation, such as the limited time journalists have to produce the expected detailed descriptions and the lack of precise, standardized writing guidelines for describing visualizations. To address this issue, we propose a new approach to help journalists generate descriptions of visualizations, based on a set of generated question and answer pairs (hereafter referred to as Q/A). Given the limitations listed above, our method first generates those Q/As using a generative Natural Language Processing (NLP) model. This approach reduces the workload of the writing task, makes it more uniform, and allows a systematic and more exhaustive exploration of the possible Q/As for a given visualization. However, one of the critical challenges of using AI-based generative tools in a journalism context is the risk of publishing unreliable or biased information. Therefore, the methodology proposed in this paper gives the journalist a high level of control over the AI-generated Q/As. To enable and optimize this mandatory validation task, we design an interface where Q/As are grouped by semantic and textual content and by accessibility interest. Visual cues are also displayed to support the journalist's decision-making.
To evaluate this methodology, which we call GenQA, we conducted a comparative design study with teachers and with journalists from two Canadian newsrooms. We observed that these users worked efficiently with GenQA and that it helped them produce detailed visualization descriptions that met their expectations in terms of quality and workload. The study also showed that GenQA offered significant potential for serendipity, allowing users to explore and produce Q/As covering aspects they might not otherwise have considered.
| Origin | Files produced by the author(s) |
|---|---|
| License | |
