GenQA: A Method for Generating and Validating Question/Answer Pairs from Journalistic Data Material

Théo Lecardonnel; Christophe Hurter; Thomas Hurtut

doi:10.1109/PacificVis64226.2025.00036

Conference paper, Year: 2025

Théo Lecardonnel

Role: Author

EPM - École Polytechnique de Montréal

Christophe Hurter

Role: Author
PersonId : 6085
IdHAL : christophe-hurter
ORCID : 0000-0003-4318-6717
IdRef : 162363176

ENAC - Ecole Nationale de l'Aviation Civile

Thomas Hurtut

Role: Author
PersonId : 864515

EPM - École Polytechnique de Montréal

Abstract

Data visualizations are now commonly used in online press articles, where they often support engaging data-driven stories. However, due to its visual nature, this type of content inherently lacks accessibility (e.g., when one wants to consume these visualizations through conversational agents, listen to them in audio formats, or use a screen reader). Writing alternative text is the recommended standard for providing a text description associated with an image. However, newsrooms rarely produce alternative text for data visualizations, and when they do, it is overly simplistic. Several intertwined limitations explain this situation, such as the limited time journalists have to produce the detailed descriptions expected, or the lack of precise and standardized writing guidelines for describing visualizations. To address this issue, we propose a new approach to help journalists generate descriptions of visualizations, based on a set of generated question and answer pairs (hereafter referred to as Q/As). Because of the limitations listed above, our method first generates these Q/As using a generative AI model for Natural Language Processing (NLP). This approach lightens and homogenizes the writing workload and allows for a systematic and more exhaustive exploration of the possible Q/As for a given visualization. However, one of the critical challenges of using AI-based generative tools in a journalism context is the risk of publishing unreliable or biased information. Therefore, the methodology proposed in this paper gives the journalist a high level of control over the AI-generated Q/As. To enable and optimize this mandatory validation task, we design an interface where Q/As are grouped by semantic and textual content and by accessibility interest. Visual cues are also displayed to support the journalist's decision-making.
To evaluate this methodology, which we call GenQA, we conducted a comparative design study with journalists from two different Canadian newsrooms and with teachers. We observed that these users used GenQA efficiently and that it helped them produce detailed visualization descriptions that met their expectations in terms of quality and workload. The study also showed that GenQA has significant potential for serendipity, allowing users to explore and produce Q/As covering aspects they might not have considered.

Domains

Computer Science [cs]

Main file

Final GenQA_Article.pdf (1.56 MB)

Origin	Files produced by the author(s)
License	HAL authorization

https://enac.hal.science/hal-05382042

Submitted on: Tuesday, November 25, 2025, 16:30:25

Last modified on: Friday, February 27, 2026, 03:11:49

Dates and versions

hal-05382042 , version 1 (25-11-2025)

Identifiers

HAL Id : hal-05382042 , version 1
DOI : 10.1109/PacificVis64226.2025.00036

Cite

Théo Lecardonnel, Christophe Hurter, Thomas Hurtut. GenQA: A Method for Generating and Validating Question/Answer Pairs from Journalistic Data Material. 2025 IEEE 18th Pacific Visualization Conference (PacificVis), Apr 2025, Taipei City, Taiwan. pp.296-306, ⟨10.1109/PacificVis64226.2025.00036⟩. ⟨hal-05382042⟩
