@dsfsi

Data Science for Social Impact Research Group @ University of Pretoria

We are the Data Science for Social Impact research group at the Computer Science Department, University of Pretoria.

Our general areas of work straddle Data Science for Society and Local Language Natural Language Processing. These two strands are complementary. Our work in Data Science for Society has given us a more nuanced understanding of the systemic challenges to doing excellent science with local languages. Through Data Science for Society, we have learned to consider how users are part of the process whenever we carry out Data Science research. We find that we need to adapt our research to address these challenges and to innovate in the ways we gather direct or alternative data.

For us, Data Science for Society means improving approaches, methods, and scientific tools for Data Science while enhancing the ways decision-makers can use the insights that come from these tools. Local Language Natural Language Processing focuses on developing new tools, data, and methodologies that improve the state of African languages.

DSFSI Vision, Mission and Values

Vision

To be a leading inclusive lab that creates and harnesses data and multidisciplinary scientific exploration for societal impact.

Mission

Data-driven collaborative innovation to empower society to tackle challenges and preserve our languages.

Values

  • Community and Collaboration
  • Shared responsibility
  • Inclusiveness
  • Integrity and openness
  • Agency
  • Generosity

Pinned

  1. vukuzenzele-nlp Public

    Forked from dsfsi/dsfsi-dataset-template

    The dataset contains editions from the South African government magazine Vuk'uzenzele. Data was scraped from PDFs that have been placed in the data/raw folder. The PDFs were obtained from the Vuk'u…

    Jupyter Notebook · 7 stars · 6 forks

  2. textaugment Public

    TextAugment: Text Augmentation Library

    Python · 431 stars · 60 forks

  3. covid19za Public

    Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa

    Jupyter Notebook · 256 stars · 197 forks

  4. dsfsi-datasets Public

    Official DSFSI Public Datasets Registry: a comprehensive catalog of 50+ datasets for South African and African languages. Includes speech recognition, NLP, terminology, health, legal & financial data…

    Jupyter Notebook · 3 stars · 3 forks

  5. deadlines Public

    Forked from vukosim/ai-ds-africa-deadlines

    ⏰ AI/ML/DS conference/workshop/event deadlines on the African continent

    HTML · 22 stars · 11 forks

  6. za-mafoko Public

    DSFSI South African Terminology Lists and Lexicon Project

    HTML · 3 stars · 1 fork

Repositories

Showing 10 of 58 repositories
  • lacunafund-datasets Public

    Forked from Fair-Forward/datasets

    A catalog that links to the datasets and use cases of the Lacuna Fund

    Python · CC0-1.0 · 1 star · 3 forks · 0 issues · 0 pull requests · Updated Jan 7, 2026
  • cos802 Public

    Defense against the dark text arts

    SCSS · MIT · 0 stars · 1 fork · 0 issues · 0 pull requests · Updated Jan 3, 2026
  • mit808 Public

    Data Science Capstone

    SCSS · MIT · 0 stars · 0 forks · 0 issues · 0 pull requests · Updated Jan 3, 2026
  • vukuzenzele-nlp Public

    Forked from dsfsi/dsfsi-dataset-template

    The dataset contains editions from the South African government magazine Vuk'uzenzele. Data was scraped from PDFs that have been placed in the data/raw folder. The PDFs were obtained from the Vuk'uzenzele website.

    Jupyter Notebook · MIT · 7 stars · 7 forks · 3 issues · 0 pull requests · Updated Jan 2, 2026
  • dsfsi-datasets Public

    Official DSFSI Public Datasets Registry: a comprehensive catalog of 50+ datasets for South African and African languages. Includes speech recognition, NLP, terminology, health, legal & financial data across HuggingFace, GitHub, Zenodo & more.

    Jupyter Notebook · MIT · 3 stars · 3 forks · 0 issues · 0 pull requests · Updated Dec 31, 2025
  • gov-za-multilingual Public

    The dataset contains cabinet statements from the South African government. Data was scraped from the government's website: https://www.gov.za/cabinet-statements

    Jupyter Notebook · MIT · 4 stars · 0 forks · 0 issues · 0 pull requests · Updated Dec 31, 2025
  • PuoBERTa Public

    A RoBERTa-based language model specially designed for Setswana, built using the new PuoData dataset.

    Makefile · 5 stars · 0 forks · 0 issues · 0 pull requests · Updated Dec 30, 2025
  • textaugment Public

    TextAugment: Text Augmentation Library

    Python · MIT · 431 stars · 60 forks · 1 issue · 1 pull request · Updated Dec 10, 2025
  • za-mafoko Public

    DSFSI South African Terminology Lists and Lexicon Project

    HTML · 3 stars · 1 fork · 0 issues · 0 pull requests · Updated Nov 9, 2025
  • za-mafoko-translation

    Python · 0 stars · 0 forks · 0 issues · 0 pull requests · Updated Sep 28, 2025