Research Synopsis
1.1. Background
Neural Machine Translation (NMT) has substantially improved the quality of automatic
translation for many languages by leveraging deep learning models such as the Transformer,
BERT, and mBART. However, low-resource languages such as Tigrigna (spoken primarily in
Ethiopia and Eritrea) lack sufficient parallel corpora, which limits the performance of
machine translation systems.
1.2. Problem Statement
The central problem is the lack of a large-scale parallel corpus for Tigrigna-English
translation. Existing machine translation systems therefore perform poorly on Tigrigna,
and the morphological richness and syntactic complexity of the language make translation
even more challenging.
1.3. Research Questions
1. How can transfer learning improve the translation quality of Tigrigna-English NMT
models?
2. What data augmentation techniques (such as synthetic data generation) can help
overcome parallel corpus limitations?
3. How effective are pre-trained multilingual models (such as mBART and mT5) for
Tigrigna NMT? (A fine-tuning sketch follows this list.)
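To make question 3 concrete, the sketch below shows how a pre-trained multilingual model could be loaded and prepared for Tigrigna-English fine-tuning with Hugging Face Transformers. It is a minimal illustration, not a settled design: because Tigrigna is not among mBART-50's pretraining languages, the sketch reuses an existing language token (sw_KE) as a stand-in source code, which is an assumption for illustration only.

    # Sketch: preparing mBART-50 for Tigrigna->English fine-tuning.
    # Assumption: Tigrigna is absent from mBART-50's pretraining languages,
    # so an existing language token (sw_KE) is reused as a stand-in.
    from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

    model_name = "facebook/mbart-large-50-many-to-many-mmt"
    tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
    model = MBartForConditionalGeneration.from_pretrained(model_name)

    tokenizer.src_lang = "sw_KE"  # stand-in token for Tigrigna (assumption)
    tokenizer.tgt_lang = "en_XX"  # English target

    def encode_pair(ti_text, en_text):
        # Tokenize one Tigrigna-English pair; `labels` is built from en_text.
        return tokenizer(ti_text, text_target=en_text, truncation=True,
                         max_length=128, return_tensors="pt")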
1.4. Research Significance
2.2. Specific Objectives
To deploy and test the developed model in real-world applications such as Tigrigna AI
chatbots, multilingual search engines, and speech-to-text translation services.
3. Literature Review
3.1. Overview of Neural Machine Translation (NMT)
Limited Parallel Data: Unlike English or French, Tigrigna lacks a large, high-quality
bilingual dataset.
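One widely used remedy for this scarcity, anticipated by research question 2, is back-translation: abundant monolingual English text is translated into Tigrigna by a baseline reverse model, yielding synthetic source sentences paired with authentic English targets. The sketch below is a minimal version; reverse_model and reverse_tokenizer stand for any available English-to-Tigrigna baseline and are assumptions here, not a specific library API.

    # Sketch: back-translation for synthetic parallel data.
    # `reverse_model`/`reverse_tokenizer` are an assumed English->Tigrigna
    # baseline seq2seq model and its tokenizer.
    import torch

    @torch.no_grad()
    def back_translate(english_sentences, reverse_model, reverse_tokenizer):
        # Returns (synthetic Tigrigna, real English) pairs for augmentation.
        pairs = []
        for en in english_sentences:
            inputs = reverse_tokenizer(en, return_tensors="pt", truncation=True)
            out = reverse_model.generate(**inputs, max_length=128, num_beams=4)
            ti = reverse_tokenizer.decode(out[0], skip_special_tokens=True)
            pairs.append((ti, en))
        return pairs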
4. Research Methodology
This research will follow an experimental approach, combining data collection, model
training, and evaluation to optimize the Tigrigna-English NMT system.
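For the data-collection step, a simple cleaning pass over the gathered sentence pairs helps before training. The sketch below drops pairs with implausible token-length ratios; the thresholds are assumptions to be tuned empirically, not validated values.

    # Sketch: filtering collected Tigrigna-English pairs by length ratio.
    # max_ratio/max_len are illustrative thresholds.
    def filter_parallel(pairs, max_ratio=2.5, max_len=128):
        kept = []
        for ti, en in pairs:
            ti_len, en_len = len(ti.split()), len(en.split())
            if 0 < ti_len <= max_len and 0 < en_len <= max_len:
                if max(ti_len, en_len) / min(ti_len, en_len) <= max_ratio:
                    kept.append((ti, en))
        return kept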
Training Process:
o Use GPU-based training with TensorFlow or PyTorch, as sketched below.
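As a concrete illustration of this step, the PyTorch sketch below performs a single GPU training step, reusing the model and encode_pair helper from the earlier mBART sketch; the learning rate is an assumption.

    # Sketch: one GPU training step in PyTorch, reusing `model` and
    # `encode_pair` from the earlier mBART sketch. lr is an assumption.
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

    def train_step(ti_text, en_text):
        # Move the tokenized pair to the GPU and take one optimizer step.
        batch = {k: v.to(device) for k, v in encode_pair(ti_text, en_text).items()}
        optimizer.zero_grad()
        loss = model(**batch).loss  # cross-entropy over target tokens
        loss.backward()
        optimizer.step()
        return loss.item()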
Evaluation Metrics:
o Collect feedback from linguists and native Tigrigna speakers for manual evaluation.
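Alongside manual review, a standard automatic metric such as BLEU can be reported. The sketch below computes corpus-level BLEU with the sacreBLEU library; the library choice and the example sentences are assumptions for illustration.

    # Sketch: corpus-level BLEU with sacreBLEU; sentences are toy examples.
    import sacrebleu

    hypotheses = ["hello how are you"]       # model outputs
    references = [["Hello, how are you?"]]   # one list per reference set
    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU = {bleu.score:.2f}")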