University of Gondar
Department of Computer Science
Postgraduate Program
Academic Year: 2019/12 (Sem I)
November 02, 2019
Projects on Natural Language Processing (COSC 6405)
Introduction
As part of the Natural Language Processing course, 13 project ideas are given below. Their purpose is
to familiarize you with the tools, techniques and concepts used to develop NLP applications. You will be
given all the data and tools you need. In almost all of the projects, you are required to develop NLP
applications for Amharic text, but feel free to use any other Ethiopian language that interests you.
For editable Ethiopic text, you should use the Unicode representation of Ethiopic characters.
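To illustrate what working with the Unicode representation means in practice, the following minimal Python sketch prints the code points of an Amharic word and checks that each character falls in the main Unicode Ethiopic block (U+1200 to U+137F). Python is only an assumed choice here, the function names are hypothetical, and the Ethiopic Supplement and Extended blocks are deliberately ignored.

```python
# Minimal sketch: inspect Ethiopic text as Unicode code points.
# Assumption: only the main Ethiopic block (U+1200..U+137F) is checked;
# the Ethiopic Supplement and Extended blocks are ignored.
ETHIOPIC_START = 0x1200
ETHIOPIC_END = 0x137F


def is_ethiopic(ch: str) -> bool:
    """Return True if the single character ch lies in the main Ethiopic block."""
    return ETHIOPIC_START <= ord(ch) <= ETHIOPIC_END


def code_points(text: str) -> list:
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    word = "ሰላም"  # Amharic greeting, "peace"
    print(code_points(word))                  # ['U+1230', 'U+120B', 'U+121D']
    print(all(is_ethiopic(ch) for ch in word))  # True
```

Each Ethiopic syllable (for example ሰ, ላ, ም) is a single code point, so character-level processing such as n-gram counting can operate directly on these code points.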
Deliverables
System: You should submit the system you have developed, along with its source code.
Report: You should submit a report of the work you have done. The report should describe the system
you developed, the approach you used, the performance of your system (supported by
examples where it works well and where it fails), the difficulties encountered, future work, and
so on.
Evaluation
You will present your work to the class. You will be evaluated on your system, report and
presentation. The system and report are evaluated per group, but the presentation and the
question-and-answer session are evaluated individually.
Contribution to Grade
The project contributes 40% of the overall course grade.
Submission Deadline
December 15, 2019
[Link]@[Link]
Project Ideas
1. Develop a system that checks the spelling of Amharic words as the user types and suggests
corrections when there are errors.
2. Write a program that displays the parse tree of a given Amharic sentence.
3. Develop an Amharic WordNet and write a program that disambiguates word senses in a given
Amharic sentence.
4. Write a program that identifies double meanings (the ‘wax’ surface meaning and the ‘gold’ hidden meaning) in Amharic poems.
5. Develop a system that identifies the language in which an Ethiopic document is written.
6. Design and develop a rule-based English-to-Amharic machine translation system for simple
sentences.
7. Develop a statistical Amharic-to-English machine translation system.
8. Develop a text-to-speech converter for Amharic.
9. Using GATE (General Architecture for Text Engineering) toolkit, develop a system that recognizes
named entities in Amharic text.
10. Using HTK (HMM Toolkit), develop a speech recognition system for Amharic.
11. Develop a system that predicts the lexical category (part of speech) of words in a given Amharic sentence.
12. Using the OpenCV image processing library and the C++ programming language, write a program that
detects text lines and segments characters in machine-printed documents. Before segmentation, the
program should remove any noise using a Gaussian filter. The resulting image should enclose each
segmented character in a rectangular box and mark each detected text line with two lines, one
above and one below it.
13. Using MATLAB, write a program that displays the directions of pen-tip movements in online
handwritten text. The program should read handwritten data (captured online and saved in UNIPEN
format) from a file and show the direction of pen-tip movement at each data point with an
arrow.
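For project 13, one reasonable definition of the pen-tip direction at a point is the vector from that point to the next point in the same stroke, with its angle given by atan2(dy, dx). The Python sketch below illustrates only this computation; the project itself must be done in MATLAB, the function name `pen_directions` is hypothetical, and it assumes the UNIPEN file has already been parsed into a list of (x, y) points for a single stroke.

```python
import math


def pen_directions(points):
    """Return (dx, dy, angle_deg) from each point to the next one in a stroke.

    Assumption: points is a list of (x, y) tuples from ONE pen-down stroke.
    The angle is measured counter-clockwise from the +x axis, in degrees.
    The last point has no successor, so it gets no direction.
    """
    directions = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        directions.append((dx, dy, math.degrees(math.atan2(dy, dx))))
    return directions


if __name__ == "__main__":
    stroke = [(0, 0), (1, 0), (1, 1), (0, 1)]  # made-up sample stroke
    for d in pen_directions(stroke):
        print(d)  # (1, 0, 0.0), then (0, 1, 90.0), then (-1, 0, 180.0)
```

Directions are computed per stroke so that no arrow is drawn across a pen-up gap. In MATLAB, the resulting dx and dy values could be passed to the `quiver` function to draw the arrows. Note that some tablets report y increasing downward, which flips the sign of the angle.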