
Text Localization and Extraction of Ancient Kannada Script

K Omkar Prasad, Department of CSE, Global Academy of Technology, Bengaluru, India ([email protected])
Chaithanya K S, Department of CSE, Global Academy of Technology, Bengaluru, India ([email protected])
Divya D, Department of CSE, Global Academy of Technology, Bengaluru, India ([email protected])
Pallavi L J, Department of CSE, Global Academy of Technology, Bengaluru, India ([email protected])

Abstract—India is a diverse country with many cultures, traditions, and languages, some of which, such as Kannada and Tamil, are among the world's oldest. India currently has 22 official languages and more than 1,600 regional ones. Kannada, one of the oldest of these languages, is spoken mainly in southern India and is famous for a rich literature that records much of the region's past. The script in which Kannada was written in earlier eras, however, is difficult for modern readers to understand. This work therefore uses technology to convert ancient Kannada writing into a form that can be read easily, so that the knowledge left by our ancestors becomes accessible. We do this by analyzing images of old writing with computers and training them to recognize and translate the script.

Index Terms—Image processing, Deep learning, Technology

I. INTRODUCTION

Ancient Kannada history stretches back over 6000 years, boasting a rich cultural tapestry that continues to captivate scholars and enthusiasts worldwide. The Kannada people hold their traditions and language in high esteem, with Kannada being one of the world's oldest languages, renowned for its rich literary heritage. Works such as "Kavirajamarga" and "Vaddaradhane" stand as testament to its literary prowess. Over the course of two millennia, Kannada literature has flourished, offering insights into the arts and history that have shaped Kannada society. Understanding these facets necessitates a deep understanding of the Kannada language. Ancient Kannada scripts, used for inscriptions since antiquity, provide invaluable insights into ancient Kannada life, shedding light on social, political, and cultural trends.

The usage of ancient Kannada scripts for inscriptions dates back to the earliest days of Kannada history, with artifacts discovered across regions like Karnataka, Andhra Pradesh, Kerala, and Sri Lanka. These scripts exhibit unique characteristics, distinct from other ancient writing systems.

Ancient Kannada culture encompasses a wide array of elements including literature, lifestyles, culinary traditions, arts, rituals, clothing, architecture, and medicine. However, modern generations often face challenges in deciphering ancient Kannada scripts due to their significant differences from modern scripts. Therefore, the development of systems capable of converting these ancient scripts into readable versions is imperative. This would enable younger generations to better appreciate and understand the rich Kannada heritage, despite the unique challenges posed by deciphering ancient scripts.

Fig. 1. Example of ancient Kannada script

II. RELATED WORKS

Prof. Vaibhav V. Mainkar, Ms. Jyoti A. Katkar, Mr. Ajinkya B. Upade, and Ms. Poonam R. Pednekar proposed a Handwritten Character Recognition system [2] utilizing image processing techniques. Their approach involved segmentation and feature extraction processes, which demand significant processing power and memory. However, their system is limited to recognizing recent text and cannot identify ancient scripts.

G. G. Rajput and Suryakanth Baburao Ummapure proposed a script identification technique [3], specifically for recognizing Hindi text. Their method employed SVM, which requires intensive processing and necessitates maintaining a separate database for comparison.
G. Janani, V. Vishalini, and Dr. P. Mohan Kumar proposed a Tamil inscription recognition system [4] utilizing morphological operations, connected-component techniques, and correlation processes. However, more recent methodologies offer computationally more efficient alternatives.
Sudharshan Duth P. and Rahul H L proposed a method
[5] for identifying and extracting text from blurred and non-
blurred images. While their approach uses the Otsu threshold-
ing method for text segmentation, it performs less accurately
on blurred images compared to non-blurred ones.
Tengjun Yao, Zhengang Zhai, and Bingtao Gao proposed a classification method [6] based on the fastText tool, which classifies text according to frequently occurring word sequences. However, for ancient scriptures, where the same word sequences rarely recur, this approach offers no clear advantage over the proposed method.
Yunxiang Zhang and Zhuyi Rao proposed a method [7] utilizing a bag-of-words approach, which yields relatively good results. Nevertheless, in classifying ancient scripts, whose words and letters differ from those in current usage, this method may not be optimal.
Ranjit Ghoshal and Ayan Banerjee proposed a method [8] employing thresholding techniques which, while satisfactory, involve less computation than the proposed method using deep neural networks. Comparative analysis [9] demonstrates that convolutional neural networks outperform similar tools. The proposed method utilizes deep learning techniques to train neural networks, eliminating the need for manual feature extraction and database creation.

III. METHODOLOGY

The proposed system's block diagram is outlined in Figure 2, featuring a process with five steps. Initially, the system captures the input image, accommodating various formats like jpeg, tiff, bmp, png, etc. Next, the image undergoes preprocessing, including grayscale conversion and noise removal. Subsequently, a combined segmentation step handles both word and letter segmentation. Following this, the features of each letter are extracted. Finally, the system identifies the corresponding Kannada letter for the ancient Kannada script.

Fig. 2. Flow diagram of proposed system

A. Image Acquisition

Image acquisition involves capturing an image through a digital camera or a scanner, resulting in digital data stored in a standardized format. Common formats include jpeg, png, tiff, and bmp, with images stored either in color (RGB format) or grayscale.

B. Preprocessing

Preprocessing is essential for optimizing images for character recognition tasks, ensuring they are clear and free of noise. It involves several key steps:

• Skew Correction: Skew correction adjusts the horizontal orientation of text within an image, aligning it properly for accurate recognition.

• Noise Removal: Techniques like median blur, Gaussian blur, and non-local means denoising are used to enhance image clarity by reducing noise:

    filter1 = cv2.medianBlur(gray, 3)
    filter2 = cv2.GaussianBlur(filter1, (5, 5), 0)
    dst = cv2.fastNlMeansDenoising(filter2, None, 17, 9, 17)

• Binarization: Binarization converts the preprocessed image into a binary format using adaptive thresholding, improving contrast between characters and background.

By following these preprocessing steps sequentially, the input image is refined for better character segmentation and classification, thereby enhancing the overall accuracy of character recognition algorithms.

C. Segmentation

Segmenting characters from an image involves several steps. Initially, we detect contours to outline character regions. Next, we create bounding boxes around these contours to encompass each character's area. These bounded areas, known as Regions of Interest (ROIs), pinpoint individual characters within the image. To verify accuracy, we visually highlight these ROIs on the original image using rectangles. Subsequently, we save the segmented characters as separate image files for further processing, such as dataset creation or neural network training. By integrating contour detection, bounding-box extraction, ROI definition, visualization, and character saving, the segmentation process effectively isolates characters from the input image. This facilitates subsequent character recognition and classification tasks within the image-processing workflow.
D. Identification Of Letters

In the identification stage following segmentation, individual character regions extracted from the image are thoroughly analyzed. This involves extracting various features like shape, size, and texture from each segmented character to capture its unique traits. By scrutinizing the pixel intensity distribution within these regions, discernible patterns emerge, aiding in distinguishing between different letters. The identification process revolves around recognizing visual attributes inherent to each character, including the curves, lines, and intersections that define specific letters. Through feature extraction and pattern recognition techniques, the segmented characters are compared against predefined templates or patterns to ascertain their corresponding letter identities. This comparison of extracted features with known letter patterns enables the system to accurately assign a letter label to each segmented region based on its visual attributes.

E. Classification Of Letters

In the classification of letters using Convolutional Neural Networks (CNNs), a deep learning model is trained to recognize and differentiate between letters based on visual features extracted from segmented character images. The process begins with preprocessing the input images and feeding them into the CNN. The CNN architecture includes convolutional layers for feature extraction, pooling layers for dimensionality reduction, and fully connected layers for classification. During training, the CNN learns to associate specific visual patterns with each letter class through backpropagation and optimization algorithms, iteratively adjusting its internal parameters to minimize classification errors on the training dataset. Once trained, the CNN predicts the letter of a segmented character image by passing it through the network and obtaining output class probabilities. The softmax activation function in the output layer provides a probability distribution across all letter classes, enabling accurate predictions. The CNN leverages its ability to learn hierarchical features to effectively classify letters based on their visual attributes.

IV. RESULTS

A. Image Acquisition

The initial step of the proposed method involves image acquisition. The ancient Kannada script serves as the test image for the proposed system. This image can be obtained using any digital camera or by scanning the inscription and uploading it to the proposed system. The input image can be in any standard format, with commonly used formats including PNG, jpeg, tif, or BMP.

Fig. 3. Input Image

B. Preprocessing

Preprocessing constitutes a fundamental stage in numerous image processing methodologies. Within the proposed approach, the input image is first converted from RGB to grayscale, after which noise within the image is filtered out. The 'rgb2gray' syntax was employed for this conversion, which translates RGB values into grayscale values through a weighted sum of the R, G, and B components, following the formula:

0.2989 * R + 0.5870 * G + 0.1140 * B

Fig. 4. Gray Image

Three denoising techniques, namely Gaussian blur, median blur, and non-local means, are employed to eliminate impurities in the grayscale image. Subsequently, the outcomes of these methods undergo binarization using the adaptive threshold method, culminating in the final preprocessed image. The final output is shown in Fig. 5.

Fig. 5. Final Preprocessed Image

C. Segmentation

The function 'segment characters' is crafted to process an image path as its input. It proceeds by detecting contours corresponding to characters within the image and segmenting each character by isolating its bounding-box region. These segmented characters are then individually saved as image files within a directory denoted "segmented characters". After segmentation, the resultant images are forwarded to a CNN model for classification.
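As described in the classification subsection above, the CNN's softmax output layer converts raw class scores into a probability distribution, and the letter with the highest probability is reported. A minimal numeric sketch of that final decision step (the scores and the three transliterated letter labels here are hypothetical, not the system's actual classes):

```python
import numpy as np

def softmax(scores):
    """Turn raw output-layer scores into a probability distribution."""
    e = np.exp(scores - np.max(scores))   # subtract max for numerical stability
    return e / e.sum()

# Hypothetical raw scores from the network's final layer for one
# segmented character, over three example letter classes.
labels = ["ka", "ga", "cha"]
scores = np.array([2.0, 0.5, -1.0])

probs = softmax(scores)                    # probabilities summing to 1
predicted = labels[int(np.argmax(probs))]  # class with highest probability
```

In the full system this decision would be taken over the entire ancient Kannada character set rather than three illustrative classes.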
The figure below provides a visual representation of various examples of segmented characters.

Fig. 6. Letters Segmented

D. Classification Of Letters

The segmented letters are fed into the CNN model. The classifier recognizes the letters and displays the identified letters for the ancient Kannada script.

Fig. 7. Letters Segmented

V. CONCLUSION

The proposed method can identify ancient Kannada letters and convert them into modern scripts. This will help researchers and enthusiasts around the world to understand ancient Kannada scripts, and it will provide insights into ancient Kannada culture, including medicine, governance, education, lifestyle, food habits, and traditions. This work has future research potential in converting palm-leaf and stone inscriptions into readable Kannada scripts. Additionally, a mobile application could be developed to enable individuals to easily read ancient Kannada letters, further enhancing the accessibility and usability of this technology for a wider audience. The key aspect of this approach is its ability to recognize and translate ancient Kannada scripts, which can unlock valuable historical information and make it more accessible.

REFERENCES

[1] Vaibhav V. Mainkar, Jyoti A. Katkar, Ajinkya B. Upade, Poonam R. Pednekar, "Handwritten Character Recognition to obtain Editable Text", International Conference on Electronics and Sustainable Communication Systems (ICESC), 2-4 July 2020.
[2] Sudharshan Duth P, Rahul H L, "Identification and Extraction of Text in Non-Blurred and Blurred Image Sequences", International Conference on Intelligent Computing and Control Systems (ICICCS), 2020.
[3] P. Dharani Devi, R. Thanuja, "Convolutional Neural Network based Deep Feature Extraction in Remote Sensing Images", International Conference on Smart Electronics and Communication (ICOSEC), 10-12 Sept. 2020.
[4] Tengjun Yao, Zhengang Zhai, Bingtao Gao, "Text Classification Model Based on fastText", International Conference on Artificial Intelligence and Information Systems (ICAIIS), 2020.
[5] Deekshith K Gowda, V Kanchana, "Kannada Handwritten Character Recognition and Classification Through OCR Using Hybrid Machine Learning Techniques", IEEE International Conference on Data Science and Information System (ICDSIS), 2022.
[6] G. Janani, V. Vishalini, P. Mohan Kumar, "Recognition and Analysis of Tamil Inscriptions And Mapping Using Image Processing Techniques", Second International Conference on Science Technology Engineering and Management (ICONSTEM), 30-31 March 2016.
[7] Yunxiang Zhang, Zhuyi Rao, "n-BiLSTM: BiLSTM with n-gram Features for Text Classification", 5th Information Technology and Mechatronics Engineering Conference (ITOEC 2020), 2020.
[8] Ranjit Ghoshal, Ayan Banerjee, "An Improved Scene Text and Document Image Binarization Scheme", 4th International Conference on Recent Advances in Information Technology (RAIT-2018), 2018.
[9] B J, Bipin Shobharani, N. Sreekumar, N. R. Ashok, Gopikrishna, "A Two Phase Denoising Approach to Remove Uneven Illumination From Ancient Note Book Images", 2021, pp. 1563-1568, doi: 10.1109/ICACCS51430.2021.9441911.
