
Proposal for a Bangla (or Bengali) Script

Root Zone Label Generation Ruleset (LGR)


LGR Version: 4.0
Current Date: 2020-05-20
Document version: 5
Authors: Neo-Brahmi Generation Panel [NBGP]

1. General Information
This document lays down the Label Generation Rule Set (LGR) for the Bangla (or
‘Bengali’)1 script under the general rubric of the Neo-Brāhmī Writing System. The three
main components of the Bangla Script LGR, i.e. (i) Code point repertoire, (ii) Variants
and (iii) Whole Label Evaluation Rules, are described in detail here, after a brief
historical background of the Script in Section 3.

All these components will be incorporated in a machine-readable format in an XML file
named "proposal-bengali-lgr-20mar20-en.xml". Labels for testing can be found in the
accompanying text document “bangla-test-labels-20mar20-en.txt”.

2. Script for Which the LGR Is Proposed


ISO 15924 Code: Beng
ISO 15924 Key N°: 325
ISO 15924 English Name: Bengali (Bangla)
Latin transliteration of native script names [in IPA]: bɑːŋlɑː, ôxômiya
Native names of the script: বাংলা, অসমীয়া
Maximal Starting Repertoire (MSR) version : MSR-4

3. Background on Script & Principal Languages Using It


3.0. Introduction
‘Bangla’ (or Bengali) is historically and genealogically regarded as an eastern Indo-
Aryan language with around 178.2 million speakers in Bangladesh (98% speakers), and
83.4 million speakers in the Indian states of West Bengal (68.37 million), Tripura (2.15
million), South Assam (7.3 million), Odisha (0.49 million) and Delhi (0.21 million) as
well as in the Andaman and Nicobar Islands (close to a hundred thousand), accounting
for 8.3% of India's population. It is also a major language in Jharkhand (2.6 million) and
has a sizable number of speakers in Bihar (0.44 million). Apart from these, there is a
large Bangla-speaking diaspora spread all over the world. It is the seventh most widely
spoken and written language in the world. Bangla is the national and official language of
Bangladesh, and one of the 22 Official languages in India (listed in the 8th Schedule of
the Indian Constitution). It is also one of the official languages of Sierra Leone. The
script is also called Bangla [102]; it is an eastern variety of the ‘Brāhmī’ Writing
System, written from left to right. Historically it derives from the Brāhmī alphabet as
used in the Ashokan inscriptions (269-232 BC).

1 The term ‘Bangla’ is used in the descriptive text and the term ‘Bengali’ is used in the normative part of this
proposal.

Bangla and its cognate languages, as mentioned above, together form a linguistic group
known as the Eastern New Indo-Aryan (NIA). Inscriptions and manuscripts in the Eastern
Apabhraṅśa or ‘Avahaṭṭha’ are very scarce, except for small inscriptions and the
manuscripts of the Tantric Buddhist text titled ‘Caryyācaryyaviniścaya’ or the
Caryā-Pada [114] dating back to the 9th-11th century. As a result, there is not much
epigraphic evidence for the development of its writing system. However, the evidence
that is available on the genesis of the Bangla writing system is discussed in
Section 3.1 [109].

Historically, the Bangla language is divided into three periods as evident from various
sources:

(i) Firstly, the Old Bangla Period (roughly A.D. 950/1000 to 1200/1350), of which
three specimens are found: (a) 47 Caryā songs, the Dohākōṣa of Saraha and
the Dohākōṣa of Kānha (mostly in Apabhraṅśa), and the Ḍākārṇava (in a
variety of Prākṛt), (b) Old Bangla specimens of over 300 words in a
commentary [141].
(ii) Then there is the Middle Bangla Period (1200-1800 A.D.), again divided into three
stages: (a) Transitional Middle Bangla (1200-1300 A.D., for which no genuine
specimens are found) [147], (b) Early Middle Bangla (1300-1500 A.D.), and
(c) Late Middle Bangla (1500-1800 A.D.).
(iii) Finally, after 1800 A.D., we find the Modern or New Bangla, marked by the
introduction of written prose [109] in the books of Fort William College
(established in 1800). The colloquial variety of Bangla based on the speech
variety of Calcutta (called ‘Kolkata’ now) made its first appearance through
the Hutōm Pẽcāra Nakśā (1862) by Peari Chand Mitra. The influence of
English on the vocabulary, idioms, and expressions as well as on the writing
styles of Bangla is significant by this time. The fonts and types for Bangla
developed during this time also spread to all parts of the Bangla speech
community [101, 120]. The same fonts with some extensions were also used
for the neighbouring languages deploying this writing system.

Bangla prose had developed two literary styles during the 19th-20th Century: the
Sādhubhāṣā (সাধু ভাষা - "Elegant Language or Style") and the Calitabhāṣā (চলিতভাষা -
"Current Language, or Modern Style"). It is the latter style that is prevalent today in
written prose.

The Language Movement in Bangladesh (the then East Pakistan) began in 1948, as civil
society dissented to the elimination of the Bangla script from currency and stamps,
which had been in use since the British Raj. The movement reached its pinnacle in 1952,
when on 21 February the police fired on demonstrating students and civilians,
causing numerous injuries and deaths2. Later, following the Language Movement, on
27 April 1952, the All Party National Language Committee decided to demand the
establishment of an organization for the promotion of the Bengali language. Bangla
Academy, Dhaka, right from its inception in 1955, has been engaged in promoting and
fostering Bangla as the lingua franca of the country before and after independence from
Pakistan in 1971. Through the various commissions and committees constituted by the
Government of Bangladesh (Bāṅlādeśa Jātīya Śikṣā Kamiśana in 1972, Jātīya Śikṣā
Upadeṣṭā Pariṣad in 1979, Bāṅlā Bhāṣā Bāstabāyana Sela in 1982, Bāṅlā Bhāṣā Kamiṭi in
1983, etc.3) after independence in 1971, Bangla was made the primary medium of
instruction/communication in all Governmental and educational activities. Through a
great struggle and bloodshed, the Bengalis established Bangla as an official language of
the state.4

3.1. Written Bangla


The ‘Bangla alphabet’ (বাংলা লিপি - Bānglā lipi, ISO 15924) is derived from the Brāhmī
writing system, which is related to the Nāgarī (also known as Devanāgarī5) script [108]
as well as to the Tirhutā writing system [106]. Considered to be the fifth most widely used
writing system in the world, this combined Bangla-Asamiyā-Maṇipuri Script (showing
some variations for Asamiyā and Meitei or Biṣṇupriyā Manipuri) (130) was used in the
eastern Indian Sanskrit manuscripts too. For Chākmā in India and Bangladesh and for
Kokborok in Tripurā, it was and still is one of the scripts used. A close variant, called
Tirhutā (123; now available also in UNICODE 10.0 as 11480-114DF; See 110) or
Mithilākṣara, was used for Maithili from the 14th Century until the early 20th century
[106]. In this context, one finds a mention of ‘Sylheti Nāgarī lipi’ or ‘Siloṭi’ (added to the
Unicode Standard in March 2005 with the release of version 4.1), the details of which
could be of interest only to historians and historical linguists (See 137 and 144). But
for all practical purposes Sylheti Bangla is now generally written in the modern-day
Bangla script. Originally, during the reign of the Pāla dynasty (750-1154 AD) in
eastern India, and even earlier, perhaps during the Malla period (694 AD onwards),
the present-day Bangla writing system acquired a shape comparable to the modern-day one
[111, 119]. The development from Brāhmī to the Modern Bangla Script can be
presented pictorially in tabular form:

Modern    ক  জ  ম  র  স  অ
          k  j  m  r  s  a

Table 1: Pictorial depiction of Evolution of Brāhmī to Bangla

2 The UN declared Ekuśe February (21st February) as the International Mother Language Day at the UNESCO
General Conference in Paris on 17 November 1999 “in recognition of the sanctity and preservation of all
vernacular languages in the world.”22
3 Bāṅlā Bhāṣā Kamiṭi. 1983. Bāṅlā Bhāṣā Kamiṭi Riporṭ (Report of the Bangla Bhasha Committee). Dhaka: Śikṣā,
Dharma, Krīṛā O Saṅskṛti Mantraṇālaya, Peoples Republic of Bangladesh.
4 Chakraborty, Rajib. 2018. The Fishermen’s Community: A Language-Culture Interplay (A Study of Post-1971
Select Bangla Novels). Unpublished Ph.D. Dissertation, Visva-Bharati.
5 William Dwight Whitney in his Sanskrit Grammar unequivocally said, “This name (Devanāgarī) is of
doubtful origin and value” (Whitney, William Dwight. 1994 reprint. Sanskrit Grammar. New Delhi: Motilal
Banarasidass Publishers, p. 1)

The inscriptional evidence in Brāhmī is found in the Archaic Brāhmī from the 3rd
century B.C. to the 1st century B.C., in Middle Brāhmī soon after (1st-3rd Century
A.D.), and then in the Late Brāhmī (4th-6th Century A.D.). This evidence can be seen
in both Bangladesh and West Bengal [108] in (1) the Mahāsthānagaṛa (Bogra district,
Bangladesh; the ancient name being Puṇḍranagara or Pauṇḍravardhanapura)
inscriptions, (2) Brāhmī (and Kharoṣṭhī) inscriptions from the lower ‘Gangetic Bengal’
and (3) Copper plate inscriptions of the Imperial Guptas from the Northern part of West
Bengal and North-West Bangladesh, in the areas under Dharmāditya, Gopachandra
and Samācāradeva (about whom one only knows from five Copper-plates found in
Kotālipāṛā in the Faridpur district in Bangladesh, one in Mallasārul in the Burdwan
district (West Bengal), and one in Jayrāmapura (Balleśvara district, now in Odisha)).

These epigraphs from the eastern part of Undivided India (dating back to the 4th-6th
Centuries A.D.) showed some characteristic features of letters (especially in ম ‘ma’, ল ‘la’,
শ ‘śa’, স ‘sa’ and হ ‘ha’), which led to the development of the eastern variety of the Gupta script.
Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern
Brāhmī. Notable in this context are the Tippera copper plate inscription of the ‘Samataṭa’
rulers (139, pp 265) such as Lokanātha (dated to the latter half of the 7th Century A.D.), the
Kailan inscription of Śrīdharaṇa Rāta, and the Astafpur copper plates. The letters
seem to hang down from wedge-shaped solid triangles with right-hand verticals bending
down at the bottom, because of which the script was described by Prinsep and Fleet as
Kuṭila-lipi (literally, ‘Cursive writing style’), whereas the term Siddhamātrikā (as a mātrā or
bar is placed over each of the letters) was used by Al Biruni (973-1048) to designate the
script of Northern India. The next stage of development is illustrated by the 9th Century
copper plate inscriptions from Khalimpur of the reign of Dharmapāla, from Monghyr
and Nālandā of the time of Devapāla in Bihar, and from Jagjīvanpura (Malda) of the
reign of Mahendrapāla. The Siddhamātrikā (mentioned as ‘Siddham’ in Chinese sources)
is said to have been prevalent also in this region up to the end of the tenth century. Also
called the Gauri (i.e. Gandi) in Pūrvadeśā or the Eastern country, it is regarded as the
script that has been given the appellation Proto-Bangla, showing Bangla characteristics in
rudimentary forms in the period between A.D. 875 and A.D. 1025.

In some epigraphs it is considered as belonging to the second quarter of the eleventh
century A.D. Flattening of head-marks becomes prominent in comparison to the wedge-shaped
serifs. An important landmark in the development of the Bangla script is the
Rāmagañja copper plate inscription of Mahāmāṇḍalika in the last quarter of the
eleventh century A.D. It is the earliest document from this entire region which bears the
letter m with a tick rising upwards. The full vowel i develops a tick at the right end of
the upper horizontal bar above and a curved hook below. Initial e approaches the
modern Bangla character. A mature form of Proto-Bangla, the immediate precursor of
the Bangla script, is illustrated in the inscriptions of the Varmaṇa, Sena and Deva rulers of
the twelfth and thirteenth centuries [104].

The evolution of the Bangla script (Cf. 136) is aligned with the story of the advancement of
printing technology. The first “movable type” characters were created and used for
printing Nathaniel Brassey Halhed's (1751-1830) 1778 book titled 'A Grammar of the
Bengal Language'. In 1785, Governor-General Warren Hastings (1732-1818) requested
another civilian, Charles Wilkins (1749-1836), to cut punches for Bangla printing
characters. The current printed form of the Bangla script appeared soon after. It is generally
agreed that Wilkins developed the Bangla print script [111]. He passed on this knowledge
to Pañcānana Karmakāra (?-1804), a renowned artist in Bengal. Later it was Karmakar
and his family that became famous in Bangla printing technology. Shepherd was
another assistant of Wilkins in this designing of the script, which became more angular with
sharper turns and edges [133]. A few archaic letters were modernized during the 19th
century. The script was standardized by Pandit Ishwar Chandra Vidyasagar when the Bangla
type fonts were to be used to publish on a large scale under the Calcutta School Book
Society [116 for several references].

Much later, in 1935, the Linotype technique, invented by Ottmar Mergenthaler (1854-
1899) in 1886, was introduced into Bangla printing through the efforts of Suresh
Chandra Majumdar (1888-1954), Rajsekhar Basu (1880-1960), Jatindra Kumar Sen
(1882-1966) and his disciple, Sushil Kumar Bhattacharya, and began to be used by
the Ānandabāzara Patrikā group, later followed by others. Within a few years the more
advanced monotype technology came to be used in Bangla printing. However, in Bangla
printing culture, monotype had very limited acceptance and linotype held the stage until,
eventually, digital technology came in to replace all earlier techniques.

All these could be presented in a table:

PERIOD | DESCRIPTION | NAMES

3rd Century B.C. | The use of Brāhmī and Kharoṣṭhī scripts begins in the subcontinent. Brāhmī was widely used during the reign of the Mauryan king Aśoka. In one theory, Brāhmī is based on the North Semitic alphabet but suitably modified to fit the needs of local languages. It is currently believed to have been an independent development. | Brāhmī

1st-3rd Century AD | The Kuṣāṇa script, named after the Kuṣāṇa royal dynasty. | Kuṣāṇa script

4th-5th Century AD | The next stage of its evolution was into the Gupta script, named after the Gupta royal dynasty. | Gupta script

7th Century AD | Epigraphic records from Bangladesh demonstrate remarkable developments in Eastern Brāhmī, giving rise to the Kuṭila-lipi. | Kuṭila-lipi

8th Century AD | Some copper plate inscriptions are found in Khalimpur, Bangladesh, of the reign of Dharmapāla, from Monghyr and Nālandā in Bihar, of the time of Devapāla, and from Jagjīvanapura in West Bengal, of the reign of Mahendrapāla. | Siddhamātrikā

9th Century AD until 1025 AD | Proto-Bangla characteristics in rudimentary forms develop. An important landmark in the development of the Bangla script is the Rāmagañja copper plate inscription of Mahāmāṇḍalika, found in the last quarter of the eleventh century A.D. | Proto-Bangla Script & Language

12th-13th Century AD | A mature form of Proto-Bangla, the immediate precursor of the Bangla script, is found in the inscriptions of the Varmaṇa, Sena and Deva rulers of the twelfth and thirteenth centuries. | Matured Proto-Bangla

14th-15th Century AD | The characteristics of the typical Bangla script began to develop, as can be seen in the copper plate inscription of Vijayamāṇikya-I of Tripura dated 1478 AD, which also illustrates forms of Bangla letters in the fifteenth century A.D. | Modern Bangla Script era begins (See Ross 1999)

16th-17th Century AD | The China Monuments, published from Amsterdam in 1667, and The code of Gentoo law, published from London in 1776, both show a chart of the Bangla alphabet. They show 16 Vowel letters, including the Long ‘ৡ’ ‘l̥̄i’, Anusvāra and Visarga, and 34 Consonants. | Printed Charts of Bangla

18th-19th Century AD | Charles Wilkins develops printing in Bangla in 1778 and Vidyasagar reforms it. | Bangla Type Fonts

Table 2: Development of the Bangla Writing System

The overall development of Bangla Script from the Kuṭila-lipi period to Modern Bangla
could be seen here in Table 3 ([102 and 146] and also see the web-page in 147).

Table 3: Bangla Script in Different Centuries

3.2. Languages Considered


Below is the tabular representation of the languages using Bangla script that are placed
on EGIDS Scale 1-6. (See 117 for details.) Some languages under EGIDS 5 and 6 have
also developed their own scripts for printing and publishing. Some had used Bangla
script earlier (such as Bodo), or used it in West Bengal at some point of time (Santali)
but have later shifted to another writing system. Bodo is now written in Nāgarī or
Devanāgarī and for Santali one uses both Nāgarī/Devanāgarī and Ol-chiki (145). For the
purposes of the Bangla LGR, only languages belonging to the EGIDS scale 1 to 4 have
been considered. Consider the following table:

EGIDS Scale 1   EGIDS Scale 2   EGIDS Scale 3   EGIDS Scale 4   EGIDS Scale 5   EGIDS Scale 6

Bangla Santali, Bodo, Lepcha
(Bengali) Riang, Khumi, Pnar, Koda/
Mru(ng), Asho Kora, Chak

Asamiyā Koch or Mālto or
(Assamese) Rājabaṅśī Mālpāhāṛiyā

Maṇipuri or Biṣṇupriyā Chākmā, Toto,
Meitei Maṇipuri, Hājong, Rohingyā,
Kok-Borok Muṇḍāri & Tippera,
(Tripura & Kurux (of Megam,
Bangladesh) Bangladesh) Tanchangya

Usoi Limbu, Sadri or Bhumij or
Oraon Muṇḍāri,
Bawm, Chin

Table 4: Main languages in India and Bangladesh that use Bangla Script on the EGIDS Scale

3.3. Notable Features of Bangla Script [150]


The Bangla Writing System has certain features that show how it is to be written and how
typesetting in Bangla can be done. This section is followed by a section that explains the
Code points (and fixed Code point sequences) which show certain distinctive characteristics of
Bangla and which make up the Repertoire. The next sections will also cover the ‘akshar’-formation
rules (ABNF) showing character classes, Whole Label Evaluation (WLE) and Context Rules as well
as In-Script and Cross-Script Variants. Here, we present some basic features of the Script and
Pronunciation:

● The Bangla script is an alpha-syllabic writing system in which every consonant
letter is assumed to carry an accompanying ‘inherent’ vowel
(theoretically before or after each consonant). It varies between /ɔ/ and /o/
depending on the position of the consonant in the word. At times, these
‘assumed’ or ‘inherent’ vowels are not pronounced at all [142].

● Vowels can be written as independent letters, or by using a variety of diacritical
marks which are written above, below, before, after, or both before and after
the consonant they follow in pronunciation [105].
● All Bangla consonants when pronounced in isolation are uttered with an inherent
vowel /ɔ/; hence ক ‘k’, খ ‘kh’ or গ ‘g’ are usually pronounced as [kɔ], [khɔ], or
[gɔ], etc. Phonologically, the Bangla vowel /ɔ/ corresponds to the Hindi schwa /ə/.
● When consonants occur together in clusters, special conjunct letters are formed.
In printed Bangla, many of these consonantal clusters or conjoined consonants
are in use. The letters for the consonants other than the final one in the group are
generally reduced. But there are a few special conjunct characters which are
compounds of the consonant characters, e.g. ক(k)+ষ(ṣ)=ক্ষ(kṣ),
ঞ(ñ)+জ(j)=ঞ্জ(ñj), জ(j)+ঞ(ñ)=জ্ঞ(jñ), হ(h)+ম(m)=হ্ম(hm). There are other issues
also: র as the second member of a cluster is reduced to a secondary symbol, e.g.
প(p)+র(r)=প্র(pr), ষ(ṣ)+ট(ṭ)+র(r)=ষ্ট্র(ṣṭr) (as in উষ্ট্র uṣṭra “camel”); য (y), when used
as a primary symbol, represents /jɔ/ in Bangla. But its secondary symbol
(allograph) jɔ-phalā has two phonetic values. When added to the initial
consonant in a word, it is a vowel /æ/ (as in শ্যামল (śyāmala) “green”, র‍্যাপার
(ryāpāra) “wrapper”, etc.). But after a non-initial consonant, it just doubles it in
pronunciation (as in কার্য, ধার্য, etc.). The র(r)+য(y) combination has two
renderings: র‍্য(ry) and র্য(ry). In case of দ(d)+ধ(dh), গ(g)+ধ(dh), ন(n)+ধ(dh) the
shape of the second member is changed, e.g. দ্ধ(ddh), গ্ধ(gdh), and ন্ধ(ndh)
respectively. The solitary example of র(r)+ঋ(ṛ)=র্ঋ (as in নৈর্ঋত nairṛt
"Southwest"), used mostly in cases of Classical borrowings, shows the use of the
secondary symbol of a consonant followed by the primary symbol of a vowel.
The inherent vowel only applies to the final consonant of the cluster.
● In consonant clusters, many consonants take a completely different form. Some
typical examples are ক্ত (kt), ক্র (kr), ক্ষ (kṣ), গ্ধ (gdh), জ্ঞ (jñ), ঞ্চ (ñc), ঞ্জ (ñj), ট্ট (ṭṭ), ন্ত
(nt), ন্ধ (ndh), ব্ধ (bdh), ভ্র (bhr), ম্ব (mb), স্ত (st) etc. র has two allographs, apart from
its full shape: one is ‘repha’, as found in র্ক (rk), র্প (rp); and another is ra-phalā,
as in প্র (pr), ক্র (kr). ষ্ণ (ṣ+ṇ) is another one, where the cerebral nasal consonant
sign takes a queer shape. [151]
● The Bangla script has at least fifty-two primary symbols and quite a few
allographs (positional variants of them), corresponding to forty-four phonemes
or functional speech sounds (7 oral and 7 nasal vowels and 30 consonants) (150),
with some obvious redundancies, although in one of the first phonemic analyses,
the number was thought to be thirty-five phonemes [140].

● As mentioned above, in Bangla, several graphemic symbols have secondary
shapes, technically called ‘allographs’, with a complementary distribution in each
case. These graphs or markings are generally added to the following positions of
the primary symbol [113] in the following manner:

1) Below (e.g. কু (ku), ন্ত (nta), কূ (kū), হ্র (hra), etc.)
2) Above (e.g. চঁ (cã), র্ক (rka), etc.)
3) Right side (e.g. কা (kā), কং (kaṅ), etc.)
4) Left side (e.g. কে (ke))
5) Left side and above simultaneously (e.g. কৈ (kai), কি (ki) etc.)
6) Right side and above simultaneously (e.g. কী (kī))
7) Right side and left side simultaneously (e.g. কো (ko))
8) Right side, left side and above simultaneously (e.g. কৌ (kau)).
● As for complementary distribution of vowel letters (word- or syllable-initial) and
Vowel Mātrās, which are relevant for ABNF, let us consider the following.
Besides some simple Vowel Modifiers called ‘Kārs’ in Bangla (also referred to as
Mātrā in the other LGR documents of Neo-Brāhmī) there are some combinatory
modifiers of Bangla Vowels with certain consonants. For example, whereas

আ U+0986 BENGALI LETTER AA is substituted by
◌া U+09BE BENGALI VOWEL SIGN AA,
ই U+0987 BENGALI LETTER I is substituted by
pre-posed ◌ি U+09BF BENGALI VOWEL SIGN I,
ঈ U+0988 BENGALI LETTER II is substituted by
◌ী U+09C0 BENGALI VOWEL SIGN II or
উ U+0989 BENGALI LETTER U is substituted by
◌ু U+09C1 BENGALI VOWEL SIGN U by marking below the primary
grapheme, there are some special vowel modifiers of উ as in the following
combined letters:
গু gu, written as a single ligature rather than as গ (g) + ◌ু (u)
রু ru, written as a single ligature rather than as র (r) + ◌ু (u)
শু śu, written as a single ligature rather than as শ (ś) + ◌ু (u)
হু hu, written as a single ligature rather than as হ (h) + ◌ু (u)
ন্তু ntu, written as a single ligature rather than as ন (n) + ত (t) + ◌ু (u)

Similarly, there could be vowel modifiers of ঊ or ‘(Long) ū’ as well; e.g.
ভ (bh) + র (r) (ভ্রূ bhrū “eyebrow”), শ (ś) + র (r) (শ্রূ śrū), ঋ (ṛ) after হ (h) (হৃ hṛ), etc.

● There have been many notable contributions in simplifying and modifying Bangla
spellings and combinatory techniques, especially by scholars such as Pabitra
Sarkar (1992) [134]. In this there has been an attempt to reduce the number of
allographs of both vowels and consonants in clusters, and it has been widely
accepted in the printing of school texts in both Bangladesh and West Bengal [151,
152]. As of now, two systems, the old (traditional), and the new, go on side by
side, operative in different domains.
However, in preparation of this LGR document, the aim has been to consider the widely
used and usable sequences and combinations and their variations across the sister
scripts belonging to the basket of Brāhmī writing systems.
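
The substitution of independent vowel letters by vowel signs described in the list above can be illustrated with a minimal Python sketch. It relies only on the standard `unicodedata` module and on the fact that the Unicode names of the two series differ only in the prefix ("BENGALI LETTER ..." versus "BENGALI VOWEL SIGN ..."). The helper names `vowel_sign_for` and `attach_vowel` are hypothetical and are not part of the LGR; the sketch is meant as an aid to reading, not as a normative rule.

```python
import unicodedata

LETTER_PREFIX = "BENGALI LETTER "
SIGN_PREFIX = "BENGALI VOWEL SIGN "

def vowel_sign_for(letter):
    """Return the vowel sign (kar) matching a Bengali letter, or None.

    None is returned when no sign with the same name suffix exists,
    which is the case for U+0985 BENGALI LETTER A (the inherent vowel)
    and for consonant letters.
    """
    name = unicodedata.name(letter)
    if not name.startswith(LETTER_PREFIX):
        raise ValueError(f"not a Bengali letter: {name}")
    try:
        return unicodedata.lookup(SIGN_PREFIX + name[len(LETTER_PREFIX):])
    except KeyError:
        return None

def attach_vowel(consonant, vowel_letter):
    """Write consonant + vowel using the dependent sign where one exists."""
    sign = vowel_sign_for(vowel_letter)
    return consonant if sign is None else consonant + sign

ka = "\u0995"                               # BENGALI LETTER KA
print(attach_vowel(ka, "\u0986"))           # KA followed by VOWEL SIGN AA
print(attach_vowel(ka, "\u0985"))           # KA alone: the inherent vowel has no sign

# The sign written on both sides of the consonant (position 7 in the list
# above) has a canonical decomposition into its left and right parts.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ka + "\u09CB")])
```

The last line shows that U+09CB BENGALI VOWEL SIGN O decomposes canonically into U+09C7 followed by U+09BE, which mirrors the "left side and right side simultaneously" placement.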

Bangla Academy, Dhaka published Standard Bangla Spelling Rules in 1992 following the
recommendations of a committee constituted through a workshop jointly organized by
the Jātīya Śikṣākrama and Pāṭhyapustaka Board in 1988. A thoroughly revised edition of
the Rules was published in September 2012.6

After the establishment of the Bāṅlā Ākādemi of West Bengal in 1986, its first President,
Annadasankar Ray (1904-2002), in his inaugural address, gave a direction for
standardization of the Bangla alphabet, script and spelling system, and clearly argued that
they would not blindly follow the Sanskritic model of conventional grammar. A broad
list of proposals was sent to experts on Bangla, and a broad agreement was reached for
‘homogenization of Bangla spelling’ by 1988. Based on opinions received from different
quarters, a unanimous list of ‘rules’ was agreed upon. This was published as a ‘Spelling
Dictionary’ titled Ākādemi Bānāna Abhidhāna (1997), which was obviously more
comprehensive than ‘The University of Calcutta proposals’ made in 1936. Along with
the ‘rationalization’ of spellings, another step was taken to make the writing system
easier to read, by making the symbols used, both single and combined ones, more
‘transparent’. These reforms were originally suggested by Sarkar (1987, first published
in 1978) [134] [153], where he used the terms Swaccha (‘Transparent’) and Aswaccha
(‘Opaque’ or non-transparent), even adding Ardha Swaccha (‘half transparent’) in
between the two. Some examples are:

Transparent: ন্ন (nn), প্ত (pt), স্ত (st), where both members of the cluster can be
recognized.
Opaque: where neither of the two could be (easily) recognized, e.g. ক্ষ kṣ (ক k + ষ ṣ), জ্ঞ jñ
(জ j + ঞ ñ), ঙ্গ ṅg (ঙ ṅ + গ g), হ্ম hm (হ h + ম m).
Semi-transparent: প্র (pr), র্প (rp), where one symbol is recognizable and the other is
not. In case of three-term clusters, at least one symbol will not be transparent, e.g. স্ত্র str
(স s + ত t + র r), ষ্ট্র ṣṭr (ষ ṣ + ট ṭ + র r), etc.

6 Bangla Academy. 2012. Bāṅlā Ekaḍemī Pramita Bāṅlā Bānānera Niyama (Bangla Academy Standard
Bangla Spelling Rules). Dhaka: Bangla Academy.

There were, in fact, two types of proposals. One concerned the shape of the letters,
those of consonant + vowel (CV) combinations and conjuncts, i.e. consonant +
consonant combinations. There were further complex shapes, i.e. those of consonant +
consonant + (consonant +) vowel (CC(C)V) signs, as in প্রু (pru) or স্ক্রু (skru). Some
decisions in this area were necessary because a few of the CC(C) symbols represented
complexities that made learning them difficult for children. The other dealt with the
spellings of words only, without any reference to the shapes of letters in which they
were written. The basic objective here was ‘one word, one spelling’, to the greatest
extent that was possible. [151]
Below we place a statement of the most salient changes that affect the consonant +
vowel combinations. [153]
a. The variants of the short u (হ্রস্ব উ-কার hrasva u-kāra) vowel sign have been
brought down to one, i.e., ◌ু. So the ligature for gu is now written গু. Similarly ru > রু,
śu > শু, hu > হু, and therefore, for cluster + short u sign: ntu > ন্তু
(ন + ◌্ + ত + ◌ু), stu > স্তু (স + ◌্ + ত + ◌ু)
b. The variants of long u (দীর্ঘ ঊ-কার dīrgha ū-kāra) have also been reduced:
rū > রূ; bhrū > ভ্রূ (ভ bh + ◌্ + র r + ◌ূ ū); drū > দ্রূ (দ d + ◌্ + র r + ◌ূ ū);
śrū > শ্রূ (শ ś + ◌্ + র r + ◌ূ ū)
c. The variants of ঋ-কার (ṛ-kāra "secondary symbol of ṛ") have been brought
down to one: hṛ > হৃ
Regarding consonant + consonant + (consonant)… + (vowel) clusters, Paschimbanga
Bangla Akademi proposed transparent or semi-transparent shapes for clusters to the
extent admissible in the Bangla writing system. Some examples will clarify the proposal. (In
the source chart, the traditional cluster shape precedes a slash and the Bangla Akademi
innovation follows it; each cluster is given here with its components.) [153]
ব্ধ bdh (ব b + ধ dh), দ্ধ ddh (দ d + ধ dh), ন্থ nth (ন n + থ th), ঞ্চ ñc (ঞ ñ + চ c),
ঞ্ছ ñch (ঞ ñ + ছ ch), ঞ্জ ñj (ঞ ñ + জ j), ক্ত kt (ক k + ত t), ক্র kr (ক k + র r),
গ্ধ gdh (গ g + ধ dh), ঙ্ক ṅk (ঙ ṅ + ক k), ঙ্গ ṅg (ঙ ṅ + গ g), ষ্ণ ṣṇ (ষ ṣ + ণ ṇ),
ন্ধ্র ndhr (ন n + ধ dh + র r), ণ্ড্র ṇḍr (ণ ṇ + ড ḍ + র r), ক্ত্র ktr (ক k + ত t + র r)
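
The component breakdowns given above, such as ন + ◌্ + ত for ntu, can be checked mechanically. The following Python sketch, using only the standard `unicodedata` module, lists the code points behind a cluster and splits it into its member consonants. The function names `spell_out` and `cluster_members` are hypothetical illustrations and are not taken from the LGR.

```python
import unicodedata

HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA, called Hasanta in this document

def spell_out(text):
    """List each code point of `text` together with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

def cluster_members(cluster):
    """Return the consonants of a Hasanta-joined cluster, in order."""
    return cluster.split(HASANTA)

ntu = "\u09A8\u09CD\u09A4\u09C1"            # na + hasanta + ta + vowel sign u
for line in spell_out(ntu):
    print(line)

ndhr = "\u09A8\u09CD\u09A7\u09CD\u09B0"     # na + hasanta + dha + hasanta + ra
print(cluster_members(ndhr))
```

Whether a font draws such a sequence with a traditional or a more transparent shape is a matter of glyph design; the sketch only inspects the stored characters.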

3.3.1 The Consonants
As per traditional classification Bangla Consonants are categorized according to their
phonetic properties, especially in terms of place and manner of articulation [107]. There
are Five ‘Varga’ (pronounced as ‘Barga’ in Bangla) or Groups (sets or classes)
distinguished by Place of Articulation, and one Non-‘varga’ group [105]. Each Varga,
which corresponds to Stops at a certain place of articulation, contains a series of five
consonants classified as per their phonetic qualities (i.e. manner of articulation),
beginning from Unvoiced and Unaspirated to Voiced and Aspirated (in the fourth
column), finally ending with a Homorganic or Corresponding nasal [107]. Consider the
following table:

‘Varga’ or Sets | Unvoiced -Asp | Unvoiced +Asp | Voiced -Asp | Voiced +Asp | Nasal
Velar | ক ‘K’ U+0995 | খ ‘KH’ U+0996 | গ ‘G’ U+0997 | ঘ ‘GH’ U+0998 | ঙ ‘NG’ U+0999
Palatal | চ ‘C’ U+099A | ছ ‘CH’ U+099B | জ ‘J’ U+099C | ঝ ‘JH’ U+099D | ঞ ‘NY’ U+099E
Retroflex | ট ‘TT’ U+099F | ঠ ‘TTH’ U+09A0 | ড ‘DD’ U+09A1 | ঢ ‘DDH’ U+09A2 | ণ ‘NN’ U+09A3
Dental | ত ‘T’ U+09A4 | থ ‘TH’ U+09A5 | দ ‘D’ U+09A6 | ধ ‘DH’ U+09A7 | ন ‘N’ U+09A8
Bilabial | প ‘P’ U+09AA | ফ ‘PH’ U+09AB | ব ‘B’ U+09AC | ভ ‘BH’ U+09AD | ম ‘M’ U+09AE

Table 5: Varga classification of Bangla consonants
(Falling into a Pattern of Five Sets of Unvoiced Unaspirated, Unvoiced Aspirated, Voiced Unaspirated,
Voiced Aspirated and Nasals, called five ‘Varga’)

Non-Varga | য ‘Y’ U+09AF | য় ‘YY’ U+09DF | র ‘R’ U+09B0 | ড় ‘RR’ U+09DC | ঢ় ‘RH’ U+09DD
Non-Varga | ল ‘L’ U+09B2 | শ ‘SH’ U+09B6 | ষ ‘SS’ U+09B7 | স ‘S’ U+09B8 | হ ‘H’ U+09B9

Table 6: Non-Varga consonants (Not falling into any of the five categories)
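
Because the five Varga rows of Table 5 occupy consecutive code points (with U+09A9 left unassigned between the Dental and Bilabial rows), the row and column of a consonant can be derived from its code point alone. The sketch below illustrates this reading of the table in Python; the function name `classify` and the label strings are hypothetical and are not defined by the LGR.

```python
# First code point of each Varga row in Table 5.
VARGA_STARTS = {
    "Velar": 0x0995,
    "Palatal": 0x099A,
    "Retroflex": 0x099F,
    "Dental": 0x09A4,
    "Bilabial": 0x09AA,   # U+09A9 is unassigned, so this row starts one later
}

# Column order of Table 5.
MANNERS = [
    "unvoiced unaspirated",
    "unvoiced aspirated",
    "voiced unaspirated",
    "voiced aspirated",
    "nasal",
]

def classify(ch):
    """Return (varga, manner) for a Varga consonant, or None otherwise."""
    cp = ord(ch)
    for varga, start in VARGA_STARTS.items():
        offset = cp - start
        if 0 <= offset < len(MANNERS):
            return varga, MANNERS[offset]
    return None   # Non-Varga consonants (Table 6) and all other characters

print(classify("\u0995"))   # KA
print(classify("\u09AD"))   # BHA
print(classify("\u09B9"))   # HA is a Non-Varga consonant
```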

3.3.2 The Implicit Vowel Killer: Hasanta (called ‘Halant’ or ‘Halanta’ in other
Brāhmī-based scripts)
As stated earlier, all consonants are pronounced in isolation with an implicit vowel
(central back /-ɔ/ in Bangla as the neutral vowel) assumed to be associated with them
[121]. The term ‘Hasanta’ (= ‘Halant’ or ‘Halanta’ in other Brāhmī-based scripts) and the term
‘Virāma’7 (= ‘Dā̃ri’ in Bangla), as preferred in UNICODE (cf. Unicode 3.0 and above), are
used in this report to denote the character that marks
the absence of this inherent vowel. It may be noted that the term virāma has been
adopted in UNICODE in a sense that is different from the traditional definition of
grammar, and hence it requires some explanation here. Considering the importance of
the document, this note should be a part of this LGR document, so that anybody referring
to it is able to know the proper grammatical explanation of the term. Because a
special sign is needed whenever this implicit vowel is stripped off, the symbol is known
as the Hasanta (= Halant) "◌্" (U+09CD). By placing the Hasanta under the first
consonant of a combination or cluster, one can, in common parlance, “kill” its vowel
and create conjuncts. In this manner, conjunct characters can generally be written by
joining two to four consonants. In rare cases, this process can join up to
five consonants. However, the notion of a maximum number of consonants joining to
form one akṣara8 can only be bounded empirically. This is an observation based on the CIIL-
Emille Corpora of Bangla words [132 & 133] as seen in print these days. Given the
mixture of scripts and languages happening on the web, the possibility that one may
want a generic Top Level Domain [gTLD] which has more than the observed
maximum cannot be ruled out. This can be the case when a foreign language word,
which admits a large number of consonants, is transliterated into Bangla. Hence, in the
Bangla LGR work, this limit will not be enforced.

3.3.3 Vowels
Separate symbols exist for all ‘Swara’ or Vowels in Bangla, which are pronounced
independently either at the beginning of the word or after another vowel or consonant
sound. To indicate a Vowel sound other than the implicit one, a Vowel sign, called ‘kār’
in Bangla or Mātrā in Nāgarī9, is attached to the consonant. Since the consonant has this
built-in neutral vowel at the end, there are equivalent kāras (Mātrās) for all vowels
except অ (pronounced /-ɔ/). The correlation is shown as follows:

7 Virāma, as used here, is also a misnomer according to the Indian grammatical traditions. Nowhere is the
mere absence of a vowel marked as virāma. Hasanta just marks the absence of a vowel, nothing else.
(Abhyankar, Kashinath Vasudev & J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Baroda: Oriental
Institute.)
8 This term needs to be disambiguated. Akṣara also means ‘syllable’ in Indian grammatical traditions.
9 In Bangla, however, the term ‘Mātrā’ stands for an altogether different concept, viz. the top bar placed
over a letter, which is typical of Hindi and Bangla but absent in Gujarati.

Vowel                       Corresponding vowel sign (kāra / Mātrā)

অ ‘A’ U+0985                (none)
আ ‘AA’ U+0986               ◌া U+09BE
ই ‘I’ U+0987                ◌ি U+09BF
ঈ ‘II’ U+0988               ◌ী U+09C0
উ ‘U’ U+0989                ◌ু U+09C1
ঊ ‘UU’ U+098A               ◌ূ U+09C2
ঋ Vocalic ‘R’ U+098B        ◌ৃ U+09C3
ৠ Vocalic ‘RR’ U+09E0       ◌ৄ U+09C4
ঌ Vocalic ‘L’ U+098C        ◌ৢ U+09E2
ৡ Vocalic ‘LL’ U+09E1       ◌ৣ U+09E3
এ ‘E’ U+098F                ◌ে U+09C7
ঐ ‘AI’ U+0990               ◌ৈ U+09C8
ও ‘O’ U+0993                ◌ো U+09CB
ঔ ‘AU’ U+0994               ◌ৌ U+09CC
-                           ◌ৗ U+09D7

Context                                                     Sign

Could appear on top of অ ‘A’ U+0985 or any other vowel      ◌ঁ U+0981 Candrabindu
Could appear after অ ‘A’ U+0985 or any other vowel          ◌ং U+0982 Anusvāra
Could appear after অ ‘A’ U+0985 or any other vowel          ◌ঃ U+0983 Visarga
After any consonant                                         ◌্ U+09CD (Hasanta)
-                                                           ঽ U+09BD Avagraha

Table 7: Bangla Vowels with corresponding kārs
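
The correspondence in Table 7 can be sketched as a lookup table. This is only an illustration: the names VOWEL_TO_KARA and kara_for are hypothetical and not part of the LGR, and the pairs are transcribed from the table above. The standard unicodedata module is used to print the official character names for cross-checking.

```python
import unicodedata

# Independent vowel -> dependent vowel sign (kāra), transcribed from Table 7.
# U+0985 (A) is deliberately absent: the implicit vowel has no sign.
VOWEL_TO_KARA = {
    "\u0986": "\u09BE",  # AA
    "\u0987": "\u09BF",  # I
    "\u0988": "\u09C0",  # II
    "\u0989": "\u09C1",  # U
    "\u098A": "\u09C2",  # UU
    "\u098B": "\u09C3",  # VOCALIC R
    "\u09E0": "\u09C4",  # VOCALIC RR
    "\u098C": "\u09E2",  # VOCALIC L
    "\u09E1": "\u09E3",  # VOCALIC LL
    "\u098F": "\u09C7",  # E
    "\u0990": "\u09C8",  # AI
    "\u0993": "\u09CB",  # O
    "\u0994": "\u09CC",  # AU
}


def kara_for(vowel):
    """Return the vowel sign for an independent vowel, or None for the implicit vowel."""
    return VOWEL_TO_KARA.get(vowel)


if __name__ == "__main__":
    for vowel, kara in VOWEL_TO_KARA.items():
        print(unicodedata.name(vowel), "->", unicodedata.name(kara))
```

Printing the names rather than the glyphs avoids relying on a font that renders the dependent signs correctly.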

3.3.4 The Anusvāra /onuʃʃār/ (◌ং - U+0982)
The Anusvāra or /onuʃʃār/ in Bangla at times represents a homorganic nasal, but not
always. It replaces a conjunct group of ‘Nasal Consonant + Hasanta + Consonant’ where
the second consonant belongs to the Velar varga or set, as in লংকা. But it often also
appears in such combinations where the last member is a non-velar, as in ল্যাংটা
“naked” or ল্যাংচা “a kind of sweet/to limp”. Before a non-varga consonant, the
Anusvāra represents a nasal sound that may have an alternative conjoined writing symbol
representing the corresponding nasal consonant of the particular set. Although Modern
Hindi, Marathi and Konkani prefer the anusvāra to the corresponding Half-nasal, Bangla
clearly demarcates where one must use the Anusvāra and where one must use a conjunct
cluster with a nasal as the first or the second component.

3.3.5 Nasalization: Candrabindu (◌ঁ - U+0981)

Candrabindu denotes nasalization of the preceding vowel as in চাঁদ /cā̃d/ ‘moon’
(U+099A U+09BE U+0981 U+09A6). This sign, a dot inside a half-moon mark, is used as
the nasalization marker in many Brāhmī-based scripts. [143]

3.3.6 Nukta (◌় - U+09BC)

The nukta sign does not exist in Bangla orthography. It is predominantly used in many
Brāhmī-derived scripts, such as Devanāgarī (for Hindi, Bodo, Maithili, Santali, Kashmiri
and Sindhi). Both the term and the concept of nukta are borrowed into Bangla.

The IDNA Protocol (RFC 5891) states that IDNs must be in Unicode Normalization Form
C (NFC). RFC 7940 applies this requirement to LGRs. The definition of NFC in the
Unicode Standard contains a number of composition exclusions. As a result, the Bangla
letters য় YYA, ড় RRA and ঢ় RHA have to be represented in this LGR by the
sequences (YA + Nukta: U+09AF + U+09BC), (DDA + Nukta: U+09A1 + U+09BC), and
(DDHA + Nukta: U+09A2 + U+09BC) instead of the single code points YYA (U+09DF), RRA
(U+09DC), and RHA (U+09DD), although the use of ‘Nukta’ is otherwise completely
unnatural in Bangla.
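
The effect of the composition exclusions can be observed with Python's standard unicodedata module. The following is a minimal sketch; the names PRECOMPOSED and to_nfc are illustrative only and not part of the LGR.

```python
import unicodedata

# Precomposed letters named in the text, paired with the sequences
# that NFC normalization is expected to produce for them.
PRECOMPOSED = {
    "\u09DC": "\u09A1\u09BC",  # RRA -> DDA + NUKTA
    "\u09DD": "\u09A2\u09BC",  # RHA -> DDHA + NUKTA
    "\u09DF": "\u09AF\u09BC",  # YYA -> YA + NUKTA
}


def to_nfc(label):
    """Normalize a label to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", label)


if __name__ == "__main__":
    for single, sequence in PRECOMPOSED.items():
        normalized = to_nfc(single)
        print(f"U+{ord(single):04X} ->",
              " ".join(f"U+{ord(ch):04X}" for ch in normalized),
              "(matches expected)" if normalized == sequence else "(differs)")
```

Because the sequences are already in NFC, normalizing them again leaves them unchanged, which is why only the sequence forms can appear in a label.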

It is noted that in the current Unicode Standard chart, these characters are listed as
additional consonants. As per the LGR Procedure, however, these decisions depend on
the IDNA Protocol through a set of procedures developed by the IETF. Even though the
Unicode Standard also provides these three characters as atomic characters (for
example, 09DC for ড় [ṛ], 09DD for ঢ় [ṛh], and 09DF for য় [y], each available as a
single key stroke), the IDNA protocol requires that they be treated here as sequences
built from code points in the Unicode Bengali Block.

It may be noted that there could be sporadic attempts at writing Muslim names,
Urdu poetic words and Perso-Arabic loan words with a nukta under ক (k), খ (kh), গ (g), জ
(j) and ফ (ph), only for the sake of correct pronunciation and for maintaining the
sanctity of the loan word. Such attempts amount to using the Bangla writing system as
if it were the IPA script. This practice is, however, not in use in printed Bangla.

3.3.7 Visarga /biʃɔrgo/ (◌ঃ - U+0983) and Avagraha (ঽ - U+09BD)


The Visarga /biʃɔrgo/ U+0983 is frequently used in Bangla loanwords borrowed from
Sanskrit and represents a sound very close to /h/. One could quote, as an example: দুঃখ
/duhkho/ “sorrow”, “unhappiness” (U+09A6 U+09C1 U+0983 U+0996).
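
Code point listings such as the one above can be produced and sanity-checked mechanically. The sketch below uses hypothetical helper names (code_points, in_bengali_block) and simply tests that every character of a word lies in the Bengali block U+0980..U+09FF.

```python
def code_points(text):
    """Return the code points of text in 'U+XXXX' notation, space separated."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def in_bengali_block(text):
    """True if every character lies in the Unicode Bengali block (U+0980..U+09FF)."""
    return all(0x0980 <= ord(ch) <= 0x09FF for ch in text)


if __name__ == "__main__":
    duhkha = "\u09A6\u09C1\u0983\u0996"   # DA + VOWEL SIGN U + VISARGA + KHA
    print(code_points(duhkha), in_bengali_block(duhkha))
```

A listing that contains values from another block (for example U+09xx mistyped as the parallel Devanāgarī U+09xx minus 0x80) fails the second check immediately.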

The Avagraha "ঽ" (U+09BD) is mainly used in Sanskrit, Pāli, Prākṛt or Maithili texts
written in Bangla. It is gradually being replaced by an upper comma (e.g. নরোঽপরাণি
re-written as নরো’পরাণি). It is now rarely used even in other languages using the
Bangla script. The Avagraha (ঽ) (U+09BD) is excluded from the Maximal Starting
Repertoire (MSR) and is therefore blocked in TLDs; accordingly, it is not part of the
repertoire of this LGR.

Please see Appendix II in section 11 for a complete list of Bangla consonants and their
allographs.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

This note is pertinent to the use of Zero Width Joiner (ZWJ) and Zero Width Non Joiner
(ZWNJ) as used in Bangla. It needs to be noted that Nepali, Konkani and Hindi use these
two signs in a different manner.

ZWJ (U+200D) and ZWNJ (U+200C) are code points provided by the Unicode Standard to
control the rendering of a string where the script offers a choice between joining and
non-joining forms. Without these control codes, the string may be rendered in a form
other than the one intended.

Use of ZWJ

• Insofar as Bangla is concerned, ZWJ is used for the proper rendering of characters
such as khaṇḍa-ta /ৎ/ as in সত্যজিৎ (satyajit) “Satyajit” and সৎ (sat) “honest”. This
is typed as follows:
ta + Hasanta + ZWJ (U+200D)

• However, ZWJ is more important where the same combination of consonantal
characters is represented differently depending on the context. E.g. র+◌্+য
has two representations in Bangla: র্য (repha over ya) and র‍্য (ra with ya-phalā).
To get the form র্য one types র+◌্+য, but for র‍্য the sequence is
র+ZWJ+◌্+য. [154]. In other words, ZWJ is used in the rendering of words
demanding ya-phalā after ra, which is otherwise not possible to type (render)
because ra+hasanta+antastha ja has the same order in the medial and/or final
position. Ordinarily, ra+hasanta+antastha ja is used to type repha on the
consonant antastha ja, as in কার্য (kaarjo). In order to get a ya-phalā after the
consonant ra it is therefore obligatory to use ZWJ after ra, as in র‍্যাপার
(wrapper), র‍্যাশ (rash), র‍্যালি (rally) etc. The typing sequence is given below:
ra (র) + ZWJ + hasanta (◌্) + antastha ja (য) = র‍্য

Use of ZWNJ

• ZWNJ is used in Bangla to represent the explicit Hasanta or Halant. Where an
explicit hasanta must be shown before the succeeding consonant, ZWNJ is used to
prevent conjunct formation.

Consonant + hasanta + ZWNJ + consonant = explicit hasanta

Example: প্রাক্‌কথন (prākkathana /prakkɔtʰon/)

The use of ZWJ/ZWNJ has been ruled out from the root zone by the [Procedure]. When
used in Bangla to create alternate renderings, the insertion of these two signs can
affect searching as well as NLP.

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after
Hasanta) where default conjunct formation is to be explicitly restricted and the Hasanta
joining the two consonants participating in the conjunct formation needs to be explicitly
shown.
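
Because ZWJ and ZWNJ are invisible, their presence in a string is easiest to see at the code point level. The following sketch (the helper names has_joiner and strip_joiners are hypothetical) shows that the two renderings of ra + hasanta + ya discussed above differ only by U+200D, and how a candidate label could be screened for either joiner.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER
RA, YA, HASANTA = "\u09B0", "\u09AF", "\u09CD"

# ra + hasanta + ya: rendered as repha over ya.
repha_on_ya = RA + HASANTA + YA
# ra + ZWJ + hasanta + ya: rendered as ra with ya-phalā.
ra_with_ya_phala = RA + ZWJ + HASANTA + YA


def has_joiner(label):
    """True if the label contains ZWJ or ZWNJ."""
    return ZWJ in label or ZWNJ in label


def strip_joiners(label):
    """Remove ZWJ and ZWNJ, e.g. to compare strings for search purposes."""
    return label.replace(ZWJ, "").replace(ZWNJ, "")


if __name__ == "__main__":
    for text in (repha_on_ya, ra_with_ya_phala):
        print([f"U+{ord(ch):04X}" for ch in text], has_joiner(text))
```

Stripping the joiners collapses both spellings to the same code point sequence, which illustrates why their insertion can affect searching.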

3.3.9 Use of Ya-phalaa

The Ya-phalā sequences below are the two instances in Bangla where Hasanta is preceded
by a full vowel (U+0985 অ - BENGALI LETTER A and U+098F এ - BENGALI LETTER E).
• অ্যা 0985 09CD 09AF 09BE
BENGALI LETTER A + BENGALI SIGN VIRAMA +
BENGALI LETTER YA + BENGALI VOWEL SIGN AA
• এ্যা 098F 09CD 09AF 09BE
BENGALI LETTER E + BENGALI SIGN VIRAMA + BENGALI LETTER YA +
BENGALI VOWEL SIGN AA
For rendering Ya-phalā after অ and এ, it is necessary to type U+09CD Hasanta
plus U+09AF ya after the said vowels. This is a purely ligatural entity, and the
addition of Ya-phalā and ākāra is used to elicit the /æ/ sound as in English 'acid' অ্যাসিড,
'association' অ্যাসোসিয়েশন, ‘bat’ ব্যাট, ‘fat’ ফ্যাট, ‘mat’ ম্যাট, ‘cap’ ক্যাপ etc.

The Brāhmī script, by nature, does not have Hasanta after a vowel. Hasanta is generally
described as the ‘vowel killer’, although it actually indicates the absence of a vowel
after the marked consonant. Only consonants can be marked with the Hasanta. But as we
see here, Bangla ends up with a deviant orthographic feature in which Hasanta
comes immediately after a vowel in the ligatures অ্যা and এ্যা (Cf Unicode 10.0 p. 473 [100]).

3.3.10 Formation of Ra-phalaa and Ref Sequences

This case refers to the formation of repha and ra-phalā as follows:

Ra-Hasanta = (C2 H)
where C2 is either
09B0 (র - BENGALI LETTER RA) or
09F0 (ৰ - ASSAMESE LETTER RA / Unicode name:
BENGALI LETTER RA WITH MIDDLE DIAGONAL)
H is 09CD (◌্ - BENGALI SIGN VIRAMA)

Owing to co-occurrence with HASANTA, RA either loses its own implicit vowel (REPHA)
or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ). For instance,
repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka “the sun”);
ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র chakra “cycle”).
In both cases the slot for ra could be filled by Bangla ra র (U+09B0) or Assamese ra ৰ
(U+09F0), followed or preceded by the common Hasanta (U+09CD), while the shapes
of repha and ra-phalā remain the same. The LGR notes this point of concern about
the two RAs in disguise: it would be completely impossible to distinguish between them
with the naked eye in a label so generated, which may lead to spoofing and other kinds
of cyber irregularities. These two CPs are classed as (blocking) variants because fully
rendered labels may mask the distinction between Bangla ra র (U+09B0) and
Assamese ra ৰ (U+09F0). That provides the justification for Variant Set 4, though only in
the context of a following Hasanta. The difference between the RAs can be seen only
by inspecting their Unicode values. Therefore, labels such as অর্ক arka, শীর্ষ śīrṣa ‘top/
apex’, অভ্র abhra ‘cloud/the sky’, শ্রম śrama ‘physical labour’ could be extremely
dangerous, as the web user may never verify the digital content (the labels) against its
Unicode values/code points. This point is made explicitly with reference to Table 9 (of
sequences, p. 36) and Table 16 (of WLE Symbols, p. 47) that are to follow. Moreover, it
is noteworthy that the REPHA can also occur with KHANDA TA. In this context of
KHANDA TA, the consonant forming the repha should again be either RA U+09B0 (র)
(used in Bangla) or RA U+09F0 (ৰ) (used in Assamese).
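
The concern about the two RAs can be illustrated at the code point level. The sketch below is not the LGR's variant mechanism: the function confusable_key is a hypothetical helper, and it assumes that the two RAs are to be treated as equivalent whenever the RA is immediately followed or preceded by Hasanta.

```python
HASANTA = "\u09CD"
BANGLA_RA = "\u09B0"     # BENGALI LETTER RA
ASSAMESE_RA = "\u09F0"   # BENGALI LETTER RA WITH MIDDLE DIAGONAL
KA = "\u0995"


def repha(ra, consonant):
    """repha = ra + Hasanta + C"""
    return ra + HASANTA + consonant


def ra_phala(consonant, ra):
    """ra-phalā = C + Hasanta + ra"""
    return consonant + HASANTA + ra


def confusable_key(label):
    """Fold U+09F0 to U+09B0 only where it is adjacent to a Hasanta (assumption)."""
    out = []
    for i, ch in enumerate(label):
        prev_h = i > 0 and label[i - 1] == HASANTA
        next_h = i + 1 < len(label) and label[i + 1] == HASANTA
        if ch == ASSAMESE_RA and (prev_h or next_h):
            ch = BANGLA_RA
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    a, b = repha(BANGLA_RA, KA), repha(ASSAMESE_RA, KA)
    print(a == b, confusable_key(a) == confusable_key(b))
```

Two labels whose keys coincide would look the same when rendered, even though their code points differ; a standalone ৰ, which keeps its distinct shape, is left untouched by the fold.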

4. Overall Development Process and Methodology

The Neo-Brāhmī Generation Panel (NBGP) has been formed by members having
experience in Linguistics (especially in NLP / Computational linguistics), Literature,
Language History and Epigraphy. Under the Neo-Brāhmī Generation Panel, Bangla and
eight other scripts belonging to separate Unicode blocks are being taken up, with a
separate LGR assigned to each. However, an attempt is made to ensure that the
fundamental philosophy behind building those LGRs is consistent across all
Brāhmī-derived scripts. The present LGR will cater to multiple languages belonging to
EGIDS scale 1 to 4 (see Table 4) that use the Bangla script.

The following guiding principles are used in making decisions about Bangla LGR Code-
points:

4.1 Guiding Principles


The NBGP adopts the following broad principles for selection of code points in the
code-point repertoire across the board for all the Neo-Brāhmī scripts within its ambit.

4.1.1 Inclusion Principles

4.1.1.1 Modern Usage


Every character proposed should be in the everyday usage of a particular linguistic
community. The characters, which have been encoded in the Unicode for transcription
purposes only or for archival purposes, will not be considered for inclusion in the code-
point repertoire.

4.1.1.2 Unambiguous Use


Every character proposed should have unambiguous understanding among linguists
about its usage in the language.

4.2 Exclusion Principles


The main exclusion principle is that of External Limits on Scope. These consist of
protocols or standards, which are prerequisites to the Label Generation Rule-sets. All
further principles are in fact subsumed under these limitations but have been spelt out
separately for the sake of clarity.

4.2.1 External Limits of Scope
The code point repertoire for root zone being a very special case, at the top of protocol
hierarchies, the canvas of available characters for selection as a part of the Root Zone
code point repertoire is already constrained by various protocol layers beneath it. The
following three main protocols/standards act as successive filters:

i. The Unicode Chart

Out of all the characters that are needed by the script in question, if a particular
character is not encoded in Unicode, it cannot be incorporated in the code point
repertoire. Such cases are quite rare, and especially so in the Bangla-Asamiyā-Maṇipuri
Writing System, given the elaborate and exhaustive character inclusion efforts made by
the Unicode Consortium.

ii. IDNA Protocol

Unicode being the character-encoding standard for providing the maximum possible
representation of a given script/language, it has encoded as far as possible all the
possible characters needed by the script. However, the Domain name being a
specialized case, it is governed by an additional protocol known as IDNA
(Internationalized Domain Names in Applications). The IDNA protocol excludes some
characters out of Unicode repertoire from being part of the domain names.

iii. Maximal Starting Repertoire (MSR)

The Root-zone LGR being the repertoire of characters which are going to be used for
creation of the Root-zone TLDs, which in turn constitute an even more specialized case
of domain names, the ROOT LGR procedure introduces additional exclusions on the
IDNA’s allowed set of characters.

Example: Bangla Sign Avagraha "ঽ" (U+09BD), even if allowed by the IDNA protocol, is not
permitted in the Root Zone Repertoire as per the MSR.

To sum up, the restrictions start off with admitting only such characters as are part of
the code-block of the given script/language. The IDNA Protocol further narrows this
down and finally an additional filter in the form of Maximal Starting Repertoire restricts
the character set associated with the given language even more.
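
The three successive filters can be pictured as nested membership tests. The sketch below is purely illustrative: the sets idna_pvalid and msr are tiny hand-picked samples (assumptions based on the examples in this document), not the real IDNA or MSR tables, and the function name first_failing_filter is hypothetical.

```python
KA = 0x0995        # BENGALI LETTER KA
AVAGRAHA = 0x09BD  # BENGALI SIGN AVAGRAHA
ISSHAR = 0x09FA    # BENGALI ISSHAR (a symbol)

# Filter 1: the Unicode block of the script.
unicode_bengali_block = set(range(0x0980, 0x0A00))
# Filter 2: sample of code points treated as PVALID under IDNA2008 (illustrative).
idna_pvalid = {KA, AVAGRAHA}
# Filter 3: sample of code points treated as included in the MSR (illustrative).
msr = {KA}

FILTERS = [
    ("Unicode block", unicode_bengali_block),
    ("IDNA", idna_pvalid),
    ("MSR", msr),
]


def first_failing_filter(cp):
    """Return the name of the first filter that rejects cp, or None if all accept it."""
    for name, allowed in FILTERS:
        if cp not in allowed:
            return name
    return None


if __name__ == "__main__":
    for cp in (KA, AVAGRAHA, ISSHAR, 0x0915):
        print(f"U+{cp:04X}", first_failing_filter(cp))
```

The Avagraha passes the first two filters and is stopped only by the MSR, which mirrors the example given above.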

4.2.1.1 No Punctuation Marks


The TLDs being identifiers, punctuation markers present in Brāhmī-based scripts will
not be included.

4.2.1.2 No Symbols and Abbreviations
Abbreviations, weights and measures and other such iconic characters like BANGLA
ISSHAR "৺" (U+09FA), BANGLA CURRENCY DENOMINATOR SIXTEEN "৹" (U+09F9) etc.
will also not be included.

4.2.1.3 No Rare and Obsolete Characters


There are characters which have been added to Unicode to accommodate rare forms
such as Sanskritic VOCALIC RR "ৠ" (U+09E0) and VOCALIC L “ঌ” (U+098C) as well as
VOCALIC LL "ৡ" (U+09E1) and the allographic -kāra forms of the latter two symbols,
VOWEL SIGN VOCALIC L "◌ৢ" (U+09E2) and VOWEL SIGN VOCALIC LL “◌ৣ" (U+09E3). All
such characters are excluded, which complies with the Conservatism principle as laid
down in the Root Zone LGR procedure. However, in Bangla, the -kāra corresponding to
VOCALIC RR "ৠ" (U+09E0), which is VOWEL SIGN VOCALIC RR “◌ৄ” (U+09C4), is still in
active use in certain limited borrowed or Sanskritic words, and is, therefore, retained.

4.2.1.4 No Stress Markers of Classical Sanskrit and Vedic


Stress markers for classical Sanskrit will not be included. This is also in consonance
with the Letter principle as laid down in the Root Zone LGR procedure.

4.2.1.5 ABNF
The Augmented Backus-Naur Formalism (ABNF) is described in Section 5.4.1 and
Appendix (Section 10.1).

5. Repertoire
The Bangla Writing System is represented in UNICODE using the Bengali (Bangla) script
name as enumerated in ISO 15924 corresponding to languages such as Asamiyā
(Assamese), Bangla (Bengali) and Maṇipuri. The BENGALI block used for Bangla-
Asamiyā-Maṇipuri in the UNICODE has 93 entries. This section details the code-point
repertoire that the Neo-Brāhmī Generation Panel [NBGP] proposes to be included in the
Bangla LGR.

It may be mentioned here that the Government of Assam submitted a proposal to the
Bureau of Indian Standards (BIS) on 26th February 2016 for dis-unification of the Bangla
and Asamiyā Scripts. The BIS, in the 8th Meeting of its Indian Language Technologies and
Products Sectional Committee, LITD 20, held on 23rd Aug 2017, decided to refer the
proposal for recognition of Assamese script in ISO/IEC 10646 to ISO. Until the UNICODE
Consortium takes any further action, it will be assumed that the Code Point Repertoire
under Table 8 is valid for all three languages mentioned above.

For each of the code points, language references have been given in the last column,
titled "References and Comment", of Table 8, the “Code Point Repertoire”. To cover all
the Bangla code points, references for Bangla, Asamiyā (Assamese), Maṇipuri (Meitei), and
Bishnupriya are given. Kokborok, written in Bangla script, is not known to have
introduced many new complications, except for one particular character. Though only a
few representative languages under EGIDS Scale 1-4 have been chosen for referencing,
they together cover all the code points required for all the languages that NBGP has
considered as given under Bangla Unicode Points (as given in UNICODE 6.3).

However, before the details are presented, it is useful to look at the Bangla Code Point
Chart from Maximal Starting Repertoire [MSR] Version 3. It may be noted that the shapes of
the reference glyphs given below in the code charts are based on one of the many fonts
designed, and are not prescriptive, because there could be some variations in actual
fonts, both UNICODE-compatible and True-Type ones. Consider the following Code
point table:

Colour convention:

• Yellow background: all characters that are included in the [MSR]
• Pinkish background: PVALID in IDNA2008 but excluded from the [MSR]
• White background: not PVALID in IDNA2008, or ineligible for the root zone (digits, hyphen)

Figure 1: Bangla Code Page from [MSR] for Bangla-Asamiyā-Maṇipuri

Given the Bangla Unicode Block as in Figure 1, among the code points that are included in the
MSR, the following symbols will need a separate treatment:
ৎ U+09CE Bangla Letter Khaṇḍa-Ta
ৰ U+09F0 Asamiyā-Bangla Letter Ra With Middle Diagonal
ৱ U+09F1 Asamiyā-Bangla Letter Ra With Lower Diagonal

5.1 Code Point Repertoire Inclusion
No. | Unicode Code Point | Glyph | Character Name | Category | Language(s) with EGIDS Value | References and Comment

1. | U+0981 | ◌ঁ | BENGALI SIGN CANDRABINDU | Candrabindu | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
2. | U+0982 | ◌ং | BENGALI SIGN ANUSVARA | Onushshar (Anusvāra) | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
3. | U+0983 | ◌ঃ | BENGALI SIGN VISARGA | Biśarga (Visarga) | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
4. | U+0985 | অ | BENGALI LETTER A | Vowel | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
5. | U+0986 | আ | BENGALI LETTER AA | Vowel | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
6. | U+0987 | ই | BENGALI LETTER I | Vowel | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
7. | U+0988 | ঈ | BENGALI LETTER II | Vowel | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
8. | U+0989 | উ | BENGALI LETTER U | Vowel | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
9. | U+098A | ঊ | BENGALI LETTER UU | Vowel | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
10. | U+098B | ঋ | BENGALI LETTER VOCALIC R | Vowel | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
11. | U+098F | এ | BENGALI LETTER E | Vowel | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
12. | U+0990 | ঐ | BENGALI LETTER AI | Vowel | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
13. | U+0993 | ও | BENGALI LETTER O | Vowel | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
14. | U+0994 | ঔ | BENGALI LETTER AU | Vowel | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
15. | U+0995 | ক | BENGALI LETTER KA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
16. | U+0996 | খ | BENGALI LETTER KHA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
17. | U+0997 | গ | BENGALI LETTER GA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
18. | U+0998 | ঘ | BENGALI LETTER GHA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
19. | U+0999 | ঙ | BENGALI LETTER NGA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
20. | U+099A | চ | BENGALI LETTER CA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
21. | U+099B | ছ | BENGALI LETTER CHA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
22. | U+099C | জ | BENGALI LETTER JA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
23. | U+099D | ঝ | BENGALI LETTER JHA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
24. | U+099E | ঞ | BENGALI LETTER NYA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
25. | U+099F | ট | BENGALI LETTER TTA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
26. | U+09A0 | ঠ | BENGALI LETTER TTHA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
27. | U+09A1 | ড | BENGALI LETTER DDA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
28. | 09A1 09BC (U+09DC) | ড় | Normalized form of BENGALI LETTER RRA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]. 09DC is the preferred code point, however it is not available for LGR as per the standards governing this LGR development
29. | U+09A2 | ঢ | BENGALI LETTER DDHA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
30. | 09A2 09BC (U+09DD) | ঢ় | Normalized form of BENGALI LETTER RHA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]. 09DD is the preferred code point, however it is not available for LGR as per the standards governing this LGR development
31. | U+09A3 | ণ | BENGALI LETTER NNA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
32. | U+09A4 | ত | BENGALI LETTER TA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
33. | U+09A5 | থ | BENGALI LETTER THA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
34. | U+09A6 | দ | BENGALI LETTER DA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
35. | U+09A7 | ধ | BENGALI LETTER DHA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
36. | U+09A8 | ন | BENGALI LETTER NA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
37. | U+09AA | প | BENGALI LETTER PA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
38. | U+09AB | ফ | BENGALI LETTER PHA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
39. | U+09AC | ব | BENGALI LETTER BA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
40. | U+09AD | ভ | BENGALI LETTER BHA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
41. | U+09AE | ম | BENGALI LETTER MA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
42. | U+09AF | য | BENGALI LETTER YA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
43. | 09AF 09BC (U+09DF) | য় | Normalized form of BENGALI LETTER YYA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]. 09DF is the preferred code point, however it is not available for LGR as per the standards governing this LGR development
44. | U+09B0 | র | BENGALI LETTER RA | Consonant | 1 Bangla, 2 Maṇipuri | [112], [122]
45. | U+09B2 | ল | BENGALI LETTER LA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
46. | U+09B6 | শ | BENGALI LETTER SHA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
47. | U+09B7 | ষ | BENGALI LETTER SSA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
48. | U+09B8 | স | BENGALI LETTER SA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
49. | U+09B9 | হ | BENGALI LETTER HA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
50. | U+09BE | ◌া | BENGALI VOWEL SIGN AA | Kāra (Mātrā) | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
51. | U+09BF | ◌ি | BENGALI VOWEL SIGN I | Kāra (Mātrā) | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
52. | U+09C0 | ◌ী | BENGALI VOWEL SIGN II | Kāra (Mātrā) | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
53. | U+09C1 | ◌ু | BENGALI VOWEL SIGN U | Kāra (Mātrā) | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
54. | U+09C2 | ◌ূ | BENGALI VOWEL SIGN UU | Kāra (Mātrā) | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
55. | U+09C3 | ◌ৃ | BENGALI VOWEL SIGN VOCALIC R | Kāra (Mātrā) | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
56. | U+09C4 | ◌ৄ | BENGALI VOWEL SIGN VOCALIC RR | Kāra (Mātrā) | 1 Bangla, 2 Assamese | [112], [125]
57. | U+09C7 | ◌ে | BENGALI VOWEL SIGN E | Kāra (Mātrā) | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
58. | U+09C8 | ◌ৈ | BENGALI VOWEL SIGN AI | Kāra (Mātrā) | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
59. | U+09CB | ◌ো | BENGALI VOWEL SIGN O | Kāra (Mātrā) | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
60. | U+09CC | ◌ৌ | BENGALI VOWEL SIGN AU | Kāra (Mātrā) | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
61. | U+09CD | ◌্ | BENGALI SIGN VIRAMA | Hasanta (=Halant) / Virāma (=Dā̃ri) | 1 Bangla, 2 Assamese, 2 Maṇipuri | [112], [122], [125]
62. | U+09CE | ৎ | BENGALI LETTER KHANDA TA | Consonant | 1 Bangla, 2 Maṇipuri, 2 Assamese | [112], [122], [125]
63. | U+09F0 | ৰ | BENGALI LETTER RA WITH MIDDLE DIAGONAL | Consonant | 2 Assamese | [125]
64. | U+09F1 | ৱ | BENGALI LETTER RA WITH LOWER DIAGONAL | Consonant | 2 Maṇipuri, 2 Assamese | [122], [125]

Table 8: Bangla Code-Point Repertoire
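
A repertoire such as Table 8 lends itself to a mechanical membership check. The sketch below is only an approximation under stated assumptions: the ranges are transcribed from the table above, the Nukta (U+09BC) is accepted only directly after DDA, DDHA or YA, and no whole-label rules are applied. The names REPERTOIRE and in_repertoire are hypothetical; the authoritative definition is the XML file that accompanies the proposal.

```python
import unicodedata

# Inclusive code point ranges transcribed from Table 8 (single code points only).
RANGES = [
    (0x0981, 0x0983), (0x0985, 0x098B), (0x098F, 0x0990), (0x0993, 0x09A8),
    (0x09AA, 0x09B0), (0x09B2, 0x09B2), (0x09B6, 0x09B9), (0x09BE, 0x09C4),
    (0x09C7, 0x09C8), (0x09CB, 0x09CE), (0x09F0, 0x09F1),
]
REPERTOIRE = {cp for lo, hi in RANGES for cp in range(lo, hi + 1)}

NUKTA = 0x09BC
NUKTA_BASES = {0x09A1, 0x09A2, 0x09AF}  # DDA, DDHA, YA


def in_repertoire(label):
    """Check every code point of the NFC-normalized label against the repertoire."""
    cps = [ord(ch) for ch in unicodedata.normalize("NFC", label)]
    for i, cp in enumerate(cps):
        if cp == NUKTA:
            # Nukta is valid only as the second member of the three sequences.
            if i == 0 or cps[i - 1] not in NUKTA_BASES:
                return False
        elif cp not in REPERTOIRE:
            return False
    return True


if __name__ == "__main__":
    print(len(REPERTOIRE), in_repertoire("\u09AC\u09BE\u0982\u09B2\u09BE"))
```

Normalizing first means that a label typed with the precomposed U+09DC is judged by its NFC form, DDA + Nukta, which is the form listed in the table.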

Apart from the above individual code points, the Neo-Brāhmī Generation Panel also
proposes two specific sequences for the repertoire. Each consists of Bangla LETTER A
or Bangla LETTER E, followed by Bangla SIGN VIRAMA and Bangla LETTER YA, followed in
turn by Bangla VOWEL SIGN AA. These sequences are included to represent the /æ/
sound as in English ‘bat’, ‘cat’ etc.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code-point (Not exhaustive list) | Reference

S1. | 0985 09CD 09AF 09BE | অ্যা | BENGALI LETTER A, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla, Assamese | [112], [122]
S2. | 098F 09CD 09AF 09BE | এ্যা | BENGALI LETTER E, BENGALI SIGN VIRAMA, BENGALI LETTER YA, BENGALI VOWEL SIGN AA | Bangla | [112]

Table 9: Sequences

5.2 Code Point Repertoire Exclusion


Some characters of the Bangla script are encoded in Unicode but have not been
included in the repertoire of this LGR proposal. ঌ (U+098C) and ◌ৗ (U+09D7) are
excluded because they are rare and obsolete characters.

Sr. No. | Code Point | Glyph | Character Name | Note

1. | U+098C | ঌ | BENGALI LETTER VOCALIC L | Limited or declining use
2. | U+09D7 | ◌ৗ | BENGALI AU LENGTH MARK | Limited or declining use

Table 10: Excluded Code Points

5.3 Code point not used alone
BENGALI SIGN NUKTA U+09BC (See 3.3.6) is excluded from the repertoire as a standalone
code point since it will never be used alone. It is used only as the second member of
the sequences that form the normalized representation of the three letters ড়, ঢ়, য়.

Unicode Code Point | Glyph | Character Name | Reason for exclusion

U+09BC | ◌় | BENGALI SIGN NUKTA | Never used alone. Only used together with U+09A1 ড, U+09A2 ঢ, U+09AF য to form ড়, ঢ়, য় respectively

Table 10b: Excluded Code Points

5.4 The Basis of Present IDN


The present LGR has also benefited from the earlier work on IDN for Bangla (different
versions) done for .भारत or .ভারত drafted between 20.11.2009 and 18.07.2013.

5.4.1 The ABNF Variables


The Augmented Backus-Naur Formalism (ABNF) began with the following variables:
C → Consonant
V → Vowel
M → kāra (Mātrā)
B → Anusvāra (/onuʃʃār/)
D → Candrabindu
X → Visarga (/biʃɔrgo/)
H → Hasanta / Virāma
Z → Khaṇḍa Ta

The Augmented Backus-Naur Formalism (ABNF) will use the following Operators:

Sr. Number Operator Function

1 “|“ Alternative

2 “[ ]” Optional

3 “*” Variable Repetition

4 “( )” Sequence Group

Table 11: The ABNF Formalism

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Bangla
are given to facilitate understanding.

5.4.2 The Vowel Sequence


To facilitate understanding for users of other Brāhmī scripts, equivalents in
Devanāgarī are provided alongside the Bangla examples, wherever necessary.

A vowel sequence is made up of a single vowel. It may optionally be followed by an
Anusvāra /onuʃʃār/ (B), a Candrabindu (D) or a Visarga /biʃɔrgo/ (X).
More than one of D, B or X may follow a V in Bangla, but the same sign may not be
repeated: by the rules illustrated in this document, formations such as VDD,
VBB and VXX are invalid orthographic units. It is, however, valid and possible to have
a candrabindu followed by an anusvāra, or a candrabindu followed by a visarga, as in
হ্যাঁংচা ‘hænchā’ and হ্যাঁঃ ‘hæn’ respectively.

In Bangla, a Visarga or an Anusvāra (/onuʃʃār/) may follow a Candrabindu. A Vowel can
also optionally be followed by a combination of Hasanta / Virāma [H] and
Consonant [C] to form a Ya-phalā. Ya-phalā is a presentation form of U+09AF Bangla
letter য or ‘ya’. Represented by the sequence <U+09CD ◌্ BENGALI SIGN VIRAMA (Bangla
SIGN Hasanta or VIRĀMA), U+09AF য BENGALI LETTER YA>, Ya-phalā has a
special form: ◌্য. Again, when combined with U+09BE ◌া BENGALI VOWEL SIGN AA (i.e.
‘aa’ (ā)), it is used for transcribing [æ] as in the “a” in the English word “bat”, written in
Bangla as ব্যাট.

A Vowel-sequence admits the following combinations:

5.4.2.1 A Single Vowel

Examples: V অ अ

5.4.2.2 A Vowel with Conditions


A Vowel can optionally be followed by Anusvāra [B], Candrabindu [D], Visarga [X],
Candrabindu + Anusvāra [DB], Candrabindu + Visarga [DX], or a Hasanta (or Virāma) [H]
followed by a Consonant [C] followed by a kāra (Mātrā) [M].

Examples:

VB     অং     अं
VD     অঁ     अँ
VX     অঃ     अः
VDB    অঁং    अँं
VDX    অঁঃ    अँः
VHCM   অ্যা / এ্যা

5.4.2.3 VHCM Sequence


A VHCM sequence can optionally be followed by Anusvāra [B] or Candrabindu [D] or
Visarga [X] or Candrabindu+ Anusvāra [DB] or Candrabindu+ Visarga [DX].

Examples:
VHCMB    অ্যাং / এ্যাং
VHCMD    অ্যাঁ / এ্যাঁ
VHCMX    অ্যাঃ / এ্যাঃ
VHCMDB   অ্যাঁং / এ্যাঁং
VHCMDX   অ্যাঁঃ / এ্যাঁঃ
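The vowel-sequence combinations listed in 5.4.2.1 to 5.4.2.3 can be sketched as a single pattern over symbol strings such as "VDB". This is a minimal sketch, not the panel's implementation: `TAIL`, `VOWEL_SEQ` and `is_vowel_sequence` are hypothetical names, and the pattern works purely at the symbol level, so it does not check that the C in VHCM is YA or that the M is the AA sign.

```python
import re

# Optional trailing modifier: B, D, X, DB or DX (Section 5.4.2.2).
TAIL = r"(?:DB|DX|B|D|X)?"

# A vowel, optionally extended by H C M (the Ya-phala case), then the tail.
VOWEL_SEQ = re.compile(rf"V(?:HCM)?{TAIL}")


def is_vowel_sequence(symbols):
    """Return True if the whole symbol string is a valid vowel sequence."""
    return VOWEL_SEQ.fullmatch(symbols) is not None
```

Repeated marks such as VDD, VBB or VXX fail to match because the tail admits at most one B, D or X, plus the two ordered pairs DB and DX.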

5.4.3 The Consonant Sequence

5.4.3.1 A Single Consonant (C)

Example: C ক क

5.4.3.2 A Consonant with Conditions


A Consonant can optionally be followed by a dependent vowel sign / kāra (Mātrā) [M],
Anusvāra [B], Candrabindu [D], Visarga [X], Hasanta (also known as Virāma) [H],
Candrabindu + Anusvāra [DB] or Candrabindu + Visarga [DX].

Example:
CM    কি / কৃ    कि / कृ
CB    কং        कं
CD    কঁ        कँ
CX    কঃ        कः
CH    ক্         क् (Pure consonant)
CDB   কঁং       कँं
CDX   কঁঃ       कँः

5.4.3.3 CM Sequence
A CM sequence can be optionally followed by B, D, X, DB or DX.

Example:
CMB    কীং / কৃং    कीं / कृं
CMD    কাঁ          काँ
CMX    বীঃ          वीः
CMDB   কাঁং         काँं
CMDX   কাঁঃ         काँः

5.4.3.4 Sequence of Consonants


A sequence of up to four consonants joined by Hasanta (also known as Virāma):

*3(CH)C
Example:
CHC       ন্ত → ন + ◌্ + ত                         न् + त
CHCHC     ন্ত্র → ন + ◌্ + ত + ◌্ + র               न् + त् + र
CHCHCHC   ন্ত্র্য → ন + ◌্ + ত + ◌্ + র + ◌্ + য      न् + त् + र् + य

5.4.3.5 Subsets:

While considering its subsets, we take the combination CHC as a representative example;
the same applies equally to CHCHC and CHCHCHC.

[A]. The combination may be followed by M, B, D, X, DB or DX.


Example:
CHCM    ক্কী → ক ◌্ ক ◌ী          क्की → क ◌् क ◌ी
CHCB    ক্কং → ক ◌্ ক ◌ং          क्कं → क ◌् क ◌ं
CHCD    ক্কঁ → ক ◌্ ক ◌ঁ          क्कँ → क ◌् क ◌ँ
CHCX    ক্কঃ → ক ◌্ ক ◌ঃ          क्कः → क ◌् क ◌ः
CHCDB   ক্কঁং → ক ◌্ ক ◌ঁ ◌ং      क्कँं → क ◌् क ◌ँ ◌ं
CHCDX   ক্কঁঃ → ক ◌্ ক ◌ঁ ◌ঃ      क्कँः → क ◌् क ◌ँ ◌ः

[B]. *3(CH)CM may further be followed by a B, D, X, DB or DX


Example:
CHCMB    ক্কীং → ক ◌্ ক ◌ী ◌ং        क्कीं → क ◌् क ◌ी ◌ं
         ক্কৃং → ক ◌্ ক ◌ৃ ◌ং        क्कृं → क ◌् क ◌ृ ◌ं
CHCMD    ক্কাঁ → ক ◌্ ক ◌া ◌ঁ        क्काँ → क ◌् क ◌ा ◌ँ
CHCMX    ক্কীঃ → ক ◌্ ক ◌ী ◌ঃ        क्कीः → क ◌् क ◌ी ◌ः
CHCMDB   ক্কাঁং → ক ◌্ ক ◌া ◌ঁ ◌ং    क्काँं → क ◌् क ◌ा ◌ँ ◌ं
CHCMDX   ক্কাঁঃ → ক ◌্ ক ◌া ◌ঁ ◌ঃ    क्काँः → क ◌् क ◌ा ◌ँ ◌ः
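The consonant-sequence forms of 5.4.3.1 to 5.4.3.5 can likewise be sketched as one pattern over symbol strings. The sketch assumes that labels have already been reduced to the symbols C, H, M, B, D and X; `CONSONANT_SEQ` and `is_consonant_sequence` are hypothetical names, and the reading of `*3(CH)C` as "at most three CH pairs before the final C" is an interpretation of the text above.

```python
import re

# Optional trailing modifier: B, D, X, DB or DX.
TAIL = r"(?:DB|DX|B|D|X)?"

# Either up to three CH pairs, a final C, an optional M and an optional tail,
# or the pure consonant CH on its own (Section 5.4.3.2).
CONSONANT_SEQ = re.compile(rf"(?:(?:CH){{0,3}}CM?{TAIL}|CH)")


def is_consonant_sequence(symbols):
    """Return True if the whole symbol string is a valid consonant sequence."""
    return CONSONANT_SEQ.fullmatch(symbols) is not None
```

A five-consonant cluster (CHCHCHCHC) is rejected because the repetition count is capped at three, and CMM is rejected because only one M is admitted.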

5.4.4 The Khaṇḍa-Ta sequence

5.4.4.1 A single ‘Khaṇḍa’-Ta (Z)


Example: Z ৎ = x

5.4.4.2 A Khaṇḍa Ta Combination¹⁰

A Khaṇḍa Ta can be preceded by a consonant and Hasanta (also known as Virāma):

[CH]Z

Example:
র + ◌্ + ৎ = র্ৎ as in ভর্ৎসনা (bhartsanā) "scolding"
Note: In this Khaṇḍa Ta context, the C must be either RA U+09B0 (র) (used in Bangla) or
RA U+09F0 (ৰ) (used in Assamese).
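The condition in the note above can be checked directly on code points. The following is a minimal sketch under the assumption that only the Hasanta + Khaṇḍa Ta context needs testing; the function name `khanda_ta_contexts_ok` is hypothetical.

```python
RA_FORMS = {"\u09B0", "\u09F0"}   # Bangla RA and Assamese RA
HASANTA = "\u09CD"
KHANDA_TA = "\u09CE"


def khanda_ta_contexts_ok(label):
    """Return True if every Hasanta + Khanda Ta pair is preceded by a RA."""
    for i, ch in enumerate(label):
        if ch == KHANDA_TA and i >= 1 and label[i - 1] == HASANTA:
            # The consonant carrying the Hasanta must be U+09B0 or U+09F0.
            if i < 2 or label[i - 2] not in RA_FORMS:
                return False
    return True
```

For instance, the sequence in ভর্ৎসনা passes, whereas KA + Hasanta + Khaṇḍa Ta does not.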

5.4.5 Special Cases S and P

Two special cases involving sequences (referred to as S and P in Table 16 under Section
7) can be described briefly here. Let us take up S first. There are two instances in
Bangla where Hasanta (U+09CD) is preceded by a full vowel (U+0985 অ BENGALI LETTER A
and U+098F এ BENGALI LETTER E). To render a Ya-phalā after অ or এ, it is necessary to
type U+09CD plus U+09AF (ya) after the said vowel. This is a purely ligatural entity, and
the addition of Ya-phalā and ā-kāra is used to elicit the /æ/ sound as in English ‘bat’,
‘fat’ etc. Brāhmī-derived scripts by nature do not have a Hasanta after a vowel. Hasanta is
generally described as a ‘vowel killer’, although it actually indicates the absence of a
vowel after the marked consonant, so only consonants can carry a Hasanta. Bangla
nevertheless ends up with a deviant orthographic feature in which Hasanta comes
immediately after a vowel in the ligatures অ্যা and এ্যা (cf. Unicode 10.0, p. 473 [100]).

10 Refer to Rule P in Section 7, Table 16.

Another case concerns the formation of repha and ra-phalā in the said script, referred to
as P in Table 16. Owing to its co-occurrence with Hasanta, RA either loses its own implicit
vowel (REPHA) or suppresses the implicit vowel of the preceding consonant (RA-PHALĀ).
For instance, repha = ra + Hasanta + C (e.g. র্ক, i.e. ra + Hasanta + ka, as in অর্ক arka
“the sun”); ra-phalā = C + Hasanta + ra (e.g. ক্র, i.e. ka + Hasanta + ra, as in চক্র cakra
“cycle”). The point is that in both cases the slot for ra can be filled by the Bangla ra র
(U+09B0) or the Assamese ra ৰ (U+09F0), followed or preceded by the common Hasanta
(U+09CD), whereas the shapes of repha and ra-phalā remain the same in both cases.

6. Variants

This section discusses the variants in the Bangla script. The NBGP categorizes the
confusing variants into two groups.

Group 1: Confusing due to pure visual similarity.


Group 2: Confusing due to deviation from the character formations normally perceived by
the larger linguistic community.

For Group 1, any identical code points are defined as variants. Cases that are confusable
but not identical are not proposed, as another panel (the String Similarity Assessment
Panel) is entrusted with such cases. However, cases belonging to Group 2 are proposed as
variants. These are not cases of mere visual similarity, as they involve deviations from
the widely accepted norms of Bangla Akshar formation. They can confuse even a careful
observer and are hence proposed as variants.

Variants arise in a script when the same form can be produced from different stored code
point sequences. In Bangla the e-kāra, ā-kāra and o-kāra have different code points. One
can type o-kāra with a consonant in one go, or type e-kāra and ā-kāra as two separate
keys and get the same result. A reader cannot differentiate between the two forms of ko
(কো), one typed with a single key and the other with two different keys. However, this is
not considered a case of variants, because a kāra followed by a kāra is not allowed.

6.1 In Script Variants


However, we propose two cases of true in-script variants in Bangla script.

CASE I:
As far as true variants in Bangla are concerned, we draw attention to cases wherein
(U+09A5) থ (tha), preceded by Hasanta, appears in a conjunct with (U+09B8) স (sa) or
(U+09A8) ন (na).

1. স + Hasanta + থ (U+09B8 + U+09CD + U+09A5) versus
   স + Hasanta + হ (U+09B8 + U+09CD + U+09B9)
2. ন + Hasanta + থ (U+09A8 + U+09CD + U+09A5) versus
   ন + Hasanta + হ (U+09A8 + U+09CD + U+09B9)

The above combinations, if written in traditional orthography, can be a little confusing,
because the থ (tha) in the conjunct appears like a হ (ha). The conjunct can occur in
initial, medial or final position (as shown in example 1 below). It can also be typed
wrongly, in the belief that it is a হ (ha) U+09B9, which increases the risk of errors in
writing and identifying labels.

Examples:
1. স্থ and স্হ (as in স্থান sthāna, স্থূল sthūla, স্বাস্থ্য svāsthya, অস্থায়ী asthāyī)
2. ন্থ and ন্হ (as in গ্রন্থ grantha)

Fonts that represent the traditional Bangla writing system tend to create this problem.
Therefore, these may be taken as cases of variants in Bangla.

CASE II:
Another interesting example of variants is encountered in the ra + Hasanta and
Hasanta + ra combinations used in writing labels in the Bangla script (for languages such
as Bangla, Assamese and Maṇipuri). The variant cases arise in typing ‘repha’ (involving
ra + Hasanta) and ‘ra-phalā’ (involving Hasanta + ra).

‘Repha’ can be formed by two sequences (mainly because both Assamese and Bangla are
encoded in the same Unicode block, and ‘B_RA’ as well as ‘A_RA’ refer to the same
phonetic element). The final ligatures look the same, and are as follows:
(1) B_RA + H + C
(2) A_RA + H + C

Where
B_RA = U+09B0 BENGALI LETTER RA (র)
A_RA = U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL (ৰ)
H = U+09CD BENGALI SIGN VIRAMA (◌্)
C = any consonant (theoretically)
Example:

Sequence 1 (using Bangla RA)           Ligature 1   Sequence 2 (using Assamese RA)         Ligature 2
U+09B0 (র) U+09CD (◌্) U+0995 (ক)      র্ক           U+09F0 (ৰ) U+09CD (◌্) U+0995 (ক)      ৰ্ক
U+09B0 (র) U+09CD (◌্) U+09A0 (ঠ)      র্ঠ           U+09F0 (ৰ) U+09CD (◌্) U+09A0 (ঠ)      ৰ্ঠ

Table 12: Example of Repha

Note: As Bangla and Assamese ক and ঠ look exactly the same, the resultant
combinations with 'Repha' look identical. Addition of 'Repha' does not make any
difference.

‘Ra-phalā’ can be formed by two sequences on similar grounds, and the final ligatures
look the same:
(1) C1 + H + B_RA
(2) C1 + H + A_RA
Where
C1 = any consonant except Khaṇḍa Ta
Example:

Sequence 1 (using Bangla RA)           Ligature 1   Sequence 2 (using Assamese RA)         Ligature 2
U+0995 (ক) U+09CD (◌্) U+09B0 (র)      ক্র           U+0995 (ক) U+09CD (◌্) U+09F0 (ৰ)      ক্ৰ
U+09A8 (ন) U+09CD (◌্) U+09B0 (র)      ন্র           U+09A8 (ন) U+09CD (◌্) U+09F0 (ৰ)      ন্ৰ

Table 13: Example of Ra-phalā

As the Assamese and Bangla repha and ra-phalā conjunct forms look the same, they could
confuse end users. Hence, the repha and ra-phalā cases need to be defined as variants.

The NBGP concluded that র and ৰ should be defined as variant code points, so that a
single variant set between র and ৰ covers all cases. This creates blocked variant labels:
for example, if someone registers “ররর”, the label “ৰৰৰ” is generated as its variant and
blocked, and vice versa. However, blocking applies only at the label level; other labels
such as ৰৰ or ৰৰৰৰ can still be registered by someone else.

After the public comment period, the NBGP reviewed the disposition of the র and ৰ
variants. These code points are used equally. Therefore, for the sake of usability, the
NBGP decided that র and ৰ are “allocatable” variants. In addition, the code points U+09B0
and U+09F0 should not be used in the same label, so a no-mix rule should be implemented.
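The interaction between the র / ৰ variant set and the no-mix rule can be sketched as follows. This is an illustrative sketch only: `mixes_ra` and `ra_variant` are hypothetical names, and the sketch assumes that, once mixed labels are excluded, the only remaining variant of a label is the one in which every RA is swapped.

```python
B_RA = "\u09B0"   # BENGALI LETTER RA
A_RA = "\u09F0"   # BENGALI LETTER RA WITH MIDDLE DIAGONAL

# Translation table that swaps the two RA code points.
SWAP = str.maketrans({B_RA: A_RA, A_RA: B_RA})


def mixes_ra(label):
    """Return True if the label violates the no-mix rule."""
    return B_RA in label and A_RA in label


def ra_variant(label):
    """Return the variant label obtained by swapping every RA."""
    if mixes_ra(label):
        raise ValueError("label mixes U+09B0 and U+09F0")
    return label.translate(SWAP)
```

A label without either RA is returned unchanged, which reflects that it has no variant under this particular set.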

6.2 Cross Script Variants
A brief cross-script study for Bangla has been carried out with respect to sister scripts
such as Devanāgarī, Gurmukhī and Odia¹¹ (formerly Oriya), keeping in mind the visual
and technical confusion they may cause in labels on the web domain. Moreover, there is
no in-script variant in Bangla as far as the orthography is concerned. The following
characters are proposed by the NBGP as variants. Certain other characters are somewhat
similar but have not been included here; they are provided in the Appendix (10.3) for
reference.

1. Bangla and Nāgarī / Devanāgarī Script

Bangla         Devanāgarī
ম  U+09AE      म  U+092E
◌ি U+09BF      ◌ि U+093F

Table 14 - Bangla and Devanāgarī cross-script variant code points

2. Bangla and Gurmukhī Script

Bangla         Gurmukhī
ম  U+09AE      ਸ  U+0A38
◌ি U+09BF      ◌ਿ U+0A3F

Table 15 - Bangla and Gurmukhī cross-script variant code points
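Tables 14 and 15 amount to two small code point mappings. The sketch below shows one way to compute the whole-label look-alike in another script; the names `TO_DEVANAGARI`, `TO_GURMUKHI` and `whole_label_variant` are hypothetical, and how a registry disposes of the resulting label is not modelled here.

```python
# Cross-script variant code points from Tables 14 and 15.
TO_DEVANAGARI = {"\u09AE": "\u092E", "\u09BF": "\u093F"}
TO_GURMUKHI = {"\u09AE": "\u0A38", "\u09BF": "\u0A3F"}


def whole_label_variant(label, table):
    """Return the cross-script variant label, or None if any code point
    of the label has no variant in the target script."""
    if not label or any(ch not in table for ch in label):
        return None
    return "".join(table[ch] for ch in label)
```

Only labels composed entirely of the listed code points yield a cross-script variant; any other Bangla label returns None.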

7. Whole Label Evaluation Rules (WLE)

This section provides the WLE rules required by all the languages mentioned in
Section 3.2 when written in the Bangla¹² script. The rules have been drafted in such a
way that they can be easily translated into the LGR specifications.

11 Unicode uses Oriya for the script, although Odia is now the official term used.
12 As used by Unicode, denoting and including both Assamese and Maṇipuri.

Below are the symbols used in the WLE rules for each "Indic Syllabic Category"
mentioned in the table provided in the Code point repertoire (Section 5.1).

C → Consonant
M → Kāra (Mātrā)
V → Vowel
B → Anusvāra
D → Candrabindu
X → Visarga
H → Hasanta
Z → Khaṇḍa Ta
S → S1, S2 (from Table 9)
    or
    (a/e) Ya-phalā (V1 H C1 M1)
    where
    V1 is either 0985 (অ - BENGALI LETTER A)
       or 098F (এ - BENGALI LETTER E)
    H is 09CD (◌্ - BENGALI SIGN VIRAMA)
    C1 is 09AF (য - BENGALI LETTER YA)
    M1 is 09BE (◌া - BENGALI VOWEL SIGN AA)

    S1 and S2 are valid even if they are not allowed by the other context rules.

P → Ra-Hasanta (C2 H)
    where
    C2 is either 09B0 (র - BENGALI LETTER RA)
       or 09F0 (ৰ - ASSAMESE LETTER RA / Unicode name: BENGALI LETTER RA WITH
          MIDDLE DIAGONAL)
    H is 09CD (◌্ - BENGALI SIGN VIRAMA)

Table 16 - Symbols used in WLE rules

It is also worth mentioning here that in Bangla the consonant letters (or graphemes) are
physically joined to form “clusters”, which can theoretically conjoin two to four
consonants and combine to create new shapes. Dash and Chaudhuri (1998) state that
there are “nearly 380 unique consonant...clusters”, of which bi-consonantal combinations
number 290, three-letter combinations account for another 80, and the rarer four-letter
ones number 10 more [136, Pg 4]. More details of such combinations can be found in
Pabitra Sarkar (1993) [135].

7.1 Final Set of WLE Rules

Given the prevalent patterns in Bangla and the various restrictions, the specific WLE
rules that need to be implemented are listed below.

1. C is a set of C and CN where CN is the set of normalized forms of {ড়, ঢ়, য়}.


2. H: must be preceded by C
Example: #
3. M: must be preceded by C
Example: কা
4. D: must be preceded by either of V, C, or M
Example: আঁ, খঁ, খাঁ, হ্যাঁ
5. X: must be preceded by either of V, C, M or D
Example: উঃ, খঃ, বঃ, ◌া◌ঃ, দুঁঃ
6. B: must be preceded by either of V, C, M or D
Example: আং, ইং, কং
7. Z: must be preceded by V, C, M, D, B, X or P
Example: ইৎ, কৎ, ◌াৎ, ◌া◌ঁৎ, প6ৎ, rৎ (S is not listed because S ends with M; Z may
also follow S).
8. V: CANNOT be preceded by H
Details in 7.1.1 Case of V preceded by H
9. S: CANNOT be preceded by H
10. 09B0 (র) and 09F0 (ৰ) CANNOT be mixed
Details in 6.1 CASE II

Let us now elaborate on the rules with examples from the script, keeping in mind the
Bangla, Assamese and Maṇipuri communities. Some combinations of characters may
seem unrealistic or rare in usage, but there is no harm in permitting such ligatures: any
user can create them easily, even though they may not be attested combinations.
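The ten rules above are all left-context conditions, so they can be sketched as a single pass over a label. The following is a minimal, non-authoritative sketch: `classify`, `is_valid_label` and the lookup tables are hypothetical names, the code point ranges are coarse assumptions based on the Unicode Bengali block rather than the Section 5.1 repertoire, and S1 and S2 are taken to be the অ্যা and এ্যা sequences described in Table 16.

```python
# Sketch of the WLE rules of Section 7.1 as a left-context check.
# ASSUMPTIONS: coarse Bengali-block ranges; all names are hypothetical.
NUKTA_FORMS = {"\u09A1\u09BC": "\u09DC", "\u09A2\u09BC": "\u09DD",
               "\u09AF\u09BC": "\u09DF"}            # rule 1: normalized forms
B_RA, A_RA = "\u09B0", "\u09F0"
S_FORMS = {"\u0985\u09CD\u09AF\u09BE", "\u098F\u09CD\u09AF\u09BE"}  # V1 H C1 M1
ALLOWED_BEFORE = {            # rules 2-7: symbol -> symbols allowed before it
    "H": {"C"}, "M": {"C"}, "D": {"V", "C", "M"},
    "X": {"V", "C", "M", "D"}, "B": {"V", "C", "M", "D"},
    "Z": {"V", "C", "M", "D", "B", "X"},
}
SINGLE = {0x0981: "D", 0x0982: "B", 0x0983: "X", 0x09CD: "H", 0x09CE: "Z"}
EXTRA_CONSONANTS = (0x09DC, 0x09DD, 0x09DF, 0x09F0, 0x09F1)


def classify(ch):
    """Map one character to a WLE symbol, or '?' if it is not covered."""
    cp = ord(ch)
    if cp in SINGLE:
        return SINGLE[cp]
    if 0x0985 <= cp <= 0x0994:
        return "V"
    if 0x0995 <= cp <= 0x09B9 or cp in EXTRA_CONSONANTS:
        return "C"
    if 0x09BE <= cp <= 0x09CC:
        return "M"
    return "?"


def is_valid_label(label):
    """Return True if the label satisfies the sketched rules 1 to 10."""
    for decomposed, composed in NUKTA_FORMS.items():
        label = label.replace(decomposed, composed)
    if B_RA in label and A_RA in label:               # rule 10: no mixing
        return False
    symbols = [classify(ch) for ch in label]
    if not symbols or "?" in symbols:
        return False
    for i, sym in enumerate(symbols):
        prev = symbols[i - 1] if i > 0 else None
        if sym == "H" and prev == "V":                # allowed only inside S
            if label[i - 1:i + 3] not in S_FORMS:
                return False
        elif sym == "Z" and prev == "H":              # rule 7: Z after P
            if i < 2 or label[i - 2] not in (B_RA, A_RA):
                return False
        elif sym == "V" and prev == "H":              # rules 8 and 9
            return False
        elif sym in ALLOWED_BEFORE and prev not in ALLOWED_BEFORE[sym]:
            return False
    return True
```

Because S begins with a vowel, rule 9 (S cannot be preceded by H) falls out of the same check as rule 8 in this sketch.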

7.1.1 Case of V Preceded by H:

There could be cases involving multi-word domains where V may need to be allowed to
follow an H,

e.g. ব্যাঙ্কঅফ্ইন্ডিয়া /bæŋk ʌv ɪndiə/ (U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD
U+0995 U+0985 U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE)
(meaning: Bank of India)

This is a case where two different words are joined together, the first of which ends
with an H (অফ্) and the second of which begins with a V (ইন্ডিয়া). Some sections of the
linguistic community require the explicit presence of H for full representation of the
intended sound. However, by and large, the form of the first word without an H
(U+09CD) is considered sufficient to represent the intended sound.

This is a unique situation necessitated by the lack of a hyphen, space or Zero Width
Non-Joiner character in the permissible set of characters in the Root Zone repertoire.
Otherwise, V is never required to follow an H. Permitting this may create a perceived
similarity between two labels (with and without H) for the majority of the linguistic
communities; hence it is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, a
future NBGP may consider revisiting this rule.
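To make the prohibited context concrete, the code point sequence quoted for the “Bank of India” example can be decoded and searched for a Hasanta immediately followed by an independent vowel. This is a sketch only: the helper names are hypothetical, and the vowel test uses a coarse range (U+0985 to U+0994) assumed from the Unicode Bengali block.

```python
# Code points of the example label, as quoted in Section 7.1.1.
CODE_POINTS = ("U+09AC U+09CD U+09AF U+09BE U+0999 U+09CD U+0995 U+0985 "
               "U+09AB U+09CD U+0987 U+09A8 U+09CD U+09A1 U+09BF U+09DF U+09BE")
HASANTA = "\u09CD"


def decode(code_points):
    """Turn a 'U+XXXX U+YYYY ...' listing into a string."""
    return "".join(chr(int(tok[2:], 16)) for tok in code_points.split())


def is_independent_vowel(ch):
    """Coarse test for an independent vowel (assumed range)."""
    return 0x0985 <= ord(ch) <= 0x0994


def hasanta_before_vowel(label):
    """Return the indices of every Hasanta directly followed by a vowel."""
    return [i for i in range(len(label) - 1)
            if label[i] == HASANTA and is_independent_vowel(label[i + 1])]


LABEL = decode(CODE_POINTS)
```

In the decoded label the only such position is the Hasanta after ফ, which is followed by ই; this is the H + V context that rule 8 rejects.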

7.2 Additional Examples from Bangla ABNF:

Below are a few examples which help one understand some of the rules ABNF
puts in place. These are just given for reference purposes and are not meant to
be comprehensive.

1. H, M, B, D or X cannot occur at the beginning of a Bangla word. Example:

◌্ক   ◌्क
◌াক   ◌ाक
◌ংক   ◌ंक
◌ঁক   ◌ँक
◌ঃক   ◌ःक
As can be seen, such a combination automatically results in a “golu”, or dotted circle,
marking it as an invalid formation. This is an intrinsic property of the Indian language
syllable and is applied quasi-automatically wherever supported by the OS.

2. H is not permitted after V, B, D, X, M, S

Example:
অ্     अ्
অং◌্   कं◌्
কঁ◌্   कँ◌्
কঃ◌্   कः◌्
কি্    कि्

3. The number of B, D or X permitted after a Consonant, a Vowel or a kāra (Mātrā) is
restricted to one of each; thus the following combinations are invalid.
Example:
কংং    कंं
কঁঁ    कँँ
কঃঃ    कःः
কাঁঁ   काँँ
কীঃঃ   कीःः
অংং    अंं
অঁঁ    अँँ
অঃঃ    अःः

4. The number of M permitted after a Consonant is restricted to one.

Example:
কীী   कीी

5. M is not permitted after V.

Example:
ইা / ঈৗ   ईा / ईौ

6. The combinations Anusvāra + Visarga and Visarga + Anusvāra are not permissible.
Example:
কংঃ   कंः
কঃং   कःं

8. Contributors

8.1 Experts from India

Professor Udaya Narayana Singh, Chair-Professor of Linguistics & Dean, Faculty of Arts,
Amity University Haryana, Gurgaon; Pachgaon, Manesar PIN 122431 (Haryana), India.

Professor Pabitra Sarkar, formerly Vice-Chancellor, Rabindra Bharati University, Kolkata.

Dr Atiur Rahman Khan, Principal Technical Officer, GIST Group, C-DAC, Pune, PIN
411008 (Maharashtra), India.

Mr Rajib Chakraborty, Linguist, Society for Natural Language Technology Research
(SNLTR), Module 114 & 130, SDF Building, Salt Lake, Sector-V, Kolkata-700091 (West
Bengal), India.

Mr Akshat Joshi, Project Engineer, GIST Group, C-DAC, Pune, PIN 411008 (Maharashtra),
India.

Ms Moumita Chowdhury, Senior Technical Officer, GIST Group, C-DAC, Pune, PIN
411008 (Maharashtra), India.

Mr Chandrakanta Murasingh, Agartala, Tripura.

Some other NBGP members.

8.2 Contributors from Bangladesh

Janab Mustafa Jabbar, Honorable Minister, Ministry of Posts, Telecommunications &
Information Technology, Govt of Bangladesh

Prof Shamsuzzaman Khan, Former Director-General, Bangla Academy, Dhaka

Prof Rafiqul Islam, National Professor of Humanities, Dhaka.

Prof Swarochis Sarkar, Director, Institute of Bangladesh Studies, Rajshahi University,
Rajshahi, Bangladesh

Prof Jinnat Imtiaz Ali, Director-General, International Mother Language Institute, Dhaka

Mr Mohammad Mamun Or Rashid, Department of Bangla, Jahangirnagar University &
Member, Bangladesh Computer Council

Prof Maniruzzaman, formerly Professor, Chittagong University, Chattagram, Bangladesh

Mr Shyam Sunder Sikder, Secretary, Post & Telecommunications Division, Govt of
Bangladesh

Mr Md. Mustafa Kamal, Former Director General, Bangladesh Telecommunications
Regulatory Authority, Government of Bangladesh, Dhaka

Brigadier General Md Mahfuzul Karim Majumder, Director-General, Engineering &
Operations Division, Bangladesh Telecommunications Regulatory Authority,
Government of Bangladesh, Dhaka

Md. Ziarul Islam, Programmer, Posts & Telecommunications Division, Government of
Bangladesh, Dhaka

Prof Syed Shahriyar Rahman, Department of Linguistics, University of Dhaka

Dr Mizanur Rahman, Director (In-Charge), Translation, Text Book and International
Relations Division, Bangla Academy, Dhaka

Dr Aparesh Bandyopadhyay, Director, Bangla Academy, Dhaka

Mr Md Mobarak Hossain, Director, Bangla Academy, Dhaka

Dr Jalal Ahmed, Director, Bangla Academy, Dhaka

Mr Jahangir Hossain, Internet Society Bangladesh (ICANN ALS)

Janab Sarwar Mostafa Choudhury, Bangladesh Computer Council, Dhaka

Janab Md Rashid Wasif, Bangladesh Computer Council, Dhaka

Janab Istiaque Arif, Senior Assistant Director, Bangladesh Telecommunications
Regulatory Authority, Dhaka

Ms. Afifa Abbas, Information Security and Governance Lead Engineer at Banglalink, and
ICANN Fellow.

Mr Mohammad Abdul Haque, Secretary General, Bangladesh Internet Governance Forum

Mr Imran Hossen, CEO, EyeSoft and key member of Bangladesh Association of Software
& Information Services (BASIS).

Ms Shahida Khatun, Director, Folklore, Museum & Archive Division, Bangla Academy,
Dhaka

Mr Syed Ashik Rehman, CEO, Bengal Media Corporation, Dhaka

Mr Haseeb Rahman, CEO, Professionals’ Systems, Dhaka

9. References
[100] Unicode Consortium. 2017. Unicode Standard 10.0. Mountain View CA.

[101] Bandyopadhyay, Chittaranjan. 1981. Dui Shataker Bangla Mudran o Prakashan.
Kolkata: Ananda Publishers.

[102] Banerji, R.D. 1919. The Origin of the Bengali Script. Kolkata. New Delhi: Asian
Educational Services; 2003 reprint.

[103] Chatterji, S.K. 1926. The Origin and Development of the Bengali Language.
Calcutta: Calcutta University Press.

[104] -----. 1939. Bhasha-prakash Bangala Vyakaran (A Grammar of the Bengali
Language), Calcutta: University of Calcutta.

[105] Hai, Muhammad Abdul. 1964. Dhvani Vijnan O Bangla Dhvani-tattwa (Phonetics
and Bengali Phonology), Dhaka: Bangla Academy.

[106] Jha, Subhadra. 1958. The Formation of Maithili. London: Luzac & Co.

[107] Kostic, Djordje; Das, Rhea S. 1972. A Short Outline of Bengali Phonetics, Calcutta:
Statistical Publishing Company.

[108] Majumdar, R.C. 1971. History of Ancient Bengal, Calcutta: G. Bhardwaj.

[109] Mazumdar, Bijaychandra. 1920/2000. The History of the Bengali Language (Repr.
Calcutta, 1920. ed.). New Delhi: Asian Educational Services.

[110] Pandey, Anshuman. 2001. Proposal to Encode the Tirhuta Script in ISO/IEC
10646.

[111] Pal, Palash Baran. 2001. Dhwanimala Barnamala. Kolkata: Papyrus.

[112] -----. 2007. ‘Bangla Harapher Panch Parba’. In Swapan Chakraborty, ed.
Mudraner Sanskriti O Bangla Boi. Kolkata: Ababhas.

[113] Ross, Fiona. 1999. The Printed Bengali Character and its Evolution. London:
Curzon.

[114] Shastri, Mahamahopadhyay Hara Prasad. 1916. Hājār Bacharēr Purāṇa Bāṅgālā
Bhāṣāy Bauddha Gān ō Dōhā. Calcutta: Bangiya Sahitya Parishat.

[115] Singh, Udaya Narayana (Jointly Maniruzzaman). 1983. Diglossia in Bangladesh
and language planning. Calcutta: Gyan Bharati. 214 pp.

[116] -----. 1987. A Bibliography of Bengali Linguistics. Mysore: CIIL. xii+316 pp.

[117] -----. 2017. (with Rajib Chakraborty, Bidisha Bhattacharjee & Arimardan Kumar
Tripathy) Languages and Cultures on the Margin: Guidelines for Fieldwork on Endangered
Languages. Mimeo. Centre for Endangered Languages, Visva-Bharati.

[118] -----. 1980. Scriptal choice and spelling reform: An essay in language and
planning. Journal of the M.S. University of Baroda, Social Science Number, 29.2: 173-
186. A modified version reprinted in E. Annamalai, Bjorn Jernudd and Joan Rubin, eds.
Language Planning: Proceedings of an Institute. Mysore: CIIL. 405-417.

[119] Sripantha. 1996. Jakhan Chapakhana Elo. Kolkata: Paschim-Banga Bangla
Academy.

[120] Sur, Atul. 1986. Bangla Mudraner Dusho Bachar. Kolkata: Jijnasa.

[121] Script Behaviour for Bengali, Version 1.1, TDIL and C-DAC Pune.

[122] Bora, Mahendra. 1981. The Evolution of Assamese Script. Jorhat: Assam Sahitya
Sabha.

[123] Proposal to Encode the Tirhuta Script in ISO/IEC 10646,
http://www.unicode.org/L2/L2011/11175r-tirhuta.pdf accessed on 25.11.2017

[124] Ethnologue, Assamese in the Language Cloud,
https://www.ethnologue.com/cloud/asm accessed on 25.11.2017

[125] Bengali alphabet for Manipuri, found in Ethnologue, Manipuri (Meeteilon/
Meithei), https://www.omniglot.com/writing/manipuri.htm accessed on 20.10.2019

[126] Wikipedia, Bengali alphabet, https://en.wikipedia.org/wiki/Bengali_alphabet
accessed on 25.11.2017

[129] Omniglot, Sylheti, http://www.omniglot.com/writing/syloti.htm accessed on
10.5.2018

[130] Wikipedia, Bishnupriya Manipuri language,
https://en.wikipedia.org/wiki/Bishnupriya_Manipuri_language accessed on 25.11.2017

[131] The EMILLE/CIIL Corpus,
http://metashare.elda.org/repository/browse/the-emilleciil-corpus/abdd35c8de6f11e2b1e400259011f6ea6bce74d38dbb42d881da76c64a6adb20/
accessed on 10.5.2018

[132] The EMILLE/CIIL Corpus,
http://catalog.elra.info/product_info.php?products_id=696 accessed on 10.5.2018

[133] Bangla Language & Script, https://www.isical.ac.in/~rc_bangla/bangla.html
accessed on 10.5.2018

[134] Sarkar, Pabitra. 1992. Bangla Banan Sanskar: Samasya o Sambhabana. Kolkata:
Chirayata Prakashan.

[135] Sarkar, Pabitra. 1993. Bangla Bhashar Yuktabyanjan. Bhasha 1.1: 23-45.

[136] Dash, Niladri Shekhar and B.B. Chaudhuri. 1998. Bangla Script: A Structural
Study. Linguistics Today 1.2: 1-28. Also available at
https://www.academia.edu/9967428/Bangla_Script_A_Structural_Study

[137] Dani, Ahmed Hasan. (1957) ‘Srīhaṭṭa-Nāgarī Lipir Utpatti o Bikāś.’ Bangla
Academy Patrika (Dhaka), Vol 1.2. (Bhadra-Agrahayan, 1364 Bangabda Number). pg 1.

[138] Wikipedia, Sylheti Nagari,
https://en.wikipedia.org/wiki/Sylheti_Nagari accessed on 19.5.2018

[139] Furui, Ryosuke. (2015). ‘Variegated Adaptations: State Formation in Bengal from
the Fifth to Seventh Century’, in Bhairabi Prasad Sahu & Hermann Kulke, eds.
Interrogating Political Systems: Integrative Processes and States in Pre-Modern India.
Chapter 9. Pp 255-73. New Delhi: Manohar.

[140] Ferguson, Charles A. and Munier Chowdhury. (1960) ‘Phones of Bengali’,
Language, Vol. 36, No. 1, pp. 22-59.

[141] Shahidullah, Muhammad. (2007) Buddhist Mystic Songs. Dhaka: Mowla Brothers.

[142] Ray, Punya Sloka. (1966) Bengali Language Handbook. Washington.

[143] Hai, Muhammad Abdul. (1960) A phonetic and phonological study of nasals and
nasalization in Bengali. Dhaka: University of Dhaka.

[144] Unicode Consortium, Proposal Summary Form to Accompany Submissions for
Additions to the Repertoire of ISO/IEC 10646 / UNICODE,
https://www.unicode.org/L2/L2002/02387r-syloti-form.pdf accessed on May 21,
2018

[145] Wikipedia, Ol Chiki (Unicode block),
https://en.wikipedia.org/wiki/Ol_Chiki_(Unicode_block) accessed on May 21, 2018

[146] Bangla Script, http://www.bangladesh2000.com/bd/bangla_script.html
accessed on May 21, 2018

[147] Bhattacharya, Ashutosh ed. (1942) Gopichandrer Gan, Calcutta: Calcutta
University.

[149] Das, Sisir Kumar. (1975) Sahibs and Munshis: An Account of the College of Fort
William. Calcutta.

[150] Islam, Rafiqul, Pabitra Sarkar, Mahbubul Haq & Rajib Chakraborty (eds.). (2014)
Bangla Academy Promito Bangla Byabaharik Byakaran (A Functional Grammar of
Standard Bangla). Dhaka: Bangla Academy.

[151] Sarkar, Pabitra. [2013] ‘Bangla Spelling Reform: the Long and Short of It’. Bangla
Journal 19: 215-232.

[152] Bangla Academy. (2012) Bangla Academy Promito Bangla Bananer Niyam
(Standard Bangla Spelling as adopted by Bangla Academy). Dhaka: Bangla Academy.

[153] Sarkar, Pabitra & Rajib Chakraborty. 2018. “What has happened So Far In terms
of Script Reforms”. Paper presented at the Face to Face meeting jointly held by the
Bangla Academy, Dhaka & ICANN at Bangla Academy, Dhaka on 10.07.2018.

[154] The Unicode Consortium. 2018. The Unicode® Standard Version 11.0 – Core
Specification. Chapter 12, P. 473.

10. Appendix- I

10.1 Augmented Backus-Naur Formalism (ABNF)


The Augmented Backus-Naur Formalism (ABNF) is generic in nature and when
applied to a specific language/script, certain restriction rules apply. In other words,
in a given language some of the Formalism structures do not necessarily apply. To
take care of such cases restriction rules are set in place. These restrictions will help
to fine-tune the ABNF.

In the case of Bangla¹³ in particular, the following rules apply:

1. Khaṇḍa ta (ৎ) is NOT allowed at the beginning of an IDN label. The same
   applies to ঞ and the velar nasal ঙ in the Bangla scheme of five-fold ‘varga’ (as
   defined under Table 5). Bangla generally does not allow ya (য়) at the beginning
   of a word either, although a couple of native examples can be cited, such as the
   word য়্যাব্বড়ো (yæbbɔRo) from the poem ‘Lichuchor’ by Kazi Nazrul Islam. There
   are also instances of it being used in names, mostly of foreign origin, such as
   Yaqub, which may be written with an initial ya (য়), as in য়াকুব. In very recent
   times, in transliterating some Chinese and Japanese names into Bangla, one does
   come across Khaṇḍa ta (ৎ) followed by sa (স) at the beginning of a word, for
   example ৎসেরিং (Tsering).

2. CH can combine with Khaṇḍa Ta only in the case where C is ra (র) (09B0):
   র্ৎ as in ভর্ৎসনা

3. Only the following combinations with VHCM are allowed:

   → অ্যা (together pronounced as æ) as in অ্যাসিড (acid)
   → এ্যা (together also pronounced as æ) as in এ্যাসিড, এ্যাসোসিয়েশান
   (acid, association)

13 This section specifically takes up issues of restrictions pertaining to the Bangla (Bengali) language.
Assamese and Maṇipuri have not been covered in this section.

10.2 ‘Sylheti Nāgarī lipi’ or ‘Siloṭi’

This version of the Bangla script resembles the ‘Kaithī’ script (ISO 15924) used by
accountants (perhaps from the Kāyastha community) in Eastern Uttar Pradesh and
Bihar, and widely in use during the 1880s. Sylheti Nāgarī or Siloṭi (129) had several
other names, such as ‘Jālālābāda Nāgarī’, ‘Fula (flower) Nāgarī’, ‘Muslim Nāgarī’ or
‘Muhāmmad Nāgarī’. It is said that Shāh Jālāla brought the script with him to Sylhet in
the 13th-14th century (138), although some have suggested that it was an invention of
the Afghan rulers of Sylhet (137). Some ascribe the credit to Buddhist Bhikkhus from
Nepal. Purely for historical reasons, the details of the script with 32 symbols are
reproduced here (138):

Table 17 – The Script Table of Sylheti Nāgarī or Siloṭi

10.3 Confusable code points


The following code points were analysed and found to be either (a) distinguishable or
(b) confusable, but not sufficiently so to be defined as variant code points.

10.3.1 Bangla and Nāgarī or Devanāgarī

Bangla        Devanāgarī    NBGP Decision
◌ঃ U+0983    ◌ः U+0903    Confusable
ও U+0993     उ U+0909     Confusable
ঘ U+0998     घ U+0918     Confusable
◌ঁ U+0981    ◌ॅ U+0945    Confusable

Table 18: Bangla and Devanāgarī confusable code points

10.3.2 Bangla and Gurmukhi

Bangla        Gurmukhi      NBGP Decision
ঘ U+0998     ਬ U+0A2C     Confusable
◌ঁ U+0981    ◌ੱ U+0A71    Confusable

Table 19: Bangla and Gurmukhi confusable code points

Bangla                    Gurmukhi      NBGP Decision
ও U+0993                 ਤ U+0A24     Distinguishable
শ U+09B6                 ਅ U+0A05     Distinguishable
ম U+09AE                 ਮ U+0A2E     Distinguishable
বা U+09AC and U+09BE     ਗ U+0A17     Distinguishable

Table 20 – Bangla and Gurmukhī distinguishable code points

10.3.3 Bangla and Oriya (Odia)

Bangla        Oriya (Odia)   NBGP Decision
ও U+0993     ଓ U+0B13      Confusable

Table 21 – Bangla and Oriya confusable code points

Bangla        Oriya (Odia)   NBGP Decision
ঘ U+0998     ସ U+0B38      Distinguishable

Table 22 – Bangla and Oriya distinguishable code points

11. Appendix -II
Bengali consonants and their allographs
Consonants Phonetic Value Allographs

Clusters Transparent
Form (Bangla
Akademi font)

প /p/ z ({+ত), | ({ + ন), } ({ + প), প8


({ + য), r ({ + র), ~ ({ + ল), •
({ + স)

€/€ (•+প), ‚ (ƒ+প)

ফ /pʰ/ „ (u+ র), … (u + ল)

†/† (•+ফ)

ব /b/ ‡ (ˆ + জ), ‰ (ˆ + দ), Š (ˆ + ধ), / (0+ধ)


w (ˆ+ব), ব8 (ˆ+য), ‹ (ˆ+র), Œ
(ˆ+ল), •ভ (ˆ+ভ)

Ž (•+ব), • (•+ব) 2 (3+ব)

ভ /bʱ/ ভ8 (‘+য), ’ (‘+র), “ (‘+ল)

ত /t/ ” (y+ত), ”8 (y+y+য), •


(y+y+ব), – (y+থ), — (y+ন), ত8
(y+য), ˜ (y+ম), ˜8 (y+™+য), š
(y+ব), › (y+র)

z ({+ত), œ (p+ত), • (p+y+ব),


ž (Ÿ+ত), q8 (Ÿ+y+ +য), ¡
(•+y+র)
& (5+ত)
There is a marked form of
ত+◌্=ৎ, ৎ6 ( +y/ৎ)


থ /tʰ/ থ8 (¢+য), £ (¢+র)


¤ (•+থ), – (y+থ), ¥ (Ÿ+থ) " (7+থ), 9 (:+থ)

দ /d/ ¦ (§+গ), ¨ (§+ঘ), © (§+দ), ª ; (<+গ), > (<+ধ)


(§+ধ), দ8 (§+য), « (§+ব), ¬
(§+ভ), - (§+র)
‰ (ˆ+দ), ® (Ÿ+দ), ¯ (Ÿ+§+র), -6
( +§+র)

ধ /dʱ/ ° (±+ন), ² (±+ম), ধ8 (±+য), ³


(±+র)

( (?+ধ), > (<+ধ),


´ (µ+ধ), ª (§+ধ), Š (ˆ+ধ), ¶
/ (0+ধ), @ (7+ধ)
(Ÿ+ধ)

ট /ʈ/ · (¸+ট), ট8 (¸+য), ¹ (¸+ব), º


(¸+র)

» (p+ট), ¼ (½+ট)

ঠ /ʈʰ/ ঠ8 (¾+য)

¿ (À+ঠ), Á (½+ঠ)

ড /ɖ/ Â (Ã+ড), ড8 (Ã+য), Ä (Ã+র)

ঢ /ɖʱ/ ঢ8 (Å+য)
Æ (À+ঢ)

চ /t͡ʃ/ Ç (È+চ), É (È+ছ), Ê (È+Ë+র), Ì (È+ঞ), চ8 (È+য)

Í (Î+চ), Ï (Ð+চ)    # (A+চ)


ছ /t͡ʃʰ/ Ñ (Ë+র)

É (È+ছ), Ò (Î+ছ), Ó (Ð+ছ)    $ (A+ছ)

জ /dʒ/ Ô (Õ+জ), Ö (Õ+Õ+ব), × (Õ+ঝ), Ø (Õ+ঞ), জ8 (Õ+য), Ù (Õ+র)

Ú (Î+জ) % (A+জ)

ঝ /dʒʱ/ (does not occur as the first member of a cluster)

× (Õ+ঝ), Û (Î+ঝ)

ক /k/ Ü (p+ক), » (p+ট), œ/œ (p+ত), Ý (p+y+র), • (p+y+ব), Þ (p+ন), ß (p+ব), à (p+ম), ক8 (p+য), á (p+র), â (p+ষ), ã (p+½+ণ), ä (p+½+ম), å (p+½+ব), â8 (p+½+য), æ (p+স)    & (5+ত), . (5+E+র), G (5+E+ব), ' (5+র)

t (ç+ক), è (•+p+র)    ) (H+ক), J (:+5+র)

খ /kʰ/ (does not occur as the first member of a cluster)

é (ç+খ)


গ /g/ ê (µ+গ), ë (µ+দ), ´ (µ+ধ), ì (µ+ন), í (µ+ব), î (µ+ম), গ8 (µ+য), ï (µ+র), ð (µ+ল)    ( (?+ধ)

ñ (ç+গ), ñ6 ( +ç+গ)    * (H+গ), *K (L+H+গ)

ঘ /gʱ/ ò (ó+ন), ঘ8 (ó+য), ô (ó+র)

õ (ç+ঘ)

ঞ (This letter does not have any particular phonetic value of its own, but it is mostly pronounced as /n/.)
Í (Î+চ), Ò (Î+ছ), Ú (Î+জ), Û (Î+ঝ)    # (A+চ), $ (A+ছ), % (A+জ), M (A+ঝ)

Ø (Õ+ঞ)

ণ /n/ ö (À+ট), ¿ (À+ঠ), ÷ (À+ড), ø (À+Ã+র), Æ (À+ঢ), ù (À+ণ), ণ8 (À+য), ú (À+ব)    O (P+ড), - (P+R+র)

ã (p+½+ণ), û (½+ণ), ü (•+ণ)    + (S+ণ)

ঙ/◌ং /ŋ/ t (ç+ক), ý (ç+p+র), é (ç+খ), ñ (ç+গ), õ (ç+ঘ), þ (ç+p+ষ)    ) (H+ক), * (H+গ), U (H+ঘ)

(In some contexts ç is replaced by ◌ং, e.g., কং, অং.)


ম /m/ ÿ (™+ল), ! (™+প), " (™+{+র), # (™+ভ), $ (™+‘+র), % (™+ম), & (™+র)

˜ (y+ম), ² (±+ম), ' (•+ম), ä (p+½+ম)    W (3+ম)

ন /n/ ( (Ÿ+ট), ) (Ÿ+¸+র), * (À+ঠ), v (Ÿ+ড), + (Ÿ+Ã+র), ž (Ÿ+ত), q (Ÿ+y+র), q8 (Ÿ+y+ +য), ¥ (Ÿ+থ), ® (Ÿ+দ), ¯ (Ÿ+§+র), ¶ (Ÿ+ধ), , (Ÿ+±+র), - (Ÿ+§+ব), . (Ÿ+ন), / (Ÿ+ম), ন8 (Ÿ+য), 0 (Ÿ+স)    " (7+থ), @ (7+ধ), , (7+Y+র)

1 (•+ন)

শ /ʃ/ Ï (Ð+চ), Ó (Ð+ছ), 2 (Ð+ন), 3 (Ð+ম), 4 (Ð+র), 5 (Ð+ল), শ8 (Ð+য)

ষ /ʃ/ 6 (½+ক), ¼ (½+ট), Á (½+ঠ), û (½+ণ), 7 (½+প), 8 (½+{+র), 9 (½+ফ), ¼ (½+ট), : (½+¸+র), Á (½+ঠ), û (½+ণ), ষ8 (½+য)    + (S+ণ)

â (p+ষ), ã (p+½+ণ), ä (p+½+ম)


স /s/ and /ʃ/ ;/; (•+ক), < (•+ট), € (•+প), † (•+ফ), = (•+ত), ¤ (•+থ), < (•+ট), ; (•+ক), > (•+খ), স8 (•+য), ? (•+র), @ (•+ল)    9 (:+থ)

æ (p+স)

হ /h/ ü (•+ণ), 1 (•+ন), ' (•+ম), হ8 (•+য), A (•+র), B (•+ল)    W (3+ম)

ড় /ɽ/ C (D+গ)

ঢ় /ɽʱ/ (does not form clusters)

য /dʒ/ ক8 (p+য), স8 (•+য), র8 ( +য)

The secondary symbol (allograph) jɔ-phalā has two phonetic values. When added to the initial consonant in a word, it is a vowel /æ/ (as in শ্যামল, র‍্যাপার, etc.). But after a non-initial consonant, it just doubles that consonant in pronunciation (as in কার্য, ধার্য, etc.). The র+য combination has two physical manifestations: র‍্য and র্য.

[র‍্য on its own is never used in Bangla orthography. র‍্যা is, but then its last two symbols, ya-phalā and ā-kāra, together constitute a vowel sign representing the vowel অ্যা.]


র /r/ Two manifestations:

i. রেফ /repʰ/ as the first member of a cluster, e.g., প6, ৎ6, -6, য6, E6 ( +±+ব) (earlier F6 = +§+±+ব, a four-term cluster), etc. (placed over the following consonant)
ii. র-ফলা /rɔ-pʰɔla/ as the second/third member of a cluster, e.g., &, ¡, etc. (placed under the consonant it follows)
ল /l/ G (ƒ+গ), ‚ (ƒ+প), H (ƒ+ব), I
(ƒ+ম), J (ƒ+ট), K (ƒ+ড), L
(ƒ+ক), G (ƒ+গ), M (ƒ+দ), ল8
(ƒ+য)

ð (µ+ল), “ (‘+ল), ÿ (™+ল)

◌ঃ /h/ word-finally; word-medially, it doubles the pronunciation of the following consonant.    অঃ, কঃ

◌ঁ / ̃/ অঁ, বঁ
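
The reph, ra-phalā and ya-phalā placements described above correspond to different code point orders in Unicode text. The following is a minimal sketch under stated assumptions: it relies on the Unicode encoding model for Bengali (reph is র + ◌্ before a consonant, ra-phalā and ya-phalā are ◌্ followed by র or য, and র + ZWJ + ◌্ + য requests ya-phalā on র instead of reph on য). The function name `classify_cluster` is hypothetical, and the check is a simple substring heuristic rather than a full cluster parser.

```python
RA = "\u09B0"       # BENGALI LETTER RA
YA = "\u09AF"       # BENGALI LETTER YA
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA
ZWJ = "\u200D"      # ZERO WIDTH JOINER


def classify_cluster(cluster: str) -> list[str]:
    """Name the secondary forms of RA and YA implied by the code point order.

    Heuristic sketch: it looks only at a leading RA + HASANTA (reph) and at
    HASANTA + RA / HASANTA + YA later in the string (ra-phala / ya-phala).
    """
    forms = []
    rest = cluster
    # RA + HASANTA at the start, followed by something, is rendered as reph.
    if cluster.startswith(RA + HASANTA) and len(cluster) > 2:
        forms.append("reph")
        rest = cluster[2:]  # the reph has consumed the first two code points
    if HASANTA + RA in rest:
        forms.append("ra-phala")
    if HASANTA + YA in rest:
        forms.append("ya-phala")
    return forms


# RA + HASANTA + YA: reph over YA
print(classify_cluster(RA + HASANTA + YA))
# RA + ZWJ + HASANTA + YA: ya-phala attached to RA
print(classify_cluster(RA + ZWJ + HASANTA + YA))
```

The two calls illustrate the two manifestations of the র+য combination noted in the য row: the same letters yield a reph or a ya-phalā depending on whether a ZWJ is present.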
