An Open Source Urdu Resource Grammar
Shafqat M Virk, Department of Applied IT, University of Gothenburg, [email protected]
Muhammad Humayoun, Laboratory of Mathematics, University of Savoie, mhuma@univ-savoie.fr
Aarne Ranta, Department of CS & Eng, University of Gothenburg, [email protected]
Plan
Introduction
– Urdu Language
– Grammatical Framework (GF)
Urdu Resource Grammar
– Morphology
– Syntax
Attempto (An Application Grammar)
Future Work
Questions
Urdu Language
Indo-European, Indo-Iranian, Indo-Aryan family
Widely spoken in South Asia
Closely related to Hindi
- Shared phonology, morphology, syntax and day-to-day vocabulary
- The two differ considerably in script and scholarly writing
- Urdu is written in a Perso-Arabic script from right to left, whereas Hindi is written in Devanagari script from left to right
Urdu-Hindi together
- One of the most widely spoken languages in the world, with 1,017,290,000 speakers (native + second language), after Chinese (Rahman, 2004)
The Brahui language is spoken in Pakistan, and is Dravidian
[Figure: picture from Google]
Grammatical Framework (GF)
A tool for working with grammars
A programming language for writing grammars
A number of multilingual text generation applications (Phrasebook, Attempto, WebALT, etc.) have been developed using GF and/or the GF resource library
(Grammatical Framework, Ranta 2004)
http://www.grammaticalframework.org/
Levels of GF Grammars
GF grammars have two levels
– Abstract Syntax
– Concrete Syntax
Abstract Syntax
Defines a set of categories* and tree-building functions
Independent of language
Common to all languages
*The term "category" is used to model different parts of speech
Abstract Syntax
Categories
– cat CN
– cat NP
– cat A
– cat AP
– cat V2
Functions
– fun PositA : A -> AP ; -- black
– fun AdjCN : AP -> CN -> CN ; -- black cat
– fun Compl : V2 -> NP -> VP ; -- eats bread
Concrete Syntax
Contains linearization rules for categories and trees
Language dependent
Each language has its own concrete syntax
Concrete Syntax [Urdu]
Categories
- lincat CN = {s : Number => Case => Str ; g : Gender} ;
- lincat AP = {s : Number => Gender => Case => Degree => Str} ;
Functions
– PositA a = a ;
– lin AdjCN ap cn = { s = \\n,c => ap.s ! n ! cn.g ! c ! Posit ++ cn.s ! n ! c ; g = cn.g } ;
– Compl v2 np = np ++ v2 ; (bread eat, rʋʈi: kʰata, روٹی کھاتا)
Concrete Syntax [English]
Categories
– lincat CN = {s : Number => Case => Str ; g : Gender} ;
– lincat AP = {s : Agr => Str ; isPre : Bool} ;
Functions
– PositA a = { s = \\_ => a.s ! AAdj Posit Nom ; isPre = True} ;
– AdjCN ap cn = { s = \\n,c => preOrPost ap.isPre (ap.s ! agrgP3 n cn.g) (cn.s ! n ! c) ; g = cn.g } ;
– Compl v2 np = v2 ++ np ; (eat bread)
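The contrast between the Urdu and English rules for Compl can be imitated outside GF. The Python sketch below is an illustration only, not GF code: the class `Compl`, the functions `lin_compl_urdu` and `lin_compl_english`, the toy lexicons, and the use of plain strings are all assumptions made here. Real GF linearizations work on records and tables.

```python
# Minimal sketch (not GF): one abstract tree, two language-specific linearizations.
from dataclasses import dataclass


@dataclass(frozen=True)
class Compl:
    """Abstract tree node for Compl : V2 -> NP -> VP (language independent)."""
    v2: str  # abstract name of the verb, e.g. "eat"
    np: str  # abstract name of the object, e.g. "bread"


# Hypothetical toy lexicons; the Urdu transliterations are the ones on the slide.
URDU = {"eat": "kʰata", "bread": "rʋʈi:"}
ENGLISH = {"eat": "eat", "bread": "bread"}


def lin_compl_urdu(tree: Compl) -> str:
    # Urdu rule from the slide: np ++ v2 (object before verb).
    return f"{URDU[tree.np]} {URDU[tree.v2]}"


def lin_compl_english(tree: Compl) -> str:
    # English rule from the slide: v2 ++ np (verb before object).
    return f"{ENGLISH[tree.v2]} {ENGLISH[tree.np]}"


tree = Compl(v2="eat", np="bread")
print(lin_compl_urdu(tree))     # object first
print(lin_compl_english(tree))  # verb first
```

The same abstract tree is shared; only the linearization functions differ per language.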
Types of Grammars
Resource Grammars
Application Grammars
Resource Grammars
General-purpose grammars that cover the general linguistic aspects of a language
Resource grammars encode the syntactic features of a language
Application Grammars
Typically limited to specific domains
Encode semantic structures
Can use resource grammars as libraries
Urdu Resource Grammar
A resource grammar consists of
– Lexicon
– Grammar
The GF library currently has resource grammars for 15 languages
Urdu is the 16th in total and the first South Asian language
Almost 2700 lines of code; development time was almost seven months
Lexicon
Test lexicon of 350 words
Almost 100 structural words (closed word category)
The rules defining Urdu morphology are borrowed from (Humayoun et al. 2006)
- An Urdu morphology was developed in Haskell using the Functional Morphology toolkit
- We have now developed it in GF
Morphology + Syntax
Nouns and Noun Phrases
Verbs and Verb Phrases
Adjectives and Adjectival Phrases
Clauses
Sentences
Urdu Nouns
Urdu nouns inflect in
- Number (Singular, Plural)
- Case (Direct, Oblique, Vocative)
Inherent Gender
Noun = {s : Number => Case => Str ; g : Gender}
Different forms of the noun 'boy':
            Direct          Oblique          Vocative
Singular    lRka (لڑکا)     lRkE (لڑکے)      lRkE (لڑکے)
Plural      lRkE (لڑکے)     lRkwN (لڑکوں)    lRkw (لڑکو)
We* have divided nouns into 15 different groups, based on how they end, and there is one group for the worst case.
*Humayoun et al. 2006
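The paradigm of 'boy' shown above can be written as a small inflection table. The following Python sketch is a hedged illustration: the function name `mk_noun_a` and the rule that a lemma ending in 'a' inflects like lRka are assumptions generalized from this single example. The actual 15 ending-based groups are defined in the grammar and in Humayoun et al. 2006, not here.

```python
# Sketch of Noun = {s : Number => Case => Str ; g : Gender} as a Python dict.
from enum import Enum


class Number(Enum):
    SG = "Sg"
    PL = "Pl"


class Case(Enum):
    DIR = "Dir"
    OBL = "Obl"
    VOC = "Voc"


def mk_noun_a(lemma: str) -> dict:
    """Hypothetical paradigm for one ending group, modelled on lRka 'boy'."""
    if not lemma.endswith("a"):
        raise ValueError("this sketch only covers lemmas ending in 'a'")
    stem = lemma[:-1]
    table = {
        (Number.SG, Case.DIR): stem + "a",
        (Number.SG, Case.OBL): stem + "E",
        (Number.SG, Case.VOC): stem + "E",
        (Number.PL, Case.DIR): stem + "E",
        (Number.PL, Case.OBL): stem + "wN",
        (Number.PL, Case.VOC): stem + "w",
    }
    # Gender is inherent (a fixed field), not an inflection parameter.
    return {"s": table, "g": "Masc"}


boy = mk_noun_a("lRka")
for (number, case), form in boy["s"].items():
    print(number.value, case.value, form)
```

Number and case index the table, while gender is stored once per noun, which mirrors the record type on the slide.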
Noun Phrases
NP : Type = {s : NPCase => Str ; a : Agr} ;
NPCase = NPC Case | NPErg | NPAbl | NPIns | NPLoc1 | NPLoc2 ;
- NPErg: Ergative case with case marker 'ne:' (نے)
- NPAbl: Ablative case with case marker 'se:' (سے)
- NPIns: Instrumental case with case marker 'se:' (سے)
- NPLoc1: Locative case with case marker 'mi:ɳ' (میں)
- NPLoc2: Locative case with case marker 'pr' (پر)
Examples
NPErg: lRkE ne: ktab Xrydy (The boy bought a book.)
NPAbl: lRkE se: ktab lkh-y gyy (The book was written by the boy.)
NPIns: lRkE nE pnsl se: lkh-a (The boy wrote with a pencil.)
NPLoc1: lRka kmrE mi:ɳ hE (The boy is in the room.)
NPLoc2: ktab myZ pr hE (The book is on the table.)
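The case markers listed above follow the noun as separate words in the examples (lRkE ne:, kmrE mi:ɳ, myZ pr). A minimal Python sketch of that idea is given below. The function `np_with_case` and the dictionary `CASE_MARKERS` are hypothetical names, and the noun form is passed in by hand; in the grammar, the form would come from the noun's inflection table.

```python
# Sketch: attach the postposition that marks each NPCase value.
CASE_MARKERS = {
    "NPErg": "ne:",    # ergative
    "NPAbl": "se:",    # ablative
    "NPIns": "se:",    # instrumental
    "NPLoc1": "mi:ɳ",  # locative 'in'
    "NPLoc2": "pr",    # locative 'on'
}


def np_with_case(noun_form: str, npcase: str) -> str:
    """Return the noun form followed by the case marker for npcase."""
    if npcase not in CASE_MARKERS:
        raise KeyError(f"no case marker known for {npcase}")
    return f"{noun_form} {CASE_MARKERS[npcase]}"


print(np_with_case("lRkE", "NPErg"))   # as in: lRkE ne: ktab Xrydy
print(np_with_case("kmrE", "NPLoc1"))  # as in: lRka kmrE mi:ɳ hE
print(np_with_case("myZ", "NPLoc2"))   # as in: ktab myZ pr hE
```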
Verbs
Urdu verbs inflect in
- Gender (Masculine, Feminine)
- Number (Singular, Plural)
- Person (First, Second {casual, familiar, respectful}, Third {near, distant})
- Tense (Subjunctive, Perfective, Imperfective)
Verb = {s : VerbForm => Str}
VerbForm = VF VTense UPerson Number Gender | Inf | Root
Verbs
VF Subj Pers1 Sg Masc => kh-aw^N کھاؤں
VF Subj Pers1 Sg Fem => kh-aw^N کھاؤں
VF Subj Pers1 Pl Masc => kh-ay^N کھائیں
VF Subj Pers1 Pl Fem => kh-ay^N کھائیں
VF Subj Pers2_Casual Sg Masc => kh-a کھا
VF Subj Pers2_Casual Sg Fem => kh-a کھا
VF Subj Pers2_Casual Pl Masc => kh-aw^ کھاؤ
VF Subj Pers2_Casual Pl Fem => kh-aw^ کھاؤ
……………………
VF Imperf Pers1 Sg Masc => kh-ata کھاتا
VF Imperf Pers1 Sg Fem => kh-aty کھاتی
VF Imperf Pers1 Pl Masc => kh-atE کھاتے
VF Imperf Pers1 Pl Fem => kh-atyN کھاتیں
……………………
Inf => kh-ana کھانا
Root => kh-a کھا
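To get a feel for the size of the VerbForm table, the parameter space can be enumerated. The Python sketch below assumes constructor names: only Subj, Imperf, Pers1 and Pers2_Casual appear on the slides, so "Perf" and the remaining person names are guesses made for illustration.

```python
# Sketch: enumerate VerbForm = VF VTense UPerson Number Gender | Inf | Root.
from itertools import product

TENSES = ["Subj", "Perf", "Imperf"]  # "Perf" is an assumed name
PERSONS = [
    "Pers1",
    "Pers2_Casual", "Pers2_Familiar", "Pers2_Respect",  # last two assumed
    "Pers3_Near", "Pers3_Distant",                      # assumed names
]
NUMBERS = ["Sg", "Pl"]
GENDERS = ["Masc", "Fem"]


def verb_forms() -> list:
    """All VerbForm values as tuples, finite forms first."""
    forms = [("VF", t, p, n, g)
             for t, p, n, g in product(TENSES, PERSONS, NUMBERS, GENDERS)]
    forms += [("Inf",), ("Root",)]
    return forms


# A few cells of the table for 'eat', copied from the slide.
EAT = {
    ("VF", "Imperf", "Pers1", "Sg", "Masc"): "kh-ata",
    ("VF", "Imperf", "Pers1", "Sg", "Fem"): "kh-aty",
    ("Inf",): "kh-ana",
    ("Root",): "kh-a",
}

print(len(verb_forms()))  # 3 * 6 * 2 * 2 finite forms plus Inf and Root
print(EAT[("VF", "Imperf", "Pers1", "Sg", "Fem")])
```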
Verb Phrases
VPH : Type = {
  s : VPHForm => {fin, inf : Str} ;
  obj : {s : Str ; a : Agr} ;
  vType : VType ;
  comp : Agr => Str ;
  embComp : Str ;
  ad : Str ;
} ;
VPHForm = VPTense VPPTense Agr | VPReq HLevel | VPStem
VPPTense = VPPres | VPPast | VPFutr
HLevel = Tu | Tum | Ap | Neutr
VType = VIntrans | VTrans | VTransPost
Verb Phrases
VPH : Type = {
  s : VPHForm => {fin, inf : Str} ;  -- fin: copula; inf: actual form of the verb
  obj : {s : Str ; a : Agr} ;        -- obj: object of the verb
  vType : VType ;                    -- vType: type of verb, used for ergativity
  comp : Agr => Str ;                -- comp: complement of the verb
  embComp : Str ;                    -- embComp: used in case of embedded sentences
  ad : Str ;                         -- ad: adverb
} ;
Verb Phrases
She wants to run.
وہ دوڑنا چاہتی ہے۔
He says that she runs.
وہ کہتا ہے کہ وہ دوڑتی ہے۔
Word order:
Noun ++ VP.obj ++ VP.adverb ++ VP.complement ++ VP.verb ++ VP.copula
With an embedded sentence:
Noun ++ VP.obj ++ VP.adverb ++ VP.complement ++ VP.verb ++ VP.copula ++ VP.embComp
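The word-order template above can be sketched as a simple concatenation that skips empty fields. In this Python sketch the function `linearize_clause`, the dictionary keys, the informal romanization of the Urdu words, and the assignment of each word to a field are all assumptions made for illustration; the grammar itself decides which field holds what.

```python
# Sketch: Noun ++ obj ++ adverb ++ complement ++ verb ++ copula ++ embComp.
FIELD_ORDER = ["obj", "adverb", "complement", "verb", "copula", "embComp"]


def linearize_clause(noun: str, vp: dict) -> str:
    """Join the subject and the VP fields in the fixed order, dropping empties."""
    parts = [noun] + [vp.get(field, "") for field in FIELD_ORDER]
    return " ".join(part for part in parts if part)


# "He says that she runs." (informal romanization, assumed field assignment)
says = {"verb": "kehta", "copula": "hai", "embComp": "keh woh dorti hai"}
print(linearize_clause("woh", says))

# "She wants to run." (the infinitive is placed in 'complement' as an assumption)
wants = {"complement": "dorna", "verb": "chahti", "copula": "hai"}
print(linearize_clause("woh", wants))
```

Keeping the embedded sentence in its own field lets it be placed after the copula, which is where the template puts VP.embComp.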
Adjectives
Urdu adjectives inflect in
- Number (Singular, Plural)
- Gender (Masculine, Feminine)
- Case (Direct, Oblique, Vocative)
- Degree (Posit, Compar, Superl)
Adjective = {s : Number => Gender => Case => Degree => Str} ;
Adjectival Phrases
AP = {s : Number => Gender => Case => Degree => Str} ;
Sg Masc Dir Posit => kala کالا
Sg Masc Dir Compar => bht kala بہت کالا
Sg Masc Dir Superl => sb sE kala سب سے کالا
…………………
Sg Fem Dir Posit => kaly کالی
Sg Fem Dir Compar => bht kaly بہت کالی
Sg Fem Dir Superl => sb sE kaly سب سے کالی
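In the rows shown above, the comparative and superlative are built by prefixing words to the positive form (bht and sb sE). A minimal Python sketch of that observation follows. The function name `degree_forms` is hypothetical, and the sketch covers only the cells visible on the slide, not the elided rows.

```python
# Sketch: degree forms derived from the positive form, as in the rows above.
def degree_forms(positive: str) -> dict:
    """Map each Degree value to its surface string for one inflected form."""
    return {
        "Posit": positive,
        "Compar": "bht " + positive,
        "Superl": "sb sE " + positive,
    }


# Positive forms for 'black' in Sg Dir, taken from the slide.
black = {"Masc": "kala", "Fem": "kaly"}
for gender, positive in black.items():
    for degree, form in degree_forms(positive).items():
        print("Sg", gender, "Dir", degree, "=>", form)
```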
Clauses
Clause : Type = {s : VPHTense => Polarity => Order => Str} ;
- VPHTense = VPGenPres | VPPastSimple | VPFut | VPContPres | VPContPast | VPContFut | VPPerfPres | VPPerfPast | VPPerfFut | VPPerfPresCont | VPPerfPastCont | VPPerfFutCont | VPSubj
- Polarity = Pos | Neg ;
- Order = ODir | OQuest ;
Sentences
S = {s : Str}
UseCl : Temp -> Pol -> Cl -> S
UseCl temp p cl = {
  s = case <temp.t,temp.a> of {
    <Pres,Simul> => temp.s ++ p.s ++ cl.s ! VPGenPres ! p.p ! ODir ;
    <Pres,Anter> => temp.s ++ p.s ++ cl.s ! VPPerfPres ! p.p ! ODir ;
    <Past,Simul> => temp.s ++ p.s ++ cl.s ! VPImpPast ! p.p ! ODir ;
    <Past,Anter> => temp.s ++ p.s ++ cl.s ! VPPerfPast ! p.p ! ODir ;
    <Fut,Simul> => temp.s ++ p.s ++ cl.s ! VPFut ! p.p ! ODir ;
    <Fut,Anter> => temp.s ++ p.s ++ cl.s ! VPPerfFut ! p.p ! ODir ;
    <Cond,Simul> => temp.s ++ p.s ++ cl.s ! VPSubj ! p.p ! ODir ;
    <Cond,Anter> => temp.s ++ p.s ++ cl.s ! VPSubj ! p.p ! ODir
  }
}
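The case expression in UseCl is essentially a lookup from a (tense, anteriority) pair to one clause tense, followed by selection from the clause table. The Python sketch below mirrors that structure. The names `TENSE_TABLE` and `use_cl`, the dict-based encoding of Temp, Pol and Cl, and the one-entry toy clause are assumptions; the example string is the "boy is in the room" sentence from the noun phrase slide.

```python
# Sketch: UseCl as a table lookup followed by clause selection.
TENSE_TABLE = {
    ("Pres", "Simul"): "VPGenPres",
    ("Pres", "Anter"): "VPPerfPres",
    ("Past", "Simul"): "VPImpPast",
    ("Past", "Anter"): "VPPerfPast",
    ("Fut", "Simul"): "VPFut",
    ("Fut", "Anter"): "VPPerfFut",
    ("Cond", "Simul"): "VPSubj",
    ("Cond", "Anter"): "VPSubj",
}


def use_cl(temp: dict, pol: dict, cl: dict) -> dict:
    """Pick the clause string for the given tense and polarity, direct order."""
    vp_tense = TENSE_TABLE[(temp["t"], temp["a"])]
    words = [temp["s"], pol["s"], cl[(vp_tense, pol["p"], "ODir")]]
    return {"s": " ".join(word for word in words if word)}


# Toy clause table with a single filled cell (hypothetical encoding).
clause = {("VPGenPres", "Pos", "ODir"): "lRka kmrE mi:ɳ hE"}
present = {"s": "", "t": "Pres", "a": "Simul"}
positive = {"s": "", "p": "Pos"}
print(use_cl(present, positive, clause)["s"])
```

Both conditional rows map to VPSubj, exactly as in the rule on the slide.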
Ergativity
Final verb agreement is with the direct subject, except in the transitive perfective tense
In the transitive perfective tense, verb agreement is with the direct object
Girl ate bread VS Girl ate apple.
لڑکی نے روٹی کھائی
lɽki: ne: rʋʈi: kʰai:
{Fem: روٹی, Fem: کھائی}
VS
لڑکی نے سیب کھایا
lɽki: ne: si:b kʰai:a
{Masc: سیب, Masc: کھایا}
Ergativity
mkClause : NP -> VPH -> Clause = \np,vp -> {
  s = \\vt,b,ord =>
    let
      subjagr : NPCase * Agr = case vt of {
        VPPast => case vp.subj of {
          (VTrans | VTransPost) => <NPErg, vp.obj.a> ;
          _ => <NPC Dir, np.a>
        } ;
        _ => <NPC Dir, np.a>
      } ;
……………………
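The agreement rule in mkClause can be restated as a small decision function. The Python sketch below is a simplification under stated assumptions: agreement is reduced to gender only, the function name `subject_case_and_agr` is hypothetical, and the two verb forms are the transliterations from the bread/apple example.

```python
# Sketch: choose subject case and the agreement source, as in mkClause.
def subject_case_and_agr(vt: str, vtype: str, subj_agr: str, obj_agr: str) -> tuple:
    """Return (subject case, agreement features the verb should use)."""
    if vt == "VPPast" and vtype in ("VTrans", "VTransPost"):
        # Transitive perfective: ergative subject, verb agrees with the object.
        return ("NPErg", obj_agr)
    # Everywhere else: direct subject, verb agrees with the subject.
    return ("NPC Dir", subj_agr)


# Perfective forms of 'eat' by gender, taken from the example above.
ATE = {"Fem": "kʰai:", "Masc": "kʰai:a"}

# "Girl ate bread": the subject is feminine and the object (rʋʈi:) is feminine.
case, agr = subject_case_and_agr("VPPast", "VTrans", "Fem", "Fem")
print(case, ATE[agr])

# "Girl ate apple": the object (si:b) is masculine, so the verb is masculine.
case, agr = subject_case_and_agr("VPPast", "VTrans", "Fem", "Masc")
print(case, ATE[agr])
```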
Attempto
A grammar for Controlled Language
Implemented for English, then ported to Finnish, French, German, Italian, Swedish
Ported to Urdu
Attempto Home Page: http://attempto.ifi.uzh.ch/site/
Future Work
Bigger lexicon (a lexicon of 6600 words has been completed recently)
Language-specific module (under construction)
Hindi resource grammar (almost completed)
Application grammars (SMS translator)
Questions/Suggestions