Lab Session 2: Arithmetic Expression Grammar
1. OBJECTIVE
The objectives of Lab 2 are (1) to write a grammar with ANTLRWorks and (2) to handle
grammars for arithmetic expressions.
2. EXPERIMENT
In the first lab, we made ANTLRWorks recognize some patterns by means of regular
expressions (REs). That is not very convenient, because ANTLRWorks is not a tool
intended merely for REs. It only shows its true power when working with actual
grammars.
Let’s play with some grammars. To begin, define your first grammar in ANTLRWorks as
follows:
grammar G1;
e : INT '+' e | INT;
INT :'0'..'9'+;
So, what are the non-terminal set and the token set of the grammar?
In ANTLRWorks, if a rule has its left-hand side (LHS) beginning with an uppercase
letter, this rule defines a token. Otherwise, this rule is a production of the grammar, thus
its LHS is a non-terminal symbol.
Hence, in the above grammar, the token set is {INT}, the non-terminal set is {e}, and the
production set is {e → INT ‘+’ e | INT}.
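The naming convention can be written down as a small Python sketch. The helper name classify_rules is hypothetical and only mirrors the rule stated above (uppercase first letter means token); it is not part of ANTLRWorks.

```python
# Hypothetical helper: split rule names into tokens and non-terminals
# using only the first-letter convention described in the text.
def classify_rules(rule_names):
    tokens = {name for name in rule_names if name[0].isupper()}
    non_terminals = {name for name in rule_names if not name[0].isupper()}
    return tokens, non_terminals


tokens, non_terminals = classify_rules(["e", "INT"])
print(sorted(tokens))         # ['INT']
print(sorted(non_terminals))  # ['e']
```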
Unfortunately, even though your grammar is now defined precisely, it still does not work.
Test it with inputs such as 2+3 or 23+5: all you get are erroneous trees.
ANTLRWorks is not intended for arbitrary grammars, but for those of programming
languages. Remember that in most programming languages, a statement must end
with a special character (like ‘;’ in C++ or Java). If you get the chance to study
parsing algorithms in a course such as Compiler, you will learn the reason behind
this convention. For now, just follow the tradition by revising your grammar as
follows.
grammar G2;
INT :'0'..'9'+;
SEMI : ';';
Things are nice now. Your grammar should work perfectly with valid inputs like 2+3;
(do not forget the semicolon).
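To get a feeling for what a parser does with such a grammar, here is a minimal hand-written sketch in Python. It assumes the rules p : e SEMI and e : INT ‘+’ e | INT. The function names are hypothetical, and the code is an illustration, not what ANTLRWorks generates.

```python
import re


def tokenize(text):
    # INT : '0'..'9'+ ;  SEMI : ';' ;  '+' is a literal. No whitespace rule yet.
    tokens = re.findall(r"[0-9]+|[+;]", text)
    if "".join(tokens) != text:
        raise SyntaxError("unexpected character in input")
    return tokens


def parse_p(tokens):
    # p : e SEMI ;
    tree, pos = parse_e(tokens, 0)
    if pos >= len(tokens) or tokens[pos] != ";":
        raise SyntaxError("expected ';'")
    if pos + 1 != len(tokens):
        raise SyntaxError("unexpected input after ';'")
    return tree


def parse_e(tokens, pos):
    # e : INT '+' e | INT ;
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError("expected INT")
    left = tokens[pos]
    pos += 1
    if pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_e(tokens, pos + 1)  # recursion on the right of '+'
        return (left, "+", right), pos
    return left, pos


print(parse_p(tokenize("2+3;")))    # ('2', '+', '3')
print(parse_p(tokenize("1+2+3;")))  # ('1', '+', ('2', '+', '3'))
```

Because e recurses on the right-hand side of ‘+’, the nested tuples lean to the right. An input without the final semicolon, or one containing a space, raises SyntaxError in this sketch.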
grammar G3;
p : e SEMI;
INT :'0'..'9'+;
SEMI : ';';
grammar G4;
p : e SEMI;
e : INT '+' e | INT;
INT :'0'..'9'+;
SEMI : ';';
WS : (' ')+ {$channel=HIDDEN;};
In grammar G4, we define the token WS to deal with spaces. This token has an action
attached to it, written in curly brackets: $channel=HIDDEN;. The action puts every
WS token on a hidden channel. In other words, the parser always ignores this
token.
Test your grammar again with the input of 2 + 3; and observe the difference in the
generated tree.
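The idea of a hidden channel can be imitated in a few lines of Python. This is only a sketch of the concept, not ANTLR's implementation: the names lex and parser_view, the channel labels, and the token name PLUS (the grammar uses the literal ‘+’) are assumptions made for illustration.

```python
import re

DEFAULT, HIDDEN = "DEFAULT", "HIDDEN"

# Token definitions in the spirit of the grammar: INT, '+', SEMI and WS.
TOKEN_SPEC = [
    ("INT", r"[0-9]+"),
    ("PLUS", r"\+"),
    ("SEMI", r";"),
    ("WS", r" +"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(text):
    """Return every token, including whitespace, tagged with a channel."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        kind = match.lastgroup
        channel = HIDDEN if kind == "WS" else DEFAULT
        tokens.append((kind, match.group(), channel))
        pos = match.end()
    return tokens


def parser_view(tokens):
    """Keep only what a parser reading the default channel would see."""
    return [(kind, text) for kind, text, channel in tokens if channel == DEFAULT]


all_tokens = lex("2 + 3;")
print(len(all_tokens))          # 6 tokens, two of them WS
print(parser_view(all_tokens))  # [('INT', '2'), ('PLUS', '+'), ('INT', '3'), ('SEMI', ';')]
```

The WS tokens are still produced by the lexer. They are simply filtered out before parsing, so 2 + 3; and 2+3; look identical to the parser.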
In grammar G4, the operator ‘+’ is right-associative. One may want to make it
left-associative. At first glance, this seems quite simple. Just modify your grammar as
follows:
grammar G5;
p : e SEMI;
e : e '+' INT | INT;
INT :'0'..'9'+;
SEMI : ';';
WS : (' ')+ {$channel=HIDDEN;};
That is right, theoretically. However, it does not work in ANTLRWorks. You can try it
and see.
The reason lies in the production e → e ‘+’ INT. Here we have what is called left
recursion: the symbol on the LHS is also the first symbol on the RHS.
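A hand-written top-down parser makes the problem visible. The Python sketch below (function name hypothetical) translates the left-recursive alternative directly into a function: it calls itself before consuming any token, so the position never advances.

```python
def parse_e_naive(tokens, pos=0):
    # Direct translation of the first alternative of  e : e '+' INT | INT ;
    # The rule starts with e, so the function calls itself with the SAME pos.
    left, pos = parse_e_naive(tokens, pos)
    # The lines below are never reached.
    if tokens[pos] != "+":
        raise SyntaxError("expected '+'")
    return (left, "+", tokens[pos + 1]), pos + 2


try:
    parse_e_naive(["2", "+", "3"])
except RecursionError:
    print("gave up: the parser recursed without consuming any input")
```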
Some parsing algorithms work well with left recursion, but the one used by
ANTLRWorks does not. We must eliminate the left recursion using the following
transformation:
X → X α | β
becomes
X → β Y
Y → α Y | ε
Applying the above transformation to e → e ‘+’ INT | INT (where X is e, α is
‘+’ INT, β is INT, and the new symbol Y is named t), we obtain the following grammar:
grammar G6;
p : e SEMI;
e : INT t;
t : '+' INT t | ;
INT :'0'..'9'+;
SEMI : ';';
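The transformation is mechanical, so it can be sketched as a short Python function. The function name and the list-of-symbols representation are assumptions made for illustration. The sketch handles only immediate left recursion in a single rule, and the caller must choose a fresh name for the new symbol.

```python
def eliminate_left_recursion(name, alternatives, new_name):
    """Rewrite  X -> X a | b  into  X -> b Y  and  Y -> a Y | epsilon.

    Each alternative is a list of symbols; the empty list stands for epsilon.
    """
    # The alpha parts: alternatives that start with X, with that X removed.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == name]
    # The beta parts: alternatives that do not start with X.
    betas = [alt for alt in alternatives if not alt or alt[0] != name]
    if not alphas:
        return {name: alternatives}  # nothing to do
    return {
        name: [beta + [new_name] for beta in betas],
        new_name: [alpha + [new_name] for alpha in alphas] + [[]],
    }


rules = eliminate_left_recursion("e", [["e", "'+'", "INT"], ["INT"]], "t")
print(rules["e"])  # [['INT', 't']]
print(rules["t"])  # [["'+'", 'INT', 't'], []]
```

The result corresponds to the rule e : INT t; together with a rule t that has the alternatives ‘+’ INT t and the empty string.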
Note that in grammar G6, ‘+’ is still left-associative. This is quite hard to observe
in the tree, but we will verify it in the upcoming Lab 3.
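One possible way to see how left grouping can survive the transformation is to let the rule t receive the tree built so far. The Python sketch below does this by hand for the rules p : e SEMI, e : INT t and t : ‘+’ INT t | ε. The function names and the extra parameter left are assumptions of this sketch; it is not necessarily the technique used in Lab 3.

```python
def parse_p(tokens):
    # p : e SEMI ;
    tree, pos = parse_e(tokens, 0)
    if tokens[pos:] != [";"]:
        raise SyntaxError("expected a single ';' at the end")
    return tree


def parse_e(tokens, pos):
    # e : INT t ;
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError("expected INT")
    return parse_t(tokens, pos + 1, tokens[pos])


def parse_t(tokens, pos, left):
    # t : '+' INT t | (empty) ;
    # 'left' carries the tree built so far, so every "'+' INT" is attached
    # on top of it and the operands end up grouped from the left.
    if pos < len(tokens) and tokens[pos] == "+":
        if pos + 1 >= len(tokens) or not tokens[pos + 1].isdigit():
            raise SyntaxError("expected INT after '+'")
        return parse_t(tokens, pos + 2, (left, "+", tokens[pos + 1]))
    return left, pos  # empty alternative: nothing more to attach


print(parse_p(["1", "+", "2", "+", "3", ";"]))  # (('1', '+', '2'), '+', '3')
```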
3. CLASS EXERCISES
3.1.
For example:
3+4;4+5-6;23-12+47;
b) Modify the grammar written in 3.1 so that the operator ‘+’ takes higher precedence
than ‘-’. The grammar should now also ignore whitespace characters such as space, tab
and new line [1]. In addition, the grammar should allow parentheses in
expressions.
3.2 Eliminate left recursion in the following grammars and test the transformed
grammars in ANTLRWorks.
a)
e → t ‘+’ e | t
t → t ‘*’ A | A
A → ‘0’..‘9’+
b) [2]
e → e ‘+’ t | e ‘-’ t | t
t → t ‘*’ A | A
A → ‘0’..‘9’+
4. SUBMISSIONS
Students are required to submit in writing to the tutor-in-charge the following materials:
4.1 The grammars for Exercises 3.1 and 3.2
4.2 The trees generated by ANTLRWorks with the following (transformed) grammars
and inputs
Grammar   Input
3.2a      3+4*5*6
3.2b      3+4*5-6
[1] One can use ‘\t’ and ‘\n’ to represent the tab and new line characters in ANTLRWorks.
[2] For the Honor program (KSTN program) only.