
Lab Session 2 Arithmetic Expression Grammar

The document discusses arithmetic expression grammars in ANTLRWorks. It begins with the objectives of handling arithmetic expressions and writing grammars. Various grammars are presented and modified to address issues like left recursion, operator precedence, and whitespace. Exercises are given to modify existing grammars to handle multiple expressions, precedence, ignore whitespace, and use parentheses. Students are required to submit the grammars for the exercises and parse trees for sample inputs.
Copyright © All Rights Reserved

LAB SESSION 2

ARITHMETIC EXPRESSION GRAMMAR

1. OBJECTIVE
The objectives of Lab 2 are (1) to write a grammar with ANTLRWorks and (2) to write
grammars for arithmetic expressions.

2. EXPERIMENT

2.1 Your first grammar

In the first lab, we tried to make ANTLRWorks recognize some patterns by means of
regular expressions (REs). This is not very convenient, because ANTLRWorks is not
intended merely as an RE tool: it only shows its true power when working with an actual
grammar.

Let’s play with some grammars. To begin, define your first grammar in ANTLRWorks as
follows:

grammar G1;

e : INT '+' e| INT;

INT :'0'..'9'+;

So, what are the non-terminal set and the token set of the grammar?

In ANTLRWorks, if the name of a rule (its left-hand side, LHS) begins with an uppercase
letter, the rule defines a token. Otherwise, the rule is a production of the grammar, and
its LHS is a non-terminal symbol.

Hence, in the above grammar, the token set is {INT}, the non-terminal set is {e}, and the
production set is {e → INT '+' e | INT}.
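The naming convention can be mimicked in a few lines of Python. The helper `classify_rules` below is hypothetical (ANTLRWorks exposes nothing like it); it only applies the first-letter rule to the rule names of a grammar. It ignores quoted literals such as '+', for which ANTLR creates implicit tokens of its own.

```python
def classify_rules(rule_names):
    """Split ANTLR-style rule names into (tokens, non_terminals).

    Assumption: a rule whose name starts with an uppercase letter is a
    lexer rule (a token); every other rule is a parser rule (a non-terminal).
    """
    tokens = {name for name in rule_names if name[0].isupper()}
    non_terminals = set(rule_names) - tokens
    return tokens, non_terminals


# Rule names of grammar G1: "e" (parser rule) and "INT" (lexer rule).
tokens, non_terminals = classify_rules(["e", "INT"])
print(sorted(tokens), sorted(non_terminals))  # ['INT'] ['e']
```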

The language generated by this grammar is very simple: it consists of sums of one or more
integers, for example 25+3 and 12+5+7 (a single integer such as 42 is also valid).
Since the grammar now specifies everything explicitly, ANTLRWorks can generate the
corresponding lexer and parser without any problem, so we do not need to bother with the
lexer::header directive as we did in Lab 1. Just play with this grammar.
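To make the lexer/parser split concrete, here is a minimal hand-written Python sketch of what a lexer and a recursive-descent parser for G1 do. It is not the code ANTLRWorks generates, and the names `tokenize`, `parse_e` and `is_sum` are hypothetical. The parser factors out the INT shared by both alternatives of e, then looks at one token to decide whether a '+' follows.

```python
import re

# Lexer rules of G1: INT : '0'..'9'+ ; plus the literal '+' used in rule e.
TOKEN_RE = re.compile(r"[0-9]+|\+")


def tokenize(text):
    """Lexer: turn the input string into a list of token strings."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


def parse_e(tokens, i):
    """Parser for  e : INT '+' e | INT ;  returns the index just after e."""
    if i >= len(tokens) or not tokens[i].isdigit():
        raise SyntaxError("expected INT")
    if i + 1 < len(tokens) and tokens[i + 1] == "+":
        return parse_e(tokens, i + 2)  # first alternative: INT '+' e
    return i + 1                       # second alternative: INT


def is_sum(text):
    """True if the whole text is a sentence of G1."""
    try:
        tokens = tokenize(text)
        return parse_e(tokens, 0) == len(tokens)
    except SyntaxError:
        return False


for s in ["25+3", "12+5+7", "42", "2+", "+3"]:
    print(s, is_sum(s))
```

The last two inputs are rejected because e must start with an INT, and every '+' must be followed by another e.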

2.2 Revising your grammar

Unfortunately, even though your grammar is now defined precisely, it still does not work.
Test it with inputs such as 2+3 or 23+5: all you get are erroneous trees.

ANTLRWorks is not intended for arbitrary grammars, but for those of programming
languages. Remember that in most programming languages, statements must be
terminated by a special character (like ';' in C++ or Java). If you take a course on parsing
algorithms, such as Compilers, you will learn the reason behind this. For now, just follow
the convention and revise your grammar as follows.

grammar G2;

e : INT '+' e | INT SEMI;

INT :'0'..'9'+;

SEMI : ';';

Things are nice now. Your grammar should work perfectly with valid inputs like 2+3;
(do not forget the semicolon).

Or, you can make your grammar more concise as follows:

grammar G3;

p : e SEMI;

e : INT '+' e| INT;

INT :'0'..'9'+;

SEMI : ';';
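The effect of the terminating SEMI can be sketched with a small Python recognizer for G3 (hypothetical names, not ANTLR's generated code). Besides requiring the ';', this sketch also insists that nothing follows it; in ANTLR that extra check would correspond to writing EOF at the end of the start rule (p : e SEMI EOF;), which G3 does not do.

```python
import re

# Tokens of G3: INT : '0'..'9'+ ;  SEMI : ';' ;  and the literal '+'.
TOKEN_RE = re.compile(r"[0-9]+|\+|;")


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


def parse_e(tokens, i):
    """e : INT '+' e | INT ;  returns the index just after e."""
    if i >= len(tokens) or not tokens[i].isdigit():
        raise SyntaxError("expected INT")
    if i + 1 < len(tokens) and tokens[i + 1] == "+":
        return parse_e(tokens, i + 2)
    return i + 1


def accepts_p(text):
    """p : e SEMI ;  plus an end-of-input check (assumption, like EOF)."""
    try:
        tokens = tokenize(text)
        i = parse_e(tokens, 0)
        return i < len(tokens) and tokens[i] == ";" and i + 1 == len(tokens)
    except SyntaxError:
        return False


for s in ["2+3;", "2+3", "2+3;;"]:
    print(s, accepts_p(s))  # True, then False, then False
```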

2.3 Deal with whitespace


When you test grammar G3 above with the input 2 + 3; (note the spaces inside the
expression), you still get a parse tree, but it does not look nice: the spaces are inserted
into the generated nodes, where they make no sense.

To fix this, modify the grammar as follows:

grammar G4;

p : e SEMI;

e : INT '+' e| INT;

INT :'0'..'9'+;

SEMI : ';';

WS : (' ')+ {$channel=HIDDEN;};

In grammar G4, we define the token WS to deal with spaces. This token is associated with
an action, written in curly braces: $channel=HIDDEN;. The action puts every WS token on
a hidden channel. In other words, the lexer still produces WS tokens, but the parser
always ignores them.
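The idea of a hidden channel can be sketched in Python: the lexer tags every token with a channel, and the parser reads only the default one. This is an illustration under assumptions, not ANTLR's API: channels are plain strings here (ANTLR uses channel numbers), and the token type name PLUS is hypothetical (ANTLR names the implicit token for '+' itself). The lexer tries its rules in order, which is enough here because no two rules can match the same character.

```python
import re

# Lexer rules of G4: (token type, pattern, channel).
LEXER_RULES = [
    ("INT", re.compile(r"[0-9]+"), "default"),
    ("PLUS", re.compile(r"\+"), "default"),  # hypothetical name for '+'
    ("SEMI", re.compile(r";"), "default"),
    ("WS", re.compile(r" +"), "hidden"),     # WS : (' ')+ {$channel=HIDDEN;} ;
]


def lex(text):
    """Return every token, hidden ones included, as (type, text, channel)."""
    tokens, pos = [], 0
    while pos < len(text):
        for ttype, pattern, channel in LEXER_RULES:
            m = pattern.match(text, pos)
            if m:
                tokens.append((ttype, m.group(), channel))
                pos = m.end()
                break
        else:  # no rule matched at this position
            raise SyntaxError(f"unexpected character {text[pos]!r}")
    return tokens


def parser_view(tokens):
    """What the parser sees: only tokens on the default channel."""
    return [(ttype, value) for ttype, value, channel in tokens
            if channel == "default"]


all_tokens = lex("2 + 3;")
print(len(all_tokens))          # 6
print(parser_view(all_tokens))  # [('INT', '2'), ('PLUS', '+'), ('INT', '3'), ('SEMI', ';')]
```

Because the WS tokens are kept rather than discarded, a later tool could still recover the original spacing, while the parser sees exactly the same tokens as for 2+3;.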

Test your grammar again with the input of 2 + 3; and observe the difference in the
generated tree.
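The shape of the tree G4 produces follows from its rule e : INT '+' e | INT, which is right-recursive: the nested e is always the right operand of '+'. The Python sketch below (hypothetical `parse_e`, taking input that is already tokenized and free of whitespace, as the parser sees it after WS is hidden) builds that nesting explicitly as tuples, so 1+2+3 comes out grouped as 1+(2+3).

```python
def parse_e(tokens, i=0):
    """e : INT '+' e | INT ;  returns (tree, index just after e).

    A tree is either an int or a tuple ('+', left, right).
    Assumes well-formed input; there is no error recovery.
    """
    left = int(tokens[i])
    if i + 1 < len(tokens) and tokens[i + 1] == "+":
        # The nested e becomes the RIGHT operand of '+'.
        right, j = parse_e(tokens, i + 2)
        return ("+", left, right), j
    return left, i + 1


tree, _ = parse_e(["1", "+", "2", "+", "3"])
print(tree)  # ('+', 1, ('+', 2, 3))
```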

2.4 Left-recursion elimination

In grammar G4, the rule for e is right-recursive, so the operator '+' is right-associative:
1+2+3 is grouped as 1+(2+3). One may want to make it left-associative instead. At first,
this seems quite simple. Just modify your grammar as follows:

grammar G5;

p : e SEMI;

e : e '+' INT| INT;

INT :'0'..'9'+;

SEMI : ';';
WS : (' ')+ {$channel=HIDDEN;};

This grammar is correct in theory. However, it does not work in ANTLRWorks; try it and
see.

The reason is the production e → e '+' INT. It is left-recursive: the non-terminal on the
LHS is also the first symbol on the right-hand side (RHS).
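Why a top-down parser cannot use this production directly becomes visible when it is translated naively into a recursive-descent function. In this Python sketch (the function name is hypothetical), trying the first alternative means parsing an e at the same position before any token is consumed, so the function calls itself until Python raises RecursionError. ANTLR does not loop like this at runtime; it reports the problem when it analyses the grammar.

```python
def parse_e_left_recursive(tokens, i=0):
    """Naive translation of  e : e '+' INT | INT ;  (for illustration only)."""
    # Alternative 1 starts with e itself: parse an e at the SAME position i.
    # No token has been consumed yet, so this call repeats forever.
    left, j = parse_e_left_recursive(tokens, i)
    # Never reached: consume '+' INT after the inner e.
    if j < len(tokens) and tokens[j] == "+":
        return ("+", left, int(tokens[j + 1])), j + 2
    return left, j


try:
    parse_e_left_recursive(["1", "+", "2"])
except RecursionError:
    print("left recursion: e calls e without consuming any token")
```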

Some parsing algorithms (for example, LR parsers) handle left recursion well, but the
LL(*) parsers of ANTLR 3, on which ANTLRWorks is built, do not. We must eliminate this
left recursion using the following transformation:

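If we have left-recursive productions of the form

$$X \to X\alpha \mid \beta$$

where $\beta$ does not begin with $X$, the left recursion can be eliminated by replacing
them with the following productions, where $Y$ is a new non-terminal:

$$X \to \beta Y \qquad\qquad Y \to \alpha Y \mid \varepsilon$$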
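The transformation is mechanical, so it can be written as a short function. In this Python sketch, the name `eliminate_immediate_left_recursion` and the grammar encoding are assumptions for illustration: each alternative is a tuple of symbols, and the empty tuple stands for ε. It handles only immediate left recursion (X → Xα) and assumes at least one β alternative exists; indirect left recursion through other non-terminals needs a more general algorithm.

```python
def eliminate_immediate_left_recursion(x, alternatives, y):
    """Rewrite  X -> X a1 | ... | X an | b1 | ... | bm   as
                X -> b1 Y | ... | bm Y
                Y -> a1 Y | ... | an Y | epsilon
    Each alternative is a tuple of symbols; () stands for epsilon.
    """
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == x]
    betas = [alt for alt in alternatives if not alt or alt[0] != x]
    if not alphas:
        # No left recursion: keep the productions unchanged.
        return {x: list(alternatives)}
    return {
        x: [beta + (y,) for beta in betas],
        y: [alpha + (y,) for alpha in alphas] + [()],
    }


# G5:  e -> e '+' INT | INT   (X = e, alpha = '+' INT, beta = INT, new Y = t)
print(eliminate_immediate_left_recursion("e", [("e", "'+'", "INT"), ("INT",)], "t"))
# {'e': [('INT', 't')], 't': [("'+'", 'INT', 't'), ()]}
```

The printed result corresponds to the rules e : INT t; and t : '+' INT t | ; of grammar G6.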

Applying the above transformation to e → e '+' INT | INT (where X is e, $\alpha$ is '+' INT,
$\beta$ is INT, and the new non-terminal Y is named t), we obtain a new grammar as follows:

grammar G6;

p : e SEMI;

e : INT t;

t : '+' INT t|;

INT :'0'..'9'+;

SEMI : ';';

WS : (' ')+ {$channel=HIDDEN;};

Note that in grammar G6, the associativity of '+' is still left. This is quite hard to observe
on the tree, but we will verify it in the upcoming Lab 3.
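One way to check this claim is to parse with G6's rules but let t carry the tree built so far as an argument, attaching each new operand on the left. This is only one possible technique, not necessarily the method of Lab 3. In this Python sketch (hypothetical names; input already tokenized; minimal error handling), the right-recursive rule t still produces the left-nested grouping (1+2)+3.

```python
def parse_e(tokens):
    """e : INT t ;  returns the tree for the whole expression."""
    tree, i = parse_t(tokens, 1, int(tokens[0]))
    if i != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return tree


def parse_t(tokens, i, acc):
    """t : '+' INT t | ;   acc is the tree for everything parsed so far."""
    if i < len(tokens) and tokens[i] == "+":
        # Fold the new operand in on the RIGHT of acc, so acc stays on the left.
        acc = ("+", acc, int(tokens[i + 1]))
        return parse_t(tokens, i + 2, acc)
    return acc, i  # empty alternative: nothing more to add


print(parse_e(["1", "+", "2", "+", "3"]))  # ('+', ('+', 1, 2), 3)
```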
3. CLASS EXERCISES

3.1.

a) Modify grammar G3 so that it can generate multiple expressions, each terminated by a
semicolon; the accepted operators are '+' and '-'.

For example:

3+4;4+5-6;23-12+47;

b) Modify the grammar written in 3.1a so that the operator '+' takes higher precedence
than '-'. The grammar should also ignore whitespace characters such as space, tab and
newline¹. In addition, the grammar should allow users to use parentheses in expressions.

3.2 Eliminate left recursion in the following grammars and test the transformed
grammars in ANTLRWorks.

a)

e → t '+' e | t

t → t '*' A | A

A → '0'..'9'+

b)²

e → e '+' t | e '-' t | t

t → t '*' A | A

A → '0'..'9'+

4. SUBMISSIONS

Students are required to submit the following materials in writing to the tutor in charge:

4.1 The grammars for Exercises 3.1 and 3.2.

4.2 The trees generated by ANTLRWorks for the following (transformed) grammars and
inputs:

Grammar    Input
3.2a       3+4*5*6
3.2b       3+4*5-6

¹ One can use '\t' and '\n' to represent the tab and newline characters in ANTLRWorks.
² For the Honor program (KSTN program) only.
