Theory of Programming Languages
LECTURE#5
Chapter # 2
Describing Syntax and Semantics
Introduction
The study of programming languages, like the study of natural languages,
can be divided into examinations of:
syntax
semantics
The syntax of a programming language is the form of its expressions,
statements, and program units.
Its semantics is the meaning of those expressions, statements, and
program units.
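For example, the syntax of a Java while statement is
while (boolean_expr) statement
The semantics of this statement form is that when the current value of the
Boolean expression is true, the embedded statement is executed; control then
returns to the Boolean expression to repeat the process. When the expression
is false, control continues after the while construct.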
Describing Syntax
Language
A language is a set of strings of characters from some alphabet.
Sentences
The strings of a language are called sentences or statements.
Syntax rules
The syntax rules of a language specify which strings of characters from the
language’s alphabet are in the language.
Lexemes
The lowest-level syntactic units of a language are called lexemes.
The lexemes of a programming language include its
numeric literals,
operators, and
special words
Identifiers
Lexemes are partitioned into groups. For example, the names of variables,
methods, and classes in a programming language form a group called identifiers.
Token
Each lexeme group is represented by a name, or token.
For example, an identifier is a token that can have lexemes, or
instances, such as sum and total.
In some cases, a token has only a single possible lexeme.
For example, the token for the arithmetic operator symbol + has just
one possible lexeme.
Consider the following Java statement:
index = 2 * count + 17;
Its lexemes and tokens are:
Lexemes    Tokens
index      identifier
=          equal_sign
2          int_literal
*          mult_op
count      identifier
+          plus_op
17         int_literal
;          semicolon
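The lexeme/token pairing above can be reproduced with a small tokenizer. The following is a minimal sketch, not a full Java lexer: the token names come from the table, while the regular expressions and the function name tokenize are assumptions made for illustration.

```python
import re

# Token names follow the table above; the patterns are assumptions chosen
# for this sketch and do not cover the full Java lexical rules.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # whitespace separates lexemes but is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (lexeme, token) pairs for the source string."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:<8}{token}")
```

Each pattern describes the form of one token, which is the role the lecture later assigns to regular grammars.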
Languages can be formally defined in two distinct ways:
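by recognition
a recognizer reads strings of characters from the language's alphabet and
indicates whether each string is in the language
by generation
a generator is a device that produces the sentences of the language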
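The two approaches can be contrasted on a toy language. The sketch below assumes a hypothetical language, not taken from the lecture, whose sentences are one or more a characters followed by a single b; the function names recognize and generate are likewise illustrative.

```python
from itertools import count, islice


def recognize(string):
    """Recognizer: accept or reject one candidate string."""
    return len(string) >= 2 and string.endswith("b") and set(string[:-1]) == {"a"}


def generate():
    """Generator: yield the sentences of the language, shortest first."""
    for n in count(1):
        yield "a" * n + "b"


first_three = list(islice(generate(), 3))
print(first_three)
print(recognize("aab"), recognize("aba"))
```

The recognizer answers a yes/no question about a given string, while the generator enumerates sentences without being given any input; both describe the same set of strings.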
Formal Methods of Describing Syntax
Backus-Naur Form and Context-Free Grammars
In the middle to late 1950s, two men, Noam Chomsky and John Backus, in
unrelated research efforts, developed the same syntax description
formalism, which subsequently became the most widely used method
for describing programming language syntax.
Grammars
Chomsky described four classes of grammars. Two of these classes,
context-free and regular, turned out to be useful for describing the
syntax of programming languages:
Regular grammars describe the forms of the tokens of programming
languages.
Context-free grammars describe the syntax of whole programming
languages, with minor exceptions.
BNF
BNF is a natural notation for describing syntax.
BNF is a metalanguage for programming languages.
A metalanguage is a language that is used to describe another language.
BNF uses abstractions for syntactic structures.
A simple Java assignment statement, for example, might be represented by
the abstraction <assign>, defined by the rule:
<assign> → <var> = <expression>
The text on the left side of the arrow, which is aptly called the
left-hand side (LHS), is the abstraction being defined. The text to the right
of the arrow is the definition of the LHS. It is called the right-hand
side (RHS) and consists of some mixture of tokens, lexemes, and
references to other abstractions.
An example sentence whose syntactic structure is described by the rule
is: total = subtotal1 + subtotal2
The abstractions in a BNF description, or grammar, are often called
nonterminal symbols, or simply nonterminals.
The lexemes and tokens of the rules are called terminal symbols, or
simply terminals.
A BNF description, or grammar, is a collection of rules.
A Grammar for a Small Language
<program> → begin <stmt_list> end
<stmt_list> → <stmt>
| <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var>
| <var> - <var>
| <var>
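The grammar above can be held in a plain data structure, which makes the distinction between nonterminals and terminals concrete. This is a sketch under assumptions: the dictionary layout and the name GRAMMAR are illustrative choices, and the minus operator is written as the ASCII character "-".

```python
# Each nonterminal maps to a list of alternative right-hand sides;
# each right-hand side is a list of symbols.
GRAMMAR = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", "=", "<expression>"]],
    "<var>": [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}

# Nonterminals are the symbols that have rules (the left-hand sides).
nonterminals = set(GRAMMAR)

# Terminals are the symbols that appear on a right-hand side but have no rule.
terminals = {
    symbol
    for alternatives in GRAMMAR.values()
    for rhs in alternatives
    for symbol in rhs
    if symbol not in GRAMMAR
}

print(sorted(nonterminals))
print(sorted(terminals))
```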
A derivation of a program:
<program> => begin <stmt_list> end
=> begin <stmt> ; <stmt_list> end
=> begin <var> = <expression> ; <stmt_list> end
=> begin A = <expression> ; <stmt_list> end
=> begin A = <var> + <var> ; <stmt_list> end
=> begin A = B + <var> ; <stmt_list> end
=> begin A = B + C ; <stmt_list> end
=> begin A = B + C ; <stmt> end
=> begin A = B + C ; <var> = <expression> end
=> begin A = B + C ; B = <expression> end
=> begin A = B + C ; B = <var> end
=> begin A = B + C ; B = C end
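The derivation above replaces the leftmost nonterminal at every step, so it can be replayed mechanically. The sketch below assumes a dictionary encoding of the grammar and a hypothetical list CHOICES that records which alternative of the rule is applied at each step; neither is part of the lecture.

```python
# The small-language grammar: nonterminal -> list of alternative right-hand sides.
GRAMMAR = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", "=", "<expression>"]],
    "<var>": [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}


def leftmost_derivation(grammar, start, choices):
    """Return the sentential forms obtained by applying the chosen alternatives."""
    form = [start]
    steps = []
    for choice in choices:
        # Locate the leftmost nonterminal in the current sentential form.
        index = next(i for i, symbol in enumerate(form) if symbol in grammar)
        # Replace it with the selected right-hand side.
        form = form[:index] + grammar[form[index]][choice] + form[index + 1:]
        steps.append(" ".join(form))
    return steps


# Index of the alternative used at each of the twelve steps shown above.
CHOICES = [0, 1, 0, 0, 0, 1, 2, 0, 0, 1, 2, 2]
steps = leftmost_derivation(GRAMMAR, "<program>", CHOICES)
print("<program>")
for step in steps:
    print("  =>", step)
```

A different list of choices yields a different sentence of the same language, which is what it means for a grammar to act as a generator.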
A Grammar for Simple Assignment Statements
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <id> + <expr>
| <id> * <expr>
| ( <expr> )
| <id>
For example, the statement
A = B * ( A + C )
is generated by the leftmost derivation (one in which the leftmost
nonterminal is replaced at each step):
<assign> => <id> = <expr>
=> A = <expr>
=> A = <id> * <expr>
=> A = B * <expr>
=> A = B * ( <expr> )
=> A = B * ( <id> + <expr> )
=> A = B * ( A + <expr> )
=> A = B * ( A + <id> )
=> A = B * ( A + C )
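The same grammar can also be used in the other direction, to recognize whether a string is a valid assignment statement. The following is a minimal recursive-descent sketch, one possible technique and not something prescribed by the lecture: each nonterminal becomes a method, and the three <expr> alternatives that begin with <id> are merged by reading the <id> first and then checking for + or *. The class and function names are hypothetical.

```python
import re


def tokenize(text):
    """Split the input into single-character lexemes, ignoring whitespace."""
    return re.findall(r"\S", text)


class Recognizer:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, lexeme):
        if self.peek() != lexeme:
            raise SyntaxError(f"expected {lexeme!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_id(self):
        # <id> -> A | B | C
        if self.peek() not in ("A", "B", "C"):
            raise SyntaxError(f"expected an identifier, found {self.peek()!r}")
        self.pos += 1

    def parse_expr(self):
        # <expr> -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
        if self.peek() == "(":
            self.expect("(")
            self.parse_expr()
            self.expect(")")
        else:
            self.parse_id()
            if self.peek() in ("+", "*"):
                self.pos += 1
                self.parse_expr()

    def parse_assign(self):
        # <assign> -> <id> = <expr>
        self.parse_id()
        self.expect("=")
        self.parse_expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing lexeme {self.peek()!r}")


def is_sentence(text):
    """Return True if text can be derived from <assign>."""
    try:
        Recognizer(tokenize(text)).parse_assign()
        return True
    except SyntaxError:
        return False


print(is_sentence("A = B * ( A + C )"))
print(is_sentence("A = B +"))
```

Note that a string such as A = ( A ) * B is rejected, because the grammar as written allows nothing to follow a parenthesized expression.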