16.2 Translation Software Notes
Objectives:
Show understanding of how an interpreter can execute programs without producing a
translated version.
Show understanding of the various stages in the compilation of a program, including lexical
analysis, syntax analysis, code generation and optimisation.
Show understanding of how the grammar of a language can be expressed using syntax diagrams
or Backus-Naur Form (BNF) notation.
Show understanding of how Reverse Polish Notation (RPN) can be used to carry out the
evaluation of expressions.
How an Interpreter Differs from a Compiler
An interpreter executes the program it is interpreting. A compiler does not execute the program
it is compiling.
With a compiler, source code is input and either an object code program or error messages are
output. The object code produced can then be executed without needing recompilation.
With an interpreter, source code is input; there may also be other inputs that the program
requires, or corrections to errors in the source program. No object code is output, but error messages
from the interpreter are output, as well as any outputs produced by the program being interpreted.
As no object code is produced by the interpretation process, the interpreter needs to be
used every time the program is executed.
The interpreter checks each statement individually and reports any errors, which can be
corrected before the statement is executed. After each statement is executed, control is
returned to the interpreter so that the next statement can be checked before execution.
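The statement-by-statement behaviour can be illustrated with a minimal Python sketch. The tiny language it accepts (assignments such as x = 3 or y = x + 4, and OUTPUT followed by a variable name) and the function name interpret are assumptions made purely for illustration. Note that nothing resembling object code is ever produced: each statement is checked and then executed immediately.

```python
def interpret(source_lines):
    """Check and execute each statement in turn; no object code is produced."""
    variables = {}   # run-time values of the program's variables
    outputs = []     # everything the interpreted program has output so far
    for number, line in enumerate(source_lines, start=1):
        words = line.split()
        if len(words) == 2 and words[0] == "OUTPUT":
            if words[1] not in variables:
                return outputs, f"Line {number}: unknown variable {words[1]}"
            outputs.append(variables[words[1]])
        elif len(words) >= 3 and words[1] == "=":
            # Right-hand side: operands separated by '+', e.g. x + 4
            operands = words[2::2]
            operators = words[3::2]
            if any(op != "+" for op in operators) or len(operands) != len(operators) + 1:
                return outputs, f"Line {number}: syntax error"
            total = 0
            for operand in operands:
                if operand.isdigit():
                    total += int(operand)
                elif operand in variables:
                    total += variables[operand]
                else:
                    return outputs, f"Line {number}: unknown variable {operand}"
            variables[words[0]] = total
        else:
            return outputs, f"Line {number}: syntax error"
    return outputs, None


outputs, error = interpret(["x = 3", "OUTPUT x", "y = x +", "OUTPUT x"])
print(outputs, error)
```

In this run the first two statements are executed before the error in the third statement is reported, which is the behaviour that distinguishes an interpreter from a compiler.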
O-A Level Computer Science By Engr M Kashif 03345606716 paperscambridge.com
CS Made Easy
Stages in Compilation of a Program
A compiler has a 'front end' that performs analysis and a 'back end' that performs synthesis.
֎ The front-end program analyses the source code and produces intermediate
code that completely expresses the semantics (the meaning) of the source code.
֎ The back-end program then takes this intermediate code as input and synthesises
the object code.
The process of translating a source program written in a high-level language into an object program in
machine code can be divided into four stages:
lexical analysis, syntax analysis, code generation and optimisation.
Lexical Analysis
Lexical analysis is the first stage in the process of compilation. It carries out the following steps:
֎ All characters not required by the compiler, such as white space and
comments, are removed before the program is tokenised.
Source Program:
    // My addition program Version 2
    DECLARE x, y, z : INTEGER
    OUTPUT "Please enter two numbers to add"
    INPUT x, y // Taking Input from user
    z=x+y
    OUTPUT "Answer is ", z
After Removal of Unnecessary Characters:
    DECLARE x, y, z : INTEGER
    OUTPUT "Please enter two numbers to add"
    INPUT x, y
    z=x+y
    OUTPUT "Answer is ", z
֎ The lexical analyser uses the source program as input and creates a stream of tokens for the syntax
analyser. This process is called tokenisation.
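The removal step can be sketched in Python as follows. This is an illustrative sketch only: the function names remove_comment and strip_unnecessary are hypothetical, comments are assumed to start with // as in the example program, and only leading and trailing white space is removed here (white space between tokens disappears when the line is tokenised).

```python
def remove_comment(line):
    """Return the line without a trailing // comment (a // inside a string is kept)."""
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string        # entering or leaving a string constant
        elif not in_string and line.startswith("//", i):
            return line[:i]                  # everything from // onwards is a comment
    return line


def strip_unnecessary(source_lines):
    """Remove comments, surrounding white space and lines that end up empty."""
    cleaned = []
    for line in source_lines:
        code = remove_comment(line).strip()
        if code:
            cleaned.append(code)
    return cleaned


program = [
    "// My addition program Version 2",
    "DECLARE x, y, z : INTEGER",
    "INPUT x, y // Taking Input from user",
]
print(strip_unnecessary(program))
```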
To tokenise the program, the compiler uses a keyword table in which all reserved words and
symbols used in the programming language are stored with their tokens.
Every program being compiled uses the same keyword table. Keywords (reserved words) are
checked for validity against the keyword table.
In the example below, all tokens are represented by hexadecimal numbers.
Keyword Table
Symbol / Keyword    Token
=                   01
+                   02
:                   03
,                   04
---------           -----------
DECLARE             31
INTEGER             32
INPUT               33
OUTPUT              34
֎ Symbol Table: During lexical analysis, the variables, constants and other identifiers
used in a program are added to a symbol table, which is produced during compilation
specifically for that program.
✓ A symbol table is built for every program during compilation. It contains all
identifiers (variables and constants) found in the source code.
✓ It stores the type of each variable / identifier and its value (if any).
✓ Every identifier is assigned a token.
✓ At this stage only variable names are noted in the symbol table. Other details, such as data
type and scope, are entered in the next stage, i.e. syntax analysis.
✓ The symbol table is a data structure that is used in the later stages of compilation.
✓ Obvious errors in the use of identifier names are checked, e.g. the length of a variable name or
a variable that has not been declared (if applicable).
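For example, part of the symbol table for the program above could be as follows (the tokens match
those used in the tokenised lists later in these notes).
Identifier / Constant                  Token
x                                      81
y                                      82
z                                      83
"Please enter two numbers to add"      84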
Finally, keywords / reserved words and identifiers in the source code are replaced by tokens.
The final output of this stage is a tokenised version of the program code.
For example, the declaration statement
Var Count : integer ;
would be recognised as containing five tokens:
Var   Count   :   integer   ;
The assignment statement
PercentMark [ Count ] := Score * 10
would be recognised as containing eight tokens:
PercentMark   [   Count   ]   :=   Score   *   10
The output from lexical analysis is a tokenised list stored in main memory. Here is the output for the first
four lines of the program (after removal of the comment line):
31 81 04 82 04 83 03 32 34 84 33 81 04 82 83 01 81 02 82
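Tokenisation with a keyword table and a symbol table can be sketched in Python as below. This is a simplified illustration, not a full lexical analyser: the function name tokenise, the regular expression and the assumption that symbol-table tokens are allocated from hexadecimal 81 upwards in order of first appearance are all choices made for this sketch. The keyword table values are the ones given in these notes.

```python
import re

KEYWORD_TABLE = {"=": 0x01, "+": 0x02, ":": 0x03, ",": 0x04,
                 "DECLARE": 0x31, "INTEGER": 0x32, "INPUT": 0x33, "OUTPUT": 0x34}

# A lexeme is a string constant, an identifier/keyword, or a single symbol.
LEXEME = re.compile(r'"[^"]*"|[A-Za-z_]\w*|[=+:,]')


def tokenise(lines):
    """Return (list of tokens, symbol table) for the given source lines."""
    symbol_table = {}      # identifier or constant -> token
    next_token = 0x81      # assumed first free token for symbol-table entries
    tokens = []
    for line in lines:
        for lexeme in LEXEME.findall(line):
            if lexeme in KEYWORD_TABLE:
                tokens.append(KEYWORD_TABLE[lexeme])
            else:
                if lexeme not in symbol_table:
                    symbol_table[lexeme] = next_token
                    next_token += 1
                tokens.append(symbol_table[lexeme])
    return tokens, symbol_table


program = [
    'DECLARE x, y, z : INTEGER',
    'OUTPUT "Please enter two numbers to add"',
    'INPUT x, y',
    'z=x+y',
]
tokens, symbols = tokenise(program)
print(" ".join(f"{t:02X}" for t in tokens))
```

For these four lines the sketch prints 31 81 04 82 04 83 03 32 34 84 33 81 04 82 83 01 81 02 82. The 32 after 03 is the token for INTEGER at the end of the DECLARE statement.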
Syntax Analysis
Syntax analysis (also known as parsing) is the next stage in the process of
compilation. In this stage, the output from lexical analysis is checked for grammatical (syntax)
errors.
For example, the source program statement z + x + y would produce this tokenised list:
83 02 81 02 82
   ↑
   error: = (01) expected
➢ The complete tokenised list is checked for errors using the grammatical rules of the
programming language. This is called parsing. The whole program goes through this
process even if errors are found. Tree data structures are used to check the grammar rules.
➢ The rules for parsing can be set out in Backus-Naur Form (BNF) notation. If any errors are
found, each statement and its associated error are output, but code generation is not
attempted. The compilation process finishes after this stage.
➢ If the tokenised code is error free, it is passed to the next stage of compilation,
generating object code.
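A syntax check on a tokenised assignment statement can be sketched in Python as follows. The grammar assumed here (an identifier, then =, then identifiers separated by +) and the names check_assignment and is_identifier are illustrative assumptions; a real parser would cover the whole language and would normally build a parse tree.

```python
EQUALS, PLUS = 0x01, 0x02   # tokens taken from the keyword table


def is_identifier(token):
    # Assumption: symbol-table tokens start at hexadecimal 81.
    return token >= 0x81


def check_assignment(tokens):
    """Return None if the tokens form a valid assignment, else an error message."""
    if len(tokens) < 1 or not is_identifier(tokens[0]):
        return "position 0: identifier expected"
    if len(tokens) < 2 or tokens[1] != EQUALS:
        return "position 1: = (01) expected"
    rest = tokens[2:]
    if not rest:
        return "position 2: identifier expected"
    # The right-hand side must alternate identifier, '+', identifier, ...
    for offset, token in enumerate(rest):
        position = offset + 2
        if offset % 2 == 0 and not is_identifier(token):
            return f"position {position}: identifier expected"
        if offset % 2 == 1 and token != PLUS:
            return f"position {position}: + (02) expected"
    if len(rest) % 2 == 0:
        return f"position {len(tokens)}: identifier expected"
    return None


print(check_assignment([0x83, 0x01, 0x81, 0x02, 0x82]))   # z = x + y
print(check_assignment([0x83, 0x02, 0x81, 0x02, 0x82]))   # z + x + y
```

The first call reports no error; the second reports that = (01) was expected at the second token, matching the example above.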
Code Generation
During the code generation stage, the compiler transforms the tokenised form into
code that can be understood by the computer's processor. During this process, information in the
symbol table is used, and code from program libraries is included in the generated code.
The code generation stage produces an object program to perform the task defined in the source code.
The program must be syntactically correct for an object program to be produced.
The object program is in machine-readable form (binary). It is no longer in a form that is
designed to be read by humans.
Optimisation
Code produced by the code generation process may not be efficient.
The optimisation stage supports the creation of an efficient object program that executes faster.
Optimised code has fewer instructions, occupies less space in memory and takes less time
to execute. Optimised programs should perform the task using the minimum
amount of resources. These include time, storage space, memory and CPU use.
Some optimisation can take place after syntax analysis or as part of code generation.
Benefits of the Optimisation Stage:
• Redundant code is removed, so the code is shorter.
• The program requires less memory.
• Code is reorganised to make it more efficient.
• The program completes the task in a shorter time, i.e. it executes faster.
Meta Languages
Meta languages describe the syntax of programming languages and specify how the different
elements of a language can be combined to form valid expressions.
The grammatical rules of a programming language can be shown graphically in a syntax diagram or
written in a meta language such as Backus-Naur Form (BNF) notation.
Syntax Diagrams
Syntax diagrams are graphical notations that use shapes and symbols to represent the different
elements of a language and the ways in which they can be combined.
A syntax diagram describes a language's syntax in graphical form.
Note: A circle symbol with text inside is a terminal symbol. There are no
lower-level syntax diagrams defining it any further.
A rectangle symbol with text inside is an item that is defined in
greater detail in another syntax diagram.
An arrow symbol indicates the direction in which the diagram is read.
Example: Each element in the language has a diagram showing how it is built.
For example, a variable consisting of a letter followed by a digit would be shown as:
Note: Letter and Digit are in rectangles, so they need further definition:
✓ Now define Letter using a syntax diagram:
✓ Now define Digit using a syntax diagram:
Defining an unsigned integer using a syntax diagram
According to the above syntax diagram, an integer is made up of one or more digits.
Defining a digit.
Draw Syntax Diagram for Signed Integer.
Exam Style Questions 9618/31/M/J/23
Several syntax diagrams are shown.
(a) State whether each of the following passwords is valid or invalid and give a reason for your choice.
➢ DPAD99$ .......................... Ans: Valid
Reason: Multiple (four) letters followed by multiple (two) digits followed by a symbol.
➢ DAD#95 ............................... Ans: Invalid
Reason: The symbol comes before the digits; it should come after them.
➢ ADY123? .............................. Ans: Invalid
Reason: The ? is not a valid symbol.
Example of Assignment statement using Syntax Diagram:
Syntax Diagram for operator could be shown as:
Syntax diagrams can allow for repetitions as well as alternatives. For example, a variable
that can be a letter followed by another letter or any number of digits can be shown as:
Backus-Naur Form (BNF)
BNF uses a set of symbols to describe the grammar rules of a programming language. BNF textual
notation includes:
Symbol             Explanation
::=                Separates an item from its definition; read as “is defined as”.
< >                Enclose an item. Any characters written within these symbols
                   make up a non-terminal language element.
| (vertical bar)   Read as “OR”; between items it indicates a choice.
;                  Marks the end of a rule.
Any set of characters not enclosed in these symbols is terminal, e.g. A, B, C, D, 1, 2, 3, 4.
❖ Example: Define a (single-digit) integer using BNF
<integer> ::= <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
❖ Example: Defining a multi-digit integer, e.g. 125
BNF form: <integer> ::= <digit><integer>
Explanation: 125 is the digit 1 followed by the integer 25. In turn, 25 is
the digit 2 followed by the single-digit integer 5.
Note: All integers of more than one digit start with a single digit followed by an integer.
We can therefore define an integer with many digits:
<integer> ::= <digit><integer>
This is a recursive definition, as integer is defined in terms of itself. Recursion is used to
show repetition of characters and optional characters.
To stop this process, we use the fact that, eventually, the remaining integer is a single digit. So we can say
<integer> ::= <digit> | <digit><integer>
This means an integer is a digit, or a digit followed by an integer.
❖ Write the BNF form of an unsigned integer
<unsigned integer> ::= <digit> | <digit><unsigned integer>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
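The recursive BNF definition translates almost directly into a recursive function. The Python sketch below is one possible illustration (the function names is_digit and is_unsigned_integer are hypothetical); each function mirrors one BNF rule, and the single-digit alternative is the base case that stops the recursion.

```python
DIGITS = "0123456789"


def is_digit(text):
    # <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    return len(text) == 1 and text in DIGITS


def is_unsigned_integer(text):
    # <unsigned integer> ::= <digit> | <digit><unsigned integer>
    if is_digit(text):
        return True                      # first alternative: a single digit
    # second alternative: a digit followed by an unsigned integer
    return len(text) > 1 and is_digit(text[0]) and is_unsigned_integer(text[1:])


for candidate in ["125", "7", "12a", ""]:
    print(repr(candidate), is_unsigned_integer(candidate))
```

For "125" the calls are made on "125", then "25", then "5", which matches the explanation that 125 is the digit 1 followed by the integer 25.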
❖ Write the BNF form of a signed integer.
Example: A variable consisting of a letter (A, B or C) followed by a digit (1, 2 or 3) would be shown as:
Example: Suppose a variable is a sequence of one or more characters starting with a letter. Each
character can be any letter, digit or underscore.
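One way to express the last example is with a left-recursive rule. The BNF in the comments below and the Python recogniser that follows it are an illustrative sketch rather than a model answer; the rule names, the function names and the choice of the letters A-Z and a-z are assumptions.

```python
import string

LETTERS = string.ascii_letters   # assumed: A-Z and a-z count as letters
DIGITS = string.digits

# Assumed BNF for this example:
#   <variable>  ::= <letter> | <variable><character>
#   <character> ::= <letter> | <digit> | _


def is_character(ch):
    # <character> ::= <letter> | <digit> | _
    return len(ch) == 1 and (ch in LETTERS or ch in DIGITS or ch == "_")


def is_variable(text):
    # <variable> ::= <letter> | <variable><character>
    if len(text) == 1:
        return text in LETTERS           # first alternative: a single letter
    # second alternative: a shorter variable followed by one more character
    return len(text) > 1 and is_variable(text[:-1]) and is_character(text[-1])


for name in ["x", "total_2", "2nd", "_x"]:
    print(name, is_variable(name))
```

Because the recursion always ends at the first character, a name is accepted only if that character is a letter; "2nd" and "_x" are therefore rejected.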
Reverse Polish Notation (RPN)
Reverse Polish notation (RPN) is a method of representing an arithmetical or logical
expression without the use of brackets or special punctuation. RPN is also known as postfix
notation, because each operator is placed after the operands it acts on.
RPN never requires brackets and has no rules of precedence.
For example, A + B would be written as A B +.
If there are multiple operations, each operator is given immediately after its second operand, so
the expression written “(3 – 4) + 5” in conventional infix notation would be written as “3 4 – 5 +” in
RPN (postfix).
Compilers use RPN because any expression can be processed from left to right without
any backtracking.
Converting an expression to RPN:
Consider a simple expression: a + b * c
Final RPN form: a b c * +
Note: If the original expression had been (a + b) * c (where the brackets are essential), then conversion
to RPN would have given: a b + c *
Rules for Converting Infix Notation into Postfix (RPN) Using a Stack
➢ Read the infix expression from left to right.
➢ Two operators with the same priority cannot sit next to each other on the stack: pop the first one, write it in
the postfix / RPN column and then push the new operator with the same priority.
➢ An operator with lower priority cannot be pushed onto the stack on top of an operator with higher priority.
In this case the higher-priority operator is popped from the stack and written in the postfix / RPN
expression before the lower-priority operator is pushed onto the stack.
➢ When a closing parenthesis “)” is read, the operators stacked above the matching “(” are popped
immediately and written in the postfix / RPN column.
➢ If an opening parenthesis “(” is on top of the stack, an operator can be pushed onto it irrespective of the
priority of the operators below it.
➢ If a value is itself negative (e.g. –A / B), put it in parentheses (i.e. (–A) / B) in the given
infix expression.
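The rules above can be sketched as a short Python function. This is a minimal illustration under stated assumptions: the function name infix_to_rpn and the PRIORITY table are hypothetical, the input is a list of already separated tokens with balanced brackets, only the binary operators + - * / are handled, and negative values are not supported.

```python
PRIORITY = {"+": 1, "-": 1, "*": 2, "/": 2}


def infix_to_rpn(tokens):
    """Convert a list of infix tokens into a list of RPN (postfix) tokens."""
    output = []   # the postfix / RPN column
    stack = []    # operator stack
    for token in tokens:
        if token in PRIORITY:
            # Pop operators of the same or higher priority before pushing this one.
            while stack and stack[-1] != "(" and PRIORITY[stack[-1]] >= PRIORITY[token]:
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            # Pop everything stacked above the matching opening bracket.
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()                  # discard the "(" itself
        else:
            output.append(token)         # operands go straight to the output
    while stack:
        output.append(stack.pop())       # flush the remaining operators
    return output


print(" ".join(infix_to_rpn("a + b * c".split())))
print(" ".join(infix_to_rpn("( a + b ) * c".split())))
```

The two calls print a b c * + and a b + c *, the same results as the worked conversions in these notes.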
Evaluating an RPN Expression
To evaluate an expression using a stack:
Values are pushed onto the stack in turn, going from left to right.
When an operator is encountered, it is not pushed onto the stack but is used to operate on the top
two values of the stack, which are popped off and operated on; the result is then
pushed back onto the stack.
This is repeated until there is a single value on the stack and the end of the expression has been
reached.
Example: If A = 2, B = 3 and C = 4, then A B C * - is evaluated using a stack as follows:
2, 3 and 4 are pushed; * pops 4 and 3 and pushes 12; - pops 12 and 2 and pushes 2 - 12 = -10.
An expression using brackets, ( A + B ) * ( C – D ), becomes A B + C D - * in RPN, as brackets
have the highest precedence.
If A = 2, B = 3, C = 4 and D = 5, this evaluates to 5 * (-1) = -5.
Example: Evaluating RPN using a stack: 8 5 2 - * 30 2 3 * / - evaluates to 24 - 5 = 19.
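The stack-based evaluation can be sketched in Python as follows. The function name evaluate_rpn and its optional values argument (a dictionary giving the value of each variable) are assumptions made for this illustration; tokens are assumed to be separated by spaces.

```python
def evaluate_rpn(expression, values=None):
    """Evaluate a space-separated RPN expression using a stack."""
    values = values or {}
    stack = []
    for token in expression.split():
        if token in ("+", "-", "*", "/"):
            right = stack.pop()          # the top value is the second operand
            left = stack.pop()
            if token == "+":
                result = left + right
            elif token == "-":
                result = left - right
            elif token == "*":
                result = left * right
            else:
                result = left / right
            stack.append(result)         # push the result back onto the stack
        elif token in values:
            stack.append(values[token])  # a variable: push its value
        else:
            stack.append(float(token))   # a numeric constant
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


print(evaluate_rpn("A B C * -", {"A": 2, "B": 3, "C": 4}))
print(evaluate_rpn("A B + C D - *", {"A": 2, "B": 3, "C": 4, "D": 5}))
print(evaluate_rpn("8 5 2 - * 30 2 3 * / -"))
```

The three calls print -10, -5 and 19.0. Note that the operand popped first is the right-hand operand, which matters for - and /.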
Purpose of Using RPN:
✓ RPN provides a more efficient and unambiguous way of writing mathematical
expressions, particularly for use in computer programs and calculators.
✓ In infix notation, parentheses are used to indicate the order of operations, which can be
complex and difficult to parse. In RPN (postfix), the order of operations is determined by the
order in which the operators appear, making it easy to evaluate expressions using a
stack-based algorithm.
Advantages of RPN:
✓ Simplicity, ease of implementation and lack of ambiguity.
✓ No need for rules of precedence (BODMAS), because the order of evaluation is
determined by the order of the operands and operators.
✓ No need for brackets (parentheses) to indicate the order of operations, because the order
is determined by the order of the operands and operators.
✓ RPN can be evaluated efficiently using a stack-based algorithm that requires
minimal memory.
BODMAS is an acronym used to remember the order of operations in mathematics. It stands for:
➢ Brackets
➢ Orders (i.e. powers, square roots, etc.)
➢ Division and Multiplication (from left to right)
➢ Addition and Subtraction (from left to right)
The BODMAS rule ensures that mathematical expressions are evaluated in the correct order.
****************