Compiler L 400
What is a Translator
What is a Compiler
Compilers are computer programs that translate one language into another.
A compiler takes a program in the source language and produces an equivalent
program written in the target language.
A compiler is a fairly complex program, typically 10,000 to 1,000,000 lines of code.
Writing or understanding one is not simple.
A translator is a computer program that translates instructions written in one
programming language into instructions in another programming language
without loss of the original meaning.
Put another way, a translator takes a program Q and produces Q’,
where Q is the meaning and the ’ (dash) marks the change of language.
Some advanced translators will even change the logic (not the meaning), or
simplify it, without losing the essence.
Types
1. If the translator translates a high-level language into assembly or
machine language, it is called a compiler. Languages usually compiled include
Ada, ALGOL, BASIC, COBOL, FORTRAN, PL/I and C/C++.
2. If the translator translates a high-level language into an intermediate
code that is executed immediately, it is called an interpreter.
Languages usually interpreted include APL, ASP, CYBOL, LISP, Smalltalk, PHP and Perl.
3. If the compiled program can run on a computer whose CPU or
operating system is different from the one on which the compiler runs,
the compiler is known as a cross-compiler.
Why Compilers?
Initially, programs were written in machine language: numeric codes that
represented the actual machine operations.
C7 06 0000 0002
Move the number 2 to location 0000 (in hexadecimal, on Intel 8x86 processors)
It looks easy, right?
PROGRAMS RELATED TO COMPILERS
[Figure: the phases of a compiler. Labels recoverable from the diagram: Tokens, Parser, Syntax Tree, Semantic Analyzer, Annotated Tree, Source Code Optimizer, Intermediate Code, Code Generator, Target Code; supporting components: Literal Table, Symbol Table, Error Handler.]
The Scanner – does the actual reading of the source code,
which is usually in the form of a stream of characters. It performs
lexical analysis: it collects sequences of characters into
meaningful units called tokens, which are like the words of a
natural language such as English, ie it performs a function similar
to spelling.
eg. in a C program: a [index] = 4 + 6
This code contains 12 non-blank characters, but only 8 tokens:
a identifier
[ left bracket
index identifier
] right bracket
= assignment
4 number
+ plus sign
6 number
Each token consists of one or more characters that are
collected into a unit before further processing takes place.
The scanner may also enter identifiers into the symbol table and
literals into the literal table.
Literals include numeric constants such as 3.141 and quoted
strings of text such as “Hello, World!”
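The scanning step described above can be sketched in Python. This is a minimal illustration, not how any particular compiler does it: the category names follow the token list above, while the regular expressions, the `scan` function and its error handling are assumptions made for the example.

```python
import re

# Token patterns, tried in order. The category names follow the token list;
# the patterns themselves are assumptions made for this sketch.
TOKEN_SPEC = [
    ("identifier",    r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number",        r"\d+"),
    ("left bracket",  r"\["),
    ("right bracket", r"\]"),
    ("assignment",    r"="),
    ("plus sign",     r"\+"),
    ("blank",         r"\s+"),
]
# One combined pattern; group g0, g1, ... identifies which entry matched.
MASTER = re.compile(
    "|".join(f"(?P<g{i}>{pat})" for i, (_, pat) in enumerate(TOKEN_SPEC))
)


def scan(source):
    """Return a list of (lexeme, category) pairs, skipping blanks."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at position {pos}"
            )
        category = TOKEN_SPEC[int(m.lastgroup[1:])][0]
        if category != "blank":
            tokens.append((m.group(), category))
        pos = m.end()
    return tokens


tokens = scan("a [index] = 4 + 6")
for lexeme, category in tokens:
    print(lexeme, category)
```

The 12 non-blank characters are grouped into 8 (lexeme, category) pairs; a fuller scanner would also record identifiers and literals in their tables at this point.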
The Parser – receives the source code in the form of tokens from the
scanner and performs syntax analysis, which determines the
structure of the program.
It is similar to performing grammatical analysis on a sentence
in a natural language. Syntax analysis determines the
structural elements of the program as well as their
relationships. The results of syntax analysis are usually
represented as a parse tree or syntax tree.
eg. in a C program, the parse tree (children indented under their parent):
expression
  assigned-expression
    expression
      subscript-expression
        expression
        [
        expression
        ]
    =
    expression
      additive-expression
        expression
        +
        expression
The corresponding syntax tree:
assigned-expression
  subscript-expression
  additive-expression
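As a minimal sketch of how a parser could build a syntax tree for the statement a [index] = 4 + 6, the recursive-descent parser below turns a token list into nested tuples. The tiny grammar, the tuple encoding of tree nodes and the `Parser` class are assumptions made for this example; real parsers handle a far larger grammar.

```python
# Tokens as a scanner might deliver them: (lexeme, category) pairs.
TOKENS = [("a", "identifier"), ("[", "["), ("index", "identifier"), ("]", "]"),
          ("=", "="), ("4", "number"), ("+", "+"), ("6", "number")]


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Category of the next token, or None at end of input.
        return self.tokens[self.pos][1] if self.pos < len(self.tokens) else None

    def expect(self, category):
        # Consume the next token if it has the given category.
        if self.peek() != category:
            raise SyntaxError(f"expected {category!r}, found {self.peek()!r}")
        lexeme = self.tokens[self.pos][0]
        self.pos += 1
        return lexeme

    def assignment(self):
        # assignment -> primary '=' additive
        target = self.primary()
        if target[0] == "number":
            raise SyntaxError("cannot assign to a number")
        self.expect("=")
        value = self.additive()
        if self.peek() is not None:
            raise SyntaxError("unexpected input after statement")
        return ("assign", target, value)

    def additive(self):
        # additive -> primary { '+' primary }
        node = self.primary()
        while self.peek() == "+":
            self.expect("+")
            node = ("add", node, self.primary())
        return node

    def primary(self):
        # primary -> number | identifier [ '[' additive ']' ]
        if self.peek() == "number":
            return ("number", int(self.expect("number")))
        node = ("identifier", self.expect("identifier"))
        if self.peek() == "[":
            self.expect("[")
            index = self.additive()
            self.expect("]")
            node = ("subscript", node, index)
        return node


tree = Parser(TOKENS).assignment()
print(tree)
```

The result has an assignment node at the root with a subscript node and an addition node as its children, which mirrors the shape of a syntax tree rather than the more detailed parse tree.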
Semantic Analyser
The semantics of a program are its meaning, as opposed to its syntax.
The semantics of a program determine its runtime behaviour.
Most programming languages have features that can be
determined prior to execution and yet cannot be conveniently
expressed as syntax and analysed by the parser. These are the static semantics,
and analysing them is the work of the semantic analyser.
Dynamic semantics of a program – properties of a program that
can only be determined by executing it; they cannot be determined by
a compiler, since it does not execute the program.
Typical static semantic features of common programming
languages include declarations and type checking.
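Type checking, one of the static semantic features just mentioned, can be sketched as a walk over a syntax tree that consults a symbol table. Everything in the block is an assumption made for illustration: the nested-tuple tree encoding, the `SYMBOLS` table (a declared as an array of integer, index as an integer) and the `type_of` function.

```python
# Hypothetical symbol table, as earlier declarations might have filled it in.
SYMBOLS = {"a": "array of integer", "index": "integer"}


def type_of(node, symbols):
    """Return the type of a tree node, raising TypeError on a violation."""
    kind = node[0]
    if kind == "number":
        return "integer"
    if kind == "identifier":
        name = node[1]
        if name not in symbols:
            raise TypeError(f"undeclared identifier {name!r}")
        return symbols[name]
    if kind == "subscript":
        if type_of(node[1], symbols) != "array of integer":
            raise TypeError("subscripted value is not an array")
        if type_of(node[2], symbols) != "integer":
            raise TypeError("array index is not an integer")
        return "integer"
    if kind == "add":
        left = type_of(node[1], symbols)
        right = type_of(node[2], symbols)
        if left != "integer" or right != "integer":
            raise TypeError("operands of + must be integers")
        return "integer"
    if kind == "assign":
        left = type_of(node[1], symbols)
        right = type_of(node[2], symbols)
        if left != right:
            raise TypeError(f"cannot assign {right} to {left}")
        return left
    raise ValueError(f"unknown node kind {kind!r}")


# Syntax tree for: a[index] = 4 + 6
TREE = ("assign",
        ("subscript", ("identifier", "a"), ("identifier", "index")),
        ("add", ("number", 4), ("number", 6)))

result = type_of(TREE, SYMBOLS)
print(result)
```

All of these checks happen without running the program, which is what makes them static; a value such as the actual contents of index is only known at run time.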
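Annotated syntax tree (each node labelled with its type):
assigned-expression
  subscript-expression : integer
  additive-expression : integer
Syntax tree with the additive-expression replaced by the number 10:
assigned-expression
  subscript-expression : integer
    identifier a : array of integer
    identifier index : integer
  number 10 : integer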
eg. intermediate code for a [index] = 4 + 6:
t = 4 + 6
a [index] = t
The variable t stores the intermediate result. An optimiser would
improve this code in 2 steps:
1. t = 10
   a [index] = t
2. a [index] = 10
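The two improvement steps can be sketched as two small passes over the intermediate code. This is an illustrative toy under stated assumptions: the list-of-pairs encoding, the function names, and the rule that any plain-identifier target is a temporary that is no longer needed once its constant value has been substituted are all invented for the example.

```python
# Each instruction is (target, right-hand side as a list of tokens).
CODE = [("t", ["4", "+", "6"]),
        ("a[index]", ["t"])]


def fold_constants(code):
    """Step 1: replace 'constant + constant' by its value."""
    folded = []
    for target, rhs in code:
        if (len(rhs) == 3 and rhs[1] == "+"
                and rhs[0].isdigit() and rhs[2].isdigit()):
            rhs = [str(int(rhs[0]) + int(rhs[2]))]
        folded.append((target, rhs))
    return folded


def propagate_constants(code):
    """Step 2: substitute temporaries whose value is a known constant
    and drop their definitions (assumes they are not needed later)."""
    constants = {}
    result = []
    for target, rhs in code:
        rhs = [constants.get(tok, tok) for tok in rhs]
        if len(rhs) == 1 and rhs[0].isdigit() and target.isidentifier():
            constants[target] = rhs[0]  # remember e.g. t = 10, emit nothing
        else:
            result.append((target, rhs))
    return result


def render(code):
    return [f"{target} = {' '.join(rhs)}" for target, rhs in code]


step1 = fold_constants(CODE)
step2 = propagate_constants(step1)
print(render(step1))
print(render(step2))
```

The first pass yields t = 10 followed by a[index] = t, and the second leaves only a[index] = 10.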
Code Generator
Takes the IR and generates code for the target machine.
Most compilers generate object code directly, but we shall go
through assembly language for ease of understanding.
Properties of the target machine are now a major factor.
Eg. the representation of data, such as how many bytes or words
variables of integer and floating-point data types occupy in
memory.
How integers are to be stored must also be decided, in order to generate code for array indexing.
sample code in hypothetical assembly language
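As a separate, minimal sketch of why the size of an integer matters for array indexing (this is not the assembly sample itself), the address of a[index] depends on how many bytes each element occupies. The function name `element_address` and the base address 0x1000 are assumptions made for the example.

```python
import ctypes


def element_address(base, index, element_size):
    """Address of a[index] when the array starts at `base` and each
    element occupies `element_size` bytes."""
    return base + index * element_size


# Size of a C int on the machine running this script. A compiler for a
# different target would have to use that target's size instead.
INT_SIZE = ctypes.sizeof(ctypes.c_int)

print(INT_SIZE)
print(hex(element_address(0x1000, 3, INT_SIZE)))
```

With 4-byte integers the third element after the base lies 12 bytes in; with 2-byte integers it lies 6 bytes in, so the generated code differs between targets even for the same source statement.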
Front End and Back End
Front End – operations that depend only on the source language
Regular Expressions
Regular expressions represent patterns of strings of characters.
A regular expression r is completely defined by the set of
strings that it matches. This set is called the language
generated by the regular expression and is written L(r).
Language here simply means a set of strings over an alphabet ∑ of legal symbols, eg. the set of ASCII characters.
Basic Regular Expressions – these are just the single characters
from the alphabet, which match themselves.
Given any character a from the alphabet ∑, the RE a matches the
character a, which we write as:
L(a) = {a}, where the a inside L( ) is the character a used as a pattern
The empty string is the string that contains no characters, written Ɛ
L(Ɛ) = {Ɛ}
The empty set matches no string, ie { } or ɸ.
L(ɸ) = { }
What is the difference between { } and {Ɛ}, ie between the empty set and
the set containing only the empty string?
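One way to see the difference is to model each language as a Python set of strings, with the empty string written as "". The variable names are assumptions made for this sketch.

```python
L_a = {"a"}        # L(a): one string, the single character a
L_epsilon = {""}   # L(Ɛ): one string, the empty string
L_empty = set()    # L(ɸ): no strings at all

print(len(L_epsilon), len(L_empty))
print("" in L_epsilon, "" in L_empty)
```

The language {Ɛ} has one member, so a pattern with that language does match something (the empty string), whereas { } has no members and matches nothing.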
Choice Among Alternatives – if r and s are REs, then r|s is an RE
which matches any string that is matched either by r or by s.
In terms of languages, the language of r|s is the union of the
languages of r and s, or
L(r|s) = L(r) ∪ L(s)
eg. consider the RE a|b: it matches either of the characters a or b,
ie L(a|b) = L(a) ∪ L(b) = {a} ∪ {b} = {a, b}
Also, a|Ɛ matches either the single character a or the empty
string (consisting of no characters), ie L(a|Ɛ) = {a, Ɛ}
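The union rule can be checked on a small scale with Python's `re` module. This is only a sketch: the helper `language` and the finite set `CANDIDATES` are assumptions, and testing a handful of candidate strings illustrates the rule rather than proving it.

```python
import re


def language(pattern, candidates):
    """Subset of `candidates` that the RE matches in full."""
    return {s for s in candidates if re.fullmatch(pattern, s) is not None}


CANDIDATES = {"", "a", "b", "ab", "c"}

L_a = language("a", CANDIDATES)
L_b = language("b", CANDIDATES)
L_a_or_b = language("a|b", CANDIDATES)
# Python writes a|Ɛ as "a|", with an empty second alternative.
L_a_or_eps = language("a|", CANDIDATES)

print(L_a_or_b == L_a | L_b)
print(sorted(L_a_or_eps))
```

Here `|` between two Python sets is set union, so the first comparison is a direct check of L(a|b) = L(a) ∪ L(b) on the candidate strings.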
ICT 441: COMPILER AND TRANSLATORS
This course introduces the concepts of compilation and illustrates
them with a compiler for a small Pascal-like language. It further
deals with how compilers work and builds a deep
understanding of the syntax of programming languages, the efficiency
and memory considerations of the available control structures and
data types, issues in separate compilation, differences between
programming languages, and the implications of processor
architecture.
Topics include the compilation process (stages, phases, passes);
language definition (syntax, grammar, regular and context-free
languages); lexical analysis; parsing; semantic analysis; storage
allocation; and code generation.