Language Processing System:-: Compiler
Preprocessor:-
A preprocessor produces input to a compiler. It may perform the following functions.
o Macro processing: A preprocessor may allow a user to define macros that are
shorthands for longer constructs.
o File inclusion: A preprocessor may include header files into the program text.
o Rational preprocessors: these augment older languages with more modern flow-of-control
and data-structuring facilities.
o Language extensions: these attempt to add capabilities to the language by what
amounts to built-in macros.
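The macro-processing function above can be sketched in a few lines. This is a minimal illustration, not a real preprocessor: it handles only object-like macros (simple name-for-text shorthands), and the macro names and source line are invented for the example.

```python
def expand_macros(source, macros):
    # Object-like macros only: each occurrence of a macro name is
    # textually replaced by its (longer) definition.
    for name, body in macros.items():
        source = source.replace(name, body)
    return source

# Hypothetical macro table: each name is a shorthand for a longer construct.
macros = {"PI": "3.14159", "MAX_SIZE": "100"}
line = "area = PI * r * r; limit = MAX_SIZE;"
expanded = expand_macros(line, macros)
```

A real preprocessor would also respect token boundaries and support parameterized (function-like) macros; this sketch only shows the idea of textual shorthand expansion.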
Compiler:-
A compiler is a program that can read a program in one language (the source language) and
translate it into an equivalent program in another language (the target language). An important role of
the compiler is to report any errors in the source program that it detects during the translation process.
Interpreter:-
An interpreter is another common kind of language processor. Instead of producing a target
program as a translation, an interpreter appears to directly execute the operations specified in the
source program on inputs supplied by the user.
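The idea of directly executing operations can be sketched with a tiny tree-walking interpreter. The tuple-based program representation and the expression chosen are assumptions for illustration only.

```python
def interpret(node, env):
    # Execute the operation at this node directly, rather than
    # translating it into a target program.
    op = node[0]
    if op == "num":
        return node[1]
    if op == "var":
        return env[node[1]]          # input supplied by the user at run time
    if op == "+":
        return interpret(node[1], env) + interpret(node[2], env)
    if op == "*":
        return interpret(node[1], env) * interpret(node[2], env)
    raise ValueError(f"unknown operation: {op}")

# position = initial + rate * 60, with user-supplied inputs
program = ("+", ("var", "initial"), ("*", ("var", "rate"), ("num", 60)))
result = interpret(program, {"initial": 10, "rate": 2})
```

Note that nothing is translated ahead of time: each run walks the source structure again, which is why interpreters are typically slower than compiled code but give better run-time diagnostics.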
Hybrid Compiler:-
Java language processors combine compilation and interpretation, as shown in Fig. A Java
source program may first be compiled into an intermediate form called bytecodes. The bytecodes are
then interpreted by a virtual machine. A benefit of this arrangement is that bytecodes compiled on
one machine can be interpreted on another machine.
Assembler:-
Programmers use a mnemonic (symbol) for each machine instruction, which they would
subsequently translate into machine language. Such a mnemonic machine language is now called an
assembly language. Programs known as assemblers were written to translate assembly language
into machine language.
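The core of the translation described above is a table lookup from mnemonic to numeric opcode. The following is a minimal sketch; the mnemonics, opcodes, and one-operand instruction format are all invented for illustration.

```python
# Hypothetical opcode table mapping mnemonics to machine opcodes.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03}

def assemble(lines):
    # Translate each "MNEMONIC operand" line into an (opcode, operand) pair.
    code = []
    for line in lines:
        mnemonic, operand = line.split()
        code.append((OPCODES[mnemonic], int(operand)))
    return code

machine_code = assemble(["LOAD 10", "ADD 11", "STORE 12"])
```

A real assembler also resolves symbolic labels into addresses (usually in a second pass), which this sketch omits.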
The analysis part breaks up the source program into constituent pieces and imposes a
grammatical structure on them. It then uses this structure to create an intermediate representation of
the source program. If the analysis part detects that the source program is either syntactically
ill-formed or semantically unsound, it must provide informative messages so the user can take
corrective action. The analysis part also collects information about the source program and stores it
in a data structure called a symbol table, which is passed along with the intermediate representation
to the synthesis part.
The synthesis part constructs the desired target program from the intermediate representation
and the information in the symbol table. The analysis part is often called the front end of the
compiler; the synthesis part is the back end.
Phases of a compiler
Lexical Analysis:-
Lexical analysis, or scanning, forms the first phase of a compiler. The lexical analyzer reads
the stream of characters that makes up the source program and groups them into meaningful
sequences called lexemes. For each lexeme, the lexical analyzer produces a token as output. The
token format is shown below.
<token-name, attribute-value>
These tokens pass on to the subsequent phase known as syntax analysis. The token elements
are listed below:
o Token-name: This is an abstract symbol used during syntax analysis.
o Attribute-value: This points to an entry in the symbol table for the corresponding
token.
Information from the symbol-table entry is needed for semantic analysis and code
generation.
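A scanner of this kind can be sketched with regular expressions. The token set and the sample statement below are illustrative assumptions; the point is that each lexeme becomes a &lt;token-name, attribute-value&gt; pair, with identifier attributes pointing into the symbol table.

```python
import re

# Hypothetical token set for a tiny expression language.
TOKEN_SPEC = [
    ("id",     r"[A-Za-z_]\w*"),
    ("number", r"\d+"),
    ("assign", r"="),
    ("plus",   r"\+"),
    ("times",  r"\*"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def scan(source, symtab):
    # Group characters into lexemes and emit (token-name, attribute-value).
    tokens = []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue
        if kind == "id":
            # Attribute-value is the symbol-table entry for this name.
            attr = symtab.setdefault(lexeme, len(symtab))
        else:
            attr = lexeme
        tokens.append((kind, attr))
    return tokens

symtab = {}
tokens = scan("position = initial + rate * 60", symtab)
```

The resulting token list is what the syntax-analysis phase consumes, while the populated symbol table travels alongside it through the rest of the compiler.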
Syntax Analysis:-
Syntax analysis forms the second phase of the compiler.
This phase takes the list of tokens produced by lexical analysis as input and arranges them
in the form of a tree structure (called the syntax tree), which reflects the structure of the program.
This phase is also called parsing.
The syntax tree consists of interior nodes representing operations and the children of each
node representing the arguments of that operation. A syntax tree for the token stream is as shown
in the above example.
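The tree shape described above can be sketched directly. The statement position = initial + rate * 60 is an assumed example; interior nodes hold operations and their children hold the arguments.

```python
class Node:
    """An interior node: an operation with its argument subtrees."""
    def __init__(self, op, *children):
        self.op = op
        self.children = children

# Syntax tree for: position = initial + rate * 60
tree = Node("=",
            Node("id", "position"),
            Node("+",
                 Node("id", "initial"),
                 Node("*", Node("id", "rate"), Node("num", 60))))

def preorder(n):
    # Walk the tree, listing each operation before its arguments.
    out = [n.op]
    for c in n.children:
        out += preorder(c) if isinstance(c, Node) else [c]
    return out
```

Note how the tree already encodes precedence: the * node sits below the + node, so multiplication is performed first regardless of the flat token order.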
Semantic analysis:-
This phase uses the syntax tree and the information in the symbol table to check the source
program for consistency with the language definition. This phase also collects type information and
saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code
generation.
Type checking forms an important part of semantic analysis. Here the compiler checks whether each
operator has matching operands. For example, many programming language definitions require an
array index to be an integer; the compiler must report an error if a floating-point number is used to
index an array.
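The array-index check mentioned above can be sketched as follows. The type names and symbol-table layout are assumptions for illustration, not a full type system.

```python
def check_index(array_name, index_type, symtab):
    # Type check: an array index must be an integer.
    if index_type != "int":
        raise TypeError(
            f"array {array_name!r} indexed with {index_type}, expected int")
    # If the check passes, the expression has the array's element type.
    return symtab[array_name]["elem_type"]

symtab = {"a": {"type": "array", "elem_type": "float"}}

elem = check_index("a", "int", symtab)     # a[i] with integer i: accepted
try:
    check_index("a", "float", symtab)      # a[x] with floating-point x
    error = None
except TypeError as e:
    error = str(e)                         # compiler reports an error
```

In a real compiler this check runs over the syntax tree, and the inferred types are recorded back into the tree or symbol table for intermediate-code generation.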
Code Optimization:-
This is a machine-independent phase which attempts to improve the intermediate code for
generating better (faster) target code.
For example, a straightforward algorithm generates the intermediate code using an instruction
for each operator in the tree representation that comes from the semantic analyzer.
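One classic machine-independent improvement is constant folding: evaluating constant subexpressions at compile time so fewer instructions reach the target code. The three-address instruction format below is an assumption for illustration.

```python
def fold_constants(code):
    # code: list of (dest, op, arg1, arg2) three-address instructions.
    out = []
    consts = {}                      # temporaries known to be constant
    for dest, op, a, b in code:
        a = consts.get(a, a)         # substitute known constants
        b = consts.get(b, b)
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            val = a * b if op == "*" else a + b
            consts[dest] = val       # fold: emit no instruction at all
        else:
            out.append((dest, op, a, b))
    return out

# t1 = 60 * 1.0 ; t2 = rate * t1   becomes   t2 = rate * 60.0
optimized = fold_constants([("t1", "*", 60, 1.0),
                            ("t2", "*", "rate", "t1")])
```

The straightforward one-instruction-per-operator code is still correct; the optimizer simply produces an equivalent sequence that runs faster.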
Code Generator:-
This phase takes the intermediate representation of the source program as input and maps it
to the target language.
The intermediate instructions are translated into sequences of machine instructions that
perform the same task. A critical aspect of code generation is the assignment of registers to
hold variables.
Using two registers, R1 and R2, the intermediate code is translated into machine code.
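A naive code generator using the two registers R1 and R2 can be sketched as below. The target mnemonics (LD, ADD, MUL, ST) and the three-address input format are invented for illustration; real generators must also decide which values stay in registers across instructions.

```python
def generate(code):
    # Map each three-address instruction (dest, op, a, b) to a fixed
    # four-instruction pattern using registers R1 and R2.
    asm = []
    for dest, op, a, b in code:
        asm.append(f"LD  R1, {a}")                         # load operand 1
        asm.append(f"LD  R2, {b}")                         # load operand 2
        asm.append(f"{'MUL' if op == '*' else 'ADD'} R1, R1, R2")
        asm.append(f"ST  {dest}, R1")                      # store result
    return asm

asm = generate([("t1", "*", "rate", "60.0"),
                ("id1", "+", "initial", "t1")])
```

The redundant store/reload of t1 between the two instructions is exactly the kind of waste that smarter register assignment eliminates, which is why the text calls register allocation a critical aspect of code generation.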
Symbol-Table Management:-
An essential function of a compiler is to record the variable names used in the source
program and collect information about various attributes of each name.
These attributes may provide information about the storage allocated for a name, its type, its
scope (where in the program its value may be used), and in the case of procedure names, such
things as the number and types of its arguments, the method of passing each argument (for
example, by value or by reference), and the type returned.
The symbol table is a data structure containing a record for each variable name, with fields
for the attributes of the name.
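A symbol table of the kind described above is easily sketched as a map from names to attribute records. The attribute names (type, scope, params, returns) follow the text; their exact representation here is an assumption.

```python
class SymbolTable:
    """One record per name, with fields for the name's attributes."""
    def __init__(self):
        self.records = {}

    def insert(self, name, **attrs):
        self.records[name] = attrs

    def lookup(self, name):
        # Returns the attribute record, or None if the name is unknown.
        return self.records.get(name)

symtab = SymbolTable()
# A variable: storage type and scope.
symtab.insert("rate", type="float", scope="global")
# A procedure: number/types of arguments, passing method, return type.
symtab.insert("max", type="proc",
              params=[("a", "int", "by value"), ("b", "int", "by value")],
              returns="int")
```

Production compilers typically use a stack of such tables, one per scope, so an inner declaration can shadow an outer one; a single flat table suffices for this sketch.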
The Grouping of Phases into Passes:-
Activities from several phases may be grouped together into a pass that reads an input file
and writes an output file.
For example, the front-end phases of lexical analysis, syntax analysis, semantic analysis, and
intermediate code generation might be grouped together into one pass.
Code optimization might be an optional pass.
A back-end pass might consist of code generation for a particular target machine.
Some compiler collections have been created around carefully designed intermediate
representations that allow the front end for a particular language to interface with the back
end for a certain target machine.
With these collections, we can produce compilers for different source languages for one
target machine by combining different front ends with the back end for that target machine.
Similarly, we can produce compilers for different target machines, by combining a front end
with back ends for different target machines.
A pattern is a description of the form that the lexemes of a token may take. In the case of a
keyword as a token, the pattern is just the sequence of characters that form the keyword. For
identifiers and some other tokens, the pattern is a more complex structure that is matched by
many strings.
A lexeme is a sequence of characters in the source program that matches the pattern for a
token.
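The two kinds of pattern above can be shown concretely with regular expressions. The keyword and identifier patterns below are typical assumptions, not the definition of any particular language.

```python
import re

# A keyword's pattern is just the sequence of characters forming it.
keyword_pattern = r"if"

# An identifier's pattern is a structure matched by many strings.
identifier_pattern = r"[A-Za-z_]\w*"

# The keyword pattern matches exactly one lexeme.
assert re.fullmatch(keyword_pattern, "if")

# The identifier pattern matches many different lexemes.
lexemes = ["count", "x1", "_tmp"]
matches = [bool(re.fullmatch(identifier_pattern, s)) for s in lexemes]
```

Note that "if" also matches the identifier pattern; scanners resolve this by checking the keyword table first, so keywords are not mistaken for identifiers.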
Cross Compiler:-
– A cross compiler is a compiler capable of creating executable code for a platform other
than the one on which the compiler itself is running.
– Building compilers for the same source language that produce code for several different
machines is a typical use of cross compilation.
Bootstrapping:-
A compiler is characterized by three languages:
1. Source Language
2. Target Language
3. Implementation Language
1. Create a compiler SCAA for a subset, S, of the desired language L, written in language "A";
this compiler runs on machine A and produces code for machine A.
2. Write a compiler LCSA for the full language L in the subset S, producing code for machine A.
3. Compile LCSA using the compiler SCAA to obtain LCAA. LCAA is a compiler for language L,
which runs on machine A and produces code for machine A.
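The bootstrapping steps can be sketched by modelling each compiler as a (source, target, implementation) triple, mirroring the three characterizing languages. The language names S, L, and A follow the text; the function itself is an illustrative model, not a real build process.

```python
def compile_compiler(compiler, using):
    # Running `using` translates `compiler`'s implementation language
    # into `using`'s target language; the translation it performs
    # (source -> target) is unchanged.
    src, tgt, impl = compiler
    u_src, u_tgt, _ = using
    assert impl == u_src, "implementation language must match compiler input"
    return (src, tgt, u_tgt)

SCAA = ("S", "A", "A")   # step 1: compiler for subset S, written in A
LCSA = ("L", "A", "S")   # step 2: compiler for full L, written in S
LCAA = compile_compiler(LCSA, SCAA)   # step 3: L compiler running on A
```

The result LCAA is written in A (machine code for A), so it runs directly on machine A and can thereafter recompile LCSA itself, dispensing with the subset compiler.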