Unit 4 SS

The document provides an overview of compiler functions, detailing the process of translating high-level programming languages into low-level machine code through stages such as lexical analysis, syntax analysis, semantic analysis, code generation, and optimization. It explains the roles of different types of compilers, the structure of grammar, and the importance of tokens and lexemes in lexical analysis. Additionally, it discusses the intricacies of code generation, including instruction selection, register allocation, and the impact of target machine architecture on the generated code.


UNIT -4

BASIC COMPILER FUNCTION

The compiler is software that converts a program written in a high-level language (the source language) into a low-level language (the object, target, or machine language, i.e. 0s and 1s).
A translator or language processor is a program that translates an input program written in a
programming language into an equivalent program in another language.

The compiler is a type of translator, which takes a program written in a high-level programming
language as input and translates it into an equivalent program in low-level languages such as
machine language or assembly language.

The program written in a high-level language is known as a source program, and the program
converted into a low-level language is known as an object (or target) program.

Without compilation, no program written in a high-level language can be executed.

The process of translating the source code into machine code involves several stages, including
lexical analysis, syntax analysis, semantic analysis, code generation, and optimization.

A compiler is a more intelligent program than an assembler: it verifies all kinds of limits, ranges, errors, and so on.

High-Level Programming Language


A high-level programming language abstracts away the attributes of the underlying computer, making it more convenient for the user to write programs.

Low-Level Programming Language


A low-level programming language is close to the machine: it offers little abstraction, and the programmer must manage machine details such as registers and memory directly.

Stages of Compiler Design

1. Lexical Analysis: The first stage of compiler design is lexical analysis, also known as scanning. In this stage, the compiler reads the source code character by character and breaks it down into a series of tokens, such as keywords, identifiers, and operators.

These tokens are then passed on to the next stage of the compilation process.
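The scanning step described above can be sketched with a small regex-based tokenizer. This is a minimal illustration only; the token categories and their regular expressions here are assumptions, not any real compiler's tables:

```python
import re

# Hypothetical token specification: (name, regex). Order matters: keywords
# must be tried before the general identifier pattern.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:if|else|while|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER",     r"\d+"),
    ("OPERATOR",   r"[+\-*/=<>]"),
    ("SKIP",       r"\s+"),            # blanks, tabs, newlines
]

def tokenize(source):
    """Read the source character stream and emit (token, lexeme) pairs."""
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)
    tokens = []
    for m in re.finditer(pattern, source):
        if m.lastgroup != "SKIP":      # white space is not passed on
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("if x1 = 42"))
```

Each emitted pair is a token (the category) together with its lexeme (the matched text), which is exactly what the later parsing stage consumes.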

2. Syntax Analysis: The second stage of compiler design is syntax analysis, also
known as parsing. In this stage, the compiler checks the syntax of the source code to ensure
that it conforms to the rules of the programming language.

The compiler builds a parse tree, which is a hierarchical representation of the program’s
structure, and uses it to check for syntax errors.

3. Semantic Analysis: The third stage of compiler design is semantic analysis. In this
stage, the compiler checks the meaning of the source code to ensure that it makes sense.

The compiler performs type checking, which ensures that variables are used correctly and
that operations are performed on compatible data types.

The compiler also checks for other semantic errors, such as undeclared variables and
incorrect function calls.

4. Code Generation: The fourth stage of compiler design is code generation.

In this stage, the compiler translates the parse tree into machine code that can be executed by
the computer.

The code generated by the compiler must be efficient and optimized for the target platform.

5. Optimization: The final stage of compiler design is optimization. In this stage, the
compiler analyzes the generated code and makes optimizations to improve its performance.

The compiler may perform optimizations such as constant folding, loop unrolling, and
function inlining.
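Constant folding, mentioned above, can be sketched as a small pass over an expression tree. The tuple-based IR here is a made-up illustration, not a real compiler's representation:

```python
# A minimal constant-folding pass over a tiny expression tree.
# Hypothetical IR: nodes are ("num", v), ("var", name), or (op, left, right).
def fold(node):
    if node[0] in ("num", "var"):          # leaves are already folded
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if left[0] == "num" and right[0] == "num":   # both known at compile time
        value = left[1] + right[1] if op == "+" else left[1] * right[1]
        return ("num", value)
    return (op, left, right)

# 2 * 3 + x: the constant subtree 2 * 3 folds to 6 before any code is emitted.
print(fold(("+", ("*", ("num", 2), ("num", 3)), ("var", "x"))))
```

The runtime multiplication disappears entirely; only the addition involving the variable survives into the generated code.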

A well-designed compiler can greatly improve the efficiency and performance of software programs, making them more useful and valuable for users.
Types of Compiler

 A Cross Compiler runs on a machine ‘A’ and produces code for another machine ‘B’. It is capable of creating code for a platform other than the one on which the compiler is running.

 A Source-to-source Compiler (also called a transcompiler or transpiler) translates source code written in one programming language into source code of another programming language.

Grammar :
A grammar is a finite set of formal rules for generating syntactically correct, meaningful sentences.

Constituents of Grammar :
A grammar is basically composed of two basic elements –
1. Terminal Symbols –
Terminal symbols are the components of the sentences generated using a grammar and are represented using lowercase letters such as a, b, c, etc.
2. Non-Terminal Symbols –
Non-terminal symbols take part in the generation of a sentence but are not themselves components of the sentence. Non-terminal symbols are also called auxiliary symbols or variables. They are represented using capital letters such as A, B, C, etc.
Formal Definition of Grammar :
Any Grammar can be represented by 4 tuples – <N, T, P, S>
 N – Finite Non-Empty Set of Non-Terminal Symbols.
 T – Finite Set of Terminal Symbols.
 P – Finite Non-Empty Set of Production Rules.
 S – Start Symbol (Symbol from where we start producing our sentences or strings).
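The 4-tuple ⟨N, T, P, S⟩ can be written down directly. The toy grammar below (generating strings of the form aⁿbⁿ) is an illustrative assumption chosen for brevity:

```python
# A toy grammar <N, T, P, S> for strings like "ab", "aabb", "aaabbb".
N = {"S"}                                 # non-terminals (capital letters)
T = {"a", "b"}                            # terminals (small letters)
P = {"S": [["a", "S", "b"], ["a", "b"]]}  # production rules for S
S = "S"                                   # start symbol

def derive(choices):
    """Apply a sequence of production choices (indices into P) to the
    leftmost non-terminal, starting from S."""
    sentence = [S]
    for c in choices:
        i = next(j for j, sym in enumerate(sentence) if sym in N)
        sentence[i:i + 1] = P[sentence[i]][c]
    return "".join(sentence)

print(derive([0, 1]))   # S -> aSb -> aabb
```

Each derivation step rewrites a non-terminal using one production from P; a string is in the language when the derivation ends with terminals only.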

Different Types Of Grammars :


Grammars can be classified on the basis of –
 Type of Production Rules
 Number of Derivation Trees
 Number of Strings
LEXICAL ANALYSIS

A simple way to build a lexical analyzer is to construct a diagram that illustrates the structure of the tokens of the source language, and then to hand-translate the diagram into a program for finding tokens. Efficient lexical analyzers can be produced in this manner.

Role of Lexical Analyser


The lexical analyzer is the first phase of the compiler. Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis. As in the figure, upon receiving a “get next token” command from the parser, the lexical analyzer reads input characters until it can identify the next token.
Fig. 1.8 Interaction of lexical analyzer with parser

Since the lexical analyzer is the part of the compiler that reads the source text, it may also
perform certain secondary tasks at the user interface. One such task is stripping out from the
source program comments and white space in the form of blank, tab, and new line character.
Another is correlating error messages from the compiler with the source program.

Issues in Lexical Analysis

There are several reasons for separating the analysis phase of compiling into lexical
analysis and parsing

1) Simpler design is the most important consideration. The separation of lexical analysis from
syntax analysis often allows us to simplify one or the other of these phases.
2) Compiler efficiency is improved.
3) Compiler portability is enhanced.

Tokens Patterns and Lexemes.

There is a set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token. The pattern is said to match each string in the set.
In most programming languages, the following constructs are treated as tokens:
keywords, operators, identifiers, constants, literal strings, and punctuation symbols such as
parentheses, commas, and semicolons.
Lexeme

A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. For example, in the Pascal statement const pi = 3.1416; the substring pi is a lexeme for the token identifier.

Patterns

A pattern is a rule describing the set of lexemes that can represent a particular token in the source program. The pattern for the keyword token const is just the single string const that spells out the keyword.

Certain language conventions affect the difficulty of lexical analysis. Languages such as FORTRAN require certain constructs to appear in fixed positions on the input line. Thus the alignment of a lexeme may be important in determining the correctness of a source program.

Attributes of Token

The lexical analyzer returns to the parser a representation of the token it has found. The representation is an integer code if the token is a simple construct such as a left parenthesis, comma, or colon. The representation is a pair consisting of an integer code and a pointer into a table if the token is a more complex element such as an identifier or constant.
The integer code gives the token type; the pointer points to the value of that token. Pairs are also returned whenever we wish to distinguish between instances of a token.

The attributes influence the translation of tokens.


i) Constant : value of the constant
ii) Identifiers: pointer to the corresponding symbol table entry.
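The pair representation described above can be sketched as follows. The integer codes and the flat symbol-table layout are illustrative assumptions:

```python
# Sketch of the <token-type, attribute> pair the lexer hands to the parser.
LPAREN, COMMA, IDENT, CONST = 1, 2, 3, 4   # integer codes for token types

symbol_table = []                          # holds values of idents/constants

def make_token(kind, lexeme=None):
    """Simple tokens carry just their code; identifiers and constants carry
    a pointer (here, an index) into the symbol table."""
    if kind in (IDENT, CONST):
        symbol_table.append(lexeme)
        return (kind, len(symbol_table) - 1)   # pair: code + table pointer
    return (kind, None)                        # bare integer code

t = make_token(CONST, "3.1416")
print(t, symbol_table[t[1]])
```

The parser works only with the codes; whenever it needs the actual value of an identifier or constant, it follows the pointer into the table.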

Error Recovery Strategies In Lexical Analysis


The following are the error-recovery actions in lexical analysis:
1) Deleting an extraneous character.
2) Inserting a missing character.
3) Replacing an incorrect character with a correct character.
4) Transposing two adjacent characters.
5) Panic mode recovery: deleting successive characters from the input until the error is resolved.
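Panic-mode recovery (action 5 above) can be sketched in a few lines. The synchronizing character set here is a hypothetical choice:

```python
# Sketch of panic-mode recovery: skip characters until a synchronizing
# character is found, then resume scanning from there.
def panic_mode(text, i, sync=frozenset(";)}")):
    while i < len(text) and text[i] not in sync:
        i += 1                       # delete successive characters
    return i                         # position at which scanning resumes

# Garbage "@#!" after the "=" is skipped up to the ";".
print(panic_mode("x = @#! ;", 4))
```

The method is crude but guarantees the scanner never loops forever on malformed input.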

SYNTACTIC ANALYSIS

When an input string (source code or a program in some language) is given to a compiler, the
compiler processes it in several phases,

starting from lexical analysis (scans the input and divides it into tokens) to target code
generation.

Syntax Analysis or Parsing is the second phase, i.e. after lexical analysis. It checks the
syntactical structure of the given input, i.e. whether the given input is in the correct syntax (of
the language in which the input has been written) or not.
It does so by building a data structure, called a Parse tree or Syntax tree.
The parse tree is constructed by using the pre-defined Grammar of the language and the input
string.
If the given input string can be produced with the help of the syntax tree (in the derivation process), the input string is in the correct syntax. If not, an error is reported by the syntax analyzer.
The main goal of syntax analysis is to create a parse tree or abstract syntax tree (AST) of the
source code, which is a hierarchical representation of the source code that reflects the
grammatical structure of the program.
Features of syntax analysis:

Syntax Trees: Syntax analysis creates a syntax tree, which is a hierarchical representation of the code’s structure.

The tree shows the relationship between the various parts of the code, including statements, expressions, and operators.

Context-Free Grammar: Syntax analysis uses a context-free grammar to define the syntax of the programming language.

A context-free grammar is a formal notation used to describe the structure of programming languages.
Top-Down and Bottom-Up Parsing: Syntax analysis can be performed using two main approaches: top-down parsing and bottom-up parsing.

Top-down parsing starts from the highest level of the syntax tree and works its way down, while bottom-up parsing starts from the lowest level and works its way up.
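Top-down parsing can be sketched as a recursive-descent parser. The toy grammar below (E → T "+" E | T, T → digit) and the nested-tuple parse-tree shape are illustrative assumptions:

```python
# A minimal recursive-descent (top-down) parser for the toy grammar
#   E -> T "+" E | T        T -> digit
def parse(tokens):
    tree, rest = parse_E(list(tokens))
    if rest:
        raise SyntaxError(f"unexpected input: {rest}")  # error detection
    return tree

def parse_E(toks):
    left, toks = parse_T(toks)
    if toks and toks[0] == "+":          # choose the E -> T "+" E production
        right, toks = parse_E(toks[1:])
        return ("+", left, right), toks
    return left, toks                    # otherwise E -> T

def parse_T(toks):
    if toks and toks[0].isdigit():
        return ("num", int(toks[0])), toks[1:]
    raise SyntaxError("digit expected")  # syntax error halts the parse

print(parse(["1", "+", "2", "+", "3"]))
```

Each function mirrors one non-terminal of the grammar, which is why the parser literally works from the top of the syntax tree downward.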

Error Detection: Syntax analysis is responsible for detecting syntax errors in the code.

If the code does not conform to the rules of the programming language, the parser will report
an error and halt the compilation process.
Intermediate Code Generation: Syntax analysis generates an intermediate
representation of the code, which is used by the subsequent phases of the compiler.

The intermediate representation is usually a more abstract form of the code, which is easier to
work with than the original source code.

Optimization: Syntax analysis can perform basic optimizations on the code, such as
removing redundant code and simplifying expressions.

Code Generation

Compiler design is an essential aspect of software engineering that enables us to translate high-level programming languages into machine-readable code.

One of the most important tasks in compiler design is code generation: transforming the intermediate code produced by the compiler into efficient machine code. It is a challenging process that requires a deep understanding of the target architecture and of the programming language being compiled.

Code generation can be considered the final phase of compilation.

It takes as input the intermediate representation (IR) produced by the front end of the compiler, along with the relevant symbol table information, and produces as output a semantically equivalent target program.

Position of code generator

The requirements imposed on a code generator are severe. The target program must preserve the semantic meaning of the source program and be of high quality; that is, it must make effective use of the available resources of the target machine.

After code generation, an optimization process can be applied to the code, but that can be seen as part of the code generation phase itself.

The code generated by the compiler is an object code of some lower-level programming language,

for example assembly language.

The lower-level object code that results from this transformation should have the following properties:


 It should carry the exact meaning of the source code.

 It should be efficient in terms of CPU usage and memory management.

A code generator has three primary tasks:

Instruction selection, register allocation and assignment, and instruction ordering.

Instruction selection involves choosing appropriate target-machine instructions to implement the

IR statements.

Register allocation and assignment involves deciding what values to keep in which registers.

Instruction ordering involves deciding in what order to schedule the execution of instructions.

Input to the Code Generator

The input to the code generator is the intermediate representation of the source program produced

by the front end, along with information in the symbol table that is used to determine the run-time

addresses of the data objects denoted by the names in the IR.

Intermediate representations include three-address representations such as quadruples, triples, and indirect triples; virtual machine representations such as postfix notation; and graphical representations such as abstract syntax trees (ASTs) and DAGs.
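As a concrete example, the quadruple form of the three-address code for a = b * c + d can be written out directly (the temporaries t1 and t2 are compiler-introduced names):

```python
# Quadruples for: a = b * c + d
# Each quadruple is (op, arg1, arg2, result).
quads = [
    ("*", "b", "c", "t1"),    # t1 = b * c
    ("+", "t1", "d", "t2"),   # t2 = t1 + d
    ("=", "t2", None, "a"),   # a  = t2
]
for q in quads:
    print(q)
```

Every quadruple names its result explicitly, which is what distinguishes this form from triples, where results are referred to by statement position instead.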


The Target Program

The architecture of target machine has a significant impact on the difficulty of constructing a good

code generator that produces high-quality machine code.

The most common target-machine architectures are reduced instruction set computer (RISC),

complex instruction set computer (CISC), and stack based.

A RISC machine typically has many registers, three-address instructions, simple addressing

modes, and a relatively simple instruction-set architecture.

In contrast, a CISC machine typically has few registers, two-address instructions, a variety of

addressing modes, several register classes, variable-length instructions and instructions with side

effects.

One example of a RISC architecture is the ARM architecture, which is commonly used in mobile

devices and embedded systems.

An example of a CISC architecture is the x86 architecture, which is commonly used in desktop and laptop computers.

In a stack-based machine, operations are done by pushing operands onto a stack and then

performing the operations on the operands at the top of the stack.

Stack-based machines almost disappeared because it was felt that the stack was too limiting and

required too many swap and copy operations.
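The push-then-operate behaviour of a stack machine can be sketched with a tiny interpreter. The instruction encoding (integers push themselves, "+" and "*" operate on the stack top) is an illustrative assumption:

```python
# Sketch of a stack-based machine evaluating postfix code for 2 + 3 * 4.
def run(code):
    stack = []
    for instr in code:
        if isinstance(instr, int):
            stack.append(instr)                 # push the operand
        else:
            b, a = stack.pop(), stack.pop()     # operate on the top two
            stack.append(a + b if instr == "+" else a * b)
    return stack[0]

print(run([2, 3, 4, "*", "+"]))   # postfix for 2 + 3 * 4
```

Note that no instruction names a register or memory address: all operands are implicit in the stack, which is exactly the limitation the paragraph above describes.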


Instruction Selection

The code generator takes the intermediate representation as input and converts it into the target machine’s instruction set.

A given IR statement can usually be implemented by many different instruction sequences, so it becomes the responsibility of the code generator to choose the appropriate instructions wisely.

The complexity of performing this mapping is determined by factors such as:

· The level of the Intermediate Representations

· The nature of the instruction-set architecture

· The desired quality of the generated code

The nature of the instruction set of the target machine has a strong effect on the difficulty of
instruction selection.

For example, the uniformity and completeness of the instruction set are important factors.
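A simple instruction-selection pass can be sketched as a statement-by-statement mapping from three-address IR to a pseudo-assembly. The mnemonics (LD, ADD, SUB, ST) and the single working register R1 are made-up assumptions, not a real instruction set:

```python
# Sketch: selecting target instructions for three-address IR statements.
# Each IR statement is (dest, src1, op, src2), e.g. x = y + z.
def select(ir):
    code = []
    for dest, a, op, b in ir:
        code.append(f"LD  R1, {a}")                       # load first operand
        code.append(f"{'ADD' if op == '+' else 'SUB'} R1, {b}")
        code.append(f"ST  {dest}, R1")                    # store the result
    return code

for line in select([("x", "y", "+", "z")]):
    print(line)
```

A naive one-statement-at-a-time mapping like this produces correct but redundant code (e.g. needless load/store pairs between adjacent statements), which is precisely why real selectors consider larger IR patterns at once.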

Register Allocation

A key problem in code generation is deciding what values to hold in what registers.

A program has a number of values to be maintained during the execution.


The target machine’s architecture may not allow all of these values to be kept in registers.

The code generator decides which values to keep in registers, and it also decides which registers to use to hold those values.

The use of registers is often subdivided into two subproblems:

1. Register allocation, during which we select the set of variables that will reside in registers at

each point in the program.

2. Register assignment, during which we pick the specific register that a variable will reside in.
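The two subproblems can be sketched with a deliberately crude heuristic: keep the most frequently used variables in the (few) registers and spill the rest to memory. Counting raw uses is a stand-in for real liveness analysis, and the register names are invented:

```python
from collections import Counter

# Toy register allocation and assignment.
def allocate(uses, num_regs=2):
    """uses: sequence of variable names in order of use."""
    by_freq = [v for v, _ in Counter(uses).most_common()]
    in_regs = by_freq[:num_regs]          # allocation: which values get registers
    assignment = {v: f"R{i}" for i, v in enumerate(in_regs)}  # assignment: which register
    spilled = by_freq[num_regs:]          # the rest live in memory
    return assignment, spilled

print(allocate(["a", "b", "a", "c", "a", "b"]))
```

With two registers, the most-used variables a and b stay in R0 and R1 while c is spilled; a real allocator would instead reason about which variables are live at each program point.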

Evaluation Order

Evaluation order is the order in which computations are performed. The code generator creates a schedule in which the instructions will execute; the choice matters because some orders require fewer registers to hold intermediate results than others.
