Unit 4 SS

The document provides an overview of compiler functions, detailing the process of translating high-level programming languages into low-level machine code through stages such as lexical analysis, syntax analysis, semantic analysis, code generation, and optimization. It explains the roles of different types of compilers, the structure of grammar, and the importance of tokens and lexemes in lexical analysis. Additionally, it discusses the intricacies of code generation, including instruction selection, register allocation, and the impact of target machine architecture on the generated code.


UNIT -4

BASIC COMPILER FUNCTION

The compiler is software that converts a program written in a high-level language (the source language) into a low-level language (the object, target, or machine language, i.e. 0s and 1s).
A translator or language processor is a program that translates an input program written in a
programming language into an equivalent program in another language.

The compiler is a type of translator, which takes a program written in a high-level programming
language as input and translates it into an equivalent program in low-level languages such as
machine language or assembly language.

The program written in a high-level language is known as a source program, and the program
converted into a low-level language is known as an object (or target) program.

Without compilation, no program written in a high-level language can be executed.

The process of translating the source code into machine code involves several stages, including
lexical analysis, syntax analysis, semantic analysis, code generation, and optimization.

A compiler is a more intelligent program than an assembler: it verifies all kinds of limits, ranges, errors, and so on.

High-Level Programming Language


A high-level programming language abstracts away the attributes of the underlying computer, making it more convenient for the user to write programs.

Low-Level Programming Language


A low-level programming language is close to the machine: it offers little abstraction, and the programmer must manage machine details such as registers and memory directly.

Stages of Compiler Design

1. Lexical Analysis: The first stage of compiler design is lexical analysis, also known as scanning. In this stage, the compiler reads the source code character by character and breaks it down into a series of tokens, such as keywords, identifiers, and operators.

These tokens are then passed on to the next stage of the compilation process.
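The scanning step described above can be sketched with a small regex-based tokenizer. This is a minimal illustration only; the token categories and their regular expressions here are assumptions, not any real compiler's tables:

```python
import re

# Hypothetical token specification: (name, regex). Order matters: keywords
# must be tried before the general identifier pattern.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:if|else|while|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER",     r"\d+"),
    ("OPERATOR",   r"[+\-*/=<>]"),
    ("SKIP",       r"\s+"),            # blanks, tabs, newlines
]

def tokenize(source):
    """Read the source character stream and emit (token, lexeme) pairs."""
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)
    tokens = []
    for m in re.finditer(pattern, source):
        if m.lastgroup != "SKIP":      # white space is not passed on
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("if x1 = 42"))
```

Each emitted pair is a token (the category) together with its lexeme (the matched text), which is exactly what the later parsing stage consumes.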

2. Syntax Analysis: The second stage of compiler design is syntax analysis, also
known as parsing. In this stage, the compiler checks the syntax of the source code to ensure
that it conforms to the rules of the programming language.

The compiler builds a parse tree, which is a hierarchical representation of the program’s
structure, and uses it to check for syntax errors.

3. Semantic Analysis: The third stage of compiler design is semantic analysis. In this
stage, the compiler checks the meaning of the source code to ensure that it makes sense.

The compiler performs type checking, which ensures that variables are used correctly and
that operations are performed on compatible data types.

The compiler also checks for other semantic errors, such as undeclared variables and
incorrect function calls.

4. Code Generation: The fourth stage of compiler design is code generation.

In this stage, the compiler translates the parse tree into machine code that can be executed by
the computer.

The code generated by the compiler must be efficient and optimized for the target platform.

5. Optimization: The final stage of compiler design is optimization. In this stage, the
compiler analyzes the generated code and makes optimizations to improve its performance.

The compiler may perform optimizations such as constant folding, loop unrolling, and
function inlining.
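Constant folding, mentioned above, can be sketched as a small pass over an expression tree. The tuple-based IR here is a made-up illustration, not a real compiler's representation:

```python
# A minimal constant-folding pass over a tiny expression tree.
# Hypothetical IR: nodes are ("num", v), ("var", name), or (op, left, right).
def fold(node):
    if node[0] in ("num", "var"):          # leaves are already folded
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if left[0] == "num" and right[0] == "num":   # both known at compile time
        value = left[1] + right[1] if op == "+" else left[1] * right[1]
        return ("num", value)
    return (op, left, right)

# 2 * 3 + x: the constant subtree 2 * 3 folds to 6 before any code is emitted.
print(fold(("+", ("*", ("num", 2), ("num", 3)), ("var", "x"))))
```

The runtime multiplication disappears entirely; only the addition involving the variable survives into the generated code.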

A well-designed compiler can greatly improve the efficiency and performance of software programs, making them more useful and valuable for users.
Types of Compiler

 A Cross Compiler runs on a machine ‘A’ and produces code for another machine ‘B’. It is capable of creating code for a platform other than the one on which the compiler is running.

 A Source-to-source Compiler (also called a transcompiler or transpiler) translates source code written in one programming language into source code of another programming language.

Grammar :
A grammar is a finite set of formal rules for generating syntactically correct, meaningful sentences.

Constituents of Grammar :
A grammar is basically composed of two basic elements –
1. Terminal Symbols –
Terminal symbols are the components of the sentences generated using a grammar and are represented using lowercase letters such as a, b, c, etc.
2. Non-Terminal Symbols –
Non-terminal symbols take part in the generation of a sentence but are not themselves components of the sentence. Non-terminal symbols are also called auxiliary symbols or variables. They are represented using capital letters such as A, B, C, etc.
Formal Definition of Grammar :
Any Grammar can be represented by 4 tuples – <N, T, P, S>
 N – Finite Non-Empty Set of Non-Terminal Symbols.
 T – Finite Set of Terminal Symbols.
 P – Finite Non-Empty Set of Production Rules.
 S – Start Symbol (Symbol from where we start producing our sentences or strings).
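The 4-tuple ⟨N, T, P, S⟩ can be written down directly. The toy grammar below (generating strings of the form aⁿbⁿ) is an illustrative assumption chosen for brevity:

```python
# A toy grammar <N, T, P, S> for strings like "ab", "aabb", "aaabbb".
N = {"S"}                                 # non-terminals (capital letters)
T = {"a", "b"}                            # terminals (small letters)
P = {"S": [["a", "S", "b"], ["a", "b"]]}  # production rules for S
S = "S"                                   # start symbol

def derive(choices):
    """Apply a sequence of production choices (indices into P) to the
    leftmost non-terminal, starting from S."""
    sentence = [S]
    for c in choices:
        i = next(j for j, sym in enumerate(sentence) if sym in N)
        sentence[i:i + 1] = P[sentence[i]][c]
    return "".join(sentence)

print(derive([0, 1]))   # S -> aSb -> aabb
```

Each derivation step rewrites a non-terminal using one production from P; a string is in the language when the derivation ends with terminals only.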

Different Types Of Grammars :


Grammars can be classified on the basis of –
 Type of Production Rules
 Number of Derivation Trees
 Number of Strings
LEXICAL ANALYSIS

A simple way to build a lexical analyzer is to construct a diagram that illustrates the structure of the tokens of the source language, and then to hand-translate the diagram into a program for finding tokens. Efficient lexical analyzers can be produced in this manner.

Role of Lexical Analyser


The lexical analyzer is the first phase of the compiler. Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis. As in the figure, upon receiving a “get next token” command from the parser, the lexical analyzer reads input characters until it can identify the next token.
Fig. 1.8 Interaction of lexical analyzer with parser

Since the lexical analyzer is the part of the compiler that reads the source text, it may also
perform certain secondary tasks at the user interface. One such task is stripping out from the
source program comments and white space in the form of blank, tab, and new line character.
Another is correlating error messages from the compiler with the source program.

Issues in Lexical Analysis

There are several reasons for separating the analysis phase of compiling into lexical
analysis and parsing

1) Simpler design is the most important consideration. The separation of lexical analysis from
syntax analysis often allows us to simplify one or the other of these phases.
2) Compiler efficiency is improved.
3) Compiler portability is enhanced.

Tokens Patterns and Lexemes.

There is a set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token. The pattern is said to match each string in the set.
In most programming languages, the following constructs are treated as tokens:
keywords, operators, identifiers, constants, literal strings, and punctuation symbols such as
parentheses, commas, and semicolons.
Lexeme

A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. For example, in the Pascal statement const pi = 3.1416; the substring pi is a lexeme for the token identifier.

Patterns

A pattern is a rule describing the set of lexemes that can represent a particular token in the source program. The pattern for the keyword token const is just the single string const that spells out the keyword.

Certain language conventions affect the difficulty of lexical analysis. Languages such as FORTRAN require certain constructs to appear in fixed positions on the input line. Thus the alignment of a lexeme may be important in determining the correctness of a source program.

Attributes of Token

The lexical analyzer returns to the parser a representation of the token it has found. The representation is an integer code if the token is a simple construct such as a left parenthesis, comma, or colon. The representation is a pair consisting of an integer code and a pointer into a table if the token is a more complex element such as an identifier or constant.
The integer code gives the token type; the pointer points to the value of that token. Pairs are also returned whenever we wish to distinguish between instances of a token.

The attributes influence the translation of tokens.


i) Constant : value of the constant
ii) Identifiers: pointer to the corresponding symbol table entry.
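The pair representation described above can be sketched as follows. The integer codes and the flat symbol-table layout are illustrative assumptions:

```python
# Sketch of the <token-type, attribute> pair the lexer hands to the parser.
LPAREN, COMMA, IDENT, CONST = 1, 2, 3, 4   # integer codes for token types

symbol_table = []                          # holds values of idents/constants

def make_token(kind, lexeme=None):
    """Simple tokens carry just their code; identifiers and constants carry
    a pointer (here, an index) into the symbol table."""
    if kind in (IDENT, CONST):
        symbol_table.append(lexeme)
        return (kind, len(symbol_table) - 1)   # pair: code + table pointer
    return (kind, None)                        # bare integer code

t = make_token(CONST, "3.1416")
print(t, symbol_table[t[1]])
```

The parser works only with the codes; whenever it needs the actual value of an identifier or constant, it follows the pointer into the table.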

Error Recovery Strategies In Lexical Analysis


The following are the error-recovery actions in lexical analysis:
1) Deleting an extraneous character.
2) Inserting a missing character.
3) Replacing an incorrect character with a correct character.
4) Transposing two adjacent characters.
5) Panic mode recovery: deleting successive characters from the input until the error is resolved.
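Panic-mode recovery (action 5 above) can be sketched in a few lines. The synchronizing character set here is a hypothetical choice:

```python
# Sketch of panic-mode recovery: skip characters until a synchronizing
# character is found, then resume scanning from there.
def panic_mode(text, i, sync=frozenset(";)}")):
    while i < len(text) and text[i] not in sync:
        i += 1                       # delete successive characters
    return i                         # position at which scanning resumes

# Garbage "@#!" after the "=" is skipped up to the ";".
print(panic_mode("x = @#! ;", 4))
```

The method is crude but guarantees the scanner never loops forever on malformed input.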

SYNTACTIC ANALYSIS

When an input string (source code or a program in some language) is given to a compiler, the
compiler processes it in several phases,

starting from lexical analysis (scans the input and divides it into tokens) to target code
generation.

Syntax Analysis or Parsing is the second phase, i.e. after lexical analysis. It checks the
syntactical structure of the given input, i.e. whether the given input is in the correct syntax (of
the language in which the input has been written) or not.
It does so by building a data structure, called a Parse tree or Syntax tree.
The parse tree is constructed by using the pre-defined Grammar of the language and the input
string.
If the given input string can be produced with the help of the syntax tree (in the derivation process), the input string is in the correct syntax. If not, an error is reported by the syntax analyzer.
The main goal of syntax analysis is to create a parse tree or abstract syntax tree (AST) of the
source code, which is a hierarchical representation of the source code that reflects the
grammatical structure of the program.
Features of syntax analysis:

Syntax Trees: Syntax analysis creates a syntax tree, which is a hierarchical representation of the code’s structure.

The tree shows the relationship between the various parts of the code, including statements, expressions, and operators.

Context-Free Grammar: Syntax analysis uses a context-free grammar to define the syntax of the programming language.

A context-free grammar is a formal notation used to describe the structure of programming languages.
Top-Down and Bottom-Up Parsing: Syntax analysis can be performed using two main approaches: top-down parsing and bottom-up parsing.

Top-down parsing starts from the highest level of the syntax tree and works its way down, while bottom-up parsing starts from the lowest level and works its way up.
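Top-down parsing can be sketched as a recursive-descent parser. The toy grammar below (E → T "+" E | T, T → digit) and the nested-tuple parse-tree shape are illustrative assumptions:

```python
# A minimal recursive-descent (top-down) parser for the toy grammar
#   E -> T "+" E | T        T -> digit
def parse(tokens):
    tree, rest = parse_E(list(tokens))
    if rest:
        raise SyntaxError(f"unexpected input: {rest}")  # error detection
    return tree

def parse_E(toks):
    left, toks = parse_T(toks)
    if toks and toks[0] == "+":          # choose the E -> T "+" E production
        right, toks = parse_E(toks[1:])
        return ("+", left, right), toks
    return left, toks                    # otherwise E -> T

def parse_T(toks):
    if toks and toks[0].isdigit():
        return ("num", int(toks[0])), toks[1:]
    raise SyntaxError("digit expected")  # syntax error halts the parse

print(parse(["1", "+", "2", "+", "3"]))
```

Each function mirrors one non-terminal of the grammar, which is why the parser literally works from the top of the syntax tree downward.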

Error Detection: Syntax analysis is responsible for detecting syntax errors in the code.

If the code does not conform to the rules of the programming language, the parser will report
an error and halt the compilation process.
Intermediate Code Generation: Syntax analysis generates an intermediate
representation of the code, which is used by the subsequent phases of the compiler.

The intermediate representation is usually a more abstract form of the code, which is easier to
work with than the original source code.

Optimization: Syntax analysis can perform basic optimizations on the code, such as
removing redundant code and simplifying expressions.

Code Generation

Compiler design is an essential aspect of software engineering that enables us to translate high-level programming languages into machine-readable code.

One of the most important tasks in compiler design is code generation: transforming the intermediate code produced by the compiler into efficient machine code. It is a challenging process that requires a deep understanding of the target architecture and of the programming language being compiled.

Code generation can be considered the final phase of compilation.

It takes as input the intermediate representation (IR) produced by the front end of the compiler, along with the relevant symbol table information, and produces as output a semantically equivalent target program.

Position of code generator

The requirements imposed on a code generator are severe. The target program must preserve the semantic meaning of the source program and be of high quality; that is, it must make effective use of the available resources of the target machine.

After code generation, an optimization process can be applied to the code, but that can be seen as part of the code generation phase itself.

The code generated by the compiler is an object code of some lower-level programming language,

for example assembly language.

The lower-level object code that results from this transformation should have the following properties:


 It should carry the exact meaning of the source code.

 It should be efficient in terms of CPU usage and memory management.

A code generator has three primary tasks:

Instruction selection, register allocation and assignment, and instruction ordering.

Instruction selection involves choosing appropriate target-machine instructions to implement the

IR statements.

Register allocation and assignment involves deciding what values to keep in which registers.

Instruction ordering involves deciding in what order to schedule the execution of instructions.

Input to the Code Generator

The input to the code generator is the intermediate representation of the source program produced

by the front end, along with information in the symbol table that is used to determine the run-time

addresses of the data objects denoted by the names in the IR.

Intermediate representations include three-address representations such as quadruples, triples, and indirect triples; virtual machine representations such as postfix notation; and graphical representations such as abstract syntax trees (ASTs) and DAGs.
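As a concrete example, the quadruple form of the three-address code for a = b * c + d can be written out directly (the temporaries t1 and t2 are compiler-introduced names):

```python
# Quadruples for: a = b * c + d
# Each quadruple is (op, arg1, arg2, result).
quads = [
    ("*", "b", "c", "t1"),    # t1 = b * c
    ("+", "t1", "d", "t2"),   # t2 = t1 + d
    ("=", "t2", None, "a"),   # a  = t2
]
for q in quads:
    print(q)
```

Every quadruple names its result explicitly, which is what distinguishes this form from triples, where results are referred to by statement position instead.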


The Target Program

The architecture of target machine has a significant impact on the difficulty of constructing a good

code generator that produces high-quality machine code.

The most common target-machine architectures are reduced instruction set computer (RISC),

complex instruction set computer (CISC), and stack based.

A RISC machine typically has many registers, three-address instructions, simple addressing

modes, and a relatively simple instruction-set architecture.

In contrast, a CISC machine typically has few registers, two-address instructions, a variety of

addressing modes, several register classes, variable-length instructions and instructions with side

effects.

One example of a RISC architecture is the ARM architecture, which is commonly used in mobile

devices and embedded systems.

An example of a CISC architecture is the x86 architecture, which is commonly used in desktop and laptop computers.

In a stack-based machine, operations are done by pushing operands onto a stack and then

performing the operations on the operands at the top of the stack.

Stack-based machines almost disappeared because it was felt that the stack was too limiting and

required too many swap and copy operations.
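The push-then-operate behaviour of a stack machine can be sketched with a tiny interpreter. The instruction encoding (integers push themselves, "+" and "*" operate on the stack top) is an illustrative assumption:

```python
# Sketch of a stack-based machine evaluating postfix code for 2 + 3 * 4.
def run(code):
    stack = []
    for instr in code:
        if isinstance(instr, int):
            stack.append(instr)                 # push the operand
        else:
            b, a = stack.pop(), stack.pop()     # operate on the top two
            stack.append(a + b if instr == "+" else a * b)
    return stack[0]

print(run([2, 3, 4, "*", "+"]))   # postfix for 2 + 3 * 4
```

Note that no instruction names a register or memory address: all operands are implicit in the stack, which is exactly the limitation the paragraph above describes.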


Instruction Selection

The code generator takes the intermediate representation as input and converts it into the target machine’s instruction set.

A given IR statement can usually be implemented by many different instruction sequences, so it becomes the responsibility of the code generator to choose the appropriate instructions wisely.

The complexity of performing this mapping is determined by factors such as:

· The level of the Intermediate Representations

· The nature of the instruction-set architecture

· The desired quality of the generated code

The nature of the instruction set of the target machine has a strong effect on the difficulty of
instruction selection.

For example, the uniformity and completeness of the instruction set are important factors.
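A simple instruction-selection pass can be sketched as a statement-by-statement mapping from three-address IR to a pseudo-assembly. The mnemonics (LD, ADD, SUB, ST) and the single working register R1 are made-up assumptions, not a real instruction set:

```python
# Sketch: selecting target instructions for three-address IR statements.
# Each IR statement is (dest, src1, op, src2), e.g. x = y + z.
def select(ir):
    code = []
    for dest, a, op, b in ir:
        code.append(f"LD  R1, {a}")                       # load first operand
        code.append(f"{'ADD' if op == '+' else 'SUB'} R1, {b}")
        code.append(f"ST  {dest}, R1")                    # store the result
    return code

for line in select([("x", "y", "+", "z")]):
    print(line)
```

A naive one-statement-at-a-time mapping like this produces correct but redundant code (e.g. needless load/store pairs between adjacent statements), which is precisely why real selectors consider larger IR patterns at once.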

Register Allocation

A key problem in code generation is deciding what values to hold in what registers.

A program has a number of values to be maintained during the execution.


The target machine’s architecture may not allow all of these values to be kept in registers.

The code generator decides which values to keep in registers, and it also decides which registers to use to hold those values.

The use of registers is often subdivided into two subproblems:

1. Register allocation, during which we select the set of variables that will reside in registers at

each point in the program.

2. Register assignment, during which we pick the specific register that a variable will reside in.
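The two subproblems can be sketched with a deliberately crude heuristic: keep the most frequently used variables in the (few) registers and spill the rest to memory. Counting raw uses is a stand-in for real liveness analysis, and the register names are invented:

```python
from collections import Counter

# Toy register allocation and assignment.
def allocate(uses, num_regs=2):
    """uses: sequence of variable names in order of use."""
    by_freq = [v for v, _ in Counter(uses).most_common()]
    in_regs = by_freq[:num_regs]          # allocation: which values get registers
    assignment = {v: f"R{i}" for i, v in enumerate(in_regs)}  # assignment: which register
    spilled = by_freq[num_regs:]          # the rest live in memory
    return assignment, spilled

print(allocate(["a", "b", "a", "c", "a", "b"]))
```

With two registers, the most-used variables a and b stay in R0 and R1 while c is spilled; a real allocator would instead reason about which variables are live at each program point.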

Evaluation Order

Evaluation order is the order in which computations are performed. The code generator creates a schedule in which the instructions will execute; the choice matters because some orders require fewer registers to hold intermediate results than others.
