
MELON

A Project Report

Submitted in partial fulfilment of the requirements for the award of the degree of

B.TECH - COMPUTER SCIENCE ENGINEERING

Submitted by

KUHU RAWAT 2116208
MANASVI SHARMA 2116216
MUSKAN SONI 2116227
NANCY CHAHAL 2116228
NANDINI AGARWAL 2116229

Under the supervision of

Dr. MANISHA JAILIA


Associate Professor
Department of Computer Science


Banasthali Vidyapith
2023-24
CERTIFICATE

Certified that Group CS-16: Kuhu Rawat (2116208), Manasvi Sharma (2116216),
Muskan Soni (2116227), Nancy Chahal (2116228), and Nandini Agarwal (2116229) have
carried out the project work titled “MELON” from 1/9/23 to 1/4/24 for the award of the
degree of B.Tech (Computer Science) from Banasthali Vidyapith under my supervision. The
project embodies the result of original work and studies carried out by the students
themselves, and the contents of the project do not form the basis for the award of any other
degree to the candidates or anybody else.

Dr. MANISHA JAILIA

DESIGNATION: ASSOCIATE PROFESSOR

PLACE: BANASTHALI VIDYAPITH

DATE:
ABSTRACT

In the ever-evolving landscape of programming languages, Melon emerges as a beacon of
innovation and simplicity. Designed with a focus on accessibility, expressiveness, and
efficiency, Melon embodies a fresh paradigm in programming language design.

At its core, Melon prioritizes readability and ease of use without compromising on power and
versatility. Drawing inspiration from a diverse array of languages, Melon combines the best
features of its predecessors while introducing novel concepts and constructs.

Key features of Melon include a clean and intuitive syntax, a robust type system, and
seamless interoperability with existing libraries and frameworks. With its minimalist design
philosophy, programmers can quickly grasp and harness the full potential of Melon, enabling
rapid development and iteration.

Furthermore, Melon embraces modern programming paradigms such as functional and
object-oriented programming, providing developers with the flexibility to choose the
approach that best suits their needs. Additionally, its advanced compiler optimizations ensure
high performance and scalability across a wide range of applications.

In summary, Melon represents a new era in programming language development, offering a
refreshing alternative for novice and seasoned developers alike. With its simplicity,
expressiveness, and versatility, Melon paves the way for the next generation of software
innovation.
ACKNOWLEDGEMENT

We extend our heartfelt gratitude to the countless individuals and organizations whose
contributions and insights have shaped the development of Melon.

Special thanks to the programming language community for their invaluable feedback and
constructive criticism throughout the design and implementation process.

We are also indebted to the pioneers of programming language theory and practice whose
groundbreaking work continues to inspire and inform our endeavours.

Furthermore, we express our appreciation to the open-source community for their dedication
and support in fostering collaboration and sharing knowledge.

Last but not least, we acknowledge the unwavering commitment and tireless efforts of our
development team, whose passion and expertise have been instrumental in bringing Melon to
fruition.

Together, we celebrate the collective effort and commitment to innovation that has
culminated in the creation of Melon.

INTRODUCTION

Welcome to Melon, an object-oriented programming language meticulously designed for
Windows environments. Melon offers a seamless development experience tailored to
simplify software creation while harnessing the full potential of the Windows platform. With
its intuitive syntax and robust feature set, Melon is poised to become your trusted companion
for building sophisticated Windows applications.
Welcome to the exciting world of Melon, a cutting-edge programming language designed to
revolutionize the way we write code.

In today's fast-paced digital landscape, the demand for simplicity, efficiency, and versatility
in software development has never been greater. Enter Melon: a language crafted with
precision and passion to meet the evolving needs of developers worldwide.

With Melon, we embark on a journey of exploration and innovation, where clarity and
expressiveness reign supreme. Whether you're a seasoned programmer or just starting your
coding adventure, Melon empowers you to unleash your creativity and bring your ideas to
life with ease.

Join us as we delve into the features, capabilities, and endless possibilities of Melon.
Together, let's embark on a transformative journey into the future of programming.
Requirement Analysis (SRS)

Specific Requirements
1. Interface
● User Interface:
The compiler will be operated through a command-line interface (CLI) to ensure simplicity
and flexibility in its usage. Users will interact with the compiler by executing commands in a
terminal or command prompt.
The following command-line arguments and options will be supported:
- `--input <file>`: Specifies the source code file to be compiled.
- `--output <file>`: Specifies the name of the output file generated after compilation.
- `--optimization <level>`: Enables optimization, with levels ranging from 0 (none) to 3 (maximum).
- `--target <architecture>`: Sets the target architecture for the compiled code (e.g., x86, ARM).
In the event of incorrect command or option usage, the compiler will provide informative
error messages to guide the user towards correct usage. Representative error names and
message formats are described under Error Handling below.
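As an illustration of this interface, the following minimal C++ sketch shows how the compiler driver might collect these options. The option names follow the list above; the parsing loop, defaults, and messages are illustrative assumptions rather than a fixed design.

#include <iostream>
#include <map>
#include <string>

int main(int argc, char* argv[]) {
    // Illustrative defaults; each option is expected to be followed by its value.
    std::map<std::string, std::string> opts = {
        {"--optimization", "0"}, {"--target", "x86"}};
    for (int i = 1; i + 1 < argc; i += 2) {
        std::string flag = argv[i];
        if (flag == "--input" || flag == "--output" ||
            flag == "--optimization" || flag == "--target") {
            opts[flag] = argv[i + 1];
        } else {
            std::cerr << "Error: unknown option '" << flag << "'\n";
            return 1;
        }
    }
    if (!opts.count("--input")) {
        std::cerr << "Error: no input file given (use --input <file>)\n";
        return 1;
    }
    std::cout << "Compiling " << opts["--input"] << " for " << opts["--target"]
              << " at optimization level " << opts["--optimization"] << "\n";
}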
● Language Syntax and Semantics
The programming language will follow a C-style syntax, including features like variables,
loops,
conditionals, functions, and data types. A detailed language specification document will be
provided to users, outlining the syntax rules and expected behavior of each language
construct.
The semantics of language constructs will be explained in terms of their expected effects on
program state.
● Error Handling
The compiler will implement robust error handling mechanisms. It will detect and report
various
types of errors, including syntax errors, semantic errors, and linker errors.
Syntax errors will be reported with specific line and column numbers along with a descriptive
error message. For example:
Error: Syntax error at line 10, column 20: Unexpected token '}'.
Semantic errors will include informative messages to help users identify and correct issues
related to variable types, function calls, and other semantic rules.
2. Databases
● Symbol Table
The compiler will maintain a symbol table as a data structure to keep track of variables,
functions, and their associated attributes during the compilation process. The symbol table
will
be implemented as a hash table for efficient look-up and storage.
For example, when encountering a variable declaration:
int x = 5;
The symbol table will record the entry `x` with its type `int` and value `5`.
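A minimal C++ sketch of such a hash-table-backed symbol table is given below. The entry layout (a type string and an optional initial value) is an illustrative assumption; a real implementation would also track scopes, addresses, and other attributes.

#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>

struct SymbolInfo {
    std::string type;                  // e.g. "int"
    std::optional<std::string> value;  // e.g. "5", if initialized
};

class SymbolTable {
    std::unordered_map<std::string, SymbolInfo> table_;  // hash table for efficient look-up
public:
    bool declare(const std::string& name, SymbolInfo info) {
        // insert() fails if the name is already declared, flagging a redeclaration
        return table_.insert({name, std::move(info)}).second;
    }
    const SymbolInfo* lookup(const std::string& name) const {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : &it->second;
    }
};

int main() {
    SymbolTable symbols;
    symbols.declare("x", {"int", "5"});  // from the declaration: int x = 5;
    if (const SymbolInfo* s = symbols.lookup("x"))
        std::cout << "x : " << s->type << " = " << *s->value << "\n";
}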
● Intermediate Representation (IR)
The compiler will utilize a three-address code-based intermediate representation (IR) for
efficient code generation and optimization. This IR will be designed to facilitate various
optimization passes, such as constant folding, dead code elimination, and loop optimization.
For example, the following C code:
int result = a + b;
Will be represented in the IR as:
t1 = a + b
result = t1
3. Performance
● Compilation Time
The compiler will aim to maintain reasonable compilation times across different sizes of
source
code. The maximum acceptable compilation time for a typical program (approximately 1000
lines of code) will be set at 5 seconds on standard hardware.
● Execution Time
The compiled programs will exhibit competitive runtime performance comparable to other
widely
used compilers for the target architecture. Efforts will be made to implement standard
optimization techniques, such as inlining and loop unrolling, to improve execution speed.
4. Software System Attributes
● Portability
The compiler will initially target the x86 architecture and will be designed with portability in
mind.
Future versions may extend support to additional architectures based on user demand and
project resources.
● Maintainability
To ensure maintainability, the codebase will follow established coding conventions and will
be
extensively documented. Version control using Git will be employed, and a clear branching
and
merging strategy will be defined to manage code updates and releases.
● Security
The compiler will implement security measures to mitigate potential vulnerabilities. This
includes
rigorous input validation to prevent buffer overflows or injection attacks. Regular security
audits
and code reviews will be conducted to identify and address any potential security risks.
CODING
MAIN MODULES

ASSEMBLER CODE
PARSING CODE
TOKENIZER
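The full module listings are not reproduced here; as an indication of the tokenizer module's structure, the following is a minimal C++ sketch assuming a C-style token set (keywords, identifiers, integer literals, and single-character symbols). It is a simplified illustration of the approach, not the module itself.

#include <cctype>
#include <iostream>
#include <string>
#include <vector>

struct Token { std::string type, lexeme; };

std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (std::isspace((unsigned char)c)) { ++i; continue; }  // skip whitespace
        if (std::isalpha((unsigned char)c) || c == '_') {       // identifier or keyword
            size_t j = i;
            while (j < src.size() &&
                   (std::isalnum((unsigned char)src[j]) || src[j] == '_')) ++j;
            std::string word = src.substr(i, j - i);
            bool kw = (word == "int" || word == "if" ||
                       word == "while" || word == "return");    // sample keyword set
            tokens.push_back({kw ? "KEYWORD" : "IDENTIFIER", word});
            i = j;
        } else if (std::isdigit((unsigned char)c)) {            // integer literal
            size_t j = i;
            while (j < src.size() && std::isdigit((unsigned char)src[j])) ++j;
            tokens.push_back({"NUMBER", src.substr(i, j - i)});
            i = j;
        } else {                                                // operator / separator
            tokens.push_back({"SYMBOL", std::string(1, c)});
            ++i;
        }
    }
    return tokens;
}

int main() {
    for (const Token& t : tokenize("int a = b + 42;"))
        std::cout << t.type << " '" << t.lexeme << "'\n";
}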
METHODOLOGY

A compiler has two phases, namely the analysis phase and the synthesis phase. The
analysis phase creates an intermediate representation from the given source code, and the
synthesis phase creates an equivalent target program from the intermediate representation.

A compiler is a software program that converts the high-level source code written in a
programming language into low-level machine code that can be executed by the computer
hardware. The process of converting the source code into machine code involves several
phases or stages, which are collectively known as the phases of a compiler.

The typical phases of a compiler are:

1. Lexical Analysis: The first phase of a compiler is lexical analysis, also known as
scanning. This phase reads the source code and breaks it into a stream of tokens,
which are the basic units of the programming language. The tokens are then passed on
to the next phase for further processing.
2. Syntax Analysis: The second phase of a compiler is syntax analysis, also known as
parsing. This phase takes the stream of tokens generated by the lexical analysis phase
and checks whether they conform to the grammar of the programming language. The
output of this phase is usually an Abstract Syntax Tree (AST).
3. Semantic Analysis: The third phase of a compiler is semantic analysis. This phase
checks whether the code is semantically correct, i.e., whether it conforms to the
language’s type system and other semantic rules. In this stage, the compiler checks
the meaning of the source code to ensure that it makes sense. The compiler performs
type checking, which ensures that variables are used correctly and that operations are
performed on compatible data types. The compiler also checks for other semantic
errors, such as undeclared variables and incorrect function calls.
4. Intermediate Code Generation: The fourth phase of a compiler is intermediate code
generation. This phase generates an intermediate representation of the source code
that can be easily translated into machine code.
5. Optimization: The fifth phase of a compiler is optimization. This phase applies
various optimization techniques to the intermediate code to improve the performance
of the generated machine code.
6. Code Generation: The final phase of a compiler is code generation. This phase takes
the optimized intermediate code and generates the actual machine code that can be
executed by the target hardware.
Symbol Table – It is a data structure used and maintained by the compiler, consisting
of all the identifiers' names along with their types. It helps the compiler function smoothly
by finding the identifiers quickly.

The analysis of a source program is divided into mainly three phases. They are:
1. Linear Analysis-
This involves a scanning phase where the stream of characters is read from left to
right. It is then grouped into various tokens having a collective meaning.
2. Hierarchical Analysis-
In this analysis phase, based on a collective meaning, the tokens are categorized
hierarchically into nested groups.
3. Semantic Analysis-
This phase is used to check whether the components of the source program are
meaningful or not.

The compiler has two modules, namely the front end and the back end. The front end
comprises the lexical analyzer, syntax analyzer, semantic analyzer, and intermediate code
generator, and the remaining phases form the back end.

Lexical Analyzer –
Lexical analysis is the first phase of the compiler, also known as scanning. It converts the
high-level input program into a sequence of tokens.
Lexical analysis can be implemented with a deterministic finite automaton (DFA).
The output is a sequence of tokens that is sent to the parser for syntax analysis.

What is a Token?
A lexical token is a sequence of characters that can be treated as a unit in the grammar of the
programming languages. Example of tokens:
· Keywords; examples: for, while, if, return, etc.
· Identifiers; examples: variable names, function names, etc.
· Constants; examples: integer and real number literals.
· Operators; examples: '+', '++', '-', etc.
· Separators; examples: ',', ';', etc.
Examples of non-tokens:
· Comments, preprocessor directives, macros, blanks, tabs, newlines, etc.
Lexeme: The sequence of characters matched by a pattern to form the corresponding token,
i.e., the sequence of input characters that comprises a single token, is called a lexeme, e.g.,
“float”, “abs_zero_Kelvin”, “=”, “-”, “273”, “;”.

How Does a Lexical Analyzer Work?

1. Input preprocessing: This stage involves cleaning up the input text and preparing it for
lexical analysis. This may include removing comments, whitespace, and other
non-essential characters from the input text.
2. Tokenization: This is the process of breaking the input text into a sequence of tokens.
This is usually done by matching the characters in the input text against a set of
patterns or regular expressions that define the different types of tokens.
3. Token classification: In this stage, the lexer determines the type of each token. For
example, in a programming language, the lexer might classify keywords, identifiers,
operators, and punctuation symbols as separate token types.
4. Token validation: In this stage, the lexer checks that each token is valid according to
the rules of the programming language. For example, it might check that a variable
name is a valid identifier, or that an operator has the correct syntax.
5. Output generation: In this final stage, the lexer generates the output of the lexical
analysis process, which is typically a list of tokens. This list of tokens can then be
passed to the next stage of compilation or interpretation.

· The lexical analyzer identifies errors with the help of the automaton and the grammar
of the given language on which it is based (like C or C++), and gives the row number
and column number of the error.

Advantages

1. Simplifies Parsing: Breaking down the source code into tokens makes it easier for
computers to understand and work with the code. This helps programs like compilers
or interpreters to figure out what the code is supposed to do. It’s like breaking down a
big puzzle into smaller pieces, which makes it easier to put together and solve.

2. Error Detection: Lexical analysis will detect lexical errors such as misspelled
keywords or undefined symbols early in the compilation process. This helps in
improving the overall efficiency of the compiler or interpreter by identifying errors
sooner rather than later.

3. Efficiency: Once the source code is converted into tokens, subsequent phases of
compilation or interpretation can operate more efficiently. Parsing and semantic
analysis become faster and more streamlined when working with tokenized input.

Disadvantages
1. Limited Context: Lexical analysis operates based on individual tokens and does not
consider the overall context of the code. This can sometimes lead to ambiguity or
misinterpretation of the code’s intended meaning especially in languages with
complex syntax or semantics.

2. Overhead: Although lexical analysis is necessary for the compilation or
interpretation process, it adds an extra layer of overhead. Tokenizing the source code
requires additional computational resources, which can impact the overall performance
of the compiler or interpreter.

3. Debugging Challenges: Lexical errors detected during the analysis phase may not
always provide clear indications of their origins in the original source code.
Debugging such errors can be challenging especially if they result from subtle
mistakes in the lexical analysis process.

Syntax Analyzer

When an input string (source code or a program in some language) is given to a compiler, the
compiler processes it in several phases, starting from lexical analysis (scans the input and
divides it into tokens) to target code generation.
Syntax Analysis or Parsing is the second phase, i.e. after lexical analysis. It checks the
syntactical structure of the given input, i.e. whether the given input is in the correct syntax
(of the language in which the input has been written) or not. It does so by building a data
structure called a parse tree or syntax tree. The parse tree is constructed by using the
predefined grammar of the language and the input string. If the given input string can be
produced with the help of the syntax tree (in the derivation process), the input string is in
the correct syntax; if not, an error is reported by the syntax analyzer.
Syntax analysis, also known as parsing, is a process in compiler design where the compiler
checks if the source code follows the grammatical rules of the programming language. This is
typically the second stage of the compilation process, following lexical analysis.
The main goal of syntax analysis is to create a parse tree or abstract syntax tree (AST) of the
source code, which is a hierarchical representation of the source code that reflects the
grammatical structure of the program.
There are several types of parsing algorithms used in syntax analysis, including:
· LL parsing: This is a top-down parsing algorithm that starts with the root of the
parse tree and constructs the tree by successively expanding non-terminals. LL
parsing is known for its simplicity and ease of implementation.
· LR parsing: This is a bottom-up parsing algorithm that starts with the leaves of the
parse tree and constructs the tree by successively reducing substrings of the input to
non-terminals. LR parsing is more powerful than LL parsing and can handle a larger
class of grammars.
· LR(1) parsing: This is a variant of LR parsing that uses one symbol of lookahead to
disambiguate the grammar.
· LALR parsing: This is a variant of LR parsing that uses a reduced set of lookahead
symbols to reduce the number of states in the LR parser.
Once the parse tree is constructed, the compiler can perform semantic analysis to check if
the source code makes sense and follows the semantics of the programming language. The
parse tree or AST can also be used in the code generation phase of the compiler design to
generate intermediate code or machine code.

Features of syntax analysis:

Syntax Trees: Syntax analysis creates a syntax tree, which is a hierarchical representation of
the code’s structure. The tree shows the relationship between the various parts of the code,
including statements, expressions, and operators.

Context-Free Grammar: Syntax analysis uses context-free grammar to define the syntax of
the programming language. Context-free grammar is a formal language used to describe the
structure of programming languages.

Top-Down and Bottom-Up Parsing: Syntax analysis can be performed using two main
approaches: top-down parsing and bottom-up parsing. Top-down parsing starts from the
highest level of the syntax tree and works its way down, while bottom-up parsing starts from
the lowest level and works its way up.

Error Detection: Syntax analysis is responsible for detecting syntax errors in the code. If the
code does not conform to the rules of the programming language, the parser will report an
error and halt the compilation process.

Intermediate Code Generation: Syntax analysis generates an intermediate representation of


the code, which is used by the subsequent phases of the compiler. The intermediate
representation is usually a more abstract form of the code, which is easier to work with than
the original source code.

Optimization: Syntax analysis can perform basic optimizations on the code, such as
removing redundant code and simplifying expressions.
A pushdown automaton (PDA) is used to design the syntax analysis phase.

The grammar for a language consists of production rules.

Example: Suppose the production rules for the grammar of a language are:

S -> cAd
A -> bc|a
And the input string is “cad”.
Now the parser attempts to construct a syntax tree from this grammar for the given input
string. It uses the given production rules and applies them as needed to derive the string:
(i) S   (ii) S => cAd   (iii) S => cAd => cbcd   (iv) S => cAd => cad
In step (iii) above, the production rule A -> bc was not a suitable one to apply (because the
string produced is “cbcd”, not “cad”), so the parser needs to backtrack and apply the other
production rule available for A, as shown in step (iv), and the string “cad” is produced.
Thus, the given input can be produced by the given grammar, therefore the input is correct
in syntax. But backtracking was needed to get the correct syntax tree, which is a complex
process to implement.
An easier way to avoid this backtracking uses the concepts of FIRST and FOLLOW sets
from predictive parsing.
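The backtracking just described can be made concrete with a minimal recursive-descent parser for this grammar, sketched below in C++ (the function names are our own). For the input "cad" it first tries A -> bc, fails, backtracks, and succeeds with A -> a.

#include <iostream>
#include <string>

const std::string input = "cad";

// Consume character c at position pos if it matches.
bool match(size_t& pos, char c) {
    if (pos < input.size() && input[pos] == c) { ++pos; return true; }
    return false;
}

bool parseA(size_t& pos) {
    size_t save = pos;
    if (match(pos, 'b') && match(pos, 'c')) return true;  // try A -> b c
    pos = save;                                           // backtrack
    return match(pos, 'a');                               // try A -> a
}

bool parseS(size_t& pos) {
    return match(pos, 'c') && parseA(pos) && match(pos, 'd');  // S -> c A d
}

int main() {
    size_t pos = 0;
    bool ok = parseS(pos) && pos == input.size();
    std::cout << (ok ? "accepted" : "rejected") << "\n";  // prints: accepted
}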

Advantages:

Advantages of using syntax analysis in compiler design include:
· Structural validation: Syntax analysis allows the compiler to check if the source
code follows the grammatical rules of the programming language, which helps to
detect and report errors in the source code.
· Improved code generation: Syntax analysis can generate a parse tree or abstract
syntax tree (AST) of the source code, which can be used in the code generation phase
of the compiler design to generate more efficient and optimized code.
· Easier semantic analysis: Once the parse tree or AST is constructed, the compiler
can perform semantic analysis more easily, as it can rely on the structural information
provided by the parse tree or AST.

Disadvantages:

Disadvantages of using syntax analysis in compiler design include:
· Complexity: Parsing is a complex process, and the quality of the parser can greatly
impact the performance of the resulting code. Implementing a parser for a complex
programming language can be a challenging task, especially for languages with
ambiguous grammars.
· Reduced performance: Syntax analysis can add overhead to the compilation process,
which can reduce the performance of the compiler.
· Limited error recovery: Syntax analysis algorithms may not be able to recover from
errors in the source code, which can lead to incomplete or incorrect parse trees and
make it difficult for the compiler to continue the compilation process.
· Inability to handle all languages: Not all languages have formal grammars, and some
languages may not be easily parseable.
· Overall, syntax analysis is an important stage in the compiler design process, but it
should be balanced against the goals and constraints of the rest of the compiler.

Semantic Analyzer –

Semantic analysis is the third phase of the compiler. It makes sure that the declarations and
statements of the program are semantically correct. It is a collection of procedures called by
the parser as and when required by the grammar. Both the syntax tree from the previous
phase and the symbol table are used to check the consistency of the given code. Type
checking is an important part of semantic analysis, where the compiler makes sure that each
operator has matching operands.
It uses the syntax tree and symbol table to check whether the given program is semantically
consistent with the language definition. It gathers type information and stores it in either the
syntax tree or the symbol table. This type information is subsequently used by the compiler
during intermediate-code generation.

Semantic Errors:
Errors recognized by semantic analyzer are as follows:
· Type mismatch
· Undeclared variables
· Reserved identifier misuse

Functions of Semantic Analysis:

1. Type Checking –
Ensures that data types are used in a way consistent with their definition.
2. Label Checking –
Ensures that every label referenced in the program is defined.
3. Flow Control Check –
Checks that control structures are used in a proper manner (example: no break
statement outside a loop).

Example:

float x = 10.1;

float y = x*30;

In the above example, the integer 30 will be type-cast to float 30.0 before multiplication by
the semantic analyzer.
The semantic analyzer verifies whether the parse tree is meaningful or not and produces a
verified parse tree. It also does type checking, label checking, and flow control checking.
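A minimal C++ sketch of the type-checking step is given below, assuming for illustration an expression AST with only int and float types and the int-to-float promotion rule from the example above; a mismatch is reported as a semantic error. The node layout is our own illustrative choice.

#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>

struct Expr {
    std::string op;                // "+", "*", ... or "" for a leaf
    std::string leafType;          // "int" or "float" when op is empty
    std::unique_ptr<Expr> lhs, rhs;
};

std::string typeOf(const Expr& e) {
    if (e.op.empty()) return e.leafType;
    std::string l = typeOf(*e.lhs), r = typeOf(*e.rhs);
    if (l == r) return l;
    if ((l == "int" && r == "float") || (l == "float" && r == "int"))
        return "float";            // implicit int -> float promotion
    throw std::runtime_error("type mismatch: " + l + " " + e.op + " " + r);
}

int main() {
    // y = x * 30, where x is a float and 30 is an int literal
    Expr x{"", "float", nullptr, nullptr};
    Expr thirty{"", "int", nullptr, nullptr};
    Expr mul{"*", "", std::make_unique<Expr>(std::move(x)),
                      std::make_unique<Expr>(std::move(thirty))};
    std::cout << "type of x * 30 is " << typeOf(mul) << "\n";  // prints: float
}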

Intermediate Code Generator –

In the analysis-synthesis model of a compiler, the front end of a compiler translates a source
program into an independent intermediate code, then the back end of the compiler uses this
intermediate code to generate the target code (which can be understood by the machine).
The benefits of using machine-independent intermediate code are:

Portability is enhanced. For example, if a compiler translated the source language
directly to its target machine language without the option of generating intermediate
code, then for each new machine a full native compiler would be required, because
the compiler itself would need modifications according to the machine specifications.
Retargeting is facilitated.
It is easier to apply source code modification to improve the performance of
source code by optimizing the intermediate code.

If we generated machine code directly from source code, then for n target machines we
would need n optimizers and n code generators, but with a machine-independent
intermediate code we need only one optimizer. Intermediate code can be either
language-specific (e.g., bytecode for Java) or language-independent (three-address code).
The following are commonly used intermediate code representations:
1. Postfix Notation:
Also known as reverse Polish notation or suffix notation.
In the infix notation, the operator is placed between operands, e.g., a + b.
Postfix notation positions the operator at the right end, as in ab +.
For any postfix expressions e1 and e2 with a binary operator (+), applying the
operator yields e1 e2 +.
Postfix notation eliminates the need for parentheses, as the operator's position
and arity allow unambiguous decoding of the expression.
In postfix notation, the operator consistently follows the operands.
Example 1: The postfix representation of the expression (a + b) * c is : ab + c *

Example 2: The postfix representation of the expression (a - b) * (c + d) + (a - b) is:

ab - cd + * ab - +
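The conversion from infix to postfix can be sketched with the classic operator-precedence (shunting-yard) method, shown below in C++ for single-letter operands; it reproduces both examples above.

#include <cctype>
#include <iostream>
#include <stack>
#include <string>

int prec(char op) { return (op == '*' || op == '/') ? 2 : 1; }

std::string toPostfix(const std::string& infix) {
    std::string out;
    std::stack<char> ops;
    for (char c : infix) {
        if (std::isalpha((unsigned char)c)) out += c;     // operand goes straight out
        else if (c == '(') ops.push(c);
        else if (c == ')') {                              // pop until the matching '('
            while (ops.top() != '(') { out += ops.top(); ops.pop(); }
            ops.pop();
        } else if (c == '+' || c == '-' || c == '*' || c == '/') {
            while (!ops.empty() && ops.top() != '(' && prec(ops.top()) >= prec(c)) {
                out += ops.top(); ops.pop();              // pop higher/equal precedence
            }
            ops.push(c);
        }
    }
    while (!ops.empty()) { out += ops.top(); ops.pop(); }
    return out;
}

int main() {
    std::cout << toPostfix("(a+b)*c") << "\n";            // prints: ab+c*
    std::cout << toPostfix("(a-b)*(c+d)+(a-b)") << "\n";  // prints: ab-cd+*ab-+
}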

2. Three-Address Code:
A three address statement involves a maximum of three references,
consisting of two for operands and one for the result.
A sequence of three address statements collectively forms a three address
code.
The typical form of a three address statement is expressed as x = y op z, where
x, y, and z represent memory addresses.
Each variable (x, y, z) in a three address statement is associated with a specific memory
location.

While a standard three address statement includes three references, there are instances where
a statement may contain fewer than three references, yet it is still categorized as a three
address statement.
Example: The three address code for the expression a + b * c + d is:

T1 = b * c
T2 = a + T1
T3 = T2 + d

T1, T2, T3 are temporary variables.
There are 3 ways to represent a Three-Address Code in compiler design:
i) Quadruples
ii) Triples
iii) Indirect Triples
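A minimal C++ sketch of generating three-address code from an expression tree is given below; it emits one temporary per interior node and reproduces the T1/T2/T3 sequence for a + b * c + d shown above. The node layout and helper names are illustrative assumptions.

#include <iostream>
#include <memory>
#include <string>

struct Node {
    std::string value;             // operator for interior nodes, name for leaves
    std::unique_ptr<Node> lhs, rhs;
};

// Emit TAC for the subtree and return the name holding its result.
std::string gen(const Node& n, int& temp) {
    if (!n.lhs) return n.value;    // leaf: a variable or constant
    std::string l = gen(*n.lhs, temp);
    std::string r = gen(*n.rhs, temp);
    std::string t = "T" + std::to_string(++temp);
    std::cout << t << " = " << l << " " << n.value << " " << r << "\n";
    return t;
}

std::unique_ptr<Node> leaf(std::string v) {
    return std::unique_ptr<Node>(new Node{std::move(v), nullptr, nullptr});
}
std::unique_ptr<Node> op(std::string o, std::unique_ptr<Node> l,
                         std::unique_ptr<Node> r) {
    return std::unique_ptr<Node>(new Node{std::move(o), std::move(l), std::move(r)});
}

int main() {
    // a + b * c + d, parsed as ((a + (b * c)) + d)
    auto tree = op("+", op("+", leaf("a"), op("*", leaf("b"), leaf("c"))), leaf("d"));
    int temp = 0;
    gen(*tree, temp);  // prints: T1 = b * c, T2 = a + T1, T3 = T2 + d
}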

3. Syntax Tree:
· A syntax tree serves as a condensed representation of a parse tree.
· The operator and keyword nodes present in the parse tree are relocated to become
part of their respective parent nodes in the syntax tree; the internal nodes are
operators and the child nodes are operands.
· Creating a syntax tree involves strategically placing parentheses within the
expression. This technique contributes to a more intuitive representation, making
it easier to discern the sequence in which operands should be processed.
· The syntax tree not only condenses the parse tree but also offers an improved
visual representation of the program’s syntactic structure,
Example: x = (a + b * c) / (a – b * c)

Advantages of Intermediate Code Generation:


Easier to implement: Intermediate code generation can simplify the code generation process
by reducing the complexity of the input code, making it easier to implement.
Facilitates code optimization: Intermediate code generation can enable the use of various
code optimization techniques, leading to improved performance and efficiency of the
generated code.
Platform independence: Intermediate code is platform-independent, meaning that it can be
translated into machine code or bytecode for any platform.
Code reuse: Intermediate code can be reused in the future to generate code for other
platforms or languages.
Easier debugging: Intermediate code can be easier to debug than machine code or bytecode,
as it is closer to the original source code.
Disadvantages of Intermediate Code Generation:
Increased compilation time: Intermediate code generation can significantly increase the
compilation time, making it less suitable for real-time or time-critical applications.
Additional memory usage: Intermediate code generation requires additional memory to
store the intermediate representation, which can be a concern for memory-limited systems.
Increased complexity: Intermediate code generation can increase the complexity of the
compiler design, making it harder to implement and maintain.
Reduced performance: The process of generating intermediate code can result in code that
executes slower than code generated directly from the source code.

Code Optimizer

The code optimization in the synthesis phase is a program transformation technique which
tries to improve the intermediate code by making it consume fewer resources (i.e. CPU,
memory) so that faster-running machine code will result. The compiler optimizing process
should meet the following objectives:
· The optimization must be correct, it must not, in any way, change the meaning of the
program.
· Optimization should increase the speed and performance of the program.
· The compilation time must be kept reasonable.
· The optimization process should not delay the overall compiling process.

When to Optimize?

Optimization of the code is often performed at the end of the development stage since it
reduces readability and adds code that is used to increase the performance.

Why Optimize?

Optimizing an algorithm is beyond the scope of the code optimization phase; instead, the
generated program is optimized, which may also involve reducing the size of the code.
Optimization helps to:
· Reduce the space consumed and increase the speed of compilation.
· Manually analyzing datasets involves a lot of time; hence we make use of software
like Tableau for data analysis. Similarly, manually performing the optimization is
tedious and is better done using a code optimizer.
· An optimized code often promotes re-usability.
Types of Code Optimization: The optimization process can be broadly classified into two
types :
1. Machine Independent Optimization: This code optimization phase attempts to
improve the intermediate code to get a better target code as the output. The part of
the intermediate code which is transformed here does not involve any CPU registers
or absolute memory locations.
2. Machine Dependent Optimization: Machine-dependent optimization is done after
the target code has been generated and when the code is transformed according to the
target machine architecture. It involves CPU registers and may have absolute memory
references rather than relative references. Machine-dependent optimizers put effort into
taking maximum advantage of the memory hierarchy.

Advantages of Code Optimization:

Improved performance: Code optimization can result in code that executes faster and uses
fewer resources, leading to improved performance.

Reduction in code size: Code optimization can help reduce the size of the generated code,
making it easier to distribute and deploy.

Increased portability: Code optimization can result in code that is more portable across
different platforms, making it easier to target a wider range of hardware and software.

Reduced power consumption: Code optimization can lead to code that consumes less
power, making it more energy-efficient.

Improved maintainability: Code optimization can result in code that is easier to understand
and maintain, reducing the cost of software maintenance.
Disadvantages of Code Optimization:

Increased compilation time: Code optimization can significantly increase the compilation
time, which can be a significant drawback when developing large software systems.

Increased complexity: Code optimization can result in more complex code, making it harder
to understand and debug.

Potential for introducing bugs: Code optimization can introduce bugs into the code if not
done carefully, leading to unexpected behavior and errors.

Difficulty in assessing the effectiveness: It can be difficult to determine the effectiveness of
code optimization, making it hard to justify the time and resources spent on the process.

Target Code Generator

Target code generation is the final Phase of Compiler.

1. Input : Optimized Intermediate Representation.

2. Output : Target Code.

3. Task Performed : Register allocation methods and optimization, assembly-level code.

4. Method : Three popular strategies for register allocation and optimization.

5. Implementation : Algorithms.

Target code generation deals with assembly language to convert optimized code into a
machine-understandable format. Target code can be machine-readable code or assembly
code. Each line in the optimized code may map to one or more lines in machine (or
assembly) code, hence there is a 1:N mapping associated with them.
1 : N Mapping

Computations are generally assumed to be performed on high-speed memory locations,
known as registers. Performing various operations on registers is efficient as registers are
faster than cache memory. This feature is effectively used by compilers; however, registers
are not available in large numbers and they are costly. Therefore we should try to use the
minimum number of registers to incur an overall low cost.

Advantages :

· Fast accessible storage
· Allows computations to be performed on them
· Deterministic as it incurs no miss
· Reduces memory traffic
· Reduces overall computation time

Disadvantages :

· Registers are generally available only in small amounts (up to a few hundred kilobytes)
· Register sizes are fixed and vary from one processor to another
· Registers are complicated to manage
· Changes must be saved and restored during context switches and procedure calls
All these six phases are associated with the symbol table manager and the error handler.

RESULT AND DISCUSSION :

Suppose we pass the statement a = b + c; through the lexical analyzer.

It will generate a token sequence like this: id = id + id ;

where each id refers to its variable in the symbol table, referencing all details. For example,
consider the program:

int main()
{
    // 2 variables
    int a, b;
    a = 10;
    return 0;
}

All the valid tokens are:

'int' 'main' '(' ')' '{' 'int' 'a' ',' 'b' ';'

'a' '=' '10' ';' 'return' '0' ';' '}'

Above are the valid tokens. You can observe that we have omitted comments. As another
example, consider a printf statement such as printf("hello"); there are 5 valid tokens in this
printf statement: printf, (, "hello", ), and ;.


Exercise 1: Count the number of tokens:

int main()
{
    int a = 10, b = 20;
    printf("sum is:%d", a + b);
    return 0;
}

Answer: Total number of tokens: 27.

Exercise 2: Count number of tokens: int max(int i);

· The lexical analyzer first reads int, finds it valid, and accepts it as a token.
· max is read by it and found to be a valid function name after reading (.
· int is also a token, then i is another token, and finally ;.

Answer: Total number of tokens: 7:

int, max, (, int, i, ), ;

We can represent the statement while (a >= b) a = a - 2; in the form of lexemes and tokens
as under:

Lexemes    Tokens         Lexemes    Tokens

while      WHILE          a          IDENTIFIER
(          LPAREN         =          ASSIGNMENT
a          IDENTIFIER     a          IDENTIFIER
>=         COMPARISON     -          ARITHMETIC
b          IDENTIFIER     2          INTEGER
)          RPAREN         ;          SEMICOLON

Parse Tree:
· Parse tree is the hierarchical representation of terminals or non-terminals.
· These symbols (terminals or non-terminals) represent the derivation of the grammar
to yield input strings.
· In parsing, the string is derived using the start symbol.
· The starting symbol of the grammar must be used as the root of the Parse Tree.
· Leaves of parse tree represent terminals.
· Each interior node represents productions of a grammar.

Rules to Draw a Parse Tree:


1. All leaf nodes need to be terminals.
2. All interior nodes need to be non-terminals.
3. In-order traversal gives the original input string.

Example 1: Let us take an example of Grammar (Production Rules).

S -> sAB

A -> a

B -> b

For the input string “sab”, the parse tree has root S with children s, A, and B, where A
derives a and B derives b.

Example 2: Let us take another example of Grammar (Production Rules).


S -> AB

A -> c/aA

B -> d/bB

For the input string “acbd”, A derives “ac” (A -> aA -> ac) and B derives “bd”
(B -> bB -> bd) in the parse tree.

We have learnt how a parser constructs parse trees in the syntax analysis phase. The plain
parse tree constructed in that phase is generally of no use for a compiler, as it does not carry
any information about how to evaluate the tree. The productions of the context-free grammar,
which make up the rules of the language, do not say how to interpret them.

For example

E→E+T

The above CFG production has no semantic rule associated with it, and it cannot help in
making any sense of the production.

Semantics

Semantics of a language provide meaning to its constructs, like tokens and syntax structure.
Semantics help interpret symbols, their types, and their relations with each other. Semantic
analysis judges whether the syntax structure constructed in the source program derives any
meaning or not.

CFG + semantic rules = Syntax Directed Definitions


For example:

int a = “value”;

should not issue an error in the lexical or syntax analysis phases, as it is lexically and
structurally correct, but it should generate a semantic error as the types in the assignment
differ. These rules are set by the grammar of the language and evaluated in semantic
analysis. The following tasks should be performed in semantic analysis:

● Scope resolution
● Type checking
● Array-bound checking

Semantic Errors

We have mentioned some of the semantics errors that the semantic analyzer is expected to
recognize:

● Type mismatch
● Undeclared variable
● Reserved identifier misuse.
● Multiple declaration of variable in a scope.
● Accessing an out of scope variable.
● Actual and formal parameter mismatch.

Attribute Grammar

Attribute grammar is a special form of context-free grammar where some additional
information (attributes) is appended to one or more of its non-terminals in order to provide
context-sensitive information. Each attribute has a well-defined domain of values, such as
integer, float, character, string, and expressions.

Attribute grammar is a medium to provide semantics to the context-free grammar and it can
help specify the syntax and semantics of a programming language. Attribute grammar (when
viewed as a parse-tree) can pass values or information among the nodes of a tree.

Example:

E → E + T { E.value = E.value + T.value }

The right part of the CFG contains the semantic rules that specify how the grammar should be
interpreted. Here, the values of non-terminals E and T are added together and the result is
copied to the non-terminal E.

Semantic attributes are assigned values from their domain at the time of parsing and
evaluated at the time of assignment or condition checks. Based on the way the attributes get
their values, they can be broadly divided into two categories: synthesized attributes and
inherited attributes.

Synthesized attributes
These attributes get values from the attribute values of their child nodes. To illustrate, assume
the following production:

S → ABC

If S is taking values from its child nodes (A,B,C), then it is said to be a synthesized attribute,
as the values of ABC are synthesized to S.

As in our previous example (E → E + T), the parent node E gets its value from its child node.
Synthesized attributes never take values from their parent nodes or any sibling nodes.
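A minimal C++ sketch of synthesized-attribute evaluation is given below: each node's value attribute is computed purely from the values of its children, in the spirit of E.value = E.value + T.value. The tree shape and the use of plain summation are illustrative assumptions.

#include <iostream>
#include <vector>

struct ParseNode {
    int value = 0;                    // the synthesized attribute
    std::vector<ParseNode> children;  // empty for leaves
};

int synthesize(ParseNode& n) {
    if (n.children.empty()) return n.value;  // a leaf already carries its value
    int sum = 0;
    for (ParseNode& c : n.children) sum += synthesize(c);
    n.value = sum;                    // parent value comes only from the children
    return n.value;
}

int main() {
    // parse tree for 2 + 3 + 4 derived via E -> E + T
    ParseNode root{0, { ParseNode{0, { ParseNode{2, {}}, ParseNode{3, {}} }},
                        ParseNode{4, {}} }};
    std::cout << synthesize(root) << "\n";   // prints: 9
}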

Inherited attributes

In contrast to synthesized attributes, inherited attributes can take values from parent and/or
siblings. As in the following production,

S → ABC

A can get values from S, B and C. B can take values from S, A, and C. Likewise, C can take
values from S, A, and B.

Expansion: when a non-terminal is expanded to terminals as per a grammatical rule.

Reduction: when a terminal is reduced to its corresponding non-terminal according to
grammar rules. Syntax trees are parsed top-down and left to right. Whenever reduction
occurs, we apply its corresponding semantic rules (actions).

Semantic analysis uses Syntax Directed Translations to perform the above tasks.

The semantic analyzer receives the AST (Abstract Syntax Tree) from its previous stage
(syntax analysis).

The semantic analyzer attaches attribute information to the AST, producing what is called an
attributed AST.

An attribute is a two-tuple, <attribute name, attribute value>.

For example:

int value = 5;
<type, “integer”>
<presentvalue, “5”>

For every production, we attach a semantic rule.

Intermediate codes can be represented in a variety of ways, each with its own benefits.

· High-Level IR - A high-level intermediate code representation is very close to the
source language itself. It can be easily generated from the source code and we
can easily apply code modifications to enhance performance, but it is less
preferred for target machine optimization.
· Low Level IR - This one is close to the target machine, which makes it suitable
for register and memory allocation, instruction set selection, etc. It is good for
machine-dependent optimizations.

Intermediate code can be either language specific (e.g., Byte Code for Java) or language
independent (three-address code).

Three-Address Code

The intermediate code generator receives input from its predecessor phase, the semantic
analyzer, in the form of an annotated syntax tree. That syntax tree can then be converted
into a linear representation, e.g., postfix notation. Intermediate code tends to be
machine-independent code; therefore, the code generator assumes an unlimited number of
memory locations (registers) to be available when generating code.

For example:

a = b + c * d;

The intermediate code generator will try to divide this expression into sub-expressions and
then generate the corresponding code.

r1 = c * d;
r2 = b + r1;
a = r2

Here r1 and r2 are used as registers in the target program.

A three-address code has at most three address locations to calculate the expression. A three-
address code can be represented in two forms : quadruples and triples.

Quadruples

Each instruction in the quadruples representation is divided into four fields: op, arg1, arg2,
and result. The above example (a = b + c * d) is represented below in quadruples format:

Op    arg1    arg2    result

*     c       d       r1
+     b       r1      r2
=     r2              a

Triples

Each instruction in the triples representation has three fields: op, arg1, and arg2. The results
of the respective sub-expressions are denoted by the position (index) of the expression.
Triples are similar to a DAG and a syntax tree; they are equivalent to a DAG while
representing expressions.

      Op    arg1    arg2

(0)   *     c       d
(1)   +     b       (0)
(2)   =     a       (1)
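The two layouts can be captured directly as record types; the following C++ sketch encodes the tables above, with field names chosen purely for illustration.

#include <string>
#include <vector>

struct Quadruple {
    std::string op, arg1, arg2, result;  // the result field names the destination
};

struct Triple {
    std::string op, arg1, arg2;          // "(0)", "(1)", ... refer to earlier triples
};

int main() {
    // a = b + c * d
    std::vector<Quadruple> quads = {
        {"*", "c", "d", "r1"},
        {"+", "b", "r1", "r2"},
        {"=", "r2", "", "a"},
    };
    std::vector<Triple> triples = {
        {"*", "c", "d"},    // (0)
        {"+", "b", "(0)"},  // (1)
        {"=", "a", "(1)"},  // (2)
    };
    (void)quads; (void)triples;  // data-only sketch; nothing to execute
}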

Code Optimization is done in the following different ways:

1. Compile Time Evaluation:

(i) A = 2*(22.0/7.0)*r

The constant sub-expression 2*(22.0/7.0) is evaluated at compile time.

(ii) x = 12.4
     y = x/2.3

x/2.3 is evaluated as 12.4/2.3 at compile time.

2. Variable Propagation:

//Before Optimization
c = a * b
x = a
...              (intervening code)
d = x * b + 4

//After Optimization
c = a * b
x = a
...              (intervening code)
d = a * b + 4

3. Constant Propagation:
· If the value of a variable is a constant, then replace the variable with the constant.
The variable may not always be a constant.
Example:
(i) A = 2*(22.0/7.0)*r

Performs 2*(22.0/7.0) at compile time.

(ii) x = 12.4
     y = x/2.3

Evaluates x/2.3 as 12.4/2.3 at compile time.

(iii) int k = 2;
      if (k) goto L3;

It is evaluated as:

goto L3 (because k = 2, which implies the condition is always true)

4. Constant Folding:

· Consider an expression a = b op c where the values of b and c are constants; then the
value of a can be computed at compile time.
Example:

#define k 5

x = 2 * k
y = k + 5

This can be computed at compile time, and the values of x and y are:

x = 10
y = 10

Note: Difference between Constant Propagation and Constant Folding:

· In Constant Propagation, the variable is substituted with its assigned constant, whereas
in Constant Folding, the variables whose values can be computed at compile time
are considered and computed.
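A minimal C++ sketch of a constant-folding step is shown below: a binary node whose operands are both integer constants is replaced by the computed constant, as in the x = 2 * k example above. The node representation is an illustrative assumption.

#include <iostream>
#include <optional>

struct BinExpr {
    char op;                                // '+', '-', or '*'
    std::optional<int> lhsConst, rhsConst;  // set when the operand is a constant
};

std::optional<int> fold(const BinExpr& e) {
    if (!e.lhsConst || !e.rhsConst) return std::nullopt;  // not foldable
    switch (e.op) {
        case '+': return *e.lhsConst + *e.rhsConst;
        case '-': return *e.lhsConst - *e.rhsConst;
        case '*': return *e.lhsConst * *e.rhsConst;
    }
    return std::nullopt;
}

int main() {
    BinExpr e{'*', 2, 5};  // x = 2 * k with k defined as the constant 5
    if (auto v = fold(e))
        std::cout << "folded to " << *v << "\n";  // prints: folded to 10
}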

5. Copy Propagation:

· It is an extension of constant propagation.
· After a is assigned to x, use a to replace x until a is assigned again to another
variable, value, or expression.
· It helps in reducing compile time as it reduces copying.
Example:

//Before Optimization
c = a * b
x = a
...
d = x * b + 4

//After Optimization
c = a * b
x = a
...
d = a * b + 4

6. Common Sub Expression Elimination:

· In the above example, a * b and x * b are common sub-expressions.

7. Dead Code Elimination:

· Copy propagation often leads to making assignment statements into dead code.
· A variable is said to be dead if it is never used after its last definition.
· In order to find the dead variables, a data flow analysis should be done.
Example:

//Before elimination:
c = a * b
x = a
...
d = a * b + 4

//After elimination:
c = a * b
...
d = a * b + 4

8. Unreachable Code Elimination:

· First, a Control Flow Graph should be constructed.
· A block which does not have an incoming edge is an unreachable code block.
· After constant propagation and constant folding, the unreachable branches can be
eliminated.

#include <iostream>

using namespace std;

int main() {
    int num;
    num = 10;
    cout << "GFG!";
    return 0;
    cout << num; // unreachable code
}

// after elimination of unreachable code:

int main() {
    int num;
    num = 10;
    cout << "GFG!";
    return 0;
}

9. Function Inlining:

· Here, a function call is replaced by the body of the function itself.
· This saves a lot of time in copying all the parameters, storing the return address, etc.

10. Function Cloning:

· Here, specialized versions of a function are created for different calling parameters.
· Example: function overloading.

11. Induction Variable and Strength Reduction:

· An induction variable is used in the loop for the following kind of assignment i = i +
constant. It is a kind of Loop Optimization Technique.
· Strength reduction means replacing the high strength operator with a low strength.
Examples:

Example 1:

Multiplication by a power of 2 can be replaced by the shift-left operator, which is less
expensive than multiplication:

a = a * 16

// Can be modified as:

a = a << 4

Example 2:

i = 1;
while (i < 10)
{
    y = i * 4;
    i = i + 1;
}

//After Reduction
t = 4;
while (t < 40)
{
    y = t;
    t = t + 4;
}

Loop Optimization Techniques:

1. Code Motion or Frequency Reduction:

· The evaluation frequency of an expression is reduced.
· The loop-invariant statements are brought out of the loop.
Example:

a = 200;
while (a > 0)
{
    b = x + y;
    if (a % b == 0)
        printf("%d", a);
}

//This code can be further optimized as:
a = 200;
b = x + y;
while (a > 0)
{
    if (a % b == 0)
        printf("%d", a);
}

2. Loop Jamming:
· Two or more loops are combined into a single loop. It helps in reducing the loop
overhead.
Example:

// Before loop jamming
for (int k = 0; k < 10; k++)
    x = k * 2;
for (int k = 0; k < 10; k++)
    y = k + 3;

// After loop jamming
for (int k = 0; k < 10; k++)
{
    x = k * 2;
    y = k + 3;
}

3. Loop Unrolling:
· It helps in optimizing the execution time of the program by reducing the iterations.
· It increases the program’s speed by eliminating the loop control and test instructions.

Example:

//Before Loop Unrolling
for (int i = 0; i < 2; i++)
    printf("Hello");

//After Loop Unrolling
printf("Hello");
printf("Hello");

Where to Apply Optimization?

Now that we have learned the need for optimization and its two types, let us see where to
apply these optimizations.
· Source program: Optimizing the source program involves making changes to the
algorithm or changing the loop structures. The user is the actor here.
· Intermediate Code: Optimizing the intermediate code involves changing the
address calculations and transforming the procedure calls involved. Here compiler is
the actor.
· Target Code: Optimizing the target code is done by the compiler. Usage of
registers, and select and move instructions are part of the optimization involved in the
target code.
· Local Optimization: Transformations are applied to small basic blocks of
statements. Techniques followed are Local Value Numbering and Tree Height Balancing.
· Regional Optimization: Transformations are applied to Extended Basic Blocks.
Techniques followed are Super Local Value Numbering and Loop Unrolling.
· Global Optimization: Transformations are applied to large program segments that
include functions, procedures, and loops. Techniques followed are Live Variable
Analysis and Global Code Replacement.
· Interprocedural Optimization: As the name indicates, the optimizations are
applied across procedure boundaries. Techniques followed are Inline Substitution and
Procedure Placement.

As an example of how instruction order affects register use, consider computing
d = A + B + C. Two possible register sequences are:

R0 = A
R1 = B
R2 = C
R3 = R0 + R1
R4 = R2 + R3
d = R4

or

R0 = A
R1 = B
R2 = R0 + R1
R0 = C
R3 = R2 + R0
d = R3

Arguably, the second version is more efficient, as it needs fewer registers to execute;
however, this was only possible because the order of operations allowed it to reuse R0.
Here we can see that the order of operations affects performance, which is why it is
considered during code generation.

Input code:

I = J + K

Intermediate code is then generated:

R1 = J
R2 = K
R1 = R1 + R2
I = R1

Then, the target code:

LDR R1, J        // loads the value of J into R1
LDR R2, K        // loads the value of K into R2
ADD R1, R1, R2   // adds R1 and R2, stores the result in R1
STR I, R1        // stores the value of R1 to I

Here we can see an example of how the compiler goes about generating code for our target
machine. First, the high-level code is input, from which the intermediate representation is
produced. Then, through the methods and optimization techniques discussed earlier, the
instructions are generated for the target, leaving us with the appropriate machine code for
this program.
CONCLUSION

Our journey with Project Melon, the development of a new programming language, has been
a fruitful one. We embarked on this project intending to make coding more accessible and
enjoyable for programmers of all levels, and throughout the development process we
explored various features and functionalities that would serve this objective.

In this project, we successfully designed Melon, a programming language with a clear
syntax and a focus on readability and user-friendliness. We believe Melon offers a
user-friendly experience for both beginners and experienced programmers.

However, Melon is still under development. We plan to study the IBM architecture and
continue development in the future. With further improvements, Melon has the potential to
become a valuable tool for anyone interested in the world of programming.

Our project has established a solid foundation for Melon, including its syntax, semantics, and
standard library. We've strived to make Melon accessible and user-friendly, providing
comprehensive documentation and practical guides for developers.

While Melon remains a work in progress, our exploration has sparked valuable insights
into language design principles and implementation challenges. We have encountered
various obstacles along the way, from defining language features to optimizing compiler
performance, but each hurdle has been an opportunity for learning and growth.

Looking ahead, we envision continued refinement and expansion of Melon, incorporating
feedback from the programming community and exploring new avenues for innovation. Our
project serves as a testament to the creativity and ingenuity of developers, showcasing the
potential for new languages to shape the future of software development.

In essence, Melon offers a refreshing take on programming, making it more accessible and
enjoyable for all.
APPENDICES

A. Melon Language Specification

The Melon Language Specification offers an exhaustive breakdown of Melon's syntax,
semantics, and governing principles. It encompasses an extensive array of topics, ranging
from fundamental language constructs to advanced features. Included within are
comprehensive descriptions of keywords, data types, operators, control structures, and other
pivotal elements that constitute the Melon language. This specification serves as an
indispensable reference for developers seeking clarity and understanding while navigating the
intricacies of Melon's design.

B: Melon Syntax in Detail

This appendix offers a deeper dive into Melon's syntax, using tables and diagrams to
illustrate how Melon structures code components like variables, functions, and conditional
statements. It is helpful for readers who want to understand the nuts and bolts of writing
Melon code.

C. Melon Standard Library Documentation

The Melon Standard Library Documentation offers a comprehensive overview of the
modules, functions, and utilities available within Melon's standard library. Organized and
categorized for ease of reference, this documentation provides detailed descriptions, usage
examples, and guidelines for incorporating standard library components into Melon
applications. From essential data manipulation functions to specialized modules for specific
tasks, the Melon Standard Library Documentation equips developers with the tools and
resources necessary to streamline their development process and enhance the functionality of
their Melon projects.

D. Performance Evaluation Results

Appendix D presents the findings of performance evaluations conducted on the compiler and
runtime environment. Through rigorous benchmarking, metrics analysis, and performance
profiling, this section offers insights into the efficiency, scalability, and resource utilization of
Melon in real-world scenarios. Performance benchmarks encompass a wide range of criteria,
including execution speed, memory consumption, and concurrency handling, providing
developers with a comprehensive understanding of Melon's performance characteristics and
areas for optimization.
E. Future Enhancements

No project is ever truly finished, and Melon is no exception. This appendix discusses
potential future enhancements for Melon, encompassing features we envisioned but could
not implement in the current stage as well as entirely new ideas to expand Melon's
capabilities.