A Project Report
Submitted by
Certified that Group CS-16: Kuhu Rawat (2116208), Manasvi Sharma (2116216), Muskan Soni (2116227), Nancy Chahal (2116228), and Nandini Agarwal (2116229) have carried out the project work titled “MELON” from 1/9/23 to 1/4/24 for the award of the degree of BTech Computer Science from Banasthali Vidyapith under my supervision. The project embodies the result of original work and studies carried out by the students themselves, and the contents of the project do not form the basis for the award of any other degree to the candidates or anybody else.
DATE:
ABSTRACT
At its core, Melon prioritizes readability and ease of use without compromising on power and
versatility. Drawing inspiration from a diverse array of languages, Melon combines the best
features of its predecessors while introducing novel concepts and constructs.
Key features of Melon include a clean and intuitive syntax, a robust type system, and
seamless interoperability with existing libraries and frameworks. With its minimalist design
philosophy, programmers can quickly grasp and harness the full potential of Melon, enabling
rapid development and iteration.
ACKNOWLEDGEMENT
We extend our heartfelt gratitude to the countless individuals and organizations whose contributions and insights have shaped the development of Melon.
Special thanks to the programming language community for their invaluable feedback and
constructive criticism throughout the design and implementation process.
We are also indebted to the pioneers of programming language theory and practice whose
groundbreaking work continues to inspire and inform our endeavours.
Furthermore, we express our appreciation to the open-source community for their dedication
and support in fostering collaboration and sharing knowledge.
Last but not least, we acknowledge the unwavering commitment and tireless efforts of our development team, whose passion and expertise have been instrumental in bringing Melon to fruition.
Together, we celebrate the collective effort and commitment to innovation that has culminated in the creation of Melon.
INTRODUCTION
In today's fast-paced digital landscape, the demand for simplicity, efficiency, and versatility
in software development has never been greater. Enter Melon: a language crafted with
precision and passion to meet the evolving needs of developers worldwide.
With Melon, we embark on a journey of exploration and innovation, where clarity and
expressiveness reign supreme. Whether you're a seasoned programmer or just starting your coding adventure, Melon empowers you to unleash your creativity and bring your ideas to life with ease.
Join us as we delve into the features, capabilities, and endless possibilities of Melon.
Together, let's embark on a transformative journey into the future of programming.
Requirement Analysis (SRS)
Specific Requirements
1. Interface
● UserInterface:
The compiler will be operated through a command-line interface (CLI) to ensure simplicity and flexibility in its usage. Users will interact with the compiler by executing commands in a terminal or command prompt.
The following command-line arguments and options will be supported:
· `--input <file>`: Specifies the source code file to be compiled.
· `--output <file>`: Specifies the name of the output file generated after compilation.
· `--optimization <level>`: Enables optimization, with levels ranging from 0 (none) to 3 (maximum).
· `--target <architecture>`: Sets the target architecture for the compiled code (e.g., x86, ARM).
In the event of incorrect command or option usage, the compiler will provide informative error messages to guide the user towards correct usage.
● Language Syntax and Semantics
The programming language will follow a C-style syntax, including features like variables, loops, conditionals, functions, and data types. A detailed language specification document will be provided to users, outlining the syntax rules and expected behavior of each language construct.
The semantics of language constructs will be explained in terms of their expected effects on
program state.
● Error Handling
The compiler will implement robust error handling mechanisms. It will detect and report
various
types of errors, including syntax errors, semantic errors, and linker errors.
Syntax errors will be reported with specific line and column numbers along with a descriptive
error message. For example:
Error: Syntax error at line 10, column 20: Unexpected token '}'.
Semantic errors will include informative messages to help users identify and correct issues
related to variable types, function calls, and other semantic rules.
2. Databases
● SymbolTable
The compiler will maintain a symbol table as a data structure to keep track of variables, functions, and their associated attributes during the compilation process. The symbol table will be implemented as a hash table for efficient look-up and storage.
For example, when encountering a variable declaration:
int x = 5;
The symbol table will record the entry `x` with its type `int` and value `5`.
● Intermediate Representation (IR)
The compiler will utilize a three-address code-based intermediate representation (IR) for
efficient code generation and optimization. This IR will be designed to facilitate various
optimization passes, such as constant folding, dead code elimination, and loop optimization.
For example, the following C code:
int result = a + b;
Will be represented in the IR as:
t1 = a + b
result = t1
3. Performance
● Compilation Time
The compiler will aim to maintain reasonable compilation times across different sizes of source code. The maximum acceptable compilation time for a typical program (approximately 1000 lines of code) will be set at 5 seconds on standard hardware.
● Execution Time
The compiled programs will exhibit competitive runtime performance comparable to other widely used compilers for the target architecture. Efforts will be made to implement standard optimization techniques, such as inlining and loop unrolling, to improve execution speed.
4. Software System Attributes
● Portability
The compiler will initially target the x86 architecture and will be designed with portability in mind. Future versions may extend support to additional architectures based on user demand and project resources.
● Maintainability
To ensure maintainability, the codebase will follow established coding conventions and will be extensively documented. Version control using Git will be employed, and a clear branching and merging strategy will be defined to manage code updates and releases.
● Security
The compiler will implement security measures to mitigate potential vulnerabilities. This includes rigorous input validation to prevent buffer overflows or injection attacks. Regular security audits and code reviews will be conducted to identify and address any potential security risks.
CODING
MAIN MODULES
ASSEMBLER CODE
PARSING CODE
TOKENIZER
METHODOLOGY
We have two phases of compilers, namely the Analysis phase and the Synthesis phase. The
analysis phase creates an intermediate representation from the given source code. The
synthesis phase creates an equivalent target program from the intermediate representation.
A compiler is a software program that converts the high-level source code written in a
programming language into low-level machine code that can be executed by the computer
hardware. The process of converting the source code into machine code involves several
phases or stages, which are collectively known as the phases of a compiler.
1. Lexical Analysis: The first phase of a compiler is lexical analysis, also known as
scanning. This phase reads the source code and breaks it into a stream of tokens,
which are the basic units of the programming language. The tokens are then passed on
to the next phase for further processing.
2. Syntax Analysis: The second phase of a compiler is syntax analysis, also known as
parsing. This phase takes the stream of tokens generated by the lexical analysis phase
and checks whether they conform to the grammar of the programming language. The
output of this phase is usually an Abstract Syntax Tree (AST).
3. Semantic Analysis: The third phase of a compiler is semantic analysis. This phase
checks whether the code is semantically correct, i.e., whether it conforms to the
language’s type system and other semantic rules. In this stage, the compiler checks
the meaning of the source code to ensure that it makes sense. The compiler performs
type checking, which ensures that variables are used correctly and that operations are
performed on compatible data types. The compiler also checks for other semantic
errors, such as undeclared variables and incorrect function calls.
4. Intermediate Code Generation: The fourth phase of a compiler is intermediate code
generation. This phase generates an intermediate representation of the source code
that can be easily translated into machine code.
5. Optimization: The fifth phase of a compiler is optimization. This phase applies
various optimization techniques to the intermediate code to improve the performance
of the generated machine code.
6. Code Generation: The final phase of a compiler is code generation. This phase takes
the optimized intermediate code and generates the actual machine code that can be
executed by the target hardware.
Symbol Table – It is a data structure used and maintained by the compiler, consisting of all the identifiers' names along with their types. It helps the compiler function smoothly by finding identifiers quickly.
The analysis of a source program is divided into mainly three phases. They are:
1. Linear Analysis-
This involves a scanning phase where the stream of characters is read from left to
right. It is then grouped into various tokens having a collective meaning.
2. Hierarchical Analysis-
In this analysis phase, based on a collective meaning, the tokens are categorized
hierarchically into nested groups.
3. Semantic Analysis-
This phase is used to check whether the components of the source program are
meaningful or not.
The compiler has two modules, namely the front end and the back end. The front end comprises the lexical analyzer, syntax analyzer, semantic analyzer, and intermediate code generator. The remaining phases together form the back end.
Lexical Analyzer –
Lexical Analysis is the first phase of the compiler, also known as scanning. It converts the high-level input program into a sequence of tokens. Lexical analysis can be implemented with a deterministic finite automaton (DFA). The output is a sequence of tokens that is sent to the parser for syntax analysis.
What is a Token?
A lexical token is a sequence of characters that can be treated as a unit in the grammar of the
programming languages. Example of tokens:
· Type tokens (id, number, real, . . . )
· Punctuation tokens (parentheses, commas, semicolons, . . . )
· Alphabetic tokens (keywords)
Keywords; examples: for, while, if, etc.
Identifiers; examples: variable names, function names, etc.
Operators; examples: '+', '++', '-', etc.
Separators; examples: ',', ';', etc.
Example of Non-Tokens:
· Comments, preprocessor directive, macros, blanks, tabs, newline, etc.
Lexeme: The sequence of characters matched by a pattern to form the corresponding token, or a sequence of input characters that comprises a single token, is called a lexeme, e.g. “float”, “abs_zero_Kelvin”, “=”, “-”, “273”, “;”.
· The lexical analyzer identifies errors with the help of the automaton and the grammar of the given language on which it is based (such as C or C++), and gives the row number and column number of the error.
Advantages
1. Simplifies Parsing: Breaking down the source code into tokens makes it easier for
computers to understand and work with the code. This helps programs like compilers
or interpreters to figure out what the code is supposed to do. It’s like breaking down a
big puzzle into smaller pieces, which makes it easier to put together and solve.
2. Error Detection: Lexical analysis will detect lexical errors such as misspelled
keywords or undefined symbols early in the compilation process. This helps in
improving the overall efficiency of the compiler or interpreter by identifying errors
sooner rather than later.
3. Efficiency: Once the source code is converted into tokens, subsequent phases of
compilation or interpretation can operate more efficiently. Parsing and semantic
analysis become faster and more streamlined when working with tokenized input.
Disadvantages
1. Limited Context: Lexical analysis operates based on individual tokens and does not
consider the overall context of the code. This can sometimes lead to ambiguity or
misinterpretation of the code’s intended meaning especially in languages with
complex syntax or semantics.
2. Debugging Challenges: Lexical errors detected during the analysis phase may not
always provide clear indications of their origins in the original source code.
Debugging such errors can be challenging especially if they result from subtle
mistakes in the lexical analysis process.
Syntax Analyzer
When an input string (source code or a program in some language) is given to a compiler, the
compiler processes it in several phases, starting from lexical analysis (scans the input and
divides it into tokens) to target code generation.
Syntax Analysis or Parsing is the second phase, i.e. after lexical analysis. It checks the
syntactical structure of the given input, i.e. whether the given input is in the correct syntax (of
the language in which the input has been written) or not. It does so by building a data
structure, called a parse tree or syntax tree. The parse tree is constructed by using the predefined grammar of the language and the input string. If the given input string can be produced with the help of the syntax tree (in the derivation process), the input string is found to be in the correct syntax. If not, an error is reported by the syntax analyzer.
Syntax analysis, also known as parsing, is a process in compiler design where the compiler
checks if the source code follows the grammatical rules of the programming language. This is
typically the second stage of the compilation process, following lexical analysis.
The main goal of syntax analysis is to create a parse tree or abstract syntax tree (AST) of the
source code, which is a hierarchical representation of the source code that reflects the
grammatical structure of the program.
There are several types of parsing algorithms used in syntax analysis, including:
· LL parsing: This is a top-down parsing algorithm that starts with the root of the
parse tree and constructs the tree by successively expanding non-terminals. LL
parsing is known for its simplicity and ease of implementation.
· LR parsing: This is a bottom-up parsing algorithm that starts with the leaves of the
parse tree and constructs the tree by successively reducing terminals. LR parsing is
more powerful than LL parsing and can handle a larger class of grammars.
· LR(1) parsing: This is a variant of LR parsing that uses lookahead to disambiguate
the grammar.
· LALR parsing: This is a variant of LR parsing that uses a reduced set of lookahead
symbols to reduce the number of states in the LR parser.
· Once the parse tree is constructed, the compiler can perform semantic analysis to
check if the source code makes sense and follows the semantics of the programming
language.
· The parse tree or AST can also be used in the code generation phase of the compiler
design to generate intermediate code or machine code.
Syntax Trees: Syntax analysis creates a syntax tree, which is a hierarchical representation of
the code’s structure. The tree shows the relationship between the various parts of the code,
including statements, expressions, and operators.
Context-Free Grammar: Syntax analysis uses context-free grammar to define the syntax of
the programming language. Context-free grammar is a formal language used to describe the
structure of programming languages.
Top-Down and Bottom-Up Parsing: Syntax analysis can be performed using two main
approaches: top-down parsing and bottom-up parsing. Top-down parsing starts from the
highest level of the syntax tree and works its way down, while bottom-up parsing starts from
the lowest level and works its way up.
Error Detection: Syntax analysis is responsible for detecting syntax errors in the code. If the
code does not conform to the rules of the programming language, the parser will report an
error and halt the compilation process.
Optimization: Syntax analysis can perform basic optimizations on the code, such as
removing redundant code and simplifying expressions.
The pushdown automaton (PDA) is used to design the syntax analysis phase.
Consider, for example, the grammar:
S -> cAd
A -> bc | a
and the input string “cad”.
Now the parser attempts to construct a syntax tree from this grammar for the given input string, applying the production rules as needed. To generate the string “cad”, it first tries the production A -> bc, but the string produced is “cbcd”, not “cad”; the parser therefore needs to backtrack and apply the next production rule available for A, namely A -> a, after which the string “cad” is produced.
Thus, the given input can be produced by the given grammar; therefore, the input is correct in syntax. But backtracking was needed to get the correct syntax tree, which is a complex process to implement. An easier approach uses the concepts of FIRST and FOLLOW sets.
Semantic Analyzer –
Semantic Analysis is the third phase of the compiler. It makes sure that the declarations and statements of the program are semantically correct. It is a collection of procedures called by the parser as and when required by the grammar. Both the syntax tree from the previous phase and the symbol table are used to check the consistency of the given code. Type checking is an important part of semantic analysis, in which the compiler makes sure that each operator has matching operands.
The semantic analyzer uses the syntax tree and symbol table to check whether the given program is semantically consistent with the language definition. It gathers type information and stores it in either the syntax tree or the symbol table. This type information is subsequently used by the compiler during intermediate-code generation.
Semantic Errors:
Errors recognized by semantic analyzer are as follows:
· Type mismatch
· Undeclared variables
· Reserved identifier misuse
Example:
float x = 10.1;
float y = x*30;
In the above example, the integer 30 will be typecast to the float 30.0 before multiplication by the semantic analyzer.
It verifies the parse tree, whether it’s meaningful or not. It furthermore produces a
verified parse tree. It also does type checking, Label checking, and Flow control
checking.
Intermediate Code Generator –
If we generate machine code directly from the source code, then for n target machines we will need n optimizers and n code generators; but if we have a machine-independent intermediate code, we need only one optimizer. Intermediate code can be either language-specific (e.g., bytecode for Java) or language-independent (three-address code). The following are commonly used intermediate code representations:
1. Postfix Notation:
Also known as reverse Polish notation or suffix notation.
In infix notation, the operator is placed between operands, e.g., a + b.
Postfix notation positions the operator at the right end, as in ab+.
For any postfix expressions e1 and e2 with a binary operator (+), applying the operator yields the postfix expression e1 e2 +.
2. Three-Address Code:
A three address statement involves a maximum of three references,
consisting of two for operands and one for the result.
A sequence of three address statements collectively forms a three address
code.
The typical form of a three-address statement is expressed as x = y op z, where x, y, and z represent memory addresses.
Each variable (x, y, z) in a three-address statement is associated with a specific memory location.
While a standard three-address statement includes three references, there are instances where a statement may contain fewer than three references, yet it is still categorized as a three-address statement.
Example: The three-address code for the expression a + b * c + d is:
T1 = b * c
T2 = a + T1
T3 = T2 + d
T1, T2, T3 are temporary variables.
There are 3 ways to represent a Three-Address Code in compiler design:
i) Quadruples
ii) Triples
iii) Indirect Triples
3. Syntax Tree:
· A syntax tree serves as a condensed representation of a parse tree.
· The operator and keyword nodes present in the parse tree are relocated to become part of their respective parent nodes in the syntax tree; the internal nodes are operators and the child nodes are operands.
· Creating a syntax tree involves strategically placing parentheses within the
expression. This technique contributes to a more intuitive representation, making
it easier to discern the sequence in which operands should be processed.
· The syntax tree not only condenses the parse tree but also offers an improved visual representation of the program's syntactic structure.
Example: x = (a + b * c) / (a – b * c)
Code Optimizer
The code optimization in the synthesis phase is a program transformation technique, which
tries to improve the intermediate code by making it consume fewer resources (i.e. CPU,
Memory) so that faster-running machine code will result. Compiler optimizing process
should meet the following objectives :
· The optimization must be correct, it must not, in any way, change the meaning of the
program.
· Optimization should increase the speed and performance of the program.
· The compilation time must be kept reasonable.
· The optimization process should not delay the overall compiling process.
When to Optimize?
Optimization of the code is often performed at the end of the development stage since it
reduces readability and adds code that is used to increase the performance.
Why Optimize?
Optimizing the algorithm itself is beyond the scope of the code optimization phase; instead, the generated program is optimized, which may involve reducing the size of the code. Optimization helps to:
· Reduce the space consumed and increases the speed of compilation.
· Manually analyzing datasets involves a lot of time. Hence we make use of software
like Tableau for data analysis. Similarly manually performing the optimization is also
tedious and is better done using a code optimizer.
· An optimized code often promotes re-usability.
Types of Code Optimization: The optimization process can be broadly classified into two
types :
1. Machine Independent Optimization: This code optimization phase attempts to
improve the intermediate code to get a better target code as the output. The part of
the intermediate code which is transformed here does not involve any CPU registers
or absolute memory locations.
2. Machine Dependent Optimization: Machine-dependent optimization is done after
the target code has been generated and when the code is transformed according to the
target machine architecture. It involves CPU registers and may have absolute memory
references rather than relative references. Machine-dependent optimizers strive to take maximum advantage of the memory hierarchy.
Advantages of Code Optimization:
Improved performance: Code optimization can result in code that executes faster and uses fewer resources, leading to improved performance.
Reduction in code size: Code optimization can help reduce the size of the generated code,
making it easier to distribute and deploy.
Increased portability: Code optimization can result in code that is more portable across
different platforms, making it easier to target a wider range of hardware and software.
Reduced power consumption: Code optimization can lead to code that consumes less
power, making it more energy-efficient.
Improved maintainability: Code optimization can result in code that is easier to understand
and maintain, reducing the cost of software maintenance.
Disadvantages of Code Optimization:
Increased compilation time: Code optimization can significantly increase the compilation
time, which can be a significant drawback when developing large software systems.
Increased complexity: Code optimization can result in more complex code, making it harder
to understand and debug.
Potential for introducing bugs: Code optimization can introduce bugs into the code if not
done carefully, leading to unexpected behavior and errors.
5. Implementation: Algorithms
Target code generation deals with assembly language to convert optimized code into a machine-understandable format. Target code can be machine-readable code or assembly code. Each line in the optimized code may map to one or more lines in machine (or assembly) code, hence there is a 1:N mapping associated with them.
Each id refers to its entry in the symbol table, which references all of its details. For example, consider the program:
int main()
{
    // 2 variables
    int a, b;
    a = 10;
    return 0;
}
Its valid tokens are:
'int' 'main' '(' ')' '{' 'int' 'a' ',' 'b' ';' 'a' '=' '10' ';' 'return' '0' ';' '}'
You can observe that comments are omitted. As another example, consider the printf statement below:
int main()
{
    printf("sum is:%d", a + b);
    return 0;
}
· The lexical analyzer first reads int and finds it to be valid, accepting it as a token.
· main is then read and found to be a valid function name after reading (.
· int is also a token, then a is another token, and finally ;.
For the condition (a >= b), the tokens are:
( LPAREN
a IDENTIFIER
>= COMPARISON
b IDENTIFIER
) RPAREN
For the statement a = a – 2;, the tokens are:
a IDENTIFIER
= ASSIGNMENT
a IDENTIFIER
– ARITHMETIC
2 INTEGER
; SEMICOLON
Parse Tree:
· Parse tree is the hierarchical representation of terminals or non-terminals.
· These symbols (terminals or non-terminals) represent the derivation of the grammar
to yield input strings.
· In parsing, the string is derived starting from the start symbol.
· The starting symbol of the grammar must be used as the root of the Parse Tree.
· Leaves of parse tree represent terminals.
· Each interior node represents productions of a grammar.
Consider, for example, the grammar:
S -> sAB
A -> a | c | aA
B -> b | d | bB
We have learnt how a parser constructs parse trees in the syntax analysis phase. The plain parse tree constructed in that phase is generally of no use to a compiler, as it does not carry any information about how to evaluate the tree. The productions of the context-free grammar, which make up the rules of the language, do not specify how to interpret them.
For example
E→E+T
The above CFG production has no semantic rule associated with it, and it cannot help in
making any sense of the production.
Semantics
Semantics of a language provide meaning to its constructs, like tokens and syntax structure.
Semantics help interpret symbols, their types, and their relations with each other. Semantic
analysis judges whether the syntax structure constructed in the source program derives any
meaning or not.
int a = “value”;
should not issue an error in lexical and syntax analysis phase, as it is lexically and structurally
correct, but it should generate a semantic error as the type of the assignment differs. These
rules are set by the grammar of the language and evaluated in semantic analysis. The
following tasks should be performed in semantic analysis:
● Scope resolution
● Type checking
● Array-bound checking
Semantic Errors
We have mentioned some of the semantics errors that the semantic analyzer is expected to
recognize:
● Type mismatch
● Undeclared variable
● Reserved identifier misuse.
● Multiple declaration of variable in a scope.
● Accessing an out of scope variable.
● Actual and formal parameter mismatch.
Attribute Grammar
Attribute grammar is a medium to provide semantics to the context-free grammar and it can
help specify the syntax and semantics of a programming language. Attribute grammar (when
viewed as a parse-tree) can pass values or information among the nodes of a tree.
Example:
E → E + T { E.value = E.value + T.value }
The right part of the CFG production contains the semantic rule that specifies how the grammar should be interpreted. Here, the values of the non-terminals E and T are added together and the result is copied to the non-terminal E.
Semantic attributes may be assigned values from their domain at the time of parsing and evaluated at the time of assignment or in conditions. Based on the way attributes get their values, they can be broadly divided into two categories: synthesized attributes and inherited attributes.
Synthesized attributes
These attributes get values from the attribute values of their child nodes. To illustrate, assume
the following production:
S → ABC
If S is taking values from its child nodes (A,B,C), then it is said to be a synthesized attribute,
as the values of ABC are synthesized to S.
As in our previous example (E → E + T), the parent node E gets its value from its child node.
Synthesized attributes never take values from their parent nodes or any sibling nodes.
Inherited attributes
In contrast to synthesized attributes, inherited attributes can take values from parent and/or
siblings. As in the following production,
S → ABC
A can get values from S, B and C. B can take values from S, A, and C. Likewise, C can take
values from S, A, and B.
Semantic analysis uses Syntax Directed Translations to perform the above tasks.
Semantic analyzer receives AST (Abstract Syntax Tree) from its previous stage (syntax
analysis).
The semantic analyzer attaches attribute information to the AST, which is then called an attributed AST.
For example:
int value = 5;
<type, “integer”>
<presentvalue, “5”>
Intermediate codes can be represented in a variety of ways and they have their own benefits.
Intermediate code can be either language specific (e.g., Byte Code for Java) or language
independent (three-address code).
Three-Address Code
Intermediate code generator receives input from its predecessor phase, semantic analyzer, in
the form of an annotated syntax tree. That syntax tree then can be converted into a linear
representation, e.g., postfix notation. Intermediate code tends to be machine-independent. Therefore, the code generator assumes an unlimited number of memory storage locations (registers) when generating code.
For example:
a = b + c * d;
The intermediate code generator will try to divide this expression into sub-expressions and
then generate the corresponding code.
r1 = c * d;
r2 = b + r1;
a = r2
A three-address code has at most three address locations to calculate the expression. A three-
address code can be represented in two forms : quadruples and triples.
Quadruples
Each instruction in quadruples presentation is divided into four fields: operator, arg1, arg2,
and result. The above example is represented below in quadruples format:
Op arg1 arg2 result
* c d r1
+ b r1 r2
= r2 a
Triples
Each instruction in triples presentation has three fields: op, arg1, and arg2. The results of the respective sub-expressions are denoted by the position of the expression. Triples show similarity with a DAG and syntax tree; they are equivalent to a DAG while representing expressions.
Op arg1 arg2
* c d
+ b (0)
= a (1)
1. Compile Time Evaluation:
· Expressions whose operands are all known at compile time are evaluated by the compiler itself.
Example:
· C
(i) A = 2*(22.0/7.0)*r
(ii) x = 12.4
y = x/2.3
2. Variable Propagation:
· If a variable x is assigned the value of another variable a, later uses of x can be replaced by a, provided neither variable is redefined in between.
Example:
· C
//Before Optimization
c = a * b
x = a
... // intervening code that does not redefine a or x
d = x * b + 4
//After Optimization
c = a * b
x = a
... // intervening code
d = a * b + 4
3. Constant Propagation:
· If the value of a variable is a constant, then replace the variable with the constant.
· The variable may not always be a constant.
Example:
· C
(i) A = 2*(22.0/7.0)*r
(ii) x = 12.4
y = x/2.3
It is evaluated as:
(i) A = 2 * 3.14285 * r    // the constant sub-expression 22.0/7.0 is replaced by its value
(ii) y = 12.4 / 2.3        // x is known to be the constant 12.4, so it is substituted into y
4. Constant Folding:
· Consider an expression a = b op c. If b and c are constants, then the value of a can be
computed at compile time.
Example:
· C
#define k 5
x=2*k
y=k+5
This can be computed at compile time and the values of x and y are :
x = 10
y = 10
5. Copy Propagation:
· It is an extension of constant propagation: after x = a, later uses of x are replaced by a until either variable is reassigned.
Example:
· C
//Before Optimization
c = a * b
x = a
... // intervening code
d = x * b + 4
//After Optimization
c = a * b
x = a
... // intervening code
d = a * b + 4
· Copy propagation often turns assignment statements into dead code.
6. Dead Code Elimination:
· A variable is said to be dead if it is never used after its last definition.
· In order to find the dead variables, a data flow analysis should be done.
Example:
· C
c = a * b
x = a        // dead: x is never used after this definition
...
d = a * b + 4
//After elimination:
c = a * b
...
d = a * b + 4
7. Unreachable Code Elimination:
· Statements that can never be executed (for example, code placed after a return) are removed.
Example:
· C++
//Before Optimization
#include <iostream>
int main() {
    int num;
    num = 10;
    return 0;
    std::cout << num;   // unreachable: control never reaches past the return
}
//After Optimization
#include <iostream>
int main() {
    int num;
    num = 10;
    return 0;
}
8. Induction Variable and Strength Reduction:
· An induction variable is updated in the loop by an assignment of the form i = i +
constant. This is a kind of loop optimization technique.
· Strength reduction means replacing a high-strength (expensive) operator with a low-strength (cheaper) one.
Examples:
Example 1:
Multiplication by a power of 2 can be replaced by a left-shift operator, which is less expensive.
a = a * 16
// Can be modified as:
a = a << 4
Example 2:
//Before Reduction
i = 1;
while (i < 10)
{
    y = i * 4;
    i = i + 1;
}
//After Reduction
i = 1;
t = 4;
while (t < 40)
{
    y = t;
    t = t + 4;
}
Loop Optimization Techniques:
1. Code Motion (Frequency Reduction):
· A loop-invariant computation is moved outside the loop so that it is evaluated only once.
Example:
· C
//Before Optimization
a = 200;
while (a > 0)
{
    b = x + y;
    if (a % b == 0)
        printf("%d", a);
}
//After Optimization
a = 200;
b = x + y;
while (a > 0)
{
    if (a % b == 0)
        printf("%d", a);
}
2. Loop Jamming:
· Two or more loops over the same range are combined into a single loop. This reduces the
loop overhead (repeated tests and increments) and improves execution time.
Example:
· C
//Before Optimization
for(int k=0;k<10;k++)
x = k*2;
for(int k=0;k<10;k++)
y = k+3;
//After Optimization
for(int k=0;k<10;k++)
{
x = k*2;
y = k+3;
}
3. Loop Unrolling:
· It helps in optimizing the execution time of the program by reducing the number of iterations.
· It increases the program’s speed by eliminating loop control and test instructions.
Example:
· C
//Before Optimization
for(int i=0;i<2;i++)
printf("Hello");
//After Optimization
printf("Hello");
printf("Hello");
· Local Optimization: Transformations are applied to individual Basic Blocks.
Techniques followed are Local Value Numbering and Tree Height Balancing.
· Regional Optimization: Transformations are applied to Extended Basic Blocks.
Techniques followed are Super Local Value Numbering and Loop Unrolling.
· Global Optimization: Transformations are applied to large program segments that
include functions, procedures, and loops. Techniques followed are Live Variable
Analysis and Global Code Replacement.
· Interprocedural Optimization: As the name indicates, the optimizations are
applied inter procedurally. Techniques followed are Inline Substitution and Procedure
Placement.
For example, for the expression d = A + B + C:
R0 = A
R1 = B
R2 = C
R3 = R0 + R1
R4 = R2 + R3
d = R4
or
R0 = A
R1 = B
R2 = R0 + R1
R0 = C
R3 = R2 + R0
d = R3
Arguably, the second version is more efficient because it needs fewer registers (four instead
of five); however, this is only possible because the order of operations allows a register (R0)
to be reused. Here we can see that the order of operations affects performance, which is why
it must be considered during code generation.
Input code:
I = J + K
Generated code:
R1 = J
R2 = K
R1 = R1 + R2
I = R1
Here we can see an example of how the compiler generates code for our target machine.
First, the high-level code is input and its intermediate representation is produced. Then,
through the methods and optimisation techniques discussed earlier, instructions are generated
for the target machine, leaving us with the appropriate machine code for this program.
CONCLUSION
Our journey with Project Melon, the development of a new programming language, has been
a fruitful one. We embarked on this project intending to make coding more accessible and
enjoyable for programmers of all levels. Throughout the development process, we explored
various features and functionalities that would serve this objective.
In this project, we designed Melon, a programming language with clear syntax and a focus on
readability. We believe Melon offers a user-friendly experience for both beginners and
experienced programmers.
However, Melon is still under development. We plan to study the IBM architecture and
continue this work in the future. With further improvements, Melon has the potential to
become a valuable tool for anyone interested in the world of programming.
Our project has established a solid foundation for Melon, including its syntax, semantics, and
standard library. We've strived to make Melon accessible and user-friendly, providing
comprehensive documentation and practical guides for developers.
While Melon remains a hypothetical language, our exploration has sparked valuable insights
into language design principles and implementation challenges. We've encountered various
obstacles along the way, from defining language features to optimizing compiler
performance, but each hurdle has been an opportunity for learning and growth.
This appendix offers a deeper dive into Melon's syntax. Tables and diagrams illustrate how
Melon structures code components such as variables, functions, and conditional statements.
This is helpful for readers who want to understand the nuts and bolts of writing Melon code.
Appendix D presents the findings of performance evaluations conducted on the compiler and
runtime environment. Through rigorous benchmarking, metrics analysis, and performance
profiling, this section offers insights into the efficiency, scalability, and resource utilization of
Melon in real-world scenarios. Performance benchmarks encompass a wide range of criteria,
including execution speed, memory consumption, and concurrency handling, providing
developers with a comprehensive understanding of Melon's performance characteristics and
areas for optimization.
E. Future Enhancements
No project is ever truly finished, and Melon is no exception. This appendix discusses
potential future enhancements for Melon, encompassing features we envisioned but could not
implement in the current stage as well as entirely new ideas to expand Melon's capabilities.
REFERENCES
● "Soft Skills: The Software Developer's Life Cycle" by John Sonmez (Discusses
the importance of programmer-friendly languages)
● "https://medium.com/javarevisited/python-for-everybody-course-review-is-it-
really-that-good-bf84af24e28"
● "https://developer.mozilla.org/en-US/docs/Web/JavaScript"
● https://www.freecodecamp.org/news/the-programming-language-pipeline-
91d3f449c919/
● https://www.education.com/science-fair/article/design-new-programming-
language/