K Sai CPDS
SUBMITTED BY
Kanuri Sai Karthik (24261A0490)
Sakshi Tiwari (24261A04B6)
Inkulu Varshini (24261A087)
SUBMITTED TO
Assistant Professor
(Autonomous)
Chaitanya Bharathi (PO), Kokapet(V), Gandipet (M), Ranga Reddy district, Hyd,
Telangana, India-500075
MGIT| Page1
TABLE OF CONTENTS
ABSTRACT
1. Introduction
1.1 Overview of the Project
1.2 Objectives and Scope
1.3 Importance of a Mini Compiler
3. System Design
3.1 Architectural Overview
3.2 Components of the Mini Compiler
3.3 Design Constraints and Assumptions
4. Lexical Analysis
4.1 Role of the Lexer
4.2 Token Specification for C Subset
4.3 Implementation Details
5. Syntax Analysis
5.1 Basic Definition and Techniques Used
5.2 Grammar for the C Subset
5.3 Abstract Syntax Tree Generation
6. Semantic Analysis
6.1 Basic Definition and Working Principle
6.2 Symbol Table Management
6.3 Type Checking and Error Handling
10. In Summary
REFERENCES
ABSTRACT
A compiler is a computer program that transforms source code written in a
programming language (the source language) into another computer
language (the target language), with the latter often having a binary form
known as object code. This report presents the design and implementation of a
mini compiler for a subset of the C programming language, developed using
the C language itself. The compiler translates high-level C subset code into
low-level machine code, enabling the execution of programs on a virtual or
physical machine. Key features of the compiler include lexical analysis,
syntax analysis, semantic analysis, intermediate code generation, and code
optimization. The project focuses on supporting essential programming
constructs such as variable declarations, expressions, conditional statements,
loops, and function calls, while simplifying complexities like advanced
pointer manipulations, memory management, and extensive library support.
The implementation leverages modular programming principles, ensuring
scalability and maintainability of the compiler components. This report details
the compiler’s architecture, including the development of a custom lexical
analyzer, parser, and code generator. Challenges such as error detection and
recovery, symbol table management, and type checking are discussed, along
with strategies employed to overcome them. Performance evaluations and
comparisons with existing compilers highlight the compiler’s efficiency and
accuracy. The report concludes with potential improvements and extensions,
such as support for additional C language features, optimizations for
execution speed, and compatibility with other hardware architectures. This
project serves as a foundational step for further research and development in
compiler design and programming language processing.
1. Introduction
1.1 Overview of the Project
The project focuses on the design and implementation of a mini compiler for a subset of
the C programming language. The mini compiler converts high-level code written in the
defined C subset into equivalent low-level machine code or an intermediate
representation. Using the provided codebase, the compiler processes and executes simple
C programs by performing the essential phases of compilation: lexical analysis, syntax
analysis, semantic analysis, and code generation.
This project is educational in nature, providing insight into compiler design through a
hands-on approach. It takes a minimalist approach by focusing on a reduced set of C
language features. The compiler processes basic C constructs, including variable
declarations, assignments, and print statements, to execute small programs.
2. Background and Literature Review
2.1 Overview of Compiler Design
A compiler is a software tool that translates high-level programming language code into
low-level machine code or an intermediate representation that can be executed by a
computer. It acts as a bridge between human-readable code and machine-executable code,
ensuring the program adheres to the syntactic and semantic rules of the programming
language. Compilers play a crucial role in making high-level programming efficient and
accessible by abstracting machine-level complexities. The provided code demonstrates
the fundamental phases of compiler design for a C subset. It includes lexical analysis,
syntax analysis, and semantic analysis. The evaluate_expression() function processes
arithmetic operations, while execution is handled directly without generating machine
code. This simple pipeline highlights the core components of a compiler, providing a
practical understanding of tokenization, parsing, and code evaluation.
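The direct-evaluation idea can be sketched in a few lines of C. The report does not reproduce the body of its evaluation function, so the helper name eval_sum, its signature, and the restriction to non-negative integer literals joined by + are assumptions made purely for illustration.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper (not taken from the project code): evaluates text
 * such as "2 + 3 + 10" directly, without emitting any machine code.
 * Only non-negative integer literals and '+' are handled.
 * Returns false on malformed input; *result is set only on success. */
bool eval_sum(const char *src, int *result)
{
    int total = 0;
    bool expect_operand = true;   /* true at the start and after each '+' */

    while (*src != '\0') {
        if (isspace((unsigned char)*src)) {
            src++;
        } else if (expect_operand && isdigit((unsigned char)*src)) {
            int value = 0;
            while (isdigit((unsigned char)*src)) {
                value = value * 10 + (*src - '0');
                src++;
            }
            total += value;
            expect_operand = false;
        } else if (!expect_operand && *src == '+') {
            src++;
            expect_operand = true;
        } else {
            return false;         /* unexpected character or misplaced token */
        }
    }
    if (expect_operand) {
        return false;             /* empty input or trailing '+' */
    }
    *result = total;
    return true;
}
```

Calling eval_sum("2 + 3 + 10", &r) stores 15 in r and returns true, while an input such as "2 +" is rejected. The result is computed while the text is scanned, which is what distinguishes this interpretive style from a compiler that emits target code.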
3. System Design
3.1 Architectural Overview
The mini compiler follows a modular architecture, where each phase of the compilation
process is implemented as an independent module. The workflow of the mini compiler
can be summarized as follows:
● Input Code: The user provides a source file written in a subset of the C language.
● Lexical Analysis: The source code is scanned to generate tokens.
● Syntax Analysis: Tokens are parsed to create an Abstract Syntax Tree (AST) that
represents the program's structure.
● Semantic Analysis: The AST is validated for type correctness, scope resolution,
and other semantic checks.
● Intermediate Code Generation: The validated AST is converted into three-address
code or another intermediate representation.
● Code Generation: The intermediate code is translated into target machine code or a
simpler assembly-like output.
● Output: The final output is a low-level representation or executable for a
predefined virtual machine or hardware.
Diagram of Workflow
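One way to picture the modular workflow is a driver that calls each phase in turn and stops at the first failure. The sketch below is an assumption about how such a driver could look, not the project's actual structure: the types CompileContext and Phase, the function run_pipeline, and the three placeholder phases are all hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Shared state passed from phase to phase (hypothetical layout). */
typedef struct {
    const char *source;   /* program text supplied by the user */
    int phases_done;      /* number of phases completed so far */
} CompileContext;

typedef bool (*PhaseFn)(CompileContext *ctx);

typedef struct {
    const char *name;     /* used to report which phase failed */
    PhaseFn run;
} Phase;

/* Runs each phase in order and stops at the first failure.
 * Returns NULL on success, or the name of the phase that failed. */
const char *run_pipeline(const Phase *phases, size_t count, CompileContext *ctx)
{
    for (size_t i = 0; i < count; i++) {
        if (!phases[i].run(ctx)) {
            return phases[i].name;
        }
        ctx->phases_done++;
    }
    return NULL;
}

/* Placeholder phases standing in for the real modules. The lexical
 * placeholder only rejects missing or empty input. */
bool lex_phase(CompileContext *ctx)
{
    return ctx->source != NULL && ctx->source[0] != '\0';
}

bool parse_phase(CompileContext *ctx)    { (void)ctx; return true; }
bool semantic_phase(CompileContext *ctx) { (void)ctx; return true; }
```

Keeping every phase behind the same function signature means a phase can be replaced or tested in isolation, which is the practical benefit of the modular design described above.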
expression evaluator that computes simple arithmetic operations and resolves variable
references. Error handling mechanisms are integrated to provide feedback on syntax
and semantic errors, ensuring a robust compilation process. Together, these
components enable the mini compiler to effectively process and execute a limited
subset of a programming language.
4. Lexical Analysis
4.1 Role of the Lexer
The lexer plays a crucial role in the mini compiler by serving as the first stage of the
compilation process. Its primary function is to read the raw source code and convert it
into a sequence of tokens, which are the fundamental building blocks for further analysis.
The lexer identifies and categorizes various elements of the code, including keywords,
identifiers, numeric literals, and symbols. By skipping whitespace and tracking line
numbers, the lexer ensures that the tokens are accurately represented for the parser.
1. Keywords: These are reserved words with special meaning in the language. In this
subset, important keywords include:
● int: Used for declaring integer variables.
● print: Used for outputting values to the console.
● main: The entry point of the program.
2. Identifiers: These tokens represent variable names defined by the user. Identifiers
must follow specific naming conventions, typically starting with a letter or
underscore, followed by letters, digits, or underscores.
3. Numbers: Tokens that represent numeric literals, specifically integers in this
subset. They are recognized by their digit composition.
4. Operators and Symbols: These include various symbols that perform operations or
denote structure in the code:
● =: Assignment operator.
● +: Addition operator.
● ;: Semicolon, used to terminate statements.
● { and }: Curly braces, used to define code blocks.
● ( and ): Parentheses, used for grouping expressions and function calls.
5. End of File (EOF): A special token indicating the end of the input source code.
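The token categories listed above can be turned into a small hand-written scanner. The following is a minimal sketch under stated assumptions: the names TokenType, Token, Lexer, and next_token, the 31-character lexeme limit, and the TOK_ERROR category are illustrative choices and are not taken from the project's source.

```c
#include <ctype.h>
#include <string.h>

typedef enum {
    TOK_INT, TOK_PRINT, TOK_MAIN,      /* keywords */
    TOK_IDENT, TOK_NUMBER,             /* identifiers and integer literals */
    TOK_ASSIGN, TOK_PLUS, TOK_SEMI,    /* = + ; */
    TOK_LBRACE, TOK_RBRACE,            /* { } */
    TOK_LPAREN, TOK_RPAREN,            /* ( ) */
    TOK_EOF, TOK_ERROR
} TokenType;

typedef struct {
    TokenType type;
    char text[32];   /* lexeme, truncated if longer than 31 characters */
    int line;        /* line on which the token starts */
} Token;

typedef struct {
    const char *pos; /* next unread character */
    int line;        /* current line number, starting at 1 */
} Lexer;

/* A complete word is a keyword only if it matches one exactly. */
static TokenType keyword_or_ident(const char *text)
{
    if (strcmp(text, "int") == 0)   return TOK_INT;
    if (strcmp(text, "print") == 0) return TOK_PRINT;
    if (strcmp(text, "main") == 0)  return TOK_MAIN;
    return TOK_IDENT;
}

Token next_token(Lexer *lx)
{
    Token tok = { TOK_EOF, "", 0 };
    size_t len = 0;

    /* Skip whitespace, counting newlines for error reporting. */
    while (isspace((unsigned char)*lx->pos)) {
        if (*lx->pos == '\n') lx->line++;
        lx->pos++;
    }
    tok.line = lx->line;

    unsigned char c = (unsigned char)*lx->pos;
    if (c == '\0') {
        return tok;                    /* end of input: TOK_EOF */
    }
    if (isalpha(c) || c == '_') {      /* identifier or keyword */
        while (isalnum((unsigned char)*lx->pos) || *lx->pos == '_') {
            if (len < sizeof tok.text - 1) tok.text[len++] = *lx->pos;
            lx->pos++;
        }
        tok.text[len] = '\0';
        tok.type = keyword_or_ident(tok.text);
        return tok;
    }
    if (isdigit(c)) {                  /* integer literal */
        while (isdigit((unsigned char)*lx->pos)) {
            if (len < sizeof tok.text - 1) tok.text[len++] = *lx->pos;
            lx->pos++;
        }
        tok.text[len] = '\0';
        tok.type = TOK_NUMBER;
        return tok;
    }

    /* Single-character operators and symbols. */
    tok.text[0] = (char)c;
    tok.text[1] = '\0';
    lx->pos++;
    switch (c) {
    case '=': tok.type = TOK_ASSIGN; break;
    case '+': tok.type = TOK_PLUS;   break;
    case ';': tok.type = TOK_SEMI;   break;
    case '{': tok.type = TOK_LBRACE; break;
    case '}': tok.type = TOK_RBRACE; break;
    case '(': tok.type = TOK_LPAREN; break;
    case ')': tok.type = TOK_RPAREN; break;
    default:  tok.type = TOK_ERROR;  break;  /* character outside the subset */
    }
    return tok;
}
```

A caller initializes a Lexer with the source text and line 1, then calls next_token repeatedly until it returns TOK_EOF. Because each token carries its line number, later phases can report errors against the right line.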
error reporting. This implementation emphasizes clarity and efficiency, laying a solid
foundation for the subsequent phases of the compilation process. Here is an example of
how the lexer handles source code that contains errors.
Input:
Output:
5. Syntax Analysis (Parser)
6. Semantic Analysis
6.1 Basic Definition and Working Principle
Semantic analysis is the process of ensuring that a program's declarations and
statements are semantically correct, meaning their usage aligns with the intended
control structures and data types. It involves comparing information within
different parts of a parse tree, such as verifying that variable references match their
declarations and that function call parameters align with their definitions.
Implementing semantic actions is more straightforward in recursive descent
parsing, as they can be integrated into the recursive procedures. Key functions of
semantic analysis include maintaining and updating the symbol table, checking for
semantic errors and warnings like type mismatches, variable scope issues,
redefinitions, and the use of undeclared variables.
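The symbol table checks mentioned above (redefinitions and use of undeclared variables) can be sketched as follows. This is an illustrative design, assuming a fixed-size array with linear search; the names SymbolTable, sym_declare, sym_assign, and sym_find, as well as the size limits, are hypothetical and not taken from the project code.

```c
#include <string.h>

#define MAX_SYMBOLS 64
#define MAX_NAME    32

typedef enum {
    SYM_OK, SYM_REDEFINED, SYM_UNDECLARED, SYM_TABLE_FULL
} SymStatus;

typedef struct {
    char name[MAX_NAME];  /* names longer than 31 characters are truncated */
    int value;
} Symbol;

typedef struct {
    Symbol entries[MAX_SYMBOLS];
    int count;
} SymbolTable;

/* Returns the index of name, or -1 if it has not been declared. */
int sym_find(const SymbolTable *t, const char *name)
{
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].name, name) == 0) {
            return i;
        }
    }
    return -1;
}

/* Handles a declaration such as "int x;". */
SymStatus sym_declare(SymbolTable *t, const char *name)
{
    if (sym_find(t, name) >= 0) return SYM_REDEFINED;
    if (t->count == MAX_SYMBOLS) return SYM_TABLE_FULL;

    strncpy(t->entries[t->count].name, name, MAX_NAME - 1);
    t->entries[t->count].name[MAX_NAME - 1] = '\0';
    t->entries[t->count].value = 0;
    t->count++;
    return SYM_OK;
}

/* Handles an assignment such as "x = 5;". */
SymStatus sym_assign(SymbolTable *t, const char *name, int value)
{
    int i = sym_find(t, name);
    if (i < 0) return SYM_UNDECLARED;
    t->entries[i].value = value;
    return SYM_OK;
}
```

A linear search is adequate for the handful of variables in a small program. A single flat table also means there is only one scope, so nested blocks would need a stack of tables or a scope level per entry.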
6.3 Type Checking and Error Handling
Type checking and error handling in the provided mini compiler are essential
aspects of semantic analysis that ensure the correctness of operations and
expressions. Type checking verifies that variables are assigned compatible data
types and that arithmetic operations are performed on appropriate types. The
compiler checks for semantic errors, such as the use of undeclared variables,
through symbol table management, generating error messages when issues arise.
Additionally, the ‘error’ function provides informative feedback, including line
numbers and error descriptions, to help users identify and correct problems in their
code. Overall, these mechanisms enhance the robustness of the semantic analysis
phase, ensuring adherence to rules of type usage and variable scope. The variable
assignment case below illustrates this process.
Variable Assignment:
Valid Assignment
Invalid Assignment
Type Checking Implementation: The compiler would check the type of the value being
assigned to x and raise an error if it does not match the expected type (integer).
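A check of this kind might look like the sketch below. The report names an error function but does not show it, so report_error, its message format, and the helpers literal_type and check_int_assignment are assumptions introduced for illustration. The sketch only handles literals on the right-hand side of the assignment.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical value categories for the right-hand side of an assignment. */
typedef enum { TYPE_INT, TYPE_UNKNOWN } ValueType;

/* Classifies a literal: a non-empty run of digits is an integer;
 * anything else (for example "3.5" or "abc") is not. */
ValueType literal_type(const char *text)
{
    if (*text == '\0') return TYPE_UNKNOWN;
    for (const char *p = text; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p)) return TYPE_UNKNOWN;
    }
    return TYPE_INT;
}

/* Reports a semantic error together with its line number.
 * The message format is an assumption, not the project's format. */
void report_error(int line, const char *message)
{
    fprintf(stderr, "Error at line %d: %s\n", line, message);
}

/* Checks "name = literal;" where name was declared as int.
 * Returns true if the assignment is type-correct. */
bool check_int_assignment(const char *name, const char *literal, int line)
{
    if (literal_type(literal) != TYPE_INT) {
        char msg[96];
        snprintf(msg, sizeof msg,
                 "cannot assign '%s' to int variable '%s'", literal, name);
        report_error(line, msg);
        return false;
    }
    return true;
}
```

With these helpers, check_int_assignment("x", "10", 1) succeeds, while check_int_assignment("x", "3.5", 2) prints a message on standard error and returns false.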
Intermediate Representation (IR) is a crucial concept in compiler design that serves
as a bridge between the high-level source code and the low-level machine code. It
provides a way to represent the program in a form that is easier for the compiler to
analyze and optimize, while still being abstract enough to be independent of the
target architecture. In this mini compiler, the IR consists of tokenized and parsed
structures ready for evaluation.
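A parsed structure that is ready for evaluation can be as simple as an expression tree. The sketch below assumes a tree with only two node kinds, integer literals and addition; the names Node, NodeKind, and eval_node are hypothetical and the project may represent its parsed form differently.

```c
#include <stddef.h>

typedef enum { NODE_NUMBER, NODE_ADD } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                 /* used when kind == NODE_NUMBER */
    const struct Node *left;   /* used when kind == NODE_ADD */
    const struct Node *right;  /* used when kind == NODE_ADD */
} Node;

/* Walks the tree bottom-up and returns the value of the expression. */
int eval_node(const Node *n)
{
    switch (n->kind) {
    case NODE_NUMBER:
        return n->value;
    case NODE_ADD:
        return eval_node(n->left) + eval_node(n->right);
    }
    return 0;  /* not reached for well-formed trees */
}
```

For the expression (2 + 3) + 4, the root is a NODE_ADD whose left child is another NODE_ADD over the literals 2 and 3, and eval_node returns 9. The same tree could instead be lowered to three-address code if a separate code generation phase were added.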
Obtained Output (Test Case 1):
8.2 Debugging Common Issues
Below are some common issues encountered during development, along with strategies
for debugging:
8.2.1 Lexical Analysis Issues
Issue: Scanner fails to recognize a valid identifier or token.
Debugging Steps:
● Verify the regular expressions used for token recognition.
● Check the order of token matching rules to avoid conflicts (e.g., the word "if" being
matched as an identifier instead of a keyword).
● Add debug statements to log unrecognized characters.
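In a hand-written scanner, the ordering conflict is usually avoided by scanning the whole word first and only then comparing it against the keyword list. The sketch below shows that check together with a debug log for unrecognized characters. It assumes the keyword set of this report's subset (int, print, main); the function names and the log format are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static const char *const KEYWORDS[] = { "int", "print", "main" };

/* Returns true only when the complete lexeme equals a keyword.
 * Comparing the full lexeme (not a prefix) keeps words such as
 * "integer" or "main2" classified as identifiers. */
bool is_keyword(const char *lexeme)
{
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++) {
        if (strcmp(lexeme, KEYWORDS[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* Debug aid: logs a character the scanner does not recognize, showing
 * its printable form (or '?') and its numeric code. */
void log_unrecognized(char c, int line)
{
    unsigned char uc = (unsigned char)c;
    fprintf(stderr, "[lexer] line %d: unrecognized character '%c' (0x%02X)\n",
            line, isprint(uc) ? uc : '?', (unsigned)uc);
}
```

Printing the numeric code alongside the character helps when the offending input is invisible, such as a stray non-breaking space or a carriage return from a file saved with Windows line endings.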
Debugging Steps:
● Simulate the execution of the intermediate code and verify its correctness step by
step.
● Check register allocation and memory management for errors.
● Add debug output to the generated code to trace execution.
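Step-by-step simulation can be done with a very small interpreter over the intermediate code. The sketch below assumes a three-address form with just two operations (load an immediate, add two temporaries); the instruction layout, the names Instr and simulate, and the trace format are hypothetical and chosen only to illustrate the debugging technique.

```c
#include <stdio.h>

#define NUM_TEMPS 8

typedef enum { OP_LOADI, OP_ADD } OpCode;

/* One three-address instruction:
 *   OP_LOADI: t[dst] = src1            (src1 is an immediate value)
 *   OP_ADD:   t[dst] = t[src1] + t[src2]                          */
typedef struct {
    OpCode op;
    int dst;
    int src1;
    int src2;   /* unused for OP_LOADI */
} Instr;

static int valid_temp(int idx)
{
    return idx >= 0 && idx < NUM_TEMPS;
}

/* Executes the instructions one at a time. When trace is non-zero,
 * prints the destination temporary after every step.
 * Returns 0 on success, -1 on an invalid operand or opcode. */
int simulate(const Instr *code, int count, int *temps, int trace)
{
    for (int i = 0; i < count; i++) {
        const Instr *in = &code[i];
        if (!valid_temp(in->dst)) return -1;

        switch (in->op) {
        case OP_LOADI:
            temps[in->dst] = in->src1;
            break;
        case OP_ADD:
            if (!valid_temp(in->src1) || !valid_temp(in->src2)) return -1;
            temps[in->dst] = temps[in->src1] + temps[in->src2];
            break;
        default:
            return -1;
        }
        if (trace) {
            fprintf(stderr, "step %d: t%d = %d\n", i, in->dst, temps[in->dst]);
        }
    }
    return 0;
}
```

Running the sequence t0 = 2, t1 = 3, t2 = t0 + t1 with tracing enabled prints one line per step and leaves 5 in t2. Comparing that trace with a hand calculation shows the first instruction at which the generated code diverges.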
9.2 Future Enhancements
9.2.1 Adding More Features of the C Language
Currently, the mini compiler supports a limited subset of the C language. Adding more
features would make it more versatile and closer to a full-fledged C compiler. Possible
enhancements include:
10. In Summary
In conclusion, the development of a mini compiler for a subset of the C programming
language represents a significant step in understanding the principles of compiler design
and implementation. Through the exploration of lexical analysis, syntax parsing,
semantic analysis, and code generation, we have demonstrated the fundamental processes
that transform high-level code into executable machine instructions. This project not only
highlights the intricacies involved in compiling a programming language but also serves
as a practical application of theoretical concepts in computer science. The mini compiler
effectively showcases the ability to parse and execute a limited set of C constructs,
providing a foundation for further enhancements and expansions. Future work could
involve extending the compiler's capabilities to support additional C features, optimizing
the generated code, or strengthening the existing error handling mechanisms to improve user
experience. Overall, this mini compiler serves as a valuable educational tool, fostering a
deeper understanding of compiler architecture and paving the way for more complex
programming language implementations.
REFERENCES
● YouTube (Cobb Coding)
● GitHub Projects
● GeeksforGeeks