Muhammad Hamza

BSCS-E3-22-23

Assignment # 01

Compiler Construction

Mr. Waqar Ahmad

University of Sahiwal
Question 1: Please answer directly without unnecessary details.

I. Write down the history of compilers. Why do we study compilers?
History:
Early computers could only be programmed directly in machine code or assembly. To make these machines more accessible, computer scientists developed compilers that allowed users to write programs in higher-level languages. The first complete compiler for a high-level language was developed in the 1950s at IBM to translate programs written in Fortran into machine code.

Why we study compilers:
Compilers provide the theoretical and practical knowledge needed to implement a programming language. Once you have studied how a compiler works, you pretty much know the innards of many programming languages, and judging a programming language (PL) by its essential features becomes easy.

II. Define what a compiler is and explain the main phases of the
compilation process?
Definition:
A compiler translates source code into machine code, allowing a computer to
execute the program.

Main Phases of the Compilation Process:


The compilation process typically involves several phases, which may vary
depending on the compiler and programming language. Here are the main
phases:

1. Preprocessing (Preprocessing Phase)
Reads the source code and performs preliminary operations:
○ Expands macros (if applicable)
○ Includes header files
○ Removes comments
○ Triggers preprocessor directives (e.g., #ifdef)
2. Lexical Analysis (Scanning Phase)
Breaks the preprocessed source code into individual tokens:
○ Identifiers (e.g., variable names)
○ Keywords (e.g., if, while)
○ Literals (e.g., numbers, strings)
○ Symbols (e.g., +, -, *, /)
3. Syntax Analysis (Parsing Phase)
Analyzes the tokens to ensure the program follows the language's syntax rules:
○ Parses the token stream into an abstract syntax tree (AST)
○ Checks for syntax errors (e.g., missing semicolons)
4. Semantic Analysis (Analysis Phase)
Examines the AST to ensure the program is semantically correct:
○ Checks type compatibility
○ Verifies variable declarations
○ Performs scope analysis
5. Intermediate Code Generation (ICG Phase)
Generates intermediate code (e.g., three-address code or bytecode) from the AST:
○ Three-address code (TAC) or quadruples
6. Optimization (Optimization Phase)
Improves the intermediate code to reduce execution time or memory usage:
○ Dead code elimination
○ Constant folding
○ Register allocation

7. Code Generation (CG Phase)
Translates the optimized intermediate code into machine code:
○ Target-specific assembly code or machine code

8. Assembly and Linking (optional)
If necessary, assembles and links the object files to create an executable file:
○ Resolves external references
○ Allocates memory for the program
After these phases, the compilation process is complete, and the resulting machine
code can be executed directly by the computer's processor.
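Several of these phases can be observed through CPython's standard library: ast.parse performs lexical and syntax analysis, and compile lowers the AST to bytecode. This is only a sketch using Python as an illustration, not a full compiler pipeline:

```python
import ast

# Source program: the input to the compiler's front-end
source = "x = 2 + 3 * 4"

# Lexical + syntax analysis: the parser tokenizes the source
# and builds an abstract syntax tree (AST) in one call.
tree = ast.parse(source)
print(ast.dump(tree.body[0].value))
# The dump shows the parsed structure: BinOp(2, Add, BinOp(3, Mult, 4)),
# i.e., operator precedence is already encoded in the tree shape.

# Code generation: compile the AST down to CPython bytecode,
# an intermediate representation executed by the interpreter's VM.
code = compile(tree, filename="<demo>", mode="exec")
namespace = {}
exec(code, namespace)
print(namespace["x"])  # 14
```

Running the compiled code confirms that the whole pipeline preserved the program's meaning (semantic correctness).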

III. Compare and contrast compilers and interpreters. Provide examples of programming languages that use each.

Compilers:
● Translate source code into machine code beforehand (ahead-of-time, AOT)
● Produce executable files that can run independently
● Typically faster execution speed
Examples:
○ C
○ C++
○ Fortran
○ Assembly languages

Interpreters:
● Translate and execute source code statement by statement at runtime
● Do not produce executable files
● Often used for dynamic or scripting languages
Examples:
○ Python
○ JavaScript
○ Ruby
○ PHP
Key Differences
● Compilation Time: Compilers translate code before execution, while interpreters
translate and execute simultaneously.
● Executable Files: Compilers produce executable files, whereas interpreters do
not.
● Execution Speed: Compiled code typically runs faster, since no translation overhead is paid at runtime.
● Error Handling: Compilers usually detect errors before execution, while
interpreters detect errors during execution.
Hybrid Approaches
● Just-In-Time (JIT) Compilation: Combines interpretation and compilation.
Examples:
● Java (JIT compilation)
● .NET languages (C#, F#, etc.)

● Bytecode Compilation: Compiles to intermediate bytecode, which is then interpreted or JIT-compiled to machine code.
Examples:
● Java (compiles to JVM bytecode)
● Python (bytecode compilation and interpretation)
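Python's hybrid model can be seen directly: the dis module disassembles the bytecode that CPython compiled from a function (the exact opcodes shown vary by Python version):

```python
import dis

# A small function: CPython compiles it to bytecode once,
# then its virtual machine interprets that bytecode at runtime.
def add(a, b):
    return a + b

# The compiled bytecode is attached to the function object.
print(add.__code__.co_code)   # raw bytecode as a bytes object
dis.dis(add)                  # human-readable disassembly (LOAD_FAST, BINARY_ADD/BINARY_OP, ...)
```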

IV. Discuss three primary goals of a compiler (e.g., correctness, optimization, portability). Why is each goal important for efficient program execution?
Primary Goals of a Compiler:

1. Correctness
Ensure the generated machine code accurately implements the source program's
semantics.
● Importance:
○ Prevents errors and bugs.
○ Maintains program integrity.
○ Ensures reliability.
2. Optimization
Improve the generated code's performance, efficiency, and resource usage.
● Importance:
○ Faster execution speeds.
○ Reduced memory usage.
○ Enhanced scalability.
3. Portability
Enable the compiled program to run on multiple platforms with minimal
modifications.
● Importance:
○ Cross-platform compatibility.
○ Reduced maintenance.
○ Increased software reuse.
Why These Goals Matter:
● Correctness ensures the program works as intended.
● Optimization enhances performance and efficiency.
● Portability facilitates wider adoption and easier maintenance.
Balancing Goals:
● Compilers trade off between goals (e.g., optimization vs. compilation speed).
● Effective compilers balance correctness, optimization, and portability.

V. Explain the distinction between the front-end and back-end of a compiler, and describe the specific tasks performed by each.
Compiler Front-end vs. Back-end

Front-end (Analysis Phase)

● Analyzes source code
● Tasks:
1. Lexical Analysis (tokenization)
2. Syntax Analysis (parsing)
3. Semantic Analysis (type checking)
4. Intermediate Code Generation (ICG)

Back-end (Synthesis Phase)

● Generates machine code
● Tasks:
1. Optimization (performance improvement)
2. Code Generation (machine code)
3. Assembly Code Generation
4. Object Code Generation

Key Distinctions:
1. Focus: Front-end (analysis) vs. Back-end (synthesis)
2. Input: Front-end (source code) vs. Back-end (intermediate code)
3. Output: Front-end (intermediate code) vs. Back-end (machine code)
Modular Design Benefits:
1. Easier maintenance
2. Platform flexibility
3. Reusability
Example Tools:

Front-end: Lex, Yacc (scanner and parser generators)
Back-end: LLVM, GCC's back-end (optimization and code-generation frameworks)
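The hand-off from front-end to back-end can be sketched as a toy translator that walks an expression AST and emits three-address code. The function name to_tac and the t1, t2, ... temporary-naming scheme are hypothetical illustrations, not any real compiler's IR:

```python
import ast

def to_tac(source):
    """Translate an arithmetic expression into three-address code (TAC),
    the kind of intermediate representation a front-end hands to the back-end."""
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    code, n = [], 0

    def gen(node):
        nonlocal n
        if isinstance(node, ast.Constant):       # literal: use its value directly
            return str(node.value)
        if isinstance(node, ast.Name):           # variable: use its name directly
            return node.id
        if isinstance(node, ast.BinOp):          # operator: emit one TAC instruction
            left, right = gen(node.left), gen(node.right)
            n += 1
            code.append(f"t{n} = {left} {ops[type(node.op)]} {right}")
            return f"t{n}"                       # result lives in a fresh temporary
        raise NotImplementedError(type(node).__name__)

    gen(ast.parse(source, mode="eval").body)
    return code

print(to_tac("a + b * c"))  # ['t1 = b * c', 't2 = a + t1']
```

Note how the TAC order reflects operator precedence: the multiplication is emitted first, exactly as the parse tree dictates.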

VI. Define lexical analysis and explain its role in the compilation
process. What is the role of a lexical analyzer (scanner)?
Lexical Analysis (Scanning)

Lexical analysis is the first phase of the compilation process. It breaks the source code into individual tokens, such as:

● Keywords (e.g., if, while)
● Identifiers (e.g., variable names)
● Literals (e.g., numbers, strings)
● Symbols (e.g., +, -, *, /)

Role in Compilation Process:

1. Converts source code into tokens
2. Removes whitespace and comments
3. Provides input for syntax analysis (parsing)

Lexical Analyzer (Scanner)

A lexical analyzer, also known as a scanner, performs lexical analysis. Its role is to:

1. Read source code character by character
2. Identify tokens based on patterns (regular expressions)
3. Return tokens to the parser

Scanner's Tasks:

1. Tokenization
2. Error handling (e.g., invalid characters)
3. Filtering (e.g., removing comments)

Output:

● Token stream (sequence of tokens)
● Token information (e.g., token type, value)

Example Tools:

● Lex (UNIX)
● Flex (Fast Lexical Analyzer)
● ANTLR (ANother Tool for Language Recognition)

Simple Analogy:

A scanner breaks text into words, e.g. "Hello World" → "Hello", "World". It does the same for programming code.

VII. Define and provide examples for:
● Token
● Lexemes
● Patterns

Token:
A single unit of code, representing a meaningful symbol or word.
Examples:
● Keywords: if, while, for
● Identifiers: x, myVariable, userName
● Literals: 5, 3.14, "hello"
● Symbols: +, -, *, /, =
● Operators: &&, ||, !

Lexemes:
The actual text or sequence of characters that makes up a token.

Examples:

● Token: Keyword (if), Lexeme: "if"
● Token: Identifier (myVariable), Lexeme: "myVariable"
● Token: Literal (5), Lexeme: "5"

Patterns:
Regular expressions or rules used to match and identify tokens.
Examples:
● Keyword pattern: if|while|for (matches the exact spelling of each keyword; a pattern like [a-zA-Z]+ cannot distinguish keywords from identifiers)
● Identifier pattern: [a-zA-Z_][a-zA-Z0-9_]* (matches a letter or underscore followed by letters, digits, or underscores)
● Integer literal pattern: [0-9]+ (matches one or more digits)
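The pattern/lexeme/token relationship can be demonstrated with Python's re module, using the identifier and integer patterns above. The pattern matches the text, the matched text is the lexeme, and pairing it with a type name gives the token:

```python
import re

# Patterns from above, compiled as regular expressions
identifier = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
integer = re.compile(r"[0-9]+")

# A lexeme is the matched text; a token pairs a token type with that lexeme.
m = identifier.fullmatch("myVariable")
print(("IDENTIFIER", m.group()))   # token: ('IDENTIFIER', 'myVariable')

m = integer.fullmatch("5")
print(("INT_LITERAL", m.group()))  # token: ('INT_LITERAL', '5')

# "9abc" is rejected: an identifier may not start with a digit.
print(identifier.fullmatch("9abc"))  # None
```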

VIII. How does the lexical analyzer use regular expressions to identify tokens?
Lexical Analyzer & Regular Expressions
The lexical analyzer uses regular expressions (regex) to match patterns in the source
code and identify tokens.
Steps:
1. Define regex patterns for tokens (e.g., keywords, identifiers, literals)
2. Scan source code character by character
3. Match characters against regex patterns
4. If match found, create token and store attributes (e.g., token type, value)
Regex Patterns:
● Keywords: if|while|for (matched by exact spelling, or recognized as identifiers and checked against a keyword table)
● Identifiers: [a-zA-Z_][a-zA-Z0-9_]* (e.g., x, myVariable)
● Integers: [0-9]+ (e.g., 5)
● Strings: "[^"]*" (e.g., "hello")
Regex Special Characters:
● . (dot) - matches any character
● * (star) - matches zero or more occurrences
● + (plus) - matches one or more occurrences
● ? (question mark) - matches zero or one occurrence
● [ ] (brackets) - matches any one of the characters within
● ^ (caret) - negates a character class when it appears first inside [ ]
Example Tools:
● Lex (UNIX)
● Flex (Fast Lexical Analyzer)
● ANTLR (ANother Tool for Language Recognition)
Benefits:
● Efficient tokenization
● Flexible pattern matching
● Easy maintenance and modification
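The steps above can be sketched as a scanner built from a single alternation of named-group regular expressions (the token names and patterns here are illustrative, not a fixed standard):

```python
import re

# Token patterns, tried in order (keywords before the general identifier rule)
TOKEN_SPEC = [
    ("NUMBER",  r"[0-9]+"),
    ("KEYWORD", r"\b(?:if|while|else)\b"),
    ("IDENT",   r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP",      r"[+\-*/=<>]"),
    ("SKIP",    r"[ \t\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Yield (token_type, lexeme) pairs; whitespace is filtered out."""
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":        # lastgroup names the rule that matched
            yield m.lastgroup, m.group()

print(list(tokenize("if x1 > 5")))
# [('KEYWORD', 'if'), ('IDENT', 'x1'), ('OP', '>'), ('NUMBER', '5')]
```

Because "if" is tried against the KEYWORD alternative before IDENT, it is tokenized as a keyword, while "x1" falls through to the identifier rule.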
IX. Discuss common errors that occur during lexical analysis. How
does a lexical analyzer handle errors, and what are some
techniques to recover from them?

Common Errors During Lexical Analysis:

1. Invalid characters
2. Unrecognized tokens
3. Token recognition errors (e.g., keyword vs. identifier)
4. Comment errors (e.g., unclosed comments)
5. String literal errors (e.g., unclosed strings)
Error Handling Techniques:
1. Error Tokens: Create special tokens for errors
2. Error Recovery: Continue analysis after error
3. Panic Mode: Skip input until valid token found
4. Synchronization Points: Resume analysis at specific points
5. Phrase-Level Recovery: Recover at phrase boundaries
Lexical Analyzer Error Handling:
1. Report error
2. Skip invalid input
3. Continue analysis
4. Recover using error tokens or synchronization points
Techniques to Recover from Errors:
1. Backtracking: Revert to previous state
2. Lookahead: Check future input
3. Synchronization Points: Resume analysis at specific points
4. Default Values: Assign default values to missing tokens
Tools and Techniques:
1. Lex, Flex, ANTLR (tools)
2. Regular expressions (pattern matching)
3. State machines (finite automata)
Best Practices:
1. Clear error reporting
2. Robust error recovery
3. Comprehensive testing
4. Well-documented error handling
Key Considerations:
1. Balance error reporting and recovery
2. Minimize error propagation
3. Ensure robustness and reliability
4. Optimize performance and efficiency
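An error-token plus skip-and-continue recovery strategy can be sketched as follows (the token names and the recovery policy are illustrative assumptions, not a fixed standard):

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<NUMBER>[0-9]+)
  | (?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<OP>[+\-*/=])
  | (?P<SKIP>\s+)
  | (?P<ERROR>.)          # anything else is an invalid character
""", re.VERBOSE)

def tokenize(source):
    """Report invalid characters as errors and keep scanning
    (simple recovery: skip the bad character and continue)."""
    tokens, errors = [], []
    for m in TOKEN_RE.finditer(source):
        if m.lastgroup == "SKIP":
            continue
        if m.lastgroup == "ERROR":
            errors.append(f"invalid character {m.group()!r} at position {m.start()}")
            continue   # panic-mode style: drop it and resume at the next character
        tokens.append((m.lastgroup, m.group()))
    return tokens, errors

tokens, errors = tokenize("x = 5 @ y")
print(tokens)   # [('IDENT', 'x'), ('OP', '='), ('NUMBER', '5'), ('IDENT', 'y')]
print(errors)   # ["invalid character '@' at position 6"]
```

The scanner reports the stray '@' but still produces a usable token stream, so later phases can continue and report further errors in one pass.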
Question 2: Regular Expressions
I. Create a regular expression that matches strings of even length
that only contain the characters x, y, and z.
Here's a regular expression:

([xyz]{2})*
This matches:
● Only x, y, z characters
● Strings of even length (0, 2, 4, 6, ...)
Breakdown:
● [xyz] matches x, y, or z
● {2} means exactly two such characters
● (...)* repeats the two-character group zero or more times, so the total length is always even
Example matches:
● xy
● yz
● xxyy
● zzzz
Example non-matches:
● x (odd length)
● y (odd length)
● xyz (odd length)
Note: this pattern also accepts the empty string (length 0 is even); use ([xyz]{2})+ if at least one pair is required.
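Pairing characters as ([xyz]{2})* guarantees an even total length; this can be checked with Python's re module:

```python
import re

# Anchored with fullmatch, the pattern consumes characters two at a time,
# so only even-length strings over {x, y, z} can match.
even = re.compile(r"([xyz]{2})*")

for s in ["", "xy", "xxyy", "zzzz"]:
    assert even.fullmatch(s)       # even lengths match (including the empty string)
for s in ["x", "xyz", "xab"]:
    assert not even.fullmatch(s)   # odd lengths and foreign characters do not
print("all checks passed")
```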

II. Write the regular expression for the language accepting all strings which start with 1 and end with 0, over ∑ = {0, 1}.

The regular expression for the language accepting all strings starting with 1 and ending with 0 over ∑ = {0, 1} is:
1(0|1)*0
Explanation:
● 1: Starts with 1
● (0|1)*: Followed by any number (including zero) of 0s or 1s
● 0: Ends with 0
This regular expression matches strings like:
● 10
● 110
● 1010
● 1110
But not:
● 00
● 11
● 010
A related notation is 1(0|1)+0, which requires at least one symbol between the leading 1 and the trailing 0; 1(0|1)*0 allows zero symbols in between, so the shortest accepted string is 10.
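The pattern 1(0|1)*0 can be checked against these examples with Python's re module:

```python
import re

# Starts with 1, any mix of 0s and 1s in between, ends with 0.
pattern = re.compile(r"1(0|1)*0")

for s in ["10", "110", "1010", "1110"]:
    assert pattern.fullmatch(s)        # accepted strings
for s in ["00", "11", "010", "1"]:
    assert not pattern.fullmatch(s)    # rejected strings
print("all checks passed")
```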

III. Write the regular expression for the language starting with a but
not having consecutive b's.

Here is the regular expression:

a(b?(a|c|d))*b?
Alternatively:
a(⟨non-b⟩|b⟨non-b⟩)*(b|ε)
Where ⟨non-b⟩ = (a|c|d)
Explanation:
● a: Starts with 'a'
● (b?(a|c|d))*: Zero or more occurrences of an optional 'b' followed by a non-b character, so no two b's are ever adjacent
● b?: Optionally ends with a single 'b'
This regular expression matches strings like:
● a
● ac
● abac
● abcda
But not:
● abba
● abb
● abbb
Note: The alphabet Σ = {a, b, c, d}.
If Σ has more characters, replace (a|c|d) with ([^b]), where [^b] matches any character
except 'b'.
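A pattern of this form, a(b?(a|c|d))*b?, can be checked with Python's re module (note that it also accepts strings ending in a single 'b', such as "ab", since that creates no consecutive b's):

```python
import re

# Every 'b' except an optional final one is immediately followed by a non-b,
# so "bb" can never occur.
non_b = "[acd]"
pattern = re.compile(f"a(b?{non_b})*b?")

for s in ["a", "ac", "abac", "abcda", "ab"]:
    assert pattern.fullmatch(s)        # starts with 'a', no consecutive b's
for s in ["abba", "abb", "abbb", "ba"]:
    assert not pattern.fullmatch(s)    # consecutive b's, or doesn't start with 'a'
print("all checks passed")
```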

IV. Write the regular expression for the language accepting all strings in which any number of a's is followed by any number of b's, followed by any number of c's.

The regular expression for the language accepting all strings with any number of a's followed by any number of b's followed by any number of c's is:
a*b*c*
Explanation:
● a*: Zero or more 'a's
● b*: Zero or more 'b's
● c*: Zero or more 'c's
This regular expression matches strings like:
● aaabbbccc
● abc
● a
● b
● c
● (empty string)
Note: The * symbol indicates zero or more occurrences.
If you want to ensure at least one character, replace * with +:
a+b+c+
This requires at least one 'a', one 'b', and one 'c'.
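The pattern a*b*c* can be checked with Python's re module:

```python
import re

# Any number of a's, then b's, then c's, in that order (each possibly zero).
pattern = re.compile(r"a*b*c*")

for s in ["aaabbbccc", "abc", "a", "b", "c", ""]:
    assert pattern.fullmatch(s)        # correct ordering (or empty) is accepted
for s in ["cba", "acb", "aba"]:
    assert not pattern.fullmatch(s)    # out-of-order letters are rejected
print("all checks passed")
```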
