CD 1-8 Units Q&A by NovaSkillHub
• Lexeme: It is the actual text from the source code that matches the pattern of a token.
Example: In if(x > 0),
o if is a lexeme, and its token is the keyword if
4-Marks Questions
The structure of a compiler consists of several phases that work together to translate source
code written in a high-level language into machine-level code. These phases are organized in
a sequence, and each phase performs a specific task.
• The compiler starts with the lexical analysis phase, where the source code is read and
broken into tokens. Tokens are meaningful words like keywords, identifiers,
operators, etc. This phase also removes white spaces and comments.
• Next is the syntax analysis phase, also called parsing. It checks whether the tokens
follow the grammar rules of the language. If any syntax error is found, it is reported
at this stage. A parse tree or syntax tree is generated which represents the structure
of the source code.
• After that, the semantic analysis phase verifies the meaning of the statements. It
checks things like variable declarations, type compatibility, and scope resolution.
• The intermediate code generation phase then produces a machine-independent
representation of the program, such as three-address code.
• Then comes the code optimization phase, which improves the intermediate code by
removing unnecessary instructions or reordering them to make the program run
faster or use less memory.
• After optimization, the code generation phase takes place where the intermediate
code is converted into target machine code.
• The symbol table is used throughout the process to keep track of variable names,
types, and scopes.
• The compiler also has error handling in each phase and provides proper messages to
the programmer to fix issues in the code.
All these phases work in sequence to convert high-level language code into efficient machine
code. A neat block diagram showing these phases should be drawn along with the answer
for better clarity and marks.
Token recognition is one of the key tasks performed during the lexical analysis phase of the
compiler. The main aim is to identify the smallest meaningful units in the source code,
known as tokens. A token represents categories such as keywords, identifiers, operators,
constants, delimiters, etc.
• The lexical analyzer reads the input program character by character from left to right
and groups them into tokens using patterns defined by regular expressions.
• Each pattern is associated with a token name, and the input that matches a particular
pattern is considered a valid token. These patterns are matched using tools or
manually written code.
• For example, in the line int x = 10;, the tokens identified are:
o int → Keyword
o x → Identifier
o = → Assignment operator
o 10 → Constant
o ; → Delimiter
• The process of recognizing tokens helps the next phase, i.e., the syntax analyzer, to
build a proper structure (parse tree) of the source code.
This whole process of token recognition plays a vital role in ensuring that the input is valid
and properly formatted before moving to further phases of compilation.
Both hand-written lexical analyzers and LEX-generated analyzers are used for scanning
source code and producing tokens, but they differ in terms of implementation, ease of use,
and flexibility.
• A hand-written lexical analyzer is coded manually (usually in C), with the
programmer directly writing the logic for reading characters and matching tokens.
• On the other hand, LEX is a tool that automatically generates a lexical analyzer. It
takes input in the form of regular expressions and actions and produces the code for
scanning and token recognition.
• Hand-written analyzers are more flexible but require more time and effort.
Debugging and testing also take longer.
• LEX analyzers are faster to develop and easier to modify but might not be as efficient
as hand-written ones in complex scenarios.
In summary, LEX is used for quick and standard lexical analyzer development, while hand-
written analyzers are preferred when more control and customization are needed.
LEX is a tool used to generate lexical analyzers. It uses regular expressions to define patterns
and matches tokens in the input code.
Below is a simple LEX program to recognize keywords like int, float, and identifiers (variable
names):
%{
#include <stdio.h>
%}
%%
"int"|"float"           { printf("Keyword: %s\n", yytext); }
[a-zA-Z_][a-zA-Z0-9_]*  { printf("Identifier: %s\n", yytext); }
[ \t\n]+                ;   /* skip whitespace */
.                       ;   /* ignore any other character */
%%
int yywrap() { return 1; }
int main() {
    yylex();
    return 0;
}
• In this program, keywords like int, float, etc., are matched and printed.
• This code can be compiled using lex and gcc to test and recognize input patterns.
This program clearly shows how LEX is used to automate the process of recognizing
keywords and identifiers in any source code.
5-Marks Questions
The compilation process is divided into several phases, each having a specific role in
translating the source code to machine code. These phases are executed in order and work
together to produce efficient executable programs.
• The first phase is Lexical Analysis, which reads the source code and converts it into
tokens. It removes comments, white spaces, and separates valid words like keywords
and identifiers.
• Next is the Syntax Analysis phase, also known as parsing. It checks if the token
sequence follows the correct grammar rules of the programming language. It
generates a parse tree or syntax tree as output.
• Then comes the Semantic Analysis phase, which checks for the meaning of the
statements. It verifies if variables are declared, types are matched, and functions are
called with correct parameters.
• The Intermediate Code Generation phase then produces a machine-independent
form of the program, such as three-address code, and the Code Optimization phase
improves it by removing redundant instructions.
• The Code Generation phase converts the optimized code into actual machine code
that can be executed on the hardware.
• Symbol Table Management is used throughout all these phases to store information
about variables, functions, types, and scope.
• Error Handling is also done at each phase to detect and report errors effectively.
2. Discuss in detail the working of LEX tool with an example.
LEX is a tool used to generate lexical analyzers in compiler design. It takes patterns defined
by regular expressions and produces a program in C that can identify and process tokens
from the input.
• The LEX file is divided into three sections: definitions, rules, and user code. In the
definitions section, header files are included. The rules section contains regular
expressions and actions. The user code section has the main function and other
logic.
• When a LEX file is compiled using lex command, it generates a file called lex.yy.c,
which is then compiled using gcc to create an executable.
• The tool reads the input program character by character, matches it with defined
patterns, and performs specified actions like printing token names.
Example:
%%
"int"                   { printf("Keyword: int\n"); }
[a-zA-Z][a-zA-Z0-9]*    { printf("Identifier: %s\n", yytext); }
%%
• In the above example, if the input contains int or any variable name like count, it will
print appropriate messages.
• This tool is very helpful in automating lexical analysis, which saves time and reduces
the possibility of errors compared to writing it manually.
3. How does lexical analysis help in error handling and token generation?
Lexical analysis plays a very important role in the compiler design process, especially in the
initial stages of source code processing. It is responsible for scanning the input code and
breaking it down into meaningful tokens.
• During token generation, the lexical analyzer uses regular expressions to identify
valid words in the program such as keywords, operators, identifiers, and constants.
These tokens are then passed to the next phase, i.e., syntax analysis.
• Apart from token generation, lexical analysis also handles errors related to invalid
characters, illegal symbols, or unexpected sequences in the source code.
• If a character does not match any defined pattern, the lexical analyzer detects it as an
error and reports it. For example, if the programmer writes int @value;, the @
symbol will be flagged as an error.
• The lexical analyzer also performs error recovery, such as skipping invalid input or
using dummy tokens to allow the compilation process to continue.
• Proper error messages are given with line numbers and descriptions to help the
programmer fix them quickly. This improves the debugging process and code quality.
Hence, lexical analysis ensures that only valid and meaningful tokens are passed to the
parser, and errors in the source code are caught at the earliest stage possible.
UNIT-2: Introduction to Syntax Analysis
2-Marks Questions
• The parser is the second phase in the compiler, after lexical analysis.
• It checks whether the token sequence follows the syntax rules of the programming
language, defined by grammar.
• If the structure is correct, it builds a parse tree; otherwise, it reports syntax errors.
• Each production rule has a single non-terminal on the left side and a combination of
terminals and non-terminals on the right side.
• Left recursion happens when a non-terminal refers to itself as the first symbol in one
of its production rules.
• It can cause infinite recursion in top-down parsers like recursive descent parsers.
• To avoid this problem, left recursion is removed and replaced by right recursion or
iteration.
• Removing left recursion helps the parser work efficiently and correctly.
• A grammar is said to be ambiguous if a string can have more than one valid parse
tree.
4-Marks Questions
E → E + T | T
T → T * F | F
F → (E) | id
• This grammar shows expressions (E) as a combination of terms (T) and factors (F).
• Brackets are also handled using (E) which means expressions can be nested.
Left recursion causes issues in top-down parsers and needs to be removed. Let’s take a
simple example to explain how to eliminate it.
Consider the left-recursive production:
A → A α | β
It is rewritten without left recursion as:
A → β A’
A’ → α A’ | ε
• This version removes the left recursion and can now be parsed using top-down
parsers.
• Example: E → E + T | T becomes E → T E’ with E’ → + T E’ | ε.
• One example is variable declarations and usage. In most languages, a variable must
be declared before use, and CFGs can’t track such dependencies.
• Also, checking that function parameters match during a function call is beyond the
power of CFG.
E → E + T | T
T → T * F | F
F → id
For the expression a + b * c, the tree structure will show operator precedence:
              E
            / | \
           E  +  T
           |   / | \
           T  T  *  F
           |  |     |
           F  F     c
           |  |
           a  b
• This tree shows that b * c is evaluated first (due to higher precedence of *) and then
added to a.
5-Marks Questions
Context-Free Grammar (CFG) plays a vital role in defining the syntax rules of programming
languages. It helps compilers understand and validate the structure of source code.
• CFG consists of four parts: a set of terminals, non-terminals, a start symbol, and
production rules. Terminals are the basic symbols (like keywords, operators), and
non-terminals are placeholders for patterns of terminals.
• CFG is used to define how statements and expressions should be written. For
example, a grammar rule might define how an arithmetic expression or an if-else
statement should look.
• Parsers use CFG to check if a given program follows the syntax rules. If the code
matches the rules, it is considered syntactically correct.
• CFG helps in generating parse trees that represent the structure of code. These trees
are further used in semantic analysis and code generation.
Ambiguity in grammar occurs when a single input string can be parsed in more than one
way. This leads to multiple parse trees and different interpretations of the same program.
• Ambiguous grammars are problematic because the compiler cannot decide which
structure to follow. This can cause confusion in code execution.
• Example:
Ambiguous grammar:
E → E + E | E * E | id
Input string: id + id * id
This string has two parse trees: one grouping it as (id + id) * id and the other as
id + (id * id).
• The ambiguity can be removed by rewriting the grammar with separate rules for
each precedence level:
E → E + T | T
T → T * F | F
F → id
• Using this, the parse tree will always give id + (id * id) for the above input.
Writing grammars for programming languages is a crucial part of language design. Good
grammar ensures that the syntax rules are clear, unambiguous, and easy to parse.
• One technique is starting with simple rules and gradually building complex
constructs. Start from expressions and then move to statements and blocks.
• Use unambiguous grammar to avoid confusion. For example, use separate rules for
different operator precedence levels.
• Remove left recursion from grammar, especially for top-down parsers, to prevent
infinite loops.
• Use factoring techniques like left factoring to make the grammar suitable for
predictive parsing.
• Ensure the grammar handles all valid constructs of the language, including loops,
conditionals, functions, and declarations.
• Also include error handling rules in the grammar to help the parser detect and
report mistakes.
• Test the grammar with various inputs to check if it handles precedence, associativity,
and nesting properly.
2-Marks Questions
• FIRST set of a non-terminal contains all the terminals that can appear as the first
symbol in some string derived from that non-terminal.
• If the non-terminal can derive epsilon (ε), then epsilon is also included in its FIRST
set.
• FOLLOW set of a non-terminal contains all the terminals that can appear immediately
to the right of that non-terminal in some derivation.
• These sets are used in constructing predictive parsing tables for LL(1) parsers.
• A predictive parser is a type of top-down parser that does not use backtracking.
• It predicts the production to use based on the current input symbol and non-
terminal.
• A shift-reduce (bottom-up) parser, by contrast, shifts input symbols onto the stack
until it can reduce them to a non-terminal using a grammar rule.
• It repeats shifting and reducing until it reduces the entire input to the start symbol.
• Viable prefixes are the prefixes of the right sentential forms that can appear on the
stack of a shift-reduce parser.
• They help in constructing LR parsing tables and are recognized by LR(0) automata.
4-Marks Questions
• The table is constructed using the FIRST and FOLLOW sets of the grammar.
• The parser uses the table to decide which production to apply by looking at the
current input symbol and top of the stack.
• LL(1) table must not have any multiple entries; otherwise, the grammar is not LL(1).
• SLR(1) parser is a simple LR parser that uses FOLLOW sets to determine parsing
actions.
• LALR(1) parser uses lookaheads specific to items, making it more powerful and
precise.
• SLR(1) may fail for some grammars due to insufficient context in FOLLOW sets.
• LALR(1) combines states with same core items and merges lookahead information,
improving accuracy.
• LALR(1) parsers are widely used in practice (like in YACC), as they balance efficiency
and power.
• All SLR(1) grammars are LALR(1), but the reverse is not always true.
• YACC (Yet Another Compiler Compiler) is a tool that generates parsers automatically.
• YACC takes grammar rules as input and generates C code for the parser.
• Example: A simple calculator grammar written in YACC can parse expressions like a +
b * c.
• It works with a lexical analyzer like Lex to complete the front-end of a compiler.
4. Discuss error recovery in predictive parsing.
• When an unexpected symbol is found, the parser may stop and report a syntax error.
• One recovery method is Panic Mode, where symbols are discarded until a
synchronizing token (like ; or }) is found.
• Another method is Error Productions, where extra grammar rules are added to catch
specific errors.
• Error routines can also be written to suggest corrections, like missing brackets or
incorrect keywords.
• These techniques make the parser more user-friendly and helpful during compilation.
5-Marks Questions
Recursive descent parsing is a top-down method of parsing where each non-terminal in the
grammar is represented by a function in the parser.
• The parser functions call each other recursively to match the input tokens.
• If the grammar has left recursion, it must be removed before using this method.
Example Grammar:
E → T E'
E' → + T E' | ε
T → id
• The parser reads tokens one by one and matches them with grammar rules.
LR(0) automaton is used to construct the canonical collection of LR(0) items, which helps in
building the LR parsing table.
• LR(0) items are grammar rules with a dot (•) showing the position of the parser.
• For example: A → α • β means the parser has seen α and expects β next.
• The automaton starts with an initial item and builds states by shifting the dot over
the symbols.
• These tables guide the parser during shift, reduce, and accept decisions.
LR(0) is the foundation of more powerful parsers like SLR, LR(1), and LALR(1).
Shift-reduce parsing is a bottom-up approach that uses a stack and input buffer.
• The parser keeps shifting input symbols onto the stack until it matches the right side
of a production rule.
• This process continues until the stack contains only the start symbol.
Example Grammar:
E → E + id | id
Input: id + id
Steps:
• Shift id → Stack: id
• Reduce id to E → Stack: E
• Shift + → Stack: E +
• Shift id → Stack: E + id
• Reduce id to E → Stack: E + E
• Reduce E + E to E → Stack: E
Input is accepted.
4. Discuss the LR parsing algorithm and compare SLR(1), LR(1), and LALR(1).
The LR parsing algorithm uses two tables — ACTION and GOTO — along with a stack to parse
input from left to right, constructing a rightmost derivation in reverse.
• It either shifts the input to the stack or reduces using a grammar rule.
• SLR(1) is the simplest and uses FOLLOW sets for lookahead, but may reject valid
grammars.
• LR(1) is the most powerful and uses specific lookaheads for each item, but creates
large tables.
• LALR(1) combines LR(1) states with the same core items and merges lookaheads,
giving power close to LR(1) with smaller tables.
2-Marks Questions
• These attributes can carry semantic information like data types, values, or memory
locations.
• Synthesized attributes are computed from the attributes of a symbol’s children in the
parse tree and passed upwards.
• Inherited attributes are computed from the attributes of the symbol's parent or
siblings and passed downwards or sideways in the tree.
• Nodes in the graph represent attributes, and edges show which attributes are
needed to compute others.
• S-attributed definitions use only synthesized attributes. These are easy to evaluate
using bottom-up parsers.
• L-attributed definitions may use both synthesized and inherited attributes, but
inherited ones are restricted to come from the left side.
• S-attributed definitions are a subset of L-attributed definitions.
• L-attributed definitions are more general and suitable for top-down parsing.
4-Marks Questions
• For every production rule in a grammar, each attribute involved becomes a node in
the graph.
• Example:
For the production E → E1 + T, suppose we want to compute E.val = E1.val + T.val.
The dependency graph will have arrows from E1.val and T.val to E.val.
• By analyzing this graph, the compiler can evaluate attributes in a correct sequence.
• Consider a grammar:
E → E + T
E → T
T → T * F
T → F
F → digit
• SDD (semantic rules attached to each production):
E → E1 + T    { E.val = E1.val + T.val }
E → T         { E.val = T.val }
T → T1 * F    { T.val = T1.val * F.val }
T → F         { T.val = F.val }
F → digit     { F.val = digit.lexval }
• These rules evaluate the final result by computing values as the parser processes the
expression.
• Synthesized attributes are evaluated by using the values of attributes from child
nodes in the parse tree.
• The evaluation proceeds from the leaves of the tree towards the root — this is called
bottom-up evaluation.
• Each non-terminal collects the values of its children to compute its own synthesized
attribute.
• These semantic actions are performed when a reduction happens in the LR parsing
process.
• Each grammar rule is associated with code that computes the synthesized attribute
during reduction.
• The values are stored in a semantic stack alongside the parsing stack.
• This approach integrates attribute evaluation into the standard LR parsing flow
efficiently.
5-Marks Questions
• S-attributed definitions use only synthesized attributes and are well-suited for
bottom-up parsers like LR parsers.
• In S-attributed grammars, attributes are passed from child to parent in the parse
tree.
• Example (S-attributed): for E → E1 + T, the rule E.val = E1.val + T.val uses only
synthesized attributes.
• Example (L-attributed): for A → B C, the rule C.i = B.s passes an inherited attribute
to C from its left sibling B.
• S-attributed grammars are simpler but less flexible, while L-attributed grammars are
more powerful and commonly used in semantic analysis.
• The evaluation order of attributes must follow the dependency rules between them
to ensure correctness.
• The graph is constructed with nodes representing attributes and edges showing
dependencies.
• A topological sort of this graph gives the correct order in which attributes should be
evaluated.
• If there is a cycle in the graph, it means circular dependency and the attributes
cannot be evaluated.
• This process ensures that all required values are computed before using them in any
rule.
• L-attributed definitions are ideal for these parsers because inherited attributes can
be passed as function parameters.
• The function for C will then use the inherited value i to compute its own attributes.
• The order of calling functions naturally follows the left-to-right flow required for L-
attributed grammars.
2-Marks Questions
A symbol table is like a dictionary used by the compiler to store information about
identifiers (like variables, functions, arrays, etc.).
It stores details like:
• Name of identifier
• Scope (local/global)
• Memory location
It helps the compiler quickly check if a variable is declared, already used, or defined.
2. How is "scope" represented in semantic analysis?
• Scope is represented using symbol tables, typically organized as a stack with one
table per block or function.
• Entering a block pushes a new table and leaving it pops the table, so name lookup
searches from the innermost scope outward.
3. What are synthesized attributes?
• Synthesized attributes: Values passed from children to parent in the parse tree.
→ Example: Calculating expression values.
Semantic error recovery means how the compiler handles errors in meaning, like:
• Error messages
• Guessing types
• Skipping code
so compilation can continue without stopping completely.
4-Marks Questions
For expressions:
E → E1 + T
→ Check that E1.type and T.type are compatible; the result type of E is set accordingly.
Array Example:
A → B[C]
→ Check that B has an array type and that C is an integer index.
For assignments:
S → id = E
→ Check if id is declared
→ Check that the type of E is compatible with the declared type of id.
2. Discuss the treatment of arrays and structures during semantic analysis using attribute
grammars.
Arrays:
• The symbol table records an array's element type and size, e.g., array(10, int).
• For an access a[i], semantic rules check that a is an array and i is an integer, and
compute the address of the selected element.
Structures:
• The symbol table stores each structure's field names, their types, and their offsets.
• For an access s.f, semantic rules check that f is a declared field of s's type and use
the field's type for further checking.
• Global Scope
• Local Scope
Symbol tables are used to manage identifiers within each scope.
Example:
int x;          // global scope: visible everywhere
void func() {
    int y;      // local scope: visible only inside func
}
• Type mismatch
→ int x = "abc"; (a string assigned to an int variable)
Compiler gives:
• Line numbers
• A description of the error, so the programmer can locate and fix it quickly
2-Marks Questions
Example:
x1 = 10
x2 = x1 + 5
3. What is backpatching?
Backpatching is the process of delaying the insertion of jump targets (like labels) until the
target is known.
Example:
In if (a || b) → If a is true, b is not checked.
4-Marks Questions
Example for a = b + c * d
1. (*, c, d)
2. (+, b, (1))
3. (=, a, (2))
Here (1) and (2) refer to the results of entries 1 and 2.
Code:
if (a < b)
x = 1;
else
x = 2;
Intermediate Code:
if a < b goto L1
goto L2
L1: x = 1
goto L3
L2: x = 2
L3:
Code:
a[i] = b[j] + 5;
t1 = b[j]
t2 = t1 + 5
a[i] = t2
Quadruples:
op     arg1   arg2   result
=[]    b      j      t1
+      t1     5      t2
[]=    t2     i      a
Code:
a = 5;
if (a < 10)
b = 1;
else
b = 2;
Flow Graph:
• B1: a = 5
• B2: b = 1
• B3: b = 2
• B4: End
Edges:
• B1 → B2 (if true)
• B1 → B3 (if false)
• B2, B3 → B4
5-Marks Questions
1. Three-address code (for the expression (a + b) * c):
t1 = a + b
t2 = t1 * c
2. Quadruples:
op    arg1   arg2   result
+     a      b      t1
*     t1     c      t2
3. Triples:
(0)  (+, a, b)
(1)  (*, (0), c)
• Represents expressions in a form that is easy to optimize and translate; triples avoid
temporary names by referring to the position of an earlier entry.
2. Describe the translation of control-flow constructs like while-do, switch, and Boolean
expressions with short-circuit code.
While Loop:
while (i < 5)
    i = i + 1;
Intermediate Code:
L1: if i >= 5 goto L2
    i = i + 1
    goto L1
L2:
Switch Statement:
switch(x) {
case 1: y = 1; break;
case 2: y = 2; break;
}
Intermediate Code:
if x == 1 goto L1
if x == 2 goto L2
goto L3
L1: y = 1
    goto L3
L2: y = 2
L3:
Short-Circuit Boolean:
if (a && b) is translated so that b is tested only when a is true:
if a == 0 goto Lfalse
if b == 0 goto Lfalse
goto Ltrue
Code:
if (a > b || c < d)
Step-by-step:
• Generate a conditional jump for a > b with its target left blank; if it is true, the
whole condition is true and the second test is skipped.
• Generate a similar jump for c < d, also with its target left blank.
• Maintain lists of these incomplete true and false jumps. After generating the labels,
we backpatch the correct targets into the jump instructions.
4. Generate intermediate code for a sample program with expressions, arrays, and control
statements.
Code:
Code (assuming the statement runs inside a loop such as for (i = 0; i < n; i++)):
a[i] = b[i] + 2;
Intermediate Code:
    i = 0
L1: if i >= n goto L2
    t1 = b[i]
    t2 = t1 + 2
    a[i] = t2
    i = i + 1
    goto L1
L2:
Covers:
• Arrays
• Arithmetic
• Loop
• Condition Check
UNIT-7: Run-Time Environments (Simple & Easy for Exam)
2-MARKS QUESTIONS
• Parameters
• Local variables
• Return address
• Temporary data
• Variable storage
Example:
Calling fact(3), with the base case at n == 0, creates four activation records: fact(3), fact(2), fact(1), and fact(0).
• They are accessed using fixed memory locations or global symbol table
Example:
In C language, we can access global variable int x; from any function.
Think of stack as a call history that stores what’s needed for each function.
5-MARKS QUESTIONS
Memory Layout:
| Code Area |
| Global Variables |
| Heap (Dynamic) |
| Stack (Functions) |
Example:
In Pascal:
procedure A;
    procedure B;   { B is nested inside A and can access A's local variables }
Example:
int fact(int n) {
    if (n == 0) return 1;
    return n * fact(n - 1);
}
Stack of activation records while evaluating fact(3):
| fact(3) |
| fact(2) |
| fact(1) |
| fact(0) |
UNIT-8: Machine Code Generation & Optimization (Easy for End Sem)
2-MARKS QUESTIONS
MOV R1, R2
4-MARKS QUESTIONS
Intermediate Code:
t1 = a + b
t2 = t1 * c
LOAD a, R1      ; R1 = a
ADD b, R1       ; R1 = a + b        (t1)
MUL c, R1       ; R1 = (a + b) * c  (t2)
STORE R1, t2    ; store the result into t2
The code is converted into steps that the machine understands using registers.
2. Types of Machine-Independent Optimizations:
1. Constant Folding:
Replace constant operations at compile-time.
x = 2 + 3 → x = 5
2. Common Subexpression Elimination:
Avoid recomputing a value that is already available.
a = b + c; d = b + c → a = b + c; d = a
3. Strength Reduction:
Replace expensive operations with cheaper ones.
x = y * 2 → x = y + y
1. Instruction Selection:
Choose the correct machine instructions for operations.
2. Register Allocation:
Assign variables to CPU registers.
3. Instruction Scheduling:
Arrange instructions to avoid delays and improve speed.
It is applied after intermediate code generation and before machine code generation.
Constant Folding:
x = 10 * 5 → x = 50
Constant Propagation:
int x = 5; y = x + 1 → y = 5 + 1 → y = 6
Strength Reduction:
x = y * 2 → x = y + y
Common Subexpression Elimination:
a = b + c; d = b + c → d = a
Arithmetic Statement (for example, t2 = (a + b) * c):
Intermediate Code:
t1 = a + b
t2 = t1 * c
Machine Code:
LOAD a, R1
ADD b, R1
MUL c, R1
STORE R1, t2
Intermediate Code:
if (a > b) goto L1
goto L2
L1: x = 1
    goto L3
L2: x = 0
L3:
Machine Code:
LOAD a, R1
SUB b, R1
JGT R1, L1
JMP L2
L1: MOV 1, x
    JMP L3
L2: MOV 0, x
L3:
Dear Students,
The following questions for Units 1 to 8 of Compiler Design are carefully prepared to help
you cover all key topics for your semester preparation. Make sure to go through them
thoroughly; these are designed to support your success!
Ch Anil Kumar
NovaSkillHub