Unit 3 CD: Compiler Design Notes

Type Checking: An Overview

Type checking is the process of verifying and enforcing the constraints of types—data
classifications that dictate how values can be manipulated—within a programming language. It is
a crucial aspect of static analysis that ensures programs adhere to the type rules, preventing type
errors that could lead to runtime failures.

Type System

A type system is a formal framework within a programming language that assigns types to
various constructs, such as variables, expressions, functions, and modules. It defines the rules for
type assignments and type checking. A strong type system helps in detecting errors at compile
time, improving program safety and reliability.

Key Concepts in Type Systems

1. Static vs. Dynamic Typing:
o Static Typing: Types are checked at compile time. Examples include C, C++, and Java.
o Dynamic Typing: Types are checked at runtime. Examples include Python and JavaScript.
2. Strong vs. Weak Typing:
o Strong Typing: Types are enforced strictly, and implicit type conversion is
limited or non-existent.
o Weak Typing: Types can be implicitly converted, allowing more flexibility but
increasing the risk of errors.
3. Type Inference:
o Some languages (like Haskell or TypeScript) can infer types based on the context,
reducing the need for explicit type annotations.

Equivalence of Expressions

In compiler design, two expressions are considered equivalent if they yield the same result for all
possible values of their variables in a given context. Expression equivalence is crucial for
optimizations like common subexpression elimination, code motion, and constant folding.

Types of Equivalences:

1. Semantic Equivalence: Two expressions are semantically equivalent if they always produce the same result. For example, x + 0 is semantically equivalent to x.
2. Syntactic Equivalence: Two expressions are syntactically equivalent if they have the same structure and operand values. For example, x + 1 and 1 + x may be semantically equivalent (for a commutative operation) but are not syntactically equivalent.
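
As a small illustration of how an optimizer can exploit semantic equivalence, the sketch below rewrites x + 0 to x on a toy expression tree. The Expr structure and the simplify pass are invented for illustration only; they are not taken from any real compiler.

cpp
#include <iostream>
#include <memory>
#include <string>
using namespace std;

// Tiny expression tree used only to illustrate replacing an expression
// with a semantically equivalent, simpler one (x + 0 -> x).
struct Expr {
    string op;     // "+", "var", or "const"
    int value;     // used when op == "const"
    string name;   // used when op == "var"
    shared_ptr<Expr> left, right;
};

// Simplification pass: rewrite x + 0 (and 0 + x) to x.
shared_ptr<Expr> simplify(shared_ptr<Expr> e) {
    if (!e || e->op != "+") return e;
    e->left = simplify(e->left);
    e->right = simplify(e->right);
    if (e->right->op == "const" && e->right->value == 0) return e->left;
    if (e->left->op == "const" && e->left->value == 0) return e->right;
    return e;
}

int main() {
    auto x = make_shared<Expr>(Expr{"var", 0, "x", nullptr, nullptr});
    auto zero = make_shared<Expr>(Expr{"const", 0, "", nullptr, nullptr});
    auto sum = make_shared<Expr>(Expr{"+", 0, "", x, zero});
    auto result = simplify(sum);
    cout << result->name << endl; // prints "x": x + 0 was replaced by x
    return 0;
}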

Types and Type Systems

A type system is a set of rules that assigns types to various constructs in a programming
language, such as variables, expressions, functions, and modules. Types help prevent errors by
ensuring that operations are performed on compatible types.

Type Checking

Type checking is the process of verifying the type rules in a program. It can be done at compile
time (static type checking) or runtime (dynamic type checking).

Type Inference

Type inference automatically deduces the types of expressions without explicit type annotations.
Languages like Haskell and TypeScript use type inference to reduce the need for explicit type
declarations.
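
For a concrete flavor of inference in a statically typed setting, C++'s auto deduces a variable's type from its initializer (a more limited form of inference than Haskell's). A minimal sketch:

cpp
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main() {
    auto n = 42;               // deduced as int
    auto pi = 3.14;            // deduced as double
    auto name = string("Ada"); // deduced as std::string
    vector<int> v = {1, 2, 3};
    auto it = v.begin();       // deduced as vector<int>::iterator
    cout << n << " " << pi << " " << name << " " << *it << endl;
    return 0;
}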

Type Conversion (Type Casting)

Type conversion is the process of converting one data type to another. There are two main types
of type conversions in compiler design:

1. Implicit Type Conversion (Coercion): Automatic type conversion performed by the compiler. For example, converting an int to a float when performing arithmetic operations between an int and a float.
2. Explicit Type Conversion (Casting): Type conversion specified by the programmer using casting operators. For example, (float) x explicitly converts the variable x to a float.

Types of Type Conversions

1. Widening Conversion: Converting a type to a larger type. This is generally safe and
involves no loss of data. For example, converting an int to a float.
2. Narrowing Conversion: Converting a type to a smaller type. This can lead to data loss
or overflow. For example, converting a double to an int.
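
A small C++ sketch of both directions (the specific values are only illustrative):

cpp
#include <iostream>
using namespace std;

int main() {
    int i = 7;
    float f = i;            // implicit widening: int -> float, no data loss
    double d = 3.99;
    int truncated = (int)d; // explicit narrowing cast: fractional part lost (3)
    float mixed = i + 2.5f; // i is implicitly converted before the addition
    cout << f << " " << truncated << " " << mixed << endl;
    return 0;
}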

Overloading of Functions and Operators in Compiler Design


Overloading allows multiple functions or operators to have the same name but different
implementations based on their parameters or operands. It enhances the readability and
reusability of code by allowing the same function name or operator to perform different tasks
depending on the context.

Function Overloading

Function overloading enables the definition of multiple functions with the same name but
different parameter lists (either in number, type, or both). The compiler distinguishes these
functions based on the types and number of arguments provided at the call site.

Example of Function Overloading


cpp
#include <iostream>
#include <string>
using namespace std;

void print(int i) {
    cout << "Integer: " << i << endl;
}

void print(double f) {
    cout << "Float: " << f << endl;
}

void print(string s) {
    cout << "String: " << s << endl;
}

int main() {
    print(5);       // Calls print(int)
    print(3.14);    // Calls print(double)
    print("Hello"); // Calls print(string)
    return 0;
}
Implementation in a Compiler

1. Symbol Table: The symbol table maintains entries for each function name, along with a list of
possible signatures (parameter types and return type).
2. Overload Resolution: During semantic analysis, the compiler uses the function name and
arguments provided in a call to select the correct function. This process is known as overload
resolution.
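
A minimal sketch of these two points, assuming a toy symbol table keyed by function name and exact-match resolution (a real compiler also ranks promotions and conversions); all names here are illustrative:

cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;

// Hypothetical symbol-table entry: one signature of an overloaded function.
struct Signature {
    vector<string> paramTypes;
    string returnType;
};

// Overload set: every signature recorded under the same function name.
map<string, vector<Signature>> symbolTable;

// Very simplified overload resolution: pick the signature whose parameter
// types match the argument types exactly.
const Signature* resolve(const string& name, const vector<string>& argTypes) {
    for (const Signature& sig : symbolTable[name])
        if (sig.paramTypes == argTypes)
            return &sig;
    return nullptr; // no viable overload
}

int main() {
    symbolTable["print"] = {{{"int"}, "void"}, {{"double"}, "void"}};
    const Signature* s = resolve("print", {"double"});
    cout << (s ? "matched print(double)" : "no match") << endl;
    return 0;
}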

Operator Overloading

Operator overloading allows user-defined types to extend the behavior of operators, enabling
operators to be used with user-defined types much like they are with built-in types.

Example of Operator Overloading


cpp
#include <iostream>
using namespace std;

class Complex {
public:
    double real, imag;

    Complex(double r, double i) : real(r), imag(i) {}

    Complex operator + (const Complex& other) const {
        return Complex(real + other.real, imag + other.imag);
    }
};

int main() {
    Complex a(1.0, 2.0), b(3.0, 4.0);
    Complex c = a + b; // Uses overloaded operator+
    cout << "Result: " << c.real << " + " << c.imag << "i" << endl;
    return 0;
}
Implementation in a Compiler

1. Syntax Analysis: Extend the syntax rules to allow operators to be used with user-defined types.
2. Semantic Analysis: Ensure that the overloaded operator has a matching implementation for the
types of its operands.
3. Code Generation: Generate code that calls the appropriate operator function.
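
As a sketch of what resolution means for the Complex example above, the expression a + b on class-type operands is treated as a call to the user-defined operator function; the explicit form below is the equivalent member-function call:

cpp
Complex c1 = a + b;          // operator syntax
Complex c2 = a.operator+(b); // the member-function call the compiler resolves it to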

Polymorphic Functions in Compiler Design

Polymorphic functions are functions that can operate on different types without being rewritten
for each type. This feature enhances code reuse and flexibility. There are two primary types of
polymorphism in programming languages: parametric polymorphism and ad-hoc
polymorphism.

1. Parametric Polymorphism: The function or data type can handle values identically
without depending on their type. This is often seen in generics in languages like Java, C#,
and templates in C++.
2. Ad-hoc Polymorphism: The function can be applied to arguments of different types and
can behave differently depending on the type of arguments. This includes function
overloading and operator overloading.

Parametric Polymorphism
Parametric polymorphism allows functions and data types to be written generically so that they
can handle values uniformly without depending on their type. This is a form of compile-time
polymorphism.

Example in Haskell
haskell
-- A generic identity function
identity :: a -> a
identity x = x

main = do
  print (identity 5)         -- Integer
  print (identity "Hello")   -- String
  print (identity [1, 2, 3]) -- List of Integers
Example in C++ with Templates
cpp
#include <iostream>
using namespace std;

template <typename T>
T identity(T x) {
    return x;
}

int main() {
    cout << identity(5) << endl;       // Integer
    cout << identity("Hello") << endl; // C string
    cout << identity(3.14) << endl;    // Double
    return 0;
}

Ad-hoc Polymorphism

Ad-hoc polymorphism allows functions to operate differently based on the type of arguments.
This includes function overloading and operator overloading.

Example in C++ (Function Overloading)


cpp
#include <iostream>
#include <string>
using namespace std;

void print(int i) {
    cout << "Integer: " << i << endl;
}

void print(double f) {
    cout << "Float: " << f << endl;
}

void print(string s) {
    cout << "String: " << s << endl;
}

int main() {
    print(5);       // Calls print(int)
    print(3.14);    // Calls print(double)
    print("Hello"); // Calls print(string)
    return 0;
}

Implementation in a Compiler

To support polymorphic functions in a compiler, we need to:

1. Extend the Syntax and Grammar: Allow the definition and usage of polymorphic functions in the
language syntax.
2. Symbol Table Management: Extend the symbol table to store information about polymorphic
functions.
3. Type Checking and Inference: Implement mechanisms to resolve the types during compilation.
4. Code Generation: Generate appropriate code to handle different types for polymorphic
functions.
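
As a rough illustration of step 4 for parametric polymorphism in a language like C++, the compiler can monomorphize: it emits one concrete copy of a generic function for each type it is actually used with. The function names below are made-up stand-ins for the compiler's internal (mangled) names.

cpp
// For a template such as:
//     template <typename T> T identity(T x) { return x; }
// calls like identity(5) and identity(3.14) conceptually cause the compiler
// to generate and call one specialized copy per type (hypothetical names):
int    identity_int(int x)       { return x; } // from identity(5)
double identity_double(double x) { return x; } // from identity(3.14)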

Runtime Environment: Storage Organization in Compiler Design

In compiler design, the runtime environment (RTE) manages the execution of a program. The
RTE is responsible for the allocation, organization, and management of storage for variables,
functions, objects, and control structures during the execution of the program. Storage
organization is a crucial aspect of the RTE, as it affects both performance and correctness.

Components of Storage Organization

1. Static Storage: Memory allocated at compile time for global variables and constants.
2. Stack Storage: Memory allocated for local variables and function call information.
3. Heap Storage: Memory allocated dynamically for data structures whose size can change at
runtime.

Static Storage

Static storage is used for global variables, constants, and static variables. These variables are
allocated memory at compile time and have a fixed address throughout the program's execution.

Characteristics:

- Lifetime: Entire program execution.
- Allocation: Compile-time.
- Access: Direct (known address).
Example:

c
int global_var = 10; // Global variable

void function() {
    static int static_var = 5; // Static variable
}

Stack Storage

The stack is used for managing function calls and local variables. Each function call creates a
stack frame (or activation record) that contains the function's local variables, return address, and
other control information.

Characteristics:

- Lifetime: Function call duration.
- Allocation: Runtime (when the function is called).
- Access: LIFO (Last In, First Out) order.

Example:

c
void function() {
    int local_var = 10; // Local variable
}

Stack Frame Structure:

- Return Address: The address to return to after the function call completes.
- Parameters: Arguments passed to the function.
- Local Variables: Variables declared within the function.
- Saved Registers: Registers saved before the function call.
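
One way to visualize an activation record is as a struct. The layout below is a simplified sketch of the fields listed above, not the frame layout mandated by any particular ABI:

cpp
#include <cstdint>

// Simplified sketch of an activation record for a call like foo(x).
// Real frame layouts are dictated by the target ABI and differ in field
// order, alignment, and which registers are saved.
struct ActivationRecord {
    uint64_t returnAddress;     // where execution resumes after the call
    uint64_t savedFramePtr;     // caller's frame pointer
    int      param_x;           // incoming parameter
    int      local_y;           // local variable
    uint64_t savedRegisters[2]; // callee-saved registers (illustrative count)
};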

Heap Storage

Heap storage is used for dynamic memory allocation. Data structures such as linked lists, trees,
and other objects whose size can vary are typically allocated on the heap.

Characteristics:

- Lifetime: Managed by the programmer (explicit allocation and deallocation).
- Allocation: Runtime (using functions like malloc in C, new in C++/Java).
- Access: Random access.
Example:

c
int* ptr = (int*)malloc(sizeof(int) * 10); // Allocate an array of 10 integers
free(ptr); // Deallocate the memory

Memory Layout

A typical memory layout of a process in a runtime environment is as follows:

1. Code Segment: Contains the compiled code of the program.


2. Data Segment: Divided into initialized data and uninitialized data (BSS).
o Initialized Data: Global and static variables that are initialized.
o BSS (Block Started by Symbol): Global and static variables that are uninitialized.
3. Heap Segment: Dynamic memory allocation.
4. Stack Segment: Function call management and local variables.

plaintext
+------------------+
| Code Segment     |
+------------------+
| Initialized Data |
+------------------+
| BSS Segment      |
+------------------+
| Heap Segment     |
|   (grows)        |
|      v           |
+------------------+
|      ^           |
|   (grows)        |
| Stack Segment    |
+------------------+

Managing the Stack

The stack is managed using a stack pointer (SP) and a frame pointer (FP). The stack pointer
points to the top of the stack, while the frame pointer points to the base of the current stack
frame.

- Stack Pointer (SP): Adjusted as functions are called and return, growing and shrinking the stack.
- Frame Pointer (FP): Remains fixed during the execution of a function, providing a stable reference point for accessing parameters and local variables.
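
The toy model below simulates how SP and FP move on a call and a return, using an array-based, downward-growing stack. It is purely illustrative and ignores return addresses and saved registers other than the frame pointer:

cpp
#include <iostream>
using namespace std;

// Toy downward-growing stack: sp marks the top, fp the base of the current frame.
int stackMem[64];
int sp = 64; // empty stack: sp points just past the end
int fp = 64;

void callFrame(int param, int localCount) {
    stackMem[--sp] = param;  // push the argument
    stackMem[--sp] = fp;     // save the caller's frame pointer
    fp = sp;                 // new frame base
    sp -= localCount;        // reserve space for locals
    cout << "enter: sp=" << sp << " fp=" << fp << endl;
}

void returnFrame() {
    sp = fp;                 // discard locals
    fp = stackMem[sp++];     // restore the caller's frame pointer
    sp++;                    // pop the argument
    cout << "leave: sp=" << sp << " fp=" << fp << endl;
}

int main() {
    callFrame(5, 1); // e.g. foo(a) with one local variable
    returnFrame();
    return 0;
}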

Function Call Example

Consider the following function calls:


c
void foo(int x) {
    int y = 10;
}

int main() {
    int a = 5;
    foo(a);
    return 0;
}

Stack Frame for main:

- Return Address
- Local Variable a

Stack Frame for foo:

- Return Address
- Parameter x
- Local Variable y

Heap Management

Heap memory is managed through allocation and deallocation mechanisms provided by the
programming language (e.g., malloc and free in C, new and delete in C++).

- Allocation: Requests a block of memory of a specified size.
- Deallocation: Frees a previously allocated block of memory.

Fragmentation: Over time, the heap can become fragmented, with free memory scattered in
small blocks. This can be mitigated using various allocation strategies (e.g., first fit, best fit,
buddy system).

Conclusion

Storage organization in the runtime environment is crucial for the efficient and correct execution
of a program. It involves managing different types of memory (static, stack, heap) with distinct
lifetimes and access patterns. Properly implementing these storage mechanisms ensures that
variables and data structures are correctly allocated, accessed, and deallocated, leading to robust
and efficient program execution.

Dynamic Storage Allocation Strategies in Compiler Design


Dynamic storage allocation strategies are essential for managing the heap memory efficiently
during the execution of a program. These strategies determine how the memory is allocated and
deallocated to handle variable-sized requests, minimize fragmentation, and improve
performance.

Common Dynamic Storage Allocation Strategies

1. First-Fit Allocation
2. Best-Fit Allocation
3. Worst-Fit Allocation
4. Next-Fit Allocation
5. Buddy System Allocation

1. First-Fit Allocation

In first-fit allocation, the allocator searches the memory from the beginning and allocates the first
block that is large enough to satisfy the request.

Advantages:

- Simple and fast allocation.
- Generally efficient for small to medium-sized allocations.

Disadvantages:

- Can lead to fragmentation as small holes accumulate at the beginning of the memory space.

Example:

plaintext
Memory: [ 10 | 20 | 5 | 15 | 30 ] (blocks of free memory)
Request: 12
Allocation: [ 10 | 8 (20-12) | 5 | 15 | 30 ] (12 is allocated from the second
block)
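
A minimal first-fit sketch over a list of free block sizes like the one in the example above (a best-fit allocator would instead scan the whole list for the smallest block that still fits); the code is illustrative only:

cpp
#include <iostream>
#include <vector>
using namespace std;

// First-fit allocation over a list of free block sizes.
// Returns the index of the block used, or -1 if no block fits.
int firstFit(vector<int>& freeBlocks, int request) {
    for (size_t i = 0; i < freeBlocks.size(); ++i) {
        if (freeBlocks[i] >= request) {
            freeBlocks[i] -= request; // shrink the chosen block in place
            return (int)i;
        }
    }
    return -1;
}

int main() {
    vector<int> freeBlocks = {10, 20, 5, 15, 30};
    int idx = firstFit(freeBlocks, 12);
    if (idx >= 0)
        cout << "Allocated from block " << idx << ", " << freeBlocks[idx]
             << " units remain in it" << endl; // block 1, 8 units remain
    return 0;
}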

2. Best-Fit Allocation

In best-fit allocation, the allocator searches the entire memory and allocates the smallest block
that is large enough to satisfy the request. This strategy aims to reduce wasted space.

Advantages:

- Reduces wasted space by finding the most suitable block.

Disadvantages:
- Slower allocation due to the need to search the entire memory.
- Can create small, unusable fragments.

Example:

plaintext
Memory: [ 10 | 20 | 5 | 15 | 30 ] (blocks of free memory)
Request: 12
Allocation: [ 10 | 20 | 5 | 3 (15-12) | 30 ] (12 is allocated from the fourth
block)

3. Worst-Fit Allocation

In worst-fit allocation, the allocator searches the entire memory and allocates the largest block.
The idea is to leave large blocks available for future allocations.

Advantages:

- Can help in distributing free space evenly.

Disadvantages:

- Often leaves large, unusable fragments.
- Generally less efficient in terms of space utilization.

Example:

plaintext
Memory: [ 10 | 20 | 5 | 15 | 30 ] (blocks of free memory)
Request: 12
Allocation: [ 10 | 20 | 5 | 15 | 18 (30-12) ] (12 is allocated from the fifth
block)

4. Next-Fit Allocation

In next-fit allocation, the allocator starts searching from the location of the last allocation and
allocates the first block that is large enough to satisfy the request.

Advantages:

- Simple and fast allocation, similar to first-fit.
- Can reduce fragmentation in certain scenarios by distributing allocations more evenly.

Disadvantages:

- Can lead to fragmentation over time.
- Performance can degrade if large blocks are frequently allocated and deallocated.

Example:

plaintext
Memory: [ 10 | 20 | 5 | 15 | 30 ] (blocks of free memory)
Request: 12 (previous allocation was in the second block)
Allocation: [ 10 | 20 | 5 | 3 (15-12) | 30 ] (12 is allocated from the fourth
block)

5. Buddy System Allocation

The buddy system is a more complex but efficient allocation strategy that splits memory into
blocks of sizes that are powers of two. When a block is split, it creates two "buddy" blocks of
equal size. When a block is freed, the allocator attempts to merge it with its buddy.

Advantages:

- Efficient allocation and deallocation.
- Reduced fragmentation compared to other strategies.
- Fast coalescence of free blocks.

Disadvantages:

- Can waste memory due to rounding up to the nearest power of two.
- More complex implementation.

Example:

plaintext
Initial Memory: [ 64 ] (single free block of 64 units)
Request: 12 (rounded up to 16, the nearest power of two)
Split: [ 32 | 32 ]
Split: [ 16 | 16 | 32 ]
Allocation: [ 16 (allocated) | 16 | 32 ] (the 12-unit request occupies a
16-unit block; the 4 unused units are internal fragmentation)
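
A small helper showing the rounding step a buddy allocator performs before deciding which block to split (illustrative only):

cpp
#include <iostream>
using namespace std;

// Round a request up to the nearest power of two, as a buddy allocator
// would before choosing a block to split.
int buddyBlockSize(int request) {
    int size = 1;
    while (size < request) size *= 2;
    return size;
}

int main() {
    cout << buddyBlockSize(12) << endl; // 16
    cout << buddyBlockSize(33) << endl; // 64
    return 0;
}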
Symbol Table in Compiler Design

A symbol table is a crucial data structure in compiler design that stores information about the
various identifiers (such as variables, functions, objects, etc.) used in the source code. Here are
the key aspects of the symbol table:

1. Purpose:
o To keep track of the semantics of variables, functions, classes, and other entities.
o To store information such as scope, type, memory location, and attributes of identifiers
[1].

2. Structure:
o It is typically implemented as a hash table or a tree to allow for efficient insertion,
deletion, and lookup operations.
o Each entry in the symbol table holds the identifier's name and its associated attributes.

3. Phases of Compilation:
o Lexical Analysis: Adds new entries for each identifier and checks for duplicate
declarations.
o Syntax Analysis: Uses the table to ensure the correct usage of variables and functions.
o Semantic Analysis: Verifies type consistency and other semantic rules using the
information stored in the table.
o Code Generation: Utilizes the memory location information to generate machine code.

4. Attributes Stored:
o Name: The identifier's name.
o Type: Data type of the identifier (e.g., integer, float).
o Scope: The block of code where the identifier is valid.
o Value: The constant value (for constants).
o Memory Location: Address in memory where the identifier is stored.
o Other Attributes: Additional properties relevant to specific identifiers (e.g., parameter
types for functions) [2].

5. Operations:
o Insertion: Adding a new identifier and its attributes.
o Modification: Updating attributes of an existing identifier.
o Lookup: Retrieving attributes of an identifier.
o Deletion: Removing an identifier from the table (if applicable).

The symbol table plays a vital role in ensuring that the compiler can efficiently process the
source code by maintaining organized and accessible information about all identifiers used.
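
As a rough illustration of the structure and operations above, here is a minimal scoped symbol table sketch backed by hash maps; the attribute fields and method names are illustrative, not a standard interface:

cpp
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

// Minimal sketch of a scoped symbol table. The attribute set is illustrative.
struct SymbolInfo {
    string type;      // e.g. "int", "float"
    int    scopeLevel;
    int    memOffset; // placeholder for a memory location / offset
};

class SymbolTable {
    vector<unordered_map<string, SymbolInfo>> scopes; // one map per open scope
public:
    void enterScope() { scopes.push_back({}); }
    void exitScope()  { scopes.pop_back(); }

    // Insertion: add an identifier to the innermost scope.
    bool insert(const string& name, const SymbolInfo& info) {
        return scopes.back().emplace(name, info).second; // false if redeclared
    }

    // Lookup: search from the innermost scope outward.
    const SymbolInfo* lookup(const string& name) const {
        for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
            auto hit = it->find(name);
            if (hit != it->end()) return &hit->second;
        }
        return nullptr;
    }
};

int main() {
    SymbolTable table;
    table.enterScope();                        // global scope
    table.insert("global_var", {"int", 0, 0});
    table.enterScope();                        // function scope
    table.insert("local_var", {"int", 1, 4});
    const SymbolInfo* s = table.lookup("global_var");
    cout << (s ? s->type : "not found") << endl; // prints "int"
    table.exitScope();
    return 0;
}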
