Unit 3 CD
Type checking is the process of verifying and enforcing the constraints of types—data
classifications that dictate how values can be manipulated—within a programming language. It is
a crucial aspect of static analysis that ensures programs adhere to the type rules, preventing type
errors that could lead to runtime failures.
Type System
A type system is a formal framework within a programming language that assigns types to
various constructs, such as variables, expressions, functions, and modules. It defines the rules for
type assignments and type checking. A strong type system helps in detecting errors at compile
time, improving program safety and reliability.
Equivalence of Expressions
In compiler design, two expressions are considered equivalent if they yield the same result for all
possible values of their variables in a given context. Expression equivalence is crucial for
optimizations like common subexpression elimination, code motion, and constant folding.
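For instance, common subexpression elimination rests on recognizing that two occurrences of a * b are equivalent. A minimal C++ illustration of the before/after transformation (the variable names are ours, for illustration only):
#include <iostream>
using namespace std;

int main() {
    int a = 3, b = 4, c = 1, d = 2;
    // Before optimization: a * b is evaluated twice.
    int r1 = a * b + c;
    int r2 = a * b + d;
    // After common subexpression elimination, the compiler effectively produces:
    int t = a * b; // evaluated once
    int s1 = t + c;
    int s2 = t + d;
    cout << r1 << " " << s1 << " " << r2 << " " << s2 << endl; // 13 13 14 14
    return 0;
}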
Types of Equivalence:
When checking type expressions, two notions of equivalence are commonly used:
1. Name Equivalence: Two type expressions are equivalent only if they refer to the same type name. Two struct types with identical layouts but different names are not name-equivalent.
2. Structural Equivalence: Two type expressions are equivalent if they are built from the same type constructors applied to structurally equivalent component types.
Type Checking
Type checking is the process of verifying the type rules in a program. It can be done at compile
time (static type checking) or runtime (dynamic type checking).
Type Inference
Type inference automatically deduces the types of expressions without explicit type annotations.
Languages like Haskell and TypeScript use type inference to reduce the need for explicit type
declarations.
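C++ also performs a limited, local form of type inference through the auto keyword; a small sketch:
#include <iostream>
#include <vector>
using namespace std;

int main() {
    auto x = 5;                 // deduced as int
    auto y = 3.14;              // deduced as double
    auto v = vector<int>{1, 2}; // deduced as vector<int>
    cout << x + y << " " << v.size() << endl; // prints 8.14 2
    return 0;
}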
Type Conversion
Type conversion is the process of converting a value of one data type to another. There are two main kinds of type conversion in compiler design, illustrated in the sketch after this list:
1. Widening Conversion: Converting a type to a larger type. This is generally safe and involves no loss of data. For example, converting an int to a float.
2. Narrowing Conversion: Converting a type to a smaller type. This can lead to data loss or overflow. For example, converting a double to an int.
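A short C++ sketch of both conversions (the values are chosen for illustration):
#include <iostream>
using namespace std;

int main() {
    int i = 42;
    float f = i;    // widening: int -> float, value preserved
    double d = 3.99;
    int n = (int)d; // narrowing: double -> int, fraction discarded
    cout << f << " " << n << endl; // prints 42 3
    return 0;
}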
Function Overloading
Function overloading enables the definition of multiple functions with the same name but
different parameter lists (either in number, type, or both). The compiler distinguishes these
functions based on the types and number of arguments provided at the call site.
#include <iostream>
#include <string>
using namespace std;

void print(int i) {
    cout << "Integer: " << i << endl;
}
void print(double f) {
    cout << "Double: " << f << endl;
}
void print(string s) {
    cout << "String: " << s << endl;
}
int main() {
    print(5);       // Calls print(int)
    print(3.14);    // Calls print(double)
    print("Hello"); // Calls print(string)
    return 0;
}
Implementation in a Compiler
1. Symbol Table: The symbol table maintains entries for each function name, along with a list of
possible signatures (parameter types and return type).
2. Overload Resolution: During semantic analysis, the compiler uses the function name and the argument types at the call site to select the correct function. This process is known as overload resolution; a minimal sketch follows.
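A minimal C++ sketch of overload resolution over such a table (the Signature type and the exact-match rule are our simplifications; real compilers also rank implicit conversions):
#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;

// Each function name maps to a list of candidate signatures.
struct Signature {
    vector<string> paramTypes;
    string returnType;
};

map<string, vector<Signature>> symbolTable;

// Return the signature whose parameter types exactly match the call, if any.
const Signature* resolve(const string& name, const vector<string>& argTypes) {
    auto it = symbolTable.find(name);
    if (it == symbolTable.end()) return nullptr; // undeclared function
    for (const Signature& sig : it->second)
        if (sig.paramTypes == argTypes) return &sig; // exact match
    return nullptr; // no viable overload
}

int main() {
    symbolTable["print"] = {{{"int"}, "void"}, {{"double"}, "void"}, {{"string"}, "void"}};
    cout << (resolve("print", {"double"}) ? "resolved print(double)" : "no match") << endl;
    return 0;
}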
Operator Overloading
Operator overloading extends the behavior of operators to user-defined types, so that operators such as + can be used with those types much like they are with built-in types.
#include <iostream>
using namespace std;
class Complex {
public:
    double real, imag;
    Complex(double r, double i) : real(r), imag(i) {}
    Complex operator+(const Complex& other) const { // overloaded +
        return Complex(real + other.real, imag + other.imag);
    }
};
int main() {
    Complex a(1.0, 2.0), b(3.0, 4.0);
    Complex c = a + b; // Uses overloaded operator+
    cout << "Result: " << c.real << " + " << c.imag << "i" << endl;
    return 0;
}
Implementation in a Compiler
1. Syntax Analysis: Extend the syntax rules to allow operators to be used with user-defined types.
2. Semantic Analysis: Ensure that the overloaded operator has a matching implementation for the
types of its operands.
3. Code Generation: Generate code that calls the appropriate operator function, as illustrated below.
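Concretely, for the Complex class above, the compiler resolves the operator expression to an ordinary call of the operator function (the explicit form is also legal C++):
Complex c1 = a + b;          // what the programmer writes
Complex c2 = a.operator+(b); // the call the compiler resolves it to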
Polymorphic Functions
Polymorphic functions are functions that can operate on different types without being rewritten
for each type. This feature enhances code reuse and flexibility. There are two primary types of
polymorphism in programming languages: parametric polymorphism and ad-hoc
polymorphism.
1. Parametric Polymorphism: The function or data type can handle values identically
without depending on their type. This is often seen in generics in languages like Java, C#,
and templates in C++.
2. Ad-hoc Polymorphism: The function can be applied to arguments of different types and
can behave differently depending on the type of arguments. This includes function
overloading and operator overloading.
Parametric Polymorphism
Parametric polymorphism allows functions and data types to be written generically so that they
can handle values uniformly without depending on their type. This is a form of compile-time
polymorphism.
Example in Haskell
-- A generic identity function
identity :: a -> a
identity x = x

main :: IO ()
main = do
  print (identity 5)         -- Integer
  print (identity "Hello")   -- String
  print (identity [1, 2, 3]) -- List of Integers
Example in C++ with Templates
#include <iostream>
using namespace std;

// A generic identity function as a template
template <typename T>
T identity(T x) {
    return x;
}

int main() {
    cout << identity(5) << endl;       // Integer
    cout << identity("Hello") << endl; // String
    cout << identity(3.14) << endl;    // Double
    return 0;
}
Ad-hoc Polymorphism
Ad-hoc polymorphism allows functions to operate differently based on the type of arguments.
This includes function overloading and operator overloading.
The print overloads shown earlier under Function Overloading are a typical example: the same name print behaves differently for int, double, and string arguments, and the compiler selects the correct body at each call site.
Implementation in a Compiler
1. Extend the Syntax and Grammar: Allow the definition and usage of polymorphic functions in the
language syntax.
2. Symbol Table Management: Extend the symbol table to store information about polymorphic
functions.
3. Type Checking and Inference: Implement mechanisms to resolve the types during compilation.
4. Code Generation: Generate appropriate code to handle different types for polymorphic functions (see the sketch after this list).
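For C++ templates, step 4 is typically done by instantiation (monomorphization): the compiler emits a separate concrete function for each type the template is used with. A small sketch:
template <typename T>
T identity(T x) {
    return x;
}

int main() {
    (void)identity(5);    // instantiates identity<int>
    (void)identity(3.14); // instantiates identity<double>
    return 0;
}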
In compiler design, the runtime environment (RTE) manages the execution of a program. The
RTE is responsible for the allocation, organization, and management of storage for variables,
functions, objects, and control structures during the execution of the program. Storage
organization is a crucial aspect of the RTE, as it affects both performance and correctness.
1. Static Storage: Memory allocated at compile time for global variables and constants.
2. Stack Storage: Memory allocated for local variables and function call information.
3. Heap Storage: Memory allocated dynamically for data structures whose size can change at
runtime.
Static Storage
Static storage is used for global variables, constants, and static variables. These variables are
allocated memory at compile time and have a fixed address throughout the program's execution.
Characteristics:
Allocated at compile time, so the size must be known in advance.
Fixed address for the entire execution of the program.
Lifetime spans the whole program run, regardless of scope.
Example:
int global_var = 10; // Global variable
void function() {
    static int static_var = 5; // Static variable
}
Stack Storage
The stack is used for managing function calls and local variables. Each function call creates a
stack frame (or activation record) that contains the function's local variables, return address, and
other control information.
Characteristics:
Allocated and freed automatically as functions are called and return (last-in, first-out order).
Allocation is fast, since it only adjusts the stack pointer.
A variable's lifetime is limited to the activation of its function.
Example:
void function() {
    int local_var = 10; // Local variable
}
A typical stack frame (activation record) contains:
Return Address: The address to return to after the function call completes.
Parameters: Arguments passed to the function.
Local Variables: Variables declared within the function.
Saved Registers: Registers saved before the function call.
Heap Storage
Heap storage is used for dynamic memory allocation. Data structures such as linked lists, trees,
and other objects whose size can vary are typically allocated on the heap.
Characteristics:
Allocated and freed explicitly at runtime (e.g., malloc/free in C, new/delete in C++) or by a garbage collector.
Size and lifetime are not fixed at compile time.
Slower than stack allocation and subject to fragmentation.
Example:
int* ptr = (int*)malloc(sizeof(int) * 10); // Allocate an array of 10 integers
free(ptr); // Deallocate the memory
Memory Layout
+------------------+
| Code Segment |
+------------------+
| Initialized Data |
+------------------+
| BSS Segment |
+------------------+
| Heap Segment |
| (grows) |
| V |
+------------------+
| Stack |
| (grows) |
| ^ |
+------------------+
The stack is managed using a stack pointer (SP) and a frame pointer (FP). The stack pointer
points to the top of the stack, while the frame pointer points to the base of the current stack
frame.
Stack Pointer (SP): Adjusted as functions are called and return, growing and shrinking the stack.
Frame Pointer (FP): Remains fixed during the execution of a function, providing a stable
reference point for accessing parameters and local variables.
void foo(int x) {
    int y = x + 1; // Local variable in foo's frame
}
int main() {
    int a = 5; // Local variable in main's frame
    foo(a);
    return 0;
}
While foo is executing, the stack holds one frame per active call:
+--------------------+
| main's frame:      |
|   Return Address   |
|   Local Variable a |
+--------------------+
| foo's frame:       |
|   Return Address   |
|   Parameter x      |
|   Local Variable y |
+--------------------+
Heap Management
Heap memory is managed through allocation and deallocation mechanisms provided by the
programming language (e.g., malloc and free in C, new and delete in C++).
Fragmentation: Over time, the heap can become fragmented, with free memory scattered in
small blocks. This can be mitigated using various allocation strategies (e.g., first fit, best fit,
buddy system).
Conclusion
Storage organization in the runtime environment is crucial for the efficient and correct execution
of a program. It involves managing different types of memory (static, stack, heap) with distinct
lifetimes and access patterns. Properly implementing these storage mechanisms ensures that
variables and data structures are correctly allocated, accessed, and deallocated, leading to robust
and efficient program execution.
Heap Allocation Strategies
Common strategies for choosing a free block include:
1. First-Fit Allocation
2. Best-Fit Allocation
3. Worst-Fit Allocation
4. Next-Fit Allocation
5. Buddy System Allocation
1. First-Fit Allocation
In first-fit allocation, the allocator searches the memory from the beginning and allocates the first
block that is large enough to satisfy the request.
Advantages:
Fast and simple: the search stops at the first block that fits.
Disadvantages:
Can lead to fragmentation as small holes accumulate at the beginning of the memory space.
Example:
Memory: [ 10 | 20 | 5 | 15 | 30 ] (blocks of free memory)
Request: 12
Allocation: [ 10 | 8 (20-12) | 5 | 15 | 30 ] (12 is allocated from the second block)
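A minimal C++ sketch of first fit over a list of free block sizes (our simplification: blocks only shrink, and bookkeeping such as block addresses is omitted):
#include <iostream>
#include <vector>
using namespace std;

// Return the index of the first block large enough, shrinking it in place;
// return -1 if no block fits.
int firstFit(vector<int>& freeBlocks, int request) {
    for (size_t i = 0; i < freeBlocks.size(); ++i) {
        if (freeBlocks[i] >= request) {
            freeBlocks[i] -= request; // carve the request out of this block
            return (int)i;
        }
    }
    return -1;
}

int main() {
    vector<int> freeBlocks = {10, 20, 5, 15, 30};
    int idx = firstFit(freeBlocks, 12);
    cout << "Allocated from block " << idx
         << ", remaining size " << freeBlocks[idx] << endl; // block 1, 8 left
    return 0;
}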
2. Best-Fit Allocation
In best-fit allocation, the allocator searches the entire memory and allocates the smallest block
that is large enough to satisfy the request. This strategy aims to reduce wasted space.
Advantages:
Minimizes the leftover space in the chosen block, reducing wasted memory.
Disadvantages:
Slower allocation due to the need to search the entire memory.
Can create small, unusable fragments.
Example:
Memory: [ 10 | 20 | 5 | 15 | 30 ] (blocks of free memory)
Request: 12
Allocation: [ 10 | 20 | 5 | 3 (15-12) | 30 ] (12 is allocated from the fourth block)
3. Worst-Fit Allocation
In worst-fit allocation, the allocator searches the entire memory and allocates the largest block.
The idea is to leave large blocks available for future allocations.
Advantages:
Leaves relatively large leftover fragments, which are more likely to be usable for later requests.
Disadvantages:
Quickly consumes the largest blocks, which may later be needed for big requests.
Requires searching the entire memory to find the largest block.
Example:
Memory: [ 10 | 20 | 5 | 15 | 30 ] (blocks of free memory)
Request: 12
Allocation: [ 10 | 20 | 5 | 15 | 18 (30-12) ] (12 is allocated from the fifth block)
4. Next-Fit Allocation
In next-fit allocation, the allocator starts searching from the location of the last allocation and
allocates the first block that is large enough to satisfy the request.
Advantages:
Faster than first fit on average, since the search resumes where the previous allocation ended rather than at the beginning.
Disadvantages:
Fragmentation tends to spread throughout memory rather than accumulating near the start.
Example:
Memory: [ 10 | 20 | 5 | 15 | 30 ] (blocks of free memory)
Request: 12 (previous allocation was in the second block)
Allocation: [ 10 | 20 | 5 | 3 (15-12) | 30 ] (12 is allocated from the fourth block)
5. Buddy System Allocation
The buddy system is a more complex but efficient allocation strategy that splits memory into
blocks of sizes that are powers of two. When a block is split, it creates two "buddy" blocks of
equal size. When a block is freed, the allocator attempts to merge it with its buddy.
Advantages:
Splitting and merging (coalescing) are fast, since a block's buddy address can be computed directly from its own address and size.
Reduces external fragmentation through systematic coalescing of freed buddies.
Disadvantages:
Causes internal fragmentation, because every request is rounded up to the next power of two.
Example:
Initial Memory: [ 64 ] (single block of 64 units)
Request: 12 (rounded up to 16, the next power of two)
Split: [ 32 | 32 ]
Split: [ 16 | 16 | 32 ]
Allocation: [ 16* | 16 | 32 ] (12 is placed in the first 16-unit block; 4 units are internal fragmentation)
Symbol Table
A symbol table is a crucial data structure in compiler design that stores information about the
various identifiers (such as variables, functions, objects, etc.) used in the source code. Here are
the key aspects of the symbol table:
1. Purpose:
o To keep track of the semantics of variables, functions, classes, and other entities.
o To store information such as scope, type, memory location, and attributes of identifiers
[1].
2. Structure:
o It is typically implemented as a hash table or a tree to allow for efficient insertion,
deletion, and lookup operations.
o Each entry in the symbol table holds the identifier's name and its associated attributes.
3. Phases of Compilation:
o Lexical Analysis: Adds new entries for identifiers as they are encountered in the source.
o Syntax Analysis: Uses the table to ensure the correct usage of variables and functions.
o Semantic Analysis: Verifies type consistency and other semantic rules using the
information stored in the table.
o Code Generation: Utilizes the memory location information to generate machine code.
4. Attributes Stored:
o Name: The identifier's name.
o Type: Data type of the identifier (e.g., integer, float).
o Scope: The block of code where the identifier is valid.
o Value: The constant value (for constants).
o Memory Location: Address in memory where the identifier is stored.
o Other Attributes: Additional properties relevant to specific identifiers (e.g., parameter
types for functions) [2].
5. Operations:
o Insertion: Adding a new identifier and its attributes.
o Modification: Updating attributes of an existing identifier.
o Lookup: Retrieving attributes of an identifier.
o Deletion: Removing an identifier from the table (if applicable).
The symbol table plays a vital role in ensuring that the compiler can efficiently process the
source code by maintaining organized and accessible information about all identifiers used.
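A minimal C++ sketch of such a table as a hash map (the SymbolInfo fields are a representative subset of the attributes above, and scope handling is reduced to a level number):
#include <iostream>
#include <string>
#include <unordered_map>
using namespace std;

// One entry per identifier.
struct SymbolInfo {
    string type;      // e.g., "int"
    int scopeLevel;   // block nesting depth
    int memoryOffset; // location assigned for code generation
};

class SymbolTable {
    unordered_map<string, SymbolInfo> table; // hash table: O(1) average lookup
public:
    // Insertion: returns false on a duplicate declaration.
    bool insert(const string& name, const SymbolInfo& info) {
        return table.emplace(name, info).second;
    }
    // Lookup: returns nullptr if the identifier is undeclared.
    const SymbolInfo* lookup(const string& name) const {
        auto it = table.find(name);
        return it == table.end() ? nullptr : &it->second;
    }
};

int main() {
    SymbolTable st;
    st.insert("count", {"int", 0, 0});
    if (const SymbolInfo* s = st.lookup("count"))
        cout << "count: " << s->type << " at offset " << s->memoryOffset << endl;
    return 0;
}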