module 1

The document provides an overview of system software and compiler design, detailing the differences between system software and application software, as well as the roles of various components like assemblers, compilers, and operating systems. It introduces the Simplified Instructional Computer (SIC) architecture, including its memory structure, registers, data formats, instruction formats, and I/O operations. Additionally, it covers the SIC/XE architecture, addressing modes, and basic assembler functions with an example program demonstrating the use of these concepts.

Regulation – 2017(CBCS Scheme) System software & Compiler Design– 17CS63

Module 1
Introduction, Assemblers, Macroprocessors

Software: a collection of programs.

System Software: A set of programs that perform a variety of system functions such as file editing, resource management, I/O management and storage management. System programs are intended to support the operation and use of the computer itself, rather than any particular application. For this reason, they are usually related to the architecture of the machine on which they run.
Example: OS, utility programs.

Application Software: An application program is primarily concerned with the solution of some problem, using the computer as a tool.
Example: Excel, Word, PowerPoint.

System Software                                   | Application Software
Set of programs that helps run the computer system | Used to solve a user's problem, with the computer as a tool
Machine-oriented software                         | Not machine-oriented
Machine dependent                                 | Machine independent
Interacts directly with the hardware              | Does not interact directly with the hardware
Development is a complex task                     | Development is easier
Example: OS and utility programs                  | Example: Excel, Word, PowerPoint

Text Editor: creates and modifies the program.

Compiler: translates a source program into an object program.
Compilers
• A compiler is a language program that translates programs written in a high-level language into equivalent machine language programs.
• It bridges the semantic gap between the programming language domain and the execution domain.
• Two aspects of compilation are:
- Generate code to implement the meaning of a source program in the execution domain.
- Provide diagnostics for violations of programming language semantics in a source program.
• The compiler processes the source program as a whole.
Assemblers
• Programmers found it difficult to write or read programs in machine language. For convenience, they began to use a mnemonic (symbol) for each machine instruction, which would subsequently be translated into machine language.
• Such a mnemonic language is called assembly language.
Prepared by: P.Kalamani Sri Sairam College of Engineering Anekal. Page | 1
• Programs known as assemblers are written to automate the translation of assembly language into machine language.
Fundamental functions:
1. Translating mnemonic operation codes to their machine language equivalents.
2. Assigning machine addresses to the symbolic labels used by the programmer.
Operating System:
It is the most important system program. It acts as an interface between the users and the system, and makes the computer easier to use.
• It provides an interface that is more user-friendly than the underlying hardware.
• The functions of the OS are:
1. Process management
2. Memory management
3. Resource management
4. I/O operations
5. Data management
6. Providing security for the user's jobs.
Loader: loads an object program into memory for execution.
Linker: links separately translated object programs into a single program.
Debugger: helps in identifying errors in a program.

SIC MACHINE ARCHITECTURE

SIC stands for Simplified Instructional Computer. The advanced version of SIC is SIC/XE, where XE stands for "extra equipment" or "extra expensive".
SIC is a hypothetical computer that includes the hardware features most often found on real machines. It comes in two versions:
● SIC standard model
● SIC/XE
The two versions are designed to be upward compatible, i.e. a program written for the standard model can also be executed on the XE version.
The architecture is described in terms of:
● Memory
● Registers
● Data formats
● Instruction formats
● Addressing modes
● Instruction set
● Input/output

Memory:
• Memory consists of 8-bit bytes. Any 3 consecutive bytes form a word (24 bits); a word is addressed by the location of its lowest-numbered byte.
• There are 32,768 (2^15) bytes of memory in total.

Registers: there are 5 registers, each 24 bits long.

S.No  Mnemonic  Register name     Number  Special use
1     A         Accumulator       0       Used for arithmetic operations
2     X         Index register    1       Used for addressing
3     L         Linkage register  2       Jump to Subroutine (JSUB) stores the return address here
4     PC        Program counter   8       Holds the address of the next instruction
5     SW        Status word       9       Contains a variety of information, including the condition code (CC), which is used for example to check whether a device is ready
Data formats:
● Integers are stored as 24-bit binary numbers
● Characters are stored using 8-bit ASCII codes
● Negative numbers are represented in 2's complement
● There is no floating-point hardware on the standard version of SIC

Instruction formats:
Every instruction is 24 bits long, divided into an opcode, an index flag (x) and an address:

opcode   x       address
8 bits   1 bit   15 bits

Addressing modes:
There are two addressing modes, selected by the x bit of the instruction:
1. Direct addressing
2. Indexed addressing

S.No  Mode     Indication  Calculation of target address
1.    Direct   x=0         TA = address
2.    Indexed  x=1         TA = address + (X)
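
The instruction format and the two addressing modes can be checked with a few lines of Python. This is only an illustrative sketch: the function name `decode_sic` and the sample register value are assumptions, and the two sample words reuse values that appear later in these notes (STL is opcode 14, RETADR is at 1033, BUFFER at 1039).

```python
def decode_sic(word, x_register=0):
    """Split a 24-bit SIC instruction word and compute its target address.

    Returns (opcode, x_bit, target_address).
    """
    opcode = (word >> 16) & 0xFF      # upper 8 bits
    x_bit = (word >> 15) & 0x1        # index flag
    address = word & 0x7FFF           # lower 15 bits
    # Direct: TA = address; indexed: TA = address + (X)
    target = address + x_register if x_bit else address
    return opcode, x_bit, target


# STL RETADR assembled as 141033: direct addressing
print([hex(v) for v in decode_sic(0x141033)])
# STCH BUFFER,X assembled as 549039: indexed, here with (X) = 2
print([hex(v) for v in decode_sic(0x549039, x_register=2)])
```

Setting the x bit is what turns address 1039 into the field value 9039 in the second word.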

Instruction set:
It includes instructions such as:
1. Data movement instructions. Ex: LDA, LDX, STA, STX.
2. Arithmetic instructions. Ex: ADD, SUB, MUL, DIV.
These involve register A and a word in memory, with the result being left in the register.
3. Branching instructions. Ex: J, JLT, JEQ, JGT.
4. Subroutine linkage instructions. Ex: JSUB, RSUB.
5. Conditional jumps. Ex: JEQ, JLT, JGT. These test the setting of the condition code and jump accordingly.
6. Comparison (COMP): compares the value in register A with a word in memory and sets the condition code to indicate the result.

Input and output:

• I/O is performed by transferring one byte at a time to or from the rightmost 8 bits of register A.
• Each device is assigned a unique 8-bit code.
• There are 3 I/O instructions:
1) The Test Device (TD) instruction tests whether the addressed device is ready to send or receive a byte of data. The condition code (CC) is set to indicate the result of the test:
‘<’ => device is ready to send or receive
‘=’ => device is not ready
2) A program must wait until the device is ready, and then execute a Read Data (RD) or Write Data (WD) instruction.
3) This sequence must be repeated for each byte of data to be read or written.

SIC/XE MACHINE ARCHITECTURE


Memory:
• 1 word = 24 bits (three 8-bit bytes)
• Total memory (SIC/XE) = 2^20 (1,048,576) bytes (1 Mbyte)

Registers:
Additional registers provided by SIC/XE:
S.No  Mnemonic      No  Special use
1     B             3   Base register, used for addressing
2     S             4   General purpose register
3     T             5   General purpose register
4     F (floating)  6   Floating-point accumulator (48 bits)

Registers carried over from SIC:
S.No  Mnemonic      No  Special use
1     A             0   For arithmetic operations
2     X             1   For addressing
3     L             2   Jump to Subroutine (JSUB) stores the return address here
4     PC            8   Holds the address of the next instruction
5     SW            9   Contains a variety of information, including the condition code (CC), which is used for example to check whether a device is ready

Data formats:
• Integers are stored in 24 bit, 2's complement format
• Characters are stored in 8-bit ASCII format
• Floating point is stored in 48 bit signed-exponent-fraction format:

S (sign)   E (exponent)   F (fraction)
1 bit      11 bits        36 bits
• The fraction is represented as a 36-bit number and has a value between 0 and 1.
• The exponent is represented as an 11-bit unsigned binary number between 0 and 2047.
• The sign of the floating-point number is indicated by s: 0 = positive, 1 = negative.
• Therefore, the absolute value of the floating-point number is f * 2^(e-1024).
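
As a worked check of the formula f * 2^(e-1024), the sketch below decodes a 48-bit pattern into a Python float. The function name `decode_sicxe_float` and the two sample bit patterns are assumptions made for illustration; only the field widths and the formula come from the notes above.

```python
import math


def decode_sicxe_float(bits):
    """Decode a 48-bit SIC/XE floating-point value given as an integer."""
    sign = (bits >> 47) & 0x1            # 1 bit: 0 = positive, 1 = negative
    exponent = (bits >> 36) & 0x7FF      # 11 bits, unsigned, 0..2047
    fraction_bits = bits & (2**36 - 1)   # 36 bits; f = fraction_bits / 2**36
    # magnitude = f * 2**(e - 1024) = fraction_bits * 2**(e - 1024 - 36)
    magnitude = math.ldexp(fraction_bits, exponent - 1024 - 36)
    return -magnitude if sign else magnitude


# f = 0.5, e = 1025  ->  0.5 * 2**1 = 1.0
one = (1025 << 36) | (1 << 35)
# s = 1, f = 0.75, e = 1027  ->  -(0.75 * 2**3) = -6.0
minus_six = (1 << 47) | (1027 << 36) | (0b11 << 34)

print(decode_sicxe_float(one), decode_sicxe_float(minus_six))
```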
Instruction formats:
There are 4 different instruction formats available:
Format 1 (1 byte):
opcode
8 bits
Eg - FIX
Format 2 (2 bytes):
opcode   r1       r2
8 bits   4 bits   4 bits
Eg - COMPR A,S
Format 3 (3 bytes):
opcode   n       i       x       b       p       e       displacement
6 bits   1 bit   1 bit   1 bit   1 bit   1 bit   1 bit   12 bits
Format 4 (4 bytes):
opcode   n       i       x       b       p       e       address
6 bits   1 bit   1 bit   1 bit   1 bit   1 bit   1 bit   20 bits
Formats 3 & 4 introduce addressing mode flag bits.
Flags n & i:
• n=0 & i=1
Immediate addressing - TA itself is used as the operand value (no memory reference)
• n=1 & i=0
Indirect addressing - the word at TA (in memory) is fetched and used as the address of the operand
• n=1 & i=1
Simple addressing - TA is the location of the operand
• n=0 & i=0
Simple addressing, as for n=1 & i=1. This combination is used by standard SIC instructions, in which bits b, p and e are treated as part of the address field.
Flag x:
• x=1
Indexed addressing - the contents of register X are added in the TA calculation
Flags b & p (Format 3 only):
• b=0 & p=0
Direct addressing - the displacement/address field contains TA (Format 4 always uses direct addressing)
• b=0 & p=1
PC relative addressing - TA = (PC) + disp (-2048 <= disp <= 2047)
• b=1 & p=0
Base relative addressing - TA = (B) + disp (0 <= disp <= 4095)
Flag e:
• e=0: the instruction is in Format 3
• e=1: the instruction is in Format 4
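
The flag bits can be exercised with a short decoder. This is a sketch under stated assumptions: the function name `decode_format34`, its parameters and the register values passed in are hypothetical. The three object codes in the usage lines are the ones quoted elsewhere in these notes (4B101036 for +JSUB RDREC, 032010 for LDA =C'EOF' at location 001A, E32011 for TD =X'05' at location 1062).

```python
def decode_format34(obj_hex, pc, base=0, x_register=0):
    """Decode a SIC/XE Format 3 (6 hex digits) or Format 4 (8 hex digits) instruction.

    pc is the address of the NEXT instruction. Returns (opcode, mode, TA).
    """
    word = int(obj_hex, 16)
    nbits = len(obj_hex) * 4                     # 24 or 32
    opcode = (word >> (nbits - 8)) & 0xFC        # top 6 bits, as a byte value
    n, i, x, b, p, e = (
        (word >> (nbits - k)) & 1 for k in (7, 8, 9, 10, 11, 12)
    )
    assert e == (nbits == 32), "e bit must match the instruction length"
    if e:                                        # Format 4: 20-bit address
        ta = word & 0xFFFFF
    else:                                        # Format 3: 12-bit displacement
        disp = word & 0xFFF
        if p:                                    # PC relative, signed disp
            ta = pc + (disp - 0x1000 if disp >= 0x800 else disp)
        elif b:                                  # base relative, unsigned disp
            ta = base + disp
        else:                                    # direct
            ta = disp
    if x:                                        # indexed
        ta += x_register
    mode = {(0, 1): "immediate", (1, 0): "indirect"}.get((n, i), "simple")
    return opcode, mode, ta


print(decode_format34("032010", pc=0x001D))      # LDA, PC relative
print(decode_format34("E32011", pc=0x1065))      # TD, PC relative
print(decode_format34("4B101036", pc=0x000A))    # +JSUB, Format 4
```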
Instruction set:
SIC provides 26 instructions; SIC/XE provides an additional 33 instructions (59 in total).
The SIC/XE instructions fall into the following categories:
• Load/store registers (LDA, LDX, LDCH, STA, STX, STCH, etc.)
• Integer arithmetic operations (ADD, SUB, MUL, DIV): these use register A and a word in memory; the result is placed in register A
• Compare (COMP): compares the contents of register A with a word in memory and sets the CC (Condition Code) to <, > or =
• Conditional jumps (JLT, JEQ, JGT): jump according to the setting of CC
• Subroutine linkage (JSUB, RSUB): jump into/return from a subroutine using register L
• Input & output control (RD, WD, TD): see the Input and output section below
• Floating point arithmetic operations (ADDF, SUBF, MULF, DIVF)
• Register manipulation, operands-from-registers and register-to-register arithmetic (RMO, COMPR, SHIFTR, SHIFTL, ADDR, SUBR, MULR, DIVR, etc.)

Input and output:

• Up to 256 I/O devices may be attached; each has its own unique 8-bit address.
• 1 byte of data is transferred to/from the rightmost 8 bits of register A.
Three I/O instructions are provided:
• RD: Read Data from the I/O device into A
• WD: Write Data to the I/O device from A
• TD: Test Device determines whether the addressed I/O device is ready to send/receive a byte of data.
The CC (Condition Code) is set with the result of this test:
< device is ready to send/receive
= device is not ready
SIC/XE also has I/O channels, which can perform input and output while the CPU does other work. 3 additional instructions are provided to control them:
• SIO Start I/O
• HIO Halt I/O
• TIO Test I/O

Addressing modes:
It supports several relative addressing modes:
● Base relative
● Base relative indexed
● Program counter relative
● Program counter relative indexed

Mode                               Indication        Target address calculation
Base relative                      b=1, p=0          TA = (B) + disp (0 <= disp <= 4095)
PC relative                        b=0, p=1          TA = (PC) + disp (-2048 <= disp <= 2047)
Base relative indexed              b=1, p=0, x=1     TA = (B) + (X) + disp
Program counter relative indexed   b=0, p=1, x=1     TA = (PC) + (X) + disp

The n and i bits select the remaining addressing modes:
n  i  Addressing mode
0  1  Immediate addressing mode
1  0  Indirect addressing mode
1  1  Simple/direct addressing mode
0  0  Simple/direct addressing mode

Assemblers

The basic functions of an assembler are translating mnemonic operation codes to their machine language equivalents and assigning machine addresses to the symbolic labels used by the programmer. There are some features of an assembler language that have no direct relation to machine architecture.
Basic Assembler Functions:

The following assembler directives are used:
START - Specifies the name and starting address for the program.
END - Indicates the end of the source program and specifies the first executable instruction in the program.
BYTE - Generates a character or hexadecimal constant, occupying as many bytes as needed to represent the constant.
WORD - Generates a one-word integer constant.
RESB - Reserves the indicated number of bytes for a data area.
RESW - Reserves the indicated number of words for a data area.

Example:
SIC assembler language program.
• The program contains a main routine that reads records from an input device and copies them to an output device.
• This main routine calls subroutine RDREC to read a record into a buffer and subroutine WRREC to write the record from the buffer to the output device.
• Each subroutine must transfer the record one character at a time, because the only I/O instructions available are RD and WD.
• The buffer is necessary because the I/O rates for the two devices, such as a disk and a slow printing terminal, may be very different.
• The end of each record is marked with a null character. If a record is longer than the length of the buffer (4096 bytes), only the first 4096 bytes are copied.
• The program does not deal with error recovery.
• The end of the file to be copied is indicated by a zero-length record.
• When the end of file is detected, the program writes EOF on the output device and terminates by executing an RSUB instruction.
• The program is assumed to have been called by the operating system using a JSUB instruction, so the RSUB returns control to the operating system.

PROGRAM

LINE  LABEL   OPCODE  OPERAND    EXPLANATION
5     COPY    START   1000       COPY FILE FROM I/P TO O/P
10    FIRST   STL     RETADR     SAVE RETURN ADDRESS
15    CLOOP   JSUB    RDREC      READ I/P RECORD
20            LDA     LENGTH     TEST FOR EOF (LENGTH=0)
25            COMP    ZERO
30            JEQ     ENDFIL     EXIT IF EOF FOUND
35            JSUB    WRREC      WRITE O/P RECORD
40            J       CLOOP      LOOP
45    ENDFIL  LDA     EOF        INSERT END OF FILE MARKER
50            STA     BUFFER
55            LDA     THREE      SET LENGTH=3
60            STA     LENGTH
65            JSUB    WRREC      WRITE EOF
70            LDL     RETADR     GET RETURN ADDRESS
75            RSUB               RETURN TO CALLER
80    EOF     BYTE    C'EOF'
85    THREE   WORD    3
90    ZERO    WORD    0
100   RETADR  RESW    1
105   LENGTH  RESW    1
110   BUFFER  RESB    4096

SUBROUTINE TO READ RECORD INTO BUFFER

LINE  LABEL   OPCODE  OPERAND    EXPLANATION
125   RDREC   LDX     ZERO       CLEAR LOOP COUNTER
130           LDA     ZERO       CLEAR A TO ZERO
135   RLOOP   TD      INPUT      TEST I/P DEVICE
140           JEQ     RLOOP      LOOP UNTIL READY
145           RD      INPUT      READ CHARACTER INTO REGISTER A
150           COMP    ZERO       TEST FOR END OF RECORD
155           JEQ     EXIT       EXIT LOOP IF EOR
160           STCH    BUFFER,X   STORE CHARACTER IN BUFFER
165           TIX     MAXLEN     LOOP UNLESS MAX LENGTH
170           JLT     RLOOP      HAS BEEN REACHED
175   EXIT    STX     LENGTH     SAVE RECORD LENGTH
180           RSUB               RETURN TO CALLER
185   INPUT   BYTE    X'F1'      CODE FOR I/P DEVICE
190   MAXLEN  WORD    4096

SUBROUTINE TO WRITE RECORD FROM BUFFER

LINE  LABEL   OPCODE  OPERAND    EXPLANATION
200   WRREC   LDX     ZERO       CLEAR LOOP COUNTER
210   WLOOP   TD      OUTPUT     TEST OUTPUT DEVICE
215           JEQ     WLOOP      LOOP UNTIL READY
220           LDCH    BUFFER,X   GET CHARACTER FROM BUFFER
225           WD      OUTPUT     WRITE CHARACTER
230           TIX     LENGTH     LOOP UNTIL ALL CHARACTERS
235           JLT     WLOOP      HAVE BEEN WRITTEN
240           RSUB               RETURN TO CALLER
245   OUTPUT  BYTE    X'05'      CODE FOR O/P DEVICE
250           END     FIRST
A Simple SIC Assembler:

The translation of a source program to object code requires the following functions:
o Convert mnemonic operation codes to their machine language equivalents (example: translate STL to 14).
o Convert symbolic operands to their equivalent machine addresses (example: translate RETADR to 1033).
o Build the machine instructions in the proper format.
o Convert the data constants specified in the source program into their internal machine representations (example: translate EOF to 454F46).
o Write the object program and the assembly listing.

Consider the statement
10 1000 FIRST STL RETADR
o If we try to translate the program line by line, we will be unable to process this statement, because we do not yet know the address that will be assigned to RETADR.
o Because of this, most assemblers make two passes over the source program.
o The first pass does little more than scan the source program for label definitions and assign addresses.
o The second pass performs most of the actual translation previously described.
o In addition to translating the instructions of the source program, the assembler must process statements called assembler directives or pseudo-instructions.
o These statements are not translated into machine instructions. Instead, they provide instructions to the assembler itself (example: BYTE, WORD).
o In our example program:
- START specifies the starting memory address for the object program.
- END marks the end of the program.
o Finally, the assembler must write the generated object code onto some output device.

The object program consists of three types of records: Header, Text and End.

The Header record contains the program name, starting address and length.
Header record:
col 1      H
col 2-7    Program name
col 8-13   Starting address of object program
col 14-19  Length of object program in bytes

Text records contain the translated instructions and data of the program, together with an indication of the addresses where these are to be loaded.
Text record:
col 1      T
col 2-7    Starting address for object code in this record
col 8-9    Length of object code in this record in bytes
col 10-69  Object code, represented in hexadecimal

The End record marks the end of the object program and specifies the address in the program where execution is to begin.
End record:
col 1      E
col 2-7    Address of first executable instruction in object program
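
The column layout of the three record types can be sketched as string formatting. The helper names below are hypothetical, and the program length and the object codes passed in are illustrative values rather than the complete object program for the example; addresses and lengths are written in hexadecimal.

```python
def header_record(name, start, length):
    # col 1: H, col 2-7: name (padded to 6), col 8-13: start, col 14-19: length
    return f"H{name:<6}{start:06X}{length:06X}"


def text_record(start, object_codes):
    # col 1: T, col 2-7: start, col 8-9: length in bytes, col 10-69: object code
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a Text record holds at most 60 hex digits (30 bytes)")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_instruction):
    # col 1: E, col 2-7: address of first executable instruction
    return f"E{first_instruction:06X}"


header = header_record("COPY", 0x1000, 0x107A)
text = text_record(0x1000, ["141033", "482039", "001036"])
end = end_record(0x1000)
print(header, text, end, sep="\n")
```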
The task of the assembler is to generate object code. Because the assembler does not know the addresses of all symbols when it first encounters them, it works in two passes, Pass 1 and Pass 2.

Pass 1:
1. Assign addresses to all statements in the program.
2. Save the values assigned to all labels for use in Pass 2.
3. Perform some processing of assembler directives.

Pass 2:
1. Assemble instructions.
2. Generate data values.
3. Perform processing of assembler directives not done during Pass 1.
4. Write the object program and the assembly listing.
Assembler Algorithm and Data Structures

Our simple assembler uses two major internal data structures:

- The operation code table (OPTAB)
- The symbol table (SYMTAB)
OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents.
SYMTAB is used to store the values (addresses) assigned to labels.
LOCCTR - This is a variable that is used to help in the assignment of addresses. LOCCTR is initialized to the beginning address specified in the START statement. After each source statement is processed, the length of the assembled instruction or data area to be generated is added to LOCCTR. Whenever we reach a label in the source program, the current value of LOCCTR gives the address to be associated with that label.
PASS 1 ASSEMBLER ALGORITHM
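
A minimal Python sketch of Pass 1, following the description of OPTAB, SYMTAB and LOCCTR above. It is not the textbook algorithm verbatim: the fixed-column parsing rule (a line that starts with a blank has no label), the reduced OPTAB and the function names are assumptions made to keep the example short. The input is the main routine of the COPY program.

```python
OPTAB = {"STL": 0x14, "JSUB": 0x48, "LDA": 0x00, "COMP": 0x28, "JEQ": 0x30,
         "J": 0x3C, "STA": 0x0C, "LDL": 0x08, "RSUB": 0x4C}

SOURCE = """\
COPY    START   1000
FIRST   STL     RETADR
CLOOP   JSUB    RDREC
        LDA     LENGTH
        COMP    ZERO
        JEQ     ENDFIL
        JSUB    WRREC
        J       CLOOP
ENDFIL  LDA     EOF
        STA     BUFFER
        LDA     THREE
        STA     LENGTH
        JSUB    WRREC
        LDL     RETADR
        RSUB
EOF     BYTE    C'EOF'
THREE   WORD    3
ZERO    WORD    0
RETADR  RESW    1
LENGTH  RESW    1
BUFFER  RESB    4096
        END     FIRST
"""


def parse(line):
    """Return (label, opcode, operand); a leading blank means 'no label'."""
    parts = line.split()
    label = None if line[0].isspace() else parts.pop(0)
    operand = parts[1] if len(parts) > 1 else None
    return label, parts[0], operand


def statement_length(opcode, operand):
    """Number of bytes the statement adds to LOCCTR."""
    if opcode in OPTAB or opcode == "WORD":
        return 3
    if opcode == "RESW":
        return 3 * int(operand)
    if opcode == "RESB":
        return int(operand)
    if opcode == "BYTE":                       # C'...' or X'...'
        text = operand[2:-1]
        return len(text) if operand[0] == "C" else len(text) // 2
    raise ValueError(f"invalid operation code: {opcode}")


def pass1(source):
    symtab, start, locctr = {}, 0, 0
    for line in source.splitlines():
        if not line.strip():
            continue
        label, opcode, operand = parse(line)
        if opcode == "START":                  # initialize LOCCTR
            start = locctr = int(operand, 16)
            continue
        if opcode == "END":
            break
        if label is not None:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr             # label gets current LOCCTR
        locctr += statement_length(opcode, operand)
    return symtab, locctr - start              # SYMTAB and program length


symtab, length = pass1(SOURCE)
print(f"RETADR = {symtab['RETADR']:04X}, THREE = {symtab['THREE']:04X}")
```

With the starting address 1000, this gives RETADR = 1033 and THREE = 102D, the addresses used in the examples in these notes.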

PASS 2 ASSEMBLER ALGORITHM
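
The core of Pass 2 for the standard SIC machine is a lookup in OPTAB and SYMTAB followed by packing the fields into 24 bits. The sketch below covers only that step, plus BYTE and WORD constants; the function names and the small tables are assumptions for illustration, with SYMTAB holding values as Pass 1 would have produced them.

```python
OPTAB = {"STL": 0x14, "LDA": 0x00, "STCH": 0x54, "RSUB": 0x4C}
SYMTAB = {"RETADR": 0x1033, "LENGTH": 0x1036, "BUFFER": 0x1039}


def assemble_instruction(opcode, operand=None):
    """Build the 6-hex-digit object code of one SIC instruction."""
    address = 0                                  # no operand: address field 0
    if operand is not None:
        indexed = operand.endswith(",X")
        symbol = operand[:-2] if indexed else operand
        if symbol not in SYMTAB:
            raise ValueError(f"undefined symbol: {symbol}")
        address = SYMTAB[symbol]
        if indexed:
            address |= 0x8000                    # set the x bit
    return f"{OPTAB[opcode]:02X}{address:04X}"


def assemble_constant(directive, operand):
    """Convert a BYTE or WORD operand to its object code."""
    if directive == "WORD":
        return f"{int(operand) & 0xFFFFFF:06X}"
    text = operand[2:-1]                         # strip C'...' or X'...'
    if operand[0] == "C":
        return "".join(f"{ord(ch):02X}" for ch in text)
    return text.upper()


print(assemble_instruction("STL", "RETADR"))     # 141033
print(assemble_instruction("STCH", "BUFFER,X"))  # 549039
print(assemble_constant("BYTE", "C'EOF'"))       # 454F46
```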

Machine-Dependent Assembler Features

Eg: SIC/XE assembler program.
• Immediate and indirect addressing can be used in programs written for SIC/XE.
- Immediate operands are denoted with the prefix #.
- Indirect addressing is indicated by adding the prefix @ to the operand.
• Instructions that refer to memory are normally assembled using either program counter relative or base relative mode.
• If the displacement required for both PC relative and base relative addressing is too large, the 4-byte extended format instruction is used (written with the prefix + on the opcode, as in +JSUB).
• The main difference between SIC and SIC/XE programs is the use of register-to-register instructions.
Advantages of SIC/XE Program:
• Execution is faster, since register-to-register instructions execute faster than register-to-memory instructions.
• An immediate operand need not be fetched from anywhere, as it is present as part of the instruction.
• The large main memory of SIC/XE provides room to load and run several programs at the same time.

Instruction Formats and Addressing Modes:

• The START statement specifies the starting address of the location where the program is to be loaded.
Eg: START 0 indicates that the program is to be loaded beginning at address 0.
• SYMTAB would be preloaded with the register names (A, X, etc.) and their values (0, 1, etc.).
• A register-to-memory instruction is assembled using either program counter relative or base relative addressing.
• The assembler must calculate the displacement, which becomes part of the object instruction.
• The displacement is calculated so that the correct target address is obtained when the content of the program counter (PC) or the base register (B) is added to the displacement.
• The displacement must be between 0 and 4095 (for base relative mode) or between -2048 and +2047 (for program counter relative mode).
• If neither program counter relative nor base relative addressing can be used, then the 4-byte extended instruction format is used.
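
The displacement rules above amount to a short decision procedure. The sketch below tries PC relative first, then base relative, and otherwise falls back to the extended format; the function name and return convention are hypothetical, and the fallback is a simplification, because in an actual SIC/XE source program the extended format is requested by the programmer with the + prefix.

```python
def choose_displacement(target, pc, base=None):
    """Pick an addressing mode for a Format 3 instruction.

    target: address of the operand (from SYMTAB)
    pc:     address of the next instruction
    base:   contents of register B, or None if no base has been declared
    Returns (mode, field) where field is the 12-bit disp or the 20-bit address.
    """
    disp = target - pc
    if -2048 <= disp <= 2047:
        return "pc", disp & 0xFFF              # 12-bit two's complement
    if base is not None and 0 <= target - base <= 4095:
        return "base", target - base
    return "extended", target & 0xFFFFF        # needs the 4-byte format


# LDA =C'EOF' at 001A: next instruction at 001D, literal at 002D -> disp 010
print(choose_displacement(0x002D, pc=0x001D))
# A backward reference gives a negative displacement (hypothetical addresses)
print(choose_displacement(0x0006, pc=0x001A))
# Too far for PC relative and no base register declared
print(choose_displacement(0x1036, pc=0x000A))
```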

Difference between PC relative and base relative addressing:

1. With PC relative addressing, the assembler knows what the PC will contain when the instruction is executed (the address of the next instruction), so it can calculate the displacement on its own.
2. With base relative addressing, the programmer must tell the assembler what the base register will contain during the execution of the program (this is done with the BASE assembler directive), and the assembler then calculates the displacement.

Program Relocation
• More than one program can share the memory and other resources of the machine.
• If we knew in advance which programs would execute concurrently, we could assign addresses when the programs were assembled so that they would fit together without overlap. In practice this is usually not possible.
• So it is desirable to be able to load a program into memory wherever there is space for it.
• In such cases the actual starting address of the program is not known until load time.
• If the program is loaded beginning at location 1000, the value of the variable THREE will be located at address 102D.
• If the program is loaded starting at some other address, say 2000, address 102D will not contain the value of THREE.
• So we have to make some changes in the address portion of the instruction in order to retrieve the correct value.
Eg:
0006  CLOOP  +JSUB  RDREC   4B101036
  .
  .
1036  RDREC  CLEAR  X       B410

Case 1:
If the program is loaded beginning at address 0000, the statement labelled RDREC is at memory location 1036.

0000   .
0006   4B101036   <-- +JSUB RDREC
  .
  .
1036   B410       <-- RDREC

Case 2:
If the program is loaded beginning at address 5000, the statement labelled RDREC is at memory location 6036.

5000   .
5006   4B106036   <-- +JSUB RDREC
  .
  .
6036   B410       <-- RDREC

In each case, the address field of the JSUB instruction must contain the actual address of the label RDREC.
The assembler does not know the actual location where the program will be loaded, so it cannot make this change itself. However, the assembler can identify for the loader those parts of the object program that need modification.
An object program that contains the information needed to perform this kind of modification is called a relocatable program.
Steps to solve the relocation problem:
• When the assembler generates the object code for the JSUB instruction, it inserts the address of RDREC relative to the start of the program. (This is the reason we initialized the location counter to 0 for the assembly.)
• The assembler also produces a command for the loader, instructing it to add the beginning address of the program to the address field in the JSUB instruction at load time.

Modification record:
col 1     M
col 2-7   Starting location of the address field to be modified, relative to the beginning of the program
col 8-9   Length of the address field to be modified, in half-bytes (1 half-byte = 4 bits)
• Relocation must be performed for every extended format instruction whose address field contains an actual address, so a Modification record must be generated for each of them.
• Other lines in the program do not require modification, as they use PC relative or base relative addressing.
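
To make the loader's side concrete, the sketch below writes a Modification record for the +JSUB at location 0006 and applies it to a small program image. The function names and the 16-byte image are hypothetical; the address field of the 4-byte instruction starts at byte 0007 and is 5 half-bytes long, and the program is assumed to be loaded at 5000 as in Case 2.

```python
def modification_record(field_start, half_bytes=5):
    # col 1: M, col 2-7: start of address field, col 8-9: length in half-bytes
    return f"M{field_start:06X}{half_bytes:02X}"


def apply_modification(image, record, load_address):
    """Add load_address to the address field named by a Modification record.

    image is a bytearray holding the program, indexed relative to its start.
    """
    start = int(record[1:7], 16)
    half_bytes = int(record[7:9], 16)
    nbytes = (half_bytes + 1) // 2               # bytes that contain the field
    field = int.from_bytes(image[start:start + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1           # low-order half-bytes only
    updated = (field & ~mask) | ((field + load_address) & mask)
    image[start:start + nbytes] = updated.to_bytes(nbytes, "big")


image = bytearray(16)
image[6:10] = bytes.fromhex("4B101036")          # +JSUB RDREC at 0006

record = modification_record(0x000007)
apply_modification(image, record, load_address=0x5000)
print(record, image[6:10].hex().upper())
```

The high-order half-byte of the first modified byte holds the flag bits x, b, p and e, which is why only the low 5 half-bytes are changed.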
Machine-Independent Assembler Features
These are features that are commonly found in implementations of assemblers and that are relatively independent of the machine architecture.
Literals
• It is often convenient for the programmer to write the value of a constant operand as a part of the instruction that uses it.
• This avoids having to define the constant elsewhere in the program and make up a label for it.
• Such an operand is called a "literal", because the value is stated "literally" in the instruction.
• A literal is identified with the prefix '=', which is followed by a specification of the literal value.
Eg:
45   001A  ENDFIL  LDA  =C'EOF'   032010
215  1062  WLOOP   TD   =X'05'    E32011

Difference between literal and immediate operand

With immediate addressing, the operand value is assembled as part of the machine instruction. With a literal, the assembler generates the specified value as a constant at some other memory location.
• The address of this generated constant is used as the target address of the machine instruction.

Literal pool

• All the literal operands used in a program are gathered into one or more literal pools. Normally literals are placed in a pool at the end of the program.
LTORG --> assembler directive
• When the assembler encounters an LTORG statement, it immediately creates a literal pool containing all of the literal operands used since the previous LTORG (or since the beginning of the program).
• Once a literal has been placed in a pool by LTORG, it is not repeated in the pool at the end of the program.
• In some programs LTORG is placed in the middle of the program, because otherwise the literals would all be placed in the pool at the end of the program.
• For example, if a literal is used at the beginning of a program that has 300 lines, its value would be stored in the literal pool at the end of the program.
• The literal would then be far away from the instruction that references it. LTORG statements can be used as often as needed to keep literals close to the instructions that use them.
• Most assemblers recognize duplicate literals: the same literal may be used in more than one place in the program, but it is not duplicated in the literal pool.
• Only one copy of the specified data value is stored in the literal pool.
• Before space is allocated for a literal in the pool, the assembler checks whether the same literal is already present, by comparing the character string that defines the new literal with the literals already in the pool.
A problem arises when the same literal name is used more than once but has different values at different places in the program, for example a literal that refers to the current value of the location counter. If only one copy of such a literal were stored in the pool, some of the references to it would use the wrong value.

The basic data structure needed to handle literals is the literal table (LITTAB). For each literal used, the table contains:
- Literal name
- The operand value and length
- Address assigned to the operand
• During Pass 1, the assembler searches LITTAB for each literal name it encounters. If the literal is already present, no action is needed. If it is not, the literal is added to the literal table.
• During Pass 2, the assembler searches LITTAB for the address of each literal operand, for use in object code generation.
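
A small sketch of LITTAB handling, assuming a dictionary keyed by the literal name. The class and method names are hypothetical, and the pool address passed in is only an example; the point is that a repeated literal is entered once and that addresses are assigned when a pool is created (at LTORG or at the end of the program).

```python
def literal_value(name):
    """Return the bytes denoted by a literal such as =C'EOF' or =X'05'."""
    kind, text = name[1], name[3:-1]
    return text.encode("ascii") if kind == "C" else bytes.fromhex(text)


class LiteralTable:
    def __init__(self):
        self.entries = {}                # name -> {"value", "length", "address"}

    def reference(self, name):
        """Pass 1: add the literal unless it is already in the table."""
        if name not in self.entries:
            value = literal_value(name)
            self.entries[name] = {"value": value, "length": len(value),
                                  "address": None}

    def place_pool(self, locctr):
        """At LTORG or END: assign addresses to literals not yet placed."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr                    # updated LOCCTR after the pool

    def address_of(self, name):
        """Pass 2: look up the address used as the target address."""
        return self.entries[name]["address"]


littab = LiteralTable()
for operand in ["=C'EOF'", "=X'05'", "=C'EOF'"]:   # =C'EOF' is used twice
    littab.reference(operand)
next_locctr = littab.place_pool(0x002D)
print(len(littab.entries), hex(littab.address_of("=X'05'")), hex(next_locctr))
```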

Symbol-Defining Statements

• So far, user-defined symbols in assembler language programs have appeared as labels on instructions or data areas.
• The value of such a label is the address assigned to the statement on which it appears.
• Most assemblers provide an assembler directive that allows the programmer to define symbols and specify their values.
• The assembler directive generally used is EQU.
• The general form of such a statement is
symbol EQU value
• This statement defines the given symbol and assigns to it the value specified.
• The value may be given as
- a constant, or
- an expression involving constants and previously defined symbols.
One use of EQU is to establish symbolic names that can be used for improved readability in place of numeric values.
Eg:
+LDT #4096
loads the value 4096 into register T. This value represents the maximum-length record that can be read with subroutine RDREC. With the definition
MAXLEN EQU 4096
the instruction can be written as
+LDT #MAXLEN
When the assembler encounters the EQU statement, it enters MAXLEN into SYMTAB with the value 4096. When it assembles the LDT instruction, it looks up MAXLEN in SYMTAB and uses its value as the operand, so the substitution happens during assembly, not during execution.
Another common use of EQU is in defining mnemonic names for registers.

Eg:
A  EQU 0        BASE   EQU R1
X  EQU 1        COUNT  EQU R2
L  EQU 2        INDEX  EQU R3

• A literal such as =X'05' specifies a 1-byte literal with the hexadecimal value 05. The notation used for literals varies from assembler to assembler.
• It is important to understand the difference between a literal and an immediate operand. With immediate addressing, the operand value is assembled as part of the machine instruction.
• With a literal, the assembler generates the specified value as a constant at some other memory location.
BASE *
LDB =*
 Another assembler directive is called ORG.This is used to indirectly
assign the values to symbols.
 When value is a constant or an expression involving constants and
previously defined symbol.
SYMBOL RESB 6
VALUE RESB 1
Prepared by: P.Kalamani Sri Sairam College of Engineering Anekal. Page |
16
Regulation – 2017(CBCS Scheme) System software & Compiler Design– 17CS63

FLAGS RESB 2
ORG STAB +1100
The first ORG resets the location counter to the value of STAB.The label on the following RESB
statements defines SYMBOL to have the current value in LOCCTR.
(ie)the same address assigned to SYMTAB LOCCTR.

Expressions:
 Our previous examples of assembler language statements have used single
terms like label,literal,etc.,as instruction operands.
 Most of the assemblers use expression wherever a single operand is
permitted.
 Such expression is evaluated by the assembler and the result is used as the
normal operand.
 Arithmetic expressions are allowed and it must follow the normal rules
using the operators
+,-,* and /.
 This statement is encountered during assembly of a program,the assembler
refers its location contain(LOCCTR)to the specified value we can define a symbol tabel with all
following structures.
SYMBOL VALUE FLAGS

 In this tabel,SYMBOL field contain'6' byte user-defined symbols;VALUE is a


one-word representation of the value assigned to the symbol;FLAGS is a 2-byte field that
specifies symbol type and other information.
STAB RESB 1100
 With EQU statements,
SYMBOL EQU STAB
VALUE EQU STAB+6
FLAGS EQU STAB+9
 With help of assembler directive ORG,we can write those statemnts, STAB
RESB 1100
ORG STAB
 Division is usually defined to produce an integer result. Individual terms in
the expression may be constants, user-defined symbols or special terms. The most common special term
is the current value of the location counter (designated by *), i.e., the value of the next
unassigned memory location.
BUFEND EQU *
 The above statement gives BUFEND a value that is the address of the next
byte after the buffer area.
 Some values in the object program are relative to the beginning of the
program, while others are absolute.
 Similarly, the values of terms and expressions are either relative or absolute.

 A constant is an absolute term. Labels on instructions and data areas, and
references to the location counter value, are relative terms.
 A symbol whose value is given by EQU may be either an absolute term or a
relative term, depending on the expression used to define its value.
 Expressions are classified as

*Absolute expressions
*Relative expressions
 The classification depends on the type of value the expression produces.
 An expression that contains only absolute terms is an absolute
expression.
 Relative terms may be used in expressions under the following conditions:
*Relative terms are paired, and the two terms in each pair have opposite signs.
*In a relative expression, exactly one relative term remains unpaired, and it must have a positive sign.
*No relative term may enter into a multiplication or division operation.
 Expressions that are neither absolute nor relative are flagged by the
assembler as errors.
 When all relative terms are paired with opposite signs, the
result is an absolute value:
MAXLEN EQU BUFEND-BUFFER
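
The pairing rules above amount to counting signed relative terms, which the following Python sketch illustrates. It is an assumption-laden illustration, not an assembler's actual algorithm: the function `classify` and its input format are invented here, only the + and - operators are modelled, and relative terms in multiplication or division (always an error) are not represented.

```python
def classify(terms, relative_symbols):
    """Classify an expression built only from + and - operators.

    terms: list of (sign, term) pairs, sign being +1 or -1 and term
           either an integer constant or a symbol name.
    relative_symbols: set of symbol names whose values are relative.
    """
    # Each relative term contributes its sign; a pair with opposite
    # signs cancels, so the net count tells us what is left unpaired.
    net = sum(sign for sign, term in terms if term in relative_symbols)
    if net == 0:
        return "absolute"   # every relative term is paired
    if net == 1:
        return "relative"   # one unpaired relative term with a + sign
    return "error"          # neither absolute nor relative


relative = {"BUFFER", "BUFEND"}
maxlen = classify([(+1, "BUFEND"), (-1, "BUFFER")], relative)
offset = classify([(+1, "BUFFER"), (+1, 10)], relative)
bad_sum = classify([(+1, "BUFEND"), (+1, "BUFFER")], relative)
bad_neg = classify([(+1, 100), (-1, "BUFFER")], relative)
```

Here BUFEND-BUFFER is absolute, BUFFER+10 is relative, and both BUFEND+BUFFER and 100-BUFFER would be flagged as errors.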

Program Blocks
 Normally the source program is treated as a unit that contains
subroutines, data areas, etc.
 The assembler processes the program and produces a single unit of object
code.
 Some assembler features allow the generated machine instructions and data
to appear in the object program in a different order from the corresponding source
statements.
 Other features create several independent parts of the object program. These parts
maintain their identity and are handled separately by the
loader.
 We use the term program blocks to refer to segments of code that are rearranged
within a single object program unit, and control sections to refer to segments that are
translated into independent object program units.
 Each program block may actually contain several separate segments of the
source program.
 In the example program, three blocks are used. The first (unnamed) program block contains the
executable instructions of the program.
 The second block (CDATA) contains all data areas that are small in
length.
 The third block (CBLKS) contains all data areas that consist of larger blocks of
memory.
 The assembler directive USE indicates which portions of the source
program belong to the various blocks.
At the beginning of the program, statements are assumed to be part of the
unnamed (default) block. If no USE statements are included, the entire program belongs to this
single block. The assembler will rearrange these segments to gather together the pieces of
each block. These blocks will then be assigned addresses in the object program, with the blocks
appearing in the same order in which they were first begun in the source program.
The assembler accomplishes this logical rearrangement of code by
maintaining, during pass 1, a separate location counter for each program block. The location
counter for a block is initialized to 0 when the block is first begun. The current value of this
location counter is saved when switching to another block, and the saved value is restored
when resuming a previous block.
During pass 1, each label in the program is assigned an address that is
relative to the start of the block that contains it.
When labels are entered into the symbol table, the block name or number is
stored along with the assigned relative address. At the end of pass 1, the latest value of the
location counter for each block indicates the length of that block. The assembler can then
assign to each block a starting address in the object program. For code generation during pass
2, the assembler needs the address of each symbol relative to the start of the object program,
which it obtains by adding the symbol's relative address to the starting address of its block.

Block Name    Block Number    Address    Length

Default       0               0000       0066

CDATA         1               0066       000B

CBLKS         2               0071       1000
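
The address assignment at the end of pass 1 can be sketched as a running sum of block lengths. The Python below is a minimal illustration under stated assumptions: the names `assign_block_starts` and `absolute_address` are invented here, the lengths are taken as 0066, 000B and 1000 (hexadecimal), which reproduce the starting addresses 0000, 0066 and 0071 in the table, and the offset used in the example is illustrative.

```python
def assign_block_starts(blocks):
    """blocks: list of (name, length) in the order the blocks were first begun.

    Returns a dict mapping block name -> starting address.
    """
    starts = {}
    address = 0
    for name, length in blocks:
        starts[name] = address   # block begins where the previous one ended
        address += length
    return starts


blocks = [("Default", 0x0066), ("CDATA", 0x000B), ("CBLKS", 0x1000)]
starts = assign_block_starts(blocks)


def absolute_address(block, relative_address):
    """Pass 2: address relative to the start of the object program."""
    return starts[block] + relative_address


# A symbol at relative address 0003 within CDATA (illustrative offset).
example = absolute_address("CDATA", 0x0003)
```

With these values, a symbol three bytes into CDATA ends up at address 0069 in the object program.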

Control Sections and Program Linking


A control section is a part of the program that maintains its identity after
assembly. Each such control section can be loaded and relocated independently of the
others. Different control sections are most often used for subroutines or other logical
subdivisions of a program. The programmer can assemble, load and manipulate each of these
control sections separately. The resulting flexibility is a major benefit of using control sections.
When control sections form logically related parts of a program, it is
necessary to provide some means for linking them together. Instructions in one control
section might need to refer to instructions or data located in another section. Because control
sections are independently loaded and relocated, the assembler is unable to process these
references in the usual way.
The assembler has no idea where any other control section will be located at
execution time. Such references between control sections are called external references. In the example program there are
three control sections: one for the main program and one for each subroutine. Control sections differ from program blocks
in that they are handled separately by the assembler.

Symbols that are defined in one control section may not be used directly by
another control section; they must be identified as external references for the loader to handle.
EXTDEF (external definition) names symbols that are defined in this control section and may be used by other sections.
EXTREF (external reference) names symbols that are used in this control section but defined elsewhere.
The two new record types are Define and Refer. A Define record gives
information about external symbols that are defined in this control section. A Refer record lists
symbols that are used as external references by the control section.

DEFINE RECORD:
COL 1 :D

COL 2-7 :Name of the external symbol defined in this control section.
COL 8-13 :Relative address of the symbol within this control section.
COL 14-73 :Repeat information in COL 2-13 for other external
symbols.

REFER RECORD:
COL 1 :R
COL 2-7 :Name of an external symbol referred to in this control section.
COL 8-73 :Names of other external reference symbols.

MODIFICATION RECORD:
COL 1 :M
COL 2-7 :Starting address of the field to be modified, relative to the beginning of the control section.
COL 8-9 :Length of the field to be modified, in half-bytes.
COL 10 :Modification flag (+ or -).
COL 11-16 :External symbol whose value is to be added to or
subtracted from the indicated field.
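
The column layouts above can be made concrete by formatting a few records. The Python below is a sketch only: the function names are invented for this example, the symbol names and addresses are illustrative values, names are assumed to be left-justified and blank-padded to 6 columns, addresses and lengths are assumed to be written in hexadecimal, and the limit imposed by column 73 is not enforced.

```python
def define_record(symbols):
    """symbols: list of (name, relative_address) pairs.

    COL 1 is 'D'; each symbol takes 6 columns for the name
    and 6 columns for the address.
    """
    return "D" + "".join(f"{name:<6}{addr:06X}" for name, addr in symbols)


def refer_record(names):
    """COL 1 is 'R'; each external reference name takes 6 columns."""
    return "R" + "".join(f"{name:<6}" for name in names)


def modification_record(start, half_bytes, flag, symbol):
    """COL 1 'M', COL 2-7 start, COL 8-9 length, COL 10 flag, COL 11-16 symbol."""
    return f"M{start:06X}{half_bytes:02X}{flag}{symbol:<6}"


d = define_record([("BUFFER", 0x33), ("BUFEND", 0x1033)])
r = refer_record(["RDREC", "WRREC"])
m = modification_record(0x4, 5, "+", "RDREC")
```

The Modification record built here asks the loader to add the value of RDREC to the 5 half-byte field starting at relative address 000004.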

One pass assemblers and Multipass assemblers:


One-Pass Assemblers:
 Scenarios for one-pass assemblers:
*The object code is generated in memory for immediate execution (a load-and-go assembler).
*External storage for the intermediate file between two passes is slow or inconvenient to
use.
 Main problem: forward references to
*Data items
*Labels on instructions
 Solution:
*Require that all data areas be defined before they are referenced. It is possible, although
inconvenient, to do so for data items.
*Forward jumps to labels on instructions cannot be easily eliminated, so the assembler
inserts (label, address_to_be_modified) into SYMTAB. Usually, the address_to_be_modified entries are stored
in a linked list.
Forward Reference in One-pass Assembler:
The assembler omits the operand address if the symbol has not yet been defined.
It enters this undefined symbol into SYMTAB and flags it as undefined.
It adds the address of the operand field to a list of forward references associated with the
SYMTAB entry.

When the definition of the symbol is encountered, it scans the reference list and inserts the
address.

At the end of the program, it reports an error if any SYMTAB entries are still marked as
undefined symbols.
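
The forward-reference steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not the layout of a real assembler: the class `OnePassSymtab`, the `memory` dictionary standing in for object code held in memory, and all addresses are invented for the example, and a Python list replaces the linked list.

```python
class OnePassSymtab:
    def __init__(self):
        self.defined = {}   # symbol -> address
        self.pending = {}   # undefined symbol -> operand-field addresses
        self.memory = {}    # operand-field address -> operand value

    def reference(self, symbol, operand_addr):
        """Called when an instruction uses `symbol` as its operand."""
        if symbol in self.defined:
            self.memory[operand_addr] = self.defined[symbol]
        else:
            self.memory[operand_addr] = None   # operand address omitted
            self.pending.setdefault(symbol, []).append(operand_addr)

    def define(self, symbol, address):
        """Called when the label `symbol` is encountered."""
        self.defined[symbol] = address
        # Scan the reference list and insert the address everywhere.
        for operand_addr in self.pending.pop(symbol, []):
            self.memory[operand_addr] = address

    def undefined_symbols(self):
        """Symbols still undefined at the end of the program (errors)."""
        return sorted(self.pending)


st = OnePassSymtab()
st.reference("RDREC", 0x2013)    # forward reference
st.reference("ENDFIL", 0x201C)   # forward reference, never defined
st.reference("RDREC", 0x201F)    # second forward reference
st.define("RDREC", 0x203D)       # both RDREC operands are patched
```

After the last call, both operand fields that referred to RDREC hold 203D, while ENDFIL remains on the pending list and would be reported as undefined.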
Multi-Pass Assemblers:
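For a two-pass assembler, forward references in symbol definitions are not allowed:

ALPHA EQU BETA
BETA EQU DELTA
DELTA RESW 1
Symbol definitions must be completed in pass 1, but ALPHA cannot be assigned a value when it is
encountered because BETA and DELTA are not yet defined.
Prohibiting forward references in symbol definitions is not a serious inconvenience.
Forward references tend to create difficulty for a person reading the program.
A multi-pass assembler can nevertheless handle such definitions by making as many passes as are
needed to resolve them.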

Implementation:
For a forward reference in a symbol definition, we store in SYMTAB:
*The symbol name
*The defining expression
*The number of undefined symbols in the defining expression
Each undefined symbol (marked with a flag *) is associated with a list of the symbols whose values depend on
it. When a symbol is defined, we can recursively evaluate the symbol expressions that depend on the
newly defined symbol.
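
The bookkeeping described above can be sketched as follows. This Python is an illustration only: the class `SymbolTable`, its field names and the address 0x1000 assumed for DELTA are invented for the example, and expressions are restricted to sums and differences of constants and symbols.

```python
class SymbolTable:
    def __init__(self):
        self.value = {}       # symbol -> resolved value
        self.expr = {}        # symbol -> defining expression, not yet evaluated
        self.missing = {}     # symbol -> count of undefined symbols in its expression
        self.dependents = {}  # undefined symbol -> symbols that depend on it

    def define(self, name, terms):
        """terms: list of (sign, term); term is an int or a symbol name."""
        unknown = [t for _, t in terms
                   if isinstance(t, str) and t not in self.value]
        if unknown:
            # Forward reference: remember the expression and what it waits for.
            self.expr[name] = terms
            self.missing[name] = len(unknown)
            for sym in unknown:
                self.dependents.setdefault(sym, []).append(name)
            return
        self.value[name] = sum(
            sign * (t if isinstance(t, int) else self.value[t])
            for sign, t in terms)
        # Recursively evaluate the symbols that were waiting on this one.
        for waiter in self.dependents.pop(name, []):
            self.missing[waiter] -= 1
            if self.missing[waiter] == 0:
                del self.missing[waiter]
                self.define(waiter, self.expr.pop(waiter))


st = SymbolTable()
st.define("ALPHA", [(+1, "BETA")])    # ALPHA EQU BETA
st.define("BETA", [(+1, "DELTA")])    # BETA  EQU DELTA
st.define("DELTA", [(+1, 0x1000)])    # DELTA RESW 1 at an assumed address
```

Defining DELTA triggers the evaluation of BETA, which in turn triggers the evaluation of ALPHA, so all three symbols end up with the same value.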

IMPLEMENTATION EXAMPLE

 MASM assembler
 SPARC assembler

MASM assembler

The MASM assembler is written for Pentium and other x86 systems. Since an x86 system views
memory as a collection of segments, a MASM assembler language program is written as a
collection of segments. Each segment is defined as belonging to a particular class. Commonly
used classes are CODE, DATA, CONST and STACK. During program execution, segments are
addressed via the x86 segment registers. Code segments are addressed using register CS, and stack
segments are addressed using register SS. Data segments are addressed using DS, ES, FS or GS.
Jump instructions are assembled in two different ways:
*Near jump: a jump to a target in the same code segment.
*Far jump: a jump to a target in a different code segment.
Macro Processors
 A macro represents a commonly used group of statements in the source programming
language. The macro processor replaces each macro instruction with the
corresponding group of source language statements. This is called expanding the
macros.

 Macro instructions allow the programmer to write a shorthand version of a program,



and leave the mechanical details to be handled by the macroprocessor.

 For example, suppose that it is necessary to save the contents of all registers before
calling a subprogram.
On SIC/XE, this would require a sequence of seven instructions (STA, STB, etc.).
Using a macro instruction, the programmer could simply write one statement like
SAVEREGS. This macro instruction would be expanded into the seven assembler language
instructions needed to save the register contents.

 The most common use of macro processors is in assembler language programming.


However, macro processors can also be used with high-level programming languages,
operating system command languages, etc.

Basic Macro Processor Functions


Macro Definition and Expansion


 Fig 4.1 shows an example of a SIC/XE program using macro instructions. The
definitions of these macro instructions (RDBUFF and WRBUFF) appear in the source
program following the START statement.


 Two new assembler directives (MACRO and MEND) are used in macro definitions.

The first MACRO statement (line 10) identifies the beginning of a macro definition.
The symbol in the label field (RDBUFF) is the name of the macro, and the entries in the operand
field identify the parameters of the macro instruction.

 In our macro language, each parameter begins with the character &, which facilitates the
substitution of parameters during macro expansion.
The macro name and parameters define a pattern or prototype for the macro
instructions used by the programmer.
Following the MACRO directive are the statements that make up the body of the macro
definition.
The MEND assembler directive marks the end of the macro definition.

 Fig 4.2 shows the output that would be generated. Each macro invocation statement has
been expanded into the statements that form the body of the macro, with the arguments
from the macro invocation substituted for the parameters in the macro prototype.


For example, in expanding the macro invocation on line 190, the argument F1 is substituted
for the parameter &INDEV wherever it occurs in the body of the macro.
Similarly, BUFFER is substituted for &BUFADR, and LENGTH is substituted for &RECLTH.

 The comment lines within the macro body have been deleted. Note that the macro
invocation statement itself has been included as a comment line. This serves as
documentation of the statement written by the programmer.

 The label on the macro invocation statement (CLOOP) has been retained as a label on the
first statement generated in the macro expansion.
This allows the programmer to use a macro instruction in exactly the same way as an
assembler language mnemonic.
Note that the two invocations of WRBUFF specify different arguments, so they produce
different expansions.

 After macro processing, the expanded file (Fig 4.2) can be used as input to the assembler.

 In general, the statements that form the expansion of a macro are generated (and
assembled) each time the macro is invoked (see Fig 4.2). Statements in a subroutine
appear only once, regardless of how many times the subroutine is called (see Fig 2.5).


Macro Processor Algorithm and Data Structures

 Approach 1: It is easy to design a two-pass macro processor in which all macro definitions
are processed during the first pass, and all macro invocation statements are expanded
during the second pass.
However, such a two-pass macro processor would not allow the body of one macro instruction
to contain definitions of other macros (because all macros would have to be defined during the
first pass before any macro invocations were expanded).

 Approach 2: A one-pass macro processor that can alternate between macro definition and
macro expansion is able to handle macros like those in Fig 4.3.


Because of the one-pass structure, the definition of a macro must appear in the source
program before any statements that invoke that macro.
 There are three main data structures involved in our
macro processor.
The macro definitions themselves are stored in a definition table (DEFTAB), which
contains the macro prototype and the statements that make up the macro body (with a
few modifications). Comment lines from the macro definition are not entered into


DEFTAB because they will not be part of the macro expansion.


References to the macro instruction parameters are converted to a positional notation for
efficiency in substituting arguments.
The macro names are entered into NAMTAB, which serves as an index to DEFTAB. For
each macro instruction defined, NAMTAB contains pointers to the beginning and end of
the definition in DEFTAB.

 The third data structure is an argument table (ARGTAB), which is used during the
expansion of macro invocations.
When a macro invocation statement is recognized, the arguments are stored in ARGTAB
according to their position in the argument list.
As the macro is expanded, arguments from ARGTAB are substituted for the corresponding
parameters in the macro body.

 Fig 4.4 shows portions of the contents of these tables during the processing of program
in Fig 4.1.


Fig 4.4(a) shows the definition of RDBUFF stored in DEFTAB, with an entry in NAMTAB identifying the
beginning and end of the definition.
Note the positional notation that has been used for the parameters: &INDEV → ?1 (indicating the
first parameter in the prototype), &BUFADR → ?2, etc.
Fig 4.4(b) shows ARGTAB as it would appear during expansion of the RDBUFF statement on line 190.
In this case (this invocation), the first argument is F1, the second is BUFFER, etc.
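
The interplay of DEFTAB, NAMTAB and ARGTAB can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the functions `define_macro` and `expand` are invented for this example, the macro body is an abbreviated, illustrative version of RDBUFF, comment lines are assumed to begin with a period, and the MEND line is not stored.

```python
def define_macro(name, params, body, deftab, namtab):
    """Store a macro in DEFTAB, rewriting parameters as ?1, ?2, ..."""
    start = len(deftab)
    deftab.append((name, params))              # macro prototype
    # Replace longer parameter names first so that one name cannot
    # clobber part of another name that begins the same way.
    ordered = sorted(enumerate(params, 1), key=lambda p: -len(p[1]))
    for line in body:
        if line.startswith("."):               # comments are not stored
            continue
        for pos, param in ordered:
            line = line.replace(param, f"?{pos}")
        deftab.append(line)
    namtab[name] = (start, len(deftab) - 1)    # pointers to begin and end


def expand(name, args, deftab, namtab):
    """Expand one invocation, substituting arguments by position."""
    start, end = namtab[name]
    argtab = list(args)                        # ARGTAB: arguments by position
    expanded = []
    for line in deftab[start + 1:end + 1]:
        for pos in range(len(argtab), 0, -1):  # ?10 before ?1
            line = line.replace(f"?{pos}", argtab[pos - 1])
        expanded.append(line)
    return expanded


deftab, namtab = [], {}
define_macro(
    "RDBUFF", ["&INDEV", "&BUFADR", "&RECLTH"],
    [". read a record into the buffer",
     "TD   =X'&INDEV'",
     "RD   =X'&INDEV'",
     "STCH &BUFADR,X",
     "STX  &RECLTH"],
    deftab, namtab)
lines = expand("RDBUFF", ["F1", "BUFFER", "LENGTH"], deftab, namtab)
```

Storing ?1, ?2 and ?3 in DEFTAB means that expansion only has to index ARGTAB by position instead of searching for parameter names.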

 The macro processor algorithm is presented in Fig 4.5.


The procedure DEFINE, which is called when the beginning of a macro definition is recognized,
makes the appropriate entries in DEFTAB and NAMTAB.
EXPAND is called to set up the argument values in ARGTAB and expand a macro invocation statement.
The procedure GETLINE, which is called at several points in the algorithm, gets the next line to be
processed. This line may come from DEFTAB (the next line of a macro being expanded), or from the
input file, depending on whether the Boolean variable EXPANDING is set to TRUE or FALSE.

 One aspect of this algorithm deserves further comment: the handling of macro definitions within
macros (as illustrated in Fig 4.3).

The DEFINE procedure maintains a counter named LEVEL. Each time a MACRO directive is read, the value
of LEVEL is increased by 1.
Each time an MEND directive is read, the value of LEVEL is decreased by 1.
When LEVEL reaches 0, the MEND that corresponds to the original MACRO directive has been found.

 The above process is very much like matching left and right parentheses when scanning an
arithmetic expression.
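
The LEVEL counter can be sketched in Python as follows. This is only an illustration: the function `read_definition` and the sample source lines are invented for the example, and a line is assumed to be a MACRO or MEND directive when that word appears as a separate field.

```python
def read_definition(lines):
    """Collect one complete macro definition starting at lines[0].

    Returns (definition_lines, remaining_lines). Nested MACRO/MEND
    pairs are kept inside the outer definition.
    """
    level = 0
    for i, line in enumerate(lines):
        fields = line.split()
        if "MACRO" in fields:
            level += 1
        elif "MEND" in fields:
            level -= 1
            if level == 0:
                # This MEND matches the original MACRO directive.
                return lines[:i + 1], lines[i + 1:]
    raise ValueError("MEND missing for the outermost MACRO")


source = [
    "MACROS MACRO",
    "RDBUFF MACRO &INDEV,&BUFADR,&RECLTH",
    "       MEND",
    "WRBUFF MACRO &OUTDEV,&BUFADR,&RECLTH",
    "       MEND",
    "       MEND",
    "       LDA   ZERO",
]
definition, rest = read_definition(source)
```

The two inner MEND lines bring LEVEL back to 1 only, so the definition of MACROS ends at the third MEND and the inner definitions are stored as part of its body.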
