Module 1
Introduction, Assemblers, Macro Processors
System software                        | Application software
A set of programs that helps to run    | Used by the user to solve a problem,
the computer system                    | using the computer as a tool
Interacts directly with the hardware   | Does not interact directly with the hardware
Memory:
• Memory consists of 8-bit bytes. Any 3 consecutive bytes form a word (24 bits), addressed by
the location of its lowest-numbered byte.
• There are 32,768 bytes in memory in total.
Instruction formats:
Each instruction is 24 bits long, divided into an opcode, an index flag (x) and an address:
opcode   x       address
8 bits   1 bit   15 bits
Addressing modes:
There are two addressing modes, selected by the x bit of the instruction:
1. Direct addressing mode
2. Indexed addressing mode
S.No   Mode      Indication   Calculation of target address
1      Direct    x = 0        TA = address
2      Indexed   x = 1        TA = address + (X)
Instruction sets:
It includes instructions like:
1. Data movement instructions. Ex: LDA, LDX, STA, STX.
2. Arithmetic instructions. Ex: ADD, SUB, MUL, DIV.
These involve register A and a word in memory, with the result being left in the register.
3. Comparison instruction (COMP). It compares the value in register A with a word in memory
and sets the condition code.
4. Conditional jump instructions. Ex: JLT, JEQ, JGT. These test the condition code and jump
accordingly.
5. Subroutine linkage instructions. Ex: JSUB, RSUB.
Registers:
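S.No   Mnemonic   Number   Special use
1      A          0        Accumulator; used for arithmetic operations
2      X          1        Index register; used for addressing
3      L          2        Linkage register; JSUB stores the return address here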
Data formats:
• Integers are stored in 24-bit 2's complement format.
• Characters are stored in 8-bit ASCII format.
• Floating point numbers are stored in a 48-bit sign-exponent-fraction format:
s (sign)   e (exponent)   f (fraction)
1 bit      11 bits        36 bits
• The fraction is represented as a 36-bit number and has a value between 0 and 1.
• The exponent is represented as an 11-bit unsigned binary number between 0 and 2047.
• The sign of the floating point number is indicated by s: 0 = positive, 1 = negative.
• Therefore, the absolute value of the floating point number is f * 2^(e-1024).
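The floating point layout above can be checked with a small decoder. This is only a sketch: the function name decode_sicxe_float is made up for illustration, and the 48-bit value is assumed to be passed in as a plain integer.

```python
def decode_sicxe_float(bits: int) -> float:
    """Decode a 48-bit sign/exponent/fraction value given as an integer."""
    if not 0 <= bits < (1 << 48):
        raise ValueError("expected a 48-bit value")
    s = bits >> 47                              # 1-bit sign
    e = (bits >> 36) & 0x7FF                    # 11-bit unsigned exponent
    f = (bits & ((1 << 36) - 1)) / (1 << 36)    # 36-bit fraction, 0 <= f < 1
    magnitude = f * 2.0 ** (e - 1024)           # f * 2^(e-1024)
    return -magnitude if s else magnitude


# f = 0.5 (only the highest fraction bit set) and e = 1025 give 0.5 * 2^1
one = (1025 << 36) | (1 << 35)
print(decode_sicxe_float(one))
```

Setting the top bit of the same pattern flips the sign, since s occupies the leftmost of the 48 bits.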
Instruction formats:
There are 4 different instruction formats available:
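Format 1 (1 byte):
opcode
8 bits
Eg - FIX
Format 2 (2 bytes):
opcode   R1       R2
8 bits   4 bits   4 bits
Eg - COMPR A,S
Format 3 (3 bytes):
opcode   n   i   x   b   p   e   displacement
6 bits   1   1   1   1   1   1   12 bits
Format 4 (4 bytes):
opcode   n   i   x   b   p   e   address
6 bits   1   1   1   1   1   1   20 bits
Flag e:
e=0 use Format 3
e=1 use Format 4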
Instruction set:
SIC provides 26 instructions, SIC/XE provides an additional 33 instructions (59 total)
SIC/XE has 9 categories of instructions:
• Load/store registers (LDA, LDX, LDCH, STA, STX, STCH, etc.)
• Integer arithmetic operations (ADD, SUB, MUL, DIV); these use register A and a word in
memory, and the result is placed in register A
• compare (COMP) compares contents of register A with a word in memory and sets CC
(Condition Code) to <, >, or =
• conditional jumps (JLT, JEQ, JGT) - jumps according to setting of CC
• subroutine linkage (JSUB, RSUB) - jumps into/returns from subroutine using register L
• input & output control (RD, WD, TD) - see next section
• floating point arithmetic operations (ADDF, SUBF, MULF, DIVF)
• Register manipulation, operands-from-registers, and register-to-register arithmetic
(RMO, COMPR, SHIFTR, SHIFTL, ADDR, SUBR, MULR, DIVR, etc.)
Addressing modes:
It supports numerous addressing modes:
● Base-relative
● Base-relative indexed
● Program counter relative
● Program counter relative indexed
Depending on the 'n' and 'i' bits, the remaining addressing modes are determined as follows:
n i Addressing modes
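As an illustration of how the flag bits are read, the sketch below splits a Format 3 or Format 4 instruction into its fields. The function name and the returned dictionary are assumptions made for this example. The mapping from (n, i) to a mode name follows the usual SIC/XE convention and is not taken from the table above, whose rows are missing here.

```python
# (n, i) -> addressing mode, following the usual SIC/XE convention (assumed)
MODES = {(1, 1): "simple", (0, 1): "immediate",
         (1, 0): "indirect", (0, 0): "SIC-compatible"}


def decode_format34(instr: bytes) -> dict:
    """Split a 3-byte (Format 3) or 4-byte (Format 4) instruction into fields."""
    if len(instr) not in (3, 4):
        raise ValueError("expected 3 or 4 bytes")
    n = (instr[0] >> 1) & 1
    i = instr[0] & 1
    x = (instr[1] >> 7) & 1
    if (n, i) == (0, 0):
        # Standard SIC instruction: b, p, e are part of the 15-bit address
        b = p = e = 0
        operand = ((instr[1] & 0x7F) << 8) | instr[2]
    else:
        b = (instr[1] >> 6) & 1
        p = (instr[1] >> 5) & 1
        e = (instr[1] >> 4) & 1
        if e:
            if len(instr) != 4:
                raise ValueError("e=1 requires a 4-byte instruction")
            operand = ((instr[1] & 0x0F) << 16) | (instr[2] << 8) | instr[3]
        else:
            operand = ((instr[1] & 0x0F) << 8) | instr[2]
    return {"opcode": instr[0] & 0xFC, "n": n, "i": i, "x": x, "b": b,
            "p": p, "e": e, "format": 4 if e else 3,
            "mode": MODES[(n, i)], "operand": operand}


# +JSUB RDREC from the relocation example later in these notes
fields = decode_format34(bytes.fromhex("4B101036"))
print(fields["format"], fields["mode"], format(fields["operand"], "05X"))
```

For 4B101036 the e bit is 1, so the last 20 bits (01036) are taken as the address field.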
Assemblers
Example:
SIC assembler language program.
The program contains a main routine that reads records from an input
device and copies them to an output device.
This main routine calls subroutine RDREC to read a record into a buffer and
subroutine WRREC to write the record from the buffer to the output device.
Each subroutine must transfer the record one character at a time, because
the only I/O instructions available are RD and WD.
The buffer is necessary because the I/O rates of the two devices, such as a
disk and a slow printing terminal, may be very different.
The end of each record is marked with a null character. If a record is longer
than the length of the buffer (4096 bytes), only the first 4096 bytes are
copied.
The program does not deal with error recovery.
The end of the file to be copied is indicated by a zero-length record.
When the end of file is detected, the program writes EOF on the output
device and terminates by executing an RSUB instruction.
Prepared by: P.Kalamani Sri Sairam College of Engineering Anekal. Page | 7
Regulation – 2017(CBCS Scheme) System software & Compiler Design– 17CS63
LINE  LABEL   OPCODE  OPERAND    COMMENT
5     COPY    START   1000       COPY FILE FROM I/P TO O/P
10    FIRST   STL     RETADR     SAVE RETURN ADDRESS
15    CLOOP   JSUB    RDREC      READ I/P RECORD
20            LDA     LENGTH     TEST FOR EOF (LENGTH=0)
25            COMP    ZERO
30            JEQ     ENDFIL     EXIT IF EOF FOUND
35            JSUB    WRREC      WRITE O/P RECORD
40            J       CLOOP      LOOP
45    ENDFIL  LDA     EOF        INSERT END OF FILE MARKER
50            STA     BUFFER
55            LDA     THREE      SET LENGTH=3
60            STA     LENGTH
65            JSUB    WRREC      WRITE EOF
70            LDL     RETADR     GET RETURN ADDRESS
75            RSUB               RETURN TO CALLER
80    EOF     BYTE    C'EOF'
85    THREE   WORD    3
90    ZERO    WORD    0
100   RETADR  RESW    1
105   LENGTH  RESW    1
110   BUFFER  RESB    4096
LINE  LABEL   OPCODE  OPERAND    COMMENT
200   WRREC   LDX     ZERO       CLEAR LOOP COUNTER
210   WLOOP   TD      OUTPUT     TEST OUTPUT DEVICE
215           JEQ     WLOOP      LOOP UNTIL READY
220           LDCH    BUFFER,X   GET CHARACTER FROM BUFFER
225           WD      OUTPUT     WRITE CHARACTER
230           TIX     LENGTH     LOOP UNTIL ALL CHARACTERS
235           JLT     WLOOP        HAVE BEEN WRITTEN
240           RSUB               RETURN TO CALLER
245   OUTPUT  BYTE    X'05'      CODE FOR O/P DEVICE
250           END     FIRST
A Simple SIC Assembler:
o The instruction on line 10 (STL RETADR) contains a forward reference: a reference to
a label (RETADR) that is defined later in the program.
o Because of this, most assemblers make two passes over the source
program.
o The first pass does little more than scan the source program for label
definitions and assign addresses.
o The second pass performs most of the actual translation previously
described.
o In addition to translating the instructions of the source program, the
assembler must process statements called assembler directives or
pseudo-instructions.
o These statements are not translated into machine instructions. Instead, they
provide instructions to the assembler itself (examples: BYTE, WORD).
o In our example program:
START - Specifies the starting memory address for the object program.
END - Specifies the end of the program.
o Finally, the assembler must write the generated object code onto some output
device.
The header record contains the program name, starting address and length.
Header record:
col:1 H
col:2-7 Program name
col:8-13 Starting address of object program
col:14-19 Length of object program in bytes
The text records contain the translated instructions and data of the program, together with
an indication of the addresses where these are to be loaded.
Text record:
col:1 T
col:2-7 Starting address for object code in the record
col:8-9 Length of object code in this record in bytes
col:10-69 Object code, represented in hexadecimal
The end record marks the end of the object program and specifies the address in the program
where execution is to begin.
End record:
col:1 E
col:2-7 Address of first executable instruction in object program.
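The three record layouts above can be sketched as string formatters. The function names are hypothetical, and the object codes and length passed in the usage lines are illustrative values only, not output generated from the listing in these notes.

```python
def header_record(name: str, start: int, length: int) -> str:
    """col 1: H, col 2-7: name, col 8-13: start address, col 14-19: length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start: int, object_codes: list) -> str:
    """col 1: T, col 2-7: start, col 8-9: length in bytes, col 10-69: code."""
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a text record holds at most 60 hex digits (30 bytes)")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_instruction: int) -> str:
    """col 1: E, col 2-7: address of the first executable instruction."""
    return f"E{first_instruction:06X}"


# Illustrative values only
print(header_record("COPY", 0x1000, 0x107A))
print(text_record(0x1000, ["141033", "482039", "001036"]))
print(end_record(0x1000))
```

The program name is padded or truncated to exactly six columns so that the fixed column positions of the record are preserved.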
The purpose of the assembler is to generate object code. However, while translating an
instruction the assembler may not yet know the address of every label used as an operand, so
the work is divided between a pass 1 algorithm and a pass 2 algorithm.
Pass 1:
1. Assign addresses to all statements in the program.
2. Save the values assigned to all labels for use in pass 2.
3. Perform some processing of assembler directives.
Pass 2:
1. Assemble instructions.
2. Generate data values.
3. Perform processing of assembler directives not done during pass 1.
4. Write the object program and the assembly listing.
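A minimal sketch of pass 1 for standard SIC is given below, assuming every machine instruction occupies 3 bytes and each statement has already been split into (label, opcode, operand). The names pass1, byte_length and MAIN are chosen for this example, and a real assembler would also look opcodes up in an operation table and report errors. The sample data is the main routine of the copy program listed earlier.

```python
def byte_length(operand: str) -> int:
    """Length in bytes of a BYTE operand such as C'EOF' or X'05'."""
    kind, text = operand[0], operand[2:-1]
    if kind == "C":
        return len(text)
    if kind == "X":
        return len(text) // 2
    raise ValueError(f"bad BYTE operand: {operand}")


def pass1(statements):
    """Assign addresses and collect label values; return (SYMTAB, length)."""
    symtab = {}
    start = locctr = 0
    for label, opcode, operand in statements:
        if opcode == "START":
            start = locctr = int(operand, 16)
            continue
        if opcode == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr
        if opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += byte_length(operand)
        else:                      # WORD or any SIC instruction: 3 bytes
            locctr += 3
    return symtab, locctr - start


MAIN = [
    ("COPY", "START", "1000"), ("FIRST", "STL", "RETADR"),
    ("CLOOP", "JSUB", "RDREC"), ("", "LDA", "LENGTH"),
    ("", "COMP", "ZERO"), ("", "JEQ", "ENDFIL"),
    ("", "JSUB", "WRREC"), ("", "J", "CLOOP"),
    ("ENDFIL", "LDA", "EOF"), ("", "STA", "BUFFER"),
    ("", "LDA", "THREE"), ("", "STA", "LENGTH"),
    ("", "JSUB", "WRREC"), ("", "LDL", "RETADR"),
    ("", "RSUB", ""), ("EOF", "BYTE", "C'EOF'"),
    ("THREE", "WORD", "3"), ("ZERO", "WORD", "0"),
    ("RETADR", "RESW", "1"), ("LENGTH", "RESW", "1"),
    ("BUFFER", "RESB", "4096"),
]
symtab, length = pass1(MAIN)
print(f"THREE = {symtab['THREE']:04X}")
```

With the program starting at 1000, the sketch places THREE at 102D, which agrees with the address quoted in the Program Relocation discussion.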
Program Relocation
More than one program can share the memory and other resources of the
machine.
If we knew in advance which programs would execute concurrently, we could
assign addresses when the programs were assembled so that they would fit together
without overlap. In practice this is usually not possible.
So it is desirable to be able to load a program into memory wherever there is
space for it.
In such cases the actual starting address of the program is not known until load
time.
If the program is loaded beginning at location 1000, the value of the variable THREE
will be located at address 102D.
If the program is loaded starting at some other address, say 2000, address
102D will not contain the value of THREE.
So we have to make some changes in the address portion of the instruction
in order to retrieve the correct value.
Eg:
0006 CLOOP +JSUB RDREC 4B101036
.
.
1036 RDREC CLEAR X B410
Case 1:
The statement labelled RDREC is at memory location 1036 if the program is loaded
beginning at address 0000.
0000   .
       .
0006   4B101036   <-- +JSUB RDREC
       .
       .
1036   B410       <-- RDREC
Case 2:
The statement labelled RDREC is at memory location 6036 if the program is loaded
beginning at address 5000.
5000   .
       .
5006   4B106036   <-- +JSUB RDREC
       .
       .
6036   B410       <-- RDREC
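In both cases the address field of the JSUB instruction must hold the actual address of the
label RDREC, which depends on where the program is loaded.
The assembler does not know the actual location where the program will be loaded. However,
the assembler can identify for the loader those parts of the object program that need
modification.
An object program that contains the information needed to perform this kind of modification
is called a relocatable program.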
Steps for solving the relocation problem:
When the assembler generates the object code for the JSUB instruction, it
inserts the address of RDREC relative to the start of the program. (This is the reason we
initialized the location counter to 0 for the assembly.)
The assembler also produces a command for the loader, instructing it to
add the beginning address of the program to the address field in the JSUB instruction at load
time.
Modification record:
col:1 M
col:2-7 Starting location of the address field to be modified, relative to the beginning of
the program.
col:8-9 Length of the address field to be modified, in half bytes (1 half byte = 4 bits).
Relocation must be performed for instructions that use the extended format and contain an
actual address, so a modification record must be added for each of them.
Other lines in the program do not require modification, as they use PC-relative
or base-relative addressing.
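What the loader does with a modification record can be sketched as follows. The function name apply_modification and the use of a bytearray as the memory image are assumptions for this example. The data is the +JSUB RDREC instruction from Case 1 and Case 2, whose 20-bit address field starts in the second half of the byte at relative location 0007 and is 5 half bytes long.

```python
def apply_modification(image: bytearray, start: int, half_bytes: int,
                       load_addr: int) -> None:
    """Add load_addr to the address field described by one M record.

    start      -- location of the field relative to the program start
    half_bytes -- field length in half bytes; an odd length means the field
                  begins in the low-order half of the first byte
    """
    n_bytes = (half_bytes + 1) // 2
    field = int.from_bytes(image[start:start + n_bytes], "big")
    mask = (1 << (4 * half_bytes)) - 1          # bits that belong to the field
    relocated = ((field & mask) + load_addr) & mask
    field = (field & ~mask) | relocated         # keep bits outside the field
    image[start:start + n_bytes] = field.to_bytes(n_bytes, "big")


# Program assembled relative to 0000, with +JSUB RDREC at location 0006
image = bytearray(0x1040)
image[6:10] = bytes.fromhex("4B101036")

# Modification record M 000007 05, program loaded at address 5000
apply_modification(image, start=0x0007, half_bytes=5, load_addr=0x5000)
print(image[6:10].hex().upper())
```

After the call the instruction reads 4B106036, the value shown for Case 2, while the flag bits in the upper half of the byte at 0007 are left untouched.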
Machine Independent Assembler Features
These are features that are commonly found in implementations of this type of software and
that are relatively machine independent.
Literals
With immediate addressing, the operand value is assembled as part of the machine instruction.
With a literal, the assembler must generate the value as a constant at some other memory
location. The address of that constant is used as the target address of the instruction.
Literal pool
All the literal operands used in a program are gathered together into one or more literal
pools. Normally the literal pool is placed at the end of the program.
LTORG --> Assembler directive
When the assembler encounters LTORG, it immediately creates a literal pool containing
all the literal operands used since the previous LTORG (or since the beginning of the
program).
A literal that has already been placed in a pool by LTORG is not repeated in the pool at the
end of the program.
In some programs LTORG is placed in the middle of the program, because otherwise all the
literals would be placed in the pool at the end of the program.
For example, if a literal is used near the beginning of a program of 300 lines, the literal
pool that holds it would still start at the end of the program.
The operand would then be far away from the instruction that references it, which is
wasteful. So a program may use as many LTORG statements as are needed to keep literals
close to the instructions that use them.
Most assemblers do not store duplicate literals in the literal pool. The same literal may be
used in more than one place in the program, but only one copy of the specified data value is
stored in the pool.
Before allocating space for a literal in the pool, the assembler checks whether the
same literal is already there, by comparing the character string of the new literal with the
literals already in the pool.
For example, the same literal name may be used more than once while standing for different
values at different points in the program. Because duplicate literals are stored only once,
such a literal would appear just once in the pool, and one of the references would then get
the wrong value.
The basic data structure used to handle literals is the literal table (LITTAB). For each
literal used, the literal table contains:
-Literal name
-The operand value and length
-Address assigned to the operand
During pass 1 the assembler searches LITTAB for each literal name it encounters. If the
literal is already present, no action is needed. If it is not, the literal is added to the
literal table.
During pass 2 the assembler searches LITTAB for the address of each literal operand, for
use in generating the object code.
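The literal table handling described above can be sketched as a small class. The class and method names are made up for this example, literals are assumed to be written as =C'...' or =X'...', and the starting address passed to place_pool is arbitrary.

```python
def literal_bytes(name: str) -> bytes:
    """Value of a literal written as =C'EOF' or =X'05'."""
    kind, text = name[1], name[3:-1]
    if kind == "C":
        return text.encode("ascii")
    if kind == "X":
        return bytes.fromhex(text)
    raise ValueError(f"unsupported literal: {name}")


class LitTab:
    def __init__(self):
        # literal name -> value, length and assigned address
        self.entries = {}

    def reference(self, name: str) -> None:
        """Pass 1: add the literal only if it is not already in the table."""
        if name not in self.entries:
            value = literal_bytes(name)
            self.entries[name] = {"value": value, "length": len(value),
                                  "address": None}

    def place_pool(self, locctr: int) -> int:
        """At LTORG or END: assign addresses to literals not yet placed."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr

    def address_of(self, name: str) -> int:
        """Pass 2: look up the address to use as the operand address."""
        return self.entries[name]["address"]


littab = LitTab()
for literal in ("=C'EOF'", "=X'05'", "=C'EOF'"):   # =C'EOF' is used twice
    littab.reference(literal)
next_free = littab.place_pool(0x002D)
print(len(littab.entries), format(littab.address_of("=X'05'"), "04X"))
```

Because the table is keyed by the literal's character string, the second use of =C'EOF' does not create a second copy in the pool.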
Eg:
A EQU 0
X EQU 1
L EQU 2

BASE EQU R1
COUNT EQU R2
INDEX EQU R3
These statements specify a 1-byte literal with the hexadecimal value 05. The
notation used for literals varies from assembler to assembler.
It is important to understand the difference between a literal and an
immediate operand. With immediate addressing, the operand value is assembled as part of the
machine instruction.
With a literal, the assembler generates the specified value as a constant at
some other memory location.
BASE *
LDB =*
Another assembler directive is ORG. It is used to assign values to symbols
indirectly.
Its form is ORG value, where value is a constant or an expression involving constants and
previously defined symbols.
SYMBOL RESB 6
VALUE RESB 1
FLAGS RESB 2
ORG STAB+1100
The first ORG resets the location counter to the value of STAB. The label on the following
RESB statement defines SYMBOL to have the current value in LOCCTR, i.e. the same address
that is assigned to STAB.
Expressions:
Our previous examples of assembler language statements have used single
terms (labels, literals, etc.) as instruction operands.
Most assemblers allow an expression wherever a single operand is
permitted.
Such an expression is evaluated by the assembler and the result is used as the
operand.
Arithmetic expressions are allowed and are formed according to the normal rules
using the operators +, -, * and /.
When an ORG statement is encountered during assembly of a program, the assembler
resets its location counter (LOCCTR) to the specified value. With it we can define a symbol
table with the following structure:
SYMBOL VALUE FLAGS
Expressions are classified according to the type of value they produce:
*Absolute expression
*Relative expression
An expression that contains only absolute terms is an absolute
expression.
Relative terms may be used in an expression under the following conditions:
*The relative terms are paired, each with another relative term of opposite sign.
*A single remaining unpaired relative term, if any, must have a positive sign.
*No relative term may enter into a multiplication or division operation.
Expressions that are neither absolute nor relative are flagged by the
assembler as errors.
When all the relative terms are paired with opposite signs, the
result is an absolute value. For example:
MAXLEN EQU BUFEND-BUFFER
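The pairing rule can be checked mechanically by counting relative terms with their signs. The sketch below handles only + and -, which is where relative terms are allowed. The function name classify and the symbol table layout (name mapped to a value and a relative flag) are assumptions, and the symbol values are illustrative.

```python
import re


def classify(expr: str, symtab: dict):
    """Return (value, kind) for an expression using only + and -.

    symtab maps a symbol name to (value, is_relative).
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[+-]", expr)
    value = 0
    net_relative = 0            # relative terms counted with their signs
    sign = 1
    for tok in tokens:
        if tok in ("+", "-"):
            sign = 1 if tok == "+" else -1
            continue
        if tok.isdigit():
            term, relative = int(tok), False
        else:
            term, relative = symtab[tok]
        value += sign * term
        if relative:
            net_relative += sign
        sign = 1
    if net_relative == 0:
        return value, "absolute"    # every relative term is paired
    if net_relative == 1:
        return value, "relative"    # one unpaired term with positive sign
    raise ValueError(f"neither absolute nor relative: {expr}")


# Illustrative addresses for the two labels
symtab = {"BUFFER": (0x0036, True), "BUFEND": (0x1036, True)}
print(classify("BUFEND-BUFFER", symtab))
```

BUFEND-BUFFER pairs two relative terms with opposite signs, so the result is absolute: the difference does not change when the program is relocated.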
Program Blocks
Normally the source program is treated as a unit that contains
subroutines, data areas, etc.
The assembler processes the program and produces a single unit of object
code.
Some assembler features allow the generated machine instructions and data
to appear in the object program in a different order from the corresponding source
statements.
Other features create several independent parts of the object program. These parts
maintain their identity and are handled separately by the loader.
We use the term program blocks to refer to segments of code that are rearranged
within a single object program unit, and control sections to refer to segments that are
translated into independent object program units.
Each program block may actually contain several separate segments of the
source program.
In this case three blocks are used. The first (unnamed) program block contains the
executable instructions of the program.
The second block (CDATA) contains all data areas that are small in
length.
The third block (CBLKS) contains all data areas that consist of larger blocks of
memory.
The assembler directive USE indicates which portions of the source
program belong to the various blocks.
At the beginning of the program, statements are assumed to be part of the
unnamed (default) block. If no USE statements are included, the entire program belongs to this
single block. The assembler will rearrange these segments to gather together the pieces of
each block. These blocks will then be assigned addresses in the object program, with the blocks
appearing in the same order in which they were first begun in the source program.
The assembler accomplishes this logical rearrangement of code by
maintaining, during pass 1, a separate location counter for each program block. The location
counter for a block is initialized to 0 when the block is first begun. The current value of this
location counter is saved when switching to another block, and the saved value is restored
when resuming a previous block.
During pass 1 each label in the program is assigned an address that is
relative to the start of the block that contains it.
When labels are entered into the symbol table, the block name or number is
stored along with the assigned relative address. At the end of pass 1 the latest value of the
location counter for each block indicates the length of that block. The assembler can then
assign to each block a starting address in the object program. For code generation during pass
2, the assembler needs the address of each symbol relative to the start of the object program;
this is found by adding the symbol's relative address to the starting address of its block.
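The bookkeeping for program blocks can be sketched as follows. This is a simplified illustration: the function name, the tuple layout (label, opcode, operand, size in bytes) and the tiny sample program are all assumptions, with the unnamed default block represented by an empty string.

```python
def pass1_blocks(statements):
    """Keep one location counter per block, then lay the blocks out in order.

    Returns (block_length, block_start, absolute_address_of_each_symbol).
    """
    locctr = {"": 0}            # block name -> current location counter
    order = [""]                # blocks in the order they were first begun
    symtab = {}                 # label -> (block, address relative to block)
    current = ""
    for label, opcode, operand, size in statements:
        if opcode == "USE":
            current = operand   # switch blocks; the old counter stays saved
            if current not in locctr:
                locctr[current] = 0
                order.append(current)
            continue
        if label:
            symtab[label] = (current, locctr[current])
        locctr[current] += size
    # End of pass 1: each counter now holds the length of its block
    start, address = {}, 0
    for name in order:
        start[name] = address
        address += locctr[name]
    absolute = {sym: start[block] + rel for sym, (block, rel) in symtab.items()}
    return locctr, start, absolute


PROGRAM = [
    ("FIRST", "STL", "RETADR", 3),
    ("", "USE", "CDATA", 0),
    ("RETADR", "RESW", "1", 3),
    ("", "USE", "", 0),
    ("NEXT", "LDA", "LENGTH", 3),
    ("", "USE", "CDATA", 0),
    ("LENGTH", "RESW", "1", 3),
]
lengths, starts, addresses = pass1_blocks(PROGRAM)
print(lengths, starts, addresses)
```

In the sample, the two CDATA segments are gathered together after the default block even though they are interleaved with code in the source.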
Symbols that are defined in one control section may not be used directly by
another section; they must be identified as external references for the loader to handle.
EXTDEF - external definition
EXTREF - external reference
The two new record types are Define and Refer. A Define record gives
information about external symbols that are defined in this control section. A Refer record
lists symbols that are used as external references by the control section.
DEFINE RECORD:
COL 1 :D
REFER RECORD:
COL 1 :R
COL 2-7 :Name of external symbol.
COL 8-13 :Name of the other external reference symbols.
MODIFICATION RECORD:
COL 1 :M
COL 2-7 :Starting address of the field to be modified.
COL 8-9 :Length of the field to be modified, in half bytes.
COL 10 :Modification flag (+ or -).
COL 11-16 :External symbol whose value is to be added to or
subtracted from the indicated field.
When the definition of the symbol is encountered, the assembler scans the reference list and
inserts the address.
At the end of the program, the assembler reports an error if there are still SYMTAB entries
marked as undefined symbols.
Multi-Pass Assemblers:
For a two-pass assembler, forward references in symbol definitions are not allowed.
Implementation:
For a forward reference in a symbol definition, we store in SYMTAB:
-The symbol name.
-The number of undefined symbols in the defining expression.
Each undefined symbol (marked with a flag *) is associated with a list of the symbols that
depend on it.
When a symbol is defined, we can recursively evaluate the symbol expressions depending on the
newly defined symbol.
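The scheme above can be sketched with three dictionaries. Everything here is an illustrative assumption: the class name, the decimal sample values, and the tiny evaluator, which handles only +, -, * and / applied left to right without operator precedence.

```python
import re


class MultiPassSymtab:
    def __init__(self):
        self.values = {}    # fully defined symbols
        self.pending = {}   # symbol -> (expression, undefined symbols in it)
        self.waiting = {}   # undefined symbol -> symbols that depend on it

    def define(self, name: str, expr: str) -> None:
        missing = {s for s in re.findall(r"[A-Za-z_]\w*", expr)
                   if s not in self.values}
        if not missing:
            self._resolve(name, expr)
            return
        self.pending[name] = (expr, missing)
        for symbol in missing:
            self.waiting.setdefault(symbol, []).append(name)

    def _resolve(self, name: str, expr: str) -> None:
        self.values[name] = self._evaluate(expr)
        # Recursively evaluate every symbol that was waiting for this one
        for dependent in self.waiting.pop(name, []):
            dep_expr, missing = self.pending[dependent]
            missing.discard(name)
            if not missing:
                del self.pending[dependent]
                self._resolve(dependent, dep_expr)

    def _evaluate(self, expr: str) -> int:
        tokens = re.findall(r"[A-Za-z_]\w*|\d+|[-+*/]", expr)

        def operand(tok):
            return int(tok) if tok.isdigit() else self.values[tok]

        result = operand(tokens[0])
        for op, tok in zip(tokens[1::2], tokens[2::2]):
            right = operand(tok)
            if op == "+":
                result += right
            elif op == "-":
                result -= right
            elif op == "*":
                result *= right
            else:
                result //= right
        return result


table = MultiPassSymtab()
table.define("HALFSZ", "MAXLEN/2")        # MAXLEN not yet defined
table.define("MAXLEN", "BUFEND-BUFFER")   # neither operand defined yet
table.define("PREVBT", "BUFFER-1")
table.define("BUFFER", "4148")            # resolves PREVBT
table.define("BUFEND", "8244")            # resolves MAXLEN, then HALFSZ
print(table.values)
```

Any symbols still left in pending at the end of the program would correspond to undefined symbols that the assembler has to report.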
IMPLEMENTATION EXAMPLE
MASM assembler
SPARC assembler
MASM assembler
The MASM assembler is written for Pentium and other x86 systems. Since an x86 system views
memory as a collection of segments, a MASM assembler language program is written as a
collection of segments. Each segment is defined as belonging to a particular class. Commonly
used classes are CODE, DATA, CONST and STACK. During program execution, segments are
addressed via the x86 segment registers:
-Code segments are addressed using register CS.
-Stack segments are addressed using register SS.
-Data segments are addressed using DS, ES, FS or GS.
Jump instructions are assembled in two different ways:
Near jump and Far jump
Macro Processors
A macro represents a commonly used group of statements in the source programming
language. The macro processor replaces each macro instruction with the
corresponding group of source language statements. This is called expanding the
macros.
For example, suppose that it is necessary to save the contents of all registers before
calling a subprogram.
On SIC/XE, this would require a sequence of seven instructions (STA, STB, etc.).
Using a macro instruction, the programmer could simply write one statement like
SAVEREGS. This macro instruction would be expanded into the seven assembler language
instructions needed to save the register contents.
Fig 4.1 shows an example of a SIC/XE program using macro instructions. The
definitions of these macro instructions (RDBUFF and WRBUFF) appear in the source
program following the START statement.
Two new assembler directives (MACRO and MEND) are used in macro definitions.
The first MACRO statement (line 10) identifies the beginning of a macro definition.
The symbol in the label field (RDBUFF) is the name of the macro, and the entries in the operand
field identify the parameters of the macro instruction.
In our macro language, each parameter begins with the character &, which facilitates the
substitution of parameters during macro expansion.
The macro name and parameters define a pattern or prototype for the macro
instructions used by the programmer.
Following the MACRO directive are the statements that make up the body of the macro
definition.
The MEND assembler directive marks the end of the macro definition.
Fig 4.2 shows the output that would be generated. Each macro invocation statement has
been expanded into the statements that form the body of the macro, with the arguments
from the macro invocation substituted for the parameters in the macro prototype.
For example, in expanding the macro invocation on line 190, the argument F1 is substituted
for the parameter &INDEV wherever it occurs in the body of the macro.
Similarly, BUFFER is substituted for &BUFADR, and LENGTH is substituted for &RECLTH.
The comment lines within the macro body have been deleted. Note that the macro
invocation statement itself has been included as a comment line. This serves as
documentation of the statement written by the programmer.
The label on the macro invocation statement (CLOOP) has been retained as a label on the
first statement generated in the macro expansion.
This allows the programmer to use a macro instruction in exactly the same way as an
assembler language mnemonic.
Note that the two invocations of WRBUFF specify different arguments, so they produce
different expansions.
After macro processing, the expanded file (Fig 4.2) can be used as input to the assembler.
In general, the statements that form the expansion of a macro are generated (and
assembled) each time the macro is invoked (see Fig 4.2). Statements in a subroutine
appear only once, regardless of how many times the subroutine is called (see Fig 2.5).
Approach 1: It is easy to design a two-pass macro processor in which all macro definitions
are processed during the first pass, and all macro invocation statements are expanded
during the second pass.
However, such a two-pass macro processor would not allow the body of one macro instruction
to contain definitions of other macros (because all macros would have to be defined during the
first pass before any macro invocations were expanded).
Approach 2: A one-pass macro processor that can alternate between macro definition and
macro expansion is able to handle macros like those in Fig 4.3.
Because of the one-pass structure, the definition of a macro must appear in the source
program before any statements that invoke that macro.
There are three main data structures involved in our
macro processor.
The macro definitions themselves are stored in a definition table (DEFTAB), which
contains the macro prototype and the statements that make up the macro body (with a
few modifications). Comment lines from the macro definition are not entered into DEFTAB.
The second data structure is a name table (NAMTAB), which holds the macro names and
identifies the beginning and end of each definition in DEFTAB.
The third data structure is an argument table (ARGTAB), which is used during the
expansion of macro invocations.
When a macro invocation statement is recognized, the arguments are stored in ARGTAB
according to their position in the argument list.
As the macro is expanded, arguments from ARGTAB are substituted for the corresponding
parameters in the macro body.
Fig 4.4 shows portions of the contents of these tables during the processing of program
in Fig 4.1.
Fig 4.4(a) shows the definition of RDBUFF stored in DEFTAB, with an entry in NAMTAB identifying the
beginning and end of the definition.
Note the positional notation that has been used for the parameters: &INDEV -> ?1 (indicating the
first parameter in the prototype), &BUFADR -> ?2, etc.
Fig 4.4(b) shows ARGTAB as it would appear during expansion of the RDBUFF statement on line 190.
In this case (this invocation), the first argument is F1, the second is BUFFER, etc.
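How the three tables work together can be sketched as follows. The function names and the three-line macro body are invented for illustration (the real RDBUFF body is in Fig 4.1, which is not reproduced here), and lists and dictionaries stand in for the tables.

```python
import re


def define_macro(name, params, body, deftab, namtab):
    """Enter a definition into DEFTAB using ?n notation and index it in NAMTAB."""
    position = {param: n for n, param in enumerate(params, start=1)}

    def to_positional(match):
        param = match.group()
        return f"?{position[param]}" if param in position else param

    start = len(deftab)
    deftab.append(f"{name} {','.join(params)}")          # macro prototype
    for line in body:
        deftab.append(re.sub(r"&\w+", to_positional, line))
    deftab.append("MEND")
    namtab[name] = (start, len(deftab) - 1)              # begin and end pointers


def expand(name, args, deftab, namtab):
    """Store the arguments in ARGTAB by position and substitute them."""
    start, end = namtab[name]
    argtab = list(args)
    return [re.sub(r"\?(\d+)", lambda m: argtab[int(m.group(1)) - 1], line)
            for line in deftab[start + 1:end]]


deftab, namtab = [], {}
define_macro("RDBUFF", ["&INDEV", "&BUFADR", "&RECLTH"],
             ["    TD   =X'&INDEV'",       # hypothetical body lines
              "    STCH &BUFADR,X",
              "    STX  &RECLTH"],
             deftab, namtab)
for line in expand("RDBUFF", ["F1", "BUFFER", "LENGTH"], deftab, namtab):
    print(line)
```

Converting parameters to ?1, ?2, ... at definition time means that expansion only needs an index into ARGTAB instead of a search by parameter name.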
The procedure DEFINE, which is called when the beginning of a macro definition is recognized,
makes the appropriate entries in DEFTAB and NAMTAB.
EXPAND is called to set up the argument values in ARGTAB and expand a macro invocation statement.
The procedure GETLINE, which is called at several points in the algorithm, gets the next line to be
processed. This line may come from DEFTAB (the next line of a macro being expanded), or from the
input file, depending on whether the Boolean variable EXPANDING is set to TRUE or FALSE.
One aspect of this algorithm deserves further comment: the handling of macro definitions within
macros (as illustrated in Fig 4.3).
The DEFINE procedure maintains a counter named LEVEL. Each time a MACRO directive is read, the value
of LEVEL is increased by 1.
Each time an MEND directive is read, the value of LEVEL is decreased by 1.
When LEVEL reaches 0, the MEND that corresponds to the original MACRO directive has been found.
The above process is very much like matching left and right parentheses when scanning an
arithmetic expression.
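The LEVEL counter can be sketched as below. The function name read_macro_body is hypothetical, the MACRO line that starts the outer definition is assumed to have been read already, and an opcode is recognized simply as a whitespace-separated field, which is a simplification.

```python
def read_macro_body(lines):
    """Collect lines up to the MEND that matches the outer MACRO directive.

    `lines` is an iterator positioned just after the outer MACRO line.
    """
    level = 1                       # the outer MACRO has already been read
    body = []
    for line in lines:
        fields = line.split()
        if "MACRO" in fields:
            level += 1              # a nested definition begins
        elif "MEND" in fields:
            level -= 1
        if level == 0:              # this MEND closes the outer definition
            break
        body.append(line)
    return body


source = iter([
    "RDBUFF  MACRO  &INDEV",        # nested definition 1
    "        CLEAR  X",
    "        MEND",
    "WRBUFF  MACRO  &OUTDEV",       # nested definition 2
    "        LDX    ZERO",
    "        MEND",
    "        MEND",                 # closes the outer macro
    "NEXT    LDA    ZERO",          # ordinary source line after the macro
])
body = read_macro_body(source)
print(len(body), "lines in the outer macro body")
```

The two inner MEND lines are kept as part of the body, and only the MEND that brings LEVEL back to 0 ends the outer definition.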