Assembly Language: by - Prof. Prithi K.S

Assembly language is a low-level programming language that is directly influenced by a processor's instruction set and architecture. It uses mnemonics instead of binary and must be translated into machine language by an assembler. Assembly language offers better efficiency in terms of space and time compared to high-level languages but is more difficult to program, maintain, and port. It is still used for applications requiring direct hardware access or optimal performance.

Uploaded by

prithiks
Copyright
© Attribution Non-Commercial (BY-NC)
ASSEMBLY LANGUAGE

BY – PROF. PRITHI K.S.


Programmer’s View

A programmer’s view of a computer system depends on the type and
level of language they intend to use. From the programmer’s viewpoint, there
exists a hierarchy from low-level languages to high-level languages. As we move
up in this hierarchy, the level of abstraction increases.
At the lowest level, we have the machine language that is the native
language of the machine. This is the language understood by the machine
hardware. Since digital computers use 0 and 1 as their alphabet, machine
language naturally uses 1s and 0s to encode the instructions. One level up is
assembly language, as shown in the figure.
Assembly language does not use 1s and 0s; instead, it uses mnemonics
to express the instructions. Assembly language is a close relative of the machine
language.
A programmer’s view of a computer system.
Assembly Language

Assembly language is a low-level programming language that is directly
influenced by the instruction set and architecture of a processor. Assembly
language code must be translated into the processor’s machine language. This
translation is done by a piece of software called the assembler.
MASM (Microsoft Macro Assembler), TASM (Borland Turbo Assembler), and
NASM (Netwide Assembler) are some of the popular assemblers available for
the Intel processors. Compared to assembly language, high-level languages
(HLLs) such as C offer several advantages in program development, such as:
• Faster program development;
• Easier program maintenance;
• Portability.
There are two main reasons why programming is still done in assembly
language:
• Efficiency;
• Accessibility to system hardware.
Translation Of Language
Why Program in Assembly Language?

Assembly language programs tend to be lengthy and take more
time to code and debug. As a result, they are also difficult to maintain.
Also, assembly language programs are written for a particular system and
cannot be used on a different system.
Despite these disadvantages, some programming is still done in
assembly language. There are two main reasons for this: efficiency (Space
and Time) and accessibility to system hardware. Efficiency refers to how
“good” a program is in achieving a given objective.
Space-efficiency

Space-efficiency refers to the memory requirements of a
program (i.e., the size of the code). Program A is said to be
more space-efficient if it takes less memory space than
program B to perform the same task. Very often, programs
written in an assembly language tend to generate more
compact executable code than the corresponding high-level
language version.
Time-efficiency

Time-efficiency refers to the time taken to execute a
program. Clearly, a program that runs faster is said to be better
from the time-efficiency point of view. Programs written in
assembly language tend to run faster than those written in a
high-level language. However, compiler-generated code sometimes
executes faster than handcrafted assembly language code!
Registers

The Pentium has ten 32-bit and six 16-bit registers. These
registers are grouped into general, control, and segment registers.
The general registers are further grouped into data, pointer, and
index registers.
Data Registers

There are four 32-bit data registers that can be used for
arithmetic, logical, and other operations. These four
registers are unique in that they can be used as follows:
• Four 32-bit registers (EAX, EBX, ECX, EDX); or
• Four 16-bit registers (AX, BX, CX, DX); or
• Eight 8-bit registers (AH, AL, BH, BL, CH, CL, DH, DL).
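The way the 32-, 16-, and 8-bit names overlap can be illustrated with a small Python model. This is only a sketch: the function name register_views is hypothetical, and it assumes that AX is the low half of EAX and that AH and AL are the high and low bytes of AX.

```python
def register_views(eax: int) -> dict:
    """Return the 32-, 16- and 8-bit views of one data register value."""
    eax &= 0xFFFFFFFF      # keep only 32 bits
    ax = eax & 0xFFFF      # AX is the low 16 bits of EAX
    ah = (ax >> 8) & 0xFF  # AH is the upper byte of AX
    al = ax & 0xFF         # AL is the lower byte of AX
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}


views = register_views(0x12345678)
print({name: hex(value) for name, value in views.items()})
```

Writing to AL or AH therefore changes part of AX and EAX as well, because all of these names refer to the same physical register.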
Pointer and Index Registers

This group contains four 32-bit registers: the index registers ESI and
EDI, and the pointer registers ESP and EBP.
These registers can be used either as 16- or 32-bit registers. The two
index registers play a special role in string processing instructions.
In addition, they can be used as general-purpose data registers.
The pointer registers can also be used as general-purpose data
registers, but they are almost exclusively used to maintain the stack.
Control Registers

This group of registers consists of two 32-bit registers: the
instruction pointer register and the flags register. The processor
uses the instruction pointer register to keep track of the location of
the next instruction to be executed. The instruction pointer can be
used either as a 16-bit register (IP), or as a 32-bit register (EIP). IP
is used for 16-bit addresses and EIP for 32-bit addresses.
When an instruction is fetched from memory, the instruction
pointer is updated to point to the next instruction. This register is
also modified during the execution of an instruction that transfers
control to another location in the program (such as a jump,
procedure call, or interrupt).
Flags Register

The flags register can be considered as either a 16-bit
FLAGS register, or a 32-bit EFLAGS register. The FLAGS
register is useful in executing 8086 processor code. The EFLAGS
register consists of 6 status flags, 1 control flag, and 10 system
flags, as shown in the figure. Bits of this
register can be set (1) or cleared (0).


Segment Registers

The six 16-bit segment registers of the Pentium are shown in Figure.
These registers support the segmented memory organization of the Pentium. In
this organization, memory is partitioned into segments, where each segment is a
small part of the memory.
The processor, at any point in time, can only access up to six segments
of the main memory. The six segment registers point to where these segments
are located in the memory.
The program is logically divided into two parts: a code part that
contains only the instructions, and a data part that keeps only the data. The code
segment (CS) register points to where your instructions are stored in the main
memory, and the data segment (DS) register points to your data segment
location. The stack segment (SS) register points to the program’s stack segment.
The last three segment registers—ES, GS, and FS—are additional segment
registers that can be used in a similar way as the other segment registers.
Instruction Formats

Processors use two types of basic instruction format: fixed-length
or variable-length instructions. In the fixed-length encoding, all
(or most) instructions are the same size. In the variable-length
encoding, the length of the instructions varies quite a bit.
Typically, RISC processors use fixed-length instructions, and
CISC designs use variable-length instructions. 32-bit RISC
processors such as the SPARC, MIPS, and PowerPC use instructions
that are 32 bits wide.

The Intel Itanium, which is a 64-bit processor, uses fixed-length,
41-bit wide instructions. The size of the instruction depends
on the number of addresses and whether these addresses identify
registers or memory locations. The following figure shows how the size
of the instruction varies with the number of addresses when all
operands are located in registers. This format assumes that eight
bits are reserved for the operation code (opcode).
Thus we can have 256 different instructions. Each operand
address is five bits long, which means we can have 32 registers.
This is the case in processors like the MIPS. The Itanium, for
example, uses seven bits as it has 128 registers.
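The arithmetic behind these numbers can be checked with a short Python sketch. It models only the simplified register-only format described here (an 8-bit opcode followed by one register field per address). The function names are hypothetical, and the result is not the actual encoding of any particular processor.

```python
def register_bits(num_registers: int) -> int:
    """Bits needed to identify one of num_registers registers."""
    return (num_registers - 1).bit_length()


def instruction_bits(num_addresses: int, num_registers: int,
                     opcode_bits: int = 8) -> int:
    """Size of a register-only instruction in the simplified format."""
    return opcode_bits + num_addresses * register_bits(num_registers)


print("opcodes available:", 2 ** 8)
print("bits per register field (32 registers):", register_bits(32))
print("bits per register field (128 registers):", register_bits(128))
for n in range(4):
    print(n, "address format:", instruction_bits(n, 32), "bits")
```

With 32 registers, a three-address instruction in this format needs 8 + 3 × 5 = 23 bits.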
Addressing Modes

Addressing mode refers to how the operands are specified.
Operands can be in one of three places: in a register, in memory, or
part of the instruction as a constant. Specifying a constant as an
operand is called the immediate addressing mode. Similarly,
specifying an operand that is in a register is called the register
addressing mode. All processors support these two addressing
modes.

The difference between the RISC and CISC processors is in
how they specify the operands in memory. RISC processors follow
the load/store architecture. Instructions other than load and store
expect their operands in registers or specified as constants. Thus,
these instructions use register and immediate addressing modes.
Memory-based operands are used only in the load and store
instructions. In contrast, CISC processors allow memory-based
operands in most instructions, not only in loads and stores.

Most assembly language instructions require specification
of operands. The Pentium assembly language provides several ways
to specify the operands. These are called addressing modes. An
operand may be in one of the following locations:
• In a register internal to the CPU;
• In the instruction itself;
• In main memory (usually in the data segment);
• At an I/O port

Specification of an operand that is in a register can be done
by register addressing mode, whereas immediate addressing mode
refers to specifying an operand that is part of the instruction. We
will describe two basic addressing modes to specify an operand
located in memory. These addressing modes are called memory
addressing modes.
Register Addressing Mode

In this addressing mode, CPU registers contain the operands. For example, the
instruction
mov EAX,EBX
requires two operands and both are in the CPU registers. The syntax of the move
(mov) instruction is
mov destination,source
The mov instruction copies the contents of source to destination. The contents of
source, however, are not destroyed. Thus,
mov EAX,EBX
copies the contents of the EBX register into the EAX register. In this example,
mov is operating on 32-bit data. However, we can also use the mov instruction to
copy 16- and 8-bit data, as shown in the following examples:
mov BX,CX
mov AL,CL

The register addressing mode is the most efficient
way of specifying operands for two reasons:
• The operands are in the registers and no memory access
is required.
• Instructions using the register mode tend to be shorter,
as only 3 bits are needed to identify a register. In contrast,
we need at least 16 bits to identify a memory location.
Immediate Addressing Mode

In this addressing mode, data are specified as part of the instruction.
As a result, even though the data are in memory, they are located in the code
segment, not in the data segment. This addressing mode is typically used to
specify a constant, either directly or via the EQU directive. In the example
mov AL,75
the source operand 75 is specified in the immediate addressing
mode and the destination operand is specified in the register addressing
mode. Such instructions are said to use mixed mode addressing. Immediate
addressing mode is also faster because the operand is fetched into the
instruction queue along with the instruction during the instruction fetch
cycle. This prefetch, therefore, reduces the time required to get the operand
from memory.
Direct Addressing Mode

Operands specified in a memory addressing mode require access to
the main memory, usually to the data segment. As a result, they tend to be
slower than either of the last two addressing modes. To locate a
data item in a data segment, we need to specify two components: the data
segment base address and an offset value within the segment. The offset
is sometimes referred to as the effective address. The start address
of the segment is typically found in the DS register. Thus, various memory
addressing modes differ in the way the offset is specified.
In direct addressing mode, the offset is specified directly as part of
the instruction. In assembly language programs, the value is usually
indicated by the variable name of the data item referenced. The assembler
will translate the name to its offset value during the assembly process using
the symbol table.
Indirect Addressing Mode

Direct addressing can be used to access simple variables. The main
drawback of this addressing mode is that it is not useful for accessing complex
data structures such as arrays and records that are used in high-level languages.
For example, it is not useful for accessing the second element of table1, as in

table1[1] = 99

The indirect addressing mode remedies this deficiency. In this addressing
mode, the offset of the data is in one of the general registers. For this reason, this
addressing mode is sometimes called the register indirect addressing mode.
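The difference between the two memory addressing modes can be sketched in Python. This is a toy model, not Pentium code: the data segment is a flat bytearray, the symbol table layout (count at offset 0, table1 at offset 4) is an assumption made for the example, and table1 is assumed to hold byte-sized elements.

```python
# Toy data segment and a hypothetical symbol table (name -> offset).
data_segment = bytearray(16)
symbol_table = {"count": 0, "table1": 4}


def store_byte(offset: int, value: int) -> None:
    """Write one byte at the given offset within the data segment."""
    data_segment[offset] = value & 0xFF


# Direct addressing: the offset is a fixed part of the instruction,
# looked up once from the symbol table at assembly time.
store_byte(symbol_table["count"], 7)

# Register indirect addressing: the offset lives in a register, so it
# can be computed or stepped at run time to reach any array element.
registers = {"EBX": symbol_table["table1"]}  # EBX = offset of table1
registers["EBX"] += 1                        # move on to table1[1]
store_byte(registers["EBX"], 99)             # table1[1] = 99

print(list(data_segment[:8]))
```

Because the offset is held in a register, the same instruction inside a loop can visit every element of the array simply by incrementing that register.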
Data Transfer Instructions
The mov Instruction
We have already introduced the mov instruction, which requires two operands and
has the syntax
mov destination,source
The data are copied from source to destination, and the source operand remains unchanged.
Both operands should be of the same size. The mov instruction can take one of the following
five forms:
mov register,register
Restrictions:
• Destination register cannot be CS or (E)IP registers.
• Both registers cannot be segment registers.
mov register,immediate
Restriction: Register cannot be a segment register.
mov memory,immediate
mov register,memory
mov memory,register
There is no move instruction to transfer data from memory to memory, as the Pentium does
not allow it.
The xchg Instruction

The xchg instruction exchanges 8-, 16-, or 32-bit source and
destination operands. The syntax is similar to that of the mov instruction.
Here are some examples:
xchg EAX,EDX
xchg response,CL
xchg total,DX
As in the mov instruction, the two operands cannot both be located in memory.
Thus,
xchg response,name1 ; illegal
is invalid. The xchg instruction is convenient because we do not need a
third register to hold a temporary value in order to swap two values.
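The saving can be seen in a small Python model that swaps two register values both ways. The register file is an ordinary dictionary and the helper names are hypothetical. The model only mirrors the data movement, not the actual instruction encoding.

```python
def swap_with_mov(regs: dict, a: str, b: str, temp: str) -> None:
    """Swap two registers using three mov-style copies and a scratch register."""
    regs[temp] = regs[a]   # mov temp,a
    regs[a] = regs[b]      # mov a,b
    regs[b] = regs[temp]   # mov b,temp


def swap_with_xchg(regs: dict, a: str, b: str) -> None:
    """Swap two registers in one step, as xchg does."""
    regs[a], regs[b] = regs[b], regs[a]


regs1 = {"EAX": 5, "EDX": 9, "ECX": 0}
swap_with_mov(regs1, "EAX", "EDX", "ECX")

regs2 = {"EAX": 5, "EDX": 9, "ECX": 0}
swap_with_xchg(regs2, "EAX", "EDX")

print(regs1, regs2)
```

Both versions exchange EAX and EDX, but the mov version overwrites the scratch register ECX while the xchg version leaves it untouched.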
The xlat Instruction

The xlat (translate) instruction can be used to perform character
translation. For example, it can be used to translate character codes from
ASCII to EBCDIC and vice versa. The xlat instruction has the form
xlatb
To use the xlat instruction, the BX register must be loaded with the
starting address of the translation table and AL must contain an index value
into the table. The xlat instruction adds the contents of AL to BX and
reads the byte at the resulting address. This byte replaces the index value in
the AL register.
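The table lookup performed by xlat can be modelled in a few lines of Python. The sketch below uses a hex-digit table instead of an ASCII/EBCDIC table to keep it short. The memory size, the table address 16, and the function name xlat are assumptions made for the example.

```python
memory = bytearray(64)
TABLE = b"0123456789ABCDEF"            # translation table: value -> hex digit

BX = 16                                # assumed start address of the table
memory[BX:BX + len(TABLE)] = TABLE     # place the table in memory


def xlat(bx: int, al: int) -> int:
    """Return the byte at address bx + al, which becomes the new AL."""
    return memory[bx + al]


AL = 11                                # index into the table
AL = xlat(BX, AL)                      # AL now holds the translated byte
print(chr(AL))
```

Here the index 11 is replaced by the character code of "B", the hex digit for eleven.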
Arithmetic Instructions

The Pentium provides several instructions to perform simple
arithmetic operations. In this section, we describe five instructions
to perform addition, subtraction, and comparison. Arithmetic
instructions update the status flags to record the result of the
operation.
Increment/Decrement Instructions

These instructions can be used to either increment or
decrement the operands. The inc (increment) instruction adds one
to its operand, and the dec (decrement) instruction subtracts one
from its operand. Both instructions take a single operand. The
operand can be either in a register or in memory.


Add Instructions

The add instruction can be used to add two 8-, 16-, or 32-bit
operands. The syntax is
add destination,source
As with the mov instruction, add can also take the five basic forms
depending on how the two operands are specified. The semantics of the add
instruction are
destination = destination + source
As a result, the original contents of destination are lost after the
execution of add, but the contents of source remain unchanged.
Subtract Instructions

The sub (subtract) instruction can be used to subtract two 8-,
16-, or 32-bit numbers. The syntax is
sub destination,source
The source operand is subtracted from the destination
operand and the result is placed in the destination.
destination = destination - source
Compare Instruction

The cmp (compare) instruction is used to compare two operands (equal, not
equal, and so on).
cmp destination,source
subtracts the source operand from the destination operand but does not alter
either of the two operands, as shown below:
destination - source
The flags are updated as if the sub operation were performed. The main
purpose of the cmp instruction is to update the flags so that a subsequent
conditional jump instruction can test these flags.
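The relationship between add, sub, and cmp can be sketched in Python for 8-bit operands. This is a simplified model under stated assumptions: operands are unsigned values in the range 0 to 255, only three status flags (carry CF, zero ZF, sign SF) are modelled, and the function names are hypothetical.

```python
def add8(dest: int, src: int):
    """destination = destination + source, returning (result, flags)."""
    total = dest + src
    result = total & 0xFF
    flags = {"CF": int(total > 0xFF),   # carry out of bit 7
             "ZF": int(result == 0),    # result is zero
             "SF": result >> 7}         # copy of the sign bit
    return result, flags


def sub8(dest: int, src: int):
    """destination = destination - source, returning (result, flags)."""
    result = (dest - src) & 0xFF
    flags = {"CF": int(dest < src),     # a borrow was needed
             "ZF": int(result == 0),
             "SF": result >> 7}
    return result, flags


def cmp8(dest: int, src: int):
    """Same flags as sub8, but the result is discarded."""
    _, flags = sub8(dest, src)
    return flags


print(add8(200, 100))
print(sub8(5, 5))
print(cmp8(3, 5))
```

A conditional jump that follows cmp only needs to look at these flags: for instance, ZF = 1 means the two operands were equal.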
Logical Instructions

The Pentium instruction set provides five logical instructions: and, or, xor, test,
and not. The syntax of these instructions is
and destination,source
or destination,source
xor destination,source
test destination,source
not destination
The and, or, and xor are binary operators and perform bitwise and, or, and xor
logical operations. The not is a unary operator that performs bitwise complement
operations. The binary logical operations set the destination by performing the specified
bitwise logical operation on the source and destination operands. The test instruction
performs a bitwise and like the and instruction, but it only updates the flags and
leaves the destination unchanged. The logical not
operation simply flips the bits: a 1 in input becomes a 0 in the output, and vice versa.
Shift Instructions

The Pentium provides two types of shift
instructions: logical shifts and arithmetic shifts. Within
each type, left and right shifts are supported.


Logical Shift Instructions

The shl (shift left) instruction can be used to left-shift a destination
operand. Each left-shift causes the leftmost bit to move to the carry flag (CF), and
the vacated rightmost bit is filled with a zero.

The shr (shift right) instruction works similarly but shifts bits to the
right: the rightmost bit moves to the carry flag, and the vacated leftmost bit
is filled with a zero.

The general format of these instructions is
shl destination,count
shr destination,count
shl destination,CL
shr destination,CL
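The movement of bits through the carry flag can be traced with a Python model of the two logical shifts. The sketch assumes an 8-bit destination by default and shifts one bit at a time, so the returned carry flag is the last bit shifted out. The function names simply mirror the mnemonics.

```python
def shl(value: int, count: int, width: int = 8):
    """Logical left shift: return (result, CF)."""
    mask = (1 << width) - 1
    cf = 0
    for _ in range(count):
        cf = (value >> (width - 1)) & 1   # leftmost bit goes to CF
        value = (value << 1) & mask       # zero enters on the right
    return value, cf


def shr(value: int, count: int, width: int = 8):
    """Logical right shift: return (result, CF)."""
    cf = 0
    for _ in range(count):
        cf = value & 1                    # rightmost bit goes to CF
        value >>= 1                       # zero enters on the left
    return value, cf


print(shl(0b10010110, 1))
print(shr(0b10010110, 1))
```

For unsigned numbers, each left shift multiplies the value by two and each right shift divides it by two, as long as no significant bits are shifted out.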
Arithmetic Shift Instructions

The arithmetic shift instructions
sal (Shift Arithmetic Left)
sar (Shift Arithmetic Right)
can be used to shift signed numbers left or right.
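What makes the right shift "arithmetic" is that the sign bit is copied into the vacated position instead of a zero. The Python sketch below models sar for an 8-bit two's complement value. The helper to_signed is a hypothetical convenience for printing. No separate sal model is given because sal performs the same operation as shl.

```python
def sar(value: int, count: int, width: int = 8):
    """Arithmetic right shift: return (result, CF), preserving the sign bit."""
    sign = value & (1 << (width - 1))     # remember the sign bit
    cf = 0
    for _ in range(count):
        cf = value & 1                    # rightmost bit goes to CF
        value = (value >> 1) | sign       # sign bit is replicated
    return value, cf


def to_signed(value: int, width: int = 8) -> int:
    """Interpret an unsigned bit pattern as a two's complement number."""
    return value - (1 << width) if value & (1 << (width - 1)) else value


result, cf = sar(0xF0, 2)                 # 0xF0 is -16 as a signed byte
print(hex(result), to_signed(result), cf)
```

Shifting -16 right by two positions gives -4, so sar divides a signed number by a power of two while keeping its sign.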
Subroutines in Assembly Language

In a given program, it is often necessary to perform a particular
subtask many times on different data values. Such a subtask is usually called a
subroutine. For example, a subroutine may sort numbers in an integer array
or perform a complex mathematical operation on an input variable (e.g.,
calculate sin(x)). The block of instructions that
constitutes a subroutine could be included at every point in the main program
where that task is needed.
However, this would result in unnecessary waste of memory space.
Rather, only one copy of the instructions that constitute the subroutine is
placed in memory and any program that requires the use of the subroutine
simply branches to its starting location in memory. The instruction that
performs this branch is named a CALL instruction. The calling program is
called the CALLER, and the subroutine being called is the CALLEE.
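The idea of keeping one copy of the subroutine and branching to it can be sketched with a toy Python machine. Everything here is an assumption made for illustration: the instruction names, the addresses, and the use of a stack to remember where the caller should resume (the usual mechanism, though not described on this slide).

```python
# Toy program memory: address -> (operation, argument).
program = {
    0: ("call", 10),    # caller: first use of the subroutine
    1: ("call", 10),    # caller: second use, same single copy
    2: ("halt", None),
    10: ("work", None), # callee: body of the subroutine
    11: ("ret", None),  # callee: return to the caller
}


def run(program: dict) -> list:
    """Execute the toy program and return the addresses visited."""
    pc, stack, trace = 0, [], []
    while True:
        op, arg = program[pc]
        trace.append(pc)
        if op == "call":
            stack.append(pc + 1)   # remember where to resume
            pc = arg               # branch to the subroutine
        elif op == "ret":
            pc = stack.pop()       # resume after the call
        elif op == "halt":
            return trace
        else:
            pc += 1                # ordinary instruction


print(run(program))
```

The trace shows the subroutine at address 10 being executed twice although it is stored only once.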

Assemblers provide two directives to define procedures in
the assembly language: PROC and ENDP. The PROC directive
(stands for procedure) signals the beginning of a procedure, and
ENDP (end procedure) indicates the end of a procedure. Both these
directives take a label that is the name of the procedure. In addition,
the PROC directive may optionally include NEAR or FAR to
indicate whether the procedure is a NEAR procedure or a FAR
procedure.

The general format is
proc-name PROC NEAR
to define a near procedure, and
proc-name PROC FAR
to define a far procedure. A procedure is said to be a near procedure if the
calling and called procedures are both located in the same code segment.
On the other hand, if the calling and called procedures are located in two
different code segments, the called procedure is a far procedure. Near procedures
involve intrasegment calls, and far procedures involve intersegment calls.
Assembly Language Statements

Assembly language programs are created out of three
different classes of statements:
• Executable instructions;
• Assembler directives, also called pseudo-ops;
• Macros.


Translators

As we have already seen, the only language that a computer
can understand is the so-called machine language. A machine language
is composed of a set of basic operations whose execution is
implemented in the hardware of the processor. We have also seen
that high-level programming languages provide a machine-independent
level of abstraction that is higher than the machine
language. Therefore, they are better suited to human-machine
interaction. But this also implies that there must be some kind of translator
between the high-level programming language and the machine
language. There are two kinds of translators: interpreters and
compilers.
Interpreter

An Interpreter is a program that implements or simulates a virtual
machine using the base set of instructions of a programming language as
its machine language.
You can also think of an Interpreter as a program that implements
a library containing the implementation of the basic instruction set of a
programming language in machine language.
An Interpreter reads the statements of a program, analyzes them
and then executes them on the virtual machine by calling the corresponding
instructions of the library.
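The read, analyze, and execute cycle can be sketched as a tiny Python interpreter. The three-instruction stack language (push, add, mul) is invented for the example and is not taken from the slides. The library dictionary plays the role of the instruction library described above.

```python
def interpret(source: str) -> list:
    """Run a tiny stack-language program and return the final stack."""
    stack = []
    # The "library": one host-language routine per basic instruction.
    library = {
        "push": lambda arg: stack.append(int(arg)),
        "add": lambda arg: stack.append(stack.pop() + stack.pop()),
        "mul": lambda arg: stack.append(stack.pop() * stack.pop()),
    }
    for line in source.splitlines():      # read each statement
        parts = line.split()              # analyze it
        if not parts:
            continue
        name = parts[0]
        arg = parts[1] if len(parts) > 1 else None
        library[name](arg)                # execute it via the library
    return stack


print(interpret("push 2\npush 3\nadd\npush 4\nmul"))
```

The source program is analyzed again every time it is run, which is the main contrast with compilation.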
Compiler

A compiler is a program that translates the code of a
programming language into machine code, also called object code.
The object code can be executed directly on the machine where it
was compiled. So using a compiler separates the translation and
the execution of a program. In contrast to an interpreted program, the
source code is translated only once.
The object code is machine-dependent meaning that the
compiled program can only be executed on a machine for which it
has been compiled, whereas an interpreted program is not machine-
dependent because the machine-dependent part is in the interpreter
itself.
Executable Instructions

Assembly language programs are created out of three
different classes of statements. Statements in the first class tell the
CPU what to do. These instructions are called executable
instructions, or instructions for short. Each executable instruction
consists of an operation code (opcode for short). Executable
instructions cause the assembler to generate machine language
instructions. Each executable statement typically generates one
machine language instruction.
Assembler Directives or Pseudo-ops

The second class of statements provides information to the
assembler on various aspects of the assembly process. These
instructions are called assembler directives or pseudo-ops.
Assembler directives are non-executable and do not generate
machine language instructions.


Macros

The last class of statements, called macros, is used as a
shorthand notation for a group of statements. Macros permit the
assembly language programmer to name a group of statements and
refer to the group by the macro name. During the assembly process,
each macro is replaced by the group of statements that it represents
and is assembled in place. This process is referred to as macro
expansion. We use macros to provide the basic input and output
capabilities to standalone assembly language programs.
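Macro expansion is essentially text substitution performed before assembly, which a few lines of Python can imitate. The macro name newline, its body, and the putch procedure it calls are all hypothetical and serve only to show the mechanism. Parameters, which real macros usually accept, are left out.

```python
# Hypothetical macro table: macro name -> group of statements it stands for.
MACROS = {
    "newline": ["mov AL,13", "call putch", "mov AL,10", "call putch"],
}


def expand(lines: list, macros: dict) -> list:
    """Replace each macro name by its statements, in place."""
    expanded = []
    for line in lines:
        # A line that is not a macro name is copied unchanged.
        expanded.extend(macros.get(line.strip(), [line]))
    return expanded


source = ["mov AX,1", "newline", "mov BX,2"]
for statement in expand(source, MACROS):
    print(statement)
```

Unlike a procedure call, every use of the macro inserts another copy of its statements into the program.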
Format of Assembly Language

All three classes of the assembly language statements use
the same format:
[label] mnemonic [operands] [;comment]
The fields in the square brackets are optional in some
statements. As a result of this format, it is a common
practice to align the fields to aid readability.
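The four fields can be separated mechanically, as the following Python sketch shows. It makes two simplifying assumptions that are not part of the format above: a label is recognized only when it ends with a colon, and a semicolon always starts a comment (so semicolons inside quoted strings are not handled). The function name parse_statement is hypothetical.

```python
def parse_statement(line: str) -> dict:
    """Split one statement into label, mnemonic, operands and comment."""
    code, _, comment = line.partition(";")   # comment runs to end of line
    fields = code.strip().split(None, 1)
    label = None
    if fields and fields[0].endswith(":"):   # assumed label convention
        label = fields[0][:-1]
        fields = fields[1].split(None, 1) if len(fields) > 1 else []
    mnemonic = fields[0] if fields else None
    operands = ([op.strip() for op in fields[1].split(",")]
                if len(fields) > 1 else [])
    return {"label": label, "mnemonic": mnemonic,
            "operands": operands, "comment": comment.strip() or None}


print(parse_statement("repeat: mov EAX,EBX ; copy EBX into EAX"))
print(parse_statement("ret"))
```

A line holding only a comment parses to a statement with no label, no mnemonic, and no operands.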
Label

This is an optional field. The label field serves two distinct
purposes: it is used to represent either an identifier or a constant.
When a label appears in an executable instruction, it is used as a
marker to identify the instruction. Then, for example, you can make
the program execution jump to the labeled instruction. In this case,
the label represents the memory address of the instruction. When
used with certain assembler directives such as EQU, the label
represents a constant.
Mnemonic

This field is required in most statements and identifies the
purpose of the statement. It may be omitted only in certain
lines of code, such as lines consisting of a comment,
a label, or a label and a comment.
Operands

Operands specify the data to be manipulated by the
statement. The number of operands required depends on
the specific statement or directive. For instance,
executable statements may have zero, one, two, or three
operands.
Comment

This is an optional field and serves the same purpose as comments
in a high-level language. Comments play a more important role in assembly
language, as it is a low-level language. The assembler ignores all comments.
Comments begin with a semicolon (;) and extend until the end of
the line. Since the readability of assembly language programs is poor,
comments should be generously added to improve readability.
It is good programming practice to explain the functionality of a
group of statements by several lines of comments and then add brief
comments to selected code lines within the group.
Symbol Table

When we allocate storage space using a data definition directive, we usually
associate a symbolic name with it for reference. The assembler, during the assembly
process, assigns an offset value to each symbolic name. For example, consider the
following data definition statements:
.DATA
value DW 0
sum DD 0
marks DW 10 DUP (?)
message DB 'The grade is:',0
char1 DB ?
The assembler assigns contiguous memory space for the
variables. The assembler uses the same ordering of variables that is present in the
source code. Thus, finding the offset value of a variable is a simple matter of
counting the number of bytes allocated to the variables preceding it. For example, the
offset of marks is 6 because value and sum are allocated 2 and 4 bytes, respectively.
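The byte counting described above can be reproduced with a short Python sketch. The definitions list transcribes the five data definitions by hand as (name, directive, element count) tuples, with message counted as 13 characters plus the terminating 0. The function name and the tuple format are assumptions made for the example.

```python
SIZES = {"DB": 1, "DW": 2, "DD": 4}   # bytes per element

# (name, directive, number of elements), in source-code order.
definitions = [
    ("value", "DW", 1),
    ("sum", "DD", 1),
    ("marks", "DW", 10),      # 10 DUP (?)
    ("message", "DB", 14),    # 13 characters plus the 0 byte
    ("char1", "DB", 1),
]


def build_symbol_table(defs: list) -> dict:
    """Assign each name the number of bytes allocated before it."""
    table, offset = {}, 0
    for name, directive, count in defs:
        table[name] = offset
        offset += SIZES[directive] * count
    return table


print(build_symbol_table(definitions))
```

The result places marks at offset 6, in agreement with the text, and char1 at offset 40.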
Assembler Directives

An assembler directive is a message to the assembler that tells
the assembler something it needs to know in order to carry out the
assembly process; for example, an assembler directive tells the
assembler where a program is to be located in memory. In each case,
the term <label> indicates a user-defined label (i.e., symbolic
name) that must start in column 1 of the program, and <value>
indicates a value that must be supplied by the programmer (this may
be a number, or a symbolic name that has a value).
Assembler Directives Examples

Include 'C:\PICTOOLS\16C877.inc'  ; loads default symbols
                                  ; for the targeted device.
FUSES   _WD_OFF&_LP_OSC           ; specify multiple fuse settings
                                  ; using the '&' operator.
FUSES   _CP_ON                    ; Specifies 1 fuse setting per line.
Digit   =       43h               ; Assign value 43h to Digit
Max     EQU     1Ah               ; Assign value 1Ah to Max
        ORG     10h               ; Set assembly address to 10h
Count   DS      2                 ; Define 2 bytes at 10h & 11h
                                  ; Bytes can be referred to
                                  ; later as Count and Count+1

        ID      1234h             ; Set 16C5x ID to 1234h
        ID      'ABCD'            ; Set newer PIC ID to 'ABCD'
        INCLUDE 'KEYS.SRC'        ; Include KEYS.SRC file at
                                  ; point of insertion
        RESET   Start             ; Set 16C5x reset jump to
                                  ; location at Start
Start   mov     Count,#00         ; This will be executed
                                  ; when PIC is reset
        EEORG   10h               ; Set EEPROM address to 10h
        EEDATA  02h,88h,34h       ; Store 3 bytes in EEPROM
