Computer
An electronic device that performs high-speed arithmetic, logical, or transfer operations, or that assembles, stores, correlates, or otherwise processes information. Computer = Hardware + Software
EXAMPLE
A 4-bit adder circuit is a specialized, dedicated computer
The addition function is programmed by the interconnection of wires
Computer elements are logic gates
Early Computer - ENIAC
(Electronic Numerical Integrator and Computer)
1946, Univ. of Pennsylvania
Constructed by Eckert and Mauchly
30 tons
1500 sq ft
18,000 Vacuum Tubes
Programmed by connecting:
switches and cables
6000 different switches and cables!
HARDWARE PROGRAMMABLE
Stored Program Concept
John von Neumann (Univ. of Penn.), A.W. Burks, H.H. Goldstine - wrote paper describing Stored Programs
Based on a theoretical mathematical model: the Turing Machine (Alan Turing, 1936)
Simple type of computer
Works by reading/writing symbols on tape
Tape can move left or right
Finite State Machine
As powerful (in theory) as any possible computer
Still used in Computability Theory, to find out what computers cannot do
Charles Babbage also proposed a mechanical device that could
store a program, the Analytical Engine (proposed in 1837; his earlier Difference Engine dates from 1822)
Never constructed (although he tried)
First Stored Program Computer - EDVAC
(Electronic Discrete Variable Automatic Computer)
Completed in 1952
Storage: 1000 words
1 word = 10 decimal digits
Programmed using Paper Tape
Sequential Storage
Random Access Core Developed
Random Access
SOFTWARE PROGRAMMABLE
Modern Computer Model
CPU - Central Processing Unit = Microprocessor
arithmetic, logical, synchronization functions
Memory - Stores Information (DRAM, ROM)
CPU instructions and Data
I/O - Input/Output Devices - Peripherals (KBD, PRN)
Interface to Outside World (humans, other machines)
Bus - Set of Parallel wires (Address Bus, Data Bus)
Transmit instructions/data between CPU/Memory/IO
Fetch and Execute Cycle
1. CPU issues: (FETCH)
   Memory Read Signal on Control Bus
   Location of Desired Memory Data on Address Bus
2. Memory issues: (FETCH)
   Data Valid Signal on Control Bus
   Actual Memory Data on Data Bus
3. CPU Stores Data Internally (FETCH)
   Contains Registers
4. CPU Interprets Data as an Instruction (FETCH/EXECUTE*)
   Instruction Decoder Circuit
5. CPU Performs the Instruction (EXECUTE)
   May Require more Memory Accesses
   May Require Interaction with Peripherals
*Some consider step 4 to be part of Fetch; others, part of Execute
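The five steps can be tried out on a toy machine. The sketch below is only an illustration: the three opcodes (LOAD, ADD, HALT) and the single accumulator are invented for this example and are not 8086 instructions.

```python
# Toy fetch/execute machine. Opcodes are hypothetical, chosen for the sketch.
LOAD, ADD, HALT = 1, 2, 0

def run(memory):
    """Run the program stored in `memory` (a list of ints); return the accumulator."""
    pc = 0    # address the CPU places on the address bus for the next fetch
    acc = 0   # one internal data register
    while True:
        opcode = memory[pc]       # steps 1-3: memory read, value stored inside the CPU
        pc += 1
        if opcode == HALT:        # step 4: interpret the value as an instruction
            return acc
        operand = memory[pc]      # step 5 may require more memory accesses
        pc += 1
        if opcode == LOAD:
            acc = operand
        elif opcode == ADD:
            acc += operand
        else:
            raise ValueError(f"unknown opcode {opcode}")

program = [LOAD, 5, ADD, 7, HALT]
print(run(program))
```

Instructions and data share the same memory list here, which is the stored program idea in miniature.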
Quantifying Memory
Measured in the quantity of BInary digiT (BIT)
Standard:
  1 nybble        = 4 bits
  1 byte          = 8 bits
Machine Dependent (8086):
  1 word          = 16 bits
  1 doubleword    = 32 bits
  1 quadword      = 64 bits
  1 paragraph     = 16 bytes
  1 page          = 256 bytes
  1 segment (max) = 65,536 bytes
Capacity Measures
1 kilobyte (kB) = 2^10 bytes
1 megabyte (MB) = 2^20 bytes
1 gigabyte (GB) = 2^30 bytes
1 terabyte (TB) = 2^40 bytes
Inside the CPU - Arithmetic Logic Unit
(ALU)
Combinational Logic Circuit
Two Classes of Inputs:
Control Data
Two Classes of Outputs
Status Data
General Arithmetic Circuit
- Attempt to Share logic
- Purely combinational
- Data path between registers
Example - 3-bit ALU
[Figure: 3-bit ALU schematic. Three full-adder (FA) stages with inputs A0-A2 and B0-B2, 4-input selectors (0-3) controlled by s1 s0 on the adder inputs X and Y, a Cin/Cout carry chain (C0-C3), and shared select lines S1, S0.]
Arithmetic Logic Unit (ALU)
In a higher level diagram:
[Figure: ALU block diagram. Labels: bus widths n and n+1; control inputs CLK, S1, S0.]
Inside the CPU - Control Unit
FSM - Finite State Machine
Generates Control Signals
  External - Bus Signals
  Internal - Register Load/Clear; ALU Control
Synchronization
  Controls when to Fetch/Execute
  Generates Timing Signals
  Handles External Events - Interrupts
Generally Composed of Subcircuits
  Bus Controller
  Memory Controller
  Cache Controller
Computer Organization
Principal Components
CPU - (Central Processing Unit)
Fetch/Execute Machine
Main Memory
  An Array of Storage Locations for Bits
  Data and Instructions Stored Here
Secondary Storage
  Memory that is Cheap
  Memory that is Slow
I/O Devices - (Input/Output)
  Human-to-Computer Communication
  Computer-to-Other-Device Communication
Intel x86 Microprocessors
CPU Name      Year Intro.  Int. CPU Clock  # Trans.  Data Pins  Addr. Pins
8080          1974         2-3 MHz         4500      8          16
8086          1978         5-10 MHz        29000     16         20
80286         1982         6-16 MHz        130000    16         24
80386         1985         16-33 MHz       275000    32         32
80486         1989         25-50 MHz       1.2M      32         32
Pentium       1994         60-200 MHz      3.1M      64         32
Pentium Pro   1995         150-200 MHz     5.5M      64         36
Pentium MMX   1997         133-266 MHz               64         32
Pentium II    1998         233-500 MHz     7.5M      64
Celeron       1998         266-500 MHz     7.5M      64
Pentium III   1999         450-600 MHz               64
(blank = value not given on the slide)
Intel x86 Microprocessors
8086 - 20 bit Addr. Bus - 1MB of Memory
80286 - 24 bit Addr. Bus - Added Prot. Mode
80386 - 32 bit regs/busses - Virtual 86 Mode
80486 - RISC Core - L1 Cache - FPU
Pentium - Superscalar - Dual Pipeline - Split L1 Cache
Pentium Pro - L2 Cache - Br. Pred. - Spec. Exec.
Pentium MMX - 57 Instructions - Integrated DSP (MMX)
Pentium II - 100 MHz Bus - L2 Cache - MMX
Celeron - 66 MHz Bus - True L2 Cache Integration
Pentium III - 100 MHz Bus - 70 Instr. Streaming SIMD Ext.
Current processors: Pentium 4, Centrino, Dual Core, Atom, ...
Intel x86 Family Tree
[Table: Intel x86 family tree, 4004 through Pentium. Columns: Designer, Processor, Codename, Year, CPU Clk, BUS Clk, Clk Mult, Voltage, Feature Size, Tech., Package, Pins. The row data cannot be recovered from the extracted text.]
80x86 microprocessors
1972 Intel Corp. 8008
1978 8086
  20-bit address instead of 16
  1 MB memory access (instead of 64K)
  Bus Interface Unit / Execution Unit: instruction fetch / execution
  Internal Registers: Data = 16 bits
  HW multiplier/divider
  External arithmetic processor
8088
  8-bit external bus
  Can use cheap and simple 8-bit memory interface
  16-bit registers / 20 address bits
  1982 XT: 16 K memory, 4.77 MHz
Internal architecture of 8086 microprocessor
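[Figure: internal architecture of the 8086. The BIU contains the segment registers CS, DS, SS, ES, the instruction pointer IP, the address adder (ADD), the bus controller and the 6-byte instruction queue (1-6), and drives the external Address Bus and Data Bus. The EU contains the general registers AH/AL, BH/BL, CH/CL, DH/DL, the registers SP, BP, SI, DI, the instruction decoder, the ALU and FLAGS. The two units are joined by the internal system bus.]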
80186/80188
Single computer on a chip
8086(8) + clock generator + timer + interrupt controller + DMA (Direct Memory Access) controller + I/O interface
80286
16-bit data / 24-bit address
Operation modes: Real mode / Protected mode
Real mode: same as 8086
Protected mode: multitasking programming; many segments in memory
Once in protected mode, cannot return to real mode -> pitfall
80386
1985: 32-bit data/address
4 GB physical memory access
Real mode: same as 8086
Protected mode: descriptor registers control tasks and allocate segments (segment boundary, size)
Virtual Memory support
80386 (cont)
Windows, OS/2
2 clock cycles for memory access
Cache
16 added instructions
386SX: 16-bit data / 24 address bits
80486
RISC (Reduced Instruction Set Computer) concept is applied
Improved 386 performance
5-stage pipeline
80387 floating point processor
DX2/DX4: fast internal bus / slow external bus (clock)
Pentium
Superscalar processor
2 separate pipelines
Code cache / data cache
5- to 8-stage pipeline
64-bit external bus
Operating modes for Pentium
REAL MODE: similar to the 8086, with the possibility to switch to protected mode
PROTECTED MODE: Virtual 8086, multitasking, virtual memory addressing
SYSTEM MANAGEMENT MODE (SMM):
Standard architectural feature since the Intel 386 SL
Provides an operating system and application independent power management system
Activated by an external interrupt, SMI#
Switches the CPU to a separate address space while saving the entire context of the CPU
Advanced technologies used in Pentium (1)
Superscalar execution
  Compared with the i486, which can execute only one instruction at a time, the Pentium can sometimes execute 2 instructions at a time
Pipeline architecture
  Instructions are executed in 5 stages; this allows the processor to overlap multiple instructions so that it takes less time to execute two instructions in a row
  Pentium has 2 independent pipelines
Branch target buffer
  The Pentium processor fetches the branch target instruction before it executes the branch instruction
Dual on-chip caches
  Two separate caches on chip, one for instructions and one for data, which allows the processor to fetch data and instructions from the cache simultaneously
Write-back cache
  When data is modified, only the data in the cache is changed. Memory data is changed only when the processor replaces the modified data in the cache with a different set of data
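The gain from overlapping instructions can be estimated with a small calculation. This is a sketch under idealized assumptions that are not on the slide: every stage takes exactly one clock, and there are no stalls, branches or cache misses. The function names are illustrative.

```python
def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages

def cycles_pipelined(n_instructions, n_stages):
    """Ideal pipeline: fill it once, then one instruction completes per clock."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)

# Two instructions in a row on a 5-stage pipeline.
print(cycles_unpipelined(2, 5), cycles_pipelined(2, 5))
```

For long instruction sequences the ideal speed-up approaches the number of stages, which is one reason later designs lengthen the pipeline.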
Advanced technologies used in Pentium (2)
64 bit bus
with its 64-bit-wide external data bus (Intel486 has 32-bit external bus) the processor can handle up to twice the data load of the Intel486 processor at the same clock frequency
Instruction optimization
The Pentium processor has been optimized to run critical instructions in fewer clock cycles than the Intel486 processor
Floating Point Optimization
The Pentium processor executes individual instructions faster through execution pipelining, which allows multiple floating-point instructions to be executed at the same time
Pentium extension
The Pentium processor adds a few instruction set extensions compared with the Intel486 processors. The Pentium processor also has a set of extensions for multiprocessor (MP) operation. This makes a computer with multiple Pentium processors possible
Compared with i486:
separate instruction and data caches
dual integer pipelines (U and V)
branch prediction with BTB
pipelined FPU
64-bit external bus
about 3 million transistors
Pentium Pro
Two separate silicon dies: processor + second-level cache (256K or 512K)
Internal bus: 32 bit
External data bus: 64 bit
Address bus: 36 bit for 64 GB
100% compatible with 80x86 programs
3 processor instructions + 2 floating point instructions
Improvements in Pentium Pro
Superpipelining: increases the number of execution steps to 14, from the Pentium's 5.
Integrated Level 2 Cache: The Pentium Pro features a higher-performance secondary cache compared to all earlier processors. Instead of using motherboard-based cache running at the speed of the memory bus, it uses an integrated level 2 cache with its own bus, running at full processor speed, typically three times the speed that the cache runs at on the Pentium. The Pentium Pro's cache is also non-blocking, which allows the processor to continue without waiting on a cache miss.
32-Bit Optimization: The Pentium Pro is optimized for running 32-bit code (which most modern operating systems and applications use) and so gives a greater performance improvement over the Pentium when using the latest software.
Wider Address Bus: The address bus on the Pentium Pro is widened to 36 bits, giving it a maximum addressability of 64 GB of memory.
Greater Multiprocessing: Quad processor configurations are supported with the Pentium Pro, compared to only dual with the Pentium.
Out of Order Completion: Instructions flowing down the execution pipelines can complete out of order.
Superior Branch Prediction Unit: The branch target buffer is double the size of the Pentium's and its accuracy is increased.
Register Renaming: This feature improves parallel performance of the pipelines.
Speculative Execution: The Pro uses speculative execution to reduce pipeline stall time in its RISC core.
P6 Microarchitecture
1st level cache = 8KB instruction cache + 8KB data cache
2nd level cache = 1 MB static RAM, 64-bit bus
CENTERPIECE = Out of Order Execution (called Dynamic Execution), 3 functions:
Deep Branch Prediction (DBP)
Dynamic Data Flow Analysis (DDFA)
Speculative Execution (SE): execute instructions beyond a branch
Pentium 4
NetBurst Architecture
1. Hyper Pipelined Technology: deeper pipeline, 20 to 31 stages
2. Rapid Execution Engine: the ALUs in the core of the CPU operate at twice the core clock frequency
3. Execution Trace Cache: stores decoded micro-operations, so that when an instruction is executed again the CPU reads the decoded micro-ops directly from the trace cache instead of fetching and decoding the instruction again, thereby saving considerable time
High clock speeds (up to 4 GHz)
SSE2 and SSE3 instruction sets to accelerate media processing
Integration of HyperThreading: makes one physical CPU work as two logical (virtual) CPUs
Bigger L2 cache (512KB, 2MB)
Pipeline: 31 stages
HyperThreading Technology
Figure shows a comparison of a processor that supports HT Technology (implemented with two logical processors) and a traditional dual processor system.
The technology enables a single physical processor to execute two or more separate code streams (threads) concurrently, using logical processors.
The logical processors in an IA-32 processor supporting HT Technology share the core resources of the physical processor. This includes the execution engine and the system bus interface.
After power up and initialization, each logical processor can be independently directed to execute a specified thread, interrupted, or halted.
Dual (Multi) Core Processors
Based on Core technology: more processors on a single chip
They share some of the resources: external buses, cache
Dual core - less power consumption (50%, peak of 65W)
Faster on CPU-intensive applications (audio/video processing, file scans, etc.)
Homework
Explain the concept of pipeline
Explain the difference between addressing the memory and addressing the peripheral devices
Explain the role of the retirement unit in the P6 microarchitecture
Explain the difference between 1st level cache and 2nd level cache
Explain the concept of speculative execution
What do SIMD and SSE2 stand for? (Explain.)
Explain the concept of HyperThreading
What is the number of bits allocated for the data bus and for the address bus, respectively, for the following microprocessors: 8080, Pentium IV, 80486, 80286, 8086, 80186?
(C2)
80x86 Internal Architecture
Computer Operation Model
FETCH Instruction - EXECUTE Instruction
[Figure: timeline alternating FETCH, EXECUTE, FETCH, EXECUTE, ... along the time axis]
FETCH
1) Read Instruction from Memory
2) Decode/Interpret Instruction
3) Increment Instruction Address Register
EXECUTE
1) Control Unit - Input is Decoded Instruction
2) Control Signals Set
3) Data is Processed
8086 Architecture Specifics
BIU and EU - Pipelined Arrangement
BIU - Bus Interface Unit: Instruction Pipeline
EU - Execution Unit
Pipeline - Hardware Designed for Parallel Operation
[Figure: timing diagram. BIU row: FETCH, FETCH, FETCH, FETCH, FETCH, GET DATA. EU row: WAIT, EXECUTE, EXECUTE, EXECUTE. Horizontal axis: time.]
8086 Internal Architecture
Execution Model
8086 Overall Architecture
BIU and EU - Pipelined Arrangement
[Figure: 8086 overall architecture. The Execution Unit (EU) and the Bus Interface Unit (BIU) are joined by the instruction pipeline (6-byte instruction queue); the BIU connects to the System Bus (PC Bus) through the Address Bus and Data Bus. Register and block labels are the same as in the diagram under "Internal architecture of 8086 microprocessor".]
BIU - Bus Interface Unit (responsible for signals and data/instruction control)
  Brings the instructions into the internal QUEUE
  Controls the content of the queue
  Computes the address
  Generates the control signals
BIU - contents
  Block for controlling the signals
  FIFO memory to implement the 6-byte queue
  Instruction pointer (next instruction to be executed)
  ALU to calculate the address
  Internal communication registers
  Registers for memory segmentation
EU - Execution Unit
  Decoding of instructions
  ALU
  General registers (accessible by user)
  Internal registers (internal operations)
  Register to store the status and control of the program
General registers (bits 15..0):
  AX = AH:AL   Accumulator
  BX = BH:BL   Base
  CX = CH:CL   Counter
  DX = DH:DL   Data
Segment registers (bits 15..0):
  CS   Code Segment
  DS   Data Segment
  SS   Stack Segment
  ES   Extra Segment
Pointer and index registers:
  IP   Instruction Pointer
  SP   Stack Pointer
  BP   Base Pointer
  SI   Source Index
  DI   Destination Index
For 32-bit processors: the AX register (16 bits) is extended to EAX (32 bits)
8086/8088 Register File (cont)
Instruction Pointer Register
15 0
IP Contains Address of NEXT Instruction to be Fetched
Automatically Incremented
Programmer can Control with jump and branch
AX, BX, CX, DX
General Purpose Registers
  Accumulator   AX = AH:AL
  Base          BX = BH:BL
  Counter       CX = CH:CL
  Data          DX = DH:DL
  (each half is 8 bits, numbered 7..0)
Can Be Used Separately as 1-byte Registers (AX = AH:AL)
Temporary Storage to Avoid Memory Access
Faster Execution
Some Special uses for Certain Instructions
AX, BX, CX, DX
General Purpose Registers - Some Specialized Uses
AX, Accumulator
  Main Register for Performing Arithmetic
  mult/div must use AH, AL
  "accumulator" Means Register with Simple ALU
BX, Base
  Points to Translation Table in Memory
  Holds Memory Offsets; Function Calls
CX, Counter
  Index Counter for Loop Control
DX, Data
  After Integer Division Execution - Holds Remainder
CS, DS, ES, SS - Segment Registers
Contains Base Value for Memory Address
CS, Code Segment
  Used to point to Instructions
  Determines a Memory Address (along with IP)
  Segmented Address written as CS:IP
DS, Data Segment
  Used to point to Data
  Determines Memory Address (along with other registers)
  ES, Extra Segment, allows 2 Data Address Registers
SS, Stack Segment
  Used to point to Data in Stack Structure (LIFO)
  Used with SP or BP
  SS:SP or SS:BP are valid Segmented Addresses
IP, SP, BP, SI, DI - Offset Registers
Contains Index Value for Memory Address
IP, Instruction Pointer
  Used to point to Instructions
  Determines a Memory Address (along with CS)
  Segmented Address written as CS:IP
SI, Source Index; DI, Destination Index
  Used to point to Data
  Determines Memory Address (along with other registers)
  DS, ES commonly used
SP, Stack Pointer; BP, Base Pointer
  Used to point to Data in Stack Structure (LIFO)
  Used with SS
  SS:SP or SS:BP are valid Segmented Addresses
These can also be used as General Registers !!!!!!
8086/8088 Register File (cont)
Flags Register
15 0
x OF DF IF TF SF ZF x AF x PF x CF
Status and Control Bits Maintained in Flags Register
Generally Set and Tested Individually
9 1-bit flags in 8086; 7 are unused
Status Flags
Indicate Current Processor Status
CF   Carry Flag        Arithmetic Carry
OF   Overflow Flag     Arithmetic Overflow
ZF   Zero Flag         Zero Result; Equal Compare
SF   Sign Flag         Negative Result; Non-Equal Compare
PF   Parity Flag       Even Number of 1 bits
AF   Auxiliary Carry   Used with BCD Arithmetic
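How the six status flags follow from one operation can be checked with a small model. The sketch below assumes an 8-bit ADD; the function name add8_flags is illustrative, and the flag rules are the standard x86 definitions (PF looks only at the low byte of the result, AF at the carry out of bit 3).

```python
def add8_flags(a, b):
    """Model an 8-bit ADD: return (result, flags) for operands in 0..255."""
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                              # carry out of bit 7
        "ZF": int(result == 0),                               # zero result
        "SF": result >> 7,                                    # copy of bit 7
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed overflow
        "PF": int(bin(result).count("1") % 2 == 0),           # even number of 1 bits
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),            # carry out of bit 3
    }
    return result, flags

# 7Fh + 01h: the largest positive signed byte overflows into the sign bit.
print(add8_flags(0x7F, 0x01))
# FFh + 01h: unsigned carry, zero result, no signed overflow (-1 + 1 = 0).
print(add8_flags(0xFF, 0x01))
```

The two calls show why CF and OF are separate flags: the first sets OF but not CF, the second sets CF but not OF.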
Control Flags
Influence the 8086 During Execution Phase
DF: Direction Flag - Increment/Decrement; used for string operations
IF: Interrupt Flag - Enables Interrupts
TF: Trap Flag - Allows Single-Step; allows fetch-execute to be interrupted for debugging; causes interrupt after each op
MOV AH,[SI]
8086 Segmented Memory
x86 Memory Partitioned into Segments
8086: maximum size is 64K (16-bit index reg.)
8086: can have 4 active segments (CS, SS, DS, ES)
8086: 2-data; 1-code; 1-stack
80386: maximum size is 4GB (32-bit index reg.)
80386: can have 6 active segments (4-data; FS, GS)
Why have segmented memory?
Other microprocessors could only address 64K since they only had a single 16-bit MemAddrReg (or smaller). Segments allowed computers to be built that could use more than 64K memory (but not all at the same time).
8086/8088 Memory Access Registers
Segment registers (bits 15..0):
  CS   Code Segment
  DS   Data Segment
  SS   Stack Segment
  ES   Extra Segment
Offset registers (bits 15..0):
  IP   Instruction Pointer
  SP   Stack Pointer
  BP   Base Pointer
  SI   Source Index
  DI   Destination Index
8086 Generating Physical Addresses
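[Figure: portion of the BIU circuitry. A 16-bit segment register (one of the dedicated segment registers CS, DS, SS, ES) is extended with four low-order zero bits (0000) and fed, together with a 16-bit index register (one of the dedicated index registers IP, SP, BP, SI, DI), into the adder (ADD). The adder output is the 20-bit physical address (bits 19..0) on the memory system address lines.]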
Segmented Addressing
Each Segment must begin at a Paragraph Boundary
Each paragraph has a physical address that is a multiple of 10h
BIU is responsible for appending 0000 to the Segment value, so only 16-bit segment registers are needed
  Physical address   Memory
  00000h             paragraph 1
  00010h             paragraph 2
  00020h             paragraph 3
[Figure: segment registers CS, DS, SS, ES point to paragraph boundaries; index registers IP, SP, BP, SI, DI supply the offset.]
Segmented Memory (x86 Style)
Segment Registers: Point to Base Address
Index Registers: Contain Offset Value
Notation (Segmented Address): CS:IP  DS:SI  ES:DI  SS:BP  SS:SP
[Figure: System Memory from 00000h to FFFFFh containing the Code Segment, Extra Segment, Stack Segment and Data Segment, each located by its segment register (CS, ES, SS, DS); the gaps between segments are labelled "fragmentation".]
Memory Storage Organization
Organized as SEGMENTS
Maximum segment size = 64KB (since offsets are 16 bits: 2^16 = 65,536 = 64KB)
Maximum Memory Size: 2^20 = 1,048,576 = 1MB
Newer Processors (Pentium) Can Utilize More Memory
Wider Address Registers, 32 bits: 2^32 = 4,294,967,296 = 4GB
Segmented Memory Example
Logical, Segmented Address: 0FE6:012Bh
Offset, Index Address: 012Bh
Physical Address:
    0FE60h   65120
  + 0012Bh     299
  = 0FF8Bh   65419
[Figure: System Memory from 00000h to FFFFFh with the Code, Extra, Stack and Data Segments located by CS, ES, SS, DS.]
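The address arithmetic of this example can be reproduced in a few lines. This is a minimal sketch; the function name physical_address is illustrative, and the mask to 20 bits models the 8086's 20 address lines.

```python
def physical_address(segment, offset):
    """8086 real-mode address: segment * 16 + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

addr = physical_address(0x0FE6, 0x012B)
print(f"{addr:05X}h = {addr}")
```

Shifting the segment left by four bits is the same as appending the hex digit 0, which is what the BIU does in hardware.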
Segmented Memory Aliasing
Logical, Segmented Address 1: DS:SI = 1234:4321
Physical Address:
    12340h   74560
  + 04321h   17185
  = 16661h   91745
Logical, Segmented Address 2: ES:DI = 1665:0011
Physical Address:
    16650h   91728
  + 00011h      17
  = 16661h   91745
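Aliasing can be checked mechanically: two segmented addresses are aliases when they map to the same physical address. The sketch below also shows one common way to pick a canonical form (largest segment, offset below 10h); the function names are illustrative and the normalization convention is an assumption, not something the slide prescribes.

```python
def physical_address(segment, offset):
    """8086 real-mode address: segment * 16 + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

def normalize(segment, offset):
    """Return the alias with the largest segment value (offset in 0..0Fh)."""
    phys = physical_address(segment, offset)
    return phys >> 4, phys & 0xF

a = physical_address(0x1234, 0x4321)   # DS:SI
b = physical_address(0x1665, 0x0011)   # ES:DI
print(a == b, f"{a:05X}h")
print(tuple(f"{v:04X}" for v in normalize(0x1234, 0x4321)))
```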
Segment Locations in Physical Memory
1 Word = 16 bits
Byte Addressable
Little Endian Arrangement: MSB (Most Significant Byte) at Higher Address

  Offset   Data   Physical Address
  072CH    18H    AD5FCH
  072BH    A3H    AD5FBH
  072AH    7EH    AD5FAH
  0729H    69H    AD5F9H
  0728H    AAH    AD5F8H
  0727H    2EH    AD5F7H
  0726H    00H    AD5F6H
  0725H    55H    AD5F5H
  0724H    02H    AD5F4H
  0723H    72H    AD5F3H
  0722H    11H    AD5F2H

Base Address = ACEDH
Logical Address (offset) = 0724H
Physical Address = ACED0H + 0724H = AD5F4H
M[ACED:0724] = M[AD5F4] = 5502H

The word 5502H as stored:
  Offset   Hex    Binary
  0725H    55H    0101 0101
  0724H    02H    0000 0010
  Offset   Data   Physical Address
  072CH    18H    AD5FCH
  072BH    A3H    AD5FBH
  072AH    7EH    AD5FAH
  0729H    69H    AD5F9H
  0728H    AAH    AD5F8H
  0727H    2EH    AD5F7H
  0726H    00H    AD5F6H
  0725H    02H    AD5F5H
  0724H    55H    AD5F4H
  0723H    11H    AD5F3H
  0722H    20H    AD5F2H
  0721H    72H    AD5F1H
  0720H    DEH    AD5F0H
  071FH    ADH    AD5EFH
  071EH    FAH    AD5EEH
  071DH    CEH    AD5EDH
  071CH    CAH    AD5ECH
  (the figure shows one more byte value, FEH, whose address is not legible)

Assume: M[DS:DI] Contains a Pointer Value
DS = AD5Fh; DI = 0005h (All Segments Start on Paragraph Boundary)
SI <- M[DS:DI]
Then:
Pointer is M[DS:DI] = M[AD5F:0005] = M[AD5F5] = 0002h
M[DS:SI] = M[DS:(DS:DI)] = M[DS:0002h] = M[AD5F:0002] = M[AD5F2] = 1120h
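The pointer example can be replayed with a small memory model. This sketch copies only the five bytes the example touches from the table; the dictionary-based memory and the function name read_word are illustrative.

```python
# Bytes at the physical addresses used by the example (taken from the table).
memory = {
    0xAD5F2: 0x20,
    0xAD5F3: 0x11,
    0xAD5F4: 0x55,
    0xAD5F5: 0x02,
    0xAD5F6: 0x00,
}

def read_word(mem, segment, offset):
    """Read a 16-bit little-endian word: low byte first, high byte at address + 1."""
    addr = ((segment << 4) + offset) & 0xFFFFF
    return mem[addr] | (mem[addr + 1] << 8)

ds, di = 0xAD5F, 0x0005
si = read_word(memory, ds, di)        # SI <- M[DS:DI], the pointer value
value = read_word(memory, ds, si)     # M[DS:SI], the word the pointer refers to
print(f"SI = {si:04X}h, M[DS:SI] = {value:04X}h")
```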
Default Segment/Index Pairs
Type of Memory Reference      Default Segment Base   Alternate Segment Base   Offset
Instruction Fetch             CS                     None                     IP
Stack Operation               SS                     None                     SP
Variable (except following)   DS                     CS, ES, SS               Effective Address
- String Source               DS                     CS, ES, SS               SI
- String Destination          ES                     None                     DI
- BP used as Base Register    SS                     CS, DS, ES               Effective Address
- BX used as Base Register    DS                     CS, ES, SS               Effective Address
Homework: Give several exercises
Keypoints
(C3)
Addressing Modes for 80x86 microprocessors
Addressing modes- Classification
Register addressing
Immediate addressing
Memory addressing
  Direct addressing
  Indirect register addressing
  Based addressing (with/without displacement)
  Indexed addressing (with/without displacement)
  Based-indexed addressing (with/without displacement)
  Addressing on strings of bytes
  Addressing of ports
Exceptions (transfers that are not allowed):
- segment register <- segment register
- segment register <- immediate value
Memory addressing
Addressing on strings of bytes
Strings of bytes
Source string (SI), in DS (default)
Destination string (DI), in ES (default)
Examples of strings
Examples of how the address is calculated: MOVSB, LODSB
Addressing the ports
What is a port?
Input/output on ports
What is the address of the port?
Difference from a memory address
Registers used in addressing
Examples: how to switch on an LED?
Instruction encoding (e.g. MOV)
The instruction set
http://burks.brighton.ac.uk/burks/language/asm/asmtut/asm1.htm#toc
http://webster.cs.ucr.edu/AoA/DOS/AoADosIndex.html
1) Instructions for data transfer
2) Arithmetic instructions
3) Logic instructions
4) Shifts/rotate instructions + LOOPS
5) Instructions on strings of bytes
6) Instructions for port input/output
Instructions for data transfer
MOV XCHG XLAT PUSH/POP LEA LDS, LES
Arithmetic Instructions ( to be continued)
ADD, ADC INC AAA, DAA SUB, SBB DEC AAS, DAS
(C4)
INSTRUCTION SET
Instructions for data transfer
MOV XCHG XLAT PUSH/POP LEA LDS, LES
Important:
- Destination and source must have the same size in bits
- Register cannot be IP
- Memory-to-memory transfer is not possible
- Flags are not changed
Exercises (MOV, XCHG)
1) Load into AH the byte from address 0 and into AL the byte from address FFFFFh
   a) Using direct addressing
   b) Using indexed addressing
2) ES=1000, DS=5000, DI=100, SI=200: exchange the values of the memory locations (bytes)
   a) Using only MOV
   b) Using XCHG
3) Use based-with-index addressing
4) AX <- ES:[3000h]
5) Interchange DS with ES
6) AX -> BX -> CX -> DX -> AX (2 solutions; give other solutions)
7) Interchange AX with BX without MOV, XCHG (**)
For laboratory: propose 2-3 exercises similar to the above
Exercise XLAT
1) Draw the schematic principle
2) Where it is applied: encryption, conversion
3) Example with ASCII codes
4) Write an encryption algorithm and give the solution:
   a) Input from port 100h
   b) Encrypt
   c) Send to the port 200h
LDS, LES - examples
The schematic explanation (first SI, then DS)
LDS SI, address (LDS BX, address)
LES DI, address (LES reg, address)
Example for transfer of strings of bytes
Exercises PUSH, POP
Save all registers on the stack (CALL)
Exchange BX with CX using push/pop
AX -> BX -> CX using push/pop
Propose 2-3 problems
Arithmetic Instructions ( to be continued)
ADD, ADC INC AAA, DAA SUB, SBB DEC AAS, DAS
(C5)
INSTRUCTION SET - 2
Exercise XLAT
Conversion of the digit from AL (0, 1, ..., 9, A, ..., F) into the corresponding ASCII code (30h, ..., 39h, 41h, ..., 46h)
Give 2 solutions
Example AAA, DAA
AL=37h, BL=32h (in ASCII) The result:
in binary, in ASCII
DAA instruction for BCD numbers
Example AAA
; Example AAA
    MOV AH, 09h
    MOV AL, 05h
    ADD AL, AH
    MOV AH, 0
    AAA
; Example AAS
    MOV AL, 05h
    MOV BL, 09h
    SUB AL, BL      ; AL = FCh (-4)
    AAS             ; convert to BCD, AL = 6
    ADD AL, 30h     ; convert to ASCII
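The effect of AAA and AAS on the two examples can be followed step by step with a small model. This is a sketch of the commonly documented 8086 adjustment rule (adjust by 6 when the low nibble exceeds 9 or AF is set, then clear the high nibble of AL); the function names and the explicit af argument are illustrative.

```python
def aaa(al, ah, af=0):
    """ASCII adjust after addition. Returns (AL, AH, CF)."""
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF
        ah = (ah + 1) & 0xFF
        cf = 1
    else:
        cf = 0
    return al & 0x0F, ah, cf

def aas(al, ah, af=0):
    """ASCII adjust after subtraction. Returns (AL, AH, CF)."""
    if (al & 0x0F) > 9 or af:
        al = (al - 6) & 0xFF
        ah = (ah - 1) & 0xFF
        cf = 1
    else:
        cf = 0
    return al & 0x0F, ah, cf

# AAA example: 05h + 09h = 0Eh, AH cleared, then adjusted to unpacked BCD 01 04.
print(aaa((0x05 + 0x09) & 0xFF, 0x00))
# AAS example: 05h - 09h = FCh with a borrow from the low nibble (AF = 1).
al, ah, cf = aas((0x05 - 0x09) & 0xFF, 0x00, af=1)
print(al, hex(ah), cf, chr(al + 0x30))
```

In the AAS case CF = 1 signals the borrow, so the digit 6 is the ten's complement of the true result -4.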
Examples
1) AX = BX - CX
2) Subtraction on bytes (SBB)
3) (AX,BX) = (AX,BX) - (CX,DX)
4) AL = AL - 2
5) mov bx,0 ; dec bx
On BYTE: AX = AL * operand(8)
On WORD: (DX,AX) = AX * operand(16)
Examples MUL
Ex:
    val1 DB 3
    val2 DW 257
    mov al, 0ah
    mul val1
    mov ax, 100h
    mul val2

    mov al, 8
    mov bl, 7
    mul bl          ; AX = 38h = 56
    aam             ; AX = 0506

Exercises:
AX = 3*AL (mul + add)
AX = 5*AL - 7*BL (2 solutions)
AL = BCD representation of a 2-digit number; obtain in BL its binary representation
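The register contents after the MUL and AAM examples can be verified with a small model. This sketch covers only unsigned multiplication; the function names mul8, mul16 and aam are illustrative, and AAM is modelled with its default base of 10.

```python
def mul8(al, operand):
    """MUL with a byte operand: AX = AL * operand."""
    return (al * operand) & 0xFFFF

def mul16(ax, operand):
    """MUL with a word operand: (DX, AX) = AX * operand."""
    product = ax * operand
    return (product >> 16) & 0xFFFF, product & 0xFFFF

def aam(al):
    """ASCII adjust after multiply: AH = AL / 10, AL = AL mod 10."""
    return al // 10, al % 10

print(hex(mul8(0x0A, 3)))                       # mov al,0ah / mul val1
print([hex(v) for v in mul16(0x100, 257)])      # mov ax,100h / mul val2
ax = mul8(8, 7)                                 # mov al,8 / mov bl,7 / mul bl
print(hex(ax), aam(ax & 0xFF))                  # aam splits 56 into digits 5 and 6
```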
On BYTE: AL = [AX / operand(8)], AH = the remainder
On WORD: AX = [(DX,AX) / operand(16)], DX = the remainder
Examples DIV
AL = AL/3
BL = BL/2 (give 2 solutions)
AL = AL/5 - BL/7
Remarks (AL -> AX)
Examples:
AX = AX - BX (sub ax,bx OR neg bx; add ax,bx)

        mov al, value
        mov bl, 2
        cmp al, bl
        jl  label
        inc bl
label:  inc bl
Conditional JUMP
JUMP IF                          Instruction
less / below                     JL / JB
less or equal / below or equal   JLE / JBE
equal / zero                     JE / JZ
not equal / not zero             JNE / JNZ
greater or equal / above or eq   JGE / JAE
greater / above                  JG / JA
carry / not carry                JC / JNC
sign / not sign                  JS / JNS
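The pairs in this table differ in how the operands are interpreted: "less/greater" jumps treat them as signed, "below/above" jumps as unsigned. A minimal sketch for byte operands follows; the function names are illustrative, and the model compares values directly instead of deriving the result from SF, OF and CF as the hardware does.

```python
def to_signed8(x):
    """Interpret a byte 0..255 as a two's complement value -128..127."""
    return x - 0x100 if x & 0x80 else x

def jl_taken(a, b):
    """After CMP a,b: JL jumps when a < b as signed numbers."""
    return to_signed8(a) < to_signed8(b)

def jb_taken(a, b):
    """After CMP a,b: JB jumps when a < b as unsigned numbers."""
    return a < b

# 0FFh is -1 when signed, 255 when unsigned.
print(jl_taken(0xFF, 0x01), jb_taken(0xFF, 0x01))
```

Choosing the wrong member of a pair is a common bug when a byte can exceed 7Fh.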
Exercises
AL = max(BL, CL)
AL = BL+CL (if DL>0), BL-CL (if DL<0), 0 (otherwise)
DL = 0 if AL is odd, 1 if AL is even
AL = ASCII code (digit from AL)
TEST dest,source
Exercises
    mov  ax, 0abcdh
    and  ax, 0ffh
    or   ah, 0fch
    xor  al, ah
    test al, 1

AL = 0 (if bit 6 of BL = 0), 1 (if bit 6 of BL = 1)
AX = ASCII -> AL = binary
DL = binary -> AX = ASCII
(C6)
INSTRUCTION SET - 3
Instructions for SHIFT
Logic (insert 0s)
  Left:  SHL reg/mem, {1, CL}    1011 0001 -> 0110 0010
  Right: SHR reg/mem, {1, CL}    1011 0001 -> 0101 1000
Arithmetic (SAL inserts 0s; SAR keeps the SIGN bit)
  Left:  SAL reg/mem, {1, CL}
  Right: SAR reg/mem, {1, CL}
Instructions for ROTATE
Without Carry (CY is not rotated)
  Left:  ROL reg/mem, {1, CL}    1011 0001 -> 0110 0011
  Right: ROR reg/mem, {1, CL}    1011 0001 -> 1101 1000
With Carry (CY is included in rotation)
  Left:  RCL reg/mem, {1, CL}    CY=x, 1011 0001 -> CY=1, 0110 001x
  Right: RCR reg/mem, {1, CL}    CY=x, 1011 0001 -> CY=1, x101 1000
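The bit patterns shown for the shifts and rotates can be reproduced with a small model of the 1-bit forms on an 8-bit operand. In this sketch each function returns the new value together with the new carry flag; the function names mirror the mnemonics, but the Python signatures are illustrative.

```python
def shl(v):
    return (v << 1) & 0xFF, v >> 7                # 0 enters at bit 0

def shr(v):
    return v >> 1, v & 1                          # 0 enters at bit 7

def sar(v):
    return (v >> 1) | (v & 0x80), v & 1           # sign bit is replicated

def rol(v):
    cf = v >> 7
    return ((v << 1) & 0xFF) | cf, cf             # bit 7 wraps to bit 0

def ror(v):
    cf = v & 1
    return (v >> 1) | (cf << 7), cf               # bit 0 wraps to bit 7

def rcl(v, cf_in):
    return ((v << 1) & 0xFF) | cf_in, v >> 7      # old carry enters at bit 0

def rcr(v, cf_in):
    return (v >> 1) | (cf_in << 7), v & 1         # old carry enters at bit 7

v = 0b10110001
for name, (result, cf) in [("SHL", shl(v)), ("SHR", shr(v)), ("SAR", sar(v)),
                           ("ROL", rol(v)), ("ROR", ror(v)),
                           ("RCL", rcl(v, 0)), ("RCR", rcr(v, 0))]:
    print(f"{name}: {result:08b} CY={cf}")
```

With a count in CL the same step is simply repeated CL times.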
Examples
a)  mov al,0ffh
    shl al,1
    mov cl,3
    shl al,cl
b)  sal al,1        ; (mul 2)
    sar al,1        ; (div 2)
c)  mov al,0fch
    mov cl,4
    rol al,cl
    shr al,cl
Exercises (1)
1) Store in BL the value of bit a4 from AL
a)  mov cl,3
    shl al,cl
    mov cl,7
    shr al,cl
    mov bl,al
b)  and al,00010000b
    mov cl,4
    shr al,cl
    mov bl,al
Find 2 other solutions!
Exercises (2)
2) AX = ??xy (x, y = hex digits). Obtain AX = 0x0y
    push cx
    mov cl,4
    rol ax,cl       ; AX = ?xy?
    and ah,0Fh      ; AX = 0xy?
    shr al,cl       ; AX = 0x0y
    pop cx
Propose another solution!
Exercises (3)
3) AX = 8*AL - 7*BL
    mov cl,3
    cbw
    sal ax,cl       ; AX = 8*AL
    xchg ax,bx      ; BX = 8*AL, AL = old BL
    cbw
    mov dx,ax       ; DX = BL
    sal ax,cl       ; AX = 8*BL
    sub ax,dx       ; AX = 7*BL
    sub bx,ax       ; BX = 8*AL - 7*BL
    xchg ax,bx
Exercises (4)
4) Count into DL the number of 1 bits in AX
          xor dl,dl
          mov cx,16
nextbit:  rcl ax,1
          jnc zerobit
          inc dl
zerobit:  dec cx
          jnz nextbit
Propose another 2 solutions!
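The loop can be traced in a small model of the 16-bit RCL. The sketch assumes the carry is 0 on entry (XOR DL,DL clears it) and that INC and DEC leave the carry unchanged, as they do on the 8086; the function name is illustrative.

```python
def count_ones_rcl(ax):
    """Count the 1 bits of a 16-bit value by rotating them through the carry."""
    dl = 0
    cf = 0                                   # xor dl,dl clears CF
    for _ in range(16):                      # mov cx,16 ... dec cx / jnz
        new_cf = (ax >> 15) & 1              # rcl ax,1: bit 15 goes to CF
        ax = ((ax << 1) & 0xFFFF) | cf       # old CF enters at bit 0
        cf = new_cf
        if cf:                               # jnc zerobit skips the increment
            dl += 1
    return dl

print(count_ones_rcl(0xABCD))
```

Sixteen rotations through a 17-bit ring (AX plus CF) examine each original bit exactly once, so the bits re-entering at bit 0 are never counted twice.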
Exercises (5)
5) Fill in the first 256 bytes from DS with the values 00, 01, 02, ..., FFh
           mov si,0
           mov cx,256
           xor al,al
NextByte:  mov byte ptr [si],al
           inc al
           inc si
           dec cx
           jnz NextByte
LOOP instruction
Syntax: LOOP label
Equivalent with:
    DEC CX
    JNZ label
Other forms: LOOPE/LOOPZ and LOOPNE/LOOPNZ
Examples
      mov cx,100
      xor ax,ax
et1:  add ax,2
      loop et1
Instructions to control the FLAGS
Examples: ...some examples.
Example: read from keyboard (port 60h) and write to printer (port 378h):
    in  al,60h
    mov dx,378h
    out dx,al
(!!! Some tests are required: keypressed?, printer busy?...later on)
Examples IN/OUT
1) Generate a rectangular signal on D0 from port 300h
2) Control the frequency
3) Give the solution to control the duration of the signal
4) Signal with another form
5) Connect a DAC and generate a triangular signal (maximum frequency)
6) A dynamic light: control the speed
Examples: a) wait for non busy, b) timeout
LODSB(W)
Example LODSB
DX = sum of the 20 bytes starting at DS:SI = 100h
           xor dx,dx
           mov cx,20
           mov si,100h
           cld
nextbyte:  lodsb
           cbw
           add dx,ax
           loop nextbyte
Propose the solution without using LODS
STOSB(W)
Example STOSB
Generate a string of 256 bytes (00, 01, ..., FFh) at the address ES:DI = 300h
      mov di,300h
      xor al,al
      mov cx,256
      cld
sto:  stosb
      inc al
      loop sto
      nop
Propose the solution without using STOSB
MOVSB
Examples MOVSB (transfer 20 bytes from DS:SI=100h to ES:DI=300h)
Without MOVSB:
    mov si,100h
    mov di,300h
    mov cx,20
e:  mov al,[si]
    mov es:[di],al  ; destination is in ES
    inc si
    inc di
    dec cx
    jnz e
With MOVSB:
    mov si,100h
    mov di,300h
    mov cx,20
    cld
e:  MOVSB
    loop e
Obs: the loop can be replaced by REP MOVSB
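Example: return in BX the offset of the first 00 byte found in the string of 100 bytes at the address ES:DI=200h, OR 0FFFFh if not found.
Without SCASB:
    mov  di,200h
    mov  cx,100
    mov  al,0
s:  cmp  al,es:[di]
    je   f0
    inc  di
    loop s
    mov  di,0ffffh  ; not found
f0: mov  bx,di
    nop
With SCASB:
    mov  di,200h
    mov  cx,100
    mov  al,0
    cld
    repne scasb
    je   f          ; ZF=1: a 00 byte was found (the 8086 has JCXZ but no JCXNZ,
                    ; and CX is also 0 when the match is in the last byte)
    mov  di,0       ; not found: DI-1 will give 0FFFFh
f:  dec  di         ; SCASB left DI one past the compared byte
    mov  bx,di
    nop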
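The state that REPNE SCASB leaves behind (DI one past the last byte compared, CX decremented, ZF telling whether the last comparison matched) is easy to get wrong, so a small model may help. The sketch below is Python with hypothetical function names; the extra segment is a bytearray indexed by offset.

```python
def repne_scasb(memory, di, cx, al):
    """Model of REPNE SCASB with DF = 0; returns final (DI, CX, ZF)."""
    zf = False
    while cx != 0:
        zf = memory[di] == al       # scasb: compare AL with ES:[DI]
        di = (di + 1) & 0xFFFF      # DI always advances past the byte
        cx -= 1
        if zf:                      # repne: stop on the first match
            break
    return di, cx, zf


def find_first_zero(memory, start, length):
    """Offset of the first 00 byte, or 0FFFFh if there is none."""
    di, _cx, zf = repne_scasb(memory, start, length, 0)
    return (di - 1) & 0xFFFF if zf else 0xFFFF


segment = bytearray([1] * 0x300)
segment[0x200 + 99] = 0             # match in the very last byte
print(repne_scasb(segment, 0x200, 100, 0), hex(find_first_zero(segment, 0x200, 100)))
```

When the match is in the last byte, CX is already zero on exit, so found and not found can only be told apart by ZF.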
Exercise
Data Acquisition System:
- Control port, RD/WR, 200h (START on D0, active on 1; EOC on D7, active on 0)
- Data port, RD, 300h
- Read 10,000 samples into ES:DI=100h
MACRO - instructions
= a group of instructions identified by a unique name (the name of the MACRO) and interpreted as a new instruction
- a MACRO needs to be defined:
name MACRO param_1, ..., param_n
    .... instructions ....
ENDM
To use MACROs:
- They must be defined first
- They can be used afterwards
Example 1 (without parameters)
MACRO to store/restore the registers
Definitions:
(A)
push_regs MACRO
    push ax
    push bx
    push cx
    push dx
ENDM
(B)
pop_regs MACRO
    pop  dx
    pop  cx
    pop  bx
    pop  ax
ENDM
Usage:
    ....
    push_regs       ; substituted by the above sequence
    ....
    pop_regs
    ....
Discussion: Advantages (+), Disadvantages (-)
Example 2 (with parameters)
MACRO to FILL a string of bytes in ES at the address addr with the value 0 (parameters: address and number of bytes)
Definition:
fill MACRO addr, n
    xor  al,al
    cld
    mov  di, addr
    mov  cx, n
    rep  stosb
ENDM
Usage:
    ....
    fill 100h, 200h ; substituted by the above sequence
    ....
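The phrase "substituted by the above sequence" means the assembler replaces the macro call with the body, putting each argument in place of its formal parameter. The sketch below models only that textual substitution in Python; expand_macro is a hypothetical helper, and a real assembler's macro processor does considerably more (local labels, special operators, nesting).

```python
import re


def expand_macro(params, body_lines, args):
    """Replace whole-word occurrences of each formal parameter with its argument."""
    if len(params) != len(args):
        raise ValueError("wrong number of macro arguments")
    if not params:
        return list(body_lines)
    mapping = dict(zip(params, args))
    pattern = re.compile(r"\b(" + "|".join(re.escape(p) for p in params) + r")\b")
    return [pattern.sub(lambda m: mapping[m.group(1)], line) for line in body_lines]


FILL_BODY = ["xor al,al", "cld", "mov di, addr", "mov cx, n", "rep stosb"]

for line in expand_macro(["addr", "n"], FILL_BODY, ["100h", "200h"]):
    print(line)
```

Because the body is copied at every call site, a macro costs code size on each use but avoids the CALL/RET overhead of a procedure.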
Example 3 (with parameters)
MACRO to add/subtract 2 numbers n1, n2 of 1 byte into AL (AL = n1 +/- n2)
Definition:
compute MACRO n1, operand, n2
    LOCAL minus, final      ; needed if the macro is used more than once
    mov  al, [n1]
    cmp  [operand], '+'
    jne  minus
    add  al, [n2]
    jmp  final
minus:
    sub  al, [n2]
final:
    nop
ENDM
Usage:
a   DB 20h
op  DB '-'
b   DB 10h
    ....
    compute a, op, b        ; substituted by the above sequence
    ....
Exercises (homework)
MACRO for:
- Obtaining in AX the sum of a string of bytes (the offset address in DS and the number of bytes are the parameters)
- Obtaining in AL the maximum of a string of bytes (the address and the number of bytes are the parameters)
- Returning in CX the length of a string of bytes that ends with 00h (the address is passed as a parameter)
CALL - instructions
The need: to replace program sequences that are repeated, OR to group functionality into a single software entity
Example diagram
Types of CALLs:
- Intrasegment (NEAR calls): IP
- Intersegment (FAR calls): CS, IP
Execution Steps
1) CALL is classified as NEAR or FAR
2) NEAR: - save IP on the stack
   FAR:  - save CS on the stack
         - save IP on the stack
3) JUMP to the address CS:IP
4) Execute the sequence, until RET is encountered
5) RET execution:
   NEAR: - get IP from the stack
   FAR:  - get IP from the stack
         - get CS from the stack
6) JUMP to the address CS:IP
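The execution steps above amount to a small amount of bookkeeping on CS, IP and the stack. The sketch below models them in Python; the class and method names are hypothetical, the stack is a plain list, and IP is assumed to already hold the address of the instruction that follows the CALL.

```python
class Cpu8086:
    """Toy model of CS:IP and the stack, just enough for CALL and RET."""

    def __init__(self, cs, ip):
        self.cs = cs
        self.ip = ip
        self.stack = []

    def call_near(self, offset):
        self.stack.append(self.ip)          # save IP
        self.ip = offset                    # jump within the same segment

    def call_far(self, segment, offset):
        self.stack.append(self.cs)          # save CS first
        self.stack.append(self.ip)          # then IP
        self.cs, self.ip = segment, offset  # jump to the new CS:IP

    def ret_near(self):
        self.ip = self.stack.pop()          # restore IP only

    def ret_far(self):
        self.ip = self.stack.pop()          # restore IP, then CS
        self.cs = self.stack.pop()


cpu = Cpu8086(cs=0x1000, ip=0x0103)
cpu.call_far(0x2000, 0x0010)
cpu.ret_far()
print(hex(cpu.cs), hex(cpu.ip))
```

The model also shows why the RET type must match the CALL type: a near RET after a far CALL would leave the saved CS on the stack.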
Procedures
Definition:
namep PROC {NEAR or FAR}
    ... instructions ...
    RET
namep ENDP
Usage:
    ...
    call namep
    ...
PAY ATTENTION !!
Example 1 (find_max in a string; receives in SI the beginning of the string and in CX the length; returns the max in AL and the offset of the max in BX)
find_max PROC NEAR
    mov  al, byte ptr [si]
    mov  bx,si
    dec  cx
c:  cmp  al, byte ptr [si+1]
    jge  Ok
    mov  al, byte ptr [si+1]
    mov  bx,si
    inc  bx
Ok: inc  si
    loop c
    RET
find_max ENDP
Usage:
    ...
    mov  si, offset string1
    mov  cx,length
    call find_max
    ...
Example 2 (strlen returns in CX the length of a string of bytes terminated by 00h; receives in DI the beginning of the string)
strlen PROC
    xor  cx,cx
    xor  al,al
    cld
comp:
    scasb           ; compare AL with ES:[DI], DI = DI+1
    jz   exit
    inc  cx
    jmp  comp
exit:
    ret
strlen ENDP
Usage:
str DB 'a text', 0
    ...
    mov  di, offset str     ; (equiv: lea di,str)
    call strlen
    ...
Allocation of memory for variables
DB - define byte, DW - define word, DD - define double word
Examples:
a   DB  1
b   DB  2
c   DW  1, 2, 3
str DB  'text', 0
len EQU $-str
Usage:
    mov  al, [a]
    mov  si, offset a
    mov  al, [si]
    mov  si, offset str
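How these definitions are laid out in memory can be reproduced with a byte-packing model. The sketch below is Python and uses the standard struct module; it assumes the variables are a DB 1, b DB 2, c DW 1, 2, 3 and str DB 'text', 0, placed one after another from offset 0, with words stored low byte first as on the 8086.

```python
import struct

# a DB 1 / b DB 2 / c DW 1, 2, 3  ("<" = little-endian, B = byte, H = word)
data = struct.pack("<BB3H", 1, 2, 1, 2, 3)
offset_str = len(data)              # str starts right after c
data += b"text\x00"                 # str DB 'text', 0

offsets = {"a": 0, "b": 1, "c": 2, "str": offset_str}
length = len(data) - offset_str     # len EQU $-str

print(data.hex(" "), offsets, length)
```

Each word of c occupies two bytes (01 00, 02 00, 03 00), so str begins at offset 8, and len evaluates to 5 because the terminating 0 is counted.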
Organisation of programs in Assembly Language (*.asm)
    PAGE 60,132
    TITLE ROTIT
    COMMENT * *
STIVA SEGMENT PARA STACK 'STACK'
    DW 256 DUP(?)
STIVA ENDS
;
DATA SEGMENT PARA PUBLIC 'DATE'
a   DB 0
DATA ENDS
CODE SEGMENT PARA PUBLIC 'COD'
MAIN PROC FAR
    ASSUME SS:STIVA,DS:DATA,CS:CODE,ES:NOTHING
    push ds         ; prepare for ret
    xor  ax,ax
    push ax
    mov  ax,DATA    ; prepare DS
    mov  ds,ax
    ; .....
    ret
MAIN ENDP
CODE ENDS
    END MAIN
EXE/COM