IA-32 Architecture
Computer Organization
&
Assembly Language Programming
Adapted from the slides prepared by Kip
Irvine for the book, Assembly Language
for Intel-Based Computers, 5th Ed.
Outline
Intel Microprocessors
IA-32 Registers
Instruction Execution Cycle
IA-32 Memory Management
IA-32 Architecture Computer Organization and Assembly Language
slide 2/45
Basics
IA-32 Architecture refers to systems based on 32-bit
processors generally compatible with the Intel® Pentium®
II processor (for example, the Intel® Pentium® 4 processor
or Intel® Xeon® processor), or processors from other
manufacturers supporting the same instruction set,
running a 32-bit operating system.
Intel® 64 Architecture refers to systems based on IA-32
architecture processors that have 64-bit architectural
extensions (for example, the Intel® Core™2 processor
family), running a 64-bit operating system such as
Microsoft Windows XP* Professional x64 Edition or
Microsoft Windows Vista* x64.
Intel Microprocessors
Intel introduced the 8086 microprocessor in 1978
8086, 8087, 8088, and 80186 processors
16-bit processors with 16-bit registers
16-bit data bus and 20-bit address bus
Physical address space = 2^20 bytes = 1 MB
8087 Floating-Point co-processor
Uses segmentation and real-address mode to address memory
Each segment can address 2^16 bytes = 64 KB
8088 is a less expensive version of 8086
Uses an 8-bit data bus
80186 is a faster version of 8086
Intel 80286 and 80386 Processors
80286 was introduced in 1982
24-bit address bus: 2^24 bytes = 16 MB address space
Introduced protected mode
Segmentation in protected mode is different from the real mode
80386 was introduced in 1985
First 32-bit processor with 32-bit general-purpose registers
First processor to define the IA-32 architecture
(short for "Intel Architecture, 32-bit", sometimes also called i386)
32-bit data bus and 32-bit address bus
2^32 bytes = 4 GB address space
Introduced paging, virtual memory, and the flat memory model
Segmentation can be turned off
Memory segmentation
Segmentation logically divides the main memory of the
computer into segments, each with its own base address.
On the 8086 it lets 16-bit registers address a 20-bit (1 MB)
physical address space and keeps code, data, and stack in
separate regions of memory.
Further reading:
Need for segmentation
Advantages of segmentation
Intel 80486 and Pentium Processors
80486 was introduced in 1989
Improved version of Intel 80386
On-chip Floating-Point unit (DX versions)
On-chip unified Instruction/Data Cache (8 KB)
Uses Pipelining: can execute up to 1 instruction per clock cycle
Pentium (80586) was introduced in 1993
Wider 64-bit data bus, but address bus is still 32 bits
Two execution pipelines: U-pipe and V-pipe (simple instructions)
Superscalar performance: can execute 2 instructions per clock cycle
Separate 8 KB instruction and 8 KB data caches
MMX instructions (later models) for multimedia applications, a single
instruction, multiple data (SIMD) instruction set extension
Intel P6 Processor Family
P6 Processor Family: Pentium Pro, Pentium II and III
Pentium Pro was introduced in 1995
Three-way superscalar: can execute 3 instructions per clock cycle
36-bit address bus: up to 64 GB of physical address space
Introduced dynamic execution
Out-of-order and speculative execution
Integrates a 256 KB second-level (L2) cache in the processor package
Pentium II was introduced in 1997
Added MMX instructions (already introduced on Pentium MMX)
Pentium III was introduced in 1999
Added SSE instructions and eight new 128-bit XMM registers
Streaming SIMD Extensions (SSE) is a single instruction, multiple data (SIMD) instruction set extension to the x86 architecture
Pentium 4 and Xeon Family
Pentium 4 is a seventh-generation x86 architecture
Introduced in 2000
New micro-architecture design called Intel Netburst
Very deep instruction pipeline, scaling to very high frequencies
Introduced the SSE2 instruction set (extension to SSE)
Tuned for multimedia and operating on the 128-bit XMM registers
In 2002, Intel introduced Hyper-Threading technology
Allowed 2 programs to run simultaneously, sharing resources
Xeon is Intel's name for its server-class microprocessors
Xeon chips generally have more cache
Support larger multiprocessor configurations
Pentium-M and EM64T
Pentium M (Mobile) was introduced in 2003
Designed for low-power laptop computers
Modified version of Pentium III, optimized for power efficiency
Large second-level cache (2 MB on later models)
Runs at a lower clock rate than the Pentium 4, but with better performance per clock
Extended Memory 64-bit Technology (EM64T)
Introduced in 2004
64-bit superset of the IA-32 processor architecture
64-bit general-purpose registers and integer support
Number of general-purpose registers increased from 8 to 16
64-bit pointers and flat virtual address space
Large physical address space: up to 2^40 bytes = 1 terabyte
Intel MicroArchitecture History
Complete processor history
https://www.computerhope.com/history/processor.htm
Intel Core MicroArchitecture
64-bit cores
Wide dynamic execution (execute four instructions
simultaneously)
Intelligent power capability (power gating)
Advanced smart cache (shares L2 cache between cores)
Smart memory access (memory disambiguation)
Advanced digital media boost
See the demo at
http://www.intel.com/technology/architecture/coremicro/demo/demo.htm?iid=tech_core+demo
CISC and RISC
CISC – Complex Instruction Set Computer
Large and complex instruction set
Variable width instructions
Requires microcode interpreter
Each instruction is decoded into a sequence of micro-operations
Example: Intel x86 family
RISC – Reduced Instruction Set Computer
Small and simple instruction set
All instructions have the same width
Simpler instruction formats and addressing modes
Decoded and executed directly by hardware
Examples: ARM, MIPS, PowerPC, SPARC, etc.
Next ...
Intel Microprocessors
IA-32 Registers
Instruction Execution Cycle
IA-32 Memory Management
Basic Program Execution Registers
Registers are high-speed storage locations inside the CPU
Eight 32-bit general-purpose registers
Six 16-bit segment registers
Processor Status Flags (EFLAGS) and Instruction Pointer (EIP)
32-bit General-Purpose Registers: EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI
16-bit Segment Registers: CS, SS, DS, ES, FS, GS
EFLAGS and EIP
General-Purpose Registers
Used primarily for arithmetic and data movement
mov eax, 10    ; move constant 10 into register eax
Specialized uses of Registers
EAX – Accumulator register
Automatically used by multiplication and division instructions
ECX – Counter register
Automatically used by LOOP instructions
ESP – Stack Pointer register
Used by PUSH and POP instructions, points to top of stack
ESI and EDI – Source Index and Destination Index register
Used by string instructions
EBP – Base Pointer register
Used to reference parameters and local variables on the stack
Accessing Parts of Registers
EAX, EBX, ECX, and EDX are 32-bit Extended registers
Programmers can access their 16-bit and 8-bit parts
The lower 16 bits of EAX are named AX
AX is further divided into
AL = lower 8 bits
AH = upper 8 bits
ESI, EDI, EBP, ESP have only
16-bit names for their lower halves (SI, DI, BP, SP)
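The register naming above can be modeled with bit masks. The following Python sketch is only an illustration of which bits each name covers; the helper names (ax, al, ah, set_al) are hypothetical and simply mirror the register names.

```python
def ax(eax: int) -> int:
    """Return the lower 16 bits of a 32-bit EAX value (the AX part)."""
    return eax & 0xFFFF


def al(eax: int) -> int:
    """Return the lowest 8 bits (the AL part)."""
    return eax & 0xFF


def ah(eax: int) -> int:
    """Return bits 8..15 (the AH part, the upper half of AX)."""
    return (eax >> 8) & 0xFF


def set_al(eax: int, value: int) -> int:
    """Model writing AL: only the lowest 8 bits of EAX change."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


eax = 0x12345678
print(hex(ax(eax)), hex(ah(eax)), hex(al(eax)))  # 0x5678 0x56 0x78
```

Writing to AL or AH leaves the remaining bits of EAX unchanged, which is what set_al models.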
Special-Purpose & Segment Registers
EIP = Extended Instruction Pointer
Contains address of next instruction to be executed
EFLAGS = Extended Flags Register
Contains status and control flags
Each flag is a single binary bit
Six 16-bit Segment Registers
Support segmented memory
Six segments accessible at a time
Segments contain distinct contents
Code
Data
Stack
EFLAGS Register
Status Flags
Status of arithmetic and logical operations
Control and System flags
Control the CPU operation
Programs can set and clear individual bits in the EFLAGS register
Status Flags
Carry Flag
Set when unsigned arithmetic result is out of range
Overflow Flag
Set when signed arithmetic result is out of range
Sign Flag
Copy of sign bit, set when result is negative
Zero Flag
Set when result is zero
Auxiliary Carry Flag
Set when an addition produces a carry from the low nibble (lowest four bits) into the high nibble (upper four bits)
of the low-order 8 bits of the result, or when a subtraction produces a borrow from the high nibble into the low nibble
Parity Flag
Set (1) when the low-order 8 bits of the result of an arithmetic or logical operation have even parity (an even
number of 1 bits); cleared (0) otherwise
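The flag definitions above can be checked on small cases. The following Python sketch models an 8-bit addition and derives the six status flags from it; the function name add8_flags and the dictionary layout are hypothetical, and the model covers addition only.

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and model the resulting status flags."""
    a &= 0xFF
    b &= 0xFF
    full = a + b              # untruncated sum, may need 9 bits
    result = full & 0xFF      # what an 8-bit register would hold
    return {
        "result": result,
        # Carry: the unsigned result does not fit in 8 bits.
        "CF": int(full > 0xFF),
        # Overflow: both operands differ in sign from the result.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
        # Sign: copy of the most significant bit of the result.
        "SF": (result >> 7) & 1,
        # Zero: the result is zero.
        "ZF": int(result == 0),
        # Auxiliary carry: carry out of the low nibble.
        "AF": int(((a & 0x0F) + (b & 0x0F)) > 0x0F),
        # Parity: even number of 1 bits in the low-order byte.
        "PF": int(bin(result).count("1") % 2 == 0),
    }


print(add8_flags(0xFF, 0x01))
print(add8_flags(0x7F, 0x01))
```

The first call is out of range as an unsigned sum (255 + 1) and sets the Carry flag but not the Overflow flag; the second is out of range as a signed sum (127 + 1) and sets the Overflow flag but not the Carry flag.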
Floating-Point, MMX, XMM Registers
Floating-point unit performs high speed FP operations
Eight 80-bit floating-point data registers
ST(0), ST(1), . . . , ST(7)
Arranged as a stack
Used for floating-point arithmetic
Eight 64-bit MMX registers
Used with MMX instructions
Eight 128-bit XMM registers
Used with SSE instructions
Registers in Intel Core Microarchitecture
Next ...
Intel Microprocessors
IA-32 Registers
Instruction Execution Cycle
IA-32 Memory Management
Fetch-Execute Cycle
Each machine language instruction is first fetched from the memory
and stored in an Instruction Register (IR).
The address of the instruction to be fetched is stored in a register
called Program Counter or simply PC. In some computers this
register is called the Instruction Pointer or IP.
After the instruction is fetched, the PC (or IP) is incremented to point
to the address of the next instruction.
The fetched instruction is decoded (to determine what needs to be
done) and executed by the CPU.
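The cycle described above can be sketched as a loop. The Python model below is a toy machine, not IA-32: the opcodes (LOAD, ADD, JMP, HALT), the single register A, and the function name run are all hypothetical and exist only to show the PC and IR at work.

```python
def run(program):
    """Run a toy program and return the final register contents."""
    registers = {"A": 0}
    pc = 0                          # program counter (instruction pointer)
    while True:
        ir = program[pc]            # fetch: copy instruction into the IR
        pc += 1                     # PC now points at the next instruction
        opcode, operand = ir        # decode: split into opcode and operand
        if opcode == "LOAD":        # execute
            registers["A"] = operand
        elif opcode == "ADD":
            registers["A"] += operand
        elif opcode == "JMP":
            pc = operand            # a jump overwrites the PC
        elif opcode == "HALT":
            return registers
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


program = [
    ("LOAD", 10),   # address 0
    ("ADD", 5),     # address 1
    ("JMP", 4),     # address 2: skip the instruction at address 3
    ("ADD", 100),   # address 3: never executed
    ("HALT", None), # address 4
]
print(run(program))  # {'A': 15}
```

Note that the PC is incremented before the instruction executes, so a jump simply replaces the already-incremented value.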
Instruction Execute Cycle
Instruction Fetch: obtain instruction from program storage
Instruction Decode: determine required actions and instruction size
Operand Fetch: locate and obtain operand data
Execute: compute result value and status
Result Writeback: deposit results in storage for later use
These five steps repeat in an infinite cycle
Instruction Execution Cycle – cont'd
Stages: Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Writeback
[Figure: datapath of the instruction execution cycle. Labels: PC, program memory (I1, I2, I3, I4, ...), fetch, instruction register (I1), decode, registers, read op1 and op2, ALU, execute, write to registers and flags, (output)]
Pipelined Execution
Instruction execution can be divided into stages
Pipelining makes it possible to start an instruction before
completing the execution of previous one
For k stages and n instructions, the number of required cycles with pipelined execution is: k + n - 1
[Figure: stage-by-cycle chart for stages S1 to S6 over cycles 1 to 12. In non-pipelined execution, I-1 occupies cycles 1 to 6 and I-2 occupies cycles 7 to 12, one stage per cycle, so the remaining stages sit idle (wasted clock cycles).]
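The cycle counts can be compared directly. The Python sketch below assumes every stage takes exactly one clock cycle; the k * n figure for non-pipelined execution follows from that assumption (each instruction passes through all k stages before the next one starts), and the function names are hypothetical.

```python
def non_pipelined_cycles(k: int, n: int) -> int:
    """Each instruction uses all k stages before the next one starts."""
    return k * n


def pipelined_cycles(k: int, n: int) -> int:
    """The first instruction needs k cycles; each later one finishes
    one cycle after its predecessor."""
    return k + n - 1 if n > 0 else 0


k = 6
for n in (1, 2, 100):
    print(n, non_pipelined_cycles(k, n), pipelined_cycles(k, n))
```

For the six-stage example, two instructions need 12 cycles without pipelining and 7 with it. As n grows, the ratio of the two counts approaches k.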
Wasted Cycles (pipelined)
When one of the stages requires two or more clock
cycles to complete, clock cycles are again wasted
Assume that stage S4 is the execute stage
Assume also that S4 requires 2 clock cycles to complete
As more instructions enter the pipeline, wasted cycles occur
For k stages, where one stage requires 2 cycles, n
instructions require k + 2n - 1 cycles

Cycle  S1   S2   S3   S4 (exe)  S5   S6
1      I-1
2      I-2  I-1
3      I-3  I-2  I-1
4           I-3  I-2  I-1
5                I-3  I-1
6                     I-2       I-1
7                     I-2            I-1
8                     I-3       I-2
9                     I-3            I-2
10                              I-3
11                                   I-3
Superscalar Architecture
A superscalar processor has multiple execution pipelines
The Pentium processor has two execution pipelines
Called U and V pipes
In the following, stage
S4 has 2 pipelines
Each pipeline still
requires 2 cycles
Second pipeline
eliminates wasted cycles
For k stages and n
instructions, number of
cycles = k + n
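Both cycle formulas, k + 2n - 1 for a single two-cycle execute stage and k + n for a duplicated one, can be checked with a small simulation. The Python sketch below is a simplified model under stated assumptions: instructions issue in order, every stage other than the slow one takes one cycle, and an instruction waiting for a busy stage does not block the stages behind it. The function name and parameters are hypothetical.

```python
def pipeline_cycles(k, n, slow_stage=None, slow_latency=2, units=1):
    """Cycles needed to push n instructions through a k-stage pipeline.

    slow_stage   -- 0-based index of the multi-cycle stage, or None
    slow_latency -- cycles that stage needs per instruction
    units        -- number of parallel pipelines inside that stage
    """
    if n == 0:
        return 0
    latency = [1] * k       # cycles each stage needs per instruction
    capacity = [1] * k      # instructions a stage can hold at once
    if slow_stage is not None:
        latency[slow_stage] = slow_latency
        capacity[slow_stage] = units
    # start[i][s] = first cycle (0-based) instruction i spends in stage s
    start = [[0] * k for _ in range(n)]
    for i in range(n):
        for s in range(k):
            # The instruction must have finished the previous stage...
            ready = start[i][s - 1] + latency[s - 1] if s > 0 else 0
            # ...and one unit of this stage must be free again.
            prev = i - capacity[s]
            free = start[prev][s] + latency[s] if prev >= 0 else 0
            start[i][s] = max(ready, free)
    return start[n - 1][k - 1] + latency[k - 1]


print(pipeline_cycles(6, 3))                          # 8  = k + n - 1
print(pipeline_cycles(6, 3, slow_stage=3))            # 11 = k + 2n - 1
print(pipeline_cycles(6, 3, slow_stage=3, units=2))   # 9  = k + n
```

With two units in S4, a new instruction can enter the execute stage every cycle even though each unit is busy for two cycles, which is why the wasted cycles disappear.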
Next ...
Intel Microprocessors
IA-32 Registers
Instruction Execution Cycle
IA-32 Memory Management
Modes of Operation
Real-Address mode (original mode provided by 8086)
Only 1 MB of memory can be addressed, from 0 to FFFFF (hex)
Programs can access any part of main memory
MS-DOS runs in real-address mode
Protected mode
Each program can address a maximum of 4 GB of memory
The operating system assigns memory to each running program
Programs are prevented from accessing each other’s memory
Native mode used by Windows NT, 2000, XP, and Linux
Virtual 8086 mode
Processor runs in protected mode, and creates a virtual 8086
machine with 1 MB of address space for each running program
Memory Segmentation
A segment is a logical unit of memory that may be up to 64
kilobytes long. Each segment is made up of contiguous
memory locations. It is an independent, separately
addressable unit. The starting address of a segment is not
fixed; it is set by the value loaded into a segment register.
Segmentation cont.
Need for Segmentation: the Bus Interface Unit (BIU) of the
8086 contains four 16-bit special-purpose registers, called
segment registers.
Code Segment register (CS): addresses the code segment of
memory, where the executable program is stored.
Data Segment register (DS): points to the data segment of
memory, where the data is stored.
Extra Segment register (ES): refers to an additional data
segment in memory.
Stack Segment register (SS): addresses the stack segment of
memory, which is used to store stack data.
Memory Segmentation
Memory segmentation is necessary since 20-bit memory
addresses cannot fit in the 16-bit CPU registers
Since 8086 registers are 16 bits wide, a memory segment is made of
2^16 consecutive bytes (i.e. 64 KB)
Each segment has a number identifier that is also a 16-bit number
(i.e. segments are numbered from 0 to FFFFh)
A memory location within a memory segment is referenced by
specifying its offset from the start of the segment. Hence the first
byte in a segment has an offset of 0 while the last one has an offset
of FFFFh
To reference a memory location its logical address has to be
specified. The logical address is written as:
Segment number: offset
For example, A43F:3487h means offset 3487h within segment
A43Fh.
Segmentation rules
Rules of Segmentation: segmentation follows these rules:
The starting address of a segment must be evenly divisible
by 16.
The minimum size of a segment is 16 bytes and the maximum
is 64 KB.
Further reading: advantages of segmentation.
Program Segments
Machine language programs usually have 3 different parts stored in
different memory segments:
Instructions: This is the code part and is stored in the code segment
Data: This is the data part which is manipulated by the code and is
stored in the data segment
Stack: The stack is a special memory buffer organized as Last-In-First-
Out (LIFO) structure used by the CPU to implement procedure calls
and as a temporary holding area for addresses and data. This data
structure is stored in the stack segment
The segment numbers for the code segment, the data segment, and
the stack segment are stored in the segment registers CS, DS, and
SS, respectively.
Program segments do not need to occupy the whole 64 KB
of a segment
Real Address Mode
A program can access up to six segments
at any time
Code segment
Stack segment
Data segment
Extra segments (up to 3)
Each segment is 64 KB
Logical address
Segment = 16 bits
Offset = 16 bits
Linear (physical) address = 20 bits
Logical to Linear Address Translation
Linear address = Segment × 10 (hex) + Offset
Example:
segment = A1F0 (hex)
offset = 04C0 (hex)
logical address = A1F0:04C0 (hex)
what is the linear address?
Solution:
  A1F00   (append 0 to segment in hex)
+  04C0   (offset in hex)
= A23C0   (20-bit linear address in hex)
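The translation rule can be written as a shift and an add, since multiplying by 10 (hex) is a 4-bit left shift. The Python sketch below is a minimal illustration; the function name linear_address is hypothetical, and the final mask to 20 bits is an assumption that models the 8086's 20-bit address bus.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-address mode: linear = segment * 10h + offset."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    # Assumption: results beyond 20 bits wrap around, as on a CPU
    # with only 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF


print(f"{linear_address(0xA1F0, 0x04C0):05X}")  # A23C0
```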
Segment Overlap
Segments overlap heavily in main memory.
Because segments overlap, logical addresses are not unique.
Your turn . . .
What linear address corresponds to logical address
028F:0030?
Solution: 028F0 + 0030 = 02920 (hex)
Always use hexadecimal notation for addresses
What logical address corresponds to the linear address
28F30h?
Many different segment:offset (logical) addresses can
produce the same linear address 28F30h. Examples:
28F3:0000, 28F2:0010, 28F0:0030, 28B0:0430, . . .
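To see how many logical addresses share one linear address, the pairs can be enumerated. The Python sketch below is illustrative only; the function name logical_addresses is hypothetical, and it ignores pairs that would reach the address only by wrapping past the top of memory.

```python
def logical_addresses(linear: int):
    """Yield every (segment, offset) pair with segment * 16 + offset == linear."""
    for segment in range(0x10000):
        offset = linear - (segment << 4)
        if 0 <= offset <= 0xFFFF:       # offset must fit in 16 bits
            yield segment, offset


pairs = list(logical_addresses(0x28F30))
print(len(pairs))                       # 4096
print(", ".join(f"{s:04X}:{o:04X}" for s, o in pairs[-3:]))
```

A new segment can start every 16 bytes while each segment spans 64 KB, so an address in the middle of memory lies inside 64 KB / 16 = 4096 different segments.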
Flat Memory Model
Modern operating systems effectively turn segmentation off
Each program uses one 32-bit linear address space
Up to 2^32 bytes = 4 GB of memory can be addressed
Segment registers are defined by the operating system
All segments are mapped to the same linear address space
In assembly language, the .MODEL flat directive
indicates the flat memory model
A linear address is also called a virtual address
The operating system maps virtual addresses onto physical addresses
using a technique called paging
Programmer View of Flat Memory
Same base address for all segments
All segments are mapped to the same linear address space
EIP Register
Points at next instruction
ESI and EDI Registers
Contain data addresses
Used also to index arrays
ESP and EBP Registers
ESP points at top of stack
EBP is used to address parameters and variables on the stack
[Figure: linear address space of a program (up to 4 GB), divided into CODE, DATA, STACK, and Unused regions. EIP holds a 32-bit address in CODE; ESI and EDI hold 32-bit addresses in DATA; EBP and ESP hold 32-bit addresses in STACK. CS, DS, SS, and ES: base address = 0 for all segments.]
Protected Mode Architecture
Logical address consists of
16-bit segment selector (CS, SS, DS, ES, FS, GS)
32-bit offset (EIP, ESP, EBP, ESI, EDI, EAX, EBX, ECX, EDX)
Segment unit translates logical address to linear address
Using a segment descriptor table
Linear address is 32 bits (called also a virtual address)
Paging unit translates linear address to physical address
Using a page directory and a page table
Logical to Linear Address Translation
Upper 13 bits of the segment selector are used to index
the descriptor table
TI = Table Indicator
Selects the descriptor table
0 = Global Descriptor Table
1 = Local Descriptor Table
[Figure: logical to linear address translation through the descriptor table located by GDTR or LDTR]
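The selector layout can be decoded with shifts and masks. The Python sketch below is illustrative; the function name decode_selector is hypothetical. The slide covers only the index and TI fields; the lowest two bits, the requested privilege level (RPL), are included here for completeness and are not discussed in the slides.

```python
def decode_selector(selector: int):
    """Split a 16-bit segment selector into (index, table, rpl)."""
    selector &= 0xFFFF
    index = selector >> 3               # upper 13 bits: descriptor index
    ti = (selector >> 2) & 1            # bit 2: table indicator
    rpl = selector & 0x3                # bits 0..1: requested privilege level
    table = "LDT" if ti else "GDT"
    return index, table, rpl


print(decode_selector(0x0023))  # (4, 'GDT', 3)
print(decode_selector(0x000F))  # (1, 'LDT', 3)
```

Thirteen index bits allow up to 8192 descriptors per table.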
Segment Descriptor Tables
Global descriptor table (GDT)
Only one GDT table is provided by the operating system
GDT table contains segment descriptors for all programs
Also used by the operating system itself
Table is initialized during boot up
GDT table address is stored in the GDTR register
Modern operating systems (Windows-XP) use one GDT table
Local descriptor table (LDT)
Another choice is to have a unique LDT table for each program
LDT table contains segment descriptors for only one program
LDT table address is stored in the LDTR register
Segment Descriptor Details
Base Address
32-bit number that defines the starting location of the segment
32-bit Base Address + 32-bit Offset = 32-bit Linear Address
Segment Limit
20-bit number that specifies the size of the segment
The size is specified either in bytes or multiple of 4 KB pages
Using 4 KB pages, segment size can range from 4 KB to 4 GB
Access Rights
Whether the segment contains code or data
Whether the data can be read-only or read & written
Privilege level of the segment to protect its access
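The base and limit fields combine as follows. The Python sketch below is a simplified model, not the actual 64-bit descriptor encoding: the class name, field names, and the use of ValueError in place of a processor exception are all assumptions, and it treats the size as limit + 1 units while ignoring access rights.

```python
from dataclasses import dataclass


@dataclass
class SegmentDescriptor:
    base: int             # 32-bit starting address of the segment
    limit: int            # 20-bit limit field
    page_granular: bool   # True: limit counts 4 KB pages; False: bytes

    def size_bytes(self) -> int:
        """Segment size implied by the limit field."""
        units = (self.limit & 0xFFFFF) + 1
        return units * 4096 if self.page_granular else units

    def linear_address(self, offset: int) -> int:
        """32-bit base + 32-bit offset = 32-bit linear address."""
        if not 0 <= offset < self.size_bytes():
            # Stand-in for the fault real hardware would raise.
            raise ValueError("offset exceeds segment limit")
        return (self.base + offset) & 0xFFFFFFFF


flat = SegmentDescriptor(base=0, limit=0xFFFFF, page_granular=True)
print(flat.size_bytes() == 2**32)            # True: a 4 GB segment
print(hex(flat.linear_address(0x12345678)))  # 0x12345678
```

With a base of 0 and the maximum page-granular limit, the linear address equals the offset, which is the situation the flat memory model relies on.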
Segment Visible and Invisible Parts
Visible part = 16-bit Segment Register
CS, SS, DS, ES, FS, and GS are visible to the programmer
Invisible Part = Segment Descriptor (64 bits)
Automatically loaded from the descriptor table
Paging
Paging divides the linear address space into fixed-size
blocks called pages; IA-32 uses 4 KB pages
Operating system allocates main memory for pages
Pages can be spread all over main memory
Pages in main memory can belong to different programs
If main memory is full then pages are stored on the hard disk
OS has a Virtual Memory Manager (VMM)
Uses page tables to map the pages of each running program
Manages the loading and unloading of pages
As a program is running, CPU does address translation
Page fault: issued by CPU when page is not in memory
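Address translation with 4 KB pages can be sketched as a table lookup. The Python model below assumes the basic IA-32 two-level scheme, in which a 32-bit linear address splits into a 10-bit page directory index, a 10-bit page table index, and a 12-bit offset. Nested dictionaries stand in for the in-memory tables, and the names (split_linear, translate, PageFault) are hypothetical.

```python
PAGE_SIZE = 4096


class PageFault(Exception):
    """Raised when the referenced page is not present in memory."""


def split_linear(addr: int):
    """Split a 32-bit linear address into (directory, table, offset)."""
    directory = (addr >> 22) & 0x3FF    # top 10 bits
    table = (addr >> 12) & 0x3FF        # next 10 bits
    offset = addr & 0xFFF               # low 12 bits: byte within page
    return directory, table, offset


def translate(page_directory: dict, addr: int) -> int:
    """Map a linear address to a physical address, or raise PageFault."""
    d, t, offset = split_linear(addr)
    page_table = page_directory.get(d)
    if page_table is None or t not in page_table:
        raise PageFault(hex(addr))
    frame = page_table[t]               # physical page frame number
    return frame * PAGE_SIZE + offset


# One present page: linear page 1 is held in physical frame 0x2A.
page_directory = {0: {1: 0x2A}}
print(hex(translate(page_directory, 0x1234)))  # 0x2a234
```

A lookup that finds no entry raises PageFault here; on a real system the operating system would handle the fault by loading the page from disk and retrying the access.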
Paging – cont’d
The operating system uses page tables to map the pages in the linear
virtual address space onto main memory
Each running program has its own page table
Pages that cannot fit in main memory are stored on the hard disk
The operating system swaps pages between memory and the hard disk
As a program is running, the processor translates the linear virtual addresses
onto real memory (also called physical) addresses
[Figure: linear virtual address space of Program 1 (Page 0, Page 1, Page 2, ..., Page m) and of Program 2 (Page 0, Page 1, Page 2, ..., Page n), mapped onto Main Memory and the Hard Disk]