Advanced Computer
Architecture
Pages 1-7 Summary
The 80386 Microprocessor
➔ What is the 80386?
◆ It is a 32-bit, high-performance microprocessor used to drive the most advanced
computer applications.
➔ Why is it considered a 32-bit microprocessor?
◆ Because it has 32-bit internal and external data paths.
➔ What does the 80386 incorporate (what are its features)?
◆ It has 5 features:
● Multitasking support.
● Memory management.
● Pipelined architecture.
● Address translation caches.
● High speed bus interface.
➔ Talk about its registers.
◆ The 80386 has 8 general purpose registers, each one with a 32 bit capacity.
➔ How many bits does the physical address use? And how much memory can it address?
◆ It uses 32 bits, and it can address up to 4GB of memory.
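The 4 GB figure follows directly from the address width. A quick check in Python (the function name is only illustrative):

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits

GIB = 2 ** 30  # one gigabyte in the binary sense used by the notes

print(addressable_bytes(32) // GIB)  # prints 4
```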
➔ What is the benefit of a pipelined architecture?
◆ It allows the 80386 to perform instruction fetching, decoding, executing and
memory management in parallel.
➔ How many stages does the pipeline in the 80386 microprocessor have?
◆ The pipeline has 6 stages.
➔ How many units does the 80386 have?
◆ It has 6 main units:
● The Bus Interface Unit.
● The Code Prefetch Unit.
● The Instruction Decoding Unit.
● The Execution Unit, which consists of 3 subunits:
○ The Control Unit.
○ The Data Unit.
○ The Protection Test Unit.
● The Segmentation Unit.
● The Paging Unit.
The Bus Interface Unit
➔ Define the bus interface unit.
◆ It is the unit that provides the interface between the 80386
microprocessor and its environment.
➔ What is its purpose (what does it do)?
◆ It has 4 functions:
● It accepts internal requests for code fetches (from the code prefetch unit)
and data transfers (from the execution unit).
● It prioritizes the requests.
● It generates or processes the signals needed to perform the current bus cycle.
● It controls the interface to external bus masters and coprocessors.
➔ What types of signals does the bus interface unit generate or process?
◆ It generates or processes 3 types of signals:
● Address signals.
● Data signals.
● Control outputs for accessing external memory and I/O.
The Code Prefetch Unit
➔ What is its function?
◆ It performs the program look ahead function. When the bus interface unit is
not busy, the code prefetch unit uses it to fetch sequentially along the
instruction byte stream.
➔ Where does the code prefetch unit store the instructions it fetches?
◆ It stores them in the code queue.
➔ How long is the code queue?
◆ It is 16 bytes long.
➔ Which has the higher priority, code prefetches or data transfers?
◆ Data transfers have the higher priority.
➔ How does the CPU benefit from the code prefetch unit?
◆ It reduces the idle time(the time in which the CPU waits for instructions to
arrive) to practically zero.
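The priority rule above (data transfers are served before code prefetches) can be pictured with a toy arbiter. This is only a sketch of the idea, not the real bus logic: the class name, method names and numeric priority values are assumptions made for illustration.

```python
import heapq

# Lower number = higher priority. Data transfers beat code prefetches,
# as stated in the notes; the numeric values themselves are arbitrary.
PRIORITY = {"data": 0, "prefetch": 1}

class BusArbiter:
    """Toy model of request prioritization in the bus interface unit."""

    def __init__(self):
        self._queue = []
        self._arrival = 0  # tie-breaker: equal-priority requests keep arrival order

    def request(self, kind: str, address: int) -> None:
        heapq.heappush(self._queue, (PRIORITY[kind], self._arrival, kind, address))
        self._arrival += 1

    def next_bus_cycle(self):
        """Return the (kind, address) served next, or None if the bus is idle."""
        if not self._queue:
            return None
        _, _, kind, address = heapq.heappop(self._queue)
        return kind, address
```

A prefetch that was queued first still waits if a data request arrives before the next bus cycle starts.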
The Instruction Decoding Unit
➔ What is its function?
◆ It has 3 functions:
● It takes instruction stream bytes, immediate data and opcode offsets from
the prefetch queue.
● Translates them into microcode.
● Stores them in the instruction queue.
➔ How many levels does the instruction queue have(how deep is it)?
◆ It is three levels deep.
The Execution Unit
➔ What is its function?
◆ It is the unit responsible for executing the instruction taken from the instruction
queue, and therefore needs to communicate with all other units required to
complete the instruction.
➔ What does the control unit contain?
◆ It contains microcode and special parallel hardware.
➔ What does the control unit do?
◆ It speeds up multiply, divide and effective address calculation.
➔ What does the data unit contain?
◆ It contains the ALU, a file of eight 32-bit general purpose registers and a 64-bit
parallel shifter (which performs multiple bit shifts in one clock cycle).
➔ What does the data unit do?
◆ It performs data operations requested by the control unit.
➔ What does the protection test unit do?
◆ It checks for segmentation violations under the control of the microcode.
➔ How does the execution unit speed up the execution of memory reference instructions?
◆ It does so by partially overlapping the execution of any memory reference
instruction with the previous instruction, and since memory reference
instructions are frequent, a performance gain of approximately 9% is achieved.
The Segmentation Unit
➔ What does it do?
◆ It translates logical addresses into linear addresses at the request of the execution
unit.
➔ What is the logical address?
◆ It is the address generated by the CPU as seen by the program, made up of a segment selector and an offset.
➔ What is the linear address? And when does the linear address = the physical address?
◆ It is the address that results from adding the segment base address to the
offset. The linear address is the same as the physical address if the paging unit is
NOT enabled.
➔ What is the purpose of the segment descriptor cache?
◆ It stores the currently used segment descriptors to speed up translation.
➔ Where is the translated linear address sent?
◆ It is sent to the paging unit.
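The two translation steps can be sketched as follows. This is a minimal illustration, assuming the segment base has already been looked up from the descriptor. The function names and the `translate` callback are hypothetical, and segment limit checks are left out.

```python
def linear_address(segment_base: int, offset: int) -> int:
    """Segment base plus offset, kept within 32 bits."""
    return (segment_base + offset) & 0xFFFFFFFF

def physical_address(linear: int, paging_enabled: bool, translate=None) -> int:
    """With paging disabled the linear address is used as the physical address.

    `translate` stands in for the paging unit and is only called when paging is on.
    """
    if not paging_enabled:
        return linear
    return translate(linear)

lin = linear_address(0x00010000, 0x1234)
print(hex(lin))                           # prints 0x11234
print(hex(physical_address(lin, False)))  # prints 0x11234
```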
The Paging Unit
➔ When is the linear address considered the same as the physical address?
◆ When the paging unit is NOT enabled.
➔ What is the use of the page descriptor cache?
◆ It stores recently used page directory and page table entries in its translation
lookaside buffer (TLB) to speed up the translation of linear addresses into physical addresses.
➔ Where is the physical address sent?
◆ It is sent to the BIU(Bus Interface Unit) to perform memory and I/O accesses.
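A rough sketch of what the paging unit does with a linear address. The field widths (10-bit directory index, 10-bit table index, 12-bit offset for 4 KB pages) are the usual 80386 layout but are not stated in these notes. The class, the nested-dictionary page tables and the unbounded TLB are simplifications assumed for illustration.

```python
PAGE_SHIFT = 12  # 4 KB pages (assumed, see lead-in)

def split_linear(linear: int):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    directory_index = (linear >> 22) & 0x3FF
    table_index = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF
    return directory_index, table_index, offset

class PagingUnit:
    def __init__(self, page_directory):
        # page_directory: {directory index: {table index: frame base address}}
        self.page_directory = page_directory
        self.tlb = {}       # page number -> frame base; no eviction in this sketch
        self.tlb_hits = 0

    def translate(self, linear: int) -> int:
        d, t, offset = split_linear(linear)
        page = linear >> PAGE_SHIFT
        if page in self.tlb:
            self.tlb_hits += 1
            frame = self.tlb[page]
        else:
            # A TLB miss costs two table lookups (directory, then page table).
            frame = self.page_directory[d][t]
            self.tlb[page] = frame
        return frame | offset
```

The second access to the same page is served from the TLB, which is where the speed-up comes from.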
The 80386 processing modes
➔ How many processing modes does the 80386 have?
◆ It has 3 modes:
● Protected mode.
● Real address mode.
● Virtual 8086 mode.
➔ What is protected mode?
◆ It is the natural 32-bit environment of the 80386 processor. In this mode all the
instructions and features are available.
➔ What is real address mode?
◆ It is the mode the 80386 enters after a RESET, and it is also used by software during
initialization.
➔ What is virtual 8086 mode?
◆ A dynamic mode used to execute native 8086 applications.
➔ What does it mean that virtual 8086 mode is dynamic?
◆ It means that the processor can switch repeatedly and rapidly between
protected mode and V86 mode.
Intel 80486 Microprocessor
Q/What are the features of the 80486 microprocessor?
A/
● It has a built-in math coprocessor.
● It has an 8 KByte on-chip cache for code and data (inside the CPU).
● It features a parity checker/generator.
● It has an FPU (Floating Point Unit).
Q/What is the difference between the math coprocessor of 80386 and the math
coprocessor of the 80486?
A/
The math coprocessor of 80486(which is known as 80487) is built in and integrated with the
80486, which allows math instructions to be executed 3 times faster than the 386/387
combination.
Q/How is the part number of a math coprocessor obtained?
A/
The part number of the math coprocessor is obtained by adding 1 to the part number of the microprocessor.
For example, the math coprocessor for the 8086 microprocessor is the 8087, and for the 80386 it is the
80387.
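The naming rule is simple enough to write down directly. A small sketch (the function name is made up for illustration):

```python
def coprocessor_name(processor: str) -> str:
    """Apply the 'add 1 to the part number' rule described above."""
    return str(int(processor) + 1)

print(coprocessor_name("8086"))   # prints 8087
print(coprocessor_name("80386"))  # prints 80387
```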
Q/Why does it have 8 KByte code and data caches?
A/
To speed up the execution of instructions and the acquisition of data.
Q/How is its paging different from that of the 80386?
A/
The 80486 paging system can disable caching for any section of translated memory pages, while the 80386 cannot.
FPU
Q/What is it, what does it provide, what data types does it support, and what are its
features?
A/
● FPU stands for Floating Point Unit.
● It provides high-performance floating-point processing capabilities.
● It supports real, integer and BCD-integer data types.
● It supports the floating-point processing algorithms and exception handling defined in
the IEEE 754 and 854 standards for floating-point arithmetic.
Q/In what applications is the FPU commonly used? And why?
A/
It is used in scientific, engineering and business applications because it improves
efficiency in handling high-precision floating-point operations.
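To make the "real" data type concrete, here is a short sketch that splits an IEEE 754 single-precision value into its three fields (1 sign bit, 8 exponent bits, 23 fraction bits). The function name is illustrative, and Python's `struct` module stands in for the hardware encoding.

```python
import struct

def decode_single(value: float):
    """Return (sign, biased exponent, fraction) of the IEEE 754 single-precision encoding."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # stored with a bias of 127
    fraction = bits & 0x7FFFFF       # the 23 bits after the implicit leading 1
    return sign, exponent, fraction

print(decode_single(1.0))   # prints (0, 127, 0)
print(decode_single(-2.5))  # prints (1, 128, 2097152)
```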
Intel Pentium Processor
● What technology is used in manufacturing the Intel Pentium Pro?
○ 0.6 µm technology.
● What is the superscalar factor?
○ The maximum number of instructions that can be completed in a clock cycle.
● How can the architecture of the Pentium Pro processor be divided?
○ It can be divided into 4 units and a memory subsystem:
■ Memory system can be divided into:
● Buses: System Bus
● Caches: L1 cache(Instruction and data), L2 cache and memory
reorder buffer.
● Interface units: Bus Interface Unit and Memory Interface Unit
■ Fetch/Decode Unit:
● Instruction Fetch Unit
● Branch target buffer
● Instruction decoder
● Microcode sequencer
● Register alias table unit
■ Instruction Pool: made of reorder buffer
■ Dispatch/Execute Unit:
● Reservation station
● 2 integer units
● 2 floating point units
● 2 address generation units
■ Retirement Unit:
● Retire Unit
● Retirement register file
● What is the difference between L1 and L2 caches?
○ L1 caches reside inside the CPU, while L2 caches reside outside of it.
● What are the differences between the Intel Pentium and the Pentium Pro?
                            Intel Pentium    Intel Pentium Pro
Superscalar Factor          2                3
Pipeline                    5 stages         Decoupled, 12 stages
Data Paths                  32 bit           64 bit
Dynamic Branch Prediction   Same             Same
Memory Subsystem
● What is a transaction oriented bus?
○ A bus that handles each bus access as a separate request and response.
● How many access operations on the caches can the BIU handle?
○ 4 concurrent access operations
● How is the L1 instruction cache organized?
○ It is organized as a 4-way set-associative cache.
● How is the L1 data cache organized?
○ It is organized as a 2-way set-associative cache (it can support one load and one store
operation per clock cycle).
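What "N-way set associative" means for an address can be shown with a little arithmetic. The cache size (8 KB) and line size (32 bytes) used below are assumptions for illustration, not figures taken from these notes, and the function names are hypothetical.

```python
def number_of_sets(size_bytes: int, line_bytes: int, ways: int) -> int:
    """Each set holds `ways` lines, so sets = size / (line size * ways)."""
    return size_bytes // (line_bytes * ways)

def split_address(address: int, line_bytes: int, sets: int):
    """Return (tag, set index, byte offset within the line)."""
    offset = address % line_bytes
    set_index = (address // line_bytes) % sets
    tag = address // (line_bytes * sets)
    return tag, set_index, offset

# Assumed geometry: 8 KB cache, 32-byte lines.
print(number_of_sets(8192, 32, 4))       # prints 64  (4-way)
print(number_of_sets(8192, 32, 2))       # prints 128 (2-way)
print(split_address(0x12345, 32, 64))    # prints (36, 26, 5)
```

An address can only live in the set selected by its index, but in any of that set's ways, which is why the tag must be compared against every way of the set.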
● How is coherency maintained between the caches and the memory subsystem?
○ By using the MESI cache protocol.
● What does MESI stand for?
○ It stands for Modified, Exclusive, Shared and Invalid.
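The four states are easier to remember as a small state machine for one cache line in one cache. This is a textbook-style sketch of MESI, not the exact Pentium Pro bus behavior. The event names and the `others_have_copy` flag are assumptions made for illustration.

```python
# Events seen by one cache for one line:
#   "local_read" / "local_write": issued by this cache's own processor
#   "snoop_read" / "snoop_write": observed on the bus from another cache
def next_state(state: str, event: str, others_have_copy: bool = False) -> str:
    if event == "local_read":
        if state == "I":
            # A miss: the line arrives Shared if another cache holds it, else Exclusive.
            return "S" if others_have_copy else "E"
        return state  # M, E and S all satisfy reads without a state change
    if event == "local_write":
        return "M"    # other copies (if any) must be invalidated first
    if event == "snoop_read":
        # A Modified line is written back before it is shared.
        return "I" if state == "I" else "S"
    if event == "snoop_write":
        return "I"    # another cache takes ownership of the line
    raise ValueError(f"unknown event: {event}")

print(next_state("I", "local_read"))   # prints E
print(next_state("E", "local_write"))  # prints M
print(next_state("M", "snoop_read"))   # prints S
```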
● What is the function of the memory reorder buffer?
○ It works as a scheduling and dispatch station, and it is able to reorder memory accesses.
● Why does the memory reorder buffer reorder memory accesses?
○ To prevent blocking and improve throughput.
Fetch/Decode Unit
● Where does it read the instructions from?
○ From the L1 instruction cache. It then converts them into a series of micro-ops.
● How many bytes can it fetch per clock cycle?
○ 32 bytes.
● How many bytes does it transfer to the decoder?
○ 16 aligned bytes.
● How does it calculate the instruction pointer?
○ Based on inputs from the branch target buffer, the interrupt status and
branch prediction indications.
● What is branch prediction? And which buffer performs it?
○ The process in which the processor tries to predict whether a branch
instruction will be taken or not, based on the past history of that branch. The
branch target buffer performs branch prediction.
● How many entries does the branch target buffer allow?
○ 512 entries.
● How is branch prediction achieved?
○ By looking ahead of the retirement program counter using Yeh’s algorithm.
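Yeh's algorithm is generally described as two-level adaptive prediction: the recent outcomes of a branch select a small saturating counter, and that counter gives the prediction. The sketch below is a simplified model in that spirit. The 4-bit history length, the dictionary-based tables and the class name are assumptions for illustration, not details of the real branch target buffer.

```python
class TwoLevelPredictor:
    def __init__(self, history_bits: int = 4):
        self.mask = (1 << history_bits) - 1
        self.history = {}   # branch address -> recent outcomes as a bit pattern
        self.counters = {}  # (branch address, pattern) -> 2-bit saturating counter

    def predict(self, branch: int) -> bool:
        pattern = self.history.get(branch, 0)
        # Counters start at 1 ("weakly not taken"); 2 or 3 means predict taken.
        return self.counters.get((branch, pattern), 1) >= 2

    def update(self, branch: int, taken: bool) -> None:
        pattern = self.history.get(branch, 0)
        key = (branch, pattern)
        counter = self.counters.get(key, 1)
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
        self.counters[key] = counter
        # Shift the actual outcome into this branch's history register.
        self.history[branch] = ((pattern << 1) | int(taken)) & self.mask
```

Because the history pattern selects the counter, a branch that strictly alternates between taken and not taken becomes predictable after a short warm-up, which a single counter per branch could not achieve.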
● How many decoders does the instruction decoder have?
○ 3 parallel decoders(2 simple and 1 complex).
● What does each decoder do?
○ It converts each instruction into triadic micro-ops (2 logical sources and 1 logical
destination).
● How many micro-ops can each instruction be decoded to?
○ From 1 to 4 micro-ops for each instruction.
● How many micro-ops can the instruction decoder generate per clock cycle?
○ 6 per clock cycle.
● How many general purpose registers are there?
○ 40 internal, which can handle both integer and floating point.
● What is the purpose of the Register Alias Table Unit?
○ Converts the logical register references into physical register references.
● What does the allocator in the Register Alias Table Unit do?
○ It adds status bits and flags to the micro-ops to allow out-of-order execution
and sends the micro-ops to the instruction pool
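Register renaming can be sketched in a few lines. The pool of 40 physical registers comes from the notes. Everything else (the class, the simple free list, and the omission of freeing registers at retirement) is a simplification assumed for illustration.

```python
class RegisterAliasTable:
    def __init__(self, logical_registers, physical_count: int = 40):
        self.free = list(range(physical_count))  # unused physical registers
        self.mapping = {}                        # logical name -> physical register
        for name in logical_registers:
            self.mapping[name] = self.free.pop(0)

    def rename(self, dest: str, src1: str, src2: str):
        """Rename one triadic micro-op; returns (dest, src1, src2) as physical registers."""
        # Sources read the current mapping...
        p1, p2 = self.mapping[src1], self.mapping[src2]
        # ...and the destination gets a fresh physical register, so a later write
        # to the same logical register does not clash with earlier readers.
        pd = self.free.pop(0)
        self.mapping[dest] = pd
        return pd, p1, p2

rat = RegisterAliasTable(["EAX", "EBX", "ECX"])
print(rat.rename("EAX", "EBX", "ECX"))  # prints (3, 1, 2)
print(rat.rename("EBX", "EAX", "ECX"))  # prints (4, 3, 2)
```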
● How many execution units does the processor have?
○ 6 parallel units
Instruction Pool
● Define it.
○ An array of content-addressable memory (set-associative cache) organized into 40
micro-op registers.
● What does it contain?
○ It contains micro-ops that are waiting to be executed, in addition to those
that have been executed but not yet committed to the machine state.
Dispatch/Execute Unit
● What is its function?
○ Schedules and executes micro-ops stored in the reorder buffer(instruction pool)
depending on data dependencies and resource availability.
● What is the function of the reservation station?
○ It is responsible for scheduling and dispatching of micro-ops from the reorder
buffer.
● What if 2 micro-ops of the same type are available at the same time?
○ The reorder buffer follows a FIFO algorithm to execute them.
● How many instructions can we schedule per clock cycle? And why?
○ We can schedule 5 because we have 2 integer units, 2 floating point units and 1
memory interface unit.
● How is a branch misprediction detected?
○ One of the integer units detects it and sends a signal to the branch target buffer
to restart the pipeline.
● What does the memory interface unit do?
○ It executes one load and one store operation per clock cycle.
Retirement Unit
● What is its function?
○ It commits the results of previously executed micro-ops to the permanent
machine state and removes them from the reorder buffer.
● How many micro-ops can it retire per clock cycle?
○ 3 micro-ops per clock cycle.
● Where does it write the results to?
○ It writes the results to the retirement register file and/or memory.
● How many registers does the retirement register file contain?
○ It contains 8 general purpose registers and 8 floating point data registers.
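In-order retirement can be modeled with a queue. The limit of 3 micro-ops per clock comes from the notes. The dictionary layout of a micro-op and the function name are assumptions made for this sketch.

```python
from collections import deque

RETIRE_WIDTH = 3  # micro-ops retired per clock cycle, as stated above

def retire_cycle(reorder_buffer: deque, register_file: dict) -> list:
    """Retire up to RETIRE_WIDTH finished micro-ops, oldest first, in program order."""
    retired = []
    while reorder_buffer and len(retired) < RETIRE_WIDTH:
        uop = reorder_buffer[0]
        if not uop["done"]:
            break  # the oldest micro-op is unfinished, so younger ones must wait
        reorder_buffer.popleft()
        register_file[uop["dest"]] = uop["value"]  # commit to the permanent state
        retired.append(uop["name"])
    return retired
```

Micro-ops may finish executing out of order, but their results only become visible in program order, which is what keeps the machine state precise.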
Instruction Set Architecture Features
● What type of instruction set architecture does the Pentium Pro have?
○ It is a CISC (Complex Instruction Set Computer) architecture.
● How does it achieve high performance?
○ By using many organizational features from RISC (Reduced Instruction Set
Computer) designs.
● Into how many groups are the Intel Architecture instructions divided?
○ They are divided into 4 groups:
■ Integer
■ MMX Technology
■ Floating Point
■ System Instructions
● What do the integer instructions perform?
○ They perform:
■ Arithmetic
■ Logic
■ Program Flow
● What types of integer instructions are there?
○ There are many types, such as:
■ Data transfer instructions (PUSH, POP, MOV, etc.)
■ Arithmetic instructions (ADD, ADC, SUB, SBB)
■ Decimal arithmetic instructions (DAA, DAS)
■ Logic instructions (AND, OR, XOR, NOT)
■ Shift and rotate instructions (SAR, SHR)
● On what data types do the MMX technology instructions operate?
○ They operate on the following data types:
■ Packed byte
■ Packed word
■ Packed doubleword
■ Quadword
● How are the MMX technology instructions grouped?
○ They are grouped into the following subgroups:
■ MMX conversion instructions
■ MMX packed arithmetic instructions
■ MMX comparison instructions
■ MMX logic instructions
■ MMX shift and rotate instructions
■ MMX state management
● Which unit executes the floating point instructions?
○ They are executed by the processor's FPU (Floating Point Unit).
● What data types do the floating point instructions operate on?
○ They operate on the following data types:
■ Floating Point (Real)
■ Extended integer
■ BCD (Binary Coded Decimal)
● What are the types of floating point instructions?
○ They include many different types, such as:
■ Data transfer(FLD, FST)
■ Basic arithmetic(FADD, FADDP)
■ Comparison(FCOM, FCOMP)
● What are the system instructions?
○ They are used to control the functions of the processor that are provided to support
operating systems and executives.
Intel Pentium III Processor
● What is the Intel Pentium III processor?
○ It is essentially a Pentium II running at a higher speed.
● What sets it apart from the Pentium II processor?
○ It has 2 features that set it apart from the Pentium II processor:
■ The Processor Serial Number (also called Chip ID)
■ Streaming SIMD Extensions (SSE)
● What is the Processor Serial Number (or the Chip ID)?
○ It is a unique identifier burned into the Pentium III processor. It has many
applications in security, manageability and information management.
● How is the Processor Serial Number (Chip ID) used in security?
○ It can be used in the following ways:
■ Allowing only the authorized people to access the confidential
information
■ Used by applications to add another layer of identifications thus
increasing confidentiality
■ Strengthening protection for consumer web sites that want to keep a section
open only to family members
■ Used by business to add a level of validation to electronic signature
approvals
● Why is the Processor Serial Number (Chip ID) used instead of the MAC address or
BIOS GUID?
○ Because both the MAC address and the BIOS GUID can be erased, making them less
reliable than the Processor Serial Number (Chip ID), which can NOT be
erased once it is burned.
● How is the Processor Serial Number (Chip ID) used in information management?
○ It is used in various tasks such as:
■ Finding multiple copies of virus-infected documents
■ Tracking change information
■ Delivering customized information to the end user
● What do SISD and SIMD stand for?
○ SISD = Single Instruction and Single Data
○ SIMD = Single Instruction and Multiple Data
● What are the differences between MMX and SSE?
MMX                                               SSE
SIMD instructions for integers                    SIMD instructions for single-precision floating-point numbers
Operate on two 32-bit integers simultaneously     Operate on four 32-bit floats simultaneously
No new registers were defined                     Eight new registers were defined
Cannot be used in 3D graphics                     Can be used in 3D graphics
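The SISD/SIMD distinction can be imitated in plain Python. This only models the idea of one operation acting on four packed values at once. The function names are hypothetical, and real SSE code would use machine instructions rather than lists.

```python
def scalar_add(a: float, b: float) -> float:
    """SISD style: one instruction, one pair of operands."""
    return a + b

def packed_add(a, b):
    """SIMD style: one (modeled) instruction adds four floats lane by lane."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("a packed operand holds exactly four floats in this model")
    return [x + y for x, y in zip(a, b)]

xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.5, 0.5, 0.5, 0.5]

# SISD needs four separate additions; the SIMD model needs one packed addition.
print([scalar_add(x, y) for x, y in zip(xs, ys)])  # prints [1.5, 2.5, 3.5, 4.5]
print(packed_add(xs, ys))                          # prints [1.5, 2.5, 3.5, 4.5]
```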
Multicore Architecture
● What is a multicore processor?
○ It is a single computing component with 2 or more independent actual
processing units called cores
● What are cores?
○ They are the units which read and execute program instructions.
● Why do we use multicore architecture?
○ Because it increases overall speed: the cores can run multiple
instructions at the same time, which is a form of parallel computing.
● What is a CMP?
○ A CMP is a Chip MultiProcessor: a multicore processor whose cores are integrated onto a single integrated circuit die.
● How are the cores integrated?
○ They are either integrated as:
■ Single integrated circuit die (Known as CMP)
■ Multiple dies in a single chip package
● How many cores can a multicore processor have?
○ It can have the following:
■ Dual core (2 cores)
■ Tri core (3 cores)
■ Quad core (4 cores)
■ Hexa core (6 cores)
■ Octa core (8 cores)
■ Deca core (10 cores)
● How are the cores coupled together?
○ They are coupled in one of 2 ways:
■ Tightly: they share the same cache(s)
■ Loosely: each one has its own cache(s)
● What are the inter-core communication methods?
○ There are 2 methods:
■ Message Passing
■ Shared Memory
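The two communication methods can be contrasted with a small sketch in which Python threads stand in for cores. The threads are used only to show the communication pattern, not real multicore speed-up, and the function names are made up for illustration.

```python
import queue
import threading

def sum_with_shared_memory(chunks):
    """Workers update one shared variable, protected by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:              # shared data needs synchronization
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

def sum_with_message_passing(chunks):
    """Workers share nothing and send their partial results as messages."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in chunks)

work = [[1, 2, 3], [4, 5], [6]]
print(sum_with_shared_memory(work))    # prints 21
print(sum_with_message_passing(work))  # prints 21
```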
● How are the cores connected with each other?
○ There are many ways to connect cores:
■ Bus
■ Ring
■ Two-dimensional mesh
■ Crossbar
● What are homogeneous and heterogeneous multicore systems?
○ Homogeneous refers to systems that have only identical cores.
○ Heterogeneous refers to systems that have cores that are NOT identical.