Homework 2, Q3
Computer Architecture CIS 655/CSE661
Instructor: Dr. Mo Abdallah
Author name: Milen Dimitrov
Created on: 07/14/2021
Syracuse University, College of Engineering & Computer Science
Question 3: [30 points] Write a review report about
computer pipelining. The report must be more than 2500
words in any format. Use your own words.
1. Introduction
In the field of computer architecture, instruction pipelining is a technique that implements ILP (instruction-level parallelism) within a single CPU core. The purpose of a pipeline is to keep all parts of the processor busy with multiple instructions at the same time: incoming instructions are split into a sequence of stages, the "pipeline", which are executed by different CPU circuit units so that different parts of several instructions run at the same time, in parallel.
2. Concepts
In a single-CPU computer, only one task can be executed at a time; to simulate many tasks, computers do context switching. To execute multiple instructions at the same time, multi-processor or multi-core systems are usually used. But there is also another way to increase the throughput of a single CPU: make it run parts of several instructions at the same time, so that no part of the CPU is left idle. In a computer with a pipeline, instructions flow through the CPU in stages. For example, one stage may be performed per clock cycle: instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), and write back to the registers (WB). Pipelined computers usually have "pipeline registers" after each stage. These registers temporarily store the instruction information and intermediate results so that the next stage's logic can work on them in the following cycle, and they allow one stage to read its inputs while another stage writes its results during the same clock cycle.
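To make the stage-by-stage flow concrete, here is a minimal Python sketch of instructions moving through the five stages, one stage per clock cycle (the instruction names are invented for illustration and no hazards are modeled):

    # Minimal sketch: instructions advance one stage per clock cycle.
    STAGES = ["IF", "ID", "EX", "MEM", "WB"]

    def run_pipeline(instructions):
        """Print which stage each instruction occupies on every clock cycle."""
        total_cycles = len(instructions) + len(STAGES) - 1
        for cycle in range(total_cycles):
            occupancy = []
            for i, instr in enumerate(instructions):
                stage = cycle - i            # instruction i enters IF on cycle i
                if 0 <= stage < len(STAGES):
                    occupancy.append(f"{instr}:{STAGES[stage]}")
            print(f"cycle {cycle + 1:2d}: " + "  ".join(occupancy))

    run_pipeline(["ADD", "SUB", "LOAD", "STORE"])

Once the pipeline is full, one instruction finishes in every cycle, which is exactly the effect described above.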
This architecture allows the CPU to complete one instruction every clock cycle. Often the even stages run on one edge of the clock and the odd stages run on the other edge. At a given clock frequency this gives more throughput than a multi-cycle computer, but it may increase the latency of an individual instruction because of the additional overhead of the pipeline registers themselves. Moreover, even if the electronic elements have a fixed maximum speed, a pipelined computer can be made faster by changing the number of stages in the pipeline: the more stages there are, the less work each stage has to do, so each stage has less gate delay and the whole pipeline can run at a higher clock rate (higher frequency).
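A small numerical illustration of this trade-off (all timings below are invented, not measured on any real CPU): assume an unpipelined datapath that needs 1000 ps per instruction, split into 5 stages with 20 ps of pipeline-register overhead each.

    # Invented numbers illustrating the throughput vs. latency trade-off.
    unpipelined_time_ps = 1000        # one instruction in a single long cycle
    stages = 5
    register_overhead_ps = 20         # extra delay added by each pipeline register

    stage_time_ps = unpipelined_time_ps / stages + register_overhead_ps   # 220 ps
    latency_ps = stage_time_ps * stages          # 1100 ps: a single instruction got slower
    throughput_speedup = unpipelined_time_ps / stage_time_ps              # ~4.5x instructions/second

    print(f"cycle time {stage_time_ps:.0f} ps, "
          f"instruction latency {latency_ps:.0f} ps, "
          f"throughput speedup {throughput_speedup:.2f}x")

The clock can run roughly 4.5 times faster, so throughput rises, while each individual instruction actually takes slightly longer than before.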
When the cost is measured in logic gates per instruction per second, the pipelined computer is usually the most economical design. At each moment an instruction occupies only one pipeline stage, so on average a pipeline stage is cheaper than a multi-cycle computer. In addition, when done well, most of a pipelined computer's logic is in use most of the time; in contrast, out-of-order computers usually have large amounts of idle logic at any given moment. Similar calculations usually indicate that a pipelined computer also uses less energy per instruction.
Pipelined computers are usually more complex and more costly than comparable multi-cycle computers: they have more transistors, registers and gates, and more complex control blocks. For the same reason they may use more total energy, even though each instruction uses less energy. Out-of-order CPUs can usually execute more instructions per second because they can execute several instructions at once. In a pipelined computer, the control block arranges for the instructions to start, continue and stop according to the program. Instruction data is usually passed from one stage to the next through the pipeline registers, and each stage has somewhat separate control logic. The control block also ensures that the operation of one stage does not interfere with the instructions in the other stages; the situations where such interference can occur are called hazards. For example, a data hazard can occur when the same piece of data must be used in two stages, and the control logic must ensure that it is used in the correct order.
When running efficiently, a pipelined CPU has one instruction in each stage and processes all of them at the same time, so it can complete approximately one instruction per clock cycle. But when the program switches to a different instruction sequence, the pipeline sometimes must discard the instructions already in flight and restart; the resulting pause is called a stall (and the discarding itself is often called a flush). Much of the design effort in a pipelined computer goes into preventing interference between stages and reducing such stalls.
Pipelines most often have around 3 to 7 stages, but extreme cases exist. IBM used 3 stages in the fifties and sixties of the last century; the classic RISC architecture has 5 stages; Atmel AVR and PIC microcontrollers use 2-stage pipelines. The Intel Pentium 4 has 20 stages, and some Intel Xeon cores even 31. An extreme example is the Xelerated X10q with more than a thousand stages, although those are really small cores dedicated to specific instructions.
3. History
Pioneering uses of pipelining were in the ILLIAC II project and the IBM Stretch project, although a simple form was used earlier in the Z1 in 1939 and the Z3 in 1941. The Atanasoff-Berry Computer (ABC) of John Vincent Atanasoff and Clifford Berry, one of the earliest electronic digital computers, also used a form of pipelining.
Pipelining began in earnest in supercomputers such as vector processors and array processors in the late 1970s. One of the early supercomputers was the Cyber series built by Control Data Corporation; its principal architect, Seymour Cray, later headed Cray Research. Cray developed the X-MP line of supercomputers, which used pipelining for the multiply and add/subtract functions. Later, Star Technologies added parallelism (several pipelined functions working in parallel), developed by Roger Chen, and in 1984 it added a pipelined divide circuit developed by James Bradley. By the mid-1980s, pipelining was used by many different companies around the world.
Pipelining is not limited to supercomputers. In 1976, Amdahl's 470 series general-purpose mainframes had a 7-stage pipeline and a patented branch prediction circuit. Intel introduced pipelining with the 80486 in 1989, and the Pentium of 1993 had two pipelines.
4. Hazards
Using a pipeline to handle several instructions at the same time can cause problematic situations when an instruction depends on a previous one. These situations are called hazards, and there are a few well-known classes: structural hazards, data hazards (of several types, reviewed later) and control hazards, which involve branch instructions.
4.1 Structural hazard (sometimes called a resource hazard)
Structural hazards occur when two instructions try to use the same hardware resource at the same time. As an example, consider that several instructions are ready to enter the execute stage and there is only one ALU. One solution to this kind of hazard is to increase the available resources, for example by adding multiple ports to main memory or multiple ALUs.
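As a hedged sketch of the same idea in code (the instruction mix and the assumption of a single shared memory port are invented for illustration): when a load or store occupies the only memory port in its MEM stage, the fetch of a later instruction cannot use the port in the same cycle and must wait.

    # Simplified model of a structural hazard on a single shared memory port.
    # In a 5-stage pipeline, instruction i is in MEM on the same cycle in which
    # instruction i + 3 wants to be in IF; both need the one memory port.
    def port_conflict_stalls(instructions):
        stalls = 0
        for i, instr in enumerate(instructions):
            uses_data_port = instr in ("LOAD", "STORE")
            fetch_pending = i + 3 < len(instructions)
            if uses_data_port and fetch_pending:
                stalls += 1        # the later fetch is delayed by one cycle
        return stalls              # (knock-on delays from earlier stalls are ignored)

    program = ["LOAD", "ADD", "SUB", "STORE", "ADD", "LOAD", "ADD", "ADD"]
    print("stall cycles caused by the shared memory port:", port_conflict_stalls(program))

Adding a second memory port (for example separate instruction and data caches), as suggested above, removes these conflicts entirely.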
4.2 Data hazard
Data hazards occur when an instruction scheduled in the pipeline tries to use data before it is available in the registers, or, in other words, when instructions that have a data dependence modify data in different stages of the pipeline. If not handled properly, potential data hazards can lead to race conditions (a.k.a. race hazards).
There are three possible data hazards: a true dependency, read after write (RAW); an anti-dependency, write after read (WAR); and an output dependency, write after write (WAW).
True data dependency hazard, read after write (instruction 2, I2, tries to read a source before instruction 1, I1, has written it): a read-after-write hazard refers to the situation where an instruction refers to a result that has not yet been calculated or retrieved. This can happen because, even though one instruction executes after a previous one, the previous instruction has only been partially processed in the pipeline. For example, the first instruction calculates a value to be stored in, say, register R2, and the second instruction uses this value to calculate a result for register R3. But when the pipeline fetches the operands for the second instruction, the result of the first one has not yet been saved, so there is a data dependence.
Anti-dependency data hazard, write after read (I2 tries to write the destination before I1 has read it): a write-after-read hazard indicates a problem with concurrent execution. If I2 may complete before I1, it must be ensured that I2 does not store its result into the register before I1 has had a chance to read its operand.
Output dependency data hazard, write after write (I2 tries to write an operand before I1 has written it): write-after-write hazards may occur in a concurrent execution environment; the write-back of I2 must be delayed until I1 completes its execution.
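The three cases can be spotted mechanically. The following Python sketch (with an invented three-operand instruction format) scans a short program and reports RAW, WAR and WAW dependences between instructions that are close enough to overlap in the pipeline:

    # Sketch: classify data hazards between pairs of nearby instructions.
    # Each instruction is (name, destination register, source registers).
    def classify_hazards(instructions, window=2):
        hazards = []
        for i, (n1, d1, s1) in enumerate(instructions):
            for j in range(i + 1, min(i + 1 + window, len(instructions))):
                n2, d2, s2 = instructions[j]
                if d1 in s2:
                    hazards.append(f"RAW: {n2} reads {d1} written by {n1}")
                if d2 in s1:
                    hazards.append(f"WAR: {n2} writes {d2} read by {n1}")
                if d1 == d2:
                    hazards.append(f"WAW: {n2} and {n1} both write {d1}")
        return hazards

    program = [
        ("I1", "R2", ["R1", "R3"]),   # R2 = R1 + R3
        ("I2", "R4", ["R2", "R5"]),   # R4 = R2 + R5  -> RAW on R2
        ("I3", "R5", ["R6", "R7"]),   # R5 = R6 + R7  -> WAR on R5 with I2
        ("I4", "R4", ["R8", "R9"]),   # R4 = R8 + R9  -> WAW on R4 with I2
    ]
    for h in classify_hazards(program):
        print(h)

Only the RAW case is a true data dependence; the WAR and WAW cases exist only because the same register names are reused, which is why register renaming (discussed in section 4.4) can remove them.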
4.3 Control hazard (a.k.a. instruction hazard or branch hazard)
A control hazard occurs when the pipeline makes a wrong branch prediction and therefore brings into the pipeline instructions that must later be discarded.
4.4 Methods to eliminate or decrease pipeline hazards
The classic 5-stage RISC pipeline avoids these hazards by adding hardware. In particular, branch and jump instructions could use the ALU to calculate the target address of the branch; but if the ALU were used for that purpose in the decode stage, an ALU instruction followed by a branch would mean two instructions trying to use the ALU at the same time. It is relatively easy to resolve this conflict by putting a dedicated branch-target adder in the decode stage, so that at least for calculating the branch offset and indexing the memory address there is no need to use the main ALU.
In the classic, standard 5-stage RISC pipeline, data hazards can be reduced or avoided in several ways: bypassing (operand forwarding), out-of-order execution, or a simple pipeline bubble of NOP instructions. The last one is the worst and a last-resort solution, because it introduces delays: some CPU stages do nothing useful and resources are wasted. Operand forwarding is costly, because many additional gates and registers are needed. Reordering of instructions has to be done by the compiler and the programmers; to help with reordering, register renaming may be used, which means the compiler uses different general-purpose registers even when the program reuses the same register (a minimal sketch of this idea follows below). That is one of the reasons why more and more general-purpose registers are needed.
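Here is a minimal sketch of the register renaming idea (the physical register names P0, P1, ... and the instruction format are invented): every write gets a fresh register, so only the true read-after-write dependences remain.

    # Sketch of compiler-side register renaming: give every new value a fresh
    # register so that WAR and WAW hazards (name dependences) disappear.
    from itertools import count

    def rename(instructions):
        """instructions: list of (dest, src1, src2) architectural register names."""
        fresh = (f"P{i}" for i in count())      # unlimited pool of new names
        current = {}                            # architectural name -> latest new name
        renamed = []
        for dest, src1, src2 in instructions:
            s1 = current.get(src1, src1)        # read the latest version of each source
            s2 = current.get(src2, src2)
            d = next(fresh)                     # every write gets a brand-new register
            current[dest] = d
            renamed.append((d, s1, s2))
        return renamed

    program = [
        ("R1", "R2", "R3"),   # R1 = R2 op R3
        ("R4", "R1", "R5"),   # reads R1 (true dependence, kept)
        ("R1", "R6", "R7"),   # reuses R1 -> WAR/WAW only by name, removed by renaming
    ]
    for before, after in zip(program, rename(program)):
        print(before, "->", after)

After renaming, the second instruction still reads the value produced by the first (the true dependence), but the reuse of R1 by the third instruction no longer conflicts with anything.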
Bypassing is also called operand forwarding. It is additional hardware (transistors and gates) that allows a result to be sent directly from the ALU output of the previous instruction to the input of the next instruction, before the value is saved in the destination register (it will still be saved later; it is just used ahead of time). With forwarding, the fetch and decode stages can issue the dependent instruction in the very next cycle, without waiting for the write-back. Another method is pipeline interlocking, where hardware detects the hazard and stalls the earlier pipeline stages.
A data hazard of this kind can easily be detected when the machine code of the program is generated by the compiler. The Stanford MIPS machine therefore relied on the compiler to add NOP instructions instead of having the circuitry detect the hazard and (more laboriously) stall the first two pipeline stages, hence the name MIPS: Microprocessor without Interlocked Pipeline Stages. It turned out that the extra NOP instructions added by the compiler enlarged the program binaries and thereby reduced the instruction cache hit rate. The stall hardware, although expensive, was reintroduced in later designs to improve the instruction cache hit rate, at which point the acronym no longer made sense.
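A small sketch comparing the two approaches, under the simplifying assumption of a 5-stage pipeline in which a result written in WB can be read in the same cycle (so a dependent neighbour needs two bubbles without forwarding and none with EX-to-EX forwarding); the register names are invented:

    # Compare bubbles (NOPs / interlock stalls) with and without operand forwarding.
    def bubbles(program, forwarding):
        """program: list of (dest, sources). Return bubbles inserted before each instruction."""
        result = []
        for i, (dest, sources) in enumerate(program):
            bubble = 0
            if i > 0 and program[i - 1][0] in sources:   # depends on the previous result
                bubble = 0 if forwarding else 2          # wait for WB, or forward from EX
            result.append(bubble)
        return result

    program = [("R2", ["R1", "R3"]),    # R2 = R1 + R3
               ("R4", ["R2", "R5"])]    # needs R2 from the previous instruction
    print("bubbles without forwarding:", bubbles(program, forwarding=False))
    print("bubbles with forwarding:   ", bubbles(program, forwarding=True))

The compiler-inserted NOPs of the Stanford MIPS correspond to the two bubbles in the first case; forwarding hardware makes them unnecessary at the cost of extra gates.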
Control hazards
Control hazards are caused by conditional and unconditional branches. The classic 5-stage RISC pipeline resolves branches in the decode stage, which means the branch resolution takes two cycles. This has three consequences:
Branch resolution goes through quite a lot of circuitry: the instruction cache read, the register file read, the branch condition computation (which involves a 32-bit comparison on a MIPS CPU), and the next-instruction-address multiplexer.
Because branch and jump targets are calculated in parallel with the register read, RISC ISAs usually do not have instructions that branch to a register + offset address.
On any taken branch, the instruction immediately following the branch is always fetched from the instruction cache. If this instruction is discarded, there is a one-cycle IPC penalty per taken branch, which is significant.
There are four common solutions to this performance problem with branches:
Predict not taken: the instruction after the branch is always fetched from the instruction cache, but it is only executed if the branch is not taken. If the branch is not taken, the pipeline stays full. If the branch is taken, the instruction is flushed (marked as a NOP), and one cycle's opportunity to finish an instruction is lost.
Branch likely: the instruction after the branch is always fetched from the instruction cache, but it is only executed if the branch is taken. The compiler can always fill the branch delay slot on such a branch, and because branches are taken more often than not, the IPC penalty of these branches is smaller than that of the previous kind.
Branch delay slot: the instruction after the branch is always fetched from the instruction cache and always executed, even if the branch is taken. Instead of an IPC penalty for some fraction of branches, either taken (perhaps 60%) or not taken (perhaps 40%), the branch delay slot gives an IPC penalty only for those branches into which the compiler could not schedule a useful delay-slot instruction. The designers of SPARC, MIPS and MC88K designed a branch delay slot into their ISAs.
Branch prediction: in parallel with fetching each instruction, guess whether the instruction is a branch or a jump and, if so, guess the target. On the cycle after a branch or jump, the instruction at the guessed target is fetched. When the guess is wrong, the incorrectly fetched instructions are flushed. A minimal sketch of one simple dynamic prediction scheme follows below.
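As a minimal sketch of the fourth option, here is a simple dynamic predictor with one 2-bit saturating counter per branch address, a common textbook scheme (the branch address and the outcome trace are invented):

    # Sketch of dynamic branch prediction with a 2-bit saturating counter per branch.
    class TwoBitPredictor:
        def __init__(self):
            self.counters = {}                      # branch address -> counter in 0..3

        def predict(self, pc):
            return self.counters.get(pc, 1) >= 2    # >= 2 means "predict taken"

        def update(self, pc, taken):
            c = self.counters.get(pc, 1)
            self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)

    predictor = TwoBitPredictor()
    outcomes = [True, True, True, False, True, True, True, False]   # e.g. a short loop
    mispredictions = 0
    for taken in outcomes:
        if predictor.predict(pc=0x400) != taken:
            mispredictions += 1                     # each miss costs a pipeline flush
        predictor.update(pc=0x400, taken=taken)
    print(f"mispredictions: {mispredictions} out of {len(outcomes)} branches")

Each misprediction corresponds to a flush of the wrongly fetched instructions, so a higher prediction hit rate directly reduces the IPC penalty.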
Delayed branches were controversial, first of all because of their complicated semantics: a delayed branch specifies that the jump to the new location happens after the next instruction, which is the instruction the instruction cache unavoidably loads after the branch.
Delayed branches have also been criticized as a poor short-term choice in instruction set architecture design: compilers often find it difficult to find logically independent instructions to place after the branch (the slot after the branch is called the delay slot), so they must insert NOPs into the delay slots.
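A sketch of what such a compiler pass might look like (the instruction representation is invented and the check is deliberately simplified): try to move one earlier instruction that does not affect the branch condition into the delay slot, otherwise fall back to a NOP.

    # Sketch of filling a branch delay slot with an independent earlier instruction.
    # (A real compiler would also check dependences with the instructions being hopped over.)
    def fill_delay_slot(body, branch_sources):
        """body: list of (name, dest, sources) before the branch.
        Return (remaining body, delay-slot instruction)."""
        for i in reversed(range(len(body))):
            name, dest, sources = body[i]
            if dest not in branch_sources:          # must not change the branch condition
                return body[:i] + body[i + 1:], body[i]
        return body, ("NOP", None, [])              # nothing safe to move

    body = [("ADD",  "R1", ["R2", "R3"]),
            ("MUL",  "R4", ["R5", "R6"]),           # independent of the branch condition
            ("SUBI", "R1", ["R1"])]                 # branch condition depends on R1
    new_body, slot = fill_delay_slot(body, branch_sources=["R1"])
    print("before branch:", [i[0] for i in new_body])
    print("delay slot:   ", slot[0])

If no independent instruction exists, the NOP in the delay slot gives exactly the one-cycle penalty the mechanism was meant to avoid.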
Superscalar processors, which fetch multiple instructions per cycle and must have some form of branch prediction, do not benefit from delayed branches. The Alpha ISA left delayed branches out because it was designed for superscalar processors.
The most serious disadvantage of delayed branches is the additional control complexity they bring. If an exception occurs in the delay-slot instruction, the processor must be restarted on the branch, not on the next instruction. An exception then essentially has two addresses, the exception address and the restart address, and generating and distinguishing the two correctly in all cases has been a source of bugs in later designs.
4.5 Other interesting cases - Self-modifying programs and uninterruptible instructions
Self-modifying programs were very popular at the beginning of the computer age, because they permitted very efficient use of the then very expensive memory. On pipelined CPUs, however, the technique of self-modifying code can create problems: a routine of the program may modify one of its own upcoming instructions, but if the CPU has an instruction cache or prefetch queue, the original instruction may already have been copied into it, and the modification does not take effect.
An instruction may need to be uninterruptible to ensure its atomicity, for example when it exchanges two items, in order to avoid race conditions. A sequential processor allows interrupts between instructions, but a pipelined processor overlaps instructions, so executing an uninterruptible instruction can make portions of ordinary instructions uninterruptible as well.
5. When it is not good to use pipelines
While pipelines in general increase throughput, there are situations in which they may do more harm than good. One of the biggest problems is that the average latency of an individual instruction increases. For example, in low-latency trading it is common practice to disable some of the CPU's cores and hyper-threading: a single-threaded core is faster for one stream of work, even though the overall throughput is lower (an additional effect is that when only one core is working the thermal load is lower, so the clock rate can be increased, i.e. the CPU can be overclocked). Another case is when more predictability is needed, since pipelining introduces more timing uncertainty. Programming is also more complicated for a pipelined computer, and compilers do not always produce optimal code for it.
6. Conclusion
Pipelines greatly increase throughput through ILP, and even in current multi-core chips they remain one of the most important parts of a CPU. With Moore's law slowing down in recent years and with the inability to keep raising CPU clock frequency, pipelines, together with multi-core architectures, help CPUs grow "horizontally". More and more space on the silicon die is dedicated to pipelines and caches; this was anticipated already in the RISC/CISC discussions by DEC and Patterson in the 1980s, in the papers we reviewed in Homework 1, where the silicon freed by removing complex instructions could be used for more pipeline stages and cache.
But as pipelines grow more complex, programmers need to know more about them and to learn the internal structure of the CPU, its pipeline design and workflow, so that they can write more efficient programs.