SSC Course 6 CPU

Uploaded by

koder57776

Structure of Computer Systems
Course 6
Multi-core systems
Multithreading and multi-processing
- Exploiting different forms of parallelism:
  - data level parallelism (DLP) – the same operation applied to a set of data – SIMD architectures, multiple ALUs
  - instruction level parallelism (ILP) – instruction phases executed in parallel – pipeline architectures
  - thread level parallelism (TLP) – instruction sequences/streams executed in parallel – hyper-threading, multiprocessor architectures (multi-core, GRID, cloud, parallel computers)

- Thread level parallelism execution issues:
  - synchronization between threads
  - data consistency
  - concurrent access to shared resources
  - communication between threads
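The issues above (synchronization, data consistency, concurrent access to a shared resource) can be illustrated with a minimal Python sketch. It assumes a shared counter as the shared resource; the helper names `run` and `increment` are hypothetical. A `threading.Lock` serializes the read-modify-write on the counter, so the final value stays consistent however the threads interleave.

```python
import threading


def run(num_threads: int, increments: int) -> int:
    """Start num_threads threads that each add 1 to a shared counter `increments` times."""
    counter = 0                  # shared resource (hypothetical example)
    lock = threading.Lock()      # synchronization primitive guarding the counter

    def increment() -> None:
        nonlocal counter
        for _ in range(increments):
            # Concurrent access: without the lock, two threads could read the
            # same old value and one of the two updates would be lost.
            with lock:
                counter += 1

    threads = [threading.Thread(target=increment) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                 # joining is also a synchronization point
    return counter


if __name__ == "__main__":
    print(run(num_threads=4, increments=100_000))
```

In CPython the global interpreter lock may hide the race for small workloads, so this sketch illustrates the synchronization protocol rather than measuring parallel speedup.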
Multiprocessing
- Limits of performance increase
- Amdahl’s law
  - S – speedup of a parallel execution
  - ts – time for sequential execution
  - tp – time for parallel execution
  - q – fraction of a program which can be executed in parallel
  - n – number of nodes/threads

$$S = \frac{t_s}{t_p} = \frac{t_s}{(1-q)\,t_s + q\,t_s/n} = \frac{1}{(1-q) + q/n}$$

- Examples:
  - q=50%, n->∞ => S=2
  - q=75%, n->∞ => S=4
  - q=95%, n->∞ => S=20
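The examples above can be reproduced numerically. This is a minimal Python sketch that assumes Amdahl's law exactly as stated (no communication or synchronization overhead); `amdahl_speedup` is a hypothetical helper name, and passing `math.inf` for n models the n → ∞ limit.

```python
import math


def amdahl_speedup(q: float, n: float) -> float:
    """Amdahl's law: S = 1 / ((1 - q) + q / n).

    q -- fraction of the program that can run in parallel (0 <= q <= 1)
    n -- number of nodes/threads (use math.inf for the n -> infinity limit)
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    denominator = (1.0 - q) + q / n   # q / math.inf evaluates to 0.0
    if denominator == 0.0:
        return math.inf               # fully parallel program on unlimited nodes
    return 1.0 / denominator


if __name__ == "__main__":
    for q in (0.50, 0.75, 0.95):
        print(f"q={q:.0%}: n=4 -> S={amdahl_speedup(q, 4):.2f}, "
              f"n->inf -> S={amdahl_speedup(q, math.inf):.1f}")
```

With only 4 threads the same programs reach S ≈ 1.60, 2.29 and 3.48, well below the limits 2, 4 and 20: as n grows, the serial fraction 1 − q dominates the execution time.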
Hyper-threading
- hyper-threading – parallel execution of two instruction streams on a single CPU
- Idea: when a thread is stalled because of a hazard, another thread can be executed
- Solution:
  - two threads executed in parallel on the same pipelined CPU
  - after every stage, two buffers (registers) store the partial results of the two threads
- Speedup – approximately 30%
- The operating system detects 2 logical CPUs for each physical CPU!
[Figure: single-threaded vs. hyper-threaded pipeline (stages IF, ID, Ex, M, Wb); labels: Thread 1, Thread 2]
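The idea of filling stall cycles with instructions from another thread can be sketched as a toy cycle-count model in Python. Everything in it is an assumption for illustration: a single-issue pipeline, each instruction described only by the number of stall cycles that follow it (e.g. a hazard or a cache miss), round-robin selection between ready threads, and no contention for execution units. The helper name `simulate` is hypothetical.

```python
def simulate(threads: list[list[int]]) -> int:
    """Cycles needed to issue every instruction of `threads` on one single-issue pipeline.

    Each thread is a list of stall counts: after an instruction issues at cycle c,
    that thread cannot issue again until cycle c + 1 + stall.
    """
    pos = [0] * len(threads)          # index of each thread's next instruction
    next_ready = [0] * len(threads)   # first cycle at which each thread may issue again
    rr = 0                            # round-robin pointer between ready threads
    cycle, last_issue = 0, -1
    while any(p < len(t) for p, t in zip(pos, threads)):
        for k in range(len(threads)):
            tid = (rr + k) % len(threads)
            if pos[tid] < len(threads[tid]) and next_ready[tid] <= cycle:
                stall = threads[tid][pos[tid]]
                pos[tid] += 1
                next_ready[tid] = cycle + 1 + stall
                last_issue = cycle
                rr = (tid + 1) % len(threads)   # give the other thread priority next
                break                           # single issue: one instruction per cycle
        cycle += 1                              # a cycle with no ready thread is a bubble
    return last_issue + 1


if __name__ == "__main__":
    t1 = [0, 3, 0, 3]     # hypothetical stream: every second instruction stalls 3 cycles
    t2 = [0, 3, 0, 3]
    single = simulate([t1]) + simulate([t2])   # run the two threads one after the other
    smt = simulate([t1, t2])                   # interleave them on the same pipeline
    print(f"single-threaded: {single} cycles, hyper-threaded: {smt} cycles, "
          f"speedup {single / smt:.2f}")
```

In this contrived example the interleaved run needs 10 cycles instead of 14; the gain comes only from issuing one thread's instructions in cycles where the other would stall. With all stall counts set to 0 the model shows no gain, consistent with the idea that hyper-threading helps only when a thread is stalled.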
Multiprocessors
- Parallel execution of instruction streams on multiple CPUs
- Implementations:
  - multi-core architectures – multiple CPUs in a single integrated circuit (IC)
  - parallel computers – multiple CPUs on different ICs, but in the same computer infrastructure
  - distributed computing facilities – multiple CPUs on different computers, connected through a network
    • network of PCs
    • GRID architectures – distributed computing resources for virtual organizations (VOs), mainly for batch processing
    • cloud architectures – computing resources (execution and storage) offered as a service, which can be hired dynamically
  - a combination of all of the above: multi-cores in parallel computers, used to build distributed computing facilities
Multi-core processors
- Why multi-core:
  - it is difficult to raise single-core clock frequencies any further; in the last 4-5 years clock frequency growth has saturated at 2.5-3 GHz
  - power consumption and dissipation problems (a higher frequency means more power)
  - pipeline architectures (instruction level parallelism) have reached their efficiency limits (around 20 pipeline stages)
  - designing a very complex CPU (involving multiple optimization schemes) requires the coordination of very large design teams
  - many new applications are multithreaded (e.g. servers that handle multiple concurrent requests, agent systems, gaming, simulation, etc.)
Multi-core processors
- Issues (design choices):
  - same or different functionalities for the CPUs (homogeneous vs. heterogeneous CPUs)
    • symmetric cores (SMP – symmetric multi-core processor) – every core has the same structure and functionality
    • asymmetric cores (ASMP) – there are coordination cores and (simpler) specialized cores
  - the relation with the memory
    • symmetric (uniform) memory access – SYMA/UMA
    • non-uniform memory access – NUMA
  - connection between cores
    • common bus – parallel or network-based (see network-on-chip)
    • crossbar – multiple connections controlled by a switch
    • memory hierarchy (cache) – common memory zones
Multi-core processors
- architectural solutions
[Figure: left, symmetric multi-core with private L1 cache and shared L2 and memory; right, symmetric multi-core with partially shared L2 and L3. Labels: Core, L1, L2, L3, Switch/crossbar, Memory, Memory Module 1, Memory Module 2]
Multi-core processors
- architectural solutions (cont.)
[Figure: left, heterogeneous multi-core with local and shared cache; right, two processors with two cores each and shared memory. Labels: Processor 1, Processor 2, Core, Core (2x SMT), Local Store, L1, L2, Ring network, Switch, Memory, Memory Module, I/O]
Multi-core processors
- Shared cache
  - high-speed memory used by a number of cores (CPUs)
  - advantages:
    • efficient allocation of the existing memory space
    • one core may pre-fetch data for the other core
    • sharing of common data
    • no cache coherence problems (within the shared cache)
    • fewer accesses to external memory
  - drawbacks:
    • conflicts between cores when allocating space in the cache; one core may replace the other core’s data
    • more complex control circuitry and longer latency because of the switching
    • one core may block the other core’s access
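Two of the points above, one core fetching data that another core then reuses and one core replacing the other core's data, can be reproduced with a toy direct-mapped shared cache in Python. The class name, the 4-line/64-byte geometry and the access pattern are all hypothetical; real shared caches are usually set-associative and use more elaborate replacement policies.

```python
class SharedDirectMappedCache:
    """Toy direct-mapped cache shared by several cores (hypothetical model, not a real design)."""

    def __init__(self, num_lines: int, line_size: int = 64):
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = [None] * num_lines   # per line: (tag, core that filled it) or None
        self.hits = 0
        self.misses = 0
        self.cross_core_evictions = 0     # times one core replaced another core's data

    def access(self, core: int, address: int) -> bool:
        """Look up `address` on behalf of `core`; returns True on a hit."""
        block = address // self.line_size
        index = block % self.num_lines    # direct mapped: one possible line per block
        tag = block // self.num_lines
        entry = self.lines[index]
        if entry is not None and entry[0] == tag:
            self.hits += 1                # hit, even if another core brought the data in
            return True
        self.misses += 1
        if entry is not None and entry[1] != core:
            self.cross_core_evictions += 1
        self.lines[index] = (tag, core)   # allocate the line for the requesting core
        return False


if __name__ == "__main__":
    cache = SharedDirectMappedCache(num_lines=4)   # 4 lines x 64 bytes
    cache.access(core=0, address=0)      # miss: core 0 loads the block
    cache.access(core=1, address=0)      # hit: core 1 uses data fetched by core 0
    cache.access(core=1, address=256)    # miss: block 4 maps to the same line, evicting core 0's data
    cache.access(core=0, address=0)      # miss again: core 0 lost its data to core 1
    print(cache.hits, cache.misses, cache.cross_core_evictions)
```

The run ends with 1 hit, 3 misses and 2 cross-core evictions: the hit shows the sharing of common data, and the final miss shows the conflict drawback.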
Multi-core processors
- Cache coherence of private caches
  - How to keep the data consistent across caches?
    • solutions:
      - write-through – every write is also made in memory – not very efficient
      - write-back – the inconsistency is resolved only when the cache line is evicted – the long inconsistency period may generate errors
      - snooping and invalidation – cores snoop the bus and invalidate their cache line if a write from another core affects its content (e.g. the snooping phase of the Pentium Pro’s P6 bus)
[Figure: cores 1-4, each with its own cache, connected to a common memory; labels: write, read, inconsistency]
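The snooping-and-invalidation idea can be sketched as a toy Python model: every write goes through to memory and is broadcast on a shared bus, and each other private cache that snoops the write drops its copy. The class and method names are hypothetical, and the model deliberately ignores cache-line granularity and the per-line states (such as MSI/MESI) that real protocols keep.

```python
class Bus:
    """Shared bus snooped by every private cache (toy write-through, write-invalidate model)."""

    def __init__(self) -> None:
        self.memory: dict[int, int] = {}
        self.caches: list["PrivateCache"] = []

    def broadcast_write(self, writer: "PrivateCache", address: int) -> None:
        for cache in self.caches:
            if cache is not writer:
                cache.snoop(address)          # every other cache observes the bus write


class PrivateCache:
    def __init__(self, bus: Bus, snooping: bool = True) -> None:
        self.bus = bus
        self.snooping = snooping
        self.lines: dict[int, int] = {}       # address -> value; only valid lines are kept
        bus.caches.append(self)

    def read(self, address: int) -> int:
        if address not in self.lines:         # miss: fetch the value from memory
            self.lines[address] = self.bus.memory.get(address, 0)
        return self.lines[address]

    def write(self, address: int, value: int) -> None:
        self.lines[address] = value
        self.bus.memory[address] = value      # write-through to memory
        self.bus.broadcast_write(self, address)

    def snoop(self, address: int) -> None:
        if self.snooping:
            self.lines.pop(address, None)     # invalidate our copy, if we have one


def read_after_remote_write(snooping: bool) -> int:
    """Core 2 reads a value that core 1 has just overwritten."""
    bus = Bus()
    core1, core2 = PrivateCache(bus, snooping), PrivateCache(bus, snooping)
    core1.read(0x100)
    core2.read(0x100)                         # both caches now hold the old value 0
    core1.write(0x100, 42)
    return core2.read(0x100)


if __name__ == "__main__":
    print("with snooping:   ", read_after_remote_write(True))    # fresh value
    print("without snooping:", read_after_remote_write(False))   # stale value
```

With snooping enabled, core 2 misses after the remote write and reads 42 from memory; with snooping disabled it keeps hitting on its stale copy and returns 0, which is the write inconsistency described above.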
Multi-core processors
- Symmetric vs. asymmetric cores
  - Symmetric architecture
    • all cores are the same
    • cores can perform any task; they are interchangeable
    • Advantages:
      - easy to build (simple replication)
      - easy to program, compile and execute multithreaded programs
    • examples:
      - Intel, AMD – dual- and quad-core processors, Intel Core 2
      - SUN – UltraSPARC T1 (Niagara) – 8 cores
Multi-core processors
- Symmetric vs. asymmetric cores (cont.)
  - Asymmetric (heterogeneous) architecture
    • some cores have different functionalities:
      - 1-2 master cores and many (simpler) slave cores
      - 1 main core and multiple specialized cores (graphics, FP, multimedia)
    • compilers must take into account which functionalities each core can perform
    • Advantages:
      - many more simple cores can be integrated
    • examples:
      - IBM – Cell processor – used in the PlayStation 3
Multi-core processors
- Asymmetric (heterogeneous) architecture
  - IBM Cell architecture: 9 cores
    • 1 PPE – Power Processor Element
      - coordination and data transfer
    • 8 SPEs – Synergistic Processing Elements
      - specialized mathematical units
    • applications:
      - supercomputers
      - PlayStation consoles
      - home cinema
      - video cards
Multi-core processors
- Advantages of multi-core processors:
  - Signals between different CPUs travel shorter distances, so they degrade less.
  - These higher-quality signals allow more data to be sent in a given time period, since individual signals can be shorter and do not need to be repeated as often.
  - Cache coherency circuitry can operate at a much higher clock rate than is possible when the signals have to travel off-chip.
  - A dual-core processor uses slightly less power than two coupled single-core processors.
Multi-core processors
- Disadvantages of multi-core processors:
  - The ability of multi-core processors to increase application performance depends on the use of multiple threads within applications.
  - Most current video games will run faster on a 3 GHz single-core processor than on a 2 GHz dual-core processor (of the same core architecture).
  - Because the two processing cores share the same system bus and memory bandwidth, the real-world performance advantage is limited.
  - If a single core is close to being memory-bandwidth limited, going to dual-core might give only a 30% to 70% improvement.
  - If memory bandwidth is not a problem, a 90% improvement can be expected.
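The video-game claim above can be checked with Amdahl's law under two stated assumptions: both processors do the same work per clock cycle (the "same core architecture" condition), and memory bandwidth is ignored. Execution time is then proportional to ((1 − q) + q/n)/f, so the 2 GHz dual-core wins only when (1 − q/2)/2 < 1/3, that is, when the parallel fraction q exceeds 2/3. The helper names below are hypothetical.

```python
def relative_time(q: float, cores: int, freq_ghz: float) -> float:
    """Execution time in arbitrary units: Amdahl's work term divided by clock frequency.

    Assumes both processors complete the same work per clock cycle (same core
    architecture) and ignores memory-bandwidth limits.
    """
    return ((1.0 - q) + q / cores) / freq_ghz


def dual_core_wins(q: float) -> bool:
    """True if a 2 GHz dual-core finishes before a 3 GHz single-core."""
    return relative_time(q, cores=2, freq_ghz=2.0) < relative_time(q, cores=1, freq_ghz=3.0)


if __name__ == "__main__":
    for q in (0.0, 0.5, 0.9):
        winner = "2 GHz dual-core" if dual_core_wins(q) else "3 GHz single-core"
        print(f"parallel fraction {q:.0%}: {winner} is faster")
```

A game with half of its work parallelized (q = 0.5) still runs faster on the 3 GHz single-core; only a mostly parallel workload (for example q = 0.9) favors the dual-core.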
Multi-core processors
- Thread affinity
  - specifies whether a thread may be executed on any core or only on a specific core
    • soft affinity – controlled by the operating system
      - an interrupted thread should continue on the same core
    • hard affinity – flags associated with a thread that indicate on which core(s) it may be executed
      - useful for real-time and control applications – to reduce the load on a core on which critical threads are executed
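Hard affinity can be sketched on Linux with Python's `os.sched_setaffinity` and `os.sched_getaffinity`, which are available only on some Unix platforms (hence the `hasattr` check). The helper name `pin_current_process` is hypothetical. Note that `os.cpu_count()` reports logical CPUs, so on a processor with 2-way hyper-threading enabled each physical core is counted twice.

```python
import os


def pin_current_process(core: int) -> set:
    """Hard affinity: restrict the caller (pid 0) to a single core; returns the new CPU set."""
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_setaffinity"):
        original = os.sched_getaffinity(0)       # cores the scheduler may currently use
        print("logical CPUs:", os.cpu_count())
        print("allowed before:", sorted(original))
        first = min(original)
        print("allowed after:", sorted(pin_current_process(first)))
        os.sched_setaffinity(0, original)        # restore the original mask; the OS places us again
    else:
        print("os.sched_setaffinity is not available on this platform")
```

Restoring the original mask hands placement back to the operating system scheduler, which then applies its own (soft) affinity policy.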
