SSC Course 6 CPU
Systems
Course 6
Multi-core systems
Multithreading and multi-processing
Exploiting different forms of parallelism:
data level parallelism (DLP) – the same operation applied to a set of data – SIMD
architectures, multiple ALUs
instruction level parallelism (ILP) – instruction phases executed in
parallel – pipeline architectures
thread level parallelism (TLP) – instruction sequences/streams executed
in parallel – hyper-threading, multiprocessor architectures (multi-core,
GRID, cloud, parallel computers)
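As a rough illustration of thread level parallelism, the sketch below splits one job into independent instruction streams and hands them to a pool of worker threads. The names `partial_sum` and `parallel_sum` and the worker count are assumptions made for this example only. In standard CPython builds the global interpreter lock keeps CPU-bound threads from running truly in parallel, so this shows the structure of a TLP program rather than a measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done by one thread: sum its own slice of the data."""
    return sum(chunk)


def parallel_sum(data, workers=4):
    """Split data into roughly equal chunks and sum them on a thread pool."""
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is an independent stream of work; results are combined here.
        return sum(pool.map(partial_sum, chunks))


total = parallel_sum(list(range(1000)))
```

Each chunk is independent of the others, which is what lets the streams run on separate logical or physical CPUs.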
Speedup:
$S = \frac{t_s}{t_p} = \frac{t_s}{(1-q)\,t_s + q\,t_s/n} = \frac{1}{(1-q) + q/n}$
Examples:
q=50%, n->∞ => S=2
q=75%, n->∞ => S=4
q=95%, n->∞ => S=20
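The speedup relation above is usually known as Amdahl's law. The sketch below evaluates it numerically, assuming that q is the fraction of the work that can run in parallel, n is the number of processors, and t_s and t_p are the sequential and parallel execution times. The function name `amdahl_speedup` is chosen for this example only.

```python
import math


def amdahl_speedup(q, n):
    """Return S = 1 / ((1 - q) + q / n) for parallel fraction q on n processors."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    denom = (1.0 - q) + q / n  # q / math.inf evaluates to 0.0
    return math.inf if denom == 0 else 1.0 / denom


# The three limiting cases from the examples (n -> infinity).
limits = [amdahl_speedup(q, math.inf) for q in (0.50, 0.75, 0.95)]
```

With a finite processor count the speedup stays below these limits, for example `amdahl_speedup(0.5, 2)` is about 1.33.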
Hyper-threading
hyper-threading – parallel execution of instruction streams
on a single CPU
Idea: when a thread is stalled because of a hazard,
another thread can be executed
Solution:
two threads executed in parallel on the same pipelined CPU
after every stage two buffers (registers) store the partial results of the
two threads
Speedup – approximately 30%
The operating system detects 2 logical CPUs!
[Figure: pipeline stages IF, ID, Ex, M, Wb for a single-threaded CPU and for a hyper-threaded CPU running Thread 1 and Thread 2]
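The idea of filling stall cycles with instructions from another thread can be sketched with a toy cycle count. This is not a model of real hyper-threading hardware: it assumes a single issue slot per cycle, and each instruction is represented only by a hypothetical number of stall cycles that follow it. The names `cycles_sequential`, `cycles_interleaved` and `streams` are made up for this example.

```python
def cycles_sequential(threads):
    """Run the threads one after another; every stall cycle is wasted."""
    return sum(1 + stall for stream in threads for stall in stream)


def cycles_interleaved(threads):
    """Each cycle, issue one instruction from the first thread that is not stalled."""
    next_instr = [0] * len(threads)  # index of the next instruction per thread
    ready_at = [0] * len(threads)    # first cycle in which each thread may issue again
    cycle = 0
    while any(next_instr[i] < len(s) for i, s in enumerate(threads)):
        for i, stream in enumerate(threads):
            if next_instr[i] < len(stream) and ready_at[i] <= cycle:
                # Issue takes one cycle, then the thread stalls for the given count.
                ready_at[i] = cycle + 1 + stream[next_instr[i]]
                next_instr[i] += 1
                break
        cycle += 1
    return max(ready_at, default=0)


# Hypothetical streams: each number is the stall that follows that instruction.
streams = [[0, 2, 0], [0, 0, 2]]
sequential = cycles_sequential(streams)
interleaved = cycles_interleaved(streams)
```

For these sample streams the model needs 10 cycles sequentially and 8 cycles interleaved, because the second thread issues while the first one is stalled.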
Multiprocessors
Parallel execution of instruction streams on multiple CPUs
Implementations:
multi-core architectures – multiple CPUs in a single integrated
circuit (IC)
parallel computers – multiple CPUs on different ICs, but in the
same computer infrastructure
distributed computing facilities – multiple CPUs on different
computers, connected through a network
• network of PCs
• GRID architectures – distributed computing resources for virtual
organizations (VOs), mainly for batch processing
• cloud architectures – computing resources (execution and storage)
offered as a service; they can be rented on demand
combinations of all of the above: multi-core processors in parallel computers,
which in turn build distributed computing facilities
Multi-core processors
Why multi-core:
Difficult to push single-core clock frequencies even higher; in
the last 4-5 years clock frequency growth has saturated at 2.5-3
GHz
power consumption and dissipation problems (higher frequency
means more power)
pipeline architectures (instruction level parallelism) reached their
efficiency limits (around 20 pipeline stages)
designing a very complex CPU (with multiple optimization
schemes involved) requires the coordination of very large design
teams
many new applications are multithreaded (e.g. servers that handle
multiple concurrent requests, agent systems, gaming,
simulation, etc.)
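The link between frequency and power mentioned above can be made concrete with the usual first-order model of dynamic power in CMOS circuits. This formula is not given in the slides; it is added here as a standard approximation, where \alpha is the switching activity factor, C the switched capacitance, V the supply voltage and f the clock frequency.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f
```

Since a higher clock frequency usually also requires a higher supply voltage, power tends to grow faster than linearly with frequency, which is one reason to add cores instead of raising the clock.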
Multi-core processors
Issues (decision choices):
same or different functionalities for CPUs (homogeneous vs.
heterogeneous CPUs)
• symmetric cores (SMP – symmetric multiprocessing) – every
core has the same structure and functionality
• asymmetric cores (ASMP) – there are coordination cores and
(simpler) specialized cores
the relation with the memory
• uniform (symmetric) memory access – UMA
• non-uniform memory access – NUMA
connection between cores
• common bus – parallel or network-based (see network-on-chip)
• crossbar – multiple connections controlled with a switch
• memory hierarchy (cache) – common memory zones
Multi-core processors
architectural solutions
[Figure, left: heterogeneous multi-core with local and shared cache (cores with Local Store, L1 and L2 caches, crossbar switch, Memory, I/O Module)]
[Figure, right: two processors with two cores and shared memory (L1, L2 and L3 caches, switches, Memory)]
Multi-core processors
Shared cache
high speed memory used by a number of cores (CPUs)
advantages:
• efficient allocation of existing memory space
• one core may pre-fetch data for the other core
• sharing of common data
• no cache coherence problems
• fewer accesses to external memory
drawbacks:
• conflict between cores when allocating space in the cache; one core
may replace the other core’s data
• more complex control circuit and longer latency because of the
switching
• one core may block the other core’s access
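The first drawback, one core replacing the other core's data, can be shown with a small shared cache model. The sketch assumes a fully associative cache with LRU replacement, which the slides do not specify. The class name `SharedLRUCache`, the capacity and the addresses are hypothetical.

```python
from collections import OrderedDict


class SharedLRUCache:
    """Toy fully associative cache shared by several cores, with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> core that loaded the line

    def access(self, core, addr):
        """Return True on a hit; on a miss, load the line and evict the LRU one if full."""
        hit = addr in self.lines
        if hit:
            self.lines.move_to_end(addr)  # mark as most recently used
        else:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict the least recently used line
            self.lines[addr] = core
        return hit


cache = SharedLRUCache(capacity=4)
for addr in range(4):
    cache.access(0, addr)                             # core 0 loads its working set
warm = [cache.access(0, a) for a in range(4)]          # core 0 now hits
for addr in range(100, 104):
    cache.access(1, addr)                             # core 1 streams through other data
after = [cache.access(0, a) for a in range(4)]         # core 0's lines were replaced
```

In this run core 0 hits on every access while it has the cache to itself, and misses on every access after core 1 has filled the shared space.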
Multi-core processors
Cache coherence of private memory
How to keep the data consistent across caches?
• solutions:
write-through – every write is also made in the memory – not so
efficient
write-back – the inconsistency is resolved only when the cache line is
evicted and written back – a long inconsistency period may generate errors
snooping and invalidation – cores snoop the bus and
invalidate their cache line if a write from another core affects their
cache content (e.g. Pentium Pro’s P6 bus – snooping phase)
[Figure: cores 1-4 and Memory; a write by one core creates an inconsistency for a read by another core]
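Snooping with invalidation can be sketched as follows. This is a simplified model that combines write-through with invalidation of the other cores' copies; it is not the P6 bus protocol, and the class names `Bus` and `Cache` and the address used are assumptions for this example.

```python
class Bus:
    """Shared bus: holds main memory and lets every cache snoop on writes."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def broadcast_write(self, writer, addr):
        # Every cache except the writer sees the write and drops its stale copy.
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_invalidate(addr)


class Cache:
    """Private cache of one core, attached to the shared bus."""

    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch the line from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.memory[addr] = value  # write-through to memory
        self.bus.broadcast_write(self, addr)

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)


bus = Bus()
core1, core2 = Cache(bus), Cache(bus)
bus.memory[0x10] = 1
before = (core1.read(0x10), core2.read(0x10))  # both cores cache the old value
core1.write(0x10, 42)                          # core2's copy is invalidated
stale_copy_present = 0x10 in core2.lines
reloaded = core2.read(0x10)                    # miss, so the new value is fetched
```

Without the invalidation step, core 2 would keep returning the old value from its private cache after core 1's write.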
Multi-core processors
Symmetric vs. asymmetric cores
Symmetric architecture
• all cores are the same
• cores can perform any tasks; they are interchangeable
• Advantages:
easy to build (simple replication),
easy to program, to compile and to execute multithreaded
programs
• examples:
Intel, AMD – dual-core and quad-core processors, Core 2,
Sun – UltraSPARC T1 (Niagara) – 8 cores
Multi-core processors
Symmetric vs. asymmetric cores (cont.)
Asymmetric (heterogeneous) architecture
• some cores have different functionalities:
1-2 master cores and many slave (simpler) cores
1 main core and multiple specialized cores (graphics, FP,
multimedia)
• the compiler must take into account which
functionalities can be performed by each core
• Advantages:
many more simple cores can be integrated
• examples:
IBM – Cell processor – used in the PlayStation 3
Multi-core processors
Asymmetric (heterogeneous)
architecture
IBM Cell architecture: 9 cores
• 1 PPE – Power Processor Element
coordination and data transfer
• 8 SPEs – Synergistic Processing
Elements
specialized mathematical units
• applications:
supercomputers
PlayStation consoles
home cinema
video cards
Multi-core processors
Advantages of multi-core processors:
Signals between different CPUs travel shorter distances, so
they degrade less.