Thread Level Parallelism
Coherence defines the behavior of reads and writes to a single address location.
When the same data is held simultaneously in different cache memories, keeping those copies consistent is called cache coherence; the analogous property for a shared main (global) memory is called memory coherence.
Cache Coherency
In a shared memory multiprocessor system with a separate cache memory for each processor, it is possible to have many copies of shared data:
one copy in the main memory and one in the local cache of each processor that requested it. When one of the copies of data is changed, the
other copies must reflect that change. Cache coherence is the discipline that ensures changes to the values of shared operands (data) are propagated throughout the system in a timely fashion. Two requirements must be met:
Write Propagation - Changes to the data in any cache must be propagated to other copies (of that cache line) in the peer caches.
Transaction Serialization - Reads/Writes to a single memory location must be seen by all processors in the same order.
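Both requirements are visible from software. The following C++ sketch (a minimal illustration whose names are invented for this example, not part of any protocol specification) relies on them: the writer's store to a single location is propagated to the cache holding the reader's copy, and all processors observe the stores to that location in the same order.

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> shared_value{0};  // one memory location, possibly cached by several cores

int main() {
    // Writer: the store must be propagated to (or invalidate) any peer
    // cache holding a copy of this line -- write propagation.
    std::thread writer([] { shared_value.store(42, std::memory_order_release); });

    // Reader: spins until the propagated value becomes visible. Every
    // processor sees the writes to this one location in the same order
    // -- transaction serialization.
    std::thread reader([] {
        int v = 0;
        while ((v = shared_value.load(std::memory_order_acquire)) == 0) {}
        std::cout << "reader observed " << v << '\n';
    });

    writer.join();
    reader.join();
}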
[Figure: Incoherent caches]
Snooping
First introduced in 1983, snooping is a process where the individual caches monitor address lines for accesses to memory locations that they
have cached. The write-invalidate protocols and write-update protocols make use of this mechanism. For the snooping mechanism, a snoop
filter reduces snooping traffic by maintaining multiple entries, each representing a cache line that may be owned by one or more nodes.
When one of the entries must be replaced, the snoop filter selects the entry representing the cache line or lines owned by the fewest nodes, as determined from a presence vector in each entry. A temporal or other type of algorithm refines the selection if more than one cache line is owned by the fewest nodes.
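A rough sketch of write-invalidate snooping in C++ (a hypothetical toy model, far simpler than a real MSI/MESI protocol; the Cache and Bus types are invented for illustration): every cache watches writes on the shared bus and invalidates its own copy when a peer writes the same address.

#include <cstdint>
#include <iostream>
#include <optional>
#include <utility>
#include <vector>

// Toy model: each cache holds at most one line and snoops every bus write.
struct Cache {
    std::optional<std::pair<uint64_t, int>> line;  // (address, value), if any

    void snoop_write(uint64_t addr) {
        if (line && line->first == addr)
            line.reset();  // a peer wrote this address: invalidate our copy
    }
};

struct Bus {
    std::vector<Cache*> caches;

    void write(Cache* writer, uint64_t addr, int value) {
        for (Cache* c : caches)
            if (c != writer) c->snoop_write(addr);  // broadcast: peers snoop and invalidate
        writer->line.emplace(addr, value);          // writer keeps the only valid copy
    }
};

int main() {
    Cache c0, c1;
    Bus bus{{&c0, &c1}};
    c0.line.emplace(0x100, 7);
    c1.line.emplace(0x100, 7);   // both caches hold address 0x100
    bus.write(&c0, 0x100, 8);    // c0 writes; c1's copy is invalidated
    std::cout << "c1 still holds a copy? " << std::boolalpha
              << c1.line.has_value() << '\n';  // false
}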
Directory-based
In a directory-based system, the data being shared is placed in a common directory that maintains the coherence between caches. The directory
acts as a filter through which the processor must ask permission to load an entry from the primary memory to its cache. When an entry is
changed, the directory either updates or invalidates the other caches with that entry.
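A hypothetical sketch of such a directory (the Directory type and the 4-node system are assumptions for illustration): loads register a node in a presence vector, and a write makes the directory invalidate every other sharer of that block.

#include <bitset>
#include <cstdint>
#include <iostream>
#include <unordered_map>

constexpr int kNodes = 4;  // assumed system size for the example

struct Directory {
    // One entry per cached block: a presence vector of which nodes hold a copy.
    std::unordered_map<uint64_t, std::bitset<kNodes>> sharers;

    // A node asks permission to load a block: the directory records it.
    void load(int node, uint64_t block) { sharers[block].set(node); }

    // A node writes a block: the directory invalidates every other sharer.
    void write(int node, uint64_t block) {
        auto& vec = sharers[block];
        for (int n = 0; n < kNodes; ++n)
            if (n != node) vec.reset(n);  // send invalidations to peer caches
        vec.set(node);                    // the writer is now the sole holder
    }
};

int main() {
    Directory dir;
    dir.load(0, 0x40);
    dir.load(2, 0x40);               // nodes 0 and 2 share block 0x40
    dir.write(1, 0x40);              // node 1 writes: 0 and 2 are invalidated
    std::cout << dir.sharers[0x40] << '\n';  // prints 0010: only node 1 holds it
}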
Multithreading
In computer architecture, multithreading is the ability of a central processing unit (CPU)
(or a single core in a multi-core processor) to provide multiple threads of
execution concurrently, supported by the operating system. In a multithreaded
application, the threads share the resources of a single or multiple cores, which include
the computing units, the CPU caches, and the translation lookaside buffer (TLB).
If a thread gets a lot of cache misses, the other threads can continue taking advantage of the unused computing resources, which may lead to
faster overall execution, as these resources would have been idle if only a single thread were executed. If a thread cannot use all the
computing resources of the CPU (because instructions depend on each other's result), running another thread may prevent those resources
from becoming idle.
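As a rough illustration (OS threads with a sleep standing in for memory stalls, not a model of hardware multithreading): while one thread waits, the other keeps the computing resources busy, so total wall-clock time approaches the longer task rather than the sum of both.

#include <chrono>
#include <iostream>
#include <thread>

// A thread that repeatedly stalls; the sleeps stand in for long waits
// such as cache misses served from main memory.
void stalling_task() {
    for (int i = 0; i < 3; ++i) {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        std::cout << "stalling task: wait " << i << " finished\n";
    }
}

// A task that keeps the otherwise-idle computing resources busy.
void busy_task() {
    volatile double x = 1.0;
    for (long i = 0; i < 50'000'000; ++i) x = x * 1.0000001;
    std::cout << "busy task: done\n";
}

int main() {
    std::thread t1(stalling_task), t2(busy_task);
    t1.join();
    t2.join();  // wall-clock time is roughly max(t1, t2), not their sum
}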
Multiple threads can interfere with each other when sharing hardware resources such as caches or translation lookaside buffers (TLBs). As a
result, execution times of a single thread are not improved and can be degraded, even when only one thread is executing, due to lower
frequencies or additional pipeline stages that are necessary to accommodate thread-switching hardware. Overall efficiency varies; Intel claims
up to 30% improvement with its Hyper-Threading Technology, while a synthetic program just performing a loop of non-optimized
dependent floating-point operations actually gains a 100% speed improvement when run in parallel. On the other hand, hand-
tuned assembly language programs using MMX or AltiVec extensions and performing data prefetches (as a good video encoder might) do
not suffer from cache misses or idle computing resources. Such programs therefore do not benefit from hardware multithreading and can
indeed see degraded performance due to contention for shared resources.
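A sketch of the kind of dependent floating-point loop described above (illustrative only; pinning both threads onto the two hardware threads of one SMT core is OS-specific and not shown): each multiply needs the previous result, so a single thread cannot fill the pipeline, and a second independent chain can interleave with it.

#include <chrono>
#include <iostream>
#include <thread>

volatile double sink_a, sink_b;  // keep results alive so the loops are not optimized away

// A chain of dependent floating-point operations: every iteration needs the
// previous result, so a single thread leaves most of the FP pipeline idle.
double chain(long iters) {
    double x = 1.0;
    for (long i = 0; i < iters; ++i) x = x * 1.0000001 + 1e-9;
    return x;
}

int main() {
    using clock = std::chrono::steady_clock;
    const long n = 100'000'000;

    auto t0 = clock::now();
    sink_a = chain(n);
    sink_b = chain(n);                  // two chains, one after the other
    auto serial = clock::now() - t0;

    t0 = clock::now();
    std::thread a([n] { sink_a = chain(n); });
    std::thread b([n] { sink_b = chain(n); });
    a.join();
    b.join();                           // two chains side by side
    auto parallel = clock::now() - t0;

    // On two hardware threads of one SMT core the ratio approaches 2,
    // because the two chains interleave in the floating-point pipeline.
    std::cout << "speedup = "
              << std::chrono::duration<double>(serial).count() /
                 std::chrono::duration<double>(parallel).count() << '\n';
}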
Symmetric Multiprocessing
Symmetric multiprocessing or shared-memory multiprocessing (SMP) involves a multiprocessor computer
hardware and software architecture where two or more identical processors are connected to a single, shared main
memory, have full access to all input and output devices, and are controlled by a single operating system instance
that treats all processors equally, reserving none for special purposes. Most multiprocessor systems today use an SMP
architecture. In the case of multi-core processors, the SMP architecture applies to the cores, treating them as separate
processors.
When more than one program executes at the same time, an SMP system has considerably better performance than a uniprocessor, because different programs can run on different CPUs simultaneously.
In cases where an SMP environment processes many jobs, administrators often experience a loss of hardware efficiency.
Software programs have been developed to schedule jobs and other functions of the computer so that the processor
utilization reaches its maximum potential. Good software packages can achieve this maximum potential by scheduling each
CPU separately, as well as being able to integrate multiple SMP machines and clusters.
Uses
Time-sharing and server systems can often use SMP without changes to applications, as they may have multiple processes running in parallel,
and a system with more than one process running can run different processes on different processors.
On personal computers, SMP is less useful for applications that have not been modified. If the system rarely runs more than one process at a
time, SMP is useful only for applications that have been modified for multithreaded (multitasked) processing. Custom-
programmed software can be written or modified to use multiple threads, so that it can make use of multiple processors.
Multithreaded programs can also be used in time-sharing and server systems that support multithreading, allowing them to make more use
of multiple processors.
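A minimal sketch of such a modification (the work-splitting scheme here is an assumption for illustration): a summation is divided across one thread per available processor, each working on a private slice, so an SMP operating system can schedule the threads onto different CPUs.

#include <algorithm>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t n = 10'000'000;
    std::vector<long long> partial(nthreads, 0);
    std::vector<std::thread> pool;

    for (unsigned t = 0; t < nthreads; ++t)
        pool.emplace_back([&partial, t, nthreads, n] {
            // Each thread sums its own contiguous slice: no locks needed.
            const std::size_t lo = t * n / nthreads;
            const std::size_t hi = (t + 1) * n / nthreads;
            long long s = 0;
            for (std::size_t i = lo; i < hi; ++i) s += static_cast<long long>(i);
            partial[t] = s;  // one write per thread keeps sharing to a minimum
        });

    for (auto& th : pool) th.join();
    std::cout << "sum = "
              << std::accumulate(partial.begin(), partial.end(), 0LL) << '\n';
}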
Pros & Cons
In current SMP systems, all of the processors are tightly coupled inside the same box with a bus or switch; on earlier SMP systems, a
single CPU took an entire cabinet. Some of the components that are shared are global memory, disks, and I/O devices. Only one copy
of an OS runs on all the processors, and the OS must be designed to take advantage of this architecture. One basic advantage is a cost-effective increase in throughput. SMP can also apply multiple processors to a single problem or task, an approach known as parallel programming.
There are a few limits on the scalability of SMP due to cache coherence and shared objects.
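One concrete coherence cost is false sharing: logically independent variables that land on the same cache line force peer caches to invalidate each other on every write. A minimal sketch of the usual remedy, assuming a 64-byte line size, pads each counter onto its own line.

#include <atomic>
#include <thread>

// Without padding, a.value and b.value could share a cache line, and the
// coherence protocol would bounce that line between processors on every
// increment ("false sharing"). alignas gives each counter its own line.
struct PaddedCounter {
    alignas(64) std::atomic<long> value{0};  // 64 bytes assumed as the line size
};

int main() {
    PaddedCounter a, b;  // separate cache lines, so no line ping-pong
    std::thread t1([&a] { for (long i = 0; i < 10'000'000; ++i) a.value.fetch_add(1); });
    std::thread t2([&b] { for (long i = 0; i < 10'000'000; ++i) b.value.fetch_add(1); });
    t1.join();
    t2.join();
}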
Multiple Choice Questions
1. In a particular system, it is observed that cache performance improves as the cache block size is increased. The primary reason for this is:
2. The on-chip memory that is local to each multithreaded Single Instruction Multiple Data (SIMD) processor is called: