Threads
Motivation
• Most modern applications are multithreaded
• Threads run within the application
• Multiple tasks with the application can be implemented by separate
threads
• Update display
• Fetch data
• Spell checking
• Answer a network request
• Process creation is heavy-weight, while thread creation is light-weight
• Can simplify code, increase efficiency
• Kernels are generally multithreaded
Multithreaded Server Architecture
Benefits
• Responsiveness – may allow continued execution if
part of process is blocked, especially important for
user interfaces
• Resource Sharing – threads share resources of
process, easier than shared memory or message
passing
• Economy – cheaper than process creation, thread
switching lower overhead than context switching
• Scalability – process can take advantage of
multiprocessor architectures
Multicore Programming
• Multicore or multiprocessor systems putting pressure on
programmers, challenges include:
• Dividing activities
• Balance
• Data splitting
• Data dependency
• Testing and debugging
• Parallelism implies a system can perform more than one
task simultaneously
• Concurrency supports more than one task making progress
• Single processor / core, scheduler providing concurrency
Multicore Programming (Cont.)
• Types of parallelism
• Data parallelism – distributes subsets of the same data across multiple cores,
same operation on each
• Task parallelism – distributing threads across cores, each thread performing
unique operation
• As # of threads grows, so does architectural support for threading
• CPUs have cores as well as hardware threads
• Consider Oracle SPARC T4 with 8 cores, and 8 hardware threads per core
Concurrency vs. Parallelism
● Concurrent execution on single-core system:
● Parallelism on a multi-core system:
Single and Multithreaded Processes
Amdahl’s Law
• Identifies performance gains from adding additional cores to an
application that has both serial and parallel components
• S is serial portion
• N processing cores
• That is, if application is 75% parallel / 25% serial, moving from 1
to 2 cores results in speedup of 1.6 times
• As N approaches infinity, speedup approaches 1 / S
Serial portion of an application has disproportionate effect on
performance gained by adding additional cores
User Threads and Kernel Threads
• User threads - management done by user-level threads library
• Three primary thread libraries:
• POSIX Pthreads
• Windows threads
• Java threads
• Kernel threads - Supported by the Kernel
• Examples – virtually all general-purpose operating systems, including:
• Windows
• Solaris
• Linux
• Tru64 UNIX
• Mac OS X
Multithreading Models
• Many-to-One
• One-to-One
• Many-to-Many
Many-to-One
• Many user-level threads mapped to
single kernel thread
• One thread blocking causes all to
block
• Multiple threads may not run in
parallel on muticore system because
only one may be in kernel at a time
• Rarely used model now
• Examples:
• Solaris Green Threads
• GNU Portable Threads
One-to-One
• Each user-level thread maps to kernel
thread
• Creating a user-level thread creates a kernel
thread
• More concurrency than many-to-one
• Number of threads per process sometimes
restricted due to overhead
• Examples
• Windows
• Linux
• Solaris 9 and later
Many-to-Many Model
• Allows many user level threads
to be mapped to many kernel
threads
• Allows the operating system
to create a sufficient number
of kernel threads
• Solaris prior to version 9
• Windows with the
ThreadFiber package
Implicit Threading
• Growing in popularity as numbers of threads increase, program
correctness is more difficult with explicit threads
• Creation and management of threads done by compilers and run-time
libraries rather than programmers
• Three methods explored
• Thread Pools
• OpenMP
• Grand Central Dispatch
Thread Pools
• Create a number of threads in a pool where they await work
• Advantages:
• Usually slightly faster to service a request with an existing thread than
create a new thread
• Allows the number of threads in the application(s) to be bound to the size
of the pool
• Separating the task to be performed from the mechanics of creating the
task allows different strategies for running the task
• Tasks could be scheduled to run periodically
OpenMP
• Set of compiler directives and an API for C, C++,
FORTRAN
• Provides support for parallel programming in
shared-memory environments
• Identifies parallel regions – blocks of code that can
run in parallel
#pragma omp parallel
Create as many threads as there are cores
#pragma omp parallel for
for(i=0;i<N;i++) {
c[i] = a[i] + b[i];
}
Run for loop in parallel
Grand Central Dispatch
• Apple technology for Mac OS X and iOS operating systems
• Extensions to C, C++ languages, API, and run-time library
• Allows identification of parallel sections
• Manages most of the details of threading
• Block is in “^{ }” - ˆ{ printf("I am a block"); }
• Blocks placed in the dispatch queue
• Assigned to available thread in thread pool when removed from the queue
Grand Central Dispatch
• Two types of dispatch queues:
• serial – blocks removed in FIFO order, queue is per process, called main
queue
• Programmers can create additional serial queues within program
• concurrent – removed in FIFO order but several may be removed at a time
• Three system wide queues with priorities low, default, high
Threading Issues
• Semantics of fork() and exec() system calls
• Signal handling
• Synchronous and asynchronous
• Thread cancellation of target thread
• Asynchronous or deferred
• Thread-local storage
Semantics of fork() and exec()
• Does fork()duplicate only the calling thread or all threads?
• Some UNIXes have two versions of fork
• exec() usually works as normal – replace the running process
including all threads
Signal Handling
• Signals are used in UNIX systems to notify a process that a
particular event has occurred.
• A signal handler is used to process signals
• Signal is generated by a particular event
• Signal is delivered to a process
• Signal is handled by one of two signal handlers:
• Default
• User-defined
• Every signal has default handler that the kernel runs when
handling signal
• User-defined signal handler can override default
• For single-threaded, signal delivered to process
Signal Handling (Cont.)
• Where should a signal be delivered for multi-threaded?
• Deliver the signal to the thread to which the signal applies
• Deliver the signal to every thread in the process
• Deliver the signal to certain threads in the process
• Assign a specific thread to receive all signals for the process
Thread Cancellation
• Terminating a thread before it has finished
• Thread to be canceled is target thread
• Two general approaches:
• Asynchronous cancellation terminates the target thread immediately
• Deferred cancellation allows the target thread to periodically check if it
should be cancelled
Thread Cancellation (Cont.)
• Invoking thread cancellation requests cancellation, but actual
cancellation depends on thread state
• If a thread has cancellation disabled, cancellation remains pending
until the thread enables it
• Default type is deferred
• Cancellation only occurs when the thread reaches the cancellation point
• i.e. pthread_testcancel()
• Then cleanup handler is invoked
• On Linux systems, thread cancellation is handled through signals
Thread-Local Storage
• Thread-local storage (TLS) allows each thread to have its own copy
of data
• Useful when you do not have control over the thread creation
process (i.e., when using a thread pool)
• Different from local variables
• Local variables visible only during single function invocation
• TLS visible across function invocations
• Similar to static data
• TLS is unique to each thread
Scheduler Activations
• M:M models require communication to
maintain the appropriate number of kernel
threads allocated to the application
• Typically use an intermediate data structure
between user and kernel threads – lightweight
process (LWP)
• Appears to be a virtual processor on which process
can schedule user thread to run
• Each LWP attached to a kernel thread
• How many LWPs to create?
• Scheduler activations provide upcalls – a
communication mechanism from the kernel to
the upcall handler in the thread library
• This communication allows an application to
maintain the correct number of kernel threads