Unit 1
1.0 Introduction
1.1 Objectives
1.2 Multiprocessing and Processor Coupling
1.3 Multiprocessor Interconnections
1.3.1 Bus-oriented System
1.3.2 Crossbar-connected System
1.3.3 Hypercube Systems
1.3.4 Multistage Switch-based System
1.4 Types of Multiprocessor Operating Systems
1.4.1 Separate Supervisors
1.4.2 Master-Slave
1.4.3 Symmetric
1.5 Multiprocessor Operating System - Functions and Requirements
1.6 Multiprocessor Synchronization
1.6.1 Test-and-Set
1.6.2 Compare-and-Swap
1.6.3 Fetch-and-Add
1.7 Summary
1.8 Solutions / Answers
1.9 Further Readings
1.0 INTRODUCTION
A multiprocessor system is a collection of a number of standard processors put
together in an innovative way to improve the performance/speed of computer
hardware. The main feature of this architecture is to provide high speed at low
cost in comparison to a uniprocessor. In a distributed system, the high cost of a
multiprocessor can be offset by employing it on computationally intensive tasks,
making it a compute server. The multiprocessor system is generally characterised
by increased system throughput and application speedup through parallel
processing.
Throughput can be improved, in a time-sharing environment, by executing a
number of unrelated user processes on different processors in parallel. As a result,
a large number of different tasks can be completed in a unit of time without
explicit user direction. On the other hand, application speedup is achieved by
decomposing an application into multiple processes scheduled to run on different
processors.
The scheduling can be done in two ways:
1) Automatically, by means of a parallelising compiler.
2) Explicit-tasking approach, where each programme submitted for execution
is treated by the operating system as an independent process.
Multiprocessor operating systems aim to support high performance through
multiple CPUs. An important goal is to make the number of CPUs transparent
to the application. Achieving such transparency is relatively easy because the
communication between different (parts of) applications uses the same
primitives as those in multitasking uni-processor operating systems. The idea
is that all communication is done by manipulating data at shared memory
locations, and that we only have to protect that data against simultaneous access.
Protection is done through synchronization primitives like semaphores and
monitors.
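This idea can be sketched in C (a hypothetical illustration, not part of the unit; the function and variable names are ours): two threads update data at a shared memory location, and a mutual-exclusion lock protects that data against simultaneous access, exactly as the synchronization primitives above are meant to do:

```c
/* Hypothetical sketch: protecting data at a shared memory location
 * with a mutual-exclusion lock (POSIX threads). */
#include <pthread.h>

static long shared_counter;                              /* shared data */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; /* protects it */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);   /* no simultaneous access possible */
        shared_counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Runs two workers concurrently and returns the final counter value. */
long run_counter_demo(void)
{
    pthread_t t1, t2;
    shared_counter = 0;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_counter;
}
```

Because every increment happens under the lock, run_counter_demo() always returns 200000; without the lock, concurrent updates could interleave and some increments would be lost.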
In this unit we will study multiprocessor coupling, interconnections, types of
multiprocessor operating systems and synchronization.
1.1 OBJECTIVES
After going through this unit, you should be able to:
• define a multiprocessor system;
• describe the architecture of a multiprocessor and distinguish among various
types of architecture;
• become familiar with different types of multiprocessor operating systems;
• discuss the functions of multiprocessor operating systems;
• describe the requirements of a multiprocessor operating system; and
• discuss the synchronization process in a multiprocessor system.
1.2 MULTIPROCESSING AND PROCESSOR COUPLING
Tightly-coupled multiprocessor systems contain multiple CPUs that are connected
at the bus level. These CPUs may have access to a central shared memory (SMP),
or may participate in a memory hierarchy with both local and shared memory
(NUMA). The IBM p690 Regatta is an example of a high-end SMP system. Chip
multiprocessors, also known as multi-core computing, involve more than one
processor placed on a single chip and can be thought of as the most extreme form
of tightly-coupled multiprocessing. Mainframe systems with multiple processors
are often tightly-coupled.
Loosely-coupled multiprocessor systems, often referred to as clusters, are based
on multiple standalone single- or dual-processor commodity computers
interconnected via a high-speed communication system. A Linux Beowulf cluster
is an example of a loosely-coupled system.
Tightly-coupled systems perform better and are physically smaller than loosely-
coupled systems, but have historically required greater initial investments and
may depreciate rapidly; nodes in a loosely-coupled system are usually inexpensive
commodity computers and can be recycled as independent machines upon
retirement from the cluster.
Power consumption is also a consideration. Tightly-coupled systems tend to be
much more energy efficient than clusters. This is due to the fact that considerable
economies can be realised by designing components to work together from the
beginning in tightly-coupled systems, whereas loosely-coupled systems use
components that were not necessarily intended specifically for use in such systems.
The above architecture gives rise to a problem of contention at two points: the
shared bus itself and the shared memory. By employing private/cache memory
in either of the two ways explained below, the problem of contention can be
reduced:
• with shared memory; and
• with a cache associated with each individual processor.
1) With shared memory
The second approach where the cache is associated with each individual processor
is the most popular approach because it reduces contention more effectively.
A cache attached to a processor can capture many of the local memory references;
for example, with a cache of 90% hit ratio, a processor on average needs to
access the shared memory for only 1 (one) out of 10 (ten) memory references,
because the other 9 (nine) references are already captured by the private cache of
the processor. In the case where memory accesses are uniformly distributed, a
90% cache hit ratio allows the shared bus to support roughly ten times the load it
could carry without caches.
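The arithmetic above can be made concrete with a small sketch (ours, in C, not part of the unit): with hit ratio h, only the fraction (1 − h) of references reaches the shared bus, so a 90% hit ratio cuts bus traffic to one tenth:

```c
/* Illustrative sketch: number of references that still reach the shared
 * bus after a private cache absorbs a fraction `hit_ratio` of them. */
double bus_references(double hit_ratio, double total_refs)
{
    return (1.0 - hit_ratio) * total_refs;   /* only misses use the bus */
}
```

For example, bus_references(0.9, 10.0) is about 1: one of every ten references reaches shared memory, so roughly ten cached processors generate the bus traffic of one uncached processor.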
The negative aspect of such an arrangement arises when, in the presence of multiple
caches, shared writable data are cached. In this case cache coherence must be
maintained, i.e., the multiple physical copies of a single logical datum must be
kept consistent with each other in the presence of update activity. Cache coherence
can be maintained by attaching additional hardware or by including specialised
protocols designed for the purpose, but unfortunately such special arrangements
increase the bus traffic and thus reduce the benefit that processor caches are
designed to provide.
Cache coherence refers to the integrity of data stored in the local caches of a shared
resource. It is a special case of memory coherence.
Unfortunately, this system also faces the problem of contention when:
1) Two processors (P1 and P2) attempt to access the same memory module
(M1) at the same time. In this condition, contention can be avoided by
making one processor (P1) wait until the other one (P2) finishes its work
and leaves the memory module (M1) free for processor P1.
2) More than two processors attempt to access the same memory module. This
problem cannot be solved by the above-mentioned solution.
1.4.2 Master-Slave
In master-slave, out of many processors one processor behaves as a master whereas
others behave as slaves. The master processor is dedicated to executing the
operating system. It works as a scheduler and controller for the slave processors:
it schedules the work and also controls their activity. Therefore, the operating
system's data structures are usually stored in the master's private memory. Slave
processors are treated simply as a schedulable pool of resources; in other words,
the slave processors execute application programmes.
This arrangement allows the parallel execution of a single task by allocating
several subtasks to multiple processors concurrently. Since the operating system
is executed by only master processors this system is relatively simple to develop
and efficient to use. Limited scalability is the main limitation of this system,
because the master processor becomes a bottleneck and consequently fails to
fully utilise the slave processors.
1.4.3 Symmetric
In the symmetric organisation, all processors are identical in configuration. All
processors are autonomous and are treated equally. To make all the processors
functionally identical, all the resources are pooled and are available to all of them.
The operating system is also symmetric, as any processor may execute it. In other
words, there is one copy of the kernel that can be executed by all processors
concurrently. To that end, access to scarce data structures and pooled resources
must be controlled with proper interlocks.
The simplest way to achieve this is to treat the entire operating system as a critical
section and allow only one processor to execute the operating system at one
time. This method is called ‘floating master’ method because in spite of the
presence of many processors only one operating system exists. The processor
that executes the operating system has a special role and acts as a master. As the
operating system is not bound to any specific processor, therefore, it floats from
one processor to another.
1.6.1 Test-and-Set
The test-and-set instruction atomically reads and modifies the contents of a
memory location in one memory cycle. It is defined as follows:
function Test-and-Set (var m: boolean): boolean;
begin
    Test-and-Set := m;
    m := true
end;
The test-and-set instruction returns the current value of variable m (memory
location) and sets it to true. This instruction can be used to implement P and V
operations (Primitives) on a binary semaphore, S, in the following way (S is
implemented as a memory location):
P(S): while Test-and-Set(S) do nothing;
V(S): S := false;
Initially, S is set to false. When a P(S) operation is executed for the first time,
test-and-set(S) returns a false value (and sets S to true) and the “while” loop of
the P(S) operation terminates. All subsequent executions of P(S) keep looping
because S is true until a V(S) operation is executed.
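On modern hardware the same idea is available through atomic operations. A minimal sketch in C (using C11 &lt;stdatomic.h&gt;; the helper names are ours, not from the unit) implements the P and V operations above as a spinlock:

```c
#include <stdatomic.h>

/* S is a binary semaphore implemented as a memory location;
 * ATOMIC_FLAG_INIT corresponds to "initially, S is set to false". */
static atomic_flag S = ATOMIC_FLAG_INIT;

static void P(void)
{
    /* atomic_flag_test_and_set returns the old value and sets the flag
     * in one atomic step: the Test-and-Set instruction described above. */
    while (atomic_flag_test_and_set(&S))
        ;   /* busy-wait: keep looping while S was already true */
}

static void V(void)
{
    atomic_flag_clear(&S);   /* S := false */
}
```

The first caller of P() sees the old value false and proceeds; all subsequent callers loop until some holder executes V().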
1.6.2 Compare-and-Swap
The compare-and-swap instruction is used in the optimistic synchronization of
concurrent updates to a memory location. This instruction is defined as follows
(r1 and r2 are two registers of a processor and m is a memory location):
function Compare-and-Swap (var m: integer; r1, r2: integer);
var temp: integer;
begin
    temp := m;
    if temp = r1 then begin m := r2; z := 1 end
    else begin r1 := temp; z := 0 end
end;
If the contents of r1 and m are identical, this instruction assigns the contents of
r2 to m and sets z to 1. Otherwise, it assigns the contents of m to r1 and set z to 0.
Variable z is a flag that indicates the success of the execution. This instruction
can be used to synchronize concurrent access to a shared variable.
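In C this corresponds to atomic_compare_exchange_strong from &lt;stdatomic.h&gt;, which likewise refreshes the expected value on failure. The lock-free increment below (our illustration, not from the unit) retries until its snapshot of m is still current:

```c
#include <stdatomic.h>

static _Atomic int m = 0;   /* shared memory location */

/* Optimistic update: read m, compute the new value, and install it only
 * if m is still unchanged; otherwise retry with the refreshed value. */
void atomic_increment(void)
{
    int r1 = atomic_load(&m);   /* expected old value */
    int r2;
    do {
        r2 = r1 + 1;            /* desired new value  */
        /* On failure this returns false and copies m's current contents
         * into r1 -- the same "r1 := temp" step as in the definition. */
    } while (!atomic_compare_exchange_strong(&m, &r1, r2));
}
```

The returned success flag plays the role of z above: 1 (true) means the swap happened, 0 (false) means another processor changed m first and the loop must retry.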
1.6.3 Fetch-and-Add
The fetch-and-add instruction is a multiple-operation memory access instruction
that atomically adds a constant to a memory location and returns the previous
contents of the memory location. This instruction is defined as follows:
function Fetch-and-Add (var m: integer; c: integer): integer;
var temp: integer;
begin
    temp := m;
    m := m + c;
    return (temp)
end;
This instruction is executed by the hardware placed in the interconnection network
not by the hardware present in the memory modules. When several processors
concurrently execute a fetch-and-add instruction on the same memory location,
these instructions are combined in the network and are executed by the network
in the following way:
• A single increment, which is the sum of the increments of all these
instructions, is added to the memory location.
• A single value is returned by the network to each of the processors,
corresponding to an arbitrary serialisation of the execution of the individual
instructions.
If a number of processors simultaneously perform fetch-and-add instructions
on the same memory location, the net result is as if these instructions were
executed serially in some unpredictable order.
The fetch-and-add instruction is powerful and it allows the implementation of P
and V operations on a general semaphore, S, in the following manner:
P(S): while (Fetch-and-Add(S, -1) < 0) do
      begin
          Fetch-and-Add(S, 1);
          while (S < 0) do nothing;
      end;
V(S): Fetch-and-Add(S, 1);
The outer “while-do” statement ensures that only one processor succeeds in
decrementing S to 0 when multiple processors try to decrement variable S. All
the unsuccessful processors add 1 back to S and try again to decrement it. The
inner “while-do” statement forces an unsuccessful processor to wait (before
retrying) until S is greater than 0.
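C's atomic_fetch_add provides the same primitive: it adds a constant and returns the previous contents. The sketch below (ours, not from the unit) implements P and V on a counting semaphore; as a defensive variant we acquire only when the value returned by fetch-and-add is positive, so a processor that reads a non-positive value backs off exactly as described above:

```c
#include <stdatomic.h>

static _Atomic int S = 1;   /* counting semaphore, initially 1 */

static void P(void)
{
    /* Decrement; if the previous value was not positive we failed,
     * so add 1 back, wait until S becomes positive, then retry. */
    while (atomic_fetch_add(&S, -1) <= 0) {
        atomic_fetch_add(&S, 1);
        while (atomic_load(&S) <= 0)
            ;   /* busy-wait before retrying */
    }
}

static void V(void)
{
    atomic_fetch_add(&S, 1);   /* release one unit of the resource */
}
```

Because the hardware (or the interconnection network) serialises the fetch-and-add operations, exactly as many P() calls succeed as there are units in S; the rest spin until a V() makes S positive again.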
Check Your Progress 1
1) What is the difference between a loosely coupled system and a tightly coupled
system? Give examples.
........................................................................................................................
........................................................................................................................
........................................................................................................................
........................................................................................................................
2) What is the difference between symmetric and asymmetric multiprocessing?
........................................................................................................................
........................................................................................................................
1.7 SUMMARY
A multiprocessor system architecture provides higher computing power and speed.
It consists of multiple processors that add power and speed to the system.
Such a system can execute multiple tasks on different processors concurrently.
Similarly, it can also execute a single task in parallel on different processors. The
design of interconnection networks includes the bus, the crossbar switch and
the multistage interconnection networks. To support parallel execution, the system
must effectively schedule tasks to the various processors. It must also support
primitives for process synchronization and virtual memory management. The
three basic configurations of multiprocessor operating systems are: separate
supervisors, master/slave and symmetric. A multiprocessor operating system
manages all the available resources and scheduling functionality to form an
abstraction; this includes process scheduling, memory management and device
management. Different mechanisms and techniques are used for synchronization
in multiprocessors to serialise access to pooled resources. In this unit we
have discussed the Test-and-Set, Compare-and-Swap and Fetch-and-Add
techniques of synchronization.
1.8 SOLUTIONS/ANSWERS
1) One feature that commonly characterises tightly coupled systems is that
they share the clock. Therefore, multiprocessors are typically tightly coupled,
but distributed workstations on a network are not.
Another difference is that in a tightly-coupled system, the delay experienced
when a message is sent from one computer to another is short, and the data rate
is high; that is, the number of bits per second that can be transferred is large.
In a loosely-coupled system, the opposite is true: the inter-machine message
delay is large and the data rate is low.
For example, two CPU chips on the same printed circuit board and connected
by wires etched onto the board are likely to be tightly coupled, whereas two
computers connected by a 2400 bit/sec modem over the telephone system
are certain to be loosely coupled.
2) The difference between symmetric and asymmetric multiprocessing is that
all processors in symmetric multiprocessing are peers, whereas the relationship
between the processors in asymmetric multiprocessing is a master-slave
relationship. More specifically, each CPU in symmetric multiprocessing runs
the same copy of the OS, while in asymmetric multiprocessing the processors
split responsibilities; therefore, each may have specialised (different) software
and roles.