
Characteristics of Distributed Systems :-

Introduction

Picture a world where your favourite websites never crash or slow down, even when millions of
people are accessing them at the same time.

Well, this is the power of distributed systems.

In this article, we will explore the depths of distributed systems. We will also study the characteristics
of the revolutionary approach to computing.

What are Distributed Systems?

Also known as Distributed Computing or Distributed Databases, a Distributed System is a cluster of independent components on different machines connected by a computer network. These machines share resources, files and messages to reach common goals. Data today is more distributed than ever, which is why modern applications rarely run in isolation and instead rely heavily on distributed systems.

Types of Distributed Systems

 Client/Server Systems: This is the most basic form. The client sends an input to the server, and the server replies with an output: the client asks for a task to be performed, and the server allocates resources, performs the task, and sends the result back as a response. One server can serve many clients, and a server may itself act as a client of other servers.

 Peer-to-Peer Systems: In this system, each node executes its task on its locally allocated
memory and shares the data through a supporting medium. Computer network applications
use a peer-to-peer system to manage processors that communicate with each other but
maintain independent memory bases.

 Middleware: It is an application which sits between two different applications and provides
services and benefits to both.

 Three-tier: A three-tier system uses a distinct layer and server for each program function. It contains a presentation layer, an application layer, and a data layer; the client's data and application logic are kept in the middle (application) tier rather than on the client. Three-tier systems are most commonly used in web or online applications.
 N-tier: It is also known as a multitier distributed system. As the name suggests, this system
may contain any number of functions, similar to the three-tier system. This N-tier system is
more commonly used in web applications and data systems.
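The client/server pattern above can be sketched in a few lines of Python. This is a toy, single-machine illustration: the server's "task" is just uppercasing the client's input, and the port is chosen by the OS.

```python
import socket
import threading

# Minimal client/server sketch: the server performs a task (uppercasing)
# on the client's input and returns the result as its response.
def serve(listener):
    conn, _ = listener.accept()
    with conn:
        data = conn.recv(1024)        # client's input
        conn.sendall(data.upper())    # server's output

listener = socket.socket()
listener.bind(("localhost", 0))       # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=serve, args=(listener,), daemon=True).start()

client = socket.create_connection(("localhost", port))
client.sendall(b"ping")
reply = client.recv(1024)
print(reply)                          # b'PING'
client.close()
listener.close()
```

In a real deployment the client and server would run on different machines and speak an application-level protocol rather than raw bytes.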

Also read - features of operating system

Applications of Distributed System

 In Networking, Ethernet and LANs are classic examples of distributed systems. Computers send messages to other computers and systems using local IP addresses.

 In Telecommunications, telephone and cellular networks are distributed networks with a broad spread of base stations, and they continue to grow in complexity as distributed networks.

 In Real-time Systems, the systems are distributed globally, and many major industries use them. Companies like Uber and Lyft use dispatch systems, many major airlines use flight control systems, and e-commerce websites and logistics companies use real-time tracking systems.

Characteristics of Distributed Systems

The key characteristics of distributed systems are:

Transparency

One of the essential characteristics of a distributed system, transparency is the notion that the user interacts with a single coherent whole rather than a cluster of cooperating components. A system capable of presenting itself as a single whole to the user is called transparent.

Transparency is divided into the eight sub-characteristics illustrated in the following table:

Transparency    Description

Access          Hide differences in data representation and how an object is accessed.

Location        Hide where an object is located.

Relocation      Hide that an object may be moved to another location while in use.

Migration       Hide that an object may move to another location.

Replication     Hide that an object is replicated.

Concurrency     Hide that an object may be shared by several independent users.

Failure         Hide any resource failures.

Persistence     Hide whether an object resides in memory or on disk.

Heterogeneity

Heterogeneity refers to the system's ability to operate on various hardware and software
components. Middleware in the software layer helps achieve heterogeneity. The goal of the
middleware is to interpret the programming calls such that the distributed processing gets
completed.

Openness

Another important characteristic of a distributed system is openness. A distributed system's openness is the ease with which an existing system can be extended or improved. To make an open distributed system,

 The interface of the components should be well-defined and precise.

 The interface of the components should be standardised.

 Integration of new components with existing ones must be effortless.

Scalability

In terms of effectiveness, scalability is one of the most significant characteristics of distributed systems. It refers to the ability of the system to handle growth as the number of users increases. Scalability is accomplished by adding more computer systems to the existing network.

A centralised component limits the scalability of a distributed system. If a system is centralised, more and more nodes try to communicate with that component, which results in a bottleneck in the system.

Fault Tolerance

A distributed system is very likely to be prone to failures, because it runs on many computers with hardware of diverse ages. The ability of a system to keep working despite these failures is called fault tolerance. Fault tolerance is achieved by:

 Recovery: Systems and processes keep stored backups, which take over when a component fails.

 Redundancy: Critical components are duplicated, so a backup copy can take over when the primary fails and the system keeps behaving in a predictable, controlled way.
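Recovery and redundancy can be illustrated with a toy failover loop in Python. The replicas below are hypothetical in-process stand-ins for real backup servers; the first two "fail" and the third succeeds.

```python
# Hypothetical replicas: callables that either fail or handle a request.
def make_replica(healthy):
    def handler(request):
        if not healthy:
            raise ConnectionError("replica down")
        return f"handled: {request}"
    return handler

replicas = [make_replica(False), make_replica(False), make_replica(True)]

def call_with_failover(request):
    for replica in replicas:
        try:
            return replica(request)   # redundancy: multiple copies exist
        except ConnectionError:
            continue                  # recovery: fail over to a backup
    raise RuntimeError("all replicas failed")

result = call_with_failover("read x")
print(result)                         # handled: read x
```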

Concurrency

Concurrency is the system's capability to let multiple processes access and use shared resources, meaning multiple activities are performed at the same time. In distributed systems, concurrent execution of activities takes place in different components running on numerous machines.
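A minimal Python sketch of concurrency control: several threads update a shared counter, and a lock keeps the concurrent accesses safe (without it, increments could be lost).

```python
import threading

counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(100_000):
        with lock:                 # serialize access to the shared resource
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                     # 400000
```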

Efficiency
Efficiency refers to the capability of the system to use its resources effectively to execute the given
tasks. The system's design, the workload handled by the system, and the hardware and software
resources used are some critical factors affecting the system's efficiency.

Some of the common ways to improve the efficiency of the system are:

 Optimising the design of the system. This minimises the amount of communication and
coordination required between the different components, reducing any extra power
consumption.

 Carefully balancing the workload of the system. This balance avoids overloading any component and ensures that the system can make the most efficient use of its resources.

After expanding your knowledge of the characteristics of distributed systems, let us now discuss the
disadvantages of distributed systems.

Disadvantages of Distributed Systems

 Mature software designed specifically for building and managing distributed systems is still limited.

 As the data is distributed, security is a primary concern, since data may be more easily accessible.

 If there is a delay in the network, the user may face difficulty accessing data.

 A distributed system has an intricate database that is challenging to manage.

 Network overloading is another challenge in distributed systems. It happens when all the nodes send data at once.

Distributed Systems: Challenges/Failures


1. Fault Tolerance

 Challenge: Components (nodes, hardware, or software) may fail, and the system must
continue to function correctly despite these failures.
 Key Issues:
o Identifying failures promptly.
o Recovering gracefully without impacting the overall system.
 Example: Handling server crashes in cloud computing.

2. Scalability

 Challenge: The system must handle increased loads by adding more resources.
 Key Issues:
o Avoiding bottlenecks, such as centralized components.
o Maintaining performance as the number of nodes grows.
 Example: Scaling up e-commerce platforms during high-traffic events like Black
Friday.

3. Concurrency

 Challenge: Multiple processes execute simultaneously and may access shared resources.
 Key Issues:
o Preventing race conditions where two processes update the same resource
inconsistently.
o Avoiding deadlocks where processes wait indefinitely for resources.
 Example: Managing simultaneous edits on a shared document in collaborative tools.
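The lock-ordering idea for avoiding deadlocks can be sketched in Python. The two "transfers" below request the same two locks in opposite orders, which is the classic deadlock setup; acquiring them in one global order (here: by object id) prevents a cycle of waiting.

```python
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()
completed = []

def transfer(name, locks):
    # Acquire locks in a single global order so two transfers can
    # never each hold one lock while waiting for the other.
    first, second = sorted(locks, key=id)
    with first, second:
        completed.append(name)    # critical section: update both resources

t1 = threading.Thread(target=transfer, args=("t1", [lock_a, lock_b]))
t2 = threading.Thread(target=transfer, args=("t2", [lock_b, lock_a]))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(completed))          # ['t1', 't2'] — both finished, no deadlock
```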

4. Data Consistency

 Challenge: Ensuring data remains consistent across all nodes in the system.
 Key Issues:
o Trade-offs between consistency, availability, and partition tolerance (CAP
theorem).
o Achieving consistency in the face of network delays or failures.
 Example: Banking systems ensuring account balances are updated accurately across
branches.
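The CAP-related trade-off is often managed with quorums. A toy check, assuming N replicas with a read quorum of R and a write quorum of W: choosing R + W > N guarantees every read quorum overlaps every write quorum, so a read always sees at least one up-to-date copy.

```python
# Toy quorum-overlap check for an N-replica system.
def quorums_overlap(n, r, w):
    # True when any R replicas must intersect any W replicas.
    return r + w > n

print(quorums_overlap(3, 2, 2))   # True: reads see the latest write
print(quorums_overlap(3, 1, 1))   # False: stale reads are possible
```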

5. Latency

 Challenge: Communication between distributed nodes introduces delays.


 Key Issues:
o Reducing latency caused by geographical distance and network inefficiencies.
o Ensuring low-latency responses in real-time systems.
 Example: Video streaming platforms ensuring smooth playback without buffering.

6. Security

 Challenge: Protecting the system from unauthorized access, breaches, and attacks.
 Key Issues:
o Secure communication between nodes.
o Preventing attacks like Distributed Denial of Service (DDoS) or data
interception.
 Example: Encrypting data transmission in online banking.
7. Heterogeneity

 Challenge: Integrating diverse hardware, software, and network configurations.


 Key Issues:
o Compatibility across platforms with different protocols and architectures.
o Standardizing communication mechanisms.
 Example: IoT systems combining sensors, mobile devices, and cloud platforms.

8. Transparency

 Challenge: Masking the complexity of the distributed system from users and
developers.
 Types of Transparency:
o Access Transparency: Users shouldn’t need to know how to access remote
resources.
o Location Transparency: Resources’ physical locations shouldn’t matter.
o Failure Transparency: Failures should be handled seamlessly.
o Replication Transparency: Multiple copies of data shouldn’t be visible.
 Example: A user accessing files in Google Drive without worrying about their
storage location.

9. Synchronization

 Challenge: Coordinating operations and ensuring consistency in time-sensitive tasks.


 Key Issues:
o Maintaining synchronized clocks across nodes (e.g., NTP).
o Handling out-of-order events or updates.
 Example: Online gaming platforms where players' actions must be synchronized.
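One classic answer to out-of-order events is a logical (Lamport) clock, sketched below in Python. The merge rule on receive guarantees that a message's receive timestamp always exceeds its send timestamp, giving a consistent event order without synchronized physical clocks.

```python
class LamportClock:
    """Logical clock: orders events without synchronized physical clocks."""
    def __init__(self):
        self.time = 0

    def tick(self):                  # local event
        self.time += 1
        return self.time

    def send(self):                  # timestamp attached to an outgoing message
        return self.tick()

    def receive(self, msg_time):     # merge rule on message arrival
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send()                         # a.time becomes 1
b.receive(t)                         # b.time becomes 2 (> sender's timestamp)
print(a.time, b.time)                # 1 2
```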

10. Resource Management

 Challenge: Efficiently allocating and sharing resources among distributed nodes.


 Key Issues:
o Balancing resource usage to avoid overloading some nodes while
underutilizing others.
o Preventing resource contention.
 Example: Distributed databases managing query loads across multiple servers.
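A minimal round-robin dispatcher illustrates the load-balancing idea: queries are spread evenly across servers so no node is overloaded while others sit idle. The server names here are hypothetical.

```python
import itertools

servers = ["db-1", "db-2", "db-3"]        # hypothetical node names
rotation = itertools.cycle(servers)

def dispatch(query):
    # Assign each incoming query to the next server in the rotation.
    return next(rotation), query

assignments = [dispatch(f"q{i}")[0] for i in range(6)]
print(assignments)  # ['db-1', 'db-2', 'db-3', 'db-1', 'db-2', 'db-3']
```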

EXAMPLES :-
Distributed systems are systems that break down large tasks into smaller sub-tasks that can be
executed in parallel on different system components. The goal of a distributed system is to improve
performance, reliability, and scalability. Some examples of distributed systems include:

 Telecommunications networks: Support mobile and internet networks

 Scientific computing: Includes protein folding and genetic research

 Airline and hotel reservation systems: Use distributed systems

 Multiuser video conferencing systems: Use distributed systems

 Cryptocurrency processing systems: Such as Bitcoin

 Peer-to-peer file-sharing systems: Use distributed systems

 Cloud computing platforms: Use distributed systems

 Content distribution networks: Use distributed systems

 Distributed databases: Use distributed systems

 Hadoop Distributed File System (HDFS): Used for distributed computing on the Hadoop
framework

Interprocess Communication in Distributed Systems :-
Interprocess Communication (IPC) in distributed systems is crucial for enabling processes across
different nodes to exchange data and coordinate activities. This article explores various IPC methods,
their benefits, and challenges in modern distributed computing environments.

What is Interprocess Communication in a Distributed system?

Interprocess Communication in a distributed system is the process of exchanging data between two or more independent processes in a distributed environment. Interprocess communication on the internet provides both datagram and stream communication.
Characteristics of Inter-process Communication in Distributed Systems

There are mainly five characteristics of inter-process communication in a distributed environment/system.

 Synchronous System Calls: In synchronous system calls both sender and receiver use
blocking system calls to transmit the data which means the sender will wait until the
acknowledgment is received from the receiver and the receiver waits until the message
arrives.

 Asynchronous System Calls: In asynchronous system calls, both sender and receiver use non-blocking system calls to transmit the data, which means the sender doesn't wait for an acknowledgment from the receiver.

 Message Destination: A local port is a message destination within a computer, specified as an integer. A port has exactly one receiver but may have many senders. Processes may use multiple ports from which to receive messages. Any process that knows the number of a port can send a message to it.

 Reliability: It is defined in terms of validity and integrity.

 Integrity: Messages must arrive at the destination without corruption or duplication.
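A queue can play the role of a port (message destination). The Python sketch below, a single-process stand-in for real network IPC, pairs an asynchronous (non-blocking) send with a synchronous (blocking) receive.

```python
import queue
import threading

mailbox = queue.Queue()               # the "port": one receiver, many senders
results = []

def receiver():
    msg = mailbox.get()               # blocking (synchronous) receive:
    results.append(msg)               # waits until a message arrives

t = threading.Thread(target=receiver)
t.start()
mailbox.put("hello")                  # non-blocking (asynchronous) send:
t.join()                              # sender does not wait for an ack
print(results)                        # ['hello']
```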

Types of Interprocess Communication in Distributed Systems

Below are the types of interprocess communication (IPC) commonly used in distributed systems:

 Message Passing:

o Definition: Message passing involves processes communicating by sending and receiving messages. Messages can be structured data packets containing information or commands.

o Characteristics: It is a versatile method suitable for both synchronous and asynchronous communication. Message passing can be implemented using various protocols such as TCP/IP, UDP, or higher-level messaging protocols like AMQP (Advanced Message Queuing Protocol) or MQTT (Message Queuing Telemetry Transport).

 Shared Memory

Shared memory enables multiple processes to access a common region of memory. This
method is efficient as it minimizes communication overhead.

Key Concepts

 Processes communicate by reading and writing data to a shared memory space.

 Often implemented within a single machine but can extend to distributed environments
through specialized shared memory technologies.

Pros

 High performance: Fast access with minimal latency.

 Efficient memory sharing.


Cons

 Complex synchronization: Requires semaphores, mutexes, or other synchronization mechanisms.

 Limited to machines on the same node, or to network configurations with specialized shared memory technologies.

 Remote Procedure Calls (RPC):

o Definition: RPC allows one process to invoke a procedure (or function) in another
process, typically located on a different machine over a network.

o Characteristics: It abstracts the communication between processes by making it appear as if a local procedure call is being made. RPC frameworks handle details like parameter marshalling, network communication, and error handling.

 Sockets:

o Definition: Sockets provide a low-level interface for network communication between processes running on different computers.

o Characteristics: They allow processes to establish connections, send data streams (TCP) or datagrams (UDP), and receive responses. Sockets are fundamental for implementing higher-level communication protocols.

 Message Queuing Systems:

o Description: Message queuing systems facilitate asynchronous communication by allowing processes to send messages to and receive messages from queues.

o Characteristics: They decouple producers (senders) and consumers (receivers) of messages, providing fault tolerance, scalability, and persistence of messages. Examples include Apache Kafka, RabbitMQ, and AWS SQS.

 Publish-Subscribe Systems:

o Description: Publish-subscribe (pub-sub) systems enable communication between components without requiring them to directly know each other.

o Characteristics: Publishers publish messages to topics, and subscribers receive messages based on their interest in specific topics. This model supports one-to-many communication and is scalable for large-scale distributed systems. Examples include MQTT and Apache Pulsar.

These types of IPC mechanisms each have distinct advantages and are chosen based on factors such
as communication requirements, performance considerations, and the nature of the distributed
system architecture. Successful implementation often involves selecting the most suitable IPC type or
combination thereof to meet specific application needs.
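As a concrete illustration of the publish-subscribe type, here is a toy in-process broker in Python. Real systems like MQTT brokers or Apache Pulsar add networking, persistence, and fault tolerance on top of this basic shape.

```python
from collections import defaultdict

class Broker:
    """In-process pub-sub sketch: publishers and subscribers share only
    topic names, never direct references to each other."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # One-to-many delivery: every subscriber to the topic gets a copy.
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("alerts", received.append)
broker.publish("alerts", "disk full")
broker.publish("metrics", "cpu 40%")   # no subscriber: silently dropped
print(received)                        # ['disk full']
```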

Benefits of Interprocess Communication in Distributed Systems

Below are the benefits of IPC in Distributed Systems:

 Facilitates Communication:
o IPC enables processes or components distributed across different nodes to
communicate seamlessly.

o This allows for building complex distributed applications where different parts of the
system can exchange information and coordinate their activities.

 Integration of Heterogeneous Systems:

o IPC mechanisms provide a standardized way of integrating heterogeneous systems and platforms.

o Processes written in different programming languages or running on different operating systems can communicate using common IPC protocols and interfaces.

 Scalability:

o Distributed systems often need to scale horizontally by adding more nodes or instances.

o IPC mechanisms, especially those designed for distributed environments, can facilitate scalable communication patterns such as publish-subscribe or message queuing, enabling efficient scaling without compromising performance.

 Fault Tolerance and Resilience:

o IPC techniques in distributed systems often include mechanisms for handling failures
and ensuring resilience.

o For example, message queues can buffer messages during network interruptions,
and RPC frameworks can retry failed calls or implement failover strategies.

 Performance Optimization:

o Effective IPC can optimize performance by minimizing the latency and overhead associated with communication between distributed components.

o Techniques like shared memory or efficient message passing protocols help in achieving low-latency communication.

Challenges of Interprocess Communication in Distributed Systems

Below are the challenges of IPC in Distributed Systems:

 Network Latency and Bandwidth:

o Distributed systems operate over networks where latency (delay in transmission) and
bandwidth limitations can affect IPC performance.

o Minimizing latency and optimizing bandwidth usage are critical challenges, especially
for real-time applications.

 Reliability and Consistency:

o Ensuring reliable and consistent communication between distributed components is challenging.
o IPC mechanisms must handle network failures, message loss, and out-of-order
delivery while maintaining data consistency across distributed nodes.

 Security:

o Securing IPC channels against unauthorized access, eavesdropping, and data tampering is crucial.

o Distributed systems often transmit sensitive data over networks, requiring robust
encryption, authentication, and access control mechanisms.

 Complexity in Error Handling:

o IPC errors, such as network timeouts, connection failures, or protocol mismatches, must be handled gracefully to maintain system stability.

o Designing robust error handling and recovery mechanisms adds complexity to distributed system implementations.

 Synchronization and Coordination:

o Coordinating actions and ensuring synchronization between distributed components can be challenging, especially when using shared resources or implementing distributed transactions.

o IPC mechanisms must support synchronization primitives and consistency models to avoid race conditions and ensure data integrity.

Example of Interprocess Communication in Distributed System

Let’s consider a scenario to understand the Interprocess Communication in Distributed System:

Consider a distributed system where you have two processes running on separate computers, a client
process (Process A) and a server process (Process B). The client process needs to request information
from the server process and receive a response.

IPC Example using Remote Procedure Calls (RPC):

1. RPC Setup:

 Process A (Client): Initiates an RPC call to Process B (Server).

 Process B (Server): Listens for incoming RPC requests and responds accordingly.

2. Steps Involved:

 Client-side (Process A):

o The client process prepares an RPC request, which includes the name of the
remote procedure to be called and any necessary parameters.

o It sends this request over the network to the server process.

 Server-side (Process B):

o The server process (Process B) listens for incoming RPC requests.


o Upon receiving an RPC request from Process A, it executes the requested
procedure using the provided parameters.

o After processing the request, the server process prepares a response (if
needed) and sends it back to the client process (Process A) over the
network.

3. Communication Flow:

 Process A and Process B communicate through the RPC framework, which manages
the underlying network communication and data serialization.

 The RPC mechanism abstracts away the complexities of network communication and
allows the client and server processes to interact as if they were local.

4. Example Use Case:

 Process A (Client) could be a web application requesting user data from a database
hosted on Process B (Server).

 Process B (Server) receives the request, queries the database, processes the data,
and sends the results back to Process A (Client) via RPC.

 The client application then displays the retrieved data to the user.

In this example, RPC serves as the IPC mechanism facilitating communication between the client and
server processes in a distributed system. It allows processes running on different machines to
collaborate and exchange data transparently, making distributed computing more manageable and
scalable.
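The steps above map almost directly onto Python's built-in xmlrpc modules. This sketch runs both "processes" in one program for illustration, and the get_user procedure is a hypothetical stand-in for a real database query.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Server side (Process B): register a procedure and listen for requests.
def get_user(user_id):
    # Hypothetical stand-in for querying a database.
    return {"id": user_id, "name": "Alice"}

server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(get_user, "get_user")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side (Process A): the remote call reads like a local one; the
# RPC framework marshals arguments, sends them, and unmarshals the result.
client = ServerProxy(f"http://localhost:{port}")
result = client.get_user(7)
print(result)                 # the dict produced remotely by get_user
server.shutdown()
```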

Application Program Interface (API) for internet protocols


API is a programming interface between application programs and communication subsystems based on open network protocols. The API lets any application program operating in its own MVS address space access and use communication services provided by an MVS subsystem that implements this interface. TCP access, which provides communication services using TCP/IP protocols, is an example of such a subsystem.

This programmer's reference describes an interface to the transport layer of the Basic
Reference Model of Open Systems Interconnection (OSI). Although the API is capable of
interfacing to proprietary protocols, the Internet open network protocols are the intended
providers of the transport service. This document uses the term "open" to emphasize that
any system conforming to one of these standards can communicate with any other system
conforming to the same standard, regardless of vendor. These protocols are contrasted with proprietary protocols that generally support a closed community of systems supplied by a single vendor.

External data representation and marshalling :-


External data representation
The information stored in running programs is represented as data structures, while in a distributed system the information in messages transferred between components consists of sequences of bytes. So, to communicate any information, these data structures must be converted to a sequence of bytes before transmission. Likewise, on the arrival of messages, the data must be converted back into its original data structures.

Several different types of data are used in computers, and these types are not represented the same way everywhere data needs to be transferred. Let's see how these types differ from one platform to another.

 Integers — two different byte orders: big-endian and little-endian

 Floats — Different representation in different architectures

 Characters — ASCII and Unicode

To effectively communicate these different types of data between computers, there must be a way to convert every data item to a common format. An external data representation is an agreed standard that acts as the intermediate data format in transmission.

Marshalling

Marshalling is the process of taking a collection of data structures to be transferred and formatting them into an external data representation suitable for transmission in a message.

Unmarshalling

Unmarshalling is the inverse process: reformatting the transferred data on arrival to produce the original data structures at the destination.
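Marshalling and unmarshalling can be seen concretely with Python's struct module, which packs values into an agreed, architecture-independent byte layout (big-endian "network order"), independent of the sender's native byte order.

```python
import struct

# Marshal a (record id, temperature) pair into a fixed big-endian layout
# that any receiver can decode, regardless of its architecture.
record = (42, 3.14)
wire = struct.pack("!if", *record)    # ! = network (big-endian) byte order

# Unmarshal on "arrival": rebuild the original values from the bytes.
rec_id, temp = struct.unpack("!if", wire)
print(rec_id, round(temp, 2))         # 42 3.14
```

Note that the float comes back as a 32-bit approximation, which is why marshalling schemes must fix the precision of each primitive type, not just its byte order.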

Let’s find how this external data representation works in different use cases.

CORBA’s common data representation

Common Object Request Broker Architecture (CORBA) is a specification developed by the Object Management Group (OMG) and is among the leading middleware solutions in distributed systems. It is a specification for creating, distributing, and managing objects in distributed networks. CORBA describes a messaging mechanism by which objects distributed over a network can exchange messages with each other irrespective of the platform or language used to create those objects. This enables collaboration between systems on different architectures, operating systems, and programming languages, as well as different computer hardware.

CORBA’s Common Data Representation specification includes 15 primitive data types and other
constructed types.
CORBA CDR Example

Java’s object serialization

In Java remote method invocation (RMI), both objects and primitive data values may be passed as
arguments and results of method invocations. In Java, the term serialization refers to the activity of
flattening an object(An instance of a class) or a connected set of objects into a serial form that is
suitable for storing on disk or transmitting in a message.
Java Object Serialization Example
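Python's pickle module plays a role analogous to Java's object serialization: it flattens an object, or a connected set of objects, into a byte stream and restores it later. A sketch of the same flatten-and-restore idea (in Python rather than Java):

```python
import pickle

# A connected set of objects: a small graph of nested dicts.
graph = {"node": "A", "edges": [{"node": "B", "edges": []}]}

flat = pickle.dumps(graph)        # serialize (flatten) to bytes
restored = pickle.loads(flat)     # deserialize back into objects

print(restored == graph)          # True: structurally identical copy
```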

XML (Extensible Markup Language)

XML is a markup language that was defined by the World Wide Web Consortium for general use on
the web. XML was initially developed for writing structured documents for the web. XML is used to
enable clients to communicate with web services and for defining the interfaces and other properties
of web services.

XML Data Representation Example
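A small illustration of textual (XML) data representation using Python's standard library: the structured value is marshalled to text, then parsed back into a tree on "arrival".

```python
import xml.etree.ElementTree as ET

# Marshal a structured value textually, as XML-based systems do.
person = ET.Element("person", id="123")
ET.SubElement(person, "name").text = "Smith"
text = ET.tostring(person, encoding="unicode")
print(text)  # <person id="123"><name>Smith</name></person>

# Unmarshal: parse the text back into a tree of elements.
parsed = ET.fromstring(text)
print(parsed.find("name").text)  # Smith
```

Compared with the binary forms above, the textual representation is longer but self-describing and readable by hand.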

Conclusion

In CORBA’s common data representation and Java’s object serialization, the marshalling and
unmarshalling activities are intended to be carried out by a middleware layer without any
involvement on the part of the application programmer. Even in the case of XML, which is textual and
therefore more accessible to hand-encoding, software for marshalling and unmarshalling is available
for all commonly used platforms and programming environments. Because marshalling requires the
consideration of all the finest details of the representation of the primitive components of composite
objects, the process is likely to be error-prone if carried out by hand. Compactness is another issue
that can be addressed in the design of automatically generated marshalling procedures.

In CORBA’s common data representation and Java’s object serialization, the primitive data types are
marshalled into a binary form. In XML, the primitive data types are represented textually. The textual
representation of a data value will generally be longer than the equivalent binary representation. The
HTTP protocol, which is described in Chapter 5, is another example of the textual approach.

Another issue with regard to the design of marshalling methods is whether the marshalled data
should include information concerning the type of its contents. For example, CORBA’s representation
includes just the values of the objects transmitted and nothing about their types. On the other hand,
both Java serialization and XML do include type information, but in different ways. Java puts all of the
required type information into the serialized form, but XML documents may refer to externally
defined sets of names (with types) called namespaces.

Although we are interested in the use of an external data representation for the arguments and
results of RMIs and RPCs, it does have a more general use for representing data structures, objects or
structured documents in a form suitable for transmission in messages or storing in files.

Client Server Communication in Operating System :-


In an operating system, client-server communication refers to the exchange of data and services among multiple machines or processes. In this model, one process or machine acts as a client requesting a service or data, and another machine or process acts as a server providing those services or data to the client. This communication model is widely used for exchanging data in computing environments such as distributed systems, internet applications, and networking applications. The communication between server and client takes place using different protocols and mechanisms.

Different Ways of Client-Server Communication

In client-server communication, we can use several different mechanisms.

1. Sockets Mechanism

2. Remote Procedure Call

3. Message Passing

4. Inter-process Communication

5. Distributed File Systems

Sockets Mechanism

Sockets are the endpoints of communication between two machines. They provide a way for processes to communicate with each other, either on the same machine or over the internet. Sockets enable a communication connection between the server and the client, allowing data to be transferred in both directions.
Client Server Communication using Sockets

Remote Procedure Call (RPC)

Remote Procedure Call is a protocol, i.e. a set of instructions, that allows a client to execute a procedure call on a remote server as if it were a local procedure call. RPC is commonly used in client-server communication architectures and provides a high level of abstraction to the programmer. The client program issues a procedure call, which is translated into a message that is sent over the network to the server; the server executes the call and sends the result back to the client machine.

Remote Procedure Call Process

Message Passing

Message Passing is a communication method in which machines communicate with one another by sending and receiving messages. This approach is commonly used in parallel and distributed systems and enables data exchange among them.
Message Passing Process

Inter process Communication

Interprocess Communication, also called IPC, allows communication between processes within the same machine. IPC enables data sharing and synchronization between different processes running concurrently on an operating system, and it includes shared memory, message queues, semaphores, and pipes, among others.

Inter process Communication Process

Distributed File Systems

Distributed file systems provide access to files from multiple machines in a network. Clients can access and manipulate files stored on a remote server through a standard interface, for example the Network File System (NFS) or Server Message Block (SMB).

Group Communication in Distributed Systems :-


In distributed systems, efficient group communication is crucial for coordinating activities among
multiple entities. This article explores the challenges and solutions involved in facilitating reliable and
ordered message delivery among members of a group spread across different nodes or networks.

What is Group Communication in Distributed Systems?

Group communication in distributed systems refers to the process of exchanging information among
multiple nodes or entities that are geographically dispersed or located on different machines within a
network. It involves mechanisms and protocols designed to facilitate communication and
coordination among members of a group, where each member typically plays a specific role or
performs particular tasks within the distributed system.

Importance of Group Communication in Distributed Systems

Group communication is critically important in distributed systems due to several key reasons:

 Coordination and Synchronization:

o Distributed systems often involve multiple nodes or entities that need to collaborate
and synchronize their activities.

o Group communication mechanisms facilitate the exchange of information,
coordination of tasks, and synchronization of state among these distributed entities.

o This ensures that all parts of the system are aware of the latest updates and can act
in a coordinated manner.

 Efficient Information Sharing:

o In distributed systems, different nodes may generate or process data that needs to
be shared among multiple recipients.

o Group communication allows for efficient dissemination of information to all
relevant parties simultaneously, reducing latency and ensuring consistent views of
data across the system.

 Fault Tolerance and Reliability:

o Group communication protocols often include mechanisms for ensuring reliability
and fault tolerance.

o Messages can be replicated or acknowledged by multiple nodes to ensure that
communication remains robust even in the face of node failures or network
partitions.

o This enhances the overall reliability and availability of the distributed system.

 Scalability:

o As distributed systems grow in size and complexity, the ability to scale effectively
becomes crucial.

o Group communication mechanisms are designed to handle increasing numbers of
nodes and messages without compromising performance or reliability.

o They enable the system to maintain its responsiveness and efficiency as it scales up.

Types of Group Communication in a Distributed System

Below are the three types of group communication in distributed systems:

1. Unicast Communication


Unicast communication refers to the point-to-point transmission of data between two nodes in a
network. In the context of distributed systems:

 Definition: Unicast involves a sender (one node) transmitting a message to a specific receiver
(another node) identified by its unique network address.

 Characteristics:

o One-to-One: Each message has a single intended recipient.

o Direct Connection: The sender establishes a direct connection to the receiver.

o Efficiency: Suitable for scenarios where targeted communication is required, such as
client-server interactions or direct peer-to-peer exchanges.

 Use Cases:

o Request-Response: Common in client-server architectures where clients send
requests to servers and receive responses.

o Peer-to-Peer: Direct communication between two nodes in a decentralized network.

 Advantages:

o Efficient use of network resources as messages are targeted.

o Simplified implementation due to direct connections.

o Low latency since messages are sent directly to the intended recipient.

 Disadvantages:

o Not scalable for broadcasting to multiple recipients without sending separate
messages.

o Increased overhead if many nodes need to be contacted individually.

2. Multicast Communication


Multicast communication involves sending a single message from one sender to multiple receivers
simultaneously within a network. It is particularly useful in distributed systems where broadcasting
information to a group of nodes is necessary:

 Definition: A sender transmits a message to a multicast group, which consists of multiple
recipients interested in receiving the message.

 Characteristics:

o One-to-Many: Messages are sent to multiple receivers in a single transmission.


o Efficient Bandwidth Usage: Reduces network congestion compared to multiple
unicast transmissions.

o Group Membership: Receivers voluntarily join and leave multicast groups as needed.

 Use Cases:

o Content Distribution: Broadcasting updates or notifications to subscribers.

o Collaborative Systems: Real-time collaboration tools where changes made by one
user need to be propagated to others.

 Advantages:

o Saves bandwidth and network resources by transmitting data only once.

o Simplifies management by addressing a group rather than individual nodes.

o Supports scalable communication to a large number of recipients.

 Disadvantages:

o Requires mechanisms for managing group membership and ensuring reliable
delivery.

o Vulnerable to network issues such as packet loss or congestion affecting all
recipients.

3. Broadcast Communication

Broadcast communication involves sending a message from one sender to all nodes in the network,
ensuring that every node receives the message:


 Definition: A sender transmits a message to all nodes within the network without the need
for specific recipients.

 Characteristics:

o One-to-All: Messages are delivered to every node in the network.

o Broadcast Address: Uses a special network address (e.g., IP broadcast address) to
reach all nodes.

o Global Scope: Suitable for disseminating information to all connected nodes
simultaneously.

 Use Cases:

o Network Management: Broadcasting status updates or configuration changes.

o Emergency Alerts: Disseminating critical information to all recipients in a timely
manner.

 Advantages:
o Ensures that every node receives the message without requiring explicit recipient
lists.

o Efficient for scenarios where global dissemination of information is necessary.

o Simplifies communication in small-scale networks or LAN environments.

 Disadvantages:

o Prone to network congestion and inefficiency in large networks.

o Security concerns, as broadcast messages are accessible to all nodes, potentially
leading to unauthorized access or information leakage.

o Requires careful network design and management to control the scope and impact
of broadcast messages.

Reliable Multicast Protocols for Group Communication

Reliable multicast protocols are essential in distributed systems to ensure that messages sent from a
sender to multiple recipients are delivered reliably, consistently, and in a specified order. These
protocols are designed to handle the complexities of group communication, where ensuring every
member of a multicast group receives the message correctly is crucial.

Types of Reliable Multicast Protocols include:

 FIFO Ordering:

o Ensures that messages are delivered to all group members in the order they were
sent by the sender.

o Achieved by sequencing messages and delivering them sequentially to maintain the
correct order.

 Causal Ordering:

o Preserves the causal relationships between messages based on their dependencies.

o Ensures that messages are delivered in an order that respects the causal
dependencies observed by the sender.

 Total Order and Atomicity:

o Guarantees that all group members receive messages in the same global order.

o Ensures that operations based on the multicast messages (like updates to shared
data) appear atomic or indivisible to all recipients.

Scalability and Performance for Group Communication

Scalability and performance are critical aspects of group communication in distributed systems,
where the ability to handle increasing numbers of nodes, messages, and participants while
maintaining efficient operation is essential. Here’s an in-depth explanation of scalability and
performance considerations in this context:

1. Scalability
Scalability in group communication refers to the system’s ability to efficiently accommodate growth
in terms of:

 Number of Participants: As the number of nodes or participants in a group increases, the
system should be able to manage communication without significant degradation in
performance.

 Volume of Messages: Handling a larger volume of messages being exchanged among group
members, ensuring that communication remains timely and responsive.

 Geographical Distribution: Supporting communication across geographically dispersed
nodes or networks, which may introduce additional latency and bandwidth challenges.

2. Challenges in Scalability

 Communication Overhead: As the group size increases, the overhead associated with
managing group membership, message routing, and coordination can become significant.

 Network Bandwidth: Ensuring that the network bandwidth can handle the increased traffic
generated by a larger group without causing congestion or delays.

 Synchronization and Coordination: Maintaining consistency and synchronization among
distributed nodes becomes more complex as the system scales up.

3. Strategies for Scalability

 Partitioning and Sharding: Dividing the system into smaller partitions or shards can reduce
the scope of communication and management tasks, improving scalability.

 Load Balancing: Distributing workload evenly across nodes or partitions to prevent
bottlenecks and ensure optimal resource utilization.

 Replication and Caching: Replicating data or messages across multiple nodes can reduce
access latency and improve fault tolerance, supporting scalability.

 Scalable Protocols and Algorithms: Using efficient communication protocols and algorithms
designed for large-scale distributed systems, such as gossip protocols or scalable consensus
algorithms.

4. Performance

Performance in group communication involves optimizing various aspects to achieve:

 Low Latency: Minimizing the time delay between sending and receiving messages within the
group.

 High Throughput: Maximizing the rate at which messages can be processed and delivered
across the system.

 Efficient Resource Utilization: Using network bandwidth, CPU, and memory resources
efficiently to support fast and responsive communication.

5. Challenges in Performance
 Message Ordering: Ensuring that messages are delivered in the correct order while
maintaining high throughput can be challenging, especially in protocols that require strict
ordering guarantees.

 Concurrency Control: Managing concurrent access to shared resources or data within the
group without introducing contention or bottlenecks.

 Network Conditions: Adapting communication strategies to varying network conditions,
such as bandwidth limitations or packet loss, to maintain optimal performance.

6. Strategies for Performance

 Optimized Message Routing: Using efficient routing algorithms to minimize the number of
network hops and reduce latency.

 Asynchronous Communication: Employing asynchronous messaging patterns to decouple
sender and receiver activities, improving responsiveness.

 Caching and Prefetching: Pre-fetching or caching frequently accessed data or messages to
reduce latency and improve response times.

 Parallelism: Leveraging parallel processing techniques to handle multiple tasks or messages
concurrently, enhancing throughput.

Challenges of Group Communication in Distributed Systems

Group communication in distributed systems poses several challenges due to the inherent
complexities of coordinating activities across multiple nodes or entities that may be geographically
dispersed or connected over unreliable networks. Here are some of the key challenges:

 Reliability: Ensuring that messages are reliably delivered to all intended recipients despite
network failures, node crashes, or temporary disconnections. Reliable delivery becomes
especially challenging when nodes join or leave the group dynamically.

 Scalability: As the number of group members increases, managing communication becomes
more challenging. Scalability issues arise in terms of bandwidth consumption, message
processing overhead, and the ability to maintain performance as the system scales.

 Concurrency and Consistency: Ensuring consistency of shared data across distributed nodes
while allowing concurrent updates can be difficult. Coordinating access to shared resources
to prevent conflicts and maintain data integrity requires robust synchronization mechanisms.

 Fault Tolerance: Dealing with node failures, network partitions, and transient communication
failures without compromising the overall reliability and availability of the system. This
involves mechanisms for detecting failures, managing group membership changes, and
ensuring that communication continues uninterrupted.

IPC in UNIX :-
Inter-process communication (IPC) in UNIX allows processes to exchange data and signals,
enabling collaboration between different programs or processes. UNIX provides several IPC
mechanisms, each designed for different use cases, performance requirements, and levels of
complexity. Here are the key IPC methods in UNIX:
### 1. **Pipes**
- **Description**: Pipes are one of the simplest IPC mechanisms that allow data to flow in
one direction between two related processes (parent-child).
- **Types**:
- **Anonymous Pipes**: Used for communication between processes that have a
common ancestor, like parent and child processes.
- **Named Pipes (FIFOs)**: These pipes have a name in the filesystem, allowing
communication between unrelated processes.
- **Example**:
```bash
$ ls | grep txt
```
In this example, the output of the `ls` command is passed through a pipe to the `grep`
command.
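The same parent-to-child byte stream can be sketched with the underlying system calls via Python's `os` module (`os.pipe()` and `os.fork()` wrap the UNIX calls of the same name); this assumes a UNIX-like system:

```python
import os

r, w = os.pipe()               # unidirectional: bytes written to w come out of r
pid = os.fork()
if pid == 0:                   # child process: writes into the pipe
    os.close(r)
    os.write(w, b"data through the pipe")
    os.close(w)
    os._exit(0)
else:                          # parent process: reads from the pipe
    os.close(w)
    data = os.read(r, 1024)
    os.close(r)
    os.waitpid(pid, 0)
    print(data.decode())  # -> data through the pipe
```

Closing the unused end on each side is important: a reader only sees end-of-file once every write end of the pipe has been closed.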

### 2. **Message Queues**


- **Description**: Message queues allow processes to exchange messages in a structured
format (usually in the form of messages and message types). Each message has a type and a
body.
- **Advantages**:
- Messages can be read out of order (based on priority or type).
- It provides persistence (messages remain in the queue until they are read).
- **System Calls**:
- `msgget()`: Create or access a message queue.
- `msgsnd()`: Send a message to the queue.
- `msgrcv()`: Receive a message from the queue.
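The "read by type/priority" behaviour can be mimicked in pure Python with a `queue.PriorityQueue` (the System V calls above are C APIs with no direct standard-library binding, so treat this only as an analogue of typed message delivery):

```python
import queue

# Each "message" carries a type/priority plus a body; like msgrcv() with a
# message type, consumers can take messages out of arrival order.
mq = queue.PriorityQueue()
mq.put((2, "routine report"))   # sent first
mq.put((1, "urgent alert"))     # sent second, but with a lower type number

msg_type, body = mq.get()       # the type-1 message is delivered first
print(msg_type, body)  # -> 1 urgent alert
```

For real System V queues from Python, a third-party binding such as `sysv_ipc` would be needed.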

### 3. **Shared Memory**


- **Description**: Shared memory allows multiple processes to access a common segment
of memory, making it the fastest IPC method since it avoids the overhead of data copying.
- **Advantages**:
- Efficient for large data transfer since data is not copied between processes.
- **Challenges**:
- Synchronization issues, such as race conditions, need to be managed (usually via
semaphores).
- **System Calls**:
- `shmget()`: Create or access a shared memory segment.
- `shmat()`: Attach the shared memory segment to the process's address space.
- `shmdt()`: Detach the shared memory segment.
- `shmctl()`: Control operations on the shared memory segment.

### 4. **Semaphores**
- **Description**: Semaphores are used to control access to shared resources by multiple
processes, ensuring that only a certain number of processes can access the resource at a
time.
- **Types**:
- **Binary Semaphores**: They behave like mutexes, allowing only one process to access
a critical section.
- **Counting Semaphores**: These allow a specific number of processes to access a
resource.
- **System Calls**:
- `semget()`: Create or access a semaphore set.
- `semop()`: Perform operations on a semaphore set (e.g., wait, signal).
- `semctl()`: Control operations on the semaphore set.
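A counting-semaphore sketch using `threading.Semaphore` from the Python standard library (the System V `sem*()` calls are C APIs; the thread count and hold time below are arbitrary):

```python
import threading
import time

sem = threading.Semaphore(2)       # counting semaphore: at most 2 holders at once
lock = threading.Lock()
active = peak = 0

def worker():
    global active, peak
    with sem:                      # acquire ~ semop() "wait"; release ~ "signal"
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)           # hold the shared resource briefly
        with lock:
            active -= 1

threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)  # never exceeds 2
```

With an initial count of 1 this behaves as a binary semaphore (mutex); with a larger count it bounds how many workers use the resource concurrently.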

### 5. **Signals**
- **Description**: Signals are asynchronous notifications sent to a process to notify it of an
event (e.g., `SIGTERM`, `SIGKILL`, `SIGINT`).
- **Use Case**: Used to notify a process of events such as termination requests, timeouts,
or errors.
- **System Calls**:
- `kill()`: Send a signal to a process.
- `signal()`: Set a signal handler for a process.
- `sigaction()`: Advanced signal handling.
- **Example**:
```bash
$ kill -9 <pid>
```
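The `kill()`/`signal()` pair maps directly onto Python's `os.kill` and `signal.signal`; the sketch below sends `SIGUSR1` to the current process and observes the handler run (UNIX-only):

```python
import os
import signal

received = []

def handler(signum, frame):        # ~ signal()/sigaction(): install a handler
    received.append(signum)

signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)   # ~ kill(): send the signal
print(received == [signal.SIGUSR1])  # -> True
```

Because signals are asynchronous, real handlers should do as little as possible (e.g., set a flag) and leave the heavy work to the main control flow.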

### 6. **Sockets**
- **Description**: Sockets provide communication between processes, either on the same
machine or over a network. They support bidirectional communication.
- **Types**:
- **UNIX Domain Sockets**: For local inter-process communication (same machine).
- **Internet Domain Sockets**: For communication over networks (e.g., TCP/IP).
- **System Calls**:
- `socket()`: Create a socket.
- `bind()`: Bind a socket to an address.
- `listen()`, `accept()`: Listen for and accept connections.
- `send()`, `recv()`: Send and receive data through the socket.
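The full `socket()`/`bind()`/`listen()`/`accept()` sequence over a UNIX domain socket can be sketched as follows (the socket path and the reversed-echo behaviour are illustrative; assumes a UNIX-like system):

```python
import os
import socket
import tempfile
import threading

path = os.path.join(tempfile.mkdtemp(), "demo.sock")

srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)  # socket()
srv.bind(path)                                           # bind() to a filesystem name
srv.listen(1)                                            # listen() for clients

def serve():
    conn, _ = srv.accept()                 # accept() one connection
    conn.sendall(conn.recv(1024)[::-1])    # echo the bytes back, reversed
    conn.close()

t = threading.Thread(target=serve)
t.start()

cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
cli.connect(path)
cli.sendall(b"abc")                        # send()
reply = cli.recv(1024)                     # recv()
cli.close()
t.join()
srv.close()
print(reply)  # -> b'cba'
```

Switching `AF_UNIX` and the filesystem path for `AF_INET` and a host/port pair turns the same code into an Internet domain socket.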

### 7. **Memory-Mapped Files (mmap)**


- **Description**: `mmap` allows processes to map files or devices into memory, enabling
multiple processes to share the same memory space by mapping the same file.
- **Use Case**: Efficient for file sharing and large data access.
- **System Calls**:
- `mmap()`: Map a file into memory.
- `munmap()`: Unmap the file from memory.
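Python's `mmap` module wraps the same calls; the sketch below maps a small file, writes through the mapping, and confirms the bytes landed in the file (the path is a throwaway temp file):

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 16)               # create a 16-byte backing file

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 16)      # ~ mmap(): map the file into memory
    mm[:5] = b"hello"                   # writes go through the mapping
    mm.flush()
    mm.close()                          # ~ munmap()

with open(path, "rb") as f:
    data = f.read(5)                    # another process mapping the same file
                                        # would see the same bytes
print(data)  # -> b'hello'
```

Two processes that `mmap` the same file with shared mappings see each other's writes without any explicit copy, which is what makes this attractive for large data.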

### 8. **File-Based Communication**


- **Description**: Processes can communicate by reading from and writing to files in the
filesystem. This method is simple but less efficient compared to other IPC mechanisms.
- **Use Case**: Used for logging, exchanging data between batch processes, or when
simplicity is more important than performance.
### Choosing the Right IPC Mechanism:
- **Pipes**: Simple, unidirectional communication between related processes.
- **Message Queues**: Suitable for structured communication with message prioritization.
- **Shared Memory**: Ideal for large data sharing with high performance, but requires
synchronization.
- **Semaphores**: Best for managing access to shared resources and avoiding race
conditions.
- **Signals**: Lightweight notification for event handling.
- **Sockets**: Required for network-based communication or communication between
unrelated processes.
- **mmap**: Efficient file-based communication for large datasets.

Each method has its trade-offs between simplicity, performance, and the specific needs of
the communication.
