
CAP Theorem

Uploaded by

rajuucet123
Copyright
© All Rights Reserved

The CAP Theorem in DBMS



The inherent trade-offs in networked shared-data system design make it
difficult to build a system that is both dependable and efficient. The CAP
theorem, or CAP principle, is a central foundation for understanding
these trade-offs in distributed systems. It highlights the limits that
system designers face when replicating data across nodes: a distributed
system cannot guarantee all three of consistency, availability, and
partition tolerance at the same time, so at most two of them can be
fully guaranteed at once.

Because of this underlying restriction, developers must carefully balance
these properties against the demands of their particular application.
Understanding the CAP theorem helps designers decide which properties to
prioritize for the performance and reliability their systems need.
This article analyzes each of the properties in the CAP theorem,
examines the associated trade-offs, and discusses how these ideas apply
to real-world distributed systems.

What is the CAP Theorem?


The CAP theorem is a fundamental concept in distributed systems theory
that was first proposed by Eric Brewer in 2000 and subsequently proved
by Seth Gilbert and Nancy Lynch in 2002. It asserts that all three of the
following properties cannot be guaranteed at the same time in any
distributed data system:
1. Consistency
Consistency means that all the nodes (databases) in a network see the
same copy of a replicated data item across transactions. Every node in a
distributed cluster returns the most recent successful write, so every
client has the same view of the data. There are various consistency
models. Consistency in CAP, as formalized by Gilbert and Lynch, refers to
atomic consistency (linearizability), a very strong form of consistency.
Note that consistency in ACID and consistency in CAP are slightly
different concepts. In CAP, it refers to the consistency of the values in
different copies of the same data item in a replicated distributed system.
In ACID, it refers to the fact that a transaction will not violate the
integrity constraints specified on the database schema.
For example, a user checks his account balance and sees that he has
500 rupees. He spends 200 rupees on some products, so 200 rupees
must be deducted, changing his account balance to 300 rupees.
This change must be committed and communicated to all other
databases that hold this user’s details. Otherwise, there will be an
inconsistency, and another database might still show his account balance
as 500 rupees, which is not true.
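
The balance example can be sketched in a few lines of Python. This is only an illustrative model, not how any particular database replicates data: the `Replica` class, the `spend` function, and the node names `db1` and `db2` are all hypothetical, and replication is reduced to copying a value between in-memory dictionaries.

```python
class Replica:
    """One database node holding its own copy of account balances (hypothetical)."""

    def __init__(self, name, balances):
        self.name = name
        self.balances = dict(balances)  # each replica keeps a private copy

    def read(self, account):
        return self.balances[account]


def spend(replicas, account, amount, replicate=True):
    """Deduct `amount` on the first replica; optionally copy the result to the rest."""
    primary = replicas[0]
    primary.balances[account] -= amount
    if replicate:
        for other in replicas[1:]:
            other.balances[account] = primary.balances[account]


# Replicated write: every node agrees on the new balance.
nodes = [Replica("db1", {"user": 500}), Replica("db2", {"user": 500})]
spend(nodes, "user", 200)
consistent_view = [n.read("user") for n in nodes]  # [300, 300]

# Unreplicated write: db2 still reports the old balance.
stale_nodes = [Replica("db1", {"user": 500}), Replica("db2", {"user": 500})]
spend(stale_nodes, "user", 200, replicate=False)
inconsistent_view = [n.read("user") for n in stale_nodes]  # [300, 500]
```

In the second run, a client reading from `db2` sees 500 rupees while a client reading from `db1` sees 300, which is exactly the inconsistency described above.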

Consistency problem

2. Availability
Availability means that each read or write request for a data item will
either be processed successfully or will receive a message that the
operation cannot be completed. Every non-failing node returns a response
to every read and write request in a reasonable amount of time. The
key word here is “every”: each node, on either side of a network
partition, must be able to respond.
For example, user A is a content creator with 1000 other users
subscribed to his channel. Another user, B, who is far away from user A,
tries to subscribe to user A’s channel. Because the two users are so far
apart, they are connected to different database nodes of the
social media network. If the distributed system guarantees
availability, user B must be able to subscribe to user A’s channel.

Availability problem
3. Partition Tolerance
Partition tolerance means that the system can continue operating even if
the network connecting the nodes has a fault that results in two or more
partitions, where the nodes in each partition can only communicate
with each other. In other words, the system continues to function and
upholds its consistency guarantees in spite of network partitions. Network
partitions are a fact of life. Distributed systems that guarantee partition
tolerance can recover gracefully once the partition heals.
For example, consider the same social media network, where
two users are trying to find the subscriber count of a particular channel.
A technical fault causes a network outage, and the second database, to
which user B is connected, loses its connection to the first database.
The subscriber count is then shown to user B from the replica of
database 1's data that was stored on the second database before the
network outage. Hence the distributed system is partition tolerant.
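
A network partition can be modelled as groups of nodes that can only reach members of their own group. The sketch below is a toy model under that assumption: the `Network` class, its methods, and the `local_copy` dictionary are hypothetical names introduced for illustration.

```python
class Network:
    """Tracks which nodes can currently talk to each other (hypothetical model)."""

    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.groups = [set(nodes)]  # a single group means there is no partition

    def partition(self, *groups):
        """Split the network into isolated groups of nodes."""
        self.groups = [set(g) for g in groups]

    def heal(self):
        """Restore full connectivity."""
        self.groups = [set(self.nodes)]

    def can_communicate(self, a, b):
        return any(a in g and b in g for g in self.groups)


net = Network(["db1", "db2"])
local_copy = {"db1": 1000, "db2": 1000}  # subscriber count replicated before the outage

net.partition(["db1"], ["db2"])
reachable = net.can_communicate("db1", "db2")  # False during the outage
shown_to_user_b = local_copy["db2"]            # db2 answers from its own replica

net.heal()
reachable_after_heal = net.can_communicate("db1", "db2")  # True
```

User B still gets an answer while `db2` is cut off, because `db2` serves the copy it already holds rather than contacting `db1`.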

Partition Tolerance

The CAP theorem states that distributed databases can have at most
two of the three properties: consistency, availability, and partition
tolerance. As a result, database systems prioritize only two
properties at a time.

Venn diagram of CAP theorem


The Trade-Offs in the CAP Theorem
The CAP theorem implies that a distributed system can only provide two
out of three properties:
1. CA (Consistency and Availability)
These systems always accept the user's request to view or modify
the data, and they always respond with data that is consistent
across all the database nodes of a large distributed
network.
However, such a distributed system is not realizable in the real world.
When a network failure occurs, there are only two options: either serve
the old data that was replicated moments before the failure, or refuse to
let the user access that possibly outdated data. If we choose the first
option, the system stays available; if we choose the second option,
the system stays consistent.
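
The two options can be expressed as a read policy on a node that may be cut off from the latest writes. This is a minimal sketch, assuming the node already knows whether it is partitioned; the `read` function, the `mode` parameter, and the `Unavailable` exception are hypothetical names.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk returning stale data."""


def read(local_value, partitioned, mode):
    """Answer a read on a node that may be cut off from the latest writes.

    mode="AP": always answer, even if the local copy may be stale.
    mode="CP": refuse to answer while partitioned.
    """
    if mode not in ("AP", "CP"):
        raise ValueError(f"unknown mode: {mode}")
    if partitioned and mode == "CP":
        raise Unavailable("cannot confirm the latest value during a partition")
    return local_value


stale_balance = 500  # value replicated to this node before the network failure

# First option: stay available and return the possibly outdated value.
ap_answer = read(stale_balance, partitioned=True, mode="AP")

# Second option: stay consistent and refuse to answer.
try:
    read(stale_balance, partitioned=True, mode="CP")
    cp_refused = False
except Unavailable:
    cp_refused = True
```

When there is no partition, both modes behave the same way; the choice only matters while the node is isolated.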
Because consistency and availability cannot both be guaranteed in a
distributed system once a partition occurs, achieving CA in practice
requires a monolithic (single-node) system. When a user updates the state
of the system, all other users accessing it see the new changes, so
consistency is maintained. And since all users are connected to a single
system, there is no network between replicas that could partition, so the
system is also available for as long as it is running.
Such systems are generally not preferred when distributed computing is
required, because distribution is only possible when either consistency or
availability is sacrificed for partition tolerance.
Example databases: MySQL and PostgreSQL in a single-node deployment

CAP diagram
2. AP (Availability and Partition Tolerance)
These systems are distributed in nature. They ensure that a
request sent by the user to view or modify the data in the
database nodes is not dropped and is processed even in the presence
of a network partition.
The system prioritizes availability over consistency and can respond with
possibly stale data that was replicated from other nodes before a
technical failure created the partition. Such design choices
are common when building social media websites such as
Facebook, Instagram, and Reddit, and online content websites such as
YouTube, blogs, and news sites. There, strict consistency is usually not
required, and a bigger problem arises if the service is unavailable:
the company loses money because users may move to another platform.
The system can be distributed across multiple nodes and is designed to
operate reliably even in the face of network partitions.
Example databases: Amazon DynamoDB, Apache Cassandra.
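
One way an AP system can reconcile after a partition heals is to merge the writes accepted on each side. The sketch below assumes subscriptions are only ever added, so a set union is a safe merge; the `SubscriberNode` class and the user names are hypothetical, and handling unsubscribes would need a richer scheme than a plain union.

```python
class SubscriberNode:
    """A node that accepts subscriptions locally, even when cut off from its peers."""

    def __init__(self, subscribers=()):
        self.subscribers = set(subscribers)

    def subscribe(self, user):
        self.subscribers.add(user)  # never rejected: availability comes first

    def count(self):
        return len(self.subscribers)

    def merge(self, other):
        """After the partition heals, both nodes adopt the union of their sets."""
        combined = self.subscribers | other.subscribers
        self.subscribers = set(combined)
        other.subscribers = set(combined)


node1 = SubscriberNode({"u1", "u2"})
node2 = SubscriberNode({"u1", "u2"})

# During the partition each side accepts writes independently.
node1.subscribe("u3")
node2.subscribe("user_b")
counts_during_partition = (node1.count(), node2.count())  # equal counts, different sets

# Once the partition heals, the two sides converge.
node1.merge(node2)
counts_after_heal = (node1.count(), node2.count())
```

While the partition lasts, the two nodes disagree about who is subscribed, which is the stale data the AP design accepts in exchange for never rejecting a request.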
3. CP (Consistency and Partition Tolerance)
These systems are distributed in nature. In the presence of a
network partition, a request sent by the user to view or modify the
data in the database nodes is rejected instead of being answered
with inconsistent data.
The system prioritizes consistency over availability and does not allow
users to read crucial data from a replica that was last updated before
the network partition occurred. Consistency is chosen over
availability for critical applications where the latest data plays an
important role, such as stock market, ticket booking, and banking
applications, where presenting old data to users would cause
problems.
For example, in a train ticket booking application, there is one seat left
that can be booked. A replica of the database is created and sent to the
other nodes of the distributed system. A network outage then occurs, so
a user connected to the partitioned node fetches details from this
replica. Meanwhile, another user connected to the unpartitioned part of
the network has already booked the last remaining seat. However, the user
connected to the partitioned node will still see one seat, which makes the
available data inconsistent. It would have been better to show this user
an error, making the system unavailable for them but maintaining
consistency. Hence consistency is chosen in such scenarios.
Example databases: Apache HBase, MongoDB, Redis.
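
A common way to get CP behaviour is to accept a request only on the side of the partition that can reach a majority of the nodes. The sketch below applies that idea to the ticket booking example. It is a simplification under stated assumptions: the `SeatBooking` class and `Unavailable` exception are hypothetical, the caller is assumed to know how many nodes it can reach, and the seat count stands in for state that the majority side agrees on.

```python
class Unavailable(Exception):
    """Raised when a node cannot reach a majority of the cluster."""


class SeatBooking:
    def __init__(self, cluster_size, seats):
        self.cluster_size = cluster_size
        self.seats_left = seats

    def book(self, reachable_nodes):
        """Book one seat only if the caller's side of the network holds a majority.

        `reachable_nodes` counts the nodes on the caller's side, including itself.
        """
        if reachable_nodes <= self.cluster_size // 2:
            raise Unavailable("minority side of a partition: request rejected")
        if self.seats_left == 0:
            return False  # sold out
        self.seats_left -= 1
        return True


# A 5-node cluster split into a group of 3 and a group of 2, with one seat left.
booking = SeatBooking(cluster_size=5, seats=1)
majority_result = booking.book(reachable_nodes=3)  # the last seat is booked

try:
    booking.book(reachable_nodes=2)  # the minority side gets an error instead of stale data
    minority_rejected = False
except Unavailable:
    minority_rejected = True

sold_out_result = booking.book(reachable_nodes=3)  # no seats remain
```

The user on the minority side sees an error rather than a seat that no longer exists, so the seat can never be sold twice.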
