Understanding Multi-Master Replication
Principal Engineer
Ibrar Ahmed
Replication
PostgreSQL supports several replication methods, including logical replication and
streaming (physical) replication, each catering to different requirements and use cases.
Replication Use Cases
• High Availability: Ensuring continuous operation of the database by automatically failing over to a standby database if the primary
database fails.
• Data Latency: Minimizing the delay in data being replicated across geographical locations to ensure timely access to data.
• Data Residency: Replicating data to servers in specific locations to comply with legal or policy requirements regarding where data
is stored.
• Near-Zero Downtime for Major Version Upgrades: Upgrading PostgreSQL versions with minimal service interruption by
replicating data to a newer version instance and then switching operations to it.
• Load Balancing / Query Routing: Distributing read queries among several database replicas to optimize performance and
resource utilization.
• Data Migration: Moving data from one database to another, potentially across different storage systems, locations, or database
schemas.
• ETL (Extract, Transform, Load): Using replication to extract data from operational databases, transform it as needed, and load it
into a data warehouse or analytical system.
• Data Warehousing: Aggregating data from various sources into a central repository to support business intelligence and
reporting activities.
Physical Replication
• Physical replication in PostgreSQL is the method
of copying and synchronizing data from a
primary server to standby servers in real-time.
• Real-time transfer of WAL records from primary
to standby servers to ensure data consistency
and up-to-date replicas.
• Standby servers can run in hot standby mode,
allowing them to handle read-only queries while
replicating changes.
• Configurable as either synchronous, for strict
data integrity, or asynchronous, for improved
write performance.
• Facilitates failover by promoting a
standby to primary in case of primary server
failure; automating the failover requires external tools.
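The standby state described above can be inspected from SQL. The following is a minimal sketch rather than a full procedure; it assumes PostgreSQL 12 or later for pg_promote(), and each statement notes which node it is meant to run on.

```sql
-- On the primary: one row per connected standby.
-- sync_state is 'async', 'sync', 'potential', or 'quorum'.
SELECT application_name, client_addr, state, sync_state,
       sent_lsn, replay_lsn
FROM pg_stat_replication;

-- On any node: returns true on a standby, false on a primary.
SELECT pg_is_in_recovery();

-- On a standby: promote it to primary (a manual failover step).
SELECT pg_promote();
```

In practice the promotion step is usually driven by an external failover manager instead of being run by hand.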
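Asynchronous Replication
[Figure: asynchronous replication. The client sends a write to the primary. The primary writes locally and returns "write complete" to the client without waiting for the two secondaries. The write is then applied on Replica 1 and Replica 2, each of which sends an ACK back to the primary.]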
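Synchronous Replication
[Figure: synchronous replication. The client sends a write to the primary. The primary writes locally, sends the write to Replica 1 and Replica 2, and returns "write complete" to the client only after the replicas ACK. With quorum commit (PG-10), an ACK from a configured number of replicas is sufficient.]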
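The choice between synchronous, quorum, and asynchronous commit is made on the primary. A minimal sketch, assuming two standbys whose application_name values are the hypothetical replica1 and replica2:

```sql
-- Run on the primary. 'replica1' and 'replica2' are hypothetical
-- application_name values of the two standbys.

-- Quorum commit: a transaction waits for any one of the two standbys.
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (replica1, replica2)';

-- 'on' waits until the standby has flushed the WAL to disk;
-- 'remote_apply' would wait until the standby has replayed it.
ALTER SYSTEM SET synchronous_commit = 'on';

-- Both settings take effect on reload, no restart needed.
SELECT pg_reload_conf();

-- An empty list switches back to asynchronous replication:
-- ALTER SYSTEM SET synchronous_standby_names = '';
```

Using FIRST instead of ANY gives priority-based synchronous replication rather than a quorum.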
Logical Replication
• Logical replication is the method of copying data
objects and their changes based on replication
identity (usually the primary key).
• Provides fine-grained control over data replication
and security.
• Publisher / Subscriber model: one or more
subscriber nodes subscribe to one or more
publisher nodes.
• Copies data in a format that can be interpreted by
other systems using logical decoding plugins.
• A publication is a set of changes generated from a
table or group of tables.
• A subscription is the downstream end of logical
replication.
PostgreSQL Logical Replication Slots
• A logical replication slot retains changes (WAL) on the publisher until its subscriber confirms receipt, so no
change is lost while the subscriber is disconnected.
• Each slot tracks the position of one consumer in the change stream, which lets a subscriber resume
streaming where it left off.
• Slots keep every subscriber consistent with the publisher; the cost is that an inactive slot keeps
retaining WAL on the publisher.
• Monitor logical replication slots by querying 'SELECT * FROM pg_replication_slots;' to check slot status.
• Regular monitoring of logical replication slots is crucial for ensuring seamless replication and data consistency.
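Beyond SELECT *, a monitoring query can show how much WAL each slot is holding back. This is a sketch meant to run on the publisher (pg_current_wal_lsn() is not available on a standby); the column selection is one reasonable choice, not a prescribed one.

```sql
-- Logical slots ordered by the amount of WAL they force the server to keep.
SELECT slot_name,
       plugin,
       active,
       confirmed_flush_lsn,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
WHERE slot_type = 'logical'
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
```

A slot with active = false and a growing retained_wal value is the usual sign of an abandoned or stalled subscriber.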
PostgreSQL Logical Replication Slots
• Delete logical replication slots with caution to prevent potential data loss:
SELECT pg_drop_replication_slot('slot_name');
• Ensure no active subscriptions rely on the slot before deletion to avoid disrupting the replication process.
• Considerations when deleting slots include verifying all subscribers have received and applied the changes to maintain
consistency.
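The checks above can be folded into the drop itself. A minimal sketch, where my_slot is a hypothetical slot name:

```sql
-- 'my_slot' is a hypothetical slot name.

-- 1. Check whether a consumer is still connected to the slot.
SELECT slot_name, active, active_pid
FROM pg_replication_slots
WHERE slot_name = 'my_slot';

-- 2. Drop the slot only if it is inactive; returns no rows
--    (and drops nothing) while a consumer is still attached.
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name = 'my_slot'
  AND NOT active;
```

A slot that was created by CREATE SUBSCRIPTION is normally removed by running DROP SUBSCRIPTION on the subscriber rather than by dropping it directly.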
PostgreSQL Logical Replication
First, ensure your PostgreSQL instance is configured to support logical replication. Key settings in the
postgresql.conf file include:
wal_level = logical
max_replication_slots = 4
max_wal_senders = 4
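The same settings can be applied from a SQL session instead of editing the file by hand. A sketch using the values shown above; all three parameters only take effect after a server restart.

```sql
-- Written to postgresql.auto.conf; a restart is required afterwards.
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_replication_slots = 4;
ALTER SYSTEM SET max_wal_senders = 4;

-- After the restart, confirm the active values.
SHOW wal_level;
SHOW max_replication_slots;
SHOW max_wal_senders;
```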
Create a Publication
On the primary server (the source of the data), create a publication. A publication is a set of database
changes that can be replicated to subscribers. You can publish all of the tables in a database or a
selected subset.
Examples:
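The statements below sketch the two cases just described, all tables versus a selected subset. Apart from my_publication, the publication and table names (orders, customers, audit_log) are hypothetical.

```sql
-- Publish every table in the database (requires superuser).
CREATE PUBLICATION my_publication FOR ALL TABLES;

-- Publish a selected subset of tables.
CREATE PUBLICATION orders_publication FOR TABLE orders, customers;

-- Publish only some operations for a table.
CREATE PUBLICATION insert_only_publication FOR TABLE audit_log
    WITH (publish = 'insert');
```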
Create a Subscription
On the subscriber database (the destination for the data), create a subscription to the publication.
This establishes the connection to the publisher and starts the replication process.
Example:
PUBLICATION my_publication;
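A complete statement might look like the sketch below. The subscription name and every value in the connection string (host, port, database, user) are placeholders, not values from the original setup.

```sql
-- Run on the subscriber. All connection values are placeholders.
CREATE SUBSCRIPTION my_subscription
    CONNECTION 'host=publisher.example.com port=5432 dbname=appdb user=replicator'
    PUBLICATION my_publication;
```

The published tables must already exist on the subscriber with a compatible definition, because logical replication does not copy schema. By default the subscription first copies the existing rows and then streams changes.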
PostgreSQL Logical Replication in Retrospect
● PostgreSQL 10
○ Logical Replication was added
● PostgreSQL 11
○ Reduced memory usage
○ Truncate replication
● PostgreSQL 12
○ Allowing copying of replication slots.
● PostgreSQL 13
○ Replication of partitioned tables
○ max_slot_wal_keep_size parameter (limiting WAL storage for rep slots)
● PostgreSQL 14
○ Replication slot activity reporting
○ Streaming of in-progress transactions
○ Support data transfer in binary mode
○ Allow decoding of prepared transactions
● PostgreSQL 15
○ Replication of prepared transactions
○ Allow publication of all tables in schema
○ Allow logical replication to run as owner of subscription
Allow logical decoding on the stand-by
• With this PG-16 feature, creating a replication slot and performing logical decoding are possible on the
stand-by server.
• Thanks to this feature, you can create subscriptions to a standby server.
• This has been in the works with the PostgreSQL community for some time now.
• Prior to this feature, creating a logical replication slot on a stand-by failed with:
ERROR: logical decoding cannot be used while in recovery
• Prior to this feature, creating a subscription to a stand-by failed with:
ERROR: could not create replication slot "mysub": ERROR: logical decoding cannot be used while in recovery
• This feature reduces the load on the primary server.
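On PostgreSQL 16 or later, the slot creation that used to fail can be tried as in the sketch below. The slot name standby_slot is hypothetical, and the primary is assumed to run with wal_level = logical.

```sql
-- On the standby (PostgreSQL 16+): create a logical slot using the
-- built-in pgoutput plugin. 'standby_slot' is a hypothetical name.
SELECT * FROM pg_create_logical_replication_slot('standby_slot', 'pgoutput');

-- On the primary, from another session: the call above waits until the
-- standby sees a running-transactions record. Logging a snapshot on the
-- primary lets it finish without waiting for the next checkpoint.
SELECT pg_log_standby_snapshot();
```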
Monitoring and Managing Replication
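PostgreSQL provides several functions and views to monitor and manage logical replication, such as
pg_stat_replication and pg_replication_slots on the publisher and pg_stat_subscription on the subscriber.
To monitor replication status, query these views; pg_stat_subscription, for example, shows the status of
subscriptions.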
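A sketch of two such status queries, one for each side of the replication link:

```sql
-- On the subscriber: one row per subscription worker.
SELECT subname, pid, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription;

-- On the publisher: one row per connected subscriber (WAL sender),
-- with the replay lag expressed in bytes.
SELECT application_name, state, sent_lsn, replay_lsn,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```

A NULL pid in pg_stat_subscription means the subscription exists but its apply worker is not running.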
Active-Standby / Active-Active
Active-Standby
• One primary and one or more stand-by servers
• Write traffic on primary and read traffic load balancing on read replicas using external tools
• Synchronous / Asynchronous / Quorum commit choices
• Load Balancing / High Availability
• Automatic Failover using external tools
• Low chance of data loss

Active-Active
• Two or more primary/active servers replicating between each other using logical replication
• Not part of core PostgreSQL, implemented using 3rd party tools and extensions
• Requires conflict detection and resolution
• Use cases are High Availability, data residency, data latency, and near zero downtime upgrades
Managing Conflicts
• Logical replication does not handle conflict resolution automatically. If there are conflicting
changes made in the subscriber database, you'll need to manage these conflicts manually or
through application logic.
• Limited Conflict Detection: PostgreSQL's built-in logical replication does not inherently detect
conflicts at the row level when the same record is modified in different replicas at the same time.
Conflict detection and resolution are typically handled at the application level or with external
tools.
• Application-Level Resolution: Logical replication expects conflicts to be resolved by the
application or the administrators. This means designing the application in a way that minimizes
conflict possibilities (e.g., by segmenting data so that each replica writes to a distinct set of rows)
or having a mechanism to detect and resolve conflicts post-replication.
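One common way to segment data so that nodes never produce the same key is to give each node its own slice of the key space. The sketch below does this with interleaved sequences on a hypothetical two-node setup; the table and sequence names are illustrative only.

```sql
-- Node 1: generates ids 1, 3, 5, ...
CREATE SEQUENCE orders_id_seq START WITH 1 INCREMENT BY 2;

-- Node 2 would run this definition instead: ids 2, 4, 6, ...
-- CREATE SEQUENCE orders_id_seq START WITH 2 INCREMENT BY 2;

-- Same table definition on both nodes; inserts on different nodes
-- can never collide on the primary key.
CREATE TABLE orders (
    id   bigint PRIMARY KEY DEFAULT nextval('orders_id_seq'),
    item text NOT NULL
);
```

This only prevents insert conflicts on the key. Concurrent updates to the same existing row on different nodes still need a resolution strategy.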
Multimaster Replication
● Simultaneous Data Writing: Multi-master replication allows
multiple database instances to handle write operations
simultaneously, enhancing the database's write availability and
scalability.
● Conflict Resolution: Incorporates mechanisms to handle conflicts
that arise when the same data is modified at different nodes,
ensuring data consistency across all nodes.
● Load Distribution: Distributes both read and write loads across
several nodes, effectively balancing the load and improving overall
system performance.
● Improved Fault Tolerance: Increases the database system's fault
tolerance by allowing the system to remain operational even if one
of the master nodes fails, thereby reducing potential downtime.
● Real-Time Data Synchronization: Ensures real-time or
near-real-time synchronization between nodes, keeping the
databases up-to-date and consistent with each other.
● Geographical Distribution: Supports geographical distribution of
database nodes, which can reduce latency for globally distributed
applications by allowing users to interact with the nearest
database node.
Challenges of MMR
• Data conflicts can arise when multiple masters simultaneously update the same data, leading to
inconsistencies and requiring conflict resolution.
• Ensuring consistency across multiple masters is challenging, as immediate synchronization may not
always be feasible, potentially causing data discrepancies.
• Scalability issues may arise due to the increased complexity of managing multiple master nodes and
the need for efficient communication and coordination.
Principles of MMR
PostgreSQL's MMR Architectural Overview
• Key building blocks for MMR on PostgreSQL include logical decoding, replication slots, and transaction
(commit) timestamps for conflict resolution.
• Logical decoding captures changes at the row level, replication slots retain changes until each peer node
has received them, and timestamps aid in conflict detection.
• Together, these components synchronize data across multiple master nodes.
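Core PostgreSQL can record the commit time of each transaction, which is the kind of timestamp a conflict resolver can compare. A minimal sketch, assuming a hypothetical accounts table with an id column; how a given MMR extension actually uses these timestamps is not shown here.

```sql
-- Enable commit timestamp tracking (a restart is required).
ALTER SYSTEM SET track_commit_timestamp = on;

-- After the restart: show when the transaction that last wrote a row
-- committed. 'accounts' and 'id' are hypothetical names.
SELECT id,
       xmin,
       pg_xact_commit_timestamp(xmin) AS committed_at
FROM accounts
WHERE id = 1;
```

Timestamps are only available for transactions that committed after tracking was enabled.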
Existing Solutions in PostgreSQL Ecosystem
• pgEdge (Spock): A state-of-the-art solution offering seamless multi-master replication with high
performance.
• BDR: Known for its robust replication capabilities, allowing for distributed databases but requiring a
complex setup and configuration.
• Postgres-XL: Provides scalable horizontal database clustering, ideal for large-scale applications, yet
may pose challenges in terms of maintenance and management.
pgEdge (Spock): Solution Overview
BDR (Bi-Directional Replication)
• BDR allows simultaneous reads and writes on multiple database nodes, ensuring data
consistency across all masters.
• With conflict resolution mechanisms, BDR resolves data conflicts arising from concurrent
write operations on different nodes.
• Practical applications of BDR include geographically distributed databases, high availability
scenarios, and workload scaling in PostgreSQL.
Postgres-XL: Scalability Solutions
Conflict Resolution Strategies
• Last Write Wins (LWW): In this approach, the system resolves conflicts by accepting the data from the most recent
write based on timestamp, effectively overriding earlier writes.
• Version Vectors: Utilize version vectors to keep track of the version history of each data item. This method allows
the system to identify and resolve conflicts by comparing version histories and merging changes accordingly.
• Operational Transformation (OT): Commonly used in collaborative applications, this technique involves
transforming operations in such a way that they can be applied in any order while still achieving a consistent state
across all nodes.
• Conflict-Free Replicated Data Types (CRDTs): Implement CRDTs which are data structures designed to handle
data consistency in a decentralized manner, ensuring that all replicas can converge to the same state without
needing to resolve conflicts.
• Manual Override: Provide mechanisms for manual conflict resolution, where conflicts that cannot be automatically
resolved are flagged for administrative intervention, allowing database administrators to make the final decision.
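Of the strategies above, Last Write Wins is the simplest to illustrate. The sketch below expresses the rule in plain SQL on a hypothetical accounts table: an incoming row replaces the stored one only if its timestamp is newer. It illustrates the rule itself, not how any particular replication tool implements it.

```sql
-- Hypothetical table; updated_at carries the time of the originating write.
CREATE TABLE accounts (
    id         integer PRIMARY KEY,
    balance    numeric NOT NULL,
    updated_at timestamptz NOT NULL
);

-- Apply an incoming change from another node.
-- If the row is new, it is inserted. If it exists, it is overwritten
-- only when the incoming write is more recent; otherwise it is ignored.
INSERT INTO accounts (id, balance, updated_at)
VALUES (1, 250.00, '2024-05-01 12:00:05+00')
ON CONFLICT (id) DO UPDATE
SET balance    = EXCLUDED.balance,
    updated_at = EXCLUDED.updated_at
WHERE EXCLUDED.updated_at > accounts.updated_at;
```

The rule depends on reasonably synchronized clocks across nodes, and the losing write is silently discarded, which is why the other strategies exist.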
Pros and Cons of Multi-Master Replication
Pros
● High Availability: If one master fails, another master can continue to handle updates and inserts, ensuring service continuity.
● Geographical Redundancy: Masters are located in different locations, significantly reducing the risk of simultaneous failures.
● Scalable Writes: Allows data updates on multiple servers, enhancing write scalability.
● Traffic Distribution: No need to route all traffic to a single master, allowing better load distribution and utilization.

Cons
• Complexity: Multi-master replication is inherently complex, making it challenging to manage.
• Conflict Resolution: Simultaneous writes on multiple nodes can lead to conflicts, which are often difficult to resolve.
• Manual Intervention: Conflicts may sometimes require manual intervention to resolve, adding to maintenance overhead.
• Data Inconsistency: There is a risk of data inconsistencies across different nodes due to the replication mechanism.
Performance Optimization in MMR
• Employ conflict resolution mechanisms to handle conflicting changes between multiple master
nodes efficiently.
• Implement strict data consistency checks to ensure synchronization among all nodes in the
replication setup.
• Utilize efficient data distribution strategies to balance the workload and prevent bottlenecks in
multi-master replication systems.
Ensuring Data Integrity
• Ensuring data integrity in multi-master replication systems is crucial for maintaining consistency
across nodes and preventing data conflicts.
• Best practices include implementing strong data validation mechanisms, performing regular audits,
and utilizing checksums to detect anomalies.
• Tools like pgEdge’s Spock, pglogical, Bucardo, and PostgreSQL's built-in features (like logical
replication) are instrumental for validating data consistency in distributed nodes.
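A checksum comparison can be done with built-in functions alone. The sketch below computes a row count and a single hash over a hypothetical orders table ordered by its id key; running it on each node and comparing the results shows whether the copies match.

```sql
-- 'orders' and its key column 'id' are hypothetical.
-- Run on every node and compare the two output values.
SELECT count(*) AS row_count,
       md5(string_agg(o::text, '|' ORDER BY o.id)) AS table_checksum
FROM orders AS o;
```

This aggregates the whole table in memory, so it suits small or medium tables. Larger tables are better compared in key ranges or with a dedicated comparison tool.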
High Availability in PostgreSQL Using MMR
[Figure: three nodes in US West-2, US East-1, and US East-2, each managed by Patroni with pgbackrest for backups, and each pair of nodes connected by logical replication.]
Questions
Code is like clay; in the hands of a skilled craftsman,
it can be molded into something that stands the test
of time. Remember, the art is not in writing code, but
in crafting solutions that endure. Let's build not just
for today, but for the future.
Ibrar Ahmed