Stormatics

Achieving High Availability in PostgreSQL: From 90% to 99.999%

When you are running mission-critical applications, like online banking, healthcare systems, or global e-commerce platforms, every second of downtime can cost millions and damage your business reputation. That’s why many customers aim for four-nines (99.99%) or five-nines (99.999%) availability for their applications. In this post, we will walk through what those nines really mean and, more importantly, which PostgreSQL cluster setup will get you there.
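To put those numbers in perspective, the allowed downtime per year is simply (1 - availability) of a full year. As a rough illustration (the query itself is only for demonstration; you can run it in psql or do the arithmetic by hand):

SELECT nines,
       round((1 - availability) * 365 * 24 * 60, 1) AS downtime_minutes_per_year
FROM (VALUES ('99.9%',   0.999),
             ('99.99%',  0.9999),
             ('99.999%', 0.99999)) AS t(nines, availability);

That works out to roughly 8.8 hours per year at three nines, about 53 minutes at four nines, and only around 5 minutes at five nines.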
Read More

A Guide to Deploying Production-Grade Highly Available Systems in PostgreSQL

In today’s digital landscape, downtime isn’t just inconvenient; it’s costly. Whether you run an e-commerce site, a SaaS platform, or critical internal systems, your PostgreSQL database must be resilient, recoverable, and continuously available. In short: High Availability (HA) is not a feature you enable; it’s a system you design.
Read More

Replication Types and Modes in PostgreSQL

Data is a key part of any mission-critical application. Losing it can lead to serious issues, such as financial loss or harm to a business’s reputation. A common way to protect against data loss is by taking regular backups, either manually or automatically. However, as data grows, backups can become large and take longer to complete.
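As a quick, generic taste of the "modes" the post goes on to cover (this is my own sketch using standard PostgreSQL settings and views, not an excerpt from the post), you can check on the primary whether replication is running synchronously or asynchronously:

-- Settings that govern synchronous vs. asynchronous replication
SHOW synchronous_commit;
SHOW synchronous_standby_names;

-- One row per connected standby; sync_state shows 'sync', 'async', etc.
SELECT client_addr, state, sync_state
FROM pg_stat_replication;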
Read More

Choosing the Right Barman Backup Type and Mode for Your PostgreSQL Highly Available Cluster

When running a PostgreSQL database in a High Availability (HA) cluster, it’s easy to assume that having multiple nodes means your data is safe. But HA is not a replacement for backups. If someone accidentally deletes important data or runs a wrong update query, that change will quickly spread to all nodes in the cluster. Without proper safeguards, that data is gone everywhere. In these cases, only a backup can help you restore what was lost.

The case mentioned above isn’t the only reason backups are important. In fact, many industries have strict compliance requirements that make regular backups mandatory. This makes backups essential not just for recovering lost data, but also for meeting regulatory standards.

Barman is a popular tool in the PostgreSQL ecosystem for managing backups, especially in High Availability (HA) environments. It’s known for being easy to set up and for offering multiple types and modes of backups. However, this flexibility can also be a bit overwhelming at first. That’s why I’m writing this blog to break down each backup option in a simple and clear way, so you can choose the one that best fits your business needs.
Read More

Which PostgreSQL HA Solution Fits Your Needs: Pgpool or Patroni?

When designing a highly available PostgreSQL cluster, two popular tools often come into the conversation: Pgpool-II and Patroni. Both are widely used in production environments, offer solid performance, and aim to improve resilience and reduce downtime; however, they take different approaches to achieving this goal. We often get questions during webinars/talks and customer calls about which tool is better suited for production deployments. So, we decided to put together this blog to help you understand the differences and guide you in choosing the right solution based on your specific use case. Before we dive into comparing these two great tools for achieving high availability, let's first take a quick look at some of the key components involved in building a highly available and resilient setup.
Read More

Understanding Split-Brain Scenarios in Highly Available PostgreSQL Clusters

High Availability (HA) refers to a system design approach that ensures a service remains accessible even in the event of hardware or software failures. In PostgreSQL, HA is typically implemented through replication, failover mechanisms, and clustering solutions to minimize downtime and ensure data consistency. Hence, HA is very important for your mission-critical applications. In this blog post, we will explore a critical failure condition known as a split-brain scenario that can occur in PostgreSQL HA clusters. We will first see what split-brain means, then how it can impact PostgreSQL clusters, and finally discuss how to prevent it through architectural choices and tools available in the PostgreSQL ecosystem.
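For a concrete sense of the symptom (a minimal check of my own, not taken from the post): each node can be asked whether it believes it is the primary, and two nodes answering "primary" at the same time is the classic split-brain signature.

-- Run on every node: false means "I am the primary", true means "I am a standby".
-- More than one node returning false at the same time indicates split-brain.
SELECT pg_is_in_recovery();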
Read More

Best Practices for TimescaleDB Massive Delete Operations

Welcome to the second part of our TimescaleDB best practices series! In the first part, we explored how to perform massive backfill operations efficiently, sharing techniques to optimize performance and avoid common pitfalls. If you haven’t had a chance to read the first part yet, you can check it out using this link. In today's blog, we will discuss another crucial aspect of time-series data management: massive delete operations. As your data grows over time, older records often lose their relevance but continue to occupy valuable disk space, potentially increasing storage costs and degrading performance if not managed well. Let's walk through some strategies to clean up or downsample aged data in TimescaleDB, helping you maintain a lean, efficient, and cost-effective database.
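As one example of the chunk-based approach such cleanups usually rely on, TimescaleDB can drop whole chunks instead of running row-by-row DELETEs. The hypertable name and retention window below are made up for illustration:

-- One-off cleanup: drop all chunks older than 90 days (hypothetical hypertable "metrics")
SELECT drop_chunks('metrics', INTERVAL '90 days');

-- Or automate it with a retention policy that runs as a background job
SELECT add_retention_policy('metrics', INTERVAL '90 days');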
Read More

How to Safely Perform Backfill Operations in TimescaleDB

Backfilling data into a TimescaleDB hypertable in production can be very tricky, especially when automated processes like compression policies are involved. From past experience, we have seen that if backfill operations aren’t handled properly, they can interfere with these automated tasks, sometimes causing them to stop working altogether. This blog covers a safer and more reliable approach to backfilling hypertables, along with best practices to prevent disruptions to compression and other background processes.

What is a Backfill Operation?

Backfilling means adding old or missing data into a database table after some time has already passed. Imagine you are collecting temperature readings every hour, but your system was down for a day and didn’t save any data. Later, you get that missing data from the device’s local storage or from cloud storage and want to put it back into the right hypertable; this is called backfilling.

In TimescaleDB, this is common with time-series data, but it needs to be done carefully. That’s because TimescaleDB might already be doing things in the background, like compressing old data to save space. If we are not careful, backfilling can mess up these automatic tasks.
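To give a flavour of the "careful" part, a common pattern (sketched here with hypothetical names and a made-up job id, not taken verbatim from the post) is to pause the compression job, decompress the affected chunks, insert the late data, and then let compression resume:

-- 1. Find the compression policy job for the hypertable, then pause it
--    (1000 is a placeholder; use the job_id returned by the first query)
SELECT job_id, proc_name, hypertable_name FROM timescaledb_information.jobs;
SELECT alter_job(1000, scheduled => false);

-- 2. Decompress the chunks that overlap the backfill window (hypothetical hypertable "metrics")
SELECT decompress_chunk(c, true)
FROM show_chunks('metrics', newer_than => DATE '2024-01-01', older_than => DATE '2024-01-08') c;

-- 3. Insert the missing rows, then re-enable the compression job
-- INSERT INTO metrics ...;
SELECT alter_job(1000, scheduled => true);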
Read More

Finding Bottlenecks and Avoiding Over-Optimization via Explain Plans

Performance optimization in a production database is important, but trying to over-optimize can make things more complicated without real improvements. In this post, I’ll share two very basic EXPLAIN ANALYZE outputs from a production system. A user asked us to help optimize these queries. I've changed the table and column names for privacy. We will look at how to spot slow parts of a query, improve performance the right way, and avoid unnecessary tuning.

Plan A: Identifying and Resolving a Bottleneck

Execution Plan A (Before Optimization)

Nested Loop  (cost=1000.42..25607.38 rows=1 width=129) (actual time=78.521..90.445 rows=0 loops=1)
  ->  Gather  (cost=1000.00..25598.95 rows=1 width=65) (actual time=78.520..90.443 rows=0 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Parallel Seq Scan on <table_1> e  (cost=0.00..24598.85 rows=1 width=65) (actual time=75.351..75.351 rows=0 loops=3)
              Filter: ((<column_1>) = '<date_value>'::date) AND ((<column_2>)::text = '<event_type>'::text)
              Rows Removed by Filter: <number_removed_rows>
  ->  Index Scan using <index_name> on <table_2> a  (cost=0.42..8.43 rows=1 width=41) (never executed)
        Index Cond: ((<column_3>)::text = (<column_4>)::text)
        Filter: ((<column_5>)::text = '<default_value>'::text)
Planning Time: 0.466 ms
Execution Time: 90.580 ms
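In this plan, nearly all of the time goes into the Parallel Seq Scan, which reads the whole table and then discards every row via the filter on <column_1> and <column_2>. A typical fix, sketched here with the same anonymized placeholder names (so treat it as an assumption about the real schema), is a composite index covering both filter columns so the planner can switch to an index scan:

-- Hypothetical composite index matching the two filter columns in the plan above;
-- CONCURRENTLY avoids blocking writes while the index is built
CREATE INDEX CONCURRENTLY <index_on_table_1>
    ON <table_1> (<column_1>, <column_2>);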
Read More

Important PostgreSQL Parameters: Understanding Their Importance and Recommended Values

Have you ever experienced your database slowing down as the amount of data increases? If so, one important factor to consider is tuning PostgreSQL parameters to match your specific workload and requirements.

PostgreSQL has many parameters because it is designed to be highly flexible and customizable to meet a wide range of use cases and workloads. Each parameter allows you to fine-tune a different aspect of the database, such as memory management, query optimization, or connection handling. This flexibility helps database administrators optimize performance based on hardware resources, workload requirements, and specific business needs. In this blog, I will cover some of the important PostgreSQL parameters, explain their role, and provide recommended values to help you fine-tune your database for better performance and scalability.

Memory-Related Parameters

Memory-related parameters in PostgreSQL control how the database allocates and manages memory. Tuning these settings is important for improving query performance and preventing resource bottlenecks.

Name: work_mem
Description: Sets the maximum amount of memory used by internal operations like sorts and hashes before writing to disk. Increasing it can improve performance for complex queries.
Default: 4MB
Recommendation: Typically, setting work_mem to 1-2% of the total system memory is recommended, i.e., if the total system memory is 256 GB, assign 3 to 5 GB for work_mem.
Note: This may lead to higher memory usage for operations that involve sorting.

Name: shared_buffers
Description: Determines the amount of memory allocated for caching database data.
Default: 128MB
Recommendation: Typically, setting shared_buffers to 25-40% of the total system memory is recommended, i.e., if the total system memory is 256 GB, assign 64-102 GB for shared_buffers.

Name: maintenance_work_mem
Description: Specifies the amount of memory used for maintenance operations like VACUUM, CREATE INDEX, and ALTER TABLE. Increasing it can speed up these operations.
Default: 64MB
Recommendation: It's recommended to set 5-10% of the total system memory, i.e., if the total system memory is 256 GB, assign 13 to 26 GB for maintenance_work_mem.
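As a quick illustration of how such values are applied (the sizes below simply follow the example percentages above for a 256 GB server and are not universal recommendations), parameters can be changed with ALTER SYSTEM and then reloaded; note that shared_buffers only takes effect after a restart:

-- Example values for a 256 GB server, following the rough percentages above
ALTER SYSTEM SET shared_buffers = '64GB';        -- ~25% of RAM; requires a restart
ALTER SYSTEM SET work_mem = '4GB';               -- applies per sort/hash operation, so be conservative
ALTER SYSTEM SET maintenance_work_mem = '16GB';  -- VACUUM, CREATE INDEX, ALTER TABLE

-- Reload the configuration for parameters that do not need a restart
SELECT pg_reload_conf();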
Read More