Peter Geoghegan

multidimensional-search-talk-pgconfdev

Fri, 16 May 2025 00:00:00 -0400

bloat-postgresql-scale

Fri, 10 Mar 2023 00:00:00 -0500

https://www.socallinuxexpo.org/scale/20x/presentations/bloat-postgresql-taxonomy
Video of talk: https://www.youtube.com/watch?v=JDG4bMHxCH8
PostgreSQL's approach to transaction management uses MVCC (multi-version concurrency control). Postgres often maintains multiple physical versions of a single logical row. This is used to reconstruct the logical contents of tables at a specific point in time for SQL queries (zero or one row versions should be visible for each logical row). MVCC avoids having readers block writers and writers block readers, a frequent problem with database systems that use traditional 2PL. However, there is a cost to this approach: bloat must eventually be removed and reclaimed, typically by an autovacuum worker process.
Most Postgres DBAs are familiar with bloat, and almost as many will have some experience with tuning autovacuum to better manage it. There have been quite a few talks about the practical aspects of optimizing autovacuum and avoiding bloat; this talk isn't one of them. Instead, the goal of the talk is to show how bloat can accumulate, what that looks like at the page level and at the level of entire tables and indexes, and how that may impact production queries.
The talk covers:
• How VACUUM processes each structure, and in what order.
• How the HOT optimization works.
• How Postgres manages free space.
• The design of VACUUM.
• What space/bloat management tasks are prioritized by VACUUM, and why this makes sense.
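The "zero or one row versions visible per logical row" rule from the abstract can be illustrated with a toy model. This is a minimal sketch, not how Postgres implements visibility: it borrows only the names `xmin` and `xmax` from the real system columns, represents a snapshot as the set of transaction IDs that had committed when it was taken, and ignores aborted transactions, in-progress transactions, and everything at the page level. `RowVersion`, `visible`, `read_table`, and `dead_versions` are hypothetical names made up for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RowVersion:
    row_id: int                 # identifies the logical row
    value: str
    xmin: int                   # transaction that created this version
    xmax: Optional[int] = None  # transaction that deleted or superseded it, if any


def visible(version, snapshot):
    """Toy rule: the creator committed before the snapshot, the remover did not."""
    created = version.xmin in snapshot
    removed = version.xmax is not None and version.xmax in snapshot
    return created and not removed


def read_table(versions, snapshot):
    """Logical contents of the table as seen by one snapshot."""
    rows = {}
    for v in versions:
        if visible(v, snapshot):
            # At most one version of a logical row may be visible.
            assert v.row_id not in rows
            rows[v.row_id] = v.value
    return rows


def dead_versions(versions, oldest_snapshot):
    """Versions no snapshot can still need (assumes later snapshots are supersets)."""
    return [v for v in versions
            if v.xmax is not None and v.xmax in oldest_snapshot]


# Row 1 was inserted by transaction 10, then updated by 20 and again by 30.
# Row 2 was inserted by transaction 20. Every update leaves an old version behind.
heap = [
    RowVersion(1, "a", xmin=10, xmax=20),
    RowVersion(1, "b", xmin=20, xmax=30),
    RowVersion(1, "c", xmin=30),
    RowVersion(2, "x", xmin=20),
]

for snapshot in ({10}, {10, 20}, {10, 20, 30}):
    print(sorted(snapshot), read_table(heap, snapshot))
```

In this model, the old versions that `dead_versions` returns are the bloat: they occupy space but can never be returned by any query that is still running.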

logical-database

Thu, 26 May 2022 00:00:00 -0400

Video of talk on YouTube: https://www.youtube.com/watch?v=QeiOv6j0Jws Thinking about the logical database https://www.pgcon.org/events/pgcon_2022/schedule/session/308-thinking-about-the-logical-database/ When it comes to the design of the internal components of PostgreSQL, history matters. Earlier designs create ripples that affect later designs. Extensible indexing created the need for VACUUM, since without VACUUM it is far from obvious how transaction rollback could ever work, at least with GiST and GIN indexes. Transaction rollback that is decoupled from the physical representation of data (compared to traditional designs based on two-phase locking) was necessary even before Postgres added multi-version concurrency control. This talk will describe a conceptual framework for discussing whether something is an essential part of storing data transactionally from the point of view of users, or whether it is an inessential implementation detail of transaction management and storage, that could in principle be implemented in many different ways. The former can be categorized as belonging to the logical database, while the latter can be categorized as belonging to the physical database. Recent improvements in how the standard B-Tree index access method performs garbage collection to control MVCC version bloat (authored by the speaker) drew upon these concepts. But almost any improvement to the on-disk representation of either tables or indexes has some kind of tension between the logical and physical database. The talk explores the "logical database, physical database" concepts by discussing this recent work, as well as pending work on free space management in the standard heap table access method.

postgresql-deduplication-2020

Tue, 13 Oct 2020 00:00:00 -0400

B-Tree deduplication in PostgreSQL 13: design and background

xact-rollback-pgconfeu

Thu, 17 Oct 2019 00:00:00 -0400

https://www.postgresql.eu/events/pgconfeu2019/schedule/session/2742-instantaneous-transaction-rollback-and-other-advantages-of-versioned-storage/

nbtree-arch-pgcon

Thu, 30 May 2019 00:00:00 -0400

nbtree: An architectural perspective
Video (lacks original animations): https://www.youtube.com/watch?v=p5RaATILoiE
Many PostgreSQL users have a basic understanding of how Postgres nbtree indexes work internally (e.g., how the structure is maintained, high level details of how a new level is added to the tree, the role of VACUUM in garbage collection). A smaller number have some understanding of advanced topics (e.g., details of Lehman & Yao's B-Link technique, details of crash recovery). Even an experienced backend hacker could be forgiven for concluding that this is all well explored territory, leaving little room for improvement, since all the important components are already in place.
This view of things is based on a correct premise, and yet cannot explain why nbtree doesn't perform well with certain specific real world workloads. For example, there is an excessive amount of nbtree index bloat created by the industry standard TPC-C benchmark, despite the fact that TPC-C's transactions rarely update indexed columns, and therefore handily avoid so-called "write amplification". Certain pieces are missing.
Code enhancements (authored by the speaker) that will appear in PostgreSQL 12 will significantly improve matters for affected workloads. In some ways, this work is based on a return to decades old fundamentals. In other ways, it is based on practical experience, involving analyzing real-world index structures in an effort to learn where problems may lie.
This talk will cover:
• A review of the design of nbtree, especially its high level goals.
• The importance of thinking in terms of invariants: the rules underlying what belongs where in the index.
• Interesting ways in which nbtree exceeds what is truly required by the invariants, and how that can be exploited to improve performance.
• Possible future work aimed at reducing CPU cache misses while descending a B-Tree.
• The big picture: how all these techniques are complementary, and worth more than the sum of their parts.

bloat-postgresql-pgconfeu

Fri, 26 Oct 2018 00:00:00 -0400

bloat-postgresql-pgopen

Thu, 06 Sep 2018 00:00:00 -0400

query-evaluation-pwl

Thu, 15 Jun 2017 00:00:00 -0400

Presentation on Goetz Graefe's "Query Evaluation Techniques for Large Databases" paper for Papers We Love. https://www.meetup.com/papers-we-love-too/events/237686185/

sort-hash-pgconfus-2017

Wed, 29 Mar 2017 00:00:00 -0400

Video: https://www.youtube.com/watch?v=aic_9KNwKn0
PostgreSQL 9.5 and 9.6 significantly improved upon the performance of both hash joins and sort operations. Sorts are often used as input to GroupAggregate nodes and merge joins. While both approaches have various strengths and weaknesses, and are essential components of the PostgreSQL executor, their relative importance has somewhat shifted over the years. This happened due to trends in CPU and storage performance characteristics, and various improvements that gradually made their way into Postgres.
Capabilities expected to be part of PostgreSQL 10 may further complicate this picture; parallel hash join and parallel sort add another dimension that must be considered. This may force us to further revise the "Sort vs. Hash" analysis in the coming years.
In this talk, I'll discuss:
* Why merge joins may be faster than hash joins for particular cases, and vice-versa. (Nested-loop joins will be briefly discussed.)
* Improvements that have been made in both areas, and improvements that are tentatively scheduled for the next Postgres release.
* How to conceptualize both approaches, to understand why the optimizer may prefer one or the other of the two general approaches in practice.
* A historical perspective: the waxing and waning of sort merge join since the 1970s.
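As a rough way to conceptualize the two approaches, here are textbook in-memory versions of an equi-join done both ways. This is only a sketch under simplifying assumptions: it says nothing about how the PostgreSQL executor implements either node, and the function names and sample rows are invented for the illustration.

```python
from collections import defaultdict


def hash_join(outer, inner, key_outer, key_inner):
    """Build a hash table on one input, then probe it once per outer row."""
    table = defaultdict(list)
    for row in inner:
        table[key_inner(row)].append(row)
    return [(o, i) for o in outer for i in table.get(key_outer(o), [])]


def merge_join(left, right, key_left, key_right):
    """Sort both inputs, then advance two cursors in step."""
    left = sorted(left, key=key_left)
    right = sorted(right, key=key_right)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key_left(left[i]), key_right(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of equal keys on the right side ...
            j_end = j
            while j_end < len(right) and key_right(right[j_end]) == kl:
                j_end += 1
            # ... and pair it with every left row that has the same key.
            while i < len(left) and key_left(left[i]) == kl:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


orders = [(1, "apple"), (2, "pear"), (2, "plum"), (4, "fig")]
customers = [(2, "Bo"), (1, "Al"), (3, "Cy"), (2, "Bea")]


def first(row):
    return row[0]


by_hash = hash_join(orders, customers, first, first)
by_merge = merge_join(orders, customers, first, first)
print(sorted(by_hash) == sorted(by_merge))
```

Both functions return the same set of pairs. The hash join needs memory for a hash table over one whole input, while the merge join needs both inputs in sorted order, which costs nothing extra when they already arrive sorted.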

sort-hash-pgconfsv

Tue, 15 Nov 2016 00:00:00 -0500

Video: https://www.youtube.com/watch?v=w2Lu3KMOG98
Sort vs. Hash: A Duality
PostgreSQL 9.5 and 9.6 significantly improved upon the performance of both hash joins and sort operations. Sorts are often used as input to GroupAggregate nodes and merge joins. While both approaches have various strengths and weaknesses, and are essential components of the PostgreSQL executor, their relative importance has somewhat shifted over the years. This happened due to trends in CPU and storage performance characteristics, and various improvements that gradually made their way into Postgres.
Capabilities expected to be part of PostgreSQL 10 may further complicate this picture; parallel hash join and parallel sort add another dimension that must be considered. This may force us to further revise the "Sort vs. Hash" analysis in the coming years.
In this talk, I'll discuss:
* Why merge joins may be faster than hash joins for particular cases, and vice-versa. (Nested-loop joins will be briefly discussed.)
* Improvements that have been made in both areas, and improvements that are tentatively scheduled for the next Postgres release.
* How to conceptualize both approaches, to understand why the optimizer may prefer one or the other of the two general approaches in practice.
* A historical perspective: the waxing and waning of sort merge join since the 1970s.

Sorting improvements in PostgreSQL 9.5 and 9.6

Fri, 18 Sep 2015 00:00:00 -0400

UPSERT use-cases

Fri, 18 Sep 2015 00:00:00 -0400

Postgres Open, Sept. 16th - 18th, Dallas. Date: Sept. 18, 2015. Time: 13:30 - 14:20. Room: Houston Ballroom B/C. Level: Intermediate.
PostgreSQL 9.5 will have support for a feature that is popularly known as "UPSERT": the ability to either insert or update a row according to whether an existing row with the same key exists. If such a row already exists, the implementation should update it. If not, a new row should be inserted. This is supported by way of a new high level syntax (a clause that extends the INSERT statement) that more or less relieves the application developer from having to give any thought to race conditions. This common operation for client applications is set to become far simpler and far less error-prone than legacy ad-hoc approaches to UPSERT involving subtransactions. Moreover, the new implementation performs much better than those legacy approaches.
While the feature is most obviously compelling for OLTP and web application use cases, it's also true that the syntax is powerful enough to be very useful in many real world data integration scenarios. The non-standard PostgreSQL syntax offers explicit, fine grained control over where and how to update. For example, an update may not actually affect an existing row because the row does not satisfy some additional criteria (i.e., it does not pass the special, dedicated WHERE clause of ON CONFLICT ... DO UPDATE).
This talk gives an overview of the feature from a high level, and examines these use cases. You will learn how you might want to use the new UPSERT feature in your application beyond the obvious. In passing, there will be brief discussion of why UPSERT's implementation proved to be a hard problem, and, relatedly, why a custom syntax was used instead of the SQL standard's MERGE syntax.
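The conditional update mentioned in the abstract (an ON CONFLICT ... DO UPDATE with its own WHERE clause) might look roughly like the sketch below. So that it runs without a server, the example uses Python's built-in `sqlite3` module as a stand-in: SQLite 3.24 and later accept an ON CONFLICT clause modeled on the PostgreSQL one. The `inventory` table, its columns, and the `upsert` helper are hypothetical, chosen to resemble a data integration feed where stale records must not overwrite newer ones.

```python
import sqlite3

# Stand-in for a PostgreSQL connection; needs SQLite 3.24+ for ON CONFLICT ... DO UPDATE.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    " sku TEXT PRIMARY KEY,"
    " qty INTEGER NOT NULL,"
    " updated_at INTEGER NOT NULL)"
)

# Insert the row, or update the existing one, but only when the incoming
# record is newer than what is already stored.
UPSERT = """
INSERT INTO inventory (sku, qty, updated_at)
VALUES (?, ?, ?)
ON CONFLICT (sku) DO UPDATE
   SET qty = excluded.qty,
       updated_at = excluded.updated_at
 WHERE excluded.updated_at > inventory.updated_at
"""


def upsert(sku, qty, updated_at):
    conn.execute(UPSERT, (sku, qty, updated_at))


upsert("widget", 5, 100)  # no existing row: a new row is inserted
upsert("widget", 8, 200)  # newer record: the existing row is updated
upsert("widget", 1, 150)  # stale record: the WHERE clause fails, row is left alone

row = conn.execute(
    "SELECT qty, updated_at FROM inventory WHERE sku = 'widget'"
).fetchone()
count = conn.execute("SELECT count(*) FROM inventory").fetchone()[0]
print(row, count)
```

The third call neither inserts a duplicate nor raises an error; the statement simply affects no row, which is the behavior the abstract describes for updates that do not pass the dedicated WHERE clause.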

"UPSERT" in PostgreSQL

Tue, 26 May 2015 00:00:00 -0400

Short talk covering the "UPSERT" feature coming in PostgreSQL 9.5. https://www.youtube.com/watch?v=pbg97bkxbbY

jsonb Deep Dive

Wed, 25 Jun 2014 00:00:00 -0400

Peter Geoghegan, one of the major developers of the new "JSONB" binary, indexable JSON type for PostgreSQL 9.4, will be in town and will guide SFPUG members in a "deep dive" into the new technology, including:
• Both the new JSONB type and the old JSON type input and output JSON, so what's the difference?
• What new features does it offer?
• How is the new data type structured, and how does it work?
• How do you index JSONB?
• What things remain unimplemented?
Before the main event, we will have a Lightning Talk by Eric Ongerth: "Running PostgreSQL in a Docker Container".
Food and drink, as well as Peter's travel, are sponsored by Heroku. The event is hosted by SwitchFly.

Concurrency in Postgres

Fri, 20 Sep 2013 00:00:00 -0400

Talk that examines the handling of concurrency issues in Postgres, and how Postgres 9.3 improves the situation surrounding foreign key locks. (See http://postgresopen.org/2013/schedule/presentations/366/)