summaryrefslogtreecommitdiff
path: root/src/include/replication
AgeCommit message (Collapse)Author
2022-03-24Add decoding of sequences to built-in replicationTomas Vondra
This commit adds support for decoding of sequences to the built-in replication (the infrastructure was added by commit 0da92dc530). The syntax and behavior mostly mimics handling of tables, i.e. a publication may be defined as FOR ALL SEQUENCES (replicating all sequences in a database), FOR ALL SEQUENCES IN SCHEMA (replicating all sequences in a particular schema) or individual sequences. To publish sequence modifications, the publication has to include 'sequence' action. The protocol is extended with a new message, describing sequence increments. A new system view pg_publication_sequences lists all the sequences added to a publication, both directly and indirectly. Various psql commands (\d and \dRp) are improved to also display publications including a given sequence, or sequences included in a publication. Author: Tomas Vondra, Cary Huang Reviewed-by: Peter Eisentraut, Amit Kapila, Hannu Krosing, Andres Freund, Petr Jelinek Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/[email protected]
2022-03-23Replace BASE_BACKUP COMPRESSION_LEVEL option with COMPRESSION_DETAIL.Robert Haas
There are more compression parameters that can be specified than just an integer compression level, so rename the new COMPRESSION_LEVEL option to COMPRESSION_DETAIL before it gets released. Introduce a flexible syntax for that option to allow arbitrary options to be specified without needing to adjust the main replication grammar, and common code to parse it that is shared between the client and the server. This commit doesn't actually add any new compression parameters, so the only user-visible change is that you can now type something like pg_basebackup --compress gzip:level=5 instead of writing just pg_basebackup --compress gzip:5. However, it should make it easy to add new options. If for example gzip starts offering fries, we can support pg_basebackup --compress gzip:level=5,fries=true for the benefit of users who want fries with that. Along the way, this fixes a few things in pg_basebackup so that the pg_basebackup can be used with a server-side compression algorithm that pg_basebackup itself does not understand. For example, pg_basebackup --compress server-lz4 could still succeed even if only the server and not the client has LZ4 support, provided that the other options to pg_basebackup don't require the client to decompress the archive. Patch by me. Reviewed by Justin Pryzby and Dagfinn Ilmari Mannsåker. Discussion: http://postgr.es/m/CA+TgmoYvpetyRAbbg1M8b3-iHsaN4nsgmWPjOENu5-doHuJ7fA@mail.gmail.com
2022-03-15Allow extensions to add new backup targets.Robert Haas
Commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc allowed for base backup targets, meaning that we could do something with the backup other than send it to the client, but all of those targets had to be baked in to the core code. This commit makes it possible for extensions to define additional backup targets. Patch by me, reviewed by Abhijit Menon-Sen. Discussion: http://postgr.es/m/CA+TgmoaqvdT-u3nt+_kkZ7bgDAyqDB0i-+XOMmr5JN2Rd37hxw@mail.gmail.com
2022-03-08Add support for zstd base backup compression.Robert Haas
Both client-side compression and server-side compression are now supported for zstd. In addition, a backup compressed by the server using zstd can now be decompressed by the client in order to accommodate the use of -Fp. Jeevan Ladhe, with some edits by me. Discussion: http://postgr.es/m/CA+Tgmobyzfbz=gyze2_LL1ZumZunmaEKbHQxjrFkOR7APZGu-g@mail.gmail.com
2022-02-22Allow specifying row filters for logical replication of tables.Amit Kapila
This feature adds row filtering for publication tables. When a publication is defined or modified, an optional WHERE clause can be specified. Rows that don't satisfy this WHERE clause will be filtered out. This allows a set of tables to be partially replicated. The row filter is per table. A new row filter can be added simply by specifying a WHERE clause after the table name. The WHERE clause must be enclosed by parentheses. The row filter WHERE clause for a table added to a publication that publishes UPDATE and/or DELETE operations must contain only columns that are covered by REPLICA IDENTITY. The row filter WHERE clause for a table added to a publication that publishes INSERT can use any column. If the row filter evaluates to NULL, it is regarded as "false". The WHERE clause only allows simple expressions that don't have user-defined functions, user-defined operators, user-defined types, user-defined collations, non-immutable built-in functions, or references to system columns. These restrictions could be addressed in the future. If you choose to do the initial table synchronization, only data that satisfies the row filters is copied to the subscriber. If the subscription has several publications in which a table has been published with different WHERE clauses, rows that satisfy ANY of the expressions will be copied. If a subscriber is a pre-15 version, the initial table synchronization won't use row filters even if they are defined in the publisher. The row filters are applied before publishing the changes. If the subscription has several publications in which the same table has been published with different filters (for the same publish operation), those expressions get OR'ed together so that rows satisfying any of the expressions will be replicated. This means all the other filters become redundant if (a) one of the publications have no filter at all, (b) one of the publications was created using FOR ALL TABLES, (c) one of the publications was created using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema. If your publication contains a partitioned table, the publication parameter publish_via_partition_root determines if it uses the partition's row filter (if the parameter is false, the default) or the root partitioned table's row filter. Psql commands \dRp+ and \d <table-name> will display any row filters. Author: Hou Zhijie, Euler Taveira, Peter Smith, Ajin Cherian Reviewed-by: Greg Nancarrow, Haiying Tang, Amit Kapila, Tomas Vondra, Dilip Kumar, Vignesh C, Alvaro Herrera, Andres Freund, Wei Wang Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X%2BmK_DitLXF25B%3DjE6xyNCH4YOwM860JR7HarGQ%40mail.gmail.com
2022-02-15Move replication slot release to before_shmem_exit().Andres Freund
Previously, replication slots were released in ProcKill() on error, resulting in reporting replication slot drop of ephemeral slots after the stats subsystem was already shut down. To fix this problem, move replication slot release to a before_shmem_exit() hook that is called before the stats collector shuts down. There wasn't really a good reason for the slot handling to be in ProcKill() anyway. Patch by Masahiko Sawada, with very minor polishing by me. I, Andres, wrote a test for dropping slots during process exit, but there may be some OS dependent issues around the number of times FATAL error messages are displayed due to a still debated libpq issue. So that test will be committed separately / later. Reviewed-By: Kyotaro Horiguchi <[email protected]> Reviewed-By: Andres Freund <[email protected]> Author: Masahiko Sawada <[email protected]> Discussion: https://postgr.es/m/CAD21AoDAeEpAbZEyYJsPZJUmSPaRicVSBObaL7sPaofnKz+9zg@mail.gmail.com
2022-02-11Add suport for server-side LZ4 base backup compression.Robert Haas
LZ4 compression can be a lot faster than gzip compression, so users may prefer it even if the compression ratio is not as good. We will want pg_basebackup to support LZ4 compression and decompression on the client side as well, and there is a pending patch for that, but it's by a different author, so I am committing this part separately for that reason. Jeevan Ladhe, reviewed by Tushar Ahuja and by me. Discussion: http://postgr.es/m/CANm22Cg9cArXEaYgHVZhCnzPLfqXCZLAzjwTq7Fc0quXRPfbxA@mail.gmail.com
2022-02-10Logical decoding of sequencesTomas Vondra
This extends the logical decoding to also decode sequence increments. We differentiate between sequences created in the current (in-progress) transaction, and sequences created earlier. This mixed behavior is necessary because while sequences are not transactional (increments are not subject to ROLLBACK), relfilenode changes are. So we do this: * Changes for sequences created in the same top-level transaction are treated as transactional, i.e. just like any other change from that transaction, and discarded in case of a rollback. * Changes for sequences created earlier are applied immediately, as if performed outside any transaction. This applies also after ALTER SEQUENCE, which may create a new relfilenode. Moreover, if we ever get support for DDL replication, the sequence won't exist until the transaction gets applied. Sequences created in the current transaction are tracked in a simple hash table, identified by a relfilenode. That means a sequence may already exist, but if a transaction does ALTER SEQUENCE then the increments for the new relfilenode will be treated as transactional. For each relfilenode we track the XID of (sub)transaction that created it, which is needed for cleanup at transaction end. We don't need to check the XID to decide if an increment is transactional - if we find a match in the hash table, it has to be the same transaction. This requires two minor changes to WAL-logging. Firstly, we need to ensure the sequence record has a valid XID - until now the the increment might have XID 0 if it was the first change in a subxact. But the sequence might have been created in the same top-level transaction. So we ensure the XID is assigned when WAL-logging increments. The other change is addition of "created" flag, marking increments for newly created relfilenodes. This makes it easier to maintain the hash table of sequences that need transactional handling. Note: This is needed because of subxacts. A XID 0 might still have the sequence created in a different subxact of the same top-level xact. This does not include any changes to test_decoding and/or the built-in replication - those will be committed in separate patches. A patch adding decoding of sequences was originally submitted by Cary Huang. This commit reworks various important aspects (e.g. the WAL logging and transactional/non-transactional handling). However, the original patch and reviews were very useful. Author: Tomas Vondra, Cary Huang Reviewed-by: Peter Eisentraut, Hannu Krosing, Andres Freund Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/[email protected]
2022-02-10Remove server support for the previous base backup protocol.Robert Haas
Commit cc333f32336f5146b75190f57ef587dff225f565 added a new COPY sub-protocol for taking base backups, but retained support for the previous protocol. For the same reasons articulated in the message for commit 9cd28c2e5f11dfeef64a14035b82e70acead65fd, remove support for the previous protocol from the server. Discussion: http://postgr.es/m/CA+TgmoazKcKUWtqVa0xZqSzbKgTH+X-aw4V7GyLD68EpDLMh8A@mail.gmail.com
2022-01-25Fix various typos, grammar and code style in comments and docsMichael Paquier
This fixes a set of issues that have accumulated over the past months (or years) in various code areas. Most fixes are related to some recent additions, as of the development of v15. Author: Justin Pryzby Discussion: https://postgr.es/m/[email protected]
2022-01-24Fix limitations on what SQL commands can be issued to a walsender.Tom Lane
In logical replication mode, a WalSender is supposed to be able to execute any regular SQL command, as well as the special replication commands. Poor design of the replication-command parser caused it to fail in various cases, notably: * semicolons embedded in a command, or multiple SQL commands sent in a single message; * dollar-quoted literals containing odd numbers of single or double quote marks; * commands starting with a comment. The basic problem here is that we're trying to run repl_scanner.l across the entire input string even when it's not a replication command. Since repl_scanner.l does not understand all of the token types known to the core lexer, this is doomed to have failure modes. We certainly don't want to make repl_scanner.l as big as scan.l, so instead rejigger stuff so that we only lex the first token of a non-replication command. That will usually look like an IDENT to repl_scanner.l, though a comment would end up getting reported as a '-' or '/' single-character token. If the token is a replication command keyword, we push it back and proceed normally with repl_gram.y parsing. Otherwise, we can drop out of exec_replication_command() without examining the rest of the string. (It's still theoretically possible for repl_scanner.l to fail on the first token; but that could only happen if it's an unterminated single- or double-quoted string, in which case you'd have gotten largely the same error from the core lexer too.) In this way, repl_gram.y isn't involved at all in handling general SQL commands, so we can get rid of the SQLCmd node type. (In the back branches, we can't remove it because renumbering enum NodeTag would be an ABI break; so just leave it sit there unused.) I failed to resist the temptation to clean up some other sloppy coding in repl_scanner.l while at it. The only externally-visible behavior change from that is it now accepts \r and \f as whitespace, same as the core lexer. Per bug #17379 from Greg Rychlewski. Back-patch to all supported branches. Discussion: https://postgr.es/m/[email protected]
2022-01-24Server-side gzip compression.Robert Haas
pg_basebackup's --compression option now lets you write either "client-gzip" or "server-gzip" instead of just "gzip" to specify where the compression should be performed. If you write simply "gzip" it's taken to mean "client-gzip" unless you also use --target, in which case it is interpreted to mean "server-gzip", because that's the only thing that makes any sense in that case. To make this work, the BASE_BACKUP command now takes new COMPRESSION and COMPRESSION_LEVEL options. At present, pg_basebackup cannot decompress .gz files, so server-side compression will cause a failure if (1) -Ft is not used or (2) -R is used or (3) -D- is used without --no-manifest. Along the way, I removed the information message added by commit 5c649fe153367cdab278738ee4aebbfd158e0546 which occurred if you specified no compression level and told you that the default level had been used instead. That seemed like more output than most people would want. Also along the way, this adds a check to the server for unrecognized base backup options. This repairs a bug introduced by commit 0ba281cb4bf9f5f65529dfa4c8282abb734dd454. This commit also adds some new test cases for pg_verifybackup. They take a server-side backup with and without compression, and then extract the backup if we have the OS facilities available to do so, and then run pg_verifybackup on the extracted directory. That is a good test of the functionality added by this commit and also improves test coverage for the backup target patch (commit 3500ccc39b0dadd1068a03938e4b8ff562587ccc) and for pg_verifybackup itself. Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which this is a part has also had review and/or testing from Tushar Ahuja, Suraj Kharage, Dipesh Pandit, and Mark Dilger. Discussion: http://postgr.es/m/CA+Tgmoa-ST7fMLsVJduOB7Eub=2WjfpHS+QxHVEpUoinf4bOSg@mail.gmail.com
2022-01-20Support base backup targets.Robert Haas
pg_basebackup now has a --target=TARGET[:DETAIL] option. If specfied, it is sent to the server as the value of the TARGET option to the BASE_BACKUP command. If DETAIL is included, it is sent as the value of the new TARGET_DETAIL option to the BASE_BACKUP command. If the target is anything other than 'client', pg_basebackup assumes that it will now be the server's job to write the backup in a location somehow defined by the target, and that it therefore needs to write nothing locally. However, the server will still send messages to the client for progress reporting purposes. On the server side, we now support two additional types of backup targets. There is a 'blackhole' target, which just throws away the backup data without doing anything at all with it. Naturally, this should only be used for testing and debugging purposes, since you will not actually have a backup when it finishes running. More usefully, there is also a 'server' target, so you can now use something like 'pg_basebackup -Xnone -t server:/SOME/PATH' to write a backup to some location on the server. We can extend this to more types of targets in the future, and might even want to create an extensibility mechanism for adding new target types. Since WAL fetching is handled with separate client-side logic, it's not part of this mechanism; thus, backups with non-default targets must use -Xnone or -Xfetch. Patch by me, with a bug fix by Jeevan Ladhe. The patch set of which this is a part has also had review and/or testing from Tushar Ahuja, Suraj Kharage, Dipesh Pandit, and Mark Dilger. Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-19Make logical decoding a part of the rmgr.Jeff Davis
Add a new rmgr method, rm_decode, and use that rather than a switch statement. In preparation for rmgr extensibility. Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/ed1fb2e22d15d3563ae0eb610f7b61bb15999c0a.camel%40j-davis.com Discussion: https://postgr.es/m/20220118095332.6xtlcjoyxobv6cbk@jrouhaud
2022-01-18Modify pg_basebackup to use a new COPY subprotocol for base backups.Robert Haas
In the new approach, all files across all tablespaces are sent in a single COPY OUT operation. The CopyData messages are no longer raw archive content; rather, each message is prefixed with a type byte that describes its purpose, e.g. 'n' signifies the start of a new archive and 'd' signifies archive or manifest data. This protocol is significantly more extensible than the old approach, since we can later create more message types, though not without concern for backward compatibility. The new protocol sends a few things to the client that the old one did not. First, it sends the name of each archive explicitly, instead of letting the client compute it. This is intended to make it easier to write future patches that might send archives in a format other that tar (e.g. cpio, pax, tar.gz). Second, it sends explicit progress messages rather than allowing the client to assume that progress is defined by the number of bytes received. This will help with future features where the server compresses the data, or sends it someplace directly rather than transmitting it to the client. The old protocol is still supported for compatibility with previous releases. The new protocol is selected by means of a new TARGET option to the BASE_BACKUP command. Currently, the only supported target is 'client'. Support for additional targets will be added in a later commit. Patch by me. The patch set of which this is a part has had review and/or testing from Jeevan Ladhe, Tushar Ahuja, Suraj Kharage, Dipesh Pandit, and Mark Dilger. Discussion: http://postgr.es/m/CA+TgmoaYZbz0=Yk797aOJwkGJC-LK3iXn+wzzMx7KdwNpZhS5g@mail.gmail.com
2022-01-08Update copyright for 2022Bruce Momjian
Backpatch-through: 10
2021-11-16Fix headerscheck failure in replication/worker_internal.hAlvaro Herrera
Broken by 31c389d8de91
2021-11-05Introduce 'bbsink' abstraction to modularize base backup code.Robert Haas
The base backup code has accumulated a healthy number of new features over the years, but it's becoming increasingly difficult to maintain and further enhance that code because there's no real separation of concerns. For example, the code that understands knows the details of how we send data to the client using the libpq protocol is scattered throughout basebackup.c, rather than being centralized in one place. To try to improve this situation, introduce a new 'bbsink' object which acts as a recipient for archives generated during the base backup progress and also for the backup manifest. This commit introduces three types of bbsink: a 'copytblspc' bbsink forwards the backup to the client using one COPY OUT operation per tablespace and another for the manifest, a 'progress' bbsink performs command progress reporting, and a 'throttle' bbsink performs rate-limiting. The 'progress' and 'throttle' bbsink types also forward the data to a successor bbsink; at present, the last bbsink in the chain will always be of type 'copytblspc'. There are plans to add more types of 'bbsink' in future commits. This abstraction is a bit leaky in the case of progress reporting, but this still seems cleaner than what we had before. Patch by me, reviewed and tested by Andres Freund, Sumanta Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja, Mark Dilger, and Jeevan Ladhe. Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
2021-10-18Reset properly snapshot export state during transaction abortMichael Paquier
During a replication slot creation, an ERROR generated in the same transaction as the one creating a to-be-exported snapshot would have left the backend in an inconsistent state, as the associated static export snapshot state was not being reset on transaction abort, but only on the follow-up command received by the WAL sender that created this snapshot on replication slot creation. This would trigger inconsistency failures if this session tried to export again a snapshot, like during the creation of a replication slot. Note that a snapshot export cannot happen in a transaction block, so there is no need to worry resetting this state for subtransaction aborts. Also, this inconsistent state would very unlikely show up to users. For example, one case where this could happen is an out-of-memory error when building the initial snapshot to-be-exported. Dilip found this problem while poking at a different patch, that caused an error in this code path for reasons unrelated to HEAD. Author: Dilip Kumar Reviewed-by: Michael Paquier, Zhihong Yu Discussion: https://postgr.es/m/CAFiTN-s0zA1Kj0ozGHwkYkHwa5U0zUE94RSc_g81WrpcETB5=w@mail.gmail.com Backpatch-through: 9.6
2021-09-14Remove code duplication for permission checks with replication slotsMichael Paquier
Two functions, both named check_permissions(), used the same checks to verify if a user had required privileges to work on replication slots. This commit removes the duplication, and moves the function doing the checks to slot.c to be centralized. Author: Bharath Rupireddy Reviewed-by: Nathan Bossart, Euler Taveira Discussion: https://postgr.es/m/CALj2ACUPpVw1u7sQocFVWrSs0n10pt_G_4NPZKSxXK6cW1dErw@mail.gmail.com
2021-09-02Optimize fileset usage in apply worker.Amit Kapila
Use one fileset for the entire worker lifetime instead of using separate filesets for each streaming transaction. Now, the changes/subxacts files for every streaming transaction will be created under the same fileset and the files will be deleted after the transaction is completed. This patch extends the BufFileOpenFileSet and BufFileDeleteFileSet APIs to allow users to specify whether to give an error on missing files. Author: Dilip Kumar, based on suggestion by Thomas Munro Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila Discussion: https://postgr.es/m/[email protected]
2021-08-30Refactor sharedfileset.c to separate out fileset implementation.Amit Kapila
Move fileset related implementation out of sharedfileset.c to allow its usage by backends that don't want to share filesets among different processes. After this split, fileset infrastructure is used by both sharedfileset.c and worker.c for the named temporary files that survive across transactions. Author: Dilip Kumar, based on suggestion by Andres Freund Reviewed-by: Hou Zhijie, Masahiko Sawada, Amit Kapila Discussion: https://postgr.es/m/[email protected]
2021-08-27Add logical change details to logical replication worker errcontext.Amit Kapila
Previously, on the subscriber, we set the error context callback for the tuple data conversion failures. This commit replaces the existing error context callback with a comprehensive one so that it shows not only the details of data conversion failures but also the details of logical change being applied by the apply worker or table sync worker. The additional information displayed will be the command, transaction id, and timestamp. The error context is added to an error only when applying a change but not while doing other work like receiving data etc. This will help users in diagnosing the problems that occur during logical replication. It also can be used for future work that allows skipping a particular transaction on the subscriber. Author: Masahiko Sawada Reviewed-by: Hou Zhijie, Greg Nancarrow, Haiying Tang, Amit Kapila Tested-by: Haiying Tang Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
2021-08-19Rename LOGICAL_REP_MSG_STREAM_END to LOGICAL_REP_MSG_STREAM_STOP.Amit Kapila
In the code, most places used the term "Stream Stop" for the logical stream message. This commit improves consistency by renaming LogicalRepMsgType "LOGICAL_REP_MSG_STREAM_END" to "LOGICAL_REP_MSG_STREAM_STOP". Author: Masahiko Sawada Reviewed-by: Hou Zhijie, Amit Kapila Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
2021-08-04Add prepare API support for streaming transactions in logical replication.Amit Kapila
Commit a8fd13cab0 added support for prepared transactions to built-in logical replication via a new option "two_phase" for a subscription. The "two_phase" option was not allowed with the existing streaming option. This commit permits the combination of "streaming" and "two_phase" subscription options. It extends the pgoutput plugin and the subscriber side code to add the prepare API for streaming transactions which will apply the changes accumulated in the spool-file at prepare time. Author: Peter Smith and Ajin Cherian Reviewed-by: Vignesh C, Amit Kapila, Greg Nancarrow Tested-By: Haiying Tang Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com
2021-07-16Advance old-segment horizon properly after slot invalidationAlvaro Herrera
When some slots are invalidated due to the max_slot_wal_keep_size limit, the old segment horizon should move forward to stay within the limit. However, in commit c6550776394e we forgot to call KeepLogSeg again to recompute the horizon after invalidating replication slots. In cases where other slots remained, the limits would be recomputed eventually for other reasons, but if all slots were invalidated, the limits would not move at all afterwards. Repair. Backpatch to 13 where the feature was introduced. Author: Kyotaro Horiguchi <[email protected]> Reported-by: Marcin Krupowicz <[email protected]> Discussion: https://postgr.es/m/[email protected]
2021-07-14Add support for prepared transactions to built-in logical replication.Amit Kapila
To add support for streaming transactions at prepare time into the built-in logical replication, we need to do the following things: * Modify the output plugin (pgoutput) to implement the new two-phase API callbacks, by leveraging the extended replication protocol. * Modify the replication apply worker, to properly handle two-phase transactions by replaying them on prepare. * Add a new SUBSCRIPTION option "two_phase" to allow users to enable two-phase transactions. We enable the two_phase once the initial data sync is over. We however must explicitly disable replication of two-phase transactions during replication slot creation, even if the plugin supports it. We don't need to replicate the changes accumulated during this phase, and moreover, we don't have a replication connection open so we don't know where to send the data anyway. The streaming option is not allowed with this new two_phase option. This can be done as a separate patch. We don't allow to toggle two_phase option of a subscription because it can lead to an inconsistent replica. For the same reason, we don't allow to refresh the publication once the two_phase is enabled for a subscription unless copy_data option is false. Author: Peter Smith, Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich Reviewed-by: Amit Kapila, Sawada Masahiko, Vignesh C, Dilip Kumar, Takamichi Osumi, Greg Nancarrow Tested-By: Haiying Tang Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/CAA4eK1+opiV4aFTmWWUF9h_32=HfPOW9vZASHarT0UA5oBrtGw@mail.gmail.com
2021-07-02Don't try to print data type names in slot_store_error_callback().Tom Lane
The existing code tried to do syscache lookups in an already-failed transaction, which is problematic to say the least. After some consideration of alternatives, the best fix seems to be to just drop type names from the error message altogether. The table and column names seem like sufficient localization. If the user is unsure what types are involved, she can check the local and remote table definitions. Having done that, we can also discard the LogicalRepTypMap hash table, which had no other use. Arguably, LOGICAL_REP_MSG_TYPE replication messages are now obsolete as well; but we should probably keep them in case some other use emerges. (The complexity of removing something from the replication protocol would likely outweigh any savings anyhow.) Masahiko Sawada and Bharath Rupireddy, per complaint from Andres Freund. Back-patch to v10 where this code originated. Discussion: https://postgr.es/m/[email protected]
2021-06-15Fix decoding of speculative aborts.Amit Kapila
During decoding for speculative inserts, we were relying for cleaning toast hash on confirmation records or next change records. But that could lead to multiple problems (a) memory leak if there is neither a confirmation record nor any other record after toast insertion for a speculative insert in the transaction, (b) error and assertion failures if the next operation is not an insert/update on the same table. The fix is to start queuing spec abort change and clean up toast hash and change record during its processing. Currently, we are queuing the spec aborts for both toast and main table even though we perform cleanup while processing the main table's spec abort record. Later, if we have a way to distinguish between the spec abort record of toast and the main table, we can avoid queuing the change for spec aborts of toast tables. Reported-by: Ashutosh Bapat Author: Dilip Kumar Reviewed-by: Amit Kapila Backpatch-through: 9.6, where it was introduced Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
2021-06-11Return ReplicationSlotAcquire API to its original formAlvaro Herrera
Per 96540f80f833; the awkward API introduced by c6550776394e is no longer needed. Author: Andres Freund <[email protected]> Reviewed-by: Álvaro Herrera <[email protected]> Discussion: https://postgr.es/m/[email protected]
2021-06-10Add some const decorationsPeter Eisentraut
One of these functions is new in PostgreSQL 14; might as well start it out right.
2021-05-27Fix assertion during streaming of multi-insert toast changes.Amit Kapila
While decoding the multi-insert WAL we can't clean the toast untill we get the last insert of that WAL record. Now if we stream the changes before we get the last change, the memory for toast chunks won't be released and we expect the txn to have streamed all changes after streaming. This restriction is mainly to ensure the correctness of streamed transactions and it doesn't seem worth uplifting such a restriction just to allow this case because anyway we will stream the transaction once such an insert is complete. Previously we were using two different flags (one for toast tuples and another for speculative inserts) to indicate partial changes. Now instead we replaced both of them with a single flag to indicate partial changes. Reported-by: Pavan Deolasee Author: Dilip Kumar Reviewed-by: Pavan Deolasee, Amit Kapila Discussion: https://postgr.es/m/CABOikdN-_858zojYN-2tNcHiVTw-nhxPwoQS4quExeweQfG1Ug@mail.gmail.com
2021-05-12Rename the logical replication global "wrconn"Alvaro Herrera
The worker.c global wrconn is only meant to be used by logical apply/ tablesync workers, but there are other variables with the same name. To reduce future confusion rename the global from "wrconn" to "LogRepWorkerWalRcvConn". While this is just cosmetic, it seems better to backpatch it all the way back to 10 where this code appeared, to avoid future backpatching issues. Author: Peter Smith <[email protected]> Discussion: https://postgr.es/m/CAHut+Pu7Jv9L2BOEx_Z0UtJxfDevQSAUW2mJqWU+CtmDrEZVAg@mail.gmail.com
2021-05-10Revert recovery prefetching feature.Thomas Munro
This set of commits has some bugs with known fixes, but at this late stage in the release cycle it seems best to revert and resubmit next time, along with some new automated test coverage for this whole area. Commits reverted: dc88460c: Doc: Review for "Optionally prefetch referenced data in recovery." 1d257577: Optionally prefetch referenced data in recovery. f003d9f8: Add circular WAL decoding buffer. 323cbe7c: Remove read_page callback from XLogReader. Remove the new GUC group WAL_RECOVERY recently added by a55a9847, as the corresponding section of config.sgml is now reverted. Discussion: https://postgr.es/m/CAOuzzgrn7iKnFRsB4MHp3UisEQAGgZMbk_ViTN4HV4-Ksq8zCg%40mail.gmail.com
2021-05-06Update replication statistics after every stream/spill.Amit Kapila
Currently, replication slot statistics are updated at prepare, commit, and rollback. Now, if the transaction is interrupted the stats might not get updated. Fixed this by updating replication statistics after every stream/spill. In passing update the docs to change the description of some of the slot stats. Author: Vignesh C, Sawada Masahiko Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/[email protected]
2021-04-27Use HTAB for replication slot statistics.Amit Kapila
Previously, we used to use the array of size max_replication_slots to store stats for replication slots. But that had two problems in the cases where a message for dropping a slot gets lost: 1) the stats for the new slot are not recorded if the array is full and 2) writing beyond the end of the array if the user reduces the max_replication_slots. This commit uses HTAB for replication slot statistics, resolving both problems. Now, pgstat_vacuum_stat() search for all the dead replication slots in stats hashtable and tell the collector to remove them. To avoid showing the stats for the already-dropped slots, pg_stat_replication_slots view searches slot stats by the slot name taken from pg_replication_slots. Also, we send a message for creating a slot at slot creation, initializing the stats. This reduces the possibility that the stats are accumulated into the old slot stats when a message for dropping a slot gets lost. Reported-by: Andres Freund Author: Sawada Masahiko, test case by Vignesh C Reviewed-by: Amit Kapila, Vignesh C, Dilip Kumar Discussion: https://postgr.es/m/[email protected]
2021-04-16Add information of total data processed to replication slot stats.Amit Kapila
This adds the statistics about total transactions count and total transaction data logically sent to the decoding output plugin from ReorderBuffer. Users can query the pg_stat_replication_slots view to check these stats. Suggested-by: Andres Freund Author: Vignesh C and Amit Kapila Reviewed-by: Sawada Masahiko, Amit Kapila Discussion: https://postgr.es/m/[email protected]
2021-04-08Remove duplicate typedef.Thomas Munro
Thinko in commit 323cbe7c, per complaint from BF animal locust's older GCC compiler.
2021-04-08Remove read_page callback from XLogReader.Thomas Munro
Previously, the XLogReader module would fetch new input data using a callback function. Redesign the interface so that it tells the caller to insert more data with a special return value instead. This API suits later patches for prefetching, encryption and maybe other future projects that would otherwise require continually extending the callback interface. As incidental cleanup work, move global variables readOff, readLen and readSegNo inside XlogReaderState. Author: Kyotaro HORIGUCHI <[email protected]> Author: Heikki Linnakangas <[email protected]> (parts of earlier version) Reviewed-by: Antonin Houska <[email protected]> Reviewed-by: Alvaro Herrera <[email protected]> Reviewed-by: Takashi Menjo <[email protected]> Reviewed-by: Andres Freund <[email protected]> Reviewed-by: Thomas Munro <[email protected]> Discussion: https://postgr.es/m/20190418.210257.43726183.horiguchi.kyotaro%40lab.ntt.co.jp
2021-04-06Allow pgoutput to send logical decoding messages.Amit Kapila
The output plugin accepts a new parameter (messages) that controls if logical decoding messages are written into the replication stream. It is useful for those clients that use pgoutput as an output plugin and needs to process messages that were written by pg_logical_emit_message(). Although logical streaming replication protocol supports logical decoding messages now, logical replication does not use this feature yet. Author: David Pirotte, Euler Taveira Reviewed-by: Euler Taveira, Andres Freund, Ashutosh Bapat, Amit Kapila Discussion: https://postgr.es/m/CADK3HHJ-+9SO7KuRLH=9Wa1rAo60Yreq1GFNkH_kd0=CdaWM+A@mail.gmail.com
2021-04-06Refactor function parse_output_parameters.Amit Kapila
Instead of using multiple parameters in parse_ouput_parameters function signature, use the struct PGOutputData that encapsulates all pgoutput options. It will be useful for future work where we need to add other options in pgoutput. Author: Euler Taveira Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CADK3HHJ-+9SO7KuRLH=9Wa1rAo60Yreq1GFNkH_kd0=CdaWM+A@mail.gmail.com
2021-03-30Add a xid argument to the filter_prepare callback for output plugins.Amit Kapila
Along with gid, this provides a different way to identify the transaction. The users that use xid in some way to prepare the transactions can use it to filter prepare transactions. The later commands COMMIT PREPARED or ROLLBACK PREPARED carries both identifiers, providing an output plugin the choice of what to use. Author: Markus Wanner Reviewed-by: Vignesh C, Amit Kapila Discussion: https://postgr.es/m/[email protected]
2021-03-17Code review for server's handling of "tablespace map" files.Tom Lane
While looking at Robert Foggia's report, I noticed a passel of other issues in the same area: * The scheme for backslash-quoting newlines in pathnames is just wrong; it will misbehave if the last ordinary character in a pathname is a backslash. I'm not sure why we're bothering to allow newlines in tablespace paths, but if we're going to do it we should do it without introducing other problems. Hence, backslashes themselves have to be backslashed too. * The author hadn't read the sscanf man page very carefully, because this code would drop any leading whitespace from the path. (I doubt that a tablespace path with leading whitespace could happen in practice; but if we're bothering to allow newlines in the path, it sure seems like leading whitespace is little less implausible.) Using sscanf for the task of finding the first space is overkill anyway. * While I'm not 100% sure what the rationale for escaping both \r and \n is, if the idea is to allow Windows newlines in the file then this code failed, because it'd throw an error if it saw \r followed by \n. * There's no cross-check for an incomplete final line in the map file, which would be a likely apparent symptom of the improper-escaping bug. On the generation end, aside from the escaping issue we have: * If needtblspcmapfile is true then do_pg_start_backup will pass back escaped strings in tablespaceinfo->path values, which no caller wants or is prepared to deal with. I'm not sure if there's a live bug from that, but it looks like there might be (given the dubious assumption that anyone actually has newlines in their tablespace paths). * It's not being very paranoid about the possibility of random stuff in the pg_tblspc directory. IMO we should ignore anything without an OID-like name. The escaping rule change doesn't seem back-patchable: it'll require doubling of backslashes in the tablespace_map file, which is basically a basebackup format change. The odds of that causing trouble are considerably more than the odds of the existing bug causing trouble. The rest of this seems somewhat unlikely to cause problems too, so no back-patch.
2021-03-12Add condition variable for walreceiver shutdown.Thomas Munro
Use this new CV to wait for walreceiver shutdown without a sleep/poll loop, while also benefiting from standard postmaster death handling. Discussion: https://postgr.es/m/CA%2BhUKGK1607VmtrDUHQXrsooU%3Dap4g4R2yaoByWOOA3m8xevUQ%40mail.gmail.com
2021-03-03Add option to enable two_phase commits via pg_create_logical_replication_slot.Amit Kapila
Commit 0aa8a01d04 extends the output plugin API to allow decoding of prepared xacts and allowed the user to enable/disable the two-phase option via pg_logical_slot_get_changes(). This can lead to a problem such that the first time when it gets changes via pg_logical_slot_get_changes() without two_phase option enabled it will not get the prepared even though prepare is after consistent snapshot. Now next time during getting changes, if the two_phase option is enabled it can skip prepare because by that time start decoding point has been moved. So the user will only get commit prepared. Allow to enable/disable this option at the create slot time and default will be false. It will break the existing slots which is fine in a major release. Author: Ajin Cherian Reviewed-by: Amit Kapila and Vignesh C Discussion: https://postgr.es/m/[email protected]
2021-03-01Avoid repeated decoding of prepared transactions after a restart.Amit Kapila
In commit a271a1b50e, we allowed decoding at prepare time and the prepare was decoded again if there is a restart after decoding it. It was done that way because we can't distinguish between the cases where we have not decoded the prepare because it was prior to consistent snapshot or we have decoded it earlier but restarted. To distinguish between these two cases, we have introduced an initial_consistent_point at the slot level which is an LSN at which we found a consistent point at the time of slot creation. This is also the point where we have exported a snapshot for the initial copy. So, prepare transaction prior to this point are sent along with commit prepared. This commit bumps SNAPBUILD_VERSION because of change in SnapBuild. It will break existing slots which is fine in a major release. Author: Ajin Cherian, based on idea by Andres Freund Reviewed-by: Amit Kapila and Vignesh C Discussion: https://postgr.es/m/[email protected]
2021-02-15Fix the warnings introduced in commit ce0fdbfe97.Amit Kapila
Author: Amit Kapila Reviewed-by: Tom Lane Discussion: https://postgr.es/m/[email protected]
2021-02-12Allow multiple xacts during table sync in logical replication.Amit Kapila
For the initial table data synchronization in logical replication, we use a single transaction to copy the entire table and then synchronize the position in the stream with the main apply worker. There are multiple downsides of this approach: (a) We have to perform the entire copy operation again if there is any error (network breakdown, error in the database operation, etc.) while we synchronize the WAL position between tablesync worker and apply worker; this will be onerous especially for large copies, (b) Using a single transaction in the synchronization-phase (where we can receive WAL from multiple transactions) will have the risk of exceeding the CID limit, (c) The slot will hold the WAL till the entire sync is complete because we never commit till the end. This patch solves all the above downsides by allowing multiple transactions during the tablesync phase. The initial copy is done in a single transaction and after that, we commit each transaction as we receive. To allow recovery after any error or crash, we use a permanent slot and origin to track the progress. The slot and origin will be removed once we finish the synchronization of the table. We also remove slot and origin of tablesync workers if the user performs DROP SUBSCRIPTION .. or ALTER SUBSCRIPTION .. REFERESH and some of the table syncs are still not finished. The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and ALTER SUBSCRIPTION ... SET PUBLICATION ... with refresh option as true cannot be executed inside a transaction block because they can now drop the slots for which we have no provision to rollback. This will also open up the path for logical replication of 2PC transactions on the subscriber side. Previously, we can't do that because of the requirement of maintaining a single transaction in tablesync workers. Bump catalog version due to change of state in the catalog (pg_subscription_rel). Author: Peter Smith, Amit Kapila, and Takamichi Osumi Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
2021-02-10Make pg_replication_origin_drop safe against concurrent drops.Amit Kapila
Currently, we get the origin id from the name and then drop the origin by taking ExclusiveLock on ReplicationOriginRelationId. So, two concurrent sessions can get the id from the name at the same time and then when they try to drop the origin, one of the sessions will get the either "tuple concurrently deleted" or "cache lookup failed for replication origin ..". To prevent this race condition we do the entire operation under lock. This obviates the need for replorigin_drop() API and we have removed it so if any extension authors are using it they need to instead use replorigin_drop_by_name. See it's usage in pg_replication_origin_drop(). Author: Peter Smith Reviewed-by: Amit Kapila, Euler Taveira, Petr Jelinek, and Alvaro Herrera Discussion: https://www.postgresql.org/message-id/CAHut%2BPuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A%40mail.gmail.com
2021-01-04Allow decoding at prepare time in ReorderBuffer.Amit Kapila
This patch allows PREPARE-time decoding of two-phase transactions (if the output plugin supports this capability), in which case the transactions are replayed at PREPARE and then committed later when COMMIT PREPARED arrives. Now that we decode the changes before the commit, the concurrent aborts may cause failures when the output plugin consults catalogs (both system and user-defined). We detect such failures with a special sqlerrcode ERRCODE_TRANSACTION_ROLLBACK introduced by commit 7259736a6e and stop decoding the remaining changes. Then we rollback the changes when rollback prepared is encountered. Author: Ajin Cherian and Amit Kapila based on previous work by Nikhil Sontakke and Stas Kelvich Reviewed-by: Amit Kapila, Peter Smith, Sawada Masahiko, Arseny Sher, and Dilip Kumar Tested-by: Takamichi Osumi Discussion: https://postgr.es/m/[email protected] https://postgr.es/m/CAMGcDxeqEpWj3fTXwqhSwBdXd2RS9jzwWscO-XbeCfso6ts3+Q@mail.gmail.com