summaryrefslogtreecommitdiff
path: root/src/backend/replication/slot.c
AgeCommit message (Collapse)Author
2016-04-05Fix error message from wal_level value renamingPeter Eisentraut
found by Ian Barwick
2016-03-18Merge wal_level "archive" and "hot_standby" into new name "replica"Peter Eisentraut
The distinction between "archive" and "hot_standby" existed only because at the time "hot_standby" was added, there was some uncertainty about stability. This is now a long time ago. We would like to move forward with simplifying the replication configuration, but this distinction is in the way, because a primary server cannot tell (without asking a standby or predicting the future) which one of these would be the appropriate level. Pick a new name for the combined setting to make it clearer that it covers all (non-logical) backup and replication uses. The old values are still accepted but are converted internally. Reviewed-by: Michael Paquier <[email protected]> Reviewed-by: David Steele <[email protected]>
2016-03-10Introduce durable_rename() and durable_link_or_rename().Andres Freund
Renaming a file using rename(2) is not guaranteed to be durable in face of crashes; especially on filesystems like xfs and ext4 when mounted with data=writeback. To be certain that a rename() atomically replaces the previous file contents in the face of crashes and different filesystems, one has to fsync the old filename, rename the file, fsync the new filename, fsync the containing directory. This sequence is not generally adhered to currently; which exposes us to data loss risks. To avoid having to repeat this arduous sequence, introduce durable_rename(), which wraps all that. Also add durable_link_or_rename(). Several places use link() (with a fallback to rename()) to rename a file, trying to avoid replacing the target file out of paranoia. Some of those rename sequences need to be durable as well. There seems little reason extend several copies of the same logic, so centralize the link() callers. This commit does not yet make use of the new functions; they're used in a followup commit. Author: Michael Paquier, Andres Freund Discussion: [email protected] Backpatch: All supported branches
2016-02-18Improve error message about active replication slotPeter Eisentraut
The old phrasing was awkward if a replication slot is activated and deactivated repeatedly.
2016-02-12Make builtin lwlock tranche names consistent.Robert Haas
Previously, we had a mix of styles. Amit Kapila
2016-02-05Fix typo in comment.Robert Haas
Michael Paquier
2016-01-29Migrate replication slot I/O locks into a separate tranche.Robert Haas
This is following in a long train of similar changes and for the same reasons - see b319356f0e94a6482c726cf4af96597c211d8d6e and fe702a7b3f9f2bc5bf6d173166d7d55226af82c8 inter alia. Author: Amit Kapila Reviewed-by: Alexander Korotkov, Robert Haas
2016-01-02Update copyright for 2016Bruce Momjian
Backpatch certain files through 9.1
2015-10-29Message style improvementsPeter Eisentraut
Message style, plurals, quoting, spelling, consistency with similar messages
2015-10-06Remove more volatile qualifiers.Robert Haas
Prior to commit 0709b7ee72e4bc71ad07b7120acd117265ab51d0, access to variables within a spinlock-protected critical section had to be done through a volatile pointer, but that should no longer be necessary. This continues work begun in df4077cda2eae3eb4a5cf387da0c1e7616e73204 and 6ba4ecbf477e0b25dd7bde1b0c4e07fc2da19348. Thomas Munro and Michael Paquier
2015-10-03Improve errhint() about replication slot naming restrictions.Andres Freund
The existing hint talked about "may only contain letters", but the actual requirement is more strict: only lower case letters are allowed. Reported-By: Rushabh Lathia Author: Rushabh Lathia Discussion: AGPqQf2x50qcwbYOBKzb4x75sO_V3g81ZsA8+Ji9iN5t_khFhQ@mail.gmail.com Backpatch: 9.4-, where replication slots were added
2015-08-11Allow pg_create_physical_replication_slot() to reserve WAL.Andres Freund
When creating a physical slot it's often useful to immediately reserve the current WAL position instead of only doing after the first feedback message arrives. That e.g. allows slots to guarantee that all the WAL for a base backup will be available afterwards. Logical slots already have to reserve WAL during creation, so generalize that logic into being usable for both physical and logical slots. Catversion bump because of the new parameter. Author: Gurjeet Singh Reviewed-By: Andres Freund Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com
2015-08-11Introduce macros determining if a replication slot is physical or logical.Andres Freund
These make the code a bit easier to read, and make it easier to add a more explicit notion of a slot's type at some point in the future. Author: Gurjeet Singh Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com
2015-08-11Minor cleanups in slot related code.Andres Freund
Fix a bunch of typos, and remove two superflous includes. Author: Gurjeet Singh Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com Backpatch: 9.4
2015-06-11Fix typoPeter Eisentraut
2015-05-24pgindent run for 9.5Bruce Momjian
2015-05-20Fix more typos in comments.Heikki Linnakangas
Patch by CharSyam, plus a few more I spotted with grep.
2015-05-20Collection of typo fixes.Heikki Linnakangas
Use "a" and "an" correctly, mostly in comments. Two error messages were also fixed (they were just elogs, so no translation work required). Two function comments in pg_proc.h were also fixed. Etsuro Fujita reported one of these, but I found a lot more with grep. Also fix a few other typos spotted while grepping for the a/an typos. For example, "consists out of ..." -> "consists of ...". Plus a "though"/ "through" mixup reported by Euler Taveira. Many of these typos were in old code, which would be nice to backpatch to make future backpatching easier. But much of the code was new, and I didn't feel like crafting separate patches for each branch. So no backpatching.
2015-04-27Use a fd opened for read/write when syncing slots during startup.Andres Freund
Some operating systems, including the reporter's windows, return EBADFD or similar when fsync() is invoked on a O_RDONLY file descriptor. Unfortunately RestoreSlotFromDisk() does exactly that; which causes failures after restarts in at least some scenarios. If you hit the bug the error message will be something like ERROR: could not fsync file "pg_replslot/$name/state": Bad file descriptor Simply use O_RDWR instead of O_RDONLY when opening the relevant file descriptor to fix the bug. Unfortunately I have no way of verifying the fix, but we've seen similar problems in the past. This bug goes back to 9.4 where slots were introduced. Backpatch accordingly. Reported-By: Patrice Drolet Bug: #13143: Discussion: [email protected]
2015-04-21Add 'active_in' column to pg_replication_slots.Andres Freund
Right now it is visible whether a replication slot is active in any session, but not in which. Adding the active_in column, containing the pid of the backend having acquired the slot, makes it much easier to associate pg_replication_slots entries with the corresponding pg_stat_replication/pg_stat_activity row. This should have been done from the start, but I (Andres) dropped the ball there somehow. Author: Craig Ringer, revised by me Discussion: CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com
2015-04-14Reorganize our CRC source files again.Heikki Linnakangas
Now that we use CRC-32C in WAL and the control file, the "traditional" and "legacy" CRC-32 variants are not used in any frontend programs anymore. Move the code for those back from src/common to src/backend/utils/hash. Also move the slicing-by-8 implementation (back) to src/port. This is in preparation for next patch that will add another implementation that uses Intel SSE 4.2 instructions to calculate CRC-32C, where available.
2015-04-02Correct comment to use RS_EPHEMERALSimon Riggs
2015-01-24Replace a bunch more uses of strncpy() with safer coding.Tom Lane
strncpy() has a well-deserved reputation for being unsafe, so make an effort to get rid of nearly all occurrences in HEAD. A large fraction of the remaining uses were passing length less than or equal to the known strlen() of the source, in which case no null-padding can occur and the behavior is equivalent to memcpy(), though doubtless slower and certainly harder to reason about. So just use memcpy() in these cases. In other cases, use either StrNCpy() or strlcpy() as appropriate (depending on whether padding to the full length of the destination buffer seems useful). I left a few strncpy() calls alone in the src/timezone/ code, to keep it in sync with upstream (the IANA tzcode distribution). There are also a few such calls in ecpg that could possibly do with more analysis. AFAICT, none of these changes are more than cosmetic, except for the four occurrences in fe-secure-openssl.c, which are in fact buggy: an overlength source leads to a non-null-terminated destination buffer and ensuing misbehavior. These don't seem like security issues, first because no stack clobber is possible and second because if your values of sslcert etc are coming from untrusted sources then you've got problems way worse than this. Still, it's undesirable to have unpredictable behavior for overlength inputs, so back-patch those four changes to all active branches.
2015-01-06Update copyright for 2015Bruce Momjian
Backpatch certain files through 9.0
2015-01-03Add pg_string_endswith as the start of a string helper library in src/common.Andres Freund
Backpatch to 9.3 where src/common was introduce, because a bugfix that needs to be backpatched, requires the function. Earlier branches will have to duplicate the code.
2014-11-12Fix several weaknesses in slot and logical replication on-disk serialization.Andres Freund
Heikki noticed in [email protected] that slot.c and snapbuild.c were missing the FIN_CRC32 call when computing/checking checksums of on disk files. That doesn't lower the the error detection capabilities of the checksum, but is inconsistent with other usages. In a followup mail Heikki also noticed that, contrary to a comment, the 'version' and 'length' struct fields of replication slot's on disk data where not covered by the checksum. That's not likely to lead to actually missed corruption as those fields are cross checked with the expected version and the actual file length. But it's wrong nonetheless. As fixing these issues makes existing on disk files unreadable, bump the expected versions of on disk files for both slots and logical decoding historic catalog snapshots. This means that loading old files will fail with ERROR: "replication slot file ... has unsupported version 1" and ERROR: "snapbuild state file ... has unsupported version 1 instead of 2" respectively. Given the low likelihood of anybody already using these new features in a production setup that seems acceptable. Fixing these issues made me notice that there's no regression test covering the loading of historic snapshot from disk - so add one. Backpatch to 9.4 where these features were introduced.
2014-11-12Message improvementsPeter Eisentraut
2014-11-04Switch to CRC-32C in WAL and other places.Heikki Linnakangas
The old algorithm was found to not be the usual CRC-32 algorithm, used by Ethernet et al. We were using a non-reflected lookup table with code meant for a reflected lookup table. That's a strange combination that AFAICS does not correspond to any bit-wise CRC calculation, which makes it difficult to reason about its properties. Although it has worked well in practice, seems safer to use a well-known algorithm. Since we're changing the algorithm anyway, we might as well choose a different polynomial. The Castagnoli polynomial has better error-correcting properties than the traditional CRC-32 polynomial, even if we had implemented it correctly. Another reason for picking that is that some new CPUs have hardware support for calculating CRC-32C, but not CRC-32, let alone our strange variant of it. This patch doesn't add any support for such hardware, but a future patch could now do that. The old algorithm is kept around for tsquery and pg_trgm, which use the values in indexes that need to remain compatible so that pg_upgrade works. While we're at it, share the old lookup table for CRC-32 calculation between hstore, ltree and core. They all use the same table, so might as well.
2014-09-05Assorted message fixes and improvementsPeter Eisentraut
2014-07-24Properly remove ephemeral replication slots after a crash restart.Andres Freund
Ephemeral slots - slots that shouldn't survive database restarts - weren't properly cleaned up after a immediate/crash restart. They were ignored in the sense that they weren't restored into memory and thus didn't cause unwanted resource retention; but they prevented a new slot with the same name from being created. Now ephemeral slots are fully removed during startup. Backpatch to 9.4 where replication slots where added.
2014-06-12Consistency improvements for slot and decoding code.Andres Freund
Change the order of checks in similar functions to be the same; remove a parameter that's not needed anymore; rename a memory context and expand a couple of comments. Per review comments from Amit Kapila
2014-05-06pgindent run for 9.4Bruce Momjian
This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
2014-03-07fix ReplicationSlotsCountDBSlots for dropping unrelated databasesBruce Momjian
YAMAMOTO Takashi
2014-03-03Introduce logical decoding.Robert Haas
This feature, building on previous commits, allows the write-ahead log stream to be decoded into a series of logical changes; that is, inserts, updates, and deletes and the transactions which contain them. It is capable of handling decoding even across changes to the schema of the effected tables. The output format is controlled by a so-called "output plugin"; an example is included. To make use of this in a real replication system, the output plugin will need to be modified to produce output in the format appropriate to that system, and to perform filtering. Currently, information can be extracted from the logical decoding system only via SQL; future commits will add the ability to stream changes via walsender. Andres Freund, with review and other contributions from many other people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Gheogegan, Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Abhijit Menon-Sen, Michael Paquier, Simon Riggs, Craig Ringer, and Steve Singer.
2014-02-02Fix typos in docs and comments.Fujii Masao
Thom Brown
2014-02-01Introduce replication slots.Robert Haas
Replication slots are a crash-safe data structure which can be created on either a master or a standby to prevent premature removal of write-ahead log segments needed by a standby, as well as (with hot_standby_feedback=on) pruning of tuples whose removal would cause replication conflicts. Slots have some advantages over existing techniques, as explained in the documentation. In a few places, we refer to the type of replication slots introduced by this patch as "physical" slots, because forthcoming patches for logical decoding will also have slots, but with somewhat different properties. Andres Freund and Robert Haas