summaryrefslogtreecommitdiff
path: root/src/backend/access/rmgrdesc
AgeCommit message (Collapse)Author
2018-12-20Check for conflicting queries during replay of gistvacuumpage()Alexander Korotkov
013ebc0a7b implements so-called GiST microvacuum. That is gistgettuple() marks index tuples as dead when kill_prior_tuple is set. Later, when new tuple insertion claims page space, those dead index tuples are physically deleted from page. When this deletion is replayed on standby, it might conflict with read-only queries. But 013ebc0a7b doesn't handle this. That may lead to disappearance of some tuples from read-only snapshots on standby. This commit implements resolving of conflicts between replay of GiST microvacuum and standby queries. On the master we implement new WAL record type XLOG_GIST_DELETE, which comprises necessary information. On stable releases we've to be tricky to keep WAL compatibility. Information required for conflict processing is just appended to data of XLOG_GIST_PAGE_UPDATE record. So, PostgreSQL version, which doesn't know about conflict processing, will just ignore that. Reported-by: Andres Freund Diagnosed-by: Andres Freund Discussion: https://postgr.es/m/20181212224524.scafnlyjindmrbe6%40alap3.anarazel.de Author: Alexander Korotkov Backpatch-through: 9.6
2018-11-20Make WAL description output more consistentPeter Eisentraut
The output for record types XLOG_DBASE_CREATE and XLOG_DBASE_DROP used the order dbid/tablespaceid, whereas elsewhere the order is tablespaceid/dbid[/relfilenodeid]. Flip the order for those two types to make it consistent. Author: Jean-Christophe Arnu <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/CAHZmTm18Ln62KW-G8NYvO1wbBL3QU1E76Zep=DuHmg-zS2XFAg@mail.gmail.com/
2018-11-14Add flag values in WAL description to all heap recordsMichael Paquier
Hexadecimal is consistently used as format to not bloat too much the output but keep it readable. This information is useful mainly for debugging purposes with for example pg_waldump. Author: Michael Paquier Reviewed-by: Nathan Bossart, Dmitry Dolgov, Andres Freund, Álvaro Herrera Discussion: https://postgr.es/m/[email protected]
2018-05-20printf("%lf") is not portable, so omit the "l".Tom Lane
The "l" (ell) width spec means something in the corresponding scanf usage, but not here. While modern POSIX says that applying "l" to "f" and other floating format specs is a no-op, SUSv2 says it's undefined. Buildfarm experience says that some old compilers emit warnings about it, and at least one old stdio implementation (mingw's "ANSI" option) actually produces wrong answers and/or crashes. Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/[email protected]
2018-04-19Handle XLOG_BTREE_META_CLEANUP in btree_desc() and btree_identify()Teodor Sigaev
New WAL record XLOG_BTREE_META_CLEANUP introduced in 857f9c36 has no handling in btree_desc() and btree_identify(). This patch implements corresponding handling. Alexander Korotkov
2018-04-17Fix confusion on the padding of GIDs in on commit and abort records.Heikki Linnakangas
Review of commit 1eb6d652: It's pointless to add padding to the GID fields, when the code that follows assumes that there is no alignment, and uses memcpy(). Remove the pointless padding. Update comments to note the new fields in the WAL records. Reviewed-by: Michael Paquier Discussion: https://www.postgresql.org/message-id/33b787bf-dc20-1161-54e9-3f3b607bf59d%40iki.fi
2018-04-09Further cleanup of client dependencies on src/include/catalog headers.Tom Lane
In commit 9c0a0de4c, I'd failed to notice that catalog/catalog.h should also be considered a frontend-unsafe header, because it includes (and needs) the full form of pg_class.h, not to mention relcache.h. However, various frontend code was depending on it to get TABLESPACE_VERSION_DIRECTORY, so refactoring of some sort is called for. The cleanest answer seems to be to move TABLESPACE_VERSION_DIRECTORY, as well as the OIDCHARS symbol, to common/relpath.h. Do that, and mop up inclusions as necessary. (I found that quite a few current users of catalog/catalog.h don't seem to need it at all anymore, apparently as a result of the refactorings that created common/relpath.[hc]. And initdb.c needed it only as a route to pg_class_d.h.) Discussion: https://postgr.es/m/[email protected]
2018-04-09Revert "Allow on-line enabling and disabling of data checksums"Magnus Hagander
This reverts the backend sides of commit 1fde38beaa0c3e66c340efc7cc0dc272d6254bb0. I have, at least for now, left the pg_verify_checksums tool in place, as this tool can be very valuable without the rest of the patch as well, and since it's a read-only tool that only runs when the cluster is down it should be a lot safer.
2018-04-07Indexes with INCLUDE columns and their support in B-treeTeodor Sigaev
This patch introduces INCLUDE clause to index definition. This clause specifies a list of columns which will be included as a non-key part in the index. The INCLUDE columns exist solely to allow more queries to benefit from index-only scans. Also, such columns don't need to have appropriate operator classes. Expressions are not supported as INCLUDE columns since they cannot be used in index-only scans. Index access methods supporting INCLUDE are indicated by amcaninclude flag in IndexAmRoutine. For now, only B-tree indexes support INCLUDE clause. In B-tree indexes INCLUDE columns are truncated from pivot index tuples (tuples located in non-leaf pages and high keys). Therefore, B-tree indexes now might have variable number of attributes. This patch also provides generic facility to support that: pivot tuples contain number of their attributes in t_tid.ip_posid. Free 13th bit of t_info is used for indicating that. This facility will simplify further support of index suffix truncation. The changes of above are backward-compatible, pg_upgrade doesn't need special handling of B-tree indexes for that. Bump catalog version Author: Anastasia Lubennikova with contribition by Alexander Korotkov and me Reviewed by: Peter Geoghegan, Tomas Vondra, Antonin Houska, Jeff Janes, David Rowley, Alexander Korotkov Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2018-04-07Logical decoding of TRUNCATEPeter Eisentraut
Add a new WAL record type for TRUNCATE, which is only used when wal_level >= logical. (For physical replication, TRUNCATE is already replicated via SMGR records.) Add new callback for logical decoding output plugins to receive TRUNCATE actions. Author: Simon Riggs <[email protected]> Author: Marco Nenciarini <[email protected]> Author: Peter Eisentraut <[email protected]> Reviewed-by: Petr Jelinek <[email protected]> Reviewed-by: Andres Freund <[email protected]> Reviewed-by: Alvaro Herrera <[email protected]>
2018-04-05Allow on-line enabling and disabling of data checksumsMagnus Hagander
This makes it possible to turn checksums on in a live cluster, without the previous need for dump/reload or logical replication (and to turn it off). Enabling checkusm starts a background process in the form of a launcher/worker combination that goes through the entire database and recalculates checksums on each and every page. Only when all pages have been checksummed are they fully enabled in the cluster. Any failure of the process will revert to checksums off and the process has to be started. This adds a new WAL record that indicates the state of checksums, so the process works across replicated clusters. Authors: Magnus Hagander and Daniel Gustafsson Review: Tomas Vondra, Michael Banck, Heikki Linnakangas, Andrey Borodin
2018-04-02Fix some dubious WAL-parsing code.Tom Lane
Coverity complained about possible buffer overrun in two places added by commit 1eb6d6527, and AFAICS it's reasonable to worry: even granting that the WAL originator properly truncated the commit GID to GIDSIZE, we should not really bet our lives on that having the same value as it does in the current build. Hence, use strlcpy() not strcpy(), and adjust the pointer advancement logic to be sure we skip over the whole source string even if strlcpy() truncated it.
2018-03-28Store 2PC GID in commit/abort WAL recs for logical decodingSimon Riggs
Store GID of 2PC in commit/abort WAL records when wal_level = logical. This allows logical decoding to send the SAME gid to subscribers across restarts of logical replication. Track relica origin replay progress for 2PC. (Edited from patch 0003 in the logical decoding 2PC series.) Authors: Nikhil Sontakke, Stas Kelvich Reviewed-by: Simon Riggs, Andres Freund
2018-01-03Update copyright for 2018Bruce Momjian
Backpatch-through: certain files through 9.3
2017-08-16Remove dedicated B-tree root-split record types.Heikki Linnakangas
Since commit 40dae7ec53, which changed the way b-tree page splitting works, there has been no difference in the handling of root, and non-root split WAL records. We don't need to distinguish them anymore If you're worried about the loss of debugging information, note that usually a root split record will normally be followed by a WAL record to create the new root page. The root page will also have the BTP_ROOT flag set on the page itself, and there is a pointer to it from the metapage. Author: Aleksander Alekseev Discussion: https://www.postgresql.org/message-id/[email protected]
2017-06-21Phase 3 of pgindent updates.Tom Lane
Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/[email protected]
2017-06-21Phase 2 of pgindent updates.Tom Lane
Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/[email protected]
2017-05-17Post-PG 10 beta1 pgindent runBruce Momjian
perltidy run not included.
2017-04-01BRIN de-summarizationAlvaro Herrera
When the BRIN summary tuple for a page range becomes too "wide" for the values actually stored in the table (because the tuples that were present originally are no longer present due to updates or deletes), it can be useful to remove the outdated summary tuple, so that a future summarization can install a tighter summary. This commit introduces a SQL-callable interface to do so. Author: Álvaro Herrera Reviewed-by: Eiji Seki Discussion: https://postgr.es/m/[email protected]
2017-03-27Still more code review for single-page hash vacuuming.Robert Haas
Most seriously, fix use of incorrect block ID, per a report from Jeff Janes that it causes a crash and a diagnosis from Amit Kapila. Improve consistency between the hash and btree versions of this code by adding back a PANIC that btree has, and by registering data in the xlog record in the same way, per complaints from Jeff Janes and Amit Kapila. Tidy up some minor cosmetic points, per complaints from Amit Kapila. Patch by Ashutosh Sharma, reviewed by Amit Kapila, and tested by Jeff Janes. Discussion: http://postgr.es/m/CAMkU=1w-9Qe=Ff1o6bSaXpNO9wqpo7_9GL8_CVhw4BoVVHasqg@mail.gmail.com
2017-03-23Track the oldest XID that can be safely looked up in CLOG.Robert Haas
This provides infrastructure for looking up arbitrary, user-supplied XIDs without a risk of scary-looking failures from within the clog module. Normally, the oldest XID that can be safely looked up in CLOG is the same as the oldest XID that can reused without causing wraparound, and the latter is already tracked. However, while truncation is in progress, the values are different, so we must keep track of them separately. Craig Ringer, reviewed by Simon Riggs and by me. Discussion: http://postgr.es/m/CAMsr+YHQiWNEi0daCTboS40T+V5s_+dst3PYv_8v2wNVH+Xx4g@mail.gmail.com
2017-03-20Fixes for single-page hash index vacuum.Robert Haas
Clear LH_PAGE_HAS_DEAD_TUPLES during replay, similar to what gets done for btree. Update hashdesc.c for xl_hash_vacuum_one_page. Oversights in commit 6977b8b7f4dfb40896ff5e2175cad7fdbda862eb spotted by Amit Kapila. Patch by Ashutosh Sharma. Bump WAL version. The original patch to make hash indexes write-ahead logged probably should have done this, and the single page vacuuming patch probably should have done it again, but better late than never. Discussion: http://postgr.es/m/CAA4eK1Kd=mJ9xreovcsh0qMiAj-QqCphHVQ_Lfau1DR9oVjASQ@mail.gmail.com
2017-03-16Port single-page btree vacuum logic to hash indexes.Robert Haas
This is advantageous for hash indexes for the same reasons it's good for btrees: it accelerates space recycling, reducing bloat. Ashutosh Sharma, reviewed by Amit Kapila and by me. A bit of additional hacking by me. Discussion: http://postgr.es/m/CAE9k0PkRSyzx8dOnokEpUi2A-RFZK72WN0h9DEMv_ut9q6bPRw@mail.gmail.com
2017-03-14hash: Add write-ahead logging support.Robert Haas
The warning about hash indexes not being write-ahead logged and their use being discouraged has been removed. "snapshot too old" is now supported for tables with hash indexes. Most importantly, barring bugs, hash indexes will now be crash-safe and usable on standbys. This commit doesn't yet add WAL consistency checking for hash indexes, as we now have for other index types; a separate patch has been submitted to cure that lack. Amit Kapila, reviewed and slightly modified by me. The larger patch series of which this is a part has been reviewed and tested by Álvaro Herrera, Ashutosh Sharma, Mark Kirkwood, Jeff Janes, and Jesper Pedersen. Discussion: http://postgr.es/m/CAA4eK1JOBX=YU33631Qh-XivYXtPSALh514+jR8XeD7v+K3r_Q@mail.gmail.com
2017-02-14Split index xlog headers from other private index headers.Robert Haas
The xlog-specific headers need to be included in both frontend code - specifically, pg_waldump - and the backend, but the remainder of the private headers for each index are only needed by the backend. By splitting the xlog stuff out into separate headers, pg_waldump pulls in fewer backend headers, which is a good thing. Patch by me, reviewed by Michael Paquier and Andres Freund, per a complaint from Dilip Kumar. Discussion: http://postgr.es/m/CA+TgmoZ=F=GkxV0YEv-A8tb+AEGy_Qa7GSiJ8deBKFATnzfEug@mail.gmail.com
2017-02-09Rename user-facing tools with "xlog" in the name to say "wal".Robert Haas
This means pg_receivexlog because pg_receivewal, pg_resetxlog becomes pg_resetwal, and pg_xlogdump becomes pg_waldump.
2017-02-08Add WAL consistency checking facility.Robert Haas
When the new GUC wal_consistency_checking is set to a non-empty value, it triggers recording of additional full-page images, which are compared on the standby against the results of applying the WAL record (without regard to those full-page images). Allowable differences such as hints are masked out, and the resulting pages are compared; any difference results in a FATAL error on the standby. Kuntal Ghosh, based on earlier patches by Michael Paquier and Heikki Linnakangas. Extensively reviewed and revised by Michael Paquier and by me, with additional reviews and comments from Amit Kapila, Álvaro Herrera, Simon Riggs, and Peter Eisentraut.
2017-01-19Fix race condition in reading commit timestampsAlvaro Herrera
If a user requests the commit timestamp for a transaction old enough that its data is concurrently being truncated away by vacuum at just the right time, they would receive an ugly internal file-not-found error message from slru.c rather than the expected NULL return value. In a primary server, the window for the race is very small: the lookup has to occur exactly between the two calls by vacuum, and there's not a lot that happens between them (mostly just a multixact truncate). In a standby server, however, the window is larger because the truncation is executed as soon as the WAL record for it is replayed, but the advance of the oldest-Xid is not executed until the next checkpoint record. To fix in the primary, simply reverse the order of operations in vac_truncate_clog. To fix in the standby, augment the WAL truncation record so that the standby is aware of the new oldest-XID value and can apply the update immediately. WAL version bumped because of this. No backpatch, because of the low importance of the bug and its rarity. Author: Craig Ringer Reviewed-By: Petr Jelínek, Peter Eisentraut Discussion: https://postgr.es/m/CAMsr+YFhVtRQT1VAwC+WGbbxZZRzNou=N9Ed-FrCqkwQ8H8oJQ@mail.gmail.com
2017-01-03Update copyright via script for 2017Bruce Momjian
2016-12-05Fix incorrect output from gin_desc().Fujii Masao
Previously gin_desc() displayed incorrect output "unknown action 0" for XLOG_GIN_INSERT and XLOG_GIN_VACUUM_DATA_LEAF_PAGE records with valid actions. The cause of this problem was that gin_desc() wrongly used XLogRecGetData() to extract data from those records. Since they were registered by XLogRegisterBufData(), gin_desc() should have used XLogRecGetBlockData(), instead, like gin_redo(). Also there were other differences about how to treat XLOG_GIN_INSERT record between gin_desc() and gin_redo(). This commit fixes gin_desc() routine so that it treats those records in the same way as gin_redo(). Batch-patch to 9.5 where WAL record format was revamped and XLogRegisterBufData() was added. Reported-By: Andres Freund Reviewed-By: Tom Lane Discussion: <[email protected]>
2016-09-23Remove useless code.Tom Lane
Apparent copy-and-pasteo in standby_desc_invalidations() had two entries for msg->id == SHAREDINVALRELMAP_ID. Aleksander Alekseev Discussion: <20160923090814.GB1238@e733>
2016-08-29Split hash.h → hash_xlog.hAlvaro Herrera
Since the hash AM is going to be revamped to have WAL, this is a good opportunity to clean up the include file a little bit to avoid including a lot of extra stuff in the future. Author: Amit Kapila
2016-07-18Clear all-frozen visibilitymap status when locking tuples.Andres Freund
Since a892234 & fd31cd265 the visibilitymap's freeze bit is used to avoid vacuuming the whole relation in anti-wraparound vacuums. Doing so correctly relies on not adding xids to the heap without also unsetting the visibilitymap flag. Tuple locking related code has not done so. To allow selectively resetting all-frozen - to avoid pessimizing heap_lock_tuple - allow to selectively reset the all-frozen with visibilitymap_clear(). To avoid having to use visibilitymap_get_status (e.g. via VM_ALL_FROZEN) inside a critical section, have visibilitymap_clear() return whether any bits have been reset. There's a remaining issue (denoted by XXX): After the PageIsAllVisible() check in heap_lock_tuple() and heap_lock_updated_tuple_rec() the page status could theoretically change. Practically that currently seems impossible, because updaters will hold a page level pin already. Due to the next beta coming up, it seems better to get the required WAL magic bump done before resolving this issue. The added flags field fields to xl_heap_lock and xl_heap_lock_updated require bumping the WAL magic. Since there's already been a catversion bump since the last beta, that's not an issue. Reviewed-By: Robert Haas, Amit Kapila and Andres Freund Author: Masahiko Sawada, heavily revised by Andres Freund Discussion: CAEepm=3fWAbWryVW9swHyLTY4sXVf0xbLvXqOwUoDiNCx9mBjQ@mail.gmail.com Backpatch: -
2016-06-17pg_visibility: Add pg_truncate_visibility_map function.Robert Haas
This requires some core changes as well so that we can properly WAL-log the truncation. Specifically, it changes the format of the XLOG_SMGR_TRUNCATE WAL record, so bump XLOG_PAGE_MAGIC. Patch by me, reviewed but not fully endorsed by Andres Freund.
2016-06-09pgindent run for 9.6Robert Haas
2016-04-27Emit invalidations to standby for transactions without xid.Andres Freund
So far, when a transaction with pending invalidations, but without an assigned xid, committed, we simply ignored those invalidation messages. That's problematic, because those are actually sent for a reason. Known symptoms of this include that existing sessions on a hot-standby replica sometimes fail to notice new concurrently built indexes and visibility map updates. The solution is to WAL log such invalidations in transactions without an xid. We considered to alternatively force-assign an xid, but that'd be problematic for vacuum, which might be run in systems with few xids. Important: This adds a new WAL record, but as the patch has to be back-patched, we can't bump the WAL page magic. This means that standbys have to be updated before primaries; otherwise "PANIC: standby_redo: unknown op code 32" errors can be encountered. XXX: Reported-By: Васильев Дмитрий, Masahiko Sawada Discussion: CAB-SwXY6oH=9twBkXJtgR4UC1NqT-vpYAtxCseME62ADwyK5OA@mail.gmail.com CAD21AoDpZ6Xjg=gFrGPnSn4oTRRcwK1EBrWCq9OqOHuAcMMC=w@mail.gmail.com
2016-04-12Correct copyright for newly added genericdesc.cStephen Frost
It's 2016 these days (no, not entirely sure how we got here either). Pointed out by Amit Langote
2016-04-06Generic Messages for Logical DecodingSimon Riggs
API and mechanism to allow generic messages to be inserted into WAL that are intended to be read by logical decoding plugins. This commit adds an optional new callback to the logical decoding API. Messages are either text or bytea. Messages can be transactional, or not, and are identified by a prefix to allow multiple concurrent decoding plugins. (Not to be confused with Generic WAL records, which are intended to allow crash recovery of extensible objects.) Author: Petr Jelinek and Andres Freund Reviewers: Artur Zakirov, Tomas Vondra, Simon Riggs Discussion: [email protected]
2016-04-01Add Generic WAL interfaceTeodor Sigaev
This interface is designed to give an access to WAL for extensions which could implement new access method, for example. Previously it was impossible because restoring from custom WAL would need to access system catalog to find a redo custom function. This patch suggests generic way to describe changes on page with standart layout. Bump XLOG_PAGE_MAGIC because of new record type. Author: Alexander Korotkov with a help of Petr Jelinek, Markus Nullmeier and minor editorization by my Reviewers: Petr Jelinek, Alvaro Herrera, Teodor Sigaev, Jim Nasby, Michael Paquier
2016-03-18Merge wal_level "archive" and "hot_standby" into new name "replica"Peter Eisentraut
The distinction between "archive" and "hot_standby" existed only because at the time "hot_standby" was added, there was some uncertainty about stability. This is now a long time ago. We would like to move forward with simplifying the replication configuration, but this distinction is in the way, because a primary server cannot tell (without asking a standby or predicting the future) which one of these would be the appropriate level. Pick a new name for the combined setting to make it clearer that it covers all (non-logical) backup and replication uses. The old values are still accepted but are converted internally. Reviewed-by: Michael Paquier <[email protected]> Reviewed-by: David Steele <[email protected]>
2016-03-08Add new flags argument for xl_heap_visible to heap2_desc.Robert Haas
Masahiko Sawada
2016-02-12Change delimiter used for display of NextXIDJoe Conway
NextXID has been rendered in the form of a pg_lsn even though it really is not. This can cause confusion, so change the format from %u/%u to %u:%u, per discussion on hackers. Complaint by me, patch by me and Bruce, reviewed by Michael Paquier and Alvaro. Applied to HEAD only. Author: Joe Conway, Bruce Momjian Reviewed-by: Michael Paquier, Alvaro Herrera Backpatch-through: master
2016-01-21Refactor headers to split out standby defsSimon Riggs
Jeff Janes
2016-01-09Revoke change to rmgr desc of btree vacuumSimon Riggs
Per discussion with Andres Freund
2016-01-09Avoid pin scan for replay of XLOG_BTREE_VACUUMSimon Riggs
Replay of XLOG_BTREE_VACUUM during Hot Standby was previously thought to require complex interlocking that matched the requirements on the master. This required an O(N) operation that became a significant problem with large indexes, causing replication delays of seconds or in some cases minutes while the XLOG_BTREE_VACUUM was replayed. This commit skips the “pin scan” that was previously required, by observing in detail when and how it is safe to do so, with full documentation. The pin scan is skipped only in replay; the VACUUM code path on master is not touched here. The current commit still performs the pin scan for toast indexes, though this can also be avoided if we recheck scans on toast indexes. Later patch will address this. No tests included. Manual tests using an additional patch to view WAL records and their timing have shown the change in WAL records and their handling has successfully reduced replication delay.
2016-01-02Update copyright for 2016Bruce Momjian
Backpatch certain files through 9.1
2015-12-28Rename (new|old)estCommitTs to (new|old)estCommitTsXidJoe Conway
The variables newestCommitTs and oldestCommitTs sound as if they are timestamps, but in fact they are the transaction Ids that correspond to the newest and oldest timestamps rather than the actual timestamps. Rename these variables to reflect that they are actually xids: to wit newestCommitTsXid and oldestCommitTsXid respectively. Also modify related code in a similar fashion, particularly the user facing output emitted by pg_controldata and pg_resetxlog. Complaint and patch by me, review by Tom Lane and Alvaro Herrera. Backpatch to 9.5 where these variables were first introduced.
2015-11-19Fix typo in comment.Robert Haas
Amit Langote
2015-09-29Code review for transaction commit timestampsAlvaro Herrera
There are three main changes here: 1. No longer cause a start failure in a standby if the feature is disabled in postgresql.conf but enabled in the master. This reverts one part of commit 4f3924d9cd43; what we keep is the ability of the standby to activate/deactivate the module (which includes creating and removing segments as appropriate) during replay of such actions in the master. 2. Replay WAL records affecting commitTS even if the feature is disabled. This means the standby will always have the same state as the master after replay. 3. Have COMMIT PREPARE record the transaction commit time as well. We were previously only applying it in the normal transaction commit path. Author: Petr Jelínek Discussion: http://www.postgresql.org/message-id/CAHGQGwHereDzzzmfxEBYcVQu3oZv6vZcgu1TPeERWbDc+gQ06g@mail.gmail.com Discussion: http://www.postgresql.org/message-id/CAHGQGwFuzfO4JscM9LCAmCDCxp_MfLvN4QdB+xWsS-FijbjTYQ@mail.gmail.com Additionally, I cleaned up nearby code related to replication origins, which I found a bit hard to follow, and fixed a couple of typos. Backpatch to 9.5, where this code was introduced. Per bug reports from Fujii Masao and subsequent discussion.
2015-09-26Rework the way multixact truncations work.Andres Freund
The fact that multixact truncations are not WAL logged has caused a fair share of problems. Amongst others it requires to do computations during recovery while the database is not in a consistent state, delaying truncations till checkpoints, and handling members being truncated, but offset not. We tried to put bandaids on lots of these issues over the last years, but it seems time to change course. Thus this patch introduces WAL logging for multixact truncations. This allows: 1) to perform the truncation directly during VACUUM, instead of delaying it to the checkpoint. 2) to avoid looking at the offsets SLRU for truncation during recovery, we can just use the master's values. 3) simplify a fair amount of logic to keep in memory limits straight, this has gotten much easier During the course of fixing this a bunch of additional bugs had to be fixed: 1) Data was not purged from memory the member's SLRU before deleting segments. This happened to be hard or impossible to hit due to the interlock between checkpoints and truncation. 2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but that doesn't work for offsets that haven't yet been flushed to disk. Add code to flush the SLRUs to fix. Not pretty, but it feels slightly safer to only make decisions based on actual on-disk state. 3) find_multixact_start() could be called concurrently with a truncation and thus fail. Via SetOffsetVacuumLimit() that could lead to a round of emergency vacuuming. The problem remains in pg_get_multixact_members(), but that's quite harmless. For now this is going to only get applied to 9.5+, leaving the issues in the older branches in place. It is quite possible that we need to backpatch at a later point though. For the case this gets backpatched we need to handle that an updated standby may be replaying WAL from a not-yet upgraded primary. We have to recognize that situation and use "old style" truncation (i.e. looking at the SLRUs) during WAL replay. In contrast to before, this now happens in the startup process, when replaying a checkpoint record, instead of the checkpointer. Doing truncation in the restartpoint is incorrect, they can happen much later than the original checkpoint, thereby leading to wraparound. To avoid "multixact_redo: unknown op code 48" errors standbys would have to be upgraded before primaries. A later patch will bump the WAL page magic, and remove the legacy truncation codepaths. Legacy truncation support is just included to make a possible future backpatch easier. Discussion: [email protected] Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro Backpatch: 9.5 for now