author      Peter Geoghegan    2021-03-21 22:25:39 +0000
committer   Peter Geoghegan    2021-03-21 22:25:39 +0000
commit      9dd963ae2534e9614f0abeccaafbd39f1b93ff8a
tree        0f3d448f2e5ad78b14eca30a2d90273676fbeaf0 /src/backend/access/nbtree/nbtree.c
parent      4d399a6fbeb720b34d33441330910b7d853f703d
Recycle nbtree pages deleted during same VACUUM.
Maintain a simple array of metadata about pages that were deleted during
nbtree VACUUM's current btvacuumscan() call. Use this metadata at the
end of btvacuumscan() to attempt to place newly deleted pages in the FSM
without further delay. It might not yet be safe to place any of the
pages in the FSM by then (they may not be deemed recyclable), but we
have little to lose and plenty to gain by trying. In practice there is
a very good chance that this will work out when vacuuming larger
indexes, where scanning the index naturally takes quite a while.
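Concretely, the bookkeeping amounts to appending one small record per newly deleted page to a growable array kept in the VACUUM's state. The following is a minimal sketch of that idea, assuming the BTPendingFSM record this commit introduces (fields approximated) and simplifying the real _bt_pendingfsm_add() helper: the array state is passed as bare pointers here rather than through BTVacState, and the work_mem-derived cap is reduced to a single maxbufsize argument.

#include "postgres.h"
#include "access/transam.h"
#include "storage/block.h"

/* One record per page deleted by the current btvacuumscan() call */
typedef struct BTPendingFSM
{
    BlockNumber target;         /* page deleted by current VACUUM */
    FullTransactionId safexid;  /* recycling unsafe until this XID ages out */
} BTPendingFSM;

/*
 * Sketch of recording a newly deleted page.  The array doubles as
 * needed but never grows past maxbufsize (derived from work_mem in
 * the real code); once full, later pages are simply forgotten and
 * left for a future VACUUM to place in the FSM.
 */
static void
pendingfsm_add_sketch(BTPendingFSM **pages, int *npages, int *bufsize,
                      int maxbufsize, BlockNumber target,
                      FullTransactionId safexid)
{
    if (*npages == *bufsize)
    {
        /* start small, then double, but always respect the cap */
        int         newbufsize = Max(*bufsize * 2, 256);

        newbufsize = Min(newbufsize, maxbufsize);
        if (newbufsize == *bufsize)
            return;             /* can't grow: drop this page on the floor */
        *bufsize = newbufsize;
        *pages = repalloc(*pages, sizeof(BTPendingFSM) * newbufsize);
    }
    (*pages)[*npages].target = target;
    (*pages)[*npages].safexid = safexid;
    (*npages)++;
}

Forgetting overflow pages is safe because it only delays recycling; the next VACUUM can still pick those pages up, exactly as before this commit.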
This commit doesn't change the page recycling invariants; it merely
improves the efficiency of page recycling within the confines of the
existing design. Recycle safety is a part of nbtree's implementation of
what Lanin & Shasha call "the drain technique". The design happens to
use transaction IDs (they're stored in deleted pages), but that in
itself doesn't align the cutoff for recycle safety to any of the
XID-based cutoffs used by VACUUM (e.g., OldestXmin). All that matters
is whether or not _other_ backends might be able to observe various
inconsistencies in the tree structure (that they cannot just detect and
recover from by moving right). Recycle safety is purely a question of
maintaining the consistency (or the apparent consistency) of a physical
data structure.
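In code terms, the safety question reduces to a single visibility test against the XID stamped into the deleted page. Below is a hedged sketch of the shape of that test, assuming the PG 14 GlobalVisCheckRemovableFullXid() interface that nbtree's recyclability check relies on in this release; the function name here is invented to mark it as a sketch.

#include "postgres.h"
#include "access/transam.h"
#include "utils/snapmgr.h"      /* GlobalVisCheckRemovableFullXid() */

/*
 * Sketch of the recycle-safety test.  safexid is the XID stamped into
 * the page when it was deleted.  Recycling is safe only once every
 * backend is guaranteed to see safexid as "definitely not running":
 * at that point no in-flight index scan can still hold a stale link
 * (sibling link or downlink) leading to the deleted page.
 */
static bool
recycle_safe_sketch(FullTransactionId safexid)
{
    /* NULL relation: fall back on the most conservative horizon */
    return GlobalVisCheckRemovableFullXid(NULL, safexid);
}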
Note that running a simple serial test case involving a large range
DELETE followed by a VACUUM VERBOSE will probably show that any newly
deleted nbtree pages are not yet reusable/recyclable. This is expected
in the absence of even one concurrent XID assignment. It is an old
implementation restriction. In practice it's unlikely to be the thing
that makes recycling remain unsafe, at least with larger indexes, where
recycling newly deleted pages during the same VACUUM actually matters.
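The restriction follows from how the safe-to-recycle XID is generated at deletion time. A sketch of the idea (the real stamping happens inside nbtree's page deletion code in nbtpage.c; the function name here is invented):

#include "postgres.h"
#include "access/transam.h"     /* ReadNextFullTransactionId() */

/*
 * Sketch: a page deleted by VACUUM is stamped with the next XID that
 * will be assigned.  No visibility horizon can advance past this value
 * until at least one newer XID is actually assigned somewhere, so a
 * strictly serial DELETE + VACUUM session always sees its own newly
 * deleted pages as not-yet-recyclable.
 */
static FullTransactionId
stamp_safexid_sketch(void)
{
    return ReadNextFullTransactionId();
}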
An important high-level goal of this commit (as well as related recent
commits e5d8a999 and 9f3665fb) is to make expensive deferred cleanup
operations in index AMs rare in general. If index vacuuming frequently
depends on the next VACUUM operation finishing off work that the current
operation started, then the general behavior of index vacuuming is hard
to predict. This is relevant to ongoing work that adds a vacuumlazy.c
mechanism to skip index vacuuming in certain cases. Anything that makes
the real world behavior of index vacuuming simpler and more linear will
also make top-down modeling in vacuumlazy.c more robust.
Author: Peter Geoghegan <[email protected]>
Reviewed-By: Masahiko Sawada <[email protected]>
Discussion: https://postgr.es/m/CAH2-Wzk76_P=67iUscb1UN44-gyZL-KgpsXbSxq_bdcMa7Q+wQ@mail.gmail.com
Diffstat (limited to 'src/backend/access/nbtree/nbtree.c')
-rw-r--r--   src/backend/access/nbtree/nbtree.c   34
1 file changed, 22 insertions, 12 deletions
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index c02c4e7710d..9282c9ea22f 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -859,9 +859,13 @@ btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 	 * Maintain num_delpages value in metapage for _bt_vacuum_needs_cleanup().
 	 *
 	 * num_delpages is the number of deleted pages now in the index that were
-	 * not safe to place in the FSM to be recycled just yet.  We expect that
-	 * it will almost certainly be possible to place all of these pages in the
-	 * FSM during the next VACUUM operation.
+	 * not safe to place in the FSM to be recycled just yet.  num_delpages is
+	 * greater than 0 only when _bt_pagedel() actually deleted pages during
+	 * our call to btvacuumscan().  Even then, _bt_pendingfsm_finalize() must
+	 * have failed to place any newly deleted pages in the FSM just moments
+	 * ago.  (Actually, there are edge cases where recycling of the current
+	 * VACUUM's newly deleted pages does not even become safe by the time the
+	 * next VACUUM comes around.  See nbtree/README.)
 	 */
 	Assert(stats->pages_deleted >= stats->pages_free);
 	num_delpages = stats->pages_deleted - stats->pages_free;
@@ -937,6 +941,14 @@ btvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 												  "_bt_pagedel",
 												  ALLOCSET_DEFAULT_SIZES);
 
+	/* Initialize vstate fields used by _bt_pendingfsm_finalize */
+	vstate.bufsize = 0;
+	vstate.maxbufsize = 0;
+	vstate.pendingpages = NULL;
+	vstate.npendingpages = 0;
+	/* Consider applying _bt_pendingfsm_finalize optimization */
+	_bt_pendingfsm_init(rel, &vstate, (callback == NULL));
+
 	/*
 	 * The outer loop iterates over all index pages except the metapage, in
 	 * physical order (we hope the kernel will cooperate in providing
@@ -995,17 +1007,15 @@ btvacuumscan(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	MemoryContextDelete(vstate.pagedelcontext);
 
 	/*
-	 * If we found any recyclable pages (and recorded them in the FSM), then
-	 * forcibly update the upper-level FSM pages to ensure that searchers can
-	 * find them.  It's possible that the pages were also found during
-	 * previous scans and so this is a waste of time, but it's cheap enough
-	 * relative to scanning the index that it shouldn't matter much, and
-	 * making sure that free pages are available sooner not later seems
-	 * worthwhile.
+	 * If there were any calls to _bt_pagedel() during scan of the index then
+	 * see if any of the resulting pages can be placed in the FSM now.  When
+	 * it's not safe we'll have to leave it up to a future VACUUM operation.
 	 *
-	 * Note that if no recyclable pages exist, we don't bother vacuuming the
-	 * FSM at all.
+	 * Finally, if we placed any pages in the FSM (either just now or during
+	 * the scan), forcibly update the upper-level FSM pages to ensure that
+	 * searchers can find them.
 	 */
+	_bt_pendingfsm_finalize(rel, &vstate);
 	if (stats->pages_free > 0)
 		IndexFreeSpaceMapVacuum(rel);
 }
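The two new calls above go to functions defined elsewhere in the commit (outside this file, so not shown by the path-limited diff). The following is a condensed, hedged sketch of the finalize step, not the actual implementation: the function is renamed to mark it as a sketch, and error/assert handling is omitted. It walks the pending array in deletion order, so the stamped safexid values are nondecreasing and the loop can stop at the first page that is not yet safe.

#include "postgres.h"
#include "access/nbtree.h"      /* BTVacState, BTPendingFSM */
#include "storage/indexfsm.h"   /* RecordFreeIndexPage() */
#include "utils/snapmgr.h"      /* GlobalVisCheckRemovableFullXid() */

/*
 * Condensed sketch of the finalize step: try to put pages deleted by
 * this btvacuumscan() call in the FSM right away, instead of leaving
 * all of them to the next VACUUM operation.
 */
static void
pendingfsm_finalize_sketch(Relation rel, BTVacState *vstate)
{
    IndexBulkDeleteResult *stats = vstate->stats;

    if (vstate->npendingpages == 0)
    {
        /* optimization not in use, or no pages were deleted */
        if (vstate->pendingpages)
            pfree(vstate->pendingpages);
        return;
    }

    for (int i = 0; i < vstate->npendingpages; i++)
    {
        BlockNumber target = vstate->pendingpages[i].target;
        FullTransactionId safexid = vstate->pendingpages[i].safexid;

        /*
         * Pages were appended in deletion order, so once one safexid
         * fails the visibility test, all later ones must fail too.
         */
        if (!GlobalVisCheckRemovableFullXid(NULL, safexid))
            break;

        /* Safe: make the deleted page available for reuse */
        RecordFreeIndexPage(rel, target);
        stats->pages_free++;
    }

    pfree(vstate->pendingpages);
}

Because every recycled page increments stats->pages_free, the existing "if (stats->pages_free > 0) IndexFreeSpaceMapVacuum(rel)" call at the end of btvacuumscan() then publishes the newly freed pages in the upper-level FSM pages, exactly as the rewritten comment in the diff describes.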