summaryrefslogtreecommitdiff
path: root/src/backend/commands/vacuumparallel.c
AgeCommit message (Collapse)Author
2025-02-11Add cost-based vacuum delay time to progress views.Nathan Bossart
This commit adds the amount of time spent sleeping due to cost-based delay to the pg_stat_progress_vacuum and pg_stat_progress_analyze system views. A new configuration parameter named track_cost_delay_timing, which is off by default, controls whether this information is gathered. For vacuum, the reported value includes the sleep time of any associated parallel workers. However, parallel workers only report their sleep time once per second to avoid overloading the leader process. Bumps catversion. Author: Bertrand Drouvot <[email protected]> Co-authored-by: Nathan Bossart <[email protected]> Reviewed-by: Sami Imseih <[email protected]> Reviewed-by: Robert Haas <[email protected]> Reviewed-by: Masahiko Sawada <[email protected]> Reviewed-by: Masahiro Ikeda <[email protected]> Reviewed-by: Dilip Kumar <[email protected]> Reviewed-by: Sergei Kornilov <[email protected]> Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
2025-01-31Doc: add commentary about cowboy assignment of maintenance_work_mem.Tom Lane
Whilst working on commit 041e8b95b I happened to notice that parallel_vacuum_main() assigns directly to the maintenance_work_mem GUC. This is definitely not per project conventions, so I tried to fix it to use SetConfigOption(). But that fails with "parameter cannot be set during a parallel operation". It doesn't seem worth working on a cleaner answer, at least not till we have a few more instances of similar problems. But add some commentary, just so nobody gets the idea that this is an approved way to set a GUC.
2025-01-31Get rid of our dependency on type "long" for memory size calculations.Tom Lane
Consistently use "Size" (or size_t, or in some places int64 or double) as the type for variables holding memory allocation sizes. In most places variables' data types were fine already, but we had an ancient habit of computing bytes from kilobytes-units GUCs with code like "work_mem * 1024L". That risks overflow on Win64 where they did not make "long" as wide as "size_t". We worked around that by restricting such GUCs' ranges, so you couldn't set work_mem et al higher than 2GB on Win64. This patch removes that restriction, after replacing such calculations with "work_mem * (Size) 1024" or variants of that. It should be noted that this patch was constructed by searching outwards from the GUCs that have MAX_KILOBYTES as upper limit. So I can't positively guarantee there are no other places doing memory-size arithmetic in int or long variables. I do however feel pretty confident that increasing MAX_KILOBYTES on Win64 is safe now. Also, nothing in our code should be dealing in multiple-gigabyte allocations without authorization from a relevant GUC, so it seems pretty likely that this search caught everything that could be at risk of overflow. Author: Vladlen Popolitov <[email protected]> Co-authored-by: Tom Lane <[email protected]> Discussion: https://postgr.es/m/1a01f0-66ec2d80-3b-68487680@27595217
2025-01-01Update copyright for 2025Bruce Momjian
Backpatch-through: 13
2024-12-04Fix use-after-free in parallel_vacuum_reset_dead_itemsJohn Naylor
parallel_vacuum_reset_dead_items used a local variable to hold a pointer from the passed vacrel, purely as a shorthand. This pointer was later freed and a new allocation was made and stored to the struct. Then the local pointer was mistakenly referenced again. This apparently happened not to break anything since the freed chunk would have been put on the context's freelist, so it was accidentally the same pointer anyway, in which case the DSA handle was correctly updated. The minimal fix is to change two places so they access dead_items through the vacrel. This coding style is a maintenance hazard, so while at it get rid of most other similar usages, which were inconsistently used anyway. Analysis and patch by Vallimaharajan G, with further defensive coding by me Backpath to v17, when TidStore came in Discussion: https://postgr.es/m/[email protected]
2024-09-29Set query ID in parallel workers for vacuum, BRIN and btreeMichael Paquier
All these code paths use their own entry point when starting parallel workers, but failed to set a query ID, even if they set a text query. Hence, this data would be missed in pg_stat_activity for the worker processes. The main entry point for parallel query processing, ParallelQueryMain(), is already doing that by saving its query ID in a dummy PlannedStmt, but not the others. The code is changed so as the query ID of these queries is set in their shared state, and reported back once the parallel workers start. Some tests are added to show how the failures can happen for btree and BRIN with a parallel build enforced, which are able to trigger a failure in an assertion added by 24f520594809 in the recovery TAP test 027_stream_regress.pl where pg_stat_statements is always loaded. In this case, the executor path was taken because the index expression needs to be flattened when building its IndexInfo. Alexander Lakhin has noticed the problem in btree, and I have noticed that the issue was more spread. This is arguably a bug, but nobody has complained about that until now, so no backpatch is done out of caution. If folks would like to see a backpatch, well, let me know. Reported-by: Alexander Lakhin Reviewed-by: Sami Imseih Discussion: https://postgr.es/m/[email protected]
2024-08-14Use pgBufferUsage for buffer usage tracking in analyze.Masahiko Sawada
Previously, (auto)analyze used global variables VacuumPageHit, VacuumPageMiss, and VacuumPageDirty to track buffer usage. However, pgBufferUsage provides a more generic way to track buffer usage with support functions. This change replaces those global variables with pgBufferUsage in analyze. Since analyze was the sole user of those variables, it removes their declarations. Vacuum previously used those variables but replaced them with pgBufferUsage as part of a bug fix, commit 5cd72cc0c. Additionally, it adjusts the buffer usage message in both vacuum and analyze for better consistency. Author: Anthonin Bonnefoy Reviewed-by: Masahiko Sawada, Michael Paquier Discussion: https://postgr.es/m/CAO6_Xqr__kTTCLkftqS0qSCm-J7_xbRG3Ge2rWhucxQJMJhcRA%40mail.gmail.com
2024-04-18Fix typos and duplicate wordsDaniel Gustafsson
This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <[email protected]> Author: Heikki Linnakangas <[email protected]> Author: Alexander Lakhin <[email protected]> Author: David Rowley <[email protected]> Author: Nazir Bilal Yavuz <[email protected]> Discussion: https://postgr.es/m/[email protected]
2024-04-02Use TidStore for dead tuple TIDs storage during lazy vacuum.Masahiko Sawada
Previously, we used a simple array for storing dead tuple IDs during lazy vacuum, which had a number of problems: * The array used a single allocation and so was limited to 1GB. * The allocation was pessimistically sized according to table size. * Lookup with binary search was slow because of poor CPU cache and branch prediction behavior. This commit replaces that array with the TID store from commit 30e144287a. Since the backing radix tree makes small allocations as needed, the 1GB limit is now gone. Further, the total memory used is now often smaller by an order of magnitude or more, depending on the distribution of blocks and offsets. These two features should make multiple rounds of heap scanning and index cleanup an extremely rare event. TID lookup during index cleanup is also several times faster, even more so when index order is correlated with heap tuple order. Since there is no longer a predictable relationship between the number of dead tuples vacuumed and the space taken up by their TIDs, the number of tuples no longer provides any meaningful insights for users, nor is the maximum number predictable. For that reason this commit also changes to byte-based progress reporting, with the relevant columns of pg_stat_progress_vacuum renamed accordingly to max_dead_tuple_bytes and dead_tuple_bytes. For parallel vacuum, both the TID store and supplemental information specific to vacuum are shared among the parallel vacuum workers. As with the previous array, we don't take any locks on TidStore during parallel vacuum since writes are still only done by the leader process. Bump catalog version. Reviewed-by: John Naylor, (in an earlier version) Dilip Kumar Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com
2024-03-04Remove unused #include's from backend .c filesPeter Eisentraut
as determined by include-what-you-use (IWYU) While IWYU also suggests to *add* a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
2024-01-04Update copyright for 2024Bruce Momjian
Reported-by: Michael Paquier Discussion: https://postgr.es/m/[email protected] Backpatch-through: 12
2023-10-26Add trailing commas to enum definitionsPeter Eisentraut
Since C99, there can be a trailing comma after the last value in an enum definition. A lot of new code has been introducing this style on the fly. Some new patches are now taking an inconsistent approach to this. Some add the last comma on the fly if they add a new last value, some are trying to preserve the existing style in each place, some are even dropping the last comma if there was one. We could nudge this all in a consistent direction if we just add the trailing commas everywhere once. I omitted a few places where there was a fixed "last" value that will always stay last. I also skipped the header files of libpq and ecpg, in case people want to use those with older compilers. There were also a small number of cases where the enum type wasn't used anywhere (but the enum values were), which ended up confusing pgindent a bit, so I left those alone. Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
2023-07-11Report index vacuum progress.Masahiko Sawada
This commit adds two columns: indexes_total and indexes_processed, to pg_stat_progress_vacuum system view to show the index vacuum progress. These numbers are reported in the "vacuuming indexes" and "cleaning up indexes" phases. This uses the new parallel message type for progress reporting added by be06506e7. Bump catversion because this changes the definition of pg_stat_progress_vacuum. Author: Sami Imseih Reviewed by: Masahiko Sawada, Michael Paquier, Nathan Bossart, Andres Freund Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2023-04-19Fix various typos and incorrect/outdated name referencesDavid Rowley
Author: Alexander Lakhin Discussion: https://postgr.es/m/[email protected]
2023-04-06Add VACUUM/ANALYZE BUFFER_USAGE_LIMIT optionDavid Rowley
Add new options to the VACUUM and ANALYZE commands called BUFFER_USAGE_LIMIT to allow users more control over how large to make the buffer access strategy that is used to limit the usage of buffers in shared buffers. Larger rings can allow VACUUM to run more quickly but have the drawback of VACUUM possibly evicting more buffers from shared buffers that might be useful for other queries running on the database. Here we also add a new GUC named vacuum_buffer_usage_limit which controls how large to make the access strategy when it's not specified in the VACUUM/ANALYZE command. This defaults to 256KB, which is the same size as the access strategy was prior to this change. This setting also controls how large to make the buffer access strategy for autovacuum. Per idea by Andres Freund. Author: Melanie Plageman Reviewed-by: David Rowley Reviewed-by: Andres Freund Reviewed-by: Justin Pryzby Reviewed-by: Bharath Rupireddy Discussion: https://postgr.es/m/[email protected]
2023-04-06Refresh cost-based delay params more frequently in autovacuumDaniel Gustafsson
Allow autovacuum to reload the config file more often so that cost-based delay parameters can take effect while VACUUMing a relation. Previously, autovacuum workers only reloaded the config file once per relation vacuumed, so config changes could not take effect until beginning to vacuum the next table. Now, check if a reload is pending roughly once per block, when checking if we need to delay. In order for autovacuum workers to safely update their own cost delay and cost limit parameters without impacting performance, we had to rethink when and how these values were accessed. Previously, an autovacuum worker's wi_cost_limit was set only at the beginning of vacuuming a table, after reloading the config file. Therefore, at the time that autovac_balance_cost() was called, workers vacuuming tables with no cost-related storage parameters could still have different values for their wi_cost_limit_base and wi_cost_delay. Now that the cost parameters can be updated while vacuuming a table, workers will (within some margin of error) have no reason to have different values for cost limit and cost delay (in the absence of cost-related storage parameters). This removes the rationale for keeping cost limit and cost delay in shared memory. Balancing the cost limit requires only the number of active autovacuum workers vacuuming a table with no cost-based storage parameters. Author: Melanie Plageman <[email protected]> Reviewed-by: Masahiko Sawada <[email protected]> Reviewed-by: Daniel Gustafsson <[email protected]> Reviewed-by: Kyotaro Horiguchi <[email protected]> Reviewed-by: Robert Haas <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com
2023-04-06Separate vacuum cost variables from GUCsDaniel Gustafsson
Vacuum code run both by autovacuum workers and a backend doing VACUUM/ANALYZE previously inspected VacuumCostLimit and VacuumCostDelay, which are the global variables backing the GUCs vacuum_cost_limit and vacuum_cost_delay. Autovacuum workers needed to override these variables with their own values, derived from autovacuum_vacuum_cost_limit and autovacuum_vacuum_cost_delay and worker cost limit balancing logic. This led to confusing code which, in some cases, both derived and set a new value of VacuumCostLimit from VacuumCostLimit. In preparation for refreshing these GUC values more often, introduce new, independent global variables and add a function to update them using the GUCs and existing logic. Per suggestion by Kyotaro Horiguchi Author: Melanie Plageman <[email protected]> Reviewed-by: Masahiko Sawada <[email protected]> Reviewed-by: Daniel Gustafsson <[email protected]> Reviewed-by: Kyotaro Horiguchi <[email protected]> Reviewed-by: Robert Haas <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ZngzqnEODc7LmS1NH04Kt6Y9huSjz5pp7%2BDXhrjDA0gw%40mail.gmail.com
2023-04-03Move heaprel struct field next to index rel field.Peter Geoghegan
Commit 61b313e4 added a heaprel struct member to IndexVacuumInfo, but placed it last. Move the heaprel struct member next to the index struct member to improve the code's readability. Author: Peter Geoghegan <[email protected]> Discussion: https://postgr.es/m/CAH2-WznG=TV6S9d3VA=y0vBHbXwnLs9_LLdiML=aNJuHeriwxg@mail.gmail.com
2023-04-02Pass down table relation into more index relation functionsAndres Freund
This is done in preparation for logical decoding on standby, which needs to include whether visibility affecting WAL records are about a (user) catalog table. Which is only known for the table, not the indexes. It's also nice to be able to pass the heap relation to GlobalVisTestFor() in vacuumRedirectAndPlaceholder(). Author: "Drouvot, Bertrand" <[email protected]> Discussion: https://postgr.es/m/[email protected]
2023-01-02Update copyright for 2023Bruce Momjian
Backpatch-through: 11
2022-08-22Use logical operator && instead of & in vacuumparallel.c.Amit Kapila
As such the current usage of & won't produce incorrect results but it would be better to use && to short-circuit the evaluation of second condition when the same is not required. Author: Ranier Vilela Reviewed-by: Tom Lane, Bharath Rupireddy Backpatch-through: 15, where it was introduced Discussion: https://postgr.es/m/CAEudQApL8QcoYwQuutkWKY_h7gBY8F0Xs34YKfc7-G0i83K_pw@mail.gmail.com
2022-08-19Initialize index stats during parallel VACUUM.Peter Geoghegan
Initialize shared memory allocated for index stats to avoid a hard crash. This was possible when parallel VACUUM became confused about the current phase of index processing. Oversight in commit 8e1fae1938, which refactored parallel VACUUM. Author: Masahiko Sawada <[email protected]> Reported-By: Justin Pryzby <[email protected]> Discussion: https://postgr.es/m/[email protected] Backpatch: 15-, the first version with the refactoring commit.
2022-05-12Pre-beta mechanical code beautification.Tom Lane
Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.
2022-04-11Fix various typos and spelling mistakes in code commentsDavid Rowley
Author: Justin Pryzby Discussion: https://postgr.es/m/[email protected]
2022-04-07pgstat: store statistics in shared memory.Andres Freund
Previously the statistics collector received statistics updates via UDP and shared statistics data by writing them out to temporary files regularly. These files can reach tens of megabytes and are written out up to twice a second. This has repeatedly prevented us from adding additional useful statistics. Now statistics are stored in shared memory. Statistics for variable-numbered objects are stored in a dshash hashtable (backed by dynamic shared memory). Fixed-numbered stats are stored in plain shared memory. The header for pgstat.c contains an overview of the architecture. The stats collector is not needed anymore, remove it. By utilizing the transactional statistics drop infrastructure introduced in a prior commit statistics entries cannot "leak" anymore. Previously leaked statistics were dropped by pgstat_vacuum_stat(), called from [auto-]vacuum. On systems with many small relations pgstat_vacuum_stat() could be quite expensive. Now that replicas drop statistics entries for dropped objects, it is not necessary anymore to reset stats when starting from a cleanly shut down replica. Subsequent commits will perform some further code cleanup, adapt docs and add tests. Bumps PGSTAT_FILE_FORMAT_ID. Author: Kyotaro Horiguchi <[email protected]> Author: Andres Freund <[email protected]> Author: Melanie Plageman <[email protected]> Reviewed-By: Andres Freund <[email protected]> Reviewed-By: Thomas Munro <[email protected]> Reviewed-By: Justin Pryzby <[email protected]> Reviewed-By: "David G. Johnston" <[email protected]> Reviewed-By: Tomas Vondra <[email protected]> (in a much earlier version) Reviewed-By: Arthur Zakirov <[email protected]> (in a much earlier version) Reviewed-By: Antonin Houska <[email protected]> (in a much earlier version) Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/[email protected]
2022-01-15Unify VACUUM VERBOSE and autovacuum logging.Peter Geoghegan
The log_autovacuum_min_duration instrumentation used its own dedicated code for logging, which was not reused by VACUUM VERBOSE. This was highly duplicative, and sometimes led to each code path using slightly different accounting for essentially the same information. Clean things up by making VACUUM VERBOSE reuse the same instrumentation code. This code restructuring changes the structure of the VACUUM VERBOSE output itself, but that seems like an overall improvement. The most noticeable change in VACUUM VERBOSE output is that it no longer outputs a distinct message per index per round of index vacuuming. Most of the same information (about each index) is now shown in its new per-operation summary message. This is far more legible. A few details are no longer displayed by VACUUM VERBOSE, but that's no real loss in practice, especially in the common case where we don't need multiple index scans/rounds of vacuuming. This super fine-grained information is still available via DEBUG2 messages, which might still be useful in debugging scenarios. VACUUM VERBOSE now shows new instrumentation, which is typically very useful: all of the log_autovacuum_min_duration instrumentation that it missed out on before now. This includes information about WAL overhead, buffers hit/missed/dirtied information, and I/O timing information. VACUUM VERBOSE still retains a few INFO messages of its own. This is limited to output concerning the progress of heap rel truncation, as well as some basic information about parallel workers. These details are still potentially quite useful. They aren't a good fit for the log output, which must summarize the whole operation. Author: Peter Geoghegan <[email protected]> Reviewed-By: Masahiko Sawada <[email protected]> Reviewed-By: Andres Freund <[email protected]> Discussion: https://postgr.es/m/CAH2-WzmW4Me7_qR4X4ka7pxP-jGmn7=Npma_-Z-9Y1eD0MQRLw@mail.gmail.com
2022-01-08Update copyright for 2022Bruce Momjian
Backpatch-through: 10
2021-12-23Fix compilation error introduced by commit 8e1fae1938.Amit Kapila
Author: Masahiko Sawada Discussion: https://postgr.es/m/[email protected]
2021-12-23Move parallel vacuum code to vacuumparallel.c.Amit Kapila
This commit moves parallel vacuum related code to a new file commands/vacuumparallel.c so that any table AM supporting indexes can utilize parallel vacuum in order to call index AM callbacks (ambulkdelete and amvacuumcleanup) with parallel workers. Another reason for this refactoring is that the parallel vacuum isn't specific to heap so it doesn't make sense to keep this code in heap/vacuumlazy.c. Author: Masahiko Sawada, based on suggestion from Andres Freund Reviewed-by: Hou Zhijie, Amit Kapila, Haiying Tang Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de