summaryrefslogtreecommitdiff
path: root/src/backend/partitioning/partprune.c
AgeCommit message (Collapse)Author
2024-04-18Fix typos and duplicate wordsDaniel Gustafsson
This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <[email protected]> Author: Heikki Linnakangas <[email protected]> Author: Alexander Lakhin <[email protected]> Author: David Rowley <[email protected]> Author: Nazir Bilal Yavuz <[email protected]> Discussion: https://postgr.es/m/[email protected]
2024-03-04Remove unused #include's from backend .c filesPeter Eisentraut
as determined by include-what-you-use (IWYU) While IWYU also suggests to *add* a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
2024-03-04Support partition pruning on boolcol IS [NOT] UNKNOWNDavid Rowley
While working on 4c2369ac5, I noticed we went out of our way not to support clauses on boolean partitioned tables in the form of "IS UNKNOWN" and "IS NOT UNKNOWN". It's almost as much code to disallow this as it is to allow it, so let's allow it. Discussion: https://postgr.es/m/CAApHDvobKtcN6+xOuOfcutfp6T7jP=JPA9y3=MAEqnuKdDsQrw@mail.gmail.com
2024-02-20Minor corrections for partition pruningDavid Rowley
When the partition pruning code finds an OpExpr with an operator that does not belong to the partition key's opfamily, the code checks to see if the negator of the operator is the opfamily's BTEqualStrategyNumber operator so that partition pruning can support that operator and invert the matching partitions. Doing this only works for LIST partitioned tables. Here we fix a minor correctness issue where when we discover we're not pruning for a LIST partitioned table, we return PARTCLAUSE_NOMATCH. PARTCLAUSE_NOMATCH is only meant to be used when the clause may match another partitioned key column. For this case, the clause is not going to be any more useful to another partitioned key as the partition strategy is not going to change from one key to the next. Noticed while working 4c2369ac5. No backpatch because returning PARTCLAUSE_NOMATCH instead of PARTCLAUSE_UNSUPPORTED mostly just causes wasted effort checking subsequent partition keys against a clause that will never be used for pruning. In passing, correct a comment for get_matching_range_bounds() which mentions that an 'opstrategy' of 0 is supported. It's not, so fix the comment. This was pointed out by Alexander Lakhin. Discussion: https://postgr.es/m/CAApHDvqriy8mPOFJ_Bd66YGXJ4+XULpv-4YdB+ePdCQFztyisA@mail.gmail.com Discussion: https://postgr.es/m/[email protected]
2024-02-19Fix incorrect pruning of NULL partition for boolean IS NOT clausesDavid Rowley
Partition pruning wrongly assumed that, for a table partitioned on a boolean column, a clause in the form "boolcol IS NOT false" and "boolcol IS NOT true" could be inverted to correspondingly become "boolcol IS true" and "boolcol IS false". These are not equivalent as the NOT version matches the opposite boolean value *and* NULLs. This incorrect assumption meant that partition pruning pruned away partitions that could contain NULL values. Here we fix this by correctly not pruning partitions which could store NULLs. To be affected by this, the table must be partitioned by a NULLable boolean column and queries would have to contain "boolcol IS NOT false" or "boolcol IS NOT true". This could result in queries filtering out NULL values with a LIST partitioned table and "ERROR: invalid strategy number 0" for RANGE and HASH partitioned tables. Reported-by: Alexander Lakhin Bug: #18344 Discussion: https://postgr.es/m/[email protected] Backpatch-through: 12
2024-01-04Update copyright for 2024Bruce Momjian
Reported-by: Michael Paquier Discussion: https://postgr.es/m/[email protected] Backpatch-through: 12
2023-10-26Add trailing commas to enum definitionsPeter Eisentraut
Since C99, there can be a trailing comma after the last value in an enum definition. A lot of new code has been introducing this style on the fly. Some new patches are now taking an inconsistent approach to this. Some add the last comma on the fly if they add a new last value, some are trying to preserve the existing style in each place, some are even dropping the last comma if there was one. We could nudge this all in a consistent direction if we just add the trailing commas everywhere once. I omitted a few places where there was a fixed "last" value that will always stay last. I also skipped the header files of libpq and ecpg, in case people want to use those with older compilers. There were also a small number of cases where the enum type wasn't used anywhere (but the enum values were), which ended up confusing pgindent a bit, so I left those alone. Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
2023-10-12Fix incorrect step generation in HASH partition pruningDavid Rowley
get_steps_using_prefix_recurse() incorrectly assumed that it could stop recursive processing of the 'prefix' list when cur_keyno was one before the step_lastkeyno. Since hash partition pruning can prune using IS NULL quals, and these IS NULL quals are not present in the 'prefix' list, then that logic could cause more levels of recursion than what is needed and lead to there being no more items in the 'prefix' list to process. This would manifest itself as a crash in some code that expected the 'start' ListCell not to be NULL. Here we adjust the logic so that instead of stopping recursion at 1 key before the step_lastkeyno, we just look at the llast(prefix) item and ensure we only recursively process up until just before whichever the last key is. This effectively allows keys to be missing in the 'prefix' list. This change does mean that step_lastkeyno is no longer needed, so we remove that from the static functions. I also spent quite some time reading this code and testing it to try to convince myself that there are no other issues. That resulted in the irresistible temptation of rewriting some comments, many of which were just not true or inconcise. Reported-by: Sergei Glukhov Reviewed-by: Sergei Glukhov, tender wang Discussion: https://postgr.es/m/[email protected] Backpatch-through: 11, where partition pruning was introduced.
2023-05-04Revert "Move PartitionPruneInfo out of plan nodes into PlannedStmt"Alvaro Herrera
This reverts commit ec386948948c and its fixup 589bb816499e. This change was intended to support query planning avoiding acquisition of locks on partitions that were going to be pruned; however, the overall project took a different direction at [1] and this bit is no longer needed. Put things back the way they were as agreed in [2], to avoid unnecessary complexity. Discussion: [1] https://postgr.es/m/[email protected] Discussion: [2] https://postgr.es/m/[email protected]
2023-04-14Fix incorrect partition pruning logic for boolean partitioned tablesDavid Rowley
The partition pruning logic assumed that "b IS NOT true" was exactly the same as "b IS FALSE". This is not the case when considering NULL values. Fix this so we correctly include any partition which could hold NULL values for the NOT case. Additionally, this fixes a bug in the partition pruning code which handles partitioned tables partitioned like ((NOT boolcol)). This is a seemingly unlikely schema design, and it was untested and also broken. Here we add tests for the ((NOT boolcol)) case and insert some actual data into those tables and verify we do get the correct rows back when running queries. I've also adjusted the existing boolpart tests to include some data and verify we get the correct results too. Both the bugs being fixed here could lead to incorrect query results with fewer rows being returned than expected. No additional rows could have been returned accidentally. In passing, remove needless ternary expression. It's more simple just to pass !is_not_clause to makeBoolConst(). It makes sense to do this so the code is consistent with the bug fix in the "else if" condition just below. David Kimura did submit a patch to fix the first of the issues here, but that's not what's being committed here. Reported-by: David Kimura Reviewed-by: Richard Guo, David Kimura Discussion: https://postgr.es/m/CAHnPFjQ5qxs6J_p+g8=ww7GQvfn71_JE+Tygj0S7RdRci1uwPw@mail.gmail.com Backpatch-through: 11, all supported versions
2023-01-02Update copyright for 2023Bruce Momjian
Backpatch-through: 11
2022-12-01Move PartitioPruneInfo out of plan nodes into PlannedStmtAlvaro Herrera
The planner will now add a given PartitioPruneInfo to PlannedStmt.partPruneInfos instead of directly to the Append/MergeAppend plan node. What gets set instead in the latter is an index field which points to the list element of PlannedStmt.partPruneInfos containing the PartitioPruneInfo belonging to the plan node. A later commit will make AcquireExecutorLocks() do the initial partition pruning to determine a minimal set of partitions to be locked when validating a plan tree and it will need to consult the PartitioPruneInfos referenced therein to do so. It would be better for the PartitioPruneInfos to be accessible directly than requiring a walk of the plan tree to find them, which is easier when it can be done by simply iterating over PlannedStmt.partPruneInfos. Author: Amit Langote <[email protected]> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
2022-10-05Rename shadowed local variablesDavid Rowley
In a similar effort to f01592f91, here we mostly rename shadowed local variables to remove the warnings produced when compiling with -Wshadow=compatible-local. This fixes 63 warnings and leaves just 5. Author: Justin Pryzby, David Rowley Reviewed-by: Justin Pryzby Discussion https://postgr.es/m/20220817145434.GC26426%40telsasoft.com
2022-09-02Fix planner to consider matches to boolean columns in extension indexes.Tom Lane
The planner has to special-case indexes on boolean columns, because what we need for an indexscan on such a column is a qual of the shape of "boolvar = pseudoconstant". For plain bool constants, previous simplification will have reduced this to "boolvar" or "NOT boolvar", and we have to reverse that if we want to make an indexqual. There is existing code to do so, but it only fires when the index's opfamily is BOOL_BTREE_FAM_OID or BOOL_HASH_FAM_OID. Thus extension AMs, or extension opclasses such as contrib/btree_gin, are out in the cold. The reason for hard-wiring the set of relevant opfamilies was mostly to avoid a catalog lookup in a hot code path. We can improve matters while not taking much of a performance hit by relying on the hard-wired set when the opfamily OID is visibly built-in, and only checking the catalogs when dealing with an extension opfamily. While here, rename IsBooleanOpfamily to IsBuiltinBooleanOpfamily to remind future users of that macro of its limitations. At some point we might want to make indxpath.c's improved version of the test globally accessible, but it's not presently needed elsewhere. Zongliang Quan and Tom Lane Discussion: https://postgr.es/m/[email protected]
2022-08-18Improve performance of adjust_appendrel_attrs_multilevel.Tom Lane
The present implementations of adjust_appendrel_attrs_multilevel and its sibling adjust_child_relids_multilevel are very messy, because they work by reconstructing the relids of the child's immediate parent and then seeing if that's bms_equal to the relids of the target parent. Aside from being quite inefficient, this will not work with planned future changes to make joinrels' relid sets contain outer-join relids in addition to baserels. The whole thing can be solved at a stroke by adding explicit parent and top_parent links to child RelOptInfos, and making these functions work with RelOptInfo pointers instead of relids. Doing that is simpler for most callers, too. In my original version of this patch, I got rid of RelOptInfo.top_parent_relids on the grounds that it was now redundant. However, that adds a lot of code churn in places that otherwise would not need changing, and arguably the extra indirection needed to fetch top_parent->relids in those places costs something. So this version leaves that field in place. Discussion: https://postgr.es/m/[email protected]
2022-08-17Avoid using list_length() to test for empty list.Tom Lane
The standard way to check for list emptiness is to compare the List pointer to NIL; our list code goes out of its way to ensure that that is the only representation of an empty list. (An acceptable alternative is a plain boolean test for non-null pointer, but explicit mention of NIL is usually preferable.) Various places didn't get that memo and expressed the condition with list_length(), which might not be so bad except that there were such a variety of ways to check it exactly: equal to zero, less than or equal to zero, less than one, yadda yadda. In the name of code readability, let's standardize all those spellings as "list == NIL" or "list != NIL". (There's probably some microscopic efficiency gain too, though few of these look to be at all performance-critical.) A very small number of cases were left as-is because they seemed more consistent with other adjacent list_length tests that way. Peter Smith, with bikeshedding from a number of us Discussion: https://postgr.es/m/CAHut+PtQYe+ENX5KrONMfugf0q6NHg4hR5dAhqEXEc2eefFeig@mail.gmail.com
2022-04-05Refactor and cleanup runtime partition prune code a littleAlvaro Herrera
* Move the execution pruning initialization steps that are common between both ExecInitAppend() and ExecInitMergeAppend() into a new function ExecInitPartitionPruning() defined in execPartition.c. Those steps include creation of a PartitionPruneState to be used for all instances of pruning and determining the minimal set of child subplans that need to be initialized by performing initial pruning if needed, and finally adjusting the subplan_map arrays in the PartitionPruneState to reflect the new set of subplans remaining after initial pruning if it was indeed performed. ExecCreatePartitionPruneState() is no longer exported out of execPartition.c and has been renamed to CreatePartitionPruneState() as a local sub-routine of ExecInitPartitionPruning(). * Likewise, ExecFindInitialMatchingSubPlans() that was in charge of performing initial pruning no longer needs to be exported. In fact, since it would now have the same body as the more generally named ExecFindMatchingSubPlans(), except differing in the value of initial_prune passed to the common subroutine find_matching_subplans_recurse(), it seems better to remove it and add an initial_prune argument to ExecFindMatchingSubPlans(). * Add an ExprContext field to PartitionPruneContext to remove the implicit assumption in the runtime pruning code that the ExprContext to use to compute pruning expressions that need one can always rely on the PlanState providing it. A future patch will allow runtime pruning (at least the initial pruning steps) to be performed without the corresponding PlanState yet having been created, so this will help. Author: Amit Langote <[email protected]> Discussion: https://postgr.es/m/CA+HiwqEYCpEqh2LMDOp9mT+4-QoVe8HgFMKBjntEMCTZLpcCCA@mail.gmail.com
2022-01-08Update copyright for 2022Bruce Momjian
Backpatch-through: 10
2021-08-02Track a Bitmapset of non-pruned partitions in RelOptInfoDavid Rowley
For partitioned tables with large numbers of partitions where queries are able to prune all but a very small number of partitions, the time spent in the planner looping over RelOptInfo.part_rels checking for non-NULL RelOptInfos could become a large portion of the overall planning time. Here we add a Bitmapset that records the non-pruned partitions. This allows us to more efficiently skip the pruned partitions by looping over the Bitmapset. This will cause a very slight slow down in cases where no or not many partitions could be pruned, however, those cases are already slow to plan anyway and the overhead of looping over the Bitmapset would be unmeasurable when compared with the other tasks such as path creation for a large number of partitions. Reviewed-by: Amit Langote, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvqnPx6JnUuPwaf5ao38zczrAb9mxt9gj4U1EKFfd4AqLA@mail.gmail.com
2021-04-08Cleanup partition pruning step generationDavid Rowley
There was some code in gen_prune_steps_from_opexps that needlessly checked a list was not empty when it clearly had to contain at least one item. This prompted a further cleanup operation in partprune.c. Additionally, the previous code could end up adding additional needless INTERSECT steps. However, those do not appear to be able to cause any misbehavior. gen_prune_steps_from_opexps is now no longer in charge of generating combine pruning steps. Instead, gen_partprune_steps_internal, which already does some combine step creation has been given the sole responsibility of generating all combine steps. This means that when we recursively call gen_partprune_steps_internal, since it always now adds a combine step when it produces multiple steps, we can just pay attention to the final step returned. In passing, do quite a bit of work on the comments to try to more clearly explain the role of both gen_partprune_steps_internal and gen_prune_steps_from_opexps. This is fairly complex code so some extra effort to give any new readers an overview of how things work seems like a good idea. Author: Amit Langote Reported-by: Andy Fan Reviewed-by: Kyotaro Horiguchi, Andy Fan, Ryan Lambert, David Rowley Discussion: https://postgr.es/m/CAKU4AWqWoVii+bRTeBQmeVW+PznkdO8DfbwqNsu9Gj4ubt9A6w@mail.gmail.com
2021-02-01Remove [Merge]AppendPath.partitioned_rels.Tom Lane
It turns out that the calculation of [Merge]AppendPath.partitioned_rels in allpaths.c is faulty and sometimes omits relevant non-leaf partitions, allowing an assertion added by commit a929e17e5a8 to trigger. Rather than fix that, it seems better to get rid of those fields altogether. We don't really need the info until create_plan time, and calculating it once for the selected plan should be cheaper than calculating it for each append path we consider. The preceding two commits did away with all use of the partitioned_rels values; this commit just mechanically removes the fields and the code that calculated them. Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
2021-02-01Revise make_partition_pruneinfo to not use its partitioned_rels input.Tom Lane
It turns out that the calculation of [Merge]AppendPath.partitioned_rels in allpaths.c is faulty and sometimes omits relevant non-leaf partitions, allowing an assertion added by commit a929e17e5a8 to trigger. Rather than fix that, it seems better to get rid of those fields altogether. We don't really need the info until create_plan time, and calculating it once for the selected plan should be cheaper than calculating it for each append path we consider. As a first step, teach make_partition_pruneinfo to collect the relevant partitioned tables for itself. It's not hard to do so by traversing from child tables up to parents using the AppendRelInfo links. While here, make some minor stylistic improvements; mainly, don't use the "Relids" alias for bitmapsets that are not identities of any relation considered by the planner. Try to document the logic better, too. No backpatch, as there does not seem to be a live problem before a929e17e5a8. Also no new regression test; the code where the bug was will be gone at the end of this patch series, so it seems a bit pointless to memorialize the issue. Tom Lane and David Rowley, per reports from Andreas Seltenreich and Jaime Casanova. Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
2021-01-28Fix hash partition pruning with asymmetric partition sets.Tom Lane
perform_pruning_combine_step() was not taught about the number of partition indexes used in hash partitioning; more embarrassingly, get_matching_hash_bounds() also had it wrong. These errors are masked in the common case where all the partitions have the same modulus and no partition is missing. However, with missing or unequal-size partitions, we could erroneously prune some partitions that need to be scanned, leading to silently wrong query answers. While a minimal-footprint fix for this could be to export get_partition_bound_num_indexes and make the incorrect functions use it, I'm of the opinion that that function should never have existed in the first place. It's not reasonable data structure design that PartitionBoundInfoData lacks any explicit record of the length of its indexes[] array. Perhaps that was all right when it could always be assumed equal to ndatums, but something should have been done about it as soon as that stopped being true. Putting in an explicit "nindexes" field makes both partition_bounds_equal() and partition_bounds_copy() simpler, safer, and faster than before, and removes explicit knowledge of the number-of-partition-indexes rules from some other places too. This change also makes get_hash_partition_greatest_modulus obsolete. I left that in place in case any external code uses it, but no core code does anymore. Per bug #16840 from Michał Albrycht. Back-patch to v11 where the hash partitioning code came in. (In the back branches, add the new field at the end of PartitionBoundInfoData to minimize ABI risks.) Discussion: https://postgr.es/m/[email protected]
2021-01-02Update copyright for 2021Bruce Momjian
Backpatch-through: 9.5
2020-12-24Fix typos and grammar in docs and commentsMichael Paquier
This fixes several areas of the documentation and some comments in matters of style, grammar, or even format. Author: Justin Pryzby Discussion: https://postgr.es/m/[email protected]
2020-11-02Allow run-time pruning on nested Append/MergeAppend nodesDavid Rowley
Previously we only tagged on the required information to allow the executor to perform run-time partition pruning for Append/MergeAppend nodes belonging to base relations. It was thought that nested Append/MergeAppend nodes were just about always pulled up into the top-level Append/MergeAppend and that making the run-time pruning info for any sub Append/MergeAppend nodes was a waste of time. However, that was likely badly thought through. Some examples of cases we're unable to pullup nested Append/MergeAppends are: 1) Parallel Append nodes with a mix of parallel and non-parallel paths into a Parallel Append. 2) When planning an ordered Append scan a sub-partition which is unordered may require a nested MergeAppend path to ensure sub-partitions don't mix up the order of tuples being fed into the top-level Append. Unfortunately, it was not just as simple as removing the lines in createplan.c which were purposefully not building the run-time pruning info for anything but RELOPT_BASEREL relations. The code in add_paths_to_append_rel() was far too sloppy about which partitioned_rels it included for the Append/MergeAppend paths. The original code there would always assume accumulate_append_subpath() would pull each sub-Append and sub-MergeAppend path into the top-level path. While it does not appear that there were any actual bugs caused by having the additional partitioned table RT indexes recorded, what it did mean is that later in planning, when we built the run-time pruning info that we wasted effort and built PartitionedRelPruneInfos for partitioned tables that we had no subpaths for the executor to run-time prune. Here we tighten that up so that partitioned_rels only ever contains the RT index for partitioned tables which actually have subpaths in the given Append/MergeAppend. We can now Assert that every PartitionedRelPruneInfo has a non-empty present_parts. That should allow us to catch any weird corner cases that have been missed. In passing, it seems there is no longer a good reason to have the AppendPath and MergeAppendPath's partitioned_rel fields a List of IntList. We can simply have a List of Relids instead. This is more compact in memory and faster to add new members to. We still know which is the root level partition as these always have a lower relid than their children. Previously this field was used for more things, but run-time partition pruning now remains the only user of it and it has no need for a List of IntLists. Here we also get rid of the RelOptInfo partitioned_child_rels field. This is what was previously used to (sometimes incorrectly) set the Append/MergeAppend path's partitioned_rels field. That was the only usage of that field, so we can happily just remove it. I also couldn't resist changing some nearby code to make use of the newly added for_each_from macro so we can skip the first element in the list without checking if the current item was the first one on each iteration. A bug report from Andreas Kretschmer prompted all this work, however, after some consideration, I'm not personally classing this as a bug fix. So no backpatch. In Andreas' test case, it just wasn't that clear that there was a nested Append since the top-level Append just had a single sub-path which was pulled up a level, per 8edd0e794. Author: David Rowley Reviewed-by: Amit Langote Discussion: https://postgr.es/m/flat/CAApHDvqSchs%2BubdybcfFaSPB%2B%2BEA7kqMaoqajtP0GtZvzOOR3g%40mail.gmail.com
2020-08-07Fix yet another issue with step generation in partition pruning.Etsuro Fujita
Commit 13838740f fixed some issues with step generation in partition pruning, but there was yet another one: get_steps_using_prefix() assumes that clauses in the passed-in prefix list are sorted in ascending order of their partition key numbers, but the caller failed to ensure this for range partitioning, which led to an assertion failure in debug builds. Adjust the caller function to arrange the clauses in the prefix list in the required order for range partitioning. Back-patch to v11, like the previous commit. Patch by me, reviewed by Amit Langote. Discussion: https://postgr.es/m/CAPmGK16jkXiFG0YqMbU66wte-oJTfW6D1HaNvQf%3D%2B5o9%3Dm55wQ%40mail.gmail.com
2020-07-28Fix some issues with step generation in partition pruning.Etsuro Fujita
In the case of range partitioning, get_steps_using_prefix() assumes that the passed-in prefix list contains at least one clause for each of the partition keys earlier than one specified in the passed-in step_lastkeyno, but the caller (ie, gen_prune_steps_from_opexps()) didn't take it into account, which led to a server crash or incorrect results when the list contained no clauses for such partition keys, as reported in bug #16500 and #16501 from Kobayashi Hisanori. Update the caller to call that function only when the list created there contains at least one clause for each of the earlier partition keys in the case of range partitioning. While at it, fix some other issues: * The list to pass to get_steps_using_prefix() is allowed to contain multiple clauses for the same partition key, as described in the comment for that function, but that function actually assumed that the list contained just a single clause for each of middle partition keys, which led to an assertion failure when the list contained multiple clauses for such partition keys. Update that function to match the comment. * In the case of hash partitioning, partition keys are allowed to be NULL, in which case the list to pass to get_steps_using_prefix() contains no clauses for NULL partition keys, but that function treats that case as like the case of range partitioning, which led to the assertion failure. Update the assertion test to take into account NULL partition keys in the case of hash partitioning. * Fix a typo in a comment in get_steps_using_prefix_recurse(). * gen_partprune_steps() failed to detect self-contradiction from strict-qual clauses and an IS NULL clause for the same partition key in some cases, producing incorrect partition-pruning steps, which led to incorrect results of partition pruning, but didn't cause any user-visible problems fortunately, as the self-contradiction is detected later in the query planning. Update that function to detect the self-contradiction. Per bug #16500 and #16501 from Kobayashi Hisanori. Patch by me, initial diagnosis for the reported issue and review by Dmitry Dolgov. Back-patch to v11, where partition pruning was introduced. Discussion: https://postgr.es/m/16500-d1613f2a78e1e090%40postgresql.org Discussion: https://postgr.es/m/16501-5234a9a0394f6754%40postgresql.org
2020-05-14Initial pgindent and pgperltidy run for v13.Tom Lane
Includes some manual cleanup of places that pgindent messed up, most of which weren't per project style anyway. Notably, it seems some people didn't absorb the style rules of commit c9d297751, because there were a bunch of new occurrences of function calls with a newline just after the left paren, all with faulty expectations about how the rest of the call would get indented.
2020-05-13Dial back -Wimplicit-fallthrough to level 3Alvaro Herrera
The additional pain from level 4 is excessive for the gain. Also revert all the source annotation changes to their original wordings, to avoid back-patching pain. Discussion: https://postgr.es/m/[email protected]
2020-05-12Add -Wimplicit-fallthrough to CFLAGS and CXXFLAGSAlvaro Herrera
Use it at level 4, a bit more restrictive than the default level, and tweak our commanding comments to FALLTHROUGH. (However, leave zic.c alone, since it's external code; to avoid the warnings that would appear there, change CFLAGS for that file in the Makefile.) Author: Julien Rouhaud <[email protected]> Author: Álvaro Herrera <[email protected]> Reviewed-by: Tom Lane <[email protected]> Discussion: https://postgr.es/m/20200412081825.qyo5vwwco3fv4gdo@nol Discussion: https://postgr.es/m/flat/[email protected]
2020-01-01Update copyrights for 2020Bruce Momjian
Backpatch-through: update all files in master, backpatch legal files through 9.4
2019-08-16Don't include utils/array.h from acl.h.Andres Freund
For most uses of acl.h the details of how "Acl" internally looks like are irrelevant. It might make sense to move a lot of the implementation details into a separate header at a later point. The main motivation of this change is to avoid including fmgr.h (via array.h, which needs it for exposed structs) in a lot of files that otherwise don't need it. A subsequent commit will remove the fmgr.h include from a lot of files. Directly include utils/array.h and utils/expandeddatum.h from the files that need them, but previously included them indirectly, via acl.h. Author: Andres Freund Discussion: https://postgr.es/m/[email protected]
2019-08-13Don't constraint-exclude partitioned tables as muchAlvaro Herrera
We only need to invoke constraint exclusion on partitioned tables when they are a partition, and they themselves contain a default partition; it's not necessary otherwise, and it's expensive, so avoid it. Also, we were trying once for each clause separately, but we can do it for all the clauses at once. While at it, centralize setting of RelOptInfo->partition_qual instead of computing it in slightly different ways in different places. Per complaints from Simon Riggs about 4e85642d935e; reviewed by Yuzuko Hosoya, Kyotaro Horiguchi. Author: Amit Langote. I (Álvaro) again mangled the patch somewhat. Discussion: https://postgr.es/m/CANP8+j+tMCY=nEcQeqQam85=uopLBtX-2vHiLD2bbp7iQQUKpA@mail.gmail.com
2019-08-13Fix inconsistencies and typos in the tree, take 10Michael Paquier
This addresses some issues with unnecessary code comments, fixes various typos in docs and comments, and removes some orphaned structures and definitions. Author: Alexander Lakhin Discussion: https://postgr.es/m/[email protected]
2019-08-12Rationalize use of list_concat + list_copy combinations.Tom Lane
In the wake of commit 1cff1b95a, the result of list_concat no longer shares the ListCells of the second input. Therefore, we can replace "list_concat(x, list_copy(y))" with just "list_concat(x, y)". To improve call sites that were list_copy'ing the first argument, or both arguments, invent "list_concat_copy()" which produces a new list sharing no ListCells with either input. (This is a bit faster than "list_concat(list_copy(x), y)" because it makes the result list the right size to start with.) In call sites that were not list_copy'ing the second argument, the new semantics mean that we are usually leaking the second List's storage, since typically there is no remaining pointer to it. We considered inventing another list_copy variant that would list_free the second input, but concluded that for most call sites it isn't worth worrying about, given the relative compactness of the new List representation. (Note that in cases where such leakage would happen, the old code already leaked the second List's header; so we're only discussing the size of the leak not whether there is one. I did adjust two or three places that had been troubling to free that header so that they manually free the whole second List.) Patch by me; thanks to David Rowley for review. Discussion: https://postgr.es/m/[email protected]
2019-08-09Fix SIGSEGV in pruning for ScalarArrayOp with constant-null array.Tom Lane
Not much to be said here: commit 9fdb675fc should have checked constisnull, didn't. Per report from Piotr Włodarczyk. Back-patch to v11 where bug was introduced. Discussion: https://postgr.es/m/CAP-dhMr+vRpwizEYjUjsiZ1vwqpohTm+3Pbdt6Pr7FEgPq9R0Q@mail.gmail.com
2019-08-07Apply constraint exclusion more generally in partitioningAlvaro Herrera
We were applying constraint exclusion on the partition constraint when generating pruning steps for a clause, but only for the rather restricted situation of them being boolean OR operators; however it is possible to have differently shaped clauses that also benefit from constraint exclusion. This applies particularly to the default partition since their constraints are in essence a long list of OR'ed subclauses ... but it applies to other cases too. So in certain cases we're scanning partitions that we don't need to. Remove the specialized code in OR clauses, and add a generally applicable test of the clause refuting the partition constraint; mark the whole pruning operation as contradictory if it hits. This has the unwanted side-effect of testing some (most? all?) constraints more than once if constraint_exclusion=on. That seems unavoidable as far as I can tell without some additional work, but that's not the recommended setting for that parameter anyway. However, because this imposes additional processing cost for all queries using partitioned tables, I decided not to backpatch this change. Author: Amit Langote, Yuzuko Hosoya, Álvaro Herrera Reviewers: Shawn Wang, Thibaut Madeleine, Yoshikazu Imai, Kyotaro Horiguchi; they were also uncredited reviewers for commit 489247b0e615. Discussion: https://postgr.es/m/[email protected]
2019-08-04Improve pruning of a default partitionAlvaro Herrera
When querying a partitioned table containing a default partition, we were wrongly deciding to include it in the scan too early in the process, failing to exclude it in some cases. If we reinterpret the PruneStepResult.scan_default flag slightly, we can do a better job at detecting that it can be excluded. The change is that we avoid setting the flag for that pruning step unless the step absolutely requires the default partition to be scanned (in contrast with the previous arrangement, which was to set it unless the step was able to prune it). So get_matching_partitions() must explicitly check the partition that each returned bound value corresponds to in order to determine whether the default one needs to be included, rather than relying on the flag from the final step result. Author: Yuzuko Hosoya <[email protected]> Reviewed-by: Amit Langote <[email protected]> Discussion: https://postgr.es/m/[email protected]
2019-07-15Represent Lists as expansible arrays, not chains of cons-cells.Tom Lane
Originally, Postgres Lists were a more or less exact reimplementation of Lisp lists, which consist of chains of separately-allocated cons cells, each having a value and a next-cell link. We'd hacked that once before (commit d0b4399d8) to add a separate List header, but the data was still in cons cells. That makes some operations -- notably list_nth() -- O(N), and it's bulky because of the next-cell pointers and per-cell palloc overhead, and it's very cache-unfriendly if the cons cells end up scattered around rather than being adjacent. In this rewrite, we still have List headers, but the data is in a resizable array of values, with no next-cell links. Now we need at most two palloc's per List, and often only one, since we can allocate some values in the same palloc call as the List header. (Of course, extending an existing List may require repalloc's to enlarge the array. But this involves just O(log N) allocations not O(N).) Of course this is not without downsides. The key difficulty is that addition or deletion of a list entry may now cause other entries to move, which it did not before. For example, that breaks foreach() and sister macros, which historically used a pointer to the current cons-cell as loop state. We can repair those macros transparently by making their actual loop state be an integer list index; the exposed "ListCell *" pointer is no longer state carried across loop iterations, but is just a derived value. (In practice, modern compilers can optimize things back to having just one loop state value, at least for simple cases with inline loop bodies.) In principle, this is a semantics change for cases where the loop body inserts or deletes list entries ahead of the current loop index; but I found no such cases in the Postgres code. The change is not at all transparent for code that doesn't use foreach() but chases lists "by hand" using lnext(). The largest share of such code in the backend is in loops that were maintaining "prev" and "next" variables in addition to the current-cell pointer, in order to delete list cells efficiently using list_delete_cell(). However, we no longer need a previous-cell pointer to delete a list cell efficiently. Keeping a next-cell pointer doesn't work, as explained above, but we can improve matters by changing such code to use a regular foreach() loop and then using the new macro foreach_delete_current() to delete the current cell. (This macro knows how to update the associated foreach loop's state so that no cells will be missed in the traversal.) There remains a nontrivial risk of code assuming that a ListCell * pointer will remain good over an operation that could now move the list contents. To help catch such errors, list.c can be compiled with a new define symbol DEBUG_LIST_MEMORY_USAGE that forcibly moves list contents whenever that could possibly happen. This makes list operations significantly more expensive so it's not normally turned on (though it is on by default if USE_VALGRIND is on). There are two notable API differences from the previous code: * lnext() now requires the List's header pointer in addition to the current cell's address. * list_delete_cell() no longer requires a previous-cell argument. These changes are somewhat unfortunate, but on the other hand code using either function needs inspection to see if it is assuming anything it shouldn't, so it's not all bad. Programmers should be aware of these significant performance changes: * list_nth() and related functions are now O(1); so there's no major access-speed difference between a list and an array. * Inserting or deleting a list element now takes time proportional to the distance to the end of the list, due to moving the array elements. (However, it typically *doesn't* require palloc or pfree, so except in long lists it's probably still faster than before.) Notably, lcons() used to be about the same cost as lappend(), but that's no longer true if the list is long. Code that uses lcons() and list_delete_first() to maintain a stack might usefully be rewritten to push and pop at the end of the list rather than the beginning. * There are now list_insert_nth...() and list_delete_nth...() functions that add or remove a list cell identified by index. These have the data-movement penalty explained above, but there's no search penalty. * list_concat() and variants now copy the second list's data into storage belonging to the first list, so there is no longer any sharing of cells between the input lists. The second argument is now declared "const List *" to reflect that it isn't changed. This patch just does the minimum needed to get the new implementation in place and fix bugs exposed by the regression tests. As suggested by the foregoing, there's a fair amount of followup work remaining to do. Also, the ENABLE_LIST_COMPAT macros are finally removed in this commit. Code using those should have been gone a dozen years ago. Patch by me; thanks to David Rowley, Jesper Pedersen, and others for review. Discussion: https://postgr.es/m/[email protected]
2019-07-12Fix RANGE partition pruning with multiple boolean partition keysDavid Rowley
match_clause_to_partition_key incorrectly would return PARTCLAUSE_UNSUPPORTED if a bool qual could not be matched to the current partition key. This was a problem, as it causes the calling function to discard the qual and not try to match it to any other partition key. If there was another partition key which did match this qual, then the qual would not be checked again and we could fail to prune some partitions. The worst this could do was to cause partitions not to be pruned when they could have been, so there was no danger of incorrect query results here. Fix this by changing match_boolean_partition_clause to have it return a PartClauseMatchStatus rather than a boolean value. This allows it to communicate if the qual is unsupported or if it just does not match this particular partition key, previously these two cases were treated the same. Now, if match_clause_to_partition_key is unable to match the qual to any other qual type then we can simply return the value from the match_boolean_partition_clause call so that the calling function properly treats the qual as either unmatched or unsupported. Reported-by: Rares Salcudean Reviewed-by: Amit Langote Backpatch-through: 11 where partition pruning was introduced Discussion: https://postgr.es/m/CAHp_FN2xwEznH6oyS0hNTuUUZKp5PvegcVv=Co6nBXJ+mC7Y5w@mail.gmail.com
2019-05-22Phase 2 pgindent run for v12.Tom Lane
Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
2019-05-17Restructure creation of run-time pruning steps.Tom Lane
Previously, gen_partprune_steps() always built executor pruning steps using all suitable clauses, including those containing PARAM_EXEC Params. This meant that the pruning steps were only completely safe for executor run-time (scan start) pruning. To prune at executor startup, we had to ignore the steps involving exec Params. But this doesn't really work in general, since there may be logic changes needed as well --- for example, pruning according to the last operator's btree strategy is the wrong thing if we're not applying that operator. The rules embodied in gen_partprune_steps() and its minions are sufficiently complicated that tracking their incremental effects in other logic seems quite impractical. Short of a complete redesign, the only safe fix seems to be to run gen_partprune_steps() twice, once to create executor startup pruning steps and then again for run-time pruning steps. We can save a few cycles however by noting during the first scan whether we rejected any clauses because they involved exec Params --- if not, we don't need to do the second scan. In support of this, refactor the internal APIs in partprune.c to make more use of passing information in the GeneratePruningStepsContext struct, rather than as separate arguments. This is, I hope, the last piece of our response to a bug report from Alan Jackson. Back-patch to v11 where this code came in. Discussion: https://postgr.es/m/[email protected]
2019-05-16Fix bogus logic for combining range-partitioned columns during pruning.Tom Lane
gen_prune_steps_from_opexps's notion of how to do this was overly complicated and underly correct. Per discussion of a report from Alan Jackson (though this fixes only one aspect of that problem). Back-patch to v11 where this code came in. Amit Langote Discussion: https://postgr.es/m/[email protected]
2019-05-16Fix partition pruning to treat stable comparison operators properly.Tom Lane
Cross-type comparison operators in a btree or hash opclass might be only stable not immutable (this is true of timestamp vs. timestamptz for example). partprune.c ignored this possibility and would perform plan-time pruning with them anyway, possibly leading to wrong answers if the environment changed between planning and execution. To fix, teach gen_partprune_steps() to do things differently when creating plan-time pruning steps vs. run-time pruning steps. analyze_partkey_exprs() also needs an extra check, which is rather annoying but now is not the time to restructure things enough to avoid that. While at it, simplify the logic for the plan-time case a little by insisting that the comparison value be a Const and nothing else. This relies on the assumption that eval_const_expressions will have reduced any immutable expression to a Const; which is not quite 100% true, but certainly any case that comes up often enough to be interesting should have simplification logic there. Also improve a bunch of inadequate/obsolete/wrong comments. Per discussion of a report from Alan Jackson (though this fixes only one aspect of that problem). Back-patch to v11 where this code came in. David Rowley, with some further hacking by me Discussion: https://postgr.es/m/[email protected]
2019-03-30Speed up planning when partitions can be pruned at plan time.Tom Lane
Previously, the planner created RangeTblEntry and RelOptInfo structs for every partition of a partitioned table, even though many of them might later be deemed uninteresting thanks to partition pruning logic. This incurred significant overhead when there are many partitions. Arrange to postpone creation of these data structures until after we've processed the query enough to identify restriction quals for the partitioned table, and then apply partition pruning before not after creation of each partition's data structures. In this way we need not open the partition relations at all for partitions that the planner has no real interest in. For queries that can be proven at plan time to access only a small number of partitions, this patch improves the practical maximum number of partitions from under 100 to perhaps a few thousand. Amit Langote, reviewed at various times by Dilip Kumar, Jesper Pedersen, Yoshikazu Imai, and David Rowley Discussion: https://postgr.es/m/[email protected]
2019-03-25Fix use of wrong datatype with sizeof().Robert Haas
OID and int are the same size, but they are not the same thing. David Rowley Discussion: http://postgr.es/m/CAKJS1f_MhS++XngkTvWL9X1v8M5t-0N0B-R465yHQY=TmNV0Ew@mail.gmail.com
2019-03-22Rearrange make_partitionedrel_pruneinfo to avoid work when we can't prune.Tom Lane
Postpone most of the effort of constructing PartitionedRelPruneInfos until after we have found out whether run-time pruning is needed at all. This costs very little duplicated effort (basically just an extra find_base_rel() call per partition) and saves quite a bit when we can't do run-time pruning. Also, merge the first loop (for building relid_subpart_map) into the second loop, since we don't need the map to be valid during that loop. Amit Langote Discussion: https://postgr.es/m/[email protected]
2019-03-22Collations with nondeterministic comparisonPeter Eisentraut
This adds a flag "deterministic" to collations. If that is false, such a collation disables various optimizations that assume that strings are equal only if they are byte-wise equal. That then allows use cases such as case-insensitive or accent-insensitive comparisons or handling of strings with different Unicode normal forms. This functionality is only supported with the ICU provider. At least glibc doesn't appear to have any locales that work in a nondeterministic way, so it's not worth supporting this for the libc provider. The term "deterministic comparison" in this context is from Unicode Technical Standard #10 (https://unicode.org/reports/tr10/#Deterministic_Comparison). This patch makes changes in three areas: - CREATE COLLATION DDL changes and system catalog changes to support this new flag. - Many executor nodes and auxiliary code are extended to track collations. Previously, this code would just throw away collation information, because the eventually-called user-defined functions didn't use it since they only cared about equality, which didn't need collation information. - String data type functions that do equality comparisons and hashing are changed to take the (non-)deterministic flag into account. For comparison, this just means skipping various shortcuts and tie breakers that use byte-wise comparison. For hashing, we first need to convert the input string to a canonical "sort key" using the ICU analogue of strxfrm(). Reviewed-by: Daniel Verite <[email protected]> Reviewed-by: Peter Geoghegan <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2019-03-07Allow ATTACH PARTITION with only ShareUpdateExclusiveLock.Robert Haas
We still require AccessExclusiveLock on the partition itself, because otherwise an insert that violates the newly-imposed partition constraint could be in progress at the same time that we're changing that constraint; only the lock level on the parent relation is weakened. To make this safe, we have to cope with (at least) three separate problems. First, relevant DDL might commit while we're in the process of building a PartitionDesc. If so, find_inheritance_children() might see a new partition while the RELOID system cache still has the old partition bound cached, and even before invalidation messages have been queued. To fix that, if we see that the pg_class tuple seems to be missing or to have a null relpartbound, refetch the value directly from the table. We can't get the wrong value, because DETACH PARTITION still requires AccessExclusiveLock throughout; if we ever want to change that, this will need more thought. In testing, I found it quite difficult to hit even the null-relpartbound case; the race condition is extremely tight, but the theoretical risk is there. Second, successive calls to RelationGetPartitionDesc might not return the same answer. The query planner will get confused if lookup up the PartitionDesc for a particular relation does not return a consistent answer for the entire duration of query planning. Likewise, query execution will get confused if the same relation seems to have a different PartitionDesc at different times. Invent a new PartitionDirectory concept and use it to ensure consistency. This ensures that a single invocation of either the planner or the executor sees the same view of the PartitionDesc from beginning to end, but it does not guarantee that the planner and the executor see the same view. Since this allows pointers to old PartitionDesc entries to survive even after a relcache rebuild, also postpone removing the old PartitionDesc entry until we're certain no one is using it. For the most part, it seems to be OK for the planner and executor to have different views of the PartitionDesc, because the executor will just ignore any concurrently added partitions which were unknown at plan time; those partitions won't be part of the inheritance expansion, but invalidation messages will trigger replanning at some point. Normally, this happens by the time the very next command is executed, but if the next command acquires no locks and executes a prepared query, it can manage not to notice until a new transaction is started. We might want to tighten that up, but it's material for a separate patch. There would still be a small window where a query that started just after an ATTACH PARTITION command committed might fail to notice its results -- but only if the command starts before the commit has been acknowledged to the user. All in all, the warts here around serializability seem small enough to be worth accepting for the considerable advantage of being able to add partitions without a full table lock. Although in general the consequences of new partitions showing up between planning and execution are limited to the query not noticing the new partitions, run-time partition pruning will get confused in that case, so that's the third problem that this patch fixes. Run-time partition pruning assumes that indexes into the PartitionDesc are stable between planning and execution. So, add code so that if new partitions are added between plan time and execution time, the indexes stored in the subplan_map[] and subpart_map[] arrays within the plan's PartitionedRelPruneInfo get adjusted accordingly. There does not seem to be a simple way to generalize this scheme to cope with partitions that are removed, mostly because they could then get added back again with different bounds, but it works OK for added partitions. This code does not try to ensure that every backend participating in a parallel query sees the same view of the PartitionDesc. That currently doesn't matter, because we never pass PartitionDesc indexes between backends. Each backend will ignore the concurrently added partitions which it notices, and it doesn't matter if different backends are ignoring different sets of concurrently added partitions. If in the future that matters, for example because we allow writes in parallel query and want all participants to do tuple routing to the same set of partitions, the PartitionDirectory concept could be improved to share PartitionDescs across backends. There is a draft patch to serialize and restore PartitionDescs on the thread where this patch was discussed, which may be a useful place to start. Patch by me. Thanks to Alvaro Herrera, David Rowley, Simon Riggs, Amit Langote, and Michael Paquier for discussion, and to Alvaro Herrera for some review. Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com