git.postgresql.org Git - postgresql.git/log

Make levels 1-based in pg_log_backend_memory_contexts()

Both pg_get_process_memory_contexts() and pg_backend_memory_contexts
have 1-based levels, whereas pg_log_backend_memory_contexts() was using
0-based levels. Align these.

This results in slightly saner behavior from MemoryContextStatsDetail()
in regards to the max_level. Previously it would stop at 1 level before
the maximum requested level rather than at that level.

Reported-by: Atsushi Torikoshi <[email protected]>
Author: Atsushi Torikoshi <[email protected]>
Author: David Rowley <[email protected]
Reviewed-by: Melih Mutlu <[email protected]>
Reviewed-by: Rahila Syed <[email protected]>
Discussion: https://postgr.es/m/395ea5d4fe190480efa95bf533485c70@oss.nttdata.com

Suppress "may be used uninitialized" warnings from older compilers.

The "children" list won't be used until "got_children" has been set
true, but older compilers don't get that; about half a dozen
buildfarm animals are warning about this. Issue added by 11ff192b5.

While here, improve slightly-shaky grammar in comment.

Discussion: https://postgr.es/m/2057835.1744833309@sss.pgh.pa.us

Portability fix: isdigit() must be passed an unsigned char.

Oversight in commit 40b9c2701, per buildfarm member mamba.

Cache typlens of a SQL function's input arguments.

This gets rid of repetitive get_typlen calls in postquel_sub_params,
which show up as costing a few percent of the runtime in simple test
cases (more with more parameters).

In combination with the preceding patches, this gets us most of the
way back down to the amount of per-call overhead that functions.c
had before commit 0dca5d68d. There are some more things that could
be done, but this seems like an okay place to stop for v18.

Make SQLFunctionCache long-lived again.

At this point, the only data structures we allocate directly in
fcontext are the SQLFunctionCache struct itself, the ParamListInfo
struct, and the execution_state array, all of which are small and
perfectly capable of being re-used across executions of the same
FmgrInfo.  Hence, let's give them the same lifespan as the FmgrInfo.
This step gets rid of the separate SQLFunctionLink struct and makes
fn_extra point to SQLFunctionCache again.  We also get rid of the
separate fcontext memory context and allocate these items directly
in fn_mcxt.

For notational simplicity, SQLFunctionCache still has an fcontext
field, but it's just a copy of fn_mcxt.

The motivation for this is to allow these structures to live as
long as the FmgrInfo and be re-used across calls, restoring the
original design without its propensity for memory leaks.  This
gets rid of some per-call overhead that we added in 0dca5d68d.

We also make an effort to re-use the JunkFilter and result slot.
Those might need to change if the function definition changes,
so we compromise by rebuilding them if the cached plan changes.

This also moves the tuplestore into fn_mcxt so that it can be
re-used across calls, again undoing a change made in 0dca5d68d.

Split some storage out to separate subcontexts of fcontext.

Put the JunkFilter and its result slot (and thence also
some subsidiary data such as the result tupledesc) into a
separate subcontext "jfcontext".  This doesn't accomplish
a lot at this point, because we make a new JunkFilter each
time through the SQL function.  However, the plan is to make
the fcontext long-lived, and that raises the possibility
that we'll need a new JunkFilter because the plan for the
result-generating query changes.  A separate context makes
it easy to free the obsoleted data when that happens.

Also, instead of always running the sub-executor in fcontext,
make a separate context for it if we're doing lazy eval of
a SRF, and otherwise just run it inside CurrentMemoryContext.

Make functions.c mostly run in a short-lived memory context.

Previously, much of this code ran with CurrentMemoryContext set
to be the function's fcontext, so that we tended to leak a lot of
stuff there.  Commit 0dca5d68d dealt with that by releasing the
fcontext at the completion of each SQL function call, but we'd
like to go back to the previous approach of allowing the fcontext
to be query-lifespan.  To control the leakage problem, rearrange
the code so that we mostly run in the memory context that fmgr_sql
is called in (which we expect to be short-lived).  Notably, this
means that parsing/planning is all done in the short-lived context
and doesn't leak cruft into fcontext.

This patch also fixes the allocation of execution_state records
so that we don't leak them across executions.  I set that up
with a re-usable array that contains at least as many
execution_state structs as we need for the current querytree.
The chain structure is still there, but it's not really doing
much for us, and maybe somebody will be motivated to get rid
of it.  I'm not though.

This incidentally also moves the call of BlessTupleDesc to be
with the code that creates the JunkFilter.  That doesn't make
much difference now, but a later patch will reduce the number
of times the JunkFilter gets made, and we needn't bless the
results any more often than that.

We still leak a fair amount in fcontext, particularly when
executing utility statements, but that's material for a
separate patch step; the point here is only to get rid of
unintentional allocations in fcontext.

Minor performance improvement for SQL-language functions.

Late in the development of commit 0dca5d68d, I added a step to copy
the result tlist we extract from the cached final query, because
I was afraid that that might not last as long as the JunkFilter that
we're passing it off to. However, that turns out to cost a noticeable
number of cycles, and it's really quite unnecessary because the
JunkFilter will not examine that tlist after it's been created.
(ExecFindJunkAttribute would use it, but we don't use that function
on this JunkFilter.) Hence, remove the copy step. For safety,
reset the might-become-dangling jf_targetList pointer to NIL.

In passing, remove DR_sqlfunction.cxt, which we don't use anymore;
it's confusing because it's not entirely clear which context it
ought to point at.

Assert lack of hazardous buffer locks before possible catalog read.

Commit 0bada39c83a150079567a6e97b1a25a198f30ea3 fixed a bug of this kind,
which existed in all branches for six days before detection.  While the
probability of reaching the trouble was low, the disruption was extreme.  No
new backends could start, and service restoration needed an immediate
shutdown.  Hence, add this to catch the next bug like it.

The new check in RelationIdGetRelation() suffices to make autovacuum detect
the bug in commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704 that led to commit
0bada39.  This also checks in a number of similar places.  It replaces each
Assert(IsTransactionState()) that pertained to a conditional catalog read.

No back-patch for now, but a back-patch of commit 243e9b4 should back-patch
this, too.  A back-patch could omit the src/test/regress changes, since back
branches won't gain new index columns.

Reported-by: Alexander Lakhin <[email protected]>
Discussion: https://postgr.es/m/20250410191830 [email protected]
Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com

pg_dump: Set private_date pointer to NULL in callback

The end callback for ZStandard compression frees the private_data
but didn't set the pointer to NULL after freeing. This is not a
bug as the code is right now, since nothing is dereferencing the
pointer upon returning from the callback but it is good practice
to do.

Author: Alexander Kuznetsov <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Reviewed-by: Ashutosh Bapat <[email protected]>
Reviewed-by: Michael Paquier <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/efaee52b-9550-44ca-8633-ea86076b3283@altlinux.org

pg_dump: Fix incorrect archive format shown in error message.

In pg_dump and pg_restore, _allocAH() calls _discoverArchiveFormat() to
determine the archive format when the input format is unknown one.
If the input or discovered format is unrecognized, it reports an error
including the archive format number.

If discovered format is unrecognized, its number should be shown in
the error message. But previously the error message mistakenly showed
the originally requested format number (i.e., unknown one) instead of
the discovered one, due to referencing the wrong variable in the error
message.

This commit corrects the issue by using the appropriate variable in
the error message.

This fix has no practical impact since _discoverArchiveFormat() never
returns an unrecognized format and that error mesasge is actually
never output. Therefore, while the issue exists in back branches,
it's not worth the trouble and buildfarm cycles to back-patch.
So this fix is applied only to the master branch.

Author: Mahendra Singh Thalor <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: Fujii Masao <[email protected]>
Discussion: https://postgr.es/m/CAKYtNAqu+N-Ab2Fq6wzNSOm_-0N-BMneanYNV1+6kFDXjva1Eg@mail.gmail.com

Another unintentional behavior change in commit e9931bfb75.

Reported-by: Noah Misch <[email protected]>
Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/20250412123430 [email protected]

Improve comment in regc_pg_locale.c.

Reported-by: Noah Misch <[email protected]>
Reviewed-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/20250412123430 [email protected]

Fixup various new-to-v18 usages of appendPQExpBuffer

Use appendPQExpBufferStr when there are no parameters and
appendPQExpBufferChar when the string length is 1.

Author: David Rowley <[email protected]>
Discussion: https://postgr.es/m/CAApHDvoARMvPeXTTC0HnpARBHn-WgVstc8XFCyMGOzvgu_1HvQ@mail.gmail.com

Improve comments for estimate_multivariate_ndistinct()

estimate_multivariate_ndistinct() is coded to assume the caller handles
passing it a list of GroupVarInfos with unique 'var' fields over the
entire list.  6bb6a62f3 added code which didn't ensure this and that
could result in estimate_multivariate_ndistinct() erroring out with:

ERROR:  corrupt MVNDistinct entry

This occurred because estimate_multivariate_ndistinct() first searches
for a set of stats that match to at least two of the given GroupVarInfos
and then later assumes that the MVNDistinctItem.items array of the
best matching stats will have an entry for those two columns.  If the
GroupVarInfos List contained a duplicate entry then the same column could
be matched to twice and that could trick the code into thinking we have
>= 2 columns matched in cases where only a single distinct column has been
matched.  This could result in a failure to find the correct
MVNDistinctItem in the stats as the array containing those never
contains an item for single columns.

Here we make it more clear that the function needs a distinct set of
GroupVarInfos and also tidy up a few other comments to make things a bit
easier to follow.

Author: David Rowley <[email protected]>
Discussion: https://postgr.es/m/CAApHDvocZCUhM9W9mJ39d6oQz7ePKoqFnao_347mvC-A7QatcQ@mail.gmail.com

Sync declarations and definitions of two new tablecmds.c functions.

Buildfarm member drongo complained because the definitions of these
functions used "const Oid foo" where the forward declarations just
had "Oid foo". (I'm a bit surprised that drongo seems to be the only
complainant.) I chose to fix this by removing the "consts" because
(a) I'm generally not a fan of using const that way, and (b) it was
a minority usage even within these two functions, let alone compared
to the rest of our code base.

Oversight in commit eec0040c4, so no need for back-patch.

Elide not-null constraint checks on child tables during PK creation

We were unnecessarily acquiring AccessExclusiveLock on all child tables
when "ALTER TABLE ONLY sometab ADD PRIMARY KEY" was run on their parent
table, an oversight in commit 14e87ffa5c54.  This caused deadlocks
during pg_restore of partitioned tables.

The reason to acquire the AEL was that we need to verify that child
tables have the involved columns already marked as not-null; but if the
parent table has an inheritable not-null constraint, then all children
must necessarily be in the correct state already, so we can skip the
check, which avoids acquiring the lock.  Reorder the code so that it
works that way.  This doesn't change things in the case where the
constraint doesn't exist, but that case is of lesser importance because
it doesn't occur during parallel pg_restore.

While at it, reword some errmsg() and add errhint() to similar cases in
related but not adjacent code.

Diagnosed-by: Tom Lane <[email protected]>
Reviewed-by: Tender Wang <[email protected]>
Discussion: https://postgr.es/m/67469c1c-38bc-7d94-918a-67033f5dd731@gmx.net
Discussion: https://postgr.es/m/2045026.1743801143@sss.pgh.pa.us
Discussion: https://postgr.es/m/1280408.1744650810@sss.pgh.pa.us

Update pg_config.h.in with libnuma changes

Add macros from autoheader which were accidentally omitted in
commit 65c298f61fc. There is no function change by this as no
code is currently using the missing macro.

Author: Daniel Gustafsson <[email protected]>
Reviewed-by: Jacob Champion <[email protected]>
Discussion: https://postgr.es/m/CF6D7D7F-E1C4-45BE-9019-0F4B4BC7C135@yesql.se

Fix pg_dump --clean with partitioned indexes.

We'd try to drop the partitions of a partitioned index separately,
which is disallowed by the backend, leading to an error during
restore. While the error is harmless, it causes problems if you
try to use --single-transaction mode.

Fortunately, there seems no need to do a DROP at all, since the
partition will go away silently when we drop either the parent index
or the partition's table. So just make the DROP conditional on not
being a partition.

Reported-by: jian he <[email protected]>
Author: jian he <[email protected]>
Reviewed-by: Pavel Stehule <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CACJufxF0QSdkjFKF4di-JGWN6CSdQYEAhGPmQJJCdkSZtd=oLg@mail.gmail.com
Backpatch-through: 13

pg_restore cleanups

. remove unnecessary oid_string list stuff
. use pg_get_line_buf() instead of open-coding it
. cleaner parsing of map.dat lines

Reverts 2b69afbe50d add new list type simple_oid_string_list to fe-utils/simple_list

Author: Álvaro Herrera <[email protected]>
Author: Andrew Dunstan <[email protected]>

Discussion: https://postgr.es/m/202504141220 [email protected]

Fix an incorrect check in get_memoize_path

Memoize typically marks cache entries as complete after fully scanning
the inner side of a join.  However, in the case of unique joins, we
skip to the next outer tuple as soon as the first matching inner tuple
is found, leaving no opportunity to scan the inner side to completion.
To work around that, we mark cache entries as complete after fetching
the first matching inner tuple in unique joins.

This approach is only safe when all of the join's restriction clauses
are parameterized; otherwise, there is no guarantee that reading just
one tuple from the inner side is sufficient.

Currently, we check for this by verifying that the number of clauses
in ppi_clauses is no less than the number of the join's restriction
clauses.  However, this check isn't entirely reliable, as ppi_clauses
includes join clauses available from all outer rels, not just the
current outer rel.  This means the check could pass even if a
restriction clause isn't parameterized, as long as another join
clause, which doesn't belong to the current join, is included in
ppi_clauses.

To fix this, we explicitly check whether each restriction clause of
the current join is present in ppi_clauses.

While we're here, remove the XXX comment from the modified code, as
it's not justified; in certain cases, it's not possible to move a join
clause to the inner side.

This is arguably a bugfix, but no backpatch given the lack of field
reports.

Author: Richard Guo <[email protected]>
Reviewed-by: wenhui qiu <[email protected]>
Reviewed-by: Andrei Lepikhov <[email protected]>
Discussion: https://postgr.es/m/CAMbWs4-8JPouj=wBDj4DhK-WO4+Xdx=A2jbjvvyyTBQneJ1=BQ@mail.gmail.com

doc: Fix typos in documentation

This fixes a set of typos introduced during the v18 development
cycle.

Author: Daniel Gustafsson <[email protected]>
Reviewed-by: Laurenz Albe <[email protected]>
Discussion: https://postgr.es/m/7038B4C5-2742-42B1-A8F0-0FFEAECF02A7@yesql.se

Fix failure for generated column with a not-null domain constraint.

If a GENERATED column is declared to have a domain data type where
the domain's constraints disallow null values, INSERT commands failed
because we built a targetlist that included coercing a null constant
to the domain's type.  The failure occurred even when the generated
value would have been perfectly OK.  This is adjacent to the issues
fixed in 0da39aa76, but we didn't notice for lack of testing a domain
with such a constraint.

We aren't going to use the result of the targetlist entry for the
generated column --- ExecComputeStoredGenerated will overwrite it.
So it's not really necessary that it have the exact datatype of
the generated column.  This patch fixes the problem by changing
the targetlist entry to be a null Const of the domain's base type,
which should be sufficiently legal.  (We do have to tweak
ExecCheckPlanOutput to accept the situation, though.)

This has been broken since we implemented generated columns.
However, this patch only applies easily as far back as v14, partly
because I (tgl) only carried 0da39aa76 back that far, but mostly
because v14 significantly refactored the handling of INSERT/UPDATE
targetlists.  Given the lack of field complaints and the short
remaining support lifetime of v13, I judge the cost-benefit ratio
not good for devising a version that would work in v13.

Reported-by: jian he <[email protected]>
Author: jian he <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CACJufxG59tip2+9h=rEv-ykOFjt0cbsPVchhi0RTij8bABBA0Q@mail.gmail.com
Backpatch-through: 14

doc: Fix missing whitespace in pg_restore documentation.

Previously, a space was missing between "<option>--exclude-schema</option>"
and "for" in the pg_restore documentation. This commit fixes the typo by
adding the missing whitespace.

Back-patch to v17 where the typo was added.

Author: Lele Gaifax <[email protected]>
Reviewed-by: Fujii Masao <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 17

pg_combinebackup: Fix incorrect code documentation

The code comment for parse_oid accidentally used the wrong parameter
when referring to the location of the last backup. Also, while there,
improve sentence wording by removing a superfluous word.

Backpatch to v17 where pg_combinebackup was addedd

Author: Amul Sul <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Discussion: https://postgr.es/m/CAAJ_b95ecWgzcS4K3Dx0E_Yp-SLwK5JBasFgioKMSjhQLw9xvg@mail.gmail.com
Backpatch-through: 17

Fix incorrect format placeholders

BlockNumber is unsigned int. Fix for commit 14ffaece0fb.

Add more source files to pg_verifybackup/nls.mk

also related to commit 8dfd3129027

Doc: use "an SQL" consistently rather than "a SQL"

Per the precedent set by 04539e73f, adjust article prefixes for "SQL" to
use "an" consistently rather than "a", i.e., "an es-que-ell" rather than
"a sequel".

Both of these are new to v18. Also see b1b13d2b5, d866f0374 and
7bdd489d3.

Mark sslkeylogfile as Debug option

Mark the sslkeylogile option as "D" debug as this truly is a debug
option, and it will allow postgres_fdw et.al to filter it out as
well. Also update the display length to match that for an ssl key
as they are both filename based inputs.

Author: Daniel Gustafsson <[email protected]>
Reported-by: Jacob Champion <[email protected]>
Discussion: https://postgr.es/m/CAOYmi+=5GyBKpu7bU4D_xkAnYJTj=rMzGaUvHO99-DpNG_YKcw@mail.gmail.com

Make AIO error test more portable

Alpine Linux's C library (musl) spells one error message differently.

Reported-by: Wolfgang Walther

Free memory properly in pg_restore.c

Thinko in commit 39729ec01d2. Mea maxima culpa.

Per Mahendra Singh Thalor <[email protected]>

Doc: do a little copy-editing on Index Storage Parameters list.

Add a paragraph break per suggestion from David G. Johnston.
Use a consistent voice for all the different parameter
descriptions, and fix a couple of grammatical issues.

Reported-by: Igor Korot <[email protected]>
Co-authored-by: "David G. Johnston" <[email protected]>
Co-authored-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CA+FnnTz=EW1VQRpWB9J+G-NSchrPFcw4nR7d0JqzEK9jWKB35A@mail.gmail.com

Fix GIN's shimTriConsistentFn to not corrupt its input.

Commit 0f21db36d made an assumption that GIN triConsistentFns
would not modify their input entryRes[] arrays.  But in fact,
the "shim" triConsistentFn that we use for opclasses that don't
supply their own did exactly that, potentially leading to wrong
answers from a GIN index search.  Through bad luck, none of the
test cases that we have for such opclasses exposed the bug.

One response to this could be that the assumption of consistency check
functions not modifying entryRes[] arrays is a bad one, but it still
seems reasonable to me.  Notably, shimTriConsistentFn is itself
assuming that with respect to the underlying boolean consistentFn,
so it's sure being self-centered in supposing that it gets to do so.

Fortunately, it's quite simple to fix shimTriConsistentFn to restore
the entry-time state of entryRes[], so let's do that instead.

This issue doesn't affect any core GIN opclasses, since they all
supply their own triConsistentFns.  It does affect contrib modules
btree_gin, hstore, and intarray.

Along the way, I (tgl) noticed that shimTriConsistentFn failed to
pick up on a "recheck" flag returned by its first call to the boolean
consistentFn.  This may be only a latent problem, since it would be
unlikely for a consistentFn to set recheck for the all-false case
and not any other cases.  (Indeed, none of our contrib modules do
that.)  Nonetheless, it's formally wrong.

Reported-by: Vinod Sridharan <[email protected]>
Author: Vinod Sridharan <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CAFMdLD7XzsXfi1+DpTqTgrD8XU0i2C99KuF=5VHLWjx4C1pkcg@mail.gmail.com
Backpatch-through: 13

Harmonize function parameter names for Postgres 18.

Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places. These
inconsistencies were all introduced during Postgres 18 development.

This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).

Fix instability with WAL fsync test in stats.sql

A backend using wal_sync_method set to "open_sync" or "open_datasync"
would fail the test checking the WAL sync data in pg_stat_io. These
modes guarantee that a sync is done when WAL is written to disk, and the
data checked by the test is not incremented in this case,
issue_xlog_fsync() doing nothing.

Oversight in commit a051e71e28a1.

Author: Sami Imseih <[email protected]>
Discussion: https://postgr.es/m/CAA5RZ0uxwg3xAi4nvdBMJ-zJQEeyg+RotuU+ebM2F6CKmnvaYA@mail.gmail.com

Fix recently introduced typos

This fixes typos in docs and comments introduced during the v18
development cycle, to keep them from ending up in backbranches.

Author: Jacob Brazeal <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Discussion: https://postgr.es/m/CA+COZaCgGua25f2hSrjrDLJcJJAHkwoKgTTqUy-wyL1=64JNjw@mail.gmail.com

Add missing space in pg_restore documentation.

Oversight in commit 1495eff7bd.

Add missing source file to pg_verifybackup/nls.mk

added by commit 8dfd3129027

Add missing source file to pg_dump/nls.mk

added by commit c1da7281060

Add missing source file to pg_upgrade/nls.mk

added by commit 40e2e5e92b7

Add missing PGDLLIMPORT markings

Discussion: https://www.postgresql.org/message-id/flat/25095db5-b595-4b85-9100-d358907c25b5%40eisentraut.org

Fix race with synchronous_standby_names at startup

synchronous_standby_names cannot be reloaded safely by backends, and the
checkpointer is in charge of updating a state in shared memory if the
GUC is enabled in WalSndCtl, to let the backends know if they should
wait or not for a given LSN.  This provides a strict control on the
timing of the waiting queues if the GUC is enabled or disabled, then
reloaded.  The checkpointer is also in charge of waking up the backends
that could be waiting for a LSN when the GUC is disabled.

This logic had a race condition at startup, where it would be possible
for backends to not wait for a LSN even if synchronous_standby_names is
enabled.  This would cause visibility issues with transactions that we
should be waiting for but they were not.  The problem lasts until the
checkpointer does its initial update of the shared memory state when it
loads synchronous_standby_names.

In order to take care of this problem, the shared memory state in
WalSndCtl is extended to detect if it has been initialized by the
checkpointer, and not only check if synchronous_standby_names is
defined.  In WalSndCtlData, sync_standbys_defined is renamed to
sync_standbys_status, a bits8 able to know about two states:
- If the shared memory state has been initialized.  This flag is set by
the checkpointer at startup once, and never removed.
- If synchronous_standby_names is known as defined in the shared memory
state.  This is the same as the previous sync_standbys_defined in
WalSndCtl.

This method gives a way for backends to decide what they should do until
the shared memory area is initialized, and they now ultimately fall back
to a check on the GUC value in this case, which is the best thing that
can be done.

Fortunately, SyncRepUpdateSyncStandbysDefined() is called immediately by
the checkpointer when this process starts, so the window is very narrow.
It is possible to enlarge the problematic window by making the
checkpointer wait at the beginning of SyncRepUpdateSyncStandbysDefined()
with a hardcoded sleep for example, and doing so has showed that a 2PC
visibility test is indeed failing.  On machines slow enough, this bug
would cause spurious failures.

In 17~, we have looked at the possibility of adding an injection point
to have a reproducible test, but as the problematic window happens at
early startup, we would need to invent a way to make an injection point
optionally persistent across restarts when attached, something that
would be fine for this case as it would involve the checkpointer.  This
issue is quite old, and can be reproduced on all the stable branches.

Author: Melnikov Maksim <[email protected]>
Co-authored-by: Michael Paquier <[email protected]>
Discussion: https://postgr.es/m/163fcbec-900b-4b07-beaa-d2ead8634bec@postgrespro.ru
Backpatch-through: 13

Add code comment explaining ins_since_vacuum and aborted inserts

Sami complained that there's a discrepancy between n_mod_since_analyze
and n_ins_since_vacuum, as the former only accounts for committed changes
and the latter tracks committed and aborted inserts.  Nobody seemed
overly concerned that this would cause any concerning issues.  The
repercussions, from what I can tell, are limited to causing an
autovacuum to trigger for inserts sooner than it otherwise might. For
typical ratios of commits to aborts, it's unlikely to ever be noticed.

Fixing things to make it so n_ins_since_vacuum only displays committed
inserts would require an additional field in PgStat_TableCounts, which
does not quite seem worthwhile at this stage.  This commit just adds a
comment with some details to mention that we know about it, which will
hopefully prevent repeat discussions.

Reported-by: Sami Imseih <[email protected]>
Author: David Rowley <[email protected]>
Reviewed-by: Sami Imseih <[email protected]>
Discussion: https://postgr.es/m/CAApHDvpgV3a-R2EGmPOh0L-x3pHbZpM3y4dySWfy+UqUazwDQA@mail.gmail.com

Fix fat fingering in 22cb6d28950

Per Rainier Vilela

Improve various new-to-v18 appendStringInfo calls

Similar to 8461424fd, here we adjust a few new locations which were not
using the most suitable appendStringInfo* function for the intended
purpose.

Author: David Rowley <[email protected]
Discussion: https://postgr.es/m/CAApHDvqJnNjueb=Eoj8K+8n0g7nj_AcPWSiCj5RNV4fDejAfqA@mail.gmail.com

Rename global variable backing DSA area

The global variable backing the DSA area for Memory Context stats
reporting had a too generic name, rename to be more descriptive.
Independently reported by Peter and Laurenz.

Author: Daniel Gustafsson <[email protected]>
Reported-by: Peter Eisentraut <[email protected]>
Reported-by: Laurenz Albe <[email protected]>
Discussion: https://postgr.es/m/d51172bd4e7f4b07a18a0288ca1b1c28a71a5f6a [email protected]
Discussion: https://postgr.es/m/25095db5-b595-4b85-9100-d358907c25b5@eisentraut.org

Fix memory leak in pg_restore.c

Oversight in 1495eff7bdb

Author: Ranier Vilela <[email protected]>

Doc: remove long-obsolete advice about generated constraint names.

It's been twenty years since we generated constraint names that
look like "$N". So this advice about double-quoting such names
is well past its sell-by date, and now it merely seems confusing.

Reported-by: Yaroslav Saburov <[email protected]>
Author: "David G. Johnston" <[email protected]>
Discussion: https://postgr.es/m/174393459040.678.17810152410419444783@wrigleys.postgresql.org
Backpatch-through: 13

Remove useless check for negative result of ip_addrsize().

By inspection, ip_addrsize() can't return a negative result.
(If it could, we'd have way bigger problems elsewhere.)
So delete useless check in network_send(). Most C compilers
are probably perfectly capable of removing this code by
themselves, but it's confusing/misleading.

Bug: #18889
Reported-by: Daniel Elishakov <[email protected]>
Discussion: https://postgr.es/m/18889-73d4f19e953a629e@postgresql.org

Further cleanup for directory creation on pg_dump/pg_dumpall

Instead of two separate (and different) implementations, refactor to use
a single common routine.

Along the way, remove use of a hardcoded file permissions constant in
favor of the common project setting for directory creation.

Author: Mahendra Singh Thalor <[email protected]>

Discussion: https://postgr.es/m/CAKYtNApihL8X1h7XO-zOjznc8Ca66Aevgvhc9zOTh6DBh2iaeA@mail.gmail.com

Fix data loss in logical replication.

Data loss can happen when the DDLs like ALTER PUBLICATION ... ADD TABLE ...
or ALTER TYPE ... that don't take a strong lock on table happens
concurrently to DMLs on the tables involved in the DDL. This happens
because logical decoding doesn't distribute invalidations to concurrent
transactions and those transactions use stale cache data to decode the
changes. The problem becomes bigger because we keep using the stale cache
even after those in-progress transactions are finished and skip the
changes required to be sent to the client.

This commit fixes the issue by distributing invalidation messages from
catalog-modifying transactions to all concurrent in-progress transactions.
This allows the necessary rebuild of the catalog cache when decoding new
changes after concurrent DDL.

We observed performance regression primarily during frequent execution of
*publication DDL* statements that modify the published tables. The
regression is minor or nearly nonexistent for DDLs that do not affect the
published tables or occur infrequently, making this a worthwhile cost to
resolve a longstanding data loss issue.

An alternative approach considered was to take a strong lock on each
affected table during publication modification. However, this would only
address issues related to publication DDLs (but not the ALTER TYPE ...)
and require locking every relation in the database for publications
created as FOR ALL TABLES, which is impractical.

The bug exists in all supported branches, but we are backpatching till 14.
The fix for 13 requires somewhat bigger changes than this fix, so the fix
for that branch is still under discussion.

Reported-by: hubert depesz lubaczewski <[email protected]>
Reported-by: Tomas Vondra <[email protected]>
Author: Shlok Kyal <[email protected]>
Author: Hayato Kuroda <[email protected]>
Reviewed-by: Zhijie Hou <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Tested-by: Benoit Lobréau <[email protected]>
Backpatch-through: 14
Discussion: https://postgr.es/m/de52b282-1166-1180-45a2-8d8917ca74c6@enterprisedb.com
Discussion: https://postgr.es/m/CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=kLQ@mail.gmail.com

Fix incorrect format placeholders

for commits 8f427187db7, 6ee3b91bad2

Update wording in optimizer/README for EquivalenceClasses

d69d45a5a changed how em_is_child members are stored in
EquivalenceClasses. Children are no longer stored in the ec_members
list. optimizer/README mentioned that most operations "should ignore
child members", but that felt a little untrue now since child members
are now stored in a separate place, they simply won't be found by the
normal means of looking (a foreach loop over ec_members), and if you don't
find them, there's technically no need to "ignore" them.

Here we tweak the wording slightly to reflect the new storage location
for child members.

Reported-by: Amit Langote <[email protected]>
Author: Amit Langote <[email protected]>
Author: David Rowley <[email protected]>
Discussion: https://postgr.es/m/CA+HiwqE8v=EuAP_3F_A2xn8zWx+nG_etW_Fe_DvKO-Fkx=+DdQ@mail.gmail.com

Cosmetic fixes for pg_createsubscriber's -all option.

Author: Peter Smith <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Discussion: https://postgr.es/m/CAHut+PsmSCQ-ENSDQ0YOUcsgzT=GG-E9jyXBvxd51A_dMXH5XA@mail.gmail.com

ci: Check for missing dependencies in meson builds

Extends the Linux and Windows meson builds with a check for missing
dependencies by running

ninja -t missingdeps

after the build. This highlights unindended dependencies.

Reviewed-by: Andres Freund <[email protected]>
https://postgr.es/m/CALdSSPi5fj0a7UG7Fmw2cUD1uWuckU_e8dJ+6x-bJEokcSXzqA@mail.gmail.com

Cleanup of pg_numa.c

This moves/renames some of the functions defined in pg_numa.c:

* pg_numa_get_pagesize() is renamed to pg_get_shmem_pagesize(), and
  moved to src/backend/storage/ipc/shmem.c. The new name better reflects
  that the page size is not related to NUMA, and it's specifically about
  the page size used for the main shared memory segment.

* move pg_numa_available() to src/backend/storage/ipc/shmem.c, i.e. into
  the backend (which more appropriate for functions callable from SQL).
  While at it, improve the comment to explain what page size it returns.

* remove unnecessary includes from src/port/pg_numa.c, adding
  unnecessary dependencies (src/port should be suitable for frontent).
  These were either leftovers or unnecessary thanks to the other changes
  in this commit.

This eliminates unnecessary dependencies on backend symbols, which we
don't want in src/port.

Reported-by: Kirill Reshke <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
https://postgr.es/m/CALdSSPi5fj0a7UG7Fmw2cUD1uWuckU_e8dJ+6x-bJEokcSXzqA@mail.gmail.com

pg_upgrade: Mention that we preserve database OIDs in a comment.

Oversight in commit aa01051418.

Reported-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/4055696.1744134682%40sss.pgh.pa.us

Fix performance issue in deadlock-parallel isolation test.

With debug_discard_caches = 1, the runtime of this test script
increased by about a factor of 10 after commit 0dca5d68d.  That's
causing some of our buildfarm animals to fail with a timeout.

The reason for the increased time is that now we are re-planning
some intentionally-non-inlineable SQL functions on every execution,
where the previous coding held onto the original plans throughout
the outer query.  The previous behavior was arguably quite buggy,
so I don't think 0dca5d68d deserves blame here.  But we would
like this test script to not take so long.

To fix, instead of forcing a "parallel safe" label via a
non-inlineable SQL function, apply it directly to the advisory-lock
functions by making internal-language aliases for them.  A small
problem is that the advisory-lock functions return void but this
test would really like them to return integer 1.  I cheated here by
declaring the aliases as returning "int".  That's perhaps undue
familiarity with the implementation of PG_RETURN_VOID(), but that
hasn't changed in twenty years and is unlikely to do so in the next
twenty.  That gets us an integer 0 result, and then an inline-able
wrapper to convert that to an integer 1 allows the rest of the script
to remain unchanged.

For me, this reduces the runtime with debug_discard_caches = 1
by about 100x, making the test comfortably faster than before
instead of slower.

Discussion: https://postgr.es/m/136163.1744179562@sss.pgh.pa.us

Fix test races between syscache-update-pruned.spec and autovacuum.

This spec fails ~3% of my Valgrind runs, and the spec has failed on Valgrind
buildfarm member skink at a similar rate.  Two problems contributed to that:

- A competing buffer pin triggered VACUUM's lazy_scan_noprune() path, causing
  "tuples missed: 1 dead from 1 pages not removed due to cleanup lock
  contention".  FREEZE fixes that.

- The spec ran lazy VACUUM immediately after VACUUM FULL.  The spec implicitly
  assumed lazy VACUUM prunes the one tuple that VACUUM FULL made dead.  First
  wait for old snapshots, making that assumption reliable.

This also adds two forms of defense in depth:

- Wait for snapshots using shared catalog pruning rules (VISHORIZON_SHARED).
  This avoids the removable cutoff moving backward when an XID-bearing
  autoanalyze process runs in another database.  That may never happen in this
  test, but it's cheap insurance.

- Use lazy VACUUM option DISABLE_PAGE_SKIPPING.  Commit
  c2dc1a79767a0f947e1145f82eb65dfe4360d25f did this for a related requirement
  in other tests, but I suspect FREEZE is necessary and sufficient in all
  these tests.

Back-patch to v17, where the test first appeared.

Reported-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/sv3taq4e6ea4qckimien3nxp3sz4b6cw6sfcy4nhwl52zpur4g@h6i6tohxmizu
Backpatch-through: 17

Update config.guess and config.sub

Fix a few oversights in the longer cancel keys patch

Change MyCancelKeyLength's type from uint8 to int. While it always
fits in a uint8, plain int is less surprising, as there's no
particular reason for it to be uint8.

Fix one ProcSignalInit caller that passed 'false' instead of NULL for
the pointer argument.

Author: Peter Eisentraut <[email protected]>
Discussion: https://www.postgresql.org/message-id/61be9e31-7b7d-49d5-bc11-721800d89d64@eisentraut.org

Perform missed catversion bump

Commit c57971034e69ca renamed an argument for a function but missed
to bump the catversion to reflect this.

Reported-by: David Rowley <[email protected]>
Discussion: https://postgr.es/m/CAApHDvqOega=dPtu3h2C5fJWJEuaGCMDib_sVfhKQqgUNJVmFA@mail.gmail.com

Doc: note that two examples in optimizer/README are oversimplified.

These examples fail to account for join clauses generated by
EquivalenceClasses, but since we haven't mentioned EquivalenceClasses
yet it seems like it'd just add confusion to make them fully accurate.
Instead, parenthetically note that they're oversimplified.

Reported-by: Zeyuan Hu <[email protected]>
Co-authored-by: David Rowley <[email protected]>
Co-authored-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CACvHWmYFo+60yMqKJajDDvKN5EM41YHrCT3oxukwXmGAqpWvyw@mail.gmail.com

Adjust AdjustUpgrade.pm for commit b1720fe63.

Need to delete the functions we no longer have available from
the dumps to be reloaded from old versions.

Per buildfarm.

Move contrib/spi testing from core regression tests to contrib/spi.

It's weird to have the core regression tests depending on contrib
code, and coverage testing shows that those test queries add nothing
to the core-code coverage of the core tests.  So pull those test bits
out and put them into ordinary test scripts inside contrib/spi/,
making that more like other contrib modules.

Aside from being structurally nicer, anything we can take out of the
core tests (which are executed multiple times per check-world run)
and put into tests executed only once should be a win.  It doesn't
look like this change will buy a whole lot of milliseconds, but a
cycle saved is a cycle earned.

Also, there is some discussion around possibly removing refint and/or
autoinc altogether.  I don't know if that will happen, but we'd
certainly need to decouple them from the core tests to do so.

The tests for autoinc were quite intertwined with the undocumented
"ttdummy" trigger in regress.c.  That made the tests very hard to
understand and contributed nothing to autoinc's testing either.
So I just deleted ttdummy and rewrote the autoinc tests without it.

I realized while doing this that the description of autoinc in
the SGML docs is not a great description of what the function
actually does, so the patch includes some updates to those docs.

Author: Tom Lane <[email protected]>
Reviewed-by: Heikki Linnakangas <[email protected]>
Discussion: https://postgr.es/m/3872677.1744077559@sss.pgh.pa.us

Rename argument in pg_get_process_memory_contexts().

During development the third argument to pg_get_process_memory_contexts
was a retry count, but it was changed to a timeout instead. The param
name was accidentally left in pg_proc.dat though. Fix by renaming to
the correct parameter name.

Author: Fujii Masao <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Discussion: https://postgr.es/m/3eb40b3e-45c7-426a-b7f8-81f7d05a9b53@oss.nttdata.com

Fix incorrect format placeholder

for commit 749a9e20c97

Prevent 006_transfer_modes.pl from leaving files behind.

This test was leaving files like delete_old_cluster.{sh,bat} in the
source directory for VPATH and meson builds. To fix, change the
directory to tmp_check before running the test, as was done in
commits 15b6d21553, 8af917be6b, and c462b054ba.

Oversight in commit af0d4901c1.

Reported-by: Andrew Dunstan <[email protected]> (on Discord)
Reviewed-by: Michael Paquier <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/Z_RHkG770w3SE0yU%40nathan

ci: Add MBUILD_TARGET for NetBSD and OpenBSD

Commit b2bdb972c0 added MBUILD_TARGET to ensure that meson builds
the tests before running them, this adds MBUILD_TARGET to OpenBSD
and NetBSD builds as well where it was missing.

No backpatching since OpenBSD and NetBSD support does not exist
in the backbranch CI.

Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Discussion: https://postgr.es/m/CAN55FZ2LNnRrtL+cpSdEg44fQcLPq_GjJjfNa0vz+xqEdq=ZHw@mail.gmail.com

pg_buffercache: Change page_num type to bigint

The page_num was defined as integer, which should be sufficient for the
near future (with 4K pages it's 8TB). But it's virtually free to return
bigint, and get a wider range. This was agreed on the thread, but I
forgot to tweak this in ba2a3c2302f1.

While at it, make the data types in CREATE VIEW a bit more consistent.

Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.co

doc: Correct pg_shmem_allocations_numa.size data type

The code in pg_get_shmem_allocations_numa() returned 'size' as int64,
but the docs said int32.

Report and fix by Noriyoshi Shinoda.

Reported-by: Noriyoshi Shinoda <[email protected]>
Discussion: https://postgr.es/m/DM4PR84MB1734308EB741A6ECFF040C27EEAA2@DM4PR84MB1734.NAMPRD84.PROD.OUTLOOK.COM

Fix uninitialized index information access during apply.

The issue happens when building conflict information during apply of
INSERT or UPDATE operations that violate unique constraints on leaf
partitions.

The problem was introduced in commit 9ff68679b5, which removed the
redundant calls to ExecOpenIndices/ExecCloseIndices. The previous code was
relying on the redundant ExecOpenIndices call in
apply_handle_tuple_routing() to build the index information required for
unique key conflict detection.

The fix is to delay building the index information until a conflict is
detected instead of relying on ExecOpenIndices to do the same. The
additional benefit of this approach is that it avoids building index
information when there is no conflict.

Author: Hou Zhijie <[email protected]>
Reviewed-by:Reviewed-by: Amit Kapila <[email protected]>
Discussion: https://postgr.es/m/TYAPR01MB57244ADA33DDA57119B9D26494A62@TYAPR01MB5724.jpnprd01.prod.outlook.com

Fix typo in docs.

Typo in previous commit.

Introduce file_copy_method setting.

It can be set to either COPY (the default) or CLONE if the system
supports it. CLONE causes callers of copydir(), currently CREATE
DATABASE ... STRATEGY=FILE_COPY and ALTER DATABASE ... SET TABLESPACE =
..., to use copy_file_range (Linux, FreeBSD) or copyfile (macOS) to copy
files instead of a read-write loop over the contents.

CLONE gives the kernel the opportunity to share block ranges on
copy-on-write file systems and push copying down to storage on others,
depending on configuration. On some systems CLONE can be used to clone
large databases quickly with CREATE DATABASE ... TEMPLATE=source
STRATEGY=FILE_COPY.

Other operating systems could be supported; patches welcome.

Co-authored-by: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Ranier Vilela <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGLM%2Bt%2BSwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w%40mail.gmail.com

Add function to get memory context stats for processes

This adds a function for retrieving memory context statistics
and information from backends as well as auxiliary processes.
The intended usecase is cluster debugging when under memory
pressure or unanticipated memory usage characteristics.

When calling the function it sends a signal to the specified
process to submit statistics regarding its memory contexts
into dynamic shared memory.  Each memory context is returned
in detail, followed by a cumulative total in case the number
of contexts exceed the max allocated amount of shared memory.
Each process is limited to use at most 1Mb memory for this.

A summary can also be explicitly requested by the user, this
will return the TopMemoryContext and a cumulative total of
all lower contexts.

In order to not block on busy processes the caller specifies
the number of seconds during which to retry before timing out.
In the case where no statistics are published within the set
timeout,  the last known statistics are returned, or NULL if
no previously published statistics exist.  This allows dash-
board type queries to continually publish even if the target
process is temporarily congested.  Context records contain a
timestamp to indicate when they were submitted.

Author: Rahila Syed <[email protected]>
Reviewed-by: Daniel Gustafsson <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: Atsushi Torikoshi <[email protected]>
Reviewed-by: Fujii Masao <[email protected]>
Reviewed-by: Alexander Korotkov <[email protected]>
Discussion: https://postgr.es/m/CAH2L28v8mc9HDt8QoSJ8TRmKau_8FM_HKS41NeO9-6ZAkuZKXw@mail.gmail.com

Increase BAS_BULKREAD based on effective_io_concurrency

Before, BAS_BULKREAD was always of size 256kB. With the default
io_combine_limit of 16, that only allowed 1-2 IOs to be in flight -
insufficient even on very low latency storage.

We don't just want to increase the size to a much larger hardcoded value, as
very large rings (10s of MBs of of buffers), appear to have negative
performance effects when reading in data that the OS has cached (but not when
actually needing to do IO).

To address this, increase the size of BAS_BULKREAD to allow for
io_combine_limit * effective_io_concurrency buffers getting read in. To
prevent the ring being much larger than useful, limit the increased size with
GetPinLimit().

The formula outlined above keeps the ring size to sizes for which we have not
observed performance regressions, unless very large effective_io_concurrency
values are used together with large shared_buffers setting.

Reviewed-by: Thomas Munro <[email protected]>
Discussion: https://postgr.es/m/lqwghabtu2ak4wknzycufqjm5ijnxhb4k73vzphlt2a3wsemcd@gtftg44kdim6
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah@brqs62irg4dt

Add pg_buffercache_evict_{relation,all} functions

In addition to the added functions, the pg_buffercache_evict() function now
shows whether the buffer was flushed.

pg_buffercache_evict_relation(): Evicts all shared buffers in a
relation at once.
pg_buffercache_evict_all(): Evicts all shared buffers at once.

Both functions provide mechanism to evict multiple shared buffers at
once. They are designed to address the inefficiency of repeatedly calling
pg_buffercache_evict() for each individual buffer, which can be time-consuming
when dealing with large shared buffer pools. (e.g., ~477ms vs. ~2576ms for
16GB of fully populated shared buffers).

These functions are intended for developer testing and debugging
purposes and are available to superusers only.

Minimal tests for the new functions are included. Also, there was no test for
pg_buffercache_evict(), test for this added too.

No new extension version is needed, as it was already increased this release
by ba2a3c2302f.

Author: Nazir Bilal Yavuz <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Aidar Imamov <[email protected]>
Reviewed-by: Joseph Koshakow <[email protected]>
Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw%40mail.gmail.com

Speedup child EquivalenceMember lookup in planner

When planning queries to partitioned tables, we clone all
EquivalenceMembers belonging to the partitioned table into em_is_child
EquivalenceMembers for each non-pruned partition.  For partitioned tables
with large numbers of partitions, this meant the ec_members list could
become large and code searching that list would become slow.  Effectively,
the more partitions which were present, the more searches needed to be
performed for operations such as find_ec_member_matching_expr() during
create_plan() and the more partitions present, the longer these searches
would take, i.e., a quadratic slowdown.

To fix this, here we adjust how we store EquivalenceMembers for
em_is_child members.  Instead of storing these directly in ec_members,
these are now stored in a new array of Lists in the EquivalenceClass,
which is indexed by the relid.  When we want to find EquivalenceMembers
belonging to a certain child relation, we can narrow the search to the
array element for that relation.

To make EquivalenceMember lookup easier and to reduce the amount of code
change, this commit provides a pair of functions to allow iteration over
the EquivalenceMembers of an EC which also handles finding the child
members, if required.  Callers that never need to look at child members
can remain using the foreach loop over ec_members, which will now often
be faster due to only parent-level members being stored there.

The actual performance increases here are highly dependent on the number
of partitions and the query being planned.  Performance increases can be
visible with as few as 8 partitions, but the speedup is marginal for
such low numbers of partitions.  The speedups become much more visible
with a few dozen to hundreds of partitions.  With some tested queries
using 56 partitions, the planner was around 3x faster than before.  For
use cases with thousands of partitions, these are likely to become
significantly faster.  Some testing has shown planner speedups of 60x or
more with 8192 partitions.

Author: Yuya Watari <[email protected]>
Co-authored-by: David Rowley <[email protected]>
Reviewed-by: David Rowley <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: Andrey Lepikhov <[email protected]>
Reviewed-by: Alena Rybakina <[email protected]>
Reviewed-by: Dmitry Dolgov <[email protected]>
Reviewed-by: Amit Langote <[email protected]>
Reviewed-by: Ashutosh Bapat <[email protected]>
Tested-by: Thom Brown <[email protected]>
Tested-by: newtglobal postgresql_contributors <[email protected]>
Discussion: https://postgr.es/m/CAJ2pMkZNCgoUKSE%2B_5LthD%2BKbXKvq6h2hQN8Esxpxd%2Bcxmgomg%40mail.gmail.com

Stabilize 035_standby_logical_decoding.pl.

Some tests try to invalidate logical slots on the standby server by
running VACUUM on the primary. The problem is that xl_running_xacts was
getting generated and replayed before the VACUUM command, leading to the
advancement of the active slot's catalog_xmin. Due to this, active slots
were not getting invalidated, leading to test failures.

We fix it by skipping the generation of xl_running_xacts for the required
tests with the help of injection points. As the required interface for
injection points was not present in back branches, we fixed the failing
tests in them by disallowing the slot to become active for the required
cases (where rows_removed conflict could be generated).

Author: Hayato Kuroda <[email protected]>
Reviewed-by: Bertrand Drouvot <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Backpatch-through: 16, where it was introduced
Discussion: https://postgr.es/m/[email protected]

Fix PG 17 [NOT] NULL optimization bug for domains

A PG 17 optimization allowed columns with NOT NULL constraints to skip
table scans for IS NULL queries, and to skip IS NOT NULL checks for IS
NOT NULL queries. This didn't work for domain types, since domain types
don't follow the IS NULL/IS NOT NULL constraint logic. To fix, disable
this optimization for domains for PG 17+.

Reported-by: Jan Behrens
Diagnosed-by: Tom Lane
Discussion: https://postgr.es/m/[email protected]

Backpatch-through: 17

Flush the IO statistics of active WAL senders more frequently

WAL senders do not flush their statistics until they exit, limiting the
monitoring possible for live processes.  This is penalizing when WAL
senders are running for a long time, like in streaming or logical
replication setups, because it is not possible to know the amount of IO
they generate while running.

This commit makes WAL senders more aggressive with their statistics
flush, using an internal of 1 second, with the flush timing calculated
based on the existing GetCurrentTimestamp() done before the sleeps done
to wait for some activity.  Note that the sleep done for logical and
physical WAL senders happens in two different code paths, so the stats
flushes need to happen in these two places.

One test is added for the physical WAL sender case, and one for the
logical WAL sender case.  This can be done in a stable fashion by
relying on the WAL generated by the TAP tests in combination with a
stats reset while a server is running, but only on HEAD as WAL data has
been added to pg_stat_io in a051e71e28a1.

This issue exists since a9c70b46dbe and the introduction of pg_stat_io,
so backpatch down to v16.

Author: Bertrand Drouvot <[email protected]>
Reviewed-by: vignesh C <[email protected]>
Reviewed-by: Xuneng Zhou <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 16

Add pg_buffercache_numa view with NUMA node info

Introduces a new view pg_buffercache_numa, showing NUMA memory nodes
for individual buffers. For each buffer the view returns an entry for
each memory page, with the associated NUMA node.

The database blocks and OS memory pages may have different size - the
default block size is 8KB, while the memory page is 4K (on x86). But
other combinations are possible, depending on configure parameters,
platform, etc. This means buffers may overlap with multiple memory
pages, each associated with a different NUMA node.

To determine the NUMA node for a buffer, we first need to touch the
memory pages using pg_numa_touch_mem_if_required, otherwise we might get
status -2 (ENOENT = The page is not present), indicating the page is
either unmapped or unallocated.

The view may be relatively expensive, especially when accessed for the
first time in a backend, as it touches all memory pages to get reliable
information about the NUMA node. This may also force allocation of the
shared memory.

Author: Jakub Wartak <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Bertrand Drouvot <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com

Introduce pg_shmem_allocations_numa view

Introduce new pg_shmem_alloctions_numa view with information about how
shared memory is distributed across NUMA nodes. For each shared memory
segment, the view returns one row for each NUMA node backing it, with
the total amount of memory allocated from that node.

The view may be relatively expensive, especially when executed for the
first time in a backend, as it has to touch all memory pages to get
reliable information about the NUMA node. This may also force allocation
of the shared memory.

Unlike pg_shmem_allocations, the view does not show anonymous shared
memory allocations. It also does not show memory allocated using the
dynamic shared memory infrastructure.

Author: Jakub Wartak <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Bertrand Drouvot <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com

Add support for basic NUMA awareness

Add basic NUMA awareness routines, using a minimal src/port/pg_numa.c
portability wrapper and an optional build dependency, enabled by
--with-libnuma configure option. For now this is Linux-only, other
platforms may be supported later.

A built-in SQL function pg_numa_available() allows checking NUMA
support, i.e. that the server was built/linked with the NUMA library.

The main function introduced is pg_numa_query_pages(), which allows
determining the NUMA node for individual memory pages. Internally the
function uses move_pages(2) syscall, as it allows batching, and is more
efficient than get_mempolicy(2).

Author: Jakub Wartak <[email protected]>
Co-authored-by: Bertrand Drouvot <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Álvaro Herrera <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CAKZiRmxh6KWo0aqRqvmcoaX2jUxZYb4kGp3N%3Dq1w%2BDiH-696Xw%40mail.gmail.com

Use specific collation where needed in new test

Oversight in commit a379061a22a8.

Per Czech buildfarm members jay and hippopotamus.

Fix some issues in contrib/spi/refint.c.

check_foreign_key incorrectly used a single cache entry for its saved
plans for a 'c' (cascade) trigger, although there are two different
queries to execute depending on whether it fires for an update or a
delete.  This caused the wrong things to be done if both types of
event occur in one session.  (This was indeed visible in the triggers
regression test, but apparently nobody ever questioned it.)  To fix,
add the operation type to the cache key.

Its debug log output failed to distinguish update from delete
events, too.

Also, change the intended trigger usage from BEFORE ROW to AFTER ROW,
and add checks insisting on that usage.  BEFORE is really rather
unsafe, since if there are other BEFORE triggers they might change or
cancel the operation we are trying to check.  AFTER triggers are the
standard way to propagate changes to other rows, so we should follow
that way here.

In passing, remove a useless duplicate lookup of the cache entry.

This code is mostly intended as a documentation example, so we
won't consider a back-patch.

Author: Dmitrii Bondar <[email protected]>
Reviewed-by: Paul Jungwirth <[email protected]>
Reviewed-by: Lilian Ontowhee <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/79755a2b18ed4fe5e29da6a87a1e00d1@postgrespro.ru

aio: Make AIO more compatible with valgrind

In some edge cases valgrind flags issues with the memory referenced by
IOs. All of the cases addressed in this change are false positives.

Most of the false positives are caused by UnpinBuffer[NoOwner] marking buffer
data as inaccessible. This happens even though the AIO subsystem still holds a
pin. That's good, there shouldn't be accesses to the buffer outside of AIO
related code until it is pinned by "user" code again. But it requires some
explicit work - if the buffer is not pinned by the current backend, we need to
explicitly mark the buffer data accessible/inaccessible while executing
completion callbacks.

That however causes a cascading issue in IO workers: After the completion
callbacks for a buffer is executed, the page is marked as inaccessible. If
subsequently the same worker is executing IO targeting the same buffer, we
would get an error, as the memory is still marked inaccessible. To avoid that,
we need to explicitly mark the memory as accessible in IO workers.

Another issue is that IO executed in workers or via io_uring will not mark
memory as DEFINED. In the case of workers that is because valgrind does not
track memory definedness across processes. For io_uring that is because
valgrind does not understand io_uring, and therefore its IOs never mark memory
as defined, whether the completions are processed in the defining process or
in another context. It's not entirely clear how to best solve that. The
current user of AIO is not affected, as it explicitly marks buffers as DEFINED
& NOACCESS anyway. Defer solving this issue until we have a user with
different needs.

Per buildfarm animal skink.

Reviewed-by: Noah Misch <[email protected]>
Co-authored-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/3pd4322mogfmdd5nln3zphdwhtmq3rzdldqjwb2sfqzcgs22lf@ok2gletdaoe6

localbuf: Add Valgrind buffer access instrumentation

This mirrors 1e0dfd166b3 (+ 46ef520b9566), for temporary table buffers. This
is mainly interesting right now because the AIO work currently triggers
spurious valgrind errors, and the fix for that is cleaner if temp buffers
behave the same as shared buffers.

This requires one change beyond the annotations themselves, namely to pin
local buffers while writing them out in FlushRelationBuffers().

Reviewed-by: Noah Misch <[email protected]>
Co-authored-by: Noah Misch <[email protected]>
Discussion: https://postgr.es/m/3pd4322mogfmdd5nln3zphdwhtmq3rzdldqjwb2sfqzcgs22lf@ok2gletdaoe6

doc: Fix a typo in pg_recvlogical documentation.

Oversight in cf2655a9029a.

Author: Zhijie Hou <[email protected]>
Discussion: https://postgr.es/m/OS3PR01MB5718DD1466E2B9043448AE5094AA2@OS3PR01MB5718.jpnprd01.prod.outlook.com

Follow-up fixes for SHA-2 patch (commit 749a9e20c).

This changes the check for valid characters in the salt string to
only allow plain ASCII letters and digits. The previous coding was
locale-dependent which doesn't really seem like a great idea here;
moreover it could not work correctly in multibyte encodings.

This fixes a careless pointer-use-after-pfree, too.

Reported-by: Tom Lane <[email protected]>
Reported-by: Andres Freund <[email protected]>
Author: Bernd Helmle <[email protected]>
Discussion: https://postgr.es/m/6fab35422df6b6b9727fdcc243c5fa1c667dd3b5 [email protected]

Fix erroneous construction of functions' dependencies on transforms.

The list of transform objects that a function should use is specified
in CREATE FUNCTION's TRANSFORM clause, and then represented indirectly
in pg_proc.protrftypes.  However, ProcedureCreate completely ignored
that for purposes of constructing pg_depend entries, and instead made
the function depend on any transforms that exist for its parameter or
return data types.  This is bad in both directions: the function could
be made dependent on a transform it does not actually use, or it
could try to use a transform that's since been dropped.  (The latter
scenario would require use of a transform that's not for any of the
parameter or return types, but that seems legit for cases where the
function performs SQL operations internally.)

To fix, pass in the list of transform objects that CreateFunction
identified, and build pg_depend entries from that not from the
parameter/return types.  This results in changes in the expected
test outputs in contrib/bool_plperl, which I guess are due to
different ordering of pg_depend entries -- that test case is
surely not exercising either of the problem scenarios.

This fix is not back-patchable as-is: changing the signature of
ProcedureCreate seems too risky in stable branches.  We could
do something like making ProcedureCreate a wrapper around
ProcedureCreateExt or so.  However, I'm more inclined to do
nothing in the back branches.  We had no field complaints up to
now, so the hazards don't seem to be a big issue in practice.
And we couldn't do anything about existing pg_depend entries,
so a back-patched fix would result in a mishmash of dependencies
created according to different rules.  That cure could be worse
than the disease, perhaps.

I bumped catversion just to lay down a marker that the expected
contents of pg_depend are a bit different than before.

Reported-by: Chapman Flack <[email protected]>
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/3112950.1743984111@sss.pgh.pa.us

Allow NOT NULL constraints to be added as NOT VALID

This allows them to be added without scanning the table, and validating
them afterwards without holding access exclusive lock on the table after
any violating rows have been deleted or fixed.

Doing ALTER TABLE ... SET NOT NULL for a column that has an invalid
not-null constraint validates that constraint.  ALTER TABLE .. VALIDATE
CONSTRAINT is also supported.  There are various checks on whether an
invalid constraint is allowed in a child table when the parent table has
a valid constraint; this should match what we do for enforced/not
enforced constraints.

pg_attribute.attnotnull is now only an indicator for whether a not-null
constraint exists for the column; whether it's valid or invalid must be
queried in pg_constraint.  Applications can continue to query
pg_attribute.attnotnull as before, but now it's possible that NULL rows
are present in the column even when that's set to true.

For backend internal purposes, we cache the nullability status in
CompactAttribute->attnullability that each tuple descriptor carries
(replacing CompactAttribute.attnotnull, which was a mirror of
Form_pg_attribute.attnotnull).  During the initial tuple descriptor
creation, based on the pg_attribute scan, we set this to UNRESTRICTED if
pg_attribute.attnotnull is false, or to UNKNOWN if it's true; then we
update the latter to VALID or INVALID depending on the pg_constraint
scan.  This flag is also copied when tupledescs are copied.

Comparing tuple descs for equality must also compare the
CompactAttribute.attnullability flag and return false in case of a
mismatch.

pg_dump deals with these constraints by storing the OIDs of invalid
not-null constraints in a separate array, and running a query to obtain
their properties.  The regular table creation SQL omits them entirely.
They are then dealt with in the same way as "separate" CHECK
constraints, and dumped after the data has been loaded.  Because no
additional pg_dump infrastructure was required, we don't bump its
version number.

I decided not to bump catversion either, because the old catalog state
works perfectly in the new world.  (Trying to run with new catalog state
and the old server version would likely run into issues, however.)

System catalogs do not support invalid not-null constraints (because
commit 14e87ffa5c54 didn't allow them to have pg_constraint rows
anyway.)

Author: Rushabh Lathia <[email protected]>
Author: Jian He <[email protected]>
Reviewed-by: Álvaro Herrera <[email protected]>
Tested-by: Ashutosh Bapat <[email protected]>
Discussion: https://postgr.es/m/CAGPqQf0KitkNack4F5CFkFi-9Dqvp29Ro=EpcWt=4_hs-Rt+bQ@mail.gmail.com

Clean up error messages from 1495eff7bdb

Quote file names, and mostly avoid hard coded file names. Along the way
make a few other minor improvements.

Discussion: https://postgr.es/m/20250407.152721.1397761902317499205 [email protected]

Add local-address escape "%L" to log_line_prefix.

This escape shows the numeric server IP address that the client
has connected to. Unix-socket connections will show "[local]".
Non-client processes (e.g. background processes) will show "[none]".

We expect that this option will be of interest to only a fairly
small number of users. Therefore the implementation is optimized
for the case where it's not used (that is, we don't do the string
conversion until we have to), and we've not added the field to
csvlog or jsonlog formats.

Author: Greg Sabino Mullane <[email protected]>
Reviewed-by: Cary Huang <[email protected]>
Reviewed-by: David Steele <[email protected]>
Reviewed-by: Jim Jones <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CAKAnmmK-U+UicE-qbNU23K--Q5XTLdM6bj+gbkZBZkjyjrd3Ow@mail.gmail.com

Revert "Use workaround of __builtin_setjmp only on MINGW on MSVCRT"

This reverts commit c313fa4602defe1be947370ab5b217ca163a1e3c.

This is found to cause issues on x86_64 Windows even when using UCRT.

Discussion: https://postgr.es/m/3312149.1744001936@sss.pgh.pa.us

read_stream: Fix overflow hazard with large shared buffers

If the limit returned by GetAdditionalPinLimit() is large, the buffer_limit
variable in read_stream_start_pending_read() can overflow. While the code is
careful to limit buffer_limit PG_INT16_MAX, we subsequently add the number of
forwarded buffers.

The overflow can lead to assertion failures, crashes or wrong query results
when using large shared buffers.

It seems easier to avoid this if we make the buffer_limit variable an int,
instead of an int16. Do so, and clamp buffer_limit after adding the number of
forwarded buffers.

It's possible we might want to address this and related issues more widely by
changing to int instead of int16 more widely, but since the consequences of
this bug can be confusing, it seems better to fix it now.

This bug was introduced in ed0b87caaca.

Discussion: https://postgr.es/m/ewvz3cbtlhrwqk7h6ca6cctiqh7r64ol3pzb3iyjycn2r5nxk5@tnhw3a5zatlr

Remove GUC_NOT_IN_SAMPLE from enable_self_join_elimination

fc069a3a6319 implements Self-Join Elimination (SJE) and provides a new GUC
variable: enable_self_join_elimination.  This new GUC variable was marked
as GUC_NOT_IN_SAMPLE.  However, enable_self_join_elimination is documented
and is not different from any other enable_* GUCs.  Thus, remove
GUC_NOT_IN_SAMPLE from it and add it to the postgresql.conf.sample.

Discussion: https://postgr.es/m/CAPpHfdsqMTEsmxk3aQwt6xPz%2BKpUELO%3D6fzmER9ZRGrbs4uMfA%40mail.gmail.com
Author: Tender Wang <[email protected]>
Reviewed-by: Tom Lane <[email protected]>

psql: Clarify help message for WATCH_INTERVAL

The help message for WATCH_INTERVAL was hard to interpret and didn't
follow the style of other messages, this updates it to nake it fit in
better and be easier to interpret.

Author: Daniel Gustafsson <[email protected]>
Reported-by: Kyotaro Horiguchi <[email protected]>
Reviewed-by: David G. Johnston <[email protected]>
Discussion: https://postgr.es/m/20250326.120732.1167093737847500721 [email protected]

Fix grammar in log message of pg_restore.c

Introduced by 1495eff7bdb0.

Author: Kyotaro Horiguchi <[email protected]>
Discussion: https://postgr.es/m/20250407.151359.72428746612514925 [email protected]

libpq: Fix some issues in TAP tests for service files

The valid service file was not correctly shaped, as append_to_file() was
called with an array as input.  This is changed so as the parameter and
value pairs from the valid connection string are appended to the valid
service file one by one.

Even with the first issue fixed, the tests should fail.  However, they
have been passing because all the connection attempts relied on the
default values given to PGPORT and PGHOST from the node when using
Cluster.pm's connect_ok() and connect_fails(), rather than the data in
the service file.  The test is updated to use an interesting trick: a
dummy node is initialized but not started, and all the connection
attempts are done through it.  This ensures that the data inside the
service file is used for all the connection tests.  Note that breaking
the contents of the valid service file on purpose makes all the tests
that rely on it fail.

Issues introduced by 72c2f36d5727.

Author: Andrew Jackson <[email protected]>
Discussion: https://postgr.es/m/CAKK5BkG_6_YSaebM6gG=8EuKaY7_VX1RFgYeySuwFPh8FZY73g@mail.gmail.com