| Age | Commit message (Collapse) | Author |
|
The SQL spec says that sql_identifier is a domain over varchar,
but it also says that that domain is supposed to represent the set
of valid identifiers for the implementation, in particular applying
a length limit matching the implementation's identifier length limit.
We were declaring sql_identifier as just "character varying", thus
duplicating what the spec says about base type, but entirely failing
at the rest of it.
Instead, let's declare sql_identifier as a domain over type "name".
(We can drop the COLLATE "C" added by commit 6b0faf723, since that's
now implicit in "name".) With the recent improvements to name's
comparison support, there's not a lot of functional difference between
name and varchar. So although in principle this is a spec deviation,
it's a pretty minor one. And correctly enforcing PG's name length limit
is a good thing; on balance this seems closer to the intent of the spec
than what we had.
But that's all just language-lawyering. The *real* reason to do this is
that it makes sql_identifier columns exposed by information_schema views
be just direct representations of the underlying "name" catalog columns,
eliminating a semantic mismatch that was disastrous for performance of
typical queries on the information_schema. In combination with the
recent change to allow dropping no-op CoerceToDomain nodes, this allows
(for example) queries such as
select ... from information_schema.tables where table_name = 'foo';
to produce an indexscan rather than a seqscan on pg_class.
Discussion: https://postgr.es/m/CAFj8pRBUCX4LZ2rA2BbEkdD6NN59mgx+BLo1gO08Wod4RLtcTg@mail.gmail.com
|
|
information_schema output columns that are declared as being type
sql_identifier are supposed to conform to the implementation's rules
for valid identifiers, in particular the identifier length limit.
Several places potentially violated this limit by concatenating a
function's name and OID. (The OID is added to ensure name uniqueness
within a schema, since the spec doesn't expect function name overloading.)
Simply truncating the concatenation result to fit in "name" won't do,
since losing part of the OID might wind up giving non-unique results.
Instead, let's truncate the function name as necessary.
The most practical way to do that is to do it in a C function; the
information_schema.sql script doesn't have easy access to the value
of NAMEDATALEN, nor does it have an easy way to truncate on the basis
of resulting byte-length rather than number of characters.
(There are still a couple of places that cast concatenation results to
sql_identifier, but as far as I can see they are guaranteed not to produce
over-length strings, at least with the normal value of NAMEDATALEN.)
Discussion: https://postgr.es/m/[email protected]
|
|
Using the full width of the CPU's native word size shouldn't cost
anything in typical cases. When working with large bitmapsets,
this halves the number of operations needed for many common BMS
operations. On the right sort of test case, a measurable improvement
is obtained.
David Rowley
Discussion: https://postgr.es/m/CAKJS1f9EGBd2h-VkXvb=51tf+X46zMX5T8h-KYgXEV_u2zmLUw@mail.gmail.com
|
|
Now that name comparison has effectively the same behavior as text
comparison, we might as well merge the name_ops opfamily into text_ops,
allowing cross-type comparisons to be processed without forcing a
datatype coercion first. We need do little more than add cross-type
operators to make the opfamily complete, and fix one or two places
in the planner that assumed text_ops was a single-datatype opfamily.
I chose to unify hash name_ops into hash text_ops as well, since the
types have compatible hashing semantics. This allows marking the
new cross-type equality operators as oprcanhash.
(Note: this doesn't remove the name_ops opclasses, so there's no
breakage of index definitions. Those opclasses are just reparented
into the text_ops opfamily.)
Discussion: https://postgr.es/m/[email protected]
|
|
The "name" comparison operators now all support collations, making them
functionally equivalent to "text" comparisons, except for the different
physical representation of the datatype. They do, in fact, mostly share
the varstr_cmp and varstr_sortsupport infrastructure, which has been
slightly enlarged to handle the case.
To avoid changes in the default behavior of the datatype, set name's
typcollation to C_COLLATION_OID not DEFAULT_COLLATION_OID, so that
by default comparisons to a name value will continue to use strcmp
semantics. (This would have been the case for system catalog columns
anyway, because of commit 6b0faf723, but doing this makes it true for
user-created name columns as well. In particular, this avoids
locale-dependent changes in our regression test results.)
In consequence, tweak a couple of places that made assumptions about
collatable base types always having typcollation DEFAULT_COLLATION_OID.
I have not, however, attempted to relax the restriction that user-
defined collatable types must have that. Hence, "name" doesn't
behave quite like a user-defined type; it acts more like a domain
with COLLATE "C". (Conceivably, if we ever get rid of the need for
catalog name columns to be fixed-length, "name" could actually become
such a domain over text. But that'd be a pretty massive undertaking,
and I'm not volunteering.)
Discussion: https://postgr.es/m/[email protected]
|
|
Up to now we allowed text columns in system catalogs to use collation
"default", but that isn't really safe because it might mean something
different in template0 than it means in a database cloned from template0.
In particular, this could mean that cloned pg_statistic entries for such
columns weren't entirely valid, possibly leading to bogus planner
estimates, though (probably) not any outright failures.
In the wake of commit 5e0928005, a better solution is available: if we
label such columns with "C" collation, then their pg_statistic entries
will also use that collation and hence will be valid independently of
the database collation.
This also provides a cleaner solution for indexes on such columns than
the hack added by commit 0b28ea79c: the indexes will naturally inherit
"C" collation and don't have to be forced to use text_pattern_ops.
Also, with the planned improvement of type "name" to be collation-aware,
this policy will apply cleanly to both text and name columns.
Because of the pg_statistic angle, we should also apply this policy
to the tables in information_schema. This patch does that by adjusting
information_schema's textual domain types to specify "C" collation.
That has the user-visible effect that order-sensitive comparisons to
textual information_schema view columns will now use "C" collation
by default. The SQL standard says that the collation of those view
columns is implementation-defined, so I think this is legal per spec.
At some point this might allow for translation of such comparisons
into indexable conditions on the underlying "name" columns, although
additional work will be needed before that can happen.
Discussion: https://postgr.es/m/[email protected]
|
|
pg_tables already includes partitioned tables, so for consistency
pg_indexes should show partitioned indexes.
Author: Suraj Kharage
Reviewed-by: Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/CAF1DzPVrYo4XNTEnc=PqVp6aLJc7LFYpYR4rX=_5pV=wJ2KdZg@mail.gmail.com
|
|
It appears that all platforms that have sys_siglist[] also have
strsignal(), making that fallback case in pg_strsignal() dead code.
Getting rid of it allows dropping a configure test, which seems worth
more than providing textual signal descriptions on whatever platforms
might still hypothetically have use for the fallback case.
Discussion: https://postgr.es/m/[email protected]
|
|
When partitioned tables were introduced, we failed to realize that by
copying the tablespace handling for other relation kinds with no
physical storage we were causing the secondary effect that their
partitions would not automatically inherit the tablespace setting. This
is surprising and unhelpful, so change it to adopt the behavior
introduced in pg11 (commit 33e6c34c3267) for partitioned indexes: the
parent relation remembers the tablespace specification, which is then
used for any new partitions that don't declare one.
Because this commit changes behavior of the TABLESPACE clause for
partitioned tables (it's no longer a no-op), it is not backpatched.
Author: David Rowley, Álvaro Herrera
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAKJS1f9SxVzqDrGD1teosFd6jBMM0UEaa14_8mRvcWE19Tu0hA@mail.gmail.com
|
|
At least as far back as the 2008 spec, POSIX has defined strsignal(3)
for looking up descriptive strings for signal numbers. We hadn't gotten
the word though, and were still using the crufty old sys_siglist array,
which is in no standard even though most Unixen provide it.
Aside from not being formally standards-compliant, this was just plain
ugly because it involved #ifdef's at every place using the code.
To eliminate the #ifdef's, create a portability function pg_strsignal,
which wraps strsignal(3) if available and otherwise falls back to
sys_siglist[] if available. The set of Unixen with neither API is
probably empty these days, but on any platform with neither, you'll
just get "unrecognized signal". All extant callers print the numeric
signal number too, so no need to work harder than that.
Along the way, upgrade pg_basebackup's child-error-exit reporting
to match the rest of the system.
Discussion: https://postgr.es/m/[email protected]
|
|
Commit ffa4cbd62 added logic to detect SIGPIPE failure of a COPY child
process, but it only worked correctly if the SIGPIPE occurred in the
immediate child process. Depending on the shell in use and the
complexity of the shell command string, we might instead get back
an exit code of 128 + SIGPIPE, representing a shell error exit
reporting SIGPIPE in the child process.
We could just hack up ClosePipeToProgram() to add the extra case,
but it seems like this is a fairly general issue deserving a more
general and better-documented solution. I chose to add a couple
of functions in src/common/wait_error.c, which is a natural place
to know about wait-result encodings, that will test for either a
specific child-process signal type or any child-process signal failure.
Then, adjust other places that were doing ad-hoc tests of this type
to use the common functions.
In RestoreArchivedFile, this fixes a race condition affecting whether
the process will report an error or just silently proc_exit(1): before,
that depended on whether the intermediate shell got SIGTERM'd itself
or reported a child process failing on SIGTERM.
Like the previous patch, back-patch to v10; we could go further
but there seems no real need to.
Per report from Erik Rijkers.
Discussion: https://postgr.es/m/[email protected]
|
|
When we first put in collations support, we basically punted on teaching
pg_statistic, ANALYZE, and the planner selectivity functions about that.
They've just used DEFAULT_COLLATION_OID independently of the actual
collation of the data. It's time to improve that, so:
* Add columns to pg_statistic that record the specific collation associated
with each statistics slot.
* Teach ANALYZE to use the column's actual collation when comparing values
for statistical purposes, and record this in the appropriate slot. (Note
that type-specific typanalyze functions are now expected to fill
stats->stacoll with the appropriate collation, too.)
* Teach assorted selectivity functions to use the actual collation of
the stats they are looking at, instead of just assuming it's
DEFAULT_COLLATION_OID.
This should give noticeably better results in selectivity estimates for
columns with nondefault collations, at least for query clauses that use
that same collation (which would be the default behavior in most cases).
It's still true that comparisons with explicit COLLATE clauses different
from the stored data's collation won't be well-estimated, but that's no
worse than before. Also, this patch does make the first step towards
doing better with that, which is that it's now theoretically possible to
collect stats for a collation other than the column's own collation.
Patch by me; thanks to Peter Eisentraut for review.
Discussion: https://postgr.es/m/[email protected]
|
|
The cache lookup routines for foreign-data wrappers and foreign servers
are extended with an extra argument to handle a set of flags. The only
value which can be used now is to indicate if a missing object should
result in an error or not, and are designed to be extensible on need.
Those new routines are added into the existing set of user-visible
FDW APIs and documented in consequence. They will be used for future
patches to improve the SQL interface for object addresses.
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/CAB7nPqSZxrSmdHK-rny7z8mi=EAFXJ5J-0RbzDw6aus=wB5azQ@mail.gmail.com
|
|
The changes I made in 578b229718e assigned oids below
FirstBootstrapObjectId to objects in include/catalog/*.dat files that
did not have an oid assigned, starting at the max oid explicitly
assigned. Tom criticized that for mainly two reasons:
1) It's not clear which values are manually and which explicitly
assigned.
2) The space below FirstBootstrapObjectId gets pretty crowded, and
some PostgreSQL forks have used oids >= 9000 for their own objects,
to avoid conflicting.
Thus create a new range for objects not assigned explicit oids, but
assigned by genbki.pl. For now 1-9999 is for explicitly assigned oids,
FirstGenbkiObjectId (10000) to FirstBootstrapObjectId (1200) -1 is for
genbki.pl assigned oids, and < FirstNormalObjectId (16384) is for oids
assigned during bootstrap. It's possible that we'll have to adjust
these boundaries, but there's some headroom for now.
Add a note suggesting that oids in forks should be assigned in the
9000-9999 range.
Catversion bump for obvious reasons.
Per complaint from Tom Lane.
Author: Andres Freund
Discussion: https://postgr.es/m/[email protected]
|
|
If a domain has no constraints, then CoerceToDomain doesn't really do
anything and can be simplified to a RelabelType. This not only
eliminates cycles at execution, but allows the planner to optimize better
(for instance, match the coerced expression to an index on the underlying
column). However, we do have to support invalidating the plan later if
a constraint gets added to the domain. That's comparable to the case of
a change to a SQL function that had been inlined into a plan, so all the
necessary logic already exists for plans depending on functions. We
need only duplicate or share that logic for domains.
ALTER DOMAIN ADD/DROP CONSTRAINT need to be taught to send out sinval
messages for the domain's pg_type entry, since those operations don't
update that row. (ALTER DOMAIN SET/DROP NOT NULL do update that row,
so no code change is needed for them.)
Testing this revealed what's really a pre-existing bug in plpgsql:
it caches the SQL-expression-tree expansion of type coercions and
had no provision for invalidating entries in that cache. Up to now
that was only a problem if such an expression had inlined a SQL
function that got changed, which is unlikely though not impossible.
But failing to track changes of domain constraints breaks an existing
regression test case and would likely cause practical problems too.
We could fix that locally in plpgsql, but what seems like a better
idea is to build some generic infrastructure in plancache.c to store
standalone expressions and track invalidation events for them.
(It's tempting to wonder whether plpgsql's "simple expression" stuff
could use this code with lower overhead than its current use of the
heavyweight plancache APIs. But I've left that idea for later.)
Other stuff fixed in passing:
* Allow estimate_expression_value() to drop CoerceToDomain
unconditionally, effectively assuming that the coercion will succeed.
This will improve planner selectivity estimates for cases involving
estimatable expressions that are coerced to domains. We could have
done this independently of everything else here, but there wasn't
previously any need for eval_const_expressions_mutator to know about
CoerceToDomain at all.
* Use a dlist for plancache.c's list of cached plans, rather than a
manually threaded singly-linked list. That eliminates a potential
performance problem in DropCachedPlan.
* Fix a couple of inconsistencies in typecmds.c about whether
operations on domains drop RowExclusiveLock on pg_type. Our common
practice is that DDL operations do drop catalog locks, so standardize
on that choice.
Discussion: https://postgr.es/m/[email protected]
|
|
When GIN vacuum deletes a posting tree page, it assumes that no concurrent
searchers can access it, thanks to ginStepRight() locking two pages at once.
However, since 9.4 searches can skip parts of posting trees descending from the
root. That leads to the risk that page is deleted and reclaimed before
concurrent search can access it.
This commit prevents the risk of above by waiting for every transaction, which
might wait to reference this page, to finish. Due to binary compatibility
we can't change GinPageOpaqueData to store corresponding transaction id.
Instead we reuse page header pd_prune_xid field, which is unused in index pages.
Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com
Author: Andrey Borodin, Alexander Korotkov
Reviewed-by: Alexander Korotkov
Backpatch-through: 9.4
|
|
postgres_fdw's postgresGetForeignPlan() assumes without checking that the
outer_plan it's given for a join relation must have a NestLoop, MergeJoin,
or HashJoin node at the top. That's been wrong at least since commit
4bbf6edfb (which could cause insertion of a Sort node on top) and it seems
like a pretty unsafe thing to Just Assume even without that.
Through blind good fortune, this doesn't seem to have any worse
consequences today than strange EXPLAIN output, but it's clearly trouble
waiting to happen.
To fix, test the node type explicitly before touching Join-specific
fields, and avoid jamming the new tlist into a node type that can't
do projection. Export a new support function from createplan.c
to avoid building low-level knowledge about the latter into FDWs.
Back-patch to 9.6 where the faulty coding was added. Note that the
associated regression test cases don't show any changes before v11,
apparently because the tests back-patched with 4bbf6edfb don't actually
exercise the problem case before then (there's no top-level Sort
in those plans).
Discussion: https://postgr.es/m/[email protected]
|
|
The timestamp generated by the standby at message transmission has been
included in the protocol since its introduction for both the status
update message and hot standby feedback message, but it has never
appeared in pg_stat_replication. Seeing this timestamp does not matter
much with a cluster which has a lot of activity, but on a mostly-idle
cluster, this makes monitoring able to react faster than the configured
timeouts.
Author: MyungKyu LIM
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/1657809367.407321.1533027417725.JavaMail.jboss@ep2ml404
|
|
Skipping over the "hole" in full page images in the XLOG code was
described as being a form of compression, but this got a bit confusing
since we now have PGLZ-based compression happening, so adjust the
wording to discuss "removing" the "hole" and keeping the talk about
compression to where we're talking about using PGLZ-based compression of
the full page images.
Reviewed-By: Kyotaro Horiguchi
Discussion: https://postgr.es/m/[email protected]
|
|
This allows to set a lower log_min_duration_statement value without
incurring excessive log traffic (which reduces performance). This can
be useful to analyze workloads with lots of short queries.
Author: Adrien Nayrat
Reviewed-by: David Rowley, Vik Fearing
Discussion: https://postgr.es/m/[email protected]
|
|
"\pset format csv", or --csv, selects comma-separated values table format.
This is compliant with RFC 4180, except that we aren't too picky about
whether the record separator is LF or CRLF; also, the user may choose a
field separator other than comma.
This output format is directly compatible with the server's COPY CSV
format, and will also be useful as input to other programs. It's
considerably safer for that purpose than the old recommendation to
use "unaligned" format, since the latter couldn't handle data
containing the field separator character.
Daniel Vérité, reviewed by Fabien Coelho and David Fetter, some
tweaking by me
Discussion: https://postgr.es/m/[email protected]
|
|
recovery.conf settings are now set in postgresql.conf (or other GUC
sources). Currently, all the affected settings are PGC_POSTMASTER;
this could be refined in the future case by case.
Recovery is now initiated by a file recovery.signal. Standby mode is
initiated by a file standby.signal. The standby_mode setting is
gone. If a recovery.conf file is found, an error is issued.
The trigger_file setting has been renamed to promote_trigger_file as
part of the move.
The documentation chapter "Recovery Configuration" has been integrated
into "Server Configuration".
pg_basebackup -R now appends settings to postgresql.auto.conf and
creates a standby.signal file.
Author: Fujii Masao <[email protected]>
Author: Simon Riggs <[email protected]>
Author: Abhijit Menon-Sen <[email protected]>
Author: Sergei Kornilov <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/[email protected]/
|
|
Users of the WaitEventSet and WaitLatch() APIs can now choose between
asking for WL_POSTMASTER_DEATH and then handling it explicitly, or asking
for WL_EXIT_ON_PM_DEATH to trigger immediate exit on postmaster death.
This reduces code duplication, since almost all callers want the latter.
Repair all code that was previously ignoring postmaster death completely,
or requesting the event but ignoring it, or requesting the event but then
doing an unconditional PostmasterIsAlive() call every time through its
event loop (which is an expensive syscall on platforms for which we don't
have USE_POSTMASTER_DEATH_SIGNAL support).
Assert that callers of WaitLatchXXX() under the postmaster remember to
ask for either WL_POSTMASTER_DEATH or WL_EXIT_ON_PM_DEATH, to prevent
future bugs.
The only process that doesn't handle postmaster death is syslogger. It
waits until all backends holding the write end of the syslog pipe
(including the postmaster) have closed it by exiting, to be sure to
capture any parting messages. By using the WaitEventSet API directly
it avoids the new assertion, and as a by-product it may be slightly
more efficient on platforms that have epoll().
Author: Thomas Munro
Reviewed-by: Kyotaro Horiguchi, Heikki Linnakangas, Tom Lane
Discussion: https://postgr.es/m/CAEepm%3D1TCviRykkUb69ppWLr_V697rzd1j3eZsRMmbXvETfqbQ%40mail.gmail.com,
https://postgr.es/m/CAEepm=2LqHzizbe7muD7-2yHUbTOoF7Q+qkSD5Q41kuhttRTwA@mail.gmail.com
|
|
Sets the timestamp to current if not already set. Will acquire more
callers momentarily.
Author: Fabien Coelho
Discussion: https://postgr.es/m/alpine.DEB.2.21.1808111104320.1705@lancre
|
|
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.
This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row. Neither pg_dump nor COPY included the contents of the
oid column by default.
The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.
WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.
Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.
The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such. This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.
The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.
Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).
The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.
While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.
Catversion bump, for obvious reasons.
Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/[email protected]
|
|
The dedicated private buffer to store records is used only for these
crossing a page boundary since 285bd0ac, but its description did not
match completely the reality.
Reported-by: Andrey Lepikhov
Author: Michael Paquier
Discussion: https://postgr.es/m/[email protected]
|
|
For example:
ssl_min_protocol_version = 'TLSv1.1'
ssl_max_protocol_version = 'TLSv1.2'
Reviewed-by: Steve Singer <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/[email protected]
|
|
The 'partoids' list which was constructed by the previous version
of this code was necessarily identical to 'inhoids'. There's no
point to duplicating the list, so avoid that. Instead, construct
the array representation directly from the original 'inhoids' list.
Also, use an array rather than a list for 'boundspecs'. We know
exactly how many items we need to store, so there's really no
reason to use a list. Using an array instead reduces the number
of memory allocations we perform.
Patch by me, reviewed by Michael Paquier and Amit Langote, the
latter of whom also helped with rebasing.
|
|
On some operating systems, it doesn't make sense to retry fsync(),
because dirty data cached by the kernel may have been dropped on
write-back failure. In that case the only remaining copy of the
data is in the WAL. A subsequent fsync() could appear to succeed,
but not have flushed the data. That means that a future checkpoint
could apparently complete successfully but have lost data.
Therefore, violently prevent any future checkpoint attempts by
panicking on the first fsync() failure. Note that we already
did the same for WAL data; this change extends that behavior to
non-temporary data files.
Provide a GUC data_sync_retry to control this new behavior, for
users of operating systems that don't eject dirty data, and possibly
forensic/testing uses. If it is set to on and the write-back error
was transient, a later checkpoint might genuinely succeed (on a
system that does not throw away buffers on failure); if the error is
permanent, later checkpoints will continue to fail. The GUC defaults
to off, meaning that we panic.
Back-patch to all supported releases.
There is still a narrow window for error-loss on some operating
systems: if the file is closed and later reopened and a write-back
error occurs in the intervening time, but the inode has the bad
luck to be evicted due to memory pressure before we reopen, we could
miss the error. A later patch will address that with a scheme
for keeping files with dirty data open at all times, but we judge
that to be too complicated to back-patch.
Author: Craig Ringer, with some adjustments by Thomas Munro
Reported-by: Craig Ringer
Reviewed-by: Robert Haas, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de
|
|
Setting them to SIG_IGN seems unlikely to have any beneficial effect
on that platform, and given the signal numbering collision with SIGABRT,
it could easily have bad effects.
Given the lack of field complaints that can be traced to this, I don't
presently feel a need to back-patch.
Discussion: https://postgr.es/m/[email protected]
|
|
This commit completes the work prepared in 1a0586de36, splitting the
old TupleTableSlot implementation (which could store buffer, heap,
minimal and virtual slots) into four different slot types. As
described in the aforementioned commit, this is done with the goal of
making tuple table slots extensible, to allow for pluggable table
access methods.
To achieve runtime extensibility for TupleTableSlots, operations on
slots that can differ between types of slots are performed using the
TupleTableSlotOps struct provided at slot creation time. That
includes information from the size of TupleTableSlot struct to be
allocated, initialization, deforming etc. See the struct's definition
for more detailed information about callbacks TupleTableSlotOps.
I decided to rename TTSOpsBufferTuple to TTSOpsBufferHeapTuple and
ExecCopySlotTuple to ExecCopySlotHeapTuple, as that seems more
consistent with other naming introduced in recent patches.
There's plenty optimization potential in the slot implementation, but
according to benchmarking the state after this commit has similar
performance characteristics to before this set of changes, which seems
sufficient.
There's a few changes in execReplication.c that currently need to poke
through the slot abstraction, that'll be repaired once the pluggable
storage patchset provides the necessary infrastructure.
Author: Andres Freund and Ashutosh Bapat, with changes by Amit Khandekar
Discussion: https://postgr.es/m/[email protected]
|
|
This yields a minor speedup, which roughly balances the loss from the
upcoming introduction of callbacks to do some operations on slots.
Author: Andres Freund
Discussion: https://postgr.es/m/[email protected]
|
|
This speeds up write operations (INSERT, UPDATE, DELETE, COPY, as well
as the future MERGE) on partitioned tables.
This changes the setup for tuple routing so that it does far less work
during the initial setup and pushes more work out to when partitions
receive tuples. PartitionDispatchData structs for sub-partitioned
tables are only created when a tuple gets routed through it. The
possibly large arrays in the PartitionTupleRouting struct have largely
been removed. The partitions[] array remains but now never contains any
NULL gaps. Previously the NULLs had to be skipped during
ExecCleanupTupleRouting(), which could add a large overhead to the
cleanup when the number of partitions was large. The partitions[] array
is allocated small to start with and only enlarged when we route tuples
to enough partitions that it runs out of space. This allows us to keep
simple single-row partition INSERTs running quickly. Redesign
The arrays in PartitionTupleRouting which stored the tuple translation maps
have now been removed. These have been moved out into a
PartitionRoutingInfo struct which is an additional field in ResultRelInfo.
The find_all_inheritors() call still remains by far the slowest part of
ExecSetupPartitionTupleRouting(). This commit just removes the other slow
parts.
In passing also rename the tuple translation maps from being ParentToChild
and ChildToParent to being RootToPartition and PartitionToRoot. The old
names mislead you into thinking that a partition of some sub-partitioned
table would translate to the rowtype of the sub-partitioned table rather
than the root partitioned table.
Authors: David Rowley and Amit Langote, heavily revised by Álvaro Herrera
Testing help from Jesper Pedersen and Kato Sho.
Discussion: https://postgr.es/m/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com
|
|
Per MSVC complaint on buildfarm member dory.
|
|
Virtual tuple table slots never need tuple deforming. Therefore, if we
know at expression compilation time, that a certain slot will always
be virtual, there's no need to create a tuple deforming routine for
it.
Author: Andres Freund
Discussion: https://postgr.es/m/[email protected]
|
|
Previously this information was computed when JIT compiling an
expression. But the information is useful for assertions in the
non-JIT case too (for assertions), therefore it makes sense to move
it.
This will, in a followup commit, allow to treat different slot types
differently. E.g. for virtual slots there's no need to generate a JIT
function to deform the slot.
Author: Andres Freund
Discussion: https://postgr.es/m/[email protected]
|
|
Upcoming work intends to allow pluggable ways to introduce new ways of
storing table data. Accessing those table access methods from the
executor requires TupleTableSlots to be carry tuples in the native
format of such storage methods; otherwise there'll be a significant
conversion overhead.
Different access methods will require different data to store tuples
efficiently (just like virtual, minimal, heap already require fields
in TupleTableSlot). To allow that without requiring additional pointer
indirections, we want to have different structs (embedding
TupleTableSlot) for different types of slots. Thus different types of
slots are needed, which requires adapting creators of slots.
The slot that most efficiently can represent a type of tuple in an
executor node will often depend on the type of slot a child node
uses. Therefore we need to track the type of slot is returned by
nodes, so parent slots can create slots based on that.
Relatedly, JIT compilation of tuple deforming needs to know which type
of slot a certain expression refers to, so it can create an
appropriate deforming function for the type of tuple in the slot.
But not all nodes will only return one type of slot, e.g. an append
node will potentially return different types of slots for each of its
subplans.
Therefore add function that allows to query the type of a node's
result slot, and whether it'll always be the same type (whether it's
fixed). This can be queried using ExecGetResultSlotOps().
The scan, result, inner, outer type of slots are automatically
inferred from ExecInitScanTupleSlot(), ExecInitResultSlot(),
left/right subtrees respectively. If that's not correct for a node,
that can be overwritten using new fields in PlanState.
This commit does not introduce the actually abstracted implementation
of different kind of TupleTableSlots, that will be left for a followup
commit. The different types of slots introduced will, for now, still
use the same backing implementation.
While this already partially invalidates the big comment in
tuptable.h, it seems to make more sense to update it later, when the
different TupleTableSlot implementations actually exist.
Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar
Discussion: https://postgr.es/m/[email protected]
|
|
Previously materializing a slot always returned a HeapTuple. As
current work aims to reduce the reliance on HeapTuples (so other
storage systems can work efficiently), that needs to change. Thus
split the tasks of materializing a slot (i.e. making it independent
from the underlying storage / other memory contexts) from fetching a
HeapTuple from the slot. For brevity, allow to fetch a HeapTuple from
a slot and materializing the slot at the same time, controlled by a
parameter.
For now some callers of ExecFetchSlotHeapTuple, with materialize =
true, expect that changes to the heap tuple will be reflected in the
underlying slot. Those places will be adapted in due course, so while
not pretty, that's OK for now.
Also rename ExecFetchSlotTuple to ExecFetchSlotHeapTupleDatum and
ExecFetchSlotTupleDatum to ExecFetchSlotHeapTupleDatum, as it's likely
that future storage methods will need similar methods. There already
is ExecFetchSlotMinimalTuple, so the new names make the naming scheme
more coherent.
Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar
Discussion: https://postgr.es/m/[email protected]
|
|
It was not really that obvious that there's meant to be exactly 1 item
in the partexprs List for each zero-valued partattrs element. Some
incorrect code using these fields was the cause of CVE-2018-1052, so
it's worthwhile to mention how they should be used in the comments.
Author: David Rowley <[email protected]>
|
|
The comments above the PartitionedRelPruneInfo struct incorrectly
document how subplan_map and subpart_map are indexed. This seems to
have snuck in on 4e232364033.
Author: David Rowley <[email protected]>
|
|
The reformat_dat_file.pl script, added by 372728b0d49552641, supported
all the necessary options to make it work in a VPATH build, but the
makefile invocations didn't take VPATH into account. Fix that.
Discussion: https://postgr.es/m/[email protected]
Backpatch: 11-, where 372728b0d49552641 was merged
|
|
BufFileSize() can't use off_t, because it's only 32 bits wide on
some systems. BufFile objects can have many 1GB segments so the
total size can exceed 2^31. The only known client of the function
is parallel CREATE INDEX, which was reported to fail when building
large indexes on Windows.
Though this is technically an ABI break on platforms with a 32 bit
off_t and we might normally avoid back-patching it, the function is
brand new and thus unlikely to have been discovered by extension
authors yet, and it's fairly thoroughly broken on those platforms
anyway, so just fix it.
Defect in 9da0cc35. Bug #15460. Back-patch to 11, where this
function landed.
Author: Thomas Munro
Reported-by: Paul van der Linden, Pavel Oskin
Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/15460-b6db80de822fa0ad%40postgresql.org
Discussion: https://postgr.es/m/CAHDGBJP_GsESbTt4P3FZA8kMUKuYxjg57XHF7NRBoKnR%3DCAR-g%40mail.gmail.com
|
|
date_trunc(field, timestamptz, zone_name) performs truncation using
the named time zone as reference, rather than working in the session
time zone as is the default behavior. It's equivalent to
date_trunc(field, timestamptz at time zone zone_name) at time zone zone_name
but it's faster, easier to type, and arguably easier to understand.
Vik Fearing and Tom Lane
Discussion: https://postgr.es/m/[email protected]
|
|
Change lock level for renaming index (either ALTER INDEX or implicitly
via some other commands) from AccessExclusiveLock to
ShareUpdateExclusiveLock.
One reason we need a strong lock for relation renaming is that the
name change causes a rebuild of the relcache entry. Concurrent
sessions that have the relation open might not be able to handle the
relcache entry changing underneath them. Therefore, we need to lock
the relation in a way that no one can have the relation open
concurrently. But for indexes, the relcache handles reloads specially
in RelationReloadIndexInfo() in a way that keeps changes in the
relcache entry to a minimum. As long as no one keeps pointers to
rd_amcache and rd_options around across possible relcache flushes,
which is the case, this ought to be safe.
We also want to use a self-exclusive lock for correctness, so that
concurrent DDL doesn't overwrite the rename if they start updating
while still seeing the old version. Therefore, we use
ShareUpdateExclusiveLock, which is already used by other DDL commands
that want to operate in a concurrent manner.
The reason this is interesting at all is that renaming an index is a
typical part of a concurrent reindexing workflow (CREATE INDEX
CONCURRENTLY new + DROP INDEX CONCURRENTLY old + rename back). And
indeed a future built-in REINDEX CONCURRENTLY might rely on the ability
to do concurrent renames as well.
Reviewed-by: Andrey Klychkov <[email protected]>
Reviewed-by: Fabrízio de Royes Mello <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/[email protected]
|
|
The code building PartitionBoundInfo based on the constituent partition
data read from catalogs has been located in partcache.c, with a specific
set of routines dedicated to bound types, like sorting or bound data
creation. All this logic is moved to partbounds.c and relocates all the
bound-specific logistic into it, with partition_bounds_create() as
principal entry point.
Author: Amit Langote
Reviewed-by: Michael Paquier, Álvaro Herrera
Discussion: https://postgr.es/m/[email protected]
|
|
This patch fixes several related cases in which pg_shdepend entries were
never made, or were lost, for references to roles appearing in the ACLs of
schemas and/or types. While that did no immediate harm, if a referenced
role were later dropped, the drop would be allowed and would leave a
dangling reference in the object's ACL. That still wasn't a big problem
for normal database usage, but it would cause obscure failures in
subsequent dump/reload or pg_upgrade attempts, taking the form of
attempts to grant privileges to all-numeric role names. (I think I've
seen field reports matching that symptom, but can't find any right now.)
Several cases are fixed here:
1. ALTER DOMAIN SET/DROP DEFAULT would lose the dependencies for any
existing ACL entries for the domain. This case is ancient, dating
back as far as we've had pg_shdepend tracking at all.
2. If a default type privilege applies, CREATE TYPE recorded the
ACL properly but forgot to install dependency entries for it.
This dates to the addition of default privileges for types in 9.2.
3. If a default schema privilege applies, CREATE SCHEMA recorded the
ACL properly but forgot to install dependency entries for it.
This dates to the addition of default privileges for schemas in v10
(commit ab89e465c).
Another somewhat-related problem is that when creating a relation
rowtype or implicit array type, TypeCreate would apply any available
default type privileges to that type, which we don't really want
since such an object isn't supposed to have privileges of its own.
(You can't, for example, drop such privileges once they've been added
to an array type.)
ab89e465c is also to blame for a race condition in the regression tests:
privileges.sql transiently installed globally-applicable default
privileges on schemas, which sometimes got absorbed into the ACLs of
schemas created by concurrent test scripts. This should have resulted
in failures when privileges.sql tried to drop the role holding such
privileges; but thanks to the bug fixed here, it instead led to dangling
ACLs in the final state of the regression database. We'd managed not to
notice that, but it became obvious in the wake of commit da906766c, which
allowed the race condition to occur in pg_upgrade tests.
To fix, add a function recordDependencyOnNewAcl to encapsulate what
callers of get_user_default_acl need to do; while the original call
sites got that right via ad-hoc code, none of the later-added ones
have. Also change GenerateTypeDependencies to generate these
dependencies, which requires adding the typacl to its parameter list.
(That might be annoying if there are any extensions calling that
function directly; but if there are, they're most likely buggy in the
same way as the core callers were, so they need work anyway.) While
I was at it, I changed GenerateTypeDependencies to accept most of its
parameters in the form of a Form_pg_type pointer, making its parameter
list a bit less unwieldy and mistake-prone.
The test race condition is fixed just by wrapping the addition and
removal of default privileges into a single transaction, so that that
state is never visible externally. We might eventually prefer to
separate out tests of default privileges into a script that runs by
itself, but that would be a bigger change and would make the tests
run slower overall.
Back-patch relevant parts to all supported branches.
Discussion: https://postgr.es/m/[email protected]
|
|
In a lot of nodes the return slot is not required. That can either be
because the node doesn't do any projection (say an Append node), or
because the node does perform projections but the projection is
optimized away because the projection would yield an identical row.
Slots aren't that small, especially for wide rows, so it's worthwhile
to avoid creating them. It's not possible to just skip creating the
slot - it's currently used to determine the tuple descriptor returned
by ExecGetResultType(). So separate the determination of the result
type from the slot creation. The work previously done internally
ExecInitResultTupleSlotTL() can now also be done separately with
ExecInitResultTypeTL() and ExecInitResultSlot(). That way nodes that
aren't guaranteed to need a result slot, can use
ExecInitResultTypeTL() to determine the result type of the node, and
ExecAssignScanProjectionInfo() (via
ExecConditionalAssignProjectionInfo()) determines that a result slot
is needed, it is created with ExecInitResultSlot().
Besides the advantage of avoiding to create slots that then are
unused, this is necessary preparation for later patches around tuple
table slot abstraction. In particular separating the return descriptor
and slot is a prerequisite to allow JITing of tuple deforming with
knowledge of the underlying tuple format, and to avoid unnecessarily
creating JITed tuple deforming for virtual slots.
This commit removes a redundant argument from
ExecInitResultTupleSlotTL(). While this commit touches a lot of the
relevant lines anyway, it'd normally still not worthwhile to cause
breakage, except that aforementioned later commits will touch *all*
ExecInitResultTupleSlotTL() callers anyway (but fits worse
thematically).
Author: Andres Freund
Discussion: https://postgr.es/m/[email protected]
|
|
s/xl_heap_delete/xl_heap_truncate/ in a comment block referring to flags
for truncation.
Discussion: https://postgr.es/m/[email protected]
|
|
The original code to propagate NOT NULL and default expressions
specified when creating a partition was mostly copy-pasted from
typed-tables creation, but not being a great match it contained some
duplicity, inefficiency and bugs.
This commit fixes the bug that NOT NULL constraints declared in the
parent table would not be honored in the partition. One reported issue
that is not fixed is that a DEFAULT declared in the child is not used
when inserting through the parent. That would amount to a behavioral
change that's better not back-patched.
This rewrite makes the code simpler:
1. instead of checking for duplicate column names in its own block,
reuse the original one that already did that;
2. instead of concatenating the list of columns from parent and the one
declared in the partition and scanning the result to (incorrectly)
propagate defaults and not-null constraints, just scan the latter
searching the former for a match, and merging sensibly. This works
because we know the list in the parent is already correct and there can
only be one parent.
This rewrite makes ColumnDef->is_from_parent unused, so it's removed
on branch master; on released branches, it's kept as an unused field in
order not to cause ABI incompatibilities.
This commit also adds a test case for creating partitions with
collations mismatching that on the parent table, something that is
closely related to the code being patched. No code change is introduced
though, since that'd be a behavior change that could break some (broken)
working applications.
Amit Langote wrote a less invasive fix for the original
NOT NULL/defaults bug, but while I kept the tests he added, I ended up
not using his original code. Ashutosh Bapat reviewed Amit's fix. Amit
reviewed mine.
Author: Álvaro Herrera, Amit Langote
Reviewed-by: Ashutosh Bapat, Amit Langote
Reported-by: Jürgen Strobel (bug #15212)
Discussion: https://postgr.es/m/[email protected]
|
|
Per buildfarm, HAVE_COPYFILE is not the same thing as HAVE_COPYFILE_H.
Add the extra configure test.
|