Age | Commit message (Collapse) | Author |
|
Teach nbtree multi-column index scans to opportunistically skip over
irrelevant sections of the index given a query with no "=" conditions on
one or more prefix index columns. When nbtree is passed input scan keys
derived from a predicate "WHERE b = 5", new nbtree preprocessing steps
output "WHERE a = ANY(<every possible 'a' value>) AND b = 5" scan keys.
That is, preprocessing generates a "skip array" (and an output scan key)
for the omitted prefix column "a", which makes it safe to mark the scan
key on "b" as required to continue the scan. The scan is therefore able
to repeatedly reposition itself by applying both the "a" and "b" keys.
A skip array has "elements" that are generated procedurally and on
demand, but otherwise works just like a regular ScalarArrayOp array.
Preprocessing can freely add a skip array before or after any input
ScalarArrayOp arrays. Index scans with a skip array decide when and
where to reposition the scan using the same approach as any other scan
with array keys. This design builds on the design for array advancement
and primitive scan scheduling added to Postgres 17 by commit 5bf748b8.
Testing has shown that skip scans of an index with a low cardinality
skipped prefix column can be multiple orders of magnitude faster than an
equivalent full index scan (or sequential scan). In general, the
cardinality of the scan's skipped column(s) limits the number of leaf
pages that can be skipped over.
The core B-Tree operator classes on most discrete types generate their
array elements with the help of their own custom skip support routine.
This infrastructure gives nbtree a way to generate the next required
array element by incrementing (or decrementing) the current array value.
It can reduce the number of index descents in cases where the next
possible indexable value frequently turns out to be the next value
stored in the index. Opclasses that lack a skip support routine fall
back on having nbtree "increment" (or "decrement") a skip array's
current element by setting the NEXT (or PRIOR) scan key flag, without
directly changing the scan key's sk_argument. These sentinel values
behave just like any other value from an array -- though they can never
locate equal index tuples (they can only locate the next group of index
tuples containing the next set of non-sentinel values that the scan's
arrays need to advance to).
A skip array's range is constrained by "contradictory" inequality keys.
For example, a skip array on "x" will only generate the values 1 and 2
given a qual such as "WHERE x BETWEEN 1 AND 2 AND y = 66". Such a skip
array qual usually has near-identical performance characteristics to a
comparable SAOP qual "WHERE x = ANY('{1, 2}') AND y = 66". However,
improved performance isn't guaranteed. Much depends on physical index
characteristics.
B-Tree preprocessing is optimistic about skipping working out: it
applies static, generic rules when determining where to generate skip
arrays, which assumes that the runtime overhead of maintaining skip
arrays will pay for itself -- or lead to only a modest performance loss.
As things stand, these assumptions are much too optimistic: skip array
maintenance will lead to unacceptable regressions with unsympathetic
queries (queries whose scan can't skip over many irrelevant leaf pages).
An upcoming commit will address the problems in this area by enhancing
_bt_readpage's approach to saving cycles on scan key evaluation, making
it work in a way that directly considers the needs of = array keys
(particularly = skip array keys).
Author: Peter Geoghegan <[email protected]>
Reviewed-By: Masahiro Ikeda <[email protected]>
Reviewed-By: Heikki Linnakangas <[email protected]>
Reviewed-By: Matthias van de Meent <[email protected]>
Reviewed-By: Tomas Vondra <[email protected]>
Reviewed-By: Aleksander Alekseev <[email protected]>
Reviewed-By: Alena Rybakina <[email protected]>
Discussion: https://postgr.es/m/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com
|
|
This changes commit 7406ab623fe in that the gist strategy number
mapping support function is changed to use the CompareType enum as
input, instead of the "well-known" RT*StrategyNumber strategy numbers.
This is a bit cleaner, since you are not dealing with two sets of
strategy numbers. Also, this will enable us to subsume this system
into a more general system of using CompareType to define operator
semantics across index methods.
Discussion: https://www.postgresql.org/message-id/flat/[email protected]
|
|
This is support function 12 for the GiST AM and translates
"well-known" RT*StrategyNumber values into whatever strategy number is
used by the opclass (since no particular numbers are actually
required). We will use this to support temporal PRIMARY
KEY/UNIQUE/FOREIGN KEY/FOR PORTION OF functionality.
This commit adds two implementations, one for internal GiST opclasses
(just an identity function) and another for btree_gist opclasses. It
updates btree_gist from 1.7 to 1.8, adding the support function for
all its opclasses.
(previously committed as 6db4598fcb8, reverted by 8aee330af55; this is
essentially unchanged from those)
Author: Paul A. Jungwirth <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Reviewed-by: jian he <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
|
|
This feature set did not handle empty ranges correctly, and it's now
too late for PostgreSQL 17 to fix it.
The following commits are reverted:
6db4598fcb8 Add stratnum GiST support function
46a0cd4cefb Add temporal PRIMARY KEY and UNIQUE constraints
86232a49a43 Fix comment on gist_stratnum_btree
030e10ff1a3 Rename pg_constraint.conwithoutoverlaps to conperiod
a88c800deb6 Use daterange and YMD in without_overlaps tests instead of tsrange.
5577a71fb0c Use half-open interval notation in without_overlaps tests
34768ee3616 Add temporal FOREIGN KEY contraints
482e108cd38 Add test for REPLICA IDENTITY with a temporal key
c3db1f30cba doc: clarify PERIOD and WITHOUT OVERLAPS in CREATE TABLE
144c2ce0cc7 Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes
Discussion: https://www.postgresql.org/message-id/[email protected]
|
|
This is support function 12 for the GiST AM and translates
"well-known" RT*StrategyNumber values into whatever strategy number is
used by the opclass (since no particular numbers are actually
required). We will use this to support temporal PRIMARY
KEY/UNIQUE/FOREIGN KEY/FOR PORTION OF functionality.
This commit adds two implementations, one for internal GiST opclasses
(just an identity function) and another for btree_gist opclasses. It
updates btree_gist from 1.7 to 1.8, adding the support function for
all its opclasses.
Author: Paul A. Jungwirth <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Reviewed-by: jian he <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
|
|
Refer to CREATE ACCESS METHOD rather than suggesting direct changes to
pg_am. Also corrects index-specific language that predated table
access methods.
Discussion: https://postgr.es/m/[email protected]
Reported-by: Yugo NAGATA <[email protected]>
|
|
gist.sgml and xindex.sgml hadn't been fully updated for the
addition of a sortsupport support function (commit 16fa9b2b3).
xindex.sgml also missed that the compress and decompress support
functions are optional, an apparently far older oversight.
In passing, fix gratuitous inconsistencies in wording and
capitalization.
Noted by E. Rogov. Back-patch to v14; the residual issues
before that aren't significant enough to bother with.
Discussion: https://postgr.es/m/[email protected]
|
|
Not much to say here: does what it says on the tin.
We steal a previously-always-zero bit from the nextOffset
field of leaf index tuples in order to track whether there
is a nulls bitmap. Otherwise it works about like included
columns in other index types.
Pavel Borisov, reviewed by Andrey Borodin and Anastasia Lubennikova,
and rather heavily editorialized on by me
Discussion: https://postgr.es/m/CALT9ZEFi-vMp4faht9f9Junb1nO3NOSjhpxTmbm1UGLMsLqiEQ@mail.gmail.com
|
|
Commit 15cb2bd27 neglected to make the running text match the
tables, leaving the reader with the strong impression that
we cannot count. Also, don't drop an unrelated para between
a table and the para describing it.
|
|
Discussion: https://postgr.es/m/20200620232145.GB17995%40telsasoft.com
Author: Justin Pryzby
Backpatch-through: 13
|
|
911e7020770 added opclass options and adjusted documentation for each
particular affected opclass. However, documentation for extendability was
not adjusted. This commit adjusts documentation for interfaces of index AMs
and opclasses.
Discussion: https://postgr.es/m/CAH2-WzmQnW6%2Bz5F9AW%2BSz%2BzEcEvXofTwh_A9J3%3D_WA-FBP0wYg%40mail.gmail.com
Author: Alexander Korotkov
Reported-by: Peter Geoghegan
Reviewed-by: Peter Geoghegan
|
|
I concluded that we really just ought to force all tables in PDF output
to default to "left" alignment (instead of "justify"); that is what the
HTML toolchain does and that's what most people have been designing the
tables to look good with. There are few if any places where "justify"
produces better-looking output, and there are many where it looks
horrible. So change stylesheet-fo.xsl to make that true.
Also tweak column widths in a few more tables to make them look better
and avoid "exceed the available area" warnings. This commit fixes
basically everything that can be fixed through that approach. The
remaining tables that give warnings either are scheduled for redesign
as per recent discussions, or need a fundamental rethink because they
Just Don't Work in a narrow view.
|
|
Invent the concept of a B-Tree equalimage ("equality implies image
equality") support function, registered as support function 4. This
indicates whether it is safe (or not safe) to apply optimizations that
assume that any two datums considered equal by an operator class's order
method must be interchangeable without any loss of semantic information.
This is static information about an operator class and a collation.
Register an equalimage routine for almost all of the existing B-Tree
opclasses. We only need two trivial routines for all of the opclasses
that are included with the core distribution. There is one routine for
opclasses that index non-collatable types (which returns 'true'
unconditionally), plus another routine for collatable types (which
returns 'true' when the collation is a deterministic collation).
This patch is infrastructure for an upcoming patch that adds B-Tree
deduplication.
Author: Peter Geoghegan, Anastasia Lubennikova
Discussion: https://postgr.es/m/CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ@mail.gmail.com
|
|
Note: Following existing practice, titles of formalpara and step are
not titlecased.
|
|
Currently, KNN searches were supported only by GiST. SP-GiST also capable to
support them. This commit implements that support. SP-GiST scan stack is
replaced with queue, which serves as stack if no ordering is specified. KNN
support is provided for three SP-GIST opclasses: quad_point_ops, kd_point_ops
and poly_ops (catversion is bumped). Some common parts between GiST and SP-GiST
KNNs are extracted into separate functions.
Discussion: https://postgr.es/m/570825e8-47d0-4732-2bf6-88d67d2d51c8%40postgrespro.ru
Author: Nikita Glukhov, Alexander Korotkov based on GSoC work by Vlad Sterzhanov
Review: Andrey Borodin, Alexander Korotkov
|
|
Historically, the term procedure was used as a synonym for function in
Postgres/PostgreSQL. Now we have procedures as separate objects from
functions, so we need to clean up the documentation to not mix those
terms.
In particular, mentions of "trigger procedures" are changed to "trigger
functions", and access method "support procedures" are changed to
"support functions". (The latter already used FUNCTION in the SQL
syntax anyway.) Also, the terminology in the SPI chapter has been
cleaned up.
A few tests, examples, and code comments are also adjusted to be
consistent with documentation changes, but not everything.
Reported-by: Peter Geoghegan <[email protected]>
Reviewed-by: Jonathan S. Katz <[email protected]>
|
|
This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING"
frame boundaries in window functions. We'd punted on that back in the
original patch to add window functions, because it was not clear how to
do it in a reasonably data-type-extensible fashion. That problem is
resolved here by adding the ability for btree operator classes to provide
an "in_range" support function that defines how to add or subtract the
RANGE offset value. Factoring it this way also allows the operator class
to avoid overflow problems near the ends of the datatype's range, if it
wishes to expend effort on that. (In the committed patch, the integer
opclasses handle that issue, but it did not seem worth the trouble to
avoid overflow failures for datetime types.)
The patch includes in_range support for the integer_ops opfamily
(int2/int4/int8) as well as the standard datetime types. Support for
other numeric types has been requested, but that seems like suitable
material for a follow-on patch.
In addition, the patch adds GROUPS mode which counts the offset in
ORDER-BY peer groups rather than rows, and it adds the frame_exclusion
options specified by SQL:2011. As far as I can see, we are now fully
up to spec on window framing options.
Existing behaviors remain unchanged, except that I changed the errcode
for a couple of existing error reports to meet the SQL spec's expectation
that negative "offset" values should be reported as SQLSTATE 22013.
Internally and in relevant parts of the documentation, we now consistently
use the terminology "offset PRECEDING/FOLLOWING" rather than "value
PRECEDING/FOLLOWING", since the term "value" is confusingly vague.
Oliver Ford, reviewed and whacked around some by me
Discussion: https://postgr.es/m/CAGMVOdu9sivPAxbNN0X+q19Sfv9edEPv=HibOJhB14TJv_RCQg@mail.gmail.com
|
|
Up to now, useful info for writing a new btree opclass has been buried
in the backend's nbtree/README file. Let's move it into the SGML docs,
in preparation for extending it with info about "in_range" functions
in the upcoming window RANGE patch.
To do this, I chose to create a new chapter for btree indexes in Part VII
(Internals), parallel to the chapters that exist for the newer index AMs.
This is a pretty short chapter as-is. At some point somebody might care
to flesh it out with more detail about btree internals, but that is
beyond the scope of my ambition for today.
Discussion: https://postgr.es/m/[email protected]
|
|
I noticed some slightly confusing or out-of-date verbiage here
while working on the window RANGE patch. Seems worth committing
separately.
|
|
Since some preparation work had already been done, the only source
changes left were changing empty-element tags like <xref linkend="foo">
to <xref linkend="foo"/>, and changing the DOCTYPE.
The source files are still named *.sgml, but they are actually XML files
now. Renaming could be considered later.
In the build system, the intermediate step to convert from SGML to XML
is removed. Everything is build straight from the source files again.
The OpenSP (or the old SP) package is no longer needed.
The documentation toolchain instructions are updated and are much
simpler now.
Peter Eisentraut, Alexander Lakhin, Jürgen Purtz
|
|
IDs in SGML are case insensitive, and we have accumulated a mix of upper
and lower case IDs, including different variants of the same ID. In
XML, these will be case sensitive, so we need to fix up those
differences. Going to all lower case seems most straightforward, and
the current build process already makes all anchors and lower case
anyway during the SGML->XML conversion, so this doesn't create any
difference in the output right now. A future XML-only build process
would, however, maintain any mixed case ID spellings in the output, so
that is another reason to clean this up beforehand.
Author: Alexander Lakhin <[email protected]>
|
|
For DocBook XML compatibility, don't use SGML empty tags (</>) anymore,
replace by the full tag name. Add a warning option to catch future
occurrences.
Alexander Lakhin, Jürgen Purtz
|
|
Historically, the selectivity functions have simply not distinguished
< from <=, or > from >=, arguing that the fraction of the population that
satisfies the "=" aspect can be considered to be vanishingly small, if the
comparison value isn't any of the most-common-values for the variable.
(If it is, the code path that executes the operator against each MCV will
take care of things properly.) But that isn't really true unless we're
dealing with a continuum of variable values, and in practice we seldom are.
If "x = const" would estimate a nonzero number of rows for a given const
value, then it follows that we ought to estimate different numbers of rows
for "x < const" and "x <= const", even if the const is not one of the MCVs.
Handling this more honestly makes a significant difference in edge cases,
such as the estimate for a tight range (x BETWEEN y AND z where y and z
are close together).
Hence, split scalarltsel into scalarltsel/scalarlesel, and similarly
split scalargtsel into scalargtsel/scalargesel. Adjust <= and >=
operator definitions to reference the new selectivity functions.
Improve the core ineq_histogram_selectivity() function to make a
correction for equality. (Along the way, I learned quite a bit about
exactly why that function gives good answers, which I tried to memorialize
in improved comments.)
The corresponding join selectivity functions were, and remain, just stubs.
But I chose to split them similarly, to avoid confusion and to prevent the
need for doing this exercise again if someone ever makes them less stubby.
In passing, change ineq_histogram_selectivity's clamp for extreme
probability estimates so that it varies depending on the histogram
size, instead of being hardwired at 0.0001. With the default histogram
size of 100 entries, you still get the old clamp value, but bigger
histograms should allow us to put more faith in edge values.
Tom Lane, reviewed by Aleksander Alekseev and Kuntal Ghosh
Discussion: https://postgr.es/m/[email protected]
|
|
This will be useful for hash partitioning, which needs a way to seed
the hash functions to avoid problems such as a hash index on a hash
partitioned table clumping all values into a small portion of the
bucket space; it's also useful for anything that wants a 64-bit hash
value rather than a 32-bit hash value.
Just in case somebody wants a 64-bit hash value that is compatible
with the existing 32-bit hash values, make the low 32-bits of the
64-bit hash value match the 32-bit hash value when the seed is 0.
Robert Haas and Amul Sul
Discussion: http://postgr.es/m/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com
|
|
We had thirty different GIN array opclasses sharing the same operators and
support functions. That still didn't cover all the built-in types, nor
did it cover arrays of extension-added types. What we want is a single
polymorphic opclass for "anyarray". There were two missing features needed
to make this possible:
1. We have to be able to declare the index storage type as ANYELEMENT
when the opclass is declared to index ANYARRAY. This just takes a few
more lines in index_create(). Although this currently seems of use only
for GIN, there's no reason to make index_create() restrict it to that.
2. We have to be able to identify the proper GIN compare function for
the index storage type. This patch proceeds by making the compare function
optional in GIN opclass definitions, and specifying that the default btree
comparison function for the index storage type will be looked up when the
opclass omits it. Again, that seems pretty generically useful.
Since the comparison function lookup is done in initGinState(), making
use of the second feature adds an additional cache lookup to GIN index
access setup. It seems unlikely that that would be very noticeable given
the other costs involved, but maybe at some point we should consider
making GinState data persist longer than it now does --- we could keep it
in the index relcache entry, perhaps.
Rather fortuitously, we don't seem to need to do anything to get this
change to play nice with dump/reload or pg_upgrade scenarios: the new
opclass definition is automatically selected to replace existing index
definitions, and the on-disk data remains compatible. Also, if a user has
created a custom opclass definition for a non-builtin type, this doesn't
break that, since CREATE INDEX will prefer an exact match to opcintype
over a match to ANYARRAY. However, if there's anyone out there with
handwritten DDL that explicitly specifies _bool_ops or one of the other
replaced opclass names, they'll need to adjust that.
Tom Lane, reviewed by Enrique Meneses
Discussion: <[email protected]>
|
|
From: Guillaume Lelarge <[email protected]>
|
|
The chapter "Interfacing Extensions To Indexes" and CREATE OPERATOR
CLASS reference page were missed when BRIN was added. We document
all our other index access methods there, so make sure BRIN complies.
Author: Álvaro Herrera
Reported-By: Julien Rouhaud, Tom Lane
Reviewed-By: Emre Hasegeli
Discussion: https://www.postgresql.org/message-id/56CF604E.9000303%40dalibo.com
Backpatch: 9.5, where BRIN was introduced
|
|
This patch reduces pg_am to just two columns, a name and a handler
function. All the data formerly obtained from pg_am is now provided
in a C struct returned by the handler function. This is similar to
the designs we've adopted for FDWs and tablesample methods. There
are multiple advantages. For one, the index AM's support functions
are now simple C functions, making them faster to call and much less
error-prone, since the C compiler can now check function signatures.
For another, this will make it far more practical to define index access
methods in installable extensions.
A disadvantage is that SQL-level code can no longer see attributes
of index AMs; in particular, some of the crosschecks in the opr_sanity
regression test are no longer possible from SQL. We've addressed that
by adding a facility for the index AM to perform such checks instead.
(Much more could be done in that line, but for now we're content if the
amvalidate functions more or less replace what opr_sanity used to do.)
We might also want to expose some sort of reporting functionality, but
this patch doesn't do that.
Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily
editorialized on by me.
|
|
Commit d04c8ed9044ec added another support function to the GIST API,
but overlooked mentioning it in xindex.sgml's summary of index support
functions.
Anastasia Lubennikova
|
|
|
|
With the GIN "fast scan" feature, GIN can skip items without fetching all
the keys for them, if it can prove that they don't match regardless of
those keys. So far, it has done the proving by calling the boolean
consistent function with all combinations of TRUE/FALSE for the unfetched
keys, but since that's O(n^2), it becomes unfeasible with more than a few
keys. We can avoid calling consistent with all the combinations, if we can
tell the operator class implementation directly which keys are unknown.
This commit includes a triConsistent function for the built-in array and
tsvector opclasses.
Alexander Korotkov, with some changes by me.
|
|
SP-GiST is comparable to GiST in flexibility, but supports non-balanced
partitioned search structures rather than balanced trees. As described at
PGCon 2011, this new indexing structure can beat GiST in both index build
time and query speed for search problems that it is well matched to.
There are a number of areas that could still use improvement, but at this
point the code seems committable.
Teodor Sigaev and Oleg Bartunov, with considerable revisions by Tom Lane
|
|
This patch creates an API whereby a btree index opclass can optionally
provide non-SQL-callable support functions for sorting. In the initial
patch, we only use this to provide a directly-callable comparator function,
which can be invoked with a bit less overhead than the traditional
SQL-callable comparator. While that should be of value in itself, the real
reason for doing this is to provide a datatype-extensible framework for
more aggressive optimizations, as in Peter Geoghegan's recent work.
Robert Haas and Tom Lane
|
|
Noah Misch. Review and minor cosmetic changes by me.
|
|
|
|
This commit adds columns amoppurpose and amopsortfamily to pg_amop, and
column amcanorderbyop to pg_am. For the moment all the entries in
amcanorderbyop are "false", since the underlying support isn't there yet.
Also, extend the CREATE OPERATOR CLASS/ALTER OPERATOR FAMILY commands with
[ FOR SEARCH | FOR ORDER BY sort_operator_family ] clauses to allow the new
columns of pg_amop to be populated, and create pg_dump support for dumping
that information.
I also added some documentation, although it's perhaps a bit premature
given that the feature doesn't do anything useful yet.
Teodor Sigaev, Robert Haas, Tom Lane
|
|
|
|
|
|
|
|
|
|
supported release.
|
|
SGML-escaping.
|
|
prefix matching using this facility.
Teodor Sigaev and Oleg Bartunov
|
|
"consistent" functions, and remove pg_amop.opreqcheck, as per recent
discussion. The main immediate benefit of this is that we no longer need
8.3's ugly hack of requiring @@@ rather than @@ to test weight-using tsquery
searches on GIN indexes. In future it should be possible to optimize some
other queries better than is done now, by detecting at runtime whether the
index match is exact or not.
Tom Lane, after an idea of Heikki's, and with some help from Teodor.
|
|
which previously only talked about btree opclasses.
|
|
|
|
|
|
appropriate.
|
|
Standard English uses "may", "can", and "might" in different ways:
may - permission, "You may borrow my rake."
can - ability, "I can lift that log."
might - possibility, "It might rain today."
Unfortunately, in conversational English, their use is often mixed, as
in, "You may use this variable to do X", when in fact, "can" is a better
choice. Similarly, "It may crash" is better stated, "It might crash".
Also update two error messages mentioned in the documenation to match.
|
|
|