Add additional jsonpath string methods by theory · Pull Request #12 · theory/postgres

theory · 2025-05-24T21:40:42Z

Add the following jsonpath methods:

l/r/btrim()
lower(), upper()
initcap()
replace()
split_part()

Each simply dispatches to the standard string processing functions. These depend on the locale, but since it's set at initdb, they can be considered immutable and therefore allowed in any jsonpath expression.
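As a rough illustration of what these methods compute, here is a minimal Python sketch of the string functions they dispatch to. It is not the PR's C code; it mirrors the documented behavior of the corresponding PostgreSQL functions as far as I understand it, ignores locale-specific case mapping, and the Python function names are chosen only for this example.

```python
# Illustrative Python stand-ins for the PostgreSQL string functions.
# Assumption: locale-dependent behavior is ignored in this sketch.

def ltrim(s, chars=" "):
    """Strip any of `chars` (a space by default) from the start."""
    return s.lstrip(chars)

def rtrim(s, chars=" "):
    """Strip any of `chars` (a space by default) from the end."""
    return s.rstrip(chars)

def btrim(s, chars=" "):
    """Strip any of `chars` (a space by default) from both ends."""
    return s.strip(chars)

def initcap(s):
    """Upper-case the first character of each alphanumeric run, lower-case the rest."""
    out, in_word = [], False
    for ch in s:
        if ch.isalnum():
            out.append(ch.lower() if in_word else ch.upper())
            in_word = True
        else:
            out.append(ch)
            in_word = False
    return "".join(out)

def replace(s, old, new):
    """Replace every occurrence of `old`; an empty `old` leaves `s` unchanged."""
    return s.replace(old, new) if old else s

def split_part(s, delimiter, n):
    """Return the n-th field (1-based; negative n counts from the end)."""
    if n == 0:
        raise ValueError("field position must not be zero")
    if delimiter == "":
        return s if n in (1, -1) else ""
    parts = s.split(delimiter)
    try:
        return parts[n - 1] if n > 0 else parts[n]
    except IndexError:
        return ""  # out-of-range fields yield an empty string
```

For instance, `split_part("a,b,c", ",", 2)` picks the middle field and `initcap("hello wORLD")` normalizes the case of each word.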

The "named tranches" term is a little confusing. In most places it refers to tranches requested with RequestNamedLWLockTranche(), even though all built-in tranches and tranches allocated with LWLockNewTrancheId() also have a name. But in MAX_NAMED_TRANCHES, it refers to tranches requested with either RequestNamedLWLockTranche() or LWLockNewTrancheId(), as it's the maximum of all of those in total. The "user defined" term is already used in LWTRANCHE_FIRST_USER_DEFINED, so let's standardize on that to mean tranches allocated with either RequestNamedLWLockTranche() or LWLockNewTrancheId().

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://www.postgresql.org/message-id/47aaf57e-1b7b-4e12-bda2-0316081ff50e@iki.fi

Merge the LWLockTranches and NamedLWLockTrancheRequest data structures in shared memory into one array of user-defined tranches. The NamedLWLockTrancheRequest list is now only used in postmaster, to hold the requests until shared memory is initialized. Introduce a C struct, LWLockTranches, to hold all the different fields kept in shared memory. This gives a clearer overview of everything kept in shared memory. Previously, we had separate pointers for LWLockTrancheNames, LWLockCounter and the (shared memory copy of) NamedLWLockTrancheRequestArray.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/47aaf57e-1b7b-4e12-bda2-0316081ff50e@iki.fi

Previously we reused the shmem allocator's ShmemLock to also protect lwlock.c's shared memory structures. Introduce a separate spinlock for lwlock.c for the sake of modularity. Now that lwlock.c has its own shared memory struct (LWLockTranches), this is easy to do.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/47aaf57e-1b7b-4e12-bda2-0316081ff50e@iki.fi

This makes shmem.c independent of the main LWLock array. That makes it possible to stop passing MainLWLockArray through BackendParameters in the next commit.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/47aaf57e-1b7b-4e12-bda2-0316081ff50e@iki.fi

It's nice to have them show up in pg_shmem_allocations like all other shmem areas. ShmemInitStruct() depends on ShmemIndexLock, but only after postmaster startup.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/47aaf57e-1b7b-4e12-bda2-0316081ff50e@iki.fi

Reported-by: Lukas Fittl <lukas@fittl.com>

This adds an MSVC warning option equivalent to those added in commit 29bf4ee for GCC/Clang. Note that this requires commit bccfc73 (Disable warnings in system headers in MSVC).

Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/aa73q1aT0A3/vke/%40ip-10-97-1-34.eu-west-3.compute.internal

Adding an implicit empty vertex pattern is not currently supported when a path pattern starts or ends with an edge pattern, or when two consecutive edge patterns appear in the pattern. Prohibit such path patterns.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Henson Choi <assam258@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/72a23702-6d96-4103-a54b-057c2352e885%2540eisentraut.org

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAA5RZ0sLENRM+BicUjQFs_rP38oPx3gm0SsGrD0-jMhhM+HZ_w@mail.gmail.com

An element pattern variable may be repeated in the path pattern. GraphTableParseState maintains a list of all variable names used in the graph pattern. Add a new variable name to that list only when it is not present already. This isn't a problem right now, but it could be in the future.

Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5tR4O0vjeqTCPr2VB5pYjNYbJgbCBEQf63NtU5Pz1MiOQ%40mail.gmail.com

By using palloc() instead of raw malloc().

Reported-by: Gaurav Singh <gaurav.singh@yugabyte.com>
Reviewed-by: Lukas Fittl <lukas@fittl.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://www.postgresql.org/message-id/CAEcQ1bYR9s4eQLFDjzzJHU8fj-MTbmRpW-9J-r2gsCn+HEsynw@mail.gmail.com
Backpatch-through: 14

The PredicateLockShmemInit function is pretty complicated, and one source of confusion is that it reuses the same local variable for sizes of things. Replace the different uses with separate variables for clarity.

Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/113724ab-0028-493f-9605-6e8570f0939f@iki.fi

Pressing Ctrl+C while running pgindent would often leave behind files like pgtypedefAXUEEA. This slightly changes SIGINT handling so those files are cleaned up.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/DFCDD5H4J7VX.3GJKRBBDCKQ86@jeltef.nl

The previous commit let pgindent clean up File::Temp files on SIGINT. This extends that to also clean up the .BAK files created by pg_bsd_indent.

Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://www.postgresql.org/message-id/flat/DFCDD5H4J7VX.3GJKRBBDCKQ86@jeltef.nl

These tests were intended to be aligned with each other, but additional tests for virtual generated columns disrupted that alignment. The test confirming that user-defined types are not allowed in virtual generated columns has also been moved to the generated_virtual.sql-specific section.

Author: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Mutaamba Maasha <maasha@gmail.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Discussion: https://www.postgresql.org/message-id/flat/20250808115142.e9ccb81f35466a9a131a4c55@sraoss.co.jp

Autovacuum workers scan pg_class twice to collect the set of tables to process. The first pass is for plain relations and materialized views, and the second is for TOAST tables. When the worker finds a table to process, it adds it to the end of a list. Later on, it processes the tables in the same order as the list. This simple strategy has worked surprisingly well for a long time, but there have been many discussions over the years about trying to improve it.

This commit introduces a scoring system that is used to sort the aforementioned list of tables to process. The idea is to have autovacuum workers prioritize tables that are furthest beyond their thresholds (e.g., a table nearing transaction ID wraparound should be vacuumed first). This prioritization scheme is certainly far from perfect; there are simply too many possibilities for any scoring technique to work across all workloads, and the situation might change significantly between the time we calculate the score and the time that autovacuum processes it. However, we have attempted to develop something that is expected to work for a large portion of workloads with reasonable parameter settings.

The score is calculated as the maximum of the ratios of each of the table's relevant values to its threshold. For example, if the number of inserted tuples is 100, and the insert threshold for the table is 80, the insert score is 1.25. If all other scores are below that value, the table's score will be 1.25. The other criteria considered for the score are the table ages (both relfrozenxid and relminmxid) compared to the corresponding freeze-max-age setting, the number of updated/deleted tuples compared to the vacuum threshold, and the number of inserted/updated/deleted tuples compared to the analyze threshold.

One exception to the previous paragraph is for tables nearing wraparound, i.e., those that have surpassed the effective failsafe ages. In that case, the relfrozenxid/relminmxid-based score is scaled aggressively so that the table has a decent chance of sorting to the front of the list.

To adjust how strongly each component contributes to the score, the following parameters can be adjusted from their default of 1.0 to anywhere between 0.0 and 10.0 (inclusive). Setting all of these to 0.0 restores pre-v19 prioritization behavior:

autovacuum_freeze_score_weight
autovacuum_multixact_freeze_score_weight
autovacuum_vacuum_score_weight
autovacuum_vacuum_insert_score_weight
autovacuum_analyze_score_weight

This is intended to be a baby step towards smarter autovacuum workers. Possible future improvements include, but are not limited to, periodic reprioritization, automatic cost limit adjustments, and better observability (e.g., a system view that shows current scores). While we do not expect this commit to produce any earth-shattering improvements, it is arguably a prerequisite for the aforementioned follow-up changes.

Reviewed-by: Sami Imseih <samimseih@gmail.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Reviewed-by: Greg Burd <greg@burd.me>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/aOaAuXREwnPZVISO%40nathan
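The scoring rule in the autovacuum commit message above (maximum of the value-to-threshold ratios, then sort by score) can be sketched in a few lines of Python. This is not the actual C implementation: the component names are hypothetical, the weights are assumed to act as simple multipliers on each ratio, and the aggressive scaling for tables past the failsafe age is left out.

```python
# Assumption: each weight multiplies its component's ratio; default is 1.0.
DEFAULT_WEIGHT = 1.0

def table_score(values, thresholds, weights=None):
    """Return the largest weighted value/threshold ratio across components."""
    weights = weights or {}
    return max(
        weights.get(name, DEFAULT_WEIGHT) * values[name] / thresholds[name]
        for name in thresholds
    )

def prioritize(tables):
    """Sort (table_name, score) pairs so the highest score comes first.

    sorted() is stable, so tables with equal scores keep their original
    (pg_class scan) order in this sketch.
    """
    return sorted(tables, key=lambda t: t[1], reverse=True)

# The example from the commit message: 100 inserted tuples against an
# insert threshold of 80, with the other components below their thresholds.
values = {"vacuum_insert": 100, "vacuum": 30, "analyze": 40}
thresholds = {"vacuum_insert": 80, "vacuum": 50, "analyze": 50}
insert_heavy = table_score(values, thresholds)  # 100 / 80 = 1.25
```

Because the sort is stable, setting every weight to 0.0 gives all tables a score of 0.0 and leaves the list in its original order, which is how this sketch models the pre-v19 behavior.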

While fixing the base32hex UUID sortability test in commit 8921003, it turned out that the expected lexicographical order is only maintained under the C collation (or an equivalent byte-wise collation). Natural language collations may employ different rules, breaking the sortability. This commit updates the documentation to explicitly state that base32hex is "byte-wise sortable", ensuring users do not fall into the trap of using natural language collations when querying their encoded data.

Co-Authored-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CAD21AoAwX1D6baSGuQXm0mzPXPWB07kgaoaaahjNHHenbdY24A@mail.gmail.com

Not only is this good style, but it dodges some obscure bugs within pg_bsd_indent. We could try to fix said bugs, but the amount of effort required seems far out of proportion to the benefit.

Reported-by: Akshay Joshi <akshay.joshi@enterprisedb.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CANxoLDfca8O5SkeDxB_j6SVNXd+pNKaDmVmEW+2yyicdU8fy0w@mail.gmail.com

After the series of preceding commits introducing and using BufferBeginSetHintBits()/BufferSetHintBits16(), hint bits are no longer set while IO is going on. Therefore we no longer need to copy pages while they are being written out. For the same reason, XLogSaveBufferForHint() no longer needs to operate on a copy of the page, but can instead use the normal XLogRegisterBuffer() mechanism. For that, the assertions and comments in XLogRegisterBuffer() had to be updated to allow share-exclusive locked buffers to be registered.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d

An upcoming commit will make UnlockReleaseBuffer() considerably faster and more scalable than doing LockBuffer(BUFFER_LOCK_UNLOCK); ReleaseBuffer();. But it's a small performance benefit even as-is. Most of the callsites changed in this patch are not performance sensitive; however, some, like the nbtree ones, are in critical paths. This patch changes all the easily convertible places over to UnlockReleaseBuffer() mainly because I needed to check all of them anyway, and reducing cases where the operations are done separately makes the checking easier.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d

Now that the buffer content lock is implemented as part of BufferDesc.state, releasing the lock and unpinning the buffer can be implemented as a single atomic operation. This substantially improves workloads that have heavy contention on a small number of buffers; e.g., I see a ~20% improvement for pipelined readonly pgbench on an older two socket machine.

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d

Create a <sect4> section for each function that the previous text described in one long series of paragraphs. Also split the functions' previously in-line syntax summaries into <synopsis> clauses, which is more readable and allows us to sneak in an explicit mention of the result data type. This change gives us an opportunity to make cross-reference links more specific, too, so do that.

Author: jian he <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxFuk9P=P4=BZ=qCkgvo6im8aL8NnCkjxx2S2MQDWNdouw@mail.gmail.com

Upcoming commits will change StartReadBuffers() and its building blocks, making it worthwhile to directly test StartReadBuffers().

Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv

While we have a lot of indirect coverage of read streams, there are corner cases that are hard to test when only indirectly controlling and observing the read stream. This commit adds an SQL callable SRF interface for a read stream and uses that in a few tests. To make some of the tests possible, the injection point infrastructure in test_aio had to be expanded to allow blocking IO completion. While at it, fix a wrong debug message in inj_io_short_read_hook().

Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv

PostmasterContext is not available in single-user mode; use TopMemoryContext instead. Also make sure that we use the correct memory context in the lappend().

Author: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/acb_Eo1XtmCO_9z7@nathan

Until now StartBufferIO() had a few weaknesses:

- As it did not submit staged IOs, it was not safe to call StartBufferIO() where there was a potential for unsubmitted IO, which required AsyncReadBuffers() to use a wrapper (ReadBuffersCanStartIO()) around StartBufferIO().
- With nowait = true, the boolean return value did not allow distinguishing between no IO being necessary and having to wait, which would lead ReadBuffersCanStartIO() to unnecessarily submit staged IO.
- Several callers needed to handle both local and shared buffers, requiring the caller to differentiate between StartBufferIO() and StartLocalBufferIO().
- In a future commit some callers of StartBufferIO() want the BufferDesc's io_wref to be returned, to asynchronously wait for in-progress IO.
- Indicating whether to wait with the nowait parameter was somewhat confusing compared to a wait parameter.

Address these issues as follows:

- StartBufferIO() is renamed to StartSharedBufferIO().
- A new StartBufferIO() is introduced that supports both shared and local buffers.
- The boolean return value has been replaced with an enum, indicating whether the IO is already done, already in progress or that the buffer has been readied for IO.
- A new PgAioWaitRef * argument allows the caller to get the wait reference if desired. All current callers pass NULL; a user of this will be introduced subsequently.
- Instead of the nowait argument there now is wait.

This probably would not have been worthwhile on its own, but since all these lines needed to be touched anyway...

Author: Andres Freund <andres@anarazel.de>
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://postgr.es/m/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw@3p3zu522yykv

When a backend attempts to start a read IO and finds the first buffer already has I/O in progress, previously it waited for that I/O to complete before initiating reads for any of the subsequent buffers. Although it must wait for the I/O to finish when acquiring the buffer, there's no reason for it to wait when setting up the read operation. Waiting at this point prevents starting I/O on subsequent buffers and can significantly reduce concurrency.

This matters in two workloads:

1) When multiple backends scan the same relation concurrently.
2) When a single backend requests the same block multiple times within the readahead distance.

Waiting each time an in-progress read is encountered effectively degenerates the access pattern into synchronous I/O.

To fix this, when encountering an already in-progress IO for the head buffer, the wait reference is now recorded and waiting is deferred until WaitReadBuffers(), when the buffer actually needs to be acquired.

In rare cases, a backend may still need to wait synchronously at IO start time: if another backend has set BM_IO_IN_PROGRESS on the buffer but has not yet set the wait reference. Such windows should be brief and uncommon.

Author: Melanie Plageman <melanieplageman@gmail.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/flat/zljergweqti7x67lg5ije2rzjusie37nslsnkjkkby4laqqbfw%403p3zu522yykv

Rename the `csv_` tokens to `int_`, because they represent signed or unsigned integers, as follows:

* `csv_elem` => `int_elem`
* `csv_list` => `int_list`
* `opt_csv_list` => `opt_int_list`

Rename the `datetime_precision` tokens to `uint_arg`, as they represent unsigned integers and will be useful for other methods in the future, as follows:

* `datetime_precision` => `uint_elem`
* `opt_datetime_precision` => `opt_uint_arg`

Rename the `datetime_template` tokens to `str_arg`, as they represent strings and will be useful for other methods in the future, as follows:

* `datetime_template` => `str_elem`
* `opt_datetime_template` => `opt_str_arg`


theory self-assigned this May 24, 2025

theory force-pushed the more-jsonpath-methods branch 3 times, most recently from 6c61d21 to b996d19 Compare May 28, 2025 17:29

theory force-pushed the more-jsonpath-methods branch 14 times, most recently from f3eb863 to 8f2146a Compare June 4, 2025 15:21

theory force-pushed the more-jsonpath-methods branch from 8f2146a to ea19345 Compare June 14, 2025 15:06

theory force-pushed the more-jsonpath-methods branch 5 times, most recently from 0db127d to 694cf34 Compare July 11, 2025 18:04

theory force-pushed the more-jsonpath-methods branch from 694cf34 to 50e867f Compare October 28, 2025 19:49

theory force-pushed the more-jsonpath-methods branch from 50e867f to 9c9dcdb Compare November 30, 2025 21:15

theory force-pushed the more-jsonpath-methods branch 4 times, most recently from fbc9017 to ebb9dcc Compare January 5, 2026 23:48

hlinnaka and others added 29 commits March 26, 2026 23:46

pg_plan_advice: pgindent

874da8b

Reported-by: Lukas Fittl <lukas@fittl.com>

Minor comment fixes to yesterday's LWLock tranche refactoring

9899315

Author: Sami Imseih <samimseih@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAA5RZ0sLENRM+BicUjQFs_rP38oPx3gm0SsGrD0-jMhhM+HZ_w@mail.gmail.com

theory force-pushed the more-jsonpath-methods branch from ede508a to 288c81d Compare March 28, 2026 14:36

theory wants to merge 555 commits into master from more-jsonpath-methods
