Fix recovery_prefetch with low maintenance_io_concurrency.
author Thomas Munro <[email protected]>
Thu, 8 Sep 2022 08:25:20 +0000 (20:25 +1200)
committer Thomas Munro <[email protected]>
Thu, 8 Sep 2022 09:44:55 +0000 (21:44 +1200)
commit adb466150b44d1eaf43a2d22f58ff4c545a0ed3f
tree 0c3b8dbf28df0032efefc9a3a7930269b0da33a8
parent 12d40d4a8d0495cf2c7b564daa8aaa7f107a6c56

We should process completed IOs *before* trying to start more, so that
it is always possible to decode one more record when the decoded record
queue is empty, even if maintenance_io_concurrency is set so low that a
single earlier WAL record might have saturated the IO queue.
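
The ordering issue can be illustrated with a toy model.  This is only a
sketch under simplifying assumptions, not the code in xlogprefetcher.c:
ToyPrefetcher and the toy_* functions are hypothetical names, the queue
is reduced to two counters, and "completed" counts IOs that have
finished but still occupy a slot until they are retired.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; not the real prefetcher data structures. */
typedef struct ToyPrefetcher
{
	int			max_inflight;	/* stands in for maintenance_io_concurrency */
	int			inflight;		/* IOs started and not yet retired */
	int			completed;		/* finished IOs still occupying a slot */
} ToyPrefetcher;

/* Retire finished IOs so that their slots can be reused. */
static void
toy_retire_completed(ToyPrefetcher *p)
{
	assert(p->completed <= p->inflight);
	p->inflight -= p->completed;
	p->completed = 0;
}

/* Try to start the IOs for one more record; false if the queue is full. */
static bool
toy_try_start(ToyPrefetcher *p, int ios_needed)
{
	if (p->inflight >= p->max_inflight)
		return false;			/* saturated: cannot decode another record */
	p->inflight += ios_needed;
	return true;
}

/* Problematic ordering: try to start more IO, then retire finished IO. */
static bool
toy_step_start_first(ToyPrefetcher *p, int ios_needed)
{
	bool		started = toy_try_start(p, ios_needed);

	toy_retire_completed(p);
	return started;
}

/* Fixed ordering: retire finished IO first, then try to start more. */
static bool
toy_step_retire_first(ToyPrefetcher *p, int ios_needed)
{
	toy_retire_completed(p);
	return toy_try_start(p, ios_needed);
}
```

With max_inflight = 1 and one finished but unretired IO, the start-first
step cannot make progress in this model, while the retire-first step
frees the slot and can start the next record.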

That bug was hidden because the effect of maintenance_io_concurrency was
arbitrarily clamped to be at least 2.  Fix the ordering, and also remove
that clamp.  We need a special case for 0, which is now treated the same
as recovery_prefetch=off, but otherwise the number is used directly.
This allows for testing with 1, which would have made the problem
obvious in simple test scenarios.
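
As a rough sketch of the change in how the setting is interpreted, the
old and new mappings might look like the following.  The function names
are hypothetical, and recovery_prefetch is simplified to a boolean here
purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Old behavior as described above: the value was clamped to at least 2. */
static int
toy_old_io_limit(int maintenance_io_concurrency)
{
	return maintenance_io_concurrency < 2 ? 2 : maintenance_io_concurrency;
}

/* New behavior: 0 is treated the same as recovery_prefetch=off. */
static bool
toy_prefetch_enabled(bool recovery_prefetch_on, int maintenance_io_concurrency)
{
	return recovery_prefetch_on && maintenance_io_concurrency > 0;
}

/* New behavior: any non-zero value is used directly, including 1. */
static int
toy_new_io_limit(int maintenance_io_concurrency)
{
	assert(maintenance_io_concurrency > 0);
	return maintenance_io_concurrency;
}
```

Under the old mapping a setting of 1 behaved like 2, which is why a
single record saturating the queue was hard to hit in simple tests.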

Also add an explicit error message for missing contrecords.  It was a
bit strange that we didn't report an error already, and the omission
became a latent bug with prefetching, since the internal state that
tracks aborted contrecords would not survive retrying, as revealed by
026_overwrite_contrecord.pl with this adjustment.  Reporting an error
prevents that.

Back-patch to 15.

Reported-by: Justin Pryzby <[email protected]>
Reviewed-by: Kyotaro Horiguchi <[email protected]>
Discussion: https://postgr.es/m/20220831140128.GS31833%40telsasoft.com
src/backend/access/transam/xlogprefetcher.c
src/backend/access/transam/xlogreader.c
src/include/access/xlogreader.h