A deeper dive
into EXPLAIN
Michael Christofides
Hi, I’m Michael
Half of the team behind pgMustard
Spent a lot of time looking into EXPLAIN
Background: product management, database tools
pgmustard.com/docs/explain
michael@pgmustard.com
michristofides
Picking up from other EXPLAIN talks
Not the basics*
1) Some of the less intuitive arithmetic
2) Some less well covered issues
* postgresql.org/docs/current/performance-tips
thoughtbot: reading EXPLAIN ANALYZE
YouTube: Josh Berkus Explaining EXPLAIN
Picking up from other EXPLAIN talks
Not the basics*
1) Arithmetic: why is this query slow?
2) Issues: what can we do about it?
* postgresql.org/docs/current/performance-tips
thoughtbot: reading EXPLAIN ANALYZE
YouTube: Josh Berkus Explaining EXPLAIN
Arithmetic: loops
Many of the stats are a per-loop average
This includes costs, rows, timings
Watch out for rounding, especially to 0 rows
Disclaimer: heavily doctored
plans ahead, inaccuracies likely.
Nested Loop
(cost=0.84..209.82 rows=16 width=11)
(actual time=0.076..0.368 rows=86 loops=1)
-> Index Only Scan using a on b
(cost=0.42..4.58 rows=9 width=4)
(actual time=0.013..0.019 rows=9 loops=1)
-> Index Scan using x on y
(cost=0.42..22.73 rows=7 width=15)
(actual time=0.012..0.030 rows=10 loops=9)
Nested Loop: 86 rows
Index Scan: 9 * 10 = 90 rows
(Rounding not too bad here)
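The per-loop arithmetic above can be sketched in a few lines (values taken from the doctored plan; the variable names are mine):

```python
# Per-loop stats (rows, timings, costs) must be multiplied by loops
# to get totals. Values from the doctored Nested Loop plan above.
inner_rows_per_loop = 10   # rows= on the inner Index Scan (a rounded average)
inner_loops = 9            # loops= on the inner Index Scan

total_inner_rows = inner_rows_per_loop * inner_loops
print(total_inner_rows)  # 90 -- close to the Nested Loop's 86; the gap is rounding
```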
Arithmetic: threads
Costs, rows, and timings are also per-thread
Shown as loops
Threads = workers + 1
Tip: use VERBOSE
<- the leader
Parallel Seq Scan on table
(cost=0.00..6772.21 rows=79521 width=22)
(actual time=0.090..71.866 rows=63617 loops=3)
Output: column1, column2, column3
Worker 0: actual time=0.111..66.325 rows=56225 loops=1
Worker 1: actual time=0.138..66.027 rows=58792 loops=1
Seq Scan: 63617 * 3 = 190851 rows
Leader: 190851 - 58792 - 56225
= 75834 rows
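The leader's share falls out of the same arithmetic (values from the parallel plan above; a sketch, not pgMustard's actual code):

```python
# In a parallel plan, rows= is a per-thread average and loops= counts
# threads (workers + 1, including the leader). VERBOSE reports each
# worker separately, so the leader's share is whatever is left over.
rows_per_thread = 63617
threads = 3                    # loops=3 -> 2 workers + the leader
worker_rows = [56225, 58792]   # from the Worker 0 / Worker 1 lines

total_rows = rows_per_thread * threads
leader_rows = total_rows - sum(worker_rows)
print(total_rows, leader_rows)  # 190851 75834
```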
Arithmetic: buffers
Buffer stats are a total, not per-loop
They are inclusive of children
Nested Loop (... loops=1)
Buffers: shared hit=105
-> Index Only Scan using a on b (... loops=1)
Buffers: shared hit=4
-> Index Scan using x on y (... loops=9)
Buffers: shared hit=101
Nested Loop buffers:
105 - (101 + 4) = 0 blocks
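Because buffers are totals and inclusive of children, a node's own share is a straight subtraction (values from the plan above):

```python
# Buffer counts are totals (not per-loop averages) and include children,
# so a node's own (exclusive) buffers = its total - its children's totals.
nested_loop_buffers = 105
child_buffers = [4, 101]   # Index Only Scan; Index Scan (across all 9 loops)

exclusive = nested_loop_buffers - sum(child_buffers)
print(exclusive)  # 0 -- the join itself touched no extra blocks
```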
Arithmetic: timings
Per-loop, per-thread
Inclusive of children
Calculating per-node (exclusive) times can get
tricky, even for tools
Nested Loop
(cost=0.84..209.82 rows=16 width=11)
(actual time=0.076..0.368 rows=86 loops=1)
-> Index Only Scan using a on b
(cost=0.42..4.58 rows=9 width=4)
(actual time=0.013..0.019 rows=9 loops=1)
-> Index Scan using x on y
(cost=0.42..22.73 rows=7 width=15)
(actual time=0.012..0.030 rows=10 loops=9)
Index Scan: 0.030 * 9 = 0.27 ms
Nested Loop: 0.368 - 0.27 - 0.019
= 0.079 ms
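The exclusive-time calculation above, as a sketch (values from the same doctored plan; real tools have to handle CTEs, parallelism, and more edge cases):

```python
# Actual times are per-loop and include children. A node's own
# (exclusive) time: multiply each child's per-loop time by its loops,
# then subtract the children's totals from the parent's total.
nested_loop_total = 0.368      # ms, loops=1
index_only_scan = 0.019 * 1    # ms total
index_scan = 0.030 * 9         # ms total: per-loop time * loops

exclusive = nested_loop_total - index_scan - index_only_scan
print(round(exclusive, 3))  # 0.079
```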
WITH init AS (
SELECT * FROM pg_sleep_for('100ms')
UNION ALL
SELECT * FROM pg_sleep_for('200ms')
)
(SELECT * FROM init LIMIT 1)
UNION ALL
(SELECT * FROM init);
Credit @felixge
Append (actual time=100.359..
300.688 … )
CTE init
-> Append (actual time=100.334..
300.652 … )
-> Function Scan (actual time=100.333..
100.335 … )
-> Function Scan (actual time=200.310..
200.312 … )
-> Limit (actual time=100.358..
100.359 … )
-> CTE Scan a (actual time=100.355..100.356 … )
-> CTE Scan b (actual time=0.001..
200.322 … )
Execution Time: 300.789 ms
Further reading:
flame-explain.com/docs/general/quirk-correction
Some double-counting in this case.
Arithmetic: tools can help
eg explain.depesz.com
explain.dalibo.com
flame-explain.com
pgmustard.com
<- fellow calculations nerd
<- 👋
Issues: let’s skip the basics
Seq Scans with large filters
Bad row estimates
Sorts and Hashes on disk
Issues: inefficient index scans
Look out for lots of rows being filtered
Filters are per-loop
So again, watch out for rounding
-> Index Scan using x on y
(cost=0.42..302502.05 rows=1708602 width=125)
(actual time=172810.219..173876.540 rows=1000 loops=1)
Index Cond: (id = another_id)
Filter: (status = 1)
Rows Removed by Filter: 3125626
Index efficiency: 1000/(1000+3125626) = 0.03%
Watch out for high loops
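A rough efficiency figure for a scan like this (values from the plan above; "efficiency" here is my shorthand, not an EXPLAIN field):

```python
# How many of the rows the index scan produced survived the filter?
# Both figures are per-loop, so scale by loops first if loops > 1.
rows_returned = 1000
rows_removed_by_filter = 3125626

efficiency = rows_returned / (rows_returned + rows_removed_by_filter)
print(f"{efficiency:.2%}")  # 0.03%
```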
Issues: lossy bitmap scans
When bitmap would otherwise exceed work_mem
Entries point to a block rather than a row (tuple ID)
Lossy blocks are a total (ie not per-loop)
-> Bitmap Heap Scan on table
(cost=49153.29..4069724.27 rows=3105598 width=1106)
(actual time=591.928..56472.895 rows=3853272 loops=1)
Recheck Cond: (something > something_else)
Rows Removed by Index Recheck: 5905323
Heap Blocks: exact=14280 lossy=1951048
Lossy blocks: 1951048/(1951048+14280) = 99%
Extra rows read: 5.9 million
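The lossy fraction is a straight ratio of the Heap Blocks counts (values from the plan above):

```python
# When a bitmap exceeds work_mem it goes lossy: it stores whole blocks
# instead of tuple IDs, and every row in those blocks must be rechecked.
# Heap Blocks counts are totals, not per-loop.
exact_blocks = 14280
lossy_blocks = 1951048

lossy_fraction = lossy_blocks / (lossy_blocks + exact_blocks)
print(f"{lossy_fraction:.0%}")  # 99%
```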
Issues: lots of data read
Requires BUFFERS
Lots of data being read for the amount returned
Can be a sign of bloat
Default block size: 8kB
-> Index Scan using x on y
(cost=0.57..2.57 rows=1 width=8)
(actual time=0.064..0.064 rows=1 loops=256753)
Index Cond: (id = another_id)
Filter: (status = 1)
Buffers: shared hit=1146405 read=110636
Caveats: width estimated, rows rounded
Data read: (1146405 + 110636) * 8kB = 10GB
Data returned: 1 * 256753 * 8 bytes = 2MB
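The read-versus-returned comparison above, sketched with the caveats baked in (values from the plan; width is the planner's estimate, rows are rounded, so both figures are ballpark):

```python
# Buffers are counted in blocks (8kB by default), so hit + read gives a
# rough figure for data read; rows * loops * width approximates data
# returned. A huge gap between the two can be a sign of bloat.
BLOCK_SIZE = 8 * 1024  # bytes; check block_size if your build differs

shared_hit, shared_read = 1146405, 110636
rows, loops, width = 1, 256753, 8  # width in bytes (estimated)

data_read = (shared_hit + shared_read) * BLOCK_SIZE
data_returned = rows * loops * width
print(round(data_read / 1e9, 1), "GB read;",
      round(data_returned / 1e6, 1), "MB returned")
```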
Issues: planning time
At the end of the query plan
Can be planning related: eg joins, partitions
But other things too: eg extensions, locks
Warning: not included in auto_explain
(...)
Planning Time: 27.844 ms
Execution Time: 11.162 ms
Planning proportion:
27.844/(27.844 + 11.162) = 71%
Issues: Just In Time compilation
At the end of the query plan
Included in execution time
On by default in PostgreSQL 12 and 13
Start-up time can be a tell-tale sign
Planning Time: 9.138 ms
JIT:
Functions: 277
Options: Inlining true, Optimization true, Expressions true,
Deforming true
Timing: Generation 31.602 ms, Inlining 253.114 ms, Optimization
1498.268 ms, Emission 913.945 ms, Total 2696.929 ms
Execution Time: 5194.851 ms
JIT proportion:
2696.929/(9.138 + 5194.851) = 52%
Very suspicious actual start-up time
from a JIT dominated plan.
-> Seq Scan on table
(cost=0.00..3.57 rows=72 width=8)
(actual time=2262.312..2262.343 rows=54 loops=1)
Buffers: shared hit=3
Issues: triggers
At the end of the query plan
Total time across calls
Check foreign keys are indexed
Before triggers vs after triggers
Planning Time: 0.227 ms
Trigger: RI_ConstraintTrigger_a_12345 on table
time=83129.491 calls=2222623
Execution Time: 87645.739 ms
Trigger proportion:
83129.491/(0.227 + 87645.739) = 95%
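The same sanity check applies to all three end-of-plan items covered above; one hypothetical helper covers planning, JIT, and trigger time (values from the example plans):

```python
# What share of the total wall-clock time does this end-of-plan
# item explain? (part and the components of the total, all in ms)
def proportion(part_ms, *total_components_ms):
    return part_ms / sum(total_components_ms)

print(f"{proportion(27.844, 27.844, 11.162):.0%}")       # planning: 71%
print(f"{proportion(2696.929, 9.138, 5194.851):.0%}")    # JIT: 52%
print(f"{proportion(83129.491, 0.227, 87645.739):.0%}")  # triggers: 95%
```

(Remember JIT time is already included in Execution Time, so it is not added to the denominator again.)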
Summary: check the arithmetic
Watch out for loops and threads
Watch out for CTEs
Tools can help, if in doubt check two
Summary: keep rarer issues in mind
Check the end section first
Also look out for filters, rechecks, lossy
blocks, amount of data
Tools, mailing lists, and communities can help
Thank you! Any questions?
michael@pgmustard.com
michristofides
Further reading:
* flame-explain.com/docs/general/quirk-correction
* pgmustard.com/docs/explain
* wiki.postgresql.org/wiki/Slow_Query_Questions