PostgreSQL's IO Subsystem: Problems, Workarounds, Solutions
Andres Freund, PostgreSQL Developer & Committer
Email: [email protected]
Twitter: @AndresFreundTec
anarazel.de/talks/2019-10-16-pgconf-milan-io/io.pdf
Memory Architecture
[Diagram: postmaster with process-local memory (sorting, parser, planner, executor, DDL) alongside shared memory (buffer cache, catalog), layered above the OS page cache and the disk]
Reads: asynchronous, not cached
[Timeline diagram: Client, Postgres, OS, and Disk lanes over time; the read overlaps with other work]
Reads: synchronous, OS cached
[Timeline diagram: Client, Postgres, OS, and Disk lanes over time; the read is served from the OS page cache]
Reads: synchronous, postgres cached
[Timeline diagram: Client, Postgres, OS, and Disk lanes over time; the read is served from shared buffers]
Solution: Postgres Reads
● Add support for asynchronous reads
– highly platform dependent
– typically only supports “direct IO”, i.e. IO bypassing the kernel page cache
● emulation via fadvise() is currently done, but it still incurs a kernel→userspace copy (sketched below)
– Linux has a new interface, io_uring, that is a lot more flexible
● including fewer syscalls (important after the Intel security “fixes”)
– lots of work
● including the executor architecture
● Emit better prefetching requests
– not that hard in individual cases, but there are a lot of places to change
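
To make the fadvise()-based emulation concrete, a minimal C sketch, assuming an 8kB block size and a made-up relation file name: posix_fadvise(POSIX_FADV_WILLNEED) asks the kernel to start reading ahead, but the later pread() still copies the data from the kernel page cache into user space.

    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define BLCKSZ 8192                 /* PostgreSQL's default block size */

    int main(void)
    {
        /* "base/12345/16384" is a hypothetical relation file. */
        int fd = open("base/12345/16384", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* Advisory prefetch: ask the kernel to read blocks 100..163 soon. */
        off_t start = (off_t) 100 * BLCKSZ;
        int rc = posix_fadvise(fd, start, 64 * BLCKSZ, POSIX_FADV_WILLNEED);
        if (rc != 0)
            fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

        /* ... other work overlaps with the kernel's readahead ... */

        /* The read itself is still synchronous, and still pays the
         * kernel-to-userspace copy even on a page-cache hit. */
        char buf[BLCKSZ];
        if (pread(fd, buf, BLCKSZ, start) != BLCKSZ)
            perror("pread");

        close(fd);
        return 0;
    }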
Problem: Background Writer
● Refresher for bgwriter:
– writes dirty data back to the OS when the working set doesn’t fit in shared buffers
– reduces the writes that backends need to do themselves
● Background writer does not change recency information (does not perform the clock sweep)
– when all blocks were “recently” used → it can’t do anything
– configuration is complicated & not meaningful
● All IO is buffered and synchronous
– throughput / IO utilization too low, and thus it falls behind
– flushes can be disabled, but that often causes massive latency issues
● A lot of random IO
– victim buffer selection depends on usage and on position in the buffer pool
– currently hard to efficiently combine writes for neighboring blocks (hash-based buffer mapping)
Clock-Sweep
[Diagram: buffers 0 to 6 arranged in a ring, each with a usage count; one shown at CNT: 4, another at CNT: 0. The sweep hand decrements counts as it advances and picks the first buffer with count 0 as the victim; sketched in code below.]
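
The clock-sweep algorithm from the diagram, as a minimal self-contained C sketch (not PostgreSQL's actual StrategyGetBuffer(); buffer count and maximum usage count are arbitrary here):

    #include <stdint.h>

    #define NBUFFERS  8
    #define MAX_USAGE 5

    static uint8_t usage_count[NBUFFERS];  /* recency info per buffer */
    static int     clock_hand;             /* next buffer the sweep inspects */

    /* Each access bumps the buffer's usage count (saturating). */
    void buffer_accessed(int buf)
    {
        if (usage_count[buf] < MAX_USAGE)
            usage_count[buf]++;
    }

    /* Advance the hand, decrementing usage counts, until a buffer with
     * count 0 turns up; that buffer is the eviction victim.  If every
     * buffer was "recently" used, the sweep must first decrement its way
     * around the ring; work the bgwriter currently leaves to backends. */
    int clock_sweep_victim(void)
    {
        for (;;)
        {
            int buf = clock_hand;
            clock_hand = (clock_hand + 1) % NBUFFERS;

            if (usage_count[buf] == 0)
                return buf;      /* not recently used: evict */
            usage_count[buf]--;  /* recently used: spare it this round */
        }
    }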
Problem: Background Writer
● Consequences:
– backends do a lot of IO themselves, much of it random (slow)
– high jitter, depending on whether the bgwriter is temporarily doing things or not
● Partial workarounds (example settings below):
– reduce bgwriter_delay significantly
– increase shared_buffers and/or decrease checkpoint_timeout (all sequential writes)
– sometimes: set backend_flush_after (for jitter reduction)
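
A rough postgresql.conf sketch of these workarounds; the values are illustrative starting points, not recommendations from the talk:

    # Wake the background writer much more often (default: 200ms).
    bgwriter_delay = 10ms
    # Let it write more buffers per round (default: 100).
    bgwriter_lru_maxpages = 1000
    # Keep more of the working set in Postgres' own cache.
    shared_buffers = 8GB
    # Shorter checkpoint interval (relative to commonly raised values):
    # checkpoints write buffers sorted, i.e. largely sequentially.
    checkpoint_timeout = 5min
    # Bound the dirty data an individual backend accumulates (jitter).
    backend_flush_after = 256kB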
Solution: Background Writer
● Perform the clock sweep in the bgwriter
– avoids the inability to find work
– can actually improve recency accuracy (less usage-count saturation)
● Queue of clean buffers
– removes pacing requirements
– reduces the average cost of getting a clean buffer
● Asynchronous writes / writeback
– improves IO throughput / utilization, especially with random IO
● Write combining (sketched below)
– reduces random IO
– requires a shared_buffers mapping data structure with ordering support
● A prototype seems to work well
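
A minimal sketch of the write-combining idea, under assumed types (an illustration, not the prototype): sort the dirty buffers by (file, block), then merge each run of adjacent blocks into one larger write. With today's hash-based buffer mapping, producing this ordered view cheaply is the hard part.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical tag for a dirty buffer: which file, which block. */
    typedef struct DirtyBuf
    {
        uint32_t file;
        uint32_t block;
    } DirtyBuf;

    static int cmp_dirty(const void *a, const void *b)
    {
        const DirtyBuf *x = a, *y = b;
        if (x->file != y->file)
            return x->file < y->file ? -1 : 1;
        if (x->block != y->block)
            return x->block < y->block ? -1 : 1;
        return 0;
    }

    /* Emit one combined write per run of adjacent blocks in a file. */
    static void flush_combined(DirtyBuf *bufs, size_t n)
    {
        qsort(bufs, n, sizeof(DirtyBuf), cmp_dirty);

        for (size_t i = 0; i < n;)
        {
            size_t j = i + 1;
            while (j < n &&
                   bufs[j].file == bufs[i].file &&
                   bufs[j].block == bufs[j - 1].block + 1)
                j++;

            /* One sequential write instead of (j - i) random ones. */
            printf("write file %u, blocks %u..%u (%zu blocks)\n",
                   bufs[i].file, bufs[i].block, bufs[j - 1].block, j - i);
            i = j;
        }
    }

    int main(void)
    {
        DirtyBuf d[] = {{1, 7}, {1, 5}, {2, 3}, {1, 6}, {2, 9}};
        flush_combined(d, sizeof(d) / sizeof(d[0]));
        return 0;
    }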
Problem: Backend Writeback
● takes time away from query execution
● unpredictable latency
– for the query, due to having to write
– for the write, due to the kernel cache
● Diagnose (example query below):
– pg_stat_statements.blk_write_time etc., for read-only queries
– EXPLAIN (ANALYZE, BUFFERS)
● Workarounds:
– tune the background writer to be aggressive
– set backend_flush_after
● Solutions:
– new bgwriter
– asynchronous direct-IO write submissions
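
For the pg_stat_statements route, a small example query (requires the pg_stat_statements extension, and track_io_timing = on for blk_write_time to be populated):

    -- Read-only statements that still spent time writing shared blocks
    -- are a sign of backends doing writeback themselves.
    SELECT query,
           calls,
           blk_write_time,        -- ms spent writing blocks
           shared_blks_written
    FROM pg_stat_statements
    WHERE shared_blks_written > 0
    ORDER BY blk_write_time DESC
    LIMIT 10;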
Problem: Jitter
● query performance can be unpredictable
● Causes:
– the kernel accumulates a lot of dirty buffers → decides to write them back
– postgres issues IOs at an unpredictable rate
– kernel readahead randomly makes reads take longer
● Workarounds (example settings below):
– set backend_flush_after, reduce the other *_flush_after settings
– disable kernel readahead (can be bad for sequential scans)
– make the bgwriter more aggressive
● Solutions:
– disable kernel readahead, perform our own readahead / prefetching
– prioritize / throttle different IO causes differently
– improve the cache hit ratio
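
A hedged sketch of the flush-setting and readahead workarounds; the device name and values are illustrative only:

    # postgresql.conf: bound how much dirty data accumulates before the
    # kernel is asked to write it back (Linux only).
    backend_flush_after    = 256kB
    bgwriter_flush_after   = 256kB
    checkpoint_flush_after = 128kB

    # Disable kernel readahead on the data disk (shell, as root); note
    # that this can hurt sequential scans:
    #   blockdev --setra 0 /dev/nvme0n1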
Why Buffered IO?
● Parts of Postgres’ IO stack have, uh, baggage
● Portability
● Needs far less tuning
– the size of the PG buffer cache is less critical, since caching extends into the kernel page cache
– the IO issue rate to the drive doesn’t need to be controlled
● Why is needing less tuning crucial:
– DBAs / sysadmins don’t exist for the vast majority of systems (and where they do exist, they don’t know the hardware that well)
– workloads continuously change
– machines / OS instances are heavily over-committed and shared
– adapting shared memory after start is hard (PG architecture, OS)
● Consequence
– PG defaults to 128MB of shared_buffers (its “page cache”)
– works OK for low to moderately heavy load
Why Direct IO?
● Much higher IO throughput, especially for writes
● locking for buffered writes limits concurrency
● no AIO without DIO on most platforms (except, now, io_uring)
● No double buffering
● Writeback behavior of various OS kernels leads to hard-to-predict performance
● the kernel page cache scales badly to large amounts of memory
● kernel page cache lookups are not cheap, so they need to be avoided anyway (copy_to_user + radix-tree lookup, syscall overhead after the security fixes; see the O_DIRECT sketch below)
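
What direct IO demands from the caller, as a minimal Linux O_DIRECT sketch (the file name is hypothetical, and the alignment requirement varies by device): buffer, offset, and length must all be aligned, and the read goes straight to the device with no page-cache copy, so all caching has to happen in Postgres.

    #define _GNU_SOURCE                 /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define BLCKSZ 8192                 /* PostgreSQL's default block size */
    #define ALIGN  4096                 /* common O_DIRECT alignment requirement */

    int main(void)
    {
        /* O_DIRECT bypasses the kernel page cache: no double buffering,
         * but also no kernel readahead or write-behind. */
        int fd = open("base/12345/16384", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        /* Buffer, file offset, and transfer length must be aligned. */
        void *buf;
        int rc = posix_memalign(&buf, ALIGN, BLCKSZ);
        if (rc != 0)
        {
            fprintf(stderr, "posix_memalign: %s\n", strerror(rc));
            return 1;
        }

        /* Reads straight from the device: no copy_to_user from the page
         * cache, and no radix-tree lookup on the kernel side. */
        if (pread(fd, buf, BLCKSZ, 0) != BLCKSZ)
            perror("pread");

        free(buf);
        close(fd);
        return 0;
    }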
Further Problems
● Available postgres-level monitoring is incomplete and confusing
– pg_stat_bgwriter has stats that are not about the bgwriter
– buffers_backend includes relation extension, which cannot be done by any other process
– write times of different processes are not recorded (checkpoint_write_time is useless, since it includes sleeps)
● Ring buffers for sequential reads, VACUUM, COPY can cause very significant slowdowns
– data is never cached
– writes quickly trigger blocking
● Double buffering
– trigger posix_fadvise(POSIX_FADV_DONTNEED) when dirtying?