<!-- doc/src/sgml/bgworker.sgml -->
<chapter id="bgworker">
<title>Background Worker Processes</title>
<indexterm zone="bgworker">
<primary>Background workers</primary>
</indexterm>
<para>
PostgreSQL can be extended to run user-supplied code in separate processes.
Such processes are started, stopped and monitored by <command>postgres</command>,
which permits them to have a lifetime closely linked to the server's status.
These processes have the option to attach to <productname>PostgreSQL</>'s
shared memory area and to connect to databases internally; they can also run
multiple transactions serially, just like a regular client-connected server
process. Also, by linking to <application>libpq</> they can connect to the
server and behave like a regular client application.
</para>
<warning>
<para>
There are considerable robustness and security risks in using background
worker processes because, being written in the <literal>C</> language,
they have unrestricted access to data. Administrators wishing to enable
modules that include background worker processes should exercise extreme
caution. Only carefully audited modules should be permitted to run
background worker processes.
</para>
</warning>
<para>
Background workers can be initialized at the time that
<productname>PostgreSQL</> is started by including the module name in
<varname>shared_preload_libraries</>. A module wishing to run a background
worker can register it by calling
<function>RegisterBackgroundWorker(<type>BackgroundWorker *worker</type>)</function>
from its <function>_PG_init()</>. Background workers can also be started
after the system is up and running by calling the function
<function>RegisterDynamicBackgroundWorker(<type>BackgroundWorker
*worker, BackgroundWorkerHandle **handle</type>)</function>. Unlike
<function>RegisterBackgroundWorker</>, which can only be called from within
the postmaster, <function>RegisterDynamicBackgroundWorker</function> must be
called from a regular backend.
</para>
<para>
The structure <structname>BackgroundWorker</structname> is defined thus:
<programlisting>
typedef void (*bgworker_main_type)(Datum main_arg);
typedef struct BackgroundWorker
{
    char        bgw_name[BGW_MAXLEN];
    int         bgw_flags;
    BgWorkerStartTime bgw_start_time;
    int         bgw_restart_time;       /* in seconds, or BGW_NEVER_RESTART */
    bgworker_main_type bgw_main;
    char        bgw_library_name[BGW_MAXLEN];   /* only if bgw_main is NULL */
    char        bgw_function_name[BGW_MAXLEN];  /* only if bgw_main is NULL */
    Datum       bgw_main_arg;
    int         bgw_notify_pid;
} BackgroundWorker;
</programlisting>
</para>
<para>
<structfield>bgw_name</> is a string to be used in log messages, process
listings and similar contexts.
</para>
<para>
<structfield>bgw_flags</> is a bitwise-or'd bit mask indicating the
capabilities that the module wants. Possible values are
<literal>BGWORKER_SHMEM_ACCESS</literal> (requesting shared memory access)
and <literal>BGWORKER_BACKEND_DATABASE_CONNECTION</literal> (requesting the
ability to establish a database connection, through which it can later run
transactions and queries). A background worker using
<literal>BGWORKER_BACKEND_DATABASE_CONNECTION</literal> to connect to
a database must also attach shared memory using
<literal>BGWORKER_SHMEM_ACCESS</literal>, or worker start-up will fail.
</para>
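<para>
 The rule above can be expressed as a small check on the bit mask. This is a
 standalone model, not code from the server: the <literal>SKETCH_</literal>
 macros are hypothetical stand-ins for the real flag macros, and their numeric
 values are chosen only for illustration. A real module would use the macros
 from the server headers and rely on the server's own check at start-up.
<programlisting><![CDATA[
#include <stdbool.h>

/* Hypothetical stand-ins for BGWORKER_SHMEM_ACCESS and
 * BGWORKER_BACKEND_DATABASE_CONNECTION; the values are illustrative only. */
#define SKETCH_SHMEM_ACCESS                 0x0001
#define SKETCH_BACKEND_DATABASE_CONNECTION  0x0002

/*
 * Returns true if the requested capabilities are consistent: asking for a
 * database connection without also asking for shared memory access is
 * rejected, matching the rule described in the text.
 */
static bool
sketch_flags_valid(int flags)
{
    if ((flags & SKETCH_BACKEND_DATABASE_CONNECTION) != 0 &&
        (flags & SKETCH_SHMEM_ACCESS) == 0)
        return false;
    return true;
}
]]></programlisting>
</para>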
<para>
<structfield>bgw_start_time</structfield> is the server state during which
<command>postgres</> should start the process; it can be one of
<literal>BgWorkerStart_PostmasterStart</> (start as soon as
<command>postgres</> itself has finished its own initialization; processes
requesting this are not eligible for database connections),
<literal>BgWorkerStart_ConsistentState</> (start as soon as a consistent state
has been reached in a hot standby, allowing processes to connect to
databases and run read-only queries), and
<literal>BgWorkerStart_RecoveryFinished</> (start as soon as the system has
entered normal read-write state). The last two values are equivalent
in a server that is not a hot standby. Note that this setting only indicates
when the processes are to be started; they do not stop when a different state
is reached.
</para>
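<para>
 The relationship between <structfield>bgw_start_time</structfield> and the
 server's progress can be modeled as an ordering: a worker becomes eligible
 once the server has reached the state it asked for. This is a simplified
 model with hypothetical enum and function names, not the postmaster's actual
 logic. It also encodes the rule that workers started at postmaster start are
 not eligible for database connections.
<programlisting><![CDATA[
#include <stdbool.h>

/* Hypothetical mirror of the three start times, ordered by how far the
 * server must have progressed before the worker is launched. */
typedef enum
{
    SKETCH_START_POSTMASTER_START,   /* postmaster finished its own init */
    SKETCH_START_CONSISTENT_STATE,   /* hot standby reached consistency */
    SKETCH_START_RECOVERY_FINISHED   /* normal read-write operation */
} SketchStartTime;

/* Hypothetical server phases with the same ordering. A server that is not
 * a hot standby moves straight from the first phase to the last, which is
 * why the last two start times behave identically there. */
typedef enum
{
    SKETCH_PHASE_POSTMASTER_STARTED,
    SKETCH_PHASE_CONSISTENT,
    SKETCH_PHASE_READ_WRITE
} SketchServerPhase;

/* A worker is eligible once the server has reached or passed its phase. */
static bool
sketch_should_start(SketchStartTime start, SketchServerPhase phase)
{
    return (int) phase >= (int) start;
}

/* Workers started at postmaster start may not connect to databases. */
static bool
sketch_may_connect(SketchStartTime start)
{
    return start != SKETCH_START_POSTMASTER_START;
}
]]></programlisting>
</para>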
<para>
<structfield>bgw_restart_time</structfield> is the interval, in seconds, that
<command>postgres</command> should wait before restarting the process, in
case it crashes. It can be any positive value,
or <literal>BGW_NEVER_RESTART</literal>, indicating not to restart the
process in case of a crash.
</para>
<para>
<structfield>bgw_main</structfield> is a pointer to the function to run when
the process is started. This function must take a single argument of type
<type>Datum</> and return <type>void</>.
<structfield>bgw_main_arg</structfield> will be passed to it as its only
argument. Note that the global variable <literal>MyBgworkerEntry</literal>
points to a copy of the <structname>BackgroundWorker</structname> structure
passed at registration time. <structfield>bgw_main</structfield> may be
NULL; in that case, <structfield>bgw_library_name</structfield> and
<structfield>bgw_function_name</structfield> will be used to determine
the entry point. This is useful for background workers launched after
postmaster startup, where the postmaster does not have the requisite
library loaded.
</para>
<para>
<structfield>bgw_library_name</structfield> is the name of a library in
which the initial entry point for the background worker should be sought.
It is ignored unless <structfield>bgw_main</structfield> is NULL.
But if <structfield>bgw_main</structfield> is NULL, then the named library
will be dynamically loaded by the worker process and
<structfield>bgw_function_name</structfield> will be used to identify
the function to be called.
</para>
<para>
<structfield>bgw_function_name</structfield> is the name of a function in
a dynamically loaded library which should be used as the initial entry point
for a new background worker. It is ignored unless
<structfield>bgw_main</structfield> is NULL.
</para>
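<para>
 The choice between <structfield>bgw_main</structfield> and the
 library/function name pair can be sketched as follows. The static symbol
 table is a hypothetical stand-in for dynamically loading
 <structfield>bgw_library_name</structfield> and looking up
 <structfield>bgw_function_name</structfield> in it; the library name
 <literal>my_worker_lib</literal>, the <type>SketchDatum</type> type and the
 entry point are all invented for illustration.
<programlisting><![CDATA[
#include <stddef.h>
#include <string.h>

typedef unsigned long SketchDatum;      /* hypothetical stand-in for Datum */
typedef void (*sketch_main_type)(SketchDatum main_arg);

/* Hypothetical entry point; it just records the argument it received. */
static int sketch_calls = 0;

static void
sketch_worker_main(SketchDatum arg)
{
    sketch_calls += (int) arg;
}

/* Stand-in for "load the library, then look the symbol up by name". */
typedef struct
{
    const char *library;
    const char *function;
    sketch_main_type fn;
} SketchSymbol;

static const SketchSymbol sketch_symbols[] = {
    {"my_worker_lib", "sketch_worker_main", sketch_worker_main},
};

/* Use bgw_main when it is set; otherwise resolve the library/function
 * pair. Returns NULL when the pair cannot be found. */
static sketch_main_type
sketch_resolve_entry(sketch_main_type bgw_main,
                     const char *library_name, const char *function_name)
{
    size_t i;

    if (bgw_main != NULL)
        return bgw_main;
    for (i = 0; i < sizeof(sketch_symbols) / sizeof(sketch_symbols[0]); i++)
        if (strcmp(sketch_symbols[i].library, library_name) == 0 &&
            strcmp(sketch_symbols[i].function, function_name) == 0)
            return sketch_symbols[i].fn;
    return NULL;
}
]]></programlisting>
</para>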
<para>
<structfield>bgw_notify_pid</structfield> is the PID of a PostgreSQL
backend process to which the postmaster should send <literal>SIGUSR1</>
when the process is started or exits. It should be 0 for workers registered
at postmaster startup time, or when the backend registering the worker does
not wish to wait for the worker to start up. Otherwise, it should be
initialized to <literal>MyProcPid</>.
</para>
<para>Once running, the process can connect to a database by calling
<function>BackgroundWorkerInitializeConnection(<parameter>char *dbname</parameter>, <parameter>char *username</parameter>)</function>.
This allows the process to run transactions and queries using the
<literal>SPI</literal> interface. If <varname>dbname</> is NULL,
the session is not connected to any particular database, but shared catalogs
can be accessed. If <varname>username</> is NULL, the process will run as
the superuser created during <command>initdb</>.
<function>BackgroundWorkerInitializeConnection</function> can only be called
once per background process; it is not possible to switch databases.
</para>
<para>
Signals are initially blocked when control reaches the
<structfield>bgw_main</> function, and must be unblocked by it; this is to
allow the process to customize its signal handlers, if necessary.
Signals can be unblocked in the new process by calling
<function>BackgroundWorkerUnblockSignals</> and blocked by calling
<function>BackgroundWorkerBlockSignals</>.
</para>
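<para>
 The block, customize, unblock sequence described above is a general POSIX
 pattern. This sketch shows it with plain <function>sigprocmask</function>
 and <function>sigaction</function> calls rather than the server's helpers,
 which it does not use; the handler and flag names are hypothetical.
 Installing the handler while all signals are blocked means that a signal
 arriving early is held pending and is only delivered once the custom handler
 is in place.
<programlisting><![CDATA[
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>

/* Set by the handler; sig_atomic_t keeps the write async-signal-safe. */
static volatile sig_atomic_t got_sigusr1 = 0;

static void
handle_sigusr1(int signo)
{
    (void) signo;
    got_sigusr1 = 1;
}

/* Block everything, install a custom handler, then unblock. */
static bool
sketch_setup_signals(void)
{
    sigset_t all;
    sigset_t none;
    struct sigaction sa;

    sigfillset(&all);
    if (sigprocmask(SIG_SETMASK, &all, NULL) != 0)   /* start blocked */
        return false;

    sa.sa_handler = handle_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)          /* customize */
        return false;

    sigemptyset(&none);
    return sigprocmask(SIG_SETMASK, &none, NULL) == 0;   /* unblock */
}
]]></programlisting>
</para>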
<para>
If <structfield>bgw_restart_time</structfield> for a background worker is
configured as <literal>BGW_NEVER_RESTART</>, or if it exits with an exit
code of 0 or is terminated by <function>TerminateBackgroundWorker</>,
it will be automatically unregistered by the postmaster on exit.
Otherwise, it will be restarted after the time period configured via
<structfield>bgw_restart_time</>, or immediately if the postmaster
reinitializes the cluster due to a backend failure. Background workers that
need to suspend execution only temporarily should use an interruptible sleep
rather than exiting; this can be achieved by calling
<function>WaitLatch()</function>. Make sure the
<literal>WL_POSTMASTER_DEATH</> flag is set when calling that function, and
verify the return code for a prompt exit in the emergency case that
<command>postgres</> itself has terminated.
</para>
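<para>
 The exit rules above amount to a small decision function. This is a model of
 the behavior as documented here, not the postmaster's code;
 <literal>SKETCH_NEVER_RESTART</literal> is a hypothetical stand-in for
 <literal>BGW_NEVER_RESTART</literal>, and the
 <parameter>cluster_reinit</parameter> argument stands for the case where the
 postmaster reinitializes the cluster after a backend failure.
<programlisting><![CDATA[
#include <stdbool.h>

/* Hypothetical stand-in for BGW_NEVER_RESTART. */
#define SKETCH_NEVER_RESTART (-1)

typedef enum
{
    SKETCH_UNREGISTER,      /* drop the worker's registration */
    SKETCH_RESTART_LATER,   /* restart after bgw_restart_time seconds */
    SKETCH_RESTART_NOW      /* cluster reinitialized after a crash */
} SketchExitAction;

/*
 * Decides what happens when a worker exits. *restart_after_s receives the
 * delay in seconds when the worker will be restarted and is left untouched
 * when it is unregistered.
 */
static SketchExitAction
sketch_on_worker_exit(int restart_time, int exit_code, bool terminated,
                      bool cluster_reinit, int *restart_after_s)
{
    if (restart_time == SKETCH_NEVER_RESTART || exit_code == 0 || terminated)
        return SKETCH_UNREGISTER;
    if (cluster_reinit)
    {
        *restart_after_s = 0;
        return SKETCH_RESTART_NOW;
    }
    *restart_after_s = restart_time;
    return SKETCH_RESTART_LATER;
}
]]></programlisting>
</para>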
<para>
When a background worker is registered using the
<function>RegisterDynamicBackgroundWorker</function> function, it is
possible for the backend performing the registration to obtain information
regarding the status of the worker. Backends wishing to do this should
pass the address of a <type>BackgroundWorkerHandle *</type> as the second
argument to <function>RegisterDynamicBackgroundWorker</function>. If the
worker is successfully registered, this pointer will be initialized with an
opaque handle that can subsequently be passed to
<function>GetBackgroundWorkerPid(<parameter>BackgroundWorkerHandle *</parameter>, <parameter>pid_t *</parameter>)</function> or
<function>TerminateBackgroundWorker(<parameter>BackgroundWorkerHandle *</parameter>)</function>.
<function>GetBackgroundWorkerPid</> can be used to poll the status of the
worker: a return value of <literal>BGWH_NOT_YET_STARTED</> indicates that
the worker has not yet been started by the postmaster;
<literal>BGWH_STOPPED</literal> indicates that it has been started but is
no longer running; and <literal>BGWH_STARTED</literal> indicates that it is
currently running. In this last case, the PID will also be returned via the
second argument.
<function>TerminateBackgroundWorker</> causes the postmaster to send
<literal>SIGTERM</> to the worker if it is running, and to unregister it
as soon as it is not.
</para>
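<para>
 The three outcomes of <function>GetBackgroundWorkerPid</function> can be
 modeled as below. The <type>SketchHandle</type> structure and the
 <literal>SKETCH_BGWH_</literal> names are hypothetical simplifications of
 what the postmaster tracks; the real <type>BackgroundWorkerHandle</type> is
 opaque and should only be passed to the functions described above.
<programlisting><![CDATA[
#include <stdbool.h>
#include <sys/types.h>

typedef enum
{
    SKETCH_BGWH_STARTED,
    SKETCH_BGWH_NOT_YET_STARTED,
    SKETCH_BGWH_STOPPED
} SketchBgwStatus;

/* Hypothetical view of what is known about one worker. */
typedef struct
{
    bool  ever_started;   /* has the postmaster launched it yet? */
    pid_t pid;            /* 0 when not currently running */
} SketchHandle;

/* The PID is reported through *pidp only when the worker is running. */
static SketchBgwStatus
sketch_get_worker_pid(const SketchHandle *h, pid_t *pidp)
{
    if (!h->ever_started)
        return SKETCH_BGWH_NOT_YET_STARTED;
    if (h->pid == 0)
        return SKETCH_BGWH_STOPPED;
    *pidp = h->pid;
    return SKETCH_BGWH_STARTED;
}
]]></programlisting>
</para>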
<para>
In some cases, a process which registers a background worker may wish to
wait for the worker to start up. This can be accomplished by initializing
<structfield>bgw_notify_pid</structfield> to <literal>MyProcPid</> and
then passing the <type>BackgroundWorkerHandle *</type> obtained at
registration time to the
<function>WaitForBackgroundWorkerStartup(<parameter>BackgroundWorkerHandle
*handle</parameter>, <parameter>pid_t *</parameter>)</function> function.
This function will block until the postmaster has attempted to start the
background worker, or until the postmaster dies. If the background worker
is running, the return value will be <literal>BGWH_STARTED</>, and
the PID will be written to the provided address. Otherwise, the return
value will be <literal>BGWH_STOPPED</literal> or
<literal>BGWH_POSTMASTER_DIED</literal>.
</para>
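<para>
 The waiting behavior can be pictured as a polling loop. This model assumes
 two hypothetical callbacks: one reports the worker's status, and one sleeps
 until something may have changed (for example, until
 <literal>SIGUSR1</literal> arrives) and reports whether the postmaster is
 still alive. It illustrates the documented return values only, not how
 <function>WaitForBackgroundWorkerStartup</function> is implemented. A
 scripted fake is included so the loop can be exercised without a server.
<programlisting><![CDATA[
#include <stdbool.h>
#include <sys/types.h>

typedef enum
{
    SKETCH_BGWH_STARTED,
    SKETCH_BGWH_NOT_YET_STARTED,
    SKETCH_BGWH_STOPPED,
    SKETCH_BGWH_POSTMASTER_DIED
} SketchBgwStatus;

/* Reports the current status, writing the PID if the worker is running. */
typedef SketchBgwStatus (*sketch_poll_fn)(void *ctx, pid_t *pidp);
/* Sleeps until a change may have happened; false means the postmaster died. */
typedef bool (*sketch_sleep_fn)(void *ctx);

static SketchBgwStatus
sketch_wait_for_startup(sketch_poll_fn poll_status,
                        sketch_sleep_fn sleep_until_signal,
                        void *ctx, pid_t *pidp)
{
    for (;;)
    {
        SketchBgwStatus st = poll_status(ctx, pidp);

        if (st != SKETCH_BGWH_NOT_YET_STARTED)
            return st;              /* started or already stopped */
        if (!sleep_until_signal(ctx))
            return SKETCH_BGWH_POSTMASTER_DIED;
    }
}

/* Scripted fake: "not yet started" for polls_left polls, then running. */
typedef struct
{
    int   polls_left;
    pid_t pid;
    bool  postmaster_alive;
} SketchScript;

static SketchBgwStatus
sketch_scripted_poll(void *ctx, pid_t *pidp)
{
    SketchScript *s = ctx;

    if (s->polls_left > 0)
    {
        s->polls_left--;
        return SKETCH_BGWH_NOT_YET_STARTED;
    }
    *pidp = s->pid;
    return SKETCH_BGWH_STARTED;
}

static bool
sketch_scripted_sleep(void *ctx)
{
    SketchScript *s = ctx;

    return s->postmaster_alive;
}
]]></programlisting>
</para>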
<para>
The <filename>worker_spi</> contrib module contains a working example,
which demonstrates some useful techniques.
</para>
<para>
The maximum number of registered background workers is limited by
<xref linkend="guc-max-worker-processes">.
</para>
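<para>
 One way to picture this limit is as a fixed array of slots that
 registrations claim and unregistrations free. This sketch is such a model,
 with a hypothetical limit of 8 and hypothetical function names; it is not
 meant to reproduce the server's data structures, and the real limit is
 whatever <varname>max_worker_processes</varname> is set to.
<programlisting><![CDATA[
#include <stdbool.h>
#include <string.h>

/* Hypothetical value standing in for max_worker_processes. */
#define SKETCH_MAX_WORKER_PROCESSES 8

typedef struct
{
    bool in_use;
    char name[64];
} SketchSlot;

static SketchSlot sketch_slots[SKETCH_MAX_WORKER_PROCESSES];

/* Claims the first free slot and returns its index, or -1 when every slot
 * is taken (the registration is refused). */
static int
sketch_register(const char *name)
{
    int i;

    for (i = 0; i < SKETCH_MAX_WORKER_PROCESSES; i++)
    {
        if (!sketch_slots[i].in_use)
        {
            sketch_slots[i].in_use = true;
            strncpy(sketch_slots[i].name, name, sizeof(sketch_slots[i].name) - 1);
            sketch_slots[i].name[sizeof(sketch_slots[i].name) - 1] = '\0';
            return i;
        }
    }
    return -1;
}

/* Frees a slot, as happens when a worker is unregistered. */
static void
sketch_unregister(int slot)
{
    sketch_slots[slot].in_use = false;
}
]]></programlisting>
</para>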
</chapter>