Lists: | pgsql-hackers |
---|
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Misdesigned command/status APIs for parallel dump/restore |
Date: | 2016-05-28 20:01:57 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
pg_dump's APIs for parallel dump/restore include an
archive-format-specific MasterStartParallelItem function, which is
evidently intended to allow format-specific data to be sent to the
worker process during startup of a parallel sub-job. However, no
such data can be sent in practice, because parsing of those commands
is hardwired in parallel.c's WaitForCommands() function; nor do the
APIs for the format-specific WorkerJobDump/WorkerJobRestore functions
allow any format-specific data to be passed to them. So it's unsurprising
that the two existing instantiations of MasterStartParallelItem
are effectively duplicates.
I am pretty strongly tempted to get rid of MasterStartParallelItem
altogether and just hard-code what it does in DispatchJobForTocEntry.
That'd also allow getting rid of the ugly usage of static buffers there.
Alternatively, we could do what some of the comments suggest and make the
format-specific WorkerJobDump/WorkerJobRestore functions responsible for
parsing the command strings. But that seems like it would just be adding
more duplicative code in support of flexibility that there's no use for.
The arguments to DispatchJobForTocEntry are just a TocEntry and a
T_Action, and that seems unlikely to change.
The situation for MasterEndParallelItem isn't a whole lot better.
At least the status strings are both created and parsed by format-
specific functions --- but those functions are completely duplicative,
performing no useful format-specific work, and there's no good reason
to think that we'd need format-specific work in future. So I'm tempted
to get rid of the MasterEndParallelItem API as well. Instead, have
WorkerJobDump/WorkerJobRestore just return integer status codes, and
perform all construction and parsing of the status messages in parallel.c.
It's possible to imagine that future archive formats might have use for,
say, passing an amount-of-data-written number from a worker back up to
the master. But you could also implement that by relying on the
filesystem (that is, a master or another worker could check file size
to see how much had been written). So I'm pretty skeptical that we
need extra flexibility here.
A different line of thought would be to fix the worker-command-parsing
situation by allowing the parsing to happen in format-specific code,
but avoid duplicative coding by letting archive formats share a common
implementation function if they had no need for any custom data.
Thoughts?
regards, tom lane
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Misdesigned command/status APIs for parallel dump/restore |
Date: | 2016-06-01 18:29:30 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
I wrote:
> I am pretty strongly tempted to get rid of MasterStartParallelItem
> altogether and just hard-code what it does in DispatchJobForTocEntry.
> ...
> A different line of thought would be to fix the worker-command-parsing
> situation by allowing the parsing to happen in format-specific code,
> but avoid duplicative coding by letting archive formats share a common
> implementation function if they had no need for any custom data.
In the attached patch for this, I took a middle ground of separating out
the command and status string building and parsing functions. There isn't
presently any provision for letting archive format modules override these,
but that could easily be added if we ever need it. Meanwhile, this saves
about 100 lines of rather messy code.
Like the other couple of patches I've posted recently for parallel dump,
this is against current HEAD. There will be minor merge conflicts when
these patches are combined; but they should be reviewable independently,
so I'll worry about the merge issues later.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
cleaner-dump-command-status-APIs.patch | text/x-diff | 23.9 KB |
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Misdesigned command/status APIs for parallel dump/restore |
Date: | 2016-06-02 19:48:42 |
Message-ID: | [email protected] |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
I wrote:
> In the attached patch for this, I took a middle ground of separating out
> the command and status string building and parsing functions. There isn't
> presently any provision for letting archive format modules override these,
> but that could easily be added if we ever need it. Meanwhile, this saves
> about 100 lines of rather messy code.
Attached are two versions of this patch. The first is against current
HEAD (i.e., rebased over e652273e073566b6) and the second is additionally
over the patch I proposed at <6814(dot)1464893855(at)sss(dot)pgh(dot)pa(dot)us>, resolving
the merge conflicts between them.
Since this is only refactoring, and doesn't (I think) fix any bugs,
I'm setting it aside till the next commitfest.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
cleaner-dump-command-status-APIs-2.patch | text/x-diff | 23.9 KB |
cleaner-dump-command-status-APIs-3.patch | text/x-diff | 24.6 KB |
From: | Kevin Grittner <kgrittn(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Misdesigned command/status APIs for parallel dump/restore |
Date: | 2016-09-27 17:29:45 |
Message-ID: | CACjxUsO4k=Y3W1_Lo=iX5CLg4FDvstxZofOtUaaOt2FL+8pN0g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
This patch also applied with minor offsets, compiled without
warning, and passes regression tests, including `make check-world`
with TAP tests enabled. It looks good to me, presenting a cleaner
API.
I really want to get this to the point where we can dump and
restore nested materialized views better, and this doesn't get us
to the point where that's immediately possible; but with a cleaner
API it should be easier to get there, so this is a step along the
way.
Setting to Ready for Committer.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company