From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: pgbench - allow to specify scale as a size
Date: 2018-02-17 09:20:14
Message-ID: alpine.DEB.2.20.1802171013110.20813@lancre
Lists: pgsql-hackers
After Kaarel Moppel's piece on pgbench scale/size conversion, it occurred
to me that having this as an option would be nice.

https://www.cybertec-postgresql.com/en/a-formula-to-calculate-pgbench-scaling-factor-for-target-db-size/

Here is an attempt at extending --scale so that it can be given a size.

pgbench -i --scale=124G ...

The approximate database size is also shown in the end-of-run summary.
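For illustration, here is a minimal sketch of the parsing involved (not the
patch itself; BYTES_PER_STEP is a rough assumed figure standing in for
Kaarel's regression formula):

    /*
     * Sketch only: accept either a plain scale ("50") or a size with a
     * unit suffix ("124G").  BYTES_PER_STEP is an assumption, not the
     * conversion used by the actual patch.
     */
    #include <stdio.h>
    #include <stdlib.h>

    #define BYTES_PER_STEP  (15.0 * 1024 * 1024)    /* assumed ~15 MB/step */

    static int
    parse_scale(const char *arg)
    {
        char   *end;
        double  val = strtod(arg, &end);

        if (*end == '\0')
            return (int) val;           /* plain number: classical scale */

        switch (*end)                   /* size suffix: convert to a scale */
        {
            case 'k': case 'K': val *= 1024.0; break;
            case 'm': case 'M': val *= 1024.0 * 1024; break;
            case 'g': case 'G': val *= 1024.0 * 1024 * 1024; break;
            case 't': case 'T': val *= 1024.0 * 1024 * 1024 * 1024; break;
            default:
                fprintf(stderr, "invalid scale: %s\n", arg);
                exit(1);
        }
        val /= BYTES_PER_STEP;
        return val < 1.0 ? 1 : (int) val;
    }

    int
    main(int argc, char **argv)
    {
        printf("scale = %d\n", parse_scale(argc > 1 ? argv[1] : "124G"));
        return 0;
    }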
--
Fabien.
Attachment: pgbench-scale-size-1.patch (text/x-diff, 6.5 KB)

From: Erik Rijkers <er(at)xs4all(dot)nl>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-02-17 11:30:14
Message-ID: [email protected]
Lists: pgsql-hackers
On 2018-02-17 10:20, Fabien COELHO wrote:
> After Kaarel Moppel's piece on pgbench scale/size conversion, it occurred
> to me that having this as an option would be nice.
>
> https://www.cybertec-postgresql.com/en/a-formula-to-calculate-pgbench-scaling-factor-for-target-db-size/
>
> Here is an attempt at extending --scale so that it can be given a size.
>
> pgbench -i --scale=124G ...
>
> The approximate database size is also shown in the end-of-run summary.
> [pgbench-scale-size-1.patch]
Seems a nice addition, but something isn't quite right; with '-s 50' (no
unit) I get 'scale 50 too small':
$ pgbench -is 50
scale 50 too small, rounded to 1
dropping old tables...
creating tables...
generating data...
100000 of 100000 tuples (100%) done (elapsed 0.13 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.
echo '\dti+ pgbench_accounts*' | psql -qX

                                 List of relations
 Schema |         Name          | Type  |  Owner   |      Table       |  Size   | Description
--------+-----------------------+-------+----------+------------------+---------+-------------
 public | pgbench_accounts      | table | aardvark |                  | 13 MB   |
 public | pgbench_accounts_pkey | index | aardvark | pgbench_accounts | 2208 kB |
(2 rows)
thanks,
Erik Rijkers

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-02-17 14:50:22
Message-ID: alpine.DEB.2.20.1802171524390.20813@lancre
Lists: pgsql-hackers
> Seems a nice addition, but something isn't quite right; with '-s 50' (no
> unit) I get 'scale 50 too small':

Sigh. Indeed, it seems that I forgot to test some cases... Thanks for the
debugging. Here is a hopefully better attempt.
I also upgraded the regression test to cover more than just "-s 1". I also
tried to improve the documentation to point out that the result is an
approximation.
--
Fabien.
Attachment: pgbench-scale-size-2.patch (text/x-diff, 7.8 KB)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-02-17 16:26:23
Message-ID: [email protected]
Lists: pgsql-hackers
Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
> Here is an attempt at extending --scale so that it can be given a size.
I do not actually find this to be a good idea. It's going to be
platform-dependent, or not very accurate, or both, and thereby
contribute to confusion by making results less reproducible.
Plus, what do we do if the backend changes table representation in
some way that invalidates Kaarel's formula altogether? More confusion
would be inevitable.
regards, tom lane

From: Alvaro Hernandez <aht(at)ongres(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-02-17 16:55:20
Message-ID: [email protected]
Lists: pgsql-hackers
On 17/02/18 11:26, Tom Lane wrote:
> Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
>> Here is an attempt at extending --scale so that it can be given a size.
> I do not actually find this to be a good idea. It's going to be
> platform-dependent, or not very accurate, or both, and thereby
> contribute to confusion by making results less reproducible.
>
> Plus, what do we do if the backend changes table representation in
> some way that invalidates Kaarel's formula altogether? More confusion
> would be inevitable.
Why not then insert a "few" rows, measure the size, truncate the table,
compute the formula and then insert to the desired user-requested size?
(or insert what should be the minimum, scale 1, measure, and extrapolate
what's missing). It doesn't sound too complicated to me, and targeting a
size is something that I believe is quite good for the user.
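A rough sketch of the idea (illustrative only, using libpq; it assumes the
database was just initialized with "pgbench -i -s 1" and that 124 GiB is
the requested target):

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        const double target = 124.0 * 1024 * 1024 * 1024;   /* 124 GiB */
        PGconn     *conn = PQconnectdb("");   /* default connection params */
        PGresult   *res;
        double      scale1_size;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* measure the on-disk size of the freshly created scale-1 database */
        res = PQexec(conn, "SELECT pg_database_size(current_database())");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
            return 1;
        }
        scale1_size = atof(PQgetvalue(res, 0, 0));
        PQclear(res);
        PQfinish(conn);

        /* extrapolate linearly; page-level overheads keep this approximate */
        printf("suggested scale: %.0f\n", target / scale1_size);
        return 0;
    }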
Álvaro
--
Alvaro Hernandez
-----------
OnGres

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Hernandez <aht(at)ongres(dot)com>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-02-17 17:17:17
Message-ID: [email protected]
Lists: pgsql-hackers
Alvaro Hernandez <aht(at)ongres(dot)com> writes:
> On 17/02/18 11:26, Tom Lane wrote:
>> Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
>>> Here is an attempt at extending --scale so that it can be given a size.
>> I do not actually find this to be a good idea. It's going to be
>> platform-dependent, or not very accurate, or both, and thereby
>> contribute to confusion by making results less reproducible.
>>
>> Plus, what do we do if the backend changes table representation in
>> some way that invalidates Kaarel's formula altogether? More confusion
>> would be inevitable.
> Why not then insert a "few" rows, measure the size, truncate the table,
> compute the formula and then insert to the desired user-requested size?
> (or insert what should be the minimum, scale 1, measure, and extrapolate
> what's missing). It doesn't sound too complicated to me, and targeting a
> size is something that I believe is quite good for the user.
Then you'd *really* have irreproducible results.
regards, tom lane

From: Alvaro Hernandez <aht(at)ongres(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-02-17 17:22:37
Message-ID: [email protected]
Lists: pgsql-hackers
On 17/02/18 12:17, Tom Lane wrote:
> Alvaro Hernandez <aht(at)ongres(dot)com> writes:
>> On 17/02/18 11:26, Tom Lane wrote:
>>> Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
>>>> Here is an attempt at extending --scale so that it can be given a size.
>>> I do not actually find this to be a good idea. It's going to be
>>> platform-dependent, or not very accurate, or both, and thereby
>>> contribute to confusion by making results less reproducible.
>>>
>>> Plus, what do we do if the backend changes table representation in
>>> some way that invalidates Kaarel's formula altogether? More confusion
>>> would be inevitable.
>> Why not then insert a "few" rows, measure the size, truncate the table,
>> compute the formula and then insert to the desired user-requested size?
>> (or insert what should be the minimum, scale 1, measure, and extrapolate
>> what's missing). It doesn't sound too complicated to me, and targeting a
>> size is something that I believe is quite good for the user.
> Then you'd *really* have irreproducible results.
>
> regards, tom lane
You also have irreproducible results today, according to your
criteria. Either you agree on the number of rows but may not agree on
the size (today), or you agree on the size but may not agree on the
number of rows. Right now you can only pick the former, while I think
people would significantly appreciate the latter. If neither is correct,
let's at least provide the choice.
Regards,
Álvaro
--
Alvaro Hernandez
-----------
OnGres

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-02-17 17:33:12
Message-ID: alpine.DEB.2.20.1802171823290.20813@lancre
Lists: pgsql-hackers
Hello Tom,
>> Here is an attempt at extending --scale so that it can be given a size.
>
> I do not actually find this to be a good idea. It's going to be
> platform-dependent, or not very accurate, or both, and thereby
> contribute to confusion by making results less reproducible.
I have often wanted such an option for testing, with criteria like "within
shared_buffers", "within memory", "twice the available memory", to look for
behavioral changes in some performance tests.

If you want reproducible (for some definition of reproducible) and accurate,
you can always use scale with a number. The report provides the actual scale
used anyway, so providing the size is just a convenience for the
initialization phase. I agree that it cannot be really exact.
Would it be more acceptable with some clear(er)/explicit caveat?
> Plus, what do we do if the backend changes table representation in
> some way that invalidates Kaarel's formula altogether?
Then the formula (a simple linear regression, really) would have to be
updated?
> More confusion would be inevitable.
There is not much confusion when the "scale" is reported. As for confusion,
a performance test is influenced by dozens of parameters anyway.

Now if you do not want such a feature, you can mark it as rejected, and we
will keep on trying to guess or look for the formula till the end of
time :-)
--
Fabien.

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Alvaro Hernandez <aht(at)ongres(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-02-17 17:37:38
Message-ID: alpine.DEB.2.20.1802171833450.20813@lancre
Lists: pgsql-hackers
> Why not then insert a "few" rows, measure the size, truncate the table,
> compute the formula and then insert to the desired user-requested size?
> (or insert what should be the minimum, scale 1, measure, and extrapolate
> what's missing). It doesn't sound too complicated to me, and targeting a
> size is something that I believe is quite good for the user.
The formula I used approximates the whole database, not just one table.
There was one for the table, but this is only part of the issue. In
particular, ISTM that index sizes should be included when caching is
considered.
Also, index sizes are probably in n ln(n), so some level of approximation
is inevitable.
Moreover, the intrinsic granularity of TPC-B as a multiple of 100,000 rows
makes it not very precise wrt size anyway.
--
Fabien.

From: Alvaro Hernandez <aht(at)ongres(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-02-17 17:40:03
Message-ID: [email protected]
Lists: pgsql-hackers
On 17/02/18 12:37, Fabien COELHO wrote:
>
>> Why not then insert a "few" rows, measure the size, truncate the
>> table, compute the formula and then insert to the desired
>> user-requested size? (or insert what should be the minimum, scale 1,
>> measure, and extrapolate what's missing). It doesn't sound too
>> complicated to me, and targeting a size is something that I believe
>> is quite good for the user.
>
> The formula I used approximates the whole database, not just one
> table. There was one for the table, but this is only part of the
> issue. In particular, ISTM that index sizes should be included when
> caching is considered.
>
> Also, index sizes are probably in n ln(n), so some level of
> approximation is inevitable.
>
> Moreover, the intrinsic granularity of TPC-B as a multiple of 100,000
> rows makes it not very precise wrt size anyway.
>
>
Sure, makes sense, so my second suggestion seems more reasonable: insert
with scale 1, measure there (ok, you might need to create indexes only to
later drop them), and if the computed scale > 1 then insert whatever is
left to insert. Shouldn't be a big deal to me.
I like the feature :)
Álvaro
--
Alvaro Hernandez
-----------
OnGres

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Alvaro Hernandez <aht(at)ongres(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-02-19 07:43:52
Message-ID: alpine.DEB.2.20.1802190832140.10483@lancre
Lists: pgsql-hackers
Hello Alvaro & Tom,
>>> Why not then insert a "few" rows, measure the size, truncate the
>>> table, compute the formula and then insert to the desired
>>> user-requested size? (or insert what should be the minimum, scale 1,
>>> measure, and extrapolate what's missing). It doesn't sound too
>>> complicated to me, and targeting a size is something that I believe
>>> is quite good for the user.
>>
>> The formula I used approximates the whole database, not just one table.
>> There was one for the table, but this is only part of the issue. In
>> particular, ISTM that index sizes should be included when caching is
>> considered.
>>
>> Also, index sizes are probably in n ln(n), so some level of
>> approximation is inevitable.
>>
>> Moreover, the intrinsic granularity of TPC-B as a multiple of 100,000
>> rows makes it not very precise wrt size anyway.
>
> Sure, makes sense, so my second suggestion seems more reasonable: insert
> with scale 1, measure there (ok, you might need to create indexes only to
> later drop them), and if the computed scale > 1 then insert whatever is
> left to insert. Shouldn't be a big deal to me.
I could implement that, even if it would lead to some approximation
nevertheless: ISTM that the very large scale regression performed by
Kaarel is significantly more precise than testing with scale 1 (typically
a few MiB) and extrapolating that to hundreds of GiB.

Maybe it could be done with some kind of open-ended dichotomy, but creating
and recreating indexes looks like an ugly solution, and what should be
significant is the whole database size, including the tellers & branches
tables and all indexes, so I'm not convinced. Now as the tellers & branches
tables have basically the same structure as accounts, they could be just
scaled by assuming that they would incur the same storage per row.

Anyway, even if I do not like it, it could be better than nothing. The key
point for me is that if Tom is dead set against the feature, the patch is
dead anyway.

Tom, would Alvaro's approach be more acceptable to you than a fixed formula
that would need updating, keeping in mind that such a feature implies some
level of approximation?
--
Fabien.

From: Mark Wong <mark(at)2ndQuadrant(dot)com>
To: Alvaro Hernandez <aht(at)ongres(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-02-19 17:00:41
Message-ID: [email protected]
Lists: pgsql-hackers
On Sat, Feb 17, 2018 at 12:22:37PM -0500, Alvaro Hernandez wrote:
>
>
> On 17/02/18 12:17, Tom Lane wrote:
> > Alvaro Hernandez <aht(at)ongres(dot)com> writes:
> >> On 17/02/18 11:26, Tom Lane wrote:
> >>> Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
> >>>> Here is an attempt at extending --scale so that it can be given a size.
> >>> I do not actually find this to be a good idea. It's going to be
> >>> platform-dependent, or not very accurate, or both, and thereby
> >>> contribute to confusion by making results less reproducible.
> >>>
> >>> Plus, what do we do if the backend changes table representation in
> >>> some way that invalidates Kaarel's formula altogether? More confusion
> >>> would be inevitable.
> >> Why not then insert a "few" rows, measure the size, truncate the table,
> >> compute the formula and then insert to the desired user-requested size?
> >> (or insert what should be the minimum, scale 1, measure, and extrapolate
> >> what's missing). It doesn't sound too complicated to me, and targeting a
> >> size is something that I believe is quite good for the user.
> > Then you'd *really* have irreproducible results.
> >
> > regards, tom lane
>
> You also have irreproducible results today, according to your
> criteria. Either you agree on the number of rows but may not agree on
> the size (today), or you agree on the size but may not agree on the
> number of rows. Right now you can only pick the former, while I think
> people would significantly appreciate the latter. If neither is correct,
> let's at least provide the choice.
What if we consider using ASCII (UTF-8?) text file sizes as a reference
point, something independent from the database?

I realize that even if a flat file size can be used as a more consistent
reference across platforms, it doesn't help with accurately determining
the final data file sizes due to any architecture-specific nuances or
changes in the backend. But perhaps it might still offer a little more
meaning to be able to say "50 gigabytes of bank account data" rather
than "10 million rows of bank accounts" when thinking about size over
cardinality.
Regards,
Mark
--
Mark Wong http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Mark Wong <mark(at)2ndQuadrant(dot)com>
Cc: Alvaro Hernandez <aht(at)ongres(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-02-19 17:27:51
Message-ID: alpine.DEB.2.20.1802191811190.21372@lancre
Lists: pgsql-hackers
Hello Mark,
> What if we consider using ASCII (UTF-8?) text file sizes as a reference
> point, something independent from the database?
Why not.
TPC-B basically specifies that rows (accounts, tellers, branches) are all
padded to 100 bytes, thus we could consider (i.e. document) that
--scale=SIZE refers to the amount of useful data held, and warn that
actual storage would add various overheads for page and row headers, free
space at the end of pages, indexes...

Then one scale step is 100,000 accounts, 10 tellers and 1 branch, i.e.
100,011 * 100 bytes ~ 9.5 MiB of useful data per scale step.
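Under that reading the conversion is plain arithmetic rather than a
regression; a tiny sketch (constants taken from the figures above):

    #include <math.h>

    /* useful TPC-B data per scale step: 100,011 rows of 100 bytes each */
    #define STEP_USEFUL_BYTES   (100011.0 * 100.0)

    /* scale needed to hold "size" bytes of useful data, rounded up */
    long
    scale_for_useful_size(double size)
    {
        long    scale = (long) ceil(size / STEP_USEFUL_BYTES);

        return scale < 1 ? 1 : scale;
    }

E.g. --scale=124G would then mean ceil(124 * 2^30 / 10,001,100) = 13,313
scale steps, whatever the storage overhead turns out to be.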
> I realize that even if a flat file size can be used as a more consistent
> reference across platforms, it doesn't help with accurately determining
> the final data file sizes due to any architecture-specific nuances or
> changes in the backend. But perhaps it might still offer a little more
> meaning to be able to say "50 gigabytes of bank account data" rather
> than "10 million rows of bank accounts" when thinking about size over
> cardinality.
Yep.
Now the overhead is really 60-65%. Although the specification is
unambiguous, we still need some maths to know whether it fits in buffers
or memory... The point of Kaarel's regression is to take this into
account.

Also, whether this option would be more acceptable to Tom is still an open
question. Tom?
--
Fabien.

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Mark Wong <mark(at)2ndQuadrant(dot)com>
Cc: Alvaro Hernandez <aht(at)ongres(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-02-20 10:06:26
Message-ID: alpine.DEB.2.20.1802201104480.6091@lancre
Lists: pgsql-hackers
> Now the overhead is really 60-65%. Although the specification is
> unambiguous, we still need some maths to know whether it fits in buffers
> or memory... The point of Kaarel's regression is to take this into
> account.
>
> Also, whether this option would be more acceptable to Tom is still an open
> question. Tom?
Here is a version with this approach: the documentation talks about
"actual data size, without overheads", and points out that storage
overheads are typically an additional 65%.
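For instance, with this wording, --scale=10G would mean roughly 10 GB of
useful data, hence with the typical 65% overhead on the order of
10 * 1.65 = 16.5 GB on disk.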
--
Fabien.
Attachment: pgbench-scale-size-3.patch (text/x-diff, 7.8 KB)

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Mark Wong <mark(at)2ndQuadrant(dot)com>
Cc: Alvaro Hernandez <aht(at)ongres(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-03-03 19:37:03
Message-ID: [email protected]
Lists: pgsql-hackers
On 2/20/18 05:06, Fabien COELHO wrote:
>> Now the overhead is really 60-65%. Although the specification is
>> unambiguous, we still need some maths to know whether it fits in buffers
>> or memory... The point of Kaarel's regression is to take this into
>> account.
>>
>> Also, whether this option would be more acceptable to Tom is still an open
>> question. Tom?
>
> Here is a version with this approach: the documentation talks about
> "actual data size, without overheads", and points out that storage
> overheads are typically an additional 65%.
I think when deciding on a size for a test database for benchmarking,
you want to size it relative to RAM or other storage layers. So a
feature that allows you to create a database of size N that is actually
not going to be anywhere near N seems pretty useless for that.
(Also, we have, for better or worse, settled on a convention for byte
unit prefixes in guc.c. Let's not introduce another one.)
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Mark Wong <mark(at)2ndQuadrant(dot)com>, Alvaro Hernandez <aht(at)ongres(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-03-04 09:09:11
Message-ID: alpine.DEB.2.20.1803040947500.12500@lancre
Lists: pgsql-hackers
>>> Now the overhead is really 60-65%. Although the specification is
>>> unambiguous, we still need some maths to know whether it fits in buffers
>>> or memory... The point of Kaarel's regression is to take this into
>>> account.
>>>
>>> Also, whether this option would be more acceptable to Tom is still an open
>>> question. Tom?
>>
>> Here is a version with this approach: the documentation talks about
>> "actual data size, without overheads", and points out that storage
>> overheads are typically an additional 65%.
>
> I think when deciding on a size for a test database for benchmarking,
> you want to size it relative to RAM or other storage layers. So a
> feature that allows you to create a database of size N that is actually
> not going to be anywhere near N seems pretty useless for that.
Hmmm.

At least the option says the size of the useful data, which should be what
the user is really interested in :-) You have a developer point of view
about the issue. From a performance point of view, ISTM that useful data
storage size is an interesting measure, which allows comparing between
(future) storage engines and showing the impact of smaller overheads, for
instance.

The other option can only be some kind of approximation, and would require
some kind of maintenance (e.g. a future zheap overhead would be different
from the heap overhead, the overhead depends on the size itself, and it
could also depend on other options). This has been rejected, and I agree
with the rejection (incredible :-).
So ISTM that the patch is dead because it is somehow necessarily
imprecise. People will continue to do some wild guessing on how to
translate scale to anything related to size.
> (Also, we have, for better or worse, settled on a convention for byte
> unit prefixes in guc.c. Let's not introduce another one.)
Hmmm. Indeed for worse, as it is soooo much better to invent our own units
than to reuse existing ones which were not confusing enough :-)

- SI units: 1 kB = 1000 bytes (*small* k)
- IEC units: 1 KiB = 1024 bytes
- JEDEC units: 1 KB = 1024 bytes (*capital* K)

But the Postgres documentation uses "kB" for 1024 bytes, too bad :-(

The GUCs are about memory, which is measured in units of 1024, but storage
is usually measured in units of 1000, and this option was about storage,
hence I felt it better to avoid confusion.
Conclusion: mark the patch as rejected?
--
Fabien.

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Mark Wong <mark(at)2ndQuadrant(dot)com>, Alvaro Hernandez <aht(at)ongres(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to specify scale as a size
Date: 2018-03-06 13:18:40
Message-ID: [email protected]
Lists: pgsql-hackers
On 3/4/18 04:09, Fabien COELHO wrote:
> So ISTM that the patch is dead because it is somehow necessarily
> imprecise. People will continue to do some wild guessing on how to
> translate scale to anything related to size.
I think so.
> Conclusion: mark the patch as rejected?
OK
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services