From: | Binguo Bao <djydewang(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Optimize partial TOAST decompression |
Date: | 2019-07-05 18:27:56 |
Message-ID: | CAL-OGkuvHceCv96H8PHk+GeDn5NxuojGhQQati7Z7uFHduZVBg@mail.gmail.com |
Lists: | pgsql-hackers |
Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote on Fri, Jul 5, 2019 at 1:46 AM:
> I've done a bit of testing and benchmarking on this patch today, and
> there's a bug somewhere, making it look like there are corrupted data.
>
> What I'm seeing is this:
>
> CREATE TABLE t (a text);
>
> -- attached is data for one row
> COPY t FROM '/tmp/t.data';
>
>
> SELECT length(substr(a,1000)) from t;
> psql: ERROR: compressed data is corrupted
>
> SELECT length(substr(a,0,1000)) from t;
> length
> --------
> 999
> (1 row)
>
> SELECT length(substr(a,1000)) from t;
> psql: ERROR: invalid memory alloc request size 2018785106
>
> That's quite bizarre behavior - it does work with a prefix, but not with
> suffix. And the exact ERROR changes after the prefix query. (Of course,
> on master it works in all cases.)
>
> The backtrace (with the patch applied) looks like this:
>
> #0 toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2291
> #1 toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2277
> #2 0x00000000004c3b08 in heap_tuple_untoast_attr_slice (attr=<optimized
> out>, sliceoffset=0, slicelength=-1) at tuptoaster.c:315
> #3 0x000000000085c1e5 in pg_detoast_datum_slice (datum=<optimized out>,
> first=<optimized out>, count=<optimized out>) at fmgr.c:1767
> #4 0x0000000000833b7a in text_substring (str=133761519127512, start=0,
> length=<optimized out>, length_not_specified=<optimized out>) at
> varlena.c:956
> ...
>
> I've only observed this with a very small number of rows (the data is
> generated randomly with different compressibility etc.), so I'm only
> attaching one row that exhibits this issue.
>
> My guess is toast_fetch_datum_slice() gets confused by the headers or
> something, or something like that. FWIW the new code added to this
> function does not adhere to our code style, and would deserve some
> additional explanation of what it's doing/why. Same for the
> heap_tuple_untoast_attr_slice, BTW.
>
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
Hi, Tomas!
Thanks for your testing and the suggestion.
> That's quite bizarre behavior - it does work with a prefix, but not with
> suffix. And the exact ERROR changes after the prefix query.
I think the bug is caused by frame "#2 0x00000000004c3b08 in
heap_tuple_untoast_attr_slice (attr=<optimized out>, sliceoffset=0,
slicelength=-1) at tuptoaster.c:315",
since I ignored the case where slicelength is negative. I've added
some comments to heap_tuple_untoast_attr_slice for that case.
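To illustrate the negative-slicelength case, here is a minimal standalone sketch. It is not the code from the patch: the helper names slice_bytes_needed and prefix_to_decompress are hypothetical, and it assumes that a negative slicelength means "from sliceoffset to the end of the value", as the slicelength=-1 in the backtrace for substr(a,1000) suggests.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): number of raw bytes a slice
 * request covers.  Assumption: a negative slicelength means "everything
 * from sliceoffset to the end of the value".
 */
int32_t
slice_bytes_needed(int32_t rawsize, int32_t sliceoffset, int32_t slicelength)
{
    assert(rawsize >= 0 && sliceoffset >= 0);

    if (sliceoffset >= rawsize)
        return 0;                       /* slice starts past the end */

    if (slicelength < 0 || slicelength > rawsize - sliceoffset)
        return rawsize - sliceoffset;   /* clamp to the rest of the value */

    return slicelength;
}

/*
 * Hypothetical helper (not from the patch): length of the raw prefix that
 * must be decompressed to serve the slice.  Treating a negative
 * slicelength as an ordinary length would make this value too small,
 * so the negative case has to be resolved first.
 */
int32_t
prefix_to_decompress(int32_t rawsize, int32_t sliceoffset, int32_t slicelength)
{
    int32_t needed = slice_bytes_needed(rawsize, sliceoffset, slicelength);

    return needed == 0 ? 0 : sliceoffset + needed;
}
```

With these assumptions, a suffix request (negative slicelength) always resolves to the full raw size, so only prefix requests can benefit from partial decompression.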
> FWIW the new code added to this
> function does not adhere to our code style, and would deserve some
> additional explanation of what it's doing/why. Same for the
> heap_tuple_untoast_attr_slice, BTW.
I've added more comments to explain the code's behavior.
In addition, I renamed the macro "TOAST_COMPRESS_RAWDATA" to
"TOAST_COMPRESS_DATA", since
it is used to get the compressed TOAST data rather than the raw data.
Best Regards, Binguo Bao.
Attachment | Content-Type | Size |
---|---|---|
0001-Optimize-partial-TOAST-decompression-5.patch | text/x-patch | 6.8 KB |