Re: Mixed up protocol packets in server response?

Lists: pgsql-generalpgsql-jdbc
From: Michal Politowski <mpol+pg(at)meep(dot)pl>
To: pgsql-general(at)postgresql(dot)org
Subject: Mixed up protocol packets in server response?
Date: 2011-06-01 13:06:49
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general pgsql-jdbc

I have an instance of what looks like a weird mixup of data returned by the server.
This is observed in a Java application using the 9.0-801.jdbc4 JDBC driver
connecting to an 8.3.4 server (yes, it's old) running on Solaris.

The application hung waiting for the result of one
select * from a_table where a_field = parameter;

Looking at it with a debugger I see that:
1. it waits in:
SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) line: not available [native method]
SocketInputStream.read(byte[], int, int) line: 129
VisibleBufferedInputStream.read(byte[], int, int) line: 219
PGStream.Receive(byte[], int, int) line: 460
PGStream.ReceiveTupleV3() line: 365
QueryExecutorImpl.processResults(ResultHandler, int) line: 1814
QueryExecutorImpl.execute(Query, ParameterList, ResultHandler, int, int, int) line: 257
Jdbc4PreparedStatement(AbstractJdbc2Statement).execute(Query, ParameterList, int) line: 500
Jdbc4PreparedStatement(AbstractJdbc2Statement).executeWithFlags(int) line: 388
Jdbc4PreparedStatement(AbstractJdbc2Statement).executeQuery() line: 273
because it expects a field of size = 825767394 (l_size in ReceiveTupleV3)
which is much more than the whole result should be.

2. then looking at the answer array in ReceiveTupleV3:
A few first fields have the expected values,
then answer[3] has the expected
size but somewhere in the middle it changes from field 3 of the first row
of the result to the middle of field 3 of the last row of the result.
Then of course there are 4 bytes of this field that as a 32-bit integer
have the value of 825767394.
Then answer[4] obviously has 825767394 elements. And it continues with the
field 3 of the last row and then it contains the 4-byte length and then
value of the field 4 (last) of the last row of the result - so the end of a
DataRow message.
Then a CommandComplete message with the tag "SELECT",
then a ReadyForQuery message with the status indicator "T".
And then there is the rest of the field 3 of the first row, then length and
value for the field 4 of the first row and then the DataRow messages for
the following rows of the result. The last one up until the place where
it got mixed up before.

So it looks like the server wanted to send
D row 1, D row 2, D row 3, D row 4, C SELECT, Z T
but the application sees
D ro, ow 4, C SELECT, Z T, w 1, D row 2, D row 3, D r

What may be the cause of this weird problem? Is it some known or unknown bug in
8.3.4 or is the application/Java side more suspected?

--
Michał Politowski
Talking has been known to lead to communication if practiced carelessly.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michal Politowski <mpol+pg(at)meep(dot)pl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Mixed up protocol packets in server response?
Date: 2011-06-01 14:11:11
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general pgsql-jdbc

Michal Politowski <mpol+pg(at)meep(dot)pl> writes:
> 2. then looking at the answer array in ReceiveTupleV3:
> ...
> So it looks like the server wanted to send
> D row 1, D row 2, D row 3, D row 4, C SELECT, Z T
> but the application sees
> D ro, ow 4, C SELECT, Z T, w 1, D row 2, D row 3, D r

> What may be the cause of this weird problem? Is it some known or unknown bug in
> 8.3.4 or is the application/Java side more suspected?

It's just about impossible to believe that the server sent that, because
(1) it generates and sends those messages one at a time, and (2) nothing
of the sort has ever been reported before. My money is on some foulup
in buffer-wrangling on the client side.

regards, tom lane


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Michal Politowski <mpol+pg(at)meep(dot)pl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Mixed up protocol packets in server response?
Date: 2011-06-02 00:50:30
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general pgsql-jdbc

On 1/06/2011 9:06 PM, Michal Politowski wrote:

> What may be the cause of this weird problem? Is it some known or unknown bug in
> 8.3.4 or is the application/Java side more suspected?

It'd be really helpful if you could collect and examine a trace of the
client/server communication using WireShark. That way you can confirm
whether it is (as Tom suspects) the client side mangling its buffers, or
whether the server really did send the nonsensical sequence.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Michal Politowski <mpol+pg(at)meep(dot)pl>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Mixed up protocol packets in server response?
Date: 2011-06-03 13:03:57
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general pgsql-jdbc

On Thu, 2 Jun 2011 08:50:30 +0800, Craig Ringer wrote:
> On 1/06/2011 9:06 PM, Michal Politowski wrote:
>
> >What may be the cause of this weird problem? Is it some known or unknown bug in
> >8.3.4 or is the application/Java side more suspected?
>
> It'd be really helpful if you could collect and examine a trace of
> the client/server communication using WireShark. That way you can
> confirm whether it is (as Tom suspects) the client side mangling its
> buffers, or whether the server really did send the nonsensical
> sequence.

Actually my own money is with Tom's. It's very hard to believe Postgres
would do something like this, unless it were some obvious and long fixed bug in 8.3.4.

Still, trying to trace the communication makes sense, if I can convince the
owners of the system to let me do it. Unfortunately this is an
"one in a million of successful queries" (actually two in much more than a million)
problem. And the next run of the application seems not to have hit it, yet.

Thinking aloud: If this is, as it is to be suspected, an application-side problem,
there is at first sight not much space in the application where it could hide. The data is
mixed up in a driver buffer, two method calls from the standard library
socket code. There is the VisibleBufferedInputStream there. Could it do
something like this? Maybe if the connection was erroneously used from two threads
concurrently? The connections are pooled via commons-dbcp BasicDataSource
and queries are executed via Spring JdbcTemplate within Spring-configured
transaction. No passing connections by hand anywhere, everything should be
nicely thread-bound. Still, if not here, where could it go wrong?

--
Michał Politowski
Talking has been known to lead to communication if practiced carelessly.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michal Politowski <mpol+pg(at)meep(dot)pl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Mixed up protocol packets in server response?
Date: 2011-06-03 13:53:59
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general pgsql-jdbc

Michal Politowski <mpol+pg(at)meep(dot)pl> writes:
> Thinking aloud: If this is, as it is to be suspected, an application-side problem,
> there is at first sight not much space in the application where it could hide. The data is
> mixed up in a driver buffer, two method calls from the standard library
> socket code. There is the VisibleBufferedInputStream there. Could it do
> something like this? Maybe if the connection was erroneously used from two threads
> concurrently? The connections are pooled via commons-dbcp BasicDataSource
> and queries are executed via Spring JdbcTemplate within Spring-configured
> transaction. No passing connections by hand anywhere, everything should be
> nicely thread-bound. Still, if not here, where could it go wrong?

You'd probably be better off asking these questions in pgsql-jdbc ...
a lot of us here don't even speak Java.

regards, tom lane


From: Michal Politowski <mpol+pg(at)meep(dot)pl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Mixed up protocol packets in server response?
Date: 2011-06-03 14:13:48
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general pgsql-jdbc

On Fri, 3 Jun 2011 09:53:59 -0400, Tom Lane wrote:
[...]
> You'd probably be better off asking these questions in pgsql-jdbc ...
> a lot of us here don't even speak Java.

Good point. Will try the other list. Thank you for your help.

--
Michał Politowski
Talking has been known to lead to communication if practiced carelessly.


From: Michal Politowski <mpol+pg(at)meep(dot)pl>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: [GENERAL] Mixed up protocol packets in server response?
Date: 2011-06-03 14:28:25
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general pgsql-jdbc

Hello,
I have a quite weird problem. I described it in a mail to pgsql-general
quoted below, and quite obviously the general opinion is that the source is
on the application side, not server side. But the data is mixed up in
the jdbc driver, two method calls away from the standard library socket
code, so there is not much space for the application to mess it up.

There is the VisibleBufferedInputStream there. Could it do something
like this? Maybe if the connection was erroneously used from two
threads concurrently? The connections are pooled via commons-dbcp
BasicDataSource and queries are executed via Spring JdbcTemplate within
Spring-configured transactions. No passing connections by hand anywhere
and everything should be nicely thread-bound. Still, if not here, where
could it go wrong? Could someone possibly give some other suggestions?

On Wed, 1 Jun 2011 15:06:49 +0200, Michal Politowski wrote:
> I have an instance of what looks like a weird mixup of data returned by the server.
> This is observed in a Java application using the 9.0-801.jdbc4 JDBC driver
> connecting to an 8.3.4 server (yes, it's old) running on Solaris.
>
> The application hung waiting for the result of one
> select * from a_table where a_field = parameter;
>
> Looking at it with a debugger I see that:
> 1. it waits in:
> SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) line: not available [native method]
> SocketInputStream.read(byte[], int, int) line: 129
> VisibleBufferedInputStream.read(byte[], int, int) line: 219
> PGStream.Receive(byte[], int, int) line: 460
> PGStream.ReceiveTupleV3() line: 365
> QueryExecutorImpl.processResults(ResultHandler, int) line: 1814
> QueryExecutorImpl.execute(Query, ParameterList, ResultHandler, int, int, int) line: 257
> Jdbc4PreparedStatement(AbstractJdbc2Statement).execute(Query, ParameterList, int) line: 500
> Jdbc4PreparedStatement(AbstractJdbc2Statement).executeWithFlags(int) line: 388
> Jdbc4PreparedStatement(AbstractJdbc2Statement).executeQuery() line: 273
> because it expects a field of size = 825767394 (l_size in ReceiveTupleV3)
> which is much more than the whole result should be.
>
> 2. then looking at the answer array in ReceiveTupleV3:
> A few first fields have the expected values,
> then answer[3] has the expected
> size but somewhere in the middle it changes from field 3 of the first row
> of the result to the middle of field 3 of the last row of the result.
> Then of course there are 4 bytes of this field that as a 32-bit integer
> have the value of 825767394.
> Then answer[4] obviously has 825767394 elements. And it continues with the
> field 3 of the last row and then it contains the 4-byte length and then
> value of the field 4 (last) of the last row of the result - so the end of a
> DataRow message.
> Then a CommandComplete message with the tag "SELECT",
> then a ReadyForQuery message with the status indicator "T".
> And then there is the rest of the field 3 of the first row, then length and
> value for the field 4 of the first row and then the DataRow messages for
> the following rows of the result. The last one up until the place where
> it got mixed up before.
>
> So it looks like the server wanted to send
> D row 1, D row 2, D row 3, D row 4, C SELECT, Z T
> but the application sees
> D ro, ow 4, C SELECT, Z T, w 1, D row 2, D row 3, D r
>
> What may be the cause of this weird problem? Is it some known or unknown bug in
> 8.3.4 or is the application/Java side more suspected?

--
Michał Politowski
Talking has been known to lead to communication if practiced carelessly.


From: David Boreham <david_list(at)boreham(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Mixed up protocol packets in server response?
Date: 2011-06-03 16:20:12
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general pgsql-jdbc


> transaction. No passing connections by hand anywhere, everything should be
> nicely thread-bound. Still, if not here, where could it go wrong?
>
I have seen two cases in my career where there was an "evil box" on the
network that corrupted the traffic.
The first was a very long time ago (in the late '80s) but the second was
only a couple of years ago and presented
with very similar symptoms to your report. This happened at a consulting
client's site (actually between two sites).
Weird "franken-packets" showed up once in a while, leading to a protocol
decode failure. Luckily we had been
involved in writing both the client and the server, and therefore had a
high degree of confidence that they were
correct. The network administrators denied strongly that they had any
equipment deployed that touched the
payload of any packet. They denied this several times. Eventually we
were able to take packet traces
on both client and server machines, correlate the traffic (not
necessarily an easy task), and prove
conclusively that what had been sent from one end did not show up intact
at the other end.
A few days later the network people revealed that they had some sort of
firewall/traffic management box
that was mangling the traffic.

Having said that, bugs in buffer management code are also not uncommon,
and can manifest intermittently
since they may be triggered by specific boundary conditions, specific
received data buffer size, and so on.

I have also seen once case of data leaking between threads in an
unpleasant and intermittent way
in a Java application, in buffer management code that attempted to avoid
GC overhead by re-using
buffers across sessions. So that's definitely a non-zero possibility too.


From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: Michal Politowski <mpol+pg(at)meep(dot)pl>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: [GENERAL] Mixed up protocol packets in server response?
Date: 2011-06-03 22:39:26
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general pgsql-jdbc

2011/6/4 Michal Politowski <mpol+pg(at)meep(dot)pl>:
> Hello,
> I have a quite weird problem. I described it in a mail to pgsql-general
> quoted below, and quite obviously the general opinion is that the source is
> on the application side, not server side. But the data is mixed up in
> the jdbc driver, two method calls away from the standard library socket
> code, so there is not much space for the application to mess it up.

Can you get a capture of the traffic on the actual network (e.g. via
tcpdump/wireshark?)

Oliver


From: Michal Politowski <mpol+pg(at)meep(dot)pl>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: [GENERAL] Mixed up protocol packets in server response?
Date: 2011-06-06 08:13:23
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-general pgsql-jdbc

On Sat, 4 Jun 2011 10:39:26 +1200, Oliver Jowett wrote:
> 2011/6/4 Michal Politowski <mpol+pg(at)meep(dot)pl>:
> > Hello,
> > I have a quite weird problem. I described it in a mail to pgsql-general
> > quoted below, and quite obviously the general opinion is that the source is
> > on the application side, not server side. But the data is mixed up in
> > the jdbc driver, two method calls away from the standard library socket
> > code, so there is not much space for the application to mess it up.
>
> Can you get a capture of the traffic on the actual network (e.g. via
> tcpdump/wireshark?)

Depends on what the system owners will be happy with, but I'm trying to
arrange it.

--
Michał Politowski
Talking has been known to lead to communication if practiced carelessly.