@dnwe
Collaborator

@dnwe dnwe commented Apr 15, 2025

Array lengths in the Kafka protocol are encoded as a 32-bit signed integer in big-endian format. A null array is encoded as the byte sequence FF FF FF FF, which represents a length of -1. To read that correctly, we need to cast the decoded value to int32 before widening it to int; otherwise, on platforms where int is 64 bits wide, the length is read as 4294967295 instead of -1.
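
A minimal sketch of the difference, assuming the raw length is read with `binary.BigEndian.Uint32`. The function names `lengthDirect` and `lengthViaInt32` are hypothetical and are not taken from the Sarama source:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// lengthDirect (hypothetical name) widens the unsigned 32-bit value
// straight to int. Where int is 64 bits wide, FF FF FF FF becomes
// 4294967295 rather than -1.
func lengthDirect(b []byte) int {
	return int(binary.BigEndian.Uint32(b))
}

// lengthViaInt32 (hypothetical name) reinterprets the bits as a signed
// 32-bit value first, so FF FF FF FF becomes -1 whatever the width of int.
func lengthViaInt32(b []byte) int {
	return int(int32(binary.BigEndian.Uint32(b)))
}

func main() {
	null := []byte{0xFF, 0xFF, 0xFF, 0xFF}
	fmt.Println("single cast:", lengthDirect(null))
	fmt.Println("double cast:", lengthViaInt32(null))
}
```

For non-negative lengths the two functions agree; they only diverge once the top bit of the 32-bit value is set.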

Collaborator

@puellanivis puellanivis left a comment

ARRAY: Represents a sequence of objects of a given type T. Type T can be either a primitive type (e.g. STRING) or a structure. First, the length N is given as an INT32. Then N instances of type T follow. A null array is represented with a length of -1. In protocol documentation an array of T instances is referred to as [T].
https://kafka.apache.org/protocol.html

Checks out. The cast to int32 first is necessary to get signedness before then casting to a maybe-larger int.

I’m not sure that we’re still handling -1 length NULL arrays properly though. Should we be returning errInvalidArrayLength or should we be returning a []T(nil)?

@dnwe
Collaborator Author

dnwe commented Apr 15, 2025

Yeah, the latter I think. @vladoatanasov spotted the original problem whilst using the decoder code elsewhere. I was mildly surprised we'd never hit any problems due to this, but I can only guess it's an uncommon response for the client to decode, and we obviously don't have any unit test coverage of this case.

@vladoatanasov

vladoatanasov commented Apr 15, 2025

Hey @puellanivis, I stumbled upon this issue while implementing the DescribeConfigs API here. For the configuration keys array, the spec states: "The configuration keys to list, or null to list all configuration keys." And that's how clients behave: they send -1, i.e. a null array, when no config keys are specified. I tried this behavior with Sarama and the Kafka CLI. So I believe the decoder should return nil when the array size is -1.
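
The behavior proposed here could look roughly like the following sketch. It assumes an array of INT32 elements for simplicity; `decodeInt32Array` and the two error variables are illustrative stand-ins defined locally, not the actual Sarama decoder API:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Local stand-ins for illustration only; the real decoder has its own errors.
var (
	errInsufficientData   = errors.New("insufficient data to decode")
	errInvalidArrayLength = errors.New("invalid array length")
)

// decodeInt32Array (hypothetical helper) decodes a Kafka ARRAY of INT32.
// A length of -1 yields a nil slice, a length of 0 yields an empty,
// non-nil slice, and any other negative length is rejected.
func decodeInt32Array(buf []byte) ([]int32, error) {
	if len(buf) < 4 {
		return nil, errInsufficientData
	}
	// Cast to int32 first so FF FF FF FF is read as -1 on every platform.
	n := int(int32(binary.BigEndian.Uint32(buf)))
	buf = buf[4:]

	switch {
	case n == -1:
		return nil, nil // null array
	case n < -1:
		return nil, errInvalidArrayLength
	case n > len(buf)/4:
		return nil, errInsufficientData
	}

	out := make([]int32, n)
	for i := range out {
		out[i] = int32(binary.BigEndian.Uint32(buf[4*i:]))
	}
	return out, nil
}

func main() {
	null, err := decodeInt32Array([]byte{0xFF, 0xFF, 0xFF, 0xFF})
	fmt.Println(null == nil, err)

	empty, err := decodeInt32Array([]byte{0, 0, 0, 0})
	fmt.Println(empty == nil, len(empty), err)
}
```

Keeping null and empty distinguishable matters for requests like DescribeConfigs, where null means "list all keys" and an empty array does not.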

@github-actions

Thank you for your contribution! However, this pull request has not had any activity in the past 90 days and will be closed in 30 days if no updates occur.
If you believe the changes are still valid then please verify your branch has no conflicts with main and rebase if needed. If you are awaiting a (re-)review then please let us know.

@github-actions github-actions bot added the stale label Jul 14, 2025
@puellanivis
Collaborator

🤔 This probably shouldn’t still be in draft mode? The change is sound. (Even though it’s a little weird that it has a double cast, it’s correct.)

@github-actions github-actions bot removed the stale label Jul 15, 2025
@dnwe dnwe force-pushed the dnwe/fix-decoder branch from 9f66306 to df8f7cd Compare July 15, 2025 09:35
@dnwe dnwe marked this pull request as ready for review July 15, 2025 09:35
@dnwe
Collaborator Author

dnwe commented Jul 15, 2025

@puellanivis yeah, I think I intended to come back and add a unit test before getting a review and merging, but obviously forgot about it.

@vladoatanasov

@dnwe it's probably worth adding a code comment explaining why a double cast is needed. I had already forgotten about this use case. Someone looking at this code in the future might wonder what's going on.

@puellanivis
Collaborator

🤔 I think a unit test might be a bit hard, because it would never fail in a 32-bit environment. But then, I think we get plenty of 64-bit environment testing done.
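
To illustrate the platform dependence, here is a rough sketch that checks whether the single-cast and double-cast results differ on the current platform. The function name `regressionDetectable` is hypothetical and exists only for this example:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
)

// regressionDetectable (hypothetical name) reports whether decoding
// FF FF FF FF with a single cast gives a different result from the
// double cast on this platform. With a 32-bit int the conversion from
// uint32 wraps around to -1, so both paths agree and a regression test
// for the single cast could not fail there.
func regressionDetectable() bool {
	raw := binary.BigEndian.Uint32([]byte{0xFF, 0xFF, 0xFF, 0xFF})
	single := int(raw)        // platform-dependent result
	double := int(int32(raw)) // -1 on every platform
	return single != double
}

func main() {
	fmt.Printf("int is %d bits wide, regression detectable: %v\n",
		strconv.IntSize, regressionDetectable())
}
```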

I have already +1’ed the idea of adding a comment about why a double cast is necessary. I remembered pretty fast why it’s there, but it is kind of an astonishing thing to need to do.

@dnwe dnwe added the fix label Aug 6, 2025
@dnwe dnwe force-pushed the dnwe/fix-decoder branch from df8f7cd to eff4b22 Compare August 6, 2025 11:35
Array lengths in the Kafka protocol are encoded as a 32-bit signed
integer in big-endian format. For a null value this corresponds to the
byte sequence FF FF FF FF which is intended to represent null as a
length of -1. To correctly read that we need to cast the value to int32
before widening to int.

Signed-off-by: Dominic Evans <[email protected]>
@dnwe dnwe force-pushed the dnwe/fix-decoder branch from eff4b22 to c394867 Compare August 6, 2025 11:40
@dnwe dnwe merged commit 2f15dc6 into main Aug 6, 2025
17 checks passed
@dnwe dnwe deleted the dnwe/fix-decoder branch August 6, 2025 12:13