
BinderTransport flow control starves unlucky streams #8834

@jdcormie

Description

What version of gRPC-Java are you using?

head

What is your environment?

Android/Linux

Steps to reproduce the bug

Run a gRPC client for 60 seconds that keeps 10 unary RPCs concurrently active against a service that immediately responds to each with a 1MB payload.

What did you expect to see?

No errors and a normal distribution of response latencies.

What did you see instead?

About 1% of requests fail with either DEADLINE_EXCEEDED or CANCELLED.
p99 latency of 3 seconds, compared to a p50 of only 90ms.

I believe the long-tail latency and the timeouts are caused by some call IDs always hashing to the end of BinderTransport's ongoingCalls container. Every time space opens up in the flow control window, calls whose IDs come early in the iteration order consume all of it. By the time iteration reaches the end of ongoingCalls, flowController.isTransmitWindowFull() is returning true again, and Outbound.send() just returns without making any progress.
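To make the suspected mechanism concrete, here is a toy simulation. It is a minimal sketch under stated assumptions, not BinderTransport code: time advances in ticks, each tick the peer's acknowledgements free a fixed number of chunks in the shared transmit window, and calls are visited in a fixed order, each sending as much as the window allows. Slot i stands for the call at position i of ongoingCalls' iteration order, and a completed call is replaced by a new one in the same slot. The class `FlowControlStarvation`, the `simulate` method, and its parameters are hypothetical. The `rotateStart` flag is a hypothetical contrast for the experiment, not a proposed or actual fix.

```java
import java.util.Arrays;

/**
 * Hypothetical toy model of the starvation described above (not BinderTransport code).
 * Slot i stands for whichever call sits at position i of ongoingCalls' iteration order.
 * When a call completes, a new call takes over the same slot.
 */
public final class FlowControlStarvation {

  /**
   * Returns the number of completed responses per slot.
   *
   * @param slots number of concurrently active calls
   * @param windowChunks chunks of transmit window freed up each tick
   * @param payloadChunks chunks in each response
   * @param ticks number of window refills to simulate
   * @param rotateStart if true, each pass starts one slot later (hypothetical contrast)
   */
  public static int[] simulate(
      int slots, int windowChunks, int payloadChunks, int ticks, boolean rotateStart) {
    int[] remaining = new int[slots];
    Arrays.fill(remaining, payloadChunks);
    int[] completed = new int[slots];
    int start = 0;
    for (int tick = 0; tick < ticks; tick++) {
      int available = windowChunks; // space freed by the peer's acknowledgements
      // Stopping at available == 0 plays the role of isTransmitWindowFull()
      // returning true: every slot later in the order is skipped this pass.
      for (int i = 0; i < slots && available > 0; i++) {
        int slot = (start + i) % slots;
        int n = Math.min(remaining[slot], available);
        remaining[slot] -= n;
        available -= n;
        if (remaining[slot] == 0) {
          completed[slot]++;
          remaining[slot] = payloadChunks; // a new call takes over this slot
        }
      }
      if (rotateStart) {
        start = (start + 1) % slots;
      }
    }
    return completed;
  }

  public static void main(String[] args) {
    System.out.println("fixed order:    " + Arrays.toString(simulate(10, 4, 10, 1000, false)));
    System.out.println("rotating start: " + Arrays.toString(simulate(10, 4, 10, 1000, true)));
  }
}
```

With 10 calls, a 4-chunk window, and 10-chunk responses, the fixed-order run completes calls only in the first two slots; slots 2 through 9 never send a chunk. Both runs send the full window every tick, so total throughput is identical. In this toy model, rotating the starting slot spreads that same throughput across every call, which is consistent with iteration order, rather than window size, being what starves the unlucky calls.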
