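Background
In one of our projects, both bidirectional-streaming and unidirectional-streaming gRPC calls hit the same failure: the server-side stream context reported a context cancel error, while the client did not notice that the connection had dropped and still believed it was connected, so data transfer stalled.
When this happens, it helps to consider the possible causes along these lines:
- Is the network stable, and can the problem be reproduced?
- Does the client or the server cancel the context?
- Does the client or the server handle the context's cancel signal?
Because the problem did not occur in the development environment, the investigation initially had to rely on logs. The logs only showed that the server received the context cancel signal, while the client reported nothing abnormal. We then tried to reproduce the problem in the development environment, and unplugging the client's network cable reproduced it. One fix is straightforward: add a heartbeat to the streaming interface, and disconnect and reconnect if no heartbeat arrives within a given period. Is there a simpler, more elegant way? Yes: gRPC's keepalive parameters.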
KeepAlive
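A keepalive ping checks whether a channel is still working by sending an HTTP/2 PING frame over the transport. Pings are sent periodically, and if the peer does not acknowledge a ping within a given timeout, the transport is closed.
Client-side gRPC keepalive parameters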
    // ClientParameters is used to set keepalive parameters on the client-side.
    // These configure how the client will actively probe to notice when a
    // connection is broken and send pings so intermediaries will be aware of the
    // liveness of the connection. Make sure these parameters are set in
    // coordination with the keepalive policy on the server, as incompatible
    // settings can result in closing of connection.
    type ClientParameters struct {
        // After a duration of this time if the client doesn't see any activity it
        // pings the server to see if the transport is still alive.
        // If set below 10s, a minimum value of 10s will be used instead.
        Time time.Duration // The current default value is infinity.
        // After having pinged for keepalive check, the client waits for a duration
        // of Timeout and if no activity is seen even after that the connection is
        // closed.
        Timeout time.Duration // The current default value is 20 seconds.
        // If true, client sends keepalive pings even with no active RPCs. If false,
        // when there are no active RPCs, Time and Timeout will be ignored and no
        // keepalive pings will be sent.
        PermitWithoutStream bool // false by default.
    }
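The fields of keepalive.ClientParameters mean the following:
- Time: if the client sees no activity on the connection for this long, it sends a ping to the server
- Timeout: if no activity (such as the ping acknowledgement) is seen within this duration after the ping, the connection is considered broken and is closed
- PermitWithoutStream: whether pings may be sent when there are no active streams (RPCs)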
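As a rough illustration of how these fields are passed to a client, the sketch below uses `grpc.WithKeepaliveParams` from grpc-go. The target address `localhost:50051`, the insecure credentials, and the concrete durations are assumptions chosen for the example, not recommendations. `grpc.NewClient` is assumed to be available (it exists in recent grpc-go releases; older releases use `grpc.Dial` with the same options).

```go
package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// Illustrative values only; they must be coordinated with the server's
	// keepalive.EnforcementPolicy, or the server may close the connection.
	kacp := keepalive.ClientParameters{
		Time:                10 * time.Second, // ping after 10s without activity
		Timeout:             3 * time.Second,  // wait 3s for the ping to be acknowledged
		PermitWithoutStream: true,             // ping even when no RPC is active
	}

	// "localhost:50051" is a placeholder target for this sketch.
	conn, err := grpc.NewClient(
		"localhost:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithKeepaliveParams(kacp),
	)
	if err != nil {
		log.Fatalf("failed to create client: %v", err)
	}
	defer conn.Close()

	// Create service stubs on conn and open streams here.
}
```

With settings like these, a client whose network cable is unplugged should see its streams fail shortly after Time plus Timeout elapse instead of waiting indefinitely.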
Server-side gRPC keepalive parameters
    // ServerParameters is used to set keepalive and max-age parameters on the
    // server-side.
    type ServerParameters struct {
        // MaxConnectionIdle is a duration for the amount of time after which an
        // idle connection would be closed by sending a GoAway. Idleness duration is
        // defined since the most recent time the number of outstanding RPCs became
        // zero or the connection establishment.
        MaxConnectionIdle time.Duration // The current default value is infinity.
        // MaxConnectionAge is a duration for the maximum amount of time a
        // connection may exist before it will be closed by sending a GoAway. A
        // random jitter of +/-10% will be added to MaxConnectionAge to spread out
        // connection storms.
        MaxConnectionAge time.Duration // The current default value is infinity.
        // MaxConnectionAgeGrace is an additive period after MaxConnectionAge after
        // which the connection will be forcibly closed.
        MaxConnectionAgeGrace time.Duration // The current default value is infinity.
        // After a duration of this time if the server doesn't see any activity it
        // pings the client to see if the transport is still alive.
        // If set below 1s, a minimum value of 1s will be used instead.
        Time time.Duration // The current default value is 2 hours.
        // After having pinged for keepalive check, the server waits for a duration
        // of Timeout and if no activity is seen even after that the connection is
        // closed.
        Timeout time.Duration // The current default value is 20 seconds.
    }

    // EnforcementPolicy is used to set keepalive enforcement policy on the
    // server-side. Server will close connection with a client that violates this
    // policy.
    type EnforcementPolicy struct {
        // MinTime is the minimum amount of time a client should wait before sending
        // a keepalive ping.
        MinTime time.Duration // The current default value is 5 minutes.
        // If true, server allows keepalive pings even when there are no active
        // streams(RPCs). If false, and client sends ping when there are no active
        // streams, server will send GOAWAY and close the connection.
        PermitWithoutStream bool // false by default.
    }
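keepalive.EnforcementPolicy:
- MinTime: the minimum interval a client should wait between keepalive pings; the server closes the connection of a client that pings more frequently than this
- PermitWithoutStream: if true, pings are allowed even when there are no active streams; if false, a client that pings without active streams is sent a GOAWAY and its connection is closed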
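keepalive.ServerParameters:
- MaxConnectionIdle: if a connection stays idle (no outstanding RPCs) for longer than this, the server sends a GOAWAY
- MaxConnectionAge: if any connection has existed for longer than this, the server sends a GOAWAY. To avoid sending a large number of GOAWAYs at the same moment, a random jitter of +/-10% is applied to this value; for example, with a setting of 15s the effective value falls between 15-1.5 and 15+1.5 seconds
- MaxConnectionAgeGrace: before the connection is forcibly closed, pending RPCs are given this much time to complete
- Time: if the server sees no activity on a connection for this long, it sends a ping to the client
- Timeout: if no activity (such as the ping acknowledgement) is seen within this duration after the ping, the connection is considered broken and is closed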
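On the server, the two structs above are passed as server options. The following is a minimal sketch using `grpc.KeepaliveEnforcementPolicy` and `grpc.KeepaliveParams` from grpc-go. The listen address `:50051` and all durations are assumptions picked for illustration, and service registration is left out.

```go
package main

import (
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// Illustrative values only; tune them for the actual deployment.
	kaep := keepalive.EnforcementPolicy{
		MinTime:             5 * time.Second, // clients may ping at most every 5s
		PermitWithoutStream: true,            // allow pings with no active stream
	}
	kasp := keepalive.ServerParameters{
		MaxConnectionIdle:     15 * time.Second, // GOAWAY after 15s with no RPCs
		MaxConnectionAge:      30 * time.Minute, // GOAWAY after about 30m (+/-10% jitter)
		MaxConnectionAgeGrace: 5 * time.Second,  // time for pending RPCs to finish
		Time:                  5 * time.Second,  // ping the client after 5s without activity
		Timeout:               1 * time.Second,  // wait 1s for the ping to be acknowledged
	}

	// ":50051" is a placeholder address for this sketch.
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}

	s := grpc.NewServer(
		grpc.KeepaliveEnforcementPolicy(kaep),
		grpc.KeepaliveParams(kasp),
	)

	// Register services on s here before serving.

	if err := s.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}
}
```

With server-side Time and Timeout set, the server can also detect a dead client on its own, in addition to the client-side detection.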