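A while ago someone asked the following question online:
On the server side I set things up like this: gen_tcp:listen(8000, [{active, false}, {recbuf,1}, {buffer,1}]).
On the client side like this: gen_tcp:connect("localhost", 8000, [{active, false}, {high_watermark,2}, {low_watermark,1}, {sndbuf,1}, {buffer,1}]).
My client sends one byte per gen_tcp:send() call. The first 6 bytes return ok, and the 7th send blocks.
The server receives one byte per gen_tcp:recv(_,0) call. After it has received three bytes, the client's 7th send returns.
As I understand it, the server side can hold 2 bytes, plus one byte in sndbuf, so the client should already block on the 4th byte. That is not what actually happens. Can someone explain?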
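To make the scenario easier to try out, here is a minimal sketch that reproduces the setup from the question. The module name wm_repro, the ephemeral port (0 instead of 8000), the separate sender process, the 100000-send cap and the 1000 ms "no progress" timeout are all my own assumptions for illustration. The number it returns depends on the operating system, because kernels usually round tiny sndbuf/recbuf values up to their own minimum.

```erlang
-module(wm_repro).
-export([run/0]).

%% Returns {ok_sends, N}: how many 1-byte sends returned ok before the
%% sender stalled while the server was not reading at all.
run() ->
    {ok, L} = gen_tcp:listen(0, [{active, false}, {recbuf, 1}, {buffer, 1}]),
    {ok, Port} = inet:port(L),
    {ok, C} = gen_tcp:connect("localhost", Port,
                              [{active, false}, {high_watermark, 2},
                               {low_watermark, 1}, {sndbuf, 1}, {buffer, 1}]),
    {ok, S} = gen_tcp:accept(L),
    Parent = self(),
    %% Send from a helper process so that the caller of run/0 is never
    %% the one that gets suspended.
    spawn(fun() -> send_loop(C, Parent, 1, 100000) end),
    Count = count_sent(0),
    gen_tcp:close(C),
    gen_tcp:close(S),
    gen_tcp:close(L),
    {ok_sends, Count}.

%% Send one byte at a time and report every send that returned ok.
send_loop(_Sock, _Parent, N, Max) when N > Max ->
    ok;
send_loop(Sock, Parent, N, Max) ->
    case gen_tcp:send(Sock, <<"x">>) of
        ok ->
            Parent ! {sent, N},
            send_loop(Sock, Parent, N + 1, Max);
        {error, _Reason} ->
            ok
    end.

%% Keep the highest reported count; give up after 1 s without progress.
count_sent(Last) ->
    receive
        {sent, N} -> count_sent(N)
    after 1000 ->
        Last
    end.
```

Calling wm_repro:run() in a shell gives a rough feel for how many bytes the kernel buffers and the driver queue absorb together before the sender is suspended.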
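This question is indeed fairly involved. It touches on gen_tcp's send buffer and receive buffer, the watermarks, and more. The receive buffer side was covered fairly clearly in this post and this one, so today we will focus on the send buffer and the watermarks.
Before we start, we need to be familiar with a few gen_tcp options (the rest are described in the inet:setopts/2 documentation):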
{delay_send, Boolean}
Normally, when an Erlang process sends to a socket, the driver will try to immediately send the data. If that fails, the driver will use any means available to queue up the message to be sent whenever the operating system says it can handle it. Setting {delay_send, true} will make all messages queue up. This makes the messages actually sent onto the network be larger but fewer. The option actually affects the scheduling of send requests versus Erlang processes instead of changing any real property of the socket. Needless to say it is an implementation specific option. Default is false.
{high_msgq_watermark, Size} (TCP/IP sockets)
The socket message queue will be set into a busy state when the amount of data queued on the message queue reaches this limit. Note that this limit only concerns data that have not yet reached the ERTS internal socket implementation. Default value used is 8 kB.
Senders of data to the socket will be suspended if either the socket message queue is busy, or the socket itself is busy.
For more information see the low_msgq_watermark, high_watermark, and low_watermark options.
Note that distribution sockets will disable the use of high_msgq_watermark and low_msgq_watermark, and will instead use the distribution buffer busy limit which is a similar feature.
{high_watermark, Size} (TCP/IP sockets)
The socket will be set into a busy state when the amount of data queued internally by the ERTS socket implementation reaches this limit. Default value used is 8 kB.
Senders of data to the socket will be suspended if either the socket message queue is busy, or the socket itself is busy.
For more information see the low_watermark, high_msgq_watermark, and low_msgq_watermark options.
{low_msgq_watermark, Size} (TCP/IP sockets)
If the socket message queue is in a busy state, the socket message queue will be set in a not busy state when the amount of data queued in the message queue falls below this limit. Note that this limit only concerns data that have not yet reached the ERTS internal socket implementation. Default value used is 4 kB.
Senders that have been suspended due to either a busy message queue or a busy socket, will be resumed when neither the socket message queue, nor the socket are busy.
For more information see the high_msgq_watermark, high_watermark, and low_watermark options.
Note that distribution sockets will disable the use of high_msgq_watermark and low_msgq_watermark, and will instead use the distribution buffer busy limit which is a similar feature.
{low_watermark, Size} (TCP/IP sockets)
If the socket is in a busy state, the socket will be set in a not busy state when the amount of data queued internally by the ERTS socket implementation falls below this limit. Default value used is 4 kB.
Senders that have been suspended due to either a busy message queue or a busy socket, will be resumed when neither the socket message queue, nor the socket are busy.
For more information see the high_watermark, high_msgq_watermark, and low_msgq_watermark options.
Among these options, the two pairs of high/low watermarks and the delay_send option have a large effect on the send buffer.
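As a quick way to look at these options on a live socket, the following sketch reads the current values with inet:getopts/2 and then changes the driver-level watermarks with inet:setopts/2. The module name wm_opts and the 64 kB / 32 kB values are arbitrary choices of mine, not recommendations.

```erlang
-module(wm_opts).
-export([show/0]).

%% Returns {Defaults, Tuned}: the option values before and after tuning.
show() ->
    {ok, L} = gen_tcp:listen(0, [{active, false}]),
    {ok, Port} = inet:port(L),
    {ok, C} = gen_tcp:connect("localhost", Port, [{active, false}]),
    Names = [high_watermark, low_watermark,
             high_msgq_watermark, low_msgq_watermark, delay_send],
    {ok, Defaults} = inet:getopts(C, Names),
    %% Raise the driver-queue watermarks (illustrative values only).
    ok = inet:setopts(C, [{high_watermark, 64 * 1024},
                          {low_watermark, 32 * 1024}]),
    {ok, Tuned} = inet:getopts(C, [high_watermark, low_watermark]),
    gen_tcp:close(C),
    gen_tcp:close(L),
    {Defaults, Tuned}.
```

The same option tuples can also be passed directly to gen_tcp:connect/3 or gen_tcp:listen/2, as the question does.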
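The behavior of gen_tcp:send was analyzed in some depth in an earlier post. I suggest reading that post first as background.
We know that every Erlang process has a message queue. Any other process that wants to communicate with it has to send it a message that spells out what the communication is about. As soon as there is a message in a process's queue, the Erlang VM gets ready to schedule that process so it can run and handle the message. Every introductory Erlang book explains this mechanism clearly. What about ports? In the early days of Erlang, a port had the same standing, the same interface, and the same usage as a process. As the unit that carries out Erlang's external IO, a port also has its own message queue. When a process sends a message to a port, the port usually stores the message in that queue as well, and the VM then schedules the port. When the port gets to run, it consumes the messages in its queue, sending them to the network or performing the corresponding IO operation. Ports are scheduled the same way as Erlang processes, with the same emphasis on fairness.
Let's check the interface used to send messages to ports and to processes. We know that the ! operator is syntactic sugar for erlang:send. Whether we write Port ! Msg or Pid ! Msg, the call ends up in erlang:send. Later, for reasons I don't know, the Erlang designers added the port_command family of functions specifically for sending messages to ports.
Let's verify this in the source.
erlang:send -> BIF_RETTYPE send_3(BIF_ALIST_3) -> do_send. The source is in bif.c (abridged here); let's take a look: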
do_send(Process *p, Eterm to, Eterm msg, int suspend, Eterm *refp) {
if (is_internal_pid(to)) {
} else if (is_external_pid(to)) {
return remote_send(p, dep, to, to, msg, suspend);
} else if (is_external_port(to)
&& (external_port_dist_entry(to)
== erts_this_dist_entry)) {
erts_dsprintf_buf_t *dsbufp = erts_create_logger_dsbuf();
"Discarding message %T from %T to %T in an old "
"incarnation (%d) of this node (%d)\n" ,
external_port_creation(to),
erts_this_node->creation);
erts_send_error_to_logger(p->group_leader, dsbufp);
} else if (is_internal_port(to)) {
pt = erts_port_lookup(portid, ERTS_PORT_SFLGS_INVALID_LOOKUP);
switch (erts_port_command(p, ps_flags, pt, msg, refp)) {
case ERTS_PORT_OP_CALLER_EXIT:
There you have it:
1. erlang:send accepts two kinds of targets: ports and processes.
2. A message sent to a port takes the same path as erts_port_command.
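The two entry points can also be compared from the Erlang side. The sketch below sends one byte with ! and one byte with port_command/2 to the same port and collects what comes back. It assumes a Unix-like system where the external program cat is on the PATH; the module name port_bang and the 2000 ms timeout are made up for this example.

```erlang
-module(port_bang).
-export([demo/0]).

%% Sends <<"a">> via ! and <<"b">> via port_command/2, then returns
%% whatever cat echoed back.
demo() ->
    Port = open_port({spawn, "cat"}, [binary]),
    %% Message form: {Owner, {command, Data}} sent with !
    Port ! {self(), {command, <<"a">>}},
    %% BIF form: same destination, different entry point
    true = port_command(Port, <<"b">>),
    Echo = collect(Port, 2, <<>>),
    port_close(Port),
    Echo.

%% Gather port output until Want bytes have arrived or 2 s pass.
collect(_Port, Want, Acc) when byte_size(Acc) >= Want ->
    Acc;
collect(Port, Want, Acc) ->
    receive
        {Port, {data, Bin}} ->
            collect(Port, Want, <<Acc/binary, Bin/binary>>)
    after 2000 ->
        Acc
    end.
```

Both bytes should come back from the port, which fits the observation that the two calls end up on the same output path.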
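Take a sip of water, save your strength, and recap the two points: 1. A port has a message queue. 2. Ports are also scheduled fairly.
With this background, it is fairly easy to see what the watermarks in the options above are for. As with any message queue, high and low watermarks are usually set to guard against an imbalance between the sender's and the receiver's capacity, so that the queue cannot grow until it blows up the system. The {high_watermark, Size} and {low_watermark, Size} options above do exactly that.
So how does a port protect itself? The answer is:
When the amount of queued data reaches the high watermark, the port enters the busy state and the sending process is suspended. When the amount of queued data drops to the low watermark, the busy state is cleared and the sending process is allowed to continue.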
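The busy/not-busy rule is a small piece of hysteresis, and it can be modelled in a few lines. The sketch below is a toy model of my own, not the driver's code: it follows the wording of the documentation quoted earlier (busy when the queue "reaches" the high watermark, not busy when it "falls below" the low watermark), and the exact comparisons inside the real driver are not verified here. The module name wm_model and its functions are hypothetical.

```erlang
-module(wm_model).
-export([new/2, enqueue/2, drain/2, busy/1]).

%% A queue is modelled as a byte counter plus a busy flag.
new(High, Low) when Low =< High ->
    #{high => High, low => Low, queued => 0, busy => false}.

%% Adding data: the queue turns busy once it reaches the high watermark.
enqueue(Bytes, Q = #{queued := N, high := High, busy := Busy}) ->
    N1 = N + Bytes,
    Q#{queued := N1, busy := Busy orelse N1 >= High}.

%% Removing data: a busy queue stays busy until it falls below the
%% low watermark.
drain(Bytes, Q = #{queued := N, low := Low, busy := Busy}) ->
    N1 = max(0, N - Bytes),
    Q#{queued := N1, busy := Busy andalso N1 >= Low}.

busy(#{busy := B}) -> B.
```

With the values from the question, wm_model:new(2, 1), the model turns busy after two queued bytes and, under this reading of the documentation, stays busy until the queue is empty again.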
To back up the statement above, see the port_command documentation:
port_command(Port, Data, OptionList) -> boolean()
Types:
Port = port() | atom()
Data = iodata()
Option = force | nosuspend
OptionList = [Option]
Sends data to a port. port_command(Port, Data, []) equals port_command(Port, Data).
If the port command is aborted false is returned; otherwise, true is returned.
If the port is busy, the calling process will be suspended until the port is not busy anymore.
Currently the following Options are valid:
force
The calling process will not be suspended if the port is busy; instead, the port command is forced through. The call will fail with a notsup exception if the driver of the port does not support this. For more information see the ERL_DRV_FLAG_SOFT_BUSY driver flag.
nosuspend
The calling process will not be suspended if the port is busy; instead, the port command is aborted and false is returned.
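The nosuspend option can be wrapped in a small helper so that a caller can choose to drop or retry instead of being suspended. This is only a sketch: the module name port_try and the return values ok, {error, busy} and {error, badarg} are my own convention, not part of any OTP API.

```erlang
-module(port_try).
-export([try_send/2]).

%% Try to send Data without ever suspending the caller.
try_send(Port, Data) ->
    try erlang:port_command(Port, Data, [nosuspend]) of
        true  -> ok;              % command accepted
        false -> {error, busy}    % port was busy, command aborted
    catch
        %% e.g. the port is already closed or Data is not iodata
        error:badarg -> {error, badarg}
    end.
```

A caller that gets {error, busy} still has to decide what to do with the data; the helper only turns the suspension into an explicit result.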
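So how do we find out that a port has entered the busy state? This state is usually serious: the sending process is suspended, which can cause a lot of latency.
Fortunately Erlang has thought of this. See here: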
erlang:system_monitor(MonitorPid, Options) -> MonSettings
busy_port
If a process in the system gets suspended because it sends to a busy port, a message {monitor, SusPid, busy_port, Port} is sent to MonitorPid. SusPid is the pid that got suspended when sending to Port.
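The system helpfully reports the process that ran into busy_port, so we know which process hit the high watermark and was suspended. That makes it easier to adjust the watermarks afterwards and avoid the situation.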
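A minimal monitor process for these notifications could look like the sketch below. The module name busy_watch and the log format are hypothetical. Keep in mind that erlang:system_monitor/2 replaces whatever system monitor settings were in place before, so this is meant for experiments rather than as a drop-in for a production node.

```erlang
-module(busy_watch).
-export([start/0]).

%% Start a process that logs every busy_port notification.
start() ->
    Pid = spawn(fun loop/0),
    %% The return value is the previous setting; it is ignored here.
    _Old = erlang:system_monitor(Pid, [busy_port]),
    Pid.

loop() ->
    receive
        {monitor, SusPid, busy_port, Port} ->
            io:format("~p was suspended on busy port ~p~n",
                      [SusPid, Port]),
            loop();
        stop ->
            ok;
        _Other ->
            loop()
    end.
```

Sending stop to the returned pid ends the monitor process.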
When a user calls gen_tcp:send to send data, the call ends up in port_command, which does the actual work. Let's see how it operates:
erts_port_command(Process *c_p,
if (is_tuple_arity(command, 2)) {
if (is_internal_pid(cntd)) {
if (!erts_port_synchronous_ops)
flags &= ~ERTS_PORT_SIG_FLG_NOSUSPEND;
return erts_port_exit(c_p, flags, port, cntd, am_normal, refp);
} else if (is_tuple_arity(tp[2], 2)) {
if (tp[1] == am_command) {
if (!(flags & ERTS_PORT_SIG_FLG_NOSUSPEND)
&& !erts_port_synchronous_ops)
return erts_port_output(c_p, flags, port, cntd, tp[2], refp);
else if (tp[1] == am_connect) {
if (!erts_port_synchronous_ops)
flags &= ~ERTS_PORT_SIG_FLG_NOSUSPEND;
return erts_port_connect(c_p, flags, port, cntd, tp[2], refp);

erts_port_output(Process *c_p,
try_call = (force_immediate_call
|| !(sched_flags & (invalid_flags
| ERTS_PTS_FLGS_FORCE_SCHEDULE_OP)));
try_call_state.pre_chk_sched_flags = 0;
try_call_res = force_imm_drv_call(&try_call_state);
try_call_res = try_imm_drv_call(&try_call_state);
case ERTS_TRY_IMM_DRV_CALL_OK:
call_driver_outputv(flags & ERTS_PORT_SIG_FLG_BANG_OP,
c_p ? c_p->common.id : ERTS_INVALID_PID,
finalize_force_imm_drv_call(&try_call_state);
finalize_imm_drv_call(&try_call_state);
case ERTS_TRY_IMM_DRV_CALL_INVALID_SCHED_FLAGS:
sched_flags = try_call_state.sched_flags;
case ERTS_TRY_IMM_DRV_CALL_BUSY_LOCK:

call_driver_outputv( int bang_op,
if (bang_op && from != ERTS_PORT_GET_CONNECTED(prt))
(*drv->outputv)((ErlDrvData) prt->drv_data, evp);
erts_smp_atomic_add_nob(&erts_bytes_out, size);
From the source analysis, we can see that port_comma