From: merch-redmine@... Date: 2020-08-31T21:38:15+00:00 Subject: [ruby-core:99809] [Ruby master Bug#14997] Socket connect timeout exceeds the timeout value for Issue #14997 has been updated by jeremyevans0 (Jeremy Evans). Status changed from Open to Closed I believe this timeout issue is now solved by the Socket.tcp :resolv_timeout option, introduced in commit:6382f5cc91ac9e36776bc854632d9a1237250da7. ---------------------------------------- Bug #14997: Socket connect timeout exceeds the timeout value for https://bugs.ruby-lang.org/issues/14997#change-87330 * Author: maciej.mensfeld (Maciej Mensfeld) * Status: Closed * Priority: Normal * ruby -v: 2.5.1 * Backport: 2.3: UNKNOWN, 2.4: UNKNOWN, 2.5: UNKNOWN ---------------------------------------- Given a case, where a domain is being resolved to multiple IPs (4 in the following example): ``` dig debug-xyz.elb.us-east-1.amazonaws.com a ; <<>> DiG 9.10.3-P4-Ubuntu <<>> debug-xyz.elb.us-east-1.amazonaws.com a ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54375 ;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0 ;; QUESTION SECTION: ;debug-xyz.elb.us-east-1.amazonaws.com. IN A ;; ANSWER SECTION: debug-xyz.elb.us-east-1.amazonaws.com. 60 IN A 172.31.86.79 debug-xyz.elb.us-east-1.amazonaws.com. 60 IN A 172.31.109.24 debug-xyz.elb.us-east-1.amazonaws.com. 60 IN A 172.31.119.55 debug-xyz.elb.us-east-1.amazonaws.com. 60 IN A 172.31.71.167 ;; Query time: 4 msec ;; SERVER: 172.31.0.2#53(172.31.0.2) ;; WHEN: Tue Aug 14 13:46:18 UTC 2018 ;; MSG SIZE rcvd: 132 ``` and when `connect_timeout` is set to a certain value (N), the overall timeout upon non-responsive endpoints that don't immediately throw an exception can reach `N * 4`. This can disrupt some time-sensitive systems. We've experienced it with the following setup: - TCP server (event machine) behind an AWS NLB - TCP server process goes down behind NLB but NLB is still responsive - Socket connect_timeout is set to 100ms - AWS NLB keeps the connection in the waiting state hoping that the service behind it will get back to normal (but it doesn't) - Ruby timeouts after 100ms - Ruby tries to connect to the next IP from the pool (AWS NLB again) - Due to 4 hosts resolving, the overall timeout is 400ms. Not sure whether this should be qualified as a bug or a feature, but I believe it should be definitely documented or there should be an option to "hard" block this limit. Here's the code actually responsible for this behavior: https://github.com/ruby/ruby/blob/trunk/ext/socket/lib/socket.rb#L631-L664 -- https://bugs.ruby-lang.org/ Unsubscribe: