Always skip randomly failing OCI8 extauth tests #9524
ext/oci8/tests/extauth_01.phpt
Outdated
@@ -70,7 +66,7 @@ var_dump($c);

 echo "Test 7\n";

-$c = oci_connect('/', '', 'anything', null, OCI_CRED_EXT);
+$c = oci_connect('/', '', 'localhost', null, OCI_CRED_EXT);
When 'localhost' is used instead of 'anything', the tests fail, so it cannot really be "anything".
@@ -30,4 +30,4 @@ runs:
 --offline \
 --show-diff \
 --show-slow 1000 \
---set-timeout 120
+--set-timeout 1200
https://github.com/php/php-src/runs/8296934151?check_suite_focus=true#step:11:3157 shows that when the issue is hit, either there is no OCI8 timeout or the timeout is higher than 10 minutes.
based on:
https://github.com/mvorisek/php-src/runs/8302900173?check_suite_focus=true#step:15:108
https://github.com/mvorisek/php-src/runs/8302902211?check_suite_focus=true#step:10:117
(with elapsed time dump)
it seems that each of "Test 7" to "Test 10" can take up to ~200 seconds.
Thank you for working on this! FWIW, I just learned that as of Oracle 19c it is possible to specify |
Hi, @cjbj, I am reaching out to you as you are the author of these extauth tests. Can you please explain what resources the test (php-src/ext/oci8/tests/extauth_01.phpt, line 83 in 4b93033) is connecting to and why the elapsed time varies from a second to 10 minutes? CI results:
These tests are supposed to fail to connect, so they were originally written with connection strings like |
What I see is a random ~200 s timeout per connect. I cannot explain these timeouts. I tried real The only solution I have now is to change it to |
@mvorisek, have you tried something like |
I will once the previous tests finish (they take about 3 h per commit; later, I optimized them to test the OCI8 extension only). My question to you is: what should this solve? a) the timeouts are random; shouldn't we first confirm whether such insane behavior is valid, even with stable DNS names like So if we cannot find the real issue, I am more inclined to remove (comment out) these tests for now. |
@cmb69 tried - https://github.com/php/php-src/actions/runs/3039225236/jobs/4893908509#step:11:224 - has no effect |
549248d to a0ab112
I was able to reproduce the problem in CI with strace: https://github.com/mvorisek/php-src/actions/runs/3039764841/jobs/4895065748 relevant logs: @cjbj can you please check the a) it seems the Oracle Instant Client is looping massively when resolving the host The Oracle Instant Client probably has a wrong condition somewhere in its code. Code to reproduce: just |
@cmb69 PR done, the 3 extauth tests cannot be fixed, they need to be skipped until the Oracle Instant Client is fixed. Tested with hundreds of CI runs, this PR fixes the random CI failures. Before merge, please squash the changes into one commit. |
Thank you! |
I'll point our network people at this and let the experts discuss it. Thanks for all your work on this PR. |
@mvorisek The reply is "The looping and timeout behavior would depend on the setting in /etc/resolv.conf. Here it looks like the default timeout is 5 seconds and there might be multiple DNS servers specified so after each timeout the next DNS host would be tried. The content of resolv.conf can be seen to verify the number of dns servers. All the looping would be inside getaddrinfo call." |
A 5-second timeout is expected. But what I have experienced, and the logs show it as well, is massive looping/retries with a total timeout of ~200 seconds. According to https://man7.org/linux/man-pages/man5/resolver.5.html there should be 2 attempts per DNS server by default. With 3 DNS servers, this implies a 5 x 2 x 3 = 30 second timeout. Maybe GH Actions has some special configuration or a resolver patched for fewer failures, but I doubt it. @cjbj Is the Oracle Instant Client really calling the |
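For anyone who wants to check the 5 x 2 x 3 estimate against a concrete resolv.conf, here is a minimal Python sketch. It assumes the simple model used in the comment above (per-query timeout x attempts x nameservers) and the glibc limits documented in resolver(5); it ignores search-domain expansion and parallel A/AAAA queries, so treat the result as a rough bound, not an exact figure. The function name `worst_case_dns_timeout` is made up for this illustration.

```python
def worst_case_dns_timeout(resolv_conf_text):
    """Rough upper bound, in seconds, for one failing lookup.

    Simplified model: timeout x attempts x nameservers.
    """
    timeout, attempts = 5, 2  # defaults documented in resolver(5)
    nameservers = 0
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # blank line or comment
            continue
        fields = line.split()
        if fields[0] == "nameserver":
            nameservers += 1
        elif fields[0] == "options":
            for opt in fields[1:]:
                name, _, value = opt.partition(":")
                if name == "timeout" and value.isdigit():
                    timeout = min(int(value), 30)  # silently capped to 30
                elif name == "attempts" and value.isdigit():
                    attempts = min(int(value), 5)  # silently capped to 5
    # At most 3 nameservers (MAXNS) are used; with none listed,
    # the resolver falls back to a single local nameserver.
    nameservers = max(1, min(nameservers, 3))
    return timeout * attempts * nameservers


sample = "nameserver 10.0.0.1\nnameserver 10.0.0.2\nnameserver 10.0.0.3\n"
print(worst_case_dns_timeout(sample))  # 5 * 2 * 3 = 30
```

Feeding it the contents of /etc/resolv.conf from the CI runner would show whether the configuration alone can account for anything close to ~200 seconds.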
@mvorisek how long does nslookup take in that environment? You could turn on Oracle network tracing to check if snlinGetAddrInfo is being called multiple times. (Ping me if you need steps). Our net team say it's basically a DNS setup issue as DNS lookup shouldn't hang. |
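One way to answer the "how long does the lookup take" question without involving the Oracle client at all is to time the system resolver directly. The Python sketch below is only an illustration: `time_lookup` is a hypothetical helper, port 1521 is just the usual Oracle listener default, and Python's `socket.getaddrinfo` goes through the C library resolver, so it approximates but does not reproduce whatever the Instant Client does internally.

```python
import socket
import time


def time_lookup(host, port=1521):
    """Return (elapsed_seconds, outcome) for a single getaddrinfo() call."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        outcome = "resolved"
    except socket.gaierror as exc:
        # Name resolution failed; keep the resolver's error text.
        outcome = f"failed: {exc}"
    return time.monotonic() - start, outcome
```

Calling `time_lookup('anything')` several times in the same CI job and printing the results would show whether the plain resolver also alternates between instant answers and long stalls, or whether the ~200 second delays only appear when the Instant Client is involved.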
fix #8348 (comment)
The issue is weird, as it happens only sometimes in a quite repeatable environment like GH Actions.
I guess the 'anything' connection string is the problem: sometimes it is resolved immediately, and sometimes resolving (or connecting to) the hostname ends with a timeout. However, I was not able to reproduce it locally.
These tests were marked as slow tests. They are not slow. They are either fast or hit the problem ;-)