-
Notifications
You must be signed in to change notification settings - Fork 18k
cmd/go, testing: go test 4-5x slower on Windows when test caching is used; -count=1 is fast #72992
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
maybe #35801 |
Odd, I can't clone the repo at all on Windows:
|
Your git install may not have long paths enabled; the Windows GUI installer I think preselects the option but some installs might not.
(Apologies for the terribly long test names; they're autogenerated and used to "fit" before we started using the repo as a submodule...) |
Thanks, I'll give that a try. FWIW, I can reproduce this on Linux, but the delta is less extreme at ~2x:
24s vs 12s |
Oh, wow, I guess my internal clock never noticed. $ go clean -testcache
$ time go test ./internal/testrunner
ok github.com/microsoft/typescript-go/internal/testrunner 8.609s
go test ./internal/testrunner 108.22s user 23.12s system 892% cpu 14.709 total
$ time go test -count=1 ./internal/testrunner
ok github.com/microsoft/typescript-go/internal/testrunner 8.919s
go test -count=1 ./internal/testrunner 106.11s user 20.10s system 1292% cpu 9.763 total Not quite as pronounced on my machine but I guess I couldn't tell 9 seconds from 14 seconds. |
I managed to reproduce on a gotip-windows-amd64 gomote by setting There I get 110s vs 16s. |
Back to Linux, looking at system calls via strace. Here are the syscall counts with
And with caching:
The caching version makes 10x as many newfstatat calls. My guess would be that FS calls are more expensive on Windows, making the issue more pronounced. |
I collected strace logs with:
And filtered to newfstatat arguments with:
Just eyeballing the diff, the cache version add fstat calls for lots of stuff (everything?) in Most of the files are stat'd more than once. I didn't do an explicit count to see if that adds up to all of the difference. Presumably the test must be opening these files or they wouldn't be part of the cache key, so perhaps it is the multiple calls that is the problem? I'll leave it here for the cmd/go folks. |
Yeah, the tests definitely open these files; they are opened, read out, and then are used to generate more subtests. Which tests are generated (unfortunately) depend on what's in those files too, though that wouldn't contribute to more stat calls specifically I wouldn't think... |
Right, to be clear my line of thinking is that even if cmd/go stating these files is slow, since the test itself opens and reads them, you’d expect the test to be slower than cmd/go. Do the tests run in parallel? That may be a difference from cmd/go. |
They run in parallel, but I don't think they are initially scanned/read in parallel (i.e. it's |
Some additional details from #72992 (comment):
This is the number of times each path is stat'd. The top four, with 170k+ stats each, are empty directories. I'm not sure what's up with that. Here is the full distribution:
~5k unique paths are stat'd once, ~200k 3 times, ~12k 6 times, and ~151k 7 times. |
Just for reference, this package contains 78291 tests (according to the count of |
Can you do a diff of the counts between cached and uncached? I suspect that those top ones are actually all just from repeated EDIT: Or try https://github.com/microsoft/typescript-go/tree/jabaile/baseline-mkdirall-cache where I stuck a cache in front of it. |
Yeah, I just checked, and my linked branch eliminates all of the stats that come from the tests themselves; with
The stat calls after that branch but with test caching look similar to the after, so I'm not sure that the stats specifically are the smoking gun ☹ |
I hacked the parent Go command to take a CPU profile: https://pprof.host/k84g/ As far as I can tell, there's a 13s-ish |
I was able to shave off some of the time here (maybe 5%?) by caching the result of |
Change https://go.dev/cl/667495 mentions this issue: |
That's any interesting approach. I'm a bit worried that the cache will end up being very big for long running applications which have to deal with many different errors, given that there are more that Windows defines more 15000 different errors. I wonder if we could identify the most common errors and only use the cache for those. I'm sure that we can get a small list of 10-20 errors that would be enough to make a difference. In fact, the go/src/syscall/zsyscall_windows.go Lines 25 to 36 in ecdd429
@jakebailey could you patch the |
Yep, I can try that. Given "file not found" is errno 2, I almost bet that most errors are "small" numbers such that the cache could just be "all errors under 256" or something. |
This netted interesting results. Running
Then running our tests (both before/after being cached):
Caching just errnos 2 and 3 would eliminate the bulk of these calls. Now, there's clearly different groupings of errors to, some under 256, some ~1000, and some ~10000; I don't know if the table of errors are somewhere but it'd be interesting to know if there's some cutoff that would be workable here. (EDIT: I found https://learn.microsoft.com/en-us/windows/win32/debug/system-error-codes, but it's just structured as fixed size ranges for doc pages.) But, I can also just update the PR to cache just 256 (a number pulled out of a hat). Of course, again this isn't the smoking gun of the original problem, just one part I noticed when profiling 😄 |
I don't like magic numbers. The data you provided is more than enough to justify caching |
N=2 also means you can use a couple of https://pkg.go.dev/sync#OnceValue globals, rather than a full |
Unfortunately not; doing so creates an init cycle since the code to do the error message syscall eventually references Errno itself:
So any setup where a |
Good catch. Let's use a |
I didn't mean to spam this thread too hard about this one particular fix since it's not the whole problem; I think maybe it'd be best to comment on the CL now? The current version uses a |
Hi @jakebailey, shortly after this was first open, I took a brief look. Consider these "drive by" comments that might in fact be wrong (including mostly relying on memory at this point, sorry).
go/src/cmd/go/internal/test/test.go Lines 1979 to 1986 in dc1e255
Or maybe that doesn't make sense based on the required semantics, but something perhaps to at least briefly investigate.
|
Good point. Windows just released GetFileInformationByHandle, which should replace the GetFileAttributes + CreateFile fallback. In fact some weeks ago I reimplemented |
The perf of the test code itself is actually fine; I have a change locally which converted the code to a recursive walker which iterates over the paths concurrently with the thing that generates the tests (or at least, generates them for the main goroutine to then Once we get past test caching, the underlying test code itself runs in a similar amount of time to me using WSL on the same machine (where I typically work, and thus was unaware of the misery my colleagues were experiencing: "wait, your tests don't run in 10 seconds?" 😄).
Unfortunately it's not really desirable. The current test format is pretty old and very helpful. I certainly know about txtar, we just have our own format (predating txtar I think) called "twoslash" that works roughly the same way, declaring a FS with some metadata. It's just that we then use that as an input to output 5 or so files related to said test. The other interesting bit is that the test caching code runs more than once; I think three times. Once for the test results, once for the test binary, then a third time when writing out the results. It's documented as being a perf benefit since this means being able to skip the binary writing part, but in our case the testlog code itself is what dominates. |
Change https://go.dev/cl/668636 mentions this issue: |
I was curious whether the approach I outlined in item 3 in #72992 (comment) above was viable, so I took a quick stab this weekend and sent https://go.dev/cl/668636. In short, it tries to do the test log file operations in parallel, but then calculates the final hash in the original order from the test log to keep the end result deterministic. On an old-ish Windows laptop (Windows 10 with 8 logical cores), here are wall-clock times for doing
The meaning of the times in that table:
The CL also de-duplicates some of the work. (For typescript-go testrunner package, about 30% of the file operations in the test log seem to be repeats). It probably could be simplified, but my starting point was not knowing if it would be feasible, so consider it a first cut. It passes all.bash and the longtest trybots on Linux and Windows, but it still could be wrong. 😅 @jakebailey, I'd be curious if you see any improvement:
|
Just measuring how long it takes when the results are cached, before it's:
After:
So, certainly better on my machine, though my measurements are different. |
Go version
go version go1.24.1 windows/amd64
Output of
go env
in your module/workspace:What did you do?
These tests come from the Go port of The TypeScript compiler. We've found that on Windows only, the Go test cache seems to make things a lot slower (microsoft/typescript-go#685 disables it for Windows for now).
What did you see happen?
On my machine, the test run after cleaning the test cache takes 13 to run the tests, but the wall time is 65 seconds:
What did you expect to see?
Tunning with
-count=1
(skipping all test caching), the actual test time is about the same, but the wall time is only 15 seconds, about 4-5x faster:This doesn't reproduce on Linux.
The text was updated successfully, but these errors were encountered: