cmd/go, testing: go test 4-5x slower on Windows when test caching is used; -count=1 is fast #72992

Open
jakebailey opened this issue Mar 21, 2025 · 35 comments
Assignees
Labels
BugReport Issues describing a possible bug in the Go implementation. GoCommand cmd/go NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows

Comments

@jakebailey
Contributor

jakebailey commented Mar 21, 2025

Go version

go version go1.24.1 windows/amd64

Output of go env in your module/workspace:

set AR=ar
set CC=gcc
set CGO_CFLAGS=-O2 -g
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-O2 -g
set CGO_ENABLED=1
set CGO_FFLAGS=-O2 -g
set CGO_LDFLAGS=-O2 -g
set CXX=g++
set GCCGO=gccgo
set GO111MODULE=
set GOAMD64=v1
set GOARCH=amd64
set GOAUTH=netrc
set GOBIN=
set GOCACHE=C:\Users\jabaile\AppData\Local\go-build
set GOCACHEPROG=
set GODEBUG=
set GOENV=C:\Users\jabaile\AppData\Roaming\go\env
set GOEXE=.exe
set GOEXPERIMENT=
set GOFIPS140=off
set GOFLAGS=
set GOGCCFLAGS=-m64 -mthreads -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=C:\Users\jabaile\AppData\Local\Temp\go-build3919903901=/tmp/go-build -gno-record-gcc-switches
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMOD=D:\work\TypeScript-go\go.mod
set GOMODCACHE=C:\Users\jabaile\go\pkg\mod
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=C:\Users\jabaile\go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=C:\Users\jabaile\scoop\apps\go\current
set GOSUMDB=sum.golang.org
set GOTELEMETRY=local
set GOTELEMETRYDIR=C:\Users\jabaile\AppData\Roaming\go\telemetry
set GOTMPDIR=
set GOTOOLCHAIN=auto
set GOTOOLDIR=C:\Users\jabaile\scoop\apps\go\current\pkg\tool\windows_amd64
set GOVCS=
set GOVERSION=go1.24.1
set GOWORK=
set PKG_CONFIG=pkg-config

What did you do?

These tests come from the Go port of the TypeScript compiler. We've found that, on Windows only, the Go test cache seems to make things a lot slower (microsoft/typescript-go#685 disables it for Windows for now).

$ git clone --recurse-submodules https://github.com/microsoft/typescript-go.git
$ cd typescript-go
# warm the caches
$ go test ./internal/testrunner
# clean the test cache only
$ go clean -testcache
$ Measure-Command { go test ./internal/testrunner | Out-Default }
$ Measure-Command { go test -count=1 ./internal/testrunner | Out-Default }

What did you see happen?

On my machine, the test run after cleaning the test cache takes 13 seconds to run the tests, but the wall time is 65 seconds:

$ Measure-Command { go test ./internal/testrunner | Out-Default }
ok      github.com/microsoft/typescript-go/internal/testrunner  13.584s

Days              : 0
Hours             : 0
Minutes           : 1
Seconds           : 5
Milliseconds      : 364
Ticks             : 653649277
TotalDays         : 0.000756538515046296
TotalHours        : 0.0181569243611111
TotalMinutes      : 1.08941546166667
TotalSeconds      : 65.3649277
TotalMilliseconds : 65364.9277

What did you expect to see?

Running with -count=1 (skipping all test caching), the actual test time is about the same, but the wall time is only 15 seconds, about 4-5x faster:

$ Measure-Command { go test -count=1 ./internal/testrunner | Out-Default }
ok      github.com/microsoft/typescript-go/internal/testrunner  11.726s

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 15
Milliseconds      : 296
Ticks             : 152960956
TotalDays         : 0.000177038143518519
TotalHours        : 0.00424891544444444
TotalMinutes      : 0.254934926666667
TotalSeconds      : 15.2960956
TotalMilliseconds : 15296.0956

This doesn't reproduce on Linux.

@seankhliao
Member

maybe #35801

@gabyhelp gabyhelp added the BugReport Issues describing a possible bug in the Go implementation. label Mar 21, 2025
@jakebailey
Contributor Author

jakebailey commented Mar 21, 2025

#26562, #61608 above are related too, but unlike the past closed issues I actually have a repro 😄

@qmuntal qmuntal added OS-Windows NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Mar 21, 2025
@prattmic prattmic added the GoCommand cmd/go label Mar 22, 2025
@prattmic
Member

cc @matloob @samthanawalla

@prattmic
Member

Odd, I can't clone the repo at all on Windows:

$ git clone --recurse-submodules https://github.com/microsoft/typescript-go.git
Cloning into 'typescript-go'...                  
remote: Enumerating objects: 95648, done.        
remote: Counting objects: 100% (11177/11177), done.
remote: Compressing objects: 100% (7548/7548), done.
remote: Total 95648 (delta 4558), reused 3646 (delta 3625), pack-reused 84471 (from 3)
Receiving objects: 100% (95648/95648), 60.42 MiB | 19.73 MiB/s, done.
Resolving deltas: 100% (26407/26407), done.      
Updating files: 100% (65383/65383), done.                               
Submodule '_submodules/TypeScript' (https://github.com/microsoft/TypeScript.git) registered for path '_submodules/TypeScript'
Cloning into 'C:/b/s/w/ir/typescript-go/_submodules/TypeScript'...                               
remote: Enumerating objects: 737700, done.                                                       
remote: Counting objects: 100% (107/107), done.                  
remote: Compressing objects: 100% (65/65), done.         
remote: Total 737700 (delta 68), reused 50 (delta 41), pack-reused 737593 (from 2)        
Receiving objects: 100% (737700/737700), 2.73 GiB | 36.99 MiB/s, done.
Resolving deltas: 100% (513580/513580), done.                                                                                                                                                    
error: unable to create file tests/baselines/reference/tsserver/configuredProjects/Open-ref-of-configured-project-when-open-file-gets-added-to-the-project-as-part-of-configured-file-update-buts
-its-open-file-references-are-all-closed-when-the-update-happens.js: Filename too long           
error: unable to create file tests/baselines/reference/tsserver/projectReferences/solution-with-its-own-files-and-disables-looking-into-the-child-project-if-disableReferencedProjectLoad-is-set-
in-first-indirect-project-but-not-in-another-one.js: Filename too long                           
fatal: Unable to checkout '52c59dbcbee274e523ef39e6c8be1bd5e110c2f1' in submodule path '_submodules/TypeScript'

@jakebailey
Contributor Author

jakebailey commented Mar 22, 2025

Your git install may not have long paths enabled; the Windows GUI installer I think preselects the option but some installs might not.

git config --global core.longpaths true

(Apologies for the terribly long test names; they're autogenerated and used to "fit" before we started using the repo as a submodule...)

@prattmic
Member

Thanks, I'll give that a try.

FWIW, I can reproduce this on Linux, but the delta is less extreme at ~2x:

$ go clean -testcache
$ /usr/bin/time -p go test ./internal/testrunner
ok      github.com/microsoft/typescript-go/internal/testrunner  10.641s
real 24.05
user 58.48
sys 13.87
$ /usr/bin/time -p go test -count=1 ./internal/testrunner
ok      github.com/microsoft/typescript-go/internal/testrunner  10.672s
real 11.93
user 51.42
sys 8.00

24s vs 12s

@jakebailey
Contributor Author

Oh, wow, I guess my internal clock never noticed.

$ go clean -testcache          
$ time go test ./internal/testrunner
ok      github.com/microsoft/typescript-go/internal/testrunner  8.609s
go test ./internal/testrunner  108.22s user 23.12s system 892% cpu 14.709 total
$ time go test -count=1 ./internal/testrunner
ok      github.com/microsoft/typescript-go/internal/testrunner  8.919s
go test -count=1 ./internal/testrunner  106.11s user 20.10s system 1292% cpu 9.763 total

Not quite as pronounced on my machine but I guess I couldn't tell 9 seconds from 14 seconds.

@prattmic
Member

I managed to reproduce on a gotip-windows-amd64 gomote by setting git config --global core.longpaths true first.

There I get 110s vs 16s.

@prattmic
Member

Back to Linux, looking at system calls via strace.

Here are the syscall counts with -count=1:

 492462 fcntl
 309607 newfstatat
 249177 read
 218362 openat
 129989 nanosleep
 123908 close
 120450 fstat
  57962 futex
  21703 getpid
  21681 tgkill
   4089 lseek
   2943 mmap
   1479 getdents64
   1273 write
    335 unlinkat
    252 sigaltstack
    217 gettid
    209 pread64
    161 mkdirat
    115 clone
     87 madvise
     63 dup3
     59 pipe2
     50 mprotect
     28 execve
     24 prlimit64
     24 munmap
     23 waitid
     21 eventfd2
     18 flock
     15 rseq
     12 clone3
     11 brk
      9 readlink
      6 access
      4 ftruncate
      4 faccessat2
      3 readlinkat
      2 utimensat
      2 ioctl
      2 getrandom
      2 fallocate
      2 chdir
      1 wait4
      1 uname
      1 pwrite64

And with caching:

2809756 newfstatat
 507101 nanosleep
 492518 fcntl
 372379 futex
 250883 read
 219527 openat
 125071 close
 120453 fstat
  18013 getpid
  17991 tgkill
  16043 write
   4089 lseek
   3930 getdents64
   2896 mmap
   1340 madvise
    336 unlinkat
    250 sigaltstack
    216 gettid
    209 pread64
    160 mkdirat
    115 clone
     64 dup3
     59 pipe2
     48 mprotect
     28 execve
     24 waitid
     24 prlimit64
     24 munmap
     21 eventfd2
     18 flock
     14 rseq
     11 clone3
     11 brk
      9 readlink
      8 utimensat
      8 ftruncate
      6 access
      4 faccessat2
      3 readlinkat
      2 ioctl
      2 getrandom
      2 fallocate
      2 chdir
      1 wait4
      1 uname
      1 pwrite64

The caching version makes roughly 9x as many newfstatat calls (2,809,756 vs. 309,607).

My guess would be that FS calls are more expensive on Windows, making the issue more pronounced.

@prattmic
Member

I collected strace logs with:

$ strace -f -o /tmp/cache.txt /usr/bin/time -p go test ./internal/testrunner
$ strace -f -o /tmp/nocache.txt /usr/bin/time -p go test -count=1 ./internal/testrunner

And filtered to newfstatat arguments with:

$ sed -n 's/^[0-9]\+ newfstatat(.*, "\(.*\)",.*/\1/p' /tmp/nocache.txt > /tmp/nocache.fstat
$ sed -n 's/^[0-9]\+ newfstatat(.*, "\(.*\)",.*/\1/p' /tmp/cache.txt > /tmp/cache.fstat

Just eyeballing the diff, the cache version adds fstat calls for lots of stuff (everything?) in ./testdata and ./_submodules/TypeScript/tests/, which contain 71k and 65k files, respectively. These need to be checked to make sure the cache is not invalidated.

Most of the files are stat'd more than once. I didn't do an explicit count to see if that adds up to all of the difference.

Presumably the test must be opening these files or they wouldn't be part of the cache key, so perhaps it is the multiple calls that is the problem? I'll leave it here for the cmd/go folks.

@jakebailey
Contributor Author

Presumably the test must be opening these files or they wouldn't be part of the cache key

Yeah, the tests definitely open these files; they are opened, read out, and then used to generate more subtests. Which tests are generated (unfortunately) depends on what's in those files too, though I wouldn't think that would contribute to more stat calls specifically...

@prattmic
Member

Right, to be clear, my line of thinking is that even if cmd/go stat'ing these files is slow, since the test itself opens and reads them, you’d expect the test to be slower than cmd/go.

Do the tests run in parallel? That may be a difference from cmd/go.

@jakebailey
Contributor Author

jakebailey commented Mar 22, 2025

They run in parallel, but I don't think they are initially scanned/read in parallel (i.e. it's t.Run in a loop).

@prattmic
Member

prattmic commented Mar 22, 2025

Some additional details from #72992 (comment):

$ sort /tmp/cache.fstat > /tmp/cache.fstat.sort
$ uniq -c /tmp/cache.fstat.sort| sort -n -r | head -n 50
 360472 /tmp/typescript-go/testdata/baselines/local/submodule/compiler
 346514 /tmp/typescript-go/testdata/baselines/local/submodule/conformance
 180236 /tmp/typescript-go/testdata/baselines/local/submoduleAccepted/compiler
 173257 /tmp/typescript-go/testdata/baselines/local/submoduleAccepted/conformance
    454 /usr/local/google/home/mpratt/.cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d
    224 /tmp/typescript-go/testdata/baselines/local/compiler
    157 /tmp/go-build2317256986
     28 /tmp/typescript-go/testdata/baselines/local/conformance
     21 /usr/local/google/home/mpratt/.config/go/telemetry/local
     15 /tmp/typescript-go
     14 /tmp/typescript-go/testdata/baselines/local
     13 /usr/local/google/home/mpratt/go/pkg/mod/gotest.tools/v3@v3.5.2
     13 /usr/local/google/home/mpratt/go/pkg/mod/github.com/pkg/diff@v0.0.0-20241224192749-4e6772a4315c
     13 /usr/local/google/home/mpratt/go/pkg/mod/github.com/go-json-experiment/json@v0.0.0-20250223041408-d3c622f1b874
     13 /usr/local/google/home/mpratt/go/pkg/mod/cache/download/gotest.tools/v3/@v/v3.5.2.ziphash
     13 /usr/local/google/home/mpratt/go/pkg/mod/cache/download/gotest.tools/v3/@v/v3.5.2.partial
     13 /usr/local/google/home/mpratt/go/pkg/mod/cache/download/github.com/pkg/diff/@v/v0.0.0-20241224192749-4e6772a4315c.ziphash
     13 /usr/local/google/home/mpratt/go/pkg/mod/cache/download/github.com/pkg/diff/@v/v0.0.0-20241224192749-4e6772a4315c.partial
     13 /usr/local/google/home/mpratt/go/pkg/mod/cache/download/github.com/go-json-experiment/json/@v/v0.0.0-20250223041408-d3c622f1b874.ziphash
     13 /usr/local/google/home/mpratt/go/pkg/mod/cache/download/github.com/go-json-experiment/json/@v/v0.0.0-20250223041408-d3c622f1b874.partial
     12 /usr/local/google/home/mpratt/go/pkg/mod/cache/download/gotest.tools/v3/@v/v3.5.2.mod
     12 /usr/local/google/home/mpratt/go/pkg/mod/cache/download/github.com/pkg/diff/@v/v0.0.0-20241224192749-4e6772a4315c.mod
     12 /usr/local/google/home/mpratt/go/pkg/mod/cache/download/github.com/go-json-experiment/json/@v/v0.0.0-20250223041408-d3c622f1b874.mod
     11 /usr/local/google/home/mpratt/go/pkg/mod/github.com/google/go-cmp@v0.7.0
     11 /usr/local/google/home/mpratt/go/pkg/mod/cache/download/github.com/google/go-cmp/@v/v0.7.0.ziphash
     11 /usr/local/google/home/mpratt/go/pkg/mod/cache/download/github.com/google/go-cmp/@v/v0.7.0.partial
     11 /usr/local/google/home/mpratt/.cache/go-build/f0/f09e16d246448f43b25cab7bd5677e2c192d041a33e95444326e571496973665-d
     10 /usr/local/google/home/mpratt/go/pkg/mod/cache/download/github.com/google/go-cmp/@v/v0.7.0.mod
     10 .
      8 /tmp/typescript-go/internal/testrunner
      7 /usr/lib
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression4_es6.types.diff
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression4_es6.types
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression4_es6.symbols.diff
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression4_es6.symbols
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression4_es6.js.diff
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression4_es6.js
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression4_es6.errors.txt.diff
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression4_es6.errors.txt
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression3_es6.types.diff
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression3_es6.types
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression3_es6.symbols.diff
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression3_es6.symbols
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression3_es6.js.diff
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression3_es6.js
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression3_es6.errors.txt.diff
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression3_es6.errors.txt
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression2_es6.types.diff
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression2_es6.types
      7 /tmp/typescript-go/testdata/baselines/local/submodule/conformance/YieldStarExpression2_es6.symbols.diff

This is the number of times each path is stat'd. The top four, with 170k+ stats each, are empty directories. I'm not sure what's up with that.

Here is the full distribution:

$ uniq -c /tmp/cache.fstat.sort| sort -n -r | awk '{print $1}' | uniq -c       
      1 360472
      1 346514
      1 180236
      1 173257
      1 454
      1 224
      1 157
      1 28
      1 21
      1 15
      1 14
      9 13
      3 12
      4 11
      2 10
      1 8
 151535 7
  12598 6
      5 5
      5 4
 202059 3
    333 2
   4918 1

~5k unique paths are stat'd once, ~200k 3 times, ~12k 6 times, and ~151k 7 times.
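For anyone who would rather do this counting in Go than with sort/uniq/awk, here is a minimal sketch of the same two steps. The function names and the sample input are hypothetical; real input would be one path per line from a file such as /tmp/cache.fstat.

```go
package main

import (
	"fmt"
	"sort"
)

// countPaths returns how many times each path occurs,
// mirroring `sort | uniq -c` on the list of stat'd paths.
func countPaths(paths []string) map[string]int {
	counts := make(map[string]int)
	for _, p := range paths {
		counts[p]++
	}
	return counts
}

// distribution maps "times stat'd" to "number of distinct paths stat'd
// that many times", mirroring the second `uniq -c` over the count column.
func distribution(counts map[string]int) map[int]int {
	dist := make(map[int]int)
	for _, n := range counts {
		dist[n]++
	}
	return dist
}

func main() {
	// Hypothetical sample standing in for the strace-derived path list.
	paths := []string{"a", "b", "a", "c", "a", "b"}
	dist := distribution(countPaths(paths))

	// Print the highest stat counts first, like `sort -n -r`.
	keys := make([]int, 0, len(dist))
	for n := range dist {
		keys = append(keys, n)
	}
	sort.Sort(sort.Reverse(sort.IntSlice(keys)))
	for _, n := range keys {
		fmt.Printf("%7d %d\n", dist[n], n)
	}
}
```

Each output row reads as "number of distinct paths, times stat'd", the same layout as the table above.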

@prattmic
Member

prattmic commented Mar 22, 2025

Just for reference, this package contains 78291 tests (according to the count of RUN lines from go test -v).
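A count like this can be reproduced with a few lines of Go. This is only a sketch: countRunLines is a hypothetical helper, and the sample text is made up to resemble go test -v output, where each started test or subtest gets a line beginning with "=== RUN".

```go
package main

import (
	"fmt"
	"strings"
)

// countRunLines counts the lines of `go test -v` output that start
// with "=== RUN", i.e. one per test or subtest that was started.
func countRunLines(output string) int {
	n := 0
	for _, line := range strings.Split(output, "\n") {
		if strings.HasPrefix(line, "=== RUN") {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical sample; real input would be the captured -v output.
	sample := "=== RUN   TestA\n" +
		"=== RUN   TestA/sub\n" +
		"--- PASS: TestA (0.00s)\n" +
		"    --- PASS: TestA/sub (0.00s)\n" +
		"PASS\n"
	fmt.Println("tests started:", countRunLines(sample))
}
```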

@jakebailey
Contributor Author

jakebailey commented Mar 22, 2025

Can you do a diff of the counts between cached and uncached? I suspect that those top ones are actually all just from repeated os.MkdirAll calls due to those dirs being test output dirs for our suite (which we should really stick behind a cache/sync.OnceFunc or something).

EDIT: Or try https://github.com/microsoft/typescript-go/tree/jabaile/baseline-mkdirall-cache where I stuck a cache in front of it.
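To make the idea concrete, here is a rough sketch of what "a cache in front of os.MkdirAll" could look like. It is not the code from the linked branch; dirCache, its mkdir hook, and ensure are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// dirCache remembers which directories were already created so that
// repeated requests for the same directory never touch the filesystem.
type dirCache struct {
	mkdir func(dir string) error // hypothetical hook; wraps os.MkdirAll in main
	done  sync.Map               // set of directories known to exist
}

// ensure creates dir on first use and is a cheap map lookup afterwards.
// Two goroutines racing on a new dir may both call mkdir, which is
// harmless for an idempotent operation such as os.MkdirAll.
func (c *dirCache) ensure(dir string) error {
	if _, ok := c.done.Load(dir); ok {
		return nil
	}
	if err := c.mkdir(dir); err != nil {
		return err
	}
	c.done.Store(dir, struct{}{})
	return nil
}

func main() {
	root, err := os.MkdirTemp("", "baselines")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	calls := 0
	cache := &dirCache{mkdir: func(dir string) error {
		calls++
		return os.MkdirAll(dir, 0o755)
	}}

	// Simulate many subtests asking for the same output directory.
	out := filepath.Join(root, "local", "submodule", "compiler")
	for i := 0; i < 1000; i++ {
		if err := cache.ensure(out); err != nil {
			panic(err)
		}
	}
	fmt.Println("MkdirAll calls:", calls)
}
```

Failed attempts are deliberately not cached, so a later call retries instead of hiding the error.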

@jakebailey
Contributor Author

Yeah, I just checked, and my linked branch eliminates all of the stats that come from the tests themselves; with -count=1 before and after:

  51496 /home/jabaile/work/TypeScript-go/testdata/baselines/local/submodule/compiler
  49502 /home/jabaile/work/TypeScript-go/testdata/baselines/local/submodule/conformance
  25748 /home/jabaile/work/TypeScript-go/testdata/baselines/local/submoduleAccepted/compiler
  24751 /home/jabaile/work/TypeScript-go/testdata/baselines/local/submoduleAccepted/conformance
    449 /home/jabaile/.cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d
    156 /tmp/go-build4240875372
     40 /home/jabaile/.config/go/telemetry/local
     36 /home/jabaile/work/TypeScript-go/testdata/baselines/local/compiler
     29 /home/jabaile/work/TypeScript-go
     22 .

    449 /home/jabaile/.cache/go-build/e3/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855-d
    156 /tmp/go-build1327348528
     39 /home/jabaile/.config/go/telemetry/local
     29 /home/jabaile/work/TypeScript-go
     22 .
     13 /home/jabaile/go/pkg/mod/gotest.tools/v3@v3.5.2
     13 /home/jabaile/go/pkg/mod/github.com/pkg/diff@v0.0.0-20241224192749-4e6772a4315c
     13 /home/jabaile/go/pkg/mod/github.com/go-json-experiment/json@v0.0.0-20250223041408-d3c622f1b874
     13 /home/jabaile/go/pkg/mod/cache/download/gotest.tools/v3/@v/v3.5.2.ziphash
     13 /home/jabaile/go/pkg/mod/cache/download/gotest.tools/v3/@v/v3.5.2.partial

The stat calls after that branch but with test caching look similar to the after, so I'm not sure that the stats specifically are the smoking gun ☹

@jakebailey
Contributor Author

I hacked the parent Go command to take a CPU profile: https://pprof.host/k84g/

As far as I can tell, there's a 13s-ish exec.Run call which is the actual test run, then effectively everything else is caching.

@jakebailey
Contributor Author

jakebailey commented Mar 31, 2025

I was able to shave off some of the time here (maybe 5%?) by caching the result of syscall.Errno.Error() on Windows (which is actually a syscall, compared to a simple lookup elsewhere). I can send that CL as it's probably generally good, but that's of course not a major change to the performance characteristics here.

@gopherbot
Contributor

Change https://go.dev/cl/667495 mentions this issue: syscall: cache Errno.Error() on Windows

@qmuntal
Member

qmuntal commented Apr 23, 2025

I was able to shave off some of the time here (maybe 5%?) by caching the result of syscall.Errno.Error() on Windows (which is actually a syscall, compared to a simple lookup elsewhere). I can send that CL as it's probably generally good, but that's of course not a major change to the performance characteristics here.

That's an interesting approach. I'm a bit worried that the cache will end up being very big for long-running applications which have to deal with many different errors, given that Windows defines more than 15000 different errors.

I wonder if we could identify the most common errors and only use the cache for those. I'm sure that we can get a small list of 10-20 errors that would be enough to make a difference. In fact, the syscall package has an outstanding TODO that goes in this direction:

func errnoErr(e Errno) error {
	switch e {
	case 0:
		return errERROR_EINVAL
	case errnoERROR_IO_PENDING:
		return errERROR_IO_PENDING
	}
	// TODO: add more here, after collecting data on the common
	// error values see on Windows. (perhaps when running
	// all.bat?)
	return e
}

@jakebailey could you patch the syscall.errnoErr to trace every error found by your normal workflow and report back the distribution? Thanks!

@jakebailey
Contributor Author

Yep, I can try that. Given "file not found" is errno 2, I almost bet that most errors are "small" numbers such that the cache could just be "all errors under 256" or something.

@jakebailey
Contributor Author

jakebailey commented Apr 23, 2025

This netted interesting results. Running all.bat gives:

    450 3 The system cannot find the path specified.
     70 2 The system cannot find the file specified.
     38 10054 An existing connection was forcibly closed by the remote host.
     17 11004 The requested name is valid, but no data of the requested type was found.
     14 10053 An established connection was aborted by the software in your host machine.
      4 126 The specified module could not be found.
      2 232 The pipe is being closed.
      1 5 Access is denied.
      1 1300 Not all privileges or groups referenced are assigned to the caller.
      1 123 The filename, directory name, or volume label syntax is incorrect.
      1 10061 No connection could be made because the target machine actively refused it.
      1 10047 An address incompatible with the requested protocol was used.
      1 10044 The support for the specified socket type does not exist in this address family.
      1 10022 An invalid argument was supplied.

Then running our tests (both before/after being cached):

 533403 2 The system cannot find the file specified.
      5 3 The system cannot find the path specified.

Caching just errnos 2 and 3 would eliminate the bulk of these calls.

Now, there are clearly different groupings of errors too, some under 256, some ~1000, and some ~10000; I don't know if the table of errors is somewhere, but it'd be interesting to know if there's some cutoff that would be workable here. (EDIT: I found https://learn.microsoft.com/en-us/windows/win32/debug/system-error-codes, but it's just structured as fixed size ranges for doc pages.)

But, I can also just update the PR to cache just 256 (a number pulled out of a hat).


Of course, again this isn't the smoking gun of the original problem, just one part I noticed when profiling 😄

@qmuntal
Member

qmuntal commented Apr 24, 2025

But, I can also just update the PR to cache just 256 (a number pulled out of a hat).

I don't like magic numbers. The data you provided is more than enough to justify caching ERROR_FILE_NOT_FOUND and ERROR_PATH_NOT_FOUND, so let's do this for now. We can always be more aggressive later when we have more data.

@mvdan
Member

mvdan commented Apr 24, 2025

N=2 also means you can use a couple of https://pkg.go.dev/sync#OnceValue globals, rather than a full sync.Map.

@jakebailey
Contributor Author

Unfortunately not; doing so creates an init cycle since the code to do the error message syscall eventually references Errno itself:

D:\work\go\src\syscall\syscall_windows.go:142:2: initialization cycle for errnoString2
        D:\work\go\src\syscall\syscall_windows.go:142:2: errnoString2 refers to error
        D:\work\go\src\syscall\syscall_windows.go:163:16: error refers to formatMessage
        D:\work\go\src\syscall\zsyscall_windows.go:648:6: formatMessage refers to Addr
        D:\work\go\src\syscall\dll_windows.go:276:20: Addr refers to mustFind
        D:\work\go\src\syscall\dll_windows.go:267:20: mustFind refers to Find
        D:\work\go\src\syscall\dll_windows.go:243:20: Find refers to FindProc
        D:\work\go\src\syscall\dll_windows.go:102:15: FindProc refers to Error
        D:\work\go\src\syscall\syscall_windows.go:146:16: Error refers to errnoString2

So any setup where an Errno.Error references a global var which then references Errno doesn't work ☹.

@qmuntal
Member

qmuntal commented Apr 25, 2025

So any setup where an Errno.Error references a global var which then references Errno doesn't work ☹.

Good catch. Let's use a sync.Map for now; it will be way faster than calling syscall.FormatMessage. We can always switch to sync.OnceValue if we need to, by dynamically loading it from the runtime and accessing it via linkname from syscall, but that might be overkill.
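As a portable illustration of that direction (not the code in the CL), here is a sketch of a sync.Map cache limited to the two hot error codes. The errno type, the constants, messageCache, and the format hook are all hypothetical stand-ins; format plays the role of the expensive FormatMessage-based lookup.

```go
package main

import (
	"fmt"
	"sync"
)

// errno is a stand-in for syscall.Errno so the sketch runs on any OS.
type errno uintptr

const (
	errFileNotFound errno = 2 // "The system cannot find the file specified."
	errPathNotFound errno = 3 // "The system cannot find the path specified."
)

// messageCache caches messages only for the two codes that dominated
// the measurements, so the cache cannot grow without bound.
type messageCache struct {
	format func(errno) string // hypothetical stand-in for the slow lookup
	cached sync.Map           // errno -> string
}

// message returns the text for e, calling format at most a handful of
// times for the cached codes and every time for all other codes.
func (c *messageCache) message(e errno) string {
	switch e {
	case errFileNotFound, errPathNotFound:
		// Hot codes: fall through to the cache below.
	default:
		return c.format(e)
	}
	if s, ok := c.cached.Load(e); ok {
		return s.(string)
	}
	s := c.format(e)
	c.cached.Store(e, s)
	return s
}

func main() {
	calls := 0
	c := &messageCache{format: func(e errno) string {
		calls++
		return fmt.Sprintf("errno %d", e)
	}}
	for i := 0; i < 1000; i++ {
		_ = c.message(errFileNotFound)
	}
	_ = c.message(errPathNotFound)
	_ = c.message(10054)
	_ = c.message(10054)
	fmt.Println("slow lookups:", calls)
}
```

Keeping the cached set explicit sidesteps both the unbounded-growth worry and the need for a magic cutoff such as 256.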

@jakebailey
Contributor Author

I didn't mean to spam this thread too hard about this one particular fix since it's not the whole problem; I think maybe it'd be best to comment on the CL now? The current version uses a sync.Map and I've updated it with the "less than 256" thing, but if you all want something else I'll change it to whatever works (can just switch/case on those two errnos instead).

@thepudds
Contributor

Hi @jakebailey, shortly after this was first opened, I took a brief look.

Consider these "drive by" comments that might in fact be wrong (including mostly relying on memory at this point, sorry).

  1. It might be more convenient to see what's going on if you look directly at the test log. For example:
$ go clean -testcache && go clean -cache   # clean first to make test log easier to find
$ go test encoding/json                    # sample test run
$ cd $(go env GOCACHE)                     
$ rg '^open ' | head                       # find your test log (a very large test log in your case)
$ cat 08/08ebf86b46ea93757a5300842885bc9a540b2299f519583f2dcebfa7a2ae23e5-d   # a small test log for encoding/json

# test log
getenv HTTP_PROXY
[...]
open testdata\fuzz\FuzzEqualFold
open testdata\fuzz\FuzzUnmarshalJSON
open testdata\fuzz\FuzzDecoderToken
  2. I wonder if the Windows stat code could be sped up, at least for modern Windows. From a quick look, the main approach in stat_windows.go might be at least ~6 years old. It looks like in your profiles, it calls both syscall.GetFileAttributesEx and syscall.CreateFile, with both taking non-negligible time. The code looks like it might be hoping only GetFileAttributesEx is needed in the common case; it could be worth poking around that code a bit.

  3. I wonder if it might be possible to at least partially process the test log concurrently. For example, perhaps the test log could be broken up into chunks, with individual hashOpen or hashStat calls happening in parallel, but then combining their results sequentially in the right order.

fh, err := hashOpen(name)
if err != nil {
	if cache.DebugTest {
		fmt.Fprintf(os.Stderr, "testcache: %s: input file %s: %s\n", a.Package.ImportPath, name, err)
	}
	return cache.ActionID{}, err
}
fmt.Fprintf(h, "open %s %x\n", name, fh)

Or maybe that doesn't make sense based on the required semantics, but something perhaps to at least briefly investigate.

  4. A more invasive change might be to try to change the test log validating code to try to gather stat information directory-by-directory rather than file-by-file, though again maybe doesn't make sense based on the required semantics.

  5. If the typescript-go Go test runner is building up the full list of files to visit as strings, and then processing them in a later phase, maybe that could be changed so that the directory walk passes open files rather than strings (e.g., maybe via an iterator or channel or callback or whatever). I didn't look carefully, so that might not make any sense. Separately, might be worth double-checking if the typescript-go test runner is using the most ~modern Go stdlib approach for walking directories. (There can be material performance differences, but that might not matter here -- I'm mostly mentioning because there's some remote-ish chance that could cut down on the number of file operations that the test log later needs to validate, but that's mostly just a hopeful wish).

  6. That is a lot of files. I suspect this is part of your shared test data with the TypeScript compiler, so changing formats is probably not very desirable, but if there is an opportunity to collapse down the files, I will note that txtar is a very convenient format for working with many test files: https://pkg.go.dev/golang.org/x/tools/txtar

@qmuntal
Member

qmuntal commented Apr 25, 2025

I wonder if the Windows stat code could be sped up, at least for modern Windows.

Good point. Windows just released GetFileInformationByName, which should replace the GetFileAttributes + CreateFile fallback.

In fact, some weeks ago I reimplemented os.Stat using that new API, but I didn't submit the CL because I didn't see enough perf improvements in the common case. I will give it another try using the typescript-go repo as a test bench.

@jakebailey
Contributor Author

If the typescript-go Go test runner is building up the full list of files to visit as strings, and then processing them in a later phase, maybe that could be changed so that the directory walk passes open files rather than strings (e.g., maybe via an iterator or channel or callback or whatever). I didn't look carefully, so that might not make any sense. Separately, might be worth double-checking if the typescript-go test runner is using the most ~modern Go stdlib approach for walking directories. (There can be material performance differences, but that might not matter here -- I'm mostly mentioning because there's some remote-ish chance that could cut down on the number of file operations that the test log later needs to validate, but that's mostly just a hopeful wish).

The perf of the test code itself is actually fine; I have a change locally which converted the code to a recursive walker which iterates over the paths concurrently with the thing that generates the tests (or at least, generates them for the main goroutine to then t.Run since that can't be done concurrently), and it didn't help at all.

Once we get past test caching, the underlying test code itself runs in a similar amount of time to me using WSL on the same machine (where I typically work, and thus was unaware of the misery my colleagues were experiencing: "wait, your tests don't run in 10 seconds?" 😄).

> That is a lot of files. I suspect this is part of your shared test data with the TypeScript compiler, so changing formats is probably not very desirable, but if there is an opportunity to collapse down the files, I will note that txtar is a very convenient format for working with many test files:

Unfortunately it's not really desirable. The current test format is pretty old and very helpful. I certainly know about txtar, we just have our own format (predating txtar I think) called "twoslash" that works roughly the same way, declaring a FS with some metadata. It's just that we then use that as an input to output 5 or so files related to said test.


The other interesting bit is that the test caching code runs more than once; I think three times. Once for the test results, once for the test binary, then a third time when writing out the results. It's documented as being a perf benefit since this means being able to skip the binary writing part, but in our case the testlog code itself is what dominates.

@gopherbot
Contributor

Change https://go.dev/cl/668636 mentions this issue: cmd/go/internal/test: parallelize test log file operations

@thepudds
Contributor

I was curious whether the approach I outlined in item 2 in #72992 (comment) above was viable, so I took a quick stab this weekend and sent https://go.dev/cl/668636.

In short, it tries to do the test log file operations in parallel, but then calculates the final hash in the original order from the test log to keep the end result deterministic.

On an old-ish Windows laptop (Windows 10 with 8 logical cores), here are wall-clock times for doing go test ./internal/testrunner just after doing go clean -testcache (as outlined in the initial report above):

                          go1.24    cl-668636        delta

        overall (sec.)    120.56        48.09       -60.1%
      test time (sec.)     21.25        20.92            ~
  non-test time (sec.)     99.31        27.17       -72.6%

The meaning of the times in that table:

  • overall: end-to-end wall-clock time (measured via time in MSYS2 bash on Windows).
  • test time: the package test execution time reported by go test.
  • non-test time: overall minus test time, which is a simple/approximate view into how much time is spent outside of executing the user code under test.

The CL also de-duplicates some of the work. (For typescript-go testrunner package, about 30% of the file operations in the test log seem to be repeats).

It probably could be simplified, but my starting point was not knowing if it would be feasible, so consider it a first cut.

It passes all.bash and the longtest trybots on Linux and Windows, but it still could be wrong. 😅

@jakebailey, I'd be curious if you see any improvement:

$ go install golang.org/dl/gotip@latest 
$ gotip download 668636

@jakebailey
Contributor Author

Measuring only how long it takes when the results are already cached. Before:

ok      github.com/microsoft/typescript-go/internal/testrunner  (cached)

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 20
Milliseconds      : 124
Ticks             : 201246315
TotalDays         : 0.000232923975694444
TotalHours        : 0.00559017541666667
TotalMinutes      : 0.335410525
TotalSeconds      : 20.1246315
TotalMilliseconds : 20124.6315

After:

ok      github.com/microsoft/typescript-go/internal/testrunner  (cached)

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 12
Milliseconds      : 452
Ticks             : 124522877
TotalDays         : 0.000144123700231481
TotalHours        : 0.00345896880555556
TotalMinutes      : 0.207538128333333
TotalSeconds      : 12.4522877
TotalMilliseconds : 12452.2877

So, certainly better on my machine, though my measurements are different.
