x/build/cmd/watchflakes: periodically restarts #70743

cagedmantis opened this issue Dec 9, 2024 · 4 comments
Labels
Builders: x/build issues (builders, bots, dashboards)
NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@cagedmantis
Contributor

Watchflakes is periodically being restarted.

Command kubectl get pods produces:
watchflakes-deployment-679f9fbf8c-zp9ld 1/1 Running 47 (26m ago) 27d

@golang/release

cagedmantis added the Builders and NeedsInvestigation labels Dec 9, 2024
cagedmantis added this to the Unreleased milestone Dec 9, 2024
@gabyhelp

gabyhelp commented Dec 9, 2024

Related Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

dmitshur changed the title from "x/build/watchflakes: periodically restarts" to "x/build/cmd/watchflakes: periodically restarts" Dec 9, 2024
@dmitshur
Contributor

Watchflakes currently implements retries at the process level. That is, it has many code paths that handle errors with a log.Fatal (see here), regardless of whether that error is a temporary network problem that would be safe to retry, or something that cannot be retried safely.

To make progress on this issue, errors that are safe to retry within the process need to be identified, and their log.Fatal handling replaced with something along the lines of a time.Sleep(...); continue.
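
As a rough illustration of the in-process retry idea, here is a minimal Go sketch. It is not the watchflakes code: the retry helper, the errTemporary sentinel, and the attempt and delay values are all hypothetical names and numbers chosen for the example. Errors marked temporary are retried after a sleep, and anything else is returned to the caller right away.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"time"
)

// errTemporary is a hypothetical sentinel marking errors that are safe to retry.
var errTemporary = errors.New("temporary error")

// retry calls op up to attempts times, sleeping delay between failed tries.
// It returns nil on the first success, returns immediately on an error that
// is not marked temporary, and returns the last error once attempts run out.
func retry(attempts int, delay time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if !errors.Is(err, errTemporary) {
			return err // not safe to retry; the caller decides what to do
		}
		log.Printf("attempt %d failed: %v", i+1, err)
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return err
}

func main() {
	calls := 0
	// The operation fails twice with a temporary error, then succeeds.
	err := retry(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return fmt.Errorf("list commits: %w", errTemporary)
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

With a helper along these lines, only errors that remain after the retries, or that were never safe to retry, would need to end the process.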

@cagedmantis
Contributor Author

I found these log entries right before a pod crashed:

watchflakes: 2025/02/12 19:34:59 ListCommits go release-branch.go1.23
watchflakes: 2025/02/12 19:35:00 rpc error: code = Unavailable desc = service unavailable
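
If an error like the one above is what reaches log.Fatal (the thread does not confirm this), it could be treated as retryable. The Go sketch below shows one way to recognise it, under loud assumptions: isUnavailable and sampleErr are hypothetical names, and matching on the message text is used only to keep the example free of external modules. Code that already depends on google.golang.org/grpc would more likely compare status.Code(err) against codes.Unavailable.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// sampleErr reproduces the error text from the log entry quoted above.
var sampleErr = errors.New("rpc error: code = Unavailable desc = service unavailable")

// isUnavailable reports whether err looks like a gRPC "Unavailable" status.
// Matching on the message text is an assumption made for this sketch only.
func isUnavailable(err error) bool {
	return err != nil && strings.Contains(err.Error(), "code = Unavailable")
}

func main() {
	fmt.Println(isUnavailable(sampleErr))
	fmt.Println(isUnavailable(errors.New("rpc error: code = PermissionDenied desc = denied")))
}
```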

@cagedmantis
Contributor Author

Watchflakes is restarting every couple of minutes:
watchflakes-deployment-7d5fb59cbf-6s5ph 1/1 Running 1070 (5m29s ago) 34d
