forked from scionproto/scion
Open
Since our last update from upstream, it seems we are getting quite a number of errors when trying to write to the beacon DB.
A typical error looks like:
Jul 17 14:03:43 scionlab-1108-stallman scion-control-service[1257392]: 2024-07-17 14:03:43.846434+0000 ERROR beaconing/writer.go:110 Unable to register {"debug_id": "7804f9a1", "seg_type": "core", "err": {"msg": "Failed to create transaction", "cause": "interrupted (9)", "stacktrace": ["github.com/scionproto/scion/private/storage/path/sqlite.(*Backend).BeginTransaction /builds/PRV-PERRIG/scionlab/scion-builder/scion/private/storage/path/sqlite/sqlite.go:90", "github.com/scionproto/scion/private/storage/path/metrics.(*metricsPathDB).BeginTransaction.func1 /builds/PRV-PERRIG/scionlab/scion-builder/scion/private/storage/path/metrics/metrics.go:116", "github.com/scionproto/scion/private/storage/path/metrics.(*Observer).Observe /builds/PRV-PERRIG/scionlab/scion-builder/scion/private/storage/path/metrics/metrics.go:77", "github.com/scionproto/scion/private/storage/path/metrics.(*metricsPathDB).BeginTransaction /builds/PRV-PERRIG/scionlab/scion-builder/scion/private/storage/path/metrics/metrics.go:115", "github.com/scionproto/scion/private/segment/seghandler.(*DefaultStorage).StoreSegs /builds/PRV-PERRIG/scionlab/scion-builder/scion/private/segment/seghandler/storage.go:68", "github.com/scionproto/scion/control/beaconing.(*LocalWriter).Write /builds/PRV-PERRIG/scionlab/scion-builder/scion/control/beaconing/writer.go:241", "github.com/scionproto/scion/control/beaconing.(*WriteScheduler).run /builds/PRV-PERRIG/scionlab/scion-builder/scion/control/beaconing/writer.go:124", "github.com/scionproto/scion/control/beaconing.(*WriteScheduler).Run /builds/PRV-PERRIG/scionlab/scion-builder/scion/control/beaconing/writer.go:109", "github.com/scionproto/scion/private/periodic.(*Runner).onTick /builds/PRV-PERRIG/scionlab/scion-builder/scion/private/periodic/periodic.go:206", "github.com/scionproto/scion/private/periodic.(*Runner).runLoop /builds/PRV-PERRIG/scionlab/scion-builder/scion/private/periodic/periodic.go:188", "github.com/scionproto/scion/private/periodic.StartWithMetrics.func1 /builds/PRV-PERRIG/scionlab/scion-builder/scion/private/periodic/periodic.go:138"]}}
It also happens while cleaning:
Jul 17 14:07:55 scionlab-1102-perrig scion-control-service[1189]: 2024-07-17 14:07:55.764585+0000 ERROR cleaner/cleaner.go:67 Failed to delete {"debug_id": "48cda52c", "subsystem": "control_pathstorage_cleaner", "err": {"msg": {"msg": "db: write failed", "stacktrace": ["github.com/scionproto/scion/private/storage/db.init /builds/PRV-PERRIG/scionlab/scion-builder/scion/private/storage/db/errors.go:30", "runtime.doInit1 /root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.21.10.linux-amd64/src/runtime/proc.go:6735", "runtime.doInit /root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.21.10.linux-amd64/src/runtime/proc.go:6702", "runtime.main /root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.21.10.linux-amd64/src/runtime/proc.go:249"]}, "cause": {"msg": {"msg": "db: transaction error", "stacktrace": ["github.com/scionproto/scion/private/storage/db.init /builds/PRV-PERRIG/scionlab/scion-builder/scion/private/storage/db/errors.go:32", "runtime.doInit1 /root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.21.10.linux-amd64/src/runtime/proc.go:6735", "runtime.doInit /root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.21.10.linux-amd64/src/runtime/proc.go:6702", "runtime.main /root/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.21.10.linux-amd64/src/runtime/proc.go:249"]}, "cause": "interrupted (9)", "detailMsg": "create tx"}, "detailMsg": "delete in tx"}}
Some notes:
- The control service of these ASes works okay for a while (hours) but after that is unable to register new beacons, and eventually connectivity is broken.
- It happens in both core and non-core ASes.
- It also happens with non-SCIONLab code, built directly from scionproto.
- Restarting the control service is enough to get rid of this problem, but we don't want to do that every ~6 hours.
- PR from upstream changing the sqlite library: build: support fedora scionproto/scion#4371
- Related PR from upstream: build: prevent go from linking libresolv dynamically. scionproto/scion#4394
- Possibly related modernc issue: https://gitlab.com/cznic/sqlite/-/issues/178
TODO:
- Merge patch on scionlab Mattn's sqlite as default #167
- Create a reproducible unit test (if possible, integration test if not)
- Open PR upstream
- Merge to scionlab