Clean up inDelete network atomically #2677

Merged

Conversation

corhere
Collaborator

@corhere corhere commented Apr 25, 2023

The (*network).ipamRelease function nils out the network's IPAM info fields, putting the network struct into an inconsistent state. The network-restore startup code panics if it tries to restore a network from a struct which has fewer IPAM config entries than IPAM info entries. Therefore (*network).delete contains a critical section: because it persists the network to the store after ipamRelease(), the datastore contains an inconsistent network until the deletion operation finishes deleting the network from the datastore. If the deletion operation is interrupted for any reason between ipamRelease() and deleteFromStore(), the daemon will crash on startup when it tries to restore the network.

Updating the datastore after releasing the network's IPAM pools may have served a purpose in the past, when a global datastore was used for intra-cluster communication and the IPAM allocator had persistent global state, but nowadays there is no global datastore and the IPAM allocator has no persistent state whatsoever. Remove the vestigial datastore update as it is no longer necessary and only serves to cause problems. If the network deletion is interrupted before the network is deleted from the datastore, the deletion will resume during the next daemon startup, including releasing the IPAM pools.
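
To illustrate the ordering problem described above, here is a minimal, self-contained sketch. It is not the libnetwork code: the `record` and `store` types, `deleteNetwork`, `restorable`, and the `persistAfterRelease` and `interrupt` flags are all hypothetical names invented for this example, and the consistency check simply compares slice lengths instead of reproducing the real restore logic (which panics rather than reporting a boolean).

```go
package main

import (
	"errors"
	"fmt"
)

// record is a hypothetical, simplified stand-in for a persisted network.
type record struct {
	Config []string // IPAM config entries
	Info   []string // IPAM info entries
}

// store is a toy in-memory datastore keyed by network ID.
type store map[string]record

var errInterrupted = errors.New("deletion interrupted")

// deleteNetwork models the deletion flow. persistAfterRelease selects the old
// behaviour (write the network back after releasing IPAM state). interrupt
// simulates a crash between the release and the final store deletion.
func deleteNetwork(s store, id string, persistAfterRelease, interrupt bool) error {
	n, ok := s[id]
	if !ok {
		return fmt.Errorf("network %s not found", id)
	}
	n.Info = nil // stands in for ipamRelease() clearing the IPAM info fields
	if persistAfterRelease {
		s[id] = n // the vestigial datastore update
	}
	if interrupt {
		return errInterrupted
	}
	delete(s, id) // stands in for deleteFromStore()
	return nil
}

// restorable reports whether the stored record exists and still has matching
// numbers of config and info entries (an assumed stand-in for "safe to restore").
func restorable(s store, id string) bool {
	n, ok := s[id]
	return ok && len(n.Config) == len(n.Info)
}

func main() {
	for _, persist := range []bool{true, false} {
		s := store{"net1": {Config: []string{"10.0.0.0/24"}, Info: []string{"10.0.0.0/24"}}}
		err := deleteNetwork(s, "net1", persist, true)
		fmt.Printf("persistAfterRelease=%v err=%v restorable=%v\n", persist, err, restorable(s, "net1"))
	}
}
```

In this sketch, an interrupted deletion with `persistAfterRelease` set leaves a record in the store whose info entries are gone, while the same interruption without the intermediate write leaves the original record untouched, so a later startup can retry the deletion from a consistent state.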

(cherry picked from commit moby/moby@c957ad0)

Signed-off-by: Cory Snider <[email protected]>
(cherry picked from commit moby/moby@c957ad0)
Signed-off-by: Cory Snider <[email protected]>
Member

@thaJeztah thaJeztah left a comment

LGTM

Development

Successfully merging this pull request may close these issues.

Panic in libnetwork during daemon start (panic: runtime error: index out of range [0] with length 0)