Backup repository sizes - snapshot repository vs. disk space use: why the difference?

I'm curious whether anyone knows what the underlying mechanism is causing this...

My (small) environment has replication across multiple nodes in the cluster, so snapshotting hasn't seemed important for our use case. However, we're looking at migrating to ES 9.0.1, and in this case a snapshot seemed like a good step before rolling out the upgrade.

So, this is the simplest of environments: one cluster, four nodes. I set up a repository in Google Cloud Storage, registered it, and snapshotted a couple of indexes to make sure it was basically operational. Then I removed the index specification, so the snapshot included all indices, all features, and the global state.

Three of the nodes are hot nodes, and hot indexes are configured with one replica; the other node is the warm node, with no replicas. The approximate storage space used on disk is 56GB for the hot data (so, duplicated data) and 108GB for the warm data.

So, here's the question: one snapshot of what is 164GB in local storage turns out to be 600GB in the repository. This is backed up by the API call GET _snapshot/REPOSITORY/SNAPSHOTNAME/_status?pretty (641 720 111 926 bytes), by querying the bucket metadata, and by the approximate transfer time/bandwidth to Cloud Storage.
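For reference, here is a minimal sketch of how the repository-side figure can be read out of the snapshot status response and converted to GiB. The `sample_status` dictionary is a hypothetical, heavily trimmed stand-in for the real response (only the byte count comes from this thread), and the helper names are made up for illustration. It assumes the size is reported under `stats.total.size_in_bytes` for each snapshot.

```python
# Hypothetical, trimmed response shaped like the output of
# GET _snapshot/REPOSITORY/SNAPSHOTNAME/_status
# Only the byte count is taken from the thread; everything else is illustrative.
sample_status = {
    "snapshots": [
        {
            "snapshot": "SNAPSHOTNAME",
            "repository": "REPOSITORY",
            "stats": {"total": {"size_in_bytes": 641720111926}},
        }
    ]
}


def snapshot_total_bytes(status: dict) -> int:
    """Sum stats.total.size_in_bytes over every snapshot in the response."""
    return sum(s["stats"]["total"]["size_in_bytes"] for s in status["snapshots"])


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to binary gigabytes (GiB, 2**30 bytes)."""
    return n_bytes / 2**30


total = snapshot_total_bytes(sample_status)
print(f"{total} bytes is about {to_gib(total):.1f} GiB")
```

Read as binary gigabytes, the reported byte count is a little under 600, which matches the repository size quoted above.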

That's with the "compression" option enabled for the repository.

Any ideas what's happening here? This is really just being used one-off, but I'd like to understand....

Um. No, this doesn't make sense at all. The blobs in the repository are byte-for-byte copies of the files on disk. There's a few extra metadata files but I doubt they add more than a MiB or two.

Where are you getting these figures from exactly? My best guess is that they are excluding something important.

Hi, thanks for getting back to me... and looking again at the data, it's user error (mine, of course).

I was using monitoring data; when I checked the actual totals as reported by df (XFS filesystems) they do in fact add up, so there's clearly an issue in the monitoring system that we need to track down.
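A rough sketch of the same cross-check in Python, for anyone who wants to script it per node rather than read df by hand. The function name and the `"/"` path are placeholders (substitute the mount point(s) that hold the node's data directory), and the figure covers everything on the filesystem, not only Elasticsearch data.

```python
import shutil


def used_bytes(mount_points):
    """Sum the filesystem 'used' figure over the given mount points.

    Each path must live on a different filesystem, otherwise the same
    usage is counted more than once.
    """
    return sum(shutil.disk_usage(p).used for p in mount_points)


# Placeholder path; replace with the node's actual data mount point(s).
data_mounts = ["/"]
print(f"{used_bytes(data_mounts) / 2**30:.1f} GiB used")
```

Running this on each node and adding up the results gives a local total to compare against the monitoring figures.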

And I apologize for the confirmation bias due to having read through all of the other questions around snapshot utilization. In this case, running another snapshot added the expected volume to the cloud bucket.

For the record, the bucket utilization was approximately 88% of the local storage utilization, not 366%.
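The 366% figure is just the ratio of the two numbers from the original question (600GB in the repository against 56GB + 108GB = 164GB measured locally). A small sketch of that arithmetic, with a made-up helper name:

```python
def utilization_percent(repo_size: float, local_size: float) -> float:
    """Repository size as a percentage of local storage use (same units for both)."""
    return 100.0 * repo_size / local_size


# Figures from the original question, both in GB.
local_total = 56 + 108          # hot + warm data as first measured
print(round(utilization_percent(600, local_total)))  # 366
```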

Great, that's a relief :slight_smile: thanks for closing the loop here.