@josephnowak
Contributor

@josephnowak josephnowak commented Jul 8, 2025

I forgot to pass align_chunks through to the to_zarr method on datasets, which made the feature useless for that kind of data structure. I added a specific test to cover this issue.

align_chunks now also works in cases where the data is smaller than a single Zarr chunk (one test was added to cover this scenario as well).

I modified (again) the error message raised by safe_chunks; it now includes information about the two chunks that overlap with a single Zarr chunk. From what I saw of the error in 10501, the original message was not helping users understand what was happening.
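To make the overlap condition concrete, here is a minimal sketch of the kind of check that such an error message reports. It is not Xarray's implementation: the function name `find_shared_zarr_chunks` is hypothetical, it handles a single dimension, and it assumes the write starts at offset 0 of the Zarr array.

```python
from itertools import accumulate


def find_shared_zarr_chunks(dask_chunks, zarr_chunk):
    """Hypothetical helper: list the Zarr chunks touched by two Dask chunks.

    Returns (zarr_chunk_index, dask_index, next_dask_index) triples, one for
    every Dask chunk border that falls strictly inside a Zarr chunk.
    """
    conflicts = []
    # Exclusive end offset of every Dask chunk along the dimension.
    ends = list(accumulate(dask_chunks))
    # The end of the last Dask chunk is the end of the data, not a border.
    for i, end in enumerate(ends[:-1]):
        # A border that is not a multiple of the Zarr chunk size splits a
        # Zarr chunk between Dask chunk i and Dask chunk i + 1.
        if end % zarr_chunk != 0:
            conflicts.append((end // zarr_chunk, i, i + 1))
    return conflicts


# Dask chunks of sizes 3, 3, 4 written into Zarr chunks of size 5.
print(find_shared_zarr_chunks((3, 3, 4), 5))
```

Two Dask chunks that share a Zarr chunk may be written in parallel, which is why reporting both of them, rather than only the Zarr chunk, tells the user where the conflict comes from.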

@josephnowak josephnowak marked this pull request as draft July 9, 2025 14:15
@josephnowak
Contributor Author

It looks like the failing tests are not related to this PR.

@josephnowak
Contributor Author

josephnowak commented Jul 29, 2025

Hi @max-sixty, sorry for bothering you. I'm not sure if you have some free time to review this PR.

I'm not sure how reviewers are assigned in Xarray in this case, since no one else was involved in the issue and probably no one else was notified to review this.

@lbesnard

lbesnard commented Aug 7, 2025

@josephnowak if it helps the review process, I can confirm that these changes work for me.

@lbesnard

@josephnowak do you think someone else could maybe look at this?

@josephnowak
Contributor Author

josephnowak commented Aug 19, 2025

Hi @dcherian, sorry for bothering you. I'm not sure if you have some free time to take a look at this; if you know of someone else who has the time to review this PR, that would be awesome.

@josephnowak
Contributor Author

Hi @lbesnard,

Unfortunately, I don't know of anyone else who could review the PR. The last time I sent a PR to Xarray, it was reviewed within a couple of days, so most of the maintainers are probably busy this month.

As a temporary solution, I think you can copy the function grid_rechunk from my branch and use it directly in your code, as I did here. It is more cumbersome because you need to keep track of the actual chunk structure of your data, but it should help.
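For readers who want to see what aligning chunks by hand involves, the sketch below illustrates the general idea in pure Python. It is not the grid_rechunk function from the branch, whose signature is not shown in this thread. The helper name `align_to_grid` is hypothetical, it works on one dimension only, and it assumes the data starts at offset 0 of the Zarr array.

```python
from itertools import accumulate


def align_to_grid(dask_chunks, zarr_chunk):
    """Hypothetical helper: return chunk sizes whose borders fall on the Zarr grid.

    Every interior border is snapped down to the previous multiple of
    zarr_chunk; borders that collapse onto each other are merged.
    """
    total = sum(dask_chunks)
    borders = []
    for end in accumulate(dask_chunks):
        snapped = (end // zarr_chunk) * zarr_chunk
        # Keep only interior borders, in strictly increasing order.
        if 0 < snapped < total and (not borders or snapped > borders[-1]):
            borders.append(snapped)
    # The final border is always the end of the data.
    borders.append(total)
    starts = [0] + borders[:-1]
    return tuple(end - start for start, end in zip(starts, borders))


# Dask chunks of sizes 3, 3, 4 realigned to Zarr chunks of size 5.
print(align_to_grid((3, 3, 4), 5))
```

The resulting sizes would then have to be applied to the data (for a Dask array, with its rechunk method) before every write, which is the bookkeeping the comment above refers to.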

@lbesnard

Oh, I've been using your commit hash, which is great. But I'm using this in a production tool that uses Poetry (TL;DR: pointing the xarray package to this hash fails in my CI/CD pipeline), so I'm a bit blocked at the moment.

It's just creating more work for you, as you always have to rebase. Thanks a lot anyway!

However, I'm surprised no one else seems to have noticed this bug.

@josephnowak
Contributor Author

Hi @max-sixty @dcherian, sorry for bothering you again. I would like to know if either of you has some free time to review this PR, or if someone else can review it. If not, an estimate of when you could review it would help me avoid the extra work of rebasing the PR multiple times.

@josephnowak
Contributor Author

Hi @shoyer, I saw that you have made some contributions related to the to_zarr method. Is it possible for you to review this PR? If you know of someone else who could, that would be very helpful too.

@josephnowak
Contributor Author

Hi @rabernat, I saw that you have made some contributions related to the to_zarr method. Is it possible for you to review this PR? If you know of someone else who could, that would be very helpful too.

Member

@shoyer shoyer left a comment

Thanks @josephnowak!

Comment on lines 162 to 167
    # This is useful for the scenarios where the enc_chunks are bigger than the
    # variable chunks, which happens when the user specifies the enc_chunks manually.
    enc_chunks = tuple(
        min(enc_chunk, sum(var_chunk))
        for enc_chunk, var_chunk in zip(enc_chunks, nd_var_chunks, strict=True)
    )
Member

Are we sure we want to convert enc_chunks rather than raising an error?

If so, I think this definitely deserves a unit test.

Contributor Author

Thanks a lot for taking a look at the PR; the test that I added covers this scenario.
It is necessary to convert the enc_chunks because there are cases where the array to be stored is smaller than the enc_chunks along at least one dimension. That caused the align-chunks logic to fail, because it expected the array to always be bigger than or equal to the chunks.

Contributor Author

I thought about this, and I think it was not clear that the modification of enc_chunks was only needed for the align_chunks algorithm. So I added this change https://github.com/pydata/xarray/pull/10516/files#diff-6462c27c36592f9134c381565c8f30eb59b48ea92d9bcaca371502bdeb8a030aR145-R149 and removed the enc_chunks modification, which should help clarify the code.

By the way, I changed "var" to "v" in the variable names because I noticed that "v" is more common in the Xarray code; for example, nd_var_chunks became nd_v_chunks, and so on.

…st of Xarray, move the modification of the enc_chunks to the build_grid_chunks function, add additional test to cover the scenario where the chunk is bigger than the size of the array
@github-actions github-actions bot added the topic-zarr Related to zarr storage library label Sep 5, 2025
@max-sixty
Collaborator

@josephnowak sorry I missed this! thank you as ever for these PRs; happy to see that Stephan took a look.

@max-sixty max-sixty merged commit 40c27d1 into pydata:main Sep 8, 2025
37 checks passed
lbesnard added a commit to lbesnard/xarray that referenced this pull request Oct 16, 2025
lbesnard added a commit to lbesnard/xarray that referenced this pull request Oct 16, 2025
lbesnard added a commit to lbesnard/xarray that referenced this pull request Oct 16, 2025

Development

Successfully merging this pull request may close these issues.

to_zarr() via dask with silent data loss on append

5 participants