align_chunks not working for datasets #10516
Conversation
for more information, see https://pre-commit.ci
Looks like the failing tests are not related to this PR.
Hi @max-sixty, sorry to bother you. I'm not sure if you have some free time to review this PR. I'm also not sure how reviewers are assigned in Xarray in this case, since no other person was involved in the issue, and probably no one else was notified to review this.
@josephnowak if it helps the review process, I can confirm that these changes work for me.
@josephnowak do you think someone else could maybe look at this?
Hi @dcherian, sorry to bother you. I'm not sure if you have some free time to take a look at this; if you do, or if you know someone else who has the time to review this PR, that would be awesome.
Hi @lbesnard, unfortunately I don't know of anyone else who could review the PR. The last time I sent a PR to Xarray, it was reviewed within a couple of days, so most of the maintainers are probably busy this month. As a temporary solution, you can copy the grid_rechunk function from my branch and use it directly in your code, as I did here. It is more cumbersome because you need to keep track of the actual chunk structure of your data, but it should help.
Oh, I've been using your commit hash, which is great. But I'm using this in a production tool that uses Poetry (tl;dr: pointing the xarray package to this hash fails in my CI/CD pipeline), so I'm a bit blocked at the moment. It also creates more work for you, since you always have to rebase. Thanks a lot anyway! I'm surprised no one else seems to have noticed this bug, though.
Hi @max-sixty @dcherian, sorry to bother you again, but I would like to know whether either of you has some free time to review this PR, whether someone else can review it, or whether you can give an estimate of when you could review it, so that I can avoid the extra work on my side of rebasing the PR multiple times.
Hi @shoyer, I saw that you have made some contributions related to the to_zarr method. Is it possible for you to review this PR? If not, pointing me to someone else who could would be very helpful.
Hi @rabernat, I saw that you have made some contributions related to the to_zarr method. Is it possible for you to review this PR? If not, pointing me to someone else who could would be very helpful.
shoyer left a comment
Thanks @josephnowak!
xarray/backends/chunks.py
Outdated
# This is useful for the scenarios where the enc_chunks are bigger than the
# variable chunks, which happens when the user specifies the enc_chunks manually.
enc_chunks = tuple(
    min(enc_chunk, sum(var_chunk))
    for enc_chunk, var_chunk in zip(enc_chunks, nd_var_chunks, strict=True)
)
Are we sure we want to convert enc_chunks rather than raising an error?
If so, I think this definitely deserves a unit test.
Thanks a lot for taking a look at the PR; the test that I added covers this scenario.
It is necessary to convert the enc_chunks because there are cases where the array being stored is smaller, on at least one dimension, than the enc_chunks, and that caused the align-chunks logic to fail, because it expected the array to always be bigger than or equal to the chunks.
I thought about this, and I think it was not clear that the modification of the enc_chunks was only for the align_chunks algorithm, so I added this change https://github.com/pydata/xarray/pull/10516/files#diff-6462c27c36592f9134c381565c8f30eb59b48ea92d9bcaca371502bdeb8a030aR145-R149 and removed the enc_chunks modification; that should help clarify the code.
By the way, I changed "var" to "v" in the variable names because I noticed that "v" is more common in the Xarray code; for example, nd_var_chunks changed to nd_v_chunks, and so on.
…st of Xarray, move the modification of the enc_chunks to the build_grid_chunks function, add an additional test to cover the scenario where the chunk is bigger than the size of the array
@josephnowak sorry I missed this! Thank you as ever for these PRs; happy to see that Stephan took a look.
I forgot to pass align_chunks to the to_zarr method for datasets, which made the feature useless for that data structure. I added a specific test to cover this issue.
Now align_chunks also works in the cases where the data is smaller than a single Zarr chunk (a test was added to cover this scenario as well).
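A minimal sketch of what "aligning" means along one dimension, including the case where the whole array fits inside a single Zarr chunk (the function name and logic here are a simplification for illustration, not the actual xarray implementation):

```python
def align_1d(size, zarr_chunk):
    """Split an array of `size` elements into chunks that never straddle
    a Zarr chunk boundary: full Zarr-sized chunks plus a remainder.
    Simplified illustration, not xarray's implementation."""
    n_full, rem = divmod(size, zarr_chunk)
    return (zarr_chunk,) * n_full + ((rem,) if rem else ())

print(align_1d(10, 4))  # (4, 4, 2)
print(align_1d(3, 10))  # (3,)  -- data smaller than a single Zarr chunk
```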
I modified (again) the error message shown with safe_chunks; it now includes information about the two chunks that overlap a single Zarr chunk. From what I saw in issue 10501, the original message was not helping users understand what was happening.
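The overlap condition that the improved safe_chunks message describes can be sketched along one dimension like this (a simplified model that assumes the write starts at the beginning of the array; the function is hypothetical, not the actual xarray code):

```python
def first_unsafe_pair(var_chunks, zarr_chunk):
    """Return (i, i + 1) for the first pair of variable chunks that both
    write into the same Zarr chunk, or None if the chunking is safe.
    Simplified: assumes the region being written starts at offset 0."""
    pos = 0
    for i, size in enumerate(var_chunks[:-1]):
        pos += size
        # An interior chunk boundary that does not fall on a Zarr boundary
        # means chunks i and i + 1 both touch the Zarr chunk containing pos.
        if pos % zarr_chunk != 0:
            return (i, i + 1)
    return None

print(first_unsafe_pair((4, 4, 2), 4))  # None  (boundaries at 4 and 8)
print(first_unsafe_pair((3, 5), 4))     # (0, 1): both overlap Zarr chunk [0, 4)
```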
whats-new.rst