Subtracting datasets in xarray 2024.6.0 leads to inconsistent chunks #9267

Open
uwagura opened this issue Jul 22, 2024 · 2 comments · May be fixed by #9896


uwagura commented Jul 22, 2024

What is your issue?

When I call groupby() on a dataset and try to subtract another dataset from the result, I get this error:
ValueError: Object has inconsistent chunks along dimension lead. This can be fixed by calling unify_chunks().
Calling unify_chunks() beforehand resolves the issue, but the chunking error only occurs with more recent versions of xarray. With xarray 2022.3.0, the same code below runs without calling unify_chunks(). Does anyone know what may have caused the discrepancy?
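
To make the error message concrete, here is a simplified pure-Python model of the kind of check that produces it. This is not xarray's code: `find_inconsistent_dims` and the variable names `temp` and `salt` are hypothetical, and chunk layouts are written as plain tuples of block lengths. The point is that a dataset-level chunk mapping only exists if every variable sharing a dimension splits it the same way.

```python
def find_inconsistent_dims(var_chunks):
    """Return the sorted dimensions whose chunk layout differs between variables.

    var_chunks maps variable name -> {dimension name: tuple of block lengths}.
    """
    seen = {}
    inconsistent = set()
    for chunks in var_chunks.values():
        for dim, layout in chunks.items():
            # A dimension is inconsistent if a later variable splits it differently.
            if dim in seen and seen[dim] != layout:
                inconsistent.add(dim)
            seen.setdefault(dim, layout)
    return sorted(inconsistent)


# Hypothetical layouts: both variables span 12 steps of "lead",
# but they split it into blocks differently.
layouts = {
    "temp": {"lead": (4, 4, 4), "member": (1, 1)},
    "salt": {"lead": (6, 6), "member": (1, 1)},
}
print(find_inconsistent_dims(layouts))  # ['lead']
```

In xarray itself, `Dataset.unify_chunks()` rechunks the variables to a common layout, which is why calling it first makes the error go away.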

Here's the relevant section of the code I was running when I encountered the problem. In the snippet below, the members variable is a list of paths to netCDF files that contain the output from an ensemble of ocean models. I think the error should be reproducible with any group of netCDF files and similar operations:

import xarray

ds = xarray.open_mfdataset(members, combine='nested', concat_dim='member').sortby('init')
ensmean = ds.mean('member')
climo = ensmean.sel(init=slice('1993-01-01', '1993-12-31')).groupby('init.month').mean('init').load()
# model_ds is not defined in this snippet as posted
anom = model_ds.groupby('init.month') - climo

And here's the output from xarray.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:23:07) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.102.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US
LOCALE: ('en_US', 'ISO8859-1')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.6.0
pandas: 2.2.2
numpy: 2.0.0
scipy: None
netCDF4: 1.7.1
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.7.1
distributed: 2024.7.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.6.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 71.0.4
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
uwagura added the needs triage label Jul 22, 2024

dcherian added the bug and topic-dask labels and removed the needs triage label Jul 22, 2024
dcherian (Contributor) commented

Ah, this is my mistake. We need to loop over the variables and chunk them individually here:

if obj.chunks and not other.chunks:
    # TODO: What about datasets with some dask vars, and others not?
    # This handles dims other than `name`
    chunks = {k: v for k, v in obj.chunksizes.items() if k in other.dims}
    # a chunk size of 1 seems reasonable since we expect individual elements of
    # other to be repeated multiple times across the reduced dimension(s)
    chunks[name] = 1
    other = other.chunk(chunks)
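
The idea of chunking each variable individually can be sketched without dask. The helper below is hypothetical and is not the patch in the linked pull request; it only shows the bookkeeping, using plain dictionaries in place of xarray objects. For every variable in `other`, it keeps the chunk sizes of the matching variable in `obj` for the dimensions they share, then sets the grouped dimension `name` to 1, mirroring the snippet above.

```python
def chunks_per_variable(obj_chunksizes, other_dims, name):
    """Build one chunk mapping per variable instead of one for the whole dataset.

    obj_chunksizes: variable -> {dim: chunk sizes} for the chunked object.
    other_dims: variable -> tuple of dims for the unchunked object.
    name: the grouped dimension (for example "month").
    """
    result = {}
    for var, dims in other_dims.items():
        source = obj_chunksizes.get(var, {})
        # Keep only chunk sizes for dims this variable actually has.
        chunks = {dim: source[dim] for dim in dims if dim in source}
        # One group per chunk along the grouped dimension, as in the existing code.
        if name in dims:
            chunks[name] = 1
        result[var] = chunks
    return result


# Hypothetical inputs: the two variables are chunked differently along "lead".
obj_chunksizes = {
    "temp": {"lead": (4, 4, 4), "member": (1, 1)},
    "salt": {"lead": (6, 6), "member": (1, 1)},
}
other_dims = {"temp": ("month", "lead"), "salt": ("month", "lead")}

per_var = chunks_per_variable(obj_chunksizes, other_dims, "month")
print(per_var["temp"])  # {'lead': (4, 4, 4), 'month': 1}
print(per_var["salt"])  # {'lead': (6, 6), 'month': 1}
```

Because each variable gets its own mapping, no single dataset-wide chunk dictionary has to be read from `obj`, which is the step that fails when variables disagree along a dimension.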

dcherian added a commit to dcherian/xarray that referenced this issue Aug 28, 2024
dcherian added a commit to dcherian/xarray that referenced this issue Dec 9, 2024
dcherian added a commit to dcherian/xarray that referenced this issue Dec 16, 2024
dcherian linked a pull request Dec 16, 2024 that will close this issue