Rethinking fix_file
#2129
Comments
The issue with the current implementation is of |
This seems the cleanest option, but I doubt whether the |
If we want to work with data that is not stored in files at some point in the future, it would be useful to either get rid of We could consider making |
AFAIK iris can't load file objects, e.g. something stored on S3 storage, so our main problem is not fix_file but the I/O itself. If we get the I/O to work with object stores, then we can modify fix_file to work on such a thing, I reckon.
I have experience with netCDF files on various types of storage, including object stores, and so does @zklaus (I am fairly sure of that). iris is the type of thing you don't want in that case; you want a basic loader into a file format like Zarr or overloaded Zarr. Beware of xarray, since that transfers the data!
I tested this a lot lately for the ERA5 GRIB files, in which case I didn't find any problems with this function. The laziness of the data and all relevant metadata are properly preserved.
Conceptually I agree, but our current
I don't think we should get rid of
In #2160, I allowed |
Last week I found that reading nc files with lots of variables (like we have for many native models like ICON, EMAC, etc.) can be very slow with iris (it can take hours depending on the setup). See more details here: SciTools/iris#6223. I found that Thus, I'd really like to implement the solution "Allow With the more flexible |
While working on it, I realized that the separation between fix_file/fix_metadata/fix_data no longer has a purpose now that we're working with |
I really like this idea! However, I don't think this can be easily implemented in practice. Currently, the existing fix functions are not called one after another. The order is rather fix_file --> load --> fix_metadata --> concatenate --> cmor_check_metadata --> clip_timerange --> fix_data --> cmor_check_data. How and where in this chain would you implement this |
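To make the ordering concern concrete, here is a minimal sketch in plain Python. The step names are taken from the chain quoted above; `STEP_ORDER`, `run_chain`, and `steps_between` are hypothetical helpers made up for illustration and are not part of the actual code base.

```python
from typing import Callable, Dict, List, Tuple

# Order of the steps exactly as listed in the comment above.
STEP_ORDER: List[str] = [
    "fix_file",
    "load",
    "fix_metadata",
    "concatenate",
    "cmor_check_metadata",
    "clip_timerange",
    "fix_data",
    "cmor_check_data",
]


def run_chain(product, steps: Dict[str, Callable]) -> Tuple[object, List[str]]:
    """Apply the given steps in chain order and record which ones ran."""
    executed = []
    for name in STEP_ORDER:
        step = steps.get(name)
        if step is None:
            continue  # step not provided, skip it
        product = step(product)
        executed.append(name)
    return product, executed


def steps_between(first: str, second: str) -> List[str]:
    """Return the steps that run strictly between two named steps."""
    i, j = STEP_ORDER.index(first), STEP_ORDER.index(second)
    return STEP_ORDER[i + 1:j]


# Steps a single merged fix function would have to straddle:
print(steps_between("fix_metadata", "fix_data"))
```

In this sketch, three unrelated steps sit between fix_metadata and fix_data, which is why merging the fix functions would also mean deciding where those steps go.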
@bouweandela I just opened a PR that implements the option to use xarray and ncdata objects in What do you think? |
Currently, `fix_file` has the call signature `(filename) -> (filename)`. Thus, files need to be copied if they need to be modified (overwriting input files is a very bad idea), which is very expensive. Usually, the `netCDF4` library is used to perform these kinds of modifications.

A possible alternative that has often come up is xarray. At the moment, there is a method `DataArray.to_iris()` which allows converting a `DataArray` into a `Cube` (no idea how efficient this is, though). In addition, the iris devs are working on an improved xarray-iris bridge.

Thus, it would be very nice to allow an efficient usage of xarray in `fix_file`. My question is now: what would that look like in practice? Possible solutions I could think of are:

- Allow `fix_file` to return xarray objects (currently only `DataArray`?) and `load` to read these kinds of objects.
- Allow `fix_file` to return cubes and `load` to read cubes.
- Change `fix_file` in such a way that it is a callback in `load` (with one of the call signatures above).

The reason I am asking this now is that I want to get started with reading the ERA5 GRIB files that are available on Levante (see #1991). However, iris-grib cannot open them. On the other hand, xarray (in combination with ECMWF's cfgrib) works perfectly fine. Thus, an option to include xarray into our pipeline would be super helpful for me.
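As a rough illustration of the first two options listed above, here is a minimal sketch in plain Python of a `load` that accepts either a file path or an in-memory object returned by `fix_file`. Everything in it is an assumption made for illustration: `InMemoryDataset` and `Cube` are hypothetical stand-ins for xarray and iris objects, and the bodies of `fix_file` and `load` are placeholders, not the real implementations.

```python
from dataclasses import dataclass, field
from functools import singledispatch
from pathlib import Path


@dataclass
class InMemoryDataset:
    """Hypothetical stand-in for an xarray-like object."""
    variables: dict = field(default_factory=dict)


@dataclass
class Cube:
    """Hypothetical stand-in for an iris-like cube."""
    name: str
    data: list


def fix_file(path: Path):
    """Return the path unchanged, or a fixed in-memory dataset.

    Returning an in-memory object avoids copying the file on disk.
    """
    if path.suffix == ".grib":
        # Pretend the file was opened with another reader and fixed in memory.
        return InMemoryDataset({"tas": [280.0, 281.5]})
    return path


@singledispatch
def load(source) -> list:
    """Load cubes from whatever fix_file returned."""
    raise TypeError(f"Cannot load object of type {type(source).__name__}")


@load.register
def _(source: Path) -> list:
    # Placeholder for the existing file-based loading.
    return [Cube(name=source.stem, data=[])]


@load.register
def _(source: InMemoryDataset) -> list:
    # Placeholder for converting in-memory data to cubes.
    return [Cube(name=name, data=values) for name, values in source.variables.items()]


cubes = load(fix_file(Path("era5.grib")))
```

Dispatching on the return type keeps the existing `(filename) -> (filename)` fixes working unchanged while letting new fixes hand over in-memory objects.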
@ESMValGroup/technical-lead-development-team opinions?