[FR] Out-of-memory data reduction. #48

Open
Datseris opened this issue Mar 19, 2021 · 1 comment
Labels
easy Solving this has easy to medium difficulty! feature request A new feature we would like to have! hard This is definitely difficult to solve!

Comments

@Datseris
Member

While the in-memory functionality is great, you typically have so much data that it doesn't fit into memory. Such data are usually saved in monthly or yearly files, where each file contains, for example, one year of all the data.

This works in our favor, because even now it isn't hard to wrap your code in a simple for-loop over the files. However, we can streamline many things. For example, the output ClimArray can be pre-initialized and then filled efficiently while aggregating, similarly to how yearlyagg works now.
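To make the idea of a pre-initialized output concrete, here is a minimal, language-agnostic sketch in plain Python (not ClimateBase.jl code). The names `yearly_means`, `load_year` and `n_cells` are hypothetical, and the "files" are simulated with an in-memory dictionary; a real version would read each file from disk inside the loop.

```python
from statistics import fmean

def yearly_means(files, load_year, n_cells):
    """Stream over per-year files so only one year is in memory at a time.

    `load_year(f)` is a hypothetical loader returning a list of time steps,
    each a list of `n_cells` values.
    """
    # Pre-initialize the output: one row per file (year), one column per grid cell.
    out = [[0.0] * n_cells for _ in files]
    for i, f in enumerate(files):
        data = load_year(f)  # only this year's data is held in memory
        for c in range(n_cells):
            out[i][c] = fmean(step[c] for step in data)
    return out

# Simulated "files": year -> time steps x grid cells (assumption for illustration).
fake_files = {
    "2000": [[1.0, 2.0], [3.0, 4.0]],
    "2001": [[10.0, 20.0]],
}
means = yearly_means(list(fake_files), fake_files.__getitem__, n_cells=2)
```

The output has a known size before any file is read, so it can be allocated once and filled row by row.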

So in principle there are two ways to do out-of-memory data reduction:

  1. Reduce by aggregating over time: decrease the total number of time points by looping over the files with an out-of-memory version of yearlyagg.
  2. Reduce by projecting onto a lower-resolution grid. This is done for each time slice in the files, once again looping over the files. It requires [FR] LonLat to LonLat interpolation #46 to be ready first, so that we can use it here.
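As a rough illustration of the second option, the sketch below coarsens each time slice by plain block averaging, looping over files lazily. Note that block averaging is a stand-in chosen for brevity: it is not the LonLat interpolation from #46, and it ignores area weighting. The names `coarsen`, `coarsen_files` and `load_slices` are hypothetical.

```python
def coarsen(slice2d, factor):
    """Average non-overlapping factor x factor blocks of one lon x lat slice."""
    nlon, nlat = len(slice2d), len(slice2d[0])
    if nlon % factor or nlat % factor:
        raise ValueError("grid size must be divisible by factor")
    out = []
    for i in range(0, nlon, factor):
        row = []
        for j in range(0, nlat, factor):
            block = [slice2d[a][b]
                     for a in range(i, i + factor)
                     for b in range(j, j + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def coarsen_files(files, load_slices, factor):
    """Yield coarsened slices one at a time, so no full file stays in memory.

    `load_slices(f)` is a hypothetical loader that iterates over the
    time slices stored in file `f`.
    """
    for f in files:
        for s in load_slices(f):
            yield coarsen(s, factor)

# One 2 x 4 slice reduced to 1 x 2 (values are made up for illustration).
small = coarsen([[1, 2, 3, 4], [5, 6, 7, 8]], 2)
```

Because `coarsen_files` is a generator, the caller decides whether to write each reduced slice to disk or collect them into a pre-initialized output.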

The above is in my eyes easy, provided that the required issues are solved first.

The hard part is getting automatic parallelization to work here as well.

@Datseris Datseris added feature request A new feature we would like to have! easy Solving this has easy to medium difficulty! hard This is definitely difficult to solve! labels Mar 19, 2021
@Balinus
Member

Balinus commented Mar 23, 2021

You should take a look at ESDL.jl and see how they implemented it. As far as I remember, they do out-of-memory reduction in parallel. It is based on the chunking capabilities of netCDF and Zarr.
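For reference, the general pattern behind chunk-wise parallel reduction can be sketched as follows. This is not ESDL.jl's implementation, only an illustration in plain Python: each chunk is reduced independently to a small mergeable partial result (here a sum and a count), and the partials are combined at the end. The names `partial_sum` and `parallel_mean` are hypothetical, and threads are used only to show the structure.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """Reduce one chunk to a small, mergeable partial result."""
    return (sum(chunk), len(chunk))

def parallel_mean(chunks, max_workers=4):
    """Mean over all values, computed chunk by chunk in a worker pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker sees a single chunk; pool.map preserves chunk order.
        partials = list(pool.map(partial_sum, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

# Chunks of unequal length, as they might come from different files.
result = parallel_mean([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]])
```

The key design point is that the partial results are tiny compared to the chunks, so merging them never requires all the data to be in memory at once.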
