Cache Dask arrays created from `NetCDFDataProxy`s to speed up loading files with multiple variables
#6252
base: main
916a1df to c61b12f
c61b12f to 1249c6b
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6252      +/-   ##
==========================================
+ Coverage   89.83%   89.85%   +0.01%
==========================================
  Files          88       88
  Lines       23315    23342      +27
  Branches     4338     4342       +4
==========================================
+ Hits        20945    20973      +28
+ Misses       1644     1643       -1
  Partials      726      726
⏱️ Performance Benchmark Report: 953e8f9
Performance shifts
Full benchmark results
Generated by GHA run
The benchmarks showing changes aren't really the ones I'd expect.
I added a benchmark in bfbd625 that should show the improvement.
💯 It will be great to see this come together!
⏱️ Performance Benchmark Report: d419943
Performance shifts
Full benchmark results
Generated by GHA run
🚀 Pull Request
Description
Another idea to speed up loading NetCDF files with many variables. This caches the last 100 Dask arrays created from `NetCDFDataProxy`s so shared coordinates can be re-used. Since copying a Dask array is much faster than creating a new one, this gives a speedup.
Consult Iris pull request check list
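
To illustrate the caching idea described above, here is a minimal sketch of a bounded, least-recently-used cache that hands out copies of previously built objects. It is not the code from this PR: the class name `LazyArrayCache`, the `build` callable, and the `(path, variable)` key are all hypothetical, and a plain dict stands in for the Dask array so the example runs with the standard library only. In Iris the builder would presumably wrap whatever creates a Dask array from a `NetCDFDataProxy`.

```python
import copy
from collections import OrderedDict


class LazyArrayCache:
    """Bounded LRU cache that returns copies of previously built objects.

    Hypothetical illustration: `build` stands in for the expensive step
    (for example, wrapping a data proxy in a Dask array).
    """

    def __init__(self, build, maxsize=100):
        self._build = build
        self._maxsize = maxsize
        self._entries = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            # Mark the entry as most recently used.
            self._entries.move_to_end(key)
            array = self._entries[key]
            self.hits += 1
        else:
            array = self._build(key)
            self.misses += 1
            self._entries[key] = array
            if len(self._entries) > self._maxsize:
                # Drop the least recently used entry.
                self._entries.popitem(last=False)
        # Return a shallow copy so callers never share the cached object.
        return copy.copy(array)


if __name__ == "__main__":
    def build(key):
        # Stand-in for the costly construction of a lazy array.
        return {"source": key}

    cache = LazyArrayCache(build, maxsize=100)
    first = cache.get(("file.nc", "latitude"))
    second = cache.get(("file.nc", "latitude"))  # served from the cache
    print(cache.hits, cache.misses)
```

The key has to identify the underlying data uniquely; which proxy attributes are sufficient for that is a design decision this sketch leaves open.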