Relation of chunks and blocks #598
Hey, I thoroughly read the documentation and all the blog posts on Blosc2, yet I still have some questions regarding the relationship between chunks and blocks. As I understand it, Blosc2 utilizes a superchunk (schunk) representing the entire array, which in turn consists of chunks, which themselves consist of blocks. My questions regarding this concept are as follows:
I hope these aren't too many questions, but I would be very thankful for any answers that help me understand Blosc2 better :) Kind regards,
Replies: 1 comment 1 reply
Your understanding is mostly correct, in the sense that compression in Blosc2 actually happens only at the block level. Having said that, Zarr also uses Blosc by default (although only version 1), and the same applies to it, so there is not a big difference in how they work.

The main difference is that Blosc2 has the notion of multidimensional partitions for both blocks and chunks, while Blosc1 does not. That means users have much more flexibility to adapt to their needs, whether the goal is a better compression ratio (cratio) or better speed. Another difference is that Blosc2 has a masking feature to decompress only the blocks of interest out of a chunk (see the "Block Masks and Parallel I/O" slides in https://www.blosc.org/docs/Blosc2-WP7-LEAPS-Innov-2022.pdf).

But in general, yes: while only the block size (block shape in Blosc2) influences cratio, both chunk and block sizes influence compression speed because of CPU caches. Deciding which shapes are best is sometimes a tough (albeit interesting) task. At least Blosc2 offers a comprehensive set of parameters to help with your experimentation (which, unfortunately, heuristics cannot magically replace).
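To make the block-level picture a bit more concrete, here is a minimal sketch in plain Python. It does not use the Blosc2 API: `zlib` stands in for the codec, the chunk is one-dimensional, and the helper names (`compress_chunk`, `decompress_wanted`, `cratio`) are made up for illustration. It only shows the idea that blocks inside a chunk are compressed independently, so a subset of them can be decompressed on its own and the block size affects the compression ratio.

```python
import zlib


def compress_chunk(chunk, blocksize):
    """Split a chunk into fixed-size blocks and compress each one independently."""
    return [zlib.compress(chunk[i:i + blocksize])
            for i in range(0, len(chunk), blocksize)]


def decompress_wanted(cblocks, wanted):
    """Decompress only the blocks flagged True in `wanted`; the rest stay None."""
    return [zlib.decompress(cb) if keep else None
            for cb, keep in zip(cblocks, wanted)]


def cratio(chunk, blocksize):
    """Compression ratio of a chunk for a given block size (toy measure)."""
    cblocks = compress_chunk(chunk, blocksize)
    return len(chunk) / sum(len(cb) for cb in cblocks)


# Toy chunk (assumption for illustration): 16 KiB whose byte pattern
# repeats every 256 bytes.
chunk = bytes(range(256)) * 64

cblocks = compress_chunk(chunk, 4096)  # 4 independently compressed blocks
blocks = decompress_wanted(cblocks, [False, True, False, False])

print(len(cblocks), blocks[1] == chunk[4096:8192])
print(cratio(chunk, 4096), cratio(chunk, 64))
```

With this toy data, 64-byte blocks are shorter than the repeating pattern, so the codec finds nothing to match inside a block, while 4096-byte blocks capture the repetition. Real data and real Blosc2 codecs and filters will behave differently, which is why experimenting with the actual shapes is still needed.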