Zune-inflate on 0.2.0 #24
-
Nice! I'll take a closer look shortly. A nit on the README: …
-
Since the performance is not as great as libdeflate yet, perhaps it makes sense to remove the remaining unsafe and position the library as somewhat slower than libdeflate but 100% safe Rust? It still beats miniz_oxide, which makes using zune-inflate instead of miniz_oxide a no-brainer. You can later re-add unsafe optimizations as an optional feature, once they have a bigger impact on performance; e.g. https://github.com/pseitz/lz4_flex does this. I've naively replaced the implementations of the only two unsafe functions with the appropriate methods on slices, and …
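For what it's worth, this kind of safe replacement usually ends up looking something like the sketch below. The function name and signature are hypothetical, not the actual code in zune-inflate; it just shows the general shape of swapping a pointer-based back-reference copy for safe slice methods:

```rust
// Hypothetical sketch only: copy `len` bytes from `src_pos` to `dst_pos`
// within `out`, where the regions may overlap (LZ77 back-reference copy).
fn copy_match(out: &mut [u8], src_pos: usize, dst_pos: usize, len: usize) {
    if src_pos + len <= dst_pos {
        // Non-overlapping regions: split the buffer so the source and the
        // destination can be borrowed separately, then take the memcpy fast path.
        let (src, dst) = out.split_at_mut(dst_pos);
        dst[..len].copy_from_slice(&src[src_pos..src_pos + len]);
    } else {
        // Overlapping match: copy byte by byte so freshly written output bytes
        // are re-read, which is what run-length style LZ77 matches require.
        for i in 0..len {
            out[dst_pos + i] = out[src_pos + i];
        }
    }
}
```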
-
A benchmark against flate2 with the zlib-ng backend would also be great. This is the option people usually turn to for more performance, since it's just a feature flag on the familiar flate2 crate.
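For reference, the calling code stays the same whichever backend flate2 is built with; the zlib-ng backend is selected purely through a Cargo feature (check the flate2 docs for the exact feature name in the version you use). A minimal sketch:

```rust
use flate2::read::ZlibDecoder;
use std::io::Read;

// Decompress a zlib stream with flate2; the backend (miniz_oxide, zlib, or
// zlib-ng) is chosen at build time via Cargo features, not in this code.
fn inflate_zlib(data: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    ZlibDecoder::new(data).read_to_end(&mut out)?;
    Ok(out)
}
```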
-
I believe … Is there anything else you'd like to do before publishing to crates.io? Publishing does not have to coincide with a public announcement.
-
Yes, there are some things I'd like to do before publishing: …
-
Hi, all that is left is 2. I can't do that, as I'm just a student with interesting inclinations; I was hoping for now that you might :). But ideally … I've set up criterion so that calling … Currently this is limited to the first-class decoders, i.e. jpeg, png and inflate (the things I spent my time optimizing), not ppm or qoi for now.
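For context, a criterion benchmark for the inflate path might look roughly like this. This is only a sketch: the input path is made up and the decoder calls are written from memory, so check the crate docs and the repository's own benches for the real thing:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Hypothetical benchmark sketch; the repository's real harness may differ.
fn bench_inflate(c: &mut Criterion) {
    // Placeholder corpus path: any zlib-compressed file works here.
    let data = std::fs::read("tests/data/example.zlib").unwrap();
    c.bench_function("zune-inflate/decode_zlib", |b| {
        b.iter(|| {
            let mut decoder = zune_inflate::DeflateDecoder::new(black_box(&data[..]));
            decoder.decode_zlib().unwrap()
        })
    });
}

criterion_group!(benches, bench_inflate);
criterion_main!(benches);
```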
-
AWS has a free tier for compute: https://aws.amazon.com/premiumsupport/knowledge-center/free-tier-windows-instance/ (I'm sure it allows Linux as well, which is cheaper because you don't have to pay for a Windows license). I'm not sure if the VMs created that way are noisy or not, however. The variance may come either from over-commit of host resources or from you getting different CPU platforms in different runs. At least the CPU platform variance can be mitigated by running benchmarks on both master and the proposed PR in the same run, and using the relative values in place of the absolute values. IIRC GitHub Actions isn't suitable for this specifically because the results are noisy, but that's just hearsay, I haven't verified it myself.

Google Cloud also has an always-free VM instance tier, but those don't even get an entire CPU core to themselves, which is useless for benchmarking. The same goes for Azure.

If AWS (or some other cloud) provides a suitable free tier, you can configure the VM to pull the repo and run the benchmarks on boot, and then shut down. On AWS you can trigger it e.g. through a call to AWS Lambda, which accepts an HTTP request and makes an API call to start up the VM. There may be better ways, I'm not very familiar with AWS.

I suggest investigating whether AWS allows non-shared-core VMs in the free tier, and if so, whether the benchmarking runs are at all consistent. If there's no free tier available, I suppose I could foot the bill for the infra for a while.
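To illustrate the Lambda idea: with the AWS Rust SDK, the API call that starts the VM boils down to something like the sketch below. The instance id is a placeholder, the Lambda handler and HTTP plumbing are left out, and I'm writing the SDK calls from memory, so treat it as a starting point rather than working code:

```rust
// Rough sketch: start a stopped EC2 benchmark instance using the AWS Rust SDK
// (aws-config + aws-sdk-ec2), intended to be called from a small Lambda handler.
async fn start_bench_vm() -> Result<(), aws_sdk_ec2::Error> {
    let config = aws_config::load_from_env().await;
    let client = aws_sdk_ec2::Client::new(&config);
    client
        .start_instances()
        .instance_ids("i-0123456789abcdef0") // placeholder instance id
        .send()
        .await?;
    Ok(())
}
```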
-
With all the parameters met, and https://etemesi254.github.io/posts/Zune-Benchmarks/ (which in turn leads to the benchmark page at https://etemesi254.github.io/assets/criterion/report/index.html), it follows that https://crates.io/crates/zune-inflate should exist.

PS: for exr, I suggest only enabling zlib; it saves some binary size by not including the huge CRC table that is used for gzip decoding.

Happy programming
-
@Shnatsel since you were interested: I've almost finalized zune-inflate.
The speedups over miniz_oxide are nice, but libdeflate absolutely floors us with its decode speeds.
I may spend some time tweaking the knobs to make mine faster, but initial benchmarks can be found in the README.