Optimize decoder, hash, channel paths, and prefetching. #326
Open
unquietwiki wants to merge 1 commit into phoboslab:master
Comment on lines -322 to +327
- #define QOI_COLOR_HASH(C) (C.rgba.r*3 + C.rgba.g*5 + C.rgba.b*7 + C.rgba.a*11)
+ /* Original hash: r*3 + g*5 + b*7 + a*11
+    Optimized with strength reduction: r*3 = (r<<1)+r, g*5 = (g<<2)+g, etc. */
+ #define QOI_COLOR_HASH(C) (((C.rgba.r << 1) + C.rgba.r) + \
+                            ((C.rgba.g << 2) + C.rgba.g) + \
+                            ((C.rgba.b << 3) - C.rgba.b) + \
+                            ((C.rgba.a << 3) + (C.rgba.a << 1) + C.rgba.a))
Compilers will do this for you (and likely do so better).
Author
Yeah, I was seeing something funky with the encode/decode times with the non-32-bit word conversion. I also discovered you have to stick with -O3 optimization; dropping to -O2 or to no optimization yielded worse results.
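For anyone following the thread, here is a minimal sketch that checks the arithmetic behind the two macro forms. It only shows that the multiply form and the shift-and-add form give the same value for every byte on every channel, which is why a compiler is free to pick whichever encoding it prefers. It says nothing about speed. `px_t`, `hash_mul`, `hash_shift`, and `hashes_agree` are hypothetical names for illustration, not identifiers from qoi.h.

```c
#include <assert.h>

/* Hypothetical stand-in for the pixel type; field names mirror the macros above. */
typedef struct { unsigned char r, g, b, a; } px_t;

/* Multiply form, as in the original macro. */
static int hash_mul(px_t c) {
    return c.r * 3 + c.g * 5 + c.b * 7 + c.a * 11;
}

/* Shift-and-add form, as in the proposed macro. */
static int hash_shift(px_t c) {
    return ((c.r << 1) + c.r) +
           ((c.g << 2) + c.g) +
           ((c.b << 3) - c.b) +
           ((c.a << 3) + (c.a << 1) + c.a);
}

/* Returns 1 if both forms agree for every byte value on each channel.
   The hash is a sum of per-channel terms, so checking one channel at a
   time covers all four terms. */
static int hashes_agree(void) {
    int v;
    for (v = 0; v < 256; v++) {
        px_t only_r = { 0, 0, 0, 0 };
        px_t only_g = { 0, 0, 0, 0 };
        px_t only_b = { 0, 0, 0, 0 };
        px_t only_a = { 0, 0, 0, 0 };
        only_r.r = (unsigned char)v;
        only_g.g = (unsigned char)v;
        only_b.b = (unsigned char)v;
        only_a.a = (unsigned char)v;
        if (hash_mul(only_r) != hash_shift(only_r)) return 0;
        if (hash_mul(only_g) != hash_shift(only_g)) return 0;
        if (hash_mul(only_b) != hash_shift(only_b)) return 0;
        if (hash_mul(only_a) != hash_shift(only_a)) return 0;
    }
    return 1;
}

/* Convenience wrapper: aborts if the two forms ever disagree. */
static void check_hashes(void) {
    assert(hashes_agree());
}
```

Since the results are identical, any timing difference between the two macros would come from code generation rather than from the hash itself, so comparing the generated assembly at the optimization level in use is probably the more direct test.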
Alright, so we have all this AI tooling now, and I was curious whether there were gains to be had without breaking the format. The result was very interesting. The smaller tweaks were a wash on my setup, but might show gains on other CPUs. The biggest change was using a 32-bit pass on the 4 color channels when the CPU is compatible: with that, I saw roughly a 13% increase in decode speed in a qoibench run over the standard test set, and some individual files were closer to 50% faster.
Reference setup: Ubuntu 24.04.x on WSL2; AMD Ryzen 5950X CPU; 48GB DDR4 RAM; Samsung SSD 970 Evo Plus 2TB storage
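To illustrate what a 32-bit pass over the 4 color channels can look like, here is a rough sketch. It is not the code from this PR. It assumes a 4-byte RGBA pixel struct and compares four single-byte stores against one 32-bit copy. `px_t`, `store_px_bytes`, `store_px_word`, and `stores_match` are hypothetical names. Going through `memcpy` keeps the byte order identical on any endianness and avoids unaligned-access problems.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte RGBA pixel; not the actual type from qoi.h. */
typedef struct { uint8_t r, g, b, a; } px_t;

/* Compile-time check: the word-sized copy below assumes no padding. */
typedef char px_size_check[sizeof(px_t) == 4 ? 1 : -1];

/* Per-channel path: four separate byte stores. */
static void store_px_bytes(uint8_t *dst, px_t px) {
    dst[0] = px.r;
    dst[1] = px.g;
    dst[2] = px.b;
    dst[3] = px.a;
}

/* Word path: move all four channels as one 32-bit value.
   memcpy preserves the in-memory byte order, so the output
   bytes are r, g, b, a regardless of endianness. */
static void store_px_word(uint8_t *dst, px_t px) {
    uint32_t word;
    memcpy(&word, &px, sizeof word);
    memcpy(dst, &word, sizeof word);
}

/* Returns 1 if both paths write the same four bytes. */
static int stores_match(px_t px) {
    uint8_t a[4], b[4];
    store_px_bytes(a, px);
    store_px_word(b, px);
    return memcmp(a, b, sizeof a) == 0;
}

/* Convenience wrapper: aborts if the two paths ever differ. */
static void check_stores(void) {
    px_t px = { 10, 20, 30, 40 };
    assert(stores_match(px));
}
```

Whether the word path is actually faster depends on the compiler, the optimization level, and the CPU, so it would need to be measured with qoibench on each target.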