Optimize the memory layout of the palette compressed data structure to variable length arrays to improve cache locality and reduce number of allocations #2481
Draft
IntegratedQuantum wants to merge 1 commit into PixelGuys:master from
I'm not so sure about this one. It does make the assembly much nicer and is a little more efficient (most notably more cache-efficient: the chance of having the first palette entries on the same cache line is pretty good), but I'm not too happy with how it looks, and in practice it only has a small impact on performance (I measured around 1-2%, which could just be measurement error).


I'll leave this open for now and take a look at it again in the future. The idea of having buffers after the end of the struct is pretty good in my opinion, but I think it would work nicer if we had an actual data structure for it.
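For illustration, here is a minimal C sketch of the "buffers after the end of the struct" idea. It is not the code from this PR: the `PaletteArray` type, its field names, and the restriction to bit widths of 1, 2, 4, 8, or 16 are all assumptions made for the example. The header, the palette, and the packed indices share one allocation, so the header and the first palette entries are likely to land on the same cache line.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: a small header followed, in the same allocation,
 * by `palette_len` palette entries and then the bit-packed index words. */
typedef struct {
    uint32_t count;       /* number of stored elements */
    uint16_t palette_len; /* number of palette entries */
    uint8_t  bits;        /* bits per index; assumed to be 1, 2, 4, 8 or 16 */
    uint32_t tail[];      /* palette entries, then packed index words */
} PaletteArray;

/* Number of 32-bit words needed to hold `count` indices of `bits` bits. */
static size_t index_words(uint32_t count, uint8_t bits) {
    return ((size_t)count * bits + 31) / 32;
}

/* One zero-initialized allocation for header + palette + packed indices. */
PaletteArray *palette_array_create(uint32_t count, uint16_t palette_len, uint8_t bits) {
    assert(bits == 1 || bits == 2 || bits == 4 || bits == 8 || bits == 16);
    size_t words = (size_t)palette_len + index_words(count, bits);
    PaletteArray *pa = calloc(1, sizeof *pa + words * sizeof(uint32_t));
    if (pa == NULL) return NULL;
    pa->count = count;
    pa->palette_len = palette_len;
    pa->bits = bits;
    return pa;
}

/* Store the palette index for element `i`. */
void palette_array_set_index(PaletteArray *pa, uint32_t i, uint32_t palette_index) {
    assert(i < pa->count && palette_index < pa->palette_len);
    uint32_t *words = pa->tail + pa->palette_len; /* indices start after the palette */
    uint32_t per_word = 32u / pa->bits;
    uint32_t shift = (i % per_word) * pa->bits;
    uint32_t mask = ((1u << pa->bits) - 1u) << shift;
    uint32_t *w = &words[i / per_word];
    *w = (*w & ~mask) | ((palette_index << shift) & mask);
}

/* Look up element `i`: unpack its index, then read the palette entry. */
uint32_t palette_array_get(const PaletteArray *pa, uint32_t i) {
    assert(i < pa->count);
    const uint32_t *palette = pa->tail;
    const uint32_t *words = pa->tail + pa->palette_len;
    uint32_t per_word = 32u / pa->bits;
    uint32_t shift = (i % per_word) * pa->bits;
    uint32_t index = (words[i / per_word] >> shift) & ((1u << pa->bits) - 1u);
    return palette[index];
}
```

Wrapping the offset arithmetic in a small dedicated type like this is one way to read "an actual data structure for it": callers never compute where the palette ends and the indices begin, and a single `free` releases everything.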
I also had a crazier idea: just make them comptime-known arrays using an inline switch on the bit size. I made a quick prototype of it, though, and it actually seemed slower, so I don't know. godbolt for reference
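As a rough analogue of the switch-on-bit-size idea, the C sketch below dispatches once on the runtime bit width and calls a helper with a literal width in each branch, so that after inlining the shifts and masks become constants. This is only an approximation and not the prototype mentioned above: the names `get_fixed` and `get_dispatch` are hypothetical, and it specializes the bit width rather than the array type itself. Whether it is actually faster needs measuring; the comment above reports that the prototype seemed slower.

```c
#include <stdint.h>

/* Unpack the i-th `bits`-wide value from 32-bit words. Every call site
 * below passes a literal for `bits`, so an optimizing compiler can fold
 * the division, shift and mask into constants after inlining. */
static inline uint32_t get_fixed(const uint32_t *words, uint32_t i, uint32_t bits) {
    uint32_t per_word = 32u / bits;
    return (words[i / per_word] >> ((i % per_word) * bits)) & ((1u << bits) - 1u);
}

/* Runtime dispatch to one specialized branch per supported bit width. */
uint32_t get_dispatch(const uint32_t *words, uint32_t i, uint8_t bits) {
    switch (bits) {
    case 1:  return get_fixed(words, i, 1);
    case 2:  return get_fixed(words, i, 2);
    case 4:  return get_fixed(words, i, 4);
    case 8:  return get_fixed(words, i, 8);
    case 16: return get_fixed(words, i, 16);
    default: return 0; /* unsupported width in this sketch */
    }
}
```

The cost of this approach is one copy of the access code per width plus a branch on every call, which is one plausible reason a specialized version could end up slower than a single generic path.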