
Optimize the memory layout of the palette compressed data structure to variable length arrays to improve cache locality and reduce number of allocations #2481

Draft
IntegratedQuantum wants to merge 1 commit into PixelGuys:master from IntegratedQuantum:optimize_palette_memory_layout


@IntegratedQuantum
Member

I'm not so sure about this one. It makes the assembly much nicer and is a little more efficient, most notably in terms of cache efficiency: the chance of having the first palette entries on the same cache line is pretty good. But I'm not too happy with how it looks, and in practice it only has a small impact on performance (I measured around 1-2%, which could be measurement error).
[Screenshot at 2026-01-10 17-12-07]
[Screenshot at 2026-01-10 17-54-26]
I'll leave this open for now and take another look at it in the future. The idea of having buffers after the end of the struct is pretty good in my opinion, but I think it would work better if we had an actual data structure for it.
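To illustrate the "buffers after the end of the struct" idea, here is a minimal sketch in C using a flexible array member. It is not the code from this PR: `PaletteData`, its fields, and the helper functions are hypothetical names, and it assumes bit sizes of 1, 2, 4, 8 or 16 so that a packed index never straddles a 32-bit word. The header, the palette entries, and the packed indices share one allocation, so the first palette entries sit directly behind the header in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header. The palette entries and the bit-packed indices
   are stored directly behind it in the same allocation. */
typedef struct {
    uint32_t palette_len; /* number of palette entries */
    uint32_t bit_size;    /* bits per packed index (assumed 1, 2, 4, 8 or 16) */
    uint32_t count;       /* number of stored elements */
    uint32_t storage[];   /* palette_len entries, then the packed index words */
} PaletteData;

/* One zero-initialized allocation for header, palette and packed indices. */
static PaletteData *palette_create(uint32_t palette_len, uint32_t bit_size, uint32_t count) {
    assert(bit_size == 1 || bit_size == 2 || bit_size == 4 || bit_size == 8 || bit_size == 16);
    size_t words = palette_len + ((size_t)count * bit_size + 31) / 32;
    PaletteData *p = calloc(1, sizeof(PaletteData) + words * sizeof(uint32_t));
    if (p == NULL) return NULL;
    p->palette_len = palette_len;
    p->bit_size = bit_size;
    p->count = count;
    return p;
}

/* The palette starts right after the header. */
static uint32_t *palette_entries(PaletteData *p) { return p->storage; }

/* The packed indices start right after the palette. */
static uint32_t *palette_indices(PaletteData *p) { return p->storage + p->palette_len; }

/* Store palette index `idx` for element `i`. */
static void palette_set_index(PaletteData *p, uint32_t i, uint32_t idx) {
    assert(i < p->count);
    size_t bit = (size_t)i * p->bit_size;
    uint32_t shift = (uint32_t)(bit % 32);
    uint32_t mask = (1u << p->bit_size) - 1u;
    uint32_t *word = &palette_indices(p)[bit / 32];
    *word = (*word & ~(mask << shift)) | ((idx & mask) << shift);
}

/* Look up the palette entry that element `i` refers to. */
static uint32_t palette_get(PaletteData *p, uint32_t i) {
    assert(i < p->count);
    size_t bit = (size_t)i * p->bit_size;
    uint32_t mask = (1u << p->bit_size) - 1u;
    uint32_t idx = (palette_indices(p)[bit / 32] >> (bit % 32)) & mask;
    return palette_entries(p)[idx];
}
```

A caller would create the structure with `palette_create`, fill `palette_entries(p)`, write indices with `palette_set_index`, and release everything with a single `free(p)`. The offset arithmetic in the accessor functions is the part that a dedicated data structure could hide.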

I also had a crazier idea: just make them comptime-known arrays using an inline switch on the bit size. I made a quick prototype of it, though, and it actually seemed slower, so I don't know. godbolt for reference
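As a rough sketch of what switching on the bit size looks like, here is an approximation in C. C has no comptime, so this only imitates the idea: each `case` calls a specialization in which the bit width is a literal constant, which lets the compiler fold the shifts and masks. It is not the prototype mentioned above; `unpack`, `sum_indices`, and the supported bit sizes (1, 2, 4, 8, 16) are assumptions for illustration, and the sketch makes no claim about which variant is faster.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic unpack of the i-th packed index. When `bits` is a constant at the
   call site and the call is inlined, the shift and mask become constants. */
static inline uint32_t unpack(const uint32_t *words, size_t i, uint32_t bits) {
    size_t bit = i * bits;
    return (words[bit / 32] >> (bit % 32)) & ((1u << bits) - 1u);
}

/* Generate one specialized loop per supported bit size. */
#define DEFINE_SUM(BITS)                                                      \
    static uint64_t sum_indices_##BITS(const uint32_t *words, size_t count) { \
        uint64_t sum = 0;                                                     \
        for (size_t i = 0; i < count; i++) sum += unpack(words, i, BITS);     \
        return sum;                                                           \
    }

DEFINE_SUM(1)
DEFINE_SUM(2)
DEFINE_SUM(4)
DEFINE_SUM(8)
DEFINE_SUM(16)

/* Runtime dispatch: one branch on the bit size, then a loop whose bit width
   is known at compile time. Unsupported bit sizes return 0 in this sketch. */
static uint64_t sum_indices(const uint32_t *words, size_t count, uint32_t bits) {
    switch (bits) {
        case 1:  return sum_indices_1(words, count);
        case 2:  return sum_indices_2(words, count);
        case 4:  return sum_indices_4(words, count);
        case 8:  return sum_indices_8(words, count);
        case 16: return sum_indices_16(words, count);
        default: return 0;
    }
}
```

The cost of this approach is one copy of the loop per bit size, which increases code size; whether that pays off depends on the workload.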
