Structures should not store large structures #161
Replies: 17 comments
-
Yes I love the idea of moving to 512 byte chunks to optimize for cache! This is great please if you can try and find examples in the code we can clean up. I can help rewrite in this format quicker if you can find me and others the offending spots you find it in. This is the largest codebase ive tried to contribute to a line number would be killer if you can get it. |
Beta Was this translation helpful? Give feedback.
-
Wouldn't this break ABI compatibility in some horrifying way? |
Beta Was this translation helpful? Give feedback.
-
To really make X server better, we'll have to break ABI anyway. Otherwise, what's the point fork if it's the same as Xorg Foundation. |
Beta Was this translation helpful? Give feedback.
-
That would require much more allocations (and additional bookkeeping logic). And we need to be very careful about not breaking ABI. |
Beta Was this translation helpful? Give feedback.
-
in theory, I can just give you changes that don't break ABI. And for yourself and those who are interested in more optimized but poorly compatible X server with old software, make separate fork, name it roughly so I don't have enough old X software, it's more important for me to reduce system-wide CPU cache misses on my workstation server cluster. I must say right away that cluster on Gentoo, but they are developing and contributing with Debian. |
Beta Was this translation helpful? Give feedback.
-
@GermanAizek As for the abi problem, like I said on your other pr, if we become abi-incompatible with Xorg, then nvidia will have to |
Beta Was this translation helpful? Give feedback.
-
Alan Cooper, he said that there are few performance tests in the x server and there is only x11test. Good option is to run perf profiler and valgrind. @stefan11111, This kind of optimization just physically couldn't be worse for 64-bit CPU (most common arch). My favorite computer science writer:Best real linux kernel example align structures for 64-bit and reduce size structures for cache-like:https://www.phoronix.com/news/Linux-6.8-Networking https://www.youtube.com/watch?v=qo1FFNUVB-Q My alignment PRs with benchmarks:https://code.videolan.org/videolan/dav1d/-/merge_requests/1788 |
Beta Was this translation helpful? Give feedback.
-
@stefan11111 |
Beta Was this translation helpful? Give feedback.
-
It is not always a good thing to replace large structs with pointers.
The second one is way slower that the first one. |
Beta Was this translation helpful? Give feedback.
-
The first structure should be 64 bytes, it refers, if necessary, to a large one, or vice versa, frequently used fields refers, if necessary, to a small 64 byte one. Also, when accessing, spending CPU cycles on dereferencing ptr, if there is frequent access, it can do harm. By default, prefer nested structures (by value). It's easier, safer, and often faster due to data locality. Go to the pointers if you have a clear reason:
In other words, it is better to use a pointer for rarely used structures, but divide the frequently used ones into structures that are multiples of 64 bytes. As close to size 64 bytes as possible. |
Beta Was this translation helpful? Give feedback.
-
If you're talking about this issue, it won't work blindly, like aligning structures. That's why I created |
Beta Was this translation helpful? Give feedback.
-
for these changes, which share structures with pointer access, it is desirable to have a good benchmark aka new X11Test. |
Beta Was this translation helpful? Give feedback.
-
@stefan11111, |
Beta Was this translation helpful? Give feedback.
-
Hello @GermanAizek, thank you for reporting the issue. Could you please update the description using the template given in #131 ? Please add your content in the template fields enclosed with Could you please mark the checkboxes in the Extra fields section too by replacing the space Thanks for your help. |
Beta Was this translation helpful? Give feedback.
-
@GermanAizek Could you please please pay special attention to the aspect of "API/ABI stability" in the "Additional Information" section when updating the issue description? This must be very well thought out. |
Beta Was this translation helpful? Give feedback.
-
I'm moving this to X11Libre Ideas · Discussions · GitHub since there are still some open questions. Also the real value should be shown by numbers and outweight the risk of breaking things significantly. |
Beta Was this translation helpful? Give feedback.
-
This will be closed because the reporter did not respond. Please comment on this of you feel the need to reopen this discussion. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Structures should not store large structures, structures larger than 64 bytes should be converted to 8 byte pointer.
Modern CPUs rely heavily on caching to speed up memory access. When a large struct is loaded into the CPU cache, it occupies a significant portion of the cache line. If you only need to access a small part of that struct, or if the relevant data is scattered across many large structs, it can lead to more cache misses and slower performance. By storing pointers, the parent struct is small and can be more easily kept within the CPU cache. The actual data pointed to can be fetched only when needed, potentially leading to better cache utilization for frequently accessed data.
Example:
Unoptimized structure
Optimized CPU Cache-like structure
Beta Was this translation helpful? Give feedback.
All reactions