kernel: speed up z_stack_space_get #80391
andyross
left a comment
Looks correct, but devil's advocacy: do we actually care about this API being fast? I mean, generally this is going to be some kind of debug or auditing hook, etc... No one should have this kind of analysis on a hot path. I guess my gut says that this is just wasting code bytes for no value?
We do, which is the impetus for this change. Our systems monitor thread stack usage at a relatively high frequency with a lot of threads present.
npitre
left a comment
This could be improved further for 64-bit architectures simply by using
an unsigned long rather than a uint32_t. The marker should then be
(unsigned long)0xaaaaaaaaaaaaaaaa with the explicit cast so as not to cause
a compiler warning on 32-bit systems.
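
A minimal sketch of the marker suggested above, assuming the 0xaa fill byte implied by the comment. The names `STACK_FILL_BYTE`, `STACK_FILL_WORD`, `fill_word_from_byte` and `word_is_untouched` are hypothetical and not taken from the PR.

```c
#include <limits.h>

/* Hypothetical names; the 0xaa fill byte is assumed from the comment above. */
#define STACK_FILL_BYTE 0xaaUL

/* The cast truncates the constant to 0xaaaaaaaa where unsigned long is
 * 32 bits wide and keeps all 64 bits where it is 64 bits wide. */
#define STACK_FILL_WORD ((unsigned long)0xaaaaaaaaaaaaaaaa)

/* Width-independent way to build the same pattern:
 * ULONG_MAX / 0xff is 0x0101...01, so multiplying by the fill byte
 * replicates that byte into every byte of the word. */
unsigned long fill_word_from_byte(void)
{
	return ULONG_MAX / 0xffUL * STACK_FILL_BYTE;
}

/* Returns nonzero when every byte of w still holds the fill byte. */
int word_is_untouched(unsigned long w)
{
	return w == STACK_FILL_WORD;
}
```

Both forms should yield the same value on 32-bit and 64-bit targets; the cast form matches what the comment proposes.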
3262637 to
babfb68
kernel/thread.c
Outdated
Should we abstract it out as macro?
kernel/thread.c
Outdated
Since it was indicated that speed is important, I would suggest doing the unused calculation at the end of the loop. That is, do it once rather than every iteration of the loop. Maybe the compiler will do that for us, but I tend to be somewhat pessimistic about compiler smarts.
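
The suggestion above could be sketched roughly as follows, assuming a word-aligned region filled with 0xaa. `FILL_WORD` and `unused_bytes_by_word` are hypothetical names for illustration, not the code under review.

```c
#include <stddef.h>

/* Hypothetical marker; assumes unused stack memory is filled with 0xaa. */
#define FILL_WORD ((unsigned long)0xaaaaaaaaaaaaaaaa)

/* Counts untouched bytes from the lowest address of a word-aligned region.
 * The loop only advances an index; the byte count is derived once,
 * after the loop has finished. */
size_t unused_bytes_by_word(const unsigned long *stack, size_t nwords)
{
	size_t i = 0;

	while (i < nwords && stack[i] == FILL_WORD) {
		i++;
	}

	/* Single calculation: covers a fully used stack (i == 0) and a
	 * fully unused one (i == nwords) without any special case. */
	return i * sizeof(unsigned long);
}
```

Deriving the result from the final index also avoids needing a separate fix-up when no used word is found.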
kernel/thread.c
Outdated
The stack on a 64-bit platform has at least 8-byte memory alignment. Adding 4 bytes to checked_stack in the CONFIG_STACK_SENTINEL case in the preceding code block will change checked_stack to be 4-byte aligned. This has the potential to generate an alignment exception when reading a 64-bit value from a 32-bit aligned memory address. Whether such an exception would occur would be both architecture and compiler dependent.
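
One way to avoid the misaligned read described above is to scan byte by byte until the pointer is word-aligned, then scan whole words, then finish with bytes. The sketch below is an assumption-laden illustration, not the PR's code: `FILL_BYTE`, `FILL_WORD` and `stack_unused_bytes` are hypothetical names, and the buffer is assumed to be backed by word-typed storage so the word reads are valid.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; assumes unused stack memory is filled with 0xaa. */
#define FILL_BYTE 0xaaU
#define FILL_WORD ((unsigned long)0xaaaaaaaaaaaaaaaa)

size_t stack_unused_bytes(const unsigned char *start, size_t size)
{
	const unsigned char *p = start;
	const unsigned char *end = start + size;

	/* 1. Byte scan until p is aligned for unsigned long. This handles a
	 *    start address that was pushed off alignment, for example by
	 *    skipping a 4-byte sentinel on a 64-bit target. */
	while (p < end && ((uintptr_t)p % sizeof(unsigned long)) != 0) {
		if (*p != FILL_BYTE) {
			return (size_t)(p - start);
		}
		p++;
	}

	/* 2. Word scan; every read here is naturally aligned. */
	while ((size_t)(end - p) >= sizeof(unsigned long) &&
	       *(const unsigned long *)(const void *)p == FILL_WORD) {
		p += sizeof(unsigned long);
	}

	/* 3. Byte scan for the partially used word or the sub-word tail. */
	while (p < end && *p == FILL_BYTE) {
		p++;
	}

	return (size_t)(p - start);
}
```

The result is computed once from the final pointer, so the fully used and fully unused cases need no extra handling.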
babfb68 to
b1bb1c6
I'm somewhat perplexed by the latest push. What is this Then, if Why the -1 here? Especially with In other words, this is doubly broken code. Here's what I initially suggested instead:
b1bb1c6 to
8962635
@npitre I tried to do 3 things in one shot and it just scrambled everything. I updated the loop to be as you suggested.
Still not right. First, this: I don't understand the logic behind the above. Then you do: This is wrong. You are ignoring the sentinel flag presence.
@npitre I copy-pasted your exact comment and updated.
This is to account for a completely empty stack since the loop does not increment "unused" 1-word at a time and only calculates it based on where the first "used" word is found. If a stack contains no used words, then we need to update unused to be the full size of the stack.
I am not sure what you mean here; the sentinel is accounted for above the changed code.
Sorry, this makes no sense.
@peter-mitsis said:
So I changed the loop from: to: In this case, if the whole stack is unused, then unused will not be updated during the check, and must be updated after the loop to be the full size of the stack.
But if the entire stack is used, The best idiomatic way to do this is: About the sentinel issue, this is wrong: Instead, you need: or: as
8962635 to
f46d088
The z_stack_space_get call currently checks for free space at the top of the stack by checking each byte individually. This can introduce significant runtime overhead for threads which have large, mostly unused stacks. This change updates the check to first count the free space by word, and then check the sub-word unused bytes. Signed-off-by: Félix Turgeon <[email protected]>
f46d088 to
2074c00
Since there is a lot of talk about speed optimization here, it would be good to measure and report numbers.
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed; otherwise this pull request will automatically be closed in 14 days. Note that you can always re-open a closed pull request at any time.
The thing I find confusing is that almost all of the algorithms mentioned here are still `O(n)`. If we're really concerned with speed, why not perform a binary search of the stack space (minus the stack sentinel) for the first non-pattern word, potentially making adjustments for a few bytes at the end? That reduces the time-complexity down to `O(log n)`, which is at least closer to optimal. Edit: I guess because the pattern used to fill unused stack space could theoretically end up anywhere in the stack.
If we're really concerned with speed, why not perform a binary search of the stack space (minus the stack sentinel) for the first non-pattern word, potentially making adjustments for a few bytes at the end? That reduces the time-complexity down to `O(log n)` which is at least closer to optimal.
Can't do. The stack is not used fully in a contiguous manner due to
alignment requirements, etc. This means you're likely to land on still
untouched spots here and there with the original pattern in the middle
of the actually used stack space.
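
To illustrate the objection, here is a small sketch comparing a linear scan with a binary search over an array of words standing in for stack memory. All names (`FILL_WORD`, `linear_first_used`, `bsearch_first_used`) are hypothetical. The binary search assumes everything below a fill-pattern word is unused, which is exactly the assumption that breaks when an untouched hole sits inside the used region.

```c
#include <stddef.h>

/* Hypothetical marker; assumes unused stack memory is filled with 0xaa. */
#define FILL_WORD ((unsigned long)0xaaaaaaaaaaaaaaaa)

/* Reference answer: index of the first word that no longer holds the
 * fill pattern, scanning up from the lowest address. */
size_t linear_first_used(const unsigned long *w, size_t n)
{
	size_t i = 0;

	while (i < n && w[i] == FILL_WORD) {
		i++;
	}
	return i;
}

/* Binary search that is only correct if the region is strictly
 * "all pattern words, then all non-pattern words". */
size_t bsearch_first_used(const unsigned long *w, size_t n)
{
	size_t lo = 0;
	size_t hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (w[mid] == FILL_WORD) {
			lo = mid + 1; /* assumes everything below mid is unused */
		} else {
			hi = mid;
		}
	}
	return lo;
}
```

With a layout such as two untouched words, two used words, one untouched hole, then three used words, the binary search lands on the hole and reports more free space than the linear scan does.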