arch: custom _current implementation
#80716
kernel/thread.c
Outdated
what if we set up TLS for dummy threads?
Dummy threads never run, so will never access TLS. The idea is just a trick for re-using the context switch code for initial startup, you need some "thread" to "save" to when you want to start running your first thread but are still on a boot stack somewhere.
kernel/sched.c
Outdated
probably can just call arch_current_thread() here, and use the current implementation as the default when CONFIG_ARCH_HAS_CUSTOM_CURRENT_IMPL=n
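A minimal sketch of what that suggestion could look like: one `arch_current_thread()` entry point, with the existing "lock, read CPU, read current thread, unlock" path as the default when `CONFIG_ARCH_HAS_CUSTOM_CURRENT_IMPL` is not set. The types and helpers here (`struct cpu`, `cpus`, `this_cpu`, `fake_irq_lock()`, `fake_irq_unlock()`) are hypothetical stand-ins for illustration, not the real kernel definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real Zephyr structs. */
struct k_thread { int prio; };
struct cpu { struct k_thread *current; };

static struct cpu cpus[2];
static unsigned int this_cpu;   /* pretend "which CPU am I on" for the sketch */
static int irq_lock_depth;      /* pretend interrupt-lock state */

/* Hypothetical lock helpers: return a key, restore it on unlock. */
static unsigned int fake_irq_lock(void)
{
	return (unsigned int)irq_lock_depth++;
}

static void fake_irq_unlock(unsigned int key)
{
	irq_lock_depth = (int)key;
}

#ifdef CONFIG_ARCH_HAS_CUSTOM_CURRENT_IMPL
/* The architecture supplies its own, ideally single-instruction, version. */
struct k_thread *arch_current_thread(void);
#else
/* Default: lock > read CPU > read current thread > unlock. */
static inline struct k_thread *arch_current_thread(void)
{
	unsigned int key = fake_irq_lock();
	struct k_thread *t = cpus[this_cpu].current;

	fake_irq_unlock(key);
	return t;
}
#endif
```

With this shape, callers in the scheduler would not need their own `#ifdef`; only the definition changes with the Kconfig option.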
cc @srv-meta
The GP reg - makes a lot of sense for RISC-V. Thank you for the figures @ycsin.
The only thing that stuck out to me as odd was that suspending and resuming a thread seemed to take slightly longer in some graphs down below.
Please take a look at this @peter-mitsis, @andyross, @npitre @fkokosinski
arch/riscv/Kconfig
Outdated
how about Global Pointer (GP) register.
Makes it easier to read and understand what GP is...
probably also in the kconfig itself, spell it out.
I updated the prompt and the help section to spell out the 'global pointer (GP)' but kept the Kconfig name to be consistent with CONFIG_RISCV_GP
The only dummy thread that should matter here is the one that Zephyr uses during startup before switching to the main thread, no? Should Zephyr consider to possibly
Then TLS can be enabled for _current retrieval.
Did you look at the compiled code's assembly output to see what could
I do not have kernel benchmark that compares The
Updated the release & migration notes to push this closer to the finish line.
You are absolutely right, applied the suggestion, thanks.
Undecided about this yet, would this affect the I renamed the API from RFC: I wonder if I should name it as
0374596 to
7790b1a
Compare
0789cd9 to
1d6cbd8
Compare
Add the following arch-specific APIs:
- arch_curr_thread()
- arch_set_curr_thread()

which allow SMP architectures to implement a faster "get current thread pointer" than the default provided by the kernel. The 'set' function is required for the 'get' to work, more on that later.

When `CONFIG_ARCH_HAS_CUSTOM_CURRENT_IMPL` is selected, calls to `_current` & `k_sched_current_thread_query()` will be redirected to `arch_curr_thread()`, which ideally should translate into a single instruction read, avoiding the current "lock > read CPU > read current thread > unlock" path in SMP architectures and thus greatly improving the read performance.

However, since the kernel relies on a copy of the "current thread" on every CPU for certain operations (e.g. to compare the priority of the currently scheduled thread on another CPU to determine if an IPI should be sent), we can't eliminate the copy of the "current thread" (`current`) from `struct _cpu`, and therefore the kernel now has to invoke `arch_set_curr_thread()` in addition to what it has been doing. This means that it will take slightly longer (most likely one instruction write) to change the current thread pointer on the current CPU.

Signed-off-by: Yong Cong Sin <[email protected]>
Signed-off-by: Yong Cong Sin <[email protected]>
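The get/set pairing described in this commit message can be sketched roughly as follows. This is an illustrative model only: `fast_current` stands in for a per-CPU register, and `cpus`, `this_cpu`, `set_current()` and `ipi_needed()` are hypothetical names introduced for the sketch. It also assumes the usual convention that a numerically lower priority value means a higher priority.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_CPUS 2

/* Hypothetical stand-ins for kernel types; not the real Zephyr structs. */
struct k_thread { int prio; };
struct cpu { struct k_thread *current; };

static struct cpu cpus[NUM_CPUS];
static struct k_thread *fast_current[NUM_CPUS]; /* models a per-CPU register */
static unsigned int this_cpu;                   /* pretend CPU id for the sketch */

/* Fast path: a single read, no locking. */
static inline struct k_thread *arch_curr_thread(void)
{
	return fast_current[this_cpu];
}

/* The extra write the kernel now has to perform on every switch. */
static inline void arch_set_curr_thread(struct k_thread *t)
{
	fast_current[this_cpu] = t;
}

/* Hypothetical kernel-side helper: keep both copies in sync. */
static void set_current(struct k_thread *t)
{
	cpus[this_cpu].current = t; /* copy other CPUs can still inspect */
	arch_set_curr_thread(t);    /* copy the fast getter reads */
}

/* Why the per-CPU copy stays: another CPU's current thread must be readable
 * to decide whether an IPI is worth sending (assumed: lower value = higher
 * priority).
 */
static bool ipi_needed(unsigned int other_cpu, const struct k_thread *ready)
{
	return ready->prio < cpus[other_cpu].current->prio;
}
```

The fast copy is only reachable from the CPU that owns it, which is why the sketch keeps `cpus[].current` for the cross-CPU comparison.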
Implement `arch_curr_thread()` & `arch_set_curr_thread()` with the global pointer (GP) register.

Signed-off-by: Yong Cong Sin <[email protected]>
Signed-off-by: Yong Cong Sin <[email protected]>
`_current` is now functionally equal to `arch_curr_thread()`. Remove its usage in-tree and deprecate it instead of removing it outright, as it has been with us since forever.

Signed-off-by: Yong Cong Sin <[email protected]>
Signed-off-by: Yong Cong Sin <[email protected]>
1d6cbd8 to
cf825d7
Compare
andyross
left a comment
No reason not to +1, this all looks clean. One note on some better docs so our grandchildren understand what the deal is with USERSPACE and the GP register.
Aesthetically I'm a little mixed on the last patch, though. First because I'm not a big fan of giant refactoring, second because 13 bytes of extra typing starts to get beyond my annoyance threshold. But mostly it's just that on the uniprocessor systems that are Zephyr's ancestral homeland, "_current" really is just a global struct and it's helpful to remember that.
}

#ifdef CONFIG_RISCV_CURRENT_VIA_GP
register struct k_thread *__arch_current_thread __asm__("gp");
Style: while asm+register variables are a really useful trick for controlling marshalling behavior around inline assembly, I think here it might be more confusing than valuable and a more traditional asm block which reads the value directly might be clearer.
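For comparison, a sketch of the "traditional asm block" style this comment has in mind might look like the following. The helper names `current_from_gp()` and `current_to_gp()` are hypothetical, and the non-RISC-V branch is only a stand-in so the snippet builds on a host machine. On RISC-V it assumes GP is not also used for linker relaxation, which matches the `!RISCV_GP` dependency in this PR.

```c
#include <stddef.h>

/* Hypothetical stand-in; the real struct k_thread lives in the kernel. */
struct k_thread { int prio; };

#if defined(__riscv)
/* Read the current thread pointer straight out of gp. */
static inline struct k_thread *current_from_gp(void)
{
	struct k_thread *t;

	__asm__ volatile("mv %0, gp" : "=r"(t));
	return t;
}

/* Store the current thread pointer into gp (assumes gp is otherwise unused). */
static inline void current_to_gp(struct k_thread *t)
{
	__asm__ volatile("mv gp, %0" : : "r"(t));
}
#else
/* Host fallback for illustration only: a plain variable models the register. */
static struct k_thread *fake_gp;

static inline struct k_thread *current_from_gp(void)
{
	return fake_gp;
}

static inline void current_to_gp(struct k_thread *t)
{
	fake_gp = t;
}
#endif
```

Either style should compile to a single register move on RISC-V; the difference is mainly how visible the register access is to someone reading the code.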
config RISCV_CURRENT_VIA_GP
	bool "Store current thread into the global pointer (GP) register"
	depends on !RISCV_GP && !USERSPACE
Should document somewhere (maybe just right here in the help string) why USERSPACE is disallowed, and maybe even open an issue demanding that proper entry/exit protection for GP be implemented.
Created #81843
cfriedt
left a comment
🚀
I just saw the deprecation part after it was merged into main. This PR didn't include that part initially. Or did it? What is the rationale for that part? Personally I'm rather against this. Unless there is an actual downside, Again, is there a rationale for this? Otherwise I propose reverting
Yeah, it was added later, since If we revert b1def71, it would just mean that we are going to have 3 functions to get the current thread:
... to be used in the kernel's internal code.
Semantically they would be different. We traditionally used
That exists mainly as a syscall for user threads. In fact, with the gp-based
Mostly a revert of commit b1def71 ("arch: deprecate `_current`").

This commit was part of PR zephyrproject-rtos#80716 whose initial purpose was about providing an architecture specific optimization for _current. The actual deprecation was sneaked in later on without proper discussion.

The Zephyr core always used _current before and that was fine. It is quite prevalent as well and the alternative is proving rather verbose. Furthermore, as a concept, the "current thread" is not something that is necessarily architecture specific. Therefore the primary abstraction should not carry the arch_ prefix.

Hence this revert.

Signed-off-by: Nicolas Pitre <[email protected]>
Allow each architecture to implement its own way of getting the current thread, for use as the `_current` implementation.
Benchmarks
scheduler microbenchmark
Based on the upstream scheduler microbenchmark, split (and cropped) for better readability.
[Figures omitted: unpend, ready, switch, pend; tot; avg]
latency measurements
Based on the upstream latency measurements benchmark.
[Figures omitted: ISR back to interrupted thread (with and without 99.9th); ISR to another thread (with and without 99.9th); other tests, unit in ns and unit in cycles]
kernel Object Performance (sys_kernel)
Based on the upstream sys_kernel microbenchmark, split for better readability.
[Figures omitted: Semaphore, LIFO, FIFO, Stacks; Memslab]