Skip to content

Conversation

@ycsin
Copy link
Member

@ycsin ycsin commented Nov 1, 2024

Allow architecture to implement their way to get the current thread for the current implementation.


Benchmarks

scheduler microbenchmark

Based on the upstream scheduler microbenchmark, split (and cropped) for better readability

unpend, ready, switch, pend

boxplot: scheduler - each operation

tot

boxplot: scheduler - total

avg

boxplot: scheduler - average

latency measurements

Based on the upstream latency measurements benchmark

ISR back to interrupted thread

ISR back to interrupted thread

ISR back to interrupted thread (without 99.9th)

For better readability:

ISR back to interrupted thread - zoomed

ISR to another thread

ISR to another thread

ISR to another thread (without 99.9th)

For better readability:

ISR to another thread - zoomed

Other tests

unit in ns

Other tests (ns)

unit in cycles

Other tests (cycles)

kernel Object Performance (sys_kernel)

Based on the upstream sys_kernel microbenchmark, split for better readability

Semaphore, LIFO, FIFO, Stacks

Semaphore, LIFO, FIFO, Stacks result

Memslab

Memslabresult

@ycsin ycsin added area: Kernel RFC Request For Comments: want input from the community area: Architectures area: RISCV RISCV Architecture (32-bit & 64-bit) labels Nov 1, 2024
kernel/thread.c Outdated
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we set up TLS for dummy threads?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dummy threads never run, so will never access TLS. The idea is just a trick for re-using the context switch code for initial startup, you need some "thread" to "save" to when you want to start running your first thread but are still on a boot stack somewhere.

kernel/sched.c Outdated
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably can just call arch_current_thread() here, and use the current implementation as the default when CONFIG_ARCH_HAS_CUSTOM_CURRENT_IMPL=n

@ycsin ycsin marked this pull request as ready for review November 5, 2024 06:10
@ycsin ycsin changed the title arch: custom _current implementation [RFC] arch: custom _current implementation Nov 5, 2024
@ycsin
Copy link
Member Author

ycsin commented Nov 5, 2024

cc @srv-meta

cfriedt
cfriedt previously approved these changes Nov 5, 2024
Copy link
Member

@cfriedt cfriedt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GP reg - makes a lot of sense for RISC-V. Thank you for the figures @ycsin.

The only thing that stuck out to me as odd was that suspending and resuming a thread seemed to take slightly longer in some graphs down below.

Please take a look at this @peter-mitsis, @andyross, @npitre @fkokosinski

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about Global Pointer (GP) register.
Makes it easier to read and understand what GP is...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably also in the kconfig itself, spell it out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the prompt and the help section to spell out the 'global pointer (GP)' but kept the Kconfig name to be consistent with CONFIG_RISCV_GP

@akabaev
Copy link

akabaev commented Nov 5, 2024

The only dummy thread that should matter here is the one that Zephyr uses during startup before switching to main thread, no? Should zephyr consider to possibly

  • run on main thread from the start and eliminate the need for dummy thread entirely
  • maybe allocate/reserve TLS for main thread early, use it from dummy thread and then inherit it from main during the initial switch?

Then TLS can be enabled for _current retrieval.

@npitre
Copy link

npitre commented Nov 5, 2024

Did you look at the compiled code's assembly out put to see what could
make a difference? Especially with CONFIG_CURRENT_THREAD_USE_TLS=y it
should be about equivalent already. And CONFIG_RISCV_GP=y (which conflicts
with this PR) would itself add some additional performance gain.

@ycsin
Copy link
Member Author

ycsin commented Nov 5, 2024

Did you look at the compiled code's assembly out put to see what could make a difference? Especially with CONFIG_CURRENT_THREAD_USE_TLS=y it should be about equivalent already. And CONFIG_RISCV_GP=y (which conflicts with this PR) would itself add some additional performance gain.

I do not have kernel benchmark that compares CONFIG_CURRENT_THREAD_USE_TLS & CONFIG_RISCV_GP (I can probably run them tomorrow (EDIT: benchmark is here)), they weren't a clear win in our internal application benchmarks.

The CONFIG_CURRENT_THREAD_USE_TLS only affects the k_current_get() calls and is per-thread, which is different from _current (per-cpu) and isn't used much in the kernel/scheduler

@ycsin
Copy link
Member Author

ycsin commented Nov 19, 2024

Updated the release & migration notes to push this closer to the finish line.

Well... if you recompile using CONFIG_OUTPUT_DISASSEMBLY=y you'll notice that currently every _current reference is turned into an out-of-line call to z_impl_k_sched_current_thread_query. This is way suboptimal. Redefining _current to use arch_current() directly would be one way to get this feature's full potential, and most probably make the compiled code smaller too.

You are absolutely right, applied the suggestion, thanks.

And for test coverage purpose this should default to y.

Undecided about this yet, would this affect the CONFIG_RISCV_GP?


I renamed the API from arch_current_thread to arch_curr_thread to look more consistent with arch_curr_cpu


RFC: I wonder if I should name it as arch_curr_cpu_thread instead, to be more distinct from k_current_get?

k_current_get is a little tricky, but it should be fine as long as it is not used in the scheduler:

  • When CONFIG_CURRENT_THREAD_USE_TLS=n, it is equivalent to _current / k_sched_current_thread_query(), which is implemented in a per-CPU global variable, and return thread scheduled on the current CPU, this variable can be modified in the scheduler
  • Otherwise, it returns the z_tls_current, which depends on whatever thread that calls it, and cannot change after thread entry.

@ycsin ycsin force-pushed the pr/riscv-thread-in-gp branch 3 times, most recently from 0374596 to 7790b1a Compare November 19, 2024 15:24
@ycsin ycsin force-pushed the pr/riscv-thread-in-gp branch 3 times, most recently from 0789cd9 to 1d6cbd8 Compare November 21, 2024 04:23
Add the following arch-specific APIs:
- arch_curr_thread()
- arch_set_curr_thread()

which allow SMP architectures to implement a faster "get current
thread pointer" than the default provided by the kernel. The 'set'
function is required for the 'get' to work, more on that later.

When `CONFIG_ARCH_HAS_CUSTOM_CURRENT_IMPL` is selected, calls to
`_current` & `k_sched_current_thread_query()` will be redirected to
`arch_curr_thread()`, which ideally should translate into a single
instruction read, avoiding the current
"lock > read CPU > read current thread > unlock" path in SMP
architectures and thus greatly improves the read performance.

However, since the kernel relies on a copy of the "current thread"s on
every CPU for certain operations (i.e. to compare the priority of the
currently scheduled thread on another CPU to determine if IPI should be
sent), we can't eliminate the copy of "current thread" (`current`) from
the `struct _cpu` and therefore the kernel now has to invoke
`arch_set_curr_thread()` in addition to what it has been doing. This
means that it will take slightly longer (most likely one instruction
write) to change the current thread pointer on the current
CPU.

Signed-off-by: Yong Cong Sin <[email protected]>
Signed-off-by: Yong Cong Sin <[email protected]>
Implement `arch_curr_thread()` & `arch_set_curr_thread()`
with the global pointer (GP) register.

Signed-off-by: Yong Cong Sin <[email protected]>
Signed-off-by: Yong Cong Sin <[email protected]>
`_current` is now functionally equals to `arch_curr_thread()`, remove
its usage in-tree and deprecate it instead of removing it outright,
as it has been with us since forever.

Signed-off-by: Yong Cong Sin <[email protected]>
Signed-off-by: Yong Cong Sin <[email protected]>
@ycsin ycsin force-pushed the pr/riscv-thread-in-gp branch from 1d6cbd8 to cf825d7 Compare November 21, 2024 16:21
Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason not to +1, this all looks clean. One note on some better docs so our grandchildren understand what the deal is with USERSPACE and the GP register.

Aesthetically I'm a little mixed on the last patch, though. First because I'm not a big fan of giant refactoring, second because 13 bytes of extra typing starts to get beyond my annoyance threshold. But mostly it's just that on the uniprocessor systems that are Zephyr's ancestral homeland, "_current" really is just a global struct and it's helpful to remember that.

}

#ifdef CONFIG_RISCV_CURRENT_VIA_GP
register struct k_thread *__arch_current_thread __asm__("gp");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: while asm+register variables are a really useful trick for controlling marshalling behavior around inline assembly, I think here it might be more confusing than valuable and a more traditional asm block which reads the value directly might be clearer.


config RISCV_CURRENT_VIA_GP
bool "Store current thread into the global pointer (GP) register"
depends on !RISCV_GP && !USERSPACE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should document somewhere (maybe just right here in the help string) why USERSPACE is disallowed, and maybe even open an issue demanding that proper entry/exit protection for GP be implemented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #81843

Copy link
Member

@cfriedt cfriedt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@nashif nashif merged commit b1def71 into zephyrproject-rtos:main Nov 24, 2024
50 checks passed
@ycsin ycsin deleted the pr/riscv-thread-in-gp branch November 24, 2024 16:40
@npitre
Copy link

npitre commented Nov 30, 2024

I just saw the deprecation part after being merged in main.

This PR didn't include that part initially. Or did it?

What is the rationale for that part?

Personally I'm rather against this. Unless there is an actual downside,
the core of Zephyr always used _current and it was fine. It is quite
prevalent as well and the alternative is proving rather verbose.
Furthermore, as a concept, the "current thread" is not something that is
necessarily architecture specific. Having a centrally defined abstraction
also helps with instrumentation when such need comes around, etc. etc.

Again, is there a rationale for this? Otherwise I propose reverting
commit b1def71.

@ycsin
Copy link
Member Author

ycsin commented Dec 3, 2024

This PR didn't include that part initially. Or did it?

What is the rationale for that part?

Yeah, it was added later, since _current functions the same as arch_current_thread(), I thought it was good to clean that up.

If we revert b1def71, it would just mean that we are going to have 3 functions to get the current thread:

  • _current - the thread scheduled on the current CPU
  • arch_current_thread() - interchangeably with _current
  • k_current_get() - the thread from where this function is called

@npitre
Copy link

npitre commented Dec 3, 2024

  • _current - the thread scheduled on the current CPU

... to be used in the kernel's internal code.

  • arch_current_thread() - interchangeably with _current

Semantically they would be different. We traditionally used arch_* for
architecture specific implementations. In this case it doesn't necessarily
have to be an architecture specific implementation. Probably the fallback
case would be better not be called arch_current_thread() for that reason
and avoid possible confusion.

  • k_current_get() - the thread from where this function is called

That exists mainly as a syscall for user threads. In fact, with the gp-based
implementation on RISC-V, the syscall could be dispensed with too. But in
theory, those 3 variants could potentially be different in some ways.

npitre pushed a commit to npitre/zephyr that referenced this pull request Jan 8, 2025
Mostly a revert of commit b1def71 ("arch: deprecate `_current`").

This commit was part of PR zephyrproject-rtos#80716 whose initial purpose was about providing
an architecture specific optimization for _current. The actual deprecation
was sneaked in later on without proper discussion.

The Zephyr core always used _current before and that was fine. It is quite
prevalent as well and the alternative is proving rather verbose.
Furthermore, as a concept, the "current thread" is not something that is
necessarily architecture specific. Therefore the primary abstraction
should not carry the arch_ prefix.

Hence this revert.

Signed-off-by: Nicolas Pitre <[email protected]>
kartben pushed a commit that referenced this pull request Jan 10, 2025
Mostly a revert of commit b1def71 ("arch: deprecate `_current`").

This commit was part of PR #80716 whose initial purpose was about providing
an architecture specific optimization for _current. The actual deprecation
was sneaked in later on without proper discussion.

The Zephyr core always used _current before and that was fine. It is quite
prevalent as well and the alternative is proving rather verbose.
Furthermore, as a concept, the "current thread" is not something that is
necessarily architecture specific. Therefore the primary abstraction
should not carry the arch_ prefix.

Hence this revert.

Signed-off-by: Nicolas Pitre <[email protected]>
Chenhongren pushed a commit to Chenhongren/zephyr that referenced this pull request Jan 22, 2025
Mostly a revert of commit b1def71 ("arch: deprecate `_current`").

This commit was part of PR zephyrproject-rtos#80716 whose initial purpose was about providing
an architecture specific optimization for _current. The actual deprecation
was sneaked in later on without proper discussion.

The Zephyr core always used _current before and that was fine. It is quite
prevalent as well and the alternative is proving rather verbose.
Furthermore, as a concept, the "current thread" is not something that is
necessarily architecture specific. Therefore the primary abstraction
should not carry the arch_ prefix.

Hence this revert.

(cherry picked from commit 46aa671)

Original-Signed-off-by: Nicolas Pitre <[email protected]>
GitOrigin-RevId: 46aa671
Cr-Build-Id: 8726132470642320465
Cr-Build-Url: https://cr-buildbucket.appspot.com/build/8726132470642320465
Copybot-Job-Name: zephyr-main-copybot-downstream
Change-Id: I030e4cdfecfa55c2ec3b1a27b843a8c60484463e
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/zephyr/+/6167306
Reviewed-by: Jonathon Murphy <[email protected]>
Tested-by: Keith Short <[email protected]>
Reviewed-by: Keith Short <[email protected]>
Tested-by: ChromeOS Prod (Robot) <[email protected]>
Commit-Queue: Dawid Niedźwiecki <[email protected]>
masz-nordic pushed a commit to masz-nordic/zephyr that referenced this pull request Feb 13, 2025
Mostly a revert of commit b1def71 ("arch: deprecate `_current`").

This commit was part of PR zephyrproject-rtos#80716 whose initial purpose was about providing
an architecture specific optimization for _current. The actual deprecation
was sneaked in later on without proper discussion.

The Zephyr core always used _current before and that was fine. It is quite
prevalent as well and the alternative is proving rather verbose.
Furthermore, as a concept, the "current thread" is not something that is
necessarily architecture specific. Therefore the primary abstraction
should not carry the arch_ prefix.

Hence this revert.

Signed-off-by: Nicolas Pitre <[email protected]>
(cherry picked from commit 46aa671)
RICCIARDI-Adrien pushed a commit to RICCIARDI-Adrien/zephyr that referenced this pull request Jun 6, 2025
Mostly a revert of commit b1def71 ("arch: deprecate `_current`").

This commit was part of PR zephyrproject-rtos#80716 whose initial purpose was about providing
an architecture specific optimization for _current. The actual deprecation
was sneaked in later on without proper discussion.

The Zephyr core always used _current before and that was fine. It is quite
prevalent as well and the alternative is proving rather verbose.
Furthermore, as a concept, the "current thread" is not something that is
necessarily architecture specific. Therefore the primary abstraction
should not carry the arch_ prefix.

Hence this revert.

Signed-off-by: Nicolas Pitre <[email protected]>
(cherry picked from commit 46aa671)
(cherry picked from commit cf375a2)
@ycsin ycsin changed the title [RFC] arch: custom _current implementation arch: custom _current implementation Nov 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: Architectures area: Kernel area: RISCV RISCV Architecture (32-bit & 64-bit) RFC Request For Comments: want input from the community

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants