-
Notifications
You must be signed in to change notification settings - Fork 8.4k
kernel/smp: add a Kconfig to disable validation of _current_cpu
#81042
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kernel/smp: add a Kconfig to disable validation of _current_cpu
#81042
Conversation
|
This is a bit dubious to me. If you care about performance then you should |
I agree with your concern. But because of the difference this single test makes, and the test is there to make sure that |
e2f1b39 to
1bfcf5c
Compare
|
I agree with @ycsin, but for slightly different reasons. Infrastructure actually depends on general assertions enabled to catch the 1-in-a-million type bugs where it would be physically impossible to reproduce by a human in 100 man-years of time. In those situations, we only have post-mortem analysis. However, production systems also need to be as fast as reasonably possible. If sufficient testing has been done (both upstream and in pre-production or shadow tiers) to verify that SMP is bug-free (e.g. to some SLA like 99.999% of uptime), then I could see it being pretty useful to want to tune performance and eliminate this hotspot, while keeping general assertions enabled. |
kernel/Kconfig.smp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the wording of this could be changed slightly. The general default y should suffice... although I do wish we had some canned phrases for "say y here, unless you really know what you're doing" without sounding condescending.
| Use of the _current_cpu pointer cannot be done safely in a preemptible | |
| context. If a thread is preempted and migrates to another CPU, the old | |
| CPU record will be wrong. Add a validation assert to catch incorrect | |
| usages of `_current_cpu`. | |
| *If you do not know what is this, leave this selected. | |
| The _current_cpu pointer should only ever be used in non-preemptible | |
| contexts. When assertions are enabled in SMP systems, this option enables | |
| extra runtime checks to validate _current_cpu usage. |
kernel/Kconfig.smp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SMP can be enabled even with only 1 core, which is maybe not intuitive. I think it's for the situation with different RTOSes running on different cores.
| default y | |
| default y if MP_MAX_NUM_CPUS > 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertion is compiled when CONFIG_SMP=y, this Kconfig is to simply compile out that assertion completely so I guess the number of CPUs shouldn't matter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had another look in the morning and I get what you meant now - the test is necessary only when there's more than 1 CPUs, and can be disabled otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thought - maybe we should just change the CONFIG_SMP here to CONFIG_MP_MAX_NUM_CPUS > 1?
zephyr/include/zephyr/kernel_structs.h
Lines 254 to 268 in 6798064
| #ifdef CONFIG_SMP | |
| /* True if the current context can be preempted and migrated to | |
| * another SMP CPU. | |
| */ | |
| bool z_smp_cpu_mobile(void); | |
| #define _current_cpu ({ __ASSERT_NO_MSG(!z_smp_cpu_mobile()); \ | |
| arch_curr_cpu(); }) | |
| #define _current k_sched_current_thread_query() | |
| #else | |
| #define _current_cpu (&_kernel.cpus[0]) | |
| #define _current _kernel.cpus[0].current | |
| #endif |
npitre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guys, sorry, you have it wrong.
does preemption matters, if
arch_curr_cpu()is just a single instruction?
Of course it matters. What is the relevance of the returned value if one
instruction later you are preempted and moved to a different CPU? It is not
about retrieving the value itself, but more about the context in which the
value is used.
Imagine you get the current CPU pointer, then an IRQ happens with a reschedule,
and execution is resumed on another CPU, and the reason you wanted the current
CPU pointer was for getting at the current thread pointer. You'll get a pointer
for the thread that is now running on the CPU that used to be current before
preemption happened, not the new one.
The assertion ensures preemption was disabled. Removing this assertion could
allow very hard to reproduce bugs because you'd need to have the exact timing
for the race to happen, and then the buggy result would be random. So this
could go unnoticed for a long time and impossible to track.
I could see it being pretty useful to want to tune performance and
eliminate this hotspot, while keeping general assertions enabled.
In that case, either you look at what makes the assertion so costly (there
is no fundamental reason why it should be so prominent on the radar),
or you get rid of the hot spot altogether. If this is because of _current
usage then the right fix is to work on _current implementation just like
it is done elsewhere (e.g. dedicate a CPU register to it, store and retrieve
it from the TLS, etc). But weakening a safeguard for a very subtle,
unfrequent but nasty bug is not the way to go.
You'd prefer for compile time static analysis to take care of such issue?
I'm all for that as well. But this should come as a replacement not
just an open wish.
Otherwise, what will be the next hot spot on the radar? Will it justify
yet another assertion special exclusion config option? If it was done once,
it can be done twice. Where would such a trend stop?
|
Some observations about the assertion cost:
That makes for a lot of IRQ locking/unlocking. The whole thing could be One question comes to mind: is there any That's 3 instructions if you count the CSRR, the test and the branch for This kind of analysis is needed before dropping safeguards is considered. |
AFAICT, the assertion only guarantees that the context isn't preemptible at the very instant where However, it can't guarantee the relevance of _cpu_t *curr_cpu;
int key;
key = arch_irq_lock();
...
curr_cpu = _current_cpu;
...
arch_irq_unlock(key);
...
int curr_cpu_id = curr_cpu->id;
...I'm trying to make the point that the assertion test is not going to eliminate issues arising from preemption, instead it helps to make sure that the context is not preemptible as it is called, reducing racy issues is an indirect effect. But this is more like a code path/branch problem that can/should be eliminated early on with sufficient (coverage) tests, after that it should be safe to disable that runtime test to boost performance. This PR isn't really advocating patches to randomly disable runtime tests, but provide a mean to disable this usage-related assertion test in runtime after sufficient testing are done.
I must admit that I didn't try to make the test less painful, but IMHO these are two different topics that do not conflict - one tries to make the test lighter, so that less people notice it; the other one provides a choice to disable the test altogether, if it is not required anymore. That said, this PR should be merged only if we agree that this runtime assertion test is there to ensure the correctness of usage (codepath-related) because we don't have an easy way to guarantee that otherwise. But sufficient coverage testing should render it redundant and therefore safe to be disabled. |
The assert validation was added in eefd3da to ensure that `_current_cpu` is used safely. However, since `_current_cpu` is used frequently in the kernel/scheduler, especially when SMP is enabled, the assertion has a negative impact on the performance. This commit introduces a Kconfig to disable just the assert validation of the `_current_cpu`, without having to disable `CONFIG_ASSERT` entirely. Signed-off-by: Yong Cong Sin <[email protected]>
1bfcf5c to
0d18407
Compare
andyross
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO this is too complicated. "_current_cpu" is a kernel-only API. It has a validation layer around it because messing up and calling arch_curr_cpu() in contexts that can migrate is an easy mistake to make. And early in the SMP era this found real bugs.
But this code is all mature now. I don't see why we shouldn't just update all the users to use the underlying arch API directly.
Alternative framing of the argument: we had an API, it could be used dangerously so we wrapped it in a macro. But that macro has costs, so we now wrap it in a kconfig. Just undo both layers of wrapping, and if someone really wants this feature back let's let put it back as a single kconfig in the arch_curr_cpu() API.
The |
|
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. |
The assert validation was added in eefd3da to ensure that
_current_cpuis used safely. However, since_current_cpuis used frequently in the kernel/scheduler, especially when SMP is enabled, the assertion has a negative impact on the performance.This commit introduces a Kconfig to disable just the assert validation of the
_current_cpu, without having to disableCONFIG_ASSERTentirely.Benchmark
Show
latency_measure
scheduler
sys_kernel