-
Notifications
You must be signed in to change notification settings - Fork 8.2k
tests/drivers/ipm: Support ipm test case in SMP #14619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
All checks are passing now. Review history of this comment for details about previous failed status. |
Codecov Report
@@ Coverage Diff @@
## master #14619 +/- ##
==========================================
+ Coverage 52.93% 52.94% +0.01%
==========================================
Files 309 309
Lines 45251 45251
Branches 10447 10447
==========================================
+ Hits 23953 23958 +5
+ Misses 16533 16531 -2
+ Partials 4765 4762 -3
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to my comments, the git history needs to be cleaned up, we have a merge commit and 2 patches with the same commit message here. You need to interactively rebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nitpicks. Broadly though, I don't understand why this fixes what it says it fixes. How exactly was this broken in SMP? The commit message should have an explanation.
|
@andyross @andrewboie could you please help review the new code? thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nitpick about volatile, one spot in the SPIN_VALIDATE predication that definitely needs to be changed.
include/ring_buffer.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the wrong spot. There are plenty of uses for a ring buffer that don't involve contending with SMP CPUs. Instead, I think you probably want to tag the pointer to the struct volatile in the app instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hate to stick my nose in here, but is volatile even the right thing here? If the issue is that other CPUs might change the value, then I'm not convinced that we don't also need explicit storage fences. Probably not a problem on x86, but on other arches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On architectures where it matters, the fence will be provided by the synchronization primitive where we are using one, which we aren't here. It's true that IPM is sort of a special case, as it was intended for a asymmetric multiprocessing architecture (Quark SE -- one x86 and one ARC on the same memory bus) with no cache. But that's not really something we can fix with this patch.
9466d2a to
bbb6d41
Compare
|
@andyross could you please review the new changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, the fact that this is hanging on x86_64 in the CI test implies that this probably doesn't fix the bug it claims to fix. :)
Beyond that, the meat of the changes seem to make sense: you need to have locking around the ring buffer in SMP where you don't in UP, and you need to add a sleep in the test because the other CPU may try to mutate the thing before your test code is prepared for the next step. Those seem fine.
But I don't understand the removal of the (fairly clearly explained) buffer full handling at all. Why is this not introducing a failure case when the buffer fills up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why it's OK to remove this. That condition (an interrupt delivering data that fills the buffer) seems like it can still happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to use the flags(enabled and channel_disable), there is possibility that the flags are accessed by ISR and receiver thread at the same time if SMP enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The volatile is incorrect here. This delcares a volatile pointer (i.e. it tells the compiler that the memory where the local driver_data variable is stored may change and must not be optimized into a register, which doesn't make sense because no other thread is going to be mutating our stack like that). You want a pointer to a volatile struct instead:
volatile struct ipm_console_receiver_runtime_data *driver_data;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right, I didn't get the right usage of volatile in that place, and above change should be the cause that make the CI test fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as before: why is it OK to remove this? The condition and the comment explaining it (a RX interrupt is out of space in the buffer) makes sense, and I don't see anything in the patch that addresses it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sender thread will wait until there is space in ring buffer to do next send.
-
while (ring_buf_space_get(&driver_data->rb) == 0) { -
}
tests/drivers/ipm/src/ipm_dummy.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this removed? It sounds like the original test expects the busy flag to be cleared by an ISR before returning, and now it's going to return synchronously before the interrupt (or whatever) reports success.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above, sender thread will wait until there is at least one free buffer for the ring buffer.
eb588f5 to
61b3c95
Compare
|
OK, I see the spinning behavior and sort of understand. And to be fair, IPM is a mostly unused, oddball subsystem. But... that's a fairly major API change! You're changing the behavior of the code from "disable and return a synchronous error" to "spin forever if the other side doesn't read". That's not really in the spirit of "bug fix for SMP". And I don't understand why you have to do it. The same tricks that work with this code as written (it was originally for a multiprocessing setup!) should be able to work when there are extra cores on one side, with a little locking. |
Current design of ipm test doesn't consider SMP, there is the possibility that offload_irq and rx_thread run at different cpu core at the same time, it will cause state(enabled and channel_disable) disorder if two different context access them concurrently. Fixes zephyrproject-rtos#12478. Signed-off-by: Wentong Wu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I took a look today. It really does seem like this is only an SMP safety thing, and we've been working around that by disabling SMP for 1.14. We can come back to this in 1.15 and do it right.
Uh oh!
There was an error while loading. Please reload this page.