-
Notifications
You must be signed in to change notification settings - Fork 8.1k
Bluetooth: Controller: In-system profiling and CPU overhead assertions #95391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Bluetooth: Controller: In-system profiling and CPU overhead assertions #95391
Conversation
dcdb73d
to
fada168
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds in-system profiling for CPU usage measurement and ticker testing functionality to the Bluetooth controller. It includes CPU overhead assertions to verify performance within defined limits and implements ticker start/stop tests for validation.
Key changes:
- Added CPU profiling assertions with platform-specific maximum thresholds
- Implemented ticker node testing with semaphore-based synchronization
- Enhanced profiling infrastructure with getter functions and conditional compilation guards
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
subsys/bluetooth/controller/ll_sw/ull.c | Added ticker testing infrastructure and profiling output during initialization |
subsys/bluetooth/controller/ll_sw/nordic/lll/lll_prof_internal.h | Added getter functions for profiling metrics and removed conditional compilation |
subsys/bluetooth/controller/ll_sw/nordic/lll/lll_prof.c | Added CPU usage assertions, getter functions, and expanded data types for profiling |
subsys/bluetooth/controller/ll_sw/nordic/lll/lll_adv.c | Added profiling calls and new ISR handler for advertising |
subsys/bluetooth/controller/ll_sw/nordic/lll/lll.c | Added conditional profiling calls to ISR handlers |
subsys/bluetooth/controller/hci/hci.c | Changed logging method for profiling output |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
#define LLL_PROF_ASSERT(val, max) \ | ||
{ \ | ||
LL_ASSERT_MSG(((val) <= (max)), \ | ||
"%s: %u (%u), %u (%u), %u (%u), %u (%u)\n", __func__, \ | ||
HAL_TICKER_TICKS_TO_US(cputime_ticks_radio), \ | ||
LLL_PROF_RADIO_MAX_US, \ | ||
HAL_TICKER_TICKS_TO_US(cputime_ticks_lll), \ | ||
LLL_PROF_LLL_MAX_US, \ | ||
HAL_TICKER_TICKS_TO_US(cputime_ticks_ull_high), \ | ||
LLL_PROF_ULL_HIGH_MAX_US, \ | ||
HAL_TICKER_TICKS_TO_US(cputime_ticks_ull_low), \ |
Copilot
AI
Sep 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LLL_PROF_ASSERT macro accesses global variables (cputime_ticks_*) that may not be initialized when the macro is called. This creates an implicit dependency that makes the code harder to maintain and debug. Consider passing these values as parameters to make dependencies explicit.
#define LLL_PROF_ASSERT(val, max) \ | |
{ \ | |
LL_ASSERT_MSG(((val) <= (max)), \ | |
"%s: %u (%u), %u (%u), %u (%u), %u (%u)\n", __func__, \ | |
HAL_TICKER_TICKS_TO_US(cputime_ticks_radio), \ | |
LLL_PROF_RADIO_MAX_US, \ | |
HAL_TICKER_TICKS_TO_US(cputime_ticks_lll), \ | |
LLL_PROF_LLL_MAX_US, \ | |
HAL_TICKER_TICKS_TO_US(cputime_ticks_ull_high), \ | |
LLL_PROF_ULL_HIGH_MAX_US, \ | |
HAL_TICKER_TICKS_TO_US(cputime_ticks_ull_low), \ | |
#define LLL_PROF_ASSERT(val, max, radio_ticks, lll_ticks, ull_high_ticks, ull_low_ticks) \ | |
{ \ | |
LL_ASSERT_MSG(((val) <= (max)), \ | |
"%s: %u (%u), %u (%u), %u (%u), %u (%u)\n", __func__, \ | |
HAL_TICKER_TICKS_TO_US(radio_ticks), \ | |
LLL_PROF_RADIO_MAX_US, \ | |
HAL_TICKER_TICKS_TO_US(lll_ticks), \ | |
LLL_PROF_LLL_MAX_US, \ | |
HAL_TICKER_TICKS_TO_US(ull_high_ticks), \ | |
LLL_PROF_ULL_HIGH_MAX_US, \ | |
HAL_TICKER_TICKS_TO_US(ull_low_ticks), \ |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are static globals, hence it is ok as-is.
{ | ||
delta(timestamp_radio, &cputime_radio); | ||
delta_ticks(timestamp_ticks_radio, &cputime_ticks_radio); | ||
LLL_PROF_ASSERT(cputime_ticks_radio, HAL_TICKER_US_TO_TICKS(LLL_PROF_RADIO_MAX_US)); |
Copilot
AI
Sep 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LLL_PROF_ASSERT macro is called in every exit function, but the macro performs multiple HAL_TICKER_TICKS_TO_US conversions for the assertion message formatting. These conversions are expensive and only needed if the assertion fails. Consider moving the conversions inside a conditional block that only executes when the assertion would fail.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HAL_TICKER_US_TO_TICKS(LLL_PROF_RADIO_MAX_US)
is constant, expect the compiler to not do conversions at runtime.
#define LLL_PROF_RADIO_MAX_US 85 /* Max. Radio Rx/Tx ISR, O(1)*/ | ||
#define LLL_PROF_LLL_MAX_US 75 /* Max. LLL prepare, O(1) */ | ||
#define LLL_PROF_ULL_HIGH_MAX_US 260 /* Max. Radio + LLL + ULL High, O(1) */ | ||
#define LLL_PROF_ULL_LOW_MAX_US 306 /* Max. ULL Low, O(n) n is ticker nodes */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is O(n), shouldn't the value here be something like TICKER_NODE_CNT * X
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The measured value are restricted to the use case mentioned in the comments above. Other scenarios, or increased nodes are not profiled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if you have 306 for the above use case (number of nodes = 6?), then you can define this as 51 * NODES
which gives the same result, while also fitting better to the O(n)
description.
Why not do it that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not work like this. The ISRs are of different priority and pre-empt, .i.e. in the default values, Radio = LLL ISR has higher priority, that means ULL High == ULL_LOW measured account for CPU execution of Radio and/or LLL executing over their execution time :-( . Actually, ULL_LOW CPU time is more like x + 85 + 75 + (2 x ISR Latency)
= 306.
Also, TICKER_NODES
are 16 for this case based on other instances needed to schedule auxiliary transmission, receptions and timeouts. And, the CPU execution time is very depended on difference in optimization (and h/w runtime caching) with any code changes.
I will take these numbers with a pinch of salt!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x + 85 + 75 + (2 x ISR Latency) = 306.
That is far from clear from the text and descriptions here :) It may be useful to include the actual calculations behind these numbers, and how they were measured so that others can reproduce the same numbers in case something changes
fada168
to
b0f12ad
Compare
Add in-system ticker_start and ticker_stop tests. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>
ff3110f
to
c7bf037
Compare
Updated in-system profiling for CPU usage with on target measurements and added assertion checks. Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>
c7bf037
to
e648609
Compare
|
Updated in-system profiling for CPU usage with on target
measurements and added assertion checks.
Add in-system ticker_start and ticker_stop tests.
There is significant improved lower CPU use in Radio ISR (2 ticks less)and LLL ISR (3 ticks less) execution context when experimental LLL pipeline using ordered linked list is used (
nrf52833dk/nrf52833
), using:Comparing without that improvement:
2025-10-13 (with ordered linked list use for LLL pipeline):
Initial Latencies
[2025-10-13 06:08:06.536] test_ticker_cb: 0/16 at 1604878 + 0 + 37 us (ULL_LOW 26 us)
Scheduling 16 ticker nodes in the future, max. ULL_LOW CPU use: 26 us
Max. latency resolving 16 overlaps at first expiry: 37 us
Of which, ISR Latency with CPU wake up: 10 us
Rescheduling overlap
[2025-10-13 06:08:06.541] test_ticker_cb: 1/16 at 1705028 + 100150 + 24 us (ULL_LOW 141 us)
Reschedule one amongst 16 ticker nodes, max. ULL_LOW CPU use: 141 us
Max. latency resolving (16 - n) on subsequent overlaps on expiry: 24 us
Radio ISR Latency
Radio ISR Latency = LLL CPU usage? : 56 us