…MHz)

Ported from WonderMr/unleashed-firmware feat/opus-optimised and feat/cortex-m4-micro-optimizations branches.

1. Compiler: -Og → -Os for release builds (firmwareopts.scons)
   Enables -O2-level passes: inlining, dead code elimination, loop optimization, register allocation, tail call optimization, and CSE.

2. Disable heap memset on free() in release (FreeRTOSConfig.h)
   configHEAP_CLEAR_MEMORY_ON_FREE now conditional on FURI_DEBUG. Saves ~500+ memset calls/sec during active GUI/protocol work.

3. Fix calloc() to explicitly zero memory (memmgr.c)
   With optimization #2 disabling heap-clear in release, calloc() must memset(0) explicitly to guarantee zero-initialized returns.

4. Fix realloc() to copy min(old_size, new_size) bytes (memmgr.c, memmgr_heap.c/h, api_symbols.csv)
   Added memmgr_heap_get_block_size() to read usable size from Heap_4 BlockLink_t header. Also added NULL-guard on pvPortMalloc result to preserve original allocation on OOM.

5. Branch prediction hints on furi_check/assert/break (check.h)
   Added __builtin_expect(!(__e), 0) to all assertion macros. Crash code moves to end of function, hot path becomes fall-through. Affects ~2300+ call sites across the firmware.

6. SPI TX via DMA with RX drain (furi_hal_spi.c)
   furi_hal_spi_bus_tx() now delegates to DMA when scheduler is running, freeing CPU during display updates and radio TX. RX DMA channel drains into dummy byte to prevent OVR accumulation.

7. __attribute__((flatten)) on furi_get_tick() (kernel.c)
   Forces inlining of FreeRTOS wrappers at call sites, eliminating function call overhead on this very hot path.

8. __attribute__((flatten)) on hot thread functions (thread.c)
   Applied to furi_thread_get_current_id(), furi_thread_get_current(), and furi_thread_flags_get().

9. In-place vprintf for furi_string_cat_vprintf() (string.c)
   Formats directly into destination buffer at current offset instead of allocating a temporary FuriString. Eliminates malloc+format+memcpy+free per call.

10. Reduce configEXPECTED_IDLE_TIME_BEFORE_SLEEP 4 → 2 (FreeRTOSConfig.h)
    Allows FreeRTOS tickless idle to enter STOP mode more aggressively (2ms threshold instead of 4ms). Reduces average power consumption.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
perf: 10 safe performance optimizations for STM32WB55 (Cortex-M4 @ 64MHz)
What's new
10 safe performance optimizations for STM32WB55 (Cortex-M4 @ 64MHz).
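Compiler & build:
1. Compiler: -Og → -Os for release builds (firmwareopts.scons). Enables -O2-level passes: inlining, dead code elimination, loop optimization, register allocation, tail call optimization, and CSE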
Memory management (correctness + perf):
2. Disable heap memset on free() in release — configHEAP_CLEAR_MEMORY_ON_FREE is now conditional on FURI_DEBUG. Saves ~500+ memset calls/sec during active GUI/protocol work (FreeRTOSConfig.h)
3. Fix calloc() to explicitly zero memory — with heap-clear disabled in release, calloc() now does its own memset(0) to guarantee zero-initialized returns (memmgr.c)
4. Fix realloc() to copy min(old_size, new_size) bytes — original copied size (new) bytes, reading past old allocation when growing. Added memmgr_heap_get_block_size() that reads usable size from Heap_4 BlockLink_t header. Also added NULL-guard on pvPortMalloc result to preserve original allocation on OOM (memmgr.c, memmgr_heap.c/.h, api_symbols.csv)
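The calloc() and realloc() fixes in items 3 and 4 can be illustrated with a host-side sketch. It is not the firmware code: the `sketch_*` names and the `SketchHeader` layout are hypothetical stand-ins, with the header playing the role that the Heap_4 `BlockLink_t` size field plays in the PR.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-block header; a stand-in for the size kept in BlockLink_t.
 * The max_align_t member keeps the user pointer suitably aligned. */
typedef union {
    size_t usable_size;
    max_align_t align;
} SketchHeader;

static void* sketch_malloc(size_t size) {
    SketchHeader* h = malloc(sizeof(SketchHeader) + size);
    if(h == NULL) return NULL;
    h->usable_size = size;
    return h + 1; /* user data starts right after the header */
}

static size_t sketch_block_size(const void* ptr) {
    return ((const SketchHeader*)ptr - 1)->usable_size;
}

static void sketch_free(void* ptr) {
    if(ptr != NULL) free((SketchHeader*)ptr - 1);
}

/* calloc zeroes the block itself instead of relying on a clear-on-free heap. */
static void* sketch_calloc(size_t count, size_t size) {
    if(size != 0 && count > SIZE_MAX / size) return NULL; /* overflow guard */
    void* ptr = sketch_malloc(count * size);
    if(ptr != NULL) memset(ptr, 0, count * size);
    return ptr;
}

/* realloc copies min(old_size, new_size) bytes and keeps the old block on OOM. */
static void* sketch_realloc(void* ptr, size_t new_size) {
    if(ptr == NULL) return sketch_malloc(new_size);
    if(new_size == 0) {
        sketch_free(ptr);
        return NULL;
    }
    void* new_ptr = sketch_malloc(new_size);
    if(new_ptr == NULL) return NULL; /* original allocation stays valid */
    size_t old_size = sketch_block_size(ptr);
    memcpy(new_ptr, ptr, old_size < new_size ? old_size : new_size);
    sketch_free(ptr);
    return new_ptr;
}
```

Copying `new_size` bytes unconditionally would read past the end of the old block whenever the allocation grows, which is the bug item 4 describes.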
Branch prediction:
5. __builtin_expect hints on furi_check/assert/break — crash code moves to end of function, hot path becomes fall-through (0 pipeline penalty on Cortex-M4 3-stage pipeline). Affects ~2300+ call sites across the firmware (check.h)
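As a rough illustration of item 5, an assertion macro with a branch hint could look like the sketch below. `SKETCH_CHECK` and `sketch_crash` are hypothetical names, not the macros from check.h; only `__builtin_expect` itself is the real GCC/Clang builtin.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical crash handler standing in for the firmware's crash path. */
__attribute__((noreturn)) static void sketch_crash(const char* expr) {
    fprintf(stderr, "check failed: %s\n", expr);
    abort();
}

/* __builtin_expect(x, 0) tells the compiler that x is expected to be 0,
 * so the failure branch can be laid out away from the fall-through path. */
#define SKETCH_CHECK(e)                 \
    do {                                \
        if(__builtin_expect(!(e), 0)) { \
            sketch_crash(#e);           \
        }                               \
    } while(0)

static int sketch_div(int a, int b) {
    SKETCH_CHECK(b != 0); /* expected to pass on every normal call */
    return a / b;
}
```

The hint only influences code layout; the check is still evaluated on every call.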
DMA:
6. SPI TX via DMA with RX drain — furi_hal_spi_bus_tx() now delegates to DMA when scheduler is running, freeing the CPU during display updates (~1KB/frame @ 20fps) and radio TX. TX-only path sets up RX DMA channel draining into a dummy byte to prevent OVR accumulation. Polling fallback preserved for pre-scheduler context (furi_hal_spi.c)
Hot function inlining:
7. __attribute__((flatten)) on furi_get_tick() — inlines the FreeRTOS wrapper calls into the function body, removing call overhead on this hot path (kernel.c)
8. __attribute__((flatten)) on hot thread functions — applied to furi_thread_get_current_id(), furi_thread_get_current(), furi_thread_flags_get() (thread.c)
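A minimal sketch of what the flatten attribute does, using hypothetical `sketch_*` functions in place of the FreeRTOS wrappers:

```c
#include <stdint.h>

static uint32_t sketch_tick_counter = 0; /* hypothetical tick source */

/* Two thin wrapper layers, similar in spirit to an RTOS porting layer. */
static uint32_t sketch_port_get_tick(void) {
    return sketch_tick_counter;
}

static uint32_t sketch_kernel_get_tick(void) {
    return sketch_port_get_tick();
}

/* flatten asks GCC to inline, where possible, every call made inside this
 * function body, so the wrapper layers collapse into one function. */
__attribute__((flatten)) uint32_t sketch_get_tick(void) {
    return sketch_kernel_get_tick();
}
```

The attribute can only inline callees whose bodies are visible to the compiler, that is, in the same translation unit or with link-time optimization enabled. Calls into functions defined elsewhere remain ordinary calls.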
String formatting:
9. In-place vprintf for furi_string_cat_vprintf() — formats directly into destination buffer at current offset, growing only if needed. Eliminates temporary FuriString allocation (malloc + format + memcpy + free) per call (string.c)
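The in-place append in item 9 can be sketched on a plain growable buffer. `SketchString` and the `sketch_cat_*` functions are hypothetical and do not reflect the FuriString internals; the sketch only shows the technique of formatting at the current offset and growing when the output does not fit.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char* data; /* NUL-terminated, or NULL when nothing is allocated */
    size_t len; /* length without the terminator */
    size_t cap; /* allocated bytes, including the terminator */
} SketchString;

static int sketch_cat_vprintf(SketchString* s, const char* fmt, va_list args) {
    va_list retry;
    va_copy(retry, args); /* a va_list cannot be reused after vsnprintf */

    size_t avail = s->cap - s->len; /* 0 when nothing is allocated yet */
    /* First attempt: format straight into the free tail of the buffer. */
    int need = vsnprintf(avail ? s->data + s->len : NULL, avail, fmt, args);
    if(need < 0) {
        va_end(retry);
        return -1;
    }
    if((size_t)need >= avail) {
        /* Output was truncated: grow once to the exact size and format again. */
        size_t new_cap = s->len + (size_t)need + 1;
        char* grown = realloc(s->data, new_cap);
        if(grown == NULL) {
            if(s->data != NULL) s->data[s->len] = '\0'; /* undo partial write */
            va_end(retry);
            return -1;
        }
        s->data = grown;
        s->cap = new_cap;
        vsnprintf(s->data + s->len, new_cap - s->len, fmt, retry);
    }
    va_end(retry);
    s->len += (size_t)need;
    return need;
}

static int sketch_cat_printf(SketchString* s, const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int n = sketch_cat_vprintf(s, fmt, args);
    va_end(args);
    return n;
}
```

When the destination already has enough free capacity, the append completes with a single format pass and no allocation.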
Power:
10. Reduce configEXPECTED_IDLE_TIME_BEFORE_SLEEP from 4 to 2 ticks — allows FreeRTOS tickless idle to enter STOP mode more aggressively (2ms threshold instead of 4ms). Reduces average power consumption (FreeRTOSConfig.h)
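For reference, the two FreeRTOSConfig.h changes (items 2 and 10) might look roughly like the fragment below. The exact form used in the diff is not reproduced here; the fragment assumes that FURI_DEBUG is defined only in debug builds.

```c
/* Sketch only: clear freed heap memory in debug builds, skip it in release.
 * Assumes FURI_DEBUG is defined only for debug builds. */
#ifdef FURI_DEBUG
#define configHEAP_CLEAR_MEMORY_ON_FREE 1
#else
#define configHEAP_CLEAR_MEMORY_ON_FREE 0
#endif

/* Minimum expected idle time, in ticks, before tickless idle is entered. */
#define configEXPECTED_IDLE_TIME_BEFORE_SLEEP 2
```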
Verification
Checklist (For Reviewer)