perf: 6 performance optimizations for STM32WB55 (Cortex-M4 @ 64MHz) #977
Draft
WonderMr wants to merge 3 commits into DarkFlippers:dev
1. Compiler: -Og → -Os (release builds)
Replace debug-level optimization with size-optimized -Os, which
enables most -O2 passes: function inlining, dead code elimination,
loop optimization, aggressive register allocation, tail call
optimization, and common subexpression elimination.
File: site_scons/firmwareopts.scons
2. Disable heap memset on free() in release
configHEAP_CLEAR_MEMORY_ON_FREE was always 1, causing FreeRTOS
Heap_4 to memset() every freed block to zero. That helps catch
use-after-free in debug builds but is wasted work in release. It is
now conditional on FURI_DEBUG, which avoids an estimated 500+
memset() calls per second during active GUI/protocol work.
File: targets/f7/inc/FreeRTOSConfig.h
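A minimal sketch of how the conditional might look, assuming FURI_DEBUG is a macro the build system defines only for debug builds. The helper heap_clears_on_free() is hypothetical and exists only so the selected value can be inspected; the actual change in FreeRTOSConfig.h may be written differently.

```c
#include <assert.h>

/* Assumption: the build system passes -DFURI_DEBUG for debug builds
 * and leaves the macro undefined for release builds. */
#ifdef FURI_DEBUG
/* Debug: zero freed blocks so stale pointers read back as zeros. */
#define configHEAP_CLEAR_MEMORY_ON_FREE 1
#else
/* Release: skip the memset() performed on every free. */
#define configHEAP_CLEAR_MEMORY_ON_FREE 0
#endif

/* Hypothetical helper, not part of the PR: reports the selected setting. */
int heap_clears_on_free(void) {
    return configHEAP_CLEAR_MEMORY_ON_FREE;
}
```

Compiled without -DFURI_DEBUG, the helper returns 0; with it, 1.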
3. SPI TX via DMA instead of busy-wait polling
furi_hal_spi_bus_tx() polled the TXE flag byte by byte, keeping the
CPU in a tight loop for the entire SPI transfer. It now delegates to
furi_hal_spi_bus_trx_dma(), which uses DMA2_Channel7 and sleeps on a
FreeRTOS semaphore, freeing the CPU during display updates
(~1 KB/frame @ 20 fps) and radio TX operations.
File: targets/f7/furi_hal/furi_hal_spi.c
4. Fix realloc() to copy min(old_size, new_size)
The original realloc() copied `size` (the new size) bytes from the
old block via memcpy, which reads past the end of the old allocation
when growing. Added memmgr_heap_get_block_size(), which reads the
usable size from the Heap_4 BlockLink_t header. realloc() now copies
min(old_size, new_size) bytes, fixing potential UB and avoiding
unnecessary copying.
Files: furi/core/memmgr.c, furi/core/memmgr_heap.c/.h,
targets/f7/api_symbols.csv
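The copy rule described above can be sketched on a host machine with a toy allocator that stores the usable size in a header in front of each block. Every sketch_* name is hypothetical: the header stands in for Heap_4's BlockLink_t, sketch_block_size() stands in for memmgr_heap_get_block_size(), and the size == 0 behaviour is an assumption of this sketch rather than something the PR text states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy per-block header. The union pads it so the payload that follows
 * keeps the strictest fundamental alignment. */
typedef union {
    size_t usable_size;
    max_align_t align;
} sketch_header_t;

void* sketch_malloc(size_t size) {
    sketch_header_t* header = malloc(sizeof(sketch_header_t) + size);
    if(header == NULL) return NULL;
    header->usable_size = size;
    return header + 1; /* payload starts right after the header */
}

void sketch_free(void* ptr) {
    if(ptr != NULL) free((sketch_header_t*)ptr - 1);
}

/* Reads the usable size back from the header in front of the payload. */
size_t sketch_block_size(const void* ptr) {
    return ((const sketch_header_t*)ptr - 1)->usable_size;
}

void* sketch_realloc(void* ptr, size_t size) {
    if(ptr == NULL) return sketch_malloc(size);
    if(size == 0) { /* assumption: treat as free */
        sketch_free(ptr);
        return NULL;
    }
    void* fresh = sketch_malloc(size);
    if(fresh == NULL) return NULL; /* on OOM the old block stays valid */
    size_t old_size = sketch_block_size(ptr);
    /* Copy only what both blocks can hold: never read past the old
     * block when growing, never write past the new one when shrinking. */
    memcpy(fresh, ptr, old_size < size ? old_size : size);
    sketch_free(ptr);
    return fresh;
}
```

Growing a 4-byte block to 16 bytes copies 4 bytes; shrinking it to 2 copies 2. A version that always copies the new size would read 12 bytes beyond the old block in the growing case.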
5. Fix calloc() to explicitly zero memory
Original calloc() just called pvPortMalloc() without memset,
relying on configHEAP_CLEAR_MEMORY_ON_FREE=1 for zero-initialized
returns. With optimization #2 disabling that in release, calloc()
would return uninitialized memory. Added explicit memset(0).
File: furi/core/memmgr.c
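As a rough host-side illustration of the fix, the sketch below zeroes the block explicitly instead of relying on the allocator having cleared it on a previous free. It uses the standard malloc() in place of pvPortMalloc(), the name sketch_calloc is hypothetical, and the multiplication overflow guard is an addition of this sketch that the PR text does not mention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

void* sketch_calloc(size_t count, size_t size) {
    /* Assumption of this sketch: refuse requests whose total size
     * would overflow size_t instead of allocating a short block. */
    if(size != 0 && count > SIZE_MAX / size) return NULL;

    size_t total = count * size;
    void* ptr = malloc(total); /* stand-in for pvPortMalloc() */
    /* Zero explicitly: the allocator may hand back recycled memory
     * that still holds old contents. */
    if(ptr != NULL) memset(ptr, 0, total);
    return ptr;
}
```

With the explicit memset, the result no longer depends on whether freed blocks are cleared, so the debug and release heap configurations behave the same from the caller's point of view.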
6. Branch prediction hints on furi_check/assert/break
Added __builtin_expect(!(__e), 0) to all assertion macros. This tells
GCC that the error path is cold: crash code moves to the end of the
function and the hot path becomes the fall-through, so the passing
check is a not-taken branch with no pipeline refill on the Cortex-M4
3-stage pipeline. Affects ~2300+ call sites across the firmware.
File: furi/core/check.h
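The shape of such a macro can be sketched as follows. SKETCH_UNLIKELY, SKETCH_CHECK, sketch_crash() and sketch_checked_div() are hypothetical names chosen for illustration; the real macros in check.h and the firmware's crash handler are not shown in the PR text and are likely more involved.

```c
#include <assert.h>
#include <stdlib.h>

/* __builtin_expect is a GCC/Clang builtin; fall back to the plain
 * expression on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_UNLIKELY(x) (x)
#endif

/* Hypothetical crash handler standing in for the firmware's own. */
static void sketch_crash(void) {
    abort();
}

/* The failing case is marked unlikely, so the compiler is free to place
 * the call to sketch_crash() out of the straight-line path. */
#define SKETCH_CHECK(e)                 \
    do {                                \
        if(SKETCH_UNLIKELY(!(e))) {     \
            sketch_crash();             \
        }                               \
    } while(0)

int sketch_checked_div(int a, int b) {
    SKETCH_CHECK(b != 0); /* expected to pass on the hot path */
    return a / b;
}
```

The hint changes only code layout, not behaviour: sketch_checked_div(10, 2) still returns 5, and a zero divisor still reaches the crash handler.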
Also: strncpy → strlcpy in subghz_scene_save_name.c (-Os exposed
-Werror=stringop-truncation warning).
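For readers unfamiliar with the difference: strncpy() leaves the destination unterminated when the source does not fit, which is what the stringop-truncation diagnostic warns about, while strlcpy() always terminates. Because strlcpy() is not available in every C library, the sketch below reimplements its semantics under the hypothetical name sketch_strlcpy; it is not the firmware's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* strlcpy-style bounded copy: always NUL-terminates when dst_size > 0
 * and returns strlen(src), so a result >= dst_size signals truncation. */
size_t sketch_strlcpy(char* dst, const char* src, size_t dst_size) {
    size_t src_len = strlen(src);
    if(dst_size > 0) {
        /* Leave room for the terminator. */
        size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

Copying "long_file_name" into an 8-byte buffer yields the terminated string "long_fi" and a return value of 14, which tells the caller the name was cut short.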
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
b8150e9 to 63e81d4
realloc: add NULL check on pvPortMalloc result to prevent crash and
preserve original allocation on OOM per C standard. Use
memmgr_heap_get_block_size() to copy min(old, new) bytes instead of
reading past the old allocation boundary.
SPI DMA: TX-only path now sets up RX DMA channel draining into a dummy
byte to prevent OVR accumulation on transfers >4 bytes. Pre-scheduler
fallback correctly routes to furi_hal_spi_bus_tx() for TX-only ops.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
733b117 to ce8a953