
Performance optimizations #4360

Open
WonderMr wants to merge 2 commits into flipperdevices:dev from WonderMr:dev


WonderMr commented Mar 17, 2026

What's new

10 safe performance optimizations for STM32WB55 (Cortex-M4 @ 64MHz).

Compiler & build:

1. -Og → -Os for release builds: -Os enables the -O2 passes that do not typically increase code size, including function inlining, dead code elimination, loop optimization, register allocation improvements, tail call optimization, and common subexpression elimination (site_scons/firmwareopts.scons)

Memory management (correctness + perf):
2. Disable heap memset on free() in release — configHEAP_CLEAR_MEMORY_ON_FREE is now conditional on FURI_DEBUG. Saves ~500+ memset calls/sec during active GUI/protocol work (FreeRTOSConfig.h)
3. Fix calloc() to explicitly zero memory — with heap-clear disabled in release, calloc() now does its own memset(0) to guarantee zero-initialized returns (memmgr.c)
4. Fix realloc() to copy min(old_size, new_size) bytes — original copied size (new) bytes, reading past old allocation when growing. Added memmgr_heap_get_block_size() that reads usable size from Heap_4 BlockLink_t header. Also added NULL-guard on pvPortMalloc result to preserve original allocation on OOM (memmgr.c, memmgr_heap.c/.h, api_symbols.csv)
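
Items 3 and 4 can be illustrated with a small host-side sketch. It is not the code in memmgr.c: every `demo_*` name is hypothetical, the size header stands in for the Heap_4 BlockLink_t header read by memmgr_heap_get_block_size(), and the standard `malloc`/`free` stand in for the FreeRTOS heap calls. The overflow guard in `demo_calloc` and the `new_size == 0` handling are choices made for the sketch, not claims about the firmware.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical block header: stores the usable size in front of the payload,
 * standing in for the size kept in the Heap_4 block header. */
typedef union {
    size_t size;
    max_align_t align; /* keeps the payload suitably aligned */
} DemoHeader;

static void* demo_malloc(size_t size) {
    if(size > SIZE_MAX - sizeof(DemoHeader)) return NULL;
    DemoHeader* h = malloc(sizeof(DemoHeader) + size);
    if(h == NULL) return NULL;
    h->size = size;
    return h + 1;
}

static void demo_free(void* ptr) {
    if(ptr != NULL) free((DemoHeader*)ptr - 1);
}

static size_t demo_block_size(const void* ptr) {
    return ((const DemoHeader*)ptr - 1)->size;
}

/* calloc zeroes the block itself, so it does not rely on free() clearing. */
static void* demo_calloc(size_t count, size_t size) {
    if(size != 0 && count > SIZE_MAX / size) return NULL; /* overflow guard */
    void* ptr = demo_malloc(count * size);
    if(ptr != NULL) memset(ptr, 0, count * size);
    return ptr;
}

/* realloc copies min(old_size, new_size) and keeps the old block on OOM. */
static void* demo_realloc(void* ptr, size_t new_size) {
    if(ptr == NULL) return demo_malloc(new_size);
    if(new_size == 0) {
        demo_free(ptr);
        return NULL;
    }
    void* fresh = demo_malloc(new_size);
    if(fresh == NULL) return NULL; /* original allocation is still valid */
    size_t old_size = demo_block_size(ptr);
    memcpy(fresh, ptr, old_size < new_size ? old_size : new_size);
    demo_free(ptr);
    return fresh;
}
```

Growing a 4-byte block to 8 bytes copies only the 4 bytes that exist; shrinking it to 2 copies only 2. Copying `new_size` bytes unconditionally would read past the old block in the growing case.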

Branch prediction:
5. __builtin_expect hints on furi_check/assert/break: crash code moves to the end of the function and the hot path becomes fall-through, so the passing check is a not-taken branch that needs no pipeline refill on the Cortex-M4's 3-stage pipeline. Affects roughly 2300 or more call sites across the firmware (check.h)
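
A minimal sketch of the hint pattern, assuming GCC or Clang. `DEMO_CHECK`, `demo_crash`, and `demo_div` are hypothetical names; the real macros live in check.h and halt the device instead of counting.

```c
/* Hypothetical crash hook: the real firmware halts; here it only counts. */
static int demo_crash_count = 0;

static void demo_crash(void) {
    demo_crash_count++;
}

/* Marks the failure branch as unlikely so the compiler can lay out the
 * crash call away from the fall-through path. */
#define DEMO_CHECK(expr)                   \
    do {                                   \
        if(__builtin_expect(!(expr), 0)) { \
            demo_crash();                  \
        }                                  \
    } while(0)

static int demo_div(int a, int b) {
    DEMO_CHECK(b != 0);
    return b != 0 ? a / b : 0;
}
```

The hint changes code layout only; the check still runs on every call and behaves the same whether or not the expectation holds.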

DMA:
6. SPI TX via DMA with RX drain — furi_hal_spi_bus_tx() now delegates to DMA when scheduler is running, freeing the CPU during display updates (~1KB/frame @ 20fps) and radio TX. TX-only path sets up RX DMA channel draining into a dummy byte to prevent OVR accumulation. Polling fallback preserved for pre-scheduler context (furi_hal_spi.c)

Hot function inlining:
7. __attribute__((flatten)) on furi_get_tick() — asks the compiler to inline the FreeRTOS wrappers it calls into its body (kernel.c)
8. __attribute__((flatten)) on hot thread functions — applied to furi_thread_get_current_id(), furi_thread_get_current(), furi_thread_flags_get() (thread.c)
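
A small sketch of what the flatten attribute does, assuming GCC or Clang. The `demo_*` functions are hypothetical stand-ins for a hot getter and the wrappers beneath it, not the code in kernel.c or thread.c.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the RTOS wrappers that a hot getter calls. */
static uint32_t demo_tick_counter = 0;

static uint32_t demo_port_tick(void) {
    return demo_tick_counter;
}

static uint32_t demo_kernel_tick(void) {
    return demo_port_tick();
}

/* flatten asks the compiler to inline every call made inside this function,
 * where the callee's body is visible; the function itself stays callable. */
__attribute__((flatten)) uint32_t demo_get_tick(void) {
    return demo_kernel_tick();
}
```

The attribute applies to calls inside the marked function. Callees defined in another translation unit can only be inlined if their bodies are visible to the compiler, for example through link-time optimization.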

String formatting:
9. In-place vprintf for furi_string_cat_vprintf() — formats directly into destination buffer at current offset, growing only if needed. Eliminates temporary FuriString allocation (malloc + format + memcpy + free) per call (string.c)
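
The in-place approach can be sketched with standard C only. `DemoString` and both functions are hypothetical and do not reflect the FuriString layout; the sketch only shows the idea of formatting into the free tail of the buffer and growing once when the output does not fit.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical growable string; not the FuriString layout. */
typedef struct {
    char* data; /* NUL-terminated once non-NULL */
    size_t len; /* characters in use, excluding the terminator */
    size_t cap; /* allocated bytes, including room for the terminator */
} DemoString;

/* Appends formatted text in place; returns 0 on success, -1 on failure. */
static int demo_string_cat_vprintf(DemoString* s, const char* fmt, va_list args) {
    va_list retry;
    va_copy(retry, args); /* a va_list cannot be reused after vsnprintf */

    /* First attempt: format straight into the free tail of the buffer. */
    size_t room = s->cap - s->len; /* 0 when nothing is allocated yet */
    int need = vsnprintf(room ? s->data + s->len : NULL, room, fmt, args);
    if(need < 0) {
        va_end(retry);
        return -1;
    }
    if((size_t)need >= room) {
        /* Output was truncated: grow once, then format again. */
        size_t new_cap = s->len + (size_t)need + 1;
        char* grown = realloc(s->data, new_cap);
        if(grown == NULL) {
            if(s->data != NULL) s->data[s->len] = '\0'; /* undo partial write */
            va_end(retry);
            return -1;
        }
        s->data = grown;
        s->cap = new_cap;
        vsnprintf(s->data + s->len, s->cap - s->len, fmt, retry);
    }
    va_end(retry);
    s->len += (size_t)need;
    return 0;
}

static int demo_string_cat_printf(DemoString* s, const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int result = demo_string_cat_vprintf(s, fmt, args);
    va_end(args);
    return result;
}
```

When the tail already has room, the append costs a single `vsnprintf` call and no allocation. Only a truncated first attempt pays for a `realloc` and a second format pass.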

Power:
10. Reduce configEXPECTED_IDLE_TIME_BEFORE_SLEEP from 4 to 2 ticks — allows FreeRTOS tickless idle to enter STOP mode more aggressively (2ms threshold instead of 4ms). Reduces average power consumption (FreeRTOSConfig.h)
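
The two FreeRTOSConfig.h changes (items 2 and 10) might look roughly like the fragment below. This is a sketch, not the diff itself: it assumes FURI_DEBUG is defined only in debug builds, and the actual file may test it differently.

```c
/* Assumption: FURI_DEBUG is defined only in debug builds. */
#ifdef FURI_DEBUG
#define configHEAP_CLEAR_MEMORY_ON_FREE 1 /* keep clearing freed blocks in debug */
#else
#define configHEAP_CLEAR_MEMORY_ON_FREE 0 /* skip the memset in release */
#endif

/* Minimum expected idle ticks before tickless idle is attempted.
 * FreeRTOS rejects values below 2 at compile time. */
#define configEXPECTED_IDLE_TIME_BEFORE_SLEEP 2
```

Since 2 is the lowest value FreeRTOS accepts for this setting, the threshold cannot be reduced further.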

Verification

  • Build: ./fbt COMPACT=1 DEBUG=0 — compiles without warnings/errors
  • Build: ./fbt DEBUG=1 — debug build still compiles and links
  • Boot device, navigate Settings → About — FW version shown correctly
  • Open SubGHz, NFC, IR apps — UI responsive, no hangs
  • Verify formatted strings render correctly in UI (menus, popups, dialogs)
  • SPI peripherals working: display renders, Sub-GHz TX/RX functional
  • Leave device idle 30s — confirm no increased battery drain or wake-up issues
  • Stress test: rapid app switching, menu scrolling — no crashes or visual artifacts
  • Verify calloc-dependent code (e.g. allocating zero-initialized buffers) works correctly in release build

Checklist (For Reviewer)

  • PR has description of feature/bug or link to Confluence/Jira task
  • Description contains actions to verify feature/bugfix
  • I've built this code, uploaded it to the device and verified feature/bugfix

AlZh-Mex and others added 2 commits March 17, 2026 10:32
…MHz)

Ported from WonderMr/unleashed-firmware feat/opus-optimised and
feat/cortex-m4-micro-optimizations branches.

1. Compiler: -Og → -Os for release builds (firmwareopts.scons)
   Enables -O2-level passes: inlining, dead code elimination, loop
   optimization, register allocation, tail call optimization, and CSE.

2. Disable heap memset on free() in release (FreeRTOSConfig.h)
   configHEAP_CLEAR_MEMORY_ON_FREE now conditional on FURI_DEBUG.
   Saves ~500+ memset calls/sec during active GUI/protocol work.

3. Fix calloc() to explicitly zero memory (memmgr.c)
   With optimization #2 disabling heap-clear in release, calloc()
   must memset(0) explicitly to guarantee zero-initialized returns.

4. Fix realloc() to copy min(old_size, new_size) bytes (memmgr.c,
   memmgr_heap.c/h, api_symbols.csv)
   Added memmgr_heap_get_block_size() to read usable size from
   Heap_4 BlockLink_t header. Also added NULL-guard on pvPortMalloc
   result to preserve original allocation on OOM.

5. Branch prediction hints on furi_check/assert/break (check.h)
   Added __builtin_expect(!(__e), 0) to all assertion macros. Crash
   code moves to end of function, hot path becomes fall-through.
   Affects ~2300+ call sites across the firmware.

6. SPI TX via DMA with RX drain (furi_hal_spi.c)
   furi_hal_spi_bus_tx() now delegates to DMA when scheduler is
   running, freeing CPU during display updates and radio TX. RX DMA
   channel drains into dummy byte to prevent OVR accumulation.

7. __attribute__((flatten)) on furi_get_tick() (kernel.c)
   Forces inlining of FreeRTOS wrappers at call sites, eliminating
   function call overhead on this very hot path.

8. __attribute__((flatten)) on hot thread functions (thread.c)
   Applied to furi_thread_get_current_id(), furi_thread_get_current(),
   and furi_thread_flags_get().

9. In-place vprintf for furi_string_cat_vprintf() (string.c)
   Formats directly into destination buffer at current offset instead
   of allocating a temporary FuriString. Eliminates malloc+format+
   memcpy+free per call.

10. Reduce configEXPECTED_IDLE_TIME_BEFORE_SLEEP 4 → 2 (FreeRTOSConfig.h)
    Allows FreeRTOS tickless idle to enter STOP mode more aggressively
    (2ms threshold instead of 4ms). Reduces average power consumption.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
perf: 10 safe performance optimizations for STM32WB55 (Cortex-M4 @ 64MHz)