This PR proposes three CPU optimizations:
1. precise_stepping: drop redundant previous_step_time double field
previous_step_time (double) was stored alongside previous_step_time_ticks
(uint64_t), and the two were always set in sync. The double was only ever
read in a == 0.0 comparison to detect the first step event. On the
Cortex-M4F, the FPU handles only single precision, so the double
comparison goes through the software floating-point library (~5-8
cycles). Switching to previous_step_time_ticks == 0 uses a native
two-register integer compare (1-2 cycles) and removes 8 bytes from
step_generator_state_t.
2. precise_stepping: unroll step event index insertion sort for 4 axes
step_generator_state_update_nearest_idx is called on every step event
generated in the move ISR. The previous implementation used
std::lower_bound with a lambda comparator, followed by a separate shift
loop that required an asm volatile("") barrier to keep the compiler from
emitting a memmove call for the 3-element shift.
With PS_AXIS_COUNT fixed at 4, the re-insertion into positions [1..3]
is replaced by an explicit cascade of at most 3 comparisons and
in-place shifts. This eliminates the lower_bound iterator overhead, the
lambda dispatch, and the asm volatile workaround, while producing the
same result.
3. precise_stepping: maintain current_flags incrementally per axis
update_step_generator_state_current_flags() rebuilt current_flags by
iterating all 4 axes through pointer indirection on every step event.
Only the axis whose generator was just called can have changed its
step_flags, so a full recompute is wasteful.
generate_next_step_event now updates current_flags with a single
mask-and-OR immediately after calling the per-axis step generator:
axis_mask = (X_DIR | X_ACTIVE) << axis
current_flags = (current_flags & ~axis_mask) | step_flags[axis]
The update_step_generator_state_current_flags function is removed.
The invariant that current_flags equals the OR of all generators'
step_flags is maintained throughout, because only one axis changes
per call and that change is applied inline.