⚡️ Speed up function get_mid_block_adapter by 103,174%
#139
📄 103,174% (1,031.74x) speedup for `get_mid_block_adapter` in `src/diffusers/models/controlnets/controlnet_xs.py`

⏱️ Runtime: 459 milliseconds → 445 microseconds (best of 106 runs)

📝 Explanation and details
Summary of optimizations:

- Replaced `find_largest_factor` with `find_largest_factor_fastest`: avoids redundant Python-loop and modulo work by simply looping downward from `min(number, max_factor)`; the first divisor hit is the answer (much faster for the usually small norm group sizes). A sketch of this helper follows below.
- `make_zero_conv`: removed the pointless `padding=0` argument (it is the default) for brevity.

This rewrite dramatically reduces overhead on repeated use and speeds up single calls, especially on the bottlenecked `find_largest_factor` path. No unnecessary Conv2d parameters/options are created, and the network module construction is now as fast as possible.
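As a rough illustration, here is a minimal sketch of what the faster helper could look like, assuming the goal matches the original `find_largest_factor` (return the largest divisor of `number` that does not exceed `max_factor`); the exact merged implementation is not reproduced in this report:

```python
def find_largest_factor_fastest(number: int, max_factor: int) -> int:
    """Largest divisor of `number` that is <= `max_factor` (illustrative sketch only)."""
    # Walk downward from the largest admissible candidate; the first divisor
    # encountered is necessarily the largest one, so return immediately.
    for candidate in range(min(number, max_factor), 0, -1):
        if number % candidate == 0:
            return candidate
    return 1  # only reached for non-positive inputs; 1 divides every positive number
```

The `make_zero_conv` change, by contrast, is purely cosmetic: `padding=0` is already the `torch.nn.Conv2d` default for a 1x1 kernel, so dropping the keyword leaves behavior unchanged.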
✅ Correctness verification report

🌀 Generated Regression Tests Details
To edit these changes, run `git checkout codeflash/optimize-get_mid_block_adapter-mbdrf8nu` and push.