⚡️ Speed up function zero_module by 143%
#142
Open
📄 143% (1.43x) speedup for `zero_module` in `src/diffusers/models/controlnets/controlnet_xs.py`
⏱️ Runtime: 2.74 milliseconds → 1.13 milliseconds (best of 233 runs)
📝 Explanation and details
Here’s how you can optimize the provided program.
Analysis and Ideas:
- `zero_module` is slow mostly because it loops over all parameters and calls `nn.init.zeros_` on each one.
- `nn.init.zeros_` is a simple wrapper over `torch.Tensor.zero_`, so there is no need to call it indirectly per parameter; you can call `zero_()` on each parameter instead.
- `torch.no_grad()` avoids unnecessary autograd overhead when zeroing parameters.
- Calling `zero_()` directly is both faster and idiomatic.

Here is your optimized code.
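A minimal sketch of the change described above, assuming the original function looped over `module.parameters()` and called `nn.init.zeros_` on each one. It is an illustration of the approach, not the exact diff from this PR; the `nn.Linear(4, 2)` layer is only an example input.

```python
import torch
from torch import nn


def zero_module(module: nn.Module) -> nn.Module:
    """Zero every parameter of `module` in place and return the module."""
    # no_grad() keeps the in-place writes out of autograd tracking,
    # which is required for in-place ops on leaf tensors that require grad.
    with torch.no_grad():
        for p in module.parameters():
            p.zero_()  # direct in-place zeroing, no wrapper call
    return module


# Usage: the same module object comes back with all parameters set to zero.
layer = zero_module(nn.Linear(4, 2))
all_zero = all(bool((p == 0).all()) for p in layer.parameters())
```

Entering `torch.no_grad()` once around the whole loop, rather than once per parameter, is where the saving would be expected to come from in this sketch.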
Why is this faster?
It skips the per-parameter `nn.init.zeros_` wrapper call, giving a direct, fast in-place call for each tensor.
Function return value and signature are preserved.
All logic is the same.
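As a rough check of the "same logic" claim, the sketch below zeroes two copies of one layer, one with each approach, and compares the results. `zero_via_init` and `zero_direct` are hypothetical names used only for this illustration, and the `nn.Conv2d` layer is an arbitrary example.

```python
import copy

import torch
from torch import nn


def zero_via_init(module: nn.Module) -> nn.Module:
    # Wrapper-based approach: one nn.init.zeros_ call per parameter.
    for p in module.parameters():
        nn.init.zeros_(p)
    return module


def zero_direct(module: nn.Module) -> nn.Module:
    # Direct approach: a single no_grad() context, then Tensor.zero_().
    with torch.no_grad():
        for p in module.parameters():
            p.zero_()
    return module


base = nn.Conv2d(3, 8, kernel_size=1)
a = zero_via_init(copy.deepcopy(base))
b = zero_direct(copy.deepcopy(base))

# Both copies should end up with identical (all-zero) parameters.
same = all(torch.equal(pa, pb) for pa, pb in zip(a.parameters(), b.parameters()))
```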
✅ Correctness verification report:
🌀 Generated Regression Tests Details
To edit these changes, run `git checkout codeflash/optimize-zero_module-mbduiowf` and push.