⚡️ Speed up method BnB8BitDiffusersQuantizer.update_device_map by 7%
#125
📄 7% (0.07x) speedup for `BnB8BitDiffusersQuantizer.update_device_map` in `src/diffusers/quantizers/bitsandbytes/bnb_quantizer.py`
⏱️ Runtime: 66.7 microseconds → 62.0 microseconds (best of 196 runs)
📝 Explanation and details
Here is the optimized version of your program, focused on the `update_device_map` method, which is the performance-critical code according to your line profiler results. The main bottleneck is the logging call, which formats its message string on every invocation, even though the method is often called inside inference loops. Additional speed-ups are possible when `update_device_map` is called many times with `device_map=None`.
Here's your optimized program.
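As a rough illustration of the two techniques this PR describes (not the actual diffusers code), the sketch below keeps references to the accelerator backends and guards the log call with `logger.isEnabledFor(...)`. The names `FakeBackend` and `SketchQuantizer`, the device-selection order, the `"cpu"` fallback, and the log message are all hypothetical stand-ins, so the example runs without `torch` installed.

```python
import logging

logger = logging.getLogger("quantizer_sketch")


class FakeBackend:
    """Hypothetical stand-in for an accelerator module such as torch.cuda."""

    def __init__(self, available, index=0):
        self._available = available
        self._index = index

    def is_available(self):
        return self._available

    def current_device(self):
        return self._index


class SketchQuantizer:
    """Hypothetical quantizer; only update_device_map is sketched."""

    def __init__(self, xpu, cuda):
        # Keep references once so the hot path avoids repeated
        # module-level attribute lookups.
        self._xpu = xpu
        self._cuda = cuda

    def update_device_map(self, device_map):
        # Fast path: a caller-supplied map is returned untouched.
        if device_map is not None:
            return device_map

        # Assumed selection order: XPU, then CUDA, then CPU.
        if self._xpu.is_available():
            current_device = f"xpu:{self._xpu.current_device()}"
        elif self._cuda.is_available():
            current_device = f"cuda:{self._cuda.current_device()}"
        else:
            current_device = "cpu"
        device_map = {"": current_device}

        # Only build and emit the message when INFO logging is enabled.
        if logger.isEnabledFor(logging.INFO):
            logger.info("device_map was not set; using %s", device_map)
        return device_map


quantizer = SketchQuantizer(xpu=FakeBackend(False), cuda=FakeBackend(True, 1))
print(quantizer.update_device_map(None))
```

The guard matters mostly when the message is expensive to build; with lazy `%s` formatting, as here, the logging module already defers string interpolation, so the gain from the guard alone is small.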
Summary of optimizations.
- Local references to `torch.xpu` and `torch.cuda`, to slightly speed up repeated attribute access.
- The log call is guarded with `logger.isEnabledFor(...)`, so much less computation happens inside the hot call path.

This will yield a measurable speedup in hot inference code, especially when `update_device_map(None)` is invoked frequently.
✅ Correctness verification report:
🌀 Generated Regression Tests Details
To edit these changes, run `git checkout codeflash/optimize-BnB8BitDiffusersQuantizer.update_device_map-mbdeoax3` and push.