⚡️ Speed up method HybridLinearKVPool._transfer_full_attention_id by 18% #91
📄 18% (0.18x) speedup for `HybridLinearKVPool._transfer_full_attention_id` in `python/sglang/srt/mem_cache/memory_pool.py`

⏱️ Runtime: 11.0 microseconds → 9.26 microseconds (best of 10 runs)

📝 Explanation and details
The optimization replaces a check-then-lookup pattern with a direct try/except approach for dictionary access. The original code uses `if layer_id not in self.full_attention_layer_id_mapping:` followed by a separate dictionary lookup, which results in two dictionary operations: one for the membership test and another for the value retrieval.

The optimized version uses try/except KeyError, which performs only one dictionary lookup in the success case. In Python, a single dictionary `__getitem__` call is highly optimized and faster than separate `__contains__` + `__getitem__` calls.

Key changes:
- Replaces the membership check and separate lookup with a single lookup wrapped in try/except KeyError.
- Converts `dict.keys()` to a list directly for cleaner string formatting (dict_keys views are slower to stringify).

Why it's faster:
The success path performs one dictionary lookup instead of two, and the cost of exception handling is paid only when the layer_id is missing.
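The two access patterns described above can be sketched side by side. This is a minimal illustration, not the code from `memory_pool.py`: the class `LayerIdMapper`, its method names, the `ValueError` exception type, and the message wording are all assumptions made for the example. Only the attribute name `full_attention_layer_id_mapping` comes from the PR text.

```python
import timeit


class LayerIdMapper:
    """Hypothetical stand-in for the pool (not the sglang class)."""

    def __init__(self, full_attention_layer_ids):
        # Map each global layer id to its index among the full-attention layers.
        self.full_attention_layer_id_mapping = {
            layer_id: idx for idx, layer_id in enumerate(full_attention_layer_ids)
        }

    def transfer_check_then_lookup(self, layer_id):
        # Membership test, then a separate lookup: two dict operations on success.
        if layer_id not in self.full_attention_layer_id_mapping:
            # ValueError is an assumed exception type for this sketch.
            raise ValueError(
                f"layer_id={layer_id} not in full attention layers: "
                f"{list(self.full_attention_layer_id_mapping)}"
            )
        return self.full_attention_layer_id_mapping[layer_id]

    def transfer_try_except(self, layer_id):
        # Single lookup on success; the error path runs only for missing ids.
        try:
            return self.full_attention_layer_id_mapping[layer_id]
        except KeyError:
            raise ValueError(
                f"layer_id={layer_id} not in full attention layers: "
                f"{list(self.full_attention_layer_id_mapping)}"
            ) from None


if __name__ == "__main__":
    mapper = LayerIdMapper([3, 7, 11])
    # Time only the success path, which the PR describes as the common case.
    for name in ("transfer_check_then_lookup", "transfer_try_except"):
        method = getattr(mapper, name)
        seconds = timeit.timeit(lambda: method(7), number=100_000)
        print(f"{name}: {seconds:.4f} s per 100,000 calls")
```

Absolute timings depend on the interpreter version and hardware, so the printed numbers are only a rough local check and need not match the 18% reported here. Both methods return the same index for valid ids and raise the same error for invalid ones, so the change affects only how many dictionary operations the success path performs.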
Test case performance:
The optimization particularly benefits scenarios where `_transfer_full_attention_id` is called frequently with valid layer_ids (the common case), as seen in the test cases, where invalid lookups are the minority. The single-lookup approach provides consistent performance gains across all valid access patterns.

✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-HybridLinearKVPool._transfer_full_attention_id-mh2mcnwn` and push.