⚡️ Speed up function get_cross_encoder_activation_function by 93% #99 (+13 −2)
📄 93% (0.93x) speedup for `get_cross_encoder_activation_function` in `python/sglang/srt/layers/activation.py`

⏱️ Runtime: 26.0 milliseconds → 13.5 milliseconds (best of 79 runs)

📝 Explanation and details
The optimization adds an `@lru_cache(maxsize=128)` decorator to the `resolve_obj_by_qualname` function in `sglang/utils.py`. This caching mechanism provides a 92% speedup by eliminating redundant module imports and attribute lookups.

Key optimization: The line profiler shows that `importlib.import_module(module_name)` was the primary bottleneck, consuming 96.9% of execution time in the original code. Module imports are expensive operations that involve file system access, parsing, and Python's import machinery. With caching, subsequent calls that resolve the same qualified name (like `"torch.nn.modules.activation.ReLU"`) bypass the import entirely and return the cached result.

Performance impact: The cache reduces the critical line from 94.25 ms to 32.3 ms total execution time, a 66% reduction in the most expensive operation. This is particularly effective for workloads that repeatedly request the same activation functions, as shown in the test cases where 1000+ configs use identical activation functions.
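The change itself is small. Below is a minimal sketch of the cached resolver, assuming the usual importlib-based lookup; the exact body of `resolve_obj_by_qualname` in `sglang/utils.py` may differ.

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=128)
def resolve_obj_by_qualname(qualname: str):
    """Resolve a fully qualified name such as
    'torch.nn.modules.activation.ReLU' to the object it names.

    Only the first call for a given string pays the import cost;
    later calls return the memoized object.
    """
    module_name, _, obj_name = qualname.rpartition(".")
    module = importlib.import_module(module_name)  # expensive only on a cache miss
    return getattr(module, obj_name)
```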
Why it works: The cache is safe because module imports are idempotent - importing the same module multiple times returns the same object. The maxsize=128 limit provides sufficient capacity for typical activation function variety while preventing unbounded memory growth. This optimization is most beneficial for batch processing scenarios and repeated model instantiation with common activation functions.
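To see the idempotence argument in practice, one can inspect the cache counters after repeated lookups with the sketch above. The qualified name below is only an example and assumes torch is installed.

```python
# Resolve the same activation class many times; only the first call misses the cache.
name = "torch.nn.modules.activation.ReLU"  # example qualname, assumes torch is available
cls = resolve_obj_by_qualname(name)
for _ in range(999):
    assert resolve_obj_by_qualname(name) is cls  # same cached object every time

print(resolve_obj_by_qualname.cache_info())
# e.g. CacheInfo(hits=999, misses=1, maxsize=128, currsize=1)
```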
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-get_cross_encoder_activation_function-mh2ueub9` and push.