⚡️ Speed up function `mask_tokens_randomly` by 231% #3
+20 −16
📄 231% (2.31x) speedup for `mask_tokens_randomly` in `blanc/utils.py`
⏱️ Runtime: 13.6 milliseconds → 4.12 milliseconds (best of 287 runs)

📝 Explanation and details
The optimized code achieves a 231% speedup through several key algorithmic improvements:
1. Precomputed Next Tokens List
   - Replaces repeated `tokens[idx + 1]` indexing with a single `next_tokens = tokens[1:] + ['']` precomputation
   - Avoids evaluating `'' if idx + 1 == len(tokens) else tokens[idx + 1]` in every loop iteration
   - Uses `zip(tokens, next_tokens)` for efficient paired iteration
2. List Comprehension for Token Position Selection
3. Optimized Loop Structure
   - Replaces the `while len(token_positions) > 0` loop that repeatedly mutated and copied the list
   - Iterates over `range(0, position_count, n_mask)` with slicing, avoiding expensive list resizing operations
4. Set-Based Membership Testing
   - Converts `positions_to_mask` from list to set, changing `idx in positions_to_mask` from O(n) to O(1)
5. Comprehensions Over Manual Loops
   - Builds `inputs` and `answers` with comprehensions
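The next-token precomputation and the stepped-range loop can be sketched in isolation as follows. This is a minimal illustration, not the code in `blanc/utils.py`: the helper names `pair_with_next_indexed`, `pair_with_next`, and `chunk_positions` are hypothetical, and `tokens` is assumed to be a plain list of strings.

```python
def pair_with_next_indexed(tokens):
    """Lookahead by index: one bounds check per iteration."""
    pairs = []
    for idx, token in enumerate(tokens):
        next_token = '' if idx + 1 == len(tokens) else tokens[idx + 1]
        pairs.append((token, next_token))
    return pairs


def pair_with_next(tokens):
    """Lookahead by precomputation: shift the list once, then zip."""
    # The trailing '' stands in for "no next token" after the last element.
    next_tokens = tokens[1:] + ['']
    return list(zip(tokens, next_tokens))


def chunk_positions(token_positions, n_mask):
    """Split positions into groups of at most n_mask without mutating the list."""
    position_count = len(token_positions)
    return [
        token_positions[start:start + n_mask]
        for start in range(0, position_count, n_mask)
    ]
```

Both pairing helpers return the same result; the second does the boundary handling once instead of on every iteration. `chunk_positions` reads slices from an unchanged list, so nothing is resized or copied beyond the slices themselves.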
Performance Benefits by Test Case:
The optimizations are most effective for larger token sequences where the O(1) set operations and reduced list mutations provide substantial savings.
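The effect of the set conversion can be checked with a small stand-alone comparison. This sketch assumes a simplified masking step; the helper names, the `'[MASK]'` placeholder, and the input sizes are illustrative and do not come from the pull request.

```python
import timeit


def mask_with_list(tokens, positions_to_mask, mask_token='[MASK]'):
    # `idx in positions_to_mask` scans the list: O(n) per lookup.
    return [
        mask_token if idx in positions_to_mask else token
        for idx, token in enumerate(tokens)
    ]


def mask_with_set(tokens, positions_to_mask, mask_token='[MASK]'):
    # Convert once; each lookup is then O(1) on average.
    positions = set(positions_to_mask)
    return [
        mask_token if idx in positions else token
        for idx, token in enumerate(tokens)
    ]


if __name__ == "__main__":
    tokens = [f"tok{i}" for i in range(1000)]
    positions = list(range(0, 1000, 5))  # 200 positions to mask
    for fn in (mask_with_list, mask_with_set):
        seconds = timeit.timeit(lambda: fn(tokens, positions), number=20)
        print(f"{fn.__name__}: {seconds:.4f} s for 20 calls")
```

Both functions return identical output. The gap between them should widen as the number of masked positions grows, which fits the observation that longer token sequences benefit most.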
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-mask_tokens_randomly-mh2kp1kl` and push.