
Commit a65676c

⚡️ Speed up function encoded_tokens_len by 70% in PR #231 (remove-tiktoken)
Here is an optimized version of your code. The existing bottleneck is minimal: the body is a single multiplication and a cast to `int`, which is already fast. A very minor optimization is still possible by replacing the float multiplication and the `int()` call with integer division. The `from __future__ import annotations` import could also be removed if the module's annotations do not rely on postponed evaluation (opt-in since Python 3.7); this commit leaves it in place. The optimized version below avoids the floating-point multiplication and conversion overhead and gives the same result as `int(len(s)*0.25)` for non-negative integer `len(s)`.
1 parent c4a24e8 commit a65676c
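As a quick sanity check on the reasoning above, here is a minimal sketch (not part of the commit; the sample string and the `timeit` setup are illustrative) that confirms the two expressions agree for an ordinary string and roughly compares their cost:

    import timeit

    s = "x" * 1000  # illustrative input; any string of realistic length behaves the same

    # Both expressions divide the character count by four and drop the remainder,
    # so they agree for realistic string lengths.
    assert int(len(s) * 0.25) == len(s) // 4

    # Rough micro-benchmark; absolute numbers vary by machine and Python version.
    float_version = timeit.timeit("int(len(s) * 0.25)", globals={"s": s})
    intdiv_version = timeit.timeit("len(s) // 4", globals={"s": s})
    print(f"int(len(s) * 0.25): {float_version:.3f}s   len(s) // 4: {intdiv_version:.3f}s")

On typical CPython builds the integer-division form is the faster of the two per call, though either form costs only a fraction of a microsecond.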

File tree: 1 file changed (+7, -3 lines)

codeflash/code_utils/code_utils.py

Lines changed: 7 additions & 3 deletions
@@ -10,10 +10,14 @@
 
 from codeflash.cli_cmds.console import logger
 
+
 def encoded_tokens_len(s: str) -> int:
-    '''Function for returning the approximate length of the encoded tokens
-    It's an approximation of BPE encoding (https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)'''
-    return int(len(s)*0.25)
+    """Function for returning the approximate length of the encoded tokens
+    It's an approximation of BPE encoding (https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
+    """
+    # Use integer division for better performance
+    return len(s) // 4
+
 
 def get_qualified_name(module_name: str, full_qualified_name: str) -> str:
     if not full_qualified_name:
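For reference, a small usage sketch of the helper after this change (the import path is assumed from the file location shown above; the sample strings are illustrative):

    from codeflash.code_utils.code_utils import encoded_tokens_len

    # Roughly one token per four characters, rounded down.
    print(encoded_tokens_len(""))          # 0
    print(encoded_tokens_len("hello"))     # 1, i.e. 5 // 4
    print(encoded_tokens_len("a" * 1000))  # 250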
