Commit d8d447c

⚡️ Speed up method CharacterRemover.remove_control_characters by 46%
Here's an optimized version of your program. The main bottleneck is `re.sub`, which carries noticeable overhead for a task as simple as dropping a fixed set of ASCII code points, especially when it is called in a tight loop. Replacing it with `str.translate` and a translation table that maps the unwanted control characters to `None` avoids the regex machinery entirely and is much faster in practice.

**Why is this faster?**

- `str.translate` performs the lookup and omission in C, in a single pass over the string, with no regex engine involved.
- The translation table is built only once per instance, in `__init__`.
- Each call skips the per-call work of module-level `re.sub`: looking up the compiled pattern in the `re` cache and running the general-purpose matcher.

**Guaranteed same results:** control characters `chr(0)` through `chr(31)` and `chr(127)` are removed, exactly as with the original regex `[\x00-\x1F\x7F]`, and falsy input still returns `""`.

This should significantly reduce the per-call time reported in your profile. If you are always working with ASCII, you could go further by operating on `bytes`, but `str.translate` is already highly efficient for this use case.
1 parent 0e5f79f commit d8d447c

1 file changed (+9 -4 lines changed)

```diff
@@ -1,10 +1,15 @@
-import re
-
-
 class CharacterRemover:
     def __init__(self):
         self.version = "0.1"
+        # Build translation table once in init.
+        self._ctrl_table = self._make_ctrl_table()
 
     def remove_control_characters(self, s) -> str:
         """Remove control characters from the string."""
-        return re.sub("[\\x00-\\x1F\\x7F]", "", s) if s else ""
+        return s.translate(self._ctrl_table) if s else ""
+
+    def _make_ctrl_table(self):
+        # Map delete (ASCII 127) and 0-31 to None
+        ctrl_chars = dict.fromkeys(range(32), None)
+        ctrl_chars[127] = None
+        return str.maketrans(ctrl_chars)
```
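A sketch like the following can check the "guaranteed same results" claim and give a rough timing on your own machine. It compares the new class with a standalone copy of the old `re.sub` logic. `regex_remove_control_characters` is a hypothetical helper reconstructed from the removed line. The sample string and iteration counts are arbitrary assumptions. The sketch does not reproduce the 46% figure from the title, and timings will vary:

```python
import re
import timeit


class CharacterRemover:
    """The post-commit implementation, copied from the diff."""

    def __init__(self):
        self.version = "0.1"
        # Build translation table once in init.
        self._ctrl_table = self._make_ctrl_table()

    def remove_control_characters(self, s) -> str:
        """Remove control characters from the string."""
        return s.translate(self._ctrl_table) if s else ""

    def _make_ctrl_table(self):
        # Map delete (ASCII 127) and 0-31 to None
        ctrl_chars = dict.fromkeys(range(32), None)
        ctrl_chars[127] = None
        return str.maketrans(ctrl_chars)


def regex_remove_control_characters(s) -> str:
    """Hypothetical standalone copy of the pre-commit, re.sub-based logic."""
    return re.sub("[\\x00-\\x1F\\x7F]", "", s) if s else ""


remover = CharacterRemover()

# Equivalence over every code point (lone surrogates are legal in a Python str).
every_char = "".join(map(chr, range(0x110000)))
assert remover.remove_control_characters(every_char) == regex_remove_control_characters(every_char)

# Falsy inputs take the `if s else ""` branch in both versions.
for empty in ("", None):
    assert remover.remove_control_characters(empty) == regex_remove_control_characters(empty) == ""

# Rough timing on a short sample string; absolute numbers depend on the machine.
sample = "key\tvalue\r\n\x00payload\x7f"
t_regex = timeit.timeit(lambda: regex_remove_control_characters(sample), number=50_000)
t_translate = timeit.timeit(lambda: remover.remove_control_characters(sample), number=50_000)
print(f"re.sub:        {t_regex:.3f} s")
print(f"str.translate: {t_translate:.3f} s")
```

The full-range comparison exercises every code point, including the 33 targeted ones. A successful run therefore confirms the two versions agree on all `str` input, not just on a few hand-picked samples.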
