Commit 66eab86

Speed up spellchecking by ignoring whitespace-only lines
The new API introduced extra overhead for each line being spellchecked. One way to reduce that overhead is to spellcheck fewer lines. An obvious choice is to skip empty and whitespace-only lines, since they cannot contain typos (they contain no words).

A side effect of this change is that lines are now spellchecked with trailing whitespace stripped. Semantically, this gives the same result (per "whitespace never has typos"). Performance-wise, it is faster in theory because the strings are now shorter, and it costs nothing extra since we were calling `.rstrip()` anyway. In practice, I am not sure we are going to find any real corpus where the trailing whitespace is noteworthy from a performance point of view.

On the performance corpus from #3491, this takes out ~0.4s of runtime, bringing us down to slightly above the 5.6s baseline.
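The "whitespace never has typos" argument can be checked with a small sketch. The word pattern below is a simplified, hypothetical stand-in chosen for illustration (it is not taken from this commit and may differ from the pattern codespell actually uses); the point is only that a word tokenizer yields the same words before and after `.rstrip()`, and no words at all for whitespace-only lines.

```python
import re

# Hypothetical word pattern, for illustration only.
WORD_RE = re.compile(r"[\w\-']+")


def words(line: str) -> list[str]:
    """Return the word-like tokens found in a line."""
    return WORD_RE.findall(line)


samples = ["", "   \t\n", "teh quick fox   \n", "\tindented   "]

for sample in samples:
    # Stripping trailing whitespace never changes the extracted words.
    assert words(sample) == words(sample.rstrip())

# Whitespace-only lines contain no words, so there is nothing to spellcheck.
assert words("   \t\n") == []
```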
1 parent cdde333 commit 66eab86

1 file changed: +2 -1 lines changed

codespell_lib/_codespell.py

Lines changed: 2 additions & 1 deletion
@@ -948,7 +948,8 @@ def parse_file(
     )
 
     for i, line in enumerate(lines):
-        if line.rstrip() in exclude_lines:
+        line = line.rstrip()
+        if not line or line in exclude_lines:
             continue
 
         extra_words_to_ignore = set()
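The filtering step in this hunk can be tried in isolation with a minimal sketch. The generator name `lines_to_check` and the sample data are hypothetical and exist only for this illustration; the real loop lives inline in `parse_file` and does more work per line.

```python
def lines_to_check(lines, exclude_lines):
    """Yield (index, stripped_line) for lines worth spellchecking.

    Hypothetical helper mirroring the loop header in the diff above.
    """
    for i, line in enumerate(lines):
        line = line.rstrip()
        # Skip empty or whitespace-only lines, and explicitly excluded lines.
        if not line or line in exclude_lines:
            continue
        yield i, line


lines = ["first line  \n", "\n", "   \t\n", "skip me\n", "last line"]
kept = list(lines_to_check(lines, exclude_lines={"skip me"}))
print(kept)
```

Because `enumerate` runs before the skip, each surviving line keeps its original index, so skipping lines does not shift the positions that later code sees.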
