
Commit c89507a

[fdiff.thirdparty.fdifflib] modify the autojunk strategy for larger file sizes
1 parent 93cd2c3 commit c89507a


1 file changed (+12 -1 lines)


lib/fdiff/thirdparty/fdifflib.py

Lines changed: 12 additions & 1 deletion
```diff
@@ -325,8 +325,19 @@ def __chain_b(self):
         # Purge popular elements that are not junk
         self.bpopular = popular = set()
         n = len(b)
-        if self.autojunk and n >= 200:
+
+        # Define the number of identical lines that must be included
+        # in order to consider the string "popular" for the autojunk
+        # strategy. Edited from upstream Python difflib to decrease
+        # the number of lines as file sizes become larger
+        if n >= 20000:
+            ntest = n // 10000 + 1
+        elif n >= 2000:
+            ntest = n // 1000 + 1
+        elif n >= 200:
             ntest = n // 100 + 1
+
+        if self.autojunk and n >= 200:
             for elt, idxs in b2j.items():
                 if len(idxs) > ntest:
                     popular.add(elt)
```
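The effect of the new thresholds can be checked in isolation. The following is a minimal standalone sketch, not the fdifflib code itself: `upstream_ntest`, `modified_ntest`, and `popular_elements` are hypothetical helpers written for illustration that mirror the popularity test in the diff, with `isjunk` filtering omitted. The upstream variant is cross-checked against the `bpopular` set that Python's own `difflib.SequenceMatcher` computes.

```python
import difflib
from collections import defaultdict


def upstream_ntest(n: int) -> int:
    # Upstream difflib: an element of b is "popular" if it occurs more
    # than n // 100 + 1 times (roughly 1% of the lines).
    return n // 100 + 1


def modified_ntest(n: int) -> int:
    # Hypothetical mirror of this commit's thresholds: roughly 0.01% of
    # the lines for n >= 20000 and roughly 0.1% for n >= 2000.
    if n >= 20000:
        return n // 10000 + 1
    if n >= 2000:
        return n // 1000 + 1
    # n < 2000: unchanged from upstream (only consulted when n >= 200).
    return n // 100 + 1


def popular_elements(b, ntest_fn, autojunk=True):
    """Return the elements of b that the autojunk heuristic would purge."""
    # Map each element to the positions where it occurs, like b2j in
    # SequenceMatcher (this sketch does no isjunk filtering).
    b2j = defaultdict(list)
    for i, elt in enumerate(b):
        b2j[elt].append(i)
    n = len(b)
    popular = set()
    if autojunk and n >= 200:
        ntest = ntest_fn(n)
        for elt, idxs in b2j.items():
            if len(idxs) > ntest:
                popular.add(elt)
    return popular


if __name__ == "__main__":
    # Compare the cutoffs at and around the new breakpoints.
    for n in (200, 1999, 2000, 19999, 20000, 100000):
        print(n, upstream_ntest(n), modified_ntest(n))

    # A 5,000-line file in which one closing-brace line repeats 10 times.
    b = [f"line {i}\n" for i in range(4990)] + ["}\n"] * 10
    print(popular_elements(b, upstream_ntest))  # set()
    print(popular_elements(b, modified_ntest))  # {'}\n'}

    # The upstream variant should agree with difflib's own bpopular set.
    upstream = difflib.SequenceMatcher(None, [], b).bpopular
    print(popular_elements(b, upstream_ntest) == upstream)  # True
```

In this example, a line that repeats 10 times in a 5,000-line file is classed as popular under the modified rule (cutoff 6) but not under upstream (cutoff 51), so it is removed from the `b2j` index and cannot seed a match on its own. The cutoff is also not monotonic in `n`: it falls from 20 at `n = 1999` to 3 at `n = 2000`, and from 20 at `n = 19999` to 3 at `n = 20000`.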
