sort: fix newline handling across large and/or multiple files #8746
sylvestre merged 4 commits into uutils:main
I am a bit embarrassed to admit I did not notice #8652 before submitting this PR. I am not sure how this overlaps with #8652, but I don't doubt that #8652 is also a valid approach to the problem. :) Edit: If nothing else, this PR still fixes the …
GNU testsuite comparison:
CodSpeed Performance Report: Merging #8746 will not alter performance.
Force-pushed from f2448e2 to 3139847.
Force-pushed from e04e64d to d86638e.
Test fixtures are now generated programmatically. Let me know if I should squash to remove the hardcoded fixtures from the git history.
Force-pushed from d86638e to 0f6f6d4.
If anything, this change is much leaner and easier to review than #8652 (and, at the very least, it passes CI) :)
Force-pushed from 0f6f6d4 to 15c0028.
Force-pushed from 15c0028 to 136eabe.
Well done (instead of a timeout) :) `/usr/bin/sort oneline_4G.txt` ran …
Force-pushed from 136eabe to 056e6e5.
Force-pushed from 056e6e5 to 25e269c.
When the sort utility is searching for newlines in a large buffer, skip past any previously searched data. Previously, every time the buffer grew, the search restarted from the beginning of the buffer. This caused quadratic-time overhead whenever a line was far longer than the configured buffer size (and `START_BUFFER_SIZE`).

On my M1 MacBook Pro:
This PR also fixes a separate issue: the check `last_file_target_size != leftover_len` was used to determine whether a file is non-empty. However, this could fail if the buffer had recently been resized, since `leftover_len` accounts for the additional capacity but `last_file_target_size` does not. As a result, two files could be concatenated without a newline between them. To reproduce, run `head -c 8000 /dev/zero | tr '\0' 'b' >b.txt; echo aaa >a.txt; cargo run sort b.txt a.txt`. I added a new test, `test_sort::test_start_buffer`, to cover this.

Fixes #8583.
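The first fix described above (resuming the newline search where the previous pass stopped) can be illustrated with a minimal sketch. It assumes a reader that is drained into a growable `Vec<u8>` one chunk at a time; `read_until_separator` and `chunk_size` are hypothetical names, not the actual uutils `sort` code or its API.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not the uutils code): reads `reader` into `buffer`,
/// `chunk_size` bytes at a time, until `buffer` contains `separator` or the
/// reader is exhausted. Returns the index just past the last separator, or
/// `None` if the data read so far contains no separator.
fn read_until_separator<R: Read>(
    reader: &mut R,
    buffer: &mut Vec<u8>,
    separator: u8,
    chunk_size: usize,
) -> io::Result<Option<usize>> {
    // Bytes in `buffer[..searched]` are already known to contain no separator.
    let mut searched = 0;
    loop {
        // Scan only the bytes appended since the last pass, so a very long
        // line costs O(n) in total instead of O(n^2 / chunk_size).
        if let Some(pos) = buffer[searched..].iter().rposition(|&b| b == separator) {
            return Ok(Some(searched + pos + 1));
        }
        searched = buffer.len();

        // Grow the buffer and read the next chunk into the new space.
        let old_len = buffer.len();
        buffer.resize(old_len + chunk_size, 0);
        match reader.read(&mut buffer[old_len..]) {
            Ok(0) => {
                buffer.truncate(old_len);
                return Ok(None);
            }
            Ok(n) => buffer.truncate(old_len + n),
            // Retry on EINTR-style interruptions without keeping the padding.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => buffer.truncate(old_len),
            Err(e) => {
                buffer.truncate(old_len);
                return Err(e);
            }
        }
    }
}

fn main() -> io::Result<()> {
    // A single 1 MiB line followed by a short one.
    let mut input = vec![b'b'; 1 << 20];
    input.extend_from_slice(b"\naaa\n");
    let mut reader = io::Cursor::new(input);
    let mut buffer = Vec::new();
    let end = read_until_separator(&mut reader, &mut buffer, b'\n', 8 * 1024)?;
    println!("{:?}", end); // prints "Some(1048581)"
    Ok(())
}
```

Because the prefix `buffer[..searched]` is known to be separator-free, `rposition` over only the newly read suffix still finds the last separator in the whole buffer, which is what lets each byte be scanned once.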
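The second fix restores an invariant: every non-empty input must end with a line separator before the next input's bytes follow it. Below is a hypothetical sketch under a simplifying assumption (all inputs are read fully into one `Vec<u8>`, whereas uutils `sort` reads in chunks, so this is not its code). `concat_with_separators` is an illustrative name. The point is to decide whether a file is non-empty from the byte count returned for that file, not from buffer sizes that may include capacity added by a resize.

```rust
use std::io::{self, Read, Cursor};

/// Hypothetical helper: concatenates `inputs` into one buffer so that every
/// non-empty input ends with `separator`, as `sort b.txt a.txt` must do when
/// `b.txt` lacks a trailing newline.
fn concat_with_separators<R: Read>(inputs: &mut [R], separator: u8) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    for input in inputs.iter_mut() {
        // `read_to_end` returns the number of bytes appended for *this* input,
        // independent of how large `out` or its capacity has grown.
        let read = input.read_to_end(&mut out)?;
        if read > 0 && out.last() != Some(&separator) {
            out.push(separator);
        }
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    // Same shape as the reproduction: 8000 'b' bytes with no trailing newline, then "aaa\n".
    let mut inputs = vec![Cursor::new(vec![b'b'; 8000]), Cursor::new(b"aaa\n".to_vec())];
    let merged = concat_with_separators(&mut inputs, b'\n')?;
    let lines: Vec<&[u8]> = merged.split(|&b| b == b'\n').filter(|l| !l.is_empty()).collect();
    println!("{} lines", lines.len()); // prints "2 lines"
    Ok(())
}
```

Empty inputs contribute nothing (not even a separator), so `sort empty.txt a.txt` in this model behaves like `sort a.txt`.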