
sort: deduplicate file descriptors in merge mode #11961

Open
nonontb wants to merge 10 commits into uutils:main from nonontb:feature/sort-input-fd-optimization

nonontb commented Apr 23, 2026

What This Does

This PR makes sort -m (merge mode) open fewer files, ideally the minimum needed.

The Problem

Before:

If you ran sort -m file.txt file.txt file.txt, the program eagerly opened file.txt three times, once for each occurrence on the command line.
With many duplicates or a low limit on open file descriptors, this could fail.
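To make the cost concrete, here is a minimal std-only sketch of eager, per-occurrence opening. The function name `open_every_occurrence` is hypothetical and not taken from the uutils code; the point is only that three occurrences of the same path produce three `File` handles, each holding its own descriptor.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Hypothetical sketch of eager, per-occurrence opening: one `File`, and
/// therefore one file descriptor, for every path given on the command line,
/// even when the same path is repeated.
fn open_every_occurrence(paths: &[&Path]) -> io::Result<Vec<File>> {
    // Stops at the first failure (e.g. when the process hits its
    // open-file limit) and returns that error.
    paths.iter().map(|p| File::open(p)).collect()
}
```

Under a low `ulimit -n`, repeated arguments alone can exhaust the descriptor budget before any merging starts.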

If an input file was also the output file (e.g. sort -m -o file.txt file.txt), the program had to create a temporary copy behind the scenes, which used one more file.
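Why a copy is needed at all can be shown with a small, hypothetical std-only sketch (`read_after_truncating` is not a uutils function): if an input is opened first and the same path is then opened for output with `File::create`, the truncation empties the file and the input reader sees nothing.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical illustration: open `path` as an input, then open the same
/// path for output the naive way, and report what the input reader sees.
fn read_after_truncating(path: &Path) -> io::Result<String> {
    // The input is opened first, as a merge would before writing anything.
    let mut input = File::open(path)?;
    // `File::create` truncates the existing file to length 0.
    let _output = File::create(path)?;
    let mut contents = String::new();
    // Reads from the input now hit end-of-file immediately.
    input.read_to_string(&mut contents)?;
    Ok(contents)
}
```

The original data is gone before the merge reads a single line, so it has to be preserved somewhere else (a temporary copy, in the pre-fix code) before the output is opened.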

GNU sort has no issue running the test case from #5714.

The Fix

The program now opens each unique file only once, and lazily. It uses mmap (via the memmap2 crate, which requires unsafe) so that a single file descriptor serves all duplicate occurrences of an input file, including the case where the output file is also used as an input.
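A minimal sketch of the deduplication idea, assuming a table keyed by canonical path. `SharedSources` and its `loads` counter are hypothetical names, and `fs::read` into a reference-counted buffer stands in for the memmap2 mapping used in the PR, so the example stays std-only and free of `unsafe`. Unlike a mapping, it reads the whole file on first use and closes the descriptor right away.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::rc::Rc;

/// Hypothetical table of input sources: each unique file is read once, on
/// first use, and every duplicate occurrence shares the same buffer.
/// `fs::read` stands in for the PR's memmap2 mapping (std-only, no `unsafe`).
#[derive(Default)]
struct SharedSources {
    by_path: HashMap<PathBuf, Rc<[u8]>>,
    /// How many files were actually opened; kept only for illustration.
    loads: usize,
}

impl SharedSources {
    /// Returns the contents of `path`, loading it only if no previous
    /// occurrence (under any spelling of the path) has loaded it yet.
    fn get(&mut self, path: &Path) -> io::Result<Rc<[u8]>> {
        // Canonicalize so that "a.txt" and "./a.txt" share one entry.
        let key = path.canonicalize()?;
        if let Some(data) = self.by_path.get(&key) {
            return Ok(Rc::clone(data));
        }
        // Lazy: the file is opened (and closed again) only on first request.
        let data: Rc<[u8]> = fs::read(&key)?.into();
        self.loads += 1;
        self.by_path.insert(key, Rc::clone(&data));
        Ok(data)
    }
}
```

Each repeated argument then costs only an `Rc` clone, so the number of opens scales with unique files rather than with arguments. A real implementation would also need to special-case `-` (stdin), which generally cannot be canonicalized.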

Result

Fixes #5714

New tests are added for merging duplicate files.

github-actions Bot commented Apr 23, 2026

GNU testsuite comparison:

GNU test failed: tests/misc/sync. tests/misc/sync is passing on 'main'. Maybe you have to rebase?
Skipping an intermittent issue tests/cut/bounded-memory (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/printf/printf-surprise is now being skipped but was previously passing.
Congrats! The gnu test tests/unexpand/bounded-memory is now passing!
Skip an intermittent issue tests/pr/bounded-memory (was skipped on 'main', now failing)


codspeed-hq Bot commented Apr 23, 2026

Merging this PR will not alter performance

✅ 309 untouched benchmarks
⏩ 46 skipped benchmarks¹


Comparing nonontb:feature/sort-input-fd-optimization (563ddcf) with main (6b16cc9)


Footnotes

  1. 46 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

@@ -0,0 +1,3 @@
1
Contributor

Please generate the files on the fly.

Author

Done

nonontb force-pushed the feature/sort-input-fd-optimization branch 4 times, most recently from ebda4db to a9f249e, April 24, 2026 18:29
nonontb force-pushed the feature/sort-input-fd-optimization branch from a9f249e to 3517780, April 26, 2026 08:14
@@ -0,0 +1,6 @@
1
Contributor

Please generate this one on the fly too.

Author

nonontb, Apr 26, 2026

Yes, I forgot to delete the file, but the test already generates its input on the fly:

#[test]
fn test_merge_mixed_stdin_and_files() {
    let (at, mut ucmd) = at_and_ucmd!();
    at.write("merge_duplicates_1.txt", "1\n3\n5\n");
    // Verify that sort -m allows mixing stdin with files (GNU Coreutils compatible)
    ucmd.arg("-m")
        .arg("-")
        .arg("merge_duplicates_1.txt")
        .pipe_in("apricot\nelderberry\nkiwi\n")
        .succeeds()
        .stdout_is("1\n3\n5\napricot\nelderberry\nkiwi\n");
}

Review comment on src/uu/sort/src/merge.rs (outdated)
// it gets opened for writing. This allows reading the original content
// via memory-map while writing to the same file, without needing a temp copy.
let output_as_input = if let Some(name) = output.as_output_name() {
    let output_path = Path::new(name).canonicalize()?;
Contributor

Maybe move this into a function?
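One possible shape for that extraction, as a sketch only: the helper name `output_used_as_input` and its signature are hypothetical, and it takes `Option<&Path>` where the PR code calls `output.as_output_name()`. It compares canonical paths so that spellings like `out.txt` and `./out.txt` match, skips `-` (stdin), and treats a not-yet-existing output as never being an input.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the canonical output path if the output file
/// also appears among the inputs, so the caller can share one source for it.
fn output_used_as_input(
    output: Option<&Path>,
    inputs: &[PathBuf],
) -> io::Result<Option<PathBuf>> {
    let output = match output {
        Some(p) => p,
        None => return Ok(None), // no output file (e.g. stdout): nothing to detect
    };
    let output = match output.canonicalize() {
        Ok(p) => p,
        // An output that does not exist yet cannot also be an input.
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(e) => return Err(e),
    };
    for input in inputs {
        // "-" means stdin, which never names the output file.
        if input.as_os_str() == "-" {
            continue;
        }
        if input.canonicalize()? == output {
            return Ok(Some(output));
        }
    }
    Ok(None)
}
```

Canonicalization resolves symlinks and `.`/`..` components, but not hard links; detecting those would require comparing device and inode numbers instead.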

sylvestre (Contributor)

Did you check whether we have a benchmark covering this? Thanks!

nonontb (Author) commented Apr 26, 2026


I just checked, and there does not seem to be a relevant benchmark in src/uu/sort/benches/sort_bench.rs, even though src/uu/sort/BENCHMARKING.md explains how to benchmark sort. Unless I missed something?

I suppose it would be better to add this benchmark in a separate issue, so that we have baseline numbers before benchmarking this PR?


Development

Successfully merging this pull request may close these issues.

sort opens too many files

2 participants