@ChrisDenton
Member

Doing a write then truncate instead of truncate then write is much faster on Windows (and potentially some filesystems on other systems too). A downside is that it may leave the file in an inconsistent state if File::set_len fails.

Fixes #127606

I'm nominating for libs-api because this may not honour the API of std::fs::write. Maybe t-libs can also think of a reason not to do this.

Write then truncate instead of truncate then write.
@ChrisDenton ChrisDenton added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Dec 24, 2024
@rustbot
Collaborator

rustbot commented Dec 24, 2024

r? @cuviper

rustbot has assigned @cuviper.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Dec 24, 2024
@ChrisDenton ChrisDenton added the A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` label Dec 24, 2024
@Urgau
Member

Urgau commented Dec 24, 2024

Do you have numbers? How much faster are we talking?

We should have a library benchmark (if we don't already have one).

@ChrisDenton
Member Author

ChrisDenton commented Dec 24, 2024

I need to run a proper benchmark but if the end file size is about the same then there seems to be an order of magnitude difference, which seems significant regardless:

use std::fs::OpenOptions;
use std::io::{Seek, Write};

// Truncate on open, then write (the current behaviour of std::fs::write).
fn write_file_truncate(file_name: &str, data: &[u8]) {
    if let Ok(mut file) = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(file_name)
    {
        file.write_all(data).unwrap();
    }
}

// Write over the existing contents, then cut the file down to the bytes written.
fn write_file_set_len(file_name: &str, data: &[u8]) {
    if let Ok(mut file) = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(false)
        .open(file_name)
    {
        file.write_all(data).unwrap();
        let pos = file.stream_position().unwrap();
        // set_len returns a Result; don't silently drop a failed truncation.
        file.set_len(pos).unwrap();
    }
}

static DATA: &str = include_str!("p&p.txt");

fn main() {
    let now = std::time::Instant::now();
    for _ in 0..1000 {
        write_file_truncate("p&p.txt", DATA.as_bytes());
        //write_file_set_len("p&p.txt", DATA.as_bytes());
    }
    println!("{} ms", now.elapsed().as_millis());
}

Where p&p.txt is a copy of Pride and Prejudice.

The difference was 200 to 500 ms for truncate vs. 60 to 80 ms for set_len.

So this allows writing Pride and Prejudice an order of magnitude faster.

@clubby789
Contributor

Benchmark on Linux:
tmpfs:

# truncate
  Time (mean ± σ):     195.1 ms ±  12.4 ms    [User: 0.5 ms, System: 193.7 ms]
  Range (min … max):   177.7 ms … 213.0 ms    16 runs
# set_len
  Time (mean ± σ):      82.8 ms ±   8.0 ms    [User: 0.9 ms, System: 80.9 ms]
  Range (min … max):    74.6 ms … 100.6 ms    31 runs

ext4 (on SSD):

# truncate
  20 seconds for one run
# set_len
  Time (mean ± σ):      85.5 ms ±   6.1 ms    [User: 0.7 ms, System: 84.8 ms]
  Range (min … max):    82.9 ms … 107.5 ms    28 runs

@the8472
Member

the8472 commented Dec 25, 2024

Would marking a file as sparse make any difference here? At $work I've got significant speedups from that when incrementally writing a file on NTFS, but it was a different IO pattern than this.

ext4

That's likely due to auto_da_alloc. It's trying to be "helpful" here by adding an implicit fsync for this particular pattern.
An extreme waste of performance when you don't need durability. Potential data loss avoidance if your application isn't doing persistence properly.

It can be disabled via mount options.

@tbu-
Contributor

tbu- commented Dec 26, 2024

I think this might change behavior around special files on Linux. E.g. /dev/null:

>>> null = open("/dev/null", "w+")
>>> null.truncate(1024)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument

Previously, writing to /dev/null with fs::write worked.
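A rough Rust equivalent of the Python session above, as a sketch only: `write_then_set_len` is a hypothetical helper name, not code from this PR. It assumes a Unix-like system where `/dev/null` exists and where truncating a character device fails with EINVAL, which std reports as `ErrorKind::InvalidInput`.

```rust
use std::fs::OpenOptions;
use std::io::{self, Seek, Write};

/// Hypothetical helper mirroring the write-then-truncate pattern:
/// open without truncating, write, then set the length to the
/// current stream position.
fn write_then_set_len(path: &str, data: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(false)
        .open(path)?;
    file.write_all(data)?;
    let pos = file.stream_position()?;
    // On a special file such as /dev/null this is the call expected to fail.
    file.set_len(pos)
}
```

Calling `write_then_set_len("/dev/null", b"x")` would be expected to return an error under those assumptions, whereas `std::fs::write("/dev/null", b"x")` succeeds with the truncate-on-open behaviour.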

@ChrisDenton
Member Author

ChrisDenton commented Dec 26, 2024

Ah, that severely dampens my enthusiasm for this. We could work around that by ignoring EINVAL or ERROR_INVALID_FUNCTION on the set_len but then I'd worry there are legitimate cases where opening with truncate actually works but set_len doesn't.
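For illustration, the workaround described above might look something like the following. This is a sketch under assumptions, not what the PR implemented: `write_then_truncate` and `is_ignorable_truncate_error` are hypothetical names, the Unix check relies on EINVAL being reported as `ErrorKind::InvalidInput`, and the Windows check assumes the failure surfaces as raw OS error 1 (`ERROR_INVALID_FUNCTION`).

```rust
use std::fs::OpenOptions;
use std::io::{self, Seek, Write};
use std::path::Path;

/// Hypothetical predicate: errors from set_len that the workaround would swallow.
fn is_ignorable_truncate_error(e: &io::Error) -> bool {
    if cfg!(windows) {
        // Assumption: ERROR_INVALID_FUNCTION has the raw value 1.
        e.raw_os_error() == Some(1)
    } else {
        // EINVAL is mapped to InvalidInput on Unix-like targets.
        e.kind() == io::ErrorKind::InvalidInput
    }
}

/// Hypothetical write-then-truncate that tolerates files which cannot be resized.
fn write_then_truncate(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(false)
        .open(path)?;
    file.write_all(data)?;
    let len = file.stream_position()?;
    match file.set_len(len) {
        Ok(()) => Ok(()),
        Err(e) if is_ignorable_truncate_error(&e) => Ok(()),
        Err(e) => Err(e),
    }
}
```

The concern raised above still applies to this sketch: if some file accepts truncate-on-open but rejects `set_len` with one of these errors, stale trailing bytes would be left behind without any error being reported.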

For the record, here are some benchmarks I did:

Windows ReFS

Timer precision: 100 ns
fswrite      fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ set_len   38.59 µs      │ 1.925 ms      │ 42.09 µs      │ 50.49 µs      │ 19456   │ 19456
╰─ truncate  211.4 µs      │ 6.462 ms      │ 233 µs        │ 252 µs        │ 3958    │ 3958

WSL ext4

Timer precision: 18 ns
fswrite      fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ set_len   101.3 µs      │ 748 µs        │ 118.4 µs      │ 126.5 µs      │ 7800    │ 7800
╰─ truncate  1.611 ms      │ 977 ms        │ 15.73 ms      │ 26.35 ms      │ 100     │ 100

@ChrisDenton
Member Author

Would marking a file as sparse make any difference here? At $work I've got significant speedups from that when incrementally writing a file on NTFS, but it was a different IO pattern than this.

I doubt it in this case. It's useful when you want to fill in a lot of zeros at virtually no cost but if you're actually writing data sequentially up to the file size then it doesn't really help.

@ChrisDenton ChrisDenton removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Dec 26, 2024
@ChrisDenton
Member Author

I'm going to close this and move discussion back to #127606. As I said, I'm no longer thinking this is a good idea.



Development

Successfully merging this pull request may close these issues.

File truncation is slow on Windows
