Avoid file overwriting in fallback `AllocateFileRange` implementation #33164

hebasto · 2025-08-09T20:38:15Z

On the master branch, the fallback variant of AllocateFileRange, introduced in #1677, overwrites the file's content. This causes issues on some systems during some "reindex" scenarios. The recently introduced feature_reindex_init.py test is broken on these systems as well.

The affected systems include: OpenBSD, NetBSD, OmniOS, OpenIndiana.

This PR avoids such overwriting.

Fixes #33128 and feature_reindex_init.py test on affected systems.

DrahtBot · 2025-08-09T20:38:21Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33164.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#33228 (util: Address various issues in AllocateFileRange by luke-jr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

luke-jr

The description for AllocateFileRange says "the range specified in the arguments will never contain live data", so I'm not sure this is the right approach.

luke-jr · 2025-08-11T12:43:21Z

src/util/fs_helpers.cpp

    if (0 == posix_fallocate(fileno(file), 0, nEndPos)) return;
 #endif
    // Fallback version
-    // TODO: just write one byte per block


Why lose this?

luke-jr · 2025-08-11T12:45:15Z

src/util/fs_helpers.cpp

+    if (fseek(file, 0, SEEK_END)) {
+        return;
+    }
+    const int64_t filesize = std::ftell(file);


It looks safe to mix std::ftell with C file i/o functions, but is there any reason to do so?

luke-jr · 2025-08-11T12:46:43Z

src/util/fs_helpers.cpp

        return;
    }
-    while (length > 0) {
+    unsigned int inc_size = offset + length - static_cast<unsigned int>(filesize);


Probably better to use size_t or at least long
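
For illustration, a small sketch of what the narrower type can do to a large size. `TruncateTo32` is a made-up helper, not code from the PR, and it assumes `unsigned int` is 32 bits wide, as on common platforms:

```cpp
#include <cstdint>

// Illustrative only: squeeze a 64-bit file size through a 32-bit unsigned
// type, as the cast to `unsigned int` would on typical platforms.
constexpr uint32_t TruncateTo32(int64_t filesize)
{
    return static_cast<uint32_t>(filesize); // conversion is modulo 2^32
}

// A 5 GiB size loses its high bits: 5 GiB mod 4 GiB == 1 GiB.
static_assert(TruncateTo32(int64_t{5} << 30) == (uint32_t{1} << 30),
              "high bits are silently dropped");
```

Block files are normally far below this limit, so this is about robustness of the arithmetic rather than a known failure.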

luke-jr · 2025-08-11T14:29:21Z

src/util/fs_helpers.cpp

        return;
    }
-    while (length > 0) {
+    unsigned int inc_size = offset + length - static_cast<unsigned int>(filesize);


Change of behaviour: This can allocate before offset if the file is smaller. If that's desired behaviour, there's no point to the offset parameter at all...?
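
To make the concern concrete, here is the arithmetic with made-up numbers. Both helpers are hypothetical and only mirror the patched expression `offset + length - filesize`; they assume the file is shorter than `offset + length`:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helpers (not from the PR) mirroring the patched arithmetic.

// Bytes appended when growing from EOF up to offset + length.
constexpr int64_t BytesAppended(int64_t offset, int64_t length, int64_t filesize)
{
    return offset + length - filesize;
}

// Of those, the bytes that land before `offset`, outside the requested range.
constexpr int64_t BytesBeforeOffset(int64_t offset, int64_t filesize)
{
    return std::max<int64_t>(0, offset - filesize);
}

// A 100-byte file and a request for [200, 250): 150 bytes are appended,
// and 100 of them fall in [100, 200), which the caller never asked for.
static_assert(BytesAppended(200, 50, 100) == 150, "grows from EOF, not from offset");
static_assert(BytesBeforeOffset(200, 100) == 100, "the gap before offset is filled too");
```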

luke-jr · 2025-08-11T14:30:20Z

src/util/fs_helpers.cpp

+    if (fseek(file, filesize, SEEK_SET)) {
+        return;
+    }


cedwies · 2025-08-19T16:20:56Z

Clarification: The doc for AllocateFileRange says: "the range specified … will never contain live data." But in reality, did some call paths (during reindex on a few OSes) pass ranges that overlapped live bytes? The old fallback then wrote zeros inside the file and corrupted data. This PR makes the fallback defensive by appending after EOF only, so even if a caller violates the promise, we still don't overwrite (?).

But what about the guarantee in the AllocateFileRange doc? Should we drop it? Maybe we should spell out which assumptions we can rely on, e.g. filesize < offset (only appending at the end) or offset + length > filesize (no shrinking; if I am not mistaken, SetEndOfFile / ftruncate would otherwise shrink the file, potentially erasing live data). Should we guard against that?

luke-jr · 2025-08-19T22:28:08Z

src/util/fs_helpers.cpp

-    // TODO: just write one byte per block
-    static const char buf[65536] = {};
-    if (fseek(file, offset, SEEK_SET)) {
+    if (fseek(file, 0, SEEK_END)) {


TIL that fseek...SEEK_END is undefined behaviour for binary files.

https://wiki.sei.cmu.edu/confluence/display/c/FIO19-C.+Do+not+use+fseek%28%29+and+ftell%28%29+to+compute+the+size+of+a+regular+file

Good point, in strict ISO C that's UB. My understanding is that in our case this code only runs on POSIX systems (not Win/macOS), where fseek(..., SEEK_END) is well-defined for regular files. Or am I missing an issue here?

@cedwies @luke-jr
since this part of the code's gonna be run on POSIX systems only, i think trying out the lseek() system call can be of help. it is reliable on large files, sparse file aware and portable across Unix systems since it is a POSIX-standard syscall.

for instance something like this:

    int fd = fileno(file);
    if (fd == -1) return;
    off_t file_size = lseek(fd, 0, SEEK_END);
    if (file_size == static_cast<off_t>(-1)) return;

what do you think of this?

I would be a bit hesitant about mixing lseek with FILE*, since stdio has its own buffering. From what I read, a safer option might be to stick with the stdio layer (fseeko/ftello) or call fstat on the fd after flushing. Even though I expect mixing the two layers to work most of the time, it can be risky because stdio may have buffered data or be tracking the file position separately from the kernel. I think it's safer to stick to either stdio or fd-only.
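
A minimal sketch of the "fstat after flushing" option mentioned above, assuming a POSIX system. `FileSizeViaFstat` is a hypothetical helper name, not something from the PR or the codebase:

```cpp
#include <stdio.h>
#include <sys/stat.h>

#include <cstdint>

// Hypothetical helper: report the size of the file behind a FILE* without
// moving the stream position. Flushing first hands stdio's buffered writes
// to the kernel so that fstat() can see them. Returns -1 on error.
int64_t FileSizeViaFstat(FILE* file)
{
    if (file == nullptr) return -1;
    if (fflush(file) != 0) return -1;
    const int fd = fileno(file);
    if (fd == -1) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) return -1;
    return static_cast<int64_t>(st.st_size);
}
```

Unlike the seek-based approaches, this does not reposition the stream, so the caller does not need to restore the previous offset afterwards.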

thanks for this context, it will be invaluable as I work on developing a solution for this issue. (#33128)

i will reach out to the relevant authors of the 3 PRs that try to resolve this issue and collaborate with them, rather than submitting an additional PR for review at this stage.

then i will return with a more actionable and concrete proposal rather than a simple suggestion.

Avoid file overwriting in fallback AllocateFileRange implementation

6528709

hebasto added the Block storage label Aug 9, 2025

luke-jr suggested changes Aug 11, 2025

luke-jr reviewed Aug 19, 2025

DrahtBot mentioned this pull request Aug 21, 2025

util: Address various issues in AllocateFileRange #33228

Open

winterrdog commented Sep 28, 2025 (edited)

winterrdog commented Oct 3, 2025 (edited)

5 participants
