-
Notifications
You must be signed in to change notification settings - Fork 38.2k
Avoid file overwriting in fallback AllocateFileRange implementation
#33164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33164. ReviewsSee the guideline for information on the review process. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
luke-jr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description for AllocateFileRange says "the range specified in the arguments will never contain live data", so I'm not sure this is the right approach.
| if (0 == posix_fallocate(fileno(file), 0, nEndPos)) return; | ||
| #endif | ||
| // Fallback version | ||
| // TODO: just write one byte per block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why lose this?
| if (fseek(file, 0, SEEK_END)) { | ||
| return; | ||
| } | ||
| const int64_t filesize = std::ftell(file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks safe to mix std::ftell with C file i/o functions, but is there any reason to do so?
| return; | ||
| } | ||
| while (length > 0) { | ||
| unsigned int inc_size = offset + length - static_cast<unsigned int>(filesize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to use size_t or at least long
| return; | ||
| } | ||
| while (length > 0) { | ||
| unsigned int inc_size = offset + length - static_cast<unsigned int>(filesize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change of behaviour: This can allocate before offset if the file is smaller. If that's desired behaviour, there's no point to the offset parameter at all...?
| if (fseek(file, filesize, SEEK_SET)) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant?
|
Clarification: The doc for But what about the guarantee from the doc of |
| // TODO: just write one byte per block | ||
| static const char buf[65536] = {}; | ||
| if (fseek(file, offset, SEEK_SET)) { | ||
| if (fseek(file, 0, SEEK_END)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL that fseek...SEEK_END is undefined behaviour for binary files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, in strict ISO C that's UB. My understanding is that in our case this code only runs on POSIX systems (not Win/macOS), where fseek(..., SEEK_END) is well-defined for regular files. Or am I missing an issue here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cedwies @luke-jr
since this part of the code's gonna be run on POSIX systems only, i think trying out the lseek() system call can be of help. it is reliable on large files, sparse file aware and portable across Unix systems since it is a POSIX-standard syscall.
for instance something like this:
int fd = fileno(file);
if (fd == -1) return;
off_t file_size = lseek(fd, 0, SEEK_END);
if (file_size == static_cast<off_t>-1) return;what do you think of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be a bit hesitant about mixing lseek with FILE*, since stdio has its own buffering. From what I read, a safer option might be to stick with the stdio layer (fseeko/ftello) or call fstat on the fd after flushing. Even though I expect mixing the two layers to work most of the time, it can be risky because stdio may have buffered data or be tracking the file position separately from the kernel. I think it's safer to either stick to stdio or _fd-only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for this context, it will be invaluable as I work on developing a solution for this issue. (#33128)
i will reach out to the relevant authors of the 3 PRs that try to resolve this issue and collaborate with them, rather than submitting an additional PR for review at this stage.
then i will return with a more actionable and concrete proposal rather than a simple suggestion.
On the master branch, the fallback variant of
AllocateFileRange, introduced in #1677, overwrites the file's content. This causes issues on some systems during some "reindex" scenarios. Additionally, the recently introducedfeature_reindex_init.pytest is also broken on these systems.The affected systems include: OpenBSD, NetBSD, OmniOS, OpenIndiana.
This PR avoids such overwriting.
Fixes #33128 and
feature_reindex_init.pytest on affected systems.