feat(io): add Resize API for efficient storage allocation#1571
feat(io): add Resize API for efficient storage allocation#1571LHT129 merged 1 commit intoantgroup:mainfrom
Conversation
LHT129
commented
Feb 9, 2026
- Add Resize() method to basic_io.h with ResizeImpl support via SFINAE
- Implement ResizeImpl for async_io, buffer_io, and memory_block_io
- Replace Write-based expansion hack with Resize in datacell classes
- Add null pointer check for memory_block_io allocation
- Add Resize() method to basic_io.h with ResizeImpl support via SFINAE - Implement ResizeImpl for async_io, buffer_io, and memory_block_io - Replace Write-based expansion hack with Resize in datacell classes - Add null pointer check for memory_block_io allocation Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Summary of ChangesHello @LHT129, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the I/O subsystem by introducing a dedicated and efficient Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a Resize API for more efficient storage allocation, replacing a previous Write-based hack. This is a significant improvement in terms of code clarity and correctness. The changes across the datacell classes to adopt this new API are clean and appropriate. The implementations of ResizeImpl for the various IO backends are mostly solid, but I've identified a few areas for improvement regarding error handling, efficiency, and code structure, including a critical bug in mmap_io.cpp where the return value of mremap was not being checked. My review includes specific suggestions to address these points.
| MMapIO::ResizeImpl(uint64_t size) { | ||
| auto new_size = size; | ||
| auto old_size = this->size_; | ||
| if (old_size == 0) { | ||
| old_size = DEFAULT_INIT_MMAP_SIZE; | ||
| } | ||
| if (new_size > old_size) { | ||
| auto ret = | ||
| #ifdef __APPLE__ | ||
| ftruncate(this->fd_, static_cast<off_t>(new_size)); | ||
| #else | ||
| ftruncate64(this->fd_, static_cast<int64_t>(new_size)); | ||
| #endif | ||
| if (ret == -1) { | ||
| throw VsagException(ErrorType::INTERNAL_ERROR, "ftruncate failed"); | ||
| } | ||
|
|
||
| #ifdef __APPLE__ | ||
| munmap(this->start_, old_size); | ||
| void* addr = mmap(nullptr, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, this->fd_, 0); | ||
| if (addr == MAP_FAILED) { | ||
| throw VsagException(ErrorType::INTERNAL_ERROR, "mmap remap failed"); | ||
| } | ||
| this->start_ = static_cast<uint8_t*>(addr); | ||
| #else | ||
| this->start_ = | ||
| static_cast<uint8_t*>(mremap(this->start_, old_size, new_size, MREMAP_MAYMOVE)); | ||
| #endif | ||
| } else if (new_size < old_size) { | ||
| #ifdef __APPLE__ | ||
| munmap(this->start_, old_size); | ||
| void* addr = mmap(nullptr, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, this->fd_, 0); | ||
| if (addr == MAP_FAILED) { | ||
| throw VsagException(ErrorType::INTERNAL_ERROR, "mmap remap failed"); | ||
| } | ||
| this->start_ = static_cast<uint8_t*>(addr); | ||
| #else | ||
| this->start_ = | ||
| static_cast<uint8_t*>(mremap(this->start_, old_size, new_size, MREMAP_MAYMOVE)); | ||
| #endif | ||
| auto ret = | ||
| #ifdef __APPLE__ | ||
| ftruncate(this->fd_, static_cast<off_t>(new_size)); | ||
| #else | ||
| ftruncate64(this->fd_, static_cast<int64_t>(new_size)); | ||
| #endif | ||
| if (ret == -1) { | ||
| throw VsagException(ErrorType::INTERNAL_ERROR, "ftruncate failed"); | ||
| } | ||
| } | ||
| this->size_ = new_size; | ||
| } |
There was a problem hiding this comment.
This implementation of ResizeImpl has a few issues:
- Critical Bug: The return value of
mremapis not checked. It can returnMAP_FAILEDon error, which would lead to undefined behavior when the invalid pointer is used. This should be checked and an exception thrown on failure. - Error Reporting: The error message for
ftruncateis generic. Includingstrerror(errno)would provide more details on failure. - Code Duplication: The logic for remapping memory (
mremapormunmap/mmap) is duplicated for both growing and shrinking cases. This can be refactored to improve maintainability.
I've provided a suggested refactoring that addresses all these points by extracting the remap and truncate logic into lambdas, adding error checking, and improving error messages.
void
MMapIO::ResizeImpl(uint64_t size) {
auto new_size = size;
if (new_size == this->size_) {
return;
}
auto old_size = this->size_;
if (old_size == 0) {
old_size = DEFAULT_INIT_MMAP_SIZE;
}
auto remap_memory = [&]() {
#ifdef __APPLE__
munmap(this->start_, old_size);
void* addr = mmap(nullptr, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, this->fd_, 0);
if (addr == MAP_FAILED) {
throw VsagException(ErrorType::INTERNAL_ERROR, fmt::format("mmap remap failed: {}", strerror(errno)));
}
this->start_ = static_cast<uint8_t*>(addr);
#else
void* new_addr = mremap(this->start_, old_size, new_size, MREMAP_MAYMOVE);
if (new_addr == MAP_FAILED) {
throw VsagException(ErrorType::INTERNAL_ERROR, fmt::format("mremap failed: {}", strerror(errno)));
}
this->start_ = static_cast<uint8_t*>(new_addr);
#endif
};
auto truncate_file = [&]() {
auto ret =
#ifdef __APPLE__
ftruncate(this->fd_, static_cast<off_t>(new_size));
#else
ftruncate64(this->fd_, static_cast<int64_t>(new_size));
#endif
if (ret == -1) {
throw VsagException(ErrorType::INTERNAL_ERROR, fmt::format("ftruncate failed: {}", strerror(errno)));
}
};
if (new_size > old_size) {
truncate_file();
remap_memory();
} else if (new_size < old_size) {
remap_memory();
truncate_file();
}
this->size_ = new_size;
}| } else { | ||
| if (size <= this->size_) { | ||
| return; | ||
| } | ||
| ByteBuffer buffer(SERIALIZE_BUFFER_SIZE, this->allocator_); | ||
| uint64_t offset = this->size_; | ||
| while (offset < size) { | ||
| auto cur_size = std::min(SERIALIZE_BUFFER_SIZE, size - offset); | ||
| this->Write(buffer.data, cur_size, offset); | ||
| offset += cur_size; | ||
| } | ||
| } |
There was a problem hiding this comment.
The fallback implementation for Resize can be made more efficient. Instead of writing data in a loop to expand the underlying storage, a single Write call with a size of 0 at the target offset is sufficient. Most WriteImpl implementations will handle this by extending the storage to the specified offset without actually writing data, which is much faster.
} else {
if (size > this->size_) {
// A single write of 0 bytes at the target size is enough to
// trigger the size update in WriteImpl, which is more efficient
// than writing data in a loop.
this->Write(nullptr, 0, size);
}
}| auto ret = ftruncate64(this->wfd_, static_cast<int64_t>(size)); | ||
| #endif | ||
| if (ret == -1) { | ||
| throw VsagException(ErrorType::INTERNAL_ERROR, "ftruncate failed"); |
There was a problem hiding this comment.
The error message for ftruncate is a bit generic. Including the system error message using strerror(errno) would provide more context for debugging, which is a common practice elsewhere in this file.
| throw VsagException(ErrorType::INTERNAL_ERROR, "ftruncate failed"); | |
| throw VsagException(ErrorType::INTERNAL_ERROR, fmt::format("ftruncate failed: {}", strerror(errno))); |
| auto ret = ftruncate64(this->fd_, static_cast<int64_t>(size)); | ||
| #endif | ||
| if (ret == -1) { | ||
| throw VsagException(ErrorType::INTERNAL_ERROR, "ftruncate failed"); |
There was a problem hiding this comment.
Similar to async_io.cpp, the error message for ftruncate could be improved by including the system error string via strerror(errno). This will make debugging failures easier.
| throw VsagException(ErrorType::INTERNAL_ERROR, "ftruncate failed"); | |
| throw VsagException(ErrorType::INTERNAL_ERROR, fmt::format("ftruncate failed: {}", strerror(errno))); |
Codecov Report❌ Patch coverage is ❌ Your patch status has failed because the patch coverage (52.54%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. @@ Coverage Diff @@
## main #1571 +/- ##
==========================================
- Coverage 91.24% 91.06% -0.18%
==========================================
Files 329 329
Lines 19396 19445 +49
==========================================
+ Hits 17697 17708 +11
- Misses 1699 1737 +38
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|