fix: Complete memory management with dynamic optimization support (Fix/memory problems #10)
base: main
Conversation
…tion
- Add allocate_pinned_mat() and setup_pinned_matrices() for flexible memory management
- Support dynamic image size handling with ensure_even_dimensions()
- Add dynamic buffer reallocation in infer_on_dGPU()
- Add even dimension validation with helpful error messages
- Improve error handling with try-catch in main loop
- Maintain compatibility with both zero-copy and dGPU modes
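The try-catch hardening mentioned in the commit message could look roughly like the minimal sketch below; process_image() and image_paths are hypothetical placeholders for the app's real per-frame work, not code from this PR.

```cpp
#include <exception>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical per-image worker standing in for the app's real inference call.
void process_image(const std::string& path) { /* inference would go here */ }

int main() {
    const std::vector<std::string> image_paths = {"dog.png", "cat.png"};
    for (const auto& path : image_paths) {
        try {
            process_image(path);  // may throw on CUDA or OpenCV errors
        } catch (const std::exception& e) {
            // Report the failure and continue with the next image instead of aborting.
            std::cerr << "Skipping " << path << ": " << e.what() << std::endl;
        }
    }
    return 0;
}
```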
Pull request overview
This PR addresses critical memory leaks and adds dynamic memory optimization features to the SAM3 segmentation system. The changes include a comprehensive 9-step destructor cleanup, dynamic buffer reallocation for variable image sizes, and automatic even dimension handling.
Changes:
- Implemented complete destructor with proper CUDA resource cleanup (streams, device/host memory, registered pointers)
- Added dynamic memory management with allocate_pinned_mat() and setup_pinned_matrices() for flexible buffer allocation
- Introduced automatic even dimension validation and resizing to prevent inference errors (sketched below)
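As a rough illustration of the even-dimension handling (the real ensure_even_dimensions() in this PR may be implemented differently; the rounding policy here is an assumption), a resize-to-even guard could look like this:

```cpp
#include <opencv2/opencv.hpp>

// Sketch of an even-dimension guard: many inference pipelines require width and
// height to be even, so odd dimensions are rounded down and the image resized.
cv::Mat ensure_even_dimensions(const cv::Mat& img) {
    const int even_w = img.cols - (img.cols % 2);
    const int even_h = img.rows - (img.rows % 2);
    if (even_w == img.cols && even_h == img.rows) {
        return img;  // already even, return the input unchanged
    }
    cv::Mat resized;
    cv::resize(img, resized, cv::Size(even_w, even_h));
    return resized;
}
```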
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| cpp/include/sam3.cuh | Added member variables to track registered memory pointers, new public methods for pinned memory allocation, and PinnedMemoryDeleter struct |
| cpp/src/sam3/sam3_trt/sam3.cu | Implemented constructor initialization, new memory allocation methods, dynamic buffer reallocation in inference paths, dimension validation, and comprehensive 9-step destructor cleanup |
| cpp/src/sam3/sam3_apps/sam3_pcs_app.cpp | Added even dimension helpers, replaced raw memory management with RAII-based approach using shared_ptr, improved error handling, and fixed memory leak from removed raw_bytes |
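To give a sense of the kind of teardown the table describes, here is a minimal sketch of a destructor that releases CUDA resources in a safe order; the member names are assumptions and the real class tracks more state (registered pointers, multiple streams, engine bindings, and so on).

```cpp
#include <cuda_runtime.h>

struct ExampleContext {
    cudaStream_t stream = nullptr;
    void* device_buf = nullptr;       // allocated with cudaMalloc
    void* pinned_buf = nullptr;       // allocated with cudaHostAlloc
    void* registered_host = nullptr;  // registered with cudaHostRegister

    ~ExampleContext() {
        // Destructors must not throw, so return codes are ignored or only logged.
        if (stream) cudaStreamSynchronize(stream);  // drain pending work first
        if (device_buf) cudaFree(device_buf);
        if (pinned_buf) cudaFreeHost(pinned_buf);
        if (registered_host) cudaHostUnregister(registered_host);
        if (stream) cudaStreamDestroy(stream);
    }
};
```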
```cpp
struct PinnedMemoryDeleter {
    void operator()(void* ptr) const {
        if (ptr) {
            cudaFreeHost(ptr);
        }
    }
};
```
Copilot AI, Jan 10, 2026:
The PinnedMemoryDeleter struct is defined but never used in the codebase. The allocate_pinned_mat function uses an inline lambda deleter instead. Consider either using this struct in allocate_pinned_mat or removing it to reduce code clutter.
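For reference, wiring the existing struct into the allocation path instead of the inline lambda could look roughly like this; the function name and signature below are assumptions for illustration, not the PR's actual allocate_pinned_mat() API.

```cpp
#include <cstddef>
#include <cuda_runtime.h>
#include <memory>
#include <stdexcept>

struct PinnedMemoryDeleter {
    void operator()(void* ptr) const {
        if (ptr) {
            cudaFreeHost(ptr);
        }
    }
};

// Sketch only: returns page-locked host memory owned by a smart pointer that
// reuses PinnedMemoryDeleter, so the struct is no longer dead code.
std::unique_ptr<void, PinnedMemoryDeleter> allocate_pinned(std::size_t bytes) {
    void* ptr = nullptr;
    if (cudaHostAlloc(&ptr, bytes, cudaHostAllocDefault) != cudaSuccess) {
        throw std::runtime_error("cudaHostAlloc failed");
    }
    return std::unique_ptr<void, PinnedMemoryDeleter>(ptr);
}
```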
```cpp
if (opencv_input != nullptr)
{
    cudaFree(opencv_input);
    cudaFree((void*)gpu_result);
}
opencv_inbytes = current_inbytes;
cuda_check(cudaMalloc(&opencv_input, opencv_inbytes), " reallocating opencv input memory");
cuda_check(cudaMalloc((void**)&gpu_result, opencv_inbytes), " reallocating result memory");
```
Copilot AI, Jan 10, 2026:
There's a potential memory leak and state inconsistency issue in the reallocation logic. If cudaMalloc for opencv_input succeeds (line 219) but cudaMalloc for gpu_result fails (line 220), the pointers will be in an inconsistent state. The old memory is already freed, opencv_inbytes is updated, but gpu_result is nullptr while opencv_input points to newly allocated memory. Consider setting pointers to nullptr after freeing, or using a temporary variable pattern to ensure atomicity of the reallocation.
Suggested change:

```diff
-if (opencv_input != nullptr)
-{
-    cudaFree(opencv_input);
-    cudaFree((void*)gpu_result);
-}
-opencv_inbytes = current_inbytes;
-cuda_check(cudaMalloc(&opencv_input, opencv_inbytes), " reallocating opencv input memory");
-cuda_check(cudaMalloc((void**)&gpu_result, opencv_inbytes), " reallocating result memory");
+// Allocate new buffers first, using temporaries to keep state consistent
+void* new_opencv_input = nullptr;
+void* new_gpu_result = nullptr;
+cuda_check(cudaMalloc(&new_opencv_input, current_inbytes), " reallocating opencv input memory");
+cuda_check(cudaMalloc(&new_gpu_result, current_inbytes), " reallocating result memory");
+// Free old buffers after successful allocations
+if (opencv_input != nullptr)
+{
+    cudaFree(opencv_input);
+    opencv_input = nullptr;
+}
+if (gpu_result != nullptr)
+{
+    cudaFree((void*)gpu_result);
+    gpu_result = nullptr;
+}
+// Commit new state atomically
+opencv_input = new_opencv_input;
+gpu_result = static_cast<decltype(gpu_result)>(new_gpu_result);
+opencv_inbytes = current_inbytes;
```
```cpp
if (opencv_input != nullptr)
{
    cudaFree(opencv_input);
    cudaFree((void*)gpu_result);
}
cuda_check(cudaMalloc(&opencv_input, opencv_inbytes), " allocating opencv input memory on a dGPU system");
cuda_check(cudaMalloc((void**)&gpu_result, opencv_inbytes), " allocating result memory on a dGPU system");
```
Copilot AI, Jan 10, 2026:
There's a potential memory leak and state inconsistency issue in the reallocation logic. If cudaMalloc for opencv_input succeeds (line 66) but cudaMalloc for gpu_result fails (line 67), the pointers will be in an inconsistent state. The old memory is already freed, opencv_inbytes is updated, but gpu_result is nullptr while opencv_input points to newly allocated memory. Consider setting pointers to nullptr after freeing, or using a temporary variable pattern to ensure atomicity of the reallocation.
Suggested change:

```diff
-if (opencv_input != nullptr)
-{
-    cudaFree(opencv_input);
-    cudaFree((void*)gpu_result);
-}
-cuda_check(cudaMalloc(&opencv_input, opencv_inbytes), " allocating opencv input memory on a dGPU system");
-cuda_check(cudaMalloc((void**)&gpu_result, opencv_inbytes), " allocating result memory on a dGPU system");
+// Use temporary pointers so reallocation is atomic and leaves the object in a consistent state
+void* new_opencv_input = nullptr;
+void* new_gpu_result = nullptr;
+// Allocate new buffers first; cuda_check will handle any allocation errors
+cuda_check(cudaMalloc(&new_opencv_input, opencv_inbytes), " allocating opencv input memory on a dGPU system");
+cuda_check(cudaMalloc(&new_gpu_result, opencv_inbytes), " allocating result memory on a dGPU system");
+// Now that both allocations have succeeded, free any old buffers
+if (opencv_input != nullptr)
+{
+    cudaFree(opencv_input);
+    cudaFree((void*)gpu_result);
+}
+// Update member pointers to point to the newly allocated buffers
+opencv_input = new_opencv_input;
+gpu_result = new_gpu_result;
+// Initialize the buffers
```
```diff
 const float vis_alpha = 0.3;
 const float probability_threshold = 0.5;
-const SAM3_VISUALIZATION visualize = SAM3_VISUALIZATION::VIS_INSTANCE_SEGMENTATION;
+const SAM3_VISUALIZATION visualize = SAM3_VISUALIZATION::VIS_SEMANTIC_SEGMENTATION;
```
Copilot AI, Jan 10, 2026:
The visualization type was changed from VIS_INSTANCE_SEGMENTATION to VIS_SEMANTIC_SEGMENTATION. If this change is intentional for the purposes of this PR, consider documenting it in the PR description. Otherwise, this might be a leftover from testing that should be reverted.
Suggested change:

```diff
-const SAM3_VISUALIZATION visualize = SAM3_VISUALIZATION::VIS_SEMANTIC_SEGMENTATION;
+const SAM3_VISUALIZATION visualize = SAM3_VISUALIZATION::VIS_INSTANCE_SEGMENTATION;
```
```cpp
cudaFree(opencv_input);
cudaFree((void*)gpu_result);
```
Copilot AI, Jan 10, 2026:
The cudaFree calls should be wrapped with error checking. While cudaFree typically doesn't fail for valid pointers, it's a best practice to check for errors consistently, especially in cleanup code where errors could indicate corrupted state.
Suggested change:

```diff
-cudaFree(opencv_input);
-cudaFree((void*)gpu_result);
+cuda_check(cudaFree(opencv_input), " freeing opencv input memory on a dGPU system");
+cuda_check(cudaFree((void*)gpu_result), " freeing result memory on a dGPU system");
```
```cpp
cudaMemset(opencv_input, 0, opencv_inbytes);
cudaMemset((void *)gpu_result, 0, opencv_inbytes);
```
Copilot AI, Jan 10, 2026:
The cudaMemset calls should be wrapped with cuda_check for consistent error handling. If these operations fail, it could lead to undefined behavior when the uninitialized memory is used.
Suggested change:

```diff
-cudaMemset(opencv_input, 0, opencv_inbytes);
-cudaMemset((void *)gpu_result, 0, opencv_inbytes);
+cuda_check(cudaMemset(opencv_input, 0, opencv_inbytes), " zeroing opencv input memory on a dGPU system");
+cuda_check(cudaMemset((void *)gpu_result, 0, opencv_inbytes), " zeroing result memory on a dGPU system");
```
```cpp
cudaFree(opencv_input);
cudaFree((void*)gpu_result);
```
Copilot AI, Jan 10, 2026:
The cudaFree calls should be wrapped with cuda_check for consistent error handling. While cudaFree typically doesn't fail for valid pointers, it's a best practice to check for errors consistently.
Suggested change:

```diff
-cudaFree(opencv_input);
-cudaFree((void*)gpu_result);
+cuda_check(cudaFree(opencv_input), " freeing opencv input memory");
+cuda_check(cudaFree((void*)gpu_result), " freeing result memory");
```
@Shattered217 Can you share system traces or other evidence of a memory leak? The reason destructors are almost never used in this repo is that the driver takes care of cleaning up all resources. A huge amount of the CUDA backend is dedicated to ensuring that these issues don't happen. So, even without your PR, we don't actually encounter memory leaks when the application terminates.
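One lightweight way to gather such evidence (a generic sketch, not something taken from this PR) is to sample cudaMemGetInfo() before and after repeated inferences and watch whether free device memory keeps shrinking across iterations:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: call before and after batches of inferences; a steadily
// decreasing free value across iterations would suggest a device-memory leak.
void log_device_memory(const char* tag) {
    size_t free_bytes = 0, total_bytes = 0;
    if (cudaMemGetInfo(&free_bytes, &total_bytes) == cudaSuccess) {
        std::printf("[%s] free: %zu MiB / total: %zu MiB\n",
                    tag, free_bytes >> 20, total_bytes >> 20);
    }
}
```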
@dataplayer12 The core goal was to fix memory-related anomalies when inferring images of varying resolutions. Sorry, the AI assistant inadvertently "fixed" potentially non-existent memory leaks. I'll carefully review the code tomorrow, remove all changes related to the "memory leak fixes," and focus solely on resolution-adaptive optimization.
Problem
The program crashes when processing images with different resolutions.
Solution
- allocate_pinned_mat() and setup_pinned_matrices()
- ensure_even_dimensions()
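A minimal sketch of what the pinned-matrix helper might look like follows; the signature and behavior here are assumptions for illustration, not the exact API added in this PR.

```cpp
#include <cuda_runtime.h>
#include <memory>
#include <opencv2/core.hpp>
#include <stdexcept>

// Sketch only: allocates page-locked (pinned) host memory and wraps it in a
// cv::Mat so asynchronous host<->device copies can use it. The shared_ptr
// deleter releases both the Mat header and the pinned buffer.
std::shared_ptr<cv::Mat> allocate_pinned_mat(int rows, int cols, int type) {
    const size_t bytes = static_cast<size_t>(rows) * cols * CV_ELEM_SIZE(type);
    void* pinned = nullptr;
    if (cudaHostAlloc(&pinned, bytes, cudaHostAllocDefault) != cudaSuccess) {
        throw std::runtime_error("cudaHostAlloc failed");
    }
    cv::Mat* mat = new cv::Mat(rows, cols, type, pinned);
    return std::shared_ptr<cv::Mat>(mat, [pinned](cv::Mat* m) {
        delete m;              // release the Mat header (does not own the data)
        cudaFreeHost(pinned);  // then release the pinned buffer
    });
}
```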
Testing

Incorrect image:
Processing of the first image works normally, but if subsequent images have a different resolution, errors occur. The last image shows the erroneous result after processing the Samoyed.