Skip to content

Conversation

@Shattered217
Copy link

@Shattered217 Shattered217 commented Jan 10, 2026

Problem

Program crashes when processing images with different resolutions:

terminate called after throwing an instance of 'std::runtime_error'
what(): Error in async memcpy of opencv image: invalid argument
Aborted (core dumped)

Solution

  • Dynamic memory management: allocate_pinned_mat() and setup_pinned_matrices()
  • Auto-adjust odd dimensions: ensure_even_dimensions()
  • Dynamic buffer reallocation: supports arbitrary image size sequences
  • Added error handling and dimension validation

Testing

  • ✅ Jetson Orin NX
  • ✅ NVIDIA A100
  • ✅ Mixed resolution image

incorrect image :

n02085936_233
pn02085936_233
dog
pdog
The processing of the first image is normal. If the resolution of subsequent images is different, errors will occur. The last image is the erroneous result after processing the Samoyed.

…tion

- Add allocate_pinned_mat() and setup_pinned_matrices() for flexible memory management
- Support dynamic image size handling with ensure_even_dimensions()
- Add dynamic buffer reallocation in infer_on_dGPU()
- Add even dimension validation with helpful error messages
- Improve error handling with try-catch in main loop
- Maintain compatibility with both zero-copy and dGPU modes
Copilot AI review requested due to automatic review settings January 10, 2026 11:25
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses critical memory leaks and adds dynamic memory optimization features to the SAM3 segmentation system. The changes include a comprehensive 9-step destructor cleanup, dynamic buffer reallocation for variable image sizes, and automatic even dimension handling.

Changes:

  • Implemented complete destructor with proper CUDA resource cleanup (streams, device/host memory, registered pointers)
  • Added dynamic memory management with allocate_pinned_mat() and setup_pinned_matrices() for flexible buffer allocation
  • Introduced automatic even dimension validation and resizing to prevent inference errors

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
cpp/include/sam3.cuh Added member variables to track registered memory pointers, new public methods for pinned memory allocation, and PinnedMemoryDeleter struct
cpp/src/sam3/sam3_trt/sam3.cu Implemented constructor initialization, new memory allocation methods, dynamic buffer reallocation in inference paths, dimension validation, and comprehensive 9-step destructor cleanup
cpp/src/sam3/sam3_apps/sam3_pcs_app.cpp Added even dimension helpers, replaced raw memory management with RAII-based approach using shared_ptr, improved error handling, and fixed memory leak from removed raw_bytes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +12 to +18
struct PinnedMemoryDeleter {
void operator()(void* ptr) const {
if (ptr) {
cudaFreeHost(ptr);
}
}
};
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PinnedMemoryDeleter struct is defined but never used in the codebase. The allocate_pinned_mat function uses an inline lambda deleter instead. Consider either using this struct in allocate_pinned_mat or removing it to reduce code clutter.

Copilot uses AI. Check for mistakes.
Comment on lines +213 to +220
if (opencv_input != nullptr)
{
cudaFree(opencv_input);
cudaFree((void*)gpu_result);
}
opencv_inbytes = current_inbytes;
cuda_check(cudaMalloc(&opencv_input, opencv_inbytes), " reallocating opencv input memory");
cuda_check(cudaMalloc((void**)&gpu_result, opencv_inbytes), " reallocating result memory");
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a potential memory leak and state inconsistency issue in the reallocation logic. If cudaMalloc for opencv_input succeeds (line 219) but cudaMalloc for gpu_result fails (line 220), the pointers will be in an inconsistent state. The old memory is already freed, opencv_inbytes is updated, but gpu_result is nullptr while opencv_input points to newly allocated memory. Consider setting pointers to nullptr after freeing, or using a temporary variable pattern to ensure atomicity of the reallocation.

Suggested change
if (opencv_input != nullptr)
{
cudaFree(opencv_input);
cudaFree((void*)gpu_result);
}
opencv_inbytes = current_inbytes;
cuda_check(cudaMalloc(&opencv_input, opencv_inbytes), " reallocating opencv input memory");
cuda_check(cudaMalloc((void**)&gpu_result, opencv_inbytes), " reallocating result memory");
// Allocate new buffers first, using temporaries to keep state consistent
void* new_opencv_input = nullptr;
void* new_gpu_result = nullptr;
cuda_check(cudaMalloc(&new_opencv_input, current_inbytes), " reallocating opencv input memory");
cuda_check(cudaMalloc(&new_gpu_result, current_inbytes), " reallocating result memory");
// Free old buffers after successful allocations
if (opencv_input != nullptr)
{
cudaFree(opencv_input);
opencv_input = nullptr;
}
if (gpu_result != nullptr)
{
cudaFree((void*)gpu_result);
gpu_result = nullptr;
}
// Commit new state atomically
opencv_input = new_opencv_input;
gpu_result = static_cast<decltype(gpu_result)>(new_gpu_result);
opencv_inbytes = current_inbytes;

Copilot uses AI. Check for mistakes.
Comment on lines +61 to +67
if (opencv_input != nullptr)
{
cudaFree(opencv_input);
cudaFree((void*)gpu_result);
}
cuda_check(cudaMalloc(&opencv_input, opencv_inbytes), " allocating opencv input memory on a dGPU system");
cuda_check(cudaMalloc((void**)&gpu_result, opencv_inbytes), " allocating result memory on a dGPU system");
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a potential memory leak and state inconsistency issue in the reallocation logic. If cudaMalloc for opencv_input succeeds (line 66) but cudaMalloc for gpu_result fails (line 67), the pointers will be in an inconsistent state. The old memory is already freed, opencv_inbytes is updated, but gpu_result is nullptr while opencv_input points to newly allocated memory. Consider setting pointers to nullptr after freeing, or using a temporary variable pattern to ensure atomicity of the reallocation.

Suggested change
if (opencv_input != nullptr)
{
cudaFree(opencv_input);
cudaFree((void*)gpu_result);
}
cuda_check(cudaMalloc(&opencv_input, opencv_inbytes), " allocating opencv input memory on a dGPU system");
cuda_check(cudaMalloc((void**)&gpu_result, opencv_inbytes), " allocating result memory on a dGPU system");
// Use temporary pointers so reallocation is atomic and leaves the object in a consistent state
void* new_opencv_input = nullptr;
void* new_gpu_result = nullptr;
// Allocate new buffers first; cuda_check will handle any allocation errors
cuda_check(cudaMalloc(&new_opencv_input, opencv_inbytes), " allocating opencv input memory on a dGPU system");
cuda_check(cudaMalloc(&new_gpu_result, opencv_inbytes), " allocating result memory on a dGPU system");
// Now that both allocations have succeeded, free any old buffers
if (opencv_input != nullptr)
{
cudaFree(opencv_input);
cudaFree((void*)gpu_result);
}
// Update member pointers to point to the newly allocated buffers
opencv_input = new_opencv_input;
gpu_result = new_gpu_result;
// Initialize the buffers

Copilot uses AI. Check for mistakes.
const float vis_alpha = 0.3;
const float probability_threshold = 0.5;
const SAM3_VISUALIZATION visualize = SAM3_VISUALIZATION::VIS_INSTANCE_SEGMENTATION;
const SAM3_VISUALIZATION visualize = SAM3_VISUALIZATION::VIS_SEMANTIC_SEGMENTATION;
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The visualization type was changed from VIS_INSTANCE_SEGMENTATION to VIS_SEMANTIC_SEGMENTATION. If this change is intentional for the purposes of this PR, consider documenting it in the PR description. Otherwise, this might be a leftover from testing that should be reverted.

Suggested change
const SAM3_VISUALIZATION visualize = SAM3_VISUALIZATION::VIS_SEMANTIC_SEGMENTATION;
const SAM3_VISUALIZATION visualize = SAM3_VISUALIZATION::VIS_INSTANCE_SEGMENTATION;

Copilot uses AI. Check for mistakes.
Comment on lines +63 to +64
cudaFree(opencv_input);
cudaFree((void*)gpu_result);
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cudaFree calls should be wrapped with error checking. While cudaFree typically doesn't fail for valid pointers, it's a best practice to check for errors consistently, especially in cleanup code where errors could indicate corrupted state.

Suggested change
cudaFree(opencv_input);
cudaFree((void*)gpu_result);
cuda_check(cudaFree(opencv_input), " freeing opencv input memory on a dGPU system");
cuda_check(cudaFree((void*)gpu_result), " freeing result memory on a dGPU system");

Copilot uses AI. Check for mistakes.
Comment on lines +68 to +69
cudaMemset(opencv_input, 0, opencv_inbytes);
cudaMemset((void *)gpu_result, 0, opencv_inbytes);
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cudaMemset calls should be wrapped with cuda_check for consistent error handling. If these operations fail, it could lead to undefined behavior when the uninitialized memory is used.

Suggested change
cudaMemset(opencv_input, 0, opencv_inbytes);
cudaMemset((void *)gpu_result, 0, opencv_inbytes);
cuda_check(cudaMemset(opencv_input, 0, opencv_inbytes), " zeroing opencv input memory on a dGPU system");
cuda_check(cudaMemset((void *)gpu_result, 0, opencv_inbytes), " zeroing result memory on a dGPU system");

Copilot uses AI. Check for mistakes.
Comment on lines +215 to +216
cudaFree(opencv_input);
cudaFree((void*)gpu_result);
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cudaFree calls should be wrapped with cuda_check for consistent error handling. While cudaFree typically doesn't fail for valid pointers, it's a best practice to check for errors consistently.

Suggested change
cudaFree(opencv_input);
cudaFree((void*)gpu_result);
cuda_check(cudaFree(opencv_input), " freeing opencv input memory");
cuda_check(cudaFree((void*)gpu_result), " freeing result memory");

Copilot uses AI. Check for mistakes.
@dataplayer12
Copy link
Owner

@Shattered217 can you share system traces or evidence of memory leak? The reason destructors are almost not used in this repo is because the driver takes care of cleaning up all resources. There is a huge amount of CUDA backend that is dedicated to ensuring that these issues don't happen. So, even without your PR, we don't actually encounter memory leaks when the application terminates.

@Shattered217
Copy link
Author

@dataplayer12 The core goal was to fix memory-related anomalies when inferring images of varying resolutions. Sorry—the AI assistant inadvertently "fixed" what the potentially non-existent potential memory leaks. I’ll carefully review the code tomorrow and remove all changes related to "memory leak fixes," focusing solely on resolution-adaptive optimization

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants