feat(yolov8-seg): Add GPU-accelerated segmentation postprocessing #1696

soumyadbanik · 2026-01-27T13:03:46Z

- Add O(num_dets) optimized drawing kernel (100x speedup: 5 ms -> 0.05 ms)
- Add gather_kept_bboxes_kernel for dense bbox extraction
- Add process_mask_kernel with bilinear interpolation and strict bbox clipping
- Add cuda_blur_masks for mask smoothing
- Increase kMaxNumOutputBbox to 8500 (fixes crash with standard YOLOv8 models)
- Update yolov8_seg.cpp for TensorRT 10 compatibility (enqueueV3)
- Add comprehensive documentation (GPU_POSTPROCESSING.md)
- Add result images demonstrating correct mask output

Tested on RTX 3080 Ti with CUDA 12.6 and TensorRT 10.x
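
For reviewers who want a reference point for the process_mask_kernel item, the following is a minimal CPU sketch of bilinear mask upsampling with strict bbox clipping. It is not the kernel from this PR: the names `Box`, `sample_bilinear`, and `upsample_and_clip` are hypothetical, the input is assumed to be an already-activated low-resolution mask with values in [0, 1], and the pixel-center coordinate mapping and the 0.5 threshold are assumptions as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical box type: inclusive corners in output-image pixel coordinates.
struct Box {
    float x1, y1, x2, y2;
};

// Bilinearly sample a row-major mask of size mw x mh at continuous
// mask-space coordinates (fx, fy). Coordinates are clamped to the mask.
inline float sample_bilinear(const std::vector<float>& mask, int mw, int mh,
                             float fx, float fy) {
    fx = std::min(std::max(fx, 0.0f), static_cast<float>(mw - 1));
    fy = std::min(std::max(fy, 0.0f), static_cast<float>(mh - 1));
    const int x0 = static_cast<int>(std::floor(fx));
    const int y0 = static_cast<int>(std::floor(fy));
    const int x1 = std::min(x0 + 1, mw - 1);
    const int y1 = std::min(y0 + 1, mh - 1);
    const float ax = fx - static_cast<float>(x0);
    const float ay = fy - static_cast<float>(y0);
    const float top = mask[y0 * mw + x0] * (1.0f - ax) + mask[y0 * mw + x1] * ax;
    const float bot = mask[y1 * mw + x0] * (1.0f - ax) + mask[y1 * mw + x1] * ax;
    return top * (1.0f - ay) + bot * ay;
}

// Upsample the mask to ow x oh and binarize it. Pixels outside the box are
// left at 0 without sampling, which is what "strict bbox clipping" is taken
// to mean in this sketch.
std::vector<std::uint8_t> upsample_and_clip(const std::vector<float>& mask,
                                            int mw, int mh, int ow, int oh,
                                            Box box, float thresh = 0.5f) {
    assert(mask.size() == static_cast<std::size_t>(mw) * mh);
    std::vector<std::uint8_t> out(static_cast<std::size_t>(ow) * oh, 0);
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            if (x < box.x1 || x > box.x2 || y < box.y1 || y > box.y2) {
                continue;  // outside the detection box: stays 0
            }
            // Map the output pixel center into mask space (assumed convention).
            const float fx = (x + 0.5f) * mw / ow - 0.5f;
            const float fy = (y + 0.5f) * mh / oh - 0.5f;
            const float v = sample_bilinear(mask, mw, mh, fx, fy);
            out[static_cast<std::size_t>(y) * ow + x] = v > thresh ? 1 : 0;
        }
    }
    return out;
}
```

In a CUDA version each iteration of the two loops would typically map to one thread. A CPU reference like this can serve as a baseline when checking the GPU output.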

