
feat: add yolov13 implementation for TensorRT 10#1699

Merged
mpj1234 merged 1 commit into wang-xinyu:trt10 from ydk61:feat-yolov13-trt10
Jan 28, 2026

Conversation

Contributor

ydk61 commented Jan 28, 2026

No description provided.

Copilot AI review requested due to automatic review settings January 28, 2026 04:33
ydk61 force-pushed the feat-yolov13-trt10 branch from 9442b7d to 08ddd40 on January 28, 2026 04:37

Copilot AI left a comment


Pull request overview

This PR adds a full YOLOv13 detection pipeline with TensorRT 10 support, including C++ engine building/inference, GPU-based pre/post-processing, a custom YOLO TensorRT plugin, and two Python inference entrypoints (PyCUDA and cuda-python). The changes introduce new model construction blocks, calibration utilities for INT8, and build tooling (CMake) plus documentation for setup and usage.

Changes:

  • Add C++ detection executable (yolov13-det) with TensorRT 10, including engine build/serialize, dynamic buffer setup using tensor names, and batched GPU preprocessing/postprocessing.
  • Implement custom YOLO plugin (YoloLayer_TRT) and associated CUDA kernels for decoding and NMS, plus shared type, preprocess, and postprocess utilities.
  • Provide Python detection scripts for TensorRT (PyCUDA and cuda-python variants) and a README with environment requirements, build instructions, and Python usage.
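The batched GPU preprocessing mentioned above is a letterbox-style affine warp into a fixed 640x640 NCHW tensor. The scale-and-offset arithmetic behind that warp can be sketched in plain NumPy (a minimal illustrative sketch; the function and variable names are not from the PR's actual code):

```python
import numpy as np

def letterbox_affine(src_h, src_w, dst_h=640, dst_w=640):
    """Compute the 2x3 affine matrix that letterboxes a source image into
    the destination canvas: uniform scale, centered padding."""
    scale = min(dst_h / src_h, dst_w / src_w)
    # translation centers the scaled image inside the destination canvas
    tx = (dst_w - src_w * scale) / 2
    ty = (dst_h - src_h * scale) / 2
    return np.array([[scale, 0.0, tx],
                     [0.0, scale, ty]], dtype=np.float32)

# e.g. an 810x1080 (HxW) frame scaled into a 640x640 input tensor
m = letterbox_affine(810, 1080)
```

The inverse of this matrix is what postprocessing applies to map detected boxes back to original-image coordinates.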

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 7 comments.

Summary per file:

  • yolov13/yolov13_det_trt_cuda-python.py: Adds a TensorRT 10 detection script using the cuda-python driver/runtime APIs and manual tensor-address binding.
  • yolov13/yolov13_det_trt.py: Adds a TensorRT detection script using PyCUDA, mirroring the C++ pipeline at a higher level.
  • yolov13/yolov13_det.cpp: Implements C++ engine build/deserialize, buffer preparation, and batched inference with optional GPU or CPU postprocessing.
  • yolov13/src/preprocess.cu: Implements CUDA-based affine warp, normalization, and batched preprocessing into NCHW float tensors.
  • yolov13/src/postprocess.cu: Implements CUDA decode and NMS kernels to postprocess YOLO outputs on GPU.
  • yolov13/src/postprocess.cpp: Adds CPU-side NMS, GPU decode result handling, and bounding-box drawing utilities.
  • yolov13/src/model.cpp: Builds the YOLOv13 TensorRT network (backbone, neck, heads, plugin attachment) and serializes the engine for TRT 10.
  • yolov13/src/calibrator.cpp: Implements an INT8 entropy calibrator for TensorRT using OpenCV preprocessing.
  • yolov13/src/block.cpp: Provides reusable TensorRT network building blocks (convs, attention, hypergraph modules, YOLO head wiring, plugin setup).
  • yolov13/readme.md: Documents environment requirements, model export to .wts, C++ build/run, and Python inference usage, including TensorRT 10 notes.
  • yolov13/plugin/yololayer.h: Declares the YOLO TensorRT plugin and its creator for the detection head.
  • yolov13/plugin/yololayer.cu: Implements the YOLO plugin, including serialization and CUDA-based detection decoding across feature maps.
  • yolov13/include/utils.h: Adds image preprocessing helpers, directory scanning, label loading, and small string utilities.
  • yolov13/include/types.h: Defines shared detection/affine structures and bbox element constants used across CPU/GPU code.
  • yolov13/include/preprocess.h: Declares the CUDA preprocessing API used by the C++ inference pipeline.
  • yolov13/include/postprocess.h: Declares CPU/GPU postprocessing, decode, NMS, and drawing interfaces.
  • yolov13/include/model.h: Declares the YOLOv13 TensorRT engine build function.
  • yolov13/include/macros.h: Adds export and TensorRT-compatibility macros for plugins and loggers.
  • yolov13/include/logging.h: Brings in TensorRT sample-style logging utilities and test helpers.
  • yolov13/include/cuda_utils.h: Adds a CUDA error-checking macro for all CUDA calls.
  • yolov13/include/config.h: Centralizes model/input configuration, thresholds, and build-time precision macros.
  • yolov13/include/calibrator.h: Declares the INT8 entropy calibrator class used in engine building.
  • yolov13/include/block.h: Declares all network construction primitives and plugin wiring used in model.cpp.
  • yolov13/gen_wts.py: Adds a tool to convert a PyTorch YOLOv13 .pt checkpoint into a .wts file for TensorRT building.
  • yolov13/CMakeLists.txt: Adds build configuration for the plugin library and yolov13-det executable against CUDA, TensorRT, and OpenCV.
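The NMS implemented in postprocess.cpp/postprocess.cu is the standard greedy IoU suppression used across this repo's YOLO ports. A minimal NumPy reference of that algorithm (illustrative only; not the PR's code, which runs on GPU/C++):

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS: keep the highest-scoring box, drop overlaps above threshold."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)  # second box overlaps the first and is suppressed
```

In the PR, per-class suppression is typically achieved by grouping candidates by class ID before running this loop.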


Comment on lines +94 to +97


class YoLov13TRT(object):
"""

Copilot AI Jan 28, 2026


cuda.cuCtxCreate is being called with a None pointer as the first argument, but the CUDA Python driver API expects a (flags, device) signature (e.g. cuCtxCreate(0, dev)), so this call will fail at runtime with an argument error and prevent context creation. Update the call to match the CUDA Python API (omit the None parameter and pass just flags and device) so that the context is created correctly on TensorRT 10.
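The fix this comment suggests can be sketched as follows, assuming cuda-python's (result, value) return-tuple convention; error handling is elided and this only runs on a machine with a CUDA device:

```python
from cuda import cuda  # cuda-python driver API

err, = cuda.cuInit(0)
err, dev = cuda.cuDeviceGet(0)
# cuCtxCreate takes (flags, device) — no leading None pointer
err, ctx = cuda.cuCtxCreate(0, dev)
```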

Comment on lines +108 to +110
nvinfer1::ILayer* DownsampleConv(nvinfer1::INetworkDefinition* network,
std::map<std::string, nvinfer1::Weights> weightMap, nvinfer1::ITensor& input,
int in_channels, std::string lname, bool channel_adjust = true);

Copilot AI Jan 28, 2026


DownsampleConv is declared twice with the same signature (lines 100-102 and again here with a default for channel_adjust), which redeclares the default argument and can cause a compilation error on some compilers. Remove one of the duplicate declarations or keep only a single declaration with the desired default value to avoid ODR/default-argument redefinition issues.

#pragma once
#include <dirent.h>
#include <fstream>
#include <opencv2/opencv.hpp>

Copilot AI Jan 28, 2026


This header uses std::string, std::ostringstream, and std::unordered_map further down but only includes <dirent.h>, <fstream>, and OpenCV headers, relying on transitive includes for the C++ standard library types. To make this header self-contained and avoid brittle build issues, explicitly include <string>, <sstream>, and <unordered_map> here.

Suggested change
#include <opencv2/opencv.hpp>
#include <opencv2/opencv.hpp>
#include <string>
#include <sstream>
#include <unordered_map>
#include <vector>

@@ -0,0 +1,16 @@
#pragma once

#include <map>

Copilot AI Jan 28, 2026


This header declares functions that use std::vector (e.g. std::vector<cv::Mat>&) but does not include <vector>, instead depending on transitive includes from other headers. For a self-contained and portable header, add an explicit #include <vector> so that any translation unit including this header compiles reliably.

Suggested change
#include <map>
#include <map>
#include <vector>

Comment on lines +1 to +7
#pragma once

#include <opencv2/opencv.hpp>
#include "NvInfer.h"
#include "cuda_utils.h"
#include "types.h"


Copilot AI Jan 28, 2026


This header defines interfaces using std::vector (e.g. std::vector<Detection> and std::vector<cv::Mat>) but does not include <vector>, relying on indirect includes instead. Please add an explicit #include <vector> so the header is self-contained and does not depend on transitive standard library includes.

@@ -0,0 +1,56 @@
import sys # noqa: F401

Copilot AI Jan 28, 2026


Import of 'sys' is not used.

Suggested change
import sys # noqa: F401

import time
import cv2
import numpy as np
import pycuda.autoinit # noqa: F401

Copilot AI Jan 28, 2026


Import of 'pycuda' is not used.

mpj1234 merged commit bbd19a0 into wang-xinyu:trt10 on Jan 28, 2026
1 check passed

Labels: None yet

Projects: None yet

3 participants