feat: add yolov13 implementation for TensorRT 10 #1699
mpj1234 merged 1 commit into wang-xinyu:trt10 from
Pull request overview
This PR adds a full YOLOv13 detection pipeline with TensorRT 10 support, including C++ engine building/inference, GPU-based pre/post-processing, a custom YOLO TensorRT plugin, and two Python inference entrypoints (PyCUDA and cuda-python). The changes introduce new model construction blocks, calibration utilities for INT8, and build tooling (CMake) plus documentation for setup and usage.
Changes:
- Add a C++ detection executable (`yolov13-det`) with TensorRT 10 support, including engine build/serialize, dynamic buffer setup using tensor names, and batched GPU preprocessing/postprocessing.
- Implement a custom YOLO plugin (`YoloLayer_TRT`) and associated CUDA kernels for decoding and NMS, plus shared type, preprocess, and postprocess utilities.
- Provide Python detection scripts for TensorRT (PyCUDA and cuda-python variants) and a README with environment requirements, build instructions, and Python usage.
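Since TensorRT 10 drops the old binding-index API, buffers are attached by I/O tensor name. As a hedged sketch of that pattern (the `engine`, `context`, and `device_ptrs` names are illustrative, not taken from the PR), the name-based binding loop looks roughly like:

```python
# Sketch: name-based tensor binding in TensorRT >= 8.5 / 10.
# The import is guarded so the sketch is inspectable without TensorRT installed.
try:
    import tensorrt as trt  # noqa: F401
except ImportError:
    trt = None


def bind_by_name(engine, context, device_ptrs):
    """Attach device pointers to each I/O tensor by name.

    `device_ptrs` is assumed to map tensor name -> device address (int).
    """
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        context.set_tensor_address(name, device_ptrs[name])
```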
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| yolov13/yolov13_det_trt_cuda-python.py | Adds a TensorRT 10 detection script using the cuda-python driver/runtime APIs and manual tensor-address binding. |
| yolov13/yolov13_det_trt.py | Adds a TensorRT detection script using PyCUDA, mirroring the C++ pipeline at a higher level. |
| yolov13/yolov13_det.cpp | Implements C++ engine build/deserialize, buffer preparation, and batched inference with optional GPU or CPU postprocessing. |
| yolov13/src/preprocess.cu | Implements CUDA-based affine warp, normalization, and batched preprocessing into NCHW float tensors. |
| yolov13/src/postprocess.cu | Implements CUDA decode and NMS kernels to postprocess YOLO outputs on GPU. |
| yolov13/src/postprocess.cpp | Adds CPU-side NMS, GPU decode result handling, and bounding-box drawing utilities. |
| yolov13/src/model.cpp | Builds the YOLOv13 TensorRT network (backbone, neck, heads, plugin attachment) and serializes the engine for TRT 10. |
| yolov13/src/calibrator.cpp | Implements an INT8 entropy calibrator for TensorRT using OpenCV preprocessing. |
| yolov13/src/block.cpp | Provides reusable TensorRT network building blocks (convs, attention, hypergraph modules, YOLO head wiring, plugin setup). |
| yolov13/readme.md | Documents environment requirements, model export to .wts, C++ build/run, and Python inference usage, including TensorRT 10 notes. |
| yolov13/plugin/yololayer.h | Declares the YOLO TensorRT plugin and its creator for the detection head. |
| yolov13/plugin/yololayer.cu | Implements the YOLO plugin, including serialization and CUDA-based detection decoding across feature maps. |
| yolov13/include/utils.h | Adds image preprocessing helpers, directory scanning, label loading, and small string utilities. |
| yolov13/include/types.h | Defines shared detection/affine structures and bbox element constants used across CPU/GPU code. |
| yolov13/include/preprocess.h | Declares the CUDA preprocessing API used by the C++ inference pipeline. |
| yolov13/include/postprocess.h | Declares CPU/GPU postprocessing, decode, NMS, and drawing interfaces. |
| yolov13/include/model.h | Declares the YOLOv13 TensorRT engine build function. |
| yolov13/include/macros.h | Adds export and TensorRT-compatibility macros for plugins and loggers. |
| yolov13/include/logging.h | Brings in TensorRT sample-style logging utilities and test helpers. |
| yolov13/include/cuda_utils.h | Adds a CUDA error-checking macro for all CUDA calls. |
| yolov13/include/config.h | Centralizes model/input configuration, thresholds, and build-time precision macros. |
| yolov13/include/calibrator.h | Declares the INT8 entropy calibrator class used in engine building. |
| yolov13/include/block.h | Declares all network construction primitives and plugin wiring used in model.cpp. |
| yolov13/gen_wts.py | Adds a tool to convert a PyTorch YOLOv13 .pt checkpoint into a .wts file for TensorRT building. |
| yolov13/CMakeLists.txt | Adds build configuration for the plugin library and the yolov13-det executable against CUDA, TensorRT, and OpenCV. |
```python
class YoLov13TRT(object):
    """
```
cuda.cuCtxCreate is being called with a None pointer as the first argument, but the CUDA Python driver API expects a (flags, device) signature (e.g. cuCtxCreate(0, dev)), so this call will fail at runtime with an argument error and prevent context creation. Update the call to match the CUDA Python API (omit the None parameter and pass just flags and device) so that the context is created correctly on TensorRT 10.
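A minimal sketch of the corrected call, assuming the cuda-python package's driver API (the `create_context` helper name is illustrative, and the import is guarded so the pattern is visible even where cuda-python is not installed):

```python
# Sketch: cuCtxCreate takes (flags, device) in cuda-python -- no leading
# None argument. Guarded import so the call pattern is inspectable anywhere.
try:
    from cuda import cuda  # cuda-python driver bindings
    HAVE_CUDA = True
except ImportError:
    HAVE_CUDA = False


def create_context(device_ordinal=0, flags=0):
    """Create a CUDA driver context with the (flags, device) signature."""
    (err,) = cuda.cuInit(0)
    err, device = cuda.cuDeviceGet(device_ordinal)
    # Correct for cuda-python: cuCtxCreate(flags, device).
    err, ctx = cuda.cuCtxCreate(flags, device)
    return ctx


if HAVE_CUDA:
    context = create_context()
```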
```cpp
nvinfer1::ILayer* DownsampleConv(nvinfer1::INetworkDefinition* network,
        std::map<std::string, nvinfer1::Weights> weightMap, nvinfer1::ITensor& input,
        int in_channels, std::string lname, bool channel_adjust = true);
```
DownsampleConv is declared twice with the same signature (lines 100-102 and again here with a default for channel_adjust), which redeclares the default argument and can cause a compilation error on some compilers. Remove one of the duplicate declarations or keep only a single declaration with the desired default value to avoid ODR/default-argument redefinition issues.
```cpp
#pragma once
#include <dirent.h>
#include <fstream>
#include <opencv2/opencv.hpp>
```
This header uses std::string, std::ostringstream, and std::unordered_map further down but only includes <dirent.h>, <fstream>, and OpenCV headers, relying on transitive includes for the C++ standard library types. To make this header self-contained and avoid brittle build issues, explicitly include <string>, <sstream>, and <unordered_map> here.
Suggested change:

```cpp
#include <opencv2/opencv.hpp>
#include <string>
#include <sstream>
#include <unordered_map>
#include <vector>
```
```cpp
#pragma once

#include <map>
```
This header declares functions that use std::vector (e.g. std::vector<cv::Mat>&) but does not include <vector>, instead depending on transitive includes from other headers. For a self-contained and portable header, add an explicit #include <vector> so that any translation unit including this header compiles reliably.
Suggested change:

```cpp
#include <map>
#include <vector>
```
```cpp
#pragma once

#include <opencv2/opencv.hpp>
#include "NvInfer.h"
#include "cuda_utils.h"
#include "types.h"
```
This header defines interfaces using std::vector (e.g. std::vector<Detection> and std::vector<cv::Mat>) but does not include <vector>, relying on indirect includes instead. Please add an explicit #include <vector> so the header is self-contained and does not depend on transitive standard library includes.
```python
import sys  # noqa: F401
```
Import of 'sys' is not used.

Suggested change: remove this line.

```python
import sys  # noqa: F401
```
```python
import time
import cv2
import numpy as np
import pycuda.autoinit  # noqa: F401
```
Import of 'pycuda' is not used.
No description provided.