
Segmentation Fault Loading YOLO v4 TensorRT Model with Triton #8509

@corentin87

Description

Environment:

  • Triton Inference Server version: 25.09
  • TensorRT version: 10.13.3
  • CUDA version: 13.0 (580.95.05 drivers)
  • GPU: A10 (g5 ec2 instance)

  • Docker container: nvcr.io/nvidia/tritonserver:25.09

I’m experiencing a segmentation fault when attempting to load a YOLO v4 TensorRT model in Triton. The crash occurs during model loading, while the TensorRT engine is being deserialized.

I trained a YOLOv4 model using TAO, which produced a .tlt file.

I exported the .tlt to .onnx with:
tao model yolo_v4 export -e config.yml -k rocketboots -m yolov4_cspdarknet53_epoch_120.tlt --output yolov4_cspdarknet53_epoch_120.onnx

I inspected the ONNX model with polygraphy:

polygraphy inspect model yolov4_cspdarknet53_epoch_120.onnx --verbosity info

[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored

[I] Loading model: /Users/corentinhamel/RocketBoots/Data/person/trainings/yolov4_cspdarknet53_epoch_120.onnx

[I] ==== ONNX Model ====
    Name: model_1 | ONNX Opset: 12

    ---- 1 Graph Input(s) ----
    {Input [dtype=float32, shape=('N', 3, 416, 416)]}

    ---- 4 Graph Output(s) ----
    {BatchedNMS [dtype=int32, shape=()],
     BatchedNMS_1 [dtype=float32, shape=()],
     BatchedNMS_2 [dtype=float32, shape=()],
     BatchedNMS_3 [dtype=float32, shape=()]}
    ---- 380 Initializer(s) ----
    ---- 364 Node(s) ----

I successfully converted the model from ONNX to a TensorRT engine inside the Docker container:

trtexec --onnx=models/tao-models/detect/person-fisheye/yolov4/non-qat/yolov4_cspdarknet53_epoch_120.onnx --fp16 --shapes=Input:1x3x416x416 --minShapes=Input:1x3x416x416 --optShapes=Input:1x3x416x416 --maxShapes=Input:1x3x416x416 --saveEngine=model_repository/yolo_v4/1/model.plan
[11/13/2025-03:17:16] [I] === ModelOptions ===

[11/13/2025-03:17:16] [I] Format: ONNX

[11/13/2025-03:17:16] [I] Model: models/tao-models/detect/person-fisheye/yolov4/non-qat/yolov4_cspdarknet53_epoch_120.onnx

[11/13/2025-03:17:16] [I] Output:

[11/13/2025-03:17:16] [I] === Build Options ===

[11/13/2025-03:17:16] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default

[11/13/2025-03:17:16] [I] avgTiming: 8

[11/13/2025-03:17:16] [I] Precision: FP32+FP16

[11/13/2025-03:17:16] [I] LayerPrecisions:

[11/13/2025-03:17:16] [I] Layer Device Types:

[11/13/2025-03:17:16] [I] Calibration:

[11/13/2025-03:17:16] [I] Refit: Disabled

[11/13/2025-03:17:16] [I] Strip weights: Disabled

[11/13/2025-03:17:16] [I] Version Compatible: Disabled

[11/13/2025-03:17:16] [I] ONNX Plugin InstanceNorm: Disabled

[11/13/2025-03:17:16] [I] ONNX kENABLE_UINT8_AND_ASYMMETRIC_QUANTIZATION_DLA flag: Disabled

[11/13/2025-03:17:16] [I] TensorRT runtime: full

[11/13/2025-03:17:16] [I] Lean DLL Path:

[11/13/2025-03:17:16] [I] Tempfile Controls: { in_memory: allow, temporary: allow }

[11/13/2025-03:17:16] [I] Exclude Lean Runtime: Disabled

[11/13/2025-03:17:16] [I] Sparsity: Disabled

[11/13/2025-03:17:16] [I] Safe mode: Disabled

[11/13/2025-03:17:16] [I] Build DLA standalone loadable: Disabled

[11/13/2025-03:17:16] [I] Allow GPU fallback for DLA: Disabled

[11/13/2025-03:17:16] [I] DirectIO mode: Disabled

[11/13/2025-03:17:16] [I] Restricted mode: Disabled

[11/13/2025-03:17:16] [I] Skip inference: Disabled

[11/13/2025-03:17:16] [I] Save engine: model_repository/yolo_v4/1/model.plan

[11/13/2025-03:17:16] [I] Load engine:

[11/13/2025-03:17:16] [I] Profiling verbosity: 0

[11/13/2025-03:17:16] [I] Tactic sources: Using default tactic sources

[11/13/2025-03:17:16] [I] timingCacheMode: local

[11/13/2025-03:17:16] [I] timingCacheFile:

[11/13/2025-03:17:16] [I] Enable Compilation Cache: Enabled

[11/13/2025-03:17:16] [I] Enable Monitor Memory: Disabled

[11/13/2025-03:17:16] [I] errorOnTimingCacheMiss: Disabled

[11/13/2025-03:17:16] [I] Preview Features: Use default preview flags.

[11/13/2025-03:17:16] [I] MaxAuxStreams: -1

[11/13/2025-03:17:16] [I] BuilderOptimizationLevel: -1

[11/13/2025-03:17:16] [I] MaxTactics: -1

[11/13/2025-03:17:16] [I] Calibration Profile Index: 0

[11/13/2025-03:17:16] [I] Weight Streaming: Disabled

[11/13/2025-03:17:16] [I] Runtime Platform: Same As Build

[11/13/2025-03:17:16] [I] Debug Tensors:

[11/13/2025-03:17:16] [I] Distributive Independence: Disabled

[11/13/2025-03:17:16] [I] Mark Unfused Tensors As Debug Tensors: Disabled

[11/13/2025-03:17:16] [I] Input(s)s format: fp32:CHW

[11/13/2025-03:17:16] [I] Output(s)s format: fp32:CHW

[11/13/2025-03:17:16] [I] Input build shape (profile 0): Input=1x3x416x416+1x3x416x416+1x3x416x416

[11/13/2025-03:17:16] [I] Input calibration shapes: model

[11/13/2025-03:17:16] [I] === System Options ===

[11/13/2025-03:17:16] [I] Device: 0

[11/13/2025-03:17:16] [I] DLACore:

[11/13/2025-03:17:16] [I] Plugins:

[11/13/2025-03:17:16] [I] setPluginsToSerialize:

[11/13/2025-03:17:16] [I] dynamicPlugins:

[11/13/2025-03:17:16] [I] ignoreParsedPluginLibs: 0

[11/13/2025-03:17:16] [I]

[11/13/2025-03:17:16] [I] === Inference Options ===

[11/13/2025-03:17:16] [I] Batch: Explicit

[11/13/2025-03:17:16] [I] Input inference shape : Input=1x3x416x416

[11/13/2025-03:17:16] [I] Iterations: 10

[11/13/2025-03:17:16] [I] Duration: 3s (+ 200ms warm up)

[11/13/2025-03:17:16] [I] Sleep time: 0ms

[11/13/2025-03:17:16] [I] Idle time: 0ms

[11/13/2025-03:17:16] [I] Inference Streams: 1

[11/13/2025-03:17:16] [I] ExposeDMA: Disabled

[11/13/2025-03:17:16] [I] Data transfers: Enabled

[11/13/2025-03:17:16] [I] Spin-wait: Disabled

[11/13/2025-03:17:16] [I] Multithreading: Disabled

[11/13/2025-03:17:16] [I] CUDA Graph: Disabled

[11/13/2025-03:17:16] [I] Separate profiling: Disabled

[11/13/2025-03:17:16] [I] Time Deserialize: Disabled

[11/13/2025-03:17:16] [I] Time Refit: Disabled

[11/13/2025-03:17:16] [I] NVTX verbosity: 0

[11/13/2025-03:17:16] [I] Persistent Cache Ratio: 0

[11/13/2025-03:17:16] [I] Optimization Profile Index: 0

[11/13/2025-03:17:16] [I] Weight Streaming Budget: 100.000000%

[11/13/2025-03:17:16] [I] Inputs:

[11/13/2025-03:17:16] [I] Debug Tensor Save Destinations:

[11/13/2025-03:17:16] [I] Dump All Debug Tensor in Formats:

[11/13/2025-03:17:16] [I] === Reporting Options ===

[11/13/2025-03:17:16] [I] Verbose: Disabled

[11/13/2025-03:17:16] [I] Averages: 10 inferences

[11/13/2025-03:17:16] [I] Percentiles: 90,95,99

[11/13/2025-03:17:16] [I] Dump refittable layers:Disabled

[11/13/2025-03:17:16] [I] Dump output: Disabled

[11/13/2025-03:17:16] [I] Profile: Disabled

[11/13/2025-03:17:16] [I] Export timing to JSON file:

[11/13/2025-03:17:16] [I] Export output to JSON file:

[11/13/2025-03:17:16] [I] Export profile to JSON file:

[11/13/2025-03:17:16] [I]

[11/13/2025-03:17:16] [I] === Device Information ===

[11/13/2025-03:17:16] [I] Available Devices:

[11/13/2025-03:17:16] [I]   Device 0: "NVIDIA A10G" UUID: GPU-29a7269d-f1f3-ee26-61f1-86fb7e1d840f

[11/13/2025-03:17:16] [I] Selected Device: NVIDIA A10G

[11/13/2025-03:17:16] [I] Selected Device ID: 0

[11/13/2025-03:17:16] [I] Selected Device UUID: GPU-29a7269d-f1f3-ee26-61f1-86fb7e1d840f

[11/13/2025-03:17:16] [I] Compute Capability: 8.6

[11/13/2025-03:17:16] [I] SMs: 80

[11/13/2025-03:17:16] [I] Device Global Memory: 22587 MiB

[11/13/2025-03:17:16] [I] Shared Memory per SM: 100 KiB

[11/13/2025-03:17:16] [I] Memory Bus Width: 384 bits (ECC enabled)

[11/13/2025-03:17:16] [I] Application Compute Clock Rate: 1.71 GHz

[11/13/2025-03:17:16] [I] Application Memory Clock Rate: 6.251 GHz

[11/13/2025-03:17:16] [I]

[11/13/2025-03:17:16] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.

[11/13/2025-03:17:16] [I]

[11/13/2025-03:17:16] [I] TensorRT version: 10.13.3

[11/13/2025-03:17:16] [I] Loading standard plugins

[11/13/2025-03:17:16] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 28, GPU 258 (MiB)

[11/13/2025-03:17:17] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +990, GPU +8, now: CPU 1220, GPU 266 (MiB)

[11/13/2025-03:17:17] [I] Start parsing network model.

[11/13/2025-03:17:18] [I] [TRT] ----------------------------------------------------------------

[11/13/2025-03:17:18] [I] [TRT] Input filename:   models/tao-models/detect/person-fisheye/yolov4/non-qat/yolov4_cspdarknet53_epoch_120.onnx

[11/13/2025-03:17:18] [I] [TRT] ONNX IR version:  0.0.8

[11/13/2025-03:17:18] [I] [TRT] Opset version:    12

[11/13/2025-03:17:18] [I] [TRT] Producer name:    keras2onnx

[11/13/2025-03:17:18] [I] [TRT] Producer version: 1.13.0

[11/13/2025-03:17:18] [I] [TRT] Domain:           

[11/13/2025-03:17:18] [I] [TRT] Model version:    0

[11/13/2025-03:17:18] [I] [TRT] Doc string:       

[11/13/2025-03:17:18] [I] [TRT] ----------------------------------------------------------------

[11/13/2025-03:17:18] [I] [TRT] Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace:

[11/13/2025-03:17:18] [W] [TRT] onnxOpImporters.cpp:6854: Attribute caffeSemantics not found in plugin node! Ensure that the plugin creator has a default value defined or the engine may fail to build.

[11/13/2025-03:17:18] [W] [TRT] BatchedNMSPlugin is deprecated since TensorRT 9.0. Use INetworkDefinition::addNMS() to add an INMSLayer OR use EfficientNMS plugin.

[11/13/2025-03:17:18] [I] [TRT] Successfully created plugin: BatchedNMSDynamic_TRT

[11/13/2025-03:17:18] [I] Finished parsing network model. Parse time: 0.267619

[11/13/2025-03:17:18] [I] Set shape of input tensor Input for optimization profile 0 to: MIN=1x3x416x416 OPT=1x3x416x416 MAX=1x3x416x416

[11/13/2025-03:17:18] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.

[11/13/2025-03:25:52] [I] [TRT] Detected 1 inputs and 4 output network tensors.

[11/13/2025-03:25:57] [I] [TRT] Total Host Persistent Memory: 586288 bytes

[11/13/2025-03:25:57] [I] [TRT] Total Device Persistent Memory: 0 bytes

[11/13/2025-03:25:57] [I] [TRT] Max Scratch Memory: 303360 bytes

[11/13/2025-03:25:57] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 183 steps to complete.

[11/13/2025-03:25:57] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 13.9592ms to assign 8 blocks to 183 nodes requiring 30985728 bytes.

[11/13/2025-03:25:57] [I] [TRT] Total Activation Memory: 30985216 bytes

[11/13/2025-03:25:57] [I] [TRT] Total Weights Memory: 99032456 bytes

[11/13/2025-03:25:58] [I] [TRT] Engine generation completed in 519.787 seconds.

[11/13/2025-03:25:58] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 169 MiB

[11/13/2025-03:25:58] [I] Created engine with size: 97.436 MiB

[11/13/2025-03:25:58] [I] Engine built in 520.137 sec.

[11/13/2025-03:25:58] [I] [TRT] Loaded engine size: 97 MiB

[11/13/2025-03:25:58] [W] [TRT] BatchedNMSPlugin is deprecated since TensorRT 9.0. Use INetworkDefinition::addNMS() to add an INMSLayer OR use EfficientNMS plugin.

[11/13/2025-03:25:58] [I] Engine deserialized in 0.106328 sec.

[11/13/2025-03:25:58] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +29, now: CPU 0, GPU 123 (MiB)

[11/13/2025-03:25:58] [I] Setting persistentCacheLimit to 0 bytes.

[11/13/2025-03:25:58] [I] Created execution context with device memory size: 29.5498 MiB

[11/13/2025-03:25:58] [I] Using random values for input Input

[11/13/2025-03:25:58] [I] Input binding for Input with dimensions 1x3x416x416 is created.

[11/13/2025-03:25:58] [I] Output binding for BatchedNMS with dimensions 1x1 is created.

[11/13/2025-03:25:58] [I] Output binding for BatchedNMS_1 with dimensions 1x200x4 is created.

[11/13/2025-03:25:58] [I] Output binding for BatchedNMS_2 with dimensions 1x200 is created.

[11/13/2025-03:25:58] [I] Output binding for BatchedNMS_3 with dimensions 1x200 is created.

[11/13/2025-03:25:58] [I] Starting inference

[11/13/2025-03:26:02] [I] Warmup completed 87 queries over 200 ms

[11/13/2025-03:26:02] [I] Timing trace has 1308 queries over 3.00766 s

[11/13/2025-03:26:02] [I]

[11/13/2025-03:26:02] [I]

[11/13/2025-03:26:02] [I] === Performance summary ===

[11/13/2025-03:26:02] [I] Throughput: 434.89 qps

[11/13/2025-03:26:02] [I] Latency: min = 2.4563 ms, max = 2.53491 ms, mean = 2.4678 ms, median = 2.46696 ms, percentile(90%) = 2.47156 ms, percentile(95%) = 2.4729 ms, percentile(99%) = 2.5105 ms

[11/13/2025-03:26:02] [I] Enqueue Time: min = 0.745117 ms, max = 0.954224 ms, mean = 0.859333 ms, median = 0.869019 ms, percentile(90%) = 0.893555 ms, percentile(95%) = 0.898865 ms, percentile(99%) = 0.915771 ms

[11/13/2025-03:26:02] [I] H2D Latency: min = 0.159668 ms, max = 0.230713 ms, mean = 0.163629 ms, median = 0.162842 ms, percentile(90%) = 0.165039 ms, percentile(95%) = 0.165405 ms, percentile(99%) = 0.204346 ms

[11/13/2025-03:26:02] [I] GPU Compute Time: min = 2.28442 ms, max = 2.32056 ms, mean = 2.29515 ms, median = 2.2948 ms, percentile(90%) = 2.29883 ms, percentile(95%) = 2.2999 ms, percentile(99%) = 2.30402 ms

[11/13/2025-03:26:02] [I] D2H Latency: min = 0.00732422 ms, max = 0.0373535 ms, mean = 0.00900924 ms, median = 0.00891113 ms, percentile(90%) = 0.0098877 ms, percentile(95%) = 0.0103302 ms, percentile(99%) = 0.0107422 ms

[11/13/2025-03:26:02] [I] Total Host Walltime: 3.00766 s

[11/13/2025-03:26:02] [I] Total GPU Compute Time: 3.00206 s

[11/13/2025-03:26:02] [I] Explanations of the performance metrics are printed in the verbose logs.

[11/13/2025-03:26:02] [I]

&&&& PASSED TensorRT.trtexec [TensorRT v101303] [b9] # trtexec --onnx=models/tao-models/detect/person-fisheye/yolov4/non-qat/yolov4_cspdarknet53_epoch_120.onnx --fp16 --shapes=Input:1x3x416x416 --minShapes=Input:1x3x416x416 --optShapes=Input:1x3x416x416 --maxShapes=Input:1x3x416x416 --saveEngine=model_repository/yolo_v4/1/model.plan

I inspected the TensorRT engine with polygraphy:

polygraphy inspect model model_repository/yolo_v4/1/model.plan

[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored

[I] Loading bytes from /etc/rocketboots/model_repository/yolo_v4/1/model.plan

[W] BatchedNMSPlugin is deprecated since TensorRT 9.0. Use INetworkDefinition::addNMS() to add an INMSLayer OR use EfficientNMS plugin.

[W] hasImplicitBatchDimension is deprecated and always return false.

[I] ==== TensorRT Engine ====

    Name: Unnamed Network 0 | Explicit Batch Engine

    

    ---- 1 Engine Input(s) ----

    {Input [dtype=float32, shape=(1, 3, 416, 416)]}

    ---- 4 Engine Output(s) ----

    {BatchedNMS [dtype=int32, shape=(1, 1)],
     BatchedNMS_1 [dtype=float32, shape=(1, 200, 4)],
     BatchedNMS_2 [dtype=float32, shape=(1, 200)],
     BatchedNMS_3 [dtype=float32, shape=(1, 200)]}


    ---- Memory ----

    Device Memory: 30985216 bytes

    ---- 1 Profile(s) (5 Tensor(s) Each) ----

    - Profile: 0

        Tensor: Input                 (Input), Index: 0 | Shapes: min=(1, 3, 416, 416), opt=(1, 3, 416, 416), max=(1, 3, 416, 416)

        Tensor: BatchedNMS           (Output), Index: 1 | Shape: (1, 1)

        Tensor: BatchedNMS_1         (Output), Index: 2 | Shape: (1, 200, 4)

        Tensor: BatchedNMS_2         (Output), Index: 3 | Shape: (1, 200)

        Tensor: BatchedNMS_3         (Output), Index: 4 | Shape: (1, 200)

    

    ---- 210 Layer(s) ----
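
To help isolate the problem, a minimal standalone deserialization that mimics what Triton's TensorRT backend does at load time might look like the sketch below. It assumes the `tensorrt` Python wheel shipped in the same 25.09 container; the plan path is the one used above.

```python
# Sketch: deserialize the same model.plan with plain TensorRT, outside Triton.
# If this also segfaults, the problem lies in the engine/plugin itself rather
# than in Triton's backend.

PLAN = "model_repository/yolo_v4/1/model.plan"


def deserialize_engine(plan_path):
    # Lazy import so the script can be read on machines without the wheel;
    # inside the container, `import tensorrt` works at top level too.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.VERBOSE)
    # Triton registers the bundled plugins at backend startup;
    # BatchedNMSDynamic_TRT must be registered before deserialization,
    # so do the same here.
    trt.init_libnvinfer_plugins(logger, "")

    with open(plan_path, "rb") as f:
        data = f.read()
    runtime = trt.Runtime(logger)
    return runtime.deserialize_cuda_engine(data)


if __name__ == "__main__":
    engine = deserialize_engine(PLAN)
    print("deserialized OK" if engine is not None else "deserialization failed")
```

If this small script reproduces the crash, gdb on it gives a much shorter backtrace than attaching to the server; otherwise, running `gdb -ex run -ex bt --args tritonserver ...` with the flags used below would show which frame in the backend faults.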

Finally, I load the model with Triton and get the segmentation fault, but I can’t figure out why:

tritonserver --model-store model_repository/ --model-control-mode explicit --load-model yolo_v4 --log-verbose 4

I1113 03:53:12.521517 3463 cache_manager.cc:480] "Create CacheManager with cache_dir: '/opt/tritonserver/caches'"

I1113 03:53:12.940603 3463 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x726d56000000' with size 268435456"

I1113 03:53:12.940681 3463 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"

I1113 03:53:12.941579 3463 model_config_utils.cc:753] "Server side auto-completed config: "

name: "yolo_v4"

platform: "tensorrt_plan"

default_model_filename: "model.plan"

backend: "tensorrt"

I1113 03:53:12.942149 3463 model_lifecycle.cc:442] "AsyncLoad() 'yolo_v4'"

I1113 03:53:12.942224 3463 model_lifecycle.cc:473] "loading: yolo_v4:1"

I1113 03:53:12.942311 3463 model_lifecycle.cc:552] "CreateModel() 'yolo_v4' version 1"

I1113 03:53:12.942404 3463 backend_model.cc:505] "Adding default backend config setting: default-max-batch-size,4"

I1113 03:53:12.942475 3463 shared_library.cc:149] "OpenLibraryHandle: /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so"

I1113 03:53:12.991502 3463 tensorrt.cc:65] "TRITONBACKEND_Initialize: tensorrt"

I1113 03:53:12.991541 3463 tensorrt.cc:75] "Triton TRITONBACKEND API version: 1.19"

I1113 03:53:12.991549 3463 tensorrt.cc:81] "'tensorrt' TRITONBACKEND API version: 1.19"

I1113 03:53:12.991557 3463 tensorrt.cc:105] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"

I1113 03:53:12.991581 3463 tensorrt.cc:187] "Registering TensorRT Plugins"

I1113 03:53:12.991617 3463 logging.cc:49] "Registered plugin creator - ::ROIAlign_TRT version 2"

I1113 03:53:12.991636 3463 logging.cc:49] "Registered plugin creator - ::CropAndResizeDynamic version 2"

I1113 03:53:12.991656 3463 logging.cc:49] "Registered plugin creator - ::InstanceNormalization_TRT version 3"

I1113 03:53:12.991667 3463 logging.cc:49] "Registered plugin creator - ::ScatterElements version 2"

I1113 03:53:12.991682 3463 logging.cc:49] "Registered plugin creator - ::MultiscaleDeformableAttnPlugin_TRT version 2"

I1113 03:53:12.991696 3463 logging.cc:49] "Registered plugin creator - ::ModulatedDeformConv2d version 2"

I1113 03:53:12.991709 3463 logging.cc:49] "Registered plugin creator - ::BatchedNMSDynamic_TRT version 1"

I1113 03:53:12.991722 3463 logging.cc:49] "Registered plugin creator - ::BatchedNMS_TRT version 1"

I1113 03:53:12.991741 3463 logging.cc:49] "Registered plugin creator - ::BatchTilePlugin_TRT version 1"

I1113 03:53:12.991752 3463 logging.cc:49] "Registered plugin creator - ::Clip_TRT version 1"

I1113 03:53:12.991769 3463 logging.cc:49] "Registered plugin creator - ::CoordConvAC version 1"

I1113 03:53:12.991787 3463 logging.cc:49] "Registered plugin creator - ::CropAndResizeDynamic version 1"

I1113 03:53:12.991796 3463 logging.cc:49] "Registered plugin creator - ::CropAndResize version 1"

I1113 03:53:12.991810 3463 logging.cc:49] "Registered plugin creator - ::DecodeBbox3DPlugin version 1"

I1113 03:53:12.991818 3463 logging.cc:49] "Registered plugin creator - ::DetectionLayer_TRT version 1"

I1113 03:53:12.991834 3463 logging.cc:49] "Registered plugin creator - ::EfficientNMS_Explicit_TF_TRT version 1"

I1113 03:53:12.991860 3463 logging.cc:49] "Registered plugin creator - ::EfficientNMS_Implicit_TF_TRT version 1"

I1113 03:53:12.991871 3463 logging.cc:49] "Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1"

I1113 03:53:12.991904 3463 logging.cc:49] "Registered plugin creator - ::EfficientNMS_TRT version 1"

I1113 03:53:12.991920 3463 logging.cc:49] "Registered plugin creator - ::FlattenConcat_TRT version 1"

I1113 03:53:12.991938 3463 logging.cc:49] "Registered plugin creator - ::GenerateDetection_TRT version 1"

I1113 03:53:12.991948 3463 logging.cc:49] "Registered plugin creator - ::GridAnchor_TRT version 1"

I1113 03:53:12.991955 3463 logging.cc:49] "Registered plugin creator - ::GridAnchorRect_TRT version 1"

I1113 03:53:12.991975 3463 logging.cc:49] "Registered plugin creator - ::InstanceNormalization_TRT version 1"

I1113 03:53:12.991992 3463 logging.cc:49] "Registered plugin creator - ::InstanceNormalization_TRT version 2"

I1113 03:53:12.992009 3463 logging.cc:49] "Registered plugin creator - ::LReLU_TRT version 1"

I1113 03:53:12.992021 3463 logging.cc:49] "Registered plugin creator - ::ModulatedDeformConv2d version 1"

I1113 03:53:12.992030 3463 logging.cc:49] "Registered plugin creator - ::MultilevelCropAndResize_TRT version 1"

I1113 03:53:12.992039 3463 logging.cc:49] "Registered plugin creator - ::MultilevelProposeROI_TRT version 1"

I1113 03:53:12.992059 3463 logging.cc:49] "Registered plugin creator - ::MultiscaleDeformableAttnPlugin_TRT version 1"

I1113 03:53:12.992073 3463 logging.cc:49] "Registered plugin creator - ::NMSDynamic_TRT version 1"

I1113 03:53:12.992088 3463 logging.cc:49] "Registered plugin creator - ::NMS_TRT version 1"

I1113 03:53:12.992106 3463 logging.cc:49] "Registered plugin creator - ::Normalize_TRT version 1"

I1113 03:53:12.992122 3463 logging.cc:49] "Registered plugin creator - ::PillarScatterPlugin version 1"

I1113 03:53:12.992138 3463 logging.cc:49] "Registered plugin creator - ::PriorBox_TRT version 1"

I1113 03:53:12.992151 3463 logging.cc:49] "Registered plugin creator - ::ProposalDynamic version 1"

I1113 03:53:12.992160 3463 logging.cc:49] "Registered plugin creator - ::ProposalLayer_TRT version 1"

I1113 03:53:12.992176 3463 logging.cc:49] "Registered plugin creator - ::Proposal version 1"

I1113 03:53:12.992193 3463 logging.cc:49] "Registered plugin creator - ::PyramidROIAlign_TRT version 1"

I1113 03:53:12.992209 3463 logging.cc:49] "Registered plugin creator - ::Region_TRT version 1"

I1113 03:53:12.992222 3463 logging.cc:49] "Registered plugin creator - ::Reorg_TRT version 2"

I1113 03:53:12.992230 3463 logging.cc:49] "Registered plugin creator - ::Reorg_TRT version 1"

I1113 03:53:12.992239 3463 logging.cc:49] "Registered plugin creator - ::ResizeNearest_TRT version 1"

I1113 03:53:12.992247 3463 logging.cc:49] "Registered plugin creator - ::ROIAlign_TRT version 1"

I1113 03:53:12.992255 3463 logging.cc:49] "Registered plugin creator - ::RPROI_TRT version 1"

I1113 03:53:12.992263 3463 logging.cc:49] "Registered plugin creator - ::ScatterElements version 1"

I1113 03:53:12.992271 3463 logging.cc:49] "Registered plugin creator - ::ScatterND version 1"

I1113 03:53:12.992299 3463 logging.cc:49] "Registered plugin creator - ::SpecialSlice_TRT version 1"

I1113 03:53:12.992310 3463 logging.cc:49] "Registered plugin creator - ::Split version 1"

I1113 03:53:12.992318 3463 logging.cc:49] "Registered plugin creator - ::VoxelGeneratorPlugin version 1"

I1113 03:53:12.992364 3463 tensorrt.cc:231] "TRITONBACKEND_ModelInitialize: yolo_v4 (version 1)"

I1113 03:53:12.993118 3463 model_config_utils.cc:1986] "ModelConfig 64-bit fields:"

I1113 03:53:12.993139 3463 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::default_priority_level"

I1113 03:53:12.993144 3463 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds"

I1113 03:53:12.993149 3463 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::max_queue_delay_microseconds"

I1113 03:53:12.993154 3463 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::priority_levels"

I1113 03:53:12.993162 3463 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::priority_queue_policy::key"

I1113 03:53:12.993176 3463 model_config_utils.cc:1988] "\tModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds"

I1113 03:53:12.993182 3463 model_config_utils.cc:1988] "\tModelConfig::ensemble_scheduling::step::model_version"

I1113 03:53:12.993188 3463 model_config_utils.cc:1988] "\tModelConfig::input::dims"

I1113 03:53:12.993194 3463 model_config_utils.cc:1988] "\tModelConfig::input::reshape::shape"

I1113 03:53:12.993200 3463 model_config_utils.cc:1988] "\tModelConfig::instance_group::secondary_devices::device_id"

I1113 03:53:12.993207 3463 model_config_utils.cc:1988] "\tModelConfig::model_warmup::inputs::value::dims"

I1113 03:53:12.993212 3463 model_config_utils.cc:1988] "\tModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim"

I1113 03:53:12.993220 3463 model_config_utils.cc:1988] "\tModelConfig::optimization::cuda::graph_spec::input::value::dim"

I1113 03:53:12.993233 3463 model_config_utils.cc:1988] "\tModelConfig::output::dims"

I1113 03:53:12.993239 3463 model_config_utils.cc:1988] "\tModelConfig::output::reshape::shape"

I1113 03:53:12.993245 3463 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::direct::max_queue_delay_microseconds"

I1113 03:53:12.993261 3463 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::max_sequence_idle_microseconds"

I1113 03:53:12.993270 3463 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::oldest::max_queue_delay_microseconds"

I1113 03:53:12.993285 3463 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::state::dims"

I1113 03:53:12.993302 3463 model_config_utils.cc:1988] "\tModelConfig::sequence_batching::state::initial_state::dims"

I1113 03:53:12.993311 3463 model_config_utils.cc:1988] "\tModelConfig::version_policy::specific::versions"

I1113 03:53:12.993458 3463 model_state.cc:355] "Setting the CUDA device to GPU0 to auto-complete config for yolo_v4"

I1113 03:53:12.993485 3463 model_state.cc:401] "Using explicit serialized file 'model.plan' to auto-complete config for yolo_v4"

I1113 03:53:13.121548 3463 logging.cc:46] "Loaded engine size: 97 MiB"

I1113 03:53:13.170375 3463 logging.cc:49] "Local registry did not find BatchedNMSDynamic_TRT creator. Will try parent registry if enabled."

I1113 03:53:13.170415 3463 logging.cc:49] "Global registry found BatchedNMSDynamic_TRT creator."

Segmentation fault (core dumped) 

I am confused why the model works perfectly with trtexec (successful conversion, passing inference test) but segfaults specifically when loaded by Triton’s TensorRT backend. Both use the same TensorRT version (the same running container) and the same .plan file.
I’m not sure whether the error could come from the BatchedNMS plugin. I’ve tried replacing it with EfficientNMS, but that is also reported as deprecated for TensorRT >= 9.

My config.pbtxt is as follows:

name: "yolo_v4"
backend: "tensorrt"
max_batch_size: 0
input [
  {
    name: "Input"
    data_type: TYPE_FP32
    dims: [ 3, 416, 416 ]
  }
]

output [
  {
    name: "BatchedNMS"
    data_type: TYPE_INT32
    dims: [ 1 ]
  },
  {
    name: "BatchedNMS_1"
    data_type: TYPE_FP32
    dims: [ 200, 4 ]
  },
  {
    name: "BatchedNMS_2"
    data_type: TYPE_FP32
    dims: [ 200 ]
  },
  {
    name: "BatchedNMS_3"
    data_type: TYPE_FP32
    dims: [ 200 ]
  }
]
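
If it matters: since max_batch_size is 0, I believe Triton expects dims to describe the full tensor shape, including the leading batch dimension, so a config matching the engine bindings reported by polygraphy above would presumably look like the sketch below. This may be unrelated to the segfault, which occurs before any shape validation.

```
name: "yolo_v4"
backend: "tensorrt"
max_batch_size: 0
input [
  {
    name: "Input"
    data_type: TYPE_FP32
    dims: [ 1, 3, 416, 416 ]
  }
]
output [
  {
    name: "BatchedNMS"
    data_type: TYPE_INT32
    dims: [ 1, 1 ]
  },
  {
    name: "BatchedNMS_1"
    data_type: TYPE_FP32
    dims: [ 1, 200, 4 ]
  },
  {
    name: "BatchedNMS_2"
    data_type: TYPE_FP32
    dims: [ 1, 200 ]
  },
  {
    name: "BatchedNMS_3"
    data_type: TYPE_FP32
    dims: [ 1, 200 ]
  }
]
```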

Any suggestions for debugging or workarounds would be appreciated!
Thanks, Corentin
