Description
Environment:
- NVIDIA Jetson Orin NX 8GB
- DeepStream 7.1
- CUDA 12.6
- JetPack 6.1
- Linux 5.15.148-tegra
- TensorRT 10.3.0
- ONNX models opset version: 17
I'm running a DeepStream Python project with 3 YOLOv11 models (pgie → sgie1 → sgie2). The TensorRT engine builds successfully for the last model, but the application crashes with a memory corruption error immediately after the conversion.
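For reference, the three nvinfer stages are chained roughly like this (a minimal sketch; the pgie/sgie1 element and config file names are placeholders, and the real pipeline also has sources, a streammux, an OSD, and a sink):

```python
# Minimal sketch of the three-stage nvinfer chain. Only "tertiary-inference"
# and config_detect_ocr.txt appear in my logs/configs; the rest are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.Pipeline.new("pipeline")

pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")     # gie-unique-id=1
sgie1 = Gst.ElementFactory.make("nvinfer", "secondary-inference")  # gie-unique-id=2, operates on UID 1
sgie2 = Gst.ElementFactory.make("nvinfer", "tertiary-inference")   # gie-unique-id=3, operates on UID 2

pgie.set_property("config-file-path", "config_detect_plate.txt")   # placeholder name
sgie1.set_property("config-file-path", "config_detect_chars.txt")  # placeholder name
sgie2.set_property("config-file-path", "config_detect_ocr.txt")

for elem in (pgie, sgie1, sgie2):
    pipeline.add(elem)
pgie.link(sgie1)
sgie1.link(sgie2)
```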
Setting min object dimensions as 16x16 instead of 1x1 to support VIC compute mode.
WARNING: Deserialize engine failed because file path: deepstream/models/ocr_dataset11.2_yolo11n_256.engine open error
0:00:00.190058720 42706 0xaaaaf71f1e40 WARN nvinfer gstnvinfer.cpp:681:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2080> [UID = 3]: deserialize engine from file :deepstream/models/ocr_dataset11.2_yolo11n_256.engine failed
0:00:00.190096480 42706 0xaaaaf71f1e40 WARN nvinfer gstnvinfer.cpp:681:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2185> [UID = 3]: deserialize backend context from engine from file :deepstream/models/ocr_dataset11.2_yolo11n_256.engine failed, try rebuild
0:00:00.190111552 42706 0xaaaaf71f1e40 INFO nvinfer gstnvinfer.cpp:684:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2106> [UID = 3]: Trying to create engine from model files
Building the TensorRT Engine
Building complete
0:07:07.375651872 42706 0xaaaaf71f1e40 INFO nvinfer gstnvinfer.cpp:684:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2138> [UID = 3]: serialize cuda engine to file: /home/ariapa/projects/isss-plate/src/model_b1_gpu0_fp16.engine successfully
Implicit layer support has been deprecated
INFO: [Implicit Engine Info]: layers num: 0
malloc_consolidate(): unaligned fastbin chunk detected
Now that ocr_dataset11.2_yolo11n_256.engine is built, when I run the project again, I get:
Setting min object dimensions as 16x16 instead of 1x1 to support VIC compute mode.
0:00:00.280638240 46412 0xaaaae4509040 INFO nvinfer gstnvinfer.cpp:684:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2092> [UID = 3]: deserialized trt engine from :deepstream/models/ocr_dataset11.2_yolo11n_256.engine
Implicit layer support has been deprecated
INFO: [Implicit Engine Info]: layers num: 0
0:00:00.280735072 46412 0xaaaae4509040 INFO nvinfer gstnvinfer.cpp:684:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2195> [UID = 3]: Use deserialized engine model: deepstream/models/ocr_dataset11.2_yolo11n_256.engine
malloc(): unaligned tcache chunk detected
When I comment out the sgie2 model, the same error occurs on whichever model loads last (sgie1 in that case).
Here's the configuration file for sgie2 (config_detect_ocr.txt):
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=../models/ocr_dataset11.2_yolo11n_256.pt.onnx
labelfile-path=../models/ocr_dataset11.2_yolo11n_256.txt
batch-size=1
network-mode=2
num-detected-classes=37
interval=0
gie-unique-id=3
operate-on-gie-id=2
network-type=0
cluster-mode=2
# workspace-size=2000
parse-bbox-func-name=NvDsInferParseYolo
# parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet
process-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
infer-dims=3;256;256
[class-attrs-all]
nms-iou-threshold=0.5
pre-cluster-threshold=0.5
topk=300
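For context, the three configs chain through gie-unique-id / operate-on-gie-id; the pgie and sgie1 values below are inferred from the sgie2 config above, not copied from my actual files:

```ini
# pgie config (inferred)
gie-unique-id=1

# sgie1 config (inferred)
gie-unique-id=2
operate-on-gie-id=1

# sgie2 config (config_detect_ocr.txt, shown above)
gie-unique-id=3
operate-on-gie-id=2
```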
What I've tried: switching between NvDsInferParseYolo and NvDsInferParseYoloCuda, and adjusting the infer-dims parameter; both produced the same malloc error.
I checked memory with free -h and it wasn't filling up; I also have 4 GB of swap, so the issue shouldn't be insufficient memory.
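For completeness, this is roughly how I watched memory while the app started (the 1-second interval is arbitrary):

```sh
# Poll overall RAM/swap once a second while the pipeline starts.
watch -n 1 free -h
# Jetson-specific alternative that also reports GPU/carveout usage:
sudo tegrastats
```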
I have also built nvdsinfer_custom_impl_Yolo on the Jetson itself, but just copied the ONNX model files over to the Jetson from my dGPU computer.
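The build step was roughly the one from the repo README (CUDA_VER matching my JetPack 6.1 install):

```sh
# Rebuild the custom YOLO parser library natively on the Jetson.
export CUDA_VER=12.6
make -C nvdsinfer_custom_impl_Yolo clean
make -C nvdsinfer_custom_impl_Yolo
```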
It's worth noting that when I try to convert the .pt files to ONNX on the Jetson with the same command I used on the dGPU computer, I get the following error:
$ python export_yolo11.py -w ocr_dataset11.2_yolo11n_256.pt -s 256 --dynamic
Starting: ocr_dataset11.2_yolo11n_256.pt
Opening YOLO11 model
YOLO11n summary (fused): 100 layers, 2,589,367 parameters, 0 gradients, 6.4 GFLOPs
Creating labels.txt file
Exporting the model to ONNX
W1123 18:41:52.682000 39657 torch/onnx/_internal/exporter/_compat.py:114] Setting ONNX exporter to use operator set version 18 because the requested opset_version 17 is a lower version than we have implementations for. Automatic version conversion will be performed, which may not be successful at converting to the requested version. If version conversion is unsuccessful, the opset version of the exported model will be kept at 18. Please consider setting opset_version >=18 to leverage latest ONNX features
Traceback (most recent call last):
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/onnx/_internal/exporter/_core.py", line 1416, in export
decomposed_program = _prepare_exported_program_for_export(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/onnx/_internal/exporter/_core.py", line 980, in _prepare_exported_program_for_export
exported_program = _fx_passes.decompose_with_registry(exported_program, registry)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/onnx/_internal/exporter/_fx_passes.py", line 19, in decompose_with_registry
return exported_program.run_decompositions(decomp_table)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/export/exported_program.py", line 124, in wrapper
return fn(*args, **kwargs)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/export/exported_program.py", line 1484, in run_decompositions
return _decompose_exported_program(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/export/exported_program.py", line 967, in _decompose_exported_program
) = _decompose_and_get_gm_with_new_signature_constants(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/export/exported_program.py", line 476, in _decompose_and_get_gm_with_new_signature_constants
aten_export_artifact = _export_to_aten_ir(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/export/_trace.py", line 877, in _export_to_aten_ir
gm, graph_signature = transform(aot_export_module)(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1444, in aot_export_module
fx_g, metadata, in_spec, out_spec = _aot_export_function(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1694, in _aot_export_function
aot_state = create_aot_state(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 567, in create_aot_state
fw_metadata = run_functionalized_fw_and_collect_metadata(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/collect_metadata_analysis.py", line 207, in inner
flat_f_outs = f(*flat_f_args)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 187, in flat_fn
tree_out = fn(*args, **kwargs)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1350, in functional_call
out = PropagateUnbackedSymInts(mod).run(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/fx/interpreter.py", line 174, in run
self.env[node] = self.run_node(node)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/fx/experimental/symbolic_shapes.py", line 7867, in run_node
rebind_unbacked(fake_mode.shape_env, n, result)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/fx/experimental/symbolic_shapes.py", line 602, in rebind_unbacked
if u1.node.hint is not None:
AttributeError: 'float' object has no attribute 'node'
While executing %item : [num_users=1] = call_function[target=torch.ops.aten.item.default](args = (%getitem_21,), kwargs = {})
Original traceback:
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 250, in forward
input = module(input)
File "/home/ariapa/projects/isss-plate/src/deepstream/models/ultralytics/ultralytics/nn/tasks.py", line 120, in forward
return self.predict(x, *args, **kwargs)
File "/home/ariapa/projects/isss-plate/src/deepstream/models/ultralytics/ultralytics/nn/modules/head.py", line 75, in forward
y = self._inference(x)
Use tlparse to see full graph. (https://github.com/pytorch/tlparse?tab=readme-ov-file#tlparse-parse-structured-pt2-logs)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ariapa/projects/isss-plate/src/deepstream/models/ultralytics/export_yolo11.py", line 141, in <module>
main(args)
File "/home/ariapa/projects/isss-plate/src/deepstream/models/ultralytics/export_yolo11.py", line 107, in main
torch.onnx.export(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/onnx/__init__.py", line 296, in export
return _compat.export_compat(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/onnx/_internal/exporter/_compat.py", line 143, in export_compat
onnx_program = _core.export(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/onnx/_internal/exporter/_flags.py", line 23, in wrapper
return func(*args, **kwargs)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/onnx/_internal/exporter/_core.py", line 1444, in export
raise _errors.ConversionError(
torch.onnx._internal.exporter._errors.ConversionError: Failed to decompose the FX graph for ONNX compatibility. This is step 2/3 of exporting the model to ONNX. Next steps:
- Create an issue in the PyTorch GitHub repository against the *torch.export* component and attach the full error stack as well as reproduction scripts.
- Create an error report with `torch.onnx.export(..., report=True)`, and save the ExportedProgram as a pt2 file. Create an issue in the PyTorch GitHub repository against the *onnx* component. Attach the error report and the pt2 model.
## Exception summary
<class 'AttributeError'>: 'float' object has no attribute 'node'
While executing %item : [num_users=1] = call_function[target=torch.ops.aten.item.default](args = (%getitem_21,), kwargs = {})
Original traceback:
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 250, in forward
input = module(input)
File "/home/ariapa/projects/isss-plate/src/deepstream/models/ultralytics/ultralytics/nn/tasks.py", line 120, in forward
return self.predict(x, *args, **kwargs)
File "/home/ariapa/projects/isss-plate/src/deepstream/models/ultralytics/ultralytics/nn/modules/head.py", line 75, in forward
y = self._inference(x)
Use tlparse to see full graph. (https://github.com/pytorch/tlparse?tab=readme-ov-file#tlparse-parse-structured-pt2-logs)
(Refer to the full stack trace above for more information.)
But at this point I don't think this malloc issue is related to the model conversion, because I managed to export all 3 models to engine format by running deepstream-app -c deepstream_app_config.txt against an infer config file for each model, one by one, and I still get the same malloc error when I run the project:
Setting min object dimensions as 16x16 instead of 1x1 to support VIC compute mode.
0:00:00.277948416 28111 0xaaab0f764c40 INFO nvinfer gstnvinfer.cpp:684:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2092> [UID = 3]: deserialized trt engine from :deepstream/models/ocr_dataset11.2_yolo11n_256.engine
Implicit layer support has been deprecated
INFO: [Implicit Engine Info]: layers num: 0
0:00:00.278056448 28111 0xaaab0f764c40 INFO nvinfer gstnvinfer.cpp:684:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2195> [UID = 3]: Use deserialized engine model: deepstream/models/ocr_dataset11.2_yolo11n_256.engine
malloc(): unaligned tcache chunk detected