Description
Environment:
- NVIDIA Jetson Orin NX 8GB
- DeepStream 7.1
- CUDA 12.6
- JetPack 6.1
- Linux 5.15.148-tegra
- TensorRT 10.3.0
- ONNX models opset version: 17
I'm running a DeepStream Python project with 3 YOLOv11 models (pgie → sgie1 → sgie2). The TensorRT engine builds successfully for the last model, but the application crashes with a memory corruption error immediately after the conversion.
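For reference, the three nvinfer stages are chained roughly like this (a minimal sketch; the pgie/sgie1 element and config file names are placeholders, and the real pipeline also has sources, a streammux, an OSD, and a sink):

```python
# Minimal sketch of the three-stage nvinfer chain. Only "tertiary-inference"
# and config_detect_ocr.txt appear in my logs/configs; the rest are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.Pipeline.new("pipeline")

pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")     # gie-unique-id=1
sgie1 = Gst.ElementFactory.make("nvinfer", "secondary-inference")  # gie-unique-id=2, operates on UID 1
sgie2 = Gst.ElementFactory.make("nvinfer", "tertiary-inference")   # gie-unique-id=3, operates on UID 2

pgie.set_property("config-file-path", "config_detect_plate.txt")   # placeholder name
sgie1.set_property("config-file-path", "config_detect_chars.txt")  # placeholder name
sgie2.set_property("config-file-path", "config_detect_ocr.txt")

for elem in (pgie, sgie1, sgie2):
    pipeline.add(elem)
pgie.link(sgie1)
sgie1.link(sgie2)
```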
Setting min object dimensions as 16x16 instead of 1x1 to support VIC compute mode.
WARNING: Deserialize engine failed because file path: deepstream/models/ocr_dataset11.2_yolo11n_256.engine open error
0:00:00.190058720 42706 0xaaaaf71f1e40 WARN nvinfer gstnvinfer.cpp:681:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2080> [UID = 3]: deserialize engine from file :deepstream/models/ocr_dataset11.2_yolo11n_256.engine failed
0:00:00.190096480 42706 0xaaaaf71f1e40 WARN nvinfer gstnvinfer.cpp:681:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2185> [UID = 3]: deserialize backend context from engine from file :deepstream/models/ocr_dataset11.2_yolo11n_256.engine failed, try rebuild
0:00:00.190111552 42706 0xaaaaf71f1e40 INFO nvinfer gstnvinfer.cpp:684:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2106> [UID = 3]: Trying to create engine from model files
Building the TensorRT Engine
Building complete
0:07:07.375651872 42706 0xaaaaf71f1e40 INFO nvinfer gstnvinfer.cpp:684:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:2138> [UID = 3]: serialize cuda engine to file: /home/ariapa/projects/isss-plate/src/model_b1_gpu0_fp16.engine successfully
Implicit layer support has been deprecated
INFO: [Implicit Engine Info]: layers num: 0
malloc_consolidate(): unaligned fastbin chunk detected
Now that ocr_dataset11.2_yolo11n_256.engine is built, when I run the project again, I get:
Setting min object dimensions as 16x16 instead of 1x1 to support VIC compute mode.
0:00:00.280638240 46412 0xaaaae4509040 INFO nvinfer gstnvinfer.cpp:684:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2092> [UID = 3]: deserialized trt engine from :deepstream/models/ocr_dataset11.2_yolo11n_256.engine
Implicit layer support has been deprecated
INFO: [Implicit Engine Info]: layers num: 0
0:00:00.280735072 46412 0xaaaae4509040 INFO nvinfer gstnvinfer.cpp:684:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2195> [UID = 3]: Use deserialized engine model: deepstream/models/ocr_dataset11.2_yolo11n_256.engine
malloc(): unaligned tcache chunk detected
When I comment out the sgie2 model, the same error occurs on whichever model loads last (sgie1 in that case).
Here's the configuration file for sgie2 (config_detect_ocr.txt):
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=../models/ocr_dataset11.2_yolo11n_256.pt.onnx
labelfile-path=../models/ocr_dataset11.2_yolo11n_256.txt
batch-size=1
network-mode=2
num-detected-classes=37
interval=0
gie-unique-id=3
operate-on-gie-id=2
network-type=0
cluster-mode=2
# workspace-size=2000
parse-bbox-func-name=NvDsInferParseYolo
# parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet
process-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
infer-dims=3;256;256
[class-attrs-all]
nms-iou-threshold=0.5
pre-cluster-threshold=0.5
topk=300
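For context, the three configs chain through gie-unique-id / operate-on-gie-id; the pgie and sgie1 values below are inferred from the sgie2 config above, not copied from my actual files:

```ini
# pgie config (inferred)
gie-unique-id=1

# sgie1 config (inferred)
gie-unique-id=2
operate-on-gie-id=1

# sgie2 config (config_detect_ocr.txt, shown above)
gie-unique-id=3
operate-on-gie-id=2
```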
What I've tried: switching between NvDsInferParseYolo and NvDsInferParseYoloCuda, and adjusting the infer-dims parameter; both produced the same malloc error.
I checked memory with free -h and it wasn't filling up; I also have 4 GB of swap, so the issue shouldn't be insufficient memory.
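For completeness, this is roughly how I watched memory while the app started (the 1-second interval is arbitrary):

```sh
# Poll overall RAM/swap once a second while the pipeline starts.
watch -n 1 free -h
# Jetson-specific alternative that also reports GPU/carveout usage:
sudo tegrastats
```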
I have also built nvdsinfer_custom_impl_Yolo on the Jetson itself, but just copied the ONNX model files over to the Jetson from my dGPU computer.
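The build step was roughly the one from the repo README (CUDA_VER matching my JetPack 6.1 install):

```sh
# Rebuild the custom YOLO parser library natively on the Jetson.
export CUDA_VER=12.6
make -C nvdsinfer_custom_impl_Yolo clean
make -C nvdsinfer_custom_impl_Yolo
```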
It's worth noting that when I try to convert the .pt files to ONNX on the Jetson with the same command I used on the dGPU computer, I get the following error:
$ python export_yolo11.py -w ocr_dataset11.2_yolo11n_256.pt -s 256 --dynamic
Starting: ocr_dataset11.2_yolo11n_256.pt
Opening YOLO11 model
YOLO11n summary (fused): 100 layers, 2,589,367 parameters, 0 gradients, 6.4 GFLOPs
Creating labels.txt file
Exporting the model to ONNX
W1123 18:41:52.682000 39657 torch/onnx/_internal/exporter/_compat.py:114] Setting ONNX exporter to use operator set version 18 because the requested opset_version 17 is a lower version than we have implementations for. Automatic version conversion will be performed, which may not be successful at converting to the requested version. If version conversion is unsuccessful, the opset version of the exported model will be kept at 18. Please consider setting opset_version >=18 to leverage latest ONNX features
Traceback (most recent call last):
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/onnx/_internal/exporter/_core.py", line 1416, in export
decomposed_program = _prepare_exported_program_for_export(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/onnx/_internal/exporter/_core.py", line 980, in _prepare_exported_program_for_export
exported_program = _fx_passes.decompose_with_registry(exported_program, registry)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/onnx/_internal/exporter/_fx_passes.py", line 19, in decompose_with_registry
return exported_program.run_decompositions(decomp_table)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/export/exported_program.py", line 124, in wrapper
return fn(*args, **kwargs)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/export/exported_program.py", line 1484, in run_decompositions
return _decompose_exported_program(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/export/exported_program.py", line 967, in _decompose_exported_program
) = _decompose_and_get_gm_with_new_signature_constants(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/export/exported_program.py", line 476, in _decompose_and_get_gm_with_new_signature_constants
aten_export_artifact = _export_to_aten_ir(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/export/_trace.py", line 877, in _export_to_aten_ir
gm, graph_signature = transform(aot_export_module)(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1444, in aot_export_module
fx_g, metadata, in_spec, out_spec = _aot_export_function(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1694, in _aot_export_function
aot_state = create_aot_state(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 567, in create_aot_state
fw_metadata = run_functionalized_fw_and_collect_metadata(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/collect_metadata_analysis.py", line 207, in inner
flat_f_outs = f(*flat_f_args)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 187, in flat_fn
tree_out = fn(*args, **kwargs)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1350, in functional_call
out = PropagateUnbackedSymInts(mod).run(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/fx/interpreter.py", line 174, in run
self.env[node] = self.run_node(node)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/fx/experimental/symbolic_shapes.py", line 7867, in run_node
rebind_unbacked(fake_mode.shape_env, n, result)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/fx/experimental/symbolic_shapes.py", line 602, in rebind_unbacked
if u1.node.hint is not None:
AttributeError: 'float' object has no attribute 'node'
While executing %item : [num_users=1] = call_function[target=torch.ops.aten.item.default](args = (%getitem_21,), kwargs = {})
Original traceback:
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 250, in forward
input = module(input)
File "/home/ariapa/projects/isss-plate/src/deepstream/models/ultralytics/ultralytics/nn/tasks.py", line 120, in forward
return self.predict(x, *args, **kwargs)
File "/home/ariapa/projects/isss-plate/src/deepstream/models/ultralytics/ultralytics/nn/modules/head.py", line 75, in forward
y = self._inference(x)
Use tlparse to see full graph. (https://github.com/pytorch/tlparse?tab=readme-ov-file#tlparse-parse-structured-pt2-logs)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ariapa/projects/isss-plate/src/deepstream/models/ultralytics/export_yolo11.py", line 141, in <module>
main(args)
File "/home/ariapa/projects/isss-plate/src/deepstream/models/ultralytics/export_yolo11.py", line 107, in main
torch.onnx.export(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/onnx/__init__.py", line 296, in export
return _compat.export_compat(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/onnx/_internal/exporter/_compat.py", line 143, in export_compat
onnx_program = _core.export(
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/onnx/_internal/exporter/_flags.py", line 23, in wrapper
return func(*args, **kwargs)
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/onnx/_internal/exporter/_core.py", line 1444, in export
raise _errors.ConversionError(
torch.onnx._internal.exporter._errors.ConversionError: Failed to decompose the FX graph for ONNX compatibility. This is step 2/3 of exporting the model to ONNX. Next steps:
- Create an issue in the PyTorch GitHub repository against the *torch.export* component and attach the full error stack as well as reproduction scripts.
- Create an error report with `torch.onnx.export(..., report=True)`, and save the ExportedProgram as a pt2 file. Create an issue in the PyTorch GitHub repository against the *onnx* component. Attach the error report and the pt2 model.
## Exception summary
<class 'AttributeError'>: 'float' object has no attribute 'node'
While executing %item : [num_users=1] = call_function[target=torch.ops.aten.item.default](args = (%getitem_21,), kwargs = {})
Original traceback:
File "/home/ariapa/.local/lib/python3.10/site-packages/torch/nn/modules/container.py", line 250, in forward
input = module(input)
File "/home/ariapa/projects/isss-plate/src/deepstream/models/ultralytics/ultralytics/nn/tasks.py", line 120, in forward
return self.predict(x, *args, **kwargs)
File "/home/ariapa/projects/isss-plate/src/deepstream/models/ultralytics/ultralytics/nn/modules/head.py", line 75, in forward
y = self._inference(x)
Use tlparse to see full graph. (https://github.com/pytorch/tlparse?tab=readme-ov-file#tlparse-parse-structured-pt2-logs)
(Refer to the full stack trace above for more information.)
But at this point I don't think this malloc issue is related to the model conversion, because I managed to export all 3 models to engine format by running deepstream-app -c deepstream_app_config.txt against an infer config file for each model, one by one, and I still get the same malloc error when I run the project:
Setting min object dimensions as 16x16 instead of 1x1 to support VIC compute mode.
0:00:00.277948416 28111 0xaaab0f764c40 INFO nvinfer gstnvinfer.cpp:684:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2092> [UID = 3]: deserialized trt engine from :deepstream/models/ocr_dataset11.2_yolo11n_256.engine
Implicit layer support has been deprecated
INFO: [Implicit Engine Info]: layers num: 0
0:00:00.278056448 28111 0xaaab0f764c40 INFO nvinfer gstnvinfer.cpp:684:gst_nvinfer_logger:<tertiary-inference> NvDsInferContext[UID 3]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2195> [UID = 3]: Use deserialized engine model: deepstream/models/ocr_dataset11.2_yolo11n_256.engine
malloc(): unaligned tcache chunk detected