
# High Latency in TensorRT Node for Image Segmentation on Jetson Orin Nano 8GB #57

@eterry-devops

## Issue Description

I'm seeing unexpectedly high latency when running image segmentation with Isaac ROS DNN Inference on a Jetson Orin Nano 8GB. The TensorRT node appears to be the primary bottleneck, with processing delays averaging ~240-260ms.
## Environment

- Hardware: Jetson Orin Nano 8GB
- Model: PeopleSemSegNet (deployable_quantized_vanilla_unet_onnx_v2.0)
- Isaac ROS Version: 3.2
- JetPack Version: 6.2
- CUDA Version: 12.6

## Command Used

```bash
ros2 launch isaac_ros_examples isaac_ros_examples.launch.py \
  launch_fragments:=zed_mono_rect,unet \
  engine_file_path:=${ISAAC_ROS_WS}/isaac_ros_assets/models/peoplesemsegnet/deployable_quantized_vanilla_unet_onnx_v2.0/1/model.plan \
  input_binding_names:=['input_1:0'] \
  output_binding_names:=['argmax_1'] \
  network_output_type:='argmax' \
  interface_specs_file:=${ISAAC_ROS_WS}/isaac_ros_assets/isaac_ros_unet/zed2_quickstart_interface_specs.json
```
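
To separate raw engine execution time from ROS graph overhead, it may help to benchmark the same `.plan` file directly with `trtexec` (a minimal sketch; assumes the stock JetPack install path for `trtexec` and the engine path from the launch command above):

```bash
# Time the serialized engine outside the ROS graph.
# trtexec ships with TensorRT under /usr/src/tensorrt/bin on JetPack.
/usr/src/tensorrt/bin/trtexec \
  --loadEngine=${ISAAC_ROS_WS}/isaac_ros_assets/models/peoplesemsegnet/deployable_quantized_vanilla_unet_onnx_v2.0/1/model.plan \
  --iterations=200 \
  --avgRuns=50
```

If the GPU compute time reported here is far below the topic delays measured next, most of the latency is accumulating outside engine execution (transport, encoding, or pre/post-processing).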
## Performance Measurements

### TensorRT Node Input (`/tensor_sub`)

```bash
ros2 topic delay /tensor_sub --window 100
```

- Average delay: ~240-260ms
- Min: 108ms
- Max: 348ms
- Std dev: ~40ms (0.04s)

### TensorRT Node Output (`/tensor_pub`)

```bash
ros2 topic delay /tensor_pub --window 100
```

- Average delay: ~200-214ms
- Min: 123ms
- Max: 287ms
- Std dev: ~30ms (0.03s)
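
Delay alone doesn't show whether the node is also throughput-bound; comparing publish rates on the two topics would reveal whether frames are being dropped or queued (a sketch using the same stock ROS 2 CLI):

```bash
# If /tensor_pub publishes at a noticeably lower rate than /tensor_sub,
# the TensorRT node is dropping or queuing frames, not just adding delay.
ros2 topic hz /tensor_sub --window 100
ros2 topic hz /tensor_pub --window 100
```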

## Analysis

The latency measurements show that:

- Total pipeline latency averages 240-260ms.
- The TensorRT node itself appears to add significant processing time.
- There is considerable variance in processing times (std dev ~40ms on the input topic); see the clock-locking sketch after this list.
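
One common source of both elevated latency and high variance on Orin Nano is dynamic clock scaling. Locking the board to its maximum power mode before re-measuring would rule that out (a sketch; the `nvpmodel` mode index for MAXN can differ per device and JetPack release):

```bash
# Pin the board to its highest power mode and lock clocks at maximum.
sudo nvpmodel -m 0   # MAXN on most Orin configurations; verify with `sudo nvpmodel -q`
sudo jetson_clocks   # lock CPU/GPU/EMC clocks to their maximums
# Watch GPU load while the pipeline runs to confirm it is not saturated.
sudo tegrastats --interval 1000
```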

## Expected Behavior

For a Jetson Orin Nano 8GB running a quantized UNet model, I would expect much lower latency, ideally in the 140-160ms range from image to mask for real-time performance.
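
For reference, the end-to-end image-to-mask latency can be sampled the same way on the segmentation output topic (a sketch; `/unet/raw_segmentation_mask` is the default output topic in the isaac_ros_unet quickstart and may differ under this launch configuration):

```bash
# Header-stamp-to-arrival delay on the final mask topic, matching the
# measurement method used for the tensor topics above.
ros2 topic delay /unet/raw_segmentation_mask --window 100
```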

