
Working with dynamic batches, the output is always fixed #3966

@vilsonrodrigues

Description


I am working with TensorRT v10 to do inference with dynamic batches.

My model is a ViT-Base obtained by following https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/onnx_ptq.

The model was exported with a dynamic axis on the batch dimension. An optimization profile was set during the build step.
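
For reference, the profile was configured roughly like this at build time (a sketch; the exact min/opt shapes are assumptions, only max = 32 matches the engine):

profile = builder.create_optimization_profile()
# One profile covering the dynamic batch dim: (min, opt, max)
profile.set_shape("input", (1, 3, 224, 224), (16, 3, 224, 224), (32, 3, 224, 224))
config.add_optimization_profile(profile)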

To run inference, I first allocate memory using the max profile shape: (32, 3, 224, 224) for the input and (32, 1000) for the output.

Then I copy the input data to device memory via the device pointer.

Next I set the input shape on the execution context.

Finally I call the do_inference function. But the output is always (32000,) for any batch dim.
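
What I expected (a sketch, assuming the I/O tensors are named "input" and "output" as in the script below): after set_input_shape, the context resolves the output shape for the actual batch, so the returned result should match it.

context.set_input_shape("input", (4, 3, 224, 224))
# The context should now report the output shape resolved for batch 4
print(context.get_tensor_shape("output"))  # expected: (4, 1000)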

Environment

TensorRT Version: 10.0.1

NVIDIA GPU: Tesla T4

NVIDIA Driver Version: 530

CUDA Version: 12.2

CUDNN Version: 9.2.0

Operating System:

Python Version (if applicable): 3.10

PyTorch Version (if applicable): 2.3.1

Relevant Files

Model link: https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/onnx_ptq

Steps To Reproduce

Commands or scripts:

import tensorrt as trt
import numpy as np
from cuda import cuda, cudart
# https://github.com/NVIDIA/TensorRT/blob/release/10.1/samples/python/common_runtime.py
from common_runtime import *

# load the ViT engine (built with max dynamic batch = 32) into `engine`

stream = cuda_call(cudart.cudaStreamCreate())
batch_size = None
inputs = []
outputs = []
bindings = []

for i in range(engine.num_io_tensors):

    tensor_name = engine.get_tensor_name(i)
    
    # If the binding is dynamic, some dimensions can be -1
    # get_tensor_shape returns the shape with dynamic dims, same as the ONNX model
    # get_tensor_profile_shape returns (min_shape, opt_shape, max_shape)
    # Pick the max shape to allocate enough memory for the binding
    if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
        shape = engine.get_tensor_profile_shape(tensor_name, 0)[-1]
        batch_size = shape[0]
    else:
        shape = engine.get_tensor_shape(tensor_name)
        # Replace the dynamic batch dim with the max input batch size
        shape[0] = batch_size
    
    # Element count (converted to bytes below only if the dtype lacks numpy support)
    size = trt.volume(shape)
    trt_type = engine.get_tensor_dtype(tensor_name)  


    # Allocate host and device buffers
    if trt.nptype(trt_type):
        dtype = np.dtype(trt.nptype(trt_type))
        bindingMemory = HostDeviceMem(size, dtype)
    else: # no numpy support: create a byte array instead (BF16, FP8, INT4)
        size = int(size * trt_type.itemsize)
        bindingMemory = HostDeviceMem(size)
    
    # Append the device buffer to device bindings
    bindings.append(int(bindingMemory.device))

    # Append to the appropriate list
    if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
        inputs.append(bindingMemory)
    else:
        outputs.append(bindingMemory)

context = engine.create_execution_context()

batch = 4

shape = (batch, 3, 224, 224)
input_data = np.random.rand(*shape).astype("float32")

context.set_input_shape("input", shape)

memcpy_host_to_device(inputs[0].device, input_data)
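
# Note: do_inference (from common_runtime) copies inputs to the device, runs
# execute_async_v3, then copies each output's full host buffer back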

results = do_inference(context, engine, bindings, inputs, outputs, stream)

results[0].shape
> (32000,)
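
A workaround sketch I am considering (untested; names match the script above and it assumes the output dtype has numpy support): trim the flat host buffer to the shape the context resolves for the current batch.

out_shape = tuple(context.get_tensor_shape("output"))  # e.g. (4, 1000)
valid = int(np.prod(out_shape))
logits = results[0][:valid].reshape(out_shape)
print(logits.shape)  # expected: (4, 1000)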
