Description
I am working with TensorRT v10 to run inference with dynamic batch sizes.
My model is ViT-Base, obtained by following https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/onnx_ptq.
The model was exported to ONNX with a dynamic axis on the batch dimension, and an optimization profile was set up during the build step (roughly as sketched below).
To run inference, I first allocate memory using the max shapes: (32, 3, 224, 224) for the input and (32, 1000) for the output.
Then I copy the input data to device memory through the device pointer.
Next I set the input shape on the execution context.
Finally I call the do_inference function, but the output always has shape (32000,) no matter which batch size I set.
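The profile was set up along these lines (a sketch only; the min/opt shapes are placeholders, while the input name "input" and the max shape (32, 3, 224, 224) match my setup):

import tensorrt as trt

builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# Dynamic batch dimension: min/opt below are illustrative, max matches my engine.
profile.set_shape("input", (1, 3, 224, 224), (16, 3, 224, 224), (32, 3, 224, 224))
config.add_optimization_profile(profile)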
Environment
TensorRT Version: 10.0.1
NVIDIA GPU: Tesla T4
NVIDIA Driver Version: 530
CUDA Version: 12.2
CUDNN Version: 9.2.0
Operating System:
Python Version (if applicable): 3.10
PyTorch Version (if applicable): 2.3.1
Relevant Files
Model link: https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/onnx_ptq
Steps To Reproduce
Commands or scripts:
import tensorrt as trt
import numpy as np
from cuda import cuda, cudart
# https://github.com/NVIDIA/TensorRT/blob/release/10.1/samples/python/common_runtime.py
from common_runtime import *
# Load the ViT engine built with a dynamic batch dimension (max batch = 32)
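# A minimal sketch of the loading step elided above, assuming the engine was
# serialized to a file during the build (the file name is a placeholder):
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open("vit_b32_dynamic.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())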
stream = cuda_call(cudart.cudaStreamCreate())
batch_size = None
inputs = []
outputs = []
bindings = []
for i in range(engine.num_io_tensors):
    tensor_name = engine.get_tensor_name(i)
    # If the binding is dynamic, some dimensions can be -1.
    # get_tensor_shape returns the shape with the dynamic dims, same as the ONNX model.
    # get_tensor_profile_shape returns (min_shape, optimal_shape, max_shape).
    # Pick the max shape to allocate enough memory for the binding.
    if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
        shape = engine.get_tensor_profile_shape(tensor_name, 0)[-1]
        batch_size = shape[0]
    else:
        shape = engine.get_tensor_shape(tensor_name)
        # Replace the dynamic batch dim with the max input batch.
        shape[0] = batch_size
    # Number of elements (not bytes) in the tensor.
    size = trt.volume(shape)
    trt_type = engine.get_tensor_dtype(tensor_name)
    # Allocate host and device buffers.
    if trt.nptype(trt_type):
        dtype = np.dtype(trt.nptype(trt_type))
        bindingMemory = HostDeviceMem(size, dtype)
    else:  # no numpy support: create a byte array instead (BF16, FP8, INT4)
        size = int(size * trt_type.itemsize)
        bindingMemory = HostDeviceMem(size)
    # Append the device buffer address to the device bindings.
    bindings.append(int(bindingMemory.device))
    # Append to the appropriate list.
    if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
        inputs.append(bindingMemory)
    else:
        outputs.append(bindingMemory)
context = engine.create_execution_context()
batch = 4
shape = (batch, 3, 224, 224)
input_data = np.random.rand(*shape).astype("float32")
context.set_input_shape("input", shape)
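# Sanity check (the output tensor name "output" is an assumption): once the
# input shape is set, the context should report the resolved shapes.
print(context.get_tensor_shape("input"))   # expected: (4, 3, 224, 224)
print(context.get_tensor_shape("output"))  # expected: (4, 1000)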
memcpy_host_to_device(inputs[0].device, input_data)
results = do_inference(context, engine, bindings, inputs, outputs, stream)
results[0].shape
> (32000,)
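For comparison, the element counts implied by the shapes above (a plain-Python check, nothing engine-specific):

num_classes = 1000   # output features of the classifier head
batch = 4            # batch size set on the execution context
max_batch = 32       # max batch from the optimization profile

print(batch * num_classes)      # 4000  -> the size I expect, i.e. shape (4, 1000)
print(max_batch * num_classes)  # 32000 -> the size results[0] actually has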