
Triton Server serving a TensorRT model with CUDA graphs enabled returns strangely inconsistent outputs. #4651

@WingEdge777

Description


Hi TensorRT team.

I'm not sure exactly which component is responsible for this problem, so I've filed a ticket here as well as at triton-inference-server/server#8550.

In summary: when we send requests to Triton Server sequentially in an AAAAABBBBBAAAABBBB pattern, the first few A (or B) requests of each round often return the previous B (or A) round's results, which are completely wrong for the current input.
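A rough, self-contained sketch of how one might detect the stale responses described above. The actual Triton inference call is abstracted away (`outputs[i]` stands for the response to the i-th request); the helper names, round sizes, and toy reference arrays are my own assumptions, not from the issue:

```python
# Hypothetical sketch of the stale-output check described above.
# The Triton inference call itself is not shown: `outputs[i]` stands for
# the response to the i-th request. Names and shapes are assumptions.
import numpy as np

def request_pattern(rounds, per_round):
    """Yield 'A'/'B' labels in alternating rounds, e.g. AAAAABBBBB..."""
    for r in range(rounds):
        label = "A" if r % 2 == 0 else "B"
        for _ in range(per_round):
            yield label

def find_stale(labels, outputs, expected):
    """Return indices of responses that match the *other* input's
    reference output, i.e. the cross-contamination seen here."""
    stale = []
    for i, (lab, out) in enumerate(zip(labels, outputs)):
        other = "B" if lab == "A" else "A"
        if np.allclose(out, expected[other]):
            stale.append(i)
    return stale
```

With this harness, the failure we observe corresponds to `find_stale` reporting the first few indices of each new round.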

Details can be found in the issue I posted above.

Thanks for your work on this product. Looking forward to your response.
