TRT/TRT-LLM support for NestedTensor / jagged or ragged tensors for doing away with padding #1469
Unanswered · vadimkantorov asked this question in Q&A · 0 replies
- Does TRT / TRT-LLM support PyTorch's concept of NestedTensor, or is there a similar concept?
- The idea is to avoid computing the FeedForward layers on padded-out positions and to skip the attention computation for them as well.
- I guess the question also becomes whether the underlying flash attention implementation supports such jagged, compact formats.
- In PyTorch, SDPA appears to support NestedTensor for speedups: pytorch/pytorch#105913 (comment) (this can be useful for batched BERT inference); a rough sketch of the PyTorch-side pattern is below.
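For reference, here is a minimal sketch of the PyTorch-side pattern (not TRT / TRT-LLM): batching variable-length sequences as a jagged-layout NestedTensor and running `scaled_dot_product_attention` on them, so no compute is spent on padded positions. This assumes a recent PyTorch with the `torch.jagged` layout and NJT support in SDPA; the fused flash/efficient kernels generally need a CUDA device and half precision, and exact op coverage varies by version.

```python
import torch
import torch.nn.functional as F

# Fused SDPA kernels for jagged inputs generally expect CUDA + fp16/bf16;
# CPU support depends on the PyTorch version.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

embed_dim, num_heads = 64, 4
head_dim = embed_dim // num_heads

# Two sequences of different lengths, batched without any padding.
seq_lens = [5, 9]
seqs = [torch.randn(L, embed_dim, device=device, dtype=dtype) for L in seq_lens]

# Jagged-layout nested tensor: logical shape (batch, ragged_len, embed_dim).
nt = torch.nested.nested_tensor(seqs, layout=torch.jagged)

# Split the embedding dim into heads and move heads before the ragged dim,
# giving (batch, num_heads, ragged_len, head_dim) as SDPA expects.
def to_heads(x):
    return x.unflatten(-1, [num_heads, head_dim]).transpose(1, 2)

q = k = v = to_heads(nt)

# SDPA on nested inputs only attends over the real tokens of each sequence;
# there are no padded-out positions to mask or to waste FLOPs on.
out = F.scaled_dot_product_attention(q, k, v)
out = out.transpose(1, 2).flatten(-2)  # back to (batch, ragged_len, embed_dim)

for seq_out in out.unbind():
    print(seq_out.shape)  # torch.Size([5, 64]) and torch.Size([9, 64])
```

The question is essentially whether TRT / TRT-LLM expose an analogous compact (non-padded) batched representation and route it to attention kernels that understand it.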
Thanks!