TRT/TRT-LLM support for NestedTensor / jagged or ragged tensors for doing away with padding #1469
Unanswered · vadimkantorov asked this question in Q&A · 0 replies
- Does TRT / TRT-LLM support PyTorch's concept of NestedTensor, or is there a similar concept?
- The idea is to avoid computing the FeedForward layers on padded-out positions and to skip the attention computation for them as well.
- I guess the question also becomes whether the underlying flash attention implementation supports such jagged, compact formats.
- In PyTorch, SDPA appears to support NestedTensor for speedups: pytorch/pytorch#105913 (comment) (this can be useful for batched BERT inference); a rough sketch of the PyTorch-side pattern is below.
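For reference, here is a minimal sketch of the PyTorch-side pattern (not TRT / TRT-LLM): batching variable-length sequences as a jagged-layout NestedTensor and running `scaled_dot_product_attention` on them, so no compute is spent on padded positions. This assumes a recent PyTorch with the `torch.jagged` layout and NJT support in SDPA; the fused flash/efficient kernels generally need a CUDA device and half precision, and exact op coverage varies by version.

```python
import torch
import torch.nn.functional as F

# Fused SDPA kernels for jagged inputs generally expect CUDA + fp16/bf16;
# CPU support depends on the PyTorch version.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

embed_dim, num_heads = 64, 4
head_dim = embed_dim // num_heads

# Two sequences of different lengths, batched without any padding.
seq_lens = [5, 9]
seqs = [torch.randn(L, embed_dim, device=device, dtype=dtype) for L in seq_lens]

# Jagged-layout nested tensor: logical shape (batch, ragged_len, embed_dim).
nt = torch.nested.nested_tensor(seqs, layout=torch.jagged)

# Split the embedding dim into heads and move heads before the ragged dim,
# giving (batch, num_heads, ragged_len, head_dim) as SDPA expects.
def to_heads(x):
    return x.unflatten(-1, [num_heads, head_dim]).transpose(1, 2)

q = k = v = to_heads(nt)

# SDPA on nested inputs only attends over the real tokens of each sequence;
# there are no padded-out positions to mask or to waste FLOPs on.
out = F.scaled_dot_product_attention(q, k, v)
out = out.transpose(1, 2).flatten(-2)  # back to (batch, ragged_len, embed_dim)

for seq_out in out.unbind():
    print(seq_out.shape)  # torch.Size([5, 64]) and torch.Size([9, 64])
```

The question is essentially whether TRT / TRT-LLM expose an analogous compact (non-padded) batched representation and route it to attention kernels that understand it.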
Thanks!