BERT inference examples and benchmarks for A100 #7350
vadimkantorov started this conversation in General · 0 comments
- I'm looking for a modern, basic example/benchmark of BERT inference on Triton Inference Server on an A100 GPU, similar to the older https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/LanguageModeling/BERT/triton and https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/LanguageModeling/BERT/triton/large/README.md#deployment-process (but those do not include torch.compile with all the bells and whistles).
Some variants that would be interesting:
Does anybody know if such an example exists? Even the most basic comparison of a torch.compile configuration against a modern TensorRT build would be interesting.
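In the meantime, a minimal latency harness can be sketched in plain Python. Everything here is illustrative: `benchmark` and its parameters are hypothetical names, and the stand-in workload would in practice be replaced by a torch.compile'd BERT forward pass (with `torch.cuda.synchronize()` inside the callable so GPU work is actually timed):

```python
import statistics
import time


def benchmark(fn, warmup=10, iters=100):
    """Time a callable and report latency percentiles in milliseconds.

    Hypothetical sketch: in a real run, `fn` would wrap something like
        model = torch.compile(BertModel.from_pretrained("bert-base-uncased")).cuda().eval()
        fn = lambda: (model(**inputs), torch.cuda.synchronize())
    """
    # Warmup iterations absorb one-time costs (e.g. torch.compile graph capture).
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
        "mean_ms": statistics.fmean(samples),
    }


if __name__ == "__main__":
    # Stand-in CPU workload; swap in the real model call.
    print(benchmark(lambda: sum(range(10_000))))
```

For an apples-to-apples comparison against TensorRT, the same harness could time a Triton client call instead of a local forward pass.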
Thanks :)