Serving LLaMA 3-70B on TPUs | How To Scale Your Model #10
Replies: 3 comments 3 replies
- "Notably, at all batch sizes greater than 2k, FLOPs is always smaller than our KV loading time in this regime." Is this a typo? Should it be 200 instead of 2k?
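For context, here is a minimal roofline sketch of the quantity the comment quotes: per decode step, attention FLOP time versus KV-cache load time. It assumes LLaMA 3-70B's GQA shape (64 query heads, 8 KV heads, head dim 128), a bf16 KV cache, and rough TPU v5e numbers (about 2e14 bf16 FLOPs/s and 8.2e11 bytes/s of HBM bandwidth); these figures are illustrative assumptions, not values taken from this thread.

```python
# Rough roofline check for the quoted claim: per decode step, compare
# per-layer attention FLOP time with KV-cache load time.
# Model shape and hardware numbers below are assumptions (LLaMA 3-70B GQA
# config, approximate TPU v5e specs), used only for illustration.

n_q_heads, n_kv_heads, d_head = 64, 8, 128   # LLaMA 3-70B GQA shape
kv_bytes_per_param = 2                        # bf16 KV cache
peak_flops = 1.97e14                          # ~TPU v5e bf16 FLOPs/s
hbm_bandwidth = 8.2e11                        # ~TPU v5e HBM bytes/s

def attention_times(batch_size: int, seq_len: int):
    """Per-layer attention FLOP time vs. KV-cache load time for one decode step."""
    # QK^T plus attention @ V: two matmuls of 2 * S * d_head FLOPs per query head.
    flops = batch_size * 4 * seq_len * n_q_heads * d_head
    # Load K and V for every cached token (GQA: only n_kv_heads of them).
    bytes_loaded = batch_size * 2 * seq_len * n_kv_heads * d_head * kv_bytes_per_param
    return flops / peak_flops, bytes_loaded / hbm_bandwidth

for batch in (8, 64, 512, 4096):
    t_math, t_load = attention_times(batch, seq_len=8192)
    print(f"B={batch:5d}: T_math={t_math*1e6:9.1f}us  T_load={t_load*1e6:9.1f}us  "
          f"ratio={t_load/t_math:.1f}x")
```

Under these assumptions both the FLOP count and the KV bytes scale linearly with batch size and sequence length, so their ratio is batch-independent and the KV load time dominates at every batch size, which is the observation behind the commenter's question about the stated threshold.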
- I think the answer to Question 3 has mistaken wording:
- A couple of places throughout this section and the last use
- Serving LLaMA on TPUs!