How to Think About GPUs | How To Scale Your Model #66
Replies: 9 comments 6 replies
-
Whoever can implement what's written here is probably making $5M a year; tell me if I'm right or wrong 😁 What a great resource, and for free!
-
This resource, together with the comprehensive insights of David A. Patterson and John L. Hennessy's book on computer architecture, has been invaluable. It has significantly deepened my understanding of TPUs and machine learning systems, letting me do rapid back-of-the-envelope calculations for system design, the kind of thing I thought you could only learn privately inside an ML lab. Thanks! This has made me a much better differentiable programmer.
-
Thank you for sharing this 🙏
-
As a beginner, thanks so much for this. I bookmarked it before even reading. ^^
-
Can you check whether a B100 actually exists, as shown in the first picture? I believe there are only the B200, GB200, and GB300.
-
How does all of this compare against AMD's MI300X? Would it be possible to update the post to include it?
-
Question on the derivations in the section "Reductions when the array is sharded over a separate axis". Also, separately from the above: shouldn't the scale-out term use the number of nodes rather than the number of GPUs?
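A rough numeric sketch of the question being asked (every number and the two-level-ring model below are my assumptions, not the post's): in a two-level AllReduce, the scale-out (inter-node) stage is a ring over the participating units, so its (P − 1)/P prefactor would use the node count K rather than the total GPU count N.

```python
# Hypothetical two-level AllReduce: the scale-out stage rings over
# `participants` units. All numbers below are assumptions.
def scaleout_time(buffer_bytes, egress_bw, participants):
    """Ring-AllReduce time over `participants` units for one buffer."""
    return 2 * (participants - 1) / participants * buffer_bytes / egress_bw

B = 8e9   # 8 GB gradient buffer (assumed)
W = 50e9  # 50 GB/s effective per-unit scale-out bandwidth (assumed)

t_with_gpus  = scaleout_time(B, W, participants=64)  # N = 64 GPUs
t_with_nodes = scaleout_time(B, W, participants=8)   # K = 8 nodes
```

The two prefactors (63/64 vs. 7/8) differ only slightly here, which may be why the formulas look interchangeable at large scale.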
-
The T_comm with NVLink SHARP should be B/W instead of N/W. There seems to be a trivial typo there.
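A quick dimensional sanity check of the suggested fix (the buffer size and bandwidth below are assumed numbers, not the post's): bytes divided by bytes-per-second gives seconds, which B/W satisfies and N/W does not.

```python
# With in-network reduction (NVLink SHARP), each GPU roughly sends and
# receives its buffer once, so T_comm ~ B / W. Numbers are assumptions.
B = 4 * 1024**3  # 4 GiB buffer per GPU (assumed)
W = 450e9        # ~450 GB/s per-GPU NVLink bandwidth (assumed)

t_comm = B / W   # seconds: bytes / (bytes per second)
print(f"T_comm ~ {t_comm * 1e3:.2f} ms")
```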
-
Thanks for the great write-up! I have a question about Quiz 5 that I'd appreciate your help clarifying:
Why does only the degree of pipeline parallelism affect the roofline, and not the 8-way TP? Does it have to do with the fact that the TP stays within each node?
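One possible intuition, sketched numerically below (my own reasoning, not necessarily the author's intended answer): tensor parallelism shards a layer's FLOPs and its weight bytes by the same factor T, so per-GPU arithmetic intensity is unchanged, whereas changing the pipeline depth changes per-stage compute relative to fixed costs and therefore moves the roofline.

```python
# Hypothetical D x F matmul at batch B under T-way tensor parallelism
# (bf16 weights, 2 bytes each). Per-GPU FLOPs and per-GPU weight bytes
# both shrink by T, so their ratio (arithmetic intensity) stays B.
def arithmetic_intensity(B, D, F, T):
    flops_per_gpu = 2 * B * D * F / T      # matmul FLOPs, sharded T ways
    weight_bytes_per_gpu = 2 * D * F / T   # weight traffic, sharded T ways
    return flops_per_gpu / weight_bytes_per_gpu

print(arithmetic_intensity(32, 8192, 28672, T=1))  # 32.0
print(arithmetic_intensity(32, 8192, 28672, T=8))  # 32.0 -- unchanged
```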
-
Here's some discussion about the GPU section.