We also evaluated PatrickStar v0.4.3 on a single node of an A100 SuperPod.
Detail benchmark results on WeChat AI data center as well as NVIDIA SuperPod are posted on this [Google Doc](https://docs.google.com/spreadsheets/d/136CWc_jA_2zC4h1r-6dzD4PrOvp6aw6uCDchEyQv6sE/edit?usp=sharing).
We scaled PatrickStar to multiple machines (nodes) on SuperPod and succeeded in training GPT3-175B on 32 GPUs. As far as we know, this is the first work to run GPT3 on such a small GPU cluster; Microsoft used 10,000 V100 GPUs to pretrain GPT3. Now you can finetune it, or even pretrain your own model, on 32 A100 GPUs. Amazing!
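As a rough sketch of what a 32-GPU (4 nodes × 8 GPUs) run looks like with PyTorch's standard multi-node launcher: the script name `pretrain_demo.py` and the environment variables are placeholders for your own PatrickStar training entry point and cluster settings, not commands taken from this repository.

```shell
# Hypothetical launch for 4 nodes x 8 GPUs = 32 GPUs in total.
# Run once per node, with NODE_RANK set to 0..3 and MASTER_ADDR
# pointing at node 0. pretrain_demo.py is a placeholder for your
# own PatrickStar training script.
python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --nnodes=4 \
    --node_rank=$NODE_RANK \
    --master_addr=$MASTER_ADDR \
    --master_port=29500 \
    pretrain_demo.py
```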
We've also trained the [CLUE-GPT2](https://huggingface.co/uer/gpt2-chinese-cluecorpussmall) model with PatrickStar; the loss and accuracy curves are shown below: