[Tutorial][PTD] Deprecate "Training Transformer models using Distributed Data Parallel and Pipeline Parallelism" and redirect the page to parallelism APIs
#3358
| Job | Run time |
|---|---|
| | 10m 51s |
| | 49m 37s |
| | 17m 59s |
| | 27m 17s |
| | 18m 52s |
| | 16m 10s |
| | 18m 27s |
| | 12m 57s |
| | 16m 47s |
| | 14m 40s |
| | 25m 22s |
| | 32m 26s |
| | 35m 24s |
| | 22m 23s |
| | 16m 36s |
| | 19m 53s |
| Total | 5h 55m 41s |