Using DTensor to handle local num_heads change while TP is applied (#… #4890
| Job | Run time |
|---|---|
| | 39m 7s |
| | 15m 17s |
| | 24m 20s |
| | 21m 10s |
| | 19m 11s |
| | 22m 16s |
| | 21m 28s |
| | 35m 27s |
| | 15m 16s |
| | 15m 27s |
| | 16m 53s |
| | 13m 57s |
| | 38m 18s |
| | 18m 55s |
| | 28m 19s |
| | 22m 20s |
| Total | 6h 7m 41s |