Commit 95190af

Merge pull request #117 from AI-Hypercomputer/ironwood/readme_add_tflops_per_second_per_chip

Add TFLOPs/sec/chip to Ironwood training README

2 parents 7612af3 + 8aa6a5c

1 file changed

training/ironwood/README.md

Lines changed: 17 additions & 17 deletions
@@ -2,20 +2,20 @@
 
 The training recipes contained in this folder are optimized for Ironwood TPU. Here is a summary of the included recipes.
 
-| <div style="width:100px;">Model ID</div> | Number of chips | GBS | Sequence length | Precision | Step time (seconds) | Tokens/sec/chip |
-|-----------------|--------------------|--------------|--------------------------|--------------------|-------------------|---------------------------|
-| deepseek-v3 | 128 | 2048 | 4096 | bf16 | 27.91552391 | 2,347.65 |
-| deepseek-v3 | 128 | 2048 | 4096 | fp8_full | 22.83807576 | 2,869.59 |
-| deepseek-v3 | 256 | 4096 | 4096 | bf16 | 29.35336316 | 2,232.66 |
-| deepseek-v3 | 256 | 4096 | 4096 | fp8_full | 26.51635157 | 2,471.53 |
-| gpt-oss-120b | 64 | 1280 | 8192 | bf16 | 17.77661018 | 9,216.61 |
-| gpt-oss-120b | 256 | 5120 | 8192 | bf16 | 18.77993546 | 8,724.20 |
-| llama3.1-405b | 256 | 1536 | 8192 | bf16 | 99.66454824 | 493.17 |
-| llama3.1-405b | 256 | 1536 | 8192 | fp8_full | 65.02921753 | 755.84 |
-| llama3.1-70b | 64 | 256 | 8192 | bf16 | 12.51527348 | 2,618.24 |
-| llama3.1-70b | 64 | 256 | 8192 | fp8_full | 8.908863386 | 3,678.13 |
-| llama3.1-70b | 256 | 1024 | 8192 | bf16 | 12.78735822 | 2,562.53 |
-| llama3.1-70b | 256 | 1024 | 8192 | fp8_full | 9.384045601 | 3,491.88 |
-| llama3.1-70b | 256 | 64 | 131072 | bf16 | 34.72535706 | 943.63 |
-| llama3.1-70b | 256 | 64 | 131072 | fp8_full | 31.47576637 | 1,041.05 |
-| qwen3-235b-a22b | 256 | 8192 | 4096 | bf16 | 33.81617737 | 3,876.01 |
+| <div style="width:100px;">Model ID</div> | Number of chips | GBS | Sequence length | Precision | Step time (seconds) | TFLOPs/sec/chip | Tokens/sec/chip |
+|-----------------|--------------------|--------------|--------------------------|--------------------|-------------|--------------|-----------------------|
+| deepseek-v3 | 128 | 2048 | 4096 | bf16 | 27.91 | 587.91 | 2,347.65 |
+| deepseek-v3 | 128 | 2048 | 4096 | fp8_full | 22.83 | 718.57 | 2,869.59 |
+| deepseek-v3 | 256 | 4096 | 4096 | bf16 | 29.35 | 559.18 | 2,232.66 |
+| deepseek-v3 | 256 | 4096 | 4096 | fp8_full | 26.51 | 618.95 | 2,471.53 |
+| gpt-oss-120b | 64 | 1280 | 8192 | bf16 | 17.77 | 317.63 | 9,216.61 |
+| gpt-oss-120b | 256 | 5120 | 8192 | bf16 | 18.77 | 300.64 | 8,724.20 |
+| llama3.1-405b | 256 | 1536 | 8192 | bf16 | 99.66 | 1,244.67 | 493.17 |
+| llama3.1-405b | 256 | 1536 | 8192 | fp8_full | 65.02 | 1,907.81 | 755.84 |
+| llama3.1-70b | 64 | 256 | 8192 | bf16 | 12.51 | 1,176.27 | 2,618.24 |
+| llama3.1-70b | 64 | 256 | 8192 | fp8_full | 8.90 | 1,652.29 | 3,678.13 |
+| llama3.1-70b | 256 | 1024 | 8192 | bf16 | 12.78 | 1,151.14 | 2,562.53 |
+| llama3.1-70b | 256 | 1024 | 8192 | fp8_full | 9.38 | 1,568.72 | 3,491.88 |
+| llama3.1-70b | 256 | 64 | 131072 | bf16 | 34.72 | 879.83 | 943.63 |
+| llama3.1-70b | 256 | 64 | 131072 | fp8_full | 31.47 | 970.72 | 1,041.05 |
+| qwen3-235b-a22b | 256 | 8192 | 4096 | bf16 | 33.81 | 574.87 | 3,876.01 |
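
The Tokens/sec/chip column follows directly from the other columns: tokens processed per step (GBS × sequence length) divided by chip-seconds per step (step time × number of chips). The sketch below is not from the repo and the helper name is hypothetical; it reproduces a table row from the full-precision step time in the removed rows. TFLOPs/sec/chip additionally depends on each model's FLOPs per token, which the README does not list, so it is not derived here.

```python
# Minimal sketch (not part of the repo): how the Tokens/sec/chip column
# relates to the other columns. The function name is hypothetical.

def tokens_per_sec_per_chip(gbs: int, seq_len: int,
                            step_time_s: float, num_chips: int) -> float:
    """Tokens processed per training step, divided by chip-seconds per step."""
    return (gbs * seq_len) / (step_time_s * num_chips)

# deepseek-v3, 128 chips, bf16 row:
# 2048 * 4096 / (27.91552391 * 128) ≈ 2,347.65, matching the table.
print(f"{tokens_per_sec_per_chip(2048, 4096, 27.91552391, 128):,.2f}")
```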
