Commit f588ed5

Adding documentation for optimized model tiers
1 parent 64d6d9b commit f588ed5

2 files changed (+44, -0 lines)


docs/reference.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -25,4 +25,5 @@ reference/alternatives.md
 reference/benchmark_and_performance.md
 reference/architecture_overview.md
 reference/jax_xla_and_pallas.md
+reference/tiering.md
 ```
````

docs/reference/tiering.md

Lines changed: 43 additions & 0 deletions

# MaxText Optimized Models Tiering

For each TPU platform below, we present the models[^1][^2] that have been optimized for pre-training. If you’re getting started with MaxText, or want to push performance, we recommend choosing a Gold model and its accompanying pre-training recipe.

- **Gold Tier**: Fully optimized models certified to run with maximum efficiency on Cloud TPUs. They are thoroughly tuned for the highest possible performance, making them ideal for production-critical workloads that require peak throughput.

- **Silver Tier**: High-performance models that are well optimized to deliver high, reliable performance on Cloud TPUs. They are effective for most use cases, but expert tuning may still be needed to reach peak (Gold Tier) performance.

In each table, Benchmark Configuration gives the number of chips, the numeric precision, and the sequence length (SL); MFU is model FLOPs utilization.

## Trillium (v6e)

### Gold

| Model | Recipe | Benchmark Configuration | MFU | Approx tokens/sec/device |
| :--- | :--- | :--- | :--- | :--- |
| Llama 2 70B | [Link](https://github.com/AI-Hypercomputer/tpu-recipes/tree/main/training/trillium/Llama2-70B-MaxText) | 256 Chips, BF16, SL=4096 | 43.8% | 900 |
| Llama 3.1 8B | [Link](https://github.com/AI-Hypercomputer/tpu-recipes/tree/main/training/trillium/Llama3.1-8B-MaxText/v6e-256) | 256 Chips, BF16, SL=8192 | 45.46% | 7,207 |
| Llama 3.1 70B | [Link](https://github.com/AI-Hypercomputer/maxtext/blob/92e59fdf547421f647590087f50fea5729da42d8/benchmarks/maxtext_trillium_model_configs.py#L959) | 256 Chips, BF16, SL=8192 | 50.33% | 960 |

### Silver

| Model | Recipe | Benchmark Configuration | MFU | Approx tokens/sec/device |
| :--- | :--- | :--- | :--- | :--- |
| Llama 3.1 405B | [Link](https://github.com/AI-Hypercomputer/maxtext/blob/5e6a7caff904f67fa654fc0ae983a16156bc21f8/benchmarks/maxtext_trillium_model_configs.py#L723) | 256 Chips, BF16, SL=8192 | 38.55% | 123 |
| Mixtral 8x7B | [Link](https://github.com/AI-Hypercomputer/tpu-recipes/tree/main/training/trillium/Mixtral-8x7B-MaxText) | 256 Chips, BF16, SL=4096 | 35.23% | 3,899 |
| Mixtral 8x22B | [Link](https://github.com/AI-Hypercomputer/tpu-recipes/tree/main/training/trillium/Mixtral-8x22B-MaxText) | 256 Chips, BF16, SL=4096 | 36.2% | 1,326 |

## v5p

### Gold

| Model | Recipe | Benchmark Configuration | MFU | Approx tokens/sec/device |
| :--- | :--- | :--- | :--- | :--- |
| Llama 2 70B | [Link](https://github.com/AI-Hypercomputer/maxtext/blob/92e59fdf547421f647590087f50fea5729da42d8/benchmarks/maxtext_v5p_model_configs.py#L156) | 512 Chips, BF16, SL=4096 | 65.4% | 692 |

### Silver

| Model | Recipe | Benchmark Configuration | MFU | Approx tokens/sec/device |
| :--- | :--- | :--- | :--- | :--- |
| Mixtral 8x7B | [Link](https://github.com/AI-Hypercomputer/tpu-recipes/tree/main/training/v5p/Mixtral-8X7B-Maxtext) | 256 Chips (8x4x4), BF16, SL=4096 | 52.56% | 2,909 |
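
The MFU and tokens/sec/device columns are two views of the same measurement: MFU is the model FLOPs processed per second divided by the chip's peak FLOP/s. Here is a minimal sketch of that relationship. It assumes a PaLM-style per-token training cost of 6N + 12·L·d·S (N parameters, L layers, model width d, sequence length S), with the attention term halved for causal masking, and assumes a peak dense BF16 throughput of 918 TFLOP/s per Trillium chip and 459 TFLOP/s per v5p chip. None of these constants come from the tables, the function names are hypothetical, and MaxText's own FLOP accounting differs in detail, so expect ballpark rather than exact agreement.

```python
"""Rough MFU estimate from per-device training throughput (illustrative sketch)."""

# Assumed peak dense BF16 TFLOP/s per chip; not taken from the tables above.
PEAK_BF16_TFLOPS = {"v6e": 918.0, "v5p": 459.0}


def training_flops_per_token(n_params: float, n_layers: int, d_model: int,
                             seq_len: int, causal: bool = True) -> float:
    """Approximate forward + backward FLOPs per token (PaLM-style 6N + 12*L*d*S)."""
    dense = 6.0 * n_params
    # QK^T and the attention-weighted sum over V cost 4*S*d FLOPs per token
    # per layer in the forward pass; multiply by 3 to include the backward pass.
    attention = 12.0 * n_layers * d_model * seq_len
    if causal:
        # A causal mask only needs the lower triangle of the score matrix.
        attention /= 2.0
    return dense + attention


def estimate_mfu(tokens_per_sec_per_device: float, flops_per_token: float,
                 chip: str) -> float:
    """Model FLOPs achieved per second divided by the chip's assumed peak."""
    achieved_flops_per_sec = tokens_per_sec_per_device * flops_per_token
    peak_flops_per_sec = PEAK_BF16_TFLOPS[chip] * 1e12
    return achieved_flops_per_sec / peak_flops_per_sec


if __name__ == "__main__":
    # Assumed Llama 3.1 8B shape: ~8.03e9 params, 32 layers, d_model 4096.
    per_token = training_flops_per_token(8.03e9, n_layers=32, d_model=4096, seq_len=8192)
    mfu = estimate_mfu(7207, per_token, "v6e")
    print(f"Estimated MFU: {mfu:.1%}")  # prints "Estimated MFU: 42.9%"
```

Under these assumptions the Llama 3.1 8B row on Trillium comes out at about 42.9%, a few points below the 45.46% in the Gold table. The gap likely reflects differences in how MaxText counts FLOPs, so treat the sketch as a sanity check on the order of magnitude, not a reproduction of the benchmark.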

[^1]: Performance results are subject to variation based on system configuration, software versions, and other factors. These benchmarks are point-in-time measurements taken under specific conditions.

[^2]: Some older TFLOP/s results, and the MFU values derived from them, are affected by an updated calculation for causal attention ([PR #1988](https://github.com/AI-Hypercomputer/maxtext/pull/1988)), which halves the counted attention FLOPs because a causal mask only needs the lower triangle of the attention score matrix. This change particularly affects configurations with long sequence lengths. For more details, see the [performance metrics guide](https://github.com/AI-Hypercomputer/maxtext/blob/main/docs/guides/performance_metrics.md).
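
To see why the causal-attention correction in footnote 2 matters most at long sequence lengths, here is a back-of-the-envelope sketch. It assumes a PaLM-style per-token training estimate of 6N FLOPs for the dense layers plus 12·L·d·S for attention, and reports how much the total FLOP count (and therefore any TFLOP/s or MFU figure computed from the same measured step time) drops when the attention term is halved. The function name is hypothetical, and the model shapes in the example (Llama 2 70B: about 70B parameters, 80 layers, d_model 8192; Llama 3.1 8B: about 8.03B parameters, 32 layers, d_model 4096) are assumptions for illustration, not values from this page.

```python
"""How much does halving causal-attention FLOPs change the total? (illustrative sketch)"""


def causal_correction(n_params: float, n_layers: int, d_model: int, seq_len: int) -> float:
    """Fractional drop in per-token training FLOPs when attention is counted as causal."""
    dense = 6.0 * n_params
    full_attention = 12.0 * n_layers * d_model * seq_len  # old, non-causal count
    causal_attention = full_attention / 2.0               # new, causal count
    old_total = dense + full_attention
    new_total = dense + causal_attention
    return (old_total - new_total) / old_total


if __name__ == "__main__":
    # Assumed model shapes: (name, params, layers, d_model).
    shapes = [("Llama 2 70B", 70e9, 80, 8192), ("Llama 3.1 8B", 8.03e9, 32, 4096)]
    for name, n_params, n_layers, d_model in shapes:
        for seq_len in (4096, 8192, 32768):
            drop = causal_correction(n_params, n_layers, d_model, seq_len)
            print(f"{name:13s} SL={seq_len:6d}: counted FLOPs drop by {drop:.1%}")
```

Under these assumptions the drop stays in the single digits for Llama 2 70B at SL=4096 but grows to roughly a quarter of the total for the 8B-class shape at SL=32768, because the attention term scales with sequence length while the 6N term does not.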
