`docs/sphinx_doc/source/tutorial/trinity_trainer_configs.md` (6 additions, 6 deletions)

```diff
@@ -1,7 +1,7 @@
 # Trainer Parameter Configuration Guide
 
 This document provides recommended training configurations for Qwen3 series models on **NVIDIA A100 80GB** and **H20 96GB** GPUs.
-Based on model size (0.6B ~ 14B) and context length (`max_model_len`), we present feasible Trainer module setups across varying numbers of GPUs.
+Based on model size (0.6B ~ 14B) and context length (`model.max_model_len`), we present feasible Trainer module setups across varying numbers of GPUs.
 
 > 💡 **Terminology**
 >
```
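
The hunk above just scopes the config key to its YAML section. For context, a minimal sketch of where `model.max_model_len` sits in a trainer config; the model path and value below are illustrative assumptions, not settings taken from the guide:

```yaml
model:
  model_path: Qwen/Qwen3-1.7B  # illustrative; any Qwen3-series checkpoint
  max_model_len: 20480         # the context length the recommendation tables key on
```
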
```diff
@@ -12,8 +12,8 @@ Based on model size (0.6B ~ 14B) and context length (`max_model_len`), we present feasible Trainer module setups across varying numbers of GPUs.
 >```
 > - **Offload**: Enable **FSDP v2 + CPU Offload** to reduce GPU memory usage.
 > - **SP=N**: Use **Sequence Parallelism** with parallelism degree N (typically N ≤ number of GPUs).
-> - **Combined entries (e.g., `Env SP=2`)**: All listed conditions must be satisfied simultaneously.
-> - **“-”**: The combination of current hardware and configuration **cannot support training** for this model + sequence length.
+> - **Combined entries (e.g., `Env + SP=2`)**: All listed conditions must be satisfied simultaneously.
+> - **“-”**: The combination of current hardware and configuration **cannot support training** for this model under the given sequence length.
 
 ---
 
```
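
To make the shorthand concrete: a table cell reading `Offload + SP=2` means enabling FSDP v2 with CPU offload *and* setting the sequence-parallel degree to 2 at the same time. A heavily hedged sketch of what that could look like in YAML follows; the key names are hypothetical placeholders, not confirmed Trinity-RFT options, so consult the project's configuration reference for the real ones:

```yaml
# Hypothetical key names, shown only to illustrate the table shorthand.
trainer:
  strategy: fsdp2                     # "Offload": FSDP v2 ...
  param_offload: true                 # ... with parameters offloaded to CPU
  ulysses_sequence_parallel_size: 2   # "SP=2": sequence parallelism of degree 2
```
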
```diff
@@ -37,7 +37,7 @@ model:
 
 ---
 
-## 🖥️ A100 80GB GPU Configuration Recommendations
+## A100 80GB GPU Configuration Recommendations
 
 > ⚠️ **Single-GPU Limitation**: Training models ≥4B or with context lengths >20K on a single A100 GPU places extreme pressure on VRAM. **Multi-GPU setups are strongly recommended**.
 
```
```diff
@@ -138,7 +138,7 @@ model:
 
 ---
 
-## 🧊 H20 96GB GPU Configuration Recommendations
+## H20 96GB GPU Configuration Recommendations
 
 The H20 has larger VRAM (96GB) but lower compute performance compared to the A100.
 
```
```diff
@@ -253,5 +253,5 @@ The H20 has larger VRAM (96GB) but lower compute performance compared to the A100.
 - Step 1: Set `export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`
 - Step 2: Increase **Sequence Parallelism (SP)**
 - Step 3: Enable **FSDP v2 + CPU Offload**
-4. **Choosing SP parallelism degree**: Prefer values that are **common divisors of both GPU count and attention head count** (e.g., 2, 4) to avoid communication bottlenecks.
+4. **Choosing SP parallelism degree**: Prefer values that are **common divisors of both GPU count and attention head count** (e.g., 2, 4).
 5. **Prefer multi-GPU over single-GPU**: Even when VRAM appears sufficient, multi-GPU setups improve training efficiency and stability through parallelization.
```
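
The divisor rule in item 4 can be checked mechanically. A minimal Python sketch; the GPU and attention-head counts are illustrative, not taken from the guide:

```python
from math import gcd

def candidate_sp_degrees(num_gpus: int, num_attention_heads: int) -> list[int]:
    # Common divisors of the two counts are exactly the divisors of their gcd.
    g = gcd(num_gpus, num_attention_heads)
    return [d for d in range(2, g + 1) if g % d == 0]

# Illustrative numbers only: 8 GPUs and a model with 32 attention heads.
print(candidate_sp_degrees(8, 32))  # [2, 4, 8]
```

Ulysses-style sequence parallelism shards attention heads across ranks, so a degree that does not divide the head count cannot split the heads evenly.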