Commit 756a055

feat(doc): explain deepspeed configs (axolotl-ai-cloud#2514) [skip ci]
* feat(doc): explain deepspeed configs
* fix: add fetch configs
1 parent 9a8e3e9 commit 756a055

1 file changed: +14 -1 lines changed

docs/multi-gpu.qmd

Lines changed: 14 additions & 1 deletion
@@ -36,6 +36,9 @@ deepspeed: deepspeed_configs/zero1.json
 ### Usage {#sec-deepspeed-usage}

 ```{.bash}
+# Fetch deepspeed configs (if not already present)
+axolotl fetch deepspeed_configs
+
 # Passing arg via config
 axolotl train config.yml

@@ -48,10 +51,20 @@ axolotl train config.yml --deepspeed deepspeed_configs/zero1.json
 We provide default configurations for:

 - ZeRO Stage 1 (`zero1.json`)
+- ZeRO Stage 1 with torch compile (`zero1_torch_compile.json`)
 - ZeRO Stage 2 (`zero2.json`)
 - ZeRO Stage 3 (`zero3.json`)
+- ZeRO Stage 3 with bf16 (`zero3_bf16.json`)
+- ZeRO Stage 3 with bf16 and CPU offload params(`zero3_bf16_cpuoffload_params.json`)
+- ZeRO Stage 3 with bf16 and CPU offload params and optimizer (`zero3_bf16_cpuoffload_all.json`)
+
+::: {.callout-tip}
+
+Choose the configuration that offloads the least amount to memory while still being able to fit on VRAM for best performance.

-Choose based on your memory requirements and performance needs.
+Start from Stage 1 -> Stage 2 -> Stage 3.
+
+:::

 ## FSDP {#sec-fsdp}
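The tip added in this change recommends starting at Stage 1, moving to Stage 2 and then Stage 3, and offloading as little as possible. Below is a minimal sketch of that escalation order, using the `axolotl train config.yml --deepspeed ...` form shown in the diff. The `configs` array and the `print_candidates` helper are hypothetical names introduced here for illustration, and placing the CPU-offload variants after plain Stage 3 is an assumption drawn from the tip's wording. The torch-compile and bf16-only variants are left out for brevity. The script only prints the candidate commands and does not run any training.

```bash
#!/usr/bin/env bash
# Sketch only: print candidate train commands, least sharding/offload first.
set -euo pipefail

# File names come from the list of default configs in the diff.
# The array itself and its ordering of the offload variants are assumptions.
configs=(
  zero1.json
  zero2.json
  zero3.json
  zero3_bf16_cpuoffload_params.json
  zero3_bf16_cpuoffload_all.json
)

# Hypothetical helper: emit one command per config, in the order to try them.
print_candidates() {
  local cfg
  for cfg in "${configs[@]}"; do
    echo "axolotl train config.yml --deepspeed deepspeed_configs/${cfg}"
  done
}

print_candidates
```

In practice one would run the first printed command and move to the next only if the run does not fit in VRAM, which matches the intent of picking the configuration that offloads the least.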
