Commit ac76ffa
[moe] feat: enabling expert parallelism in veScale (#59)
## Overview
veScale provides a framework for training Mixture of Experts (MoE)
models with expert parallelism. Expert parallelism is enabled through
the `parallelize_experts()` function, which distributes expert
parameters and the token workload across devices during MoE training.
### Function Signature
```python
model = parallelize_experts(
    module: nn.Module,
    experts_expr: Union[str, List[str]],
    experts_allocator: vescale.moe.ExpertsAllocator,
    token_dispatcher: vescale.moe.TokenDispatcher,
    config: Dict,
)
```
### Parameters
- **`module`**: The training model (an instance of `nn.Module`) to be
parallelized.
- **`experts_expr`**: Specifies the paths to the expert modules. Can be
a string or a list of strings.
- **`experts_allocator`**: An instance of `ExpertsAllocator`, used for
managing expert parameter allocation.
- **`token_dispatcher`**: An instance of `TokenDispatcher`, responsible
for token scheduling and distribution.
- **`config`**: A dictionary containing the MoE training configuration,
including layer count, number of experts, and other relevant settings.
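To make the role of `experts_expr` more concrete, the sketch below selects expert submodules from a toy list of dotted module paths by matching them against one or more patterns. It is only an illustration: treating each expression as a regular expression that must match the full path is an assumption, and `select_expert_paths` and the example paths are hypothetical names that are not part of veScale.

```python
import re
from typing import Iterable, List, Union


def select_expert_paths(
    module_paths: Iterable[str],
    experts_expr: Union[str, List[str]],
) -> List[str]:
    """Return the module paths matched by any pattern in `experts_expr`.

    Assumption: each expression is a regular expression that must match
    the whole dotted path. veScale's real matching rules may differ.
    """
    # Accept either a single string or a list of strings, as the parameter does.
    patterns = [experts_expr] if isinstance(experts_expr, str) else list(experts_expr)
    compiled = [re.compile(p) for p in patterns]
    return [
        path
        for path in module_paths
        if any(rx.fullmatch(path) for rx in compiled)
    ]


# Hypothetical dotted paths, in the style reported by `nn.Module.named_modules()`.
paths = [
    "layers.0.attention",
    "layers.0.moe.experts.0",
    "layers.0.moe.experts.1",
    "layers.1.attention",
    "layers.1.moe.experts.0",
]

matched = select_expert_paths(paths, r"layers\.\d+\.moe\.experts\.\d+")
print(matched)
```

Passing a list of patterns works the same way; a path is selected as soon as any one pattern matches it.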
## Custom Scheduling
veScale allows users to define custom scheduling strategies for expert
parallelism by implementing the following components:
- **`ExpertsAllocator`**: Manages expert parameter allocation. It can
use `collect_performance()` to profile and dynamically adjust the DP x
TP device mesh for each expert. By default, veScale shards all expert
parameters across devices using tensor parallelism.
- **`TokenDispatcher`**: Handles token distribution. Using
`assign_task()`, it determines workload allocation (e.g., expert IDs and
token weights) and adjusts scheduling with `collect_performance()`. The
default implementation randomly assigns tokens to a single DP rank for
the selected expert.
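The default dispatching behavior described above can be sketched in plain Python. This is a toy model, not veScale's `TokenDispatcher`: the class name `RandomTokenDispatcher`, the constructor arguments, and the method signatures are assumptions made for illustration; only the method names `assign_task()` and `collect_performance()` come from the description.

```python
import random
from typing import Dict, List, Tuple

# (expert_id, dp_rank, token_weight) for one token
Assignment = Tuple[int, int, float]


class RandomTokenDispatcher:
    """Toy dispatcher: send each token to one random DP rank of its expert."""

    def __init__(self, dp_ranks_per_expert: Dict[int, List[int]], seed: int = 0):
        # Hypothetical layout: expert ID -> DP ranks holding a replica of it.
        self.dp_ranks_per_expert = dp_ranks_per_expert
        self.rng = random.Random(seed)
        # Running token counts per (expert_id, dp_rank).
        self.tokens_per_rank: Dict[Tuple[int, int], int] = {}

    def assign_task(
        self, expert_ids: List[int], token_weights: List[float]
    ) -> List[Assignment]:
        """Pick a single DP rank at random for every (token, expert) pair."""
        assignments = []
        for expert_id, weight in zip(expert_ids, token_weights):
            dp_rank = self.rng.choice(self.dp_ranks_per_expert[expert_id])
            assignments.append((expert_id, dp_rank, weight))
        return assignments

    def collect_performance(self, assignments: List[Assignment]) -> None:
        """Record load per rank; a custom policy could use it to rebalance."""
        for expert_id, dp_rank, _ in assignments:
            key = (expert_id, dp_rank)
            self.tokens_per_rank[key] = self.tokens_per_rank.get(key, 0) + 1


# Expert 0 is replicated on 2 DP ranks, expert 1 on 4.
dispatcher = RandomTokenDispatcher({0: [0, 1], 1: [0, 1, 2, 3]})
plan = dispatcher.assign_task(
    expert_ids=[0, 1, 1, 0],
    token_weights=[0.9, 0.6, 0.4, 0.7],
)
dispatcher.collect_performance(plan)
print(plan)
```

A custom strategy would replace the random choice in `assign_task()` with a decision based on the statistics gathered in `collect_performance()`, for example preferring the least-loaded rank.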
## Optimizer Support
Since veScale supports dynamic placement of expert parameters, a
dedicated optimizer, `MoEOptimizer`, is required. This optimizer handles
the redistribution of expert parameters and their states efficiently.
Future updates will integrate these functionalities into optimizers for
static parameters to streamline the process.
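The reason a dedicated optimizer is needed can be illustrated with a toy model: when an expert moves to another device, its optimizer state (for example a momentum buffer) has to move with it. The sketch below is not `MoEOptimizer`; the class `ToyMoEOptimizerState`, its `redistribute()` method, and the use of integer device indices and Python lists in place of sharded tensors are all assumptions made for illustration.

```python
from typing import Dict, List


class ToyMoEOptimizerState:
    """Keeps each expert's momentum buffer on the device that owns the expert."""

    def __init__(self, placement: Dict[str, int]):
        # expert name -> index of the device that currently owns it
        self.placement = dict(placement)
        # device index -> {expert name -> momentum buffer}
        self.state: Dict[int, Dict[str, List[float]]] = {}
        for expert, device in placement.items():
            self.state.setdefault(device, {})[expert] = [0.0, 0.0]

    def redistribute(self, new_placement: Dict[str, int]) -> None:
        """Move optimizer state so that it follows the new expert placement."""
        for expert, new_device in new_placement.items():
            old_device = self.placement[expert]
            if new_device == old_device:
                continue  # expert stays put, nothing to move
            buffer = self.state[old_device].pop(expert)
            self.state.setdefault(new_device, {})[expert] = buffer
            self.placement[expert] = new_device


opt = ToyMoEOptimizerState({"expert0": 0, "expert1": 1})
opt.state[0]["expert0"] = [0.5, -0.25]  # pretend a few steps have run

# The allocator decides to co-locate both experts on device 1.
opt.redistribute({"expert0": 1, "expert1": 1})
print(opt.placement)
print(opt.state)
```

An optimizer for static parameters never needs the `redistribute()` step, which is why the two are kept separate for now.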
## Getting Started
### Data Preparation
Prepare the Shakespeare dataset by running:
```bash
cd data/shakespeare/
python3 prepare.py
cd ../..
```
### Training Command
```bash
torchrun --standalone --nproc_per_node={GPU_CNT} mixtral_train.py --dp={dp_size} --tp={tp_size} --max_iters={max_iters}
```

1 parent b4b1686 commit ac76ffa
File tree (41 files changed, +2444 -132 lines changed):

- examples
  - llama2_4D_finetune
  - mixtral_4D_benchmark
  - mixtral_4D_training
  - mixtral_EP_training
    - data/shakespeare
  - nanogpt_4D_finetune
- test/emulator
- vescale
  - ddp
  - dmodule
  - dmp
    - policies
  - dtensor
    - ops
  - emulator
  - moe
  - optim