# Design Doc: Large Model

## Abstract

We propose an approach to support large parameters.
The parameter of an embedding layer may be very large and may not fit in
a single trainer's memory. In this approach, a Trainer
prefetches sliced parameters from different Parameter Server instances
according to the input `Ids`, then runs the forward and backward passes and sends
the gradients to the Parameter Servers to execute the optimize program.

## Design

Fluid large model distributed training uses the
[Distributed Transpiler](./parameter_server.md#distributed-transpiler) to split
a large parameter into multiple sliced parameters stored on the Parameter Servers;
the Trainer then prefetches them through the `RPC` interface.

### Split Large Parameter

<img src="src/split_parameter.png" width="400" />

**Distributed Transpiler** splits the large parameter
(weight) into several sliced parameters (weight_0, weight_1, weight_2), as shown in the
figure above.
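
The sketch below shows one way the row-wise split boundaries could be computed; the
function name `split_rows` and the concrete sizes are illustrative assumptions, not
the actual Distributed Transpiler implementation.

```python
# A minimal, illustrative row-wise split (hypothetical helper, not Fluid code).
def split_rows(total_rows, num_pservers):
    """Return [start, end) row ranges, one sliced parameter per Parameter Server."""
    rows_per_slice = (total_rows + num_pservers - 1) // num_pservers
    ranges = []
    for i in range(num_pservers):
        start = i * rows_per_slice
        end = min(start + rows_per_slice, total_rows)
        ranges.append((start, end))
    return ranges

# Example: a 100000-row embedding weight split across 3 Parameter Servers becomes
# weight_0, weight_1 and weight_2 holding the row ranges below.
print(split_rows(100000, 3))  # [(0, 33334), (33334, 66668), (66668, 100000)]
```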

### Prefetch Parameters from Parameter Servers

<img src="src/prefetch_parameters.png" width="400" />

- The `PrefetchRpc` operator sends the row indices to the corresponding Parameter Servers
  and then receives the `SelectedRows` back.
- The difference from normal Fluid distributed training is that we only prefetch the rows
  needed by the current input `Ids` instead of the whole parameter; see the sketch after
  this list.
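
A minimal sketch, assuming the row ranges from the split sketch above, of how input
`Ids` could be routed to the Parameter Servers that own the corresponding rows. The
helper `prefetch_requests` and the hard-coded ranges are illustrative assumptions; in
Fluid this routing is done by the `PrefetchRpc` operator, not user-level Python.

```python
# Illustrative routing step only; the real PrefetchRpc operator runs inside Fluid.
def prefetch_requests(ids, ranges):
    """Group input ids by the Parameter Server that owns their rows.

    `ranges` holds the [start, end) row range of each sliced parameter; the result
    maps a Parameter Server index to the local row indices it should look up.
    """
    requests = {i: [] for i in range(len(ranges))}
    for row in sorted(set(ids)):
        for i, (start, end) in enumerate(ranges):
            if start <= row < end:
                requests[i].append(row - start)  # local row index inside weight_i
                break
    return requests

# Example: with the three slices from the split sketch, ids [5, 40000, 99999] go to
# Parameter Servers 0, 1 and 2; each non-empty entry becomes one RPC whose reply
# carries only the requested rows (a SelectedRows).
ranges = [(0, 33334), (33334, 66668), (66668, 100000)]
print(prefetch_requests([5, 40000, 99999], ranges))  # {0: [5], 1: [6666], 2: [33331]}
```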

## TODO

- Async Update

  To avoid the slow-node problem, asynchronous update is important for distributed
  training; we need a design doc and an implementation for it in the future.