
Commit 5948fd2

Author: Abhinav Arora (committed)
Refine the prefetch parameter document
1 parent 302136e

File tree: 1 file changed (+10, -17 lines)

1 file changed

+10
-17
lines changed

doc/fluid/design/dist_train/prefetch_parameter.md

Lines changed: 10 additions & 17 deletions
@@ -2,40 +2,33 @@
 
 ## Abstract
 
-We propose an approach to prefetch parameter from Parameter
-Server while distributed training so that Fluid would training
-a model including the large parameter which could not be stored in one
-trainer's memory.
+We propose an approach to pre-fetch the parameters from a Parameter Server while distributed training so that Fluid is able to train a model with a large number of parameters that cannot be stored in one trainer's memory.
 
 ## Background
 
-For an embedding layer, the trainable parameter may be very large and could
-not be stored in one trainer's memory. In Fluid distributed training,
-[Distributed Transpiler](./parameter_server.md#distributed-transpiler) would split every parameter into a number of small
-parameters and stored in Parameter Server, so we could prefetch the parameter
-from the specified Parameter Server according to the input `Ids`.
+For an embedding layer, the number of trainable parameters may be very large and it is likely that they may not be able to be stored in one trainer's memory. In Fluid distributed training,
+the [Distributed Transpiler](./parameter_server.md#distributed-transpiler) would split every parameter into a number of small parameters that are stored on the Parameter Server. Hence, we can pre-fetch the parameters from the specified Parameter Server using the input `Ids`.
 
 ## Design
 
-This is a feature of Fluid distributed training, maybe you want
-to know [Distributed Architecture](./distributed_architecture.md) and
-[Parameter Server](./parameter_server.md) before reading the following content.
+Prior to reading this design, it would be useful for the reader to make themselves familiar with Fluid [Distributed Training Architecture](./distributed_architecture.md) and
+[Parameter Server](./parameter_server.md).
 
 ### Partationed Parameter
 
 <img src="src/split_parameter.png" width="400" />
 
-- **Distributed Transpiler** would split the large parameter
-(weight) into some partitioned parameters (weight_0, weight_1, weight_2) as the
+- **Distributed Transpiler** would split the large parameters
+(`weight`) into some partitioned parameters (`weight_0`, `weight_1`, `weight_2`) as shown in the
 figure above.
-- We could use `round-robin` to distribute the partitioned parameter.
+- We can use `round-robin` to distribute the partitioned parameter.
 
-### Prefetching Parameter
+### Pre-fetching Parameters
 
 <img src="src/prefetch_parameters.png" width="400" />
 
 - `prefetch_rpc` operator would prefetch the parameter from different Parameter
-Server according with the input `Ids`, we use [SelectedRows](../../../design/selected_rows.md)
+Servers using the input `Ids`. We use [SelectedRows](../../../design/selected_rows.md)
 as the received variable type.
 - `merge_selected_rows` operator would merge the received parameters into one
 `SelectedRows` variable.
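
To make the partitioning described in the diff above concrete, here is a minimal Python sketch of round-robin placement of row blocks across parameter servers. It is not the Distributed Transpiler's actual code; the helper names (`round_robin_split`, `lookup_shard`), the equal block size, and the three-server example are assumptions made purely for illustration.

```python
# A sketch of round-robin placement of partitioned parameters; not Fluid's
# transpiler code. Helper names and the equal-sized blocks are assumptions.

def round_robin_split(total_rows, num_pservers):
    """Split a [total_rows x width] table into row blocks and assign
    block i to parameter server i % num_pservers."""
    block_size = (total_rows + num_pservers - 1) // num_pservers
    placement = []  # (pserver_id, row_begin, row_end) per block
    for i, begin in enumerate(range(0, total_rows, block_size)):
        end = min(begin + block_size, total_rows)
        placement.append((i % num_pservers, begin, end))
    return placement


def lookup_shard(row_id, placement):
    """Find the parameter server that holds a given row of the original table."""
    for pserver_id, begin, end in placement:
        if begin <= row_id < end:
            return pserver_id, row_id - begin  # owning server, local row offset
    raise IndexError("row %d is out of range" % row_id)


# Example: a `weight` with 10 rows split across 3 parameter servers gives
# weight_0, weight_1, weight_2, one per server.
placement = round_robin_split(total_rows=10, num_pservers=3)
print(placement)                   # [(0, 0, 4), (1, 4, 8), (2, 8, 10)]
print(lookup_shard(6, placement))  # (1, 2): row 6 lives on pserver 1, local row 2
```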
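
The pre-fetch path (`prefetch_rpc` followed by `merge_selected_rows`) can be sketched the same way. The real operators issue RPCs and return `SelectedRows` variables; here the RPC is replaced by an in-process dictionary lookup and `SelectedRows` is modeled as a plain `(rows, values)` pair, so everything below is an illustration under those assumptions rather than Fluid's operator interface.

```python
# A sketch of the pre-fetch flow: route each input Id to the parameter server
# that owns it, "fetch" the rows, then merge the per-server results into one
# SelectedRows-like (rows, values) pair. In-process tables stand in for RPC.

from collections import defaultdict

BLOCK_SIZE = 4  # rows per partition, matching the 10-row / 3-server sketch above


def owner(row_id):
    """Map a global row id to (pserver_id, local_row) under round-robin."""
    return row_id // BLOCK_SIZE, row_id % BLOCK_SIZE


# Pretend each parameter server holds its partition as {local_row: vector}.
pserver_tables = {
    0: {0: [0.0, 0.1], 1: [1.0, 1.1], 2: [2.0, 2.1], 3: [3.0, 3.1]},
    1: {0: [4.0, 4.1], 1: [5.0, 5.1], 2: [6.0, 6.1], 3: [7.0, 7.1]},
    2: {0: [8.0, 8.1], 1: [9.0, 9.1]},
}


def prefetch(ids):
    """Group the input Ids by owning server and fetch their rows (the role
    played by prefetch_rpc). Returns one (rows, values) result per server."""
    per_server = defaultdict(list)
    for row_id in ids:
        pserver_id, local_row = owner(row_id)
        per_server[pserver_id].append((row_id, local_row))
    results = []
    for pserver_id, pairs in per_server.items():
        rows = [row_id for row_id, _ in pairs]
        values = [pserver_tables[pserver_id][local] for _, local in pairs]
        results.append((rows, values))
    return results


def merge_results(results):
    """Concatenate per-server results into one pair (the role played by
    merge_selected_rows)."""
    merged_rows, merged_values = [], []
    for rows, values in results:
        merged_rows.extend(rows)
        merged_values.extend(values)
    return merged_rows, merged_values


ids = [2, 6, 9]  # lookup Ids coming from the input batch
rows, values = merge_results(prefetch(ids))
print(rows)    # [2, 6, 9]
print(values)  # [[2.0, 2.1], [6.0, 6.1], [9.0, 9.1]]
```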
