@@ -9,16 +9,16 @@ different purposes.
## Background

- The previous implementations of the parameter server does not run a
+ The previous implementations of the parameter server do not run a
fluid sub-program. Parameter initialization, optimizer computation, network
communication and checkpointing are implemented twice on both the
- trainer and the parameter server.
+ trainer as well as the parameter server.

- It would be great if we can write code once and use them on both the
- trainer and the parameter server: reduces code duplication and
- improves extensibility. Given that after the current refactor, we are
- representing everything as a computing graph on the
- trainer. Representing everything as a computing graph on the parameter
+ It would be great if we can write code once and use them on both: the
+ trainer and the parameter server, since this reduces code duplication and
+ improves extensibility. Given that after the current refactoring, we are
+ representing everything as a computation graph on the
+ trainer. Representing everything as a computation graph on the parameter
server becomes a natural extension.

## Design
@@ -30,9 +30,9 @@ into sub-programs to be scheduled on different nodes with the following
steps:

1. OP placement: the OPs will be placed on different nodes according
- to heuristic that minimizes estimated total computation
+ to a heuristic that minimizes the estimated total computation
time. Currently we will use a simple heuristic that puts parameter
- varable on parameter server workers and everything else on trainer
+ variable on parameter server workers and everything else on trainer
workers.
1. Add communication OPs to enable the communication between nodes.
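To make the placement step concrete, the sketch below shows the simple heuristic described above. The `program`/`op`/`var` structures and the helper name `split_program` are illustrative stand-ins, not the real Fluid transpiler API: parameter variables and the optimizer OPs that touch them go to the parameter server workers, everything else stays on the trainer workers, and each cut edge is later bridged with communication OPs.

```python
# Illustrative sketch only: "program", "op", and "var" are stand-ins for
# the real Fluid data structures, and the heuristic is the simple one
# described above (parameters and their optimizer OPs on the parameter
# server, everything else on the trainer).
def split_program(program, optimizer_op_types=("sgd", "adam")):
    trainer_ops, pserver_ops = [], []
    for op in program.ops:
        touches_param = any(v.is_parameter for v in list(op.inputs) + list(op.outputs))
        if op.type in optimizer_op_types and touches_param:
            pserver_ops.append(op)   # optimizer update runs next to the parameter
        else:
            trainer_ops.append(op)   # forward/backward computation stays on the trainer
    # Every variable produced on one side and consumed on the other becomes
    # a cut edge, which the next step bridges with Send/Recv OPs.
    return trainer_ops, pserver_ops
```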
@@ -47,22 +47,22 @@ After converting:
<img src="src/dist-graph.png" width="700"/>

- 1. The parameter variable W and it's optimizer program are placed on the parameter server.
+ 1. The parameter variable W and its optimizer program are placed on the parameter server.
1. Operators are added to the program.
- *Send* sends data to the connected *Recv* operator. The
scheduler on the receive node will only schedule *Recv* operator
to run when the *Send* operator has ran (the *Send* OP will mark
the *Recv* OP runnable automatically).
- - *Enueue* enqueues the input variable, it can block until space
+ - *Enqueue* enqueues the input variable, it can block until space
become available in the queue.
- *Dequeue* outputs configurable numbers of tensors from the
- queue. It will block until the queue have the required number of
+ queue. It will block until the queue has the required number of
tensors.
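The blocking behaviour of *Enqueue* and *Dequeue* described in the list above can be sketched with ordinary Python threading primitives. This is only an illustration of the intended semantics, not the Fluid OP implementation; the class and method names are made up.

```python
import threading
from collections import deque

class TensorQueue(object):
    """Illustrative queue with the semantics described above: enqueue
    blocks until space is available, dequeue blocks until at least
    `min_count` tensors are present."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = deque()
        self._cond = threading.Condition()

    def enqueue(self, tensor):
        with self._cond:
            while len(self._items) >= self._capacity:
                self._cond.wait()        # block until space becomes available
            self._items.append(tensor)
            self._cond.notify_all()

    def dequeue(self, min_count):
        with self._cond:
            while len(self._items) < min_count:
                self._cond.wait()        # block until enough tensors arrive
            out = [self._items.popleft() for _ in range(min_count)]
            self._cond.notify_all()
            return out
```

In this picture a trainer-side thread would call `enqueue` for each gradient it produces, while the parameter-server side calls `dequeue(min_count)` to wait for enough gradients before applying an update.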
### Benefits
- - Model parallelism become easier to implement: it's an extension to
+ - Model parallelism becomes easier to implement: it is an extension to
the trainer - parameter server approach. We can have several "Transpilers"
to achieve different goals.
- User-defined optimizer is easier to add - user can now express it as
@@ -72,22 +72,22 @@ After converting:
### Challenges

- - It's important to balance the parameter shards of on multiple
- parameter server. If a single parameter is very big (some
+ - It is important to balance the parameter shards on multiple
+ parameter servers. If a single parameter is very big (for example: some
word-embedding, fully connected, softmax layer), we need to
automatically partition the single parameter onto different
parameter servers when possible (only element-wise optimizer depends
on the parameter variable).
- - In the "Aync SGD" figure, the "W" variable on the parameter server
- could be read and wrote concurrently. See
+ - In the "Async SGD" figure, the "W" variable on the parameter server
+ could be read and written concurrently. See
[here](https://github.com/PaddlePaddle/Paddle/pull/6394) for more
- details about concurrent program in fluid.
+ details about concurrent program in Fluid.

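To make the first challenge concrete, the sketch below partitions one large parameter into roughly equal shards, one per parameter server; this works precisely because an element-wise optimizer can update each shard independently. It is a numpy illustration with made-up names, not the way Fluid actually splits variables.

```python
import numpy as np

def shard_parameter(param, num_pservers):
    """Split a flat parameter into roughly equal shards, one per
    parameter server; an element-wise optimizer (e.g. SGD) can then be
    applied to each shard on its own server."""
    flat = param.reshape(-1)
    # np.array_split spreads the remainder over the first shards,
    # keeping the shard sizes balanced.
    return np.array_split(flat, num_pservers)

# Example: a 10-element "embedding" split across 3 parameter servers
# yields shards of size 4, 3, and 3.
shards = shard_parameter(np.arange(10, dtype=np.float32), 3)
```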
### Discussion
- Can the Enqueue OP be implemented under our current tensor design
- (puts the input tensor into the queue tensor)?
- - *Dequeue* OP will have variable numbers of output (depends on the
+ (put the input tensor into the queue tensor)?
+ - *Dequeue* OP will have variable numbers of output (depending on the
`min_count` attribute), does our current design support it? (similar
question for the *Add* OP)