
Commit 7081f21
Author: kavyasrinet
Update the parameter_server doc (#7805)
1 parent: 7ed48bd

File tree: 1 file changed (+20 −20 lines)

doc/design/dist_refactor/parameter_server.md

Lines changed: 20 additions & 20 deletions
```diff
@@ -9,16 +9,16 @@ different purposes.
 
 ## Background
 
-The previous implementations of the parameter server does not run a
+The previous implementations of the parameter server do not run a
 fluid sub-program. Parameter initialization, optimizer computation, network
 communication and checkpointing are implemented twice on both the
-trainer and the parameter server.
+trainer as well as the parameter server.
 
-It would be great if we can write code once and use them on both the
-trainer and the parameter server: reduces code duplication and
-improves extensibility. Given that after the current refactor, we are
-representing everything as a computing graph on the
-trainer. Representing everything as a computing graph on the parameter
+It would be great if we could write the code once and use it on both the
+trainer and the parameter server: this reduces code duplication and
+improves extensibility. Given that, after the current refactoring, we are
+representing everything as a computation graph on the
+trainer, representing everything as a computation graph on the parameter
 server becomes a natural extension.
 
 ## Design
```
```diff
@@ -30,9 +30,9 @@ into sub-programs to be scheduled on different nodes with the following
 steps:
 
 1. OP placement: the OPs will be placed on different nodes according
-   to heuristic that minimizes estimated total computation
+   to a heuristic that minimizes the estimated total computation
    time. Currently we will use a simple heuristic that puts parameter
-   varable on parameter server workers and everything else on trainer
+   variables on parameter server workers and everything else on trainer
    workers.
 1. Add communication OPs to enable the communication between nodes.
```
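The placement heuristic in this hunk can be sketched as follows. This is an illustrative model only; `place_ops` and the data shapes it takes are hypothetical names, not the actual Fluid transpiler API.

```python
# Illustrative sketch of the simple placement heuristic: parameter
# variables go to parameter server workers, everything else goes to
# trainer workers. Hypothetical helper, not the Fluid transpiler API.
def place_ops(ops, param_names):
    """Split (op_name, output_names) pairs into trainer and pserver lists."""
    trainer_ops, pserver_ops = [], []
    for op_name, outputs in ops:
        # An OP that writes a parameter variable (e.g. the optimizer
        # step updating W) is placed on a parameter server worker.
        if any(out in param_names for out in outputs):
            pserver_ops.append(op_name)
        else:
            trainer_ops.append(op_name)
    return trainer_ops, pserver_ops

ops = [("mul", ["hidden"]), ("sgd", ["W"]), ("relu", ["act"])]
trainer, pserver = place_ops(ops, param_names={"W"})
print(trainer, pserver)  # ['mul', 'relu'] ['sgd']
```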

```diff
@@ -47,22 +47,22 @@ After converting:
 
 <img src="src/dist-graph.png" width="700"/>
 
-1. The parameter variable W and it's optimizer program are placed on the parameter server.
+1. The parameter variable W and its optimizer program are placed on the parameter server.
 1. Operators are added to the program.
    - *Send* sends data to the connected *Recv* operator. The
      scheduler on the receiving node will only schedule the *Recv* operator
      to run when the *Send* operator has run (the *Send* OP will mark
      the *Recv* OP runnable automatically).
-   - *Enueue* enqueues the input variable, it can block until space
+   - *Enqueue* enqueues the input variable; it can block until space
      becomes available in the queue.
    - *Dequeue* outputs a configurable number of tensors from the
-    queue. It will block until the queue have the required number of
+     queue. It will block until the queue has the required number of
      tensors.
 
 
 ### Benefits
 
-- Model parallelism become easier to implement: it's an extension to
+- Model parallelism becomes easier to implement: it is an extension to
   the trainer - parameter server approach. We can have several "Transpilers"
   to achieve different goals.
 - User-defined optimizer is easier to add - user can now express it as
```
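The Send/Recv scheduling rule in this hunk (the *Recv* OP only becomes runnable after the matching *Send* OP has run) can be modeled as below. A `threading.Event` stands in for the scheduler's "runnable" mark; the `Channel` class is a hypothetical illustration, not Fluid internals.

```python
import threading

# Toy model of the Send/Recv pair: Recv blocks until Send has run and
# marked it runnable. Illustrative only; not the actual Fluid operators.
class Channel:
    def __init__(self):
        self._ready = threading.Event()
        self._payload = None

    def send(self, tensor):
        """The Send side: store the data and mark the Recv OP runnable."""
        self._payload = tensor
        self._ready.set()

    def recv(self, timeout=None):
        """The Recv side: block until Send has run, then return the data."""
        if not self._ready.wait(timeout):
            raise TimeoutError("the Send OP has not run yet")
        return self._payload

ch = Channel()
threading.Thread(target=ch.send, args=([1.0, 2.0],)).start()
result = ch.recv(timeout=5)  # returns [1.0, 2.0] once send has run
```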
```diff
@@ -72,22 +72,22 @@ After converting:
 
 ### Challenges
 
-- It's important to balance the parameter shards of on multiple
-  parameter server. If a single parameter is very big (some
+- It is important to balance the parameter shards across multiple
+  parameter servers. If a single parameter is very big (for example, some
   word-embedding, fully connected, softmax layer), we need to
   automatically partition the single parameter onto different
   parameter servers when possible (only element-wise optimizers depend
   on the parameter variable).
-- In the "Aync SGD" figure, the "W" variable on the parameter server
-  could be read and wrote concurrently. See
+- In the "Async SGD" figure, the "W" variable on the parameter server
+  could be read and written concurrently. See
   [here](https://github.com/PaddlePaddle/Paddle/pull/6394) for more
-  details about concurrent program in fluid.
+  details about concurrent programs in Fluid.
 
 ### Discussion
 
 - Can the Enqueue OP be implemented under our current tensor design
-  (puts the input tensor into the queue tensor)?
-- *Dequeue* OP will have variable numbers of output (depends on the
+  (put the input tensor into the queue tensor)?
+- The *Dequeue* OP will have a variable number of outputs (depending on the
   `min_count` attribute); does our current design support it? (similar
   question for the *Add* OP)
```
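The *Enqueue*/*Dequeue* semantics discussed above (Enqueue blocks until space is available, Dequeue blocks until at least `min_count` tensors are queued) can be sketched as below. `TensorQueue` is a hypothetical model built on `threading.Condition`, not Fluid's actual queue implementation.

```python
import threading
from collections import deque

# Toy model of the Enqueue/Dequeue OPs: enqueue blocks when the queue is
# full, dequeue blocks until `min_count` tensors are available.
class TensorQueue:
    def __init__(self, capacity):
        self._buf = deque()
        self._capacity = capacity
        self._cond = threading.Condition()

    def enqueue(self, tensor):
        with self._cond:
            # Block until space becomes available in the queue.
            self._cond.wait_for(lambda: len(self._buf) < self._capacity)
            self._buf.append(tensor)
            self._cond.notify_all()

    def dequeue(self, min_count):
        with self._cond:
            # Block until the queue has the required number of tensors.
            self._cond.wait_for(lambda: len(self._buf) >= min_count)
            out = [self._buf.popleft() for _ in range(min_count)]
            self._cond.notify_all()
            return out

q = TensorQueue(capacity=4)
for t in ("g0", "g1", "g2"):
    q.enqueue(t)
batch = q.dequeue(min_count=2)
print(batch)  # ['g0', 'g1']
```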
