This repository was archived by the owner on Jul 10, 2025. It is now read-only.

Commit 57ee7b2: update reviewers and submit time

1 parent: 5f54a30

File tree

5 files changed: +10 -7 lines changed

rfcs/20200409-fuse_recv.md renamed to rfcs/20200411-fuse_recv.md

Lines changed: 10 additions & 7 deletions
```diff
@@ -2,10 +2,10 @@
 
 | Status           | (Proposed / Accepted / Implemented / Obsolete) |
 | :--------------- | :--------------------------------------------- |
-|
 | **Author(s)**    | Tongxuan Liu([email protected]) Peng Tao([email protected]) Langshi Chen ([email protected]) |
-| **Sponsor**      | i |
-| **Updated**      | 2020-04-09 |
+| **Reviewers(s)** | Ayush Dubey([email protected]) Jeroen Bédorf([email protected]) Derek Murray([email protected]) Bairen Yi([email protected]) Paul Tucker([email protected]) |
+| **Sponsor**      | |
+| **Updated**      | 2020-04-11 |
 
 ## Objective
 This RFC proposes a new FuseRecv Op which would receive multiple tensors with
```
```diff
@@ -54,8 +54,8 @@ be 1.5-2x faster in the parameter-server/worker setup.
 
 ## Design Proposal
 
-![Figure 1: Current graph partition strategy](20200409-fuse_recv/current_graph_partition_strategy.png "Current graph partition strategy")
-![Figure 2: Graph partition strategy with FuseRecv](20200409-fuse_recv/graph_partition_strategy_with_fuse_recv.png "Graph partition strategy with FuseRecv")
+![Figure 1: Current graph partition strategy](20200411-fuse_recv/current_graph_partition_strategy.png "Current graph partition strategy")
+![Figure 2: Graph partition strategy with FuseRecv](20200411-fuse_recv/graph_partition_strategy_with_fuse_recv.png "Graph partition strategy with FuseRecv")
 
 In the original Recv/Send design, each Recv node only receives one tensor
 even if there are Recv Ops that output to the same destination Op. Moreover each
```
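The cost contrast in the paragraph above can be sketched with a toy model (the helper names below are illustrative, not the actual TensorFlow runtime API): with per-tensor Recv, N tensors crossing the same device edge cost N RPCs per step, while a FuseRecv-style op carries all of them in one round trip.

```python
# Toy model contrasting the original per-tensor Recv with a fused Recv.
# Illustrative only: these helpers are NOT the TensorFlow runtime API.

def rpcs_per_step_unfused(tensor_names):
    """Original design: each Recv node issues its own RecvTensor RPC."""
    return [("RecvTensor", name) for name in tensor_names]

def rpcs_per_step_fused(tensor_names):
    """FuseRecv: one RPC carries every tensor crossing the same device edge."""
    return [("FuseRecvTensor", tuple(tensor_names))]

names = ["embedding/w", "dense/w", "dense/b"]
assert len(rpcs_per_step_unfused(names)) == 3  # three round trips per step
assert len(rpcs_per_step_fused(names)) == 1    # one round trip for all three
```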
```diff
@@ -82,6 +82,7 @@ Pack the N tensors to be sent into a length-N DT_VARIANT vector.
 
 Pros: Reuse current RPC, avoid potential intricate changes in zero-copy
 response buffer code.
+
 Cons: Introduces memcpy overhead.
 
```
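A minimal sketch of this packing idea, using Python serialization as a stand-in for the length-N DT_VARIANT vector (illustrative only, not the actual TF kernel code): N tensors become one opaque payload that the existing single-tensor RPC path can carry, and the extra serialize/deserialize copy is the memcpy overhead named in the Cons.

```python
import pickle

# Sketch of Solution 1 (illustrative, not the TF kernel code): pack the N
# tensors for one Send into a single variant-like payload so the existing
# single-tensor RPC can carry them, at the price of an extra copy per side.

def pack(tensors):
    # The "length-N DT_VARIANT vector": one opaque blob holding all tensors.
    return pickle.dumps(list(tensors))

def unpack(blob):
    return pickle.loads(blob)

tensors = [[1.0, 2.0], [[3, 4], [5, 6]], "step-42"]  # mixed dtypes and shapes
blob = pack(tensors)           # one payload -> one RPC instead of three
assert unpack(blob) == tensors  # round-trips intact, but each side copied
```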
#### Fuse the tensors into a single Send/Recv Solution 2 (Derek Murray)
```diff
@@ -92,6 +93,7 @@ to reuse some of the graph analysis code
 
 Pros: Reuse current RPC, avoid potential intricate changes in zero-copy
 response buffer code.
+
 Cons: The fused tensors could be of different types and dynamic shapes,
 which couldn't be handled by this solution.
 
```
```diff
@@ -118,14 +120,15 @@ missing.
 
 Pros: Dynamic fusion at runtime seems to get better results, and also brings
 the ability to control the priority of tensors (which Recv is more important).
+
 Cons: A potential bottleneck of this solution is the time window of the ready
 set. It would vary greatly across models, so setting the value manually
 would be hard. This solution is another good candidate for FuseRecv.
 
 ### Performance Implications
 With a wide and deep model, the number of RPC calls per step has been reduced
 by 55%, and the overall training throughput has increased by 40%.
-![Figure 3: performance_result](20200409-fuse_recv/performance_result.png "Performance result")
+![Figure 3: performance_result](20200411-fuse_recv/performance_result.png "Performance result")
 
 ### Dependencies
 * None
```
```diff
@@ -187,7 +190,7 @@ the whole graph, replace Recv ops by FuseRecv ops in the partitioned graphs according
 to its topology while iteratively searching and fusing potential Recv
 operations. See Figure 4 for the formal algorithm definition.
 
-![Figure 4: fuse_recv_procedure](20200409-fuse_recv/fuse_recv_procedure.png "Fuse Recv Procedure")
+![Figure 4: fuse_recv_procedure](20200411-fuse_recv/fuse_recv_procedure.png "Fuse Recv Procedure")
 
 The procedure RECVFUSE takes two input arguments: 1) the TF computation
 graph g, 2) a partitioned graph. It is worth noting that the iteration of
```
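The grouping step of the procedure can be sketched as follows, under an illustrative flat data model for a partitioned graph (not the actual tensorflow::Graph API): Recv nodes that share the same source/destination device pair are collapsed into a single FuseRecv node carrying all of their rendezvous keys.

```python
from collections import defaultdict

# Sketch of the RECVFUSE grouping idea (illustrative data model, not the
# actual tensorflow::Graph API): walk a partition's Recv nodes, group those
# sharing a source/destination device pair, and replace each group with one
# FuseRecv node that carries all the fused tensor keys.

def recv_fuse(partition):
    """partition: list of (op_type, src_device, dst_device, tensor_key)."""
    fused = [op for op in partition if op[0] != "Recv"]
    groups = defaultdict(list)
    for _, src, dst, key in (op for op in partition if op[0] == "Recv"):
        groups[(src, dst)].append(key)
    for (src, dst), keys in groups.items():
        fused.append(("FuseRecv", src, dst, tuple(keys)))
    return fused

partition = [
    ("MatMul", "/job:worker", "/job:worker", None),
    ("Recv", "/job:ps/task:0", "/job:worker", "w1"),
    ("Recv", "/job:ps/task:0", "/job:worker", "w2"),
    ("Recv", "/job:ps/task:1", "/job:worker", "b1"),
]
fused_ops = [op for op in recv_fuse(partition) if op[0] == "FuseRecv"]
assert len(fused_ops) == 2  # two device pairs -> two FuseRecv, not three Recvs
```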
