Commit 2cc2fb4

Author: Yancey

add sparse update section in fluid dist doc (#8997)

* add sparse update section in fluid dist doc
* update by comment
* update
* update by comment

1 parent f8b8f6c · commit 2cc2fb4

22 files changed: +12 -1 lines changed
File renamed without changes.
File renamed without changes.

doc/design/dist_refactor/parameter_server.md renamed to doc/design/fluid_dist/parameter_server.md

Lines changed: 12 additions & 1 deletion
@@ -59,6 +59,17 @@ After converting:
 queue. It will block until the queue has the required number of
 tensors.
 
+### Sparse Update
+
+For embedding layers, the gradient may have many rows containing only zeros during training.
+If such a gradient is stored in a dense tensor for parameter optimization,
+it wastes memory, slows down the computation, and wastes
+bandwidth in distributed training.
+In Fluid, we introduce [SelectedRows](../selected_rows.md) to represent the list of rows that
+contain non-zero gradient data, so when we do parameter optimization, both locally and remotely,
+we only need to send those non-zero rows to the optimizer operators:
+
+<img src="src/sparse_update.png" width="700" />
 
 ### Benefits
 
@@ -91,6 +102,6 @@ After converting:
 `min_count` attribute), does our current design support it? (similar
 question for the *Add* OP)
 
+### References
 
-### References:
 [1] [TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf)
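The added section relies on the idea that a sparse gradient can be stored as a list of touched row indices plus the gradient values for just those rows. The sketch below illustrates that idea with plain NumPy; it is not Fluid's actual SelectedRows API, and the helper `sparse_sgd_update`, the row ids, and the toy sizes are illustrative assumptions only.

```python
import numpy as np

# Embedding table with 10,000 rows of 8-dimensional vectors.
param = np.zeros((10000, 8), dtype=np.float32)

# In one mini-batch only a few rows receive non-zero gradients, so instead of
# a dense (10000, 8) gradient tensor we keep the touched row ids and the
# gradient values for just those rows (the idea behind SelectedRows).
rows = np.array([3, 42, 977], dtype=np.int64)             # non-zero row indices
values = np.ones((len(rows), 8), dtype=np.float32) * 0.5  # gradients of those rows

def sparse_sgd_update(param, rows, values, lr=0.01):
    """Apply SGD only to the selected rows of the parameter.

    Assumes `rows` has no duplicates; duplicated ids would need to be
    accumulated first (e.g. with np.add.at) before the update.
    """
    param[rows] -= lr * values

sparse_sgd_update(param, rows, values)

# In distributed training, only (rows, values) needs to be sent to the
# parameter server instead of the full dense gradient tensor.
```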
