Commit 2cc2fb4

Author: Yancey

add sparse update section in fluid dist doc (#8997)

* add sparse update section in fluid dist doc
* update by comment
* update
* update by comment

1 parent f8b8f6c · commit 2cc2fb4

22 files changed: +12 -1 lines changed
File renamed without changes.
File renamed without changes.

doc/design/dist_refactor/parameter_server.md renamed to doc/design/fluid_dist/parameter_server.md

Lines changed: 12 additions & 1 deletion
@@ -59,6 +59,17 @@ After converting:
 queue. It will block until the queue has the required number of
 tensors.
 
+### Sparse Update
+
+For embedding layers, the gradient may have many rows containing only zeros during training.
+If such a gradient is stored in a dense tensor for parameter optimization,
+it wastes memory, slows down the computation, and wastes
+bandwidth in distributed training.
+In Fluid, we introduce [SelectedRows](../selected_rows.md) to represent the list of rows that
+contain non-zero gradient data, so when we do parameter optimization, both locally and remotely,
+we only need to send those non-zero rows to the optimizer operators:
+
+<img src="src/sparse_update.png" width="700" />
 
 ### Benefits
 
@@ -91,6 +102,6 @@ After converting:
 `min_count` attribute), does our current design support it? (similar
 question for the *Add* OP)
 
+### References
 
-### References:
 [1] [TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf)
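The added section relies on the idea that a sparse gradient can be stored as a list of touched row indices plus the gradient values for just those rows. The sketch below illustrates that idea with plain NumPy; it is not Fluid's actual SelectedRows API, and the helper `sparse_sgd_update`, the row ids, and the toy sizes are illustrative assumptions only.

```python
import numpy as np

# Embedding table with 10,000 rows of 8-dimensional vectors.
param = np.zeros((10000, 8), dtype=np.float32)

# In one mini-batch only a few rows receive non-zero gradients, so instead of
# a dense (10000, 8) gradient tensor we keep the touched row ids and the
# gradient values for just those rows (the idea behind SelectedRows).
rows = np.array([3, 42, 977], dtype=np.int64)             # non-zero row indices
values = np.ones((len(rows), 8), dtype=np.float32) * 0.5  # gradients of those rows

def sparse_sgd_update(param, rows, values, lr=0.01):
    """Apply SGD only to the selected rows of the parameter.

    Assumes `rows` has no duplicates; duplicated ids would need to be
    accumulated first (e.g. with np.add.at) before the update.
    """
    param[rows] -= lr * values

sparse_sgd_update(param, rows, values)

# In distributed training, only (rows, values) needs to be sent to the
# parameter server instead of the full dense gradient tensor.
```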
