Commit dd8dc0e

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/Enhance_regularizer_py
2 parents 74523c4 + 788c600

File tree: 143 files changed, +3728 −1009 lines


.travis.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -56,7 +56,7 @@ script:
       export DEPLOY_DOCS_SH=https://raw.githubusercontent.com/PaddlePaddle/PaddlePaddle.org/master/scripts/deploy/deploy_docs.sh
       export DOCS_DIR=`pwd`
       cd ..
-      curl $DEPLOY_DOCS_SH | bash -s $CONTENT_DEC_PASSWD $TRAVIS_BRANCH $DOCS_DIR $DOCS_DIR/build/doc/v2
+      curl $DEPLOY_DOCS_SH | bash -s $CONTENT_DEC_PASSWD $TRAVIS_BRANCH $DOCS_DIR $DOCS_DIR/build/doc/
 notifications:
   email:
     on_success: change
```

cmake/generic.cmake

Lines changed: 4 additions & 4 deletions
```diff
@@ -244,11 +244,11 @@ function(cc_test TARGET_NAME)
   cmake_parse_arguments(cc_test "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
   add_executable(${TARGET_NAME} ${cc_test_SRCS})
   # Support linking flags: --whole-archive (Linux) / -force_load (MacOS)
-  target_circle_link_libraries(${TARGET_NAME} ${cc_test_DEPS} paddle_gtest_main paddle_memory gtest gflags)
+  target_circle_link_libraries(${TARGET_NAME} ${cc_test_DEPS} paddle_gtest_main paddle_memory gtest gflags glog)
   if("${cc_test_DEPS}" MATCHES "ARCHIVE_START")
     list(REMOVE_ITEM cc_test_DEPS ARCHIVE_START ARCHIVE_END)
   endif()
-  add_dependencies(${TARGET_NAME} ${cc_test_DEPS} paddle_gtest_main paddle_memory gtest gflags)
+  add_dependencies(${TARGET_NAME} ${cc_test_DEPS} paddle_gtest_main paddle_memory gtest gflags glog)
   add_test(NAME ${TARGET_NAME}
            COMMAND ${TARGET_NAME} ${cc_test_ARGS}
            WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})
@@ -311,8 +311,8 @@ function(nv_test TARGET_NAME)
     set(multiValueArgs SRCS DEPS)
     cmake_parse_arguments(nv_test "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
     cuda_add_executable(${TARGET_NAME} ${nv_test_SRCS})
-    target_link_libraries(${TARGET_NAME} ${nv_test_DEPS} paddle_gtest_main paddle_memory gtest gflags)
-    add_dependencies(${TARGET_NAME} ${nv_test_DEPS} paddle_gtest_main paddle_memory gtest gflags)
+    target_link_libraries(${TARGET_NAME} ${nv_test_DEPS} paddle_gtest_main paddle_memory gtest gflags glog)
+    add_dependencies(${TARGET_NAME} ${nv_test_DEPS} paddle_gtest_main paddle_memory gtest gflags glog)
     add_test(${TARGET_NAME} ${TARGET_NAME})
   endif()
endfunction(nv_test)
```

doc/design/cpp_data_feeding.md

Lines changed: 14 additions & 15 deletions
````diff
@@ -1,17 +1,17 @@
 # C++ Data Feeding
 
-In training with Paddle V2 API, data feeding wholly dependents on Python code. To get rid of the Python environment and achieve the goal of "wrapping the whole training by a while loop op" in Paddle Fluid, a C++ data feeding mechanism is required.
+While using Paddle V2 API for training, data feeding completely depends on the Python code. To get rid of the Python environment and achieve the goal of "wrapping the whole training by a while loop op" in Paddle Fluid, a C++ data feeding mechanism is required.
 
-In this document we show the fundamental design of C++ data feeding process, which includes the data reading, shuffling and batching.
+In this document, we show the fundamental design of a C++ data feeding process, which includes data reading, shuffling and batching.
 
 ## Reader
 
-A new concept named 'Reader' is introduced. `Reader` is a series of inherited classes which can be hold by our `Variable` and they are used to read or process file data.
+In order to handle the above mentioned problem, a new concept called 'Reader' is introduced. `Reader` is a series of inherited classes which can be held by our `Variable` and they are used to read or process file data.
 
 
 ### `ReaderBase`
 
-`ReaderBase` is the abstract base class of all readers. It defines the all readers' interfaces.
+`ReaderBase` is the abstract base class for all readers. It defines the interface for all readers.
 
 ```cpp
 class ReaderBase {
@@ -20,11 +20,10 @@ class ReaderBase {
     PADDLE_ENFORCE(!shapes_.empty());
   }
   // Read the next batch of data. (A 'batch' can be only one instance)
+  // If the next batch doesn't exist, '*out' will be an empty std::vector.
   virtual void ReadNext(std::vector<LoDTensor>* out) = 0;
-  // Show whether the next bacth exists.
-  virtual bool HasNext() const = 0;
 
-  // Reinitialize the reader and read the file from the begin.
+  // Reinitialize the reader and read the file from the beginning.
   virtual void ReInit() = 0;
 
   // Get a certain read in data's shape.
````
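The removed `HasNext()` is replaced above by a convention: `ReadNext()` leaves `*out` empty once the data is exhausted. A minimal, self-contained sketch of that convention follows — `Tensor` is a plain `std::vector<float>` standing in for `LoDTensor`, and `VectorReader` is a hypothetical in-memory reader invented purely for illustration, not Paddle's real code:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for paddle's LoDTensor, for illustration only.
using Tensor = std::vector<float>;

// Sketch of the interface after this change: there is no HasNext();
// exhaustion is signaled by ReadNext() leaving '*out' empty.
class ReaderBase {
 public:
  virtual void ReadNext(std::vector<Tensor>* out) = 0;
  virtual void ReInit() = 0;
  virtual ~ReaderBase() = default;
};

// Hypothetical in-memory "file" reader yielding one instance per call.
class VectorReader : public ReaderBase {
 public:
  explicit VectorReader(std::vector<Tensor> data) : data_(std::move(data)) {}
  void ReadNext(std::vector<Tensor>* out) override {
    out->clear();
    if (pos_ < data_.size()) out->push_back(data_[pos_++]);
  }
  void ReInit() override { pos_ = 0; }

 private:
  std::vector<Tensor> data_;
  std::size_t pos_ = 0;
};
```

A consumer then simply loops `reader.ReadNext(&out)` until `out.empty()` instead of polling a separate `HasNext()`.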
````diff
@@ -43,36 +42,36 @@
 
 ### `FileReader` and `DecoratedReader`
 
-These two classes are derived from the `ReaderBase` and will further be derived by respective specific readers. That is to say, in our design, there are two kinds of readers: file readers and decorated readers. A file reader reads from a file of some specific format, and yield only one instance of data at a time. e.g. RecordIO reader, jpg reader, .... A decorated reader takes another reader(both file reader and decorated reader are OK) as its 'underlying reader'. It gets data from its underlying reader, does some process on them(shuffling, or batching), then yields processed data. The output data of a decorated reader can be a single instance or a batch. `ShuffleReader` and `BatchReader` are both decorated readers.
+These two classes are derived from `ReaderBase` and will further be derived by more specific readers. Thus, in our design, there are two kinds of readers: file readers and decorated readers. A file reader reads from a file of some specific format and yields only one instance of data at a time, e.g. a RecordIO reader or a jpg reader. A decorated reader takes another reader (either a file reader or a decorated reader) as its 'underlying reader'. It gets data from its underlying reader, does some processing on it (shuffling or batching), then yields the processed data. The output data of a decorated reader can be a single instance or a batch. `ShuffleReader` and `BatchReader` are both decorated readers.
 
-All the readers share exactly the same interfaces defined in `ReaderBase`. So they can be decorated for more than one time: We can **shuffle** a reader's outputs and then **batch** the shuffle outputs. The interface consistency also allows related ops use readers without knowing what they are exactly.
+All the readers share exactly the same interface as defined in `ReaderBase`, so they can be decorated more than once: we can **shuffle** a reader's outputs and then **batch** the shuffled outputs. The interface consistency also allows related ops to use readers without knowing exactly what they are.
 
 
 ### `ReaderHolder`
 
-Different readers belong to different class types. It leads to a problem: How can we drop them into `Variable`s and fetch them out by a unified method? For example, if a Variable holds a `BatchReader`, we can not get it by the following code:
+Different readers belong to different class types. This leads to a problem: How can we drop them into `Variable`s and fetch them out by a unified method? For example, if a Variable holds a `BatchReader`, we cannot get it by the following code:
 
 ```cpp
 var->Get<ReaderBase>("batch_reader");
 ```
 
-we have to write:
+We would have to write:
 
 ```cpp
 var->Get<BatchReader>("batch_reader");
 ```
 
-This requires each time getting a reader from a variable we must know the reader's type exactly. It is nearly impossible.
+This requires that, every time we get a reader from a variable, we must know the reader's type exactly. This is nearly impossible.
 
-To solve this problem, we introduce `ReaderHolder` as a wrapper. It acts as an empty decorator of `ReaderBase`, which erases reader's type. With `ReaderHolder` we are able to fetch all types of readers by `var->Get<ReaderHolder>("...")` and regard the obtained object as a reader.
+To solve this problem, we introduce `ReaderHolder` as a wrapper. It acts as an empty decorator of `ReaderBase`, which hides the reader's type. With `ReaderHolder` we are able to fetch all types of readers by `var->Get<ReaderHolder>("...")` and regard the obtained object as a reader.
 
 ## Related Operators
 
-To create and invoke readers, some now ops are introduced:
+To create and invoke readers, some new ops are introduced:
 
 ### `CreateReaderOp`
 
-Each reader has its creating op. File readers' creating ops have no input and yield the created file reader as its output. Decorated readers' creating ops take the underlying readers as inputs and then yield new decorated readers.
+Each reader has its creation op. File readers' creation ops have no input and yield the created file reader as their output. Decorated readers' creation ops take the underlying readers as inputs and then yield new decorated readers.
 
 ### `ReadOp`
 
````
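To make the decorator chain and the type-erasing holder concrete, here is a hedged, self-contained sketch. The toy types (`VectorReader`, the holder's `Reset` method, the batch logic) are invented for illustration under the design described above, not Paddle's actual implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Tensor = std::vector<float>;  // stand-in for LoDTensor

class ReaderBase {
 public:
  virtual void ReadNext(std::vector<Tensor>* out) = 0;
  virtual void ReInit() = 0;
  virtual ~ReaderBase() = default;
};

// Toy "file reader": yields one instance at a time from an in-memory list.
class VectorReader : public ReaderBase {
 public:
  explicit VectorReader(std::vector<Tensor> data) : data_(std::move(data)) {}
  void ReadNext(std::vector<Tensor>* out) override {
    out->clear();
    if (pos_ < data_.size()) out->push_back(data_[pos_++]);
  }
  void ReInit() override { pos_ = 0; }

 private:
  std::vector<Tensor> data_;
  std::size_t pos_ = 0;
};

// Decorated reader: groups its underlying reader's instances into batches.
class BatchReader : public ReaderBase {
 public:
  BatchReader(std::unique_ptr<ReaderBase> underlying, std::size_t batch_size)
      : underlying_(std::move(underlying)), batch_size_(batch_size) {}
  void ReadNext(std::vector<Tensor>* out) override {
    out->clear();
    std::vector<Tensor> one;
    for (std::size_t i = 0; i < batch_size_; ++i) {
      underlying_->ReadNext(&one);
      if (one.empty()) break;  // underlying reader is exhausted
      out->push_back(one.front());
    }
  }
  void ReInit() override { underlying_->ReInit(); }

 private:
  std::unique_ptr<ReaderBase> underlying_;
  std::size_t batch_size_;
};

// "Empty decorator" that erases the concrete reader type, mirroring the
// ReaderHolder idea: callers only ever see the ReaderBase interface.
class ReaderHolder {
 public:
  void Reset(std::unique_ptr<ReaderBase> reader) { reader_ = std::move(reader); }
  void ReadNext(std::vector<Tensor>* out) { reader_->ReadNext(out); }
  void ReInit() { reader_->ReInit(); }

 private:
  std::unique_ptr<ReaderBase> reader_;
};
```

Because `BatchReader` itself implements `ReaderBase`, it could just as well wrap a shuffle decorator, and the holder's users never need to know which concrete reader sits inside.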

doc/design/dist_refactor/distributed_architecture.md renamed to doc/fluid/design/dist_train/distributed_architecture.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
-# Design Doc: Distributed Training Architecture
+# Design Doc: Fluid Distributed Training Architecture
 
 ## Abstract
 
```
File renamed without changes.

doc/design/dist_refactor/parameter_server.md renamed to doc/fluid/design/dist_train/parameter_server.md

Lines changed: 12 additions & 1 deletion
```diff
@@ -59,6 +59,17 @@ After converting:
   queue. It will block until the queue has the required number of
   tensors.
 
+### Sparse Update
+
+For embedding layers, the gradient may have many rows containing only 0 when training.
+If the gradient uses a dense tensor for parameter optimization,
+it can waste memory, slow down the calculations and waste
+bandwidth during distributed training.
+In Fluid, we introduce [SelectedRows](../selected_rows.md) to represent a list of rows containing
+non-zero gradient data. So when we do parameter optimization both locally and remotely,
+we only need to send those non-zero rows to the optimizer operators:
+
+<img src="src/sparse_update.png" width="700" />
 
 ### Benefits
@@ -91,6 +102,6 @@ After converting:
   `min_count` attribute), does our current design support it? (similar
   question for the *Add* OP)
 
+### References
 
-### References:
 [1] [TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45166.pdf)
```
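The `SelectedRows` idea added above can be sketched with a toy structure. The field names and the `ToDense` helper below are illustrative assumptions, not Fluid's actual API: only the non-zero rows and their indices are stored and transmitted, yet a receiver can still reconstruct the dense gradient on demand:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy sketch of the SelectedRows idea: store only the non-zero rows of a
// sparse gradient together with their row indices, instead of a full
// height x width dense tensor.
struct SelectedRows {
  std::vector<std::int64_t> rows;          // indices of non-zero rows
  std::vector<std::vector<float>> values;  // one dense row per stored index
  std::int64_t height;                     // logical dense row count
};

// Scatter the stored rows back into a zero-filled dense matrix, as a
// receiving optimizer operator might if it needed the dense form.
std::vector<std::vector<float>> ToDense(const SelectedRows& s,
                                        std::size_t width) {
  std::vector<std::vector<float>> dense(
      static_cast<std::size_t>(s.height), std::vector<float>(width, 0.0f));
  for (std::size_t i = 0; i < s.rows.size(); ++i) {
    dense[static_cast<std::size_t>(s.rows[i])] = s.values[i];
  }
  return dense;
}
```

For an embedding with millions of rows where a minibatch touches only a handful, sending `rows` plus `values` instead of the whole dense matrix is the bandwidth and memory saving the section describes.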
