Commit 5513f92

Merge branch 'develop' into expand_test
2 parents 29fe2a0 + 669786b commit 5513f92

265 files changed, +10782 -2397 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -28,3 +28,4 @@ cmake_install.cmake
2828
paddle/.timestamp
2929
python/paddlepaddle.egg-info/
3030
paddle/pybind/pybind.h
31+
python/paddle/v2/framework/tests/tmp/*

CONTRIBUTING.md

Lines changed: 157 additions & 1 deletion
@@ -1 +1,157 @@
1-
./doc/howto/dev/contribute_to_paddle_en.md
1+
# Contribute Code
2+
3+
We sincerely appreciate your contribution. This document explains our workflow and work style.
4+
5+
## Workflow
6+
7+
PaddlePaddle uses this [Git branching model](http://nvie.com/posts/a-successful-git-branching-model/). The following steps guide usual contributions.
8+
9+
1. Fork
10+
11+
Our development community has been growing fast; it doesn't make sense for everyone to write into the official repo. So, please file Pull Requests from your fork. To make a fork, just head over to the GitHub page and click the ["Fork" button](https://help.github.com/articles/fork-a-repo/).
12+
13+
1. Clone
14+
15+
To copy your fork to your local computer, please run
16+
17+
```bash
18+
git clone https://github.com/your-github-account/paddle
19+
cd paddle
20+
```
21+
22+
1. Create the local feature branch
23+
24+
For daily work such as adding a new feature or fixing a bug, please create a feature branch before coding:
25+
26+
```bash
27+
git checkout -b my-cool-stuff
28+
```
29+
30+
1. Commit
31+
32+
Before issuing your first `git commit` command, please install [`pre-commit`](http://pre-commit.com/) by running the following commands:
33+
34+
```bash
35+
pip install pre-commit
36+
pre-commit install
37+
```
38+
39+
Our pre-commit configuration requires clang-format 3.8 for auto-formatting C/C++ code and yapf for Python.
40+
41+
Once installed, `pre-commit` checks the style of code and documentation in every commit. You will see something like the following when you run `git commit`:
42+
43+
```
44+
➜ git commit
45+
CRLF end-lines remover...............................(no files to check)Skipped
46+
yapf.................................................(no files to check)Skipped
47+
Check for added large files..............................................Passed
48+
Check for merge conflicts................................................Passed
49+
Check for broken symlinks................................................Passed
50+
Detect Private Key...................................(no files to check)Skipped
51+
Fix End of Files.....................................(no files to check)Skipped
52+
clang-formater.......................................(no files to check)Skipped
53+
[my-cool-stuff c703c041] add test file
54+
1 file changed, 0 insertions(+), 0 deletions(-)
55+
create mode 100644 233
56+
```
57+
58+
1. Build and test
59+
60+
PaddlePaddle can be built natively on Linux and Mac OS X. However, to unify the build environment and make debugging easier, we recommend [using Docker](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/build_en.md).
61+
62+
1. Keep pulling
63+
64+
An experienced Git user pulls from the official repo often -- daily or even hourly, so they notice conflicts with others' work early, and it's easier to resolve smaller conflicts.
65+
66+
```bash
67+
git remote add upstream https://github.com/PaddlePaddle/Paddle
68+
git pull upstream develop
69+
```
70+
71+
1. Push and file a pull request
72+
73+
You can "push" your local work into your forked repo:
74+
75+
```bash
76+
git push origin my-cool-stuff
77+
```
78+
79+
The push allows you to create a pull request, requesting owners of this [official repo](https://github.com/PaddlePaddle/Paddle) to pull your change into the official one.
80+
81+
To create a pull request, please follow [these steps](https://help.github.com/articles/creating-a-pull-request/).
82+
83+
If your change fixes an issue, please write ["Fixes <issue-URL>"](https://help.github.com/articles/closing-issues-using-keywords/) in the description section of your pull request. GitHub will close the issue when the owners merge your pull request.
84+
85+
Please remember to specify some reviewers for your pull request. If you don't know who the right reviewers are, please follow GitHub's recommendation.
86+
87+
88+
1. Delete local and remote branches
89+
90+
To keep your local workspace and your fork clean, you might want to remove merged branches:
91+
92+
```bash
93+
git push origin :my-cool-stuff
94+
git checkout develop
95+
git pull upstream develop
96+
git branch -d my-cool-stuff
97+
```
98+
99+
### Code Review
100+
101+
- Please feel free to ping your reviewers by sending them the URL of your pull request via IM or email. Please do this after your pull request passes the CI.
102+
103+
- Please respond to every reviewer comment. If you follow the comment, please write "Done"; otherwise, please give a reason.
104+
105+
- If you don't want your reviewers to get overwhelmed by email notifications, you might reply to their comments [in a batch](https://help.github.com/articles/reviewing-proposed-changes-in-a-pull-request/).
106+
107+
- Reduce unnecessary commits. Some developers commit often. We recommend folding a sequence of small changes into one commit by running `git commit --amend` instead of `git commit`.
108+
109+
110+
## Coding Standard
111+
112+
### Code Style
113+
114+
Our C/C++ code follows the [Google style guide](http://google.github.io/styleguide/cppguide.html).
115+
116+
Our Python code follows the [PEP8 style guide](https://www.python.org/dev/peps/pep-0008/).
117+
118+
Our build process helps to check the code style. In [`build.sh`](https://github.com/PaddlePaddle/Paddle/blob/b84e8226514b8bb4405c3c28e54aa5077193d179/paddle/scripts/docker/build.sh#L42), the entry point of our [builder Docker image](https://github.com/PaddlePaddle/Paddle/blob/b84e8226514b8bb4405c3c28e54aa5077193d179/Dockerfile#L88), the CMake argument `WITH_STYLE_CHECK` is set to `ON` by default. This flag is on
119+
120+
Please install pre-commit, which automatically reformats the changes to C/C++ and Python code whenever we run `git commit`. To check the whole codebase, we can run the command `pre-commit run -a`, as in the [`check_style.sh` file](https://github.com/PaddlePaddle/Paddle/blob/b84e8226514b8bb4405c3c28e54aa5077193d179/paddle/scripts/travis/check_style.sh#L30), which is invoked by [our Travis CI configuration](https://github.com/PaddlePaddle/Paddle/blob/b84e8226514b8bb4405c3c28e54aa5077193d179/.travis.yml#L43).
121+
122+
### Unit Tests
123+
124+
Please remember to add related unit tests.
125+
126+
- For C/C++ code, please follow [`google-test` Primer](https://github.com/google/googletest/blob/master/googletest/docs/Primer.md).
127+
128+
- For Python code, please use [Python's standard `unittest` package](http://pythontesting.net/framework/unittest/unittest-introduction/).
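
As a minimal sketch of the Python side, a test case built on `unittest` might look like the following. The `scale` function is a hypothetical stand-in for the code under test, not a PaddlePaddle API.

```python
import unittest


def scale(values, factor):
    """Hypothetical function under test: multiply every element by factor."""
    return [v * factor for v in values]


class TestScale(unittest.TestCase):
    def test_scales_each_element(self):
        # Each input element should be multiplied by the factor.
        self.assertEqual(scale([1, 2, 3], 2), [2, 4, 6])

    def test_empty_input(self):
        # An empty input should produce an empty output.
        self.assertEqual(scale([], 5), [])
```

Saved as, say, `test_scale.py`, the tests can be run with `python -m unittest test_scale`.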
129+
130+
131+
### Writing Logs
132+
133+
We use [glog](https://github.com/google/glog) for logging in our C/C++ code.
134+
135+
For general information, please use `LOG`. For debug information, please use [`VLOG`](http://htmlpreview.github.io/?https://github.com/google/glog/blob/master/doc/glog.html#verbose). The reason is explained [here](https://groups.google.com/a/chromium.org/d/msg/chromium-dev/3NDNd1KzXeY/AZKMMx37fdQJ).
136+
137+
`VLOG` requires a *verbose level* parameter. For example:
138+
139+
```c++
140+
VLOG(3) << "Operator FC is taking " << num_inputs << " inputs.";
141+
```
142+
143+
When we run a PaddlePaddle application or test, we can specify a verbose threshold. For example:
144+
145+
```bash
146+
GLOG_vmodule=buddy_allocator=2 \
147+
GLOG_v=10 \
148+
python \
149+
../python/paddle/v2/framework/tests/test_recurrent_op.py
150+
```
151+
152+
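Here `GLOG_v=10` enables VLOG messages at verbose levels up to 10 in all files, while `GLOG_vmodule=buddy_allocator=2` overrides that threshold for `buddy_allocator.{h,cc}`, where only levels up to 2 are shown. The example VLOG message above, which is at level 3, is therefore displayed as long as it is not emitted from `buddy_allocator.{h,cc}`. Because messages at lower verbose levels pass more thresholds, we output general messages at lower levels so that they are displayed more often. When coding in C++, please follow this verbose level convention: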
153+
154+
- verbose level 1: [framework](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/framework)
155+
- verbose level 3: [operators](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/operators)
156+
- verbose level 5: [memory](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/memory), [platform](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/platform)
157+
- verbose level 7: [math](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/math)
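
To see how a message's verbose level interacts with the thresholds set through `GLOG_v` and `GLOG_vmodule`, the decision can be modeled in a few lines. The sketch below is a simplified illustration, not glog's implementation; the function `vlog_is_on` and the module name `fc_op` are hypothetical.

```python
from fnmatch import fnmatch


def vlog_is_on(module, level, v=0, vmodule=None):
    """Return True if a VLOG(level) message from `module` would be shown.

    `v` models GLOG_v (the global threshold) and `vmodule` models
    GLOG_vmodule as a dict of {module glob pattern: threshold}.
    A matching per-module entry overrides the global threshold.
    """
    threshold = v
    for pattern, module_threshold in (vmodule or {}).items():
        if fnmatch(module, pattern):
            threshold = module_threshold
            break
    return level <= threshold


# Settings from the command above: GLOG_v=10, GLOG_vmodule=buddy_allocator=2.
settings = {"v": 10, "vmodule": {"buddy_allocator": 2}}
shown_in_operator = vlog_is_on("fc_op", 3, **settings)
shown_in_allocator = vlog_is_on("buddy_allocator", 3, **settings)
```

With these settings, a level-3 message from an operator file passes the global threshold, while the same level inside `buddy_allocator` is filtered out by the per-module override.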

doc/design/model_format.md

Lines changed: 20 additions & 20 deletions
@@ -2,35 +2,35 @@
22

33
## Motivation
44

5-
The model is the output of training process. One complete model consists of two parts, namely, the **topology** and the **parameters**. To support industrial deployment, we need to make the model format must be self-completed and do not expose any training source code.
5+
A model is an output of the training process. One complete model consists of two parts, the **topology** and the **parameters**. In order to support industrial deployment, the model format must be self-complete and must not expose any training source code.
66

7-
As a result, In PaddlePaddle, the **topology** represents as a [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/doc/design/program.md), which describes the model structure. The **parameters** contain all the trainable weights in the model, we must support large size parameter, and efficient serialization/deserialization.
7+
As a result, in PaddlePaddle, the **topology** is represented as a [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/doc/design/program.md), which describes the model structure. The **parameters** contain all the trainable weights in the model. We must support large parameters and efficient serialization/deserialization of parameters.
88

99
## Implementation
1010

11-
The topology is saved as a plain text, in detail, a self-contain protobuf file.
11+
The topology is saved as plain text, specifically as a self-contained protobuf file.
1212

13-
The parameters are saved as a binary file. As we all know, the protobuf message has the limits of [64M size](https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.coded_stream#CodedInputStream.SetTotalBytesLimit.details). We do a (benchmark experiment)[https://github.com/PaddlePaddle/Paddle/pull/4610], its result shows protobuf is not fit in this scene.
13+
The parameters are saved as a binary file. A protobuf message has a size limit of [64M](https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.coded_stream#CodedInputStream.SetTotalBytesLimit.details). We have done a [benchmark experiment](https://github.com/PaddlePaddle/Paddle/pull/4610), which shows that protobuf is not suitable for this task.
1414

15-
As a result, we design a particular format for tensor serialization. By default, arbitrary tensor in Paddle is a [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md), and has a description information proto of (LoDTensorDesc)[https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L99]. We save the DescProto as the byte string header, it contains the necessary information, such as the `dims`, the `name` of the tensor, and the `LoD` information in [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/paddle/framework/lod_tensor.md). Tensor stores value in a continuous memory buffer, for speed we dump the raw memory to disk and save it as the byte string content. So, the binary format of one tensor is,
15+
As a result, we design a particular format for tensor serialization. By default, an arbitrary tensor in Paddle is a [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md), described by a [LoDTensorDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L99) proto. We save the DescProto as the byte string header. It contains all the necessary information, such as the `dims` and the `LoD` information in [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/paddle/framework/lod_tensor.md). A tensor stores its values in a contiguous memory buffer. For speed, we dump the raw memory to disk and save it as the byte string content. The binary format of one tensor is as follows.
1616

17-
|HeaderLength|ContentLength|**LoDTensorDesc**|**TensorValue**|
17+
The table below shows a tensor's byte view in detail. Note that all the signed values are written in the little-endian format.
18+
19+
|field name | type | description |
20+
| --- | --- | --- |
21+
| version | uint32_t | Version of saved file. Always 0 now. |
22+
| tensor desc length | uint32_t | TensorDesc(Protobuf message) length in bytes. |
23+
| tensor desc | void* | TensorDesc protobuf binary message |
24+
| tensor data | void* | Tensor's data in binary format. The length of `tensor_data` is decided by `TensorDesc.dims()` and `TensorDesc.data_type()` |
25+
| lod_level | uint64_t | Level of LoD |
26+
| length of lod[0] | uint64_t | [Optional] length of lod[0] in bytes. |
27+
| data of lod[0] | uint64_t* | [Optional] lod[0].data() |
28+
| ... | ... | ... |
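
The layout in the table can be exercised with a short round-trip sketch. It is written in Python with the standard `struct` module purely for illustration: `pack_tensor` and `unpack_tensor` are hypothetical helpers, the TensorDesc is treated as opaque bytes rather than a parsed protobuf message, and the data length is passed in by the caller because deriving it from `TensorDesc.dims()` and `TensorDesc.data_type()` is outside the scope of the sketch. Little-endian encoding is assumed for every integer field.

```python
import struct


def pack_tensor(desc_bytes, data_bytes, lod=()):
    """Serialize one tensor in the field order given by the table."""
    out = bytearray()
    out += struct.pack("<I", 0)                # version: always 0 for now
    out += struct.pack("<I", len(desc_bytes))  # tensor desc length in bytes
    out += desc_bytes                          # opaque TensorDesc bytes
    out += data_bytes                          # raw tensor data
    out += struct.pack("<Q", len(lod))         # lod_level
    for level in lod:
        out += struct.pack("<Q", 8 * len(level))         # level length in bytes
        out += struct.pack("<%dQ" % len(level), *level)  # level offsets
    return bytes(out)


def unpack_tensor(buf, data_nbytes):
    """Parse a buffer produced by pack_tensor; data_nbytes is supplied by the caller."""
    version, desc_len = struct.unpack_from("<II", buf, 0)
    offset = 8
    desc_bytes = buf[offset:offset + desc_len]
    offset += desc_len
    data_bytes = buf[offset:offset + data_nbytes]
    offset += data_nbytes
    (lod_level,) = struct.unpack_from("<Q", buf, offset)
    offset += 8
    lod = []
    for _ in range(lod_level):
        (nbytes,) = struct.unpack_from("<Q", buf, offset)
        offset += 8
        count = nbytes // 8  # each LoD entry is a uint64_t
        lod.append(list(struct.unpack_from("<%dQ" % count, buf, offset)))
        offset += nbytes
    return version, desc_bytes, data_bytes, lod


# Round trip with placeholder descriptor bytes and four float32 values.
desc = b"\x0a\x02hi"
data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
blob = pack_tensor(desc, data, lod=[[0, 2, 4]])
restored = unpack_tensor(blob, data_nbytes=len(data))
```

Keeping the descriptor opaque mirrors the table, where the length prefix alone is enough to skip over the protobuf message.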
1829

19-
In detail, tensor's byte view as the table shows. Note that all the signed value written in little-endian.
2030

21-
```text
22-
[offset] [type] [description]
23-
0004 4 bytes integer HeaderLength, the length of LoDTensorDesc
24-
0008 4 bytes integer ContentLength, the length of LodTensor Buffer
25-
0009 1 bytes char TensorDesc
26-
00010 1 bytes char TensorDesc
27-
...
28-
00100 1 bytes char TensorValue
29-
00101 1 bytes char TensorValue
30-
00102 1 bytes char TensorValue ..
31-
...
32-
```
3331

3432
## Summary
3533

36-
We introduce the model format, the `ProgramDesc` describe the **topology**, and a bunch of particular format binary tensors describes the **parameters**.
34+
- We introduce a model format.
35+
- The model represented by its forward-pass computation procedure is saved in a **ProgramDesc** protobuf message.
36+
- A bunch of specified format binary tensors describe the **parameters**.
