
Commit bd7b669

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into dev_backward_for_op_desc
2 parents: 4b07686 + 4c96008

121 files changed: +1562 -270 lines

cmake/configure.cmake

Lines changed: 2 additions & 1 deletion
```diff
@@ -49,11 +49,12 @@ if(NOT WITH_GOLANG)
 endif(NOT WITH_GOLANG)
 
 if(NOT WITH_GPU)
-    add_definitions(-DPADDLE_ONLY_CPU)
     add_definitions(-DHPPL_STUB_FUNC)
 
     list(APPEND CMAKE_CXX_SOURCE_FILE_EXTENSIONS cu)
 else()
+    add_definitions(-DPADDLE_WITH_CUDA)
+
     FIND_PACKAGE(CUDA REQUIRED)
 
     if(${CUDA_VERSION_MAJOR} VERSION_LESS 7)
```
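The rename flips the CUDA guard from a negative macro (`PADDLE_ONLY_CPU`, defined when CUDA is absent) to a positive one (`PADDLE_WITH_CUDA`, defined when CUDA is present), so `#ifndef PADDLE_ONLY_CPU` becomes `#ifdef PADDLE_WITH_CUDA` and vice versa throughout the C++ diffs below. A minimal sketch of the resulting guard pattern; the `copy_row` helper is hypothetical, not code from this commit:

```cpp
// Sketch of the guard pattern after this commit: CUDA-only paths are now
// selected by the presence of PADDLE_WITH_CUDA rather than the absence of
// PADDLE_ONLY_CPU.
#include <algorithm>
#include <cstddef>
#ifdef PADDLE_WITH_CUDA
#include <cuda_runtime.h>
#endif

void copy_row(float* dst, const float* src, size_t width) {
#ifdef PADDLE_WITH_CUDA
  // GPU build: dst may live in device memory, so go through the CUDA
  // runtime (paddle_matrix_set_row below does the same via hl_memcpy).
  cudaMemcpy(dst, src, sizeof(float) * width, cudaMemcpyHostToDevice);
#else
  // CPU-only build: a plain element-wise copy.
  std::copy(src, src + width, dst);
#endif
}
```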

doc/design/python_api.md

Lines changed: 8 additions & 8 deletions
````diff
@@ -15,9 +15,9 @@ Please be aware that these Python classes need to maintain some construction-tim
 
 ### Program
 
-A `ProgramDesc` describes a [DL program](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), which is composed of an array of `BlockDesc`s. A `BlockDesc` refers to its parent block by its index in the array. For example, operators in the step block of an RNN operator needs to be able to access variables in its ancessor blocks.
+A `ProgramDesc` describes a [DL program](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), which is composed of an array of `BlockDesc`s. The `BlockDesc`s in a `ProgramDesc` can have a tree-like hierarchical structure. However, the `ProgramDesc` only stores a flattened array of `BlockDesc`s. A `BlockDesc` refers to its parent block by its index in the array. For example, operators in the step block of an RNN operator need to be able to access variables in its ancestor blocks.
 
-Whenever we create a block, we need set its parent block to the current block, so the Python class `Program` needs to maintain a data member `current_block`.
+Whenever we create a block, we need to set its parent block to the current block, hence the Python class `Program` needs to maintain a data member `current_block`.
 
 ```python
 class Program(objects):
@@ -81,13 +81,13 @@ class Block(objects):
         self.ops.prepend(Operator(self, ...))
 ```
 
-`create_parameter` is necessary because parameters are global variables, those defined in the global block, but can be created in some sub-blocks, e.g., an FC layer in the step block of an RNN operator.
+`create_parameter` is necessary because parameters are global variables, defined in the global block, but can be created in some sub-blocks. For example, an FC layer in the step block of an RNN operator.
 
-`prepand_operator` is necessary because the constructor of `Parameter` needs to create the initialize (or load) operator of the parameter, and would like to put it in the *preamble* of the global block.
+`prepend_operator` is necessary because the constructor of `Parameter` needs to create the initialize (or load) operator of the parameter, and would like to put it in the *preamble* of the global block.
 
 ### Operator
 
-The `Operator` class fills in the `OpDesc` message and calls the C++ function `InferShape` to infer output shape from input shape.
+The `Operator` class fills in the `OpDesc` message and calls the C++ function `InferShape` to infer the output shapes from the input shapes.
 
 ```python
 class Operator(object):
@@ -105,7 +105,7 @@ class Operator(object):
         return self.proto.type()
 ```
 
-`Operator` creates the `OpDesc` message in C++ space, so could it call the `InferShape` function, which is in C++.
+`Operator` creates the `OpDesc` message in C++ space, so that it can call the `InferShape` function, which is in C++.
 
 ### Variable
 
@@ -128,7 +128,7 @@ class Variable(object):
         self.writer = None
 ```
 
-Please be aware of `self.writer`, that tracks operator who creates the variable. It possible that there are more than one operators who write a variable, but in Python space, each writes to a variable is represented by a Variable class. This is guaranteed by the fact that **`core.NewVarDesc` must NOT create a new `VarDesc` message if its name already exists in the specified block**.
+Please be aware of `self.writer`, which tracks the operator that creates the variable. It is possible that more than one operator writes a variable, but in Python space each write to a variable is represented by a Variable class. This is guaranteed by the fact that **`core.NewVarDesc` must NOT create a new `VarDesc` message if its name already exists in the specified block**.
 
 ### Parameter
 
@@ -155,7 +155,7 @@ class Parameter(Variable):
             initialize_op_attrs)
 ```
 
-When users create a parameter, s/he can call
+When users create a parameter, they can call
 
 ```python
 program.create_parameter(
````

doc/design/refactor/session.md

Lines changed: 180 additions & 0 deletions
New file; full contents:

````markdown
# Design Doc: Session

## Abstract

The *session* object encapsulates the environment in which the
computation graph is executed.

We will have a *local* session and a *remote* session; they offer the
same [interface](#interface). The local session encapsulates the local
runtime environment and the remote session encapsulates the cluster
runtime environment.

The local runtime environment contains:

1. handles to the computation devices (i.e., CPU, GPU), and
1. the [scope](../scope.md) which holds all variables.

The remote runtime environment contains:

1. computation devices (i.e., CPU and GPU on node 0, 1) in a cluster,
   and
1. the distributed [scope](../scope.md) in a cluster which holds all
   variables.

The user can create a remote session on Paddle Cloud and evaluate the
computation graph with it. In this way, the user can control the
remote computation resources in a cluster from their local computer.

## Background

The current design has an implicit global session in which
`paddle.eval()` is executed. The pain point is:

Since the user is not able to explicitly switch between runtime
environments, the user cannot run a topology in two independent
environments.

For example, in reinforcement learning, the user may want to have a
stale model for inference and a fresh model for training, and only
replace the stale model with the fresh model periodically.

Furthermore, we have no concept that encapsulates a remote environment
that executes a computation graph.

We need the session object to address the above issues.

## Session

A session is an object that owns the runtime environment. All
computations are executed through `session.eval()`.

### Interface

```python
eval(
    targets,
    feed_dict=None,
)
```

Evaluates the target Operations or Variables in `targets`.

- *targets*: the evaluation targets. Can be a single Operation or
  Variable, or a list with the Operations or Variables as
  elements. The value returned by `eval()` has the same shape as the
  `targets` argument.

  The PaddlePaddle program is represented by
  the [ProgramDesc](../design/program.md); `eval()` will infer the
  ProgramDesc from the given targets and run the PaddlePaddle
  program. Please
  see
  [this graph](./distributed_architecture.md#local-training-architecture) for
  a detailed illustration of the local session
  and
  [this graph](./distributed_architecture.md#distributed-training-architecture) for
  a detailed illustration of the remote session.

- *feed_dict*: a dictionary that contains the tensors which override
  the edges of the computation graph.

  `feed_dict` not only provides the input data, it can override any
  OP's input as well:

  ```python
  a = pd.constant(2.0, name="a")
  b = pd.variable(name="b")
  c = pd.mul(a,b)
  sess.eval(targets=c, feed_dict={"b":3.0}) # returns 6.0
  ```

```python
close()
```

Closes the session and releases the scope that the session owns.

### Create a Local Session

```python
session(
    devices=None
)
```

Creates a new session. One session owns one global scope, so creating
multiple sessions will create different scopes.

- *devices*: a single `string` or a list of `string`s of device names;
  the corresponding devices will be the computation devices for
  `eval()`. If not specified, all available devices (e.g., all GPUs)
  will be used. The user doesn't need to specify the CPU device since
  it will always be used. Multiple sessions can use the same device.

#### Example

```Python
a = paddle.constant(1.0)
b = paddle.constant(2.0)
c = a + b
sess = paddle.session(devices=["gpu:0", "gpu:1", "fpga:0"])
sess.eval(c)
sess.close()
```

### Create a Remote Session

```python
create_cloud_job(
    name,
    num_trainer,
    mem_per_trainer,
    gpu_per_trainer,
    cpu_per_trainer,
    num_ps,
    mem_per_ps,
    cpu_per_ps,
)
```

Creates a Paddle Cloud job. Fails if the job name exists.

```python
get_cloud_job(
    name
)
```

Gets a Paddle Cloud job.

```python
remote_session(
    job
)
```

- *job*: the Paddle Cloud job.

#### Example

```Python
reader = paddle.reader.recordio("/pfs/home/peter/mnist-train-*") # data stored on Paddle Cloud
image = reader.column(0)
label = reader.column(1)
fc1 = paddle.op.fc(image, size=256, act="sigmoid")
fc2 = paddle.op.fc(fc1, size=10, act="softmax")
cost = paddle.op.cross_entropy(fc2, label)
opt = paddle.optimizer.sgd(cost)

job = paddle.create_cloud_job("test", 3, "1G", 1, 1, 2, "1G", 1)
sess = paddle.remote_session(job)
for i in range(1000):
    sess.eval(opt)
sess.close()
```
````

paddle/api/Util.cpp

Lines changed: 1 addition & 1 deletion
```diff
@@ -47,7 +47,7 @@ bool isUsingGpu() { return FLAGS_use_gpu; }
 void setUseGpu(bool useGpu) { FLAGS_use_gpu = useGpu; }
 
 bool isGpuVersion() {
-#ifdef PADDLE_ONLY_CPU
+#ifndef PADDLE_WITH_CUDA
   return false;
 #else
   return true;
```

paddle/capi/Matrix.cpp

Lines changed: 1 addition & 1 deletion
```diff
@@ -46,7 +46,7 @@ paddle_error paddle_matrix_set_row(paddle_matrix mat,
   if (rowID >= ptr->mat->getHeight()) return kPD_OUT_OF_RANGE;
   paddle::real* buf = ptr->mat->getRowBuf(rowID);
   size_t width = ptr->mat->getWidth();
-#ifndef PADDLE_ONLY_CPU
+#ifdef PADDLE_WITH_CUDA
   hl_memcpy(buf, rowArray, sizeof(paddle::real) * width);
 #else
   std::copy(rowArray, rowArray + width, buf);
```

paddle/framework/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -23,7 +23,7 @@ cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc
 cc_library(op_proto_maker SRCS op_proto_maker.cc DEPS framework_proto attribute)
 cc_test(op_proto_maker_test SRCS op_proto_maker_test.cc DEPS op_proto_maker)
 cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto proto_desc)
-cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope)
+cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope proto_desc)
 cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry)
 
 cc_library(op_registry SRCS op_registry.cc DEPS op_proto_maker op_info operator)
```

paddle/framework/block_desc.cc

Lines changed: 4 additions & 0 deletions
```diff
@@ -34,6 +34,10 @@ VarDescBind *BlockDescBind::Var(const std::string &name) const {
   return it->second.get();
 }
 
+bool BlockDescBind::HasVar(const std::string &name) const {
+  return vars_.find(name) != vars_.end();
+}
+
 std::vector<VarDescBind *> BlockDescBind::AllVars() const {
   std::vector<VarDescBind *> res;
   for (const auto &p : vars_) {
```

paddle/framework/block_desc.h

Lines changed: 2 additions & 0 deletions
```diff
@@ -51,6 +51,8 @@ class BlockDescBind {
 
   VarDescBind *Var(const std::string &name_bytes) const;
 
+  bool HasVar(const std::string &var_name) const;
+
   std::vector<VarDescBind *> AllVars() const;
 
   BlockDescBind *ParentBlock() const;
```
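Together with the definition in `block_desc.cc` above, this gives callers an existence check before looking a variable up. A small usage sketch, assuming a valid `BlockDescBind*` obtained elsewhere; the variable name is made up:

```cpp
#include "paddle/framework/block_desc.h"

// Hypothetical caller: HasVar probes only this block's own vars_ map
// (see block_desc.cc above), so ancestor blocks are not consulted.
void InspectWeight(paddle::framework::BlockDescBind* block) {
  if (block->HasVar("fc_0.w")) {
    paddle::framework::VarDescBind* var = block->Var("fc_0.w");
    (void)var;  // ... inspect or modify the variable's description ...
  }
}
```

This mirrors the rule from `doc/design/python_api.md` above that `core.NewVarDesc` must not create a new `VarDesc` when the name already exists in the block.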

paddle/framework/framework.proto

Lines changed: 1 addition & 0 deletions
```diff
@@ -105,6 +105,7 @@ message LoDTensorDesc {
 message VarDesc {
   required string name = 1;
   optional LoDTensorDesc lod_tensor = 2;
+  optional bool persistable = 3 [ default = false ];
 }
 
 message BlockDesc {
```
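The new `persistable` flag marks variables, typically parameters, that should outlive a single execution. A sketch of setting it through the proto2-generated C++ API; the accessors follow mechanically from the message definition, and the variable name is illustrative:

```cpp
#include "paddle/framework/framework.pb.h"  // generated from framework.proto

// Build a VarDesc for a (made-up) parameter and mark it persistable.
// The field defaults to false, so previously serialized VarDescs keep
// their old meaning.
paddle::framework::VarDesc MakeParamDesc() {
  paddle::framework::VarDesc var;
  var.set_name("fc_0.w");
  var.set_persistable(true);
  return var;
}
```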

paddle/framework/lod_tensor.h

Lines changed: 2 additions & 2 deletions
```diff
@@ -15,7 +15,7 @@
 #pragma once
 
 #include <memory>
-#ifndef PADDLE_ONLY_CPU
+#ifdef PADDLE_WITH_CUDA
 #include <thrust/device_vector.h>
 #include <thrust/host_vector.h>
 #include <thrust/system/cuda/experimental/pinned_allocator.h>
@@ -29,7 +29,7 @@
 namespace paddle {
 namespace framework {
 
-#ifdef PADDLE_ONLY_CPU
+#ifndef PADDLE_WITH_CUDA
 template <typename T>
 using Vector = std::vector<T>;
 #else
```
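With the flipped guard, a build without `PADDLE_WITH_CUDA` resolves `Vector<T>` to plain `std::vector<T>`, while CUDA builds take the `#else` branch (not shown in this hunk), which supplies a thrust-backed vector. A CPU-only sketch; the offsets are made-up LoD data:

```cpp
#include "paddle/framework/lod_tensor.h"

// On a CPU-only build, Vector<size_t> is exactly std::vector<size_t>,
// so LoD bookkeeping compiles against the standard container.
paddle::framework::Vector<size_t> MakeLevel() {
  // Offsets of three sequences: [0, 2), [2, 5), [5, 9).
  return {0, 2, 5, 9};
}
```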
