
Commit f2a32dd

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_im2seq
2 parents 1234b8b + 95853fc

6 files changed: +36 −33 lines

doc/design/dist_refactor/parameter_server.md

Lines changed: 20 additions & 20 deletions
@@ -9,16 +9,16 @@ different purposes.
 
 ## Background
 
-The previous implementations of the parameter server does not run a
+The previous implementations of the parameter server do not run a
 fluid sub-program. Parameter initialization, optimizer computation, network
 communication and checkpointing are implemented twice on both the
-trainer and the parameter server.
+trainer as well as the parameter server.
 
-It would be great if we can write code once and use them on both the
-trainer and the parameter server: reduces code duplication and
-improves extensibility. Given that after the current refactor, we are
-representing everything as a computing graph on the
-trainer. Representing everything as a computing graph on the parameter
+It would be great if we can write code once and use them on both: the
+trainer and the parameter server, since this reduces code duplication and
+improves extensibility. Given that after the current refactoring, we are
+representing everything as a computation graph on the
+trainer. Representing everything as a computation graph on the parameter
 server becomes a natural extension.
 
 ## Design
@@ -30,9 +30,9 @@ into sub-programs to be scheduled on different nodes with the following
 steps:
 
 1. OP placement: the OPs will be placed on different nodes according
-to heuristic that minimizes estimated total computation
+to a heuristic that minimizes the estimated total computation
 time. Currently we will use a simple heuristic that puts parameter
-varable on parameter server workers and everything else on trainer
+variable on parameter server workers and everything else on trainer
 workers.
 1. Add communication OPs to enable the communication between nodes.
 
@@ -47,22 +47,22 @@ After converting:
 
 <img src="src/dist-graph.png" width="700"/>
 
-1. The parameter variable W and it's optimizer program are placed on the parameter server.
+1. The parameter variable W and its optimizer program are placed on the parameter server.
 1. Operators are added to the program.
 - *Send* sends data to the connected *Recv* operator. The
 scheduler on the receive node will only schedule *Recv* operator
 to run when the *Send* operator has ran (the *Send* OP will mark
 the *Recv* OP runnable automatically).
-- *Enueue* enqueues the input variable, it can block until space
+- *Enqueue* enqueues the input variable, it can block until space
 become available in the queue.
 - *Dequeue* outputs configurable numbers of tensors from the
-queue. It will block until the queue have the required number of
+queue. It will block until the queue has the required number of
 tensors.
 
 
 ### Benefits
 
-- Model parallelism become easier to implement: it's an extension to
+- Model parallelism becomes easier to implement: it is an extension to
 the trainer - parameter server approach. We can have several "Transpilers"
 to achieve different goals.
 - User-defined optimizer is easier to add - user can now express it as
@@ -72,22 +72,22 @@ After converting:
 
 ### Challenges
 
-- It's important to balance the parameter shards of on multiple
-parameter server. If a single parameter is very big (some
+- It is important to balance the parameter shards on multiple
+parameter servers. If a single parameter is very big (for example: some
 word-embedding, fully connected, softmax layer), we need to
 automatically partition the single parameter onto different
 parameter servers when possible (only element-wise optimizer depends
 on the parameter variable).
-- In the "Aync SGD" figure, the "W" variable on the parameter server
-could be read and wrote concurrently. See
+- In the "Async SGD" figure, the "W" variable on the parameter server
+could be read and written concurrently. See
 [here](https://github.com/PaddlePaddle/Paddle/pull/6394) for more
-details about concurrent program in fluid.
+details about concurrent program in Fluid.
 
 ### Discussion
 
 - Can the Enqueue OP be implemented under our current tensor design
-(puts the input tensor into the queue tensor)?
-- *Dequeue* OP will have variable numbers of output (depends on the
+(put the input tensor into the queue tensor)?
+- *Dequeue* OP will have variable numbers of output (depending on the
 `min_count` attribute), does our current design support it? (similar
 question for the *Add* OP)
 
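The design doc changes above describe *Enqueue* blocking until space becomes available and *Dequeue* blocking until the queue holds the required number of tensors. The standalone Python sketch below is not Paddle/Fluid code; the `BlockingTensorQueue` class and the `min_count` argument are hypothetical names used only to make that blocking contract concrete.

```python
import threading
from collections import deque


class BlockingTensorQueue:
    """Illustration of the blocking semantics described in the design doc."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._cond = threading.Condition()

    def enqueue(self, tensor):
        with self._cond:
            # Block until space becomes available in the queue.
            while len(self._items) >= self._capacity:
                self._cond.wait()
            self._items.append(tensor)
            self._cond.notify_all()

    def dequeue(self, min_count):
        with self._cond:
            # Block until the queue has the required number of tensors.
            while len(self._items) < min_count:
                self._cond.wait()
            batch = [self._items.popleft() for _ in range(min_count)]
            self._cond.notify_all()
            return batch


if __name__ == "__main__":
    q = BlockingTensorQueue(capacity=4)
    producer = threading.Thread(target=lambda: [q.enqueue(i) for i in range(8)])
    producer.start()
    print(q.dequeue(min_count=3))  # [0, 1, 2]
    print(q.dequeue(min_count=3))  # [3, 4, 5]
    producer.join()
```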

doc/howto/optimization/cpu_profiling.md

Lines changed: 1 addition & 2 deletions
@@ -60,8 +60,7 @@ each column is as follows:
 | column | meaning |
 | --- | --- |
 | ncalls | the number of calls into a function |
-| tottime | the total execution time of the function, not including the
-execution time of other functions called by the function |
+| tottime | the total execution time of the function, not including the execution time of other functions called by the function |
 | percall | tottime divided by ncalls |
 | cumtime | the total execution time of the function, including the execution time of other functions being called |
 | percall | cumtime divided by ncalls |
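The table fixed above documents the columns of a Python `cProfile`/`pstats` report. The self-contained example below is not taken from the Paddle docs (`slow_square` and `main` are made-up names); it simply produces a report with exactly those columns using the standard library, which can help when reading the output described in this guide.

```python
import cProfile
import io
import pstats


def slow_square(n):
    # Deliberately wasteful so the profile has something to show.
    return sum(i * i for i in range(n))


def main():
    for _ in range(50):
        slow_square(10000)


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# The printed report has the columns described in the table above:
# ncalls, tottime, percall, cumtime, percall, filename:lineno(function).
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```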

paddle/gserver/layers/PriorBox.cpp

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ bool PriorBoxLayer::init(const LayerMap& layerMap,
   if (maxSize_.size() > 0) CHECK_EQ(minSize_.size(), maxSize_.size());
 
   // flip aspect ratios
-  for (int index = 0; index < tmp.size(); index++) {
+  for (unsigned index = 0; index < tmp.size(); index++) {
     real ar = tmp[index];
     if (fabs(ar - 1.) < 1e-6) continue;
     aspectRatio_.push_back(ar);

paddle/operators/ctc_align_op.h

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ class CTCAlignKernel : public framework::OpKernel<T> {
       T prev_token = -1;
       for (size_t i = input_lod[level][seq_idx];
            i < input_lod[level][seq_idx + 1]; ++i) {
-        if (input_data[i] != blank &&
+        if ((unsigned)input_data[i] != blank &&
             !(merge_repeated && input_data[i] == prev_token)) {
           output_data[output_idx] = input_data[i];
           ++output_idx;

paddle/operators/sequence_reshape_op.h

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ class SequenceReshapeKernel : public framework::OpKernel<T> {
     PADDLE_ENFORCE_EQ(in_lod.size(), 1UL,
                       "Only support one level sequence now.");
     PADDLE_ENFORCE_EQ(
-        in_dims[0], in_lod[0].back(),
+        (uint64_t)in_dims[0], in_lod[0].back(),
         "Inconsistent size between X.shape[0] and X.lod()[0].back().");
 
     auto in_lod_l0 = in_lod[0];

python/paddle/v2/image.py

Lines changed: 12 additions & 8 deletions
@@ -176,7 +176,6 @@ def resize_short(im, size):
     :param size: the shorter edge size of image after resizing.
     :type size: int
     """
-    assert im.shape[-1] == 1 or im.shape[-1] == 3
     h, w = im.shape[:2]
     h_new, w_new = size, size
     if h > w:
@@ -267,7 +266,7 @@ def random_crop(im, size, is_color=True):
     return im
 
 
-def left_right_flip(im):
+def left_right_flip(im, is_color=True):
     """
     Flip an image along the horizontal direction.
     Return the flipped image.
@@ -278,13 +277,15 @@ def left_right_flip(im):
 
         im = left_right_flip(im)
 
-    :paam im: input image with HWC layout
+    :param im: input image with HWC layout or HW layout for gray image
     :type im: ndarray
+    :param is_color: whether input image is color or not
+    :type is_color: bool
     """
-    if len(im.shape) == 3:
+    if len(im.shape) == 3 and is_color:
         return im[:, ::-1, :]
     else:
-        return im[:, ::-1, :]
+        return im[:, ::-1]
 
 
 def simple_transform(im,
@@ -321,8 +322,9 @@ def simple_transform(im,
     if is_train:
         im = random_crop(im, crop_size, is_color=is_color)
         if np.random.randint(2) == 0:
-            im = left_right_flip(im)
+            im = left_right_flip(im, is_color)
     else:
+        im = center_crop(im, crop_size, is_color)
         im = center_crop(im, crop_size, is_color=is_color)
     if len(im.shape) == 3:
         im = to_chw(im)
@@ -331,8 +333,10 @@ def simple_transform(im,
     if mean is not None:
         mean = np.array(mean, dtype=np.float32)
         # mean value, may be one value per channel
-        if mean.ndim == 1:
+        if mean.ndim == 1 and is_color:
             mean = mean[:, np.newaxis, np.newaxis]
+        elif mean.ndim == 1:
+            mean = mean
         else:
             # elementwise mean
             assert len(mean.shape) == len(im)
@@ -372,6 +376,6 @@ def load_and_transform(filename,
         mean values per channel.
     :type mean: numpy array | list
     """
-    im = load_image(filename)
+    im = load_image(filename, is_color)
     im = simple_transform(im, resize_size, crop_size, is_train, is_color, mean)
     return im
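The image.py changes above make `left_right_flip`, `simple_transform`, and `load_and_transform` handle gray (HW) images as well as color (HWC) ones. The small NumPy snippet below mirrors the new flip logic with made-up array shapes, as a quick sanity check of the two slicing paths (`im[:, ::-1, :]` for HWC color input versus `im[:, ::-1]` otherwise); it is an illustration, not part of the Paddle API.

```python
import numpy as np


def left_right_flip_demo(im, is_color=True):
    # Mirrors the updated logic: 3-D HWC color images keep the channel
    # axis untouched; everything else (e.g. 2-D HW gray images) flips
    # only the width axis.
    if len(im.shape) == 3 and is_color:
        return im[:, ::-1, :]
    else:
        return im[:, ::-1]


color = np.arange(2 * 3 * 3).reshape(2, 3, 3)  # HWC color image
gray = np.arange(2 * 3).reshape(2, 3)          # HW gray image

assert left_right_flip_demo(color).shape == (2, 3, 3)
assert left_right_flip_demo(gray, is_color=False).shape == (2, 3)
assert (left_right_flip_demo(gray, is_color=False) == gray[:, ::-1]).all()
```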
