Commit 6ece41e

Merge pull request #7213 from JiayiFeng/dev_add_callback_for_backward
Error Clip Design Doc
2 parents 1dad4bb + 8ab59da commit 6ece41e

File tree: 5 files changed (+157, -5 lines)

doc/design/error_clip.md

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
# Error Clip

## Overview

Error clip is widely used in model training to prevent gradient explosion. It applies specific rules to adjust variables' gradients and prevent them from growing too large. With error clip, the values of a gradient are checked before they are taken by the next `grad_op` and are shrunk if necessary.

## Usage

Users are allowed to assign different error clip methods or attributes to different `Variable`s. The error clip attribute can be specified as a parameter of `Variable`'s constructor:

```python
var = framework.Variable(..., error_clip=myErrorClip, ...)
```
The default value of `error_clip` is `None`, which means no error clip is employed. When it is not `None`, it should be an object of a class derived from `BaseErrorClipAttr`. So far, `BaseErrorClipAttr` has only one derived class: `ErrorClipByValue`, whose constructor is:

```python
ErrorClipByValue(max, min=None)
```

`max` and `min` represent the maximal and minimal clipping thresholds, respectively. In the backward pass, every value of `var`'s gradient greater than `max` or less than `min` will be clipped to `max` or `min`, respectively. When `min` is `None`, the minimal threshold is automatically set to `-max`.

So we can enable error clip with thresholds `[-5.0, 5.0]` for variable `var` by:

```python
var = framework.Variable(..., error_clip=ErrorClipByValue(max=5.0), ...)
```
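
To make the clipping rule concrete, the following standalone snippet mimics what the `clip` operator does element-wise. It uses NumPy purely for illustration and is not part of the design.

```python
import numpy as np

# A gradient with some values outside the [-5.0, 5.0] range.
grad = np.array([-7.5, -2.0, 0.3, 4.9, 6.1], dtype="float32")

# Element-wise saturation, as a clip op with min=-5.0, max=5.0 would perform.
clipped = np.clip(grad, -5.0, 5.0)
# clipped is now [-5.0, -2.0, 0.3, 4.9, 5.0]
```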

## Implementation

The `BaseErrorClipAttr` and its derived class `ErrorClipByValue` are defined in *clip.py*.

```python
class BaseErrorClipAttr(object):
    def append_clip_op(self, block, grad_name):
        raise NotImplementedError()


class ErrorClipByValue(BaseErrorClipAttr):
    def __init__(self, max, min=None):
        max = float(max)
        if min is None:
            min = -max
        else:
            min = float(min)
        self.max = max
        self.min = min

    def append_clip_op(self, block, grad_name):
        block.append_op(
            type="clip",
            inputs={"X": grad_name},
            outputs={"Out": grad_name},
            attrs={"min": self.min,
                   "max": self.max})
```

`BaseErrorClipAttr` has one main member function: `append_clip_op(self, block, grad_name)`.

This function creates a `clip_op` and appends it to the end of the given `block`. Because different error clip algorithms require different `clip_op`s, the function is defined as virtual in the base class, and all derived classes must implement their own versions of it. (A hypothetical example of another derived class is sketched below.)
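
For illustration only, another derived class might look like the sketch below. It is not part of this design: the class name `ErrorClipByNorm` is invented here, and it assumes an operator of type `clip_by_norm` with a `max_norm` attribute is available.

```python
class ErrorClipByNorm(BaseErrorClipAttr):
    """Hypothetical: rescale the gradient so its L2 norm does not exceed max_norm."""

    def __init__(self, max_norm):
        self.max_norm = float(max_norm)

    def append_clip_op(self, block, grad_name):
        # Assumes a "clip_by_norm" operator with a "max_norm" attribute exists.
        block.append_op(
            type="clip_by_norm",
            inputs={"X": grad_name},
            outputs={"Out": grad_name},
            attrs={"max_norm": self.max_norm})
```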

These `clip_op`s should be inserted right after the `grad_op`s whose output gradients need to be clipped. That is equivalent to appending the necessary `clip_op`s to the end of the target block every time a new `grad_op` is added:

```python
for op_desc in grad_op_descs:
    new_op_desc = target_block.desc.append_op()
    new_op_desc.copy_from(op_desc)
    callback(block=target_block, context=grad_to_var)
```

Here we employ a callback function to accomplish this. In the `_append_backward_ops_` function, a callback function is invoked each time a `grad_op` is added to the `target_block`. The logic for appending `clip_op`s can be implemented inside that callback.

The callback function for `clip_op` appending is defined in *clip.py*:

```python
def error_clip_callback(block, context):
    # the context is a grad_to_var map
    grad_to_var = context
    op_desc = block.desc.op(block.desc.op_size() - 1)
    for grad_n in filter(lambda n: grad_to_var.has_key(n),
                         op_desc.output_arg_names()):
        fwd_var = block.var_recursive(grad_to_var[grad_n])
        error_clip = getattr(fwd_var, "error_clip", None)
        if error_clip is not None:
            error_clip.append_clip_op(block, grad_n)
```

This function takes a `block` and a `context` (which is actually a grad_to_var map) as inputs. It checks each output of the last `OpDesc` in the `block`. Notice that the last `OpDesc` of the `block` must be a `grad_op`, and its outputs must be some forward variables' gradients. If an output gradient's corresponding forward variable has an `error_clip` attribute, `error_clip_callback` calls that attribute's `append_clip_op` function to append the required `clip_op` to the `block`.
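
To see how the pieces fit together, here is a minimal end-to-end sketch. It is illustrative only and not part of this change; it assumes the `paddle.v2.fluid` Python API of this period, and since the `layers` helpers do not expose the `error_clip` constructor argument directly, the attribute is set on the layer's output variable after creation.

```python
import paddle.v2.fluid as fluid
from paddle.v2.fluid.backward import append_backward
from paddle.v2.fluid.clip import ErrorClipByValue, error_clip_callback

x = fluid.layers.data(name="x", shape=[13], dtype="float32")
y = fluid.layers.data(name="y", shape=[1], dtype="float32")
hidden = fluid.layers.fc(input=x, size=64, act="relu")
# Ask for the gradient of `hidden` to be clipped to [-5.0, 5.0] in the backward pass.
hidden.error_clip = ErrorClipByValue(max=5.0)
pred = fluid.layers.fc(input=hidden, size=1)
loss = fluid.layers.mean(x=fluid.layers.square_error_cost(input=pred, label=y))

# error_clip_callback runs once per appended grad_op and inserts clip_ops where needed.
append_backward(loss, callback=error_clip_callback)
```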

python/paddle/v2/fluid/backward.py

Lines changed: 13 additions & 2 deletions
@@ -188,7 +188,17 @@ def _append_backward_ops_(target,
         grad_to_var(dict)(output argument):
             key(str): grad variable name
             val(str): corresponding forward variable name
+        callback(callable object): a callable object used to decorate new generated grad ops
     """
+    if callback is None:
+
+        def empty_callback(block, context):
+            pass
+
+        callback = empty_callback
+    elif not hasattr(callback, '__call__'):
+        raise ValueError("'callback' must be a callable object.")
+
     # grad_op_descs holds created grad_op, and will be appended to target_block
     grad_op_descs = []
     program = block.program

@@ -226,6 +236,7 @@ def _append_backward_ops_(target,
     for op_desc in grad_op_descs:
         new_op_desc = target_block.desc.append_op()
         new_op_desc.copy_from(op_desc)
+        callback(block=target_block, context=grad_to_var)


 def _append_backward_vars_(block, start_op_idx, grad_to_var, grad_info_map):

@@ -268,7 +279,7 @@ def _append_backward_vars_(block, start_op_idx, grad_to_var, grad_info_map):
         _infer_var_data_type_(arg, block)


-def append_backward(loss, parameter_list=None, no_grad_set=None):
+def append_backward(loss, parameter_list=None, no_grad_set=None, callback=None):
     """
     Append backward part to main_program

@@ -312,7 +323,7 @@ def append_backward(loss, parameter_list=None, no_grad_set=None):
     grad_to_var = dict()

     _append_backward_ops_(loss, root_block, root_block, no_grad_dict,
-                          grad_to_var)
+                          grad_to_var, callback)
     _append_backward_vars_(root_block, fwd_op_num, grad_to_var, grad_info_map)

     program.current_block_idx = current_block_idx
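
For clarity, here is a small sketch of how the new `callback` argument behaves from a caller's point of view. It is hypothetical: `my_callback` is an invented name, and `loss` and the surrounding program are assumed to exist.

```python
appended_grad_ops = []

def my_callback(block, context):
    # `block` is the block the new grad_op was just appended to;
    # `context` is the grad_to_var map.
    op_desc = block.desc.op(block.desc.op_size() - 1)
    appended_grad_ops.append(op_desc.type())

# Invoked once per appended grad_op:
append_backward(loss, callback=my_callback)

# A non-callable callback is rejected up front:
# append_backward(loss, callback="oops")  # ValueError: 'callback' must be a callable object.
```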

python/paddle/v2/fluid/clip.py

Lines changed: 40 additions & 1 deletion
@@ -1,7 +1,46 @@
 import functools
 import layers
+from . import core

-__all__ = ['GradientClipByValue', 'append_gradient_clip_ops']
+__all__ = [
+    'GradientClipByValue', 'append_gradient_clip_ops', 'error_clip_callback'
+]
+
+
+class BaseErrorClipAttr(object):
+    def append_clip_op(self, block, grad_name):
+        raise NotImplementedError()
+
+
+class ErrorClipByValue(BaseErrorClipAttr):
+    def __init__(self, max, min=None):
+        max = float(max)
+        if min is None:
+            min = -max
+        else:
+            min = float(min)
+        self.max = max
+        self.min = min
+
+    def append_clip_op(self, block, grad_name):
+        block.append_op(
+            type="clip",
+            inputs={"X": grad_name},
+            outputs={"Out": grad_name},
+            attrs={"min": self.min,
+                   "max": self.max})
+
+
+def error_clip_callback(block, context):
+    # the context is a grad_to_var map
+    grad_to_var = context
+    op_desc = block.desc.op(block.desc.op_size() - 1)
+    for grad_n in filter(lambda n: grad_to_var.has_key(n),
+                         op_desc.output_arg_names()):
+        fwd_var = block.var_recursive(grad_to_var[grad_n])
+        error_clip = getattr(fwd_var, "error_clip", None)
+        if error_clip is not None:
+            error_clip.append_clip_op(block, grad_n)


 class BaseGradientClipAttr(object):

python/paddle/v2/fluid/framework.py

Lines changed: 14 additions & 0 deletions
@@ -143,9 +143,11 @@ def __init__(self,
                  dtype=None,
                  lod_level=None,
                  persistable=None,
+                 error_clip=None,
                  stop_gradient=False,
                  **kwargs):
         self.block = block
+        self.error_clip = error_clip

         if name is None:
             name = Variable._unique_var_name_()

@@ -622,6 +624,17 @@ def var(self, name):
             raise ValueError("var %s not in this block" % name)
         return v

+    def var_recursive(self, name):
+        if self.has_var(name):
+            return self.var(name)
+        else:
+            if self.idx == 0:
+                raise ValueError("var %s is not in block(%d) nor its parents." %
+                                 (name, self.idx))
+            else:
+                parent_block = self.program.block(self.parent_idx)
+                return parent_block.var_recursive(name)
+
     def all_parameters(self):
         return list(self.iter_parameters())

@@ -740,6 +753,7 @@ def copy_param_info_from(self, other):
                 optimize_attr=p.optimize_attr,
                 regularizer=p.regularizer,
                 clip_attr=p.clip_attr,
+                error_clip=p.error_clip,
                 name=v.name)
             self.vars[new_p.name] = new_p
python/paddle/v2/fluid/optimizer.py

Lines changed: 3 additions & 2 deletions
@@ -6,7 +6,7 @@
 from initializer import Constant
 from layer_helper import LayerHelper
 from regularizer import append_regularization_ops
-from clip import append_gradient_clip_ops
+from clip import append_gradient_clip_ops, error_clip_callback

 __all__ = ['SGD', 'Momentum', 'Adagrad', 'Adam', 'Adamax', 'DecayedAdagrad']


@@ -197,7 +197,8 @@ def minimize(self,
         This method combines interface `append_backward()` and
         `create_optimization_pass()` into one.
         """
-        params_grads = append_backward(loss, parameter_list, no_grad_set)
+        params_grads = append_backward(loss, parameter_list, no_grad_set,
+                                       error_clip_callback)

         params_grads = append_gradient_clip_ops(params_grads)
