
Commit 8ab59da

Update doc
1 parent be218bf commit 8ab59da


doc/design/error_clip.md

Lines changed: 36 additions & 49 deletions

## Overview

Error clip is widely used in model training to prevent gradient explosion. It applies specific rules to adjust variables' gradients and keep them from growing too large. With error clip, the values of a gradient are checked before they are taken by the next `grad_op` and shrunk if necessary.

## Usage

Users are allowed to assign different error clip methods or attributes to different `Variable`s. Users can specify it as a parameter of `Variable`'s constructor:

```python
var = framework.Variable(..., error_clip=myErrorClip, ...)
```

The default value of `error_clip` is `None`, which means no error clip is employed. When it's not `None`, it should take an object of a `BaseErrorClipAttr` derived class. So far, `BaseErrorClipAttr` has only one derived class: `ErrorClipByValue`, whose constructor is:

```python
ErrorClipByValue(max, min=None)
```

`max` and `min` represent the maximal and minimal clip thresholds respectively. In the backward pass, all values of `var`'s gradient greater than `max` or less than `min` will be clipped to `max` and `min` respectively. When `min` is `None`, the minimal threshold will be assigned `-max` automatically.

So we can enable error clip with the threshold `[-5.0, 5.0]` for variable `var` by:

```python
var = framework.Variable(..., error_clip=ErrorClipByValue(max=5.0), ...)
```
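
For intuition, the rule that `ErrorClipByValue` describes is a plain element-wise clamp of the gradient into `[min, max]`. The following is a minimal NumPy sketch of that rule only (an illustration with made-up values, not the Fluid API):

```python
import numpy as np

# Hypothetical gradient values produced by some grad_op.
grad = np.array([-7.5, -0.3, 2.0, 9.1])

# ErrorClipByValue(max=5.0) implies min = -5.0, so every element is
# clamped into [-5.0, 5.0] before the next grad_op consumes it.
clipped = np.clip(grad, -5.0, 5.0)
print(clipped)  # [-5.  -0.3  2.   5. ]
```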

## Implementation

The `BaseErrorClipAttr` and its derived class `ErrorClipByValue` are defined in *clip.py*:

```python
class BaseErrorClipAttr(object):
    def append_clip_op(self, block, grad_name):
        raise NotImplementedError()


class ErrorClipByValue(BaseErrorClipAttr):
    def __init__(self, max, min=None):
        max = float(max)
        if min is None:
            min = -max
        else:
            min = float(min)
        self.max = max
        self.min = min

    def append_clip_op(self, block, grad_name):
        block.append_op(
            type="clip",
            inputs={"X": grad_name},
            outputs={"Out": grad_name},
            attrs={"min": self.min,
                   "max": self.max})
```

`BaseErrorClipAttr` has one main member function: `append_clip_op(self, block, grad_name)`.

This function is used to create a `clip_op` and append it to the end of the given `block`. Because different error clip algorithms require different `clip_op`s, the function is defined as virtual in the base class. All derived classes must implement their own versions of this function.
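
To make the extension point concrete, here is a sketch of what another derived class could look like. `ErrorClipByScaling` is a made-up example (it is not part of this design), and it assumes a `scale` op that multiplies its input by a constant factor; it simply mirrors the `block.append_op` pattern used by `ErrorClipByValue` above:

```python
# Hypothetical derived class, for illustration only: it shrinks the
# gradient by a constant factor instead of clipping it by value.
class ErrorClipByScaling(BaseErrorClipAttr):
    def __init__(self, scale):
        self.scale = float(scale)

    def append_clip_op(self, block, grad_name):
        # Appended right after the grad_op that produced grad_name, writing
        # the scaled result back to the same gradient variable.
        block.append_op(
            type="scale",
            inputs={"X": grad_name},
            outputs={"Out": grad_name},
            attrs={"scale": self.scale})
```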

These `clip_op`s should be inserted right after the `grad_op`s whose output gradients need to be clipped. This is equivalent to appending some `clip_op`s to the end of the target block every time a new `grad_op` is added:

```python
for op_desc in grad_op_descs:
    new_op_desc = target_block.desc.append_op()
    new_op_desc.copy_from(op_desc)
    callback(block=target_block, context=grad_to_var)
```

Here we employ a callback function to complete this kind of job. In the `_append_backward_ops_` function, each time a `grad_op` is added to the `target_block`, the callback function is invoked, and the logic of `clip_op` appending can be implemented inside it.

The callback function for `clip_op` appending is defined in *clip.py*:

```python
def error_clip_callback(block, context):
    # the context is a grad_to_var map
    grad_to_var = context
    op_desc = block.desc.op(block.desc.op_size() - 1)
    for grad_n in filter(lambda n: grad_to_var.has_key(n),
                         op_desc.output_arg_names()):
        fwd_var = block.var_recursive(grad_to_var[grad_n])
        error_clip = getattr(fwd_var, "error_clip", None)
        if error_clip is not None:
            error_clip.append_clip_op(block, grad_n)
```

This function takes a `block` and a `context` (which is actually a `grad_to_var` map) as inputs. It checks each output of the last `OpDesc` in the `block`. Notice that the last `OpDesc` of the `block` must be a `grad_op` and its outputs must be some forward variables' gradients. If an output gradient's corresponding forward variable has an `error_clip` attribute, `error_clip_callback` calls that `error_clip`'s `append_clip_op` function to append the required `clip_op` to the `block`.
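
Putting the pieces together, the sketch below shows one way the callback could be wired into the backward-building loop quoted earlier. The wrapper name `append_grad_ops_with_clip` and its argument list are hypothetical; the real `_append_backward_ops_` in *backward.py* does considerably more work:

```python
# Hypothetical driver combining the two snippets above; the function name
# and signature are made up for illustration.
def append_grad_ops_with_clip(target_block, grad_op_descs, grad_to_var,
                              callback=error_clip_callback):
    for op_desc in grad_op_descs:
        # Copy each grad_op into the target block...
        new_op_desc = target_block.desc.append_op()
        new_op_desc.copy_from(op_desc)
        # ...then let the callback append clip_ops for any output gradient
        # whose forward variable carries an error_clip attribute.
        callback(block=target_block, context=grad_to_var)
```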
