# Migrate gelu activation from TensorFlow Addons to TensorFlow Core

| Status                   | Proposed (Waiting for Sponsor)                       |
| :----------------------- | :--------------------------------------------------- |
| **RFC #**                | TBD after PR                                          |
| **Authors**              | Tzu-Wei Sung (@WindQAQ) & Sean Morgan (@seanpmorgan)  |
| **Sponsor**              | TBD                                                   |
| **Updated**              | 2020-05-XX                                            |
| **Sponsorship Deadline** | 2020-07-XX (45 days after submission)                 |

## Rationale for Migration
* [Gaussian Error Linear Units (GELUs)](https://arxiv.org/pdf/1606.08415.pdf) has been cited 600+ times
* Used in BERT and other influential architectures
* Multiple approvals from the TF side:
  * https://github.com/tensorflow/tensorflow/pull/33945#issuecomment-617832325
  * https://github.com/tensorflow/tensorflow/issues/32783#issuecomment-537284266

## Historical Information
* Have there been significant issues reported to Addons that need to be addressed?
  * Only ABI incompatibilities for the custom-op (not an issue if built along with core TF)
* When was it implemented in Addons?
  * C++ custom-op added **2019-08 (TFA 0.5.0)**
  * Python composite op added **2020-02-26 (TFA 0.9.0)**
* We have [performed benchmarking of the GELU activation](https://colab.research.google.com/drive/1rLb4EuydbFg9PbhboXhCDqopcl6BmphG#scrollTo=0GL2x2S4zxW3),
which shows it may be beneficial to retain the custom-ops, but the maintenance burden has grown
too much for us to continue supporting them in Addons.
* This migration is long overdue, but we've struggled with finalizing the migration process.

## Implementation Details
* Links to the implementation in Addons (a brief usage sketch follows this list):
  * Python: [https://github.com/tensorflow/addons/blob/r0.10/tensorflow_addons/activations/gelu.py](https://github.com/tensorflow/addons/blob/r0.10/tensorflow_addons/activations/gelu.py)
  * C++: [https://github.com/tensorflow/addons/blob/r0.10/tensorflow_addons/custom_ops/activations/cc/kernels/gelu_op.h](https://github.com/tensorflow/addons/blob/r0.10/tensorflow_addons/custom_ops/activations/cc/kernels/gelu_op.h)
* Does this include custom-op kernels?
  * Yes, but we are currently proposing to migrate only the Python composite op. This may change with discussion in the RFC.
* Are they CPU/GPU/TPU compatible?
  * CPU/GPU compatible. No support for TPU.
* What is the pytest coverage of the addon?
  * `tensorflow_addons/activations/gelu.py 89%`
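
For reference, a minimal usage sketch of the current Addons op. This assumes the
`tensorflow-addons` package is installed and uses its public `tfa.activations.gelu` API:

```python
import tensorflow as tf
import tensorflow_addons as tfa

x = tf.constant([-1.0, 0.0, 1.0])
print(tfa.activations.gelu(x).numpy())                     # tanh approximation (default)
print(tfa.activations.gelu(x, approximate=False).numpy())  # exact erf-based form
```
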
## Changes to Implementation (If Needed)

```python
# Current Addons composite op (imports added here for completeness).
import math

import tensorflow as tf
from tensorflow_addons.utils import types


def gelu(x: types.TensorLike, approximate: bool = True) -> tf.Tensor:
    x = tf.convert_to_tensor(x)
    if approximate:
        # Tanh-based approximation from the GELU paper.
        pi = tf.cast(math.pi, x.dtype)
        coeff = tf.cast(0.044715, x.dtype)
        return 0.5 * x * (1.0 + tf.tanh(tf.sqrt(2.0 / pi) * (x + coeff * tf.pow(x, 3))))
    else:
        # Exact form based on the Gaussian error function.
        return 0.5 * x * (1.0 + tf.math.erf(x / tf.cast(tf.sqrt(2.0), x.dtype)))
```

The above implementation would only bring over the Python composite op. Since there is
[no way for us to build tfxla kernels](https://github.com/tensorflow/tensorflow/pull/33945#issuecomment-617842977),
we had no support for TPUs in Addons. [There were comments](https://github.com/tensorflow/tensorflow/pull/33945#issuecomment-625380208)
about using a "selector", but we would need guidance on how to implement that.

We may also want to discuss the `approximate` bool and whether it should be included in the
upstream version (a small numerical comparison of the two modes is sketched below).
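
As a rough illustration (not part of the proposal itself), the two `approximate` modes can be
compared numerically with plain TensorFlow ops; the constants below mirror the composite op
shown above:

```python
import math

import tensorflow as tf

# Compare the exact erf-based GELU against the tanh approximation over a typical range.
x = tf.linspace(-4.0, 4.0, 101)
exact = 0.5 * x * (1.0 + tf.math.erf(x / tf.sqrt(2.0)))
approx = 0.5 * x * (1.0 + tf.tanh(tf.sqrt(2.0 / math.pi) * (x + 0.044715 * tf.pow(x, 3))))
print(tf.reduce_max(tf.abs(exact - approx)).numpy())  # small; the two forms track closely
```
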
## Transition Plan
* The activation would land in [nn_ops.py](https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow//python/ops/nn_ops.py) as well as in [Keras advanced_activations](https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow/python/keras/layers/advanced_activations.py)
* No planned changes to the parameter signatures at this time
* Addons would deprecate our activation and make a call to the core functionality (see the sketch after this list).
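
A minimal sketch of what the Addons-side shim could look like once the core op lands. Note that
the core name and signature (`tf.nn.gelu` with an `approximate` argument) are assumptions until
the migration is finalized:

```python
import warnings

import tensorflow as tf


def gelu(x, approximate: bool = True) -> tf.Tensor:
    # Hypothetical deprecation shim: warn and delegate to the (assumed) core op.
    warnings.warn(
        "tfa.activations.gelu is deprecated; use the core TensorFlow GELU instead.",
        DeprecationWarning,
    )
    return tf.nn.gelu(x, approximate=approximate)
```
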
## Relevant GitHub Issues
* https://github.com/tensorflow/tensorflow/pull/33945
* https://github.com/tensorflow/addons/issues/550
* https://github.com/tensorflow/tensorflow/issues/32783

## Questions and Discussion Topics
* Who from the TF core team would sponsor this migration and take ownership of the API?
* Is it worth bringing over the custom-op kernels for CPU/GPU?

## Final Decision
TBD
