# Migrate gelu activation from TensorFlow Addons to TensorFlow Core

| Status                   | Accepted                                                 |
| :----------------------- | :------------------------------------------------------- |
| **RFC #**                | [252](https://github.com/tensorflow/community/pull/252)  |
| **Authors**              | Tzu-Wei Sung (@WindQAQ) & Sean Morgan (@seanpmorgan)     |
| **Sponsor**              | @alextp                                                  |
| **Updated**              | 2020-07-15                                               |
| **Sponsorship Deadline** | 2020-07-17 (45 days after submission)                    |

## Rationale for Migration
* [Gaussian Error Linear Units (GELUs)](https://arxiv.org/pdf/1606.08415.pdf) has been cited 600+ times
* Used in BERT and other influential architectures
* Multiple approvals from the TF side:
  * https://github.com/tensorflow/tensorflow/pull/33945#issuecomment-617832325
  * https://github.com/tensorflow/tensorflow/issues/32783#issuecomment-537284266

## Historical Information
* Have there been significant issues reported to Addons that need to be addressed?
  * Only ABI incompatibilities for the custom-op (not an issue if built along with core TF)
* When was it implemented in Addons?
  * C++ custom-op added **2019-08 (TFA 0.5.0)**
  * Python composite op added **2020-02-26 (TFA 0.9.0)**
* We have [performed benchmarking of the GELU activation](https://colab.research.google.com/drive/1rLb4EuydbFg9PbhboXhCDqopcl6BmphG#scrollTo=0GL2x2S4zxW3),
which shows it may be beneficial to retain the custom-ops, but the maintenance burden has grown
too large for us to continue supporting them in Addons.
* This migration is long overdue, but we have struggled to finalize the migration process.

## Implementation Details
* Link to implementation in Addons:
  * Python: https://github.com/tensorflow/addons/blob/r0.10/tensorflow_addons/activations/gelu.py
  * C++: https://github.com/tensorflow/addons/blob/r0.10/tensorflow_addons/custom_ops/activations/cc/kernels/gelu_op.h
* Does this include custom-op kernels?
  * Yes, but we are currently proposing to migrate only the Python composite op. This may
  change based on discussion in this RFC.
  * Are they CPU/GPU/TPU compatible?
    * CPU/GPU compatible. No support for TPU.
* What is the pytest coverage of the addon?
  * `tensorflow_addons/activations/gelu.py 89%`

## Changes to Implementation (If Needed)
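For reference, the exact form and the tanh approximation (from the GELU paper) that the Addons op implements are:

```
\mathrm{gelu}(x) = x\,\Phi(x)
  = \tfrac{1}{2}\,x\left(1 + \operatorname{erf}\left(x / \sqrt{2}\right)\right)
  \approx \tfrac{1}{2}\,x\left(1 + \tanh\left(\sqrt{2/\pi}\,\bigl(x + 0.044715\,x^{3}\bigr)\right)\right)
```

The Python composite op proposed for migration, with its imports included for completeness: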
```
import math

import tensorflow as tf
from tensorflow_addons.utils import types


def gelu(x: types.TensorLike, approximate: bool = True) -> tf.Tensor:
    x = tf.convert_to_tensor(x)
    if approximate:
        # tanh-based approximation from the GELU paper.
        pi = tf.cast(math.pi, x.dtype)
        coeff = tf.cast(0.044715, x.dtype)
        return 0.5 * x * (1.0 + tf.tanh(tf.sqrt(2.0 / pi) * (x + coeff * tf.pow(x, 3))))
    else:
        # Exact form based on the Gaussian error function.
        return 0.5 * x * (1.0 + tf.math.erf(x / tf.cast(tf.sqrt(2.0), x.dtype)))
```
The above implementation would only bring over the Python composite op. Since there is
[no way for us to build tfxla kernels](https://github.com/tensorflow/tensorflow/pull/33945#issuecomment-617842977),
we had no TPU support in Addons. [There were comments](https://github.com/tensorflow/tensorflow/pull/33945#issuecomment-625380208)
about using a "selector", but we would need guidance on how to implement that.

We may also want to discuss the `approximate` bool and whether it should be included in the
upstream version.
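
For illustration, a minimal sketch of the difference between the two modes, assuming
TensorFlow Addons (>= 0.9) is installed so that `tfa.activations.gelu` is available:

```
import tensorflow as tf
import tensorflow_addons as tfa

x = tf.constant([-2.0, -1.0, 0.0, 1.0, 2.0])

# Exact (erf-based) form vs. the tanh approximation.
exact = tfa.activations.gelu(x, approximate=False)
approx = tfa.activations.gelu(x, approximate=True)

# For inputs of moderate magnitude the two variants differ only on the
# order of 1e-3 or smaller.
print(tf.reduce_max(tf.abs(exact - approx)).numpy())
```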
## Transition Plan
* The activation would land in [nn_ops.py](https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow//python/ops/nn_ops.py), [keras activations](https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow/python/keras/activations.py),
  and possibly in [keras advanced_activations layers](https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow/python/keras/layers/advanced_activations.py)
* No planned changes to the parameter signatures at this time
* Addons would deprecate our activation and delegate to the core functionality (see the sketch after this list).
* After merging to TF Core:
  * Consolidate/remove https://github.com/tensorflow/models/blob/r2.2.0/official/modeling/activations/gelu.py
  * Consolidate/remove https://github.com/tensorflow/models/blob/r2.2.0/official/modeling/activations/gelu_test.py
  * Consolidate/remove https://github.com/tensorflow/models/blob/r2.2.0/official/nlp/xlnet/xlnet_modeling.py#L29
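
A minimal sketch of what the Addons-side deprecation shim could look like, assuming the
core op lands as `tf.nn.gelu` with a compatible signature (the final name and signature
are subject to this RFC):

```
import warnings

import tensorflow as tf
from tensorflow_addons.utils import types


def gelu(x: types.TensorLike, approximate: bool = True) -> tf.Tensor:
    """Deprecated Addons alias that delegates to the core implementation."""
    warnings.warn(
        "tfa.activations.gelu is deprecated; use tf.nn.gelu instead.",
        DeprecationWarning,
    )
    # Assumes the upstream op keeps the `approximate` argument.
    return tf.nn.gelu(x, approximate=approximate)
```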

## Relevant GitHub Issues
* https://github.com/tensorflow/tensorflow/pull/33945
* https://github.com/tensorflow/addons/issues/550
* https://github.com/tensorflow/tensorflow/issues/32783

## Questions and Discussion Topics
* Who from the TF core team would sponsor this migration and ownership of the API?
* Is it worth bringing over the custom-op kernels for CPU/GPU?

## Final Decision
TBD