# Migrate gelu activation from TensorFlow Addons to TensorFlow Core

| Status                   | Accepted                                                 |
| :----------------------- | :------------------------------------------------------- |
| **RFC #**                | [252](https://github.com/tensorflow/community/pull/252)  |
| **Authors**              | Tzu-Wei Sung (@WindQAQ) & Sean Morgan (@seanpmorgan)     |
| **Sponsor**              | @alextp                                                  |
| **Updated**              | 2020-07-15                                               |
| **Sponsorship Deadline** | 2020-07-17 (45 days after submission)                    |

## Rationale for Migration
* [Gaussian Error Linear Units (GELUs)](https://arxiv.org/pdf/1606.08415.pdf) has been cited 600+ times
* Used in BERT and other influential architectures
* Multiple approvals from the TF side:
  * https://github.com/tensorflow/tensorflow/pull/33945#issuecomment-617832325
  * https://github.com/tensorflow/tensorflow/issues/32783#issuecomment-537284266

## Historical Information
* Have there been significant issues reported to Addons that need to be addressed?
  * Only ABI incompatibilities for the custom-op (not an issue if built along with core TF)
* When was it implemented in Addons?
  * C++ custom-op added **2019-08 (TFA 0.5.0)**
  * Python composite op added **2020-02-26 (TFA 0.9.0)**
* We have [performed benchmarking of the GELU activation](https://colab.research.google.com/drive/1rLb4EuydbFg9PbhboXhCDqopcl6BmphG#scrollTo=0GL2x2S4zxW3),
which shows it may be beneficial to retain the custom-ops, but the maintenance burden has grown
too large for us to continue supporting them in Addons.
* This migration is long overdue, but we have struggled to finalize the migration process.

## Implementation Details
* Link to implementation in Addons:
  * Python: https://github.com/tensorflow/addons/blob/r0.10/tensorflow_addons/activations/gelu.py
  * C++: https://github.com/tensorflow/addons/blob/r0.10/tensorflow_addons/custom_ops/activations/cc/kernels/gelu_op.h
* Does this include custom-op kernels?
  * Yes, but we are currently proposing to migrate only the Python composite op. This may
  change based on discussion in this RFC.
  * Are they CPU/GPU/TPU compatible?
    * CPU/GPU compatible. No support for TPU.
* What is the pytest coverage of the addon?
  * `tensorflow_addons/activations/gelu.py 89%`

## Changes to Implementation (If Needed)
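For reference, the exact form and the tanh approximation (from the GELU paper) that the Addons op implements are:

```
\mathrm{gelu}(x) = x\,\Phi(x)
  = \tfrac{1}{2}\,x\left(1 + \operatorname{erf}\left(x / \sqrt{2}\right)\right)
  \approx \tfrac{1}{2}\,x\left(1 + \tanh\left(\sqrt{2/\pi}\,\bigl(x + 0.044715\,x^{3}\bigr)\right)\right)
```

The Python composite op proposed for migration, with its imports included for completeness: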
```
import math

import tensorflow as tf
from tensorflow_addons.utils import types


def gelu(x: types.TensorLike, approximate: bool = True) -> tf.Tensor:
    x = tf.convert_to_tensor(x)
    if approximate:
        # tanh-based approximation from the GELU paper.
        pi = tf.cast(math.pi, x.dtype)
        coeff = tf.cast(0.044715, x.dtype)
        return 0.5 * x * (1.0 + tf.tanh(tf.sqrt(2.0 / pi) * (x + coeff * tf.pow(x, 3))))
    else:
        # Exact form based on the Gaussian error function.
        return 0.5 * x * (1.0 + tf.math.erf(x / tf.cast(tf.sqrt(2.0), x.dtype)))
```
The above implementation would only bring over the Python composite op. Since there is
[no way for us to build tfxla kernels](https://github.com/tensorflow/tensorflow/pull/33945#issuecomment-617842977),
we had no TPU support in Addons. [There were comments](https://github.com/tensorflow/tensorflow/pull/33945#issuecomment-625380208)
about using a "selector", but we would need guidance on how to implement that.

We may also want to discuss the `approximate` bool and whether it should be included in the
upstream version.
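
For illustration, a minimal sketch of the difference between the two modes, assuming
TensorFlow Addons (>= 0.9) is installed so that `tfa.activations.gelu` is available:

```
import tensorflow as tf
import tensorflow_addons as tfa

x = tf.constant([-2.0, -1.0, 0.0, 1.0, 2.0])

# Exact (erf-based) form vs. the tanh approximation.
exact = tfa.activations.gelu(x, approximate=False)
approx = tfa.activations.gelu(x, approximate=True)

# For inputs of moderate magnitude the two variants differ only on the
# order of 1e-3 or smaller.
print(tf.reduce_max(tf.abs(exact - approx)).numpy())
```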
## Transition Plan
* The activation would land in [nn_ops.py](https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow//python/ops/nn_ops.py), [keras activations](https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow/python/keras/activations.py),
  and possibly in [keras advanced_activations layers](https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow/python/keras/layers/advanced_activations.py)
* No planned changes to the parameter signatures at this time
* Addons would deprecate our activation and delegate to the core functionality (see the sketch after this list).
* After merging to TF Core:
  * Consolidate/remove https://github.com/tensorflow/models/blob/r2.2.0/official/modeling/activations/gelu.py
  * Consolidate/remove https://github.com/tensorflow/models/blob/r2.2.0/official/modeling/activations/gelu_test.py
  * Consolidate/remove https://github.com/tensorflow/models/blob/r2.2.0/official/nlp/xlnet/xlnet_modeling.py#L29
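
A minimal sketch of what the Addons-side deprecation shim could look like, assuming the
core op lands as `tf.nn.gelu` with a compatible signature (the final name and signature
are subject to this RFC):

```
import warnings

import tensorflow as tf
from tensorflow_addons.utils import types


def gelu(x: types.TensorLike, approximate: bool = True) -> tf.Tensor:
    """Deprecated Addons alias that delegates to the core implementation."""
    warnings.warn(
        "tfa.activations.gelu is deprecated; use tf.nn.gelu instead.",
        DeprecationWarning,
    )
    # Assumes the upstream op keeps the `approximate` argument.
    return tf.nn.gelu(x, approximate=approximate)
```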

## Relevant GitHub Issues
* https://github.com/tensorflow/tensorflow/pull/33945
* https://github.com/tensorflow/addons/issues/550
* https://github.com/tensorflow/tensorflow/issues/32783

## Questions and Discussion Topics
* Who from the TF core team would sponsor this migration and ownership of the API?
* Is it worth bringing over the custom-op kernels for CPU/GPU?

## Final Decision
TBD