Improve MacOS support and pin tensorflow version during testing (#383)

mattdangerw · mattdangerw · commit 488b4ce3f7ac · 2022-11-10T14:35:24.000-08:00
* Improve MacOS support

* Conditionally import tensorflow_text everywhere

* Use requirements files for continuous testing

* Fix logs

* Bug fixes and improvements for Linux testing

* Typo fix

* Address review comments
diff --git a/.github/workflows/actions.yml b/.github/workflows/actions.yml
@@ -29,8 +29,8 @@ jobs:
             ${{ runner.os }}-pip-
       - name: Install dependencies
         run: |
-          pip install tensorflow
-          pip install -e ".[tests]" --progress-bar off --upgrade
+          pip install -r requirements.txt --progress-bar off
+          pip install -e "." --progress-bar off
       - name: Test with pytest
         run: |
           pytest --cov=keras_nlp --cov-report xml:coverage.xml
@@ -57,7 +57,7 @@ jobs:
             ${{ runner.os }}-pip-
       - name: Install dependencies
         run: |
-          pip install tensorflow
-          pip install -e ".[tests]" --progress-bar off --upgrade
+          pip install -r requirements.txt --progress-bar off
+          pip install -e "." --progress-bar off
       - name: Lint
         run: bash shell/lint.sh
diff --git a/.github/workflows/nightly.yml b/.github/workflows/nightly.yml
@@ -30,12 +30,8 @@ jobs:
             ${{ runner.os }}-pip-
       - name: Install dependencies
         run: |
-          pip install -e ".[tests]" --progress-bar off --upgrade
-          pip uninstall keras -y
-          pip uninstall tensorflow -y
-          pip uninstall tensorflow_text -y
-          pip install tf-nightly --progress-bar off --upgrade
-          pip install tensorflow-text-nightly --progress-bar off --upgrade
+          pip install -r requirements-nightly.txt --progress-bar off
+          pip install -e "." --progress-bar off
       - name: Test with pytest
         run: |
           pytest --cov=keras_nlp --cov-report xml:coverage.xml
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -84,25 +84,90 @@ Once the pull request is approved, a team member will take care of merging.
 Python 3.7 or later is required.
 
 Setting up your KerasNLP development environment requires you to fork the
-KerasNLP repository, clone the repository, create a virtual environment, and 
-install dependencies.
-
-You can achieve this by running the following commands:
+KerasNLP repository and clone it locally. With the
+[GitHub CLI](https://github.com/cli/cli) installed, you can do this as follows:
 
 ```shell
 gh repo fork keras-team/keras-nlp --clone --remote
 cd keras-nlp
-python -m venv ~/keras-nlp-venv
-source ~/keras-nlp-venv/bin/activate
-pip install -e ".[tests]"
 ```
 
-The first line relies on having an installation of
-[the GitHub CLI](https://github.com/cli/cli).
+Next, set up a Python environment with the correct dependencies. We
+recommend using `conda` to install TensorFlow dependencies (such as CUDA), and
+`pip` to install Python packages from PyPI. The exact method depends on your
+OS.
+
+### Linux (recommended)
+
+To set up a complete environment with TensorFlow, a local install of keras-nlp,
+and all development tools, run the following or adapt it to suit your needs.
+
+```shell
+# Create and activate conda environment.
+conda create -n keras-nlp python=3.9
+conda activate keras-nlp
+
+# The following can be omitted if GPU support is not required.
+conda install -c conda-forge cudatoolkit-dev=11.2 cudnn=8.1.0
+mkdir -p $CONDA_PREFIX/etc/conda/activate.d/
+echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
+echo 'export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX/' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
+source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
+
+# Install dependencies.
+python -m pip install --upgrade pip
+python -m pip install -r requirements.txt
+python -m pip install -e "."
+```
+
+### MacOS
+
+⚠️⚠️⚠️ MacOS binaries for the M1 architecture are not currently available from
+official sources. You can try an experimental development workflow leveraging
+the [TensorFlow Metal plugin](https://developer.apple.com/metal/tensorflow-plugin/)
+and a [community-maintained build](https://github.com/sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon)
+of `tensorflow-text`. These binaries are not provided by Google, so proceed at
+your own risk.
+
+#### Experimental instructions for Arm (M1)
+
+```shell
+# Create and activate conda environment.
+conda create -n keras-nlp python=3.9
+conda activate keras-nlp
+
+# Install dependencies.
+conda install -c apple tensorflow-deps=2.9
+python -m pip install --upgrade pip
+python -m pip install -r requirements-macos-m1.txt
+python -m pip install -e "."
+```
 
-Following these commands you should be able to run the tests using
-`pytest keras_nlp`. Please report any issues running tests following these
-steps.
+#### Instructions for x86 (Intel)
+
+```shell
+# Create and activate conda environment.
+conda create -n keras-nlp python=3.9
+conda activate keras-nlp
+
+# Install dependencies.
+python -m pip install --upgrade pip
+python -m pip install -r requirements.txt
+python -m pip install -e "."
+```
+
+### Windows
+
+For the best experience developing on Windows, please install
+[WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and proceed with
+the Linux installation instructions above.
+
+To run the format and lint scripts, make sure you clone the repo with Linux-style
+(LF) line endings and set your editor's line separator to LF.
+This is done automatically if you clone using git inside WSL.
+
+Note that we will not support Windows Shell/PowerShell for any scripts in this
+repository.
 
 ## Testing changes
 
@@ -143,18 +208,3 @@ the following commands manually every time you want to format your code:
 If after running these the CI flow is still failing, try updating `flake8`,
 `isort` and `black`. This can be done by running `pip install --upgrade black`,
 `pip install --upgrade flake8`, and `pip install --upgrade isort`.
-
-## Developing on Windows
-
-For Windows development, we recommend using WSL (Windows Subsystem for Linux),
-so you can run the shell scripts in this repository. We will not support
-Windows Shell/PowerShell. You can refer
-[to these instructions](https://docs.microsoft.com/en-us/windows/wsl/install)
-for WSL installation.
-
-Note that if you are using Windows Subsystem for Linux (WSL), make sure you 
-clone the repo with Linux style LF line endings and change the default setting
-for line separator in your Text Editor before running the format
-or lint scripts. This is automatically done if you clone using git inside WSL.
-If there is conflict due to the line endings you might see an error
-like - `: invalid option`.
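The new CONTRIBUTING.md setup boils down to a per-platform choice of requirements file: Linux and Intel macOS use `requirements.txt`, Apple Silicon (M1) uses `requirements-macos-m1.txt` (after `conda install -c apple tensorflow-deps=2.9`), and Windows users should work inside WSL. The sketch below encodes that choice; `pick_requirements_file` is a hypothetical helper for illustration, not something the repository provides, and it assumes `platform.system()`/`platform.machine()` report `"Darwin"`/`"arm64"` for a native Apple Silicon interpreter.

```python
import platform


def pick_requirements_file(system=None, machine=None):
    """Return the requirements file CONTRIBUTING.md suggests for a platform.

    Hypothetical helper for illustration only; it is not part of KerasNLP.
    Defaults to the platform of the running interpreter.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Windows":
        # Native Windows shells are unsupported; WSL reports itself as "Linux".
        raise RuntimeError("Install WSL and follow the Linux instructions.")
    if system == "Darwin" and machine == "arm64":
        # Apple Silicon (M1): experimental, community-built tensorflow-text.
        return "requirements-macos-m1.txt"
    # Linux and Intel (x86) macOS both use the default requirements file.
    return "requirements.txt"


print(pick_requirements_file("Darwin", "arm64"))
print(pick_requirements_file("Linux", "x86_64"))
```

Note that a Python interpreter running under Rosetta on an M1 machine reports `x86_64`, so it would be routed to the Intel instructions.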
diff --git a/examples/bert/README.md b/examples/bert/README.md
@@ -16,11 +16,6 @@ need to be trained for much longer on a much larger dataset.
 OUTPUT_DIR=~/bert_test_output
 DATA_URL=https://storage.googleapis.com/tensorflow/keras-nlp/examples/bert
 
-# Create a virtual env and install dependencies.
-mkdir $OUTPUT_DIR
-python3 -m venv $OUTPUT_DIR/env && source $OUTPUT_DIR/env/bin/activate
-pip install -e ".[tests,examples]"
-
 # Download example data.
 wget ${DATA_URL}/bert_vocab_uncased.txt -O $OUTPUT_DIR/bert_vocab_uncased.txt
 wget ${DATA_URL}/wiki_example_data.txt -O $OUTPUT_DIR/wiki_example_data.txt
diff --git a/keras_nlp/integration_tests/basic_usage_test.py b/keras_nlp/integration_tests/basic_usage_test.py
@@ -13,13 +13,17 @@
 # limitations under the License.
 
 import tensorflow as tf
+from absl.testing import parameterized
 from tensorflow import keras
 
 import keras_nlp
 
 
-class BasicUsageTest(tf.test.TestCase):
-    def test_quick_start(self):
+class BasicUsageTest(tf.test.TestCase, parameterized.TestCase):
+    @parameterized.named_parameters(
+        ("jit_compile_false", False), ("jit_compile_true", True)
+    )
+    def test_quick_start(self, jit_compile):
         """This matches the quick start example in our base README."""
 
         # Tokenize some inputs with a binary label.
@@ -47,7 +51,7 @@ def test_quick_start(self):
         model = keras.Model(inputs, outputs)
 
         # Run a single batch of gradient descent.
-        model.compile(loss="binary_crossentropy", jit_compile=True)
+        model.compile(loss="binary_crossentropy", jit_compile=jit_compile)
         loss = model.train_on_batch(x, y)
 
         # Make sure we have a valid loss.
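The updated test uses `absl.testing.parameterized.named_parameters` to run the quick start once with `jit_compile=False` and once with `jit_compile=True`. For readers without `absl-py`, a roughly equivalent standard-library pattern is `unittest.TestCase.subTest`; this is a swapped-in technique, not what the commit uses, and `fake_train_step` is a hypothetical stand-in for `model.compile(...)` plus `model.train_on_batch(...)`.

```python
import math
import unittest


def fake_train_step(jit_compile):
    """Hypothetical stand-in for compiling a model and training one batch.

    Returns a placeholder loss; a real test would build and train a model.
    """
    del jit_compile  # Unused in this stand-in.
    return 0.5


class BasicUsageSketch(unittest.TestCase):
    def test_quick_start(self):
        for jit_compile in (False, True):
            # Each case is reported separately, labelled with its jit_compile.
            with self.subTest(jit_compile=jit_compile):
                loss = fake_train_step(jit_compile)
                self.assertTrue(math.isfinite(loss))
```

One design difference: `named_parameters` generates a separate test method per named case, while `subTest` keeps both cases inside a single test method, so the runner counts one test.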
diff --git a/keras_nlp/layers/mlm_mask_generator.py b/keras_nlp/layers/mlm_mask_generator.py
@@ -13,9 +13,15 @@
 # limitations under the License.
 
 import tensorflow as tf
-import tensorflow_text as tf_text
 from tensorflow import keras
 
+from keras_nlp.utils.tf_utils import assert_tf_text_installed
+
+try:
+    import tensorflow_text as tf_text
+except ImportError:
+    tf_text = None
+
 
 class MLMMaskGenerator(keras.layers.Layer):
     """Layer that applies language model masking.
@@ -96,6 +102,8 @@ def __init__(
         random_token_rate=0.1,
         **kwargs,
     ):
+        assert_tf_text_installed(self.__class__.__name__)
+
         super().__init__(**kwargs)
         self.vocabulary_size = vocabulary_size
         self.unselectable_token_ids = unselectable_token_ids
diff --git a/keras_nlp/layers/multi_segment_packer.py b/keras_nlp/layers/multi_segment_packer.py
@@ -15,9 +15,15 @@
 """BERT token packing layer."""
 
 import tensorflow as tf
-import tensorflow_text as tf_text
 from tensorflow import keras
 
+from keras_nlp.utils.tf_utils import assert_tf_text_installed
+
+try:
+    import tensorflow_text as tf_text
+except ImportError:
+    tf_text = None
+
 
 class MultiSegmentPacker(keras.layers.Layer):
     """Packs multiple sequences into a single fixed width model input.
@@ -106,6 +112,8 @@ def __init__(
         truncator="round_robin",
         **kwargs,
     ):
+        assert_tf_text_installed(self.__class__.__name__)
+
         super().__init__(**kwargs)
         self.sequence_length = sequence_length
         if truncator not in ("round_robin", "waterfall"):
diff --git a/keras_nlp/metrics/rouge_base.py b/keras_nlp/metrics/rouge_base.py
@@ -20,7 +20,7 @@
 import tensorflow as tf
 from tensorflow import keras
 
-from keras_nlp.utils.tensor_utils import tensor_to_string_list
+from keras_nlp.utils.tf_utils import tensor_to_string_list
 
 try:
     import rouge_score
@@ -62,8 +62,8 @@ def __init__(
 
         if rouge_score is None:
             raise ImportError(
-                "ROUGE metric requires the `rouge_score` package. "
-                "Please install it with `pip install rouge-score`."
+                f"{self.__class__.__name__} requires the `rouge_score` "
+                "package. Please install it with `pip install rouge-score`."
             )
 
         if not tf.as_dtype(self.dtype).is_floating:
diff --git a/keras_nlp/tokenizers/byte_tokenizer.py b/keras_nlp/tokenizers/byte_tokenizer.py
@@ -16,9 +16,14 @@
 
 import numpy as np
 import tensorflow as tf
-import tensorflow_text as tf_text
 
 from keras_nlp.tokenizers import tokenizer
+from keras_nlp.utils.tf_utils import assert_tf_text_installed
+
+try:
+    import tensorflow_text as tf_text
+except ImportError:
+    tf_text = None
 
 
 class ByteTokenizer(tokenizer.Tokenizer):
@@ -150,6 +155,8 @@ def __init__(
         replacement_char: int = 65533,
         **kwargs,
     ):
+        assert_tf_text_installed(self.__class__.__name__)
+
         # Check dtype and provide a default.
         if "dtype" not in kwargs or kwargs["dtype"] is None:
             kwargs["dtype"] = tf.int32
diff --git a/keras_nlp/tokenizers/sentence_piece_tokenizer.py b/keras_nlp/tokenizers/sentence_piece_tokenizer.py
@@ -17,10 +17,15 @@
 from typing import List
 
 import tensorflow as tf
-import tensorflow_text as tf_text
 
 from keras_nlp.tokenizers import tokenizer
-from keras_nlp.utils.tensor_utils import tensor_to_string_list
+from keras_nlp.utils.tf_utils import assert_tf_text_installed
+from keras_nlp.utils.tf_utils import tensor_to_string_list
+
+try:
+    import tensorflow_text as tf_text
+except ImportError:
+    tf_text = None
 
 
 class SentencePieceTokenizer(tokenizer.Tokenizer):
@@ -96,6 +101,8 @@ def __init__(
         sequence_length: int = None,
         **kwargs,
     ) -> None:
+        assert_tf_text_installed(self.__class__.__name__)
+
         # Check dtype and provide a default.
         if "dtype" not in kwargs or kwargs["dtype"] is None:
             kwargs["dtype"] = tf.int32
diff --git a/keras_nlp/tokenizers/unicode_character_tokenizer.py b/keras_nlp/tokenizers/unicode_character_tokenizer.py
@@ -13,9 +13,14 @@
 # limitations under the License.
 
 import tensorflow as tf
-import tensorflow_text as tf_text
 
 from keras_nlp.tokenizers import tokenizer
+from keras_nlp.utils.tf_utils import assert_tf_text_installed
+
+try:
+    import tensorflow_text as tf_text
+except ImportError:
+    tf_text = None
 
 
 class UnicodeCharacterTokenizer(tokenizer.Tokenizer):
@@ -199,6 +204,8 @@ def __init__(
         vocabulary_size: int = None,
         **kwargs,
     ) -> None:
+        assert_tf_text_installed(self.__class__.__name__)
+
         # Check dtype and provide a default.
         if "dtype" not in kwargs or kwargs["dtype"] is None:
             kwargs["dtype"] = tf.int32
diff --git a/keras_nlp/tokenizers/word_piece_tokenizer.py b/keras_nlp/tokenizers/word_piece_tokenizer.py
@@ -16,9 +16,14 @@
 from typing import List
 
 import tensorflow as tf
-import tensorflow_text as tf_text
 
 from keras_nlp.tokenizers import tokenizer
+from keras_nlp.utils.tf_utils import assert_tf_text_installed
+
+try:
+    import tensorflow_text as tf_text
+except ImportError:
+    tf_text = None
 
 # Matches whitespace and control characters.
 WHITESPACE_REGEX = r"|".join(
@@ -183,6 +188,8 @@ def __init__(
         oov_token: str = "[UNK]",
         **kwargs,
     ) -> None:
+        assert_tf_text_installed(self.__class__.__name__)
+
         # Check dtype and provide a default.
         if "dtype" not in kwargs or kwargs["dtype"] is None:
             kwargs["dtype"] = tf.int32
diff --git a/keras_nlp/utils/tf_utils.py b/keras_nlp/utils/tf_utils.py
@@ -14,6 +14,11 @@
 
 import tensorflow as tf
 
+try:
+    import tensorflow_text
+except ImportError:
+    tensorflow_text = None
+
 
 def _decode_strings_to_utf8(inputs):
     """Recursively decodes to list of strings with 'utf-8' encoding."""
@@ -45,3 +50,12 @@ def tensor_to_string_list(inputs):
         if inputs.shape.rank != 0:
             list_outputs = list_outputs.tolist()
     return _decode_strings_to_utf8(list_outputs)
+
+
+def assert_tf_text_installed(symbol_name):
+    """Raise an ImportError if `tensorflow-text` is not installed."""
+    if tensorflow_text is None:
+        raise ImportError(
+            f"{symbol_name} requires the `tensorflow-text` package. "
+            "Please install with `pip install tensorflow-text`."
+        )
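The pattern applied across the layers and tokenizers above (a guarded import that falls back to `None` at module load, then a check called from `__init__` with `self.__class__.__name__`) can be sketched standalone. This is a minimal illustration, not KerasNLP code: the module `_hypothetical_optional_pkg`, the pip name `hypothetical-optional-pkg`, and the classes `ExampleTokenizer`/`ExampleSubclass` are all hypothetical, chosen so the `ImportError` path is exercised.

```python
try:
    # Hypothetical optional dependency, assumed NOT to be installed.
    import _hypothetical_optional_pkg as optional_dep
except ImportError:
    optional_dep = None


def assert_optional_dep_installed(symbol_name):
    """Raise an ImportError naming `symbol_name` if the dependency is missing."""
    if optional_dep is None:
        raise ImportError(
            f"{symbol_name} requires the `hypothetical-optional-pkg` package. "
            "Please install with `pip install hypothetical-optional-pkg`."
        )


class ExampleTokenizer:
    """Hypothetical class mirroring how the tokenizers call the check."""

    def __init__(self):
        # Check at construction time rather than import time, so importing
        # the module still succeeds when the dependency is absent.
        assert_optional_dep_installed(self.__class__.__name__)


class ExampleSubclass(ExampleTokenizer):
    pass


error_message = None
try:
    ExampleSubclass()
except ImportError as e:
    error_message = str(e)

print(error_message)
```

Because the check receives `self.__class__.__name__`, a subclass reports its own name (`ExampleSubclass` here) rather than the base class's, which is also why the ROUGE error message above switched to an f-string.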
diff --git a/keras_nlp/utils/tf_utils_test.py b/keras_nlp/utils/tf_utils_test.py
diff --git a/requirements-common.txt b/requirements-common.txt
diff --git a/requirements-macos-m1.txt b/requirements-macos-m1.txt
diff --git a/requirements-nightly.txt b/requirements-nightly.txt
diff --git a/requirements.txt b/requirements.txt
diff --git a/setup.py b/setup.py