keras-team
diff --git a/‎API_DESIGN_GUIDE.md‎
Lines changed: 90 additions & 0 deletions b/‎API_DESIGN_GUIDE.md‎
diff --git a/‎CODE_OF_CONDUCT.md‎
Lines changed: 4 additions & 0 deletions b/‎CODE_OF_CONDUCT.md‎
diff --git a/‎CONTRIBUTING.md‎
Lines changed: 49 additions & 22 deletions b/‎CONTRIBUTING.md‎
diff --git a/‎keras_nlp/LICENSE‎ renamed to ‎LICENSE‎ b/‎keras_nlp/LICENSE‎ renamed to ‎LICENSE‎
diff --git a/‎README.md‎
Lines changed: 72 additions & 54 deletions b/‎README.md‎
@@ -0,0 +1,90 @@
+# API Design Guide
+
+Before reading this document, please read the
+[Keras API design guidelines](https://github.com/keras-team/governance/blob/master/keras_api_design_guidelines.md).
+
+Below are some design considerations specific to KerasNLP.
+
+## Philosophy
+
+- **Let user needs be our compass.** Any modular building block that NLP
+  practitioners need is in scope, whether it's data loading, augmentation, model
+  building, evaluation metrics, or visualization utils.
+
+- **Be resolutely high-level.** Even if something is easy to do by hand in 5
+  lines, package it as a one-liner.
+
+- **Balance ease of use and flexibility.** Simple things should be easy, and
+  arbitrarily advanced use cases should be possible. There should always be a
+  "we need to go deeper" path available to our most expert users.
+
+- **Grow as a platform and as a community.** KerasNLP development should be
+  driven by the community, with feature and release planning happening in
+  the open on GitHub.
+
+## Avoid new dependencies
+
+The core dependencies of KerasNLP are Keras, NumPy, TensorFlow, and
+[TensorFlow Text](https://www.tensorflow.org/text).
+
+We strive to keep KerasNLP as self-contained as possible, and avoid depending
+on other projects (for example NLTK or spaCy) for text preprocessing.
+
+In rare cases, particularly with tokenizers and metrics, we may need to add
+an external dependency for compatibility with the "canonical" implementation
+of a certain technique. In these cases, avoid making the package a required
+dependency, and add installation instructions to the specific symbol instead:
+
+```python
+try:
+    import rouge_score
+except ImportError:
+    rouge_score = None  # Optional dependency; checked when the metric is built.
+
+class RougeL(keras.metrics.Metric):
+    def __init__(self):
+        if rouge_score is None:
+            raise ImportError(
+                'The RougeL metric requires the rouge_score package. '
+                'Install it with `pip install rouge-score`.')
+```
+
+## Keep computation inside TensorFlow graph
+
+Our layers, metrics, and tokenizers should be fast and efficient, which means
+running inside the
+[TensorFlow graph](https://www.tensorflow.org/guide/intro_to_graphs)
+whenever possible. This means you should be able to annotate a function
+calling a layer, metric or loss with `@tf.function` without running into issues.
+
+[tf.strings](https://www.tensorflow.org/api_docs/python/tf/strings) and
+[tf.text](https://www.tensorflow.org/text/api_docs/python/text) provide a large
+surface of TensorFlow operations that manipulate strings. If a low-level (C++)
+operation we need is missing, we should add it in collaboration with core
+TensorFlow or TensorFlow Text. KerasNLP is a Python-only library.
+
+We should also strive to keep computation XLA compilable wherever possible (e.g.
+`tf.function(jit_compile=True)`). For trainable modeling components this is
+particularly important due to the performance gains offered by XLA. For
+preprocessing and postprocessing, XLA compilation is not a requirement.
+
+## Support tf.data for text preprocessing and augmentation
+
+In general, our preprocessing tools should be runnable inside a
+[tf.data](https://www.tensorflow.org/guide/data) pipeline, and any augmentation
+to training data should be dynamic (runnable on the fly during training rather
+than precomputed).
+
+We should design our preprocessing workflows with tf.data in mind, and support
+both batched and unbatched data as input to preprocessing layers.
+
+## Prioritize multi-lingual support
+
+We strive to keep KerasNLP a friendly and useful library for speakers of all
+languages. In general, prefer designing workflows that are language agnostic,
+and do not involve logic (e.g. stemming) that needs to be rewritten
+per-language.
+
+It is OK for new workflows to not come with out-of-the-box support for all
+languages in a first release, but a design that does not include a plan for
+multi-lingual support will be rejected.
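The optional-dependency pattern described under "Avoid new dependencies" can be tried out without TensorFlow installed. The following is a minimal sketch, not KerasNLP code: `try_import`, `NeedsOptionalPackage`, and the module name `some_optional_package_that_is_missing` are hypothetical names chosen for illustration, and the package is assumed to be absent so that the failure path runs.

```python
import importlib


def try_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Hypothetical optional dependency. The name is an assumption, picked so
# that the module is almost certainly absent and the error path is shown.
optional_pkg = try_import("some_optional_package_that_is_missing")


class NeedsOptionalPackage:
    """Stand-in for a metric or tokenizer wrapping a canonical implementation."""

    def __init__(self):
        # Fail at construction time rather than at library import time, so
        # users who never touch this symbol are unaffected.
        if optional_pkg is None:
            raise ImportError(
                "NeedsOptionalPackage requires an optional package. "
                "Install it with `pip install <package-name>`."
            )
```

Deferring the check to `__init__` keeps the library importable for everyone, and only users who construct the symbol see the installation hint.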
@@ -0,0 +1,4 @@
+# Code of Conduct
+
+This project follows
+[Google's Open Source Community Guidelines](https://opensource.google/conduct/).
@@ -1,6 +1,38 @@
 # Contribution guide
 
-## How to contribute code
+KerasNLP is an actively growing project and community! We would love for you
+to get involved. Below are instructions for how to plug into KerasNLP
+development.
+
+## Background reading
+
+Before contributing code, please review our [Style Guide](STYLE_GUIDE.md) and
+[API Design Guide](API_DESIGN_GUIDE.md).
+
+Our [Roadmap](ROADMAP.md) contains an overview of the project goals and our
+current focus areas.
+
+We follow
+[Google's Open Source Community Guidelines](https://opensource.google/conduct/).
+
+## Finding an issue
+
+The fastest way to contribute is to find open issues that need an assignee. We
+maintain two lists of GitHub tags for contributors:
+
+ - [good first issue](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22):
+   a list of small, well defined issues for newcomers to the project.
+ - [contributions welcome](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22):
+   a larger list of issues that may range in complexity.
+
+If you would like to propose a new symbol or feature, please first review our
+design guide and roadmap linked above, then open an issue to discuss. If you
+have a specific design in mind, please include a Colab notebook showing the
+proposed design in an end-to-end example. Keep in mind that design for a new
+feature or use case may take longer than contributing to an open issue with a
+vetted design.
+
+## Contributing code
 
 Follow these steps to submit your code contribution.
 
@@ -47,22 +79,7 @@ request gets approved by the reviewer.
 
 Once the pull request is approved, a team member will take care of merging.
 
-## Developing on Windows
-
-For Windows development, we recommend using WSL (Windows Subsystem for Linux),
-so you can run the shell scripts in this repository. We will not support
-Windows Shell/PowerShell. You can refer
-[to these instructions](https://docs.microsoft.com/en-us/windows/wsl/install)
-for WSL installation.
-
-Note that if you are using Windows Subsystem for Linux (WSL), make sure you 
-clone the repo with Linux style LF line endings and change the default setting
-for line separator in your Text Editor before running the format
-or lint scripts. This is automatically done if you clone using git inside WSL.
-If there is conflict due to the line endings you might see an error
-like - `: invalid option`.
-
-## Setup environment
+## Setting up an Environment
 
 Python 3.7 or later is required.
 
@@ -87,7 +104,7 @@ Following these commands you should be able to run the tests using
 `pytest keras_nlp`. Please report any issues running tests following these
 steps.
 
-## Run tests
+## Testing changes
 
 KerasNLP is tested using [PyTest](https://docs.pytest.org/en/6.2.x/).
 
@@ -115,7 +132,7 @@ You can run the unit tests for KerasNLP by running:
 pytest keras_nlp/
 ```
 
-## Formatting the Code
+## Formatting Code
 
 We use `flake8`, `isort` and `black` for code formatting.  You can run
 the following commands manually every time you want to format your code:
@@ -127,7 +144,17 @@ If after running these the CI flow is still failing, try updating `flake8`,
 `isort` and `black`. This can be done by running `pip install --upgrade black`,
 `pip install --upgrade flake8`, and `pip install --upgrade isort`.
 
-## Community Guidelines
+## Developing on Windows
 
-This project follows 
-[Google's Open Source Community Guidelines](https://opensource.google/conduct/).
+For Windows development, we recommend using WSL (Windows Subsystem for Linux),
+so you can run the shell scripts in this repository. We will not support
+Windows Shell/PowerShell. You can refer
+[to these instructions](https://docs.microsoft.com/en-us/windows/wsl/install)
+for WSL installation.
+
+Note that if you are using Windows Subsystem for Linux (WSL), make sure you 
+clone the repo with Linux style LF line endings and change the default setting
+for line separator in your Text Editor before running the format
+or lint scripts. This is automatically done if you clone using git inside WSL.
+If there is a conflict due to the line endings, you might see an error
+like - `: invalid option`.
@@ -4,78 +4,96 @@
 ![Tensorflow](https://img.shields.io/badge/tensorflow-v2.5.0+-success.svg)
 [![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/keras-team/keras-nlp/issues)
 
-KerasNLP is a repository of modular building blocks (e.g. layers, metrics, losses)
-to support modern Natural Language Processing (NLP) workflows.
-Engineers working with applied NLP can leverage it to
-rapidly assemble training and inference pipelines that are both state-of-the-art
-and production-grade. Common use cases for application include sentiment
-analysis, named entity recognition, text generation, etc.
+KerasNLP is a simple and powerful API for building Natural Language Processing
+(NLP) models within the Keras ecosystem.
 
-KerasNLP can be understood as a horizontal extension of the Keras API: they're
-new first-party Keras objects (layers, metrics, etc) that are too specialized to
-be added to core Keras, but that receive the same level of polish and backwards
-compatibility guarantees as the rest of the Keras API and that are maintained by
-the Keras team itself (unlike TFAddons).
+KerasNLP provides modular building blocks following
+standard Keras interfaces (layers, metrics) that allow you to quickly and
+flexibly iterate on your task. Engineers working in applied NLP can leverage the
+library to assemble training and inference pipelines that are both
+state-of-the-art and production-grade.
 
-Currently, KerasNLP is operating pre-release. Upon launch of KerasNLP 1.0, full
-API docs and code examples will be available.
+KerasNLP can be understood as a horizontal extension of the Keras API —
+components are first-party Keras objects that are too specialized to be
+added to core Keras, but that receive the same level of polish as the rest of
+the Keras API.
 
-## Contributors
+We are a new and growing project, and welcome [contributions](CONTRIBUTING.md).
 
-If you'd like to contribute, please see our [contributing guide](CONTRIBUTING.md).
+## Quick Links
 
-The fastest way to find a place to contribute is to browse our
-[open issues](https://github.com/keras-team/keras-nlp/issues) and find an
-unclaimed issue to work on. Issues with a [contributions welcome](
-https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22)
-tag are places where we are actively looking for support, and a
-[good first issue](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
-tag means we think this could be a accessible a first time contributor.
+- [Contributing](CONTRIBUTING.md)
+- [Roadmap](ROADMAP.md)
+- [Style Guide](STYLE_GUIDE.md)
+- [API Design Guide](API_DESIGN_GUIDE.md)
+- [Call for Contributions](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22)
 
-If you would like to propose a new symbol or feature, please open an issue to
-discuss. Be aware the design for new features may take longer than contributing
-pre-planned features. If you have a design in mind, please include a colab
-notebook showing the proposed design in a end-to-end example. Make sure to
-follow the [Keras API design guidelines](
-https://github.com/keras-team/governance/blob/master/keras_api_design_guidelines.md).
+## Quick Start
 
-## Roadmap
-
-This is an early stage project, and we are actively working on a more detailed
-roadmap to share soon. For now, most of our immediate planning is done through
-GitHub issues.
-
-At this stage, we are primarily building components for a short list of
-"greatest hits" NLP models (e.g. BERT, GPT-2, word2vec). We will be focusing
-on components that follow a established Keras interface (e.g.
-`keras.layers.Layer`, `keras.metrics.Metric`, or
-`keras_nlp.tokenizers.Tokenizer`).
-
-As we progress further with the library, we will attempt to cover an ever
-expanding list of widely cited model architectures.
-
-## Releases
-
-KerasNLP release are documented on our
-[github release page](https://github.com/keras-team/keras-nlp/releases) and
-available to download from our [PyPI project](
-https://pypi.org/project/keras-nlp/).
-
-To install KerasNLP and all it's dependencies, simply run:
+Install the latest release:
 
 ```
-pip install keras-nlp
+pip install keras-nlp --upgrade
+```
+
+Tokenize text, build a tiny transformer, and train on a single batch:
+
+```python
+import keras_nlp
+import tensorflow as tf
+from tensorflow import keras
+
+# Tokenize some inputs with a binary label.
+vocab = ["[UNK]", "the", "qu", "##ick", "br", "##own", "fox", "."]
+sentences = ["The quick brown fox jumped.", "The fox slept."]
+tokenizer = keras_nlp.tokenizers.WordPieceTokenizer(
+    vocabulary=vocab,
+    sequence_length=10,
+)
+x, y = tokenizer(sentences), tf.constant([1, 0])
+
+# Create a tiny transformer.
+inputs = keras.Input(shape=(None,), dtype="int32")
+outputs = keras_nlp.layers.TokenAndPositionEmbedding(
+    vocabulary_size=len(vocab),
+    sequence_length=10,
+    embedding_dim=16,
+)(inputs)
+outputs = keras_nlp.layers.TransformerEncoder(
+    num_heads=4,
+    intermediate_dim=32,
+)(outputs)
+outputs = keras.layers.GlobalAveragePooling1D()(outputs)
+outputs = keras.layers.Dense(1, activation="sigmoid")(outputs)
+model = keras.Model(inputs, outputs)
+
+# Run a single batch of gradient descent.
+model.compile(loss="binary_crossentropy", jit_compile=True)
+model.train_on_batch(x, y)
 ```
 
 ## Compatibility
 
 We follow [Semantic Versioning](https://semver.org/), and plan to
 provide backwards compatibility guarantees both for code and saved models built
-with our components. While we continue with pre-release `0.y.z` development, we
 may break compatibility at any time and APIs should not be consider stable.
 
+## Citing KerasNLP
+
+If KerasNLP helps your research, we appreciate your citations.
+Here is the BibTeX entry:
+
+```bibtex
+@misc{kerasnlp2022,
+  title={KerasNLP},
+  author={Watson, Matthew and Qian, Chen and Zhu, Scott and Chollet, Fran\c{c}ois and others},
+  year={2022},
+  howpublished={\url{https://github.com/keras-team/keras-nlp}},
+}
+```
+
 Thank you to all of our wonderful contributors!
 
 <a href="https://github.com/keras-team/keras-nlp/graphs/contributors">
   <img src="https://contrib.rocks/image?repo=keras-team/keras-nlp" />
-</a>
+</a>