Commit 5245885

Update docs for upcoming 0.2.0 release (#158)
* Add new docs cross-linking keras.io for 0.2 release
* More updates
* Formatting
* Update link names
* Add a style guide and consolidate contributing
* More contributing guide updates
* Fixups
* Formatting futzing
* extra callout to docs
* Fill out roadmap focus areas in more detail
* More roadmap info
* Move files around; edits
* Address review comments
* Clarify note about dtensor
* Strip keras.io keras-nlp section links
1 parent 99922f8 commit 5245885

8 files changed

+524
-85
lines changed

API_DESIGN_GUIDE.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
1+
# API Design Guide
2+
3+
Before reading this document, please read the
4+
[Keras API design guidelines](https://github.com/keras-team/governance/blob/master/keras_api_design_guidelines.md).
5+
6+
Below are some design considerations specific to KerasNLP.
7+
8+
## Philosophy
9+
10+
- **Let user needs be our compass.** Any modular building block that NLP
11+
practitioners need is in scope, whether it's data loading, augmentation, model
12+
building, evaluation metrics, or visualization utils.
13+
14+
- **Be resolutely high-level.** Even if something is easy to do by hand in 5
15+
lines, package it as a one-liner.
16+
17+
- **Balance ease of use and flexibility.** Simple things should be easy, and
18+
arbitrarily advanced use cases should be possible. There should always be a
19+
"we need to go deeper" path available to our most expert users.
20+
21+
- **Grow as a platform and as a community.** KerasNLP development should be
22+
driven by the community, with feature and release planning happening in
23+
the open on GitHub.
24+
25+
## Avoid new dependencies
26+
27+
The core dependencies of KerasNLP are Keras, NumPy, TensorFlow, and
28+
[TensorFlow Text](https://www.tensorflow.org/text).
29+
30+
We strive to keep KerasNLP as self-contained as possible, and avoid adding
31+
dependencies on other projects (for example, NLTK or spaCy) for text preprocessing.
32+
33+
In rare cases, particularly with tokenizers and metrics, we may need to add
34+
an external dependency for compatibility with the "canonical" implementation
35+
of a certain technique. In these cases, avoid adding a new package dependency,
36+
and add installation instructions for the specific symbol:
37+
38+
```python
from tensorflow import keras

try:
    import rouge_score
except ImportError:
    # Record that the optional dependency is missing instead of leaving
    # the name undefined, so the check in __init__ works.
    rouge_score = None


class RougeL(keras.metrics.Metric):
    def __init__(self):
        super().__init__()
        if rouge_score is None:
            raise ImportError(
                'RougeL metrics requires the rouge_score package. '
                '`pip install rouge-score`.')
```
51+
52+
## Keep computation inside the TensorFlow graph
53+
54+
Our layers, metrics, and tokenizers should be fast and efficient, which means
55+
running inside the
56+
[TensorFlow graph](https://www.tensorflow.org/guide/intro_to_graphs)
57+
whenever possible. This means you should be able to annotate a function
58+
calling a layer, metric or loss with `@tf.function` without running into issues.
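
As a rough illustration of this requirement, the sketch below wraps a layer call in `@tf.function`. The `Dense` layer is only a stand-in chosen for brevity, and `call_layer` is a hypothetical helper name; the expectation is that any KerasNLP layer, metric, or loss could be substituted.

```python
import tensorflow as tf
from tensorflow import keras

# Stand-in layer (an assumption for this sketch); a KerasNLP layer
# should be usable in exactly the same way.
layer = keras.layers.Dense(4)


@tf.function
def call_layer(inputs):
    # The body is traced into a TensorFlow graph on the first call.
    return layer(inputs)


outputs = call_layer(tf.ones((2, 8)))
print(outputs.shape)  # (2, 4)
```

If a component relies on Python-side logic that cannot be traced (for example, calling `.numpy()` on an input), a wrapper like this is usually where the problem surfaces.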
59+
60+
[tf.strings](https://www.tensorflow.org/api_docs/python/tf/strings) and
61+
[tf.text](https://www.tensorflow.org/text/api_docs/python/text) provide a large
62+
surface of TensorFlow operations that manipulate strings. If a low-level (C++)
63+
operation we need is missing, we should add it in collaboration with core
64+
TensorFlow or TensorFlow Text. KerasNLP is a Python-only library.
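
A minimal sketch of string manipulation that stays inside the graph, using only `tf.strings` ops. The function name `normalize_and_split` is hypothetical and not part of KerasNLP.

```python
import tensorflow as tf


@tf.function
def normalize_and_split(text):
    # Lowercase, then split on whitespace, using graph-compatible ops
    # rather than Python string methods.
    return tf.strings.split(tf.strings.lower(text))


tokens = normalize_and_split(tf.constant(["The quick brown fox", "Hello world"]))
print(tokens.to_list())
```

The result is a `tf.RaggedTensor`, since each input string can produce a different number of tokens.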
65+
66+
We should also strive to keep computation XLA compilable wherever possible (e.g.
67+
`tf.function(jit_compile=True)`). For trainable modeling components this is
68+
particularly important due to the performance gains offered by XLA. For
69+
preprocessing and postprocessing, XLA compilation is not a requirement.
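
One way to check XLA compatibility is to wrap the computation in `tf.function(jit_compile=True)` and call it once. The sketch below does this for a toy attention-weight computation; `attention_weights` is a hypothetical function used only for illustration, and it assumes a TensorFlow build with XLA available.

```python
import tensorflow as tf


@tf.function(jit_compile=True)
def attention_weights(query, key):
    # Dense numeric ops such as matmul and softmax compile under XLA.
    scores = tf.matmul(query, key, transpose_b=True)
    return tf.nn.softmax(scores, axis=-1)


weights = attention_weights(tf.ones((2, 4)), tf.ones((3, 4)))
print(weights.shape)  # (2, 3)
```

String ops generally do not compile under XLA, which is consistent with preprocessing being exempt from this requirement.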
70+
71+
## Support tf.data for text preprocessing and augmentation
72+
73+
In general, our preprocessing tools should be runnable inside a
74+
[tf.data](https://www.tensorflow.org/guide/data) pipeline, and any augmentation
75+
to training data should be dynamic (runnable on the fly during training rather
76+
than precomputed).
77+
78+
We should design our preprocessing workflows with tf.data in mind, and support
79+
both batched and unbatched data as input to preprocessing layers.
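
The sketch below shows what supporting both input shapes might look like in practice: the same function is mapped over an unbatched and a batched `tf.data.Dataset`. The `preprocess` function is a hypothetical stand-in for a preprocessing layer.

```python
import tensorflow as tf


def preprocess(text):
    # Works on a scalar string or a batch of strings, because the
    # tf.strings ops used here are elementwise.
    return tf.strings.lower(tf.strings.strip(text))


sentences = ["  The quick brown fox.  ", "The fox SLEPT."]
ds = tf.data.Dataset.from_tensor_slices(sentences)

unbatched = ds.map(preprocess)         # applied per example
batched = ds.batch(2).map(preprocess)  # applied per batch

unbatched_out = [x.numpy() for x in unbatched]
batched_out = [x.numpy().tolist() for x in batched]
print(unbatched_out)
print(batched_out)
```

Because the mapping runs inside the `tf.data` pipeline, it is applied on the fly as examples are consumed rather than precomputed.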
80+
81+
## Prioritize multi-lingual support
82+
83+
We strive to keep KerasNLP a friendly and useful library for speakers of all
84+
languages. In general, prefer designing workflows that are language agnostic,
85+
and do not involve logic (e.g. stemming) that needs to be rewritten
86+
per-language.
87+
88+
It is OK for new workflows to not come with out-of-the-box support for all
89+
languages in a first release, but a design that does not include a plan for
90+
multi-lingual support will be rejected.
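
As one possible illustration of language-agnostic logic, the sketch below splits strings into Unicode characters with `tf.strings.unicode_split`, which behaves the same way for any script. This is an assumed example of the principle, not a workflow prescribed by the guide.

```python
import tensorflow as tf

# "fox" in English, Chinese, and Russian.
words = tf.constant(["fox", "狐狸", "лиса"])

# Split on Unicode code points instead of bytes or English-specific rules.
chars = tf.strings.unicode_split(words, "UTF-8")
lengths = chars.row_lengths().numpy().tolist()
print(lengths)  # [3, 2, 4]
```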

CODE_OF_CONDUCT.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
# Code of Conduct
2+
3+
This project follows
4+
[Google's Open Source Community Guidelines](https://opensource.google/conduct/).

CONTRIBUTING.md

Lines changed: 49 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,38 @@
11
# Contribution guide
22

3-
## How to contribute code
3+
KerasNLP is an actively growing project and community! We would love for you
4+
to get involved. Below are instructions for how to plug into KerasNLP
5+
development.
6+
7+
## Background reading
8+
9+
Before contributing code, please review our [Style Guide](STYLE_GUIDE.md) and
10+
[API Design Guide](API_DESIGN_GUIDE.md).
11+
12+
Our [Roadmap](ROADMAP.md) contains an overview of the project goals and our
13+
current focus areas.
14+
15+
We follow
16+
[Google's Open Source Community Guidelines](https://opensource.google/conduct/).
17+
18+
## Finding an issue
19+
20+
The fastest way to contribute is to find open issues that need an assignee. We
21+
maintain two lists of GitHub tags for contributors:
22+
23+
- [good first issue](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22):
24+
a list of small, well defined issues for newcomers to the project.
25+
- [contributions welcome](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22):
26+
a larger list of issues that may range in complexity.
27+
28+
If you would like to propose a new symbol or feature, please first review our
29+
design guide and roadmap linked above, and open an issue to discuss. If you have a
30+
specific design in mind, please include a Colab notebook showing the proposed
31+
design in an end-to-end example. Keep in mind that design for a new feature or
32+
use case may take longer than contributing to an open issue with a
33+
vetted design.
34+
35+
## Contributing code
436

537
Follow these steps to submit your code contribution.
638

@@ -47,22 +79,7 @@ request gets approved by the reviewer.
4779

4880
Once the pull request is approved, a team member will take care of merging.
4981

50-
## Developing on Windows
51-
52-
For Windows development, we recommend using WSL (Windows Subsystem for Linux),
53-
so you can run the shell scripts in this repository. We will not support
54-
Windows Shell/PowerShell. You can refer
55-
[to these instructions](https://docs.microsoft.com/en-us/windows/wsl/install)
56-
for WSL installation.
57-
58-
Note that if you are using Windows Subsystem for Linux (WSL), make sure you
59-
clone the repo with Linux style LF line endings and change the default setting
60-
for line separator in your Text Editor before running the format
61-
or lint scripts. This is automatically done if you clone using git inside WSL.
62-
If there is conflict due to the line endings you might see an error
63-
like - `: invalid option`.
64-
65-
## Setup environment
82+
## Setting up an Environment
6683

6784
Python 3.7 or later is required.
6885

@@ -87,7 +104,7 @@ Following these commands you should be able to run the tests using
87104
`pytest keras_nlp`. Please report any issues running tests following these
88105
steps.
89106

90-
## Run tests
107+
## Testing changes
91108

92109
KerasNLP is tested using [PyTest](https://docs.pytest.org/en/6.2.x/).
93110

@@ -115,7 +132,7 @@ You can run the unit tests for KerasNLP by running:
115132
pytest keras_nlp/
116133
```
117134

118-
## Formatting the Code
135+
## Formatting Code
119136

120137
We use `flake8`, `isort` and `black` for code formatting. You can run
121138
the following commands manually every time you want to format your code:
@@ -127,7 +144,17 @@ If after running these the CI flow is still failing, try updating `flake8`,
127144
`isort` and `black`. This can be done by running `pip install --upgrade black`,
128145
`pip install --upgrade flake8`, and `pip install --upgrade isort`.
129146

130-
## Community Guidelines
147+
## Developing on Windows
131148

132-
This project follows
133-
[Google's Open Source Community Guidelines](https://opensource.google/conduct/).
149+
For Windows development, we recommend using WSL (Windows Subsystem for Linux),
150+
so you can run the shell scripts in this repository. We will not support
151+
Windows Shell/PowerShell. You can refer
152+
[to these instructions](https://docs.microsoft.com/en-us/windows/wsl/install)
153+
for WSL installation.
154+
155+
Note that if you are using Windows Subsystem for Linux (WSL), make sure you
156+
clone the repo with Linux style LF line endings and change the default setting
157+
for line separator in your Text Editor before running the format
158+
or lint scripts. This is automatically done if you clone using git inside WSL.
159+
If there is conflict due to the line endings you might see an error
160+
like - `: invalid option`.
File renamed without changes.

README.md

Lines changed: 72 additions & 54 deletions
Original file line numberDiff line numberDiff line change
@@ -4,78 +4,96 @@
44
![Tensorflow](https://img.shields.io/badge/tensorflow-v2.5.0+-success.svg)
55
[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/keras-team/keras-nlp/issues)
66

7-
KerasNLP is a repository of modular building blocks (e.g. layers, metrics, losses)
8-
to support modern Natural Language Processing (NLP) workflows.
9-
Engineers working with applied NLP can leverage it to
10-
rapidly assemble training and inference pipelines that are both state-of-the-art
11-
and production-grade. Common use cases for application include sentiment
12-
analysis, named entity recognition, text generation, etc.
7+
KerasNLP is a simple and powerful API for building Natural Language Processing
8+
(NLP) models within the Keras ecosystem.
139

14-
KerasNLP can be understood as a horizontal extension of the Keras API: they're
15-
new first-party Keras objects (layers, metrics, etc) that are too specialized to
16-
be added to core Keras, but that receive the same level of polish and backwards
17-
compatibility guarantees as the rest of the Keras API and that are maintained by
18-
the Keras team itself (unlike TFAddons).
10+
KerasNLP provides modular building blocks following
11+
standard Keras interfaces (layers, metrics) that allow you to quickly and
12+
flexibly iterate on your task. Engineers working in applied NLP can leverage the
13+
library to assemble training and inference pipelines that are both
14+
state-of-the-art and production-grade.
1915

20-
Currently, KerasNLP is operating pre-release. Upon launch of KerasNLP 1.0, full
21-
API docs and code examples will be available.
16+
KerasNLP can be understood as a horizontal extension of the Keras API —
17+
components are first-party Keras objects that are too specialized to be
18+
added to core Keras, but that receive the same level of polish as the rest of
19+
the Keras API.
2220

23-
## Contributors
21+
We are a new and growing project, and welcome [contributions](CONTRIBUTING.md).
2422

25-
If you'd like to contribute, please see our [contributing guide](CONTRIBUTING.md).
23+
## Quick Links
2624

27-
The fastest way to find a place to contribute is to browse our
28-
[open issues](https://github.com/keras-team/keras-nlp/issues) and find an
29-
unclaimed issue to work on. Issues with a [contributions welcome](
30-
https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22)
31-
tag are places where we are actively looking for support, and a
32-
[good first issue](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
33-
tag means we think this could be a accessible a first time contributor.
25+
- [Contributing](CONTRIBUTING.md)
26+
- [Roadmap](ROADMAP.md)
27+
- [Style Guide](STYLE_GUIDE.md)
28+
- [API Design Guide](API_DESIGN_GUIDE.md)
29+
- [Call for Contributions](https://github.com/keras-team/keras-nlp/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22)
3430

35-
If you would like to propose a new symbol or feature, please open an issue to
36-
discuss. Be aware the design for new features may take longer than contributing
37-
pre-planned features. If you have a design in mind, please include a colab
38-
notebook showing the proposed design in a end-to-end example. Make sure to
39-
follow the [Keras API design guidelines](
40-
https://github.com/keras-team/governance/blob/master/keras_api_design_guidelines.md).
31+
## Quick Start
4132

42-
## Roadmap
43-
44-
This is an early stage project, and we are actively working on a more detailed
45-
roadmap to share soon. For now, most of our immediate planning is done through
46-
GitHub issues.
47-
48-
At this stage, we are primarily building components for a short list of
49-
"greatest hits" NLP models (e.g. BERT, GPT-2, word2vec). We will be focusing
50-
on components that follow a established Keras interface (e.g.
51-
`keras.layers.Layer`, `keras.metrics.Metric`, or
52-
`keras_nlp.tokenizers.Tokenizer`).
53-
54-
As we progress further with the library, we will attempt to cover an ever
55-
expanding list of widely cited model architectures.
56-
57-
## Releases
58-
59-
KerasNLP release are documented on our
60-
[github release page](https://github.com/keras-team/keras-nlp/releases) and
61-
available to download from our [PyPI project](
62-
https://pypi.org/project/keras-nlp/).
63-
64-
To install KerasNLP and all it's dependencies, simply run:
33+
Install the latest release:
6534

6635
```
67-
pip install keras-nlp
36+
pip install keras-nlp --upgrade
37+
```
38+
39+
Tokenize text, build a tiny transformer, and train a single batch:
40+
41+
```python
import keras_nlp
import tensorflow as tf
from tensorflow import keras

# Tokenize some inputs with a binary label.
vocab = ["[UNK]", "the", "qu", "##ick", "br", "##own", "fox", "."]
sentences = ["The quick brown fox jumped.", "The fox slept."]
tokenizer = keras_nlp.tokenizers.WordPieceTokenizer(
    vocabulary=vocab,
    sequence_length=10,
)
x, y = tokenizer(sentences), tf.constant([1, 0])

# Create a tiny transformer.
inputs = keras.Input(shape=(None,), dtype="int32")
outputs = keras_nlp.layers.TokenAndPositionEmbedding(
    vocabulary_size=len(vocab),
    sequence_length=10,
    embedding_dim=16,
)(inputs)
outputs = keras_nlp.layers.TransformerEncoder(
    num_heads=4,
    intermediate_dim=32,
)(outputs)
outputs = keras.layers.GlobalAveragePooling1D()(outputs)
outputs = keras.layers.Dense(1, activation="sigmoid")(outputs)
model = keras.Model(inputs, outputs)

# Run a single batch of gradient descent.
model.compile(loss="binary_crossentropy", jit_compile=True)
model.train_on_batch(x, y)
```
6974

7075
## Compatibility
7176

7277
We follow [Semantic Versioning](https://semver.org/), and plan to
7378
provide backwards compatibility guarantees both for code and saved models built
74-
with our components. While we continue with pre-release `0.y.z` development, we
7579
may break compatibility at any time and APIs should not be considered stable.
7680

81+
## Citing KerasNLP
82+
83+
If KerasNLP helps your research, we appreciate your citations.
84+
Here is the BibTeX entry:
85+
86+
```bibtex
87+
@misc{kerasnlp2022,
88+
title={KerasNLP},
89+
author={Watson, Matthew and Qian, Chen and Zhu, Scott and Chollet, Fran\c{c}ois and others},
90+
year={2022},
91+
howpublished={\url{https://github.com/keras-team/keras-nlp}},
92+
}
93+
```
94+
7795
Thank you to all of our wonderful contributors!
7896

7997
<a href="https://github.com/keras-team/keras-nlp/graphs/contributors">
8098
<img src="https://contrib.rocks/image?repo=keras-team/keras-nlp" />
81-
</a>
99+
</a>
