| 1 | +# Text and natural language processing with TensorFlow |
| 2 | + |
| 3 | +Before you can train a model on text data, you'll typically need to process |
| 4 | +(or preprocess) the text. In many cases, text needs to be tokenized and |
| 5 | +vectorized before it can be fed to a model, and in some cases the text requires |
| 6 | +additional preprocessing steps such as normalization and feature selection. |
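
The steps named above (normalization, tokenization, feature selection, and vectorization) can be sketched in plain Python. This is a library-free illustration only, not the TensorFlow, KerasNLP, or TensorFlow Text API; the function names, the choice of id 0 for padding and id 1 for out-of-vocabulary tokens, and the toy corpus are all assumptions made for the example.

```python
import re
from collections import Counter

PAD_ID = 0  # assumed id for padding
OOV_ID = 1  # assumed id for out-of-vocabulary tokens


def normalize(text):
    """Lowercase the text and drop punctuation (one simple normalization)."""
    return re.sub(r"[^\w\s]", "", text.lower())


def tokenize(text):
    """Normalize the text, then split it on whitespace."""
    return normalize(text).split()


def build_vocab(corpus, max_tokens=None):
    """Map each token to an integer id, most frequent tokens first."""
    counts = Counter(tok for doc in corpus for tok in tokenize(doc))
    # Sort by descending frequency, then alphabetically so ids are deterministic.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    if max_tokens is not None:
        # Keeping only the most frequent tokens is a crude form of feature selection.
        ordered = ordered[:max_tokens]
    return {tok: i + 2 for i, tok in enumerate(ordered)}


def vectorize(text, vocab, length):
    """Turn text into a fixed-length list of integer ids."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokenize(text)][:length]
    return ids + [PAD_ID] * (length - len(ids))


corpus = ["The cat sat.", "The dog sat down!"]
vocab = build_vocab(corpus)
vectors = [vectorize(doc, vocab, length=5) for doc in corpus]
print(vectors)
```

Fixed-length integer sequences like these are a common input format for an embedding layer. In practice, a library layer would usually replace these hand-written helpers.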
| 7 | + |
| 8 | +After text is processed into a suitable format, you can use it in natural |
| 9 | +language processing (NLP) workflows such as text classification, text |
| 10 | +generation, summarization, and translation. |
| 11 | + |
| 12 | +TensorFlow provides two libraries for text and natural language processing: |
| 13 | +KerasNLP ([GitHub](https://github.com/keras-team/keras-nlp)) and |
| 14 | +TensorFlow Text ([GitHub](https://github.com/tensorflow/text)). |
| 15 | + |
| 16 | +KerasNLP is a high-level NLP modeling library that includes many recent |
| 17 | +transformer-based models as well as lower-level tokenization utilities. It's the |
| 18 | +recommended solution for most NLP use cases. Built on TensorFlow Text, KerasNLP |
| 19 | +abstracts low-level text processing operations into an API that's designed for |
| 20 | +ease of use. But if you prefer not to work with the Keras API, or you need |
| 21 | +access to the lower-level text processing ops, you can use TensorFlow Text |
| 22 | +directly. |
| 23 | + |
| 24 | +## KerasNLP |
| 25 | + |
| 26 | +The easiest way to get started processing text in TensorFlow is to use |
| 27 | +[KerasNLP](https://keras.io/keras_nlp/). KerasNLP is a natural language |
| 28 | +processing library that supports workflows built from modular components that |
| 29 | +have state-of-the-art preset weights and architectures. You can use KerasNLP |
| 30 | +components with their out-of-the-box configuration. If you need more control, |
| 31 | +you can easily customize components. KerasNLP provides in-graph computation for |
| 32 | +all workflows, which simplifies deploying models to production with the |
| 33 | +TensorFlow ecosystem. |
| 34 | + |
| 35 | +KerasNLP contains end-to-end implementations of popular |
| 36 | +[model architectures](https://keras.io/api/keras_nlp/models/) like |
| 37 | +[BERT](https://keras.io/api/keras_nlp/models/bert/) and |
| 38 | +[FNet](https://keras.io/api/keras_nlp/models/f_net/). Using KerasNLP models, |
| 39 | +layers, and tokenizers, you can complete many state-of-the-art NLP workflows, |
| 40 | +including |
| 41 | +[machine translation](https://keras.io/examples/nlp/neural_machine_translation_with_keras_nlp/), |
| 42 | +[text generation](https://keras.io/examples/generative/text_generation_gpt/), |
| 43 | +[text classification](https://keras.io/examples/nlp/fnet_classification_with_keras_nlp/), |
| 44 | +and |
| 45 | +[transformer model training](https://keras.io/guides/keras_nlp/transformer_pretraining/). |
| 46 | + |
| 47 | +KerasNLP is an extension of the core Keras API, and every high-level KerasNLP |
| 48 | +module is a `Layer` or `Model`. If you're familiar with Keras, you already |
| 49 | +understand most of KerasNLP. |
| 50 | + |
| 51 | +## TensorFlow Text |
| 52 | + |
| 53 | +KerasNLP provides high-level text processing modules that are available as |
| 54 | +layers or models. If you need access to lower-level tools, you can use |
| 55 | +[TensorFlow Text](https://www.tensorflow.org/text/guide/tf_text_intro). |
| 56 | +TensorFlow Text provides operations and libraries to help you work with raw text |
| 57 | +strings and documents. TensorFlow Text can perform the preprocessing regularly |
| 58 | +required by text-based models, and it also includes other features useful for |
| 59 | +sequence modeling. |
| 60 | + |
| 61 | +Using TensorFlow Text, you can do the following: |
| 62 | + |
| 63 | +* Apply feature-rich tokenizers that can split strings on whitespace, separate |
| 64 | + words and punctuation, and return byte offsets with tokens, so that you know |
| 65 | + where each token is located in the source text. |
| 66 | +* Check if a token matches a specified string pattern. You can check for |
| 67 | + capitalization, punctuation, numerical data, and other token features. |
| 68 | +* Combine tokens into n-grams. |
| 69 | +* Process text within the TensorFlow graph, so that tokenization during training |
| 70 | + matches tokenization at inference. |
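
To make the first three capabilities concrete, here is a conceptual sketch in plain Python. It does not use the TensorFlow Text API, which provides these operations as graph ops; the helper names below are hypothetical and the sample string is made up for the example.

```python
import re


def tokenize_with_offsets(text):
    """Split on whitespace and return (token, start_byte, end_byte) triples.

    Offsets index into the UTF-8 encoding of the text, so a multi-byte
    character such as "é" advances the offset by more than one.
    """
    data = text.encode("utf-8")
    return [
        (match.group().decode("utf-8"), match.start(), match.end())
        for match in re.finditer(rb"\S+", data)
    ]


def has_title_case(token):
    """True if the first character is uppercase and the rest is lowercase."""
    return token[:1].isupper() and token[1:].islower()


def is_numeric(token):
    """True if the token consists only of digits."""
    return token.isdigit()


def ngrams(tokens, n, sep=" "):
    """Join each run of n consecutive tokens into a single string."""
    return [sep.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


triples = tokenize_with_offsets("Héllo world 42")
tokens = [tok for tok, _, _ in triples]

print(triples)
print([has_title_case(tok) for tok in tokens])
print([is_numeric(tok) for tok in tokens])
print(ngrams(tokens, 2))
```

Because the offsets are byte positions, slicing the encoded source text with a token's start and end offsets recovers that token exactly.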
| 71 | + |
| 72 | +## Where to start |
| 73 | + |
| 74 | +The following resources will help you get started with TensorFlow text |
| 75 | +processing: |
| 76 | + |
| 77 | +* [TensorFlow Text](https://www.tensorflow.org/text): Tutorials, guides, and |
| 78 | + other resources to help you process text using TensorFlow Text and KerasNLP. |
| 79 | +* [KerasNLP](https://keras.io/keras_nlp/): Documentation and resources for |
| 80 | + KerasNLP. |
| 81 | + * [Getting Started with KerasNLP](https://keras.io/guides/keras_nlp/getting_started/) |
| 82 | + * [Pretraining a Transformer from scratch with KerasNLP](https://keras.io/guides/keras_nlp/transformer_pretraining/) |
| 83 | +* [TensorFlow tutorials](https://www.tensorflow.org/tutorials): The core |
| 84 | + TensorFlow documentation (this guide) includes several text processing |
| 85 | + tutorials. |
| 86 | + * [Basic text classification](https://www.tensorflow.org/tutorials/keras/text_classification) |
| 87 | + * [Text classification with TensorFlow Hub: Movie reviews](https://www.tensorflow.org/tutorials/keras/text_classification_with_hub) |
| 88 | + * [Load text](https://www.tensorflow.org/tutorials/load_data/text) |
| 89 | + * [word2vec](https://www.tensorflow.org/tutorials/text/word2vec) |
| 90 | + * [Warm-start embedding layer matrix](https://www.tensorflow.org/tutorials/text/warmstart_embedding_matrix) |
| 91 | + * [Image captioning with visual attention](https://www.tensorflow.org/tutorials/text/image_captioning) |
| 92 | +* [Google Machine Learning: Text Classification guide](https://developers.google.com/machine-learning/guides/text-classification): |
| 93 | + A step-by-step introduction to text classification. This is a good place to |
| 94 | + start if you're new to machine learning. |