Commit 84bd7be

pcoet authored and copybara-github committed
Add intros to TensorFlow text processing tutorials and section.
PiperOrigin-RevId: 524068999
1 parent 7b3bd44 commit 84bd7be

2 files changed: +96 -0 lines changed

site/en/tutorials/_toc.yaml

Lines changed: 2 additions & 0 deletions
@@ -125,6 +125,8 @@ toc:
 - title: "Text"
   style: accordion
   section:
+  - title: "Text and natural language processing"
+    path: /tutorials/text/index
   - title: "Word embeddings"
     path: /text/guide/word_embeddings
     status: external

site/en/tutorials/text/index.md

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
# Text and natural language processing with TensorFlow

Before you can train a model on text data, you'll typically need to process
(or preprocess) the text. In many cases, text needs to be tokenized and
vectorized before it can be fed to a model, and in some cases the text requires
additional preprocessing steps such as normalization and feature selection.

After text is processed into a suitable format, you can use it in natural
language processing (NLP) workflows such as text classification, text
generation, summarization, and translation.

TensorFlow provides two libraries for text and natural language processing:
KerasNLP ([GitHub](https://github.com/keras-team/keras-nlp)) and
TensorFlow Text ([GitHub](https://github.com/tensorflow/text)).

KerasNLP is a high-level NLP modeling library that includes many recent
transformer-based models as well as lower-level tokenization utilities. It's the
recommended solution for most NLP use cases. Built on TensorFlow Text, KerasNLP
abstracts low-level text processing operations into an API that's designed for
ease of use. But if you prefer not to work with the Keras API, or you need
access to the lower-level text processing ops, you can use TensorFlow Text
directly.
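
To make the preprocessing terms in this introduction concrete, here is a minimal pure-Python sketch of normalization, tokenization, and vectorization. It does not use KerasNLP or TensorFlow Text; the function names (`normalize`, `tokenize`, `build_vocab`, `vectorize`), the reserved IDs for padding and unknown tokens, and the sample corpus are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

PAD_ID = 0  # assumed ID for padding
UNK_ID = 1  # assumed ID for out-of-vocabulary tokens


def normalize(text: str) -> str:
    """Lowercase the text and replace punctuation with spaces."""
    return re.sub(r"[^\w\s]", " ", text.lower())


def tokenize(text: str) -> List[str]:
    """Normalize, then split on whitespace."""
    return normalize(text).split()


def build_vocab(corpus: List[str], max_tokens: Optional[int] = None) -> Dict[str, int]:
    """Map tokens to integer IDs, most frequent first.

    Capping the vocabulary with max_tokens is a crude form of feature
    selection: rare tokens fall back to UNK_ID.
    """
    counts = Counter(tok for doc in corpus for tok in tokenize(doc))
    # Sort by descending frequency, then alphabetically, so IDs are deterministic.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    if max_tokens is not None:
        ordered = ordered[: max(max_tokens - 2, 0)]
    vocab = {"": PAD_ID, "[UNK]": UNK_ID}
    for tok in ordered:
        vocab[tok] = len(vocab)
    return vocab


def vectorize(text: str, vocab: Dict[str, int], sequence_length: int) -> List[int]:
    """Convert text to a fixed-length list of token IDs."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokenize(text)][:sequence_length]
    return ids + [PAD_ID] * (sequence_length - len(ids))


corpus = ["The cat sat.", "The dog sat, too!"]
vocab = build_vocab(corpus)
vector = vectorize("The bird sat", vocab, sequence_length=5)
print(vector)
```

In this sketch, "bird" is not in the vocabulary, so it maps to the unknown-token ID, and the sequence is padded to the requested length. The libraries described in this document perform the same kind of mapping with tensors rather than Python lists.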

## KerasNLP

The easiest way to get started processing text in TensorFlow is to use
[KerasNLP](https://keras.io/keras_nlp/). KerasNLP is a natural language
processing library that supports workflows built from modular components that
have state-of-the-art preset weights and architectures. You can use KerasNLP
components with their out-of-the-box configuration. If you need more control,
you can customize components. KerasNLP provides in-graph computation for
all workflows, which simplifies deployment to production with the TensorFlow
ecosystem.

KerasNLP contains end-to-end implementations of popular
[model architectures](https://keras.io/api/keras_nlp/models/) like
[BERT](https://keras.io/api/keras_nlp/models/bert/) and
[FNet](https://keras.io/api/keras_nlp/models/f_net/). Using KerasNLP models,
layers, and tokenizers, you can complete many state-of-the-art NLP workflows,
including
[machine translation](https://keras.io/examples/nlp/neural_machine_translation_with_keras_nlp/),
[text generation](https://keras.io/examples/generative/text_generation_gpt/),
[text classification](https://keras.io/examples/nlp/fnet_classification_with_keras_nlp/),
and
[transformer model training](https://keras.io/guides/keras_nlp/transformer_pretraining/).

KerasNLP is an extension of the core Keras API, and every high-level KerasNLP
module is a `Layer` or `Model`. If you're familiar with Keras, you already
understand most of KerasNLP.

## TensorFlow Text

KerasNLP provides high-level text processing modules that are available as
layers or models. If you need access to lower-level tools, you can use
[TensorFlow Text](https://www.tensorflow.org/text/guide/tf_text_intro).
TensorFlow Text provides operations and libraries to help you work with raw text
strings and documents. TensorFlow Text can perform the preprocessing regularly
required by text-based models, and it also includes other features useful for
sequence modeling.

Using TensorFlow Text, you can do the following:

* Apply feature-rich tokenizers that can split strings on whitespace, separate
  words and punctuation, and return byte offsets with tokens, so that you know
  where each token is located in the source text.
* Check if a token matches a specified string pattern. You can check for
  capitalization, punctuation, numerical data, and other token features.
* Combine tokens into n-grams.
* Process text within the TensorFlow graph, so that tokenization during training
  matches tokenization at inference.
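
The operations in the list above can be illustrated with a small pure-Python sketch. TensorFlow Text exposes comparable operations (for example, `WhitespaceTokenizer.tokenize_with_offsets`, `wordshape`, and `ngrams`), but the helper names, dictionary keys, and return shapes below are hypothetical and are not that library's API.

```python
import re
from typing import Dict, List, Tuple


def whitespace_tokenize_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Split on ASCII whitespace and return (token, start, end) tuples.

    Offsets are byte positions in the UTF-8 encoding of the text, so a
    multi-byte character such as "é" advances the offset by more than one.
    """
    data = text.encode("utf-8")
    return [
        (match.group().decode("utf-8"), match.start(), match.end())
        for match in re.finditer(rb"\S+", data)
    ]


def word_shape(token: str) -> Dict[str, bool]:
    """Report a few simple token features (an illustrative subset)."""
    return {
        "is_capitalized": token[:1].isupper(),
        "is_numeric": token.isdigit(),
        "has_punctuation": any(not ch.isalnum() for ch in token),
    }


def ngrams(tokens: List[str], width: int, separator: str = " ") -> List[str]:
    """Join every run of `width` consecutive tokens into one string."""
    return [
        separator.join(tokens[i : i + width])
        for i in range(len(tokens) - width + 1)
    ]


# Using the same functions for training and inference inputs keeps the
# tokenization consistent, which is the point of in-graph processing.
for token, start, end in whitespace_tokenize_with_offsets("Hello, café 42"):
    print(token, start, end, word_shape(token))

print(ngrams(["natural", "language", "processing"], width=2))
```

Because the offsets index into the encoded bytes, slicing the UTF-8 bytes of the source text with a token's start and end values recovers that token exactly.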

## Where to start

The following resources will help you get started with TensorFlow text
processing:

* [TensorFlow Text](https://www.tensorflow.org/text): Tutorials, guides, and
  other resources to help you process text using TensorFlow Text and KerasNLP.
* [KerasNLP](https://keras.io/keras_nlp/): Documentation and resources for
  KerasNLP.
    * [Getting Started with KerasNLP](https://keras.io/guides/keras_nlp/getting_started/)
    * [Pretraining a Transformer from scratch with KerasNLP](https://keras.io/guides/keras_nlp/transformer_pretraining/)
* [TensorFlow tutorials](https://www.tensorflow.org/tutorials): The core
  TensorFlow documentation (this guide) includes several text processing
  tutorials.
    * [Basic text classification](https://www.tensorflow.org/tutorials/keras/text_classification)
    * [Text classification with TensorFlow Hub: Movie reviews](https://www.tensorflow.org/tutorials/keras/text_classification_with_hub)
    * [Load text](https://www.tensorflow.org/tutorials/load_data/text)
    * [word2vec](https://www.tensorflow.org/tutorials/text/word2vec)
    * [Warm-start embedding layer matrix](https://www.tensorflow.org/tutorials/text/warmstart_embedding_matrix)
    * [Image captioning with visual attention](https://www.tensorflow.org/tutorials/text/image_captioning)
* [Google Machine Learning: Text Classification guide](https://developers.google.com/machine-learning/guides/text-classification):
  A step-by-step introduction to text classification. This is a good place to
  start if you're new to machine learning.
