This repository was archived by the owner on Jul 10, 2025. It is now read-only.
 |**RFC #**|[NNN](https://github.com/tensorflow/community/pull/NNN) (update when you have community PR #)|
-|**Author(s)**| Terry Huang (Google)|
+|**Author(s)**| Terry Huang (Google), Arno Eigenwillig (Google), Chen Chen (Google)|
 |**Sponsor**| Xiaodan Song(Google), Greg Billock (Google), Mark Omernick (Google) |
 |**Updated**| 2020-08-24 |

@@ -27,7 +27,7 @@ Additionally, many existing Python methods write out processed outputs to files
 The proposed new set of text preprocessing APIs will allow users to:
 -**Assemble TF input pipelines w/ reusable, well-tested, standard building blocks** that transform their text datasets into model inputs. Being part of the TF graph also enables users to make preprocessing choices dynamically on the fly.
 -**Drastically simplify their model’s inputs to just text.** Users will be able to easily expand to new datasets for training, evaluation or inference. Models deployed to TF Serving can start from text inputs and encapsulate the details of preprocessing.
--**Reduce risks of training/serving skew** by giving models stronger ownership of the entire preprocessing and postprocessing process.
+-**Reduce risks of training/serving skew** by giving models stronger ownership of the entire preprocessing process.
 -**Reduced complexity and improved input pipeline efficiency** by removing an extra read & write step to transform their datasets and improved efficiency w/ vectorized mapping by processing inputs in batches.
-The output of the tf.data pipeline is integer inputs transformed from the raw text and can be fed directly to the model (e.g., bert_pretraining model in model_garden):
-
+The outputs of the tf.data pipeline are integer inputs transformed from the raw text and can be fed directly to the model:
+
 ```
 {
 'input_ids': [
@@ -231,7 +231,7 @@ class SplitterWithOffsets(Splitter):
 """
 ```

-Splitter subclasses can implement different algorithms for segmenting strings and can even be a trained TF model. We also introduce two concrete implementations of Splitter: RegexSplitter and StateBasedSentenceBreaker).
+Splitter subclasses can implement different algorithms for segmenting strings and can even be a trained TF model. We also introduce two concrete implementations of Splitter: `RegexSplitter` and `StateBasedSentenceBreaker`).
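The RFC describes `Splitter` subclasses only by the segmentation algorithm they implement. As a minimal, framework-free sketch, assuming plain Python strings and lists in place of TF tensors, the classes below show how an offset-aware splitter might derive its plain `split` from `split_with_offsets`. The method names, the `(pieces, starts, ends)` return layout, and `SimpleRegexSplitter` with its `delim_pattern` parameter are assumptions for illustration; this is not the RFC's `RegexSplitter` or the TF Text API.

```python
import abc
import re
from typing import List, Tuple


class Splitter(abc.ABC):
    """Hypothetical pure-Python stand-in for the RFC's Splitter interface."""

    @abc.abstractmethod
    def split(self, text: str) -> List[str]:
        """Splits `text` into a list of pieces."""


class SplitterWithOffsets(Splitter):
    """Stand-in that also reports where each piece sits in the input."""

    @abc.abstractmethod
    def split_with_offsets(self, text: str) -> Tuple[List[str], List[int], List[int]]:
        """Returns (pieces, start_offsets, end_offsets); end offsets are exclusive."""

    def split(self, text: str) -> List[str]:
        # The plain split is derived from the offset-aware one.
        pieces, _, _ = self.split_with_offsets(text)
        return pieces


class SimpleRegexSplitter(SplitterWithOffsets):
    """Hypothetical splitter that cuts `text` wherever `delim_pattern` matches."""

    def __init__(self, delim_pattern: str = r"\s+"):
        self._delim = re.compile(delim_pattern)

    def split_with_offsets(self, text: str) -> Tuple[List[str], List[int], List[int]]:
        pieces, starts, ends = [], [], []
        pos = 0  # start of the piece currently being scanned
        for match in self._delim.finditer(text):
            # Skip empty pieces (a leading delimiter or two adjacent delimiters).
            if match.start() > pos:
                pieces.append(text[pos:match.start()])
                starts.append(pos)
                ends.append(match.start())
            pos = match.end()
        if pos < len(text):  # keep any trailing piece after the last delimiter
            pieces.append(text[pos:])
            starts.append(pos)
            ends.append(len(text))
        return pieces, starts, ends
```

For example, `SimpleRegexSplitter(r"[.!?]\s*").split_with_offsets("Hi there. How are you?")` returns `(["Hi there", "How are you"], [0, 10], [8, 21])`, and `text[start:end]` recovers each piece. Returning offsets alongside the pieces lets later steps map segments back to positions in the raw input.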