lib/tokenizers.ex
11 additions & 8 deletions
@@ -4,16 +4,19 @@ defmodule Tokenizers do
 
 Hugging Face describes the Tokenizers library as:
 
-> Fast State-of-the-art tokenizers, optimized for both research and production
+> Fast State-of-the-art tokenizers, optimized for both research and
+> production
 >
-> 🤗 Tokenizers provides an implementation of today’s most used tokenizers, with a focus on performance and versatility. These tokenizers are also used in 🤗 Transformers.
+> 🤗 Tokenizers provides an implementation of today’s most used
+> tokenizers, with a focus on performance and versatility. These
+> tokenizers are also used in 🤗 Transformers.
 
-This library has bindings to use pretrained tokenizers. Support for building and training
-a tokenizer from scratch is forthcoming.
+A tokenizer is effectively a pipeline of transformations that take
+a text input and return an encoded version of that text (`t:Tokenizers.Encoding.t/0`).
 
-A tokenizer is effectively a pipeline of transforms to take some input text and return a
-`Tokenizers.Encoding.t()`. The main entrypoint to this library is the `Tokenizers.Tokenizer`
-module, which holds the `Tokenizers.Tokenizer.t()` struct, a container holding the constituent
-parts of the pipeline. Most functionality is there.
+The main entrypoint to this library is the `Tokenizers.Tokenizer`
+module, which defines the `t:Tokenizers.Tokenizer.t/0` struct, a
+container holding the constituent parts of the pipeline. Most