Skip to content

Commit 0bc7dc0

Browse files
authored
Fix typo (#532)
1 parent d229ff7 commit 0bc7dc0

File tree

1 file changed

+1
-1
lines changed
  • chapters/en/chapter6

1 file changed

+1
-1
lines changed

chapters/en/chapter6/3.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -109,7 +109,7 @@ We can see that the tokenizer's special tokens `[CLS]` and `[SEP]` are mapped to
109109

110110
<Tip>
111111

112-
The notion of what a word is is complicated. For instance, does "I'll" (a contraction of "I will") count as one or two words? It actually depends on the tokenizer and the pre-tokenization operation it applies. Some tokenizers just split on spaces, so they will consider this as one word. Others use punctuation on top of spaces, so will consider it two words.
112+
The notion of what a word is complicated. For instance, does "I'll" (a contraction of "I will") count as one or two words? It actually depends on the tokenizer and the pre-tokenization operation it applies. Some tokenizers just split on spaces, so they will consider this as one word. Others use punctuation on top of spaces, so will consider it two words.
113113

114114
✏️ **Try it out!** Create a tokenizer from the `bert-base-cased` and `roberta-base` checkpoints and tokenize "81s" with them. What do you observe? What are the word IDs?
115115

0 commit comments

Comments
 (0)