Hi, the tokenizer returns a Doc object rather than just a list of tokens. You can inspect the tokens like this and see that there are 6:

```python
doc = tokenizer(s)
print([t.text for t in doc])
# ['Hello', 'world', ',', 'I', 'am', 'Zaf']
```
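For anyone who wants to reproduce this from scratch, here is a minimal self-contained sketch. The thread does not show how `tokenizer` and `s` were defined, so both are assumptions here: the tokenizer is taken from a blank English pipeline via `spacy.blank("en")`, and the input string is a guess reconstructed from the output above.

```python
import spacy

# Assumption: a blank English pipeline, since the thread does not show
# how the original `tokenizer` was created.
nlp = spacy.blank("en")
tokenizer = nlp.tokenizer

# Assumption: hypothetical input string, guessed from the token list above.
s = "Hello world, I am Zaf"

# Calling the tokenizer returns a Doc object, not a plain list.
doc = tokenizer(s)
print(type(doc).__name__)

# A Doc supports len() and iteration over its Token objects.
print(len(doc))

# Collect the token texts into an ordinary Python list.
tokens = [t.text for t in doc]
print(tokens)
```

Since the `Doc` is a container of `Token` objects, `len(doc)` gives the token count directly, and the list comprehension is only needed when you want plain strings.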

Answer selected by ines