Sorry you're having trouble with this. The issue is that the tokenizer is treating 独伊 as a single token, and so your character boundaries don't align with token boundaries. In that case char_span returns None, as mentioned in the docs.
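For illustration, here is a minimal sketch of that behaviour. The example sentence and the blank Japanese pipeline are my own assumptions (it needs SudachiPy installed, and the exact token boundaries depend on your SudachiPy dictionary, so 独伊 may or may not come out as a single token on your setup). It also shows the `alignment_mode="expand"` option of `Doc.char_span`, which snaps the span out to token boundaries instead of returning None:

```python
import spacy

# Blank Japanese pipeline; tokenization depends on the SudachiPy dictionary.
nlp = spacy.blank("ja")
doc = nlp("独伊両国は条約に署名した")
print([t.text for t in doc])  # e.g. ['独伊', '両国', ...] if 独伊 is one token

# Characters 0-1 cover only 独, which sits inside the token 独伊, so the
# requested character span does not line up with token boundaries and
# char_span returns None (the default alignment_mode is "strict").
print(doc.char_span(0, 1))  # None

# alignment_mode="expand" widens the span to the surrounding token
# boundaries instead of returning None.
print(doc.char_span(0, 1, alignment_mode="expand"))  # 独伊
```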

Answer selected by adrianeboyd
Labels: lang / ja (Japanese language data and models), feat / doc (Feature: Doc, Span and Token objects)

This discussion was converted from issue #12081 on January 10, 2023 08:26.