However, this does not work for the Japanese model (the fine-grained POS tags are missing).

Japanese fine-grained part-of-speech tags are taken directly from SudachiPy output, so spaCy has no model for them. You can train a Tagger component to provide the tags; I know that's been done before with pretty good results.
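As an illustration of what "taken directly from SudachiPy output" means, here is a minimal sketch. SudachiPy's `Morpheme.part_of_speech()` returns a tuple of fields with `*` as a placeholder for unused positions; the sketch below joins the informative fields into one fine-grained tag string. The joining convention shown (hyphen-joined, `*` dropped) is an assumption for illustration, not a guaranteed description of spaCy's internals:

```python
def join_pos(pos_fields):
    """Join the non-placeholder fields of a SudachiPy-style POS tuple
    into a single fine-grained tag string (e.g. '名詞-普通名詞-一般').

    '*' marks unused fields in SudachiPy output, so those are skipped.
    """
    return "-".join(field for field in pos_fields if field != "*")


# Example tuple in the shape SudachiPy returns for a common noun:
print(join_pos(("名詞", "普通名詞", "一般", "*", "*", "*")))
# → 名詞-普通名詞-一般
```

A trained Tagger component would predict exactly these joined strings as `token.tag_`, instead of copying them from the tokenizer.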

For most Japanese tokenizers, tokenization is done jointly with (pseudo-)POS tag assignment, so I would expect your source of tokens to also give you tags. What tokenizer are you using?

Replies: 1 comment 3 replies

3 replies
@BLKSerene
@BLKSerene
@polm
Answer selected by svlandeg
Labels: lang / ja (Japanese language data and models), feat / tagger (Feature: Part-of-speech tagger), feat / morphologizer (Feature: Morphologizer)
2 participants