Each model lists its data sources in nlp.meta and meta.json, and on the model pages under https://spacy.io/models. The fine-grained tags in token.tag are usually language-specific and often corpus-specific as well, so the corpus documentation is the place to look for details. English uses the PTB (Penn Treebank) tagset and German uses the STTS tagset.
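As a rough sketch of how the source list could be read out of that metadata: the helper names `describe_sources` and `load_meta` are made up for this example, and it assumes each entry under "sources" is either a plain string or a dict with optional "name", "url" and "license" keys. The sample data is hypothetical and does not describe any real model.

```python
import json
from pathlib import Path


def describe_sources(meta):
    """Return one readable line per entry in meta["sources"].

    Assumption: each entry is either a plain string or a dict with
    optional "name", "url" and "license" keys.
    """
    lines = []
    for source in meta.get("sources") or []:
        if isinstance(source, dict):
            name = source.get("name", "(unnamed)")
            extras = [source[key] for key in ("url", "license") if source.get(key)]
            lines.append(f"{name} ({', '.join(extras)})" if extras else name)
        else:
            lines.append(str(source))
    return lines


def load_meta(path):
    """Read a meta.json file from disk into a dict."""
    return json.loads(Path(path).read_text(encoding="utf8"))


# Hypothetical metadata shaped like a meta.json file (not real model data).
example_meta = {
    "lang": "xx",
    "name": "example_pipeline",
    "sources": [
        {
            "name": "Example Corpus",
            "url": "https://example.org/corpus",
            "license": "CC BY 4.0",
        },
        "Another Example Corpus",
    ],
}

for line in describe_sources(example_meta):
    print(line)
```

Since nlp.meta is a plain dict on a loaded pipeline, the same helper should work as `describe_sources(nlp.meta)`, or as `describe_sources(load_meta(...))` on a meta.json file from a model package.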

In contrast, token.pos uses the universal POS tags from the Universal Dependencies project, which form a single tagset shared by all languages. The correspondence is not strictly one-to-one for every corpus and language in every detailed case, but the tags are applied relatively consistently across languages.
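To make the difference concrete, here is a small illustration of how language-specific fine-grained tags relate to the shared universal tags. The tables are hand-written, deliberately incomplete, and list only typical correspondences; this is not how spaCy computes token.pos, and the `coarse_tag` helper is made up for this example.

```python
# Typical correspondences only. Real corpora have exceptions: for example,
# a PTB "VBZ" token that is an auxiliary is tagged AUX, not VERB, in UD.
PTB_TO_UPOS = {  # English fine-grained tags (Penn Treebank)
    "NN": "NOUN",
    "NNS": "NOUN",
    "NNP": "PROPN",
    "VBD": "VERB",
    "VBZ": "VERB",
    "JJ": "ADJ",
    "DT": "DET",
}

STTS_TO_UPOS = {  # German fine-grained tags (STTS)
    "NN": "NOUN",
    "NE": "PROPN",
    "VVFIN": "VERB",
    "ADJA": "ADJ",
    "ART": "DET",
}

TAGSETS = {"en": PTB_TO_UPOS, "de": STTS_TO_UPOS}


def coarse_tag(lang, fine_tag):
    """Look up the typical universal tag, or None if the tag is not listed."""
    return TAGSETS[lang].get(fine_tag)


# Different fine-grained tags per language, same universal tag.
print(coarse_tag("en", "NNP"), coarse_tag("de", "NE"))

# Several fine-grained tags collapse into one universal tag.
print(coarse_tag("en", "VBD"), coarse_tag("en", "VBZ"))
```

In spaCy itself, the string values are available as `token.tag_` and `token.pos_`, and `spacy.explain("VBZ")` returns a short description for many tag labels.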

Answer selected by adrianeboyd
Labels
lang / en: English language data and models
lang / de: German language data and models
models: Issues related to the statistical models