Training data for English language models #9533
-
Hi, the pretrained English pipelines don't actually contain a morphologizer. Not all of the UD categories can be mapped easily from the OntoNotes annotation, so there may be some errors in the results, e.g. #8856 (reply in thread). We're working on updating some of the AUX/VERB and
-
Hey!
I'm extending some of my learning materials for spaCy and came across a question that I couldn't find an answer to.
According to the spaCy docs, the English language models are trained on the OntoNotes 5 corpus, ClearNLP (for converting the treebanks) and WordNet. But which data is the Morphologizer component of the English pipelines trained on? Some Universal Dependencies dataset?