How to properly mock a spacy Vocab? #9255
-
Following this question on SO, I would like to create

    doc = Doc(vocab, words=['This', 'is', 'a', 'sentence', '.'],
              spaces=[True, True, True, False, True],
              tags=['DT', 'VBZ', 'DT', 'NN', '.'],
              pos=['DET', 'AUX', 'DET', 'NOUN', 'PUNCT'],
              lemmas=['this', 'be', 'a', 'sentence', '.'],
              sent_starts=[True, False, False, False, False],
              heads=[1, 1, 3, 1, 1],
              deps=['nsubj', 'ROOT', 'det', 'attr', 'punct'])

However, if I use the English [...] Having looked at the [...]

Thanks!
Replies: 1 comment 4 replies
-
The different behavior in displaCy is because it has a `collapse_punct` option, which is True by default, that merges punctuation with preceding tokens. So if the period is flagged as punctuation it will be merged.

The reason your periods are not getting flagged is that by calling the object constructor directly you are not getting any of the default functions for setting lex attributes. You can get them by using `spacy.vocab.create_vocab` instead, though I would recommend just stealing the Vocab from a blank English pipeline (`spacy.blank("en")`).
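
To make the difference concrete, here is a minimal sketch, assuming spaCy v3 is installed (no trained model is needed for a blank pipeline). It builds the same tokens once on a bare `Vocab()` and once on the vocab taken from `spacy.blank("en")`, then compares the `is_punct` flag. The variable names `bare_doc` and `en_doc` are just for illustration.

```python
import spacy
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["This", "is", "a", "sentence", "."]

# A Vocab built directly from the constructor has no lexical attribute
# getters, so flags such as is_punct are never computed for its lexemes.
bare_doc = Doc(Vocab(), words=words)

# A blank English pipeline ships the English defaults, including the
# lexical attribute getters, so its vocab can be reused for a hand-built Doc.
nlp = spacy.blank("en")
en_doc = Doc(nlp.vocab, words=words)

# Compare how the final period is flagged under each vocab.
print("bare Vocab:   ", [(t.text, t.is_punct) for t in bare_doc])
print("English vocab:", [(t.text, t.is_punct) for t in en_doc])
```

With the English vocab the final period should be flagged as punctuation, which is what displaCy's `collapse_punct` looks at. If you would rather keep the period as its own token in the rendering, passing `options={"collapse_punct": False}` to `displacy.render` should disable the merge.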