displaCy behaves differently because it has a collapse_punct option, True by default, that merges punctuation into the preceding token. So if the period is flagged as punctuation, it gets merged.
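To see the effect of collapse_punct without opening the visualizer, here is a minimal sketch that assumes spaCy v3 and uses displacy.parse_deps, which builds the same word/arc data that the renderer draws. The three-word sentence, its heads, and its dependency labels are made up for illustration, and the vocab comes from a blank English pipeline so that the period is actually flagged as punctuation.

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc

# Vocab from a blank English pipeline, so lexical flags such as is_punct are set.
vocab = spacy.blank("en").vocab

# Hand-built toy parse (illustrative values, not output of a trained model).
# In spaCy v3, heads are absolute token indices.
words = ["Hello", "world", "."]
spaces = [True, False, False]
heads = [1, 1, 1]
deps = ["intj", "ROOT", "punct"]
doc = Doc(vocab, words=words, spaces=spaces, heads=heads, deps=deps)

# parse_deps works on a copy of the Doc, so the original is left untouched.
merged = displacy.parse_deps(doc, options={"collapse_punct": True})
separate = displacy.parse_deps(doc, options={"collapse_punct": False})

merged_texts = [w["text"] for w in merged["words"]]
separate_texts = [w["text"] for w in separate["words"]]

print(merged_texts)    # period merged into the preceding token
print(separate_texts)  # period kept as its own token
```

Passing options={"collapse_punct": False} to displacy.render or displacy.serve should keep the period as a separate token in the drawn parse as well.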

The reason your periods are not getting flagged is that by calling the Vocab constructor directly you are not getting any of the default functions for setting lexical attributes, so flags like is_punct are never set. You can get them by using spacy.vocab.create_vocab instead, though I would recommend just stealing the Vocab from a blank English pipeline (spacy.blank("en").vocab).
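A small sketch of the difference, assuming spaCy v3; the example words are arbitrary. It builds the same Doc twice, once on a bare Vocab() and once on the vocab of a blank English pipeline, and compares the is_punct flag on the final period.

```python
import spacy
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["Hello", "world", "."]

# Bare constructor: no lexical attribute getters are registered.
bare_doc = Doc(Vocab(), words=words)

# Vocab taken from a blank English pipeline: default getters are in place.
blank_doc = Doc(spacy.blank("en").vocab, words=words)

bare_flag = bare_doc[-1].is_punct
blank_flag = blank_doc[-1].is_punct

print("bare Vocab():", bare_flag)
print("blank 'en' vocab:", blank_flag)
```

Per the explanation above, the first flag should come out False and the second True, which is why only the second Doc gets its period merged by displaCy.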

Answer selected by DavidNemeskey