Default sentence boundary detection during training in en_core_web_sm #8693
-
If I am correct, v3 came with some new ways to do sentence segmentation. Now there are a number of ways: the rule-based sentencizer, the statistical senter, and the dependency parser.
My question is about the training config of en_core_web_sm. So the senter is included in the pipeline but disabled by default: is it trained together with the other components, and do the different components interfere with each other's sentence boundaries?
Thanks
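For concreteness, here is how I understand the options so far (a minimal sketch based on the docs; the exclude / enable_pipe usage for the senter is my assumption, and it assumes en_core_web_sm is installed):

```python
import spacy
from spacy.lang.en import English

text = "This is one sentence. Here is another one."

# 1) Rule-based sentencizer: no statistical model, just punctuation rules.
nlp_rules = English()
nlp_rules.add_pipe("sentencizer")
print([s.text for s in nlp_rules(text).sents])

# 2) Statistical senter: shipped with en_core_web_sm but disabled by default.
nlp_senter = spacy.load("en_core_web_sm", exclude=["parser"])
nlp_senter.enable_pipe("senter")
print([s.text for s in nlp_senter(text).sents])

# 3) Dependency parser: the default source of sentence boundaries in en_core_web_sm.
nlp_parser = spacy.load("en_core_web_sm")
print([s.text for s in nlp_parser(text).sents])
```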
-
Disabled components aren't trained if they're disabled in the config used for training. Training it separately and then disabling it is part of the internal collate script (and part of why we're not using `spacy assemble` directly, which would do 90% of what we need). We want to ship `senter` with the pipeline but leave the default as the parser because the quality is higher.
I had a feeling I'd answered some of this before: #7624 (comment)
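For reference, you can see how this is shipped directly from Python (a quick sketch, assuming a standard v3 install of en_core_web_sm):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Active components: the parser is here and provides sentence boundaries by default.
print(nlp.pipe_names)

# Components that are shipped but switched off; 'senter' should show up here.
print(nlp.disabled)

# All components in the pipeline, whether enabled or not.
print(nlp.component_names)
```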
None of them clobber the annotations from previous components and the parser respects existing sentence boundaries.
The parser is by far the slowest, but here it probably makes sense to run evaluations with your own pipelines / data. The senter can be even faster (at slightly reduced accuracy) if you further reduce the parameters in the model.
A related post: #7218 (comment)
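If you want to compare speed on your own data, a rough end-to-end benchmark along these lines is usually enough (a sketch; the sample texts are placeholders, and the loaded pipelines run all their remaining components, which is normally what matters in practice):

```python
import time

import spacy
from spacy.lang.en import English

# Replace with a sample of your own documents.
texts = ["This is a sentence. Here is another, slightly longer one."] * 2000

def build_sentencizer():
    # Rule-based only, no statistical model.
    nlp = English()
    nlp.add_pipe("sentencizer")
    return nlp

def build_senter():
    # Drop the parser and switch on the shipped senter.
    nlp = spacy.load("en_core_web_sm", exclude=["parser"])
    nlp.enable_pipe("senter")
    return nlp

def build_parser():
    # Stock pipeline: the parser sets sentence boundaries.
    return spacy.load("en_core_web_sm")

for name, build in [("sentencizer", build_sentencizer),
                    ("senter", build_senter),
                    ("parser", build_parser)]:
    nlp = build()
    start = time.perf_counter()
    n_sents = sum(len(list(doc.sents)) for doc in nlp.pipe(texts))
    elapsed = time.perf_counter() - start
    print(f"{name}: {n_sents} sentences in {elapsed:.2f}s")
```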