What "noise" is being referred to in spaCy's debug? #9260
When running debug as part of a pipeline, the Named Entity Recognition portion always says: "Entity spans consisting of or starting/ending with punctuation can not be trained with a noise level > 0." Does anyone know what that refers to? Could I manipulate some noise setting to be more resilient against not-quite-gold data? https://github.com/explosion/spaCy/blob/master/spacy/cli/debug_data.py#L279
Replies: 1 comment 1 reply
The particular reference to "noise" looks like a holdover from v2.
In v3 you can use data augmentation with the corpus reader. spaCy itself includes a few simple augmenters (which we use for the pretrained pipelines): https://spacy.io/api/top-level#augmenters, https://spacy.io/api/top-level#corpus
Also check out the new augmenty package, which has many more augmenters: https://github.com/kennethenevoldsen/augmenty
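
For illustration, here is a minimal sketch of what one of the built-in augmenters does to a single training example. It assumes spaCy v3 is installed; the sentence, the entity offsets, and the `level=1.0` setting are made up for the demo and are not from the thread (a level of 1.0 just forces the augmentation to apply every time so the effect is visible).

```python
import spacy
from spacy.training import Example
from spacy.training.augment import create_lower_casing_augmenter

# Blank English pipeline: only the tokenizer is needed for this demo.
nlp = spacy.blank("en")

# Hypothetical training example with two entity spans (character offsets).
text = "Apple is opening an office in Berlin"
doc = nlp.make_doc(text)
example = Example.from_dict(doc, {"entities": [(0, 5, "ORG"), (30, 36, "GPE")]})

# level is the fraction of examples to lowercase; 1.0 is used here
# only so the demo always shows the augmented variant.
augmenter = create_lower_casing_augmenter(level=1.0)

# An augmenter is a callable taking (nlp, example) and yielding examples.
augmented = list(augmenter(nlp, example))

for eg in augmented:
    print(eg.text)
    print([(ent.text, ent.label_) for ent in eg.reference.ents])
```

In a real training run you would not call the augmenter by hand. You would pass it to the corpus reader instead, for example `Corpus("./train.spacy", augmenter=augmenter)` with `Corpus` imported from `spacy.training` (the path is a placeholder), or set it in the `[corpora.train.augmenter]` block of the training config using the registered name `spacy.lower_case.v1`.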