What "noise" is being referred to in spaCy's debug? #9260

Lolologist · 2021-09-21T18:55:00Z

When I run spaCy's debug data command on a pipeline, the Named Entity Recognition portion always says: "Entity spans consisting of or starting/ending with punctuation can not be trained with a noise level > 0."
As far as I can tell there is no noise variable to set, and depending on what it controls, it could be interesting to experiment with.

Does anyone know what that refers to? Could I adjust some noise setting to make the model more resilient to not-quite-gold data?

https://github.com/explosion/spaCy/blob/master/spacy/cli/debug_data.py#L279

Answered by adrianeboyd

Sep 22, 2021

The particular reference to "noise" looks like a holdover from v2.

In v3 you can use data augmentation with the corpus reader. spaCy itself includes a few simple augmenters (which we use for the pretrained pipelines): https://spacy.io/api/top-level#augmenters, https://spacy.io/api/top-level#corpus

Also check out the new augmenty package, which has many more augmenters: https://github.com/kennethenevoldsen/augmenty
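
For anyone landing here later, this is roughly what attaching one of the built-in augmenters to the corpus reader looks like in a v3 training config. Treat it as a minimal sketch: it assumes your config already defines paths.train (as the generated configs do), and the level of 0.3 is only an illustrative value, not a recommendation.

```ini
# Training corpus read from a .spacy file.
# Assumes [paths] train is defined elsewhere in the same config.
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
gold_preproc = false
max_length = 0
limit = 0

# Built-in lowercasing augmenter; level is the share of examples
# that get lowercased (0.3 is an illustrative value).
[corpora.train.augmenter]
@augmenters = "spacy.lower_case.v1"
level = 0.3
```

The other built-in augmenter, spacy.orth_variants.v1, is attached in the same block but additionally needs an orth_variants table, so check the augmenters docs linked above for its settings.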

adrianeboyd · 2021-09-22T06:48:30Z

1 reply

Lolologist Sep 22, 2021
Author

Thank you! I will give those a look.
