Do we include non-entities in training data for a custom spaCy NER model? What does the structure of the .spacy training data file look like? #11391
Hi everyone, the words which are entities have special tags corresponding to them, and the words which are not entities have an "O" tag. My question is: when I am creating the training data in the spaCy format, should I include both entity and non-entity tags in it to give the model a better understanding of what an entity vs. a non-entity looks like? Or should I only include the labels which actually point to an entity? Please note that I am not talking about fine-tuning a pretrained NER model. I am talking about creating a fully custom model for a specialized task. I am just using a toy dataset from Kaggle to understand what to look out for when I start training my actual model. This is what the toy dataset looks like:
Replies: 1 comment 11 replies
Hello,
you can use the spacy convert command to convert IOB files to the .spacy training format. Here are some example files which you can use as a starting point for your data.
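To make the relationship between IOB tags and the .spacy format a bit more concrete, here is a minimal sketch in Python. It is not the converter's actual implementation. The helper names `iob_to_spans` and `write_docbin`, the example sentence, and the `per`/`geo` labels are all made up for illustration. The idea it shows: a .spacy file is a serialized `DocBin` of `Doc` objects, and only the entity spans are annotated explicitly; as far as I know, tokens not covered by a span assigned through `doc.ents` are treated as outside ("O") by default, so you would not add a separate label for non-entities.

```python
def iob_to_spans(tags):
    """Collapse token-level IOB tags into (start, end, label) token spans.

    "O" tokens produce no span; they are simply the tokens left uncovered.
    """
    spans = []
    start = label = None
    for i, tag in enumerate(tags):
        # Close the currently open span if this tag does not continue it.
        if start is not None and tag != f"I-{label}":
            spans.append((start, i, label))
            start = label = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Stray I- tag without a preceding B-: treat it as a span start.
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def write_docbin(sentences, path):
    """Write (words, tags) pairs to a .spacy file (hypothetical helper).

    Assumes spaCy v3 is installed; imported here so the function above
    can be used without it.
    """
    import spacy
    from spacy.tokens import Doc, DocBin, Span

    nlp = spacy.blank("en")
    doc_bin = DocBin()
    for words, tags in sentences:
        doc = Doc(nlp.vocab, words=words)
        # Only entity spans are set; uncovered tokens are non-entities.
        doc.ents = [Span(doc, s, e, label=lab) for s, e, lab in iob_to_spans(tags)]
        doc_bin.add(doc)
    doc_bin.to_disk(path)


# Made-up example sentence, not taken from the Kaggle dataset.
words = ["John", "Smith", "lives", "in", "Berlin"]
tags = ["B-per", "I-per", "O", "O", "B-geo"]
spans = iob_to_spans(tags)
```

Calling `write_docbin([(words, tags)], "./train.spacy")` would then produce a file you can point the training config at. If your data is already in an IOB file, the spacy convert command mentioned above does this step for you.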