Do we include non-entities in training data for a custom spaCy NER model? What does the structure of the .spacy training data file look like? #11391
Hi everyone, in my dataset the words that are entities have special tags corresponding to them, and the words that are not entities have an "O" tag. My question is: when I create the training data in the spaCy format, should I include both entity and non-entity tags to give the model a better understanding of what an entity vs. a non-entity looks like? Or should I only include the labels that actually point to an entity? Please note that I am not talking about fine-tuning a pretrained NER model. I am talking about creating a fully custom model for a specialized task. I am just using a toy dataset from Kaggle to understand what to look out for when I start training my actual model. This is what the toy dataset looks like:
Replies: 1 comment 11 replies
Hello,
you can use the spacy convert command to convert IOB files to the .spacy training format. Here are some example files which you can use as a starting point for your data.
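If it helps to see what ends up inside a .spacy file, here is a minimal sketch of building one by hand with DocBin instead of going through spacy convert. It assumes spaCy v3 is installed; the sentence, the labels, and the build_docbin helper are made up for illustration and are not taken from the Kaggle dataset. Only the entity spans are annotated, and the remaining tokens come back tagged "O" without being listed explicitly.

```python
import spacy
from spacy.tokens import DocBin

# A blank English pipeline is enough here: only the tokenizer is needed.
nlp = spacy.blank("en")

# Hypothetical toy example: (text, [(start_char, end_char, label), ...]).
# Only the entities are listed; non-entity words are not annotated at all.
TRAIN_DATA = [
    ("Alice moved to Berlin", [(0, 5, "PERSON"), (15, 21, "GPE")]),
]


def build_docbin(examples):
    """Collect annotated Doc objects into a DocBin (the .spacy container)."""
    doc_bin = DocBin()
    for text, spans in examples:
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in spans:
            span = doc.char_span(start, end, label=label)
            if span is None:
                # Character offsets do not line up with token boundaries.
                continue
            ents.append(span)
        # Assigning doc.ents marks every token outside the spans as "O".
        doc.ents = ents
        doc_bin.add(doc)
    return doc_bin


doc_bin = build_docbin(TRAIN_DATA)

# Round-trip through the DocBin to inspect the per-token annotation it stores.
docs = list(doc_bin.get_docs(nlp.vocab))
iob_tags = [(token.text, token.ent_iob_, token.ent_type_) for token in docs[0]]
print(iob_tags)
```

Calling doc_bin.to_disk("./train.spacy") would then write the file that the training config points to (the path is just an example).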