Text or sentences? #13363
shashko-a
started this conversation in
Help: Best practices
-
Hi! The NER model in spaCy mostly looks at local context. Annotators, too, usually only need the local context to annotate named entities, so I think single-sentence granularity will probably work best. Either way, if the sentences are independent and don't come from the same original document, I definitely wouldn't merge them into a single annotation/document, as this may confuse ML models trained on such data. Hope that helps!
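As a rough sketch of the "one sentence, one document" advice, the snippet below builds a separate `Doc` for each independent sentence and collects them in a `DocBin`. The sample sentences, the `COLOR` label, and the helper name `build_docbin` are assumptions made up for illustration; they do not come from the thread.

```python
import spacy
from spacy.tokens import DocBin

# Hypothetical per-sentence annotations: (text, [(start, end, label), ...]).
TRAIN_DATA = [
    ("She bought a dark red scarf.", [(13, 21, "COLOR")]),
    ("The walls were painted green.", [(23, 28, "COLOR")]),
]


def build_docbin(examples, nlp):
    """Create one Doc per independent sentence and collect them in a DocBin."""
    doc_bin = DocBin()
    for text, entities in examples:
        doc = nlp.make_doc(text)
        spans = []
        for start, end, label in entities:
            span = doc.char_span(start, end, label=label)
            if span is None:
                # Offsets that do not align with token boundaries are skipped.
                continue
            spans.append(span)
        doc.ents = spans
        doc_bin.add(doc)
    return doc_bin


# A blank English pipeline is enough for tokenization; no trained model needed.
nlp = spacy.blank("en")
doc_bin = build_docbin(TRAIN_DATA, nlp)
```

The result could then be saved with `doc_bin.to_disk("train.spacy")` (the file name is just an example) and used as training data. Each sentence stays its own document, so the model never sees unrelated sentences as shared context.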
-
I'm trying to train a spaCy model on new data to find colors.
I collected a few hundred independent sentences, and during annotation (I used https://tecoholic.github.io/ner-annotator/) I ran into a question: what's the best way to annotate the data and train spaCy on it?
Should I put all my sentences into one giant string and get a single "entities" block in the JSON from the annotator, or would it be better to keep each sentence separate and get a JSON structure like "sentence 1 - its entities, sentence 2 - its entities, ..."?
Thanks!
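For illustration, the per-sentence layout described above might look like the following sketch. The `classes` and `annotations` keys, the sample sentences, and the helper `labelled_spans` are assumptions, not a confirmed description of what the annotator exports. Entity offsets are character positions within each sentence, so every sentence can be checked on its own.

```python
import json

# Hypothetical per-sentence export: each text carries its own "entities" list
# of [start, end, label] character offsets.
EXPORT = """
{
  "classes": ["COLOR"],
  "annotations": [
    ["She bought a dark red scarf.", {"entities": [[13, 21, "COLOR"]]}],
    ["The walls were painted green.", {"entities": [[23, 28, "COLOR"]]}]
  ]
}
"""


def labelled_spans(export_json):
    """Return (sentence, entity_text, label) for every annotated span."""
    data = json.loads(export_json)
    rows = []
    for text, annotation in data["annotations"]:
        for start, end, label in annotation["entities"]:
            # Slice the sentence with its own offsets to recover the entity text.
            rows.append((text, text[start:end], label))
    return rows


for sentence, entity_text, label in labelled_spans(EXPORT):
    print(f"{label}: {entity_text!r} in {sentence!r}")
```

With one giant string, every offset would instead depend on the length of all preceding sentences, which makes individual annotations harder to verify or correct.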