NER for partially annotated documents #12239
Hello! We are using spaCy to learn multiple tasks from a corpus of documents:
We have good results for the first two tasks, but we are having difficulties with NER.

Problem definition

Let's assume the entities are ENT1 and ENT2. Our dataset is sparse and some documents are not fully labelled. For a given document we can have either:
We know, for each document, which entities have been reviewed and annotated. We want to use all documents to learn to predict every entity.

Question
Thank you very much for your help!
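To make the setup above concrete, here is a minimal sketch of how the per-document review information could be represented and turned into per-token states. It is only an illustration under stated assumptions: `PartialDoc` and `token_states` are hypothetical names, not part of spaCy, and "reviewed" is assumed to mean that every occurrence of that label in the document has been annotated. A token is only marked as a known non-entity ("O") when all labels were reviewed; otherwise it stays unknown (`None`).

```python
from dataclasses import dataclass, field

LABELS = ("ENT1", "ENT2")  # the two entity labels named in the post


@dataclass
class PartialDoc:
    """Hypothetical container for one partially annotated document."""
    tokens: list[str]
    # Annotated spans as (start, end, label) token offsets, end exclusive.
    spans: list[tuple[int, int, str]] = field(default_factory=list)
    # Labels that a reviewer has exhaustively checked in this document.
    reviewed: set[str] = field(default_factory=set)


def token_states(doc, labels=LABELS):
    """Return one state per token: a label, "O", or None (unknown)."""
    # Tokens are known non-entities only if every label was reviewed;
    # otherwise an unannotated token could still be an unreviewed entity.
    fully_reviewed = set(labels) <= doc.reviewed
    states = ["O" if fully_reviewed else None for _ in doc.tokens]
    for start, end, label in doc.spans:
        for i in range(start, end):
            states[i] = label
    return states


# Only ENT1 was reviewed: the other tokens might still be ENT2.
only_ent1 = PartialDoc(["a", "b", "c"], [(0, 1, "ENT1")], {"ENT1"})
print(token_states(only_ent1))  # ['ENT1', None, None]

# Both labels were reviewed: unannotated tokens are true negatives.
both = PartialDoc(["a", "b", "c"], [(0, 1, "ENT1"), (2, 3, "ENT2")], {"ENT1", "ENT2"})
print(token_states(both))  # ['ENT1', 'O', 'ENT2']
```

Note that this encoding is deliberately conservative: in a document where only ENT1 was reviewed, it does not record that the unknown tokens are known not to be ENT1. spaCy's `Doc.set_ents` accepts `missing` and `outside` arguments that may be one way to carry such states into training data, but whether that fits this use case is an assumption here, not something confirmed in this thread.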
Replies: 2 comments 10 replies
Hi @rverdier65,
You say "For a given document", but refer to a "dataset" in this sentence. Did you mean to say "document", i.e. "if there is no occurrence of ENT2 in the document"? In general, could you elaborate on what you mean by "reviewed"?
Hi @rmitsch, thank you for your answer. Indeed, I am referring to the document. What I mean by 'reviewed and annotated' is the following:
So for each document, we know which entities have been 'reviewed and annotated'. Some documents have only ENT1 'reviewed and annotated', others only ENT2, others both, and others neither. We want to use all documents, even if they are 'partially annotated', i.e. not all the entities have been 'reviewed and annotated' in the document. Is that clearer?
Hi @rmitsch, thank you for your answer.
Indeed, I am referring to the document.
I meant to say "if there are no annotated entries of ENT2 for this document".
What I mean by 'reviewed and annotated' is the following:
For a given entity, for example ENT2, we consider that the document has been 'reviewed and annotated' by a user if the user has exhaustively annotated all the occurrences of ENT2 in the document.
There are 2 possibilities: