Large Entities that Span Multiple Sentences #8878
This would generally not be handled as an NER problem, and in particular spaCy's NER won't handle it well (see the docs).

It looks like your document has very clearly defined chunks, so what I would do is chop up those chunks (the whitespace-separated blocks) and then use NER within each one to find things like DRUG_NAME, DOSAGE, DATE. If things are in one chunk, you know they correspond. (Chunks can be treated as separate spaCy Docs, or potentially as sentences.) For chopping the chunks you can use a regex or whatever as preprocessing before you pass the text to spaCy.

You might want to look at Med7, which is an NER label scheme and dataset for similar data.

Also note that if you're linking different entities, like matching a dosage with the relevant drug, that would be a relation extraction task, and not "dependencies" in the context of spaCy, which are a grammatical concept and don't really make sense outside of complete sentences.
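The chunk-splitting step suggested above could be sketched roughly as follows. This is only an illustration using the standard library: the sample text mirrors the record in the question, the blank lines between blocks are an assumption about how the original page is laid out, and `split_chunks` and the `"Dosage:"` filter are hypothetical helpers, not part of spaCy.

```python
import re

# Sample record modeled on the question. The blank lines between blocks
# are an assumption about the original layout.
RECORD = """patient name: john smith
admit date: 1/1/2021
discharge date: 1/5/2021

Medications given:

Acetaminophen, signed by Dr. Jane Smith, MD
Given on 1/3/2021
Dosage: 500mg

Oxycodone, signed by Dr. Jane Smith, MD
Given on 1/2/2021
Dosage: 100ml

Document printed on 1/5/2021. St. Mary's Hospital"""


def split_chunks(text):
    """Split text into blocks separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block.strip() for block in blocks if block.strip()]


chunks = split_chunks(RECORD)

# Hypothetical heuristic: treat blocks containing a "Dosage:" line as
# medication blocks. Real records may need a more robust rule.
medication_chunks = [chunk for chunk in chunks if "Dosage:" in chunk]

for chunk in medication_chunks:
    print(chunk)
    print("---")
```

Each string in `medication_chunks` could then be passed to a spaCy pipeline as its own Doc, for example with `nlp.pipe(medication_chunks)`, assuming `nlp` is a pipeline trained with labels such as DRUG_NAME, DOSAGE, and DATE. Entities found in the same Doc would then be known to belong to the same medication entry.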
Hi
I'm curious about best practices when building an NER model that labels large entities spanning multiple sentences.
The majority of the examples I've seen have been finding 1-3 word entities in a short, small sentence. My use case, however, asks that I find a paragraph-sized entity on a medical record page. A contrived example can be seen below:
patient name: john smith
admit date: 1/1/2021
discharge date: 1/5/2021
Medications given:
Acetaminophen, signed by Dr. Jane Smith, MD
Given on 1/3/2021
Dosage: 500mg
Oxycodone, signed by Dr. Jane Smith, MD
Given on 1/2/2021
Dosage: 100ml
Document printed on 1/5/2021. St. Mary's Hospital
In this example, I would attempt to label the bold text (the Oxycodone block above), because I need not only the medication (oxycodone) but also the dosage and the date given. Is it best practice to create one entity from this whole chunk of text, or is it better to create smaller entities (date given, dosage, etc.) and then try to link them as dependencies?
Thanks in advance for any advice