Split pdf based upon patterns #12650
Unanswered
standenman
asked this question in
Help: Other Questions
Replies: 1 comment
-
Like the earlier discussion, this is not a question about spaCy, but about PDF processing. We will leave the discussion open in case someone from the community has experience with PDF processing and would like to chime in.
-
I apologize if this question overlaps with another question I asked here, but I am still thinking this through.
My goal is to programmatically split a PDF document of varied medical records by identifying the pages that belong to a given office visit. In thinking about this, I am mindful of a couple of key points.
Most medical records are generated from an EHR vendor template. Those templates may use different labels for the date of service ("date of service", "visit date", "encounter date:", etc.), and a given EHR vendor follows a specific sequence of sections ("Visit Notes", "Medical Problems", "Medications", etc.). Nearly all of these records also identify the EHR vendor somewhere at the top or bottom of the page. So could I not establish a "template", say for the EHR vendor "Epic", such that my Python code would recognize, "this sequence in this set of pages represents one discrete office visit in the Epic EHR system"?
If so, how would I create that? Or would I need to build training sets with Prodigy and annotate them? I am just confused about which direction to move in. Thank you.
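For what it's worth, the "template" idea above can be tried with plain rules before any annotation work. The following is only a minimal sketch, assuming the text of each page has already been extracted into a list of strings (one per page) by some PDF library. The names `VendorTemplate`, `TEMPLATES`, `detect_template`, and `split_visits` are hypothetical, as are the regular expressions and the choice of "Visit Notes" as the heading that opens a visit; real Epic output may look quite different.

```python
import re
from dataclasses import dataclass

# Assumed date format (e.g. 01/02/2023); real records may use others.
DATE = r"(\d{1,2}/\d{1,2}/\d{2,4})"


@dataclass
class VendorTemplate:
    """Hypothetical description of how one EHR vendor lays out a visit."""
    name: str
    vendor_pattern: re.Pattern  # text identifying the vendor in a header/footer
    date_pattern: re.Pattern    # date-of-service label followed by a date
    start_section: str          # heading assumed to open a new visit


TEMPLATES = [
    VendorTemplate(
        name="Epic",
        vendor_pattern=re.compile(r"\bEpic\b", re.IGNORECASE),
        date_pattern=re.compile(
            r"(?:date of service|visit date|encounter date)\s*:?\s*" + DATE,
            re.IGNORECASE,
        ),
        start_section="Visit Notes",
    ),
]


def detect_template(page_text):
    """Return the first template whose vendor marker appears on the page."""
    for template in TEMPLATES:
        if template.vendor_pattern.search(page_text):
            return template
    return None


def split_visits(pages):
    """Group page indices into visits.

    A new visit starts on the first page, on a page containing the
    template's start section, or when the date of service changes.
    Pages with no recognized vendor are attached to the current visit.
    """
    visits = []
    for index, text in enumerate(pages):
        template = detect_template(text)
        date = None
        starts_visit = False
        if template is not None:
            match = template.date_pattern.search(text)
            date = match.group(1) if match else None
            starts_visit = template.start_section.lower() in text.lower()

        current = visits[-1] if visits else None
        date_changed = (
            current is not None
            and date is not None
            and current["date"] is not None
            and date != current["date"]
        )
        if current is None or starts_visit or date_changed:
            visits.append({
                "vendor": template.name if template else None,
                "date": date,
                "pages": [index],
            })
        else:
            current["pages"].append(index)
            if current["date"] is None:
                current["date"] = date
    return visits


# Made-up page texts, only to show the grouping.
pages = [
    "Epic\nVisit Notes\nDate of Service: 01/02/2023",
    "Epic\nMedications\nDate of Service: 01/02/2023",
    "Epic\nVisit Notes\nEncounter Date: 03/04/2023",
]
visits = split_visits(pages)
```

With these made-up pages, the first two land in one visit and the third in another. Each entry's `pages` list could then be handed to whatever PDF library you use to write out the split files. If rules like these turn out to be too brittle across vendors, that would be the point at which annotating examples starts to look worthwhile.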