Split pdf based upon patterns #12650

standenman · 2023-05-19T20:22:30Z

I apologize if this question overlaps with another one I asked here, but I am still thinking this through.

My goal is to programmatically split a PDF of varied medical records by identifying the records that represent a given office visit. In thinking about this, I am mindful of a couple of key points.

Most medical records are produced from an EHR vendor's template. Those templates may use different labels for the date of service ("date of service", "visit date", "encounter date:", etc.), and a given EHR vendor follows a specific sequence of sections ("Visit Notes", "Medical Problems", "Medications", etc.). Nearly all of these records also identify the EHR vendor somewhere at the top or bottom of the page. So could I not establish a "template", say for the EHR vendor "Epic", such that my Python code would recognize, "this sequence in this set of pages represents one discrete office visit in the Epic EHR system"?

If so, how would I create such a template? Or would I instead go through training sets with Prodigy and annotate them? I am just confused about which direction to move in. Thank you.
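To make the template idea concrete, here is a rough sketch of what I have in mind, using only regular expressions. It assumes the text of each page has already been extracted into a list of strings (for example with a PDF library such as pypdf). The `VendorTemplate` class, the `split_into_visits` function, the label patterns, and the choice of "Visit Notes" as the section that opens a visit are all my own illustrative assumptions, not a real Epic layout.

```python
import re
from dataclasses import dataclass

# Assumed date format (MM/DD/YYYY); real records may use other formats.
DATE = r"(\d{1,2}/\d{1,2}/\d{4})"


@dataclass(frozen=True)
class VendorTemplate:
    """Hypothetical description of one EHR vendor's page layout."""
    name: str
    vendor_marker: re.Pattern  # text that identifies the vendor on a page
    date_label: re.Pattern     # label followed by the date of service
    first_section: str         # heading assumed to open a new visit


# Illustrative template only; the patterns are guesses, not Epic's real layout.
TEMPLATES = [
    VendorTemplate(
        name="Epic",
        vendor_marker=re.compile(r"\bEpic\b"),
        date_label=re.compile(
            r"(?:date of service|visit date|encounter date)\s*:?\s*" + DATE,
            re.IGNORECASE,
        ),
        first_section="Visit Notes",
    ),
]


def match_template(page_text):
    """Return the first template whose vendor marker appears on the page."""
    for template in TEMPLATES:
        if template.vendor_marker.search(page_text):
            return template
    return None


def split_into_visits(pages):
    """Group page indices into visits.

    A page starts a new visit when it matches a vendor template and contains
    that template's first section heading. Every other page is appended to
    the visit (or unrecognized group) that is currently open.
    """
    visits = []
    current = None
    for index, text in enumerate(pages):
        template = match_template(text)
        starts_visit = (
            template is not None
            and template.first_section.lower() in text.lower()
        )
        if starts_visit:
            found = template.date_label.search(text)
            current = {
                "vendor": template.name,
                "date": found.group(1) if found else None,
                "pages": [index],
            }
            visits.append(current)
        elif current is not None:
            current["pages"].append(index)
        else:
            # Pages that appear before any recognized visit start.
            current = {"vendor": None, "date": None, "pages": [index]}
            visits.append(current)
    return visits
```

Each returned entry holds the zero-based page indices of one visit, which could then be handed to a PDF library to write the pages out as separate files. Whether rules like these are robust enough on real records, or whether annotated training data is needed, is exactly what I am unsure about.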

danieldk · 2023-05-23T09:49:31Z

Like the earlier discussion, this is not a question about spaCy but about PDF processing. We will leave the discussion open in case someone from the community has experience with PDF processing and would like to chime in.
