Parsing PDF like data without it being an actual pdf object #3393
Replies: 1 comment
-
Sorry, we cannot see any connection to PyMuPDF.
-
A product I use stores templates of PDF objects like the one below:
"/LL -0.1566228/LLE 2/Cap true/MeasurementTypes 130/SlopeType 0/PitchRun 12/IT/LineDimension/L[9.5 22.55662 159.209 22.55662]/DS(font: Helvetica 12pt; text-align:center; line-height:13.8pt; color:#0000FF)/BM/Multiply/RC(<?xml version='1.0'?>....)]
I'm having trouble parsing this out because I can't figure out how to handle sections like
0/PitchRun 12/IT/LineDimension/L[9.5 22.55662 159.209 22.55662]
where it switches between different delimiters and has a complex path. My goal is to turn this into a dictionary so that I can edit the individual values and then write it back. I'm betting there's a method to do this because xref_object handles it pretty nicely.
Right now I'm processing the text left to right and splitting off the key names and values as I go, but it all goes off the rails when I hit what I call "nested values", i.e. /Key1/Subkey/SubSubKey.
Any guidance is appreciated.
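For reference, in PDF syntax a name starts with "/" and ends at the next whitespace or delimiter character, so /IT/LineDimension is not a nested path: it is the key /IT followed by a value that happens to be the name /LineDimension. With that in mind, the left-to-right approach can be sketched as a small recursive parser in plain Python. This is only a minimal sketch, not a PyMuPDF API: Name, parse_dict_body and serialize_dict_body are hypothetical helper names, and the sketch assumes the fragment contains only names, numbers, booleans, null, literal (...) strings and [...] arrays. Hex strings, << >> sub-dictionaries and indirect references are not handled.

```python
WHITESPACE = " \t\r\n\f\x00"
DELIMITERS = "()<>[]{}/%"


class Name(str):
    """Hypothetical marker type for a PDF name (written with a leading '/')."""


def skip_ws(s, i):
    while i < len(s) and s[i] in WHITESPACE:
        i += 1
    return i


def parse_value(s, i):
    """Parse one object starting at s[i]; return (value, index after it)."""
    i = skip_ws(s, i)
    if i >= len(s):
        raise ValueError("unexpected end of input")
    c = s[i]
    if c == "/":  # name: runs until whitespace or any delimiter
        j = i + 1
        while j < len(s) and s[j] not in WHITESPACE + DELIMITERS:
            j += 1
        return Name(s[i + 1:j]), j
    if c == "(":  # literal string: balanced parentheses, backslash escapes
        depth, j = 1, i + 1
        while j < len(s):
            if s[j] == "\\":
                j += 2  # skip the escaped character
                continue
            if s[j] == "(":
                depth += 1
            elif s[j] == ")":
                depth -= 1
                if depth == 0:
                    return s[i + 1:j], j + 1  # raw text, escapes left as-is
            j += 1
        raise ValueError("unterminated string")
    if c == "[":  # array: values until the matching ']'
        items, i = [], i + 1
        while True:
            i = skip_ws(s, i)
            if i >= len(s):
                raise ValueError("unterminated array")
            if s[i] == "]":
                return items, i + 1
            item, i = parse_value(s, i)
            items.append(item)
    j = i  # bare token: number, true, false or null
    while j < len(s) and s[j] not in WHITESPACE + DELIMITERS:
        j += 1
    tok = s[i:j]
    if tok == "":
        raise ValueError(f"unsupported character {c!r} at index {i}")
    if tok in ("true", "false"):
        return tok == "true", j
    if tok == "null":
        return None, j
    try:
        return int(tok), j
    except ValueError:
        return float(tok), j  # raises ValueError on anything else


def parse_dict_body(s):
    """Parse '/Key value/Key value...' into a dict keyed by Name."""
    result, i = {}, skip_ws(s, 0)
    while i < len(s):
        key, i = parse_value(s, i)
        if not isinstance(key, Name):
            raise ValueError(f"expected a /Key, got {key!r}")
        result[key], i = parse_value(s, i)
        i = skip_ws(s, i)
    return result


def serialize(value):
    if isinstance(value, Name):  # must be checked before plain str
        return "/" + value
    if isinstance(value, bool):  # must be checked before int
        return "true" if value else "false"
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return repr(value)  # caveat: very small/large floats use exponents
    if isinstance(value, str):
        return "(" + value + ")"
    if isinstance(value, list):
        return "[" + " ".join(serialize(v) for v in value) + "]"
    raise TypeError(f"cannot serialize {type(value).__name__}")


def serialize_dict_body(d):
    parts = []
    for key, value in d.items():
        text = serialize(value)
        sep = "" if text[0] in DELIMITERS else " "  # '/IT/LineDimension' style
        parts.append("/" + key + sep + text)
    return "".join(parts)
```

Because Name subclasses str, the parsed result can be indexed with plain strings, e.g. d["PitchRun"] = 14, and then passed to serialize_dict_body to get the text form back. The truncated /RC(...) entry from the sample is left out of this sketch since its content is elided above.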