Parsing PDF like data without it being an actual pdf object #3393

JustADetailer · 2024-04-17T15:00:57Z

A product I use stores templates of PDF objects like the one below:

"/LL -0.1566228/LLE 2/Cap true/MeasurementTypes 130/SlopeType 0/PitchRun 12/IT/LineDimension/L[9.5 22.55662 159.209 22.55662]/DS(font: Helvetica 12pt; text-align:center; line-height:13.8pt; color:#0000FF)/BM/Multiply/RC(<?xml version='1.0'?>....)]

I'm having trouble parsing this because I can't figure out how to handle sections like 0/PitchRun 12/IT/LineDimension/L[9.5 22.55662 159.209 22.55662], where the text switches between different delimiters and has a complex path. My goal is to turn this into a dictionary so that I can edit the individual values and then write the result back. I suspect there is a method for this, because xref_object handles it pretty nicely.

Right now I'm processing the text left to right, splitting off the key names and values as I go, but it all goes off the rails when I hit what I call "nested values", i.e. /Key1/Subkey/SubSubKey.
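For illustration, here is a minimal sketch of the left-to-right approach, assuming the text follows ordinary PDF dictionary syntax. Under that assumption a sequence like /IT/LineDimension is not a nested path: it is the key /IT followed by a name value /LineDimension. All function and class names below (Name, parse_entries, dump_entries) are hypothetical helpers, not part of any library. The sample omits the leading quote and the truncated /RC entry from the template above, and the sketch does not handle hex strings, nested << >> dictionaries, null, or indirect references.

```python
DELIMS = "/[]()<>"  # characters that end a bare token or a name


class Name(str):
    """A PDF name such as /LineDimension, stored without the leading slash."""


def _skip_ws(s, i):
    while i < len(s) and s[i].isspace():
        i += 1
    return i


def _read_bare(s, i):
    """Return the end offset of the run of non-delimiter characters starting at i."""
    while i < len(s) and not s[i].isspace() and s[i] not in DELIMS:
        i += 1
    return i


def _read_string(s, i):
    """Read a (literal string); s[i] is "(". Escapes are kept verbatim."""
    depth, j = 1, i + 1
    while j < len(s) and depth:
        if s[j] == "\\":
            j += 1                      # skip the escaped character
        elif s[j] == "(":
            depth += 1                  # balanced parentheses may nest
        elif s[j] == ")":
            depth -= 1
        j += 1
    if depth:
        raise ValueError(f"unterminated string starting at offset {i}")
    return s[i + 1:j - 1], j


def _read_value(s, i):
    """Read one value starting at or after offset i; return (value, next offset)."""
    i = _skip_ws(s, i)
    if i >= len(s):
        raise ValueError("unexpected end of input")
    if s[i] == "/":                     # a name used as a value
        j = _read_bare(s, i + 1)
        return Name(s[i + 1:j]), j
    if s[i] == "(":
        return _read_string(s, i)
    if s[i] == "[":                     # array: values until the closing bracket
        items, i = [], i + 1
        while True:
            i = _skip_ws(s, i)
            if i >= len(s):
                raise ValueError("unterminated array")
            if s[i] == "]":
                return items, i + 1
            item, i = _read_value(s, i)
            items.append(item)
    j = _read_bare(s, i)                # otherwise: boolean or number
    if j == i:
        raise ValueError(f"unsupported syntax {s[i]!r} at offset {i}")
    tok = s[i:j]
    if tok in ("true", "false"):
        return tok == "true", j
    try:
        return int(tok), j
    except ValueError:
        return float(tok), j            # raises ValueError on anything else


def parse_entries(s):
    """Parse alternating /Key value pairs into a plain dict."""
    out, i = {}, 0
    while True:
        i = _skip_ws(s, i)
        if i >= len(s):
            return out
        if s[i] != "/":
            raise ValueError(f"expected a /Key at offset {i}")
        j = _read_bare(s, i + 1)
        key = s[i + 1:j]
        out[key], i = _read_value(s, j)


def dump_value(v):
    if isinstance(v, Name):             # check Name before str, bool before int
        return "/" + v
    if isinstance(v, bool):
        return "true" if v else "false"
    if isinstance(v, (int, float)):
        return repr(v)                  # caveat: very small floats print with an exponent
    if isinstance(v, str):
        return "(" + v + ")"
    if isinstance(v, list):
        return "[" + " ".join(dump_value(x) for x in v) + "]"
    raise TypeError(f"cannot serialize {type(v).__name__}")


def dump_entries(d):
    return "".join(f"/{k} {dump_value(v)}" for k, v in d.items())


sample = ("/LL -0.1566228/LLE 2/Cap true/MeasurementTypes 130/SlopeType 0"
          "/PitchRun 12/IT/LineDimension/L[9.5 22.55662 159.209 22.55662]"
          "/DS(font: Helvetica 12pt; text-align:center; line-height:13.8pt; color:#0000FF)"
          "/BM/Multiply")
parsed = parse_entries(sample)
parsed["PitchRun"] = 6                  # edit one value
rebuilt = dump_entries(parsed)          # write it back out
```

The rebuilt text uses a single space between each key and its value, so its whitespace differs from the input even though it parses back to the same dictionary. Whether the product accepts that spacing is something to verify.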

Any guidance is appreciated.

JorjMcKie · 2024-04-17T15:28:18Z

Maintainer

Sorry, we cannot see any connection to PyMuPDF.
