Hi Community,

First of all, many thanks to the developers (and other helpers) for providing this great library for struggling with the PDF format (mess?). I currently have a set of PDFs that were generated by AutoCAD (version 14). The PDFs were constructed taking into account the layers defined in AutoCAD. I'm trying to read the text "contained" in (or controlled by) those layers.

```python
import fitz

doc = fitz.open(filename)
page = doc[0]
text = page.get_text("text")
```

Unfortunately, …

```python
ocgs = doc.get_ocgs()
```

`ocgs` is a dictionary mapping layer ids (the xref numbers of the optional content groups) to layer names:

```
{...
 32: {'name': 'PROCESS', 'intent': [], 'on': True, 'usage': None},
 12: {'name': 'EQUIPMENT', 'intent': [], 'on': True, 'usage': None},
 38: {'name': 'VENDOR', 'intent': [], 'on': True, 'usage': None},
 ...}
```

Then I tried to dig into the PDF objects:

```python
len_xref = doc.xref_length()
for xref in range(1, len_xref):
    print("")
    print(f"Object {xref}, stream: {doc.xref_is_stream(xref)}")
    print(doc.xref_object(xref, compressed=False))
```

Again, this returns a list of objects, but no link to the actual content of the objects.
Does anyone have an idea how to access the text contained in the layers, i.e. in the OC (optional content)? If so, how can I link a layer to the …?

PS: For the time being I would rather not use OCR extraction, only if there is no other choice.
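For reference, the link the xref dump does not show lives in the page content itself: in a PDF, content that belongs to a layer is normally wrapped as `/OC /MC0 BDC … EMC`, where `/MC0` is a key in the page's `/Resources/Properties` dictionary whose value points to the layer's OCG object (one of the xrefs returned by `get_ocgs()`). The following is only a rough, pure-Python sketch of that idea, not a PyMuPDF feature: the function name `text_by_layer` and the hand-built `properties` mapping are hypothetical, and it only handles literal strings shown with `Tj` (no `TJ` arrays, hex strings, escape decoding, nested parentheses, font encodings, or Form XObjects).

```python
import re

# Tokens of interest: names (/OC, /MC0), literal strings without nested
# parentheses, and the marked-content and Tj operators. Everything else is skipped.
TOKEN = re.compile(
    rb"/[^\s/\[\]()<>]+"              # PDF name, e.g. /OC or /MC0
    rb"|\((?:\\.|[^\\)])*\)"           # literal string (no nested parentheses)
    rb"|\bBDC\b|\bBMC\b|\bEMC\b|\bTj\b"
)


def text_by_layer(content, properties):
    """Group strings shown with Tj by the layer of the enclosing /OC ... BDC.

    content:    decompressed content stream bytes
    properties: hypothetical hand-built map {property name: layer name},
                e.g. {"MC0": "PROCESS"}
    Returns {layer name or None: [raw strings]}; None collects text
    that is not inside any layer.
    """
    stack = []     # one entry per open BDC/BMC: layer name or None
    operands = []  # tokens collected since the last recognised operator
    result = {}
    for tok in TOKEN.findall(content):
        if tok in (b"BDC", b"BMC"):
            layer = None
            # Layer content is marked as "/OC /<property name> BDC".
            if tok == b"BDC" and len(operands) >= 2 and operands[-2] == b"/OC":
                layer = properties.get(operands[-1][1:].decode("latin-1"))
            stack.append(layer)
            operands = []
        elif tok == b"EMC":
            if stack:
                stack.pop()
            operands = []
        elif tok == b"Tj":
            # Simplification: attribute the string to the innermost enclosing layer.
            layer = next((name for name in reversed(stack) if name is not None), None)
            if operands and operands[-1].startswith(b"("):
                # Strip the parentheses; escape sequences are left undecoded.
                result.setdefault(layer, []).append(operands[-1][1:-1].decode("latin-1"))
            operands = []
        else:
            operands.append(tok)
    return result
```

On a real page, `content` could presumably come from `page.read_contents()`, and the property names from the page's `/Resources/Properties` dictionary (which may also be inherited from a parent node), matched by xref against `doc.get_ocgs()`. AutoCAD output may well use font encodings that make the raw strings unreadable, in which case this only tells you *where* each layer's text is.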
-
Thanks for the nice feedback!
Before extracting text, you could temporarily switch to the desired layer using `doc.set_layer_ui_config(number, action=0)`. This is what you would do in a PDF viewer that supports layers. The available configurations can be displayed with `doc.layer_ui_configs()`. The `action` parameter is 0 = set on (default), 1 = toggle on/off, 2 = set off.
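Applied to all layers, that suggestion could look roughly like the sketch below: switch on one layer at a time, switch the others off, extract the text, and restore the original visibility at the end. The function name `text_per_layer` is made up for illustration; it assumes, as the reply implies, that `get_text()` skips content of layers that are switched off, and that entries of type `"label"` in `layer_ui_configs()` are headings rather than switchable layers. Text that is not assigned to any layer would show up in every result.

```python
def text_per_layer(doc):
    """Extract all text once per layer, with only that layer switched on.

    `doc` is expected to be a PyMuPDF Document (fitz.Document); only
    layer_ui_configs(), set_layer_ui_config() and page.get_text() are used.
    Returns {layer name: concatenated text of all pages}.
    """
    # Assumption: "label" entries are headings, not switchable layers.
    configs = [c for c in doc.layer_ui_configs() if c["type"] != "label"]
    # Remember the current visibility so it can be restored afterwards.
    original = {c["number"]: bool(c["on"]) for c in configs}
    texts = {}
    try:
        for target in configs:
            for c in configs:
                # action=0 sets a layer on, action=2 sets it off.
                action = 0 if c["number"] == target["number"] else 2
                doc.set_layer_ui_config(c["number"], action=action)
            texts[target["text"]] = "".join(page.get_text("text") for page in doc)
    finally:
        # Put every layer back into its original on/off state.
        for number, was_on in original.items():
            doc.set_layer_ui_config(number, action=0 if was_on else 2)
    return texts
```

Usage would be along the lines of `texts = text_per_layer(fitz.open(filename))`, then e.g. `texts["VENDOR"]`. If two layers share a name, the later one overwrites the earlier entry; keying by `c["number"]` instead would avoid that.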