Not able to parse image information along with text and table in correct sequence #2982
Replies: 2 comments
-
Hi Team, I tried the code below, but I am not able to parse images in the correct position relative to the text and table data. The code outputs a bunch of images together and misses all the text and tables in between, which does not match the PDF file. Can someone please advise what else can be done here?
import fitz  # PyMuPDF
def in_table(tab_bbox, image_bbox, line_bbox):
def parse_pdf(pdf_path):
# Calling the function
pdf_path = "EOS-User-Manual.pdf"
Thank you
-
You have not attached the PDF, so there is no way to compare your comments and code with the file's data. Apart from that, I have trouble understanding your code:
-
Reposting, as it seems the last discussion was closed:
Hi Team,
I am able to parse table and text data in order using the code below. Now I also want to fetch image data in sequence with the text and tables. When I tried adding images to the code below, I only got repeated images, and they were not aligned with the text and table data in the PDF.
Is there any logic I can add to the code below to fetch images in sequence along with the text and table data? Here is my code:
import fitz  # PyMuPDF
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt

def in_table(tab_bbox, line_bbox):
    tab_rect = fitz.Rect(tab_bbox)
    line_rect = fitz.Rect(line_bbox)
    return line_rect.intersects(tab_rect)

def parse_pdf(pdf_path):
    # Open pdf
    doc = fitz.open(pdf_path)
    #parsed_data=[]
    # Iterate through all pages in the PDF document
    for page_num in range(doc.page_count):
        page = doc[page_num]
    doc.close()

# Calling the function
pdf_path = "EOS-User-Manual.pdf"
parse_data = parse_pdf(pdf_path)
# Calling sub-set
parsed_data = parse_data[0:100]
print(parsed_data)
print()
Thank you
Reema Jain
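One possible way to get images in sequence with text and tables is to collect a bounding box for every element on a page and sort all of them together by position. The sketch below is only an illustration of that idea, not the poster's code and not a PyMuPDF API: the names `intersects` and `merge_in_reading_order` and the sample boxes are hypothetical. It assumes the boxes were gathered beforehand, for example from `page.find_tables()` (each table has a `bbox`) and from the blocks of `page.get_text("dict")`, where `type` 0 is a text block and `type` 1 is an image block. It also assumes a simple single-column layout, where sorting by the top edge is good enough.

```python
from typing import Any, List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downwards


def intersects(a: BBox, b: BBox) -> bool:
    """Return True if the two rectangles overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_in_reading_order(
    texts: List[Tuple[BBox, Any]],
    tables: List[Tuple[BBox, Any]],
    images: List[Tuple[BBox, Any]],
) -> List[Tuple[str, BBox, Any]]:
    """Combine text, table and image items into one top-to-bottom sequence.

    Text items overlapping a table are dropped, because their content is
    already part of the table (same idea as in_table() in the post).
    """
    table_boxes = [bbox for bbox, _ in tables]
    elements = [("table", bbox, item) for bbox, item in tables]
    elements += [("image", bbox, item) for bbox, item in images]
    elements += [
        ("text", bbox, item)
        for bbox, item in texts
        if not any(intersects(bbox, tb) for tb in table_boxes)
    ]
    # Sort by top edge first, then by left edge (single-column assumption).
    elements.sort(key=lambda e: (e[1][1], e[1][0]))
    return elements


# Hand-made sample boxes standing in for values extracted from one page.
sample_texts = [
    ((50, 40, 500, 60), "Intro paragraph"),
    ((60, 210, 200, 225), "text inside the table"),
    ((50, 320, 500, 340), "Closing paragraph"),
]
sample_tables = [((50, 200, 500, 300), "table-1")]
sample_images = [((50, 80, 300, 180), "image-1")]

for kind, bbox, item in merge_in_reading_order(
    sample_texts, sample_tables, sample_images
):
    print(kind, bbox, item)
```

Running this per page and appending the results keeps each image between the text and tables that surround it on the page. Multi-column pages would need a more careful sort key.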