PDF #12217
Replies: 6 comments
-
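I use the fitz library to convert the PDF to images, reading it with fitz.open(stream=pdf_bytes, filetype='bytes'). The source code opens a file path directly. I hope reading from PDF bytes can be integrated as well.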
-
But this will write the PDF bytes into a file, right?
-
There is no need to write it to a file. See the code below; imgs can be passed directly as the input argument.

    import cv2
    import fitz
    import numpy as np
    from PIL import Image

    imgs = []
    # "pdf" is the type hint PyMuPDF documents for in-memory PDF data
    with fitz.open(stream=pdf_bytes, filetype="pdf") as pdf:
        for pg in range(pdf.page_count):
            page = pdf[pg]
            # render at 2x zoom so small text has more pixels
            mat = fitz.Matrix(2, 2)
            pm = page.get_pixmap(matrix=mat, alpha=False)
            # if width or height > 2000 pixels, don't enlarge the image
            if pm.width > 2000 or pm.height > 2000:
                pm = page.get_pixmap(matrix=fitz.Matrix(1, 1), alpha=False)
            img = Image.frombytes("RGB", (pm.width, pm.height), pm.samples)
            # convert RGB to the BGR channel order used by OpenCV
            img = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)
            imgs.append(img)
-
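With imgs as a list, det has to be passed as False... and even that does not seem to work; I still have to pass imgs[0], imgs[1], ... one page at a time. I am not sure whether I am doing something wrong.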
-
I think the issue with this approach is that each page is processed separately, so the bounding boxes are not accumulated across pages, right?
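One way to keep per-page results together is to run OCR page by page and tag every detection with its page index. The following is a minimal sketch, assuming a hypothetical `ocr_page` callable that takes one image and returns a list of `(bbox, text)` pairs. That signature is an assumption for illustration, not PaddleOCR's actual return format.

```python
from typing import Any, Callable, Iterable, List, Tuple

Detection = Tuple[Any, str]  # (bbox, text); hypothetical shape for illustration


def ocr_all_pages(
    imgs: Iterable[Any],
    ocr_page: Callable[[Any], List[Detection]],
) -> List[dict]:
    """Run OCR on each page image and collect all detections in one list."""
    results = []
    for page_index, img in enumerate(imgs):
        # ocr_page stands in for whatever single-image OCR call is used
        for bbox, text in ocr_page(img):
            results.append({"page": page_index, "bbox": bbox, "text": text})
    return results
```

The bounding boxes stay in each page's own pixel coordinates; the `page` field is what lets a caller tell them apart afterwards.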
-
May I ask why the enlarge step is needed? That said, when I call get_pixmap() directly, the results are indeed not very good.
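A likely explanation is resolution: without a matrix, get_pixmap() renders at 72 dpi (one pixel per PDF point), so small text is only a few pixels tall. fitz.Matrix(2, 2) doubles that to 144 dpi. The sketch below reproduces only the arithmetic of the snippet above in plain Python; `pick_zoom` and `rendered_size` are hypothetical helper names, and the sizes are approximate because PyMuPDF rounds fractional page dimensions itself.

```python
from typing import Tuple

BASE_DPI = 72  # PDF user-space units are 1/72 inch


def pick_zoom(width_pt: float, height_pt: float,
              zoom: int = 2, max_side: int = 2000) -> int:
    """Return `zoom` unless the enlarged page would exceed max_side pixels."""
    if width_pt * zoom > max_side or height_pt * zoom > max_side:
        return 1
    return zoom


def rendered_size(width_pt: float, height_pt: float, zoom: int) -> Tuple[int, int]:
    """Approximate pixel size of a page rendered at the given zoom factor."""
    return round(width_pt * zoom), round(height_pt * zoom)


# An A4 page is roughly 595 x 842 points.
a4_zoom = pick_zoom(595, 842)
a4_pixels = rendered_size(595, 842, a4_zoom)
a4_dpi = BASE_DPI * a4_zoom
```

For an A4 page the enlarged image stays under the 2000-pixel limit, so the 2x rendering is kept; a page larger than 1000 points on either side falls back to 1x.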
-
Is there a way to make the PaddleOCR object accept bytes for PDF files, the way it accepts bytes for images?
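One possible caller-side workaround is to inspect the bytes and dispatch to the PDF rendering path or the image path before calling OCR. This is only a sketch: `looks_like_pdf` and `classify_input` are hypothetical helper names, not part of PaddleOCR, and the check is a heuristic based on the `%PDF-` header that PDF files carry near the start.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Heuristic: look for the '%PDF-' header in the first 1024 bytes."""
    return PDF_MAGIC in data[:1024]


def classify_input(data: bytes) -> str:
    """Return 'pdf' or 'image' so the caller can pick a decoding path."""
    return "pdf" if looks_like_pdf(data) else "image"
```

A caller could render the pages to images when the result is 'pdf' and decode the bytes as a single image otherwise.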