Unable to identify cropped region in images #3006
Replies: 4 comments
-
This is a "Discussions" item - no bug, as far as we can see at the moment.
-
Please provide an example file - not just an image.
-
I might have found a (partial) solution, but some problems still remain. The process I follow now is the following:
The main problems I still have are:
Original document: test_page_2.pdf
Problem (2) affects the output (note that the upside-down clipped text is not present).
The output has been produced by the following script:
import fitz
import numpy as np
import cv2
from functools import reduce

fn_in = "test_page_2.pdf"
doc = fitz.open(fn_in)
page = doc.load_page(0)
bbox_log = page.get_bboxlog()
images = page.get_images(full=True)

# Extract clips
clips = []
drawings = page.get_cdrawings(extended=True)
for drw in drawings:
    if drw["type"] != "clip" or drw["level"] != 0:
        continue
    clips.append(drw["scissor"])

# Extract images
images = page.get_image_info(xrefs=True)

# Enrich
for img in images:
    xref = img["xref"]
    pix = fitz.Pixmap(doc, xref)
    image = (
        np.frombuffer(pix.samples_mv, dtype=np.uint8).reshape((pix.height, pix.width, -1)).copy()
    )
    img["page"] = page.number
    img["image"] = image
    img["bbox"] = fitz.Rect(img["bbox"]) * page.rotation_matrix
    img["transform"] = fitz.Matrix(img["transform"]) * page.rotation_matrix
    img["page_rotation"] = page.rotation_matrix

# Recreate page
dpi_ocr = 300
pw, ph = [int(round(d / 72 * dpi_ocr)) for d in [page.rect.width, page.rect.height]]
page_img = np.full((ph, pw, 3), np.iinfo(np.uint8).max, dtype=np.uint8)
objs = []
for o_type, o_rect in bbox_log:
    j0, i0, j1, i1 = [d / 72 * dpi_ocr for d in o_rect]
    if o_type == "fill-path":
        for obj in [d for d in drawings if d["type"] == "f" and d["rect"] == o_rect]:
            i0, j0 = max(int(np.floor(i0)), 0), max(int(np.floor(j0)), 0)
            i1, j1 = min(int(np.ceil(i1)), ph), min(int(np.floor(j1)), pw)
            image = np.stack(
                [
                    np.full(
                        (i1 - i0, j1 - j0),
                        int(round(channel * np.iinfo(np.uint8).max)),
                        dtype=np.uint8,
                    )
                    for channel in obj["fill"] + (obj["fill_opacity"],)
                ],
                axis=-1,
            )
            img = np.full((ph, pw, 4), np.iinfo(np.uint8).max, dtype=np.uint8)
            img[..., 3] = 0
            img[i0:i1, j0:j1, :] = image
            objs.append(img)
    if o_type == "fill-image":
        for obj in [i for i in images if i["bbox"] == o_rect]:
            image = obj["image"].copy()
            # Transform to rgba
            if image.ndim == 2 or image.shape[2] == 1:
                image = cv2.cvtColor(image[:, :], cv2.COLOR_GRAY2RGBA)
            elif image.shape[2] == 3:
                image = cv2.cvtColor(image, cv2.COLOR_RGB2RGBA)
            # Eventually resample to desired dpi
            res_1 = max(image.shape[:2]) / (max(obj["bbox"].width, obj["bbox"].height) / 72)
            res_2 = min(image.shape[:2]) / (min(obj["bbox"].width, obj["bbox"].height) / 72)
            if abs(res_1 - res_2) > 1:
                dpi_img = int(round(res_1))
            else:
                dpi_img = int(round(0.5 * (res_1 + res_2)))
            if abs(dpi_ocr - dpi_img) > 1:
                h, w = [int(round(d / dpi_img * dpi_ocr)) for d in image.shape[:2]]
                ocr_image = cv2.resize(image, (w, h))
            else:
                h, w = image.shape[:2]
                ocr_image = image
            # Transform to page coordinates
            tr_px2pt = fitz.Matrix(1 / w, 0, 0, 1 / h, 0, 0) * obj["transform"]
            pt2px = dpi_ocr / 72  # (points) -> px
            tr_pt2px = fitz.Matrix(pt2px, 0, 0, pt2px, 0, 0)
            tr_px2px = tr_px2pt * tr_pt2px
            mat = np.array(
                [
                    [tr_px2px.a, tr_px2px.c, tr_px2px.e],
                    [tr_px2px.b, tr_px2px.d, tr_px2px.f],
                ],
                dtype=np.float32,
            )
            img = cv2.warpAffine(ocr_image, mat, (pw, ph))
            objs.append(img)

page_img = reduce(
    lambda a, b: a[..., :3] * (1 - b[:, :, 3][..., np.newaxis] / np.iinfo(np.uint8).max)
    + b[:, :, :3] * (b[:, :, 3][..., np.newaxis] / np.iinfo(np.uint8).max),
    objs,
    page_img,
).astype(np.uint8)
bw_page = cv2.cvtColor(page_img, cv2.COLOR_RGB2GRAY)

# Apply clipping
mask_white = np.full((ph, pw), False if len(clips) == 0 else True, dtype=bool)
for rect in clips:
    j0, i0, j1, i1 = [d / 72 * dpi_ocr for d in rect]
    i0, j0 = max(int(np.floor(i0)), 0), max(int(np.floor(j0)), 0)
    i1, j1 = min(int(np.ceil(i1)), ph), min(int(np.floor(j1)), pw)
    mask_white[i0:i1, j0:j1] = False
bw_page[mask_white] = np.iinfo(np.uint8).max
cv2.imwrite("page_image.png", bw_page)
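The "Transform to page coordinates" step in the script relies on two matrix conventions lining up: PyMuPDF matrices (a, b, c, d, e, f) act on row vectors, while cv2.warpAffine expects a 2x3 matrix acting on column vectors. The following is a minimal pure-Python sketch of that bookkeeping, with no PyMuPDF or OpenCV involved. The helper names (mat_mul, apply_point, to_cv2_rows) and the sample image size and transform are hypothetical, chosen only for illustration.

```python
def mat_mul(m1, m2):
    """Concatenate two affine matrices (a, b, c, d, e, f); m1 is applied first.

    Uses the row-vector convention [x, y, 1] * M, as PyMuPDF does.
    """
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def apply_point(m, x, y):
    """Transform a point with a row-vector affine matrix."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def to_cv2_rows(m):
    """Rearrange (a, b, c, d, e, f) into the 2x3 column-vector layout used by cv2.warpAffine."""
    a, b, c, d, e, f = m
    return [[a, c, e], [b, d, f]]


# Hypothetical example: a 200x100 px image placed at (72, 144) pt,
# covering 144x72 pt, rendered onto a 300 dpi page raster.
w, h, dpi = 200, 100, 300
transform = (144, 0, 0, 72, 72, 144)        # unit square -> page points
px2unit = (1 / w, 0, 0, 1 / h, 0, 0)        # image pixels -> unit square
pt2px = (dpi / 72, 0, 0, dpi / 72, 0, 0)    # page points -> page pixels

px2px = mat_mul(mat_mul(px2unit, transform), pt2px)
rows = to_cv2_rows(px2px)
corner = apply_point(px2px, w, h)           # bottom-right image corner on the page raster
```

Because the two rows of the 2x3 matrix are exactly the x and y expressions in apply_point, a mismatch between the conventions would show up as swapped b and c entries, which only matters once the image is rotated or sheared.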
-
All methods For If two text bboxes are exactly equal (rare!), this usually happens because the creator intentionally simulated text effects such as boldness or text shading. Clipping info extraction is currently only supported for vector graphics, not for text, images or shadings.
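Since clip information is not reported for images, any mapping from a clip rectangle to an image has to come from the caller. As a minimal sketch under that assumption, the visible part of an image can be estimated by intersecting its bounding box with the clip rectangles believed to apply to it. The helper names (intersect, visible_bbox) and the sample coordinates are hypothetical, and rectangles are plain (x0, y0, x1, y1) tuples in points.

```python
def intersect(r1, r2):
    """Intersection of two (x0, y0, x1, y1) rectangles, or None if they do not overlap."""
    x0, y0 = max(r1[0], r2[0]), max(r1[1], r2[1])
    x1, y1 = min(r1[2], r2[2]), min(r1[3], r2[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)


def visible_bbox(image_bbox, clip_rects):
    """Shrink an image bbox by every clip rectangle assumed to apply to it.

    Nested clips are treated as cumulative, so the result is the
    intersection of the bbox with all of them.
    """
    visible = image_bbox
    for clip in clip_rects:
        if visible is None:
            break
        visible = intersect(visible, clip)
    return visible


# Hypothetical values: an uncropped image bbox and one clip rectangle.
image_bbox = (100, 100, 400, 300)
cropped = visible_bbox(image_bbox, [(150, 120, 350, 280)])
hidden = visible_bbox(image_bbox, [(500, 500, 600, 600)])
```

This only handles axis-aligned rectangular clips. A rotated image or a non-rectangular clip path would need the actual path geometry rather than its bounding rectangle.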
-
Hi, I'm not able to detect cropped images using the get_bboxlog() method (fitz version 1.23.7). I generated the attached PDF with two cropped images (one rotated 90°), but the extraction gives me the bounding boxes of the non-cropped images:
Below are the rendered PDF page and the script used to replicate the result. What am I doing wrong?

Originally posted by @abe-mxff in #1312 (comment)