PaddleOCR Layout Coordinate Mismatch #15957
Hi everyone, I’m working with PaddleOCR’s layout module to extract bounding boxes for different elements in a PDF file. I’ve tried two approaches:
In both cases, the predicted bounding boxes look correct when I visualize them immediately after inference: the boxes align as expected in the output image. I want to process these coordinates programmatically to group nearby elements in the same column (e.g., merging text elements that logically belong together).

The issue: when I manually draw these predicted boxes on the PDF page or on the rendered image, using the coordinates returned by the model without any modification, they no longer align with the actual content. The boxes appear offset or incorrectly scaled.

Below is the code I use to render the coordinates on the image extracted from the PDF:

```python
def visualize_text_boxes_from_json_image(json_path, image_path, save_path=None):
    ...
```
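For the "group nearby elements in the same column" step, here is a minimal, hedged sketch of one way to do it once the coordinates are correct. The function name and the `(x1, y1, x2, y2)` box convention are my assumptions for illustration, not part of PaddleOCR's API:

```python
# Illustrative sketch: greedily assign boxes to columns whose horizontal
# span they overlap. Boxes are assumed to be (x1, y1, x2, y2) tuples in
# pixel coordinates of the same image.

def group_boxes_into_columns(boxes, overlap_ratio=0.5):
    """Group boxes by horizontal overlap; returns a list of columns,
    each a list of boxes, ordered left to right."""
    columns = []  # each column: {"x1": ..., "x2": ..., "boxes": [...]}
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):  # top-to-bottom
        x1, _, x2, _ = box
        placed = False
        for col in columns:
            # horizontal overlap between this box and the column's span
            overlap = min(x2, col["x2"]) - max(x1, col["x1"])
            if overlap > overlap_ratio * min(x2 - x1, col["x2"] - col["x1"]):
                col["boxes"].append(box)
                col["x1"] = min(col["x1"], x1)  # widen the column span
                col["x2"] = max(col["x2"], x2)
                placed = True
                break
        if not placed:
            columns.append({"x1": x1, "x2": x2, "boxes": [box]})
    return [col["boxes"] for col in sorted(columns, key=lambda c: c["x1"])]

# Two narrow boxes stacked on the left, one box on the right:
boxes = [(10, 10, 100, 40), (12, 50, 98, 80), (200, 10, 300, 40)]
print(group_boxes_into_columns(boxes))
# -> [[(10, 10, 100, 40), (12, 50, 98, 80)], [(200, 10, 300, 40)]]
```

The `overlap_ratio` threshold is a tunable guess; real multi-column layouts may need a smarter clustering, but this shows the shape of the approach.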
Below is the inference code:

```python
from paddleocr import LayoutDetection

model = LayoutDetection(model_name="PP-DocLayout_plus-L")
```
Replies: 1 comment 4 replies
For anyone who runs into this issue: the image gets scaled to a different width and height during inference, which is why the bounding boxes are off even though the immediate visualization looks correct. Look at the image size in the output of the predict function, compare it with the width and height of your original image, and scale the bounding boxes accordingly before drawing them on your image.
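The rescaling described above can be sketched as follows. The function name is illustrative, and how you obtain the inference-time size depends on your predict output (inspect the saved JSON or the result object's image), so treat those as assumptions:

```python
# Illustrative sketch: map boxes from the model's inference resolution
# back to the original image's resolution before drawing them.

def rescale_boxes(boxes, inference_size, original_size):
    """boxes: list of (x1, y1, x2, y2) in the inference image's pixel space.
    inference_size / original_size: (width, height) tuples."""
    inf_w, inf_h = inference_size
    orig_w, orig_h = original_size
    sx = orig_w / inf_w  # horizontal scale factor
    sy = orig_h / inf_h  # vertical scale factor
    return [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in boxes]

# Example: the model ran at 800x1000 but the rendered PDF page is 1600x2000.
print(rescale_boxes([(100, 200, 300, 400)], (800, 1000), (1600, 2000)))
# -> [(200.0, 400.0, 600.0, 800.0)]
```

Note that width and height may be scaled by different factors, so compute and apply the two factors separately rather than assuming a single uniform scale.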