Word Detection Configuration in PaddlePaddleOCR #15011

mkrzywda · 2025-04-14T08:01:12Z


Hi everyone,
I'm having trouble configuring the word detection module of PaddleOCR. Despite following the official documentation, the default settings do not give the accuracy I need: word boxes come out either too broad or misplaced. I'm looking for advice on fine-tuning the configuration to improve precision.

Questions:

Which parameters are crucial for accurate word detection?
Is there any example configuration or recommended settings for projects that require high detection accuracy?
Are there additional modifications needed in the image preprocessing or result postprocessing stages?

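My current configuration:

```python
from paddleocr import PaddleOCR

ocr = PaddleOCR(
    use_angle_cls=False,
    lang="en",
    use_gpu=True,
    use_space_char=True,
    det_db_thresh=0.3,
    det_db_box_thresh=0.5,
    # In PaddleOCR 2.x, cls/bin/inv/alpha_color are normally arguments of
    # ocr.ocr(), not of the constructor, so they may have no effect here.
    cls=False,
    bin=False,
    inv=False,
    alpha_color=False,
    max_text_length=100,
    return_word_box=True,
)
```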

I need to detect word bounding boxes in images. Since I'm working with screenshots in .png or .jpg formats that have varying dimensions and resolutions, I'm using dynamic slicing to adapt to these differences.

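```python
import io
import logging
from typing import Dict

from PIL import Image

logger = logging.getLogger(__name__)


def adjust_slice_params(image_data: bytes) -> Dict[str, int]:
    """Choose slice strides from the image size, with fixed minimums."""
    image = Image.open(io.BytesIO(image_data))
    width, height = image.size

    # Strides proportional to the image: 60% of the width, 30% of the height.
    horizontal_stride = int(width * 0.6)
    vertical_stride = int(height * 0.3)

    # Never go below 1200 px horizontally or 300 px vertically.
    horizontal_stride = max(horizontal_stride, 1200)
    vertical_stride = max(vertical_stride, 300)

    slice_params: Dict[str, int] = {
        "horizontal_stride": horizontal_stride,
        "vertical_stride": vertical_stride,
        "merge_x_thres": 30,
        "merge_y_thres": 30,
    }
    return slice_params


# Call site: ocr is the PaddleOCR instance configured above,
# image_data holds the raw .png/.jpg bytes.
slice_params = adjust_slice_params(image_data)
logger.info(f"{slice_params=}")
result = ocr.ocr(image_data, slice=slice_params, cls=False)
```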

GreatV · 2025-04-14T08:07:32Z · Maintainer

To improve word detection accuracy in PaddleOCR, especially for PNG or JPG screenshots of varying dimensions, you can tune the detection parameters and add preprocessing and postprocessing steps, as outlined below.

Crucial Parameters for Accurate Word Detection

The most impactful parameters in the PaddleOCR word detection module (especially when using the DB algorithm) include:

det_db_thresh: This threshold determines which pixels are considered as text during binarization. Lowering it (e.g., from 0.3 to 0.2) may detect more text but may introduce noise.
det_db_box_thresh: This threshold filters out low-confidence boxes during box construction. For high-precision needs, increasing it (e.g., from 0.5 to 0.6 or 0.7) may help reduce false positives.
det_db_unclip_ratio: Controls box expansion. A higher value increases box size and captures more context. Try tuning around 1.5–2.0 depending on your font characteristics.
use_dilation: Dilates the binarized text map before boxes are extracted (off by default). This grows text regions, so it can reconnect fragmented text, but it can also join neighboring words, which works against word-level boxes.
det_box_type: Set it to 'poly' (polygon) instead of the default 'quad' (quadrilateral) if text regions are curved or irregular.
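To make the interplay of det_db_thresh and det_db_box_thresh concrete, here is a toy, pure-Python sketch of the DB post-processing idea. It is a simplification, not PaddleOCR's code: the real DB post-processing extracts contours with OpenCV, scores boxes rather than raw regions, and then unclips them. The function name threshold_and_score and the probability values are made up for illustration.

```python
from collections import deque
from typing import List, Tuple

# (x_min, y_min, x_max, y_max, mean_score) in probability-map pixels
Box = Tuple[int, int, int, int, float]


def threshold_and_score(
    prob: List[List[float]], det_db_thresh: float, det_db_box_thresh: float
) -> List[Box]:
    """Simplified DB-style post-processing on a toy probability map.

    1. Binarize: a pixel counts as text if its probability > det_db_thresh.
    2. Group text pixels into 4-connected regions (real DB uses contours).
    3. Score each region by its mean probability; drop it if the score is
       below det_db_box_thresh.
    """
    h, w = len(prob), len(prob[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or prob[y][x] <= det_db_thresh:
                continue
            # Breadth-first flood fill over neighboring text pixels.
            region = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and prob[ny][nx] > det_db_thresh:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            score = sum(prob[py][px] for py, px in region) / len(region)
            if score >= det_db_box_thresh:
                ys = [py for py, _ in region]
                xs = [px for _, px in region]
                boxes.append((min(xs), min(ys), max(xs), max(ys), score))
    return boxes


# Two "words" separated by a faint gap (0.25), plus a weak blob (0.4) on the right.
PROB = [
    [0.9, 0.9, 0.25, 0.9, 0.9, 0.0, 0.4],
    [0.9, 0.9, 0.25, 0.9, 0.9, 0.0, 0.4],
]

print(threshold_and_score(PROB, 0.3, 0.6))  # two word boxes, weak blob dropped
print(threshold_and_score(PROB, 0.2, 0.6))  # gap bridged: one box spanning both words
```

In this toy map, lowering det_db_thresh from 0.3 to 0.2 bridges the faint gap between the two words and yields a single box, which is one way word boxes can become too broad; det_db_box_thresh only decides whether the weak blob survives.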

Example tuning, as a starting point:

```python
from paddleocr import PaddleOCR

ocr = PaddleOCR(
    use_angle_cls=False,
    lang="en",
    use_gpu=True,
    use_space_char=True,
    det_db_thresh=0.25,        # lowered to detect fainter text
    det_db_box_thresh=0.6,     # raised to reduce false positives
    det_db_unclip_ratio=1.75,  # raised from the default 1.5 to enlarge boxes
    max_text_length=100,
    return_word_box=True,
)
```
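The effect of det_db_unclip_ratio can be estimated by hand. In DB, each detected (shrunk) text region is expanded outward by a distance D = A × r / L, where A is the region's area, L its perimeter and r the unclip ratio; PaddleOCR performs this offset with pyclipper. The sketch below applies that formula to an axis-aligned rectangle. The 60 × 20 px region is a made-up example, and treating the result as a rectangle ignores the rounded corners of the real offset.

```python
from typing import Tuple


def unclip_distance(width: float, height: float, unclip_ratio: float) -> float:
    """Offset distance D = A * r / L (A: area, L: perimeter, r: unclip ratio)."""
    area = width * height
    perimeter = 2 * (width + height)
    return area * unclip_ratio / perimeter


def expanded_size(width: float, height: float, unclip_ratio: float) -> Tuple[float, float]:
    """Approximate rectangle size after moving every edge outward by D."""
    d = unclip_distance(width, height, unclip_ratio)
    return width + 2 * d, height + 2 * d


# A hypothetical 60 x 20 px detected text region.
for ratio in (1.5, 1.75, 2.0):
    w, h = expanded_size(60, 20, ratio)
    print(f"unclip_ratio={ratio}: offset {unclip_distance(60, 20, ratio):.2f} px, box ~{w:.1f} x {h:.1f} px")
```

Because D grows linearly with the ratio, each step up moves every edge further out; if word boxes already look too broad, lowering the ratio toward the default 1.5 (or below) is worth trying.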

Recommended Settings & Examples for High Accuracy

While there's no "universal" ideal config, here’s what’s typically recommended for high-accuracy enterprise or document processing tasks:

Use the PP-OCRv4 (or newer) detection model, which is reported to be more accurate than PP-OCRv3 and earlier. Select it with ocr_version="PP-OCRv4", or point det_model_dir to a downloaded PP-OCRv4 detection model.
Control detection-time resizing with det_limit_side_len and det_limit_type. These are PaddleOCR constructor arguments (defaults 960 and 'max'), so no YAML edit is needed. With 'max', an image whose longer side exceeds det_limit_side_len is scaled down to that length, which shrinks small text in large screenshots.
If text may be upside down, set use_angle_cls=True (optionally with cls_model_dir). The angle classifier detects text lines rotated by 180° and does not change where boxes are placed.
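To see how much det_limit_side_len and det_limit_type shrink a screenshot, the helper below approximates the detection resize as I understand PaddleOCR's DetResizeForTest to perform it: scale by the limit, then round each side to a multiple of 32. det_resize_shape is a hypothetical re-implementation for a quick sanity check, not an import from PaddleOCR; verify it against the version you run.

```python
from typing import Tuple


def det_resize_shape(
    height: int, width: int, limit_side_len: int = 960, limit_type: str = "max"
) -> Tuple[int, int]:
    """Approximate (height, width) that the detector sees after resizing."""
    if limit_type == "max":
        # Shrink only if the longer side exceeds the limit.
        longer = max(height, width)
        ratio = limit_side_len / longer if longer > limit_side_len else 1.0
    elif limit_type == "min":
        # Enlarge only if the shorter side is below the limit.
        shorter = min(height, width)
        ratio = limit_side_len / shorter if shorter < limit_side_len else 1.0
    else:
        raise ValueError(f"unsupported limit_type: {limit_type!r}")
    resize_h = int(height * ratio)
    resize_w = int(width * ratio)
    # Round both sides to a multiple of 32, with a minimum of 32.
    resize_h = max(int(round(resize_h / 32) * 32), 32)
    resize_w = max(int(round(resize_w / 32) * 32), 32)
    return resize_h, resize_w


print(det_resize_shape(1080, 1920))                       # Full HD screenshot, defaults
print(det_resize_shape(1080, 1920, limit_side_len=1920))  # keep (almost) full size
```

For a 1920 × 1080 screenshot the default settings roughly halve both sides, so text that is 12 px tall reaches the detector at about 6 px. When slicing is enabled, the same resize presumably applies to each slice rather than to the whole screenshot.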

Image Preprocessing & Postprocessing Tips

Your use of a dynamic slicing strategy is reasonable for screenshots with varying sizes, but consider the following to further improve robustness:

Image Preprocessing:

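Preserve resolution rather than reducing it: for large screenshots with small text, raise det_limit_side_len (for example, to the image's longer side) or set det_limit_type='min', at the cost of slower detection.
Make sure images have three channels. PNG screenshots often include an alpha channel; with Pillow, image.convert("RGB") drops it, although transparent areas then keep whatever RGB values they stored instead of becoming a white background.
Apply light sharpening or contrast enhancement with PIL or OpenCV if the screenshots are blurry.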
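The preprocessing tips above can be combined into a small Pillow helper. This is a sketch under stated assumptions: the function name preprocess_screenshot, the white background, the contrast factor of 1.3 and the unsharp-mask settings are illustrative starting points, not PaddleOCR defaults, so compare detection results with and without it.

```python
import io

from PIL import Image, ImageEnhance, ImageFilter


def preprocess_screenshot(image_data: bytes, contrast: float = 1.3) -> bytes:
    """Flatten transparency onto white, boost contrast slightly, and sharpen.

    Returns PNG-encoded bytes that can be passed to ocr.ocr() like the original data.
    """
    image = Image.open(io.BytesIO(image_data))
    # Screenshots saved as PNG may carry transparency; composite onto white
    # so transparent areas do not turn into arbitrary (often black) pixels.
    if image.mode in ("RGBA", "LA") or (image.mode == "P" and "transparency" in image.info):
        image = image.convert("RGBA")
        background = Image.new("RGBA", image.size, (255, 255, 255, 255))
        image = Image.alpha_composite(background, image)
    image = image.convert("RGB")
    # Mild contrast boost (1.0 = unchanged) followed by an unsharp mask.
    image = ImageEnhance.Contrast(image).enhance(contrast)
    image = image.filter(ImageFilter.UnsharpMask(radius=2, percent=120, threshold=3))
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()
```

Since the image size is unchanged, the slice parameters can still be computed from the original bytes, e.g. ocr.ocr(preprocess_screenshot(image_data), slice=adjust_slice_params(image_data), cls=False), assuming the ocr instance and adjust_slice_params from the question.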

Postprocessing:

Tune the merge thresholds in the slice parameters (merge_x_thres and merge_y_thres). For example:

```python
"merge_x_thres": 25,
"merge_y_thres": 10
```

Smaller y thresholds reduce accidental vertical merges, useful when multiple rows of text are close together.
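To illustrate why these thresholds matter for word-level boxes, the sketch below uses a deliberately simplified merge rule. merge_boxes is hypothetical and is not how PaddleOCR implements slice merging, so check the source of your installed version before relying on the exact behavior. Under this rule, an x threshold larger than the spacing between words joins them into one box, and a generous y threshold can stack adjacent rows.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def merge_boxes(boxes: List[Box], x_thres: int, y_thres: int) -> List[Box]:
    """Hypothetical, simplified merge rule (not PaddleOCR's code).

    Boxes are visited left to right. A box is folded into an existing merged
    box when the horizontal gap between them is at most x_thres and both
    their top and bottom edges differ by at most y_thres.
    """
    merged: List[Box] = []
    for x0, y0, x1, y1 in sorted(boxes, key=lambda b: b[0]):
        for i, (mx0, my0, mx1, my1) in enumerate(merged):
            gap = x0 - mx1  # negative when the boxes overlap horizontally
            if gap <= x_thres and abs(y0 - my0) <= y_thres and abs(y1 - my1) <= y_thres:
                merged[i] = (min(mx0, x0), min(my0, y0), max(mx1, x1), max(my1, y1))
                break
        else:
            merged.append((x0, y0, x1, y1))
    return merged


# Two words on one line (12 px apart) and a word on the next line (10 px below).
WORDS = [(10, 10, 60, 30), (72, 10, 130, 30), (10, 40, 60, 60)]

print(merge_boxes(WORDS, x_thres=30, y_thres=30))  # everything collapses into one box
print(merge_boxes(WORDS, x_thres=30, y_thres=10))  # rows kept apart, words on a row joined
print(merge_boxes(WORDS, x_thres=5, y_thres=10))   # three separate word boxes
```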

For fine-grained control over how boxes are built and filtered, consider customizing the DB post-processing code (ppocr/postprocess/db_postprocess.py in the PaddleOCR repository).

For advanced use cases that need higher detection precision or custom bounding box behavior, consider fine-tuning the detection model on your own screenshot data, following the fine-tuning guide:
https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/finetune_en.md

Although Discussion #13875 focuses mainly on text recognition fine-tuning, the linked docs and configurations are applicable to detection models as well:
#13875

In summary, focus on tuning det_db_thresh, det_db_box_thresh, det_db_unclip_ratio, and postprocessing thresholds, combined with high-quality input preprocessing and possibly switching to the PP-OCRv4 detection model.

Response generated by 🤖 feifei-bot | chatgpt-4o-latest

⚠️ This is an AI-generated response. Please verify any critical information.


mkrzywda · Apr 14, 2025 · Author

But my issue is with the word bounding boxes, not with the accuracy of PP-OCR.
