PaddleOCR for huge screenshot (3858x2118, 2576x1408) #14677

mkrzywda · 2025-02-13T13:31:30Z

mkrzywda
Feb 13, 2025

I'm currently working with PaddleOCR to process screenshots of various sizes. I've encountered an issue where PaddleOCR does not correctly recognize all text elements in larger screenshots, specifically those with resolutions of 3858x2118 and 2576x1408. However, it works perfectly for screenshots with resolutions of 1936x1048 and 2578x1398.

I attempted to split larger images into smaller fragments, process them individually with OCR, and then merge the results. While this approach works, I'm wondering if there is a better solution.
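The split, OCR, merge approach described above can be sketched as follows. This is only a minimal illustration, not a PaddleOCR feature: the helper names (`axis_starts`, `tile_boxes`, `to_full_image`), the tile size, and the overlap are all assumptions chosen for the example. It computes overlapping tile rectangles and shifts tile-local coordinates back into the full screenshot; cropping and the OCR call itself are left out.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom) in pixels
Point = Tuple[float, float]       # (x, y) in pixels


def axis_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so that the tiles cover [0, length)."""
    if length <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_boxes(width: int, height: int, tile_w: int = 1280,
               tile_h: int = 1024, overlap: int = 128) -> List[Box]:
    """Overlapping tile rectangles, row by row (sizes are illustrative)."""
    if not 0 <= overlap < min(tile_w, tile_h):
        raise ValueError("overlap must be non-negative and smaller than the tile")
    return [
        (x, y, min(x + tile_w, width), min(y + tile_h, height))
        for y in axis_starts(height, tile_h, overlap)
        for x in axis_starts(width, tile_w, overlap)
    ]


def to_full_image(points: Sequence[Point], tile: Box) -> List[Point]:
    """Shift tile-local points back into full-image coordinates."""
    left, top = tile[0], tile[1]
    return [(x + left, y + top) for x, y in points]


# Tiles for the larger screenshot size from the question.
tiles = tile_boxes(3858, 2118)
```

Text that falls inside an overlap strip will be detected in two tiles, so the merged result still needs de-duplication, for example by dropping boxes that overlap heavily with an already accepted box.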

Answered by GreatV

Feb 13, 2025


GreatV · 2025-02-13T13:32:32Z

GreatV
Feb 13, 2025
Maintainer

It looks like you're facing challenges with PaddleOCR when processing large screenshots (3858x2118, 2576x1408) while it works fine on smaller ones. Based on the provided discussion, here are some suggestions to improve OCR performance on large images:

Possible Issues:

Image Resolution and Model Constraints
- PaddleOCR's text detector resizes each input to a bounded side length before inference (the det_limit_side_len setting), so small text in very large screenshots can shrink too far to be detected.
- Larger images may require more memory, leading to partial processing or skipped text elements.
Text Density and Clustering
- If the text density is high, the OCR model might struggle to detect and extract text properly.
- Many text recognition training sets consist of single-word image crops, so the model may not be optimized for dense multi-word text.
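To make the resolution point concrete, here is a rough calculation of how much a screenshot shrinks before detection. It assumes the PaddleOCR 2.x defaults as I understand them (`det_limit_side_len=960`, `det_limit_type="max"`, sides rounded to multiples of 32); check these against your installed version. The function `approx_det_resize` and the 14 px glyph height are hypothetical and only approximate the library's resize step.

```python
from typing import Tuple


def approx_det_resize(width: int, height: int, limit_side_len: int = 960,
                      limit_type: str = "max") -> Tuple[int, int, float]:
    """Approximate the detector's input size and scale factor.

    Assumed defaults: limit_side_len=960 and limit_type="max".
    """
    longest, shortest = max(width, height), min(width, height)
    if limit_type == "max":
        # Shrink only when the longest side exceeds the limit.
        ratio = limit_side_len / longest if longest > limit_side_len else 1.0
    elif limit_type == "min":
        # Enlarge only when the shortest side is below the limit.
        ratio = limit_side_len / shortest if shortest < limit_side_len else 1.0
    else:
        raise ValueError("limit_type must be 'max' or 'min'")
    # Round each side to a multiple of 32, with a floor of 32.
    new_w = max(round(int(width * ratio) / 32) * 32, 32)
    new_h = max(round(int(height * ratio) / 32) * 32, 32)
    return new_w, new_h, ratio


GLYPH_PX = 14  # hypothetical text height in the original screenshot

for w, h in [(1936, 1048), (3858, 2118)]:
    new_w, new_h, ratio = approx_det_resize(w, h)
    print(f"{w}x{h} -> {new_w}x{new_h}, "
          f"{GLYPH_PX} px text becomes about {GLYPH_PX * ratio:.1f} px")
```

Under these assumptions the 3858x2118 screenshot is scaled by roughly 0.25 and the 1936x1048 one by roughly 0.5, so the same UI text reaches the detector at about half the height in the larger image.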

Potential Solutions:

Preprocessing: Resize or Downscale Images
- Try resizing the images to a slightly lower resolution while maintaining aspect ratio.
- Reduce image complexity (e.g., by converting to grayscale or increasing contrast) before passing it to PaddleOCR.
Use Sliding Window or Adaptive Splitting
- Since splitting images has worked for you, consider an adaptive approach:
  - Use an overlapping sliding window to ensure no text is missed.
  - Implement automated text clustering before running OCR, segmenting relevant text areas dynamically.
Optimize PaddleOCR Model Configuration
- Check if a larger detection model (e.g., the server variant ch_PP-OCRv4_server_det rather than the lightweight ch_PP-OCRv4_det) improves performance.
- Adjust detection parameters such as det_limit_side_len, det_db_box_thresh, and det_db_unclip_ratio to tune text detection in larger images.
Train a Custom Model for Large Text Regions
- If your use case requires handling multiple text clusters, you might need to train PaddleOCR on a dataset with similar characteristics.
- A custom dataset with multi-word text regions, rather than mostly single-word images, could improve accuracy.
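As a starting point for the configuration tuning suggested above, the sketch below builds constructor settings that let the detector see a large screenshot closer to full resolution. The keyword names are PaddleOCR 2.x constructor arguments as I understand them; the helper names, `lang="en"`, and the threshold values are assumptions to adjust, not recommended settings.

```python
def detection_settings(longest_side: int, box_thresh: float = 0.5,
                       unclip_ratio: float = 1.8) -> dict:
    """Build PaddleOCR keyword arguments for large inputs (illustrative values)."""
    if longest_side <= 0:
        raise ValueError("longest_side must be positive")
    # Round the limit up to a multiple of 32 so the image is not shrunk.
    side_len = ((longest_side + 31) // 32) * 32
    return {
        "lang": "en",                        # assumption: English screenshots
        "det_limit_type": "max",
        "det_limit_side_len": side_len,
        "det_db_box_thresh": box_thresh,     # lower keeps weaker boxes
        "det_db_unclip_ratio": unclip_ratio, # higher expands detected boxes
    }


def build_ocr(settings: dict):
    """Create the OCR engine; imported lazily so the helper above runs alone."""
    from paddleocr import PaddleOCR
    return PaddleOCR(**settings)


settings = detection_settings(3858)
# ocr_inst = build_ocr(settings)  # requires paddleocr to be installed
```

Raising `det_limit_side_len` increases memory use and inference time, so it is worth comparing against tiling or the built-in slice operation on a few representative screenshots.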

Additional Considerations:

If resource constraints are an issue, try running PaddleOCR on a GPU for better performance.
If text recognition accuracy varies significantly across resolutions, consider testing different versions of PaddleOCR models (PP-OCRv3 vs. PP-OCRv4).

Would you like specific code examples to implement image splitting or model tuning? Let me know how I can assist further! 🚀

Response generated by feifei-bot | chatgpt-4o-latest


GreatV · 2025-02-13T13:35:14Z

GreatV
Feb 13, 2025
Maintainer

To process large images or documents with PaddleOCR for detection and recognition, use the slice operation like this:

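from paddleocr import PaddleOCR
# ocr_settings: dict of constructor kwargs; slice_params: dict of slice settings (see the link below), not the builtin `slice`
ocr_inst = PaddleOCR(**ocr_settings)
results = ocr_inst.ocr(img, det=True, rec=True, slice=slice_params, cls=False, bin=False, inv=False, alpha_color=False)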

For more details, visit: https://paddlepaddle.github.io/PaddleOCR/latest/en/ppocr/blog/slice.html
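A fuller sketch of the slice call might look like the following. The dictionary keys and example values are the ones I recall from the linked slice page, so treat them as assumptions and confirm them there; the helper names `make_slice_params` and `ocr_large_image` are hypothetical.

```python
def make_slice_params(horizontal_stride: int = 300, vertical_stride: int = 500,
                      merge_x_thres: int = 50, merge_y_thres: int = 35) -> dict:
    """Build the dict passed as `slice=` (key names assumed from the docs)."""
    values = {
        "horizontal_stride": horizontal_stride,  # window step along x, in pixels
        "vertical_stride": vertical_stride,      # window step along y, in pixels
        "merge_x_thres": merge_x_thres,          # x distance for merging boxes
        "merge_y_thres": merge_y_thres,          # y distance for merging boxes
    }
    for name, value in values.items():
        if value <= 0:
            raise ValueError(f"{name} must be positive")
    return values


def ocr_large_image(image_path: str, ocr_settings: dict, slice_params: dict):
    """Run detection and recognition with slicing enabled."""
    from paddleocr import PaddleOCR  # lazy import: needs paddleocr installed
    ocr_inst = PaddleOCR(**ocr_settings)
    return ocr_inst.ocr(image_path, det=True, rec=True, cls=False,
                        slice=slice_params)


slice_params = make_slice_params()
# results = ocr_large_image("screenshot.png", {"lang": "en"}, slice_params)
```

Naming the variable `slice_params` instead of `slice` avoids shadowing Python's built-in `slice` type, which is easy to pass by accident if the variable was never defined.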
