PaddleOCR/latest/datasets/ocr_datasets #14413

2024-12-18T02:25:46Z

giscus[bot]
bot Dec 18, 2024

PaddleOCR/latest/datasets/ocr_datasets

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

https://paddlepaddle.github.io/PaddleOCR/latest/datasets/ocr_datasets.html

bkMoon1024 · 2024-12-18T02:25:47Z

bkMoon1024
Dec 18, 2024 — with giscus

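The docs say: "PaddleOCR also provides a data format conversion script that can convert the official-site labels into the supported data format. The data conversion tool is at ppocr/utils/gen_label.py." Has this tool been renamed?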
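
For context on what such a conversion involves, here is a minimal sketch. It is not the code of ppocr/utils/gen_label.py. It assumes the source labels are ICDAR2015-style lines (`x1,y1,x2,y2,x3,y3,x4,y4,text`) and that the target is a detection label line of the form image path, a tab, then a JSON list of `transcription`/`points` entries. The function names `parse_icdar_line` and `to_label_line` are hypothetical.

```python
import json


def parse_icdar_line(line):
    """Parse one 'x1,y1,x2,y2,x3,y3,x4,y4,text' line (assumed input format)."""
    # Remove surrounding whitespace, then a possible UTF-8 BOM at the start.
    cleaned = line.strip().lstrip("\ufeff")
    # Split at most 8 times so commas inside the text are preserved.
    parts = cleaned.split(",", 8)
    if len(parts) < 9:
        raise ValueError(f"expected 8 coordinates and a text field: {line!r}")
    coords = [int(value) for value in parts[:8]]
    # Group the flat coordinate list into four [x, y] corner points.
    points = [[coords[i], coords[i + 1]] for i in range(0, 8, 2)]
    return {"transcription": parts[8], "points": points}


def to_label_line(image_path, gt_lines):
    """Build one label line: image path, a tab, then the JSON annotations."""
    entries = [parse_icdar_line(line) for line in gt_lines if line.strip()]
    # ensure_ascii=False keeps non-ASCII transcriptions readable in the file.
    return image_path + "\t" + json.dumps(entries, ensure_ascii=False)


if __name__ == "__main__":
    sample = ["377,117,463,117,465,130,378,130,Genaxis Theatre"]
    print(to_label_line("img_1.jpg", sample))
```

Whether the script still lives at the path quoted above is a question for the maintainers; the sketch only illustrates the kind of transformation involved.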

0 replies

GreatV · 2024-12-18T02:26:38Z

GreatV
Dec 18, 2024
Maintainer

The URL provided in your question refers to the OCR datasets page of PaddleOCR, which is a lightweight and multilingual Optical Character Recognition (OCR) framework based on PaddlePaddle. This page details the resources, datasets, and tools available for OCR tasks, including data preparation, annotation, and synthesis for training and deployment.

Key Features of PaddleOCR's Datasets:

Multilingual Support: The OCR system supports over 80 languages, making it suitable for diverse applications globally.
Data Annotation and Synthesis: It provides tools for annotating and synthesizing datasets, assisting in creating high-quality training data for OCR models.
Versatile Deployment: PaddleOCR is designed to work across various platforms, including servers, mobile devices, embedded systems, and IoT devices.

How to Access the Datasets:

You can explore the details of the OCR datasets by visiting the official documentation:
https://paddlepaddle.github.io/PaddleOCR/latest/datasets/ocr_datasets.html

Related Discussions:

If you're interested in related datasets, such as those for table recognition tasks, you can refer to the following discussion:

Discussion Title: PaddleOCR/latest/datasets/table_datasets
URL: PaddleOCR/latest/datasets/table_datasets #14185

This discussion focuses on datasets for table structure recognition, which is another important feature of PaddleOCR. It complements standard OCR by extracting structured information from tables.

These resources collectively provide a comprehensive toolkit for OCR and related tasks, enabling efficient training and deployment across various applications.

Response generated by feifei-bot | chatgpt-4o-latest

0 replies

qqakashi · 2025-04-21T08:25:23Z

qqakashi
Apr 21, 2025 — with giscus

If an image contains multiple lines of text, or is a very large image completely filled with text, how should it be recognized?

0 replies

Replies: 3 comments
