- **`paddleocr/`** — Public API (3.x). `_pipelines/` has high-level pipelines (OCR, PPStructureV3), `_models/` has individual model wrappers (TextDetection, TextRecognition). Users import from here.
- **`ppocr/`** — Internal training framework. Model architectures, data loading, losses, metrics, postprocessing. Used by `tools/train.py`, not by end users.

## Discovering Available Pipelines & Models

**Do NOT rely on hardcoded lists.** Always discover dynamically from source:

- **Pipelines**: Read `__all__` in `paddleocr/_pipelines/__init__.py`
- **Models**: Read `__all__` in `paddleocr/_models/__init__.py`
- **All public exports**: Read `__all__` in `paddleocr/__init__.py`
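A minimal sketch of reading `__all__` statically, assuming the export list is written as a literal (`__all__ = [...]`, optionally extended with `__all__ += [...]`). Parsing with the stdlib `ast` module avoids importing `paddleocr`, which would also pull in its heavy runtime dependencies. The helper name `read_dunder_all` is hypothetical and not part of PaddleOCR.

```python
from __future__ import annotations

import ast
from pathlib import Path


def read_dunder_all(init_path: str | Path) -> list[str]:
    """Return the names in a module's ``__all__`` without importing the module.

    Handles top-level literal assignments (``__all__ = [...]`` or ``(...)``)
    and literal extensions (``__all__ += [...]``). Anything computed at
    runtime makes ``ast.literal_eval`` raise ``ValueError``.
    """
    tree = ast.parse(Path(init_path).read_text(encoding="utf-8"))
    names: list[str] = []
    for node in tree.body:  # top-level statements only
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "__all__" for t in node.targets
        ):
            # A later plain assignment replaces anything collected so far.
            names = list(ast.literal_eval(node.value))
        elif (
            isinstance(node, ast.AugAssign)
            and isinstance(node.target, ast.Name)
            and node.target.id == "__all__"
            and isinstance(node.op, ast.Add)
        ):
            names.extend(ast.literal_eval(node.value))
    return names
```

Run from the repository root, `read_dunder_all("paddleocr/_models/__init__.py")` would return the model export list under that assumption. If `__all__` is built dynamically, fall back to reading the file by hand.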

Each pipeline inherits from `PaddleXPipelineWrapper` (in `_pipelines/base.py`).
Each model inherits from `PaddleXPredictorWrapper` (in `_models/base.py`).
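To confirm which classes in a directory build on one of these wrappers, a static scan of class definitions is enough. This is a sketch with a hypothetical helper, `find_subclasses`; it only matches direct bases written as `Base` or `module.Base`, so a class that inherits through an intermediate class is not reported.

```python
from __future__ import annotations

import ast
from pathlib import Path


def find_subclasses(source_dir: str | Path, base_name: str) -> dict[str, str]:
    """Map class name -> file name for every class in ``source_dir/*.py``
    that lists ``base_name`` directly among its bases.
    """
    found: dict[str, str] = {}
    for path in sorted(Path(source_dir).glob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if not isinstance(node, ast.ClassDef):
                continue
            for base in node.bases:
                # `class X(Base)` gives ast.Name; `class X(mod.Base)` gives ast.Attribute.
                if isinstance(base, ast.Name):
                    name = base.id
                elif isinstance(base, ast.Attribute):
                    name = base.attr
                else:
                    continue  # e.g. subscripted generics; not handled here
                if name == base_name:
                    found[node.name] = path.name
                    break
    return found
```

For example, `find_subclasses("paddleocr/_pipelines", "PaddleXPipelineWrapper")` or `find_subclasses("paddleocr/_models", "PaddleXPredictorWrapper")`.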

Other directories: `tools/` (train/infer/eval scripts), `configs/` (YAML configs by task), `deploy/` (C++, Docker, ONNX, mobile), `tests/` (`models/` and `pipelines/`).

To understand a specific pipeline or model, read its source file in the corresponding directory.

## Critical: 3.x API Only

PaddleOCR 3.x is **not backwards compatible** with 2.x. Never generate 2.x-style …

- `PPStructure` is removed — use `PPStructureV3`
- For single-task inference, use model classes (`TextDetection`, `TextRecognition`) rather than the `det`/`rec` params
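When reviewing generated code, a rough static check can catch the two 2.x patterns listed above. This is a heuristic sketch, not an official PaddleOCR tool: the helper `find_2x_usage` and its pattern tables are hypothetical, it flags any call that passes `det=` or `rec=` regardless of which function is being called, and it only inspects `from paddleocr import ...` statements for removed names.

```python
from __future__ import annotations

import ast

# Hypothetical pattern tables built from the rules above; extend as needed.
REMOVED_NAMES = {"PPStructure": "use PPStructureV3"}
LEGACY_KWARGS = {"det", "rec"}


def find_2x_usage(source: str) -> list[str]:
    """Return human-readable warnings for likely 2.x-style PaddleOCR usage."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module == "paddleocr":
            for alias in node.names:
                if alias.name in REMOVED_NAMES:
                    hits.append(
                        (node.lineno, f"`{alias.name}` is removed; {REMOVED_NAMES[alias.name]}")
                    )
        elif isinstance(node, ast.Call):
            # kw.arg is None for **kwargs, so it never matches the set.
            for kw in node.keywords:
                if kw.arg in LEGACY_KWARGS:
                    hits.append(
                        (
                            node.lineno,
                            f"`{kw.arg}=` is a 2.x-style parameter; "
                            "use TextDetection/TextRecognition instead",
                        )
                    )
    # ast.walk is breadth-first, so sort to report in source order.
    return [f"line {lineno}: {msg}" for lineno, msg in sorted(hits)]
```

An empty result only rules out these two patterns; it does not prove the code is valid 3.x usage.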

## Code Style & Conventions

- Follow existing patterns in the file you're modifying
- Use type hints for function signatures
- Use `pre-commit run --all-files` to lint before committing — this runs ruff, trailing whitespace fixes, and other checks
- Error messages should be clear and actionable
- No `eval()`, `exec()`, or `pickle` on user-controlled input
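A small illustration of these conventions together: a type-hinted signature, error messages that state what was expected and give an example, and `ast.literal_eval` in place of `eval()` for parsing user input. The `parse_box` helper and its `[x_min, y_min, x_max, y_max]` format are hypothetical, not PaddleOCR APIs.

```python
from __future__ import annotations

import ast


def parse_box(raw: str) -> list[int]:
    """Parse a user-supplied box string such as ``"[10, 20, 110, 60]"``.

    ``ast.literal_eval`` only evaluates Python literals, so unlike ``eval()``
    it cannot execute code smuggled into ``raw``.
    """
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError) as exc:
        raise ValueError(
            f"Could not parse box {raw!r}: expected a list of 4 integers "
            "[x_min, y_min, x_max, y_max], e.g. '[10, 20, 110, 60]'."
        ) from exc
    # bool is a subclass of int, so exclude it explicitly.
    if not (
        isinstance(value, (list, tuple))
        and len(value) == 4
        and all(isinstance(v, int) and not isinstance(v, bool) for v in value)
    ):
        raise ValueError(
            f"Invalid box {raw!r}: expected exactly 4 integers "
            f"[x_min, y_min, x_max, y_max], got {value!r}."
        )
    return list(value)
```

Prefer `json.loads` instead when the input is meant to be JSON; either way, nothing in the string is executed.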

## Testing

- Tests live in `tests/` with subdirectories `models/` and `pipelines/`
- Run with `pytest tests/` — resource-intensive tests are skipped by default
- When adding a new pipeline or model, add corresponding tests
- Test the public API (`.predict()`, result object methods), not internal implementation details

## PR & Commit Guidelines

- PR titles: concise, lowercase, descriptive of what changed
- PR descriptions: explain the "why", not just the "what"
- Keep PRs focused — one logical change per PR
- Ensure `pre-commit run --all-files` passes before pushing

## Detailed Docs

Read these as needed — don't load them all upfront:

## Available Pipelines

All imported from `paddleocr`. **To get the current list**, read `__all__` in `paddleocr/_pipelines/__init__.py`.

Common pipelines include `PaddleOCR` (full OCR), `PPStructureV3` (document structure), and `DocUnderstanding` (VLM-based QA), but the authoritative list lives in the source. Each pipeline has its own file in `paddleocr/_pipelines/` — read the file to understand its constructor parameters and capabilities.
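Since each pipeline lives in its own file, one way to find that file without guessing its name is to follow the relative imports in the package's `__init__.py`. A sketch, assuming the package re-exports classes with single-dot relative imports such as `from .some_module import SomeClass`; the helper `exported_sources` is hypothetical, and names imported any other way are skipped.

```python
from __future__ import annotations

import ast
from pathlib import Path


def exported_sources(package_dir: str | Path) -> dict[str, Path]:
    """Map names re-exported by ``package_dir/__init__.py`` to their source files.

    Only top-level, single-dot relative imports such as
    ``from .some_module import SomeClass`` are followed.
    """
    package_dir = Path(package_dir)
    tree = ast.parse((package_dir / "__init__.py").read_text(encoding="utf-8"))
    sources: dict[str, Path] = {}
    for node in tree.body:
        if not (isinstance(node, ast.ImportFrom) and node.level == 1 and node.module):
            continue
        module_path = package_dir.joinpath(*node.module.split("."))
        # A submodule is either `<name>.py` or a subpackage `<name>/__init__.py`.
        candidate = module_path.with_suffix(".py")
        if not candidate.exists():
            candidate = module_path / "__init__.py"
        for alias in node.names:
            if alias.name != "*":
                sources[alias.asname or alias.name] = candidate
    return sources
```

Under that assumption, `exported_sources("paddleocr/_pipelines")["PPStructureV3"]` would give the path of the file to open.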

## Available Individual Models

| Model | Purpose |
|-------|---------|
| `TextDetection` | Detect text regions |
| `TextRecognition` | Recognize text content |
| `LayoutDetection` | Detect document layout regions |
| `TextImageUnwarping` | Unwarp distorted text images |
| `TextLineOrientationClassification` | Classify text line orientation |

All imported from `paddleocr`. **To get the current list**, read `__all__` in `paddleocr/_models/__init__.py`.

Common models include `TextDetection`, `TextRecognition`, `LayoutDetection`, but the authoritative list lives in the source. Each model has its own file in `paddleocr/_models/` — read the file to understand its parameters and default model names.
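To list a model's constructor parameters and their defaults without importing it, the `__init__` signature can be read from the AST. A sketch with the hypothetical helper `init_params` (Python 3.9+ for `ast.unparse`): defaults come back as source text, `None` means the parameter has no default, and `*args`/`**kwargs` are not reported. If the class inherits `__init__` from its base, the helper returns `{}`, and the base class in `_models/base.py` is the place to look.

```python
from __future__ import annotations

import ast
from pathlib import Path


def init_params(source_file: str | Path, class_name: str) -> dict[str, str | None]:
    """Return ``{name: default}`` for the parameters of ``class_name.__init__``.

    Each default is its source text; ``None`` means "no default".
    ``self``, ``*args`` and ``**kwargs`` are omitted.
    """
    tree = ast.parse(Path(source_file).read_text(encoding="utf-8"))
    for node in ast.walk(tree):
        if not (isinstance(node, ast.ClassDef) and node.name == class_name):
            continue
        for item in node.body:
            if isinstance(item, ast.FunctionDef) and item.name == "__init__":
                args = item.args
                positional = args.posonlyargs + args.args
                # Positional defaults belong to the *last* len(defaults) parameters.
                pad = [None] * (len(positional) - len(args.defaults))
                pairs = list(zip(positional, pad + args.defaults))
                # kw_defaults holds None for keyword-only params without a default.
                pairs += zip(args.kwonlyargs, args.kw_defaults)
                return {
                    arg.arg: None if default is None else ast.unparse(default)
                    for arg, default in pairs
                    if arg.arg != "self"
                }
        return {}  # class found, but __init__ is inherited
    raise LookupError(f"class {class_name!r} not found in {source_file}")
```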