docs: address PR review — enrich CLAUDE.md and use dynamic discovery for pipelines/models

scyyh11 · scyyh11 · commit f8f9c97fc336 · 2026-03-11T02:43:37.000-07:00
- CLAUDE.md: add project structure tree, code style conventions, testing
  guidance, PR guidelines, and dynamic discovery section (inspired by
  langchain AGENTS.md)
- inference_api.md: replace hardcoded pipeline/model tables with
  instructions to read __all__ from source __init__.py files, avoiding
  staleness when pipelines/models are added or removed
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -14,14 +14,45 @@ pytest tests/                  # Tests (resource-intensive skipped by default)
 pre-commit run --all-files     # Lint/format
 ```
 
-## Architecture
+## Project Structure
+
+```
+PaddleOCR/
+├── paddleocr/              # Public API (3.x) — what users import
+│   ├── __init__.py         # Top-level exports (__all__ is the source of truth)
+│   ├── _pipelines/         # High-level pipelines (OCR, PPStructureV3, etc.)
+│   ├── _models/            # Individual model wrappers (TextDetection, etc.)
+│   └── _cli.py             # CLI entry point
+├── ppocr/                  # Internal training framework (not user-facing)
+│   ├── modeling/           # Model architectures (Backbone, Neck, Head)
+│   ├── data/               # Data loading and augmentation
+│   ├── losses/             # Loss functions
+│   ├── metrics/            # Evaluation metrics
+│   └── postprocess/        # Post-processing
+├── tools/                  # Train/infer/eval scripts (tools/train.py)
+├── configs/                # YAML configs organized by task (det/, rec/, table/, etc.)
+├── deploy/                 # Deployment (C++, Docker, ONNX, mobile)
+├── tests/                  # Tests (models/ + pipelines/)
+└── agent_docs/             # Detailed AI-readable documentation
+```
 
 Two layers — understand which you're working in:
 
-- **`paddleocr/`** — Public API (3.x). `_pipelines/` has high-level pipelines (OCR, PPStructureV3), `_models/` has individual model wrappers (TextDetection, TextRecognition). Users import from here.
-- **`ppocr/`** — Internal training framework. Model architectures, data loading, losses, metrics, postprocessing. Used by `tools/train.py`, not by end users.
+- **`paddleocr/`** — Public API (3.x). `_pipelines/` has high-level pipelines, `_models/` has individual model wrappers. Users import from here.
+- **`ppocr/`** — Internal training framework. Used by `tools/train.py`, not by end users.
+
+## Discovering Available Pipelines & Models
+
+**Do NOT rely on hardcoded lists.** Always discover dynamically from source:
+
+- **Pipelines**: Read `__all__` in `paddleocr/_pipelines/__init__.py`
+- **Models**: Read `__all__` in `paddleocr/_models/__init__.py`
+- **All public exports**: Read `__all__` in `paddleocr/__init__.py`
+
+Each pipeline inherits from `PaddleXPipelineWrapper` (in `_pipelines/base.py`).
+Each model inherits from `PaddleXPredictorWrapper` (in `_models/base.py`).
 
-Other directories: `tools/` (train/infer/eval scripts), `configs/` (YAML configs by task), `deploy/` (C++, Docker, ONNX, mobile), `tests/` (models/ + pipelines/).
+To understand a specific pipeline or model, read its source file in the corresponding directory.
 
 ## Critical: 3.x API Only
 
@@ -31,6 +62,28 @@ PaddleOCR 3.x is **not backwards compatible** with 2.x. Never generate 2.x-style
 - `PPStructure` is removed — use `PPStructureV3`
 - For single-task inference, use model classes (`TextDetection`, `TextRecognition`) not `det`/`rec` params
 
+## Code Style & Conventions
+
+- Follow existing patterns in the file you're modifying
+- Use type hints for function signatures
+- Use `pre-commit run --all-files` to lint before committing — this runs ruff, trailing whitespace fixes, and other checks
+- Error messages should be clear and actionable
+- No `eval()`, `exec()`, or `pickle` on user-controlled input
+
+## Testing
+
+- Tests live in `tests/` with subdirectories `models/` and `pipelines/`
+- Run with `pytest tests/` — resource-intensive tests are skipped by default
+- When adding a new pipeline or model, add corresponding tests
+- Test the public API (`.predict()`, result object methods), not internal implementation details
+
+## PR & Commit Guidelines
+
+- PR titles: concise, lowercase, descriptive of what changed
+- PR descriptions: explain the "why", not just the "what"
+- Keep PRs focused — one logical change per PR
+- Ensure `pre-commit run --all-files` passes before pushing
+
 ## Detailed Docs
 
 Read these as needed — don't load them all upfront:
diff --git a/agent_docs/inference_api.md b/agent_docs/inference_api.md
@@ -15,38 +15,15 @@ for res in result:
 
 ## Available Pipelines
 
-All imported from `paddleocr`:
-
-| Pipeline | Purpose |
-|----------|---------|
-| `PaddleOCR` | Full OCR (detection + recognition) |
-| `PaddleOCRVL` | Vision-language OCR (v1, v1.5) |
-| `PPStructureV3` | Document structure: tables, formulas, layout |
-| `PPChatOCRv4Doc` | LLM-powered document analysis |
-| `DocUnderstanding` | VLM-based document QA |
-| `FormulaRecognitionPipeline` | Math formula recognition |
-| `SealRecognition` | Seal text detection + recognition |
-| `TableRecognitionPipelineV2` | Table structure recognition |
-| `DocPreprocessor` | Orientation, unwarping |
-| `PPDocTranslation` | Document translation |
+All imported from `paddleocr`. **To get the current list**, read `__all__` in `paddleocr/_pipelines/__init__.py`.
+
+Common pipelines include `PaddleOCR` (full OCR), `PPStructureV3` (document structure), and `DocUnderstanding` (VLM-based QA), but the authoritative list lives in the source. Each pipeline has its own file in `paddleocr/_pipelines/` — read the file to understand its constructor parameters and capabilities.
 
 ## Available Individual Models
 
-| Model | Purpose |
-|-------|---------|
-| `TextDetection` | Detect text regions |
-| `TextRecognition` | Recognize text content |
-| `LayoutDetection` | Detect document layout regions |
-| `TableClassification` | Classify table types |
-| `TableCellsDetection` | Detect table cells |
-| `TableStructureRecognition` | Recognize table structure |
-| `SealTextDetection` | Detect seal text |
-| `FormulaRecognition` | Recognize formulas |
-| `ChartParsing` | Parse charts |
-| `DocVLM` | Document vision-language model |
-| `DocImgOrientationClassification` | Classify document orientation |
-| `TextImageUnwarping` | Unwarp distorted text images |
-| `TextLineOrientationClassification` | Classify text line orientation |
+All imported from `paddleocr`. **To get the current list**, read `__all__` in `paddleocr/_models/__init__.py`.
+
+Common models include `TextDetection`, `TextRecognition`, and `LayoutDetection`, but the authoritative list lives in the source. Each model has its own file in `paddleocr/_models/` — read the file to understand its parameters and default model names.
 
 ## PaddleOCR Constructor Parameters