feat: add type hints to paddleocr/ public API #17805

Open
scyyh11 wants to merge 4 commits into PaddlePaddle:main from scyyh11:feat/type-hints-phase1

scyyh11 (Collaborator) commented Mar 13, 2026

Summary

  • Add comprehensive type annotations across the entire paddleocr/ package (42 files) for IDE autocompletion and static type checking
  • Add PEP 561 py.typed marker and shared _types.py module with reusable type aliases (ImageInput, InputType, PredictResult)
  • Annotate all model wrappers (14 subclasses + base/mixins), all pipelines (10 subclasses + base/utils), and utility modules
  • Add mypy configuration in pyproject.toml (Python 3.8 compatible via `from __future__ import annotations`) and a CI check in the codestyle workflow

Details

  • Uses the modern `X | None` union syntax everywhere, enabled by `from __future__ import annotations`
  • mypy passes cleanly: "Success: no issues found in 46 source files"
  • No runtime behavior changes — annotations are purely for static analysis and IDE support
  • Phase 1 scope: function signatures and module-level constants. Phase 3 will refine PredictResult with TypedDict
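
To illustrate the union-syntax and no-runtime-change points above, here is a minimal sketch. The function `configure` and its parameters are hypothetical and are not taken from paddleocr. With `from __future__ import annotations`, each annotation is stored as a string and is not evaluated at import time, so `float | None` does not raise on interpreters older than 3.10, and the function body runs exactly as it would without annotations.

```python
from __future__ import annotations  # PEP 563: annotations are kept as strings


# Hypothetical function for illustration; not part of paddleocr.
def configure(model_dir: str | None = None, threshold: float | None = None) -> dict:
    """Return only the options that were explicitly set."""
    options = {"model_dir": model_dir, "threshold": threshold}
    return {key: value for key, value in options.items() if value is not None}


# The annotations are plain strings, so `str | None` is never evaluated here.
print(configure.__annotations__)
# Calling the function is unaffected by the annotations.
print(configure(threshold=0.5))
```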

Test plan

  • `mypy paddleocr/` passes with zero errors
  • `pre-commit run --all-files` passes

paddle-bot (bot) commented Mar 13, 2026

Thanks for your contribution!

scyyh11 force-pushed the feat/type-hints-phase1 branch 2 times, most recently from e70c90e to 54aef7a, March 13, 2026 05:39
scyyh11 added 3 commits March 13, 2026 01:40
Add comprehensive type annotations across the entire paddleocr/ package
for IDE autocompletion and static type checking support.

- Add PEP 561 py.typed marker and shared _types.py (ImageInput, InputType, PredictResult)
- Annotate all model base classes, mixins, and 13 model subclasses
- Annotate all pipeline base, utils, and 10 pipeline subclasses
- Annotate utility modules (_utils/cli, deprecation, logging) and core modules
- Add mypy configuration in pyproject.toml (Python 3.8 compat via `from __future__ import annotations`)
- Add mypy type check step to CI codestyle workflow

Bump python_version to 3.9 (mypy dropped 3.8 support) and add PIL
to ignore_missing_imports.

scyyh11 force-pushed the feat/type-hints-phase1 branch from 54aef7a to a96572a, March 13, 2026 05:40
scyyh11 requested a review from Bobholamovic, March 13, 2026 05:42
scyyh11 changed the title from "feat: add type hints to paddleocr/ public API (Phase 1)" to "feat: add type hints to paddleocr/ public API", Mar 13, 2026

  super().__init__(*args, **kwargs)

- def _get_extra_paddlex_predictor_init_args(self):
+ def _get_extra_paddlex_predictor_init_args(self) -> dict[str, Any]:
Member

PaddleOCR requires Python 3.8 or later, and syntax such as dict[str, Any] is not supported in Python 3.8. This may need to be reworked consistently across the package.

Collaborator Author

I used from __future__ import annotations here to guard against this problem.

Member

As I understand it, this mainly makes the type hints lazily evaluated, so importing the module at runtime does not raise an error. However, if a user manually evaluates these type hints on Python 3.8, an error will still be raised. This is worth evaluating.
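
The concern above can be reproduced with a short sketch. The function `get_extra_init_args` is a hypothetical stand-in for the annotated method, and `resolve_hints` is an illustrative helper. The module imports cleanly on any version because the annotation stays a string, but `typing.get_type_hints` evaluates that string, and evaluating `dict[str, Any]` raises `TypeError` on Python 3.8, where builtin generics are not yet subscriptable (PEP 585 arrived in 3.9).

```python
from __future__ import annotations

import sys
import typing
from typing import Any


# Hypothetical stand-in for an annotated method.
def get_extra_init_args() -> dict[str, Any]:
    return {}


def resolve_hints(func):
    """Evaluate the stringized annotations, returning the error if that fails."""
    try:
        return typing.get_type_hints(func)
    except TypeError as exc:
        # Expected on Python 3.8 only: `dict[...]` cannot be subscripted there.
        return exc


resolved = resolve_hints(get_extra_init_args)
# A dict of resolved hints on 3.9+, a TypeError instance on 3.8.
print(sys.version_info[:2], resolved)
```

If evaluation on 3.8 needs to work, one option is to spell such hints as `typing.Dict[str, Any]` and `typing.Optional[...]`, which evaluate on every supported version; whether that trade-off is worthwhile is left to the discussion here.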

pipeline_version: str = _DEFAULT_PIPELINE_VERSION,
layout_detection_model_name: str | None = None,
layout_detection_model_dir: str | None = None,
layout_threshold: float | None = None,
Member

Some parameters may have types that are not precise enough. For example, this parameter should also accept a dict. The type hints should follow the descriptions in the documentation.
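
As a rough sketch of what a widened annotation could look like, the helper below accepts either a single float or a dict. The name `resolve_threshold`, the `class_id` and `default` parameters, and the per-class meaning of the dict are all assumptions made for illustration; the comment above only states that a dict should also be accepted.

```python
from typing import Optional, Union


# Hypothetical helper; the per-class dict semantics are an assumption.
def resolve_threshold(
    layout_threshold: Optional[Union[float, dict]] = None,
    class_id: int = 0,
    default: float = 0.5,
) -> float:
    """Pick the threshold that applies to one class id."""
    if layout_threshold is None:
        return default
    if isinstance(layout_threshold, dict):
        # Fall back to the default for classes the dict does not mention.
        return float(layout_threshold.get(class_id, default))
    return float(layout_threshold)
```

The `typing.Union` spelling is used here so the hint also evaluates on Python 3.8; under `from __future__ import annotations` the same hint could be written as `float | dict | None`.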

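scyyh11 (Collaborator, Author) commented Mar 14, 2026

PaddleX currently ships no py.typed marker or type stubs, so internal types such as DetResult and OCRResult cannot be used for external type checking. This PR annotates such parameters uniformly as Any, and return values use PredictResult = Dict[str, Any]. The remaining parameter types are aligned with the PaddleX signatures.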
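
A minimal sketch of this convention follows. The alias `PredictResult = Dict[str, Any]` is the one named above; the wrapper `to_predict_result` and its `raw_result` parameter are hypothetical and only show how an untyped upstream object might be annotated as `Any` while the return value uses the shared alias.

```python
from typing import Any, Dict

# Alias named in the comment above.
PredictResult = Dict[str, Any]


# Hypothetical wrapper; `raw_result` stands in for an untyped upstream object,
# such as a detection result, so it is annotated as Any.
def to_predict_result(raw_result: Any) -> PredictResult:
    """Copy a mapping-like upstream result into a plain dict."""
    return dict(raw_result)
```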

- Widen layout param types (threshold, unclip_ratio, merge_bboxes_mode)
  to match paddlex per-pipeline signatures
- Fix DocUnderstanding.predict input type: InputType → dict
- Remove dict from layout_unclip_ratio in formula/seal pipelines
  where paddlex doesn't accept it
- Use bare dict/tuple to match paddlex exactly, no refinement
scyyh11 force-pushed the feat/type-hints-phase1 branch from 506091c to 1f69d6a, March 14, 2026 04:45