Skip to content

Make layout deps optional, fix requires-python, lazy-import cv2#100

Open
dguido wants to merge 1 commit intozai-org:mainfrom
dguido:fix/deps-and-imports
Open

Make layout deps optional, fix requires-python, lazy-import cv2#100
dguido wants to merge 1 commit intozai-org:mainfrom
dguido:fix/deps-and-imports

Conversation

@dguido
Copy link

@dguido dguido commented Feb 18, 2026

Summary

  • Move heavy deps to optional extras: torch, torchvision, transformers, sentencepiece, accelerate, opencv-python, and flask are moved from core dependencies to their existing [layout] and [server] optional extras. OCR-only mode (pip install glmocr) now installs in seconds instead of pulling ~5GB of ML frameworks. Users who need layout detection install with pip install glmocr[layout] or pip install glmocr[all].

  • Fix requires-python: Bumped from >=3.8 to >=3.10. The core dependency transformers>=5.1.0 already requires Python 3.10+, so the old lower bound caused dependency resolver failures (e.g., uv cannot find a valid solution when it tries to satisfy transformers>=5.1.0 across Python 3.8/3.9).

  • Lazy-import cv2: Moved import cv2 from module level in image_utils.py into crop_image_region(), and deferred visualization_utils imports in utils/__init__.py via __getattr__. opencv-python is only needed for layout detection (polygon cropping and visualization), but the module-level import made it a hard requirement even in OCR-only mode.

  • Fix double image preprocessing in pipeline: In the OCR-only path of Pipeline.process(), images were encoded via load_image_to_base64 (with smart_resize), then build_request() decoded and re-encoded them through load_image_to_base64 a second time. Replaced the build_request() call with direct setdefault() for generation parameters.

  • Simplify [all] extra: Changed from duplicating the full dependency list to referencing glmocr[layout,server].

Motivation

When using glm-ocr in OCR-only mode with an external inference server (e.g., mlx-vlm on Apple Silicon, vLLM, or the MaaS API), there's no need for torch, torchvision, or opencv. The current pyproject.toml makes these mandatory, which means a multi-GB install for a use case that only needs requests and Pillow.

The requires-python >= 3.8 also blocks modern Python package managers (uv) from resolving dependencies, since transformers >= 5.1.0 dropped Python 3.8/3.9 support.

Test plan

  • pip install . (or uv sync) installs without torch/opencv
  • pip install ".[layout]" installs the full layout detection stack
  • pip install ".[all]" installs everything
  • OCR-only pipeline works without opencv installed
  • Layout pipeline works with [layout] extra installed
  • from glmocr.utils import crop_image_region succeeds without opencv (cv2 only imported when function is called)

🤖 Generated with Claude Code

- Move torch, torchvision, transformers, sentencepiece, accelerate,
  opencv-python, and flask from core dependencies to their existing
  optional extras ([layout] and [server]). OCR-only mode now installs
  in seconds instead of pulling ~5GB of ML frameworks.

- Bump requires-python from >=3.8 to >=3.10. The core dependency
  transformers>=5.1.0 already requires Python 3.10+, so the old
  lower bound caused resolver failures (e.g. uv cannot find a valid
  solution across Python 3.8/3.9).

- Lazy-import cv2 in image_utils.crop_image_region() and defer
  visualization_utils imports in utils/__init__.py. opencv-python
  is only needed for layout detection, but the module-level import
  made it required even in OCR-only mode.

- Fix double image preprocessing in Pipeline.process() OCR-only path.
  Images were encoded via load_image_to_base64 (with smart_resize),
  then build_request() decoded and re-encoded them through
  load_image_to_base64 a second time. Replace the build_request()
  call with direct setdefault() for the generation parameters.

- Simplify [all] extra to reference [layout,server] instead of
  duplicating the dependency list. Update classifiers and black
  target-version to reflect 3.10-3.13.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant