Make layout deps optional, fix requires-python, lazy-import cv2#100
Open
dguido wants to merge 1 commit intozai-org:mainfrom
Open
Make layout deps optional, fix requires-python, lazy-import cv2#100dguido wants to merge 1 commit intozai-org:mainfrom
dguido wants to merge 1 commit intozai-org:mainfrom
Conversation
- Move torch, torchvision, transformers, sentencepiece, accelerate, opencv-python, and flask from core dependencies to their existing optional extras ([layout] and [server]). OCR-only mode now installs in seconds instead of pulling ~5GB of ML frameworks. - Bump requires-python from >=3.8 to >=3.10. The core dependency transformers>=5.1.0 already requires Python 3.10+, so the old lower bound caused resolver failures (e.g. uv cannot find a valid solution across Python 3.8/3.9). - Lazy-import cv2 in image_utils.crop_image_region() and defer visualization_utils imports in utils/__init__.py. opencv-python is only needed for layout detection, but the module-level import made it required even in OCR-only mode. - Fix double image preprocessing in Pipeline.process() OCR-only path. Images were encoded via load_image_to_base64 (with smart_resize), then build_request() decoded and re-encoded them through load_image_to_base64 a second time. Replace the build_request() call with direct setdefault() for the generation parameters. - Simplify [all] extra to reference [layout,server] instead of duplicating the dependency list. Update classifiers and black target-version to reflect 3.10-3.13. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Move heavy deps to optional extras: torch, torchvision, transformers, sentencepiece, accelerate, opencv-python, and flask are moved from core
dependenciesto their existing[layout]and[server]optional extras. OCR-only mode (pip install glmocr) now installs in seconds instead of pulling ~5GB of ML frameworks. Users who need layout detection install withpip install glmocr[layout]orpip install glmocr[all].Fix
requires-python: Bumped from>=3.8to>=3.10. The core dependencytransformers>=5.1.0already requires Python 3.10+, so the old lower bound caused dependency resolver failures (e.g.,uvcannot find a valid solution when it tries to satisfytransformers>=5.1.0across Python 3.8/3.9).Lazy-import
cv2: Movedimport cv2from module level inimage_utils.pyintocrop_image_region(), and deferredvisualization_utilsimports inutils/__init__.pyvia__getattr__.opencv-pythonis only needed for layout detection (polygon cropping and visualization), but the module-level import made it a hard requirement even in OCR-only mode.Fix double image preprocessing in pipeline: In the OCR-only path of
Pipeline.process(), images were encoded viaload_image_to_base64(withsmart_resize), thenbuild_request()decoded and re-encoded them throughload_image_to_base64a second time. Replaced thebuild_request()call with directsetdefault()for generation parameters.Simplify
[all]extra: Changed from duplicating the full dependency list to referencingglmocr[layout,server].Motivation
When using
glm-ocrin OCR-only mode with an external inference server (e.g., mlx-vlm on Apple Silicon, vLLM, or the MaaS API), there's no need for torch, torchvision, or opencv. The currentpyproject.tomlmakes these mandatory, which means a multi-GB install for a use case that only needsrequestsandPillow.The
requires-python >= 3.8also blocks modern Python package managers (uv) from resolving dependencies, sincetransformers >= 5.1.0dropped Python 3.8/3.9 support.Test plan
pip install .(oruv sync) installs without torch/opencvpip install ".[layout]"installs the full layout detection stackpip install ".[all]"installs everything[layout]extra installedfrom glmocr.utils import crop_image_regionsucceeds without opencv (cv2 only imported when function is called)🤖 Generated with Claude Code