Description
Currently, the extraction package always installs and imports torch and transformers, even when only the tree-based classifier (RandomForest/XGBoost) is used. This inflates Docker image size and baseline RAM usage.
Example import probe results (AWS workspace, Python 3.10); each line shows cumulative memory after adding that import:
Python baseline: 11.9 MB
+ numpy: 25.4 MB
+ pandas: 100.5 MB
+ sklearn: 158.7 MB
+ pymupdf: 183.6 MB
+ shapely: 186.5 MB
+ torch: 632.5 MB
+ transformers: 650.5 MB
+ fasttext: 650.8 MB
+ layoutparser: 650.8 MB
So importing torch alone adds roughly 446 MB (from 186.5 MB to 632.5 MB).
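For anyone who wants to reproduce numbers like these, here is a minimal sketch of an import probe. The issue does not say how the figures above were measured, so this is an assumption: it reads the process's peak resident set size from the standard-library `resource` module (Unix only) after each import. The function names `peak_rss_mb` and `probe_imports` are hypothetical.

```python
import importlib
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes, macOS reports it in bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def probe_imports(module_names):
    """Import modules one by one and record peak RSS after each step.

    Returns a list of (label, megabytes) rows. Values are cumulative,
    because earlier imports stay loaded in the same process.
    """
    rows = [("Python baseline", peak_rss_mb())]
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Record the step anyway so the output shows what was skipped.
            rows.append((f"+ {name} (not installed)", peak_rss_mb()))
            continue
        rows.append((f"+ {name}", peak_rss_mb()))
    return rows
```

Calling `probe_imports(["numpy", "pandas", "sklearn", "torch", "transformers"])` in a fresh interpreter and printing each row gives a table in the same shape as the one above. Because peak RSS never decreases, the order of imports matters: a module that shares dependencies with an earlier one will appear cheaper than it would on its own.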
Make heavy deep-learning dependencies (torch, transformers, layoutparser) optional.
Use them only when LayoutLMv3 is explicitly used.
Allow the tree-based (RandomForest/XGBoost) and baseline classifiers to run without installing these deps.
If optional deps are missing and LayoutLMv3 is requested, raise a clear error:
“torch/transformers are required for LayoutLMv3. Please install with pip install '.[deep-learning]'.”
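One possible shape for the behaviour requested above is a lazy import guard: the heavy modules are looked up only when LayoutLMv3 is selected, and a missing module produces the error message proposed in this issue. This is a sketch under assumptions, not the package's actual code. The helper names (`missing_modules`, `load_deep_learning_deps`, `build_classifier`) and the classifier keys (`"layoutlmv3"`, `"treebased"`, `"baseline"`) are hypothetical, and the returned tuples stand in for real classifier objects.

```python
import importlib
import importlib.util

# Assumption: these are the modules LayoutLMv3 needs at import time.
DEEP_LEARNING_MODULES = ("torch", "transformers")

INSTALL_HINT = (
    "torch/transformers are required for LayoutLMv3. "
    "Please install with pip install '.[deep-learning]'."
)


def missing_modules(names=DEEP_LEARNING_MODULES):
    """Return the names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def load_deep_learning_deps(names=DEEP_LEARNING_MODULES):
    """Import the heavy modules on demand, or fail with a clear message."""
    missing = missing_modules(names)
    if missing:
        raise ImportError(f"{INSTALL_HINT} (missing: {', '.join(missing)})")
    return {name: importlib.import_module(name) for name in names}


def build_classifier(kind):
    """Hypothetical factory: only the LayoutLMv3 branch touches torch."""
    if kind == "layoutlmv3":
        deps = load_deep_learning_deps()
        return ("layoutlmv3", deps)  # placeholder for the real model wrapper
    if kind in ("treebased", "baseline"):
        return (kind, None)  # no deep-learning imports on this path
    raise ValueError(f"Unknown classifier kind: {kind!r}")
```

With this layout, `build_classifier("treebased")` never imports torch, so the tree-based path keeps the lower memory baseline. For the install hint to work, the packaging metadata would also need to declare a `deep-learning` extra that lists the optional dependencies.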