Open
Description
I'm using docling-serve to process documents but want to run the chunking locally using OpenAI's tiktoken. For this, I added docling[chunking-openai] to my pyproject.toml.
However, importing the HybridChunker still fails with:
RuntimeError: Extra required by module: 'chunking' by default (or 'chunking-openai' if specifically using OpenAI tokenization); to install, run: `pip install 'docling-core[chunking]'` or `pip install 'docling-core[chunking-openai]'`
docling-core[chunking-openai] does not install transformers
Lines 81 to 93 in 5ab8b8c
    chunking-openai = [
        # common:
        'semchunk (>=2.2.0,<3.0.0)',
        'tree-sitter (>=0.23.2,<1.0.0)',
        'tree-sitter-python (>=0.23.6,<1.0.0)',
        'tree-sitter-c (>=0.23.4,<1.0.0)',
        'tree-sitter-java (>=0.23.5,<1.0.0)',
        'tree-sitter-javascript (>=0.23.1,<1.0.0)',
        'tree-sitter-typescript (>=0.23.2,<1.0.0)',
        # specific:
        'tiktoken (>=0.9.0,<0.13.0)',
    ]
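To double-check which packages an extra pulls in for an installed distribution, something like the following could be used. This is only a rough sketch: the helper names `requirements_for_extra` and `installed_requirements_for_extra` are made up for illustration, and the marker matching is a simple regex rather than a full parser for requirement strings.

```python
import re
from importlib import metadata


def requirements_for_extra(requirements, extra):
    """Return the requirement specs whose marker mentions exactly `extra`.

    `requirements` is a list of strings in the format returned by
    importlib.metadata.requires(), e.g. 'pkg>=1.0; extra == "name"'.
    """
    # The closing quote right after the name keeps "chunking" from
    # matching "chunking-openai".
    pattern = re.compile(r"""extra\s*==\s*['"]%s['"]""" % re.escape(extra))
    return [req.split(";")[0].strip() for req in requirements if pattern.search(req)]


def installed_requirements_for_extra(dist_name, extra):
    """Hypothetical convenience wrapper: look up an installed distribution."""
    try:
        declared = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []
    return requirements_for_extra(declared, extra)
```

With docling-core installed, `installed_requirements_for_extra("docling-core", "chunking-openai")` should list the packages shown above, and transformers would not be expected among them.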
But transformers is imported regardless in
docling-core/docling_core/transforms/chunker/hybrid_chunker.py
Lines 14 to 18 in 5ab8b8c
    try:
        import semchunk
        from transformers import PreTrainedTokenizerBase
    except ImportError:
        raise RuntimeError(
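For illustration, one possible direction would be to check only the modules that the selected tokenizer backend actually needs, instead of importing transformers unconditionally. The sketch below is not docling-core's code: the backend names, the module grouping, and the function names are assumptions made for the example.

```python
import importlib.util

# Hypothetical grouping of importable modules per tokenizer backend.
COMMON_MODULES = ("semchunk",)
BACKEND_MODULES = {
    "huggingface": ("transformers",),
    "openai": ("tiktoken",),
}


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_chunking_backend(backend, common=COMMON_MODULES, backends=BACKEND_MODULES):
    """Raise RuntimeError only if modules needed by `backend` are missing."""
    missing = missing_modules(tuple(common) + tuple(backends[backend]))
    if missing:
        raise RuntimeError(
            f"Missing modules for '{backend}' chunking backend: {', '.join(missing)}"
        )
```

With a check along these lines, an environment that has semchunk and tiktoken but not transformers would pass `require_chunking_backend("openai")` and only fail for `"huggingface"`.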