GH-3676: make sentencepiece a lazy import for BytePairEmbeddings #3696

Open
haoyu-haoyu wants to merge 1 commit into flairNLP:master from haoyu-haoyu:fix/lazy-sentencepiece-import

Conversation

@haoyu-haoyu

sentencepiece fails to build on Python 3.13 + macOS, and the package seems unmaintained (last release Feb 2024). Because token.py has a direct `from sentencepiece import SentencePieceProcessor` at module level, the entire embeddings module fails to import whenever sentencepiece is missing, even if nobody uses BytePairEmbeddings.

Moved the import into BytePairEmbeddings.__init__() using the existing lazy_import() helper that the class already uses for bpemb.
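For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of deferring an optional import to the constructor. It is not the actual diff: `lazy_import_symbol` and `SubwordEmbeddingsSketch` are hypothetical names made up for illustration, and the helper is only a stand-in for flair's `lazy_import()`, whose real signature may differ.

```python
import importlib
from typing import Any


def lazy_import_symbol(module_name: str, symbol: str, hint: str = "") -> Any:
    """Hypothetical stand-in for a lazy-import helper (not flair's lazy_import())."""
    try:
        # The import happens only when this function is called,
        # not when the enclosing module is loaded.
        module = importlib.import_module(module_name)
    except ImportError as exc:
        message = f"Could not import '{module_name}'. {hint}".strip()
        raise ImportError(message) from exc
    return getattr(module, symbol)


class SubwordEmbeddingsSketch:
    """Illustrative class: the optional dependency is resolved on construction."""

    def __init__(
        self,
        module_name: str = "sentencepiece",
        symbol: str = "SentencePieceProcessor",
    ) -> None:
        # Assumption: only this class needs the optional package, so a missing
        # install should fail here and not at module import time.
        self.processor_cls = lazy_import_symbol(
            module_name,
            symbol,
            hint="Install it to use this embedding class.",
        )
```

With this shape, importing the module that defines the class succeeds even when the optional package is absent. The `ImportError` surfaces only when someone actually constructs the class.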

Kept transformers[sentencepiece] in requirements.txt since other HF tokenizers (T5, XLM-RoBERTa, etc.) import sentencepiece internally and would break without it. This PR only fixes the direct import that blocks the module load.

Closes #3676

sentencepiece is unmaintained (last release Feb 2024) and fails to build
on Python 3.13 + macOS.  Since the direct import is only used by
BytePairEmbeddings, move it from module-level to the constructor using
the existing lazy_import() helper.

Keep transformers[sentencepiece] in requirements.txt because other HF
tokenizers (T5, XLM-RoBERTa) import sentencepiece internally and would
break without it.

Development

Successfully merging this pull request may close these issues.

[Bug]: Drop sentencepiece dependency