Make package zip-safe by migrating to importlib.resources #1212
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Pull request overview
This PR migrates the package from `__file__`-based resource access to the `importlib.resources` API to enable zip-safe installation. The change ensures that corpus files and binary models can be accessed whether the package is installed as a regular directory or as a zip archive.
Changes:
- Migrated text corpus file reading to use `files().joinpath().read_text()` for direct in-memory parsing
- Implemented lazy loading with `as_file()` context manager for binary models that require file paths
- Updated path resolution to use `files()` API instead of `__file__`
- Removed `zip-safe = false` directive from pyproject.toml
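The two access patterns listed above can be sketched as follows. This is a minimal illustration, not the PR's code: the package name `demo_zip_pkg` and the file `words.txt` are hypothetical, and the snippet builds a throwaway zipped package so it runs without PyThaiNLP installed.

```python
import importlib
import sys
import tempfile
import zipfile
from importlib.resources import as_file, files
from pathlib import Path

# Build a throwaway zipped package (hypothetical name "demo_zip_pkg")
# holding one text resource, then make it importable.
tmp_dir = Path(tempfile.mkdtemp())
archive = tmp_dir / "demo_zip_pkg.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_zip_pkg/__init__.py", "")
    zf.writestr("demo_zip_pkg/words.txt", "alpha\nbeta\n")
sys.path.insert(0, str(archive))
importlib.invalidate_caches()

resource = files("demo_zip_pkg").joinpath("words.txt")

# Text resources can be parsed in memory; no filesystem path is needed.
words = resource.read_text(encoding="utf-8").splitlines()

# Libraries that insist on a real path get one from as_file(). For a zipped
# package the file is extracted to a temporary location that is only
# guaranteed to exist inside the "with" block.
with as_file(resource) as real_path:
    path_existed = real_path.is_file()
    first_line = real_path.read_text(encoding="utf-8").splitlines()[0]
```

Reading text directly avoids any extraction step, which is why it suits corpus files; `as_file()` is reserved for consumers that can only open a path.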
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| pythainlp/tools/path.py | Replaced __file__ with files() API for package path resolution |
| pythainlp/corpus/core.py | Migrated corpus file reading to use read_text() for in-memory text parsing |
| pythainlp/corpus/th_en_translit.py | Updated transliteration dictionary loading to use read_text() |
| pythainlp/tokenize/nlpo3.py | Implemented lazy loading for nlpo3 dictionary with as_file() |
| pythainlp/tokenize/han_solo.py | Implemented lazy loading for CRF model with as_file() |
| pythainlp/tag/crfchunk.py | Updated CRF model loading to use as_file() context manager |
| pythainlp/spell/symspellpy.py | Implemented lazy loading for symspell dictionaries with as_file() |
| pyproject.toml | Removed zip-safe = false configuration |
@copilot Read the comments in this thread, analyze the suggestions, recheck safety, and implement suitable changes.
Address code review feedback:
- Keep context managers open for pycrfsuite and SymSpell models
- Use string imports for files() instead of module imports
- Add proper cleanup in __del__ methods where appropriate

Co-authored-by: bact <128572+bact@users.noreply.github.com>
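To illustrate why the context has to stay open, here is a hedged sketch of a wrapper that enters `as_file()` manually and holds it until an explicit `close()`. The class `PathBasedModel` and the package `demo_model_pkg` are hypothetical stand-ins; the real pycrfsuite and SymSpell wrappers in the PR may be organized differently.

```python
import importlib
import sys
import tempfile
from importlib.resources import as_file, files
from pathlib import Path

# Throwaway on-disk package (hypothetical name "demo_model_pkg") with a fake
# binary model file, so the sketch runs without any real model.
pkg_root = Path(tempfile.mkdtemp())
(pkg_root / "demo_model_pkg").mkdir()
(pkg_root / "demo_model_pkg" / "__init__.py").write_text("")
(pkg_root / "demo_model_pkg" / "model.bin").write_bytes(b"\x00\x01\x02")
sys.path.insert(0, str(pkg_root))
importlib.invalidate_caches()


class PathBasedModel:
    """Hypothetical stand-in for a library that needs a real file path."""

    def __init__(self):
        resource = files("demo_model_pkg").joinpath("model.bin")
        # Enter the context manually and keep it on an attribute so that
        # any extracted temporary file outlives this method.
        self._model_file_ctx = as_file(resource)
        self._model_path = self._model_file_ctx.__enter__()

    def size(self):
        # The path is still valid here because the context is still open.
        return self._model_path.stat().st_size

    def close(self):
        # Exit the context at most once; this removes any temporary copy.
        if getattr(self, "_model_file_ctx", None) is not None:
            self._model_file_ctx.__exit__(None, None, None)
            self._model_file_ctx = None

    def __del__(self):
        # Best-effort fallback only; __del__ is not guaranteed to run.
        self.close()


model = PathBasedModel()
model_size = model.size()
model.close()
```

If the `with` block were closed inside `__init__`, a zipped install could delete the extracted file before the model ever reads it, which is the failure this pattern is meant to avoid.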
@copilot Should we make the context manager private? Does the user need access to it? What is the best practice?
Change self.model_path_ctx to self._model_path_ctx following Python convention for private attributes. The context manager is an implementation detail that users don't need to access.

Co-authored-by: bact <128572+bact@users.noreply.github.com>
Yes, context managers should be private. Fixed in df59213. Best practices:
@copilot analyze the comments in this thread and make appropriate changes.
Address code review feedback:
- Add threading locks to ensure thread-safe lazy loading
- Implement double-check locking pattern to prevent race conditions
- Make CRFchunk a context manager for explicit resource management
- Document that __del__ is unreliable and context manager is preferred
- Add documentation about zip-safe behavior in get_pythainlp_path()
- Document that context managers are kept alive for program lifetime

Co-authored-by: bact <128572+bact@users.noreply.github.com>
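The double-check locking pattern mentioned in this commit can be sketched as below. The names `_dict_lock`, `_dict_cache`, `_load_dictionary`, and `get_dictionary` are hypothetical, and the loader returns a small in-memory dict rather than reading a real dictionary file.

```python
import threading

_dict_lock = threading.Lock()
_dict_cache = None  # populated on first use
load_count = 0      # only here to show that the loader runs once


def _load_dictionary():
    """Hypothetical expensive loader, e.g. reading a large dictionary file."""
    global load_count
    load_count += 1
    return {"alpha": 1, "beta": 2}


def get_dictionary():
    global _dict_cache
    # First check without the lock: cheap fast path once loaded.
    if _dict_cache is None:
        with _dict_lock:
            # Second check under the lock: another thread may have finished
            # loading while this one was waiting.
            if _dict_cache is None:
                _dict_cache = _load_dictionary()
    return _dict_cache


threads = [threading.Thread(target=get_dictionary) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The unlocked first check keeps the common already-loaded case fast, while the second check under the lock prevents two threads from both running the loader.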
@copilot apply changes based on the comments in this thread |
Change variable names from *_path_ctx to *_file_ctx to better reflect that they are file context managers from as_file(), not path contexts:
- _dict_path_ctx -> _dict_file_ctx
- _model_path_ctx -> _model_file_ctx
- _unigram_path_ctx -> _unigram_file_ctx
- self._model_path_ctx -> self._model_file_ctx

Co-authored-by: bact <128572+bact@users.noreply.github.com>
Replace `import pythainlp; files(pythainlp)` with `files("pythainlp")` in get_pythainlp_path() to avoid importing the entire package unnecessarily. This is consistent with the pattern used elsewhere in the codebase and avoids potential circular import issues.
Co-authored-by: bact <128572+bact@users.noreply.github.com>
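For reference, a small sketch of the two call styles, using the standard-library `json` package as a stand-in for `pythainlp` (an assumption made so the snippet runs without the library installed):

```python
import json
from importlib.resources import files

# Passing the module object requires an import statement at the call site.
path_from_module = str(files(json))

# Passing the dotted name as a string needs no import statement here;
# files() resolves the package through the import system itself.
path_from_name = str(files("json"))
```

Both calls resolve to the same location. Note that `files()` still imports the named package in order to locate it, so the practical gain of the string form is mainly the absence of an explicit import statement in the calling module.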
Plan to make PyThaiNLP zip-safe

Analysis
- […] `__file__` to access package directory
- […] `importlib.resources`

Implementation
- […] `importlib.resources`
- Remove `zip-safe = false` from pyproject.toml

Code Review Feedback Addressed
- Keep `as_file()` context managers alive for model lifetime
- Use `files("pythainlp.corpus")` instead of module imports
- Add proper cleanup in `__del__` methods

Testing & Validation

Summary
Successfully made the PyThaiNLP package zip-safe by:
- Replacing `__file__` usage with `importlib.resources.files()`
- […] `as_file()` context manager with proper lifetime management
- Removing `zip-safe = false` from pyproject.toml