Skip to content

Make package zip-safe by migrating to importlib.resources#1212

Merged
bact merged 10 commits intodevfrom
copilot/check-zip-safe-status
Jan 17, 2026
Merged

Make package zip-safe by migrating to importlib.resources#1212
bact merged 10 commits intodevfrom
copilot/check-zip-safe-status

Conversation

Copy link
Contributor

Copilot AI commented Jan 16, 2026

Plan to make PyThaiNLP zip-safe

Analysis

  • Explore repository structure
  • Identify zip-safe issues in pyproject.toml
  • Identify root cause: using __file__ to access package directory
  • Verify Python version requirement (>=3.9) supports importlib.resources

Implementation

  • Update resource access to use importlib.resources
  • Implement lazy loading for models with proper context manager lifecycle
  • Fix module import style (use string imports)
  • Remove zip-safe = false from pyproject.toml
  • Make context manager attributes private following Python conventions
  • Add thread safety with locks for lazy loading
  • Implement context manager protocol for CRFchunk class
  • Rename context manager variables for clarity (*_file_ctx)
  • Avoid broad imports by using string-based imports consistently

Code Review Feedback Addressed

  • Keep as_file() context managers alive for model lifetime
  • Use files("pythainlp.corpus") instead of module imports
  • Add proper cleanup in __del__ methods
  • Make context manager attributes private (underscore prefix)
  • Add thread safety using locks and double-check locking pattern
  • Make CRFchunk a context manager for explicit resource management
  • Document resource management behavior and limitations
  • Document zip-safe behavior changes in get_pythainlp_path()
  • Rename variables from *_path_ctx to *_file_ctx for clarity
  • Use string-based imports consistently to avoid broad imports

Testing & Validation

  • Test basic corpus loading functions
  • Run ruff linter - all checks passed
  • Verify package builds successfully
  • Install and test built wheel
  • Code review completed and feedback addressed

Summary

Successfully made the PyThaiNLP package zip-safe by:

  • Replaced all __file__ usage with importlib.resources.files()
  • Used as_file() context manager with proper lifetime management
  • Implemented lazy loading with thread safety for modules loaded at import time
  • Removed zip-safe = false from pyproject.toml
  • Followed Python best practices for private attributes and resource management
  • Added comprehensive documentation about resource lifecycle and limitations
  • Used clear, descriptive variable names for context managers
  • Consistently used string-based imports to avoid unnecessary package imports
Original prompt

Check if all the code is considered "zip-safe" by the setuptools.

If it is not, fix when possible. Try to make the package zip-safe and remove "zip-safe = false" from pyproject.toml config.


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 3 commits January 16, 2026 19:58
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Copilot AI changed the title [WIP] Check and update zip-safe status in package Make package zip-safe by migrating to importlib.resources Jan 16, 2026
Copilot AI requested a review from bact January 16, 2026 20:05
@bact bact requested a review from Copilot January 16, 2026 20:08
@coveralls
Copy link

coveralls commented Jan 16, 2026

Coverage Status

coverage: 53.677% (+0.08%) from 53.596%
when pulling c2526e3 on copilot/check-zip-safe-status
into 11c8a9e on dev.

@bact bact marked this pull request as ready for review January 16, 2026 20:11
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates the package from __file__-based resource access to importlib.resources API to enable zip-safe installation. The change ensures corpus files and binary models can be accessed whether installed as a regular package or as a zip archive.

Changes:

  • Migrated text corpus file reading to use files().joinpath().read_text() for direct in-memory parsing
  • Implemented lazy loading with as_file() context manager for binary models that require file paths
  • Updated path resolution to use files() API instead of __file__
  • Removed zip-safe = false directive from pyproject.toml

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
pythainlp/tools/path.py Replaced __file__ with files() API for package path resolution
pythainlp/corpus/core.py Migrated corpus file reading to use read_text() for in-memory text parsing
pythainlp/corpus/th_en_translit.py Updated transliteration dictionary loading to use read_text()
pythainlp/tokenize/nlpo3.py Implemented lazy loading for nlpo3 dictionary with as_file()
pythainlp/tokenize/han_solo.py Implemented lazy loading for CRF model with as_file()
pythainlp/tag/crfchunk.py Updated CRF model loading to use as_file() context manager
pythainlp/spell/symspellpy.py Implemented lazy loading for symspell dictionaries with as_file()
pyproject.toml Removed zip-safe = false configuration

@bact
Copy link
Member

bact commented Jan 16, 2026

@copilot Read the comments in this thread, analyze the suggestions, recheck safety, and implement suitable changes.

Address code review feedback:
- Keep context managers open for pycrfsuite and SymSpell models
- Use string imports for files() instead of module imports
- Add proper cleanup in __del__ methods where appropriate

Co-authored-by: bact <128572+bact@users.noreply.github.com>
@bact
Copy link
Member

bact commented Jan 16, 2026

@copilot should we make the context manager private? does the user need to get access to it? what is the best practice?

Change self.model_path_ctx to self._model_path_ctx following Python convention for private attributes. The context manager is an implementation detail that users don't need to access.

Co-authored-by: bact <128572+bact@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Jan 16, 2026

@copilot should we make the context manager private? does the user need to get access to it? what is the best practice?

Yes, context managers should be private. Fixed in df59213.

Best practices:

  • Module-level variables (_model_path_ctx, _unigram_path_ctx, _dict_path_ctx) were already correctly prefixed with underscore
  • Instance attribute self.model_path_ctx has been changed to self._model_path_ctx
  • Context managers are implementation details for resource management that users don't need to access
  • Python convention is to prefix internal attributes with underscore to indicate they're private

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

@bact
Copy link
Member

bact commented Jan 16, 2026

@copilot analyze the comments in this thread and make appropriate changes.

Address code review feedback:
- Add threading locks to ensure thread-safe lazy loading
- Implement double-check locking pattern to prevent race conditions
- Make CRFchunk a context manager for explicit resource management
- Document that __del__ is unreliable and context manager is preferred
- Add documentation about zip-safe behavior in get_pythainlp_path()
- Document that context managers are kept alive for program lifetime

Co-authored-by: bact <128572+bact@users.noreply.github.com>
@bact bact added the refactoring a technical improvement which does not add any new features or change existing features. label Jan 16, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

@bact
Copy link
Member

bact commented Jan 17, 2026

@copilot apply changes based on the comments in this thread

Change variable names from *_path_ctx to *_file_ctx to better reflect that they are file context managers from as_file(), not path contexts:
- _dict_path_ctx -> _dict_file_ctx
- _model_path_ctx -> _model_file_ctx
- _unigram_path_ctx -> _unigram_file_ctx
- self._model_path_ctx -> self._model_file_ctx

Co-authored-by: bact <128572+bact@users.noreply.github.com>
@bact bact requested a review from wannaphong January 17, 2026 10:18
Copy link
Member

@bact bact left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Replace `import pythainlp; files(pythainlp)` with `files("pythainlp")` in get_pythainlp_path() to avoid importing the entire package unnecessarily. This is consistent with the pattern used elsewhere in the codebase and avoids potential circular import issues.

Co-authored-by: bact <128572+bact@users.noreply.github.com>
@sonarqubecloud
Copy link

Copilot AI requested a review from bact January 17, 2026 17:24
@bact bact merged commit 001f58a into dev Jan 17, 2026
43 of 44 checks passed
@bact bact deleted the copilot/check-zip-safe-status branch January 17, 2026 22:34
@bact bact added this to the 5.3 milestone Jan 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactoring a technical improvement which does not add any new features or change existing features.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants