Skip to content

Add None validation for corpus paths to prevent confusing runtime errors#1234

Merged
bact merged 6 commits intodevfrom
copilot/review-pr-changes
Jan 30, 2026
Merged

Add None validation for corpus paths to prevent confusing runtime errors#1234
bact merged 6 commits intodevfrom
copilot/review-pr-changes

Conversation

Copy link
Contributor

Copilot AI commented Jan 30, 2026

What does this changes

Adds defensive validation for corpus file paths that can be None when files aren't downloaded, preventing confusing TypeError and AttributeError at runtime. Adds get_thwiki_lstm() helper for validated access. Fixes resource leak in test.

What was wrong

get_corpus_path() returns None when corpus files aren't downloaded, but code assumed non-None:

# ulmfit/core.py - dict values could be None
THWIKI_LSTM = {
    "wgts_fname": get_corpus_path(_MODEL_NAME_LSTM),  # Could be None
    "itos_fname": get_corpus_path(_ITOS_NAME_LSTM),   # Could be None
}

# generate/thai2fit.py - TypeError: expected str, not NoneType
with open(thwiki["itos_fname"], "rb") as f:  # Crashes if None

# tag/thainer.py - passes None to pycrfsuite
self.crf.open(get_corpus_path("thainer-1.4"))  # Crashes if None

# tests/extra/testx_ulmfit.py - resource leak
pickle.load(open(thwiki["itos_fname"], "rb"))  # File never closed

How this fixes it

Validation with clear errors:

  • ulmfit/core.py: Added get_thwiki_lstm() that validates paths and raises RuntimeError with download instructions. THWIKI_LSTM dict preserved for backward compatibility.
  • generate/thai2fit.py: Validates paths at module load before file operations
  • tag/thainer.py: Validates model paths before passing to pycrfsuite with version-specific instructions

Resource safety:

  • tests/extra/testx_ulmfit.py: Replaced pickle.load(open(...)) with proper context manager

All changes maintain backward compatibility. Error messages use multiline format with actionable instructions:

RuntimeError(
    "ULMFiT model files not found. "
    "Please download the corpus first:\n"
    "  pythainlp.corpus.download('wiki_lm_lstm')\n"
    "  pythainlp.corpus.download('wiki_itos_lstm')"
)

Your checklist for this pull request

  • Passed code styles and structures
  • Passed code linting checks and unit test
Original prompt

Review changes made in the following PRs and relevant codes or patterns:

#1232
#1231
#1230
#1228
#1225
#1220
#1214
#1213
#1212
#1211
#1202
#1190
#1189
#1186
#1182


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 4 commits January 30, 2026 00:09
Co-authored-by: bact <128572+bact@users.noreply.github.com>
…tion years

Co-authored-by: bact <128572+bact@users.noreply.github.com>
…ibility

Co-authored-by: bact <128572+bact@users.noreply.github.com>
Co-authored-by: bact <128572+bact@users.noreply.github.com>
@coveralls
Copy link

coveralls commented Jan 30, 2026

Coverage Status

coverage: 53.549% (-0.09%) from 53.636%
when pulling 31ffa48 on copilot/review-pr-changes
into 7cc69e9 on dev.

@sonarqubecloud
Copy link

@bact bact marked this pull request as ready for review January 30, 2026 00:17
Copilot AI changed the title [WIP] Review changes made in multiple pull requests Add None validation for corpus paths to prevent confusing runtime errors Jan 30, 2026
Copilot AI requested a review from bact January 30, 2026 00:19
@bact bact merged commit bb14ad5 into dev Jan 30, 2026
29 of 32 checks passed
@bact bact deleted the copilot/review-pr-changes branch January 30, 2026 00:20
@bact bact added the refactoring a technical improvement which does not add any new features or change existing features. label Jan 30, 2026
@bact bact added this to the 5.3 milestone Jan 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactoring a technical improvement which does not add any new features or change existing features.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants