Skip to content

Refactor featurize() to eliminate Union types and cast() calls#1236

Merged
bact merged 4 commits intodevfrom
copilot/refactor-featurize-function
Jan 30, 2026
Merged

Refactor featurize() to eliminate Union types and cast() calls#1236
bact merged 4 commits intodevfrom
copilot/refactor-featurize-function

Conversation

Copy link
Contributor

Copilot AI commented Jan 30, 2026

What does this changes

Refactors Featurizer.featurize() in pythainlp/tokenize/han_solo.py to eliminate type casting and Union types, addressing PR #1235 review comments.

What was wrong

The features variable was typed as Union[dict[str, int], list[str]], requiring 6 cast() calls scattered throughout the feature extraction loop for type narrowing. This violated type safety principles and created maintenance burden—any logic change required updating two code paths.

How this fixes it

Core logic now builds features in their natural representation (list of strings), converting to dict only at method exit when needed:

# Before: Union type with casting throughout
features: Union[dict[str, int], list[str]]
if return_type == "dict":
    features = {}
else:
    features = []
# ... then 6x cast() calls in the loop

# After: single type, conversion at the end
features: list[str] = []
# ... clean append operations in the loop
# Convert at return only if dict requested
return {
    "X": [{"key": 1 for key in f} for f in all_features_list] if return_type == "dict" else all_features_list,
    "Y": all_labels_int if return_type == "dict" else [str(l) for l in all_labels_int]
}

Changes:

  • Single implementation of feature extraction logic
  • Removed Union and cast imports
  • Zero type casts in hot loop
  • 4 net lines removed

Your checklist for this pull request

  • Passed code styles and structures
  • Passed code linting checks and unit test
Original prompt

Refactor featurize() in pythainlp/tokenize/han_solo.py based on comments in this thread #1235 (review)

Try to separate the logic, so the features variable does not have to be the Union, so we can avoid many casting.


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 3 commits January 30, 2026 01:26
…type and cast() calls

Co-authored-by: bact <128572+bact@users.noreply.github.com>
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Co-authored-by: bact <128572+bact@users.noreply.github.com>
@sonarqubecloud
Copy link

Copilot AI changed the title [WIP] Refactor featurize function in han_solo.py Refactor featurize() to eliminate Union types and cast() calls Jan 30, 2026
Copilot AI requested a review from bact January 30, 2026 01:34
@bact bact requested a review from Copilot January 30, 2026 01:37
@bact bact added the refactoring a technical improvement which does not add any new features or change existing features. label Jan 30, 2026
@coveralls
Copy link

Coverage Status

coverage: 53.45% (-0.08%) from 53.533%
when pulling 64bfc99 on copilot/refactor-featurize-function
into 03b92ab on dev.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the Featurizer.featurize() method in pythainlp/tokenize/han_solo.py to eliminate Union types and cast() calls, addressing technical debt from PR #1235. The refactoring improves type safety and code maintainability by building features in a single, consistent representation and converting to the requested format only at the method's return point.

Changes:

  • Eliminated Union types and 6 cast() calls by building features as list[str] throughout the method
  • Improved variable naming (all_features_list, all_labels_int) for better type clarity
  • Consolidated conversion logic at the end of the method, reducing code duplication

@bact bact marked this pull request as ready for review January 30, 2026 01:41
@bact bact merged commit 9196e9d into dev Jan 30, 2026
35 of 38 checks passed
@bact bact deleted the copilot/refactor-featurize-function branch January 30, 2026 01:41
@bact bact added this to the 5.3 milestone Jan 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactoring a technical improvement which does not add any new features or change existing features.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants