Refactor type: ignore suppressions to proper type fixes#1235
Conversation
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Co-authored-by: bact <128572+bact@users.noreply.github.com>
Refactor _find_month function to return 0 instead of None when not found. Update type hinting and adjust month variable assignment in date parsing.
Co-authored-by: bact <128572+bact@users.noreply.github.com>
|
There was a problem hiding this comment.
Pull request overview
This pull request refactors type suppression comments (type: ignore) to use proper type-safe solutions. It successfully eliminates 11 of 12 type ignore comments by replacing lambda functions with generator expressions in morse.py, adding explicit type annotations in multiple tokenization modules, and using typing.cast() for Union type narrowing in han_solo.py.
Changes:
- Replaced lambda functions with default parameter capture with cleaner generator expressions in morse encoding/decoding
- Added explicit type annotations for defaultdict and list variables across tokenization and summarization modules
- Changed _find_month return type from Optional[int] to int with 0 sentinel value
- Used typing.cast() to handle Union[dict, list] type narrowing in han_solo.py
- Improved type annotation specificity (e.g., Optional[object] to Optional[ThapSap])
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pythainlp/util/morse.py | Refactored lambda functions with default parameters to generator expressions for morse encoding/decoding |
| pythainlp/util/date.py | Changed _find_month return type from Optional[int] to int, removed unused variable declarations |
| pythainlp/transliterate/wunsen.py | Improved type annotation from Optional[object] to Optional[ThapSap] |
| pythainlp/tokenize/multi_cut.py | Added explicit type annotation for defaultdict[int, list[str]] |
| pythainlp/tokenize/longest.py | Added explicit type annotations for list[str] and list[int] variables |
| pythainlp/tokenize/han_solo.py | Used typing.cast() for Union type narrowing instead of type: ignore comments |
| pythainlp/summarize/freq.py | Added explicit type annotations for defaultdict instances |
| pythainlp/soundex/prayut_and_somchaip.py | Retained single type: ignore comment with justification |
| if j in text: | ||
| return i + 1 | ||
| return None | ||
| return 0 # Not found in list |
There was a problem hiding this comment.
Changing the return type from Optional[int] to int and returning 0 for "not found" is problematic. The value 0 is ambiguous because it could be confused with a valid index (though months are 1-indexed, making this slightly safer). However, this breaks the API contract - the function previously could distinguish between "month found" and "month not found" via None, which is now lost.
This change could cause silent failures: if a month is not found in the input text, the function will now return 0, which will be passed to datetime() constructor at line 314. The datetime constructor expects month values 1-12, and passing 0 will raise a ValueError at runtime rather than handling the case gracefully.
A better approach would be to either:
- Keep the Optional[int] return type and handle None appropriately in the caller
- Raise an exception when month is not found rather than returning a sentinel value
| return 0 # Not found in list | |
| raise ValueError(f"Month not found in text: {text!r}") |
| if return_type == "dict": | ||
| features[left_key] = 1 | ||
| cast(dict[str, int], features)[left_key] = 1 | ||
| else: | ||
| features.append(left_key) | ||
| cast(list[str], features).append(left_key) |
There was a problem hiding this comment.
Using cast() to work around Union type narrowing defeats the purpose of static type checking and can lead to runtime errors. The cast() function tells the type checker to trust that the value is of a certain type without actually performing any runtime validation.
A more type-safe approach would be to properly narrow the Union type through control flow. Instead of using cast(), you could check the type after initialization:
if return_type == "dict":
assert isinstance(features, dict)
features[left_key] = 1
else:
assert isinstance(features, list)
features.append(left_key)Or refactor the code to avoid the Union type altogether by using separate variables or a more structured approach. The repeated casting throughout the method (lines 104, 106, 122, 124, 133, 135) suggests the control flow could be simplified.
| if return_type == "dict": | ||
| features[right_key] = 1 | ||
| cast(dict[str, int], features)[right_key] = 1 | ||
| else: | ||
| features.append(right_key) | ||
| cast(list[str], features).append(right_key) |
There was a problem hiding this comment.
Using cast() to work around Union type narrowing defeats the purpose of static type checking and can lead to runtime errors. The cast() function tells the type checker to trust that the value is of a certain type without actually performing any runtime validation.
A more type-safe approach would be to properly narrow the Union type through control flow. Instead of using cast(), you could check the type after initialization or refactor the code to avoid the Union type altogether by using separate variables or a more structured approach.
| if return_type == "dict": | ||
| features[ngram_key] = 1 | ||
| cast(dict[str, int], features)[ngram_key] = 1 | ||
| else: | ||
| features.append(ngram_key) | ||
| cast(list[str], features).append(ngram_key) |
There was a problem hiding this comment.
Using cast() to work around Union type narrowing defeats the purpose of static type checking and can lead to runtime errors. The cast() function tells the type checker to trust that the value is of a certain type without actually performing any runtime validation.
A more type-safe approach would be to properly narrow the Union type through control flow. Instead of using cast(), you could check the type after initialization or refactor the code to avoid the Union type altogether by using separate variables or a more structured approach.
|
@copilot apply changes based on the comments in this thread |



What does this changes
Replaces type suppression comments with proper type-safe solutions where feasible. Eliminates 11 of 12
type: ignorecomments across morse encoding, tokenization, and soundex modules.What was wrong
Previous fixes added
type: ignorecomments to suppress mypy errors rather than addressing root causes:morse.py)han_solo.py)soundex.py)How this fixes it
morse.py: Replace
map(lambda x, g=dict.get: g(x, " "), text)with generator expressions(dict.get(x, " ") for x in text). Removes 4 suppressions.han_solo.py: Use
typing.cast()for explicit type narrowing on Union types. Add properUnion[int, str]annotation for mutable variable. Removes 7 suppressions.soundex.py: Retain single
type: ignore[call-overload]- changing tolist[str | None]cascades 13 downstream errors. Current approach is minimal.Example:
Your checklist for this pull request
Original prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.