test: assert 々 iteration mark does not corrupt name readings by bee-san · Pull Request #16 · bee-san/Japanese_Character_Name_Dictionary

bee-san · 2026-03-09T22:06:21Z

Add four tests guarding against the regression where the IDEOGRAPHIC
ITERATION MARK (々, U+3005) leaks into hiragana readings:

- test_nene_iteration_mark_given_name_only: 寧々 → ねね (single name)
- test_nene_iteration_mark_with_family_name: 田中寧々 reading has no 々
- test_ririko_iteration_mark_in_name: 莉々子 reading is non-empty and has no 々
- test_iteration_mark_in_family_name_with_space: 須々木心一 with hints

々 is outside the CJK Unified Ideographs range (0x4E00–0x9FFF) so
is_kanji() returns false for it. The kanji→kana boundary heuristic can
therefore treat it as a kana character, producing a split where 々 is
the entire given-name part. kata_to_hira("々") passes it through
unchanged, resulting in 々 appearing as the reading.
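
The mechanism described above can be sketched in Rust. The function bodies are assumptions for illustration, not the repository's actual code: `is_kanji_range_only` checks only the 0x4E00–0x9FFF block named above, and `kata_to_hira_sketch` is a hypothetical conversion that shifts katakana letters to hiragana and passes every other character through.

```rust
/// Hypothetical pre-fix classifier: accepts only the CJK Unified
/// Ideographs block (U+4E00..=U+9FFF) mentioned in the description.
fn is_kanji_range_only(c: char) -> bool {
    let code = c as u32;
    (0x4E00..=0x9FFF).contains(&code)
}

/// Hypothetical katakana-to-hiragana conversion. Katakana letters
/// (U+30A1..=U+30F6) sit exactly 0x60 above their hiragana
/// counterparts; any other character is returned unchanged.
fn kata_to_hira_sketch(s: &str) -> String {
    s.chars()
        .map(|c| {
            let code = c as u32;
            if (0x30A1..=0x30F6).contains(&code) {
                // Shift into the hiragana block; fall back to the input
                // character if the result is not a valid scalar value.
                char::from_u32(code - 0x60).unwrap_or(c)
            } else {
                c
            }
        })
        .collect()
}
```

Under these assumptions, `is_kanji_range_only('々')` is false because U+3005 lies below U+4E00, and `kata_to_hira_sketch("々")` returns `"々"` because the mark is not katakana either. That is how the mark can reach the reading untouched.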

https://claude.ai/code/session_01BLRvapZFYvE1LpBxaHshsF


Root cause: is_kanji() did not include the IDEOGRAPHIC ITERATION MARK (々, U+3005) because it sits outside all CJK Unified Ideographs blocks. This caused two linked failures in the name-parsing pipeline:

1. find_split_point Strategy 1 detected a kanji→non-kanji boundary at the 々 character, potentially isolating 々 as the entire given-name part (e.g. 寧々 → family="寧", given="々").
2. With given_jp = "々", contains_kanji("々") returned false, so kata_to_hira("々") was used instead of the hint reading. kata_to_hira passes 々 through unchanged, producing a literal 々 in the output (e.g. "ねね々" instead of "ねね").

Fix: add `|| code == 0x3005` to is_kanji(). 々 repeats the preceding kanji (寧々 = 寧寧 = "nene"), so classifying it as kanji is semantically correct and fixes both the boundary detection and the reading fallback.

A regression test (test_nene_iteration_mark_both_hints_triggers_split) was added that failed before this fix and now passes.

https://claude.ai/code/session_01BLRvapZFYvE1LpBxaHshsF
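
As a rough illustration of why the one-line change addresses the boundary problem, here is a hedged Rust sketch. Only the `|| code == 0x3005` condition comes from the commit message. The range check is simplified to a single block, and `first_boundary` is a hypothetical stand-in for the kanji→non-kanji scan in find_split_point Strategy 1, not its real implementation.

```rust
/// Simplified pre-fix classifier (assumption: one block only).
fn is_kanji_before(c: char) -> bool {
    let code = c as u32;
    (0x4E00..=0x9FFF).contains(&code)
}

/// Same classifier with the condition from the fix appended.
fn is_kanji_after(c: char) -> bool {
    let code = c as u32;
    (0x4E00..=0x9FFF).contains(&code) || code == 0x3005
}

/// Hypothetical boundary scan: returns the character index of the first
/// non-kanji character that directly follows a kanji character.
fn first_boundary(s: &str, is_kanji: fn(char) -> bool) -> Option<usize> {
    let chars: Vec<char> = s.chars().collect();
    chars
        .windows(2)
        .position(|pair| is_kanji(pair[0]) && !is_kanji(pair[1]))
        .map(|i| i + 1)
}

/// Hypothetical helper mirroring the contains_kanji check named above.
fn contains_kanji(s: &str, is_kanji: fn(char) -> bool) -> bool {
    s.chars().any(is_kanji)
}
```

With the pre-fix classifier, `first_boundary("寧々", is_kanji_before)` yields `Some(1)`, a split directly before 々. With `is_kanji_after` it yields `None`, and `contains_kanji("々", is_kanji_after)` becomes true, so the kana fallback no longer applies to the mark.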

claude added 2 commits March 9, 2026 21:58

bee-san merged commit d63ec53 into main Mar 9, 2026
1 check failed

bee-san merged 2 commits into main from claude/test-name-parsing-character-btzwJ
