Fix royin romanization for Thai consonant clusters and syllable boundaries #1172
…ndaries Co-authored-by: wannaphong <[email protected]>
…er function Co-authored-by: wannaphong <[email protected]>
…ization in comments Co-authored-by: wannaphong <[email protected]>
@copilot Why?
FAIL: test_romanize_lookup (tests.core.test_transliterate.TransliterateTestCase)
Traceback (most recent call last):
The test failed because my changes to the royin engine altered how syllable boundaries are detected, and the lookup engine falls back to royin for words that are not in its dictionary. The old behavior treated every consonant after a vowel as a final consonant, giving "khanamenkhaputino". The fix treats a consonant that follows a vowel and is itself followed by a vowel as the start of a new syllable, giving "kharamenkhapuchino", which is linguistically more accurate.
Updated the test expectation in commit f42eb81. All 26 test cases from the main issue still pass.
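The difference between the old and new outputs can be illustrated with a toy sketch. This is not the royin engine's code: the function name `romanize_toy`, the `detect_boundaries` flag, and the three-entry consonant table are assumptions made for illustration, and only the vowel า is handled. It shows why ร comes out as "n" when every consonant after a vowel is treated as final, and as "r" once a consonant followed by a vowel is treated as the start of a new syllable.

```python
# Toy tables (assumption): each consonant maps to (initial form, final form).
CONSONANTS = {"ก": ("k", "k"), "ค": ("kh", "k"), "ร": ("r", "n")}
VOWELS = {"า": "a"}


def romanize_toy(text: str, detect_boundaries: bool) -> str:
    """Romanize a string made only of the characters in the toy tables."""
    out = []
    for i, ch in enumerate(text):
        if ch in VOWELS:
            out.append(VOWELS[ch])
            continue
        initial, final = CONSONANTS[ch]
        after_vowel = i > 0 and text[i - 1] in VOWELS
        before_vowel = i + 1 < len(text) and text[i + 1] in VOWELS
        # Old rule: any consonant after a vowel is final.
        # New rule: a consonant followed by a vowel opens a new syllable instead.
        starts_new_syllable = detect_boundaries and before_vowel
        if after_vowel and not starts_new_syllable:
            out.append(final)
        else:
            out.append(initial)
    return "".join(out)


print(romanize_toy("คารา", detect_boundaries=False))  # khana
print(romanize_toy("คารา", detect_boundaries=True))   # khara
print(romanize_toy("การ", detect_boundaries=True))    # kan
```

The last call shows that a consonant at the end of the input is still treated as final under the new rule, so only the mid-word case changes.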



What does this change
Fixes royin engine romanization of Thai words containing consonant clusters (กร, กล, กว) and multi-syllable words. Previously returned incorrect output for 12 out of 26 test cases.
What was wrong
The _replace_consonants function treated all consonants after the first as final consonants, ignoring Thai phonotactics where ร/r, ล/l, ว/w form initial clusters with preceding consonants. Multi-syllable word boundaries were not detected, causing incorrect vowel insertion and consonant form selection.
How this fixes it
Consonant cluster detection:
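A minimal sketch of what a cluster check of this kind might look like, not the code from this pull request. The names `starts_with_cluster` and `is_thai_consonant` are hypothetical, the vowel set is a two-character toy, and requiring a vowel right after the cluster is a simplifying assumption. The actual change may also restrict which leading consonants can form a cluster.

```python
# Second members of an initial cluster, per the description above: ร, ล, ว.
CLUSTER_SECONDS = frozenset({"ร", "ล", "ว"})
# Toy vowel set (assumption): sara aa (U+0E32) and sara ii (U+0E35) only.
VOWELS = frozenset({"\u0e32", "\u0e35"})


def is_thai_consonant(ch: str) -> bool:
    """Thai consonants occupy U+0E01 (ก) through U+0E2E (ฮ)."""
    return "\u0e01" <= ch <= "\u0e2e"


def starts_with_cluster(syllable: str) -> bool:
    """Return True if the syllable opens with consonant + ร/ล/ว + vowel."""
    if len(syllable) < 3:
        return False
    first, second, third = syllable[:3]
    return (
        is_thai_consonant(first)
        and second in CLUSTER_SECONDS
        and third in VOWELS
    )


for word in ["กรา", "กลา", "กวา", "การ"]:
    print(word, starts_with_cluster(word))
```

In the first three inputs the second character is read as part of the initial cluster rather than as a final consonant. In the last one, ร follows the vowel, so no cluster is reported.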
Syllable boundary detection:
Special cases:
Code structure:
_should_add_syllable_separator() for syllable joining logic
_ROMANIZED_VOWELS constant for consistency
Test coverage: