Add lookahead heuristic to improve word segmentation
- Implemented 1-step lookahead to prefer matches that avoid unknown Thai characters
- When the longest match is followed by an unknown Thai character, try shorter matches that are followed by a dictionary word
- All existing tests pass (12/12)
- 4 out of 5 benchmark cases now match PyThaiNLP exactly
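
For illustration only, the heuristic described above can be sketched in Python as follows. This is a minimal sketch, not the code from this commit: the names `segment`, `matches_at`, and `is_thai`, the set-based dictionary, and the fallback to the longest match when no candidate passes the lookahead are all assumptions made for the example.

```python
def is_thai(ch: str) -> bool:
    """True if ch lies in the Unicode Thai block (U+0E00..U+0E7F)."""
    return "\u0e00" <= ch <= "\u0e7f"


def matches_at(text: str, pos: int, words: set[str], max_len: int) -> list[str]:
    """Return all dictionary words starting at pos, longest first."""
    end_limit = min(len(text), pos + max_len)
    return [text[pos:end] for end in range(end_limit, pos, -1)
            if text[pos:end] in words]


def segment(text: str, words: set[str], lookahead: bool = True) -> list[str]:
    """Longest-match segmentation with an optional 1-step lookahead (hypothetical helper)."""
    max_len = max(map(len, words), default=1)
    tokens: list[str] = []
    pos = 0
    while pos < len(text):
        candidates = matches_at(text, pos, words, max_len)
        if not candidates:
            # No dictionary word starts here: emit one character as unknown.
            tokens.append(text[pos])
            pos += 1
            continue
        chosen = candidates[0]  # default: plain longest match
        if lookahead:
            for cand in candidates:
                nxt = pos + len(cand)
                # Accept a candidate if it ends the text, is followed by a
                # non-Thai character, or is followed by a dictionary word.
                if (nxt >= len(text) or not is_thai(text[nxt])
                        or matches_at(text, nxt, words, max_len)):
                    chosen = cand
                    break
        tokens.append(chosen)
        pos += len(chosen)
    return tokens


words = {"ตา", "ตาก", "กลม"}
print(segment("ตากลม", words, lookahead=False))  # ['ตาก', 'ล', 'ม']
print(segment("ตากลม", words))                   # ['ตา', 'กลม']
```

With this toy dictionary, plain longest matching takes `ตาก` and strands two unknown characters, while the lookahead falls back to `ตา` because it is followed by the dictionary word `กลม`.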
Co-authored-by: wannaphong <8536487+wannaphong@users.noreply.github.com>