Commit ac2f800
Add v4.0.0: Rule-based state machine architecture
This release focuses on code quality and maintainability by adopting a
rule-based state machine architecture for all break detection algorithms.
New Features:
- BreakContext abstractions (GraphemeBreakContext, WordBreakContext, SentenceBreakContext)
- Named rule functions that map directly to the Unicode Standard rules (ruleGB3, ruleWB5, ruleSB8, etc.)
- Declarative rule chains with first-match-wins strategy
- Maintained hierarchical optimization (words at grapheme boundaries, sentences at word boundaries)
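A declarative, first-match-wins rule chain can be sketched in a few lines. This sketch assumes a heavily reduced property model (only CR, LF, Control and Other). It also assumes a hypothetical `GraphemeBreakContext` that holds just the classes on either side of a candidate boundary. The real types in context.go and grapheme_rules.go are not shown in this commit and very likely differ. Each named rule reports whether it matched and, if so, its decision. The chain stops at the first match and ends with GB999 as the catch-all.

```go
package main

import "fmt"

// gbClass is a hypothetical, heavily reduced Grapheme_Cluster_Break
// property: only the classes needed for GB3-GB5 are modeled.
type gbClass int

const (
	gbOther gbClass = iota
	gbCR
	gbLF
	gbControl
)

func classify(r rune) gbClass {
	switch {
	case r == '\r':
		return gbCR
	case r == '\n':
		return gbLF
	case r < 0x20 || r == 0x7F:
		return gbControl
	default:
		return gbOther
	}
}

// GraphemeBreakContext is a minimal stand-in for the real context type:
// just the classes on either side of a candidate boundary.
type GraphemeBreakContext struct {
	Prev, Next gbClass
}

// A rule reports (isBreak, matched). matched == false means the rule does
// not apply and the next rule in the chain is tried.
type rule func(GraphemeBreakContext) (isBreak, matched bool)

// GB3: CR × LF
func ruleGB3(c GraphemeBreakContext) (bool, bool) {
	return false, c.Prev == gbCR && c.Next == gbLF
}

// GB4: (Control | CR | LF) ÷
func ruleGB4(c GraphemeBreakContext) (bool, bool) {
	return true, c.Prev == gbControl || c.Prev == gbCR || c.Prev == gbLF
}

// GB5: ÷ (Control | CR | LF)
func ruleGB5(c GraphemeBreakContext) (bool, bool) {
	return true, c.Next == gbControl || c.Next == gbCR || c.Next == gbLF
}

// GB999: Any ÷ Any (the fallback; always matches)
func ruleGB999(GraphemeBreakContext) (bool, bool) { return true, true }

// Order matters: GB3 must precede GB4/GB5 so that CR LF stays together.
var graphemeRules = []rule{ruleGB3, ruleGB4, ruleGB5, ruleGB999}

func isGraphemeBreak(c GraphemeBreakContext) bool {
	for _, r := range graphemeRules {
		if isBreak, matched := r(c); matched {
			return isBreak // first matching rule wins
		}
	}
	return true // unreachable while GB999 is last
}

func main() {
	text := []rune("a\r\nb")
	for i := 1; i < len(text); i++ {
		c := GraphemeBreakContext{Prev: classify(text[i-1]), Next: classify(text[i])}
		fmt.Printf("%q | %q break=%v\n", text[i-1], text[i], isGraphemeBreak(c))
	}
}
```

Each rule is a plain function, so it can be unit-tested on its own. Reordering or inserting a rule is a one-line change to the slice.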
Code Organization:
- context.go: Break context abstractions with navigation methods
- grapheme_rules.go: Grapheme breaking rules (ruleGB3 through ruleGB12_13)
- word_rules.go: Word breaking rules (ruleWB3 through ruleWB15_16)
- sentence_rules.go: Sentence breaking rules (ruleSB3 through ruleSB11)
- single_pass.go: Cleaned up to use the rule-based implementations (reduced from 574 to 96 lines)
Performance (Apple M4 Pro):
- Rule-based grapheme breaking: 1.6-11x faster than the inline implementation (the speedup grows with text length)
- Single-pass API: 1.2-7.4x faster than three separate calls
- Medium and long texts benefit most from the rule-based architecture
Benefits:
- Readability: Rules map directly to the Unicode Standard specification
- Maintainability: Easy to understand, modify, and extend
- Debuggability: Each rule can be tested and traced independently
Conformance:
- 100% conformance maintained on all official Unicode test suites
- Grapheme: 766/766 tests passing
- Word: 1,944/1,944 tests passing
- Sentence: 512/512 tests passing

1 parent d3577e3
7 files changed (+1957, -637 lines)