Commit ac2f800

Add v4.0.0: Rule-based state machine architecture
This release focuses on code quality and maintainability through rule-based state machine architecture for all break detection algorithms.

New Features:
- BreakContext abstractions (GraphemeBreakContext, WordBreakContext, SentenceBreakContext)
- Named rule functions directly mapping to Unicode Standard (ruleGB3, ruleWB5, ruleSB8, etc.)
- Declarative rule chains with first-match-wins strategy
- Maintained hierarchical optimization (words at grapheme boundaries, sentences at word boundaries)

Code Organization:
- context.go: Break context abstractions with navigation methods
- grapheme_rules.go: Grapheme breaking rules (ruleGB3 through ruleGB12_13)
- word_rules.go: Word breaking rules (ruleWB3 through ruleWB15_16)
- sentence_rules.go: Sentence breaking rules (ruleSB3 through ruleSB11)
- single_pass.go: Cleaned up to use rule-based implementations (96 lines vs 574 lines)

Performance (Apple M4 Pro):
- Rule-based grapheme breaking: 1.6-11x faster than inline (increasing with text length)
- Single-pass API: 1.2-7.4x faster than three separate calls
- Medium and long texts benefit most from rule-based architecture

Benefits:
- Readability: Rules directly match Unicode Standard specification
- Maintainability: Easy to understand, modify, and extend
- Debuggability: Each rule can be tested and traced independently

Conformance:
- 100% conformance maintained on all official Unicode test suites
- Grapheme: 766/766 tests passing
- Word: 1,944/1,944 tests passing
- Sentence: 512/512 tests passing
1 parent d3577e3 commit ac2f800

7 files changed: +1957 -637 lines changed

