v3.0.0: Hierarchical Break Detection
Performance Improvements
Version 3.0.0 adds hierarchical optimization to the single-pass FindAllBreaks() API introduced in v2.0.0.
Hierarchical Break Detection
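The optimization exploits the natural subset relationships between boundary types:
- Word boundaries ⊆ grapheme cluster boundaries: word-break rules are checked only at grapheme cluster boundaries
- Sentence boundaries ⊆ word boundaries: sentence-break rules are checked only at word boundaries
Positions ruled out at a lower level are never examined by the level above, which eliminates redundant rule checks and reduces run time.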
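A minimal Go sketch of the pruning, assuming a simple position-by-position scan. It is not the library's implementation: the `breakPredicate` type and `hierarchicalBreaks` function are hypothetical, and the toy rules in `main` stand in for the real UAX #29 rules only to show how many checks each level performs.

```go
package main

import "fmt"

// breakPredicate reports whether a boundary is allowed at position i.
// Hypothetical: stands in for real UAX #29 rule evaluation.
type breakPredicate func(i int) bool

// hierarchicalBreaks scans positions 0..n once. Word rules run only where a
// grapheme boundary exists, and sentence rules only where a word boundary
// exists, so each level skips positions its parent level ruled out.
func hierarchicalBreaks(n int, isGrapheme, isWord, isSentence breakPredicate) (graphemes, words, sentences []int) {
	for i := 0; i <= n; i++ {
		if !isGrapheme(i) {
			continue // not a grapheme boundary, so not a word or sentence boundary either
		}
		graphemes = append(graphemes, i)
		if !isWord(i) {
			continue // not a word boundary, so not a sentence boundary either
		}
		words = append(words, i)
		if isSentence(i) {
			sentences = append(sentences, i)
		}
	}
	return graphemes, words, sentences
}

func main() {
	var wordChecks, sentenceChecks int

	// Toy rules over positions 0 through 12: graphemes at even offsets,
	// words at multiples of 4, sentences at multiples of 12.
	isGrapheme := func(i int) bool { return i%2 == 0 }
	isWord := func(i int) bool { wordChecks++; return i%4 == 0 }
	isSentence := func(i int) bool { sentenceChecks++; return i%12 == 0 }

	g, w, s := hierarchicalBreaks(12, isGrapheme, isWord, isSentence)
	fmt.Println(g, w, s)
	fmt.Println("word checks:", wordChecks, "sentence checks:", sentenceChecks)
}
```

With these toy rules, word rules run at 7 of the 13 positions and sentence rules at 4, instead of 13 each without pruning.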
Benchmark Results
Performance on an Apple M4 Pro, comparing the v3.0.0 single pass against three separate function calls, one per break type:
| Text Length | v2.0.0 Three Passes | v3.0.0 Single Pass | Speedup |
|---|---|---|---|
| Short (33 chars) | 3,457 ns/op | 2,197 ns/op | 1.57x |
| Medium (86 chars) | 16,191 ns/op | 9,636 ns/op | 1.68x |
| Long (467 chars) | 423,491 ns/op | 188,982 ns/op | 2.24x |
Key benefits:
- Speedup grows with text length (hierarchical pruning is more effective on longer text)
- Single UTF-8 decode and classification pass
- Pre-classified data reused across all three break types
- No additional memory allocations compared to v2.0.0
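The "single decode, reused classification" idea can be sketched as decoding the UTF-8 input once into a slice of classified runes that every rule pass then reads. This is an illustration under stated assumptions: the four-value `class` type is a hypothetical simplification of the real Unicode break-property tables, and `decodeOnce` with its caller-supplied buffer is one way to avoid per-call allocations, not necessarily how the library does it.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// class is a hypothetical, heavily simplified character class; a real
// segmenter classifies against the Unicode break-property tables.
type class uint8

const (
	classOther class = iota
	classLetter
	classSpace
	classPunct
)

// classified records one decoded rune with its byte offset and class.
type classified struct {
	offset int
	r      rune
	cls    class
}

func classify(r rune) class {
	switch {
	case unicode.IsLetter(r):
		return classLetter
	case unicode.IsSpace(r):
		return classSpace
	case unicode.IsPunct(r):
		return classPunct
	}
	return classOther
}

// decodeOnce walks the UTF-8 input a single time, appending into buf so the
// caller can reuse the same backing array across calls.
func decodeOnce(s string, buf []classified) []classified {
	buf = buf[:0]
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		buf = append(buf, classified{offset: i, r: r, cls: classify(r)})
		i += size
	}
	return buf
}

func main() {
	text := "Héllo, wörld. Bye."
	buf := make([]classified, 0, len(text)) // at most one rune per byte, so no regrowth
	data := decodeOnce(text, buf)

	// Grapheme, word, and sentence passes (not shown) would all read data
	// instead of decoding the string again.
	fmt.Println(len(text), "bytes,", len(data), "runes")
}
```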
Conformance
Maintains 100% conformance on the official Unicode 17.0.0 segmentation test suites (GraphemeBreakTest.txt, WordBreakTest.txt, SentenceBreakTest.txt):
- Grapheme: 766/766 tests passing
- Word: 1,944/1,944 tests passing
- Sentence: 512/512 tests passing
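Conformance against these files can be checked line by line. Each test line lists hexadecimal code points separated by `÷` (boundary) or `×` (no boundary), followed by a `#` comment. The Go sketch below parses one such line into the test string and its expected boundary byte offsets; `parseTestLine` is a hypothetical helper, not the project's actual test harness, and the sample line in `main` is constructed for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTestLine parses one line of a UAX #29 test file. "÷" marks a
// boundary, "×" marks no boundary, and other fields are hex code points.
// It returns the test string and the byte offsets of expected boundaries.
func parseTestLine(line string) (string, []int, error) {
	if i := strings.IndexByte(line, '#'); i >= 0 {
		line = line[:i] // drop the trailing comment
	}
	var sb strings.Builder
	var breaks []int
	for _, field := range strings.Fields(line) {
		switch field {
		case "÷":
			breaks = append(breaks, sb.Len()) // boundary at the current byte offset
		case "×":
			// no boundary at this position
		default:
			cp, err := strconv.ParseUint(field, 16, 32)
			if err != nil {
				return "", nil, fmt.Errorf("bad code point %q: %w", field, err)
			}
			sb.WriteRune(rune(cp))
		}
	}
	return sb.String(), breaks, nil
}

func main() {
	s, breaks, err := parseTestLine("÷ 0041 × 0308 ÷ 0020 ÷\t# A + COMBINING DIAERESIS, SPACE")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q %v\n", s, breaks)
}
```

A segmenter passes a line when the boundary offsets it reports equal the parsed ones.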
Breaking Changes
None: all existing APIs remain backward compatible.