v3.0.0: Hierarchical Break Detection

SCKelemen released this 16 Dec 20:20
· 39 commits to main since this release

Performance Improvements

Version 3.0.0 implements hierarchical optimization for the single-pass FindAllBreaks() API introduced in v2.0.0.

Hierarchical Break Detection

Leverages the natural subset relationships between break types:

  • Words ⊆ Graphemes: Word breaks are checked only at grapheme cluster boundaries
  • Sentences ⊆ Words: Sentence breaks are checked only at word boundaries

This skips word and sentence rule checks at positions that cannot be boundaries. In the benchmarks below, the resulting speedup ranges from 1.57x to 2.24x.
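The pruning idea can be illustrated with a minimal Go sketch. It is not the library's implementation: `findAllBreaksSketch`, `boundaryFunc`, and the three `toy*` predicates are hypothetical names, and the toy rules are far simpler than the real Unicode segmentation rules. The sketch only shows how a position that fails the grapheme check is never handed to the word rule, and a position that fails the word check is never handed to the sentence rule.

```go
package main

import (
	"fmt"
	"unicode"
)

// boundaryFunc reports whether a break may occur before runes[i].
// Hypothetical stand-in for a real rule set; not the library's API.
type boundaryFunc func(runes []rune, i int) bool

// Toy rules for illustration only (assumptions, much simpler than the real rules).

// toyGrapheme never splits before a combining mark.
func toyGrapheme(runes []rune, i int) bool {
	return !unicode.Is(unicode.Mn, runes[i])
}

// toyWord breaks wherever text switches between space and non-space.
func toyWord(runes []rune, i int) bool {
	return unicode.IsSpace(runes[i-1]) != unicode.IsSpace(runes[i])
}

// toySentence breaks after a period followed by a single space.
func toySentence(runes []rune, i int) bool {
	return i >= 2 && runes[i-2] == '.' && runes[i-1] == ' '
}

// findAllBreaksSketch evaluates the predicates hierarchically: the word rule
// runs only at grapheme breaks, and the sentence rule only at word breaks.
func findAllBreaksSketch(runes []rune, isGrapheme, isWord, isSentence boundaryFunc) (graphemes, words, sentences []int) {
	for i := 1; i < len(runes); i++ {
		if !isGrapheme(runes, i) {
			continue // not a grapheme break, so it cannot be a word or sentence break
		}
		graphemes = append(graphemes, i)
		if !isWord(runes, i) {
			continue // not a word break, so it cannot be a sentence break
		}
		words = append(words, i)
		if isSentence(runes, i) {
			sentences = append(sentences, i)
		}
	}
	return graphemes, words, sentences
}

func main() {
	text := []rune("Hi. Cafe\u0301 ok")

	// Wrap the word and sentence rules to count how often each one runs.
	wordChecks, sentenceChecks := 0, 0
	countedWord := func(r []rune, i int) bool { wordChecks++; return toyWord(r, i) }
	countedSentence := func(r []rune, i int) bool { sentenceChecks++; return toySentence(r, i) }

	g, w, s := findAllBreaksSketch(text, toyGrapheme, countedWord, countedSentence)
	fmt.Println("grapheme breaks:", g)
	fmt.Println("word breaks:    ", w)
	fmt.Println("sentence breaks:", s)
	fmt.Printf("word rule ran %d times, sentence rule ran %d times, out of %d positions\n",
		wordChecks, sentenceChecks, len(text)-1)
}
```

In this sketch the sentence rule runs only at the few positions that survive both earlier checks, which is where the savings come from as the input grows.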

Benchmark Results

Measured on an Apple M4 Pro, comparing the v3.0.0 single-pass call against three separate function calls (lower ns/op is better):

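| Text Length       | v2.0.0 Three Passes | v3.0.0 Single Pass | Speedup |
|-------------------|---------------------|--------------------|---------|
| Short (33 chars)  | 3,457 ns/op         | 2,197 ns/op        | 1.57x   |
| Medium (86 chars) | 16,191 ns/op        | 9,636 ns/op        | 1.68x   |
| Long (467 chars)  | 423,491 ns/op       | 188,982 ns/op      | 2.24x   |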
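The Speedup column is the three-pass time divided by the single-pass time. A short Go sketch that recomputes it from the ns/op figures above (the `speedup` helper is a hypothetical name used only for this illustration):

```go
package main

import "fmt"

// speedup returns how many times faster the optimized run is than the baseline.
func speedup(baselineNs, optimizedNs float64) float64 {
	return baselineNs / optimizedNs
}

func main() {
	// ns/op figures copied from the benchmark results above.
	rows := []struct {
		name       string
		threePass  float64
		singlePass float64
	}{
		{"Short", 3457, 2197},
		{"Medium", 16191, 9636},
		{"Long", 423491, 188982},
	}
	for _, r := range rows {
		fmt.Printf("%-6s %.2fx\n", r.name, speedup(r.threePass, r.singlePass))
	}
}
```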

Key benefits:

  • Speedup increases with text length (hierarchical pruning is more effective on longer text)
  • Single UTF-8 decode and classification pass
  • Pre-classified data reused across all three break types
  • No additional memory allocations compared to v2.0.0
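The "decode once, classify once, reuse" pattern can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `classifyOnce`, `runeClass`, `classified`, `wordStarts`, and `punctOffsets` are hypothetical names, and the four coarse classes stand in for the real Unicode break properties. Unlike the release described here, the sketch makes no attempt to avoid allocations.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// runeClass is a coarse toy classification (an assumption for this sketch).
type runeClass uint8

const (
	classOther runeClass = iota
	classLetter
	classSpace
	classPunct
)

// classified records where a rune starts in the input and its class.
type classified struct {
	offset int       // byte offset of the rune in the input string
	class  runeClass // computed once, then reused by every consumer
}

// classifyOnce decodes the UTF-8 input a single time and classifies each rune.
func classifyOnce(s string) []classified {
	out := make([]classified, 0, len(s))
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		c := classOther
		switch {
		case unicode.IsLetter(r):
			c = classLetter
		case unicode.IsSpace(r):
			c = classSpace
		case unicode.IsPunct(r):
			c = classPunct
		}
		out = append(out, classified{offset: i, class: c})
		i += size
	}
	return out
}

// wordStarts reuses the classes to find letters that follow a non-letter.
func wordStarts(cs []classified) []int {
	var starts []int
	for i, c := range cs {
		if c.class == classLetter && (i == 0 || cs[i-1].class != classLetter) {
			starts = append(starts, c.offset)
		}
	}
	return starts
}

// punctOffsets reuses the same classes without decoding the string again.
func punctOffsets(cs []classified) []int {
	var offsets []int
	for _, c := range cs {
		if c.class == classPunct {
			offsets = append(offsets, c.offset)
		}
	}
	return offsets
}

func main() {
	cs := classifyOnce("h\u00e9llo, w\u00f6rld.")
	fmt.Println("runes:", len(cs))
	fmt.Println("word starts (byte offsets):", wordStarts(cs))
	fmt.Println("punctuation (byte offsets):", punctOffsets(cs))
}
```

Both consumers read the same pre-classified slice, so the input is decoded and classified only once, however many break types are computed from it.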

Conformance

Maintains 100% conformance on all official Unicode 17.0.0 test suites:

  • Grapheme: 766/766 tests passing
  • Word: 1,944/1,944 tests passing
  • Sentence: 512/512 tests passing

Breaking Changes

None - all existing APIs remain backward compatible.