Commit 3e18a07

[stdlib] Fix implementation of Unicode text segmentation for word boundaries
Carefully overhaul our word breaking implementation to follow the recommendations of Unicode Annex #29. Start exposing the core primitives (as well as `String`-level interfaces), so that folks can prototype proper API for these concepts.

- Fix `_wordIndex(after:)` to always advance forward. It now requires its input index to be on a word boundary. Remove the `@_spi` attribute, exposing it as a (hidden, but) public entry point.
- The old SPIs `_wordIndex(before:)` and `_nearestWordIndex(atOrBelow:)` were irredeemably broken; follow the Unicode recommendation for implementing random-access text segmentation and replace them both with a new public `_wordIndex(somewhereAtOrBefore:)` entry point.
- Expose handcrafted low-level state machines for detecting word boundaries (`_WordRecognizer`, `_RandomAccessWordRecognizer`), following the design of `_CharacterRecognizer`.
- Add tests to reliably validate that the two state machine flavors always produce consistent results.

rdar://155482680
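
For readers unfamiliar with the two entry-point shapes, here is a minimal sketch of the contract only. It is not the stdlib implementation and does not implement the Annex #29 rules: it assumes a deliberately simplified, hypothetical rule (a boundary exists between two characters unless both are letters or numbers), and every `toy…` name is invented for illustration. The forward function requires a boundary-aligned input and always advances; the random-access function accepts any character-aligned index and returns some boundary at or before it.

```swift
// Toy illustration only: NOT the UAX #29 rules and NOT the stdlib code.
// Assumed rule: a boundary separates two characters unless both are
// letters or numbers. All `toy…` names are hypothetical.
extension String {
  /// Returns true if `i` is a boundary under the toy rule.
  func toyIsWordBoundary(_ i: Index) -> Bool {
    if i == startIndex || i == endIndex { return true }
    let isWordy: (Character) -> Bool = { $0.isLetter || $0.isNumber }
    return !(isWordy(self[index(before: i)]) && isWordy(self[i]))
  }

  /// Forward iteration: `i` must be on a boundary and below `endIndex`.
  /// The result is always strictly greater than `i`.
  func toyWordIndex(after i: Index) -> Index {
    precondition(i < endIndex && toyIsWordBoundary(i))
    var j = index(after: i)
    while !toyIsWordBoundary(j) { formIndex(after: &j) }
    return j
  }

  /// Random access: returns some boundary at or before `i`.
  func toyWordIndex(somewhereAtOrBefore i: Index) -> Index {
    var j = i
    while !toyIsWordBoundary(j) { formIndex(before: &j) }
    return j
  }
}

// Walk forward from a known boundary, collecting each segment.
let text = "Hello, world"
var i = text.startIndex
var pieces: [String] = []
while i < text.endIndex {
  let next = text.toyWordIndex(after: i)
  pieces.append(String(text[i..<next]))
  i = next
}

// Recover a boundary from an arbitrary position inside "world".
let mid = text.index(text.startIndex, offsetBy: 9)
let boundary = text.toyWordIndex(somewhereAtOrBefore: mid)
```

The toy rule only ever inspects two adjacent characters, so stepping backward one character at a time is enough to find a boundary. The real rules need more surrounding context, which is presumably why the commit introduces a separate `_RandomAccessWordRecognizer` state machine rather than reusing the forward one.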
1 parent 22b1205 commit 3e18a07

File tree

7 files changed

+1153
-690
lines changed

stdlib/public/core/StringIndexValidation.swift

Lines changed: 0 additions & 17 deletions
@@ -400,20 +400,3 @@ extension _StringGuts {
       scalarAlign(validateInclusiveSubscalarIndex_5_7(i)))
   }
 }
-
-// Word index validation (String)
-extension _StringGuts {
-  internal func validateWordIndex(
-    _ i: String.Index
-  ) -> String.Index {
-    return roundDownToNearestWord(scalarAlign(validateSubscalarIndex(i)))
-  }
-
-  internal func validateInclusiveWordIndex(
-    _ i: String.Index
-  ) -> String.Index {
-    return roundDownToNearestWord(
-      scalarAlign(validateInclusiveSubscalarIndex(i))
-    )
-  }
-}
