| 
 | 4 | + | 
 | 5 | +[`Characters`][Characters] are strings viewed as  | 
 | 6 | +sequences of **user-perceived character**s,  | 
 | 7 | +also known as [Unicode (extended) grapheme clusters][Grapheme Clusters].  | 
 | 8 | + | 
 | 9 | +The [`Characters`][Characters] class allows access to  | 
 | 10 | +the individual characters of a string,  | 
 | 11 | +and a way to navigate back and forth between them  | 
 | 12 | +using a [`CharacterRange`][CharacterRange].  | 
 | 13 | + | 
 | 14 | +## Unicode characters and representations  | 
 | 15 | + | 
 | 16 | +There is no such thing as plain text.  | 
 | 17 | + | 
 | 18 | +Computers only know numbers,  | 
 | 19 | +so any "text" on a computer is represented by numbers,  | 
 | 20 | +which are again stored as bytes in memory.  | 
 | 21 | + | 
 | 22 | +The meaning of those bytes is provided by layers of interpretation,  | 
 | 23 | +building up to the *glyph*s that the computer displays on the screen.  | 
 | 24 | + | 
 | 25 | +| Abstraction           | Dart Type                                                    | Usage                                                        | Example                                                      |  | 
 | 26 | +| --------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |  | 
 | 27 | +| Bytes                 | [`ByteBuffer`][ByteBuffer],<br />[`Uint8List`][Uint8List]                           | Physical layout: Memory or network communication.            | `file.readAsBytesSync()`                                     |  | 
 | 28 | +| [Code units][]        | [`Uint8List`][Uint8List] (UTF‑8)<br />[`Uint16List`][Uint16List], [`String`][String] (UTF‑16) | Standard formats for<br /> encoding code points in memory.<br />Each code unit is stored in memory using one (UTF‑8) or two (UTF‑16) bytes. One or more code units encode a code point. | `string.codeUnits`<br />`string.codeUnitAt(index)`<br />`utf8.encode(string)` |  | 
 | 29 | +| [Code points][]       | [`Runes`][Runes]                                                    | The Unicode unit of meaning.                                 | `string.runes`                                               |  | 
 | 30 | +| [Grapheme Clusters][] | [`Characters`][Characters]                                               | Human perceived character. One or more code points.          | `string.characters`                                          |  | 
 | 31 | +| [Glyphs][]            |                                                              | Visual rendering of grapheme clusters.                       | `print(string)`                                              |  | 
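
As a rough illustration of the layers in the table, the sketch below measures one flag emoji at each level. The helper name `layerCounts` is hypothetical (it is not part of any library), and the snippet assumes `package:characters` is available as a dependency.

```dart
import 'dart:convert' show utf8;

import 'package:characters/characters.dart';

/// Hypothetical helper: sizes of [text] at each abstraction level, as
/// [UTF-8 bytes, UTF-16 code units, code points, grapheme clusters].
List<int> layerCounts(String text) => [
      utf8.encode(text).length,
      text.codeUnits.length,
      text.runes.length,
      text.characters.length,
    ];

void main() {
  // The UK flag is two regional indicator code points: U+1F1EC U+1F1E7.
  const flag = '\u{1F1EC}\u{1F1E7}';
  print(layerCounts(flag)); // [8, 4, 2, 1]
}
```

The same user-perceived character therefore has four different "lengths", depending on which layer does the counting.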
 | 32 | + | 
 | 33 | +A Dart `String` is a sequence of UTF-16 code units,  | 
 | 34 | +just like strings in JavaScript and Java.  | 
 | 35 | +The runtime system decides on the underlying physical representation.  | 
 | 36 | + | 
 | 37 | +That makes plain strings inadequate  | 
 | 38 | +when you need to manipulate text that a user is viewing or entering,  | 
 | 39 | +because string operations do not work at the grapheme cluster level.  | 
 | 40 | + | 
 | 41 | +For example, to abbreviate a text to, say, the first 15 characters or glyphs,  | 
 | 42 | +a string like "A 🇬🇧 text in English"  | 
 | 43 | +should abbreviate to "A 🇬🇧 text in Eng…" when counting characters,  | 
 | 44 | +but will become "A 🇬🇧 text in …"  | 
 | 45 | +if counting code units using [`String`][String] operations.  | 
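
The abbreviation example can be sketched in code as follows. Both function names are hypothetical and chosen only for this illustration; the snippet assumes `package:characters` is a dependency.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: truncates after [max] UTF-16 code units.
String abbreviateByCodeUnits(String text, int max) =>
    text.length <= max ? text : '${text.substring(0, max)}…';

/// Hypothetical helper: truncates after [max] grapheme clusters.
String abbreviateByCharacters(String text, int max) {
  final chars = text.characters;
  return chars.length <= max ? text : '${chars.take(max)}…';
}

void main() {
  // The flag is written with escapes: U+1F1EC U+1F1E7 (four code units).
  const text = 'A \u{1F1EC}\u{1F1E7} text in English';
  print(abbreviateByCodeUnits(text, 15)); // A 🇬🇧 text in …
  print(abbreviateByCharacters(text, 15)); // A 🇬🇧 text in Eng…
}
```

The flag occupies four code units but only one character, which is why the code-unit version runs out of budget three letters early. A code-unit cut can also land in the middle of a cluster, which this particular input happens to avoid.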
 | 46 | + | 
 | 47 | +Whenever you need to manipulate strings at the character level,  | 
 | 48 | +you should be using the [`Characters`][Characters] type,  | 
 | 49 | +not the methods of the [`String`][String] class.  | 
 | 50 | + | 
 | 51 | +## The Characters class  | 
 | 52 | + | 
 | 53 | +The [`Characters`][Characters] class exposes a string  | 
 | 54 | +as a sequence of grapheme clusters.  | 
 | 55 | +All operations on [`Characters`][Characters] operate  | 
 | 56 | +on entire grapheme clusters,  | 
 | 57 | +so it removes the risk of splitting combined characters or emojis  | 
 | 58 | +that is inherent in the code-unit based [`String`][String] operations.  | 
 | 59 | + | 
 | 60 | +You can get a [`Characters`][Characters] object for a string using either  | 
 | 61 | +the constructor [`Characters(string)`][Characters constructor]  | 
 | 62 | +or the extension getter `string.characters`.  | 
 | 63 | + | 
 | 64 | +At its core, the class is an [`Iterable<String>`][Iterable]  | 
 | 65 | +where the element strings are single grapheme clusters.  | 
 | 66 | +This allows sequential access to the individual grapheme clusters  | 
 | 67 | +of the original string.  | 
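
A minimal sketch of obtaining a `Characters` object both ways and iterating it, assuming `package:characters` is a dependency. The helper `clustersOf` is a hypothetical name used only here.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: the grapheme clusters of [text] as a list of strings.
List<String> clustersOf(String text) => text.characters.toList();

void main() {
  // "café" written with a combining acute accent (U+0301) after the "e".
  const text = 'cafe\u0301';

  // The constructor and the extension getter wrap the same string.
  final viaConstructor = Characters(text);
  final viaGetter = text.characters;
  print(viaConstructor.string == viaGetter.string); // true

  // Each element is a String holding exactly one grapheme cluster.
  for (final character in viaGetter) {
    print('$character: ${character.codeUnits.length} code unit(s)');
  }

  print(text.length); // 5 code units
  print(clustersOf(text).length); // 4 characters
}
```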
 | 68 | + | 
 | 69 | +On top of that, there are operations mirroring the operations  | 
 | 70 | +of [`String`][String] that are not index, code-unit or code-point based,  | 
 | 71 | +like [`startsWith`][Characters.startsWith]  | 
 | 72 | +or [`replaceAll`][Characters.replaceAll].  | 
 | 73 | +There are some differences between these and the [`String`][String] operations.  | 
 | 74 | +For example, the replace methods only accept characters as patterns.  | 
 | 75 | +Regular expressions are not grapheme cluster aware,  | 
 | 76 | +so they cannot be used safely on a sequence of characters.  | 
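
To illustrate how these operations can differ from their `String` counterparts, the sketch below compares the two on a string containing a combining accent. The helper names are hypothetical, and the snippet assumes `package:characters` is a dependency.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: replaces whole-character matches of [from] with [to].
String replaceCharacters(String text, String from, String to) =>
    text.characters.replaceAll(from.characters, to.characters).string;

/// Hypothetical helper: whether [text] starts with the characters of [prefix].
bool startsWithCharacters(String text, String prefix) =>
    text.characters.startsWith(prefix.characters);

void main() {
  // "café menu", with the "é" written as "e" + combining acute (U+0301).
  const text = 'cafe\u0301 menu';

  // Code-unit based: the bare "e" inside "é" matches.
  print(text.startsWith('cafe')); // true
  print(text.replaceAll('e', 'E') == 'cafE\u0301 mEnu'); // true

  // Character based: "é" is a single character that differs from "e".
  print(startsWithCharacters(text, 'cafe')); // false
  print(replaceCharacters(text, 'e', 'E') == 'cafe\u0301 mEnu'); // true
}
```

Note that the patterns here are themselves `Characters`, not regular expressions.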
 | 77 | + | 
 | 78 | +Grapheme clusters have varying length in the underlying representation,  | 
 | 79 | +so operations on a [`Characters`][Characters] sequence cannot be index based.  | 
 | 80 | +Instead, the [`CharacterRange`][CharacterRange] *iterator*  | 
 | 81 | +provided by [`Characters.iterator`][Characters.iterator]  | 
 | 82 | +has been greatly enhanced.  | 
 | 83 | +It can move both forwards and backwards,  | 
 | 84 | +and it can span a *range* of grapheme clusters.  | 
 | 85 | +Most operations that can be performed on a full [`Characters`][Characters]  | 
 | 86 | +can also be performed on the grapheme clusters  | 
 | 87 | +in the range of a [`CharacterRange`][CharacterRange].  | 
 | 88 | +The range can be contracted, expanded or moved in various ways,  | 
 | 89 | +and is not restricted to using [`moveNext`][CharacterRange.moveNext]  | 
 | 90 | +to move to the next grapheme cluster.  | 
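
A short sketch of moving, expanding and contracting a range, assuming `package:characters` is a dependency. The function `walk` is a hypothetical name; it records the current range after each step.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: applies a fixed series of range operations to
/// [text] and returns the range contents seen after each one.
List<String> walk(String text) {
  final range = text.characters.iterator; // Starts as an empty range.
  final seen = <String>[];

  range.moveNext(); // Move to the first character.
  seen.add(range.current);
  range.moveNext(); // Move to the second character.
  seen.add(range.current);
  range.expandNext(); // Grow the range by one character at the end.
  seen.add(range.current);
  range.dropFirst(); // Shrink the range by one character at the start.
  seen.add(range.current);
  range.moveBack(); // Move to the character before the range.
  seen.add(range.current);
  return seen;
}

void main() {
  // "a", a UK flag (U+1F1EC U+1F1E7), then "c".
  const flag = '\u{1F1EC}\u{1F1E7}';
  print(walk('a${flag}c')); // [a, 🇬🇧, 🇬🇧c, c, 🇬🇧]
}
```

Every step moves by whole grapheme clusters, so the four code units of the flag are never split.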
 | 91 | + | 
 | 92 | +Example:  | 
 | 93 | + | 
 | 94 | +```dart  | 
 | 95 | +// Using String indices.  | 
 | 96 | +String? firstTagString(String source) {  | 
 | 97 | +  var start = source.indexOf('<') + 1;  | 
 | 98 | +  if (start > 0) {  | 
 | 99 | +    var end = source.indexOf('>', start);  | 
 | 100 | +    if (end >= 0) {  | 
 | 101 | +      return source.substring(start, end);  | 
 | 102 | +    }  | 
 | 103 | +  }  | 
 | 104 | +  return null;  | 
 | 105 | +}  | 
 | 106 | + | 
 | 107 | +// Using CharacterRange operations.  | 
 | 108 | +Characters? firstTagCharacters(Characters source) {  | 
 | 109 | +  var range = source.findFirst('<'.characters);  | 
 | 110 | +  if (range != null && range.moveUntil('>'.characters)) {  | 
 | 111 | +    return range.currentCharacters;  | 
 | 112 | +  }  | 
 | 113 | +  return null;  | 
 | 114 | +}  | 
 | 115 | +```  | 
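
As a usage sketch, the `CharacterRange` version above can be called like this. The function is copied from the example so that the snippet runs on its own; the input strings are made up for illustration, and `package:characters` is assumed to be a dependency.

```dart
import 'package:characters/characters.dart';

// Copied from the example above so that this sketch is self-contained.
Characters? firstTagCharacters(Characters source) {
  var range = source.findFirst('<'.characters);
  if (range != null && range.moveUntil('>'.characters)) {
    return range.currentCharacters;
  }
  return null;
}

void main() {
  // findFirst selects "<"; moveUntil then selects everything up to ">".
  print(firstTagCharacters('Hello <b>world</b>'.characters)); // b

  // No "<" at all, or a "<" without a later ">", yields null.
  print(firstTagCharacters('no tags here'.characters)); // null
  print(firstTagCharacters('a < b'.characters)); // null
}
```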
 | 116 | + | 
 | 117 | +[ByteBuffer]: https://api.dart.dev/dart-typed_data/ByteBuffer-class.html "ByteBuffer class"  | 
 | 118 | +[CharacterRange.moveNext]: https://pub.dev/documentation/characters/latest/characters/CharacterRange/moveNext.html "CharacterRange.moveNext"  | 
 | 119 | +[CharacterRange]: https://pub.dev/documentation/characters/latest/characters/CharacterRange-class.html "CharacterRange class"  | 
 | 120 | +[Characters constructor]: https://pub.dev/documentation/characters/latest/characters/Characters/Characters.html "Characters constructor"  | 
 | 121 | +[Characters.iterator]: https://pub.dev/documentation/characters/latest/characters/Characters/iterator.html "Characters.iterator"  | 
 | 122 | +[Characters.replaceAll]: https://pub.dev/documentation/characters/latest/characters/Characters/replaceAll.html "Characters.replaceAll"  | 
 | 123 | +[Characters.startsWith]: https://pub.dev/documentation/characters/latest/characters/Characters/startsWith.html "Characters.startsWith"  | 
 | 124 | +[Characters]: https://pub.dev/documentation/characters/latest/characters/Characters-class.html "Characters class"  | 
 | 125 | +[Code Points]: https://unicode.org/glossary/#code_point "Unicode Code Point"  | 
 | 126 | +[Code Units]: https://unicode.org/glossary/#code_unit "Unicode Code Units"  | 
 | 127 | +[Glyphs]: https://unicode.org/glossary/#glyph "Unicode Glyphs"  | 
 | 128 | +[Grapheme Clusters]: https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries "Unicode (Extended) Grapheme Cluster"  | 
 | 129 | +[Iterable]: https://api.dart.dev/dart-core/Iterable-class.html "Iterable class"  | 
 | 130 | +[Runes]: https://api.dart.dev/dart-core/Runes-class.html "Runes class"  | 
 | 131 | +[String]: https://api.dart.dev/dart-core/String-class.html "String class"  | 
 | 132 | +[Uint16List]: https://api.dart.dev/dart-typed_data/Uint16List-class.html "Uint16List class"  | 
 | 133 | +[Uint8List]: https://api.dart.dev/dart-typed_data/Uint8List-class.html "Uint8List class"  | 