Commit 6559f58

committed
more docs
1 parent 7911fad commit 6559f58

README.md

Lines changed: 14 additions & 6 deletions
@@ -89,12 +89,8 @@ interface Range {
 startByte: number;
 endByte: number;

-// Range in unicode characters. If you're trying to slice out parts of the string, you want this, not the byte.
-//
-// CAUTION: Javascript String.prototype.slice is not actually safe to use on these values
-// because it gets characters beyond UTF16 wrong. You want:
-// Array.from(myString).slice(startChar, endChar).join('')
-// instead.
+// Range in unicode codepoints.
+// CAUTION: see "Unicode Codepoint Slicing Warning" below.
 startChar: number;
 endChar: number;
 }
@@ -152,6 +148,18 @@ interface Parsed {
 }
 ````

+## Unicode Codepoint Slicing Warning
+
+If you have a string and want to use the range provided by our `parse` method to slice out parts of that string, you need to avoid two major pitfalls.
+
+First, use our `startChar` and `endChar`, not `startByte` and `endByte`. Earlier versions of this library only provided `start` and `end`, and they were always bytes, making string slicing unnecessarily difficult.
+
+Second, beware that JavaScript's `String.prototype.slice` doesn't actually work on Unicode codepoints. It works on UTF-16 code units, which are not the same thing. Instead, you can rely on `String.prototype[Symbol.iterator]`, which _does_ work on Unicode codepoints. So this is safe, even when fancy things like emojis are present:
+
+```js
+Array.from(myString).slice(range.startChar, range.endChar).join("");
+```
+
 ## Contributing

See the [CONTRIBUTING.md](./CONTRIBUTING.md) file.
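
The second pitfall described in the new README section can be seen in a small sketch. This is not part of the commit: the helper name `sliceByCodePoints`, the sample string, and the sample `range` object are hypothetical, chosen only to show how `String.prototype.slice` and the `Array.from` approach diverge once a character outside the Basic Multilingual Plane appears.

```js
// Hypothetical helper (not part of the library): slice by Unicode codepoint
// offsets, using the string iterator via Array.from as the README suggests.
function sliceByCodePoints(text, startChar, endChar) {
  return Array.from(text).slice(startChar, endChar).join("");
}

// Sample input: the emoji is one codepoint but two UTF-16 code units.
const text = "a\u{1F600}b";

// Hypothetical range, in codepoints, covering only the emoji.
const range = { startChar: 1, endChar: 2 };

// Naive slicing counts UTF-16 code units and cuts the surrogate pair in half.
const naive = text.slice(range.startChar, range.endChar);

// Codepoint-aware slicing returns the whole emoji.
const safe = sliceByCodePoints(text, range.startChar, range.endChar);

console.log(text.length);             // number of UTF-16 code units
console.log(Array.from(text).length); // number of codepoints
console.log(safe === "\u{1F600}");    // whether the emoji survived intact
console.log(naive === "\uD83D");      // whether naive slicing left a lone surrogate
```

For strings that contain only Basic Multilingual Plane characters the two approaches agree, which is why this kind of bug tends to show up only once emoji or other astral characters appear in the input.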
