Commit 080e774

cleanup
1 parent edad29d commit 080e774

13 files changed: +2139 -1351 lines changed

README.md

Lines changed: 11 additions & 11 deletions
@@ -7,19 +7,19 @@
 
 A lightweight implementation of the [Unicode Text Segmentation (UAX \#29)](https://www.unicode.org/reports/tr29)
 
-- **Verified spec-compliance**: Up-to-date Unicode data, passes the official Unicode test suites, verifies full compliance with the `Intl.Segmenter` API via additional property-based testing, maintaining 100% coverage.
+- **Spec compliant**: Up-to-date Unicode data, verified by the official Unicode test suites and fuzzed with the native `Intl.Segmenter`, and maintaining 100% test coverage.
 
-- **Excellent compatibility**: It works well on older browsers, edge runtimes, and React Native (Hermes).
+- **Excellent compatibility**: It works well on older browsers, edge runtimes, React Native (Hermes) and QuickJS.
 
-- **Zero-dependencies**: It doesn't bloat `node_modules` or the networks tab. Just a small minimal snippet.
+- **Zero-dependencies**: It doesn't bloat `node_modules` or the network bandwidth. Like a small minimal snippet.
 
-- **Small bundle size**: It effectively compresses Unicode data and provides a tree-shakeable format, eliminating unused codes.
+- **Small bundle size**: It effectively compresses the Unicode data and provides a bundler-friendly format.
 
-- **Extremely efficient**: It's carefully optimized for performance, making it the fastest one in the ecosystem—outperforming even the built-in `Intl.Segmenter`.
+- **Extremely efficient**: It's carefully optimized for runtime performance, making it the fastest one in the ecosystem—outperforming even the built-in `Intl.Segmenter`.
 
 - **TypeScript**: It's fully type-checked, and provides type definitions and JSDoc.
 
-- **ESM-first**: It natively supports ES Modules, and still supports CommonJS.
+- **ESM-first**: It primarily supports ES modules, and still supports CommonJS.
 
 > [!NOTE]
 > unicode-segmenter is now **[e18e] recommendation!**
@@ -42,7 +42,7 @@ And extra utilities for combined use cases.
 
 - [`unicode-segmenter/emoji`](#export-unicode-segmenteremoji): Matches single codepoint emojis
 - [`unicode-segmenter/general`](#export-unicode-segmentergeneral): Matches single codepoint alphanumerics
-- [`unicode-segmenter/utils`](#export-unicode-segmenterutils): Handles UTF-16 codepoints
+- [`unicode-segmenter/utils`](#export-unicode-segmenterutils): Some utilities for handling codepoints
 
 ### Export `unicode-segmenter/grapheme`
 [![](https://edge.bundlejs.com/badge?q=unicode-segmenter/grapheme&treeshake=[*])](https://bundlejs.com/?q=unicode-segmenter%2Fgrapheme&treeshake=%5B*%5D)
@@ -254,7 +254,7 @@ Since [Hermes doesn't support the `Intl.Segmenter` API](https://github.com/faceb
 
 | Name | Unicode® | ESM? | Size | Size (min) | Size (min+gzip) | Size (min+br) |
 |------------------------------|----------|------|----------:|-----------:|----------------:|--------------:|
-| `unicode-segmenter/grapheme` | 16.0.0 | ✔️ | 17,313 | 12,783 | 5,285 | 3,946 |
+| `unicode-segmenter/grapheme` | 16.0.0 | ✔️ | 15,929 | 12,110 | 5,049 | 3,740 |
 | `graphemer` | 15.0.0 | ✖️ | 410,435 | 95,104 | 15,752 | 10,660 |
 | `grapheme-splitter` | 10.0.0 | ✖️ | 122,252 | 23,680 | 7,852 | 4,841 |
 | `@formatjs/intl-segmenter`* | 15.0.0 | ✖️ | 491,043 | 318,721 | 54,248 | 34,380 |
@@ -270,9 +270,9 @@ Since [Hermes doesn't support the `Intl.Segmenter` API](https://github.com/faceb
 
 | Name | Bytecode size | Bytecode size (gzip)* |
 |------------------------------|--------------:|----------------------:|
-| `unicode-segmenter/grapheme` | 24,386 | 12,690 |
-| `graphemer` | 133,949 | 31,710 |
-| `grapheme-splitter` | 63,810 | 19,125 |
+| `unicode-segmenter/grapheme` | 23,037 | 12,058 |
+| `graphemer` | 133,952 | 31,708 |
+| `grapheme-splitter` | 63,813 | 19,123 |
 
 * It would be compressed when included as an app asset.
 
benchmark/grapheme/_records/20250305-intel_x86_64-linux-nodejs_23.9.0.txt

Lines changed: 0 additions & 335 deletions
This file was deleted.

benchmark/grapheme/_records/20250305-intel_x86_64-linux-nodejs_23.9.0_no-pre-compute.txt

Lines changed: 0 additions & 335 deletions
This file was deleted.

benchmark/grapheme/_records/20250305-intel_x86_64-linux-nodejs_23.9.0_refactor-baseline.txt

Lines changed: 0 additions & 335 deletions
This file was deleted.

benchmark/grapheme/_records/20250305-intel_x86_64-linux-nodejs_23.9.0_refactor-eytzinger.txt

Lines changed: 0 additions & 335 deletions
This file was deleted.

benchmark/grapheme/_records/20250307-apple_m4_pro-macos_15.3-bun_1.2.4.txt

Lines changed: 335 additions & 0 deletions
Large diffs are not rendered by default.

benchmark/grapheme/_records/20250307-apple_m4_pro-macos_15.3-chrome_134.0.6998.45.txt

Lines changed: 197 additions & 0 deletions
Large diffs are not rendered by default.

benchmark/grapheme/_records/20250307-apple_m4_pro-macos_15.3-nodejs_23.9.0.txt

Lines changed: 335 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
+clk: ~4.25 GHz
+cpu: null
+runtime: null (null)
+
+benchmark avg (min … max) p75 / p99 (min … top 1%)
+------------------------------------------------------------- -------------------------------
+• Lorem ipsum (ascii)
+------------------------------------------------------------- -------------------------------
+unicode-segmenter/grapheme 2.95 µs/iter 2.93 µs █
+(2.69 µs … 3.42 µs) 3.17 µs ▂▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▃
+graphemer 13.65 µs/iter 0.00 ps █
+(0.00 ps … 1.00 ms) 1.00 ms █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
+grapheme-splitter 68.97 µs/iter 0.00 ps █
+(0.00 ps … 1.00 ms) 1.00 ms █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂
+@formatjs/intl-segmenter 67.99 µs/iter 0.00 ps █
+(0.00 ps … 24.00 ms) 1.00 ms █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂
+unicode-rs/unicode-segmentation (wasm-bindgen) 30.27 µs/iter 0.00 ps █
+(0.00 ps … 1.00 ms) 1.00 ms █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
+Intl.Segmenter 6.18 µs/iter 6.35 µs █ ▂
+(5.86 µs … 6.59 µs) 6.35 µs ▄▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█
+
+┌ ┐
+unicode-segmenter/grapheme ┤ 2.95 µs
+graphemer ┤■■■■■■ 13.65 µs
+grapheme-splitter ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 68.97 µs
+@formatjs/intl-segmenter ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 67.99 µs
+unicode-rs/unicode-segmentation (wasm-bindgen) ┤■■■■■■■■■■■■■■ 30.27 µs
+Intl.Segmenter ┤■■ 6.18 µs
+└ ┘
+
+summary
+unicode-segmenter/grapheme
+2.1x faster than Intl.Segmenter
+4.63x faster than graphemer
+10.26x faster than unicode-rs/unicode-segmentation (wasm-bindgen)
+23.05x faster than @formatjs/intl-segmenter
+23.38x faster than grapheme-splitter
+
+• Emojis
+------------------------------------------------------------- -------------------------------
+unicode-segmenter/grapheme 1.86 µs/iter 1.95 µs ▃ █
+(1.71 µs … 1.95 µs) 1.95 µs █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█
+graphemer 4.19 µs/iter 4.15 µs █
+(4.15 µs … 4.39 µs) 4.39 µs █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄
+grapheme-splitter 17.42 µs/iter 17.82 µs ▃ █ ▃
+(16.60 µs … 19.04 µs) 18.55 µs █▁▁█▁█▁▁▆▁▁▁▁▆▁▁▁▆▁▁▆
+@formatjs/intl-segmenter 19.80 µs/iter 0.00 ps █
+(0.00 ps … 1.00 ms) 1.00 ms █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
+unicode-rs/unicode-segmentation (wasm-bindgen) 8.86 µs/iter 8.79 µs █
+(8.79 µs … 9.03 µs) 9.03 µs █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆
+Intl.Segmenter 2.57 µs/iter 2.69 µs ▇ █
+(2.20 µs … 2.93 µs) 2.93 µs ▂▁▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁▂
+
+┌ ┐
+unicode-segmenter/grapheme ┤ 1.86 µs
+graphemer ┤■■■■ 4.19 µs
+grapheme-splitter ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 17.42 µs
+@formatjs/intl-segmenter ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 19.80 µs
+unicode-rs/unicode-segmentation (wasm-bindgen) ┤■■■■■■■■■■■■■ 8.86 µs
+Intl.Segmenter ┤■ 2.57 µs
+└ ┘
+
+summary
+unicode-segmenter/grapheme
+1.38x faster than Intl.Segmenter
+2.26x faster than graphemer
+4.77x faster than unicode-rs/unicode-segmentation (wasm-bindgen)
+9.38x faster than grapheme-splitter
+10.66x faster than @formatjs/intl-segmenter
+
+• Hindi
+------------------------------------------------------------- -------------------------------
+unicode-segmenter/grapheme 5.40 µs/iter 5.37 µs █
+(5.13 µs … 5.62 µs) 5.62 µs ▃▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▄
+graphemer 15.61 µs/iter 0.00 ps █
+(0.00 ps … 1.00 ms) 1.00 ms █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
+grapheme-splitter 25.70 µs/iter 26.12 µs █
+(24.90 µs … 27.34 µs) 26.61 µs ▅▁▁█▁▁█▁▁▅▁▁▁▁▅▁▁▅▁▁▅
+@formatjs/intl-segmenter 62.24 µs/iter 62.74 µs ▃ █
+(59.57 µs … 63.72 µs) 63.48 µs ▆▁▁▁▁▁▁▆▁▁▁▆█▁▁█▆▆▁▁▆
+unicode-rs/unicode-segmentation (wasm-bindgen) 25.61 µs/iter 25.63 µs █ █
+(25.39 µs … 26.37 µs) 25.88 µs █▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▄
+Intl.Segmenter 5.19 µs/iter 0.00 ps █
+(0.00 ps … 1.00 ms) 0.00 ps █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
+
+┌ ┐
+unicode-segmenter/grapheme ┤ 5.40 µs
+graphemer ┤■■■■■■ 15.61 µs
+grapheme-splitter ┤■■■■■■■■■■■■ 25.70 µs
+@formatjs/intl-segmenter ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 62.24 µs
+unicode-rs/unicode-segmentation (wasm-bindgen) ┤■■■■■■■■■■■■ 25.61 µs
+Intl.Segmenter ┤ 5.19 µs
+└ ┘
+
+summary
+unicode-segmenter/grapheme
+1.04x slower than Intl.Segmenter
+2.89x faster than graphemer
+4.74x faster than unicode-rs/unicode-segmentation (wasm-bindgen)
+4.76x faster than grapheme-splitter
+11.52x faster than @formatjs/intl-segmenter
+
+• Demonic characters
+------------------------------------------------------------- -------------------------------
+unicode-segmenter/grapheme 1.70 µs/iter 1.71 µs █
+(1.46 µs … 1.71 µs) 1.71 µs ▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█
+graphemer 6.84 µs/iter 6.84 µs █
+(6.59 µs … 7.32 µs) 7.08 µs █▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▆
+grapheme-splitter 5.52 µs/iter 5.62 µs █
+(5.37 µs … 5.86 µs) 5.86 µs █▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▄
+@formatjs/intl-segmenter 62.42 µs/iter 63.96 µs █ █ █ █
+(60.55 µs … 65.92 µs) 64.94 µs ██▁█▁▁█▁█▁▁▁▁▁▁▁█▁▁▁█
+unicode-rs/unicode-segmentation (wasm-bindgen) 3.32 µs/iter 3.42 µs ▃ █
+(3.17 µs … 3.42 µs) 3.42 µs █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█
+Intl.Segmenter 1.30 µs/iter 0.00 ps █
+(0.00 ps … 1.00 ms) 0.00 ps █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
+
+┌ ┐
+unicode-segmenter/grapheme ┤ 1.70 µs
+graphemer ┤■■■ 6.84 µs
+grapheme-splitter ┤■■ 5.52 µs
+@formatjs/intl-segmenter ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 62.42 µs
+unicode-rs/unicode-segmentation (wasm-bindgen) ┤■ 3.32 µs
+Intl.Segmenter ┤ 1.30 µs
+└ ┘
+
+summary
+unicode-segmenter/grapheme
+1.31x slower than Intl.Segmenter
+1.96x faster than unicode-rs/unicode-segmentation (wasm-bindgen)
+3.25x faster than grapheme-splitter
+4.03x faster than graphemer
+36.76x faster than @formatjs/intl-segmenter
+
+• Tweet text (combined)
+------------------------------------------------------------- -------------------------------
+unicode-segmenter/grapheme 7.11 µs/iter 7.08 µs █
+(6.84 µs … 7.57 µs) 7.32 µs ▅▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▅
+graphemer 21.34 µs/iter 21.48 µs ▃ █ █
+(20.26 µs … 23.68 µs) 21.97 µs ▆▁▁▁▁▁█▁▁█▁▆▁▁█▁▁▁▁▁▆
+grapheme-splitter 45.21 µs/iter 44.92 µs █
+(44.19 µs … 48.34 µs) 46.14 µs ▅▁▅▅▁█▁█▁▁▁▁▁▁▁▁▁▅▁▁▅
+@formatjs/intl-segmenter 83.55 µs/iter 0.00 ps █
+(0.00 ps … 1.00 ms) 1.00 ms █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂
+unicode-rs/unicode-segmentation (wasm-bindgen) 44.33 µs/iter 44.68 µs █ █ █
+(43.21 µs … 45.17 µs) 45.17 µs █▁▁█▁▁▁▁█▁▁▁██▁█▁▁█▁█
+Intl.Segmenter 9.24 µs/iter 9.52 µs █
+(9.03 µs … 9.77 µs) 9.52 µs █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█
+
+┌ ┐
+unicode-segmenter/grapheme ┤ 7.11 µs
+graphemer ┤■■■■■■ 21.34 µs
+grapheme-splitter ┤■■■■■■■■■■■■■■■■■ 45.21 µs
+@formatjs/intl-segmenter ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 83.55 µs
+unicode-rs/unicode-segmentation (wasm-bindgen) ┤■■■■■■■■■■■■■■■■■ 44.33 µs
+Intl.Segmenter ┤■ 9.24 µs
+└ ┘
+
+summary
+unicode-segmenter/grapheme
+1.3x faster than Intl.Segmenter
+3x faster than graphemer
+6.24x faster than unicode-rs/unicode-segmentation (wasm-bindgen)
+6.36x faster than grapheme-splitter
+11.76x faster than @formatjs/intl-segmenter
+
+• Code snippet (combined)
+------------------------------------------------------------- -------------------------------
+unicode-segmenter/grapheme 15.88 µs/iter 0.00 ps █
+(0.00 ps … 1.00 ms) 1.00 ms █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
+graphemer 47.69 µs/iter 47.61 µs █
+(46.88 µs … 50.29 µs) 48.83 µs ▄▁▁█▁▇▁▁▄▁▁▁▁▁▁▄▁▁▁▁▄
+grapheme-splitter 105.55 µs/iter 106.69 µs █ █ █
+(101.32 µs … 113.04 µs) 111.33 µs █▁██▁▁█▁█▁▁█▁▁▁▁▁▁▁██
+@formatjs/intl-segmenter 195.56 µs/iter 199.22 µs █ ▃
+(191.16 µs … 201.90 µs) 199.46 µs ▆▁█▁▁▁▆▁▆▆▁▁▁▁▆▁▁▁▁█▆
+unicode-rs/unicode-segmentation (wasm-bindgen) 104.02 µs/iter 104.98 µs █ █
+(102.05 µs … 105.71 µs) 105.47 µs █▁▁▁█▁█▁▁███▁▁▁▁▁█▁██
+Intl.Segmenter 20.12 µs/iter 20.51 µs █ ▄
+(19.53 µs … 21.00 µs) 20.51 µs ▅▁▁▁▁█▁▁▁▁█▁▁▁▁▅▁▁▁▁█
+
+┌ ┐
+unicode-segmenter/grapheme ┤ 15.88 µs
+graphemer ┤■■■■■■ 47.69 µs
+grapheme-splitter ┤■■■■■■■■■■■■■■■■■ 105.55 µs
+@formatjs/intl-segmenter ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 195.56 µs
+unicode-rs/unicode-segmentation (wasm-bindgen) ┤■■■■■■■■■■■■■■■■■ 104.02 µs
+Intl.Segmenter ┤■ 20.12 µs
+└ ┘
+
+summary
+unicode-segmenter/grapheme
+1.27x faster than Intl.Segmenter
+3x faster than graphemer
+6.55x faster than unicode-rs/unicode-segmentation (wasm-bindgen)
+6.65x faster than grapheme-splitter
+12.32x faster than @formatjs/intl-segmenter
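
Note on reading the summary blocks: each "Nx faster" figure appears to be the ratio of two average times per iteration. The sketch below recomputes the ratios for the "Lorem ipsum (ascii)" run from the averages printed in the record above. It assumes that interpretation; `avgMicros` and `speedup` are hypothetical names used only for illustration and are not part of the benchmark tooling.

```typescript
// Average time per iteration in microseconds, copied from the
// "Lorem ipsum (ascii)" section of the benchmark record.
const avgMicros: Record<string, number> = {
  "unicode-segmenter/grapheme": 2.95,
  "graphemer": 13.65,
  "grapheme-splitter": 68.97,
  "@formatjs/intl-segmenter": 67.99,
  "unicode-rs/unicode-segmentation (wasm-bindgen)": 30.27,
  "Intl.Segmenter": 6.18,
};

// Hypothetical helper: how many times faster `baseline` is than `other`,
// computed as other's average divided by baseline's average.
// A result below 1 means `baseline` is slower than `other`.
function speedup(baseline: string, other: string): number {
  return avgMicros[other] / avgMicros[baseline];
}

const baseline = "unicode-segmenter/grapheme";
for (const name of Object.keys(avgMicros)) {
  if (name === baseline) continue;
  console.log(`${speedup(baseline, name).toFixed(2)}x faster than ${name}`);
}
```

Because the inputs here are the rounded averages from the record, the recomputed ratios can differ from the printed summary in the last digit.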
