Commit 5d1ff06

Merge pull request #53 from wismill/feature/compat_case
Add Unicode.Char.Case.Compat
2 parents e28aaef + c66a12f commit 5d1ff06

11 files changed: +4686 -67 lines changed

Changelog.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 ## 0.3.0 (December 2021)
 
 - Support for big-endian architectures.
+- Added the module `Unicode.Char.Case.Compat`.
 - Added `GeneralCategory` data type and corresponding `generalCategoryAbbr`,
   `generalCategory` functions.
 - Added the following functions to `Unicode.Char.General`:
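
The changelog entry names the new module but not its contents. Judging from the benchmark cases in this commit, it provides `toLower`, `toTitle`, and `toUpper`, each compared against its `base` counterpart. As a rough sketch, and assuming (the diff does not confirm this) that the new functions return the same results as `Data.Char`, the expected behaviour can be illustrated with `base` alone. Swapping the import for `Unicode.Char.Case.Compat` is the hypothetical drop-in step; `caseForms` is a made-up helper.

```haskell
-- Illustrative only: uses base's Data.Char as a stand-in.
-- Assumption: Unicode.Char.Case.Compat exports toLower, toTitle, toUpper
-- with the same simple (one-to-one) case-mapping semantics.
import Data.Char (toLower, toTitle, toUpper)

-- Hypothetical helper: apply all three simple case mappings to one character.
caseForms :: Char -> (Char, Char, Char)
caseForms c = (toLower c, toTitle c, toUpper c)

main :: IO ()
main = do
  -- ASCII: title case and upper case coincide.
  print (caseForms 'a')
  -- U+01C6 (dz-with-caron digraph): title case U+01C5 differs from upper case U+01C4.
  print (caseForms '\x01C6')
  -- U+00DF (sharp s) has no single-character upper case, so it maps to itself.
  print (toUpper '\x00DF' == '\x00DF')
```

The digraph case is the reason a separate `toTitle` exists: for a few characters the title-case form is neither the lower-case nor the upper-case one.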

README.md

Lines changed: 136 additions & 57 deletions
@@ -20,85 +20,164 @@ Please see the haddock documentation for reference documentation.
 `unicode-data` is up to _5 times faster_ than `base`.
 
 The following benchmark compares the time taken in milliseconds to process all
-the Unicode code points for `base-4.16` and this package (v0.3).
+the Unicode code points for `base-4.16` (GHC 9.2.1) and this package (v0.3).
 Machine: 8 × AMD Ryzen 5 2500U on Linux.
 
 ```
 All
   Unicode.Char.Case
     isLower
-      base: OK (6.59s)
-        26 ms ± 238 μs
-      unicode-data: OK (1.16s)
-        4.5 ms ± 83 μs, 0.17x
+      base: OK (1.59s)
+        25 ms ± 583 μs
+      unicode-data: OK (2.01s)
+        3.9 ms ± 22 μs, 0.15x
     isUpper
-      base: OK (1.69s)
-        27 ms ± 459 μs
-      unicode-data: OK (1.21s)
-        4.8 ms ± 77 μs, 0.18x
+      base: OK (1.62s)
+        26 ms ± 1.0 ms
+      unicode-data: OK (2.00s)
+        3.9 ms ± 24 μs, 0.15x
+  Unicode.Char.Case.Compat
+    toLower
+      base: OK (1.46s)
+        23 ms ± 512 μs
+      unicode-data: OK (1.89s)
+        7.4 ms ± 112 μs, 0.32x
+    toTitle
+      base: OK (1.49s)
+        24 ms ± 399 μs
+      unicode-data: OK (1.92s)
+        7.5 ms ± 67 μs, 0.32x
+    toUpper
+      base: OK (1.46s)
+        23 ms ± 468 μs
+      unicode-data: OK (1.75s)
+        6.9 ms ± 99 μs, 0.30x
   Unicode.Char.General
     generalCategory
-      base: OK (0.92s)
-        131 ms ± 1.5 ms
-      unicode-data: OK (1.62s)
-        108 ms ± 1.2 ms, 0.82x
+      base: OK (1.95s)
+        129 ms ± 733 μs
+      unicode-data: OK (1.63s)
+        108 ms ± 1.1 ms, 0.84x
+    isAlphabetic
+      unicode-data: OK (1.28s)
+        312 μs ± 3.2 μs
     isAlphaNum
-      base: OK (3.28s)
-        26 ms ± 300 μs
-      unicode-data: OK (20.60s)
-        5.0 ms ± 59 μs, 0.19x
+      base: OK (1.56s)
+        25 ms ± 252 μs
+      unicode-data: OK (2.35s)
+        4.6 ms ± 31 μs, 0.19x
     isControl
-      base: OK (1.61s)
-        26 ms ± 463 μs
-      unicode-data: OK (1.22s)
-        4.8 ms ± 53 μs, 0.19x
+      base: OK (1.57s)
+        25 ms ± 551 μs
+      unicode-data: OK (2.16s)
+        4.2 ms ± 33 μs, 0.17x
     isMark
-      base: OK (0.80s)
-        26 ms ± 339 μs
-      unicode-data: OK (1.33s)
-        5.2 ms ± 77 μs, 0.20x
+      base: OK (1.63s)
+        26 ms ± 689 μs
+      unicode-data: OK (2.34s)
+        4.6 ms ± 27 μs, 0.18x
     isPrint
-      base: OK (3.32s)
-        26 ms ± 498 μs
-      unicode-data: OK (1.33s)
-        5.2 ms ± 55 μs, 0.20x
+      base: OK (1.62s)
+        26 ms ± 788 μs
+      unicode-data: OK (2.13s)
+        4.2 ms ± 73 μs, 0.16x
     isPunctuation
-      base: OK (3.41s)
-        27 ms ± 497 μs
-      unicode-data: OK (2.67s)
-        5.3 ms ± 28 μs, 0.20x
+      base: OK (1.61s)
+        26 ms ± 170 μs
+      unicode-data: OK (2.04s)
+        4.0 ms ± 30 μs, 0.16x
     isSeparator
-      base: OK (0.84s)
-        27 ms ± 422 μs
-      unicode-data: OK (1.41s)
-        5.5 ms ± 52 μs, 0.21x
+      base: OK (1.71s)
+        27 ms ± 247 μs
+      unicode-data: OK (2.20s)
+        4.3 ms ± 25 μs, 0.16x
     isSymbol
-      base: OK (1.72s)
-        27 ms ± 443 μs
-      unicode-data: OK (1.45s)
-        5.7 ms ± 112 μs, 0.21x
+      base: OK (1.68s)
+        27 ms ± 312 μs
+      unicode-data: OK (2.32s)
+        4.5 ms ± 41 μs, 0.17x
+    isWhiteSpace
+      unicode-data: OK (1.28s)
+        312 μs ± 3.5 μs
+    isHangul
+      unicode-data: OK (1.28s)
+        312 μs ± 2.6 μs
+    isHangulLV
+      unicode-data: OK (1.28s)
+        312 μs ± 2.8 μs
+    isJamo
+      unicode-data: OK (1.28s)
+        312 μs ± 2.7 μs
+    jamoLIndex
+      unicode-data: OK (1.28s)
+        312 μs ± 3.1 μs
+    jamoVIndex
+      unicode-data: OK (1.28s)
+        312 μs ± 2.9 μs
+    jamoTIndex
+      unicode-data: OK (1.28s)
+        312 μs ± 2.9 μs
   Unicode.Char.General.Compat
     isAlpha
-      base: OK (3.26s)
-        26 ms ± 254 μs
-      unicode-data: OK (2.66s)
-        5.2 ms ± 48 μs, 0.20x
+      base: OK (1.59s)
+        25 ms ± 446 μs
+      unicode-data: OK (2.14s)
+        4.2 ms ± 25 μs, 0.17x
     isLetter
-      base: OK (1.70s)
-        27 ms ± 453 μs
-      unicode-data: OK (1.33s)
-        5.2 ms ± 69 μs, 0.19x
+      base: OK (1.72s)
+        27 ms ± 677 μs
+      unicode-data: OK (2.14s)
+        4.2 ms ± 59 μs, 0.15x
     isSpace
-      base: OK (0.85s)
-        13 ms ± 237 μs
-      unicode-data: OK (1.69s)
-        6.7 ms ± 61 μs, 0.49x
+      base: OK (1.48s)
+        12 ms ± 99 μs
+      unicode-data: OK (2.30s)
+        4.5 ms ± 30 μs, 0.39x
+  Unicode.Char.Identifiers
+    isIDContinue
+      unicode-data: OK (1.28s)
+        312 μs ± 2.7 μs
+    isIDStart
+      unicode-data: OK (1.29s)
+        312 μs ± 2.7 μs
+    isXIDContinue
+      unicode-data: OK (1.28s)
+        312 μs ± 3.2 μs
+    isXIDStart
+      unicode-data: OK (1.28s)
+        312 μs ± 3.2 μs
+    isPatternSyntax
+      unicode-data: OK (1.28s)
+        312 μs ± 3.4 μs
+    isPatternWhitespace
+      unicode-data: OK (1.28s)
+        312 μs ± 2.9 μs
+  Unicode.Char.Normalization
+    isCombining
+      unicode-data: OK (1.28s)
+        313 μs ± 5.1 μs
+    combiningClass
+      unicode-data: OK (1.66s)
+        3.2 ms ± 113 μs
+    isCombiningStarter
+      unicode-data: OK (1.29s)
+        312 μs ± 3.2 μs
+    isDecomposable
+      Canonical
+        unicode-data: OK (1.29s)
+          312 μs ± 3.5 μs
+      Kompat
+        unicode-data: OK (1.28s)
+          312 μs ± 3.5 μs
+    decomposeHangul
+      unicode-data: OK (1.28s)
+        312 μs ± 3.0 μs
   Unicode.Char.Numeric
     isNumber
-      base: OK (1.67s)
-        26 ms ± 316 μs
-      unicode-data: OK (1.32s)
-        5.2 ms ± 91 μs, 0.20x
+      base: OK (1.66s)
+        26 ms ± 404 μs
+      unicode-data: OK (2.47s)
+        4.8 ms ± 22 μs, 0.18x
 ```
 
 ## Unicode database version update
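
The trailing factor on each `unicode-data` line (for example `0.32x`) appears to be the `unicode-data` mean time divided by the `base` mean time, so values below 1 mean `unicode-data` is faster. A minimal sketch of that arithmetic follows, using the rounded means printed in the table. The name `relativeTime` is made up for illustration and is not part of `tasty-bench` or this package.

```haskell
import Text.Printf (printf)

-- Hypothetical helper: ratio of a candidate's mean time to the baseline's.
-- Values below 1 mean the candidate is faster than the baseline.
relativeTime :: Double -> Double -> Double
relativeTime baseline candidate = candidate / baseline

main :: IO ()
main = do
  -- Mean times in milliseconds, copied from the new benchmark table.
  printf "toLower: %.2fx\n" (relativeTime 23 7.4)
  printf "toUpper: %.2fx\n" (relativeTime 23 6.9)
```

These two inputs reproduce the reported `0.32x` and `0.30x`. Because the table shows rounded means, a ratio recomputed this way can differ from the reported one in the last digit, as it does for `toTitle` (7.5 / 24 is 0.3125, reported as `0.32x`).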

bench/Main.hs

Lines changed: 15 additions & 0 deletions
@@ -4,6 +4,7 @@ import Test.Tasty.Bench (Benchmark, bgroup, bench, bcompare, nf, defaultMain)
 
 import qualified Data.Char as B
 import qualified Unicode.Char.Case as C
+import qualified Unicode.Char.Case.Compat as CC
 import qualified Unicode.Char.General as G
 import qualified Unicode.Char.General.Compat as GC
 import qualified Unicode.Char.Identifiers as I
@@ -28,6 +29,20 @@ main = defaultMain
       , Bench "unicode-data" C.isUpper
       ]
     ]
+  , bgroup "Unicode.Char.Case.Compat"
+    [ bgroup' "toLower"
+      [ Bench "base" B.toLower
+      , Bench "unicode-data" CC.toLower
+      ]
+    , bgroup' "toTitle"
+      [ Bench "base" B.toTitle
+      , Bench "unicode-data" CC.toTitle
+      ]
+    , bgroup' "toUpper"
+      [ Bench "base" B.toUpper
+      , Bench "unicode-data" CC.toUpper
+      ]
+    ]
   , bgroup "Unicode.Char.General"
     -- Character classification
     [ bgroup' "generalCategory"
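
Each `Bench` entry pairs a label with a `Char` function, and the README describes the workload as processing all Unicode code points. The definitions of `Bench` and `bgroup'` fall outside the hunks shown, so the following is only a guess at the kind of sweep being timed. It is written against `base` so that it runs without `tasty-bench` or `unicode-data`, and `sweep` is a made-up name.

```haskell
import Data.Char (ord, toLower, toUpper)
import Data.List (foldl')

-- Hypothetical workload: apply f to every Char and fold the results into
-- one Int, so that forcing the Int forces every call to f.
sweep :: (Char -> Char) -> Int
sweep f = foldl' (\acc c -> acc + ord (f c)) 0 [minBound .. maxBound]

main :: IO ()
main = do
  -- Number of Char values covered by one sweep (U+0000 through U+10FFFF).
  print (length [minBound .. maxBound :: Char])
  -- Checksums; the exact values depend on the Unicode version bundled with GHC.
  print (sweep toLower)
  print (sweep toUpper)
```

Folding into a single strict accumulator is one simple way to keep the compiler from discarding the calls; the real benchmark may force results differently, for instance through `nf`, which the file imports.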
