This gives a small performance boost to valid_utf8_to_uv(), which can be
called during pattern matching.
Here are some cachegrind comparisons with blead:
Key:
    Ir   Instruction read
    Dr   Data read
    Dw   Data write
    COND conditional branches
    IND  indirect branches
The numbers represent relative counts per loop iteration, compared to
blead at 100.0%.
Higher is better: for example, using half as many instructions gives 200%,
while using twice as many gives 50%.
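
As a minimal sketch of that scoring, assuming each percentage is simply the blead count divided by the hacked count, scaled to 100 (the function name and the raw counts are hypothetical, not taken from the benchmark tooling):

```c
#include <assert.h>

/* Hypothetical helper: relative score of the hacked build against blead.
 * Assumes score = 100 * blead_count / hacked_count, so that fewer
 * events in the hacked build yield a score above 100. */
double relative_score(double blead_count, double hacked_count)
{
    assert(hacked_count > 0.0);   /* avoid dividing by zero */
    return 100.0 * blead_count / hacked_count;
}
```

With these assumptions, relative_score(1000, 500) is 200 (half as many instructions) and relative_score(1000, 2000) is 50 (twice as many).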
              GCC                       CLANG

valid_utf8_to_uv(0x007f), length is 1
         blead   hacked             blead   hacked
        ------   ------            ------   ------
Ir      100.00   100.69    Ir      100.00    99.11
Dr      100.00   101.47    Dr      100.00    99.74
Dw      100.00   100.00    Dw      100.00    99.57
COND    100.00   101.20    COND    100.00   100.00
IND     100.00   100.00    IND     100.00    94.12

valid_utf8_to_uv(0x07ff), length is 2
         blead   hacked             blead   hacked
        ------   ------            ------   ------
Ir      100.00   100.68    Ir      100.00    99.04
Dr      100.00   101.47    Dr      100.00    99.74
Dw      100.00   100.00    Dw      100.00    99.57
COND    100.00   102.40    COND    100.00   101.23
IND     100.00   100.00    IND     100.00    94.12

valid_utf8_to_uv(0xfffd), length is 3
         blead   hacked             blead   hacked
        ------   ------            ------   ------
Ir      100.00   100.83    Ir      100.00    99.04
Dr      100.00   101.47    Dr      100.00    99.75
Dw      100.00   100.00    Dw      100.00    99.57
COND    100.00   102.99    COND    100.00   101.84
IND     100.00   100.00    IND     100.00    94.12

valid_utf8_to_uv(0xffffd), length is 4
         blead   hacked             blead   hacked
        ------   ------            ------   ------
Ir      100.00   100.91    Ir      100.00    99.13
Dr      100.00   101.46    Dr      100.00    99.75
Dw      100.00   100.00    Dw      100.00    99.57
COND    100.00   103.59    COND    100.00   102.45
IND     100.00   100.00    IND     100.00    94.12

valid_utf8_to_uv(0x3ffffff), length is 5
         blead   hacked             blead   hacked
        ------   ------            ------   ------
Ir      100.00   101.28    Ir      100.00    99.29
Dr      100.00   101.46    Dr      100.00    99.75
Dw      100.00   100.00    Dw      100.00    99.57
COND    100.00   104.19    COND    100.00   103.07
IND     100.00   100.00    IND     100.00    94.12

valid_utf8_to_uv(0x7fffffff), length is 6
         blead   hacked             blead   hacked
        ------   ------            ------   ------
Ir      100.00    89.83    Ir      100.00    88.83
Dr      100.00    95.22    Dr      100.00    92.94
Dw      100.00    92.44    Dw      100.00    91.63
COND    100.00    86.21    COND    100.00    87.11
IND     100.00   100.00    IND     100.00    88.89

Clang gives slightly worse results than gcc, but both compilers show an
improvement in conditional branches for two- through five-byte
characters.
This shows that the performance is significantly worse for code points
that take 6 bytes (or more, which I didn't include) to represent. These
are all well outside the Unicode range; hence are very rarely
encountered. Performance is improved a bit for the typical cases.
The algorithm used could handle 6 and 7 byte characters, but that
increases memory usage, and can lead to the compiler choosing to not
inline this function. In blead, experiments with clang gave these
results:

    Max bytes inlined    Instances in the code where not inlined
            3                              14
            4                              19
            5                              19
            6                              19
            7                              57

We really need to accommodate any Unicode code point, which is 4 bytes (5
on EBCDIC). But the others we don't care about. Even though 6 bytes
doesn't show as being worse than 4, I chose to not include it, because
we don't care about performance for these rare non-Unicode code points,
and it just might cause non-inlining for different compilers or clang
versions.
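
For reference, the byte lengths quoted in the benchmark headings follow from the usual extended UTF-8 ranges on ASCII platforms. The sketch below is illustrative only: utf8_len_for_cp() is a hypothetical helper, not a function in the Perl source, and it ignores both EBCDIC and code points that need 7 or more bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of perl: number of bytes needed to
 * encode a code point in extended UTF-8 on an ASCII platform.
 * Only the 1..6 byte forms (up to 31 bits) are covered here. */
int utf8_len_for_cp(uint32_t cp)
{
    assert(cp <= 0x7fffffffu);      /* longer forms are not handled */
    if (cp <= 0x7fu)      return 1; /* ASCII */
    if (cp <= 0x7ffu)     return 2;
    if (cp <= 0xffffu)    return 3;
    if (cp <= 0x1fffffu)  return 4; /* covers all of Unicode */
    if (cp <= 0x3ffffffu) return 5; /* above Unicode from here on */
    return 6;
}
```

Under these assumptions, the six inputs benchmarked above (0x007f through 0x7fffffff) map to lengths 1 through 6 respectively.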