Commit 7905ec3
authored
[libc] Use UMAXV.4S to reduce bcmp result.
We can use UMAXV.4S to reduce the comparison result in a single
instruction. This improves performance by roughly 4% on Apple M1:
Summary
bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10 ran
1.01 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.01 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.01 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark3 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.01 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.02 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.03 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.03 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark2 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.05 ± 0.03 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
1.05 ± 0.02 times faster than bin/libc.src.string.bcmp_benchmark1 --study-name="new bcmp" --sweep-mode --sweep-max-size=128 --output=/dev/null --num-trials=10
(1 = original, 2 = a variant of this patch that uses UMAXV.16B, 3 = this patch)
Reviewers: michaelrj-google, gchatelet, overmighty, SchrodingerZhu
Pull Request: llvm#992601 parent d742903 commit 7905ec3
1 file changed
+6
-12
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
84 | 84 | | |
85 | 85 | | |
86 | 86 | | |
87 | | - | |
88 | | - | |
| 87 | + | |
89 | 88 | | |
90 | 89 | | |
91 | 90 | | |
| |||
97 | 96 | | |
98 | 97 | | |
99 | 98 | | |
100 | | - | |
101 | | - | |
102 | | - | |
| 99 | + | |
103 | 100 | | |
104 | | - | |
105 | | - | |
| 101 | + | |
106 | 102 | | |
107 | 103 | | |
108 | 104 | | |
| |||
129 | 125 | | |
130 | 126 | | |
131 | 127 | | |
132 | | - | |
133 | | - | |
| 128 | + | |
134 | 129 | | |
135 | 130 | | |
136 | 131 | | |
| |||
150 | 145 | | |
151 | 146 | | |
152 | 147 | | |
153 | | - | |
154 | | - | |
155 | | - | |
| 148 | + | |
| 149 | + | |
156 | 150 | | |
157 | 151 | | |
158 | 152 | | |
| |||
0 commit comments