Commit 75bf739
authored
[libc][gpu] Disable loop unrolling in the throughput benchmark loop (#153971)
This patch makes GPU throughput benchmark results more comparable across
targets by disabling loop unrolling in the benchmark loop.
Motivation:
* PTX (post-LTO) evidence on NVPTX: for libc `sin`, the generated PTX
shows the `throughput` loop unrolled 8x at `N=128` (one iteration
advances the input pointer by 64 bytes = 8 doubles), interleaving eight
independent chains before the back-edge. This hides latency and
significantly reduces cycles/call as the batch size `N` grows.
* Observed scaling (NVPTX measurements): with unrolling enabled, `sin`
dropped from ~3,100 cycles/call at `N=1` to ~360 at `N=128`. After
enforcing `#pragma clang loop unroll(disable)`, results stabilized
(e.g., from ~3100 cycles/call at `N=1` to ~2700 at `N=128`).
* libdevice contrast: the libdevice `sin` path did not exhibit a similar
drop in our measurements, and the PTX appears as compact internal calls
rather than a long FMA chain, leaving less ILP for the outer loop to
extract.
What this change does:
* Applies `#pragma clang loop unroll(disable)` to the GPU `throughput()`
loop in both NVPTX and AMDGPU backends.
Leaving unrolling entirely to the optimizer makes apples-to-apples
comparisons uneven (e.g., libc vs. vendor). Disabling unrolling yields
fairer, more consistent numbers.1 parent 3acb679 commit 75bf739
2 files changed
+16
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
117 | 117 | | |
118 | 118 | | |
119 | 119 | | |
| 120 | + | |
| 121 | + | |
120 | 122 | | |
121 | 123 | | |
122 | 124 | | |
| |||
146 | 148 | | |
147 | 149 | | |
148 | 150 | | |
| 151 | + | |
| 152 | + | |
149 | 153 | | |
150 | 154 | | |
151 | 155 | | |
| |||
174 | 178 | | |
175 | 179 | | |
176 | 180 | | |
| 181 | + | |
| 182 | + | |
177 | 183 | | |
178 | 184 | | |
179 | 185 | | |
| |||
206 | 212 | | |
207 | 213 | | |
208 | 214 | | |
| 215 | + | |
| 216 | + | |
209 | 217 | | |
210 | 218 | | |
211 | 219 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
106 | 106 | | |
107 | 107 | | |
108 | 108 | | |
| 109 | + | |
| 110 | + | |
109 | 111 | | |
110 | 112 | | |
111 | 113 | | |
| |||
135 | 137 | | |
136 | 138 | | |
137 | 139 | | |
| 140 | + | |
| 141 | + | |
138 | 142 | | |
139 | 143 | | |
140 | 144 | | |
| |||
163 | 167 | | |
164 | 168 | | |
165 | 169 | | |
| 170 | + | |
| 171 | + | |
166 | 172 | | |
167 | 173 | | |
168 | 174 | | |
| |||
195 | 201 | | |
196 | 202 | | |
197 | 203 | | |
| 204 | + | |
| 205 | + | |
198 | 206 | | |
199 | 207 | | |
200 | 208 | | |
| |||
0 commit comments