Commit 1f98379
Cache autotune timings to disk (#6261)
Some users have asked for a way to cache autotune results, both to
speed up local iteration and to avoid re-tuning when scaling up to
large numbers of GPUs.
This PR caches tuning timings in Triton's cache dir. Running locally on
03-matrix-multiplication.py, the second run (with a warm cache) takes
about 14 seconds instead of about 2 minutes:
```
% time python ./03-matrix-multiplication.py
...
real 1m59.055s
% time python ./03-matrix-multiplication.py
...
real 0m13.794s
```
The cache key consists of:
* system information (triton source, target info, env vars)
* kernel source code (with dependencies)
* the values of the tuning keys (e.g. M/N/K in the matmul example)
* the set of configs requested for tuning (so that we'll re-tune if the
user changes tunings)
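As a rough illustration of how the four ingredients above could be folded into one key, here is a minimal sketch. It is not the code from this PR: `autotune_cache_key`, its parameters, and the choice of SHA-256 over a JSON payload are all assumptions made for the example.

```python
import hashlib
import json


def autotune_cache_key(system_info, kernel_source, tuning_key_values, configs):
    """Hypothetical helper: hash the four key ingredients into one hex digest."""
    payload = json.dumps(
        {
            "system": system_info,                       # triton source, target, env vars
            "source": kernel_source,                     # kernel source incl. dependencies
            "key": [str(v) for v in tuning_key_values],  # e.g. M/N/K and dtypes
            "configs": configs,                          # requested tuning configs
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative inputs only; the strings below are placeholders.
base = autotune_cache_key(
    "triton-src-hash|target|env",
    "def matmul_kernel(...): ...",
    [3968, 3968, 3968, "torch.float16"],
    [{"BLOCK_SIZE_M": 128, "num_warps": 8}],
)
changed = autotune_cache_key(
    "triton-src-hash|target|env",
    "def matmul_kernel(...): ...",
    [3968, 3968, 3968, "torch.float16"],
    [{"BLOCK_SIZE_M": 128, "num_warps": 4}],
)
```

Because the requested configs are part of the hashed payload, `base` and `changed` differ, which is what forces a re-tune when the user edits the config list.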
If any configs have `pre_hook`s defined, we don't try caching at all,
since the results could depend on arbitrary python code.
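The `pre_hook` rule can be sketched as a simple guard. The `Config` dataclass and `can_cache` function below are hypothetical stand-ins written for this example, not Triton's own classes or the check used in this PR.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Config:
    """Hypothetical stand-in for an autotune config."""
    kwargs: dict = field(default_factory=dict)
    num_warps: int = 4
    pre_hook: Optional[Callable] = None


def can_cache(configs):
    """Return True only when no config carries a pre_hook."""
    return all(cfg.pre_hook is None for cfg in configs)


plain = [Config({"BLOCK_SIZE_M": 128}), Config({"BLOCK_SIZE_M": 64})]
# A single hooked config is enough to disable caching for the whole set.
hooked = plain + [Config({"BLOCK_SIZE_M": 32}, pre_hook=lambda args: None)]
```

Here `can_cache(plain)` is true and `can_cache(hooked)` is false, since one hook may run arbitrary Python that the cache key cannot capture.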
An excerpt from one of the cache entries (from the matmul tutorial) is:
```
% jq . $TRITON_CACHE_DIR/X5O...Q/matmul_kernel.autotune.json
{
  "key": [
    3968,
    3968,
    3968,
    "torch.float8_e5m2",
    "torch.float8_e5m2",
    "torch.float16"
  ],
  "configs_timings": [
    [
      {
        "kwargs": {
          "BLOCK_SIZE_M": 128,
          "BLOCK_SIZE_N": 256,
          "BLOCK_SIZE_K": 64,
          "GROUP_SIZE_M": 8
        },
        "num_warps": 8,
        "num_ctas": 1,
        "num_stages": 3,
        "maxnreg": null,
        "pre_hook": null
      },
      [
        0.14316800236701965,
        0.1420159935951233,
        0.14431999623775482
      ]
    ],
    ...
```
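For a dev inspecting such a file, a small script along these lines could pick out the fastest config. This is a sketch under assumptions: `best_config` is a hypothetical helper, ranking by the first number in each timing list is a guess at how the list should be read, and the second entry in `sample` is made up for illustration.

```python
import json


def best_config(entry_text):
    """Hypothetical helper: return the (config, timings) pair with the lowest first timing."""
    entry = json.loads(entry_text)
    # Each item in "configs_timings" is a [config, timings] pair.
    # Assumption: the first timing value is the one to compare on.
    config, timings = min(entry["configs_timings"], key=lambda pair: pair[1][0])
    return config, timings


# Illustrative data shaped like the excerpt above; values are not real measurements.
sample = json.dumps({
    "key": [3968, 3968, 3968],
    "configs_timings": [
        [{"kwargs": {"BLOCK_SIZE_M": 128}, "num_warps": 8}, [0.143, 0.142, 0.144]],
        [{"kwargs": {"BLOCK_SIZE_M": 64}, "num_warps": 4}, [0.120, 0.119, 0.121]],
    ],
})
config, timings = best_config(sample)
```

Storing all timings rather than only the winner is what makes this kind of after-the-fact analysis possible.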
It's not strictly necessary to encode the key, since it's part of the
hashed path, but I think it makes it easier to understand the cache
contents for any dev who needs to do so.
I considered a few different designs here:
* storing just the best config versus all timings (I like having the raw
data available in the cache from a dev perspective, but I could relent
on this; it's an easy change)
* allowing new configs to be added while re-using older cached ones (I
got cold feet at the thought of mutating the cache)
* storing all key+config+timings in a single cache file (convenient for
analysis, but also requires mutating the cache)

1 parent 658b5b2 commit 1f98379
1 file changed, +84 -27 lines changed