Commit 05922fc
Implement launch config infrastructure. (#804)
## Summary
This PR introduces two related pieces of launch-config infrastructure needed by
`cuda.coop` single-phase work:
1. A low-overhead launch-config API that can be consumed from arbitrary
   numba-cuda compilation stages (including rewrites), plus pre-launch callback
   registration on configured launches.
2. Launch-config-sensitive (LCS) plumbing so kernels that depend on launch
   configuration are specialized and cached correctly across launch configs.
## Background and motivation
`cuda.coop` single-phase rewriting needs compile-time access to launch
configuration details (grid/block/shared memory/launch args) and a way to
register pre-launch hooks from rewrite time (for launch-time kernel argument
handling without requiring `@cuda.jit(extensions=...)`).

An earlier implementation (PR #288) provided this via Python `contextvars`, but
review feedback showed launch overhead was too high. This branch reimplements
the mechanism through C-extension TLS plumbing in `_dispatcher.cpp`, with
negligible overhead in the launch micro-benchmark.
From `bench-launch-overhead.out` (us/launch; baseline vs contextvar vs v2, with
percentages relative to baseline):
- 0 args: `5.56` vs `7.29` (+31.1%) vs `5.56` (+0.0%)
- 1 arg: `7.53` vs `9.18` (+21.8%) vs `7.55` (+0.2%)
- 2 args: `8.90` vs `10.64` (+19.5%) vs `8.97` (+0.8%)
- 3 args: `10.31` vs `12.50` (+21.3%) vs `10.37` (+0.5%)
- 4 args: `11.82` vs `13.56` (+14.7%) vs `11.92` (+0.8%)
## What this PR adds
### 1) Launch-config API with low launch overhead
- The C extension (`numba_cuda/numba/cuda/cext/_dispatcher.cpp`) now carries the
  active launch config in thread-local storage only during compilation paths.
- Python API in `numba_cuda/numba/cuda/launchconfig.py`:
  - `current_launch_config()`
  - `ensure_current_launch_config()`
  - `capture_compile_config()`
- Configured launches expose:
  - launch metadata (`griddim`, `blockdim`, `sharedmem`, `args`, `dispatcher`)
  - `pre_launch_callbacks` for just-in-time launch-time hook registration.
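
The shape of this API can be illustrated with a pure-Python model. This is a sketch under stated assumptions, not the numba-cuda implementation: `threading.local` stands in for the C-level TLS slot, and `LaunchConfig`, `_compiling`, `rewrite_pass`, the function bodies, and the callback signature are all hypothetical. Only the names `current_launch_config`, `ensure_current_launch_config`, `pre_launch_callbacks`, and the metadata field names are taken from this PR.

```python
import threading
from contextlib import contextmanager
from dataclasses import dataclass, field

# Stand-in for the C-extension TLS slot (assumption: the real slot lives in
# _dispatcher.cpp and is not a Python-level object like this one).
_tls = threading.local()


@dataclass
class LaunchConfig:
    """Hypothetical model of the metadata a configured launch exposes."""
    griddim: tuple
    blockdim: tuple
    sharedmem: int = 0
    args: tuple = ()
    pre_launch_callbacks: list = field(default_factory=list)


def current_launch_config():
    """Return the active launch config, or None outside compilation."""
    return getattr(_tls, "config", None)


def ensure_current_launch_config():
    """Like current_launch_config(), but raise if none is active."""
    config = current_launch_config()
    if config is None:
        raise RuntimeError("no launch config is active on this thread")
    return config


@contextmanager
def _compiling(config):
    """Hypothetical helper: expose `config` only while compilation runs."""
    previous = current_launch_config()
    _tls.config = config
    try:
        yield config
    finally:
        _tls.config = previous


def rewrite_pass():
    """A rewrite reads the launch config and registers a pre-launch hook."""
    config = ensure_current_launch_config()
    config.pre_launch_callbacks.append(
        lambda cfg: f"prepared for blockdim={cfg.blockdim}"
    )


config = LaunchConfig(griddim=(1,), blockdim=(32,))
with _compiling(config):
    rewrite_pass()

# At launch time, the hooks registered during compilation are invoked.
results = [callback(config) for callback in config.pre_launch_callbacks]
```

Because the slot is populated only inside `_compiling`, a launch that hits an already-compiled kernel never touches it in this model, which mirrors the intent of carrying the config "only during compilation paths".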
### 2) Launch-config-sensitive compilation/caching
- Explicit LCS marker API on `_LaunchConfiguration`
  (`numba_cuda/numba/cuda/dispatcher.py`):
  - `mark_kernel_as_launch_config_sensitive()`
  - `get_kernel_launch_config_sensitive()`
  - `is_kernel_launch_config_sensitive()`
- CUDA backend (`numba_cuda/numba/cuda/compiler.py`) promotes that mark into
  compile metadata (`state.metadata["launch_config_sensitive"] = True`).
- Dispatcher/cache behavior for LCS kernels:
  - per-launch-config dispatcher specialization routing
  - per-launch-config disk-cache keys
  - `.lcs` marker file indicating launch-config-sensitive cache entries.
## Why the LCS piece is required
Without LCS, cache keys are signature-based only, so a kernel compiled once for
launch config A can be reused for launch config B without rerunning the rewrite.
That breaks launch-config-dependent rewrite behavior.

Concrete observed behavior:
- Runtime cache (single process):
  - Launch `[1, 32]`: rewrite runs, callback registered.
  - Launch `[1, 64]` without LCS: existing kernel reused, rewrite does not run,
    callback for the 64-config path is never registered.
  - With LCS marking: second launch recompiles under a distinct launch-config
    specialization, so rewrite/callback registration runs for 64.
- Disk cache (cross process):
  - Process 1 compiles and caches launch `[1, 32]`.
  - Process 2 launches `[1, 64]` without LCS: 32-config artifact can be reused
    from disk (no rewrite for 64 path).
  - With LCS marking: process 2 misses on the 64-specific cache key and compiles
    a 64-specific variant.
- LCS intentionally preserves exact cache hits for matching launch configs. It
  does not force recompilation when the launch-config key already matches.

So the LCS plumbing is what makes launch-config-dependent rewrite decisions
correct under both in-memory and disk cache reuse.
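
The runtime-cache scenario above can be reproduced with a toy model. The sketch below is hypothetical and much simpler than numba-cuda's dispatcher: `ToyDispatcher` and its cache-key layout are invented for illustration, and the LCS flag is passed up front, whereas in this PR the mark is set during compilation.

```python
class ToyDispatcher:
    """Hypothetical in-memory kernel cache; not numba-cuda's dispatcher."""

    def __init__(self, launch_config_sensitive):
        self.launch_config_sensitive = launch_config_sensitive
        self.cache = {}
        self.rewrite_runs = []  # launch configs for which the rewrite ran

    def _cache_key(self, signature, launch_config):
        # LCS kernels key on the launch config too; others on signature only.
        if self.launch_config_sensitive:
            return (signature, launch_config)
        return (signature,)

    def launch(self, signature, launch_config):
        key = self._cache_key(signature, launch_config)
        if key not in self.cache:
            # "Compile": the rewrite only ever runs on a cache miss.
            self.rewrite_runs.append(launch_config)
            self.cache[key] = f"kernel for {launch_config}"
        return self.cache[key]


signature = ("float32[:]",)
config_32 = ((1,), (32,))
config_64 = ((1,), (64,))

plain = ToyDispatcher(launch_config_sensitive=False)
plain.launch(signature, config_32)
plain.launch(signature, config_64)  # reuses the 32-config kernel

lcs = ToyDispatcher(launch_config_sensitive=True)
lcs.launch(signature, config_32)
lcs.launch(signature, config_64)  # distinct specialization: rewrite runs
lcs.launch(signature, config_64)  # exact match: cache hit, no recompile
```

In this model `plain.rewrite_runs` contains only the 32 config, while `lcs.rewrite_runs` contains both, and the repeated 64 launch adds no third entry.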
Scope note for `cuda.coop` today:
- `cuda.coop` frequently injects LTO-IR/linking files during compilation.
- numba-cuda currently does not disk-cache kernels with linking files, so for
  those paths the immediate LCS correctness benefit is runtime/in-memory cache
  behavior across launch configs.
- Disk-cache LCS behavior applies to launch-config-sensitive kernels that are
  otherwise disk-cacheable (and remains relevant for future linked-code cache
  support).
## Safety behavior
- If an LCS kernel is loaded from disk but the `.lcs` marker is missing, we
  treat that cache state as unsafe, force recompile, and re-mark.
- If marking fails (e.g. filesystem error), disk caching is disabled for safety
  (fallback to `NullCache`) to avoid unsafe reuse.
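
A minimal sketch of these two rules, assuming a hypothetical on-disk layout in which each cache entry `<key>.bin` has a sibling `<key>.lcs` marker. The function `load_lcs_kernel`, the file names, and the boolean "disk cache enabled" return value are illustrative only; the real cache classes and the `NullCache` fallback are not modeled here.

```python
import os
import tempfile


def load_lcs_kernel(cache_dir, key, compile_fn):
    """Hypothetical loader; returns (kernel, disk_cache_enabled)."""
    entry = os.path.join(cache_dir, key + ".bin")
    marker = os.path.join(cache_dir, key + ".lcs")

    if os.path.exists(entry) and os.path.exists(marker):
        with open(entry) as f:
            return f.read(), True  # entry and marker present: safe hit

    # No entry, or an entry without its marker: treat as unsafe, recompile.
    kernel = compile_fn()
    try:
        with open(entry, "w") as f:
            f.write(kernel)
        with open(marker, "w"):
            pass  # re-mark the entry as launch-config-sensitive
    except OSError:
        # Marking failed: report disk caching as disabled for safety.
        return kernel, False
    return kernel, True


compiles = []


def compile_fn():
    compiles.append(1)
    return "kernel-for-64"


with tempfile.TemporaryDirectory() as cache_dir:
    first = load_lcs_kernel(cache_dir, "k64", compile_fn)   # miss: compile
    second = load_lcs_kernel(cache_dir, "k64", compile_fn)  # safe hit
    os.remove(os.path.join(cache_dir, "k64.lcs"))           # marker lost
    third = load_lcs_kernel(cache_dir, "k64", compile_fn)   # recompile, re-mark
    marker_restored = os.path.exists(os.path.join(cache_dir, "k64.lcs"))
    # Writing into a directory that does not exist raises OSError.
    failed = load_lcs_kernel(
        os.path.join(cache_dir, "missing"), "k64", compile_fn
    )
```

The entry is written before the marker, so a failure between the two writes leaves an unmarked entry, which the next load treats as unsafe rather than reusing.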
## Out of scope
- Cache invalidation keyed on `numba_cuda.__version__` (handled by PR #800).

Note that PR #800 should be merged, and presumably a release cut, before this
PR is merged; that allows downstream projects like `cuda.coop` to pin
accordingly.

1 parent de4642c commit 05922fc
File tree: 11 files changed (+1135, -19 lines)
- docs/source/reference
- numba_cuda/numba/cuda
  - cext
  - tests/cudapy