Commit ed78ae3
Update on "[ExecuTorch][BE] Split kv cache and SDPA for better code sharing"
Summary:
Why?
We have coupled SDPA with the kv cache for a while. Initially this was
done because we implemented the sdpa_with_kv_cache custom op to reduce
the overhead of multiple copies during the kv cache update. (The same
could have been achieved with a separate custom kv cache update op and a
custom sdpa op; recent changes made this possible.)
Because the SDPA module owns the kv cache, we get a) a non-composable
implementation and b) difficulty reusing model definitions and components
from repos like tune. As a result, there are multiple definitions of the
same model, llama, lying around in ET, TorchChat and Tune. This diff and
subsequent ones move toward a design where the custom kv cache and custom
sdpa are decoupled and composable, making them easier to module-swap into
tune's model definition.
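To make "module-swap friendly" concrete, here is a minimal sketch of a module swap in plain PyTorch. `swap_modules`, `EagerSDPA`, `FusedSDPA` and `Block` are hypothetical names used only for illustration; they are not ExecuTorch or torchtune APIs. The point is that once SDPA no longer owns the kv cache, its implementation can be replaced by any module with the same interface without touching the cache.

```python
from typing import Callable, Type

import torch
import torch.nn.functional as F
from torch import nn


def swap_modules(
    model: nn.Module,
    target: Type[nn.Module],
    make_replacement: Callable[[nn.Module], nn.Module],
) -> nn.Module:
    """Hypothetical helper: replace every submodule of type `target`, in place."""
    for name, child in model.named_children():
        if isinstance(child, target):
            # setattr re-registers the new submodule under the same name.
            setattr(model, name, make_replacement(child))
        else:
            swap_modules(child, target, make_replacement)
    return model


class EagerSDPA(nn.Module):
    """Reference attention on [B, n_heads, seq_len, head_dim] tensors."""

    def forward(self, q, k, v):
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        return torch.softmax(scores, dim=-1) @ v


class FusedSDPA(nn.Module):
    """Drop-in replacement with the same interface, backed by PyTorch's SDPA."""

    def forward(self, q, k, v):
        return F.scaled_dot_product_attention(q, k, v)


class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.sdpa = EagerSDPA()  # owns no cache, so it is freely swappable

    def forward(self, q, k, v):
        return self.sdpa(q, k, v)


model = nn.ModuleList([Block(), Block()])
swap_modules(model, EagerSDPA, lambda old: FusedSDPA())
```

The swap only works this cleanly because both modules agree on one tensor layout and neither carries extra state.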
How?
Earlier PRs decoupled the kv cache update from sdpa. This diff now:
1. Decouples the SDPA nn.Module from the KV cache.
2. Standardizes the KVCache and SDPA interfaces: both operate on q, k, v
tensors in [B, # heads, seq_len, head_dim] format.
3. Because of 2, multiple transposes are introduced when KVCache and SDPA
are replaced by custom modules; we will write a graph pass to undo
them.
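A minimal sketch of the decoupled design in plain PyTorch, assuming the [B, n_heads, seq_len, head_dim] layout from step 2. `KVCache`, `SDPA` and `attend` here are simplified, hypothetical stand-ins, not the ExecuTorch classes or their signatures: the cache only writes new k/v at `input_pos` and returns the full buffers, and SDPA only computes attention, so either can be replaced independently.

```python
import torch
import torch.nn.functional as F
from torch import nn


class KVCache(nn.Module):
    """Hypothetical cache; k/v stored as [B, n_heads, max_seq_len, head_dim]."""

    def __init__(self, batch: int, n_heads: int, max_seq_len: int, head_dim: int):
        super().__init__()
        shape = (batch, n_heads, max_seq_len, head_dim)
        self.register_buffer("k_cache", torch.zeros(shape))
        self.register_buffer("v_cache", torch.zeros(shape))

    def update(self, input_pos: torch.Tensor, k: torch.Tensor, v: torch.Tensor):
        # k, v: [B, n_heads, seq_len, head_dim]; input_pos: [seq_len] int64.
        # Write the new entries along the sequence dim (dim 2), in place.
        self.k_cache.index_copy_(2, input_pos, k)
        self.v_cache.index_copy_(2, input_pos, v)
        return self.k_cache, self.v_cache


class SDPA(nn.Module):
    """Hypothetical attention module: no cache state, same tensor layout."""

    def forward(self, q, k, v, mask):
        # q: [B, n_heads, seq_len, head_dim]; k, v: [B, n_heads, max_seq_len, head_dim]
        return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)


def attend(cache: KVCache, sdpa: SDPA, q, k, v, input_pos, causal_mask):
    """Compose the two modules; either one can be swapped independently."""
    k_all, v_all = cache.update(input_pos, k, v)
    # Rows of a [max_seq_len, max_seq_len] boolean tril mask for the current
    # positions (True = may attend), so unfilled cache slots are masked out.
    mask = causal_mask[input_pos]
    return sdpa(q, k_all, v_all, mask)


torch.manual_seed(0)
B, H, MAX_S, D = 1, 2, 8, 4
cache, sdpa = KVCache(B, H, MAX_S, D), SDPA()
causal = torch.tril(torch.ones(MAX_S, MAX_S, dtype=torch.bool))
q, k, v = (torch.randn(B, H, 5, D) for _ in range(3))

# Prefill positions 0..3, then decode position 4 as a single token.
out_prefill = attend(cache, sdpa, q[:, :, :4], k[:, :, :4], v[:, :, :4], torch.arange(4), causal)
out_decode = attend(cache, sdpa, q[:, :, 4:], k[:, :, 4:], v[:, :, 4:], torch.tensor([4]), causal)
```

Prefill followed by one decode step should match ordinary causal attention over all five positions, which is a quick way to check that the split preserves behavior.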
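The graph pass mentioned in step 3 could, in its simplest form, cancel back-to-back transposes that swap the same pair of dimensions, since applying such a transpose twice is the identity. The sketch below is a hypothetical `torch.fx` pass over a `symbolic_trace`d module, where `t.transpose(a, b)` appears as a `call_method` node. An exported ExecuTorch graph uses ATen ops instead, so a real pass would match different node targets and would likely also need to normalize negative dims and handle `permute`.

```python
import torch
import torch.fx as fx


def _is_transpose(node: fx.Node) -> bool:
    # symbolic_trace records `t.transpose(d0, d1)` as a call_method node.
    return node.op == "call_method" and node.target == "transpose"


def _dims(node: fx.Node):
    # Only the positional form t.transpose(d0, d1) is handled; (1, 2) and
    # (2, 1) describe the same swap, so compare them as a set.
    if len(node.args) != 3 or node.kwargs:
        return None
    return frozenset(node.args[1:])


def cancel_back_to_back_transposes(gm: fx.GraphModule) -> fx.GraphModule:
    """Hypothetical pass: remove x.transpose(a, b).transpose(a, b) pairs."""
    changed = True
    while changed:
        changed = False
        for node in list(gm.graph.nodes):
            if not _is_transpose(node):
                continue
            inner = node.args[0]
            if not (isinstance(inner, fx.Node) and _is_transpose(inner)):
                continue
            dims = _dims(node)
            if dims is None or dims != _dims(inner):
                continue
            # Consumers of the outer transpose now read the original tensor.
            node.replace_all_uses_with(inner.args[0])
            gm.graph.erase_node(node)
            if not inner.users:  # inner may still feed other nodes
                gm.graph.erase_node(inner)
            changed = True
            break
    gm.graph.lint()
    gm.recompile()
    return gm


class _Demo(torch.nn.Module):
    def forward(self, x):
        y = x.transpose(1, 2)           # switch layout for one module...
        return y.transpose(1, 2) * 2.0  # ...and straight back: an identity pair


class _Mixed(torch.nn.Module):
    def forward(self, x):
        return x.transpose(1, 2).transpose(2, 3)  # different dims: kept


demo = cancel_back_to_back_transposes(fx.symbolic_trace(_Demo()))
mixed = cancel_back_to_back_transposes(fx.symbolic_trace(_Mixed()))
```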
Test Plan:
Existing tests.
Make sure perf doesn't regress.
Differential Revision: [D67914054](https://our.internmc.facebook.com/intern/diff/D67914054)
[ghstack-poisoned]
Files changed: 2 (+16, -5 lines), in examples/models/llama and its source_transformation subdirectory.
Diff hunks (line contents not preserved):
| File | Lines added (new numbering) | Lines replaced (original → new) |
|---|---|---|
| First file | 289, 297, 376 | 406 → 409 |
| Second file | 22, 26, 84, 90, 138, 143, 225, 231 | 66 → 68, 174 → 180, 187 → 193, 255 → 263 |