Commit ea85e27
Increase performance for Gemma3n models on NVGPUs by enabling CUDA Graph execution (ollama#11525)
* Enable CUDA Graphs for gemma3n.
Similar to
ggml-org/llama.cpp#14741,
though ollama has a slightly different model graph
than llama.cpp which requires different workaround
checks.
* Remove residual check by reshaping differently in gemma3n model
This should make the heuristics more robust.

1 parent c116a75 commit ea85e27
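
The diff itself is not reproduced on this page, so the following is only a minimal sketch of the kind of heuristic the commit message refers to, not the code from this commit. It assumes the backend decides whether a compute graph can be captured once and replayed by scanning its nodes, and that an add whose operand has a batch dimension greater than 1 is treated as a sign of a varying-shape (prompt-processing) graph unless the node is explicitly exempted by name. `Node`, `Op`, `graph_is_capturable`, and the node names are all hypothetical and do not correspond to real ggml identifiers.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a compute-graph node; not the real ggml tensor type.
enum class Op { Add, MulMat, Other };

struct Node {
    Op op;
    std::string name;
    int64_t batch;  // assumed: batch dimension of the node's second operand
};

// Returns true if the graph looks safe to capture once and replay.
// Assumption: an Add with batch > 1 indicates shapes that change between
// evaluations, unless the node name is on an allow-list of nodes known to
// have that shape even during single-token decoding.
bool graph_is_capturable(const std::vector<Node>& graph,
                         const std::unordered_set<std::string>& allow) {
    for (const Node& n : graph) {
        if (n.op == Op::Add && n.batch > 1 && allow.count(n.name) == 0) {
            return false;  // fall back to regular kernel launches
        }
    }
    return true;
}
```

Under these assumptions, every name on the allow-list is a special case the check has to know about. Reshaping a tensor in the model so that it no longer trips the check removes one such entry, which is one plausible reading of why the second bullet calls the heuristics "more robust".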
File tree
5 files changed, +67 -10 lines changed
- llama/patches
- ml/backend/ggml/ggml/src/ggml-cuda
- model/models/gemma3n