Commit f27cd40
Enable faster prompt processing with mainline llama.cpp GGUFs (#409)
* Enable MLA-3 in crippled GGUFs: WIP
* Enable MLA-3 in crippled GGUFs: seems to work
* Add newly created tensors to model.tensors_by_name; otherwise they don't get run-time repacked (see the sketch below).
---------
Co-authored-by: Iwan Kawrakow <[email protected]>

1 parent 465569d
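The last bullet is the substantive one: the commit creates extra tensors at model-load time (presumably the MLA attention projections that mainline llama.cpp GGUFs don't carry), and the run-time repacking pass only visits tensors registered in model.tensors_by_name, so anything created after the initial load has to be added to that list explicitly. The sketch below illustrates the pattern with hypothetical stand-in types (`tensor`, `model`, `create_mla_tensors`, `repack_all`) and illustrative tensor names; it is not the code from this diff.

```cpp
// Minimal stand-in sketch (not the actual ik_llama.cpp code): shows why tensors
// created at load time must be registered in tensors_by_name, since a later
// run-time repacking pass walks that list and touches nothing else.
#include <cstdio>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct tensor {                 // hypothetical stand-in for ggml_tensor
    std::string name;
    bool repacked = false;
};

struct model {                  // hypothetical stand-in for the model struct
    std::vector<std::unique_ptr<tensor>> owned;
    // mirrors the (name, tensor*) list the repacking pass iterates over
    std::vector<std::pair<std::string, tensor*>> tensors_by_name;

    tensor* create(const std::string& name) {
        owned.push_back(std::make_unique<tensor>());
        owned.back()->name = name;
        return owned.back().get();
    }
};

// Load-time step: derive per-head K/V projection tensors that the GGUF lacks.
// The attn_k_b / attn_v_b names are illustrative, not taken from the diff.
void create_mla_tensors(model& m, int layer) {
    tensor* wk_b = m.create("blk." + std::to_string(layer) + ".attn_k_b.weight");
    tensor* wv_b = m.create("blk." + std::to_string(layer) + ".attn_v_b.weight");
    // The fix from this commit, in spirit: without these two registrations the
    // new tensors exist but the repacking pass below never sees them.
    m.tensors_by_name.emplace_back(wk_b->name, wk_b);
    m.tensors_by_name.emplace_back(wv_b->name, wv_b);
}

// Run-time repacking pass: only tensors registered by name get repacked.
void repack_all(model& m) {
    for (auto& [name, t] : m.tensors_by_name) {
        t->repacked = true;
        std::printf("repacked %s\n", name.c_str());
    }
}

int main() {
    model m;
    create_mla_tensors(m, 0);
    repack_all(m);              // prints both newly created tensors
}
```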
3 files changed: +294 −140 lines changed