-
The config.json is the same (same architecture, same config), so ik_llama.cpp will behave the same, apart from the updated weights, which of course affect the output. This is just another finetune. There are cases where a finetune does change the config (see Qwen, where the base supports 128K context but the instruct tunes ship with only 32K and the recommendation: "To handle extensive inputs exceeding 32,768 tokens, we utilize YaRN, a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts."), but this is not one of those cases, and even in that case the finetune did not change the architecture (which is what matters for conversion), just the config.
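For reference, in the Qwen case the YaRN switch is purely a config.json edit, not an architecture change. The rope_scaling block their docs describe looks roughly like the sketch below (values recalled from the Qwen model card, not from this DeepSeek release, so double-check before relying on them):

```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```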
For the first point, the linked imatrix will work, but I do not recommend it: even though that imatrix was generated on the same model type and therefore applies, the model weights are different and that affects the imatrix data. (Edit: the mradermacher team is already working on quanting and imatrixing that model.) For the second point, those weights were present in the other releases such as V3, V3-Base, and R1, and the conversion simply does not include them, since neither llama.cpp nor ik_llama.cpp supports MTP. It is a similar situation to what happened with the MLA tensors: once support was added, the conversion script was updated to include them, which required reconverting.
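If someone does want an imatrix computed on the new weights, the usual flow with llama.cpp's imatrix tool looks roughly like this (paths and calibration file are placeholders, and ik_llama.cpp's binary name or flags may differ slightly from mainline):

```bash
# Sketch only: compute an importance matrix on the new weights.
#   -m : a high-precision GGUF of DeepSeek-V3-0324 (hypothetical path)
#   -f : calibration text to run through the model
#   -o : output file for the resulting imatrix
./build/bin/llama-imatrix \
  -m /models/DeepSeek-V3-0324-BF16.gguf \
  -f calibration.txt \
  -o deepseek-v3-0324.imatrix
```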
I'm curious, and will have to make room for it on my server. I know this is slightly off topic, but I'd be curious to hear your experience with this (and any of the other DeepSeek models you've tried).
-
**Important:** To calculate the imatrix, please do not use any of the […]

As @saood06 pointed out, this has been superseded by #259. The additional 2 tensors needed for MLA […]
-
Just saw this: "In our web and application environments, the temperature parameter […]"
-
Is this something you have looked into? I think even a basic implementation should offer a 50% improvement. There is also jukofyork, who is making draft models (see here) that can be used with llama.cpp's existing generic drafting implementation; I'm watching that to see how much performance uplift people end up reporting on it.
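For context, mainline llama.cpp's generic drafting can be tried with a small draft model roughly as sketched below (model paths are placeholders, and the draft-count flag has been renamed across versions, so treat this as an outline rather than exact syntax for this fork):

```bash
# Sketch only: speculative decoding with a separate draft model.
#   -m  : the large target model (hypothetical path)
#   -md : the small draft model that proposes tokens for the target to verify
./build/bin/llama-speculative \
  -m /models/DeepSeek-V3-0324-Q4_K_M.gguf \
  -md /models/draft-model-Q8_0.gguf \
  -p "Write a quicksort function in Python." \
  --draft 16
```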
-
The imatrix computation that gave these final perplexity values is useless. It means mainline is not working with […]
-
I saw a new model today, deepseek-ai/DeepSeek-V3-0324, that may run on this fork?
Zero pressure for anyone to spend time on this, just experimenting to satisfy my curiosity.
I figure I might as well download it and see if it magically "just works" using my existing R1 custom quant procedure.
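(For anyone unfamiliar, a generic convert-then-quantize flow for a procedure like this looks roughly as follows; paths, output type, and quant type are placeholders, and the custom per-tensor recipe is not shown:)

```bash
# Sketch only: HF -> GGUF conversion, then quantization with an imatrix.
python convert_hf_to_gguf.py /models/DeepSeek-V3-0324 \
  --outfile /models/DeepSeek-V3-0324-BF16.gguf \
  --outtype bf16

# Quantize using a previously computed imatrix (hypothetical filenames).
./build/bin/llama-quantize \
  --imatrix deepseek-v3-0324.imatrix \
  /models/DeepSeek-V3-0324-BF16.gguf \
  /models/DeepSeek-V3-0324-Q4_K_M.gguf \
  Q4_K_M
```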
The main two issues I imagine might crop up without knowing anything:

1. Whether an existing imatrix (e.g. one computed for R1 or the original V3) can be reused on this finetune.
2. The extra weights included in this release, which the conversion script may not know what to do with.
Well, I'll update this discussion after it finishes downloading and I give it the old college try haha...
Curious if anyone else has any luck and if this new model is "better" at coding like some are speculating over on r/LocalLlama... Who knows!