Conversation

ngxson (Collaborator) commented Jan 20, 2025

ngxson requested a review from ggerganov January 20, 2025 13:14
github-actions bot added the python (python script changes) label Jan 20, 2025
ngxson merged commit ec7f3ac into ggml-org:master Jan 20, 2025
48 checks passed
ngxson (Collaborator, Author) commented Jan 20, 2025

cc @bartowski1182, you can now make GGUF quants :D
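
For anyone who wants to make their own quants, a rough sketch of the usual workflow with llama.cpp's bundled tools (the model directory name and the build path below are placeholders, not taken from this PR):

```sh
# Convert the Hugging Face checkpoint to GGUF (F16 as an intermediate).
python convert_hf_to_gguf.py ./DeepSeek-R1-Distill-Qwen-14B \
    --outfile DeepSeek-R1-Distill-Qwen-14B-F16.gguf --outtype f16

# Quantize the F16 GGUF down to a smaller type such as Q4_K_M.
./build/bin/llama-quantize DeepSeek-R1-Distill-Qwen-14B-F16.gguf \
    DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf Q4_K_M
```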

prusnak (Contributor) commented Jan 20, 2025

Are similar changes needed to support DeepSeek-R1-Distill-Llama-*, or is no change needed?

ngxson (Collaborator, Author) commented Jan 20, 2025

@prusnak I don't have time to try, but there are already many GGUFs for that model on the HF hub. Can you try?

prusnak (Contributor) commented Jan 20, 2025

> @prusnak I don't have time to try, but there are already many GGUFs for that model on the HF hub. Can you try?

I just tried DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf from https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF and it works on current master. 👍
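
For reference, one way to reproduce that quick check (repo and file names taken from the comment above; huggingface-cli is assumed to be installed via huggingface_hub):

```sh
# Download the quantized model from the Hugging Face repo linked above.
huggingface-cli download unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF \
    DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf --local-dir .

# Run a short generation with llama-cli to confirm the model loads and responds.
./build/bin/llama-cli -m DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf \
    -p "Hello, who are you?" -n 64
```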

Animaxx added a commit to Animaxx/llama.cpp that referenced this pull request Jan 20, 2025
wakamex commented Jan 20, 2025

my llama-server hangs with @bartowski1182's distill, while llama-cli works fine
Edit: false alarm, I had a broken build

./build/bin/llama-server -m DeepSeek-R1-Distill-Qwen-14B-Q6_K_L.gguf --port 8083 -v
...
srv  add_waiting_: add task 2 to waiting list. current waiting = 0 (before add)
que          post: new task, id = 2/1, front = 0
que    start_loop: processing task, id = 2
slot get_availabl: id  0 | task 0 | selected slot by lru, t_last = 338721759369
slot        reset: id  0 | task 0 | 
slot launch_slot_: id  0 | task 2 | launching slot : {"id":0,"id_task":2,"n_ctx":4096,"speculative":false,"is_processing":false,"non_causal":false,"params":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":4096,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"t_max_predict_ms":-1,"n_indent":0,"response_fields":[],"stream":true,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"","samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":5,"speculative.p_min":0.8999999761581421,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<|begin▁of▁sentence|>You are a helpful assistant.\n\n<|User|>hi<|Assistant|>","next_token":{"has_next_token":true,"has_new_line":false,"n_remain":-1,"n_decoded":0,"stopping_word":""}}
slot launch_slot_: id  0 | task 2 | processing task
srv  cancel_tasks: cancel task, id_task = 2
srv  remove_waiti: remove task 2 from waiting list. current waiting = 1 (before remove)
que          post: new task, id = 3/1, front = 1
request: POST /v1/chat/completions 127.0.0.1 200
request:  {"messages":[{"role":"system","content":"You are a helpful assistant."},{"id":1737405941029,"role":"user","content":"hi"}],"stream":true,"cache_prompt":true,"samplers":"edkypmxt","temperature":0.8,"dynatemp_range":0,"dynatemp_exponent":1,"top_k":40,"top_p":0.95,"min_p":0.05,"typical_p":1,"xtc_probability":0,"xtc_threshold":0.1,"repeat_last_n":64,"repeat_penalty":1,"presence_penalty":0,"frequency_penalty":0,"dry_multiplier":0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":-1,"max_tokens":-1,"timings_per_token":false}
response: 
srv  remove_waiti: remove task 2 from waiting list. current waiting = 0 (before remove)
que    start_loop: processing task, id = 3
slot      release: id  0 | task 2 | stop processing: n_past = 0, truncated = 0
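
The request shown in the log is an OpenAI-compatible chat completion call; a curl sketch like the one below reproduces it against the server started above, assuming the default host and the port 8083 from that command:

```sh
# Minimal chat completion request against llama-server's OpenAI-compatible endpoint.
curl http://127.0.0.1:8083/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "hi"}
        ],
        "stream": false
      }'
```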

bartowski1182 (Contributor) commented
I saw a similar (though reversed) issue with lmstudio, where the model sends one response and then crashes in the chat, but the server works fine 🤔

anagri pushed a commit to BodhiSearch/llama.cpp that referenced this pull request Jan 26, 2025

* llama : add support for Deepseek-R1-Qwen distill model
* coding style

tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025

* llama : add support for Deepseek-R1-Qwen distill model
* coding style

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025

* llama : add support for Deepseek-R1-Qwen distill model
* coding style

mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025

* llama : add support for Deepseek-R1-Qwen distill model
* coding style