Has anyone tried the DeepSeek R1 models yet? #189
-
We're seeing some weird behavior: the model often skips its internal thinking process and the output quality is pretty bad, but when it works the quality is pretty good. Do these thinking models need any tweaks? For reference, our configuration:
static public let deepseek_r1_distill_qwen_1_5b_8bit = ModelConfiguration(
    id: "mlx-community/DeepSeek-R1-Distill-Qwen-1.5B-8bit"
)
Replies: 2 comments
-
I don't think we've updated the swift-transformers and Jinja versions yet, so the chat template could still be missing. If you manually update your Swift Jinja package, that should fix it for now.
-
Thanks! Works great after bumping to 0.1.15. Appreciate the quick response! For others: