Has anyone tried the Deepseek R1 models yet? #189

BrandonWeng · 2025-01-27T04:52:08Z

We're seeing some odd behavior: the model often skips its internal thinking process and the output quality is pretty bad, but when it works the quality is pretty good. Do these thinking models need any tweaks?

    static public let deepseek_r1_distill_qwen_1_5b_8bit = ModelConfiguration(
        id: "mlx-community/DeepSeek-R1-Distill-Qwen-1.5B-8bit"
    )
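One way to tell whether a given generation skipped the thinking step is to look for the `<think>` ... `</think>` block that R1-style models normally emit before the answer. The helper below is only a rough sketch for that check: `splitThinking` is a hypothetical name, not part of mlx-swift-examples, and it assumes the tags appear literally in the decoded output.

```swift
import Foundation

/// Splits raw model output into the reasoning inside <think>...</think>
/// and the text that follows the block.
/// `reasoning` is nil when no complete think block is found.
/// (Hypothetical helper for debugging, not a library API.)
func splitThinking(_ output: String) -> (reasoning: String?, answer: String) {
    guard let open = output.range(of: "<think>"),
          let close = output.range(of: "</think>",
                                   range: open.upperBound..<output.endIndex)
    else {
        // No complete block: treat the whole output as the answer.
        return (nil, output.trimmingCharacters(in: .whitespacesAndNewlines))
    }
    let reasoning = output[open.upperBound..<close.lowerBound]
        .trimmingCharacters(in: .whitespacesAndNewlines)
    let answer = output[close.upperBound...]
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return (reasoning, answer)
}

let sample = "<think>2 + 2 is 4.</think>The answer is 4."
let parts = splitThinking(sample)
print(parts.reasoning ?? "no thinking block")  // prints "2 + 2 is 4."
print(parts.answer)                            // prints "The answer is 4."
```

Logging how often `reasoning` comes back `nil` across a batch of prompts would give a rough measure of how frequently the thinking step is being skipped.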

Answered by awni


awni (Maintainer) · 2025-01-27T05:01:49Z

I don't think we've updated the swift-transformers and Jinja versions yet, so the chat template could still be missing. If you manually update your Swift Jinja package, that should fix it for now.

BrandonWeng (Author) · 2025-01-27T16:21:51Z

Thanks! Works great after bumping to 0.1.15. Appreciate the quick response!

For others:
ml-explore/mlx-swift-examples#183 (comment)
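
For anyone pinning the version by hand, a `Package.swift` along these lines is one way to do it. This is a sketch under assumptions: the thread does not say which package the 0.1.15 refers to, so the example assumes it is swift-transformers, and the package and target name `R1Demo` is a placeholder.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "R1Demo",  // placeholder name
    platforms: [.macOS(.v14), .iOS(.v16)],
    dependencies: [
        // Assumption: "0.1.15" in the reply above is the swift-transformers version.
        // `from:` accepts 0.1.15 and any later release below 1.0.0.
        .package(
            url: "https://github.com/huggingface/swift-transformers",
            from: "0.1.15"
        ),
    ],
    targets: [
        .executableTarget(
            name: "R1Demo",
            dependencies: [
                .product(name: "Transformers", package: "swift-transformers"),
            ]
        ),
    ]
)
```

In an Xcode project the equivalent is to update the package under Package Dependencies and confirm the resolved version in `Package.resolved`.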
