models: Local documentation
This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B for local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, Windows, Linux, and Mac desktops, and mobile CPUs, with the precision best suited to e...
deepseek-r1-distill-qwen-1.5b-cuda-gpu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-...

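The entries on this page all state that they use RTN quantization. As a rough illustration of what that means, the sketch below implements round-to-nearest (RTN) quantization of a weight tensor to 4-bit integer levels with a single per-tensor scale. This is a simplified, hypothetical example, not the exact recipe used for these models, which may use different bit widths, per-block scales, or asymmetric grids:

```python
import numpy as np

def rtn_quantize(w, bits=4):
    """Round-to-nearest (RTN) symmetric quantization of a weight tensor.

    Each value is snapped to the nearest level on a uniform integer grid.
    No calibration data is needed, which is what makes RTN cheap.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit
    scale = np.max(np.abs(w)) / qmax      # per-tensor scale (illustrative;
                                          # real recipes often use per-block scales)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def rtn_dequantize(q, scale):
    # Recover approximate float weights for inference.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = rtn_quantize(w)
w_hat = rtn_dequantize(q, scale)
```

Because every value lands on the nearest grid point, the per-element reconstruction error is bounded by half the scale; the trade-off versus calibration-based methods is that outlier weights inflate the scale and coarsen the grid for everything else.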
deepseek-r1-distill-qwen-1.5b-generic-cpu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Di...

deepseek-r1-distill-qwen-1.5b-generic-gpu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Di...

This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, Windows, Linux, and Mac desktops, and mobile CPUs, with the precision best suited ...
deepseek-r1-distill-qwen-7b-cuda-gpu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CUDA GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1...

deepseek-r1-distill-qwen-7b-generic-cpu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Dist...

deepseek-r1-distill-qwen-7b-generic-gpu
This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on GPUs. This model uses RTN quantization.
- Developed by: Microsoft
- Model type: ONNX
- License: MIT
- Model Description: This is a conversion of the DeepSeek-R1-Dist...