Local

Models in this category


  • deepseek-r1-distill-qwen-1.5b

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B for local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to e... (A minimal sketch of running one of these ONNX models locally appears after this list.)

  • deepseek-r1-distill-qwen-1.5b-cuda-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on CUDA GPUs. This model uses RTN (round-to-nearest) quantization; a brief illustration of the scheme appears after this list.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-...

  • deepseek-r1-distill-qwen-1.5b-generic-cpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Di...

  • deepseek-r1-distill-qwen-1.5b-generic-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-1.5B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Di...

  • deepseek-r1-distill-qwen-7b

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference. Optimized models are published here in ONNX format to run on CPU, GPU, and NPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited ...

  • deepseek-r1-distill-qwen-7b-cuda-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CUDA GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1...

  • deepseek-r1-distill-qwen-7b-generic-cpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on CPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Dist...

  • deepseek-r1-distill-qwen-7b-generic-gpu

    This model is an optimized version of DeepSeek-R1-Distill-Qwen-7B to enable local inference on GPUs. This model uses RTN quantization.

Model Description

  • Developed by: Microsoft

  • Model type: ONNX

  • License: MIT

  • Model Description: This is a conversion of the DeepSeek-R1-Dist...
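
Running one of these models locally

The entries above are published as ONNX models for local inference, but this page does not show how to load them. As a minimal sketch, assuming the onnxruntime-genai Python package (not named on this page) and a hypothetical local path to one of the downloaded variants, generation could look roughly like this:

```python
# Minimal sketch: run a downloaded ONNX variant with onnxruntime-genai.
# The model directory, prompt, and search options are placeholders.
import onnxruntime_genai as og

model_dir = "./deepseek-r1-distill-qwen-1.5b-generic-cpu"  # hypothetical local path

model = og.Model(model_dir)                # load the ONNX model for this device
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()         # incremental detokenizer for streamed output

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)  # cap on prompt + completion tokens

prompt = "What is RTN quantization?"       # plain prompt; a real app would apply the model's chat template
generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Generate and print tokens as they are produced.
while not generator.is_done():
    generator.generate_next_token()
    token = generator.get_next_tokens()[0]
    print(stream.decode(token), end="", flush=True)
print()
```

Pick the variant that matches your hardware; per the descriptions above, the -generic-cpu builds target CPUs, -generic-gpu builds target GPUs, and -cuda-gpu builds target CUDA GPUs.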
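
About RTN quantization

Several entries above note that the model "uses RTN quantization". As a rough illustration only (not the exact recipe used for these models), round-to-nearest quantization scales each block of weights onto a low-bit integer grid and rounds, with no calibration data; the bit width and block size below are assumptions:

```python
# Illustrative round-to-nearest (RTN) weight quantization in NumPy.
# Bit width and block size are arbitrary choices for the example.
import numpy as np

def rtn_quantize(weights: np.ndarray, bits: int = 4, block_size: int = 32):
    """Block-wise RTN: scale each block to a signed integer grid and round
    to the nearest level. No calibration data is involved."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 7 for 4-bit signed
    w = weights.reshape(-1, block_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                   # guard against all-zero blocks
    q = np.clip(np.round(w / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def rtn_dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate float weights from the quantized blocks."""
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = rtn_quantize(w)
w_hat = rtn_dequantize(q, s)
print("max abs reconstruction error:", float(np.abs(w - w_hat).max()))
```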