@@ -10,15 +10,17 @@ The DeepSeek-AI team provides FP8 safetensors for DeepSeek-R1/V3 models. We achi
 So those who are pursuing the best performance can use the FP8 linear kernel for DeepSeek-V3/R1.
 
 ## Key Features
-✅ Hybrid Precision Architecture (FP8 + GGML)
+
+✅ Hybrid Precision Architecture (FP8 + GGML)<br>
 ✅ Memory Optimization (~19GB VRAM usage)
 
 ## Quick Start
 ### Using Pre-Merged Weights
 
-Pre-merged weights are available on Hugging Face:
-[KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid](https://huggingface.co/KVCache-ai/DeepSeek-V3)
+Pre-merged weights are available on Hugging Face:<br>
+[KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid](https://huggingface.co/KVCache-ai/DeepSeek-V3)<br>
 [KVCache-ai/DeepSeek-R1-GGML-FP8-Hybrid](https://huggingface.co/KVCache-ai/DeepSeek-R1)
+
 > Please confirm the weights are fully uploaded before downloading. The large file size may extend Hugging Face upload time.
 
 
@@ -32,12 +34,12 @@ pip install -U huggingface_hub
 huggingface-cli download --resume-download KVCache-ai/DeepSeek-V3-GGML-FP8-Hybrid --local-dir <local_dir>
 ```
 ### Using merge scripts
-If you have local DeepSeek-R1/V3 fp8 safetensors and q4km gguf weights, you can merge them using the following scripts.
+If you have local DeepSeek-R1/V3 fp8 safetensors and gguf weights (e.g., q4km), you can merge them using the following scripts.
 
 ```shell
-python convert_model.py \
+python merge_tensors/merge_safetensor_gguf.py \
   --safetensor_path <fp8_safetensor_path> \
-  --gguf_path <q4km_gguf_folder_path> \
+  --gguf_path <gguf_folder_path> \
   --output_path <merged_output_path>
 ```
 
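The merge invocation in this hunk can be wrapped in a small pre-flight helper. The sketch below is illustrative only: the script path and the three flag names come from the diff, while `build_merge_command` is a hypothetical helper that is not part of the repository. It assumes both input paths must already exist on disk.

```python
from pathlib import Path

# Script path and flag names are taken from the diff above.
MERGE_SCRIPT = "merge_tensors/merge_safetensor_gguf.py"


def build_merge_command(safetensor_path, gguf_path, output_path):
    """Hypothetical helper: check the inputs exist, then return the argv list."""
    for label, p in (("safetensor_path", safetensor_path), ("gguf_path", gguf_path)):
        if not Path(p).exists():
            raise FileNotFoundError(f"{label} does not exist: {p}")
    return [
        "python", MERGE_SCRIPT,
        "--safetensor_path", str(safetensor_path),
        "--gguf_path", str(gguf_path),
        "--output_path", str(output_path),
    ]
```

The returned list could then be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting problems with paths that contain spaces.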
@@ -60,15 +62,15 @@ python ktransformers/local_chat.py \
 
 ## Notes
 
-⚠️ Hardware Requirements
+⚠️ Hardware Requirements<br>
 * Recommended minimum 19GB available VRAM for FP8 kernel.
 * Requires GPU with FP8 support (e.g., 4090)
 
 ⏳ First-Run Optimization
 JIT compilation causes longer initial execution (subsequent runs retain optimized speed).
 
-🔄 Temporary Interface
+🔄 Temporary Interface<br>
 Current weight loading implementation is provisional - will be refined in future versions
 
-📁 Path Specification
+📁 Path Specification<br>
 Despite hybrid quantization, merged weights are stored as .safetensors - pass the containing folder path to `--gguf_path`
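The hardware and path notes in this hunk lend themselves to a quick pre-flight check. The following is a minimal sketch, not part of the project: all three function names are hypothetical, the 19GB figure is taken from the notes (interpreted here as GiB), and the compute-capability threshold of 8.9 is an assumption based on the RTX 4090 example (Ada Lovelace reports capability 8.9).

```python
from __future__ import annotations

from pathlib import Path

# Assumption: FP8 support starts at CUDA compute capability 8.9
# (Ada Lovelace, e.g. RTX 4090); the README itself only says "e.g., 4090".
MIN_FP8_CAPABILITY = (8, 9)
# From the Notes: recommended minimum available VRAM for the FP8 kernel.
MIN_VRAM_GB = 19


def supports_fp8(capability: tuple[int, int]) -> bool:
    """True if a (major, minor) compute capability meets the assumed threshold."""
    return tuple(capability) >= MIN_FP8_CAPABILITY


def has_enough_vram(free_bytes: int, min_gb: int = MIN_VRAM_GB) -> bool:
    """True if the free VRAM (in bytes) is at least min_gb GiB."""
    return free_bytes >= min_gb * 1024**3


def is_merged_weights_folder(folder) -> bool:
    """True if folder is a directory holding at least one .safetensors file."""
    path = Path(folder)
    return path.is_dir() and any(path.glob("*.safetensors"))
```

With PyTorch installed, the inputs could come from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple, and `torch.cuda.mem_get_info()`, which returns `(free, total)` in bytes. The folder check mirrors the Path Specification note: `--gguf_path` expects the folder that contains the merged `.safetensors` files, not an individual file.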