Commit a05b025

opencl: update docs

1 parent 82ab059 commit a05b025

File tree: 1 file changed (+8, -3)

docs/backend/OPENCL.md

Lines changed: 8 additions & 3 deletions
```diff
@@ -60,11 +60,16 @@ Currently we support `Q4_0` quantization and have optimized for it. To achieve b
 
 Since `Q6_K` is also supported, `Q4_0` quantization without `--pure` will also work. However, the performance will be worse compared to pure `Q4_0` quantization.
 
-### MXFP4 Models
+### `MXFP4` MoE Models
 
-OpenAI gpt-oss models are in MXFP4. The quantized model will be in MXFP4_MOE, a mixture of MXFP4 and Q8_0.
+OpenAI gpt-oss models are MoE models in `MXFP4`. The quantized model will be in `MXFP4_MOE`, a mixture of `MXFP4` and `Q8_0`.
 For this quantization, there is no need to specify `--pure`.
-For the gpt-oss-20b model, you can directly download a quantized GGUF file in MXFP4 from Hugging Face.
+For the gpt-oss-20b model, you can directly [download](https://huggingface.co/ggml-org/gpt-oss-20b-GGUF) the quantized GGUF file in `MXFP4_MOE` from Hugging Face.
+
+Although it is possible to quantize the gpt-oss-20b model in pure `Q4_0`, it is not recommended since `MXFP4` has been optimized for MoE while `Q4_0` is not.
+Hence, using the default `MXFP4_MOE` quantization will give better performance compared to pure `Q4_0` quantization for this model.
+
+However, note that the `Q4_0` model found [here](https://huggingface.co/unsloth/gpt-oss-20b-GGUF/blob/main/gpt-oss-20b-Q4_0.gguf) is a mixture of `Q4_0`, `Q8_0` and `MXFP4` and gives better performance than `MXFP4_MOE` quantization.
 
 ## CMake Options
 
```
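For context, the workflow the updated docs describe might be sketched as below. This is a non-authoritative sketch: the file names are illustrative, and it assumes a llama.cpp build whose `llama-quantize` tool accepts the `Q4_0` and `MXFP4_MOE` type names along with the `--pure` flag discussed above.

```shell
# Sketch only: paths and the availability of the MXFP4_MOE type in your
# llama.cpp build are assumptions, not verified here.

# Pure Q4_0 quantization (--pure forces Q4_0 for all quantized tensors):
./llama-quantize --pure gpt-oss-20b-f16.gguf gpt-oss-20b-Q4_0.gguf Q4_0

# Default MXFP4_MOE quantization for gpt-oss models (no --pure needed):
./llama-quantize gpt-oss-20b-f16.gguf gpt-oss-20b-MXFP4.gguf MXFP4_MOE
```

Per the diff above, the `MXFP4_MOE` output is the recommended choice for this model on the OpenCL backend.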