You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation.
+You can refer to the general [llama-quantize tool](tools/quantize/README.md) for steps to convert a model in Hugging Face safetensors format to GGUF with quantization.

-Currently we support `Q4_0` quantization and have optimize for it. To achieve best performance on Adreno GPU, add `--pure` to `llama-quantize`. For example,
+Currently we support `Q4_0` quantization and have optimized for it. To achieve best performance on Adreno GPU, add `--pure` to `llama-quantize`. For example,
Since `Q6_K` is also supported, `Q4_0` quantization without `--pure` will also work. However, the performance will be worse compared to pure `Q4_0` quantization.

+### MXFP4 Models
+
+OpenAI gpt-oss models are in MXFP4. The quantized model will be in MXFP4_MOE, a mixture of MXFP4 and Q8_0.
+For this quantization, there is no need to specify `--pure`.
+For the gpt-oss-20b model, you can directly download a quantized GGUF file in MXFP4 from Hugging Face.
+

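The two quantization cases described above can be sketched as a small dry-run helper. This is only an illustration under stated assumptions: `quantize_cmd`, the `LLAMA_QUANTIZE` variable, and all file names are hypothetical, and it assumes `MXFP4_MOE` is accepted as the type argument of `llama-quantize`. The helper prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: print the llama-quantize command line for each case described above.
# quantize_cmd, LLAMA_QUANTIZE and the file names are hypothetical.

quantize_cmd() {
    # $1 = input GGUF, $2 = output GGUF, $3 = quantization type
    bin="${LLAMA_QUANTIZE:-llama-quantize}"
    case "$3" in
        # Pure Q4_0: add --pure for best performance on Adreno GPUs.
        Q4_0) echo "$bin --pure $1 $2 $3" ;;
        # Other types (for example MXFP4_MOE): no --pure needed.
        *)    echo "$bin $1 $2 $3" ;;
    esac
}

# Dry run: show the commands without executing them.
quantize_cmd model-f16.gguf model-Q4_0.gguf Q4_0
quantize_cmd gpt-oss-converted.gguf gpt-oss-MXFP4_MOE.gguf MXFP4_MOE
```

To actually quantize, run the printed command once you have checked that the binary path and file names match your setup.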
## CMake Options
The OpenCL backend has the following CMake options that control the behavior of the backend.
@@ -146,10 +154,13 @@ A Snapdragon X Elite device with Windows 11 Arm64 is used. Make sure the followi
* Ninja
* Visual Studio 2022
* PowerShell 7
+* Python

Visual Studio provides necessary headers and libraries although it is not directly used for building.
Alternatively, Visual Studio Build Tools can be installed instead of the full Visual Studio.

+> Note that building using Visual Studio's cl compiler is not supported. Clang must be used. Clang depends on libraries provided by Visual Studio to work. Therefore, Visual Studio must be installed. Alternatively, Visual Studio Build Tools can be installed instead of the full Visual Studio.
+

PowerShell 7 is used for the following commands.
If an older version of PowerShell is used, these commands may not work as written.
@@ -201,7 +212,9 @@ ninja
## Known Issues

-- Currently OpenCL backend does not work on Adreno 6xx GPUs.
+- Flash attention does not always improve performance. Disable it for models above 3B parameters.
+- Currently, the OpenCL backend works on A6xx GPUs with recent drivers and compilers (usually found in IoT platforms).
+  However, it does not work on A6xx GPUs found in phones with old drivers and compilers.