* [fix](kt-kernel): fix AVX512 cpu instruction set detection
* [feat](kt-kernel): AVX512 fallback kernel for RAW-INT4
* [fix](kt-kernel): fix setup version issue
* [fix](kt-kernel): update install for custom build
* [docs](kt-kernel): new installation guide for various cpu instruction set
* [fix](kt-kernel): fix _mm512_dpbusd_epi32_compat fallback implementation
* [style](kt-kernel): clang format
If you have an AMX-capable CPU but plan to use the LLAMAFILE backend, do NOT use the default auto-detection build.
-Use "manual mode" with `CPUINFER_CPU_INSTRUCT` set to `AVX512` or `AVX2` instead of `NATIVE` to avoid compilation issues (see below).
+**Software Fallback Support (AVX512 backends):**
+- ✅ VNNI fallback: Uses AVX512BW instructions
+- ✅ BF16 fallback: Uses AVX512F instructions
+- ✅ Older AVX512 CPUs (Skylake-X, Cascade Lake) can run RAWINT4 with fallbacks

-⚠️ **Important for BLIS AMD backend users:**
-for the installation guide, see this [issue](https://github.com/kvcache-ai/ktransformers/issues/1601)
+⚠️ **Portability Note:** The default build is optimized for your specific CPU and may not work on different/older CPUs. For portable builds or binary distribution, see [Manual Configuration](#manual-configuration-advanced) below.
-
-### Manual Configuration (Advanced)
-
-If you need specific build options (e.g., for LLAMAFILE backend, compatibility, or binary distribution):
-
-```bash
-# Example for LLAMAFILE backend on AMX CPU with AVX512
-# Build only (skip auto-detection of instruction set)
-./install.sh build --manual
-```
-
-For advanced build options and binary distribution, see the [Build Configuration](#build-configuration) section. If you encounter issues, refer to [Error Troubleshooting](#error-troubleshooting).
+⚠️ **AMD BLIS backend users:** See [installation guide](https://github.com/kvcache-ai/ktransformers/issues/1601) for AMD-specific setup.
For portable builds, binary distribution, or cross-machine deployment, you need to manually specify target instruction sets:
+
+```bash
+# General distribution (works on any AVX512 CPU from 2017+)
+export CPUINFER_CPU_INSTRUCT=AVX512
+export CPUINFER_ENABLE_AMX=OFF
+./install.sh build --manual
+
+# Maximum compatibility (works on any CPU from 2013+)
+export CPUINFER_CPU_INSTRUCT=AVX2
+export CPUINFER_ENABLE_AMX=OFF
+./install.sh build --manual
+
+# Modern CPUs only (Ice Lake+, Zen 4+)
+export CPUINFER_CPU_INSTRUCT=FANCY
+export CPUINFER_ENABLE_AMX=OFF
+./install.sh build --manual
+```
+
+**Optional: Override VNNI/BF16 detection**
+```bash
+# Force enable/disable VNNI and BF16 (for testing fallbacks)
+export CPUINFER_ENABLE_AVX512_VNNI=OFF
+export CPUINFER_ENABLE_AVX512_BF16=OFF
+./install.sh
+```
+
+See `./install.sh --help` for all available options.
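
Reviewer sketch, not part of the commit: to judge whether the VNNI/BF16 software fallbacks would matter on a given machine, one can inspect the CPU flags before building. The helpers `has_flag` and `report_fallbacks` are hypothetical names introduced for illustration. The flag names `avx512_vnni` and `avx512_bf16` are the ones Linux reports in `/proc/cpuinfo`; how `install.sh` itself performs detection is not shown in this diff, so treat the result as a rough hint only.

```bash
#!/bin/sh
# Hypothetical helper: succeeds if flag $2 appears in the
# space-separated flag list $1 (whole-word match).
has_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: prints one line per extension, saying whether the
# CPU reports it ("native") or a software fallback would be needed.
# Only meaningful on CPUs that support AVX512 in the first place.
report_fallbacks() {
    for feat in avx512_vnni avx512_bf16; do
        if has_flag "$1" "$feat"; then
            echo "$feat: native"
        else
            echo "$feat: fallback"
        fi
    done
}

# Example on Linux: read the first "flags" line and report on it.
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | sed 's/^[^:]*: *//' || true)
    report_fallbacks "$cpu_flags"
fi
```

If both lines report `native`, setting `CPUINFER_ENABLE_AVX512_VNNI=OFF` and `CPUINFER_ENABLE_AVX512_BF16=OFF` as in the hunk above is presumably the way to exercise the fallback paths for testing.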
+
+---
+
 ## Build Configuration

-### Manual Installation
+### Manual Installation (Without install.sh)

-If you prefer manual installation without the `install.sh` script, follow these steps:
+If you prefer manual installation without the `install.sh` script:

 #### 1. Install System Dependencies
@@ -508,27 +523,29 @@ If you prefer manual installation without the `install.sh` script, follow these
 **Instruction Set Details:**

-- **`NATIVE`**: Auto-detect and use all available CPU instructions (`-march=native`) - **Recommended for best performance**
-- **`AVX512`**: Explicit AVX512 support for Skylake-SP and Cascade Lake
-- **`AVX2`**: AVX2 support for maximum compatibility
-- **`FANCY`**: AVX512 with full extensions (AVX512F/BW/DQ/VL/VNNI) for Ice Lake+ and Zen 4+. Use this when building pre-compiled binaries to distribute to users with modern CPUs. For local builds, prefer `NATIVE` for better performance.
+| Option | Target CPUs | Use Case |
+|--------|-------------|----------|
+| **`NATIVE`** | Your specific CPU only | Local builds (best performance, **default**) |
+| **`AVX512`** | Skylake-X, Ice Lake, Cascade Lake, Zen 4+ | General distribution |
+| **`AVX2`** | Haswell (2013) and newer | Maximum compatibility |
+| **`FANCY`** | Ice Lake+, Zen 4+ | Modern CPUs with full AVX512 extensions |
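
Reviewer sketch, not part of the commit: the table above can be read as a decision rule for choosing `CPUINFER_CPU_INSTRUCT` when building for a known target machine. The function below is a minimal illustration under stated assumptions. `has_flag` and `pick_instruct` are hypothetical names; the `FANCY` requirement follows the AVX512F/BW/DQ/VL/VNNI list in the removed description; the `AVX512` requirement (F plus BW) is a guess; and `UNSUPPORTED` is a made-up sentinel, not a value the build accepts.

```bash
#!/bin/sh
# Hypothetical helper: succeeds if flag $2 appears in the
# space-separated flag list $1 (whole-word match).
has_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: maps a /proc/cpuinfo-style flag list to the most
# specific portable option from the table. Flag requirements per option
# are assumptions, not taken from the build scripts.
pick_instruct() {
    f=$1
    if has_flag "$f" avx512f && has_flag "$f" avx512bw &&
       has_flag "$f" avx512dq && has_flag "$f" avx512vl &&
       has_flag "$f" avx512_vnni; then
        echo FANCY
    elif has_flag "$f" avx512f && has_flag "$f" avx512bw; then
        echo AVX512
    elif has_flag "$f" avx2; then
        echo AVX2
    else
        echo UNSUPPORTED   # made-up sentinel for this sketch
    fi
}

# Example: flags copied from a target machine's /proc/cpuinfo.
pick_instruct "fma avx2 avx512f avx512bw avx512dq avx512vl"
```

The example call prints `AVX512`, because the list lacks `avx512_vnni`. The printed value could then be exported as `CPUINFER_CPU_INSTRUCT` before running `./install.sh build --manual`.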

 **Example Configurations:**

 ```bash
-# Maximum performance on AMX CPU
+# Local use - maximum performance (default behavior)