Description
Dear Han Guo,
Hello, thank you for your excellent work on FLUTE.
I am currently attempting to run HIGGS quantization using the flute-kernel (installed via pip for CUDA 12.4). My implementation is based on the integration logic found in the Hugging Face transformers library (higgs.py).
The Issue:
When using higgs_grid with (p=2, n=256) (equivalent to 4-bit), quantization works without any issues.
However, when I attempt lower bit-widths (e.g., 2-bit or 3-bit settings), the process fails with an error.
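For reference, the bit-width arithmetic assumed above can be sketched as follows. It assumes a grid of n codewords over p-dimensional weight vectors costs log2(n) / p index bits per weight, ignoring any per-group scale overhead. The helper names `bits_per_weight` and `grid_size` are hypothetical and are not part of FLUTE or transformers.

```python
import math


def bits_per_weight(p: int, n: int) -> float:
    """Index bits per weight for a grid of n codewords over p-dimensional vectors.

    Assumption: each group of p weights is stored as one index into the grid,
    so the cost is log2(n) bits shared across p weights (scales not counted).
    """
    return math.log2(n) / p


def grid_size(p: int, bits: int) -> int:
    """Number of codewords n needed to reach `bits` bits per weight at dimension p."""
    return 2 ** (p * bits)


if __name__ == "__main__":
    # Enumerate the p=2 grids that would correspond to 2-, 3-, and 4-bit settings.
    for bits in (2, 3, 4):
        n = grid_size(2, bits)
        print(f"p=2, n={n}: {bits_per_weight(2, n)} bits per weight")
```

Under this assumption, the 2-bit and 3-bit settings with p=2 would correspond to (2, 16) and (2, 64) grids, while (2, 256) is the 4-bit case that works.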
My Hypothesis:
I suspect that the currently installed FLUTE kernel might only support limited HIGGS configurations. I noticed that the original kernel implementation repository (galqiwi/higgs-kernels) primarily highlights the (2, 256) case.
Question:
Could you confirm if the current FLUTE integration for HIGGS is strictly limited to the (2, 256) / 4-bit case? Or should the kernel support arbitrary (p, n) grids for lower bit-widths as well?
If lower bit-widths are supposed to be supported, I would appreciate any guidance on whether I need to build from source with specific flags or if this requires a different kernel configuration.
Environment:
- FLUTE version: (Installed via pip for CUDA 12.4; Python 3.11)
- CUDA version: 12.4
- GPU: NVIDIA A6000
Best,