Question about Higgs Integration #34

Dear Han Guo,

Hello, thank you for your excellent work on FLUTE.

I am trying to run HIGGS quantization using the `flute-kernel` package (installed via pip for CUDA 12.4). My implementation is based on the integration logic in the Hugging Face transformers library [(higgs.py)](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/higgs.py).

### The Issue:
When using `higgs_grid` with (p=2, n=256), which corresponds to 4 bits per weight, quantization works without any issues.
However, when attempting lower bit-widths (e.g., 2-bit or 3-bit settings), the process fails with an error.
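
For reference, this is the mapping I am assuming between a (p, n) grid and the bit-width: a grid of n points in p dimensions costs log2(n) / p bits per weight. The helper names below are my own and are not part of FLUTE or transformers; this is only a sketch of the arithmetic, not of how either library selects a kernel.

```python
import math


def bits_per_weight(p: int, n: int) -> float:
    """Bits per weight for a grid of n points in p dimensions (assumed: log2(n) / p)."""
    return math.log2(n) / p


def grid_size(p: int, bits: int) -> int:
    """Number of grid points needed for a given bit-width (assumed: n = 2 ** (p * bits))."""
    return 2 ** (p * bits)


# The working case, then the grid sizes I would expect for the failing cases at p=2.
print(bits_per_weight(2, 256))
for bits in (2, 3, 4):
    print(bits, grid_size(2, bits))
```

Under this assumption, the 2-bit and 3-bit settings at p=2 correspond to (2, 16) and (2, 64) grids, which are the configurations that fail for me.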

### My Hypothesis:
I suspect that the currently installed FLUTE kernel might only support limited HIGGS configurations. I noticed that the original kernel implementation repository [(galqiwi/higgs-kernels)](https://github.com/galqiwi/higgs-kernels/blob/main/higgs_kernels/kernels.py) primarily highlights the (2, 256) case.

### Question
Could you confirm whether the current FLUTE integration for HIGGS is strictly limited to the (2, 256) / 4-bit case? Or should the kernel also support other (p, n) grids for lower bit-widths?

If lower bit-widths are supposed to be supported, I would appreciate any guidance on whether I need to build from source with specific flags or if this requires a different kernel configuration.

### Environment:

- FLUTE version: (Installed via pip for CUDA 12.4; Python 3.11)

- CUDA version: 12.4

- GPU: NVIDIA A6000

Best,
