Question about Higgs Integration #34

@MLATH

Dear Han Guo,

Hello, thank you for your excellent work on FLUTE.

I am currently attempting to run HIGGS quantization using the flute-kernel (installed via pip for CUDA 12.4). My implementation is based on the integration logic found in the Hugging Face transformers library (higgs.py).

The Issue:

When using higgs_grid with (p=2, n=256) (equivalent to 4-bit), the quantization works without any issues.
However, when I attempt lower bit-widths (e.g., 2-bit or 3-bit settings), the process fails with an error.
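For reference, the bit-width arithmetic behind these settings can be sketched as follows. It assumes that a grid of n points over p-dimensional weight vectors costs log2(n) / p bits per weight, which matches the (2, 256) / 4-bit equivalence above. The helper names are hypothetical and are not part of FLUTE or transformers, and the sketch does not say which grids the kernel actually supports.

```python
import math


def bits_per_weight(p: int, n: int) -> float:
    """Bits per weight for an n-point grid over p-dimensional vectors.

    Assumption: each group of p weights is stored as one index into the grid,
    so the cost is log2(n) bits shared across p weights.
    """
    return math.log2(n) / p


def grid_size(p: int, bits: int) -> int:
    """Grid size n needed to reach a target integer bit-width at dimension p."""
    return 2 ** (p * bits)


if __name__ == "__main__":
    # The working case from this report.
    print("(p=2, n=256) ->", bits_per_weight(2, 256), "bits per weight")

    # Grid sizes the lower bit-widths would need at p=2 under this assumption.
    for bits in (2, 3):
        print(f"{bits}-bit at p=2 -> n = {grid_size(2, bits)}")
```

Under this assumption, the 3-bit and 2-bit settings at p=2 correspond to n=64 and n=16 respectively.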

My Hypothesis:

I suspect that the currently installed FLUTE kernel might only support limited HIGGS configurations. I noticed that the original kernel implementation repository (galqiwi/higgs-kernels) primarily highlights the (2, 256) case.

Question:

Could you confirm if the current FLUTE integration for HIGGS is strictly limited to the (2, 256) / 4-bit case? Or should the kernel support arbitrary (p, n) grids for lower bit-widths as well?

If lower bit-widths are supposed to be supported, I would appreciate any guidance on whether I need to build from source with specific flags or if this requires a different kernel configuration.

Environment:

  • FLUTE version: (Installed via pip for CUDA 12.4; Python 3.11)

  • CUDA version: 12.4

  • GPU: NVIDIA A6000
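The exact package versions are not listed above. A minimal sketch for collecting them is shown below; it assumes the pip distribution is named flute-kernel (as in the install described earlier) and that torch and transformers may or may not be importable. The function names are hypothetical.

```python
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def collect_environment() -> dict:
    """Gather the version details relevant to this report."""
    info = {
        "python": platform.python_version(),
        # Assumption: the pip distribution name is "flute-kernel".
        "flute-kernel": package_version("flute-kernel"),
        "transformers": package_version("transformers"),
        "torch": package_version("torch"),
    }
    try:
        import torch

        # CUDA version that the installed torch build was compiled against.
        info["torch_cuda"] = torch.version.cuda
        if torch.cuda.is_available():
            info["gpu"] = torch.cuda.get_device_name(0)
    except ImportError:
        # torch is optional here; skip the GPU details if it is missing.
        pass
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```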

Best,
