
deduplicate is_attention_module between compressed-tensors and llm-compressor #2079

@HDCharles

Description


https://github.com/vllm-project/llm-compressor/blob/db0b68d9faf09066e9b7d679b39a977e484d9b91/src/llmcompressor/modifiers/utils/helpers.py#L32C4-L37

vs

https://github.com/vllm-project/compressed-tensors/blob/73c2cf935b53e0078be7766c5ee064755d980d78/src/compressed_tensors/quantization/lifecycle/initialize.py#L146

These two functions do exactly the same thing, but if we ever want to extend the check (for example, to support MLA attention), it's a footgun to have to remember to update two repositories. We should probably remove the llm-compressor copy and use the compressed-tensors one, since llm-compressor already imports helpers from compressed-tensors in a few other places.
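A minimal sketch of what the llm-compressor side could look like after the deduplication, assuming the helper keeps its current name and location in compressed-tensors (the import path and signature below are assumptions and should be checked against the linked source; re-exporting the helper from a more public compressed_tensors module may be preferable):

```python
# Hypothetical sketch: llm-compressor drops its local copy of is_attention_module
# and reuses the compressed-tensors helper. The import path assumes the function
# stays where the linked source currently defines it.
from compressed_tensors.quantization.lifecycle.initialize import is_attention_module
from torch.nn import Module


def attention_modules(model: Module) -> dict[str, Module]:
    """Collect submodules that the shared helper classifies as attention blocks."""
    return {
        name: submodule
        for name, submodule in model.named_modules()
        if is_attention_module(submodule)  # assumed signature: takes an nn.Module
    }
```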

Labels: compressed-tensors, good first issue
