Query about Sparsity Implementation and Acceleration Mechanism #3

@huchenz1

Hi, thank you for your wonderful work!

I’ve been exploring how sparsity is implemented in the codebase and noticed that it is applied through masks such as attn_weight_mask, mlp_weight_mask, and token_select. During the forward pass, however, these masks hold binary values (0s and 1s) and the tensor dimensions remain unchanged, so the computation appears to stay dense.
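
To make the concern concrete, here is a minimal sketch of my understanding. It is plain Python rather than the repository's code: the function names masked_linear and pruned_linear are hypothetical, and the per-output-channel mask is an assumption made only for illustration. It counts multiply-accumulate operations (MACs) to show that multiplying by a 0/1 mask does the same amount of work as the dense layer, while physically dropping the masked channels does less.

```python
# Hypothetical illustration only; none of these names come from the repository.

def masked_linear(x, weight, mask):
    """Compute x @ (weight * mask) with a 0/1 mask per output channel.

    The output keeps its full size and every weight is still visited.
    """
    macs = 0
    out = []
    for j in range(len(mask)):
        acc = 0.0
        for i in range(len(x)):
            acc += x[i] * weight[i][j] * mask[j]
            macs += 1
        out.append(acc)
    return out, macs


def pruned_linear(x, weight, mask):
    """Same values for the kept channels, but masked channels are skipped."""
    kept = [j for j, m in enumerate(mask) if m == 1]
    macs = 0
    out = []
    for j in kept:
        acc = 0.0
        for i in range(len(x)):
            acc += x[i] * weight[i][j]
            macs += 1
        out.append(acc)
    return out, macs, kept


x = [1.0, 2.0]                      # 2 input features
weight = [[1.0, 2.0, 3.0, 4.0],     # shape (2 inputs, 4 outputs)
          [5.0, 6.0, 7.0, 8.0]]
mask = [1, 0, 1, 0]                 # keep output channels 0 and 2

dense_out, dense_macs = masked_linear(x, weight, mask)
small_out, small_macs, kept = pruned_linear(x, weight, mask)
print(dense_out, dense_macs)        # full-size output, full MAC count
print(small_out, small_macs, kept)  # smaller output, fewer MACs
```

In this toy case the masked version still performs all 8 MACs and returns a 4-element output, whereas the pruned version performs 4 MACs and returns 2 elements. My question is which of these two behaviours the released code corresponds to at inference time.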

Could you kindly clarify how the actual acceleration effect is achieved during runtime?
