Query about Sparsity Implementation and Acceleration Mechanism

Hi, thank you for your wonderful work!

I’ve been exploring the implementation of sparsity in the codebase and noticed that sparsity is achieved through masks such as `attn_weight_mask`, `mlp_weight_mask`, and `token_select`. However, during the forward pass, these masks are represented as binary values (0s and 1s), and the tensor dimensions remain unchanged.

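Since multiplying by a 0/1 mask leaves the tensor shapes unchanged, the dense computation appears to cost the same as without the mask. Could you kindly clarify how the actual acceleration is achieved at runtime?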
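
To make the question concrete, here is a minimal sketch of what I mean. It is a hypothetical illustration and not code from this repository: `matvec`, `apply_mask`, `prune_rows`, and the made-up `row_mask` are names I chose for the example. It counts multiply-adds for a masked weight matrix that keeps its shape versus one whose masked rows are physically removed.

```python
# Hypothetical illustration, not code from this repository.
# A pure-Python matrix-vector product that counts multiply-adds (MACs).

def matvec(weight, x):
    """Return (y, macs) for y = weight @ x, counting every multiply-add."""
    macs = 0
    y = []
    for row in weight:
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi
            macs += 1
        y.append(acc)
    return y, macs


def apply_mask(weight, row_mask):
    """Zero out masked rows; the shape is unchanged (what I observe in the forward pass)."""
    return [[w * m for w in row] for row, m in zip(weight, row_mask)]


def prune_rows(weight, row_mask):
    """Physically drop masked rows; the shape shrinks (what I would expect for a speedup)."""
    return [row for row, m in zip(weight, row_mask) if m == 1]


weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
row_mask = [1, 0, 1, 0]  # made-up binary mask: keep rows 0 and 2
x = [1.0, 1.0]

y_masked, macs_masked = matvec(apply_mask(weight, row_mask), x)
y_pruned, macs_pruned = matvec(prune_rows(weight, row_mask), x)

print(y_masked, macs_masked)  # [3.0, 0.0, 11.0, 0.0] 8
print(y_pruned, macs_pruned)  # [3.0, 11.0] 4
```

In this toy case the masked version still performs all 8 multiply-adds, while the pruned version performs 4. My understanding is that the code in the repository corresponds to the first case, which is why I am asking where the speedup comes from.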

Query about Sparsity Implementation and Acceleration Mechanism #3
