With USE_SPARSE_ATTN = False and USE_TRITON_NSA = False, the speed is 6.00 it/s:
NUM_BATCHES = int(1e5)
BATCH_SIZE = 4
GRAD_ACCUM_EVERY = 4
LEARNING_RATE = 1e-4
VALIDATE_EVERY = 100
PRIME_LENGTH = 64
GENERATE_EVERY = 500
GENERATE_LENGTH = 512
SEQ_LEN = 512
HEADS = 8
KV_HEADS = 4
USE_SPARSE_ATTN = False
USE_TRITON_NSA = False
USE_FLEX_FOR_FINE_SELECTION = False # will push flex a bit; won't be efficient, as each layer needs its sparsity dynamically generated, but may be enough just to compare to full attention before going all-in on triton kernels
QUERY_HEADS_SHARE_SELECTION = True
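
For context, an it/s figure like the one above can be measured by timing full training steps after a warmup. The sketch below is illustrative only: model and get_batch are placeholder names, not identifiers from the original script, and it assumes the model returns a scalar loss when called with return_loss=True (a common pattern in lucidrains-style training scripts).

import time
import torch

def measure_iterations_per_second(model, get_batch, num_iters=50, warmup=10):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    def step():
        optimizer.zero_grad()
        # placeholder call signature; adapt to the actual training script
        loss = model(get_batch(), return_loss=True)
        loss.backward()
        optimizer.step()

    # warm up so Triton autotuning / kernel compilation doesn't skew timing
    for _ in range(warmup):
        step()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # flush queued async GPU work before timing

    start = time.perf_counter()
    for _ in range(num_iters):
        step()
    if torch.cuda.is_available():
        torch.cuda.synchronize()

    return num_iters / (time.perf_counter() - start)
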
With USE_SPARSE_ATTN = True and USE_TRITON_NSA = True, the speed is 2.44 it/s:
NUM_BATCHES = int(1e5)
BATCH_SIZE = 4
GRAD_ACCUM_EVERY = 4
LEARNING_RATE = 1e-4
VALIDATE_EVERY = 100
PRIME_LENGTH = 64
GENERATE_EVERY = 500
GENERATE_LENGTH = 512
SEQ_LEN = 512
HEADS = 8
KV_HEADS = 4
USE_SPARSE_ATTN = True
USE_TRITON_NSA = True
USE_FLEX_FOR_FINE_SELECTION = False # will push flex a bit; won't be efficient, as each layer needs its sparsity dynamically generated, but may be enough just to compare to full attention before going all-in on triton kernels
QUERY_HEADS_SHARE_SELECTION = True
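
Comparing the two runs: 6.00 / 2.44 ≈ 2.46, so the Triton NSA path is roughly 2.5x slower per step at these settings. A quick arithmetic check, using only the numbers reported above and assuming one "it" is one forward/backward pass over a single batch:

dense_its  = 6.00  # it/s with USE_SPARSE_ATTN = USE_TRITON_NSA = False
sparse_its = 2.44  # it/s with both flags True

BATCH_SIZE = 4
SEQ_LEN = 512

slowdown = dense_its / sparse_its  # ~2.46x slower per iteration
print(f"slowdown: {slowdown:.2f}x")
print(f"dense:  {dense_its * BATCH_SIZE * SEQ_LEN:,.0f} tokens/s")
print(f"sparse: {sparse_its * BATCH_SIZE * SEQ_LEN:,.0f} tokens/s")
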