You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
* feat: Add a batched version of ssm_conv
This was done using Claude Code. It found a number of optimizations around
how the threads were organized, resulting in a huge performance boost!
Branch: Mamba2SSD
Signed-off-by: Gabe Goodhart <[email protected]>
* feat: Optimized SSM_SCAN kernel for metal
This used Claude Code and resulted in a modest performance improvement
while maintaining correctness.
Branch: Mamba2SSD
Signed-off-by: Gabe Goodhart <[email protected]>
* test: Add test-backend-ops perf tests for SSM_CONV
Branch: SSMKernelImprovements
Signed-off-by: Gabe Goodhart <[email protected]>
* test: Real representitive tests for SSM_CONV
Branch: SSMKernelImprovements
Signed-off-by: Gabe Goodhart <[email protected]>
* refactor: Use function constant for ssm_conv batch size
Branch: SSMKernelImprovements
Signed-off-by: Gabe Goodhart <[email protected]>
* test: backend op tests for ssm_scan from granite4 1b-h
Branch: SSMKernelImprovements
Signed-off-by: Gabe Goodhart <[email protected]>
* style: remove commented out templates
Branch: SSMKernelImprovements
Signed-off-by: Gabe Goodhart <[email protected]>
* feat: float4 version of ssm_conv_batched
Branch: SSMKernelImprovements
Signed-off-by: Gabe Goodhart <[email protected]>
* fix: Add missing ggml_metal_cv_free
Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
---------
Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
0 commit comments