Add custom_sdpa and use that instead of sdpa_with_kv_cache (#5669)
Summary:
Pull Request resolved: #5669
sdpa_with_kv_cache updates the kv cache as part of the op. With a
quantized kv cache, the cache update happens separately, and the
quantized cache is then dequantized. After that we call
sdpa_with_kv_cache, which copies the k and v data into the dequantized
cache. This copy is not needed, because the actual cache is the
quantized one.
For very large context lengths this adds a significant amount of data
copying.
Subsequent diffs will deprecate the sdpa_with_kv_cache op and decompose
it into a) an update_cache op and b) a custom_sdpa op.
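
The split described above can be illustrated with a minimal pure-Python
sketch. It is not the ExecuTorch implementation: the function names
update_cache_sketch and custom_sdpa_sketch, their signatures, and the
single-head list-of-lists layout are all assumptions made for
illustration. The point it tries to show is that the attention step only
reads the cache, so no k/v copy happens inside it.

```python
import math


def update_cache_sketch(cache, new_rows, start_pos):
    """Hypothetical stand-in for a cache update: writes new_rows into
    cache[start_pos : start_pos + len(new_rows)] in place."""
    for i, row in enumerate(new_rows):
        cache[start_pos + i] = list(row)


def custom_sdpa_sketch(q, k_cache, v_cache, start_pos):
    """Hypothetical stand-in for attention without a cache update.

    q holds the queries for the new tokens (one row per token). The
    caches are only read here, never written.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    v_dim = len(v_cache[0])
    out = []
    for i, q_row in enumerate(q):
        # Causal mask: the token at absolute position start_pos + i
        # attends to cache positions 0 .. start_pos + i inclusive.
        visible = start_pos + i + 1
        scores = [
            scale * sum(a * b for a, b in zip(q_row, k_cache[j]))
            for j in range(visible)
        ]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v_cache[j][d] for j, w in enumerate(weights)) / total
            for d in range(v_dim)
        ])
    return out


# Usage: one decode step with a cache of capacity 4 and head dim 2.
k_cache = [[0.0, 0.0] for _ in range(4)]
v_cache = [[0.0, 0.0] for _ in range(4)]
update_cache_sketch(k_cache, [[1.0, 0.0]], start_pos=0)
update_cache_sketch(v_cache, [[1.0, 2.0]], start_pos=0)
print(custom_sdpa_sketch([[1.0, 0.0]], k_cache, v_cache, start_pos=0))
# prints [[1.0, 2.0]]
```

In this sketch the caller decides when and where the cache is updated
(for example, into a quantized buffer), and the attention step can then
run on whatever dequantized view it is handed without writing to it.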
ghstack-source-id: 245751544
exported-using-ghexport
//oss complaining of internal lint failure
bypass-github-export-checks
Reviewed By: swolchok
Differential Revision: D62623241
fbshipit-source-id: 022ce04154e4bf869d83266c09c318b312ad6405