
Commit c984a6e (2 parents: 98f223b + b199a32)

Update on "[Executorch] Add quantized kv cache to oss ci"

Fixes to make sure the quantized KV cache works in OSS.

Differential Revision: [D66269487](https://our.internmc.facebook.com/intern/diff/D66269487/)

[ghstack-poisoned]

File tree

1 file changed: 3 additions, 2 deletions

examples/models/llama/source_transformation/quantized_kv_cache.py (3 additions, 2 deletions)

@@ -11,6 +11,8 @@
 import torch.nn as nn
 from executorch.examples.models.llama.llama_transformer import KVCache
 
+# This is needed to ensure that custom ops are registered
+from executorch.extension.pybindings import portable_lib  # noqa # usort: skip
 from torch.ao.quantization.fx._decomposed import quantized_decomposed_lib  # noqa: F401
 
 
@@ -233,9 +235,8 @@ def from_float(cls, kv_cache, cache_type: QuantizedCacheType):
 
 
 def replace_kv_cache_with_quantized_kv_cache(module):
-    # This is needed to ensure that custom ops are registered
-    from executorch.extension.pybindings import portable_lib  # noqa # usort: skip
     from executorch.extension.llm.custom_ops import custom_ops  # noqa: F401
+
     logging.warning(
         "Replacing KVCache with QuantizedKVCache. This modifies the model in place."
     )
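The change hoists the `portable_lib` import to module level so that its side effect, registering custom ops, happens as soon as the file is imported rather than only when `replace_kv_cache_with_quantized_kv_cache` is called. A minimal sketch of this "import for side effects" registration pattern is below; the names (`OP_REGISTRY`, `register_op`, `quantized_kv_update`) are hypothetical illustrations, not ExecuTorch APIs.

```python
# Hypothetical sketch of side-effect op registration. In ExecuTorch, importing
# `portable_lib` populates the op registry; a module-level import guarantees
# this runs before any code that looks the ops up.

OP_REGISTRY = {}


def register_op(name):
    """Decorator that records a function in the global registry."""
    def wrap(fn):
        OP_REGISTRY[name] = fn
        return fn
    return wrap


# Registration happens at import/definition time, not at call time.
@register_op("quantized_kv_update")
def quantized_kv_update(cache, new_vals):
    cache.extend(new_vals)
    return cache


def lookup(name):
    # Raises KeyError if the registering module was never imported first.
    return OP_REGISTRY[name]


cache = []
lookup("quantized_kv_update")(cache, [1, 2])
print(cache)  # [1, 2]
```

If the import stays inside one function, any other code path that exports or runs the quantized cache before that function executes would fail the lookup, which is why moving it to the top of the module is the safer choice.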

0 commit comments