Commit b95fea7

Update on "[Executorch] Add quantized kv cache to oss ci"

Fixes to make sure the quantized KV cache works in OSS.

Differential Revision: [D66269487](https://our.internmc.facebook.com/intern/diff/D66269487/)
[ghstack-poisoned]

2 parents c984a6e + da024a1

1 file changed, 2 insertions(+), 2 deletions(-)

examples/models/llama/source_transformation/quantized_kv_cache.py

Lines changed: 2 additions & 2 deletions
@@ -11,8 +11,6 @@
 import torch.nn as nn
 from executorch.examples.models.llama.llama_transformer import KVCache
 
-# This is needed to ensure that custom ops are registered
-from executorch.extension.pybindings import portable_lib  # noqa # usort: skip
 from torch.ao.quantization.fx._decomposed import quantized_decomposed_lib  # noqa: F401
 
 
@@ -235,6 +233,8 @@ def from_float(cls, kv_cache, cache_type: QuantizedCacheType):
 
 
 def replace_kv_cache_with_quantized_kv_cache(module):
+    # This is needed to ensure that custom ops are registered
+    from executorch.extension.pybindings import portable_lib  # noqa # usort: skip
     from executorch.extension.llm.custom_ops import custom_ops  # noqa: F401
 
     logging.warning(
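The change above moves the op-registering import from module scope into the function that needs it, so importing the file no longer requires the pybindings extension to be present. A minimal sketch of that deferred-import pattern, with a stdlib module standing in for the real extension and all names illustrative rather than the actual ExecuTorch API:

```python
def apply_quantized_kv_cache_transform(module):
    """Sketch of the deferred-import pattern used in this commit.

    Importing the op-registration library inside the function, rather than
    at module scope, means merely importing this file never pulls in the
    extension; the dependency is only resolved when the transform runs.
    """
    try:
        # Stand-in for: from executorch.extension.pybindings import portable_lib
        # The import's side effect (registering custom ops) would happen here.
        import sqlite3 as _portable_lib  # noqa: F401  (stdlib stand-in)
    except ImportError as e:
        raise RuntimeError(
            "custom-op extension is required for this transformation"
        ) from e
    # A real implementation would now replace KVCache instances in `module`.
    return module
```

The design choice mirrors the diff: environments that only import the module (e.g. OSS CI without the extension built) stay importable, while callers of the transform still get the ops registered before they are needed.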
