
Commit 28c811a

Update README.md
1 parent 0892bf4 commit 28c811a


1 file changed (+1, -1 lines)

recipes/experimental/long-context/H2O/README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

### Overview:

Line 5 (removed):
Heavy-Hitter Oracle (H2O) is an efficient inference framework for LLMs. During generative inference with transformers, the KV cache grows linearly with the sequence length (prompt length + generation length). The KV cache can be significantly larger than the model parameters themselves, which constrains inference throughput. H2O identifies the key KV pairs and evicts the remaining, unnecessary ones, keeping the cache small and thereby improving throughput.
Line 5 (added):
Heavy-Hitter Oracle (H2O) is an efficient inference framework for LLMs. During generative inference with transformers, the KV cache grows linearly with the sequence length (prompt length + generation length) during long-context generation. The KV cache can be significantly larger than the model parameters themselves, which constrains inference throughput. H2O identifies the critical KV pairs and evicts the remaining, unnecessary ones, keeping the cache small and thereby improving throughput.

In addition, LLMs usually generalize poorly to sequences longer than those seen during pretraining. H2O addresses this by keeping only the heavy-hitter tokens and the most recent tokens in the KV cache. Combined with a positional rolling strategy (assigning each KV pair the position of its slot in the KV cache rather than its position in the original sequence), H2O can process sequences much longer than the pretrained context window.
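
The linear growth described in the overview can be made concrete with the usual KV cache size estimate: one key and one value vector per token, per layer, per sequence in the batch. This is a minimal sketch under stated assumptions: a hypothetical 7B-class model (32 layers, hidden size 4096, fp16, standard multi-head attention with no grouped-query sharing) and a batch of 8. These numbers are illustrative and are not taken from the H2O recipe.

```python
def kv_cache_bytes(num_layers: int, hidden_size: int, seq_len: int,
                   batch_size: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache K and V for every layer, token, and sequence.

    Assumes standard multi-head attention (K and V are each hidden_size wide
    per token per layer, no grouped-query sharing) and fp16 by default.
    """
    return 2 * num_layers * hidden_size * seq_len * batch_size * bytes_per_elem


def weight_bytes(num_params: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to hold the model weights (fp16 by default)."""
    return num_params * bytes_per_elem


# Hypothetical 7B-class configuration; not taken from the H2O recipe.
NUM_LAYERS, HIDDEN_SIZE, NUM_PARAMS, BATCH = 32, 4096, 7_000_000_000, 8
GIB = 2**30

for seq_len in (1024, 2048, 4096, 8192):
    kv = kv_cache_bytes(NUM_LAYERS, HIDDEN_SIZE, seq_len, BATCH)
    print(f"seq_len={seq_len:>5}  KV cache={kv / GIB:5.1f} GiB  "
          f"weights={weight_bytes(NUM_PARAMS) / GIB:5.1f} GiB")
```

Under these assumptions the cache doubles with every doubling of `seq_len` (4, 8, 16, 32 GiB) and overtakes the roughly 13 GiB of fp16 weights somewhere between 2,048 and 4,096 tokens. A fixed budget of retained KV pairs, as in H2O, keeps the cache size constant instead.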

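The heavy-hitter-plus-recent-window policy and the positional rolling strategy can be sketched together for a single attention head. This is a minimal pure-Python illustration, not the H2O implementation in this recipe. The class `HeavyHitterCache`, the helper `rope`, and the parameters `num_heavy` and `num_recent` are hypothetical names. The sketch assumes heavy hitters are the older entries with the largest accumulated attention weight, and that keys are stored without rotary embedding so RoPE can be re-applied using each entry's slot in the cache as its position.

```python
import math
import random
from dataclasses import dataclass


def rope(vec: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotate consecutive pairs of an even-length vector by position-dependent angles (RoPE)."""
    d = len(vec)
    out = list(vec)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out


@dataclass
class Entry:
    orig_pos: int        # position in the original sequence (for inspection only)
    key: list[float]     # assumption: stored WITHOUT rotary embedding
    value: list[float]
    score: float = 0.0   # accumulated attention weight received so far


class HeavyHitterCache:
    """Hypothetical single-head H2O-style cache: heavy hitters + recent window."""

    def __init__(self, num_heavy: int, num_recent: int):
        self.num_heavy = num_heavy
        self.num_recent = num_recent
        self.entries: list[Entry] = []

    def step(self, query: list[float], key: list[float], value: list[float],
             orig_pos: int) -> list[float]:
        """Add the new token's K/V, attend over the cache, then evict."""
        self.entries.append(Entry(orig_pos, list(key), list(value)))
        # Positional rolling: use each entry's slot index in the cache as its
        # position instead of orig_pos, so positions stay bounded by the cache
        # size no matter how long the sequence grows.
        q = rope(query, len(self.entries) - 1)
        scale = math.sqrt(len(query))
        logits = [sum(a * b for a, b in zip(q, rope(e.key, slot))) / scale
                  for slot, e in enumerate(self.entries)]
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        weights = [x / total for x in exps]
        # Heavy-hitter statistic: accumulate the attention each entry receives.
        for e, w in zip(self.entries, weights):
            e.score += w
        out = [sum(w * e.value[j] for w, e in zip(weights, self.entries))
               for j in range(len(value))]
        self._evict()
        return out

    def _evict(self) -> None:
        """Keep the num_recent newest entries plus the num_heavy top-scoring older ones."""
        if len(self.entries) <= self.num_heavy + self.num_recent:
            return
        split = len(self.entries) - self.num_recent
        older, recent = self.entries[:split], self.entries[split:]
        ranked = sorted(range(len(older)), key=lambda i: older[i].score, reverse=True)
        keep = sorted(ranked[: self.num_heavy])  # preserve original order
        self.entries = [older[i] for i in keep] + recent


# Usage: stream 100 random 8-dim tokens through a cache of 4 heavy + 4 recent.
random.seed(0)
cache = HeavyHitterCache(num_heavy=4, num_recent=4)
for t in range(100):
    x = [random.gauss(0.0, 1.0) for _ in range(8)]
    cache.step(x, x, x, orig_pos=t)
print("cache size:", len(cache.entries))
print("kept original positions:", [e.orig_pos for e in cache.entries])
```

Because positions come from cache slots, the largest position this sketch ever uses is `num_heavy + num_recent` (the slot of the newest token just before eviction), even after 100 tokens. That bounded range is what lets the processed sequence outrun the pretrained context window.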