Skip to content

Question on Quest implemenation in KV cache selection #4

@Monstertail

Description

@Monstertail

Thanks for your great work and corresponding implementation of baselines! It really benefits the future work a lot!
I have a question about the implementation of KV cache selection in Quest.

It looks like in this repo, the Quest cache will select all the generated tokens (see

self.selected_key_cache[layer_idx] = torch.cat([self.selected_key_cache[layer_idx], key_states], dim=-2)
),
The selection of tokens is limited to the prompt tokens because the KV page is built only on prefill.

It looks like the original Quest implementation (https://github.com/mit-han-lab/Quest/blob/main/evaluation/quest_attention.py ) will dynamically update the KV pages during the decoding.

Will you take the implementation of dynamic KV page updating into consideration? I implemented a very simple but not perfect version for this here: https://github.com/Monstertail/MagicPIG/blob/b635d06ae2c68c1d2949f2e95f358fb5746f6108/RULER/RULER/scripts/pred/quest_cache.py#L253 . If you are interested, we can think about how to make it better together.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions