Fix hidden states and quant kv cache #10854


Open
wants to merge 5 commits into
base: develop
from

Conversation

Li-Z-Q
Contributor

@Li-Z-Q Li-Z-Q commented Jul 17, 2025

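  1. Support returning hidden_states from the quantized code path
  2. Support quantized loading for embedding models, with two modes: weight_only_int8 and weight_only_int4
  3. When an embedding model is loaded with quantization, preallocate only the first layer's kv_cache and reuse it in subsequent computation, which reduces GPU memory usage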
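To illustrate the memory effect of item 3, here is a minimal sketch, not the code from this PR. It assumes each layer's cache holds a key tensor and a value tensor of shape `[batch_size, num_kv_heads, max_seq_len, head_dim]`, and that an embedding model runs a single forward pass in which each layer's K/V is consumed inside that layer, so one buffer can be shared. `CacheShape`, `allocate_kv_caches`, and `allocated_bytes` are hypothetical names, and `bytearray` stands in for a GPU tensor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheShape:
    """Hypothetical description of one layer's KV cache (assumed layout)."""
    batch_size: int
    num_kv_heads: int
    max_seq_len: int
    head_dim: int
    dtype_bytes: int = 2  # e.g. float16 / bfloat16

    def bytes_per_layer(self) -> int:
        # The factor 2 covers the separate key and value tensors.
        return (2 * self.batch_size * self.num_kv_heads
                * self.max_seq_len * self.head_dim * self.dtype_bytes)


def allocate_kv_caches(num_layers: int, shape: CacheShape, reuse_first_layer: bool):
    """Return one cache entry per layer.

    With reuse_first_layer=True, every entry refers to the same buffer,
    so only a single layer's worth of memory is actually allocated.
    """
    if reuse_first_layer:
        shared = bytearray(shape.bytes_per_layer())
        return [shared] * num_layers
    return [bytearray(shape.bytes_per_layer()) for _ in range(num_layers)]


def allocated_bytes(caches) -> int:
    """Sum the sizes of distinct buffers, counting shared ones once."""
    unique = {id(buf): len(buf) for buf in caches}
    return sum(unique.values())


if __name__ == "__main__":
    shape = CacheShape(batch_size=1, num_kv_heads=2, max_seq_len=16, head_dim=8)
    per_layer = allocated_bytes(allocate_kv_caches(4, shape, reuse_first_layer=False))
    shared = allocated_bytes(allocate_kv_caches(4, shape, reuse_first_layer=True))
    print(f"per-layer allocation: {per_layer} bytes, shared buffer: {shared} bytes")
```

Under these assumptions the cache footprint shrinks by a factor equal to the number of layers. Sharing would not be valid for autoregressive decoding, where every layer's K/V must persist across steps.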


paddle-bot bot commented Jul 17, 2025

Thanks for your contribution!


2 participants