Implement DISCO #10904

Open
wants to merge 7 commits into
base: develop

Conversation

micelvrice
Contributor

Before submitting

  • Lint code. If there are lint issues, please format the code first.
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --files XXXX.py
  • Add test cases to the tests folder. If there are codecov issues, please add test cases first.

PR types

PR changes

Description

Implementation of DISCO, a KV cache compression algorithm.

…LaMA

- Add DISCO configuration parameters to LlamaConfig
- Create DISCOCache class for intelligent token eviction
- Integrate DISCO into LlamaAttention and LlamaModel
- Add comprehensive unit tests and examples
- Include detailed documentation

DISCO reduces KV cache memory to ~3.2% of the uncompressed size while maintaining model quality
through adaptive layer-wise allocation and attention-based scoring.
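
The commit message does not say how the layer-wise allocation is computed. As a rough illustration only, the sketch below splits a global token budget across layers in proportion to a per-layer weight. The function name `allocate_layer_budgets`, the `layer_weights` input, and the proportional rule are all assumptions made for this example, not the PR's code; only the 3.2% ratio comes from the description above.

```python
def allocate_layer_budgets(layer_weights, seq_len, keep_ratio=0.032, min_per_layer=1):
    """Split a global KV-cache token budget across layers.

    Hypothetical helper: `layer_weights` is an assumed per-layer importance
    signal (for example, derived from attention statistics).
    """
    num_layers = len(layer_weights)
    # Global budget: keep_ratio of all cached tokens, summed over layers.
    total = max(num_layers * min_per_layer, round(keep_ratio * seq_len * num_layers))
    weight_sum = sum(layer_weights)
    # Proportional share per layer, floored; the shares may sum to slightly
    # less than `total` because of the integer truncation.
    budgets = [max(min_per_layer, int(total * w / weight_sum)) for w in layer_weights]
    # A layer can never keep more tokens than the sequence contains.
    return [min(b, seq_len) for b in budgets]


budgets = allocate_layer_budgets([1.0, 1.0, 2.0], seq_len=1000)
print(budgets)
```

With three layers and 1000 cached tokens per layer, the global budget is 96 tokens, and the layer with twice the weight receives twice the share.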

paddle-bot bot commented Aug 3, 2025

Thanks for your contribution!

- Fixed index_select error where indices exceeded sequence length
- Changed eviction logic to properly handle scores in sequence dimension
- Average scores across batch and heads before selecting top-k tokens
- Ensure indices are within valid range for the sequence

The issue was that we were flattening all dimensions and getting indices
that could exceed the sequence length. Now we properly average across
batch and head dimensions first, then select top-k tokens from the
sequence dimension only.
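
The fix described above can be sketched with NumPy as follows. This is a minimal illustration, not the PR's Paddle code: the `[batch, heads, seq_len]` score layout, the `[batch, heads, seq_len, head_dim]` key/value layout, and the function names are assumptions.

```python
import numpy as np


def select_tokens_to_keep(scores: np.ndarray, budget: int) -> np.ndarray:
    """Return the sorted sequence indices of the highest-scoring tokens.

    scores: importance scores, assumed shape [batch, heads, seq_len].
    """
    seq_len = scores.shape[-1]
    k = min(budget, seq_len)
    # Average over batch and heads first, leaving one score per position.
    per_token = scores.mean(axis=(0, 1))  # shape [seq_len]
    # Top-k over the sequence dimension only, so every index is < seq_len.
    top = np.argsort(-per_token, kind="stable")[:k]
    # Sort so the kept tokens stay in their original order.
    return np.sort(top)


def evict(keys: np.ndarray, values: np.ndarray, scores: np.ndarray, budget: int):
    """Gather the kept positions along the sequence axis (axis 2, assumed)."""
    keep = select_tokens_to_keep(scores, budget)
    return keys[:, :, keep, :], values[:, :, keep, :], keep


scores = np.array([[[0.1, 0.9, 0.2, 0.8],
                    [0.3, 0.7, 0.1, 0.9]]])  # [1, 2, 4]
print(select_tokens_to_keep(scores, budget=2))
```

Flattening the score tensor before taking top-k would produce indices in the range `[0, batch * heads * seq_len)`, which is what made the original gather fail; reducing to a single `[seq_len]` vector first keeps every index valid.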
- Fixed eviction logic to not evict on first update (initialization)
- Corrected score averaging across batch and head dimensions
- Ensured indices are within valid sequence range
- Added standalone test script to verify all DISCO functionality

The key changes:
1. First cache update now preserves all tokens (no eviction on init)
2. Eviction only happens on subsequent updates when budget is exceeded
3. Scores are properly averaged before selecting top-k tokens
4. All 5 DISCO tests now pass successfully
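
The update behaviour listed in the key changes can be sketched as a small cache class. `SketchEvictingCache` is a hypothetical stand-in written for this illustration, not the PR's `DISCOCache`; the tensor layouts and the NumPy backend are assumptions.

```python
import numpy as np


class SketchEvictingCache:
    """Hypothetical KV cache that evicts low-scoring tokens past a budget."""

    def __init__(self, budget: int):
        self.budget = budget
        self.keys = None    # assumed layout [batch, heads, seq_len, head_dim]
        self.values = None
        self.scores = None  # assumed layout [batch, heads, seq_len]

    def update(self, keys, values, scores):
        if self.keys is None:
            # First update (initialization): keep every token, no eviction.
            self.keys, self.values, self.scores = keys, values, scores
            return self.keys, self.values
        # Later updates: append new tokens along the sequence axis.
        self.keys = np.concatenate([self.keys, keys], axis=2)
        self.values = np.concatenate([self.values, values], axis=2)
        self.scores = np.concatenate([self.scores, scores], axis=2)
        # Evict only when the cache has grown past its budget.
        if self.keys.shape[2] > self.budget:
            self._evict()
        return self.keys, self.values

    def _evict(self):
        # Average scores over batch and heads, then keep the top-k positions.
        per_token = self.scores.mean(axis=(0, 1))
        keep = np.sort(np.argsort(-per_token, kind="stable")[: self.budget])
        self.keys = self.keys[:, :, keep, :]
        self.values = self.values[:, :, keep, :]
        self.scores = self.scores[:, :, keep]


rng = np.random.default_rng(0)
cache = SketchEvictingCache(budget=4)
prompt_k = rng.normal(size=(1, 2, 6, 3))
k, _ = cache.update(prompt_k, prompt_k.copy(), rng.random((1, 2, 6)))
print(k.shape[2])  # all prompt tokens are still cached after the first update
step_k = rng.normal(size=(1, 2, 1, 3))
k, _ = cache.update(step_k, step_k.copy(), rng.random((1, 2, 1)))
print(k.shape[2])  # trimmed to the budget on the second update
```

In this sketch a prompt longer than the budget stays fully cached until the next update, which matches the "no eviction on init" behaviour described above.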

2 participants