ggerganov (Member)

  • Take into account split KV cache when computing N_KV
  • Do not measure KV cache copying when -pps is enabled
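The first bullet can be illustrated with a small counting model. This is a minimal sketch, not the batched-bench implementation. It assumes that N_KV counts the KV cache cells a run occupies, that `-pps` selects the shared-prompt mode, and that a split (per-sequence) KV cache must hold its own copy of a shared prompt for each of the `pl` parallel sequences, while a unified cache stores it once. The function name and the parameter names `pp`, `tg`, `pl`, `pp_shared` and `kv_unified` are hypothetical labels chosen for this example.

```cpp
// Hypothetical helper (not llama.cpp code): number of KV cache cells used by
// one batched-bench configuration, under the assumptions stated above.
//   pp         - prompt tokens per sequence
//   tg         - tokens generated per sequence
//   pl         - number of parallel sequences
//   pp_shared  - all sequences share a single prompt (assumed -pps case)
//   kv_unified - one KV buffer for all sequences instead of one per sequence
constexpr int kv_cells_used(int pp, int tg, int pl, bool pp_shared, bool kv_unified) {
    return pp_shared
        // shared prompt: stored once in a unified cache, but copied into
        // every sequence's buffer when the cache is split
        ? (kv_unified ? pp : pl * pp) + pl * tg
        // independent prompts: each sequence holds its own prompt and output
        : pl * (pp + tg);
}

// Example: 4 sequences, 512-token shared prompt, 128 generated tokens each.
static_assert(kv_cells_used(512, 128, 4, true, true) == 1024, "unified cache stores the prompt once");
static_assert(kv_cells_used(512, 128, 4, true, false) == 2560, "split cache stores it per sequence");
```

Under this model the two cache layouts give the same count when `pl` is 1, so the distinction only affects runs with several parallel sequences.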

@ggerganov ggerganov merged commit 6b64f74 into master Aug 25, 2025
47 of 48 checks passed
@ggerganov ggerganov deleted the gg/batched-bench-pps branch August 25, 2025 10:56
Minh141120 pushed a commit to menloresearch/llama.cpp that referenced this pull request Aug 26, 2025
…5562)

* batched-bench : fix unified KV cache handling + pp timing

* cont : run dummy token only with split KV cache
Minh141120 pushed a commit to menloresearch/llama.cpp that referenced this pull request Aug 27, 2025
…5562)

* batched-bench : fix unified KV cache handling + pp timing

* cont : run dummy token only with split KV cache