last_chunk_indices calculations fix #1024
jbyczkow wants to merge 2 commits into vllm-project:releases/v0.15.1 from
Conversation
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
Pull request overview
This PR updates how last_chunk_indices are computed for Mamba chunked-prefill metadata in the HPU model runner.
Changes:
- Replaced the per-request `last_chunk_indices` computation loop with a simplified calculation derived from `target_seq` and `chunk_size`.
- Added assertions around `chunk_size` and the derived chunk count.
```python
assert chunk_size > 0
nphysical_chunks = target_seq // chunk_size
assert nphysical_chunks > 0, (f"target_seq={target_seq} must be >= chunk_size={chunk_size}")
last_chunk_indices = [nphysical_chunks - 1 for _ in range(len(contents.req_ids))]
```
`last_chunk_indices` is now set to the same value for every request (`nphysical_chunks - 1`). Downstream (in `hpu_mamba_mixer2.py`), this tensor is used to index the per-chunk SSM states, which are then written back into `ssm_state` per request, so making all requests point at the same chunk will cause incorrect state updates (and can effectively make multiple sequences share the same final state). Compute per-request chunk indices based on how tokens are laid out (e.g., in the non-merged 2D layout, the i-th sequence's last chunk should be offset by `i * nphysical_chunks`).
Suggested change:
```python
last_chunk_indices = [
    i * nphysical_chunks + (nphysical_chunks - 1)
    for i in range(len(contents.req_ids))
]
```
```python
nphysical_chunks = target_seq // chunk_size
assert nphysical_chunks > 0, (f"target_seq={target_seq} must be >= chunk_size={chunk_size}")
```
nphysical_chunks = target_seq // chunk_size uses floor division, but the mamba chunk-scan implementation reshapes based on seqlen // chunk_size and effectively requires the sequence length to be an exact multiple of chunk_size. It would be safer to assert target_seq % chunk_size == 0 (with a clear error) or explicitly round/pad target_seq up to the next multiple before computing chunk indices, otherwise disabling/altering bucketing can lead to hard-to-debug shape errors or incorrect indexing.
Suggested change:
```python
# Mamba chunk-scan requires the sequence length to be an exact multiple of chunk_size.
assert target_seq % chunk_size == 0, (
    f"Mamba chunking requires target_seq ({target_seq}) to be an exact multiple "
    f"of chunk_size ({chunk_size})."
)
nphysical_chunks = target_seq // chunk_size
assert nphysical_chunks > 0, (
    f"target_seq={target_seq} must be >= chunk_size={chunk_size}"
)
```
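For the "round/pad `target_seq` up to the next multiple" alternative mentioned in the comment, a minimal sketch is below. The helper name `pad_to_chunk_multiple` is illustrative only, not an identifier from the vLLM source.

```python
def pad_to_chunk_multiple(target_seq: int, chunk_size: int) -> int:
    # Round target_seq up to the next multiple of chunk_size so the
    # chunk-scan reshape (seqlen // chunk_size) is always exact.
    assert chunk_size > 0
    return ((target_seq + chunk_size - 1) // chunk_size) * chunk_size

print(pad_to_chunk_multiple(100, 64))  # → 128 (padded up)
print(pad_to_chunk_multiple(128, 64))  # → 128 (already aligned)
```

Padding keeps the indexing code simple at the cost of a few wasted token slots; the assert-based variant instead surfaces misaligned bucketing immediately rather than producing shape errors later.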
No description provided.