Fix for CB inconsistency for qwen2_5_vl (#765)

asmigosw · web-flow · commit a8a008d0b5c6 · 2026-02-24T14:47:04.000+05:30
For Qwen2_5_vl, `decode_inputs["position_ids"][decode_batch_id]` has
shape (4, 1), and the code was updating only the position ID at the last
index of the last array. This change updates the last index of all
arrays, for every batch.
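
The indexing difference can be reproduced in isolation. The sketch below is illustrative only: it assumes the inputs are NumPy arrays and that the full `position_ids` array is laid out as (batch, 4, 1), which is inferred from the per-batch shape stated above. The array sizes and the `flat` example are hypothetical.

```python
import numpy as np

# Hypothetical layout for illustration: 2 decode slots, 4 position rows, 1 decode step.
position_ids = np.zeros((2, 4, 1), dtype=np.int64)
decode_batch_id = 0

# Old indexing: [decode_batch_id, -1] selects only the last of the 4 rows.
old = position_ids.copy()
old[decode_batch_id, -1] += 1

# New indexing: [decode_batch_id][..., -1] selects the last position of every row.
new = position_ids.copy()
new[decode_batch_id][..., -1] += 1

# The same expression also works for an ordinary (batch, seq_len) layout.
flat = np.zeros((2, 1), dtype=np.int64)
flat[decode_batch_id][..., -1] += 1

print(old[decode_batch_id].ravel())  # [0 0 0 1]
print(new[decode_batch_id].ravel())  # [1 1 1 1]
print(flat.ravel())                  # [1 0]
```

With the old expression, three of the four position rows stay at their previous value, which matches the inconsistency described above. The ellipsis form leaves the other batch entries untouched in both layouts.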

---------

Signed-off-by: asmigosw <asmigosw@qti.qualcomm.com>
diff --git a/QEfficient/generation/text_generation_inference.py b/QEfficient/generation/text_generation_inference.py
@@ -956,7 +956,7 @@ def run_continuous_batching_decode(self, prompt_queue, generation_len):
                 else:
                     # If the generated sequence is valid and within generation len prepare for next decode
                     decode_inputs["input_ids"][decode_batch_id, -1] = next_token_id[decode_batch_id, -1]
-                    decode_inputs["position_ids"][decode_batch_id, -1] += 1
+                    decode_inputs["position_ids"][decode_batch_id][..., -1] += 1
                     self.generated_ids[batch_id_map[decode_batch_id], generated_id_current_index[decode_batch_id]] = (
                         next_token_id[decode_batch_id, -1]
                     )
