The previous merge unconditionally skipped decode[0], assuming it overlaps
with prefill[-1]. In practice, prefill=12 + decode=8 with expected_total=20
means there is NO overlap (12+8=20), so decode[0] should NOT be skipped.
Using decode[1:] produced 19 embeddings for a 20-token sequence,
causing misaligned talker input and garbled output.
Now _merge_pd_embeddings computes the overlap dynamically:

    overlap = max(0, prefill_len + decode_len - expected_total)

This correctly handles both the overlap and no-overlap cases.
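
For illustration, the overlap rule can be sketched in a few lines of Python. This is not the actual _merge_pd_embeddings code: the function name merge_pd_embeddings_sketch, its signature, and the use of plain sequences in place of embedding tensors are all assumptions made for the example.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def merge_pd_embeddings_sketch(
    prefill: Sequence[T],
    decode: Sequence[T],
    expected_total: int,
) -> List[T]:
    """Hypothetical sketch: merge prefill and decode entries without duplicates.

    Plain sequences stand in for per-token embeddings (an assumption).
    """
    # Count how many leading decode entries duplicate the tail of prefill.
    # If the two lengths already sum to expected_total, nothing overlaps.
    overlap = max(0, len(prefill) + len(decode) - expected_total)
    # Drop only the overlapping entries from the front of decode.
    return list(prefill) + list(decode[overlap:])


# No-overlap case from this commit: 12 + 8 = 20, so decode[0] is kept.
no_overlap = merge_pd_embeddings_sketch(list(range(12)), list(range(12, 20)), 20)

# Overlap case: decode[0] repeats prefill[-1], so 12 + 9 - 20 = 1 entry is dropped.
with_overlap = merge_pd_embeddings_sketch(list(range(12)), list(range(11, 20)), 20)
```

Both calls yield 20 entries. The sketch does not handle the case where prefill and decode together fall short of expected_total; it simply returns what it has.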
Also added diagnostic logging for prompt_len, output_len, expected_total,
and actual embedding shapes to make future debugging easier.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>