Hi, @Chillee
I am trying to reproduce batch-invariant inference on the AIME24 dataset using the provided batch_invariant_ops.
I ran two different settings:
- Different batch sizes (16 vs. 12)
- Different sample shuffle orders
For each setting, I use greedy decoding with max_new_tokens=4096, and for most samples the outputs are perfectly batch-invariant.
However, I still observed that two samples diverge during generation.
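For reference, the comparison loop is roughly the sketch below (illustrative only; it assumes a standard Hugging Face transformers generate() harness with left padding, and `aime_prompts` is a stand-in for the formatted AIME24 questions):

```python
# Run with the batch_invariant_ops patches enabled.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B", padding_side="left")
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B", torch_dtype=torch.float16
).to("cuda").eval()

aime_prompts = ["..."]  # stand-in for the formatted AIME24 questions

@torch.no_grad()
def greedy_generate(prompts, batch_size):
    outs = []
    for i in range(0, len(prompts), batch_size):
        batch = tok(prompts[i:i + batch_size], return_tensors="pt", padding=True).to("cuda")
        ids = model.generate(**batch, do_sample=False, max_new_tokens=4096)
        # Strip the (left-padded) prompt before decoding.
        outs.extend(tok.batch_decode(ids[:, batch.input_ids.shape[1]:], skip_special_tokens=True))
    return outs

out_16 = greedy_generate(aime_prompts, 16)
out_12 = greedy_generate(aime_prompts, 12)
print("diverging samples:", [i for i, (a, b) in enumerate(zip(out_16, out_12)) if a != b])
```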
Here is the comparison of output token counts across the two batch_size settings; the outputs of samples 4 and 22 are different.

Here is the divergence point: the top-1 token's logit differs.
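Locating the first divergent step can be done roughly as in this sketch (names are illustrative; re-scoring the shared prefix in a standalone forward pass will not bit-match the batched run's internal logits, but it pinpoints where the top-1 choice flips):

```python
import torch

@torch.no_grad()
def first_divergence(model, tok, prompt, ids_a, ids_b):
    """ids_a / ids_b: generated token ids from the batch_size=16 and batch_size=12 runs."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids.to(model.device)
    for t, (a, b) in enumerate(zip(ids_a, ids_b)):
        if a == b:
            continue
        # Re-score the shared prefix and inspect the logits of the two candidate tokens.
        prefix = torch.tensor([ids_a[:t]], dtype=torch.long, device=model.device)
        logits = model(torch.cat([prompt_ids, prefix], dim=1)).logits[0, -1]
        print(f"step {t}: run A chose {a} (logit {logits[a].item():.6f}), "
              f"run B chose {b} (logit {logits[b].item():.6f})")
        return t
    return None
```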

Environment / Setup:
- GPU: one L20 48 GB
- Model: Qwen3-1.7B
- Precision: fp16
- Sampling param seed: 114514
- Decoding: greedy
- Modifications: the only change I made was adjusting BLOCK_N in batch_invariant_ops from 256 to 128 for fp16 (see the sanity-check sketch below); nothing else was modified.
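As a sanity check on the matmul path itself, here is a minimal sketch in plain PyTorch (shapes are illustrative; run it with the batch-invariant patches active): a single row of an fp16 matmul should come out bit-identical whether it is computed alone or inside a batch.

```python
import torch

torch.manual_seed(0)
w = torch.randn(2048, 2048, dtype=torch.float16, device="cuda")
x = torch.randn(16, 2048, dtype=torch.float16, device="cuda")

row_in_batch = (x @ w)[:1]   # row 0 computed as part of a batch of 16
row_alone = x[:1] @ w        # the same row computed on its own
print("max abs diff:", (row_in_batch - row_alone).abs().max().item())  # expect exactly 0.0
```

If this already reports a nonzero difference after the BLOCK_N change, the kernel config would be the natural first suspect; if it stays at zero, the divergence presumably comes from somewhere outside the patched ops.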
Do you have any suggestions on what might cause this non-determinism?
Is there anything else I should patch or verify in addition to batch_invariant_ops?
Thanks!