
Failed to replicate batch invariance using vLLM inference #6

@fopdoodle8

Description

Hi, @Chillee

I am trying to reproduce batch-invariant inference on the AIME24 dataset using the provided batch_invariant_ops.
I ran two different settings:

  • Different batch sizes (16 vs. 12)
  • Different sample shuffle orders

For each setting I use greedy decoding with max_new_tokens=4096, and for most samples the outputs are perfectly batch-invariant.
However, two samples still diverge during generation; a sketch of the comparison setup is below.
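
Roughly, the comparison looks like this (a minimal sketch, not my exact script: `load_aime24_prompts` is a placeholder helper, I pin the batch composition via vLLM's `max_num_seqs`, and the `set_batch_invariant_mode` usage follows the repo README as I understand it):

```python
from vllm import LLM, SamplingParams
from batch_invariant_ops import set_batch_invariant_mode  # per the repo README

prompts = load_aime24_prompts()  # placeholder helper returning the AIME24 problems
params = SamplingParams(temperature=0.0, max_tokens=4096, seed=114514)  # greedy decoding

def run(prompts, max_num_seqs):
    # max_num_seqs caps how many sequences are batched together, so the two
    # runs see different batch compositions (16 vs. 12 concurrent sequences).
    # In practice I run each setting as a separate process.
    llm = LLM(model="Qwen/Qwen3-1.7B", dtype="float16", max_num_seqs=max_num_seqs)
    with set_batch_invariant_mode():
        outputs = llm.generate(prompts, params)
    return [list(o.outputs[0].token_ids) for o in outputs]

tokens_bs16 = run(prompts, max_num_seqs=16)
tokens_bs12 = run(prompts, max_num_seqs=12)

mismatched = [i for i, (a, b) in enumerate(zip(tokens_bs16, tokens_bs12)) if a != b]
print("samples with differing outputs:", mismatched)  # samples 4 and 22 in my runs
```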

Here is a comparison of the generated token counts across the two batch_size settings; the outputs of samples 4 and 22 differ:
[screenshot: per-sample token counts, batch size 16 vs. 12]

Here is the divergence point, where the top-1 token's logit differs between the two runs:
[screenshot: logits at the first divergent decoding step]
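
To locate that point I simply compare the two token streams position by position (using the `tokens_bs16` / `tokens_bs12` lists from the sketch above):

```python
def first_divergence(tokens_a, tokens_b):
    """Return the first decoding step at which two greedy decodes differ."""
    for t, (x, y) in enumerate(zip(tokens_a, tokens_b)):
        if x != y:
            return t
    # Identical over the overlap; the sequences may still differ in length.
    return None if len(tokens_a) == len(tokens_b) else min(len(tokens_a), len(tokens_b))

# Example: sample 4 from the two batch-size runs above.
print("sample 4 first diverges at step", first_divergence(tokens_bs16[4], tokens_bs12[4]))
```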

Environment / Setup:

  • GPU: one L20 (48 GB)
  • Model: Qwen3-1.7B
  • Precision: fp16
  • Sampling param seed: 114514
  • Decoding: greedy decoding
  • Modifications: the only change I made was adjusting BLOCK_N in batch_invariant_ops (from 256 → 128) for fp16; no other modifications were made (see the sanity-check sketch after this list).
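
As a quick sanity check that the tile-size change by itself does not break invariance of a single op, I can compare one matmul at two batch sizes bitwise (a minimal sketch, following the repo README's usage of `set_batch_invariant_mode` as I understand it):

```python
import torch
from batch_invariant_ops import set_batch_invariant_mode

torch.manual_seed(0)
a = torch.randn(16, 2048, device="cuda", dtype=torch.float16)
w = torch.randn(2048, 2048, device="cuda", dtype=torch.float16)

with set_batch_invariant_mode():
    full = torch.mm(a, w)        # row 0 computed as part of a batch of 16
    single = torch.mm(a[:1], w)  # the same row computed alone

# Bitwise equality should hold if the patched matmul is batch-invariant.
print("bitwise equal:", torch.equal(full[:1], single))
```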

Do you have any suggestions on what might cause this non-determinism?
Is there anything else I should patch or verify in addition to batch_invariant_ops?

Thanks!
