fix: fix the failing sampling unittest on RTX 5090 (#1886)

yzh119 · web-flow · commit 0522a321fb70 · 2025-10-08T14:08:09.000-07:00
## 📌 Description

Applying softmax followed by `top_k_renorm` (softmax -> top_k_renorm) does not guarantee bitwise-identical results compared with `top_k_mask` followed by softmax (top_k_mask -> softmax). The two pipelines are mathematically equivalent, but they round differently, and a tiny difference in the renormalized probabilities can occasionally change which token the subsequent top-p sampling selects. In this PR we relax the test's exact-equality assertion to tolerate a sample mismatch rate of up to 1%.

## 🔍 Related Issues

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).

## Reviewer Notes
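The effect described above can be illustrated with a small pure-Python sketch. It is not FlashInfer code. It assumes float64 CPU arithmetic, a plain inverse-CDF sampler, and a simple sort-based top-p filter in place of the library's GPU kernels. All helper names (`softmax_then_renorm`, `mask_then_softmax`, `top_p_filter`, `inverse_cdf_sample`, `match_rate`) are hypothetical and exist only for this example. The sketch shows three things. First, the two pipelines agree only up to rounding. Second, a one-ulp difference that straddles the uniform draw is enough to change the sampled index. Third, a match-rate threshold can replace exact equality. Each row reuses one uniform draw for both pipelines, loosely mirroring the seeded generators in the test.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats (-inf entries map to 0.0)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(logits, k):
    """Indices of the k largest logits. Softmax is monotone, so both pipelines share this set."""
    return set(sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k])


def softmax_then_renorm(logits, k):
    """Hypothetical pipeline A (softmax -> top-k renorm): full softmax, zero non-top-k, renormalize."""
    keep = top_k_indices(logits, k)
    probs = softmax(logits)
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]


def mask_then_softmax(logits, k):
    """Hypothetical pipeline B (top-k mask -> softmax): set non-top-k logits to -inf, then softmax."""
    keep = top_k_indices(logits, k)
    masked = [x if i in keep else -math.inf for i, x in enumerate(logits)]
    return softmax(masked)


def top_p_filter(probs, p):
    """Keep the smallest high-probability prefix whose mass reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def inverse_cdf_sample(probs, u):
    """Return the first index whose running cumulative sum exceeds u, with u in [0, 1)."""
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if cumulative > u:
            return i
    # Rounding can leave the total just below u; fall back to the last nonzero entry.
    return max(i for i, p in enumerate(probs) if p > 0)


def match_rate(samples, samples_ref):
    """Fraction of positions where the two sample lists agree."""
    matches = sum(a == b for a, b in zip(samples, samples_ref))
    return matches / len(samples)


# A one-ulp change that straddles the uniform draw u flips the sampled index.
assert inverse_cdf_sample([0.5, 0.5], 0.5) == 1
assert inverse_cdf_sample([math.nextafter(0.5, 1.0), 0.5], 0.5) == 0

# Compare both pipelines row by row, reusing one uniform draw per row for both.
rng = random.Random(0)
batch_size, vocab_size, k, top_p = 989, 128, 10, 0.9
samples, samples_ref = [], []
for _ in range(batch_size):
    logits = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
    u = rng.random()
    probs_a = softmax_then_renorm(logits, k)
    probs_b = mask_then_softmax(logits, k)
    # Mathematically equal; numerically equal only up to rounding error.
    assert max(abs(a - b) for a, b in zip(probs_a, probs_b)) < 1e-12
    samples.append(inverse_cdf_sample(top_p_filter(probs_a, top_p), u))
    samples_ref.append(inverse_cdf_sample(top_p_filter(probs_b, top_p), u))

# Tolerate rare flips instead of requiring exact equality.
rate = match_rate(samples, samples_ref)
assert rate >= 0.99, f"match rate {rate:.2%} is below the 99% threshold"
```

In float64 on the CPU, the simulated batch will usually match on every row. The value of the threshold check lies in its shape: it stays robust when lower precision or a different reduction order on the GPU makes occasional boundary flips more likely, whereas a bitwise `torch.all(samples == samples_ref)` fails on the first one.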
diff --git a/tests/utils/test_sampling.py b/tests/utils/test_sampling.py
@@ -377,7 +377,18 @@ def test_top_k_top_p_sampling_from_probs_logits_alignment(batch_size, vocab_size
         filter_apply_order="top_k_first",
         generator=generator_probs,
     )
-    assert torch.all(samples == samples_ref)
+
+    num_matches = (samples == samples_ref).sum().item()
+    match_rate = num_matches / samples.numel()
+
+    # NOTE(Zihao): Applying softmax followed by top_k_renorm (softmax -> top_k_renorm)
+    # does not guarantee bitwise-identical results compared to top_k_mask followed by softmax (top_k_mask -> softmax).
+    # This may cause slight differences in subsequent top-p sampling.
+    # We tolerate up to a 1% mismatch rate.
+    assert match_rate >= 0.99, (
+        f"Sample match rate {match_rate:.2%} is below threshold "
+        f"({batch_size - num_matches}/{batch_size} mismatches, expected <=1%)"
+    )
 
 
 @pytest.mark.parametrize("batch_size", [1, 99, 989])