[Spark unit test] Adjust tolerance for test_xqa, test_logits_processor #2828

Open
kahyunnam wants to merge 2 commits into flashinfer-ai:main from kahyunnam:knam/spark-tolerance-adjust

Conversation

Collaborator

@kahyunnam kahyunnam commented Mar 19, 2026

📌 Description

Adjust tolerance slightly to avoid unnecessary noise in Spark unit testing.

I found these Spark failures in the following pipelines: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/284263159, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/284263158

=================================== FAILURES ===================================
E   AssertionError: Batch validation failed: Total 4096 elements, only 4052 (98.9%) meet tolerance criteria, require at least 99.0%
    assert 0.9892578125 >= 0.99
/tmp/flashinfer/tests/attention/test_xqa.py:463: AssertionError: Batch validation failed: Total 4096 elements, only 4052 (98.9%) meet tolerance criteria, require at least 99.0%
=========================== short test summary info ============================
FAILED tests/attention/test_xqa.py::test_xqa[True-1.0-True-0.5-HND-8-128-16-4-1-512-True-input_type1-True-False]
= 1 failed, 41119 passed, 1 xfailed, 10239 xpassed, 1 warning in 591.93s (0:09:51) =
❌ FAILED: tests/attention/test_xqa.py
==========================================
=================================== FAILURES ===================================
E   AssertionError: Too many differences: 2.02%
    assert 0.020202020202020204 < 0.02
/tmp/flashinfer/tests/utils/test_logits_processor.py:828: AssertionError: Too many differences: 2.02%
=========================== short test summary info ============================
FAILED tests/utils/test_logits_processor.py::TestLogitsPipeVsSamplingOps::test_sequential_probs_topk_topp_sample[0.1-128256-99]
====== 1 failed, 590 passed, 12 skipped, 1 warning in 1621.97s (0:27:01) =======
❌ FAILED: tests/utils/test_logits_processor.py
==========================================
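For context, the kind of check behind the first failure can be sketched as follows. This is a minimal illustrative sketch, not the actual test code: the helper name batch_pass_ratio and the atol/rtol values are hypothetical, while the 4052-of-4096 figures and the 0.99 threshold come from the log above.

```python
# Sketch of a batch pass-ratio check like the one that failed in
# tests/attention/test_xqa.py. Helper name and tolerances are illustrative.

def batch_pass_ratio(out, ref, atol=1e-2, rtol=1e-2):
    """Fraction of elements whose absolute error is within atol + rtol * |ref|."""
    ok = sum(abs(o - r) <= atol + rtol * abs(r) for o, r in zip(out, ref))
    return ok / len(ref)

# From the log: 4052 of 4096 elements passing gives 98.93%, which fails the
# old 99% requirement but would pass a relaxed 98% requirement.
ratio = 4052 / 4096
assert not (ratio >= 0.99)
assert ratio >= 0.98
```

The point of the adjustment is that a fixed 99% bar leaves fewer than one element in a hundred of slack, which small floating-point drifts on new hardware can easily exceed.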

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adjusts the tolerance levels in two unit tests, test_xqa and test_logits_processor, to prevent spurious failures caused by minor floating-point precision differences. The changes aim to make the tests more robust and less prone to "noise" without compromising their effectiveness in catching actual regressions.

Highlights

  • test_xqa tolerance adjustment: The required_ratio for batch validation in test_xqa was reduced from 0.99 to 0.98 to accommodate minor floating-point differences and reduce test flakiness.
  • test_logits_processor tolerance adjustment: The diff_ratio assertion in test_sequential_probs_topk_topp_sample was updated to use a dynamic threshold, max(0.03, 2.0 / batch_size), to better handle floating-point precision variations across different batch sizes.
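The second change can be illustrated with a short sketch. The function name mismatch_threshold is hypothetical; the max(0.03, 2.0 / batch_size) formula is taken from the PR, and the 2-mismatches-in-99 case matches the failing log above.

```python
# Illustrative comparison of the old fixed threshold and the new
# batch-size-dependent one for test_logits_processor.

def mismatch_threshold(batch_size):
    # At least 0.03, and never stricter than "roughly 2 mismatches allowed".
    return max(0.03, 2.0 / batch_size)

# The failing case: 2 mismatches out of 99 samples is ~2.02%, which fails a
# fixed 0.02 threshold but passes the dynamic one (0.03 for batch_size=99).
diff_ratio = 2 / 99
assert not (diff_ratio < 0.02)
assert diff_ratio < mismatch_threshold(99)
```

For small batches the dynamic floor dominates (e.g. 2.0 / 50 = 0.04 for a batch of 50), which is exactly the breadth the Gemini review below flags as worth scrutinizing.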



Contributor

coderabbitai bot commented Mar 19, 2026

📝 Walkthrough


Test validation thresholds are adjusted in two test files. The batch-level pass-ratio requirement is loosened from 0.99 to 0.98 in test_xqa, and the logits processor test switches from a fixed 0.02 mismatch threshold to a batch-size-dependent formula.

Changes

  • Test Validation Thresholds (tests/attention/test_xqa.py, tests/utils/test_logits_processor.py): relaxed test assertion tolerances. The batch pass-ratio threshold is reduced from 0.99 to 0.98, and the logits processor mismatch threshold changes from a fixed 0.02 to a dynamic max(0.03, 2.0 / batch_size), with inline documentation.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • fix flaky xqa test #2126 — Modifies the same test file (tests/attention/test_xqa.py) with adjustments to batch-level validation thresholds.

Suggested reviewers

  • yzh119
  • bkryu
  • aleozlx
  • cyx-6
  • jimmyzho
  • nv-yunzheq

Poem

🐰 When tolerances were tight, precision reigned supreme,
But batches whispered softly of a gentler dream,
So we loosened the chains—0.99 down to 0.98,
And dynamic thresholds dance with batch size's fate! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): the title 'Adjust tolerance for test_xqa, test_logits_processor' clearly and concisely summarizes the main change, adjusting test tolerances in two specific test files.
  • Description check (✅ Passed): the description explains the purpose (adjusting tolerance), provides context with specific test failures and logs, and includes completed checklist items.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request adjusts test tolerances in test_xqa and test_logits_processor to mitigate issues related to floating-point precision. While relaxing tolerances can help with test flakiness, it's crucial to ensure that these adjustments do not inadvertently mask actual bugs or regressions. The change in test_logits_processor.py introduces a particularly broad tolerance for small batch sizes, which warrants further scrutiny.

Collaborator

@aleozlx aleozlx left a comment


lgtm

@kahyunnam kahyunnam enabled auto-merge (squash) March 19, 2026 20:14
@kahyunnam
Collaborator Author

Still need an approver for tests/attention/ directory ... @bkryu / @nv-yunzheq / @saltyminty / @yzh119


3 participants