[Spark unit test] Adjust tolerance for test_xqa, test_logits_processor #2828

Open
kahyunnam wants to merge 2 commits into flashinfer-ai:main from kahyunnam:knam/spark-tolerance-adjust

Conversation

Collaborator

@kahyunnam kahyunnam commented Mar 19, 2026

📌 Description

Adjust tolerance slightly to avoid unnecessary noise in Spark unit testing.

I found these Spark failures in the following pipelines: https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/284263159, https://gitlab-master.nvidia.com/dl/flashinfer/flashinfer-ci/-/jobs/284263158

=================================== FAILURES ===================================
E   AssertionError: Batch validation failed: Total 4096 elements, only 4052 (98.9%) meet tolerance criteria, require at least 99.0%
    assert 0.9892578125 >= 0.99
/tmp/flashinfer/tests/attention/test_xqa.py:463: AssertionError: Batch validation failed: Total 4096 elements, only 4052 (98.9%) meet tolerance criteria, require at least 99.0%
=========================== short test summary info ============================
FAILED tests/attention/test_xqa.py::test_xqa[True-1.0-True-0.5-HND-8-128-16-4-1-512-True-input_type1-True-False]
= 1 failed, 41119 passed, 1 xfailed, 10239 xpassed, 1 warning in 591.93s (0:09:51) =
❌ FAILED: tests/attention/test_xqa.py
==========================================
=================================== FAILURES ===================================
E   AssertionError: Too many differences: 2.02%
    assert 0.020202020202020204 < 0.02
/tmp/flashinfer/tests/utils/test_logits_processor.py:828: AssertionError: Too many differences: 2.02%
=========================== short test summary info ============================
FAILED tests/utils/test_logits_processor.py::TestLogitsPipeVsSamplingOps::test_sequential_probs_topk_topp_sample[0.1-128256-99]
====== 1 failed, 590 passed, 12 skipped, 1 warning in 1621.97s (0:27:01) =======
❌ FAILED: tests/utils/test_logits_processor.py
==========================================
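For context, the kind of check behind the first failure can be sketched as follows. This is a minimal illustrative sketch, not the actual test code: the helper name batch_pass_ratio and the atol/rtol values are hypothetical, while the 4052-of-4096 figures and the 0.99 threshold come from the log above.

```python
# Sketch of a batch pass-ratio check like the one that failed in
# tests/attention/test_xqa.py. Helper name and tolerances are illustrative.

def batch_pass_ratio(out, ref, atol=1e-2, rtol=1e-2):
    """Fraction of elements whose absolute error is within atol + rtol * |ref|."""
    ok = sum(abs(o - r) <= atol + rtol * abs(r) for o, r in zip(out, ref))
    return ok / len(ref)

# From the log: 4052 of 4096 elements passing gives 98.93%, which fails the
# old 99% requirement but would pass a relaxed 98% requirement.
ratio = 4052 / 4096
assert not (ratio >= 0.99)
assert ratio >= 0.98
```

The point of the adjustment is that a fixed 99% bar leaves fewer than one element in a hundred of slack, which small floating-point drifts on new hardware can easily exceed.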

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adjusts the tolerance levels in two unit tests, test_xqa and test_logits_processor, to prevent spurious failures caused by minor floating-point precision differences. The changes aim to make the tests more robust and less prone to "noise" without compromising their effectiveness in catching actual regressions.

Highlights

  • test_xqa tolerance adjustment: The required_ratio for batch validation in test_xqa was reduced from 0.99 to 0.98 to accommodate minor floating-point differences and reduce test flakiness.
  • test_logits_processor tolerance adjustment: The diff_ratio assertion in test_sequential_probs_topk_topp_sample was updated to use a dynamic threshold, max(0.03, 2.0 / batch_size), to better handle floating-point precision variations across different batch sizes.
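The second change can be illustrated with a short sketch. The function name mismatch_threshold is hypothetical; the max(0.03, 2.0 / batch_size) formula is taken from the PR, and the 2-mismatches-in-99 case matches the failing log above.

```python
# Illustrative comparison of the old fixed threshold and the new
# batch-size-dependent one for test_logits_processor.

def mismatch_threshold(batch_size):
    # At least 0.03, and never stricter than "roughly 2 mismatches allowed".
    return max(0.03, 2.0 / batch_size)

# The failing case: 2 mismatches out of 99 samples is ~2.02%, which fails a
# fixed 0.02 threshold but passes the dynamic one (0.03 for batch_size=99).
diff_ratio = 2 / 99
assert not (diff_ratio < 0.02)
assert diff_ratio < mismatch_threshold(99)
```

For small batches the dynamic floor dominates (e.g. 2.0 / 50 = 0.04 for a batch of 50), which is exactly the breadth the Gemini review below flags as worth scrutinizing.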



Contributor

coderabbitai bot commented Mar 19, 2026

📝 Walkthrough


Test validation thresholds are adjusted in two test files. The batch-level pass-ratio requirement is loosened from 0.99 to 0.98 in test_xqa, and the logits processor test switches from a fixed 0.02 mismatch threshold to a batch-size-dependent formula.

Changes

  • Test Validation Thresholds (tests/attention/test_xqa.py, tests/utils/test_logits_processor.py): relaxed test assertion tolerances. The batch pass-ratio threshold is reduced from 0.99 to 0.98, and the logits processor mismatch threshold changes from a fixed 0.02 to a dynamic max(0.03, 2.0 / batch_size), with inline documentation.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • fix flaky xqa test #2126 — Modifies the same test file (tests/attention/test_xqa.py) with adjustments to batch-level validation thresholds.

Suggested reviewers

  • yzh119
  • bkryu
  • aleozlx
  • cyx-6
  • jimmyzho
  • nv-yunzheq

Poem

🐰 When tolerances were tight, precision reigned supreme,
But batches whispered softly of a gentler dream,
So we loosened the chains—0.99 down to 0.98,
And dynamic thresholds dance with batch size's fate! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): the title 'Adjust tolerance for test_xqa, test_logits_processor' clearly and concisely summarizes the main change, adjusting test tolerances in two specific test files.
  • Description check (✅ Passed): the description explains the purpose (adjusting tolerance), provides context with specific test failures and logs, and includes completed checklist items.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request adjusts test tolerances in test_xqa and test_logits_processor to mitigate issues related to floating-point precision. While relaxing tolerances can help with test flakiness, it's crucial to ensure that these adjustments do not inadvertently mask actual bugs or regressions. The change in test_logits_processor.py introduces a particularly broad tolerance for small batch sizes, which warrants further scrutiny.

Collaborator

@aleozlx aleozlx left a comment


lgtm

@kahyunnam kahyunnam enabled auto-merge (squash) March 19, 2026 20:14
@kahyunnam
Collaborator Author

Still need an approver for tests/attention/ directory ... @bkryu / @nv-yunzheq / @saltyminty / @yzh119


3 participants