fix: fall back to CUDA events when CUPTI driver version < 13.0 #2818
sha7doww wants to merge 4 commits into flashinfer-ai:main from
Conversation
Probe CUPTI activity tracing support inside the existing try block so that NotSupportedError on older drivers is caught by the existing fallback logic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verify that bench_gpu_time_with_cupti gracefully falls back to CUDA events when cupti.activity_enable raises (e.g. CUDA driver < 13.0). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 Walkthrough
The CUPTI availability check in the GPU benchmarking utility now requires both a
Code Review
This pull request correctly fixes a crash when using bench_gpu_time_with_cupti on systems with an older CUDA driver but a newer cupti-python library. The fix involves probing for driver support within the try...except block to trigger the existing fallback mechanism. A comprehensive test case has been added to ensure the fallback to CUDA events works as expected when cupti.activity_enable raises an exception. The changes are logical and well-tested. I have one minor suggestion to improve code clarity in the new test file.
tests/utils/test_cupti_fallback.py (outdated)

    fake_module = MagicMock()
    fake_module.cupti = fake_cupti

    real_import = __builtins__.__import__ if hasattr(__builtins__, "__import__") else __import__
This line is overly complex for getting a reference to the built-in __import__ function. In standard Python 3 environments, __import__ is directly available in the global scope. You can simplify this for better readability and maintainability.
Suggested change:

    - real_import = __builtins__.__import__ if hasattr(__builtins__, "__import__") else __import__
    + real_import = __import__
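As a sanity check, the simplified form behaves the same when used to route a fake module through the import machinery. This is an illustrative sketch of the pattern, not the test file's exact code; `fake_cupti` and `patched_import` are placeholder names:

```python
import builtins
from unittest.mock import MagicMock, patch

# In any standard Python 3 module, __import__ resolves straight to the
# builtin, so no hasattr(__builtins__, ...) dance is needed.
real_import = __import__
assert real_import is builtins.__import__

fake_cupti = MagicMock(name="cupti")

def patched_import(name, *args, **kwargs):
    # Route imports of a hypothetical 'cupti' package to the mock and
    # delegate everything else to the real import machinery.
    if name == "cupti":
        return fake_cupti
    return real_import(name, *args, **kwargs)

with patch("builtins.__import__", side_effect=patched_import):
    import cupti  # bound to fake_cupti while the patch is active

assert cupti is fake_cupti
```

After the `with` block exits, `builtins.__import__` is restored automatically, which is why saving `real_import` up front is all the bookkeeping the test needs.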
🧹 Nitpick comments (1)
tests/utils/test_cupti_fallback.py (1)
10-45: Prefer `bench_gpu_time(..., enable_cupti=True)` in this test. Line 39 currently tests the helper directly; switching to the unified benchmarking entrypoint better matches repository test guidance while still exercising the same fallback behavior.
♻️ Suggested update
    -from flashinfer.testing import bench_gpu_time_with_cupti
    +from flashinfer.testing import bench_gpu_time
    @@
    -    times = bench_gpu_time_with_cupti(
    +    times = bench_gpu_time(
             fn=torch.matmul,
             input_args=(a, b),
             repeat_iters=5,
             dry_run_iters=2,
             cold_l2_cache=False,
    +        enable_cupti=True,
         )

As per coding guidelines:
`tests/**/*.py`: Use `flashinfer.testing.bench_gpu_time()` for benchmarking kernels, preferring CUPTI timing with auto-fallback to CUDA events.
📒 Files selected for processing (2)
- flashinfer/testing/utils.py
- tests/utils/test_cupti_fallback.py
- Use bench_gpu_time(enable_cupti=True) public API instead of bench_gpu_time_with_cupti directly (CodeRabbit suggestion)
- Combine nested with statements (ruff SIM117)
- Add docstrings to inner helper

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
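The SIM117 item above refers to ruff's rule against needlessly nested `with` statements. A generic before/after illustration of that cleanup; the patched target and warning text here are placeholders, not the test's actual code:

```python
import warnings
from unittest.mock import patch

# Before: nested form that ruff flags as SIM117.
with patch("platform.system", return_value="Linux"):
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        warnings.warn("falling back to CUDA events")

# After: a single `with` statement with comma-separated context managers,
# which behaves identically but drops one indentation level.
with patch("platform.system", return_value="Linux"), warnings.catch_warnings(
    record=True
) as caught:
    warnings.simplefilter("always")
    warnings.warn("falling back to CUDA events")

assert len(caught) == 1
assert "falling back" in str(caught[0].message)
```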
Address Gemini review suggestion — no need for __builtins__ check in standard Python 3. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🧹 Nitpick comments (1)
tests/utils/test_cupti_fallback.py (1)
13-14: This test does not have GPU architecture-specific requirements; it only requires CUDA to be available. Given that FlashInfer assumes CUDA is available in test environments, the skip decorator may be unnecessary. If retaining it for defensive robustness, align with the project's approach: either use architecture checks from `flashinfer.utils` (when architecture-specific support is actually required) or use `get_compute_capability()` to query device properties. However, for a general "CUDA required" check without architecture constraints, this simple guard is acceptable.
📒 Files selected for processing (1)
- tests/utils/test_cupti_fallback.py
Description
On systems where `cupti-python >= 13` is installed but the CUDA driver is older than 13.0, `cupti.activity_enable()` raises `NotSupportedError`. Previously this call happened after the `try` block, so the exception was unhandled and `bench_gpu_time_with_cupti` crashed instead of falling back to CUDA events.

Root cause: The driver-support check was missing from the guarded import block.
Fix: Added a probe (`activity_enable` + `activity_disable` on `RUNTIME`) inside the existing `try` block so the `NotSupportedError` triggers the existing CUDA-event fallback path.

Related Issues
Fixes a crash when using `bench_gpu_time(enable_cupti=True)` on machines with CUDA driver < 13.0 but `cupti-python >= 13` installed.

Pre-commit
`pre-commit run --all-files` passes (all 14 hooks green)

Tests
`tests/utils/test_cupti_fallback.py` mocks `cupti.activity_enable` to raise an exception and verifies that `bench_gpu_time_with_cupti` does not crash and emits a `UserWarning` about falling back.

🤖 Generated with Claude Code
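The fallback check that test performs can be approximated in isolation as follows; `bench_with_fallback` is a stand-in for the real `bench_gpu_time_with_cupti`, whose actual signature and internals differ:

```python
import warnings
from unittest.mock import MagicMock

def bench_with_fallback(probe, fallback):
    """Stand-in for the guard inside bench_gpu_time_with_cupti."""
    try:
        probe()
        return "cupti"
    except Exception:
        warnings.warn(
            "CUPTI not usable; falling back to CUDA events.", UserWarning
        )
        return fallback()

# Mock activity_enable so it raises, as the real test does.
fake_cupti = MagicMock()
fake_cupti.activity_enable.side_effect = RuntimeError("driver < 13.0")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = bench_with_fallback(fake_cupti.activity_enable, lambda: "cuda_events")

assert result == "cuda_events"  # fell back instead of crashing
assert any(issubclass(w.category, UserWarning) for w in caught)
```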