Tiny allow compiling with line info and release moe #1659

fzyzcjy · 2025-09-09T09:15:18Z

📌 Description

useful for ncu

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit (or used your preferred method).
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

Tests have been added or updated as needed.
All tests are passing (unittest, etc.).

Reviewer Notes

This reverts commit a2e1b7e.

gemini-code-assist

Summary of Changes

Hello @fzyzcjy, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request provides a small but impactful update to the JIT compilation setup, enabling the inclusion of line information in compiled CUDA kernels. This enhancement is specifically designed to aid in profiling and debugging efforts, particularly when using tools like NVIDIA Nsight Compute (ncu), by providing more granular detail for performance analysis.

Highlights

JIT Compilation Flags: Introduced a new option to include line information during CUDA JIT compilation. This is controlled by the FLASHINFER_JIT_LINEINFO environment variable.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a way to enable CUDA line information for JIT compilation via the FLASHINFER_JIT_LINEINFO environment variable, which is useful for profiling with tools like ncu. My review identifies a potential issue in the implementation where the flag might be enabled by default unintentionally and could also be duplicated. I've provided a suggestion to correct this behavior.

gemini-code-assist · 2025-09-09T09:16:40Z

flashinfer/jit/core.py

+    if bool(os.environ.get("FLASHINFER_JIT_LINEINFO", "0")):
+        cuda_cflags += ["-lineinfo"]


This logic has two potential issues:

The boolean check is incorrect: bool("0") is True, so os.environ.get("FLASHINFER_JIT_LINEINFO", "0") will cause this condition to be met by default. The check should be against "1", similar to FLASHINFER_JIT_VERBOSE.

The -lineinfo flag may be added twice, as it's already included on line 191 when verbose mode is active.

The suggested change below corrects the check and prevents adding a duplicate flag.

Suggested change

-    if bool(os.environ.get("FLASHINFER_JIT_LINEINFO", "0")):
-        cuda_cflags += ["-lineinfo"]
+    if os.environ.get("FLASHINFER_JIT_LINEINFO", "0") == "1" and "-lineinfo" not in cuda_cflags:
+        cuda_cflags += ["-lineinfo"]
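
The truthiness issue and the guarded append described in this review can be reproduced in isolation. The following is a minimal sketch, not FlashInfer code: `buggy_check`, `fixed_check`, and `add_lineinfo` are hypothetical helper names, and the environment is passed in as a plain mapping so the behavior can be checked without touching `os.environ`.

```python
ENV_VAR = "FLASHINFER_JIT_LINEINFO"


def buggy_check(environ):
    # Mirrors the check in the diff: bool() of any non-empty string is True,
    # so the default "0" (and an explicit "0") both enable the flag.
    return bool(environ.get(ENV_VAR, "0"))


def fixed_check(environ):
    # Compares against "1" explicitly, as in the suggested change.
    return environ.get(ENV_VAR, "0") == "1"


def add_lineinfo(cuda_cflags, environ):
    # Hypothetical helper: returns a copy of the flags with "-lineinfo"
    # appended at most once, and only when the variable is set to "1".
    flags = list(cuda_cflags)
    if fixed_check(environ) and "-lineinfo" not in flags:
        flags.append("-lineinfo")
    return flags


if __name__ == "__main__":
    print(buggy_check({}))                               # unset variable
    print(fixed_check({}))                               # unset variable
    print(add_lineinfo(["-O3"], {ENV_VAR: "1"}))         # flag appended
    print(add_lineinfo(["-lineinfo"], {ENV_VAR: "1"}))   # no duplicate
```

With the variable unset, `buggy_check` reports the flag as enabled while `fixed_check` does not, which is the default-on behavior the review points out.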

yzh119 · 2025-09-09T14:23:37Z

flashinfer/jit/core.py

        cuda_cflags += ["-DNDEBUG"]

+    # useful for ncu
+    if bool(os.environ.get("FLASHINFER_JIT_LINEINFO", "0")):


It's already covered by FLASHINFER_JIT_VERBOSE.

Good point. By the way, ptxas's register-usage-level option (enabled in the verbose path) is documented as:

--register-usage-level <0..10> (-regUsageLevel) Controls the aggressiveness of optimizations that affect register usage. ([0..10], default = 5) Higher values aggressively optimize the source program, trading off additional register usage for potential improvements in the generated code. Lower values inhibit optimizations that aggressively increase register usage. This option can work in conjunction with -maxrregcount and CUDA launch bounds. This is a BETA feature for advanced users and there is no guarantee that the implementation stays consistent between ptxas releases. Default value: 5.

So I am not sure whether the verbose path changes program behavior.

yzh119 · 2025-10-07T04:22:00Z

flashinfer/jit/core.py

        cuda_cflags += ["-DNDEBUG"]

+    # useful for ncu
+    if bool(os.environ.get("FLASHINFER_JIT_LINEINFO", "0")):


bool("0") evaluates to True, so bool(os.environ.get("FLASHINFER_JIT_LINEINFO", "0")) will be True if this environment variable is not set.

This bug is fixed in #1872. Note that -lineinfo greatly increases binary size; see my comment in #1872 (comment).

cc @fzyzcjy @zhyncs for viz

Good point, I should have written bool(int(...))...
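
The `bool(int(...))` form mentioned in the reply above could look roughly like the sketch below. This is an illustration only, not the fix that landed in #1872: `lineinfo_enabled` is a hypothetical helper name, and the environment mapping is a parameter so the function can be exercised without modifying the real process environment.

```python
import os


def lineinfo_enabled(environ=os.environ):
    # Hypothetical helper. int("0") == 0, so the default now maps to False;
    # any non-zero integer string (e.g. "1") maps to True.
    # Caveat: a non-integer value such as "" or "true" raises ValueError.
    return bool(int(environ.get("FLASHINFER_JIT_LINEINFO", "0")))


if __name__ == "__main__":
    print(lineinfo_enabled({}))
    print(lineinfo_enabled({"FLASHINFER_JIT_LINEINFO": "1"}))
```

Compared with a strict `== "1"` comparison, this variant accepts any non-zero integer but fails loudly on values that are not integers, which may or may not be the desired behavior.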

Revert "revert"

9cb9bcc

This reverts commit a2e1b7e.

gemini-code-assist bot reviewed Sep 9, 2025


zhyncs approved these changes Sep 9, 2025


zhyncs enabled auto-merge (squash) September 9, 2025 09:22

zhyncs merged commit a69a8bf into flashinfer-ai:main Sep 9, 2025
2 checks passed

yzh119 reviewed Sep 9, 2025


yzh119 mentioned this pull request Oct 7, 2025

ci/cd: add nightly build and CI for flashinfer-python, flashinfer-jit-cache, flashinfer-cubin #1872

Merged

5 tasks

yzh119 reviewed Oct 7, 2025

