Merged
4 changes: 4 additions & 0 deletions flashinfer/jit/core.py
@@ -197,6 +197,10 @@ def gen_jit_spec(
# non debug mode
cuda_cflags += ["-DNDEBUG"]

# useful for ncu
if bool(os.environ.get("FLASHINFER_JIT_LINEINFO", "0")):
Collaborator

-lineinfo is already covered by FLASHINFER_JIT_VERBOSE.

Collaborator Author

@fzyzcjy fzyzcjy Sep 9, 2025

Good point. By the way, the verbose path also enables ptxas's register-usage-level, which is documented as:

--register-usage-level <0..10>                      (-regUsageLevel)            
        Controls the aggressiveness of optimizations that affect register usage.
        ([0..10], default = 5) Higher values aggressively optimize the source program,
        trading off additional register usage for potential improvements in the generated
        code. Lower values inhibit optimizations that aggressively increase register
        usage. This option can work in conjunction with -maxrregcount and CUDA launch
        bounds. This is a BETA feature for advanced users and there is no guarantee
        that the implementation stays consistent between ptxas releases.
        Default value:  5.

So I am not sure whether enabling the verbose path changes program behavior.

Collaborator

bool("0") evaluates to True because any non-empty string is truthy, so bool(os.environ.get("FLASHINFER_JIT_LINEINFO", "0")) is True when this environment variable is not set, and also for any non-empty value, including "0".
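
A minimal sketch of the truthiness issue described above. The function name `lineinfo_enabled_buggy` and the `env` dict (standing in for `os.environ`) are illustrative assumptions, not code from the PR:

```python
def lineinfo_enabled_buggy(env):
    # Mirrors the check under review; `env` is a plain dict standing in
    # for os.environ so the example has no side effects.
    return bool(env.get("FLASHINFER_JIT_LINEINFO", "0"))


# Unset: the default "0" is a non-empty string, hence truthy.
unset = lineinfo_enabled_buggy({})
# Explicitly "0": still a non-empty string, hence truthy.
zero = lineinfo_enabled_buggy({"FLASHINFER_JIT_LINEINFO": "0"})
# Only the empty string is falsy.
empty = lineinfo_enabled_buggy({"FLASHINFER_JIT_LINEINFO": ""})

print(unset, zero, empty)
```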

This bug is fixed in #1872. Note that -lineinfo greatly increases binary size; see my comment in #1872 (comment).

cc @fzyzcjy @zhyncs for viz

Collaborator Author

Good point, I should have written bool(int(...))...
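
For illustration, the bool(int(...)) form could be wrapped roughly as follows. The helper name `env_flag` and its `env` parameter are hypothetical and do not come from the PR; note that this form raises ValueError for non-numeric values:

```python
import os


def env_flag(name, default="0", env=None):
    """Hypothetical helper: read an environment variable as an integer flag."""
    env = os.environ if env is None else env
    # int("0") == 0 -> False, int("1") == 1 -> True.
    # int() raises ValueError for non-numeric values such as "true" or "".
    return bool(int(env.get(name, default)))


# Unset falls back to the default "0", which now parses to False.
print(env_flag("FLASHINFER_JIT_LINEINFO", env={}))
print(env_flag("FLASHINFER_JIT_LINEINFO", env={"FLASHINFER_JIT_LINEINFO": "1"}))
```

Whether to accept other spellings (for example "true") or to treat malformed values as disabled is a design choice this sketch leaves open.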

cuda_cflags += ["-lineinfo"]
Comment on lines +201 to +202
Contributor

high

This logic has two potential issues:

  1. The boolean check is incorrect: bool("0") is True, so os.environ.get("FLASHINFER_JIT_LINEINFO", "0") will cause this condition to be met by default. The check should be against "1", similar to FLASHINFER_JIT_VERBOSE.
  2. The -lineinfo flag may be added twice, as it's already included on line 191 when verbose mode is active.

The suggested change below corrects the check and prevents adding a duplicate flag.

Suggested change
- if bool(os.environ.get("FLASHINFER_JIT_LINEINFO", "0")):
-     cuda_cflags += ["-lineinfo"]
+ if os.environ.get("FLASHINFER_JIT_LINEINFO", "0") == "1" and "-lineinfo" not in cuda_cflags:
+     cuda_cflags += ["-lineinfo"]
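
The two fixes above (compare against "1", and avoid a duplicate flag) can be sketched in isolation. `add_flag_once` and `build_cuda_cflags` are hypothetical names, `env` stands in for `os.environ`, and the verbose branch is simplified to the single flag discussed in this thread:

```python
def add_flag_once(flags, flag):
    """Hypothetical helper: append `flag` to `flags` only if it is absent."""
    if flag not in flags:
        flags.append(flag)
    return flags


def build_cuda_cflags(env):
    # Simplified stand-in for the flag assembly under review; the real
    # verbose branch adds more flags than shown here.
    cuda_cflags = []
    if env.get("FLASHINFER_JIT_VERBOSE", "0") == "1":
        add_flag_once(cuda_cflags, "-lineinfo")
    if env.get("FLASHINFER_JIT_LINEINFO", "0") == "1":
        add_flag_once(cuda_cflags, "-lineinfo")
    return cuda_cflags


both = build_cuda_cflags(
    {"FLASHINFER_JIT_VERBOSE": "1", "FLASHINFER_JIT_LINEINFO": "1"}
)
print(both)  # the flag appears once even when both variables are set
```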


if extra_cflags is not None:
cflags += extra_cflags
if extra_cuda_cflags is not None: