Commit f7a0be4

disable optimization and add more debug information during verbose mode (#1719)
<!-- .github/pull_request_template.md -->

## 📌 Description

Add device debug information and disable cuda kernel optimization in verbose mode.

## 🔍 Related Issues

N/A

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

<!-- Optional: anything you'd like reviewers to focus on, concerns, etc. -->
1 parent 7ee54c7 commit f7a0be4

1 file changed: +6 -3 lines changed
flashinfer/jit/core.py

Lines changed: 6 additions & 3 deletions
@@ -262,9 +262,8 @@ def gen_jit_spec(
     check_cuda_arch()
     verbose = os.environ.get("FLASHINFER_JIT_VERBOSE", "0") == "1"
 
-    cflags = ["-O3", "-std=c++17", "-Wno-switch-bool"]
+    cflags = ["-std=c++17", "-Wno-switch-bool"]
     cuda_cflags = [
-        "-O3",
         "-std=c++17",
         f"--threads={os.environ.get('FLASHINFER_NVCC_THREADS', '1')}",
         "-use_fast_math",
@@ -274,16 +273,20 @@ def gen_jit_spec(
         "-DFLASHINFER_ENABLE_FP8_E5M2",
     ]
     if verbose:
+        cflags += ["-O0", "-g"]
         cuda_cflags += [
             "-g",
+            "-O0",
+            "-G",
             "-lineinfo",
             "--ptxas-options=-v",
             "--ptxas-options=--verbose,--register-usage-level=10,--warn-on-local-memory-usage",
             "-DCUTLASS_DEBUG_TRACE_LEVEL=2",
         ]
     else:
         # non debug mode
-        cuda_cflags += ["-DNDEBUG"]
+        cuda_cflags += ["-DNDEBUG", "-O3"]
+        cflags += ["-O3"]
 
     # useful for ncu
     if bool(os.environ.get("FLASHINFER_JIT_LINEINFO", "0")):
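
The net effect of the diff above is that the optimization level is no longer fixed at `-O3`; it is chosen per mode. The following is a minimal standalone sketch of that selection, not the actual `gen_jit_spec` code: the names `env_flag` and `select_opt_flags` are hypothetical, and the flag lists are abridged to the entries relevant to this change.

```python
import os


def env_flag(name: str, default: str = "0") -> bool:
    """Hypothetical helper: only the exact string "1" enables the flag,
    mirroring the FLASHINFER_JIT_VERBOSE comparison in the diff."""
    return os.environ.get(name, default) == "1"


def select_opt_flags(verbose: bool) -> tuple[list[str], list[str]]:
    """Return (cflags, cuda_cflags) arranged as in this commit (abridged)."""
    # Base flags no longer carry an optimization level.
    cflags = ["-std=c++17", "-Wno-switch-bool"]
    cuda_cflags = ["-std=c++17", "-use_fast_math"]  # abridged list
    if verbose:
        # Debug build: no optimization, host and device debug info.
        cflags += ["-O0", "-g"]
        cuda_cflags += ["-g", "-O0", "-G", "-lineinfo"]  # abridged list
    else:
        # Non-debug build: full optimization, assertions disabled.
        cuda_cflags += ["-DNDEBUG", "-O3"]
        cflags += ["-O3"]
    return cflags, cuda_cflags


if __name__ == "__main__":
    host, device = select_opt_flags(env_flag("FLASHINFER_JIT_VERBOSE"))
    print("cflags:     ", host)
    print("cuda_cflags:", device)
```

With this arrangement exactly one of `-O0` or `-O3` ends up in each list, so a verbose build never mixes the two. Separately, the unchanged context line that reads `FLASHINFER_JIT_LINEINFO` wraps the raw string in `bool(...)`; in Python any non-empty string, including `"0"`, is truthy, so that check behaves differently from the `== "1"` comparison used for `FLASHINFER_JIT_VERBOSE`.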
