feat(compilation): add VLLM_COMPILE_DEPYF env var to control depyf de…#3
feat(compilation): add VLLM_COMPILE_DEPYF env var to control depyf de…#3vincentzed wants to merge 1 commit intomainfrom
Conversation
…compilation introduce VLLM_COMPILE_DEPYF environment variable to allow users to toggle actual depyf decompilation during compilation. by default, a placeholder file is written unless VLLM_COMPILE_DEPYF=1 is set. this change also ensures that cudagraph error checking always performs decompilation regardless of the flag. fixes issue where decompilation was always attempted, causing unnecessary overhead in production environments.
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces an environment variable VLLM_COMPILE_DEPYF to control the depyf decompilation for debugging, which helps reduce overhead in production environments. The implementation correctly handles the new flag and ensures that critical CUDAGraph error checking always performs decompilation. My feedback focuses on a performance improvement to avoid redundant decompilation calls.
| import depyf | ||
| src = depyf.decompile(new_code) |
There was a problem hiding this comment.
This depyf.decompile call is also present earlier in this function (around line 111) for debugging purposes. Since decompilation can be an expensive operation, it would be more efficient to perform it only once and reuse the result.
Consider refactoring to decompile at most once per bytecode_hook call, for example by storing the result in a local variable in a higher scope.
There was a problem hiding this comment.
Code Review
This pull request introduces the VLLM_COMPILE_DEPYF environment variable to conditionally disable depyf decompilation, reducing overhead in production environments. A potential performance issue was identified where decompilation could occur twice; a suggestion to refactor the logic to avoid this redundant operation has been provided.
| # Check if we should perform actual decompilation or write placeholder | ||
| if envs.VLLM_COMPILE_DEPYF: | ||
| # Perform actual decompilation when VLLM_COMPILE_DEPYF=1 | ||
| # usually the decompilation will succeed for most models, | ||
| # as we guarantee a full-graph compilation in Dynamo. | ||
| # but there's no 100% guarantee, since decompliation is | ||
| # not a reversible process. | ||
| import depyf | ||
| src = depyf.decompile(new_code) | ||
|
|
||
| with open(decompiled_file, "w") as f: | ||
| f.write(src) | ||
|
|
||
| logger.debug("Dynamo transformed code saved to %s", | ||
| decompiled_file) | ||
| else: | ||
| # Write placeholder file with comment when VLLM_COMPILE_DEPYF=0 (default) | ||
| placeholder_content = "# Please set VLLM_COMPILE_DEPYF=1 to populate this file\n" | ||
| with open(decompiled_file, "w") as f: | ||
| f.write(placeholder_content) | ||
| logger.debug("Placeholder Dynamo transformed code saved to %s. " | ||
| "Set VLLM_COMPILE_DEPYF=1 to perform actual decompilation.", | ||
| decompiled_file) | ||
| except Exception: | ||
| pass | ||
|
|
||
| if self.vllm_config.compilation_config.use_cudagraph and \ | ||
| "update" in new_code.co_names: | ||
| # For cudagraph error checking, we always perform decompilation regardless of VLLM_COMPILE_DEPYF | ||
| # because this is a critical error checking mechanism | ||
| import depyf | ||
| src = depyf.decompile(new_code) | ||
| msg = "Assigning / modifying buffers of nn.Module during forward pass is not allowed when using cudagraph inside the compiler because it will cause silent errors. Please use eager mode or fix the code. The following code contains clues about which buffer is being modified (please search for the usage of the function `update`):\n" + src # noqa |
There was a problem hiding this comment.
There's a potential for depyf.decompile(new_code) to be called twice: once within the if envs.VLLM_COMPILE_DEPYF: block, and again within the if self.vllm_config.compilation_config.use_cudagraph and ...: block. To avoid redundant decompilations, consider refactoring to decompile lazily and cache the result within the function scope. For example:
if not os.path.exists(decompiled_file):
try:
src = None
def _decompile_once():
nonlocal src
if src is None:
import depyf
src = depyf.decompile(new_code)
return src
if envs.VLLM_COMPILE_DEPYF:
decompiled_src = _decompile_once()
with open(decompiled_file, "w") as f:
f.write(decompiled_src)
logger.debug("Dynamo transformed code saved to %s",
decompiled_file)
else:
placeholder_content = "# Please set VLLM_COMPILE_DEPYF=1 to populate this file\n"
with open(decompiled_file, "w") as f:
f.write(placeholder_content)
logger.debug("Placeholder Dynamo transformed code saved to %s. "
"Set VLLM_COMPILE_DEPYF=1 to perform actual decompilation.",
decompiled_file)
except Exception:
pass
if self.vllm_config.compilation_config.use_cudagraph and \
"update" in new_code.co_names:
decompiled_src = _decompile_once()
msg = "Assigning / modifying buffers of nn.Module during forward pass is not allowed when using cudagraph inside the compiler because it will cause silent errors. Please use eager mode or fix the code. The following code contains clues about which buffer is being modified (please search for the usage of the function `update`):\n" + decompiled_src # noqaThis would require declaring src at a higher scope within if not os.path.exists(decompiled_file): to be shared between the two blocks.
if not os.path.exists(decompiled_file):
try:
src = None
def _decompile_once():
nonlocal src
if src is None:
import depyf
src = depyf.decompile(new_code)
return src
if envs.VLLM_COMPILE_DEPYF:
decompiled_src = _decompile_once()
with open(decompiled_file, "w") as f:
f.write(decompiled_src)
logger.debug("Dynamo transformed code saved to %s",
decompiled_file)
else:
placeholder_content = "# Please set VLLM_COMPILE_DEPYF=1 to populate this file\n"
with open(decompiled_file, "w") as f:
f.write(placeholder_content)
logger.debug("Placeholder Dynamo transformed code saved to %s. "
"Set VLLM_COMPILE_DEPYF=1 to perform actual decompilation.",
decompiled_file)
except Exception:
pass
if self.vllm_config.compilation_config.use_cudagraph and \
"update" in new_code.co_names:
decompiled_src = _decompile_once()
msg = "Assigning / modifying buffers of nn.Module during forward pass is not allowed when using cudagraph inside the compiler because it will cause silent errors. Please use eager mode or fix the code. The following code contains clues about which buffer is being modified (please search for the usage of the function `update`):\n" + decompiled_src # noqaThere was a problem hiding this comment.
cubic analysis
No issues found across 2 files. Review in cubic
|
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you! |
|
This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you! |
…compilation
introduce VLLM_COMPILE_DEPYF environment variable to allow users to toggle actual depyf decompilation during compilation. by default, a placeholder file is written unless VLLM_COMPILE_DEPYF=1 is set. this change also ensures that cudagraph error checking always performs decompilation regardless of the flag.
fixes issue where decompilation was always attempted, causing unnecessary overhead in production environments.
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.Purpose
Test Plan
Test Result
(Optional) Documentation Update