Skip to content

feat(compilation): add VLLM_COMPILE_DEPYF env var to control depyf de…#3

Closed
vincentzed wants to merge 1 commit intomainfrom
decomplation-operationl
Closed

feat(compilation): add VLLM_COMPILE_DEPYF env var to control depyf de…#3
vincentzed wants to merge 1 commit intomainfrom
decomplation-operationl

Conversation

@vincentzed
Copy link
Owner

@vincentzed vincentzed commented Aug 2, 2025

…compilation

introduce VLLM_COMPILE_DEPYF environment variable to allow users to toggle actual depyf decompilation during compilation. by default, a placeholder file is written unless VLLM_COMPILE_DEPYF=1 is set. this change also ensures that cudagraph error checking always performs decompilation regardless of the flag.

fixes issue where decompilation was always attempted, causing unnecessary overhead in production environments.

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Test Plan

Test Result

(Optional) Documentation Update

…compilation

introduce VLLM_COMPILE_DEPYF environment variable to allow users to toggle
actual depyf decompilation during compilation. by default, a placeholder file
is written unless VLLM_COMPILE_DEPYF=1 is set. this change also ensures that
cudagraph error checking always performs decompilation regardless of the flag.

fixes issue where decompilation was always attempted, causing unnecessary
overhead in production environments.
@github-actions
Copy link

github-actions bot commented Aug 2, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@vincentzed
Copy link
Owner Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an environment variable VLLM_COMPILE_DEPYF to control the depyf decompilation for debugging, which helps reduce overhead in production environments. The implementation correctly handles the new flag and ensures that critical CUDAGraph error checking always performs decompilation. My feedback focuses on a performance improvement to avoid redundant decompilation calls.

Comment on lines 133 to 134
import depyf
src = depyf.decompile(new_code)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This depyf.decompile call is also present earlier in this function (around line 111) for debugging purposes. Since decompilation can be an expensive operation, it would be more efficient to perform it only once and reuse the result.

Consider refactoring to decompile at most once per bytecode_hook call, for example by storing the result in a local variable in a higher scope.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the VLLM_COMPILE_DEPYF environment variable to conditionally disable depyf decompilation, reducing overhead in production environments. A potential performance issue was identified where decompilation could occur twice; a suggestion to refactor the logic to avoid this redundant operation has been provided.

Comment on lines +103 to 135
# Check if we should perform actual decompilation or write placeholder
if envs.VLLM_COMPILE_DEPYF:
# Perform actual decompilation when VLLM_COMPILE_DEPYF=1
# usually the decompilation will succeed for most models,
# as we guarantee a full-graph compilation in Dynamo.
# but there's no 100% guarantee, since decompliation is
# not a reversible process.
import depyf
src = depyf.decompile(new_code)

with open(decompiled_file, "w") as f:
f.write(src)

logger.debug("Dynamo transformed code saved to %s",
decompiled_file)
else:
# Write placeholder file with comment when VLLM_COMPILE_DEPYF=0 (default)
placeholder_content = "# Please set VLLM_COMPILE_DEPYF=1 to populate this file\n"
with open(decompiled_file, "w") as f:
f.write(placeholder_content)
logger.debug("Placeholder Dynamo transformed code saved to %s. "
"Set VLLM_COMPILE_DEPYF=1 to perform actual decompilation.",
decompiled_file)
except Exception:
pass

if self.vllm_config.compilation_config.use_cudagraph and \
"update" in new_code.co_names:
# For cudagraph error checking, we always perform decompilation regardless of VLLM_COMPILE_DEPYF
# because this is a critical error checking mechanism
import depyf
src = depyf.decompile(new_code)
msg = "Assigning / modifying buffers of nn.Module during forward pass is not allowed when using cudagraph inside the compiler because it will cause silent errors. Please use eager mode or fix the code. The following code contains clues about which buffer is being modified (please search for the usage of the function `update`):\n" + src # noqa

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There's a potential for depyf.decompile(new_code) to be called twice: once within the if envs.VLLM_COMPILE_DEPYF: block, and again within the if self.vllm_config.compilation_config.use_cudagraph and ...: block. To avoid redundant decompilations, consider refactoring to decompile lazily and cache the result within the function scope. For example:

        if not os.path.exists(decompiled_file):
            try:
                src = None
                def _decompile_once():
                    nonlocal src
                    if src is None:
                        import depyf
                        src = depyf.decompile(new_code)
                    return src

                if envs.VLLM_COMPILE_DEPYF:
                    decompiled_src = _decompile_once()
                    with open(decompiled_file, "w") as f:
                        f.write(decompiled_src)

                    logger.debug("Dynamo transformed code saved to %s",
                                 decompiled_file)
                else:
                    placeholder_content = "# Please set VLLM_COMPILE_DEPYF=1 to populate this file\n"
                    with open(decompiled_file, "w") as f:
                        f.write(placeholder_content)
                    logger.debug("Placeholder Dynamo transformed code saved to %s. "
                                 "Set VLLM_COMPILE_DEPYF=1 to perform actual decompilation.",
                                 decompiled_file)
            except Exception:
                pass

        if self.vllm_config.compilation_config.use_cudagraph and \
            "update" in new_code.co_names:
            decompiled_src = _decompile_once()
            msg = "Assigning / modifying buffers of nn.Module during forward pass is not allowed when using cudagraph inside the compiler because it will cause silent errors. Please use eager mode or fix the code. The following code contains clues about which buffer is being modified (please search for the usage of the function `update`):\n" + decompiled_src  # noqa

This would require declaring src at a higher scope within if not os.path.exists(decompiled_file): to be shared between the two blocks.

        if not os.path.exists(decompiled_file):
            try:
                src = None
                def _decompile_once():
                    nonlocal src
                    if src is None:
                        import depyf
                        src = depyf.decompile(new_code)
                    return src

                if envs.VLLM_COMPILE_DEPYF:
                    decompiled_src = _decompile_once()
                    with open(decompiled_file, "w") as f:
                        f.write(decompiled_src)

                    logger.debug("Dynamo transformed code saved to %s",
                                 decompiled_file)
                else:
                    placeholder_content = "# Please set VLLM_COMPILE_DEPYF=1 to populate this file\n"
                    with open(decompiled_file, "w") as f:
                        f.write(placeholder_content)
                    logger.debug("Placeholder Dynamo transformed code saved to %s. "
                                 "Set VLLM_COMPILE_DEPYF=1 to perform actual decompilation.",
                                 decompiled_file)
            except Exception:
                pass

        if self.vllm_config.compilation_config.use_cudagraph and \
            "update" in new_code.co_names:
            decompiled_src = _decompile_once()
            msg = "Assigning / modifying buffers of nn.Module during forward pass is not allowed when using cudagraph inside the compiler because it will cause silent errors. Please use eager mode or fix the code. The following code contains clues about which buffer is being modified (please search for the usage of the function `update`):\n" + decompiled_src  # noqa

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cubic analysis

No issues found across 2 files. Review in cubic

@github-actions
Copy link

github-actions bot commented Nov 1, 2025

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale label Nov 1, 2025
@github-actions
Copy link

github-actions bot commented Dec 1, 2025

This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!

@github-actions github-actions bot closed this Dec 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant