feat(compilation): add VLLM_COMPILE_DEPYF env var to control depyf de… by vincentzed · Pull Request #3 · vincentzed/vllm

vincentzed · 2025-08-02T18:58:29Z

…compilation

Introduce the VLLM_COMPILE_DEPYF environment variable so that users can toggle actual depyf decompilation during compilation. By default, a placeholder file is written unless VLLM_COMPILE_DEPYF=1 is set. This change also ensures that cudagraph error checking always performs decompilation, regardless of the flag.

Fixes an issue where decompilation was always attempted, causing unnecessary overhead in production environments.
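
The toggle described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `env_flag` and `dump_transformed_code` are hypothetical names, the rule that only the value "1" enables decompilation is inferred from the description, and the decompiler is passed in as a callable so the sketch runs without depyf installed.

```python
import os
from types import CodeType
from typing import Callable

# Placeholder text as it appears in the diff quoted in this thread.
PLACEHOLDER = "# Please set VLLM_COMPILE_DEPYF=1 to populate this file\n"


def env_flag(name: str, default: str = "0") -> bool:
    """Return True only when the variable equals "1" (assumed rule)."""
    return os.environ.get(name, default) == "1"


def dump_transformed_code(code: CodeType, path: str,
                          decompile: Callable[[CodeType], str]) -> bool:
    """Write decompiled source if the flag is on, else a placeholder.

    Returns True when real decompilation was performed.
    """
    enabled = env_flag("VLLM_COMPILE_DEPYF")
    # The expensive call is skipped entirely when the flag is off.
    content = decompile(code) if enabled else PLACEHOLDER
    with open(path, "w") as f:
        f.write(content)
    return enabled
```

In vLLM the callable would presumably be `depyf.decompile`; injecting it keeps the expensive step easy to replace in tests.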



github-actions · 2025-08-02T18:58:35Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

vincentzed · 2025-08-02T18:58:36Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces an environment variable VLLM_COMPILE_DEPYF to control the depyf decompilation for debugging, which helps reduce overhead in production environments. The implementation correctly handles the new flag and ensures that critical CUDAGraph error checking always performs decompilation. My feedback focuses on a performance improvement to avoid redundant decompilation calls.

gemini-code-assist · 2025-08-02T19:00:13Z

vllm/compilation/wrapper.py

            import depyf
            src = depyf.decompile(new_code)


This depyf.decompile call is also present earlier in this function (around line 111) for debugging purposes. Since decompilation can be an expensive operation, it would be more efficient to perform it only once and reuse the result.

Consider refactoring to decompile at most once per bytecode_hook call, for example by storing the result in a local variable in a higher scope.
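
As an alternative to the local variable the comment mentions, decompilation could be memoized by code object. The sketch below is an assumption-laden illustration, not part of the PR: `memoize_decompiler` and `fake_decompile` are hypothetical names, and `fake_decompile` stands in for `depyf.decompile` so the example runs on its own.

```python
import functools
from types import CodeType
from typing import Callable


def memoize_decompiler(
        decompile: Callable[[CodeType], str]) -> Callable[[CodeType], str]:
    """Wrap a decompiler so a given code object is decompiled at most once.

    Code objects are hashable, so they can serve as cache keys. The cache
    holds references to the code objects it has seen, hence the size bound.
    """
    return functools.lru_cache(maxsize=32)(decompile)


# Stand-in for the real decompiler that records how often it runs.
calls = []


def fake_decompile(code: CodeType) -> str:
    calls.append(code.co_name)
    return f"# decompiled {code.co_name}\n"


def forward():
    pass


cached = memoize_decompiler(fake_decompile)
first = cached(forward.__code__)   # performs the decompilation
second = cached(forward.__code__)  # served from the cache
```

Unlike a local variable, this cache outlives a single hook call, which trades a little memory for avoiding repeated work across calls.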

gemini-code-assist

Code Review

This pull request introduces the VLLM_COMPILE_DEPYF environment variable to conditionally disable depyf decompilation, reducing overhead in production environments. A potential performance issue was identified where decompilation could occur twice; a suggestion to refactor the logic to avoid this redundant operation has been provided.

gemini-code-assist · 2025-08-02T19:01:44Z

vllm/compilation/wrapper.py

+                    # Check if we should perform actual decompilation or write placeholder
+                    if envs.VLLM_COMPILE_DEPYF:
+                        # Perform actual decompilation when VLLM_COMPILE_DEPYF=1
+                        # usually the decompilation will succeed for most models,
+                        # as we guarantee a full-graph compilation in Dynamo.
+                        # but there's no 100% guarantee, since decompilation is
+                        # not a reversible process.
+                        import depyf
+                        src = depyf.decompile(new_code)
+
+                        with open(decompiled_file, "w") as f:
+                            f.write(src)
+
+                        logger.debug("Dynamo transformed code saved to %s",
+                                     decompiled_file)
+                    else:
+                        # Write placeholder file with comment when VLLM_COMPILE_DEPYF=0 (default)
+                        placeholder_content = "# Please set VLLM_COMPILE_DEPYF=1 to populate this file\n"
+                        with open(decompiled_file, "w") as f:
+                            f.write(placeholder_content)
+                        logger.debug("Placeholder Dynamo transformed code saved to %s. "
+                                     "Set VLLM_COMPILE_DEPYF=1 to perform actual decompilation.",
+                                     decompiled_file)
                except Exception:
                    pass

        if self.vllm_config.compilation_config.use_cudagraph and \
            "update" in new_code.co_names:
+            # For cudagraph error checking, we always perform decompilation regardless of VLLM_COMPILE_DEPYF
+            # because this is a critical error checking mechanism
            import depyf
            src = depyf.decompile(new_code)
            msg = "Assigning / modifying buffers of nn.Module during forward pass is not allowed when using cudagraph inside the compiler because it will cause silent errors. Please use eager mode or fix the code. The following code contains clues about which buffer is being modified (please search for the usage of the function `update`):\n" + src  # noqa


There's a potential for depyf.decompile(new_code) to be called twice: once within the if envs.VLLM_COMPILE_DEPYF: block, and again within the if self.vllm_config.compilation_config.use_cudagraph and ...: block. To avoid redundant decompilations, consider refactoring to decompile lazily and cache the result within the function scope. For example:

    if not os.path.exists(decompiled_file):
        try:
            src = None

            def _decompile_once():
                nonlocal src
                if src is None:
                    import depyf
                    src = depyf.decompile(new_code)
                return src

            if envs.VLLM_COMPILE_DEPYF:
                decompiled_src = _decompile_once()
                with open(decompiled_file, "w") as f:
                    f.write(decompiled_src)
                logger.debug("Dynamo transformed code saved to %s",
                             decompiled_file)
            else:
                placeholder_content = "# Please set VLLM_COMPILE_DEPYF=1 to populate this file\n"
                with open(decompiled_file, "w") as f:
                    f.write(placeholder_content)
                logger.debug("Placeholder Dynamo transformed code saved to %s. "
                             "Set VLLM_COMPILE_DEPYF=1 to perform actual decompilation.",
                             decompiled_file)
        except Exception:
            pass

    if self.vllm_config.compilation_config.use_cudagraph and \
        "update" in new_code.co_names:
        decompiled_src = _decompile_once()
        msg = "Assigning / modifying buffers of nn.Module during forward pass is not allowed when using cudagraph inside the compiler because it will cause silent errors. Please use eager mode or fix the code. The following code contains clues about which buffer is being modified (please search for the usage of the function `update`):\n" + decompiled_src  # noqa

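For the result to be shared between the two blocks, src and _decompile_once would need to be declared at a higher scope than if not os.path.exists(decompiled_file):. Declared inside that block, as in the example, the helper is undefined in the cudagraph check whenever the dump file already exists.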

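
A variant of this suggestion that defines the helper at function scope might look like the sketch below. It is a simplified stand-in for the hook rather than vLLM's actual code: `bytecode_hook_sketch`, its parameters, and the shortened message text are hypothetical, the decompiler is injected in place of `depyf.decompile`, and the message is returned instead of raised to keep the example small.

```python
import os
from types import CodeType
from typing import Callable, Optional

PLACEHOLDER = "# Please set VLLM_COMPILE_DEPYF=1 to populate this file\n"


def bytecode_hook_sketch(new_code: CodeType, decompiled_file: str,
                         depyf_enabled: bool, use_cudagraph: bool,
                         decompile: Callable[[CodeType], str]
                         ) -> Optional[str]:
    """Return the cudagraph error message, or None if the check passes."""
    src: Optional[str] = None

    def decompile_once() -> str:
        # Defined before the file-exists check so that both blocks below
        # can call it, whether or not the dump file is already present.
        nonlocal src
        if src is None:
            src = decompile(new_code)
        return src

    if not os.path.exists(decompiled_file):
        try:
            content = decompile_once() if depyf_enabled else PLACEHOLDER
            with open(decompiled_file, "w") as f:
                f.write(content)
        except Exception:
            pass  # dumping is best-effort, as in the quoted diff

    if use_cudagraph and "update" in new_code.co_names:
        # Always decompile for the error check; reuse the result if the
        # dump step above already produced it.
        return "buffer modification detected:\n" + decompile_once()
    return None
```

With this layout the decompiler runs at most once per call, and the cudagraph check still works when the dump file exists from an earlier run.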

cubic-dev-ai

cubic analysis

No issues found across 2 files. Review in cubic

github-actions · 2025-11-01T02:50:57Z

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

github-actions · 2025-12-01T03:19:34Z

This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!

gemini-code-assist bot reviewed Aug 2, 2025

cubic-dev-ai bot reviewed Aug 2, 2025

github-actions bot added the stale label Nov 1, 2025

github-actions bot closed this Dec 1, 2025

vincentzed wants to merge 1 commit into main from decomplation-operationl
