@hyongtao-code hyongtao-code commented Dec 3, 2025

Purpose

`LLMEngine.__del__` previously used the walrus operator together with `and`. Because `:=` binds more loosely than `and`, `dp_group` was assigned the result of the whole boolean expression instead of the actual DP group object. As a result, `stateless_destroy_torch_distributed_process_group` could receive `True`/`False` instead of a process group instance.
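
The precedence issue can be reproduced outside vLLM. The snippet below is a minimal sketch: `FakeGroup` and the module-level `external_launcher_dp` flag are hypothetical stand-ins, not vLLM objects. It shows that the walrus target ends up holding the result of the `and` expression rather than the group.

```python
class FakeGroup:
    """Hypothetical stand-in for a data-parallel process group."""


external_launcher_dp = False
group = FakeGroup()

# Buggy form: parsed as  buggy := (group and not external_launcher_dp),
# so the name is bound to the result of the `and` expression.
if (buggy := group and not external_launcher_dp):
    pass

# Fixed form: bind the object first, then test the conditions separately.
fixed = group
should_destroy = fixed is not None and not external_launcher_dp

print(type(buggy).__name__)  # bool
print(fixed is group)        # True
```

Parenthesizing the assignment, as in `(dp_group := getattr(...)) and not ...`, would also bind the object itself; the PR instead splits the assignment and the check into two statements.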

Test Plan

Test Result


Signed-off-by: Yongtao Huang <[email protected]>
@mergify mergify bot added the v1 label Dec 3, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly fixes a bug in the LLMEngine.__del__ method where incorrect operator precedence with the walrus operator could lead to a crash during resource cleanup. The fix is sound. However, I have also added a comment regarding the use of __del__ for resource management, as it is not a reliable mechanism and can lead to resource leaks. I've suggested a more robust design pattern for consideration.

Comment on lines 411 to 414
     def __del__(self):
-        if (
-            dp_group := getattr(self, "dp_group", None)
-            and not self.external_launcher_dp
-        ):
+        dp_group = getattr(self, "dp_group", None)
+        if dp_group is not None and not self.external_launcher_dp:
             stateless_destroy_torch_distributed_process_group(dp_group)
high

While this change correctly fixes the immediate bug, relying on __del__ for critical resource cleanup like destroying a process group is unreliable. The __del__ method is not guaranteed to be called in all circumstances, for example, if the object is part of a reference cycle. This can lead to resource leaks, which can be particularly problematic in a distributed environment. A more robust approach would be to implement an explicit close() or shutdown() method that handles this cleanup. Ideally, LLMEngine could be made a context manager (by implementing __enter__ and __exit__) to ensure deterministic cleanup using a with statement.
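
A rough sketch of the pattern suggested above, under stated assumptions: `EngineSketch` is a hypothetical class, not vLLM's `LLMEngine`, and the `destroy` callback stands in for `stateless_destroy_torch_distributed_process_group`. It only illustrates an idempotent `close()` combined with `__enter__`/`__exit__`.

```python
class EngineSketch:
    """Hypothetical engine-like object with explicit, idempotent cleanup."""

    def __init__(self, dp_group, destroy, external_launcher_dp=False):
        self.dp_group = dp_group
        self.external_launcher_dp = external_launcher_dp
        self._destroy = destroy  # assumed stand-in for the real destroy function
        self._closed = False

    def close(self):
        # Safe to call more than once; only the first call does any work.
        if self._closed:
            return
        self._closed = True
        dp_group, self.dp_group = self.dp_group, None
        if dp_group is not None and not self.external_launcher_dp:
            self._destroy(dp_group)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions raised in the with-body


destroyed = []
with EngineSketch(dp_group="dp-group-0", destroy=destroyed.append) as engine:
    pass  # use the engine here

engine.close()  # second call is a no-op
print(destroyed)  # ['dp-group-0']
```

With this shape, cleanup runs when the `with` block exits rather than whenever the garbage collector happens to finalize the object; a `__del__` could still call `close()` as a last-resort fallback.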

@markmc markmc left a comment

Looks correct to me

For reference, this was introduced by #24899

@markmc markmc added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 3, 2025
@njhill njhill merged commit 2fc5d6e into vllm-project:main Dec 3, 2025
49 of 50 checks passed
PatrykSaffer pushed a commit to PatrykSaffer/vllm that referenced this pull request Dec 4, 2025
@hyongtao-code hyongtao-code deleted the fix-del branch December 5, 2025 01:57
charlotte12l pushed a commit to charlotte12l/vllm that referenced this pull request Dec 5, 2025
charlotte12l pushed a commit to charlotte12l/vllm that referenced this pull request Dec 9, 2025