Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -239,7 +239,7 @@ def receive_weights(self, on_bucket_received: callable):
# receive bucket and update weights
while True:
metadata = self.socket.recv_pyobj()
weights, tensor = [], None
weights = []
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Initializing tensor to None is recommended to ensure the variable is defined in the function scope. This prevents a potential NameError during the cleanup phase (line 261) if the bucket metadata is empty and the loop at line 243 never executes.

Suggested change
weights = []
weights, tensor = [], None

for name, meta in metadata["bucket_meta"].items():
shape, dtype, offset = meta["shape"], meta["dtype"], meta["offset"]
size = dtype.itemsize * shape.numel()
Expand All @@ -255,7 +255,10 @@ def receive_weights(self, on_bucket_received: callable):
get_torch_device().synchronize()
self.socket.send(b"")
on_bucket_received(weights)
del weights, tensor
for _, tensor in weights:
del tensor
weights.clear()
del weights
Comment on lines +258 to +261
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The loop iterating over weights to call del tensor is ineffective for releasing memory of the objects within the list; it only unbinds the local loop variable in each iteration. To properly release the tensors and address the memory issue, weights.clear() should be used. This ensures that even if external references to the weights list exist (e.g., within the callback), the tensors themselves can be garbage collected. A single del tensor is then sufficient to clear the reference leaked from the previous loop.

Suggested change
for _, tensor in weights:
del tensor
weights.clear()
del weights
weights.clear()
del weights, tensor

if metadata["is_last"]:
break
finally:
Expand Down
Loading