
Conversation

@clefourrier (Member) commented Nov 20, 2025

The PR does two main things:

  • adds a del on the created objects to force the memory release of attached resources (see the first sketch below)
  • constrains the max generation size with the user-provided value, which was previously skipped and otherwise ignored (we should be careful with this: generation size management is heavily duplicated across the code base, so I suspect the fix here will need to be ported to other places in the code or moved into a better system; see the second sketch below)

The rest of the modifications are nits (duplicated code and legacy functions that I removed). They could go in another PR, but they were thematically linked.
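As a minimal sketch of the cleanup pattern the first bullet refers to (the function, variable names, and model call are illustrative, not the actual lighteval code):

```python
import gc

import torch

def score_batch(model, batch, device):
    # Hypothetical example of the del pattern described above.
    inputs = batch.to(device)
    with torch.no_grad():
        outputs = model(inputs)
    scores = outputs.logits.cpu()

    # Drop the references to the GPU tensors explicitly: without the del,
    # they stay alive until the end of the scope and their memory cannot
    # be reclaimed by the allocator.
    del inputs, outputs
    gc.collect()
    torch.cuda.empty_cache()
    return scores
```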

…items, plus updated the generation size logic to respect what the user asks for
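To illustrate the generation size constraint with a hypothetical helper (not the actual lighteval code): the generation budget is bounded both by what the user asked for and by the room left in the model's context window.

```python
def effective_max_new_tokens(user_max_gen_size, max_model_length, context_length):
    # Hypothetical helper: respect the user-provided cap when it is set,
    # but never overflow the model's context window.
    room_left = max_model_length - context_length
    if user_max_gen_size is None:
        return room_left
    return min(user_max_gen_size, room_left)
```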
```python
if config.model_parallel is False and self.config.dtype not in ["4bit", "8bit"]:
    logger.info(f"Using Data Parallelism, putting model on device {self._device}")
    self.model = self.model.to(self._device)
if config.compile:
```
@clefourrier (Member Author):

Duplicate code, this already exists in `_create_auto_model`.

```python
)
# model.to(self.device)
model.eval()
torch.set_grad_enabled(False)
```
@clefourrier (Member Author):

Now set once at the module level.
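For illustration, a module-level setting amounts to something like this (a sketch, not the exact lighteval code):

```python
import torch

# Executed once at import time: disables autograd globally, replacing
# the per-model torch.set_grad_enabled(False) calls.
torch.set_grad_enabled(False)
```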

```python
        continuation = continuation.lstrip()
        return continuation

    def _model_call(self, inputs: torch.Tensor) -> torch.Tensor:
```
@clefourrier (Member Author):

Legacy function, removed.

@clefourrier requested a review from @NathanHB on Nov 20, 2025 at 12:52
@HuggingFaceDocBuilderDev (Collaborator) commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@clefourrier (Member Author) commented:

@pcuenca this should fix the issue you had with auto batch size, can you take a look?

I'm not sure it's 100% perfect, as I'm still seeing some memory that is not deallocated in the model, but I suspect it should already be helpful for your use case.
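For context, automatic batch size detection typically probes downward on OOM, which is why leaked allocations matter: memory left over from a failed attempt makes every following attempt fail too. A minimal sketch of that pattern (the names and structure are illustrative, not lighteval's actual implementation):

```python
import gc

import torch

def find_max_batch_size(forward_fn, start: int = 512) -> int:
    # Illustrative probe: try a batch size and halve it on OOM, cleaning
    # up between attempts so a failed try does not poison the next one.
    batch_size = start
    while batch_size >= 1:
        try:
            forward_fn(batch_size)
            return batch_size
        except torch.cuda.OutOfMemoryError:
            gc.collect()
            torch.cuda.empty_cache()
            batch_size //= 2
    raise RuntimeError("no batch size fits in memory")
```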

@clefourrier merged commit 2236e17 into main on Nov 20, 2025
5 checks passed
@NathanHB added the bug label on Nov 24, 2025
