uploading with queue on background #77
Conversation
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/litmodels/integrations/checkpoints.py:207
- Duplicate definition of _remove_checkpoint detected; this may lead to maintenance challenges or ambiguous behavior. Please remove the redundant implementation.
def _remove_checkpoint(self, trainer: "pl.Trainer", filepath: str) -> None:
    self.task_queue = queue.Queue()
    self.upload_count = 0
    self.remove_count = 0
    self._worker = threading.Thread(target=self._worker_loop, daemon=True)
How does it behave on keyboard interruption?
This will likely continue.
Yes, it should. But it's worth checking how it behaves in general. I think W&B still lets you stop if you send multiple keyboard interrupts.
ethanwharris left a comment
Cool!
What does this PR do?
This pull request introduces an asynchronous model manager that processes upload and remove tasks in the background. Offloading these actions to a worker thread keeps them from blocking the training loop, so checkpoint file operations should not slow down training.
Additionally, the new approach uses a single queue but maintains separate counters for uploads and removals, making it simple to track task progress in each category. This design keeps the code streamlined and offers a clear way to monitor both types of background tasks while training.
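The single-queue, two-counter design described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class name BackgroundModelManager, the Action enum, the upload_fn/remove_fn callables, the errors list and the join() helper are all hypothetical, and the sketch assumes the counters track completed tasks. Only task_queue, upload_count, remove_count, _worker and _worker_loop mirror names visible in the reviewed diff.

```python
import queue
import threading
from enum import Enum, auto


class Action(Enum):
    """Hypothetical task kinds; the PR may tag tasks differently."""
    UPLOAD = auto()
    REMOVE = auto()


class BackgroundModelManager:
    """Illustrative stand-in for the PR's asynchronous model manager."""

    def __init__(self, upload_fn, remove_fn):
        # upload_fn / remove_fn are assumed callables taking a file path.
        self._upload_fn = upload_fn
        self._remove_fn = remove_fn
        self.task_queue = queue.Queue()
        self.upload_count = 0  # assumption: number of completed uploads
        self.remove_count = 0  # assumption: number of completed removals
        self.errors = []       # failures are recorded so the worker keeps running
        self._worker = threading.Thread(target=self._worker_loop, daemon=True)
        self._worker.start()

    def queue_upload(self, filepath):
        self.task_queue.put((Action.UPLOAD, filepath))

    def queue_remove(self, filepath):
        self.task_queue.put((Action.REMOVE, filepath))

    def _worker_loop(self):
        # One worker drains one FIFO queue, so tasks run in submission order.
        while True:
            action, filepath = self.task_queue.get()
            try:
                if action is Action.UPLOAD:
                    self._upload_fn(filepath)
                    self.upload_count += 1
                else:
                    self._remove_fn(filepath)
                    self.remove_count += 1
            except Exception as err:
                self.errors.append((action, filepath, err))
            finally:
                # Always mark the task done so join() cannot hang.
                self.task_queue.task_done()

    def join(self):
        """Block until every queued task has been processed."""
        self.task_queue.join()
```

Because the worker is a daemon thread, Python stops it abruptly when the interpreter exits, so a caller that needs pending uploads to finish (for example after a keyboard interrupt) would have to call something like join() before exiting.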