-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Fix load_from_disk progress bar with redirected stdout #7919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix load_from_disk progress bar with redirected stdout #7919
Conversation
Fixes huggingface#7918. Changed disable=None to disable=False to prevent TTY auto-detection from failing when stdout is redirected.
|
this seems to contradict the comment that says
I believe the right approach is to do the same as in huggingface/huggingface_hub#2698 |
Updated to check TQDM_POSITION=-1 to force-enable progress bars in cloud environments, |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
lhoestq
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool ! There are other uses of TQDM progress bars across the code base and they all use the tqdm class in utils/tqdm.py. Could you apply this change to the tqdm class in utils/tqdm.py before we merge ? This way we make sure all progress bars in datasets have the same behavior
|
Moved the TQDM_POSITION check to the tqdm class in utils/tqdm.py so all progress bars |
|
@lhoestq thanks again for the suggestion. I’ve applied it and everything should now be consistent across all tqdm usage. Happy to adjust anything else if needed. |
Fixes #7918
Problem
When using
load_from_disk()withcontextlib.redirect_stdout(), the progress bar was not showing even for datasets with >16 files.Root Cause
The
disableparameter was set toNonewhich triggers TTY auto-detection. This fails when stdout is redirected, causing the progress bar to be hidden.Solution
Changed
disable=len(state["_data_files"]) <= 16 or Nonetodisable=len(state["_data_files"]) <= 16to force the progress bar to show for datasets with >16 files, regardless of stdout redirection.Testing
Verified that progress bars now appear correctly both with and without stdout redirection for datasets with >16 shards.