
@omarfarhoud

Fixes #7918

Problem

When using load_from_disk() with contextlib.redirect_stdout(), the progress bar was not shown, even for datasets with more than 16 files.

Root Cause

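For datasets with more than 16 files, the disable argument evaluated to None. disable=None makes tqdm auto-detect whether its output stream is a TTY and hide the bar when it is not. That detection fails when stdout is redirected, so the progress bar was hidden.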
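The sketch below mirrors tqdm's documented rule for disable=None ("disable on non-TTY") to show why a redirected stream hides the bar. effective_disable is a hypothetical helper written for illustration, not a tqdm or datasets function, and the sketch assumes the bar writes to the redirected stream.

```python
import contextlib
import io
import sys
from typing import Optional, TextIO


def effective_disable(disable: Optional[bool], file: TextIO) -> bool:
    # Hypothetical helper mirroring tqdm's rule for disable=None:
    # the bar is disabled only if the stream has isatty() and it returns False.
    if disable is None:
        return hasattr(file, "isatty") and not file.isatty()
    # An explicit True/False is used as-is, with no TTY detection.
    return bool(disable)


with contextlib.redirect_stdout(io.StringIO()):
    redirected = sys.stdout  # the StringIO that replaced stdout

print(effective_disable(None, redirected))   # True: auto-detection hides the bar
print(effective_disable(False, redirected))  # False: an explicit False keeps it
```

An explicit False skips the TTY check entirely, which is what the fix below relies on.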

Solution

Changed disable=len(state["_data_files"]) <= 16 or None to disable=len(state["_data_files"]) <= 16. For datasets with more than 16 files, disable is now False instead of None, so the progress bar is shown regardless of stdout redirection.
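A quick way to see the difference between the two expressions is to evaluate them for a few shard counts. The function names here are hypothetical; only the expressions come from the PR.

```python
def old_disable(n_files: int):
    # Original expression: True for <= 16 files, otherwise `False or None` -> None
    return n_files <= 16 or None


def new_disable(n_files: int) -> bool:
    # Changed expression: True for <= 16 files, otherwise False
    return n_files <= 16


for n in (8, 16, 17, 100):
    print(n, old_disable(n), new_disable(n))
# 8 True True
# 16 True True
# 17 None False
# 100 None False
```

Because `False or None` evaluates to None, the original expression fell back to tqdm's TTY auto-detection exactly in the case where the bar was supposed to appear.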

Testing

Verified that progress bars now appear correctly both with and without stdout redirection for datasets with >16 shards.

Fixes huggingface#7918. Changed disable=None to disable=False to prevent TTY
auto-detection from failing when stdout is redirected.
@lhoestq
Member

lhoestq commented Dec 29, 2025

this seems to contradict the comment that says

set disable=None rather than disable=False by default to disable progress bar when no TTY attached

I believe the right approach is to do the same as in huggingface/huggingface_hub#2698

@omarfarhoud
Author


Updated to check TQDM_POSITION=-1 to force-enable progress bars in cloud environments,
following the same pattern as huggingface_hub#2698.
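A minimal sketch of the TQDM_POSITION check described above, assuming the variable is compared as the string "-1" and that it only overrides disable=None (an explicit True is kept). resolve_disable is a hypothetical name; this is not the code from the PR or from huggingface_hub.

```python
import os
from typing import Optional


def resolve_disable(disable: Optional[bool] = None) -> Optional[bool]:
    # Hypothetical helper: TQDM_POSITION=-1 force-enables the bar, but only
    # when the caller left disable=None (i.e. asked for TTY auto-detection).
    if disable is None and os.environ.get("TQDM_POSITION") == "-1":
        return False
    # Otherwise keep the caller's value, including None (auto-detect).
    return disable


os.environ.pop("TQDM_POSITION", None)
print(resolve_disable(None))  # None: tqdm auto-detects the TTY
os.environ["TQDM_POSITION"] = "-1"
print(resolve_disable(None))  # False: bar forced on
print(resolve_disable(True))  # True: explicit disable is respected
```

Keeping an explicit True untouched means callers such as load_from_disk can still hide the bar for small datasets.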

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@lhoestq lhoestq left a comment


Cool! There are other uses of tqdm progress bars across the code base, and they all use the tqdm class in utils/tqdm.py. Could you apply this change to the tqdm class in utils/tqdm.py before we merge? This way we make sure all progress bars in datasets have the same behavior.

@omarfarhoud
Author

Moved the TQDM_POSITION check to the tqdm class in utils/tqdm.py so all progress bars
in the codebase have consistent behavior. Thanks for the suggestion!

@omarfarhoud omarfarhoud requested a review from lhoestq December 29, 2025 17:13
@omarfarhoud
Author

@lhoestq thanks again for the suggestion. I’ve applied it and everything should now be consistent across all tqdm usage. Happy to adjust anything else if needed.

Development

Successfully merging this pull request may close these issues.

datasets.load_from_disk doesn't show progress bar
