Skip to content

fix: make utils.tqdm thread-safe #3286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

dolfim-ibm
Copy link

Applied the same patch suggested in huggingface/datasets#7661.

Resolves #3285

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Contributor

@hanouticelina hanouticelina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dolfim-ibm for the PR! I left a comment. The test test_progress_bar_respects_group is currently failing because the new tqdm class overrides the original one and drops the custom init that handled the name= kwarg. this can fix it by re adding the original __init__.

Comment on lines 222 to 223
name = kwargs.pop("name", None) # do not pass `name` to `tqdm`
if are_progress_bars_disabled(name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason why dropping the original __init__ and __delattr__?
even when keeping the original __delattr__, del tqdm._lock will call SafeDelLockMeta.__delattr__ (which safeguards the class level race only).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are definitely correct, thanks for the suggestions. now the PR is updated.

@Wauplin
Copy link
Contributor

Wauplin commented Aug 7, 2025

Hey @dolfim-ibm @hanouticelina , moving the convo from #3285 (comment) to this PR directly.

I can confirm that I am able to reproduce the issue with your script from #3285 (comment) but without disabling the progress bars. I think disabling the progress bars is what fixes the issue in the reproducible script, not the content of this PR.

More precisely, when I run it a few times I get different behaviors:

  • it can get stuck on lock acquisition and I have to ctrl+c
  • it can raise a AttributeError on _lock
  • or it works

@dolfim-ibm can you confirm to me that you can reproduce the same behavior as mine, even with the changes from your PR? If so, wanna give it a look again on how to fix it? :)

@dolfim-ibm
Copy link
Author

This is my setup:

  • macOS 15.5
  • arm64
  • Python 3.12

I'm running the code in #3285 as

for i in {1..20}; do python run_threaded.py; done

My results:

  • Without metaclass patch (basically as in main), with disable_progress_bars():
    • Almost always getting AttributeError: type object 'tqdm' has no attribute '_lock'
    • Never an issue about getting stuck
  • With metaclass patch (this PR), with disable_progress_bars():
    • No issue at all
  • Without metaclass patch (basically as in main), without disable_progress_bars():
    • I got it stuck (need ctrl+c)
    • Getting into AttributeError: type object 'tqdm' has no attribute '_lock'
  • With metaclass patch (this PR), without disable_progress_bars():
    • I got it stuck (need ctrl+c),
    • Getting into AttributeError: type object 'tqdm' has no attribute '_lock'`

So, I think the PR is currently addressing one of the scenario, but it still fails on the other one.

@Wauplin
Copy link
Contributor

Wauplin commented Aug 7, 2025

Thanks for the extensive test. I agree, this seems to fix the case "concurrency issues + progress bars are disabled" (still wonder why to be honest 😄). But before moving forward / merging this PR, I would really prefer to have a fix for the main use case which is "concurrency issues + progress bars are enabled", which is the case most people will fall into. Disabling progress bars is quite a niche option at this stage. If not a fix, at least an idea of what's going wrong under the hood 😕

@dolfim-ibm
Copy link
Author

My last tests with 2dff573 seems to work also for the case with progressbars enabled.

The idea behind it is to use a "backup" _lock, in case the one which is needed is deleted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

utils.tqdm is not thread-safe
4 participants