
Incorrect num_workers warning #15572

@ZhaofengWu

Bug description

This is technically not a bug but I couldn't find a more appropriate tag.

https://github.com/Lightning-AI/lightning/blob/d5ffdfac2ad4c8ff87c9895555fc21ccc482c18c/src/pytorch_lightning/trainer/connectors/data_connector.py#L222-L230
warns if the number of DataLoader workers is less than the number of CPUs on the machine. This ignores the case where the process is assigned to only a subset of those CPUs (e.g., via CPU affinity). I think the check in PyTorch is more reasonable:
https://github.com/pytorch/pytorch/blob/78a0ca29d939fc3017c3281730ba19ece5162f5c/torch/utils/data/dataloader.py#L532-L546
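A fix could mirror PyTorch's check and prefer the scheduler affinity mask over the raw CPU count when deciding how many workers to suggest. A minimal sketch, assuming a hypothetical helper (the name `_num_cpus_available` is mine, not Lightning's actual API):

```python
import os


def _num_cpus_available() -> int:
    """Return the number of CPUs this process can actually use.

    Prefer the scheduler affinity mask (Linux-only) over the raw
    machine CPU count, following the check in torch's DataLoader.
    """
    if hasattr(os, "sched_getaffinity"):
        try:
            return len(os.sched_getaffinity(0))
        except OSError:
            pass
    return os.cpu_count() or 1
```

The warning could then suggest `_num_cpus_available()` workers instead of `os.cpu_count()`, so a process pinned to a few CPUs is not told to spawn 80 workers.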

How to reproduce the bug

Launch any training process where `len(os.sched_getaffinity(0)) < os.cpu_count()`.
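On Linux, the affinity can be restricted with `taskset -c 0,1 python train.py`, or from within the process via `os.sched_setaffinity`; a minimal sketch (pinning to 2 CPUs is an arbitrary choice for illustration):

```python
import os

# Pin this process to a subset of CPUs (Linux-only) so that
# len(os.sched_getaffinity(0)) < os.cpu_count().
os.sched_setaffinity(0, {0, 1})
assert len(os.sched_getaffinity(0)) < os.cpu_count()

# Any subsequent Trainer.fit(...) with a small num_workers now emits
# the warning below, suggesting os.cpu_count() workers even though
# only 2 CPUs are available to this process.
```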

Error messages and logs

pytorch_lightning/trainer/connectors/data_connector.py:236: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 80 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.

Environment


- Lightning Component: Trainer
- PyTorch Lightning Version: 1.7.7
- PyTorch Version: 1.12.1
- Python version: 3.10
- OS: Linux


cc @Borda @justusschock @awaelchli

Metadata

Labels

data handling (Generic data-related topic), feature (Is an improvement or enhancement), good first issue (Good for newcomers), pl (Generic label for PyTorch Lightning package)
