Labels: data handling, feature, good first issue, pl
Bug description
This is technically not a bug, but I couldn't find a more appropriate tag.
https://github.com/Lightning-AI/lightning/blob/d5ffdfac2ad4c8ff87c9895555fc21ccc482c18c/src/pytorch_lightning/trainer/connectors/data_connector.py#L222-L230
complains if the number of dataloader workers is less than the number of CPUs on the machine. This ignores the case where the process has been assigned only a subset of the CPUs (e.g. pinned via CPU affinity). I think the check in PyTorch is more reasonable:
https://github.com/pytorch/pytorch/blob/78a0ca29d939fc3017c3281730ba19ece5162f5c/torch/utils/data/dataloader.py#L532-L546
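For comparison, a minimal sketch of an affinity-aware CPU count along the lines of what the PyTorch check falls back to (the helper name `suggested_max_workers` is hypothetical, not an existing Lightning or PyTorch API):

```python
import os


def suggested_max_workers() -> int:
    # Prefer the CPU affinity mask of the current process (Linux only)
    # over the machine-wide count, so a process pinned to a subset of
    # CPUs is not told to spawn workers for all of them.
    if hasattr(os, "sched_getaffinity"):
        try:
            return len(os.sched_getaffinity(0))
        except OSError:
            pass
    # Fall back to the total CPU count when affinity is unavailable.
    return os.cpu_count() or 1
```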
How to reproduce the bug
Launch any training process where `len(os.sched_getaffinity(0)) < os.cpu_count()`.
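A quick way to confirm the mismatch from inside the training process (a sketch assuming Linux, where `os.sched_getaffinity` is available, e.g. under cgroups, Slurm, or `taskset`):

```python
import os

# Lightning compares num_workers against os.cpu_count(), not against the
# affinity mask, so the warning fires even when the process is pinned to
# a subset of CPUs.
affinity = len(os.sched_getaffinity(0))
total = os.cpu_count()
print(f"CPUs assigned to this process: {affinity}, CPUs on machine: {total}")
assert affinity < total  # the condition that triggers the misleading suggestion
```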
Error messages and logs
pytorch_lightning/trainer/connectors/data_connector.py:236: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 80 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
Environment
- Lightning Component: Trainer
- PyTorch Lightning Version: 1.7.7
- PyTorch Version: 1.12.1
- Python version: 3.10
- OS: Linux
More info
No response