BlockingIOError: [Errno 11] Resource temporarily unavailable #25705
Replies: 10 comments 1 reply
-
Hello, could you please provide more information and steps to reproduce this problem? I think this issue should be reported in the OpenBLAS repository.
-
I am running a podman container using
-
Is there some extra flag that I need to pass for distributed training inside the container?
-
I am not aware of any specific flags for distributed training. You can try to use
-
Let me try this and update you. Thanks!
-
Also, there could be a problem with the container's memory limit.
-
The host machine may also limit the number of processes that can run, which could cause this as well.
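One way to check the per-user process limit is Python's standard `resource` module. The snippet below is a minimal sketch, assuming a Linux host where `RLIMIT_NPROC` is available; the helper names `describe_limit` and `nproc_limits` are made up for illustration. When a process or thread limit is hit, `fork()` fails with `EAGAIN` (errno 11), which Python surfaces as `BlockingIOError`.

```python
import resource


def describe_limit(value: int) -> str:
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def nproc_limits() -> tuple:
    """Return the (soft, hard) per-user process limits as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"RLIMIT_NPROC soft={soft} hard={hard}")
```

Running this both on the host and inside the container shows whether the two environments see different limits.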
-
I checked, and the host machine has no such limitations. I will look into the memory constraint you mentioned.
-
By default, Podman sets a PID limit of 2048 in the container's cgroup. Do you exceed that?
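To see how close the container is to that limit, you can read the pids controller files from inside it. This is a rough sketch, assuming a cgroup v2 layout where `pids.current` and `pids.max` sit directly under `/sys/fs/cgroup`; on cgroup v1 they usually live under a `pids` subdirectory instead. The function names are illustrative only.

```python
from pathlib import Path
from typing import Optional, Tuple


def parse_pids_max(text: str) -> Optional[int]:
    """Parse the contents of a pids.max file; 'max' means no limit."""
    value = text.strip()
    return None if value == "max" else int(value)


def read_pids_usage(cgroup_dir: str = "/sys/fs/cgroup") -> Tuple[int, Optional[int]]:
    """Return (current, limit) for the cgroup mounted at cgroup_dir.

    Assumes cgroup v2, where both files are directly in cgroup_dir.
    """
    base = Path(cgroup_dir)
    current = int((base / "pids.current").read_text().strip())
    limit = parse_pids_max((base / "pids.max").read_text())
    return current, limit


if __name__ == "__main__":
    try:
        current, limit = read_pids_usage()
    except OSError as exc:
        # The files are missing on cgroup v1 or in the root cgroup.
        print(f"could not read pids controller files: {exc}")
    else:
        shown = "unlimited" if limit is None else limit
        print(f"pids.current={current} pids.max={shown}")
```

Threads count toward this limit as well as processes, so a training job with many data-loader workers and BLAS threads can reach it well before the process count alone would suggest.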
-
For anyone else who runs into this issue: I added "--pids-limit -1" to our Podman container, and it resolved the errors I was seeing.
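If the container is launched from a script, the flag can be added to the argument list. The following is only a sketch: the image name and training command are hypothetical placeholders, and `build_podman_run` is an illustrative helper, not part of any Podman tooling. A value of `-1` removes the PID limit.

```python
import shlex
from typing import List


def build_podman_run(image: str, command: List[str], pids_limit: int = -1) -> List[str]:
    """Build a 'podman run' argument list with an explicit PID limit.

    A pids_limit of -1 asks Podman for an unlimited number of PIDs.
    """
    return ["podman", "run", "--rm", "--pids-limit", str(pids_limit), image, *command]


if __name__ == "__main__":
    # Hypothetical image and command, shown only to illustrate the flag.
    argv = build_podman_run("localhost/my-training-image", ["python", "train.py"])
    print(shlex.join(argv))
    # To actually start the container: subprocess.run(argv, check=True)
```

Removing the limit entirely is the bluntest option; passing a larger finite value instead keeps some protection against runaway process creation.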
-
Issue Description
I am running a privileged Podman container on my GPU machine. When I try to run model training, it first says
Steps to reproduce the issue
Describe the results you received
Describe the results you expected
podman info output
Podman in a container
No
Privileged Or Rootless
Privileged
Upstream Latest Release
Yes
Additional environment details
Additional information