Deal with PermissionError when setting up the working_dir #21690
lldelisle wants to merge 1 commit into galaxyproject:dev
    try:
        self._setup_working_directory(job=job)
    except PermissionError:
        log.warning("Could not setup the working directory")
That has to fail, though. We cannot silently ignore this.
Note also that for the traceback you provided:
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: galaxy.jobs.handler ERROR 2025-12-19 15:30:18,228 [pN:handler_0,p:895201,tN:MainThread] Error while recovering job 171 during application startup.
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: Traceback (most recent call last):
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/handler.py", line 312, in __check_jobs_at_startup
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: self._check_job_at_startup(job)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/handler.py", line 359, in _check_job_at_startup
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: job_wrapper = self.__recover_job_wrapper(job)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/handler.py", line 365, in __recover_job_wrapper
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: job_wrapper = self.job_wrapper(job)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/handler.py", line 297, in job_wrapper
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: return JobWrapper(job, self, use_persisted_destination=use_persisted_destination)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/__init__.py", line 2776, in __init__
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: super().__init__(
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/__init__.py", line 1042, in __init__
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: self._setup_working_directory(job=job)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/__init__.py", line 1340, in _setup_working_directory
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: safe_makedirs(self.tool_working_directory)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/util/path/__init__.py", line 137, in safe_makedirs
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: makedirs(path)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/usr/lib64/python3.9/os.py", line 225, in makedirs
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: mkdir(name, mode)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: PermissionError: [Errno 13] Permission denied: '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/171/working'
safe_makedirs first checks whether the directory exists, so either the directory no longer exists, in which case this is bad one way or the other, or the galaxy user can't see the directory, in which case I think all jobs should fail?
I guess my question then is: can your galaxy user see the directory?
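To make the second case concrete, here is a minimal Python sketch of how a check-then-create helper can end up raising PermissionError for a directory that already exists. `check_then_create` is a hypothetical stand-in written for this illustration, not Galaxy's `safe_makedirs`, and the demo drops the search (x) bit for the owner of a temporary directory to imitate what "other" sees on the job directory. It assumes a POSIX filesystem and a non-root user, since root bypasses these permission checks.

```python
import os
import stat
import tempfile


def check_then_create(path):
    """Hypothetical check-then-create helper (an assumption, not Galaxy's code)."""
    if not os.path.exists(path):
        os.makedirs(path)


def demo():
    """Return (exists_seen, error_name) for a child of a directory lacking the x bit."""
    with tempfile.TemporaryDirectory() as root:
        job_dir = os.path.join(root, "188")
        working = os.path.join(job_dir, "working")
        os.makedirs(working)
        # Read bit only: entry names can be listed, but entries cannot be stat'ed.
        os.chmod(job_dir, stat.S_IRUSR)
        error_name = None
        try:
            # os.path.exists swallows the OSError from stat() and reports False.
            exists_seen = os.path.exists(working)
            try:
                check_then_create(working)
            except PermissionError as exc:
                error_name = type(exc).__name__
        finally:
            # Restore permissions so the temporary directory can be removed.
            os.chmod(job_dir, stat.S_IRWXU)
        return exists_seen, error_name


if __name__ == "__main__":
    print(demo())
```

Run as a regular user, the existence check reports False even though the directory is there, so the helper falls through to `os.makedirs`, whose `mkdir` call is then denied, which is the same shape as the traceback above.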
My galaxy user is usr_m_galaxy1 and currently does not belong to the group unige. When I run the job:
$ ls -alh /srv/beegfs/scratch/shares/galaxy/common/jobs/000/
total 1.5K
drwxr-xr-- 3 usr_m_galaxy1 hpc_users 1 Jan 30 08:36 .
drwxr-xr-- 3 usr_m_galaxy1 hpc_users 1 Jan 30 08:35 ..
drwxr-xr-- 12 delislel unige 21 Jan 30 08:36 188
$ ls -alh /srv/beegfs/scratch/shares/galaxy/common/jobs/000/188
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/galaxy_188.o': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/outputs': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/__instrument_core_container': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/working': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/tmp': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/__instrument_core_galaxy_memory_mb': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/188.jt_json': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/metadata': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/_outputs': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/.': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/..': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/galaxy_188.e': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/__instrument_core_epoch_start': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/__instrument_core_galaxy_slots': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/home': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/inputs': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/galaxy_188.sh': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/memory_statement.log': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/_working': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/_configs': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/tool_script.sh': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/configs': Permission denied
total 0
d????????? ? ? ? ? ? .
d????????? ? ? ? ? ? ..
-????????? ? ? ? ? ? 188.jt_json
d????????? ? ? ? ? ? _configs
d????????? ? ? ? ? ? configs
-????????? ? ? ? ? ? galaxy_188.e
-????????? ? ? ? ? ? galaxy_188.o
-????????? ? ? ? ? ? galaxy_188.sh
d????????? ? ? ? ? ? home
d????????? ? ? ? ? ? inputs
-????????? ? ? ? ? ? __instrument_core_container
-????????? ? ? ? ? ? __instrument_core_epoch_start
-????????? ? ? ? ? ? __instrument_core_galaxy_memory_mb
-????????? ? ? ? ? ? __instrument_core_galaxy_slots
-????????? ? ? ? ? ? memory_statement.log
d????????? ? ? ? ? ? metadata
d????????? ? ? ? ? ? _outputs
d????????? ? ? ? ? ? outputs
d????????? ? ? ? ? ? tmp
-????????? ? ? ? ? ? tool_script.sh
d????????? ? ? ? ? ? _working
d????????? ? ? ? ? ? working
And indeed os.path.exists is False...
I guess the best option would be to create the job dir with drwx--x--x instead of drwxr-xr--, no?
I guess this is controlled by the umask of gravity. I should set it to 022 to make sure the galaxy user (other) can always check whether the working directory exists, right?
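As a rough check of the permission arithmetic, the sketch below compares the observed mode, the proposed mode, and the mode a umask of 022 would yield. It assumes the directory is requested with mode 0o777 (the default of Python's `os.makedirs`); whether Galaxy requests exactly that here is an assumption. The helper names are hypothetical and exist only for this illustration.

```python
import stat


def dir_mode_after_umask(umask, requested=0o777):
    """Permission bits a new directory gets once `requested` is filtered by `umask`."""
    return requested & ~umask


def others_can_traverse(mode):
    """True if 'other' has the search (x) bit, which stat() on entries requires."""
    return bool(mode & stat.S_IXOTH)


for label, mode in [
    ("observed job dir", 0o754),
    ("proposed in the comment", 0o711),
    ("umask 022 on a 0o777 request", dir_mode_after_umask(0o022)),
]:
    # stat.filemode renders the bits the way `ls -l` does.
    print(label, stat.filemode(stat.S_IFDIR | mode), others_can_traverse(mode))
```

Under these assumptions, both drwx--x--x and the drwxr-xr-x produced by umask 022 give "other" the search bit, while the observed drwxr-xr-- does not. The mode only helps if it is preserved when ownership of the job directory is changed.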
Or I could modify the script external_chown_script.py to use a group that both users belong to.
See #21496