-
Notifications
You must be signed in to change notification settings - Fork 27
Description
Many of the jobs hash functions modify kwargs inplace. E.g.:
class CountNgramsJob(Job):
...
@classmethod
def hash(cls, kwargs):
"""delete the queue requirements from the hashing"""
del kwargs["mem_rqmt"]
del kwargs["cpu_rqmt"]
del kwargs["time_rqmt"]
del kwargs["fs_rqmt"]
return super().hash(kwargs)We call this in JobSingleton.__call__:
...
sis_hash = cls._sis_hash_static(parsed_args) # calls cls.hash(parsed_args)
...
if tags is None:
tags = set()
for p in tools.extract_paths(parsed_args):
tags.update(p.tags)
...
job._sis_init(*copy.deepcopy((args, kwargs, parsed_args)))And in Job._sis_init:
...
self._sis_kwargs = parsed_args
...
if gs.AUTO_SET_JOB_INIT_ATTRIBUTES:
self.set_attrs(parsed_args)There are some usages of _sis_kwargs. E.g. in _sis_setup_directory:
for key, value in sorted(self._sis_kwargs.items()):
try:
f.write("PARAMETER: %s: %s\n" % (key, value))I actually wanted to use _sis_kwargs to recalculate the hash, i.e. again to call job.hash(job._sis_kwargs). However, that doesn't work for such hash functions which modify the dict, which remove entries from it, because then you will get exceptions when it tries to remove it again.
I have a workaround for this:
class _DictLazyPop(dict):
"""
Ignores pop or __delitem__ exceptions if key not present.
We use this because the _sis_kwargs that is stored in the job
sometimes has already some of the keys popped
(see JobSingleton.__call__).
"""
def pop(self, k, d=None):
if k in self:
return super().pop(k)
return d
def __delitem__(self, key):
if key in self:
super().__delitem__(key)Then I can call job.hash(_DictLazyPop(job._sis_kwargs)). That hack seems to work for all instances I have tested so far. But it's an ugly hack.
I would actually suggest to change the code in JobSingleton.__call__, to do a copy of parsed_args before we pass it to _sis_hash_static. I guess a shallow copy is even enough. So e.g. using this code:
sis_hash = cls._sis_hash_static(parsed_args.copy())But I'm not sure this maybe causes some unexpected changes in behavior. E.g., the info we dump in _sis_setup_directory now excludes those parameters which were removed for the hashing (e.g. in that CountNgramsJob example). But maybe that doesn't really matter.