Skip to content

Job _sis_hash_static / hash modifies inplace, alters _sis_kwargs #274

@albertz

Description

@albertz

Many of the jobs hash functions modify kwargs inplace. E.g.:

class CountNgramsJob(Job):
     ...

    @classmethod
    def hash(cls, kwargs):
        """delete the queue requirements from the hashing"""
        del kwargs["mem_rqmt"]
        del kwargs["cpu_rqmt"]
        del kwargs["time_rqmt"]
        del kwargs["fs_rqmt"]
        return super().hash(kwargs)

We call this in JobSingleton.__call__:

...
sis_hash = cls._sis_hash_static(parsed_args)  # calls cls.hash(parsed_args)
...
if tags is None:
    tags = set()
    for p in tools.extract_paths(parsed_args):
        tags.update(p.tags)
...
job._sis_init(*copy.deepcopy((args, kwargs, parsed_args)))

And in Job._sis_init:

...
self._sis_kwargs = parsed_args
...
if gs.AUTO_SET_JOB_INIT_ATTRIBUTES:
    self.set_attrs(parsed_args)

There are some usages of _sis_kwargs. E.g. in _sis_setup_directory:

for key, value in sorted(self._sis_kwargs.items()):
    try:
        f.write("PARAMETER: %s: %s\n" % (key, value))

I actually wanted to use _sis_kwargs to recalculate the hash, i.e. again to call job.hash(job._sis_kwargs). However, that doesn't work for such hash functions which modify the dict, which remove entries from it, because then you will get exceptions when it tries to remove it again.

I have a workaround for this:

class _DictLazyPop(dict):
    """
    Ignores pop or __delitem__ exceptions if key not present.
    We use this because the _sis_kwargs that is stored in the job
    sometimes has already some of the keys popped
    (see JobSingleton.__call__).
    """

    def pop(self, k, d=None):
        if k in self:
            return super().pop(k)
        return d

    def __delitem__(self, key):
        if key in self:
            super().__delitem__(key)

Then I can call job.hash(_DictLazyPop(job._sis_kwargs)). That hack seems to work for all instances I have tested so far. But it's an ugly hack.

I would actually suggest to change the code in JobSingleton.__call__, to do a copy of parsed_args before we pass it to _sis_hash_static. I guess a shallow copy is even enough. So e.g. using this code:

sis_hash = cls._sis_hash_static(parsed_args.copy())

But I'm not sure this maybe causes some unexpected changes in behavior. E.g., the info we dump in _sis_setup_directory now excludes those parameters which were removed for the hashing (e.g. in that CountNgramsJob example). But maybe that doesn't really matter.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions