Skip to content

instanciate_delayed (used in manager) potentially breaks hashes due to inplace operation #583

@albertz

Description

@albertz

I had such code:

train_exp('exp1', task, model_def, config=config)
train_exp('exp2', task, model_def, config=config)  # exactly same as exp1
train_exp('exp3', task, model_def, config=config)  # exactly same as exp1
train_exp('exp4', task, model_def, config=config2)  # different exp

What I observed was the very weird behavior that exp1 had a different hash from exp2/exp3. Exp2 and exp3 had the same hash. Exp4 had yet another hash.

After debugging this, I found the issue: Somewhere inside my setup code, I was calling instanciate_delayed (it was ok to instanciate the object at that point; this was only for serialization, and was wrapped then via serialization.NonhashedCode). Specifically, it looked like this:

extern_data_raw = instanciate_delayed(dataset.get_extern_data())

The dataset here comes from within the task.

Now, the dataset.get_extern_data() I used here was always return a shallow copy of an static prepared extern_data dict, which looked like this:

extern_data = {
    "data": {
        "dim_tags": [batch_dim, time_dim],
        "sparse_dim": vocab_dim,
        "vocab": {"class": "SentencePieces", "model_file": spm_file, ...},
    },
}

spm_file was a Path.
So instanciate_delayed would inplace modify extern_data["data"]["vocab"]["model_file"].

So that's why I got a different hash for exp2 than what I got for exp1.
That's also why exp2 and exp3 had the same hash: Further calls on instanciate_delayed did not change this anymore.

Unfortunately, after fixing this (e.g. using another implementation of instanciate_delayed which does not operate inplace; or making a deep copy in dataset.get_extern_data()): The hash of exp1 did not change (good). The hash of exp2 and exp3 became the same as exp1 (good). However, the hash of exp4 also changed. That makes sense, but that is somewhat unfortunate, because I have already made a lot of experiments, and when I fix this now, basically almost all my hashes change now!

So that's why I will leave it unfixed now... I will make a new train_exp_v2 for future use...

Anyway, what I want to highlight in this issue: instanciate_delayed can potentially be dangerous due to working inplace. I think we should never have implemented it that way. We should also not change this function now, as this can break hashes (as you see from my example). But we can put a big warning into its docstring, and provide maybe an alternative implementation which does not have this problem. For example:

def instanciate_delayed_copy(o: Any) -> Any:
    """
    Recursively traverses a structure and calls .get() on all
    existing Delayed Operations, especially Variables in the structure

    In contrast to :func:`i6_core.util.instanciate_delayed` this function does not operate inplace.

    :param o: nested structure that may contain DelayedBase objects
    :return: o with all DelayedBase objects replaced by their .get() value
    """
    import tree

    def _instanciate_delayed_obj(o: Any) -> Any:
        if isinstance(o, DelayedBase):
            return o.get()
        return o

    return tree.map_structure(_instanciate_delayed_obj, o)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions