-
Notifications
You must be signed in to change notification settings - Fork 24
Description
I had such code:
train_exp('exp1', task, model_def, config=config)
train_exp('exp2', task, model_def, config=config) # exactly same as exp1
train_exp('exp3', task, model_def, config=config) # exactly same as exp1
train_exp('exp4', task, model_def, config=config2) # different expWhat I observed was the very weird behavior that exp1 had a different hash from exp2/exp3. Exp2 and exp3 had the same hash. Exp4 had yet another hash.
After debugging this, I found the issue: Somewhere inside my setup code, I was calling instanciate_delayed (it was ok to instanciate the object at that point; this was only for serialization, and was wrapped then via serialization.NonhashedCode). Specifically, it looked like this:
extern_data_raw = instanciate_delayed(dataset.get_extern_data())The dataset here comes from within the task.
Now, the dataset.get_extern_data() I used here was always return a shallow copy of an static prepared extern_data dict, which looked like this:
extern_data = {
"data": {
"dim_tags": [batch_dim, time_dim],
"sparse_dim": vocab_dim,
"vocab": {"class": "SentencePieces", "model_file": spm_file, ...},
},
}spm_file was a Path.
So instanciate_delayed would inplace modify extern_data["data"]["vocab"]["model_file"].
So that's why I got a different hash for exp2 than what I got for exp1.
That's also why exp2 and exp3 had the same hash: Further calls on instanciate_delayed did not change this anymore.
Unfortunately, after fixing this (e.g. using another implementation of instanciate_delayed which does not operate inplace; or making a deep copy in dataset.get_extern_data()): The hash of exp1 did not change (good). The hash of exp2 and exp3 became the same as exp1 (good). However, the hash of exp4 also changed. That makes sense, but that is somewhat unfortunate, because I have already made a lot of experiments, and when I fix this now, basically almost all my hashes change now!
So that's why I will leave it unfixed now... I will make a new train_exp_v2 for future use...
Anyway, what I want to highlight in this issue: instanciate_delayed can potentially be dangerous due to working inplace. I think we should never have implemented it that way. We should also not change this function now, as this can break hashes (as you see from my example). But we can put a big warning into its docstring, and provide maybe an alternative implementation which does not have this problem. For example:
def instanciate_delayed_copy(o: Any) -> Any:
"""
Recursively traverses a structure and calls .get() on all
existing Delayed Operations, especially Variables in the structure
In contrast to :func:`i6_core.util.instanciate_delayed` this function does not operate inplace.
:param o: nested structure that may contain DelayedBase objects
:return: o with all DelayedBase objects replaced by their .get() value
"""
import tree
def _instanciate_delayed_obj(o: Any) -> Any:
if isinstance(o, DelayedBase):
return o.get()
return o
return tree.map_structure(_instanciate_delayed_obj, o)