BUG: Error when trying to serialize TaskDocument with vasp_objects in data store #770

@rpinsler

Describe the bug
We are using a jobflow data store to store band structure information in addition to the task document in the task store. For example, doc.output.vasp_objects looks as follows:

{<VaspObject.BANDSTRUCTURE: 'bandstructure'>: {'@class': 'BandStructureSymmLine',
  '@module': 'pymatgen.electronic_structure.bandstructure',
  'blob_uuid': <some_uuid>,
  'store': 'data'}}

This interacts with the field_validator of our JobStoreDocument (our own implementation, similar to the schema in jobflow):

    @field_validator("output", mode="before")
    @classmethod
    def reserialize_output(cls, v):
        if isinstance(v, dict) and "@module" in v and "@class" in v:
            v = MontyDecoder().process_decoded(v)
        return v

which tries to deserialize the BandStructureSymmLine entry but fails because the stored dict does not contain the keys that from_dict requires:

Traceback (most recent call last):
  File ".../debug_band_gap.py", line 6, in <module>
    query.run()
  File "CONDA_ENV/lib/python3.10/site-packages/bunnet/odm/queries/cursor.py", line 93, in run
    return self.to_list()
  File "CONDA_ENV/lib/python3.10/site-packages/bunnet/odm/queries/cursor.py", line 85, in to_list
    [
  File "CONDA_ENV/lib/python3.10/site-packages/bunnet/odm/queries/cursor.py", line 86, in <listcomp>
    parse_obj(projection, i, lazy_parse=self.lazy_parse)
  File "CONDA_ENV/lib/python3.10/site-packages/bunnet/odm/utils/parsing.py", line 110, in parse_obj
    result = parse_model(model, data)
  File "CONDA_ENV/lib/python3.10/site-packages/bunnet/odm/utils/pydantic.py", line 37, in parse_model
    return model_type.model_validate(data)
  File "CONDA_ENV/lib/python3.10/site-packages/pydantic/main.py", line 503, in model_validate
    return cls.__pydantic_validator__.validate_python(
  File "CONDA_ENV/lib/python3.10/site-packages/bunnet/odm/documents.py", line 206, in __init__
    super(Document, self).__init__(*args, **kwargs)
  File "CONDA_ENV/lib/python3.10/site-packages/pydantic/main.py", line 164, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
  File ".../schemas/documents/jobstore.py", line 65, in reserialize_output
    v = MontyDecoder().process_decoded(v)
  File "CONDA_ENV/lib/python3.10/site-packages/monty/json.py", line 809, in process_decoded
    d = {
  File "CONDA_ENV/lib/python3.10/site-packages/monty/json.py", line 810, in <dictcomp>
    k: self.process_decoded(v) for k, v in data.items()
  File "CONDA_ENV/lib/python3.10/site-packages/monty/json.py", line 876, in process_decoded
    return {
  File "CONDA_ENV/lib/python3.10/site-packages/monty/json.py", line 877, in <dictcomp>
    self.process_decoded(k): self.process_decoded(v) for k, v in d.items()
  File "CONDA_ENV/lib/python3.10/site-packages/monty/json.py", line 801, in process_decoded
    return cls_.from_dict(data)
  File "CONDA_ENV/lib/python3.10/site-packages/pymatgen/electronic_structure/bandstructure.py", line 623, in from_dict
    labels_dict = {k.strip(): v for k, v in dct["labels_dict"].items()}
KeyError: 'labels_dict'

This used to work, so I am a bit confused about what has changed. Is this an issue with the deserialization, or should the band structure data already be available at that point?
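
For illustration, here is a minimal sketch of a guard that would avoid the KeyError by skipping decoding whenever the payload still contains an unresolved data-store reference. It is not how jobflow or monty handle this. The helper names (is_blob_reference, contains_blob_reference, decode_unless_referenced) are hypothetical, and it assumes that every reference stub carries both the 'blob_uuid' and 'store' keys, as in the vasp_objects example above. The decoder is passed in as a callable so the sketch has no dependencies; in the validator it would be MontyDecoder().process_decoded.

```python
from typing import Any, Callable

# Assumption: a stub left behind for an object kept in an additional store
# always has these two keys (taken from the vasp_objects example above).
BLOB_KEYS = ("blob_uuid", "store")


def is_blob_reference(value: Any) -> bool:
    """Return True if value looks like a data-store reference stub."""
    return isinstance(value, dict) and all(key in value for key in BLOB_KEYS)


def contains_blob_reference(value: Any) -> bool:
    """Return True if value is, or contains at any depth, a reference stub."""
    if is_blob_reference(value):
        return True
    if isinstance(value, dict):
        return any(contains_blob_reference(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(contains_blob_reference(v) for v in value)
    return False


def decode_unless_referenced(value: Any, decode: Callable[[dict], Any]) -> Any:
    """Decode a serialized dict only if it holds no unresolved references.

    Anything that is not a serialized dict, or that still contains a
    reference stub, is returned unchanged.
    """
    is_serialized = isinstance(value, dict) and "@module" in value and "@class" in value
    if is_serialized and not contains_blob_reference(value):
        return decode(value)
    return value
```

In reserialize_output this would replace the direct call, e.g. `v = decode_unless_referenced(v, MontyDecoder().process_decoded)`. The trade-off is that a document with unresolved references stays a plain dict, so whether validation then passes depends on the type of the output field.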

To Reproduce
Steps to reproduce the behavior: n/a

Expected behavior
JobStoreDocument can be loaded (deserialized) without raising an error.

Screenshots
n/a
