np.int64 not suitable for scalar (nan value)? #334

@daviddanan

Describe the bug
When I tried to load a dataset that I had saved to disk, I could not retrieve a scalar correctly. After digging a bit, I found that this happens when the scalar is an np.int64.

To Reproduce

from pathlib import Path

from plaid import Sample
from plaid.containers.dataset import Dataset
from plaid.storage.writer import _check_folder

import numpy as np

if __name__ == "__main__":
    # Add the same value twice: once as np.int64, once as a plain Python int.
    sample = Sample()
    scalars = {"scalar1": np.int64(5), "scalar2": 5}
    for simulationScalarsName, simulationScalarsValue in scalars.items():
        sample.add_scalar(simulationScalarsName, simulationScalarsValue)

    # Scalars as held in memory, before saving.
    dataset = Dataset(samples=[sample])
    print(dataset.get_scalars_to_tabular())

    # Save to disk, reload, and print the scalars again.
    targetLocation = Path("RandomDir")
    _check_folder(targetLocation, True)
    dataset.save_to_dir(targetLocation)
    datasetReloaded = Dataset.load_from_file(targetLocation)
    print(datasetReloaded.get_scalars_to_tabular())

Expected behavior
This is the actual output (first line: before saving; second line: after reloading):
{'scalar1': array([5.]), 'scalar2': array([5.])}
{'scalar1': array([nan]), 'scalar2': array([5.])}

I expected the same values before and after saving the dataset. For now, my workaround is to cast the value to a float, to be as generic as possible:
sample.add_scalar(simulationScalarsName, float(simulationScalarsValue))
Would it be relevant to do that within the method itself, with proper error handling if the cast is not possible?
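
To make the question concrete, here is a minimal sketch of what such a cast with error handling could look like. The helper name to_float_scalar is hypothetical and not part of plaid. It assumes that scalars should always end up as Python floats, and that strings should be rejected even when float() could parse them.

```python
def to_float_scalar(name: str, value: object) -> float:
    """Hypothetical helper (not part of plaid): cast a scalar to float.

    Raises TypeError with the scalar name if the value cannot be cast.
    """
    # Assumption: strings are refused, even though float("5") would succeed.
    if isinstance(value, (str, bytes)):
        raise TypeError(
            f"scalar {name!r}: expected a number, got {type(value).__name__}"
        )
    try:
        # float() accepts any object implementing __float__ or __index__,
        # which covers Python ints and floats as well as NumPy scalar types.
        return float(value)
    except (TypeError, ValueError) as exc:
        raise TypeError(
            f"scalar {name!r}: cannot cast {type(value).__name__} to float"
        ) from exc


if __name__ == "__main__":
    print(to_float_scalar("scalar2", 5))
    try:
        to_float_scalar("bad", [1, 2])
    except TypeError as err:
        print(err)
```

If add_scalar called something like this on its input, np.int64(5) and 5 would both be stored as the same float type, and unsupported inputs would fail at insertion time instead of silently becoming nan after a reload.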

Desktop (please complete the following information):

  • OS: Windows 11
  • Version: current main
