Open
Labels: enhancement (New feature or request)
Description
The script in this description produces https://huggingface.co/datasets/fabiencasenave/Tensile2d_test, a dataset that is fully integrated with HF datasets.

Scalar and field features can be exported this way, but complete CGNS trees (with tags, boundary conditions, and so on) still require binary-like samples. Storing fields as `Sequence(Value("float"))` works, although its efficiency is an open question. HDF5 integration seems to be underway in huggingface/datasets#7743.

We cannot use `Array2D` (`from datasets.features import Array2D`) for the field types, since it imposes fixed-size arrays. The HDF5 integration also appears to be limited to fixed-size arrays, since the work seems to rely on `Array2D` and similar types: https://github.com/klamike/datasets/blob/2c4bfba70d525b0f9336b8e36b299d73d4a2f3e4/tests/packaged_modules/test_hdf5.py
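As an aside on the efficiency question, before the export script: HF datasets is backed by Apache Arrow, and Arrow's variable-size list layout stores a column as one contiguous values buffer plus an offsets array, so variable-length fields need no padding. The pure-Python sketch below only illustrates that layout. `pack_ragged` and `get_row` are hypothetical helper names, not part of `datasets` or `pyarrow`, and the real implementation differs in detail.

```python
from array import array


def pack_ragged(rows):
    """Pack variable-length float rows into a flat buffer plus offsets."""
    values = array("d")        # contiguous float64 buffer shared by all rows
    offsets = array("q", [0])  # row i spans values[offsets[i]:offsets[i + 1]]
    for row in rows:
        values.extend(row)
        offsets.append(len(values))
    return values, offsets


def get_row(values, offsets, i):
    """Recover row i from the flat buffer using two offset lookups."""
    return list(values[offsets[i]:offsets[i + 1]])


# Three "fields" of different lengths, including an empty one.
values, offsets = pack_ragged([[0.0, 1.0], [2.0, 3.0, 4.0], []])
```

The storage overhead in this layout is one offset per row, which suggests that a 1D variable-length field should be reasonably compact. Access cost through the Python API is a separate matter that this sketch does not measure.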
import datasets
from datasets import Features, Sequence, Value
from plaid.bridges import huggingface_bridge

# Load the existing binary-sample dataset and convert it to a PLAID dataset.
hf_dataset = datasets.load_dataset("PLAID-datasets/Tensile2d", split="all_samples")
dataset, pb_def = huggingface_bridge.huggingface_dataset_to_plaid(
    hf_dataset, processes_number=12, verbose=True
)

# Keep only the feature identifiers that carry a name.
all_feat_ids = dataset.get_all_features_identifiers()
all_feat_ids = [k for k in all_feat_ids if "name" in k]

# Scalars map to a float64 value, fields to a variable-length float64 sequence.
features = {}
for feat_id in all_feat_ids:
    if feat_id["type"] == "scalar":
        features[feat_id["name"]] = Value("float64")
    elif feat_id["type"] == "field":
        features[feat_id["name"]] = Sequence(Value("float64"))

_dict = {}
for split in ["train_500", "test", "OOD"]:

    def generator():
        # Yield one dict per sample of the current split.
        for sample_id in pb_def.get_split(split):
            sample = {}
            for feat_id in all_feat_ids:
                sample[feat_id["name"]] = dataset[sample_id].get_feature_from_identifier(feat_id)
            yield sample

    ds = datasets.Dataset.from_generator(
        generator,
        features=Features(features),
        num_proc=1,
        writer_batch_size=1,
        split=datasets.splits.NamedSplit(split),
    )
    _dict[split] = ds

datasets.DatasetDict(_dict).push_to_hub("fabiencasenave/Tensile2d_test")