
[Discussion] Rework dataset storage #19

@GuillaumeBroggi

Current implementation rationale for the is_store_dataset flag:

(1) The GNNGraphDataset class was implemented to handle datasets that do not fit into memory (they cannot be loaded all at once), while (2) GNNGraphDatasetInMemory was implemented to handle datasets that do fit into memory. However, at some point I realized that you may still want to keep a dataset stored as separate files and yet load it entirely into memory. Instead of asking users to load the files themselves, I added this flag for convenience.
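To make the two access patterns concrete, here is a minimal sketch. It is not the repository's code: `LazyFileDataset`, `write_samples`, `load_all`, and the `sample_*.pkl` naming are hypothetical, and `pickle` stands in for whatever serializer the project actually uses.

```python
import pickle
import tempfile
from pathlib import Path


class LazyFileDataset:
    """Hypothetical stand-in for a dataset stored as one file per sample."""

    def __init__(self, root):
        self.root = Path(root)
        # Sort so that indices map to files deterministically.
        self.files = sorted(self.root.glob("sample_*.pkl"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        # Only the requested sample is read from disk.
        with open(self.files[idx], "rb") as f:
            return pickle.load(f)

    def load_all(self):
        # Convenience path: read every per-sample file into one in-memory list.
        return [self[i] for i in range(len(self))]


def write_samples(samples, root):
    """Hypothetical helper: write each sample to its own file under `root`."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for i, sample in enumerate(samples):
        with open(root / f"sample_{i:05d}.pkl", "wb") as f:
            pickle.dump(sample, f)
```

With this layout, loading everything into memory needs no second copy on disk: `load_all()` simply reads the existing per-sample files.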

Implications:

  • The whole dataset is dumped when it is saved. This may be undesirable, since the data are written to disk twice (once as individual data files and once as the dumped dataset file).
  • Dumping all the data into a single file may still be a desirable feature for small datasets. In fact, many ML datasets are shared as a single .pt file.
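The double write described in the first point can be illustrated with a small sketch. The names `save_dataset`, `compare_disk_usage`, and `dump_whole` are hypothetical (the last one only mimics the role of the flag under discussion), and `pickle` is used in place of the project's real serializer.

```python
import pickle
import tempfile
from pathlib import Path


def save_dataset(samples, root, dump_whole=False):
    """Write one file per sample; optionally also dump the whole list.

    `dump_whole` is a hypothetical stand-in for the flag discussed above.
    Returns the total number of bytes written under `root`.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for i, sample in enumerate(samples):
        (root / f"sample_{i:05d}.pkl").write_bytes(pickle.dumps(sample))
    if dump_whole:
        # Second copy of the same data, bundled into a single file.
        (root / "dataset.pkl").write_bytes(pickle.dumps(samples))
    return sum(p.stat().st_size for p in root.iterdir())


def compare_disk_usage(samples):
    """Return (bytes without dump, bytes with dump) for the same samples."""
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        return save_dataset(samples, a), save_dataset(samples, b, dump_whole=True)
```

Calling `compare_disk_usage` on any non-empty list of samples should show the second figure exceeding the first, which is the storage overhead in question. Whether that overhead matters depends on dataset size, which is why a single bundled file can still be attractive for small datasets.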

Labels: question (Further information is requested)
