Questions about generating a dataset using make_dataset.py #2

@zzwei1

Hi, nice work!

I'm exploring the SEVIR dataset and want to generate a dataset with make_dataset.py.

I have modified make_dataset.py so that each event is split into 2 training samples.

After running make_dataset.py, I got files named nowcast_training_000.h5, nowcast_testing_000.h5, ..., nowcast_training_008.h5, nowcast_testing_008.h5, along with their corresponding xxx_META.csv files. (I left the parameter "n_chunks" at its default value of 8.)

However, I don't understand how these files relate to each other, and I have the following questions:

  1. Does xxx_000.h5 contain the same data as xxx_001.h5 and the other chunks, just in a different order, or does each file contain different data?

  2. Should I use a single file pair for training and testing (for example, nowcast_training_000.h5 for training and nowcast_testing_000.h5 for testing), use all of the files, or set the parameter "append" to "True" to write the 8 chunks into one training file and one testing file?
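
For question 1, one check I could run myself is to compare the META files of two chunks and see whether any events appear in both. Below is a minimal sketch of that idea, not something taken from the repository. It assumes each xxx_META.csv has a header row and a column that identifies an event; the column name `id` and the helper names `load_keys` and `shared_keys` are my own assumptions and may need to be changed to match the actual files.

```python
import csv


def load_keys(meta_csv, key="id"):
    """Return the set of values found in column `key` of a META CSV file.

    `key="id"` is an assumption; use whichever column identifies an event.
    """
    with open(meta_csv, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or key not in reader.fieldnames:
            raise KeyError(f"column {key!r} not found in {meta_csv}")
        return {row[key] for row in reader}


def shared_keys(meta_a, meta_b, key="id"):
    """Return the event keys present in both META files.

    An empty set means the two chunks have no events in common.
    """
    return load_keys(meta_a, key) & load_keys(meta_b, key)
```

Calling `shared_keys(path_a, path_b)` on the META files of two training chunks would return an empty set if the chunks hold different events, and a non-empty set if events are repeated across chunks.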

Thanks in advance !
