Skip to content

Avoid creating redundant datasets #74

@bcyphers

Description

@bcyphers

If enter_data() is called with the same train_path twice in a row and the data itself hasn't changed, a new Dataset does not need to be created.

We should add a column which stores some kind of hash of the actual data. When a Dataset would be created, if the metadata and data hash are exactly the same as an existing Dataset, nothing should be added to the ModelHub database and the existing Dataset should be returned instead.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions