Skip to content

motivation for HF datasets? #127

@arendu

Description

@arendu

The motivation behind these kind of "datasets" is very odd imo.

Why not just enforce a single dataset class - and call it a day! Anyone can write a simple script to download a dataset from HF and convert it to open-ai format, right? Also, there is almost zero usecase where you just take one dataset and train on it (at least to build high quality aligned models) its always a combination of a huge set of datasets, each with different formats etc. This is all data-munging work that a toolkit should not be entangled in.

Premature notions of "convenience" just end up being code debt.

Metadata

Metadata

Assignees

No one assigned

    Labels

    questionFurther information is requested

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions