Open
Description
I'm looking into how to convert my old DatasetConfig-based datasets to the new Dataset interface (#231).
What I'm missing:
- I want to combine train, dev, devtrain somehow together. This is what I would want for all training jobs. Should we provide a common data structure for this? `TrainingDatasets`?
- With a dataset always comes the extern data. Shouldn't this be part of the `Dataset` interface? Otherwise you must do this manually, and somehow infer it from the dataset? Or I would need some other extended structure `DatasetWithExternData` or so.
- Extern data would use dimension tags, and it's important that they would be shared among train/dev/devtrain. How would we do this?
- I'm building some setup pipeline for standard supervised training, i.e. for the pipeline I somewhere need to define which is the input and which is the output data key in extern data. This would also be in `TrainingDatasets`, or maybe there would be a more special variant `SupervisedTrainingDatasets`?
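
To make the questions above more concrete, here is a rough sketch of what such a structure could look like. Everything in it is hypothetical: `Dim`, `DataSpec` and `SimpleDataset` are simplified stand-ins (not the existing dimension tag, extern data or `Dataset` classes), and the fields and defaults of `TrainingDatasets` and `SupervisedTrainingDatasets` are just one possible design, not a proposal for the final API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True, eq=False)
class Dim:
    """Hypothetical stand-in for a dimension tag. Compared by identity (eq=False)."""
    name: str
    size: Optional[int] = None  # None means a dynamic size, e.g. time


@dataclass
class DataSpec:
    """Hypothetical stand-in for one entry in extern data."""
    dims: Tuple[Dim, ...]
    dtype: str = "float32"


@dataclass
class SimpleDataset:
    """Hypothetical stand-in for a dataset; it only carries its options."""
    opts: Dict[str, Any]


@dataclass
class TrainingDatasets:
    """Bundles train/dev/devtrain with ONE extern data definition.

    Because there is a single extern_data dict, the dimension tags in it
    are shared among all three datasets by construction.
    """
    train: SimpleDataset
    dev: SimpleDataset
    devtrain: SimpleDataset
    extern_data: Dict[str, DataSpec]

    def datasets(self) -> Dict[str, SimpleDataset]:
        return {"train": self.train, "dev": self.dev, "devtrain": self.devtrain}


@dataclass
class SupervisedTrainingDatasets(TrainingDatasets):
    """Additionally names the input and target keys in extern data."""
    input_key: str = "data"  # assumed default, for illustration only
    target_key: str = "classes"  # assumed default, for illustration only

    def __post_init__(self) -> None:
        # Fail early if a key does not exist in extern data.
        for key in (self.input_key, self.target_key):
            if key not in self.extern_data:
                raise ValueError(f"key {key!r} not in extern data")

    @property
    def input(self) -> DataSpec:
        return self.extern_data[self.input_key]

    @property
    def target(self) -> DataSpec:
        return self.extern_data[self.target_key]


# Example with placeholder options. The same Dim objects are used everywhere.
time_dim = Dim("time")
feature_dim = Dim("feature", 40)
example = SupervisedTrainingDatasets(
    train=SimpleDataset({"name": "train"}),
    dev=SimpleDataset({"name": "dev"}),
    devtrain=SimpleDataset({"name": "devtrain"}),
    extern_data={
        "data": DataSpec(dims=(time_dim, feature_dim)),
        "classes": DataSpec(dims=(time_dim,), dtype="int32"),
    },
)
```

In this sketch the extern data lives on the combined structure rather than on each dataset, which is one way to get shared dimension tags. Whether it should instead be part of the `Dataset` interface itself is exactly the open question.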