Description
@radekosmulski developed the examples for using the dataloaders with native TensorFlow/PyTorch: #47
I wonder how we recommend using the dataloaders without NVTabular. Without NVTabular we do not have a data schema, so all columns are treated as input features.
In particular, TensorFlow Keras expects the dataloader to output (x, y). Currently, @radekosmulski has added a parsing function: https://github.com/NVIDIA-Merlin/dataloader/blob/8157c7650248359201545934fd7d3e7f95b0eea8/examples/01a-Getting-started-Tensorflow.ipynb
```python
label_column = 'rating'

def process_batch(data, _):
    # Split the batch dict into input features (x) and the label (y).
    x = {col: data[col] for col in data.keys() if col != label_column}
    y = data[label_column]
    return (x, y)

loader._map_fns = [process_batch]
```
I think the parsing function will always be required, and therefore this is not the best user experience.
Previously, the dataloader accepted the arguments cat_names, cont_names, label_names.
I think there are multiple options:
- An easy tool to define a schema manually
- Enabling the parameters cat_names, cont_names, label_names, etc.
- Maybe there are more?
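
To make the second option more concrete, here is a minimal sketch of what a label_names-style helper could look like. It only generalizes the parsing function from the notebook. The name `make_label_splitter` is hypothetical and not part of the dataloader API, and the batch is assumed to be a plain dict keyed by column name.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Batch = Dict[str, Any]


def make_label_splitter(
    label_names: Sequence[str],
) -> Callable[[Batch, Any], Tuple[Batch, Any]]:
    """Build a map function that splits a batch dict into (x, y).

    Hypothetical helper for illustration only; it is not an existing
    dataloader function.
    """
    labels = list(label_names)
    if not labels:
        raise ValueError("label_names must contain at least one column")

    def split(data: Batch, _: Any) -> Tuple[Batch, Any]:
        missing = [name for name in labels if name not in data]
        if missing:
            raise KeyError(f"label columns not found in batch: {missing}")
        # Every column that is not a label is treated as an input feature.
        x = {col: value for col, value in data.items() if col not in labels}
        # A single label is returned directly, several labels as a dict.
        y = data[labels[0]] if len(labels) == 1 else {n: data[n] for n in labels}
        return x, y

    return split


# Plain lists stand in for tensors so the sketch runs without a GPU.
batch = {"user_id": [1, 2], "item_id": [10, 20], "rating": [4.0, 5.0]}
x, y = make_label_splitter(["rating"])(batch, None)
```

With something like this, the notebook workaround would shrink to `loader._map_fns = [make_label_splitter(["rating"])]`, though ideally the loader would accept label_names directly so users do not have to touch a private attribute.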