We should be able to use HuggingFace datasets directly in RETURNN.
I guess the most canonical way would be to write a RETURNN Dataset for this, maybe derived from CachedDataset2.
A separate, independent, more direct PyTorch dataset wrapper might also make sense. Or actually, I think that is not needed, as HuggingFace datasets can already be used directly with PyTorch?
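To make the RETURNN Dataset idea a bit more concrete, here is a rough, framework-free sketch. It only mirrors the two hooks that, as far as I understand, a CachedDataset2 subclass usually overrides (`init_seq_order` and `_collect_single_seq`). The class name `HuggingFaceDatasetWrapper`, the `data_keys` and `shuffle` options, and the returned dict layout are all hypothetical, and a plain list of dicts stands in for a map-style HuggingFace dataset. A real implementation would derive from CachedDataset2 and return whatever sequence type RETURNN expects.

```python
import random
from typing import Any, Dict, List, Optional, Sequence


class HuggingFaceDatasetWrapper:
    """Hypothetical sketch, not an existing RETURNN class."""

    def __init__(
        self,
        hf_dataset: Sequence[Dict[str, Any]],
        data_keys: List[str],
        shuffle: bool = False,
    ):
        # hf_dataset: anything with len() and integer indexing that returns
        # a dict per example (assumed interface of a map-style dataset).
        self.hf_dataset = hf_dataset
        self.data_keys = data_keys
        self.shuffle = shuffle
        self._seq_order: List[int] = []

    def init_seq_order(self, epoch: Optional[int] = None) -> bool:
        """Choose the order in which sequences are visited in this epoch."""
        self._seq_order = list(range(len(self.hf_dataset)))
        if self.shuffle:
            # Seed with the epoch so the order is reproducible per epoch.
            random.Random(epoch or 1).shuffle(self._seq_order)
        return True

    @property
    def num_seqs(self) -> int:
        return len(self._seq_order)

    def _collect_single_seq(self, seq_idx: int) -> Optional[Dict[str, Any]]:
        """Return one sequence as a dict, or None when past the end."""
        if seq_idx >= len(self._seq_order):
            return None
        example = self.hf_dataset[self._seq_order[seq_idx]]
        features = {key: example[key] for key in self.data_keys}
        return {"seq_idx": seq_idx, "features": features}


# Toy stand-in data; a real setup would pass a loaded HuggingFace dataset.
toy_data = [
    {"audio": [0.1, 0.2], "text": "hi"},
    {"audio": [0.3], "text": "yo"},
]
dataset = HuggingFaceDatasetWrapper(toy_data, data_keys=["audio", "text"])
dataset.init_seq_order(epoch=1)
first_seq = dataset._collect_single_seq(0)
```

The point of keeping the sequence order as a list of indices is that the wrapper never needs to copy or reorder the underlying data. Shuffling, partitioning into sub-epochs, or sorting by length would then only change that index list.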
(cc @dthulke @NeoLegends @robin-p-schmitt)