Open
Labels
enhancement: New feature or request
Description
🚀 Feature
Add a TokensLoaderWithMeta class that stores additional metadata in parallel with the tokens. It can be used to store data with a bit more structure than a flat sequence, such as image tokens. Here's an example:
{
"token": [1,2,3,4,5],
"token_x": [0,0,0,1,1],
"token_y": [0,1,2,0,1]
}
Notice they all have the same length.
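To make the request more concrete, here is a rough sketch of the behaviour I have in mind, written in plain Python rather than against the litdata API. The `iter_blocks` helper and the `Sample` alias are hypothetical names used only for illustration; the idea is that every field is sliced at the same offsets, so each token stays aligned with its metadata when a sequence is cut into fixed-size blocks.

```python
from typing import Dict, Iterator, List

# Hypothetical type for illustration: field name -> parallel list of ints.
Sample = Dict[str, List[int]]


def iter_blocks(sample: Sample, block_size: int) -> Iterator[Sample]:
    """Yield fixed-size blocks, slicing every field at the same offsets."""
    lengths = {len(values) for values in sample.values()}
    if len(lengths) != 1:
        raise ValueError("all fields must have the same length")
    (total,) = lengths
    # Drop the trailing remainder so every block has exactly block_size items.
    for start in range(0, total - block_size + 1, block_size):
        yield {
            name: values[start:start + block_size]
            for name, values in sample.items()
        }


sample = {
    "token": [1, 2, 3, 4, 5],
    "token_x": [0, 0, 0, 1, 1],
    "token_y": [0, 1, 2, 0, 1],
}
blocks = list(iter_blocks(sample, block_size=2))
```

With a block size of 2, the five-token example yields two blocks and drops the last token. How a real loader should treat the remainder (drop, pad, or carry over into the next sample) is an open design question, not something this sketch settles.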
Motivation
I've been using TokensLoader to train models and find it to be really handy. But it's unfortunately a bit difficult to use when I want to experiment with different positional encoding schemes.
Alternatives
The alternative is to create a normal LitDataset with these fields. But that is less efficient to store and load, and it makes packing samples harder.