How do I use only a portion of a dataset? #2280
-
I noticed that I cannot set a fractional
Replies: 2 comments 10 replies
-
I just looked at Trainer docs, and it seems that
Line 90 in 8fb72cb
You can use this config in your dataset section. For example, the config below would split the data into 10 shards and use only one of them, i.e. 10% of the data (10 pieces = 100%).
datasets:
  - path: fozziethebeat/alpaca_messages_2k_test
    type: chat_template
    shards: 10
-
The `split` parameter is enabled for local file datasets via #2281. So you can add something like `split: train[:10%]` to your dataset's config, and only 10% of the dataset will be preprocessed and loaded. Alternatively, you can set `shards` as discussed by @NanoCode012 here.

You can set a fractional epoch via #2282.
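For reference, here is a rough sketch of how the two suggestions might look together in a config. The local file path and the `ds_type` value are hypothetical placeholders, and `num_epochs` is assumed to be the key that accepts a fractional value once #2282 is available in your version; only `split: train[:10%]` is taken directly from the comment above.

```yaml
datasets:
  - path: ./data/my_dataset.jsonl   # hypothetical local file, replace with your own
    ds_type: json                   # assumption: loader type for a local JSON Lines file
    type: chat_template
    split: train[:10%]              # preprocess and load only the first 10% of the rows

num_epochs: 0.5                     # assumption: fractional epochs need a version that includes #2282
```

Use either `split` or `shards` on a given dataset entry, not both, so it stays obvious which fraction of the data ends up in training.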