How to get a percentage of the training data or a certain count of samples? #2976
I plan to use axolotl to fine-tune a model, but I don't want to train on the entire split of a Hugging Face dataset. I am using two datasets and want to use 80% of one and 20% of the other or, better yet, specify the number of rows to take from each. Is there a way to precisely control how much training data I use and how the datasets are mixed?
Answered by
NanoCode012
Jul 24, 2025
Answer selected by
NanoCode012
Hey, for splits, HF natively supports slice syntax such as
split: train[:80%]
or
split: train[:3000]
etc. Does this satisfy what you need? https://huggingface.co/docs/datasets/loading#slice-splits
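For the two-dataset case in the question, the slice syntax above could be applied per dataset entry. The following is only a sketch: the dataset paths are hypothetical placeholders, and the `type` value is an assumption that should be replaced with whatever prompt format matches your data.

```yaml
# Sketch only: paths and `type` values are placeholders, not taken from this thread.
datasets:
  - path: your-org/dataset-a   # hypothetical dataset name
    type: alpaca               # assumed prompt format; pick the one that fits your data
    split: train[:80%]         # first 80% of the train split
  - path: your-org/dataset-b   # hypothetical dataset name
    type: alpaca               # assumed prompt format
    split: train[:3000]        # first 3000 rows of the train split
```

Note that a slice takes rows in their stored order rather than drawing a random sample, so `train[:3000]` is the first 3000 rows. The linked Hugging Face page also documents ranges such as `train[10%:30%]` if you need a different portion.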