How to get a percentage of the training data or a certain count of samples? #2976

medemi68 · 2025-07-24T11:12:14Z

I plan on using axolotl to finetune a model, but I don't want to train on an entire split of a Hugging Face dataset. I am using two datasets, and I want to use 80% of one and 20% of the other, or better yet, specify the exact number of rows from each. Is there a way to precisely control how much training data I use and how the datasets are mixed?

Answered by NanoCode012

NanoCode012 (Maintainer) · 2025-07-24T11:16:50Z

Hey, for splits, HF natively supports slicing, e.g. `split: train[:80%]` for a percentage or `split: train[:3000]` for a row count. Does this satisfy what you need? https://huggingface.co/docs/datasets/loading#slice-splits
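As a rough sketch of how this might look in an axolotl config for the two-dataset case in the question: the dataset paths and the `type` values below are placeholders (assumptions, not from this thread), and only the `split` values illustrate the slicing syntax mentioned above.

```yaml
datasets:
  # Hypothetical dataset name; takes the first 80% of its train split
  - path: your-org/dataset-a
    type: alpaca            # placeholder; use the type that matches your data
    split: "train[:80%]"
  # Hypothetical dataset name; takes the first 3000 rows of its train split
  - path: your-org/dataset-b
    type: alpaca            # placeholder
    split: "train[:3000]"
```

Each slice applies to its own dataset's split, so a percentage is relative to that dataset's size, not to the combined mix. The values are quoted here only to keep the brackets and `%` unambiguous to a YAML parser.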

2 replies

medemi68 Aug 14, 2025
Author

I think that might work. If possible it would be great if this could be reflected somewhere in the documentation, unless it already is and I missed it.

NanoCode012 Aug 14, 2025
Maintainer

It is, but it's fairly buried, I'd say. The link above is referenced from here: https://docs.axolotl.ai/docs/dataset_loading.html#loading-datasets

If you're interested, you can make a small PR to that section to add an explanation of how to slice a split :)
