fine tuning on custom dataset taking ages to progress #247
Closed · bijucyborg started this conversation in General · 1 comment
---
OK, it was the parent id matching that was causing the training to be slow. Since the oasst dataset also uses parent id matching, I would need to investigate why fine-tuning with my dataset is so much slower than with oasst. I made up the id and parent id columns based on some assumptions, so there is a high likelihood that something is wrong there.
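For anyone hitting the same thing, here is a minimal sketch of a sanity check for the id / parent_id columns (the file name and column names are placeholders, not the actual notebook's; adjust to the real dataset). It looks for dangling parent references, duplicate ids, and unexpectedly long or cyclic parent chains, any of which could make parent id matching slow:

```python
# Sanity-check the id / parent_id columns of a conversational dataset.
# File name and column names are placeholders -- adjust to the real dataset.
import pandas as pd

df = pd.read_csv("my_dataset.csv")

ids = set(df["id"])
parents = df["parent_id"].dropna()

# Every parent_id should point at an existing id.
dangling = parents[~parents.isin(ids)]
print(f"dangling parent ids: {len(dangling)}")

# Ids should be unique, otherwise each parent lookup can match many rows.
print(f"duplicate ids: {df['id'].duplicated().sum()}")

# Walk each row up its parent chain to spot cycles or very long chains.
parent_of = dict(zip(df["id"], df["parent_id"]))
max_depth = 0
for start in df["id"]:
    seen, node, depth = set(), start, 0
    while node in parent_of and pd.notna(parent_of[node]):
        if node in seen:
            print(f"cycle detected starting at id {start}")
            break
        seen.add(node)
        node = parent_of[node]
        depth += 1
    max_depth = max(max_depth, depth)
print(f"longest parent chain: {max_depth} hops")
```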
---
Hi,
So I powered up an Ubuntu VM with GPU access:
GPU #1 - current utilization: 0.0% - VRAM usage: 4.1 GB / 16.0 GB - NVIDIA A16-16Q
GPU #2 - current utilization: 0.0% - VRAM usage: 4.1 GB / 16.0 GB - NVIDIA A16-16Q
I then created a dataset according to the specifications:
https://www.kaggle.com/code/bijucyborg/amzn-top-cellphones-q-a-for-h2o-llm-studio
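Roughly the structure I was aiming for is sketched below (a simplified sketch only, not the actual notebook code; the column names `instruction` / `output` and the chaining convention for `parent_id` are my assumptions modelled on the oasst sample that ships with H2O LLM Studio):

```python
# Simplified sketch of building an oasst-style id / parent_id structure.
# Column names ("instruction", "output") and the chaining convention are
# assumptions modelled on the oasst sample; adjust to the real specification.
import uuid
import pandas as pd

conversations = [
    # Each inner list is one conversation, ordered by turn.
    [
        {"question": "Does this phone support 5G?", "answer": "Yes, sub-6 GHz 5G is supported."},
        {"question": "And does it have dual SIM?", "answer": "Yes, dual nano-SIM."},
    ],
    [
        {"question": "What is the battery capacity?", "answer": "4500 mAh."},
    ],
]

rows = []
for convo in conversations:
    parent_id = None                     # first turn of a conversation has no parent
    for turn in convo:
        turn_id = str(uuid.uuid4())
        rows.append({
            "id": turn_id,
            "parent_id": parent_id,      # link this turn to the previous one
            "instruction": turn["question"],
            "output": turn["answer"],
        })
        parent_id = turn_id              # next turn chains onto this one

pd.DataFrame(rows).to_csv("amzn_cellphones_qa.csv", index=False)
```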
If I start a fine-tuning job with 50 to 100 rows, the fine-tuning completes in about 24 seconds.
If I run fine-tuning on the oasst dataset, it completes in roughly 15 minutes.
I'm using the Pythia 1B parameter model to conduct these experiments.
But if I include 250 or 500 rows, the training starts but takes forever to even initialise:
```
2023-07-05 12:28:29,736 - INFO: Training Epoch: 1 / 1
2023-07-05 12:28:29,737 - INFO: train loss: 0%| | 0/123 [00:00<?, ?it/s]
```
If I look at the resource utilization, the CPU is maxed out at 100% while the GPU is not being utilised at all.
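To rule out the GPUs simply not being visible from inside the training environment, I used a quick, generic check (not specific to H2O LLM Studio):

```python
# Quick check that PyTorch inside the VM can actually see the GPUs.
# If this prints False or 0, training silently falls back to the CPU.
import torch

print("CUDA available:", torch.cuda.is_available())
print("Device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
```

Both GPUs do show up with VRAM allocated, as listed above, so the question is why the run stays CPU-bound.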
What could I be doing wrong, and where should I be looking to find out why this is happening? The dataset is the prime suspect, but since reducing the number of rows makes the training super fast, I believe it has more to do with quantity than quality.
However, since oasst, which is 8000 rows, also works like a charm, I'm confused about what could be wrong.
I'd appreciate any clues to make this work. Thanks in advance.