How to train a model online with TRL (without pre-existing dataset)? #2822
Replies: 2 comments
-
I'm experiencing a similar issue. As far as I know, a possible solution might be to use the older version 0.11. I hope someone has a better solution.
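For example, pinning the older release (assuming any 0.11.x build works for your setup):

```bash
pip install "trl==0.11.*"
```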
-
Hi @SepehrDehdashtian , I understand your wish for a Unix-philosophy CLI (no wget before the raw CLI call), and I have great news for you! I developed a PR to support local datasets at https://github.com/huggingface/trl/pull/3470/files . It has not been merged yet, but with a few lines I successfully tested a chatbot training run on my own hard disk. It took 9 seconds to train on 3 prompts in JSON. It currently only supports SFT. Do not hesitate to clone the specific branch if you cannot wait for local dataset support. Best regards.
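If you cannot wait for the merge, one way to try the PR locally is to fetch it as a branch (the local branch name below is arbitrary):

```bash
# Fetch the unmerged PR huggingface/trl#3470 into a local branch and install from source
git clone https://github.com/huggingface/trl.git
cd trl
git fetch origin pull/3470/head:local-dataset-support
git checkout local-dataset-support
pip install -e .
```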
-
Hello,
I’m trying to fine-tune a model (e.g., DeepSeek-R1-Distill-Llama-8B) in an online RL style, without having a static dataset beforehand. Instead, I want to generate prompts on the fly, sample responses from my policy, compute a reward at the end of each response, and update the policy accordingly.
I am new to RL fine-tuning of LLMs, so I might be missing something obvious. Any guidance or clarification would be greatly appreciated!
Problem Setup
Pseudo-code for illustration:
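Roughly, the loop I have in mind looks like this (a simplified REINFORCE-style sketch; `generate_prompt` and `compute_reward` are placeholders for my on-the-fly prompt generator and end-of-response reward):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
policy = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-6)


def generate_prompt() -> str:
    # Placeholder: in my setup, prompts are produced programmatically, not read from a dataset.
    return "Explain why the sum of two even numbers is even."


def compute_reward(prompt: str, response: str) -> float:
    # Placeholder: a scalar reward computed once the full response is available.
    return 1.0 if "even" in response.lower() else 0.0


for step in range(100):
    prompt = generate_prompt()
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]

    # Sample a response from the current policy
    sequence = policy.generate(**inputs, max_new_tokens=256, do_sample=True)
    response = tokenizer.decode(sequence[0, prompt_len:], skip_special_tokens=True)

    # Reward is only known at the end of the response
    reward = compute_reward(prompt, response)

    # REINFORCE-style update: scale the response NLL by the reward,
    # so minimizing the loss maximizes reward-weighted log-likelihood
    labels = sequence.clone()
    labels[:, :prompt_len] = -100  # do not compute loss on prompt tokens
    outputs = policy(input_ids=sequence, labels=labels)
    loss = reward * outputs.loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

(No batching, baseline, or KL regularization here; that stability machinery is exactly what I'm hoping a TRL trainer would handle.)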
My Questions
Since I’m still learning about RL fine-tuning for LLMs, I’d appreciate any resources or explanations that clarify the best approach to achieve this in TRL.
Thanks in advance.