Commit 34753e1

docs: Update READMEs with teacher-guided SFT training strategy
- Main README: Document teacher (Mixtral-8x7B via vLLM) + student (Llama 3.1 8B) architecture
- Notebooks README: Explain real-time validation + periodic SFT approach
- Training flow: Student generates → Teacher validates → Environment executes → SFT corrections
- Remove PROCESS_REWARD_MODELING.md (outdated, replaced by actual implementation)
- Update CRITIQUE_LEARNING_IMPLEMENTATION.md to reflect SFT-focused strategy (no PPO)
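
The training flow named in the commit message (student generates, teacher validates, environment executes, corrections go to SFT) could be sketched roughly as below. This is an illustration only, not the repository's code: `TeacherGuidedLoop`, `Verdict`, `sft_every`, and the four callables are hypothetical names, and the sketch assumes that the environment executes the teacher's correction when the student's proposal is rejected and that SFT runs once a fixed number of corrections has accumulated.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    """Teacher's judgement of one student proposal (hypothetical type)."""
    ok: bool
    correction: str  # corrected action; only used when ok is False


@dataclass
class TeacherGuidedLoop:
    """Hypothetical sketch: real-time validation plus periodic SFT."""
    student: Callable[[str], str]            # observation -> proposed action
    teacher: Callable[[str, str], Verdict]   # (observation, proposal) -> verdict
    env_step: Callable[[str], str]           # action -> next observation
    sft_update: Callable[[List[Tuple[str, str]]], None]  # train on corrections
    sft_every: int = 2                       # assumed batch size for periodic SFT
    buffer: List[Tuple[str, str]] = field(default_factory=list)

    def step(self, observation: str) -> str:
        # 1. Student generates a proposal.
        proposal = self.student(observation)
        # 2. Teacher validates it before anything is executed.
        verdict = self.teacher(observation, proposal)
        if verdict.ok:
            action = proposal
        else:
            # Assumption: the corrected action is executed and kept for SFT.
            action = verdict.correction
            self.buffer.append((observation, verdict.correction))
        # 3. Environment executes the chosen action.
        next_observation = self.env_step(action)
        # 4. Periodic SFT on the accumulated (observation, correction) pairs.
        if len(self.buffer) >= self.sft_every:
            self.sft_update(list(self.buffer))  # pass a copy, then reset
            self.buffer.clear()
        return next_observation
```

In practice `student` and `teacher` would wrap model inference calls and `sft_update` would wrap a fine-tuning step; those are left as plain callables here so the sketch makes no claims about any library's API.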
1 parent 6777241 commit 34753e1

4 files changed: +405 -987 lines changed
