Commit 34753e1
docs: Update READMEs with teacher-guided SFT training strategy
- Main README: Document teacher (Mixtral-8x7B via vLLM) + student (Llama 3.1 8B) architecture
- Notebooks README: Explain real-time validation + periodic SFT approach
- Training flow: Student generates → Teacher validates → Environment executes → SFT corrections
- Remove PROCESS_REWARD_MODELING.md (outdated, replaced by actual implementation)
- Update CRITIQUE_LEARNING_IMPLEMENTATION.md to reflect SFT-focused strategy (no PPO)

1 parent 6777241
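The training flow in the commit message (student generates → teacher validates → environment executes → SFT corrections) can be sketched as a small loop. This is a minimal sketch, not the repository's implementation. The names `run_teacher_guided_loop`, `student_generate`, `teacher_validate`, `env_execute` and `sft_step`, and the `sft_every` interval, are all hypothetical stand-ins for the Llama 3.1 8B student, the Mixtral-8x7B teacher served via vLLM, the environment, and the periodic SFT step. It also assumes that a teacher-rejected output is replaced by the teacher's correction before execution, and that only rejected outputs become SFT examples.

```python
from typing import Callable, List, Tuple

# Hypothetical type alias: a (prompt, teacher-corrected output) SFT example.
Example = Tuple[str, str]


def run_teacher_guided_loop(
    prompts: List[str],
    student_generate: Callable[[str], str],                  # hypothetical student call
    teacher_validate: Callable[[str, str], Tuple[bool, str]],  # hypothetical: (accepted?, correction)
    env_execute: Callable[[str], object],                    # hypothetical environment step
    sft_step: Callable[[List[Example]], None],               # hypothetical periodic SFT update
    sft_every: int = 4,                                      # assumed SFT interval (in prompts)
) -> List[Example]:
    """Real-time teacher validation with periodic SFT on the corrections (sketch)."""
    all_corrections: List[Example] = []
    pending: List[Example] = []  # corrections not yet used for SFT

    for step, prompt in enumerate(prompts, start=1):
        # 1. Student proposes an output.
        draft = student_generate(prompt)
        # 2. Teacher validates it and supplies a correction when it rejects it.
        accepted, correction = teacher_validate(prompt, draft)
        # 3. Environment executes the validated (or corrected) output.
        env_execute(draft if accepted else correction)
        # 4. Rejected drafts become supervised examples for the student.
        if not accepted:
            pending.append((prompt, correction))
            all_corrections.append((prompt, correction))
        # Periodic SFT: train on the accumulated corrections, then start a new batch.
        if step % sft_every == 0 and pending:
            sft_step(pending)
            pending = []

    # Flush any corrections left over after the last full interval.
    if pending:
        sft_step(pending)
    return all_corrections
```

Using periodic rather than per-step SFT batches the teacher's corrections, so the student is not updated after every single rejection; the interval shown here is illustrative only.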
4 files changed (+405, -987 lines)