Overview
This issue proposes a unified fine-tuning approach for Llama 3 using ORPO (Odds Ratio Preference Optimization), showcased through interactive marimo notebooks. ORPO folds SFT and DPO into a single stage, making training more resource-efficient while maintaining or improving performance.
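To make the "SFT + DPO in one stage" idea concrete, here is a minimal pure-Python sketch of ORPO's per-example objective as described in the ORPO paper: the usual SFT negative log-likelihood on the chosen response plus a weighted odds-ratio penalty that pushes the model to prefer the chosen over the rejected response. The probabilities here stand for length-normalized sequence probabilities; the function names and the example values are illustrative, not from the source.

```python
import math

def odds(p):
    # Odds of a (length-normalized) sequence probability p in (0, 1).
    return p / (1.0 - p)

def orpo_or_loss(p_chosen, p_rejected):
    # Odds-ratio term: -log sigmoid(log(odds(chosen) / odds(rejected))).
    log_or = math.log(odds(p_chosen) / odds(p_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-log_or)))

def orpo_loss(nll_chosen, p_chosen, p_rejected, lam=0.1):
    # Total per-example loss: SFT NLL on the chosen response
    # plus lambda times the odds-ratio penalty.
    return nll_chosen + lam * orpo_or_loss(p_chosen, p_rejected)
```

When the model assigns equal probability to both responses the penalty is log 2, and it shrinks as the chosen response becomes more likely than the rejected one, which is what lets a single training stage handle both instruction tuning and preference alignment.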
Objectives
- Implement ORPO fine-tuning for Llama 3.1 8B model using the TRL library
Technical Details
ORPO Implementation
- Combine instruction tuning and preference alignment in a single stage
- Use TRL library for implementation
- Target model: Llama 3.1 8B
- Based on approaches from:
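A single-stage TRL setup along these lines could look as follows. This is a configuration sketch, not a tested recipe: the dataset name, hyperparameter values, and parameter names follow recent TRL releases (`ORPOConfig`/`ORPOTrainer`) and would need to be verified against the installed version; the Llama 3.1 checkpoint is gated and requires Hugging Face access approval.

```python
# Sketch of an ORPO fine-tuning setup with TRL (assumes trl, transformers,
# and datasets are installed; hyperparameters are illustrative, not tuned).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "meta-llama/Meta-Llama-3.1-8B"  # gated model; requires HF access
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Any preference dataset with "prompt"/"chosen"/"rejected" columns works here.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = ORPOConfig(
    output_dir="llama3-orpo",
    beta=0.1,  # weight of the odds-ratio term (lambda in the ORPO paper)
    max_length=1024,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=8e-6,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Because ORPO needs no separate reference model (unlike DPO), this fits in noticeably less memory than a two-stage SFT-then-DPO pipeline.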
Additional Contributions
- Create additional marimo demos for other repository tasks/issues
- Complement existing Gradio demos with marimo alternatives
Todo List
- Set up initial marimo notebook structure
- Implement ORPO fine-tuning pipeline
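For the pipeline item above, the training data has to arrive as preference pairs. A minimal sketch of the record shape (column names follow TRL's preference-dataset convention; the contents are made-up examples, not from the source):

```python
# Minimal preference-pair records in the shape an ORPO trainer consumes:
# each record pairs a prompt with a preferred and a dispreferred response.
def make_example(prompt, chosen, rejected):
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

train_records = [
    make_example(
        "Explain what ORPO is in one sentence.",
        "ORPO is a fine-tuning method that folds preference alignment "
        "into supervised training via an odds-ratio penalty.",
        "ORPO is a kind of database.",
    ),
]
```

A list like this can be turned into a `datasets.Dataset` with `Dataset.from_list(train_records)` and passed to the trainer directly.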
Related Issues
- References the Call for Contributions issue (#43)
- Complements the Gradio Demos initiative (Building Gradio Demos with Llama, #44)
Resources
Looking forward to feedback and suggestions from the community!