Fine-tune Llama 3.1 8B with ORPO #82

@Haleshot

Description

πŸ“‹ Overview

This issue proposes a unified fine-tuning approach for Llama 3.1 8B using ORPO (Odds Ratio Preference Optimization), showcased through interactive marimo notebooks. ORPO combines supervised fine-tuning (SFT) and preference alignment (as in DPO) into a single training stage, making it more resource-efficient while maintaining or improving performance.
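To make the "single stage" claim concrete, here is a minimal sketch of the objective ORPO adds on top of the SFT loss, following the paper's formulation: a log-sigmoid penalty on the log odds ratio between the chosen and rejected responses. Function names and the `lam` weighting are illustrative, not from this repository.

```python
import math

def odds(p: float) -> float:
    """Odds of generating a response with (length-normalized) probability p."""
    return p / (1.0 - p)

def orpo_loss(nll_chosen: float, p_chosen: float, p_rejected: float,
              lam: float = 0.1) -> float:
    """ORPO loss = SFT negative log-likelihood on the chosen response
    plus lam * odds-ratio penalty. No reference model is needed,
    which is where the resource savings over SFT + DPO come from."""
    log_odds_ratio = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    # L_OR = -log sigmoid(log odds ratio): shrinks as the model
    # assigns higher odds to the chosen response than the rejected one.
    l_or = -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))
    return nll_chosen + lam * l_or
```

When the model already prefers the chosen response (`p_chosen > p_rejected`), the penalty is small; when it prefers the rejected one, the penalty grows, so a single backward pass both fits the demonstrations and aligns preferences.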

🎯 Objectives

  • Implement ORPO fine-tuning for Llama 3.1 8B model using the TRL library

πŸ” Technical Details

ORPO Implementation

  • Combine instruction tuning and preference alignment in a single stage
  • Use TRL library for implementation
  • Target model: Llama 3.1 8B
  • Based on approaches from:
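The single-stage pipeline above could be wired up with TRL's `ORPOTrainer` roughly as follows. This is a hedged sketch, not a tested end-to-end script: the dataset name, hyperparameters, and the `processing_class` keyword (older TRL versions used `tokenizer=`) are assumptions to be verified against the installed TRL release.

```python
# Sketch of an ORPO fine-tuning run with TRL; model ID, dataset,
# and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "meta-llama/Meta-Llama-3.1-8B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# ORPO trains on preference pairs: "prompt", "chosen", "rejected" columns.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

config = ORPOConfig(
    output_dir="llama-3.1-8b-orpo",
    beta=0.1,                      # weight of the odds-ratio term
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Because ORPO needs no frozen reference model, this runs one trainer over one dataset, which is what makes it cheaper than a sequential SFT-then-DPO setup.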

πŸ’‘ Additional Contributions

  • Plan to create additional marimo demos for other repository tasks/issues
  • Complement existing Gradio demos with marimo alternatives

πŸ“ Todo List

  • Set up initial marimo notebook structure
  • Implement ORPO fine-tuning pipeline

🀝 Related Issues

πŸ“š Resources

Looking forward to feedback and suggestions from the community!
