Fine-tune Llama 3.1 8B with ORPO #82

@Haleshot

Description

πŸ“‹ Overview

This issue proposes a unified fine-tuning approach for Llama 3.1 8B using ORPO (Odds Ratio Preference Optimization), showcased through interactive marimo notebooks. ORPO combines supervised fine-tuning (SFT) and preference alignment (as in DPO) into a single training stage, making it more resource-efficient while maintaining or improving performance.
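To make the "single stage" claim concrete, here is a minimal sketch of the objective ORPO adds on top of the SFT loss, following the paper's formulation: a log-sigmoid penalty on the log odds ratio between the chosen and rejected responses. Function names and the `lam` weighting are illustrative, not from this repository.

```python
import math

def odds(p: float) -> float:
    """Odds of generating a response with (length-normalized) probability p."""
    return p / (1.0 - p)

def orpo_loss(nll_chosen: float, p_chosen: float, p_rejected: float,
              lam: float = 0.1) -> float:
    """ORPO loss = SFT negative log-likelihood on the chosen response
    plus lam * odds-ratio penalty. No reference model is needed,
    which is where the resource savings over SFT + DPO come from."""
    log_odds_ratio = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    # L_OR = -log sigmoid(log odds ratio): shrinks as the model
    # assigns higher odds to the chosen response than the rejected one.
    l_or = -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))
    return nll_chosen + lam * l_or
```

When the model already prefers the chosen response (`p_chosen > p_rejected`), the penalty is small; when it prefers the rejected one, the penalty grows, so a single backward pass both fits the demonstrations and aligns preferences.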

🎯 Objectives

  • Implement ORPO fine-tuning for Llama 3.1 8B model using the TRL library

πŸ” Technical Details

ORPO Implementation

  • Combine instruction tuning and preference alignment in a single stage
  • Use TRL library for implementation
  • Target model: Llama 3.1 8B
  • Based on approaches from:
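The single-stage pipeline above could be wired up with TRL's `ORPOTrainer` roughly as follows. This is a hedged sketch, not a tested end-to-end script: the dataset name, hyperparameters, and the `processing_class` keyword (older TRL versions used `tokenizer=`) are assumptions to be verified against the installed TRL release.

```python
# Sketch of an ORPO fine-tuning run with TRL; model ID, dataset,
# and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "meta-llama/Meta-Llama-3.1-8B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# ORPO trains on preference pairs: "prompt", "chosen", "rejected" columns.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

config = ORPOConfig(
    output_dir="llama-3.1-8b-orpo",
    beta=0.1,                      # weight of the odds-ratio term
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Because ORPO needs no frozen reference model, this runs one trainer over one dataset, which is what makes it cheaper than a sequential SFT-then-DPO setup.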

πŸ’‘ Additional Contributions

  • Plan to create additional marimo demos for other repository tasks/issues
  • Complement existing Gradio demos with marimo alternatives

πŸ“ Todo List

  • Set up initial marimo notebook structure
  • Implement ORPO fine-tuning pipeline

🀝 Related Issues

πŸ“š Resources

Looking forward to feedback and suggestions from the community!
