# RAI Fine-tuning Module

⚠️ **Experimental Module**: This module is in active development. Features may change and some functionality is still in progress.

## Module Overview

This module provides tools for extracting and formatting training data from various providers such as Langfuse and LangSmith. The formatted training data is designed to work seamlessly with Unsloth for efficient fine-tuning. It includes:

**Data Preparation:**

- **Observation Extractors**: Extract observations from various sources (Langfuse, LangSmith) with standardized preprocessing
- **Training Data Formatter**: Converts RAI observations into ChatML-formatted training data for Unsloth compatibility

Extractors are recommended to adopt a standardized data format based on the Langfuse structure. The Langfuse format was chosen as the standardization target because it provides cleaner, more direct access to conversation data, with flat message structures (`input`/`output` fields) that closely match the target ChatML format. This reduces preprocessing complexity and keeps the formatter more maintainable than handling raw LangSmith data with deeply nested LangChain internal structures.

Data from different sources (e.g., LangSmith) can be preprocessed at the extraction level to ensure consistent formatting. For example, LangSmith data with nested message structures and different field names is converted to the standard format before reaching the formatter, maintaining a single, reusable formatter for all data sources.

The formatter follows OpenAI's recommendations on [data formatting](https://platform.openai.com/docs/guides/supervised-fine-tuning#formatting-your-data) for fine-tuning.
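
As an illustration, the sketch below shows why the flat structure keeps the formatter simple. The field names and messages are assumptions for illustration only, not the module's actual schema:

```python
# A standardized (Langfuse-style) observation with flat input/output fields.
# NOTE: the field names below are illustrative assumptions, not the module's
# actual schema.
observation = {
    "input": [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Move the arm to the home position."},
    ],
    "output": {"role": "assistant", "content": "Moving the arm to home now."},
}

# With flat fields, formatting reduces to concatenating input and output into
# a ChatML-style message list -- no unwrapping of nested LangChain structures.
training_example = {"messages": [*observation["input"], observation["output"]]}
print(training_example)
```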

**Fine-tune Helpers:**

To be added. It will include:

- **Model Fine-tuning**: Uses Unsloth for optimized training with 4-bit quantization and LoRA support
- **LoRA Merger**: Merges LoRA adapter weights back into base models for standalone deployment
- **Ollama Converter**: Converts fine-tuned models to Ollama format using GGUF export

The module is designed as a standalone package to avoid dependency conflicts between the different versions of Triton required by openai-whisper and unsloth-zoo.

**System Component Proposal**: (Feedback is welcome and appreciated!)

<div style="text-align: center; padding: 20px;"><img src="imgs/rai-fine-tune-system-components.png" alt="RAI Fine Tune System Components"></div>

Folder Structure (Tentative)

```
src/rai_finetune/rai_finetune/
├── data/ # Data processing
│   ├── formatters/ # Data formatting
│   ├── extractors/ # Data extraction
│   └── validators/ # Data validation (To be implemented)
├── utils/ # Utilities
│   ├── chat_template.py # Chat templates
│   ├── templates/ # Template files
│   └── model_loader.py # Base model loading (from ModelManager, to be implemented)
├── adapters/ # LoRA management
│   ├── merger.py # LoRA merging (To be implemented)
│   └── config.py # Adapter configs (To be implemented)
├── trainers/ # Training orchestration
│   ├── trainer.py # Main trainer (To be implemented)
│   └── data_loader.py # Data preparation (To be implemented)
└── exporters/ # Model export
    ├── ollama.py # Ollama export (To be implemented)
    └── gguf.py # GGUF export (To be implemented)
```

## Environment Setup

This module relies on `unsloth`, which nominally works with Python 3.10, 3.11, and 3.12. However, Python 3.12+ has Dynamo compatibility issues with `unsloth` (see [this issue](https://github.com/unslothai/unsloth/issues/886)), so Python 3.10 is selected, which also matches the rest of the RAI components. The instructions below target Linux.

### 1. Install System Dependencies

```bash
sudo apt update
sudo apt install -y \
    libncurses5-dev \
    libncursesw5-dev \
    libreadline-dev \
    libsqlite3-dev \
    libssl-dev \
    zlib1g-dev \
    libbz2-dev \
    libffi-dev \
    liblzma-dev \
    libgdbm-dev \
    libnss3-dev \
    libtinfo6 \
    build-essential
```

### 2. Install Python 3.10 with pyenv

Use pyenv to manage Python versions:

```bash
# Install pyenv if not already installed
curl https://pyenv.run | bash

# Add to shell profile
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc

# Reload shell or source profile
source ~/.bashrc

# Install Python 3.10
pyenv install 3.10
```

### 3. Set up Poetry Environment

```bash
cd src/rai_finetune

# Set local Python version
pyenv local 3.10

# Install Poetry if not already installed
curl -sSL https://install.python-poetry.org | python3 -

# Create and activate Poetry environment
poetry env use python
poetry install
poetry run pip install flash-attn --no-build-isolation

# Activate the environment
. ./setup_finetune_shell.sh
```
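
Optionally, run a quick sanity check inside the new environment (a minimal sketch; it only assumes `torch`, which the install above pulls in as a dependency of the training stack, and `sanity_check.py` is a hypothetical file name):

```python
# Sketch: sanity-check the fine-tuning environment before training.
# Run with: poetry run python sanity_check.py
import sys

# unsloth needs Python < 3.12 here (see the Dynamo issue referenced above).
assert sys.version_info[:2] == (3, 10), f"Expected Python 3.10, got {sys.version}"

import torch

# 4-bit quantization and LoRA training require a CUDA-capable GPU.
print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
```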

### 4. Install llama.cpp Tools (Optional)

The Ollama conversion process requires the `llama-quantize` tool from llama.cpp. To install it, clone and build llama.cpp from source:

```bash
# Clone and build llama.cpp at the project root
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
mkdir build && cd build
cmake ..
cmake --build . --config Release
# The llama-quantize tool will be in the build/bin directory
```

## CLI Usage

The module provides a unified command-line interface for data operations:

```bash
# Show general help
python -m rai_finetune.data_cli --help

# Show help for specific subcommands
python -m rai_finetune.data_cli extract langfuse --help
python -m rai_finetune.data_cli format --help
```

## Script Execution Flow

Before running any scripts, make sure the shell is set up properly by running the following from the root folder of the project:

```bash
source src/rai_finetune/setup_finetune_shell.sh
```

### 1. Observation Extraction

Extract observations from Langfuse for specific models using the CLI:

```bash
python -m rai_finetune.data_cli extract langfuse \
    --models "gpt-4o" \
    --models "gpt-4o-mini" \
    --output langfuse_raw_data.jsonl \
    --max-data-limit 5000
```

With start and stop time filters:

```bash
python -m rai_finetune.data_cli extract langfuse \
    --models "gpt-4o" \
    --start-time "2025-08-01T00:00:00Z" \
    --stop-time "2025-08-31T23:59:59Z" \
    --output langfuse_raw_data_filtered.jsonl
```

**CLI Options:**

**Langfuse Options:**

- `--models`: List of model names to extract observations from
- `--output`: Output file for extracted observations (required)
- `--page-size`: Page size for pagination (default: 50)
- `--start-time`: Start time for data extraction (ISO format)
- `--stop-time`: Stop time for data extraction (ISO format)
- `--max-data-limit`: Maximum number of records to extract (default: 5000)
- `--host`: Langfuse host URL (default: `http://localhost:3000`)
- `--public-key`: Langfuse public key (or set `LANGFUSE_PUBLIC_KEY` env var)
- `--secret-key`: Langfuse secret key (or set `LANGFUSE_SECRET_KEY` env var)
- `--type-filter`: Observation type filter (default: GENERATION)
- `--trace-id`: Restrict to a specific trace ID
- `--include-fields`: Fields to include in saved data samples
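
After extraction, the output JSONL can be spot-checked with a few lines of Python. This is a sketch; the `model` field name is an assumption based on the `--models` filter above, not a documented schema:

```python
# Sketch: count extracted observations per model in the extractor's output.
import json
from collections import Counter

counts = Counter()
with open("langfuse_raw_data.jsonl") as f:
    for line in f:
        record = json.loads(line)
        counts[record.get("model", "<unknown>")] += 1  # field name assumed

print(counts)
```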

**Environment Variables:**

You can set Langfuse credentials as environment variables to avoid passing them on the command line:

```bash
export LANGFUSE_PUBLIC_KEY="your_public_key"
export LANGFUSE_SECRET_KEY="your_secret_key"
```

### 2. Training Data Preparation

For tool-calling fine-tuning, format the extracted data samples for training using the CLI:

```bash
python -m rai_finetune.data_cli format \
    --input langfuse_raw_data.jsonl \
    --output langfuse_tc_data.jsonl \
    --system-prompt "You are a specialized AI assistant for robotics and tool calling tasks."
```

**CLI Options:**

- `--input`: Input observations file (required)
- `--output`: Output training data file (required)
- `--system-prompt`: System prompt to use (default: "You are a helpful AI assistant that can use tools to help users.")
- `--system-prompt-file`: Path to file containing a custom system prompt
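
Since the `validators/` package is still to be implemented, a quick structural check of the formatted output can be done by hand. The sketch below assumes the OpenAI-style `{"messages": [...]}` layout referenced earlier:

```python
# Sketch: minimal structural check of the formatted training data.
import json

VALID_ROLES = {"system", "user", "assistant", "tool"}

with open("langfuse_tc_data.jsonl") as f:
    for lineno, line in enumerate(f, start=1):
        example = json.loads(line)
        messages = example.get("messages", [])
        assert messages, f"line {lineno}: example has no messages"
        for message in messages:
            assert message.get("role") in VALID_ROLES, f"line {lineno}: bad role"

print("All training examples passed the structural check")
```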