|
| 1 | +Supervised Fine-Tuning |
| 2 | +======================= |
| 3 | + |
| 4 | +.. |huggingface| image:: /_static/svg/hf-logo.svg |
| 5 | +   :width: 16px |
| 6 | +   :height: 16px |
| 7 | +   :class: inline-icon |
| 8 | + |
| 9 | +This page explains how to run **full-parameter supervised fine-tuning (SFT)** and **LoRA fine-tuning** with the RLinf framework. SFT is typically the first stage before reinforcement learning: the model imitates high-quality examples so RL can continue optimization with a strong prior. |
| 10 | + |
| 11 | +Contents |
| 12 | +---------- |
| 13 | + |
| 14 | +- How to configure full-parameter SFT and LoRA SFT in RLinf |
| 15 | +- How to launch training on a single machine or multi-node cluster |
| 16 | +- How to monitor and evaluate results |
| 17 | + |
| 18 | + |
| 19 | +Supported datasets |
| 20 | +-------------------- |
| 21 | + |
| 22 | +RLinf currently supports datasets in the LeRobot format, selected via **config_type**. |
| 23 | + |
| 24 | +Supported formats include: |
| 25 | + |
| 26 | +- pi0_maniskill |
| 27 | +- pi0_libero |
| 28 | +- pi05_libero |
| 29 | +- pi05_maniskill |
| 30 | +- pi05_metaworld |
| 31 | +- pi05_calvin |
| 32 | + |
| 33 | +To train on a custom dataset format, edit the three files below: |
| 34 | + |
| 35 | +1. In ``examples/sft/config/custom_sft_openpi.yaml``, set the data format. |
| 36 | + |
| 37 | +.. code:: yaml |
| 38 | +
|
| 39 | +   model: |
| 40 | +     openpi: |
| 41 | +       config_name: "pi0_custom" |
| 42 | +
|
| 43 | +2. In ``rlinf/models/embodiment/openpi/__init__.py``, register a ``TrainConfig`` entry named ``pi0_custom``. |
| 44 | + |
| 45 | +.. code:: python |
| 46 | +
|
| 47 | +   TrainConfig( |
| 48 | +       name="pi0_custom", |
| 49 | +       model=pi0_config.Pi0Config(), |
| 50 | +       data=CustomDataConfig( |
| 51 | +           repo_id="physical-intelligence/custom_dataset", |
| 52 | +           base_config=DataConfig( |
| 53 | +               prompt_from_task=True  # the language instruction is required |
| 54 | +           ), |
| 55 | +           assets=AssetsConfig(assets_dir="checkpoints/torch/pi0_base/assets"), |
| 56 | +           extra_delta_transform=True,  # True for delta actions, False for absolute actions |
| 57 | +           action_train_with_rotation_6d=False,  # custom datasets may add extra fields like this one |
| 58 | +       ), |
| 59 | +       pytorch_weight_path="checkpoints/torch/pi0_base", |
| 60 | +   ), |
| 61 | +
|
| 62 | +3. In ``rlinf/models/embodiment/openpi/dataconfig/custom_dataconfig.py``, define the custom dataset config. |
| 63 | + |
| 64 | +.. code:: python |
| 65 | +
|
| 66 | +   class CustomDataConfig(DataConfig): |
| 67 | +       def __init__(self, *args, **kwargs): |
| 68 | +           super().__init__(*args, **kwargs) |
| 69 | +           self.repo_id = "physical-intelligence/custom_dataset" |
| 70 | +           self.base_config = DataConfig( |
| 71 | +               prompt_from_task=True |
| 72 | +           ) |
| 73 | +           self.assets = AssetsConfig(assets_dir="checkpoints/torch/pi0_base/assets") |
| 74 | +           self.extra_delta_transform = True |
| 75 | +           self.action_train_with_rotation_6d = False |
| 76 | +
|
| 77 | +
|
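The three steps above connect the ``config_name`` string in the YAML file to a config entry defined in Python. A minimal sketch of that name-to-entry lookup pattern follows, assuming a simple name-keyed registry. ``SketchTrainConfig``, ``_REGISTRY`` and ``get_config`` are hypothetical stand-ins for illustration only, not the actual RLinf or OpenPI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchTrainConfig:
    """Hypothetical stand-in for a training config entry (not the real TrainConfig)."""

    name: str
    repo_id: str
    extra_delta_transform: bool = True


# Hypothetical registry: one entry per selectable config name.
_REGISTRY = {
    cfg.name: cfg
    for cfg in [
        SketchTrainConfig(
            name="pi0_custom",
            repo_id="physical-intelligence/custom_dataset",
        ),
    ]
}


def get_config(name: str) -> SketchTrainConfig:
    """Return the entry registered under `name`, or fail and list the known names."""
    try:
        return _REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown config name {name!r}; known names: {known}") from None


cfg = get_config("pi0_custom")
print(cfg.repo_id)
```

With a lookup of this kind, a typo in ``config_name`` fails early with a message that lists the registered names, instead of surfacing later during data loading.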
| 78 | +Training configuration |
| 79 | +---------------------- |
| 80 | + |
| 81 | +A full example lives in ``examples/sft/config/libero_sft_openpi.yaml``. Key fields: |
| 82 | + |
| 83 | +.. code:: yaml |
| 84 | +
|
| 85 | +   cluster: |
| 86 | +     num_nodes: 1  # number of nodes |
| 87 | +     component_placement:  # component → GPU mapping |
| 88 | +       actor: 0-3 |
| 89 | +
|
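The ``actor: 0-3`` entry assigns a range of GPU indices to the actor component. The sketch below shows how such a range string could be expanded into explicit indices, assuming the bounds are inclusive (so ``0-3`` means four GPUs). ``parse_gpu_range`` is a hypothetical helper written for illustration, not RLinf's own placement parser.

```python
def parse_gpu_range(spec) -> list[int]:
    """Expand a spec such as "0-3" (assumed inclusive) or "2" into GPU indices."""
    text = str(spec).strip()
    if "-" in text:
        start_text, end_text = text.split("-", 1)
        start, end = int(start_text), int(end_text)
        if end < start:
            raise ValueError(f"Range end {end} is smaller than start {start}")
        return list(range(start, end + 1))
    # A bare number selects a single GPU.
    return [int(text)]


print(parse_gpu_range("0-3"))  # [0, 1, 2, 3]
```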
| 90 | +To enable LoRA fine-tuning, set ``actor.model.is_lora`` to ``True`` and choose the adapter rank with ``actor.model.lora_rank``. |
| 91 | + |
| 92 | +.. code:: yaml |
| 93 | +
|
| 94 | +   actor: |
| 95 | +     model: |
| 96 | +       is_lora: True |
| 97 | +       lora_rank: 32 |
| 98 | +
|
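To get a feel for what ``lora_rank`` controls: in the standard LoRA formulation, a frozen weight matrix of shape ``d_out x d_in`` receives a trainable low-rank update ``B @ A``, where ``B`` has shape ``d_out x r`` and ``A`` has shape ``r x d_in``. The sketch below counts the trainable parameters for one such layer. The 2048 x 2048 layer size is an illustrative assumption, not a dimension taken from the models RLinf trains, and the helper names are hypothetical.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors: B (d_out x r) and A (r x d_in)."""
    return rank * (d_in + d_out)


def full_trainable_params(d_in: int, d_out: int) -> int:
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_in * d_out


d_in = d_out = 2048  # illustrative layer size (assumption)
rank = 32            # matches lora_rank in the YAML example

lora = lora_trainable_params(d_in, d_out, rank)
full = full_trainable_params(d_in, d_out)
print(f"LoRA: {lora} trainable parameters, full fine-tuning: {full}")
print(f"LoRA trains {lora / full:.3%} of this layer's weights")
```

Doubling the rank doubles the adapter size, so ``lora_rank`` trades memory and compute against the capacity of the update.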
| 99 | +Launch scripts |
| 100 | +---------------- |
| 101 | + |
| 102 | +Start the Ray cluster first, then launch the training script from the repository root: |
| 103 | + |
| 104 | +.. code:: bash |
| 105 | +
|
| 106 | +   cd /path_to_RLinf/ray_utils |
| 107 | +   bash start_ray.sh  # start head + workers |
| 108 | +
|
| 109 | +   cd ..  # return to repo root |
| 110 | +   python examples/sft/train_embodied_sft.py --config libero_sft_openpi.yaml |
| 111 | +
|
| 112 | +The same script works for generic text SFT; just swap the config file. |