@@ -3,7 +3,7 @@
 <!-- markdown all in one -->
 - [Nemo-Reinforcer: A Scalable and Efficient Post-Training Library for Models Ranging from tiny to \>100B Parameters, scaling from 1 GPU to 100s](#nemo-reinforcer-a-scalable-and-efficient-post-training-library-for-models-ranging-from-tiny-to-100b-parameters-scaling-from-1-gpu-to-100s)
   - [Features](#features)
-  - [Installation](#installation)
+  - [Prerequisites](#prerequisites)
   - [Quick start](#quick-start)
     - [SFT](#sft)
       - [Single Node](#single-node)
@@ -38,28 +38,26 @@ What you can expect:
 - 🔜 **Environment Isolation** - Dependency isolation between components
 - 🔜 **DPO Algorithm** - Direct Preference Optimization for alignment
 
-## Installation
+## Prerequisites
 
 ```sh
-# For faster setup we use `uv`
+# For faster setup and environment isolation, we use `uv`
 pip install uv
 
-# Specify a virtual env that uses Python 3.12
-uv venv -p python3.12.9 .venv
-# Install NeMo-Reinforcer with vllm
-uv pip install -e .[vllm]
-# Install NeMo-Reinforcer with dev/test dependencies
-uv pip install -e '.[dev,test]'
+# If you cannot install at the system level, you can install for your user with
+# pip install --user uv
 
-# Use uv run to launch any runs.
-# Note that it is recommended to not activate the venv and instead use `uv run` since
+# Use `uv run` to launch all commands. It installs dependencies implicitly and
+# keeps your environment in sync with our lock file.
+
+# Note that activating the venv is not recommended; use `uv run` instead, since
 # it ensures consistent environment usage across different shells and sessions.
 # Example: uv run python examples/run_grpo_math.py
 ```
 
 ## Quick start
 
-**Reminder**: Don't forget to set your HF_HOME and WANDB_API_KEY (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
+**Reminder**: Don't forget to set your `HF_HOME`, `WANDB_API_KEY`, and `HF_DATASETS_CACHE` (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
 
 ### SFT
 
@@ -91,21 +89,14 @@ Refer to `examples/configs/sft.yaml` for a full list of parameters that can be overridden
 
 For distributed training across multiple nodes:
 
-Set `UV_CACHE_DIR` to a directory that can be read from all workers before running any uv run command.
-
-```sh
-export UV_CACHE_DIR=/path/that/all/workers/can/access/uv_cache
-```
-
 ```sh
 # Run from the root of NeMo-Reinforcer repo
 NUM_ACTOR_NODES=2
 # Add a timestamp to make each job name unique
 TIMESTAMP=$(date +%Y%m%d_%H%M%S)
 
 # SFT experiment uses Llama-3.1-8B model
-COMMAND="uv pip install -e .; uv run ./examples/run_sft.py --config examples/configs/sft.yaml cluster.num_nodes=2 cluster.gpus_per_node=8 checkpointing.checkpoint_dir='results/sft_llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='sft-llama8b'" \
-UV_CACHE_DIR=YOUR_UV_CACHE_DIR \
+COMMAND="uv run ./examples/run_sft.py --config examples/configs/sft.yaml cluster.num_nodes=2 cluster.gpus_per_node=8 checkpointing.checkpoint_dir='results/sft_llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='sft-llama8b'" \
 CONTAINER=YOUR_CONTAINER \
 MOUNTS="$PWD:$PWD" \
 sbatch \
@@ -159,8 +150,7 @@ NUM_ACTOR_NODES=2
 TIMESTAMP=$(date +%Y%m%d_%H%M%S)
 
 # grpo_math_8b uses Llama-3.1-8B-Instruct model
-COMMAND="uv pip install -e .; uv run ./examples/run_grpo_math.py --config examples/configs/grpo_math_8B.yaml cluster.num_nodes=2 checkpointing.checkpoint_dir='results/llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='grpo-llama8b_math'" \
-UV_CACHE_DIR=YOUR_UV_CACHE_DIR \
+COMMAND="uv run ./examples/run_grpo_math.py --config examples/configs/grpo_math_8B.yaml cluster.num_nodes=2 checkpointing.checkpoint_dir='results/llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='grpo-llama8b_math'" \
 CONTAINER=YOUR_CONTAINER \
 MOUNTS="$PWD:$PWD" \
 sbatch \
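The multi-node examples above share one pattern: the full `uv run` training invocation is packed into a `COMMAND` environment variable, which is handed to `sbatch` alongside `CONTAINER` and `MOUNTS`. A minimal sketch of that pattern follows; the job name format and the trailing `sbatch` flags are illustrative assumptions, and the actual launcher script is cluster-specific and not shown:

```shell
# Sketch of the variable-driven sbatch submission used in the examples above.
NUM_ACTOR_NODES=2
TIMESTAMP=$(date +%Y%m%d_%H%M%S)              # makes each job name unique
JOB_NAME="grpo-llama8b-${TIMESTAMP}"          # hypothetical naming scheme
COMMAND="uv run ./examples/run_grpo_math.py --config examples/configs/grpo_math_8B.yaml"
# The real submission exports these for the launcher, roughly:
# COMMAND="$COMMAND" CONTAINER=YOUR_CONTAINER MOUNTS="$PWD:$PWD" sbatch ...
echo "$JOB_NAME"
```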