@@ -3,7 +3,7 @@
 <!-- markdown all in one -->
 - [Nemo-Reinforcer: A Scalable and Efficient Post-Training Library for Models Ranging from tiny to \>100B Parameters, scaling from 1 GPU to 100s](#nemo-reinforcer-a-scalable-and-efficient-post-training-library-for-models-ranging-from-tiny-to-100b-parameters-scaling-from-1-gpu-to-100s)
   - [Features](#features)
-  - [Installation](#installation)
+  - [Prerequisites](#prerequisites)
   - [Quick start](#quick-start)
     - [SFT](#sft)
       - [Single Node](#single-node)
@@ -38,28 +38,26 @@ What you can expect:
 - 🔜 **Environment Isolation** - Dependency isolation between components
 - 🔜 **DPO Algorithm** - Direct Preference Optimization for alignment
 
-## Installation
+## Prerequisites
 
 ```sh
-# For faster setup we use `uv`
+# For faster setup and environment isolation, we use `uv`
 pip install uv
 
-# Specify a virtual env that uses Python 3.12
-uv venv -p python3.12.9 .venv
-# Install NeMo-Reinforcer with vllm
-uv pip install -e .[vllm]
-# Install NeMo-Reinforcer with dev/test dependencies
-uv pip install -e '.[dev,test]'
+# If you cannot install at the system level, you can install for your user with
+# pip install --user uv
 
-# Use uv run to launch any runs.
-# Note that it is recommended to not activate the venv and instead use `uv run` since
+# Use `uv run` to launch all commands. It installs dependencies implicitly and
+# keeps your environment in sync with our lock file.
+
+# Note that activating the venv is not recommended; use `uv run` instead, since
 # it ensures consistent environment usage across different shells and sessions.
 # Example: uv run python examples/run_grpo_math.py
 ```
 
 ## Quick start
 
-**Reminder**: Don't forget to set your HF_HOME and WANDB_API_KEY (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
+**Reminder**: Don't forget to set your `HF_HOME`, `WANDB_API_KEY`, and `HF_DATASETS_CACHE` (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
 
 ### SFT
 
@@ -91,21 +89,14 @@ Refer to `examples/configs/sft.yaml` for a full list of parameters that can be overridden
 
 For distributed training across multiple nodes:
 
-Set `UV_CACHE_DIR` to a directory that can be read from all workers before running any uv run command.
-
-```sh
-export UV_CACHE_DIR=/path/that/all/workers/can/access/uv_cache
-```
-
 ```sh
 # Run from the root of NeMo-Reinforcer repo
 NUM_ACTOR_NODES=2
 # Add a timestamp to make each job name unique
 TIMESTAMP=$(date +%Y%m%d_%H%M%S)
 
 # SFT experiment uses Llama-3.1-8B model
-COMMAND="uv pip install -e .; uv run ./examples/run_sft.py --config examples/configs/sft.yaml cluster.num_nodes=2 cluster.gpus_per_node=8 checkpointing.checkpoint_dir='results/sft_llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='sft-llama8b'" \
-UV_CACHE_DIR=YOUR_UV_CACHE_DIR \
+COMMAND="uv run ./examples/run_sft.py --config examples/configs/sft.yaml cluster.num_nodes=2 cluster.gpus_per_node=8 checkpointing.checkpoint_dir='results/sft_llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='sft-llama8b'" \
 CONTAINER=YOUR_CONTAINER \
 MOUNTS="$PWD:$PWD" \
 sbatch \
@@ -159,8 +150,7 @@ NUM_ACTOR_NODES=2
 TIMESTAMP=$(date +%Y%m%d_%H%M%S)
 
 # grpo_math_8b uses Llama-3.1-8B-Instruct model
-COMMAND="uv pip install -e .; uv run ./examples/run_grpo_math.py --config examples/configs/grpo_math_8B.yaml cluster.num_nodes=2 checkpointing.checkpoint_dir='results/llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='grpo-llama8b_math'" \
-UV_CACHE_DIR=YOUR_UV_CACHE_DIR \
+COMMAND="uv run ./examples/run_grpo_math.py --config examples/configs/grpo_math_8B.yaml cluster.num_nodes=2 checkpointing.checkpoint_dir='results/llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='grpo-llama8b_math'" \
 CONTAINER=YOUR_CONTAINER \
 MOUNTS="$PWD:$PWD" \
 sbatch \
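The multi-node examples above share one pattern: the full `uv run` training invocation is packed into a `COMMAND` environment variable, which is handed to `sbatch` alongside `CONTAINER` and `MOUNTS`. A minimal sketch of that pattern follows; the job name format and the trailing `sbatch` flags are illustrative assumptions, and the actual launcher script is cluster-specific and not shown:

```shell
# Sketch of the variable-driven sbatch submission used in the examples above.
NUM_ACTOR_NODES=2
TIMESTAMP=$(date +%Y%m%d_%H%M%S)              # makes each job name unique
JOB_NAME="grpo-llama8b-${TIMESTAMP}"          # hypothetical naming scheme
COMMAND="uv run ./examples/run_grpo_math.py --config examples/configs/grpo_math_8B.yaml"
# The real submission exports these for the launcher, roughly:
# COMMAND="$COMMAND" CONTAINER=YOUR_CONTAINER MOUNTS="$PWD:$PWD" sbatch ...
echo "$JOB_NAME"
```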