Commit 2922cc6

[Feat] add training and evaluation docs

* add training and evaluation docs
* fix some terms
* add training tutorial

1 parent e37762c commit 2922cc6

File tree

4 files changed: +283 additions, −73 deletions

source/en/user_guide/internnav/quick_start/train_eval.md renamed to source/en/user_guide/internnav/quick_start/evaluation.md

Lines changed: 34 additions & 61 deletions
@@ -1,16 +1,16 @@
-# Training and Evaluation
+# Evaluation
 
-This document presents how to train and evaluate models for different systems with InternNav.
+This document describes how to evaluate models in **InternNav**.
 
-## Whole-system
+## InternVLA-N1 (Dual System)
 
-### Training
-The training pipeline is currently under preparation and will be open-sourced soon.
+Model weights of InternVLA-N1 (Dual System) can be downloaded from [InternVLA-N1-DualVLN](https://huggingface.co/InternRobotics/InternVLA-N1-DualVLN) and [InternVLA-N1-w-NavDP](https://huggingface.co/InternRobotics/InternVLA-N1-w-NavDP).
 
-### Evaluation
-Before evaluation, we should download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments) and move them to the `data/` directory. Model weights of InternVLA-N1 can be downloaded from [InternVLA-N1](https://huggingface.co/InternRobotics/InternVLA-N1).
+---
+
+### Evaluation on Isaac Sim
+Before evaluation, download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments) and move them to the `data/` directory.
 
-#### Evaluation on Isaac Sim
 [UPDATE] We now support running the local model and Isaac Sim in a single process. Evaluate on Single-GPU:
 
 ```bash
@@ -51,7 +51,7 @@ The simulation can be visualized by set `vis_output=True` in eval_cfg.
 
 <img src="../../../_static/video/nav_eval.gif" alt="My GIF">
 
-#### Evaluation on Habitat Sim
+### Evaluation on Habitat Sim
 Evaluate on Single-GPU:
 
 ```bash
@@ -74,18 +74,36 @@ For multi-gpu inference, currently we support inference on SLURM as well as envi
 --config scripts/eval/configs/habitat_dual_system_cfg.py
 ```
 
+## InternVLA-N1 (System 2)
 
-## System1
+Model weights of InternVLA-N1 (System2) can be downloaded from [InternVLA-N1-System2](https://huggingface.co/InternRobotics/InternVLA-N1-System2).
 
-### Training
+Currently we only support evaluating the standalone System2 on Habitat.
 
-Download the training data from [Hugging Face](https://huggingface.co/datasets/InternRobotics/InternData-N1/), and organize them in the form mentioned in [installation](./installation.md).
+Evaluate on Single-GPU:
 
 ```bash
-./scripts/train/start_train.sh --name "$NAME" --model-name navdp
+python scripts/eval/eval.py --config scripts/eval/configs/habitat_s2_cfg.py
+
+# set config with the following fields
+eval_cfg = EvalCfg(
+    agent=AgentCfg(
+        model_name='internvla_n1',
+        model_settings={
+            "mode": "system2",  # inference mode: dual_system or system2
+            "model_path": "checkpoints/<s2_checkpoint>",  # path to model checkpoint
+        }
+    )
+)
+```
+
+For multi-GPU inference, we currently only support SLURM:
+
+```bash
+./scripts/eval/bash/eval_system2.sh
 ```
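For illustration, the `eval_cfg` snippet above can be mirrored with stand-in dataclasses. These classes are hypothetical: the real `EvalCfg`/`AgentCfg` are defined in the InternNav codebase and may carry more fields.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins mirroring the config fields shown above;
# the actual EvalCfg/AgentCfg in InternNav may differ.
@dataclass
class AgentCfg:
    model_name: str
    model_settings: dict = field(default_factory=dict)

@dataclass
class EvalCfg:
    agent: AgentCfg

eval_cfg = EvalCfg(
    agent=AgentCfg(
        model_name="internvla_n1",
        model_settings={
            "mode": "system2",  # inference mode: dual_system or system2
            "model_path": "checkpoints/<s2_checkpoint>",  # path to model checkpoint
        },
    )
)
```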
 
-### Evaluation
+## VN Systems (System 1)
 
 We support the evaluation of diverse System-1 baselines separately in [NavDP](https://github.com/InternRobotics/NavDP/tree/navdp_benchmark) to make it easy to use and deploy.
 To install the environment, we provide a quick start below:
@@ -129,53 +147,8 @@ python navdp_server.py --port {PORT} --checkpoint {CHECKPOINT_path}
 python eval_pointgoal_wheeled.py --port {PORT} --scene_dir {SCENE_DIR}
 ```
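As a sketch, the two server/client commands above can be invoked with the placeholders filled in. The port and paths below are purely illustrative, and the commands are only echoed (dry run) rather than executed:

```shell
# Illustrative values only; substitute your own port, checkpoint, and scene paths.
PORT=8888
CHECKPOINT=checkpoints/navdp.ckpt
SCENE_DIR=data/scenes

# Dry run: print the two commands instead of executing them.
echo "python navdp_server.py --port $PORT --checkpoint $CHECKPOINT"
echo "python eval_pointgoal_wheeled.py --port $PORT --scene_dir $SCENE_DIR"
```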

-
-## System2
-
-### Training
-
-Currently, we only support training of small VLN models (CMA, RDP, Seq2Seq) in this repo. For the training of LLM-based VLN (Navid, StreamVLN, etc), please refer to [StreamVLN](https://github.com/OpenRobotLab/StreamVLN) for training details.
-
-```base
-# train cma model
-./scripts/train/start_train.sh --name cma_train --model cma
-
-# train rdp model
-./scripts/train/start_train.sh --name rdp_train --model rdp
-
-# train seq2seq model
-./scripts/train/start_train.sh --name seq2seq_train --model seq2seq
-```
-### Evaluation
-
-#### InternVLA-N1-S2
-Currently we only support evaluate single System2 on Habitat:
-
-Evaluate on Single-GPU:
-
-```bash
-python scripts/eval/eval.py --config scripts/eval/configs/habitat_s2_cfg.py
-
-# set config with the following fields
-eval_cfg = EvalCfg(
-    agent=AgentCfg(
-        model_name='internvla_n1',
-        model_settings={
-            "mode": "system2", # inference mode: dual_system or system2
-            "model_path": "checkpoints/<s2_checkpoint>", # path to model checkpoint
-        }
-    )
-)
-```
-
-For multi-gpu inference, currently we only support inference on SLURM.
-
-```bash
-./scripts/eval/bash/eval_system2.sh
-```
-
-#### Baseline Models
-We provide three small VLN baselines (Seq2Seq, CMA, RDP) for evaluation in the InterUtopia (Isaac-Sim) environment.
+## Single-System VLN Baselines
+We provide three small Single-System VLN baselines (Seq2Seq, CMA, RDP) for evaluation in the InternUtopia (Isaac-Sim) environment.
 
 Download the baseline models:
 ```bash

source/en/user_guide/internnav/quick_start/index.md

Lines changed: 2 additions & 1 deletion
@@ -15,5 +15,6 @@ myst:
 installation
 simulation
 interndata
-train_eval
+training
+evaluation
 ```
Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
# Training

This document provides instructions for training models in **InternNav**.

## Overview

InternNav supports training models under three system paradigms:

- **Dual-System VLN Models**: integrated System2 + System1 architectures
- **Single-System VLN Models**: end-to-end vision-and-language navigation models
- **VN System (System1) Models**: low-level visual navigation and control models

Each paradigm follows a different training protocol, as detailed below.

## Dual-System VLN Models

Dual-System VLN models integrate **System2** (high-level reasoning and planning) with **System1** (low-level action control), supporting both modular integration and joint training.

### Supported Systems

- **InternVLA-N1 (System2)**
- **InternVLA-N1 (Dual System) w/ NavDP\*** (\* indicates joint tuning with System2)
- **InternVLA-N1 (Dual System) DualVLN**

### 1. Training for InternVLA-N1 (System2)

**InternVLA-N1 (System2)** is trained independently to predict 2D pixel goals for navigation.

It can be used with any compatible System1 model capable of executing 2D pixel goals or point goals (given depth and pose).
Alternatively, it can be trained jointly with a System1 model for end-to-end multi-system optimization.

#### Training Command

```bash
# train System2 separately
sbatch ./scripts/train/base_train/qwenvl_train/train_system2.sh
```

---

### 2. Joint Training for InternVLA-N1 (Dual System)

After training **InternVLA-N1 (System2)**, joint training is supported with a pixel-goal navigation System1 using either the **NavDP** or **NextDiT** architecture.

- **InternVLA-N1 (Dual System) w/ NavDP**: preserves **NavDP**'s model design and uses **RGB-D** input.
- **InternVLA-N1 (Dual System) DualVLN**: uses only **RGB** input, resulting in a smaller model footprint.

#### Training Command

```bash
# train System1 jointly with the trained System2
sbatch ./scripts/train/base_train/qwenvl_train/train_dual_system.sh
```

- For the **w/ NavDP** variant, set `system1=navdp_async`. Optimal performance is typically observed after **30,000 iterations**.
- For the **DualVLN** variant, set `system1=nextdit_async`. Optimal performance is typically observed after **15,000 iterations**.
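Selecting between the two variants might look like the sketch below. Whether `system1=...` is a command-line override or a config-file field is an assumption here; check the training script for the exact mechanism. The loop only prints the launch commands (dry run):

```shell
# Dry run: print the joint-training launch command for each System1 variant.
# system1=navdp_async   -> w/ NavDP variant (RGB-D input)
# system1=nextdit_async -> DualVLN variant (RGB-only input)
for variant in navdp_async nextdit_async; do
  echo "sbatch ./scripts/train/base_train/qwenvl_train/train_dual_system.sh system1=$variant"
done
```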

## Single-System VLN Models

Single-System VLN models directly map **visual observations and language instructions** to navigation actions in an end-to-end manner.

### Supported Models

The following Single-System VLN models are currently supported:

- Seq2Seq
- CMA
- RDP

For our VLM-based VLN model **StreamVLN**, please refer to [StreamVLN](https://github.com/InternRobotics/StreamVLN) for training details.

Support for StreamVLN within InternNav is planned for future releases.

### Training Command

Training is performed through a unified training entry script.
Below are example commands for each supported model.

**Seq2Seq**
```bash
./scripts/train/base_train/start_train.sh --name seq2seq_train --model seq2seq
```

**CMA**
```bash
./scripts/train/base_train/start_train.sh --name cma_train --model cma
```

**RDP**
```bash
./scripts/train/base_train/start_train.sh --name rdp_train --model rdp
```
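The three commands above differ only in the model name, so a small wrapper can launch them in sequence. This is a convenience sketch, not part of the repo, and it only echoes the commands (dry run):

```shell
# Dry run: print the launch command for each baseline instead of executing it.
for model in seq2seq cma rdp; do
  echo "./scripts/train/base_train/start_train.sh --name ${model}_train --model ${model}"
done
```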

## VN System (System1) Models

VN System (System1) focuses on **low-level visual navigation and motion control**.

### Supported Methods

The following visual navigation methods are included in the System1 benchmark:

- DD-PPO
- iPlanner
- ViPlanner
- GNM
- ViNT
- NoMaD
- NavDP (**InternVLA-N1 System1**)

Among them, **only NavDP is currently supported for training** in InternNav.
All other methods are provided for **evaluation and comparison purposes only**.

### Training Command

**NavDP**

```bash
./scripts/train/base_train/start_train.sh --name navdp_train --model-name navdp
```
