diff --git a/docs/source/conf.py b/docs/source/conf.py index 2ee6771ea..13acd25cc 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -140,8 +140,8 @@ def get_version_path(): "navbar_center": "navbar-nav", "canonical_url": "https://meta-pytorch.org/forge/", "header_links_before_dropdown": 7, - "show_nav_level": 2, "show_toc_level": 2, + "navigation_depth": 3, } theme_variables = pytorch_sphinx_theme2.get_theme_variables() @@ -173,6 +173,7 @@ def get_version_path(): "colon_fence", "deflist", "html_image", + "substitution", ] # Configure MyST parser to treat mermaid code blocks as mermaid directives diff --git a/docs/source/getting_started.md b/docs/source/getting_started.md index 3fe46de7e..6c2218806 100644 --- a/docs/source/getting_started.md +++ b/docs/source/getting_started.md @@ -1,9 +1,289 @@ -# Get Started +# Getting Started -Welcome to TorchForge! This guide will help you get up and running with TorchForge, a PyTorch-native platform specifically designed for post-training generative AI models. +This guide will walk you through installing TorchForge, understanding its dependencies, verifying your setup, and running your first training job. -TorchForge specializes in post-training techniques for large language models, including: +## System Requirements -- **Supervised Fine-Tuning (SFT)**: Adapt pre-trained models to specific tasks using labeled data -- **Group Relative Policy Optimization (GRPO)**: Advanced reinforcement learning for model alignment -- **Multi-GPU Distributed Training**: Efficient scaling across multiple GPUs and nodes +Before installing TorchForge, ensure your system meets the following requirements. 
+ +| Component | Requirement | Notes | +|-----------|-------------|-------| +| **Operating System** | Linux (Fedora/Ubuntu/Debian) | MacOS and Windows not currently supported | +| **Python** | 3.10 or higher | Python 3.11 recommended | +| **GPU** | NVIDIA with CUDA support | AMD GPUs not currently supported | +| **Minimum GPUs** | 2+ for SFT, 3+ for GRPO | More GPUs enable larger models | +| **CUDA** | 12.8 | Required for GPU training | +| **RAM** | 32GB+ recommended | Depends on model size | +| **Disk Space** | 50GB+ free | For models, datasets, and checkpoints | +| **PyTorch** | Nightly build | Latest distributed features (DTensor, FSDP) | +| **Monarch** | Pre-packaged wheel | Distributed orchestration and actor system | +| **vLLM** | v0.10.0+ | Fast inference with PagedAttention | +| **TorchTitan** | Latest | Production training infrastructure | + + +## Prerequisites + +- **Conda or Miniconda**: For environment management + - Download from [conda.io](https://docs.conda.io/en/latest/miniconda.html) + +- **GitHub CLI (gh)**: Required for downloading pre-packaged dependencies + - Install instructions: [github.com/cli/cli#installation](https://github.com/cli/cli#installation) + - After installing, authenticate with: `gh auth login` + - You can use either HTTPS or SSH as the authentication protocol + +- **Git**: For cloning the repository + - Usually pre-installed on Linux systems + - Verify with: `git --version` + + +**Installation note:** The installation script provides pre-built wheels with PyTorch nightly already included. + +## Installation + +TorchForge uses pre-packaged wheels for all dependencies, making installation faster and more reliable. + +1. **Clone the Repository** + + ```bash + git clone https://github.com/meta-pytorch/forge.git + cd forge + ``` + +2. **Create Conda Environment** + + ```bash + conda create -n forge python=3.10 + conda activate forge + ``` + +3. 
**Run Installation Script** + + ```bash + ./scripts/install.sh + ``` + + The installation script will: + - Install system dependencies using DNF (or your package manager) + - Download pre-built wheels for PyTorch nightly, Monarch, vLLM, and TorchTitan + - Install TorchForge and all Python dependencies + - Configure the environment for GPU training + + ```{tip} + **Using sudo instead of conda**: If you prefer installing system packages directly rather than through conda, use: + `./scripts/install.sh --use-sudo` + ``` + + ```{warning} + When adding packages to `pyproject.toml`, use `uv sync --inexact` to avoid removing Monarch and vLLM. + ``` + +## Verifying Your Setup + +After installation, verify that all components are working correctly: + +1. **Check GPU Availability** + + ```bash + python -c "import torch; print(f'GPUs available: {torch.cuda.device_count()}')" + ``` + + Expected output: `GPUs available: 2` (or more) + +2. **Check CUDA Version** + + ```bash + python -c "import torch; print(f'CUDA version: {torch.version.cuda}')" + ``` + + Expected output: `CUDA version: 12.8` +3. **Check All Dependencies** + + ```bash + # Check core components + python -c "import torch, forge, monarch, vllm; print('All imports successful')" + + # Check specific versions + python -c " + import torch + import forge + import vllm + + print(f'PyTorch: {torch.__version__}') + print(f'TorchForge: {forge.__version__}') + print(f'vLLM: {vllm.__version__}') + print(f'CUDA: {torch.version.cuda}') + print(f'GPUs: {torch.cuda.device_count()}') + " + ``` + +4. **Verify Monarch** + + ```bash + python -c " + from monarch.actor import Actor, this_host + + # Test basic Monarch functionality + procs = this_host().spawn_procs({'gpus': 1}) + print('Monarch: Process spawning works') + " + ``` + +## Quick Start Examples + +Now that TorchForge is installed, let's run some training examples. 
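One environment detail worth knowing before launching anything: the examples below each assume a minimum GPU count, and on a machine with more GPUs you can pin a run to specific devices with the standard `CUDA_VISIBLE_DEVICES` environment variable (a general CUDA/PyTorch convention, not a TorchForge-specific flag):

```shell
# Expose only the first two GPUs to the training process;
# inside the process they are renumbered as devices 0 and 1.
export CUDA_VISIBLE_DEVICES=0,1
echo "Using GPUs: $CUDA_VISIBLE_DEVICES"  # prints: Using GPUs: 0,1
```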
+
+Here's what training looks like with TorchForge:
+
+```bash
+# Install dependencies
+conda create -n forge python=3.10
+conda activate forge
+git clone https://github.com/meta-pytorch/forge
+cd forge
+./scripts/install.sh
+
+# Download a model
+hf download meta-llama/Meta-Llama-3.1-8B-Instruct --local-dir /tmp/Meta-Llama-3.1-8B-Instruct --exclude "original/consolidated.00.pth"
+
+# Run SFT training (requires 2+ GPUs)
+uv run forge run --nproc_per_node 2 \
+  apps/sft/main.py --config apps/sft/llama3_8b.yaml
+
+# Run GRPO training (requires 3+ GPUs)
+python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
+```
+
+### Example 1: Supervised Fine-Tuning (SFT)
+
+Fine-tune Llama 3.1 8B on your data. **Requires: 2+ GPUs**
+
+1. **Download the Model**
+
+   ```bash
+   uv run forge download meta-llama/Meta-Llama-3.1-8B-Instruct \
+     --output-dir /tmp/Meta-Llama-3.1-8B-Instruct \
+     --ignore-patterns "original/consolidated.00.pth"
+   ```
+
+   ```{note}
+   Model downloads require Hugging Face authentication. Run `huggingface-cli login` first if you haven't already.
+   ```
+
+2. **Run Training**
+
+   ```bash
+   uv run forge run --nproc_per_node 2 \
+     apps/sft/main.py --config apps/sft/llama3_8b.yaml
+   ```
+
+   **What's Happening:**
+   - `--nproc_per_node 2`: Use 2 GPUs for training
+   - `apps/sft/main.py`: SFT training script
+   - `--config apps/sft/llama3_8b.yaml`: Configuration file with hyperparameters
+   - **TorchTitan** handles model sharding across the 2 GPUs
+   - **Monarch** coordinates the distributed training
+
+### Example 2: GRPO Training
+
+Train a model using reinforcement learning with GRPO.
**Requires: 3+ GPUs**
+
+```bash
+python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml
+```
+
+**What's Happening:**
+- GPU 0: Trainer model (being trained, powered by TorchTitan)
+- GPU 1: Reference model (frozen baseline, powered by TorchTitan)
+- GPU 2: Policy model (scoring outputs, powered by vLLM)
+- **Monarch** orchestrates all three components
+- **TorchStore** handles weight synchronization from training to inference
+
+## Understanding Configuration Files
+
+TorchForge uses YAML configuration files to manage training parameters. Let's examine a typical config:
+
+```yaml
+# Example: apps/sft/llama3_8b.yaml
+model:
+  name: meta-llama/Meta-Llama-3.1-8B-Instruct
+  path: /tmp/Meta-Llama-3.1-8B-Instruct
+
+training:
+  batch_size: 4
+  learning_rate: 1e-5
+  num_epochs: 10
+  gradient_accumulation_steps: 4
+
+distributed:
+  strategy: fsdp  # Managed by TorchTitan
+  precision: bf16
+
+checkpointing:
+  save_interval: 1000
+  output_dir: /tmp/checkpoints
+```
+
+**Key Sections:**
+- **model**: Model path and settings
+- **training**: Hyperparameters like batch size and learning rate
+- **distributed**: Multi-GPU strategy (FSDP, tensor parallel, etc.) handled by TorchTitan
+- **checkpointing**: Where and when to save model checkpoints
+
+## Next Steps
+
+Now that you have TorchForge installed and verified:
+
+1. **Learn the Concepts**: Read {doc}`concepts` to understand TorchForge's architecture, including Monarch, Services, and TorchStore
+2. **Explore Examples**: Check the `apps/` directory for more training examples
+3. **Read Tutorials**: Follow {doc}`tutorials` for step-by-step guides
+4. **API Documentation**: Explore {doc}`api` for detailed API reference
+
+## Getting Help
+
+If you encounter issues:
+
+1. **Search Issues**: Look through [GitHub Issues](https://github.com/meta-pytorch/forge/issues)
+2. 
**File a Bug Report**: Create a new issue with: + - Your system configuration (output of diagnostic command below) + - Full error message + - Steps to reproduce + - Expected vs actual behavior + +**Diagnostic command:** +```bash +python -c " +import torch +import forge + +try: + import monarch + monarch_status = 'OK' +except Exception as e: + monarch_status = str(e) + +try: + import vllm + vllm_version = vllm.__version__ +except Exception as e: + vllm_version = str(e) + +print(f'PyTorch: {torch.__version__}') +print(f'TorchForge: {forge.__version__}') +print(f'Monarch: {monarch_status}') +print(f'vLLM: {vllm_version}') +print(f'CUDA: {torch.version.cuda}') +print(f'GPUs: {torch.cuda.device_count()}') +" +``` + +Include this output in your bug reports! + +## Additional Resources + +- **Contributing Guide**: [CONTRIBUTING.md](https://github.com/meta-pytorch/forge/blob/main/CONTRIBUTING.md) +- **Code of Conduct**: [CODE_OF_CONDUCT.md](https://github.com/meta-pytorch/forge/blob/main/CODE_OF_CONDUCT.md) +- **Monarch Documentation**: [meta-pytorch.org/monarch](https://meta-pytorch.org/monarch) +- **vLLM Documentation**: [docs.vllm.ai](https://docs.vllm.ai) +- **TorchTitan**: [github.com/pytorch/torchtitan](https://github.com/pytorch/torchtitan) diff --git a/docs/source/index.md b/docs/source/index.md index 802d62baa..074fa228f 100644 --- a/docs/source/index.md +++ b/docs/source/index.md @@ -1,11 +1,183 @@ -# Welcome to TorchForge documentation! +# TorchForge Documentation -**TorchForge** is a PyTorch-native platform specifically designed -for post-training generative AI models. +**TorchForge** is a PyTorch-native library for RL post-training and agentic development. Built on the principle that **researchers should write algorithms, not infrastructure**. -Key Features ------------- +```{note} +**Experimental Status:** TorchForge is currently in early development. Expect bugs, incomplete features, and API changes. 
Please file issues on [GitHub](https://github.com/meta-pytorch/forge) for bug reports and feature requests. +``` + +## Why TorchForge? + +Reinforcement Learning has become essential to frontier AI - from instruction following and reasoning to complex research capabilities. But infrastructure complexity often dominates the actual research. + +TorchForge lets you **express RL algorithms as naturally as pseudocode**, while powerful infrastructure handles distribution, fault tolerance, and optimization underneath. + +### Core Design Principles + +- **Algorithms, Not Infrastructure**: Write your RL logic without distributed systems code +- **Any Degree of Asynchrony**: From fully synchronous PPO to fully async off-policy training +- **Composable Components**: Mix and match proven frameworks (vLLM, TorchTitan) with custom logic +- **Built on Solid Foundations**: Leverages Monarch's single-controller model for simplified distributed programming + +## Foundation: The Technology Stack + +TorchForge is built on carefully selected, battle-tested components: + +::::{grid} 1 1 2 2 +:gutter: 3 + +:::{grid-item-card} **Monarch** +:link: https://meta-pytorch.org/monarch + +Single-controller distributed programming framework that orchestrates clusters like you'd program a single machine. Provides actor meshes, fault tolerance, and RDMA-based data transfers. + +**Why it matters:** Eliminates SPMD complexity, making distributed RL tractable +::: + +:::{grid-item-card} **vLLM** +:link: https://docs.vllm.ai + +High-throughput, memory-efficient inference engine with PagedAttention and continuous batching. + +**Why it matters:** Handles policy generation efficiently at scale +::: + +:::{grid-item-card} **TorchTitan** +:link: https://github.com/pytorch/torchtitan + +Meta's production-grade LLM training platform with FSDP, pipeline parallelism, and tensor parallelism. 
+
+**Why it matters:** Battle-tested training infrastructure proven at scale
+:::
+
+:::{grid-item-card} **TorchStore**
+:link: https://github.com/meta-pytorch/torchstore
+
+Distributed, in-memory key-value store for PyTorch tensors built on Monarch, optimized for weight synchronization with automatic DTensor resharding.
+
+**Why it matters:** Solves the weight transfer bottleneck in async RL
+:::
+
+::::
+
+## What You Can Build
+
+::::{grid} 1 1 2 3
+:gutter: 2
+
+:::{grid-item-card} Supervised Fine-Tuning
+Adapt foundation models to specific tasks using labeled data with efficient multi-GPU training.
+:::
+
+:::{grid-item-card} GRPO Training
+Train models with Group Relative Policy Optimization for aligning with human preferences.
+:::
+
+:::{grid-item-card} Asynchronous RL
+Continuous rollout generation with non-blocking training for maximum throughput.
+:::
+
+:::{grid-item-card} Code Execution
+Safe, sandboxed code execution environments for RL on coding tasks (RLVR).
+:::
+
+:::{grid-item-card} Tool Integration
+Extensible environment system for agents that interact with tools and APIs.
+:::
+
+:::{grid-item-card} Custom Workflows
+Build your own components and compose them naturally with existing infrastructure.
+:::
+
+::::
+
+## Requirements at a Glance
+
+Before diving in, check out {doc}`getting_started` and ensure your system meets the requirements.
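That page walks through the full verification steps; as a rough stdlib-only sketch (the helper `check_requirements` below is illustrative, not part of the TorchForge API), you can probe two of the most commonly missed requirements before installing anything:

```python
import shutil
import sys

def check_requirements(min_python=(3, 10)):
    """Best-effort probe of two documented requirements (illustrative only)."""
    return {
        # Python 3.10+ is required; 3.11 is recommended
        "python_ok": sys.version_info[:2] >= min_python,
        # nvidia-smi on PATH is a rough proxy for a usable NVIDIA driver
        "nvidia_driver_found": shutil.which("nvidia-smi") is not None,
    }

print(check_requirements())
```

Deeper checks (CUDA 12.8, GPU count) still require the `torch`-based verification commands in {doc}`getting_started`.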
+ +## Writing RL Code + +With TorchForge, your RL logic looks like pseudocode: + +```python +async def generate_episode(dataloader, policy, reward, replay_buffer): + # Sample a prompt + prompt, target = await dataloader.sample.route() + + # Generate response + response = await policy.generate.route(prompt) + + # Score the response + reward_value = await reward.evaluate_response.route( + prompt=prompt, + response=response.text, + target=target + ) + + # Store for training + await replay_buffer.add.route( + Episode(prompt_ids=response.prompt_ids, + response_ids=response.token_ids, + reward=reward_value) + ) +``` + +No retry logic, no resource management, no synchronization code - just your algorithm. + +## Documentation Paths + +Choose your learning path: + +::::{grid} 1 1 2 2 +:gutter: 3 + +:::{grid-item-card} 🚀 Getting Started +:link: getting_started +:link-type: doc + +Installation, prerequisites, verification, and your first training run. + +**Time to first run: ~15 minutes** +::: + +:::{grid-item-card} 💻 Tutorials +:link: tutorials +:link-type: doc + +Step-by-step guides and practical examples for training with TorchForge. + +**For hands-on development** +::: + +:::{grid-item-card} 📖 API Reference +:link: api +:link-type: doc + +Complete API documentation for customization and extension. 
+ +**For deep integration** +::: + +:::: + +## Validation & Partnerships + +TorchForge has been validated in real-world deployments: + +- **Stanford Collaboration**: Integration with the Weaver weak verifier project, training models that hill-climb on challenging reasoning benchmarks (MATH, GPQA) +- **CoreWeave**: Large-scale training on 512 H100 GPU clusters with smooth, efficient performance +- **Scale**: Tested across hundreds of GPUs with continuous rollouts and asynchronous training + +## Community + +- **GitHub**: [meta-pytorch/forge](https://github.com/meta-pytorch/forge) +- **Issues**: [Report bugs and request features](https://github.com/meta-pytorch/forge/issues) +- **Contributing**: [CONTRIBUTING.md](https://github.com/meta-pytorch/forge/blob/main/CONTRIBUTING.md) +- **Code of Conduct**: [CODE_OF_CONDUCT.md](https://github.com/meta-pytorch/forge/blob/main/CODE_OF_CONDUCT.md) + +```{tip} +Before starting significant work, signal your intention in the issue tracker to coordinate with maintainers. +``` * **Post-Training Focus**: Specializes in techniques like Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) * **PyTorch Integration**: Built natively on PyTorch with @@ -18,17 +190,19 @@ Key Features like Llama3 8B and Qwen3.1 7B ```{toctree} -:maxdepth: 1 -:caption: Contents: +:maxdepth: 2 +:caption: Documentation getting_started -concepts tutorials api ``` -## Indices and tables +## Indices + +* {ref}`genindex` - Index of all documented objects +* {ref}`modindex` - Python module index + +--- -* {ref}`genindex` -* {ref}`modindex` -* {ref}`search` +**License**: BSD 3-Clause | **GitHub**: [meta-pytorch/forge](https://github.com/meta-pytorch/forge)