ai4ce · surendhar-palanisamy · Mar 10, 2026 · Mar 10, 2026 · Mar 11, 2026 · Mar 11, 2026
diff --git a/.github/workflows/deploy-unav-v2-modal.yml b/.github/workflows/deploy-unav-v2-modal.yml
@@ -17,6 +17,7 @@ on:
           - t4
           - any
           - a100
+          - h200
       ram_mb:
         description: "RAM reservation in MiB (max in this workflow: 98304)"
         required: true
@@ -58,8 +59,8 @@ jobs:
           if parsed <= 0:
               raise ValueError("scaledown_window must be a positive integer")
           gpu = os.environ["UNAV_GPU_TYPE"].strip().lower()
-          if gpu not in {"a10", "t4", "any", "a100"}:
-              raise ValueError("gpu_type must be one of: a10, t4, any, a100")
+          if gpu not in {"a10", "t4", "any", "a100", "h200"}:
+              raise ValueError("gpu_type must be one of: a10, t4, any, a100, h200")
           ram_mb = int(os.environ["UNAV_RAM_MB"])
           max_ram_mb = 98304
           if ram_mb <= 0:

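The validation step in the hunk above can be read as a small pure function. The following is a minimal sketch, not the workflow's actual inline script: `validate_inputs`, `ALLOWED_GPUS`, and `MAX_RAM_MB` are hypothetical names, and the upper-bound check on RAM is an assumption based on `max_ram_mb` and the `ram_mb` input description, since the hunk ends before that comparison.

```python
ALLOWED_GPUS = {"a10", "t4", "any", "a100", "h200"}  # allow-list after this change
MAX_RAM_MB = 98304  # workflow maximum, per the ram_mb input description


def validate_inputs(env):
    """Validate deployment inputs from an environment-like mapping (hypothetical helper)."""
    scaledown = int(env["UNAV_SCALEDOWN_WINDOW"])
    if scaledown <= 0:
        raise ValueError("scaledown_window must be a positive integer")

    # Normalize before comparing, as the workflow does with strip().lower().
    gpu = env["UNAV_GPU_TYPE"].strip().lower()
    if gpu not in ALLOWED_GPUS:
        allowed = ", ".join(sorted(ALLOWED_GPUS))
        raise ValueError(f"gpu_type must be one of: {allowed}")

    ram_mb = int(env["UNAV_RAM_MB"])
    # Assumed bound check: the diff shows max_ram_mb but not the comparison itself.
    if ram_mb <= 0 or ram_mb > MAX_RAM_MB:
        raise ValueError(f"ram_mb must be between 1 and {MAX_RAM_MB}")

    return {"scaledown_window": scaledown, "gpu_type": gpu, "ram_mb": ram_mb}
```

Passing a plain dict instead of `os.environ` keeps the checks testable without touching the process environment.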
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,197 @@
+# UNav-Server Agent Guidelines
+
+This document provides guidelines for agents working on the UNav-Server codebase.
+
+## Project Overview
+
+UNav-Server provides a serverless implementation for indoor navigation using computer vision. It leverages Modal for deployment and offers features like visual localization, path planning, and navigation guidance.
+
+## Project Structure
+
+```
+UNav-Server/
+├── src/
+│   └── modal_functions/
+│       ├── unav_v1/           # Legacy version
+│       ├── unav_v2/           # Current production version
+│       │   ├── unav_modal.py          # Main Modal app (~200 lines)
+│       │   ├── logic/                  # Extracted business logic
+│       │   │   ├── __init__.py         # Exports all run_* functions
+│       │   │   ├── navigation.py       # run_planner, run_localize_user
+│       │   │   ├── init.py             # Initialization & monkey-patching
+│       │   │   ├── places.py           # run_get_places, run_get_fallback_places
+│       │   │   ├── maps.py             # run_ensure_maps_loaded
+│       │   │   ├── utils.py            # run_safe_serialize, etc.
+│       │   │   └── vlm.py               # run_vlm_on_image
+│       │   ├── server_methods/
+│       │   │   └── helpers.py          # Queue utility functions
+│       │   ├── test_modal_functions.py
+│       │   ├── modal_config.py
+│       │   ├── deploy_config.py
+│       │   ├── destinations_service.py
+│       │   └── media/                  # Test images
+│       └── volume_utils/               # Volume management utilities
+├── .github/workflows/                   # CI/CD workflows
+├── requirements.txt                     # Python dependencies
+└── TODO.md                             # Technical documentation
+```
+
+## Build/Lint/Test Commands
+
+### Python Version
+- Minimum: Python 3.10
+- Recommended: Python 3.11 (used in CI/CD)
+
+### Setup
+```bash
+# Create virtual environment
+python -m venv .venv
+
+# Activate (macOS/Linux)
+source .venv/bin/activate
+
+# Install dependencies
+pip install -r requirements.txt
+```
+
+### Running Tests
+```bash
+# Navigate to the module directory
+cd src/modal_functions/unav_v2
+
+# Run a single test file
+python test_modal_functions.py
+
+# Run with pytest (if installed)
+pytest test_modal_functions.py -v
+```
+
+### Deployment Commands
+```bash
+# Deploy to Modal (from unav_v2 directory)
+cd src/modal_functions/unav_v2
+modal deploy unav_modal.py
+
+# Deploy with custom parameters
+UNAV_SCALEDOWN_WINDOW=600 UNAV_GPU_TYPE=t4 UNAV_RAM_MB=73728 modal deploy unav_modal.py
+```
+
+### GitHub Actions Deployment
+1. Go to Actions -> "Deploy UNav v2 Modal" -> "Run workflow"
+2. Set inputs: scaledown_window, gpu_type, ram_mb
+3. Requires secrets: MODAL_TOKEN_ID, MODAL_TOKEN_SECRET
+
+## Code Style Guidelines
+
+### Import Organization
+Order: stdlib -> third-party -> local imports, with blank lines between groups.
+
+```python
+import os
+import json
+from typing import Dict, List, Any, Optional
+
+import modal
+import cv2
+import numpy as np
+
+from .deploy_config import get_scaledown_window
+from .logic import run_planner, run_localize_user
+```
+
+### Naming Conventions
+- **Functions/variables**: snake_case (e.g., `get_destinations_list`, `image_data`)
+- **Classes**: PascalCase (e.g., `UnavServer`, `FacilityNavigator`)
+- **Constants**: UPPER_SNAKE_CASE (e.g., `BUILDING`, `PLACE`)
+- **Logic functions**: prefix with `run_` (e.g., `run_planner`, `run_safe_serialize`)
+- **Private methods**: prefix with underscore (e.g., `_configure_middleware_tracing`)
+
+### Type Hints
+Use type hints for function parameters and return values.
+
+```python
+def get_destinations_list_impl(
+    server: Any,
+    floor: str = "6_floor",
+    place: str = "New_York_City",
+    enable_multifloor: bool = False,
+) -> Dict[str, Any]:
+```
+
+### Refactoring Pattern: Logic Extraction
+
+When extracting code from `unav_modal.py`:
+
+1. **Keep `@method()` decorators in `unav_modal.py`** - Modal requires them
+2. **Move logic to `logic/` directory** - Each function gets `run_` prefix
+3. **Thin wrapper pattern** - Method in unav_modal.py just calls the logic function
+
+```python
+# unav_modal.py - thin wrapper
+@method()
+def planner(self, session_id: str, ...):
+    return run_planner(self, session_id=session_id, ...)
+
+# logic/navigation.py - actual logic
+def run_planner(self, session_id: str, ...) -> Dict[str, Any]:
+    # Full implementation here
+    pass
+```
+
+**DO NOT create wrapper methods** for internal functions (e.g., `get_session`, `update_session`) - call `run_*` functions directly from logic modules.
+
+### Error Handling
+- Use try/except blocks for operations that may fail
+- Catch specific exceptions when possible
+- Return error dictionaries for recoverable errors
+- Use print statements with emojis for logging
+
+```python
+try:
+    result = some_function()
+except ValueError as e:
+    print(f"❌ Invalid value: {e}")
+    return {"status": "error", "message": str(e)}
+except Exception as e:
+    print(f"❌ Unexpected error: {e}")
+    raise
+```
+
+### Code Formatting
+- Maximum line length: 100 characters (soft limit)
+- Use 4 spaces for indentation (no tabs)
+- Use blank lines to separate logical sections
+- Use trailing commas in multi-line collections
+- Use f-strings for string interpolation
+
+### Logging Patterns
+- `print("🔧 [Phase X] ...")` - Initialization steps
+- `print("✅ ...")` - Success messages
+- `print("⚠️ ...")` - Warnings
+- `print("❌ ...")` - Errors
+- `print(f"[DEBUG] ...")` - Debug info
+
+### Testing Guidelines
+- Test files: `test_modal_functions.py`
+- Use descriptive test parameters (BUILDING, PLACE, FLOOR, etc.)
+- Include error handling for Modal class lookup
+- Test both success and failure paths when applicable
+
+## Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| UNAV_SCALEDOWN_WINDOW | 300 | Modal scaledown window (seconds) |
+| UNAV_GPU_TYPE | t4 | GPU type (a10, t4, a100, any, h200) |
+| UNAV_RAM_MB | 73728 | RAM reservation in MiB |
+| MODAL_TOKEN_ID | - | Modal token ID (GitHub secret) |
+| MODAL_TOKEN_SECRET | - | Modal token secret (GitHub secret) |
+
+## Notes for Agents
+
+- This is a Modal-based serverless application
+- Tests require a deployed Modal app to run against
+- The codebase uses the unav-core library internally (runtime dependency - LSP errors are expected locally)
+- Code changes may require redeployment to take effect
+- Check TODO.md for technical context on implementation decisions
+- Runtime imports (torch, unav, middleware, google.genai) only exist in Modal container
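The thin-wrapper pattern described in AGENTS.md can be sketched in a form that runs without Modal installed. This is an illustration only: the local `method()` below is a stand-in for Modal's decorator, `destination_id` is a hypothetical parameter, and the body of `run_planner` is a placeholder rather than the real planner.

```python
from typing import Any, Dict


def method():
    """Stand-in for Modal's method decorator so this sketch runs without Modal."""
    def decorator(fn):
        return fn
    return decorator


# logic/navigation.py (sketch): the logic lives in a plain function.
def run_planner(server: Any, session_id: str, destination_id: str) -> Dict[str, Any]:
    # Placeholder body; the real planner is not shown in this document.
    return {
        "status": "success",
        "session_id": session_id,
        "destination_id": destination_id,
    }


# unav_modal.py (sketch): the decorated method only forwards its arguments.
class UnavServer:
    @method()
    def planner(self, session_id: str, destination_id: str) -> Dict[str, Any]:
        return run_planner(self, session_id=session_id, destination_id=destination_id)
```

Because `run_planner` takes the server as an ordinary argument, it can be exercised in isolation with any object, which is the main benefit of keeping the decorated method thin.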
diff --git a/TODO.md b/TODO.md
@@ -0,0 +1,121 @@
+# UNav-VIS4ION Alignment Tasks
+
+## Context
+The output differences between the Modal deployment and the static-GPU server come from the **wrapper/orchestration layer**, not from unav-core itself.
+
+## Root Cause Analysis
+
+### Why unav-server (static GPU) works better:
+1. **Map loading**: Loads ALL floors at startup (`localizer.load_maps_and_features()`)
+2. **Session persistence**: The session stays in memory across requests, so the queue persists
+3. **Queue works**: Subsequent calls use refinement queue → better accuracy
+
+### Why Modal fails:
+1. **Floor-locked maps**: Only loads target floor, not all floors
+2. **No session persistence**: `enable_memory_snapshot=False`, each call hits different container
+3. **Queue doesn't work**: The session and its queue are lost on cold start, so every call is **effectively a cold start**
+
+This is why the X value drifts: every single localization runs as a cold start.
+
+## Codex Comparison Results
+- Test case: `vianys_640_360_langone_elevator.jpeg`, NYU Langone, destination context set to `15_floor`
+- Floor-lock issue: 0/5 successful localizations (floor-locked) → 5/5 successful (multifloor)
+- Cold-start stabilization: XY error improved from 160.79 px → 58.90 px → 47.92 px
+
+---
+
+## Changes Implemented (Chronological)
+
+### 1. Enable Multifloor by Default
+- **Changed**: `enable_multifloor` default from `False` to `True` in both `planner()` and `localize_user()`
+- **Why**: Matches unav-server behavior - loads all floors for the building instead of just target floor
+- **Impact**: Raises the success rate on the test case from 0/5 to 5/5
+
+### 2. Queue Bucketing by Image Shape
+- **Added** helper functions:
+  - `_get_queue_key_for_image_shape()` - generates queue key based on `image.shape[:2]` (e.g., "360x640")
+  - `_get_refinement_queue_for_map()` - retrieves queue for specific map_key and queue_key
+  - `_update_refinement_queue()` - updates queue for specific map_key and queue_key
+- **Modified** queue structure to be nested: `{best_map_key: {queue_key: {pairs, initial_poses, pps}}}`
+- **Note**: Less critical since queue doesn't persist in serverless anyway
+
+### 3. Cold-Start Multi-Pass Stabilization (v1 - 2 passes)
+- **Initial**: Ran 2 localization passes on cold-start, averaged results
+- **Why**: Since queue doesn't work in serverless, each request needs self-correction
+
+### 4. Cold-Start Multi-Pass Stabilization (v2 - 3 passes)
+- **Changed**: Upgraded from 2 passes to 3 passes
+- **Updated**: bootstrap_mode label from "mean_pass2_pass3" to "mean_all_passes"
+- **Why**: More samples give a better average; returns diminish beyond 3 passes, but 3 still improved on 2
+
+### 5. Add Debug Fields
+Added `debug_info` to responses:
+- `map_scope`: "building_level_multifloor" or "floor_locked"
+- `bootstrap_mode`: "mean_all_passes", "single_pass", or "none"
+- `bootstrap_passes`: number of passes run
+- `queue_key`: image shape bucket key
+- `n_frames`: number of frames in queue
+- `top_candidates_count`: number of VPR candidates
+
+---
+
+## Technical Details
+
+### Helper Functions (lines 13-42)
+```python
+def _get_queue_key_for_image_shape(image_shape):
+    """Get a queue key based on image shape for bucket-based refinement queue handling."""
+    if image_shape is None:
+        return "default"
+    h, w = image_shape[:2]
+    return f"{h}x{w}"
+
+def _get_refinement_queue_for_map(queue_dict, map_key, queue_key):
+    """Get the refinement queue for a specific map_key and queue_key."""
+    ...
+
+def _update_refinement_queue(queue_dict, map_key, queue_key, new_queue_state):
+    """Update the refinement queue for a specific map_key and queue_key."""
+    ...
+```
+
+### Cold-Start Stabilization Logic
+- Triggered when `len(refinement_queue) == 0` (always in serverless)
+- Runs 3 bootstrap passes on empty queue
+- Averages XY coordinates and angle from all 3 passes
+- Updates queue after each pass for next iteration
+- Falls back to single pass if fewer than 2 succeed
+
+### Code Locations
+- `planner()`: lines ~1470-1520
+- `localize_user()`: lines ~1902-1952
+
+---
+
+## Expected Improvements
+
+| Metric | Before | After |
+|--------|--------|-------|
+| Success rate (floor-locked) | 0/5 | 5/5 |
+| XY error (cold-start) | ~160px | ~50px |
+| Map scope | floor-locked | building-level |
+
+---
+
+## Future Considerations
+
+1. **Enable memory snapshots**: Could persist queue across cold-starts (but adds ~5-10s restore time)
+2. **Client-side queue**: Pass queue with each request
+3. **External storage**: Redis for queue persistence
+4. **top_k optimization**: Experiment with different top_k values:
+   - Default (None) uses config value (~10-20)
+   - Lower top_k = faster but fewer candidates
+   - Higher top_k = slower but more candidates to match against
+   - Consider making it dynamic based on image quality
+
+---
+
+## Notes
+- Test image: `vianys_640_360_langone_elevator.jpeg`
+- Expected floor: `15_floor`
+- Reference mean output should be used for comparison
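The cold-start stabilization logic described in TODO.md (run several passes, average the successful ones, fall back to a single pass) could look roughly like the sketch below. It is not the code in `unav_modal.py`: `bootstrap_localize`, `run_pass`, and the pose keys `x`, `y`, `ang` are hypothetical, and the circular mean used for the heading is an assumption, since the document only says the angle is averaged. The `bootstrap_mode` labels are the ones listed under the debug fields.

```python
import math
from typing import Callable, Dict, List, Optional

Pose = Dict[str, float]  # assumed shape: {"x": ..., "y": ..., "ang": ...}, angle in degrees


def bootstrap_localize(
    run_pass: Callable[[], Optional[Pose]],
    n_passes: int = 3,
) -> Dict[str, object]:
    """Run n_passes localization passes and average the successful ones."""
    poses: List[Pose] = []
    for _ in range(n_passes):
        pose = run_pass()  # expected to return None when a pass fails
        if pose is not None:
            poses.append(pose)

    if not poses:
        return {"pose": None, "bootstrap_mode": "none", "bootstrap_passes": n_passes}
    if len(poses) < 2:
        # Fewer than 2 successes: fall back to the single result.
        return {"pose": poses[0], "bootstrap_mode": "single_pass", "bootstrap_passes": n_passes}

    # Arithmetic mean for position.
    x = sum(p["x"] for p in poses) / len(poses)
    y = sum(p["y"] for p in poses) / len(poses)
    # Circular mean for heading (assumption), so that headings of 350 and 20
    # degrees average to 5 rather than 185.
    s = sum(math.sin(math.radians(p["ang"])) for p in poses)
    c = sum(math.cos(math.radians(p["ang"])) for p in poses)
    ang = math.degrees(math.atan2(s, c)) % 360.0
    return {
        "pose": {"x": x, "y": y, "ang": ang},
        "bootstrap_mode": "mean_all_passes",
        "bootstrap_passes": n_passes,
    }
```

In this sketch the queue update after each pass would live inside `run_pass`, for example as a closure over the refinement queue, which keeps the averaging step independent of how the queue is stored.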
diff --git a/src/modal_functions/unav_v2/deploy_config.py b/src/modal_functions/unav_v2/deploy_config.py
@@ -27,6 +27,7 @@ def get_gpu_config() -> List[str]:
         "a10": "A10",
         "a10g": "A10",
         "a100": "A100",
+        "h200": "H200",
         "any": "any",
     }
 

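The alias table extended in this hunk maps user-facing GPU names to canonical labels. A minimal sketch of how such a lookup might be used follows. `resolve_gpu` and `GPU_ALIASES` are hypothetical names, the `t4` entry is assumed because it sits above the lines shown, and the real `get_gpu_config()` returns a `List[str]` whose construction is not visible here.

```python
GPU_ALIASES = {
    "t4": "T4",  # assumed entry; it is not among the lines shown in the hunk
    "a10": "A10",
    "a10g": "A10",
    "a100": "A100",
    "h200": "H200",
    "any": "any",
}


def resolve_gpu(raw: str, default: str = "t4") -> str:
    """Map a user-supplied GPU name to its canonical label (hypothetical helper)."""
    # Fall back to the default when the input is empty or None.
    key = (raw or default).strip().lower()
    if key not in GPU_ALIASES:
        raise ValueError(f"Unsupported gpu_type: {raw!r}")
    return GPU_ALIASES[key]
```

Note that the table accepts `a10g`, while the workflow's allow-list does not, so that alias is only reachable when deploying outside the GitHub Actions workflow.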
diff --git a/src/modal_functions/unav_v2/destinations_service.py b/src/modal_functions/unav_v2/destinations_service.py
@@ -1,5 +1,8 @@
 from typing import Any
 
+from .logic.places import run_get_places
+from .logic.maps import run_ensure_maps_loaded
+
 
 def get_destinations_list_impl(
     server: Any,
@@ -32,15 +35,17 @@ def _run():
         print(f"🎯 [Phase 3] Getting destinations for {place}/{building}/{floor}")
 
         # Ensure maps are loaded for this location.
-        server.ensure_maps_loaded(
-            place,
-            building,
+        run_ensure_maps_loaded(
+            server=server,
+            place=place,
+            building=building,
             floor=floor,
             enable_multifloor=enable_multifloor,
         )
 
         if enable_multifloor:
-            places = server.get_places(
+            places = run_get_places(
+                server,
                 target_place=place,
                 target_building=building,
                 enable_multifloor=True,