Hardik Shah1,2,
Erica Tevere1,
Deegan Atha1,
Marcel Kaufmann1,
Shehryar Khattak1,
Manthan Patel2,
Marco Hutter2,
Jonas Frey2,3,4,
Patrick Spieler1
1Jet Propulsion Laboratory (JPL), NASA   2Robotics Systems Lab, ETH Zurich
3Stanford University   4University of California, Berkeley
Autonomous navigation in complex, unstructured outdoor environments requires robots to operate over long ranges without prior maps and with only limited depth sensing. In such settings, relying solely on geometric frontiers for exploration is often insufficient; the ability to reason semantically about where to go and what is safe to traverse is crucial for robust, efficient exploration.
This work presents WildOS, a unified system for long-range, open-vocabulary object search that combines safe geometric exploration with semantic visual reasoning. WildOS builds a sparse navigation graph to maintain spatial memory, while a foundation-model-based vision module, ExploRFM, scores the frontier nodes of the graph. ExploRFM simultaneously predicts traversability, visual frontiers, and object similarity in image space, enabling semantic navigation tasks to run in real time onboard the robot. The resulting vision-scored graph lets the robot explore semantically meaningful directions while ensuring geometric safety.
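To make the vision-scored graph concrete, here is a minimal sketch of how semantic and geometric scores might be combined when ranking frontier nodes. The `FrontierNode` fields, weights, and threshold are all hypothetical illustrations, not the repository's actual scoring rule:

```python
from dataclasses import dataclass

@dataclass
class FrontierNode:
    node_id: int
    traversability: float  # mean predicted traversability toward the node, in [0, 1]
    similarity: float      # open-vocabulary similarity of the view toward the node
    frontier_score: float  # predicted visual-frontier likelihood

def score(node, w_sim=1.0, w_frontier=0.5, trav_min=0.4):
    # Hypothetical rule: reject geometrically unsafe nodes outright,
    # then rank the remainder by semantic relevance and frontier likelihood.
    if node.traversability < trav_min:
        return float("-inf")
    return w_sim * node.similarity + w_frontier * node.frontier_score

nodes = [FrontierNode(0, 0.9, 0.2, 0.8), FrontierNode(1, 0.1, 0.9, 0.9)]
best = max(nodes, key=score)  # node 1 is semantically better but unsafe
```

The hard traversability gate ensures that a semantically tempting but untraversable direction can never outrank a safe one.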
Furthermore, we introduce a particle-filter-based method for coarse localization of the open-vocabulary target query that estimates candidate goal positions beyond the robot's immediate depth horizon, enabling effective planning toward distant goals. Extensive closed-loop field experiments across diverse off-road and urban terrains demonstrate that WildOS enables robust navigation, significantly outperforming purely geometric and purely vision-based baselines in both efficiency and autonomy.
```
wildos/
├── nvidia_radio/        # Modified RADIO backbone with NACLIP + SigLIP2 alignment
├── explorfm/            # ExploRFM model (inference): frontiers, traversability, object similarity
├── explorfm_trainer/    # Training pipeline for ExploRFM heads (Lightning + Hydra)
├── visual_navigation/   # ROS 2 navigation: WildOS, baselines (LRN, ImgFrontierNav)
├── triangulation3d/     # Particle-filter-based 3D object triangulation
├── graphnav_planner/    # Graph-based path planner (C++)
├── graphnav_msgs/       # ROS 2 message definitions for the navigation graph
├── object_search_msgs/  # ROS 2 message definitions for object search
├── gps_visualization/   # GPS path visualization (ROS 2 C++)
└── ckpts/               # Model checkpoints
```
Each package has its own README with additional details. See the Component Overview section below.
- ROS 2 Jazzy (tested)
- Python >= 3.10
- uv – Python package manager
- CUDA-capable GPU (ExploRFM was trained on an NVIDIA GeForce RTX 4090 and deployed on an NVIDIA Jetson AGX Orin)
```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create and activate a virtual environment
uv venv wildos_venv
source wildos_venv/bin/activate

# Install Python dependencies
uv pip install -r requirements.txt
uv pip install -e ./nvidia_radio
uv pip install -e ./explorfm
uv tool install huggingface_hub[cli]
```

```bash
# From your colcon workspace (with this repo cloned/symlinked into src/)
colcon build --packages-select graphnav_msgs object_search_msgs gps_visualization graphnav_planner triangulation3d visual_navigation
source install/setup.bash
```

Note: WildOS was deployed inside a Docker container during field experiments. The dependencies above can be replicated in a virtual environment for development.
Pre-trained head checkpoints are included in ckpts/:
| Checkpoint | Description |
|---|---|
| `ckpts/frontier_head.ckpt` | Visual frontier prediction head |
| `ckpts/trav_head.ckpt` | Traversability prediction head |

- C-RADIOv3-B backbone – download to `ckpts/`:

  ```bash
  # From: https://huggingface.co/nvidia/C-RADIOv3-B/blob/main/c-radio_v3-b_half.pth.tar
  wget -P ckpts/ https://huggingface.co/nvidia/C-RADIOv3-B/resolve/main/c-radio_v3-b_half.pth.tar
  ```

- SigLIP2 adaptor – download to `ckpts/siglip2/`:

  ```bash
  huggingface-cli download google/siglip2-so400m-patch16-naflex --cache-dir ckpts/siglip2
  ```
Path configuration: All nodes in `visual_navigation` expect the `ckpts/` folder to be at `Path.home() / ckpts`.
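A minimal sketch of what that convention implies when resolving checkpoints in your own scripts (the variable names here are illustrative, not ones defined by the repository):

```python
from pathlib import Path

# The visual_navigation nodes resolve checkpoints relative to the home
# directory; to keep checkpoints elsewhere, symlink that location to ~/ckpts.
CKPT_ROOT = Path.home() / "ckpts"
TRAV_HEAD = CKPT_ROOT / "trav_head.ckpt"
FRONTIER_HEAD = CKPT_ROOT / "frontier_head.ckpt"
```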
```bash
python explorfm/explorfm_model.py
```

Expected output:

```
[INFO] Loading SigLIP2 model and processor for version: google/siglip2-so400m-patch16-naflex
[INFO] Using checkpoint path: ckpts/siglip2
Loaded traversability head from ckpts/trav_head.ckpt
Loaded frontier head from ckpts/frontier_head.ckpt
Traversability shape: torch.Size([1, 1, 720, 1280])
Frontiers shape: torch.Size([1, 1, 720, 1280])
Adaptor features shape: torch.Size([1, 1152, 22, 40])
```
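For intuition, language-aligned adaptor features of this shape can be scored against an open-vocabulary text query via per-patch cosine similarity. This is a hedged sketch of the general technique, not the repository's API; `similarity_map` is a hypothetical helper, and real use would upsample the patch-grid result to image resolution:

```python
import numpy as np

def similarity_map(patch_feats, text_emb):
    # patch_feats: (B, C, H, W) language-aligned features, e.g. (1, 1152, 22, 40)
    # text_emb:    (C,) embedding of the open-vocabulary text query
    f = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    # Cosine similarity per patch; values lie in [-1, 1].
    return np.einsum("bchw,c->bhw", f, t)

sim = similarity_map(np.random.rand(1, 1152, 22, 40), np.random.rand(1152))
```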
```bash
# Launch WildOS with open-vocabulary object search
ros2 launch visual_navigation wildos_launch.py ns:=spot1 do_object_search:=true

# Launch the graph planner
ros2 launch graphnav_planner graphnav_planner.launch.yml ns:=spot1
```

```bash
# Image Frontier Navigation baseline
ros2 launch visual_navigation imgfrontier_nav_launch.py ns:=spot1 do_object_search:=true

# LRN baseline
ros2 launch visual_navigation lrn_launch.py ns:=spot1 do_object_search:=false
```

```bash
# Standalone ExploRFM triangulation (for testing, with teleoperation)
ros2 launch visual_navigation explorfm_triangulation_launch.py robot_namespace:=spot1

# Visualize ExploRFM outputs (debugging)
ros2 run visual_navigation viz_net
```

All experiment videos are available on YouTube.
The following packages must be running alongside WildOS:
- Elevation Mapping CuPy – GPU-based local 2.5D mapping
- DLIO – LiDAR-inertial odometry
- Nav2 – local planning and control
- Graph Construction – code will be released in a future update
| Package | Description | Details |
|---|---|---|
| `nvidia_radio/` | Modified RADIO backbone with NACLIP + SigLIP2 language alignment | README |
| `explorfm/` | ExploRFM model – predicts traversability, visual frontiers, and object similarity | README |
| `explorfm_trainer/` | Lightning + Hydra training pipeline for ExploRFM heads | README |
| `visual_navigation/` | ROS 2 navigation: WildOS pipeline, baselines (LRN, ImgFrontierNav), scoring, triangulation | README |
| `triangulation3d/` | Particle-filter-based 3D object triangulation | README |
| `graphnav_planner/` | C++ graph-based path planner | — |
| `graphnav_msgs/` | ROS 2 message definitions for the navigation graph | — |
| `object_search_msgs/` | ROS 2 message definitions for object search | — |
| `gps_visualization/` | GPS path visualization (ROS 2 C++) | — |
If you find this work useful, please cite:
```bibtex
@misc{shah2026wildosopenvocabularyobjectsearch,
  title={WildOS: Open-Vocabulary Object Search in the Wild},
  author={Hardik Shah and Erica Tevere and Deegan Atha and Marcel Kaufmann and Shehryar Khattak and Manthan Patel and Marco Hutter and Jonas Frey and Patrick Spieler},
  year={2026},
  eprint={2602.19308},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2602.19308},
}
```

We thank the authors of the following works for open-sourcing their code:
We also thank the authors of LRN for sharing their code, which was helpful in setting up the baseline.
This project is released under the Apache 2.0 License.