
Where Did I Leave My Glasses? Open-Vocabulary Semantic Exploration in Real-World Semi-Static Environments

Benjamin Bogenberger1, Oliver Harrison1, Orrin Dahanaggamaarachchi1, Lukas Brunke1,2,3, Jingxing Qian2,3, Siqi Zhou1,4, Angela P. Schoellig1,2,3,

1Technical University of Munich, 2University of Toronto, 3Vector Institute, 4Simon Fraser University

IEEE RA-L · arXiv · Website · Python >=3.11 · CUDA 12.x

Official perception pipeline of Where Did I Leave My Glasses? Open-Vocabulary Semantic Exploration in Real-World Semi-Static Environments. This includes the blocks Sec. IV-A (green), Sec. IV-B (red), and Sec. IV-C (orange). Input data are posed RGB-D frames $\mathbf{F}_t$ and optionally a user query $\mathbf{q}$.

Block diagram

Abstract: Robots deployed in real-world environments, such as homes, must not only navigate safely but also understand their surroundings and adapt to changes in the environment. To perform tasks efficiently, they must build and maintain a semantic map that accurately reflects the current state of the environment. Existing research on semantic exploration largely focuses on static scenes without persistent object-level instance tracking. In this work, we propose an open-vocabulary, semantic exploration system for semi-static environments. Our system maintains a consistent map by building a probabilistic model of object instance stationarity, systematically tracking semi-static changes, and actively exploring areas that have not been visited for an extended period. In addition to active map maintenance, our approach leverages the map's semantic richness with large language model (LLM)-based reasoning for open-vocabulary object-goal navigation. This enables the robot to search more efficiently by prioritizing contextually relevant areas. We compare our approach against state-of-the-art baselines using publicly available object navigation and mapping datasets, and we further demonstrate real-world transferability in three real-world environments. Our approach outperforms the compared baselines in both success rate and search efficiency for object-navigation tasks and can more reliably handle changes when mapping semi-static environments. In real-world experiments, our system detects 95% of map changes on average, improving efficiency by more than 29% as compared to random and patrol strategies.
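The abstract's probabilistic model of object instance stationarity can be illustrated with a toy Beta-Bernoulli belief update: each re-observation of an object at its mapped location is evidence for stationarity, each observed relocation is evidence against. This sketch is illustrative only and is not the paper's actual formulation; the class and its fields are hypothetical.

```python
# Toy Beta-Bernoulli stationarity estimate for a single object instance.
# Illustrative sketch only -- the paper's model is more involved.

class StationarityBelief:
    def __init__(self, alpha: float = 1.0, beta: float = 1.0):
        # Beta(alpha, beta) prior over the probability that the object
        # is found where it was last mapped.
        self.alpha = alpha
        self.beta = beta

    def update(self, found_in_place: bool) -> None:
        # Conjugate update: success -> alpha, failure -> beta.
        if found_in_place:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def p_stationary(self) -> float:
        # Posterior mean of the Beta distribution.
        return self.alpha / (self.alpha + self.beta)

belief = StationarityBelief()
for obs in [True, True, False, True]:  # 3 re-observations, 1 move
    belief.update(obs)
print(round(belief.p_stationary, 2))  # 0.67
```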

Quick start

  • Install pixi

  • In the repository root directory, run (this installs and activates the pixi environment and builds the perceive_semantix_lib package):

    pixi shell
  • Choose whether you want to process data stored in the "raw" format or to work with input/output streams from ROS 2.

  • To adapt this library to your own application, start with the "raw" data interface; it is essentially a wrapper around:

    input = InputDataStamped(
        time_sec=time_sec,
        data=InputData(
            camera_intrinsics=camera_intrinsics,
            color=color_img,
            depth=depth_img,
            pose=camera_pose,
        ),  
    )
    scene.step(input)
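To see what a posed RGB-D frame $\mathbf{F}_t$ carries, the interface's containers can be mimicked with plain dataclasses. Field names mirror the snippet above; the concrete class definitions live in perceive_semantix_lib and may differ in detail, and the make_example_frame helper is purely illustrative.

```python
# Illustrative stand-in for the "raw" data interface containers.
from dataclasses import dataclass

import numpy as np


@dataclass
class InputData:
    camera_intrinsics: np.ndarray  # 3x3 pinhole matrix K
    color: np.ndarray              # HxWx3 uint8 RGB image
    depth: np.ndarray              # HxW float32 depth in meters
    pose: np.ndarray               # 4x4 camera-to-world transform


@dataclass
class InputDataStamped:
    time_sec: float
    data: InputData


def make_example_frame(t: float, h: int = 48, w: int = 64) -> InputDataStamped:
    """Build a dummy posed RGB-D frame F_t with identity pose."""
    k = np.array([[50.0, 0.0, w / 2], [0.0, 50.0, h / 2], [0.0, 0.0, 1.0]])
    return InputDataStamped(
        time_sec=t,
        data=InputData(
            camera_intrinsics=k,
            color=np.zeros((h, w, 3), dtype=np.uint8),
            depth=np.ones((h, w), dtype=np.float32),
            pose=np.eye(4),
        ),
    )


frame = make_example_frame(0.0)
print(frame.data.color.shape)  # (48, 64, 3)
```

In the real pipeline, each such frame would then be passed to scene.step as shown above.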

Run raw data interface

  • Unzip the example data

    unzip $PIXI_PROJECT_ROOT/example_data/input_streams/raw/ball_reidentification_experiment.zip  -d $PIXI_PROJECT_ROOT/example_data/input_streams/raw/
  • Run the pipeline (this creates cache folders, downloads model weights if they are not already present, and creates a logging directory):

    python $PIXI_PROJECT_ROOT/interfaces/disk_io/main.py $PIXI_PROJECT_ROOT/example_data/input_streams/raw/ball_reidentification_experiment -v

Run ROS interface

  • Download the example ROS bag from https://drive.google.com/file/d/1UydbDrrtkGNGaZbzJEqIdlpPAD8VTFEv/view?usp=drive_link and unzip it

  • Activate the pixi environment: pixi shell

  • Build the package: colcon build --cmake-args -DPython_EXECUTABLE=$(which python)

  • Source the package: source install/setup.bash

  • Run the node (this creates cache folders, downloads model weights if they are not already present, and creates a logging directory):

    ros2 run perceive_semantix_ros2 perceive_semantix_node --ros-args -p image_rotations_clockwise:=-1 -p store_output:=False -p initial_scene_path:=$PIXI_PROJECT_ROOT/example_data/premapped_scenes/scene_office_legacy.pkl
    • Explanation of arguments:
      • image_rotations_clockwise:=-1: accounts for the mounting orientation of the camera; the object recognition networks work best with upright images
      • store_output:=False: do not store the mapping output
      • initial_scene_path:=...: path to a previously generated map used for initialization
  • Play the ROS bag

    ros2 bag play <path_to_your_unzipped_rosbag>
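The image_rotations_clockwise argument presumably rotates each incoming frame in 90-degree clockwise steps before it reaches the recognition networks. A minimal sketch of that assumed semantics using numpy (this is our interpretation, not the node's actual code):

```python
import numpy as np


def rotate_clockwise(img: np.ndarray, quarter_turns: int) -> np.ndarray:
    """Rotate an image by `quarter_turns` 90-degree clockwise steps.

    Negative values rotate counterclockwise; np.rot90 counts
    counterclockwise turns, hence the sign flip on k.
    """
    return np.rot90(img, k=-quarter_turns)


img = np.arange(6).reshape(2, 3)     # landscape 2x3 test pattern
upright = rotate_clockwise(img, -1)  # one counterclockwise quarter turn
print(upright.shape)  # (3, 2)
```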

Contributing

Code Outline

Code is separated into interfaces (./interfaces, e.g. the ROS interface) and the core library (./perceive_semantix_lib). An outline of the core library is given in its README.

Tool Setup

  1. The project uses the Ruff Python linter and code formatter, and uses typeguard together with jaxtyping (for arrays) for runtime type-checking. These tools are installed and enabled in the dev environment:

    pixi shell -e dev
  2. Inside the dev environment run

    ruff check

    and

    ruff format
  3. Ruff extensions are also available for code editors, e.g., Ruff for VS Code

Citation

If you find this work useful, please consider citing our paper:

@ARTICLE{semi-static-semantic-exploration,
  author={Bogenberger, Benjamin and Harrison, Oliver and Dahanaggamaarachchi, Orrin and Brunke, Lukas and Qian, Jingxing and Zhou, Siqi and Schoellig, Angela P.},
  journal={IEEE Robotics and Automation Letters}, 
  title={Where Did I Leave My Glasses? Open-Vocabulary Semantic Exploration in Real-World Semi-Static Environments}, 
  year={2026},
  doi={10.1109/LRA.2026.3656790}
}
