Maximizing Query Diversity for Terrain Cost Preference Learning in Robot Navigation

Official PyTorch implementation of the paper: "Maximizing Query Diversity for Terrain Cost Preference Learning in Robot Navigation" Published in RO-MAN, 2025

Abstract

Effective robot navigation in real-world environments requires an understanding of terrain properties, as different terrain types impact factors such as speed, safety, and wear on the platform. Preference-based learning offers a compelling framework in which terrain costs can be inferred through simple trajectory queries to the user. However, existing query selection methods often suffer from redundant selection due to limited trajectory diversity, as well as query ambiguity, where the user must choose between trajectories with minimal distinguishable differences. These issues lead to inefficient learning and suboptimal terrain cost estimation. In this paper, we introduce a joint optimization framework that increases learning efficiency by improving both the diversity of the trajectory set and the query selection strategy. We used a variational autoencoder (VAE) to encode and group trajectories based on their terrain characteristics. Clusters were used to identify less represented terrain types so that new trajectories can be added to the corresponding cluster to ensure a balanced and representative query set. Additionally, we employ a cluster-aware query selection mechanism that prioritizes diverse trajectory pairs pulled from distinct clusters to maximize information gain. Experimental results demonstrate that our approach significantly reduces the number of queries required to converge to the ground-truth terrain cost assignment, outperforming state-of-the-art query selection techniques.

Background

To represent the terrain costs for robot navigation, we utilized costmaps, which are grid-based representations of the environment where each cell stores a cost value associated with traversing that area. Costmaps are typically generated from LIDAR data and, in our approach, we learned the terrain costs represented in the costmaps using human preferences.

I used this approach to infer terrain weights based on user preferences over robot trajectory/path samples by creating a feature function that simply reports the number of occurrences of each terrain in the path type divided by the total length.

Results

Using a simulated oracle to answer preference queries, I rigorously tested my approach against state-of-the-art baseline query acquisition functions. I primarily focused a suite of experiments comparing our method with mutual information based queries, which prioritizes the potential ability of a selected query to resolve uncertainty in the belief distribution, because our method is built on top of this.

First I conducted three fixed length experiments, where I tracked the learned weights over time for a certain number of queries. I did this with 25, 50, and 75 queries, respectively. The figure above shows the learned weights for both methods. Mutual information querying resulted in the less represented terrain types (those that did not frequently appear in the trajectories shown to the user) converged to approximately 0, indicating that they were not distinguishable from one another. On the other hand, our method showed that these terrain weights actually converged to a representative value, and to the correct ground truth order. The figure below also shows the alignment and convergence of the learned weights when compared to the ground truth costs over time. The convergence plot demonstrates that our approach got closer to the ground truth faster than the mutual information querying approach. Similarly, the alignment plot shows that our approach not only became fully aligned much faster than the comparative method,, but also remained aligned for longer during preference elicitation.

We also conducted convergence and alignment experiments where the learning continued until the mean absolute error was within 0.75 of the ground truth and where ground truth alignment was reach (0 terrain types out of order), respectively. For the first suite of experiments, we conducted 11 tests. In 8/11 tests our method showed faster convergence than PbIRL with mutual information queries, and 1 test was a tie. In other words, discounting the tie, our method converged faster 80% of the time. There were also several tests were our method converged significantly faster. We tested alignment three times, where our approach became aligned much more quickly in all runs. Overall, we show statistically significant results over mutual information based querying.

Thus our approach shows statistically significant improvements over mutual information.

Finally, we conducted an experiment comparing our method with not just mutual information, but three other query acquisition functions as well. These include volume removal, Thompson sampling, and disagreement based selection. Our algorithm showed faster convergence against all four other algorithms.

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Maximizing Query Diversity for Terrain Cost Preference Learning in Robot Navigation

Abstract

Background

Results

About

Uh oh!

Releases

Packages

Uh oh!

Languages

smiley-maker/VAE-Preference-Learning

Folders and files

Latest commit

History

Repository files navigation

Maximizing Query Diversity for Terrain Cost Preference Learning in Robot Navigation

Abstract

Background

Results

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages