Support demo-free training in strict DrS setup without crashing on empty buffers + update README command usage#1

Open
AbdullahHendy wants to merge 2 commits into tongzhoumu:main from AbdullahHendy:fix-no-demo-empty-buffer

Conversation

@AbdullahHendy

First off, thank you for the great work on DrS — our team really enjoyed reading the paper and found the method both elegant and practical.

We're currently exploring your suggestion of using large language models (LLMs) to generate or refine stage indicators, and in doing so, I ran into an issue when trying to run drs_learn_reward_maniskill2.py without demonstrations (--demo-path "").

While the paper states that demonstrations are optional, the current implementation crashes with an "All buffers are empty!" exception during discriminator training if no successful trajectories have been collected yet. Initially, this seemed to contradict the intended behavior of DrS as described in the paper.

After understanding the logic, I realized this was just a runtime oversight, and I submitted a minimal fix that skips discriminator updates when `--demo-path` is empty and the success buffers are naturally unpopulated, as explained below. I also noticed that the current README gives example commands meant to be run from the repo's root directory, but they don't necessarily work as written due to Python imports; I tried to address this in this PR as well.


fix1: skip discriminator update if success buffer is empty in no-demo mode

DrS claims that demonstrations are optional, but the current implementation
assumes they exist by default and crashes if no successful trajectory has
been collected early on. This gives a misleading impression that demos are
required, which contradicts the paper and, I believe, isn't intended by the code authors.

To reproduce the crash:

    python -m drs.drs_learn_reward_maniskill2 \
        --env-id TurnFaucet_DrS_learn-v0 \
        --n-stages 2 \
        --control-mode pd_ee_delta_pose \
        --demo-path ""

This leads to:
    Exception: All buffers are empty!

This happens because DrS's discriminator training logic tries to sample
from stage_buffers[k+1:] before any successful trajectories exist.

One fix is to catch the exception and skip discriminator training **only**
when `--demo-path` is not provided. If `--demo-path` is set, the crash is
intentional to signal a real data mismatch or loading issue.
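A minimal sketch of this approach (function and argument names here are illustrative, not the repo's actual API):

```python
# Illustrative sketch of the proposed guard; names are hypothetical,
# not the actual functions in drs_learn_reward_maniskill2.py.

def maybe_train_discriminator(train_step, stage_buffers, demo_path):
    """Run one discriminator update, tolerating empty success buffers
    only when no demonstrations were provided."""
    try:
        train_step(stage_buffers)
    except Exception as e:
        if not demo_path and "All buffers are empty!" in str(e):
            # Demo-free mode: success buffers are expected to be empty
            # until the agent collects its first successful trajectory.
            return
        # With --demo-path set, an empty buffer indicates a real data
        # loading or stage-definition problem, so re-raise.
        raise
```

The key point is that the skip is conditional on `demo_path` being empty; with demos provided, the exception still surfaces.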

This is especially important in cases where:
- Users want to run DrS with a different number of stages than the demo dataset supports.
- For example, TurnFaucet can have multiple valid stage definitions:
    def compute_stage_indicator(self):
        # Angle progress of the faucet handle since the previous step.
        delta_angle = self.current_angle - self._last_angle
        success = self.evaluate()['success']
        return {
            # Stage 1: the gripper has grasped the target handle.
            'is_grasped': float(self.agent.check_grasp(self.target_link)),
            # Stage 2: the handle moved this step, or the task already succeeded.
            'handle_move': float((delta_angle > 1e-3) or success),
        }

    Changing the stage logic leads to tensor shape mismatches against the saved demo_data/*.pkl files,
    so users may intentionally skip demos.
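As a toy illustration of that mismatch (the shapes and the use of NumPy here are assumptions for illustration, not the repo's actual demo format): stage indicators recorded under one stage count cannot be stacked against indicators produced under another.

```python
import numpy as np

# Hypothetical: demos were recorded with 3 stage indicators per step...
demo_stage_flags = np.zeros((128, 3))
# ...but an edited environment now emits only 2 indicators per step.
online_stage_flags = np.zeros((64, 2))

try:
    # Mixing the two in one batch fails on the second axis.
    np.concatenate([demo_stage_flags, online_stage_flags], axis=0)
except ValueError as e:
    print("shape mismatch:", e)
```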

To my understanding, this change strictly preserves the DrS methodology: reward learning is only
triggered once the success/failure buffers contain data. The guard avoids silent failures and still
crashes on invalid demo paths (raising the 'All buffers are empty!' exception), but continues gracefully
in no-demo setups, where empty buffers are expected early in training.

fix2: update README command examples to use `python -m` for correct module imports

The original README included commands like:

    python drs/drs_reuse_reward_maniskill2.py --env-id ...

However, running these as-is leads to:

    ModuleNotFoundError: No module named 'drs'

This happens because absolute imports like `from drs.drs_learn_reward_maniskill2 import Discriminator`
cannot be resolved when scripts are executed directly by file path, since Python puts the script's own
directory, not the project root, on `sys.path`. The same error occurs in every script that imports from the `drs` package.

The correct way to run these scripts is with the `-m` flag from the project root:

    python -m drs.drs_reuse_reward_maniskill2 --env-id ...

This ensures that Python treats the `drs/` directory as a package and resolves its imports correctly.
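The underlying difference can be demonstrated in isolation. This self-contained sketch (using a hypothetical package name `pkg`, not the actual repo layout) shows that file-path execution puts the script's own directory, rather than the project root, first on `sys.path`, so the top-level package is not importable:

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny package: root/pkg/{__init__.py, main.py},
    # where main.py imports its own top-level package.
    pkg = os.path.join(root, "pkg")
    os.makedirs(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "main.py"), "w") as f:
        f.write("import pkg\nprint('ok')\n")

    # File-path execution: sys.path[0] is root/pkg, so `import pkg` fails.
    direct = subprocess.run([sys.executable, "pkg/main.py"],
                            cwd=root, capture_output=True, text=True)
    # Module execution from the root: sys.path[0] is the root, so it works.
    as_module = subprocess.run([sys.executable, "-m", "pkg.main"],
                               cwd=root, capture_output=True, text=True)

    print("direct run returncode:", direct.returncode)
    print("module run output:", as_module.stdout.strip())
```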

This commit updates all usage examples in the README accordingly to avoid runtime import errors
and improve out-of-the-box usage.
