Support demo-free training in strict DrS setup without crashing on empty buffers + update README command usage#1

Open
AbdullahHendy wants to merge 2 commits into tongzhoumu:main from AbdullahHendy:fix-no-demo-empty-buffer

Conversation

@AbdullahHendy

First off, thank you for the great work on DrS — our team really enjoyed reading the paper and found the method both elegant and practical.

We're currently exploring your suggestion of using large language models (LLMs) to generate or refine stage indicators, and in doing so, I ran into an issue when trying to run drs_learn_reward_maniskill2.py without demonstrations (--demo-path "").

While the paper states that demonstrations are optional, the current implementation crashes with an "All buffers are empty!" exception during discriminator training if no successful trajectories have been collected yet. Initially, this seemed to contradict the intended behavior of DrS as described in the paper.

After understanding the logic, I realized this was just a runtime oversight, and I submitted a minimal fix that skips discriminator updates when `--demo-path` is empty and the success buffers are naturally unpopulated, as explained below. I also noticed that the current README gives example commands meant to be run from the repo's root directory, but they don't necessarily work as written due to Python imports; I tried to address this in this PR as well.


fix1: skip discriminator update if success buffer is empty in no-demo mode

DrS claims that demonstrations are optional, but the current implementation
assumes they exist by default and crashes if no successful trajectory has
been collected early on. This gives a misleading impression that demos are
required, which contradicts the paper and, I believe, isn't intended by the code authors.

To reproduce the crash:

    python -m drs.drs_learn_reward_maniskill2 \
        --env-id TurnFaucet_DrS_learn-v0 \
        --n-stages 2 \
        --control-mode pd_ee_delta_pose \
        --demo-path ""

This leads to:
    Exception: All buffers are empty!

This happens because DrS's discriminator training logic tries to sample
from stage_buffers[k+1:] before any successful trajectories exist.

One fix is to catch the exception and skip discriminator training **only**
when `--demo-path` is not provided. If `--demo-path` is set, the crash is
intentional to signal a real data mismatch or loading issue.
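A minimal sketch of this approach (function and argument names here are illustrative, not the repo's actual API):

```python
# Illustrative sketch of the proposed guard; names are hypothetical,
# not the actual functions in drs_learn_reward_maniskill2.py.

def maybe_train_discriminator(train_step, stage_buffers, demo_path):
    """Run one discriminator update, tolerating empty success buffers
    only when no demonstrations were provided."""
    try:
        train_step(stage_buffers)
    except Exception as e:
        if not demo_path and "All buffers are empty!" in str(e):
            # Demo-free mode: success buffers are expected to be empty
            # until the agent collects its first successful trajectory.
            return
        # With --demo-path set, an empty buffer indicates a real data
        # loading or stage-definition problem, so re-raise.
        raise
```

The key point is that the skip is conditional on `demo_path` being empty; with demos provided, the exception still surfaces.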

This is especially important in cases where:
- Users want to run DrS with a different number of stages than the demo dataset supports.
- For example, TurnFaucet can have multiple valid stage definitions:
    def compute_stage_indicator(self):
        # Angle progress of the faucet handle since the previous step.
        delta_angle = self.current_angle - self._last_angle
        success = self.evaluate()['success']
        return {
            # Stage 1: the gripper has grasped the target handle.
            'is_grasped': float(self.agent.check_grasp(self.target_link)),
            # Stage 2: the handle moved this step, or the task already succeeded.
            'handle_move': float((delta_angle > 1e-3) or success),
        }

    Changing the stage logic leads to tensor shape mismatches against the saved demo_data/*.pkl files,
    so users may intentionally skip demos.
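As a toy illustration of that mismatch (the shapes and the use of NumPy here are assumptions for illustration, not the repo's actual demo format): stage indicators recorded under one stage count cannot be stacked against indicators produced under another.

```python
import numpy as np

# Hypothetical: demos were recorded with 3 stage indicators per step...
demo_stage_flags = np.zeros((128, 3))
# ...but an edited environment now emits only 2 indicators per step.
online_stage_flags = np.zeros((64, 2))

try:
    # Mixing the two in one batch fails on the second axis.
    np.concatenate([demo_stage_flags, online_stage_flags], axis=0)
except ValueError as e:
    print("shape mismatch:", e)
```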

To my understanding, this change strictly preserves the DrS methodology: reward learning is only
triggered once the success/failure buffers contain data. The guard avoids silent failures and still
crashes on invalid demo paths (raising the 'All buffers are empty!' exception), but continues gracefully
in no-demo setups, where empty buffers are expected early in training.

fix2: update README command examples to use `python -m` for correct module imports

The original README included commands like:

    python drs/drs_reuse_reward_maniskill2.py --env-id ...

However, running these as-is leads to:

    ModuleNotFoundError: No module named 'drs'

This happens because absolute imports like `from drs.drs_learn_reward_maniskill2 import Discriminator`
cannot be resolved when scripts are executed directly by file path, since Python puts the script's own
directory, not the project root, on `sys.path`. The same error occurs in every script that imports from the `drs` package.

The correct way to run these scripts is with the `-m` flag from the project root:

    python -m drs.drs_reuse_reward_maniskill2 --env-id ...

This ensures that Python treats the `drs/` directory as a package and resolves its imports correctly.
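The underlying difference can be demonstrated in isolation. This self-contained sketch (using a hypothetical package name `pkg`, not the actual repo layout) shows that file-path execution puts the script's own directory, rather than the project root, first on `sys.path`, so the top-level package is not importable:

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny package: root/pkg/{__init__.py, main.py},
    # where main.py imports its own top-level package.
    pkg = os.path.join(root, "pkg")
    os.makedirs(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "main.py"), "w") as f:
        f.write("import pkg\nprint('ok')\n")

    # File-path execution: sys.path[0] is root/pkg, so `import pkg` fails.
    direct = subprocess.run([sys.executable, "pkg/main.py"],
                            cwd=root, capture_output=True, text=True)
    # Module execution from the root: sys.path[0] is the root, so it works.
    as_module = subprocess.run([sys.executable, "-m", "pkg.main"],
                               cwd=root, capture_output=True, text=True)

    print("direct run returncode:", direct.returncode)
    print("module run output:", as_module.stdout.strip())
```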

This commit updates all usage examples in the README accordingly to avoid runtime import errors
and improve out-of-the-box usage.
