Support demo-free training in strict DrS setup without crashing on empty buffers + update README command usage #1
Open
AbdullahHendy wants to merge 2 commits into `tongzhoumu:main`
Conversation
…mode
DrS claims that demonstrations are optional, but the current implementation
assumes they exist by default and crashes if no successful trajectory has
been collected early on. This gives the misleading impression that demos are
required, which contradicts the paper and, I believe, is not what the code authors intended.
To reproduce the crash:

```shell
python -m drs.drs_learn_reward_maniskill2 \
    --env-id TurnFaucet_DrS_learn-v0 \
    --n-stages 2 \
    --control-mode pd_ee_delta_pose \
    --demo-path ""
```

This leads to:

```
Exception: All buffers are empty!
```
This happens because DrS's discriminator-training logic tries to sample
from `stage_buffers[k+1:]` before any successful trajectories exist.
One fix is to catch the exception and skip discriminator training **only**
when `--demo-path` is not provided. If `--demo-path` is set, the crash is
intentional to signal a real data mismatch or loading issue.
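A minimal sketch of the proposed guard (the names `stage_buffers` and `demo_path` mirror the DrS code; `sample_from_buffers` and `train_step` are hypothetical stand-ins for the real sampling and update routines):

```python
# Hypothetical sketch of the proposed guard, not the actual DrS implementation.
def update_discriminator(stage_buffers, demo_path, train_step):
    """Run one discriminator update, or skip it when skipping is legitimate."""
    try:
        batch = sample_from_buffers(stage_buffers)  # may raise on empty buffers
    except Exception as e:
        if "All buffers are empty!" in str(e) and not demo_path:
            # No demos were provided, so empty success buffers are expected
            # early in training: skip this update instead of crashing.
            return None
        # With demos supplied, empty buffers signal a loading/mismatch bug,
        # so the crash is preserved intentionally.
        raise
    return train_step(batch)

def sample_from_buffers(stage_buffers):
    """Placeholder for the real sampling logic over stage buffers."""
    non_empty = [b for b in stage_buffers if b]
    if not non_empty:
        raise Exception("All buffers are empty!")
    return non_empty[0][-1]  # placeholder: return a sample from a non-empty buffer
```

With this shape, a no-demo run simply returns `None` on early iterations, while a run with `--demo-path` set still surfaces the original exception.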
This is especially important in cases where:
- Users want to run DrS with a different number of stages than the demo dataset supports.
- For example, TurnFaucet can have multiple valid stage definitions:
```python
def compute_stage_indicator(self):
    delta_angle = self.current_angle - self._last_angle
    success = self.evaluate()['success']
    return {
        'is_grasped': float(self.agent.check_grasp(self.target_link)),
        'handle_move': float((delta_angle > 1e-3) or success),
    }
```
Changing the stage logic would lead to tensor shape mismatches with the saved `demo_data/*.pkl` demos,
so users may intentionally skip them.
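To illustrate the mismatch in general terms (this is a hypothetical layout, not the actual `demo_data/*.pkl` schema): demos recorded under one stage definition carry one column per indicator, so a redefined indicator set no longer lines up:

```python
# Hypothetical illustration of the shape mismatch; not the real .pkl schema.
def stage_indicator_matrix(indicator_dicts):
    """Stack per-step indicator dicts into a T x n_indicators list of rows."""
    keys = sorted(indicator_dicts[0])
    return [[d[k] for k in keys] for d in indicator_dicts]

# A trajectory under the 2-indicator TurnFaucet definition shown above:
saved = stage_indicator_matrix(
    [{'is_grasped': 1.0, 'handle_move': 0.0}] * 5)

# The same trajectory under a redefined, 3-indicator stage logic
# ('reached' is an invented indicator name for illustration):
redefined = stage_indicator_matrix(
    [{'is_grasped': 1.0, 'handle_move': 0.0, 'reached': 1.0}] * 5)

print(len(saved[0]), len(redefined[0]))  # 2 columns vs. 3: shapes no longer match
```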
To my understanding, this change strictly preserves the DrS methodology: reward learning is only
triggered once the success/failure buffers contain data. The guarded logic avoids silent failures
and still crashes on invalid demo paths (raising the 'All buffers are empty!' exception), but it
continues gracefully in no-demo setups, where empty buffers are expected in the early stages of training.
…le imports
The original README included commands like:

```shell
python drs/drs_reuse_reward_maniskill2.py --env-id ...
```

However, running these as-is leads to:

```
ModuleNotFoundError: No module named 'drs'
```
This happens because absolute imports like `from drs.drs_learn_reward_maniskill2 import Discriminator`
cannot be resolved when a script is executed directly via its file path; the same failure occurs for every `import drs.*` statement in the repo.
The correct way to run these scripts is with the `-m` flag from the project root:

```shell
python -m drs.drs_reuse_reward_maniskill2 --env-id ...
```

This ensures that Python treats the `drs/` directory as a package and resolves the imports correctly.
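The difference in behavior comes from how Python initializes `sys.path`: direct file execution puts the script's own directory first, while `-m` puts the current working directory first. It can be reproduced with a throwaway package (`pkg` below is a hypothetical stand-in for `drs`):

```python
import os
import subprocess
import sys
import tempfile

# Build a hypothetical stand-in package `pkg` mirroring the drs layout.
with tempfile.TemporaryDirectory() as root:
    pkg = os.path.join(root, "pkg")
    os.mkdir(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "helper.py"), "w") as f:
        f.write("VALUE = 42\n")
    with open(os.path.join(pkg, "mod.py"), "w") as f:
        f.write("from pkg.helper import VALUE\nprint(VALUE)\n")

    # Running the file directly puts pkg/ (not the project root) on sys.path,
    # so the absolute import `pkg.helper` cannot be resolved.
    direct = subprocess.run(
        [sys.executable, os.path.join(pkg, "mod.py")],
        capture_output=True, text=True)

    # Running with -m from the project root puts root/ on sys.path first.
    as_module = subprocess.run(
        [sys.executable, "-m", "pkg.mod"],
        capture_output=True, text=True, cwd=root)

print("ModuleNotFoundError" in direct.stderr)  # True: direct run fails to import
print(as_module.stdout.strip())                # 42: -m run resolves the package
```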
This commit updates all usage examples in the README accordingly, avoiding runtime import errors
and improving the out-of-the-box experience.
First off, thank you for the great work on DrS — our team really enjoyed reading the paper and found the method both elegant and practical.
We're currently exploring your suggestion of using large language models (LLMs) to generate or refine stage indicators, and in doing so I ran into an issue when trying to run `drs_learn_reward_maniskill2.py` without demonstrations (`--demo-path ""`).
While the paper states that demonstrations are optional, the current implementation crashes with an "All buffers are empty!" exception during discriminator training if no successful trajectories have been collected yet. Initially, this seemed to contradict the intended behavior of DrS as described in the paper.
After understanding the logic, I realized this was just a runtime oversight, and I submitted a minimal fix that skips discriminator updates when `--demo-path` is empty and the success buffers are naturally unpopulated, as explained below. I also noticed that the README's example commands are meant to be run from the repo root but fail there due to Python import resolution; this PR addresses that as well.