Question
I'm trying to reproduce some results from the IQL paper, and I'm confused about how the dataset versions line up.
In Table 1 of the paper, the MuJoCo locomotion dataset names end in "-v2". But if I run `minari list remote --prefix mujoco`, I only see ones ending in "-v0":
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Name ┃ # Episodes ┃ # Steps ┃ Size ┃ Author ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ mujoco/hopper/expert-v0 │ 1K │ 999K │ 135.3 MB │ Kallinteris Andreas │
│ mujoco/hopper/simple-v0 │ 2K │ 999K │ 148.1 MB │ Kallinteris Andreas │
│ mujoco/hopper/medium-v0 │ 1K │ 999K │ 139.6 MB │ Kallinteris Andreas │
├─────────────────────────────────────────┼────────────┼─────────┼──────────┼─────────────────────┤
│ mujoco/halfcheetah/simple-v0 │ 1K │ 1M │ 210.2 MB │ Kallinteris Andreas │
│ mujoco/halfcheetah/expert-v0 │ 1K │ 1M │ 210.2 MB │ Kallinteris Andreas │
│ mujoco/halfcheetah/medium-v0 │ 1K │ 1M │ 210.2 MB │ Kallinteris Andreas │
├─────────────────────────────────────────┼────────────┼─────────┼──────────┼─────────────────────┤
│ mujoco/walker2d/simple-v0 │ 1K │ 1000K │ 210.3 MB │ Kallinteris Andreas │
│ mujoco/walker2d/medium-v0 │ 1K │ 1000K │ 210.8 MB │ Kallinteris Andreas │
│ mujoco/walker2d/expert-v0 │ 1K │ 999K │ 210.0 MB │ Kallinteris Andreas │
...
Same as on the minari docs here.
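To double-check, I also filtered the listing in Python. This is just a sketch: the ids below are hardcoded from the table above so the snippet is self-contained (in practice I believe they'd come from `minari.list_remote_datasets()`, though I may be misremembering the API):

```python
# Dataset ids copied from the `minari list remote` output above;
# normally these would come from the remote listing, not a literal.
remote_ids = [
    "mujoco/hopper/expert-v0",
    "mujoco/hopper/simple-v0",
    "mujoco/hopper/medium-v0",
    "mujoco/halfcheetah/simple-v0",
    "mujoco/halfcheetah/expert-v0",
    "mujoco/halfcheetah/medium-v0",
    "mujoco/walker2d/simple-v0",
    "mujoco/walker2d/medium-v0",
    "mujoco/walker2d/expert-v0",
]

# Look for any "-v2" datasets under the mujoco prefix.
v2_ids = [ds for ds in remote_ids if ds.startswith("mujoco/") and ds.endswith("-v2")]
print(v2_ids)  # -> []
```

So as far as I can tell, nothing under `mujoco/` is versioned v2 on the remote server.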
There's also the issue that the paper uses names like `walker2d-medium-replay-v2` and `halfcheetah-medium-expert-v2`. What are "medium-replay" and "medium-expert"?
I see on the old Farama D4RL repo that they say:
> Added new Gym-MuJoCo datasets (labeled v2) which fixed Hopper's performance and the qpos/qvel fields
So... the v2 MuJoCo datasets only ever existed in D4RL?
Finally, I noticed that in the same old D4RL repo's wiki here, they say:
> The *-v0 datasets were used to generate the results reported in our whitepaper and are included for backwards compatibility. However, the *-v2 datasets have improved metadata: ...
I don't know what "our whitepaper" refers to, but that wiki page is by Justin Fu, who is the lead author of the D4RL paper and isn't on the IQL paper, so I assume the whitepaper is the D4RL paper. I'm also not sure whether this relates to the datasets from the IQL paper that I'm looking for.
The IQL paper also says (section 5.2):
> We obtained results for TD3+BC and Onestep RL (Exp. Weight) directly from the authors. Note that Chen et al. (2021) and Brandfonbrener et al. (2021) incorrectly report results for some prior methods, such as CQL, using the “-v0” environments. These generally produce lower scores than the “-v2” environments that these papers use for their own methods. We use the “-v2” environments for all methods to ensure a fair comparison, resulting in higher values for CQL. Because of this fix, our reported CQL scores are higher than all other prior methods. We obtained results for “-v2” datasets using an author-suggested implementation.
I'm honestly not sure what to make of this, or whether it applies only to some environments or to the MuJoCo locomotion ones as well.
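For context on why the v0-vs-v2 difference shows up in the score tables at all: the numbers in these papers are, as I understand it, D4RL-normalized scores, scaled so that a random policy scores 0 and the expert reference policy scores 100. The per-environment reference returns live in D4RL itself (I believe in `d4rl/infos.py`), so I'm only sketching the formula with made-up numbers:

```python
def d4rl_normalized_score(raw_return: float, random_return: float, expert_return: float) -> float:
    """D4RL-style normalization: 0 = random policy, 100 = expert policy."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)

# With hypothetical reference returns (NOT the real d4rl/infos.py values):
print(d4rl_normalized_score(1500.0, 0.0, 3000.0))  # -> 50.0
```

So if the v0 and v2 datasets lead to different raw returns, the normalized scores in the tables differ too, which seems to be what the IQL footnote is about.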
Can someone clarify what's going on?
- Was there a name switch at some point?
- Were the v2 MuJoCo datasets just not ported to Minari?
- How can I get the datasets they used in the paper?
Thanks for any help!
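For context, the mapping I was assuming between the D4RL-style names in the paper and Minari-style dataset ids is the following. This is purely my guess, not something from the docs, and it clearly breaks down for "medium-replay"/"medium-expert", which have no counterpart in the remote listing above:

```python
def d4rl_to_minari_id(d4rl_name: str) -> str:
    """My guessed mapping from a D4RL-style name to a Minari-style id,
    e.g. 'walker2d-medium-replay-v2' -> 'mujoco/walker2d/medium-replay-v2'."""
    parts = d4rl_name.split("-")
    env, version = parts[0], parts[-1]
    quality = "-".join(parts[1:-1])  # may itself contain dashes ("medium-replay")
    return f"mujoco/{env}/{quality}-{version}"

print(d4rl_to_minari_id("hopper-medium-v2"))           # -> mujoco/hopper/medium-v2
print(d4rl_to_minari_id("walker2d-medium-replay-v2"))  # -> mujoco/walker2d/medium-replay-v2
```

Is something like this the intended correspondence, or are the naming schemes unrelated?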