Replies: 13 comments
-
@ManifoldFR may I ask how you obtained the values for the model/policy hyperparams? Did you perform tuning using Optuna as in the RL zoo?
-
I started from the parameters of Jason Peng's code, but for things like the maximum grad norm, target KL, or vf coef I had to make guesses, because these were not parameters in his PPO implementation (he also used two separate optimizers for the policy and value functions).
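For reference, a minimal sketch of what passing those guessed knobs to stable-baselines3's PPO looks like. The numbers below are illustrative placeholders, not the values from my run, and the env id assumes the DeepMimic humanoid env registered by pybullet_envs:

```python
import gym
import pybullet_envs  # noqa: F401  (registers the DeepMimic bullet envs)
from stable_baselines3 import PPO

env = gym.make("HumanoidDeepMimicWalkBulletEnv-v1")  # assumed env id

# Placeholder values: these knobs do not exist in Jason Peng's PPO, so they
# had to be guessed when moving to SB3 (which also uses a single optimizer
# for both the policy and value heads).
model = PPO(
    "MlpPolicy",
    env,
    max_grad_norm=0.5,
    target_kl=0.02,
    vf_coef=0.5,
    verbose=1,
)
```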
-
Training sometimes gets stuck in such behavior. Did you try a couple of training runs?
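In case it helps, a quick sketch of what a couple of runs with different seeds could look like (env id assumed as in the snippet above; the timestep budget is arbitrary):

```python
import gym
import pybullet_envs  # noqa: F401
from stable_baselines3 import PPO

# Launch a few runs with different seeds to check whether the "stuck"
# behavior is just an unlucky initialization.
for seed in (0, 1, 2):
    env = gym.make("HumanoidDeepMimicWalkBulletEnv-v1")  # assumed env id
    model = PPO("MlpPolicy", env, seed=seed, verbose=0)
    model.learn(total_timesteps=2_000_000)
    model.save(f"deepmimic_ppo_seed{seed}")
```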
-
What about the discount factor and lambda parameter for TD(lambda)? Also, are you using my branch with the modifications to the Gym env?
-
@erwincoumans I tried a couple of runs using my script and another one using the training script from the stable-baselines3 zoo. @ManifoldFR I used the default values for the discount factor and lambda parameter. Did you use custom values? I wondered whether you also used the default ones, given that you didn't list them with the other params. I used the version with action/observation scaling, so I guess it's the same.
-
Sorry about that, I use a strategy where I have a default set of PPO params on top of SB3's defaults, and the values I gave you were the overrides for both of them. Check the hyperparams.yml in the Dropbox link I sent: I use the same discount and lambda (0.95) as Jason Peng. I think one of the important things was that I use 4096 timesteps per env per rollout.
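To make the layering concrete, here is a rough sketch; the dict-of-overrides structure is my own illustration rather than the actual hyperparams.yml, but the three values shown are the ones mentioned above:

```python
import gym
import pybullet_envs  # noqa: F401
from stable_baselines3 import PPO

# Everything not listed here falls back to SB3's PPO defaults.
overrides = {
    "gamma": 0.95,       # same discount as Jason Peng
    "gae_lambda": 0.95,  # TD(lambda) parameter
    "n_steps": 4096,     # timesteps collected per env per rollout
}

env = gym.make("HumanoidDeepMimicWalkBulletEnv-v1")  # assumed env id
model = PPO("MlpPolicy", env, verbose=1, **overrides)
```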
-
Ah I see, no worries!
-
I was wondering whether I was doing something wrong in the training setup or when loading the model, but I figured there might be something wrong with the parameters, given that training would get stuck.
-
Yes, the method is quite brittle I'm afraid; some hyperparameters can send you to very bad local minima. Have you looked at other papers like Facebook's ScaDiver? The approach is the same, but the subreward aggregation and early termination strategies are different. Maybe it's more robust, but I haven't tested it yet.
-
I haven't read the paper, but I saw their repo and video; it seems very promising. I am trying to stick with DeepMimic because I don't want to change everything halfway :) Also, if I recall correctly, they use a different format for clips (3D joints instead of quaternions, maybe?), so I would have to adapt the tracking algorithm to that as well.
-
They use the more standard BVH format instead of the custom format used in DeepMimic, and they have code to convert it to character poses in reduced coordinates to supply to PyBullet.
-
Btw I couldn't help but notice that in
I don't think it would actually make a huge difference, but it seemed a bit odd.
-
That's something I'm not 100% sure about. DeepMimic's interaction loop is pretty non-standard and it's hard to tell when the rewards are calculated; I think it's with respect to the current state. IMO either one works as long as you make sure the reference pose you're comparing the state to is the right one (same time step).
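Just to illustrate the point about time alignment (this is a toy sketch, not DeepMimic's actual reward code; `ref_motion.sample` is a hypothetical helper):

```python
import numpy as np

def imitation_reward(sim_pose, ref_motion, t):
    """Toy pose reward: compare the simulated pose to the reference pose
    sampled at the *same* time step, whichever state convention you use."""
    ref_pose = ref_motion.sample(t)          # hypothetical lookup at time t
    pose_err = float(np.sum((sim_pose - ref_pose) ** 2))
    return float(np.exp(-2.0 * pose_err))    # DeepMimic-style exp(-k * error)
```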
-
I tried to train the character using the hyperparams given by @ManifoldFR in #3076.
However, after 60 million steps the character averages a reward of ~300-350, and when I test it the character walks by always moving the same foot and then dragging the other one.
Here are my training and enjoy scripts:
train
enjoy
In deep_mimic_env.py I modified the action space by using a FakeBox class that inherits from gym.spaces.Box.
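The actual FakeBox lives in the attached scripts, but roughly speaking it is a thin subclass of gym.spaces.Box; purely as an assumption about its shape, something like:

```python
import numpy as np
import gym


class FakeBox(gym.spaces.Box):
    """Hypothetical sketch: a Box subclass exposing finite bounds so that
    SB3 treats the DeepMimic action space as a standard continuous Box."""

    def __init__(self, low, high):
        super().__init__(
            low=np.asarray(low, dtype=np.float32),
            high=np.asarray(high, dtype=np.float32),
            dtype=np.float32,
        )
```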