CoM reward makes motion jittery #3700

tfederico · 2020-07-14T14:39:42Z

tfederico
Jul 14, 2020

I tried to run the motion file walk using the checkpoint humanoid3d_walk_COMenabled.ckpt, but the character walks in an odd way and the movements are jittery. You can see the result in this video.

I've also tried re-downloading the checkpoint and retraining the policy, but the results are the same. The total reward is around 600, but the return shown in the table is around 60 to 70.

Do you know what the issue might be? @ManifoldFR @erwincoumans

tfederico · 2020-07-14T14:45:10Z

tfederico
Jul 14, 2020
Author

This is the output of the last iteration:

total_reward= 591.6705727255253
total_reward= 243.67367030236088
total_reward= 616.797021415739
total_reward= 671.1451071319889
total_reward= 567.0324464533726
total_reward= 600.0086108514454
total_reward= 569.2327377122585
total_reward= 625.4447597898787
total_reward= 593.9930947095625
total_reward= 653.7059954912147
total_reward= 597.3132993126923
total_reward= 653.7461403359835
total_reward= 541.6954370380283
total_reward= 540.8598626358088
total_reward= 545.4463052148188
total_reward= 597.4143210801008
total_reward= 613.9794177757146
total_reward= 670.3452912501295
total_reward= 591.3753855060046
total_reward= 602.9625493302237
total_reward= 576.2369867090213
total_reward= 626.2160568181738
total_reward= 586.670532133944
total_reward= 660.4745956421806
total_reward= 336.8020764473559
total_reward= 603.2142162408127
total_reward= 547.6281677943383
total_reward= 569.104629862165
total_reward= 609.5559980021011
total_reward= 467.91968946873004
total_reward= 614.8123625621893
total_reward= 639.544688730203
total_reward= 558.1669304658501
total_reward= 500.94694794326466
total_reward= 606.1023105542579
Agent 0
-------------------------------------
|       Iteration |           14449 |
|       Wall_Time |            21.3 |
|         Samples |        63173017 |
|    Train_Return |              67 |
|     Test_Return |            71.8 |
|      State_Mean |        -0.00787 |
|       State_Std |            1.06 |
|       Goal_Mean |               0 |
|        Goal_Std |               0 |
|        Exp_Rate |            0.21 |
|       Exp_Noise |            0.05 |
|        Exp_Temp |         0.00106 |
|     Critic_Loss |         0.00201 |
| Critic_Stepsize |            0.01 |
|      Actor_Loss |            0.27 |
|  Actor_Stepsize |         2.5e-06 |
|       Clip_Frac |           0.199 |
|        Adv_Mean |       0.5106435 |
|         Adv_Std |      0.45592296 |
-------------------------------------

0 replies

erwincoumans · 2020-07-14T16:45:26Z

erwincoumans
Jul 14, 2020
Maintainer

Using CoM may require longer training. Let's disable CoM reward for now, until it is fixed.

0 replies

ManifoldFR · 2020-07-14T16:46:43Z

ManifoldFR
Jul 14, 2020

Yes, that's the behavior I'm getting: the table values are lower even though "total_reward" is around 600, and the sim character seems to follow the reference CoM for longer (as in, the character doesn't veer leftwards for some episodes).
I agree, let's set the default to False for now, until we understand what's going on.
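
To put numbers on the gap described above, the per-episode total_reward lines can be summarized and set beside the Train_Return and Test_Return rows of the table. The following is a minimal sketch, assuming the log format posted earlier in this thread; summarize_total_rewards is a hypothetical helper, not part of pybullet, and it does not explain why the two figures differ.

```python
import re
import statistics

# Matches lines such as "total_reward= 591.6705727255253".
_TOTAL_REWARD = re.compile(r"^total_reward=\s*([-+0-9.eE]+)\s*$", re.MULTILINE)


def summarize_total_rewards(log_text):
    """Return count, mean, min and max of the total_reward values in log_text."""
    values = [float(m) for m in _TOTAL_REWARD.findall(log_text)]
    if not values:
        raise ValueError("no total_reward lines found")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # First three lines of the output posted above, used as sample input.
    sample = (
        "total_reward= 591.6705727255253\n"
        "total_reward= 243.67367030236088\n"
        "total_reward= 616.797021415739\n"
    )
    print(summarize_total_rewards(sample))
```

Feeding it the full console output of an iteration gives a single mean to compare against the table row, which makes it easier to tell whether the two numbers differ by a constant factor or by something less regular.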

0 replies

erwincoumans · 2020-07-14T16:49:29Z

erwincoumans
Jul 14, 2020
Maintainer

Using

python testrl.py --arg_file run_humanoid3d_walk_args.txt

seems to walk just fine though, without jitter. Isn't the walk args policy using the CoM reward?

0 replies

ManifoldFR · 2020-07-14T16:57:32Z

ManifoldFR
Jul 14, 2020

That args file is loading the checkpoint of the policy trained without the CoM reward. If you override the --model_files argument of the script, you can use the right checkpoint:

python testrl.py --arg_file run_humanoid3d_walk_args.txt --model_files data/policies/humanoid3d/humanoid3d_walk_COMenabled.ckpt

0 replies

erwincoumans · 2020-07-14T17:34:10Z

erwincoumans
Jul 14, 2020
Maintainer

Thanks. How can you disable the CoM reward? Do we have to manually edit the file humanoid_stable_pd.py?

0 replies

ManifoldFR · 2020-07-14T18:47:29Z

ManifoldFR
Jul 14, 2020

The argument useComReward=False can be passed to the constructor of HumanoidStablePD (the default value is True). So it's either that or editing pybullet_deep_mimic_env.py, unfortunately.
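
Before relying on either approach, it can help to check what the constructor in the installed version actually accepts, since the default may change. The following is a minimal sketch using the standard-library inspect module; constructor_default is a hypothetical helper, and StandInHumanoid is a stand-in class written only for this example, not the real HumanoidStablePD signature.

```python
import inspect


def constructor_default(cls, name):
    """Return the default of keyword `name` in cls's constructor.

    Raises KeyError if the constructor has no such parameter and
    ValueError if the parameter has no default value.
    """
    params = inspect.signature(cls).parameters
    if name not in params:
        raise KeyError(f"{cls.__name__} has no parameter {name!r}")
    default = params[name].default
    if default is inspect.Parameter.empty:
        raise ValueError(f"{name!r} has no default")
    return default


class StandInHumanoid:
    """Hypothetical stand-in; the real class takes other arguments too."""

    def __init__(self, useComReward=True):
        self.useComReward = useComReward


if __name__ == "__main__":
    # Report the default the constructor would use if nothing is passed.
    print(constructor_default(StandInHumanoid, "useComReward"))
    # Passing the keyword explicitly overrides whatever the default is.
    print(StandInHumanoid(useComReward=False).useComReward)
```

With pybullet installed, the same helper can be pointed at the real class defined in humanoid_stable_pd.py to confirm the current default before retraining.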

0 replies
