"Vibrations" around the goal #1971
Replies: 4 comments
-
Sounds like this might just be the exploration noise injected into the controller during RL? That's how RL works, after all...
-
Sorry, perhaps I didn't explain my point clearly. I can understand the sphere moving around the goal during learning, since the agent has to explore the environment. But I can't understand why I see these "vibrations" when I run the trained agent (after 2 million training timesteps). In theory, the agent should learn that, once the target is reached, the null action is the best one.
-
Most RL frameworks let you switch to a deterministic policy (taking the action with maximum probability) once training is finished, since, depending on the configuration, the agent may never converge to a fully zero-variance policy on its own. Could you check whether such an option is available to you? Also, it's worth considering the magnitude of your delta mocap position: if it's too large, the action will overshoot the target, forcing the agent to adjust again. Lastly, what are the observations?
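For example, with Stable-Baselines3 this is just the `deterministic` flag of `predict()`. A minimal evaluation sketch, assuming a Gymnasium-style environment; the model path and the `env` variable are placeholders:

```python
from stable_baselines3 import PPO

model = PPO.load("ppo_reach")  # placeholder path to the trained model
obs, _ = env.reset()           # `env` is your evaluation environment
for _ in range(1_000):
    # deterministic=True uses the mean of the Gaussian policy instead of
    # sampling from it, so the exploration noise disappears at evaluation time
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```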
-
I'm using PPO from Stable-Baselines3, so I can switch to a deterministic policy when the policy predicts the action. However, the change has no effect (same problem, with the body moving back and forth at the maximum possible action). In "observation" I put the position and velocity of the body, in "desired_goal" the target position, and in "achieved_goal" the position of the body again, roughly as in the sketch below. (Initially, I thought the problem was caused by the difference in movement between the mocap and the body. Using solimp='0.998 0.999 0.0001 0.1 6' solref='0.0015 0.7' I created a nearly hard constraint between the mocap and the body to address this.)
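For concreteness, this is roughly how the observation dict is assembled (a simplified sketch using the MuJoCo Python bindings; the function and argument names are placeholders, not my exact code):

```python
import numpy as np

def get_obs(data, body_id, target_pos):
    """Goal-conditioned observation as described above (placeholder names)."""
    body_pos = data.xpos[body_id].copy()       # world position of the sphere body
    body_vel = data.cvel[body_id][3:].copy()   # linear part of the body velocity
    return {
        "observation": np.concatenate([body_pos, body_vel]),
        "desired_goal": np.asarray(target_pos, dtype=np.float64),
        "achieved_goal": body_pos,
    }
```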
-
Hi,
I'm a mechanical engineering student and I'm trying to use MuJoCo for Reinforcement Learning.
As a first attempt, I created a simple environment with a spherical body moved by a mocap body and a target site that the body has to reach.
To move the sphere exactly with the mocap, I created a weld constraint with solimp='0.998 0.999 0.0001 0.1 6' solref='0.0015 0.7' (close to a hard constraint). The RL reward is the negative of the distance between the sphere and the target, and the action is continuous between -0.04 and 0.04 (action = mocap delta position).
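For reference, a minimal sketch of this action/reward scheme (assuming the MuJoCo Python bindings; the function and argument names are placeholders, not my exact environment code):

```python
import numpy as np
import mujoco

ACTION_LIMIT = 0.04  # bound on the delta mocap position

def step_and_reward(model, data, action, sphere_body_id, target_pos):
    """Apply the action as a mocap displacement and return the -distance reward."""
    delta = np.clip(action, -ACTION_LIMIT, ACTION_LIMIT)
    data.mocap_pos[0] += delta        # move the mocap body by the commanded delta
    mujoco.mj_step(model, data)       # the welded sphere follows the mocap
    sphere_pos = data.xpos[sphere_body_id]
    return -float(np.linalg.norm(sphere_pos - np.asarray(target_pos)))
```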
Applying RL, I observe that the sphere reaches the target but then starts moving back and forth around it at the maximum action magnitude.
Is this a problem related to how the mocap works?