"Vibrations" around the goal #1971
Replies: 4 comments
-
Sounds like this might just be the exploration noise injected into the controller during RL? That's how RL works, after all...
-
Sorry, perhaps I didn't explain my point clearly. I can understand the sphere moving around the goal during learning, since the agent has to explore the environment. But I can't understand why I see these "vibrations" when I run the trained agent (after 2 million training timesteps). In theory, the agent should learn that, once the target is reached, the null action is the best one.
-
Most RL frameworks let you switch to a deterministic policy (taking the action with maximum probability) once training is finished, since, depending on the configuration, the agent may never converge to a fully zero-variance policy on its own. Could you check whether such an option is available to you? Also, it's worth considering the magnitude of your delta mocap position: if it's too large, the action will overshoot the target, forcing the agent to adjust again. Lastly, what are the observations?
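For example, with Stable-Baselines3 this is just the `deterministic` flag of `predict()`. A minimal evaluation sketch, assuming a Gymnasium-style environment; the model path and the `env` variable are placeholders:

```python
from stable_baselines3 import PPO

model = PPO.load("ppo_reach")  # placeholder path to the trained model
obs, _ = env.reset()           # `env` is your evaluation environment
for _ in range(1_000):
    # deterministic=True uses the mean of the Gaussian policy instead of
    # sampling from it, so the exploration noise disappears at evaluation time
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```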
-
I'm using PPO from Stable-Baselines3, so I can switch to a deterministic policy when the policy predicts the action. However, the change has no effect (same problem, with the body moving back and forth at the maximum possible action). In "observation" I put the position and velocity of the body, in "desired_goal" the target position, and in "achieved_goal" the position of the body again, roughly as in the sketch below. (Initially, I thought the problem was caused by the difference in movement between the mocap and the body. Using solimp='0.998 0.999 0.0001 0.1 6' solref='0.0015 0.7' I created a nearly hard constraint between the mocap and the body to address this.)
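For concreteness, this is roughly how the observation dict is assembled (a simplified sketch using the MuJoCo Python bindings; the function and argument names are placeholders, not my exact code):

```python
import numpy as np

def get_obs(data, body_id, target_pos):
    """Goal-conditioned observation as described above (placeholder names)."""
    body_pos = data.xpos[body_id].copy()       # world position of the sphere body
    body_vel = data.cvel[body_id][3:].copy()   # linear part of the body velocity
    return {
        "observation": np.concatenate([body_pos, body_vel]),
        "desired_goal": np.asarray(target_pos, dtype=np.float64),
        "achieved_goal": body_pos,
    }
```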
-
Hi,
I'm a mechanical engineering student and I'm trying to use MuJoCo for Reinforcement Learning.
As a first attempt, I created a simple environment with a spherical body moved by a mocap body and a target site that the body has to reach.
To move the sphere exactly with the mocap, I created a weld constraint with solimp='0.998 0.999 0.0001 0.1 6' solref='0.0015 0.7' (close to a hard constraint). The RL reward is the negative of the distance between the sphere and the target, and the action is continuous between -0.04 and 0.04 (action = mocap delta position).
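For reference, a minimal sketch of this action/reward scheme (assuming the MuJoCo Python bindings; the function and argument names are placeholders, not my exact environment code):

```python
import numpy as np
import mujoco

ACTION_LIMIT = 0.04  # bound on the delta mocap position

def step_and_reward(model, data, action, sphere_body_id, target_pos):
    """Apply the action as a mocap displacement and return the -distance reward."""
    delta = np.clip(action, -ACTION_LIMIT, ACTION_LIMIT)
    data.mocap_pos[0] += delta        # move the mocap body by the commanded delta
    mujoco.mj_step(model, data)       # the welded sphere follows the mocap
    sphere_pos = data.xpos[sphere_body_id]
    return -float(np.linalg.norm(sphere_pos - np.asarray(target_pos)))
```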
Applying RL, I observe that the sphere reaches the target but then starts moving back and forth around it at the maximum action magnitude.
Is this a problem related to how the mocap works?