[Question] Issues with Multi-Armed RL agent #1852
Unanswered
pmfaustino
asked this question in Q&A
Replies: 2 comments
-
Thanks for posting this. Great work! I'll move this post into our Discussions section for the team to follow up.
-
Hi, did you try penalizing the reward of the biased arm? I mean that both agents receive no reward, or are penalized, whenever the other agent fails to finish its subtasks.
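A minimal sketch of that idea, i.e. a shared "team" reward that only pays out when both arms complete their subtasks (all names and the penalty value below are illustrative, not from the original post):

```python
def shared_reward(left_done_subtask: bool, right_done_subtask: bool,
                  left_shaping: float, right_shaping: float,
                  failure_penalty: float = 1.0) -> float:
    """Hypothetical shared reward: both arms receive the same scalar.

    The per-arm shaping terms are only paid out when both arms finish
    their subtasks; if either arm fails, both are penalized, so the
    policy cannot profit from relying on its favored arm alone.
    """
    if left_done_subtask and right_done_subtask:
        return left_shaping + right_shaping
    return -failure_penalty
```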
-
Hi everyone,
I have successfully trained a direct RL agent for a single-arm robot (Franka Panda) to pick up a cube and lift it to a desired position. However, I’m encountering difficulties when attempting to apply the same approach to a multi-arm robot (ABB Yumi) for the same task.
At first I tried to use the closest arm to pick up the cube, but the agent developed a bias towards one arm. This led to situations where the cube could not be lifted because the closer arm simply ignored it while it was out of reach of the farther arm. I then explored conventional multi-armed bandit strategies such as epsilon-greedy and Thompson sampling for choosing which arm to use, but, despite adjusting various parameters, the agent still consistently favors one arm.
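For context, the arm-selection logic I experimented with was roughly along these lines (the bookkeeping below is a simplified epsilon-greedy sketch with illustrative names, not my exact code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical running statistics for the two arms (0 = left, 1 = right).
successes = np.zeros(2)
attempts = np.ones(2)  # start at 1 to avoid division by zero

def pick_arm(epsilon: float = 0.1) -> int:
    """Epsilon-greedy arm choice: explore uniformly with probability epsilon,
    otherwise exploit the arm with the higher empirical success rate."""
    if rng.random() < epsilon:
        return int(rng.integers(2))
    return int(np.argmax(successes / attempts))

def update(arm: int, succeeded: bool) -> None:
    """Update the chosen arm's statistics after an episode."""
    attempts[arm] += 1
    successes[arm] += float(succeeded)
```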
Here are the observations I’m using for training (see the sketch after this list):
Joint positions and velocities
Object position
Distance from the left and right grippers to the object
Goal position
Distance from the object to the goal
Actions taken
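A rough sketch of how these terms could be stacked into a single observation vector (function and argument names are illustrative, assuming NumPy arrays as inputs):

```python
import numpy as np

def build_observation(joint_pos, joint_vel, object_pos, left_grip_pos,
                      right_grip_pos, goal_pos, prev_actions):
    """Hypothetical flattening of the listed observation terms into one vector."""
    dist_left = np.linalg.norm(left_grip_pos - object_pos)
    dist_right = np.linalg.norm(right_grip_pos - object_pos)
    dist_goal = np.linalg.norm(object_pos - goal_pos)
    return np.concatenate([
        joint_pos, joint_vel,       # joint positions and velocities
        object_pos,                 # object position
        [dist_left, dist_right],    # gripper-to-object distances
        goal_pos,                   # goal position
        [dist_goal],                # object-to-goal distance
        prev_actions,               # previous actions
    ])
```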
The reward system I’ve designed includes the following components (a rough sketch of how they combine follows the list):
A distance reward, which is inversely proportional to the distance between the gripper and the object (the closer the gripper, the higher the reward).
A lift reward, which is granted when the object is lifted above a minimum threshold.
A goal reward, which is inversely proportional to the distance from the object to the goal (the closer the object is to the goal, the higher the reward).
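Roughly, the combination looks like the following sketch (the weights, lift threshold, and 1/(1+d) shaping used here are placeholders, not my exact formulation):

```python
def compute_reward(grip_to_obj: float, obj_height: float, obj_to_goal: float,
                   lift_threshold: float = 0.04,
                   w_dist: float = 1.0, w_lift: float = 1.0,
                   w_goal: float = 2.0) -> float:
    """Hypothetical composition of the three reward terms described above."""
    # Distance reward: larger when the gripper is closer to the object.
    r_dist = w_dist / (1.0 + grip_to_obj)
    # Lift reward: fixed bonus once the object clears a minimum height.
    r_lift = w_lift if obj_height > lift_threshold else 0.0
    # Goal reward: larger when the object is closer to the goal.
    r_goal = w_goal / (1.0 + obj_to_goal)
    return r_dist + r_lift + r_goal
```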
I’m reaching out to see if anyone has encountered a similar problem or can suggest a different approach that might help solve this issue. Any advice or insights on training multi-arm RL agents effectively would be greatly appreciated!
Thank you in advance for your help!