Why does playing back a trained policy inside the same RL training script not make the agent learn? #2205

celestialdr4g0n · 2025-04-01T12:46:42Z

celestialdr4g0n
Apr 1, 2025

I’m experimenting with loading a successfully trained policy in my environment and playing back its actions during training. However, I found that using the pre-trained policy to generate actions from the current training environment’s observations does not lead to any further learning by the agent.

The pseudocode looks something like this:

class ENV(DirectRLEnv):
  def __init__(self):
    self.l_policy = loadPolicy()
  def _pre_physics_step(self, actions):
    # the loaded policy is evaluated on the current training env observations
    self.actions = self.l_policy.runner.act(self._get_observations())
    self.target_actions = f(self.actions)
  def _apply_action(self):
    self.set_action()

Except for the actions, everything else is the same.
I am using skrl with PPO. My task is Franka Lift Cube.

RandomOakForest · 2025-04-04T16:32:49Z

RandomOakForest
Apr 4, 2025
Maintainer

Thanks for posting this. Which scripts are you using for training and for playback? Playback may just be running inference. Alternatively, the weights may stay unaltered (frozen) after training unless you resume training from a checkpoint. It would help if you could share some code, or show how you are calling your training and inference runs.
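
To illustrate the distinction between inference and resuming from a checkpoint, here is a minimal sketch in plain Python. It does not use skrl or Isaac Lab; `TinyPolicy` and all of its methods are hypothetical names invented for this example. It only shows that calling `act()` on a loaded checkpoint leaves the weights untouched, while loading the same checkpoint and continuing to call an update step keeps changing them.

```python
import copy


class TinyPolicy:
    """Hypothetical one-parameter policy: action = weight * observation."""

    def __init__(self, weight=0.0):
        self.weight = weight

    def act(self, obs):
        # Inference only: reads the weight, never modifies it.
        return self.weight * obs

    def update(self, obs, target, lr=0.1):
        # One gradient-descent step on the squared error (act(obs) - target)^2.
        error = self.act(obs) - target
        self.weight -= lr * 2.0 * error * obs

    def state_dict(self):
        return {"weight": self.weight}

    def load_state_dict(self, state):
        self.weight = state["weight"]


# "Training": a few update steps, then save a checkpoint.
policy = TinyPolicy()
for _ in range(5):
    policy.update(obs=1.0, target=2.0)
checkpoint = copy.deepcopy(policy.state_dict())

# "Play": load the checkpoint and only call act(); the weight stays frozen.
player = TinyPolicy()
player.load_state_dict(checkpoint)
for _ in range(100):
    player.act(1.0)
weight_after_play = player.weight

# "Resume": load the same checkpoint and keep updating; the weight moves again.
resumed = TinyPolicy()
resumed.load_state_dict(checkpoint)
for _ in range(5):
    resumed.update(obs=1.0, target=2.0)
weight_after_resume = resumed.weight
```

In this toy setup, `weight_after_play` equals the checkpointed weight, while `weight_after_resume` has moved closer to the target. Whether the same split applies to a given training script depends on how the checkpoint is loaded and whether the optimizer is ever stepped on the loaded model.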

celestialdr4g0n · 2025-04-08T03:02:55Z

celestialdr4g0n
Apr 8, 2025
Author

Thank you for your answer. I use the functions from the play.py script to load the trained policy, and I train the agent with the train.py script. Then I use play.py to observe the training result.
torch.load() does not work out of the box, so I have to use functions from the skrl library.

train.py

# Import paths below assume Isaac Lab 2.x; adjust them for your version.
import torch
from isaaclab.envs import DirectRLEnv
from isaaclab_rl.skrl import SkrlVecEnvWrapper
from isaaclab_tasks.utils import load_cfg_from_registry
from skrl.utils.runner.torch import Runner

class ENV(DirectRLEnv):
    def __init__(self):
        self.task_name = "Franka-Lift-direct-v0-rev"
        self.trained_pth = "path_to_check_point"
        exp_cfg = load_cfg_from_registry(self.task_name, "skrl_cfg_entry_point")
        exp_cfg["trainer"]["close_environment_at_exit"] = False
        exp_cfg["agent"]["experiment"]["write_interval"] = 0  # don't log to TensorBoard
        exp_cfg["agent"]["experiment"]["checkpoint_interval"] = 0  # don't generate checkpoints
        dummy_env = SkrlVecEnvWrapper(self, ml_framework="torch")  # the obs dim must be the same
        self.trained_runner = Runner(dummy_env, exp_cfg)
        self.trained_runner.agent.load(self.trained_pth)
        self.trained_runner.agent.set_running_mode("eval")

    def _pre_physics_step(self, actions):
        # actions from the agent being trained are stored here;
        # the code shown below does not use them when setting the joint targets
        self.actions = actions.clone().clamp(-1.0, 1.0)
        # query the loaded (playback) policy on the current training env observations
        r_actions = self.trained_runner.agent.act(self._get_observations()["policy"], timestep=0, timesteps=0)
        r_actions_mean = r_actions[-1].get("mean_actions", r_actions[0])
        # catch reset envs and apply reset actions
        reset_mask = self.episode_length_buf == 0
        r_actions_mean[reset_mask, :] = self.robot.data.default_joint_pos[reset_mask, :]
        dof_targets = r_actions_mean.clone()
        # map the finger commands to open/close translations
        self.dof_targets[:, self.left_finger_joint_idx] = torch.where(
            dof_targets[:, self.left_finger_joint_idx] > 0,
            self.open_translation,
            self.close_translation,
        )
        self.dof_targets[:, self.right_finger_joint_idx] = torch.where(
            dof_targets[:, self.right_finger_joint_idx] > 0,
            self.open_translation,
            self.close_translation,
        )

    def _apply_action(self):
        self.robot.set_joint_position_target(self.dof_targets)
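
One possible reading of the code above is that the environment executes the playback policy's actions and not the ones sampled by the agent being trained, so the reward never depends on what the learner does. The sketch below illustrates that effect on a toy two-armed bandit in plain Python. It is not Isaac Lab, skrl, or PPO: it uses a simple REINFORCE update with a mean baseline as a stand-in, and `train`, `override_actions`, and the bandit itself are hypothetical names made up for this example. Whether this is what actually happens in the setup above is an assumption.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(override_actions, steps=200, batch=64, lr=0.5, seed=0):
    """REINFORCE with a mean baseline on a two-armed bandit.

    theta is the logit of choosing arm 1. Arm 1 pays 1.0, arm 0 pays 0.0.
    If override_actions is True, the environment ignores the learner's
    sampled action and always executes a fixed "playback" action (arm 1).
    """
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        p = sigmoid(theta)
        # Actions sampled by the learner's own policy.
        actions = [1 if rng.random() < p else 0 for _ in range(batch)]
        # Actions the environment actually executes.
        executed = [1] * batch if override_actions else actions
        rewards = [float(a) for a in executed]
        baseline = sum(rewards) / batch
        # For a Bernoulli policy, d/dtheta log pi(a) = a - p.
        grad = sum((a - p) * (r - baseline) for a, r in zip(actions, rewards)) / batch
        theta += lr * grad
    return theta


theta_normal = train(override_actions=False)
theta_overridden = train(override_actions=True)
```

In this toy case, `theta_normal` grows because the reward varies with the learner's action, while `theta_overridden` stays at its initial value: every sample gets the same reward, so the advantage is zero and the update has nothing to push on. The learner collects high reward either way, which can make the run look successful even though its parameters never change.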
