[Question] How to get value estimates in play.py script for rl_games? #3029
Replies: 1 comment
-
Thanks for posting this. In rl_games, the typical interface for action selection is:

actions = agent.get_action(obs, is_deterministic=agent.is_deterministic)

However, rl_games does not expose a matching method for retrieving value estimates.

How to Get Value Estimates from a Trained rl_games Agent

1. Understanding the Architecture
rl_games PPO uses an actor-critic model: the same network that produces the action distribution also carries the value head, so the value estimate is available through the model object the agent wrapper holds.
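If you want to confirm this on your own checkpoint, here is a minimal inspection sketch; it relies only on the agent.model and agent.model.a2c_network attributes used in the snippets below, which you should verify exist in your rl_games version:

# Inspect the model wrapper held by the rl_games player and the underlying
# actor-critic network. The attribute names are assumptions taken from the
# snippets below, not guaranteed across rl_games versions.
print(type(agent.model))          # the model wrapper the player stores
print(agent.model.a2c_network)    # the torch.nn.Module containing the actor and critic heads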
2. Accessing the Value Function Manually
If you need value estimates, you can obtain them by calling the appropriate function of the model directly. In rl_games, the agent wrapper usually stores the PyTorch policy network (typically reachable as agent.model). Suppose your policy is loaded and you have an observation tensor that is properly normalized and placed on the correct device:

with torch.no_grad():
    # Get both action and value estimate
    action, value = agent.model.forward(obs)
    # action is your policy output, value is the state value estimate
Alternatively, the underlying a2c_network may expose a value accessor directly:

with torch.no_grad():
    value = agent.model.a2c_network.get_value(obs)
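Be aware that in many rl_games versions the model wrapper's forward expects an input dictionary rather than a raw observation tensor, and the exact attribute names vary between releases. The following is a minimal sketch under that assumption; agent._preproc_obs and agent.states follow the rl_games BasePlayer conventions and should be checked against your installed version:

import torch

# Sketch assuming the standard rl_games PPO player layout: agent.model accepts
# an input dict and returns a result dict that includes per-env value estimates.
obs_proc = agent._preproc_obs(obs)      # same observation preprocessing used by get_action
input_dict = {
    'is_train': False,
    'prev_actions': None,
    'obs': obs_proc,
    'rnn_states': agent.states,         # None for feed-forward policies
}
with torch.no_grad():
    res_dict = agent.model(input_dict)
values = res_dict['values']             # state-value estimates, shape [num_envs, 1]

If value normalization was enabled during training, the model wrapper usually denormalizes the returned values in inference mode, but it is worth sanity-checking the scale against your reward magnitudes.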
3. Example with Isaac Lab Integration
When you're using Isaac Lab or a custom vectorized environment:
import torch

obs = ...  # Your (batched) observation tensor
with torch.no_grad():
    # PPO's a2c_network usually provides get_value
    value = agent.model.a2c_network.get_value(obs)  # shape: [num_envs, 1]

4. Important Notes
- Pass the model the same normalized observations the agent uses internally; agent.get_action typically applies this preprocessing for you, while the manual calls above may not.
- Keep the observation tensor on the same device as the model.
- If value normalization was enabled during training, check whether your rl_games version returns already-denormalized values.
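To tie this back to the play script, here is a hypothetical rollout sketch that logs value estimates alongside the actions. The env and agent objects, the helper name estimate_values, and the step-unpacking are illustrative assumptions layered on the input-dict interface sketched above, not part of the official play.py:

import torch

def estimate_values(agent, obs):
    # Hypothetical helper wrapping the manual value query from section 2;
    # assumes agent.model accepts an input dict and returns a 'values' entry.
    input_dict = {
        'is_train': False,
        'prev_actions': None,
        'obs': agent._preproc_obs(obs),
        'rnn_states': agent.states,
    }
    with torch.no_grad():
        return agent.model(input_dict)['values']

value_log = []
obs = env.reset()
for _ in range(1000):
    actions = agent.get_action(obs, is_deterministic=agent.is_deterministic)
    value_log.append(estimate_values(agent, obs).cpu())   # log V(s) for later analysis
    obs, rewards, dones, infos = env.step(actions)        # adjust unpacking to your env API

values = torch.cat(value_log)   # shape: [num_steps * num_envs, 1]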
I'll move this post to our Discussions for follow-up.
-
Question
Hello, I have trained a task using rl_games. I can see that the actions are obtained with "actions = agent.get_action(obs, is_deterministic=agent.is_deterministic)". How do I get value estimates? I need them for further analysis. I don't see a similar method for getting values.