Replies: 1 comment
Please refer to #458 for more discussions.
I tried to use LightZero with my own custom gym env via a wrapper, as described. My understanding is that I should then start from e.g. the CartPole config, which I did. But upon collecting samples it runs into an error in _compute_target_reward_value:
140 # target reward, target value
--> 141 batch_rewards, batch_target_values = self._compute_target_reward_value(
142 reward_value_context, policy._target_model
143 )
144 with self._compute_target_timer:
145 batch_target_policies_re = self._compute_target_policy_reanalyzed(policy_re_context, policy._target_model)
File ~/miniforge3/envs/rl/lib/python3.10/site-packages/lzero/mcts/buffer/game_buffer_muzero.py:525, in MuZeroGameBuffer._compute_target_reward_value(self, reward_value_context, model)
522 batch_rewards.append(target_rewards)
523 batch_target_values.append(target_values)
--> 525 batch_rewards = np.asarray(batch_rewards)
526 batch_target_values = np.asarray(batch_target_values)
528 return batch_rewards, batch_target_values
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (256, 6) + inhomogeneous part.
I suspect it happens near the episode end, where it cannot collect the full number of samples. My episodes are roughly 90 steps long, sometimes shorter.
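For what it's worth, the numpy side of the error is easy to reproduce in isolation: if even one of the 256 per-sample target lists contains an element whose shape differs from the rest (my guess for what happens with samples close to the episode end; I have not verified the actual buffer contents), np.asarray raises exactly this ValueError. A minimal sketch with made-up shapes:

import numpy as np

# 255 "normal" samples with 6 scalar targets each -> expected shape (256, 6)
ok_samples = [[0.0] * 6 for _ in range(255)]
# one hypothetical sample where a target is an array instead of a scalar,
# just to show the mechanism; the real buffer contents may differ
odd_sample = [[0.0] * 5 + [np.zeros(2)]]

batch_target_values = ok_samples + odd_sample
np.asarray(batch_target_values)
# ValueError: setting an array element with a sequence. The requested array has an
# inhomogeneous shape after 2 dimensions. The detected shape was (256, 6) + inhomogeneous part.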
I tried many different configs; this is the latest one. Any ideas on how to make this work for a general env rather than a board-game setup?
import copy
from easydict import EasyDict
from lzero.entry import train_muzero_with_gym_env
from zoo.classic_control.cartpole.config.cartpole_muzero_config import (
main_config as cartpole_main_config,
create_config as cartpole_create_config,
)
# -------------------------------------------------------------------------
# 1) Start from the CartPole MuZero config
# -------------------------------------------------------------------------
soccer_main_config = copy.deepcopy(cartpole_main_config)
soccer_create_config = copy.deepcopy(cartpole_create_config)
# -------------------------------------------------------------------------
# 2) Env section: point to your Gym env
# -------------------------------------------------------------------------
soccer_main_config.env.env_id = "SoccerGym-v0"
soccer_main_config.env.continuous = False
soccer_main_config.env.manually_discretization = False
# parallel envs – tweak if you want
soccer_main_config.env.collector_env_num = 4
soccer_main_config.env.evaluator_env_num = 2
soccer_main_config.env.n_evaluator_episode = 4
soccer_main_config.env.stop_value = 1e9 # "never stop early"
# -------------------------------------------------------------------------
# 3) Model: observation / action sizes
#    IMPORTANT: observation_shape is a SCALAR for vector obs
# -------------------------------------------------------------------------
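The paste cuts off here; for completeness, the model section and the entry-point call look roughly like this. obs_dim and num_actions are placeholders in this sketch (the real values match my env's observation and action spaces), and I am assuming train_muzero_with_gym_env takes the same [main_config, create_config] calling convention as train_muzero:

obs_dim = 12        # placeholder: length of the flat observation vector
num_actions = 5     # placeholder: size of the discrete action space

soccer_main_config.policy.model.observation_shape = obs_dim      # scalar for vector obs
soccer_main_config.policy.model.action_space_size = num_actions
soccer_main_config.policy.model.model_type = 'mlp'               # vector obs -> MLP model

if __name__ == "__main__":
    train_muzero_with_gym_env(
        [soccer_main_config, soccer_create_config],
        seed=0,
        max_env_step=int(1e6),  # arbitrary training budget
    )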