Replies: 1 comment
Please refer to #458 for more discussions.
I tried to use LightZero with my own custom gym env via a wrapper, as described. My understanding is that I should then start from e.g. the CartPole config, which I did. But upon collecting samples it runs into an error in _compute_target_reward_value:
140 # target reward, target value
--> 141 batch_rewards, batch_target_values = self._compute_target_reward_value(
142 reward_value_context, policy._target_model
143 )
144 with self._compute_target_timer:
145 batch_target_policies_re = self._compute_target_policy_reanalyzed(policy_re_context, policy._target_model)
File ~/miniforge3/envs/rl/lib/python3.10/site-packages/lzero/mcts/buffer/game_buffer_muzero.py:525, in MuZeroGameBuffer._compute_target_reward_value(self, reward_value_context, model)
522 batch_rewards.append(target_rewards)
523 batch_target_values.append(target_values)
--> 525 batch_rewards = np.asarray(batch_rewards)
526 batch_target_values = np.asarray(batch_target_values)
528 return batch_rewards, batch_target_values
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (256, 6) + inhomogeneous part.
I suspect it happens near the episode end, where it cannot collect the full number of samples. My episodes are roughly 90 steps long, sometimes shorter.
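For what it's worth, the numpy side of the error is easy to reproduce in isolation: if even one of the 256 per-sample target lists contains an element whose shape differs from the rest (my guess for what happens with samples close to the episode end; I have not verified the actual buffer contents), np.asarray raises exactly this ValueError. A minimal sketch with made-up shapes:

import numpy as np

# 255 "normal" samples with 6 scalar targets each -> expected shape (256, 6)
ok_samples = [[0.0] * 6 for _ in range(255)]
# one hypothetical sample where a target is an array instead of a scalar,
# just to show the mechanism; the real buffer contents may differ
odd_sample = [[0.0] * 5 + [np.zeros(2)]]

batch_target_values = ok_samples + odd_sample
np.asarray(batch_target_values)
# ValueError: setting an array element with a sequence. The requested array has an
# inhomogeneous shape after 2 dimensions. The detected shape was (256, 6) + inhomogeneous part.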
I tried many different configs; this is the latest one. Any ideas on how to make this work for a general env rather than a board-game setup?
import copy
from easydict import EasyDict
from lzero.entry import train_muzero_with_gym_env
from zoo.classic_control.cartpole.config.cartpole_muzero_config import (
main_config as cartpole_main_config,
create_config as cartpole_create_config,
)
# -------------------------------------------------------------------------
# 1) Start from the CartPole MuZero config
# -------------------------------------------------------------------------
soccer_main_config = copy.deepcopy(cartpole_main_config)
soccer_create_config = copy.deepcopy(cartpole_create_config)
# -------------------------------------------------------------------------
# 2) Env section: point to your Gym env
# -------------------------------------------------------------------------
soccer_main_config.env.env_id = "SoccerGym-v0"
soccer_main_config.env.continuous = False
soccer_main_config.env.manually_discretization = False
# parallel envs – tweak if you want
soccer_main_config.env.collector_env_num = 4
soccer_main_config.env.evaluator_env_num = 2
soccer_main_config.env.n_evaluator_episode = 4
soccer_main_config.env.stop_value = 1e9 # "never stop early"
# -------------------------------------------------------------------------
# 3) Model: observation / action sizes
#    IMPORTANT: observation_shape is a SCALAR for vector obs
# -------------------------------------------------------------------------
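The paste cuts off here; for completeness, the model section and the entry-point call look roughly like this. obs_dim and num_actions are placeholders in this sketch (the real values match my env's observation and action spaces), and I am assuming train_muzero_with_gym_env takes the same [main_config, create_config] calling convention as train_muzero:

obs_dim = 12        # placeholder: length of the flat observation vector
num_actions = 5     # placeholder: size of the discrete action space

soccer_main_config.policy.model.observation_shape = obs_dim      # scalar for vector obs
soccer_main_config.policy.model.action_space_size = num_actions
soccer_main_config.policy.model.model_type = 'mlp'               # vector obs -> MLP model

if __name__ == "__main__":
    train_muzero_with_gym_env(
        [soccer_main_config, soccer_create_config],
        seed=0,
        max_env_step=int(1e6),  # arbitrary training budget
    )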