Description
Required prerequisites
- I have read the documentation https://omnisafe.readthedocs.io.
- I have searched the Issue Tracker and Discussions that this hasn't already been reported. (+1 or comment there if it has.)
- Consider asking first in a Discussion.
Questions
Thanks for your great work!
I managed to integrate my environment following your example file train_from_custom_env.py. However, I failed to use a vectorized environment to speed up data collection.
First, I tried changing the parameter vector_env_nums directly in the config files, but it reported:
Processing rollout for epoch: 0... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--
Traceback (most recent call last):
File "/home/southriver/omnisafe/examples/train_from_custom_env.py", line 160, in <module>
agent.learn()
File "/home/southriver/omnisafe/omnisafe/algorithms/algo_wrapper.py", line 180, in learn
ep_ret, ep_cost, ep_len = self.agent.learn()
File "/home/southriver/omnisafe/omnisafe/algorithms/on_policy/base/policy_gradient.py", line 259, in learn
self._env.rollout(
File "/home/southriver/omnisafe/omnisafe/adapter/onpolicy_adapter.py", line 94, in rollout
buffer.store(
File "/home/southriver/omnisafe/omnisafe/common/buffer/vector_onpolicy_buffer.py", line 99, in store
buffer.store(**{k: v[i] for k, v in data.items()})
File "/home/southriver/omnisafe/omnisafe/common/buffer/vector_onpolicy_buffer.py", line 99, in <dictcomp>
buffer.store(**{k: v[i] for k, v in data.items()})
IndexError: index 1 is out of bounds for dimension 0 with size 1
In this scenario, my step function was implemented to run only a single environment's forward simulation, so it returns obs of shape [obs_space_size], reward of shape [1], and cost of shape [1].
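If I read the buffer code correctly, the vectorized buffer indexes one slot per environment along dimension 0, so single-env return shapes trigger exactly this error as soon as vector_env_nums > 1. A minimal sketch of the mismatch (shapes taken from the traceback above; vector_env_nums=2 is an assumed example value):

```python
import torch

# A single-env step returns reward of shape [1]: no environment dimension.
reward = torch.tensor([0.5])

# The vector buffer does roughly `v[i]` for each env index i:
for i in range(2):  # pretend vector_env_nums == 2
    try:
        slot = reward[i]
        print(f"env {i}: stored {slot.item()}")
    except IndexError as exc:
        print(f"env {i}: {exc}")  # env 1 is out of bounds for a size-1 dim
```

So a size-1 leading dimension can only feed one buffer slot; the second env index has nothing to read.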
Then I tried to implement step to collect batched forward simulations, making the return elements satisfy obs[num_env, obs_space_size], reward[num_env, 1], and cost[num_env, 1], but this also failed:
Processing rollout for epoch: 0... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--
Traceback (most recent call last):
File "/home/southriver/omnisafe/examples/train_from_custom_env.py", line 182, in <module>
agent.learn()
File "/home/southriver/omnisafe/omnisafe/algorithms/algo_wrapper.py", line 180, in learn
ep_ret, ep_cost, ep_len = self.agent.learn()
File "/home/southriver/omnisafe/omnisafe/algorithms/on_policy/base/policy_gradient.py", line 259, in learn
self._env.rollout(
File "/home/southriver/omnisafe/omnisafe/adapter/onpolicy_adapter.py", line 88, in rollout
self._log_value(reward=reward, cost=cost, info=info)
File "/home/southriver/omnisafe/omnisafe/adapter/onpolicy_adapter.py", line 155, in _log_value
self._ep_ret += info.get('original_reward', reward).cpu()
RuntimeError: output with shape [1] doesn't match the broadcast shape [1, 4]
I also tried to reshape the reward and cost into [1], but that also failed:
Traceback (most recent call last):
File "/home/southriver/omnisafe/examples/train_from_custom_env.py", line 186, in <module>
agent.learn()
File "/home/southriver/omnisafe/omnisafe/algorithms/algo_wrapper.py", line 180, in learn
ep_ret, ep_cost, ep_len = self.agent.learn()
File "/home/southriver/omnisafe/omnisafe/algorithms/on_policy/base/policy_gradient.py", line 259, in learn
self._env.rollout(
File "/home/southriver/omnisafe/omnisafe/adapter/onpolicy_adapter.py", line 94, in rollout
buffer.store(
File "/home/southriver/omnisafe/omnisafe/common/buffer/vector_onpolicy_buffer.py", line 99, in store
buffer.store(**{k: v[i] for k, v in data.items()})
File "/home/southriver/omnisafe/omnisafe/common/buffer/onpolicy_buffer.py", line 145, in store
self.data[key][self.ptr] = value
RuntimeError: expand(torch.cuda.FloatTensor{[4, 185]}, size=[185]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)
This is my original implementation of the forward simulation:
def step(
    self,
    action: torch.Tensor,
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, dict]:
    self._count += 1
    # obs = torch.as_tensor(self._observation_space.sample())
    # reward = 2 * torch.as_tensor(random.random())  # noqa
    # cost = 2 * torch.as_tensor(random.random())  # noqa
    # terminated = torch.as_tensor(random.random() > 0.9)  # noqa
    # prepare action; a_in is the real action input to the simulator
    a_in = [
        (action[0] + 1) / 4 * 3,
        action[1],
    ]
    # forward simulation
    latest_scan, distance, cos, sin, collision, goal, a, reward, cost = self.sim.step(
        lin_velocity=a_in[0].item(), ang_velocity=a_in[1].item()
    )
    # prepare observation
    latest_scan = np.array(latest_scan)
    inf_mask = np.isinf(latest_scan)
    latest_scan[inf_mask] = 7.0  # max range
    max_bins = 180
    bin_size = int(np.ceil(len(latest_scan) / max_bins))
    min_values = []
    for i in range(0, len(latest_scan), bin_size):
        # get the current bin
        bin = latest_scan[i : i + min(bin_size, len(latest_scan) - i)]
        # find the minimum value in the current bin and append it to min_values
        min_values.append(min(bin) / 7)
    distance /= 100  # normalize to stay within [0, 1]
    lin_vel = (action[0] + 1) / 2  # action is in [-1, 1]; map to [0, 1]
    ang_vel = (action[1] + 1) / 2
    state = min_values + [distance, cos, sin, lin_vel, ang_vel]
    # process data types
    obs = torch.as_tensor(state, dtype=torch.float32).to(self.device)
    reward = torch.as_tensor(reward, dtype=torch.float32).to(self.device)
    cost = torch.as_tensor(cost, dtype=torch.float32).to(self.device)
    terminated = torch.as_tensor(goal, dtype=torch.float32).to(self.device)
    truncated = torch.as_tensor(self._count > self.max_episode_steps, dtype=torch.float32).to(self.device)
    return obs, reward, cost, terminated, truncated, {'final_observation': obs, 'cost': cost}
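As a side note, the per-bin-minimum loop above can be vectorized with NumPy, which should help once the step function is batched. A sketch that is intended to be behavior-equivalent for the shapes in my code (7.0 max range and 180 bins as above; the function name bin_scan is mine):

```python
import numpy as np

def bin_scan(latest_scan, max_bins: int = 180, max_range: float = 7.0) -> np.ndarray:
    """Per-bin minimum of a 1-D lidar scan, normalized by max_range."""
    scan = np.asarray(latest_scan, dtype=np.float64).copy()
    scan[np.isinf(scan)] = max_range  # clamp out-of-range readings
    bin_size = int(np.ceil(len(scan) / max_bins))
    # pad the tail with +inf so padding never wins a per-bin minimum
    pad = (-len(scan)) % bin_size
    padded = np.pad(scan, (0, pad), constant_values=np.inf)
    return padded.reshape(-1, bin_size).min(axis=1) / max_range

scan = np.full(360, 3.5)
print(bin_scan(scan).shape)  # (360 readings, bin_size 2) -> 180 bins
```

For a batched step this extends naturally by reshaping to (num_env, -1, bin_size) and taking the minimum over the last axis.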