
[Question] How to vectorize customized environment? #380

@South-River

Description


Required prerequisites

Questions

Thanks for your great work!
I managed to integrate my environment with your example file train_from_custom_env.py. However, I failed to use the vectorized environment to speed up data collection.
First, I tried directly changing the parameter vector_env_nums in the config files, but it reported:

Processing rollout for epoch: 0... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--
Traceback (most recent call last):
  File "/home/southriver/omnisafe/examples/train_from_custom_env.py", line 160, in <module>
    agent.learn()
  File "/home/southriver/omnisafe/omnisafe/algorithms/algo_wrapper.py", line 180, in learn
    ep_ret, ep_cost, ep_len = self.agent.learn()
  File "/home/southriver/omnisafe/omnisafe/algorithms/on_policy/base/policy_gradient.py", line 259, in learn
    self._env.rollout(
  File "/home/southriver/omnisafe/omnisafe/adapter/onpolicy_adapter.py", line 94, in rollout
    buffer.store(
  File "/home/southriver/omnisafe/omnisafe/common/buffer/vector_onpolicy_buffer.py", line 99, in store
    buffer.store(**{k: v[i] for k, v in data.items()})
  File "/home/southriver/omnisafe/omnisafe/common/buffer/vector_onpolicy_buffer.py", line 99, in <dictcomp>
    buffer.store(**{k: v[i] for k, v in data.items()})
IndexError: index 1 is out of bounds for dimension 0 with size 1

In this scenario, the step function was implemented to advance only a single environment per call, so the returned shapes were obs[obs_space_size], reward[1], and cost[1].
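For reference, this first failure can be reproduced outside OmniSafe: the vector buffer indexes dimension 0 once per environment, and a single-environment tensor has nothing at index 1.

```python
import torch

# Single-environment shaped reward: only one entry along dim 0.
reward = torch.zeros(1)

# The vector buffer does v[i] for each sub-buffer i; with
# vector_env_nums > 1 it eventually asks for index 1, which a
# single-environment tensor does not have.
try:
    reward[1]
except IndexError as exc:
    print(exc)  # index 1 is out of bounds for dimension 0 with size 1
```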

Then I tried to implement step to run a batched forward simulation, making the returned elements satisfy obs[num_env, obs_space_size], reward[num_env, 1], and cost[num_env, 1], but this also failed:

Processing rollout for epoch: 0... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--
Traceback (most recent call last):
  File "/home/southriver/omnisafe/examples/train_from_custom_env.py", line 182, in <module>
    agent.learn()
  File "/home/southriver/omnisafe/omnisafe/algorithms/algo_wrapper.py", line 180, in learn
    ep_ret, ep_cost, ep_len = self.agent.learn()
  File "/home/southriver/omnisafe/omnisafe/algorithms/on_policy/base/policy_gradient.py", line 259, in learn
    self._env.rollout(
  File "/home/southriver/omnisafe/omnisafe/adapter/onpolicy_adapter.py", line 88, in rollout
    self._log_value(reward=reward, cost=cost, info=info)
  File "/home/southriver/omnisafe/omnisafe/adapter/onpolicy_adapter.py", line 155, in _log_value
    self._ep_ret += info.get('original_reward', reward).cpu()
RuntimeError: output with shape [1] doesn't match the broadcast shape [1, 4]
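This second failure can also be reproduced in isolation: an in-place `+=` on a 1-D episode-return tensor cannot absorb a reward that carries an extra batch dimension.

```python
import torch

# _ep_ret is tracked as a 1-D tensor (one entry per environment).
ep_ret = torch.zeros(1)

# A reward shaped [1, num_envs] broadcasts to 2-D, which an
# in-place += on a 1-D tensor cannot write back into.
reward = torch.zeros(1, 4)
try:
    ep_ret += reward
except RuntimeError as exc:
    print(exc)  # output with shape [1] doesn't match the broadcast shape [1, 4]
```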

I also tried reshaping the reward and cost into [1], but that failed as well:

Traceback (most recent call last):
  File "/home/southriver/omnisafe/examples/train_from_custom_env.py", line 186, in <module>
    agent.learn()
  File "/home/southriver/omnisafe/omnisafe/algorithms/algo_wrapper.py", line 180, in learn
    ep_ret, ep_cost, ep_len = self.agent.learn()
  File "/home/southriver/omnisafe/omnisafe/algorithms/on_policy/base/policy_gradient.py", line 259, in learn
    self._env.rollout(
  File "/home/southriver/omnisafe/omnisafe/adapter/onpolicy_adapter.py", line 94, in rollout
    buffer.store(
  File "/home/southriver/omnisafe/omnisafe/common/buffer/vector_onpolicy_buffer.py", line 99, in store
    buffer.store(**{k: v[i] for k, v in data.items()})
  File "/home/southriver/omnisafe/omnisafe/common/buffer/onpolicy_buffer.py", line 145, in store
    self.data[key][self.ptr] = value
RuntimeError: expand(torch.cuda.FloatTensor{[4, 185]}, size=[185]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)
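Putting the three errors together, the shape contract the vector buffer appears to expect (my inference from its `v[i]` indexing, not a documented API) is obs of shape [num_envs, obs_dim] and 1-D reward/cost/terminated/truncated of shape [num_envs], with no trailing singleton dimension, so that each slice yields one environment's transition:

```python
import torch

num_envs, obs_dim = 4, 185

# Inferred per-step shapes for a vectorized env (an assumption
# drawn from the buffer's v[i] indexing, not a documented API):
obs = torch.zeros(num_envs, obs_dim)   # [num_envs, obs_dim]
reward = torch.zeros(num_envs)         # [num_envs], not [num_envs, 1]
cost = torch.zeros(num_envs)

# Each v[i] slice then matches one sub-buffer's per-step shape:
print(obs[1].shape)     # torch.Size([185])
print(reward[1].shape)  # torch.Size([]) -- a scalar per environment
```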

This is my original single-environment implementation of the forward simulation:

    def step(
        self,
        action: torch.Tensor,
    ) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, dict]:
        self._count += 1

        # Prepare the action; a_in is the real action input to the simulator.
        a_in = [
            (action[0] + 1) / 4 * 3,
            action[1],
        ]

        # Forward simulation.
        latest_scan, distance, cos, sin, collision, goal, a, reward, cost = self.sim.step(
            lin_velocity=a_in[0].item(), ang_velocity=a_in[1].item()
        )

        # Prepare the observation: clamp infinite ranges to the max range.
        latest_scan = np.array(latest_scan)
        latest_scan[np.isinf(latest_scan)] = 7.0  # max range

        # Downsample the scan by min-pooling into at most 180 bins,
        # normalized by the max range.
        max_bins = 180
        bin_size = int(np.ceil(len(latest_scan) / max_bins))
        min_values = [
            min(latest_scan[i : i + bin_size]) / 7
            for i in range(0, len(latest_scan), bin_size)
        ]

        distance /= 100  # normalize to stay within [0, 1]
        lin_vel = (action[0] + 1) / 2  # action is in [-1, 1]; map to [0, 1]
        ang_vel = (action[1] + 1) / 2
        state = min_values + [distance, cos, sin, lin_vel, ang_vel]

        # Process data types.
        obs = torch.as_tensor(state, dtype=torch.float32).to(self.device)
        reward = torch.as_tensor(reward, dtype=torch.float32).to(self.device)
        cost = torch.as_tensor(cost, dtype=torch.float32).to(self.device)
        terminated = torch.as_tensor(goal, dtype=torch.float32).to(self.device)
        truncated = torch.as_tensor(
            self._count > self.max_episode_steps, dtype=torch.float32
        ).to(self.device)

        return obs, reward, cost, terminated, truncated, {'final_observation': obs, 'cost': cost}
