
observation_space does not match reset() observation, though I confirmed they are identical.  #921

@adiyaDalat


The console reports "The observation returned by the reset() method does not match the given observation space", even though I printed the shape of the observation returned by reset() and the shape of observation_space, and they are identical.

Minimal code is below:

import gym
from gym import spaces
import numpy as np

from stable_baselines3 import A2C
from stable_baselines3.common.env_checker import check_env

class CustomEnv(gym.Env):
    
    metadata = {'render.modes': ['human']}
    
    def __init__(self, df, window=1):
        
        self.feature = df.to_numpy()
        self.window = window
        self.feature_shape = (window, np.shape(self.feature)[1])
        
        self.current_tick = self.window
        
        self.action_space = spaces.Discrete(3)
        
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=self.feature_shape, dtype=np.float32)
        
        self.step_num = 0
        self.total_reward = 0
        
    
    def reset(self):
        position = 0
        self.current_tick = self.window
        
        obs = self._get_observation(position)
        return obs
    
    def _get_observation(self, position):
        obs = self.feature[(self.current_tick-self.window):self.current_tick]
        return obs

# df is a pandas DataFrame of numeric feature columns (its construction is not shown here)
check_env(CustomEnv(df, window=3))
Traceback (most recent call last): AssertionError: The observation returned by the `reset()` method does not match the given observation space
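
A possible explanation, offered as a sketch rather than a confirmed diagnosis: matching shapes are not sufficient, because the space check also looks at dtype. As far as I can tell, `Box.contains` requires that the observation's dtype can be safely cast to the space's dtype, and `df.to_numpy()` typically yields `float64` while the space above declares `float32`. The snippet below uses only NumPy to mimic that check; the `feature` array is a hypothetical stand-in for `df.to_numpy()`, not the real data.

```python
import numpy as np

# Hypothetical stand-in for df.to_numpy(); pandas stores floats as float64 by default.
feature = np.arange(12, dtype=np.float64).reshape(6, 2)
window = 3
obs = feature[0:window]  # same slicing as _get_observation at the first tick

# Shape and dtype as declared for the observation space in the issue.
space_shape, space_dtype = (window, feature.shape[1]), np.float32

shape_ok = obs.shape == space_shape
# Safe casting float64 -> float32 is not allowed, so this check fails.
dtype_ok = np.can_cast(obs.dtype, space_dtype)

# Casting the observation to the declared dtype makes the dtype check pass.
fixed_obs = obs.astype(np.float32)
fixed_dtype_ok = np.can_cast(fixed_obs.dtype, space_dtype)

print(shape_ok, dtype_ok, fixed_dtype_ok)
```

If this is indeed the cause, returning `obs.astype(np.float32)` from `_get_observation` (or converting `self.feature` once in `__init__`) should satisfy the checker. Calling `env.observation_space.contains(obs)` directly on the real environment is a quick way to confirm.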

### Checklist

- [x] I have read the documentation (required)
- [x] I have checked that there is no similar issue in the repo (required)
- [x] I have checked my env using the env checker (required)
- [x] I have provided a minimal working example to reproduce the bug (required)

Beyond that, I have two more questions:

  1. How should I integrate the agent's state (one of {0, 1, 2}) into the observation? If I add the state directly as another element of the observation array, will the agent treat it as its own state rather than as an observation of the surrounding environment? I noticed that the OpenAI Gym documentation uses a dict as the observation rather than mixing the two.
  2. If I want to use multiple rows of an array as the input observation and use an LSTM as the model, is the code above correct? SB3 suggests flattening the observation to a 1D array, though.
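
To make the two questions concrete, here is a minimal sketch of one option: flatten the window of rows to a 1D vector and append the agent's state as a one-hot suffix. This is an illustration under assumptions, not SB3's prescribed approach; `build_observation` and `N_POSITIONS` are hypothetical names, and the `feature` array is made-up example data.

```python
import numpy as np

N_POSITIONS = 3  # assumption: the agent's state takes the values 0, 1, 2


def build_observation(feature, current_tick, window, position):
    """Flatten the last `window` rows and append a one-hot agent state."""
    rows = feature[current_tick - window:current_tick]
    one_hot = np.zeros(N_POSITIONS, dtype=np.float32)
    one_hot[position] = 1.0
    # Cast to float32 so the result matches a float32 Box space.
    return np.concatenate([rows.astype(np.float32).ravel(), one_hot])


# Made-up example data: 6 ticks, 2 features per tick.
feature = np.arange(12, dtype=np.float64).reshape(6, 2)
obs = build_observation(feature, current_tick=3, window=3, position=0)
print(obs.shape, obs.dtype)
```

With this layout the matching space would be a `spaces.Box` of shape `(window * n_features + N_POSITIONS,)`. The alternative mentioned in the first question, a `spaces.Dict` observation that keeps market features and agent state under separate keys, avoids mixing the two in one array.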


Labels: custom gym env (Issue related to Custom Gym Env), question (Further information is requested)
