Skip to content

Implement Atari environment using envpool#136

Open
SamBoasman wants to merge 2 commits intomainfrom
feat/atari_env
Open

Implement Atari environment using envpool#136
SamBoasman wants to merge 2 commits intomainfrom
feat/atari_env

Conversation

@SamBoasman
Copy link
Contributor

No description provided.

@SamBoasman SamBoasman requested a review from Copilot December 15, 2025 04:14
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements Atari environment support using the envpool library for high-performance, vectorized Atari game environments. The implementation leverages envpool's built-in frame stacking rather than the existing MultiModalWrapper approach used by other environments.

Key Changes:

  • Adds new AtariEnvironment class that wraps envpool for Atari games with configurable frame stacking
  • Integrates seed parameters into the environment factory to enable proper seeding of training and evaluation environments
  • Fixes a potential NameError in training_runner.py by properly initializing train_info

Reviewed changes

Copilot reviewed 5 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
scripts/util/configurations.py Adds AtariConfig class with default 4-frame stacking configuration
scripts/environments/atari/atari_environment.py New AtariEnvironment implementation using envpool with support for frame stacking, rendering, and seeding
scripts/environments/environment_factory.py Extends factory to accept seed parameters, adds AtariEnvironment case, fixes "Unknown" typo
scripts/training_runner.py Initializes train_info at loop start to prevent NameError when episode ends before policy update
scripts/base_runner.py Passes train_seed and eval_seed to environment factory
requirements.txt Adds envpool==0.8.4 dependency

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +80 to +88
frame = np.stack([frame] * 3, axis=-1)
return cv2.resize(frame, (width, height), interpolation=cv2.INTER_CUBIC)

def render(self):
frame = self.grab_frame()
cv2.imshow(self.name, frame)
cv2.waitKey(1)

def get_overlay_info(self) -> dict:
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grab_frame method assumes that if state.shape has 4 dimensions, it's RGB and takes the last 3 channels with self.state[-3:]. However, with envpool's frame stacking (stack_num=4), the observation shape will be (4, 84, 84) for grayscale Atari games. This means len(self.state.shape) == 3, not 4, so it would go to the else branch. The logic should check the actual channel dimension rather than the number of dimensions to distinguish between RGB and grayscale stacked frames.

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +21
episodic_life=evaluation,
reward_clip=evaluation,
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The evaluation parameter is inverted for episodic_life and reward_clip. In Atari, these settings should be disabled during evaluation (set to False) for standard evaluation, and enabled during training (set to True). Currently, when evaluation=True, these are set to True, which is the opposite of the intended behavior.

Suggested change
episodic_life=evaluation,
reward_clip=evaluation,
episodic_life=not evaluation,
reward_clip=not evaluation,

Copilot uses AI. Check for mistakes.
Comment on lines +32 to +36
return self.env.action_space.high[0]

@cached_property
def min_action_value(self) -> float:
return self.env.action_space.low[0]
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Atari environments with Discrete action spaces, attempting to access .high[0] and .low[0] will raise an AttributeError because Discrete spaces don't have these attributes. These properties are only applicable to Box action spaces, but Atari games use Discrete action spaces.

Suggested change
return self.env.action_space.high[0]
@cached_property
def min_action_value(self) -> float:
return self.env.action_space.low[0]
if isinstance(self.env.action_space, spaces.Box):
return self.env.action_space.high[0]
elif isinstance(self.env.action_space, spaces.Discrete):
return self.env.action_space.n - 1
else:
raise ValueError(f"Unhandled action space type: {type(self.env.action_space)}")
@cached_property
def min_action_value(self) -> float:
if isinstance(self.env.action_space, spaces.Box):
return self.env.action_space.low[0]
elif isinstance(self.env.action_space, spaces.Discrete):
return 0
else:
raise ValueError(f"Unhandled action space type: {type(self.env.action_space)}")

Copilot uses AI. Check for mistakes.
return np.array([self.env.action_space.sample()], dtype=int)

def set_seed(self, seed: int) -> None:
self.env.reset()
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling self.env.reset() without using seed parameter doesn't actually seed the environment. This method should accept and use the seed parameter to properly seed the environment, similar to how OpenAIEnvironment handles it with self.env.reset(seed=seed).

Suggested change
self.env.reset()
self.env.reset(seed=seed)

Copilot uses AI. Check for mistakes.
Comment on lines +83 to +86
def render(self):
frame = self.grab_frame()
cv2.imshow(self.name, frame)
cv2.waitKey(1)
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The window name is created using the task name and seed, but if the config.display is not 1, the self.name attribute won't be set. This will cause an AttributeError when render() is called and tries to use self.name. Consider initializing self.name regardless of the display setting or adding a conditional check in the render method.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants